Recognition Part II: Face Detection via AdaBoost
Linda Shapiro, CSE 455
Presentation Transcript

1. Recognition, Part II: Face Detection via AdaBoost
Linda Shapiro, CSE 455

2. What's Coming
The basic AdaBoost algorithm (next)
The Viola-Jones face detector features
The modified AdaBoost algorithm that is used in Viola-Jones face detection
HW 4

3. Learning from weighted data
Consider a weighted dataset, where D(i) is the weight of the i-th training example (xi, yi).
Interpretations:
The i-th training example counts as if it occurred D(i) times.
If I were to "resample" the data, I would get more samples of "heavier" data points.
Now, always do weighted calculations. For example, in MLE for Naive Bayes, redefine Count(Y=y) to be the weighted count:
Count(Y=y) = sum over j of D(j) * δ(Y_j = y),
where δ(P) = 1 when P is true, else 0. Setting D(j) = 1 (or any constant like 1/n) for all j recreates the unweighted case.

sample       class  weight
(1.5, 2.6)   I      1/2
(2.3, 8.9)   II     1/2
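
A minimal sketch of the weighted count described above, assuming the data is given as (label, weight) pairs; the function and variable names are illustrative, not from the course code.

```python
def weighted_count(examples, y):
    """Weighted Count(Y=y): sum of D(j) over examples whose label equals y.

    examples is a list of (label, weight) pairs. With all weights equal
    (e.g., 1 or 1/n) this reduces to the ordinary unweighted count.
    """
    return sum(d for label, d in examples if label == y)

# Example matching the slide's table: classes I and II, each with weight 1/2.
data = [("I", 0.5), ("II", 0.5)]
print(weighted_count(data, "I"))   # 0.5
```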

4. AdaBoost Overview
Input is a set of training examples (xi, yi), i = 1 to m.
We are going to train a sequence of weak classifiers, such as decision trees, neural nets, or SVMs; "weak" because each is not as strong as the final classifier.
The training examples have weights, initially all equal.
At each step, we use the current weights, train a new classifier, and use its performance on the training data to produce new weights for the next step.
But we keep ALL the weak classifiers.
At test time, on a new feature vector, we combine the results from all of the weak classifiers.

5. Idea of Boosting
[Figure from the AI textbook illustrating the idea of boosting.]

6. How to choose the weights αt?
Many possibilities; we will see one shortly.
Final result: a linear sum of "base" or "weak" classifier outputs.
[Figure: the boosting loop. Labeled training data; start with equal weights; train a weak classifier; update the weights. Annotations: weight at time t+1 for sample i; m samples, 2 classes; sum over the m samples.]

7. Error and weak learner weight
εt = Σi Dt(i) δ(ht(xi) ≠ yi) is the weighted error of weak learner ht, where δ(P) = 1 when P is true, else 0.
αt = (1/2) ln((1 - εt) / εt) is a weight for weak learner ht.
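
A minimal sketch of one round of the basic AdaBoost update described above, assuming labels in {-1, +1}, a weak classifier ht given as a callable, and weights that currently sum to 1; names are illustrative, not the course code.

```python
import math

def adaboost_round(D, X, y, h):
    """One AdaBoost round: weighted error, alpha_t, and the new weights D_{t+1}.

    D: list of sample weights summing to 1; X: samples; y: labels in {-1, +1};
    h: weak classifier mapping a sample to -1 or +1 (assumed error strictly
    between 0 and 1/2 for this sketch).
    """
    preds = [h(x) for x in X]
    # Weighted training error: sum of D(i) over misclassified samples.
    eps = sum(d for d, p, t in zip(D, preds, y) if p != t)
    # alpha_t = 1/2 * ln((1 - eps) / eps): the vote given to this weak learner.
    alpha = 0.5 * math.log((1 - eps) / eps)
    # Misclassified samples are up-weighted, correct ones down-weighted; normalize.
    D_new = [d * math.exp(-alpha * t * p) for d, p, t in zip(D, preds, y)]
    Z = sum(D_new)
    return alpha, [d / Z for d in D_new]
```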

8. Boosting results: digit recognition [Schapire, 1989]
Boosting seems to be robust to overfitting.
Test error can decrease even after training error is zero!
[Plot: training error and test error versus number of boosting rounds.]

9. Boosting generalization error bound [Freund & Schapire, 1996]
The bound has the form: true error ≤ training error + O(sqrt(T d / m)), with constants:
T: number of boosting rounds. Higher T gives a looser bound; what does this imply?
d: VC dimension of the weak learner, which measures the complexity of the classifier. Higher d means a bigger hypothesis space and a looser bound.
m: number of training examples. More data gives a tighter bound.
The VC dimension (Vapnik-Chervonenkis dimension) is a measure of the capacity (complexity) of a statistical classification algorithm.

10. Boosting and Logistic Regression
Logistic regression is equivalent to minimizing the log loss: Σi ln(1 + exp(-yi f(xi))).
Boosting minimizes a similar loss function, the exponential loss: Σi exp(-yi f(xi)).
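
A small sketch comparing the two losses mentioned above, assuming labels in {-1, +1} and real-valued scores f(xi); purely illustrative.

```python
import math

def log_loss(scores, labels):
    # Logistic regression loss: sum of ln(1 + exp(-y_i * f(x_i))).
    return sum(math.log(1 + math.exp(-y * f)) for f, y in zip(scores, labels))

def exp_loss(scores, labels):
    # Loss minimized by boosting: sum of exp(-y_i * f(x_i)).
    return sum(math.exp(-y * f) for f, y in zip(scores, labels))
```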

11. Face detection
State-of-the-art face detection demo (courtesy Boris Babenko).

12. Face detection and recognition
[Figure: detection finds the faces; recognition identifies them, e.g., "Sally".]

13. Face detection
Where are the faces?

14. Face Detection
What kind of features?
What kind of classifiers?

15. Image Features
"Rectangle filters"
Value = Σ (pixels in white area) - Σ (pixels in black area)
People call them Haar-like features, since they are similar to 2D Haar wavelets.

16. Feature extraction (K. Grauman, B. Leibe; Viola & Jones, CVPR 2001)
"Rectangular" filters: the feature output is the difference between adjacent regions.
Efficiently computable with the integral image: any sum can be computed in constant time.
Avoid scaling images: scale the features directly, for the same cost.

17. Recall: Sums of rectangular regions
[Figure: a grid of pixel intensity values with a red box drawn over part of it.]
How do we compute the sum of the pixels in the red box?
After some pre-computation, this can be done in constant time for any box.
This "trick" is commonly used for computing Haar wavelets (a fundamental building block of many object recognition approaches).

18. Sums of rectangular regions
The trick is to compute an "integral image." Every pixel is the sum of its neighbors to the upper left.
Sequentially compute using:
ii(x, y) = i(x, y) + ii(x - 1, y) + ii(x, y - 1) - ii(x - 1, y - 1)

19. Sums of rectangular regions
With integral image values A, B, C, D at the four corners of the box, the solution is found using:
A + D - B - C
What if the position of the box lies between pixels? Use bilinear interpolation.
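
A minimal sketch of the integral-image trick from slides 17-19, assuming a NumPy array of pixel intensities and inclusive box corners; the function names are illustrative.

```python
import numpy as np

def integral_image(img):
    """ii(r, c) = sum of all pixels above and to the left of (r, c), inclusive."""
    return img.cumsum(axis=0).cumsum(axis=1)

def box_sum(ii, r0, c0, r1, c1):
    """Sum of pixels in the box with corners (r0, c0) and (r1, c1), inclusive.

    Uses the four corner values (A + D - B - C) of the integral image,
    so it runs in constant time regardless of box size.
    """
    D = ii[r1, c1]
    B = ii[r0 - 1, c1] if r0 > 0 else 0
    C = ii[r1, c0 - 1] if c0 > 0 else 0
    A = ii[r0 - 1, c0 - 1] if r0 > 0 and c0 > 0 else 0
    return D - B - C + A

img = np.arange(16, dtype=float).reshape(4, 4)
ii = integral_image(img)
print(box_sum(ii, 1, 1, 2, 2))   # 5 + 6 + 9 + 10 = 30
```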

20. Large library of filters (Viola & Jones, CVPR 2001)
Considering all possible filter parameters (position, scale, and type), there are 160,000+ possible features associated with each 24 x 24 window.
Use AdaBoost both to select the informative features and to form the classifier.

21. Feature selection
For a 24 x 24 detection region, the number of possible rectangle features is ~160,000!
At test time, it is impractical to evaluate the entire feature set.
Can we create a good classifier using just a small subset of all possible features?
How do we select such a subset?

22. AdaBoost for feature + classifier selection (Viola & Jones, CVPR 2001)
We want to select the single rectangle feature and threshold that best separate positive (faces) and negative (non-faces) training examples, in terms of weighted error.
[Figure: outputs of a possible rectangle feature on faces and non-faces.]
The resulting weak classifier ht thresholds that feature at θt (θt is a threshold for classifier ht).
For the next round, reweight the examples according to their errors, then choose another filter/threshold combination.

23. Weak Classifiers
Each weak classifier works on exactly one rectangle feature.
Each weak classifier has 3 associated variables: its threshold θ, its polarity p, and its weight α.
The polarity can be 0 or 1.
The weak classifier computes its one feature f.
When the polarity is 1, we want f > θ for a face; when the polarity is 0, we want f < θ for a face.
The weight will be used in the final classification (combination step) by AdaBoost.
Decision rule: h(x) = 1 if p*f(x) < p*θ, else 0. (The code does not actually compute h.)
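
A minimal sketch of the decision rule shown above, using polarity values +1 and -1 to encode the two directions (the slides encode the same choice as 1 or 0); names and example values are illustrative, and as the slide notes, the homework code applies this rule implicitly rather than through a function like this.

```python
def weak_classify(f_value, theta, polarity):
    """One-feature weak classifier: h(x) = 1 (face) if p * f(x) < p * theta, else 0.

    polarity = +1 predicts face when the feature value is below the threshold;
    polarity = -1 flips the direction, so values above the threshold are faces.
    """
    return 1 if polarity * f_value < polarity * theta else 0

# Hypothetical example: feature value 3.2 against threshold 4.0.
print(weak_classify(3.2, 4.0, +1))   # 1: below the threshold counts as face
print(weak_classify(3.2, 4.0, -1))   # 0: with flipped polarity, it does not
```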

24. AdaBoost: Intuition (K. Grauman, B. Leibe; figure adapted from Freund and Schapire)
Consider a 2-d feature space with positive and negative examples.
Each weak classifier splits the training examples with at least 50% accuracy.
Examples misclassified by a previous weak learner are given more emphasis in future rounds.

25. AdaBoost: Intuition, continued (K. Grauman, B. Leibe)

26. AdaBoost: Intuition (K. Grauman, B. Leibe)
The final classifier is a combination of the weak classifiers.

27. Final classifier
The final classifier is a combination of the weak ones, weighted according to the error they had:
C(x) = 1 if Σt αt ht(x) ≥ (1/2) Σt αt, else 0, with αt = log(1/βt),
where βt = εt / (1 - εt) and εt is the training error of the classifier ht.

28. AdaBoost Algorithm modified by Viola-Jones
Given training examples {x1, ..., xn}. (NOTE: our code uses equal initial weights for all samples.)
For T rounds (meaning we will construct T weak classifiers):
Normalize the weights (divide by the sum over the training samples).
Find the best threshold and polarity for each feature, and return the error.
Re-weight the examples:
Incorrectly classified -> more weight
Correctly classified -> less weight
A sketch of this loop appears below.
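
A high-level sketch of the training loop on this slide, assuming the per-round weak-learner search is supplied as a callable (a placeholder for the feature/threshold search of slides 30-32, not the actual homework code); the β-based re-weighting follows the Viola-Jones paper.

```python
import math

def viola_jones_adaboost(samples, labels, T, train_weak):
    """Viola-Jones flavored AdaBoost.

    samples: feature vectors; labels: 1 for face, 0 for background.
    train_weak(samples, labels, weights) must return (predict_fn, weighted_error),
    i.e. the best single-feature threshold classifier under the current weights.
    Returns a list of (predict_fn, alpha) pairs for the combination step.
    """
    n = len(samples)
    w = [1.0 / n] * n                      # the course code starts with equal weights
    strong = []
    for _ in range(T):
        total = sum(w)                     # normalize weights into a distribution
        w = [wi / total for wi in w]
        predict, err = train_weak(samples, labels, w)
        beta = err / (1.0 - err)
        alpha = math.log(1.0 / beta)
        # Correctly classified samples are down-weighted by beta (< 1), so the
        # misclassified ones carry relatively more weight in the next round.
        w = [wi * (beta if predict(x) == y else 1.0)
             for wi, x, y in zip(w, samples, labels)]
        strong.append((predict, alpha))
    return strong
```

The final decision would then compare Σt αt ht(x) against (1/2) Σt αt, as on slide 27.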

29. Recall
Classification: Nearest Neighbor, Naive Bayes, Decision Trees and Forests, Logistic Regression, Boosting, ...
Face Detection: Simple Features, Integral Images, Boosting

30. Picking the (threshold for the) best classifier
Efficient single-pass approach. At each sample compute:
e = min( S+ + (T- - S-), S- + (T+ - S+) )
Find the minimum value of e, and use the value of the corresponding sample as the threshold.
S = sum of the samples with feature value below the current sample; T = total sum of all samples.
S+ and T+ are for faces; S- and T- are for background.

31. Picking the threshold for the best classifier
Efficient single-pass approach. At each sample compute:
e = min( S+ + (T- - S-), S- + (T+ - S+) )
Find the minimum value of e, and use the value of the corresponding sample as the threshold.
S = sum of the weights of the samples with feature value below the current sample; T = total sum of all sample weights.
S+ and T+ are for faces; S- and T- are for background.
The features are actually sorted in the code according to numeric value!

32. Picking the threshold for the best classifier
The feature values for the training samples are actually sorted in the code according to numeric value!
Algorithm:
find AFS, the sum of the weights of all the face samples
find ABG, the sum of the weights of all the background samples
set FS, the sum of the weights of face samples so far, to zero
set BG, the sum of the weights of background samples so far, to zero
go through each sample s in a loop IN THE SORTED ORDER
At each sample, add its weight to FS or BG and compute:
e = min( BG + (AFS - FS), FS + (ABG - BG) )
Find the minimum value of e, and use the feature value of the corresponding sample as the threshold.
A sketch of this single pass appears below.
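
A minimal sketch of the single-pass threshold search of slides 30-32, assuming lists of feature values, labels (1 = face, 0 = background), and sample weights; names are illustrative, not the homework code.

```python
def pick_threshold(values, labels, weights):
    """Single-pass threshold selection for one rectangle feature.

    Sorts the samples by feature value and sweeps once, keeping running sums
    FS (face weight seen so far) and BG (background weight seen so far).
    At each position, the error of placing the threshold there is
        min(BG + (AFS - FS), FS + (ABG - BG)).
    Returns (best_threshold, best_error).
    """
    order = sorted(range(len(values)), key=lambda i: values[i])
    AFS = sum(w for w, y in zip(weights, labels) if y == 1)   # total face weight
    ABG = sum(w for w, y in zip(weights, labels) if y == 0)   # total background weight
    FS = BG = 0.0
    best_threshold, best_error = None, float("inf")
    for i in order:
        if labels[i] == 1:
            FS += weights[i]
        else:
            BG += weights[i]
        # Error if everything at or below this value is called a face,
        # versus error if everything above this value is called a face.
        err = min(BG + (AFS - FS), FS + (ABG - BG))
        if err < best_error:
            best_threshold, best_error = values[i], err
    return best_threshold, best_error
```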

33. What's going on?
error = min( BG + (AFS - FS), FS + (ABG - BG) ), i.e. min(left, right)
Let's pretend the weights on the samples are all 1's.
The samples are arranged in sorted order by feature value, and we know which ones are faces (f) and background (b).
Left is the number of background patches so far plus the number of faces yet to be encountered.
Right is the number of faces so far plus the number of background patches yet to be encountered.

sample:         b      b      b      f      b      f      f      b      f      f
(left, right):  (6,4)  (7,3)  (8,2)  (7,3)  (8,2)  (7,3)  (6,4)  (7,3)  (6,4)  (5,5)
error:          4      3      2      3      2      3      4      3      4      5

For the first sample: left = 1 + 5 - 0 = 6 and right = 0 + 5 - 1 = 4.
The minimum error here is 2, achieved at the third and fifth samples.

34. Measuring classification performance
Confusion matrix: rows are the actual classes, columns are the predicted classes.

              Predicted
Actual        Class1  Class2  Class3
Class1        40      1       6
Class2        3       25      7
Class3        4       9       10

For two classes (Positive/Negative), the entries are True Positive, False Negative, False Positive, and True Negative.
Accuracy = (TP + TN) / (TP + TN + FP + FN)
True Positive Rate = Recall = TP / (TP + FN)
False Positive Rate = FP / (FP + TN)
Precision = TP / (TP + FP)
F1 Score = 2 * Recall * Precision / (Recall + Precision)
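
A small sketch of the measures defined on this slide, assuming the four confusion-matrix counts are given; the function name is illustrative.

```python
def classification_metrics(tp, fp, fn, tn):
    """Standard measures computed from a 2-class confusion matrix."""
    accuracy  = (tp + tn) / (tp + tn + fp + fn)
    recall    = tp / (tp + fn)          # true positive rate
    fpr       = fp / (fp + tn)          # false positive rate
    precision = tp / (tp + fp)
    f1        = 2 * recall * precision / (recall + precision)
    return accuracy, recall, fpr, precision, f1
```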

35. Boosting for face detection
The first two features selected by boosting:
[Figure: the two rectangle features overlaid on a face.]
This feature combination can yield a 100% detection rate and a 50% false positive rate.

36. Boosting for face detection
A 200-feature classifier can yield a 95% detection rate and a false positive rate of 1 in 14084.
Is this good enough?
[Figure: receiver operating characteristic (ROC) curve.]

37. Attentional cascade (from Viola-Jones)
This part will be extra credit for HW4.
We start with simple classifiers that reject many of the negative sub-windows while detecting almost all positive sub-windows.
A positive response from the first classifier triggers the evaluation of a second (more complex) classifier, and so on.
A negative outcome at any point leads to the immediate rejection of the sub-window.
[Diagram: IMAGE SUB-WINDOW -> Classifier 1 -> (T) Classifier 2 -> (T) Classifier 3 -> (T) FACE; an F at any classifier goes to NON-FACE.]
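
A minimal sketch of evaluating the attentional cascade on one sub-window, assuming each stage is supplied as a callable returning True for "face"; the stage classifiers themselves are hypothetical placeholders.

```python
def cascade_classify(subwindow, stages):
    """Run a sub-window through the attentional cascade.

    stages: list of increasingly complex classifiers, each returning True/False.
    A single negative response rejects the sub-window immediately; only windows
    that pass every stage are reported as faces.
    """
    for stage in stages:
        if not stage(subwindow):
            return False        # NON-FACE: rejected, later stages never run
    return True                 # FACE: survived every stage
```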

38. Attentional cascade
A chain of classifiers that are progressively more complex and have lower false positive rates.
[Figure: receiver operating characteristic, % detection vs. % false positives; each stage's threshold determines its trade-off between false positives and false negatives.]
[Diagram: IMAGE SUB-WINDOW -> Classifier 1 -> Classifier 2 -> Classifier 3 -> FACE, with rejection to NON-FACE at any stage.]

39. Attentional cascade
The detection rate and the false positive rate of the cascade are found by multiplying the respective rates of the individual stages.
A detection rate of 0.9 and a false positive rate on the order of 10^-6 can be achieved by a 10-stage cascade if each stage has a detection rate of 0.99 (0.99^10 ≈ 0.9) and a false positive rate of about 0.30 (0.3^10 ≈ 6×10^-6).
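
The numbers on this slide follow directly from multiplying the per-stage rates, as in this small check:

```python
stages = 10
det_per_stage, fp_per_stage = 0.99, 0.30
print(det_per_stage ** stages)   # ~0.904: overall detection rate of the cascade
print(fp_per_stage ** stages)    # ~5.9e-6: overall false positive rate
```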

40. Training the cascade
Set target detection and false positive rates for each stage.
Keep adding features to the current stage until its target rates have been met.
Need to lower the AdaBoost threshold to maximize detection (as opposed to minimizing total classification error).
Test on a validation set.
If the overall false positive rate is not low enough, then add another stage.
Use false positives from the current stage as the negative training examples for the next stage.

41. Viola-Jones Face Detector: Summary
Train with 5K positives, 350M negatives (faces and non-faces).
Real-time detector using a 38-layer cascade; 6061 features in the final layer.
Train the cascade of classifiers with AdaBoost, yielding the selected features, thresholds, and weights; apply the cascade to each sub-window of a new image.
[Implementation available in OpenCV: http://www.intel.com/technology/computing/opencv/]

42. The implemented system
Training data:
5000 faces, all frontal, rescaled to 24x24 pixels
300 million non-faces, from 9500 non-face images
Faces are normalized for scale and translation.
Many variations: across individuals, illumination, pose.

43. System performance
Training time: "weeks" on a 466 MHz Sun workstation.
38 layers, total of 6061 features.
Average of 10 features evaluated per window on the test set.
"On a 700 MHz Pentium III processor, the face detector can process a 384 by 288 pixel image in about .067 seconds" (about 15 Hz).
15 times faster than the previous detector of comparable accuracy (Rowley et al., 1998).

44. Non-maximal suppression (NMS)
Many detections above threshold.

45. Non-maximal suppression (NMS)
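
A minimal sketch of one common greedy form of non-maximal suppression, assuming detections are given as (score, box) pairs with boxes as (x1, y1, x2, y2); this is a generic version for illustration, not necessarily the exact merging scheme used by Viola-Jones.

```python
def iou(a, b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

def non_max_suppression(detections, overlap_threshold=0.5):
    """Keep the highest-scoring detections, dropping overlapping weaker ones.

    detections: list of (score, box) pairs already above the detection threshold.
    """
    kept = []
    for score, box in sorted(detections, reverse=True):
        if all(iou(box, kb) < overlap_threshold for _, kb in kept):
            kept.append((score, box))
    return kept
```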

46. Similar accuracy, but 10x faster
Is this good?

47. Viola-Jones Face Detector: Results

48. Viola-Jones Face Detector: Results

49. Viola-Jones Face Detector: Results

50. Detecting profile faces?
Detecting profile faces requires training a separate detector with profile examples.

51. Viola-Jones Face Detector: Results (Paul Viola, ICCV tutorial)

52. Summary: Viola-Jones detector
Rectangle features
Integral images for fast computation
Boosting for feature selection
Attentional cascade for fast rejection of negative windows