Supervised Descent Method and its Applications to Face Alignment Xuehan Xiong Fe

Uploaded On 2014-10-03


Abstract: Many computer vision problems (e.g., camera calibration, image alignment, structure from motion) are solved through a nonlinear optimization method. It is generally accepted that 2nd order descent methods are the most robust f…




Presentation Transcript

However, when applying Newton's method to computer vision problems, three main problems arise: (1) The Hessian is positive definite at the local minimum, but it might not be positive definite elsewhere; therefore, the Newton steps might not be taken in the descent direction. (2) Newton's method requires the function to be twice differentiable. This is a strong requirement in many computer vision applications. For instance, consider the case of image alignment using SIFT [21] features, where SIFT can be seen as a non-differentiable image operator. In these cases, we can estimate the gradient or the Hessian numerically, but this is typically computationally expensive. (3) The dimension of the Hessian matrix can be large; inverting the Hessian requires $O(p^3)$ operations and $O(p^2)$ space, where $p$ is the dimension of the parameter to estimate. Although explicit inversion of the Hessian is not needed using Quasi-Newton methods such as L-BFGS [9], it can still be computationally expensive to use these methods in computer vision problems.

In order to address the previous limitations, this paper proposes a Supervised Descent Method (SDM) that learns the descent directions in a supervised manner. Fig. 1 illustrates the main idea of our method. The top image shows the application of Newton's method to a Non-linear Least Squares (NLS) problem, where $f(\mathbf{x})$ is a non-linear function and $\mathbf{y}$ is a known vector. In this case, $f(\mathbf{x})$ is a non-linear function of image features (e.g., SIFT) and $\mathbf{y}$ is a known vector (i.e., template). $\mathbf{x}$ represents the vector of motion parameters (i.e., rotation, scale, non-rigid motion). The traditional Newton update has to compute the Hessian and the Jacobian. Fig. 1b illustrates the main idea behind SDM. The training data consists of a set of functions $\{f(\mathbf{x}, \mathbf{y}^i)\}$ sampled at different locations $\mathbf{y}^i$ (i.e., different people) where the minima $\{\mathbf{x}_*^i\}$ are known. Using this training data, SDM learns a series of parameter updates, which incrementally minimize the mean of all NLS functions in training. In the case of NLS, such updates can be decomposed into two parts: a sample-specific component (e.g., $\mathbf{y}^i$) and generic descent directions $\mathbf{R}_k$. SDM learns average descent directions $\mathbf{R}_k$ during training. In testing, given an unseen $\mathbf{y}$, an update is generated by projecting $\mathbf{y}$-specific components onto the learned generic directions $\mathbf{R}_k$.

We illustrate the benefits of SDM on analytic functions, and in the problem of facial feature detection and tracking. We show how SDM improves state-of-the-art performance for facial feature detection in two "face in the wild" databases [26, 4] and demonstrate extremely good performance tracking faces in the YouTube celebrity database [20].

2. Previous work

This section reviews previous work on face alignment.

Parameterized Appearance Models (PAMs), such as Active Appearance Models [11, 14, 2], Morphable Models [6, 19], Eigentracking [5], and template tracking [22, 30], build an object appearance and shape representation by computing Principal Component Analysis (PCA) on a set of manually labeled data. Fig. 2a illustrates an image labeled with $p$ landmarks ($p = 66$ in this case). After the images are aligned with Procrustes, the shape model is learned by computing PCA on the registered shapes. A linear combination of $k_s$ shape bases, $\mathbf{U}^s \in \mathbb{R}^{2p \times k_s}$, can reconstruct (approximately) any aligned shape in the training set. Similarly, an appearance model, $\mathbf{U}^a \in \mathbb{R}^{m \times k_a}$, is built by performing PCA on the texture. Alignment is achieved by finding the motion parameter $\mathbf{p}$ and appearance coefficients $\mathbf{c}^a$ that best align the image w.r.t. the subspace $\mathbf{U}^a$, i.e.,

$$\min_{\mathbf{c}^a, \mathbf{p}} \; \| \mathbf{d}(f(\mathbf{x}, \mathbf{p})) - \mathbf{U}^a \mathbf{c}^a \|_2^2, \qquad (2)$$

where $\mathbf{x} = [x_1, y_1, \ldots, x_l, y_l]^\top$ is the vector containing the coordinates of the pixels to detect/track. $f(\mathbf{x}, \mathbf{p})$ represents a geometric transformation; the value of $f(\mathbf{x}, \mathbf{p})$ is a vector denoted by $[u_1, v_1, \ldots, u_l, v_l]^\top$. $\mathbf{d}(f(\mathbf{x}, \mathbf{p}))$ is the appearance vector of which the $i$-th entry is the intensity of image $\mathbf{d}$ at pixel $(u_i, v_i)$. For affine and non-rigid transformations, $(u_i, v_i)$ relates to $(x_i, y_i)$ by

$$\begin{bmatrix} u_i \\ v_i \end{bmatrix} = \begin{bmatrix} a_1 & a_2 \\ a_4 & a_5 \end{bmatrix} \begin{bmatrix} x_i^s \\ y_i^s \end{bmatrix} + \begin{bmatrix} a_3 \\ a_6 \end{bmatrix}.$$

Here $[x_1^s, y_1^s, \ldots, x_l^s, y_l^s]^\top = \bar{\mathbf{x}} + \mathbf{U}^s \mathbf{c}^s$, where $\bar{\mathbf{x}}$ is the mean shape face. $\mathbf{a}$ and $\mathbf{c}^s$ are the affine and non-rigid motion parameters respectively, and $\mathbf{p} = [\mathbf{a}, \mathbf{c}^s]$.

Given an image $\mathbf{d}$, PAMs alignment algorithms optimize Eq. 2. Due to the high dimensionality of the motion space, a standard approach to efficiently search over the parameter space is to use the Gauss-Newton method [5, 2, 11, 14] by doing a Taylor series expansion to approximate

$$\mathbf{d}(f(\mathbf{x}, \mathbf{p} + \Delta\mathbf{p})) \approx \mathbf{d}(f(\mathbf{x}, \mathbf{p})) + \mathbf{J}_d(\mathbf{p}) \Delta\mathbf{p},$$

where $\mathbf{J}_d(\mathbf{p}) = \partial \mathbf{d}(f(\mathbf{x}, \mathbf{p})) / \partial \mathbf{p}$ is the Jacobian of the image $\mathbf{d}$ w.r.t. the motion parameter $\mathbf{p}$ [22].

Discriminative approaches learn a mapping from image features to motion parameters or landmarks. Cootes et al. [11] proposed to fit AAMs by learning a linear regression between the increment of motion parameters $\Delta\mathbf{p}$ and the appearance differences $\Delta\mathbf{d}$. The linear regressor is a numerical approximation of the Jacobian [11]. Following this idea, several discriminative methods that learn a mapping from $\Delta\mathbf{d}$ to $\Delta\mathbf{p}$ have been proposed. Gradient Boosting, first introduced by Friedman [16], has become one of the most popular regressors in face alignment because of its efficiency and its ability to model nonlinearities. Saragih and Göcke [27] and Tresadern et al. [29] showed that using boosted regression for AAM discriminative fitting significantly improved over the original linear formulation. Dollár et al. [15] incorporated "pose indexed features" into the boosting framework, where not only is a new weak regressor learned at each iteration but the features are also re-computed at the latest estimate of the landmark location. Beyond gradient boosting, Rivera and Martinez [24] explored kernel regression to map from image features directly to landmark locations, achieving surprising results for low-resolution images. Recently, Cootes et al. [12] investigated Random Forest regressors in the context of face alignment. At the same time, Sánchez et al. [25] proposed to learn a regression model in the continuous domain to efficiently and uniformly sample the motion space. In the context of tracking, Zimmermann et al. [32] learned a set of independent linear predictors for different local motions, a subset of which is then chosen during tracking.

Figure 2: (a) $\mathbf{x}_*$: manually labeled image with 66 landmarks. Blue outline indicates face detector. (b) $\mathbf{x}_0$: mean landmarks, initialized using the face detector.

Part-based deformable models perform alignment by maximizing the posterior likelihood of part locations given an image. The objective function is composed of the local likelihood of each part times a global shape prior. Different methods typically vary the optimization methods or the shape prior. Constrained Local Models (CLM) [13] model this prior similarly to AAMs, assuming all faces lie in a linear subspace spanned by PCA bases. Saragih et al. [28] proposed a non-parametric representation to model the posterior likelihood, and the resulting optimization method is reminiscent of mean-shift. In [4], the shape prior was modeled non-parametrically from training data. Recently, Saragih [26] derived a sample-specific prior to constrain the output space that significantly improves over the original PCA prior. Instead of using a global model, Huang et al. [18] proposed to build separate Gaussian models for each part (e.g., mouth, eyes) to preserve more detailed local shape deformations. Zhu and Ramanan [31] assumed that the face shape is a tree structure (for fast inference), and used a part-based model for face detection, pose estimation, and facial feature detection.

3. Supervised Descent Method (SDM)

This section describes the SDM in the context of face alignment, and unifies discriminative methods with PAMs.

3.1. Derivation of SDM

Given an image $\mathbf{d} \in \mathbb{R}^{m \times 1}$ of $m$ pixels, $\mathbf{d}(\mathbf{x}) \in \mathbb{R}^{p \times 1}$ indexes $p$ landmarks in the image. $h$ is a non-linear feature extraction function (e.g., SIFT) and $h(\mathbf{d}(\mathbf{x})) \in \mathbb{R}^{128p \times 1}$ in the case of extracting SIFT features. During training, we will assume that the correct $p$ landmarks (in our case 66) are known, and we will refer to them as $\mathbf{x}_*$ (see Fig. 2a). Also, to reproduce the testing scenario, we ran the face detector on the training images to provide an initial configuration of the landmarks ($\mathbf{x}_0$), which corresponds to an average shape (see Fig. 2b). In this setting, face alignment can be framed as minimizing the following function over $\Delta\mathbf{x}$:

$$f(\mathbf{x}_0 + \Delta\mathbf{x}) = \| h(\mathbf{d}(\mathbf{x}_0 + \Delta\mathbf{x})) - \phi_* \|_2^2, \qquad (3)$$

where $\phi_* = h(\mathbf{d}(\mathbf{x}_*))$ represents the SIFT values in the manually labeled landmarks. In the training images, $\phi_*$ and $\Delta\mathbf{x}$ are known.

Eq. 3 has several fundamental differences with previous work on PAMs in Eq. 2. First, in Eq. 3 we do not learn any model of shape or appearance beforehand from training data. We align the image w.r.t. a template $\phi_*$. For the shape, our model will be a non-parametric one, and we will optimize the landmark locations $\mathbf{x} \in \mathbb{R}^{2p \times 1}$ directly. Recall that in traditional PAMs, the non-rigid motion is modeled as a linear combination of shape bases learned by computing PCA on a training set. Our non-parametric shape model is able to generalize better to untrained situations (e.g., asymmetric facial gestures). Second, we use SIFT features extracted from patches around the landmarks to achieve a representation that is robust against illumination. Observe that the SIFT operator is not differentiable, and minimizing Eq. 3 using first or second order methods requires numerical approximations (e.g., finite differences) of the Jacobian and the Hessian. However, numerical approximations are very computationally expensive. The goal of SDM is to learn a series of descent directions and re-scaling factors (done by the Hessian in the case of Newton's method) such that it produces a sequence of updates ($\mathbf{x}_{k+1} = \mathbf{x}_k + \Delta\mathbf{x}_k$) starting from $\mathbf{x}_0$ that converges to $\mathbf{x}_*$ in the training data.

Now, only for derivation purposes, we will assume that $h$ is twice differentiable. This assumption will be dropped in a later part of the section. Similar to Newton's method, we apply a second order Taylor expansion to Eq. 3 as

$$f(\mathbf{x}_0 + \Delta\mathbf{x}) \approx f(\mathbf{x}_0) + \mathbf{J}_f(\mathbf{x}_0)^\top \Delta\mathbf{x} + \frac{1}{2} \Delta\mathbf{x}^\top \mathbf{H}(\mathbf{x}_0) \Delta\mathbf{x}, \qquad (4)$$

where $\mathbf{J}_f(\mathbf{x}_0)$ and $\mathbf{H}(\mathbf{x}_0)$ are the Jacobian and Hessian matrices of $f$ evaluated at $\mathbf{x}_0$. In the following, we will omit $\mathbf{x}_0$ to simplify the notation. Differentiating (4) with respect to $\Delta\mathbf{x}$ and setting it to zero gives us the first update for $\mathbf{x}$,

$$\Delta\mathbf{x}_1 = -\mathbf{H}^{-1} \mathbf{J}_f = -2 \mathbf{H}^{-1} \mathbf{J}_h^\top (\phi_0 - \phi_*), \qquad (5)$$

where $\phi_0 = h(\mathbf{d}(\mathbf{x}_0))$, as follows from applying the chain rule to Eq. 3.

Table 1: Experimental setup for the SDM on analytic functions. $\mathrm{erf}(x)$ is the error function, $\mathrm{erf}(x) = \frac{2}{\sqrt{\pi}} \int_0^x e^{-t^2} \, dt$.

Function      Training set                             Test set
$h(x)$        $y$                  $x = h^{-1}(y)$       $y$
$\sin(x)$     [-1:0.2:1]           $\arcsin(y)$          [-1:0.05:1]
$x^3$         [-27:3:27]           $y^{1/3}$             [-27:0.5:27]
$\mathrm{erf}(x)$   [-0.99:0.11:0.99]    $\mathrm{erf}^{-1}(y)$   [-0.99:0.03:0.99]
$e^x$         [1:3:28]             $\log(y)$             [1:0.5:28]

4. Experiments

This section reports experimental results on both synthetic and real data. The first experiment compares the SDM with the Newton method on four analytic functions. In the second experiment, we tested the performance of the SDM in the problem of facial feature detection in two standard databases. Finally, in the third experiment we illustrate how the method can be applied to facial feature tracking.

4.1. SDM on analytic scalar functions

This experiment compares the performance in speed and accuracy of the SDM against Newton's method on four analytic functions. The NLS problem that we optimize is

$$\min_x f(x) = (h(x) - y)^2,$$

where $h(x)$ is a scalar function (see Table 1) and $y$ is a given constant. Observe that the 1st and 2nd derivatives of those functions can be derived analytically. Assume that we have a fixed initialization $x_0 = c$ and we are given a set of training data $\mathbf{x} = \{x^i\}_{i=1}^n$ and $\mathbf{y} = \{h(x^i)\}_{i=1}^n$. Unlike the SDM for face alignment, in this case no bias term is learned since $y$ is known at testing time. We trained the SDM as explained in Sec. 3.2.

The training and testing setup for each function are shown in Table 1 in Matlab notation. We have chosen only invertible functions; otherwise, multiple solutions may be obtained for a given $y$. In the training data, the output variables $y$ are sampled uniformly in a local region of $h(x)$, and their corresponding inputs $x$ are computed by evaluating $y$ at the inverse function of $h(x)$. The test data $y$ is generated at a finer resolution than in training.

To measure the accuracy of both methods, we computed the normalized least squares residuals $\|\mathbf{x}_k - \mathbf{x}_*\| / \|\mathbf{x}_*\|$ at the first 10 steps. Fig. 3 shows the convergence comparison between SDM and the Newton method. Surprisingly, SDM converges in the same number of iterations as the Newton method, but each iteration is faster. Moreover, SDM is more robust against bad initializations and ill-conditioning ($f'' < 0$). For example, when $h(x) = x^3$ the Newton method starts from a saddle point and stays there in the following iterations (observe that in Fig. 3 the Newton method stays at 1). In the case of $h(x) = e^x$, the Newton method diverges because it is ill-conditioned. Not surprisingly, when the Newton method converges it provides a more accurate estimation than SDM, because SDM uses a generic descent direction. If $f$ is quadratic (e.g., $h$ is a linear function of $x$), SDM will converge in one iteration, because the average gradient evaluated at different locations will be the same for linear functions. This coincides with the well-known fact that the Newton method converges in one iteration for quadratic functions.

Figure 3: Normalized error versus iterations on four analytic functions (see Table 1) using the Newton method and SDM.

4.2. Facial feature detection

This section reports experiments on facial feature detection in two "face in the wild" datasets, and compares SDM with state-of-the-art methods. The two face databases are the LFPW dataset [4] (footnote 1) and the LFW-A&C dataset [26].

The experimental setup is as follows. First the face is detected using the OpenCV face detector [7]. The evaluation is performed on the images in which a face can be detected. The face detection rates are 96.7% on LFPW and 98.7% on LFW-A&C, respectively. The initial shape estimate is given by centering the mean face at the normalized square. The translational and scaling differences between the initial and true landmark locations are also computed, and their means and variances are used for generating Monte Carlo samples in Eq. 9. We generated 10 perturbed samples for each training image. SIFT descriptors are computed on $32 \times 32$ local patches. To reduce the dimensionality of the data, we performed PCA on the image features, preserving 98% of the energy.

The LFPW dataset contains images downloaded from the web that exhibit large variations in pose, illumination, and facial expression. Unfortunately, only image URLs are given and some are no longer valid. We downloaded 884 …

Footnote 1: http://www.kbvt.com/LFPW/

Figure 6: Example results from our method on the LFPW dataset. The first two rows show faces with strong changes in pose and illumination, and faces partially occluded. The last row shows the 10 worst images measured by normalized mean error.

Figure 7: Example results on the LFW-A&C dataset.

Figure 8: Comparison between the tracking results from SDM (top row) and a person-specific tracker (bottom row).

Figure 9: Example results on the YouTube Celebrity dataset.
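
The scalar experiment of Sec. 4.1 can be illustrated with a short NumPy sketch. This is not the authors' code: Sec. 3.2 is not part of this transcript, so the training rule below is an assumption, namely that each step learns a single scalar factor $r_k$ by least squares so that $r_k (y - h(x_k))$ best predicts the ideal update $x_* - x_k$ over the training set, with no bias term. The function names `train_sdm` and `apply_sdm` and the choice $x_0 = 0$ are hypothetical, and the grids use `np.linspace` in place of the Matlab ranges of Table 1.

```python
import numpy as np


def train_sdm(h, x_star, x0, n_steps=10):
    """Fit one scalar step factor r_k per iteration by least squares.

    Assumed rule (not taken from the transcript): r_k minimizes
    sum_i (x_star_i - x_k_i - r_k * (y_i - h(x_k_i)))**2.
    Returns the factors and the summed squared training error recorded
    before the first step and after every step.
    """
    x_star = np.asarray(x_star, dtype=float)
    y = h(x_star)                      # targets; also known at test time
    x = np.full_like(x_star, x0)       # every sample starts from x0
    factors = []
    errors = [float(np.sum((x_star - x) ** 2))]
    for _ in range(n_steps):
        dy = y - h(x)                  # sample-specific residual
        dx = x_star - x                # ideal update for each sample
        denom = float(dy @ dy)
        # Closed-form 1-D least squares; take a zero step once converged.
        r = float(dx @ dy) / denom if denom > 1e-12 else 0.0
        factors.append(r)
        x = x + r * dy
        errors.append(float(np.sum((x_star - x) ** 2)))
    return factors, errors


def apply_sdm(h, factors, y, x0):
    """Run the learned sequence of updates on unseen targets y."""
    y = np.asarray(y, dtype=float)
    x = np.full_like(y, x0)
    for r in factors:
        x = x + r * (y - h(x))
    return x


if __name__ == "__main__":
    # h(x) = sin(x): coarse training grid, finer test grid.
    y_train = np.linspace(-1.0, 1.0, 11)
    factors, errors = train_sdm(np.sin, np.arcsin(y_train), x0=0.0)
    y_test = np.linspace(-1.0, 1.0, 41)
    x_hat = apply_sdm(np.sin, factors, y_test, x0=0.0)
    print("training error per step:", errors)
    print("mean test error:", np.mean(np.abs(x_hat - np.arcsin(y_test))))
```

Under this assumed rule the training error cannot increase from one step to the next, because a zero step is always among the candidates of the least-squares fit. For a linear $h$ the first fitted factor is the exact inverse slope, which matches the one-iteration convergence described in Sec. 4.1.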