…ox.ac.uk, ian@robots.ox.ac.uk

Abstract. We derive a probabilistic framework for robust, real-time visual tracking of previously unseen objects from a moving camera. The tracking problem is handled using a bag-of-pixels representation and comprises a rigid registration […]
[…] a model, in common with other simpler density-based representations such as colour histograms, is the degree of invariance to viewpoint this confers. Like [4], we derive a probabilistic, region-based, level-set framework, which comprises an optimal rigid registration, followed by a segmentation to re-segment the object and account for non-rigid deformations. Aside from issues of speed (which are not addressed in [4]), there are a number of key differences between [4] and our work, some of which stem from the generative model we use for image data (see Sect. 2). First, our derivation gives a probabilistic interpretation to the Heaviside step function used in most region-based level-set methods [7,4]. Second, given this interpretation, we propose a pixel-wise posterior term, as opposed to a likelihood, which allows us to marginalise out model parameters at a pixel level. As we show in Sect. 2, this derives naturally from our generative model, and it is a subtle but absolutely crucial difference between our method and others, e.g. [4,2,3], as our results in Sect. 7 show. Third, in contrast to [7,4] and similar to [8,9], we assume a non-parametric distribution for image values as opposed to a single Gaussian (for an entire region). Finally, we introduce a prior on the embedding function which constrains it to be an approximate signed distance function. We show that this gives a clean probabilistic interpretation to the idea proposed by [10] and avoids the need for reinitialisation of the embedding function that is necessary in the majority of level-set based approaches.

Our work also bears some similarity to [11], who sought the rigid transformation that best aligns a fixed shape kernel with image data using the Bhattacharyya coefficient. This work extended the pioneering work of this type [12,13] to handle translation+scale+rotation as opposed to translation only or translation+scale. In contrast to [11], however, we allow the shape to change online and propose a novel framework using pixel-wise posteriors, which removes the cost of building an empirical distribution and testing it with the Bhattacharyya coefficient. This has a second, hidden benefit: it avoids the need to build a 'good' empirical distribution given limited data, and we find in practice that this gives a significant improvement over [12,13,11].

Unlike [4], Freedman and Zhang [8,9] use a non-parametric distribution for image data. They derive contour flows based on both the KL-divergence and the Bhattacharyya coefficient. Though they demonstrate that both are effective for tracking, they do not model rigid transformation parameters explicitly, they must recompute their non-parametric distributions at every iteration, and (as we show in Sect. 7) objectives based on the Bhattacharyya coefficient are inferior to the one we propose.

Finally, it is worth mentioning template-based tracking methods (see [14] for an excellent summary of past work). We include an ideal SSD cost in our results (Sect. 7), which uses the correct template at each frame. Though this unfairly advantages the SSD method (since in reality the exact template is never available), it does suggest that in future there would be benefit in considering how spatial information can be incorporated. Thus, although our method is currently based solely on non-parametric distributions, we are investigating ways to augment it with spatial information.

We represent the object being tracked by its shape $C$, its location in the image $W(x;p)$, and two underlying appearance models: one for the foreground, $P(y|M_f)$, and one for the background, $P(y|M_b)$. Figure 1 illustrates this with a simple example.

Shape: is represented by the zero level-set $C = \{x \mid \Phi(x) = 0\}$ of an embedding function $\Phi(x)$ [1,5]. The pixels $\Omega$ in the object frame are segmented into two regions: one for the foreground, $\Omega_f$, and one for the background, $\Omega_b$.

Location: is described by a warp $W(x;p)$ that takes a pixel location $x$ in the object frame and warps it into the image frame according to parameters $p$. This warp must form a group [14]; however, this is acceptable, as many common and useful transformations in computer vision do form groups, for instance: translation, translation+scale, similarity transforms, affine transforms and homographies.

Appearance models: $P(y|M_f)$ and $P(y|M_b)$ are represented with YUV histograms using 32 bins per channel. The histograms are initialised either from a detection module or from a user-input initial bounding box. The pixels inside the bounding box are used to build the foreground model and the pixels from an inflated bounding box are used to build the background model. The two initial distributions are then used to produce a tentative segmentation, which is in turn used to rebuild the model. This procedure is iterated until the shape converges (similar to [15]). Once tracking commences, the appearance models and shape $C$ are estimated (adapted) online, as described in Sect. 6.

In summary, we use the following notation:
- $x$: A pixel location in the object coordinate frame.
- $y$: A pixel value (in our experiments this is a YUV value).
- $I$: Image.
- $W(x;p)$: Warp with parameters $p$ (must form a group).
- $M = \{M_f, M_b\}$: Model parameter, either foreground or background.
- $P(y|M_f)$: Foreground model over pixel values $y$.
- $P(y|M_b)$: Background model over pixel values $y$.
- $C$: The contour that segments the foreground from the background.
- $\Phi(x)$: Shape kernel (in our case the level-set embedding function).
- $\Omega = \{\Omega_f, \Omega_b\}$: Pixels in the object frame, $[\{x_0,y_0\},\ldots,\{x_N,y_N\}]$, which are partitioned into foreground pixels $\Omega_f$ and background pixels $\Omega_b$.
- $H_\epsilon(z)$: Smoothed Heaviside step function.
- $\delta_\epsilon(z)$: Smoothed Dirac delta function.

Figure 1 illustrates the simple generative model we use to represent the image formation process. This model treats the image as a bag-of-pixels [6] and can, given the model $M$, the shape $\Phi$ and the location $p$, be used to sample pixels $\{x, y\}$. Although the resultant image would not look like the true foreground/background image to a human (the pixels would be jumbled up), the colour distributions corresponding to the foreground/background regions $\Omega_f$/$\Omega_b$ would match the models $P(y|M_f)$ and $P(y|M_b)$. It is this simplicity that gives more invariance to viewpoint and allows 3D objects to be tracked robustly without having to model their specific 3D structure. The joint distribution for a single pixel given by the model in Fig. 1 is:

[…]
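The appearance models described above (YUV histograms with 32 bins per channel, initialised from a bounding box and an inflated bounding box) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes 8-bit YUV values (so each bin spans 8 levels), stores the histogram sparsely, and assumes the background model uses the pixels of the inflated box that lie outside the inner box. The names `bin_index`, `HistogramModel` and `models_from_boxes` are hypothetical.

```python
from collections import Counter
from typing import Iterable, List, Tuple

YUV = Tuple[int, int, int]  # one pixel value y; assumed 8-bit (0..255) per channel
Box = Tuple[int, int, int, int]  # (col0, row0, col1, row1), upper bounds exclusive

BINS_PER_CHANNEL = 32                # 32 bins per channel, as stated in the text
BIN_WIDTH = 256 // BINS_PER_CHANNEL  # 8 intensity levels per bin for 8-bit data


def bin_index(y: YUV) -> Tuple[int, ...]:
    """Map a YUV value to its histogram cell (one bin index per channel)."""
    return tuple(channel // BIN_WIDTH for channel in y)


class HistogramModel:
    """Non-parametric appearance model P(y|M), stored as a sparse normalised histogram."""

    def __init__(self, pixels: Iterable[YUV]):
        self.counts = Counter(bin_index(y) for y in pixels)
        self.total = sum(self.counts.values())

    def prob(self, y: YUV) -> float:
        """P(y|M): fraction of the training pixels that fell into y's bin."""
        if self.total == 0:
            return 0.0
        return self.counts[bin_index(y)] / self.total


def models_from_boxes(image: List[List[YUV]], inner: Box, outer: Box):
    """Build (foreground, background) models from a bounding box and an inflated box.

    Assumption (hypothetical detail): the background uses the pixels of the
    inflated box that lie outside the inner box.
    """
    c0, r0, c1, r1 = inner
    outer_c0, outer_r0, outer_c1, outer_r1 = outer
    fg, bg = [], []
    for r in range(outer_r0, outer_r1):
        for c in range(outer_c0, outer_c1):
            if c0 <= c < c1 and r0 <= r < r1:
                fg.append(image[r][c])
            else:
                bg.append(image[r][c])
    return HistogramModel(fg), HistogramModel(bg)
```

For example, `fg.prob(y)` then plays the role of $P(y|M_f)$ for any pixel value. The iterative re-segmentation that refines these initial models, and the online adaptation of Sect. 6, are not shown.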
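3 Segmentation

The typical approach in region-based segmentation methods is to take a product of the pixel-wise likelihood functions, $\prod_{i=1}^{N} P(I(x_i)|M_i)$, over pixel locations $x_i$, to get the overall likelihood $P(I|M)$. This can then be expressed as a summation by taking logs and optimised using variational level-sets [1,5]. In contrast to these methods, our derivation leads to pixel-wise posteriors and marginalisation (5), a subtle but important difference.

For the remainder of this section, in order to simplify our expressions (and without loss of generality), we assume that the registration is correct and therefore $x_i = W(x_i;p)$. We now specify the term $P(x_i|\Phi,p,M)$ in (5) and the term $P(M)$ in (3):

$$P(x_i|\Phi,p,M_f) = \frac{H_\epsilon(\Phi(x_i))}{\eta_f}, \qquad P(x_i|\Phi,p,M_b) = \frac{1 - H_\epsilon(\Phi(x_i))}{\eta_b} \tag{7}$$

$$P(M_f) = \frac{\eta_f}{\eta}, \qquad P(M_b) = \frac{\eta_b}{\eta}, \tag{8}$$

where

$$\eta = \eta_f + \eta_b, \qquad \eta_f = \sum_{i=1}^{N} H_\epsilon(\Phi(x_i)), \qquad \eta_b = \sum_{i=1}^{N} \left(1 - H_\epsilon(\Phi(x_i))\right). \tag{9}$$

Equation (7) represents normalised versions of the blurred Heaviside step functions used in typical region-based level-set methods, which can now be interpreted probabilistically as model-specific spatial priors for a pixel location $x$. Equation (8) represents the model priors, which are given by the ratio of the area of the model-specific region to the total area of both models. Equation (9) contains the normalisation constants (note that $\eta = N$).

We now specify a geometric prior on $\Phi$ that rewards a signed distance function:

$$P(\Phi) = \prod_{i=1}^{N} \frac{1}{\sigma\sqrt{2\pi}} \exp\left(-\frac{\left(|\nabla\Phi(x_i)| - 1\right)^2}{2\sigma^2}\right), \tag{10}$$

where $\sigma$ specifies the relative weight of the prior. This gives a probabilistic interpretation to the work in [10]. Substituting (7), (8), (9) and (10) into (5) and taking logs gives the following expression for the log posterior:

$$\log\left(P(\Phi,p|\Omega)\right) \propto \sum_{i=1}^{N} \left\{ \log\left(P(x_i|\Phi,p,y_i)\right) - \frac{\left(|\nabla\Phi(x_i)| - 1\right)^2}{2\sigma^2} \right\} + N\log\left(\frac{1}{\sigma\sqrt{2\pi}}\right) + \log\left(P(p)\right). \tag{11}$$

[…]

Introducing a warp $W(x_i;\Delta p)$ into (14) and dropping the prior term for brevity (we revisit this term in Sect. 5):

$$\log\left(P(\Phi,p|\Omega)\right) \propto \sum_{i=1}^{N} \left\{ \log\left(P(W(x_i;\Delta p)|\Phi,p,y_i)\right) \right\}, \tag{15}$$

where $\Delta p$ represents an incremental warp of the shape kernel. There are many ways this expression could be optimised; the most similar work uses simple gradient ascent [4]. In contrast, we take advantage of the fact that all of the individual terms are probabilities, and therefore strictly positive. This allows us to write certain terms as squared square-roots and substitute in a first-order Taylor series approximation for each square-root, for example:

$$\left[\sqrt{H_\epsilon(\Phi(W(x_i;\Delta p)))}\right]^2 \approx \left[\sqrt{H_\epsilon(\Phi(x_i))} + \frac{1}{2\sqrt{H_\epsilon(\Phi(x_i))}}\,J\,\Delta p\right]^2, \tag{16}$$

where

$$J = \frac{\partial H_\epsilon}{\partial \Phi}\,\frac{\partial \Phi}{\partial x}\,\frac{\partial W}{\partial \Delta p} = \delta_\epsilon(\Phi(x_i))\,\nabla\Phi(x_i)\,\frac{\partial W}{\partial \Delta p}.$$

Likewise, we apply a similar expansion to $\left(1 - H_\epsilon(\Phi(W(x_i;\Delta p)))\right)$, which then allows us to optimise using Gauss-Newton². This has the advantage that the Hessian itself is not required; rather, a first-order approximation of the Hessian is used. In consequence it is fast and, in our experience, exhibits rapid and reliable convergence in our problem domain. It also avoids the issues highlighted in [17] of choosing the appropriate step size for gradient ascent. Excluding the full details for brevity, we arrive at an expression for $\Delta p$:

$$\Delta p = \left[\sum_{i=1}^{N} \frac{1}{2P(x_i|\Phi,p,y_i)} \left(\frac{P_f}{H_\epsilon(\Phi(x_i))} + \frac{P_b}{1 - H_\epsilon(\Phi(x_i))}\right) J^{T} J\right]^{-1} \sum_{i=1}^{N} \frac{(P_f - P_b)\,J^{T}}{P(x_i|\Phi,p,y_i)}. \tag{17}$$

Equation (17) is then used to update the parameters $p$ by composing $W(x_i;p)$ with $W(x_i;\Delta p)^{-1}$, analogous to inverse compositional tracking [14].

5 Drift Correction

Having the object represented by its location $p$ and shape $\Phi$ leaves an ambiguity where it is possible to explain rigid transformations of the shape either with $p$ […]

² The Taylor expansion is poorly conditioned if $H_\epsilon(\Phi(x_i)) = 0$; in practice this does not happen, as the terms are never equal to zero.

Fig. 2. Qualitative evaluation: (top) a speedboat undergoing a 180° out-of-plane rotation, illustrating shape adaptation; (middle) a person jumping around with significant motion blur; (bottom) a hand being tracked in front of a challenging background.

[…] each perturbation, evaluate a set of cost functions that are commonly used in other tracking methods, such as: level-set methods based on likelihoods [4,2], mean-shift [12,13,11], inverse compositional [14] and distribution-based tracking [8,9]. We consider perturbations for each dimension separately to allow us to plot one-dimensional graphs, i.e. translation in x, translation in y, scale and rotation. By examining these cost functions we can find all extrema and examine how they are distributed across the space. An ideal cost function would be convex with a single extremum at the true location. The particular cost functions we consider are:
- LogPWP: Pixel-wise posteriors fused using a logarithmic opinion pool.
- LinPWP: Pixel-wise posteriors fused using a linear opinion pool.
- LogLike: Log likelihood, used in most level-set work [5,4,2].
- BhattF: Bhattacharyya coefficient, $B(\Omega_f) = \sum_{j=1}^{V} \sqrt{P(y_j|M_f)\,P(y_j|\Omega_f)}$, used by [12,13,11].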
- BhattFB: Bhattacharyya coefficient with a background model, $B(\Omega_f,\Omega_b) = \sum_{j=1}^{V} \sqrt{P(y_j|M_f)\,P(y_j|\Omega_f)} + \sum_{j=1}^{V} \sqrt{P(y_j|M_b)\,P(y_j|\Omega_b)}$.
- BhattFBM: Bhattacharyya coefficient with a background mismatch, $B(\Omega_f,\Omega_b) = \sum_{j=1}^{V} \sqrt{P(y_j|M_f)\,P(y_j|\Omega_f)} - \sum_{j=1}^{V} \sqrt{P(y_j|M_f)\,P(y_j|\Omega_b)}$, suggested by [9].
- IdealSSD: Sum of squared pixel differences using the ideal template, i.e. the template extracted at the current location $p$. This is essentially what you would get if you had the perfect generative model, giving the true pixel value at each pixel location including the noise. This of course is never going […]

Fig. 4. Quantitative analysis: log probability distribution of extrema in the cost functions generated from 20,000 frames of real video data.

[…]
- Rotation: All Bhattacharyya methods and the log likelihood are poor at correctly localising the rotation. The straight Bhattacharyya coefficient, for example, has more than a 1% chance of exhibiting extrema anywhere in the rotation space; at a 30 Hz frame rate this corresponds to approximately 1 frame in every 3 seconds of video. It is worth noting that the side lobes (at approximately 25°) exhibited by our methods and ideal SSD are due to the self-similarity corresponding to fingers in the hand sequences.

Experimentally, we were unable to make the log likelihood successfully track several of our sequences, which is confirmed by its poor performance in Fig. 4. One possible explanation is that in other work [4,5,2] a single Gaussian parametric model is used. This implicitly enforces a smooth, unimodal distribution for the joint likelihood. Non-parametric representations do not exhibit these properties; however, they are better at describing complicated distributions and are therefore desirable. Our method can deal with these distributions because of the normalising denominator in (3) and the marginalisation step in (4). These two steps prevent individual pixels from dominating the cost function, hence making it smoother and well-behaved.

The work of [8] and its subsequent improvement [9] use distribution matching techniques to incorporate non-parametric distributions into a level-set framework. These methods, similar to the Bhattacharyya-based methods, involve computing the empirical densities at every iteration of the optimisation, whereas our method avoids this extra cost. Not only is our method superior to these approaches in terms of cost functions (see Fig. 4), but it is also computationally cheaper to evaluate, as it does not require empirical distributions. This is a significant benefit because it not only reduces the cost per iteration but also avoids the issue of having to build 'good' distributions. One explanation for the difference between the performance of these methods and ours is that it is hard to build 'good' empirical distributions in real time, and most methods rely on simple histograms. Although this could be improved with Parzen or NP windowing techniques [19], it would almost certainly sacrifice real-time performance.

7.1 Timing

All terms in (17) include $\delta_\epsilon(\Phi(x_i))$ (the blurred Dirac delta function). This means that an individual pixel's contribution to the optimisation diminishes the further it is from the contour. An efficient implementation therefore recognises this. Our implementation ignores pixels outside a narrow band and, for an object size of 180×180, runs in 500 µs on a P4 3.6 GHz machine. On average the system runs at a frame rate of 85 Hz for the complete algorithm, and if shape and appearance learning are turned off (i.e. rigid registration only) it averages 230 Hz.

8 Conclusions

We have proposed a novel probabilistic framework for robust, real-time visual tracking of previously unseen objects from a moving camera. The key contribution of our method, and the reason for its superior performance compared with others, is the use of pixel-wise posteriors as opposed to a product over pixel-wise likelihoods. In contrast to other methods [4,5], we solve the registration using Gauss-Newton, which has significant practical benefits, namely: (i) the difficulty associated with step-size selection is removed and (ii) convergence is reliable and fast. We have demonstrated the benefits of our method both qualitatively and quantitatively, with a thorough analysis of pixel-wise posteriors versus competing alternatives using over 20,000 video frames. Our results demonstrate that using pixel-wise posteriors provides excellent performance when incorporating non-parametric distributions into region-based level-sets. It not only offers superior cost functions but also avoids the need for computing empirical distributions [12,8,9,11], and is therefore faster. In our ongoing research we are investigating in more detail the probabilistic interpretation of the blurred Heaviside step functions, and in particular the link with Gaussian uncertainty on contour location. Other work of future interest involves modifying the generative model to capture spatial information, and investigating how to incorporate multiple objects and online occlusion handling.
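The rigid registration step built on pixel-wise posteriors and Gauss-Newton can be made concrete with a minimal sketch of one increment $\Delta p$ from (17), restricted to a pure-translation warp $W(x;p) = x + p$ so that $\partial W/\partial \Delta p$ is the identity. Several details are assumptions, not taken from the text: the arctan form of $H_\epsilon$ and $\delta_\epsilon$ (chosen because it never reaches 0 or 1, consistent with footnote 2), foreground where $\Phi > 0$ so that (7) holds, central-difference gradients of $\Phi$, the narrow-band width, and the per-pixel posteriors $P_f = P(y|M_f)/(\eta_f P(y|M_f) + \eta_b P(y|M_b))$ (and $P_b$ likewise), which is our reading of substituting (7) and (8) into (3) and marginalising over $M$. All function names are hypothetical.

```python
import math
from typing import List, Tuple

Grid = List[List[float]]


def heaviside(z: float, eps: float) -> float:
    """Smoothed Heaviside H_eps(z); assumed arctan form, strictly inside (0, 1)."""
    return 0.5 + math.atan(z / eps) / math.pi


def dirac(z: float, eps: float) -> float:
    """Smoothed Dirac delta delta_eps(z): the derivative of heaviside() above."""
    return eps / (math.pi * (eps * eps + z * z))


def gauss_newton_translation_step(phi: Grid, lik_f: Grid, lik_b: Grid,
                                  eps: float = 1.0, band: float = 3.0) -> Tuple[float, float]:
    """One increment (dx, dy) following (17) for a pure-translation warp.

    phi[r][c]   : embedding function Phi at object-frame pixel (row r, column c),
                  positive in the foreground so that H_eps(Phi) -> 1 there, as in (7).
    lik_f/lik_b : P(y_i|M_f) and P(y_i|M_b) for the image value sampled at W(x_i; p).
    """
    rows, cols = len(phi), len(phi[0])
    # Normalisation constants (9) over every object-frame pixel; eta_b = N - eta_f.
    eta_f = sum(heaviside(v, eps) for row in phi for v in row)
    eta_b = rows * cols - eta_f

    a_xx = a_xy = a_yy = 0.0  # approximate Hessian (bracketed sum in (17))
    g_x = g_y = 0.0           # right-hand sum in (17)
    for r in range(1, rows - 1):
        for c in range(1, cols - 1):
            z = phi[r][c]
            if abs(z) > band:  # narrow band: far from the contour delta_eps is small
                continue
            h = heaviside(z, eps)
            # Model posteriors P_f, P_b (assumed Bayes' rule with the priors (8)).
            denom = eta_f * lik_f[r][c] + eta_b * lik_b[r][c]
            p_f = lik_f[r][c] / denom
            p_b = lik_b[r][c] / denom
            post = h * p_f + (1.0 - h) * p_b  # pixel-wise posterior P(x_i|Phi,p,y_i)
            # J = delta_eps(Phi) * grad(Phi) * dW/dp, where dW/dp is the identity.
            d = dirac(z, eps)
            j_x = d * (phi[r][c + 1] - phi[r][c - 1]) / 2.0
            j_y = d * (phi[r + 1][c] - phi[r - 1][c]) / 2.0
            w = (p_f / h + p_b / (1.0 - h)) / (2.0 * post)
            a_xx += w * j_x * j_x
            a_xy += w * j_x * j_y
            a_yy += w * j_y * j_y
            s = (p_f - p_b) / post
            g_x += s * j_x
            g_y += s * j_y

    det = a_xx * a_yy - a_xy * a_xy
    if det < 1e-12:
        raise ValueError("approximate Hessian is singular; is the contour inside the band?")
    # Solve the 2x2 system A * dp = g in closed form.
    return ((a_yy * g_x - a_xy * g_y) / det, (a_xx * g_y - a_xy * g_x) / det)


def compose_inverse(p: Tuple[float, float], dp: Tuple[float, float]) -> Tuple[float, float]:
    """W(x; p) composed with W(x; dp)^-1 for translations, i.e. x - dp + p."""
    return (p[0] - dp[0], p[1] - dp[1])
```

In use, the image would be resampled at the updated $W(x;p)$ and the step repeated until $\Delta p$ is small. Skipping pixels outside the band mirrors the observation in Sect. 7.1 that $\delta_\epsilon$ makes distant pixels contribute little; richer warps (similarity, affine) would only change $\partial W/\partial \Delta p$ and the size of the linear system.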