
Robust Real-Time Visual Tracking using Pixel-Wise Posteriors
Charles Bibby and Ian Reid
Active Vision Lab, Department of Engineering Science, University of Oxford
{cbibby,ian}@robots.ox.ac.uk

Abstract. We derive a probabilistic framework for robust, real-time, visual tracking of previously unseen objects from a moving camera. The tracking problem is handled using a bag-of-pixels representation and comprises a rigid registra [...]


[...] a model, in common with other simpler density-based representations such as colour-histograms, is the degree of invariance to viewpoint this confers.

Like [4], we derive a probabilistic, region based, level-set framework, which comprises an optimal rigid registration, followed by a segmentation to re-segment the object and account for non-rigid deformations. Aside from issues of speed (which are not addressed in [4]) there are a number of key differences between [4] and our work, some of which stem from the generative model we use for image data (see Sect. 2). First, our derivation gives a probabilistic interpretation to the Heaviside step function used in most region based level-set methods [7,4]. Second, given this interpretation we propose a pixel-wise posterior term, as opposed to a likelihood, which allows us to marginalise out model parameters at a pixel level. As we show in Sect. 2, this derives naturally from our generative model, and is a subtle but absolutely crucial difference between our method and others, e.g. [4,2,3], as our results show in Sect. 7. Third, in contrast to [7,4] and similar to [8,9] we assume a non-parametric distribution for image values as opposed to a single Gaussian (for an entire region). Finally, we introduce a prior on the embedding function which constrains it to be an approximate signed distance function. We show that this gives a clean probabilistic interpretation to the idea proposed by [10] and avoids the need for reinitialisation of the embedding function that is necessary in the majority of level-set based approaches.

Our work also bears some similarity to [11], who sought the rigid transformation that best aligns a fixed shape-kernel with image data using the Bhattacharyya coefficient. This work extended the pioneering work of this type [12,13] to handle translation+scale+rotation as opposed to translation only or translation+scale. In contrast to [11], however, we allow the shape to change online and propose a novel framework using pixel-wise posteriors, which removes the cost of building an empirical distribution and testing it with the Bhattacharyya coefficient. This has a second hidden benefit, as it avoids the need to build a 'good' empirical distribution given limited data; we find in practice that this gives a significant improvement over [12,13,11].

Unlike [4], Freedman and Zhang [8,9] use a non-parametric distribution for image data. They derive contour flows based on both KL-divergence and the Bhattacharyya coefficient. Though they demonstrate that both are effective for tracking, they do not model rigid transformation parameters explicitly, they must recompute their non-parametric distributions at every iteration, and, as we show in Sect. 7, objectives based on the Bhattacharyya coefficient are inferior to the one we propose.

Finally, it is worth mentioning template based tracking methods (see [14] for an excellent summary of past work). We include an ideal SSD cost in our results (Sect. 7), which uses the correct template at each frame. Though this unfairly advantages the SSD method, since in reality the exact template is never available, it does suggest that in future there would be benefit in considering how spatial information can be incorporated. Thus, although our method is currently based solely on non-parametric distributions, we are investigating ways to augment it with spatial information.

[...]

We represent the object being tracked by: its shape $C$, its location in the image $W(x, p)$ and two underlying appearance models: one for the foreground $P(y \mid M_f)$ and one for the background $P(y \mid M_b)$. Figure 1 illustrates this with a simple example.

Shape: is represented by the zero level-set $C = \{x \mid \Phi(x) = 0\}$ of an embedding function $\Phi(x)$ [1,5]. The pixels $\Omega$ in the object frame are segmented into two regions: one for the foreground $\Omega_f$ and one for the background $\Omega_b$.

Location: is described by a warp $W(x, p)$ that takes a pixel location $x$ in the object frame and warps it into the image frame according to parameters $p$. This warp must form a group [14]; however, this is acceptable as many common useful transformations in computer vision do form groups, for instance: translation, translation+scale, similarity transforms, affine transforms and homographies.

Appearance models: $P(y \mid M_f)$ and $P(y \mid M_b)$ are represented with YUV histograms using 32 bins per channel. The histograms are initialised either from a detection module or a user inputted initial bounding box. The pixels inside the bounding box are used to build the foreground model and the pixels from an inflated bounding box are used to build the background model. The two initial distributions are then used to produce a tentative segmentation, which is in turn used to rebuild the model. This procedure is iterated until the shape converges (similar to [15]). Once tracking commences the appearance models and shape $C$ are estimated (adapted) online, as described in Sect. 6.

In summary, we use the following notation:

- $x$: A pixel location in the object coordinate frame.
- $y$: A pixel value (in our experiments this is a YUV value).
- $I$: Image.
- $W(x, p)$: Warp with parameters $p$ (must form a group).
- $M = \{M_f, M_b\}$: Model parameter, either foreground or background.
- $P(y \mid M_f)$: Foreground model over pixel values $y$.
- $P(y \mid M_b)$: Background model over pixel values $y$.
- $C$: The contour that segments the foreground from background.
- $\Phi(x)$: Shape kernel (in our case the level-set embedding function).
- $\Omega = \{\Omega_f, \Omega_b\}$: Pixels in the object frame $[\{x_0, y_0\}, \ldots, \{x_N, y_N\}]$, which is partitioned into foreground pixels $\Omega_f$ and background pixels $\Omega_b$.
- $H_\epsilon(z)$: Smoothed Heaviside step function.
- $\delta_\epsilon(z)$: Smoothed Dirac delta function.

Figure 1 illustrates the simple generative model we use to represent the image formation process. This model treats the image as a bag-of-pixels [6] and can, given the model $M$, the shape $\Phi$ and the location $p$, be used to sample pixels $\{x, y\}$. Although the resultant image would not look like the true foreground/background image to a human (the pixels would be jumbled up), the colour distributions corresponding to the foreground/background regions $\Omega_f$/$\Omega_b$ would match the models $P(y \mid M_f)$ and $P(y \mid M_b)$. It is this simplicity that gives more invariance to viewpoint and allows 3D objects to be tracked robustly without having to model their specific 3D structure. The joint distribution for a single pixel given by the model in Fig. 1 is: [...]

3 Segmentation

The typical approach to region based segmentation methods is to take a product of the pixel-wise likelihood functions $\prod_{i=1}^{N} P(I(x_i) \mid M_i)$, over pixel locations $x_i$, to get the overall likelihood $P(I \mid M)$. This can then be expressed as a summation by taking logs and optimised using variational level-sets [1,5]. In contrast to these methods, our derivation leads to pixel-wise posteriors and marginalisation (5), a subtle but important difference.

For the remainder of this section, in order to simplify our expressions (and without loss of generality), we assume that the registration is correct and therefore $x_i = W(x_i, p)$. We now specify the term $P(x_i \mid \Phi, p, M)$ in (5) and the term $P(M)$ in (3):

$$P(x_i \mid \Phi, p, M_f) = \frac{H_\epsilon(\Phi(x_i))}{\eta_f}, \qquad P(x_i \mid \Phi, p, M_b) = \frac{1 - H_\epsilon(\Phi(x_i))}{\eta_b} \tag{7}$$

$$P(M_f) = \frac{\eta_f}{\eta}, \qquad P(M_b) = \frac{\eta_b}{\eta} \tag{8}$$

where

$$\eta = \eta_f + \eta_b, \qquad \eta_f = \sum_{i=1}^{N} H_\epsilon(\Phi(x_i)), \qquad \eta_b = \sum_{i=1}^{N} \bigl(1 - H_\epsilon(\Phi(x_i))\bigr). \tag{9}$$

Equation (7) represents normalised versions of the blurred Heaviside step functions used in typical region based level-set methods and can now be interpreted probabilistically as model specific spatial priors for a pixel location $x$. Equation (8) represents the model priors, which are given by the ratio of the area of the model specific region to the total area of both models. Equation (9) contains the normalisation constants (note that $\eta = N$).

We now specify a geometric prior on $\Phi$ that rewards a signed distance function:

$$P(\Phi) = \prod_{i=1}^{N} \frac{1}{\sigma\sqrt{2\pi}} \exp\left(-\frac{(|\nabla\Phi(x_i)| - 1)^2}{2\sigma^2}\right), \tag{10}$$

where $\sigma$ specifies the relative weight of the prior. This gives a probabilistic interpretation to the work in [10]. Substituting (7), (8), (9) and (10) into (5) and taking logs gives the following expression for the log posterior:

$$\log P(\Phi, p \mid \Omega) \propto \sum_{i=1}^{N} \left\{ \log P(x_i \mid \Phi, p, y_i) - \frac{(|\nabla\Phi(x_i)| - 1)^2}{2\sigma^2} \right\} + N \log\frac{1}{\sigma\sqrt{2\pi}} + \log P(p). \tag{11}$$

[...]

Introducing a warp $W(x_i, \Delta p)$ into (14) and dropping the prior term for brevity (we revisit this term in Sect. 5):

$$\log P(\Phi, p \mid \Omega) \propto \sum_{i=1}^{N} \left\{ \log P(W(x_i, \Delta p) \mid \Phi, p, y_i) \right\}, \tag{15}$$

where $\Delta p$ represents an incremental warp of the shape kernel. There are many ways this expression could be optimised; the most similar work uses simple gradient ascent [4]. In contrast, we take advantage of the fact that all of the individual terms are probabilities, and therefore strictly positive. This allows us to write certain terms as squared square-roots and substitute in a first-order Taylor series approximation for each square-root, for example:

$$\left[\sqrt{H_\epsilon(\Phi(W(x_i, \Delta p)))}\right]^2 \approx \left[\sqrt{H_\epsilon(\Phi(x_i))} + \frac{1}{2\sqrt{H_\epsilon(\Phi(x_i))}} J \Delta p\right]^2, \tag{16}$$

where:

$$J = \frac{\partial H_\epsilon}{\partial \Phi} \frac{\partial \Phi}{\partial x} \frac{\partial W}{\partial \Delta p} = \delta_\epsilon(\Phi(x_i)) \nabla\Phi(x_i) \frac{\partial W}{\partial \Delta p}.$$

Likewise we apply a similar expansion to $(1 - H_\epsilon(\Phi(W(x_i, \Delta p))))$, allowing us then to optimise using Gauss-Newton (see footnote 2). This has the advantage that the Hessian itself is not required; rather, a first-order approximation of the Hessian is used. In consequence it is fast, and in our experience exhibits rapid and reliable convergence in our problem domain. It also avoids the issues highlighted in [17] of choosing the appropriate step size for gradient ascent. Excluding the full details for brevity, we arrive at an expression for $\Delta p$:

$$\Delta p = \left[\sum_{i=1}^{N} \frac{1}{2 P(x_i \mid \Phi, p, y_i)} \left(\frac{P_f}{H_\epsilon(\Phi(x_i))} + \frac{P_b}{1 - H_\epsilon(\Phi(x_i))}\right) J^T J\right]^{-1} \times \sum_{i=1}^{N} \frac{(P_f - P_b) J^T}{P(x_i \mid \Phi, p, y_i)}. \tag{17}$$

Equation (17) is then used to update the parameters $p$ by composing $W(x_i, p)$ with $W(x_i, \Delta p)^{-1}$, analogous to inverse compositional tracking [14].

Footnote 2: The Taylor expansion is poorly conditioned if $H_\epsilon(\Phi(x_i)) = 0$; in practice this does not happen as the terms are never equal to zero.

5 Drift Correction

Having the object represented by its location $p$ and shape $\Phi$ leaves an ambiguity where it is possible to explain rigid transformations of the shape either with $p$ [...]

Fig. 2. Qualitative evaluation: (top) a speedboat undergoing a 180° out-of-plane rotation illustrating shape adaptation; (middle) a person jumping around with significant motion blur; (bottom) a hand being tracked in front of a challenging background.

[...] each perturbation, evaluate a set of cost functions that are commonly used in other tracking methods, such as: level-set methods based on likelihoods [4,2], mean-shift [12,13,11], inverse compositional [14] and distribution based tracking [8,9]. We consider perturbations for each dimension separately to allow us to plot one-dimensional graphs, i.e. translation in x, translation in y, scale and rotation. By examining these cost functions we can find all extrema and examine how they are distributed across the space. An ideal cost function would be convex with a single extremum at the true location. The particular cost functions we consider are:

- LogPWP: Pixel-wise posteriors fused using a logarithmic opinion pool.
- LinPWP: Pixel-wise posteriors fused using a linear opinion pool.
- LogLike: Log likelihood, used in most level-set work [5,4,2].
- BhattF: Bhattacharyya coefficient: $B(\Omega_f) = \sum_{j=1}^{V} \sqrt{P(y_j \mid M_f) P(y_j \mid \Omega_f)}$, used by [12,13,11].
- BhattFB: Bhattacharyya coefficient with a background model: $B(\Omega_f, \Omega_b) = \sum_{j=1}^{V} \sqrt{P(y_j \mid M_f) P(y_j \mid \Omega_f)} + \sum_{j=1}^{V} \sqrt{P(y_j \mid M_b) P(y_j \mid \Omega_b)}$.
- BhattFBM: Bhattacharyya coefficient with a background mismatch: $B(\Omega_f, \Omega_b) = \sum_{j=1}^{V} \sqrt{P(y_j \mid M_f) P(y_j \mid \Omega_f)} - \sum_{j=1}^{V} \sqrt{P(y_j \mid M_f) P(y_j \mid \Omega_b)}$, suggested by [9].
- Ideal SSD: Sum of squared pixel differences using the ideal template, i.e. the template extracted at the current location $p$. This is essentially what you would get if you had the perfect generative model giving the true pixel value at each pixel location including the noise. This of course is never going [...]

Fig. 4. Quantitative Analysis: Log probability distribution of extrema in the cost functions generated from 20,000 frames of real video data.

[...]

- Rotation: All Bhattacharyya methods and the log likelihood are poor at correctly localising the rotation. The straight Bhattacharyya coefficient, for example, has more than a 1% chance of exhibiting extrema anywhere in the rotation space; at a 30Hz frame rate this corresponds to approximately 1 frame in every 3 seconds of video. It is worth noting that the side lobes (at approximately 25°) exhibited by our methods and ideal SSD are due to the self similarity corresponding to fingers in the hand sequences.

Experimentally we were unable to make the log likelihood successfully track several of our sequences, which is confirmed by its poor performance in Fig. 4. One possible explanation is that in other work [4,5,2], a single Gaussian parametric model is used. This implicitly enforces a smooth, unimodal distribution for the joint likelihood. Non-parametric representations do not exhibit these properties; however, they are better at describing complicated distributions and therefore desirable. The reason that our method can deal with these distributions is the normalising denominator in (3) and the marginalisation step in (4). These two steps prevent individual pixels from dominating the cost function, hence making it smoother and well-behaved.

The work of [8] and its subsequent improvement [9] use distribution matching techniques to incorporate non-parametric distributions into a level-set framework. These methods, similar to the Bhattacharyya based methods, involve computing the empirical densities at every iteration of the optimisation, whereas our method avoids this extra cost. Not only is our method superior to these approaches in terms of cost functions (see Fig. 4), but it is computationally cheaper to evaluate as it does not require empirical distributions. This is a significant benefit because it not only reduces the cost per iteration, but avoids the issue of having to build 'good' distributions. One explanation for the difference between the performance of these methods and ours is that it is hard to build 'good' empirical distributions in real-time and most methods rely on simple histograms. Although this could be improved with Parzen or NP windowing techniques [19], it would almost certainly sacrifice real-time performance.

7.1 Timing

All terms in (17) include $\delta_\epsilon(\Phi(x_i))$ (blurred Dirac delta function). This means that an individual pixel's contribution to the optimisation diminishes the further from the contour it is. An efficient implementation, therefore, recognises this. Our implementation ignores pixels outside a narrow band and for an object size of 180×180 runs in 500µs on a P4 3.6GHz machine. On average the system runs at a frame rate of 85Hz for the complete algorithm and, if shape and appearance learning are turned off (i.e. rigid registration only), it averages 230Hz.

8 Conclusions

We have proposed a novel probabilistic framework for robust, real-time, visual tracking of previously unseen objects from a moving camera. The key contribution of our method and reason for its superior performance compared with others is the use of pixel-wise posteriors as opposed to a product over pixel-wise likelihoods. In contrast to other methods [4,5], we solve the registration using Gauss-Newton, which has significant practical benefits, namely: (i) the difficulty associated with step size selection is removed and (ii) reliable and fast convergence. We have demonstrated the benefits of our method both qualitatively and quantitatively with a thorough analysis of pixel-wise posteriors versus competing alternatives using over 20,000 video frames. Our results demonstrate that using pixel-wise posteriors provides excellent performance when incorporating non-parametric distributions into region based level-sets. It not only offers superior cost functions but avoids the need for computing empirical distributions [12,8,9,11] and is therefore faster.

In our ongoing research we are investigating in more detail the probabilistic interpretation of the blurred Heaviside step functions, and in particular the link with Gaussian uncertainty on contour location. Other work of future interest involves modifying the generative model to capture spatial information, and investigating how to incorporate multiple objects and online occlusion handling.
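
To make the pixel-wise posterior of Sect. 3 concrete, the following is a minimal sketch in plain Python, not the authors' implementation. It combines the spatial priors (7), the model priors (8) and the normalisers (9) through Bayes' rule over the two models and a marginalisation over the model; that combination is our reading of the normalising denominator and marginalisation step that the text attributes to (3) and (4), which are not reproduced in this transcript. The function names, the arctan form of the smoothed Heaviside function and the convention that the embedding function is positive inside the object are all assumptions made for illustration.

```python
import math


def smoothed_heaviside(phi, eps=1.0):
    """Smooth step with values strictly inside (0, 1).

    The arctan form is an assumption; the text only says H is smoothed.
    """
    return 0.5 + math.atan(phi / eps) / math.pi


def pixelwise_posteriors(phis, lik_f, lik_b, eps=1.0):
    """Return P(x_i | Phi, p, y_i) for every pixel, plus (eta_f, eta_b).

    phis  : embedding-function values Phi(x_i), assumed positive inside the object
    lik_f : foreground likelihoods P(y_i | M_f), e.g. histogram look-ups
    lik_b : background likelihoods P(y_i | M_b)
    """
    h = [smoothed_heaviside(phi, eps) for phi in phis]
    eta_f = sum(h)                       # eq. (9)
    eta_b = sum(1.0 - hi for hi in h)    # eq. (9)
    eta = eta_f + eta_b                  # equals the number of pixels N

    posteriors = []
    for hi, pf, pb in zip(h, lik_f, lik_b):
        # Bayes' rule over the two models, using the area priors of eq. (8).
        evidence = pf * eta_f / eta + pb * eta_b / eta
        if evidence == 0.0:
            raise ValueError("pixel value has zero probability under both models")
        post_f = pf * (eta_f / eta) / evidence
        post_b = pb * (eta_b / eta) / evidence
        # Marginalise the model out with the spatial priors of eq. (7).
        posteriors.append(hi / eta_f * post_f + (1.0 - hi) / eta_b * post_b)
    return posteriors, eta_f, eta_b
```

Summing the logarithms of the returned values gives the data term of (11), which corresponds to the logarithmic opinion pool (LogPWP) cost; summing the values themselves would correspond to the linear pool (LinPWP). Because each pixel's term is normalised by its own evidence, a single pixel with a very small likelihood cannot dominate the sum, which is the behaviour the discussion of Fig. 4 describes.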
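
For comparison, the Bhattacharyya-based costs listed in Sect. 7 (BhattF and BhattFBM) can be sketched as below. This is an illustrative sketch only: the helper names are hypothetical, and pixel values are assumed to have been quantised already into a single bin index per pixel (the paper's histograms use 32 bins per YUV channel, and how those are flattened is not stated here). It shows the step that the pixel-wise posterior approach avoids, namely rebuilding an empirical histogram from the current region every time the cost is evaluated.

```python
import math
from collections import Counter


def empirical_histogram(bin_indices, num_bins):
    """Normalised histogram P(y_j | Omega) from quantised pixel values."""
    if not bin_indices:
        raise ValueError("need at least one pixel to build a histogram")
    counts = Counter(bin_indices)
    total = len(bin_indices)
    return [counts[j] / total for j in range(num_bins)]


def bhattacharyya(model, empirical):
    """Bhattacharyya coefficient between two normalised histograms."""
    return sum(math.sqrt(m * e) for m, e in zip(model, empirical))


def bhatt_fbm(model_f, empirical_f, empirical_b):
    """Foreground match minus the mismatch with the background region."""
    return bhattacharyya(model_f, empirical_f) - bhattacharyya(model_f, empirical_b)
```

With few pixels in a region, many bins of the empirical histogram are empty or noisy, which is one way to see why building a 'good' empirical distribution from limited data is difficult.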