Robust Real-Time Visual Tracking using Pixel-Wise Posteriors

Charles Bibby and Ian Reid


Abstract. We derive a probabilistic framework for robust, real-time visual tracking of previously unseen objects from a moving camera. The tracking problem is handled using a bag-of-pixels representation and comprises a rigid registra…


[…] a model (in common with other simpler density-based representations such as colour histograms) is the degree of invariance to viewpoint this confers. Like [4], we derive a probabilistic, region-based, level-set framework, which comprises an optimal rigid registration, followed by a segmentation to re-segment the object and account for non-rigid deformations. Aside from issues of speed (which are not addressed in [4]) there are a number of key differences between [4] and our work, some of which stem from the generative model we use for image data (see Sect. 2). First, our derivation gives a probabilistic interpretation to the Heaviside step function used in most region-based level-set methods [7,4]. Second, given this interpretation we propose a pixel-wise posterior term, as opposed to a likelihood, which allows us to marginalise out model parameters at a pixel level. As we show in Sect. 2, this derives naturally from our generative model, and is a subtle but absolutely crucial difference between our method and others, e.g. [4,2,3], as our results show in Sect. 7. Third, in contrast to [7,4] and similar to [8,9] we assume a non-parametric distribution for image values as opposed to a single Gaussian (for an entire region). Finally, we introduce a prior on the embedding function which constrains it to be an approximate signed distance function. We show that this gives a clean probabilistic interpretation to the idea proposed by [10] and avoids the need for reinitialisation of the embedding function that is necessary in the majority of level-set based approaches.

Our work also bears some similarity to [11], who sought the rigid transformation that best aligns a fixed shape-kernel with image data using the Bhattacharyya coefficient. This work extended the pioneering work of this type [12,13] to handle translation+scale+rotation as opposed to translation only or translation+scale. In contrast to [11], however, we allow the shape to change online and propose a novel framework using pixel-wise posteriors, which removes the cost of building an empirical distribution and testing it with the Bhattacharyya coefficient. This has a second hidden benefit as it avoids the need to build a 'good' empirical distribution given limited data; we find in practice this gives a significant improvement over [12,13,11].

Unlike [4], Freedman and Zhang [8,9] use a non-parametric distribution for image data. They derive contour flows based on both KL-divergence and the Bhattacharyya coefficient. Though they demonstrate that both are effective for tracking, they do not model rigid transformation parameters explicitly, they must recompute their non-parametric distributions at every iteration, and (as we show in Sect. 7) objectives based on the Bhattacharyya coefficient are inferior to the one we propose.

Finally, it is worth mentioning template-based tracking methods (see [14] for an excellent summary of past work). We include an ideal SSD cost in our results (Sect. 7), which uses the correct template at each frame. Though this unfairly advantages the SSD method (since in reality the exact template is never available) it does suggest that in future there would be benefit in considering how spatial information can be incorporated. Thus although our method is currently based solely on non-parametric distributions we are currently investigating ways to augment it with spatial information.

We represent the object being tracked by: its shape $C$, its location in the image $W(x, p)$ and two underlying appearance models: one for the foreground $P(y \mid M_f)$ and one for the background $P(y \mid M_b)$. Figure 1 illustrates this with a simple example.

Shape: is represented by the zero level-set $C = \{x \mid \Phi(x) = 0\}$ of an embedding function $\Phi(x)$ [1,5]. The pixels $\Omega$ in the object frame are segmented into two regions: one for the foreground $\Omega_f$ and one for the background $\Omega_b$.

Location: is described by a warp $W(x, p)$ that takes a pixel location $x$ in the object frame and warps it into the image frame according to parameters $p$. This warp must form a group [14]; however, this is acceptable as many common useful transformations in computer vision do form groups, for instance: translation, translation+scale, similarity transforms, affine transforms and homographies.

Appearance models: $P(y \mid M_f)$ and $P(y \mid M_b)$ are represented with YUV histograms using 32 bins per channel. The histograms are initialised either from a detection module or a user-inputted initial bounding box. The pixels inside the bounding box are used to build the foreground model and the pixels from an inflated bounding box are used to build the background model. The two initial distributions are then used to produce a tentative segmentation, which is in turn used to rebuild the model. This procedure is iterated until the shape converges (similar to [15]). Once tracking commences the appearance models and shape $C$ are estimated (adapted) online, as described in Sect. 6.

In summary, we use the following notation:

- $x$: A pixel location in the object coordinate frame.
- $y$: A pixel value (in our experiments this is a YUV value).
- $I$: Image.
- $W(x, p)$: Warp with parameters $p$ (must form a group).
- $M = \{M_f, M_b\}$: Model parameter, either foreground or background.
- $P(y \mid M_f)$: Foreground model over pixel values $y$.
- $P(y \mid M_b)$: Background model over pixel values $y$.
- $C$: The contour that segments the foreground from background.
- $\Phi(x)$: Shape kernel (in our case the level-set embedding function).
- $\Omega = \{\Omega_f, \Omega_b\}$: Pixels in the object frame $[\{x_0, y_0\}, \ldots, \{x_N, y_N\}]$, which is partitioned into foreground pixels $\Omega_f$ and background pixels $\Omega_b$.
- $H_\epsilon(z)$: Smoothed Heaviside step function.
- $\delta_\epsilon(z)$: Smoothed Dirac delta function.

Figure 1 illustrates the simple generative model we use to represent the image formation process. This model treats the image as a bag-of-pixels [6] and can, given the model $M$, the shape $\Phi$ and the location $p$, be used to sample pixels $\{x, y\}$. Although the resultant image would not look like the true foreground/background image to a human (the pixels would be jumbled up), the colour distributions corresponding to the foreground/background regions $\Omega_f$/$\Omega_b$ would match the models $P(y \mid M_f)$ and $P(y \mid M_b)$. It is this simplicity that gives more invariance to viewpoint and allows 3D objects to be tracked robustly without having to model their specific 3D structure. The joint distribution for a single pixel given by the model in Fig. 1 is: […]

3 Segmentation

The typical approach to region-based segmentation methods is to take a product of the pixel-wise likelihood functions $\prod_{i=1}^{N} P(I(x_i) \mid M_i)$, over pixel locations $x_i$, to get the overall likelihood $P(I \mid M)$. This can then be expressed as a summation by taking logs and optimised using variational level-sets [1,5]. In contrast to these methods, our derivation leads to pixel-wise posteriors and marginalisation (5), a subtle but important difference.

For the remainder of this section, in order to simplify our expressions (and without loss of generality), we assume that the registration is correct and therefore $x_i = W(x_i, p)$. We now specify the term $P(x_i \mid \Phi, p, M)$ in (5) and the term $P(M)$ in (3):

$$P(x_i \mid \Phi, p, M_f) = \frac{H_\epsilon(\Phi(x_i))}{\eta_f}, \qquad P(x_i \mid \Phi, p, M_b) = \frac{1 - H_\epsilon(\Phi(x_i))}{\eta_b} \tag{7}$$

$$P(M_f) = \frac{\eta_f}{\eta}, \qquad P(M_b) = \frac{\eta_b}{\eta}, \tag{8}$$

where

$$\eta = \eta_f + \eta_b, \qquad \eta_f = \sum_{i=1}^{N} H_\epsilon(\Phi(x_i)), \qquad \eta_b = \sum_{i=1}^{N} \bigl(1 - H_\epsilon(\Phi(x_i))\bigr). \tag{9}$$

Equation (7) represents normalised versions of the blurred Heaviside step functions used in typical region-based level-set methods and can now be interpreted probabilistically as model-specific spatial priors for a pixel location $x$. Equation (8) represents the model priors, which are given by the ratio of the area of the model-specific region to the total area of both models. Equation (9) contains the normalisation constants (note that $\eta = N$).

We now specify a geometric prior on $\Phi$ that rewards a signed distance function:

$$P(\Phi) = \prod_{i=1}^{N} \frac{1}{\sigma\sqrt{2\pi}} \exp\left(-\frac{(|\nabla\Phi(x_i)| - 1)^2}{2\sigma^2}\right), \tag{10}$$

where $\sigma$ specifies the relative weight of the prior. This gives a probabilistic interpretation to the work in [10]. Substituting (7), (8), (9) and (10) into (5) and taking logs gives the following expression for the log posterior:

$$\log P(\Phi, p \mid \Omega) \propto \sum_{i=1}^{N} \left\{ \log P(x_i \mid \Phi, p, y_i) - \frac{(|\nabla\Phi(x_i)| - 1)^2}{2\sigma^2} \right\} + N \log\left(\frac{1}{\sigma\sqrt{2\pi}}\right) + \log P(p), \tag{11}$$

[…]

Introducing a warp $W(x_i, \Delta p)$ into (14) and dropping the prior term for brevity (we revisit this term in Sect. 5):

$$\log P(\Phi, p \mid \Omega) \propto \sum_{i=1}^{N} \bigl\{ \log P(W(x_i, \Delta p) \mid \Phi, p, y_i) \bigr\}, \tag{15}$$

where $\Delta p$ represents an incremental warp of the shape kernel. There are many ways this expression could be optimised; the most similar work uses simple gradient ascent [4]. In contrast, we take advantage of the fact that all of the individual terms are probabilities, and therefore strictly positive. This allows us to write certain terms as squared square-roots and substitute in a first-order Taylor series approximation for each square-root, for example:

$$\left[\sqrt{H_\epsilon(\Phi(W(x_i, \Delta p)))}\right]^2 \approx \left[\sqrt{H_\epsilon(\Phi(x_i))} + \frac{1}{2\sqrt{H_\epsilon(\Phi(x_i))}}\, J\, \Delta p\right]^2, \tag{16}$$

where:

$$J = \frac{\partial H_\epsilon}{\partial \Phi} \frac{\partial \Phi}{\partial x} \frac{\partial W}{\partial \Delta p} = \delta_\epsilon(\Phi(x_i))\, \nabla\Phi(x_i)\, \frac{\partial W}{\partial \Delta p}.$$

Likewise we apply a similar expansion to $(1 - H_\epsilon(\Phi(W(x_i, \Delta p))))$, allowing us then to optimise using Gauss-Newton². This has the advantage that the Hessian itself is not required; rather, a first-order approximation of the Hessian is used. In consequence it is fast, and in our experience exhibits rapid and reliable convergence in our problem domain. It also avoids the issues highlighted in [17] of choosing the appropriate step size for gradient ascent. Excluding the full details for brevity we arrive at an expression for $\Delta p$:

$$\Delta p = \left[\sum_{i=1}^{N} \frac{1}{2 P(x_i \mid \Phi, p, y_i)} \left[\frac{P_f}{H_\epsilon(\Phi(x_i))} + \frac{P_b}{1 - H_\epsilon(\Phi(x_i))}\right] J^T J\right]^{-1} \times \sum_{i=1}^{N} \frac{(P_f - P_b)\, J^T}{P(x_i \mid \Phi, p, y_i)}. \tag{17}$$

Equation (17) is then used to update the parameters $p$ by composing $W(x_i, p)$ with $W(x_i, \Delta p)^{-1}$, analogous to inverse compositional tracking [14].

² The Taylor expansion is poorly conditioned if $H_\epsilon(\Phi(x_i)) = 0$; in practice this does not happen as the terms are never equal to zero.

5 Drift Correction

Having the object represented by its location $p$ and shape $\Phi$ leaves an ambiguity where it is possible to explain rigid transformations of the shape either with $p$ […]

Fig. 2. Qualitative evaluation: (top) a speedboat undergoing a 180° out-of-plane rotation illustrating shape adaptation; (middle) a person jumping around with significant motion blur; (bottom) a hand being tracked in front of a challenging background.

[…] each perturbation, evaluate a set of cost functions that are commonly used in other tracking methods, such as: level-set methods based on likelihoods [4,2], mean-shift [12,13,11], inverse compositional [14] and distribution-based tracking [8,9]. We consider perturbations for each dimension separately to allow us to plot one-dimensional graphs, i.e. translation in x, translation in y, scale and rotation. By examining these cost functions we can find all extrema and examine how they are distributed across the space. An ideal cost function would be convex with a single extremum at the true location. The particular cost functions we consider are:

- LogPWP: Pixel-wise posteriors fused using a logarithmic opinion pool.
- LinPWP: Pixel-wise posteriors fused using a linear opinion pool.
- LogLike: Log likelihood, used in most level-set work [5,4,2].
- BhattF: Bhattacharyya coefficient: $B(\Omega_f) = \sum_{j=1}^{V} \sqrt{P(y_j \mid M_f)\, P(y_j \mid \Omega_f)}$, used by [12,13,11].
- BhattFB: Bhattacharyya coefficient with a background model: $B(\Omega_f, \Omega_b) = \sum_{j=1}^{V} \sqrt{P(y_j \mid M_f)\, P(y_j \mid \Omega_f)} + \sum_{j=1}^{V} \sqrt{P(y_j \mid M_b)\, P(y_j \mid \Omega_b)}$.
- BhattFBM: Bhattacharyya coefficient with a background mismatch: $B(\Omega_f, \Omega_b) = \sum_{j=1}^{V} \sqrt{P(y_j \mid M_f)\, P(y_j \mid \Omega_f)} - \sum_{j=1}^{V} \sqrt{P(y_j \mid M_f)\, P(y_j \mid \Omega_b)}$, suggested by [9].
- Ideal SSD: Sum of squared pixel differences using the ideal template, i.e. the template extracted at the current location $p$. This is essentially what you would get if you had the perfect generative model giving the true pixel value at each pixel location including the noise. This of course is never going […]

Fig. 4. Quantitative analysis: Log probability distribution of extrema in the cost functions generated from 20,000 frames of real video data.

- Rotation: All Bhattacharyya methods and the log likelihood are poor at correctly localising the rotation. The straight Bhattacharyya coefficient, for example, has more than a 1% chance of exhibiting extrema anywhere in the rotation space; at a 30 Hz frame rate this corresponds to approximately 1 frame in every 3 seconds of video. It is worth noting that the side lobes (at approximately 25°) exhibited by our methods and ideal SSD are due to the self-similarity corresponding to fingers in the hand sequences.

Experimentally we were unable to make the log likelihood successfully track several of our sequences, which is confirmed by its poor performance in Fig. 4. One possible explanation is that in other work [4,5,2], a single Gaussian parametric model is used. This implicitly enforces a smooth, unimodal distribution for the joint likelihood. Non-parametric representations do not exhibit these properties; however, they are better at describing complicated distributions and therefore desirable. The reason that our method can deal with these distributions is the normalising denominator in (3) and the marginalisation step in (4). These two steps prevent individual pixels from dominating the cost function, hence making it smoother and well-behaved.

The work of [8] and its subsequent improvement [9] use distribution matching techniques to incorporate non-parametric distributions into a level-set framework. These methods, similar to the Bhattacharyya-based methods, involve computing the empirical densities at every iteration of the optimisation, whereas our method avoids this extra cost. Not only is our method superior to these approaches in terms of cost functions (see Fig. 4), but it is computationally cheaper to evaluate as it does not require empirical distributions. This is a significant benefit because it not only reduces the cost per iteration, but avoids the issue of having to build 'good' distributions. One explanation for the difference between the performance of these methods and ours is that it is hard to build 'good' empirical distributions in real-time and most methods rely on simple histograms. Although this could be improved with Parzen or NP windowing techniques [19], it would almost certainly sacrifice real-time performance.

7.1 Timing

All terms in (17) include $\delta_\epsilon(\Phi(x_i))$ (blurred Dirac delta function). This means that an individual pixel's contribution to the optimisation diminishes the further from the contour it is. An efficient implementation, therefore, recognises this. Our implementation ignores pixels outside a narrow band and for an object size of 180×180 runs in 500 µs on a P4 3.6 GHz machine. On average the system runs at a frame rate of 85 Hz for the complete algorithm and if shape and appearance learning are turned off (i.e. rigid registration only) it averages 230 Hz.

8 Conclusions

We have proposed a novel probabilistic framework for robust, real-time, visual tracking of previously unseen objects from a moving camera. The key contribution of our method and reason for its superior performance compared with others is the use of pixel-wise posteriors as opposed to a product over pixel-wise likelihoods. In contrast to other methods [4,5], we solve the registration using Gauss-Newton, which has significant practical benefits, namely: (i) the difficulty associated with step size selection is removed and (ii) reliable and fast convergence. We have demonstrated the benefits of our method both qualitatively and quantitatively with a thorough analysis of pixel-wise posteriors versus competing alternatives using over 20,000 video frames. Our results demonstrate that using pixel-wise posteriors provides excellent performance when incorporating non-parametric distributions into region-based level-sets. It not only offers superior cost functions but avoids the need for computing empirical distributions [12,8,9,11] and is therefore faster.

In our ongoing research we are investigating in more detail the probabilistic interpretation of the blurred Heaviside step functions, and in particular the link with Gaussian uncertainty on contour location. Other work of future interest involves modifying the generative model to capture spatial information, and investigating how to incorporate multiple objects and online occlusion handling.
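
As an illustration of the pixel-wise posterior that the conclusions single out, the following Python sketch evaluates the spatial priors and model priors of Eqs. (7) to (9), forms a per-pixel model posterior by Bayes' rule, marginalises over the foreground and background models, and sums the logs to obtain a LogPWP-style cost. It is a minimal sketch under stated assumptions, not the authors' implementation: the function names are hypothetical, the arctan form of the smoothed Heaviside function is assumed, the embedding function is assumed positive inside the foreground, and the Bayes and marginalisation steps are inferred from the text's references to (3) and (4), which are not reproduced in this transcript.

```python
import math


def heaviside(z, eps=1.0):
    """Smoothed Heaviside step. The arctan form is an assumption."""
    return 0.5 + math.atan(z / eps) / math.pi


def pixelwise_posteriors(phi, lik_f, lik_b, eps=1.0):
    """Return one pixel-wise posterior value per pixel.

    phi   -- embedding-function values, assumed positive inside the foreground
    lik_f -- P(y_i | M_f) for each pixel, e.g. looked up in a colour histogram
    lik_b -- P(y_i | M_b) for each pixel
    """
    h = [heaviside(z, eps) for z in phi]
    eta_f = sum(h)                        # Eq. (9), foreground normaliser
    eta_b = sum(1.0 - v for v in h)       # Eq. (9), background normaliser
    eta = eta_f + eta_b                   # equals the number of pixels N
    posteriors = []
    for h_i, p_f, p_b in zip(h, lik_f, lik_b):
        # Model posterior P(M | y_i) via Bayes' rule with the priors of Eq. (8).
        # Assumes at least one of the two likelihoods is non-zero.
        evidence = p_f * eta_f / eta + p_b * eta_b / eta
        post_f = p_f * (eta_f / eta) / evidence
        post_b = p_b * (eta_b / eta) / evidence
        # Marginalise over the model using the spatial priors of Eq. (7).
        posteriors.append(h_i / eta_f * post_f + (1.0 - h_i) / eta_b * post_b)
    return posteriors


def log_pwp(phi, lik_f, lik_b, eps=1.0):
    """Sum of log pixel-wise posteriors (prior terms omitted)."""
    return sum(math.log(p) for p in pixelwise_posteriors(phi, lik_f, lik_b, eps))


# Toy data: two pixels inside the contour, two outside.
phi = [2.0, 1.0, -1.0, -2.0]
lik_f = [0.8, 0.7, 0.1, 0.2]
lik_b = [0.1, 0.2, 0.8, 0.7]
aligned = log_pwp(phi, lik_f, lik_b)
misaligned = log_pwp(phi[::-1], lik_f, lik_b)
print(aligned > misaligned)
```

In this toy case the cost is higher when the shape agrees with the colour evidence than when the embedding values are reversed. Because each pixel's model posterior is divided by its own evidence term, no single pixel with an extreme likelihood can dominate the sum, which is the behaviour the text attributes to the normalising denominator and the marginalisation step.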