Unsupervised Discovery of Visual Object ...

Russell, Andrew Zisserman, William T. Freeman, Alexei A. Efros
INRIA, École Normale Supérieure; University of Oxford; Massachusetts Institute of Technology; Carnegie Mellon University
{josef,russell}@di.ens.fr, az@robots.ox.ac.uk, billf@csail.mit.edu, efros@cs.cmu.edu
by topics on a single path (from the root to a leaf) through the tree. The hLDA model can also be viewed as a set of standard LDA models, one along each path of the tree, where the topics associated with internal nodes of the tree are shared by two or more LDAs, and the root node is shared by all LDA models. Assuming that the tree structure $T$ is known, we can sample words in a single document using the following generative process: (1) Pick a path $c$ through the tree; (2) Sample an $L$-vector $\theta$ of mixing weights from a Dirichlet distribution $p(\theta \mid \alpha)$; (3) Sample words in a document using the topics lying along the path $c$ in the tree. This generative process corresponds to the graphical model shown in figure 2(b).

Each document has an associated hidden variable $c$ indicating which path of the tree it was generated from. Given a particular path $c$, the hidden variable $z_i$, associated with each word $w_i$ in the document, indicates which level of the tree $w_i$ was sampled from. For a particular document $w$, the joint distribution of observed and hidden variables, conditioned on the (hyper)parameters, factors as

$$ p(w, z, c, \theta, \beta \mid \alpha, \eta, T) = \left[ \prod_{i=1}^{N} p(w_i \mid z_i, c, \beta)\, p(z_i \mid \theta) \right] p(\theta \mid \alpha)\, p(\beta \mid \eta)\, p(c \mid T). \tag{2} $$

Here we have also conditioned $p(c)$ on $T$ to indicate that the tree structure is fixed and known. In practice, however, it is often difficult to specify a suitable tree structure a priori. Blei et al. [5] recently developed a hierarchical LDA model that automatically infers the structure of the tree from the data. This is achieved by placing a nested Chinese restaurant process (nCRP) prior on tree structures.

nCRP prior: The nested Chinese restaurant process [5] specifies a distribution on partitions of documents into paths in a (fixed-depth) $L$-level tree. To generate a tree structure from the nCRP, assignments of documents to paths are sampled sequentially. The first document forms an initial $L$-level path, i.e. a tree with a single branch. Each subsequent document is either assigned to one of the existing paths (where paths with more documents are more probable), or to a novel path branching off at any existing (non-leaf) node of the tree. The probability of creating novel branches is controlled by the parameter $\gamma$, where smaller values of $\gamma$
result in trees with fewer branches. Note that the number of branches at each node can vary. Using the hierarchical LDA model described above combined with the nested CRP prior on trees, we can sample words in a document by the following generative process [5]: (1) Pick an $L$-level path $c$ from the nCRP prior; (2) Sample an $L$-vector $\theta$ of mixing weights from a Dirichlet distribution $p(\theta \mid \alpha)$; (3) Sample words in a document using the topics lying along the path $c$ in the tree. The corresponding graphical model is shown in figure 2(c).

Model learning: The hierarchical LDA (hLDA) model is fitted using a Gibbs sampler as described in [5]. The goal is to obtain samples from the posterior distribution of the latent tree structure $T$, the level assignments $z$ of all words and the path assignments $c$ of all documents, conditioned on the observed collection of documents $w$. For each document $m$ the Gibbs sampler is divided into two steps. In the first step, the level allocations $z_m$ are re-sampled while keeping the current path assignment $c_m$ fixed. In the second step, the path assignment $c_m$ is re-sampled while keeping the level allocations $z_m$ fixed, which can result in the deletion or creation of a branch in the tree.

Example: To illustrate the hLDA model, consider the three-level bar hierarchy shown in figure 3(a). Similar 'bar' topic examples were shown in [5, 12]. The structure of the tree was sampled from the nCRP prior with
$\gamma = 0.3$. Figure 3(b) shows a topic hierarchy automatically recovered using the Gibbs sampler of [5] from a collection of 100 documents, each containing 250 words, sampled from the topic hierarchy shown in figure 3(a), with topic proportions sampled from a Dirichlet prior with $\alpha = [50, 30, 10]$. Note that the parameters $\alpha$ are set to values $\gg 1$ to encourage high mixing of topics along the path.

We have observed empirically on similar simulated datasets, where the true values of $z$, $c$ and $T$ are known, that the Gibbs sampler converges very slowly, requiring thousands of iterations. If, however, we treat the tree level assignment $z$ of each word in each document as observed and fix it to its true value, the Gibbs sampler finds the correct tree structure (the assignments $c$ of documents to paths in the tree) within a few iterations. On the other hand, when the path assignments $c$ are treated as observed and fixed to their correct values, recovering the level assignments $z$ is still difficult and requires thousands of iterations. In other words, knowing from which level of the hierarchy each word comes, which is the information carried in $z$, greatly simplifies the analysis of the data and makes finding the underlying topic tree structure significantly easier. Motivated by this observation, we design an image representation that allows us to make a reasonable guess of $z$, which we can then use to initialize the Gibbs sampler.

3. Image representation using visual words

The goal is to obtain an image representation tolerant to intra-class variations and to a certain degree of lighting change. We achieve this by representing images using a visual vocabulary of quantized SIFT [19] descriptors. In addition, we want to obtain a 'coarse-to-fine' description of the image with varying degrees of appearance and spatial localization granularity, suitable for hierarchical object represen-

is initialized by sampling a random tree from the nCRP prior with
$\gamma = 1$. We run the Gibbs sampler 10 times (each time initialized with a different random tree) for 50 iterations. At each iteration, the current sample of $z$ and $c$ is used to compute MAP estimates [12] of the mixing weights, $\theta_{\mathrm{MAP}}$, and of the topic distributions, $\beta_{\mathrm{MAP}}$, which are in turn used to evaluate the log-likelihood of the observed data $w$. This log-likelihood is used to assess convergence and to compare different runs of the Gibbs sampler (here we show the models with the highest log-likelihood). One iteration of the Gibbs sampler takes about 10 seconds on a 2GHz machine.

In terms of parameter variation, we found that the hLDA model is most sensitive to the choice of the hyperparameter $\eta$ controlling the smoothing/sparsity of the topic-specific visual word distributions: smaller values ($\eta = 0.1$) produce large trees with sparse topics, and larger values ($\eta = 1$) produce smaller trees with non-sparse 'shared' topics (here $\eta = 1$). Similar sensitivity to the choice of $\eta$ was found in the text domain [5]. To encourage high mixing of topics along paths in the tree, the hyperparameter $\alpha$ is set to values $\gg 1$, typically 300–500. As in [5], the nCRP prior hyperparameter is fixed to $\gamma = 1$. The hLDA model requires the depth of the hierarchy to be chosen manually, and we demonstrate learning trees with up to 5 levels.

We found that the initialization of the level assignments $z$ described above is important. When initialized with random level assignments, the sampler converges to an inferior solution, both in terms of log-likelihood and of classification performance (described in section 5), even after 10,000 iterations. Note that the initialization of the level assignments $z$ is based solely on the spatial and appearance granularity of the visual vocabulary and does not require any knowledge of object labels, i.e. it is unsupervised.

The learnt 4-level object hierarchy is shown in figure 1. Distinct paths in the tree correspond fairly accurately to object classes. In addition, screens and traffic lights share a common third-level topic (node 6); traffic lights, screens and switches share a common second-level topic (node 10); and cars side and cars rear share a common second-level topic (node 15).

4.2. Example II: MSRC dataset

Here we consider the more challenging MSRC-B1 dataset [33] of 240 images of 9 object classes: faces, cows, grass, trees, buildings, cars, airplanes, bicycles and sky. We use the manual segmentations provided with the data, a total of 553 segments, and treat each image segment as a separate 'document'. We learn a 5-level hLDA model. As above, we initialize the level assignments $z$ using the appearance and spatial granularity of the vocabulary, this time starting at level 2 of the tree and leaving the root topic empty. The discovered object hierarchy is shown in figure 5. Some nodes of the hierarchy are further illustrated by example image segments in figure 6. The classification accuracy is discussed next.

5. Assessing hierarchies using classification

So far we have examined the learnt hierarchies visually. In this section we assess their quality by using them for classification of object categories.

Note that the assignment of images to paths in the tree implies a hierarchical partition of the data, and we can use this partition for image classification. For accurate classification, we would like all images of a particular object class to be 'assigned'¹ to a single node (internal or leaf) of the tree (high recall). In addition, we would like no other images (of other object classes) to be assigned to the same node of the tree (high precision). To reflect these requirements we define a 'classification overlap score' for an object class $i$ and a node $t$ in the tree as

$$ \sigma(i, t) = \frac{|GT_i \cap N_t|}{|GT_i \cup N_t|}, $$

where $GT_i$ is the (manually obtained) ground truth set of images of class $i$ and $N_t$ is the set of images assigned to a path passing through node $t$. This score ranges between 0 and 1, with higher scores indicating better 'overlap' between the object class $i$ and the node $t$. To obtain a single-number performance measure $\sigma$, we take the node with maximum overlap for each class and then average the scores over all classes,

$$ \sigma = \frac{1}{N_c} \sum_{i} \max_{t} \sigma(i, t), $$

where $N_c$ is the total number of ground truth object classes. For example, the object hierarchy shown in figure 1 has a classification overlap score of 0.95. The perfect score of 1.00 is not achieved because 'computer screens' are split into three bottom-level nodes (3, 4 and 5, with 20, 3 and 1 images respectively). In this case the score is measured for node 3. This splitting seems to be due to different visual word representations of the inside of the screen (depending on whether the screen is empty or not).

5.1. Comparison with LDA

Here we use the classification overlap score to compare the object hierarchy learned from the MSRC-B1 dataset, shown in figure 5, with partitions of the data obtained by the standard LDA model [6, 22, 25] with a varying number of topics. The same visual-word representation of image segments is used for both LDA and hLDA. In the case of LDA, we estimate the mixing weights for each segment and assign each segment to the topic with the maximum mixing weight. Results are summarized in table 1. Empirically, we observed that if the number of topics is small ($K = 4, 5, 10$), LDA tends to group some object classes (such as airplanes and cars, trees and grass, or faces and cows) together in a single topic. For a higher number of learned topics ($K \geq 20$), some object classes such as 'buildings', 'grass' and 'trees' tend to split between several, usually fairly pure, topics. In some cases mixed topics also occur. In contrast to LDA, which learns a flat topic structure, hLDA learns a topic hier-

¹ Although in the hLDA model each image is assigned to a complete (root-to-leaf) path in the tree, in the following we call all images assigned to paths sharing a particular internal node 'assigned' to that node.
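The classification overlap score of section 5 is the intersection-over-union (Jaccard index) between a ground-truth class and the set of images routed through a node. As an illustration only, the following Python sketch computes $\sigma(i,t)$ and the class-averaged score $\sigma$. It assumes that each image's assignment is given as a root-to-leaf list of node identifiers (so that, following footnote 1, an image counts as assigned to every node on its path), and it represents the flat LDA baseline of section 5.1 as one-node 'paths' given by each segment's arg-max topic. The function names (`node_members`, `classification_overlap`, `lda_partition`) and the toy data are hypothetical and not taken from the authors' code.

```python
from collections import defaultdict
from typing import Dict, Hashable, List, Sequence, Set, Tuple

# Hypothetical helper names used for illustration; not the authors' implementation.


def node_members(paths: Dict[Hashable, List[Hashable]]) -> Dict[Hashable, Set[Hashable]]:
    """N_t: images whose root-to-leaf path passes through node t (footnote 1 convention)."""
    members: Dict[Hashable, Set[Hashable]] = defaultdict(set)
    for image, path in paths.items():
        for node in path:
            members[node].add(image)
    return dict(members)


def classification_overlap(
    gt: Dict[Hashable, Set[Hashable]],
    paths: Dict[Hashable, List[Hashable]],
) -> Tuple[float, Dict[Hashable, Tuple[Hashable, float]]]:
    """Return the class-averaged score sigma and, per class, its best node and sigma(i, t)."""
    members = node_members(paths)
    best: Dict[Hashable, Tuple[Hashable, float]] = {}
    for cls, gt_set in gt.items():
        best_node, best_score = None, 0.0
        for node, n_t in members.items():
            # sigma(i, t) = |GT_i intersect N_t| / |GT_i union N_t|
            score = len(gt_set & n_t) / len(gt_set | n_t)
            if score > best_score:  # strict '>' keeps the first node on ties
                best_node, best_score = node, score
        best[cls] = (best_node, best_score)
    # sigma = (1 / N_c) * sum_i max_t sigma(i, t)
    sigma = sum(score for _, score in best.values()) / len(gt)
    return sigma, best


def lda_partition(theta: Dict[Hashable, Sequence[float]]) -> Dict[Hashable, List[str]]:
    """Flat LDA baseline: assign each segment to its arg-max topic, as a one-node 'path'."""
    return {
        seg: [f"topic{max(range(len(w)), key=w.__getitem__)}"]
        for seg, w in theta.items()
    }


# Toy tree: one face image (img5) is routed down the 'cars' branch.
paths = {
    "img1": ["root", "a", "a1"], "img2": ["root", "a", "a1"],
    "img3": ["root", "a", "a2"], "img4": ["root", "b", "b1"],
    "img5": ["root", "a", "a1"],
}
gt = {"cars": {"img1", "img2"}, "planes": {"img3"}, "faces": {"img4", "img5"}}
sigma, best = classification_overlap(gt, paths)
print(round(sigma, 3))  # 0.722, i.e. (2/3 + 1 + 1/2) / 3
```

When two nodes have the same overlap (for instance a node and its only child), the sketch keeps the node encountered first, which is the one closer to the root; the averaged score $\sigma$ does not depend on this choice.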
Figure 7. A 5-level hLDA hierarchy learned on the MSRC-B1 dataset using multiple segmentations. Branches with fewer than 5 image segments (in distinct images) are removed from the tree. Branches in the tree were manually labelled with object class names (shown in red) based on visual inspection. Node numbers are shown in black below each node. Some of the discovered topics are shown in more detail in figure 8.

Using automatic 'imperfect' segmentations seems to affect both the structure of the learnt tree and the discovered object classes (cf. figure 5, where manual segmentations were used). Some object classes are consistently grouped together using both manual and automatic segmentations, notably (i) cars and airplanes, and (ii) trees and grass. However, using automatic segmentations results in a small number of 'spurious' branches containing segments of several object classes or mixed object segments (e.g. node 14, shown in figure 8, or node 41). Note that 8 (out of 9) object classes are discovered. We do not find any building topics, as buildings seem to have less consistent segmentations across the dataset. Using automatic segmentations also enables the discovery of new object classes not labelled in the data (here a 'road/asphalt' topic, node 28, shown in figure 8).

Figure 8. Selected nodes from the hierarchy shown in figure 7. Each node is illustrated by a montage of the top five segments. Node numbers, referring to figure 7, are shown to the left of each montage. Note that the hierarchy of objects and their segmentation were learned automatically from an unlabelled set of images, without supervision.

[24] J. Shi and J. Malik. Normalized cuts and image segmentation. In CVPR, pages 731–743, 1997.
[25] J. Sivic, B. C. Russell, A. A. Efros, A. Zisserman, and W. T. Freeman. Discovering objects and their location in images. In ICCV, pages 370–377, 2005.
[26] E. B. Sudderth, A. Torralba, W. T. Freeman, and A. S. Willsky. Describing visual scenes using transformed objects and parts. IJCV, 77(1–3):291–330, 2008.
[27] S. Todorovic and N. Ahuja. Extracting subimages of an unknown category from a set of images. In CVPR, 2006.
[28] A. Torralba, K. P. Murphy, and W. T. Freeman. Sharing features: efficient boosting procedures for multiclass object detection. In CVPR, pages 762–769, 2004.
[29] A. Vailaya, M. Figueiredo, A. Jain, and H.-J. Zhang. Image classification for content-based indexing. IEEE Tran. on Image Proc., 10(1):117–130, 2001.
[30] M. Varma and A. Zisserman. Texture classification: Are filter banks necessary? In CVPR, volume 2, pages 691–698, 2003.
[31] N. Vasconcelos. Image indexing with mixture hierarchies. In CVPR, 2001.
[32] G. Wang, Y. Zhang, and L. Fei-Fei. Using dependent regions for object categorization in a generative framework. In CVPR, 2006.
[33] J. Winn, A. Criminisi, and T. Minka. Object categorization by learned universal visual dictionary. In ICCV, pages I:756–763, 2005.
[34] L. Zhu, Y. Chen, and A. Yuille. Unsupervised learning of a probabilistic grammar for object detection and parsing. In NIPS, 2006.
[35] A. Zweig and D. Weinshall. Exploiting object hierarchy: Combining models from different category levels. In ICCV, 2007.