Unsupervised Discovery of Visual Object Class Hierarchies

Josef Sivic, Bryan C. Russell, Andrew Zisserman, William T. Freeman, Alexei A. Efros

INRIA, Ecole Normale Supérieure; University of Oxford; Massachusetts Institute of Technology; Carnegie Mellon University

{josef,russell}@di.ens.fr, az@robots.ox.ac.uk, billf@csail.mit.edu, efros@cs.cmu.edu


[...] by topics on a single path (from the root to a leaf) through the tree. The hLDA model can also be viewed as a set of standard LDA models, one along each path of the tree. In this view, the topics associated with internal nodes of the tree are shared by two or more LDAs, and the root node is shared by all LDA models. Assuming that the tree structure T is known, we can sample words in a single document using the following generative process: (1) Pick a path c through the tree; (2) Sample an L-vector of mixing weights θ from a Dirichlet distribution p(θ | α); (3) Sample words in a document using the topics lying along the path c in the tree. This generative process corresponds to the graphical model shown in figure 2(b). Each document has an associated hidden variable c indicating which path of the tree it was generated from. Given a particular path c, the hidden variable z_i, associated with each word w_i in the document, indicates which level of the tree w_i was sampled from. For a particular document w, the joint distribution of observed and hidden variables, conditioned on the (hyper)-parameters α and η, factors as

$$ p(\mathbf{w}, \mathbf{z}, c, \theta, \beta \mid \alpha, \eta, T) = \left[ \prod_{i=1}^{N} p(w_i \mid z_i, c, \beta)\, p(z_i \mid \theta) \right] p(\theta \mid \alpha)\, p(\beta \mid \eta)\, p(c \mid T). \qquad (2) $$

Here we also conditioned p(c) on T to indicate that the tree structure is fixed and known. In practice, however, it is often difficult to specify a suitable tree structure a priori. Recently, Blei et al. [5] developed a hierarchical LDA model, which automatically infers the structure of the tree from the data. This is achieved by placing a nested Chinese restaurant process (nCRP) prior on tree structures.

nCRP prior: The nested Chinese restaurant process [5] specifies a distribution on partitions of documents into paths in a (fixed depth) L-level tree. To generate a tree structure from the nCRP, assignments of documents to paths are sampled sequentially. The first document forms an initial L-level path, i.e. a tree with a single branch. Each subsequent document is assigned either to one of the existing paths, with paths containing more documents being more probable, or to a novel path branching off at any existing (non-leaf) node of the tree. The probability of creating novel branches is controlled by the parameter γ, where smaller values of γ
result in trees with fewer branches. Note that the number of branches at each node can vary.

Using the hierarchical LDA model described above combined with the nested CRP prior on trees, we can sample words in a document by the following generative process [5]: (1) Pick an L-level path c from the nCRP prior; (2) Sample an L-vector of mixing weights from a Dirichlet distribution p(θ | α); (3) Sample words in a document using the topics lying along the path c in the tree. The corresponding graphical model is shown in figure 2(c).

Model learning: The hierarchical LDA (hLDA) model is fitted using a Gibbs sampler as described in [5]. The goal is to obtain samples from the posterior distribution of the latent tree structure T, the level assignments z of all words and the path assignments c for all documents, conditioned on the observed collection of documents w. For each document m, the Gibbs sampler is divided into two steps. In the first step, the level allocations z_m are re-sampled while keeping the current path assignment c_m fixed. In the second step, the path assignment c_m is re-sampled while keeping the level allocations z_m fixed, which can result in the deletion or creation of a branch in the tree.

Example: To illustrate the hLDA model, consider the three-level bar hierarchy shown in figure 3(a). Similar 'bar' topic examples were shown in [5, 12]. The structure of the tree was sampled from the nCRP prior with γ = 0.3. Figure 3(b) shows a topic hierarchy automatically recovered using the Gibbs sampler of [5] from a collection of 100 documents, each containing 250 words, sampled from the topic hierarchy shown in figure 3(a), with topic proportions sampled from a Dirichlet prior with α = [50, 30, 10]. Note that
α parameters are set to values ≫ 1 to encourage high mixing of topics along the path. We have observed empirically on similar simulated datasets, where the true values of z, c and T are known, that the Gibbs sampler converges very slowly, requiring thousands of iterations. If, however, we treat the tree level assignment z of each word in each document as observed and fix these assignments to their true values, the Gibbs sampler finds the correct tree structure (the assignments c of documents to paths in the tree) within a few iterations. On the other hand, when the path assignments c are treated as observed and fixed to the correct values, recovering the level assignments z is still difficult and requires thousands of iterations.

In other words, knowing from which level of the hierarchy each word comes, which is the information carried in z, greatly simplifies the analysis of the data and makes finding the underlying topic tree structure significantly easier. Motivated by this observation, we design an image representation that allows us to make a reasonable guess of z, which we can then use to initialize the Gibbs sampler.

3. Image representation using visual words

The goal is to obtain an image representation that is tolerant to intra-class variations and to a certain degree of lighting change. We achieve this by representing images using a visual vocabulary of quantized SIFT [19] descriptors. In addition, we want to obtain a 'coarse-to-fine' description of the image with varying degrees of appearance and spatial localization granularity, suitable for hierarchical object representation [...]

[...] is initialized by sampling a random tree from the nCRP prior with γ = 1. We run the Gibbs sampler 10 times (each run initialized with a different random tree) for 50 iterations. At each iteration, the current sample of z and c is used to compute MAP estimates [12] of the mixing weights, θ_MAP, and topic distributions,
β_MAP, which are in turn used to evaluate the log-likelihood of the observed data w. This log-likelihood is used to assess convergence and to compare different runs of the Gibbs sampler (here we show the models with the highest log-likelihood). One iteration of the Gibbs sampler takes about 10 seconds on a 2 GHz machine.

In terms of parameter variation, we found that the hLDA model is most sensitive to the choice of the hyperparameter η controlling the smoothing/sparsity of the topic-specific visual word distributions. Smaller values (η = 0.1) produce large trees with sparse topics, and larger values (η = 1) produce smaller trees with non-sparse 'shared' topics (here η = 1). Similar sensitivity to the choice of η was found in the text domain [5]. To encourage high mixing of topics along paths in the tree, the hyperparameter α is set to a value ≫ 1, typically 300-500. As in [5], the nCRP prior hyperparameter is fixed to γ = 1. The hLDA model requires choosing the depth of the hierarchy manually, and we demonstrate learning trees with up to 5 levels.

We found that the initialization of the level assignments z described above is important. When initialized with random level assignments, the sampler converges to an inferior solution both in terms of log-likelihood and classification performance (described in section 5), even after 10,000 iterations. Note that the initialization of the level assignments z is based solely on the spatial and appearance granularity of the visual vocabulary and does not require any knowledge of object labels, i.e. it is unsupervised.

The learnt 4-level object hierarchy is shown in figure 1. Distinct paths in the tree correspond fairly accurately to object classes. In addition, screens and traffic lights share a common third-level topic (node 6); traffic lights, screens and switches share a common second-level topic (node 10); and cars side and cars rear share a common second-level topic (node 15).

4.2. Example II: MSRC dataset

Here we consider the more challenging MSRC-B1 dataset [33] of 240 images of 9 object classes: faces, cows, grass, trees, buildings, cars, airplanes, bicycles and sky. We use the manual segmentations provided with the data, a total of 553 segments, and treat each image segment as a separate 'document'. We learn a 5-level hLDA model. As above, we initialize the level assignments z using the appearance and spatial granularity of the vocabulary, this time starting at level 2 of the tree and leaving the root topic empty. The discovered object hierarchy is shown in figure 5. Some nodes of the hierarchy are further illustrated by example image segments in figure 6. The classification accuracy is discussed next.

5. Assessing hierarchies using classification

So far we have examined the learnt hierarchies visually. In this section we assess their quality by using them for classification of object categories.

Note that the assignment of images to paths in the tree implies a hierarchical partition of the data, and we can use this partition for image classification. For accurate classification, we would like all images of a particular object class to be 'assigned'¹ to a single node (internal or leaf) of the tree (high recall). In addition, we would like no other images (of other object classes) to be assigned to the same node of the tree (high precision). To reflect these requirements, we define a 'classification overlap score' for an object class i and a node t in the tree as

$$ s(i,t) = \frac{|GT_i \cap N_t|}{|GT_i \cup N_t|}, $$

where GT_i is the (manually obtained) ground truth set of images of class i and N_t is the set of images assigned to a path passing through node t. This score ranges between 0 and 1, with higher scores indicating better 'overlap' between the object class i and the node t. To obtain a single-number performance measure S, we take the node with maximum overlap for each class and then average the scores over all classes,

$$ S = \frac{1}{N_c} \sum_{i} \max_{t}\, s(i,t), $$

where N_c is the total number of ground truth object classes. For example, the object hierarchy shown in figure 1 has a classification overlap score of 0.95. The perfect score of 1.00 is not achieved because 'computer screens' are split into three bottom-level nodes (3, 4 and 5, with 20, 3 and 1 images respectively). In this case the score for 'computer screens' is measured for node 3. This splitting seems to be due to different visual word representations of the inside of the screen (depending on whether the screen is empty or not).

5.1. Comparison with LDA

Here we use the classification overlap score to compare the object hierarchy learned from the MSRC-B1 dataset, shown in figure 5, with partitions of the data obtained by the standard LDA model [6, 22, 25] with a varying number of topics. The same representation of image segments using visual words is used for both LDA and hLDA. In the case of LDA, we estimate mixing weights for each segment and assign each segment to the topic with the maximum mixing weight. Results are summarized in table 1. Empirically, we observed that if the number of topics is small (K = 4, 5, 10), LDA tends to group some object classes (such as airplanes and cars, trees and grass, or faces and cows) together in a single topic. For a higher number of learned topics (K ≥ 20), some object classes such as 'buildings', 'grass' and 'trees' tend to split between several, usually fairly pure, topics. In some cases mixed topics also occur. In contrast to LDA, which learns a flat topic structure, hLDA learns a topic hierarchy [...]

¹ Although in the hLDA model each image is assigned to a complete (root to leaf) path in the tree, in the following we call all images assigned to paths sharing a particular internal node 'assigned' to that node.

Figure 7. A 5-level hLDA hierarchy learned on the MSRC-B1 dataset using multiple segmentations. Branches with fewer than 5 image segments (in distinct images) are removed from the tree. Branches in the tree were manually labelled with object class names (shown in red) based on visual inspection. Node numbers are shown in black below each node. Some of the discovered topics are shown in more detail in figure 8.

Using automatic 'imperfect' segmentations seems to affect both the structure of the learnt tree and the discovered object classes (cf. figure 5, where manual segmentations were used). Some object classes are consistently grouped together using both manual and automatic segmentations, notably (i) cars and airplanes and (ii) trees and grass. However, using automatic segmentations results in a small number of 'spurious' branches containing segments of several object classes or mixed object segments (e.g. node 14, shown in figure 8, or node 41). Note that 8 (out of 9) object classes are discovered. We do not find any building topics, as buildings seem to have less consistent segmentations across the dataset. Using automatic segmentations enables the discovery of new object classes not labelled in the data (here a 'road/asphalt' topic, node 28, shown in figure 8).
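The nCRP prior assigns documents to paths one at a time. The following is a minimal Python sketch of that sequential process. It assumes the standard nCRP form: at a node already visited by m documents, an existing child that m_k of them passed through is chosen with probability m_k / (m + γ), and a new branch is opened with probability γ / (m + γ). The function name `sample_ncrp_paths` and the tuple-based node ids are hypothetical choices for illustration, not part of [5].

```python
import random


def sample_ncrp_paths(num_docs, depth, gamma, seed=0):
    """Sample root-to-leaf paths of length `depth` from a nested CRP.

    Node ids are tuples of child indices: () is the root, (0, 2) is the
    third child of the first child of the root. Assumes gamma > 0.
    """
    rng = random.Random(seed)
    counts = {}    # node id -> number of documents whose path passes through it
    children = {}  # node id -> list of child node ids, in creation order
    paths = []
    for _doc in range(num_docs):
        node = ()
        path = [node]
        for _level in range(depth - 1):
            kids = children.setdefault(node, [])
            # Existing child k has weight m_k; a novel branch has weight gamma.
            weights = [counts[k] for k in kids] + [gamma]
            choice = rng.choices(range(len(weights)), weights=weights)[0]
            if choice == len(kids):
                node = node + (len(kids),)  # open a novel branch at this node
                kids.append(node)
            else:
                node = kids[choice]
            path.append(node)
        # Only now does this document count towards later documents' choices.
        for visited in path:
            counts[visited] = counts.get(visited, 0) + 1
        paths.append(path)
    return paths


if __name__ == "__main__":
    paths = sample_ncrp_paths(num_docs=100, depth=3, gamma=0.3)
    leaves = {p[-1] for p in paths}
    print(f"{len(leaves)} distinct leaves for 100 documents")
```

The first document always creates a single new branch down to depth L, matching the description above. A smaller γ makes the new-branch weight small relative to the existing counts, which is why it tends to yield trees with fewer branches.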
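The generative process for a known tree (pick a path c, sample θ from Dirichlet(α), then draw each word from the topic at its level z_i) can be sketched in Python with NumPy. The sketch also includes the data log-likelihood log p(w | θ, β, c) = Σ_i log Σ_l θ_l β_{c_l, w_i}, the kind of quantity the text evaluates with θ_MAP and β_MAP to monitor the sampler. It assumes the path and the per-node topic distributions β are already given. The toy topics in the usage block are random stand-ins, not the bar topics of figure 3, and the function names are hypothetical.

```python
import numpy as np


def generate_document(path, topics, alpha, num_words, rng):
    """Sample one document along a fixed root-to-leaf path c.

    path:   list of L node ids, root first.
    topics: dict node id -> probability vector over V visual words (beta).
    alpha:  length-L Dirichlet parameter for the mixing weights over levels.
    Returns (words, levels, theta).
    """
    theta = rng.dirichlet(alpha)                             # step (2): mixing weights
    levels = rng.choice(len(path), size=num_words, p=theta)  # level z_i of each word
    vocab_size = len(topics[path[0]])
    # Step (3): each word is drawn from the topic at its level on the path.
    words = np.array([rng.choice(vocab_size, p=topics[path[z]]) for z in levels])
    return words, levels, theta


def document_log_likelihood(words, path, topics, theta):
    """log p(w | theta, beta, c) = sum_i log sum_l theta_l * beta[c_l, w_i]."""
    beta_path = np.stack([topics[node] for node in path])  # shape (L, V)
    word_probs = theta @ beta_path[:, words]               # shape (N,)
    return float(np.sum(np.log(word_probs)))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    path = [(), (0,), (0, 0)]                     # a fixed 3-level path (step 1)
    topics = {node: rng.dirichlet(np.ones(25)) for node in path}
    alpha = np.array([50.0, 30.0, 10.0])          # as in the bar example
    words, levels, theta = generate_document(path, topics, alpha, 250, rng)
    print(theta, document_log_likelihood(words, path, topics, theta))
```

Marginalizing over z_i inside the log-likelihood makes the score depend only on the path and the estimated θ and β, not on a particular sample of the level assignments.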
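The first step of the hLDA Gibbs sampler re-samples the level allocations z_m of document m while its path c_m stays fixed. It can be sketched as a collapsed Gibbs sweep. The sketch assumes the usual collapsed LDA conditional restricted to the L topics on the path, p(z_i = l | rest) ∝ (n_{m,l} + α_l)(n_{c_l, w_i} + η) / (n_{c_l} + Vη), where every count excludes word i. It is an illustration under that assumption, not the implementation of [5]. It omits the second (path re-sampling) step, and the function name is hypothetical.

```python
import numpy as np


def resample_levels(words, levels, path, node_word_counts, alpha, eta, rng):
    """One collapsed Gibbs sweep over the level assignments z of one document.

    words:  int array (N,) of visual word ids in the document.
    levels: int array (N,) of current levels z_i; updated in place.
    path:   list of L node ids (the document's fixed path c), root first.
    node_word_counts: dict node id -> int array (V,) of corpus-wide counts of
        words currently assigned to that node, including this document;
        updated in place.
    alpha:  length-L Dirichlet parameter; eta: topic smoothing parameter.
    """
    num_levels = len(path)
    vocab_size = len(node_word_counts[path[0]])
    doc_level_counts = np.bincount(levels, minlength=num_levels)
    for i, w in enumerate(words):
        # Remove word i from the document and node counts.
        old = levels[i]
        doc_level_counts[old] -= 1
        node_word_counts[path[old]][w] -= 1
        # Unnormalized conditional p(z_i = l | all other assignments).
        probs = np.empty(num_levels)
        for l, node in enumerate(path):
            counts = node_word_counts[node]
            probs[l] = ((doc_level_counts[l] + alpha[l])
                        * (counts[w] + eta) / (counts.sum() + vocab_size * eta))
        new = rng.choice(num_levels, p=probs / probs.sum())
        # Add word i back at its newly sampled level.
        levels[i] = new
        doc_level_counts[new] += 1
        node_word_counts[path[new]][w] += 1
    return levels


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    path, vocab_size = [(), (0,), (0, 0)], 10
    words = rng.integers(0, vocab_size, size=50)
    levels = rng.integers(0, len(path), size=50)  # e.g. an initial guess of z
    counts = {node: np.zeros(vocab_size, dtype=int) for node in path}
    for w, z in zip(words, levels):
        counts[path[z]][w] += 1
    for _ in range(20):
        resample_levels(words, levels, path, counts, np.full(3, 1.0), 1.0, rng)
    print(np.bincount(levels, minlength=3))
```

The `levels` array passed in is where an informed initial guess of z would enter. The text reports that this initialization matters far more than running the sampler longer from random level assignments.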
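The classification overlap score s(i, t) and its class-averaged summary S can be computed directly from the path assignments. The Python sketch below follows footnote 1 by treating an item as assigned to every node on its root-to-leaf path. It also covers the flat LDA baseline of section 5.1, where each segment goes to the topic with the largest mixing weight, so the same score can evaluate both. The function names and the toy data are hypothetical, and the toy data is not taken from the paper's datasets.

```python
def node_members(paths):
    """Map each node t to N_t, the set of items whose path passes through t."""
    members = {}
    for item, path in paths.items():
        for node in path:
            members.setdefault(node, set()).add(item)
    return members


def overlap(gt_items, node_items):
    """s(i, t): size of the intersection of GT_i and N_t over size of their union."""
    return len(gt_items & node_items) / len(gt_items | node_items)


def classification_overlap_score(ground_truth, members):
    """S = (1/N_c) * sum over classes i of max over nodes t of s(i, t)."""
    best = [max(overlap(gt, n) for n in members.values())
            for gt in ground_truth.values()]
    return sum(best) / len(best)


def flat_partition(mixing_weights):
    """LDA baseline: assign each item to the topic with the largest mixing weight."""
    members = {}
    for item, weights in mixing_weights.items():
        topic = max(range(len(weights)), key=lambda k: weights[k])
        members.setdefault(topic, set()).add(item)
    return members


if __name__ == "__main__":
    # Toy example: class "a" is split between two leaves under node (0,).
    paths = {
        "a1": [(), (0,), (0, 0)],
        "a2": [(), (0,), (0, 2)],
        "b1": [(), (0,), (0, 1)],
        "c1": [(), (1,), (1, 0)],
        "c2": [(), (1,), (1, 0)],
    }
    truth = {"a": {"a1", "a2"}, "b": {"b1"}, "c": {"c1", "c2"}}
    print(classification_overlap_score(truth, node_members(paths)))  # (2/3 + 1 + 1) / 3
```

In the toy example, class "a" scores best at the internal node (0,), where it overlaps 2 of 3 items. This mirrors how a class split across sibling leaves, such as the 'computer screens' case, keeps S below 1.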
Figure 8. Selected nodes from the hierarchy shown in figure 7. Each node is illustrated by a montage of the top five segments. Node numbers, referring to figure 7, are shown to the left of each montage. Note that the hierarchy of objects and their segmentation were automatically learned from an unlabelled set of images without supervision.

[24] J. Shi and J. Malik. Normalized cuts and image segmentation. In CVPR, pages 731–743, 1997.
[25] J. Sivic, B. C. Russell, A. A. Efros, A. Zisserman, and W. T. Freeman. Discovering objects and their location in images. In ICCV, pages 370–377, 2005.
[26] E. B. Sudderth, A. Torralba, W. T. Freeman, and A. S. Willsky. Describing visual scenes using transformed objects and parts. IJCV, 77(1–3):291–330, 2008.
[27] S. Todorovic and N. Ahuja. Extracting subimages of an unknown category from a set of images. In CVPR, 2006.
[28] A. Torralba, K. P. Murphy, and W. T. Freeman. Sharing features: efficient boosting procedures for multiclass object detection. In CVPR, pages 762–769, 2004.
[29] A. Vailaya, M. Figueiredo, A. Jain, and H.-J. Zhang. Image classification for content-based indexing. IEEE Trans. on Image Proc., 10(1):117–130, 2001.
[30] M. Varma and A. Zisserman. Texture classification: Are filter banks necessary? In CVPR, volume 2, pages 691–698, 2003.
[31] N. Vasconcelos. Image indexing with mixture hierarchies. In CVPR, 2001.
[32] G. Wang, Y. Zhang, and L. Fei-Fei. Using dependent regions for object categorization in a generative framework. In CVPR, 2006.
[33] J. Winn, A. Criminisi, and T. Minka. Object categorization by learned universal visual dictionary. In ICCV, pages I:756–763, 2005.
[34] L. Zhu, Y. Chen, and A. Yuille. Unsupervised learning of a probabilistic grammar for object detection and parsing. In NIPS, 2006.
[35] A. Zweig and D. Weinshall. Exploiting object hierarchy: Combining models from different category levels. In ICCV, 2007.