Beyond Bags of Features: Spatial Pyramid Matching for Recognizing Natural Scene Categories

Uploaded On 2014-12-21


edu Beckman Institute, University of Illinois; Cordelia Schmid (Cordelia.Schmid@inrialpes.fr), INRIA Rhône-Alpes, Montbonnot, France; Jean Ponce (ponce@cs.uiuc.edu), Ecole Normale Supérieure, Paris, France. Abstract: This paper presents a method for recognizing scene ca



Presentation Transcript

Svetlana Lazebnik, Beckman Institute, University of Illinois
Cordelia Schmid (Cordelia.Schmid@inrialpes.fr), INRIA Rhône-Alpes, Montbonnot, France
Jean Ponce, Ecole Normale Supérieure, Paris, France

[…] search for specific objects (e.g., if the image, based on its global description, is likely to be a highway, we have a high probability of finding a car, but not a toaster). In addition, the simplicity and efficiency of our method, in combination with its tendency to yield unexpectedly high recognition rates on challenging data, could make it a good baseline for "calibrating" new datasets and for evaluating more sophisticated recognition approaches.

2. Previous Work

In computer vision, histograms have a long history as a method for image description (see, e.g., [16, 19]). Koenderink and Van Doorn [10] have generalized histograms to locally orderless images, or histogram-valued scale spaces (i.e., for each Gaussian aperture at a given location and scale, the locally orderless image returns the histogram of image features aggregated over that aperture). Our spatial pyramid approach can be thought of as an alternative formulation of a locally orderless image, where instead of a Gaussian scale space of apertures, we define a fixed hierarchy of rectangular windows. Koenderink and Van Doorn have argued persuasively that locally orderless images play an important role in visual perception. Our retrieval experiments (Fig. 4) confirm that spatial pyramids can capture perceptually salient features and suggest that "locally orderless matching" may be a powerful mechanism for estimating overall perceptual similarity between images.

It is important to contrast our proposed approach with multiresolution histograms [8], which involve repeatedly subsampling an image and computing a global histogram of pixel values at each new level. In other words, a multiresolution histogram varies the resolution at which the features (intensity values) are computed, but the histogram resolution (intensity scale) stays fixed. We take the opposite approach of fixing the resolution at which the features are computed, but varying the spatial resolution at which they are aggregated. This results in a higher-dimensional representation that preserves more information (e.g., an image consisting of thin black and white stripes would retain two modes at every level of a spatial pyramid, whereas it would become indistinguishable from a uniformly gray image at all but the finest levels of a multiresolution histogram). Finally, unlike a multiresolution histogram, a spatial pyramid, when equipped with an appropriate kernel, can be used for approximate geometric matching.

The operation of "subdivide and disorder" (i.e., partition the image into subblocks and compute histograms, or histogram statistics such as means, of local features in these subblocks) has been practiced numerous times in computer vision, both for global image description [6, 18, 20, 21] and for local description of interest regions [12]. Thus, though the operation itself seems fundamental, previous methods leave open the question of what is the right subdivision scheme (although a regular grid seems to be the most popular implementation choice), and what is the right balance between "subdividing" and "disordering." The spatial pyramid framework suggests a possible way to address this issue: namely, the best results may be achieved when multiple resolutions are combined in a principled way. It also suggests that the reason for the empirical success of "subdivide and disorder" techniques is the fact that they actually perform approximate geometric matching.

3. Spatial Pyramid Matching

We first describe the original formulation of pyramid matching [7], and then introduce our application of this framework to create a spatial pyramid image representation.

3.1. Pyramid Match Kernels

Let $X$ and $Y$ be two sets of vectors in a $d$-dimensional feature space. Grauman and Darrell [7] propose pyramid matching to find an approximate correspondence between these two sets. Informally, pyramid matching works by placing a sequence of increasingly coarser grids over the feature space and taking a weighted sum of the number of matches that occur at each level of resolution. At any fixed resolution, two points are said to match if they fall into the same cell of the grid; matches found at finer resolutions are weighted more highly than matches found at coarser resolutions.

More specifically, let us construct a sequence of grids at resolutions $0, \ldots, L$, such that the grid at level $l$ has $2^l$ cells along each dimension, for a total of $D = 2^{dl}$ cells. Let $H_X^l$ and $H_Y^l$ denote the histograms of $X$ and $Y$ at this resolution, so that $H_X^l(i)$ and $H_Y^l(i)$ are the numbers of points from $X$ and $Y$ that fall into the $i$th cell of the grid. Then the number of matches at level $l$ is given by the histogram intersection function [19]:

$$\mathcal{I}(H_X^l, H_Y^l) = \sum_{i=1}^{D} \min\left(H_X^l(i), H_Y^l(i)\right). \tag{1}$$

In the following, we will abbreviate $\mathcal{I}(H_X^l, H_Y^l)$ to $\mathcal{I}^l$.

Note that the number of matches found at level $l$ also includes all the matches found at the finer level $l + 1$. Therefore, the number of new matches found at level $l$ is given by $\mathcal{I}^l - \mathcal{I}^{l+1}$ for $l = 0, \ldots, L - 1$. The weight associated with level $l$ is set to $\frac{1}{2^{L-l}}$, which is inversely proportional to cell width at that level. Intuitively, we want to penalize matches found in larger cells because they involve increasingly dissimilar features. Putting all the pieces together, we get the following definition of a pyramid match kernel:

$$\kappa^L(X, Y) = \mathcal{I}^L + \sum_{l=0}^{L-1} \frac{1}{2^{L-l}} \left(\mathcal{I}^l - \mathcal{I}^{l+1}\right) \tag{2}$$

$$= \frac{1}{2^L} \mathcal{I}^0 + \sum_{l=1}^{L} \frac{1}{2^{L-l+1}} \mathcal{I}^l. \tag{3}$$

Both the histogram intersection and the pyramid match kernel are Mercer kernels [7].

3.2. Spatial Matching Scheme

As introduced in [7], a pyramid match kernel works with an orderless image representation. It allows for precise matching of two collections of features in a high-dimensional appearance space, but discards all spatial information. This paper advocates an "orthogonal" approach: perform pyramid matching in the two-dimensional image space, and use traditional clustering techniques in feature space. Specifically, we quantize all feature vectors into $M$ discrete types, and make the simplifying assumption that only features of the same type can be matched to one another. Each channel $m$ gives us two sets of two-dimensional vectors, $X_m$ and $Y_m$, representing the coordinates of features of type $m$ found in the respective images. The final kernel is then the sum of the separate channel kernels:

$$K^L(X, Y) = \sum_{m=1}^{M} \kappa^L(X_m, Y_m). \tag{4}$$

This approach has the advantage of maintaining continuity with the popular "visual vocabulary" paradigm; in fact, it reduces to a standard bag of features when $L = 0$.

Because the pyramid match kernel (3) is simply a weighted sum of histogram intersections, and because $c \min(a, b) = \min(ca, cb)$ for positive numbers, we can implement $K^L$ as a single histogram intersection of "long" vectors formed by concatenating the appropriately weighted histograms of all channels at all resolutions (Fig. 1). For $L$ levels and $M$ channels, the resulting vector has dimensionality $M \sum_{l=0}^{L} 4^l = M \frac{1}{3}\left(4^{L+1} - 1\right)$.
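The "long vector" construction can be sketched in a few lines of NumPy. This is a minimal illustration rather than the authors' implementation: the function names, the assumption that feature coordinates are pre-scaled to the unit square, and the use of `np.bincount` for binning are all choices made here for the example, and normalization is left out. The level weights are read off eq. (3).

```python
import numpy as np

def pyramid_dim(M, L):
    """Length of the concatenated vector: M * sum_{l=0}^{L} 4^l."""
    return M * (4 ** (L + 1) - 1) // 3

def level_weight(l, L):
    """Weight of level l in eq. (3): 1/2^L for l = 0, else 1/2^(L-l+1)."""
    return 1.0 / 2 ** L if l == 0 else 1.0 / 2 ** (L - l + 1)

def spatial_pyramid(xy, types, M, L):
    """Weighted, concatenated histogram of one image.

    xy:    (n, 2) float array of feature coordinates, assumed scaled to [0, 1]
    types: (n,) int array of visual-word indices in [0, M)
    """
    parts = []
    for l in range(L + 1):
        g = 2 ** l                                        # g x g grid at level l
        cells = np.minimum((xy * g).astype(int), g - 1)   # clamp coordinates equal to 1.0
        flat = (types * g + cells[:, 1]) * g + cells[:, 0]
        hist = np.bincount(flat, minlength=M * g * g).astype(float)
        parts.append(level_weight(l, L) * hist)
    return np.concatenate(parts)

def intersection(a, b):
    """Histogram intersection of two weighted pyramid vectors."""
    return float(np.minimum(a, b).sum())
```

Calling `intersection(spatial_pyramid(...), spatial_pyramid(...))` on two images then evaluates the summed channel kernels in one pass, and `pyramid_dim(M, L)` gives the vector length from the formula above. With `L = 0` the single weight is 1 and the vector is an ordinary bag-of-features histogram.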
Several experiments reported in Section 5 use the settings of $M = 400$ and $L = 3$, resulting in 34000-dimensional histogram intersections. However, these operations are efficient because the histogram vectors are extremely sparse (in fact, just as in [7], the computational complexity of the kernel is linear in the number of features). It must also be noted that we did not observe any significant increase in performance beyond $M = 200$ and $L = 2$, where the concatenated histograms are only 4200-dimensional.

Footnote: In principle, it is possible to integrate geometric information directly into the original pyramid matching framework by treating image coordinates as two extra dimensions in the feature space.

[Figure 1 image; panels labeled level 0, level 1, level 2.]
Figure 1. Toy example of constructing a three-level pyramid. The image has three feature types, indicated by circles, diamonds, and crosses. At the top, we subdivide the image at three different levels of resolution. Next, for each level of resolution and each channel, we count the features that fall in each spatial bin. Finally, we weight each spatial histogram according to eq. (3).

The final implementation issue is that of normalization. For maximum computational efficiency, we normalize all histograms by the total weight of all features in the image, in effect forcing the total number of features in all images to be the same. Because we use a dense feature representation (see Section 4), and thus do not need to worry about spurious feature detections resulting from clutter, this practice is sufficient to deal with the effects of variable image size.

4. Feature Extraction

This section briefly describes the two kinds of features used in the experiments of Section 5. First, we have so-called "weak features," which are oriented edge points, i.e., points whose gradient magnitude in a given direction exceeds a minimum threshold. We extract edge points at two scales and eight orientations, for a total of $M = 16$ channels. We designed these features to obtain a representation similar to the "gist" [21] or to a global SIFT descriptor [12] of the image.

For better discriminative power, we also utilize higher-dimensional "strong features," which are SIFT descriptors of […] pixel patches computed over a grid with spacing of […] pixels. Our decision to use a dense regular grid instead of interest points was based on the comparative evaluation of Fei-Fei and Perona [4], who have shown that dense features work better for scene classification. Intuitively, a dense image description is necessary to capture uniform regions such as sky, calm water, or road surface (to deal with low-contrast regions, we skip the usual SIFT normalization procedure when the overall gradient magnitude of the patch is too weak). We perform $k$-means clustering of a random subset of patches from the training set to form a visual vocabulary. Typical vocabulary sizes for our experiments are $M = 200$ and $M = 400$.

[Figure 2 image; surviving panel labels: office, kitchen, living room, bedroom, store, industrial, tall building, inside city, highway, open country, suburb.]
Figure 2. Example images from the scene category database. The starred categories originate from Oliva and Torralba [13].

Table 1. Classification results for the scene category database (see text). The highest results for each kind of feature are shown in bold. Only the standard deviations survive in this transcript; missing means are marked […].

             Weak features (M = 16)      Strong features (M = 200)   Strong features (M = 400)
L            Single-level  Pyramid       Single-level  Pyramid       Single-level  Pyramid
0 (1 × 1)    […] ±0.5                    […] ±0.6                    […] ±0.3
1 (2 × 2)    […] ±0.3      […] ±0.6      […] ±0.6      […] ±0.5      […] ±0.4      […] ±0.5
2 (4 × 4)    […] ±0.6      […] ±0.7      […] ±0.3      […] ±0.3      […] ±0.5      […] ±0.5
3 (8 × 8)    […] ±0.8      […] ±0.6      […] ±0.4      […] ±0.3      […] ±0.5      […] ±0.6

5. Experiments

In this section, we report results on three diverse datasets: fifteen scene categories [4], Caltech-101 [3], and Graz [14]. We perform all processing in grayscale, even when color images are available. All experiments are repeated ten times with different randomly selected training and test images, and the average of per-class recognition rates is recorded for each run. The final result is reported as the mean and standard deviation of the results from the individual runs. Multi-class classification is done with a support vector machine (SVM) trained using the one-versus-all rule: a classifier is learned to separate each class from the rest, and a test image is assigned the label of the classifier with the highest response.
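The evaluation protocol above can be sketched as follows. This is an illustrative fragment under stated assumptions, not the authors' code: the `scores` matrix of per-class classifier responses is taken as given (no SVM is trained here), and all function names are hypothetical.

```python
import numpy as np

def one_vs_all_predict(scores):
    """scores[i, c] is the response of the class-c classifier on test image i;
    each image receives the label of the classifier with the highest response."""
    return np.argmax(scores, axis=1)

def mean_per_class_rate(y_true, y_pred, n_classes):
    """Average of the per-class recognition rates, so every class counts equally."""
    rates = [np.mean(y_pred[y_true == c] == c)
             for c in range(n_classes) if np.any(y_true == c)]
    return float(np.mean(rates))

def overall_rate(y_true, y_pred):
    """Fraction of all test images classified correctly."""
    return float(np.mean(y_true == y_pred))

def summarize_runs(run_rates):
    """Mean and standard deviation over repeated random train/test splits."""
    r = np.asarray(run_rates, dtype=float)
    return float(r.mean()), float(r.std())
```

The two rates differ when classes are unbalanced: with four test images of one class and one of another, a classifier that always answers the first class scores 0.8 overall but only 0.5 when per-class rates are averaged.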
Footnote: The alternative performance measure, the percentage of all test images classified correctly, can be biased if test set sizes for different classes vary significantly. This is especially true of the Caltech-101 dataset, where some of the "easiest" classes are disproportionately large.

5.1. Scene Category Recognition

Our first dataset (Fig. 2) is composed of fifteen scene categories: thirteen were provided by Fei-Fei and Perona [4] (eight of these were originally collected by Oliva and Torralba [13]), and two (industrial and store) were collected by ourselves. Each category has 200 to 400 images, and average image size is […] pixels. The major sources of the pictures in the dataset include the COREL collection, personal photographs, and Google image search. This is one of the most complete scene category datasets used in the literature thus far.

Table 1 shows detailed results of classification experiments using 100 images per class for training and the rest for testing (the same setup as [4]). First, let us examine the performance of strong features for $L = 0$ and $M = 200$, corresponding to a standard bag of features. Our classification rate is […] ([…] for the 13 classes inherited from Fei-Fei and Perona), which is much higher than their best results of […], achieved with an orderless method and a feature set comparable to ours. We conjecture that Fei-Fei and Perona's approach is disadvantaged by its reliance on latent Dirichlet allocation (LDA) [2], which is essentially an unsupervised dimensionality reduction technique and as such, is not necessarily conducive to achieving the highest classification accuracy. To verify this, we have experimented with probabilistic latent semantic analysis (pLSA) [9], which attempts to explain the distribution of features in the image as a mixture of a few "scene topics" or "aspects" and performs very similarly to LDA in practice [17]. Following the scheme of Quelhas et al. [15], we run pLSA in an unsupervised setting to learn a 60-aspect model of half the training images. Next, we apply this model to the other half to obtain probabilities of topics given each image (thus reducing the dimensionality of the feature space from 200 to 60). Finally, we train the SVM on these reduced features and use them to classify the test set. In this setup, our average classification rate drops to […] from the original […]. For the 13 classes inherited from Fei-Fei and Perona, it drops to […], which is now very similar to their results. Thus, we can see that latent factor analysis techniques can adversely affect classification performance, which is also consistent with the results of Quelhas et al. [15].

[Figure 3 image; surviving diagonal entries: office […], kitchen 68.5, living room 60.4, bedroom 68.3, store 76.2, industrial 65.4, tall building […], inside city 80.5, street 90.2, highway 86.6, coast 82.4, open country 70.5, mountain 88.8, forest 94.7, suburb 99.4.]
Figure 3. Confusion table for the scene category dataset. Average classification rates for individual classes are listed along the diagonal. The entry in the $i$th row and $j$th column is the percentage of images from class $i$ that were misidentified as class $j$.

Next, let us examine the behavior of spatial pyramid matching. For completeness, Table 1 lists the performance achieved using just the highest level of the pyramid (the "single-level" columns), as well as the performance of the complete matching scheme using multiple levels (the "pyramid" columns). For all three kinds of features, results improve dramatically as we go from $L = 0$ to a multi-level setup. Though matching at the highest pyramid level seems to account for most of the improvement, using all the levels together confers a statistically significant benefit. For strong features, single-level performance actually drops as we go from $L = 2$ to $L = 3$. This means that the highest level of the $L = 3$ pyramid is too finely subdivided, with individual bins yielding too few matches. Despite the diminished discriminative power of the highest level, the performance of the entire $L = 3$ pyramid remains essentially identical to that of the $L = 2$ pyramid. This, then, is the main advantage of the spatial pyramid representation: because it combines multiple resolutions in a principled fashion, it is robust to failures at individual levels.

It is also interesting to compare performance of different feature sets. As expected, weak features do not perform as well as strong features, though in combination with the spatial pyramid, they can also achieve acceptable levels of accuracy (note that because weak features have a much higher density and much smaller spatial extent than strong features, their performance continues to improve as we go from $L = 2$ to $L = 3$). Increasing the visual vocabulary size from $M = 200$ to $M = 400$ results in a small performance increase at $L = 0$, but this difference is all but eliminated at higher pyramid levels. Thus, we can conclude that the coarse-grained geometric cues provided by the pyramid have more discriminative power than an enlarged visual vocabulary. Of course, the optimal way to exploit structure both in the image and in the feature space may be to combine them in a unified multiresolution framework; this is subject for future research.

Fig. 3 shows a confusion table between the fifteen scene categories. Not surprisingly, confusion occurs between the indoor classes (kitchen, bedroom, living room), and also between some natural classes, such as coast and open country. Fig. 4 shows examples of image retrieval using the spatial pyramid kernel and strong features with $M = 200$. These examples give a sense of the kind of visual information captured by our approach. In particular, spatial pyramids seem successful at capturing the organization of major pictorial elements or "blobs," and the directionality of dominant lines and edges. Because the pyramid is based on features computed at the original image resolution, even high-frequency details can be preserved. For example, query image (b) shows white kitchen cabinet doors with dark borders. Three of the retrieved "kitchen" images contain similar cabinets, the "office" image shows a wall plastered with white documents in dark frames, and the "inside city" image shows a white building with darker window frames.

[Figure 4 image; query classes, with the surviving labels of incorrectly retrieved images: (a) kitchen: living room, living room, living room, office, living room, living room, living room, living room; (b) kitchen: office, inside city; (c) store: mountain, forest; (d) tall bldg: inside city, inside city; (e) tall bldg: inside city, mountain, mountain, mountain; (f) inside city: tall bldg; (g) street.]
Figure 4. Retrieval from the scene category database. The query images are on the left, and the eight images giving the highest values of the spatial pyramid kernel (for $M = 200$) are on the right. The actual class of incorrectly retrieved images is listed below them.

5.2. Caltech-101

Our second set of experiments is on the Caltech-101 database [3] (Fig. 5). This database contains from 31 to 800 images per category. Most images are medium resolution, i.e., about […] pixels. Caltech-101 is probably the most diverse object database available today, though it is not without shortcomings. Namely, most images feature relatively little clutter, and the objects are centered and occupy most of the image. In addition, a number of categories, such as minaret (see Fig. 5), are affected by "corner" artifacts resulting from artificial image rotation. Though these artifacts are semantically irrelevant, they can provide stable cues resulting in misleadingly high recognition rates.

We follow the experimental setup of Grauman and Darrell [7] and J. Zhang et al. [25], namely, we train on 30 images per class and test on the rest. For efficiency, we limit the number of test images to 50 per class. Note that, because some categories are very small, we may end up with just a single test image per class. Table 2 gives a breakdown of classification rates for different pyramid levels for weak features and strong features with $M = 200$. The results for $M = 400$ are not shown, because just as for the scene category database, they do not bring any significant improvement. For $L = 0$, strong features give […], which is slightly below the […] reported by Grauman and Darrell. Our best result is […], achieved with strong features at […]. This exceeds the highest classification rate previously published, that of […] reported by J. Zhang et al. [25]. Berg et al. [1] report […] accuracy using 15 training images per class. Our average recognition rate with this setup is […]. The behavior of weak features on this database is also noteworthy: for $L = 0$, they give a classification rate of […], which is consistent with a naive graylevel correlation baseline [1], but in conjunction with a four-level spatial pyramid, their performance rises to […], on par with the best results in the literature.

Footnote: See, however, H. Zhang et al. [24] in these proceedings, for an algorithm that yields a classification rate of […]% for 30 training examples, and […]% for 15 examples.

Fig. 5 shows a few of the "easiest" and "hardest" object classes for our method. The successful classes are either dominated by rotation artifacts (like minaret), have very little clutter (like windsor chair), or represent coherent natural "scenes" (like joshua tree and okapi). The least successful classes are either textureless animals (like beaver and cougar), animals that camouflage well in their environment (like crocodile), or "thin" objects (like ant). Table 3 shows the top five of our method's confusions, all of which are between closely related classes.

[Figure 5 image; top row: minaret, windsor chair, joshua tree, okapi; bottom row: cougar body, beaver, crocodile, ant. The per-class rates did not survive.]
Figure 5. Caltech-101 results. Top: some classes on which our method ($M = 200$) achieved high performance. Bottom: some classes on which our method performed poorly.

Table 2. Classification results for the Caltech-101 database. Only the standard deviations survive in this transcript; missing means are marked […].

     Weak features               Strong features (200)
L    Single-level  Pyramid       Single-level  Pyramid
0    […] ±0.9                    […] ±1.2
1    […] ±1.2      […] ±1.3      […] ±0.9      […] ±0.8
2    […] ±1.1      […] ±1.4      […] ±0.9      […] ±0.8
3    […] ±0.8      […] ±1.1      […] ±0.9      […] ±0.7

Table 3. Top five confusions for our method ($M = 200$) on the Caltech-101 database.

class 1 / class 2            class 1 misclassified    class 2 misclassified
                             as class 2               as class 1
ketch / schooner             21.6                     14.8
lotus / water lily           15.3                     20.0
crocodile / crocodile head   10.5                     10.0
crayfish / lobster           […]                      9.1
flamingo / ibis              9.5                      10.4

To summarize, our method has outperformed both state-of-the-art orderless methods [7, 25] and methods based on precise geometric correspondence [1]. Significantly, all these methods rely on sparse features (interest points or sparsely sampled edge points). However, because of the geometric stability and lack of clutter of Caltech-101, dense features combined with global spatial relations seem to capture more discriminative information about the objects.

5.3. The Graz Dataset

As seen from Sections 5.1 and 5.2, our proposed approach does very well on global scene classification tasks, or on object recognition tasks in the absence of clutter with most of the objects assuming "canonical" poses. However, it was not designed to cope with heavy clutter and pose changes. It is interesting to see how well our algorithm can do by exploiting the global scene cues that still remain under these conditions. Accordingly, our final set of experiments is on the Graz dataset [14] (Fig. 6), which is characterized by high intra-class variation. This dataset has two object classes, bikes (373 images) and persons (460 images), and a background class (270 images). The image resolution is […], and the range of scales and poses at which exemplars are presented is very diverse, e.g., a "person" image may show a pedestrian in the distance, a side view of a complete body, or just a closeup of a head. For this database, we perform two-class detection (object vs. background) using an experimental setup consistent with that of Opelt et al. [14]. Namely, we train detectors for persons and bikes on 100 positive and 100 negative images (of which 50 are drawn from the other object class and 50 from the background), and test on a similarly distributed set. We generate ROC curves by thresholding raw SVM output, and report the ROC equal error rate averaged over ten runs.

Table 4 summarizes our results for strong features with $M = 200$. Note that the standard deviation is quite high because the images in the database vary greatly in their level of difficulty, so the performance for any single run is dependent on the composition of the training set (in particular, for […], the performance for bikes ranges from […] to […]). For this database, the improvement from $L = 0$ to $L = 2$ is relatively small. This makes intuitive sense: when a class is characterized by high geometric variability, it is difficult to find useful global features. Despite this disadvantage of our method, we still achieve results very close to those of Opelt et al. [14], who use a sparse, locally invariant feature representation. In the future, we plan to combine spatial pyramids with invariant features for improved robustness against geometric changes.

Table 4. Results of our method ($M = 200$) for the Graz database and comparison with two existing methods. Values that did not survive in this transcript are marked […].

Class     L = 0         L = 2         Opelt [14]   Zhang [25]
Bikes     […] ±2.0      […] ±2.5      […]          92.0
People    79.5 ±2.3     […] ±3.1      […]          88.0

[Figure 6 image; panel labels: bike, person, background.]
Figure 6. The Graz database.

6. Discussion

This paper has presented a "holistic" approach for image categorization based on a modification of pyramid match kernels [7]. Our method, which works by repeatedly subdividing an image and computing histograms of image features over the resulting subregions, has shown promising results on three large-scale, diverse datasets. Despite the simplicity of our method, and despite the fact that it works not by constructing explicit object models, but by using global cues as indirect evidence about the presence of an object, it consistently achieves an improvement over an orderless image representation. This is not a trivial accomplishment, given that a well-designed bag-of-features method can outperform more sophisticated approaches based on parts and relations [25]. Our results also underscore the surprising and ubiquitous power of global scene statistics: even in highly variable datasets, such as Graz, they can still provide useful discriminative information. It is important to develop methods that take full advantage of this information, either as stand-alone scene categorizers, as "context" modules within larger object recognition systems, or as tools for evaluating biases present in newly collected datasets.

Acknowledgments. This research was partially supported by the National Science Foundation under grants IIS-0308087 and IIS-0535152, and the UIUC/CNRS/INRIA collaboration agreement.

References

[1] A. Berg, T. Berg, and J. Malik. Shape matching and object recognition using low distortion correspondences. In Proc. […], volume 1, pages 26-33, 2005.
[2] D. Blei, A. Ng, and M. Jordan. Latent Dirichlet allocation. Journal of Machine Learning Research, 3:993-1022, 2003.
[3] L. Fei-Fei, R. Fergus, and P. Perona. Learning generative visual models from few training examples: an incremental Bayesian approach tested on 101 object categories. In CVPR Workshop on Generative-Model Based Vision, 2004. http://www.vision.caltech.edu/Image
[4] L. Fei-Fei and P. Perona. A Bayesian hierarchical model for learning natural scene categories. In Proc. CVPR, 2005.
[5] R. Fergus, P. Perona, and A. Zisserman. Object class recognition by unsupervised scale-invariant learning. In Proc. […], volume 2, pages 264-271, 2003.
[6] M. Gorkani and R. Picard. Texture orientation for sorting photos "at a glance". In IAPR International Conference on Pattern Recognition, volume 1, pages 459-464, 1994.
[7] K. Grauman and T. Darrell. Pyramid match kernels: Discriminative classification with sets of image features. In Proc. ICCV, 2005.
[8] E. Hadjidemetriou, M. Grossberg, and S. Nayar. Multiresolution histograms and their use in recognition. IEEE Trans. PAMI, 26(7):831-847, 2004.
[9] T. Hofmann. Unsupervised learning by probabilistic latent semantic analysis. Machine Learning, 42(1):177-196, 2001.
[10] J. Koenderink and A. V. Doorn. The structure of locally orderless images. […], 31(2/3):159-168, 1999.
[11] S. Lazebnik, C. Schmid, and J. Ponce. A maximum entropy framework for part-based texture and object recognition. In Proc. ICCV, 2005.
[12] D. Lowe. Towards a computational model for object recognition in IT cortex. In Biologically Motivated Computer Vision, pages 20-31, 2000.
[13] A. Oliva and A. Torralba. Modeling the shape of the scene: a holistic representation of the spatial envelope. […], 42(3):145-175, 2001.
[14] A. Opelt, M. Fussenegger, A. Pinz, and P. Auer. Weak hypotheses and boosting for generic object detection and recognition. In Proc. ECCV, volume 2, pages 71-84, 2004. http://www.emt.tugraz.at/~pinz/data.
[15] P. Quelhas, F. Monay, J.-M. Odobez, D. Gatica, T. Tuytelaars, and L. V. Gool. Modeling scenes with local descriptors and latent aspects. In Proc. ICCV, 2005.
[16] B. Schiele and J. Crowley. Recognition without correspondence using multidimensional receptive field histograms. […], 36(1):31-50, 2000.
[17] J. Sivic, B. Russell, A. Efros, A. Zisserman, and W. Freeman. Discovering objects and their location in images. In Proc. […], 2005.
[18] D. Squire, W. Muller, H. Muller, and J. Raki. Content-based query of image databases, inspirations from text retrieval: inverted files, frequency-based weights and relevance feedback. In Proceedings of the 11th Scandinavian conference on image analysis, pages 143-149, 1999.
[19] M. Swain and D. Ballard. Color indexing. […], 7(1):11-32, […].
[20] M. Szummer and R. Picard. Indoor-outdoor image classification. In IEEE International Workshop on Content-Based Access of Image and Video Databases, pages 42-51, 1998.
[21] A. Torralba, K. P. Murphy, W. T. Freeman, and M. A. Rubin. Context-based vision system for place and object recognition. In Proc. ICCV, 2003.
[22] C. Wallraven, B. Caputo, and A. Graf. Recognition with local features: the kernel recipe. In Proc. ICCV, volume 1, pages 257-264, 2003.
[23] J. Willamowski, D. Arregui, G. Csurka, C. R. Dance, and L. Fan. Categorizing nine visual classes using local appearance descriptors. In ICPR Workshop on Learning for Adaptable Visual Systems, 2004.
[24] H. Zhang, A. Berg, M. Maire, and J. Malik. SVM-KNN: Discriminative nearest neighbor classification for visual category recognition. In Proc. CVPR, 2006.
[25] J. Zhang, M. Marszalek, S. Lazebnik, and C. Schmid. Local features and kernels for classification of texture and object categories: An in-depth study. Technical Report RR-5737, INRIA Rhône-Alpes, 2005.