Combining Randomization and Discrimination for Fine-Grained Image Categorization

Bangpeng Yao*, Aditya Khosla*, Li Fei-Fei
Computer Science Department, Stanford University, Stanford, CA
(*Bangpeng Yao and Aditya Khosla contributed equally to this paper.)

Abstract

In this paper, we study the problem of fine-grained image categorization. The goal of our method is to explore fine image statistics and identify the discriminative image patches for recognition. We achieve this goal by combining discriminative feature mining and randomization. [...] Our method significantly improves the strength of the decision trees in the random forest while still maintaining low correlation between the trees. This allows our method to achieve low generalization error according to the theory of random forests [3]. We evaluate our method on two fine-grained categorization tasks: human activity recognition in still images [ ] and subordinate categorization of closely related animal species [21], outperforming state-of-the-art results. Furthermore, our method identifies semantically meaningful image patches that closely match human intuition. Additionally, our method tends to automatically generate a coarse-to-fine structure of discriminative image regions, which parallels the human visual system [4].

The remaining part of this paper is organized as follows: Sec. 2 discusses related work. Sec. 3 describes our dense feature space, and Sec. 4 describes our algorithm for mining this space. Experimental results are discussed in Sec. 5, and Sec. 6 concludes the paper.

2. Related Work

Image classification has been studied for many years. Most of the existing work focuses on basic-level categorization such as objects [ ] or scenes [ ]. In this paper we focus on fine-grained image categorization [ ], which requires an approach that captures the fine and detailed information in images.

In this work, we explore a dense feature representation to distinguish fine-grained image classes. Our previous work has shown the advantage of dense features ("Grouplet" features [22]) in classifying human activities. Instead of using the generative local features as in Grouplet, here we consider a richer feature space in a discriminative setting where both local and global visual information are fused together. Inspired by [6], our approach also considers pairwise interactions between image regions.

We use a random forest framework to identify discriminative image regions. Random forests have been used successfully in many vision tasks such as object detection [ ], segmentation [17], and codebook learning [15]. Inspired by [18], we combine discriminative training and randomization to obtain an effective classifier with good generalizability. Our method differs from [18] in that for each tree node, we train an SVM classifier from one of the randomly sampled image regions, instead of using AdaBoost to combine weak features from a fixed set of regions. This allows us to explore an extremely large feature set efficiently.

A classical image classification framework [ ] consists of feature extraction, coding, pooling, and classification; feature extraction (e.g. SIFT [14]) and better coding and pooling methods [ ] have been extensively studied for object recognition. In this work, we use discriminative feature mining and randomization to propose a new feature exploration approach, and demonstrate its effectiveness on fine-grained image categorization tasks.

Figure 2. (a) Illustration of our dense sampling space. We densely sample rectangular image patches with varying widths and heights. The regions are closely located and have significant overlaps. The red dots denote the centers of the patches, and the arrows indicate the increment of the patch width or height. (The actual density of regions considered in our algorithm is significantly higher; this figure has been simplified for visual clarity.) We note that the regions considered by Spatial Pyramid Matching [13] are a very small subset lying along the diagonal of the height-width plane that we consider. (b) Illustration of some image patches that may be discriminative for "playing guitar". All those patches can be sampled from our dense sampling space.

3. Dense Sampling Space

Our algorithm aims to identify fine image statistics that are useful for fine-grained categorization. For example, in order to classify whether a human is playing a guitar or holding a guitar without playing it, we want to use the image patches below the human face that are closely related to the human-guitar interaction (Fig. 2). An algorithm that can reliably locate such regions is expected to achieve high classification accuracy. We achieve this goal by searching over rectangular image patches of arbitrary width, height, and image location. We refer to this extensive set of image regions as the dense sampling space, as shown in Fig. 2. Furthermore, to capture more discriminative distinctions, we consider interactions between pairs of arbitrary patches. The pairwise interactions are modeled by applying concatenation, absolute of difference, or intersection between the feature representations of two image patches.

However, the dense sampling space is extremely large. Sampling image patches of a single fixed size in an image every four pixels leads to thousands of patches. This increases many-fold when considering regions with arbitrary widths and heights. Further considering pairwise interactions of image patches effectively leads to trillions of features for each image. In addition, there is much noise and redundancy in this feature set. On the one hand, many image patches are not discriminative for distinguishing different image classes. On the other hand, the image patches are highly overlapped in the dense sampling space, which introduces significant redundancy among these features. Therefore, it is challenging to explore this high-dimensional, noisy, and redundant feature space. In this work, we address this issue using randomization.

4. Random Forest with Discriminative Decision Trees

In order to explore the dense sampling feature space for fine-grained visual categorization, we combine two concepts: (1) discriminative training to extract the information in the image patches effectively; (2) randomization to explore the dense feature space efficiently. Specifically, we adopt a random forest [3] framework where each tree node is a discriminative classifier that is trained on one or a pair of image patches. In our setting, the discriminative training and randomization can benefit from each other. We summarize the advantages of our method below:

- The random forest framework allows us to consider a subset of the image regions at a time, which allows us to explore the dense sampling space efficiently in a principled way.
- Random forest selects a best image patch in each node, and therefore it can remove the noise-prone image patches and reduce the redundancy in the feature set.
- By using discriminative classifiers to train the tree nodes, our random forest has much stronger decision trees with small correlation. This allows our method to have low generalization error (Sec. 4.4) compared with the traditional random forest [3], which uses weak classifiers in the tree nodes.

An overview of the random forest framework we use is shown in Algorithm 1. In the following sections, we first describe this framework (Sec. 4.1). Then we elaborate on our feature sampling (Sec. 4.2) and split learning (Sec. 4.3) strategies in detail, and describe the generalization theory [3] of random forests, which guarantees the effectiveness of our algorithm (Sec. 4.4).

4.1. The Random Forest Framework

Random forest is a multi-class classifier consisting of an ensemble of decision trees where each tree is constructed via some randomization. As illustrated in Fig. 3, the leaf nodes of each tree encode a distribution over the image classes. All internal nodes contain a binary test that splits the data and sends the splits to its children nodes. The splitting is stopped when a leaf node is encountered. An image is classified by descending each tree and combining the leaf distributions from all the trees. This method allows the flexibility to explore a large feature space effectively because it only considers a subset of features in every tree node.

Figure 3. Comparison of (a) conventional random decision trees with (b) our discriminative decision trees. Solid blue arrows show binary splits of the data. Dotted lines from the shaded image regions indicate the region used at each node. Conventional decision trees use information from the entire image at each node, which encodes no spatial or structural information, while our decision trees sample single or multiple image regions from the dense sampling space (Sec. 3). The histograms below the leaf nodes illustrate the posterior probability distributions. In (b), dotted red arrows between nodes show our nested tree structure that allows information to flow in a top-down manner. Our approach uses strong classifiers in each node (Sec. 4.3), while the conventional method uses weak classifiers.

Each tree returns the posterior probability of an example belonging to the given classes. The posterior probability of a particular class at each leaf node is learned as the proportion of the training images belonging to that class at the given leaf node. The posterior probability of class c at leaf l of tree t is denoted as p_{t,l}(c). Thus, a test image can be classified by averaging the posterior probability from the leaf node of each tree:

    c* = argmax_c (1/T) * sum_{t=1}^{T} p_{t,l_t}(c),

where c* is the predicted class label, T is the total number of trees, and l_t is the leaf node that the image falls into in tree t. In the following sections, we describe the process of obtaining the trees using our algorithm. Readers can refer to previous works [ ] for more details of the conventional decision tree learning procedure.

4.2. Sampling the Dense Feature Space

As shown in Fig. 3, each internal node in our decision tree corresponds to a single or a pair of rectangular image regions that are sampled from the dense sampling space (Sec. 3), where the regions can have many possible widths, heights, and image locations. In order to sample a candidate image region, we first normalize all images to unit width and height, and then randomly sample coordinates (x1, y1) and (x2, y2) from a uniform distribution. These coordinates specify two diagonally opposite vertices of a rectangular region. Such regions could correspond to small areas of the image (e.g. the purple bounding boxes in Fig. 2) or even the complete image. This allows our method to capture both global and local information in the image.

Algorithm 1: Overview of the process of growing decision trees in the random forest framework.
for each tree:
  - Obtain a random set of training examples.
  - While a node needs to split:
    i.  Randomly sample the candidate (pairs of) image regions (Sec. 4.2).
    ii. Select the best region and split the data into two sets (Sec. 4.3).
    iii. Repeat for the current leaf nodes.

In our approach, each sampled image region is represented by a histogram of visual descriptors. For a pair of regions, the feature representation is formed by applying histogram operations (e.g. concatenation, intersection, etc.) to the histograms obtained from both regions. Furthermore, the features are augmented with the decision value (described in Sec. 4.3) of this image from its parent node (indicated by the dashed red lines in Fig. 3). Therefore, our feature representation combines the information of all upstream tree nodes that the corresponding image has descended from. We refer to this idea as "nesting". Using feature sampling and nesting, we obtain a candidate set of features, each corresponding to a candidate image region of the current node.

Implementation details. Our method is flexible to use many different visual descriptors. In this work, we densely extract SIFT [14] descriptors on each image with a spacing of four pixels. The scales of the grids to extract descriptors are 8, 12, 16, 24, and 30. Using k-means clustering, we construct a vocabulary of codewords.¹ Then, we use Locality-constrained Linear Coding [20] to assign the descriptors to codewords. A bag-of-words histogram representation is used if the area of the patch is smaller than 0.2, while a 2-level or 3-level spatial pyramid is used if the area is between 0.2 and 0.8 or larger than 0.8, respectively. During sampling (step i of Algorithm 1), we consider four settings of image patches: a single image patch and three types of pairwise interactions (concatenation, intersection, and absolute of difference of the two histograms). We sample 25 and 50 image regions (or pairs of regions) in the root node and the first-level nodes respectively, and sample 100 regions (or pairs of regions) in all other nodes. Sampling a smaller number of image patches in the root can reduce the correlation between the resulting trees.

¹A dictionary size of 1024, 256, 256 is used for the PASCAL action [7], PPMI [22], and Caltech-UCSD Birds [21] datasets, respectively.

4.3. Learning the Splits

In this section, we describe the process of learning the binary splits of the data using SVM (step ii in Algorithm 1). This is achieved in two steps: (1) randomly assigning all examples from each class to a binary label; (2) using SVM to learn a binary split of the data. Assume that we have K classes of images at a given node. We uniformly sample K binary variables b_1, ..., b_K, and assign all examples of a particular class k the class label b_k. As each node performs a binary split of the data, this allows us to learn a simple binary SVM at each node. This improves the scalability of our method to a large number of classes and results in well-balanced trees. Using the feature x of an image region (or pairs of regions) as described in Sec. 4.2, we find a binary split of the data:

    if w'x >= 0, go to left child; otherwise, go to right child,

where w is the set of weights learned from a linear SVM.

We evaluate each binary split that corresponds to an image region or pairs of regions with the information gain criteria [ ], which is computed from the complete training images that fall at the current tree node. The splits that maximize the information gain are selected, and the splitting process (step iii in Algorithm 1) is repeated with the new splits of the data. The tree splitting stops if a pre-specified maximum tree depth has been reached, or the information gain of the current node is smaller than a threshold, or the number of samples in the current node is small.

4.4. Generalization Error of Random Forests

In [3], it has been shown that an upper bound for the generalization error of a random forest is given by

    GE <= rho_bar * (1 - s^2) / s^2,

where s is the strength of the decision trees in the forest, and rho_bar is the mean correlation between the trees. Therefore, the generalization error of a random forest can be reduced by making the decision trees stronger or by reducing the correlation between the trees.

In our approach, we learn discriminative SVM classifiers for the tree nodes. Therefore, compared to the traditional random forests where the tree nodes are weak classifiers of randomly generated feature weights [3], our decision trees are much stronger. Furthermore, since we are considering an extremely dense feature space, each decision tree only considers a relatively small subset of image patches. This means there is little correlation between the trees. Therefore, our random forest with discriminative decision trees algorithm can achieve very good performance on fine-grained image classification, where exploring fine image statistics discriminatively is important. In Sec. 5.4 we show the strength and correlation of different settings of random forests with respect to the number of decision trees, which justifies the above arguments. Please refer to [3] for details about how to compute the strength and correlation values for a random forest.

Method        Phoning  Playing     Reading  Riding  Riding  Running  Taking  Using     Walking  Overall
                       instrument           bike    horse            photo   computer
CVC-BASE       56.2     56.5        34.7     75.1    83.6    86.5     25.4    60.0      69.2     60.8
CVC-SEL        49.8     52.8        34.3     74.2    85.5    85.1     24.9    64.1      72.5     60.4
SURREY-KDA     52.6     53.5        35.9     81.0    89.3    86.5     32.8    59.2      68.6     62.2
UCLEAR-DOSP    47.0     57.8        26.9     78.8    89.7    87.3     32.5    60.0      70.1     61.1
UMCO-KSVM      53.5     43.0        32.0     67.9    68.8    83.0     34.1    45.9      60.4     54.3
Our Method     45.0     57.4        41.5     81.8    90.5    89.5     37.9    65.0      72.7     64.6

Table 1. Comparison of the average precision (%) of our method with the winners of the PASCAL VOC 2010 action classification challenge [7]. Each row shows the results obtained from one method. The best results are highlighted with bold fonts.

5. Experiments

In this section, we evaluate our algorithm on three fine-grained image datasets: the action classification dataset in PASCAL VOC 2010 [7] (Sec. 5.1), actions of people-playing-musical-instrument (PPMI) [22] (Sec. 5.2), and a subordinate object categorization dataset of 200 bird species [21] (Sec. 5.3). Experimental results show that our algorithm outperforms state-of-the-art methods on these datasets. We also evaluate the strength and correlation of the decision trees in our method, and compare the result with the other settings of random forests to show why our method can lead to better classification performance (Sec. 5.4).

5.1. PASCAL Action Classification

The most recent PASCAL VOC challenge [7] incorporated the task of recognizing actions in still images. The images describe nine common human activities: "Phoning", "Playing a musical instrument", "Reading", "Riding a bicycle or motorcycle", "Riding a horse", "Running", "Taking a photograph", "Using a computer", and "Walking". Each person that we need to classify is indicated by a bounding box and is annotated with one of the nine actions they are performing. There are 4090 training/validation images and a similar number of testing images for each class.

As in [ ], we obtain a foreground image for each person by extending the bounding box of the person beyond its original size, and resizing it such that the larger dimension is 300 pixels. We also resize the original image accordingly. Therefore for each person, we have a "person image" as well as a "background image".
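Looking back at the tree-growing procedure (Secs. 4.2-4.3), the per-node logic is: sample candidate regions from the dense space, randomly relabel the K classes with binary labels, fit a linear split, and keep the candidate that maximizes information gain. The sketch below is only an illustration under simplifying assumptions, not the authors' code: region features are treated as given vectors, and a least-squares linear classifier stands in for the paper's linear SVM so the example stays dependency-free.

```python
import numpy as np

def sample_region(rng):
    """Sample one rectangle from the dense sampling space (Sec. 3):
    two points drawn uniformly in the unit square act as diagonally
    opposite vertices, so width, height, and location are arbitrary."""
    x1, y1, x2, y2 = rng.uniform(0.0, 1.0, size=4)
    return (min(x1, x2), min(y1, y2), max(x1, x2), max(y1, y2))

def entropy(labels, n_classes):
    """Shannon entropy (bits) of the class distribution at a node."""
    p = np.bincount(labels, minlength=n_classes) / len(labels)
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

def information_gain(labels, go_left, n_classes):
    """Entropy reduction when `labels` is split by the boolean mask."""
    gain = entropy(labels, n_classes)
    for side in (go_left, ~go_left):
        if side.any():
            gain -= side.mean() * entropy(labels[side], n_classes)
    return gain

def candidate_split(features, labels, n_classes, rng):
    """One candidate node split (Sec. 4.3): random binary relabeling
    of the K classes, then a linear decision w'x >= 0. A least-squares
    fit replaces the linear SVM here purely for self-containedness."""
    relabel = rng.integers(0, 2, size=n_classes)   # class k -> {0, 1}
    y = 2.0 * relabel[labels] - 1.0                # targets in {-1, +1}
    X = np.hstack([features, np.ones((len(features), 1))])  # bias column
    w, *_ = np.linalg.lstsq(X, y, rcond=None)
    go_left = X @ w >= 0.0
    return w, go_left, information_gain(labels, go_left, n_classes)

def best_split(features, labels, n_classes, rng, n_candidates=30):
    """Step ii of Algorithm 1: keep the candidate with maximal gain."""
    return max((candidate_split(features, labels, n_classes, rng)
                for _ in range(n_candidates)), key=lambda c: c[2])
```

On a toy two-class problem with separable features, `best_split` recovers a perfect split (information gain of one bit); in the actual method `features` would be the LLC-coded histograms of a sampled region or region pair, augmented with the parent node's decision value ("nesting").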
Figure 4. Heatmaps that show distributions of the frequency with which an image patch is selected by our method. The heatmaps are obtained by aggregating image regions of all the tree nodes in the random forest, weighted by the probability of the corresponding class. Red indicates high frequency and blue indicates low frequency.

We only sample regions from the foreground and concatenate the features with a 2-level spatial pyramid of the background. We use 100 decision trees in our random forest.

We compare our algorithm with the methods in the PASCAL challenge [7] that achieve the best average precision. The results are shown in Tbl. 1. Our method outperforms the others in terms of mean average precision, and achieves the best result on seven of the nine actions. Note that we achieved this accuracy based on only grayscale SIFT descriptors, without using any other features or contextual information like object detectors.

Fig. 4 shows the frequency of an image patch being selected by our method. For each activity, the figure is obtained by considering the features selected in the tree nodes weighted by the proportion of samples of this activity in this node. From the results, we can clearly see the difference of distributions for different activities. For example, the image patches corresponding to human-object interactions are usually highlighted, such as the patches of bikes and books. We can also see that the image patches corresponding to background are not frequently selected. This demonstrates our algorithm's ability to deal with background clutter.

Method   BoW    Grouplet [22]  SPM [13]  LLC [20]  Ours
mAP(%)   22.7   36.7           39.1      41.8      47.0

Table 2. Mean Average Precision (% mAP) on the 24-class classification problem of the PPMI dataset. The best result is highlighted with bold fonts. The Grouplet uses one SIFT scale, while all the other methods use the multiple SIFT scales described in Sec. 4.2.

Instrument    BoW    Grouplet [22]  SPM [13]  LLC [20]  Ours
Bassoon       73.6   78.5           84.6      85.0      86.2
Erhu          82.2   87.6           88.0      89.5      89.8
Flute         86.3   95.7           95.3      97.3      98.6
French Horn   79.0   84.0           93.2      93.6      97.3
Guitar        85.1   87.7           93.7      92.4      93.0
Saxophone     84.4   87.7           89.5      88.2      92.4
Violin        80.6   93.0           93.4      96.3      95.7
Trumpet       69.3   76.3           82.5      86.7      90.0
Cello         77.3   84.6           85.7      82.3      86.7
Clarinet      70.5   82.3           82.7      84.8      90.4
Harp          75.0   87.1           92.1      93.9      92.8
Recorder      73.0   76.5           78.0      79.1      92.8
Average       78.0   85.1           88.2      89.2      92.1

Table 3. Comparison of mean Average Precision (% mAP) of our method and the baseline results on the PPMI binary classification tasks of people playing and holding different musical instruments. Each column shows the results obtained from one method. The best results are highlighted with bold fonts.

5.2. People-Playing-Musical-Instrument (PPMI)

The people-playing-musical-instrument (PPMI) dataset is introduced in [22]. This dataset puts emphasis on understanding subtle interactions between humans and objects. There are twelve musical instruments; for each instrument there are images of people playing the instrument and holding the instrument without playing it. We evaluate the performance of our method with 100 decision trees on the 24-class classification problem. We compare our method with many baseline results.² Tbl. 2 shows that we significantly outperform the baseline results.

²The baseline results are available from the dataset website.

Figure 5. (a) Heatmap of the dominant regions of interest selected by our method for playing flute, on images of playing a flute (top row) and holding a flute without playing it (bottom row). (b,c) show similar images for guitar and violin, respectively. Refer to Fig. 4 for how the heatmaps are obtained.

Figure 6. Heatmap for the "playing trumpet" class with the weighted average area of selected image regions for each tree depth.

Tbl. 3 shows the result of our method on the 12 binary classification tasks, where each task involves distinguishing the activities of playing and not playing the same instrument. Despite a high baseline of 89.2% mAP, our method outperforms it to achieve a result of 92.1% overall. Furthermore, we outperform the baseline methods on nine of the twelve binary classification tasks. In Fig. 5, we visualize the heatmap of the features learned for this task. We observe that they show semantically meaningful locations of where we would expect the discriminative regions of people playing different instruments to occur. For example, for flute, the region around the face provides important information, while for guitar, the region to the left of the torso provides more discriminative information. It is interesting to note that despite the randomization and the algorithm having no prior information, it is able to locate the region of interest reliably.

Furthermore, we also demonstrate that the method learns a coarse-to-fine region of interest for identification. This is similar to the human visual system, which is believed to analyze raw input in order from low to high spatial frequencies, or from large global shapes to smaller local ones [4]. Fig. 6 shows the heatmap of the area selected by our classifier as we consider different depths of the decision tree. We observe that our random forest follows a similar coarse-to-fine structure. The average area of the patches selected reduces as the tree depth increases. This shows that the classifier first starts with more global features to discriminate between multiple classes, and finally zeros in on the specific discriminative regions for some particular classes.

Figure 7. Each row represents visualizations for a single class of birds (from top to bottom): boat-tailed grackle, brewer sparrow, and golden-winged warbler. For each class, we visualize: (a) the heatmap for the given class as described in Fig. 4; (b,c) two example images of the corresponding class and the distribution of image patches selected for the specific image. These heatmaps are obtained by descending each tree for the corresponding image and only considering the image regions of the nodes that this image falls in.

Method     MKL [2]  LLC [20]  Ours
Accuracy   19.0%    18.0%     19.2%

Table 4. Comparison of the mean classification accuracy of our method and the baseline results on the Caltech-UCSD Birds 200 dataset. The best performance is indicated with bold fonts.

5.3. Caltech-UCSD Birds 200 (CUB-200)

The Caltech-UCSD Birds (CUB-200) dataset contains 6,033 annotated images of 200 different bird species [21]. This dataset has been designed for subordinate image categorization. It is a very challenging dataset, as the different species are very closely related and have similar shape/color. There are around 30 images per class, with 15 for training and the remaining for testing. The test-train splits are fixed (provided on the website).

The images are cropped to the provided bounding box annotations. These regions are resized such that the smaller image dimension is 150 pixels. As color provides important discriminative information, we extract C-SIFT descriptors [19] in the same way described in Sec. 4.2. We use 300 decision trees in our random forest. Tbl. 4 compares the performance of our algorithm against the LLC baseline and the state-of-the-art result (multiple kernel learning (MKL) [2]) on this dataset. Our method outperforms LLC and achieves comparable performance with the MKL approach. We note that [2] uses multiple features, e.g. geometric blur, gray/color SIFT, full image color histograms, etc. It is expected that including these features can further improve the performance of our method. Furthermore, we show in Fig. 7 that our method is able to capture the intra-class pose variations by focusing on different image regions for different images.

5.4. Strength and Correlation of Decision Trees

We compare our method against two control settings of random forests on the PASCAL action dataset [7]:

- Dense feature, weak classifier: for each image region or pairs of regions sampled from our dense sampling space, replace the SVM classifier in our method with a weak classifier as in the conventional decision tree learning approach [3], i.e. randomly generate 100 sets of feature weights and select the best one.
- SPM feature, strong classifier: use SVM classifiers to split the tree nodes as in our method, but limit the image regions to those from a 4-level spatial pyramid.

Note that all other settings of the above two approaches remain unchanged as compared to our method (as described in Sec. 4.2). Fig. 8 shows that on this dataset, a set of strong classifiers with relatively high correlation can lead to better performance than a set of weak classifiers with low correlation. We can see that the performance of random forests can be significantly improved by using strong classifiers in the nodes of the decision trees. Compared to the random forests that only sample spatial pyramid regions, using the dense sampling space obtains stronger trees without significantly increasing the correlation between different trees, thereby improving the classification performance. Furthermore, the performance of the random forests using discriminative node classifiers converges with a small number of decision trees, indicating that our method is more efficient than the conventional random forest approach. In our experiment, the two control settings and our method need a similar amount of time to train a single decision tree.

Figure 8. (a) We compare the classification performance (mean average precision, mAP) obtained by our method ("dense feature, strong classifier") with the two control settings; please refer to Sec. 5.4 for details of these settings. (b) Strength of the decision trees learned by these approaches. (c) Correlation between these trees (Sec. 4.4). Strength and correlation are highly related to the generalization error of random forests.

Additionally, we show the effectiveness of the random binary assignment of class labels (Sec. 4.3) when we train classifiers for each tree node. Here we ignore this step and train a one-vs-all multi-class SVM for each sampled image region or pairs of regions. In this case, K sets of weights are obtained when there are K classes of images at the current node. The best set of weights is selected using information gain as before. This setting leads to deeper and significantly unbalanced trees, and the performance decreases to 58.1% with 100 trees. Furthermore, it is highly inefficient, as it does not scale well with an increasing number of classes.

6. Conclusion

In this work, we propose a random forest with discriminative decision trees algorithm to explore a dense sampling space for fine-grained image categorization. Experimental results on subordinate classification and activity classification show that our method achieves state-of-the-art performance and discovers semantically meaningful information. Future work is to jointly train all the information that is obtained from the decision trees.

Acknowledgements. L.F-F. is partially supported by an NSF CAREER grant (IIS-0845230), an ONR MURI grant, the DARPA VIRAT program and the DARPA Mind's Eye program. B.Y. is partially supported by the SAP Stanford Graduate Fellowship. We would like to thank Carsten Rother and his colleagues at Microsoft Research Cambridge for helpful discussions about random forests. We also would like to thank Hao Su and Olga Russakovsky for their comments on the paper.

References

[1] A. Bosch, A. Zisserman, and X. Munoz. Image classification using random forests and ferns. In ICCV, 2007.
[2] S. Branson, C. Wah, B. Babenko, F. Schroff, P. Welinder, P. Perona, and S. Belongie. Visual recognition with humans in the loop. In ECCV, 2010.
[3] L. Breiman. Random forests. Mach. Learn., 45:5-32, 2001.
[4] C. A. Collin and P. A. McMullen. Subordinate-level categorization relies on high spatial frequencies to a greater degree than basic-level categorization. Percept. Psychophys., 67(2):354-364, 2005.
[5] V. Delaitre, I. Laptev, and J. Sivic. Recognizing human actions in still images: A study of bag-of-features and part-based representations. In BMVC, 2010.
[6] G. Duan, C. Huang, H. Ai, and S. Lao. Boosting associated pairing comparison features for pedestrian detection. In ICCV Workshop on Visual Surveillance, 2009.
[7] M. Everingham, L. V. Gool, C. K. I. Williams, J. Winn, and A. Zisserman. The PASCAL Visual Object Classes Challenge 2010 (VOC2010) Results.
[8] L. Fei-Fei, R. Fergus, and A. Torralba. Recognizing and learning object categories. In ICCV Tutorial, 2005.
[9] L. Fei-Fei and P. Perona. A Bayesian hierarchical model for learning natural scene categories. In CVPR, 2005.
[10] P. F. Felzenszwalb, R. B. Girshick, D. McAllester, and D. Ramanan. Object detection with discriminatively trained part-based models. IEEE T. Pattern Anal., 32(9):1627-1645, 2010.
[11] A. B. Hillel and D. Weinshall. Subordinate class recognition using relational object models. In , 2007.
[12] K. E. Johnson and A. T. Eilers. Effects of knowledge and development on subordinate level categorization. Cognitive Dev., 13(4):515-545, 1998.
[13] S. Lazebnik, C. Schmid, and J. Ponce. Beyond bags of features: Spatial pyramid matching for recognizing natural scene categories. In CVPR, 2006.
[14] D. G. Lowe. Distinctive image features from scale-invariant keypoints. Int. J. Comput. Vision, 60(2):91-110, 2004.
[15] F. Moosmann, B. Triggs, and F. Jurie. Fast discriminative visual codebooks using randomized clustering forests. In NIPS, 2007.
[16] A. Oliva and A. Torralba. Modeling the shape of the scene: A holistic representation of the spatial envelope. Int. J. Comput. Vision, 42(3):145-175, 2001.
[17] J. Shotton, M. Johnson, and R. Cipolla. Semantic texton forests for image categorization and segmentation. In CVPR, 2008.
[18] Z. Tu. Probabilistic boosting-tree: Learning discriminative models for classification, recognition, and clustering. In ICCV, 2005.
[19] K. E. A. van de Sande, T. Gevers, and C. G. M. Snoek. Evaluating color descriptors for object and scene recognition. IEEE T. Pattern Anal., 32(9):1582-1596, 2010.
[20] J. Wang, J. Yang, K. Yu, F. Lv, T. Huang, and Y. Gong. Locality-constrained linear coding for image classification. In CVPR, 2010.
[21] P. Welinder, S. Branson, T. Mita, C. Wah, F. Schroff, S. Belongie, and P. Perona. Caltech-UCSD Birds 200. Technical Report CNS-TR-201, Caltech, 2010.
[22] B. Yao and L. Fei-Fei. Grouplet: A structured image representation for recognizing human and object interactions. In CVPR, 2010.
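The strength/correlation trade-off measured empirically in Sec. 5.4 follows directly from Breiman's bound in Sec. 4.4. The following is a small numeric sketch of that bound; the strength and correlation values used here are purely illustrative inputs, not measurements from the paper.

```python
def breiman_bound(s, rho_bar):
    """Breiman's upper bound on random-forest generalization error:
    GE <= rho_bar * (1 - s**2) / s**2, where s is the strength of the
    individual trees and rho_bar their mean correlation (requires s > 0)."""
    if not 0.0 < s <= 1.0:
        raise ValueError("tree strength s must lie in (0, 1]")
    return rho_bar * (1.0 - s ** 2) / s ** 2

# Hypothetical numbers: strong-but-more-correlated trees can still give
# a far smaller bound than weak, decorrelated ones.
weak_trees   = breiman_bound(s=0.3, rho_bar=0.2)  # weak classifiers in nodes
strong_trees = breiman_bound(s=0.8, rho_bar=0.4)  # strong (SVM-like) nodes
```

With these illustrative values the bound drops from roughly 2.02 to roughly 0.23, mirroring the paper's argument that raising tree strength can outweigh a moderate rise in correlation.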