Combining Randomization and Discrimination for Fine-Grained Image Categorization
Bangpeng Yao, Aditya Khosla, Li Fei-Fei
Computer Science Department, Stanford University, Stanford, CA

Abstract

In this paper, we study the problem of fine-grained image categorization. The goal of our method is to explore the fine image statistics and identify the discriminative image patches for recognition. We achieve this goal by combining discriminative training with randomization. Our method significantly improves the strength of the decision trees in the random forest while still maintaining low correlation between the trees. This allows our method to achieve low generalization error according to the theory of random forests [3]. We evaluate our method on two fine-grained categorization tasks: human activity recognition in still images and subordinate categorization of closely related animal species [21], outperforming state-of-the-art results. Furthermore, our method identifies semantically meaningful image patches that closely match human intuition. Additionally, our method tends to automatically generate a coarse-to-fine structure of discriminative image regions, which parallels the human visual system [4].

(* Bangpeng Yao and Aditya Khosla contributed equally to this paper.)

The remaining part of this paper is organized as follows: Sec. 2 discusses related work. Sec. 3 describes our dense feature space, and Sec. 4 describes our algorithm for mining this space. Experimental results are discussed in Sec. 5, and Sec. 6 concludes the paper.

2. Related Work

Image classification has been studied for many years. Most of the existing work focuses on basic-level categorization such as objects or scenes. In this paper we focus on fine-grained image categorization, which requires an approach that captures the fine and detailed information in images.

In this work, we explore a dense feature representation to distinguish fine-grained image classes. Our previous work has shown the advantage of dense features ("Grouplet" features [22]) in classifying human activities. Instead of using the generative local features as in Grouplet, here we consider a richer feature space in a discriminative setting where both local and global visual information are fused together. Inspired by [6], our approach also considers pairwise interactions between image regions.

We use a random forest framework to identify discriminative image regions. Random forests have been used successfully in many vision tasks such as object detection, segmentation [17], and codebook learning [15]. Inspired by [18], we combine discriminative training and randomization to obtain an effective classifier with good generalizability. Our method differs from [18] in that for each tree node, we train an SVM classifier from one of the randomly sampled image regions, instead of using AdaBoost to combine weak features from a fixed set of regions. This allows us to explore an extremely large feature set efficiently.

A classical image classification framework consists of feature extraction, coding, and pooling. Better feature extraction [14] and better coding and pooling methods have been extensively studied for object recognition. In this work, we use discriminative feature mining and randomization to propose a new feature extraction approach, and demonstrate its effectiveness on fine-grained image categorization tasks.

Figure 2. (a) Illustration of our dense sampling space. We densely sample rectangular image patches with varying widths and heights. The regions are closely located and have significant overlaps. The red dots denote the centers of the patches, and the arrows indicate the increments of the patch width or height. (The actual density of regions considered in our algorithm is significantly higher; this figure has been simplified for visual clarity.) We note that the regions considered by Spatial Pyramid Matching [13] are a very small subset lying along the diagonal of the height-width plane that we consider. (b) Illustration of some image patches that may be discriminative for "playing guitar". All those patches can be sampled from our dense sampling space.

3. Dense Sampling Space

Our algorithm aims to identify fine image statistics that are useful for fine-grained categorization. For example, in order to classify whether a human is playing a guitar or holding a guitar without playing it, we want to use the image patches below the human face that are closely related to the human-guitar interaction (Fig. 2(b)). An algorithm that can reliably locate such regions is expected to achieve high classification accuracy. We achieve this goal by searching over rectangular image patches of arbitrary width, height, and image location. We refer to this extensive set of image regions as the dense sampling space, as shown in Fig. 2(a). Furthermore, to capture more discriminative distinctions, we consider interactions between pairs of arbitrary patches. The pairwise interactions are modeled by applying concatenation, absolute of difference, or intersection between the feature representations of two image patches.

However, the dense sampling space is extremely large. Sampling image patches of a single fixed size every four pixels already leads to thousands of patches per image. This number increases many-fold when considering regions with arbitrary widths and heights. Further considering pairwise interactions of image patches effectively leads to trillions of features for each image. In addition, there is much noise and redundancy in this feature set. On the one hand, many image patches are not discriminative for distinguishing different image classes. On the other hand, the image patches highly overlap in the dense sampling space, which introduces significant redundancy among these features. Therefore, it is challenging to explore this high-dimensional, noisy, and redundant feature space. In this work, we address this issue using randomization.

4. Random Forest with Discriminative Decision Trees

In order to explore the dense sampling feature space for fine-grained visual categorization, we combine two concepts: (1) discriminative training to extract the information in the image patches effectively; (2) randomization to explore the dense feature space efficiently. Specifically, we adopt a random forest [3] framework where each tree node is a discriminative classifier that is trained on one or a pair of image patches. In our setting, the discriminative training and randomization can benefit from each other. We summarize the advantages of our method below:
- The random forest framework allows us to consider a subset of the image regions at a time, which allows us to explore the dense sampling space efficiently in a principled way.
- The random forest selects the best image patch in each node, and therefore it can remove the noise-prone image patches and reduce the redundancy in the feature set.
- By using discriminative classifiers to train the tree nodes, our random forest has much stronger decision trees with small correlation. This allows our method to have low generalization error (Sec. 4.4) compared with the traditional random forest [3], which uses weak classifiers in the tree nodes.

An overview of the random forest framework we use is shown in Algorithm 1. In the following sections, we first describe this framework (Sec. 4.1). Then we elaborate on our feature sampling (Sec. 4.2) and split learning (Sec. 4.3) strategies in detail, and describe the generalization theory [3] of random forests, which guarantees the effectiveness of our algorithm (Sec. 4.4).

Algorithm 1. Overview of the process of growing decision trees in the random forest framework.
For each tree:
  - Obtain a random set of training examples.
  - While the current leaf node needs to be split:
      i.   Randomly sample the candidate (pairs of) image regions (Sec. 4.2).
      ii.  Select the best region and split the examples into two sets (Sec. 4.3).
      iii. Repeat the splitting for the new leaf nodes.
  - Estimate the class posterior distribution for the current leaf node.

4.1. The Random Forest Framework

A random forest is a multi-class classifier consisting of an ensemble of decision trees, where each tree is constructed via some randomization. As illustrated in Fig. 3, the leaf nodes of each tree encode a distribution over the image classes. All internal nodes contain a binary test that splits the data and sends the splits to the node's children. The splitting is stopped when a leaf node is encountered. An image is classified by descending each tree and combining the leaf distributions from all the trees. This method allows the flexibility to explore a large feature space effectively, because it only considers a subset of features in every tree node.

Figure 3. Comparison of (a) conventional random decision trees with (b) our discriminative decision trees. Solid blue arrows show binary splits of the data. Dotted lines from the shaded image regions indicate the region used at each node. Conventional decision trees use information from the entire image at each node, which encodes no spatial or structural information, while our decision trees sample single or multiple image regions from the dense sampling space (Sec. 3). The histograms below the leaf nodes illustrate the posterior probability distributions. In (b), dotted red arrows between nodes show our nested tree structure, which allows information to flow in a top-down manner. Our approach uses strong classifiers in each node (Sec. 4.3), while the conventional method uses weak classifiers.

Each tree returns the posterior probability of an example belonging to the given classes. The posterior probability of a particular class at each leaf node is learned as the proportion of the training images belonging to that class at the given leaf node. Denoting the posterior probability of class c at leaf l of tree t as p_{t,l}(c), a test image can be classified by averaging the posterior probabilities from the leaf node of each tree:

    c* = argmax_c (1/T) * sum_{t=1}^{T} p_{t,l_t}(c),

where c* is the predicted class label, T is the total number of trees, and l_t is the leaf node that the image falls into in tree t. In the following sections, we describe the process of obtaining the trees using our algorithm. Readers can refer to previous works for more details of the conventional decision tree learning procedure.

4.2. Sampling the Dense Feature Space

As shown in Fig. 3(b), each internal node in our decision tree corresponds to a single rectangular image region, or a pair of regions, sampled from the dense sampling space (Sec. 3), where the regions can have many possible widths, heights, and image locations. In order to sample a candidate image region, we first normalize all images to unit width and height, and then randomly sample a pair of coordinates from a uniform distribution. These coordinates specify two diagonally opposite vertices of a rectangular region. Such regions could correspond to small areas of the image (e.g. the purple bounding boxes in Fig. 3) or even the complete image. This allows our method to capture both global and local information in the image.

In our approach, each sampled image region is represented by a histogram of visual descriptors. For a pair of regions, the feature representation is formed by applying histogram operations (e.g. concatenation, intersection, etc.) to the histograms obtained from both regions. Furthermore, the features are augmented with the decision value (described in Sec. 4.3) of this image from its parent node (indicated by the dashed red arrows in Fig. 3). Therefore, our feature representation combines the information of all upstream tree nodes that the corresponding image has descended from. We refer to this idea as "nesting". Using feature sampling and nesting, we obtain a set of candidate features, each corresponding to a candidate image region of the current node.

Implementation details. Our method is flexible enough to use many different visual descriptors. In this work, we densely extract SIFT [14] descriptors on each image with a spacing of four pixels. The scales of the grids used to extract descriptors are 8, 12, 16, 24, and 30. Using k-means clustering, we construct a vocabulary of codewords. (A dictionary size of 1024, 256, and 256 is used for the PASCAL action [7], PPMI [22], and Caltech-UCSD Birds [21] datasets, respectively.) Then, we use Locality-constrained Linear Coding [20] to assign the descriptors to codewords. A bag-of-words histogram representation is used if the area of the patch is smaller than 0.2, while a 2-level or 3-level spatial pyramid is used if the area is between 0.2 and 0.8 or larger than 0.8, respectively. During sampling (step i of Algorithm 1), we consider four settings of image patches: a single image patch and three types of pairwise interactions (concatenation, intersection, and absolute of difference of the two histograms). We sample 25 and 50 image regions (or pairs of regions) in the root node and the first-level nodes respectively, and sample 100 regions (or pairs of regions) in all other nodes. Sampling a smaller number of image patches in the root reduces the correlation between the resulting trees.

4.3. Learning the Splits

In this section, we describe the process of learning the binary splits of the data using an SVM (step ii in Algorithm 1). This is achieved in two steps: (1) randomly assigning all examples from each class to a binary label; (2) using an SVM to learn a binary split of the data.

Assume that we have C classes of images at a given node. We uniformly sample C binary variables b_1, ..., b_C, and assign all examples of a particular class c the binary label b_c. As each node performs a binary split of the data, this allows us to learn a simple binary SVM at each node. This improves the scalability of our method to a large number of classes and results in well-balanced trees. Using the feature x of an image region (or pair of regions) as described in Sec. 4.2, we find a binary split of the data:

    if w . x >= 0, go to the left child; otherwise, go to the right child,

where w is the set of weights learned from a linear SVM.

We evaluate each binary split that corresponds to an image region or pair of regions with the information gain criterion [3], which is computed from the complete set of training images that fall at the current tree node. The splits that maximize the information gain are selected, and the splitting process (step iii in Algorithm 1) is repeated with the new splits of the data. The tree splitting stops if a pre-specified maximum tree depth has been reached, or the information gain of the current node is larger than a threshold, or the number of samples in the current node is small.

4.4. Generalization Error of Random Forests

In [3], it has been shown that an upper bound for the generalization error of a random forest is given by

    GE <= rho_bar * (1 - s^2) / s^2,

where s is the strength of the decision trees in the forest, and rho_bar is the mean correlation between the trees. Therefore, the generalization error of a random forest can be reduced by making the decision trees stronger or by reducing the correlation between the trees.

In our approach, we learn discriminative SVM classifiers for the tree nodes. Therefore, compared to traditional random forests, where the tree nodes are weak classifiers with randomly generated feature weights [3], our decision trees are much stronger. Furthermore, since we are considering an extremely dense feature space, each decision tree only considers a relatively small subset of image patches. This means there is little correlation between the trees. Therefore, our random forest with discriminative decision trees can achieve very good performance on fine-grained image classification, where exploring fine image statistics discriminatively is important. In Sec. 5.4 we show the strength and correlation of different settings of random forests with respect to the number of decision trees, which justifies the above arguments. Please refer to [3] for details about how to compute the strength and correlation values for a random forest.

5. Experiments

In this section, we evaluate our algorithm on three fine-grained image datasets: the action classification dataset in PASCAL VOC 2010 [7] (Sec. 5.1), actions of people-playing-musical-instrument (PPMI) [22] (Sec. 5.2), and a subordinate object categorization dataset of 200 bird species [21] (Sec. 5.3). Experimental results show that our algorithm outperforms state-of-the-art methods on these datasets. We also evaluate the strength and correlation of the decision trees in our method, and compare the results with other settings of random forests to show why our method leads to better classification performance (Sec. 5.4).

5.1. PASCAL Action Classification

The most recent PASCAL VOC challenge [7] incorporated the task of recognizing actions in still images. The images describe nine common human activities: "Phoning", "Playing a musical instrument", "Reading", "Riding a bicycle or motorcycle", "Riding a horse", "Running", "Taking a photograph", "Using a computer", and "Walking". Each person that we need to classify is indicated by a bounding box and is annotated with one of the nine actions they are performing. There are 4090 training/validation images and a similar number of testing images.

We obtain a foreground image for each person by extending the person's bounding box relative to its original size, and resizing it such that the larger dimension is 300 pixels. We also resize the original image accordingly. Therefore, for each person we have a "person image" as well as a "background image".

Method        Phoning  Playing     Reading  Riding  Riding  Running  Taking  Using     Walking  Overall
                       instrument           bike    horse            photo   computer
CVC-BASE      56.2     56.5        34.7     75.1    83.6    86.5     25.4    60.0      69.2     60.8
CVC-SEL       49.8     52.8        34.3     74.2    85.5    85.1     24.9    64.1      72.5     60.4
SURREY-KDA    52.6     53.5        35.9     81.0    89.3    86.5     32.8    59.2      68.6     62.2
UCLEAR-DOSP   47.0     57.8        26.9     78.8    89.7    87.3     32.5    60.0      70.1     61.1
UMCO-KSVM     53.5     43.0        32.0     67.9    68.8    83.0     34.1    45.9      60.4     54.3
Our Method    45.0     57.4        41.5     81.8    90.5    89.5     37.9    65.0      72.7     64.6

Table 1. Comparison of the average precision (%) of our method with the winners of the PASCAL VOC 2010 action classification challenge [7]. Each row shows the results obtained by one method; the best results are highlighted in bold.
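To make the node-splitting procedure of Sec. 4.3 concrete, the following is a minimal NumPy sketch of learning one split: randomly assign each class a binary label, fit a linear decision direction, and keep the candidate with the highest information gain. This is our own illustration, not the authors' code; the function names are hypothetical, and an ordinary least-squares fit stands in for the paper's linear SVM.

```python
import numpy as np

def entropy(labels, n_classes):
    # Shannon entropy (base 2) of the class distribution at a node.
    p = np.bincount(labels, minlength=n_classes) / max(len(labels), 1)
    p = p[p > 0]
    return float(-(p * np.log2(p)).sum())

def information_gain(labels, go_left, n_classes):
    # Gain of splitting `labels` into a left/right pair of subsets.
    n = len(labels)
    nl = int(go_left.sum())
    nr = n - nl
    hl = entropy(labels[go_left], n_classes) if nl else 0.0
    hr = entropy(labels[~go_left], n_classes) if nr else 0.0
    return entropy(labels, n_classes) - (nl / n) * hl - (nr / n) * hr

def learn_node_split(X, y, n_classes, n_candidates=10, rng=None):
    """Learn one binary split, sketching the two steps of Sec. 4.3:
    (1) random binary class assignment, (2) fit a linear split.
    Least squares is a stand-in for the paper's linear SVM."""
    rng = rng if rng is not None else np.random.default_rng(0)
    best_w, best_gain = None, -1.0
    for _ in range(n_candidates):
        b = rng.integers(0, 2, size=n_classes)     # random binary label per class
        t = 2.0 * b[y] - 1.0                       # per-example targets in {-1, +1}
        w, *_ = np.linalg.lstsq(X, t, rcond=None)  # linear split direction
        go_left = X @ w >= 0.0                     # w . x >= 0  ->  left child
        gain = information_gain(y, go_left, n_classes)
        if gain > best_gain:
            best_w, best_gain = w, gain
    return best_w, best_gain
```

In the full algorithm, a routine like this would run once per sampled image region (or pair of regions) at every node, with x built from the region's coded histogram plus the parent node's decision value (the "nesting" of Sec. 4.2), and the region whose split maximizes the gain would be kept.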
Figure 4. Heatmaps showing the distribution of the frequency with which an image patch is selected by our method. The heatmaps are obtained by aggregating the image regions of all the tree nodes in the random forest, weighted by the probability of the corresponding class. Red indicates high frequency and blue indicates low frequency.

We only sample regions from the foreground, and concatenate the features with a 2-level spatial pyramid of the background. We use 100 decision trees in our random forest.

We compare our algorithm with the methods in the PASCAL challenge [7] that achieve the best average precision. The results are shown in Tbl. 1. Our method outperforms the others in terms of mean average precision, and achieves the best result on seven of the nine actions. Note that we achieve this accuracy using only grayscale SIFT descriptors, without any other features or contextual information such as object detectors.

Fig. 4 shows the frequency with which an image patch is selected by our method. For each activity, the figure is obtained by considering the features selected in the tree nodes, weighted by the proportion of samples of this activity in each node. From the results, we can clearly see the difference between the distributions for different activities. For example, the image patches corresponding to human-object interactions are usually highlighted, such as the patches of bikes and books. We can also see that the image patches corresponding to background are not frequently selected. This demonstrates our algorithm's ability to deal with background clutter.

5.2. People-Playing-Musical-Instrument (PPMI)

The people-playing-musical-instrument (PPMI) dataset was introduced in [22]. This dataset puts emphasis on understanding subtle interactions between humans and objects. There are twelve musical instruments; for each instrument there are images of people playing the instrument and holding the instrument without playing it. We evaluate the performance of our method with 100 decision trees on the 24-class classification problem, and compare it with many baseline results. (The baseline results are available from the dataset website.) Tbl. 2 shows that we significantly outperform the baselines.

Method    BoW    Grouplet [22]   SPM [13]   LLC [20]   Ours
mAP (%)   22.7   36.7            39.1       41.8       47.0

Table 2. Mean Average Precision (% mAP) on the 24-class classification problem of the PPMI dataset. The best result is highlighted in bold. The Grouplet uses one SIFT scale, while all the other methods use the multiple SIFT scales described in Sec. 4.2.

Instrument    BoW    Grouplet [22]   SPM [13]   LLC [20]   Ours
Bassoon       73.6   78.5            84.6       85.0       86.2
Erhu          82.2   87.6            88.0       89.5       89.8
Flute         86.3   95.7            95.3       97.3       98.6
French Horn   79.0   84.0            93.2       93.6       97.3
Guitar        85.1   87.7            93.7       92.4       93.0
Saxophone     84.4   87.7            89.5       88.2       92.4
Violin        80.6   93.0            93.4       96.3       95.7
Trumpet       69.3   76.3            82.5       86.7       90.0
Cello         77.3   84.6            85.7       82.3       86.7
Clarinet      70.5   82.3            82.7       84.8       90.4
Harp          75.0   87.1            92.1       93.9       92.8
Recorder      73.0   76.5            78.0       79.1       92.8
Average       78.0   85.1            88.2       89.2       92.1

Table 3. Comparison of mean Average Precision (% mAP) of our method and the baseline results on the PPMI binary classification tasks of people playing and holding different musical instruments. Each column shows the results obtained by one method; the best results are highlighted in bold.

Tbl. 3 shows the results of our method on the 12 binary classification tasks, where each task involves distinguishing the activities of playing and not playing for the same instrument. Despite a high baseline of 89.2% mAP (LLC), our method outperforms it to achieve a result of 92.1% overall. Furthermore, we outperform the baseline methods on nine of the twelve binary classification tasks. In Fig. 5, we visualize the heatmaps of the features learned for this task. We observe that they show semantically meaningful locations where we would expect the discriminative regions of people playing different instruments to occur. For example, for flute, the region around the face provides important information, while for guitar, the region to the left of the torso provides more discriminative information. It is interesting to note that despite the randomization, and the algorithm having no prior information, it is able to locate the region of interest reliably.

Figure 5. (a) Heatmap of the dominant regions of interest selected by our method for "playing flute", on images of playing a flute (top row) and holding a flute without playing it (bottom row). (b, c) show similar images for guitar and violin, respectively. Refer to Fig. 4 for how the heatmaps are obtained.

Furthermore, we demonstrate that the method learns a coarse-to-fine region of interest for identification. This is similar to the human visual system, which is believed to analyze raw input in order from low to high spatial frequencies, or from large global shapes to smaller local ones [4]. Fig. 6 shows the heatmap of the area selected by our classifier as we consider different depths of the decision tree.

Figure 6. Heatmap for the "playing trumpet" class, with the weighted average area of the selected image regions at each tree depth.

Figure 7. Each row represents visualizations for a single class of birds (from top to bottom): boat-tailed grackle, brewer sparrow, and golden-winged warbler. For each class, we visualize: (a) the heatmap for the given class, as described in Fig. 4; (b, c) two example images of the corresponding class and the distribution of image patches selected for the specific image. These heatmaps are obtained by descending each tree for the corresponding image and only considering the image regions of the nodes that the image falls in.
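The heatmaps of Figs. 4-7 are built by accumulating the image regions chosen at tree nodes, weighted by the probability (or sample proportion) of the class of interest. A minimal sketch of this aggregation follows; it is our own illustration with hypothetical names, not the authors' implementation.

```python
import numpy as np

def aggregate_heatmap(regions, weights, height, width):
    """Accumulate class-weighted rectangular regions into a heatmap.

    regions: (x1, y1, x2, y2) tuples in normalized [0, 1] coordinates,
             giving two diagonally opposite vertices per selected region.
    weights: one weight per region, e.g. the proportion of samples of
             the class of interest that reached the corresponding node.
    """
    heat = np.zeros((height, width))
    for (x1, y1, x2, y2), w in zip(regions, weights):
        r0, r1 = sorted((int(y1 * height), int(y2 * height)))
        c0, c1 = sorted((int(x1 * width), int(x2 * width)))
        heat[r0:r1, c0:c1] += w  # every pixel inside the region accumulates the weight
    if heat.max() > 0:
        heat /= heat.max()       # normalize so red (1.0) = most frequently selected
    return heat
```

For instance, a full-image region with a small weight plus a heavily weighted torso-sized region yields a map peaked on the torso, mirroring how discriminative patches (instruments, books, bikes) light up in Fig. 4 while background stays cool.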
We observe that our random forest follows a similar coarse-to-fine structure: the average area of the selected patches decreases as the tree depth increases. This shows that the classifier first starts with more global features to discriminate between multiple classes, and finally zeros in on the specific discriminative regions of particular classes.

5.3. Caltech-UCSD Birds 200 (CUB-200)

The Caltech-UCSD Birds (CUB-200) dataset contains 6,033 annotated images of 200 different bird species [21]. This dataset has been designed for subordinate image categorization. It is a very challenging dataset, as the different species are closely related and have similar shape and color. There are around 30 images per class, with 15 for training and the remainder for testing. The test-train splits are fixed (provided on the website).

The images are cropped to the provided bounding box annotations. These regions are resized such that the smaller image dimension is 150 pixels. As color provides important discriminative information, we extract C-SIFT descriptors [19] in the same way as described in Sec. 4.2. We use 300 decision trees in our random forest.

Method     MKL [2]   LLC [20]   Ours
Accuracy   19.0%     18.0%      19.2%

Table 4. Comparison of the mean classification accuracy of our method and the baseline results on the Caltech-UCSD Birds 200 dataset. The best performance is indicated in bold.

Tbl. 4 compares the performance of our algorithm against the LLC baseline and the state-of-the-art result (multiple kernel learning, MKL [2]) on this dataset. Our method outperforms LLC and achieves performance comparable to the MKL approach. We note that [2] uses multiple features, e.g. geometric blur, gray/color SIFT, full-image color histograms, etc. It is expected that including these features can further improve the performance of our method. Furthermore, we show in Fig. 7 that our method is able to capture intra-class pose variations by focusing on different image regions for different images.

5.4. Strength and Correlation of Decision Trees

We compare our method against two control settings of random forests on the PASCAL action dataset [7]:
- Dense feature, weak classifier: for each image region or pair of regions sampled from our dense sampling space, replace the SVM classifier in our method with a weak classifier as in the conventional decision tree learning approach [3], i.e. randomly generate 100 sets of feature weights and select the best one.
- SPM feature, strong classifier: use SVM classifiers to split the tree nodes as in our method, but limit the image regions to those of a 4-level spatial pyramid.

All other settings of the above two approaches remain unchanged compared to our method (as described in Sec. 4.2). Fig. 8 shows that on this dataset, a set of strong classifiers with relatively high correlation can lead to better performance than a set of weak classifiers with low correlation. We can see that the performance of random forests can be significantly improved by using strong classifiers in the nodes of the decision trees. Compared to the random forests that only sample spatial pyramid regions, using the dense sampling space obtains stronger trees without significantly increasing the correlation between different trees, thereby improving the classification performance. Furthermore, the performance of the random forests using discriminative node classifiers converges with a small number of decision trees, indicating that our method is more efficient than the conventional random forest approach. In our experiments, the two control settings and our method need a similar amount of time to train a single decision tree.

Additionally, we show the effectiveness of the random binary assignment of class labels (Sec. 4.3) when training classifiers for each tree node. Here we ignore this step and instead train a one-vs-all multi-class SVM for each sampled image region or pair of regions. In this case, C sets of weights are obtained when there are C classes of images at the current node. The best set of weights is selected using information gain as before. This setting leads to deeper and significantly unbalanced trees, and the performance decreases to 58.1% with 100 trees. Furthermore, it is highly inefficient, as it does not scale well with an increasing number of classes.

Figure 8. (a) Classification performance (mean average precision, mAP) obtained by our method ("dense feature, strong classifier") compared with the two control settings; please refer to Sec. 5.4 for details of these settings. (b) Strength of the decision trees learned by these approaches, and (c) correlation between these trees (Sec. 4.4), which are highly related to the generalization error of random forests.

6. Conclusion

In this work, we propose a random forest with discriminative decision trees to explore a dense sampling space for fine-grained image categorization. Experimental results on subordinate classification and activity classification show that our method achieves state-of-the-art performance and discovers semantically meaningful information. Future work includes jointly training all the information that is obtained from the decision trees.

Acknowledgements. L. F.-F. is partially supported by an NSF CAREER grant (IIS-0845230), an ONR MURI grant, the DARPA VIRAT program, and the DARPA Mind's Eye program. B.Y. is partially supported by the SAP Stanford Graduate Fellowship. We would like to thank Carsten Rother and his colleagues at Microsoft Research Cambridge for helpful discussions about random forests. We also would like to thank Hao Su and Olga Russakovsky for their comments on the paper.

References
[1] A. Bosch, A. Zisserman, and X. Munoz. Image classification using random forests and ferns. In ICCV, 2007.
[2] S. Branson, C. Wah, B. Babenko, F. Schroff, P. Welinder, P. Perona, and S. Belongie. Visual recognition with humans in the loop. In ECCV, 2010.
[3] L. Breiman. Random forests. Mach. Learn., 45:5-32, 2001.
[4] C. A. Collin and P. A. McMullen. Subordinate-level categorization relies on high spatial frequencies to a greater degree than basic-level categorization. Percept. Psychophys., 67(2):354-364, 2005.
[5] V. Delaitre, I. Laptev, and J. Sivic. Recognizing human actions in still images: A study of bag-of-features and part-based representations. In BMVC, 2010.
[6] G. Duan, C. Huang, H. Ai, and S. Lao. Boosting associated pairing comparison features for pedestrian detection. In ICCV Workshop on Visual Surveillance, 2009.
[7] M. Everingham, L. V. Gool, C. K. I. Williams, J. Winn, and A. Zisserman. The PASCAL Visual Object Classes Challenge 2010 (VOC2010) Results.
[8] L. Fei-Fei, R. Fergus, and A. Torralba. Recognizing and learning object categories. ICCV Tutorial, 2005.
[9] L. Fei-Fei and P. Perona. A Bayesian hierarchical model for learning natural scene categories. In CVPR, 2005.
[10] P. F. Felzenszwalb, R. B. Girshick, D. McAllester, and D. Ramanan. Object detection with discriminatively trained part-based models. IEEE T. Pattern Anal., 32(9):1627-1645, 2010.
[11] A. B. Hillel and D. Weinshall. Subordinate class recognition using relational object models, 2007.
[12] K. E. Johnson and A. T. Eilers. Effects of knowledge and development on subordinate level categorization. Cognitive Dev., 13(4):515-545, 1998.
[13] S. Lazebnik, C. Schmid, and J. Ponce. Beyond bags of features: Spatial pyramid matching for recognizing natural scene categories. In CVPR, 2006.
[14] D. G. Lowe. Distinctive image features from scale-invariant keypoints. Int. J. Comput. Vision, 60(2):91-110, 2004.
[15] F. Moosmann, B. Triggs, and F. Jurie. Fast discriminative visual codebooks using randomized clustering forests. In NIPS, 2007.
[16] A. Oliva and A. Torralba. Modeling the shape of the scene: A holistic representation of the spatial envelope. Int. J. Comput. Vision, 42(3):145-175, 2001.
[17] J. Shotton, M. Johnson, and R. Cipolla. Semantic texton forests for image categorization and segmentation. In CVPR, 2008.
[18] Z. Tu. Probabilistic boosting-tree: Learning discriminative models for classification, recognition, and clustering. In ICCV, 2005.
[19] K. E. A. van de Sande, T. Gevers, and C. G. M. Snoek. Evaluating color descriptors for object and scene recognition. IEEE T. Pattern Anal., 32(9):1582-1596, 2010.
[20] J. Wang, J. Yang, K. Yu, F. Lv, T. Huang, and Y. Gong. Locality-constrained linear coding for image classification. In CVPR, 2010.
[21] P. Welinder, S. Branson, T. Mita, C. Wah, F. Schroff, S. Belongie, and P. Perona. Caltech-UCSD Birds 200. Technical Report CNS-TR-201, Caltech, 2010.
[22] B. Yao and L. Fei-Fei. Grouplet: A structured image representation for recognizing human and object interactions. In CVPR, 2010.
