Portmanteau Vocabularies for Multi-Cue Image Representation

Uploaded on 2017-03-06


Fahad Shahbaz Khan (1), Joost van de Weijer (1), Andrew D. Bagdanov (1,2), Maria Vanrell (1)
(1) Centre de Visió per Computador, Computer Science Department, Universitat Autònoma de Barcelona, Edifici O, Campus UAB (Bellaterra), Barcelona, Spain
(2) Media Integration and Communication Center, University of Florence, Italy

Presentation Transcript

Portmanteau Vocabularies for Multi-Cue Image Representation

Fahad Shahbaz Khan (1), Joost van de Weijer (1), Andrew D. Bagdanov (1,2), Maria Vanrell (1)
(1) Centre de Visió per Computador, Computer Science Department, Universitat Autònoma de Barcelona, Edifici O, Campus UAB (Bellaterra), Barcelona, Spain
(2) Media Integration and Communication Center, University of Florence, Italy

Abstract

We describe a novel technique for feature combination in the bag-of-words model of image classification. Our approach builds discriminative compound words from primitive cues learned independently from training images. Our main observation is that modeling joint-cue distributions independently is more statistically robust for typical classification problems than attempting to empirically estimate the dependent, joint-cue distribution directly. We use information theoretic vocabulary compression to find discriminative combinations of cues, and the resulting vocabulary of portmanteau* words is compact, has the cue binding property, and supports individual weighting of cues in the final image representation. State-of-the-art results on both the Oxford Flower-102 and Caltech-UCSD Bird-200 datasets demonstrate the effectiveness of our technique compared to other, significantly more complex approaches to multi-cue image representation.

1 Introduction

Image categorization is the task of classifying an image as containing an object from a predefined list of categories. One of the most successful approaches to this problem is the bag-of-words (BOW) model [4, 15, 11, 2]. In the bag-of-words model an image is first represented by a collection of local image features detected either sparsely or in a regular, dense grid. Each local feature is then represented by one or more cues, each describing one aspect of a small region around the corresponding feature. Typical local cues include color, shape, and texture. These cues are then quantized into visual words, and the final image representation is a histogram over these visual vocabularies. In the final stage of the BOW approach the histogram representations are sent to a classifier. The success of BOW is highly dependent on the quality of the visual vocabulary. In this paper we investigate visual vocabularies which are used to represent images whose local features are described by both shape and color.

To extend BOW to multiple cues, two properties are especially important: cue binding and cue weighting. A visual vocabulary is said to have the binding property when two independent cues appearing at the same location in an image remain coupled in the final image representation. For example, if every local patch in an image is independently described by a shape word and a color word, in the final image representation using compound words the binding property ensures that shape and color words coming from the same feature location are coupled in the final representation. The term binding is borrowed from the neuroscience field, where it is used to describe the way in which humans select and integrate the separate cues of objects in the correct combinations in order to accurately recognize them [17]. The property of cue weighting implies that it is possible to adapt the relevance of each cue depending on the dataset. The importance of cue weighting can be seen from the success of Multiple Kernel Learning (MKL) techniques, where weights for each cue are automatically learned [3, 13, 21, 14, 1, 20].

* A portmanteau is a combination of two or more words to form a neologism that communicates a concept better than any individual word (e.g. Ski resort + Konference = Skonference). We use the term to describe our vocabularies to emphasize the connotation with combining color and shape words into new, more meaningful representations.

Traditionally, two approaches exist to handle multiple cues in BOW. When each cue has its own visual vocabulary the result is known as a late fusion image representation, in which an image is represented as one histogram over shape words and another histogram over color words. Such a representation does not have the cue binding property, meaning that it is impossible to know exactly which color-shape events co-occurred at local features. Late fusion does, however, allow cue weighting. Another approach, called early fusion, constructs a single visual vocabulary of joint color-shape words. Representations over early fusion vocabularies have the cue binding property, meaning that the spatial co-occurrence of shape and color events is preserved. However, cue weighting in early fusion vocabularies is very cumbersome, since it must be performed before vocabulary construction, making cross-validation very expensive. Recently, Khan et al. [10] proposed a method which combines cue binding and weighting. However, their final image representation size is equal to the number of vocabulary words times the number of classes, and is therefore not feasible for the large datasets considered in this paper.

A straightforward, if combinatorially inconvenient, approach to ensuring the binding property is to create a new vocabulary that contains one word for each combination of original shape and color features. Considering that each of the original shape and color vocabularies may contain thousands of words, the resulting joint vocabulary may contain millions. Such large vocabularies are impractical, as estimating joint color-shape statistics is often infeasible due to the difficulty of sampling from limited training data. Furthermore, with so many parameters the resulting classifiers are prone to overfitting. Because of this and other problems, this type of joint feature representation has not been further pursued as a way of ensuring that image representations have the binding property.

In recent years a number of vocabulary compression techniques have appeared that derive small, discriminative vocabularies from very large ones [16, 7, 5]. Most of these techniques are based on information theoretic clustering algorithms that attempt to combine words that are equivalently discriminative for the set of object categories being considered. Because these techniques are guided by the discriminative power of clusters of visual words, estimates of class-conditional visual word probabilities are essential. These recent developments in vocabulary compression allow us to reconsider the direct, Cartesian product approach to building compound vocabularies.

These vocabulary compression techniques have been demonstrated on single-cue vocabularies with a few tens of thousands of words. Starting from even moderately sized shape and color vocabularies results in a compound shape-color vocabulary an order of magnitude larger. In such cases, robust estimates of the underlying class-conditional joint-cue distributions may be difficult to obtain. We show that for typical datasets a strong independence assumption about the joint color-shape distribution leads to more robust estimates of the class-conditional distributions needed for vocabulary compression. In addition, our estimation technique allows flexible cue-specific weighting that cannot be easily performed with other cue combination techniques that maintain the binding property.

2 Portmanteau vocabularies

In this section we propose a new multi-cue vocabulary construction method that results in compact vocabularies which possess both the cue binding and the cue weighting properties described above. Our approach is to build portmanteau vocabularies of discriminative, compound shape and color words chosen from independently learned color and shape lexicons. The term portmanteau is used in natural language for words which are a blend of two other words and which combine their meaning. We use the term portmanteau to describe these compound terms to emphasize the fact that, similarly to the use of neologistic portmanteaux in natural language to capture complex and compound concepts, we create groups of color and shape words to describe semantic concepts inadequately described by shape or color alone.

A simple way to ensure the binding property is by considering a product vocabulary that contains a new word for every combination of shape and color terms. Assume that S = {s_1, s_2, ..., s_M} and C = {c_1, c_2, ..., c_N} represent the visual shape and color vocabularies, respectively.
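The binding property of such a product vocabulary can be made concrete with a short sketch. This is not the authors' code; it is a minimal illustration that assumes per-patch shape and color word indices have already been obtained by quantizing local descriptors, and all function and variable names are ours:

```python
import numpy as np

def product_vocabulary_histogram(shape_words, color_words, M, N):
    """Histogram over the Cartesian-product vocabulary of compound words.

    shape_words, color_words: per-patch word indices of equal length, so
    each local feature contributes one compound (shape, color) word; this
    coupling at the feature level is the cue binding property.
    M, N: sizes of the shape and color vocabularies.
    """
    shape_words = np.asarray(shape_words)
    color_words = np.asarray(color_words)
    # Compound word (s_i, c_j) is flattened to index i * N + j, so the
    # product vocabulary has T = M * N words.
    compound = shape_words * N + color_words
    hist = np.bincount(compound, minlength=M * N).astype(float)
    return hist / hist.sum()

# Toy image: 4 patches, 3 shape words, 2 color words -> T = 6 compound words.
h = product_vocabulary_histogram([0, 0, 2, 1], [1, 1, 0, 1], M=3, N=2)
```

Note that a late fusion representation of the same patches (one 3-bin shape histogram plus one 2-bin color histogram) would discard exactly the per-patch coupling that the compound index preserves.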
Figure 1: Comparison of two estimates of the joint cue distribution p(S, C | R) on two large datasets. The graphs plot the Jensen-Shannon divergence between each estimate and the true joint distribution as a function of the number of training images used to estimate them. The true joint distribution is estimated empirically over all images in each dataset. Estimation using the independence assumption of equation (2) yields similar or better estimates than their empirical counterparts.

The product vocabulary is then given by

W = {w_1, w_2, ..., w_T} = { {s_i, c_j} | 1 \le i \le M, 1 \le j \le N },

where T = MN. We will also use the notation s_m to identify a member from the set S.

A disadvantage of vocabularies of compound terms constructed by considering the Cartesian product of all primitive shape and color words is that the total number of visual words is equal to the number of color words times the number of shape words, which typically results in hundreds of thousands of elements in the final vocabulary. This is impractical for two reasons. First, the high dimensionality of the representation hampers the use of complex classifiers such as SVMs. Second, insufficient training data often renders robust estimation of parameters very difficult, and the resulting classifiers tend to overfit the training set. Because of these drawbacks, compound product vocabularies have, to the best of our knowledge, not been pursued in the literature. In the next two subsections we discuss our approach to overcoming these two drawbacks.

2.1 Compact Portmanteau Vocabularies

In recent years, several algorithms for feature clustering have been proposed which compress large vocabularies into small ones [16, 7, 5]. To reduce the high dimensionality of the product vocabulary, we apply the Divisive Information-Theoretic feature Clustering (DITC) algorithm [5], which was shown to outperform AIB [16]. Furthermore, DITC has also been successfully employed to construct compact pyramid representations [6]. The DITC algorithm is designed to find a fixed number of clusters which minimize the loss in mutual information between clusters and the class labels of training samples. In our algorithm, the loss in mutual information is measured between the original product vocabulary and the resulting clusters. The algorithm joins words which have similar discriminative power over the set of classes in the image categorization problem. This is measured by the probability distributions p(R | w_t), where R = {r_1, r_2, ..., r_L} is the set of L classes. More precisely, the drop in mutual information I between the vocabulary W and the class labels R when going from the original set of vocabulary words W to the clustered representation W^R = {W_1, W_2, ..., W_J} (where every W_j represents a cluster of words from W) is equal to

I(R; W) - I(R; W^R) = \sum_{j=1}^{J} \sum_{w_t \in W_j} p(w_t) \, \mathrm{KL}\!\left( p(R \mid w_t) \,\|\, p(R \mid W_j) \right),   (1)

where KL is the Kullback-Leibler divergence between two distributions. Equation (1) states that the drop in mutual information is equal to the prior-weighted KL-divergence between a word and its assigned cluster. The DITC algorithm minimizes this objective function by alternating computation of the cluster distributions and assignment of compound visual words to their closest cluster. For more details on the DITC algorithm we refer to Dhillon et al. [5].

Figure 2: The effect of α on DITC clusters. Each of the large boxes contains 100 image patches sampled from one portmanteau word on the Oxford Flower-102 dataset. Top row: five clusters for α = 0.1. Note how these clusters are relatively homogeneous in color, while shape varies considerably within each. Middle row: five clusters sampled for α = 0.5. The clusters show consistency over both color and shape. Bottom row: five clusters sampled for α = 0.9. Notice how in this case shape is instead homogeneous within each cluster.

Here we apply the DITC algorithm to reduce the high dimensionality of the compound vocabularies. We call the compact vocabulary which is the output of the DITC algorithm the portmanteau vocabulary, and its words accordingly portmanteau words. The final image representation p(W^R) is a distribution over the portmanteau words.

2.2 Joint distribution estimation

In solving the problem of high dimensionality of the compound vocabularies we seemingly further complicated the estimation problem. As DITC is based on estimates of the class-conditional distributions p(S, C | R) = p(W | R) over product vocabularies, we have increased the number of parameters to be estimated to MNL. This can easily reach millions of parameters for standard image datasets. To solve this problem we propose to estimate the class-conditional distributions by assuming independence of color and shape, given the class:

p(s_m, c_n \mid R) \propto p(s_m \mid R) \, p(c_n \mid R).   (2)

Note that we do not assume independence of the cues themselves, but rather the less restrictive independence of the cues given the class. Instead of directly estimating the empirical joint distribution p(S, C | R), we reduce the number of parameters to estimate to (M + N)L, which in the vocabulary configurations discussed in this paper represents a reduction in complexity of two orders of magnitude. As an additional advantage, we will show in section 2.3 that estimating the joint distribution p(S, C | R) allows us to introduce cue weighting.

To verify the quality of the empirical estimates of equation (2) we perform the following experiment. In figure 1 we plot the Jensen-Shannon (JS) divergence between the empirical joint distribution obtained from the test images and the two estimates: direct estimation of the empirical joint distribution p(S, C | R) on the training set, and an approximate estimate made by assuming independence as in
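The two computations at the heart of equations (1) and (2) can be sketched numerically: class-conditional joints over compound words are built under the independence assumption, and each compound word is then assigned to the cluster minimizing the KL term of equation (1). This is an illustrative sketch, not the full DITC algorithm (which also alternates re-estimation of the cluster distributions); all names are ours:

```python
import numpy as np

def independent_joint_estimate(p_s_given_r, p_c_given_r):
    """Equation (2): p(s_m, c_n | r) is proportional to p(s_m | r) p(c_n | r).

    p_s_given_r: (L, M) shape-word class conditionals; p_c_given_r: (L, N).
    Returns an (L, M*N) matrix of normalized joints over compound words,
    built from (M + N)L parameters instead of the M*N*L needed empirically.
    """
    joint = p_s_given_r[:, :, None] * p_c_given_r[:, None, :]  # (L, M, N)
    joint = joint.reshape(joint.shape[0], -1)
    return joint / joint.sum(axis=1, keepdims=True)

def assign_words_to_clusters(p_r_given_w, p_r_given_cluster, eps=1e-12):
    """One DITC-style assignment step: send each compound word w_t to the
    cluster W_j minimizing KL(p(R | w_t) || p(R | W_j)), the per-word term
    in the mutual-information drop of equation (1)."""
    log_ratio = (np.log(p_r_given_w[:, None, :] + eps)
                 - np.log(p_r_given_cluster[None, :, :] + eps))
    kl = np.sum(p_r_given_w[:, None, :] * log_ratio, axis=2)  # (T, J)
    return kl.argmin(axis=1)

# Toy setup: L = 2 classes, M = 2 shape words, N = 2 color words.
p_s = np.array([[0.7, 0.3], [0.2, 0.8]])
p_c = np.array([[0.6, 0.4], [0.5, 0.5]])
joint = independent_joint_estimate(p_s, p_c)  # (2, 4), each row sums to 1

# Two compound words with known class posteriors, two candidate clusters.
p_r_w = np.array([[0.9, 0.1], [0.1, 0.9]])
p_r_cl = np.array([[0.85, 0.15], [0.2, 0.8]])
labels = assign_words_to_clusters(p_r_w, p_r_cl)  # word 0 -> cluster 0, word 1 -> cluster 1
```

The broadcasting in `independent_joint_estimate` is where the parameter saving shows up: only the (L, M) and (L, N) factors are ever stored or estimated, and the (L, M*N) joint is materialized from them on demand.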
