Please cite this article as: Anna Bosch, et al., A review: Which is the best way to organize/classify images by content?, Image and Vision Computing (2006), doi:10.1016/j.imavis.2006.07.015
ARTICLE IN PRESS

Sub-blocks: the image is first partitioned into several blocks, and then features are extracted from each of those blocks.

In this section a review of the most recent and representative proposals of both global and sub-block approaches is presented.

2.1. Global

Vailaya et al. [10-12] consider the hierarchical classification of vacation images, and show that low-level features can successfully discriminate between many scene types using a hierarchical structure. Using binary Bayesian classifiers, they attempt to capture high-level concepts from low-level image features under the constraint that the test image belongs to one of the classes. At the highest level, images are classified as indoor or outdoor; outdoor images are further classified as city or landscape; finally, a subset of landscape images is classified into sunset, forest, and mountain classes. Different qualitative measures, extracted from the whole image, are used at each level depending on the classification problem: indoor/outdoor (using spatial colour moments); city/landscape (edge direction coherence vectors), and so on.

The classification problem is addressed by using Bayes decision theory. Each image is represented by a feature vector extracted from the image. The probabilistic models required for the Bayesian approach are estimated during a training step. Consider $n$ training samples from a class $\omega$. A vector quantiser is used to extract $q$ codebook vectors, $v_j$ ($1 \le j \le q$), from the training samples. The class-conditional density of a feature vector $y$ given the class $\omega$, i.e., $f(y \mid \omega)$, is then approximated by a mixture of Gaussians (with identity covariance matrices), each centered at a codebook vector, resulting in:

\[ f(y \mid \omega) \propto \sum_{j=1}^{q} m_j \exp\left(-\tfrac{1}{2}\lVert y - v_j \rVert^{2}\right), \]

where $m_j$ is the proportion of training samples assigned to $v_j$. The Bayesian classifier is then defined using the maximum a posteriori (MAP) criterion as follows:

\[ \hat{\omega} = \arg\max_{\omega \in \Omega} \{ p(\omega \mid y) \} = \arg\max_{\omega \in \Omega} \{ f(y \mid \omega)\, p(\omega) \}, \]

where $\Omega$ is the set of pattern classes and $p(\omega)$ represents the a priori class probability.

The proposal reports an excellent performance at each level of the hierarchy over a set of 6931 images. However, it suffers from a limitation inherent to hierarchical classifiers: the cascading of errors. To classify a test image, for example a forest, into a category implies that we have to successfully classify the image at several stages, (1) outdoor, (2) landscape, and (3) forest, with a probability of misclassification at each level. An initial mistake cannot be corrected at lower levels.

Also in [?], global features are used to produce a set of semantic labels with a certain belief for each image. The authors manually label each training image with a semantic label and train $k$ classifiers (one for each semantic label) using support vector machines (SVM). Each test image is classified by the $k$ classifiers and assigned a confidence score for the label that each classifier is attempting to predict. As a result, a $k$-nary label-vector consisting of $k$-class membership is generated for each image. This approach is specially useful for Content Based Image Retrieval (CBIR) and Relevance Feedback (RF) systems. Other authors have followed this global approach, although they have taken other aspects into account. For example, Shen et al. [...]

Fig. 2. Scene classification approaches. Low-level or semantic modelling is the main property that distinguishes the basic strategies for tackling the proposed classification. Several approaches have been identified in both main strategies depending on how they achieve the final scene classification. (Diagram labels: Input image; Low-level Modelling; Semantic Modelling; Global; Sub-Blocks; Concepts; Properties.)

A. Bosch et al. / Image and Vision Computing xxx (2006) xxx-xxx

[...] posterior probability can be expressed by the joint probability, which can be further expressed by the conditional and prior probabilities:

\[ P(S \mid E) = \frac{P(S, E)}{P(E)} = \frac{P(E \mid S)\, P(S)}{P(E)}, \]
where $S$ denotes the semantic task and $E$ denotes evidence. The efficacy of this framework is demonstrated via three applications involving semantic understanding of pictorial images: (i) detection of the main photographic subjects in an image [?], (ii) selecting the most appealing image in an event, and (iii) classifying images into indoor or outdoor scenes. This last application refers specifically to the problem of scene classification [?]. The performance is quantitatively evaluated using only low-level features (Ohta colour space histograms and MSAR texture features as in [?]), and incorporating semantic features (sky and grass objects). They demonstrate that the classification performance can be significantly improved when semantic features are employed in the classification process.

Aksoy et al. also applied a Bayesian framework in a visual grammar. Scene representation is achieved by decomposing the image into prototype regions and modelling the interactions between these regions in terms of their spatial relationships. Initially an image segmentation is performed using a classical split-and-merge algorithm. Then, the technique automatically learns representative region groups which discriminate different scenes and builds visual grammar models. Similarly, in [?], after segmenting the image into regions, features are extracted and regions classified. Finally, based on this local classification the algorithm classifies the entire image. Their main contribution is the finding that adding eigenregions (the principal components of the intensity of the region) to the feature vector improves region classification results and, consequently, image classification rates. A similar approach was proposed by Mojsilovic et al., where the authors first segment the image using colour and texture information to find the semantic indicators (e.g. skin, sky, water, etc.). Then, these objects are used to identify the semantic categories (i.e. people, outdoor, landscapes, etc.).

Finally, we can also include in this approach the proposal of Vogel and Schiele [27,28], although in this case the segmentation is performed by a simple spatial grid layout which splits the image into regular subregions. The technique uses both colour and texture to perform landscape scene classification and retrieval based on a two-stage system. First, the image is partitioned into 10 × 10 subregions, and each one is classified using K-NN or SVM. An image is then represented by a so-called concept occurrence vector (COV), which measures the frequency of different objects in a particular image. The average COV over all members of a category defines the category prototype

\[ P^{c} = \frac{1}{N_c} \sum_{j=1}^{N_c} \mathrm{COV}_j, \]

where $c$ refers to one of the scene categories and $N_c$ to the number of images in that category. Given this image representation, a prototypical representation for each scene category can be learnt. Scene classification is carried out by using the prototypical representation itself or Multi-SVM.

3.2. Local semantic concepts

In recent years, a growing number of proposals in the scene classification literature have made use of local semantic concepts. An intermediary semantic level representation is introduced as a first step between image properties and scene classification in order to deal with the semantic gap between low-level features and high-level concepts. However, none of these proposals relies on an initial segmentation. Instead, the content of the scene is described by local descriptors, for example codewords [29-32] as shown in Fig. 4. What these methods have in common is that they build on the bag-of-words model, a technique used for statistical text analysis.

3.2.1. Bag-of-words

The bag-of-words methodology was first proposed for text document analysis and further adapted for computer vision applications [?]. The models are applied to images by using a visual analogue of a word, formed by vector quantising visual features (colour, texture, etc.) like region descriptors. Recent works have shown that local features represented by bags-of-words are suitable for scene classification, showing impressive levels of performance [33-36].

Fig. 4. Local descriptors represented by 5 × 5 patches at greyscale.
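The bag-of-words representation described above can be illustrated with a minimal sketch. It assumes that a codebook of visual words has already been obtained (for instance by k-means over training descriptors) and that each patch is summarised by a short numeric descriptor. The function names, the toy 2-D descriptors and the three-word codebook are hypothetical and are not taken from any of the reviewed systems.

```python
from collections import Counter


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length descriptors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest_codeword(descriptor, codebook):
    """Index of the codebook vector closest to the descriptor (vector quantisation)."""
    return min(range(len(codebook)),
               key=lambda k: squared_distance(descriptor, codebook[k]))


def bag_of_words(descriptors, codebook):
    """Normalised histogram of visual-word occurrences for one image."""
    if not descriptors:
        raise ValueError("an image needs at least one descriptor")
    counts = Counter(nearest_codeword(d, codebook) for d in descriptors)
    return [counts[k] / len(descriptors) for k in range(len(codebook))]


# Toy example (assumed data): three visual words, four patch descriptors.
codebook = [(0.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
patches = [(0.1, 0.0), (0.9, 1.1), (0.0, 0.2), (0.1, 0.9)]
histogram = bag_of_words(patches, codebook)
print(histogram)  # [0.5, 0.25, 0.25]
```

The spatial arrangement of the patches is discarded: only the frequency of each visual word is kept, which is what makes the representation a "bag". The resulting histogram is the kind of per-image vector that a classifier or a latent topic model would then take as input.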
[...] shows that the spatial envelope property is estimated by a dot product between the amplitude spectrum of the image and a template DST. The DST (discriminant spectral template) is a function that describes how each spectral component contributes to a spatial envelope property. The DST is parametrized by a column vector which is determined during a learning stage. A similar estimation can be performed when using the spectrogram features [...]. The WDST (windowed discriminant spectral template) describes how the spectral components at different spatial locations contribute to a spatial envelope property. The performance of the spatial envelope model shows that specific information about object shape or identity is not a requirement for scene categorisation and that modelling a holistic representation of the scene informs about its probable semantic category. [...] are the number of functions used for the approximations and determine the dimensionality of each representation, and [...] are bases of the energy spectrum.

4. Evaluation

A robust and objective methodology for the evaluation of existing approaches to scene classification is needed in order to discern the best method for a specific application field. Such an evaluation, albeit necessary, is not a trivial task due to the heterogeneous data and classification implementations, and it has often been disregarded in the existing literature on this topic. Proposals differ in the objectives they try to satisfy (e.g. number and kind of scenes to classify) and in the image data they work with (specially constrained in some cases). Furthermore, test details such as how the images were split into training and test sets are often not specified in published works. Hence, unless a given system is implemented and tested for specific image data, it is very difficult to evaluate from the published works how well it would work for that data.

It is our aim here to provide an evaluation of, although not all existing methodologies, the most representative works derived from each criterion reviewed so far. We designed and implemented three algorithms representative of the main approaches identified in this work. We then tested them over the same dataset used by Vogel and Schiele, which allowed us to compare their performance to the existing published results. We compared the results on scene classification obtained by four different methods mentioned above: (i) low-level image representation (LLI), (ii) low-level block representation (LLB), (iii) image segmentation by classifying present objects (IS) and (iv) bag-of-words model using pLSA (BOW). The Vogel and Schiele dataset used includes 700 natural scenes from the Corel Database consisting of six categories: 144 coasts, 103 forests, 179 mountains, 131 open country, 111 river and 34 sky/clouds. The size of the images is 720 × 480 (landscape format) or 480 × 720 (portrait format). Every scene category is characterised by a high degree of diversity and presents potential ambiguities, since it depends strongly on the subjective perception of the viewer. For example, the three scenes in Fig. 7 could also be labelled as forest by someone, since there is also a forest in these images. Moreover, we also evaluated the computational cost of the methods.

4.1. Features and methodology

We used 600 randomly selected training images and the rest for testing, as in [?]. The features used are a concatenation of an 84-bin HSI histogram (with 36 bins for H, 32 bins for S and 16 bins for I), 24 features of the gray-level co-occurrence matrices (32 gray levels): contrast, energy, entropy, homogeneity, inverse difference moment, and correlation, for the displacements [...], and a 72-bin edge direction histogram. The final feature vector is then 180-dimensional. Moreover, we have evaluated optimum parameter values for each technique. Here, only the best results obtained with these parameter values are shown. The methodologies for each strategy are the following:

LLI: The algorithm computes global features for each training image, so that each image is represented by a 180-dimensional vector. A test image is classified using K-NN (with K = 10).

LLB: The algorithm extracts feature vectors for each block in the training images following the strategy proposed by Szummer and Picard. We divided the image into 2 × 2 and 4 blocks. Each block from the test image is classified using K-NN (with K = 10), and the image is then classified by majority voting over the block results.

IS: This is the method implemented in [?]. The authors first classify each image patch (10 × 10 grid) as belonging to a certain object, and the image classification is then carried out using the object distribution. The authors of [?] worked with the same dataset and features as those used here for evaluation. Thus we have used their published performance to compare it to the other approaches in this paper.

BOW: A square neighbourhood of size 5 around a pixel is used to compute the feature vector. The patches are spaced by 3 pixels on a regular grid. In this case we have many feature vectors, and we quantize them using the k-means algorithm to form the visual vocabulary. Then we classify each image using pLSA. We used 700 clusters for the k-means algorithm (vector quantization) and 25 topics when running pLSA.

The classification task is to assign each test image to one of the six categories. The performance is measured using a confusion table, and overall performance rates are measured by the average value of the diagonal entries of the confusion table.

4.2. Classification results

Classification results are shown in Table 1. The worst results were clearly obtained by the low-level approaches: the LLI algorithm achieved 53.25% correct classification, while the LLB algorithm reached a poor 49.12%. A low-level strategy, which considers the scene as an individual object, is normally used to classify only a small number of scene categories (indoor versus outdoor, city versus landscape, etc.). The six categories considered in our experiments are too complex to be distinguished by low-level scene properties. If we look at Fig. 7a, the ground truth of the left image is [...] while the ground truth of the other ones is [...]. However, their colour and texture distributions are very similar (Fig. 7b shows the HSI histograms), and the low-level method LLI fails when it tries to classify river scenes, classifying them as forest scenes. Something similar happens in Fig. 7c, where coast images are confused as mountain.

In fact, the sets of images and categories used by most authors are often constrained. As an example, the categories used by Vailaya were chosen specifically to be nicely separable. The same author recognised in [?]: we thus restricted classification of landscape images into three classes that could be more unambiguously distinguished, namely sunset, forest, and mountain classes. Sunset scenes can be characterised by saturated colours (red, orange or yellow), forest scenes have predominately green colour distribution due to the presence of dense trees and foliage, and mountain scenes can be characterised by long distance shots of mountains.

In contrast, when using local semantic concepts, or object segmentation, we can deal with objects (or concepts) in the images, and classify them in an easy way, according to their distribution. Semantic techniques which make use of an intermediate representation achieved the best results: scores from 74% to almost 77% have been obtained (see Table 1). They can deal with the parts of the image that correspond to [...], and the ones that correspond to the [...], which allows them to distinguish between [...] forest scenes (see Fig. 8). Note that we have not provided a comparison for the semantic properties-based methodologies; however, it has been demonstrated in [?] that bag-of-words methods perform better than the representative proposal of Oliva and Torralba.

Fig. 7. Some typical scenes confused when using global method LLI for classification. (a) river images confused as forest, (c) coast images confused as mountain. (b and d) HSI histograms from the above images.

Table 1. Performance of the compared approaches over the same dataset. LLI, LLB and BOW have been implemented by ourselves, while the IS performance is the score published in [?].

LLI (%)   LLB (%)   IS (%)   BOW (%)
53.25     49.12     74.10    76.92

Computational cost. Low-level strategies have two clear advantages: their simplicity and their low computational cost. Over the set of 600 training images, 3 min are needed to construct the classifier when using LLI, and 6 min and 40 s when using LLB. This computational cost is much higher when using BOW, because we use more information from the images. We need a preprocessing step to construct the visual vocabulary, which is relatively expensive: around 4 h for extracting 6400 descriptors per image and running k-means with 700 clusters. The step for fitting pLSA takes 10 min. However, the costs to classify a test image are comparable: 2 and 7 s for LLI and LLB respectively, and 15 s for BOW. The authors of [?] did not give the computational cost of their algorithm. All the above experiments have been done on a 1.7 GHz computer with a Matlab implementation.

4.3. Discussion

Although low-level strategies present the lowest computational cost, they have a poor performance, because they are unable to distinguish between complex scenes. Hierarchical schemes have been proposed to overcome this drawback; however, our results seem to corroborate the inappropriateness of these methods when the number of categories is increased.

On the other hand, the best results have been obtained when using local semantic concepts with the bag-of-words and pLSA method (76.92%). Besides, this approach has a nice, very relevant property: local semantic approaches are also the ones which require the least user intervention to learn intermediate representations, since they learn directly from the data in an unsupervised (e.g. [35,33]) or semi-supervised [?] way. In contrast, a main requirement of the other semantic modelling approaches is the manual annotation of these properties. In the work of Oliva and Torralba, human subjects are instructed to rank each of the hundreds of training scenes into 6 different properties. In [?], human subjects are asked to classify nearly 60,000 local patches from the training images into nine different semantic concepts. Both cases involve tens of hours of manual labelling. Hence, a drawback of these strategies is their preprocessing cost, although this step could be done off-line.

Focusing on the semantic approaches, the main drawback of segmentation techniques (IS) with respect to those which use local semantic concepts (BOW) is probably the accuracy of the segmentation method. If objects are not well segmented, all the subsequent classification stages will probably fail. In contrast, when using local semantic concepts the segmentation process is omitted and the image is classified by looking at the local patches. Furthermore, another important feature that helps explain the best results obtained by the BOW technique is probably its freedom to choose appropriate concepts (25 in our experiments) for a dataset. The system organises them in its own way in order to have different object representations [...]
Fig. 8. Some scenes confused when using low-level methods and well classified when using local semantic concepts. (a) original image, (b) topics assigned by pLSA to each patch, (c) topic distribution P(z|d) over the topics z for each image. Note that the previously confused images (see text) have different distributions according to their kind of scene, which allows a better classification rate to be obtained.

Table 2. Summary of the analysed scene classification systems. Columns: Author | Objects (classifier) | Scenes (classifier) | Features | # scenes. A dash marks an empty cell.

Low-level strategies

Global
Vailaya et al. [10,12] | - | Bayesian classifiers | LUV and HSV color space (spatial moments, histograms, coherence vectors); MSAR; edge direction histograms, coherence vectors | 5: indoor, city, sunset, forest and mountain
Chang et al. | - | SVM; Bayes point machine | Color, texture | 15: architecture, bears, clouds, elephants, fabric, fireworks, flowers, food, landscape, object images, people, texture, tigers, tools, and waves
Shen et al. | - | SVM; K-NN; GMM | Color; texture; shape | 10: natural scenery, architecture, plants, animals, rocks, flags, buses, food, human faces and roses

Sub-blocks
Szummer and Picard | - | K-NN; 3-layer NN; mixture of experts classifier | Ohta color space; MSAR; shift-invariant | 2: indoor and outdoor
Serrano et al. | - | SVM | LST color space; wavelet texture | 2: indoor and outdoor

Semantic strategies

Objects
Fan et al. | SVM | Bayesian framework | Coverage ratio, region center, region rectangular box, Tamura texture, wavelet texture/color LUV | 6: mountain view, beach, garden, sailing, skiing and desert
Luo et al. [21,23] | K-NN | Bayesian network | Ohta color space, MSAR | 2: indoor, outdoor
Fredembach et al. | Multivariate Gaussian analysis based on the maximum a posteriori rule | Probabilistic method | RGB, Lab, co-occurrence matrix, amplitude spectrum of the Fourier transform | 3: vegetation, sky and skin
Mojsilovic et al. | Naive Bayes classifier | Bayesian framework | Region spatial relationships | Portraits, people, outdoor, crowds, city, indoor, landscapes, etc.
Vogel and Schiele [27,28] | K-NN; SVM | M-SVM; SSD between category prototypes | Color (HSV, RGB) and edge histograms; co-occurrence matrix | 6: sky, coast, mountains, field, river and [...]

Concepts
Fei-Fei et al. | Implicit with the method | Bag-of-words and LDA extension | Dense SIFT and gray level descriptors on a regular grid | 13: forest, coast, mountain, open country, street, inside city, tall buildings, highway, bedroom, suburb, living room, kitchen and office
Quelhas et al. | Implicit with the method | Bag-of-words and pLSA | Sparse SIFT around interest points | 3: indoor, city and landscape
Bosch et al. | Implicit with the method | Bag-of-words and pLSA | Dense SIFT on a regular grid; concentric patches allow scale variation | Up to 13: the same as in [?]
Perronin et al. | Implicit with the method | Bag-of-words and GMM | Dense SIFT on a regular grid | Up to 10: Africa, beach, buildings, buses, dinosaurs, elephants, flowers, horses, mountains and food
Lazebnik et al. | Implicit with the method | Bag-of-words and pyramid kernels | Dense SIFT on a regular grid | 15: the same as in [?] plus industrial and [...]

Properties
Oliva and Torralba [18,44,45] | Implicit with the method | K-NN | Spatial envelope (DST, WDST) | 4 man-made: street, highway, tall buildings and inside city; 4 natural: forest, coast, mountain and open country
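The overall performance rate used in Section 4.1, the average of the diagonal entries of the confusion table, can be illustrated with a short sketch. It assumes that each row of the table is normalised to percentages of the true class, which the text does not state explicitly. The function names and the toy labels are hypothetical and only serve to show the computation.

```python
def confusion_table(true_labels, predicted_labels, classes):
    """Row-normalised confusion table in percent (rows: true class, columns: predicted)."""
    index = {c: i for i, c in enumerate(classes)}
    counts = [[0] * len(classes) for _ in classes]
    for t, p in zip(true_labels, predicted_labels):
        counts[index[t]][index[p]] += 1
    table = []
    for row in counts:
        total = sum(row)
        # Assumption: rows are normalised by the number of images of the true class.
        table.append([100.0 * v / total if total else 0.0 for v in row])
    return table


def overall_rate(table):
    """Average of the diagonal entries of the confusion table."""
    return sum(table[i][i] for i in range(len(table))) / len(table)


# Toy example (assumed data): three categories, eight test images.
classes = ["coast", "forest", "river"]
truth = ["coast", "coast", "forest", "forest", "river", "river", "river", "river"]
predicted = ["coast", "river", "forest", "forest", "river", "forest", "river", "river"]

table = confusion_table(truth, predicted, classes)
rate = overall_rate(table)
print(table[0])  # [50.0, 0.0, 50.0]
print(rate)      # 75.0
```

With row normalisation every category contributes equally to the overall rate, regardless of how many test images it has. In this toy case the plain fraction of correctly classified images happens to give the same value (6 of 8), but the two measures differ in general for unbalanced categories such as those in the dataset of Section 4.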
[21] J. Luo, A.E. Savakis, A. Singhal, A Bayesian network-based framework for semantic image understanding, Pattern Recognition 38 (2005) 919-934.
[22] J. Luo, A. Singhal, S.P. Etz, R.T. Gray, A computational approach to determination of main subject regions in photographic images, Image and Vision Computing 22 (2004) 227-241.
[23] J. Luo, A. Savakis, Indoor vs outdoor classification of consumer photographs using low-level and semantic features, in: IEEE International Conference on Image Processing, vol. 2, Thessaloniki, Greece, 2001, pp. 745-748.
[24] S. Aksoy, K. Koperski, C. Tusk, G. Marchisio, J.C. Tilton, Learning Bayesian classifiers for scene classification with a visual grammar, IEEE Transactions on Geoscience and Remote Sensing 43 (3) (2005).
[25] C. Fredembach, M. Schroder, S. Susstrunk, Eigenregions for image classification, IEEE Transactions on Pattern Analysis and Machine Intelligence 26 (12) (2004) 1645-1649.
[26] A. Mojsilovic, J. Gomes, B. Rogowitz, Isee: Perceptual features for image library navigation, in: Proc. SPIE Human vision and electronic imaging, vol. 4662, San Jose, California, 2002, pp. 266-277.
[27] J. Vogel, Semantic Scene Modeling and Retrieval, no. 33 in Selected Readings in Vision and Graphics, Houghton Hartung-Gorre Verlag Konstanz, 2004.
[28] J. Vogel, B. Schiele, Natural scene retrieval based on a semantic modeling step, in: International Conference on Image and Video Retrieval, LNCS, vol. 3115, Dublin, Ireland, 2004, pp. 207-215.
[29] M. Varma, A. Zisserman, Texture classification: Are filter banks necessary?, in: IEEE Computer Society Conference on Computer Vision and Pattern Recognition, vol. 2, Madison, Wisconsin, 2003, pp. 691-698.
[30] T. Leung, J. Malik, Representing and recognizing the visual appearance of materials using three-dimensional textons, International Journal of Computer Vision 43 (1) (2001) 29-44.
[31] J. Portilla, E. Simoncelli, A parametric texture model based on joint statistics of complex wavelet coefficients, International Journal of Computer Vision 40 (1) (2000) 49-70.
[32] S. Lazebnik, C. Schmid, J. Ponce, A sparse texture representation using affine-invariant regions, in: IEEE Computer Society Conference on Computer Vision and Pattern Recognition, vol. 2, Madison, Wisconsin, 2003, pp. 319-324.
[33] A. Bosch, A. Zisserman, X. Munoz, Scene classification via plsa, in: European Conference on Computer Vision, vol. 4, Graz, Austria, 2006, pp. 517-530.
[34] L. Fei-Fei, P. Perona, A Bayesian hierarchical model for learning natural scene categories, in: IEEE Computer Society Conference on Computer Vision and Pattern Recognition, Washington, DC, USA, 2005, pp. 524-531.
[35] P. Quelhas, F. Monay, J. Odobez, D. Gatica-Perez, T. Tuytelaars, L. Van Gool, Modeling scenes with local descriptors and latent aspects, in: International Conference on Computer Vision, Beijing, China, 2005, pp. 883-890.
[36] S. Lazebnik, C. Schmid, J. Ponce, Beyond bags of features: Spatial pyramid matching for recognizing natural scene categories, in: IEEE Computer Society Conference on Computer Vision and Pattern Recognition, New York, 2006, to appear.
[37] T. Hofmann, Unsupervised learning by probabilistic latent semantic analysis, Machine Learning 41 (2) (2001) 177-196.
[38] D. Blei, A. Ng, M. Jordan, Latent dirichlet allocation, Journal of Machine Learning Research 3 (2003) 993-1022.
[39] Y. Teh, M. Jordan, M. Beal, D. Blei, Hierarchical dirichlet process, Neural Information Processing Systems 17 (2005) 1385-1392.
[40] B.C. Russell, A.A. Efros, J. Sivic, W.T. Freeman, A. Zisserman, Using multiple segmentations to discover objects and their extent in image collections, in: IEEE Computer Society Conference on Computer Vision and Pattern Recognition, New York, 2006, to appear.
[41] F. Perronin, C. Dance, G. Csurka, M. Bressan, Adapted vocabularies for generic visual categorization, in: European Conference on Computer Vision, vol. 4, Graz, Austria, 2006, pp. 464-475.
[42] A. Bosch, X. Munoz, J. Marti, Using appearance and context for outdoor scene object classification, in: IEEE International Conference on Image Processing, vol. II, Genova, Italy, 2005, pp. 1218-1221.
[43] R. Fergus, L. Fei-Fei, P. Perona, A. Zisserman, Learning object categories from google's image search, in: International Conference on Computer Vision, vol. II, Beijing, China, 2005, pp. 1816-1823.
[44] A. Torralba, A. Oliva, Semantic organization of scenes using discriminant structural templates, in: International Conference on Computer Vision, Korfu, Greece, 1999, pp. 1253-1258.
[45] A. Oliva, A. Torralba, Scene-centered description from spatial envelope properties, in: International Workshop on Biologically Motivated Computer Vision, LNCS, vol. 2525, Tuebingen, Germany, 2002, pp. 263-272.
[46] A. Bosch, X. Munoz, A. Oliver, R. Marti, Object and scene classification: what does a supervised approach provide us?, in: IAPR International Conference on Pattern Recognition, Hong Kong, 2006, to appear.
[47] M.D. Squire, W. Muller, H. Muller, T. Pun, Content-based query of image databases: inspirations from text retrieval, Pattern Recognition Letters 21 (2000) 1193-1198.
[48] J. Sivic, B. Russell, A. Efros, A. Zisserman, W.T. Freeman, Discovering objects and their locations in images, in: International Conference on Computer Vision, Beijing, China, 2005, pp. 370-377.