IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE

Rapid Biologically-Inspired Scene Classification Using Features Shared with Visual Attention

Christian Siagian, Member, IEEE, and Laurent Itti, Member, IEEE

(C. Siagian and L. Itti are with the University of Southern California, Departments of Computer Science and Psychology, and the Neuroscience Program, Hedco Neuroscience Building, Room 30A, 3641 Watt Way, Los Angeles, CA 90089-2520. Correspondence should be addressed to itti@usc.edu.)

Abstract—We describe and validate a simple context-based scene recognition algorithm for mobile robotics applications. The system can differentiate outdoor scenes from various sites on a college campus using a multiscale set of early-visual features, which capture the "gist" of the scene into a low-dimensional signature vector. Distinct from previous approaches, the algorithm presents the advantage of being biologically plausible and of having low computational complexity, sharing its low-level features with a model for visual attention that may operate concurrently on a robot. We compare classification accuracy using scenes filmed at three outdoor sites on campus (13,965 to 34,711 frames per site). Dividing each site into nine segments, we obtain segment classification rates between 84.21% and 88.62%. Combining scenes from all sites (75,073 frames in total) yields 86.45% correct classification, demonstrating the generalization and scalability of the approach.

Index Terms—Gist of a scene, saliency, scene recognition, computational neuroscience, image classification, image statistics, robot vision, robot localization.

I. INTRODUCTION

BUILDING the next generation of mobile robots hinges on solving tasks such as localization, mapping, and navigation. These tasks critically depend on developing capabilities to robustly answer the central question: Where are we? A significant number of mobile robotics approaches address this fundamental problem by utilizing sonar, laser, or other range sensors [1]-[3]. They are particularly effective indoors due to many spatial and structural regularities, including flat walls and narrow corridors. In the outdoors, however, these sensors become less robust because the structure of the environment can vary tremendously. It then becomes hard to predict the sensor input given all the protrusions and surface irregularities [4]. For example, a slight change in pose can result in large jumps in range readings because of tree trunks, moving branches, and leaves. These difficulties with traditional robot sensors have prompted research towards other ways to obtain navigational input, especially by using the primary sensory modality of humans: vision. Within Computer Vision (there are several different approaches, listed below), lighting (especially in the outdoors), dynamic backgrounds, and view-invariant matching become major hurdles to overcome.

A. Object-Based Scene Recognition

A large portion of the vision-based approaches towards scene recognition is object-based [5]-[7]. That is, a physical location is recognized by identifying a set of landmark objects (and possibly their configuration) known to be present at that location. This typically involves intermediate steps such as segmentation, feature grouping, and object recognition. Such a layered approach is prone to carrying over and amplifying low-level errors along the stream of processing. For example, upstream identification of very small objects (pixel-wise) is hindered by downstream noise inherent to camera sensors and by variable lighting conditions. This is particularly problematic in spacious environments like the open outdoors, where landmarks tend to be more spread out and possibly at farther distances from the agent. It should also be pointed out that this approach needs to be environment-specific for the simplicity of selecting a small set of anchor objects, and that deciding on reliable and persistent candidate objects as landmarks is an open problem. In recent years the Scale Invariant Feature Transform (SIFT) [8] has been used in robotics quite extensively.
We put SIFT in the object-based category because it is still a local feature whose usage is tied to the existence of reliable, distinctive sub-structures, that is, objects. This is especially true if the background is close to textureless (wide open spaces, the sky). At the other extreme, a distracting background with too much mostly ephemeral texture (moving vegetation inside a forest) may also be too much to deal with, increasing the number of keypoints. On visually distinctive backgrounds, however, this method can be much more powerful than strictly object-based approaches. Some systems [9], [10] bypass the segmentation stage and use the whole scene as one landmark. Because SIFT also allows for partial recognition, foreground objects, which tend to be less reliable (people walking, etc.), can be treated as distractions, and the background becomes the prevailing strength (so long as the foreground objects do not dominate the scene). And because SIFT's ability to absorb variability from out-of-plane rotation can only go so far, systems are also burdened with the need to store a large number of keypoints from multiple views, which does not scale well.

B. Region-Based Scene Recognition

A different set of approaches [11]-[13] eliminates landmark objects, instead using segmented image regions and their configurational relationships to form a signature of a location. At this level of representation, the major problem is reliable region-based segmentation, in which individual regions have to be robustly characterized and associated. Naïve template matching involving rigid relationships is often not flexible enough in the face of over- or under-segmentation. This is especially true with unconstrained environments such as a park where vegetation dominates (refer to Experiment 2 in the present study). As a remedy, one can combine the object-based and region-based approaches [13] by using regions as an intermediate step to locate landmark objects. Nevertheless, such a technique is still prone to the same caveats as the object-based approach.

C. Context-Based Scene Recognition

The last set of approaches, which is context-based, bypasses the above traditional processing steps. Context-based approaches consider the input image as a whole and extract a low-dimensional signature that compactly summarizes the image's statistics and/or semantics. One motivation for such an approach is that it should produce more robust solutions because random noise, which may catastrophically influence local processing, tends to average out globally. By identifying whole scenes, and not small sets of objects or precise region boundaries within the scenes, context-based approaches do not have to deal with noise and low-level image variations in small isolated regions, which plague both region segmentation and landmark recognition. The challenge to discover a compact and holistic representation for unconstrained images has hence prompted significant recent research. For example, Renniger and Malik [14] use a set of texture descriptors and histograms to create an overall profile of an image. Ulrich and Nourbakhsh [15] build color histograms and perform matching using a voting procedure. In contrast, Oliva and Torralba [16] also encode some spatial information by performing 2D Fourier Transform analyses in individual image sub-regions on a regularly-spaced grid. The resulting spatially-arranged set of signatures, one per grid region, is then further reduced using principal component analysis (PCA) to yield a unique low-dimensional image classification key. Interestingly, Torralba reports that the entries in the key vector sometimes tend to correlate with semantically-relevant dimensions, such as city vs. nature, or beach vs. forest. In more recent implementations, Torralba [17] also used steerable wavelet pyramids instead of the Fourier transform.

D. Biologically-Plausible Scene Recognition

Despite all the recent advances in computer vision and robotics, humans still perform orders of magnitude better than the best available vision systems in outdoor localization and navigation.
As such, it is inspiring to examine the low-level mechanisms as well as the system-level computational architecture according to which human vision is organized. Early on, the human visual processing system already makes decisions to focus attention and processing resources onto those small regions within the field of view which look more interesting. The mechanism by which very rapid holistic image analysis gives rise to a small set of candidate salient locations in a scene has recently been the subject of comprehensive research efforts and is fairly well understood [18]-[21].

In parallel with attention guidance and mechanisms for saliency computation, humans demonstrate an exquisite ability at instantly capturing the "gist" of a scene; for example, following presentation of a photograph for just a fraction of a second, an observer may report that it is an indoor kitchen scene with numerous colorful objects on the countertop [22]-[25]. Such a report at a first glance onto an image is remarkable considering that it summarizes the quintessential characteristics of an image, a process previously expected to require much analysis. With very brief exposures (100 ms or below), reports are typically limited to a few general semantic attributes (e.g., indoors, outdoors, office, kitchen) and a coarse evaluation of distributions of visual features (e.g., highly colorful, grayscale, several large masses, many small objects) [26], [27]. However, answering specific questions such as whether an animal was present or not in the scene can be performed reliably down to exposure times of 28 ms [28], [29], even when the subject's attention is simultaneously engaged by another concurrent visual discrimination task [30]. Gist may be computed in brain areas which have been shown to preferentially respond to "places," that is, visual scene types with a restricted spatial layout [31]. Spectral contents and color diagnosticity have been shown to influence gist perception [25], [32], leading to the development of the existing computational models that emphasize spectral analysis [33], [34].

In what follows, we use the term gist in a more specific sense than its broad psychological definition (what observers can gather from a scene over a single glance): we formalize gist as a relatively low-dimensional (compared to a raw image pixel array) scene representation which is acquired over very short time frames, and we thus represent gist as a vector in some feature space. Scene classification based on gist then becomes possible if and when the gist vector corresponding to a given image can be reliably classified as belonging to a given scene category.

From the point of view of desired results, gist and saliency appear to be complementary opposites: finding salient locations requires finding those image regions which stand out by significantly differing from their neighbors, while computing gist involves accumulating image statistics over the entire scene. Yet, despite these differences, there is only one visual cortex in the primate brain, which must serve both saliency and gist computations. Part of our contribution is to make the connection between these two crucial components of biological vision. To this end, to be biologically plausible, we here explicitly explore whether it is possible to devise a working system where the low-level feature extraction mechanisms (coarsely corresponding to cortical visual areas V1 through V4 and MT) are shared and serve both attention and gist, as opposed to being computed separately by two different machine vision modules. The divergence comes at a later stage, in how the low-level vision features are further processed before being utilized. In our neural simulation of posterior parietal cortex along the dorsal or "where" stream of visual processing [35], a saliency map is built through spatial competition of low-level feature responses throughout the visual field. This competition quiets down locations which may initially yield strong local feature responses but resemble their neighbors, while amplifying locations which have distinctive appearances.
In contrast, in our neural simulation of inferior temporal cortex, or the "what" stream of visual processing, responses from the low-level feature detectors are combined to produce the gist vector as a holistic low-dimensional signature of the entire input image. The two models, when run in parallel, can help each other and provide a more complete description of the scene in question. Figure 1 shows a diagram of our implementation. In the present paper, our focus is on image classification using the gist signature computed by this model, while exploitation of the saliency map has been extensively described previously for a number of vision tasks [20], [21], [36], [37].

We describe, in the following sections, our algorithm to compute gist in a very inexpensive manner by using the same low-level visual front-end as the saliency model. We then extensively test the model in three challenging outdoor environments across multiple days and times of day, where the dominating shadows, vegetation, and other ephemeral phenomena are expected to defeat landmark-based and region-based approaches. Our success in achieving reliable performance in each environment is further generalized by showing that performance does not degrade when combining all three environments. These results support our hypothesis that gist can reliably be extracted at very low computational cost, using very simple visual features shared with an attention system in an overall biologically-plausible framework.

Fig. 1. Model of Human Vision with Gist and Saliency. (The input image is linearly filtered at 8 spatial scales in orientation, color, and intensity channels; these shared features feed both the saliency model, leading to attention and local object recognition, and the gist model, leading to the gist of the image and scene layout, which combine in cognition.)

II. DESIGN AND IMPLEMENTATION

The core of our present research focuses on the process of extracting the gist of an image using features from several domains, calculating its holistic characteristics while still taking into account coarse spatial information. The starting point for the proposed new model is the existing saliency model of Itti et al. [20], [38], which is freely available on the World-Wide Web.

A. Visual Cortex Feature Extraction

In the saliency model, an input image is filtered in a number of low-level visual "feature channels" at multiple spatial scales, for features of color, intensity, orientation, flicker, and motion (found in Visual Cortex). Some channels (color, orientation, and motion) have several sub-channels (color type, orientation, direction of motion). Each sub-channel has a nine-scale pyramidal representation of filter outputs, with a ratio of 1:1 (level 0) to 1:256 (level 8) in both horizontal and vertical dimensions, and 5-by-5 Gaussian smoothing applied between scales. Within each sub-channel i, the model performs center-surround operations (commonly found in biological vision, comparing image values at a center location to those at its neighboring surround locations) between filter output maps O_i(s) at different scales s in the pyramid. This yields feature maps M_i(c, s), given a "center" (finer) scale c and a "surround" (coarser) scale s. Our implementation uses c = 2, 3, 4 and s = c + d, with d = 3, 4. The across-scale difference (operator (-)) between two maps is obtained by interpolation to the center (finer) scale and pointwise absolute difference (eqn. 1). For color and intensity channels:

M_i(c, s) = |O_i(c) (-) O_i(s)| = |O_i(c) - Interp_{s->c}(O_i(s))|   (1)

Hence, we compute six feature maps for each type of feature, at scale pairs 2-5, 2-6, 3-6, 3-7, 4-7, and 4-8, so that the system can gather information in regions at several scales, with added lighting invariance provided by the center-surround comparison (further discussed below).
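As a concrete illustration of the pyramid and center-surround computation, the following Python sketch (our own minimal re-implementation, not the authors' code) builds the nine-level pyramid and the six M_i(c, s) maps for one sub-channel; OpenCV's pyrDown conveniently applies the 5-by-5 Gaussian smoothing between scales:

```python
import cv2
import numpy as np

def gaussian_pyramid(image, levels=9):
    """Nine-scale pyramid: level 0 is 1:1, level 8 is 1:256 in each dimension."""
    pyr = [image.astype(np.float32)]
    for _ in range(1, levels):
        pyr.append(cv2.pyrDown(pyr[-1]))  # 5x5 Gaussian smoothing, then downsample
    return pyr

def center_surround(pyr, c, s):
    """M(c, s) of eqn. 1: interpolate the surround map up to the center scale,
    then take the pointwise absolute difference."""
    h, w = pyr[c].shape[:2]
    surround = cv2.resize(pyr[s], (w, h), interpolation=cv2.INTER_LINEAR)
    return np.abs(pyr[c] - surround)

def subchannel_feature_maps(image):
    """Six maps per sub-channel: c = 2, 3, 4 and s = c + d with d = 3, 4."""
    pyr = gaussian_pyramid(image)
    return {(c, c + d): center_surround(pyr, c, c + d)
            for c in (2, 3, 4) for d in (3, 4)}
```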
In the saliency model, the feature maps were used to detect conspicuous regions in each channel, through additional winner-take-all mechanisms which emphasize locations that substantially differ from their neighbors [20]. The feature maps are then linearly combined to yield a saliency map. To re-use the same low-level maps for gist as for attention, our gist model uses the already available orientation, color, and intensity channels. The flicker and motion channels, which also contribute to the saliency map, are assumed to be more dominantly determined by the robot's egomotion and hence unreliable in forming a gist signature of a given location. Our basic approach is to exploit statistical data of color and texture measurements in predetermined region subdivisions. These features are independent of shape as they simply denote lines and blobs.

We incorporate information from the orientation channel by applying Gabor filters to the greyscale input image (eqn. 2) at four different angles (theta_i = 0, 45, 90, 135 degrees) and at four spatial scales (c = 0, 1, 2, 3), for a subtotal of sixteen sub-channels. For orientation channels:

M_i(c) = Gabor(theta_i, c)   (2)

We do not perform center-surround on the Gabor filter outputs because these filters already are differential by nature. The color and intensity channels combine to compose three pairs of color opponents derived from Ewald Hering's Color Opponency theories [39], which identify four primary colors, red, green, blue, and yellow (denoted as R, G, B, and Y in eqns. 3, 4, 5, and 6, respectively), and two hueless dark and bright colors (eqn. 7), computed from the raw camera r, g, b outputs [20]:

R = r - (g + b)/2   (3)
G = g - (r + b)/2   (4)
B = b - (r + g)/2   (5)
Y = r + g - 2(|r - g| + b)   (6)
I = (r + g + b)/3   (7)

The color opponency pairs are the two color channels' red-green and blue-yellow opponencies (eqns. 8 and 9), along with the intensity channel's dark-bright opponency (eqn. 10):

RG(c, s) = |(R(c) - G(c)) (-) (R(s) - G(s))|   (8)
BY(c, s) = |(B(c) - Y(c)) (-) (B(s) - Y(s))|   (9)
I(c, s) = |I(c) (-) I(s)|   (10)

Each of the opponency pairs is used to construct six center-surround scale combinations. These eighteen sub-channels, along with the sixteen Gabor combinations, add up to a total of thirty-four sub-channels altogether. Because the present gist model is not specific to any domain, other channels such as stereo could be used as well. Figure 2 illustrates the gist model architecture.

Fig. 2. Visual Feature Channels Used in the Gist Model. (The input image is linearly filtered at 8 spatial scales in orientation, color, and intensity channels; cross-scale center-surround differences feed gist feature extraction, producing the gist feature vector, followed by PCA/ICA dimension reduction and a place classifier that outputs the most likely location.)
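In the same spirit, a hedged sketch of these sub-channels follows: the opponency maps of eqns. 3-7 feed the center-surround operator from the previous sketch, while the sixteen Gabor sub-channels (eqn. 2) are computed directly on pyramid levels of the greyscale image. The Gabor kernel parameters (size, sigma, wavelength) are not specified at this level of detail in the paper, so the values below are placeholders:

```python
import cv2
import numpy as np

def opponency_maps(rgb):
    """Hering opponent primaries (eqns. 3-7) from raw camera r, g, b values.
    Returns red-green, blue-yellow, and intensity maps, whose pyramids feed
    the center-surround differences of eqns. 8-10."""
    r, g, b = [rgb[..., i].astype(np.float32) for i in range(3)]
    R = r - (g + b) / 2
    G = g - (r + b) / 2
    B = b - (r + g) / 2
    Y = r + g - 2 * (np.abs(r - g) + b)
    I = (r + g + b) / 3
    return R - G, B - Y, I

def gabor_subchannels(grey):
    """Sixteen orientation sub-channels (eqn. 2): 4 angles x 4 scales.
    Kernel parameters here are illustrative assumptions."""
    maps = {}
    pyr = [grey.astype(np.float32)]
    for _ in range(3):
        pyr.append(cv2.pyrDown(pyr[-1]))  # spatial scales c = 0..3
    for c in range(4):
        for theta_deg in (0, 45, 90, 135):
            kernel = cv2.getGaborKernel((9, 9), sigma=2.0,
                                        theta=np.deg2rad(theta_deg),
                                        lambd=5.0, gamma=1.0)
            maps[(theta_deg, c)] = np.abs(cv2.filter2D(pyr[c], cv2.CV_32F, kernel))
    return maps
```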
B. Gist Feature Extraction

After the low-level center-surround features are computed, each sub-channel extracts a gist vector from its corresponding feature map. We apply averaging operations (the simplest neurally-plausible computation) in a fixed four-by-four grid of sub-regions over the map. See figure 3 for a visualization of the process.

Fig. 3. Gist Decomposition of the Vertical Orientation Sub-channel. The original image (top left) is put through a vertically-oriented Gabor filter to produce a feature map (top right). That map is then divided into 4-by-4 grid sub-regions. We then take the mean of each grid region to produce 16 values for the gist feature vector (bottom).

Thus, as proposed in the introduction, gist accumulates information over space in image sub-regions, while saliency relies on competition across space. Equation 11 formalizes the computation for each of the sixteen raw gist features G_i^{k,l}(c, s) per map, taking the sum, over a given sub-region (specified by indices k and l in the horizontal and vertical directions, respectively), of the values in M_i(c, s), then dividing by the number of pixels in the sub-region. For color and intensity channels:

G_i^{k,l}(c, s) = (16 / (W H)) * Sum_{u = kW/4}^{(k+1)W/4 - 1} Sum_{v = lH/4}^{(l+1)H/4 - 1} [M_i(c, s)](u, v)   (11)

where W and H are the width and height of the entire image. We similarly process the orientation maps M_i(c) to compute G_i^{k,l}(c).

Although additional statistics such as variance would certainly provide useful additional descriptors, their computational cost is much higher than that of first-order statistics, and their biological plausibility remains debated [40]. Thus, here we explore whether first-order statistics are sufficient to yield reliable classification, if one relies on the available variety of visual features to compensate for more complex statistics within each feature.
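Eqn. 11 is simply a 4-by-4 block mean. A direct sketch of ours (with integer block boundaries as an assumption when W or H is not divisible by 4):

```python
import numpy as np

def grid_means(feature_map, grid=4):
    """Raw gist features for one map (eqn. 11): the mean of each of the
    grid x grid sub-regions, scanned left-to-right, top-to-bottom."""
    H, W = feature_map.shape
    gist = np.empty(grid * grid, dtype=np.float32)
    for l in range(grid):        # vertical sub-region index
        for k in range(grid):    # horizontal sub-region index
            block = feature_map[l * H // grid:(l + 1) * H // grid,
                                k * W // grid:(k + 1) * W // grid]
            gist[l * grid + k] = block.mean()
    return gist

# 34 sub-channel maps x 16 means each = 544 raw gist features per frame.
```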
C. Color Constancy

The advantage of coarse statistics-based gist is its stability in averaging out local and random noise. More concerning is global bias such as lighting, as it changes the appearance of the entire image. Color constancy algorithms such as gray world and white patch assume that lighting is constant throughout a scene [41], [42]. Unfortunately, outdoor ambient light is not quite as straightforward. Not only does it change over time, both in luminance and chrominance, but it also varies within a single scene, as it is not a point light source. Different sun positions and atmospheric conditions illuminate different parts of a scene to varying degrees, as illustrated by images taken one hour apart, juxtaposed in the first row of figure 4. We can see that the foreground of image 1 receives more light while the background does not; conversely, the opposite occurs in image 2. It is important to note that the goal of this step is not to recognize or normalize color with high accuracy, but to produce gist features that are stable over color and intensity. We also considered another (iterative and slower-converging) normalization technique called Comprehensive Color Normalization (CCN) [43], which can be seen as both global and local. One indisputable fact is that when texture is lost because of lighting saturation (either too bright or too dark for the camera sensor), no normalization, however sophisticated, can bring it back. To this end, because of the nature of our gist computation, the best way is to recognize gists of scenes with different lighting separately. We thus opted not to add any preprocessing, but instead to train our gist classifier (described below) on several lighting conditions. The gist features themselves already help minimize the effect of illumination change because of their differential nature (Gabor or center-surround). Peak signal-to-noise ratio (pSNR) tests for the two images with differing lighting conditions in figure 4 show better invariance for our differential features than for the raw r, g, b features, especially for the two opponent color channels. This shows that the low-level feature processing produces contrast information that is reasonably robust to lighting.

Fig. 4. Example of two lighting conditions of the same scene. pSNR (peak signal-to-noise ratio) values measure how similar the maps are between images 1 and 2. Higher pSNR values for a given map indicate better robustness of that feature to variations in lighting conditions. Our center-surround channels exhibit better invariance than the raw r, g, b channels although, obviously, they are not completely invariant.

Channel | Sub-channel | pSNR (dB)
Raw | r | 9.24
Raw | g | 9.60
Raw | b | 10.08
Red-Green | 2&5 | 32.57
Red-Green | 2&6 | 32.13
Red-Green | 3&6 | 34.28
Red-Green | 3&7 | 33.95
Red-Green | 4&7 | 36.32
Red-Green | 4&8 | 35.82
Blue-Yellow | 2&5 | 32.44
Blue-Yellow | 2&6 | 30.83
Blue-Yellow | 3&6 | 32.42
Blue-Yellow | 3&7 | 30.95
Blue-Yellow | 4&7 | 31.95
Blue-Yellow | 4&8 | 31.95
Dark-Bright | 2&5 | 15.03
Dark-Bright | 2&6 | 12.72
Dark-Bright | 3&6 | 13.79
Dark-Bright | 3&7 | 12.21
Dark-Bright | 4&7 | 13.29
Dark-Bright | 4&8 | 13.33
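For reference, pSNR between two feature maps can be computed as below. This is the standard formulation; the choice of peak value (we use the larger maximum of the two maps) is our assumption, as the paper does not spell out its normalization:

```python
import numpy as np

def psnr_db(map1, map2):
    """Peak signal-to-noise ratio (dB) between two feature maps."""
    a = map1.astype(np.float64)
    b = map2.astype(np.float64)
    mse = np.mean((a - b) ** 2)
    if mse == 0.0:
        return float('inf')          # identical maps
    peak = max(a.max(), b.max())     # assumed peak value
    return 10.0 * np.log10(peak ** 2 / mse)
```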
Note that using differential features comes at a price: baseline information (e.g., absolute color distributions) is omitted from our gist encoding even though it has been shown to be useful [15]. The center-surround operator can be construed as only looking for the edges surrounding regions (and not the regions themselves), because that is where the contrasts are. On the other hand, calculating color distribution histograms amounts to measuring the size of those regions. This is where the pyramid scheme helps recover some of the information. With the pyramid scheme the system can pick up regions at coarser scales [25] and indirectly infer the absolute distribution information, with the added lighting invariance. As an example, the intensity channel output for the illustration image of figure 5 shows different-sized regions being emphasized according to their respective center-surround parameters.

D. PCA/ICA Dimension Reduction

The total number of raw gist feature dimensions is 544: 34 feature maps times 16 regions per map (figure 5). We reduce the dimensions using Principal Component Analysis (PCA) and then Independent Component Analysis (ICA) with FastICA [44] to a more practical number of 80, while still preserving up to 97% of the variance for a set in the upwards of 30,000 campus scenes.
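A minimal stand-in for this reduction step, using scikit-learn's PCA and FastICA [44] (the paper's exact FastICA settings are not given, so defaults are assumed here):

```python
from sklearn.decomposition import PCA, FastICA

def fit_gist_reducer(raw_gist, n_components=80):
    """raw_gist: (n_frames, 544) array of raw gist vectors."""
    pca = PCA(n_components=n_components).fit(raw_gist)
    ica = FastICA(n_components=n_components, max_iter=1000)
    ica.fit(pca.transform(raw_gist))
    return pca, ica

def reduce_gist(pca, ica, raw_gist):
    """Project 544-D raw gist vectors down to the 80-D feature space."""
    return ica.transform(pca.transform(raw_gist))
```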
E. Scene Classification

For scene classification, we use a three-layer neural network (with intermediate layers of 200 and 100 nodes), trained with the back-propagation algorithm on a 1.667 GHz Athlon AMD machine. The main reason for using a neural network classifier is its success rate (see Results), which proves that it is adequate. In addition, it is easy to add more samples, and the training process takes a short amount of time. The complete process is illustrated in figure 5.

Fig. 5. Example of Gist Feature Extraction. (The input image is filtered in the orientation, color, and intensity channels; the resulting feature maps yield the gist features, which pass through PCA/ICA dimension reduction and a place classifier to produce the most likely location.)

III. TESTING AND RESULTS

We test the system at several sites on campus (map shown in figure 6). The first one is the Ahmanson Center for Biological Research (ACB), in which the scenes are filmed around the building complex. Most of the surroundings are flat walls with little texture and solid lines that delineate the walls and different parts of the buildings. A region-based representation would find this environment ideal. Figure 7 shows some of the scenes around ACB with their corresponding visual depiction of gist.

Fig. 6. Map of the experiment sites.

The second site is a region comprised of two adjoining parks, Associates and Founders Park (AnF), which is dominated by vegetation. Large areas of the images are practically un-segmentable, as leaves overrun most regions. And although there are objects such as lampposts and benches, lighting inside the park may often render their recognition difficult because the sunlight is randomly blocked by the trees. Refer to figure 8 for the variety of the visual stimuli collected along the path. The third site is an open area in Frederick D. Fagg Jr. Park. A large portion of the scenes (figure 9) is the sky, mostly textureless space with random light clouds.

To collect visual data we use an 8mm handheld camcorder. The captured video clips are hardly stable, as the camera is carried by a person walking while filming. Moreover, attempts to smooth out image jitter are not always successful. At this point the data is still view-specific, as each location is only traversed from one direction. For view-invariant scene recognition, we need to train the system on multiple views [17]. We have tried to sweep the camera left to right (and vice versa) to create a wider point of view, although to retain performance the sweep has to be done at a much slower pace than regular walking speed [45]. The current testing setup is slightly selective, as the amount of foreground interference is minimized by filming during off-peak hours when fewer people are out walking. It should be pointed out that gist can include foreground objects as part of a scene as long as they do not dominate it.

For classification, we divide the video clips into segments. A segment is a portion of a hallway, path, or road interrupted by a crossing or a physical barrier at both ends. The path is divided this way because, along with natural geographical delineation, images within each segment look similar to a human observer. Moreover, when separating the data at a junction, we take special care in creating a clean break between the two involved segments. That is, we stop short of a crossing for the current segment and wait a few moments before starting the next one. This ensures that the system will be trained with data where ground-truth labeling (assigning a segment number to an image) is unambiguous. In addition, we include all of the frames in the middle of the clips; no by-hand selection is done.

The main issue in collecting training samples is the selection of filming times that include all lighting conditions. Because the lighting space is hard to gauge, we perform trial-and-error to come up with times of the day which attempt to cover the space. We take data at up to six different times of the day, twice for each, for several days. They cover the brightest (noontime) to the darkest (early evening) lighting conditions, overcast vs. clear, and encompass notable changes in appearance due to increases in temperature (hazy mid-afternoon). Although we are bound to exclude some lighting conditions, the results show that the collected samples cover a large portion of the space. It should be noted that most (10 of 12) of the training and testing data sets are taken on different days. For the two taken on the same day, the testing data was recorded in the early evening (dark lighting) while the training data was recorded near noon (bright lighting).

Each of the first three experiments uses the same classifier neural network structure, with nine output-layer nodes (the same as the number of segments). We use absolute encoding for the training data. For example, if the correct answer for an image is segment 1, the corresponding node is assigned 1.0 while the others are all 0.0. The encoding allows for probabilistic outputs for scenes in the testing data. For completeness, the intermediate layers have 200 and 100 nodes, respectively, while we have 80 input nodes (for the 80 features of the gist vector). That is a total of 80*200 + 200*100 + 100*9 = 36,900 connections. Our cut-off for convergence is 1% training error. All training is done on a 1.667 GHz Athlon AMD machine.
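A scikit-learn stand-in for this classifier is sketched below. The paper's network uses absolute (one-hot) target encoding with back-propagation, which MLPClassifier approximates with a softmax output layer, so this is illustrative rather than an exact reproduction; the training data here is random placeholder data:

```python
import numpy as np
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(0)
X_train = rng.normal(size=(1000, 80))    # placeholder 80-D gist vectors
y_train = rng.integers(0, 9, size=1000)  # placeholder segment labels 0..8

# 80 inputs -> 200 -> 100 hidden -> 9 outputs:
# 80*200 + 200*100 + 100*9 = 36,900 connections (excluding biases).
clf = MLPClassifier(hidden_layer_sizes=(200, 100), activation='logistic',
                    solver='sgd', learning_rate_init=0.01, max_iter=300)
clf.fit(X_train, y_train)
segment_probs = clf.predict_proba(X_train[:5])  # probabilistic segment outputs
```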
A. Experiment 1: Ahmanson Center for Biological Research (ACB)

This experimental site was chosen to investigate what the system can achieve in a rigid and less spacious man-made environment. Each segment is a straight line and part of a hallway. Some hallways are divided into two segments so that each segment is approximately of the same length. Figure 10 shows the map of the segments while figure 7 displays scenes from each segment. Note that the map shows that the segments are not part of a single continuous path, but a series of available walkways within an environment, traversed from a single direction.

Fig. 7. Examples of images in each segment of ACB.

Fig. 10. Map of the path segments of ACB.

Figure 11 represents the four lighting conditions used in testing: late afternoon, early evening (note that the lights are already turned on), noon, and mid-afternoon. We chose two dark and two bright conditions to assure a wide range of testing conditions. We train the system several times before choosing a training run that gives the highest classification result on the training data. After only about twenty epochs, the network converges to less than 1% error. A fast rate of training convergence in the first few epochs appears to be a telling sign of how successful classification will be during testing.

Fig. 11. Lighting conditions used for testing at the Ahmanson Center for Biological Research (ACB). Clockwise from top left: late afternoon, early evening, noon, and mid-afternoon.

Table I shows the results. The term "False+" for segment x denotes the number of incorrect guesses of segment x when the correct answer is another segment, divided by the total number of frames classified as segment x; conversely, "False-" is the number of incorrect guesses when the correct answer is segment x, divided by the total number of frames belonging to segment x.

TABLE I
AHMANSON CENTER FOR BIOLOGY EXPERIMENTAL RESULTS

Seg | T1 False+ | T1 False- | T2 False+ | T2 False- | T3 False+ | T3 False- | T4 False+ | T4 False- | Total False+ | Total False-
1 | 14/390 | 11/387 | 17/380 | 47/410 | 32/393 | 27/388 | 39/445 | 5/411 | 102/1608 (6.34%) | 90/1596 (5.64%)
2 | 20/346 | 114/440 | 133/468 | 101/436 | 85/492 | 54/461 | 18/325 | 131/438 | 256/1631 (15.70%) | 400/1775 (22.54%)
3 | 1/463 | 3/465 | 0/456 | 29/485 | 82/502 | 43/463 | 33/475 | 31/473 | 116/1896 (6.12%) | 106/1886 (5.62%)
4 | 7/348 | 18/359 | 24/338 | 7/321 | 5/226 | 84/305 | 7/148 | 108/249 | 43/1060 (4.06%) | 217/1234 (17.59%)
5 | 46/348 | 5/307 | 52/389 | 0/337 | 64/290 | 95/321 | 125/403 | 41/319 | 287/1430 (20.07%) | 141/1284 (10.98%)
6 | 24/567 | 13/556 | 39/478 | 56/495 | 23/533 | 24/534 | 69/564 | 7/502 | 155/2142 (7.24%) | 100/2087 (4.79%)
7 | 43/410 | 71/438 | 55/371 | 129/445 | 136/439 | 95/398 | 108/486 | 22/400 | 342/1706 (20.05%) | 317/1681 (18.86%)
8 | 101/391 | 0/290 | 18/265 | 0/247 | 67/320 | 21/274 | 37/303 | 22/288 | 223/1279 (17.44%) | 43/1099 (3.91%)
9 | 65/320 | 86/341 | 46/404 | 15/373 | 17/262 | 68/313 | 29/227 | 98/296 | 157/1213 (12.94%) | 267/1323 (20.18%)

Per-trial error totals: Trial 1: 321/3583 (8.96%); Trial 2: 384/3549 (10.82%); Trial 3: 511/3457 (14.78%); Trial 4: 465/3376 (13.77%); overall: 1681/13965 (12.04%).

The table shows that the system is able to classify the segments consistently during the testing phase, with a total error of 12.04%, or an overall 87.96% correctness. We also report the confusion matrix in table II and find that the errors are, in general, not uniformly distributed. Spikes of classification errors between segments 1 and 2 suggest a possibility of significantly overlapping scenes (segment 2 is a continuation of segment 1; see figure 10). On the other hand, there are also errors that are not as easily explainable. For example, there are 163 false positives for segment 2 when the ground truth is segment 7 (and 141 false positives in the other direction). From figure 7 we can see that there is little resemblance between the two appearance-wise. However, if we consider just the coarse layout, the structures of both segments are similar, with a white region on the left side and a dark red one on the right side.

TABLE II
AHMANSON CENTER FOR BIOLOGY CONFUSION MATRIX
(9-by-9 counts: true segment number, rows, versus segment number guessed by the algorithm, columns.)
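The False+ and False- rates of Tables I (and, below, III and V) follow directly from a confusion matrix such as Table II. A small helper of ours makes the bookkeeping explicit, using the same matrix orientation as Table II (rows are true segments, columns are guessed segments):

```python
import numpy as np

def false_rates(confusion):
    """Per-segment False+ and False- rates from a confusion matrix where
    confusion[i, j] counts frames of true segment i classified as segment j."""
    conf = np.asarray(confusion, dtype=np.float64)
    guessed = conf.sum(axis=0)   # frames the algorithm assigned to each segment
    actual = conf.sum(axis=1)    # frames truly belonging to each segment
    correct = np.diag(conf)
    false_pos = (guessed - correct) / guessed  # wrong guesses of segment x
    false_neg = (actual - correct) / actual    # frames of x labeled as another
    return false_pos, false_neg
```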
B. Experiment 2: Associates and Founders Park (AnF)

We now compare the results of Experiment 1 with, conceivably, a more difficult classification task: segmenting paths in a vegetation-dominated site. Figure 12 maps out the segments while figure 8 displays a sample image from each of them. As we can see, there are fewer extractable structures. In addition, the lengths of the segments at this site are about twice the lengths of the segments in Experiment 1. As in Experiment 1, we perform multi-layer neural network classification using the back-propagation algorithm with the same network architecture and parameters. The number of epochs for training convergence is less than 40, about twice that found for Experiment 1.

Fig. 8. Examples of images in each segment of Associates and Founders Park.

Fig. 12. Map of the path segments of Associates and Founders Park.

Figure 13 shows the four lighting conditions tested: early evening (lights already turned on), overcast, noon, and mid-afternoon. Also note that in the first test run, the bench in the front is missing from the image. We encounter similar challenges in other segments, such as service vehicles parked or a huge storage box placed in the park for a day.

Fig. 13. Lighting conditions used for testing at Associates and Founders Park (AnF). Clockwise from top left: early evening, overcast, noon, and mid-afternoon.

The results are shown in table III; the confusion matrix (table IV) is also reported. A quick glance at table III reveals that the total error, at 15.79% (an 84.21% success rate), is higher than in Experiment 1. However, if we look at the challenges presented by the scenes, it is quite an accomplishment to lose less than 4% in performance. In addition, no calibration is done in moving from the first environment to the second.

TABLE III
ASSOCIATES AND FOUNDERS PARK EXPERIMENTAL RESULTS

Seg | T1 False+ | T1 False- | T2 False+ | T2 False- | T3 False+ | T3 False- | T4 False+ | T4 False- | Total False+ | Total False-
1 | 71/559 | 210/698 | 177/539 | 440/802 | 140/786 | 245/891 | 49/733 | 62/746 | 437/2617 (16.70%) | 957/3137 (30.51%)
2 | 38/544 | 64/570 | 107/429 | 6/328 | 271/558 | 187/474 | 122/584 | 12/474 | 538/2115 (25.44%) | 269/1846 (14.57%)
3 | 57/851 | 71/865 | 54/814 | 217/977 | 206/1096 | 78/968 | 38/996 | 5/963 | 355/3757 (9.45%) | 371/3773 (9.83%)
4 | 61/518 | 31/488 | 72/611 | 58/597 | 221/730 | 179/688 | 131/652 | 111/632 | 485/2511 (19.32%) | 379/2405 (15.76%)
5 | 82/669 | 30/617 | 142/867 | 45/770 | 121/785 | 110/774 | 54/744 | 87/777 | 399/3065 (13.02%) | 272/2938 (9.26%)
6 | 300/1254 | 47/1001 | 265/1210 | 177/1122 | 273/1084 | 192/1003 | 148/1079 | 167/1098 | 986/4627 (21.31%) | 583/4224 (13.80%)
7 | 42/297 | 167/422 | 177/643 | 104/570 | 54/553 | 62/561 | 76/416 | 59/399 | 349/1909 (18.28%) | 392/1952 (20.08%)
8 | 54/577 | 75/598 | 73/696 | 69/692 | 59/771 | 85/797 | 60/770 | 58/768 | 246/2814 (8.74%) | 287/2855 (10.05%)
9 | 106/737 | 116/747 | 53/858 | 4/809 | 146/655 | 353/862 | 69/732 | 186/849 | 374/2982 (12.54%) | 659/3267 (20.17%)

Per-trial error totals: Trial 1: 811/6006 (13.50%); Trial 2: 1120/6667 (16.80%); Trial 3: 1491/7018 (21.25%); Trial 4: 747/6706 (11.14%); overall: 4169/26397 (15.79%).

TABLE IV
ASSOCIATES AND FOUNDERS PARK CONFUSION MATRIX
(9-by-9 counts: true segment number, rows, versus segment number guessed by the algorithm, columns.)

Increases in the length of segments do not affect the results drastically. The results from the third experiment, which has even longer segments, will confirm this assessment. It appears that the longer length does not mean more variability to absorb, because the majority of the scenes within a segment do not change all that much. The confusion matrix (table IV) shows that the errors are marginally more uniform than in Experiment 1 (few zero entries), probably as the environment is less structured and prone to more accidental classification errors among possibly non-adjacent segments when vegetation dominates.
C. Experiment 3: Frederick D. Fagg Jr. Park (FDF)

The third and final site is an open area in front of the Leavey and Doheny libraries called Frederick D. Fagg Jr. Park, which the students use to study outdoors and to catch some sun. The main motivation for testing at this site is to assess the gist response on sparser data. Figure 14 shows the map of segments while figure 9 shows the scenes from each segment. The segments are about 50% longer than the ones in the second experiment (three times those of Experiment 1). The number of epochs in training goes up by about ten, and the time to convergence roughly doubles from that of Experiment 2, to about 50 minutes.

Fig. 9. Examples of images, one from each segment, of Frederick D. Fagg Jr. Park (segments 1 through 9 from left to right and top to bottom).

Fig. 14. Map of the path segments of Frederick D. Fagg Jr. Park.

Figure 15 represents the four lighting conditions tested: early evening (the street lights not yet turned on), evening (the street lights already turned on), noon, and middle of afternoon.

Fig. 15. Lighting conditions used for testing at Frederick D. Fagg Jr. Park (FDF). Clockwise from top left: early evening, evening, noon, and middle of afternoon.

Table V shows the results for the experiment, listing a total error of 11.38% (88.62% classification). The result from trial 2 (7.95% error) is the best among all runs for all experiments. We suspect that this success is because the lighting very closely resembles that of the training data: the run is conducted at noon, when the lighting does tend to stay the same for long periods of time. As a performance reference, when we test the system with a set of data taken right after a training set, the error rates are about 9% to 11%. When training images with the same lighting condition as a subset of testing data of interest are excluded during training, the error for that run usually at least triples (to about thirty to forty percent), which suggests that lighting coverage in the training phase is a critical factor.

TABLE V
FREDERICK D. FAGG PARK EXPERIMENTAL RESULTS

Seg | T1 False+ | T1 False- | T2 False+ | T2 False- | T3 False+ | T3 False- | T4 False+ | T4 False- | Total False+ | Total False-
1 | 22/657 | 246/881 | 40/699 | 11/670 | 44/684 | 207/847 | 28/735 | 246/953 | 134/2775 (4.83%) | 710/3351 (21.19%)
2 | 246/1022 | 12/788 | 53/749 | 44/740 | 56/727 | 126/797 | 105/758 | 225/878 | 460/3256 (14.13%) | 407/3203 (12.71%)
3 | 11/691 | 178/858 | 0/689 | 7/696 | 341/1218 | 45/922 | 282/1147 | 5/870 | 634/3745 (16.93%) | 235/3346 (7.02%)
4 | 5/799 | 43/837 | 3/663 | 80/740 | 53/883 | 7/837 | 35/757 | 99/821 | 96/3102 (3.09%) | 229/3235 (7.08%)
5 | 18/440 | 409/831 | 11/390 | 369/748 | 2/696 | 0/694 | 16/870 | 0/854 | 47/2396 (1.96%) | 778/3127 (24.88%)
6 | 343/1976 | 47/1680 | 12/1550 | 27/1565 | 182/1772 | 122/1712 | 243/1770 | 145/1672 | 780/7068 (11.04%) | 341/6629 (5.14%)
7 | 0/806 | 231/1037 | 25/944 | 4/923 | 0/675 | 182/857 | 30/886 | 38/894 | 55/3311 (1.66%) | 455/3711 (12.26%)
8 | 483/1607 | 48/1172 | 436/1568 | 79/1211 | 319/1581 | 93/1355 | 149/1244 | 175/1270 | 1387/6000 (23.12%) | 395/5008 (7.89%)
9 | 86/825 | 0/739 | 65/866 | 24/825 | 42/579 | 257/794 | 164/788 | 119/743 | 357/3058 (11.67%) | 400/3101 (12.90%)

Per-trial error totals: Trial 1: 1214/8823 (13.76%); Trial 2: 645/8118 (7.95%); Trial 3: 1039/8815 (11.79%); Trial 4: 1052/8955 (11.75%); overall: 3950/34711 (11.38%).

The confusion matrix for Experiment 3 (table VI) is also reported. Overall, the results are better than Experiments 1 and 2 even though the segments are longer on average. It can be argued that the system performance degrades gracefully with the subjectively-assessed visual difficulty of the environment, Experiment 2 (AnF) being the most challenging one.

TABLE VI
FREDERICK D. FAGG PARK CONFUSION MATRIX
(9-by-9 counts: true segment number, rows, versus segment number guessed by the algorithm, columns.)

D. Experiment 4: Combined Sites

To gauge the system's scalability, we combine scenes from all three sites and train it to classify twenty-seven different segments. The only difference in the neural-network classifier is that the output layer now consists of twenty-seven nodes; the numbers of input and hidden nodes remain the same. The number of connections is increased by 1,800 (18 new output nodes times 100 second-hidden-layer nodes), from 36,900 to 38,700 connections (4.88%). We use the same procedure as well as the same training and testing data (175,406 and 75,073 frames, respectively). The training process takes much longer than for the other experiments: about 260 epochs, with the last 200 epochs or so converging very slowly from 3% down to 1% error. When training, we print the confusion matrix periodically to analyze the process of convergence, and we find that the network first converges on inter-site classification before going further and eliminating the intra-site errors.

We organize the results into segment-level (table VII) and site-level (table VIII) statistics. For segment-level classification, the total error rate is 13.55%. We expected the results to be somewhat worse than in all three previous experiments, where each site is classified individually. However, such is not the case when comparing with the AnF experiment (15.79%; Experiment 2), while being marginally worse than the other two (12.04% for ACB in Experiment 1 and 11.38% for FDF in Experiment 3). Notice also that the relative error among the individual sites changes as well. The error rate for AnF segments in the combined setup improves by 2.47% to 13.32%, while the rates for segments in ACB and FDF degrade by 4.24% and 1.25%, respectively.

TABLE VII
COMBINED SEGMENT EXPERIMENTAL RESULTS

Seg | ACB False+ | ACB False- | AnF False+ | AnF False- | FDF False+ | FDF False-
1 | 292/1657 (17.62%) | 231/1596 (14.47%) | 565/3120 (18.11%) | 582/3137 (18.55%) | 231/2306 (10.02%) | 1276/3351 (38.08%)
2 | 277/1710 (16.20%) | 342/1775 (19.27%) | 636/2159 (29.46%) | 323/1846 (17.50%) | 455/3175 (14.33%) | 483/3203 (15.08%)
3 | 275/2031 (13.54%) | 130/1886 (6.89%) | 555/4198 (13.22%) | 130/3773 (3.45%) | 893/3881 (23.01%) | 358/3346 (10.70%)
4 | 61/1211 (5.04%) | 84/1234 (6.81%) | 233/2401 (9.70%) | 237/2405 (9.85%) | 56/3102 (1.81%) | 189/3235 (5.84%)
5 | 129/1208 (10.68%) | 205/1284 (15.97%) | 583/3251 (17.93%) | 270/2938 (9.19%) | 107/3115 (3.43%) | 119/3127 (3.81%)
6 | 162/2040 (7.94%) | 209/2087 (10.01%) | 926/4462 (20.75%) | 688/4224 (16.29%) | 784/6426 (12.20%) | 987/6629 (14.89%)
7 | 308/1438 (21.42%) | 551/1681 (32.78%) | 298/1680 (17.74%) | 570/1952 (29.20%) | 309/3704 (8.34%) | 316/3711 (8.52%)
8 | 83/961 (8.64%) | 221/1099 (20.11%) | 730/3278 (22.27%) | 307/2855 (10.75%) | 300/4833 (6.21%) | 475/5008 (9.48%)
9 | 116/1139 (10.18%) | 300/1323 (22.68%) | 257/3115 (8.25%) | 409/3267 (12.52%) | 551/3472 (15.87%) | 180/3101 (5.80%)
Total | 1703/13395 (12.71%) | 2273/13965 (16.28%) | 4783/27664 (17.29%) | 3516/26397 (13.32%) | 3686/34014 (10.84%) | 4383/34711 (12.63%)

Overall total: 10172/75073 = 13.55%.

TABLE VIII
COMBINED SITES EXPERIMENTAL RESULTS

True \ Guessed | ACB | AnF | FDF | False- / Total | Pct. err
ACB | 12882 | 563 | 520 | 1083/13965 | 7.76%
AnF | 350 | 25668 | 379 | 729/26397 | 2.76%
FDF | 163 | 1433 | 33115 | 1596/34711 | 4.60%
False+ | 513 | 1996 | 899 | 3408 |
Total | 13395 | 27664 | 34014 | 75073 |
Pct. err | 3.83% | 7.22% | 2.64% | 4.54% |

From the site-level confusion matrix (table VIII), we see that the system can reliably pin a given test image to the correct site with only 4.54% error (95.46% classification). This is encouraging because the classifier can then provide various levels of outputs. For instance, when the system is unsure about the actual segment location, it can at least rely on being at the right site. One of the concerns in combining segments from different sites is that the number of samples for each of them becomes unbalanced, as there are some segments that take less time to walk through (as short as 10 seconds) while others can take up to a minute and a half. That is, the lower number of samples for ACB may yield a network convergence that gives heavier weights to correctly classifying the longer segments from AnF and FDF. From the site-level statistics (table VIII), we can see that this trend does hold, although not to an alarming extent.
IV. DISCUSSION

We have shown that the gist features succeed in classifying a large set of images without the help of temporal filtering (one-shot recognition on each image considered individually), which would be expected to further improve the results by reducing noise significantly [17]. In terms of robustness, the features are able to handle translational, angular, scale, and illumination changes. Because they are computed from large image sub-regions, it takes a sizable translational shift to affect the values. As for angular stability, the natural perturbation of a camera carried while walking seems to aid the demonstrated invariance. For larger angular discrepancies, as for example in the case of off-road environments, an engineering approach like adding sensors such as a gyroscope to correct the angle of view may be advisable. The gist features are also invariant to scale because the majority of the scenes (background) are stationary and the system is trained at all viewing distances. The gist features achieve solid illumination invariance when trained with different lighting conditions. Lastly, the combined-sites experiment shows that the number of differentiable scenes can be quite high: twenty-seven segments can make up a detailed map of a large area.

A profound effect of using gist is the utilization of background information more so than foreground. However, one drawback of the current gist implementation is that it cannot carry out partial background matching for scenes in which large parts are occluded by dynamic foreground objects. As mentioned earlier, the videos are filmed during off-peak hours when few people (or vehicles) are on the road. Nevertheless, they can still create problems when moving too close to the camera. In our system, these images can be taken out using the motion cues from the motion channel of the saliency algorithm as a preprocessing filter, detecting significant occlusion by thresholding the sum of the motion channel feature maps [37]. Furthermore, a wide-angle lens (with software distortion correction) can help to see more of the background scenes and, in comparison, decrease the size of the moving foreground objects.
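A sketch of that preprocessing filter, under the assumption that the motion-channel feature maps are available from the saliency front-end and that the threshold is tuned per environment (neither the maps' scaling nor a threshold value is specified here):

```python
import numpy as np

def occluded_by_foreground(motion_feature_maps, threshold):
    """Flag a frame whose dynamic foreground dominates: sum the motion-channel
    feature maps and compare the total energy against a tuned threshold [37]."""
    energy = sum(float(np.sum(m)) for m in motion_feature_maps)
    return energy > threshold
```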
The gist features, despite their current simplistic implementation, are able to achieve a promising localization performance. The technique highlights the rapid nature of gist while still being accurate in performing its tasks. This, in large part, is because the basic computational mechanism for extracting the gist features is simple averaging of visual cues from different domains. Theoretically, scalability is a concern, because when we average over large spaces, background details that may be critical in distinguishing certain locations can be lost. However, although more sophisticated gist computations could be incorporated, we avoid complications that occur when trying to fit more complex models to unconstrained and noisy data. For example, a graph would be a more expressive layout representation than the current grid-based decomposition (refer to figure 3). It can represent a scene as segmented region feature vectors, or even objects, for each node, and coarse spatial relationships for the edges. The node information can provide explicit shape recognition, which, in essence, is what is lacking in our current implementation. However, as mentioned in the introduction, such an approach can break down when a segmentation error occurs. In our second experiment (AnF), for example, overlapping trees or overlapping buildings can be jumbled together.

Another way to increase the theoretical strength of the gist features is to go to a finer grid to incorporate more spatial information. For the current extraction process, to go to the next level in the pyramid (an eight-by-eight grid) is to increase the number of features from 16 to 64 in each sub-channel. However, more spatial resolution also means more data (quadruple the amount) to process, and it is not obvious where the point of diminishing returns is. We have to strike a balance between resolution and generalization, pushing the complexity and expressiveness of the features while keeping robustness and compactness in mind. That said, our goal is to emulate human-level gist understanding that can be applied to a larger set of problems. As such, our further direction is to stay faithful to the available scientific data on human vision.

We have not discussed at length the need for localization within a segment. The gist features would have problems differentiating scenes when most of the background overlaps, as is the case for scenes within a segment. Gist, by definition, is not a mechanism to produce a detailed picture and an accurate localization, just the coarse context. This is where saliency and localized object recognition may complement gist. The raw gist features are shared with the saliency model so that we can attack the problem from multiple sides efficiently. One of the issues we encounter with this arrangement will be the need to synchronize the representations so that the two can coincide naturally to provide a complete scene description. With gist we can locate our general whereabouts to the segment level. With saliency we can then pinpoint our exact location by finding distinctive cues situated within the segment and approximating our distance from them. The gist model can even prime regions within a scene [46] by providing context to prune out noisy possible salient locations.

V. CONCLUSION

We have shown that the gist model can be useful in outdoor localization for a walking human, with obvious application to autonomous mobile robotics. The model is able to provide high-level context information (a segment within a site) from various outdoor environments despite using coarse features. It is able to contrast scenes in a global manner and automatically takes obvious idiosyncrasies into account.
This capability reduces the need for the detailed calibration in which a robot has to rely on the ad-hoc knowledge of the designer for reliable landmarks. Furthermore, we are working on extending the current system to recognize places/segments without having to train it explicitly. This requires the ability to cluster gist feature vectors from a same location, which also helps alert the robot when it is moving from one location to another. Because the raw features are shared with the saliency model, the system can efficiently increase localization resolution. It can use salient cues to create distinct signatures of individual scenes, finer points of reference within a segment that may not be differentiable by gist alone. The salient cues can also help guide localization for the transitions between segments, which we did not try to classify. In the future we would like to present a physical implementation of a model that uses bottom-up salient cues as well as context to produce a useful topographical map for navigation in unconstrained, outdoor environments.

ACKNOWLEDGMENT

This research was supported by the National Geospatial-Intelligence Agency, the National Science Foundation, and the Human Frontier Science Program. The authors affirm that the views expressed herein are solely their own, and do not represent the views of the United States government or any agency thereof.

REFERENCES

[1] D. Fox, W. Burgard, F. Dellaert, and S. Thrun, "Monte Carlo localization: Efficient position estimation for mobile robots," in Proceedings of the Sixteenth National Conference on Artificial Intelligence (AAAI'99), July 1999.
[2] J. J. Leonard and H. F. Durrant-Whyte, "Mobile robot localization by tracking geometric beacons," IEEE Transactions on Robotics and Automation, vol. 7, no. 3, pp. 376-382, June 1991.
[3] S. Thrun, D. Fox, and W. Burgard, "A probabilistic approach to concurrent mapping and localization for mobile robots," Machine Learning, vol. 31, pp. 29-53, 1998.
[4] K. Lingemann, H. Surmann, A. Nuchter, and J. Hertzberg, "Indoor and outdoor localization for fast mobile robots," in IROS, 2004.
[5] Y. Abe, M. Shikano, T. Fukuda, F. Arai, and Y. Tanaka, "Vision based navigation system for autonomous mobile robot with global matching," IEEE International Conference on Robotics and Automation, vol. 20, pp. 1299-1304, May 1999.
[6] S. Thrun, "Finding landmarks for mobile robot navigation," in ICRA, 1998, pp. 958-963.
[7] S. Maeyama, A. Ohya, and S. Yuta, "Long distance outdoor navigation of an autonomous mobile robot by playback of perceived route map," in ISER, 1997, pp. 185-194.
[8] D. G. Lowe, "Distinctive image features from scale-invariant keypoints," International Journal of Computer Vision, vol. 60, no. 2, pp. 91-110, 2004.
[9] S. Se, D. G. Lowe, and J. J. Little, "Vision-based global localization and mapping for mobile robots," IEEE Transactions on Robotics, vol. 21, no. 3, pp. 364-375, 2005.
[10] L. Goncalves, E. D. Bernardo, D. Benson, M. Svedman, J. Ostrowski, N. Karlsson, and P. Pirjanian, "A visual front-end for simultaneous localization and mapping," in ICRA, April 18-22 2005, pp. 44-49.
[11] H. Katsura, J. Miura, M. Hild, and Y. Shirai, "A view-based outdoor navigation using object recognition robust to changes of weather and seasons," in Proceedings of the 2003 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS 2003), Las Vegas, Nevada, October 27-31 2003, pp. 2974-2979.
[12] Y. Matsumoto, M. Inaba, and H. Inoue, "View-based approach to robot navigation," in IEEE-IROS, 2000, pp. 1702-1708.
[13] R. Murrieta-Cid, C. Parra, and M. Devy, "Visual navigation in natural environments: From range and color data to a landmark-based model," Autonomous Robots, vol. 13, no. 2, pp. 143-168, 2002.
[14] L. Renniger and J. Malik, "When is scene identification just texture recognition?" Vision Research, vol. 44, pp. 2301-2311, 2004.
[15] I. Ulrich and I. Nourbakhsh, "Appearance-based place recognition for topological localization," in IEEE-ICRA, April 2000, pp. 1023-1029.
[16] A. Oliva and A. Torralba, "Modeling the shape of the scene: A holistic representation of the spatial envelope," International Journal of Computer Vision, vol. 42, no. 3, pp. 145-175, 2001.
[17] A. Torralba, K. P. Murphy, W. T. Freeman, and M. A. Rubin, "Context-based vision system for place and object recognition," in IEEE Intl. Conference on Computer Vision (ICCV), Nice, France, October 2003, pp. 1023-1029.
[18] A. Treisman and G. Gelade, "A feature-integration theory of attention," Cognitive Psychology, vol. 12, pp. 97-137, 1980.
[19] J. Wolfe, "Guided search 2.0: A revised model of visual search," Psychonomic Bulletin and Review, vol. 1, no. 2, pp. 202-238, 1994.
[20] L. Itti, C. Koch, and E. Niebur, "A model of saliency-based visual attention for rapid scene analysis," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 20, no. 11, pp. 1254-1259, Nov 1998.
[21] L. Itti and C. Koch, "Computational modelling of visual attention," Nature Reviews Neuroscience, vol. 2, no. 3, pp. 194-203, Mar 2001.
[22] M. C. Potter, "Meaning in visual search," Science, vol. 187, no. 4180, pp. 965-966, 1975.
[23] I. Biederman, "Do background depth gradients facilitate object identification?" Perception, vol. 10, pp. 573-578, 1982.
[24] B. Tversky and K. Hemenway, "Categories of the environmental scenes," Cognitive Psychology, vol. 15, pp. 121-149, 1983.
[25] A. Oliva and P. Schyns, "Coarse blobs or fine edges? Evidence that information diagnosticity changes the perception of complex visual stimuli," Cognitive Psychology, vol. 34, pp. 72-107, 1997.
[26] T. Sanocki and W. Epstein, "Priming spatial layout of scenes," Psychol. Sci., vol. 8, pp. 374-378, 1997.
[27] R. A. Rensink, "The dynamic representation of scenes," Visual Cognition, vol. 7, pp. 17-42, 2000.
[28] S. Thorpe, D. Fize, and C. Marlot, "Speed of processing in the human visual system," Nature, vol. 381, pp. 520-522, 1996.
[29] M. J.-M. Macé, S. J. Thorpe, and M. Fabre-Thorpe, "Rapid categorization of achromatic natural scenes: how robust at very low contrasts?" Eur. J. Neurosci., vol. 21, no. 7, pp. 2007-2018, April 2005.
[30] F. Li, R. VanRullen, C. Koch, and P. Perona, "Rapid natural scene categorization in the near absence of attention," in Proc. Natl. Acad. Sci., 2002, pp. 8378-8383.
[31] R. Epstein, D. Stanley, A. Harris, and N. Kanwisher, "The parahippocampal place area: Perception, encoding, or memory retrieval?" Neuron, vol. 23, pp. 115-125, 2000.
[32] A. Oliva and P. Schyns, "Colored diagnostic blobs mediate scene recognition," Cognitive Psychology, vol. 41, pp. 176-210, 2000.
[33] A. Torralba, "Modeling global scene factors in attention," Journal of the Optical Society of America, vol. 20, no. 7, pp. 1407-1418, 2003.
[34] C. Ackerman and L. Itti, "Robot steering with spectral image information," IEEE Transactions on Robotics, vol. 21, no. 2, pp. 247-251, Apr 2005.
[35] L. G. Ungerleider and M. Mishkin, "Two cortical visual systems," in Analysis of Visual Behavior, D. J. Ingle, M. A. Goodale, and R. J. W. Mansfield, Eds. Cambridge, MA: MIT Press, 1982, pp. 549-586.
[36] L. Itti and C. Koch, "A saliency-based search mechanism for overt and covert shifts of visual attention," Vision Research, vol. 40, no. 10-12, pp. 1489-1506, May 2000.
[37] L. Itti, "Automatic foveation for video compression using a neurobiological model of visual attention," IEEE Transactions on Image Processing, vol. 13, no. 10, pp. 1304-1318, Oct 2004.
[38] L. Itti, "Models of bottom-up and top-down visual attention," Ph.D. dissertation, California Institute of Technology, Pasadena, California, Jan 2000.
[39] R. S. Turner, In the Eye's Mind: Vision and the Helmholtz-Hering Controversy. Princeton University Press, 1994.
[40] C. Chubb and G. Sperling, "Drift-balanced random stimuli: a general basis for studying non-Fourier motion perception," JOSA, vol. 5, no. 11, pp. 1986-2007, 1988.
[41] K. Barnard, V. Cardei, and B. Funt, "A comparison of computational color constancy algorithms; part one: Methodology and experiments with synthesized data," IEEE Transactions in Image Processing, vol. 11, no. 9, pp. 972-984, 2002.
[42] K. Barnard, L. Martin, A. Coath, and B. Funt, "A comparison of color constancy algorithms. Part two: Experiments with image data," IEEE Transactions in Image Processing, vol. 11, no. 9, pp. 985-996, 2002.
[43] G. Finlayson, B. Schiele, and J. Crowley, "Comprehensive colour image normalization," in 5th European Conference on Computer Vision, May 1998, pp. 475-490.
[44] A. Hyvärinen, "Fast and robust fixed-point algorithms for independent component analysis," IEEE Transactions on Neural Networks, vol. 10, no. 3, pp. 626-634, 1999.
[45] C. Siagian and L. Itti, "Gist: A mobile robotics application of context-based vision in outdoor environment," in Proc. IEEE-CVPR Workshop on Attention and Performance in Computer Vision (WAPCV'05), San Diego, California, Jun 2005, pp. 1-7.
[46] A. Torralba and P. Sinha, "Statistical context priming for object detection," in IEEE Proc. of Int. Conf. in Comp. Vision, 2001, pp. 763-770.

Christian Siagian is currently working towards a Ph.D. degree in the field of Computer Science. His research interests include robotics and computer vision, such as vision-based mobile robot localization and scene classification, particularly approaches that are biologically-inspired.

Laurent Itti received his M.S. degree in Image Processing from the Ecole Nationale Supérieure des Télécommunications in Paris in 1994, and his Ph.D. in Computation and Neural Systems from Caltech in 2000. He is now an associate professor of Computer Science, Psychology, and Neuroscience at the University of Southern California. Dr. Itti's research interests are in biologically-inspired computational vision, in particular in the domains of visual attention, gist, saliency, and surprise, with technological applications to video compression, target detection, and robotics.