A Coarse-to-Fine Strategy for Multiclass Shape Detection

Yali Amit, Donald Geman, and Xiaodong Fan

Y. Amit is with the Department of Statistics and the Department of Computer Science, University of Chicago, Chicago, IL, 60637. E-mail: amit@marx.uchicago.edu. D. Geman is with the Department of Applied Mathematics and Statistics, [...]

Multiclass shape detection, in the sense of recognizing and localizing instances from multiple shape classes, is formulated as a two-step process in which local indexing primes global interpretation. During indexing, a list of instantiations (shape identities and poses) is compiled, constrained only by no missed detections at the expense of false positives. Global information, such as expected relationships among poses, is incorporated afterward to remove ambiguities. This division is motivated by computational efficiency. In [...]

[...] procedures but dedicated to specific classes and poses. Accumulated false positives are eventually removed by more intense, but focused, processing. In this way, the issue of computation strongly influences the very development of the algorithms, rather than being an afterthought. A natural control parameter for balancing discrimination and computation is the degree of invariance of [...] not in the sense of fine shape attributes, such as geometric singularities of curves and surfaces, but rather coarse, generic features which are common in a set of class/pose pairs. Spread features ([1], [3], [8]) provide a simple example: A local feature is said to be detected at a given location if the response of the feature detector is strong enough anywhere nearby. The larger the spreading (degree of local ORing), the higher the incidence on any given ensemble of classes and poses, and checking for a certain number of distinguished spread features provides a simple, computationally efficient test for the ensemble. During the computational process, the amount of spreading is successively diminished.

1.2 Interpretation

The outcome of indexing is a collection of instantiations (class/pose pairs). No contextual information, such as structural or semantic constraints, has been employed. In particular, some instantiations may be inconsistent with prior information about the scene layout. Moreover, several classes will often be detected at roughly the same location due to the insistence on minimizing false negatives. In this paper, the passage from indexing to interpretation is largely based on taking into account prior knowledge about the number of shapes and the manner in which they are spatially arranged. Assuming shapes do not overlap, a key component of this analysis is a competition among shapes or sequences of shapes covering the same image region, for which we employ a likelihood ratio test motivated by our statistical model for local features. Since a relatively small number of candidate instantiations are ever involved, it is also computationally feasible to bring finer features into play, as well as template-matching, contextual disambiguation, and other intensive [...]

1.3 New Directions

We explore three new directions:

Multiple Shape Classes. Our previous work concerned coarse-to-fine (CTF) representations and search strategies for a single shape or object class and, hence, was based entirely on pose aggregation. We extend this to hierarchies based on recursively [...]

Contextual Analysis. With multiple classes, testing one specific (partial) interpretation against another is eventually unavoidable, which means we need tests for competing hypotheses. In particular, we derive online tests based on local features for resolving one specific hypothesis (a character at a given pose) against another.

Model-Based Framework. We introduce a statistical model for the local features which provides a unifying framework for the algorithms employed in all stages of the analysis and which allows us to mathematically analyze the role of spread features in balancing discrimination and computation during coarse-to-fine indexing.

These ideas are illustrated by attempting to read the characters appearing on license plates. Surprisingly, perhaps, there does not seem to be any published literature apart from patents. Several systems appear to be implemented in the US and Europe. For example, in London, cars entering the metropolitan area are identified in order to charge an entrance fee and, in France, the goal is to estimate the average driving speed between two points. We have no way to assess the performance of these implementations. Our work was motivated by the problem of identifying cars entering a parking garage, for which current solutions still fall short of commercial viability, mainly due to the high level of clutter and variation in lighting. It is clear that, for any specific task, there are likely to be highly dedicated procedures for improving performance; for example, only reporting plates with identical matches on two different photos, taken at the same or different times. Our goal instead is a generic solution which could be easily adapted to other OCR scenarios and to other shape categories and, eventually, to three-dimensional objects. In particular, we do not use any form of traditional, bottom-up segmentation in order to identify candidate regions or jump start the recognition process. There are many well-developed techniques of this kind in the document analysis literature which are rather dedicated to specific applications; see, for example, the review [24] or the work in [19].

Related work on visual attention, CTF search, hierarchical template-matching, and local ORing is surveyed in the following section. Our formulation of multiclass shape detection is given in Section 3, followed in Section 4 by a brief overview of the computational strategy. The statistical model for the local features is described in Section 5, leading to a natural likelihood ratio test for an individual detection. Efficient indexing is the theme of Sections 6, 7, and 8. The spread local features are introduced in Section 6, including a comparison of spreading versus two natural alternatives (summing them and downsampling), and the discrimination/computation tradeoff is studied under a simple statistical model. How tests are learned from data and organized into a CTF search are discussed in Sections 7 and 8, respectively. In Section 9, we explain how an interpretation of the image is derived from the output of the indexing stage. The application to reading license plates, including the contextual analysis, is presented in Section 10 and we conclude with a discussion of our approach in Section 11.

The division of the system into indexing followed by global interpretation is motivated by computational efficiency. More generally, our indexing phase is a way of focusing attention, which is studied in both computer vision and in modeling biological vision. The purpose is to focus subsequent, detailed processing on a small portion of the data. Two frameworks are usually considered: task-independent, bottom-up control based on the saliency of visual stimuli (see, e.g., [16], [20], [28], [27]) and task-driven, top-down control (see, e.g., [3], [25], [35], [36]). Our approach is essentially top-down in that attention is determined by the shapes we search for, although the coarsest tests could be interpreted as generic saliency detectors.

CTF Search. CTF object recognition is scattered throughout the literature. For instance, translation-based versions can be found in [31], [17] and work on distance matching ([32]). The version appearing in [12] prefigures our work. Related ideas on dealing with multiple objects can be found in [2]. In addition, CTF search motivated the face detection algorithm in [3] and was systematically explored in [8] based on a nested hierarchy of pose bins (and CTF in complexity within bins) and in [7] based on an abstract theoretical framework. Variations have also been proposed in [33] and [36]: Whereas most poses are explicitly visited, computational efficiency is achieved by processing which is CTF in the sense of progressively focusing on hard cases. Whatever the particular CTF mechanism, the end result is that intensive processing is restricted to a very small portion of the image data, namely, those regions containing actual objects or object-like clutter. Work on efficient indexing based on geometric hashing ([18]) and Hough transforms ([14], [30]) is also related.

The issue of context is central to vision and several distinct approaches can be discerned in the literature. In ours, context refers to structural rather than semantic relationships; indexing is entirely noncontextual and is followed by global interpretation in conjunction with structural constraints. In contrast, all scene attributes are discovered simultaneously in the compositional approach ([13]), which provides a powerful method for dealing with context and occlusion, but involves formulating interpretation as global optimization, raising computational issues. Other work involves contextual priming ([34]) to overcome poor resolution by starting interpretation with an estimate of semantic context based on low-level features. Context can also be exploited ([6]) to provide local shape descriptors.

Natural Vision. There are strong connections between spreading local features and neural responses in the visual cortex. Responses to oriented edges are found primarily in V1, where so-called simple cells detect oriented edges at specific locations, whereas complex cells respond to an oriented edge anywhere in the receptive field; see [15]. In other words, local ORing is performed over the receptive field region and the response of a complex cell can thus be viewed as a spread edge. Because of the high density of edges in natural images, the extent of spreading must be limited; too much will produce responses everywhere. Neurons in higher level retinotopic layers V2 and V4 exhibit similar properties, inspiring the work in [9] and [10] about designing a neural-like architecture for recognizing patterns. In [1] and [4], the spreading of more complex features is incorporated into a neural architecture for invariant detection. An extension to continuous-valued variables can be achieved with a MAX operation, a generalization of ORing, as proposed in [29].

Hierarchical Template Matching. Recent work on hierarchical template matching using distance transforms, such as [11], is related to ours in several respects even though we are not doing template-matching per se. Local ORing as a device for gaining stability can be seen as a limiting, binary version of distance transforms such as the Chamfer distance ([5]). In addition, there is a version of CTF search in [11] (although only translation is considered, based on multiple resolutions) which still has much in common with our approach, including edge features, detecting multiple objects using a class hierarchy, and imposing a running null false negative constraint. Another approach to edge-based, multiple object detection appears in [26].

Local Features. Finally, in connection with spreading local features, another mechanism has been proposed in [22] that allows for affine or 3D viewpoint changes or nonrigid deformations. The resulting SIFT descriptor, based on local histograms of gradient orientations, characterizes a neighborhood (in the Gaussian blurred image) around each individual detected key point, which is similar to spreading the gradients over a region. A detailed comparison of the performance of SIFT with other descriptors can be found in [23].

Consider a single, gray-level image. In particular, there is no information from motion, depth, or color. We
anticipate a large range of lighting conditions, as illustrated in Fig. 1 (see also Fig. 5), as well as a considerable range of poses at which each shape may be present. Moreover, we anticipate a complex background consisting partly of extended structures, such as clutter and nondistinguished shapes, which locally may appear indistinguishable from the shapes of [...] be the raw intensity data on the image [...].

Fig. 1. Two images of the back of a car from which the license plate is to be read.

1608 IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, VOL. 26, NO. 12, DECEMBER 2004

Each shape of interest has a class [...] and each instantiation (presentation in [...]) is characterized by a pose [...]. Broadly speaking, the pose represents (nuisance) parameters which at least partially characterize the instantiation. For example, one component of the pose of a printed character might be the font. In some contexts, one might also consider parameters of illumination. For simplicity, however, we shall restrict our discussion to the presentation and, specifically (in view of the experiments on license plates), to position, scale, and orientation. Much of what follows extends to affine and more general transformations; similarly, it would not be difficult to accommodate parameters such as the font of a character. For a pose [...], let [...] be the translation, [...] the scale, and [...] the rotation. Denote by [...] the identity pose, namely,
[...] and [...], and by [...] a reference sublattice of the full image lattice [...] such that any shape fits inside [...]. For any subset [...], let [...]. In particular, [...] is the support of the shape at pose [...].

The set of possible interpretations for an image is

$$\mathcal{Y} = \bigcup_{k=1}^{K} (C \times \Theta)^k,$$

where, obviously, $K$ represents the maximum number of shapes in any given layout. Thus, each interpretation has the form $y = \{(c_1, \theta_1), \ldots, (c_k, \theta_k)\}$. The support of an interpretation is denoted

$$R(y) = \bigcup_{i=1}^{k} R(\theta_i).$$

We write [...] for the true interpretation and assume it is unambiguous, i.e., [...].

Prior information will provide some constraints on the possible lists; for instance, in the case of the license plates, we know approximately how many characters there are and how they are laid out. In fact, it will be useful to consider the true interpretation to be a random variable, [...], and to suppose that knowledge about the layout is captured by a highly concentrated prior distribution on [...]. Most interpretations have mass zero under this distribution and many interpretations in its support, denoted by [...], have approximately the same mass. Indeed, for simplicity, we will assume that the prior is uniform on its support.

4 Overview of the Computational Strategy

What follows is a summary of the overall recognition strategy. All of the material from this point to the experiments pertains to one of four topics:

Statistical Modeling. The gray-level image data is transformed into an array of binary local features which are robust to photometric variations. For simplicity, we use eight oriented edge features (Section 5), but the entire construction can be applied to more complex features, for example, functions of the original edges (see Section 11). We introduce a likelihood model [...] given an image [...]. This model motivates the definition of an image-dependent set $D(I) \subset C \times \Theta$ of detections, called an index, based on likelihood ratio tests. According to the invariance constraint, the tests are performed with no missed detections (i.e., null type I error), which in turn implies that [...] with probability one (at least in principle). However, direct computation of [...] is highly intensive due to the loop over class/pose pairs.

Efficient Indexing. The purpose of the CTF search is to accelerate the computation of [...]. This depends on developing a test [...] for an entire subset [...] $\subset C \times \Theta$ whose complexity is of the order of the test for a single pair $(c, \theta)$, but which nonetheless retains some discriminating power; see Section 6. The set [...] is then found by performing [...] first and then exploring the individual hypotheses in [...] one-by-one only if [...] is positive. This two-step procedure is then easily extended (in Section 8) to a full CTF search for the elements of [...] and the computational gain provided by the CTF search can be estimated.

Spreading Features. The key ingredient in the construction of [...] is the notion of a spread feature based on local ORing. Checking for a minimum number of spread features provides a test for the [...]. The same spread features are used for many different bins, thus precomputing them at the start yields an important computational gain. In the Appendix, the optimal domain of ORing, in terms of discrimination, is derived under the proposed statistical model and some simplifying assumptions.

Global Interpretation. The final phase is choosing an estimate [...]. A key step is a competition between any two interpretations $y, y'$ for which [...], i.e., which cover the same image region. The subinterpretations must satisfy the prior constraints, namely, $y, y'$ [...]; see Section 9. A special case of this process is a competition between single detections with different classes but very similar poses. (We assume a minimum separation between shapes, in particular no occlusion.) The competitions once again involve likelihood ratio tests based on the local feature model.

We describe a statistical model for the possible appearances of a collection of shapes in an image as well as a crude model for background, i.e., those portions of the image which do not belong to shapes of interest.

5.1 Edges

The image data is transformed into arrays of binary edges, indicating the locations of a small number of coarsely defined edge orientations. We use the edge features defined in [3], which more or less take the local maxima of the gradient in one of four possible directions and two polarities. These edges have proven effective in our previous work on object recognition; see [2] and [8]. There is a very low threshold on the gradient; as a result, several edge orientations may be present at the same location. However, these edge features have three important advantages: They can be computed very quickly, they are robust with respect to photometric variations, and they provide the ingredients for a simple background model based on labeled point processes. More sophisticated edge extraction methods can be used [21], although at some computational cost. In addition, more complex features can be defined as functions of the basic edges, thus decreasing their background density and increasing their discriminatory power (see [2]), and in such a way that makes the assumed statistical models more credible. For transparency, we describe the algorithm and report experiments with the simple edge features.

Although the statistical models below are described in terms of the edge arrays, implicitly they determine a natural model for the original data, namely, uniform over intensity arrays giving rise to the same edges. Still, we shall not be further concerned with distributions directly on [...].

[...] be a binary variable indicating whether or not an edge of type $\xi$ is present at location $z$. The type $\xi$ represents the orientation and polarity. The resulting family of binary maps (the transformed intensity data) is denoted by [...]. We still assume that [...], i.e., [...] is uniquely determined by the edge data.

5.2 Probability Model

To begin with, we assume the random variables [...] are conditionally independent [...]. We offer two principal justifications for this hypothesis as well as an important drawback:

In general, the degree of class-conditional independence among typical local features depends strongly on the amount of information carried in the pose: the more detailed the description of the instantiation, the more decoupled the features. In the case of printed characters, most of the relevant information (other than the font) is captured by position, scale, and orientation.

In a Bayesian context, conditional independence leads to the naive Bayes classifier, a major simplification. When the dimensionality of the features is large relative to the amount of training data, favoring simple over complex models (and, hence, sacrificing modeling accuracy) may be ultimately advantageous in terms of computation [...]

The resulting background model is not realistic. The background is a highly complex [...] in which nearby edges are correlated due to clutter consisting of parts of occluded objects and other nondistinguished structures. In particular, the independence assumption renders the likelihood of actual background data (see (4)) far too small and this in turn leads to the traditional MAP estimator, [...], being unreliable. It is for this reason that we will not attempt to compute [...]. Instead, we base the
upcoming likelihood ratio tests on thresholds corresponding to a fixed missed detection rate learned from data, either by estimating background correlations or test statistics under shape hypotheses.

For any interpretation $y = \{(c_i, \theta_i)\} \in \mathcal{Y}$, we assume the shapes have nonoverlapping supports, i.e., [...]. Decompose the image lattice into [...] and $R(y)^c$. The region $R(y)^c$ represents background. Of course, the image data over $R(y)^c$ may be quite complex due to clutter and other regular structures, such as the small characters and designs which often appear on license plates. It follows that [...], where we have written [...] for a [...].

We assume that the conditional distribution of the data over each [...] depends only on [...] and, hence, the distribution of [...] is characterized by the product of the individual (marginal) edge probabilities [...], where we have written [...] to indicate conditional probability given the event [...]. Notice that (2) is well-defined due to the assumption of nonoverlapping [...].

For ease of exposition, we choose a very simple model of constant edge probabilities on a distinguished, class- and pose-dependent set of points. The ideas generalize easily to the case where the probabilities vary with type and location. Specifically, we make the following approximation: For each class $c$ and for each edge type $\xi$, there is a distinguished set $G_{\xi,c}$ of locations in the reference grid at which an edge of type $\xi$ has high relative likelihood when shape $c$ is at the reference pose (see Fig. 2a). In other words, $G_{\xi,c}$ is a set of model edges. Furthermore, given that shape $c$ appears at pose $\theta$, the probabilities of the edges at locations [...] are given by: [...], where $p > q$. Finally, we assume the existence of a background edge frequency which is the same as $q$. From (1) and (2), the full data model is then [...].

Fig. 2. (a) The horizontal edge locations in the reference grid, [...]. (b) Edges in the image for one pose, [...]. (c) The model edges, [...], for the entire pose bin. (d) A partition of [...] into disjoint regions of the form [...]. (e) The locations (black points) of actual edges and the domain of local ORing (gray strip), resulting in [...]
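As a rough illustration of the constant-probability edge model described above, the following Python sketch builds a per-location edge probability map (probability p on the model edges, background frequency q elsewhere), samples a binary edge map from it, and evaluates log-likelihoods under the independence assumption. It is only a sketch under stated assumptions: the function names, the grid size, the example model points, and the values of p and q are hypothetical and not taken from the paper, only one edge type is used, and the shape is fixed at the reference pose.

```python
import numpy as np

def edge_probabilities(model_points, grid_shape, p, q):
    """Per-location edge probability: p on model edges, q (background) elsewhere."""
    prob = np.full(grid_shape, q, dtype=float)
    pts = np.asarray(model_points)
    prob[pts[:, 0], pts[:, 1]] = p
    return prob

def sample_edge_map(prob, rng):
    """Draw independent binary edges, one Bernoulli variable per location."""
    return (rng.random(prob.shape) < prob).astype(np.uint8)

def log_likelihood(edges, prob):
    """Log-probability of a binary edge map under independent Bernoulli edges."""
    x = edges.astype(float)
    return float(np.sum(x * np.log(prob) + (1.0 - x) * np.log(1.0 - prob)))

# Hypothetical example: a short vertical stroke on a 12 x 12 reference grid.
P, Q = 0.8, 0.05
model_points = [(r, 5) for r in range(3, 9)]
rng = np.random.default_rng(0)

shape_prob = edge_probabilities(model_points, (12, 12), P, Q)
background_prob = np.full((12, 12), Q)

edges = sample_edge_map(shape_prob, rng)
log_ratio = log_likelihood(edges, shape_prob) - log_likelihood(edges, background_prob)
```

Because every off-shape location has probability q under both hypotheses, `log_ratio` depends only on the edges observed at the model points, which is what makes a test restricted to those points possible.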
Under this model, the probability of the data given no shapes in the image is [...].

6 Searching for [...]

Indexing refers to compiling a list of class/pose candidates for an image without considering global constraints. The model described in the previous section motivates a very simple procedure for defining [...] based on likelihood ratio tests. The snag is computation: compiling the list by brute force computation is highly inefficient. This motivates the introduction of spread edges as a mechanism for accelerating the computation of [...].

6.1 Likelihood Ratio Tests

Consider a nonnull interpretation $y = \{(c_i, \theta_i)\}$. We are going to compare the likelihood of the edge data [...] to the likelihood of the same data under [...], where [...] is the same as [...] except that one of the elements is replaced by the background interpretation, say [...]. Then, using (3) and cancellation outside [...], [...]. This likelihood ratio simplifies to [...] $\frac{p(1-q)}{(1-p)q}$ [...], and the resulting statistic only involves edge data relevant to the class/pose pair [...].

The log likelihood ratio test at zero type I error relative to the null hypothesis $(c, \theta)$ (i.e., for class $c$ at pose $\theta$) reduces to a simple, linear test evaluating [...] $= 1$ if [...] and $0$ otherwise, and the threshold [...] is chosen such that [...]. Note that the sum is over a relatively small number of features, concentrated around the contours of the shape, i.e., on the set [...] $G_{\xi,c}$. We therefore seek the set of all pairs $(c, \theta)$ for which [...]. Notice that [...].

Bayesian Inference. Maintaining invariance (no missed detections) means that we want to perform the likelihood ratio test in (5) with no missed detections. Of course, computing the actual (model-based) threshold which achieves this is intractable and, hence, it will be estimated from training data; see Section 7. Notice that the threshold of unity in (5) would correspond to a likelihood ratio test designed to minimize total error; moreover, [...] implies that the ratio in (5) must exceed unity. However, due to the severe underestimation of background likelihoods (due to the independence assumption), taking a unit threshold would result in a great many false positives. In other words, the thresholds that arise from a strict Bayesian analysis are far more conservative than necessary to achieve invariance. It is for these reasons that the model motivates our computational strategy rather than serving as a foundation for Bayesian inference.

6.2 Efficient Search

We begin with purely pose-based subsets of [...]. Fix [...], let [...] be a neighborhood of the identity pose, and put [...]. Suppose we want to find all [...] for which [...]. We could perform a brute force search over the set [...] and evaluate [...] for each element. Generally, however, this procedure will fail for [...] elements in [...] since the background hypothesis is statistically dominant (relative to [...]). Therefore, it would be preferable to have a computationally efficient binary test for the compound event [...]. If that test fails, there is no need to perform the search for individual poses. For simplicity, we assume that either the image contains only one instance from [...] or no shape at all.

The test for [...] will be based on a thresholded sum of a moderate number of binary features, approximately the same number as in (6). The test should be computationally efficient (hence, avoiding large loops and online optimization) and have a reasonable false positive rate at a very small false negative rate. Note that the brute force search through [...] can be viewed as a test for the above hypothesis of the form [...] $= 1$ if $\max$ [...] and $0$ otherwise.

[...] denote the set of image locations of all model edges of type $\xi$ for the poses in [...]: [...]. This is shown in Fig. 2c for the class [...] for horizontal edges of one polarity and a set of poses consisting of small shifts and scale changes. Roughly speaking, [...] is merely a thickening of the boundary of a template for class [...].

6.3 Sum Test

One straightforward way to construct a bin test from the edge data is simply to sum all the detected model edges for all the poses in [...], namely, to define [...]. The corresponding test is then [...], meaning, of course, that we choose [...]. The threshold [...] should satisfy [...]. The discrimination level of this test (i.e., false positive rate or type II error) is [...].
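The single-hypothesis linear test, the brute force search over a bin, and the sum test can be contrasted in a small Python sketch. This is a minimal illustration under assumptions that are not in the paper: the pose bin contains only integer translations, all shifted points stay inside the grid, there is a single edge type, and the function names, example data, and thresholds are hypothetical.

```python
import numpy as np

def count_edges(edges, points):
    """Number of detected edges at the given (row, col) locations."""
    pts = np.asarray(points)
    return int(edges[pts[:, 0], pts[:, 1]].sum())

def shift_points(points, dz):
    """Translate model locations by dz = (d_row, d_col); a translation-only 'pose'."""
    return [(r + dz[0], c + dz[1]) for r, c in points]

def single_pose_test(edges, model_points, dz, tau):
    """Linear test for one class/pose hypothesis: enough model edges are detected."""
    return count_edges(edges, shift_points(model_points, dz)) >= tau

def exhaustive_bin_test(edges, model_points, shifts, tau):
    """Brute force over the bin: passes if at least one pose passes its own test."""
    return any(single_pose_test(edges, model_points, dz, tau) for dz in shifts)

def sum_bin_test(edges, model_points, shifts, tau_sum):
    """Sum test: count edges on the union of model locations over all poses in the bin."""
    union = sorted({pt for dz in shifts for pt in shift_points(model_points, dz)})
    return count_edges(edges, union) >= tau_sum

model_points = [(r, 3) for r in range(2, 7)]   # hypothetical vertical stroke
shifts = [(0, -1), (0, 0), (0, 1)]             # hypothetical pose bin (small shifts)

shape = np.zeros((10, 10), dtype=np.uint8)
shape[2:7, 4] = 1                              # the stroke, shifted one pixel right

clutter = np.zeros((10, 10), dtype=np.uint8)
for r, c in [(2, 2), (3, 3), (4, 4), (5, 2), (6, 3)]:
    clutter[r, c] = 1                          # scattered edges, no coherent stroke
```

With a threshold of 5, both bin tests accept `shape`, but the sum test also accepts `clutter`, whose five edges do not line up under any single pose. In this toy setting the sum test is cheap but lets through configurations that the pose-by-pose search rejects.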
We would not expect this test to be very discriminating. A simple computation shows that, under [...], the probabilities [...] are all on the order of the background probabilities [...]. Consequently, the null type I error constraint can only be satisfied by choosing a relatively low threshold [...], in which case [...] might be rather large. In other words, in order to capture all the shapes of interest, we would need to allow many configurations of clutter (not to mention other shapes) to pass the test. This observation will be examined more carefully later on.

6.4 Spread Test

A more discriminating test for [...] can be constructed by [...] by a smaller sum of spread edges in order to take advantage of the fact that, under [...], we know approximately how many on-shape edges of type $\xi$ to expect in a small subregion of [...]. To this end, let [...] be a neighborhood of the origin whose shape may be adapted to the feature type. (For instance, for a vertical edge [...] be a horizontal strip.) Eventually, the size of [...] will depend on the size of [...], but for now let us consider it fixed. For each [...], define the spread edge of type $\xi$ at location $z$ to be [...]. Thus, if an edge of type $\xi$ is detected in the region [...] centered at $z$, it is recorded at $z$. (See Fig. 2e.) Obviously, this corresponds to a local disjunction of elementary features. The spread edges [...] are precomputed and stored. Define also [...] to be a set of locations whose surrounding [...] in the sense that the regions are [...]. (See Fig. 2d.) To further simplify the argument, just suppose these sets coincide; this can always be arranged up to a few pixels. In that case, we can rewrite [...]. Now, replace [...]. The corresponding bin test is then [...]. The false positive rate is [...].

6.5 Comparison

[...] require an implicit loop over the locations [...]. The exhaustive test [...] requires a similar size loop (somewhat larger since the same location can be hit twice by two different poses). However, there is an important difference: The features [...] can be computed [...] and used for all subsequent tests. They are [...]. Thus, the tests [...] are significantly more efficient than [...]. Since all tests are invariants for [...] (i.e., have null type I error for [...]), the key issue is really one of [...]. Notice that, as [...] increases, the probability of occurrence of the features [...] both conditional on [...] and conditional on [...]. As a result, the effect of spreading on false positive rate is not entirely [...].

Henceforth, we only consider rectangular sets [...] which are of length [...] in the direction orthogonal to the edge orientation and of length [...] in the parallel direction. (See Figs. 2d, 2e, 12b, and 12d.) Note that [...] if we take [...] regions, i.e., regions of just one pixel. Assume now that the set [...] has more or less fixed width [...]. In the Appendix, we show, under simplifying assumptions, that: The test [...] with [...] regions is the most discriminating over all possible combinations $s, k$. In other words, the smallest [...] is achieved with [...] and, hence, the optimal choice for [...] is a single-pixel strip whose orientation is orthogonal to the direction of the edge type and whose length roughly matches the width of the extended boundary [...]. This result is very intuitive: Spreading, as opposed to summing, over a region that can contain at most one shape edge for any instantiation in [...] prevents off-shape edges from contributing excessively to the total sum. Note that, if [...], i.e., no off-shape edges appear, then the two tests are identical. For future use, for a general spread length [...], let [...] now refer to the optimal test using regions [...].

6.6 Spreading versus Downsampling

A possible alternative for a bin test could be based on the same edge features, computed on blurred and downsampled versions of the original data. This approach is very common in many algorithms and architectures; see, for example, the successive downsampling in the feedforward network of [19] or the jets proposed in [37]. Indeed, low resolution edges do have higher incidence at model locations, but they are still less robust than spreading at the original resolution. The blurring operation smooths out low-contrast boundaries and the relevant information gets lost. This is especially true for real data such as that shown in Fig. 5, taken at highly varying contrasts and lighting conditions.

As an illustration, we took a sample of the A and produced 100 random samples from a pose bin involving shifts of 2 pixels, rotations of 10 degrees, and scaling in each axis of 20 percent; see Fig. 3a. With spread 1 in the original resolution, plenty of locations were found with high probability. For example, in Fig. 3b, we show a probability map of a vertical edge type at all locations on the reference grid; darker represents higher probability. Alongside is a binary image indicating all locations where the probability was above 0.7. In Fig. 3c, the same information is shown for the same vertical edge type from the images blurred and downsampled by 2. The probability maps were magnified by a factor of 2 to compare to the original scale. Note that many fewer locations are found of probability over 0.7. The structures on the right leg of the A are unstable at low resolution. In general, the probabilities at lower resolution without spread are lower than the probabilities at the original resolution with spread 1.

6.7 Computational Gain

We have proposed the following two-step procedure. Given [...], first compute the [...]; if the result is negative, stop and, otherwise, evaluate [...] for each $(c, \theta)$ [...]. This yields a [...] which must contain [...]; moreover, either [...] $= D \cap$ [...]. It is of interest to compare this CTF procedure to directly looping over [...], which by definition results in [...]. Obviously, the two-step procedure is more discriminating since [...] $D \cap$ [...]. Notice that the degree to which we overestimate [...] will affect the amount of processing to follow, in particular, the number of pairwise comparison tests that must be performed for detections with poses too similar to coexist in [...].

As for computation, we make two reasonable assumptions: 1) Mean computation is calculated under the hypothesis [...]. (Recall that the background hypothesis is usually true.) 2) The test [...] has approximately the same computational cost, say [...], as [...], i.e., checking for a single hypothesis $(c, \theta)$. As a result, the false positive rate of [...] is then [...]. Consequently, direct search has cost [...], whereas the two-step procedure has (expected) cost [...]. Measuring the computational gain by the ratio gives [...],
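which can be very large for large bins. In fact, typically, [...], so that we gain even if [...] has only two elements. There is some extra cost in computing the features [...] relative to simply detecting the original edges [...]. However, since these features are to be used in many tests for different bins, they are computed once and for all a priori and this extra cost can be ignored. This is an important advantage of [...]: the features that have been developed for the bin test [...] can be reused in any other bin test.

We describe the mechanism for determining a test for a general subset [...]. Denote by [...] and [...], respectively, the sets of all poses and classes found in the elements of [...]. From here on, all tests are based on spread edges. Consequently, we can drop the superscript and simply [...], etc. For a general bin [...], according to the definitions of [...] (7) and [...] in (9), we need to identify $G_{\xi,c}$ for each $(c, \theta)$ [...]; the locations [...] and the extent [...] of the spread edges appearing in [...]; and the threshold [...]. In testing individual candidates $(c, \theta)$ using (6), there is no spread and the points [...] are given by the locations in [...] $G_{\xi,c}$. These, in turn, can be directly computed from the distinguished sets $G_{\xi,c}$, which we assume are derived from shape models, e.g., shape templates. In some cases, the structure of [...] is simple enough that we can do everything directly from the distinguished model sets $G_{\xi,c}$. This is the procedure adopted in the plate experiments (see Section 10.1). In other cases, identifying all [...] $(c, \theta)$ [...], and computing [...], can be difficult. It may be more practical to directly learn the distinguished spread edges from a sample [...] of subimages with instantiations from [...].

Fix a minimum [...], say [...]. Start with spread [...]. Find all pairs $(\xi, z)$ [...] such that [...], where [...] denotes an estimate of the given probability based on the training data [...]. If there are more than some minimum number [...] of these, we consider them a preliminary pool from which the [...] will be chosen. Otherwise, take [...] and repeat the search, and so forth, allowing the spread to increase up to some value [...]. If fewer than [...] such features with frequency at least [...] are found at [...], we declare the bin to be too heterogeneous to construct an informative test. In this case, we assign the bin the trivial test [...], which is passed by any data point. If [...] features are found, we prune the collection so that the spreading regions of any two features are disjoint. This procedure will yield a spread [...] and a set of feature/location pairs, say [...], such that the spread [...] has (estimated) probability at least [...] of being found on an instantiation from the bin population. The basic assumption is that, with a reasonable choice of [...], the estimated spread [...] will more or less correspond to the width of the set [...]. Our bin test is then [...], where [...] is the threshold which has yet to be determined.

Estimating [...] is delicate, especially in view of our invariance constraint [...], which is severe and somewhat unrealistic, at least without massive training sets. There are several ways to proceed. Perhaps the most straightforward is to estimate [...] based on [...]: [...] is the minimum value observed over [...] or some fraction thereof to insure good generalization. This is what is done in [8], for [...]

Fig. 3. (a) A sample of the population of As. (b) Probability maps of a vertical edge type on the population of As alongside locations above probability 0.7. (c) Probability maps of a vertical edge type on the population of As blurred and downsampled by 2, alongside locations above probability 0.7.

An alternative is to use a Gaussian approximation to the sum and determine [...] based on the distribution of [...]. Since the variables [...] are actually correlated on background, we estimate a background covariance matrix [...] whose entries are the covariances between [...] and [...] for a range of displacements [...]. The matrices [...] are then used to determine [...] for any [...] as follows: First, estimate the marginal probabilities [...] based on background samples; call this estimate [...], which allows for [...] but is, of course, translation-invariant. The mean and variance of [...] are then estimated by [...]. Finally, we take [...], where [...] is, as indicated, [...], i.e., [...] is adjusted to obtain no false negatives for [...] in the hierarchy. This is possible (at the loss of some discrimination) due to the inherent background normalization. Of course, since we are not directly controlling the false positive error, the resulting threshold might not be in the tail of the background [...].

8 CTF Search

The two-step procedure described in Section 6.2 was dedicated to processing that portion of the image determined by the bin [...]. As a result of imposing translation invariance, this is easily extended to processing the entire image in a two-level search and even further to a multilevel search.

8.1 Two-Level Search

Fix a small integer [...] and let [...] be the set of poses [...]. For any [...] $\subset C \times$ [...] and any element [...], denote by [...] the set of class/pose pairs [...] for some $(c, \theta)$ [...], namely, all poses appearing in [...] with positions shifted by $z$. Due to translation invariance, we need only develop models for subsets of [...]. Let [...] be a partition of [...]; it is not essential that the elements of [...] be disjoint. In any case, assume that, for each [...], a test [...] has been learned as in Section 7 based on a [...] of distinguished features. [...] be the sublattice of the full image lattice on the spacing [...]. Then, the full set of poses is covered by shifts of the elements of [...] along the coarse sublattice: [...] $\bigcup_{B \in \mathcal{B}} \bigcup_{z \in Z} (B + z)$ [...]. In order to find the full index set, we first loop over all [...] and, for each [...], we loop over all [...] and perform the test [...], where [...]. For those subsets [...] for which [...], we loop over all individual $(c, \theta)$ [...] and examine each one separately based on the likelihood ratio test described in Section 6.1.

8.2 Multilevel Search

The extension to multiple levels is straightforward. Let [...] be a sequence of finer and finer partitions [...]. Each element [...] is the union of elements [...]. Perform the same loop over shifts described above for all elements [...]. If [...] for some [...], loop over all elements of [...] such that [...], etc., until the finest level [...]. Elements of [...] that are reached and pass their test are added to [...]. Note that the loop over all shifts in the image is performed only on the coarse lattice at the top level of the hierarchy. This is summarized in Fig. 4.

8.3 Indexing

The result of such a CTF search is a set of detections (or index) [...] $\subset C \times$ [...] which, of course, depends on the image data. More precisely, $(c, \theta)$ [...] if and only if [...] appearing in the entire hierarchy (i.e., in any partition) which contains $(c, \theta)$. In other words, such a pair $(c, \theta)$ has been accepted by every relevant hypothesis test. If indeed every test in the hierarchy had zero false negative error, then we would have [...], i.e., the true interpretation would only involve elements of [...]. In any case, we do confine future processing to [...]. In general, [...] and [...], the set of class/pose pairs satisfying hypothesis test (6), are different. However, [...] if the hierarchy goes all the way down to individual pairs $(c, \theta)$. Of course, constraints on learning and memory render this difficult when [...] is very large. Hence, it may be necessary to allow the finest bins to represent multiple explanations, although perhaps pure in class.

9 From Indexing to Interpretation

We now seek the admissible interpretation [...] with highest likelihood. In principle, we could perform a brute force loop over all subsets of [...]. But this can be significantly simplified. [...], where [...] are two admissible interpretations whose concatenation gives [...],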
with positions shifted by . Due to translation invariance, we need only develop models for subsets of . Let be a partition of ; it is not essential that the elements of be disjoint. In any case, assume that, for each , a test has been learned as in Section 7 based on a of distinguished features. Let be the sublattice of the full image lattice on the spacing . Then, the full set of poses is covered by shifts of the elements of along the coarse sublattice:

$\bigcup_{B} \bigcup_{z} (B + z)$, where $B$ ranges over the elements of the partition and $z$ over the coarse sublattice.

In order to find the full index set , we first loop over all and, for each , we loop over all and perform the test , where . For those subsets for which , we loop over all individual (c, θ) and examine each one separately based on the likelihood ratio test described in Section 6.1.

8.2 Multilevel Search

The extension to multiple levels is straightforward. Let be a sequence of finer and finer partitions . Each element is the union of elements . Perform the same loop over shifts described above for all elements . If for some , loop over all elements of such that , etc., until the finest level . Elements of that are reached and pass their test are added to . Note that the loop over all shifts in the image is performed only on the coarse lattice at the top level of the hierarchy. This is summarized in Fig. 4.

8.3 Indexing

The result of such a CTF search is a set of detections (or C which, of course, depends on the image data. More precisely, (c, θ) if and only if appearing in the entire hierarchy (i.e., in any partition ) which contains (c, θ). In other words, such a pair (c, θ) has been accepted by every relevant hypothesis test. If indeed every test in the hierarchy had zero false negative error, then we would have , i.e., the true interpretation would only involve elements of . In any case, we do confine future processing to .

In general, , the set of class/pose pairs satisfying hypothesis test (6), are different. However, if the hierarchy goes all the way down to individual pairs (c, θ). Of course, constraints on learning and memory render this difficult when is very large. Hence, it may be necessary to allow the finest bins to represent multiple explanations, although perhaps pure in class.

9 FROM INDEXING TO INTERPRETATION

We now seek the admissible interpretation with highest likelihood. In principle, we could perform a brute force loop over all subsets of . But, this can be significantly simplified. , where are two admissible interpretations whose concatenation gives ,
and similarly let . Assume that and that the supports of the two interpretations are the same, i.e., which implies that . Then, due to cancellation over the background and over the data associated with , it follows immediately from (3) that $P(X \mid Y = y) / P(X \mid Y = y') =$

1614 IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, VOL. 26, NO. 12, DECEMBER 2004

Fig. 4. Pseudocode for multilevel search.

9.1 Individual Shape Competition

In the equation above, if the two interpretations y, y' differ by only one shape, i.e., if , then the assumptions imply that . Thus, we need to compare the likelihoods on the two largely overlapping regions . This suggests that an efficient strategy for disambiguation is to begin the process by resolving competing detections in with very similar poses. Different elements of may indeed have very similar poses; after all, the data in a region can pass the sequence of tests leading to more than one terminal node of the CTF hierarchy.

In principle, one could simply evaluate the likelihood of the data given each hypothesis and take the largest. However, the estimated pose may not be sufficiently precise to warrant such a decision and such straightforward evaluations tend to be sensitive to background noise. Moreover, we are still dealing with individual detections and the data considered in the likelihood evaluation involves only the , which may not coincide with .

A more robust approach is to perform likelihood ratio tests between pairs of hypotheses on a common region, so that the data considered is the same for both hypotheses. The straightforward likelihood ratio based on (3) and taking into account cancellations is given by an expression with two sums, one for each hypothesis, whose terms involve $p$, $q$, and $X(z)$; this is (11).
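As a rough illustration of the pairwise competition described above, the following Python sketch compares two hypotheses under an assumed independent-edge model: an edge is detected with probability p at locations in a hypothesis's support and with probability q elsewhere. Under that assumption, locations lying in both supports or in neither cancel in the ratio, so only the symmetric difference of the two supports contributes. The function name, the set-based data layout, the toy supports, and the values of p and q are illustrative assumptions rather than the authors' implementation, and a single edge type is used for brevity.

```python
import math


def log_likelihood_ratio(edges, support_a, support_b, p=0.8, q=0.1):
    """Log of P(data | A) / P(data | B) under an independent Bernoulli edge model.

    edges:      set of (x, y) locations where an edge was detected
    support_a:  set of (x, y) locations where hypothesis A predicts edges
    support_b:  same for hypothesis B
    p, q:       assumed on-shape and background edge probabilities
                (hypothetical values chosen for illustration)
    """
    on = math.log(p / q)                # contribution of a detected edge
    off = math.log((1 - p) / (1 - q))   # contribution of a missing edge

    def score(region):
        return sum(on if z in edges else off for z in region)

    # Only the symmetric difference of the supports survives the cancellation.
    return score(support_a - support_b) - score(support_b - support_a)


# Toy example: two hypotheses that share most of their support.
support_3 = {(0, 0), (1, 0), (2, 0), (2, 1)}
support_5 = {(0, 0), (1, 0), (2, 0), (0, 1)}
edges = {(0, 0), (1, 0), (2, 0), (2, 1)}

llr = log_likelihood_ratio(edges, support_3, support_5)
winner = "3" if llr > 0 else "5"
```

A positive value favors the first hypothesis. Swapping the two supports flips the sign, which is a quick sanity check that both hypotheses are being scored on the same data.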
9.2 Spreading the Likelihood Ratio Test

Notice that, for each edge type, the sums range over the symmetric difference of the edge supports for the two shapes at their respective poses. In order to stabilize this log-ratio, we restrict the two sums to regions where the two edge supports are really different as opposed to being slight shifts of one another. This is achieved by limiting the sums to and , respectively, where, for any set , we define the expanded version for some , where is a neighborhood of the origin. These regions are illustrated in Fig. 9.

9.3 Competition between Interpretations

This pairwise competition is performed only on detections with similar poses. It makes no sense to apply it to detections with overlapping regions where there are large nonoverlapping areas, in which case the two detections are really not explaining the same data. In the event of such an overlap, it is necessary, as indicated above, to perform a competition between admissible interpretations with the same support. The competition between two such sequences is performed using the same log likelihood ratio test as for two individual detections. For edge type $e$ and each interpretation $y$, let

$G_{e,y} = \bigcup_{i=1}^{k} G_{e,c_i,\theta_i}$.

The two sums in (11) are now performed on and , respectively. These regions are illustrated in Fig. 10.

The number of such subinterpretation comparisons can grow very quickly if there are large chains of partially overlapping detections. In particular, this occurs when detections are found that straddle two real shapes. This does not occur very frequently in the experiments reported below and various simple pruning mechanisms can be employed to reduce such instances.

Fig. 5. (a) and (b) The subimages extracted from the images in Fig. 1 using a coarse detector for a sequence of characters. (c) Vertical edges from (a). (d) Horizontal edges from (b). (e) Spread vertical edges on (a). (f) Spread horizontal edges on (b).

10 READING LICENSE PLATES

Starting from a photograph of the rear of a car, we seek to identify the characters in the license plate. Only one font is
modeled: all license plates in the data set are from the state of Massachusetts and all images are taken from more or less the same distance, although the location of the plate in the image can vary significantly. Two typical photographs are shown in Fig. 1, illustrating some of the challenges. Due to different illuminations, the characters in the two images have very different stroke widths despite having the same template. Also, different contrast settings and physical conditions of the license plates produce varying degrees of background clutter in the local neighborhood of the characters, as observed in the left panel. Other variations result from small rotations of the plates and small deformations due to perspective projection. For example, the plate on the right is somewhat warped at the middle and the size of the characters is about 25 percent smaller than the size of those on the left. For additional plate images, together with the resulting detections, see Fig. 11.

The plate in the original photograph is detected using a very coarse, edge-based model for a set of six generic characters arranged on a horizontal line and surrounded by a dark frame, at the expected scale, but at 1/4 of the original image resolution. A subimage is extracted around the highest scoring region and processed using the CTF algorithm. If no characters are detected in this subimage, the next highest scoring plate detection is processed, etc. In almost all images, the highest scoring region was the actual plate. In a few images, some other rectangular structure scored highest, but then no characters were actually detected so that the region was rejected and the next best detection was the actual plate. We omit further details because this is not the limiting factor for this application.

Subimages extracted from the two images of Fig. 1 are shown in Figs. 5a and 5b. The mean spatial density of edges in the subimage then serves as an estimate for q, the background edge probability, and we estimate in (10) by . In this way, the thresholds for the bin tests are adapted to the data, i.e., image-dependent. The edges and spread edges on the extracted images in Figs. 5a and 5b are shown in Figs. 5c, 5d, 5e, and 5f.

10.1 The CTF Hierarchy

Since the scale is roughly known and the rotation is generally small, we can take , defined as follows: (i.e., confined to a window). There are 37 classes defined by the prototypes (bitmaps), shown in Fig. 6. Bottom-up, binary clustering yields the pure-class hierarchy. Starting from the edge maps of the prototypes, at every level of the hierarchy each cluster is merged with the nearest one still available, where the distance between two clusters is measured as the average Hamming distance between any two of their elements. The hierarchy is shown in Fig. 6 without the root (all classes together) and the leaves (individual classes).

The class/pose hierarchy starts with the same structure: there is a bin corresponding to each set in the class hierarchy. Each bin in the last layer is then of the form and is split into subbins corresponding to two scale ranges ( ) and to nine (overlapping) windows inside determined by . The spreading is determined as in Section 7 and the sets are computed directly from the character templates. The tests for bins are constructed by taking all edge/location pairs that belong to classes in at the reference pose. The spread is not allowed under because we can anticipate the width of based on the range of poses in . A subsample of all edge/location pairs is taken to ensure nonoverlapping spreading domains. This provides the set described in Section 7. There is no test for the root; the search commences with the four tests corresponding to the four subnodes of the root because merging any of these four with spread produced very small sets . Perhaps this could be done more gradually.

Fig. 6. Top: The 37 prototypes for characters in Massachusetts license plates. Bottom: The class hierarchy. Not shown is the root and the final split into pure classes.
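The bottom-up binary clustering of Section 10.1 can be sketched in Python as follows. This is one possible reading of "each cluster is merged with the nearest one still available": at every level the closest remaining pair is merged greedily until at most one cluster is left unpaired, and that leftover is carried up unchanged. The function names, the representation of edge maps as flat binary tuples, and the four toy prototypes are assumptions made for illustration; the actual prototypes are the 37 bitmaps of Fig. 6.

```python
from itertools import combinations


def hamming(a, b):
    """Number of positions at which two equal-length binary tuples differ."""
    return sum(x != y for x, y in zip(a, b))


def cluster_distance(c1, c2, maps):
    """Average Hamming distance over all cross pairs of cluster members."""
    total = sum(hamming(maps[a], maps[b]) for a in c1 for b in c2)
    return total / (len(c1) * len(c2))


def merge_level(clusters, maps):
    """Greedily pair each cluster with the nearest one still available."""
    available = list(clusters)
    merged = []
    while len(available) > 1:
        c1, c2 = min(
            combinations(available, 2),
            key=lambda pair: cluster_distance(pair[0], pair[1], maps),
        )
        available.remove(c1)
        available.remove(c2)
        merged.append(c1 + c2)
    # An unpaired cluster (odd count) moves up to the next level as is.
    return merged + available


def build_hierarchy(maps):
    """Return all levels, from single classes up to one root cluster."""
    level = [(name,) for name in sorted(maps)]
    levels = [level]
    while len(level) > 1:
        level = merge_level(level, maps)
        levels.append(level)
    return levels


# Toy edge maps (hypothetical): similar shapes differ in few positions.
toy_maps = {
    "O": (1, 1, 1, 1, 0, 0),
    "Q": (1, 1, 1, 1, 0, 1),
    "I": (0, 0, 1, 0, 0, 0),
    "1": (0, 1, 1, 0, 0, 0),
}
levels = build_hierarchy(toy_maps)
```

In this toy case the first merge level groups the two round characters together and the two stroke-like characters together, mirroring the kind of clusters shown in Fig. 6. With 37 classes the same procedure gives a hierarchy of roughly logarithmic depth, since each level about halves the number of clusters.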
Fig. 7. The sparse subset of edge/location pairs for a few bins in the hierarchy. (a) 45 degree edges on the cluster . (b) 90 degree edges on cluster B, C, D, G, O, Q, . (c) 0 degree edges on cluster J, S, U, . (d) 45 degree edges on cluster G, O. (e) 135 degree edges on

The subsets for several bins are depicted in Fig. 7. For the subbins described above, which have a smaller range of poses, the spread is set to . Moreover, since this part of the hierarchy is purely pose-based and the class is unique, only the highest scoring detection is retained for

10.2 The Indexing Stage

We have set the coarse lattice spacing (see Section 8.1) so that the image (i.e., subimage containing the plate) is scanned at the coarsest level every 5 pixels, totaling approximately tests for a plate subimage of size . The outcome for this stage is shown in Fig. 8a; each white dot represents a for which one of the four coarsest tests (see Fig. 6) is positive at that shift. If the test for a coarse bin passes at shift , the tests at the children are performed at shift and so on at their children if the result is positive, until a leaf of the hierarchy is reached. Note that, due to our CTF strategy for evaluating the tests in the hierarchy, if the data do reach a leaf , then, necessarily, for every ancestor in the hierarchy; however, the condition by itself does not imply that all ancestor tests are also positive.

The set of all leaf bins reached (equivalently, the set of all complete chains) then constitutes . Each such detection has a unique class (since leaves are pure in class), but the pose has only been determined up to the resolution of the subbins of . Also, there can be several detections corresponding to different classes at the same or nearby locations. The set of locations in is shown in Fig. 8b. The pose of each detection is refined by looping over a small range of scales, rotations, and shifts and selecting the (c, θ) with the highest likelihood, that is, the highest score under

10.3 Interpretation: Prior Information and Competition

The index set consists of several tens to several hundred detections depending on the complexity of the background and the type of clutter in the image. At this point, we can take advantage of the a priori knowledge that the characters appear on a straight line by clustering the vertical coordinates of the detected locations and using the largest cluster to estimate this global pose parameter (see Fig. 8c). This eliminates some false positives created by combining part of a real character with part of the background, for example part of the small characters in the word Massachusetts at the top of the plate; see Fig. 8b.

Among the remaining detections, we perform the pairwise competitions as described in Section 9. This is illustrated in Fig. 9, showing a region in a plate where both a 3 and a 5 were detected. For one type of edge (vertical), the regions are shown in gray (Figs. 9a and 9b). The white areas illustrate a spreading of these regions as defined in Section 9. Figs. 9c and 9d show in white the locations in and , where an edge is detected.

After the pairwise competitions, there are sometimes chains of overlapping detections. It is then necessary to perform competitions, as described in Section 9, between valid candidate subsequences of the chain. A valid subsequence is one which does not have overlapping characters, and is not a subsequence of a valid subsequence. This last criterion follows simply from (5). In Fig. 10a, we show a region in a plate where a chain of overlapping detections was found. The regions for one competing subsequence ( ) are shown in Fig. 10b, for another ( ) in Fig. 10c, and the resulting symmetric difference in Fig. 10d.

10.4 Performance Measures

10.4.1 Classification Rate

We have tested the algorithm on 520 plates. The correct character string is found on all but 17 plates. The classification rate per symbol is much higher: over 99 percent. Most of the errors involve confusions between and between . Some detections are shown in Fig. 11. However, there are also false positives, about 30 in all the plates combined, including a small number in the correctly labeled plates, usually due to detecting the symbol near the borders of the plate. Other false positives are due to pairs of smaller characters as in the last row of Fig. 11. We have not attempted to deal separately with these in the sense of designing dedicated procedures for eliminating them.

10.4.2 Computation Time

The average classification time is 3.5 seconds per photograph on a 1 GHz Pentium 3 laptop. Approximately 1.6 seconds is needed to obtain the set via the CTF search. The remaining 1.9 seconds is devoted to refining the pose and performing the competitions. Of interest is the average number of detections per bin in the tree hierarchy as a function of the level, of which there
are five, not including the root. For the coarsest level (which has four bins), there are, on average, 183 detections per bin per plate, then 37, 29, and 18 for the next three levels, and, finally, four for the finest level. On average, the CTF search yields about 150 detections per plate.

If the CTF search is initiated with the leaves of the hierarchy in Fig. 6, i.e., with the pairwise clusters, the classification results are almost the same, but the computation time doubles and detection takes about 5 seconds. Therefore, approximately the same amount of time is devoted to the postdetection processing (since the resulting is about the same). This clearly demonstrates the advantage of the CTF computation.

Fig. 8. (a) Coarse level detections. (b) Fine level detections. (c) Detections after pruning based on vertical alignment.

Fig. 9. Competition between at a location on the plate. (a) In gray, in white . (b) Same for class . (c) Locations in , where an edge is detected. (d) Locations in , where an edge is detected.

11 DISCUSSION

We have presented an approach to multiclass shape detection which focuses on the computational process, dividing it into two rather distinct phases: a search for instances of shapes from multiple classes which is CTF, context-independent, and constrained by minimizing false negative error, followed by arranging subsets of detections into global interpretations using structural constraints and model-based competitions to resolve ambiguities. Spread edges are the key to producing efficient tests for subsets of classes and poses in the CTF hierarchy; they are reusable and, hence, efficient, common on shape instantiations, and yet sufficiently discriminating against background to limit the number of false detections. Spreading also serves as a means to stabilize likelihood ratio tests in the competition phase.

The experiments involve reading license plates. In this special scenario, there is exactly one prototype shape for each object class, but the problem is extremely challenging due to the multiplicity of poses, extensive background clutter, and large variations in illumination.

The CTF recognition strategy can be extended in various directions, for instance to multiple prototypes per class (e.g., multiple fonts in OCR), to situations in which templates do not exist (e.g., faces) and the tests for class/pose bins are learned directly from sample images, and perhaps to three-dimensional and deformable objects. Furthermore, the framework can be extended from edges to more complex features having much lower background probabilities. Indeed, it seems imperative to adopt more discriminating features in order to cope with more challenging clutter and a wider range of objects with more variability. Even in the present context, it is possible that the number of indexed instantiations could be significantly reduced using more complex features; some evidence for this with a single class can be found in [2]. This is a direction we are currently exploring, along with several others, including hypothesis tests against specific alternatives (rather than background), inducing CTF decompositions directly from data in order to generalize to cases where templates are not available, and sequential learning techniques such as incrementally updating CTF hierarchies, and refining the tests, as additional classes and samples are encountered.

Fig. 10. Sequence competition. (a) Detected classes on a subimage: a chain with labels . (b) The sets for the subsequence . (c) for the subsequence . (d) The symmetric difference

Fig. 11. Examples of detections on a variety of plates. The last row illustrates false positives.

APPENDIX

Recall from Section 6.4 that our goal is to determine the optimal domain of ORing for a bin of the form under our statistical edge model.

A.1 Simplifying Assumptions

To simplify the analysis, suppose the class is a square. In this case, there are two edge types of interest, horizontal and vertical, and a corresponding set of model edges for each one. Suppose also that the pose involves only translation in a neighborhood of the origin; scale and orientation are fixed. This is illustrated in Fig. 12. The rectangles , for . When , a detected edge is spread to a strip oriented perpendicular to the direction of the edge; for instance, for a vertical edge, an edge detected at is spread to a horizontal strip of width 1 and length centered at . See Figs. 12c and 12d for two different region shapes corresponding to .

We restrict the analysis to a single edge type, say vertical, and drop the dependence on ; the general result, combining edge types, is then straightforward. Define where . The thresholds are chosen to ensure a null type I error and we wish to compare the type II errors, , for different values of . Note that are taken to be disjoint and, for any choice of , their union is a fixed set ; see Fig. 12. Thus, the smaller , the larger the number of regions. We also assume that the image either contains no shape or it contains one instance of the shape at some pose and let denote the number of regions . Since is the width of the region , we have M/k, where is the number of regions used . Let M/k. Note that we assume each pose hits the same number, , of regions.

Conditioning on , we have . This implies that , but the variables are not independent. Furthermore, . Since the conditional expectation does not depend on , we . The conditional variance is also independent of and, since the variance of the conditional expectation is . On background, the test is binomial and we have , and .

A.2 The Case

This case, although unrealistic, is illuminating. Since is a nonnegative random variable added to the . Thus, the largest possible zero false negative threshold is . For any fixed , we have since we are simply replacing parts of the sum by maxima. Since is independent of , it follows that

Fig. 12. (a) The model square with the region . (b) The range of translations of the square. (c) The model square with the region tiled by , centered around points . (d) The same with

Proposition. , assume 1) . Then, . As a result, . In particular, the test is the most efficient.

Note. The assumptions are valid within reasonable ranges for the parameters q, , say , we have and, under , the statistic is binomial and the is binomial with parameters M and q. Using the normal approximation to the binomial, and, similarly, . Now, where we have used 2) in the first inequality, 1) in the second, and in the third. The result follows directly from this inequality.

A.3 The Case

, we cannot guarantee no false negatives. Instead, and choose , making the event very unlikely under . Again, using the normal approximation, the error is a decreasing function of
For general , we do not attempt analytical bounds. Rather, we provide numerical results for the range of values of and . In Fig. 13, we show plots for the values , and . The conclusions are the same as for is decreasing in in the range increasing for s for any . The optimal test is corresponding to

Fig. 13. (a) (b)

ACKNOWLEDGMENTS

Yali Amit was supported in part by US National Science Foundation ITR DMS-0219016. Donald Geman was supported in part by the US Office of Naval Research under contract N000120210053, the US Army Research Office under grant DAAD19-02-1-0337, and US National Science Foundation ITR DMS-0219016. Xiaodong Fan was supported in part by the US Office of Naval Research under contract N000120210053.

REFERENCES

[1] Y. Amit, A Neural Network Architecture for Visual Selection, Neural Computation, vol. 12, pp. 1059-1082, 2000.
[2] Y. Amit, 2D Object Detection and Recognition. MIT Press, 2002.
[3] Y. Amit and D. Geman, A Computational Model for Visual Selection, Neural Computation, vol. 11, pp. 1691-1715, 1999.
[4] Y. Amit and M. Mascaro, An Integrated Network for Invariant Visual Detection and Recognition, Vision Research.
[5] H. Barrow, J.M. Tenenbaum, R.C. Boles, and H.C. Wolf, Parametric Correspondence and Chamfer Matching: Two New Techniques for Image Matching, Proc. Intl Joint Conf. Artificial Intelligence, pp. 659-663, 1977.
[6] S. Belongie, J. Malik, and S. Puzicha, Shape Matching and Object Recognition Using Shape Context, IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 24, pp. 509-523, 2002.
[7] G. Blanchard and D. Geman, Hierarchical Testing Designs for Pattern Recognition, Annals of Statistics, 2005, to appear.
[8] F. Fleuret and D. Geman, Coarse-to-Fine Face Detection, Intl J. Computer Vision, vol. 41, pp. 85-107, 2001.
[9] K. Fukushima and S. Miyake, Neocognitron: A New Algorithm for Pattern Recognition Tolerant of Deformations and Shifts in Position, Pattern Recognition, vol. 15, pp. 455-469, 1982.
[10] K. Fukushima and N. Wake, Handwritten Alphanumeric Character Recognition by the Neocognitron, IEEE Trans. Neural Networks, vol. 2, pp. 355-365, 1991.
[11] D.M. Gavrila, Multi-Feature Hierarchical Template Matching Using Distance Transforms, Proc. IEEE Intl Conf. Pattern Recognition 98.
[12] S. Geman, K. Manbeck, and E. McClure, Coarse-to-Fine Search and Rank-Sum Statistics in Object Recognition, technical report, Brown Univ., 1995.
[13] S. Geman, D. Potter, and Z. Chi, Composition Systems, Quarterly J. Applied Math., vol. LX, pp. 707-737, 2002.
[14] W.E.L. Grimson, Object Recognition by Computer: The Role of Geometric Constraints. Cambridge, Mass.: MIT Press, 1990.
[15] H.D. Hubel, Eye, Brain, and Vision. New York: Scientific Am. Library, 1988.
[16] L. Itti, C. Koch, and E. Niebur, A Model of Saliency-Based Visual Attention for Rapid Scene Analysis, IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 20, pp. 1254-1260, 1998.
[17] T. Kanade and H. Schneiderman, Probabilistic Modeling of Local Appearance and Spatial Relationships for Object Recognition, Computer Vision and Pattern Recognition.
[18] Y. Lamdan, J.T. Schwartz, and H.J. Wolfson, Object Recognition by Affine Invariant Matching, Proc. IEEE Conf. Computer Vision and Pattern Recognition, pp. 335-344, 1988.
[19] Y. LeCun, L. Bottou, Y. Bengio, and P. Haffner, Gradient-Based Learning Applied to Document Recognition, Proc. IEEE, vol. 86, no. 11, pp. 2278-2324, Nov. 1998.
[20] T. Lindeberg, Detecting Salient Blob-Like Image Structures and Their Scales with a Scale Space Primal Sketch: A Method for Focus-of-Attention, Intl J. Computer Vision, vol. 11, pp. 283-318.
[21] T. Lindeberg, Edge Detection and Ridge Detection with Automatic Scale Selection, Intl J. Computer Vision, vol. 30, pp. 117-156.
[22] D.G. Lowe, Distinctive Image Features from Scale-Invariant Keypoints, technical report, Univ. of British Columbia, 2003.
[23] K. Mikolajczyk and C. Schmid, A Performance Evaluation of Local Descriptors, Proc. IEEE Computer Vision and Pattern Recognition 03, pp. 257-263, 2003.
[24] G. Nagy, Twenty Years of Document Image Analysis in PAMI, IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 22, pp. 38-62.
[25] V. Navalpakkam and L. Itti, Sharing Resources: Buy Attention, Get Recognition, Proc. Intl Workshop Attention and Performance in Computer Vision.
[26] C.F. Olson and D.P. Huttenlocher, Automatic Target Recognition by Matching Oriented Edge Segments, IEEE Trans. Image Processing, vol. 6, no. 1, pp. 103-113, Jan. 1997.
[27] C.M. Privitera and L.W. Stark, Algorithms for Defining Visual Regions-of-Interest: Comparison with Eye Fixation, IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 22, pp. 970-982, 2000.
[28] D. Reisfeld, H. Wolfson, and Y. Yeshurun, Context-Free Attentional Operators: The Generalized Symmetry Transform, Intl J. Computer Vision, vol. 14, pp. 119-130, 1995.
[29] M. Riesenhuber and T. Poggio, Hierarchical Models of Object Recognition in Cortex, Nature Neuroscience, vol. 2, pp. 1019-1025.
[30] A.S. Rojer and E.L. Schwartz, A Quotient Space Hough Transform for Space-Variant Visual Attention, Neural Networks for Vision and Image Processing, G.A. Carpenter and S. Grossberg, eds. MIT Press, 1992.
[31] H.A. Rowley, S. Baluja, and T. Kanade, Neural Network-Based Face Detection, IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 20, pp. 23-38, 1998.
[32] W. Rucklidge, Locating Objects Using the Hausdorff Distance, Proc. Intl Conf. Computer Vision, pp. 457-464, 1995.
[33] D.A. Socolinsky, J.D. Neuheisel, C.E. Priebe, J. DeVinney, and D. Marchette, Fast Face Detection with a Boosted CCCD Classifier, technical report, The Johns Hopkins Univ., 2002.
[34] A. Torralba, Contextual Priming for Object Detection, Intl J. Computer Vision, vol. 53, pp. 153-167, 2003.
[35] S. Ullman, Sequence Seeking and Counter Streams: A Computational Model for Bidirectional Information Flow in the Visual Cortex, Cerebral Cortex, vol. 5, pp. 1-11, 1995.
[36] P. Viola and M.J. Jones, Robust Real-Time Face Detection, Intl Conf. Computer Vision, vol. II, p. 747, 2001.
[37] L. Wiskott, J.-M. Fellous, N. Kruger, and C. von der Malsburg, Face Recognition by Elastic Bunch Graph Matching, IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 19, pp. 775-779, 1997.

Yali Amit received the PhD degree in mathematics from the Weizmann Institute, Israel, in 1988. He spent three years as a visiting assistant professor in the Division of Applied Math at Brown University, where he started working on image analysis. In 1991, he joined the Department of Statistics at the University of Chicago. In 2000, he was appointed a full professor with a joint appointment in the Departments of Statistics and Computer Science. In recent years, his research interests have focused on object detection and recognition, speech recognition, machine learning, and computational models for the biological visual system. Much of this work is presented in a monograph published in 2002 by MIT Press.

Donald Geman received the BA degree in English literature from the University of Illinois in 1965 and the PhD degree in mathematics from Northwestern University in 1970. He joined the Department of Mathematics and Statistics at the University of Massachusetts-Amherst in 1970, where he became a Distinguished University Professor. In 2001, he moved to The Johns Hopkins University, where he is currently a professor in the Department of Applied Mathematics and Statistics and a member of the Center for Imaging Science in the Whitaker Institute. Visiting appointments include those at the University of North Carolina (1976-1977), Brown University (1991-1992), Ecole Polytechnique (1997-1999), and Ecole Normale Superieure-Cachan (2000-2003). His current research interests include computational vision, statistical and sequential learning, and bioinformatics.

Xiaodong Fan received the BS and MS degrees in electrical engineering from Shanghai Jiao Tong University, Shanghai, People's Republic of China, in 1997 and 2000, respectively. He joined the Department of Electrical and Computer Engineering at The Johns Hopkins University in 2000, where he is currently a PhD student. His research interests include computer vision, pattern recognition, machine learning, and image and video