
Distinctive Image Features from Scale-Invariant Keypoints

David G. Lowe
Computer Science Department
University of British Columbia
Vancouver, B.C., Canada
lowe@cs.ubc.ca

January 5, 2004

Abstract

This paper presents a method for extracting distinctive invariant features from images that can be used to perform reliable matching between different views of an object or scene. The features are invariant to image scale and rotation, and are shown to provide robust matching across a substantial range of affine distortion, change in 3D viewpoint, addition of noise, and change in illumination. The features are highly distinctive, in the sense that a single feature can be correctly matched with high probability against a large database of features from many images. This paper also describes an approach to using these features for object recognition. The recognition proceeds by matching individual features to a database of features from known objects using a fast nearest-neighbor algorithm, followed by a Hough transform to identify clusters belonging to a single object, and finally performing verification through least-squares solution for consistent pose parameters. This approach to recognition can robustly identify objects among clutter and occlusion while achieving near real-time performance.

Accepted for publication in the International Journal of Computer Vision, 2004.
1 Introduction

Image matching is a fundamental aspect of many problems in computer vision, including object or scene recognition, solving for 3D structure from multiple images, stereo correspondence, and motion tracking. This paper describes image features that have many properties that make them suitable for matching differing images of an object or scene. The features are invariant to image scaling and rotation, and partially invariant to change in illumination and 3D camera viewpoint. They are well localized in both the spatial and frequency domains, reducing the probability of disruption by occlusion, clutter, or noise. Large numbers of features can be extracted from typical images with efficient algorithms. In addition, the features are highly distinctive, which allows a single feature to be correctly matched with high probability against a large database of features, providing a basis for object and scene recognition.

The cost of extracting these features is minimized by taking a cascade filtering approach, in which the more expensive operations are applied only at locations that pass an initial test. Following are the major stages of computation used to generate the set of image features:

1. Scale-space extrema detection: The first stage of computation searches over all scales and image locations. It is implemented efficiently by using a difference-of-Gaussian function to identify potential interest points that are invariant to scale and orientation.

2. Keypoint localization: At each candidate location, a detailed model is fit to determine location and scale. Keypoints are selected based on measures of their stability.

3. Orientation assignment: One or more orientations are assigned to each keypoint location based on local image gradient directions. All future operations are performed on image data that has been transformed relative to the assigned orientation, scale, and location for each feature, thereby providing invariance to these transformations.

4. Keypoint descriptor: The local image gradients are measured at the selected scale in the region around each keypoint. These are transformed into a representation that allows for significant levels of local shape distortion and change in illumination.

This approach has been named the Scale Invariant Feature Transform (SIFT), as it transforms image data into scale-invariant coordinates relative to local features.

An important aspect of this approach is that it generates large numbers of features that densely cover the image over the full range of scales and locations. A typical image of size 500x500 pixels will give rise to about 2000 stable features (although this number depends on both image content and choices for various parameters). The quantity of features is particularly important for object recognition, where the ability to detect small objects in cluttered backgrounds requires that at least 3 features be correctly matched from each object for reliable identification.

For image matching and recognition, SIFT features are first extracted from a set of reference images and stored in a database. A new image is matched by individually comparing each feature from the new image to this previous database and finding candidate matching features based on Euclidean distance of their feature vectors. This paper will discuss fast nearest-neighbor algorithms that can perform this computation rapidly against large databases.

The keypoint descriptors are highly distinctive, which allows a single feature to find its correct match with good probability in a large database of features. However, in a cluttered
image, many features from the background will not have any correct match in the database, giving rise to many false matches in addition to the correct ones. The correct matches can be filtered from the full set of matches by identifying subsets of keypoints that agree on the object and its location, scale, and orientation in the new image. The probability that several features will agree on these parameters by chance is much lower than the probability that any individual feature match will be in error. The determination of these consistent clusters can be performed rapidly by using an efficient hash table implementation of the generalized Hough transform.

Each cluster of 3 or more features that agree on an object and its pose is then subject to further detailed verification. First, a least-squared estimate is made for an affine approximation to the object pose. Any other image features consistent with this pose are identified, and outliers are discarded. Finally, a detailed computation is made of the probability that a particular set of features indicates the presence of an object, given the accuracy of fit and number of probable false matches. Object matches that pass all these tests can be identified as correct with high confidence.

2 Related research

The development of image matching by using a set of local interest points can be traced back to the work of Moravec (1981) on stereo matching using a corner detector. The Moravec detector was improved by Harris and Stephens (1988) to make it more repeatable under small image variations and near edges. Harris also showed its value for efficient motion tracking and 3D structure from motion recovery (Harris, 1992), and the Harris corner detector has since been widely used for many other image matching tasks. While these feature detectors are usually called corner detectors, they are not selecting just corners, but rather any image location that has large gradients in all directions at a predetermined scale.

The initial applications were to stereo and short-range motion tracking, but the approach was later extended to more difficult problems. Zhang et al. (1995) showed that it was possible to match Harris corners over a large image range by using a correlation window around each corner to select likely matches. Outliers were then removed by solving for a fundamental matrix describing the geometric constraints between the two views of a rigid scene and removing matches that did not agree with the majority solution. At the same time, a similar approach was developed by Torr (1995) for long-range motion matching, in which geometric constraints were used to remove outliers for rigid objects moving within an image.

The ground-breaking work of Schmid and Mohr (1997) showed that invariant local feature matching could be extended to general image recognition problems in which a feature was matched against a large database of images. They also used Harris corners to select interest points, but rather than matching with a correlation window, they used a rotationally invariant descriptor of the local image region. This allowed features to be matched under arbitrary orientation change between the two images. Furthermore, they demonstrated that multiple feature matches could accomplish general recognition under occlusion and clutter by identifying consistent clusters of matched features.

The Harris corner detector is very sensitive to changes in image scale, so it does not provide a good basis for matching images of different sizes. Earlier work by the author (Lowe, 1999) extended the local feature approach to achieve scale invariance. This work also described a new local descriptor that provided more distinctive features while being less sensitive to local image distortions such as 3D viewpoint change. This current paper provides a more in-depth development and analysis of this earlier work, while also presenting a number of improvements in stability and feature invariance.

There is a considerable body of previous research on identifying representations that are stable under scale change. Some of the first work in this area was by Crowley and Parker (1984), who developed a representation that identified peaks and ridges in scale space and linked these into a tree structure. The tree structure could then be matched between images with arbitrary scale change. More recent work on graph-based matching by Shokoufandeh, Marsic and Dickinson (1999) provides more distinctive feature descriptors using wavelet coefficients. The problem of identifying an appropriate and consistent scale for feature detection has been studied in depth by Lindeberg (1993, 1994). He describes this as a problem of scale selection, and we make use of his results below.

Recently, there has been an impressive body of work on extending local features to be invariant to full affine transformations (Baumberg, 2000; Tuytelaars and Van Gool, 2000; Mikolajczyk and Schmid, 2002; Schaffalitzky and Zisserman, 2002; Brown and Lowe, 2002). This allows for invariant matching to features on a planar surface under changes in orthographic 3D projection, in most cases by resampling the image in a local affine frame. However, none of these approaches are yet fully affine invariant, as they start with initial feature scales and locations selected in a non-affine-invariant manner due to the prohibitive cost of exploring the full affine space. The affine frames are also more sensitive to noise than those of the scale-invariant features, so in practice the affine features have lower repeatability than the scale-invariant features unless the affine distortion is greater than about a 40 degree tilt of a planar surface (Mikolajczyk, 2002). Wider affine invariance may not be important for many applications, as training views are best taken at least every 30 degrees rotation in viewpoint (meaning that recognition is within 15 degrees of the closest training view) in order to capture non-planar changes and occlusion effects for 3D objects.

While the method to be presented in this paper is not fully affine invariant, a different approach is used in which the local descriptor allows relative feature positions to shift significantly with only small changes in the descriptor. This approach not only allows the descriptors to be reliably matched across a considerable range of affine distortion, but it also makes the features more robust against changes in 3D viewpoint for non-planar surfaces. Other advantages include much more efficient feature extraction and the ability to identify larger numbers of features. On the other hand, affine invariance is a valuable property for matching planar surfaces under very large view changes, and further research should be performed on the best ways to combine this with non-planar 3D viewpoint invariance in an efficient and stable manner.

Many other feature types have been proposed for use in recognition, some of which could be used in addition to the features described in this paper to provide further matches under differing circumstances. One class of features are those that make use of image contours or region boundaries, which should make them less likely to be disrupted by cluttered backgrounds near object boundaries. Matas et al. (2002) have shown that their maximally-stable extremal regions can produce large numbers of matching features with good stability. Mikolajczyk et al. (2003) have developed a new descriptor that uses local edges while ignoring unrelated nearby edges, providing the ability to find stable features even near the boundaries of narrow shapes superimposed on background clutter. Nelson and Selinger (1998) have shown good results with local features based on groupings of image contours. Similarly,
Pope and Lowe (2000) used features based on the hierarchical grouping of image contours, which are particularly useful for objects lacking detailed texture.

The history of research on visual recognition contains work on a diverse set of other image properties that can be used as feature measurements. Carneiro and Jepson (2002) describe phase-based local features that represent the phase rather than the magnitude of local spatial frequencies, which is likely to provide improved invariance to illumination. Schiele and Crowley (2000) have proposed the use of multidimensional histograms summarizing the distribution of measurements within image regions. This type of feature may be particularly useful for recognition of textured objects with deformable shapes. Basri and Jacobs (1997) have demonstrated the value of extracting local region boundaries for recognition. Other useful properties to incorporate include color, motion, figure-ground discrimination, region shape descriptors, and stereo depth cues. The local feature approach can easily incorporate novel feature types because extra features contribute to robustness when they provide correct matches, but otherwise do little harm other than their cost of computation. Therefore, future systems are likely to combine many feature types.

3 Detection of scale-space extrema

As described in the introduction, we will detect keypoints using a cascade filtering approach that uses efficient algorithms to identify candidate locations that are then examined in further detail. The first stage of keypoint detection is to identify locations and scales that can be repeatably assigned under differing views of the same object. Detecting locations that are invariant to scale change of the image can be accomplished by searching for stable features across all possible scales, using a continuous function of scale known as scale space (Witkin, 1983).

It has been shown by Koenderink (1984) and Lindeberg (1994) that under a variety of reasonable assumptions the only possible scale-space kernel is the Gaussian function. Therefore, the scale space of an image is defined as a function, $L(x, y, \sigma)$, that is produced from the convolution of a variable-scale Gaussian, $G(x, y, \sigma)$, with an input image, $I(x, y)$:

$$L(x, y, \sigma) = G(x, y, \sigma) * I(x, y),$$

where $*$ is the convolution operation in $x$ and $y$, and

$$G(x, y, \sigma) = \frac{1}{2\pi\sigma^2} e^{-(x^2 + y^2)/(2\sigma^2)}.$$

To efficiently detect stable keypoint locations in scale space, we have proposed (Lowe, 1999) using scale-space extrema in the difference-of-Gaussian function convolved with the image, $D(x, y, \sigma)$, which can be computed from the difference of two nearby scales separated by a constant multiplicative factor $k$:

$$D(x, y, \sigma) = (G(x, y, k\sigma) - G(x, y, \sigma)) * I(x, y) = L(x, y, k\sigma) - L(x, y, \sigma). \qquad (1)$$

There are a number of reasons for choosing this function. First, it is a particularly efficient function to compute, as the smoothed images, $L$, need to be computed in any case for scale space feature description, and $D$ can therefore be computed by simple image subtraction.
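As an illustration of equation (1), the following is a minimal sketch in plain Python of how $L$ and $D$ might be computed for a small image stored as a list of rows. The function names, the truncation of the kernel at three standard deviations, and the clamping of indices at the image border are assumptions made for this example and are not specified in the paper.

```python
import math

def gaussian_kernel(sigma):
    """1D Gaussian kernel, truncated at 3*sigma (an assumption) and normalised to sum to 1."""
    radius = max(1, int(math.ceil(3 * sigma)))
    weights = [math.exp(-(i * i) / (2 * sigma * sigma))
               for i in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]

def convolve_rows(image, kernel):
    """Convolve every row with a 1D kernel, clamping indices at the borders (an assumption)."""
    radius = len(kernel) // 2
    width = len(image[0])
    return [[sum(kernel[j + radius] * row[min(max(x + j, 0), width - 1)]
                 for j in range(-radius, radius + 1))
             for x in range(width)]
            for row in image]

def transpose(image):
    return [list(column) for column in zip(*image)]

def gaussian_blur(image, sigma):
    """L(x, y, sigma) = G(x, y, sigma) * I(x, y), using the separability of the Gaussian."""
    kernel = gaussian_kernel(sigma)
    blurred_rows = convolve_rows(image, kernel)
    return transpose(convolve_rows(transpose(blurred_rows), kernel))

def difference_of_gaussian(image, sigma, k):
    """D(x, y, sigma) = L(x, y, k*sigma) - L(x, y, sigma), as in equation (1)."""
    coarse = gaussian_blur(image, k * sigma)
    fine = gaussian_blur(image, sigma)
    return [[c - f for c, f in zip(coarse_row, fine_row)]
            for coarse_row, fine_row in zip(coarse, fine)]
```

In a full scale-space pyramid the blurred images would be reused between adjacent levels rather than recomputed from the input each time, which is the efficiency argument made above; this sketch recomputes them only to keep the example short.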
Figure 1: For each octave of scale space, the initial image is repeatedly convolved with Gaussians to produce the set of scale space images shown on the left. Adjacent Gaussian images are subtracted to produce the difference-of-Gaussian images on the right. After each octave, the Gaussian image is down-sampled by a factor of 2, and the process repeated.

In addition, the difference-of-Gaussian function provides a close approximation to the scale-normalized Laplacian of Gaussian, $\sigma^2 \nabla^2 G$, as studied by Lindeberg (1994). Lindeberg showed that the normalization of the Laplacian with the factor $\sigma^2$ is required for true scale invariance. In detailed experimental comparisons, Mikolajczyk (2002) found that the maxima and minima of $\sigma^2 \nabla^2 G$ produce the most stable image features compared to a range of other possible image functions, such as the gradient, Hessian, or Harris corner function.

The relationship between $D$ and $\sigma^2 \nabla^2 G$ can be understood from the heat diffusion equation (parameterized in terms of $\sigma$ rather than the more usual $t = \sigma^2$):

$$\frac{\partial G}{\partial \sigma} = \sigma \nabla^2 G.$$

From this, we see that $\nabla^2 G$ can be computed from

the finite difference approximation to $\partial G / \partial \sigma$, using the difference of nearby scales at $k\sigma$ and $\sigma$:

$$\sigma \nabla^2 G = \frac{\partial G}{\partial \sigma} \approx \frac{G(x, y, k\sigma) - G(x, y, \sigma)}{k\sigma - \sigma}$$

and therefore,

$$G(x, y, k\sigma) - G(x, y, \sigma) \approx (k - 1)\sigma^2 \nabla^2 G.$$

This shows that when the difference-of-Gaussian function has scales differing by a constant factor it already incorporates the $\sigma^2$ scale normalization required for the scale-invariant Laplacian. The factor $(k - 1)$ in the equation is a constant over all scales and therefore does not influence extrema location. The approximation error will go to zero as $k$ goes to 1, but in practice we have found that the approximation has almost no impact on the stability of extrema detection or localization for even significant differences in scale, such as $k = \sqrt{2}$.

An efficient approach to construction of $D(x, y, \sigma)$ is shown in Figure 1. The initial image is incrementally convolved with Gaussians to produce images separated by a constant factor $k$ in scale space, shown stacked in the left column. We choose to divide each octave of scale space (i.e., doubling of $\sigma$) into an integer number, $s$, of intervals, so $k = 2^{1/s}$. We must produce $s + 3$ images in the stack of blurred images for each octave, so that final extrema detection covers a complete octave. Adjacent image scales are subtracted to produce the difference-of-Gaussian images shown on the right. Once a complete octave has been processed, we resample the Gaussian image that has twice the initial value of $\sigma$ (it will be 2 images from the top of the stack) by taking every second pixel in each row and column. The accuracy of sampling relative to $\sigma$ is no different than for the start of the previous octave, while computation is greatly reduced.

Figure 2: Maxima and minima of the difference-of-Gaussian images are detected by comparing a pixel (marked with X) to its 26 neighbors in 3x3 regions at the current and adjacent scales (marked with circles).

3.1 Local extrema detection

In order to detect the local maxima and minima of $D(x, y, \sigma)$, each sample point is compared to its eight neighbors in the current image and nine neighbors in the scale above and below (see Figure 2). It is selected only if it is larger than all of these neighbors or smaller than all of them. The cost of this check is reasonably low due to the fact that most sample points will be eliminated following the first few checks.

An important issue is to determine the frequency of sampling in the image and scale domains that is needed to reliably detect the extrema. Unfortunately, it turns out that there is no minimum spacing of samples that will detect all extrema, as the extrema can be arbitrarily close together. This can be seen by considering a white circle on a black background, which will have a single scale space maximum where the circular positive central region of the difference-of-Gaussian function matches the size and location of the circle. For

a very elongated ellipse, there will be two maxima near each end of the ellipse. As the locations of maxima are a continuous function of the image, for some ellipse with intermediate elongation there will be a transition from a single maximum to two, with the maxima arbitrarily close to each other near the transition.

[Figure 3 plots: Repeatability (%) versus number of scales sampled per octave (curves: matching location and scale; nearest descriptor in database), and number of keypoints per image versus number of scales sampled per octave (curves: total number of keypoints; nearest descriptor in database).]

Figure 3: The top line of the first graph shows the percent of keypoints that are repeatably detected at the same location and scale in a transformed image as a function of the number of scales sampled per octave. The lower line shows the percent of keypoints that have their descriptors correctly matched to a large database. The second graph shows the total number of keypoints detected in a typical image as a function of the number of scale samples.

Therefore, we must settle for a solution that trades off efficiency with completeness. In fact, as might be expected and is confirmed by our experiments, extrema that are close together are quite unstable to small perturbations of the image. We can determine the best choices experimentally by studying a range of sampling frequencies and using those that provide the most reliable results under a realistic simulation of the matching task.

3.2 Frequency of sampling in scale

The experimental determination of sampling frequency that maximizes extrema stability is shown in Figures 3 and 4. These figures (and most other simulations in this paper) are based on a matching task using a collection of 32 real images drawn from a diverse range, including outdoor scenes, human faces, aerial photographs, and industrial images (the image domain was found to have almost no influence on any of the results). Each image was then subject to a range of transformations, including rotation, scaling, affine stretch, change in brightness and contrast, and addition of image noise. Because the changes were synthetic, it was possible to precisely predict where each feature in an original image should appear in the transformed image, allowing for measurement of correct repeatability and positional accuracy for each feature.

Figure 3 shows these simulation results used to examine the effect of varying the number of scales per octave at which the image function is sampled prior to extrema detection. In this case, each image was resampled following rotation by a random angle and scaling by a random amount between 0.2 and 0.9 times the original size. Keypoints from the reduced resolution image were matched against those from the original image so that the scales for all keypoints would be present in the matched image. In addition, 1% image noise was added, meaning that each pixel had a random number added from the uniform interval [-0.01, 0.01] where pixel values are in the range [0, 1] (equivalent to providing slightly less than 6 bits of accuracy for image pixels).

[Figure 4 plot: Repeatability (%) versus prior smoothing for each octave (sigma), from 1 to 2 (curves: matching location and scale; nearest descriptor in database).]

Figure 4: The top line in the graph shows the percent of keypoint locations that are repeatably detected in a transformed image as a function of the prior image smoothing for the first level of each octave. The lower line shows the percent of descriptors correctly matched against a large database.

The top line in the first graph of Figure 3 shows the percent of keypoints that are detected at a matching location and scale in the transformed image. For all examples in this paper, we define a matching scale as being within a factor of $\sqrt{2}$ of the correct scale, and a matching location as being within $\sigma$ pixels, where $\sigma$ is the scale of the keypoint (defined from equation (1) as the standard deviation of the smallest Gaussian used in the difference-of-Gaussian function). The lower line on this graph shows the number of keypoints that are correctly matched to a database of 40,000 keypoints using the nearest-neighbor matching procedure to be described in Section 6 (this shows that once the keypoint is repeatably located, it is likely to be useful for recognition and matching tasks). As this graph shows, the highest repeatability is obtained when sampling 3 scales per octave, and this is the number of scale samples used for all other experiments throughout this paper.

It might seem surprising that the repeatability does not continue to improve as more scales are sampled. The reason is that this results in many more local extrema being detected, but these extrema are on average less stable and therefore are less likely to be detected in the transformed image. This is shown by the second graph in Figure 3, which shows the average number of keypoints detected and correctly matched in each image. The number of keypoints rises with increased sampling of scales and the total number of correct matches also rises. Since the success of object recognition often depends more on the quantity of correctly matched keypoints, as opposed to their percentage correct matching, for many applications it will be optimal to use a larger number of scale samples. However, the cost of computation also rises with this number, so for the experiments in this paper we have chosen to use just 3 scale samples per octave.

To summarize, these experiments show that the scale-space difference-of-Gaussian function has a large number of extrema and that it would be very expensive to detect them all. Fortunately, we can detect the most stable and useful subset even with a coarse sampling of scales.

3.3 Frequency of sampling in the spatial domain

Just as we determined the frequency of sampling per octave of scale space, so we must determine the frequency of sampling in the image domain relative to the scale of smoothing. Given that extrema can be arbitrarily close together, there will be a similar trade-off between sampling frequency and rate of detection. Figure 4 shows an experimental determination of the amount of prior smoothing, $\sigma$, that is applied to each image level before building the scale space representation for an octave. Again, the top line is the repeatability of keypoint detection, and the results show that the repeatability continues to increase with $\sigma$. However, there is a cost to using a large $\sigma$ in terms of efficiency, so we have chosen to use $\sigma = 1.6$, which provides close to optimal repeatability. This value is used throughout this paper and was used for the results in Figure 3.

Of course, if we pre-smooth the image before extrema detection, we are effectively discarding the highest spatial frequencies. Therefore, to make full use of the input, the image can be expanded to create more sample points than were present in the original. We double the size of the input image using linear interpolation prior to building the first level of the pyramid. While the equivalent operation could effectively have been performed by using sets of subpixel-offset filters on the original image, the image doubling leads to a more efficient implementation. We assume that the original image has a blur of at least $\sigma = 0.5$ (the minimum needed to prevent significant aliasing), and that therefore the doubled image has $\sigma = 1.0$ relative to its new pixel spacing. This means that little additional smoothing is needed prior to creation of the first octave of scale space. The image doubling increases the number of stable keypoints by almost a factor of 4, but no significant further improvements were found with a larger expansion factor.

4 Accurate keypoint localization

Once a keypoint candidate has been found by comparing a pixel to its neighbors, the next step is to perform a detailed fit to the nearby data for location, scale, and ratio of principal curvatures. This information allows points to be rejected that have low contrast (and are therefore sensitive to noise) or are poorly localized along an edge.

The initial implementation of this approach (Lowe, 1999) simply located keypoints at the location and scale of the central sample point. However, recently Brown has developed a method (Brown and Lowe, 2002) for fitting a 3D quadratic function to the local sample points to determine the interpolated location of the maximum, and

his experiments showed that this provides a substantial improvement to matching and stability. His approach uses the Taylor expansion (up to the quadratic terms) of the scale-space function, $D(x, y, \sigma)$, shifted so that the origin is at the sample point:

$$D(\mathbf{x}) = D + \frac{\partial D^{T}}{\partial \mathbf{x}} \mathbf{x} + \frac{1}{2} \mathbf{x}^{T} \frac{\partial^2 D}{\partial \mathbf{x}^2} \mathbf{x} \qquad (2)$$

where $D$ and its derivatives are evaluated at the sample point and $\mathbf{x} = (x, y, \sigma)^{T}$ is the offset from this point. The location of the extremum, $\hat{\mathbf{x}}$, is determined by taking the derivative of this function with respect to $\mathbf{x}$ and setting it to zero, giving

$$\hat{\mathbf{x}} = -\left(\frac{\partial^2 D}{\partial \mathbf{x}^2}\right)^{-1} \frac{\partial D}{\partial \mathbf{x}}. \qquad (3)$$

Figure 5: This figure shows the stages of keypoint selection. (a) The 233x189 pixel original image. (b) The initial 832 keypoint locations at maxima and minima of the difference-of-Gaussian function. Keypoints are displayed as vectors indicating scale, orientation, and location. (c) After applying a threshold on minimum contrast, 729 keypoints remain. (d) The final 536 keypoints that remain following an additional threshold on ratio of principal curvatures.

As suggested by Brown, the Hessian and derivative of $D$ are approximated by using differences of neighboring sample points. The resulting 3x3 linear system can be solved with minimal cost. If the offset $\hat{\mathbf{x}}$ is larger than 0.5 in any dimension, then it means that the extremum lies closer to a different sample point. In this case, the sample point is changed and the interpolation performed instead about that point. The final offset $\hat{\mathbf{x}}$ is added to the location of its sample point to get the interpolated estimate for the location of the extremum.

The function value at the extremum, $D(\hat{\mathbf{x}})$, is useful for rejecting unstable extrema with low contrast. This can be obtained by substituting equation (3) into (2), giving

$$D(\hat{\mathbf{x}}) = D + \frac{1}{2} \frac{\partial D^{T}}{\partial \mathbf{x}} \hat{\mathbf{x}}.$$

For the experiments in this paper, all extrema with a value of $|D(\hat{\mathbf{x}})|$ less than 0.03 were discarded (as before, we assume image pixel values in the range [0, 1]).

Figure 5 shows the effects of keypoint selection on a natural image. In order to avoid too much clutter, a low-resolution 233 by 189 pixel image is used and keypoints are shown as vectors giving the location, scale, and orientation of each keypoint (orientation assignment is described below). Figure 5 (a) shows the original image, which is shown at reduced contrast behind the subsequent figures. Figure 5 (b) shows the 832 keypoints at all detected maxima and minima of the difference-of-Gaussian function, while (c) shows the 729 keypoints that remain following removal of those with a value of $|D(\hat{\mathbf{x}})|$ less than 0.03. Part (d) will be explained in the following section.

4.1 Eliminating edge responses

For stability, it is not sufficient to reject keypoints with low contrast. The difference-of-Gaussian function will have a strong response along edges, even if the location along the edge is poorly

determined and therefore unstable to small amounts of noise. A poorly defined peak in the difference-of-Gaussian function will have a large principal curvature across the edge but a small one in the perpendicular direction. The principal curvatures can be computed from a 2x2 Hessian matrix, $H$, computed at the location and scale of the keypoint:

$$H = \begin{bmatrix} D_{xx} & D_{xy} \\ D_{xy} & D_{yy} \end{bmatrix} \qquad (4)$$

The derivatives are estimated by taking differences of neighboring sample points.

The eigenvalues of $H$ are proportional to the principal curvatures of $D$. Borrowing from the approach used by Harris and Stephens (1988), we can avoid explicitly computing the eigenvalues, as we are only concerned with their ratio. Let $\alpha$ be the eigenvalue with the largest magnitude and $\beta$ be the smaller one. Then, we can compute the sum of the eigenvalues from the trace of $H$ and their product from the determinant:

$$\mathrm{Tr}(H) = D_{xx} + D_{yy} = \alpha + \beta,$$

$$\mathrm{Det}(H) = D_{xx} D_{yy} - (D_{xy})^2 = \alpha\beta.$$

In the unlikely event that the determinant is negative, the curvatures have different signs so the point is discarded as not being an extremum. Let $r$ be the ratio between the largest magnitude eigenvalue and the smaller one, so that $\alpha = r\beta$. Then,

$$\frac{\mathrm{Tr}(H)^2}{\mathrm{Det}(H)} = \frac{(\alpha + \beta)^2}{\alpha\beta} = \frac{(r\beta + \beta)^2}{r\beta^2} = \frac{(r + 1)^2}{r},$$

which depends only on the ratio of the eigenvalues rather than their individual values. The quantity $(r + 1)^2 / r$ is at a minimum when the two eigenvalues are equal and it increases with $r$. Therefore, to check that the ratio of principal curvatures is below some threshold, $r$, we only need to check

$$\frac{\mathrm{Tr}(H)^2}{\mathrm{Det}(H)} < \frac{(r + 1)^2}{r}.$$

This is very efficient to compute, with less than 20 floating point operations required to test each keypoint. The experiments in this paper use a value of $r = 10$, which eliminates keypoints that have a ratio between the principal curvatures greater than 10. The transition from Figure 5 (c) to (d) shows the effects of this operation.
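The edge test above can be sketched in a few lines of Python. This is an illustrative sketch only: the function names are hypothetical, the central-difference stencil is one common way of "taking differences of neighboring sample points" and is an assumption here, and treating a zero determinant the same way as a negative one is a choice made to avoid division by zero.

```python
def hessian_2x2(dog, x, y):
    """Estimate D_xx, D_yy, D_xy at (x, y) by central differences (assumed stencil).

    `dog` is one difference-of-Gaussian image stored as a list of rows.
    """
    centre = dog[y][x]
    dxx = dog[y][x + 1] - 2 * centre + dog[y][x - 1]
    dyy = dog[y + 1][x] - 2 * centre + dog[y - 1][x]
    dxy = (dog[y + 1][x + 1] - dog[y + 1][x - 1]
           - dog[y - 1][x + 1] + dog[y - 1][x - 1]) / 4.0
    return dxx, dyy, dxy

def passes_edge_test(dxx, dyy, dxy, r=10.0):
    """Return True when Tr(H)^2 / Det(H) < (r + 1)^2 / r, with r = 10 as in the text."""
    trace = dxx + dyy
    det = dxx * dyy - dxy * dxy
    if det <= 0:
        # Curvatures of different sign (or a degenerate Hessian): not a usable extremum.
        return False
    return trace * trace / det < (r + 1) ** 2 / r
```

For example, an isotropic peak with $D_{xx} = D_{yy} = -2$ and $D_{xy} = 0$ gives a ratio of $16/4 = 4$, well below the threshold of $12.1$, while a ridge with $D_{xx} = -20$ and $D_{yy} = -1$ gives $441/20 = 22.05$ and is rejected.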
5 Orientation assignment

By assigning a consistent orientation to each keypoint based on local image properties, the keypoint descriptor can be represented relative to this orientation and therefore achieve invariance to image rotation. This approach contrasts with the orientation invariant descriptors of Schmid and Mohr (1997), in which each image property is based on a rotationally invariant measure. The disadvantage of that approach is that it limits the descriptors that can be used and discards image information by not requiring all measures to be based on a consistent rotation.

Following experimentation with a number of approaches to assigning a local orientation, the following approach was found to give the most stable results. The scale of the keypoint is used to select the Gaussian smoothed image, $L$, with the closest scale, so that all computations are performed in a scale-invariant manner. For each image sample, $L(x, y)$, at this scale, the gradient magnitude, $m(x, y)$, and orientation, $\theta(x, y)$, are precomputed using pixel differences:

\[ m(x, y) = \sqrt{(L(x+1, y) - L(x-1, y))^2 + (L(x, y+1) - L(x, y-1))^2} \]
\[ \theta(x, y) = \tan^{-1}\big( (L(x, y+1) - L(x, y-1)) / (L(x+1, y) - L(x-1, y)) \big) \]

An orientation histogram is formed from the gradient orientations of sample points within a region around the keypoint. The orientation histogram has 36 bins covering the 360 degree range of orientations. Each sample added to the histogram is weighted by its gradient magnitude and by a Gaussian-weighted circular window with a $\sigma$ that is 1.5 times that of the scale of the keypoint.

Peaks in the orientation histogram correspond to dominant directions of local gradients. The highest peak in the histogram is detected, and then any other local peak that is within 80% of the highest peak is used to also create a keypoint with that orientation. Therefore, for locations with multiple peaks of similar magnitude, there will be multiple keypoints created at the same location and scale but different orientations. Only about 15% of points are assigned multiple orientations, but these contribute significantly to the stability of matching. Finally, a parabola is fit to the 3 histogram values closest to each peak to interpolate the peak position for better accuracy.

Figure 6 shows the experimental stability of location, scale, and orientation assignment under differing amounts of image noise. As before the images are rotated and scaled by random amounts. The top line shows the stability of keypoint location and scale assignment. The second line shows the stability of matching when the orientation assignment is also required to be within 15 degrees. As shown by the gap between the top two lines, the orientation assignment remains accurate 95% of the time even after addition of 10% pixel noise (equivalent to a camera providing less than 3 bits of precision). The measured variance of orientation for the correct matches is about 2.5 degrees, rising to 3.9 degrees for 10% noise. The bottom line in Figure 6 shows the final accuracy of correctly matching a keypoint descriptor to a database of 40,000 keypoints (to be discussed below). As this graph shows, the SIFT features are resistant to even large amounts of pixel noise, and the major cause of error is the initial location and scale detection.
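The orientation assignment described in this section can be sketched as follows. This is only an illustration under stated assumptions, not the original code: the function names are hypothetical, the image is a plain list of rows `L[y][x]`, the window radius of three standard deviations is an assumed cutoff, and bin `i` is taken to cover the range from `10 * i` to `10 * (i + 1)` degrees so that its centre lies at `10 * i + 5`.

```python
import math


def gradient(L, x, y):
    """Gradient magnitude and orientation from pixel differences."""
    dx = L[y][x + 1] - L[y][x - 1]
    dy = L[y + 1][x] - L[y - 1][x]
    return math.hypot(dx, dy), math.atan2(dy, dx)


def orientation_histogram(L, kx, ky, scale, nbins=36):
    """Histogram of gradient orientations around keypoint (kx, ky)."""
    sigma = 1.5 * scale              # Gaussian window, as in the text
    radius = int(round(3 * sigma))   # assumed cutoff for the circular window
    hist = [0.0] * nbins
    height, width = len(L), len(L[0])
    for y in range(max(1, ky - radius), min(height - 1, ky + radius + 1)):
        for x in range(max(1, kx - radius), min(width - 1, kx + radius + 1)):
            d2 = (x - kx) ** 2 + (y - ky) ** 2
            if d2 > radius * radius:
                continue
            m, theta = gradient(L, x, y)
            weight = math.exp(-d2 / (2.0 * sigma * sigma))
            frac = (theta % (2.0 * math.pi)) / (2.0 * math.pi)
            hist[int(frac * nbins) % nbins] += m * weight
    return hist


def dominant_orientations(hist, peak_ratio=0.8):
    """Angles (degrees) of all local peaks within peak_ratio of the highest."""
    n = len(hist)
    top = max(hist)
    angles = []
    if top <= 0.0:
        return angles
    for i in range(n):
        left, centre, right = hist[i - 1], hist[i], hist[(i + 1) % n]
        if centre >= peak_ratio * top and centre > left and centre > right:
            # Vertex of the parabola through the three values, in bin units.
            offset = 0.5 * (left - right) / (left - 2.0 * centre + right)
            angles.append(((i + 0.5 + offset) * 360.0 / n) % 360.0)
    return angles
```

Each angle returned by `dominant_orientations` would give rise to a separate keypoint at the same location and scale.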
[Figure 6 plot: repeatability (%) versus image noise (0% to 10%); curves: matching location and scale; matching location, scale, and orientation; nearest descriptor in database.]

Figure 6: The top line in the graph shows the percent of keypoint locations and scales that are repeatably detected as a function of pixel noise. The second line shows the repeatability after also requiring agreement in orientation. The bottom line shows the final percent of descriptors correctly matched to a large database.

6 The local image descriptor

The previous operations have assigned an image location, scale, and orientation to each keypoint. These parameters impose a repeatable local 2D coordinate system in which to describe the local image region, and therefore provide invariance to these parameters. The next step is to compute a descriptor for the local image region that is highly distinctive yet is as invariant as possible to remaining variations, such as change in illumination or 3D viewpoint.

One obvious approach would be to sample the local image intensities around the keypoint at the appropriate scale, and to match these using a normalized correlation measure. However, simple correlation of image patches is highly sensitive to changes that cause misregistration of samples, such as affine or 3D viewpoint change or non-rigid deformations. A better approach has been demonstrated by Edelman, Intrator, and Poggio (1997). Their proposed representation was based upon a model of biological vision, in particular of complex neurons in primary visual cortex. These complex neurons respond to a gradient at a particular orientation and spatial frequency, but the location of the gradient on the retina is allowed to shift over a small receptive field rather than being precisely localized. Edelman et al. hypothesized that the function of these complex neurons was to allow for matching and recognition of 3D objects from a range of viewpoints. They have performed detailed experiments using 3D computer models of object and animal shapes which show that matching gradients while allowing for shifts in their position results in much better classification under 3D rotation. For example, recognition accuracy for 3D objects rotated in depth by 20 degrees increased from 35% for correlation of gradients to 94% using the complex cell model. Our implementation described below was inspired by this idea, but allows for positional shift using a different computational mechanism.

[Figure 7 panels: image gradients (left); keypoint descriptor (right).]
Figure 7: A keypoint descriptor is created by first computing the gradient magnitude and orientation at each image sample point in a region around the keypoint location, as shown on the left. These are weighted by a Gaussian window, indicated by the overlaid circle. These samples are then accumulated into orientation histograms summarizing the contents over 4x4 subregions, as shown on the right, with the length of each arrow corresponding to the sum of the gradient magnitudes near that direction within the region. This figure shows a 2x2 descriptor array computed from an 8x8 set of samples, whereas the experiments in this paper use 4x4 descriptors computed from a 16x16 sample array.

6.1 Descriptor representation

Figure 7 illustrates the computation of the keypoint descriptor. First the image gradient magnitudes and orientations are sampled around the keypoint location, using the scale of the keypoint to select the level of Gaussian blur for the image. In order to achieve orientation invariance, the coordinates of the descriptor and the gradient orientations are rotated relative to the keypoint orientation. For efficiency, the gradients are precomputed for all levels of the pyramid as described in Section 5. These are illustrated with small arrows at each sample location on the left side of Figure 7.

A Gaussian weighting function with $\sigma$ equal to one half the width of the descriptor window is used to assign a weight to the magnitude of each sample point. This is illustrated with a circular window on the left side of Figure 7, although, of course, the weight falls off smoothly. The purpose of this Gaussian window is to avoid sudden changes in the descriptor with small changes in the position of the window, and to give less emphasis to gradients that are far from the center of the descriptor, as these are most affected by misregistration errors.

The keypoint descriptor is shown on the right side of Figure 7. It allows for significant shift in gradient positions by creating orientation histograms over 4x4 sample regions. The figure shows eight directions for each orientation histogram, with the length of each arrow corresponding to the magnitude of that histogram entry. A gradient sample on the left can shift up to 4 sample positions while still contributing to the same histogram on the right, thereby achieving the objective of allowing for larger local positional shifts.

It is important to avoid all boundary effects in which the descriptor abruptly changes as a sample shifts smoothly from being within one histogram to another or from one orientation to another. Therefore, trilinear interpolation is used to distribute the value of each gradient sample into adjacent histogram bins. In other words, each entry into a bin is multiplied by a weight of $1 - d$ for each dimension, where $d$ is the distance of the sample from the central value of the bin as measured in units of the histogram bin spacing.
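The trilinear interpolation just described can be sketched in Python as below. The function name and the coordinate convention are assumptions made for illustration: `row`, `col`, and `ori_bin` are continuous bin coordinates in which integer values coincide with bin centres, the orientation dimension wraps around, and contributions that fall outside the spatial array are simply dropped.

```python
import math


def add_sample(hist, row, col, ori_bin, value):
    """Distribute one weighted gradient sample into hist[n][n][r].

    Each of the (up to) 8 neighbouring bins receives value * (1 - d) per
    dimension, where d is the distance to that bin's centre in bin units.
    """
    n = len(hist)
    r = len(hist[0][0])
    r0, c0, o0 = math.floor(row), math.floor(col), math.floor(ori_bin)
    for dr in (0, 1):
        ri = r0 + dr
        if not 0 <= ri < n:
            continue  # outside the descriptor array (assumed: dropped)
        w_row = 1.0 - abs(row - ri)
        for dc in (0, 1):
            ci = c0 + dc
            if not 0 <= ci < n:
                continue
            w_col = 1.0 - abs(col - ci)
            for do in (0, 1):
                w_ori = 1.0 - abs(ori_bin - (o0 + do))
                oi = (o0 + do) % r  # orientation bins wrap around
                hist[ri][ci][oi] += value * w_row * w_col * w_ori
```

With a 4x4 array of 8-bin histograms, a sample at `row = 1.25`, `col = 2.0`, `ori_bin = 7.5` contributes 0.375 of its value to each of `hist[1][2][7]` and `hist[1][2][0]`, and 0.125 to each of `hist[2][2][7]` and `hist[2][2][0]`, so a small shift of the sample changes the descriptor smoothly.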
The descriptor is formed from a vector containing the values of all the orientation histogram entries, corresponding to the lengths of the arrows on the right side of Figure 7. The figure shows a 2x2 array of orientation histograms, whereas our experiments below show that the best results are achieved with a 4x4 array of histograms with 8 orientation bins in each. Therefore, the experiments in this paper use a 4x4x8 = 128 element feature vector for each keypoint.

Finally, the feature vector is modified to reduce the effects of illumination change. First, the vector is normalized to unit length. A change in image contrast in which each pixel value is multiplied by a constant will multiply gradients by the same constant, so this contrast change will be canceled by vector normalization. A brightness change in which a constant is added to each image pixel will not affect the gradient values, as they are computed from pixel differences. Therefore, the descriptor is invariant to affine changes in illumination. However, non-linear illumination changes can also occur due to camera saturation or due to illumination changes that affect 3D surfaces with differing orientations by different amounts. These effects can cause a large change in relative magnitudes for some gradients, but are less likely to affect the gradient orientations. Therefore, we reduce the influence of large gradient magnitudes by thresholding the values in the unit feature vector to each be no larger than 0.2, and then renormalizing to unit length. This means that matching the magnitudes for large gradients is no longer as important, and that the distribution of orientations has greater emphasis. The value of 0.2 was determined experimentally using images containing differing illuminations for the same 3D objects.

6.2 Descriptor testing

There are two parameters that can be used to vary the complexity of the descriptor: the number of orientations, $r$, in the histograms, and the width, $n$, of the $n \times n$ array of orientation histograms. The size of the resulting descriptor vector is $rn^2$. As the complexity of the descriptor grows, it will be able to discriminate better in a large database, but it will also be more sensitive to shape distortions and occlusion.

Figure 8 shows experimental results in which the number of orientations and size of the descriptor were varied. The graph was generated for a viewpoint transformation in which a planar surface is tilted by 50 degrees away from the viewer and 4% image noise is added. This is near the limits of reliable matching, as it is in these more difficult cases that descriptor performance is most important. The results show the percent of keypoints that find a correct match to the single closest neighbor among a database of 40,000 keypoints. The graph shows that a single orientation histogram ($n = 1$) is very poor at discriminating, but the results continue to improve up to a 4x4 array of histograms with 8 orientations. After that, adding more orientations or a larger descriptor can actually hurt matching by making the descriptor more sensitive to distortion. These results were broadly similar for other degrees of viewpoint change and noise, although in some simpler cases discrimination continued to improve (from already high levels) with 5x5 and higher descriptor sizes. Throughout this paper we use a 4x4 descriptor with 8 orientations, resulting in feature vectors with 128 dimensions. While the dimensionality of the descriptor may seem high, we have found that it consistently performs better than lower-dimensional descriptors on a range of matching tasks and that the computational cost of matching remains low when using the approximate nearest-neighbor methods described below.

[Figure 8 plot: correct nearest descriptor (%) versus width n of descriptor (1 to 5), for a 50 degree angle and 4% noise; curves: with 16 orientations; with 8 orientations; with 4 orientations.]

Figure 8: This graph shows the percent of keypoints giving the correct match to a database of 40,000 keypoints as a function of width of the $n \times n$ keypoint descriptor and the number of orientations in each histogram. The graph is computed for images with affine viewpoint change of 50 degrees and addition of 4% noise.

6.3 Sensitivity to affine change

The sensitivity of the descriptor to affine change is examined in Figure 9. The graph shows the reliability of keypoint location and scale selection, orientation assignment, and nearest-neighbor matching to a database as a function of rotation in depth of a plane away from a viewer. It can be seen that each stage of computation has reduced repeatability with increasing affine distortion, but that the final matching accuracy remains above 50% out to a 50 degree change in viewpoint.

To achieve reliable matching over a wider viewpoint angle, one of the affine-invariant detectors could be used to select and resample image regions, as discussed in Section 2. As mentioned there, none of these approaches is truly affine-invariant, as they all start from initial feature locations determined in a non-affine-invariant manner. In what appears to be the most affine-invariant method, Mikolajczyk (2002) has proposed and run detailed experiments with the Harris-affine detector. He found that its keypoint repeatability is below that given here out to about a 50 degree viewpoint angle, but that it then retains close to 40% repeatability out to an angle of 70 degrees, which provides better performance for extreme affine changes. The disadvantages are a much higher computational cost, a reduction in the number of keypoints, and poorer stability for small affine changes due to errors in assigning a consistent affine frame under noise. In practice, the allowable range of rotation for 3D objects is considerably less than for planar surfaces, so affine invariance is usually not the limiting factor in the ability to match across viewpoint change. If a wide range of affine invariance is desired, such as for a surface that is known to be planar, then a simple solution is to adopt the approach of Pritchard and Heidrich (2003) in which additional SIFT features are generated from 4 affine-transformed versions of the training image corresponding to 60 degree viewpoint changes. This allows for the use of standard SIFT features with no additional cost when processing the image to be recognized, but results in an increase in the size of the feature database by a factor of 3.
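Before turning to the remaining experiments, the illumination normalization of the descriptor from Section 6.1 (normalize to unit length, limit each value to 0.2, renormalize) can be summarized in a few lines of Python. This is a minimal sketch rather than the original implementation; the function name is hypothetical, and returning an all-zero vector unchanged is an assumption for a case the text does not discuss.

```python
import math


def normalize_descriptor(vec, clamp=0.2):
    """Unit-normalize, clamp large values at `clamp`, then renormalize."""
    norm = math.sqrt(sum(v * v for v in vec))
    if norm == 0.0:
        return list(vec)  # assumed handling of an empty descriptor
    clipped = [min(v / norm, clamp) for v in vec]
    norm2 = math.sqrt(sum(v * v for v in clipped))
    return [v / norm2 for v in clipped]
```

Multiplying every entry of the input by a constant contrast factor leaves the output unchanged, and a single dominant gradient is reduced in influence relative to plain normalization. Note that after the final renormalization individual values can again exceed 0.2.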

[Figure 9 plot: repeatability (%) versus viewpoint angle (0 to 50 degrees); curves: matching location and scale; matching location, scale, and orientation; nearest descriptor in database.]

Figure 9: This graph shows the stability of detection for keypoint location, orientation, and final matching to a database as a function of affine distortion. The degree of affine distortion is expressed in terms of the equivalent viewpoint rotation in depth for a planar surface.

6.4 Matching to large databases

An important remaining issue for measuring the distinctiveness of features is how the reliability of matching varies as a function of the number of features in the database being matched. Most of the examples in this paper are generated using a database of 32 images with about 40,000 keypoints. Figure 10 shows how the matching reliability varies as a function of database size. This figure was generated using a larger database of 112 images, with a viewpoint depth rotation of 30 degrees and 2% image noise in addition to the usual random image rotation and scale change.

The dashed line shows the portion of image features for which the nearest neighbor in the database was the correct match, as a function of database size shown on a logarithmic scale. The leftmost point is matching against features from only a single image while the rightmost point is selecting matches from a database of all features from the 112 images. It can be seen that matching reliability does decrease as a function of the number of distractors, yet all indications are that many correct matches will continue to be found out to very large database sizes.

The solid line is the percentage of keypoints that were identified at the correct matching location and orientation in the transformed image, so it is only these points that have any chance of having matching descriptors in the database. The reason this line is flat is that the test was run over the full database for each value, while only varying the portion of the database used for distractors. It is of interest that the gap between the two lines is small, indicating that matching failures are due more to issues with initial feature localization and orientation assignment than to problems with feature distinctiveness, even out to large database sizes.
[Figure 10 plot: repeatability (%) versus number of keypoints in database (log scale, 1000 to 100000); curves: matching location, scale, and orientation; nearest descriptor in database.]

Figure 10: The dashed line shows the percent of keypoints correctly matched to a database as a function of database size (using a logarithmic scale). The solid line shows the percent of keypoints assigned the correct location, scale, and orientation. Images had random scale and rotation changes, an affine transform of 30 degrees, and image noise of 2% added prior to matching.

7 Application to object recognition

The major topic of this paper is the derivation of distinctive invariant keypoints, as described above. To demonstrate their application, we will now give a brief description of their use for object recognition in the presence of clutter and occlusion. More details on applications of these features to recognition are available in other papers (Lowe, 1999; Lowe, 2001; Se, Lowe and Little, 2002).

Object recognition is performed by first matching each keypoint independently to the database of keypoints extracted from training images. Many of these initial matches will be incorrect due to ambiguous features or features that arise from background clutter. Therefore, clusters of at least 3 features are first identified that agree on an object and its pose, as these clusters have a much higher probability of being correct than individual feature matches. Then, each cluster is checked by performing a detailed geometric fit to the model, and the result is used to accept or reject the interpretation.

7.1 Keypoint matching

The best candidate match for each keypoint is found by identifying its nearest neighbor in the database of keypoints from training images. The nearest neighbor is defined as the keypoint with minimum Euclidean distance for the invariant descriptor vector as was described in Section 6.

However, many features from an image will not have any correct match in the training database because they arise from background clutter or were not detected in the training images. Therefore, it would be useful to have a way to discard features that do not have any good match to the database. A global threshold on distance to the closest feature does not perform well, as some descriptors are much more discriminative than others. A more effective measure is obtained by comparing the distance of the closest neighbor to that of the second-closest neighbor. If there are multiple training images of the same object, then we define the second-closest neighbor as being the closest neighbor that is known to come from a different object than the first, such as by only using images known to contain different objects. This measure performs well because correct matches need to have the closest neighbor significantly closer than the closest incorrect match to achieve reliable matching. For false matches, there will likely be a number of other false matches within similar distances due to the high dimensionality of the feature space. We can think of the second-closest match as providing an estimate of the density of false matches within this portion of the feature space and at the same time identifying specific instances of feature ambiguity.

[Figure 11 plot: PDF versus ratio of distances (closest/next closest), from 0 to 1; curves: PDF for correct matches; PDF for incorrect matches.]

Figure 11: The probability that a match is correct can be determined by taking the ratio of distance from the closest neighbor to the distance of the second closest. Using a database of 40,000 keypoints, the solid line shows the PDF of this ratio for correct matches, while the dotted line is for matches that were incorrect.

Figure 11 shows the value of this measure for real image data. The probability density functions for correct and incorrect matches are shown in terms of the ratio of closest to second-closest neighbors of each keypoint. Matches for which the nearest neighbor was a correct match have a PDF that is centered at a much lower ratio than that for incorrect matches. For our object recognition implementation, we reject all matches in which the distance ratio is greater than 0.8, which eliminates 90% of the false matches while discarding less than 5% of the correct matches. This figure was generated by matching images following random scale and orientation change, a depth rotation of 30 degrees, and addition of 2% image noise, against a database of 40,000 keypoints.

7.2 Efficient nearest neighbor indexing

No algorithms are known that can identify the exact nearest neighbors of points in high dimensional spaces that are any more efficient than exhaustive search. Our keypoint descriptor has a 128-dimensional feature vector, and the best algorithms, such as the k-d tree (Friedman et al., 1977) provide no speedup over exhaustive search for more than about 10 dimensional spaces. Therefore, we have used an approximate algorithm, called the Best-Bin-First (BBF) algorithm (Beis and Lowe, 1997). This is approximate in the sense that it returns the closest
neighbor with high probability.

The BBF algorithm uses a modified search ordering for the k-d tree algorithm so that bins in feature space are searched in the order of their closest distance from the query location. This priority search order was first examined by Arya and Mount (1993), and they provide further study of its computational properties in (Arya et al., 1998). This search order requires the use of a heap-based priority queue for efficient determination of the search order. An approximate answer can be returned with low cost by cutting off further search after a specific number of the nearest bins have been explored. In our implementation, we cut off search after checking the first 200 nearest-neighbor candidates. For a database of 100,000 keypoints, this provides a speedup over exact nearest neighbor search by about 2 orders of magnitude yet results in less than a 5% loss in the number of correct matches. One reason the BBF algorithm works particularly well for this problem is that we only consider matches in which the nearest neighbor is less than 0.8 times the distance to the second-nearest neighbor (as described in the previous section), and therefore there is no need to exactly solve the most difficult cases in which many neighbors are at very similar distances.

7.3 Clustering with the Hough transform

To maximize the performance of object recognition for small or highly occluded objects, we wish to identify objects with the fewest possible number of feature matches. We have found that reliable recognition is possible with as few as 3 features. A typical image contains 2,000 or more features which may come from many different objects as well as background clutter. While the distance ratio test described in Section 7.1 will allow us to discard many of the false matches arising from background clutter, this does not remove matches from other valid objects, and we often still need to identify correct subsets of matches containing less than 1% inliers among 99% outliers. Many well-known robust fitting methods, such as RANSAC or Least Median of Squares, perform poorly when the percent of inliers falls much below 50%. Fortunately, much better performance can be obtained by clustering features in pose space using the Hough transform (Hough, 1962; Ballard, 1981; Grimson, 1990).

The Hough transform identifies clusters of features with a consistent interpretation by using each feature to vote for all object poses that are consistent with the feature. When clusters of features are found to vote for the same pose of an object, the probability of the interpretation being correct is much higher than for any single feature. Each of our keypoints specifies 4 parameters: 2D location, scale, and orientation, and each matched keypoint in the database has a record of the keypoint's parameters relative to the training image in which it was found. Therefore, we can create a Hough transform entry predicting the model location, orientation, and scale from the match hypothesis. This prediction has large error bounds, as the similarity transform implied by these 4 parameters is only an approximation to the full 6 degree-of-freedom pose space for a 3D object and also does not account for any non-rigid deformations. Therefore, we use broad bin sizes of 30 degrees for orientation, a factor of 2 for scale, and 0.25 times the maximum projected training image dimension (using the predicted scale)
for location. To avoid the problem of boundary effects in bin assignment, each keypoint match votes for the 2 closest bins in each dimension, giving a total of 16 entries for each hypothesis and further broadening the pose range.

In most implementations of the Hough transform, a multi-dimensional array is used to represent the bins. However, many of the potential bins will remain empty, and it is difficult to compute the range of possible bin values due to their mutual dependence (for example, the dependency of location discretization on the selected scale). These problems can be avoided by using a pseudo-random hash function of the bin values to insert votes into a one-dimensional hash table, in which collisions are easily detected.

7.4 Solution for affine parameters

The Hough transform is used to identify all clusters with at least 3 entries in a bin. Each such cluster is then subject to a geometric verification procedure in which a least-squares solution is performed for the best affine projection parameters relating the training image to the new image.

An affine transformation correctly accounts for 3D rotation of a planar surface under orthographic projection, but the approximation can be poor for 3D rotation of non-planar objects. A more general solution would be to solve for the fundamental matrix (Luong and Faugeras, 1996; Hartley and Zisserman, 2000). However, a fundamental matrix solution requires at least 7 point matches as compared to only 3 for the affine solution and in practice requires even more matches for good stability. We would like to perform recognition with as few as 3 feature matches, so the affine solution provides a better starting point and we can account for errors in the affine approximation by allowing for large residual errors. If we imagine placing a sphere around an object, then rotation of the sphere by 30 degrees will move no point within the sphere by more than 0.25 times the projected diameter of the sphere. For the examples of typical 3D objects used in this paper, an affine solution works well given that we allow residual errors up to 0.25 times the maximum projected dimension of the object. A more general approach is given in (Brown and Lowe, 2002), in which the initial solution is based on a similarity transform, which then progresses to solution for the fundamental matrix in those cases in which a sufficient number of matches are found.

The affine transformation of a model point $[x \; y]^T$ to an image point $[u \; v]^T$ can be written as

\[ \begin{bmatrix} u \\ v \end{bmatrix} = \begin{bmatrix} m_1 & m_2 \\ m_3 & m_4 \end{bmatrix} \begin{bmatrix} x \\ y \end{bmatrix} + \begin{bmatrix} t_x \\ t_y \end{bmatrix} \]

where the model translation is $[t_x \; t_y]^T$ and the affine rotation, scale, and stretch are represented by the $m_i$ parameters.

We wish to solve for the transformation parameters, so the equation above can be rewritten to gather the unknowns into a column vector:

\[ \begin{bmatrix} x & y & 0 & 0 & 1 & 0 \\ 0 & 0 & x & y & 0 & 1 \\ & & \cdots & & & \\ & & \cdots & & & \end{bmatrix} \begin{bmatrix} m_1 \\ m_2 \\ m_3 \\ m_4 \\ t_x \\ t_y \end{bmatrix} = \begin{bmatrix} u \\ v \\ \vdots \end{bmatrix} \]

This equation shows a single match, but any number of further matches can be added, with each match contributing two more rows to the first and last matrix. At least 3 matches are needed to provide a solution.

We can write this linear system as

\[ \mathbf{A}\mathbf{x} = \mathbf{b} \]

The least-squares solution for the parameters $\mathbf{x}$ can be determined by solving the corresponding normal equations,

\[ \mathbf{x} = [\mathbf{A}^T \mathbf{A}]^{-1} \mathbf{A}^T \mathbf{b}, \]

which minimizes the sum of the squares of the distances from the projected model locations to the corresponding image locations. This least-squares approach could readily be extended to solving for 3D pose and internal parameters of articulated and flexible objects (Lowe, 1991).

Outliers can now be removed by checking for agreement between each image feature and the model. Given the more accurate least-squares solution, we now require each match to agree within half the error range that was used for the parameters in the Hough transform bins. If fewer than 3 points remain after discarding outliers, then the match is rejected. As outliers are discarded, the least-squares solution is re-solved with the remaining points, and the process iterated. In addition, a top-down matching phase is used to add any further matches that agree with the projected model position. These may have been missed from the Hough transform bin due to the similarity transform approximation or other errors.

The final decision to accept or reject a model hypothesis is based on a detailed probabilistic model given in a previous paper (Lowe, 2001). This method first computes the expected number of false matches to the model pose, given the projected size of the model, the number of features within the region, and the accuracy of the fit. A Bayesian analysis then gives the probability that the object is present based on the actual number of matching features found. We accept a model if the final probability for a correct interpretation is greater than 0.98. For objects that project to small regions of an image, 3 features may be sufficient for reliable recognition. For large objects covering most of a heavily textured image, the expected number of false matches is higher, and as many as 10 feature matches may be necessary.

Figure 12: The training images for two objects are shown on the left. These can be recognized in a cluttered image with extensive occlusion, shown in the middle. The results of recognition are shown on the right. A parallelogram is drawn around each recognized object showing the boundaries of the original training image under the affine transformation solved for during recognition. Smaller squares indicate the keypoints that were used for recognition.

8 Recognition examples

Figure 12 shows an example of object recognition for a cluttered and occluded image containing 3D objects. The training images of a toy train and a frog are shown on the left.
The middle image (of size 600x480 pixels) contains instances of these objects hidden behind others and with extensive background clutter so that detection of the objects may not be immediate even for human vision. The image on the right shows the final correct identification superimposed on a reduced contrast version of the image. The keypoints that were used for recognition are shown as squares with an extra line to indicate orientation. The sizes of the squares correspond to the image regions used to construct the descriptor. An outer parallelogram is also drawn around each instance of recognition, with its sides corresponding to the boundaries of the training images projected under the final affine transformation determined during recognition.

Another potential application of the approach is to place recognition, in which a mobile device or vehicle could identify its location by recognizing familiar locations. Figure 13 gives an example of this application, in which training images are taken of a number of locations. As shown on the upper left, these can even be of such seemingly non-distinctive items as a wooden wall or a tree with trash bins. The test image (of size 640 by 315 pixels) on the upper right was taken from a viewpoint rotated about 30 degrees around the scene from the original positions, yet the training image locations are easily recognized.

Figure 13: This example shows location recognition within a complex scene. The training images for locations are shown at the upper left and the 640x315 pixel test image taken from a different viewpoint is on the upper right. The recognized regions are shown on the lower image, with keypoints shown as squares and an outer parallelogram showing the boundaries of the training images under the affine transform used for recognition.
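Two of the steps behind recognition results such as these, the distance-ratio test of Section 7.1 and the least-squares affine fit of Section 7.4, can be sketched in plain Python. This is an illustration under stated assumptions and not the original implementation: all function names are hypothetical, the nearest neighbors are found by exhaustive search instead of Best-Bin-First, the second-closest neighbor is not restricted to a different object, and the normal equations are solved by a small Gaussian elimination routine.

```python
import math


def ratio_test(query, database, max_ratio=0.8):
    """Index of the nearest descriptor, or None if the match is ambiguous."""
    dists = sorted((math.dist(query, d), i) for i, d in enumerate(database))
    if len(dists) < 2:
        return None
    (d1, best), (d2, _) = dists[0], dists[1]
    # Reject when closest / second-closest exceeds max_ratio.
    if d2 == 0.0 or d1 > max_ratio * d2:
        return None
    return best


def solve_linear(M, y):
    """Solve M x = y by Gaussian elimination with partial pivoting."""
    n = len(y)
    aug = [list(row) + [y[i]] for i, row in enumerate(M)]
    for c in range(n):
        p = max(range(c, n), key=lambda k: abs(aug[k][c]))
        if abs(aug[p][c]) < 1e-12:
            raise ValueError("degenerate configuration of matches")
        aug[c], aug[p] = aug[p], aug[c]
        for k in range(c + 1, n):
            f = aug[k][c] / aug[c][c]
            for j in range(c, n + 1):
                aug[k][j] -= f * aug[c][j]
    x = [0.0] * n
    for c in reversed(range(n)):
        s = aug[c][n] - sum(aug[c][j] * x[j] for j in range(c + 1, n))
        x[c] = s / aug[c][c]
    return x


def fit_affine(model_pts, image_pts):
    """Least-squares [m1, m2, m3, m4, tx, ty] from 3 or more matches."""
    if len(model_pts) < 3:
        raise ValueError("at least 3 matches are needed")
    A, b = [], []
    for (x, y), (u, v) in zip(model_pts, image_pts):
        A.append([x, y, 0.0, 0.0, 1.0, 0.0])
        b.append(u)
        A.append([0.0, 0.0, x, y, 0.0, 1.0])
        b.append(v)
    # Normal equations: (A^T A) x = A^T b
    AtA = [[sum(r[i] * r[j] for r in A) for j in range(6)] for i in range(6)]
    Atb = [sum(r[i] * bi for r, bi in zip(A, b)) for i in range(6)]
    return solve_linear(AtA, Atb)
```

In a fuller pipeline the matches passed to `fit_affine` would be those from a single Hough cluster, and the fit would be repeated after discarding outliers, as described in Section 7.4.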
All steps of the recognition process can be implemented efficiently, so the total time to recognize all objects in Figures 12 or 13 is less than 0.3 seconds on a 2GHz Pentium 4 processor. We have implemented these algorithms on a laptop computer with attached video camera, and have tested them extensively over a wide range of conditions. In general, textured planar surfaces can be identified reliably over a rotation in depth of up to 50 degrees in any direction and under almost any illumination conditions that provide sufficient light and do not produce excessive glare. For 3D objects, the range of rotation in depth for reliable recognition is only about 30 degrees in any direction and illumination change is more disruptive. For these reasons, 3D object recognition is best performed by integrating features from multiple views, such as with local feature view clustering (Lowe, 2001).

These keypoints have also been applied to the problem of robot localization and mapping, which has been presented in detail in other papers (Se, Lowe and Little, 2001). In this application, a trinocular stereo system is used to determine 3D estimates for keypoint locations. Keypoints are used only when they appear in all 3 images with consistent disparities, resulting in very few outliers. As the robot moves, it localizes itself using feature matches to the existing 3D map, and then incrementally adds features to the map while updating their 3D positions using a Kalman filter. This provides a robust and accurate solution to the problem of robot localization in unknown environments. This work has also addressed the problem of place recognition, in which a robot can be switched on and recognize its location anywhere within a large map (Se, Lowe and Little, 2002), which is equivalent to a 3D implementation of object recognition.

9 Conclusions

The SIFT keypoints described in this paper are particularly useful due to their distinctiveness, which enables the correct match for a keypoint to be selected from a large database of other keypoints. This distinctiveness is achieved by assembling a high-dimensional vector representing the image gradients within a local region of the image. The keypoints have been shown to be invariant to image rotation and scale and robust across a substantial range of affine distortion, addition of noise, and change in illumination. Large numbers of keypoints can be extracted from typical images, which leads to robustness in extracting small objects among clutter. The fact that keypoints are detected over a complete range of scales means that small local features are available for matching small and highly occluded objects, while large keypoints perform well for images subject to noise and blur. Their computation is efficient, so that several thousand keypoints can be extracted from a typical image with near real-time performance on standard PC hardware.

This paper has also presented methods for using the keypoints for object recognition. The approach we have described uses approximate nearest-neighbor lookup, a Hough transform for identifying clusters that agree on object pose, least-squares pose determination, and final verification. Other potential applications include view matching for 3D reconstruction, motion tracking and segmentation, robot localization, image panorama assembly, epipolar calibration, and any others that require identification of matching locations between images.

There are many directions for further research in deriving invariant and distinctive image features. Systematic testing is needed on data sets with full 3D viewpoint and illumination changes. The features described in this paper use only a monochrome intensity image, so further distinctiveness could be derived from including illumination-invariant color descriptors
(Funt and Finlayson, 1995; Brown and Lowe, 2002). Similarly, local texture measures appear to play an important role in human vision and could be incorporated into feature descriptors in a more general form than the single spatial frequency used by the current descriptors. An attractive aspect of the invariant local feature approach to matching is that there is no need to select just one feature type, and the best results are likely to be obtained by using many different features, all of which can contribute useful matches and improve overall robustness.

Another direction for future research will be to individually learn features that are suited to recognizing particular object categories. This will be particularly important for generic object classes that must cover a broad range of possible appearances. The research of Weber, Welling, and Perona (2000) and Fergus, Perona, and Zisserman (2003) has shown the potential of this approach by learning small sets of local features that are suited to recognizing generic classes of objects. In the long term, feature sets are likely to contain both prior and learned features that will be used according to the amount of training data that has been available for various object classes.

Acknowledgments

I would particularly like to thank Matthew Brown, who has suggested numerous improvements to both the content and presentation of this paper and whose own work on feature localization and invariance has contributed to this approach. In addition, I would like to thank many others for their valuable suggestions, including Stephen Se, Jim Little, Krystian Mikolajczyk, Cordelia Schmid, Tony Lindeberg, and Andrew Zisserman. This research was supported by the Natural Sciences and Engineering Research Council of Canada (NSERC) and through the Institute for Robotics and Intelligent Systems (IRIS) Network of Centres of Excellence.

References

Arya, S., and Mount, D.M. 1993. Approximate nearest neighbor queries in fixed dimensions. In Fourth Annual ACM-SIAM Symposium on Discrete Algorithms (SODA'93), pp. 271-280.

Arya, S., Mount, D.M., Netanyahu, N.S., Silverman, R., and Wu, A.Y. 1998. An optimal algorithm for approximate nearest neighbor searching. Journal of the ACM, 45:891-923.

Ballard, D.H. 1981. Generalizing the Hough transform to detect arbitrary patterns. Pattern Recognition, 13(2):111-122.

Basri, R., and Jacobs, D.W. 1997. Recognition using region correspondences. International Journal of Computer Vision, 25(2):145-166.

Baumberg, A. 2000. Reliable feature matching across widely separated views. In Conference on Computer Vision and Pattern Recognition, Hilton Head, South Carolina, pp. 774-781.

Beis, J. and Lowe, D.G. 1997. Shape indexing using approximate nearest-neighbour search in high-dimensional spaces. In Conference on Computer Vision and Pattern Recognition,
PuertoRico,pp.1000-1006.Brown,M.andLowe,D.G.2002.Invariantfeaturesfrominterestpointgroups.InBritishMachineVisionConference,Cardiff,Wales,pp.656-665.Carneiro,G.,andJepson,A.D.2002.Phase-basedlocalfeatures.InEuropeanConferenceonCom-puterVision(ECCV),Copenhagen,Denmark,pp.282-296.Crowley,J.L.andParker,A.C.1984.Arepresentationforshapebasedonpeaksandridgesinthedifferenceoflow-passtransform.IEEE

Trans. on Pattern Analysis and Machine Intelligence, 6(2):156-170.
Edelman, S., Intrator, N., and Poggio, T. 1997. Complex cells and object recognition. Unpublished manuscript: http://kybele.psych.cornell.edu/edelman/archive.html
Fergus, R., Perona, P., and Zisserman, A. 2003. Object class recognition by unsupervised scale-invariant learning. In IEEE Conference on Computer Vision and Pattern Recognition, Madison, Wisconsin, pp. 264-271.
Friedman, J.H., Bentley, J.L., and Finkel, R.A. 1977. An algorithm for finding best matches in logarithmic expected time. ACM Transactions on Mathematical Software, 3(3):209-226.
Funt, B.V., and Finlayson, G.D. 1995. Color constant color indexing. IEEE Trans. on Pattern Analysis and Machine Intelligence, 17(5):522-529.
Grimson, E. 1990. Object Recognition by Computer: The Role of Geometric Constraints, The MIT Press: Cambridge, MA.
Harris, C. 1992. Geometry from visual motion. In Active Vision, A. Blake and A. Yuille (Eds.), MIT Press, pp. 263-284.
Harris, C., and Stephens, M. 1988. A combined corner and edge detector. In Fourth Alvey Vision Conference, Manchester, UK, pp. 147-151.
Hartley, R., and Zisserman, A. 2000. Multiple view geometry in computer vision, Cambridge University Press: Cambridge, UK.
Hough, P.V.C. 1962. Method and means for recognizing complex patterns. U.S. Patent 3069654.
Koenderink, J.J. 1984. The structure of images. Biological Cybernetics, 50:363-396.
Lindeberg, T. 1993. Detecting salient blob-like image structures and their scales with a scale-space primal sketch: a method for focus-of-attention. International Journal of Computer Vision, 11(3):283-318.
Lindeberg, T. 1994. Scale-space theory: A basic tool for analysing structures at different scales. Journal of Applied Statistics, 21(2):224-270.
Lowe, D.G. 1991. Fitting parameterized three-dimensional models to images. IEEE Trans. on Pattern Analysis and Machine Intelligence, 13(5):441-450.
Lowe, D.G. 1999. Object recognition from local scale-invariant features. In International Conference on Computer Vision, Corfu, Greece, pp. 1150-1157.
Lowe, D.G. 2001. Local feature view clustering for 3D object recognition. In IEEE Conference on Computer Vision and Pattern Recognition, Kauai, Hawaii, pp. 682-688.
Luong, Q.T., and Faugeras, O.D. 1996. The fundamental matrix: Theory, algorithms, and stability analysis. International Journal of Computer Vision, 17(1):43-76.
Matas, J., Chum, O., Urban, M., and Pajdla, T. 2002. Robust wide baseline stereo from maximally stable extremal regions. In British Machine Vision Conference, Cardiff, Wales, pp. 384-393.
Mikolajczyk, K. 2002. Detection of local features invariant to affine transformations, Ph.D. thesis, Institut National Polytechnique de Grenoble, France.
Mikolajczyk, K., and Schmid, C. 2002. An affine invariant interest point detector. In European Conference on Computer Vision (ECCV), Copenhagen, Denmark, pp. 128-142.
Mikolajczyk, K., Zisserman, A., and Schmid, C. 2003. Shape recognition with edge-based features. In Proceedings of the British Machine Vision Conference, Norwich, U.K.
Moravec, H. 1981. Rover visual obstacle avoidance. In International Joint Conference on Artificial Intelligence, Vancouver, Canada, pp. 785-790.
Nelson, R.C., and Selinger, A. 1998. Large-scale tests of a keyed, appearance-based 3-D object recognition system. Vision Research, 38(15):2469-88.
Pope, A.R., and Lowe, D.G. 2000. Probabilistic models of appearance for 3-D object recognition. International Journal of Computer Vision, 40(2):149-167.
Pritchard, D., and Heidrich, W. 2003. Cloth motion capture. Computer Graphics Forum (Eurographics 2003), 22(3):263-271.
Schaffalitzky, F., and Zisserman, A. 2002. Multi-view matching for unordered image sets, or 'How do I organize my holiday snaps?' In European Conference on Computer Vision, Copenhagen, Denmark, pp. 414-431.
Schiele, B., and Crowley, J.L. 2000. Recognition without correspondence using multidimensional receptive field histograms. International Journal of Computer Vision, 36(1):31-50.
Schmid, C., and Mohr, R. 1997. Local grayvalue invariants for image retrieval. IEEE Trans. on Pattern Analysis and Machine Intelligence, 19(5):530-534.
Se, S., Lowe, D.G., and Little, J. 2001. Vision-based mobile robot localization and mapping using scale-invariant features. In International Conference on Robotics and Automation, Seoul, Korea, pp. 2051-58.
Se, S., Lowe, D.G., and Little, J. 2002. Global localization using distinctive visual features. In International Conference on Intelligent Robots and Systems, IROS 2002, Lausanne, Switzerland, pp. 226-231.
Shokoufandeh, A., Marsic, I., and Dickinson, S.J. 1999. View-based object recognition using saliency maps. Image and Vision Computing, 17:445-460.
Torr, P. 1995. Motion Segmentation and Outlier Detection, Ph.D. Thesis, Dept. of Engineering Science, University of Oxford, UK.
Tuytelaars, T., and Van Gool, L. 2000. Wide baseline stereo based on local, affinely invariant regions. In British Machine Vision Conference, Bristol, UK, pp. 412-422.
Weber, M., Welling, M., and Perona, P. 2000. Unsupervised learning of models for recognition. In European Conference on Computer Vision, Dublin, Ireland, pp. 18-32.
Witkin, A.P. 1983. Scale-space filtering. In International Joint Conference on Artificial Intelligence, Karlsruhe, Germany, pp. 1019-1022.
Zhang, Z., Deriche, R., Faugeras, O., and Luong, Q.T. 1995. A robust technique for matching two uncalibrated images through the recovery of the unknown epipolar geometry. Artificial In