Distinctive Image Features from Scale-Invariant Keypoints

David G. Lowe
Computer Science Department
University of British Columbia
Vancouver, B.C., Canada
lowe@cs.ubc.ca

January 5, 2004

Abstract

This paper presents a method for extracting distinctive invariant features from images that can be used to perform reliable matching between different views of an object or scene. The features are invariant to image scale and rotation, and are shown to provide robust matching across a substantial range of affine distortion, change in 3D viewpoint, addition of noise, and change in illumination. The features are highly distinctive, in the sense that a single feature can be correctly matched with high probability against a large database of features from many images. This paper also describes an approach to using these features for object recognition. The recognition proceeds by matching individual features to a database of features from known objects using a fast nearest-neighbor algorithm, followed by a Hough transform to identify clusters belonging to a single object, and finally performing verification through least-squares solution for consistent pose parameters. This approach to recognition can robustly identify objects among clutter and occlusion while achieving near real-time performance.

Accepted for publication in the International Journal of Computer Vision, 2004.

1 Introduction

Image matching is a fundamental aspect of many problems in computer vision, including object or scene recognition, solving for 3D structure from multiple images, stereo correspondence, and motion tracking. This paper describes image features that have many properties that make them suitable for matching differing images of an object or scene. The features are invariant to image scaling and rotation, and partially invariant to change in illumination and 3D camera viewpoint. They are well localized in both the spatial and frequency domains, reducing the probability of disruption by occlusion, clutter, or noise. Large numbers of features can be extracted from typical images with efficient algorithms. In addition, the features are highly distinctive, which allows a single feature to be correctly matched with high probability against a large database of features, providing a basis for object and scene recognition.

The cost of extracting these features is minimized by taking a cascade filtering approach, in which the more expensive operations are applied only at locations that pass an initial test. Following are the major stages of computation used to generate the set of image features:

1. Scale-space extrema detection: The first stage of computation searches over all scales and image locations. It is implemented efficiently by using a difference-of-Gaussian function to identify
potential interest points that are invariant to scale and orientation.

2. Keypoint localization: At each candidate location, a detailed model is fit to determine location and scale. Keypoints are selected based on measures of their stability.

3. Orientation assignment: One or more orientations are assigned to each keypoint location based on local image gradient directions. All future operations are performed on image data that has been transformed relative to the assigned orientation, scale, and location for each feature, thereby providing invariance to these transformations.

4. Keypoint descriptor: The local image gradients are measured at the selected scale in the region around each keypoint. These are transformed into a representation that allows for significant levels of local shape distortion and change in illumination.

This approach has been named the Scale Invariant Feature Transform (SIFT), as it transforms image data into scale-invariant coordinates relative to local features.

An important aspect of this approach is that it generates large numbers of features that densely cover the image over the full range of scales and locations. A typical image of size 500x500 pixels will give rise to about 2000 stable features (although this number depends on both image content and choices for various parameters). The quantity of features is particularly important for object recognition, where the ability to detect small objects in cluttered backgrounds requires that at least 3 features be correctly matched from each object for reliable identification.

For image matching and recognition, SIFT features are first extracted from a set of reference images and stored in a database. A new image is matched by individually comparing each feature from the new image to this previous database and finding candidate matching features based on Euclidean distance of their feature vectors. This paper will discuss fast nearest-neighbor algorithms that can perform this computation rapidly against large databases.

The keypoint descriptors are highly distinctive, which allows a single feature to find its correct match with good probability in a large database of features. However, in a cluttered
image, many features from the background will not have any correct match in the database, giving rise to many false matches in addition to the correct ones. The correct matches can be filtered from the full set of matches by identifying subsets of keypoints that agree on the object and its location, scale, and orientation in the new image. The probability that several features will agree on these parameters by chance is much lower than the probability that any individual feature match will be in error. The determination of these consistent clusters can be performed rapidly by using an efficient hash table implementation of the generalized Hough transform.

Each cluster of 3 or more features that agree on an object and its pose is then subject to further detailed verification. First, a least-squared estimate is made for an affine approximation to the object pose. Any other image features consistent with this pose are identified, and outliers are discarded. Finally, a detailed computation is made of the probability that a particular set of features indicates the presence of an object, given the accuracy of fit and number of probable false matches. Object matches that pass all these tests can be identified as correct with high confidence.

2 Related research

The development of image matching by using a set of local interest points can be traced back to the work of Moravec (1981) on stereo matching using a corner detector. The Moravec detector was improved by Harris and Stephens (1988) to make it more repeatable under small image variations and near edges. Harris also showed its value for efficient motion tracking and 3D structure from motion recovery (Harris, 1992), and the Harris corner detector has since been widely used for many other image matching tasks. While these feature detectors are usually called corner detectors, they are not selecting just corners, but rather any image location that has large gradients in all directions at a predetermined scale.

The initial applications were to stereo and short-range motion tracking, but the approach was later extended to more difficult problems. Zhang et al. (1995) showed that it was possible to match Harris corners over a large image range by using a correlation window around each corner to select likely matches. Outliers were then removed by solving for a fundamental matrix describing the geometric constraints between the two views of a rigid scene and removing matches that did not agree with the majority solution. At the same time, a similar approach was developed by Torr (1995) for long-range motion matching, in which geometric constraints were used to remove outliers for rigid objects moving within an image.

The ground-breaking work of Schmid and Mohr (1997) showed that invariant local feature matching could be extended to general image recognition problems in which a feature was matched against a large database of images. They also used Harris corners to select interest points, but rather than matching with a correlation window, they used a rotationally invariant descriptor of the local image region. This allowed features to be matched under arbitrary orientation change between the two images. Furthermore, they demonstrated that multiple feature matches could accomplish general recognition under occlusion and clutter by identifying consistent clusters of matched features.

The Harris corner detector is very sensitive to changes in image scale, so it does not provide a good basis for matching images of different sizes. Earlier work by the author (Lowe, 1999) extended the local feature approach to achieve scale invariance. This work also described a new local descriptor that provided more distinctive features while being less sensitive to local image distortions such as 3D viewpoint change. This current paper provides a more in-depth development and analysis of this earlier work, while also presenting a number of improvements in stability and feature invariance.

There is a considerable body of previous research on identifying representations that are stable under scale change. Some of the first work in this area was by Crowley and Parker (1984), who developed a representation that identified peaks and ridges in scale space and linked these into a tree structure. The tree structure could then be matched between images with arbitrary scale change. More recent work on graph-based matching by Shokoufandeh, Marsic and Dickinson (1999) provides more distinctive feature descriptors using wavelet coefficients. The problem of identifying an appropriate and consistent scale for feature detection has been studied in depth by Lindeberg (1993, 1994). He describes this as a problem of scale selection, and we make use of his results below.

Recently, there has been an impressive body of work on extending local features to be invariant to full affine transformations (Baumberg, 2000; Tuytelaars and Van Gool, 2000; Mikolajczyk and Schmid, 2002; Schaffalitzky and Zisserman, 2002; Brown and Lowe, 2002). This allows for invariant matching to features on a planar surface under changes in orthographic 3D projection, in most cases by resampling the image in a local affine frame. However, none of these approaches are yet fully affine invariant, as they start with initial feature scales and locations selected in a non-affine-invariant manner due to the prohibitive cost of exploring the full affine space. The affine frames are also more sensitive to noise than those of the scale-invariant features, so in practice the affine features have lower repeatability than the scale-invariant features unless the affine distortion is greater than about a 40 degree tilt of a planar surface (Mikolajczyk, 2002). Wider affine invariance may not be important for many applications, as training views are best taken
at least every 30 degrees rotation in viewpoint (meaning that recognition is within 15 degrees of the closest training view) in order to capture non-planar changes and occlusion effects for 3D objects.

While the method to be presented in this paper is not fully affine invariant, a different approach is used in which the local descriptor allows relative feature positions to shift significantly with only small changes in the descriptor. This approach not only allows the descriptors to be reliably matched across a considerable range of affine distortion, but it also makes the features more robust against changes in 3D viewpoint for non-planar surfaces. Other advantages include much more efficient feature extraction and the ability to identify larger numbers of features. On the other hand, affine invariance is a valuable property for matching planar surfaces under very large view changes, and further research should be performed on the best ways to combine this with non-planar 3D viewpoint invariance in an efficient and stable manner.

Many other feature types have been proposed for use in recognition, some of which could be used in addition to the features described in this paper to provide further matches under differing circumstances. One class of features are those that make use of image contours or region boundaries, which should make them less likely to be disrupted by cluttered backgrounds near object boundaries. Matas et al. (2002) have shown that their maximally-stable extremal regions can produce large numbers of matching features with good stability. Mikolajczyk et al. (2003) have developed a new descriptor that uses local edges while ignoring unrelated nearby edges, providing the ability to find stable features even near the boundaries of narrow shapes superimposed on background clutter. Nelson and Selinger (1998) have shown good results with local features based on groupings of image contours. Similarly,
Pope and Lowe (2000) used features based on the hierarchical grouping of image contours, which are particularly useful for objects lacking detailed texture.

The history of research on visual recognition contains work on a diverse set of other image properties that can be used as feature measurements. Carneiro and Jepson (2002) describe phase-based local features that represent the phase rather than the magnitude of local spatial frequencies, which is likely to provide improved invariance to illumination. Schiele and Crowley (2000) have proposed the use of multidimensional histograms summarizing the distribution of measurements within image regions. This type of feature may be particularly useful for recognition of textured objects with deformable shapes. Basri and Jacobs (1997) have demonstrated the value of extracting local region boundaries for recognition. Other useful properties to incorporate include color, motion, figure-ground discrimination, region shape descriptors, and stereo depth cues. The local feature approach can easily incorporate novel feature types because extra features contribute to robustness when they provide correct matches, but otherwise do little harm other than their cost of computation. Therefore, future systems are likely to combine many feature types.

3 Detection of scale-space extrema

As described in the introduction, we will detect keypoints using a cascade filtering approach that uses efficient algorithms to identify candidate locations that are then examined in further detail. The first stage of keypoint detection is to identify locations and scales that can be repeatably assigned under differing views of the same object. Detecting locations that are invariant to scale change of the image can be accomplished by searching for stable features across all possible scales, using a continuous function of scale known as scale space (Witkin, 1983).

It has been shown by Koenderink (1984) and Lindeberg (1994) that under a variety of reasonable assumptions the only possible scale-space kernel is the Gaussian function. Therefore, the scale space of an image is defined as a function, $L(x, y, \sigma)$, that is produced from the convolution of a variable-scale Gaussian, $G(x, y, \sigma)$, with an input image, $I(x, y)$:

$$L(x, y, \sigma) = G(x, y, \sigma) * I(x, y),$$

where $*$ is the convolution operation in $x$ and $y$, and

$$G(x, y, \sigma) = \frac{1}{2\pi\sigma^2} e^{-(x^2 + y^2)/2\sigma^2}.$$

To efficiently detect stable keypoint locations in scale space, we have proposed (Lowe, 1999) using scale-space extrema in the difference-of-Gaussian function convolved with the image, $D(x, y, \sigma)$, which can be computed from the difference of two nearby scales separated by a constant multiplicative factor $k$:

$$D(x, y, \sigma) = \bigl(G(x, y, k\sigma) - G(x, y, \sigma)\bigr) * I(x, y) = L(x, y, k\sigma) - L(x, y, \sigma). \qquad (1)$$

There are a number of reasons for choosing this function. First, it is a particularly efficient function to compute, as the smoothed images, $L$, need to be computed in any case for scale space feature description, and $D$ can therefore be computed by simple image subtraction.

Figure 1: For each octave of scale space, the initial image is repeatedly convolved with Gaussians to produce the set of scale space images shown on the left. Adjacent Gaussian images are subtracted to produce the difference-of-Gaussian images on the right. After each octave, the Gaussian image is down-sampled by a factor of 2, and the process repeated.

In addition, the difference-of-Gaussian function provides a close approximation to the scale-normalized Laplacian of Gaussian, $\sigma^2 \nabla^2 G$, as studied by Lindeberg (1994). Lindeberg showed that the normalization of the Laplacian with the factor $\sigma^2$ is required for true scale invariance. In detailed experimental comparisons, Mikolajczyk (2002) found that the maxima and minima of $\sigma^2 \nabla^2 G$ produce the most stable image features compared to a range of other possible image functions, such as the gradient, Hessian, or Harris corner function.

The relationship between $D$ and $\sigma^2 \nabla^2 G$ can be understood from the heat diffusion equation (parameterized in terms of $\sigma$ rather than the more usual $t = \sigma^2$):

$$\frac{\partial G}{\partial \sigma} = \sigma \nabla^2 G.$$

From this, we see that $\nabla^2 G$ can be computed from the finite difference approximation to $\partial G / \partial \sigma$, using the difference of nearby scales at $k\sigma$ and $\sigma$:

$$\sigma \nabla^2 G = \frac{\partial G}{\partial \sigma} \approx \frac{G(x, y, k\sigma) - G(x, y, \sigma)}{k\sigma - \sigma}$$

and therefore,

$$G(x, y, k\sigma) - G(x, y, \sigma) \approx (k - 1)\,\sigma^2 \nabla^2 G.$$

This shows that when the difference-of-Gaussian function has scales differing by a constant factor it already incorporates the $\sigma^2$ scale normalization required for the scale-invariant Laplacian. The factor $(k - 1)$ in the equation is a constant over all scales and therefore does not influence extrema location. The approximation error will go to zero as $k$ goes to 1, but in practice we have found that the approximation has almost no impact on the stability of extrema detection or localization for even significant differences in scale, such as $k = \sqrt{2}$.

Figure 2: Maxima and minima of the difference-of-Gaussian images are detected by comparing a pixel (marked with X) to its 26 neighbors in 3x3 regions at the current and adjacent scales (marked with circles).

An efficient approach to construction of $D(x, y, \sigma)$ is shown in Figure 1. The initial image is incrementally convolved with Gaussians to produce images separated by a constant factor $k$ in scale space, shown stacked in the left column. We choose to divide each octave of scale space (i.e., doubling of $\sigma$) into an integer number, $s$, of intervals, so $k = 2^{1/s}$. We must produce $s + 3$ images in the stack of blurred images for each octave, so that final extrema detection covers a complete octave. Adjacent image scales are subtracted to produce the difference-of-Gaussian images shown on the right. Once a complete octave has been processed, we resample the Gaussian image that has twice the initial value of $\sigma$ (it will be 2 images from the top of the stack) by taking every second pixel in each row and column. The accuracy of sampling relative to $\sigma$ is no different than for the start of the previous octave, while computation is greatly reduced.

3.1 Local extrema detection

In order to detect the local maxima and minima of $D(x, y, \sigma)$, each sample point is compared to its eight neighbors in the current image and nine neighbors in the scale above and below (see Figure 2). It is selected only if it is larger than all of these neighbors or smaller than all of them. The cost of this check is reasonably low due to the fact that most sample points will be eliminated following the first few checks.

An important issue is to determine the frequency of sampling in the image and scale domains that is needed to reliably detect the extrema. Unfortunately, it turns out that there is no minimum spacing of samples that will detect all extrema, as the extrema can be arbitrarily close together. This can be seen by considering a white circle on a black background, which will have a single scale space maximum where the circular positive central region of the difference-of-Gaussian
function matches the size and location of the circle. For a very elongated ellipse, there will be two maxima near each end of the ellipse. As the locations of maxima are a continuous function of the image, for some ellipse with intermediate elongation there will be a transition from a single maximum to two, with the maxima arbitrarily close to each other near the transition.

Therefore, we must settle for a solution that trades off efficiency with completeness. In fact, as might be expected and is confirmed by our experiments, extrema that are close together are quite unstable to small perturbations of the image. We can determine the best choices experimentally by studying a range of sampling frequencies and using those that provide the most reliable results under a realistic simulation of the matching task.

Figure 3: The top line of the first graph shows the percent of keypoints that are repeatably detected at the same location and scale in a transformed image as a function of the number of scales sampled per octave. The lower line shows the percent of keypoints that have their descriptors correctly matched to a large database. The second graph shows the total number of keypoints detected in a typical image as a function of the number of scale samples.

3.2 Frequency of sampling in scale

The experimental determination of sampling frequency that maximizes extrema stability is shown in Figures 3 and 4. These figures (and most other simulations in this paper) are based on a matching task using a collection of 32 real images drawn from a diverse range, including outdoor scenes, human faces, aerial photographs, and industrial images (the image domain was found to have almost no influence on any of the results). Each image was then subject to a range of transformations, including rotation, scaling, affine stretch, change in brightness and contrast, and addition of image noise. Because the changes were synthetic, it was possible to precisely predict where each feature in an original image should appear in the transformed image, allowing for measurement of correct repeatability and positional accuracy for each feature.

Figure 3 shows these simulation results used to examine the effect of varying the number of scales per octave at which the image function is sampled prior to extrema detection. In this case, each image was resampled following rotation by a random angle and scaling by a random amount between 0.2 and 0.9 times the original size. Keypoints from the reduced resolution image were matched against those from the original image so that the scales for all keypoints would be present in the matched image. In addition, 1% image noise was added, meaning that each pixel had a random number added from the uniform interval [-0.01, 0.01] where pixel values are in the range [0, 1] (equivalent to providing slightly less than 6 bits of accuracy for image pixels).

Figure 4: The top line in the graph shows the percent of keypoint locations that are repeatably detected in a transformed image as a function of the prior image smoothing for the first level of each octave. The lower line shows the percent of descriptors correctly matched against a large database.

The top line in the first graph of Figure 3 shows the percent of keypoints that are detected at a matching location and scale in the transformed image. For all examples in this paper, we define a matching scale as being within a factor of $\sqrt{2}$ of the correct scale, and a matching location as being within $\sigma$ pixels, where $\sigma$ is the scale of the keypoint (defined from equation (1) as the standard deviation of the smallest Gaussian used in the difference-of-Gaussian function). The lower line on this graph shows the number of keypoints that are correctly matched to a database of 40,000 keypoints using the nearest-neighbor matching procedure to be described in Section 6 (this shows that once the keypoint is repeatably located, it is likely to be useful for recognition and matching tasks). As this graph shows, the highest repeatability is obtained when sampling 3 scales per octave, and this is the number of scale samples used for all other experiments throughout this paper.

It might seem surprising that the repeatability does not continue to improve as more scales are sampled. The reason is that this results in many more local extrema being detected, but these extrema are on average less stable and therefore are less likely to be detected in the transformed image. This is shown by the second graph in Figure 3, which shows the average number of keypoints detected and correctly matched in each image. The number of keypoints rises with increased sampling of scales and the total number of correct matches also rises. Since the success of object recognition often depends more on the quantity of correctly matched keypoints, as opposed to their percentage correct matching, for many applications it will be optimal to use a larger number of scale samples. However, the cost of computation also rises with this number, so for the experiments in this paper we have chosen to use just 3 scale samples per octave.

To summarize, these experiments show that the scale-space difference-of-Gaussian function has a large number of extrema and that it would be very expensive to detect them all. Fortunately, we can detect the most stable and useful subset even with a coarse sampling of scales.

3.3 Frequency of sampling in the spatial domain

Just as we determined the frequency of sampling per octave of scale space, so we must determine the frequency of sampling in the image domain relative to the scale of smoothing. Given that extrema can be arbitrarily close together, there will be a similar trade-off between sampling frequency and rate of detection. Figure 4 shows an experimental determination of the amount of prior smoothing, $\sigma$, that is applied to each image level before building the scale space representation for an octave. Again, the top line is the repeatability of keypoint detection, and the results show that the repeatability continues to increase with $\sigma$. However, there is a cost to using a large $\sigma$ in terms of efficiency, so we have chosen to use $\sigma = 1.6$, which provides close to optimal repeatability. This value is used throughout this paper and was used for the results in Figure 3.

Of course, if we pre-smooth the image before extrema detection, we are effectively discarding the highest spatial frequencies. Therefore, to make full use of the input, the image can be expanded to create more sample points than were present in the original. We double the size of the input image using linear interpolation prior to building the first level of the pyramid. While the equivalent operation could effectively have been performed by using sets of subpixel-offset filters on the original image, the image doubling leads to a more efficient implementation. We assume that the original image has a blur of at least $\sigma = 0.5$ (the minimum needed to prevent significant aliasing), and that therefore the doubled image has $\sigma = 1.0$ relative to its new pixel spacing. This means that little additional smoothing is needed prior to creation of the first octave of scale space. The image doubling increases the number of stable keypoints by almost a factor of 4, but no significant further improvements were found with a larger expansion factor.

4 Accurate keypoint localization

Once a keypoint candidate has been found by comparing a pixel to its neighbors, the next step is to perform a detailed fit to the nearby data for location, scale, and ratio of principal curvatures. This information allows points to be rejected that have low contrast (and are therefore sensitive to noise) or are poorly localized along an edge.

The initial implementation of this approach (Lowe, 1999) simply located keypoints at the location and scale of the central sample point. However, recently Brown has developed a method (Brown and Lowe, 2002) for fitting a 3D quadratic function to the local sample points to determine the interpolated location of the maximum, and his experiments showed that this provides a substantial improvement to matching and stability. His approach uses the Taylor expansion (up to the quadratic terms) of the scale-space function, $D(x, y, \sigma)$, shifted so that the origin is at the sample point:

$$D(\mathbf{x}) = D + \frac{\partial D}{\partial \mathbf{x}}^{T} \mathbf{x} + \frac{1}{2}\, \mathbf{x}^{T} \frac{\partial^2 D}{\partial \mathbf{x}^2}\, \mathbf{x} \qquad (2)$$

where $D$ and its derivatives are evaluated at the sample point and $\mathbf{x} = (x, y, \sigma)^T$ is the offset from this point. The location of the extremum, $\hat{\mathbf{x}}$, is determined by taking the derivative of this function with respect to $\mathbf{x}$ and setting it to zero, giving

$$\hat{\mathbf{x}} = -\left(\frac{\partial^2 D}{\partial \mathbf{x}^2}\right)^{-1} \frac{\partial D}{\partial \mathbf{x}}. \qquad (3)$$

Figure 5: This figure shows the stages of keypoint selection. (a) The 233x189 pixel original image. (b) The initial 832 keypoint locations at maxima and minima of the difference-of-Gaussian function. Keypoints are displayed as vectors indicating scale, orientation, and location. (c) After applying a threshold on minimum contrast, 729 keypoints remain. (d) The final 536 keypoints that remain following an additional threshold on ratio of principal curvatures.

As suggested by Brown, the Hessian and derivative of $D$ are approximated by using differences of neighboring sample points. The resulting 3x3 linear system can be solved with minimal cost. If the offset $\hat{\mathbf{x}}$ is larger than 0.5 in any dimension, then it means that the extremum lies closer to a different sample point. In this case, the sample point is changed and the interpolation performed instead about that point. The final offset $\hat{\mathbf{x}}$ is added to the location of its sample point to get the interpolated estimate for the location of the extremum.

The function value at the extremum, $D(\hat{\mathbf{x}})$, is useful for rejecting unstable extrema with low contrast. This can be obtained by substituting equation (3) into (2), giving

$$D(\hat{\mathbf{x}}) = D + \frac{1}{2} \frac{\partial D}{\partial \mathbf{x}}^{T} \hat{\mathbf{x}}.$$

For the experiments in this paper, all extrema with a value of $|D(\hat{\mathbf{x}})|$ less than 0.03 were discarded (as before, we assume image pixel values in the range [0, 1]).

Figure 5 shows the effects of keypoint selection on a natural image. In order to avoid too much clutter, a low-resolution 233 by 189 pixel image is used and keypoints are shown as vectors giving the location, scale, and orientation of each keypoint (orientation assignment is described below). Figure 5(a) shows the original image, which is shown at reduced contrast behind the subsequent figures. Figure 5(b) shows the 832 keypoints at all detected maxima and minima of the difference-of-Gaussian function, while (c) shows the 729 keypoints that remain following removal of those with a value of $|D(\hat{\mathbf{x}})|$ less than 0.03. Part (d) will be explained in the following section.

4.1 Eliminating edge responses

For stability, it is not sufficient to reject keypoints with low contrast. The difference-of-Gaussian function will have a strong response along edges, even if the location along the edge is
poorly determined and therefore unstable to small amounts of noise.

A poorly defined peak in the difference-of-Gaussian function will have a large principal curvature across the edge but a small one in the perpendicular direction. The principal curvatures can be computed from a 2x2 Hessian matrix, $\mathbf{H}$, computed at the location and scale of the keypoint:

$$\mathbf{H} = \begin{bmatrix} D_{xx} & D_{xy} \\ D_{xy} & D_{yy} \end{bmatrix} \qquad (4)$$

The derivatives are estimated by taking differences of neighboring sample points.

The eigenvalues of $\mathbf{H}$ are proportional to the principal curvatures of $D$. Borrowing from the approach used by Harris and Stephens (1988), we can avoid explicitly computing the eigenvalues, as we are only concerned with their ratio. Let $\alpha$ be the eigenvalue with the largest magnitude and $\beta$ be the smaller one. Then, we can compute the sum of the eigenvalues from the trace of $\mathbf{H}$ and their product from the determinant:

$$\mathrm{Tr}(\mathbf{H}) = D_{xx} + D_{yy} = \alpha + \beta,$$

$$\mathrm{Det}(\mathbf{H}) = D_{xx} D_{yy} - (D_{xy})^2 = \alpha\beta.$$

In the unlikely event that the determinant is negative, the curvatures have different signs so the point is discarded as not being an extremum. Let $r$ be the ratio between the largest magnitude eigenvalue and the smaller one, so that $\alpha = r\beta$. Then,

$$\frac{\mathrm{Tr}(\mathbf{H})^2}{\mathrm{Det}(\mathbf{H})} = \frac{(\alpha + \beta)^2}{\alpha\beta} = \frac{(r\beta + \beta)^2}{r\beta^2} = \frac{(r + 1)^2}{r},$$

which depends only on the ratio of the eigenvalues rather than their individual values. The quantity $(r + 1)^2 / r$ is at a minimum when the two eigenvalues are equal and it increases with $r$. Therefore, to check that the ratio of principal curvatures is below some threshold, $r$, we only need to check

$$\frac{\mathrm{Tr}(\mathbf{H})^2}{\mathrm{Det}(\mathbf{H})} < \frac{(r + 1)^2}{r}.$$

This is very efficient to compute, with less than 20 floating point operations required to test each keypoint. The experiments in this paper use a value of $r = 10$, which eliminates keypoints that have a ratio between the principal curvatures greater than 10. The transition from Figure 5(c) to (d) shows the effects of this operation.

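The trace and determinant test of Section 4.1 can be illustrated with a short sketch. This is not the author's implementation: the function names `passes_edge_test` and `make_patch` are hypothetical, and the central-difference stencils are one common reading of "differences of neighboring sample points", which the text does not spell out.

```python
def passes_edge_test(d, x, y, r=10.0):
    """Return True if the sample at column x, row y of the DoG image d
    has a principal-curvature ratio below r (Section 4.1).

    d is a 2D list of floats indexed as d[row][col]; (x, y) must be an
    interior sample. The finite-difference stencils are an assumption.
    """
    # Second derivatives from differences of neighboring samples.
    dxx = d[y][x + 1] - 2.0 * d[y][x] + d[y][x - 1]
    dyy = d[y + 1][x] - 2.0 * d[y][x] + d[y - 1][x]
    dxy = (d[y + 1][x + 1] - d[y + 1][x - 1]
           - d[y - 1][x + 1] + d[y - 1][x - 1]) / 4.0

    trace = dxx + dyy
    det = dxx * dyy - dxy * dxy

    # A non-positive determinant means the curvatures differ in sign
    # (or one vanishes), so the point is rejected outright.
    if det <= 0.0:
        return False

    # Tr(H)^2 / Det(H) < (r + 1)^2 / r, with no eigenvalues computed.
    return trace * trace / det < (r + 1.0) ** 2 / r


def make_patch(f):
    """Hypothetical helper: sample f(dx, dy) on a 3x3 grid centred at (1, 1)."""
    return [[f(col - 1, row - 1) for col in range(3)] for row in range(3)]
```

For example, an isotropic peak such as `make_patch(lambda dx, dy: -(dx * dx + dy * dy))` passes at its centre, while a ridge that is almost flat in one direction, such as `make_patch(lambda dx, dy: -(dx * dx) - 0.01 * dy * dy)`, is rejected with the threshold r = 10 used in the paper.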
5 Orientation assignment

By assigning a consistent orientation to each keypoint based on local image properties, the keypoint descriptor can be represented relative to this orientation and therefore achieve invariance to image rotation. This approach contrasts with the orientation invariant descriptors of Schmid and Mohr (1997), in which each image property is based on a rotationally invariant measure. The disadvantage of that approach is that it limits the descriptors that can be used and discards image information by not requiring all measures to be based on a consistent rotation.

Following experimentation with a number of approaches to assigning a local orientation, the following approach was found to give the most stable results. The scale of the keypoint is used to select the Gaussian smoothed image, L, with the closest scale, so that all computations are performed in a scale-invariant manner. For each image sample, $L(x, y)$, at this scale, the gradient magnitude, $m(x, y)$, and orientation, $\theta(x, y)$, are precomputed using pixel differences:

$$m(x, y) = \sqrt{(L(x+1, y) - L(x-1, y))^2 + (L(x, y+1) - L(x, y-1))^2}$$

$$\theta(x, y) = \tan^{-1}\big((L(x, y+1) - L(x, y-1)) / (L(x+1, y) - L(x-1, y))\big)$$

An orientation histogram is formed from the gradient orientations of sample points within a region around the keypoint. The orientation histogram has 36 bins covering the 360 degree range of orientations. Each sample added to the histogram is weighted by its gradient magnitude and by a Gaussian-weighted circular window with a $\sigma$ that is 1.5 times that of the scale of the keypoint.

Peaks in the orientation histogram correspond to dominant directions of local gradients. The highest peak in the histogram is detected, and then any other local peak that is within 80% of the highest peak is used to also create a keypoint with that orientation. Therefore, for locations with multiple peaks of similar magnitude, there will be multiple keypoints created at the same location and scale but different orientations. Only about 15% of points are assigned multiple orientations, but these contribute significantly to the stability of matching. Finally, a parabola is fit to the 3 histogram values closest to each peak to interpolate the peak position for better accuracy.

Figure 6 shows the experimental stability of location, scale, and orientation assignment under differing amounts of image noise. As before, the images are rotated and scaled by random amounts. The top line shows the stability of keypoint location and scale assignment. The second line shows the stability of matching when the orientation assignment is also required to be within 15 degrees. As shown by the gap between the top two lines, the orientation assignment remains accurate 95% of the time even after addition of 10% pixel noise (equivalent to a camera providing less than 3 bits of precision). The measured variance of orientation for the correct matches is about 2.5 degrees, rising to 3.9 degrees for 10% noise. The bottom line in Figure 6 shows the final accuracy of correctly matching a keypoint descriptor to a database of 40,000 keypoints (to be discussed below). As this graph shows, the SIFT features are resistant to even large amounts of pixel noise, and the major cause of error is the initial location and scale detection.
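The orientation assignment described in this section can be sketched as follows. This is an illustrative sketch under stated assumptions rather than the author's code: the image is a list of rows `L[y][x]`, the window `radius` is a hypothetical parameter (the text does not give the region size), `atan2` is used so that the full 360 degree range is covered, and bins are centred on multiples of 10 degrees.

```python
import math


def gradient(L, x, y):
    """Gradient magnitude and orientation (radians) from pixel differences."""
    dx = L[y][x + 1] - L[y][x - 1]
    dy = L[y + 1][x] - L[y - 1][x]
    return math.hypot(dx, dy), math.atan2(dy, dx)


def orientation_histogram(L, kx, ky, scale, radius, bins=36):
    """Histogram of gradient orientations around keypoint (kx, ky)."""
    sigma = 1.5 * scale  # Gaussian window is 1.5 times the keypoint scale
    bin_width = 360.0 / bins
    hist = [0.0] * bins
    for y in range(ky - radius, ky + radius + 1):
        for x in range(kx - radius, kx + radius + 1):
            # Skip samples whose neighbors fall outside the image.
            if not (1 <= x < len(L[0]) - 1 and 1 <= y < len(L) - 1):
                continue
            m, theta = gradient(L, x, y)
            w = math.exp(-((x - kx) ** 2 + (y - ky) ** 2) / (2.0 * sigma ** 2))
            deg = math.degrees(theta) % 360.0
            hist[int(round(deg / bin_width)) % bins] += w * m
    return hist


def dominant_orientations(hist, ratio=0.8):
    """Angles (degrees) of local peaks within `ratio` of the highest peak."""
    n = len(hist)
    top = max(hist)
    angles = []
    if top <= 0.0:
        return angles
    for i in range(n):
        left, centre, right = hist[i - 1], hist[i], hist[(i + 1) % n]
        if centre >= ratio * top and centre > left and centre > right:
            # Vertex of the parabola through the three values around the peak.
            offset = 0.5 * (left - right) / (left - 2.0 * centre + right)
            angles.append(((i + offset) % n) * 360.0 / n)
    return angles
```

Each returned angle would give rise to its own keypoint at the same location and scale, which is how multiple orientations are handled in the text.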
[Figure 6 plot: repeatability (%) against image noise (0% to 10%). Curves: matching location and scale; matching location, scale, and orientation; nearest descriptor in database.]

Figure 6: The top line in the graph shows the percent of keypoint locations and scales that are repeatably detected as a function of pixel noise. The second line shows the repeatability after also requiring agreement in orientation. The bottom line shows the final percent of descriptors correctly matched to a large database.

6 The local image descriptor

The previous operations have assigned an image location, scale, and orientation to each keypoint. These parameters impose a repeatable local 2D coordinate system in which to describe the local image region, and therefore provide invariance to these parameters. The next step is to compute a descriptor for the local image region that is highly distinctive yet is as invariant as possible to remaining variations, such as change in illumination or 3D viewpoint.

One obvious approach would be to sample the local image intensities around the keypoint at the appropriate scale, and to match these using a normalized correlation measure. However, simple correlation of image patches is highly sensitive to changes that cause misregistration of samples, such as affine or 3D viewpoint change or non-rigid deformations. A better approach has been demonstrated by Edelman, Intrator, and Poggio (1997). Their proposed representation was based upon a model of biological vision, in particular of complex neurons in primary visual cortex. These complex neurons respond to a gradient at a particular orientation and spatial frequency, but the location of the gradient on the retina is allowed to shift over a small receptive field rather than being precisely localized. Edelman et al. hypothesized that the function of these complex neurons was to allow for matching and recognition of 3D objects from a range of viewpoints. They have performed detailed experiments using 3D computer models of object and animal shapes which show that matching gradients while allowing for shifts in their position results in much better classification under 3D rotation. For example, recognition accuracy for 3D objects rotated in depth by 20 degrees increased from 35% for correlation of gradients to 94% using the complex cell model. Our implementation described below was inspired by this idea, but allows for positional shift using a different computational mechanism.
[Figure 7 panels: image gradients (left); keypoint descriptor (right).]

Figure 7: A keypoint descriptor is created by first computing the gradient magnitude and orientation at each image sample point in a region around the keypoint location, as shown on the left. These are weighted by a Gaussian window, indicated by the overlaid circle. These samples are then accumulated into orientation histograms summarizing the contents over 4x4 subregions, as shown on the right, with the length of each arrow corresponding to the sum of the gradient magnitudes near that direction within the region. This figure shows a 2x2 descriptor array computed from an 8x8 set of samples, whereas the experiments in this paper use 4x4 descriptors computed from a 16x16 sample array.

6.1 Descriptor representation

Figure 7 illustrates the computation of the keypoint descriptor. First the image gradient magnitudes and orientations are sampled around the keypoint location, using the scale of the keypoint to select the level of Gaussian blur for the image. In order to achieve orientation invariance, the coordinates of the descriptor and the gradient orientations are rotated relative to the keypoint orientation. For efficiency, the gradients are precomputed for all levels of the pyramid as described in Section 5. These are illustrated with small arrows at each sample location on the left side of Figure 7.

A Gaussian weighting function with $\sigma$ equal to one half the width of the descriptor window is used to assign a weight to the magnitude of each sample point. This is illustrated with a circular window on the left side of Figure 7, although, of course, the weight falls off smoothly. The purpose of this Gaussian window is to avoid sudden changes in the descriptor with small changes in the position of the window, and to give less emphasis to gradients that are far from the center of the descriptor, as these are most affected by misregistration errors.

The keypoint descriptor is shown on the right side of Figure 7. It allows for significant shift in gradient positions by creating orientation histograms over 4x4 sample regions. The figure shows eight directions for each orientation histogram, with the length of each arrow corresponding to the magnitude of that histogram entry. A gradient sample on the left can shift up to 4 sample positions while still contributing to the same histogram on the right, thereby achieving the objective of allowing for larger local positional shifts.

It is important to avoid all boundary effects in which the descriptor abruptly changes as a sample shifts smoothly from being within one histogram to another or from one orientation to another. Therefore, trilinear interpolation is used to distribute the value of each gradient sample into adjacent histogram bins. In other words, each entry into a bin is multiplied by a weight of $1 - d$ for each dimension, where d is the distance of the sample from the central value of the bin as measured in units of the histogram bin spacing.
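The trilinear interpolation step can be illustrated with a short Python sketch. The function name, the nested-list layout `desc[row][col][orientation]`, and the convention that bin centres sit at integer coordinates are assumptions made for this example. Dropping votes that fall outside the spatial grid, while letting the orientation axis wrap around, is likewise a choice made here rather than something stated in the text.

```python
import math


def accumulate(desc, row, col, angle_bin, magnitude, n=4, bins=8):
    """Spread one weighted gradient sample over the 8 surrounding bins.

    row, col and angle_bin are fractional bin coordinates, with bin centres
    at integer values. Each contribution is scaled by (1 - d) per dimension,
    where d is the distance to that bin's centre in bin spacings.
    """
    r0, c0, o0 = math.floor(row), math.floor(col), math.floor(angle_bin)
    for dr in (0, 1):
        r = r0 + dr
        if not 0 <= r < n:
            continue  # vote falls outside the descriptor grid
        wr = 1.0 - abs(row - r)
        for dc in (0, 1):
            c = c0 + dc
            if not 0 <= c < n:
                continue
            wc = 1.0 - abs(col - c)
            for do in (0, 1):
                wo = 1.0 - abs(angle_bin - (o0 + do))
                o = (o0 + do) % bins  # orientation is circular
                desc[r][c][o] += magnitude * wr * wc * wo


# Usage: an empty 4x4x8 descriptor receiving one sample.
descriptor = [[[0.0] * 8 for _ in range(4)] for _ in range(4)]
accumulate(descriptor, 1.25, 2.5, 7.5, 1.0)
```

Because the weights in each dimension sum to one, a sample that lies fully inside the grid contributes exactly its magnitude in total, and that contribution moves smoothly between bins as the sample shifts.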
The descriptor is formed from a vector containing the values of all the orientation histogram entries, corresponding to the lengths of the arrows on the right side of Figure 7. The figure shows a 2x2 array of orientation histograms, whereas our experiments below show that the best results are achieved with a 4x4 array of histograms with 8 orientation bins in each. Therefore, the experiments in this paper use a 4x4x8 = 128 element feature vector for each keypoint.

Finally, the feature vector is modified to reduce the effects of illumination change. First, the vector is normalized to unit length. A change in image contrast in which each pixel value is multiplied by a constant will multiply gradients by the same constant, so this contrast change will be canceled by vector normalization. A brightness change in which a constant is added to each image pixel will not affect the gradient values, as they are computed from pixel differences. Therefore, the descriptor is invariant to affine changes in illumination. However, non-linear illumination changes can also occur due to camera saturation or due to illumination changes that affect 3D surfaces with differing orientations by different amounts. These effects can cause a large change in relative magnitudes for some gradients, but are less likely to affect the gradient orientations. Therefore, we reduce the influence of large gradient magnitudes by thresholding the values in the unit feature vector to each be no larger than 0.2, and then renormalizing to unit length. This means that matching the magnitudes for large gradients is no longer as important, and that the distribution of orientations has greater emphasis. The value of 0.2 was determined experimentally using images containing differing illuminations for the same 3D objects.

6.2 Descriptor testing

There are two parameters that can be used to vary the complexity of the descriptor: the number of orientations, r, in the histograms, and the width, n, of the n×n array of orientation histograms. The size of the resulting descriptor vector is $rn^2$. As the complexity of the descriptor grows, it will be able to discriminate better in a large database, but it will also be more sensitive to shape distortions and occlusion.

Figure 8 shows experimental results in which the number of orientations and size of the descriptor were varied. The graph was generated for a viewpoint transformation in which a planar surface is tilted by 50 degrees away from the viewer and 4% image noise is added. This is near the limits of reliable matching, as it is in these more difficult cases that descriptor performance is most important. The results show the percent of keypoints that find a correct match to the single closest neighbor among a database of 40,000 keypoints. The graph shows that a single orientation histogram (n = 1) is very poor at discriminating, but the results continue to improve up to a 4x4 array of histograms with 8 orientations. After that, adding more orientations or a larger descriptor can actually hurt matching by making the descriptor more sensitive to distortion. These results were broadly similar for other degrees of viewpoint change and noise, although in some simpler cases discrimination continued to improve (from already high levels) with 5x5 and higher descriptor sizes. Throughout this paper we use a 4x4 descriptor with 8 orientations, resulting in feature vectors with 128 dimensions. While the dimensionality of the descriptor may seem high, we have found that it consistently performs better than lower-dimensional descriptors on a range of matching tasks and that the computational cost of matching remains low when using the approximate nearest-neighbor methods described below.

[Figure 8 plot: correct nearest descriptor (%) against width n of descriptor (angle 50 deg, noise 4%). Curves: with 16 orientations; with 8 orientations; with 4 orientations.]

Figure 8: This graph shows the percent of keypoints giving the correct match to a database of 40,000 keypoints as a function of width of the n×n keypoint descriptor and the number of orientations in each histogram. The graph is computed for images with affine viewpoint change of 50 degrees and addition of 4% noise.

6.3 Sensitivity to affine change

The sensitivity of the descriptor to affine change is examined in Figure 9. The graph shows the reliability of keypoint location and scale selection, orientation assignment, and nearest-neighbor matching to a database as a function of rotation in depth of a plane away from a viewer. It can be seen that each stage of computation has reduced repeatability with increasing affine distortion, but that the final matching accuracy remains above 50% out to a 50 degree change in viewpoint.

To achieve reliable matching over a wider viewpoint angle, one of the affine-invariant detectors could be used to select and resample image regions, as discussed in Section 2. As mentioned there, none of these approaches is truly affine-invariant, as they all start from initial feature locations determined in a non-affine-invariant manner. In what appears to be the most affine-invariant method, Mikolajczyk (2002) has proposed and run detailed experiments with the Harris-affine detector. He found that its keypoint repeatability is below that given here out to about a 50 degree viewpoint angle, but that it then retains close to 40% repeatability out to an angle of 70 degrees, which provides better performance for extreme affine changes. The disadvantages are a much higher computational cost, a reduction in the number of keypoints, and poorer stability for small affine changes due to errors in assigning a consistent affine frame under noise. In practice, the allowable range of rotation for 3D objects is considerably less than for planar surfaces, so affine invariance is usually not the limiting factor in the ability to match across viewpoint change. If a wide range of affine invariance is desired, such as for a surface that is known to be planar, then a simple solution is to adopt the approach of Pritchard and Heidrich (2003) in which additional SIFT features are generated from 4 affine-transformed versions of the training image corresponding to 60 degree viewpoint changes. This allows for the use of standard SIFT features with no additional cost when processing the image to be recognized, but results in an increase in the size of the feature database by a factor of 3.

[Figure 9 plot: repeatability (%) against viewpoint angle (degrees). Curves: matching location and scale; matching location, scale, and orientation; nearest descriptor in database.]

Figure 9: This graph shows the stability of detection for keypoint location, orientation, and final matching to a database as a function of affine distortion. The degree of affine distortion is expressed in terms of the equivalent viewpoint rotation in depth for a planar surface.

6.4 Matching to large databases

An important remaining issue for measuring the distinctiveness of features is how the reliability of matching varies as a function of the number of features in the database being matched. Most of the examples in this paper are generated using a database of 32 images with about 40,000 keypoints. Figure 10 shows how the matching reliability varies as a function of database size. This figure was generated using a larger database of 112 images, with a viewpoint depth rotation of 30 degrees and 2% image noise in addition to the usual random image rotation and scale change.

The dashed line shows the portion of image features for which the nearest neighbor in the database was the correct match, as a function of database size shown on a logarithmic scale. The leftmost point is matching against features from only a single image while the rightmost point is selecting matches from a database of all features from the 112 images. It can be seen that matching reliability does decrease as a function of the number of distractors, yet all indications are that many correct matches will continue to be found out to very large database sizes.

The solid line is the percentage of keypoints that were identified at the correct matching location and orientation in the transformed image, so it is only these points that have any chance of having matching descriptors in the database. The reason this line is flat is that the test was run over the full database for each value, while only varying the portion of the database used for distractors. It is of interest that the gap between the two lines is small, indicating that matching failures are due more to issues with initial feature localization and orientation assignment than to problems with feature distinctiveness, even out to large database sizes.
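The descriptors matched in these experiments are the normalized vectors of Section 6.1. As a reminder of that step, here is a minimal sketch of the normalize, threshold at 0.2, and renormalize sequence. The function name and the handling of an all-zero vector are assumptions made for this illustration.

```python
import math


def normalize_descriptor(vec, clamp=0.2):
    """Unit-normalize, cap each entry at `clamp`, then renormalize."""

    def unit(v):
        norm = math.sqrt(sum(x * x for x in v))
        # An all-zero vector is returned unchanged (assumption of this sketch).
        return [x / norm for x in v] if norm > 0.0 else list(v)

    capped = [min(x, clamp) for x in unit(vec)]
    return unit(capped)


# Usage: a contrast change (multiplying every entry by a constant) cancels out.
a = normalize_descriptor([1.0, 2.0, 2.0, 4.0])
b = normalize_descriptor([10.0, 20.0, 20.0, 40.0])
```

In this example both inputs map to the same unit-length vector, which illustrates the invariance to a multiplicative contrast change, while the cap keeps any single large gradient from dominating the result.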
[Figure 10 plot: repeatability (%) against number of keypoints in database (log scale). Curves: matching location, scale, and orientation; nearest descriptor in database.]

Figure 10: The dashed line shows the percent of keypoints correctly matched to a database as a function of database size (using a logarithmic scale). The solid line shows the percent of keypoints assigned the correct location, scale, and orientation. Images had random scale and rotation changes, an affine transform of 30 degrees, and image noise of 2% added prior to matching.

7 Application to object recognition

The major topic of this paper is the derivation of distinctive invariant keypoints, as described above. To demonstrate their application, we will now give a brief description of their use for object recognition in the presence of clutter and occlusion. More details on applications of these features to recognition are available in other papers (Lowe, 1999; Lowe, 2001; Se, Lowe and Little, 2002).

Object recognition is performed by first matching each keypoint independently to the database of keypoints extracted from training images. Many of these initial matches will be incorrect due to ambiguous features or features that arise from background clutter. Therefore, clusters of at least 3 features are first identified that agree on an object and its pose, as these clusters have a much higher probability of being correct than individual feature matches. Then, each cluster is checked by performing a detailed geometric fit to the model, and the result is used to accept or reject the interpretation.

7.1 Keypoint matching

The best candidate match for each keypoint is found by identifying its nearest neighbor in the database of keypoints from training images. The nearest neighbor is defined as the keypoint with minimum Euclidean distance for the invariant descriptor vector as was described in Section 6.

However, many features from an image will not have any correct match in the training database because they arise from background clutter or were not detected in the training images. Therefore, it would be useful to have a way to discard features that do not have any good match to the database. A global threshold on distance to the closest feature does not perform well, as some descriptors are much more discriminative than others. A more effective measure is obtained by comparing the distance of the closest neighbor to that of the second-closest neighbor. If there are multiple training images of the same object, then we define the second-closest neighbor as being the closest neighbor that is known to come from a different object than the first, such as by only using images known to contain different objects. This measure performs well because correct matches need to have the closest neighbor significantly closer than the closest incorrect match to achieve reliable matching. For false matches, there will likely be a number of other false matches within similar distances due to the high dimensionality of the feature space. We can think of the second-closest match as providing an estimate of the density of false matches within this portion of the feature space and at the same time identifying specific instances of feature ambiguity.

[Figure 11 plot: PDF against ratio of distances (closest/next closest). Curves: PDF for correct matches; PDF for incorrect matches.]

Figure 11: The probability that a match is correct can be determined by taking the ratio of distance from the closest neighbor to the distance of the second closest. Using a database of 40,000 keypoints, the solid line shows the PDF of this ratio for correct matches, while the dotted line is for matches that were incorrect.

Figure 11 shows the value of this measure for real image data. The probability density functions for correct and incorrect matches are shown in terms of the ratio of closest to second-closest neighbors of each keypoint. Matches for which the nearest neighbor was a correct match have a PDF that is centered at a much lower ratio than that for incorrect matches. For our object recognition implementation, we reject all matches in which the distance ratio is greater than 0.8, which eliminates 90% of the false matches while discarding less than 5% of the correct matches. This figure was generated by matching images following random scale and orientation change, a depth rotation of 30 degrees, and addition of 2% image noise, against a database of 40,000 keypoints.

7.2 Efficient nearest neighbor indexing

No algorithms are known that can identify the exact nearest neighbors of points in high dimensional spaces that are any more efficient than exhaustive search. Our keypoint descriptor has a 128-dimensional feature vector, and the best algorithms, such as the k-d tree (Friedman et al., 1977), provide no speedup over exhaustive search for more than about 10 dimensional spaces. Therefore, we have used an approximate algorithm, called the Best-Bin-First (BBF) algorithm (Beis and Lowe, 1997). This is approximate in the sense that it returns the closest
neighbor with high probability.

The BBF algorithm uses a modified search ordering for the k-d tree algorithm so that bins in feature space are searched in the order of their closest distance from the query location. This priority search order was first examined by Arya and Mount (1993), and they provide further study of its computational properties in (Arya et al., 1998). This search order requires the use of a heap-based priority queue for efficient determination of the search order. An approximate answer can be returned with low cost by cutting off further search after a specific number of the nearest bins have been explored. In our implementation, we cut off search after checking the first 200 nearest-neighbor candidates. For a database of 100,000 keypoints, this provides a speedup over exact nearest neighbor search by about 2 orders of magnitude yet results in less than a 5% loss in the number of correct matches. One reason the BBF algorithm works particularly well for this problem is that we only consider matches in which the nearest neighbor is less than 0.8 times the distance to the second-nearest neighbor (as described in the previous section), and therefore there is no need to exactly solve the most difficult cases in which many neighbors are at very similar distances.

7.3 Clustering with the Hough transform

To maximize the performance of object recognition for small or highly occluded objects, we wish to identify objects with the fewest possible number of feature matches. We have found that reliable recognition is possible with as few as 3 features. A typical image contains 2,000 or more features which may come from many different objects as well as background clutter. While the distance ratio test described in Section 7.1 will allow us to discard many of the false matches arising from background clutter, this does not remove matches from other valid objects, and we often still need to identify correct subsets of matches containing less than 1% inliers among 99% outliers. Many well-known robust fitting methods, such as RANSAC or Least Median of Squares, perform poorly when the percent of inliers falls much below 50%. Fortunately, much better performance can be obtained by clustering features in pose space using the Hough transform (Hough, 1962; Ballard, 1981; Grimson, 1990).

The Hough transform identifies clusters of features with a consistent interpretation by using each feature to vote for all object poses that are consistent with the feature. When clusters of features are found to vote for the same pose of an object, the probability of the interpretation being correct is much higher than for any single feature. Each of our keypoints specifies 4 parameters: 2D location, scale, and orientation, and each matched keypoint in the database has a record of the keypoint's parameters relative to the training image in which it was found. Therefore, we can create a Hough transform entry predicting the model location, orientation, and scale from the match hypothesis. This prediction has large error bounds, as the similarity transform implied by these 4 parameters is only an approximation to the full 6 degree-of-freedom pose space for a 3D object and also does not account for any non-rigid deformations. Therefore, we use broad bin sizes of 30 degrees for orientation, a factor of 2 for scale, and 0.25 times the maximum projected training image dimension (using the predicted scale) for location. To avoid the problem of boundary effects in bin assignment, each keypoint match votes for the 2 closest bins in each dimension, giving a total of 16 entries for each hypothesis and further broadening the pose range.

In most implementations of the Hough transform, a multi-dimensional array is used to represent the bins. However, many of the potential bins will remain empty, and it is difficult to compute the range of possible bin values due to their mutual dependence (for example, the dependency of location discretization on the selected scale). These problems can be avoided by using a pseudo-random hash function of the bin values to insert votes into a one-dimensional hash table, in which collisions are easily detected.

7.4 Solution for affine parameters

The Hough transform is used to identify all clusters with at least 3 entries in a bin. Each such cluster is then subject to a geometric verification procedure in which a least-squares solution is performed for the best affine projection parameters relating the training image to the new image.

An affine transformation correctly accounts for 3D rotation of a planar surface under orthographic projection, but the approximation can be poor for 3D rotation of non-planar objects. A more general solution would be to solve for the fundamental matrix (Luong and Faugeras, 1996; Hartley and Zisserman, 2000). However, a fundamental matrix solution requires at least 7 point matches as compared to only 3 for the affine solution and in practice requires even more matches for good stability. We would like to perform recognition with as few as 3 feature matches, so the affine solution provides a better starting point and we can account for errors in the affine approximation by allowing for large residual errors. If we imagine placing a sphere around an object, then rotation of the sphere by 30 degrees will move no point within the sphere by more than 0.25 times the projected diameter of the sphere. For the examples of typical 3D objects used in this paper, an affine solution works well given that we allow residual errors up to 0.25 times the maximum projected dimension of the object. A more general approach is given in (Brown and Lowe, 2002), in which the initial solution is based on a similarity transform, which then progresses to solution for the fundamental matrix in those cases in which a sufficient number of matches are found.

The affine transformation of a model point $[x\ y]^T$ to an image point $[u\ v]^T$ can be written as

$$\begin{bmatrix} u \\ v \end{bmatrix} = \begin{bmatrix} m_1 & m_2 \\ m_3 & m_4 \end{bmatrix} \begin{bmatrix} x \\ y \end{bmatrix} + \begin{bmatrix} t_x \\ t_y \end{bmatrix}$$

where the model translation is $[t_x\ t_y]^T$ and the affine rotation, scale, and stretch are represented by the $m_i$ parameters.

We wish to solve for the transformation parameters, so the equation above can be rewritten to gather the unknowns into a column vector:

$$\begin{bmatrix} x & y & 0 & 0 & 1 & 0 \\ 0 & 0 & x & y & 0 & 1 \\ & & \cdots & & & \\ & & \cdots & & & \end{bmatrix} \begin{bmatrix} m_1 \\ m_2 \\ m_3 \\ m_4 \\ t_x \\ t_y \end{bmatrix} = \begin{bmatrix} u \\ v \\ \vdots \end{bmatrix}$$

This equation shows a single match, but any number of
further matches can be added, with each match contributing two more rows to the first and last matrix. At least 3 matches are needed to provide a solution.

We can write this linear system as

$$A\mathbf{x} = \mathbf{b}$$

The least-squares solution for the parameters x can be determined by solving the corresponding normal equations,

$$\mathbf{x} = [A^T A]^{-1} A^T \mathbf{b},$$

which minimizes the sum of the squares of the distances from the projected model locations to the corresponding image locations. This least-squares approach could readily be extended to solving for 3D pose and internal parameters of articulated and flexible objects (Lowe, 1991).

Figure 12: The training images for two objects are shown on the left. These can be recognized in a cluttered image with extensive occlusion, shown in the middle. The results of recognition are shown on the right. A parallelogram is drawn around each recognized object showing the boundaries of the original training image under the affine transformation solved for during recognition. Smaller squares indicate the keypoints that were used for recognition.

Outliers can now be removed by checking for agreement between each image feature and the model. Given the more accurate least-squares solution, we now require each match to agree within half the error range that was used for the parameters in the Hough transform bins. If fewer than 3 points remain after discarding outliers, then the match is rejected. As outliers are discarded, the least-squares solution is re-solved with the remaining points, and the process iterated. In addition, a top-down matching phase is used to add any further matches that agree with the projected model position. These may have been missed from the Hough transform bin due to the similarity transform approximation or other errors.

The final decision to accept or reject a model hypothesis is based on a detailed probabilistic model given in a previous paper (Lowe, 2001). This method first computes the expected number of false matches to the model pose, given the projected size of the model, the number of features within the region, and the accuracy of the fit. A Bayesian analysis then gives the probability that the object is present based on the actual number of matching features found. We accept a model if the final probability for a correct interpretation is greater than 0.98. For objects that project to small regions of an image, 3 features may be sufficient for reliable recognition. For large objects covering most of a heavily textured image, the expected number of false matches is higher, and as many as 10 feature matches may be necessary.

8 Recognition examples

Figure 12 shows an example of object recognition for a cluttered and occluded image containing 3D objects. The training images of a toy train and a frog are shown on the left.
The middle image (of size 600x480 pixels) contains instances of these objects hidden behind others and with extensive background clutter so that detection of the objects may not be immediate even for human vision. The image on the right shows the final correct identification superimposed on a reduced contrast version of the image. The keypoints that were used for recognition are shown as squares with an extra line to indicate orientation. The sizes of the squares correspond to the image regions used to construct the descriptor. An outer parallelogram is also drawn around each instance of recognition, with its sides corresponding to the boundaries of the training images projected under the final affine transformation determined during recognition.

Figure 13: This example shows location recognition within a complex scene. The training images for locations are shown at the upper left and the 640x315 pixel test image taken from a different viewpoint is on the upper right. The recognized regions are shown on the lower image, with keypoints shown as squares and an outer parallelogram showing the boundaries of the training images under the affine transform used for recognition.

Another potential application of the approach is to place recognition, in which a mobile device or vehicle could identify its location by recognizing familiar locations. Figure 13 gives an example of this application, in which training images are taken of a number of locations. As shown on the upper left, these can even be of such seemingly non-distinctive items as a wooden wall or a tree with trash bins. The test image (of size 640 by 315 pixels) on the upper right was taken from a viewpoint rotated about 30 degrees around the scene from the original positions, yet the training image locations are easily recognized.
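Two of the steps behind these examples, the distance-ratio test of Section 7.1 and the least-squares affine fit of Section 7.4, can be sketched in plain Python. This is a hedged illustration only: the function names and data layouts are invented here, the nearest neighbor is found by exhaustive search rather than the BBF algorithm, the second-closest neighbor is simply the second entry in the database instead of the closest one from a different object, and the normal equations are solved with a small Gaussian elimination written for this example.

```python
import math


def match_keypoint(query, database, max_ratio=0.8):
    """Index of the nearest descriptor, or None if the ratio test rejects it.

    Exhaustive search; `database` must hold at least two descriptors.
    """
    ranked = sorted((math.dist(query, d), i) for i, d in enumerate(database))
    (d1, best), (d2, _) = ranked[0], ranked[1]
    # Reject matches whose distance ratio is greater than max_ratio.
    return best if d1 <= max_ratio * d2 else None


def solve_affine(matches):
    """Least-squares [m1, m2, m3, m4, tx, ty] from ((x, y), (u, v)) pairs."""
    if len(matches) < 3:
        raise ValueError("at least 3 matches are needed")
    A, b = [], []
    for (x, y), (u, v) in matches:
        A.append([x, y, 0.0, 0.0, 1.0, 0.0])  # row for u
        b.append(u)
        A.append([0.0, 0.0, x, y, 0.0, 1.0])  # row for v
        b.append(v)
    n = 6
    # Normal equations (A^T A) x = A^T b, stored as an augmented matrix.
    M = [[sum(row[i] * row[j] for row in A) for j in range(n)]
         + [sum(row[i] * bi for row, bi in zip(A, b))] for i in range(n)]
    # Gaussian elimination with partial pivoting.
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        if abs(M[pivot][col]) < 1e-12:
            raise ValueError("degenerate matches (for example, collinear points)")
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    # Back substitution.
    params = [0.0] * n
    for i in reversed(range(n)):
        s = sum(M[i][j] * params[j] for j in range(i + 1, n))
        params[i] = (M[i][n] - s) / M[i][i]
    return params
```

In a fuller pipeline the matches that survive `match_keypoint` would first be clustered with the Hough transform, and `solve_affine` would then be re-run as outliers are discarded, as described in Section 7.4.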
All steps of the recognition process can be implemented efficiently, so the total time to recognize all objects in Figures 12 or 13 is less than 0.3 seconds on a 2GHz Pentium 4 processor. We have implemented these algorithms on a laptop computer with attached video camera, and have tested them extensively over a wide range of conditions. In general, textured planar surfaces can be identified reliably over a rotation in depth of up to 50 degrees in any direction and under almost any illumination conditions that provide sufficient light and do not produce excessive glare. For 3D objects, the range of rotation in depth for reliable recognition is only about 30 degrees in any direction and illumination change is more disruptive. For these reasons, 3D object recognition is best performed by integrating features from multiple views, such as with local feature view clustering (Lowe, 2001).

These keypoints have also been applied to the problem of robot localization and mapping, which has been presented in detail in other papers (Se, Lowe and Little, 2001). In this application, a trinocular stereo system is used to determine 3D estimates for keypoint locations. Keypoints are used only when they appear in all 3 images with consistent disparities, resulting in very few outliers. As the robot moves, it localizes itself using feature matches to the existing 3D map, and then incrementally adds features to the map while updating their 3D positions using a Kalman filter. This provides a robust and accurate solution to the problem of robot localization in unknown environments. This work has also addressed the problem of place recognition, in which a robot can be switched on and recognize its location anywhere within a large map (Se, Lowe and Little, 2002), which is equivalent to a 3D implementation of object recognition.

9 Conclusions

The SIFT keypoints described in this paper are particularly useful due to their distinctiveness, which enables the correct match for a keypoint to be selected from a large database of other keypoints. This distinctiveness is achieved by assembling a high-dimensional vector representing the image gradients within a local region of the image. The keypoints have been shown to be invariant to image rotation and scale and robust across a substantial range of affine distortion, addition of noise, and change in illumination. Large numbers of keypoints can be extracted from typical images, which leads to robustness in extracting small objects among clutter. The fact that keypoints are detected over a complete range of scales means that small local features are available for matching small and highly occluded objects, while large keypoints perform well for images subject to noise and blur. Their computation is efficient, so that several thousand keypoints can be extracted from a typical image with near real-time performance on standard PC hardware.

This paper has also presented methods for using the keypoints for object recognition. The approach we have described uses approximate nearest-neighbor lookup, a Hough transform for identifying clusters that agree on object pose, least-squares pose determination, and final verification. Other potential applications include view matching for 3D reconstruction, motion tracking and segmentation, robot localization, image panorama assembly, epipolar calibration, and any others that require identification of matching locations between images.

There are many directions for further research in deriving invariant and distinctive image features. Systematic testing is needed on data sets with full 3D viewpoint and illumination changes. The features described in this paper use only a monochrome intensity image, so further distinctiveness could be derived from including illumination-invariant color descriptors
(Funt and Finlayson, 1995; Brown and Lowe, 2002). Similarly, local texture measures appear to play an important role in human vision and could be incorporated into feature descriptors in a more general form than the single spatial frequency used by the current descriptors. An attractive aspect of the invariant local feature approach to matching is that there is no need to select just one feature type, and the best results are likely to be obtained by using many different features, all of which can contribute useful matches and improve overall robustness.

Another direction for future research will be to individually learn features that are suited to recognizing particular object categories. This will be particularly important for generic object classes that must cover a broad range of possible appearances. The research of Weber, Welling, and Perona (2000) and Fergus, Perona, and Zisserman (2003) has shown the potential of this approach by learning small sets of local features that are suited to recognizing generic classes of objects. In the long term, feature sets are likely to contain both prior and learned features that will be used according to the amount of training data that has been available for various object classes.

Acknowledgments

I would particularly like to thank Matthew Brown, who has suggested numerous improvements to both the content and presentation of this paper and whose own work on feature localization and invariance has contributed to this approach. In addition, I would like to thank many others for their valuable suggestions, including Stephen Se, Jim Little, Krystian Mikolajczyk, Cordelia Schmid, Tony Lindeberg, and Andrew Zisserman. This research was supported by the Natural Sciences and Engineering Research Council of Canada (NSERC) and through the Institute for Robotics and Intelligent Systems (IRIS) Network of Centres of Excellence.

References

Arya, S., and Mount, D.M. 1993. Approximate nearest neighbor queries in fixed dimensions. In Fourth Annual ACM-SIAM Symposium on Discrete Algorithms (SODA'93), pp. 271-280.

Arya, S., Mount, D.M., Netanyahu, N.S., Silverman, R., and Wu, A.Y. 1998. An optimal algorithm for approximate nearest neighbor searching. Journal of the ACM, 45:891-923.

Ballard, D.H. 1981. Generalizing the Hough transform to detect arbitrary patterns. Pattern Recognition, 13(2):111-122.

Basri, R., and Jacobs, D.W. 1997. Recognition using region correspondences. International Journal of Computer Vision, 25(2):145-166.

Baumberg, A. 2000. Reliable feature matching across widely separated views. In Conference on Computer Vision and Pattern Recognition, Hilton Head, South Carolina, pp. 774-781.

Beis, J. and Lowe, D.G. 1997. Shape indexing using approximate nearest-neighbour search in high-dimensional spaces. In Conference on Computer Vision and Pattern Recognition, Puerto Rico, pp. 1000-1006.

Brown, M. and Lowe, D.G. 2002. Invariant features from interest point groups. In British Machine Vision Conference, Cardiff, Wales, pp. 656-665.

Carneiro, G., and Jepson, A.D. 2002. Phase-based local features. In European Conference on Computer Vision (ECCV), Copenhagen, Denmark, pp. 282-296.

Crowley, J.L. and Parker, A.C. 1984. A representation for shape based on peaks and ridges in the difference of low-pass transform. IEEE Trans. on Pattern Analysis and Machine Intelligence, 6(2):156-170.

Edelman, S., Intrator, N. and Poggio, T. 1997. Complex cells and object recognition. Unpublished manuscript: http://kybele.psych.cornell.edu/edelman/archive.html

Fergus, R., Perona, P., and Zisserman, A. 2003. Object class recognition by unsupervised scale-invariant learning. In IEEE Conference on Computer Vision and Pattern Recognition, Madison, Wisconsin, pp. 264-271.

Friedman, J.H., Bentley, J.L. and Finkel, R.A. 1977. An algorithm for finding best matches in logarithmic expected time. ACM Transactions on Mathematical Software, 3(3):209-226.

Funt, B.V. and Finlayson, G.D. 1995. Color constant color indexing. IEEE Trans. on Pattern Analysis and Machine Intelligence, 17(5):522-529.

Grimson, E. 1990. Object Recognition by Computer: The Role of Geometric Constraints, The MIT Press: Cambridge, MA.

Harris, C. 1992. Geometry from visual motion. In Active Vision, A. Blake and A. Yuille (Eds.), MIT Press, pp. 263-284.

Harris, C. and Stephens, M. 1988. A combined corner and edge detector. In Fourth Alvey Vision Conference, Manchester, UK, pp. 147-151.

Hartley, R. and Zisserman, A. 2000. Multiple view geometry in computer vision, Cambridge University Press: Cambridge, UK.

Hough, P.V.C. 1962. Method and means for recognizing complex patterns. U.S. Patent 3069654.

Koenderink, J.J. 1984. The structure of images. Biological Cybernetics, 50:363-396.

Lindeberg, T. 1993. Detecting salient blob-like image structures and their scales with a scale-space primal sketch: a method for focus-of-attention. International Journal of Computer Vision, 11(3):283-318.

Lindeberg, T. 1994. Scale-space theory: A basic tool for analysing structures at different scales. Journal of Applied Statistics, 21(2):224-270.

Lowe, D.G. 1991. Fitting parameterized three-dimensional models to images. IEEE Trans. on Pattern Analysis and Machine Intelligence, 13(5):441-450.

Lowe, D.G. 1999. Object recognition from local scale-invariant features. In International Conference on Computer Vision, Corfu, Greece, pp. 1150-1157.

Lowe, D.G. 2001. Local feature view clustering for 3D object recognition. IEEE Conference on Computer Vision and Pattern Recognition, Kauai, Hawaii, pp. 682-688.

Luong, Q.T., and Faugeras, O.D. 1996. The fundamental matrix: Theory, algorithms, and
stabilityanalysis.InternationalJournalofComputerVision,17(1):43-76.Matas,J.,Chum,O.,Urban,M.,andPajdla,T.2002.Robustwidebaselinestereofrommaximallystableextremalregions.InBritishMachineVisionConference,Cardiff,Wales,pp.384-393.Mikolajczyk,K.2002.Detectionoflocalfeaturesinvarianttoaf®netransformations,Ph.D.thesis,InstitutNationalPolytechniquedeGrenoble,France.Mikolajczyk,K.,andSchmid,C

28 .2002.Anaf®neinvariantinterestpointdetec
.2002.Anaf®neinvariantinterestpointdetector.InEuropeanConferenceonComputerVision(ECCV),Copenhagen,Denmark,pp.128-142.Mikolajczyk,K.,Zisserman,A.,andSchmid,C.2003.Shaperecognitionwithedge-basedfeatures.InProceedingsoftheBritishMachineVisionConference,Norwich,U.K.Moravec,H.1981.Rovervisualobstacleavoidance.InInternationalJointConferenceonArti®cialIntelligence,Vancouver,Canada,pp.785-790.Nelson,R.C.,andSelinger,A.1998.Large-scaletestsofakeyed,appearance-based3-Dobjectrecognitionsystem.VisionResearch,38(15):2469-88.Pope,A.R.,andLowe,D.G.2000.Probabilisticmodelsofappearancefor3-Dobjectrecognition.InternationalJournalofComputerVision,40(2):149-167.27 Pritchard,D.,andHeidrich,W.2003.Clothmotioncapture.ComputerGraphicsForum(Eurographics2003),22(3):263-271.Schaffalitzky,F.,andZisserman,A.2002.Multi-viewmatchingforunorderedimagesets,or`HowdoIorganizemyholidaysnaps?”'InEuropeanConferenceonComputerVision,Copenhagen,Denmark,pp.414-431.Schiele,B.,andCrowley,J.L.2000.Recognitionwithoutcorrespondenceusingmultidimensionalreceptive®eldhistograms.InternationalJournalofComputerVision,36(1):31-50.Schmid,C.,andMohr,R.1997.Localgrayvalueinvariantsforimageretrieval.IEEETrans.onPatternAnalysisandMachineIntelligence,19(5):530-534.Se,S.,Lowe,D.G.,andLittle,J.2001.Vision-basedmobilerobotlocalizationandmappingusingscale-invariantfeatures.InInternationalConferenceonRoboticsandAutomation,Seoul,Korea,pp.2051-58.Se,S.,Lowe,D.G.,andLittle,J.2002.Globallocalizationusingdistinctivevisualfeatures.InInternationalConferenceonIntelligentRobotsandSystems,IROS2002,Lausanne,Switzerland,pp.226-231.Shokoufandeh,A.,Marsic,I.,andDickinson,S.J.1999.View-basedobjectrecognitionusingsaliencymaps.ImageandVisionComputing,17:445-460.Torr,P.1995.MotionSegmentationandOutlierDetection,Ph.D.Thesis,Dept.ofEngineeringSci-ence,UniversityofOxford,UK.Tuytelaars,T.,andVanGool,L.2000.Widebaselinestereobasedonlocal,af®nelyinvariantregions.InBritishMachineVisionConference,Bristol,UK,pp.412-422.Weber,M.,Welling,M.andPerona,P.2000.Un
supervisedlearningofmodelsforrecognition.InEuropeanConferenceonComputerVision,Dublin,Ireland,pp.18-32.Witkin,A.P.1983.Scale-space®ltering.InInternationalJointConferenceonArti®cialIntelligence,Karlsruhe,Germany,pp.1019-1022.Zhang,Z.,Deriche,R.,Faugeras,O.,andLuong,Q.T.1995.Arobusttechniqueformatchingtwoun-calibratedimagesthroughtherecoveryoftheunknownepipolargeometry.Arti®cialIntelligen