Int J Comput Vis
DOI 10.1007/s11263-007-0107-3

Modeling the World from Internet Photo Collections

Noah Snavely · Steven M. Seitz · Richard Szeliski

Received: 30 January 2007 / Accepted: 31 October 2007
© Springer Science+Business Media, LLC


Dame” returns over one million hits (as of September, 2007), showing the cathedral from almost every conceivable viewing position and angle, different times of day and night, and changes in season, weather, and decade. Furthermore, entire cities are now being captured at street level and from a bird’s-eye perspective (e.g., Windows Live Local,¹,² and Google Streetview³

N. Snavely · S.M. Seitz
University of Washington, Seattle, WA, USA
e-mail: snavely@cs.washington.edu

R. Szeliski
Microsoft Research, Redmond, WA, USA

The SfM approach used in this paper is similar to that of Brown and Lowe (2005), with several modifications to improve robustness over a variety of datasets. These include initializing new cameras using pose estimation, to help avoid local minima; a different heuristic for selecting the initial two images for SfM; checking that reconstructed points are well-conditioned before adding them to the scene; and using focal length information from image EXIF tags. Schaffalitzky and Zisserman (2002) present another related technique for reconstructing unordered image sets, concentrating on efficiently matching interest points between images. Vergauwen and Van Gool have developed a similar approach (Vergauwen and Van Gool 2006) and are hosting a web-based reconstruction service for use in cultural heritage applications.⁵ Fitzgibbon and Zisserman (1998) and Nistér (2000) prefer a bottom-up approach, where small subsets of images are matched to each other and then merged in an agglomerative fashion into a complete 3D reconstruction. While all of these approaches address the same SfM problem that we do, they were tested on much simpler datasets with more limited variation in imaging conditions. Our paper marks the first successful demonstration of SfM techniques applied to the kinds of real-world image sets found on Google and Flickr. For instance, our typical image set has photos from hundreds of different cameras, zoom levels, resolutions, different times of day or seasons, illumination, weather, and differing amounts of occlusion.
2.3 Image-Based Modeling

In recent years, computer vision techniques such as structure from motion and model-based reconstruction have gained traction in the computer graphics field under the name of image-based modeling. IBM is the process of creating three-dimensional models from a collection of input images (Debevec et al. 1996; Grzeszczuk 2002; Pollefeys et al. 2004).

One particular application of IBM has been the creation of large scale architectural models. Notable examples include the semi-automatic Façade system (Debevec et al. 1996), which was used to reconstruct compelling fly-throughs of the University of California Berkeley campus; automatic architecture reconstruction systems such as that of Dick et al. (2004); and the MIT City Scanning Project (Teller et al. 2003), which captured thousands of calibrated images from an instrumented rig to construct a 3D model of the MIT campus. There are also several ongoing academic and commercial projects focused on large-scale urban scene reconstruction. These efforts include the 4D Cities project (Schindler et al. 2007), which aims to create a spatial-temporal model of Atlanta from historical photographs; the Stanford City Block Project (Román et al. 2004), which uses video of city blocks to create multi-perspective strip images; and the UrbanScape project of Akbarzadeh et al. (2006). Our work differs from these previous approaches in that we only reconstruct a sparse 3D model of the world, since our emphasis is more on creating smooth 3D transitions between photographs rather than interactively visualizing a 3D world.

⁵ Epoch 3D Webservice, http://homes.esat.kuleuven.be/~visit3d/webservice/html/.
2.4 Image-Based Rendering

The field of image-based rendering (IBR) is devoted to the problem of synthesizing new views of a scene from a set of input photographs. A forerunner to this field was the groundbreaking Aspen MovieMap project (Lippman 1980), in which thousands of images of Aspen Colorado were captured from a moving car, registered to a street map of the city, and stored on laserdisc. A user interface enabled interactively moving through the images as a function of the desired path of the user. Additional features included a navigation map of the city overlaid on the image display, and the ability to touch any building in the current field of view and jump to a facade of that building. The system also allowed attaching metadata such as restaurant menus and historical images with individual buildings. Recently, several companies, such as Google⁶ and EveryScape⁷ have begun creating similar “surrogate travel” applications that can be viewed in a web browser. Our work can be seen as a way to automatically create MovieMaps from unorganized collections of images. (In contrast, the Aspen MovieMap involved a team of over a dozen people working over a few years.) A number of our visualization, navigation, and annotation capabilities are similar to those in the original MovieMap work, but in an improved and generalized form.

More recent work in IBR has focused on techniques for new view synthesis, e.g., (Chen and Williams 1993; McMillan and Bishop 1995; Gortler et al. 1996; Levoy and Hanrahan 1996; Seitz and Dyer 1996; Aliaga et al. 2003; Zitnick et al. 2004; Buehler et al. 2001). In terms of applications, Aliaga et al.’s (2003) Sea of Images work is perhaps closest to ours in its use of a large collection of images taken throughout an architectural space; the same authors address the problem of computing consistent feature matches across multiple images for the purposes of IBR (Aliaga et al. 2003). However, our images are casually acquired by different photographers, rather than being taken on a fixed grid with a guided robot.

In contrast to most prior work in IBR, our objective is not to synthesize a photo-realistic view of the world from all viewpoints per se, but to browse a specific collection of

⁶ Google Maps, http://maps.google.com.
⁷ Everyscape, http://www.everyscape.com.

points produced by SfM methods are by themselves very limited and do not directly produce compelling scene renderings. Nevertheless, we demonstrate that this sparse SfM-derived geometry and camera information, along with morphing and non-photorealistic rendering techniques, is sufficient to provide compelling view interpolations as described in Sect. 5. Leveraging this capability, Section 6 describes a novel photo explorer interface for browsing large collections of photographs in which the user can virtually explore the 3D space by moving from one image to another.

Often, we are interested in learning more about the content of an image, e.g., “which statue is this?” or “when was this building constructed?” A great deal of annotated image content of this form already exists in guidebooks, maps, and Internet resources such as Wikipedia⁸ and Flickr. However, the image you may be viewing at any particular time (e.g., from your cell phone camera) may not have such annotations. A key feature of our system is the ability to transfer annotations automatically between images, so that information about an object in one image is propagated to all other images that contain the same object (Sect. 7).

Section 8 presents extensive results on 11 scenes, with visualizations and an analysis of the matching and reconstruction results for these scenes. We also briefly describe Photosynth, a related 3D image browsing tool developed by Microsoft Live Labs that is based on techniques from this paper, but also adds a number of interesting new elements. Finally, we conclude with a set of research challenges for the community in Sect. 9.
4 Reconstructing Cameras and Sparse Geometry

The visualization and browsing components of our system require accurate information about the relative location, orientation, and intrinsic parameters such as focal lengths for each photograph in a collection, as well as sparse 3D scene geometry. A few features of our system require the absolute locations of the cameras, in a geo-referenced coordinate frame. Some of this information can be provided with GPS devices and electronic compasses, but the vast majority of existing photographs lack such information. Many digital cameras embed focal length and other information in the EXIF tags of image files. These values are useful for initialization, but are sometimes inaccurate.

In our system, we do not rely on the camera or any other piece of equipment to provide us with location, orientation, or geometry. Instead, we compute this information from the images themselves using computer vision techniques. We first detect feature points in each image, then match feature points between pairs of images, and finally run an iterative, robust SfM procedure to recover the camera parameters. Because SfM only estimates the relative position of each camera, and we are also interested in absolute coordinates (e.g., latitude and longitude), we use an interactive technique to register the recovered cameras to an overhead map. Each of these steps is described in the following subsections.

4.1 Keypoint Detection and Matching

The first step is to find feature points in each image. We use the SIFT keypoint detector (Lowe 2004), because of its good invariance to image transformations. Other feature detectors could also potentially be used; several detectors are compared in the work of Mikolajczyk et al. (2005). In addition to the keypoint locations themselves, SIFT provides a local descriptor for each keypoint. A typical image contains several thousand SIFT keypoints.

⁸ Wikipedia, http://www.wikipedia.org.
Next, for each pair of images, we match keypoint descriptors between the pair, using the approximate nearest neighbors (ANN) kd-tree package of Arya et al. (1998). To match keypoints between two images $I$ and $J$, we create a kd-tree from the feature descriptors in $J$, then, for each feature in $I$ we find the nearest neighbor in $J$ using the kd-tree. For efficiency, we use ANN’s priority search algorithm, limiting each query to visit a maximum of 200 bins in the tree. Rather than classifying false matches by thresholding the distance to the nearest neighbor, we use the ratio test described by Lowe (2004): for a feature descriptor in $I$, we find the two nearest neighbors in $J$, with distances $d_1$ and $d_2$, then accept the match if $d_1/d_2 < 0.6$. If more than one feature in $I$ matches the same feature in $J$, we remove all of these matches, as some of them must be spurious.

After matching features for an image pair $(I, J)$, we robustly estimate a fundamental matrix for the pair using RANSAC (Fischler and Bolles 1981). During each RANSAC iteration, we compute a candidate fundamental matrix using the eight-point algorithm (Hartley and Zisserman 2004), normalizing the problem to improve robustness to noise (Hartley 1997). We set the RANSAC outlier threshold to be 0.6% of the maximum image dimension, i.e., $0.006 \max(\text{image width}, \text{image height})$ (about six pixels for a 1024 × 768 image). The F-matrix returned by RANSAC is refined by running the Levenberg-Marquardt algorithm (Nocedal and Wright 1999) on the eight parameters of the F-matrix, minimizing errors for all the inliers to the F-matrix. Finally, we remove matches that are outliers to the recovered F-matrix using the above threshold. If the number of remaining matches is less than twenty, we remove all of the matches from consideration.
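The match filtering described above can be illustrated with a minimal sketch. It is not the implementation used in the paper: it assumes descriptors are plain tuples of floats and swaps the approximate kd-tree search for a brute-force nearest-neighbor scan, and the function names `ratio_test_matches` and `ransac_threshold` are hypothetical.

```python
import math
from collections import Counter


def ratio_test_matches(desc_i, desc_j, ratio=0.6):
    """Return (index_in_I, index_in_J) pairs that pass the ratio test.

    Assumption: brute-force search replaces the approximate kd-tree search.
    """
    if len(desc_j) < 2:
        return []
    candidates = []
    for a, d in enumerate(desc_i):
        # Distances from this descriptor in I to every descriptor in J, nearest first.
        dists = sorted((math.dist(d, e), b) for b, e in enumerate(desc_j))
        (d1, b1), (d2, _) = dists[0], dists[1]
        # d1 / d2 < ratio, written without a division so d2 == 0 is safe.
        if d1 < ratio * d2:
            candidates.append((a, b1))
    # If several features in I chose the same feature in J, drop all of them.
    hits = Counter(b for _, b in candidates)
    return [(a, b) for a, b in candidates if hits[b] == 1]


def ransac_threshold(width, height, fraction=0.006):
    """Outlier threshold as a fraction of the larger image dimension."""
    return fraction * max(width, height)
```

For a 1024 × 768 image, `ransac_threshold(1024, 768)` evaluates to about 6.1 pixels, in line with the "about six pixels" figure quoted in the text.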
After finding a set of geometrically consistent matches between each image pair, we organize the matches into tracks, where a track is a connected set of matching keypoints across multiple images. If a track contains more than one keypoint in the same image, it is deemed inconsistent. We keep consistent tracks containing at least two keypoints for the next phase of the reconstruction procedure.

Once correspondences are found, we can construct an image connectivity graph, in which each image is a node and an edge exists between any pair of images with matching features. A visualization of an example connectivity graph for the Trevi Fountain is shown in Fig. 1. This graph embedding was created with the neato tool in the Graphviz toolkit.⁹ Neato represents the graph as a mass-spring system and solves for an embedding whose energy is a local minimum.

Fig. 1 Photo connectivity graph. This graph contains a node for each image in a set of photos of the Trevi Fountain, with an edge between each pair of photos with matching features. The size of a node is proportional to its degree. There are two dominant clusters corresponding to day (a) and nighttime (d) photos. Similar views of the facade cluster together in the center, while nodes in the periphery, e.g., (b) and (c), are more unusual (often close-up) views

The image connectivity graph of this photo set has several distinct features. The large, dense cluster in the center of the graph consists of photos that are all fairly wide-angle, frontal, well-lit shots of the fountain (e.g., image (a)). Other images, including the “leaf” nodes (e.g., images (b) and (c)) and nighttime images (e.g., image (d)), are more loosely connected to this core set. Other connectivity graphs are shown in Figs. 9 and 10.

4.2 Structure from Motion

Next, we recover a set of camera parameters (e.g., rotation, translation, and focal length) for each image and a 3D location for each track. The recovered parameters should be consistent, in that the reprojection error, i.e., the sum of distances between the projections of each track and its corresponding image features, is minimized. This minimization problem can be formulated as a non-linear least squares problem (see Appendix 1) and solved using bundle adjustment.
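To make the reprojection error concrete, the sketch below evaluates it for a handful of cameras and tracks. It is an illustration only: it assumes a simple pinhole camera (a 3 × 3 rotation, a translation, and a focal length, with no distortion and a positive-depth convention), which may differ from the exact model of Appendix 1, and the names `project` and `reprojection_error` are hypothetical.

```python
import math


def project(point, rotation, translation, focal):
    """Pinhole projection of a 3D point (assumed model, no lens distortion).

    rotation is a 3x3 matrix given as a list of rows.
    """
    cam = [
        sum(rotation[r][c] * point[c] for c in range(3)) + translation[r]
        for r in range(3)
    ]
    return focal * cam[0] / cam[2], focal * cam[1] / cam[2]


def reprojection_error(cameras, points, observations):
    """Sum of distances between projected tracks and their observed keypoints.

    cameras: dict camera_id -> (rotation, translation, focal)
    points: dict track_id -> (x, y, z)
    observations: list of (camera_id, track_id, (u, v))
    """
    total = 0.0
    for cam_id, track_id, observed in observations:
        rotation, translation, focal = cameras[cam_id]
        predicted = project(points[track_id], rotation, translation, focal)
        total += math.dist(predicted, observed)
    return total
```

Bundle adjustment would search over the entries of `cameras` and `points` to make this quantity small; the sketch only evaluates it for fixed parameters.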
Algorithms for solving this non-linear problem, such as Nocedal and Wright (1999), are only guaranteed to find local minima, and large-scale SfM problems are particularly prone to getting stuck in bad local minima, so it is important to provide good initial estimates of the parameters. Rather than estimating the parameters for all cameras and tracks at once, we take an incremental approach, adding in one camera at a time.

We begin by estimating the parameters of a single pair of cameras. This initial pair should have a large number of matches, but also have a large baseline, so that the initial two-frame reconstruction can be robustly estimated. We therefore choose the pair of images that has the largest number of matches, subject to the condition that those matches cannot be well-modeled by a single homography, to avoid degenerate cases such as coincident cameras. In particular, we find a homography between each pair of matching images using RANSAC with an outlier threshold of 0.4% of $\max(\text{image width}, \text{image height})$, and store the percentage of feature matches that are inliers to the estimated homography. We select the initial image pair as that with the lowest percentage of inliers to the recovered homography, but with at least 100 matches. The camera parameters for this pair are estimated using Nistér’s implementation of the five point algorithm (Nistér 2004),¹⁰ then the tracks visible in the two images are triangulated. Finally, we do a two frame bundle adjustment starting from this initialization.

⁹ Graphviz, graph visualization software, http://www.graphviz.org/.
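The selection rule for the initial pair can be sketched as follows. This is a hedged illustration, not the paper's code: it assumes the match count and homography inlier fraction of every pair have already been computed, that pairs lacking focal length estimates were filtered out beforehand, and the name `select_initial_pair` and the tuple layout are hypothetical.

```python
def select_initial_pair(pair_stats, min_matches=100):
    """Pick the pair least well explained by a single homography.

    pair_stats: iterable of (pair, num_matches, homography_inlier_fraction).
    Returns the chosen pair, or None if no pair has enough matches.
    """
    # Only pairs with enough matches are eligible.
    eligible = [s for s in pair_stats if s[1] >= min_matches]
    if not eligible:
        return None
    # A low inlier fraction suggests a wide baseline rather than a
    # near-degenerate (homography-like) configuration.
    return min(eligible, key=lambda s: s[2])[0]
```

For example, given pairs with (500 matches, 90% inliers), (150 matches, 30% inliers), and (40 matches, 10% inliers), the second pair is returned: the third has too few matches, and the first is almost fully explained by a homography.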
Next, we add another camera to the optimization. We select the camera that observes the largest number of tracks whose 3D locations have already been estimated, and initialize the new camera’s extrinsic parameters using the direct linear transform (DLT) technique (Hartley and Zisserman 2004) inside a RANSAC procedure. For this RANSAC step, we use an outlier threshold of 0.4% of $\max(\text{image width}, \text{image height})$. In addition to providing an estimate of the camera rotation and translation, the DLT technique returns an upper-triangular matrix $K$ which can

¹⁰ We only choose the initial pair among pairs for which a focal length estimate is available for both cameras, and therefore a calibrated relative pose algorithm can be used.

Fig. 4 Screenshots from the explorer interface. Left: when the user visits a photo, that photo appears at full-resolution, and information about it appears in a pane on the left. Right: a view looking down on the Prague dataset, rendered in a non-photorealistic style

photos in the explorer. Most existing photo browsing tools cut from one photograph to the next, sometimes smoothing the transition by cross-fading. In our case, the geometric information we infer about the photographs allows us to use camera motion and view interpolation to make transitions more visually compelling and to emphasize the spatial relationships between the photographs.

5.3.1 Camera Motion

When the virtual camera moves from one photograph to another, the system linearly interpolates the camera position between the initial and final camera locations, and the camera orientation between unit quaternions representing the initial and final orientations. The field of view of the virtual camera is also interpolated so that when the camera reaches its destination, the destination image will fill as much of the screen as possible. The camera path timing is non-uniform, easing in and out of the transition.

If the camera moves as the result of an object selection (Sect. 6.3), the transition is slightly different. Before the camera starts moving, it orients itself to point at the mean of the selected points. The camera remains pointed at the mean as it moves, so that the selected object stays fixed in the view. This helps keep the object from undergoing large, distracting motions during the transition. The final orientation and focal length are computed so that the selected object is centered and fills the screen.

5.3.2 View Interpolation

During camera transitions, we also display in-between images. We have experimented with two simple techniques for morphing between the start and destination photographs: triangulating the point cloud and using planar impostors.

Triangulated Morphs To create a triangulated morph between two cameras $C_j$ and $C_k$, we first compute a 2D Delaunay triangulation for image $I_j$ using the projections of $\mathrm{Points}(C_j)$ into $I_j$. The projections of $\mathrm{Lines}(C_j)$ into $I_j$ are imposed as edge constraints on the triangulation (Chew 1987). The resulting constrained Delaunay triangulation may not cover the entire image, so we overlay a grid onto the image and add to the triangulation each grid point not contained inside the original triangulation. Each added grid point is associated with a 3D point on $\mathrm{Plane}(C_j)$. The connectivity of the triangulation is then used to create a 3D mesh; we project $I_j$ onto the mesh in order to texture map it. We compute a mesh for $C_k$ and texture map it in the same way.

Then, to render the transition between $C_j$ and $C_k$, we move the virtual camera from $C_j$ to $C_k$ while cross-fading between the two meshes (i.e., the texture-mapped mesh for $C_j$ is faded out while the texture-mapped mesh for $C_k$ is faded in, with the depth buffer turned off to avoid popping). While this technique does not use completely accurate geometry, the meshes are often sufficient to give a sense of the 3D geometry of the scene. For instance, this approach works well for many transitions in the Great Wall dataset (shown as a still in Fig. 2, and as an animation in the video on the project website). However, missing geometry and outlying points can sometimes cause distracting artifacts.

Planar Morphs We have also experimented with using planes, rather than 3D meshes, as our projection surfaces.
To create a morph between cameras $C_j$ and $C_k$ using a planar impostor, we simply project the two images $I_j$ and $I_k$ onto $\mathrm{CommonPlane}(C_j, C_k)$ and cross-fade between the projected images as the camera moves from $C_j$ to $C_k$. The resulting in-betweens are not as faithful to the underlying geometry as the triangulated morphs, tending to stabilize only a dominant plane in the scene, but the resulting artifacts are usually less objectionable, perhaps because we are used to seeing distortions caused by viewing planes from different angles. Because of the robustness of this method, we prefer to use it rather than triangulation as the default for transitions. Example morphs using both techniques are shown in the video on our project website.¹⁵

There are a few special cases which must be handled differently during transitions. First, if the two cameras observe no common points, our system currently has no basis for interpolating the images. Instead, we fade out the start image, move the camera to the destination as usual, then fade in the destination image. Second, if the normal to $\mathrm{CommonPlane}(C_j, C_k)$ is nearly perpendicular to the average of the viewing directions of $C_j$ and $C_k$, the projected images would undergo significant distortion during the morph. In this case, we revert to using a plane passing through the mean of the points common to both views, whose normal is the average of the viewing directions. Finally, if the vanishing line of $\mathrm{CommonPlane}(C_j, C_k)$ is visible in images $I_j$ or $I_k$ (as would be the case if this plane were the ground plane, and the horizon were visible in either image), it is impossible to project the entirety of $I_j$ or $I_k$ onto the plane. In this case, we project as much as possible of $I_j$ and $I_k$ onto the plane, and project the rest onto the plane at infinity.

6 Photo Explorer Navigation

Our image exploration tool supports several modes for navigating through the scene and finding interesting photographs. These modes include free-flight navigation, finding related views, object-based navigation, and viewing slideshows.

¹⁵ Photo tourism website, http://phototour.cs.washington.edu/.
6.1 Free-Flight Navigation

The free-flight navigation controls include some of the standard 3D motion controls found in many games and 3D viewers. The user can move the virtual camera forward, back, left, right, up, and down, and can control pan, tilt, and zoom. This allows the user to freely move around the scene and provides a simple way to find interesting viewpoints and nearby photographs.

At any time, the user can click on a frustum in the main view, and the virtual camera will smoothly move until it is coincident with the selected camera. The virtual camera pans and zooms so that the selected image fills as much of the main view as possible.

6.2 Moving Between Related Views

When visiting a photograph $C_{\mathrm{curr}}$, the user has a snapshot of the world from a single point of view and an instant in time. The user can pan and zoom to explore the photo, but might also want to see aspects of the scene beyond those captured in a single picture. He or she might wonder, for instance, what lies just outside the field of view, or to the left of the objects in the photo, or what the scene looks like at a different time of day.

To make it easier to find related views such as these, we provide the user with a set of “geometric” browsing tools. Icons associated with these tools appear in two rows in the information pane, which appears when the user is visiting a photograph. These tools find photos that depict parts of the scene with certain spatial relations to what is currently in view. The mechanism for implementing these search tools is to project the points observed by the current camera, $\mathrm{Points}(C_{\mathrm{curr}})$, into other photos (or vice versa), and select views based on the projected motion of the points. For instance, to answer the query “show me what’s to the left of this photo,” we search for a photo in which $\mathrm{Points}(C_{\mathrm{curr}})$ appear to have moved right.

The geometric browsing tools fall into two categories: tools for selecting the scale at which to view the scene, and directional tools for looking in a particular direction (e.g., left or right).
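The projected-motion query mentioned above (“show me what’s to the left of this photo”) can be sketched in a few lines. This is a simplified illustration under stated assumptions, not the system's code: the projections of each point are assumed to be precomputed and stored as dictionaries keyed by point id, image x is assumed to increase to the right, and the 30 degree tolerance and the names `mean_motion` and `shows_what_is_left` are hypothetical.

```python
import math


def mean_motion(proj_curr, proj_other):
    """Average 2D displacement of the points visible in both images.

    proj_curr, proj_other: dict point_id -> (x, y) image projection.
    Returns None when the two images share no points.
    """
    shared = proj_curr.keys() & proj_other.keys()
    if not shared:
        return None
    dx = sum(proj_other[p][0] - proj_curr[p][0] for p in shared) / len(shared)
    dy = sum(proj_other[p][1] - proj_curr[p][1] for p in shared) / len(shared)
    return dx, dy


def shows_what_is_left(proj_curr, proj_other, max_angle_deg=30.0):
    """True if the shared points appear to have moved right in the other photo."""
    motion = mean_motion(proj_curr, proj_other)
    if motion is None or motion == (0.0, 0.0):
        return False
    # Angle between the mean motion and the +x (rightward) image axis.
    angle = math.degrees(math.atan2(abs(motion[1]), motion[0]))
    return angle <= max_angle_deg
```

A photo in which the shared points shift mostly rightward is a candidate for "what is to the left"; swapping the sign of the x component would give the corresponding test for "what is to the right".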
There are three scaling tools: (1) find details, or higher-resolution close-ups, of the current photo, (2) find similar photos, and (3) find zoom-outs, or photos that show more surrounding context. If the current photo is $C_{\mathrm{curr}}$, these tools search for appropriate neighboring photos $C_j$ by estimating the relative “apparent size” of a set of points in each image, and comparing these apparent sizes. Specifically, to estimate the apparent size of a set of points $P$ in an image $I$, we project the points into $I$, compute the bounding box of the projections that are inside the image, and calculate the ratio of the area of the bounding box (in pixels) to the area of the image. We refer to this quantity as $\mathrm{Size}(P, C)$.

When one of these tools is activated, we classify each neighbor $C_j$ as:

• a detail of $C_{\mathrm{curr}}$ if $\mathrm{Size}(\mathrm{Points}(C_j), C_{\mathrm{curr}}) < 0.75$ and most points visible in $C_{\mathrm{curr}}$ are visible in $C_j$
• similar to $C_{\mathrm{curr}}$ if $0.75 < \frac{\mathrm{Size}(\mathrm{Points}(C_{\mathrm{curr}}), C_j)}{\mathrm{Size}(\mathrm{Points}(C_{\mathrm{curr}}), C_{\mathrm{curr}})} < 1.3$ and the angle between the viewing directions of $C_{\mathrm{curr}}$ and $C_j$ is less than a threshold of 10 degrees
• a zoom-out of $C_{\mathrm{curr}}$ if $C_{\mathrm{curr}}$ is a detail of $C_j$.

The results of any of these searches are displayed in the thumbnail pane (sorted by increasing apparent size, in the case of details and zoom-outs). These tools are useful for viewing the scene in more detail, comparing similar views of an object which differ in other respects, such as time of day, season, and year, and for “stepping back” to see more of the scene.

The directional tools give the user a simple way to “step” left or right, i.e., to see more of the scene in a particular direction. For each camera, we compute a left and right neighbor, and link them to arrows displayed in the information pane. To find a left and right image for camera $C_j$, we compute the average 2D motion $m_{jk}$ of the projections of $\mathrm{Points}(C_j)$ from image $I_j$ to each neighboring image $I_k$.

Fig. 5 Object-based navigation. The user drags a rectangle around Neptune in one photo, and the system finds a new, high-resolution photograph
If the angle between $m_{jk}$ and the desired direction (i.e., left or right) is small, and the apparent sizes of $\mathrm{Points}(C_j)$ in both images are similar, $C_k$ is a candidate left or right image to $C_j$. Out of all the candidates, we select the left or right image to be the image $I_k$ whose motion magnitude $\|m_{jk}\|$ is closest to 20% of the width of image $I_j$.

6.3 Object-Based Navigation

Another search query our system supports is “show me photos of this object,” where the object in question can be directly selected in a photograph or in the point cloud. This type of search, applied to video in (Sivic and Zisserman 2003), is complementary to, and has certain advantages over, keyword search. Being able to select an object is especially useful when exploring a scene: when the user comes across an interesting object, direct selection is an intuitive way to find a better picture of that object.

In our photo exploration system, the user selects an object by dragging a 2D box around a region of the current photo or the point cloud. All points whose projections are inside the box form the set of selected points, $S$. Our system then searches for the “best” photo of $S$ by scoring each image in the database based on how well it represents the selection. The top scoring photo is chosen as the representative view, and the virtual camera is moved to that image. Other images with scores above a threshold are displayed in the thumbnail pane, sorted in descending order by score. An example object selection interaction is shown in Fig. 5.

Our view scoring function is based on three criteria: (1) the visibility of the points in $S$, (2) the angle from which the points in $S$ are viewed, and (3) the image resolution. For each image $I_j$, we compute the score as a weighted sum of three terms, $E_{\mathrm{visible}}$, $E_{\mathrm{angle}}$, and $E_{\mathrm{detail}}$. Details of the computation of these terms can be found in Appendix 2.

The set $S$ can sometimes contain points that the user did not intend to select, especially occluded points that happen to project inside the selection rectangle. If we had complete knowledge of visibility, we could cull such hidden points.
Because we only have a sparse model, however, we use a set of heuristics to prune the selection. If the selection was made while visiting an image $I_j$, we can use the points that are known to be visible from that viewpoint ($\mathrm{Points}(C_j)$) to refine the selection. In particular, we compute the 3 × 3 covariance matrix for the points in $S \cap \mathrm{Points}(C_j)$, and remove from $S$ all points with a Mahalanobis distance greater than 1.2 from the mean. If the selection was made while not visiting an image, we instead compute a weighted mean and covariance matrix for the entire set $S$. The weighting favors points which are closer to the virtual camera, the idea being that those are more likely to be unoccluded than points which are far away. Thus, the weight for each point is computed as the inverse of its distance from the virtual camera.

6.4 Creating Stabilized Slideshows

Whenever the thumbnail pane contains more than one image, its contents can be viewed as a slideshow by pressing the “play” button in the pane. By default, the virtual camera will move through space from camera to camera, pausing at each image for a few seconds before proceeding to the next. The user can also “lock” the camera, fixing it to its current position, orientation, and field of view. When the images in the thumbnail pane are all taken from approximately the same location, this mode stabilizes the images, making it easier to compare one image to the next. This mode is useful for studying changes in scene appearance as a function of time of day, season, year, weather patterns, etc. An example stabilized slideshow from the Yosemite dataset is shown in the companion video.¹⁶

6.5 Photosynth

Our work on visualization of unordered photo collections is being used in the Photosynth Technology Preview¹⁷ released by Microsoft Live Labs. Photosynth is a photo visualization tool that uses the same underlying data (camera

¹⁶ Photo tourism website, http://phototour.cs.washington.edu/.
¹⁷ Microsoft Live Labs, Photosynth technology preview, http://labs.live.com/photosynth.

Fig. 6 Photosynth technology preview. This screenshot shows a user exploring photos of Piazza San Marco in Venice (left) and St. Peter’s Basilica in the Vatican (right)

Fig. 7 Example of annotation transfer. Three regions were annotated in the photograph on the left; the annotations were automatically transferred to the other photographs, a few of which are shown on the right. Our system can handle partial and full occlusions

such labeled images with an existing collection of photos using our system, we could transfer the existing labels to every other relevant photo in the system. Other images on the web are implicitly annotated: for instance, an image on a Wikipedia page is “annotated” with the URL of that page. By registering such images, we could link other photos to the same page.

8 Results

We have applied our system to several input photo collections, including “uncontrolled” sets consisting of images downloaded from Flickr. In each case, our system detected and matched features on the entire set of photos and automatically identified and registered a subset corresponding to one connected component of the scene. The uncontrolled sets we have tested are as follows:

1. Notre Dame, a set of photos of the Notre Dame Cathedral in Paris.
2. Mount Rushmore, a set of photos of Mount Rushmore National Monument, South Dakota.
3. Trafalgar Square, a set of photos from Trafalgar Square, London.
4. Yosemite, a set of photos of Half Dome in Yosemite National Park.
5. Trevi Fountain, a set of photos of the Trevi Fountain in Rome.
6. Sphinx, a set of photos of the Great Sphinx of Giza, Egypt.
7. St. Basil’s, a set of photos of Saint Basil’s Cathedral in Moscow.
8. Colosseum, a set of photos of the Colosseum in Rome.

Three other sets were taken in more controlled settings (i.e., a single person with a single camera and lens):

1. Prague, a set of photos of the Old Town Square in Prague.
2. Annecy, a set of photos of a street in Annecy, France.
3. Great Wall, a set of photos taken along the Great Wall of China.
More information about these datasets (including the number of input photos, number of registered photos, running time, and average reprojection error) is shown in Table 1. The running times reported in this table were generated by running the complete pipeline on one or more 3.80 GHz Intel Pentium 4 processors. The keypoint detection and matching phases were run in parallel on ten processors, and the structure from motion algorithm was run on a single processor.

Visualizations of these datasets are shown in Figs. 9 and 10. Please see the video and live demo on the project website¹⁸ for a demonstration of features of our photo explorer, including object selection, related image selection, morphing, and annotation transfer, on several datasets.

For the Half Dome dataset, after initially constructing the model, we aligned it to a digital elevation map using the approach described in Sect. 4.3.1. We then registered a historical photo, Ansel Adams’s “Moon and Half Dome,” to the dataset, by dragging and dropping it onto the model using the method described in Sect. 7.1. Figure 8 shows a synthetic rendering of the scene from the estimated position where Ansel Adams took the photo.

¹⁸ Photo tourism website, http://phototour.cs.washington.edu/.

Fig. 8 A registered historical photo. Left: Moon and Half Dome, 1960. Photograph by Ansel Adams. We registered this historical photo to our Half Dome model. Right: rendering of DEM data for Half Dome from where Ansel Adams was standing, as estimated by our system. The white border was drawn manually for comparison (DEM and color texture courtesy of the U.S. Geological Survey)

Table 1 Datasets. Each row lists information about each dataset used

Collection      Search term                     # photos  # registered  # points  Runtime    Error
Notre Dame      notredame AND paris                 2635           598    305535  12.7 days  0.616
Mt. Rushmore    mount rushmore                      1000           452    133994  2.6 days   0.444
Trafalgar Sq.   trafalgar square                    1893           278     27224  3.5 days   1.192
Yosemite        halfdome AND yosemite               1882           678    264743  10.4 days  0.757
Trevi Fountain  trevi AND rome                       466           370    114742  20.5 hrs   0.698
Sphinx          sphinx AND egypt                    1000           511    206783  3.4 days   0.418
St. Basil’s     basil AND red square                 627           220     25782  23.0 hrs   0.816
Colosseum       colosseum AND (rome OR roma)        1994           390    188306  5.0 days   1.360
Prague          N/A                                  197           171     38921  3.1 hrs    0.731
Annecy          N/A                                  462           424    196443  2.5 days   0.810
Great Wall      N/A                                  120            81     24225  2.8 hrs    0.707

Collection, the name of the set; search term, the search term used to gather the images; # photos, the number of photos in the input set; # registered, the number of photos registered; # points, the number of points in the final reconstruction; runtime, the approximate total time for reconstruction; error, the mean reprojection error, in pixels, after optimization. The first eight datasets were gathered from the Internet, and the last three were each captured by a single person

8.1 Discussion

The graphs shown in the third column of Figs. 9 and 10 contain edges between pairs of photos with matching features, as in Fig. 1. These connectivity graphs suggest that many of the uncontrolled datasets we tested our reconstruction algorithm on consist of several large clusters of photos with a small number of connections spanning clusters, and a sparse set of photos hanging off the main clusters. The large clusters usually correspond to sets of photos from similar viewpoints. For instance, the large cluster that dominates the Mount Rushmore connectivity graph consists entirely of frontal-view photos taken from the observation terrace or the trails around it, and the two large clusters on the right side of the Colosseum connectivity graph correspond to the inside and the outside of the Colosseum. Sometimes clusters correspond not to viewpoint but to different lighting conditions, as in the case of the Trevi Fountain collection (see Fig. 1), where there is a “daytime” cluster and a “nighttime” cluster.

Fig. 9 Sample reconstructed scenes. From top to bottom: Notre Dame, Mount Rushmore, Trafalgar Square, Yosemite, Trevi Fountain, and Sphinx. The first column shows a sample image, and the second column shows a view of the reconstruction.
The third and fourth columns show photo connectivity graphs, in which each image in the set is a node and an edge links each pair of images with feature matches. The third column shows the photo connectivity graph for the full image set, and the fourth for the subset of photos that were ultimately reconstructed

outside of a building), or those that are sharp, in focus, and well-lit, in the case of clusters taken at different times of day. The "leaf" nodes in the graph generally correspond to images that are at extremes along some dimension, such as photos that are very zoomed in, photos taken from a significantly different viewpoint, or photos taken under very different lighting.

While the graphs in the third column of Figs. 9 and 10 represent the connectivity of entire photo sets, the fourth column shows the part of the graph our algorithm was able to reconstruct (the reconstruction graph). As described in Sect. 4, in general, our algorithm does not reconstruct all input photos, because the input set may form separate connected components, or clusters that are too weakly connected to be reliably reconstructed (during reconstruction, photos are added until no remaining photo observes enough 3D points to reliably add it to the scene). These reconstruction graphs suggest that for unstructured datasets, our reconstruction algorithm tends to reconstruct most of one of the main clusters, and can sometimes bridge gaps between clusters with enough connections between them. For instance, in the Sphinx collection, our algorithm reconstructed two prominent clusters, one on the right side of the graph, and one on the bottom. These clusters correspond to two sides of the Sphinx (the front and the right side) which are commonly photographed; a few photos were taken from intermediate angles, allowing the two clusters to be connected. In the Colosseum collection, only the outside of the structure was successfully reconstructed, and therefore only a single cluster in the connectivity graph is represented in the reconstruction graph. More images would be needed to bridge the other clusters in the graph. In general, the more "connected" the image graph is, the more images can successfully be registered.
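One way to read the connectivity graphs discussed above is through their connected components: photos in different components share no chain of feature matches, so they cannot end up in the same reconstruction. The sketch below groups images with a small union-find structure. The function name, the integer image ids, and the input format (a list of matched image pairs) are assumptions made for illustration only. The procedure described in Sect. 4 is stricter, since it also leaves out photos that do not observe enough 3D points, so a component is only an upper bound on what can be registered from it.

```python
from collections import defaultdict

def connected_components(num_images, matched_pairs):
    """Group image ids 0..num_images-1 into connected components.

    matched_pairs is a hypothetical list of (i, j) tuples, one per pair of
    images that share feature matches (an edge of the connectivity graph).
    """
    parent = list(range(num_images))

    def find(i):
        # Follow parent links to the root, halving the path as we go.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Merge the two components joined by each edge.
    for a, b in matched_pairs:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    # Collect the members of each component under its root.
    groups = defaultdict(list)
    for i in range(num_images):
        groups[find(i)].append(i)

    # Largest component first.
    return sorted(groups.values(), key=len, reverse=True)

# Six images: 0-1-2 form one cluster, 3-4 another, and 5 matches nothing.
pairs = [(0, 1), (1, 2), (3, 4)]
components = connected_components(6, pairs)
print(components)  # [[0, 1, 2], [3, 4], [5]]
```

In this toy input, image 5 plays the role of a photo with no matches at all, while a single extra edge such as (2, 3) would merge the two clusters, much like the intermediate-angle photos that connect the two Sphinx clusters.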
For the controlled datasets (Annecy, Prague, and Great Wall), the photos were captured with the intention of generating a reconstruction from them, and the connectivity graphs are less clustered, as they were taken, for the most part, while walking along a path. In the Prague photo set, for instance, most of the photos were taken all around the Old Town Square, looking outward at the buildings. A few were taken looking across the square, so a few longer range connections between parts of the graph are evident. Our reconstruction algorithm was able to register most of the photos in these datasets.

9 Research Challenges

A primary objective of this paper is to motivate more research in the computer vision community on analyzing the diverse and massive photo collections available on the Internet. While this paper presented some initial steps towards processing such imagery for the purpose of reconstruction and visualization, huge challenges remain. Some of the open research problems for our community include:

• Scale. As more and more of the world's sites and cities are captured photographically, we can imagine using SfM methods to reconstruct a significant portion of the world's urban areas. Achieving such a goal will require Internet-scale matching and reconstruction algorithms that operate on millions of images. Some recent matching algorithms (Grauman and Darrell 2005; Nistér and Stewénius 2006) have demonstrated the ability to operate effectively on datasets that approach this scale, although more work is needed to improve the accuracy of these methods, especially on Internet photo collections. Furthermore, the large redundancy in online photo collections means that a small fraction of images may be sufficient to produce high quality reconstructions. These and other factors lead us to believe that Internet-scale SfM is feasible.

• Variability. While SIFT and other feature matching techniques are surprisingly robust with respect to appearance variation, more significant appearance changes still pose a problem. More robust matching algorithms could enable a number of exciting capabilities, such as matching ground-based to aerial/satellite views, aligning images of natural sites through changes in seasons and weather conditions, registering historical photos and artistic renderings with modern-day views (rephotography 20), and robust matching using low-quality (e.g., cellphone camera) devices and imagery.

• Accuracy. Accuracy is an important concern for applications such as localization (e.g., figuring out where you are by taking a photograph), navigation, and surveillance. Most SfM methods operate by minimizing reprojection error and do not provide guarantees on metric accuracy. However, satellite images, maps, DEMs, surveys, and similar data provide a rich source of metric data for a large percentage of world sites; such data could be used to obtain more accurate metric SfM results. There is also a need for evaluation studies, in the spirit of (Scharstein and Szeliski 2002; Seitz et al. 2006; Szeliski et al. 2006), that benchmark the best-of-breed SfM algorithms against ground truth datasets and encourage the development of more accurate techniques.

• Shape. While SfM techniques provide only sparse geometry, the ability to compute accurate camera parameters opens the door for techniques such as multi-view stereo that compute dense surface shape models. In turn, shape enables computing scene reflectance properties (e.g., BRDFs) and illumination. We therefore envision a new breed of shape and reflectance modeling techniques

20 Rephotography, http://en.wikipedia.org/wiki/Rephotography.
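As a rough illustration of the quantity the Accuracy item refers to, and that Table 1 reports as error, the sketch below computes a mean reprojection error in pixels. It assumes a simplified pinhole camera (rotation R, translation t, focal length f in pixels, principal point at the origin, no lens distortion, camera looking down the positive z axis). These conventions and all function names are illustrative assumptions, not the camera model or the code used to produce the results in this paper.

```python
import math

def project(point, R, t, f):
    """Project a 3D world point to pixel coordinates (simplified pinhole model)."""
    # Transform into the camera frame: X_cam = R * X + t.
    x, y, z = (sum(R[i][k] * point[k] for k in range(3)) + t[i] for i in range(3))
    # Perspective division; assumes the point lies in front of the camera (z > 0).
    return (f * x / z, f * y / z)

def mean_reprojection_error(points, observations, R, t, f):
    """Average pixel distance between projected points and observed keypoints."""
    errors = []
    for point, (u, v) in zip(points, observations):
        pu, pv = project(point, R, t, f)
        errors.append(math.hypot(pu - u, pv - v))
    return sum(errors) / len(errors)

# Toy example: identity rotation, zero translation, focal length 500 pixels.
identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
points = [(0.0, 0.0, 5.0), (1.0, 0.0, 5.0)]
observations = [(3.0, 4.0), (100.0, 0.0)]  # first keypoint is off by 5 pixels
error = mean_reprojection_error(points, observations, identity, (0.0, 0.0, 0.0), 500.0)
print(error)  # 2.5
```

A low value of this measure says only that the model is consistent with the image measurements. It says nothing about metric scale or absolute position, which is why the Accuracy item points to external data such as maps and DEMs.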
The term $E_{\text{angle}}$ is used to favor head-on views of a set of points over oblique views. To compute $E_{\text{angle}}$, we first fit a plane to the points in $S$ using a RANSAC procedure. If the percentage of points in $S$ which are inliers to the recovered plane is above a threshold of 50% (i.e., there appears to be a dominant plane in the selection), we favor cameras that view the object head-on by setting $E_{\text{angle}} = V(C_j) \cdot \hat{n}$, where $V$ indicates viewing direction, and $\hat{n}$ the normal to the recovered plane. If fewer than 50% of the points fit the plane, we set $E_{\text{angle}} = 0$.

Finally, $E_{\text{detail}}$ favors high-resolution views of the object. $E_{\text{detail}}$ is defined to be the area, in pixels, of the bounding box of the projections of $S$ into image $I_j$ (considering only points that project inside the boundary of $I_j$). $E_{\text{detail}}$ is normalized by the area of the largest such bounding box, so the highest resolution available view will have a score of 1.0.

Appendix 3: Photo Credits

We would like to thank the following people for allowing us to reproduce their photographs:

Holly Ables, of Nashville, TN
Rakesh Agrawal
Pedro Alcocer
Julien Avarre (http://www.flickr.com/photos/eole/)
Rael Bennett (http://www.flickr.com/photos/spooky05/)
Loïc Bernard
Nicole Bratt
Nicholas Brown
Domenico Calojero (mikuzz@gmail.com)
DeGanta Choudhury (http://www.flickr.com/photos/deganta/)
dan clegg
Claude Covo-Farchi
Alper Çuğun
W. Garth Davis
Stamatia Eliakis
Dawn Endico (endico@gmail.com)
Silvana M. Felix
Jeroen Hamers
Caroline Härdter
Mary Harrsch
Molly Hazelton
Bill Jennings (http://www.flickr.com/photos/mrjennings), supported by grants from the National Endowment for the Humanities and the Fund for Teachers
Michelle Joo
Tommy Keswick
Kirsten Gilbert Krenicky
Giampaolo Macorig
Erin K Malone (photographs copyright 2005)
Daryoush Mansouri
Paul Meidinger
Laurete de Albuquerque Mouazan
Callie Neylan
Robert Norman
Dirk Olbertz
Dave Ortman
George Owens
Claire Elizabeth Poulin
David R. Preston
Jim Sellers and Laura Kluver
Peter Snowling
Rom Srinivasan
Jeff Allen Wallen Photographer/Photography
Daniel West
Todd A. Van Zandt
Dario Zappalà
Susan Elnadi

We also acknowledge the following people whose photographs we reproduced under Creative Commons licenses:

Shoshanah http://www.flickr.com/photos/shoshanah 22
Dan Kamminga http://www.flickr.com/photos/dankamminga 1
Tjeerd Wiersma http://www.flickr.com/photos/tjeerd 1
Manogamo http://www.flickr.com/photos/se-a-vida-e 23
Ted Wang http://www.flickr.com/photos/mtwang 24
Arnet http://www.flickr.com/photos/gurvan 3
Rebekah Martin http://www.flickr.com/photos/rebekah 3
Jean Ruaud http://www.flickr.com/photos/jrparis 3
Imran Ali http://www.flickr.com/photos/imran 3
Scott Goldblatt http://www.flickr.com/photos/goldblatt 3
Todd Martin http://www.flickr.com/photos/tmartin 25
Steven http://www.flickr.com/photos/graye 4
ceriess http://www.flickr.com/photos/ceriess 1
Cory Piña http://www.flickr.com/photos/corypina 3
mark gallagher http://www.flickr.com/photos/markgallagher 1
Celia http://www.flickr.com/photos/100n30th 3
Carlo B. http://www.flickr.com/photos/brodo 3
Kurt Naks http://www.flickr.com/photos/kurtnaks 4
Anthony M. http://www.flickr.com/photos/antmoose 1
Virginia G http://www.flickr.com/photos/vgasull 3

Collection credit and copyright notice for Moon and Half Dome, 1960, by Ansel Adams: Collection Center for Cre-

1 http://creativecommons.org/licenses/by/2.0/.
2 http://creativecommons.org/licenses/by-nd/2.0/.
3 http://creativecommons.org/licenses/by-nc-nd/2.0/.
4 http://creativecommons.org/licenses/by-nc/2.0/.

McMillan, L., & Bishop, G. (1995). Plenoptic modeling: An image-based rendering system. In SIGGRAPH conference proceedings (pp. 39–46).
Mikolajczyk, K., & Schmid, C. (2004). Scale & affine invariant interest point detectors. International Journal of Computer Vision, 60(1), 63–86.
Mikolajczyk, K., Tuytelaars, T., Schmid, C., Zisserman, A., Matas, J., Schaffalitzky, F., Kadir, T., & van Gool, L. (2005). A comparison of affine region detectors. International Journal of Computer Vision, 65(1/2), 43–72.
Moravec, H. (1983). The Stanford cart and the CMU rover. Proceedings of the IEEE, 71(7), 872–884.
Naaman, M., Paepcke, A., & Garcia-Molina, H. (2003). From where to what: Metadata sharing for digital photographs with geographic coordinates. In Proceedings of the international conference on cooperative information systems (pp. 196–217).
Naaman, M., Song, Y. J., Paepcke, A., & Garcia-Molina, H. (2004). Automatic organization for digital photographs with geographic coordinates. In Proceedings of the ACM/IEEE-CS joint conference on digital libraries (pp. 53–62).
Nistér, D. (2000). Reconstruction from uncalibrated sequences with a hierarchy of trifocal tensors. In Proceedings of the European conference on computer vision (pp. 649–663).
Nistér, D. (2004). An efficient solution to the five-point relative pose problem. IEEE Transactions on Pattern Analysis and Machine Intelligence, 26(6), 756–777.
Nistér, D., & Stewénius, H. (2006). Scalable recognition with a vocabulary tree. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 2118–2125).
Nocedal, J., & Wright, S. J. (1999). Springer series in operations research. Numerical optimization. New York: Springer.
Oliensis, J. (1999). A multi-frame structure-from-motion algorithm under perspective projection. International Journal of Computer Vision, 34(2–3), 163–192.
Pollefeys, M., Koch, R., & Van Gool, L. (1999). Self-calibration and metric reconstruction in spite of varying and unknown internal camera parameters. International Journal of Computer Vision, 32(1), 7–25.
Pollefeys, M., & Van Gool, L. (2002). From images to 3D models. Communications of the ACM, 45(7), 50–55.
Pollefeys, M., van Gool, L., Vergauwen, M., Verbiest, F., Cornelis, K., Tops, J., & Koch, R. (2004). Visual modeling with a hand-held camera. International Journal of Computer Vision, 59(3), 207–232.
Robertson, D. P., & Cipolla, R. (2002). Building architectural models from many views using map constraints. In Proceedings of the European conference on computer vision (Vol. II, pp. 155–169).
Rodden, K., & Wood, K. R. (2003). How do people manage their digital photographs? In Proceedings of the conference on human factors in computing systems (pp. 409–416).
Román, A., et al. (2004). Interactive design of multi-perspective images for visualizing urban landscapes. In IEEE visualization 2004 (pp. 537–544).
Russell, B. C., Torralba, A., Murphy, K. P., & Freeman, W. T. (2005). Labelme: a database and web-based tool for image annotation (Technical Report MIT-CSAIL-TR-2005-056). Massachusetts Institute of Technology.
Schaffalitzky, F., & Zisserman, A. (2002). Multi-view matching for unordered image sets, or "How do I organize my holiday snaps?" In Proceedings of the European conference on computer vision (Vol. 1, pp. 414–431).
Scharstein, D., & Szeliski, R. (2002). A taxonomy and evaluation of dense two-frame stereo correspondence algorithms. International Journal of Computer Vision, 47(1), 7–42.
Schindler, G., Dellaert, F., & Kang, S. B. (2007). Inferring temporal order of images from 3D structure. In Proceedings of the IEEE conference on computer vision and pattern recognition.
Schmid, C., & Zisserman, A. (1997). Automatic line matching across views. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 666–671).
Seitz, S. M., & Dyer, C. M. (1996). View morphing. In SIGGRAPH conference proceedings (pp. 21–30).
Seitz, S., Curless, B., Diebel, J., Scharstein, D., & Szeliski, R. (2006). A comparison and evaluation of multi-view stereo reconstruction algorithms. In Proceedings of the IEEE conference on computer vision and pattern recognition (Vol. 1, pp. 519–526), June 2006.
Shi, J., & Tomasi, C. (1994). Good features to track. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 593–600), June 1994.
Sivic, J., & Zisserman, A. (2003). Video Google: a text retrieval approach to object matching in videos. In Proceedings of the international conference on computer vision (pp. 1470–1477), October 2003.
Snavely, N., Seitz, S. M., & Szeliski, R. (2006). Photo tourism: exploring photo collections in 3D. ACM Transactions on Graphics, 25(3), 835–846.
Spetsakis, M. E., & Aloimonos, J. Y. (1991). A multiframe approach to visual motion perception. International Journal of Computer Vision, 6(3), 245–255.
Strecha, C., Tuytelaars, T., & Van Gool, L. (2003). Dense matching of multiple wide-baseline views. In Proceedings of the international conference on computer vision (pp. 1194–1201), October 2003.
Szeliski, R. (2006). Image alignment and stitching: a tutorial. Foundations and Trends in Computer Graphics and Computer Vision, 2(1).
Szeliski, R., & Kang, S. B. (1994). Recovering 3D shape and motion from image streams using nonlinear least squares. Journal of Visual Communication and Image Representation, 5(1), 10–28.
Szeliski, R., Zabih, R., Scharstein, D., Veksler, O., Kolmogorov, V., Agarwala, A., Tappen, M., & Rother, C. (2006). A comparative study of energy minimization methods for Markov random fields. In Proceedings of the European conference on computer vision (Vol. 2, pp. 16–29), May 2006.
Tanaka, H., Arikawa, M., & Shibasaki, R. (2002). A 3-d photo collage system for spatial navigations. In Revised papers from the second Kyoto workshop on digital cities II, computational and sociological approaches (pp. 305–316).
Teller, S., Antone, M., Bodnar, Z., Bosse, M., Coorg, S., Jethwa, M., & Master, N. (2003). Calibrated, registered images of an extended urban area. International Journal of Computer Vision, 53(1), 93–107.
Tomasi, C., & Kanade, T. (1992). Shape and motion from image streams under orthography: a factorization method. International Journal of Computer Vision, 9(2), 137–154.
Toyama, K., Logan, R., & Roseway, A. (2003). Geographic location tags on digital images. In Proceedings of the international conference on multimedia (pp. 156–166).
Triggs, B., et al. (1999). Bundle adjustment - a modern synthesis. In International workshop on vision algorithms (pp. 298–372), September 1999.
Tuytelaars, T., & Van Gool, L. (2004). Matching widely separated views based on affine invariant regions. International Journal of Computer Vision, 59(1), 61–85.
Vergauwen, M., & Van Gool, L. (2006). Web-based 3D reconstruction service. Machine Vision and Applications, 17(2), 321–329.
von Ahn, L., & Dabbish, L. (2004). Labeling images with a computer game. In Proceedings of the conference on human factors in computing systems (pp. 319–326).
Zitnick, L., Kang, S. B., Uyttendaele, M., Winder, S., & Szeliski, R. (2004). High-quality video view interpolation using a layered representation. In SIGGRAPH conference proceedings (pp. 600–608).