Accurate, Dense, and Robust Multi-View Stereopsis

Yasutaka Furukawa
Department of Computer Science and Beckman Institute
University of Illinois at Urbana-Champaign, USA

Jean Ponce
Willow Team, ENS/INRIA/ENPC
Département d'Informatique, École Normale Supérieure, Paris, France

Abstract: This paper proposes a novel algorithm for calibrated multi-view stereopsis that outputs a (quasi) dense set of rectangular patches covering the surfaces visible in the input images. This algorithm does not require any initialization in the form of a bounding volume, and it automatically detects and discards outliers and obstacles. It does not perform any smoothing across nearby features, yet is currently the top performer in terms of both coverage and accuracy for four of the six benchmark datasets presented in [20]. The keys to its performance are effective techniques for enforcing local photometric consistency and global visibility constraints. Stereopsis is implemented as a match, expand, and filter procedure, starting from a sparse set of matched keypoints, and repeatedly expanding these to nearby pixel correspondences before using visibility constraints to filter away false matches. A simple but effective method for turning the resulting patch model into a mesh appropriate for image-based modeling is also presented. The proposed approach is demonstrated on various datasets including objects with fine surface details, deep concavities, and thin structures, outdoor scenes observed from a restricted set of viewpoints, and "crowded" scenes where moving obstacles appear in different places in multiple images of a static structure of interest.

1. Introduction

As in the binocular case, although most early work in multi-view stereopsis (e.g., [12, 15, 19]) tended to match and reconstruct all scene points independently, recent approaches typically cast this problem as a variational one, where the objective is to find the surface minimizing a global photometric discrepancy functional, regularized by explicit smoothness constraints [1, 8, 17, 18, 22, 23] (a geometric consistency term is sometimes added as well [3, 4, 7, 9]). Competing approaches mostly differ in the type of optimization techniques that they use, ranging from local methods such as gradient descent [3, 4, 7], level sets [1, 9, 18], or expectation maximization [21], to global ones such as graph cuts [3, 8, 17, 22, 23]. The variational approach has led to impressive progress, and several of the methods recently surveyed by Seitz et al. [20] achieve a relative accuracy better than 1/200 (1mm for a 20cm wide object) from a set of low-resolution (640 x 480) images. However, it typically requires determining a bounding volume (valid depth range, bounding box, or visual hull) prior to initiating the optimization process, which may not be feasible for outdoor scenes and/or cluttered images.

Footnote: In addition, variational approaches typically involve massive optimization tasks with tens of thousands of coupled variables, potentially limiting the resolution of the corresponding reconstructions (see, however, [18] for a fast GPU implementation). We will revisit tradeoffs between computational efficiency and reconstruction accuracy in Sect. 5.

We propose instead a simple and efficient algorithm for calibrated multi-view stereopsis that does not require any initialization, is capable of detecting and discarding outliers and obstacles, and outputs a (quasi) dense collection of small oriented rectangular patches [6, 13], obtained from pixel-level correspondences and tightly covering the observed surfaces except in small textureless or occluded regions. It does not perform any smoothing across nearby features, yet is currently the top performer in terms of both coverage and accuracy for four of the six benchmark datasets provided in [20]. The keys to its performance are effective techniques for enforcing local photometric consistency and global visibility constraints. Stereopsis is implemented as a match, expand, and filter procedure, starting from a sparse set of matched keypoints, and repeatedly expanding these to nearby pixel correspondences before using visibility constraints to filter away false matches. A simple but effective method for turning the resulting patch model into a mesh suitable for image-based modeling is also presented. The proposed approach is applied to three classes of datasets: objects, where a single, compact object is usually fully visible in a set of uncluttered images taken from all around it, and it is relatively straightforward to extract the apparent contours of the object and compute its visual hull; scenes, where the target object(s) may be partially occluded and/or embedded in clutter, and the range of viewpoints may be severely limited, preventing the computation of effective bounding volumes (typical examples are outdoor scenes with buildings or walls); and crowded scenes, where moving obstacles appear in different places in multiple images of a static structure of interest (e.g., people passing in front of a building).

Figure 1. Overall approach. From left to right: a sample input image; detected features; reconstructed patches after the initial matching; final patches after expansion and filtering; polygonal surface extracted from reconstructed patches.

Techniques such as space carving [12, 15, 19] and variational methods based on gradient descent [3, 4, 7], level sets [1, 9, 18], or graph cuts [3, 8, 17, 22, 23] typically require an initial bounding volume and/or a wide range of viewpoints. Object datasets are the ideal input for these algorithms, but methods using multiple depth maps [5, 21] or small, independent surface elements [6, 13] are better suited to the more challenging scene datasets. Crowded scenes are even more difficult. The method proposed in [21] uses expectation maximization and multiple depth maps to reconstruct a crowded scene despite the presence of occluders, but it is limited to a small number of images (typically three). As shown by qualitative and quantitative experiments in the rest of this paper, our algorithm effectively handles all three types of data, and, in particular, outputs accurate object and scene models with fine surface detail despite low-texture regions, large concavities, and/or thin, high-curvature parts. As noted earlier, it implements multi-view stereopsis as a simple match, expand, and filter procedure (Fig. 1): (1) Matching: features found by Harris and Difference-of-Gaussians operators are matched across multiple pictures, yielding a sparse set of patches associated with salient image regions. Given these initial matches, the following two steps are repeated $n$ times ($n = 3$ in all our experiments): (2) Expansion: a technique similar to [11, 13, 16] is used to spread the initial matches to nearby pixels and obtain a dense set of patches. (3) Filtering: visibility constraints are used to eliminate incorrect matches lying either in front of or behind the observed surface. This approach is similar to the method proposed by Lhuillier and Quan [13], but their expansion procedure is greedy, while our algorithm iterates between expansion and filtering steps, which allows us to process complicated surfaces. Furthermore,
outliers cannot be handled in their method. These differences also hold for the approach of Kushal and Ponce [11] in comparison to ours. In addition, only a pair of images can be handled at once in [11], while our method can process an arbitrary number of images uniformly.

2. Key Elements of the Proposed Approach

Before detailing our algorithm in Sect. 3, we define here the patches that will make up our reconstructions, as well as the data structures used throughout to represent the input images. We also introduce two other fundamental building blocks of our approach, namely, the methods used to accurately reconstruct a patch once the corresponding image fragments have been matched, and determine its visibility.

2.1. Patch Models

A patch $p$ is a rectangle with center $c(p)$ and unit normal vector $n(p)$ oriented toward the cameras observing it (Fig. 2). We associate with $p$ a reference image $R(p)$, chosen so that its retinal plane is close to parallel to $p$ with little distortion. In turn, $R(p)$ determines the orientation and extent of the rectangle $p$ in the plane orthogonal to $n(p)$, so the projection of one of its edges into $R(p)$ is parallel to the image rows, and the smallest axis-aligned square containing its image covers a $\mu \times \mu$ pixel area (we use values of 5 or 7 for $\mu$ in all of our experiments). Two sets of pictures are also attached to each patch $p$: the images $S(p)$ where $p$ should be visible (despite self-occlusion), but may in practice not be recognizable (due to highlights, motion blur, etc.), or hidden by moving obstacles, and the images $T(p)$ where it is truly found ($R(p)$ is of course an element of $T(p)$). We enforce the following two constraints on the model: First, we enforce local photometric consistency by requiring that the projected textures of every patch $p$ be consistent in at least $\gamma$ images (in other words $|T(p)| \ge \gamma$, with $\gamma = 3$ in all but three of our experiments, where $\gamma$ is set to 2). Second, we enforce global visibility consistency by requiring that no patch $p$ be occluded by any other patch in any image in $S(p)$.

Footnote: A patch $p$ may be occluded in one or several of the images in $S(p)$ by moving obstacles, but these are not reconstructed by our algorithm and thus do not generate occluding patches.

Figure 2. Definition of a patch (left) and of the images associated with it (right). See text for the details.

2.2. Image Models

We associate with each image $I$ a regular grid of $\beta_1 \times \beta_1$ pixel cells $C(i, j)$, and attempt to reconstruct at least one patch in every cell (we use values of 1 or 2 for $\beta_1$ in all our experiments). The cell $C(i, j)$ keeps track of two different sets $Q_t(i, j)$ and $Q_f(i, j)$ of reconstructed patches potentially visible in $C(i, j)$: A patch $p$ is stored in $Q_t(i, j)$ if $I \in T(p)$, and in $Q_f(i, j)$ if $I \in S(p) \setminus T(p)$. We also associate with $C(i, j)$ the depth of the center of the patch in $Q_t(i, j)$ closest to the optical center of the corresponding camera. This amounts to attaching a depth map to $I$, which will prove useful in the visibility calculations of Sect. 2.4.

2.3. Enforcing Photometric Consistency

Given a patch $p$, we use the normalized cross correlation (NCC) $N(p, I, J)$ of its projections into the images $I$ and $J$ to measure their photometric consistency. Concretely, a $\mu \times \mu$ grid is overlaid on $p$ and projected into the two images, the correlated values being obtained through bilinear interpolation. Given a patch $p$, its reference image $R(p)$, and the set of images $T(p)$ where it is truly visible, we can now estimate its position $c(p)$ and its surface normal $n(p)$ by maximizing the average NCC score

\[ \bar{N}(p) = \frac{1}{|T(p)| - 1} \sum_{I \in T(p),\, I \neq R(p)} N(p, R(p), I) \tag{1} \]

with respect to these unknowns. To simplify computations, we constrain $c(p)$ to lie on the ray joining the optical center of the reference camera to the corresponding image point, reducing the number of degrees of freedom of this optimization problem to three (depth along the ray plus yaw and pitch angles for $n(p)$), and use a conjugate gradient method [14] to find the optimal parameters. Simple methods for computing reasonable initial guesses for $c(p)$ and $n(p)$ are given in Sects. 3.1 and 3.2.

2.4. Enforcing Visibility Consistency

The visibility of each patch $p$ is determined by the images $S(p)$ and $T(p)$ where it is (potentially or truly) observed. We use two slightly different methods for constructing $S(p)$ and $T(p)$ depending on the stage of our reconstruction algorithm. In the matching phase (Sect. 3.1), patches are reconstructed from sparse feature matches, and we have to rely on photometric consistency constraints to determine (or rather obtain an initial guess for) visibility. Concretely, we initialize both sets of images as those for which the NCC score exceeds some threshold:

\[ S(p) = T(p) = \{ I \mid N(p, R(p), I) > \alpha_0 \}. \]

On the other hand, in the expansion phase of our algorithm (Sect. 3.2), patches are by construction dense enough to associate depth maps with all images, and $S(p)$ is constructed for each patch by thresholding these depth maps, that is,

\[ S(p) = \{ I \mid d_I(p) \le d_I(i, j) + \rho_1 \}, \]

where $d_I(p)$ is the depth of the center of $p$ along the corresponding ray of image $I$, and $d_I(i, j)$ is the depth recorded in the cell $C(i, j)$ associated with image $I$ and patch $p$. The value of $\rho_1$ is determined automatically as the distance at the depth of $c(p)$ corresponding to an image displacement of $\beta_1$ pixels in $R(p)$.
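The photometric and visibility tests described above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: it assumes each image has already been sampled on the patch grid into a flat list of intensities, and every function and parameter name (`ncc`, `average_ncc`, `photoconsistent_images`, `depth_visible_images`, `samples`, `patch_depth`, `cell_depth`) is hypothetical.

```python
import math


def ncc(a, b):
    """Normalized cross correlation of two equally long lists of samples."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    da = [x - mean_a for x in a]
    db = [x - mean_b for x in b]
    denom = math.sqrt(sum(x * x for x in da) * sum(x * x for x in db))
    # Assumption: a textureless (constant) sample set counts as uncorrelated.
    return sum(x * y for x, y in zip(da, db)) / denom if denom > 0.0 else 0.0


def average_ncc(samples, ref, truly_visible):
    """Mean NCC between the reference image and the other truly visible images."""
    others = [i for i in truly_visible if i != ref]
    if not others:
        return 0.0
    return sum(ncc(samples[ref], samples[i]) for i in others) / len(others)


def photoconsistent_images(samples, ref, candidates, alpha):
    """Images whose NCC with the reference exceeds alpha (reference included)."""
    return {i for i in candidates
            if i == ref or ncc(samples[ref], samples[i]) > alpha}


def depth_visible_images(patch_depth, cell_depth, rho1):
    """Images where the patch lies at most rho1 behind the recorded cell depth."""
    return {i for i, d in patch_depth.items()
            if d <= cell_depth.get(i, math.inf) + rho1}
```

Here `samples` maps an image index to its sampled intensities, while `patch_depth` and `cell_depth` map an image index to the depth of the patch center and to the depth stored in the matching cell. An image with no recorded depth is treated as unoccluded, which is a choice made for this sketch only.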

Once $S(p)$ has been estimated, photometric consistency is used to determine the images where $p$ is truly observed as

\[ T(p) = \{ I \in S(p) \mid N(p, R(p), I) > \alpha_1 \}. \]

This process may fail when the reference image $R(p)$ is itself an outlier, but, as explained in the next section, our algorithm is designed to handle this problem. Iterating its matching and filtering steps also helps improve the reliability and consistency of the visibility information.

3. Algorithm

3.1. Matching

As the first step of our algorithm, we detect corner and blob features in each image using the Harris and Difference-of-Gaussian (DoG) operators. To ensure uniform coverage, we lay over each image a coarse regular grid of $\beta_2 \times \beta_2$ pixel cells, and return as corners and blobs for each cell the $\eta$ local maxima of the two operators with strongest responses (we use $\beta_2 = 32$ and $\eta = 4$ in all our experiments). After these features have been found in each image, they are matched across multiple pictures to reconstruct a sparse set of patches, which are then stored in the grid of cells $C(i, j)$ overlaid on each image (Fig. 3): Consider an image $I$ and denote by $O$ the optical center of the corresponding camera. For each feature $f$ detected in $I$, we collect in the other images the set $F$ of features $f'$ of the same type (Harris or DoG) that lie within 2 pixels from the corresponding epipolar lines, and triangulate the 3D points associated with the pairs $(f, f')$. We then consider these points in order of increasing distance from $O$ as potential patch centers, and return the first patch "photoconsistent" in at least $\gamma$ images (Fig. 3, top).

Footnote: Briefly, let us denote by $G_\sigma$ a 2D Gaussian with standard deviation $\sigma$. The response of the Harris filter at some image point is defined as $H = \det(M) - \lambda\, \mathrm{trace}^2(M)$, where $M = G_{\sigma_1} * (\nabla I \nabla I^T)$, and $\nabla I$ is computed by convolving the image $I$ with the partial derivatives of the Gaussian $G_{\sigma_2}$. The response of the DoG filter is $D = |(G_{\sigma_0} - G_{\sqrt{2}\sigma_0}) * I|$. We use $\sigma_0 = \sigma_1 = \sigma_2 = 1$ pixel and $\lambda = 0.06$ in all of our experiments.

Footnote: Empirically, this heuristic has proven to be effective in selecting mostly correct matches at a modest computational expense.
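To make the feature selection step concrete, the following sketch evaluates the Harris response from the three distinct entries of a 2x2 matrix $M$ and keeps the strongest local maxima in each grid cell. It assumes the response map has already been computed elsewhere (the Gaussian filtering is left out), and the names `harris_response`, `strongest_local_maxima`, `cell_size` and `per_cell` are illustrative rather than taken from the paper.

```python
def harris_response(m_xx, m_xy, m_yy, lam=0.06):
    """det(M) - lam * trace(M)^2 for the symmetric matrix [[m_xx, m_xy], [m_xy, m_yy]]."""
    det = m_xx * m_yy - m_xy * m_xy
    trace = m_xx + m_yy
    return det - lam * trace * trace


def strongest_local_maxima(response, cell_size=32, per_cell=4):
    """Return (x, y) positions of the strongest strict local maxima of each cell.

    `response` is a list of rows. A pixel is a local maximum when it is
    strictly larger than all of its (up to eight) neighbours.
    """
    rows, cols = len(response), len(response[0])
    cells = {}
    for y in range(rows):
        for x in range(cols):
            value = response[y][x]
            neighbours = [response[v][u]
                          for v in range(max(0, y - 1), min(rows, y + 2))
                          for u in range(max(0, x - 1), min(cols, x + 2))
                          if (u, v) != (x, y)]
            if all(value > nb for nb in neighbours):
                key = (x // cell_size, y // cell_size)
                cells.setdefault(key, []).append((value, x, y))
    features = []
    for key in sorted(cells):
        # Keep only the per_cell strongest responses of this cell.
        best = sorted(cells[key], reverse=True)[:per_cell]
        features.extend((x, y) for _, x, y in best)
    return features
```

The defaults mirror the grid size of 32 pixels and the four features per cell quoted in the text. The strict-maximum rule and the ordering of the returned list are choices made for this sketch.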
Figure 3. Feature matching algorithm. Top: An example showing the features $f'$ satisfying the epipolar constraint in images $I_2$ and $I_3$ as they are matched to feature $f$ in image $I_1$ (this is an illustration only, not showing actual detected features). Bottom: The matching algorithm. The values used for $\alpha_0$ and $\alpha_1$ in all our experiments are 0.4 and 0.7 respectively.

    Input: Features detected in each image.
    Output: Initial sparse set of patches P.

    Cover each image with a grid of beta_1 x beta_1 pixel cells;
    For each image I with optical center O
        For each feature f detected in I and lying in an empty cell
            F <- {Features satisfying the epipolar consistency};
            Sort F in an increasing order of distance from O;
            For each feature f' in F
                R(p) <- I;
                c(p) <- {3D point triangulated from f and f'};
                n(p) <- Direction of optical ray from c(p) to O;
                c(p), n(p) <- argmax of the average NCC score of p;
                S(p) <- {I | N(p, R(p), I) > alpha_0};
                T(p) <- {J in S(p) | N(p, R(p), J) > alpha_1};
                If |T(p)| >= gamma
                    register p to the corresponding cells in S(p);
                    exit innermost For loop, and add p to P.

More concretely, for each feature $f'$, we construct the potential surface patch $p$ by triangulating $f$ and $f'$ to obtain an estimate of $c(p)$, assign to $n(p)$ the direction of the optical ray joining this point to $O$, and set $R(p) = I$. After initializing $T(p)$ by using photometric consistency as in Sect. 2.4, we use the optimization process described in Sect. 2.3 to refine the parameters of $c(p)$ and $n(p)$, then initialize $S(p)$ and recompute $T(p)$. Finally, if $p$ satisfies the constraint $|T(p)| \ge \gamma$, we compute its projections in all images in $S(p)$, register it to the corresponding cells, and add it to $P$ (Fig. 3, bottom). Note that since the purpose of this step is only to reconstruct an initial, sparse set of patches, features lying in non-empty cells are skipped for efficiency.
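The geometric part of the matching step (collecting candidates near an epipolar line, triangulating, and ordering by distance from the optical center) might look as follows. This is a sketch under stated assumptions: the epipolar geometry is given as a 3x3 fundamental matrix, viewing rays are supplied directly as origin and direction, and triangulation uses the midpoint of the shortest segment between two rays, which is one common choice and not necessarily the one used by the authors. All names are hypothetical.

```python
import math


def epipolar_line(fundamental, x):
    """Line (a, b, c) in the second image for pixel x = (u, v) of the first."""
    h = (x[0], x[1], 1.0)
    return tuple(sum(row[k] * h[k] for k in range(3)) for row in fundamental)


def epipolar_distance(fundamental, x1, x2):
    """Distance in pixels from x2 to the epipolar line induced by x1."""
    a, b, c = epipolar_line(fundamental, x1)
    return abs(a * x2[0] + b * x2[1] + c) / math.hypot(a, b)


def candidates_near_epipolar_line(fundamental, f, features, max_dist=2.0):
    """Features of the other image lying within max_dist pixels of the line."""
    return [g for g in features
            if epipolar_distance(fundamental, f, g) <= max_dist]


def triangulate_midpoint(o1, d1, o2, d2):
    """Midpoint of the shortest segment between rays o1 + s*d1 and o2 + t*d2."""
    def dot(p, q):
        return sum(x * y for x, y in zip(p, q))
    w = [p - q for p, q in zip(o1, o2)]
    a, b, c = dot(d1, d1), dot(d1, d2), dot(d2, d2)
    d, e = dot(d1, w), dot(d2, w)
    denom = a * c - b * b
    if abs(denom) < 1e-12:
        return None  # parallel rays: no stable intersection
    s = (b * e - c * d) / denom
    t = (a * e - b * d) / denom
    p1 = [o + s * v for o, v in zip(o1, d1)]
    p2 = [o + t * v for o, v in zip(o2, d2)]
    return tuple((u + v) / 2.0 for u, v in zip(p1, p2))


def sort_by_distance(points, optical_center):
    """Candidate patch centers in order of increasing distance from the camera."""
    return sorted(points, key=lambda p: math.dist(p, optical_center))
```

The 2-pixel default for `max_dist` follows the text. Each candidate returned by `sort_by_distance` would then go through the photometric tests before being accepted.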

Also note that the patch generation process may fail if the reference image $R(p)$ is an outlier, for example when $f$ corresponds to a highlight. This does not prevent, however, the reconstruction of the corresponding surface patch from another image. The second part of our algorithm iterates (three times in all our experiments) between an expansion step to obtain dense patches and a filtering step to remove erroneous matches and enforce visibility consistency, as detailed in the next two sections.

3.2. Expansion

At this stage, we iteratively add new neighbors to existing patches until they cover the surfaces visible in the scene. Intuitively, two patches $p$ and $p'$ are considered to be neighbors when they are stored in adjacent cells $C(i, j)$ and $C(i', j')$ of the same image $I$ in $S(p)$, and their tangent planes are close to each other. We only attempt to create new neighbors when necessary, that is, when $Q_t(i', j')$ is empty, and none of the elements of $Q_f(i', j')$ is n-adjacent to $p$, where two patches $p$ and $p'$ are said to be n-adjacent when

\[ |(c(p) - c(p')) \cdot n(p)| + |(c(p) - c(p')) \cdot n(p')| < 2\rho_2. \]

Similar to $\rho_1$, $\rho_2$ is determined automatically as the distance at the depth of the mid-point of $c(p)$ and $c(p')$ corresponding to an image displacement of $\beta_1$ pixels in $R(p)$. A new patch $p'$ is created only when these two conditions are verified.
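The n-adjacency test and the decision to create a neighbor translate directly into code. The sketch below represents a patch as a (center, normal) pair of 3-tuples and a cell as two plain lists; these representations and the names `n_adjacent` and `needs_new_neighbor` are assumptions made for illustration.

```python
def dot(a, b):
    """Dot product of two 3-vectors."""
    return sum(x * y for x, y in zip(a, b))


def n_adjacent(c1, n1, c2, n2, rho2):
    """True when both centers lie close to each other's tangent plane."""
    diff = [x - y for x, y in zip(c1, c2)]
    return abs(dot(diff, n1)) + abs(dot(diff, n2)) < 2.0 * rho2


def needs_new_neighbor(patch, qt_cell, qf_cell, rho2):
    """Decide whether an adjacent cell should receive a new patch.

    qt_cell and qf_cell are the lists of (center, normal) patches already
    stored in the adjacent cell (truly visible / potentially visible).
    """
    if qt_cell:
        return False  # the cell already holds a truly visible patch
    center, normal = patch
    return not any(n_adjacent(center, normal, c2, n2, rho2)
                   for c2, n2 in qf_cell)
```

In practice `rho2` would be derived from the camera geometry as described above. Here it is passed in as a plain number.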

The new patch $p'$ is initialized by assigning to $R(p')$, $T(p')$, and $n(p')$ the corresponding values for $p$, and assigning to $c(p')$ the point where the viewing ray passing through the center of $C(i', j')$ intersects the plane containing the patch $p$. Next, $c(p')$ and $n(p')$ are refined by the optimization procedure discussed in Sect. 2.3, and $S(p')$ is initialized from the depth maps as explained in Sect. 2.4. Since some matches (and thus the corresponding depth map information) may be incorrect at this point, the elements of $T(p')$ are added to $S(p')$ to avoid missing any image where $p'$ may be visible. Finally, after updating $T(p')$ using photometric constraints as in Sect. 2.4, we accept the patch $p'$ if $|T(p')| \ge \gamma$ still holds, then register it to $Q_t(i', j')$ and $Q_f(i', j')$, and update the depth maps associated with images in $S(p')$. See Fig. 4 for the algorithm.

Footnote: Intuitively, any patch $p'$ in $Q_t(i', j')$ would either already be a neighbor of $p$, or be separated from it by a depth discontinuity, neither case warranting the addition of a new neighbor.

3.3. Filtering

Two filtering steps are applied to the reconstructed patches to further enforce visibility consistency and remove erroneous matches. The first filter focuses on removing patches that lie outside the real surface (Fig. 5, left): Consider a patch $p_0$ and denote by $U$ the set of patches that it occludes. We remove $p_0$ as an outlier when

\[ |T(p_0)|\, \bar{N}(p_0) < \sum_{p_j \in U} \bar{N}(p_j) \]

(intuitively, when $p_0$ is an outlier, both $\bar{N}(p_0)$ and $|T(p_0)|$ are expected to be small, and $p_0$ is likely to be removed). The second filter focuses on outliers lying inside the actual surface (Fig. 5, right): We simply recompute $S(p_0)$ and $T(p_0)$ for each patch $p_0$ using the depth maps associated with the corresponding images (Sect. 2.4), and remove $p_0$ when $|T(p_0)| < \gamma$.
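The two visibility filters reduce to simple comparisons once the per-patch quantities are available. The sketch below assumes the average NCC scores and the recomputed visible sets have been obtained elsewhere. The first test is written as a comparison between the patch's own support (number of truly visible images times its average NCC) and the summed average NCC of the patches it occludes, which is this sketch's reading of the criterion. The function names are hypothetical.

```python
def is_outside_outlier(num_truly_visible, avg_ncc, occluded_avg_nccs):
    """First filter: the patch loses against the patches it occludes.

    num_truly_visible  -- number of images where the patch is truly visible
    avg_ncc            -- average NCC score of the patch
    occluded_avg_nccs  -- average NCC scores of the patches it occludes
    """
    return num_truly_visible * avg_ncc < sum(occluded_avg_nccs)


def is_inside_outlier(recomputed_truly_visible, gamma=3):
    """Second filter: too few images still see the patch after the depth test."""
    return len(recomputed_truly_visible) < gamma


def filter_patches(patches, gamma=3):
    """Keep the patches that pass both tests.

    Each patch is a dict with the keys 'visible' (set of image ids),
    'avg_ncc' (float) and 'occluded_avg_nccs' (list of floats).
    """
    return [p for p in patches
            if not is_outside_outlier(len(p['visible']), p['avg_ncc'],
                                      p['occluded_avg_nccs'])
            and not is_inside_outlier(p['visible'], gamma)]
```

The dictionary layout in `filter_patches` is only a convenience for the example. A real implementation would carry these values on its own patch structure.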
Figure 4. Patch expansion algorithm.

    Input: Patches P from the feature matching step.
    Output: Expanded set of reconstructed patches.

    Use P to initialize, for each image, Q_f, Q_t, and its depth map.
    While P is not empty
        Pick and remove a patch p from P;
        For each image I where p is visible and cell C(i, j) that p projects onto
            For each cell C(i', j') adjacent to C(i, j) such that Q_t(i', j') is empty
                    and p is not n-adjacent to any patch in Q_f(i', j')
                Create a new p', copying R(p'), T(p') and n(p') from p;
                c(p') <- Intersection of optical ray through center of C(i', j') with plane of p;
                n(p'), c(p') <- argmax of the average NCC score of p';
                S(p') <- {Visible images of p' estimated by the current depth maps} united with T(p');
                T(p') <- {J in S(p') | N(p', R(p'), J) > alpha_1};
                If |T(p')| < gamma, go back to For-loop;
                Add p' to P;
                Update Q_t, Q_f and depth maps for S(p');
    Return all the reconstructed patches stored in Q_f and Q_t.

Figure 5. Outliers lying outside (left) or inside (right) the correct surface. Arrows are drawn between the patches $p_j$ and the images $I_j$ in $S(p_j)$, while solid arrows correspond to the case where $I_j \in T(p_j)$. $U$ denotes a set of patches occluded by an outlier. See text for details.

Note that the recomputed values of $S(p_0)$ and $T(p_0)$ may be different from those obtained in the expansion step since more patches have been computed after the reconstruction of $p_0$. Finally, we enforce a weak form of regularization as follows: For each patch $p$, we collect the patches lying in its own and adjacent cells in all images of $S(p)$. If the proportion of patches that are n-adjacent to $p$ in this set is lower than 0.25, $p$ is removed as an outlier. The threshold $\alpha_1$ is initialized with 0.7, and lowered by 0.2 after each expansion/filtering iteration.

4. Polygonal Surface Reconstruction

The reconstructed patches form an oriented point, or surfel model. Despite the growing popularity of this type of model in the computer graphics community [10], it remains desirable to turn our collection of patches into surface meshes for image-based modeling applications. The approach that we have adopted is a variant of the iterative deformation algorithm presented in [4], and consists of two phases. Briefly, after initializing a polygonal surface $S$ from a predetermined bounding volume, the convex hull of the reconstructed points, or a set of small hemispheres centered at these points and pointing away from the cameras, we repeatedly move each vertex $v$ according to three forces (Fig. 6): a smoothness term for regularization; a photometric consistency term, which is based on the reconstructed patches in the first phase, but is computed solely from the mesh in the second phase; and, when accurate silhouettes are available, a rim consistency term pulling the rim of the deforming surface toward the corresponding visual cones.

Figure 6. Polygonal surface reconstruction. Left: bounding volumes for the dino (visual hull), steps (convex hull), and city-hall (union of hemispheres) datasets featured in Figs. 7, 9, and 10. Right: geometric elements driving the deformation process.

Concretely, the smoothness term is $\zeta_1 \Delta v - \zeta_2 \Delta^2 v$, where $\Delta$ denotes the (discrete) Laplacian operator relative to a local parameterization of the tangent plane in $v$ ($\zeta_1 = 0.6$ and $\zeta_2 = 0.4$ are used in all our experiments). In the first phase, the photometric consistency term for each vertex $v$ essentially drives the surface towards reconstructed patches and is given by $\nu(v)\, n(v)$, where $n(v)$ is the inward unit normal to $S$ in $v$, $\nu(v) = \max(-\tau, \min(\tau, d(v)))$, and $d(v)$ is the signed distance between $v$ and the true surface $S^*$ along $n(v)$ (the parameter $\tau$ is used to bound the magnitude of the force, ensure stable deformation and avoid self-intersections; its value is fixed as 0.2 times the average edge length in $S$). In turn, $d(v)$ is estimated as follows: We collect the set $\Pi(v)$ of $\pi = 10$ patches $p'$ with (outward) normals compatible with that of $v$ (that is, $n(p') \cdot n(v) < 0$, see Fig. 6) that lie closest to the line defined by $v$ and $n(v)$, and compute $d(v)$ as the weighted average distance from $v$ to the centers of the patches in $\Pi(v)$ along $n(v)$, that is,

\[ d(v) = \sum_{p' \in \Pi(v)} w(p')\, [\, n(v) \cdot (c(p') - v) \,], \]

where the weights $w(p')$ are Gaussian functions of the distance between $c(p')$ and the line, with standard deviation $\rho_1$ defined as before, and normalized to sum to 1.
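The first-phase photometric term can be illustrated with a short function that gathers the compatible patches nearest to the line through a vertex, forms the Gaussian-weighted average of their signed offsets along the normal, and clamps the result. This is a minimal sketch: patches are given as (center, normal) pairs, `rho1` and `tau` are passed in as plain numbers, and the name `photometric_force` is hypothetical.

```python
import math


def photometric_force(v, normal, patches, rho1, tau, count=10):
    """Clamped signed distance from vertex v to the patches along `normal`.

    v, normal -- vertex position and inward unit normal (3-tuples)
    patches   -- list of (center, patch_normal) pairs
    rho1      -- standard deviation of the Gaussian weights
    tau       -- bound on the magnitude of the returned value
    count     -- number of closest compatible patches to use
    """
    candidates = []
    for center, patch_normal in patches:
        # Outward patch normals must oppose the inward vertex normal.
        if sum(a * b for a, b in zip(patch_normal, normal)) >= 0.0:
            continue
        offset = [c - p for c, p in zip(center, v)]
        along = sum(o * n for o, n in zip(offset, normal))
        perp = [o - along * n for o, n in zip(offset, normal)]
        line_dist = math.sqrt(sum(x * x for x in perp))
        candidates.append((line_dist, along))
    nearest = sorted(candidates)[:count]
    weights = [math.exp(-d * d / (2.0 * rho1 * rho1)) for d, _ in nearest]
    total = sum(weights)
    if total == 0.0:
        return 0.0  # no usable patch near the line: apply no force
    d_v = sum(w * along for w, (_, along) in zip(weights, nearest)) / total
    return max(-tau, min(tau, d_v))
```

The returned scalar multiplies the vertex normal to give the displacement. Returning zero when no patch qualifies is a choice made for this sketch.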

In the second phase, the photometric consistency term is computed for each vertex by using the patch optimization routine as follows. At each vertex $v$, we create a patch $p$ by initializing $c(p)$ with $v$, $n(p)$ with a surface normal estimated at $v$ on $S$, and a set of visible images $S(p)$ from a depth-map testing on the mesh $S$ at $v$, then apply the patch optimization routine described in Sect. 2.3. Let $c^*(p)$ denote the value of $c(p)$ after the optimization, then $c^*(p) - c(p)$ is used as the photometric consistency term. In the first phase, we iterate until convergence, remesh, increase the resolution of the surface, and repeat the process until the desired resolution is obtained (in particular, until image projections of edges of the mesh become approximately $\beta_1$ pixels in length, see [4] for details). The second phase is applied to the mesh only in its desired resolution as a final refinement.

Table 1. Characteristics of the datasets used in our experiments. The roman and skull datasets have been acquired in our lab, while other datasets have been kindly provided by S. Seitz, B. Curless, J. Diebel, D. Scharstein, and R. Szeliski (temple and dino, see also [20]); C. Hernández Esteban, F. Schmitt and the Museum of Cherbourg (polynesian); S. Sullivan and Industrial Light and Magic (face, face-2, body, steps, and wall); and C. Strecha (city-hall and brussels). Entries marked "?" are not legible in this copy.

    Name         Images   Image Size
    roman        48       1800 x 1200
    temple       16       640 x 480
    dino         16       640 x 480
    skull        24       2000 x 2000
    polynesian   36       1700 x 2100
    face         ?        1400 x 2200
    face-2       13       1500 x 1400
    body         ?        1400 x 2200
    steps        ?        1500 x 1400
    city-hall    ?        3000 x 2000
    wall         ?        1500 x 1400
    brussels     ?        2000 x 1300

5. Experiments and Discussion

We have implemented the proposed approach in C++, using the WNLIB [14] implementation of conjugate gradient in the patch optimization routine. The datasets used in our experiments are listed in Table 1, together with the number of input images, their approximate size and a choice of parameters for each data set. Note that all the parameters except for $\beta_1$, $\mu$, and $\gamma$ have been fixed in our experiments.

We have first tested our algorithm on object datasets (Figs. 1 and 7) for which a segmentation mask is available in each image. A visual hull model is thus used to initialize the iterative deformation process for all these datasets, except for face and body, where a limited set of viewpoints is available, and the convex hull of the reconstructed patches is used instead. The segmentation mask is also used by our stereo algorithm, which simply ignores the background during feature detection and matching. The rim consistency term has only been used in the surface deformation process for the roman and skull datasets, for which accurate contours are available. The bounding volume information has not been used to filter out erroneous matches in our experiments. Our algorithm has successfully reconstructed various surface structures such as the high-curvature and/or shallow surface details of roman, the thin cheek bone and deep eye sockets of skull, and the intricate facial features of face and face-2.

Quantitative comparisons kindly provided by D. Scharstein on the datasets presented in [20] show that the proposed method outperforms all the other evaluated techniques in terms of accuracy (distance $d$ such that a given percentage of the reconstruction is within $d$ from the ground truth model) and completeness (percentage of the ground truth model that is within a given distance from the reconstruction) on four out of the six datasets. The datasets consist of two objects (temple and dino), each of which constitutes three datasets (sparse ring, ring, and full) with different numbers of input images, ranging from 16 to more than 300, and our method achieves the best accuracy and completeness on all the dino datasets and the smallest sparse ring temple. Note that the sparse ring temple and dino datasets consisting of 16 views have been shown in Fig. 7 and their quantitative comparison with the top performers [..., 18, 21, 22, 23] is given in Fig. 8. Finally, the bottom part of Fig. 8 compares our algorithm with Hernández Esteban's method [7], which is one of the best multi-view stereo reconstruction algorithms today, for the polynesian dataset, where a laser scanned model is used as a ground truth.
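The two evaluation measures defined above can be written down for point sets as follows. This is only an illustrative sketch: it treats both the reconstruction and the ground truth as lists of 3D points and uses brute-force nearest-neighbor search, whereas the benchmark in [20] operates on surface models. The function names and the `percentage` and `threshold` parameters are assumptions, with no particular benchmark setting implied.

```python
import math


def nearest_distance(point, cloud):
    """Distance from `point` to its closest point in `cloud`."""
    return min(math.dist(point, q) for q in cloud)


def accuracy(reconstruction, ground_truth, percentage=90.0):
    """Smallest d such that `percentage` % of the reconstruction is within d."""
    dists = sorted(nearest_distance(p, ground_truth) for p in reconstruction)
    index = max(0, math.ceil(percentage * len(dists) / 100.0) - 1)
    return dists[index]


def completeness(reconstruction, ground_truth, threshold):
    """Percentage of ground truth points within `threshold` of the reconstruction."""
    within = sum(1 for g in ground_truth
                 if nearest_distance(g, reconstruction) <= threshold)
    return 100.0 * within / len(ground_truth)
```

Lower accuracy values and higher completeness values are better. For large models a spatial index would replace the quadratic search used here.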

As shown by the close-ups in Fig. 8, our model is qualitatively better than Hernández's model, especially at sharp concave structures. This is also shown quantitatively using the same accuracy and completeness measures as before.

Reconstruction results for scene datasets are shown in Fig. 9. Additional information (such as segmentation masks, bounding boxes, or valid depth ranges) is not available in this case. The city-hall example is interesting because viewpoints change significantly across input cameras, and part of the building is only visible in some of the frames. Nonetheless, our algorithm has successfully reconstructed the whole scene with fine structural details. The wall dataset is challenging since a large portion of several of the input pictures consists of running water, and the corresponding image regions have successfully been detected as outliers, while accurate surface details have been recovered for the rigid wall structure. Finally, Fig. 10 illustrates our results on crowded scene datasets. Our algorithm reconstructs the background building from the brussels dataset, despite people occluding various parts of the scene. The steps-2 dataset is an artificially generated example, where we have manually painted a red cartoonish human in each image of the steps dataset. To further test the robustness of our algorithm against outliers, the steps-3 dataset has been created from steps-2 by copying its images but replacing the fifth one with the third, without changing camera parameters. This is a particularly challenging example, since the whole fifth image must be detected as an outlier. We have successfully reconstructed the details of both despite these outliers. Note that the convex hull of the reconstructed patches' centers is used for the surface initialization except for city-hall and brussels, for which the union of hemispheres is used.

Footnote: Rendered views of the reconstructions and all the quantitative evaluations can be found at [URL not preserved in this copy].

Figure 7. Sample results on object datasets. From left to right and top to bottom: temple, dino, skull, polynesian, face, face-2, and body datasets. In each case, one of the input images is shown, along with two views of texture-mapped reconstructed patches and shaded polygonal surfaces.

The bottleneck of our multi-view stereo matching algorithm is the patch expansion step, whose running time varies from about 20 minutes, for small datasets such as temple and dino, to up to a few hours for datasets consisting of high-resolution images, such as polynesian and city-hall. The running times of polygonal surface extraction also range from 30 minutes to a few hours depending on the size of datasets. This is comparable to many variational methods [20], despite the fact that our algorithm does not involve any large optimization problem. This is due to several factors: First, unlike algorithms using voxels or discretized depth labels, our method solves a fully continuous optimization problem, thus does not suffer from discretization errors and can handle high-resolution input images directly, but trades speed for accuracy. Second, we use a region-based photometric consistency measure, which is much slower than a point-based measure, but takes into account surface orientation during optimization. In turn, this allows our algorithm to handle gracefully outdoor images with varying illumination. Again, accuracy and speed are conflicting requirements. To conclude, let us note that our future work will be aimed at adding a temporal component to our reconstruction algorithm, with the aim of achieving markerless face
and body motion capture.

Acknowledgments: This work was supported in part by the National Science Foundation under grant IIS-0535152. We thank S. Seitz, B. Curless, J. Diebel, D. Scharstein, and R. Szeliski for the temple and dino datasets and evaluations, C. Hernández Esteban, F. Schmitt, and the Museum of Cherbourg for polynesian, S. Sullivan, A. Suter, and Industrial Light and Magic for face, face-2, body, steps, and wall, C. Strecha for city-hall and brussels, and finally J. Blumenfeld and S. R. Leigh for the skull data set.

Figure 8. Quantitative comparison with other multi-view stereo algorithms for the temple and dino at the top, and for the polynesian at the bottom. Top panels: Accuracy (temple) and Accuracy (dino), plotting Error [mm] against Ratio [%]; Completeness (temple) and Completeness (dino), plotting Completeness [%] against Error [mm]; curves for Furukawa, Goesele, Pons, Tran, Strecha, Hernandez, Vogiatzis, and the proposed method. Bottom panels: laser range scanner model, our method, and Hernández Esteban's method, with accuracy and completeness curves for our method and Hernandez.

Figure 9. Sample results on scene datasets. From top to bottom: steps, city-hall, and wall datasets.

Figure 10. Sample results on crowded scene datasets. Top row: input images for brussels and steps-2; second and third rows: reconstruction results for brussels, steps-2, and steps-3. Note that the steps-3 dataset is generated by copying steps-2 but replacing its fifth image by the third without changing camera parameters.

References

[1] O. Faugeras and R. Keriven. Variational principles, surface evolution, PDE's, level set methods and the stereo problem. IEEE Trans. Im. Proc., 7(3):336–344, 1998.
[2] V. Ferrari, T. Tuytelaars, and L. Van Gool. Simultaneous object recognition and segmentation by image exploration. In ECCV, 2004.
[3] Y. Furukawa and J. Ponce. Carved visual hulls for image-based modeling. In ECCV, volume 1, 2006.
[4] Y. Furukawa and J. Ponce. High-fidelity image-based modeling. Technical Report CSR 2006-02, University of Illinois at Urbana-Champaign, 2006.
[5] M. Goesele, B. Curless, and S. M. Seitz. Multi-view stereo revisited. In CVPR, pages 2402–2409, 2006.
[6] M. Habbecke and L. Kobbelt. Iterative multi-view plane fitting. In 11th Fall Workshop on Vision, Modeling, and Visualization, 2006.
[7] C. Hernández Esteban and F. Schmitt. Silhouette and stereo fusion for 3D object modeling. CVIU, 96(3), 2004.
[8] A. Hornung and L. Kobbelt. Hierarchical volumetric multi-view stereo reconstruction of manifold surfaces based on dual graph embedding. In CVPR, 2006.
[9] R. Keriven. A variational framework to shape from contours. Technical Report 2002-221, ENPC, 2002.
[10] L. Kobbelt and M. Botsch. A survey of point-based techniques in computer graphics. Computers & Graphics, 28(6):801–814, 2004.
[11] A. Kushal and J. Ponce. A novel approach to modeling 3d objects from stereo views and recognizing them in photographs. In ECCV, volume 2, pages 563–574, 2006.
[12] K. Kutulakos and S. Seitz. A theory of shape by space carving. IJCV, 38(3):199–218, 2000.
[13] M. Lhuillier and L. Quan. A quasi-dense approach to surface reconstruction from uncalibrated images. PAMI, 27(3):418–433, 2005.
[14] W. Naylor and B. Chapman. Wnlib.
[15] M. Okutomi and T. Kanade. A multiple-baseline stereo system. PAMI, 15(4):353–363, 1993.
[16] G. P. Otto and T. K. W. Chau. 'Region-growing' algorithm for matching of terrain images. Image Vision Comput., 7(2):83–94, 1989.
[17] S. Paris, F. Sillion, and L. Quan. A surface reconstruction method using global graph cut optimization. In ACCV, January 2004.
[18] J.-P. Pons, R. Keriven, and O. D. Faugeras. Modelling dynamic scenes by registering multi-view image sequences. In CVPR (2), pages 822–827, 2005.
[19] S. Seitz and C. Dyer. Photorealistic scene reconstruction by voxel coloring. In CVPR, pages 1067–1073, 1997.
[20] S. M. Seitz, B. Curless, J. Diebel, D. Scharstein, and R. Szeliski. A comparison and evaluation of multi-view stereo reconstruction algorithms. CVPR, 1, 2006.
[21] C. Strecha, R. Fransens, and L. V. Gool. Combined depth and outlier estimation in multi-view stereo. In CVPR, pages 2394–2401, 2006.
[22] S. Tran and L. Davis. 3d surface reconstruction using graph cuts with surface constraints. In ECCV, 2006.
[23] G. Vogiatzis, P. H. Torr, and R. Cipolla. Multi-view stereo via volumetric graph-cuts. In CVPR, 2005.