Non-parametric Local Transforms for Computing Visual Correspondence

Ramin Zabih and John Woodfill

Computer Science Department, Cornell University, Ithaca NY 14853-7501, USA
Interval Research Corporation, 1801-C Page Mill Road, Palo Alto CA 94304, USA

In Proceedings of European Conference on Computer Vision, Stockholm, Sweden, May 1994, pages 151-158.

Abstract. We propose a new approach to the correspondence problem that makes use of non-parametric local transforms as the basis for correlation. Non-parametric local transforms rely on the relative ordering of local intensity values, and not on the intensity values themselves. Correlation using such transforms can tolerate a significant number of outliers. This can result in improved performance near object boundaries when compared with conventional methods such as normalized correlation. We introduce two non-parametric local transforms: the rank transform, which measures local intensity, and the census transform, which summarizes local image structure. We describe some properties of these transforms, and demonstrate their utility on both synthetic and real data.

1 Introduction

The correspondence problem is a fundamental problem in vision, as it forms the basis for stereo depth computation and most optical flow algorithms. Given two images of the same scene, a pixel in one image corresponds to a pixel in the other if both pixels are projections along lines of sight of the same physical scene element. If the two images are temporally consecutive, then computing correspondence determines motion. If the two images are spatially separated but simultaneous, then computing correspondence determines stereo depth. Area-based approaches to the correspondence problem [4] find a dense solution, usually by relying on some kind of statistical correlation between local intensity regions.

In this paper we propose a new area-based approach to the correspondence problem, based on non-parametric local transforms followed by correlation. We begin by motivating our approach, then show how non-parametric local transforms can be used to determine correspondence. In section 3 we introduce the rank and census transforms, and describe their properties. We give empirical evidence of the performance of our methods in section 4, using both natural and synthetic images. Finally, in section 5 we survey related work and discuss some planned extensions.

2 Non-parametric local transforms

Our approach to the correspondence problem is first to apply a local transform to the image, and then to use correlation. In this respect, our work is similar to that of Nishihara [12] and Seitz [14, 1]. Nishihara's transform is the sign bit of the image after convolution with a Laplacian, while Seitz's transform is the direction of the intensity gradient.

Most approaches to the correspondence problem have difficulty near discontinuities in disparity, which occur at the boundaries of objects. Near such a boundary, the pixels in a local region represent scene elements from two distinct intensity populations. Some of the pixels come from the object, and some from other parts of the scene. As a result, the local pixel distribution will in general be multimodal near a boundary. This poses a problem for many correspondence algorithms, such as normalized correlation [6].

Correspondence algorithms are usually based on standard statistical methods, which are best suited to a single population. Parametric measures, such as the mean or variance, do not behave well in the presence of distinct sub-populations, each with its own coherent parameters. This problem, which we will refer to as factionalism, is a major issue in computer vision, and has been addressed with a variety of methods, including robust statistics [2, 3], Markov Random Fields [5] and regularization [13].

The fundamental idea behind our approach is to define a local image transform that tolerates factionalism. Correspondence can be computed by transforming both images and then using correlation. For this approach to succeed, the transform must result in significant local variation within a given image; in addition, it must give similar results near corresponding points between the two images. (Marr and Nishihara [10] refer to these two properties as sensitivity and stability.) Finally, to handle stereo imagery, the transform should be invariant under changes in image gain and bias.

Our approach relies on local transforms based on non-parametric measures that are designed to tolerate factionalism. Non-parametric statistics [9] is distinguished by the use of ordering information among data, rather than the data values themselves. Non-parametric local transforms, which we introduced in [15], are local image transformations that rely on the relative ordering of intensities, and not on the intensity values themselves.

3 The rank transform and the census transform

We next describe two non-parametric local transforms. The first, called the rank transform, is a non-parametric measure of local intensity. The second, called the census transform, is a non-parametric summary of local spatial structure.

Let $P$ be a pixel, $I(P)$ its intensity (usually an 8-bit integer), and $N(P)$ the set of pixels in some square neighborhood of diameter $d$ surrounding $P$. All non-parametric transforms depend upon the comparative intensities of $P$ versus the pixels in the neighborhood $N(P)$. The transforms we will discuss only depend on the sign of the comparison. Define $\xi(P, P')$ to be 1 if $I(P') < I(P)$ and 0 otherwise. The non-parametric local transforms depend solely on the set of pixel comparisons, which is the set of ordered pairs

$$\Xi(P) = \bigcup_{P' \in N(P)} \bigl(P', \xi(P, P')\bigr).$$

They differ in terms of their exact reliance on $\Xi$.

The first non-parametric local transform is called the rank transform, and is defined as the number of pixels in the local region whose intensity is less than the intensity of the center pixel. Formally, the rank transform $R(P)$ is

$$R(P) = \bigl\| \{ P' \in N(P) \mid I(P') < I(P) \} \bigr\|.$$

Note that $R(P)$ is not an intensity at all, but rather an integer in the range $\{0, \ldots, d^2 - 1\}$. This distinguishes the rank transform from other attempts to use non-parametric measures such as median filters, mode filters or rank filters [7]. To compute correspondence, we have used $L_1$ correlation (minimizing the sum of absolute values of differences) on the rank-transformed images.

The second non-parametric transform is named the census transform. $R_\tau(P)$ maps the local neighborhood surrounding a pixel $P$ to a bit string representing the set of neighboring pixels whose intensity is less than that of $P$. Let $N(P) = P \oplus D$, where $\oplus$ is the Minkowski sum and $D$ is a set of displacements, and let $\otimes$ denote concatenation. The census transform can then be specified,

$$R_\tau(P) = \bigotimes_{[i,j] \in D} \xi(P, P + [i,j]).$$

Two pixels of census transformed images are compared for similarity using the Hamming distance, i.e. the number of bits that differ in the two bit strings. To compute correspondence, we have minimized the Hamming distance after applying the census transform.

These local transforms rely solely upon the set of comparisons $\Xi$, and are therefore invariant under changes in gain or bias. The tolerance of these transforms for factionalism also results from their reliance upon $\Xi$. If a minority of pixels in a local neighborhood has a very different intensity distribution than the majority, only comparisons involving a member of the minority are affected. Such pixels do not make a contribution proportional to their intensity, but proportional to their number. This limited dependence on the minority's intensity values is a major distinction between our approach and parametric measures.

To illustrate the manner in which these transforms tolerate factionalism, consider a three-by-three region of an image whose intensities are

    127 127 129
    126 128 129
    127 131  A

for some value $0 \le A < 256$. Consider the effect on various parametric and non-parametric measures, computed at the center of this region, as $A$ varies over its 256 possible values. The mean of this region varies from 114 to 142, while the variance ranges from 2 to 1823. (For convenience, we are rounding the actual values.) These parametric measures exhibit continuous variation over a substantial range as $A$ changes.

Non-parametric transforms are more stable, however. All the elements of $\Xi$ except one will remain fixed as $A$ changes. $\Xi$ will be

    1 1 0
    1   0
    1 0 a

where $a$ is 1 if $A < 128$, and otherwise 0. The census transform simply results in the bits of $\Xi$ in some canonical ordering, such as $\{1, 1, 0, 1, 0, 1, 0, a\}$. The rank transform will give 5 if $A < 128$, and otherwise 4.

This comparison shows the tolerance that non-parametric measures have for factionalism. A minority of pixels can have a very different value, but the effect on the rank and census transforms is limited by the size of the minority.

4 Empirical results

We have implemented these non-parametric local transforms, and have explored their behavior on both real and synthetic imagery. The motivation for our approach was to obtain better results near the edges of objects. We have obtained comparative results on synthetic data which show that our methods can outperform normalized correlation.

Fig. 1. Comparison of rank, normalized and SSD correlation on Aschwanden data-set with salt-and-pepper noise. (Axes: template size versus correct matches.)

In [1], Aschwanden and Guggenbühl have described the performance of a number of area-based stereo algorithms under several different noise models.
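Before the comparisons, the definitions of Section 3 and the three-by-three example can be checked with a short sketch. This is a pure-Python illustration, not the authors' implementation. The function names (`xi`, `neighbours`, `rank_transform`, `census_transform`, `region`, `flatten`) and the row-major bit ordering are assumptions made for this example. So is the use of the sample variance (dividing by n - 1), which is the choice that reproduces the rounded range quoted for the example.

```python
from statistics import mean, variance

def xi(centre, other):
    """Comparison xi(P, P'): 1 if I(P') < I(P), otherwise 0."""
    return 1 if other < centre else 0

def neighbours(img, r, c, radius=1):
    """Intensities of N(P) around (r, c) in row-major order, centre excluded."""
    return [img[r + dr][c + dc]
            for dr in range(-radius, radius + 1)
            for dc in range(-radius, radius + 1)
            if (dr, dc) != (0, 0)]

def rank_transform(img, r, c, radius=1):
    """Number of neighbours whose intensity is less than the centre's."""
    return sum(xi(img[r][c], v) for v in neighbours(img, r, c, radius))

def census_transform(img, r, c, radius=1):
    """List of comparison bits, one per neighbour (assumed row-major order)."""
    return [xi(img[r][c], v) for v in neighbours(img, r, c, radius)]

def region(a):
    """The three-by-three example region with free corner value a."""
    return [[127, 127, 129],
            [126, 128, 129],
            [127, 131, a]]

def flatten(img):
    return [v for row in img for v in row]

# Parametric measures vary continuously as A sweeps 0..255.
means = [mean(flatten(region(a))) for a in range(256)]
variances = [variance(flatten(region(a))) for a in range(256)]  # sample variance
print("mean range:", round(min(means)), "to", round(max(means)))
print("variance range:", round(min(variances)), "to", round(max(variances)))

# Non-parametric measures take only two distinct values each.
ranks = sorted({rank_transform(region(a), 1, 1) for a in range(256)})
codes = sorted({tuple(census_transform(region(a), 1, 1)) for a in range(256)})
print("rank values:", ranks)
print("census codes:", codes)
```

Sweeping A this way makes the contrast concrete: the mean and variance each take many values, while the rank and the census code each take exactly two, switching at A = 128.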
Figure 1 compares correlation with the rank transform against two standard stereo algorithms, namely normalized correlation and sum of squared differences (SSD) correlation. Performance is measured as a function of template radius, as described in [1].

Fig. 2. Right and left random-dot stereograms

Fig. 3. Disparities from normalized correlation, rank and census transforms

Another way to compare correlation methods is with random dot imagery. Figure 2 shows a random dot stereogram of a square floating in front of a flat surface, on which there is a vertical intensity edge. The images are noise-free, but the intensities differ by fixed gain and bias. Figure 3 shows the disparities computed from normalized correlation and from correlation with the rank and census transforms. There should only be 2 disparities in this scene: one for the background surface (which is at disparity 0), and one for the foreground square (which is at disparity 104). Notice the comparatively poor performance of normalized correlation near the edges, where it introduces spurious disparities. The performance of our approach can be seen by counting the pixels with incorrect disparities, as shown below.

    Algorithm           Incorrect matches
    Normalized          1385
    Rank transform       609
    Census transform     407
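The gain-and-bias setting of this experiment can be imitated on a toy scale. The sketch below is an illustration under stated assumptions, not the procedure used to produce Figure 3. It builds a small random image pair in which the left image is the right image shifted by a known disparity and rescaled by a gain and a bias. It then census-transforms both images and picks, for each tested pixel, the disparity that minimizes the Hamming distance summed over a correlation window. The names (`census_image`, `hamming`, `best_disparity`), the image size, the 3x3 transform and correlation windows, the gain of 2, the bias of 10 and the random seed are all hypothetical choices.

```python
import random

def census_image(img, radius=1):
    """Census transform of every interior pixel, packed into an integer."""
    h, w = len(img), len(img[0])
    out = [[0] * w for _ in range(h)]
    for r in range(radius, h - radius):
        for c in range(radius, w - radius):
            bits = 0
            for dr in range(-radius, radius + 1):
                for dc in range(-radius, radius + 1):
                    if (dr, dc) != (0, 0):
                        less = img[r + dr][c + dc] < img[r][c]
                        bits = (bits << 1) | int(less)
            out[r][c] = bits
    return out

def hamming(a, b):
    """Number of bit positions in which two census codes differ."""
    return bin(a ^ b).count("1")

def best_disparity(left_c, right_c, r, c, max_disp, win=1):
    """Disparity d minimizing the windowed Hamming cost (winner-take-all)."""
    costs = []
    for d in range(max_disp + 1):
        cost = sum(hamming(left_c[r + dr][c + dc], right_c[r + dr][c + dc - d])
                   for dr in range(-win, win + 1)
                   for dc in range(-win, win + 1))
        costs.append(cost)
    return costs.index(min(costs))

random.seed(0)
H, W, TRUE_D = 12, 40, 3
right = [[random.randrange(256) for _ in range(W)] for _ in range(H)]
# Left image: right image shifted by TRUE_D columns, with gain 2 and bias 10.
left = [[2 * right[r][max(c - TRUE_D, 0)] + 10 for c in range(W)]
        for r in range(H)]

left_c, right_c = census_image(left), census_image(right)
# Test only pixels whose windows stay inside the valid interior for all d.
found = {best_disparity(left_c, right_c, r, c, max_disp=6)
         for r in range(3, 9) for c in range(12, 30)}
print("disparities found:", found)
```

Because a positive gain and a constant bias preserve the ordering of intensities, the census codes of corresponding pixels are identical, so the matching cost at the true disparity is zero regardless of the gain and bias chosen.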
On this example, the non-parametric local transforms appear to exhibit better performance than normalized correlation.

The best evidence in favor of the non-parametric local transforms is their performance on real images. We have used the rank transform and the census transform on a number of different images to obtain stereo depth. Depth maps are shown with lighter shades indicating larger disparities and thus nearer scene elements. All the depth maps shown were generated with the same parameters (a transform radius of 7 pixels, and a correlation radius of 4 pixels).

Figure 4 shows a beam-splitter image of a puppet (Elmo from the television show "Sesame Street"). The depth results of the non-parametric local transforms are shown in figure 5. Figure 6 shows an image from a tree sequence captured by moving a camera along a rail, and the depth results from the transforms.

5 Related work and planned extensions

The algorithms we describe are related to non-parametric measures of association, such as Spearman's correlation coefficient or Kendall's $\tau$. These are measures of association of paired data that are based upon comparisons. However, such measures are very expensive to compute, and do not capture the spatial structure of images.

Probably the most similar approach to ours is the work based on robust statistics [2, 11, 3]. Robust statistics differs from our approach in that it emphasizes reducing the influence of outliers. Implicit in this work is the assumption that outliers are distributed randomly. However, at the edges of objects, factionalism produces outliers with consistent distributions. Our approach tolerates outliers with consistent distributions, and does not allow pixels from a small faction to contribute in a manner proportional to their intensity.

One limitation of the non-parametric transforms we have described is that the amount of information they associate with a pixel is not very large. We hope to address this shortcoming by combining a number of different non-parametric transforms into a vector of measures associated with a pixel. Ultimately, we would like to avoid the correlation phase altogether and simply match pixels according to a set of semi-independent measures, in a manner similar to that proposed by Kass [8].

Another limitation of our approach is that the local measures rely heavily upon the intensity of the center pixel. This has not been an issue in practice, but we propose to address it by doing comparisons from a local median intensity instead of $I(P)$. An additional idea we intend to pursue is to generalize $\xi$, which currently uses the sign of the intensity differences. We plan to explore using higher-order differences, as well as the information contained in the total ordering of the local pixel intensities.

We are also interested in efficient algorithms for implementing such transforms. [15] describes a number of fast algorithms for computing the rank transform based on dynamic programming. We have recently implemented an approximation of the census transform on a Sun workstation, which produces stereo depth with 24 disparities on 640 by 240 images at 1-2 frames per second.

(The tree imagery appears courtesy of Harlyn Baker and Bob Bolles.)

Acknowledgements

Portions of this work were done while the first author was at the Computer Science Department at Stanford University, supported by a fellowship from the Fannie and John Hertz Foundation. We wish to thank SRI for the use of their Connection Machine.

References

1. P. Aschwanden and W. Guggenbühl. Experimental results from a comparative study on correlation-type registration algorithms. In Förstner and Ruwiedel, editors, Robust Computer Vision, pages 268-289. Wichmann, 1993.
2. Paul Besl, Jeffrey Birch, and Layne Watson. Robust window operators. In International Conference on Computer Vision, pages 591-600, 1988.
3. Michael Black and P. Anandan. A framework for the robust estimation of optical flow. In International Conference on Computer Vision, pages 231-236, 1993.
4. U. Dhond and J. Aggarwal. Structure from stereo: a review. IEEE Transactions on Systems, Man and Cybernetics, 19(6), 1989.
5. Stuart Geman and Donald Geman. Stochastic relaxation, Gibbs distributions, and the Bayesian restoration of images. IEEE PAMI, 6:721-741, 1984.
6. Marsha Jo Hannah. Computer Matching of Areas in Stereo Images. PhD thesis, Stanford, 1974.
7. R. Hodgson, D. Bailey, M. Naylor, A. Ng, and S. McNeill. Properties, implementations and applications of rank filters. Journal of Image and Vision Computing, 3(1):3-14, February 1985.
8. Michael Kass. Computing visual correspondence. ARPA Image Understanding Proceedings, pages 54-60, 1983.
9. E. L. Lehmann. Nonparametrics: statistical methods based on ranks. Holden-Day, 1975.
10. David Marr and Keith Nishihara. Representation and recognition of the spatial organization of three-dimensional shapes. Proceedings of the Royal Society of London, 200:269-294, 1978.
11. Peter Meer, Doron Mintz, Azriel Rosenfeld, and Dong Yoon Kim. Robust regression methods for computer vision: A review. International Journal of Computer Vision, 6(1):59-70, 1991.
12. H. Keith Nishihara. Practical real-time imaging stereo matcher. Optical Engineering, 23(5):536-545, Sept-Oct 1984.
13. Tomaso Poggio, Vincent Torre, and Christof Koch. Computational vision and regularization theory. Nature, 317:314-319, 1985.
14. Peter Seitz. Using local orientational information as image primitive for robust object recognition. SPIE proceedings, 1199:1630-1639, 1989.
15. Ramin Zabih. Individuating Unknown Objects by Combining Motion and Stereo. PhD thesis, Stanford University, 1994 (forthcoming).
Fig. 4. Elmo stereo pair from beam-splitter

Fig. 5. Rank and census results on Elmo

Fig. 6. Tree image with rank and census correlation results