Fast Normalized Cross-Correlation

J. P. Lewis
Industrial Light & Magic

This is an expanded version of a paper from Vision Interface, 1995 (reference [10]).

Current address: Interval Research, Palo Alto CA, zilla@computer.org

Abstract

Although it is well known that cross correlation can be efficiently implemented in the transform domain, the normalized form of cross correlation preferred for feature matching applications does not have a simple frequency domain expression. Normalized cross correlation has been computed in the spatial domain for this reason. This short paper shows that unnormalized cross correlation can be efficiently normalized by precomputing integrals of the image and image squared over the search window.

1 Introduction

The correlation between two signals (cross correlation) is a standard approach to feature detection [6, 7] as well as a component of more sophisticated techniques (e.g. [3]). Textbook presentations of correlation describe the convolution theorem and the attendant possibility of efficiently computing correlation in the frequency domain using the fast Fourier transform. Unfortunately the normalized form of correlation (correlation coefficient) preferred in template matching does not have a correspondingly simple and efficient frequency domain expression. For this reason normalized cross-correlation has been computed in the spatial domain (e.g., [7], p. 585). Due to the computational cost of spatial domain convolution, several inexact but fast spatial domain matching methods have also been developed [2]. This paper describes a recently introduced algorithm [10] for obtaining normalized cross correlation from transform domain convolution. The new algorithm in some cases provides an order of magnitude speedup over spatial domain computation of normalized cross correlation (Section 5).

Since we are presenting a version of a familiar and widely used algorithm, no attempt will be made to survey the literature on selection of features, whitening, fast convolution techniques, extensions, alternate techniques, or applications. The literature on these topics can be approached through introductory texts and handbooks [16, 13] and recent papers such as [1, 19]. Nevertheless, due to the variety of feature tracking schemes that have been advocated it may be necessary to establish that normalized cross-correlation
remains a viable choice for some if not all applications. This is done in section 3.

In order to make the paper self contained, section 2 describes normalized cross-correlation and section 4 briefly reviews transform domain and other fast convolution approaches and the phase correlation technique. These sections can be skipped by most readers. Section 5 describes how normalized cross-correlation can be obtained from a transform domain computation of correlation. Section 6 presents performance results.

2 Template Matching by Cross-Correlation

The use of cross-correlation for template matching is motivated by the distance measure (squared Euclidean distance)

$$d^2_{f,t}(u,v) = \sum_{x,y} [f(x,y) - t(x-u,y-v)]^2$$

(where $f$ is the image and the sum is over $x,y$ under the window containing the feature $t$ positioned at $u,v$). In the expansion of $d^2$

$$d^2_{f,t}(u,v) = \sum_{x,y} [f^2(x,y) - 2 f(x,y)\,t(x-u,y-v) + t^2(x-u,y-v)]$$

the term $\sum t^2(x-u,y-v)$ is constant. If the term $\sum f^2(x,y)$ is approximately constant then the remaining cross-correlation term

$$c(u,v) = \sum_{x,y} f(x,y)\,t(x-u,y-v) \qquad (1)$$

is a measure of the similarity between the image and the feature.

There are several disadvantages to using (1) for template matching:
- If the image energy $\sum f^2(x,y)$ varies with position, matching using (1) can fail. For example, the correlation between the feature and an exactly matching region in the image may be less than the correlation between the feature and a bright spot.
- The range of $c(u,v)$ is dependent on the size of the feature.
- Eq. (1) is not invariant to changes in image amplitude such as those caused by changing lighting conditions across the image sequence.

The correlation coefficient overcomes these difficulties by normalizing the image and feature vectors to unit length, yielding a cosine-like correlation coefficient

$$\gamma(u,v) = \frac{\sum_{x,y} [f(x,y) - \bar f_{u,v}]\,[t(x-u,y-v) - \bar t\,]}{\left\{ \sum_{x,y} [f(x,y) - \bar f_{u,v}]^2 \, \sum_{x,y} [t(x-u,y-v) - \bar t\,]^2 \right\}^{0.5}} \qquad (2)$$

where $\bar t$ is the mean of the feature and $\bar f_{u,v}$ is the mean of $f(x,y)$ in the region under the feature. We refer to (2) as normalized cross-correlation.

3 Feature Tracking Approaches and Issues

It is clear that normalized cross-correlation (NCC) is not the ideal approach to feature tracking since it is not invariant with respect to imaging scale, rotation, and perspective distortions. These limitations have been addressed in various schemes including some that incorporate NCC as a component. This paper does not advocate the choice of NCC over alternate approaches. Rather, the following discussion will point out some of the issues involved in various
approaches to feature tracking, and will conclude that NCC is a reasonable choice for some applications.

SSDA. The basis of the sequential similarity detection algorithm (SSDA) [2] is the observation that full precision is only needed near the maximum of the cross-correlation function, while reduced precision can be used elsewhere. The authors of [2] describe several ways of implementing reduced precision. An SSDA implementation of cross-correlation proceeds by computing the summation in (1) in random order and uses the partial computation as a Monte Carlo estimate of whether the particular match location will be near a maximum of the correlation surface. The computation at a particular location is terminated before completing the sum if the estimate suggests that the location corresponds to a poor match.

The SSDA algorithm is simple and provides a significant speedup over spatial domain cross-correlation. It has the disadvantage that it does not guarantee finding the maximum of the correlation surface. SSDA performs well when the correlation surface has shallow slopes and broad maxima. While this condition is probably satisfied in many applications, it is evident that images containing arrays of objects (pebbles, bricks, other textures) can generate multiple narrow extrema in the correlation surface and thus mislead an SSDA approach. A secondary disadvantage of SSDA is that it has parameters that need to be determined (the number of terms used to form an estimate of the correlation coefficient, and the early termination threshold on this estimate).

Gradient Descent Search. If it is assumed that feature translation between adjacent frames is small then the translation (and parameters of an affine warp in [19]) can be obtained by gradient
descent [12]. Successful gradient descent search requires that the interframe translation be less than the radius of the basin surrounding the minimum of the matching error surface. This condition may be satisfied in many applications. Image sequences from hand-held cameras can violate this requirement, however: small rotations of the camera can cause large object translations. Small or (as with SSDA) textured templates result in matching error surfaces with narrow extrema and thus constrain the range of interframe translation that can be successfully tracked. Another drawback of gradient descent techniques is that the search is inherently serial, whereas NCC permits parallel implementation.

Snakes. Snakes (active contour models) have the disadvantage that they cannot track objects that do not have a definable contour. Some "objects" do not have a clearly defined boundary (whether due to intrinsic fuzziness or due to lighting conditions), but nevertheless have a characteristic distribution of color that may be trackable via cross-correlation. Active contour models address a more general problem than that of simple template matching in that they provide a representation of the deformed contour over time. Cross-correlation can track objects that deform over time, but with obvious and significant qualifications that will not be discussed here. Cross-correlation can also easily track a feature that moves by a significant fraction of its own size across frames, whereas this amount of translation could put a snake outside of its basin of convergence.

Wavelets and other multi-resolution schemes. Although the existence of a useful convolution theorem for wavelets is still a matter of discussion (e.g., [11]; in some schemes wavelet convolution is in fact implemented using the Fourier convolution theorem), efficient feature tracking can be implemented with wavelets and
other multi-resolution representations using a coarse-to-fine multi-resolution search. Multi-resolution techniques require, however, that the images contain sufficient low frequency information to guide the initial stages of the search. As discussed in section 6, ideal features are sometimes unavailable and one must resort to poorly defined features that may have little low-frequency information, such as a configuration of small spots on an otherwise uniform surface.

Each of the approaches discussed above has been advocated by various authors, but there are fewer comparisons between approaches. Reference [19] derives an optimal feature tracking scheme within the gradient search framework, but the limitations of this framework are not addressed. An empirical study of template matching algorithms in the presence of various image distortions [4] found that NCC provides the best performance in all image categories, although one of the cheaper algorithms performs nearly as well for some types of distortion. A general hierarchical framework for motion tracking is discussed in [1]. A correlation based matching approach is selected though gradient approaches are also considered.

Despite the age of the NCC algorithm and the existence of more recent techniques that address its various shortcomings, it is probably fair to say that a suitable replacement has not been universally recognized. NCC makes few requirements on the image sequence and has no parameters to be searched by the user. NCC can be used "as is" to provide simple feature tracking, or it can be used as a component of a more sophisticated (possibly multi-resolution) matching scheme that may address scale and rotation invariance, feature updating, and other issues. The choice of the correlation coefficient over alternative matching criteria such as the sum of absolute differences has also been justified as maximum-likelihood estimation [18]. We acknowledge NCC as a default choice in many applications where feature tracking is not in itself a subject of study, as well as an occasional building block in vision and pattern recognition research (e.g. [3]). A fast algorithm is therefore of interest.

4 Transform Domain Computation

Consider
the numerator in (2) and assume that we have images $f'(x,y) \equiv f(x,y) - \bar f_{u,v}$ and $t'(x,y) \equiv t(x,y) - \bar t$ in which the mean value has already been removed:

$$\gamma^{num}(u,v) = \sum_{x,y} f'(x,y)\,t'(x-u,y-v) \qquad (3)$$

For a search window of size $M^2$ and a feature of size $N^2$, (3) requires approximately $N^2(M-N+1)^2$ additions and $N^2(M-N+1)^2$ multiplications.

Eq. (3) is a convolution of the image with the reversed feature $t'(-x,-y)$ and can be computed by

$$\mathcal{F}^{-1}\{\mathcal{F}(f')\,\mathcal{F}^*(t')\} \qquad (4)$$

where $\mathcal{F}$ is the Fourier transform. The complex conjugate accomplishes reversal of the feature via the Fourier transform property $\mathcal{F}\{f^*(-x)\} = F^*(\omega)$.

Implementations of the FFT algorithm generally require that $f'$ and $t'$ be extended with zeros to a common power of two.
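As an illustration of (3) and (4), the sketch below computes the correlation once by direct summation and once in the transform domain; the two should agree to rounding error. It is a minimal example, not the implementation used for the timings in this paper: NumPy, the function names `xcorr_direct`, `xcorr_fft`, and `next_pow2`, and the array sizes are all assumptions made for the example. Only the feature mean is removed here, which Section 5 shows to be sufficient for the numerator.

```python
import numpy as np

def xcorr_direct(f, t):
    """Eq. (3) by direct summation: c[u, v] = sum_{x,y} f[x, y] * t[x-u, y-v]."""
    rows = f.shape[0] - t.shape[0] + 1
    cols = f.shape[1] - t.shape[1] + 1
    out = np.empty((rows, cols))
    for u in range(rows):
        for v in range(cols):
            out[u, v] = np.sum(f[u:u + t.shape[0], v:v + t.shape[1]] * t)
    return out

def next_pow2(n):
    """Smallest power of two that is >= n (hypothetical helper)."""
    return 1 << (n - 1).bit_length()

def xcorr_fft(f, t):
    """Eq. (4): inverse transform of F(f) * conj(F(t)), zero-padded to powers of two."""
    shape = (next_pow2(f.shape[0]), next_pow2(f.shape[1]))
    F = np.fft.rfft2(f, s=shape)          # s= zero-pads to the requested size
    T = np.fft.rfft2(t, s=shape)
    full = np.fft.irfft2(F * np.conj(T), s=shape)
    # Keep only the offsets at which the feature lies entirely inside the window;
    # for these offsets the circular correlation has no wrap-around terms.
    rows = f.shape[0] - t.shape[0] + 1
    cols = f.shape[1] - t.shape[1] + 1
    return full[:rows, :cols]

rng = np.random.default_rng(0)
window = rng.random((37, 29))             # search window (arbitrary size)
feature = rng.random((7, 5))
feature = feature - feature.mean()        # t' in the text
print(np.max(np.abs(xcorr_fft(window, feature) - xcorr_direct(window, feature))))
```

The conjugate in `xcorr_fft` plays the role described above: it reverses the feature so that the product of transforms yields a correlation rather than a convolution.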

The complexity of the transform computation (3) is then $12M^2\log_2 M$ real multiplications and $18M^2\log_2 M$ real additions/subtractions. When $M$ is much larger than $N$ the complexity of the direct "spatial" computation (3) is approximately $N^2M^2$ multiplications/additions, and the direct method is faster than the transform method. The transform method becomes relatively more efficient as $N$ approaches $M$ and with larger $M$, $N$.

4.1 Fast Convolution

There are several well known "fast" convolution algorithms that do not use transform domain computation [13]. These approaches fall into two categories: algorithms that trade multiplications for additional additions, and approaches that find a lower point on the $O(N^2)$ characteristic of (one-dimensional) convolution by embedding sections of a one-dimensional convolution into separate dimensions of a smaller multidimensional convolution. While faster than direct convolution these algorithms are nevertheless slower than transform domain convolution at moderate sizes [13] and in any case they do not address computation of the denominator of (2).

4.2 Phase Correlation

Because (4) can be efficiently computed in the transform domain, several transform domain methods
of approximating the image energy normalization in (2) have been developed. Variation in the image energy under the template can be reduced by high-pass filtering the image before cross-correlation. This filtering can be conveniently added to the frequency domain processing, but selection of the cutoff frequency is problematic: a low cutoff may leave significant image energy variations, whereas a high cutoff may remove information useful to the match.

A more robust approach is phase correlation [9]. In this approach the transform coefficients are normalized to unit magnitude prior to computing correlation in the frequency domain. Thus, the correlation is based only on phase information and is insensitive to changes in image intensity. Although experience has shown this approach to be successful, it has the drawback that all transform components are weighted equally, whereas one might expect that insignificant components should be given less weight.

In principle one should select the spectral pre-filtering so as to maximize the expected correlation signal-to-noise ratio given the expected second order moments of the signal and signal noise.
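The phase correlation idea described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the method of [9] in detail: it uses NumPy, the function name `phase_correlation` and the guard value `eps` are hypothetical, the feature is zero-padded to the image size, and the test shift is circular so that the peak is exact.

```python
import numpy as np

def phase_correlation(f, t, eps=1e-12):
    """Correlate f and t using only the phase of their Fourier transforms."""
    F = np.fft.fft2(f)
    T = np.fft.fft2(t, s=f.shape)               # zero-pad t to the size of f
    cross = F * np.conj(T)                      # cross-power spectrum
    cross = cross / np.maximum(np.abs(cross), eps)   # unit magnitude: phase only
    return np.real(np.fft.ifft2(cross))

rng = np.random.default_rng(0)
reference = rng.random((32, 32))
shifted = np.roll(reference, (3, 5), axis=(0, 1))    # circular shift by (3, 5)

surface = phase_correlation(shifted, reference)
peak = np.unravel_index(np.argmax(surface), surface.shape)
print(peak)

# Scaling the intensity of one image leaves the phase, and so the surface, unchanged.
brighter = phase_correlation(2.5 * shifted, reference)
print(np.max(np.abs(brighter - surface)))
```

Normalizing the product of the two transforms is equivalent to normalizing each transform separately, since the magnitude of a product is the product of the magnitudes.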

This pre-filtering approach is discussed in [16] and is similar to the classical matched filtering random signal processing technique. With typical ($\rho = 0.95$) image correlation the best pre-filtering is approximately Laplacian rather than a pure whitening.

5 Normalizing

Examining again the numerator of (2), we note that the mean of the feature can be precomputed, leaving

$$\gamma^{num}(u,v) = \sum_{x,y} f(x,y)\,t'(x-u,y-v) - \bar f_{u,v} \sum_{x,y} t'(x-u,y-v)$$

Since $t'$ has zero mean and thus zero sum, the term $\bar f_{u,v} \sum t'(x-u,y-v)$ is also zero, so the numerator of the normalized cross-correlation can be computed using (4).

Examining the denominator of (2), the length of the feature vector can be precomputed in approximately $3N^2$ operations (small compared to the cost of the cross-correlation), and in fact the feature can be pre-normalized to length one.

The problematic quantities are those in the expression $\sum_{x,y} [f(x,y) - \bar f_{u,v}]^2$. The image mean and local (RMS) energy must be computed at each $u,v$, i.e. at $(M-N+1)^2$ locations, resulting in almost $3N^2(M-N+1)^2$ operations (counting add, subtract, multiply as one operation each). This computation is more than is required for the direct computation of (3) and it may considerably outweigh the computation indicated by (4) when the transform method is applicable. A more efficient means of computing the image mean and energy under the feature is desired.

These quantities can be efficiently computed from tables containing the integral (running sum) of the image and image square over the search area, i.e.,

$$s(u,v) = f(u,v) + s(u-1,v) + s(u,v-1) - s(u-1,v-1)$$

$$s^2(u,v) = f^2(u,v) + s^2(u-1,v) + s^2(u,v-1) - s^2(u-1,v-1)$$

with $s(u,v) = s^2(u,v) = 0$ when either $u,v < 0$. (Here $s^2$ names the running sum of $f^2$; it is not the square of $s$.) The energy of the image under the feature positioned at $u,v$ is then

$$e_f(u,v) = s^2(u+N-1,v+N-1) - s^2(u-1,v+N-1) - s^2(u+N-1,v-1) + s^2(u-1,v-1)$$

and similarly for the image sum under the feature.

The problematic quantity $\sum_{x,y} [f(x,y) - \bar f_{u,v}]^2$ can now be computed with very few operations since it expands into an expression involving only the image sum and sum squared under the feature. The construction of the tables requires approximately $3M^2$ operations, which is less than the cost of computing the numerator by (4) and considerably less than the $3N^2(M-N+1)^2$ required to compute $\sum_{x,y} [f(x,y) - \bar f_{u,v}]^2$ at each $u,v$.

This technique of computing a definite sum from a precomputed running sum has been independently used in a number of fields; a computer graphics application is developed in [5]. If the search for the maximum of the correlation surface is done in a systematic row-scan order it is possible to combine the table construction and reference through state variables and so avoid explicitly storing the table. When implemented on a general purpose computer the size of the table is not a major consideration, however, and flexibility in searching the correlation surface can be advantageous.

Note that the $s(u,v)$ and $s^2(u,v)$ expressions are marginally stable, meaning that their z-transform $H(z) = 1/(1 - z^{-1})$ (one dimensional version here) has a pole at $z = 1$, whereas stability requires poles to be strictly inside the unit circle [14]. The computation should thus use large integer rather than floating point arithmetic.
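The pieces of Sections 4 and 5 can be combined into a short sketch of the whole computation: the numerator from (4), and the image energy under the feature from the running-sum tables. This is an illustration under stated assumptions rather than the production implementation: it uses NumPy, all function names (`running_sum`, `window_sums`, `ncc_fast`, `ncc_direct`) are hypothetical, the image is assumed to hold integer pixel values so the tables can be kept in 64-bit integers as recommended above, and positions with zero image energy are arbitrarily assigned a coefficient of zero.

```python
import numpy as np

def running_sum(a):
    """Table s(u, v), padded with a leading zero row and column so that the
    boundary condition s = 0 for negative indices needs no special case."""
    s = np.zeros((a.shape[0] + 1, a.shape[1] + 1), dtype=np.int64)
    s[1:, 1:] = a.cumsum(axis=0).cumsum(axis=1)
    return s

def window_sums(s, n0, n1):
    """Sum under every n0 x n1 window, using four table lookups per position."""
    return s[n0:, n1:] - s[:-n0, n1:] - s[n0:, :-n1] + s[:-n0, :-n1]

def ncc_fast(f, t):
    """Eq. (2): numerator via eq. (4), image energy via the running-sum tables."""
    f = np.asarray(f, dtype=np.int64)
    tp = np.asarray(t, dtype=np.float64)
    tp = tp - tp.mean()                               # zero-mean feature t'
    n0, n1 = tp.shape
    r0, r1 = f.shape[0] - n0 + 1, f.shape[1] - n1 + 1
    shape = tuple(1 << (m - 1).bit_length() for m in f.shape)   # pad to powers of two
    spec = np.fft.rfft2(f, s=shape) * np.conj(np.fft.rfft2(tp, s=shape))
    num = np.fft.irfft2(spec, s=shape)[:r0, :r1]
    n = n0 * n1
    fsum = window_sums(running_sum(f), n0, n1)        # image sum under the feature
    fsq = window_sums(running_sum(f * f), n0, n1)     # image energy under the feature
    f_energy = (n * fsq - fsum * fsum) / n            # sum of (f - mean)^2; integer until the divide
    denom = np.sqrt(f_energy * np.sum(tp ** 2))
    gamma = np.zeros((r0, r1))
    np.divide(num, denom, out=gamma, where=denom > 0)
    return gamma

def ncc_direct(f, t):
    """Eq. (2) evaluated directly at every position, for comparison."""
    f = np.asarray(f, dtype=np.float64)
    tp = np.asarray(t, dtype=np.float64)
    tp = tp - tp.mean()
    n0, n1 = tp.shape
    out = np.zeros((f.shape[0] - n0 + 1, f.shape[1] - n1 + 1))
    for u in range(out.shape[0]):
        for v in range(out.shape[1]):
            w = f[u:u + n0, v:v + n1]
            wp = w - w.mean()
            d = np.sqrt(np.sum(wp ** 2) * np.sum(tp ** 2))
            if d > 0:
                out[u, v] = np.sum(wp * tp) / d
    return out

rng = np.random.default_rng(1)
image = rng.integers(0, 256, size=(40, 48))
feature = image[10:18, 20:29]                         # feature cut from the image itself
gamma = ncc_fast(image, feature)
print(np.unravel_index(np.argmax(gamma), gamma.shape))
print(np.max(np.abs(gamma - ncc_direct(image, feature))))
```

The expression `n * fsq - fsum * fsum` is the expansion mentioned above: it involves only the image sum and sum squared under the feature, and stays in exact integer arithmetic until the final division.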

6 Performance

The performance of this algorithm will be discussed in the context of special effects image processing. The integration of synthetic and processed images into special effects sequences often requires accurate tracking of sequence movement and features. The use of automated feature tracking in special effects was pioneered in movies such as Cliffhanger, Forrest Gump, and Speed. Recently cross-correlation based feature trackers have been introduced in commercial image compositing systems such as Flame/Flint [20], Matador, Advance [21], and After Effects [22].

The algorithm described in this paper was developed for the movie Forrest Gump (1994), and has been used in a number of subsequent projects. Special effects sequences in that movie included the replacement of various moving elements and the addition of a contemporary actor into historical film and video sequences.
search window(s) | length | direct NCC | fast NCC
168 × 86 | 896 frames | 15 hours | 1.7 hours
115 × 200, 150 × 150 | 490 frames | 14.3 hours | 57 minutes

Table 1: Two tracking sequences from Forrest Gump were re-timed using both direct and fast NCC algorithms using identical features and search windows on a 100 MHz R4000 processor. These times include a 16 sub-pixel search [17] at the location of the best whole-pixel match. The sub-pixel search was computed using Eq. (2) (direct convolution) in all cases.

feature size | search window(s) | Flint | fast NCC
40 | 110 | min. 40 seconds | 16 seconds (subpixel=1)
40 | 110 | n/a | 21 seconds (subpixel=8)

Table 2: Measured tracking times on a short sequence using the commercial Flint system and the algorithm described in the text. These are wall-clock times obtained on an unloaded 200 MHz R4400 processor with 380 megabytes of memory (no swapping occurred). Flint settings were FI FF. It appears that subpixel search is only available in the more expensive Flame system.

Figure 1: Measured relative performance of transform domain versus spatial domain normalized cross-correlation as a function of the search window size (depth axis) and the ratio of the feature size to search window size.

Figure 2: A tracked feature from a special effects sequence in the movie Forrest Gump. The region is out of focus and has noticeable film-grain noise across frames. A small (e.g. 10 or smaller) area from this region would not provide a usable feature. The chosen feature size is more than 40 pixels.
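For a rough sense of scale, the approximate multiplication counts quoted in Section 4 can be evaluated for the feature and search window sizes listed in Table 2. This is a back-of-the-envelope sketch only: the function names are hypothetical, the window is assumed to be zero-padded to the next power of two, and the counts cover only the numerator of (2). They ignore additions, the normalization, the sub-pixel search, and memory traffic, so they are not a prediction of the measured times.

```python
def direct_mults(M, N):
    """Approximate multiplications for direct evaluation of (3): N^2 (M - N + 1)^2."""
    return N ** 2 * (M - N + 1) ** 2

def transform_mults(M):
    """Approximate real multiplications for the transform method, 12 M^2 log2 M,
    with M first rounded up to a power of two (an assumption of this sketch)."""
    padded = 1 << (M - 1).bit_length()
    log2_padded = padded.bit_length() - 1     # exact integer log2 of a power of two
    return 12 * padded ** 2 * log2_padded

M, N = 110, 40                                # search window and feature size from Table 2
print(direct_mults(M, N))
print(transform_mults(M))
print(direct_mults(M, N) / transform_mults(M))
```

With these sizes the window is padded from 110 to 128, and the direct count exceeds the transform count by a factor of roughly six.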
Manually picked features from one frame of a sequence were automatically tracked over the remaining frames; this information was used as the basis for further processing.

The relative performance of our algorithm is a function of both the search window size and the ratio of the feature size to search window size. Relative performance increases along the window size axis (Fig. 1); a higher resolution plot would show an additional ripple reflecting the relation between the search window size and the bounding power of two. The property that the relative performance is greater on larger problems is desirable. Table 1 illustrates the performance obtained in a special effects feature tracking application. Table 2 compares the performance of our algorithm with that of a high-end commercial image compositing package.

Note that while a small (e.g. 10) feature size would suffice in an ideal digital image, in practice much larger feature sizes and search windows are sometimes required or preferred:

- The image sequences used in film and video are sometimes obtained from moving cameras and may have considerable translation between frames due to camera shake. Due to the high resolution required to represent digital film, even a small movement across frames may correspond to a distance of many pixels.
- The selected features are of course constrained to the available features in the image; distinct "features" are not always available at preferred scales and locations.
- Many potential features in a typical digitized image are either out of focus or blurred due to motion of the camera or object (Fig. 2). Feature match is also hindered by imaging noise such as film grain. Large features are more accurate in the presence of blur and noise.

As a result of these considerations feature sizes of 20 and larger and search windows of 50 and larger are often employed.

The fast algorithm in some cases reduces high-resolution feature tracking from an overnight to an over-lunch procedure. With lower ("proxy") resolution and faster machines, semi-automated feature tracking is tolerable in an interactive system. Certain applications in other fields may also benefit from the algorithm described here. For example, image stabilization is a common feature in recent consumer video cameras. Although most
such systems are stabilized by inertial means, one manufacturer implemented digital stabilization and thus presumably used some form of image tracking. The algorithm used leaves room for improvement however: it has been criticized as being slow and unpredictable and a product review recommended leaving it disabled [15].

References

[1] P. Anandan, "A Computational Framework and an Algorithm for the Measurement of Visual Motion", Int. J. Computer Vision, 2(3), p. 283-310, 1989.
[2] D. I. Barnea, H. F. Silverman, "A class of algorithms for fast digital image registration", IEEE Trans. Computers, 21, pp. 179-186, 1972.
[3] R. Brunelli and T. Poggio, "Face Recognition: Features versus Templates", IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 15, no. 10, pp. 1042-1052, 1993.
[4] P. J. Burt, C. Yen, X. Xu, "Local Correlation Measures for Motion Analysis: a Comparative Study", IEEE Conf. Pattern Recognition Image Processing, 1982, pp. 269-274.
[5] F. Crow, "Summed-Area Tables for Texture Mapping", Computer Graphics, vol. 18, no. 3, pp. 207-212, 1984.
[6] R. O. Duda and P. E. Hart, Pattern Classification and Scene Analysis, New York: Wiley, 1973.
[7] R. C. Gonzalez and R. E. Woods, Digital Image Processing (third edition), Reading, Massachusetts: Addison-Wesley, 1992.
[8] A. Goshtasby, S. H. Gage, and J. Bartholic, "A Two-Stage Cross-Correlation Approach to Template Matching", IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 6, no. 3, pp. 374-378, 1984.
[9] C. Kuglin and D. Hines, "The Phase Correlation Image Alignment Method", Proc. Int. Conf. Cybernetics and Society, 1975, pp. 163-165.
[10] J. P. Lewis, "Fast Template Matching", Vision Interface, p. 120-123, 1995.
[11] A. R. Lindsey, "The Non-Existence of a Wavelet Function Admitting a Wavelet Transform Convolution Theorem of the Fourier Type", Rome Laboratory Technical Report C3BB, 1995.
[12] B. D. Lucas and T. Kanade, "An Iterative Image Registration Technique with an Application to Stereo Vision", IJCAI 1981.
[13] S. K. Mitra and J. F. Kaiser, Handbook for Digital Signal Processing, New York: Wiley, 1993.
[14] A. V. Oppenheim and R. W. Schafer, Digital Signal Processing, Englewood Cliffs, New Jersey: Prentice-Hall, 1975.
[15] D. Polk, "Product Probe - Panasonic PV-IQ604", Videomaker, October 1994, pp. 55-57.
[16] W. Pratt, Digital Image Processing, John Wiley, New York, 1978.
[17] Qi Tian and M. N. Huhns, "Algorithms for Subpixel Registration", CVGIP 35, p. 220-233, 1986.
[18] T. W. Ryan, "The Prediction of Cross-Correlation Accuracy in Digital Stereo-Pair Images", PhD thesis, University of Arizona, 1981.
[19] J. Shi and C. Tomasi, "Good Features to Track", Proc. IEEE Conf. on Computer Vision and Pattern Recognition, 1994.
[20] Flame effects compositing software, Discreet Logic, Montreal, Quebec.
[21] Advance effects compositing software, Avid Technology Inc., Tewksbury, Massachusetts.
[22] After Effects effects compositing software, Adobe (COSA), Mountain View, California.