Scale-space theory
A theory of multi-scale representation of sensory data developed by the image processing and computer vision communities. Its purpose is to represent signals at multiple scales in such a way that fine-scale structures are successively suppressed, with a scale parameter $t$ associated with each level of the multi-scale representation.
For a given signal $f : \mathbf{R} ^ { N } \rightarrow \mathbf{R}$, a linear scale-space representation is a family of derived signals $L : \mathbf{R} ^ { N } \times \mathbf{R} \rightarrow \mathbf{R}$, defined by $L ( . \ ; 0 ) = f ( . )$ and
\begin{equation*} L (. ; t ) = h (. ; t ) * f ( . ) \end{equation*}
for some family $h : \mathbf{R} ^ { N } \times \mathbf{R} \rightarrow \mathbf{R}$ of convolution kernels [a1], [a2] (cf. also Integral equation of convolution type). An essential requirement on the scale-space family $L$ is that the representation at a coarse scale constitutes a simplification of the representations at finer scales. Several different ways of formalizing this requirement about non-creation of new structures with increasing scales show that the Gaussian kernel
\begin{equation*} g ( x ; t ) = \frac { 1 } { ( 2 \pi t ) ^ { N / 2 } } \operatorname { exp } \left( - \frac { x _ { 1 } ^ { 2 } + \ldots + x _ { N } ^ { 2 } } { 2 t } \right) \end{equation*}
constitutes a canonical choice for generating a scale-space representation [a3], [a4], [a5], [a6]. Equivalently, the scale-space family satisfies the diffusion equation
\begin{equation*} \partial _ { t } L = \frac { 1 } { 2 } \nabla ^ { 2 } L. \end{equation*}
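As a concrete illustration, the construction above can be sketched in a few lines of NumPy/SciPy. This is a minimal sketch, not part of the original article: it assumes `scipy.ndimage.gaussian_filter` as the Gaussian convolution, with standard deviation $\sigma = \sqrt{t}$ so that the kernel variance equals the scale parameter $t$. The example checks the semigroup (cascade) property that follows from the diffusion equation: smoothing to scale $t_1$ and then adding $t_2$ equals smoothing directly to scale $t_1 + t_2$, since Gaussian variances add under convolution.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def scale_space(f, t):
    """Linear scale-space L(.; t) = g(.; t) * f, where g is the Gaussian
    kernel of variance t (standard deviation sqrt(t))."""
    if t == 0:
        return f.astype(float)
    return gaussian_filter(f.astype(float), sigma=np.sqrt(t))

# Any test signal works; the semigroup property is generic.
rng = np.random.default_rng(0)
f = rng.standard_normal((64, 64))

# Cascade property: L(.; t1 + t2) = g(.; t2) * L(.; t1).
L_cascade = scale_space(scale_space(f, 2.0), 3.0)
L_direct = scale_space(f, 5.0)
assert np.allclose(L_cascade, L_direct, atol=1e-3)
```

The small tolerance accounts for the truncation of the Gaussian kernel tails in the discrete implementation; the continuous kernels cascade exactly.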
The motivation for generating a scale-space representation of a given data set originates from the basic fact that real-world objects are composed of different structures at different scales and may appear in different ways depending on the scale of observation. For example, the concept of a "tree" is appropriate at the scale of meters, while concepts such as leaves and molecules are more appropriate at finer scales. For a machine vision system analyzing an unknown scene, there is no way to know what scales are appropriate for describing the data. Thus, the only reasonable approach is to consider descriptions at all scales simultaneously [a1], [a2].
From the scale-space representation, at any level of scale one can define scale-space derivatives by
\begin{equation*} L _ { x ^ \alpha} ( x ; t ) = \partial _ { x ^ \alpha} ( g ( x ; t ) * f ( x ) ), \end{equation*}
where $\alpha = ( \alpha _ { 1 } , \dots , \alpha _ { N } ) ^ { T }$ is a multi-index and $\partial _ { x ^ \alpha} L = L _ { x _ { 1 } ^ {\alpha _ { 1}} \ldots x _ { N } ^ { \alpha _ { N } } }$ denotes the corresponding derivative operator. Such Gaussian derivative operators provide a compact way to characterize the local image structure around a given image point at any scale. Specifically, the outputs of scale-space derivatives can be combined into multi-scale differential invariants, to serve as feature detectors (see Edge detection and Corner detection for two examples).
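A minimal sketch of such a differential invariant, again assuming SciPy's `gaussian_filter` (its `order` argument yields derivatives of the Gaussian kernel, so convolving with it computes scale-space derivatives): the gradient magnitude $|\nabla L| = \sqrt{L_x^2 + L_y^2}$ at scale $t$, a standard edge-strength measure. On a vertical step edge, the response peaks at the edge location.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def gaussian_derivative(f, t, order):
    """Scale-space derivative L_{x^alpha}(.; t): convolve f with a
    differentiated Gaussian of variance t (std. deviation sqrt(t))."""
    return gaussian_filter(f.astype(float), sigma=np.sqrt(t), order=order)

# A vertical step edge at column 16.
f = np.zeros((32, 32))
f[:, 16:] = 1.0
t = 2.0

Lx = gaussian_derivative(f, t, order=(0, 1))  # derivative along columns
Ly = gaussian_derivative(f, t, order=(1, 0))  # derivative along rows
grad_mag = np.hypot(Lx, Ly)                   # |nabla L|, an edge detector

# The edge response along a row peaks at the step (up to half-pixel offset).
peak_col = int(np.argmax(grad_mag[16]))
```

Combining derivatives of different orders in the same way yields other invariants, e.g. the Laplacian $L_{xx} + L_{yy}$ used for blob detection.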
More generally, a scale-space representation with its Gaussian derivative operators can serve as a basis for expressing a large number of early visual operations, including feature detection, stereo matching, and the computation of motion descriptors and of cues to surface shape [a3], [a4]. Neurophysiological studies have shown that there are receptive field profiles in the mammalian retina and visual cortex which are well modeled by the scale-space framework [a7].
Pyramid representation [a8] is a predecessor to scale-space representation, constructed by simultaneously smoothing and subsampling a given signal. In this way, computationally highly efficient algorithms can be obtained. A problem noted with pyramid representations, however, is that it is usually algorithmically hard to relate structures at different scales, due to the discrete nature of the scale levels. In a scale-space representation, the existence of a continuous scale parameter makes it conceptually much easier to express this deep structure [a2]. For features defined as zero-crossings of differential invariants, the implicit function theorem (cf. Implicit function) directly defines trajectories across scales, and at those scales where a bifurcation occurs, the local behaviour can be modeled by singularity theory [a3], [a5].
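The smoothing-and-subsampling construction of a pyramid can be sketched as follows; this is an illustrative Gaussian pyramid in the spirit of [a8], not the authors' exact scheme, and the smoothing width `sigma` is a free parameter here.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def gaussian_pyramid(f, n_levels, sigma=1.0):
    """Pyramid representation: repeatedly smooth and subsample by a
    factor of two, halving the resolution at each level."""
    levels = [f.astype(float)]
    for _ in range(n_levels - 1):
        smoothed = gaussian_filter(levels[-1], sigma=sigma)
        levels.append(smoothed[::2, ::2])  # keep every other row/column
    return levels

f = np.random.default_rng(1).standard_normal((64, 64))
pyr = gaussian_pyramid(f, n_levels=4)
shapes = [p.shape for p in pyr]  # resolutions halve at each level
```

The discrete, octave-spaced scale levels make the pyramid cheap to compute, but, as noted above, harder to track structures across than a continuous scale parameter.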
Extensions of linear scale-space theory concern the formulation of non-linear scale-space concepts more committed to specific purposes [a9]. There are strong relations between scale-space theory and wavelet theory (cf. also Wavelet analysis), although these two notions of multi-scale representation have been developed from slightly different premises.
References
[a1] | A.P. Witkin, "Scale-space filtering" , Proc. 8th Internat. Joint Conf. Art. Intell. Karlsruhe, West Germany Aug. 1983 (1983) pp. 1019–1022 |
[a2] | J.J. Koenderink, "The structure of images" Biological Cybernetics , 50 (1984) pp. 363–370 |
[a3] | T. Lindeberg, "Scale-space theory in computer vision" , Kluwer Acad. Publ. (1994) |
[a4] | L.M.J. Florack, "Image structure" , Kluwer Acad. Publ. (1997) |
[a5] | J. Sporring, et al., "Gaussian scale-space theory" , Kluwer Acad. Publ. (1997) |
[a6] | B.M. ter Haar Romeny, et al., "Proc. First Internat. Conf. Scale-Space" , Lecture Notes Computer Science , 1252 , Springer (1997) |
[a7] | R.A. Young, "The Gaussian derivative model for spatial vision: Retinal mechanisms" Spatial Vision , 2 (1987) pp. 273–293 |
[a8] | P.J. Burt, E.H. Adelson, "The Laplacian pyramid as a compact image code" IEEE Trans. Commun. , 31 : 4 (1983) pp. 532–540 |
[a9] | "Geometry-driven diffusion in computer vision" B.M. ter Haar Romeny (ed.) , Kluwer Acad. Publ. (1994) |
Scale-space theory. Encyclopedia of Mathematics. URL: http://encyclopediaofmath.org/index.php?title=Scale-space_theory&oldid=49919