Sunday, August 29, 2010

Pyramids

A pyramid is a type of multi-scale signal representation developed by the computer vision, image processing, and signal processing communities, in which a signal or an image is subjected to repeated smoothing and subsampling. Historically, pyramid representation is a predecessor to scale-space representation and multiresolution analysis (Wikipedia).

There are two types of pyramids: lowpass and bandpass.
SIFT uses LOWPASS pyramids. A lowpass pyramid is generated by first smoothing the image with an appropriate smoothing filter and then subsampling the smoothed image, usually by a factor of two along each coordinate direction.
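A minimal sketch of that construction in Python (using scipy.ndimage for the smoothing; the function name, sigma, and number of levels are arbitrary choices for illustration, not SIFT's parameters):

import numpy as np
from scipy.ndimage import gaussian_filter

def lowpass_pyramid(image, levels=4, sigma=1.0):
    # Lowpass (Gaussian) pyramid: smooth, then subsample by 2 at each level
    pyramid = [image.astype(np.float64)]
    for _ in range(levels - 1):
        smoothed = gaussian_filter(pyramid[-1], sigma)  # remove high frequencies
        pyramid.append(smoothed[::2, ::2])              # keep every 2nd row/column
    return pyramid

# Each level is half the width/height of the previous one
img = np.random.rand(256, 256)
for level in lowpass_pyramid(img):
    print(level.shape)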



A bandpass pyramid is obtained by forming the difference between adjacent levels of a lowpass pyramid, where some kind of interpolation is performed between representations at adjacent levels of resolution to enable the computation of pixelwise differences.
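Under the same assumptions, a sketch of the bandpass construction: each coarser level is interpolated back to the finer level's size before taking the pixelwise difference (essentially one level of a Laplacian pyramid; the function name and interpolation order are illustrative choices):

import numpy as np
from scipy.ndimage import zoom

def bandpass_pyramid(lowpass_levels):
    # Difference between adjacent lowpass levels, after upsampling the coarser one
    bandpass = []
    for fine, coarse in zip(lowpass_levels[:-1], lowpass_levels[1:]):
        factors = (fine.shape[0] / coarse.shape[0], fine.shape[1] / coarse.shape[1])
        upsampled = zoom(coarse, factors, order=1)   # bilinear interpolation
        bandpass.append(fine - upsampled)            # pixelwise difference
    return bandpass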

Disadvantage:
Pyramid representation leads to rapidly decreasing image size, which reduces the computational work both in computing the representation and in subsequent processing. The main disadvantage of pyramids, however, is that they are defined from a logarithmic process, which makes theoretical analysis complicated. Furthermore, they correspond to quite a coarse quantization along the scale direction, which makes it algorithmically hard to relate image structures across scales.


A scale-space is a continuous function which can be used to find extrema across all possible scales [A. Witkin. Scale-space filtering. Int. Joint Conf. on Artificial Intelligence, 2:1019–1021, 1983].

In computer vision the scale-space is typically implemented as an image pyramid, where the input image is iteratively convolved with a Gaussian kernel and repeatedly subsampled (reduced in size). This method is used to great effect in SIFT (see my post on SIFT Details), but since each layer relies on the previous one and the images need to be resized, it is not computationally efficient.
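A rough sketch of that octave structure (the number of octaves, scales per octave, and base sigma below are illustrative, not Lowe's exact parameters):

import numpy as np
from scipy.ndimage import gaussian_filter

def gaussian_octaves(image, n_octaves=4, scales_per_octave=3, sigma=1.6):
    # Each octave holds increasingly blurred copies of its base image; the next
    # octave starts from a 2x-downsampled image, so every layer depends on the
    # previous one and cannot be computed independently.
    octaves = []
    base = image.astype(np.float64)
    k = 2 ** (1.0 / scales_per_octave)               # ratio between adjacent scales
    for _ in range(n_octaves):
        octave = [gaussian_filter(base, sigma * k ** i)
                  for i in range(scales_per_octave + 1)]
        octaves.append(octave)
        base = octave[-1][::2, ::2]                  # subsample for the next octave
    return octaves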

Another way to construct the scale-space is to apply kernels of increasing size to the original image. This allows multiple layers of the scale-space pyramid to be processed simultaneously and removes the need to subsample the image, hence providing a performance increase.

The SURF approach leaves the original image unchanged and varies only the filter size.
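A minimal sketch of that idea, using box filters of increasing size evaluated from an integral image so that the cost per pixel stays constant regardless of filter size (this only illustrates the principle; SURF's actual filters are box approximations of second-order Gaussian derivatives):

import numpy as np

def integral_image(img):
    # Summed-area table with a zero row/column padded at the top-left
    ii = np.zeros((img.shape[0] + 1, img.shape[1] + 1))
    ii[1:, 1:] = np.cumsum(np.cumsum(img, axis=0), axis=1)
    return ii

def box_mean(ii, r0, c0, r1, c1):
    # Mean of img[r0:r1, c0:c1] in constant time from the integral image
    area = (r1 - r0) * (c1 - c0)
    return (ii[r1, c1] - ii[r0, c1] - ii[r1, c0] + ii[r0, c0]) / area

# Filters of growing size applied to the *unchanged* original image: every
# response map keeps the original resolution, and the layers are independent.
img = np.random.rand(128, 128)
ii = integral_image(img)
responses = []
for size in (9, 15, 21, 27):
    half = size // 2
    response = np.zeros_like(img)
    for r in range(half, img.shape[0] - half):
        for c in range(half, img.shape[1] - half):
            response[r, c] = box_mean(ii, r - half, c - half, r + half + 1, c + half + 1)
    responses.append(response)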

References:

H. Bay, T. Tuytelaars, and L. Van Gool. SURF: Speeded Up Robust Features. European Conference on Computer Vision, 1:404–417, 2006.

D.G. Lowe. Distinctive image features from scale-invariant keypoints. International Journal of Computer Vision, 60(2):91–110, 2004.
