Consider a continuous function of two variables such that the value of the function at (x, y) is given by f(x, y). The Hessian matrix, H, is the matrix of second-order partial derivatives of the function f:

\[
H(x, y) =
\begin{bmatrix}
\frac{\partial^2 f}{\partial x^2} & \frac{\partial^2 f}{\partial x \, \partial y} \\[4pt]
\frac{\partial^2 f}{\partial x \, \partial y} & \frac{\partial^2 f}{\partial y^2}
\end{bmatrix}
\]

The determinant of this matrix, known as the discriminant, is calculated by:

\[
\det(H) = \frac{\partial^2 f}{\partial x^2} \frac{\partial^2 f}{\partial y^2} - \left( \frac{\partial^2 f}{\partial x \, \partial y} \right)^2
\]

The value of the discriminant is used to classify the maxima and minima of the function by the second-order derivative test. Since the determinant is the product of the eigenvalues of the Hessian, we can classify points based on its sign. If the determinant is negative, the eigenvalues have different signs and the point is a saddle, not a local extremum; if it is positive, then either both eigenvalues are positive or both are negative, and in either case the point is classified as an extremum.
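To make the test concrete, here is a minimal Python sketch; the finite-difference step h and the example functions are illustrative assumptions, not part of the original text.

```python
import numpy as np

def hessian_at(f, x, y, h=1e-4):
    """Estimate the 2x2 Hessian of f at (x, y) by central finite differences."""
    fxx = (f(x + h, y) - 2 * f(x, y) + f(x - h, y)) / h**2
    fyy = (f(x, y + h) - 2 * f(x, y) + f(x, y - h)) / h**2
    fxy = (f(x + h, y + h) - f(x + h, y - h)
           - f(x - h, y + h) + f(x - h, y - h)) / (4 * h**2)
    return np.array([[fxx, fxy], [fxy, fyy]])

def classify(f, x, y):
    """Second-order derivative test via the sign of det(H)."""
    H = hessian_at(f, x, y)
    det = H[0, 0] * H[1, 1] - H[0, 1] ** 2
    if det < 0:
        return "saddle (not an extremum)"  # eigenvalues of opposite sign
    if det > 0:
        # Both eigenvalues share the sign of fxx.
        return "maximum" if H[0, 0] < 0 else "minimum"
    return "inconclusive"  # det == 0: the test says nothing

# f(x, y) = x^2 - y^2 has a saddle at the origin; x^2 + y^2 has a minimum.
print(classify(lambda x, y: x**2 - y**2, 0.0, 0.0))  # saddle (not an extremum)
print(classify(lambda x, y: x**2 + y**2, 0.0, 0.0))  # minimum
```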
We can calculate the derivatives by convolution with an appropriate kernel. In the case of SURF the second-order scale-normalised Gaussian is the chosen filter, as it allows for analysis over scale as well as space. We can construct kernels for the Gaussian derivatives in the x, y, and combined xy directions, and with these calculate the four entries of the Hessian matrix (only three of which are distinct, since the matrix is symmetric). Use of the Gaussian allows us to vary the amount of smoothing during the convolution stage so that the determinant is calculated at different scales. Furthermore, since the Gaussian is an isotropic (i.e. circularly symmetric) function, convolution with the kernel allows for rotation invariance.
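As an illustrative sketch of this stage, the snippet below uses SciPy's gaussian_filter, whose order argument applies a Gaussian derivative along each image axis; the test image, the particular scale values, and the sigma**2 normalisation factor are assumptions for demonstration, not SURF's exact parameters.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def hessian_determinant(image, sigma):
    """Determinant of the scale-normalised Hessian at scale sigma.

    Image axes are (rows, cols) = (y, x), so order=(0, 2) differentiates
    twice along x, order=(2, 0) twice along y, and order=(1, 1) once along
    each. The sigma**2 factor is the standard scale normalisation for
    second-order derivatives.
    """
    Lxx = sigma**2 * gaussian_filter(image, sigma, order=(0, 2))
    Lyy = sigma**2 * gaussian_filter(image, sigma, order=(2, 0))
    Lxy = sigma**2 * gaussian_filter(image, sigma, order=(1, 1))
    return Lxx * Lyy - Lxy**2

# Evaluate the determinant over a small stack of scales.
image = np.random.rand(128, 128)  # stand-in for a real grayscale image
scales = [1.2, 2.4, 4.8]
det_stack = np.stack([hessian_determinant(image, s) for s in scales])
```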
Lowe found a performance increase in approximating the Laplacian of Gaussian by a difference of Gaussians. In a similar manner, Bay proposed an approximation to the Laplacian of Gaussian by using box filter representations of the respective kernels.
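To illustrate the box-filter idea, here is a minimal sketch built on an integral image (summed-area table), which yields any box sum in four lookups regardless of filter size. The 9x9 lobe layout and the 0.9 weighting follow the SURF formulation, but the coordinates below are a simplified illustration rather than OpenSURF's implementation, and the pixel (r, c) must lie at least 4 pixels from the image border.

```python
import numpy as np

def integral_image(img):
    """Summed-area table, zero-padded so box sums need no bounds checks at 0."""
    ii = np.cumsum(np.cumsum(img.astype(np.float64), axis=0), axis=1)
    return np.pad(ii, ((1, 0), (1, 0)))

def box(ii, r0, r1, c0, c1):
    """Sum of img[r0:r1, c0:c1] in four lookups, whatever the box size."""
    return ii[r1, c1] - ii[r0, c1] - ii[r1, c0] + ii[r0, c0]

def det_hessian_box(img, r, c):
    """SURF-style 9x9 box-filter response at pixel (r, c).

    Dyy sums three 3-row x 5-col lobes weighted +1, -2, +1 (Dxx is its
    transpose); Dxy sums four 3x3 lobes in the quadrants. Since Dxy enters
    the determinant squared, its sign convention does not matter.
    """
    ii = integral_image(img)
    dyy = (box(ii, r - 4, r - 1, c - 2, c + 3)
           - 2 * box(ii, r - 1, r + 2, c - 2, c + 3)
           + box(ii, r + 2, r + 5, c - 2, c + 3))
    dxx = (box(ii, r - 2, r + 3, c - 4, c - 1)
           - 2 * box(ii, r - 2, r + 3, c - 1, c + 2)
           + box(ii, r - 2, r + 3, c + 2, c + 5))
    dxy = (box(ii, r - 3, r, c + 1, c + 4)        # top-right
           + box(ii, r + 1, r + 4, c - 3, c)      # bottom-left
           - box(ii, r - 3, r, c - 3, c)          # top-left
           - box(ii, r + 1, r + 4, c + 1, c + 4)) # bottom-right
    # The 0.9 weight balances the box approximation against true Gaussians.
    return dxx * dyy - (0.9 * dxy) ** 2
```

In a real detector the integral image would be computed once per image and the response evaluated at every pixel and scale; it is recomputed per call here only to keep the sketch self-contained.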