Histograms of Oriented Gradients for Human Detection

We study the influence of each stage of the computation on performance, concluding that fine-scale gradients, fine orientation binning, relatively coarse spatial binning, and high-quality local contrast normalization in overlapping descriptor blocks are all important for good results. The image is divided into small connected regions called cells, and a histogram of gradient directions is compiled over the pixels within each cell. For improved accuracy, the local histograms can be contrast-normalized: a measure of the intensity is computed across a larger region of the image, called a block, and this value is then used to normalize all cells within the block. Dalal and Triggs explored four different methods for block normalization. Because this normalization already compensates for local contrast, image pre-processing has little impact on performance.
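The four block normalization schemes usually associated with Dalal and Triggs are L2-norm, L2-Hys (L2-norm, then clipping, then renormalizing), L1-norm, and L1-sqrt. The sketch below applies each of them to a concatenated block vector. The function name `normalize_block`, the small constant `eps`, and the clipping threshold of 0.2 for L2-Hys are assumptions made for illustration; histogram entries are assumed to be non-negative.

```python
import math

def normalize_block(v, method="L2-Hys", eps=1e-5, clip=0.2):
    """Normalize a block vector v (non-negative floats); hypothetical helper."""
    if method == "L2-norm":
        # v / sqrt(||v||_2^2 + eps^2)
        n = math.sqrt(sum(x * x for x in v) + eps * eps)
        return [x / n for x in v]
    if method == "L2-Hys":
        # L2-normalize, clip large entries, then L2-normalize again
        n = math.sqrt(sum(x * x for x in v) + eps * eps)
        clipped = [min(x / n, clip) for x in v]
        n2 = math.sqrt(sum(x * x for x in clipped) + eps * eps)
        return [x / n2 for x in clipped]
    if method == "L1-norm":
        # v / (||v||_1 + eps)
        n = sum(abs(x) for x in v) + eps
        return [x / n for x in v]
    if method == "L1-sqrt":
        # sqrt(v / (||v||_1 + eps)); valid because histogram votes are >= 0
        n = sum(abs(x) for x in v) + eps
        return [math.sqrt(x / n) for x in v]
    raise ValueError("unknown normalization method: " + method)
```

For example, `normalize_block([3.0, 4.0], "L2-norm")` gives approximately `[0.6, 0.8]`, while L2-Hys clips both entries to 0.2 and renormalizes them to roughly `[0.707, 0.707]`.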

Abstract. We study the question of feature sets for robust visual object recognition, adopting linear SVM based human detection as a test case. The histogram of oriented gradients (HOG) is a feature descriptor used in computer vision and image processing for the purpose of object detection. Tiling the detection window with a dense (in fact, overlapping) grid of HOG descriptors and using the combined feature vector in a conventional SVM based window classifier gives our human detection chain. To compute gradients, the most common method is to apply the 1-D centered, point discrete derivative mask in the horizontal and vertical directions. Dalal and Triggs tested other, more complex masks, such as 3x3 Sobel masks, but these generally performed worse at detecting humans. The second step of calculation is creating the cell histograms: each pixel casts a weighted vote for an orientation bin, and in tests, using the gradient magnitude itself as the vote weight generally produces the best results. Rectangular HOG (R-HOG) blocks appear quite similar to SIFT descriptors, but they are computed in dense grids at a single scale without orientation alignment. Circular HOG (C-HOG) blocks come in two variants: those with a single, central cell and those with an angularly divided central cell. Since the descriptor operates on local cells, it is largely invariant to geometric and photometric transformations, except for object orientation.
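The gradient and cell-histogram steps described above can be sketched as follows for a grayscale image stored as a list of rows. This is a minimal illustration under stated assumptions: the helper names are hypothetical, border pixels are handled by clamping coordinates, and the 8x8-pixel cells with 9 unsigned bins over 0 to 180 degrees are assumed defaults. Each vote is weighted by the gradient magnitude and split linearly between the two nearest orientation bin centres; this is a simplification, since Dalal and Triggs also interpolate votes spatially.

```python
import math

def gradients(img):
    """Centered [-1, 0, 1] derivatives; returns (magnitude, unsigned angle in degrees)."""
    h, w = len(img), len(img[0])
    mag = [[0.0] * w for _ in range(h)]
    ang = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            # clamp neighbour coordinates at the image border (an assumption)
            gx = img[y][min(x + 1, w - 1)] - img[y][max(x - 1, 0)]
            gy = img[min(y + 1, h - 1)][x] - img[max(y - 1, 0)][x]
            mag[y][x] = math.hypot(gx, gy)
            # "unsigned" gradient: fold the direction into [0, 180]
            ang[y][x] = math.degrees(math.atan2(gy, gx)) % 180.0
    return mag, ang

def cell_histogram(mag, ang, y0, x0, cell=8, bins=9):
    """Magnitude-weighted orientation histogram for one cell (hypothetical helper)."""
    width = 180.0 / bins
    hist = [0.0] * bins
    for y in range(y0, y0 + cell):
        for x in range(x0, x0 + cell):
            # bin i is centred at (i + 0.5) * width; find the two nearest centres
            pos = ang[y][x] / width - 0.5
            lo = math.floor(pos)
            frac = pos - lo
            # wrap around so that angles near 0 and near 180 share votes
            hist[lo % bins] += mag[y][x] * (1.0 - frac)
            hist[(lo + 1) % bins] += mag[y][x] * frac
    return hist
```

For an 8x8 image that brightens from left to right, every gradient points at 0 degrees, so the votes are shared equally between the first bin (centred at 10 degrees) and the last bin (centred at 170 degrees).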

Pedestrian detection. Navneet Dalal and Bill Triggs, "Histograms of Oriented Gradients for Human Detection", CVPR '05.

After reviewing existing edge and gradient based descriptors, we show experimentally that grids of Histograms of Oriented Gradient (HOG) descriptors significantly outperform existing feature sets for human detection. The cells themselves can be either rectangular or radial in shape, and the histogram channels are spread evenly over 0 to 180 degrees or 0 to 360 degrees, depending on whether the gradient is "unsigned" or "signed". Cells are grouped into larger blocks for normalization. These blocks typically overlap, meaning that each cell contributes more than once to the final descriptor. Dalal and Triggs also found that a minor improvement in performance could be gained by applying a Gaussian spatial window within each block before tabulating histogram votes, so that pixels near the edges of the block are weighted less. However, Gaussian weighting provided no benefit when used in conjunction with C-HOG blocks.
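The effect of overlapping blocks on descriptor size, and the Gaussian spatial window, can be illustrated with a short sketch. The layout parameters are assumptions chosen for illustration: a 64x128 detection window, 8x8-pixel cells, blocks of 2x2 cells sliding by one cell, and 9 orientation bins. The window width `sigma = 0.5 * block width` is likewise an assumed value, and both function names are hypothetical.

```python
import math

def descriptor_length(win_w=64, win_h=128, cell=8, block_cells=2,
                      stride_cells=1, bins=9):
    """Length of the concatenated HOG vector for one detection window."""
    cells_x, cells_y = win_w // cell, win_h // cell
    # overlapping blocks: slide a block_cells x block_cells block by stride_cells
    blocks_x = (cells_x - block_cells) // stride_cells + 1
    blocks_y = (cells_y - block_cells) // stride_cells + 1
    return blocks_x * blocks_y * block_cells * block_cells * bins

def gaussian_window(block_px=16, sigma_frac=0.5):
    """Per-pixel vote weights that down-weight pixels near the block edges."""
    sigma = sigma_frac * block_px
    c = (block_px - 1) / 2.0  # centre of the block in pixel coordinates
    return [[math.exp(-((x - c) ** 2 + (y - c) ** 2) / (2 * sigma * sigma))
             for x in range(block_px)]
            for y in range(block_px)]
```

With these defaults the window holds 8 x 16 cells, hence 7 x 15 = 105 overlapping blocks of 4 x 9 = 36 values each, and `descriptor_length()` returns 3780. An interior cell belongs to four different blocks, which is why each cell contributes more than once to the final vector.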

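The new approach gives near-perfect separation on the original MIT pedestrian database, so we introduce a more challenging dataset containing over 1800 annotated human images with a large range of pose variations and backgrounds. The HOG descriptor has a few key advantages over other descriptors: coarse spatial sampling, fine orientation sampling, and strong local photometric normalization allow the individual body movements of pedestrians to be ignored as long as they maintain a roughly upright position. This makes the HOG descriptor particularly suited for human detection in images. The first step of calculation in many feature detectors is image pre-processing to ensure normalized color and gamma values; for HOG this step can largely be omitted, because the subsequent descriptor normalization achieves a similar result. For the gradient computation itself, the simple 1-D masks [-1 0 1] and [-1 0 1]^T were sufficient, and in Dalal and Triggs' experiments they outperformed the more complex alternatives.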
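For color input, the optional gamma pre-processing and the [-1 0 1] gradient can be combined as sketched below. The sketch makes assumptions that are labeled here and in the code: the image is a list of rows of `(r, g, b)` float tuples, square-root compression is used as an example of gamma normalization, and the gradient is computed separately for each channel, keeping the channel with the largest magnitude as the pixel's gradient. Both function names are hypothetical.

```python
import math

def sqrt_gamma(img):
    """Square-root gamma compression of every channel (assumed pre-processing)."""
    return [[tuple(math.sqrt(c) for c in px) for px in row] for row in img]

def colour_gradient(img, y, x):
    """[-1, 0, 1] gradient at (y, x), keeping the channel with the largest magnitude.

    Returns (magnitude, gx, gy). Border coordinates are clamped (an assumption).
    """
    h, w = len(img), len(img[0])
    best = (0.0, 0.0, 0.0)
    for ch in range(len(img[0][0])):
        gx = img[y][min(x + 1, w - 1)][ch] - img[y][max(x - 1, 0)][ch]
        gy = img[min(y + 1, h - 1)][x][ch] - img[max(y - 1, 0)][x][ch]
        m = math.hypot(gx, gy)
        if m > best[0]:
            best = (m, gx, gy)
    return best
```

In a 3x3 image whose red channel rises by 3 per row and whose green channel rises by 1 per column, the centre pixel takes its gradient from the red channel, `(6.0, 0.0, 6.0)`, because that channel has the larger magnitude.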