Combinational neural network using Gabor filters for the classification of handwritten digits

09/18/2017 ∙ by N. Joshi, et al. ∙ IG Farben Haus

A classification algorithm that combines components of k-nearest neighbours and multilayer neural networks has been designed and tested. With this method, the computational time required for training has been reduced substantially. Gabor filters were used for feature extraction to improve performance. The algorithm has been tested on the MNIST dataset and will be integrated as a module into object recognition software that is currently under development.







I Introduction

Deep learning algorithms are gaining popularity because of their performance on large datasets. They rely on the availability of huge amounts of data, which minimizes the influence of a few irregular samples in the dataset. A classification algorithm that combines k-nearest neighbours and a multilayer perceptron (MLP) network has been designed and tested. A fusion of parametric and non-parametric approaches produces better learning results with few training examples. With this combined approach, we have also substantially reduced the amount of data required to train the network. The algorithm has been tested on the well-known MNIST dataset of handwritten digits MNIST . This kind of approach is known as meta-learning MetaLearn .

II Gabor filters and feature extraction

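Gabor filters are a natural choice for feature extraction because their response is similar to that of the human visual system Gabor_1 . A two-dimensional Gabor function is written as

$$ g(x, y; \lambda, \theta, \psi, \sigma, \gamma) = \exp\left( -\frac{x'^2 + \gamma^2 y'^2}{2\sigma^2} \right) \cos\left( 2\pi \frac{x'}{\lambda} + \psi \right), $$

where

$$ x' = x\cos\theta + y\sin\theta, \qquad y' = -x\sin\theta + y\cos\theta, $$

and $\lambda$ is the wavelength, $\theta$ is the orientation, $\psi$ is the phase offset, $\sigma$ is the standard deviation of the Gaussian envelope and $\gamma$ is the spatial aspect ratio of the Gabor function. A Gabor filter bank is created by choosing different values of these parameters. In this case, three values each of $\sigma$ and of the frequency $f = 1/\lambda$ were chosen, combined with eight equally spaced rotations $\theta$. Fig. 1 shows several of these Gabor filters in 2D, with intensities rendered in false colour.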
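As a rough illustration of the Gabor function and filter bank described above, the following sketch samples the real-valued kernel on a square grid and builds a bank from every combination of $\sigma$, frequency and rotation. The kernel size, $\psi = 0$, $\gamma = 0.5$, the specific $\sigma$ and frequency values, the helper names `gabor_kernel` and `gabor_bank`, and the choice of rotations over $[0, \pi)$ are all assumptions for illustration, not the parameters used in this work.

```python
import numpy as np


def gabor_kernel(size, wavelength, theta, psi=0.0, sigma=2.0, gamma=0.5):
    """Real Gabor kernel g(x, y) sampled on a grid centred at 0 (use an odd size)."""
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1].astype(float)
    # Rotate the coordinate frame by theta: x' and y' from the text.
    x_rot = x * np.cos(theta) + y * np.sin(theta)
    y_rot = -x * np.sin(theta) + y * np.cos(theta)
    envelope = np.exp(-(x_rot**2 + gamma**2 * y_rot**2) / (2.0 * sigma**2))
    carrier = np.cos(2.0 * np.pi * x_rot / wavelength + psi)
    return envelope * carrier


def gabor_bank(sigmas, frequencies, n_rotations=8, size=15):
    """One kernel for every combination of sigma, frequency (1/wavelength) and rotation."""
    # Assumption: rotations equally spaced over [0, pi).
    thetas = np.arange(n_rotations) * np.pi / n_rotations
    bank = []
    for sigma in sigmas:
        for f in frequencies:
            for theta in thetas:
                bank.append(gabor_kernel(size, 1.0 / f, theta, sigma=sigma))
    return bank


# Hypothetical parameter values, not the ones used in the paper.
bank = gabor_bank(sigmas=[1.5, 2.5, 3.5], frequencies=[0.1, 0.2, 0.3])
print(len(bank), bank[0].shape)  # 72 (15, 15)
```

Taking every combination gives 3 × 3 × 8 = 72 filters in this sketch; pairing each $\sigma$ with a single frequency instead would give a smaller bank.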

Figure 1: Gabor filter bank created by varying $\sigma$, the frequency ($f$) and the rotation ($\theta$). Intensities are shown in false colour for better visibility.

Each Gabor filter is convolved with the sample image (see the example in Fig. 2) to obtain a response matrix. All of these response matrices are then converted into a feature vector.
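A minimal sketch of this step, assuming NumPy only: the hypothetical helper `convolve2d_same` performs a zero-padded 2-D convolution so that each response matrix has the image's shape, and the responses are flattened and concatenated, which is the simplest possible conversion into a feature vector. The two Sobel-like kernels stand in for Gabor filters only to keep the block self-contained.

```python
import numpy as np


def convolve2d_same(image, kernel):
    """Zero-padded 2-D convolution whose output has the same shape as image."""
    kh, kw = kernel.shape
    ph, pw = kh // 2, kw // 2
    padded = np.pad(image, ((ph, kh - 1 - ph), (pw, kw - 1 - pw)))
    flipped = kernel[::-1, ::-1]  # flip the kernel: convolution, not correlation
    out = np.zeros(image.shape, dtype=float)
    rows, cols = image.shape
    for i in range(kh):
        for j in range(kw):
            out += flipped[i, j] * padded[i:i + rows, j:j + cols]
    return out


rng = np.random.default_rng(0)
image = rng.random((28, 28))  # normalized float image, as in the text
# Sobel-like stand-ins; in practice these would be the Gabor filters.
kernels = [np.outer([1, 2, 1], [-1, 0, 1]), np.outer([-1, 0, 1], [1, 2, 1])]
responses = [convolve2d_same(image, k) for k in kernels]
features = np.concatenate([r.ravel() for r in responses])
print(features.shape)  # (1568,)
```

For odd-sized kernels, `scipy.signal.convolve2d(image, kernel, mode='same')` should give the same zero-padded result much faster.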

Figure 2: An example of four Gabor filters applied to the sample image.

III Combined network model

Gabor filters can be used for boundary detection or texture segmentation. For handwritten digit recognition, however, the inner texture is ignored: only the frequencies ($f$) and Gaussian spreads ($\sigma$) responsible for detecting the shape of the figure were used Gabor_2 ; Gabor_3 . These values should be adjusted to image properties such as size and depth. Note also that different image-processing libraries read the raw data from an image file in different ways, which can give conflicting results. Here, each image was imported as a float array and normalized to avoid floating-point errors. A feature vector can be formed by calculating the local energy, mean amplitude, entropy or phase amplitude. We used the information entropy and the energy to form the feature vector. The Shannon information entropy is defined as


$$ H = -\sum_{i=1}^{G} p_i \log_2 p_i, $$

where $G$ is the number of gray levels and $p_i$ is the probability associated with gray level $i$ Shannon .
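The entropy and energy features could be computed per response roughly as follows. This sketch assumes the gray levels are obtained by binning each response into 256 equal-width bins with `np.histogram`, uses a base-2 logarithm, and takes the energy to be the sum of squared response values; the paper does not fix these details, and the function names are hypothetical.

```python
import numpy as np


def shannon_entropy(response, n_levels=256):
    """H = -sum_i p_i log2 p_i, with p_i estimated from an n_levels-bin histogram."""
    counts, _ = np.histogram(response, bins=n_levels)
    p = counts / counts.sum()
    p = p[p > 0]  # terms with p_i = 0 contribute nothing (0 log 0 := 0)
    return float(-np.sum(p * np.log2(p)))


def energy(response):
    """Sum of squared response values (an assumed definition of energy)."""
    return float(np.sum(np.square(response)))


def entropy_energy_features(responses, n_levels=256):
    """Two features per Gabor response, (entropy, energy), concatenated."""
    features = []
    for r in responses:
        features.extend([shannon_entropy(r, n_levels), energy(r)])
    return np.array(features)


rng = np.random.default_rng(0)
responses = [rng.normal(size=(28, 28)) for _ in range(3)]
print(entropy_energy_features(responses).shape)  # (6,)
```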

Figure 3: Schematic of the neural network model.

Fig. 3 shows the scheme of the algorithm. A model of each digit was created by averaging a number of randomly chosen images of that digit from the MNIST dataset. The feature vectors extracted from these models were used to initialize an MLP network. Batches from the training set were then fed to the network. When the cost function exceeded a predetermined value calculated from the covariance matrix, a newly calculated centroid was used to update the weights of the neural network. With this strategy, a particular image could be recognized faster, using fewer training samples.
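The construction of the model digits can be sketched as below: for each label, a fixed number of randomly chosen images is averaged into one model image. The number of images per digit (`n_per_class`), the seed and the function name are assumptions; the demo uses synthetic constant images rather than MNIST so the result is easy to check.

```python
import numpy as np


def build_digit_models(images, labels, n_per_class=50, seed=0):
    """Average n_per_class randomly chosen images of each label into one model image."""
    rng = np.random.default_rng(seed)
    models = {}
    for digit in np.unique(labels):
        idx = np.flatnonzero(labels == digit)
        chosen = rng.choice(idx, size=min(n_per_class, idx.size), replace=False)
        models[int(digit)] = images[chosen].astype(float).mean(axis=0)
    return models


# Synthetic stand-in for MNIST: every image of digit d is a 28x28 array filled with d.
rng = np.random.default_rng(1)
labels = rng.integers(0, 10, size=1000)
images = labels[:, None, None] * np.ones((1, 28, 28))
models = build_digit_models(images, labels)
print(len(models), models[7].shape)  # 10 (28, 28)
```

The model images would then pass through the Gabor feature extraction, and their labelled feature vectors would initialize the MLP. With scikit-learn, one option (an assumption, not necessarily what was done here) is an initial `MLPClassifier.partial_fit` call on those ten vectors with `classes` set to the ten digits, followed by further `partial_fit` calls as batches arrive.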

IV Experiments and results

It was observed that the efficiency of the algorithm depends on several parameters, including the Gabor parameters and the standard deviation that determines when the network weights are updated. In the first stage, the Gabor parameters were varied so as to optimize the distances between the feature vectors generated from the model digits.

Figure 4: Minkowski distances distributed over intervals for different values of the frequency and $\sigma$.


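Fig. 4 shows the normalized distribution of Minkowski distances between the digits. The Minkowski distance of order $p$ between two points $X = (x_1, \ldots, x_n)$ and $Y = (y_1, \ldots, y_n)$ is defined as

$$ D_p(X, Y) = \left( \sum_{i=1}^{n} |x_i - y_i|^p \right)^{1/p}. $$

For $p = 2$, it reduces to the Euclidean distance.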
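The Minkowski distance and the normalized distribution of pairwise distances between model vectors can be sketched in NumPy as follows. Binning the distances into `n_intervals` equal-width intervals with `np.histogram` is one assumed reading of the intervals in Fig. 4; the function names, the interval count and the random stand-in vectors are hypothetical.

```python
from itertools import combinations

import numpy as np


def minkowski(x, y, p=2.0):
    """D_p(X, Y) = (sum_i |x_i - y_i|^p)^(1/p); p = 2 is the Euclidean distance."""
    diff = np.abs(np.asarray(x, dtype=float) - np.asarray(y, dtype=float))
    return float(np.sum(diff**p) ** (1.0 / p))


def distance_distribution(vectors, p=2.0, n_intervals=10):
    """Minkowski distances for every pair of vectors, binned and normalized to sum to 1."""
    dists = np.array([minkowski(a, b, p) for a, b in combinations(vectors, 2)])
    counts, edges = np.histogram(dists, bins=n_intervals)
    return dists, counts / counts.sum(), edges


print(minkowski([0, 0], [3, 4]), minkowski([0, 0], [3, 4], p=1))  # 5.0 7.0

# Ten random stand-ins for the feature vectors of the model digits 0-9.
rng = np.random.default_rng(0)
models = rng.random((10, 48))
dists, distribution, edges = distance_distribution(models)
print(dists.size)  # 45
```

For a single pair, SciPy also provides `scipy.spatial.distance.minkowski(u, v, p)`.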

For the best classification, the centroids should be as far apart as possible, with equal distances between them. In practice, however, the distance between digits 1 and 7 is much smaller than the distance between 1 and 8, because of the similarity between 1 and 7. When the frequencies and $\sigma$ are optimized, we obtain the distribution depicted by the solid blue line in Fig. 4. Each interval number denotes the distance between the vectors formed by a pair of digits.

In the second stage, the algorithm is divided into two branches. The labelled feature set from the model digits is fed directly to the MLP network to initialize its weights. The training set is divided into batches and, using the nearest-neighbour algorithm, the centroid $\mu$ and covariance matrix $\Sigma$ of the $N$ feature vectors $x_1, \ldots, x_N$ are calculated by

$$ \mu = \frac{1}{N} \sum_{k=1}^{N} x_k, \qquad \Sigma = \frac{1}{N-1} \sum_{k=1}^{N} (x_k - \mu)(x_k - \mu)^{\mathsf{T}}, $$

from which the standard deviation is derived, i.e. the square roots of the eigenvalues of the covariance matrix along the principal components. The centroid is simply the mean of the vectors. If the spread of the distribution is much larger than the previously calculated standard deviation, the weights of the neural network are updated and verified against the test set, and the accuracy on the test set is calculated.
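The batch statistics and the update check might look like the sketch below. It assigns each feature vector to its nearest model centroid (one reading of the nearest-neighbour step), computes the centroid and covariance of a batch with `np.cov` (which divides by N − 1), and takes the standard deviation along the first principal component as the square root of the largest eigenvalue. The trigger rule, a fixed `factor` times the reference standard deviation, is an assumption standing in for the paper's unspecified threshold, and all function names are hypothetical.

```python
import numpy as np


def assign_nearest(batch, centroids):
    """Index of the nearest centroid (Euclidean) for each row of batch."""
    dists = np.linalg.norm(batch[:, None, :] - centroids[None, :, :], axis=2)
    return dists.argmin(axis=1)


def batch_statistics(batch):
    """Centroid, covariance and principal standard deviation of a (N, d) batch."""
    centroid = batch.mean(axis=0)
    cov = np.atleast_2d(np.cov(batch, rowvar=False))  # (d, d), divides by N - 1
    eigvals = np.linalg.eigvalsh(cov)  # ascending order for a symmetric matrix
    sigma = float(np.sqrt(max(eigvals[-1], 0.0)))  # spread along the first principal axis
    return centroid, cov, sigma


def needs_update(batch, reference_sigma, factor=2.0):
    """Hypothetical trigger: batch spread exceeds factor * reference sigma."""
    _, _, sigma = batch_statistics(batch)
    return sigma > factor * reference_sigma


rng = np.random.default_rng(0)
reference = rng.normal(scale=1.0, size=(200, 4))
_, _, ref_sigma = batch_statistics(reference)
wide = rng.normal(scale=5.0, size=(200, 4))
print(needs_update(reference, ref_sigma), needs_update(wide, ref_sigma))  # False True
```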

Fig. 5 shows the evolution of the separation between feature vectors as the weights are updated with successive training batches. The feature vectors move farther apart during the training phase, which increases the accuracy.

Figure 5: Evolution of the distribution of distances over the intervals as a function of the number of batches fed to the network.
Figure 6: Accuracy as a function of the number of batches for two different values of the standard deviation $\sigma$.

MNIST provides a large test set, so the accuracy can be calculated with low uncertainty. It was observed that the accuracy on the test set depends on how the weights are updated. If a larger Gaussian spread is allowed, the accuracy rises quickly with the number of batches but does not reach its highest value. For smaller values of $\sigma$, on the other hand, a higher accuracy is reached at the cost of a larger number of batches.

The calculations were performed on a computer cluster with 8 nodes of 2 CPUs each. The mpi4py module was used for parallel processing mpi4py . Centroids and standard deviations were computed with scikit-learn and NumPy Scikit-learn . The neural network was built with the multilayer perceptron module of scikit-learn.
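The paper does not describe how the work was split across the cluster. The sketch below assumes the simplest scheme: rank 0 distributes the training batches round-robin with mpi4py's pickle-based `comm.scatter`, each rank computes the centroids of its batches, and `comm.gather` collects them on rank 0. It falls back to a serial loop when mpi4py is missing or only one process is running; the function name `parallel_batch_centroids` is hypothetical.

```python
import numpy as np

try:
    from mpi4py import MPI
    COMM = MPI.COMM_WORLD
except ImportError:  # fall back to a single process without mpi4py
    COMM = None


def parallel_batch_centroids(batches):
    """Scatter batches over MPI ranks and gather one centroid per batch on rank 0.

    Returns the list of centroids (in input order) on rank 0, None on other ranks.
    """
    if COMM is None or COMM.Get_size() == 1:
        return [b.mean(axis=0) for b in batches]
    rank, size = COMM.Get_rank(), COMM.Get_size()
    # Rank 0 splits the batches into one round-robin chunk per rank.
    chunks = [batches[r::size] for r in range(size)] if rank == 0 else None
    my_batches = COMM.scatter(chunks, root=0)
    my_centroids = [b.mean(axis=0) for b in my_batches]
    gathered = COMM.gather(my_centroids, root=0)
    if rank != 0:
        return None
    # Undo the round-robin split so the result follows the input order.
    result = [None] * len(batches)
    for r, cents in enumerate(gathered):
        for k, c in enumerate(cents):
            result[r + k * size] = c
    return result


rng = np.random.default_rng(0)
batches = [rng.normal(loc=i, size=(100, 3)) for i in range(8)]
centroids = parallel_batch_centroids(batches)
if centroids is not None:  # only rank 0 receives the result
    print(len(centroids))  # 8
```

Launched with, for example, `mpiexec -n 16 python script.py`, the same file would use all 16 CPUs of the cluster described above; run directly, it computes the centroids serially.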

V Conclusions

A model fusing two different methods is presented here. The performance of Gabor filters in image segmentation was also investigated and proved useful, though a substantial amount of effort was required to fine-tune the filter parameters for consistent output. With this method we have achieved about accuracy. A sub-module is proposed to manipulate images so that distorted images are properly aligned. Early results have shown increased efficiency of . Although this code has been tested only on the MNIST dataset, ongoing work will use other datasets such as the Omniglot character sets and other abstract objects.


  • (1)
  • (2) D.B. Maudsley, “A Theory of Meta-Learning and Principles of Facilitation: An Organismic Perspective”, University of Toronto, 1979.
  • (3) J. Daugman, “Complete Discrete 2-D Gabor Transforms by Neural Networks for Image Analysis and Compression”, IEEE Trans. on Acoustics, Speech, and Signal Processing, vol. 36, no. 7, pp. 1169–1179, July 1988.
  • (4) R. Mehrotra, K.R. Namuduri, N. Ranganathan, “Gabor filter-based edge detection”, vol. 25, no. 12, pp. 1479–1494, December 1992.
  • (5) A.G. Ramakrishnan, S. Kumar Raja, H.V. Raghu Ram, “Neural network-based segmentation of textures using Gabor features”, Proc. 12th IEEE Workshop on Neural Networks for Signal Processing, pp. 365–374, 2002.
  • (6) C.E. Shannon, “A Mathematical Theory of Communication”, Bell System Technical Journal, vol. 27, pp. 379–423, 623–656, July, October 1948.
  • (7) L. Dalcin, P. Kler, R. Paz, A. Cosimo, “Parallel Distributed Computing using Python”, Advances in Water Resources, vol. 34, no. 9, pp. 1124–1139, 2011.
  • (8) F. Pedregosa et al., “Scikit-learn: Machine Learning in Python”, JMLR, vol. 12, pp. 2825–2830, 2011.