Rotation-Invariant Restricted Boltzmann Machine Using Shared Gradient Filters

Finding suitable features has been an essential problem in computer vision. We focus on Restricted Boltzmann Machines (RBMs), which, despite their versatility, cannot accommodate transformations that may occur in the scene. As a result, several approaches have been proposed that consider a set of transformations, which are used either to augment the training set or to transform the actual learned filters. In this paper, we propose the Explicit Rotation-Invariant Restricted Boltzmann Machine, which exploits prior information coming from the dominant orientation of images. Our model extends the standard RBM by adding a suitable number of weight matrices, associated with each dominant orientation. We show that our approach is able to learn rotation-invariant features, comparing it with the classic formulation of RBM on the MNIST benchmark dataset. Overall, requiring fewer hidden units, our method learns compact features, which are robust to rotations.

1 Introduction

It is widely known that a crucial problem in image understanding is to find suitable features for the task at hand. Hand-crafted descriptors are able to provide adequate representations, but they rely on specific structures in the scene and cannot accommodate certain nuisance factors properly. Hence, extensive efforts in learning image representations have been made in the past years, demonstrating that machine learning approaches are able to outperform hand-crafted descriptors [23]. Examples of learned features include vocabulary learning [5], sparse coding [15], Gaussian mixture models [1], and neural networks [2].

Neural networks (NNs) are graphical models, where nodes in a graph are connected by weighted edges and parameters are determined via optimisation algorithms. The Restricted Boltzmann Machine (RBM) has recently gained popularity, mainly because of its applications to deep learning [2, 12]. The RBM is a generative NN constituted by a bipartite graph, whose sides are referred to as the visible layer and the hidden layer respectively. The set of parameters within the RBM is optimised via the Contrastive Divergence (CD) algorithm [11]. Although RBMs can achieve satisfactory results [4], their use in shallow networks (namely, networks with few layers) cannot accommodate complex variability occurring in the scene [20]. To this end, the Deep Belief Network (DBN) was proposed in [14], which is constituted by several stacked RBMs. Although DBNs have been shown to achieve some translation invariance, they may not accommodate other nuisance factors (e.g. rotation) well.

Figure 1: The dominant orientation $\theta_k$ is determined for the provided image and is used to compute the gradient $\Delta W_k$. The contribution of this gradient is shared amongst the other weight matrices $W_s$, $s \neq k$, rotating the learned filters by the angle $\theta_s - \theta_k$ to generate the $\Delta \widetilde{W}_s$ term.

In fact, several modifications of the original RBM formulation have been recently proposed, achieving certain transformation invariance. In [21], a transformation-invariant RBM is proposed, where images are subjected to a predefined set of transformations. In [13], an RBM that learns equivariant features is proposed, adding a new variable to be inferred within the hidden units; this variable is then used to rotate the learned weights accordingly. In [19], a rotation-invariant Convolutional RBM is proposed, where the marginal probability of the RBM is extended with a Markov Random Field, including transformed versions of input images. In [20], an additional step of the backpropagation algorithm used to train DBNs is introduced, where the weights are transformed and the entire network is trained again. In [3], the authors propose an RBM where input images are divided into non-overlapping blocks; then, patches are extracted at SIFT keypoints [18] and subsequently rotated and scaled accordingly. Despite their progress, the aforementioned methods share the following drawbacks: either they are limited to the set of transformations considered within the model, or they involve deep networks in the hope of learning better transformation-invariant features [13, 20, 21], albeit at increased computational cost.

In this paper, instead, we present the Explicit Rotation-Invariant Restricted Boltzmann Machine (ERI-RBM), which can model the nuisance caused by rotated versions of the same pattern, without actually applying any transformation to the data. Our method considers a set of weight matrices (a concept similar to the C-RBM [16]) and each sample is provided to the visible layer along with its dominant orientation [3]. This information is used to select a particular weight matrix during Gibbs sampling to compute the parameter gradients. The contribution given by the new update gradients is shared among the other weight matrices, rotating the filters accordingly [20] (cf. Figure 1). Experiments on MNIST-rot show superior performance over several baselines and a recent method from the literature.

Our contributions are three-fold: (i) rotation is treated explicitly, without rotating the image patterns, in contrast to, for example, [21]; (ii) we adopt a shallow model using a limited number of additional weight matrices, instead of deep architectures [17]; (iii) we share the contribution coming from a weight matrix with the other ones, rotating the learned filters by suitable angles.

This paper is organised as follows. Section 2 describes the proposed Explicit Rotation-Invariant Restricted Boltzmann Machine. In Section 3, we present experimental results, whereas Section 4 concludes the manuscript.

2 Explicit Rotation-Invariant RBM (ERI-RBM)

In this section, we discuss how to embed the concept of rotation-invariance explicitly in the RBM formulation. Since input patterns are images, we will assume that neurones in the visible layer are arranged in matrix form of size $w \times h$, width and height respectively. Each row in the weight matrix $W$, connecting visible units to hidden units, is a $wh$-dimensional vector. Therefore, each row in $W$ can also be arranged in matrix form of size $w \times h$. Henceforth, we will refer to rows in the weight matrix $W$ as learned filters, and to rows in the gradient $\Delta W$ computed during the Contrastive Divergence algorithm as update filters.
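To make this filter view concrete, the following minimal NumPy sketch (with hypothetical MNIST-like dimensions; all names are illustrative, not the authors' code) shows how a row of $W$ is reinterpreted as a $w \times h$ image:

```python
import numpy as np

# Hypothetical dimensions: a 28x28 visible layer (as in MNIST), 100 hidden units.
w, h, H = 28, 28, 100

# Weight matrix connecting hidden to visible units: each of the H rows
# is a (w*h)-dimensional vector, i.e. a learned filter.
W = 0.01 * np.random.randn(H, w * h)

# The j-th learned filter, viewed as a w x h image.
j = 0
filter_j = W[j].reshape(w, h)
```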

2.1 Proposed model

Let $\Theta = \{\theta_1, \dots, \theta_S\}$ be a set of evenly distanced angles, such that $\theta_{s+1} - \theta_s = 2\pi / S$ for any $s$. In our model, we augment the number of weight matrices, such that every angle $\theta_s$ is associated to a matrix $W_s \in \mathbb{R}^{H \times V}$. Here, $H$ is the number of hidden units, $V$ the number of visible units, and $S$ is the number of angles. In addition, each weight matrix $W_s$ has an associated bias vector $\mathbf{b}_s \in \mathbb{R}^H$. Hence, we rewrite the energy function characterising the standard Restricted Boltzmann Machine formulation as follows:

$$E(\mathbf{v}, \mathbf{h}) = -\mathbf{h}^T W_k \mathbf{v} - \mathbf{b}_k^T \mathbf{h} - \mathbf{c}^T \mathbf{v} \qquad (1)$$

where $W_k$ is the $k$-th weight matrix, $\mathbf{b}_k$ is the bias vector for the hidden layer associated to $W_k$, with $1 \le k \le S$, and $\mathbf{c}$ is the bias vector for the visible layer. The index $k$ is uniquely determined on each input image $\mathbf{v}$, and will be discussed thoroughly in Section 2.2. Because of the modification in (1), all the equations involved in the CD algorithm have to be rewritten. Specifically, the conditional probabilities become:

$$p(h_j = 1 \mid \mathbf{v}) = \sigma\Big( b_{k,j} + \textstyle\sum_i W_{k,ji} \, v_i \Big) \qquad (2)$$
$$p(v_i = 1 \mid \mathbf{h}) = \sigma\Big( c_i + \textstyle\sum_j W_{k,ji} \, h_j \Big) \qquad (3)$$

where $\sigma(\cdot)$ denotes the logistic sigmoid.
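A minimal sketch of the resulting Gibbs step, assuming Bernoulli units and storing the $S$ weight matrices and hidden biases as arrays indexed by the orientation index $k$ (names and shapes are our assumptions, not the reference implementation):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gibbs_step(v, k, W, b, c, rng):
    """One alternation of (2) and (3), using the weight matrix W[k]
    selected by the dominant-orientation index k.
    Shapes: v (V,), W (S, H, V), b (S, H), c (V,)."""
    # (2): sample the hidden layer given the visible layer.
    p_h = sigmoid(b[k] + W[k] @ v)
    h = (rng.random(p_h.shape) < p_h).astype(float)
    # (3): sample the visible layer given the hidden layer.
    p_v = sigmoid(c + W[k].T @ h)
    v_new = (rng.random(p_v.shape) < p_v).astype(float)
    return v_new, h, p_h, p_v
```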

During the optimisation algorithm, an image $\mathbf{v}$ with dominant orientation $\theta_k$ is provided to the Gibbs sampling. After a sufficient number of alternating computations of (2) and (3), the gradient $\Delta W_k$ can be computed, whose contribution is shared with the remaining matrices $W_s$, $s \neq k$. To update $W_s$, $s \neq k$, we transform the update filters in $\Delta W_k$, which are then added to the $s$-th gradient. Specifically, since we can represent rows in $\Delta W_k$ as images, they can be rotated by the angle $\theta_s - \theta_k$. Therefore, we define a new shared update filter term $\Delta \widetilde{W}_s$, such that

$$\Delta \widetilde{W}_s^{(j)} = R_{\theta_s - \theta_k} \big( \Delta W_k^{(j)} \big), \quad s \neq k, \qquad (4)$$

where $R_\alpha$ defines the 2D rotation by an angle $\alpha$ and $\Delta W_k^{(j)}$ denotes the $j$-th update filter. This operation may generate filters bigger than the input layer and we crop them such that the filter size remains $w \times h$. At this point, the final expression for the gradient is updated as follows:

$$\Delta W_s \leftarrow \Delta W_s + \Delta \widetilde{W}_s \qquad (5)$$

Note that (5) will be utilised within the Stochastic Gradient Descent step of the CD algorithm. Therefore, $\Delta W_s$ will be multiplied by a small learning rate $\eta$ (further details are discussed in [10]). Hence, any side effects originating from pixel interpolation are minimised, precisely because of the small $\eta$. Gradients are computed as described in [11], using samples with the associated dominant orientation $\theta_k$.
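The gradient-sharing step of (4)-(5) can be sketched as follows; we use scipy.ndimage.rotate for the filter rotation, and the cropping behaviour (reshape=False keeps the $w \times h$ size) as well as all variable names are our assumptions rather than the reference implementation:

```python
import numpy as np
from scipy.ndimage import rotate

def share_update_filters(dW_k, k, angles, w, h):
    """Given the CD gradient dW_k (shape H x w*h) computed for the
    orientation-indexed matrix W_k, build the shared update term (4)
    for every weight matrix."""
    shared = []
    for s, theta_s in enumerate(angles):
        if s == k:
            shared.append(dW_k)  # the s = k term is the gradient itself
            continue
        dW_s = np.empty_like(dW_k)
        for j in range(dW_k.shape[0]):
            filt = dW_k[j].reshape(w, h)
            # Rotate the update filter by the angle difference; reshape=False
            # keeps the output at w x h (the cropping described in the text).
            rot = rotate(filt, np.degrees(theta_s - angles[k]),
                         reshape=False, order=1)
            dW_s[j] = rot.ravel()
        shared.append(dW_s)
    return shared  # shared[s] is added to the s-th gradient, as in (5)
```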

2.2 Finding the dominant angle and corresponding index

Figure 2: Computation of the dominant orientation for a sample image taken from the MNIST dataset: (a) original sample, (b) gradients of the image, (c) histogram of oriented gradients with highlighted mode $\theta_k$, (d) sample rotated by $\theta_k$ degrees. The region marked by a green ellipse corresponds to the same portion of the number 3 in the original and rotated image. Observe the differences due to image interpolation introduced during rotation.

Each image is associated to an angle $\theta_k \in \Theta$, determined by the histogram of oriented gradients from [6]. Derivatives along the $x$ and $y$ directions are computed and the angle of each gradient vector can be determined. All the vectors are accumulated into a histogram with $S$ bins and the angle with the highest frequency is found. Formally, we select the index $k$ such that $f_k \ge f_s$, $\forall s \in \{1, \dots, S\}$, where $f_s$ denotes the (weighted) frequency of the $s$-th bin. Figure 2 shows these steps graphically: from the original image pattern (a), derivatives are computed using Sobel filters (b). Subsequently, we build the weighted histogram of oriented gradients and the angle with the highest frequency is selected (c). We highlight in red the $k$-th bin of the histogram, which determines $\theta_k$ for the illustrated example. In (d) we report a version of the sample image rotated by $\theta_k$ degrees, to show the deleterious effect of image interpolation.

Since strong edges near image boundaries may bias the estimation of the dominant gradient, the magnitude of the corresponding vectors is weighted with a Gaussian kernel, whose standard deviation is determined by the width and height of the image, such that central gradients contribute more than those at the boundaries. (We found this choice covers the entire image evenly without exceeding its size.)
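Under our reading of this section, the estimation can be sketched as follows (Sobel gradients, Gaussian-weighted magnitudes, an $S$-bin orientation histogram); the kernel width and the bin layout over $[0, 2\pi)$ are assumptions:

```python
import numpy as np
from scipy.ndimage import sobel

def dominant_orientation_index(img, S):
    """Return the index k of the dominant-orientation bin of an image."""
    gx = sobel(img, axis=1)                       # derivative along x
    gy = sobel(img, axis=0)                       # derivative along y
    mag = np.hypot(gx, gy)                        # gradient magnitudes
    ang = np.mod(np.arctan2(gy, gx), 2 * np.pi)   # orientations in [0, 2*pi)

    # Gaussian weighting so that central gradients contribute more than
    # those near the boundaries (the kernel width is an assumed choice).
    rows, cols = img.shape
    yy, xx = np.mgrid[0:rows, 0:cols]
    sigma = min(rows, cols) / 2.0
    g = np.exp(-((xx - cols / 2) ** 2 + (yy - rows / 2) ** 2) / (2 * sigma ** 2))

    # Accumulate weighted magnitudes into S orientation bins; the mode is k.
    hist, _ = np.histogram(ang, bins=S, range=(0, 2 * np.pi), weights=mag * g)
    return int(np.argmax(hist))
```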

3 Experimental results

Setup: We used the MNIST-rot dataset (available at http://www.iro.umontreal.ca/~lisa/twiki/bin/view.cgi/Public/DeepVsShallowComparisonICML2007) [14], containing 10,000 images for training, 2,000 for validation, and 50,000 for testing. This dataset is derived from the MNIST dataset, where samples were rotated by random angles. To enable comparison with other methods, for consistency, we kept this dataset split and did not perform cross-validation (which could have provided variances for statistical analysis). Since each image contains several non-zero entries close to 0, we threshold them at a small value. We compare ERI-RBM with several informative baselines and a recent invariant method. Classical RBM: we trained a standard Bernoulli Restricted Boltzmann Machine and compared its results with our Explicit Rotation-Invariant RBM. Dominant RBM (D-RBM): we built a simplified model that learns an RBM for each dominant orientation, splitting the training set into $S$ partitions, each associated to a different RBM (i.e., we have $S$ independent RBMs). Oriented RBM (O-RBM): we pre-process the dataset by aligning all images, according to their dominant orientation, to a reference orientation and train a single RBM. TI-RBM: we also compared with the method in [21], using the authors' implementation (available at https://github.com/kihyuks/icml2012_tirbm). Extracted features are provided to the following classifiers: linear and RBF SVM [22], softmax [9], and K-NN [7].
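The classification pipeline might look like the following sketch, where the hidden activation probabilities serve as the feature vector of each image; extract_features and all other names here are illustrative assumptions, not the authors' code:

```python
import numpy as np
from sklearn.svm import SVC

def extract_features(images, W, b, ks):
    """Hidden-unit activation probabilities p(h=1|v), one row per sample,
    computed with the weight matrix selected by each image's index k."""
    feats = [1.0 / (1.0 + np.exp(-(b[k] + W[k] @ v)))
             for v, k in zip(images, ks)]
    return np.array(feats)

# Hypothetical usage: train an RBF SVM on the extracted features.
# X_train = extract_features(train_images, W, b, train_ks)
# clf = SVC(kernel="rbf").fit(X_train, train_labels)
```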

Method                        RBF SVM    Linear SVM   Softmax    K-NN (K=3)
RBM (H=100)                   87.37%     59.27%       57.80%     82.69%
D-RBM (H=100, S=4)            83.44%     58.95%       56.80%     78.84%
D-RBM (H=100, S=9)            79.18%     53.62%       50.76%     73.56%
D-RBM (H=100, S=18)           69.84%     49.20%       46.58%     63.61%
O-RBM (H=100, S=18)           87.37%     58.99%       57.80%     82.69%
ERI-RBM (H=100, S=4)          78.49%     60.27%       58.31%     74.97%
ERI-RBM (H=100, S=9)          91.27%     74.87%       73.02%     88.48%
ERI-RBM (H=100, S=18)         92.08%     77.69%       75.84%     89.34%
TI-RBM [21] (H=100, S=18)     80.63%     69.10%       68.20%     73.60%
Table 1: Testing accuracies of standard RBM, Dominant RBM, Oriented RBM, TI-RBM [21], and our proposed ERI-RBM.

Parameters: We set the number of hidden units to $H = 100$, while progressively increasing the number of bins $S$ used to generate the histogram of orientations. Following the instructions in [10], we set a small learning rate and a constant momentum, and the Contrastive Divergence algorithm was iterated for up to 200 epochs. The parameters for the SVMs were found using logarithmic grid search and the best values are reported in Table 1. We arbitrarily set $K = 3$ for the K-NN, using the Euclidean distance as metric. For TI-RBM [21], a set of transformations is considered, each associated with an array of hidden units, while a single weight matrix is used; the final representation used during inference is obtained by max-pooling. To make the comparison with ERI-RBM fair, we disabled the sparsity term of TI-RBM and set its number of hidden units to $H = 100$.

Discussion: We report our results in Table 1; the nonlinear SVM gave the best performance in all cases. The baseline is given by RBM, with an accuracy of 87.37%. Tests using D-RBM show a gradual loss of accuracy as the number of dominant orientations is increased. This behaviour can be attributed to the lack of information sharing amongst the RBMs, since each was trained independently with less data (per RBM). Overall, our proposed model outperforms the baseline RBM (92.08% at $S = 18$). At $S = 4$, ERI-RBM suffers a loss of performance because of the coarse quantisation of the angle space: the angles 0°, 90°, 180°, and 270° yield orthogonal rotations when shared update filters are computed for neighbouring matrices, causing the propagation of sharp rotations that do not contribute much. As $S$ increases, ERI-RBM shows an improvement of almost 5% over the baseline, demonstrating that our model is able to learn rotation-invariant features. This is also displayed in Figure 3, showing the filters learned when $S = 18$. O-RBM shows no improvement compared to RBM, demonstrating that it is the contribution provided by the shared update filters that increases the discriminative power of the final representation. Note that we also trained the classical RBM with more hidden units, noticing an improvement that was still lower than ERI-RBM's. Finally, using the same experimental setup, ERI-RBM outperformed [21] by more than 11% in testing accuracy. (These results differ from those reported in [21] since sparsity is not present and we used fewer units.) Our approach does rely on the determination of the orientation, which could be seen as a limitation. Preliminary results (not shown for brevity), obtained by artificially perturbing the orientation estimate, show that we are tolerant to such errors up to a few bins off the original estimate. This remains to be confirmed on images with cluttered backgrounds.

Figure 3: Filters learned by our ERI-RBM at $S = 18$. We highlight a filter that appears at four different rotations, showing that our model learns rotation-invariant filters. The remaining weight matrices are omitted for brevity.

4 Conclusions

In this paper we proposed the Explicit Rotation-Invariant Restricted Boltzmann Machine (ERI-RBM). Current approaches do not address the problem of rotation-invariance directly, but use a predefined set of transformations to transform either the input images [19, 21] or the learned filters [13, 20]. We were inspired by these approaches to modify the RBM learning process so as to learn invariant features without taking into account all possible transformations, which is demanding and may propagate noise due to pixel interpolation.

Our ERI-RBM utilises the dominant gradient orientation of input images in order to select the best set of filters to optimise. We find the corresponding gradients efficiently and update the filters in a process where information is shared across the different filters, thus minimising any effects of interpolation. Overall, our model learns rotation-invariant features and achieves an accuracy of 92.08% on the MNIST-rot dataset. Comparisons with several baselines and approaches from the literature showed superior performance in a common experimental setup. Moreover, compared with the deep architecture of [8] and its results on MNIST-rot, ERI-RBM reached similar performance using just 100 hidden units against the 500 in [8]. In conclusion, ERI-RBM is able to learn rotation-invariant features in an unsupervised fashion, with a reduced number of hidden units, within a shallow network.

Acknowledgements

We thank NVIDIA Corporation for providing us with a Titan X GPU.

References

  • [1] Agarwal, A., Triggs, B.: Computer Vision – ECCV: 9th European Conference on Computer Vision, chap. Hyperfeatures – Multilevel Local Coding for Visual Recognition, pp. 30–43. Springer Berlin Heidelberg (2006)
  • [2] Arel, I., Rose, D.C., Karnowski, T.P.: Deep Machine Learning - A New Frontier in Artificial Intelligence Research. IEEE Computational Intelligence Magazine 5(4), 13–18 (2010)
  • [3] Cheng, D., Sun, T., Jiang, X., Wang, S.: Unsupervised feature learning using Markov deep belief network. In: 2013 IEEE International Conference on Image Processing. pp. 260–264. No. 20120073110053, IEEE (2013)
  • [4] Coates, A., Arbor, A., Ng, A.Y.: An Analysis of Single-Layer Networks in Unsupervised Feature Learning. AISTATS pp. 215–223 (2011)
  • [5] Csurka, G., Dance, C.R., Fan, L., Willamowski, J., Bray, C.: Visual categorization with bags of keypoints. Proceedings of the ECCV International Workshop on Statistical Learning in Computer Vision pp. 59–74 (2004)
  • [6] Dalal, N., Triggs, B.: Histograms of oriented gradients for human detection. Proceedings of the IEEE CVPR 1, 886–893 (2005)
  • [7] Dasarathy, B.: Nearest neighbor (NN) norms: NN pattern classification techniques. IEEE Computer Society Press (1991)
  • [8] Gens, R., Domingos, P.M.: Deep Symmetry Networks. In: NIPS. pp. 2537–2545. Curran Associates, Inc. (2014)
  • [9] Hastie, T., Tibshirani, R., Friedman, J.: The Elements of Statistical Learning, Springer Series in Statistics, vol. 1. Springer New York, 2nd edn. (2009)
  • [10] Hinton, G.: A Practical Guide to Training Restricted Boltzmann Machines. Springer Berlin Heidelberg, 2nd edn. (2012)
  • [11] Hinton, G.E.: Training products of experts by minimizing contrastive divergence. Neural computation 14(8), 1771–1800 (2002)
  • [12] Hinton, G.E., Osindero, S., Teh, Y.W.: A fast learning algorithm for deep belief nets. Neural computation 18(7), 1527–54 (2006)
  • [13] Kivinen, J.J., Williams, C.K.I.: Transformation Equivariant Boltzmann Machines. In: ICANN, vol. 6791, pp. 1–9. Springer (2011)
  • [14] Larochelle, H., Erhan, D., Courville, A., Bergstra, J., Bengio, Y.: An empirical evaluation of deep architectures on problems with many factors of variation. Proceedings of the 24th ICML pp. 473–480 (2007)
  • [15] Lee, H., Battle, A., Raina, R., Ng, A.Y.: Efficient Sparse coding algorithms. Advances in neural information processing systems pp. 801–808 (2006)
  • [16] Lee, H., Ekanadham, C., Ng, A.Y.: Sparse deep belief net model for visual area V2. Advances in Neural Information Processing Systems pp. 873–880 (2008)
  • [17] Lee, H., Grosse, R., Ranganath, R., Ng, A.Y.: Convolutional deep belief networks for scalable unsupervised learning of hierarchical representations. ICML (2009)
  • [18] Lowe, D.G.: Object recognition from local scale-invariant features. In: ICCV (1999)
  • [19] Schmidt, U., Roth, S.: Learning rotation-aware features: From invariant priors to equivariant descriptors. Proceedings of the IEEE CVPR pp. 2050–2057 (2012)
  • [20] Shou, Z., Zhang, Y., Cai, H.J.: A study of transformation-invariances of deep belief networks. In: IJCNN. pp. 1–8. IEEE (2013)
  • [21] Sohn, K., Lee, H.: Learning Invariant Representations with Local Transformations. Proceedings of the 29th ICML pp. 1311–1318 (2012)
  • [22] Vapnik, V.: Statistical Learning Theory. John Wiley and Sons (1998)
  • [23] Wei, X., Phung, S.L., Bouzerdoum, A.: Visual descriptors for scene categorization: experimental evaluation. Artificial Intelligence Review 45(3), 1–36 (2015)