Practical Riemannian Neural Networks

by Gaétan Marceau-Caron, et al.

We provide the first experimental results on non-synthetic datasets for the quasi-diagonal Riemannian gradient descents for neural networks introduced in [Ollivier, 2015]. These include the MNIST, SVHN, and FACE datasets as well as a previously unpublished electroencephalogram dataset. The quasi-diagonal Riemannian algorithms consistently beat simple stochastic gradient descents by a varying margin. The computational overhead with respect to simple backpropagation is around a factor of 2. Perhaps more interestingly, these methods also reach their final performance quickly, thus requiring fewer training epochs and a smaller total computation time. We also present an implementation guide for these Riemannian gradient descents, showing how the quasi-diagonal versions can be implemented with minimal effort on top of existing routines that compute gradients.
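The quasi-diagonal reduction of [Oll15] keeps, for each neuron, only the diagonal of the metric plus the cross terms between the bias and each incoming weight, so the metric can be inverted in closed form one 2x2 block at a time. The sketch below is illustrative, not the authors' implementation: the function names, the small `eps` damping term, and the plain outer-product (Fisher-style) metric estimate from per-sample gradients are our assumptions; it only shows how the quasi-diagonal formulas combine quantities an existing backpropagation routine already produces.

```python
import numpy as np

def quasi_diagonal_metric(G0, G):
    """Fisher-style metric estimate from per-sample gradients.

    G0: (n_samples,) gradients w.r.t. the bias of one neuron.
    G:  (n_samples, n_weights) gradients w.r.t. its incoming weights.
    Returns the bias term A00, the bias-weight cross terms A0,
    and the diagonal terms Ad (only these entries are kept).
    """
    A00 = np.mean(G0 ** 2)                     # E[g0^2]
    A0 = np.mean(G0[:, None] * G, axis=0)      # E[g0 * gi]
    Ad = np.mean(G ** 2, axis=0)               # E[gi^2]
    return A00, A0, Ad

def quasi_diagonal_step(g0, g, A00, A0, Ad, eps=1e-8):
    """Apply the quasi-diagonal inverse metric to a gradient.

    Each weight direction comes from inverting the 2x2 block
    [[A00, A0i], [A0i, Aii]]; eps is an assumed damping term
    to avoid division by zero, not part of the original method.
    """
    denom = A00 * Ad - A0 ** 2 + eps           # 2x2 block determinants
    delta = (A00 * g - A0 * g0) / denom        # weight directions
    delta0 = (g0 - np.dot(A0, delta)) / (A00 + eps)  # bias direction
    return delta0, delta
```

With a single incoming weight, the step exactly solves the 2x2 system, so it reduces to a full natural-gradient step for that neuron; with the cross terms set to zero, it falls back to a diagonally rescaled gradient.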








Code Repositories


An extension to Torch7's nn package.



  • [Ama98] Shun-Ichi Amari. Natural gradient works efficiently in learning. Neural Comput., 10(2):251–276, February 1998.
  • [DHS11] John C. Duchi, Elad Hazan, and Yoram Singer. Adaptive subgradient methods for online learning and stochastic optimization. Journal of Machine Learning Research, 12:2121–2159, 2011.
  • [HJLM07] Gary B. Huang, Vidit Jain, and Erik Learned-Miller. Unsupervised joint alignment of complex images. In ICCV, 2007.
  • [Hoc91] Sepp Hochreiter. Untersuchungen zu dynamischen neuronalen Netzen. Master's thesis, Technische Universität München, München, 1991.
  • [LC] Yann LeCun and Corinna Cortes. The MNIST database of handwritten digits.
  • [Mar14] James Martens. New perspectives on the natural gradient method. CoRR, abs/1412.1193, 2014.
  • [NWC11] Yuval Netzer, Tao Wang, Adam Coates, Alessandro Bissacco, Bo Wu, and Andrew Y. Ng. Reading digits in natural images with unsupervised feature learning. In NIPS Workshop on Deep Learning and Unsupervised Feature Learning, 2011.
  • [Oll15] Yann Ollivier. Riemannian metrics for neural networks I: feedforward networks. Information and Inference, 4(2):108–153, 2015.
  • [PB13] Razvan Pascanu and Yoshua Bengio. Natural gradient revisited. CoRR, abs/1301.3584, 2013.
  • [RMB07] Nicolas Le Roux, Pierre-Antoine Manzagol, and Yoshua Bengio. Topmoumoute online natural gradient algorithm. In John C. Platt, Daphne Koller, Yoram Singer, and Sam T. Roweis, editors, Advances in Neural Information Processing Systems 20, Proceedings of the Twenty-First Annual Conference on Neural Information Processing Systems, Vancouver, British Columbia, Canada, December 3-6, 2007, pages 849–856. Curran Associates, Inc., 2007.
  • [SHK14] Nitish Srivastava, Geoffrey E. Hinton, Alex Krizhevsky, Ilya Sutskever, and Ruslan Salakhutdinov. Dropout: a simple way to prevent neural networks from overfitting. Journal of Machine Learning Research, 15(1):1929–1958, 2014.