Training Neural Networks with Stochastic Hessian-Free Optimization

01/16/2013
by Ryan Kiros

Hessian-free (HF) optimization has been successfully used for training deep autoencoders and recurrent networks. HF uses the conjugate gradient algorithm to construct update directions from curvature-vector products, which can be computed in time on the same order as gradients. In this paper we exploit this property and study stochastic HF with gradient and curvature mini-batches whose sizes are independent of the dataset size. We modify Martens' HF for this setting and integrate dropout, a method for preventing co-adaptation of feature detectors, to guard against overfitting. Stochastic Hessian-free optimization occupies a middle ground between SGD and HF, and achieves competitive performance on both classification and deep autoencoder experiments.
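The core computational idea is concrete enough to sketch. For a loss L(θ), a curvature-vector product Bv can be obtained with two passes of automatic differentiation, so it costs roughly as much as one gradient evaluation, and conjugate gradient needs nothing but these products to solve the damped linear system (B + λI)d = -g for an update direction d. The sketch below is a minimal illustration under stated assumptions, not the paper's implementation: it is written in JAX, uses a plain Hessian-vector product where Martens' HF uses the Gauss-Newton product, and the toy loss, parameter names, and damping constant `lam` are all hypothetical.

```python
# Minimal sketch of the curvature-vector product and CG solve behind HF.
# Not the paper's code: uses a plain Hessian-vector product (Martens' HF
# uses the Gauss-Newton product) and a hypothetical toy loss.
import jax
import jax.numpy as jnp

def loss(params, batch):
    # Hypothetical one-hidden-layer regression loss, for illustration only.
    x, y = batch
    pred = jnp.tanh(x @ params["W"]) @ params["w_out"]
    return jnp.mean((pred - y) ** 2)

def hvp(params, batch, v):
    # Hessian-vector product via forward-over-reverse autodiff:
    # costs on the same order of time as one gradient evaluation.
    grad_fn = lambda p: jax.grad(loss)(p, batch)
    return jax.jvp(grad_fn, (params,), (v,))[1]

def cg_direction(params, grad, curv_batch, lam=1.0, iters=10):
    # Solve (H + lam*I) d = -grad by conjugate gradient; the curvature
    # mini-batch is touched only through hvp(), never as a full matrix.
    tmap, leaves = jax.tree_util.tree_map, jax.tree_util.tree_leaves
    dot = lambda a, b: sum(jnp.vdot(u, w) for u, w in zip(leaves(a), leaves(b)))
    d = tmap(jnp.zeros_like, grad)   # initial direction d = 0
    r = tmap(lambda g: -g, grad)     # residual r = -grad when d = 0
    p = r
    rs = dot(r, r)
    for _ in range(iters):
        Ap = tmap(lambda h, q: h + lam * q, hvp(params, curv_batch, p), p)
        alpha = rs / dot(p, Ap)
        d = tmap(lambda di, pi: di + alpha * pi, d, p)
        r = tmap(lambda ri, ai: ri - alpha * ai, r, Ap)
        rs_new = dot(r, r)
        p = tmap(lambda ri, pi: ri + (rs_new / rs) * pi, r, p)
        rs = rs_new
    return d
```

A full stochastic HF step would then compute the gradient on a gradient mini-batch, call cg_direction with a separate and typically smaller curvature mini-batch, apply the resulting direction with a line search or fixed step size, and adapt the damping λ with a Levenberg-Marquardt style rule as in Martens' HF; dropout enters simply through the forward pass inside the loss.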


Related research

12/20/2017 · Block-diagonal Hessian-free Optimization for Training Neural Networks
Second-order methods for neural network optimization have several advant...

11/09/2020 · Self-Tuning Stochastic Optimization with Curvature-Aware Gradient Filtering
Standard first-order stochastic optimization algorithms base their updat...

09/11/2015 · Hessian-free Optimization for Learning Deep Multidimensional Recurrent Neural Networks
Multidimensional recurrent neural networks (MDRNNs) have shown a remarka...

09/05/2013 · Accelerating Hessian-free optimization for deep neural networks by implicit preconditioning and sampling
Hessian-free training has become a popular parallel second-order optim...

06/03/2020 · On the Promise of the Stochastic Generalized Gauss-Newton Method for Training DNNs
Following early work on Hessian-free methods for deep learning, we study...

09/16/2017 · Feedforward and Recurrent Neural Networks Backward Propagation and Hessian in Matrix Form
In this paper we focus on the linear algebra theory behind feedforward (...

03/12/2021 · A Distributed Optimisation Framework Combining Natural Gradient with Hessian-Free for Discriminative Sequence Training
This paper presents a novel natural gradient and Hessian-free (NGHF) opt...
