Unifying the Stochastic Spectral Descent for Restricted Boltzmann Machines with Bernoulli or Gaussian Inputs

03/28/2017
by Kai Fan, et al.

Stochastic gradient descent (SGD) based algorithms are typically used as general-purpose optimization tools for most deep learning models. A Restricted Boltzmann Machine (RBM) is a probabilistic generative model that can be stacked to construct deep architectures. For RBMs with Bernoulli inputs, non-Euclidean algorithms such as stochastic spectral descent (SSD) have been specifically designed to speed up convergence by making better use of the gradient estimates obtained from sampling methods. However, the existing algorithm and its theoretical justification rely on the assumption that the set of possible input configurations is finite, as with binary variables. The purpose of this paper is to generalize SSD to Gaussian RBMs, which can model continuous data, without relying on that assumption. We propose gradient descent methods in a non-Euclidean space of parameters, derived from upper bounds on the logarithmic partition function of RBMs based on the Schatten-infinity norm. We empirically demonstrate the advantage of SSD over SGD.
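For intuition, here is a minimal NumPy sketch of this kind of non-Euclidean update, not the paper's implementation: a steepest-descent step under the Schatten-infinity norm replaces the gradient by its "#"-image, the nuclear norm of the gradient times the orthogonal factor of its SVD. The CD-1 gradient estimator, the bias-free RBM parameterization, and the step size below are all illustrative assumptions.

import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def cd1_gradient(W, v0):
    """CD-1 estimate of the negative log-likelihood gradient for a
    Bernoulli RBM with weights W (visible x hidden); biases are
    omitted here purely for brevity."""
    h0 = sigmoid(v0 @ W)                                   # positive-phase hidden probabilities
    v1 = (sigmoid(h0 @ W.T) > rng.random(v0.shape)).astype(float)  # one Gibbs reconstruction
    h1 = sigmoid(v1 @ W)                                   # negative-phase hidden probabilities
    return -(v0.T @ h0 - v1.T @ h1) / v0.shape[0]

def ssd_step(W, grad, lr=0.1):
    """Steepest-descent step under the Schatten-infinity norm:
    for grad = U diag(s) Vt, descend along the nuclear norm of the
    gradient times the orthogonal factor U @ Vt."""
    U, s, Vt = np.linalg.svd(grad, full_matrices=False)
    return W - lr * s.sum() * (U @ Vt)

# Toy usage: a few SSD steps on random binary data.
v = (rng.random((64, 20)) > 0.5).astype(float)
W = 0.01 * rng.standard_normal((20, 10))
for _ in range(5):
    W = ssd_step(W, cd1_gradient(W, v))

Because each step requires an SVD of the weight gradient, SSD trades a higher per-iteration cost for fewer iterations, which pays off when the gradient's singular-value spectrum is far from uniform.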
