Stochastic Training of Neural Networks via Successive Convex Approximations

06/15/2017
by Simone Scardapane, et al.

This paper proposes a new family of algorithms for training neural networks (NNs). These are based on recent developments in the field of non-convex optimization, going under the general name of successive convex approximation (SCA) techniques. The basic idea is to iteratively replace the original (non-convex, high-dimensional) learning problem with a sequence of (strongly convex) approximations, which are both accurate and simple to optimize. Unlike similar approaches (e.g., quasi-Newton algorithms), the approximations can be constructed using only first-order information of the neural network function, in a stochastic fashion, while exploiting the overall structure of the learning problem for faster convergence. We discuss several use cases, based on different choices for the loss function (e.g., squared loss and cross-entropy loss) and for the regularization of the NN's weights. We experiment on several medium-sized benchmark problems, and on a large-scale dataset involving simulated physical data. The results show that the algorithm outperforms state-of-the-art techniques, providing faster convergence to a better minimum. Additionally, we show how the algorithm can be easily parallelized over multiple computational units without hindering its performance. In particular, each computational unit can optimize a tailored surrogate function defined on a randomly assigned subset of the input variables, whose dimension can be selected depending entirely on the available computational power.
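To make the SCA idea concrete, the sketch below shows one simplified variant: at each iteration the non-convex loss is replaced by a strongly convex surrogate (here, a first-order linearization plus an L2 regularizer and a proximal term), the surrogate is minimized in closed form, and the iterate is updated as a convex combination with a diminishing step size. This is a minimal illustration under assumed choices (toy one-hidden-layer network, squared loss, hand-picked values for the proximal weight tau, the regularization strength lam, and the step-size schedule gamma); it is not the paper's exact surrogate construction, which preserves more of the problem's structure.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy setup: one-hidden-layer network, squared loss, L2 regularizer.
n_in, n_hid, n_out, n_samples = 5, 10, 1, 512
X = rng.normal(size=(n_samples, n_in))
y = np.sin(X @ rng.normal(size=(n_in, n_out)))          # synthetic regression target

def unpack(w):
    """Split the flat parameter vector into the two weight matrices."""
    W1 = w[: n_in * n_hid].reshape(n_in, n_hid)
    W2 = w[n_in * n_hid :].reshape(n_hid, n_out)
    return W1, W2

def loss_and_grad(w, idx):
    """Squared loss and its gradient on the minibatch indexed by idx."""
    W1, W2 = unpack(w)
    Xb, yb = X[idx], y[idx]
    H = np.tanh(Xb @ W1)                                 # hidden activations
    err = H @ W2 - yb                                    # residuals
    loss = 0.5 * np.mean(np.sum(err ** 2, axis=1))
    gW2 = H.T @ err / len(idx)
    gH = err @ W2.T * (1.0 - H ** 2)                     # backprop through tanh
    gW1 = Xb.T @ gH / len(idx)
    return loss, np.concatenate([gW1.ravel(), gW2.ravel()])

# SCA loop: linearized loss + L2 regularizer + proximal term as the convex surrogate
#   U(z; w) = g(w)^T (z - w) + (tau/2)||z - w||^2 + (lam/2)||z||^2
# whose minimizer is available in closed form.
tau, lam = 1.0, 1e-3                                     # assumed proximal weight and L2 strength
w = rng.normal(scale=0.1, size=n_in * n_hid + n_hid * n_out)

for t in range(200):
    idx = rng.choice(n_samples, size=64, replace=False)  # stochastic minibatch
    loss, g = loss_and_grad(w, idx)
    z_hat = (tau * w - g) / (tau + lam)                  # closed-form surrogate minimizer
    gamma = 1.0 / (1.0 + 0.01 * t)                       # diminishing step size
    w = w + gamma * (z_hat - w)                          # convex combination update
    if t % 50 == 0:
        print(f"iter {t:3d}  minibatch loss {loss:.4f}")
```

With this simple linearized surrogate the update reduces to a proximal-gradient-style step; richer surrogates (e.g., keeping the loss convex in the output-layer weights) follow the same pattern but require an inner convex solver instead of the closed-form minimizer.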


