Surfing: Iterative optimization over incrementally trained deep networks

07/19/2019
by   Ganlin Song, et al.
3

We investigate a sequential optimization procedure to minimize the empirical risk functional f_θ̂(x) = 1/2G_θ̂(x) - y^2 for certain families of deep networks G_θ(x). The approach is to optimize a sequence of objective functions that use network parameters obtained during different stages of the training process. When initialized with random parameters θ_0, we show that the objective f_θ_0(x) is "nice" and easy to optimize with gradient descent. As learning is carried out, we obtain a sequence of generative networks x G_θ_t(x) and associated risk functions f_θ_t(x), where t indicates a stage of stochastic gradient descent during training. Since the parameters of the network do not change by very much in each step, the surface evolves slowly and can be incrementally optimized. The algorithm is formalized and analyzed for a family of expansive networks. We call the procedure surfing since it rides along the peak of the evolving (negative) empirical risk function, starting from a smooth surface at the beginning of learning and ending with a wavy nonconvex surface after learning is complete. Experiments show how surfing can be used to find the global optimum and for compressed sensing even when direct gradient descent on the final learned network fails.

READ FULL TEXT

page 1

page 2

page 3

page 4

research
06/06/2020

Frank-Wolfe optimization for deep networks

Deep neural networks is today one of the most popular choices in classif...
research
09/29/2022

Computational Complexity of Sub-linear Convergent Algorithms

Optimizing machine learning algorithms that are used to solve the object...
research
11/20/2018

Variance Suppression: Balanced Training Process in Deep Learning

Stochastic gradient descent updates parameters with summation gradient c...
research
12/09/2019

On the rate of convergence of a neural network regression estimate learned by gradient descent

Nonparametric regression with random design is considered. Estimates are...
research
01/27/2023

Meta-Learning Mini-Batch Risk Functionals

Supervised learning typically optimizes the expected value risk function...
research
11/11/2016

Learning to Learn without Gradient Descent by Gradient Descent

We learn recurrent neural network optimizers trained on simple synthetic...
research
06/30/2023

Resetting the Optimizer in Deep RL: An Empirical Study

We focus on the task of approximating the optimal value function in deep...

Please sign up or login with your details

Forgot password? Click here to reset