A Solver + Gradient Descent Training Algorithm for Deep Neural Networks

07/07/2022
by Dhananjay Ashok, et al.

We present a novel hybrid algorithm for training Deep Neural Networks that combines the state-of-the-art Gradient Descent (GD) method with a Mixed Integer Linear Programming (MILP) solver, outperforming GD and its variants in terms of accuracy, as well as resource and data efficiency, for both regression and classification tasks. Our GD+Solver hybrid algorithm, called GDSolver, works as follows: given a DNN D as input, GDSolver invokes GD to partially train D until it gets stuck in a local minimum, at which point GDSolver invokes an MILP solver to exhaustively search a region of the loss landscape around the weight assignments of D's final layer parameters, with the goal of tunnelling through and escaping the local minimum. The process is repeated until the desired accuracy is achieved. In our experiments, we find that GDSolver not only scales well to additional data and very large model sizes, but also outperforms all other competing methods in terms of rates of convergence and data efficiency. For regression tasks, GDSolver produced models that, on average, had 31.5% lower MSE in 48% less time, and for classification tasks, GDSolver was able to achieve the highest accuracy over all competing methods, using only 50% of the training data that GD baselines required.
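The abstract describes GDSolver's control loop in prose only. The following is a minimal Python sketch of that loop, assuming PyTorch for the GD phase. The plateau heuristic (`patience`/`tol`), all function names, and the search radius `delta` are illustrative assumptions rather than details from the paper, and the MILP encoding itself is left as an interface stub since the abstract does not specify it.

```python
# Hypothetical sketch of the GDSolver loop: alternate gradient descent with
# an MILP-based search over the final layer's weights. Names and heuristics
# here are assumptions for illustration, not the paper's implementation.
import torch


def gd_until_stuck(model, loss_fn, loader, lr=1e-3, patience=5, tol=1e-4):
    """Run plain SGD until the loss plateaus; a plateau (heuristically,
    `patience` epochs without at least `tol` improvement) is treated as
    being stuck in a local minimum."""
    opt = torch.optim.SGD(model.parameters(), lr=lr)
    best, stale = float("inf"), 0
    while stale < patience:
        epoch_loss = 0.0
        for x, y in loader:
            opt.zero_grad()
            loss = loss_fn(model(x), y)
            loss.backward()
            opt.step()
            epoch_loss += loss.item()
        if epoch_loss < best - tol:
            best, stale = epoch_loss, 0
        else:
            stale += 1
    return best


def milp_refine_last_layer(model, loss_fn, loader, delta=0.1):
    """Stub for the solver step: encode the search for final-layer weights
    within a box of radius `delta` around their current values as a mixed
    integer linear program, solve it (e.g. with Gurobi or CBC), and write
    any strictly better assignment back into the model. The encoding is
    omitted because the abstract does not specify it."""
    raise NotImplementedError("MILP encoding not shown in this sketch")


def gdsolver(model, loss_fn, loader, target_loss):
    """Alternate GD with MILP 'tunnelling' until the target loss is met."""
    while True:
        loss = gd_until_stuck(model, loss_fn, loader)
        if loss <= target_loss:
            return model
        milp_refine_last_layer(model, loss_fn, loader)  # escape the minimum
```

One plausible reason for restricting the solver to the final layer is tractability: with all earlier layers frozen, a linear output layer makes the network's output linear in the searched weights, so the loss over a bounded region around the current assignment can be encoded and searched exactly by an MILP solver, which would not scale to the full parameter space.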


