SHINE: SHaring the INverse Estimate from the forward pass for bi-level optimization and implicit models

06/01/2021
by Zaccharie Ramzi, et al.

In recent years, implicit deep learning has emerged as a method to increase the depth of deep neural networks. While their training is memory-efficient, they are still significantly slower to train than their explicit counterparts. In Deep Equilibrium Models (DEQs), the training is performed as a bi-level problem, and its computational complexity is partially driven by the iterative inversion of a huge Jacobian matrix. In this paper, we propose a novel strategy to tackle this computational bottleneck from which many bi-level problems suffer. The main idea is to use the quasi-Newton matrices from the forward pass to efficiently approximate the inverse Jacobian matrix in the direction needed for the gradient computation. We provide a theorem that motivates using our method with the original forward algorithms. In addition, by modifying these forward algorithms, we further provide theoretical guarantees that our method asymptotically estimates the true implicit gradient. We empirically study this approach in many settings, ranging from hyperparameter optimization to large Multiscale DEQs applied to CIFAR and ImageNet. We show that it reduces the computational cost of the backward pass by up to two orders of magnitude. All this is achieved while retaining the excellent performance of the original models in hyperparameter optimization and on CIFAR, and giving encouraging and competitive results on ImageNet.
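The core idea can be illustrated on a toy fixed-point layer. The sketch below is ours, not the authors' implementation: it solves z* = tanh(Wz* + b) with Broyden's method, keeps the inverse-Jacobian estimate B that the solver builds anyway, and then substitutes B for the exact inverse Jacobian in the implicit-function-theorem gradient. All names and the tiny test problem are illustrative assumptions.

```python
import numpy as np

def forward_broyden(g, z0, B0, tol=1e-10, max_iter=100):
    """Solve g(z) = 0 with (good) Broyden's method.
    Returns the root and the final inverse-Jacobian estimate B ~= (dg/dz)^{-1},
    which SHINE-style differentiation reuses in the backward pass."""
    z, B = z0.copy(), B0.copy()
    gz = g(z)
    for _ in range(max_iter):
        if np.linalg.norm(gz) < tol:
            break
        dz = -B @ gz                      # quasi-Newton step
        z_new = z + dz
        g_new = g(z_new)
        dg = g_new - gz
        Bdg = B @ dg
        denom = dz @ Bdg
        if abs(denom) > 1e-30:            # rank-one update of the inverse estimate
            B = B + np.outer(dz - Bdg, dz @ B) / denom
        z, gz = z_new, g_new
    return z, B

# Toy fixed-point layer z* = tanh(W z* + b), written as root finding on g.
W = np.array([[0.2, -0.1], [0.05, 0.3]])
b = np.array([0.1, -0.2])
g = lambda z: np.tanh(W @ z + b) - z
z_star, B = forward_broyden(g, np.zeros(2), -np.eye(2))

# Loss L = 0.5 * ||z*||^2. By the implicit function theorem,
# dL/db = -(J^{-1} D)^T z*, with J = dg/dz = D W - I and D = diag(1 - z*^2).
D = np.diag(1.0 - z_star**2)
J = D @ W - np.eye(2)
grad_exact = -(np.linalg.inv(J) @ D).T @ z_star
# SHINE idea: replace the exact inverse J^{-1} (an extra iterative solve in
# real DEQs) with the estimate B already produced by the forward pass.
grad_shine = -(B @ D).T @ z_star
```

The quasi-Newton estimate is only exact in the directions the forward solver explored, which is why the paper both proves a theorem motivating the plain substitution and modifies the forward algorithms to make the gradient estimate asymptotically exact; in this sketch the two gradients agree only approximately.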

Related research

- 04/23/2023: Efficient Training of Deep Equilibrium Models
- 06/26/2023: PMaF: Deep Declarative Layers for Principal Matrix Features
- 10/20/2021: Scalable One-Pass Optimisation of High-Dimensional Weight-Update Hyperparameters by Implicit Differentiation
- 11/09/2021: On Training Implicit Models
- 01/10/2020: ReluDiff: Differential Verification of Deep Neural Networks
- 02/24/2022: Exploiting Problem Structure in Deep Declarative Networks: Two Case Studies
- 11/08/2019: Penalty Method for Inversion-Free Deep Bilevel Optimization
