Nonsmooth automatic differentiation: a cheap gradient principle and other complexity results

06/01/2022
by Jérôme Bolte et al.

We provide a simple model to estimate the computational costs of the backward and forward modes of algorithmic differentiation for a wide class of nonsmooth programs. Prominent examples are the famous ReLU and convolutional neural networks together with their standard loss functions. Using the recent notion of conservative gradients, we then establish a "nonsmooth cheap gradient principle" for backpropagation encompassing most concrete applications. The cheapness of nonsmooth backpropagation contrasts with competing forward approaches, which to date have dimension-dependent worst-case estimates. In order to understand this class of methods, we relate the complexity of computing a large number of directional derivatives to that of matrix multiplication. This shows a fundamental limitation on improving forward AD for that task. Finally, while the fastest algorithms for computing a Clarke subgradient are linear in the dimension, it appears that computing two distinct Clarke (resp. lexicographic) subgradients for simple neural networks is NP-hard.
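The contrast drawn in the abstract, namely that backward mode returns a full (conservative) gradient for roughly the cost of a few evaluations of the program while forward mode returns one directional derivative per pass, can be made concrete with a small sketch. The JAX snippet below is purely illustrative and is not the paper's code: the toy ReLU network, its sizes, and all variable names are assumptions introduced here.

```python
import jax
import jax.numpy as jnp

# Toy nonsmooth program: a one-hidden-layer ReLU network with a squared loss.
# Sizes and names are illustrative only.
def loss(w, x, y):
    hidden = jax.nn.relu(x @ w["W1"] + w["b1"])
    pred = hidden @ w["W2"] + w["b2"]
    return jnp.sum((pred - y) ** 2)

key = jax.random.PRNGKey(0)
n_in, n_hid = 64, 32
w = {
    "W1": jax.random.normal(key, (n_in, n_hid)),
    "b1": jnp.zeros(n_hid),
    "W2": jax.random.normal(key, (n_hid, 1)),
    "b2": jnp.zeros(1),
}
x = jax.random.normal(key, (8, n_in))
y = jnp.ones((8, 1))

# Backward mode: one backpropagation pass returns a gradient for the whole
# parameter pytree; for nonsmooth programs this output is a conservative
# gradient, which coincides with the usual gradient almost everywhere.
g = jax.grad(loss)(w, x, y)

# Forward mode: one pass yields a single directional derivative, so
# recovering the full gradient requires one pass per parameter direction.
direction = jax.tree_util.tree_map(jnp.ones_like, w)
_, dd = jax.jvp(lambda w_: loss(w_, x, y), (w,), (direction,))
```

In this sketch, recovering the full gradient by forward mode alone would require one `jax.jvp` call per coordinate direction, which is the source of the dimension-dependent worst-case estimates mentioned above; the backward pass avoids this at the price of storing intermediate values.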
