Provably Correct Automatic Subdifferentiation for Qualified Programs

09/23/2018
by   Sham Kakade, et al.
0

The Cheap Gradient Principle (Griewank 2008) --- the computational cost of computing the gradient of a scalar-valued function is nearly the same (often within a factor of 5) as that of simply computing the function itself --- is of central importance in optimization; it allows us to quickly obtain (high dimensional) gradients of scalar loss functions which are subsequently used in black box gradient-based optimization procedures. The current state of affairs is markedly different with regards to computing subderivatives: widely used ML libraries, including TensorFlow and PyTorch, do not correctly compute (generalized) subderivatives even on simple examples. This work considers the question: is there a Cheap Subgradient Principle? Our main result shows that, under certain restrictions on our library of nonsmooth functions (standard in nonlinear programming), provably correct generalized subderivatives can be computed at a computational cost that is within a (dimension-free) factor of 6 of the cost of computing the scalar function itself.

READ FULL TEXT

page 1

page 2

page 3

page 4

research
11/24/2020

Effect of barren plateaus on gradient-free optimization

Barren plateau landscapes correspond to gradients that vanish exponentia...
research
06/01/2022

Nonsmooth automatic differentiation: a cheap gradient principle and other complexity results

We provide a simple model to estimate the computational costs of the bac...
research
09/08/2021

Iterated Vector Fields and Conservatism, with Applications to Federated Learning

We study iterated vector fields and investigate whether they are conserv...
research
10/11/2019

Improving Gradient Estimation in Evolutionary Strategies With Past Descent Directions

Evolutionary Strategies (ES) are known to be an effective black-box opti...
research
05/03/2019

TensorNetwork on TensorFlow: A Spin Chain Application Using Tree Tensor Networks

TensorNetwork is an open source library for implementing tensor network ...
research
11/03/2020

AdaDGS: An adaptive black-box optimization method with a nonlocal directional Gaussian smoothing gradient

The local gradient points to the direction of the steepest slope in an i...
research
11/15/2021

Quadratic speedup of global search using a biased crossover of two good solutions

The minimisation of cost functions is crucial in various optimisation fi...

Please sign up or login with your details

Forgot password? Click here to reset