Sparse approximation in learning via neural ODEs
We consider the continuous-time, neural ordinary differential equation (neural ODE) perspective of deep supervised learning, and study the impact of the final time horizon T in training. We focus on a cost consisting of an integral of the empirical risk over the time interval, and L^1 parameter regularization. Under homogeneity assumptions on the dynamics (typical for ReLU activations), we prove that any global minimizer is sparse, in the sense that there exists a positive stopping time T^* beyond which the optimal parameters vanish. Moreover, under appropriate interpolation assumptions on the neural ODE, we provide quantitative estimates of the stopping time T^*, and of the training error of the trajectories at the stopping time. The latter amounts to a quantitative approximation property of neural ODE flows with sparse parameters. In practical terms, a shorter time horizon in the training problem can be interpreted as considering a shallower residual neural network (ResNet), and since the optimal parameters are concentrated over a shorter time horizon, such a consideration may lower the computational cost of training without discarding relevant information.
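The training cost described above can be sketched numerically. The following is a minimal illustration, not the paper's method: it uses a forward-Euler discretization of a neural ODE with ReLU dynamics (each Euler step playing the role of one ResNet layer), and evaluates a cost made of a Riemann-sum approximation of the integrated empirical risk plus an L^1 penalty on the time-dependent parameters. The function names, the quadratic risk, and the specific dynamics x'(t) = W(t) ReLU(x(t)) + b(t) are illustrative assumptions.

```python
import numpy as np

def relu(z):
    # ReLU activation; its homogeneity is what drives the sparsity result.
    return np.maximum(z, 0.0)

def neural_ode_flow(x, weights, biases, dt):
    # Forward-Euler discretization of x'(t) = W(t) relu(x(t)) + b(t).
    # Each step corresponds to one layer of a ResNet; returns the whole
    # trajectory (states at every time step).
    states = [x]
    for W, b in zip(weights, biases):
        x = x + dt * (W @ relu(x) + b)
        states.append(x)
    return states

def training_cost(states, target, weights, biases, dt, lam):
    # Riemann-sum approximation of the cost in the abstract:
    #   integral over [0, T] of the empirical risk (here: squared error
    #   to a target state), plus lam times the L^1 norm of the parameters.
    risk = sum(0.5 * np.sum((s - target) ** 2) for s in states[1:]) * dt
    l1 = sum(np.abs(W).sum() + np.abs(b).sum()
             for W, b in zip(weights, biases)) * dt
    return risk + lam * l1
```

With zero parameters the trajectory is constant, so the L^1 term vanishes and only the (constant) risk integral remains; the sparsity result says that optimal parameters look like this beyond the stopping time T^*, while being active before it.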