
Analysis and Optimisation of Bellman Residual Errors with Neural Function Approximation
Recent developments in Deep Reinforcement Learning have demonstrated the superior performance of neural networks in solving challenging problems with large or even continuous state spaces. One specific approach is to deploy neural networks to approximate value functions by minimising the Mean Squared Bellman Error function. Despite the great successes of Deep Reinforcement Learning, the development of reliable and efficient numerical algorithms to minimise the Bellman Error is still of great scientific interest and practical demand. This challenge is partially due to the underlying optimisation problem being highly nonconvex, and partially due to the use of incorrect gradient information, as done in Semi-Gradient algorithms. In this work, we analyse the Mean Squared Bellman Error from a smooth optimisation perspective combined with a Residual Gradient formulation. Our contribution is twofold. First, we analyse critical points of the error function and provide technical insights on the optimisation procedure and design choices for neural networks. When the existence of global minima is assumed and the objective fulfils certain conditions, we can eliminate suboptimal local minima when using overparametrised neural networks. Based on our analysis, we construct an efficient Approximate Newton's algorithm and numerically confirm theoretical properties of this algorithm, such as local quadratic convergence to a global minimum. Second, we demonstrate the feasibility and generalisation capabilities of the proposed algorithm empirically using continuous control problems and provide a numerical verification of our critical point analysis. We outline the shortcoming of Semi-Gradients. To benefit from an approximate Newton's algorithm, complete derivatives of the Mean Squared Bellman Error must be considered during training.
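The distinction the abstract draws between Semi-Gradient and Residual Gradient updates can be illustrated with a minimal sketch. It assumes a linear value function V(s) = phi(s)·w on random synthetic transitions, which is a simplification for illustration and not the neural network setting analysed in the paper; all function names and data here are hypothetical. The sketch compares both update directions against a finite-difference gradient of the Mean Squared Bellman Error.

```python
import random

def td_errors(w, phi, phi_next, r, gamma):
    """Bellman residuals delta_i = r_i + gamma * V(s'_i) - V(s_i) for linear V."""
    dot = lambda a, b: sum(x * y for x, y in zip(a, b))
    return [ri + gamma * dot(pn, w) - dot(p, w)
            for p, pn, ri in zip(phi, phi_next, r)]

def msbe(w, phi, phi_next, r, gamma):
    """Mean Squared Bellman Error: 0.5 * mean(delta^2)."""
    deltas = td_errors(w, phi, phi_next, r, gamma)
    return 0.5 * sum(d * d for d in deltas) / len(deltas)

def residual_gradient(w, phi, phi_next, r, gamma):
    """Complete derivative: differentiates through both V(s) and V(s')."""
    deltas = td_errors(w, phi, phi_next, r, gamma)
    n = len(deltas)
    return [sum(d * (gamma * pn[j] - p[j])
                for d, p, pn in zip(deltas, phi, phi_next)) / n
            for j in range(len(w))]

def semi_gradient(w, phi, phi_next, r, gamma):
    """Semi-gradient: treats the bootstrapped target r + gamma * V(s') as constant."""
    deltas = td_errors(w, phi, phi_next, r, gamma)
    n = len(deltas)
    return [sum(d * (-p[j]) for d, p in zip(deltas, phi)) / n
            for j in range(len(w))]

def finite_difference(f, w, eps=1e-5):
    """Central-difference approximation of the gradient of f at w."""
    grad = []
    for j in range(len(w)):
        up, down = list(w), list(w)
        up[j] += eps
        down[j] -= eps
        grad.append((f(up) - f(down)) / (2 * eps))
    return grad

def max_abs_diff(a, b):
    return max(abs(x - y) for x, y in zip(a, b))

# Synthetic transitions (hypothetical data, fixed seed for repeatability).
rng = random.Random(0)
n, d, gamma = 64, 4, 0.9
phi = [[rng.gauss(0, 1) for _ in range(d)] for _ in range(n)]
phi_next = [[rng.gauss(0, 1) for _ in range(d)] for _ in range(n)]
r = [rng.gauss(0, 1) for _ in range(n)]
w = [rng.gauss(0, 1) for _ in range(d)]

numeric = finite_difference(lambda v: msbe(v, phi, phi_next, r, gamma), w)
rg_error = max_abs_diff(residual_gradient(w, phi, phi_next, r, gamma), numeric)
sg_error = max_abs_diff(semi_gradient(w, phi, phi_next, r, gamma), numeric)
print("residual gradient vs. numeric:", rg_error)
print("semi-gradient vs. numeric:", sg_error)
```

In this sketch the Residual Gradient agrees with the numerical gradient up to rounding, while the Semi-Gradient omits the term coming from V(s') and so is not the gradient of the Mean Squared Bellman Error. This is consistent with the abstract's remark that complete derivatives are needed for a Newton-type method, although the sketch itself does not implement the Approximate Newton's algorithm.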