A neural network based policy iteration algorithm with global H^2-superlinear convergence for stochastic games on domains

06/05/2019
by   Kazufumi Ito, et al.

In this work, we propose a class of numerical schemes for solving semilinear Hamilton-Jacobi-Bellman-Isaacs (HJBI) boundary value problems, which arise naturally from exit time problems of diffusion processes with controlled drift. We use policy iteration to reduce the semilinear problem to a sequence of linear Dirichlet problems, each of which is then approximated by a multilayer feedforward neural network ansatz. We establish that the numerical solutions converge globally in the H^2-norm, and we further show that this convergence is superlinear by interpreting the algorithm as an inexact Newton iteration for the HJBI equation. Moreover, we construct the optimal feedback controls from the numerical value functions and deduce their convergence. The numerical schemes and convergence results are then extended to HJBI boundary value problems corresponding to controlled diffusion processes with oblique boundary reflection. Numerical experiments on the stochastic Zermelo navigation problem illustrate the theoretical results and demonstrate the effectiveness of the method.
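To make the policy iteration step concrete, here is a minimal sketch on a toy problem. It is not the scheme of the paper: it assumes a one-player, one-dimensional exit-time problem on (0, 1) with running cost 1 + a^2/2 and drift control a in [-1, 1], and it replaces the neural network ansatz with a central finite-difference solve. The function name `policy_iteration` and all parameters are hypothetical and chosen only for illustration.

```python
import numpy as np

def policy_iteration(n=99, sigma2_half=0.5, max_iter=50, tol=1e-10):
    """Policy iteration for a toy 1D exit-time HJB problem on (0, 1).

    Assumed model (illustrative only): minimize E[int_0^tau (1 + a^2/2) dt]
    for dX = a dt + sigma dW with |a| <= 1, so that
        sigma2_half * u'' + min_a { a * u' + 1 + a^2/2 } = 0,  u(0) = u(1) = 0.
    """
    h = 1.0 / (n + 1)
    a = np.zeros(n)   # initial policy: zero drift at every interior node
    u = np.zeros(n)
    history = []      # sup-norm change between successive value iterates
    for _ in range(max_iter):
        # Policy evaluation: with a frozen, the equation is a linear
        # Dirichlet problem, discretized here by central differences.
        lower = sigma2_half / h**2 - a / (2 * h)   # coefficient of u[i-1]
        upper = sigma2_half / h**2 + a / (2 * h)   # coefficient of u[i+1]
        diag = np.full(n, -2 * sigma2_half / h**2)
        A = np.diag(diag) + np.diag(upper[:-1], 1) + np.diag(lower[1:], -1)
        rhs = -(1.0 + 0.5 * a**2)
        u_new = np.linalg.solve(A, rhs)

        # Policy improvement: the pointwise minimizer of a*u' + a^2/2
        # over [-1, 1] is a = clip(-u', -1, 1).
        padded = np.concatenate(([0.0], u_new, [0.0]))  # boundary values
        du = (padded[2:] - padded[:-2]) / (2 * h)
        a = np.clip(-du, -1.0, 1.0)

        history.append(np.max(np.abs(u_new - u)))
        u = u_new
        if history[-1] < tol:
            break
    return u, a, history

if __name__ == "__main__":
    u, a, history = policy_iteration()
    print("iterations:", len(history))
    print("successive differences:", history)
```

Each pass solves one linear problem and then updates the feedback control from the gradient of the current value function, which is the same structure as the iteration described above. Printing `history` shows how quickly the successive differences shrink on this toy problem.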

research
07/04/2020

Numerical method for solving the Dirichlet boundary value problem for nonlinear triharmonic equation

In this work, we consider the Dirichlet boundary value problem for nonli...
research
09/14/2020

On construction of a global numerical solution for a semilinear singularly–perturbed reaction diffusion boundary value problem

A class of different schemes for the numerical solving of semilinear sin...
research
11/19/2021

Impact of spatial coarsening on Parareal convergence

We study the impact of spatial coarsening on the convergence of the Para...
research
07/21/2023

DeepMartNet – A Martingale based Deep Neural Network Learning Algorithm for Eigenvalue/BVP Problems and Optimal Stochastic Controls

In this paper, we propose a neural network learning algorithm for solvin...
research
11/28/2020

Approximate Midpoint Policy Iteration for Linear Quadratic Control

We present a midpoint policy iteration algorithm to solve linear quadrat...
research
05/10/2019

Second Order Value Iteration in Reinforcement Learning

Value iteration is a fixed point iteration technique utilized to obtain ...
research
07/15/2020

Widest Paths and Global Propagation in Bounded Value Iteration for Stochastic Games

Solving stochastic games with the reachability objective is a fundamenta...
