Learning Finite-State Controllers for Partially Observable Environments

01/23/2013
by   Nicolas Meuleau, et al.
0

Reactive (memoryless) policies are sufficient in completely observable Markov decision processes (MDPs), but some kind of memory is usually necessary for optimal control of a partially observable MDP. Policies with finite memory can be represented as finite-state automata. In this paper, we extend Baird and Moore's VAPS algorithm to the problem of learning general finite-state automata. Because it performs stochastic gradient descent, this algorithm can be shown to converge to a locally optimal finite-state controller. We provide the details of the algorithm and then consider the question of under what conditions stochastic gradient descent will outperform exact gradient descent. We conclude with empirical results comparing the performance of stochastic and exact gradient descent, and showing the ability of our algorithm to extract the useful information contained in the sequence of past observations to compensate for the lack of observability at each time-step.

READ FULL TEXT

page 1

page 2

page 3

page 4

page 6

page 7

page 8

page 9

research
01/23/2013

Solving POMDPs by Searching the Space of Finite Policies

Solving partially observable Markov decision processes (POMDPs) is highl...
research
05/21/2018

Stochastic modified equations for the asynchronous stochastic gradient descent

We propose a stochastic modified equations (SME) for modeling the asynch...
research
11/08/2021

Gradient-Descent for Randomized Controllers under Partial Observability

Randomization is a powerful technique to create robust controllers, in p...
research
09/27/2019

Risk-Averse Planning Under Uncertainty

We consider the problem of designing policies for partially observable M...
research
01/12/2023

Safe Policy Improvement for POMDPs via Finite-State Controllers

We study safe policy improvement (SPI) for partially observable Markov d...
research
04/24/2019

Some Limit Properties of Markov Chains Induced by Stochastic Recursive Algorithms

Recursive stochastic algorithms have gained significant attention in the...
research
06/07/2021

Learning Stochastic Optimal Policies via Gradient Descent

We systematically develop a learning-based treatment of stochastic optim...

Please sign up or login with your details

Forgot password? Click here to reset