Learning Finite-State Controllers for Partially Observable Environments
Reactive (memoryless) policies are sufficient in completely observable Markov decision processes (MDPs), but some kind of memory is usually necessary for optimal control of a partially observable MDP. Policies with finite memory can be represented as finite-state automata. In this paper, we extend Baird and Moore's VAPS algorithm to the problem of learning general finite-state automata. Because it performs stochastic gradient descent, this algorithm can be shown to converge to a locally optimal finite-state controller. We provide the details of the algorithm and then consider the question of under what conditions stochastic gradient descent will outperform exact gradient descent. We conclude with empirical results comparing the performance of stochastic and exact gradient descent, and showing the ability of our algorithm to extract the useful information contained in the sequence of past observations to compensate for the lack of observability at each time step.
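The abstract's central object, a stochastic finite-state controller whose node-transition and action distributions are improved by gradient ascent on expected reward, can be sketched on a toy aliased-observation task. This is not the paper's VAPS-based algorithm: the task, the two-node parameterization, and the finite-difference "exact gradient" below are all illustrative assumptions.

```python
import numpy as np

# Controller sizes: 2 memory nodes, 3 observations (bit 0, bit 1, blank), 2 actions.
N_NODES, N_OBS, N_ACTS, HORIZON = 2, 3, 2, 3
N_PARAMS = N_NODES * N_ACTS + N_NODES * N_OBS * N_NODES

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def expected_reward(params):
    """Exact expected reward on a toy memory task: a random bit is shown
    as the observation at t=0, then only a blank observation (index 2);
    the action at the final step must equal the bit."""
    theta_act = params[:N_NODES * N_ACTS].reshape(N_NODES, N_ACTS)
    theta_node = params[N_NODES * N_ACTS:].reshape(N_NODES, N_OBS, N_NODES)
    pi_act = softmax(theta_act)    # P(action | memory node)
    pi_node = softmax(theta_node)  # P(next node | node, observation)
    total = 0.0
    for bit in (0, 1):
        d = np.zeros(N_NODES)
        d[0] = 1.0                 # controller starts in node 0
        for t in range(HORIZON - 1):
            obs = bit if t == 0 else 2
            d = d @ pi_node[:, obs, :]
        total += 0.5 * (d @ pi_act)[bit]  # P(final action == bit)
    return total

def train(iters=300, lr=2.0, eps=1e-4, seed=0):
    """Exact gradient ascent (via central finite differences) on the
    controller logits; a small random init breaks the symmetric plateau
    at which both bits induce the same memory-node distribution."""
    rng = np.random.default_rng(seed)
    params = 0.1 * rng.standard_normal(N_PARAMS)
    for _ in range(iters):
        grad = np.empty_like(params)
        for i in range(N_PARAMS):
            up, dn = params.copy(), params.copy()
            up[i] += eps
            dn[i] -= eps
            grad[i] = (expected_reward(up) - expected_reward(dn)) / (2 * eps)
        params += lr * grad
    return params

print(f"uniform controller: {expected_reward(np.zeros(N_PARAMS)):.3f}")  # exactly 0.5
print(f"after gradient ascent: {expected_reward(train()):.3f}")
```

A uniform controller scores exactly 0.5 on this task, because its memory-node distribution carries no information about the initial bit; a controller whose node transitions route the two bits to different nodes can approach reward 1. This mirrors the abstract's point that reactive policies are insufficient and that memory (here, the node-transition logits) must be learned.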