Hidden Markov Model Estimation-Based Q-learning for Partially Observable Markov Decision Process

09/17/2018
by Hyung-Jin Yoon, et al.

The objective is to study an online hidden Markov model (HMM) estimation-based Q-learning algorithm for partially observable Markov decision processes (POMDPs) with finite state and action sets. When the full state observation is available, Q-learning finds the optimal action-value function (Q function) over state-action pairs. However, Q-learning can perform poorly when the full state observation is not available. In this paper, we formulate POMDP estimation as an HMM estimation problem and propose a recursive algorithm that estimates the POMDP parameters and the Q function concurrently. We also show that the POMDP estimate converges to a set of stationary points of the maximum likelihood objective, and that the Q function estimate converges to a fixed point that satisfies the Bellman optimality equation weighted by the invariant distribution of the state belief determined by the HMM estimation process.
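The abstract couples two recursions: an HMM filter that tracks a belief (a probability distribution) over the hidden states, and a Q-learning update driven by that belief. The sketch below illustrates both under simplifying assumptions that are not from the paper: the POMDP parameters (a transition tensor `T[a, s, s']` and an observation matrix `O[s', o]`) are treated as known rather than estimated online, and the Q function is a table `Q[s, a]` evaluated at a belief as the expectation over states. The names `belief_update` and `belief_q_update` and the belief-weighted update rule are hypothetical illustrations, not the authors' algorithm.

```python
import numpy as np


def belief_update(b, a, o, T, O):
    """One HMM forward (filtering) step.

    Predict with the transition matrix for action a, weight by the
    likelihood of observation o in each next state, then normalize.
    """
    predicted = b @ T[a]          # sum_s b(s) * T[a, s, s']
    unnorm = predicted * O[:, o]  # multiply by P(o | s')
    total = unnorm.sum()
    if total == 0.0:
        raise ValueError("observation has zero probability under the model")
    return unnorm / total


def belief_q_update(Q, b, a, r, b_next, alpha=0.1, gamma=0.95):
    """Belief-weighted Q-learning step on a tabular Q[s, a] (illustrative assumption).

    Q(b, a) is taken as sum_s b(s) Q[s, a]; the TD correction is spread
    over states in proportion to the current belief. Q is modified in place.
    """
    q_now = b @ Q[:, a]                # Q(b, a)
    q_next = np.max(b_next @ Q)        # max_a' sum_s' b'(s') Q[s', a']
    td_error = r + gamma * q_next - q_now
    Q[:, a] += alpha * td_error * b
    return td_error


if __name__ == "__main__":
    # Hypothetical 2-state, 2-action, 2-observation model.
    T = np.array([[[0.9, 0.1], [0.1, 0.9]],    # action 0: states tend to persist
                  [[0.5, 0.5], [0.5, 0.5]]])   # action 1: states get scrambled
    O = np.array([[0.8, 0.2],                  # P(o | s = 0)
                  [0.3, 0.7]])                 # P(o | s = 1)
    b0 = np.array([0.5, 0.5])
    b1 = belief_update(b0, a=0, o=0, T=T, O=O)
    print(b1)  # approx [0.727, 0.273]
    Q = np.zeros((2, 2))
    print(belief_q_update(Q, b0, a=0, r=1.0, b_next=b1))
    print(Q)
```

Representing Q as a table over hidden states and evaluating it linearly in the belief keeps the sketch small; the paper's own formulation of the belief-weighted Bellman equation and its concurrent parameter estimation should be taken from the full text.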

research
02/16/2016

POMDP-lite for Robust Robot Planning under Uncertainty

The partially observable Markov decision process (POMDP) provides a prin...
research
05/18/2006

Cross-Entropic Learning of a Machine for the Decision in a Partially Observable Universe

Revision of the paper previously entitled "Learning a Machine for the De...
research
05/20/2020

Hidden Markov Models and their Application for Predicting Failure Events

We show how Markov mixed membership models (MMMM) can be used to predict...
research
08/02/2020

Dynamic Discrete Choice Estimation with Partially Observable States and Hidden Dynamics

Dynamic discrete choice models are used to estimate the intertemporal pr...
research
01/10/2013

Planning and Acting under Uncertainty: A New Model for Spoken Dialogue Systems

Uncertainty plays a central role in spoken dialogue systems. Some stocha...
research
08/27/2017

Novel Sensor Scheduling Scheme for Intruder Tracking in Energy Efficient Sensor Networks

We consider the problem of tracking an intruder using a network of wirel...
research
08/21/2021

Sequential Stochastic Optimization in Separable Learning Environments

We consider a class of sequential decision-making problems under uncerta...