
MDPs with Unawareness

by Joseph Y. Halpern, et al.

Markov decision processes (MDPs) are widely used for modeling decision-making problems in robotics, automated control, and economics. Traditional MDPs assume that the decision maker (DM) knows all states and actions. However, this may not be true in many situations of interest. We define a new framework, MDPs with unawareness (MDPUs), to deal with the possibility that a DM may not be aware of all possible actions. We provide a complete characterization of when a DM can learn to play near-optimally in an MDPU, and give an algorithm that learns to play near-optimally when it is possible to do so, as efficiently as possible. In particular, we characterize when a near-optimal solution can be found in polynomial time.
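The key twist in an MDPU is that the DM cannot simply enumerate its action set: some actions must first be discovered, typically by playing a special "explore" move. The toy sketch below illustrates that idea, not the paper's actual algorithm: it assumes a single state, a hypothetical explore action that reveals each unknown action with some fixed probability `p_discover`, and simple epsilon-greedy reward averaging over the actions discovered so far. All names and parameters here are illustrative assumptions.

```python
import random

class MDPUAgent:
    """Toy sketch of acting in an MDP with unawareness (MDPU).

    Illustrative assumptions (not the paper's formal model): one state,
    an explore action that reveals each hidden action independently with
    probability p_discover per call, and deterministic per-action rewards.
    """

    def __init__(self, hidden_actions, p_discover=0.5, seed=0):
        self.rng = random.Random(seed)
        self.hidden = set(hidden_actions)  # actions the DM is unaware of
        self.known = set()                 # actions discovered so far
        self.p_discover = p_discover

    def explore(self):
        """Play the explore action; it may reveal one new action."""
        for a in sorted(self.hidden):
            if self.rng.random() < self.p_discover:
                self.hidden.remove(a)
                self.known.add(a)
                return a                   # at most one discovery per step
        return None

    def _avg(self, totals, counts, a):
        return totals.get(a, 0.0) / max(counts.get(a, 0), 1)

    def run(self, rewards, steps=2000, eps=0.2):
        """Interleave exploring for new actions with exploiting known ones."""
        totals, counts = {}, {}
        for _ in range(steps):
            if not self.known or (self.hidden and self.rng.random() < eps):
                self.explore()             # look for actions we're unaware of
                continue
            if self.rng.random() < eps:    # epsilon-greedy over known actions
                a = self.rng.choice(sorted(self.known))
            else:
                a = max(sorted(self.known),
                        key=lambda x: self._avg(totals, counts, x))
            totals[a] = totals.get(a, 0.0) + rewards[a]
            counts[a] = counts.get(a, 0) + 1
        if not self.known:
            return None
        return max(sorted(self.known), key=lambda x: self._avg(totals, counts, x))
```

With enough steps, the agent discovers the hidden actions and its empirical best action converges to the true optimum; the paper's contribution is characterizing exactly when such discovery-plus-learning can succeed, and how efficiently.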



