Act-Then-Measure: Reinforcement Learning for Partially Observable Environments with Active Measuring

03/14/2023
by Merlijn Krale, et al.

We study Markov decision processes (MDPs) in which agents directly control when and how they gather information, as formalized by action-contingent noiselessly observable MDPs (ACNO-MDPs). In these models, each action consists of two components: a control action that affects the environment, and a measurement action that affects what the agent can observe. To solve ACNO-MDPs, we introduce the act-then-measure (ATM) heuristic, which assumes that future state uncertainty can be ignored when choosing control actions. We show how following this heuristic may lead to shorter policy computation times, and we prove a bound on the performance loss it incurs. To decide whether to take a measurement action, we introduce the concept of measuring value. We develop a reinforcement learning algorithm based on the ATM heuristic, using a Dyna-Q variant adapted for partially observable domains, and show that it outperforms prior methods on a number of partially observable environments.
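The two-step decision described above can be illustrated with a minimal sketch. It is not the paper's algorithm: the function names, the tabular `q_values` and `transitions` arrays, and in particular the formula used for the measuring value (expected value when the next state is known, minus expected value when acting on the belief alone, minus a measurement cost) are assumptions made here for illustration only.

```python
import numpy as np

def choose_control_action(belief, q_values):
    """Pick the control action with the highest belief-weighted Q-value.

    Assumption: q_values[s, a] are Q-values of the fully observable MDP,
    which mirrors the idea of ignoring future state uncertainty.
    """
    expected_q = belief @ q_values  # shape: (n_actions,)
    return int(np.argmax(expected_q))

def predict_next_belief(belief, transitions, action):
    """Propagate the belief through the (assumed known) transition model.

    transitions[a][s, s2] is the probability of moving from s to s2 under a.
    """
    return belief @ transitions[action]

def measuring_value(next_belief, q_values, cost):
    """Illustrative measuring value (an assumed definition, not the paper's).

    informed:   expected value if the state is revealed before the next choice.
    uninformed: expected value if the next choice uses only the belief.
    """
    informed = next_belief @ q_values.max(axis=1)
    uninformed = (next_belief @ q_values).max()
    return float(informed - uninformed - cost)

def act_then_measure(belief, q_values, transitions, cost):
    """First pick the control action, then decide whether to measure."""
    action = choose_control_action(belief, q_values)
    next_belief = predict_next_belief(belief, transitions, action)
    measure = measuring_value(next_belief, q_values, cost) > 0
    return action, bool(measure), next_belief

# Hypothetical two-state, two-action example.
q_values = np.array([[1.0, 0.0],
                     [0.0, 1.0]])
transitions = np.array([np.eye(2), np.eye(2)])  # both actions keep the state
uncertain = np.array([0.5, 0.5])
certain = np.array([1.0, 0.0])

print(act_then_measure(uncertain, q_values, transitions, cost=0.1))
print(act_then_measure(certain, q_values, transitions, cost=0.1))
```

In this toy setting, measuring pays off only when the belief is spread over states that call for different actions and the measurement cost is smaller than the resulting gain; with a certain belief the sketch never measures.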


research
08/04/2023

Learning Optimal Admission Control in Partially Observable Queueing Networks

We present an efficient reinforcement learning algorithm that learns the...
research
03/16/2022

Lazy-MDPs: Towards Interpretable Reinforcement Learning by Learning When to Act

Traditionally, Reinforcement Learning (RL) aims at deciding how to act o...
research
03/22/2023

Strategy Synthesis in Markov Decision Processes Under Limited Sampling Access

A central task in control theory, artificial intelligence, and formal me...
research
06/16/2023

Automatic Deduction Path Learning via Reinforcement Learning with Environmental Correction

Automatic bill payment is an important part of business operations in fi...
research
09/05/2015

Reinforcement Learning with Parameterized Actions

We introduce a model-free algorithm for learning in Markov decision proc...
research
07/04/2010

Computational Model of Music Sight Reading: A Reinforcement Learning Approach

Although the Music Sight Reading process has been studied from the cogni...
research
01/11/2018

Counterfactual equivalence for POMDPs, and underlying deterministic environments

Partially Observable Markov Decision Processes (POMDPs) are rich environ...
