Active Reinforcement Learning: Observing Rewards at a Cost

11/13/2020
by David Krueger, et al.

Active reinforcement learning (ARL) is a variant of reinforcement learning in which the agent does not observe the reward unless it chooses to pay a query cost c > 0. The central question of ARL is how to quantify the long-term value of reward information. Even in multi-armed bandits, computing the value of this information is intractable, so we must rely on heuristics. We propose and evaluate several heuristic approaches for ARL in multi-armed bandits and (tabular) Markov decision processes, and discuss and illustrate some challenging aspects of the ARL problem.
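
To make the setting concrete, here is a minimal Python sketch of an ARL bandit. It is an illustration under stated assumptions, not one of the heuristics evaluated in the paper: the function name run_arl_bandit, the Bernoulli reward model, and the "query each arm a fixed number of times, then commit" rule are all hypothetical choices made for this example. The point it shows is the accounting: rewards accrue on every pull, but the agent only learns from a pull when it pays the cost c.

```python
import random


def run_arl_bandit(arm_means, horizon, query_cost, queries_per_arm, seed=0):
    """Simulate a Bernoulli bandit where observing a reward costs `query_cost`.

    Hypothetical heuristic (an assumption for illustration): pull the arms
    round-robin while paying to observe, `queries_per_arm` times each, then
    commit to the empirically best arm and never query again.
    Returns (net_return, number_of_queries).
    """
    rng = random.Random(seed)
    n_arms = len(arm_means)
    counts = [0] * n_arms   # number of observed pulls per arm
    sums = [0.0] * n_arms   # sum of observed rewards per arm
    total_reward = 0.0      # rewards accrue whether or not they are observed
    n_queries = 0

    def estimate(arm):
        # Empirical mean of observed rewards; 0.0 if the arm was never queried.
        return sums[arm] / counts[arm] if counts[arm] > 0 else 0.0

    for t in range(horizon):
        exploring = t < n_arms * queries_per_arm
        if exploring:
            arm = t % n_arms                      # round-robin exploration
        else:
            arm = max(range(n_arms), key=estimate)  # commit to best estimate

        reward = 1.0 if rng.random() < arm_means[arm] else 0.0
        total_reward += reward

        if exploring:
            # Pay the query cost c to observe this reward and update estimates.
            counts[arm] += 1
            sums[arm] += reward
            n_queries += 1

    return total_reward - query_cost * n_queries, n_queries


if __name__ == "__main__":
    net, queries = run_arl_bandit([0.3, 0.7], horizon=1000,
                                  query_cost=0.5, queries_per_arm=20)
    print(f"net return: {net:.1f} after {queries} queries")
```

Raising queries_per_arm improves the chance of committing to the better arm but increases the total query cost, which is the trade-off the abstract describes as quantifying the long-term value of reward information.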


research
04/07/2023

Full Gradient Deep Reinforcement Learning for Average-Reward Criterion

We extend the provably convergent Full Gradient DQN algorithm for discou...
research
10/15/2019

Model-free Reinforcement Learning in Infinite-horizon Average-reward Markov Decision Processes

Model-free reinforcement learning is known to be memory and computation ...
research
01/26/2020

Regime Switching Bandits

We study a multi-armed bandit problem where the rewards exhibit regime-s...
research
09/16/2016

Exploration Potential

We introduce exploration potential, a quantity that measures how much a ...
research
09/15/2021

Estimation of Warfarin Dosage with Reinforcement Learning

In this paper, it has attempted to use Reinforcement learning to model t...
research
02/16/2022

Deep Contextual Bandits for Orchestrating Multi-User MISO Systems with Multiple RISs

The emergent technology of Reconfigurable Intelligent Surfaces (RISs) ha...
research
02/28/2017

Analysis of Agent Expertise in Ms. Pac-Man using Value-of-Information-based Policies

Conventional reinforcement learning methods for Markov decision processe...
