On Value Functions and the Agent-Environment Boundary

05/30/2019
by Nan Jiang, et al.

When function approximation is deployed in reinforcement learning (RL), the same problem may be formulated in different ways, often by treating a pre-processing step either as part of the environment or as part of the agent. As a consequence, fundamental concepts in RL, such as (optimal) value functions, are not uniquely defined: they depend on where we draw this agent-environment boundary, which causes problems in theoretical analyses that provide optimality guarantees. We address this issue via a simple and novel boundary-invariant analysis of Fitted Q-Iteration, a representative RL algorithm, in which both the assumptions and the guarantees are invariant to the choice of boundary. We also discuss closely related issues on state resetting and Monte-Carlo Tree Search, deterministic vs. stochastic systems, imitation learning, and the verifiability of theoretical assumptions from data.
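To make the boundary question concrete, here is a minimal sketch in Python. It is not the paper's analysis or its algorithm statement. It is a toy batch version of Fitted Q-Iteration in which the regression step is a per-cell average (the least-squares fit over piecewise-constant functions). The pre-processing map `phi`, the four-state chain, and the names `batch_raw` and `batch_abstract` are all assumptions made up for illustration. The same data is run under two formulations: in one the agent applies `phi` itself, in the other the environment is taken to emit `phi(s)` directly.

```python
from collections import defaultdict


def fitted_q_iteration(batch, phi, actions, gamma=0.9, iters=50):
    """Toy Fitted Q-Iteration on a batch of (s, a, r, s_next) tuples.

    `phi` is a hypothetical pre-processing map applied by the agent.
    Regression is done by averaging targets per (phi(s), a) cell.
    """
    q = defaultdict(float)  # Q estimate, initialised to zero everywhere
    for _ in range(iters):
        sums = defaultdict(float)
        counts = defaultdict(int)
        for s, a, r, s_next in batch:
            # Bootstrapped regression target from the previous iterate.
            target = r + gamma * max(q[(phi(s_next), b)] for b in actions)
            key = (phi(s), a)
            sums[key] += target
            counts[key] += 1
        # Least-squares fit over piecewise-constant functions = cell mean.
        q = defaultdict(float, {k: sums[k] / counts[k] for k in sums})
    return dict(q)


# Illustrative data only: a four-state chain with actions "L" and "R".
actions = ["L", "R"]
batch_raw = [
    (0, "R", 0.0, 1), (1, "R", 0.0, 2), (2, "R", 1.0, 3),
    (3, "R", 0.0, 3), (1, "L", 0.0, 0), (2, "L", 0.0, 1),
]


def phi(s):
    """Hypothetical pre-processing step: merge raw states {0,1} and {2,3}."""
    return s // 2


# Formulation A: pre-processing belongs to the agent.
q_agent_side = fitted_q_iteration(batch_raw, phi, actions)

# Formulation B: pre-processing belongs to the environment, so the agent
# only ever sees abstract states and uses the identity map.
batch_abstract = [(phi(s), a, r, phi(s2)) for s, a, r, s2 in batch_raw]
q_env_side = fitted_q_iteration(batch_abstract, lambda x: x, actions)

print(q_agent_side == q_env_side)
```

In this sketch the two runs produce the same estimates, because the algorithm only ever touches the data through `phi`. What differs between the two formulations is the object one would call "the" optimal value function: a function of raw states in the first, a function of abstract states in the second. That is the ambiguity the abstract describes.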

