Orchestrated Value Mapping for Reinforcement Learning

03/14/2022
by   Mehdi Fatemi, et al.

We present a general convergent class of reinforcement learning algorithms that is founded on two distinct principles: (1) mapping value estimates to a different space using arbitrary functions from a broad class, and (2) linearly decomposing the reward signal into multiple channels. The first principle enables incorporating specific properties into the value estimator that can enhance learning. The second principle, on the other hand, allows for the value function to be represented as a composition of multiple utility functions. This can be leveraged for various purposes, e.g. dealing with highly varying reward scales, incorporating a priori knowledge about the sources of reward, and ensemble learning. Combining the two principles yields a general blueprint for instantiating convergent algorithms by orchestrating diverse mapping functions over multiple reward channels. This blueprint generalizes and subsumes algorithms such as Q-Learning, Log Q-Learning, and Q-Decomposition. In addition, our convergence proof for this general class relaxes certain required assumptions in some of these algorithms. Based on our theory, we discuss several interesting configurations as special cases. Finally, to illustrate the potential of the design space that our theory opens up, we instantiate a particular algorithm and evaluate its performance on the Atari suite.
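To make the two principles concrete, the following is a minimal tabular sketch, not the algorithm defined in the full paper. It assumes one value table per reward channel, stored in a mapped space, and an overall action value recovered as the sum of the inverse-mapped channel values. The names `IDENTITY`, `LOG`, `combined_q`, `greedy_action`, and `update` are hypothetical, the shifted-log mapping is only one illustrative choice, and the single step size `alpha` is a simplification.

```python
import math

# Hypothetical (f, f_inverse) pairs; illustrative choices, not from the paper.
IDENTITY = (lambda x: x, lambda y: y)
# Shifted log mapping; assumes this channel's values stay above -1.
LOG = (lambda x: math.log(x + 1.0), lambda y: math.exp(y) - 1.0)


def combined_q(q_tilde, mappings, s, a):
    """Overall Q(s, a): sum of inverse-mapped per-channel estimates."""
    return sum(f_inv(q[s][a]) for q, (_, f_inv) in zip(q_tilde, mappings))


def greedy_action(q_tilde, mappings, s, n_actions):
    """Action that maximizes the combined (unmapped) value in state s."""
    return max(range(n_actions),
               key=lambda a: combined_q(q_tilde, mappings, s, a))


def update(q_tilde, mappings, s, a, rewards, s_next, done,
           n_actions, gamma=0.9, alpha=0.5):
    """One Q-learning-style step applied to every reward channel.

    rewards holds one reward component per channel; their sum is assumed
    to equal the environment reward (the linear decomposition).
    """
    # All channels bootstrap from the same greedy action.
    a_next = greedy_action(q_tilde, mappings, s_next, n_actions)
    for q, (f, f_inv), r in zip(q_tilde, mappings, rewards):
        bootstrap = 0.0 if done else f_inv(q[s_next][a_next])
        target = r + gamma * bootstrap          # target in regular space
        q[s][a] += alpha * (f(target) - q[s][a])  # move estimate in mapped space
```

With a single channel and the identity mapping, this sketch reduces to ordinary tabular Q-learning. Both example mappings satisfy f(0) = 0, so zero-initialized tables correspond to zero initial values in regular space.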

research
11/06/2019

Distributional Reward Decomposition for Reinforcement Learning

Many reinforcement learning (RL) tasks have specific properties that can...
research
01/05/2022

A Generalized Bootstrap Target for Value-Learning, Efficiently Combining Value and Feature Predictions

Estimating value functions is a core component of reinforcement learning...
research
02/19/2023

Compositionality and Bounds for Optimal Value Functions in Reinforcement Learning

An agent's ability to reuse solutions to previously solved problems is c...
research
12/28/2018

Differential Temporal Difference Learning

Value functions derived from Markov decision processes arise as a centra...
research
09/14/2017

Shared Learning : Enhancing Reinforcement in Q-Ensembles

Deep Reinforcement Learning has been able to achieve amazing successes i...
research
05/05/2022

Pessimism meets VCG: Learning Dynamic Mechanism Design via Offline Reinforcement Learning

Dynamic mechanism design has garnered significant attention from both co...
research
01/10/2023

Mastering Diverse Domains through World Models

General intelligence requires solving tasks across many domains. Current...
