Operator Deep Q-Learning: Zero-Shot Reward Transferring in Reinforcement Learning

01/01/2022
by   Ziyang Tang, et al.

Reinforcement learning (RL) has drawn increasing interest in recent years due to its tremendous success in various applications. However, standard RL algorithms apply only to a single reward function and cannot adapt quickly to an unseen reward function. In this paper, we advocate a general operator view of reinforcement learning, which enables us to directly approximate the operator that maps a reward function to its value function. The benefit of learning the operator is that we can take any new reward function as input and obtain its corresponding value function in a zero-shot manner. To approximate this special type of operator, we design a number of novel operator neural network architectures based on its theoretical properties. Our operator networks outperform existing methods and the standard design of general-purpose operator networks, and we demonstrate the benefit of our operator deep Q-learning framework on a range of tasks, including reward transfer for offline policy evaluation (OPE) and reward transfer for offline policy optimization.
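To make the operator view concrete, here is a minimal sketch of the tabular policy-evaluation case, where the map from reward to value is known in closed form: for a fixed policy with transition matrix P and discount gamma, V = (I - gamma P)^{-1} r. This is only an illustration of the idea, not the paper's neural architecture. The 3-state chain, the reward vectors, and the helper name `value_operator` are all hypothetical.

```python
import numpy as np

def value_operator(P, gamma):
    """Return the matrix (I - gamma * P)^{-1}, which maps a reward vector
    to the value vector of the fixed policy that induces P."""
    n = P.shape[0]
    return np.linalg.inv(np.eye(n) - gamma * P)

# Hypothetical 3-state Markov chain induced by some fixed policy.
P = np.array([[0.9, 0.1, 0.0],
              [0.0, 0.9, 0.1],
              [0.1, 0.0, 0.9]])
gamma = 0.9

# Compute the operator once; it does not depend on the reward.
G = value_operator(P, gamma)

# Two different reward functions, evaluated with no further solving.
r_a = np.array([1.0, 0.0, 0.0])
r_b = np.array([0.0, 0.0, 2.0])
v_a = G @ r_a
v_b = G @ r_b
```

Once `G` is available, any new reward vector is evaluated with a single matrix-vector product, which is the tabular analogue of the zero-shot transfer described above. In this policy-evaluation setting the operator is linear in the reward, so the value of a sum of rewards is the sum of their values.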


research
03/05/2023

Bounding the Optimal Value Function in Compositional Reinforcement Learning

In the field of reinforcement learning (RL), agents are often tasked wit...
research
11/29/2021

Improving Zero-shot Generalization in Offline Reinforcement Learning using Generalized Similarity Functions

Reinforcement learning (RL) agents are widely used for solving complex s...
research
10/27/2022

Learning on the Job: Self-Rewarding Offline-to-Online Finetuning for Industrial Insertion of Novel Connectors from Vision

Learning-based methods in robotics hold the promise of generalization, b...
research
08/28/2019

Reinforcement Learning: Prediction, Control and Value Function Approximation

With the increasing power of computers and the rapid development of self...
research
09/11/2019

Predicting optimal value functions by interpolating reward functions in scalarized multi-objective reinforcement learning

A common approach for defining a reward function for Multi-objective Rei...
research
08/20/2021

Plug and Play, Model-Based Reinforcement Learning

Sample-efficient generalisation of reinforcement learning approaches hav...
research
02/23/2021

Greedy Multi-step Off-Policy Reinforcement Learning

Multi-step off-policy reinforcement learning has achieved great success....
