Automatic Trade-off Adaptation in Offline RL

06/16/2023
by Phillip Swazinna, et al.

Recently, offline RL algorithms have been proposed that remain adaptive at runtime. For example, the LION algorithm <cit.> provides the user with an interface to set, at runtime, the trade-off between behavior cloning and optimality with respect to the estimated return. Experts can then use this interface to adapt the policy behavior to their preferences and find a good balance between conservatism and performance optimization. Since expert time is precious, we extend the methodology with an autopilot that automatically finds a suitable parameterization of the trade-off, yielding a new algorithm that we term AutoLION.
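
The trade-off described above can be pictured as a single weighted objective. The sketch below is a hypothetical illustration (the function and variable names are ours, not from the paper) of how a policy conditioned on a trade-off weight lam could combine a behavior-cloning term with an estimated-return term; LION's actual training procedure and AutoLION's autopilot mechanism are described in the paper and are not reproduced here.

    import torch

    # Hypothetical sketch of a trade-off-weighted offline RL objective.
    # lam = 1 -> pure behavior cloning (stay close to the dataset actions)
    # lam = 0 -> pure return maximization under the learned return estimate
    def tradeoff_loss(policy, batch, lam, estimated_return_fn):
        """policy: callable (states, lam) -> actions
        batch: dict with 'states' and 'actions' from the offline dataset
        estimated_return_fn: differentiable return estimate for (states, actions)
        """
        states, dataset_actions = batch["states"], batch["actions"]
        lam_t = torch.full((states.shape[0], 1), lam)

        # Policy is conditioned on the trade-off weight, so one network can
        # realize the whole spectrum of behaviors at runtime.
        actions = policy(states, lam_t)

        bc_loss = ((actions - dataset_actions) ** 2).mean()        # proximity to data
        return_term = estimated_return_fn(states, actions).mean()  # estimated performance

        return lam * bc_loss - (1.0 - lam) * return_term

Setting lam close to 1 recovers behavior cloning, while lam close to 0 optimizes purely against the estimated return; the autopilot in AutoLION removes the need for an expert to choose this value by hand.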


Related research

06/22/2023 - Harnessing Mixed Offline Reinforcement Learning Datasets via Trajectory Weighting
Most offline reinforcement learning (RL) algorithms return a target poli...

02/13/2022 - Supported Policy Optimization for Offline Reinforcement Learning
Policy constraint methods to offline reinforcement learning (RL) typical...

11/15/2022 - Offline Reinforcement Learning with Adaptive Behavior Regularization
Offline reinforcement learning (RL) defines a sample-efficient learning ...

05/21/2022 - User-Interactive Offline Reinforcement Learning
Offline reinforcement learning algorithms still lack trust in practice d...

06/15/2021 - On Multi-objective Policy Optimization as a Tool for Reinforcement Learning
Many advances that have improved the robustness and efficiency of deep r...

07/05/2022 - Offline RL Policies Should be Trained to be Adaptive
Offline RL algorithms must account for the fact that the dataset they ar...
