Representation-Driven Reinforcement Learning

05/31/2023
by Ofir Nabati, et al.

We present a representation-driven framework for reinforcement learning. By representing policies as estimates of their expected values, we leverage techniques from contextual bandits to guide exploration and exploitation. In particular, embedding a policy network into a linear feature space allows us to reframe the exploration-exploitation problem as a representation-exploitation problem, where good policy representations enable optimal exploration. We demonstrate the effectiveness of this framework by applying it to evolutionary and policy gradient-based approaches, where it significantly improves performance over traditional methods. Our framework provides a new perspective on reinforcement learning, highlighting the importance of policy representation in determining optimal exploration-exploitation strategies.
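To make the bandit view concrete, the following is a minimal sketch of a standard linear upper-confidence-bound (LinUCB-style) rule applied to a handful of fixed policy embeddings. It is not the authors' algorithm: the class `LinUCB`, the helper `run_demo`, the toy embeddings, the hidden weight vector `true_w`, and the Gaussian noise model are all assumptions introduced for illustration. The only idea taken from the abstract is that each policy is represented by a feature vector whose inner product with an unknown weight vector gives its expected value.

```python
import math
import random


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def matvec(M, v):
    return [dot(row, v) for row in M]


class LinUCB:
    """Illustrative linear UCB over policy embeddings (hypothetical helper)."""

    def __init__(self, dim, alpha=1.0, reg=1.0):
        self.alpha = alpha
        # Inverse of the regularized design matrix A = reg * I.
        self.A_inv = [[(1.0 / reg if i == j else 0.0) for j in range(dim)]
                      for i in range(dim)]
        self.b = [0.0] * dim

    def weights(self):
        # Ridge-regression estimate of the unknown value weights.
        return matvec(self.A_inv, self.b)

    def score(self, x):
        # Estimated value plus an exploration bonus from the confidence width.
        bonus = math.sqrt(max(dot(x, matvec(self.A_inv, x)), 0.0))
        return dot(self.weights(), x) + self.alpha * bonus

    def select(self, embeddings):
        scores = [self.score(x) for x in embeddings]
        return max(range(len(embeddings)), key=scores.__getitem__)

    def update(self, x, ret):
        # Sherman-Morrison rank-one update of A_inv after adding x x^T to A.
        Ax = matvec(self.A_inv, x)
        denom = 1.0 + dot(x, Ax)
        for i in range(len(x)):
            for j in range(len(x)):
                self.A_inv[i][j] -= Ax[i] * Ax[j] / denom
            self.b[i] += ret * x[i]


def run_demo(rounds=300, seed=0):
    rng = random.Random(seed)
    # Assumed toy embeddings of three candidate policies.
    embeddings = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
    true_w = [0.1, 1.0]  # hidden from the learner; assumption for the demo
    bandit = LinUCB(dim=2, alpha=1.0)
    counts = [0] * len(embeddings)
    for _ in range(rounds):
        i = bandit.select(embeddings)
        # Noisy return of the chosen policy under the assumed linear model.
        ret = dot(true_w, embeddings[i]) + rng.gauss(0.0, 0.1)
        bandit.update(embeddings[i], ret)
        counts[i] += 1
    return counts, bandit.weights()


if __name__ == "__main__":
    counts, w_hat = run_demo()
    print("selection counts:", counts)
    print("estimated weights:", w_hat)
```

In this toy setting the quality of the embedding decides how quickly the learner can rule out poor policies, which is the sense in which exploration becomes a question of representation. A real system would have to learn the embedding itself, which this sketch does not attempt.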
