Adaptive Trade-Offs in Off-Policy Learning

by Mark Rowland, et al.

A great variety of off-policy learning algorithms exist in the literature, and new breakthroughs in this area continue to be made, improving theoretical understanding and yielding state-of-the-art reinforcement learning algorithms. In this paper, we take a unifying view of this space of algorithms and consider how they trade off three fundamental quantities: update variance, fixed-point bias, and contraction rate. This leads to new perspectives on existing methods and naturally yields novel algorithms for off-policy evaluation and control. We develop one such algorithm, C-trace, demonstrating that it makes these trade-offs more efficiently than existing methods in use, and that it can be scaled to yield state-of-the-art performance in large-scale environments.

