
Active Reinforcement Learning: Observing Rewards at a Cost
Active reinforcement learning (ARL) is a variant on reinforcement learni...

Hidden Incentives for Auto-Induced Distributional Shift
Decisions made by machine learning systems have increasing influence on ...

Quantifying Differences in Reward Functions
For many tasks, the reward function is too complex to be specified proce...

Pitfalls of learning a reward function online
In some agent designs like inverse reinforcement learning an agent needs...

Learning Human Objectives by Evaluating Hypothetical Behavior
We seek to align agent behavior with a user's objectives in a reinforcem...

Scaling shared model governance via model splitting
Currently the only techniques for sharing governance of a deep learning ...

Scalable agent alignment via reward modeling: a research direction
One obstacle to applying reinforcement learning algorithms to real-world...

Reward learning from human preferences and demonstrations in Atari
To solve complex real-world problems with reinforcement learning, we can...

Learning to Follow Language Instructions with Adversarial Reward Induction
Recent work has shown that deep reinforcement-learning agents can learn ...

AI Safety Gridworlds
We present a suite of reinforcement learning environments illustrating v...

Deep reinforcement learning from human preferences
For sophisticated reinforcement learning (RL) systems to interact useful...

Universal Reinforcement Learning Algorithms: Survey and Experiments
Many state-of-the-art reinforcement learning (RL) algorithms typically a...

Nonparametric General Reinforcement Learning
Reinforcement learning (RL) problems are often phrased in terms of Marko...

A Formal Solution to the Grain of Truth Problem
A Bayesian agent acting in a multi-agent environment learns to predict t...

Exploration Potential
We introduce exploration potential, a quantity that measures how much a ...

Loss Bounds and Time Complexity for Speed Priors
This paper establishes for the first time the predictive performance of ...

Thompson Sampling is Asymptotically Optimal in General Environments
We discuss a variant of Thompson sampling for nonparametric reinforcemen...

On the Computability of AIXI
How could we solve the machine learning and the artificial intelligence ...

Bad Universal Priors and Notions of Optimality
A big open question of algorithmic information theory is the choice of t...

A Definition of Happiness for Reinforcement Learning Agents
What is happiness for reinforcement learning agents? We seek a formal de...