Jan Leike

research

∙ 05/31/2023

Let's Verify Step by Step

In recent years, large language models have greatly improved in their ab...

1 Hunter Lightman, et al. ∙

research

∙ 06/12/2022

Self-critiquing models for assisting human evaluators

We fine-tune large language models to write natural language critiques (...

8 William Saunders, et al. ∙

research

∙ 03/04/2022

Training language models to follow instructions with human feedback

Making language models bigger does not inherently make them better at fo...

1 Long Ouyang, et al. ∙

research

∙ 01/20/2022

Safe Deep RL in 3D Environments using Human Feedback

Agents should avoid unsafe behaviour during both training and deployment...

5 Matthew Rahtz, et al. ∙

research

∙ 09/22/2021

Recursively Summarizing Books with Human Feedback

A major challenge for scaling machine learning is training models to per...

0 Jeff Wu, et al. ∙

research

∙ 07/07/2021

Evaluating Large Language Models Trained on Code

We introduce Codex, a GPT language model fine-tuned on publicly availabl...

6 Mark Chen, et al. ∙

research

∙ 05/30/2021

Institutionalising Ethics in AI through Broader Impact Requirements

Turning principles into practice is one of the most pressing challenges ...

0 Carina Prunkl, et al. ∙

research

∙ 11/13/2020

Active Reinforcement Learning: Observing Rewards at a Cost

Active reinforcement learning (ARL) is a variant on reinforcement learni...

0 David Krueger, et al. ∙

research

∙ 09/19/2020

Hidden Incentives for Auto-Induced Distributional Shift

Decisions made by machine learning systems have increasing influence on ...

9 David Krueger, et al. ∙

research

∙ 06/24/2020

Quantifying Differences in Reward Functions

For many tasks, the reward function is too complex to be specified proce...

23 Adam Gleave, et al. ∙

research

∙ 04/28/2020

Pitfalls of learning a reward function online

In some agent designs like inverse reinforcement learning an agent needs...

6 Stuart Armstrong, et al. ∙

research

∙ 12/05/2019

Learning Human Objectives by Evaluating Hypothetical Behavior

We seek to align agent behavior with a user's objectives in a reinforcem...

20 Siddharth Reddy, et al. ∙

research

∙ 12/14/2018

Scaling shared model governance via model splitting

Currently the only techniques for sharing governance of a deep learning ...

6 Miljan Martic, et al. ∙

research

∙ 11/19/2018

Scalable agent alignment via reward modeling: a research direction

One obstacle to applying reinforcement learning algorithms to real-world...

20 Jan Leike, et al. ∙

research

∙ 11/15/2018

Reward learning from human preferences and demonstrations in Atari

To solve complex real-world problems with reinforcement learning, we can...

6 Borja Ibarz, et al. ∙

research

∙ 06/05/2018

Learning to Follow Language Instructions with Adversarial Reward Induction

Recent work has shown that deep reinforcement-learning agents can learn ...

14 Dzmitry Bahdanau, et al. ∙

research

∙ 11/27/2017

AI Safety Gridworlds

We present a suite of reinforcement learning environments illustrating v...

0 Jan Leike, et al. ∙

research

∙ 06/12/2017

Deep reinforcement learning from human preferences

For sophisticated reinforcement learning (RL) systems to interact useful...

0 Paul Christiano, et al. ∙

research

∙ 05/30/2017

Universal Reinforcement Learning Algorithms: Survey and Experiments

Many state-of-the-art reinforcement learning (RL) algorithms typically a...

0 John Aslanides, et al. ∙

research

∙ 11/28/2016

Nonparametric General Reinforcement Learning

Reinforcement learning (RL) problems are often phrased in terms of Marko...

0 Jan Leike, et al. ∙

research

∙ 09/16/2016

A Formal Solution to the Grain of Truth Problem

A Bayesian agent acting in a multi-agent environment learns to predict t...

0 Jan Leike, et al. ∙

research

∙ 09/16/2016

Exploration Potential

We introduce exploration potential, a quantity that measures how much a ...

0 Jan Leike, et al. ∙

research

∙ 04/12/2016

Loss Bounds and Time Complexity for Speed Priors

This paper establishes for the first time the predictive performance of ...

0 Daniel Filan, et al. ∙

research

∙ 02/25/2016

Thompson Sampling is Asymptotically Optimal in General Environments

We discuss a variant of Thompson sampling for nonparametric reinforcemen...

0 Jan Leike, et al. ∙

research

∙ 10/19/2015

On the Computability of AIXI

How could we solve the machine learning and the artificial intelligence ...

0 Jan Leike, et al. ∙

research

∙ 10/16/2015

Bad Universal Priors and Notions of Optimality

A big open question of algorithmic information theory is the choice of t...

0 Jan Leike, et al. ∙

research

∙ 05/18/2015

A Definition of Happiness for Reinforcement Learning Agents

What is happiness for reinforcement learning agents? We seek a formal de...

0 Mayank Daswani, et al. ∙
