John Schulman

research

∙ 05/31/2023

Let's Verify Step by Step

In recent years, large language models have greatly improved in their ab...

1 Hunter Lightman, et al. ∙

research

∙ 01/31/2023

Scaling laws for single-agent reinforcement learning

Recent work has shown that, in generative modeling, cross-entropy loss i...

0 Jacob Hilton, et al. ∙

research

∙ 10/19/2022

Scaling Laws for Reward Model Overoptimization

In reinforcement learning from human feedback, it is common to optimize ...

0 Leo Gao, et al. ∙

research

∙ 07/28/2022

Efficient Training of Language Models to Fill in the Middle

We show that autoregressive language models can learn to infill text aft...

2 Mohammad Bavarian, et al. ∙

research

∙ 03/04/2022

Training language models to follow instructions with human feedback

Making language models bigger does not inherently make them better at fo...

1 Long Ouyang, et al. ∙

research

∙ 12/17/2021

WebGPT: Browser-assisted question-answering with human feedback

We fine-tune GPT-3 to answer long-form questions using a text-based web-...

0 Reiichiro Nakano, et al. ∙

research

∙ 10/27/2021

Training Verifiers to Solve Math Word Problems

State-of-the-art language models can match human performance on many tas...

0 Karl Cobbe, et al. ∙

research

∙ 10/01/2021

Batch size-invariance for policy optimization

We say an algorithm is batch size-invariant if changes to the batch size...

21 Jacob Hilton, et al. ∙

research

∙ 09/28/2021

Unsolved Problems in ML Safety

Machine learning (ML) systems are rapidly increasing in size, are acquir...

0 Dan Hendrycks, et al. ∙

research

∙ 03/29/2021

Measuring Sample Efficiency and Generalization in Reinforcement Learning Benchmarks: NeurIPS 2020 Procgen Benchmark

The NeurIPS 2020 Procgen Competition was designed as a centralized bench...

26 Sharada Mohanty, et al. ∙

research

∙ 01/26/2021

The MineRL 2020 Competition on Sample Efficient Reinforcement Learning using Human Priors

Although deep reinforcement learning has led to breakthroughs in many di...

11 William H. Guss, et al. ∙

research

∙ 10/28/2020

Scaling Laws for Autoregressive Generative Modeling

We identify empirical scaling laws for the cross-entropy loss in four do...

2 Tom Henighan, et al. ∙

research

∙ 09/09/2020

Phasic Policy Gradient

We introduce Phasic Policy Gradient (PPG), a reinforcement learning fram...

0 Karl Cobbe, et al. ∙

research

∙ 12/03/2019

Leveraging Procedural Generation to Benchmark Reinforcement Learning

In this report, we introduce Procgen Benchmark, a suite of 16 procedural...

0 Karl Cobbe, et al. ∙

research

∙ 04/07/2019

Policy Gradient Search: Online Planning and Expert Iteration without Search Trees

Monte Carlo Tree Search (MCTS) algorithms perform simulation-based searc...

4 Thomas Anthony, et al. ∙

research

∙ 02/06/2019

Semi-Supervised Learning by Label Gradient Alignment

We present label gradient alignment, a novel algorithm for semi-supervis...

0 Jacob Jackson, et al. ∙

research

∙ 12/06/2018

Quantifying Generalization in Reinforcement Learning

In this paper, we investigate the problem of overfitting in deep reinfor...

14 Karl Cobbe, et al. ∙

research

∙ 09/14/2018

Model-Based Reinforcement Learning via Meta-Policy Optimization

Model-based reinforcement learning approaches carry the promise of being...

34 Ignasi Clavera, et al. ∙

research

∙ 04/10/2018

Gotta Learn Fast: A New Benchmark for Generalization in RL

In this report, we present a new reinforcement learning (RL) benchmark b...

0 Alex Nichol, et al. ∙

research

∙ 03/08/2018

Reptile: a Scalable Metalearning Algorithm

This paper considers metalearning problems, where there is a distributio...

0 Alex Nichol, et al. ∙

research

∙ 09/28/2017

Learning Complex Dexterous Manipulation with Deep Reinforcement Learning and Demonstrations

Dexterous multi-fingered hands are extremely versatile and provide a gen...

0 Aravind Rajeswaran, et al. ∙

research

∙ 07/01/2017

Teacher-Student Curriculum Learning

We propose Teacher-Student Curriculum Learning (TSCL), a framework for a...

0 Tambet Matiisen, et al. ∙

research

∙ 06/05/2017

UCB Exploration via Q-Ensembles

We show how an ensemble of Q^*-functions can be leveraged for more effec...

0 Richard Y. Chen, et al. ∙

research

∙ 11/15/2016

#Exploration: A Study of Count-Based Exploration for Deep Reinforcement Learning

Count-based exploration algorithms are known to perform near-optimally w...

0 Haoran Tang, et al. ∙

research

∙ 11/09/2016

RL^2: Fast Reinforcement Learning via Slow Reinforcement Learning

Deep reinforcement learning (deep RL) has been successful in learning so...

0 Yan Duan, et al. ∙

research

∙ 11/08/2016

Variational Lossy Autoencoder

Representation learning seeks to expose certain aspects of observed data...

0 Xi Chen, et al. ∙

research

∙ 06/21/2016

Concrete Problems in AI Safety

Rapid progress in machine learning and artificial intelligence (AI) has ...

0 Dario Amodei, et al. ∙

research

∙ 06/12/2016

InfoGAN: Interpretable Representation Learning by Information Maximizing Generative Adversarial Nets

This paper describes InfoGAN, an information-theoretic extension to the ...

0 Xi Chen, et al. ∙

research

∙ 06/05/2016

OpenAI Gym

OpenAI Gym is a toolkit for reinforcement learning research. It includes...

0 Greg Brockman, et al. ∙

research

∙ 05/31/2016

VIME: Variational Information Maximizing Exploration

Scalable and effective exploration remains a key challenge in reinforcem...

0 Rein Houthooft, et al. ∙

research

∙ 05/09/2016

Theano: A Python framework for fast computation of mathematical expressions

Theano is a Python library that allows to define, optimize, and evaluate...

0 The Theano Development Team, et al. ∙
