Tom Brown

02/15/2023

The Capacity for Moral Self-Correction in Large Language Models

We test the hypothesis that language models trained with reinforcement l...

Deep Ganguli, et al.

12/15/2022

Constitutional AI: Harmlessness from AI Feedback

As AI systems become more capable, we would like to enlist their help to...

Yuntao Bai, et al.

11/04/2022

Measuring Progress on Scalable Oversight for Large Language Models

Developing safe and useful general-purpose AI systems will require us to...

Samuel R. Bowman, et al.

09/24/2022

In-context Learning and Induction Heads

"Induction heads" are attention heads that implement a simple algorithm ...

Catherine Olsson, et al.

09/06/2022

Inverse methods: How feasible are spatially low-resolved capacity expansion modeling results when dis-aggregated at high resolution?

Spatially highly-resolved capacity expansion models are computationally ...

Martha Maria Frysztacki, et al.

08/23/2022

Red Teaming Language Models to Reduce Harms: Methods, Scaling Behaviors, and Lessons Learned

We describe our early efforts to red team language models in order to si...

Deep Ganguli, et al.

07/11/2022

Language Models (Mostly) Know What They Know

We study whether language models can evaluate the validity of their own ...

Saurav Kadavath, et al.

05/21/2022

Scaling Laws and Interpretability of Learning from Repeated Data

Recent large language models have been trained on vast datasets, but als...

Danny Hernandez, et al.

04/12/2022

Training a Helpful and Harmless Assistant with Reinforcement Learning from Human Feedback

We apply preference modeling and reinforcement learning from human feedb...

Yuntao Bai, et al.

02/15/2022

Predictability and Surprise in Large Generative Models

Large-scale pre-training has recently emerged as a technique for creatin...

Deep Ganguli, et al.

12/01/2021

A General Language Assistant as a Laboratory for Alignment

Given the broad capabilities of large language models, it should be poss...

Amanda Askell, et al.

01/22/2021

The strong effect of network resolution on electricity system models with high shares of wind and solar

Energy system modellers typically choose a low spatial resolution for th...

Martha Maria Frysztacki, et al.

12/14/2020

Extracting Training Data from Large Language Models

It has become common to publish large (billion parameter) language model...

Nicholas Carlini, et al.

08/21/2019

Testing Robustness Against Unforeseen Adversaries

Considerable work on adversarial defense has studied robustness to a fix...

Daniel Kang, et al.

05/03/2019

Transfer of Adversarial Robustness Between Perturbation Types

We study the transfer of adversarial robustness of deep neural networks ...

Daniel Kang, et al.

08/14/2018

Skill Rating for Generative Models

We explore a new way to evaluate generative models using insights from e...

Catherine Olsson, et al.

07/20/2017

Opening the black box of energy modelling: Strategies and lessons learned

The global energy system is undergoing a major transition, and in energy...

Stefan Pfenninger, et al.