We test the hypothesis that language models trained with reinforcement l...
As AI systems become more capable, we would like to enlist their help to...
Developing safe and useful general-purpose AI systems will require us to...
"Induction heads" are attention heads that implement a simple algorithm ...
Spatially highly-resolved capacity expansion models are computationally ...
We describe our early efforts to red team language models in order to si...
We study whether language models can evaluate the validity of their own ...
Recent large language models have been trained on vast datasets, but als...
We apply preference modeling and reinforcement learning from human feedb...
Large-scale pre-training has recently emerged as a technique for creatin...
Given the broad capabilities of large language models, it should be poss...
Energy system modellers typically choose a low spatial resolution for th...
It has become common to publish large (billion parameter) language model...
Considerable work on adversarial defense has studied robustness to a fix...
We study the transfer of adversarial robustness of deep neural networks ...
We explore a new way to evaluate generative models using insights from e...
The global energy system is undergoing a major transition, and in energy...