Modern language models can imitate complex patterns through few-shot learning...
Large language models trained for safety and harmlessness remain suscept...
As the scale of machine learning models increases, trends such as scalin...
Deployed multimodal systems can fail in ways that evaluators did not anticipate...
For content recommender systems such as TikTok and YouTube, the platform...
We analyze transformers from the perspective of iterative inference, see...
Auditing large language models for unexpected behaviors is critical to p...
Mining large corpora can generate useful discoveries but is time-consumi...
Specifying reward functions for complex tasks like object manipulation o...
Neural networks often exhibit emergent behavior, where qualitatively new...
Existing techniques for training language models can be misaligned with ...
Research in mechanistic interpretability seeks to explain behaviors of m...
In recent years, deep neural networks have demonstrated increasingly str...
Forecasting future world events is a challenging but valuable task. Fore...
Transparency methods such as model visualizations provide information th...
Digital recommender systems such as Spotify and Netflix affect not only...
Large language models generate complex, open-ended outputs: instead of o...
We propose a metric – Projection Norm – to predict a model's performance...
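As a rough illustration only: one way a projection-norm-style metric can be computed is to pseudo-label the unlabeled shifted test set with the trained model, fine-tune a fresh copy of the model on those pseudo-labels, and report how far the parameters move, on the assumption that larger movement signals a larger performance drop. The sketch below uses placeholder choices throughout (the model constructor, optimizer, step count, and plain L2 parameter distance are illustrative assumptions, not necessarily the paper's exact procedure).

```python
# Illustrative ProjNorm-style sketch; names and hyperparameters are placeholders.
import torch
import torch.nn as nn

def projection_norm_sketch(model_ctor, trained_state, ood_inputs, lr=1e-2, steps=50):
    # 1) Pseudo-label the unlabeled shifted inputs with the trained model.
    base = model_ctor()
    base.load_state_dict(trained_state)
    base.eval()
    with torch.no_grad():
        pseudo_labels = base(ood_inputs).argmax(dim=1)

    # 2) Fine-tune a fresh copy (same starting weights) on the pseudo-labels.
    ref = model_ctor()
    ref.load_state_dict(trained_state)
    opt = torch.optim.SGD(ref.parameters(), lr=lr)
    loss_fn = nn.CrossEntropyLoss()
    for _ in range(steps):
        opt.zero_grad()
        loss_fn(ref(ood_inputs), pseudo_labels).backward()
        opt.step()

    # 3) Report the L2 distance between original and pseudo-label-tuned parameters.
    sq = 0.0
    for p0, p1 in zip(base.parameters(), ref.parameters()):
        sq += (p0.detach() - p1.detach()).pow(2).sum().item()
    return sq ** 0.5

# Toy usage on a linear classifier with random stand-in "shifted" inputs.
ctor = lambda: nn.Linear(10, 3)
trained_state = ctor().state_dict()
x_shifted = torch.randn(64, 10)
print(projection_norm_sketch(ctor, trained_state, x_shifted))
```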
How do two distributions of texts differ? Humans are slow at answering t...
Reward hacking – where RL agents exploit gaps in misspecified reward functions...
In real-world applications of machine learning, reliable and safe system...
Overparameterization is shown to result in poor test accuracy on rare su...
When making everyday decisions, people are guided by their conscience, a...
Machine learning (ML) systems are rapidly increasing in size, are acquir...
Large-scale, two-sided matching platforms must find market outcomes that...
To understand neural network behavior, recent works quantitatively compa...
While programming is one of the most broadly applicable skills in modern...
Larger language models have higher accuracy on average, but are they bet...
Traditional learning approaches for classification implicitly assume tha...
Adversarially trained models exhibit a large generalization gap: they ca...
Why do models often attend to salient words, and how does this evolve th...
Feature alignment is an approach to improving robustness to distribution...
Many intellectual endeavors require mathematical problem solving, but th...
Convex relaxations have emerged as a promising approach for verifying de...
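To make the idea concrete, the loosest member of this family simply propagates axis-aligned interval bounds through the network layer by layer. The sketch below is illustrative only and is not the specific relaxation studied in the abstract above: it bounds the logits of a small two-layer network over an input box and lower-bounds the margin of one class over another across the whole box.

```python
# Illustrative interval-bound sketch (the weakest convex relaxation, shown for intuition).
import numpy as np

def interval_linear(W, b, lower, upper):
    # Exact per-coordinate box bounds for the affine map y = W x + b over x in [lower, upper].
    W_pos, W_neg = np.maximum(W, 0.0), np.minimum(W, 0.0)
    y_lo = W_pos @ lower + W_neg @ upper + b
    y_hi = W_pos @ upper + W_neg @ lower + b
    return y_lo, y_hi

def interval_relu(lower, upper):
    # ReLU is monotone, so the box [lower, upper] maps to [relu(lower), relu(upper)].
    return np.maximum(lower, 0.0), np.maximum(upper, 0.0)

# Toy usage: bound the logits of a two-layer network over a small input box and
# certify a lower bound on the margin of class 0 over class 1 across the box.
rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(8, 3)), rng.normal(size=8)
W2, b2 = rng.normal(size=(2, 8)), rng.normal(size=2)
x = rng.normal(size=3)
h_lo, h_hi = interval_relu(*interval_linear(W1, b1, x - 0.05, x + 0.05))
z_lo, z_hi = interval_linear(W2, b2, h_lo, h_hi)
print("certified lower bound on (logit 0 - logit 1):", z_lo[0] - z_hi[1])
```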
We propose a new test to measure a text model's multitask accuracy. The ...
We show how to assess a language model's knowledge of basic concepts of...
We introduce three new robustness benchmarks consisting of naturally occurring...
We explore why many recently proposed robust estimation problems are eff...
Dataset replication is a useful tool for assessing whether improvements ...
The classical bias-variance trade-off predicts that bias decreases and variance...
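For reference, the textbook identity behind this classical prediction, stated for a fixed input x with y = f(x) + \varepsilon, \mathbb{E}[\varepsilon] = 0, \mathrm{Var}(\varepsilon) = \sigma^2, and the expectation taken over the training set used to fit \hat f and the noise \varepsilon, is
\[
\mathbb{E}\big[(y - \hat f(x))^2\big]
  = \underbrace{\big(\mathbb{E}[\hat f(x)] - f(x)\big)^2}_{\text{bias}^2}
  + \underbrace{\mathrm{Var}\big[\hat f(x)\big]}_{\text{variance}}
  + \sigma^2 .
\]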
We analyze the performance of the Tukey median estimator under total variation...
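For context, the Tukey (halfspace) median is a standard robust location estimator: the Tukey depth of a point \eta under a distribution p on \mathbb{R}^d is the smallest mass p assigns to any closed halfspace containing \eta,
\[
\mathrm{depth}(\eta;\, p) \;=\; \inf_{\|v\|_2 = 1} \Pr_{x \sim p}\big[\langle v,\, x - \eta \rangle \ge 0\big],
\]
and a Tukey median is any point that maximizes this depth.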
Detecting out-of-distribution examples is important for safety-critical...
Robust statistics traditionally focuses on outliers, or perturbations in...
Considerable work on adversarial defense has studied robustness to a fix...
We introduce natural adversarial examples -- real-world, unmodified, and...
We study the transfer of adversarial robustness of deep neural networks...
In component-based program synthesis, the synthesizer generates a progra...
Despite their impressive performance on diverse tasks, neural networks f...
Machine learning models trained on data from the outside world can be co...
Collectively, machine learning (ML) researchers are engaged in the creat...