
Understanding Generalization in Adversarial Training via the Bias-Variance Decomposition
Adversarially trained models exhibit a large generalization gap: they ca...

Approximating How Single Head Attention Learns
Why do models often attend to salient words, and how does this evolve th...

Limitations of Post-Hoc Feature Alignment for Robustness
Feature alignment is an approach to improving robustness to distribution...

Measuring Mathematical Problem Solving With the MATH Dataset
Many intellectual endeavors require mathematical problem solving, but th...

Enabling certification of verification-agnostic networks via memory-efficient semidefinite programming
Convex relaxations have emerged as a promising approach for verifying de...

Measuring Massive Multitask Language Understanding
We propose a new test to measure a text model's multitask accuracy. The ...

Aligning AI With Shared Human Values
We show how to assess a language model's knowledge of basic concepts of ...

The Many Faces of Robustness: A Critical Analysis of Out-of-Distribution Generalization
We introduce three new robustness benchmarks consisting of naturally occ...

Robust estimation via generalized quasi-gradients
We explore why many recently proposed robust estimation problems are eff...

Identifying Statistical Bias in Dataset Replication
Dataset replication is a useful tool for assessing whether improvements ...

Rethinking Bias-Variance Tradeoff for Generalization of Neural Networks
The classical bias-variance tradeoff predicts that bias decreases and v...

When does the Tukey median work?
We analyze the performance of the Tukey median estimator under total var...

A Benchmark for Anomaly Segmentation
Detecting out-of-distribution examples is important for safety-critical ...

Generalized Resilience and Robust Statistics
Robust statistics traditionally focuses on outliers, or perturbations in...

Testing Robustness Against Unforeseen Adversaries
Considerable work on adversarial defense has studied robustness to a fix...

Natural Adversarial Examples
We introduce natural adversarial examples: real-world, unmodified, and...

Transfer of Adversarial Robustness Between Perturbation Types
We study the transfer of adversarial robustness of deep neural networks ...

FrAngel: Component-Based Synthesis with Control Structures
In component-based program synthesis, the synthesizer generates a progra...

Semidefinite relaxations for certifying robustness to adversarial examples
Despite their impressive performance on diverse tasks, neural networks f...

Stronger Data Poisoning Attacks Break Data Sanitization Defenses
Machine learning models trained on data from the outside world can be co...

Troubling Trends in Machine Learning Scholarship
Collectively, machine learning (ML) researchers are engaged in the creat...

Sever: A Robust Meta-Algorithm for Stochastic Optimization
In high dimensions, most machine learning methods are brittle to even a ...

The Malicious Use of Artificial Intelligence: Forecasting, Prevention, and Mitigation
This report surveys the landscape of potential security threats from mal...

Certified Defenses against Adversarial Examples
While neural networks have achieved high accuracy on standard image clas...

Resilience: A Criterion for Learning in the Presence of Arbitrary Outliers
We introduce a criterion, resilience, which allows properties of a datas...

Learning from Untrusted Data
The vast majority of theoretical results in machine learning and statist...

Concrete Problems in AI Safety
Rapid progress in machine learning and artificial intelligence (AI) has ...

Unsupervised Risk Estimation Using Only Conditional Independence Structure
We show how to estimate a model's test error from unlabeled data, on dis...

The Statistics of Streaming Sparse Regression
We present a sparse analogue to stochastic gradient descent that is guar...
Jacob Steinhardt