We show that we can easily design a single adversarial perturbation P th...
As AI systems become more capable, we would like to enlist their help to...
Developing safe and useful general-purpose AI systems will require us to...
Deep neural network classifiers partition input space into high confiden...
We describe our early efforts to red team language models in order to si...
We study whether language models can evaluate the validity of their own ...
We apply preference modeling and reinforcement learning from human feedb...
Large-scale pre-training has recently emerged as a technique for creatin...
There has been significant progress in detecting out-of-distribution (...
A variety of recent works, spanning pruning, lottery tickets, and traini...
Mahalanobis distance (MD) is a simple and popular post-processing method...
Near out-of-distribution detection (OOD) is a major challenge for deep n...
In computer vision, it is standard practice to draw a single sample from...
Linear interpolation between initial neural network parameters and conve...
In suitably initialized wide networks, small learning rates transform de...
Recent approaches to efficiently ensemble neural networks have shown tha...
The early phase of training of deep neural networks is critical for thei...
Deep ensembles have been empirically shown to be a promising approach fo...
The local geometry of high dimensional neural network loss landscapes ca...
There are many surprising and perhaps counter-intuitive properties of op...
We investigate neural network training and generalization using the conc...
Quantum State Tomography is the task of determining an unknown quantum s...
We explore the loss landscape of fully-connected neural networks using r...
Supermassive black holes at centers of clusters of galaxies strongly int...
We propose a novel architecture for k-shot classification on the Omniglo...