We present the NeurIPS 2021 consistency experiment, a larger-scale varia...
The Sharpness Aware Minimization (SAM) optimization algorithm has been s...
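The fragment above only names SAM, so here is a minimal, hedged NumPy sketch of the two-step SAM update (ascend along the normalized gradient by a radius rho, then descend using the gradient taken at the perturbed weights). The loss, gradient function, and parameter names are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def sam_step(w, grad_fn, lr=0.1, rho=0.05, eps=1e-12):
    """One Sharpness-Aware Minimization step (illustrative sketch).

    grad_fn(w) is assumed to return the loss gradient at parameters w.
    """
    g = grad_fn(w)
    # Move to the (approximate) worst-case point within an L2 ball of radius rho.
    e = rho * g / (np.linalg.norm(g) + eps)
    # Update with the gradient evaluated at the perturbed parameters.
    g_sharp = grad_fn(w + e)
    return w - lr * g_sharp

# Toy usage: minimize a simple quadratic loss (hypothetical example).
grad_fn = lambda w: 2.0 * w
w = np.array([3.0, -2.0])
for _ in range(50):
    w = sam_step(w, grad_fn)
```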
How do author perceptions match up to the outcomes of the peer-review pr...
Previous work on neural noisy channel modeling relied on latent variable...
Self-attention is a useful mechanism to build generative models for lang...
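Since the fragment above refers to self-attention without detail, the following is a minimal NumPy sketch of standard scaled dot-product self-attention, softmax(QK^T / sqrt(d)) V; the projection matrices and shapes are assumptions chosen for illustration, not the specific model in the work above.

```python
import numpy as np

def self_attention(x, wq, wk, wv):
    """Scaled dot-product self-attention over a sequence x of shape (n, d_model)."""
    q, k, v = x @ wq, x @ wk, x @ wv                 # (n, d_k) each
    scores = q @ k.T / np.sqrt(k.shape[-1])           # (n, n) pairwise similarities
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)    # row-wise softmax
    return weights @ v                                # (n, d_k) context vectors

# Toy usage with random projections (illustrative only).
rng = np.random.default_rng(0)
x = rng.normal(size=(4, 8))                           # 4 tokens, model dim 8
wq, wk, wv = (rng.normal(size=(8, 8)) for _ in range(3))
out = self_attention(x, wq, wk, wv)
```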
Normalization layers are a staple in state-of-the-art deep neural networ...
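As a hedged aside on what such a normalization layer computes, the snippet below sketches layer normalization (normalize each example's activations to zero mean and unit variance, then apply a learned scale and shift). It is a generic illustration, not the particular architecture studied in the work above.

```python
import numpy as np

def layer_norm(x, gamma, beta, eps=1e-5):
    """Layer normalization over the feature dimension of x with shape (batch, features)."""
    mean = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    x_hat = (x - mean) / np.sqrt(var + eps)   # zero mean, unit variance per example
    return gamma * x_hat + beta               # learned affine transform

x = np.random.randn(2, 4)
out = layer_norm(x, gamma=np.ones(4), beta=np.zeros(4))
```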
Large deep neural networks are powerful, but exhibit undesirable behavio...
Much of human dialogue occurs in semi-cooperative settings, where agents...
The prevalent approach to sequence to sequence learning maps an input se...
The predominant approach to language modeling to date is based on recur...
The prevalent approach to neural machine translation relies on bi-direct...
Theano is a Python library that allows one to define, optimize, and evaluate...
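To make that description concrete, here is a small, standard Theano usage example: define a symbolic expression, compile it with theano.function, and take a gradient with T.grad. This is a generic illustration of the library's API rather than an excerpt from the paper.

```python
import theano
import theano.tensor as T

# Define a symbolic expression over a vector.
x = T.dvector('x')
y = (x ** 2).sum()

# Compile the expression and its gradient into callable functions.
f = theano.function([x], y)
g = theano.function([x], T.grad(y, x))

print(f([1.0, 2.0, 3.0]))   # 14.0
print(g([1.0, 2.0, 3.0]))   # [2. 4. 6.]
```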
Parameter-specific adaptive learning rate methods are computationally ef...
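As context for "parameter-specific adaptive learning rate methods", the sketch below shows an Adam-style update in which each parameter's step size is scaled by running estimates of its gradient moments. The hyperparameter values are conventional defaults assumed for illustration, and this is not presented as the method proposed in the paper.

```python
import numpy as np

def adam_step(w, g, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update: per-parameter step sizes from moving averages of g and g**2."""
    m = beta1 * m + (1 - beta1) * g            # first-moment estimate
    v = beta2 * v + (1 - beta2) * g ** 2       # second-moment estimate
    m_hat = m / (1 - beta1 ** t)               # bias correction
    v_hat = v / (1 - beta2 ** t)
    w = w - lr * m_hat / (np.sqrt(v_hat) + eps)
    return w, m, v
```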
A central challenge to many fields of science and engineering involves m...
This article examines the failure of some large neural networks to leverage...