Training deep neural networks in low rank, i.e. with factorised layers, ...
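A minimal sketch of the kind of factorised layer this abstract refers to: one weight matrix replaced by two thin factors of inner rank r. The class name, shapes, and initialisation below are illustrative assumptions, not the paper's exact recipe.

```python
import torch
import torch.nn as nn

class LowRankLinear(nn.Module):
    """A linear layer parameterised in low rank, W ~= V @ U."""

    def __init__(self, d_in: int, d_out: int, rank: int):
        super().__init__()
        # Two thin factors replace one d_out x d_in matrix: the parameter
        # count drops from d_in * d_out to rank * (d_in + d_out).
        self.U = nn.Linear(d_in, rank, bias=False)
        self.V = nn.Linear(rank, d_out)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.V(self.U(x))

layer = LowRankLinear(d_in=1024, d_out=1024, rank=64)
print(layer(torch.randn(8, 1024)).shape)  # torch.Size([8, 1024])
```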
Training on web-scale data can take months. But most computation and time...
We introduce Goldilocks Selection, a technique for faster model training...
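The two entries above concern online batch selection: score a large candidate batch and train only on the points that are learnable, worth learning, and not yet learnt. The sketch below is one plausible reading of that idea, scoring points by model loss minus the loss of a small holdout-trained model; `irreducible_model` and `keep` are illustrative names, not the papers' API.

```python
import torch
import torch.nn.functional as F

def select_batch(model, irreducible_model, xb, yb, keep: int):
    """Keep the `keep` candidate points with the highest reducible loss."""
    with torch.no_grad():
        loss = F.cross_entropy(model(xb), yb, reduction="none")
        # A small model trained on holdout data estimates the loss that
        # cannot be reduced; noisy or unlearnable points score low.
        irreducible = F.cross_entropy(irreducible_model(xb), yb, reduction="none")
        score = loss - irreducible  # high = learnable but not yet learnt
    idx = score.topk(keep).indices
    return xb[idx], yb[idx]
```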
We challenge a common assumption underlying most supervised deep learning...
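The challenged assumption is that a prediction depends only on the model's parameters and a single input's features. A minimal sketch of the alternative: let datapoints attend to each other by treating the batch as the sequence axis of a standard attention layer. The paper's actual architecture is more elaborate than this.

```python
import torch
import torch.nn as nn

attn = nn.MultiheadAttention(embed_dim=64, num_heads=4, batch_first=True)
datapoints = torch.randn(1, 128, 64)  # 128 embedded datapoints as one "sequence"
out, _ = attn(datapoints, datapoints, datapoints)
print(out.shape)  # torch.Size([1, 128, 64]); each output mixes information across datapoints
```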
Existing generalization measures that aim to capture a model's simplicity...
The number of parameters in state of the art neural networks has drastically...
We demonstrate 10-40% speedups and memory reduction with Wide ResNets, EfficientNets, and Transformer models, with minimal...
We detect out-of-training-distribution sentences in Neural Machine Translation...
Evaluation of Bayesian deep learning (BDL) methods is challenging. We of...
Neural networks are easier to optimise when they have many more weights ...
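One way to keep the optimisation benefits of overparameterisation while preparing a network for later pruning is to apply dropout preferentially to the lowest-magnitude weights. The sketch below is a rough illustration of that idea; the targeting fraction and drop rate are arbitrary choices, not values from the paper.

```python
import torch

def targeted_weight_dropout(w: torch.Tensor, targ_frac: float = 0.5,
                            drop_rate: float = 0.5) -> torch.Tensor:
    """Stochastically zero a random subset of the smallest-magnitude weights."""
    k = int(targ_frac * w.numel())
    if k == 0:
        return w
    thresh = w.abs().flatten().kthvalue(k).values
    candidates = w.abs() <= thresh                    # lowest-magnitude weights
    drop = candidates & (torch.rand_like(w) < drop_rate)
    return w * (~drop)

w = torch.randn(128, 128)
print(targeted_weight_dropout(w).eq(0).float().mean())  # ~ targ_frac * drop_rate
```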
Tensor2Tensor is a library for deep learning models that is well-suited ...
This work details CipherGAN, an architecture inspired by CycleGAN used for...
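The CycleGAN ingredient alluded to here is cycle consistency: two learned mappings between plaintext and ciphertext spaces are trained so that each approximately inverts the other. A toy sketch with placeholder linear maps over continuous embeddings; the adversarial losses and the discrete-data handling the paper relies on are omitted.

```python
import torch
import torch.nn as nn

F_map = nn.Linear(16, 16)  # plaintext embedding -> ciphertext embedding (placeholder)
G_map = nn.Linear(16, 16)  # ciphertext embedding -> plaintext embedding (placeholder)

x = torch.randn(8, 16)     # embedded plaintext batch
y = torch.randn(8, 16)     # embedded ciphertext batch
cycle_loss = ((G_map(F_map(x)) - x).abs().mean()
              + (F_map(G_map(y)) - y).abs().mean())
cycle_loss.backward()      # in practice combined with adversarial losses
```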
Deep residual networks (ResNets) have significantly pushed forward the state-of-the-art...
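One ResNet variant relevant here is the reversible residual block: its inputs can be reconstructed exactly from its outputs, so activations need not be stored for backpropagation. A minimal sketch with placeholder residual functions `f` and `g`:

```python
import torch
import torch.nn as nn

class ReversibleBlock(nn.Module):
    def __init__(self, f: nn.Module, g: nn.Module):
        super().__init__()
        self.f, self.g = f, g

    def forward(self, x1, x2):
        y1 = x1 + self.f(x2)
        y2 = x2 + self.g(y1)
        return y1, y2

    def inverse(self, y1, y2):
        # Exact reconstruction of the inputs from the outputs.
        x2 = y2 - self.g(y1)
        x1 = y1 - self.f(x2)
        return x1, x2

blk = ReversibleBlock(nn.Linear(16, 16), nn.Linear(16, 16))
x1, x2 = torch.randn(4, 16), torch.randn(4, 16)
r1, r2 = blk.inverse(*blk(x1, x2))
print(torch.allclose(x1, r1, atol=1e-6), torch.allclose(x2, r2, atol=1e-6))  # True True
```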
Deep learning yields great results across many fields, from speech recognition...
The dominant sequence transduction models are based on complex recurrent...
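The architecture this abstract introduces replaces recurrence and convolution with attention; its core operation is scaled dot-product attention, Attention(Q, K, V) = softmax(QK^T / sqrt(d_k)) V. A single-head sketch; the full model adds multi-head projections, positional encodings, and feed-forward sublayers.

```python
import math
import torch

def scaled_dot_product_attention(q, k, v):
    d_k = q.size(-1)
    scores = q @ k.transpose(-2, -1) / math.sqrt(d_k)  # (batch, seq, seq)
    return torch.softmax(scores, dim=-1) @ v           # weighted sum of values

q = k = v = torch.randn(2, 10, 64)  # (batch, sequence, d_k)
print(scaled_dot_product_attention(q, k, v).shape)  # torch.Size([2, 10, 64])
```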
Depthwise separable convolutions reduce the number of parameters and computation...
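A sketch of where those savings come from: a depthwise separable convolution factorises a standard convolution into a per-channel spatial convolution followed by a 1x1 pointwise convolution.

```python
import torch
import torch.nn as nn

class DepthwiseSeparableConv(nn.Module):
    def __init__(self, c_in: int, c_out: int, kernel_size: int = 3):
        super().__init__()
        # groups=c_in applies one spatial filter per input channel.
        self.depthwise = nn.Conv2d(c_in, c_in, kernel_size,
                                   padding=kernel_size // 2, groups=c_in)
        # 1x1 convolution mixes information across channels.
        self.pointwise = nn.Conv2d(c_in, c_out, kernel_size=1)

    def forward(self, x):
        return self.pointwise(self.depthwise(x))

conv = DepthwiseSeparableConv(64, 128)
print(conv(torch.randn(1, 64, 32, 32)).shape)  # torch.Size([1, 128, 32, 32])
# Parameters: 64*3*3 + 64 (depthwise) + 64*128 + 128 (pointwise) ~ 9.0k,
# versus 64*128*3*3 + 128 ~ 73.9k for a standard 3x3 convolution.
```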