
Robustness to Pruning Predicts Generalization in Deep Neural Networks
Existing generalization measures that aim to capture a model's simplicit...
Interlocking Backpropagation: Improving depthwise model-parallelism
The number of parameters in state of the art neural networks has drastic...
SliceOut: Training Transformers and CNNs faster while using less memory
We demonstrate 10-40% speedups and memory reduction with Wide ResNets, EfficientNets, and Transformer models, with minimal...
Wat zei je? Detecting Out-of-Distribution Translations with Variational Transformers
We detect out-of-training-distribution sentences in Neural Machine Trans...
A Systematic Comparison of Bayesian Deep Learning Robustness in Diabetic Retinopathy Tasks
Evaluation of Bayesian deep learning (BDL) methods is challenging. We of...
Learning Sparse Networks Using Targeted Dropout
Neural networks are easier to optimise when they have many more weights ...
Tensor2Tensor for Neural Machine Translation
Tensor2Tensor is a library for deep learning models that is well-suited ...
Unsupervised Cipher Cracking Using Discrete GANs
This work details CipherGAN, an architecture inspired by CycleGAN used f...
The Reversible Residual Network: Backpropagation Without Storing Activations
Deep residual networks (ResNets) have significantly pushed forward the s...
One Model To Learn Them All
Deep learning yields great results across many fields, from speech recog...
Attention Is All You Need
The dominant sequence transduction models are based on complex recurrent...
Depthwise Separable Convolutions for Neural Machine Translation
Depthwise separable convolutions reduce the number of parameters and com...
Aidan N. Gomez
