
A Closer Look at Codistillation for Distributed Training
Codistillation has been proposed as a mechanism to share knowledge among...
read it

Revisiting Loss Modelling for Unstructured Pruning
By removing parameters from deep neural networks, unstructured pruning m...
read it

Recovering Petaflops in Contrastive SemiSupervised Learning of Visual Representations
We investigate a strategy for improving the computational efficiency of ...
read it

SlowMo: Improving CommunicationEfficient Distributed SGD with Slow Momentum
Distributed optimization is essential for training large models on large...
read it

Needles in Haystacks: On Classifying Tiny Objects in Large Images
In some computer vision domains, such as medical or hyperspectral imagin...
read it

Gossipbased ActorLearner Architectures for Deep Reinforcement Learning
Multisimulator training has contributed to the recent success of Deep R...
read it

Improved Conditional VRNNs for Video Prediction
Predicting future frames for a video sequence is a challenging generativ...
read it

Stochastic Gradient Push for Distributed Deep Learning
Large minibatch parallel SGD is commonly used for distributed training ...
read it

DNN's Sharpest Directions Along the SGD Trajectory
Recent work has identified that using a high learning rate or a small ba...
read it

A Dissection of Overfitting and Generalization in Continuous Reinforcement Learning
The risks and perils of overfitting in machine learning are well known. ...
read it

Fast Approximate Natural Gradient Descent in a Kroneckerfactored Eigenbasis
Optimization algorithms that leverage gradient covariance information, s...
read it

Three Factors Influencing Minima in SGD
We study the properties of the endpoint of stochastic gradient descent (...
read it

Residual Connections Encourage Iterative Inference
Residual networks (Resnets) have become a prominent architecture in deep...
read it

A Closer Look at Memorization in Deep Networks
We examine the role of memorization in deep learning, drawing connection...
read it

A dataset and exploration of models for understanding video data through fillintheblank questionanswering
While deep convolutional neural networks frequently approach or exceed h...
read it

Zoneout: Regularizing RNNs by Randomly Preserving Hidden Activations
We propose zoneout, a novel method for regularizing RNNs. At each timest...
read it

Theano: A Python framework for fast computation of mathematical expressions
Theano is a Python library that allows to define, optimize, and evaluate...
read it

Dynamic Capacity Networks
We introduce the Dynamic Capacity Network (DCN), a neural network that c...
read it

Delving Deeper into Convolutional Networks for Learning Video Representations
We propose an approach to learn spatiotemporal features in videos from ...
read it

Oracle performance for visual captioning
The task of associating images and videos with a natural language descri...
read it

Describing Videos by Exploiting Temporal Structure
Recent progress in using recurrent neural networks (RNNs) for image desc...
read it

FitNets: Hints for Thin Deep Nets
While depth tends to improve network performances, it also makes gradien...
read it