
Catastrophic Fisher Explosion: Early Phase Fisher Matrix Impacts Generalization
The early phase of training has been shown to be important in two ways f...

The Break-Even Point on Optimization Trajectories of Deep Neural Networks
The early phase of training of deep neural networks is critical for thei...

Neural Bayes: A Generic Parameterization Method for Unsupervised Representation Learning
We introduce a parameterization method called Neural Bayes which allows ...

Entropy Penalty: Towards Generalization Beyond the IID Assumption
It has been shown that instead of learning actual object features, deep ...

How to Initialize your Network? Robust Initialization for WeightNorm & ResNets
Residual networks (ResNet) and weight normalization play an important ro...

The Benefits of Overparameterization at Initialization in Deep ReLU Networks
It has been noted in existing literature that overparameterization in R...

h-detach: Modifying the LSTM Gradient Towards Better Optimization
Recurrent neural networks are known for their notorious exploding and va...

On the Spectral Bias of Deep Neural Networks
It is well known that overparametrized deep neural networks (DNNs) are ...

A Walk with SGD
Exploring why stochastic gradient descent (SGD) based optimization metho...

Variational Bi-LSTMs
Recurrent neural networks like long short-term memory (LSTM) are importa...

Three Factors Influencing Minima in SGD
We study the properties of the endpoint of stochastic gradient descent (...

Fraternal Dropout
Recurrent neural networks (RNNs) are an important class of architectures am...

Residual Connections Encourage Iterative Inference
Residual networks (Resnets) have become a prominent architecture in deep...

A Closer Look at Memorization in Deep Networks
We examine the role of memorization in deep learning, drawing connection...

On Optimality Conditions for Auto-Encoder Signal Recovery
Auto-Encoders are unsupervised models that aim to learn patterns from ob...

Normalization Propagation: A Parametric Technique for Removing Internal Covariate Shift in Deep Networks
While the authors of Batch Normalization (BN) identify and address an im...

Why Regularized Auto-Encoders learn Sparse Representation?

Dimensionality Reduction with Subspace Structure Preservation
Modeling data as being sampled from a union of independent subspaces has...

Is Joint Training Better for Deep Auto-Encoders?
Traditionally, when generative models of data are developed via deep arc...

An Analysis of Random Projections in Cancelable Biometrics
With increasing concerns about security, the need for highly secure phys...
Devansh Arpit
Postdoctoral Fellow at the University of Montreal, Montreal Institute for Learning Algorithms (MILA)