
The Break-Even Point on Optimization Trajectories of Deep Neural Networks
The early phase of training of deep neural networks is critical for thei...
Neural Bayes: A Generic Parameterization Method for Unsupervised Representation Learning
We introduce a parameterization method called Neural Bayes which allows ...
Entropy Penalty: Towards Generalization Beyond the IID Assumption
It has been shown that instead of learning actual object features, deep ...
How to Initialize your Network? Robust Initialization for WeightNorm & ResNets
Residual networks (ResNet) and weight normalization play an important ro...
The Benefits of Overparameterization at Initialization in Deep ReLU Networks
It has been noted in existing literature that overparameterization in R...
h-detach: Modifying the LSTM Gradient Towards Better Optimization
Recurrent neural networks are known for their notorious exploding and va...
On the Spectral Bias of Deep Neural Networks
It is well known that overparametrized deep neural networks (DNNs) are ...
A Walk with SGD
Exploring why stochastic gradient descent (SGD) based optimization metho...
Variational BiLSTMs
Recurrent neural networks like long short-term memory (LSTM) are importa...
Three Factors Influencing Minima in SGD
We study the properties of the endpoint of stochastic gradient descent (...
Fraternal Dropout
Recurrent neural networks (RNNs) are an important class of architectures am...
Residual Connections Encourage Iterative Inference
Residual networks (Resnets) have become a prominent architecture in deep...
A Closer Look at Memorization in Deep Networks
We examine the role of memorization in deep learning, drawing connection...
On Optimality Conditions for AutoEncoder Signal Recovery
AutoEncoders are unsupervised models that aim to learn patterns from ob...
Normalization Propagation: A Parametric Technique for Removing Internal Covariate Shift in Deep Networks
While the authors of Batch Normalization (BN) identify and address an im...
Why Regularized AutoEncoders learn Sparse Representation?
Dimensionality Reduction with Subspace Structure Preservation
Modeling data as being sampled from a union of independent subspaces has...
Is Joint Training Better for Deep AutoEncoders?
Traditionally, when generative models of data are developed via deep arc...
An Analysis of Random Projections in Cancelable Biometrics
With increasing concerns about security, the need for highly secure phys...
