
Second Order Optimization Made Practical
Optimization in machine learning, both theoretical and applied, is prese...
Proximity Preserving Binary Code using Signed Graph-Cut
We introduce a binary embedding framework, called Proximity Preserving C...
Convolutional Bipartite Attractor Networks
In human perception and cognition, the fundamental operation that brains...
Identity Crisis: Memorization and Generalization under Extreme Overparameterization
We study the interplay between memorization and generalization of overpa...
Are All Layers Created Equal?
Understanding learning and generalization of deep architectures has been...
Exponentiated Gradient Meets Gradient Descent
The (stochastic) gradient descent and the multiplicative update method a...
Memory-Efficient Adaptive Optimization for Large-Scale Learning
Adaptive gradient-based optimizers such as AdaGrad and Adam are among th...
The Well Tempered Lasso
We study the complexity of the entire regularization path for least squa...
Shampoo: Preconditioned Stochastic Tensor Optimization
Preconditioned gradient methods are among the most general and powerful ...
A Unified Approach to Adaptive Regularization in Online and Stochastic Optimization
We describe a framework for deriving and analyzing online optimization a...
Toward Deeper Understanding of Neural Networks: The Power of Initialization and a Dual View on Expressivity
We develop a general duality between neural networks and compositional k...
Train faster, generalize better: Stability of stochastic gradient descent
We show that parametric models trained by a stochastic gradient method (...
Using Web Co-occurrence Statistics for Improving Image Categorization
Object recognition and localization are important tasks in computer visi...
Update Rules for Parameter Estimation in Bayesian Networks
This paper re-examines the problem of parameter estimation in Bayesian n...
Switching Portfolios
A constant rebalanced portfolio is an asset allocation algorithm which k...
Matrix Approximation under Local Low-Rank Assumption
Matrix approximation is a common tool in machine learning for building a...
