Particle filtering is a popular method for inferring latent states in
st...
This paper studies few-shot learning via representation learning, where ...
The selection of initial parameter values for gradient-based optimizatio...
Recent research shows that for training with ℓ_2 loss, convolutional
neu...
Mode connectivity is a surprising phenomenon in the loss landscape of de...
Efforts to understand the generalization mystery in deep learning have l...
Over-parameterized deep neural networks trained by simple first-order me...
How well does a classic deep net architecture like AlexNet or VGG19 clas...
Recent works have cast some light on the mystery of why deep nets fit an...
We prove that for an L-layer fully-connected linear neural network, if t...
We analyze speed of convergence to global optimum for gradient descent
t...
We study the implicit regularization imposed by gradient descent for lea...
We revisit the question of reducing online learning to approximate
optim...
A first line of attack in exploratory data analysis is data visualizatio...
We consider the convex-concave saddle point problem
$\min_x \max_y f(x) + y^\top A x - g(y)$...
We propose a rank-k variant of the classical Frank-Wolfe algorithm to so...
In this paper, we study the stochastic combinatorial multi-armed bandit
...