
Properties of the After Kernel
The Neural Tangent Kernel (NTK) is the wide-network limit of a kernel de...

When does gradient descent with logistic loss interpolate using deep networks with smoothed ReLU activations?
We establish conditions under which gradient descent applied to fixed-wi...

When does gradient descent with logistic loss find interpolating two-layer networks?
We study the training of finite-width two-layer smoothed ReLU networks f...

Failures of model-dependent generalization bounds for least-norm interpolation
We consider bounds on the generalization performance of the least-norm l...

Finite-sample analysis of interpolating linear classifiers in the overparameterized regime
We prove bounds on the population risk of the maximum margin algorithm f...

On the Global Convergence of Training Deep Linear ResNets
We study the convergence of gradient descent (GD) and stochastic gradien...

Oracle lower bounds for stochastic gradient sampling algorithms
We consider the problem of sampling from a strongly log-concave density ...

Benign Overfitting in Linear Regression
The phenomenon of benign overfitting is one of the key mysteries uncover...

Size-free generalization bounds for convolutional neural networks
We prove bounds on the generalization error of convolutional networks. T...

On the effect of the activation function on the distribution of hidden nodes in a deep network
We analyze the joint probability distribution on the lengths of the vect...

Density estimation for shift-invariant multidimensional distributions
We study density estimation for classes of shift-invariant distributions...

Learning Sums of Independent Random Variables with Sparse Collective Support
We study the learnability of sums of independent integer random variable...

The Singular Values of Convolutional Layers
We characterize the singular values of the linear transformation associa...

Representing smooth functions as compositions of near-identity functions with implications for deep network optimization
We show that any smooth bi-Lipschitz h can be represented exactly as a c...

Gradient descent with identity initialization efficiently learns positive definite linear transformations by deep residual networks
We analyze algorithms for approximating a function f(x) = Φx mapping ℝ^d...

Surprising properties of dropout in deep networks
We analyze dropout in deep networks with rectified linear units and the ...

On the Inductive Bias of Dropout
Dropout is a simple but effective technique for learning in neural netwo...

The Power of Localization for Efficiently Learning Linear Separators with Noise
We introduce a new approach for designing computationally efficient lear...

Active and passive learning of linear separators under log-concave distributions
We provide new results concerning label efficient, polynomial time, pass...
Philip M. Long