
Provable Memorization via Deep Neural Networks using Sublinear Parameters
It is known that Θ(N) parameters are sufficient for neural networks to m...

A Unifying View on Implicit Bias in Training Linear Neural Networks
We study the implicit bias of gradient flow (i.e., gradient descent with...

Minimum Width for Universal Approximation
The universal approximation property of width-bounded networks has been ...

SGD with shuffling: optimal rates without component convexity and large epoch requirements
We study without-replacement SGD for solving finite-sum optimization pro...

O(n) Connections are Expressive Enough: Universal Approximability of Sparse Transformers
Transformer networks use pairwise attention to compute contextual embedd...

Low-Rank Bottleneck in Multi-head Attention Models
Attention-based Transformer architecture has enabled significant advance...

Are Transformers universal approximators of sequence-to-sequence functions?
Despite the widespread adoption of Transformer models for NLP tasks, the...

Are deep ResNets provably better than linear predictors?
Recently, a residual network (ResNet) with a single residual block has b...

Finite sample expressive power of small-width ReLU networks
We study universal finite sample expressivity of neural networks, define...

Efficiently testing local optimality and escaping saddles for ReLU networks
We provide a theoretical algorithm for checking local optimality and esc...

A Critical View of Global Optimality in Deep Learning
We investigate the loss surface of deep linear and nonlinear neural netw...

Global optimality conditions for deep neural networks
We study the error landscape of deep linear and nonlinear neural network...
Chulhee Yun