
Meta-learning Transferable Representations with a Single Target Domain
Recent works found that fine-tuning and joint training, two popular appro...
Theoretical Analysis of Self-Training with Deep Networks on Unlabeled Data
Self-training algorithms, which train a model to fit pseudolabels predic...
Self-training Avoids Using Spurious Features Under Domain Shift
In unsupervised domain adaptation, existing theory focuses on situations...
Shape Matters: Understanding the Implicit Bias of the Noise Covariance
The noise in stochastic gradient descent (SGD) provides a crucial implic...
The Implicit and Explicit Regularization Effects of Dropout
Dropout is a widely-used regularization technique, often required to obt...
Improved Sample Complexities for Deep Networks and Robust Classification via an All-Layer Margin
For linear classifiers, the relationship between (normalized) output mar...
Towards Explaining the Regularization Effect of Initial Large Learning Rate in Training Neural Networks
Stochastic gradient descent with a large initial learning rate is a wide...
Learning Imbalanced Datasets with Label-Distribution-Aware Margin Loss
Deep learning algorithms can fare poorly when the training dataset suffe...
Data-dependent Sample Complexity of Deep Neural Networks via Lipschitz Augmentation
Existing Rademacher complexity bounds for neural networks rely only on n...
On the Margin Theory of Feedforward Neural Networks
Past works have shown that, somewhat surprisingly, overparametrization ...
Markov Chain Truncation for Doubly-Intractable Inference
Computing partition functions, the normalizing constants of probability ...
Colin Wei
