
Meta-learning Transferable Representations with a Single Target Domain
Recent works found that fine-tuning and joint training—two popular appro...

Theoretical Analysis of Self-Training with Deep Networks on Unlabeled Data
Self-training algorithms, which train a model to fit pseudolabels predic...

Self-training Avoids Using Spurious Features Under Domain Shift
In unsupervised domain adaptation, existing theory focuses on situations...

Shape Matters: Understanding the Implicit Bias of the Noise Covariance
The noise in stochastic gradient descent (SGD) provides a crucial implic...

The Implicit and Explicit Regularization Effects of Dropout
Dropout is a widely-used regularization technique, often required to obt...

Improved Sample Complexities for Deep Networks and Robust Classification via an All-Layer Margin
For linear classifiers, the relationship between (normalized) output mar...

Towards Explaining the Regularization Effect of Initial Large Learning Rate in Training Neural Networks
Stochastic gradient descent with a large initial learning rate is a wide...

Learning Imbalanced Datasets with Label-Distribution-Aware Margin Loss
Deep learning algorithms can fare poorly when the training dataset suffe...

Data-dependent Sample Complexity of Deep Neural Networks via Lipschitz Augmentation
Existing Rademacher complexity bounds for neural networks rely only on n...

On the Margin Theory of Feedforward Neural Networks
Past works have shown that, somewhat surprisingly, over-parametrization ...

Markov Chain Truncation for Doubly-Intractable Inference
Computing partition functions, the normalizing constants of probability ...
Colin Wei