
Rip van Winkle's Razor: A Simple Estimate of Overfit to Test Data
Traditional statistics forbids use of test data (a.k.a. holdout data) du...

On the Validity of Modeling SGD with Stochastic Differential Equations (SDEs)
It is generally recognized that finite learning rate (LR), in contrast t...

Why Are Convolutional Nets More Sample-Efficient than Fully-Connected Nets?
Convolutional neural networks often dominate fully-connected counterpart...

TextHide: Tackling Data Privacy in Language Understanding Tasks
An unsolved challenge in distributed or federated learning is to effecti...

A Mathematical Exploration of Why Language Models Help Solve Downstream Tasks
Autoregressive language models pretrained on large corpora have been suc...

Reconciling Modern Deep Learning with Traditional Optimization Analyses: The Intrinsic Learning Rate
Recent works (e.g., Li and Arora, 2020) suggest that the use of popula...

InstaHide: Instance-hiding Schemes for Private Distributed Learning
How can multiple distributed entities collaboratively train a shared dee...

Privacy-preserving Learning via Deep Net Pruning
This paper attempts to answer the question whether neural network prunin...

A Sample Complexity Separation between Non-Convex and Convex Meta-Learning
One popular trend in meta-learning is to learn from many training tasks ...

Provable Representation Learning for Imitation Learning via Bilevel Optimization
A common strategy in modern learning systems is to learn a representatio...

Overparameterized Adversarial Training: An Analysis Overcoming the Curse of Dimensionality
Adversarial training is a popular method to give neural nets robustness ...

Enhanced Convolutional Neural Tangent Kernels
Recent research shows that for training with ℓ_2 loss, convolutional neu...

An Exponential Learning Rate Schedule for Deep Learning
Intriguing empirical evidence exists that deep learning can work well wi...

Harnessing the Power of Infinitely Wide Deep Nets on Small-data Tasks
Recent research shows that the following two models are equivalent: (a) ...

Explaining Landscape Connectivity of Low-cost Solutions for Multilayer Nets
Mode connectivity is a surprising phenomenon in the loss landscape of de...

Implicit Regularization in Deep Matrix Factorization
Efforts to understand the generalization mystery in deep learning have l...

A Simple Saliency Method That Passes the Sanity Checks
There is great interest in *saliency methods* (also called *attribution ...

On Exact Computation with an Infinitely Wide Neural Net
How well does a classic deep net architecture like AlexNet or VGG19 clas...

A Theoretical Analysis of Contrastive Unsupervised Representation Learning
Recent empirical works have successfully used unlabeled data to learn fe...

Fine-Grained Analysis of Optimization and Generalization for Overparameterized Two-Layer Neural Networks
Recent works have cast some light on the mystery of why deep nets fit an...

Theoretical Analysis of Auto Rate-Tuning by Batch Normalization
Batch Normalization (BN) has become a cornerstone of deep learning acros...

A Convergence Analysis of Gradient Descent for Deep Linear Neural Networks
We analyze the speed of convergence to global optimum for gradient descent t...

A La Carte Embedding: Cheap but Effective Induction of Semantic Feature Vectors
Motivations like domain adaptation, transfer learning, and feature learn...

An Analysis of the t-SNE Algorithm for Data Visualization
A first line of attack in exploratory data analysis is data visualizatio...

On the Optimization of Deep Networks: Implicit Acceleration by Overparameterization
Conventional wisdom in deep learning states that increasing depth improv...

Stronger Generalization Bounds for Deep Nets via a Compression Approach
Deep nets generalize well despite having more parameters than the number...

Theoretical Limitations of Encoder-Decoder GAN Architectures
Encoder-decoder GAN architectures (e.g., BiGAN and ALI) seek to add an ...

Provable benefits of representation learning
There is general consensus that learning representations is useful for a...

Extending and Improving WordNet via Unsupervised Word Embeddings
This work presents an unsupervised approach for improving WordNet that b...

Generalization and Equilibrium in Generative Adversarial Nets (GANs)
We show that training of a generative adversarial network (GAN) may not ha...

Provable Learning of Noisy-or Networks
Many machine learning applications use latent variable models to explain...

Mapping Between fMRI Responses to Movies and their Natural Language Annotations
Several research groups have shown how to correlate fMRI responses to th...

Provable Algorithms for Inference in Topic Models
Recently, there has been considerable progress on designing algorithms w...

Linear Algebraic Structure of Word Senses, with Applications to Polysemy
Word embeddings are ubiquitous in NLP and information retrieval, but it'...

Simple, Efficient, and Neural Algorithms for Sparse Coding
Sparse coding is a basic task in many fields including signal processing...

RANDWALK: A Latent Variable Model Approach to Word Embeddings
Semantic word embeddings represent the meaning of a word via a vector, a...

More Algorithms for Provable Dictionary Learning
In dictionary learning, also known as sparse coding, the algorithm is gi...

New Algorithms for Learning Incoherent and Overcomplete Dictionaries
In sparse recovery we are given a matrix A (the dictionary) and a vector...

A Practical Algorithm for Topic Modeling with Provable Guarantees
Topic models provide a useful method for dimensionality reduction and ex...