
Rip van Winkle's Razor: A Simple Estimate of Overfit to Test Data
Traditional statistics forbids use of test data (a.k.a. holdout data) du...

On the Validity of Modeling SGD with Stochastic Differential Equations (SDEs)
It is generally recognized that finite learning rate (LR), in contrast t...

Why Are Convolutional Nets More Sample-Efficient than Fully-Connected Nets?
Convolutional neural networks often dominate fully-connected counterpart...

TextHide: Tackling Data Privacy in Language Understanding Tasks
An unsolved challenge in distributed or federated learning is to effecti...

A Mathematical Exploration of Why Language Models Help Solve Downstream Tasks
Autoregressive language models pretrained on large corpora have been suc...

Reconciling Modern Deep Learning with Traditional Optimization Analyses: The Intrinsic Learning Rate
Recent works (e.g., Li and Arora, 2020) suggest that the use of popula...

InstaHide: Instance-hiding Schemes for Private Distributed Learning
How can multiple distributed entities collaboratively train a shared dee...

Privacy-preserving Learning via Deep Net Pruning
This paper attempts to answer the question whether neural network prunin...

A Sample Complexity Separation between Non-Convex and Convex Meta-Learning
One popular trend in meta-learning is to learn from many training tasks ...

Provable Representation Learning for Imitation Learning via Bilevel Optimization
A common strategy in modern learning systems is to learn a representatio...

Overparameterized Adversarial Training: An Analysis Overcoming the Curse of Dimensionality
Adversarial training is a popular method to give neural nets robustness ...

Enhanced Convolutional Neural Tangent Kernels
Recent research shows that for training with ℓ_2 loss, convolutional neu...

An Exponential Learning Rate Schedule for Deep Learning
Intriguing empirical evidence exists that deep learning can work well wi...

Harnessing the Power of Infinitely Wide Deep Nets on Small-data Tasks
Recent research shows that the following two models are equivalent: (a) ...

Explaining Landscape Connectivity of Low-cost Solutions for Multilayer Nets
Mode connectivity is a surprising phenomenon in the loss landscape of de...

Implicit Regularization in Deep Matrix Factorization
Efforts to understand the generalization mystery in deep learning have l...

A Simple Saliency Method That Passes the Sanity Checks
There is great interest in *saliency methods* (also called *attribution ...

On Exact Computation with an Infinitely Wide Neural Net
How well does a classic deep net architecture like AlexNet or VGG19 clas...

A Theoretical Analysis of Contrastive Unsupervised Representation Learning
Recent empirical works have successfully used unlabeled data to learn fe...

Fine-Grained Analysis of Optimization and Generalization for Overparameterized Two-Layer Neural Networks
Recent works have cast some light on the mystery of why deep nets fit an...

Theoretical Analysis of Auto Rate-Tuning by Batch Normalization
Batch Normalization (BN) has become a cornerstone of deep learning acros...

A Convergence Analysis of Gradient Descent for Deep Linear Neural Networks
We analyze the speed of convergence to global optimum for gradient descent t...

A La Carte Embedding: Cheap but Effective Induction of Semantic Feature Vectors
Motivations like domain adaptation, transfer learning, and feature learn...

An Analysis of the t-SNE Algorithm for Data Visualization
A first line of attack in exploratory data analysis is data visualizatio...

On the Optimization of Deep Networks: Implicit Acceleration by Overparameterization
Conventional wisdom in deep learning states that increasing depth improv...

Stronger Generalization Bounds for Deep Nets via a Compression Approach
Deep nets generalize well despite having more parameters than the number...

Theoretical Limitations of Encoder-Decoder GAN Architectures
Encoder-decoder GAN architectures (e.g., BiGAN and ALI) seek to add an ...

Provable benefits of representation learning
There is general consensus that learning representations is useful for a...

Extending and Improving WordNet via Unsupervised Word Embeddings
This work presents an unsupervised approach for improving WordNet that b...

Generalization and Equilibrium in Generative Adversarial Nets (GANs)
We show that training of a generative adversarial network (GAN) may not ha...

Provable Learning of Noisy-or Networks
Many machine learning applications use latent variable models to explain...

Mapping Between fMRI Responses to Movies and their Natural Language Annotations
Several research groups have shown how to correlate fMRI responses to th...

Provable Algorithms for Inference in Topic Models
Recently, there has been considerable progress on designing algorithms w...

Linear Algebraic Structure of Word Senses, with Applications to Polysemy
Word embeddings are ubiquitous in NLP and information retrieval, but it'...

Simple, Efficient, and Neural Algorithms for Sparse Coding
Sparse coding is a basic task in many fields including signal processing...

RANDWALK: A Latent Variable Model Approach to Word Embeddings
Semantic word embeddings represent the meaning of a word via a vector, a...

More Algorithms for Provable Dictionary Learning
In dictionary learning, also known as sparse coding, the algorithm is gi...

New Algorithms for Learning Incoherent and Overcomplete Dictionaries
In sparse recovery we are given a matrix A (the dictionary) and a vector...

A Practical Algorithm for Topic Modeling with Provable Guarantees
Topic models provide a useful method for dimensionality reduction and ex...