
NeurIPS 2020 Competition: Predicting Generalization in Deep Learning
Understanding generalization in deep learning is arguably one of the mos...

When Do Curricula Work?
Inspired by human learning, researchers have proposed ordering examples ...

Understanding the Failure Modes of Out-of-Distribution Generalization
Empirical studies suggest that machine learning models often rely on fea...

Are wider nets better given the same number of parameters?
Empirical studies demonstrate that the performance of neural networks im...

The Deep Bootstrap: Good Online Learners are Good Offline Generalizers
We propose a new framework for reasoning about generalization in deep le...

Sharpness-Aware Minimization for Efficiently Improving Generalization
In today's heavily overparameterized models, the value of the training l...

Extreme Memorization via Scale of Initialization
We construct an experimental setup in which changing the scale of initia...

What is being transferred in transfer learning?
One desired capability for machines is the ability to transfer their kno...

Towards Learning Convolutions from Scratch
Convolution is one of the most essential components of architectures use...

Observational Overfitting in Reinforcement Learning
A major component of overfitting in model-free reinforcement learning (R...

Fantastic Generalization Measures and Where to Find Them
Generalization of deep networks has been of great interest in recent yea...

The intriguing role of module criticality in the generalization of deep networks
We study the phenomenon that some modules of deep neural networks (DNNs)...

Towards Understanding the Role of Over-Parametrization in Generalization of Neural Networks
Despite existing work on ensuring generalization of neural networks in t...

Stronger generalization bounds for deep nets via a compression approach
Deep nets generalize well despite having more parameters than the number...

Implicit Regularization in Matrix Factorization
We study implicit regularization when optimizing an underdetermined quad...

Stabilizing GAN Training with Multiple Random Projections
Training generative adversarial networks is unstable in high dimensions ...

Corralling a Band of Bandit Algorithms
We study the problem of combining multiple bandit algorithms (that is, o...

Global Optimality of Local Search for Low Rank Matrix Recovery
We show that there are no spurious local minima in the non-convex factor...

Path-Normalized Optimization of Recurrent Neural Networks with ReLU Activations
We investigate the parameter-space geometry of recurrent neural networks...

Path-SGD: Path-Normalized Optimization in Deep Neural Networks
We revisit the choice of SGD for training deep neural networks by recons...

Norm-Based Capacity Control in Neural Networks
We investigate the capacity, convexity and characterization of a general...

In Search of the Real Inductive Bias: On the Role of Implicit Regularization in Deep Learning
We present experiments demonstrating that some other form of capacity co...

On Symmetric and Asymmetric LSHs for Inner Product Search
We consider the problem of designing locality sensitive hashes (LSH) for...

The Power of Asymmetry in Binary Hashing
When approximating binary similarity using the Hamming distance between ...

Sparse Matrix Factorization
We investigate the problem of factorizing a matrix into several sparse m...
Behnam Neyshabur
Research Scholar at the Institute for Advanced Study since 2017; PhD in Computer Science at the Toyota Technological Institute at Chicago (TTIC), 2011-2017; Research Intern at Microsoft NYC, 2016; Research Intern at Microsoft Silicon Valley, 2013.