
Depth Separations in Neural Networks: What is Actually Being Separated?
Existing depth separation results for constant-depth networks essentiall...

Proving the Lottery Ticket Hypothesis: Pruning is All You Need
The lottery ticket hypothesis (Frankle and Carbin, 2018) states that a ...

Is Local SGD Better than Minibatch SGD?
We study local SGD (also known as parallel SGD and federated averaging),...
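The local-SGD-versus-minibatch-SGD comparison named in the title above can be illustrated with a minimal sketch (the function names and the toy quadratic objective are mine, not from the paper): each local-SGD worker takes several gradient steps on its own before the iterates are averaged, while the minibatch-SGD baseline averages the same number of stochastic gradients at a single point per communication round.

```python
import numpy as np

def local_sgd(grad_fn, w0, workers, rounds, local_steps, lr):
    """Local SGD sketch: each of `workers` machines runs `local_steps`
    SGD steps on its own stochastic gradients, then all iterates are
    averaged; repeat for `rounds` communication rounds."""
    w = w0.copy()
    for _ in range(rounds):
        local = []
        for _ in range(workers):
            wm = w.copy()
            for _ in range(local_steps):
                wm = wm - lr * grad_fn(wm)
            local.append(wm)
        w = np.mean(local, axis=0)  # communicate: average the iterates
    return w

def minibatch_sgd(grad_fn, w0, workers, rounds, local_steps, lr):
    """Minibatch-SGD baseline with the same communication budget:
    each round, average workers*local_steps stochastic gradients
    at the current iterate and take a single step."""
    w = w0.copy()
    for _ in range(rounds):
        g = np.mean([grad_fn(w) for _ in range(workers * local_steps)], axis=0)
        w = w - lr * g
    return w

# Toy problem: noisy gradients of f(w) = 0.5 * ||w||^2 (minimum at 0).
rng = np.random.default_rng(0)
noisy_grad = lambda w: w + 0.1 * rng.standard_normal(w.shape)
w0 = np.ones(5)
w_local = local_sgd(noisy_grad, w0, workers=4, rounds=20, local_steps=5, lr=0.1)
w_mb = minibatch_sgd(noisy_grad, w0, workers=4, rounds=20, local_steps=5, lr=0.1)
```

Both methods use the same number of communication rounds; local SGD simply performs many more model updates per round.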

How Good is SGD with Random Shuffling?
We study the performance of stochastic gradient descent (SGD) on smooth ...

Bandit Regret Scaling with the Effective Loss Range
We study how the regret guarantees of nonstochastic multi-armed bandits ...

Failures of Gradient-Based Deep Learning
In recent years, Deep Learning has become the go-to solution for a broad...

Oracle Complexity of Second-Order Methods for Finite-Sum Problems
Finite-sum optimization problems are ubiquitous in machine learning, and...

Depth-Width Tradeoffs in Approximating Natural Functions with Neural Networks
We provide several new depth-based separation results for feedforward n...

Distribution-Specific Hardness of Learning Neural Networks
Although neural networks are routinely and successfully trained in pract...

Spurious Local Minima are Common in Two-Layer ReLU Neural Networks
We consider the optimization problem associated with training simple ReL...

Without-Replacement Sampling for Stochastic Gradient Methods: Convergence Results and Application to Distributed Optimization
Stochastic gradient methods for machine learning and optimization proble...
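Without-replacement (random-reshuffling) sampling, named in the title above, can be sketched on a toy least-squares problem of my own choosing: each epoch visits every example exactly once in a freshly drawn random order, instead of sampling examples i.i.d. with replacement.

```python
import numpy as np

def sgd_epoch_without_replacement(w, X, y, lr, rng):
    """One epoch of SGD under random reshuffling: visit every example
    exactly once, in a freshly drawn random order."""
    for i in rng.permutation(len(y)):
        grad = (X[i] @ w - y[i]) * X[i]  # single-example least-squares gradient
        w = w - lr * grad
    return w

# Noiseless toy least-squares problem with a known solution w_star.
rng = np.random.default_rng(1)
X = rng.standard_normal((50, 3))
w_star = np.array([1.0, -2.0, 0.5])
y = X @ w_star
w = np.zeros(3)
for _ in range(30):  # 30 reshuffled epochs
    w = sgd_epoch_without_replacement(w, X, y, lr=0.05, rng=rng)
```

On this noiseless problem the iterate converges to `w_star`; the reshuffled order is redrawn each epoch, which is the key difference from with-replacement sampling.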

The Power of Depth for Feedforward Neural Networks
We show that there is a simple (approximately radial) function on ℝ^d, ex...

Multi-Player Bandits - a Musical Chairs Approach
We consider a variant of the stochastic multi-armed bandit problem, wher...
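The musical-chairs idea named in the title above can be caricatured as follows (this sketch is mine and omits the paper's reward-estimation phase): players pick arms uniformly at random, and a player who pulls an arm without colliding "sits" on that arm permanently, so collisions die out over time.

```python
import random

def musical_chairs(n_players, n_arms, rounds, seed=0):
    """Musical-chairs sketch: each unseated player samples an arm
    uniformly at random; after a collision-free pull, the player
    "sits" on that arm and never moves again."""
    rng = random.Random(seed)
    seated = {}  # player -> claimed arm
    for _ in range(rounds):
        pulls = {}
        for p in range(n_players):
            arm = seated.get(p, rng.randrange(n_arms))
            pulls.setdefault(arm, []).append(p)
        for arm, players in pulls.items():
            if len(players) == 1 and players[0] not in seated:
                seated[players[0]] = arm  # no collision: claim the arm
    return seated

seats = musical_chairs(n_players=3, n_arms=5, rounds=200)
```

Since a seated player keeps pulling its own arm, any arm an unseated player manages to pull alone is necessarily unclaimed, so all claimed arms end up distinct.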

On the Quality of the Initial Basin in Overspecified Neural Networks
Deep learning, in the form of artificial neural networks, has achieved r...

Convergence of Stochastic Gradient Descent for PCA
We consider the problem of principal component analysis (PCA) in a strea...

Fast Stochastic Algorithms for SVD and PCA: Convergence Properties and Convexity
We study the convergence properties of the VR-PCA algorithm introduced b...

An Optimal Algorithm for Bandit and Zero-Order Convex Optimization with Two-Point Feedback
We consider the closely related problems of bandit convex optimization w...

Communication Complexity of Distributed Convex Learning and Optimization
We study the fundamental limits to communication-efficient distributed m...

On the Complexity of Learning with Kernels
A well-recognized limitation of kernel learning is the requirement to ha...

Attribute Efficient Linear Regression with Data-Dependent Sampling
In this paper we analyze a budgeted learning setting, in which the learn...

On the Computational Efficiency of Training Neural Networks
It is well-known that neural networks are computationally hard to train....

Nonstochastic Multi-Armed Bandits with Graph-Structured Feedback
We present and study a partial-information model of online learning, whe...

Communication Efficient Distributed Optimization using an Approximate Newton-type Method
We present a novel Newton-type method for distributed optimization, whic...

Fundamental Limits of Online and Distributed Algorithms for Statistical Learning and Estimation
Many machine learning approaches are characterized by information constr...

An Algorithm for Training Polynomial Networks
We consider deep neural networks, in which the output of each node is a ...

Online Learning with Switching Costs and Other Adaptive Adversaries
We study the power of different types of adaptive (non-oblivious) adversa...

Stochastic Gradient Descent for Nonsmooth Optimization: Convergence Results and Optimal Averaging Schemes
Stochastic Gradient Descent (SGD) is one of the simplest and most popula...
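One averaging scheme studied in this line of work is α-suffix averaging: instead of returning the final SGD iterate on a nonsmooth objective, return the mean of the last αT iterates. A minimal sketch on a toy nonsmooth objective (the objective, step sizes, and constants here are illustrative choices of mine, not taken from the paper):

```python
import numpy as np

def sgd_suffix_average(subgrad, w0, T, alpha=0.5):
    """SGD with step size 1/sqrt(t) on a nonsmooth objective,
    returning the average of the last alpha*T iterates
    ("alpha-suffix averaging") instead of the final iterate."""
    w = w0
    iterates = []
    for t in range(1, T + 1):
        w = w - (1.0 / np.sqrt(t)) * subgrad(w)
        iterates.append(w)
    k = int(alpha * T)
    return np.mean(iterates[-k:], axis=0)

# Nonsmooth toy objective f(w) = ||w||_1, with subgradient sign(w);
# the minimizer is w = 0, around which the last iterate oscillates.
w_hat = sgd_suffix_average(np.sign, np.full(4, 5.0), T=2000)
```

The suffix average smooths out the oscillation of the late iterates around the minimizer, which is the intuition behind preferring it to the last iterate in the nonsmooth setting.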

On the Complexity of Bandit and Derivative-Free Stochastic Convex Optimization
The problem of stochastic convex optimization with bandit feedback (in t...

Relax and Localize: From Value to Algorithms
We show a principled way of deriving online learning algorithms from a m...

Learning with the Weighted Trace-norm under Arbitrary Sampling Distributions
We provide rigorous guarantees on learning with the weighted trace-norm ...

From Bandits to Experts: On the Value of Side-Observations
We consider an adversarial online learning setting where a decision make...

Efficient Transductive Online Learning via Randomized Rounding
Most traditional online learning algorithms are based on variants of mir...

Graph Approximation and Clustering on a Budget
We consider the problem of learning from a similarity matrix (such as sp...

Large-Scale Convex Minimization with a Low-Rank Constraint
We address the problem of minimizing a convex function over the space of...

Using More Data to Speed-up Training Time
In many recent applications, data is plentiful. By now, we have a rather...

Efficient Learning of Generalized Linear and Single Index Models with Isotonic Regression
Generalized Linear Models (GLMs) and Single Index Models (SIMs) provide ...

Learning Exponential Families in High-Dimensions: Strong Convexity and Sparsity
The versatility of exponential families, along with their attendant conv...

Size-Independent Sample Complexity of Neural Networks
We study the sample complexity of learning neural networks, by providing...

Detecting Correlations with Little Memory and Communication
We study the problem of identifying correlations in multivariate data, u...

Are ResNets Provably Better than Linear Predictors?
A residual network (or ResNet) is a standard deep neural net architectur...

A Tight Convergence Analysis for Stochastic Gradient Descent with Delayed Updates
We provide tight finite-time convergence bounds for gradient descent and...

Exponential Convergence Time of Gradient Descent for One-Dimensional Deep Linear Neural Networks
In this note, we study the dynamics of gradient descent on objective fun...

The Complexity of Making the Gradient Small in Stochastic Convex Optimization
We give nearly matching upper and lower bounds on the oracle complexity ...

Space lower bounds for linear prediction
We show that fundamental learning tasks, such as finding an approximate ...

Global Nonconvex Optimization with Discretized Diffusions
An Euler discretization of the Langevin diffusion is known to converge t...

On the Power and Limitations of Random Features for Understanding Neural Networks
Recently, a spate of papers have provided positive theoretical results f...

The Complexity of Finding Stationary Points with Stochastic Gradient Descent
We study the iteration complexity of stochastic gradient descent (SGD) f...

Learning a Single Neuron with Gradient Methods
We consider the fundamental problem of learning a single neuron x ↦ σ(w^⊤x...

Can We Find Near-Approximately-Stationary Points of Nonsmooth Nonconvex Functions?
It is well-known that given a bounded, smooth nonconvex function, standa...
Ohad Shamir
Faculty member, Department of Computer Science and Applied Mathematics, Weizmann Institute of Science