
An Exponential Improvement on the Memorization Capacity of Deep Threshold Networks
It is well known that modern deep neural networks are powerful enough to...

Pufferfish: Communication-efficient Models At No Extra Cost
To mitigate communication overheads in distributed model training, sever...

On the Utility of Gradient Compression in Distributed Training Systems
Rapid growth in data sets and the scale of neural network architectures ...

Permutation-Based SGD: Is Random Optimal?
A recent line of groundbreaking results for permutation-based SGD has c...

Accordion: Adaptive Gradient Communication via Critical Learning Regime Identification
Distributed model training suffers from communication bottlenecks due to...

Attack of the Tails: Yes, You Really Can Backdoor Federated Learning
Due to its decentralized nature, Federated Learning (FL) lends itself to...

Optimal Lottery Tickets via Subset-Sum: Logarithmic Over-Parameterization is Sufficient
The strong lottery ticket hypothesis (LTH) postulates that one can appro...

Closing the convergence gap of SGD without replacement
Stochastic gradient descent without replacement sampling is widely used ...

Federated Learning with Matched Averaging
Federated learning allows edge devices to collaboratively learn a shared...

DETOX: A Redundancy-based Framework for Faster and More Robust Gradient Aggregation
To improve the resilience of distributed training to worst-case, or Byza...

Bad Global Minima Exist and SGD Can Reach Them
Several recent works have aimed to explain why severely overparameterize...

Convergence and Margin of Adversarial Training on Separable Data
Adversarial training is a technique for training robust machine learning...

Does Data Augmentation Lead to Positive Margin?
Data augmentation (DA) is commonly used during model training, as it sig...

SysML: The New Frontier of Machine Learning Systems
Machine learning (ML) techniques are enjoying rapidly increasing adoptio...

ErasureHead: Distributed Gradient Descent without Delays Using Approximate Gradient Coding
We present ErasureHead, a new approach for distributed gradient descent ...

A Geometric Perspective on the Transferability of Adversarial Directions
State-of-the-art machine learning models frequently misclassify inputs t...

ATOMO: Communication-efficient Learning via Atomic Sparsification
Distributed model training suffers from communication overheads due to f...

The Effect of Network Width on the Performance of Large-batch Training
Distributed implementations of mini-batch stochastic gradient descent (S...

Gradient Coding via the Stochastic Block Model
Gradient descent and its many variants, including mini-batch stochastic ...

DRACO: Robust Distributed Training via Redundant Gradients
Distributed model training is vulnerable to worst-case system failures a...

Approximate Gradient Coding via Sparse Random Graphs
Distributed algorithms are often beset by the straggler effect, where th...

Stability and Generalization of Learning Algorithms that Converge to Global Optima
We establish novel generalization bounds for learning algorithms that co...

CYCLADES: Conflict-free Asynchronous Machine Learning
We present CYCLADES, a general framework for parallelizing stochastic op...

Bipartite Correlation Clustering – Maximizing Agreements
In Bipartite Correlation Clustering (BCC) we are given a complete bipart...

Sparse PCA via Bipartite Matchings
We consider the following multi-component sparse PCA problem: given a se...

Provable Deterministic Leverage Score Sampling
We explain theoretically a curious empirical phenomenon: "Approximating ...
Dimitris Papailiopoulos