We explore the impact of parameter sparsity on the scaling behavior of T...
Obtaining versions of deep neural networks that are both highly-accurate...
We initiate the study of repeated game dynamics in the population model,...
We present ongoing work on a new automatic code generation approach for ...
We study a distributed multi-armed bandit setting among a population of ...
Leveraging second-order information at the scale of deep networks is one...
Recent advances in large language model (LLM) pretraining have led to hi...
Knowledge distillation is a popular approach for enhancing the performan...
Pruning - that is, setting a significant subset of the parameters of a n...
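As a point of reference for the operation described above, the following minimal Python sketch performs one-shot magnitude pruning, zeroing out the smallest-magnitude fraction of a weight array; the magnitude-based selection rule is an illustrative assumption, not necessarily the criterion studied in the paper.

import numpy as np

def magnitude_prune(weights: np.ndarray, sparsity: float) -> np.ndarray:
    # Illustrative one-shot magnitude pruning: zero out the fraction
    # `sparsity` of entries with the smallest absolute value. The
    # selection rule is an assumption for illustration only.
    k = int(sparsity * weights.size)
    if k == 0:
        return weights.copy()
    threshold = np.sort(np.abs(weights), axis=None)[k - 1]
    pruned = weights.copy()
    pruned[np.abs(pruned) <= threshold] = 0.0
    return pruned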
Determining the degree of inherent parallelism in classical sequential a...
Recent vision architectures and self-supervised training methods enable ...
We provide a new efficient version of the backpropagation algorithm, spe...
The breakthrough performance of large language models (LLMs) comes with ...
Communication-reduction techniques are a popular way to improve scalabil...
We show for the first time that large-scale generative pretrained transf...
To maximize the performance of concurrent data structures, researchers h...
Asynchronous programming has gained significant popularity over the last...
Data-parallel distributed training of deep neural networks (DNN) has gai...
Generative Pre-trained Transformer (GPT) models set themselves apart thr...
Models from the Vision Transformer (ViT) family have recently provided b...
Distributed optimization has become one of the standard ways of speeding...
We revisit the performance of the classic gradual magnitude pruning (GMP...
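For readers unfamiliar with GMP, a commonly used variant ramps sparsity up over training along a cubic schedule while periodically re-pruning by magnitude; the sketch below computes that schedule and is a generic illustration, not necessarily the exact recipe revisited in the paper.

def gmp_sparsity(step: int, start: int, end: int, final_sparsity: float,
                 initial_sparsity: float = 0.0) -> float:
    # Cubic sparsity ramp often used with gradual magnitude pruning:
    # sparsity grows from `initial_sparsity` at `start` to `final_sparsity`
    # at `end`; pruning is applied periodically in between.
    if step <= start:
        return initial_sparsity
    if step >= end:
        return final_sparsity
    progress = (step - start) / (end - start)
    return final_sparsity + (initial_sparsity - final_sparsity) * (1.0 - progress) ** 3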
We consider the problem of model compression for deep neural networks (D...
We examine the question of whether SGD-based optimization of deep neural...
Federated Learning (FL) is an emerging paradigm to enable the large-scal...
In the stochastic population protocol model, we are given a connected gr...
Pre-trained Transformer-based language models have become a key building...
Powered by the simplicity of lock-free asynchrony, Hogwild! is a go-to ...
The recent focus on the efficiency of deep neural networks (DNNs) has le...
In this note, we introduce a distributed twist on the classic coupon col...
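The classical single-collector baseline behind this note is easy to simulate; the Python sketch below estimates the expected number of uniform draws needed to see all n coupon types (about n times the n-th harmonic number), and only recalls the standard problem rather than the distributed variant introduced in the note.

import random

def coupon_collector_draws(n: int) -> int:
    # Draw coupons uniformly at random until every one of the n types
    # has been seen; return the number of draws (expectation is n * H_n).
    seen = set()
    draws = 0
    while len(seen) < n:
        seen.add(random.randrange(n))
        draws += 1
    return draws

if __name__ == "__main__":
    n, trials = 100, 1000
    avg = sum(coupon_collector_draws(n) for _ in range(trials)) / trials
    print(f"average draws for n={n}: {avg:.1f}")  # roughly n * H_n, about 518.7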
Transfer learning is a classic paradigm by which models pretrained on la...
Writing concurrent code that is both correct and efficient is notoriousl...
The ability to scale out training workloads has been one of the key perf...
We study efficient distributed algorithms for the fundamental problem of...
Designing and implementing efficient parallel priority schedulers is an ...
This paper gives tight logarithmic lower bounds on the solo step complex...
The availability of large amounts of user-provided data has been key to ...
Efficiently approximating local curvature information of the loss functi...
The increasing computational requirements of deep neural networks (DNNs)...
Dynamic Connectivity is a fundamental algorithmic graph problem, motivat...
As the size and complexity of models and datasets grow, so does the need...
Approximate agreement is one of the few variants of consensus that can b...
Let G be a graph on n nodes. In the stochastic population protocol model...
We investigate fast and communication-efficient algorithms for the class...
The growing energy and performance costs of deep learning have driven th...
We study adversary-resilient stochastic distributed optimization, in whi...
Many communication-efficient variants of SGD use gradient quantization s...
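As background for this abstract, a typical building block of such schemes is an unbiased stochastic quantizer that rounds each coordinate to a small number of levels; the sketch below shows one generic such quantizer, with the level count and scaling chosen purely for illustration rather than taken from the paper.

import numpy as np

def stochastic_quantize(g: np.ndarray, levels: int = 4, rng=None) -> np.ndarray:
    # Unbiased stochastic quantization of a gradient vector onto `levels`
    # uniform levels per coordinate, scaled by the max magnitude.
    rng = rng if rng is not None else np.random.default_rng()
    scale = float(np.max(np.abs(g)))
    if scale == 0.0:
        return np.zeros_like(g)
    x = np.abs(g) / scale * levels          # position within [0, levels]
    lower = np.floor(x)
    prob_up = x - lower                     # round up with this probability
    q = lower + (rng.random(g.shape) < prob_up)
    return np.sign(g) * q * scale / levels  # expectation equals the input g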
Motivated by the interest in communication-efficient methods for distrib...
The design and implementation of efficient concurrent data structures ha...
Transactions can simplify distributed applications by hiding data distri...