
- Critical Parameters for Scalable Distributed Learning with Large Batches and Asynchronous Updates
  It has been experimentally observed that the efficiency of distributed t...
- Consensus Control for Decentralized Deep Learning
  Decentralized training of deep learning models enables on-device learnin...
- Quasi-Global Momentum: Accelerating Decentralized Deep Learning on Heterogeneous Data
  Decentralized training of deep learning models is a key element for enab...
- Exact Optimization of Conformal Predictors via Incremental and Decremental Learning
  Conformal Predictors (CP) are wrappers around ML methods, providing erro...
- Learning from History for Byzantine Robust Optimization
  Byzantine robustness has received significant attention recently given i...
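  As a rough illustration of the robust aggregation this line of work studies, here is a minimal NumPy sketch of centered clipping, where worker updates are clipped around the current estimate before averaging; the function name and the radius tau are illustrative choices, not the paper's reference implementation.

    import numpy as np

    def centered_clip(updates, v, tau=10.0, iters=3):
        """Aggregate worker updates by clipping each one around the
        current estimate v, then averaging the clipped differences."""
        for _ in range(iters):
            deltas = []
            for m in updates:
                d = m - v
                scale = min(1.0, tau / max(np.linalg.norm(d), 1e-12))
                deltas.append(d * scale)     # clip to radius tau around v
            v = v + np.mean(deltas, axis=0)  # move toward the clipped mean
        return v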
- A Linearly Convergent Algorithm for Decentralized Optimization: Sending Less Bits for Free!
  Decentralized optimization methods enable on-device training of machine ...
- Sparse Communication for Training Deep Networks
  Synchronous stochastic gradient descent (SGD) is the most common method ...
- Mime: Mimicking Centralized Stochastic Algorithms in Federated Learning
  Federated learning is a challenging optimization problem due to the hete...
- PowerGossip: Practical Low-Rank Communication Compression in Decentralized Deep Learning
  Lossy gradient compression has become a practical tool to overcome the c...
- Multi-Head Attention: Collaborate Instead of Concatenate
  Attention layers are widely used in natural language processing (NLP) an...
- Taming GANs with Lookahead
  Generative Adversarial Networks are notoriously challenging to train. Th...
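  A minimal sketch of the Lookahead rule that the paper builds on, shown for a single set of weights (for GANs it is applied to both players); the fast_step callback is an illustrative stand-in for any inner optimizer.

    import numpy as np

    def lookahead(slow, fast_step, k=5, alpha=0.5):
        """One Lookahead round: take k fast optimizer steps starting
        from the slow weights, then interpolate toward the result."""
        fast = slow.copy()
        for _ in range(k):
            fast = fast_step(fast)           # e.g. one SGD or Adam step
        return slow + alpha * (fast - slow)  # slow-weight update

    # toy usage: fast steps are plain gradient descent on f(x) = ||x||^2
    x = np.ones(4)
    for _ in range(20):
        x = lookahead(x, lambda w: w - 0.1 * 2 * w)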
- Byzantine-Robust Learning on Heterogeneous Datasets via Resampling
  In Byzantine robust distributed optimization, a central server wants to ...
- Dynamic Model Pruning with Feedback
  Deep neural networks often have millions of parameters. This can hinder ...
- Ensemble Distillation for Robust Model Fusion in Federated Learning
  Federated Learning (FL) is a machine learning setting where many devices...
- Extrapolation for Large-batch Training in Deep Learning
  Deep learning networks are typically trained by Stochastic Gradient Desc...
- Secure Byzantine-Robust Machine Learning
  Increasingly machine learning systems are being deployed to edge servers...
- Masking as an Efficient Alternative to Finetuning for Pretrained Language Models
  We present an efficient method of utilizing pretrained language models, ...
- Data Parallelism in Training Sparse Neural Networks
  Network pruning is an effective methodology to compress large neural net...
- A Unified Theory of Decentralized SGD with Changing Topology and Local Updates
  Decentralized stochastic optimization methods have gained a lot of atten...
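  To make the setting concrete, here is a minimal NumPy sketch of one decentralized SGD step under an illustrative doubly stochastic mixing matrix W; the names and step size are assumptions, not the paper's notation.

    import numpy as np

    def decentralized_sgd_step(X, W, grad_fn, lr=0.1):
        """Each node (one row of X) takes a local stochastic gradient
        step, then gossip-averages with its neighbours via W."""
        G = np.stack([grad_fn(i, X[i]) for i in range(len(X))])
        return W @ (X - lr * G)   # W: doubly stochastic mixing matrix

    # toy usage: 3 nodes on a small graph minimising f(x) = ||x||^2
    W = np.array([[.5, .25, .25], [.25, .5, .25], [.25, .25, .5]])
    X = np.random.randn(3, 4)
    for _ in range(100):
        X = decentralized_sgd_step(X, W, lambda i, x: 2 * x)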
- Robust Cross-lingual Embeddings from Parallel Sentences
  Recent advances in cross-lingual word embeddings have primarily relied o...
- Advances and Open Problems in Federated Learning
  Federated learning (FL) is a machine learning setting where many clients...
- On the Relationship between Self-Attention and Convolutional Layers
  Recent trends of incorporating attention mechanisms in vision have led r...
- On the Tunability of Optimizers in Deep Learning
  There is no consensus yet on the question whether adaptive gradient meth...
- Model Fusion via Optimal Transport
  Combining different models is a widely used paradigm in machine learning...
- Decentralized Deep Learning with Arbitrary Communication Compression
  Decentralized training of deep learning models is a key element for enab...
- Correlating Twitter Language with Community-Level Health Outcomes
  We study how language on social media is linked to diseases such as athe...
- PowerSGD: Practical Low-Rank Gradient Compression for Distributed Optimization
  We study gradient compression methods to alleviate the communication bot...
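  A minimal single-worker sketch of the PowerSGD compression step: one power iteration yields a rank-r factorisation of the gradient matrix, warm-started from the previous right factor. The full method additionally all-reduces P and Q across workers and applies error feedback.

    import numpy as np

    def powersgd_step(M, Q):
        """One compression step for gradient matrix M (n x m) given the
        previous right factor Q (m x r); M is approximated by P @ Q.T."""
        P = M @ Q                # left factor, shape n x r
        P, _ = np.linalg.qr(P)   # orthogonalise (cheap for small r)
        Q = M.T @ P              # right factor, warm start for next step
        return P, Q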
- On Linear Learning with Manycore Processors
  A new generation of manycore processors is on the rise that offers dozen...
- Better Word Embeddings by Disentangling Contextual n-Gram Information
  Pre-trained word vectors are ubiquitous in Natural Language Processing a...
- Crosslingual Document Embedding as Reduced-Rank Ridge Regression
  There has recently been much interest in extending vector-based word rep...
- SysML: The New Frontier of Machine Learning Systems
  Machine learning (ML) techniques are enjoying rapidly increasing adoptio...
- Structure Tree-LSTM: Structure-aware Attentional Document Encoders
  We propose a method to create document representations that reflect thei...
- Forecasting intracranial hypertension using multi-scale waveform metrics
  Objective: Intracranial hypertension is an important risk factor of seco...
- Overcoming Multi-Model Forgetting
  We identify a phenomenon, which we refer to as multi-model forgetting, t...
- Evaluating the Search Phase of Neural Architecture Search
  Neural Architecture Search (NAS) aims to facilitate the design of deep n...
- Decentralized Stochastic Optimization and Gossip Algorithms with Compressed Communication
  We consider decentralized stochastic optimization with the objective fun...
- Unsupervised Scalable Representation Learning for Multivariate Time Series
  Time series constitute a challenging data type for machine learning algo...
- Error Feedback Fixes SignSGD and other Gradient Compression Schemes
  Sign-based algorithms (e.g. signSGD) have been proposed as a biased grad...
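  The fix, in a nutshell, is error feedback: keep the part of the update that compression discarded and add it back at the next step. A minimal NumPy sketch with a scaled sign compressor (names are illustrative):

    import numpy as np

    def ef_sign_step(x, grad, memory, lr=0.1):
        """One EF-SGD step: correct the gradient with the stored
        residual, compress with a scaled sign, keep the new residual."""
        p = lr * grad + memory
        delta = np.sign(p) * np.abs(p).mean()  # sign(p) * ||p||_1 / d
        return x - delta, p - delta            # new iterate, new memory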
- Efficient Greedy Coordinate Descent for Composite Problems
  Coordinate descent with random coordinate selection is the current state...
- Sparsified SGD with Memory
  Huge scale machine learning problems are nowadays tackled by distributed...
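  A minimal sketch of the same memory idea with a top-k sparsifier instead of a sign compressor: only the k largest-magnitude coordinates are transmitted, and the dropped mass is carried over to later steps.

    import numpy as np

    def topk_sgd_step(x, grad, memory, lr=0.1, k=10):
        """One sparsified SGD step with memory; assumes len(x) >= k."""
        p = lr * grad + memory
        delta = np.zeros_like(p)
        idx = np.argpartition(np.abs(p), -k)[-k:]  # top-k coordinates
        delta[idx] = p[idx]                        # sparse update to send
        return x - delta, p - delta                # iterate, new memory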
- Wasserstein is all you need
  We propose a unified framework for building unsupervised representations...
- Don't Use Large Mini-Batches, Use Local SGD
  Mini-batch stochastic gradient methods are the current state of the art ...
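  A minimal NumPy sketch of one Local SGD round under assumed names: each worker runs several SGD steps on its own data shard, and models are averaged only at synchronisation points, trading communication for local computation.

    import numpy as np

    def local_sgd_round(models, grad_fn, lr=0.1, local_steps=8):
        """One round: local SGD steps per worker, then model averaging."""
        for w in range(len(models)):
            x = models[w]
            for _ in range(local_steps):
                x = x - lr * grad_fn(w, x)  # worker-local gradient
            models[w] = x
        avg = np.mean(models, axis=0)       # infrequent synchronisation
        return [avg.copy() for _ in models]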
- COLA: Communication-Efficient Decentralized Linear Learning
  Decentralized machine learning is a promising emerging paradigm in view ...
- A Distributed Second-Order Algorithm You Can Trust
  Due to the rapid growth of data and computational resources, distributed...
- Global linear convergence of Newton's method without strong-convexity or Lipschitz gradients
  We show that Newton's method converges globally at a linear rate for obj...
- Training DNNs with Hybrid Block Floating Point
  The wide adoption of DNNs has given birth to unrelenting computing requi...
- End-to-End DNN Training with Block Floating Point Arithmetic
  DNNs are ubiquitous datacenter workloads, requiring orders of magnitude ...
- Revisiting First-Order Convex Optimization Over Linear Spaces
  Two popular examples of first-order optimization methods over linear spa...
- EmbedRank: Unsupervised Keyphrase Extraction using Sentence Embeddings
  Keyphrase extraction is the task of automatically selecting a small set ...
- Efficient Use of Limited-Memory Accelerators for Linear Learning on Heterogeneous Systems
  We propose a generic algorithmic building block to accelerate training o...