
Simultaneous Training of Partially Masked Neural Networks
For deploying deep learning models to lower-end devices, it is necessary...

Obtaining Better Static Word Embeddings Using Contextual Embedding Models
The advent of contextual word embeddings – representations of words whic...

Lightweight Cross-Lingual Sentence Representation Learning
Large-scale models for learning fixed-dimensional cross-lingual sentence...

Federated Learning for Malware Detection in IoT Devices
This work investigates the possibilities enabled by federated learning c...

Critical Parameters for Scalable Distributed Learning with Large Batches and Asynchronous Updates
It has been experimentally observed that the efficiency of distributed t...

Consensus Control for Decentralized Deep Learning
Decentralized training of deep learning models enables on-device learnin...

Quasi-Global Momentum: Accelerating Decentralized Deep Learning on Heterogeneous Data
Decentralized training of deep learning models is a key element for enab...

Exact Optimization of Conformal Predictors via Incremental and Decremental Learning
Conformal Predictors (CP) are wrappers around ML methods, providing erro...

Learning from History for Byzantine Robust Optimization
Byzantine robustness has received significant attention recently given i...

A Linearly Convergent Algorithm for Decentralized Optimization: Sending Less Bits for Free!
Decentralized optimization methods enable on-device training of machine ...

Sparse Communication for Training Deep Networks
Synchronous stochastic gradient descent (SGD) is the most common method ...

Mime: Mimicking Centralized Stochastic Algorithms in Federated Learning
Federated learning is a challenging optimization problem due to the hete...

PowerGossip: Practical Low-Rank Communication Compression in Decentralized Deep Learning
Lossy gradient compression has become a practical tool to overcome the c...

Multi-Head Attention: Collaborate Instead of Concatenate
Attention layers are widely used in natural language processing (NLP) an...

Taming GANs with Lookahead
Generative Adversarial Networks are notoriously challenging to train. Th...

Byzantine-Robust Learning on Heterogeneous Datasets via Resampling
In Byzantine robust distributed optimization, a central server wants to ...

Dynamic Model Pruning with Feedback
Deep neural networks often have millions of parameters. This can hinder ...

Ensemble Distillation for Robust Model Fusion in Federated Learning
Federated Learning (FL) is a machine learning setting where many devices...

Extrapolation for Large-batch Training in Deep Learning
Deep learning networks are typically trained by Stochastic Gradient Desc...

Secure Byzantine-Robust Machine Learning
Increasingly machine learning systems are being deployed to edge servers...

Masking as an Efficient Alternative to Finetuning for Pretrained Language Models
We present an efficient method of utilizing pretrained language models, ...

Data Parallelism in Training Sparse Neural Networks
Network pruning is an effective methodology to compress large neural net...

A Unified Theory of Decentralized SGD with Changing Topology and Local Updates
Decentralized stochastic optimization methods have gained a lot of atten...

Robust Cross-lingual Embeddings from Parallel Sentences
Recent advances in cross-lingual word embeddings have primarily relied o...

Advances and Open Problems in Federated Learning
Federated learning (FL) is a machine learning setting where many clients...

On the Relationship between Self-Attention and Convolutional Layers
Recent trends of incorporating attention mechanisms in vision have led r...

On the Tunability of Optimizers in Deep Learning
There is no consensus yet on the question whether adaptive gradient meth...

Model Fusion via Optimal Transport
Combining different models is a widely used paradigm in machine learning...

Decentralized Deep Learning with Arbitrary Communication Compression
Decentralized training of deep learning models is a key element for enab...

Correlating Twitter Language with Community-Level Health Outcomes
We study how language on social media is linked to diseases such as athe...

PowerSGD: Practical Low-Rank Gradient Compression for Distributed Optimization
We study gradient compression methods to alleviate the communication bot...

On Linear Learning with Manycore Processors
A new generation of manycore processors is on the rise that offers dozen...

Better Word Embeddings by Disentangling Contextual n-Gram Information
Pretrained word vectors are ubiquitous in Natural Language Processing a...

Cross-lingual Document Embedding as Reduced-Rank Ridge Regression
There has recently been much interest in extending vector-based word rep...

SysML: The New Frontier of Machine Learning Systems
Machine learning (ML) techniques are enjoying rapidly increasing adoptio...

Structure Tree-LSTM: Structure-aware Attentional Document Encoders
We propose a method to create document representations that reflect thei...

Forecasting intracranial hypertension using multi-scale waveform metrics
Objective: Intracranial hypertension is an important risk factor of seco...

Overcoming Multi-Model Forgetting
We identify a phenomenon, which we refer to as multi-model forgetting, t...

Evaluating the Search Phase of Neural Architecture Search
Neural Architecture Search (NAS) aims to facilitate the design of deep n...

Decentralized Stochastic Optimization and Gossip Algorithms with Compressed Communication
We consider decentralized stochastic optimization with the objective fun...

Unsupervised Scalable Representation Learning for Multivariate Time Series
Time series constitute a challenging data type for machine learning algo...

Error Feedback Fixes SignSGD and other Gradient Compression Schemes
Sign-based algorithms (e.g. signSGD) have been proposed as a biased grad...

Efficient Greedy Coordinate Descent for Composite Problems
Coordinate descent with random coordinate selection is the current state...

Sparsified SGD with Memory
Huge scale machine learning problems are nowadays tackled by distributed...

Wasserstein is all you need
We propose a unified framework for building unsupervised representations...

Don't Use Large Mini-Batches, Use Local SGD
Mini-batch stochastic gradient methods are the current state of the art ...

COLA: Communication-Efficient Decentralized Linear Learning
Decentralized machine learning is a promising emerging paradigm in view ...

A Distributed Second-Order Algorithm You Can Trust
Due to the rapid growth of data and computational resources, distributed...

Global linear convergence of Newton's method without strong-convexity or Lipschitz gradients
We show that Newton's method converges globally at a linear rate for obj...

Training DNNs with Hybrid Block Floating Point
The wide adoption of DNNs has given birth to unrelenting computing requi...
Martin Jaggi
Tenure-Track Assistant Professor at EPFL (École polytechnique fédérale de Lausanne)