Martin Jaggi

07/13/2023

Layerwise Linear Mode Connectivity

In the federated setup one performs an aggregation of separate local mod...

Linara Adilova, et al.

06/14/2023

Provably Personalized and Robust Federated Learning

Clustering clients with similar objectives and learning a model per clus...

Mariel Werner, et al.

06/01/2023

Faster Causal Attention Over Large Sequences Through Sparse Flash Attention

Transformer-based language models have found many diverse applications r...

Matteo Pagliardini, et al.

05/30/2023

Shuffle SGD is Always Better than SGD: Improved Analysis of SGD with Arbitrary Data Orders

Stochastic Gradient Descent (SGD) algorithms are widely used in optimizi...

Anastasia Koloskova, et al.

05/29/2023

Collaborative Learning via Prediction Consensus

We consider a collaborative learning setting where each agent's goal is ...

Dongyang Fan, et al.

05/26/2023

Rotational Optimizers: Simple Robust DNN Training

The training dynamics of modern deep neural networks depend on complex i...

Atli Kosson, et al.

05/26/2023

Ghost Noise for Regularizing Deep Neural Networks

Batch Normalization (BN) is widely used to stabilize the optimization pr...

Atli Kosson, et al.

05/26/2023

Hardware-Efficient Transformer Training via Piecewise Affine Operations

Multiplications are responsible for most of the computational cost invol...

Atli Kosson, et al.

05/25/2023

Landmark Attention: Random-Access Infinite Context Length for Transformers

While transformers have shown remarkable success in natural language pro...

Amirkeivan Mohtashami, et al.

02/24/2023

Linearization Algorithms for Fully Composite Optimization

In this paper, we study first-order algorithms for solving fully composi...

Maria-Luiza Vladarean, et al.

02/23/2023

Unified Convergence Theory of Stochastic and Variance-Reduced Cubic Newton Methods

We study the widely known Cubic-Newton method in the stochastic setting ...

El Mahdi Chayti, et al.

01/05/2023

Beyond spectral gap (extended): The role of the topology in decentralized learning

In data-parallel optimization of machine learning models, workers collab...

Thijs Vogels, et al.

12/01/2022

Second-order optimization with lazy Hessians

We analyze Newton's method with lazy Hessian updates for solving general...

Nikita Doikov, et al.

11/19/2022

Accuracy Boosters: Epoch-Driven Mixed-Mantissa Block Floating-Point for DNN Training

The unprecedented growth in DNN model complexity, size and the amount of...

Simla Burcu Harma, et al.

11/12/2022

Modular Clinical Decision Support Networks (MoDN) – Updatable, Interpretable, and Portable Predictions for Evolving Clinical Environments

Data-driven Clinical Decision Support Systems (CDSS) have the potential ...

Cécile Trottet, et al.

06/16/2022

Sharper Convergence Guarantees for Asynchronous SGD for Distributed and Federated Learning

We study the asynchronous stochastic gradient descent algorithm for dist...

Anastasia Koloskova, et al.

06/07/2022

Beyond spectral gap: The role of the topology in decentralized learning

In data-parallel optimization of machine learning models, workers collab...

Thijs Vogels, et al.

05/30/2022

On Avoiding Local Minima Using Gradient Descent With Large Learning Rates

It has been widely observed in training of neural networks that when app...

Amirkeivan Mohtashami, et al.

05/17/2022

SKILL: Structured Knowledge Infusion for Large Language Models

Large language models (LLMs) have demonstrated human-level performance o...

Fedor Moiseev, et al.

04/13/2022

Data-heterogeneity-aware Mixing for Decentralized Learning

Decentralized learning provides an effective framework to train machine ...

Yatin Dandi, et al.

02/11/2022

Improving Generalization via Uncertainty Driven Perturbations

Recently, Shah et al. (2020) pointed out the pitfalls of the simplicity bi...

Matteo Pagliardini, et al.

02/09/2022

Agree to Disagree: Diversity through Disagreement for Better Transferability

Gradient-based learning algorithms have an implicit simplicity bias whic...

Matteo Pagliardini, et al.

02/03/2022

Characterizing & Finding Good Data Orderings for Fast Convergence of Sequential Gradient Methods

While SGD, which samples from the data with replacement, is widely studie...

Amirkeivan Mohtashami, et al.

02/03/2022

Byzantine-Robust Decentralized Learning via Self-Centered Clipping

In this paper, we study the challenging task of Byzantine-robust decentr...

Lie He, et al.

12/16/2021

Understanding Memorization from the Perspective of Optimization via Efficient Influence Estimation

Over-parameterized deep neural networks are able to achieve excellent tr...

Futong Liu, et al.

11/16/2021

Interpreting Language Models Through Knowledge Graph Extraction

Transformer-based language models trained on large text corpora have enj...

Vinitra Swamy, et al.

11/10/2021

Linear Speedup in Personalized Collaborative Learning

Personalization in federated learning can improve the accuracy of a mode...

El Mahdi Chayti, et al.

10/25/2021

Optimal Model Averaging: Towards Personalized Collaborative Learning

In federated learning, differences in the data or objectives between the...

Felix Grimberg, et al.

10/13/2021

WAFFLE: Weighted Averaging for Personalized Federated Learning

In collaborative or federated learning, model personalization can be a v...

Martin Beaussart, et al.

10/08/2021

RelaySum for Decentralized Deep Learning on Heterogeneous Data

In decentralized machine learning, workers compute model updates on thei...

Thijs Vogels, et al.

09/06/2021

On Second-order Optimization Methods for Federated Learning

We consider federated learning (FL), where the training data is distribu...

Sebastian Bischoff, et al.

08/18/2021

Semantic Perturbations with Normalizing Flows for Improved Generalization

Data augmentation is a widely adopted technique for avoiding overfitting...

Oğuz Kaan Yüksel, et al.

07/14/2021

IFedAvg: Interpretable Data-Interoperability for Federated Learning

Recently, the ever-growing demand for privacy-oriented machine learning ...

David Roschewitz, et al.

06/25/2021

Implicit Gradient Alignment in Distributed and Federated Learning

A major obstacle to achieving global convergence in distributed and fede...

Yatin Dandi, et al.

06/16/2021

Simultaneous Training of Partially Masked Neural Networks

For deploying deep learning models to lower end devices, it is necessary...

Amirkeivan Mohtashami, et al.

06/08/2021

Obtaining Better Static Word Embeddings Using Contextual Embedding Models

The advent of contextual word embeddings – representations of words whic...

Prakhar Gupta, et al.

05/28/2021

Lightweight Cross-Lingual Sentence Representation Learning

Large-scale models for learning fixed-dimensional cross-lingual sentence...

Zhuoyuan Mao, et al.

04/15/2021

Federated Learning for Malware Detection in IoT Devices

This work investigates the possibilities enabled by federated learning c...

Valerian Rey, et al.

03/03/2021

Critical Parameters for Scalable Distributed Learning with Large Batches and Asynchronous Updates

It has been experimentally observed that the efficiency of distributed t...

Sebastian U. Stich, et al.

02/09/2021

Consensus Control for Decentralized Deep Learning

Decentralized training of deep learning models enables on-device learnin...

Lingjing Kong, et al.

02/09/2021

Quasi-Global Momentum: Accelerating Decentralized Deep Learning on Heterogeneous Data

Decentralized training of deep learning models is a key element for enab...

Tao Lin, et al.

02/05/2021

Exact Optimization of Conformal Predictors via Incremental and Decremental Learning

Conformal Predictors (CP) are wrappers around ML methods, providing erro...

Giovanni Cherubin, et al.

12/18/2020

Learning from History for Byzantine Robust Optimization

Byzantine robustness has received significant attention recently given i...

Sai Praneeth Karimireddy, et al.

11/03/2020

A Linearly Convergent Algorithm for Decentralized Optimization: Sending Less Bits for Free!

Decentralized optimization methods enable on-device training of machine ...

Dmitry Kovalev, et al.

09/19/2020

Sparse Communication for Training Deep Networks

Synchronous stochastic gradient descent (SGD) is the most common method ...

Negar Foroutan Eghlidi, et al.

08/08/2020

Mime: Mimicking Centralized Stochastic Algorithms in Federated Learning

Federated learning is a challenging optimization problem due to the hete...

Sai Praneeth Karimireddy, et al.

08/04/2020

PowerGossip: Practical Low-Rank Communication Compression in Decentralized Deep Learning

Lossy gradient compression has become a practical tool to overcome the c...

Thijs Vogels, et al.

06/29/2020

Multi-Head Attention: Collaborate Instead of Concatenate

Attention layers are widely used in natural language processing (NLP) an...

Jean-Baptiste Cordonnier, et al.

06/25/2020

Taming GANs with Lookahead

Generative Adversarial Networks are notoriously challenging to train. Th...

Tatjana Chavdarova, et al.

06/16/2020

Byzantine-Robust Learning on Heterogeneous Datasets via Resampling

In Byzantine robust distributed optimization, a central server wants to ...

Lie He, et al.
