Layerwise Linear Mode Connectivity

07/13/2023
by Linara Adilova, et al.

In the federated setup one performs an aggregation of separate local models multiple times during training in order to obtain a stronger global model; most often, aggregation is a simple averaging of the parameters. Understanding when and why averaging works in a non-convex setup, such as federated deep learning, is an open challenge that hinders obtaining highly performant global models. On i.i.d. datasets, federated deep learning with frequent averaging is successful. The common understanding, however, is that during independent training the models drift away from each other, and thus averaging may no longer work after many local parameter updates. The problem can be seen from the perspective of the loss surface: for points on a non-convex surface, the average can become arbitrarily bad. The assumption of local convexity, often used to explain the success of federated averaging, contradicts the empirical evidence showing that high loss barriers exist between models from the very beginning of training, even when training on the same data. Based on the observation that the learning process evolves differently in different layers, we investigate the barrier between models in a layerwise fashion. Our conjecture is that the barriers preventing successful federated training are caused by a particular layer or group of layers.
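To make the central quantity concrete, below is a minimal sketch (not the authors' code) of how a loss barrier along the linear interpolation path between two models can be estimated, together with a layerwise variant that interpolates only one named layer while keeping the remaining parameters at the first model's values. The names `model_a`, `model_b`, `eval_loss`, and the layer prefix `"conv1"` are hypothetical stand-ins for two independently trained networks, an evaluation routine on a fixed dataset, and a layer of interest.

```python
# Sketch only: estimating the (layerwise) loss barrier between two models.
import copy
import torch

@torch.no_grad()
def loss_barrier(model_a, model_b, eval_loss, num_points=11, layer_prefix=None):
    """Estimate max_alpha [ L(theta(alpha)) - ((1 - alpha) L(A) + alpha L(B)) ],
    where theta(alpha) = (1 - alpha) * theta_A + alpha * theta_B (one common
    definition of the barrier). If `layer_prefix` is given, only parameters whose
    names start with it are interpolated (layerwise barrier); the rest stay at
    model A's values."""
    state_a = {k: v.detach().clone() for k, v in model_a.state_dict().items()}
    state_b = {k: v.detach().clone() for k, v in model_b.state_dict().items()}
    probe = copy.deepcopy(model_a)  # reusable container for interpolated weights

    losses = []
    for i in range(num_points):
        alpha = i / (num_points - 1)
        mixed = {}
        for k in state_a:
            selected = layer_prefix is None or k.startswith(layer_prefix)
            if selected and state_a[k].is_floating_point():
                mixed[k] = (1 - alpha) * state_a[k] + alpha * state_b[k]
            else:
                mixed[k] = state_a[k]  # keep model A's values (layerwise case / non-float buffers)
        probe.load_state_dict(mixed)
        losses.append(eval_loss(probe))

    # Height of the loss curve above the straight line between the endpoint losses.
    return max(
        loss - ((1 - i / (num_points - 1)) * losses[0] + i / (num_points - 1) * losses[-1])
        for i, loss in enumerate(losses)
    )

# Usage (hypothetical): full barrier vs. barrier when only the "conv1" block is interpolated.
# full_barrier = loss_barrier(model_a, model_b, eval_loss)
# conv1_barrier = loss_barrier(model_a, model_b, eval_loss, layer_prefix="conv1")
```

Note that federated averaging corresponds to the midpoint alpha = 0.5 of this path, so a high barrier at the midpoint directly translates into a poorly performing averaged model.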


