Network Support for High-performance Distributed Machine Learning

02/05/2021
by   Francesco Malandrino, et al.

The traditional approach to distributed machine learning is to adapt learning algorithms to the network, e.g., by reducing updates to curb overhead. Networks based on the intelligent edge instead make it possible to follow the opposite approach, i.e., to define the logical network topology around the learning task to perform, so as to meet the desired learning performance. In this paper, we propose a system model that captures these aspects in the context of supervised machine learning, accounting for both learning nodes (which perform computations) and information nodes (which provide data). We then formulate the problem of selecting (i) which learning and information nodes should cooperate to complete the learning task, and (ii) the number of iterations to perform, in order to minimize the learning cost while meeting the target prediction error and execution time. After proving important properties of this problem, we devise an algorithm, named DoubleClimb, that can find a (1+1/|I|)-competitive solution (with I being the set of information nodes), with cubic worst-case complexity. Our performance evaluation, leveraging a real-world network topology and considering both classification and regression tasks, also shows that DoubleClimb closely matches the optimum, outperforming state-of-the-art alternatives.
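To make the shape of the selection problem concrete, the following is a minimal sketch that solves a toy instance by exhaustive search. It is not DoubleClimb, and nothing in it is taken from the paper's system model: the error model (`toy_error`), the time and cost formulas, the field names (`speed`, `cost`, `samples`) and the example numbers are all hypothetical assumptions chosen only for illustration.

```python
from itertools import combinations


def nonempty_subsets(names):
    """Yield every non-empty subset of `names` as a tuple."""
    for size in range(1, len(names) + 1):
        yield from combinations(names, size)


def toy_error(num_samples, iterations):
    # Hypothetical error model (an assumption, not the paper's):
    # error shrinks with more data and with more iterations.
    return 1.0 / (num_samples ** 0.5) + 1.0 / iterations


def brute_force_select(learning, information, max_error, max_time, max_iters):
    """Exhaustively pick learning nodes, information nodes and an iteration
    count that minimize cost subject to the error and time targets.
    Returns None when no combination is feasible."""
    best = None
    for l_set in nonempty_subsets(sorted(learning)):
        total_speed = sum(learning[l]["speed"] for l in l_set)
        iter_cost = sum(learning[l]["cost"] for l in l_set)
        for i_set in nonempty_subsets(sorted(information)):
            samples = sum(information[i]["samples"] for i in i_set)
            data_cost = sum(information[i]["cost"] for i in i_set)
            for k in range(1, max_iters + 1):
                if toy_error(samples, k) > max_error:
                    continue  # prediction-error target not met
                # Assumed time model: each iteration processes all samples,
                # with the work shared across the selected learning nodes.
                if k * samples / total_speed > max_time:
                    continue  # execution-time target not met
                cost = k * iter_cost + data_cost
                if best is None or cost < best["cost"]:
                    best = {"learning": l_set, "information": i_set,
                            "iterations": k, "cost": cost}
    return best


# Hypothetical toy instance: two learning nodes, two information nodes.
LEARNING = {"l1": {"speed": 100, "cost": 3}, "l2": {"speed": 50, "cost": 1}}
INFORMATION = {"i1": {"samples": 100, "cost": 1},
               "i2": {"samples": 400, "cost": 2}}

if __name__ == "__main__":
    print(brute_force_select(LEARNING, INFORMATION,
                             max_error=0.27, max_time=30, max_iters=20))
```

The search visits every pair of non-empty node subsets, so its running time grows exponentially with the number of nodes. That is the cost an algorithm with cubic worst-case complexity, as claimed for DoubleClimb, avoids.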


