Train Where the Data is: A Case for Bandwidth Efficient Coded Training

by   Zhifeng Lin, et al.

Training a machine learning model is both compute and data-intensive. Most of the model training is performed on high performance compute nodes and the training data is stored near these nodes for faster training. But there is a growing interest in enabling training near the data. For instance, mobile devices are rich sources of training data. It may not be feasible to consolidate the data from mobile devices into a cloud service, due to bandwidth and data privacy reasons. Training at mobile devices is however fraught with challenges. First mobile devices may join or leave the distributed setting, either voluntarily or due to environmental uncertainties, such as lack of power. Tolerating uncertainties is critical to the success of distributed mobile training. One proactive approach to tolerate computational uncertainty is to store data in a coded format and perform training on coded data. Encoding data is a challenging task since erasure codes require multiple devices to exchange their data to create a coded data partition, which places a significant bandwidth constraint. Furthermore, coded computing traditionally relied on a central node to encode and distribute data to all the worker nodes, which is not practical in a distributed mobile setting. In this paper, we tackle the uncertainty in distributed mobile training using a bandwidth-efficient encoding strategy. We use a Random Linear Network coding (RLNC) which reduces the need to exchange data partitions across all participating mobile devices, while at the same time preserving the property of coded computing to tolerate uncertainties. We implement gradient descent for logistic regression and SVM to evaluate the effectiveness of our mobile training framework. We demonstrate a 50 communication bandwidth compared to MDS coded computation, one of the popular erasure codes.


page 7

page 9


Mobile Communications, Computing and Caching Resources Optimization for Coded Caching with Device Computing

Edge caching and computing have been regarded as an efficient approach t...

Multi-Agent Reinforcement Learning Based Coded Computation for Mobile Ad Hoc Computing

Mobile ad hoc computing (MAHC), which allows mobile devices to directly ...

Distributed Machine Learning on Mobile Devices: A Survey

In recent years, mobile devices have gained increasingly development wit...

Verifiable Coded Computing: Towards Fast, Secure and Private Distributed Machine Learning

Stragglers, Byzantine workers, and data privacy are the main bottlenecks...

Modeling and Trade-off for Mobile Communication, Computing and Caching Networks

Computation task service delivery in a computing-enabled and caching-aid...

Just One Byte (per gradient): A Note on Low-Bandwidth Decentralized Language Model Finetuning Using Shared Randomness

Language model training in distributed settings is limited by the commun...

SoniControl - A Mobile Ultrasonic Firewall

The exchange of data between mobile devices in the near-ultrasonic frequ...

Please sign up or login with your details

Forgot password? Click here to reset