Good Intentions: Adaptive Parameter Servers via Intent Signaling

06/01/2022
by Alexander Renz-Wieland, et al.

Parameter servers (PSs) ease the implementation of distributed training for large machine learning (ML) tasks by providing primitives for shared parameter access. Especially for ML tasks that access parameters sparsely, PSs can achieve high efficiency and scalability. To do so, they employ a number of techniques – such as replication or relocation – to reduce communication cost and/or latency of parameter accesses. A suitable choice and parameterization of these techniques is crucial to realize these gains, however. Unfortunately, such choices depend on the task, the workload, and even individual parameters, they often require expensive upfront experimentation, and they are susceptible to workload changes. In this paper, we explore whether PSs can automatically adapt to the workload without any prior tuning. Our goals are to improve usability and to maintain (or even improve) efficiency. We propose (i) a novel intent signaling mechanism that acts as an enabler for adaptivity and naturally integrates into ML tasks, and (ii) a fully adaptive, zero-tuning PS called AdaPS based on this mechanism. Our experimental evaluation suggests that automatic adaptation to the workload is indeed possible: AdaPS matched or outperformed state-of-the-art PSs out of the box.
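To make the idea concrete, the following is a minimal sketch of how intent signaling could drive per-parameter management decisions. The class and method names (`IntentSignalingPS`, `signal_intent`, `management_decision`) are illustrative assumptions, not AdaPS's actual API; the decision rule shown (relocate on exclusive access, replicate on shared access) is a simplified caricature of the paper's adaptivity goal.

```python
# Hypothetical sketch of intent signaling in a parameter server.
# All names are illustrative; this is not the AdaPS implementation.

class IntentSignalingPS:
    def __init__(self, num_nodes):
        self.num_nodes = num_nodes
        # key -> set of worker nodes that signaled upcoming access
        self.intents = {}

    def signal_intent(self, node, keys):
        """A worker announces which parameters it will access soon."""
        for key in keys:
            self.intents.setdefault(key, set()).add(node)

    def management_decision(self, key):
        """Choose a management technique per parameter from signaled intent."""
        accessors = self.intents.get(key, set())
        if not accessors:
            return "none"       # no upcoming access: leave at home node
        if len(accessors) == 1:
            return "relocate"   # exclusive access: move parameter to that node
        return "replicate"      # shared access: replicate to all accessors
```

In this toy model, the server needs no upfront tuning: each parameter's treatment is derived on the fly from the intents that workers signal, which is the kind of workload-driven adaptivity the abstract describes.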
