Replicate or Relocate? Non-Uniform Access in Parameter Servers

04/01/2021
by Alexander Renz-Wieland, et al.

Parameter servers (PSs) facilitate the implementation of distributed training for large machine learning tasks. A key challenge for PS performance is that parameter access is non-uniform in many real-world machine learning tasks, i.e., different parameters exhibit drastically different access patterns. We identify skew and nondeterminism as two major sources of non-uniformity. Existing PSs are ill-suited for managing such non-uniform access because they apply the same parameter management technique to all parameters. As a consequence, the performance of existing PSs suffers and may even fall behind that of single-node baselines. In this paper, we explore how PSs can manage non-uniform access efficiently. We find that it is key for PSs to support multiple management techniques and to apply a well-suited technique to each parameter. We present Lapse2, a PS that replicates hot spot parameters, relocates less frequently accessed parameters, and employs specialized techniques to manage nondeterminism that arises from random sampling. In our experimental study, Lapse2 outperformed existing, single-technique PSs by up to one order of magnitude and provided near-linear scalability across multiple machine learning tasks.
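To make the idea of choosing a management technique per parameter concrete, the following is a minimal sketch and not the method used by Lapse2. It assumes that hot spots can be identified simply by ranking parameters by access count, and the function name `assign_techniques` and the knob `hot_fraction` are hypothetical, introduced only for illustration.

```python
from collections import Counter


def assign_techniques(access_log, hot_fraction=0.01):
    """Map each parameter key to 'replicate' or 'relocate'.

    access_log:   iterable of parameter keys, one entry per access.
    hot_fraction: hypothetical knob, the share of distinct keys treated
                  as hot spots (an assumption, not a value from the paper).
    """
    counts = Counter(access_log)
    # Always treat at least one key as hot so tiny logs still yield a plan.
    n_hot = max(1, int(len(counts) * hot_fraction))
    hot = {key for key, _ in counts.most_common(n_hot)}
    # Hot spots are replicated; all other parameters are relocated on access.
    return {key: ("replicate" if key in hot else "relocate") for key in counts}


# Toy skewed access log: one key receives most of the accesses.
log = ["w0"] * 90 + ["w1"] * 5 + ["w2"] * 3 + ["w3"] * 2
plan = assign_techniques(log, hot_fraction=0.25)
```

In this toy log, `w0` accounts for 90 of 100 accesses, so it is the only key marked for replication, while the three rarely accessed keys are marked for relocation. A real system would also have to handle the nondeterministic access from random sampling that the abstract mentions, which this sketch ignores.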
