SOAR: Minimizing Network Utilization with Bounded In-network Computing

10/27/2021
by Raz Segal, et al.

In-network computing via smart networking devices is a recent trend in modern datacenter networks. State-of-the-art switches with near-line-rate computing and aggregation capabilities are being developed to enable, e.g., acceleration and better utilization for modern applications such as big-data analytics and large-scale distributed and federated machine learning. We formulate and study the problem of activating a limited number of in-network computing devices within a network, with the aim of reducing the overall network utilization for a given workload. Such limits on the number of in-network computing elements per workload arise, e.g., during incremental upgrades of network infrastructure, and also because the specialized middleboxes or FPGAs involved must support heterogeneous workloads and multiple tenants. We present an optimal and efficient algorithm for placing such devices in tree networks with arbitrary link rates, and we evaluate the proposed solution across various scenarios and tasks. Our results show that enabling in-network aggregation on only a small fraction of network devices can significantly reduce network utilization. Furthermore, we show that several intuitive placement strategies perform significantly worse than our solution across varying workloads, tasks, and link rates.
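To make the placement problem concrete, here is a minimal sketch under an assumed cost model that the abstract does not spell out. Each leaf sends unit-size messages toward the root of a tree. A link's cost is the number of messages it carries divided by its rate. A switch chosen for aggregation merges everything it receives into a single message. The names `utilization` and `best_placement`, the example tree, and the cost model are all illustrative assumptions. The brute-force search is a naive baseline that is exponential in the budget `k`. It is not the SOAR algorithm, whose details are not given here.

```python
from itertools import combinations


def utilization(children, rate, load, root, aggregators):
    """Assumed cost model: sum over links of (messages carried / link rate)."""
    total = 0.0

    def messages_up(v):
        nonlocal total
        # Messages at v: its own workload plus whatever each child forwards.
        incoming = load.get(v, 0)
        for c in children.get(v, []):
            sent = messages_up(c)
            total += sent / rate[c]  # cost of the link from child c up to v
            incoming += sent
        # Assumption: an aggregating switch merges all it receives into one message.
        if v in aggregators and incoming > 0:
            return 1
        return incoming

    messages_up(root)
    return total


def best_placement(children, rate, load, root, k):
    """Naive baseline: try every set of k non-root switches (exponential in k)."""
    candidates = [v for v in children if v != root]
    best_cost, best_set = float("inf"), set()
    for combo in combinations(candidates, min(k, len(candidates))):
        cost = utilization(children, rate, load, root, set(combo))
        if cost < best_cost:
            best_cost, best_set = cost, set(combo)
    return best_cost, best_set


# Hypothetical example: root r, switches s1 (2 leaves) and s2 (3 leaves).
children = {"r": ["s1", "s2"], "s1": ["a", "b"], "s2": ["c", "d", "e"]}
rate = {v: 1.0 for v in ["s1", "s2", "a", "b", "c", "d", "e"]}  # uplink rate per node
load = {leaf: 1 for leaf in ["a", "b", "c", "d", "e"]}

print(utilization(children, rate, load, "r", set()))  # no aggregation
print(best_placement(children, rate, load, "r", 1))   # budget of one switch
```

With uniform rates, the single aggregator goes to `s2`, the switch with more leaves. If the `s1` uplink is made slow (e.g., rate 0.25), the best choice flips to `s1`. This small case shows why arbitrary link rates matter for placement.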

research
01/12/2022

Constrained In-network Computing with Low Congestion in Datacenter Networks

Distributed computing has become a common practice nowadays, where the r...
research
01/17/2019

Analysis of Large-Scale Multi-Tenant GPU Clusters for DNN Training Workloads

With widespread advances in machine learning, a number of large enterpri...
research
02/23/2022

MLProxy: SLA-Aware Reverse Proxy for Machine Learning Inference Serving on Serverless Computing Platforms

Serving machine learning inference workloads on the cloud is still a cha...
research
10/18/2019

FLIP:FLexible IoT Path Programming Framework for Large-scale IoT

With the rapid increase in smart objects forming IoT fabric, it is inevi...
research
04/16/2022

A Distributed and Elastic Aggregation Service for Scalable Federated Learning Systems

Federated Learning has promised a new approach to resolve the challenges...
research
04/17/2020

Network-Aware Optimization of Distributed Learning for Fog Computing

Fog computing promises to enable machine learning tasks to scale to larg...
research
08/02/2019

The need for modern computing paradigm: Science applied to computing

More than hundred years ago the 'classic physics' was it in its full pow...
