Unlocking the Power of Inline Floating-Point Operations on Programmable Switches

12/11/2021
by Yifan Yuan, et al.

The advent of switches with programmable dataplanes has enabled the rapid development of new network functionality, as well as providing a platform for acceleration of a broad range of application-level functionality. However, existing switch hardware was not designed with application acceleration in mind, and thus applications requiring operations or datatypes not used in traditional network protocols must resort to expensive workarounds. Applications involving floating point data, including distributed training for machine learning and distributed query processing, are key examples. In this paper, we propose FPISA, a floating point representation designed to work efficiently in programmable switches. We first implement FPISA on an Intel Tofino switch, but find that it has limitations that impact throughput and accuracy. We then propose hardware changes to address these limitations based on the open-source Banzai switch architecture, and synthesize them in a 15-nm standard-cell library to demonstrate their feasibility. Finally, we use FPISA to implement accelerators for machine learning training and for query processing, and evaluate their performance on a switch implementing our changes using emulation. We find that FPISA allows distributed training to use 25-75% fewer CPU cores and provide up to 85.9% better throughput in a CPU-constrained environment than SwitchML. For distributed query processing with floating point data, FPISA enables up to 2.7x better throughput than Spark.
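
The abstract does not spell out how FPISA represents floats, so the sketch below is only an illustration of the underlying constraint it addresses: programmable switch pipelines expose integer ALUs, so a float must be handled through its integer bit fields (sign, exponent, mantissa) before it can be accumulated in-network. The field split, the shared-exponent alignment, and the host-side rescaling here are assumptions for illustration, not the paper's actual design.

```python
import struct

def split_float(x: float) -> tuple[int, int, int]:
    """Decompose an IEEE-754 single-precision float into
    (sign, biased exponent, mantissa) integer fields, the kind of
    integer-only view a switch pipeline can manipulate."""
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    return bits >> 31, (bits >> 23) & 0xFF, bits & 0x7FFFFF

def aggregate(values: list[float]) -> float:
    """Sum floats using only integer shifts and adds after aligning
    every value to the largest exponent in the batch (hypothetical
    illustration of integer-only in-network aggregation)."""
    fields = [split_float(v) for v in values]
    # Restore the implicit leading 1 for normalized numbers.
    sigs = [((1 << 23) | m) if e != 0 else m for (_, e, m) in fields]
    max_exp = max(e for (_, e, _) in fields)
    total = 0
    for (s, e, _), sig in zip(fields, sigs):
        aligned = sig >> (max_exp - e)       # shift into a shared scale
        total += -aligned if s else aligned  # signed integer accumulation
    # Convert the fixed-point sum back to a float on the host.
    return total * 2.0 ** (max_exp - 127 - 23)

if __name__ == "__main__":
    grads = [0.5, -0.125, 3.75, 0.0625]
    print(aggregate(grads), sum(grads))  # both print 4.1875
```

The shared-exponent alignment is what lets a pipeline with only integer adders accumulate values of different magnitudes; handling the loss of low-order bits during alignment is exactly the kind of accuracy concern the paper's hardware changes are meant to address.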

Related research

08/04/2021
BEANNA: A Binary-Enabled Architecture for Neural Network Acceleration
Modern hardware design trends have shifted towards specialized hardware ...

05/11/2022
Libra: In-network Gradient Aggregation for Speeding up Distributed Sparse Deep Training
Distributed sparse deep learning has been widely used in many internet-s...

02/16/2023
On the Limit Performance of Floating Gossip
In this paper we investigate the limit performance of Floating Gossip, a...

05/10/2023
P4SGD: Programmable Switch Enhanced Model-Parallel Training on Generalized Linear Models on Distributed FPGAs
Generalized linear models (GLMs) are a widely utilized family of machine...

04/10/2020
Cheetah: Accelerating Database Queries with Switch Pruning
Modern database systems are growing increasingly distributed and struggl...

06/10/2021
NetFC: enabling accurate floating-point arithmetic on programmable switches
In-network computation has been widely used to accelerate data-intensive...

07/01/2016
Using the pyMIC Offload Module in PyFR
PyFR is an open-source high-order accurate computational fluid dynamics ...
