Automatic Parallelization of Software Network Functions

07/27/2023
by Francisco Pereira, et al.

Software network functions (NFs) offer flexibility and ease of deployment, but at the cost of making high performance harder to achieve. The traditional way to increase NF performance is to distribute traffic across multiple CPU cores, but this raises a significant challenge: how can an NF be parallelized without breaking its semantics? We propose Maestro, a tool that analyzes a sequential implementation of an NF and automatically generates a parallel version that carefully configures the NIC's Receive Side Scaling (RSS) mechanism to distribute traffic across cores while preserving the NF's semantics. When possible, Maestro orchestrates a shared-nothing architecture in which each core operates independently, without shared-memory coordination, to maximize performance. Otherwise, Maestro choreographs a fine-grained read-write locking mechanism optimized for typical Internet traffic. We parallelized 8 software NFs and show that they generally scale up linearly until they are bottlenecked by PCIe when using small packets, or by the 100 Gbps line rate with typical Internet traffic. Maestro also outperforms modern hardware-based transactional memory mechanisms, even for challenging, parallel-unfriendly workloads.
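To illustrate why the RSS configuration matters for semantics, the sketch below models in software the Toeplitz hash that RSS NICs apply to a packet's IPv4/TCP 4-tuple, followed by an indirection-table lookup that picks a receive queue (core). It assumes a fixed "symmetric" key, the 16-bit pattern 0x6d5a repeated, which is a known trick for making both directions of a connection hash to the same core, so that per-connection state never has to be shared between cores. This is only an illustration of the idea: Maestro derives its RSS configuration from its analysis of the NF rather than using a hand-picked key, and the names `toeplitz_hash`, `rss_input`, `rss_queue`, `SYMMETRIC_KEY`, and `RETA_SIZE` are hypothetical.

```python
import ipaddress
import struct

# Assumed symmetric RSS key: 0x6d5a repeated to 40 bytes. Because the key has a
# 16-bit period, swapping the 32-bit src/dst addresses (offset 32 bits apart) or
# the 16-bit src/dst ports (offset 16 bits apart) leaves the hash unchanged.
SYMMETRIC_KEY = bytes([0x6D, 0x5A] * 20)

RETA_SIZE = 128  # indirection-table size; a typical value, assumed here


def toeplitz_hash(key: bytes, data: bytes) -> int:
    """32-bit Toeplitz hash of `data` under `key`, as computed by RSS NICs."""
    key_bits = len(key) * 8
    if len(data) * 8 + 32 > key_bits:
        raise ValueError("key too short for this input")
    key_int = int.from_bytes(key, "big")
    result = 0
    for i in range(len(data) * 8):
        # Input bits are consumed starting from the most significant bit of byte 0.
        if data[i // 8] & (0x80 >> (i % 8)):
            # XOR in the 32-bit window of the key that starts at bit i.
            result ^= (key_int >> (key_bits - 32 - i)) & 0xFFFFFFFF
    return result


def rss_input(src_ip: str, dst_ip: str, src_port: int, dst_port: int) -> bytes:
    """IPv4/TCP hash input: src addr, dst addr, src port, dst port (network order)."""
    return (ipaddress.IPv4Address(src_ip).packed
            + ipaddress.IPv4Address(dst_ip).packed
            + struct.pack("!HH", src_port, dst_port))


def rss_queue(key: bytes, num_queues: int, src_ip: str, dst_ip: str,
              src_port: int, dst_port: int) -> int:
    """Map a packet to an RX queue (core) through a round-robin indirection table."""
    reta = [i % num_queues for i in range(RETA_SIZE)]
    h = toeplitz_hash(key, rss_input(src_ip, dst_ip, src_port, dst_port))
    return reta[h % RETA_SIZE]  # the low-order hash bits index the table


# Both directions of one TCP connection land on the same core.
fwd = rss_queue(SYMMETRIC_KEY, 8, "10.0.0.1", "192.0.2.7", 40000, 443)
rev = rss_queue(SYMMETRIC_KEY, 8, "192.0.2.7", "10.0.0.1", 443, 40000)
print(fwd == rev)  # True
```

A fixed symmetric key covers only the common case where state is keyed on the connection regardless of direction; an NF whose state is keyed differently (for example, per source address only) would need a different key, which is the kind of constraint an automated analysis has to discover.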

research
09/27/2016

An Evaluation of Coarse-Grained Locking for Multicore Microkernels

The trade-off between coarse- and fine-grained locking is a well underst...
research
09/07/2023

METICULOUS: An FPGA-based Main Memory Emulator for System Software Studies

Due to the scaling problem of the DRAM technology, non-volatile memory d...
research
02/19/2022

Scalable Fine-Grained Parallel Cycle Enumeration Algorithms

This paper investigates scalable parallelisation of state-of-the-art cyc...
research
07/09/2020

IOCA: High-Speed I/O-Aware LLC Management for Network-Centric Multi-Tenant Platform

In modern server CPUs, last-level cache (LLC) is a critical hardware res...
research
03/08/2016

Testing fine-grained parallelism for the ADMM on a factor-graph

There is an ongoing effort to develop tools that apply distributed compu...
research
05/03/2019

An Efficient Approach to Achieve Compositionality using Optimized Multi-Version Object Based Transactional Systems

In the modern era of multi-core systems, the main aim is to utilize the ...
research
02/20/2019

JArena: Partitioned Shared Memory for NUMA-awareness in Multi-threaded Scientific Applications

The distributed shared memory (DSM) architecture is widely used in today...