Efficient Parallelization of 5G-PUSCH on a Scalable RISC-V Many-core Processor

10/17/2022
by Marco Bertuletti, et al.

The disaggregation and softwarization of 5G radio access networks place heavy computational demands on the processing units. At the physical layer, the baseband processing workload is typically offloaded to specialized hardware accelerators. However, the trend toward software-defined radio access networks demands flexible, programmable architectures. In this paper, we explore the software design, parallelization, and optimization of the key kernels of the lower physical layer (PHY) for physical uplink shared channel (PUSCH) reception on MemPool and TeraPool, two many-core systems with 256 and 1024 small, efficient RISC-V cores, respectively, and a large shared L1 data memory. PUSCH processing is demanding and strictly time-constrained, which makes it a challenge for baseband processors; it is also common to most of the uplink channels. Our analysis thus generalizes to the entire lower PHY of the uplink receiver at the gNodeB (gNB). Based on an evaluation of the computational effort (in multiply-accumulate operations) required by the PUSCH algorithmic stages, we focus on the parallel implementation of the dominant kernels, namely the fast Fourier transform, matrix-matrix multiplication, and the matrix decomposition kernels used to solve linear systems. Our optimized parallel kernels achieve speedups of 211, 225, and 158 on MemPool and of 762, 880, and 722 on TeraPool, at high utilization (0.81, 0.89, 0.71, and 0.74, 0.88, 0.71, respectively), comparable to a single-core serial execution, moving a step closer toward a full-software PUSCH implementation.
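To put the quoted speedups in perspective, the short sketch below divides each one by the core count of its system, which gives the textbook parallel efficiency. This is only an illustration: the kernel labels and their order are an assumption read off the abstract's "respectively", the function names are hypothetical, and the ratio is not necessarily the "utilization" metric the abstract reports, which is stated separately and need not equal speedup divided by cores.

```python
# Core counts and speedups as quoted in the abstract.
# Assumption: speedups are listed in the same order as the kernels
# (FFT, matrix-matrix multiplication, matrix decomposition).
CORES = {"MemPool": 256, "TeraPool": 1024}
SPEEDUPS = {
    "MemPool": [211, 225, 158],
    "TeraPool": [762, 880, 722],
}
KERNELS = ["FFT", "MatMul", "Decomposition"]  # hypothetical short labels


def parallel_efficiency(speedup: float, cores: int) -> float:
    """Textbook parallel efficiency: speedup divided by the number of cores."""
    return speedup / cores


def efficiency_table() -> dict:
    """Return {system: {kernel: efficiency}} rounded to three decimals."""
    return {
        system: {
            kernel: round(parallel_efficiency(speedup, CORES[system]), 3)
            for kernel, speedup in zip(KERNELS, SPEEDUPS[system])
        }
        for system in CORES
    }


if __name__ == "__main__":
    for system, row in efficiency_table().items():
        for kernel, eff in row.items():
            print(f"{system:9s} {kernel:14s} speedup/cores = {eff:.3f}")
```

A ratio close to 1 means the speedup scales almost linearly with the number of cores; the gap to 1 reflects overheads such as synchronization and memory contention, though the abstract itself does not break these down.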


