An evaluation of a microprocessor with two independent hardware execution threads coupled through a shared cache

05/28/2023
by Madhav P. Desai, et al.

We investigate the utility of augmenting a single-pipeline microprocessor with a second copy of its execution pipeline, placed in parallel with the existing one. The resulting dual-hardware-threaded microprocessor has two identical, independent, single-issue, in-order execution pipelines (hardware threads) that share a common memory subsystem consisting of instruction and data caches together with a memory management unit. From a design perspective, assembly and verification of the dual-threaded processor are simplified by reusing existing, verified implementations of the execution pipeline and the memory unit. Because the memory unit is shared by the two hardware threads, the area overhead of adding the second hardware thread is 25% of the area of the existing single-threaded processor. Using an FPGA implementation, we evaluate the performance of the dual-threaded processor relative to the single-threaded one. On applications that can be parallelized, we observe speedups of 1.6X to 1.88X. For applications that are not parallelizable, the speedup is more modest. We also observe that the performance of the dual-threaded processor degrades on applications that generate large numbers of cache misses.
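To put the reported figures side by side, the following minimal sketch divides each reported speedup by the relative area (1 plus the 25% overhead) to get a rough performance-per-area ratio. It also inverts Amdahl's law for two threads to estimate what parallel fraction would produce each speedup. The function names are hypothetical, and the Amdahl model is an assumption made here for illustration; the abstract does not state that the authors use it, and it ignores shared-cache contention.

```python
def perf_per_area(speedup: float, area_overhead: float) -> float:
    """Speedup divided by relative area, where relative area = 1 + overhead."""
    return speedup / (1.0 + area_overhead)


def amdahl_parallel_fraction(speedup: float, n_threads: int = 2) -> float:
    """Invert Amdahl's law S = 1 / ((1 - p) + p / n) to solve for p.

    Assumption for illustration only: ideal scaling with no contention
    in the shared memory subsystem.
    """
    return (1.0 - 1.0 / speedup) / (1.0 - 1.0 / n_threads)


if __name__ == "__main__":
    AREA_OVERHEAD = 0.25  # from the abstract: second thread adds 25% area
    for s in (1.6, 1.88):  # speedup range reported for parallelizable applications
        ratio = perf_per_area(s, AREA_OVERHEAD)
        p = amdahl_parallel_fraction(s)
        print(f"speedup {s:.2f}X: perf/area {ratio:.3f}, implied parallel fraction {p:.3f}")
```

Under these assumptions, a performance-per-area ratio above 1 means the second hardware thread returns more speedup than it costs in area, which holds across the reported range for parallelizable applications.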


research
07/19/2018

A Queuing Model for CPU Functional Unit and Issue Queue Configuration

In a superscalar processor, instructions of various types flow through a...
research
04/13/2019

Evaluation of the RIKEN Post-K Processor Simulator

For the purpose of developing applications for Post-K at an early stage,...
research
10/27/2019

Cilkmem: Algorithms for Analyzing the Memory High-Water Mark of Fork-Join Parallel Programs

Software engineers designing recursive fork-join programs destined to ru...
research
05/06/2021

Parallelized sequential composition, pipelines, and hardware weak memory models

Since the introduction of the CDC 6600 in 1965 and its `scoreboarding' t...
research
07/15/2022

Computing Execution Times with eXecution Decision Diagrams in the Presence of Out-Of-Order Resources

Worst-Case Execution Time (WCET) is a key component for the verification...
research
02/26/2021

SLAP: A Split Latency Adaptive VLIW pipeline architecture which enables on-the-fly variable SIMD vector-length

Over the last decade the relative latency of access to shared memory by ...
research
07/14/2018

Deriving AOC C-Models from D V Languages for Single- or Multi-Threaded Execution Using C or C++

The C language is getting more and more popular as a design and verifica...
