Efficient Instruction Scheduling using Real-time Load Delay Tracking

09/07/2021
by Andreas Diavastos, et al.

Many hardware structures in today's high-performance out-of-order processors do not scale efficiently. To address this, different solutions have been proposed that build execution schedules in an energy-efficient manner. Issue time prediction processors are one such solution: they use data-flow dependencies and predefined instruction latencies to predict the issue times of repeated instructions. In this work, we aim to improve their accuracy, and consequently their performance, in an energy-efficient way. We accomplish this by taking advantage of two key observations. First, memory accesses often take longer to arrive than the static, predefined access latency that is used to describe these systems. Second, these memory access delays often repeat across iterations of the same code, which allows us to predict the arrival time of these accesses. We introduce a new processor microarchitecture that replaces a complex reservation-station-based scheduler with an efficient, scalable alternative. Our proposed scheduling technique tracks real-time delays of loads to accurately predict instruction issue times, and uses a reordering mechanism to prioritize instructions based on that prediction, achieving close-to-out-of-order processor performance. To accomplish this in an energy-efficient manner we introduce: (1) an instruction delay learning mechanism that monitors repeated load instructions and learns their latest delay, (2) an issue time predictor that uses learned delays and data-flow dependencies to predict instruction issue times, and (3) priority queues that reorder instructions based on their issue time prediction. Together, our processor achieves 86.2 processor, higher than previous efficient scheduler proposals, while still consuming 30
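The three components listed in the abstract can be illustrated with a small software sketch. This is not the paper's hardware design: the class names, the static latency values, and the choice to key the delay table by program counter are all assumptions made for illustration, and structural hazards and issue width are ignored.

```python
import heapq
from dataclasses import dataclass
from typing import Optional, Tuple

# Assumed static latencies in cycles (illustrative values, not from the paper).
STATIC_LATENCY = {"alu": 1, "mul": 3, "load": 2}


@dataclass
class Instr:
    pc: int
    op: str
    dst: Optional[str]
    srcs: Tuple[str, ...] = ()


class LoadDelayTable:
    """(1) Delay learning: keep the latest extra delay seen for each load PC."""

    def __init__(self):
        self._extra = {}

    def lookup(self, pc):
        return self._extra.get(pc, 0)

    def update(self, pc, observed_latency):
        # Store only the part of the latency that exceeds the static value.
        self._extra[pc] = max(0, observed_latency - STATIC_LATENCY["load"])


class IssueTimePredictor:
    """(2) Predict issue times from data-flow dependencies and learned delays."""

    def __init__(self, delays):
        self.delays = delays
        self.reg_ready = {}  # register name -> cycle at which its value is ready

    def predict(self, instr, now=0):
        # An instruction can issue once all of its source registers are ready.
        issue = max([now] + [self.reg_ready.get(r, 0) for r in instr.srcs])
        latency = STATIC_LATENCY[instr.op]
        if instr.op == "load":
            latency += self.delays.lookup(instr.pc)
        if instr.dst is not None:
            self.reg_ready[instr.dst] = issue + latency
        return issue


def schedule(instrs, predictor, now=0):
    """(3) Reorder instructions by predicted issue time using a priority queue."""
    heap = []
    for seq, ins in enumerate(instrs):
        # seq breaks ties so program order is kept among equal predictions.
        heapq.heappush(heap, (predictor.predict(ins, now), seq, ins))
    order = []
    while heap:
        t, _, ins = heapq.heappop(heap)
        order.append((t, ins.pc))
    return order


table = LoadDelayTable()
loop_body = [
    Instr(0x10, "load", "r1", ("r0",)),
    Instr(0x14, "alu", "r2", ("r1",)),   # depends on the load
    Instr(0x18, "alu", "r3", ("r4",)),   # independent of the load
]
first = schedule(loop_body, IssueTimePredictor(table))
table.update(0x10, observed_latency=10)  # the load actually took 10 cycles
second = schedule(loop_body, IssueTimePredictor(table))
```

In the first pass the dependent instruction is predicted to issue at cycle 2, based on the static load latency alone. After the table learns that the load took 10 cycles, the second pass predicts cycle 10 for the same instruction, so the independent instruction is still ordered ahead of it with a more realistic gap.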


