Freeway to Memory Level Parallelism in Slice-Out-of-Order Cores

01/03/2022
by   Rakesh Kumar, et al.

Exploiting memory-level parallelism (MLP) is crucial to hide long memory and last-level cache access latencies. While out-of-order (OoO) cores, and techniques building on them, are effective at exploiting MLP, they deliver poor energy efficiency due to their complex and energy-hungry hardware. This work revisits slice-out-of-order (sOoO) cores as an energy-efficient alternative for MLP exploitation. sOoO cores achieve energy efficiency by constructing slices of MLP-generating instructions and executing them out-of-order only with respect to the rest of the instructions; the slices and the remaining instructions, by themselves, execute in-order. However, we observe that existing sOoO cores miss significant MLP opportunities due to their dependence-oblivious in-order slice execution, which causes dependent slices to frequently block MLP generation. To boost MLP generation, we introduce Freeway, a sOoO core based on a new dependence-aware slice execution policy that tracks dependent slices and keeps them from blocking subsequent independent slices and MLP extraction. The proposed core incurs minimal area and power overheads, yet approaches the MLP benefits of fully OoO cores. Our evaluation shows that Freeway delivers 12% better performance than the state-of-the-art sOoO core and is within 7% of the MLP limits of full OoO execution.
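
The blocking effect described above can be illustrated with a toy model. The sketch below is not the paper's microarchitecture: it assumes one slice issues per cycle, every slice is a single load with a fixed 100-cycle latency, and a dependent slice may issue only after its producer's data returns. The names `finish_cycle`, `parked`, and `MISS_LATENCY` are hypothetical, and the `parked` list is only a stand-in for whatever structure Freeway uses to set dependent slices aside.

```python
from typing import Dict, List, Optional

MISS_LATENCY = 100  # assumed latency, in cycles, of every long-latency load


def finish_cycle(deps: List[Optional[int]], dependence_aware: bool) -> int:
    """Return the cycle at which the last slice's load completes.

    deps[i] is the index of the earlier slice that slice i depends on,
    or None if slice i is independent. Requires deps[i] < i.
    """
    if not deps:
        return 0
    done_at: Dict[int, int] = {}      # slice index -> cycle its data returns
    main = list(range(len(deps)))     # in-order slice queue
    parked: List[int] = []            # dependent slices set aside (toy stand-in)
    cycle = 0

    def ready(i: int) -> bool:
        d = deps[i]
        return d is None or (d in done_at and done_at[d] <= cycle)

    while main or parked:
        # A parked slice whose producer has returned issues first, oldest first.
        issued = next((i for i in parked if ready(i)), None)
        if issued is not None:
            parked.remove(issued)
        else:
            if dependence_aware:
                # Move blocked slices out of the way of independent ones.
                while main and not ready(main[0]):
                    parked.append(main.pop(0))
            # Dependence-oblivious mode simply stalls on a blocked head.
            if main and ready(main[0]):
                issued = main.pop(0)
        if issued is not None:
            done_at[issued] = cycle + MISS_LATENCY
        cycle += 1
    return max(done_at.values())


# Two independent chains of two dependent loads each: 0 -> 1 and 2 -> 3.
chains = [None, 0, None, 2]
in_order = finish_cycle(chains, dependence_aware=False)
aware = finish_cycle(chains, dependence_aware=True)
```

In this model the in-order queue stalls behind slice 1 until slice 0 returns, so the second chain cannot start its miss early and the two chains serialize. With dependence-aware parking, slice 2 issues right behind slice 0 and the two misses overlap, which is the MLP the abstract says dependence-oblivious execution leaves on the table.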
