On-Chip Mechanisms to Reduce Effective Memory Access Latency

09/01/2016
by Milad Hashemi, et al.

This dissertation develops hardware that automatically reduces the effective latency of accessing memory in both single-core and multi-core systems. To accomplish this, the dissertation shows that all last-level cache misses can be separated into two categories: dependent cache misses and independent cache misses. Independent cache misses have all of the source data required to generate the address of the memory access available on-chip, while dependent cache misses depend on data that is located off-chip. The dissertation proposes accelerating dependent cache misses by migrating the dependence chain that generates the address of the memory access to the memory controller for execution. Independent cache misses are accelerated using a new mode for runahead execution that executes only filtered dependence chains. With these mechanisms, this dissertation demonstrates a 62% increase in performance and a 19% decrease in effective memory access latency for a quad-core processor on a set of high memory intensity workloads.
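
The distinction between dependent and independent cache misses can be illustrated with a small software model. The sketch below is not the hardware mechanism from the dissertation; it is a minimal illustration that assumes a dependence chain can be represented as a map from each register to the operation that produces it. The names `Op`, `classify_miss`, `producers`, and `pending_off_chip` are hypothetical and chosen only for this example.

```python
from dataclasses import dataclass
from typing import Dict, Set, Tuple


@dataclass(frozen=True)
class Op:
    """Hypothetical micro-op: writes `dest` using the registers in `sources`."""
    dest: str
    sources: Tuple[str, ...]


def classify_miss(address_reg: str,
                  producers: Dict[str, Op],
                  pending_off_chip: Set[str]) -> str:
    """Classify a cache miss by walking the chain that computes its address.

    A miss is 'dependent' if any register in the transitive dependence chain
    of its address is still waiting on data from off-chip memory. Otherwise
    every source is available on-chip and the miss is 'independent'.
    """
    worklist = [address_reg]
    seen = set()
    while worklist:
        reg = worklist.pop()
        if reg in seen:
            continue
        seen.add(reg)
        if reg in pending_off_chip:
            return "dependent"
        op = producers.get(reg)
        if op is not None:
            # Follow the sources of the op that produced this register.
            worklist.extend(op.sources)
    return "independent"


# Pointer chasing: r1 is loaded from memory and is still outstanding,
# and r2 = r1 + 8 is the address of the next load.
# Array indexing: r3 = r4 + r5, with r4 and r5 already on-chip.
producers = {
    "r2": Op("r2", ("r1",)),
    "r3": Op("r3", ("r4", "r5")),
}
pending_off_chip = {"r1"}

pointer_chase = classify_miss("r2", producers, pending_off_chip)
array_access = classify_miss("r3", producers, pending_off_chip)
print(pointer_chase, array_access)
```

In this model the pointer-chasing load is classified as dependent, because its address cannot be computed until `r1` returns from memory, while the array access is independent, because its address can be generated from on-chip values alone.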

research
05/20/2017

Cache Hierarchy Optimization

Power consumption, off-chip memory bandwidth, chip area and Network on C...
research
12/09/2020

Virtual-Link: A Scalable Multi-Producer, Multi-Consumer Message Queue Architecture for Cross-Core Communication

Cross-core communication is increasingly a bottleneck as the number of p...
research
12/31/2020

Data Criticality in Multi-Threaded Applications: An Insight for Many-Core Systems

Multi-threaded applications are capable of exploiting the full potential...
research
11/23/2021

NUMAscope: Capturing and Visualizing Hardware Metrics on Large ccNUMA Systems

Cache-coherent non-uniform memory access (ccNUMA) systems enable paralle...
research
12/04/2017

Data Cache Prefetching with Perceptron Learning

Cache prefetcher greatly eliminates compulsory cache misses, by fetching...
research
10/09/2018

Studies on the energy and deep memory behaviour of a cache-oblivious, task-based hyperbolic PDE solver

We study the performance behaviour of a seismic simulation using the Exa...
research
07/26/2016

Uber: Utilizing Buffers to Simplify NoCs for Hundreds-Cores

Approaching ideal wire latency using a network-on-chip (NoC) is an impor...
