nanoBench: A Low-Overhead Tool for Running Microbenchmarks on x86 Systems

11/08/2019
by   Andreas Abel, et al.

We present nanoBench, a tool for evaluating small microbenchmarks using hardware performance counters on Intel and AMD x86 systems. Most existing tools and libraries are intended either to benchmark entire programs or to benchmark program segments in the context of their execution within a larger program. In contrast, nanoBench is specifically designed to evaluate small, isolated pieces of code. Such code is common in microbenchmark-based hardware analysis techniques. Unlike previous tools, nanoBench can execute microbenchmarks directly in kernel space. This makes it possible to benchmark privileged instructions, and it enables more accurate measurements. The code that reads the performance counters is implemented with minimal overhead, avoiding function calls and branches. As a consequence, nanoBench is precise enough to measure individual memory accesses. We illustrate the utility of nanoBench by means of two case studies. First, we briefly discuss how nanoBench has been used to determine the latency, throughput, and port usage of more than 12,000 instruction variants on recent x86 processors. Second, we show how to generate microbenchmarks to precisely characterize the cache architectures of ten Intel Core microarchitectures. This includes the most comprehensive analysis of the employed cache replacement policies to date.
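To make the second case study more concrete, the following is a minimal sketch of why per-access hit/miss measurements can tell replacement policies apart. It is not nanoBench code and not the method from the paper: it is a plain software simulation of a single cache set, and the function name `count_hits`, the 4-way associativity, and the access sequence are all assumptions chosen for illustration. The idea is that the same access sequence yields a different number of hits under LRU than under FIFO, so a tool that can observe individual hits and misses can distinguish the two.

```python
from collections import OrderedDict


def count_hits(policy, associativity, accesses):
    """Simulate one cache set (hypothetical helper) and return the hit count.

    policy: "LRU" or "FIFO"
    associativity: number of blocks the set can hold
    accesses: sequence of block identifiers that map to this set
    """
    if policy not in ("LRU", "FIFO"):
        raise ValueError("unsupported policy: " + policy)

    cache_set = OrderedDict()  # first entry is the next eviction candidate
    hits = 0
    for block in accesses:
        if block in cache_set:
            hits += 1
            if policy == "LRU":
                # A hit refreshes the block's recency under LRU only.
                cache_set.move_to_end(block)
        else:
            if len(cache_set) >= associativity:
                # Evict the least recently used (LRU) or oldest inserted (FIFO).
                cache_set.popitem(last=False)
            cache_set[block] = True
    return hits


# Assumed example: a 4-way set and a sequence that re-touches block A
# before a fifth block forces an eviction.
accesses = ["A", "B", "C", "D", "A", "E", "A"]
lru_hits = count_hits("LRU", 4, accesses)
fifo_hits = count_hits("FIFO", 4, accesses)
print("LRU hits:", lru_hits, "FIFO hits:", fifo_hits)
```

Under LRU, the second access to A protects it from eviction, so the final access to A hits; under FIFO, A is still the oldest inserted block and is evicted when E arrives. Policies used in real processors can be considerably more complex than these two textbook cases.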


research
10/10/2018

uops.info: Characterizing Latency, Throughput, and Port Usage of Instructions on Intel Microarchitectures

Modern microarchitectures are some of the world's most complex man-made ...
research
09/04/2018

Automated Instruction Stream Throughput Prediction for Intel and AMD Microarchitectures

An accurate prediction of scheduling and execution of instruction stream...
research
09/05/2020

Hierarchical Roofline Analysis: How to Collect Data using Performance Tools on Intel CPUs and NVIDIA GPUs

This paper surveys a range of methods to collect necessary performance d...
research
09/14/2020

Simplex: Repurposing Intel Memory Protection Extensions for Information Hiding

With the rapid increase in software exploits, the last few decades have ...
research
01/13/2022

MCAD: Beyond Basic-Block Throughput Estimation Through Differential, Instruction-Level Tracing

Estimating instruction-level throughput is critical for many application...
research
10/02/2019

Base64 encoding and decoding at almost the speed of a memory copy

Many common document formats on the Internet are text-only such as email...
research
08/05/2021

Accelerating XOR-based Erasure Coding using Program Optimization Techniques

Erasure coding (EC) affords data redundancy for large-scale systems. XOR...
