An analysis of core- and chip-level architectural features in four generations of Intel server processors

02/24/2017
by Johannes Hofmann et al.

This paper surveys architectural features of four generations of Intel server processors (Sandy Bridge, Ivy Bridge, Haswell, and Broadwell), with a focus on the performance of floating-point workloads. Starting at the core level and moving down the memory hierarchy, we cover instruction throughput for floating-point instructions, the L1 cache, address generation capabilities, the core clock speed and its limitations, L2 and L3 cache bandwidth and latency, the impact of Cluster on Die (CoD) and cache snoop modes, and the Uncore clock speed. Using microbenchmarks, we study the influence of these factors on code performance. These insights can then serve as input for analytic performance models. We show that the energy efficiency of the LINPACK and HPCG benchmarks can be improved considerably by tuning the Uncore clock speed without sacrificing performance, and that Graph500 benchmark performance may profit from a suitable choice of cache snoop mode.
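
The energy-efficiency claim above rests on simple arithmetic: if runtime is unchanged and average power drops, energy to solution drops by the same factor. The following is a minimal sketch of that bookkeeping, not the authors' measurement method. The function names and all numbers are hypothetical placeholders chosen for illustration; none of them are taken from the paper.

```python
def energy_to_solution(avg_power_watts: float, runtime_seconds: float) -> float:
    """Energy in joules: average power (W) multiplied by runtime (s)."""
    return avg_power_watts * runtime_seconds


def relative_energy_saving(baseline_joules: float, tuned_joules: float) -> float:
    """Fraction of the baseline energy saved by the tuned configuration."""
    return (baseline_joules - tuned_joules) / baseline_joules


# Hypothetical placeholder measurements (NOT results from the paper):
# the same benchmark run at two Uncore clock settings with equal runtime.
baseline = energy_to_solution(avg_power_watts=120.0, runtime_seconds=100.0)
tuned = energy_to_solution(avg_power_watts=105.0, runtime_seconds=100.0)

saving = relative_energy_saving(baseline, tuned)
print(f"baseline: {baseline:.0f} J, tuned: {tuned:.0f} J, saving: {saving:.1%}")
```

With real data, the power and runtime values would come from hardware energy counters or an external power meter, and the comparison only supports a "no performance loss" claim when the runtimes of the two configurations match.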

research
06/15/2018

AVX-512 extension to OpenQCD 1.6

We publish an extension of openQCD-1.6 with AVX-512 vector instructions ...
research
09/12/2016

An ECM-based energy-efficiency optimization approach for bandwidth-limited streaming kernels on recent Intel Xeon processors

We investigate an approach that uses low-level analysis and the executio...
research
10/19/2020

Evaluating the Cost of Atomic Operations on Modern Architectures

Atomic operations (atomics) such as Compare-and-Swap (CAS) or Fetch-and-...
research
08/14/2020

Manticore: A 4096-core RISC-V Chiplet Architecture for Ultra-efficient Floating-point Computing

Data-parallel problems demand ever growing floating-point (FP) operation...
research
03/04/2022

AgileWatts: An Energy-Efficient CPU Core Idle-State Architecture for Latency-Sensitive Server Applications

User-facing applications running in modern datacenters exhibit irregular...
research
04/07/2022

Memory Performance of AMD EPYC Rome and Intel Cascade Lake SP Server Processors

Modern processors, in particular within the server segment, integrate mo...
research
10/02/2019

Base64 encoding and decoding at almost the speed of a memory copy

Many common document formats on the Internet are text-only such as email...
