AgileWatts: An Energy-Efficient CPU Core Idle-State Architecture for Latency-Sensitive Server Applications

03/04/2022
by Jawad Haj-Yahya, et al.

User-facing applications running in modern datacenters exhibit irregular request patterns and are implemented using a multitude of services with tight latency requirements. These characteristics render existing energy-conserving techniques ineffective when processors are idle, because of the long transition time from a deep idle power state (C-state). While prior works propose management techniques to mitigate this inefficiency, we tackle it at its root with AgileWatts (AW): a new deep C-state architecture optimized for datacenter server processors targeting latency-sensitive applications. AW is based on three key ideas. First, AW eliminates the latency overhead of saving/restoring the core context (i.e., micro-architectural state) when powering the core off/on in a deep idle power state by i) implementing medium-grained power-gates, carefully distributed across the CPU core, and ii) retaining context in the power-ungated domain. Second, AW eliminates the flush latency overhead (several tens of microseconds) of the L1/L2 caches when entering a deep idle power state by keeping L1/L2 cache content power-ungated. Minimal control logic also remains power-ungated to serve cache coherence traffic (i.e., snoops) seamlessly. AW implements a sleep mode in the caches to reduce cache leakage power and lowers the core voltage to the minimum operational voltage level to minimize the leakage power of the power-ungated domain. Third, using a state-of-the-art power-efficient all-digital phase-locked loop (ADPLL) clock generator, AW keeps the PLL active and locked during the idle state, further cutting precious microseconds of wake-up latency at a negligible power cost. Our evaluation with an accurate simulator calibrated against an Intel Skylake server shows that AW reduces the energy consumption of Memcached by up to 71 (35
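
To illustrate why transition latency decides whether a deep C-state is usable at all, here is a minimal Python sketch of a generic idle-state selection rule: pick the lowest-power state whose exit latency fits the latency budget and whose break-even residency fits the predicted idle time. This is not the paper's mechanism or its simulator. The `IdleState` type, the `pick_state` helper, the state names, and every number below are hypothetical placeholders chosen only for illustration.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class IdleState:
    """Hypothetical description of one core idle state."""
    name: str
    power_w: float              # power drawn while resident in the state
    exit_latency_us: float      # time needed to resume execution after wake-up
    target_residency_us: float  # minimum idle time for which entry pays off


def pick_state(states: Sequence[IdleState],
               predicted_idle_us: float,
               latency_budget_us: float) -> Optional[IdleState]:
    """Return the lowest-power state that satisfies both constraints,
    or None if no state qualifies (the core then simply stays active)."""
    eligible = [
        s for s in states
        if s.exit_latency_us <= latency_budget_us
        and s.target_residency_us <= predicted_idle_us
    ]
    if not eligible:
        return None
    return min(eligible, key=lambda s: s.power_w)


# Placeholder values, NOT measurements from the paper.
SHALLOW = IdleState("shallow", power_w=1.0, exit_latency_us=2.0,
                    target_residency_us=2.0)
DEEP = IdleState("deep", power_w=0.1, exit_latency_us=100.0,
                 target_residency_us=300.0)
FAST_DEEP = IdleState("fast_deep", power_w=0.2, exit_latency_us=5.0,
                      target_residency_us=20.0)

BASELINE = [SHALLOW, DEEP]
WITH_FAST_DEEP = [SHALLOW, DEEP, FAST_DEEP]

# A short idle period (50 us) under a tight latency budget (10 us).
baseline_choice = pick_state(BASELINE, predicted_idle_us=50.0,
                             latency_budget_us=10.0)
fast_choice = pick_state(WITH_FAST_DEEP, predicted_idle_us=50.0,
                         latency_budget_us=10.0)
```

With these placeholder numbers, the conventional deep state is ruled out for short idle periods, so only the shallow state is eligible. Adding a deep state with a short exit latency, which is the kind of state the abstract argues for, makes a low-power option eligible under the same constraints.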


