BOPS, Not FLOPS! A New Metric and Roofline Performance Model For Datacenter Computing

by Lei Wang, et al.

The past decades have witnessed FLOPS (Floating-point Operations per Second) serve as an important computation-centric performance metric. However, for datacenter (in short, DC) computing workloads, such as Internet services or big data analytics, previous work reports extremely low floating-point operation intensity: the average FLOPS efficiency is only 0.1%, while the average IPC is 1.3 (the theoretical IPC is 4 on the Intel Xeon E5600 platform). Furthermore, we reveal that the traditional FLOPS-based Roofline performance model is not suitable for modern DC workloads and gives misleading information for system optimization. These observations imply that FLOPS is inappropriate for evaluating DC computer systems. To address this issue, we propose a new computation-centric metric, BOPs (Basic OPerations), that measures the efficient work defined by the source code; it includes floating-point operations as well as the arithmetic, logical, comparison, and array-addressing parts of integer operations. We define BOPS as the average number of BOPs per second, and propose replacing FLOPS with BOPS to measure DC computer systems. On the basis of BOPS, we propose a new Roofline performance model for DC computing, which we call the DC-Roofline model, with which we optimize DC workloads with the improvement varying from 119%
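The DC-Roofline model described above follows the classic Roofline structure: attainable performance is bounded by the lesser of the compute roof and the memory-bandwidth roof, with the compute roof expressed in BOPS rather than FLOPS. A minimal sketch of that bound, with made-up hardware numbers (the paper's actual roofs and workload intensities are not given in this abstract):

```python
def dc_roofline_attainable(peak_bops, peak_bandwidth_bytes_per_s, intensity_bops_per_byte):
    """Roofline-style upper bound on attainable performance, in BOPS.

    Classic Roofline bound: min(compute roof, bandwidth roof), where the
    bandwidth roof is memory bandwidth times operational intensity.
    Here intensity is measured in BOPs per byte instead of FLOPs per byte.
    """
    return min(peak_bops, peak_bandwidth_bytes_per_s * intensity_bops_per_byte)

# Hypothetical machine: 200 GBOPS compute peak, 50 GB/s memory bandwidth.
# A workload at 1 BOPs/byte is bandwidth-bound; at 8 BOPs/byte it hits
# the compute roof (the ridge point here is 200/50 = 4 BOPs/byte).
low_intensity = dc_roofline_attainable(200e9, 50e9, 1.0)   # 50 GBOPS
high_intensity = dc_roofline_attainable(200e9, 50e9, 8.0)  # 200 GBOPS
print(low_intensity, high_intensity)
```

The ridge point (peak BOPS divided by peak bandwidth) tells an optimizer whether a DC workload should target memory traffic or compute efficiency, which is the kind of guidance the abstract says a FLOPS-based roofline gets wrong for low-FP-intensity workloads.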


BOPS, Not FLOPS! A New Metric, Measuring Tool, and Roofline Performance Model For Datacenter Computing

The past decades witness FLOPS (Floating-point Operations per Second), a...

FLInt: Exploiting Floating Point Enabled Integer Arithmetic for Efficient Random Forest Inference

In many machine learning applications, e.g., tree-based ensembles, float...

End-to-End DNN Training with Block Floating Point Arithmetic

DNNs are ubiquitous datacenter workloads, requiring orders of magnitude ...

NetFC: enabling accurate floating-point arithmetic on programmable switches

In-network computation has been widely used to accelerate data-intensive...

Numerical analysis of Givens rotation

Generating 2-by-2 unitary matrices in floating-precision arithmetic is a...

FFT Convolutions are Faster than Winograd on Modern CPUs, Here is Why

Winograd-based convolution has quickly gained traction as a preferred ap...

Measuring the Algorithmic Efficiency of Neural Networks

Three factors drive the advance of AI: algorithmic innovation, data, and...
