Exploring the Behavior of Coherent Accelerator Processor Interface (CAPI) on IBM Power8+ Architecture and FlashSystem 900

09/12/2019
by Kaushik Velusamy, et al.

The Coherent Accelerator Processor Interface (CAPI) is a general term for the infrastructure that provides a high-throughput, low-latency path to the flash storage connected to an IBM POWER8+ system. The CAPI accelerator card is attached coherently as a peer to the POWER8+ processor, which removes the overhead and complexity of the IO subsystem and allows the accelerator to operate as part of an application. In this paper, we present the results of experiments on the IBM FlashSystem 900 (FS900) with a CAPI accelerator card, using the "CAPI-Flash IBM Data Engine for NoSQL Software" library. This library gives the application direct access to the underlying flash storage through user-space APIs for managing and accessing data in flash, offloading kernel IO driver functionality to dedicated CAPI FPGA accelerator hardware. We conducted experiments to analyze the performance of FS900 with the CAPI accelerator card through the Key-Value Layer APIs, using NASA's MODIS Land Surface Reflectance dataset as a large-dataset use case. We performed read and write operations on datasets ranging in size from 1 MB to 3 TB while varying the number of threads. We then compared this performance with that of other heterogeneous storage and memory devices, namely NVM, SSD, and RAM, without the CAPI accelerator, in synchronous and asynchronous file IO modes. The results indicate that FS900 with CAPI, together with the metadata cache in RAM, delivers the highest IO/s and OP/s for read operations; this exceeds the performance of RAM alone while using fewer CPU resources. Among FS900, SSD, and NVM, FS900 had the highest write IO/s. Another important observation is that when the input dataset exceeds the capacity of RAM and the data access pattern is non-uniform and sparse, FS900 with CAPI is a cost-effective alternative.
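
As a rough illustration of the user-space access model described above, the C sketch below shows how an application might create a store, write a key-value pair, and read it back through a key-value interface in the style of the library's ARK API. The header name, function prototypes (ark_create, ark_set, ark_get, ark_delete), device path, and flag value are assumptions modeled on the open-source capiflash key-value layer rather than a verbatim copy of the documented interface; check them against the arkdb.h header shipped with the software.

/*
 * Minimal sketch of user-space key-value access in the style of the
 * CAPI-Flash ("IBM Data Engine for NoSQL") key-value layer.
 * NOTE: function names, prototypes, and constants here are assumptions
 * modeled on the open-source capiflash ARK interface (arkdb.h); verify
 * them against the header shipped with the library before use.
 */
#include <stdio.h>
#include <string.h>
#include <stdint.h>

#include "arkdb.h"   /* assumed header exposing the ARK key-value API */

int main(void)
{
    ARK *ark = NULL;
    int64_t res = 0;

    /* Assumed CAPI flash device path; a regular file path may also be
     * usable for testing without CAPI hardware (assumption). */
    char store_path[] = "/dev/cxl/afu0.0s";

    /* Open (or create) a store backed by the CAPI-attached flash;
     * 0 stands in for default flags (assumption). */
    if (ark_create(store_path, &ark, 0) != 0) {
        fprintf(stderr, "ark_create failed\n");
        return 1;
    }

    const char *key = "granule:MOD09GA.A2019001";
    const char *val = "surface-reflectance tile metadata";

    /* Store the key-value pair directly via the user-space API,
     * bypassing the kernel block IO path. */
    if (ark_set(ark, strlen(key), (void *)key,
                strlen(val) + 1, (void *)val, &res) != 0) {
        fprintf(stderr, "ark_set failed\n");
        ark_delete(ark);
        return 1;
    }

    char buf[256];
    /* Read the value back; res returns the stored value length. */
    if (ark_get(ark, strlen(key), (void *)key,
                sizeof(buf), buf, 0, &res) == 0) {
        printf("read back %lld bytes: %s\n", (long long)res, buf);
    }

    ark_delete(ark);   /* close the store handle */
    return 0;
}

Because the request path stays in user space and the operations are serviced by the CAPI-attached FPGA, no kernel IO driver work is done per operation, which is the mechanism the abstract credits for higher IO/s at lower CPU utilization.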


