Improving Block-level Efficiency with scsi-mq

04/28/2015
by Blake Caldwell, et al.

Current-generation solid-state storage devices are exposing new bottlenecks in the SCSI and block layers of the Linux kernel, where IO throughput is limited by lock contention, inefficient interrupt handling, and poor memory locality. To address these limitations, the Linux kernel block layer underwent a major rewrite with the blk-mq project to move from a single request queue to a multi-queue model. The Linux SCSI subsystem rework to make use of this new model, known as scsi-mq, has been merged into the Linux kernel, and work is underway for dm-multipath support in the upcoming Linux 4.0 kernel. These pieces were necessary to make use of the multi-queue block layer in a Lustre parallel filesystem with high availability requirements. We undertook adding support for the 3.18 kernel to Lustre with scsi-mq and dm-multipath patches to evaluate the potential of these efficiency improvements. In this paper we evaluate the block-level performance of scsi-mq with backing storage hardware representative of an HPC-targeted Lustre filesystem. Our findings show that SCSI write request latency is reduced by as much as 13.6%. Profiling the CPU usage of our prototype Lustre filesystem, we found that CPU idle time increased by a factor of 7 with Linux 3.18 and blk-mq as compared to a standard 2.6.32 Linux kernel. Our findings demonstrate increased efficiency of the multi-queue block layer even with the disk-based caching storage arrays used in existing parallel filesystems.

research
03/08/2023

B-Treaps Revised: Write Efficient Randomized Block Search Trees with High Load

Uniquely represented data structures represent each logical state with a...
research
02/25/2021

BPF for storage: an exokernel-inspired approach

The overhead of the kernel storage path accounts for half of the access ...
research
10/17/2022

RIO: Order-Preserving and CPU-Efficient Remote Storage Access

Modern NVMe SSDs and RDMA networks provide dramatically higher bandwidth...
research
08/17/2020

CARGO : Context Augmented Critical Region Offload for Network-bound datacenter Workloads

Network bound applications, like a database server executing OLTP querie...
research
09/08/2022

Kernel-Segregated Transpose Convolution Operation

Transpose convolution has shown prominence in many deep learning applica...
research
11/12/2018

Transkernel: Bridging Monolithic Kernels to Peripheral Cores

Smart devices see a large number of ephemeral tasks driven by background...
research
02/22/2013

LFTL: A multi-threaded FTL for a Parallel IO Flash Card under Linux

New PCI-e flash cards and SSDs supporting over 100,000 IOPs are now avai...