Optimizing ROOT IO For Analysis

11/07/2017
by Brian Bockelman, et al.

The ROOT I/O (RIO) subsystem is foundational to most HEP experiments: it provides a file format, a set of APIs and semantics, and a reference implementation in C++. It often sits at the base of an experiment's framework, where it is used to serialize the experiment's data; for an LHC experiment, this may amount to hundreds of petabytes of files. Individual physicists further use RIO for their end-stage analysis, reading from intermediate files they generate from experiment data. RIO must therefore be highly flexible: it has to serve both as a file format for archival (optimized for space) and as a format for working data (optimized for read speed). To date, most of the technical work has focused on the archival use case. We present work designed to improve RIO for analysis. We analyze the real-world impact of using LZ4 to decrease decompression times, and the corresponding cost in disk space. We introduce new APIs that read RIO data in bulk, removing the per-event overhead of a C++ function call. We compare their performance with the existing RIO APIs for simply structured data and show how this work can be complementary to efforts to improve the parallelism of the RIO stack.
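To make the per-event versus bulk distinction concrete, here is a minimal sketch in plain C++. It does not use ROOT and does not reproduce the APIs introduced in the paper: `ToyColumn`, `GetEntry`, `GetBulk`, `SumPerEvent`, and `SumBulk` are hypothetical names chosen for illustration, and the column is assumed to hold flat `float` values that are already decompressed in memory.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical stand-in for one column of simply structured event data that
// has already been decompressed into memory. This is NOT a ROOT class.
class ToyColumn {
public:
    explicit ToyColumn(std::vector<float> values) : values_(std::move(values)) {}
    virtual ~ToyColumn() = default;

    // Per-event access: the caller pays one virtual function call per entry.
    virtual float GetEntry(std::size_t i) const { return values_[i]; }

    // Bulk access: a single call exposes the whole contiguous buffer and
    // reports how many entries it holds.
    virtual const float* GetBulk(std::size_t& count) const {
        count = values_.size();
        return values_.data();
    }

    std::size_t Size() const { return values_.size(); }

private:
    std::vector<float> values_;
};

// Event loop in the per-event style: one call into the column per entry.
double SumPerEvent(const ToyColumn& col) {
    double sum = 0.0;
    for (std::size_t i = 0; i < col.Size(); ++i) {
        sum += col.GetEntry(i);
    }
    return sum;
}

// Event loop in the bulk style: one call, then a tight loop over raw memory.
double SumBulk(const ToyColumn& col) {
    std::size_t n = 0;
    const float* data = col.GetBulk(n);
    double sum = 0.0;
    for (std::size_t i = 0; i < n; ++i) {
        sum += data[i];
    }
    return sum;
}
```

Both functions compute the same result; the difference is that `SumBulk` makes one call into the column for the entire buffer, so the inner loop contains no function calls. This sketch only illustrates the access pattern and makes no claim about the size of the effect in RIO itself.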
