Efficient Data Management in Neutron Scattering Data Reduction Workflows at ORNL

01/05/2021
by William F Godoy, et al.

Oak Ridge National Laboratory (ORNL) experimental neutron science facilities produce 1.2 TB a day of raw event-based data, stored using the standard, metadata-rich NeXus schema built on top of the HDF5 file format. The performance of several data reduction workflows is largely determined by the time spent in the loading and processing algorithms of Mantid, an open-source data analysis framework used at several neutron science facilities around the world. The present work introduces new data management algorithms to address identified input/output (I/O) bottlenecks in Mantid. First, we introduce an in-memory binary-tree metadata index that resembles NeXus data access patterns to provide a scalable search and extraction mechanism. Second, data encapsulation in Mantid algorithms is redesigned to reduce the total compute and memory runtime footprint associated with metadata I/O reconstruction tasks. Results from this work show wall-clock speedups on ORNL data reduction workflows ranging from 11% to 30%, depending on the complexity of the targeted instrument-specific data. Nevertheless, we highlight the need for more research to address reduction challenges as experimental data volumes increase.
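
As a rough illustration of the first idea, the sketch below shows one way an in-memory metadata index keyed by NeXus class might support logarithmic-time lookup and group-prefix extraction. It is not Mantid's implementation: the class name `MetadataIndex`, its methods, and the example entry paths are all hypothetical, and a sorted Python list with binary search stands in for an explicit binary tree.

```python
from bisect import bisect_left


class MetadataIndex:
    """Hypothetical index: NX_class name -> sorted list of absolute entry paths.

    Assumption: a sorted list searched with bisect stands in for the
    binary-tree structure named in the abstract.
    """

    def __init__(self):
        self._by_class = {}

    def add(self, nx_class, path):
        """Insert a path under its class, keeping the list sorted and unique."""
        paths = self._by_class.setdefault(nx_class, [])
        pos = bisect_left(paths, path)
        if pos == len(paths) or paths[pos] != path:
            paths.insert(pos, path)

    def contains(self, nx_class, path):
        """Binary-search membership test; no file access is needed."""
        paths = self._by_class.get(nx_class, [])
        pos = bisect_left(paths, path)
        return pos < len(paths) and paths[pos] == path

    def children(self, nx_class, group):
        """Return every indexed path of a class located under a group."""
        paths = self._by_class.get(nx_class, [])
        prefix = group.rstrip("/") + "/"
        start = bisect_left(paths, prefix)
        found = []
        # Paths sharing a prefix are contiguous in sorted order.
        for candidate in paths[start:]:
            if not candidate.startswith(prefix):
                break
            found.append(candidate)
        return found


# Illustrative entries only; real files would be scanned once to fill the index.
index = MetadataIndex()
index.add("NXevent_data", "/entry/bank1_events")
index.add("NXevent_data", "/entry/bank2_events")
index.add("NXlog", "/entry/DASlogs/proton_charge")
index.add("NXlog", "/entry/DASlogs/frequency")
index.add("NXlog", "/entry/DASlogsX/other")

logs = index.children("NXlog", "/entry/DASlogs")
has_bank = index.contains("NXevent_data", "/entry/bank2_events")
```

The point of such a structure is that, once built, repeated existence checks and group listings are answered from memory rather than by re-traversing the file hierarchy.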
