Efficient loading of reduced data ensembles produced at ORNL SNS/HFIR neutron time-of-flight facilities

12/01/2021
by William F Godoy, et al.

We present algorithmic improvements to the loading operations of certain reduced data ensembles produced from neutron scattering experiments at Oak Ridge National Laboratory (ORNL) facilities. Ensembles from multiple measurements are required to cover a wide range of the phase space of a sample material of interest. They are stored using the standard NeXus schema in individual HDF5 files, which poses a scalability challenge as the number of experiments stored in a single ensemble file increases. The present work follows up on our previous efforts on data management algorithms to address identified input/output (I/O) bottlenecks in Mantid, an open-source data analysis framework used across several neutron science facilities around the world. We reuse an in-memory binary-tree metadata index that resembles data access patterns to provide a scalable search and extraction mechanism. In addition, several memory operations are refactored and optimized for the current common use cases, which most frequently range from 10 to 180, and up to 360, separate measurement configurations. Results from this work show consistent speedups in wall-clock time for the Mantid LoadMD routine, ranging from 19% to 23% on average, on ORNL production computing systems. The speedup depends on the complexity of the targeted instrument-specific data and on the I/O and compute variability of the shared computational resources available to users of ORNL's Spallation Neutron Source (SNS) and High Flux Isotope Reactor (HFIR) instruments. Nevertheless, we continue to highlight the need for more research to address reduction challenges as experimental data volumes, user time, and processing costs increase.
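The idea of an in-memory metadata index can be illustrated with a minimal sketch. It is not the Mantid implementation: it swaps the binary tree for a sorted list searched with Python's `bisect` module, which gives the same logarithmic lookup. The `PathIndex` class and the `/entry/experimentN/...` paths are hypothetical names chosen for illustration only.

```python
import bisect


class PathIndex:
    """Sorted in-memory index of HDF5-style entry paths (hypothetical helper)."""

    def __init__(self, paths):
        # Sort once so that every later lookup is a binary search
        # over memory instead of a repeated scan of the file.
        self._paths = sorted(set(paths))

    def contains(self, path):
        """Return True if the exact path is in the index."""
        i = bisect.bisect_left(self._paths, path)
        return i < len(self._paths) and self._paths[i] == path

    def children(self, group):
        """Return every indexed path located below the given group."""
        prefix = group.rstrip("/") + "/"
        lo = bisect.bisect_left(self._paths, prefix)
        # "0" is the character that follows "/" in ASCII, so
        # group + "0" is an exclusive upper bound for the prefix range.
        hi = bisect.bisect_left(self._paths, prefix[:-1] + "0")
        return self._paths[lo:hi]


# Made-up entry names standing in for one ensemble file's metadata.
paths = [
    "/entry/experiment0/logs",
    "/entry/experiment0/sample",
    "/entry/experiment1/logs",
    "/entry/experiment10/logs",
    "/entry/data/signal",
]
index = PathIndex(paths)
print(index.children("/entry/experiment1"))  # ['/entry/experiment1/logs']
```

Building the index costs one pass over the file's entry names. After that, extracting the entries of any single measurement is a pair of binary searches, so lookup cost grows slowly as more experiments are added to the ensemble.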


