hepaccelerate: Fast Analysis of Columnar Collider Data

06/14/2019
by   Joosep Pata, et al.

At HEP experiments, reducing terabytes of structured numerical event data to a few statistical summaries is a common task. This step involves selecting events and objects within each event, reconstructing high-level variables, evaluating multivariate classifiers with up to hundreds of variations, and creating thousands of low-dimensional histograms. Currently, this is done using multi-step workflows and batch jobs. Based on the CMS search for H(μμ), we demonstrate that it is possible to carry out significant parts of a real collider analysis at a rate of up to a million events per second on a single multicore server with optional GPU acceleration. This is achieved by representing HEP event data as memory-mappable sparse arrays, and by expressing common analysis operations as kernels that can be parallelized across the data using multithreading. We find that only a small number of relatively simple kernels are needed to implement significant parts of this Higgs analysis. Therefore, analysis of real collider datasets of billions of events could be done within minutes to a few hours using simple multithreaded code, reducing the need to manage distributed workflows in the exploratory phase. This approach could shorten the cycle for delivering physics results at HEP experiments. We release the hepaccelerate prototype library as a demonstrator of such accelerated computational kernels. We look forward to discussion, further development, and use of efficient and easy-to-use software for terabyte-scale high-level data analysis in the physical sciences.
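To make the idea of array-based event data and kernels concrete, here is a minimal sketch in plain NumPy. It is not the hepaccelerate API: the function name `sum_in_offsets`, the offsets-plus-content layout, and the toy muon values are all assumptions chosen for illustration. The layout stores a variable number of muons per event as one flat array plus an offsets array, and the kernel loops over events independently, which is the property that would allow such a loop to be parallelized across threads.

```python
import numpy as np

# Hypothetical toy data: 3 events containing 2, 0 and 3 muons respectively.
# offsets[i]:offsets[i+1] is the slice of the flat 'pt' array owned by event i.
offsets = np.array([0, 2, 2, 5], dtype=np.int64)
pt = np.array([40.0, 25.0, 60.0, 30.0, 10.0], dtype=np.float32)


def sum_in_offsets(offsets, content, mask_content):
    """Illustrative kernel: per-event sum of the selected entries of 'content'.

    Each event writes only to its own output slot, so the outer loop has no
    cross-event dependencies and could in principle be run in parallel.
    """
    n_events = len(offsets) - 1
    out = np.zeros(n_events, dtype=content.dtype)
    for iev in range(n_events):
        for j in range(offsets[iev], offsets[iev + 1]):
            if mask_content[j]:
                out[iev] += content[j]
    return out


# Object selection: keep muons above an (assumed) 20 GeV threshold.
selected_muons = pt > 20.0

# High-level variable: scalar sum of selected muon pt in each event.
sum_pt = sum_in_offsets(offsets, pt, selected_muons)

# Event selection: require at least two selected muons per event.
n_selected = sum_in_offsets(offsets, selected_muons.astype(np.int32), selected_muons)
event_mask = n_selected >= 2

# Histogram the summary variable for the passing events.
counts, edges = np.histogram(sum_pt[event_mask], bins=4, range=(0.0, 200.0))
```

The same kernel is reused for both the sum and the count, which mirrors the abstract's point that a small number of simple kernels can cover much of an analysis. A pure-Python loop like this is slow; it only illustrates the data layout and access pattern.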

research
10/31/2022

High-Level Event Mining: A Framework

Process mining methods often analyze processes in terms of the individua...
research
01/18/2021

E Pluribus Unum Ex Machina: Learning from Many Collider Events at Once

There have been a number of recent proposals to enhance the performance ...
research
11/16/2010

Fast GPGPU Data Rearrangement Kernels using CUDA

Many high performance-computing algorithms are bandwidth limited, hence ...
research
03/17/2020

Evolution of the ROOT Tree I/O

The ROOT TTree data format encodes hundreds of petabytes of High Energy ...
research
12/21/2022

AEStream: Accelerated event-based processing with coroutines

Neuromorphic sensors imitate the sparse and event-based communication se...
research
06/13/2023

Efficient GPU Implementation of Affine Index Permutations on Arrays

Optimal usage of the memory system is a key element of fast GPU algorith...
research
01/22/2019

Unsupervised Automated Event Detection using an Iterative Clustering based Segmentation Approach

A class of vision problems, less commonly studied, consists of detecting...
