Using Big Data Technologies for HEP Analysis

01/22/2019
by Matteo Cremonesi, et al.

The HEP community is approaching an era where the excellent performance of the particle accelerators in delivering collisions at high rates will force the experiments to record unprecedented amounts of information. The growing size of the datasets could become a limiting factor in the ability to produce scientific results in a timely and efficient manner. Recently, industry has developed new technologies and new approaches to retrieve information as quickly as possible when analyzing petabyte- and exabyte-scale datasets. Providing scientists with these modern computing tools will lead to a rethinking of the principles of data analysis in HEP, making the overall scientific process faster and smoother. In this paper, we present the latest developments and the most recent results on the usage of Apache Spark for HEP analysis. The study aims at evaluating the efficiency of these new tools both quantitatively, by measuring their performance, and qualitatively, by focusing on the user experience. The first goal is pursued by developing a data reduction facility: working together with CERN Openlab and Intel, CMS replicates a real physics search using Spark-based technologies, with the ambition of reducing 1 PB of public data collected by the CMS experiment to 1 TB of data in a format suitable for physics analysis, within 5 hours. The second goal is pursued by implementing multiple physics use cases in Apache Spark, using preprocessed datasets derived from official CMS data and simulation as input. By performing different end analyses, up to the publication-ready plots, on different hardware, the feasibility, usability, and portability of this approach are compared to those of a traditional ROOT-based workflow.
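The abstract does not include code, but the data reduction step it describes can be illustrated with a minimal PySpark sketch. Everything below is an assumption for illustration only: the dataset paths and column names (nMuon, Muon_pt, etc.) are hypothetical, and the "root" data source is assumed to be supplied by the spark-root package used in the CMS/CERN Openlab work, available on the cluster's classpath.

```python
from pyspark.sql import SparkSession
from pyspark.sql.functions import col

# Start a Spark session; the actual facility would run this on a large
# cluster rather than a single machine.
spark = SparkSession.builder.appName("cms-data-reduction").getOrCreate()

# Read CMS events stored in ROOT format. The "root" data source is
# assumed to come from the spark-root package; the path is hypothetical.
events = spark.read.format("root").load("hdfs:///cms/public/events")

# Apply a simple physics-style selection (hypothetical columns): keep
# events with at least two muons, the leading one above a pT threshold.
reduced = (events
           .filter(col("nMuon") >= 2)
           .filter(col("Muon_pt")[0] > 25.0)
           .select("run", "luminosityBlock", "event",
                   "Muon_pt", "Muon_eta", "Muon_phi"))

# Persist the reduced dataset in a columnar format suitable for the
# downstream end-analysis steps.
reduced.write.mode("overwrite").parquet("hdfs:///user/analysis/reduced")
```

Writing the output in a columnar format such as Parquet fits the abstract's goal of producing "data in a format suitable for physics analysis": the subsequent Spark-based end analyses can then read back only the columns they need.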

