hep_tables: Heterogeneous Array Programming for HEP

by Gordon Watts, et al.

Array operations are one of the most concise ways of expressing the common filtering and simple aggregation operations that are the hallmark of the first step of a particle physics analysis: selection, filtering, basic vector operations, and filling histograms. The High Luminosity run of the Large Hadron Collider (HL-LHC), scheduled to start in 2026, will require physicists to regularly skim datasets that are over a PB in size, and to repeatedly run over datasets that are hundreds of TB in size, too big to fit in memory. Declarative programming techniques are a way of separating the intent of the physicist from the mechanics of finding the data, processing it, and using distributed computing to do so efficiently enough to extract the desired plot or data in a timely fashion. This paper describes a prototype library that provides a framework for different sub-systems to cooperate in producing this data, using an array-programming declarative interface. In this prototype, a ServiceX data-delivery sub-system and an awkward array sub-system cooperate to generate the requested data. The ServiceX system runs against ATLAS xAOD data.
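
As a rough illustration of the array-programming style the abstract refers to (selection, filtering, and filling a histogram), the sketch below uses plain NumPy on small in-memory arrays. It is not the hep_tables interface, and it does not involve ServiceX or awkward array; the function name `select_and_histogram`, the cut values, and the sample data are all hypothetical and chosen only for illustration.

```python
import numpy as np


def select_and_histogram(pt, eta, pt_min=30.0, eta_max=2.5,
                         bins=5, pt_range=(0.0, 100.0)):
    """Apply a simple kinematic selection and histogram the surviving pt values.

    All names and thresholds here are illustrative assumptions,
    not part of any library described in the abstract.
    """
    # Selection expressed as a boolean mask over whole arrays (no explicit loop).
    mask = (pt > pt_min) & (np.abs(eta) < eta_max)

    # Filtering: keep only the entries that pass the selection.
    selected = pt[mask]

    # Aggregation: fill a fixed-range histogram with the selected values.
    counts, edges = np.histogram(selected, bins=bins, range=pt_range)
    return selected, counts, edges


# Hypothetical flat arrays standing in for per-object quantities.
pt = np.array([12.0, 35.0, 48.0, 75.0, 95.0, 31.0])
eta = np.array([0.1, -1.2, 2.9, 0.4, -2.4, 3.1])

selected, counts, edges = select_and_histogram(pt, eta)
print(selected)  # entries passing both cuts
print(counts)    # histogram bin contents
```

The point of the style is that the physicist states what to select and what to plot, while how the arrays are traversed is left to the underlying library.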



