Distributed Real-Time Data Stream Analysis for CTA

10/27/2020
by Katharina Morik, et al.

Once completed, the Cherenkov Telescope Array (CTA) will be able to map the gamma-ray sky over a wide energy range, from several tens of GeV to some hundreds of TeV, and will be more sensitive than previous experiments by an order of magnitude. This opens up the opportunity to observe transient phenomena such as gamma-ray bursts (GRBs) and flaring active galactic nuclei (AGN). To successfully trigger multi-wavelength observations of transients, CTA has to be able to alert other observatories as quickly as possible. Multi-wavelength observations are essential for gaining insight into the processes occurring within these sources of high-energy radiation. CTA will consist of approximately 100 telescopes of different sizes and designs. Images are streamed from all the telescopes into a central computing facility on site. During observation, CTA will produce a stream of up to 20 000 images per second. Noise suppression and feature extraction algorithms, as well as previously trained machine learning models, are applied to each image in the stream. In a traditional single-machine setting, the restricted computing power of one machine and the limits of the network's data transfer rates become bottlenecks for stream processing. We explore several distributed streaming technologies from the Apache big data ecosystem, such as Spark, Flink, and Storm, to handle the large amount of data coming from the telescopes. To share a single code base while executing on different streaming engines, we employ abstraction layers such as the streams-framework. These use a high-level language to build processing pipelines that can be transformed into the native pipelines of the different platforms. Here we present the results of our investigation and show a first prototype capable of analyzing CTA data in real time.
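
As a rough illustration of the idea described above, the following Python sketch defines the per-image steps (noise suppression, feature extraction, model application) once, as a plain list, and executes them with a simple local runner. It is not the streams-framework API and not the authors' implementation. All names (`suppress_noise`, `extract_features`, `apply_model`, `run_local`, `per_image_budget_us`), the threshold values, and the item layout are assumptions made for illustration. A distributed engine would replace `run_local` with its own execution of the same step list.

```python
from typing import Any, Callable, Dict, Iterable, Iterator, List

# Hypothetical item layout: one dict per telescope image.
Item = Dict[str, Any]
Processor = Callable[[Item], Item]


def suppress_noise(item: Item, threshold: float = 5.0) -> Item:
    """Zero out pixels below an assumed threshold (toy noise suppression)."""
    cleaned = [p if p >= threshold else 0.0 for p in item["pixels"]]
    return {**item, "pixels": cleaned}


def extract_features(item: Item) -> Item:
    """Compute two toy features: total intensity and number of surviving pixels."""
    pixels = item["pixels"]
    return {
        **item,
        "size": sum(pixels),
        "n_pixels": sum(1 for p in pixels if p > 0),
    }


def apply_model(item: Item) -> Item:
    """Stand-in for a previously trained model (here just an assumed cut on size)."""
    return {**item, "is_gamma": item["size"] > 50.0}


# The pipeline is declared once, independent of the engine that runs it.
PIPELINE: List[Processor] = [suppress_noise, extract_features, apply_model]


def run_local(stream: Iterable[Item], pipeline: List[Processor]) -> Iterator[Item]:
    """Single-machine runner: apply every step to every item, in order."""
    for item in stream:
        for step in pipeline:
            item = step(item)
        yield item


def per_image_budget_us(rate_hz: float, workers: int) -> float:
    """Average processing time available per image, in microseconds,
    if the stream is spread evenly over `workers` parallel workers."""
    return workers * 1e6 / rate_hz


if __name__ == "__main__":
    images = [{"pixels": [1.0, 10.0, 60.0]}]
    for result in run_local(images, PIPELINE):
        print(result)
    # Time budget at the stated peak rate of 20 000 images per second.
    print(per_image_budget_us(20_000, workers=1))
    print(per_image_budget_us(20_000, workers=8))
```

The budget helper makes the single-machine bottleneck concrete: at 20 000 images per second, one sequential worker has on average 50 microseconds per image for all steps, and the budget grows linearly with the number of workers only if the stream can be partitioned evenly.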
