FACT-Tools - Processing High-Volume Telescope Data

10/27/2020
by Katharina Morik, et al.

Several large experiments, such as MAGIC, FACT, VERITAS, HESS, and the upcoming CTA project, deploy high-precision Cherenkov telescopes to monitor celestial objects. The First G-APD Cherenkov Telescope (FACT) is pioneering the use of solid-state photo detectors for imaging atmospheric Cherenkov telescopes. Since October 2011, the FACT collaboration has successfully demonstrated the application and reliability of silicon photomultipliers for ground-based gamma-ray astronomy.

The amount of data collected by modern Cherenkov telescopes poses major challenges for data storage and analysis. These challenges range from domain-specific physics aspects, such as finding good filtering algorithms and parameters for background rejection, to scalability issues that require analytical software to be scaled out to large clusters of compute nodes for effective real-time analysis. Modern cluster environments, which emerged from the Big Data community, aim at distributed data storage with a strong emphasis on data locality and fault-tolerant computing. These clusters match the requirements of modern data-driven physics experiments well. However, programming them demands expert knowledge to gain the full performance advantages at the user level.

In a joint effort of physicists and computer scientists, we targeted this area of conflict using the generic streams framework, a pluggable data processing environment developed at the Collaborative Research Center SFB 876. Using streams allows for the high-level design of analytical data flows while maintaining compatibility with large-scale streaming platforms. This enables physicists to develop and test new algorithms in a local environment and deploy their solutions on modern compute clusters without adaptations. Using the streams framework, we built a processing library for designing a data pipeline for the FACT telescope. The resulting FACT Tools provide a rapid-prototyping environment for the development of any data processing stage required within FACT. The tool suite supports reading raw camera output and applying various data cleaning and feature extraction stages. The integration of popular machine learning libraries additionally supplies smart filtering of relevant events to suppress background noise. The abstract modeling of data pipelines allows for efficient data processing on large-scale clusters within the Apache Hadoop ecosystem.
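To make the processing model concrete, here is a minimal sketch of the kind of per-event abstraction the streams framework promotes: each telescope event travels through the pipeline as a generic key/value data item, and every pipeline stage implements a single process() method that reads and enriches the item. This is a simplified model; the Processor interface shown here, the SizeFeature stage, and the "photoncharge"/"size" keys are illustrative stand-ins, not the actual FACT Tools API.

import java.io.Serializable;
import java.util.HashMap;
import java.util.Map;

// Simplified model of a streams-style pipeline stage: each event is a
// key/value map, and a stage transforms one item at a time.
interface Processor {
    Map<String, Serializable> process(Map<String, Serializable> item);
}

// Hypothetical feature-extraction stage: computes the total light content
// ("size") of an event from per-pixel photon-equivalent estimates.
class SizeFeature implements Processor {
    @Override
    public Map<String, Serializable> process(Map<String, Serializable> item) {
        double[] photons = (double[]) item.get("photoncharge");
        double size = 0.0;
        for (double p : photons) {
            size += p;
        }
        item.put("size", size); // enrich the item; downstream stages see the new key
        return item;
    }
}

public class ProcessorDemo {
    public static void main(String[] args) {
        Map<String, Serializable> event = new HashMap<>();
        event.put("photoncharge", new double[] {1.2, 0.4, 3.1});
        Processor stage = new SizeFeature();
        System.out.println(stage.process(event).get("size")); // prints ~4.7
    }
}

Because each stage only sees a generic data item, the same stage can run unchanged in a local test and inside a distributed execution engine, which is the portability property the abstract emphasizes.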

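Building on the same idea, the following sketch shows how individual stages could be composed into a pipeline, including a filter stage of the kind used for background suppression. The "gammaScore" key and the 0.8 threshold are hypothetical placeholders for the output of a trained classifier, and treating a null return value as "discard this event" is a modeling choice for this sketch rather than documented FACT Tools behavior.

import java.io.Serializable;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Sketch of pipeline composition: a list of stages applied in order to each
// event, with a gamma/hadron filter that drops likely background events.
public class PipelineDemo {
    interface Processor {
        Map<String, Serializable> process(Map<String, Serializable> item);
    }

    static Map<String, Serializable> runPipeline(
            List<Processor> stages, Map<String, Serializable> item) {
        for (Processor stage : stages) {
            item = stage.process(item);
            if (item == null) {
                return null; // convention in this sketch: null discards the event
            }
        }
        return item;
    }

    public static void main(String[] args) {
        // Filter stage: keep only events that a (hypothetical) classifier
        // scored as likely gamma-ray induced showers.
        Processor gammaFilter = item -> {
            double score = (double) item.get("gammaScore");
            return score >= 0.8 ? item : null;
        };

        Map<String, Serializable> event = new HashMap<>();
        event.put("gammaScore", 0.93);

        Map<String, Serializable> result = runPipeline(List.of(gammaFilter), event);
        System.out.println(result != null ? "kept" : "dropped"); // prints "kept"
    }
}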