FACT-Tools - Processing High-Volume Telescope Data
Several large experiments, such as MAGIC, FACT, VERITAS, HESS, or the upcoming CTA project, deploy high-precision Cherenkov telescopes to monitor celestial objects. The First G-APD Cherenkov Telescope (FACT) is pioneering the use of solid-state photodetectors for imaging atmospheric Cherenkov telescopes. Since October 2011, the FACT collaboration has successfully demonstrated the application and reliability of silicon photomultipliers for ground-based gamma-ray astronomy. The amount of data collected by modern Cherenkov telescopes poses major challenges for data storage and analysis. These challenges range from domain-specific physics aspects, such as finding good filtering algorithms and parameters for background rejection, to scalability issues that require analytical software to scale to large clusters of compute nodes for effective real-time analysis.

Modern cluster environments, which emerged from the Big Data community, aim at distributed data storage with a strong emphasis on data locality and fault-tolerant computing. These clusters perfectly match the requirements of modern data-driven physics experiments. However, programming them demands expert knowledge to gain the full performance advantages at the user level. In a joint effort of physicists and computer scientists, we targeted this area of conflict using the generic streams framework, a pluggable data processing environment developed at the Collaborative Research Center SFB-876. Using streams allows for the high-level design of analytical data flows while maintaining compatibility with large-scale streaming platforms. This enables physicists to develop and test new algorithms in a local environment and deploy their solutions on modern compute clusters without adaptation.

Using the streams framework, we built a processing library for designing a data pipeline for the FACT telescope. The resulting FACT Tools provide a rapid-prototyping environment for the development of any data processing stage required within FACT. The tool suite supports reading raw camera output and applying various data cleaning and feature extraction stages. The integration of popular machine learning libraries additionally supplies smart filtering of relevant events to suppress background noise. The abstract modeling of data pipelines allows for efficient data processing on large-scale clusters within the Apache Hadoop ecosystem.
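To give a flavor of the abstraction the streams framework provides, the following is a minimal sketch of a custom feature extraction step written against the streams `Processor` interface (`stream.Processor` operating on `stream.Data` items, which behave like key-value maps). The key names `pixelAmplitudes` and `totalAmplitude` are hypothetical placeholders for illustration, not the actual FACT Tools keys.

```java
import java.io.Serializable;

import stream.Data;
import stream.Processor;

/**
 * Sketch of a streams processor: reads a per-pixel amplitude array
 * from a data item and appends its sum as a new feature. The keys
 * used here are illustrative assumptions, not FACT Tools key names.
 */
public class TotalAmplitude implements Processor {

    @Override
    public Data process(Data item) {
        Serializable value = item.get("pixelAmplitudes");
        if (!(value instanceof double[])) {
            // Leave items without the expected array untouched.
            return item;
        }
        double sum = 0.0;
        for (double amplitude : (double[]) value) {
            sum += amplitude;
        }
        item.put("totalAmplitude", sum);
        return item;
    }
}
```

Processors of this kind are wired together through the framework's declarative pipeline definitions, so the same cleaning or feature extraction step can be tested locally and later run unchanged on a compute cluster.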