The archive solution for distributed workflow management agents of the CMS experiment at LHC

01/11/2018
by Valentin Kuznetsov, et al.

The CMS experiment at the CERN LHC developed the Workflow Management Archive system to persistently store unstructured framework job report documents produced by distributed workflow management agents. In this paper we present its architecture, implementation, deployment, and integration with the CMS and CERN computing infrastructures, such as the central HDFS and the Hadoop Spark cluster. The system leverages modern technologies, such as a document-oriented database and the Hadoop ecosystem, to provide the flexibility needed to reliably process, store, and aggregate O(1M) documents per day. We describe the data transformation, the short- and long-term storage layers, the query language, and the aggregation pipeline developed to visualize performance metrics that help CMS data operators assess the performance of the CMS computing system.
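
As a rough illustration of the kind of aggregation step the abstract mentions, the following Python sketch groups job-report-like documents by day and site and sums a few metrics. It is not the system's actual pipeline or schema: the field names (`day`, `site`, `exit_code`, `cpu_seconds`), the site labels, and the function `aggregate_daily` are all hypothetical, chosen only to make the idea concrete.

```python
from collections import defaultdict


def aggregate_daily(docs):
    """Group documents by (day, site) and accumulate simple counters.

    All field names are assumptions made for this sketch; they are not
    taken from the real framework job report format.
    """
    totals = defaultdict(lambda: {"jobs": 0, "failed": 0, "cpu_seconds": 0.0})
    for doc in docs:
        bucket = totals[(doc["day"], doc["site"])]
        bucket["jobs"] += 1
        # Treat any non-zero exit code as a failed job (an assumption).
        if doc["exit_code"] != 0:
            bucket["failed"] += 1
        bucket["cpu_seconds"] += doc.get("cpu_seconds", 0.0)
    return dict(totals)


# Hypothetical sample documents, invented for illustration only.
sample_docs = [
    {"day": "2018-01-11", "site": "site_a", "exit_code": 0, "cpu_seconds": 10.0},
    {"day": "2018-01-11", "site": "site_a", "exit_code": 1, "cpu_seconds": 20.0},
    {"day": "2018-01-11", "site": "site_b", "exit_code": 0, "cpu_seconds": 5.0},
]
summary = aggregate_daily(sample_docs)
```

At the volumes the abstract cites, a grouping like this would more plausibly run as a distributed job than as an in-memory loop; the sketch only shows the shape of the reduction.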


research
10/23/2020

A Bespoke Workflow Management System for Data-Driven Urgent HPC

In this paper we present a workflow management system which permits the ...
research
03/06/2022

A Realtime Monitoring Platform for Workflow Subroutines

With the advancement in distributed computing, workflow management syste...
research
01/18/2023

A Workflow Model for Holistic Data Management and Semantic Interoperability in Quantitative Archival Research

Archival research is a complicated task that involves several diverse ac...
research
03/30/2020

A Framework for Online Investment Algorithms

The artificial segmentation of an investment management process into a w...
research
10/10/2020

Designing for Recommending Intermediate States in A Scientific Workflow Management System

To process a large amount of data sequentially and systematically, prope...
research
04/15/2018

Applying Distributed Ledgers to Manage Workflow Provenance

Sharing provenance across workflow management systems automatically is n...
research
01/22/2019

Adapting The Secretary Hiring Problem for Optimal Hot-Cold Tier Placement under Top-K Workloads

Top-K queries are an established heuristic in information retrieval. Thi...
