Flow-Bench: A Dataset for Computational Workflow Anomaly Detection

06/16/2023
by George Papadimitriou, et al.

A computational workflow, also known simply as a workflow, consists of tasks that must be executed in a specific order to attain a specific goal. In fields such as biology, chemistry, physics, and data science, among others, these workflows are often complex and are executed in large-scale, distributed, and heterogeneous computing environments that are prone to failures and performance degradations. Anomaly detection for workflows, which aims to identify unexpected behavior or errors in workflow execution, is therefore an important paradigm. This task is crucial for improving the reliability of workflow executions and must be assisted by machine learning-based techniques. However, the application of such techniques is limited, in large part, by the lack of open datasets and benchmarks. To address this gap, we make the following contributions in this paper: (1) we systematically inject anomalies and collect raw execution logs from workflows executing on distributed infrastructures; (2) we summarize the statistics of the new datasets, as well as of a set of open datasets, and provide insightful analyses; (3) we benchmark unsupervised anomaly detection techniques by converting workflows into both tabular and graph-structured data. Our findings allow us to examine the effectiveness and efficiency of the benchmark methods and to identify potential research opportunities for improvement and generalization. The dataset and benchmark code are available online under the MIT License for public use.
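To make the idea of viewing one workflow as both tabular and graph-structured data more concrete, the following is a minimal sketch, not the paper's dataset format or benchmark code. The workflow, the job names, the runtimes, and the helper functions `to_tabular`, `to_graph`, and `flag_anomalies` are all hypothetical, and the detector is a simple per-job-type modified z-score (median and MAD) standing in for the unsupervised methods the paper benchmarks.

```python
from collections import defaultdict
from statistics import median

# Hypothetical workflow: each job has a type, parent jobs, and a runtime (s).
# These names and numbers are illustrative, not taken from the dataset.
WORKFLOW = {
    "stage_in": {"type": "stage_in", "parents": [], "runtime": 12.0},
    "align_1": {"type": "align", "parents": ["stage_in"], "runtime": 101.0},
    "align_2": {"type": "align", "parents": ["stage_in"], "runtime": 98.0},
    "align_3": {"type": "align", "parents": ["stage_in"], "runtime": 305.0},
    "align_4": {"type": "align", "parents": ["stage_in"], "runtime": 103.0},
    "merge": {
        "type": "merge",
        "parents": ["align_1", "align_2", "align_3", "align_4"],
        "runtime": 20.0,
    },
}


def to_tabular(workflow):
    """Tabular view: one row of features per job, ignoring dependencies."""
    return [
        {
            "job": name,
            "type": job["type"],
            "runtime": job["runtime"],
            "n_parents": len(job["parents"]),
        }
        for name, job in workflow.items()
    ]


def to_graph(workflow):
    """Graph view: per-node feature vectors plus directed parent->child edges."""
    index = {name: i for i, name in enumerate(workflow)}
    features = [[job["runtime"]] for job in workflow.values()]
    edges = [
        (index[parent], index[name])
        for name, job in workflow.items()
        for parent in job["parents"]
    ]
    return features, edges


def flag_anomalies(rows, threshold=3.5):
    """Flag jobs whose runtime deviates strongly from peers of the same type.

    Uses the modified z-score 0.6745 * |x - median| / MAD; types whose MAD
    is zero (for example, a single job) are skipped.
    """
    by_type = defaultdict(list)
    for row in rows:
        by_type[row["type"]].append(row["runtime"])

    flagged = []
    for row in rows:
        runtimes = by_type[row["type"]]
        med = median(runtimes)
        mad = median(abs(r - med) for r in runtimes)
        if mad == 0:
            continue
        score = 0.6745 * abs(row["runtime"] - med) / mad
        if score > threshold:
            flagged.append(row["job"])
    return flagged


rows = to_tabular(WORKFLOW)
features, edges = to_graph(WORKFLOW)
flagged = flag_anomalies(rows)
```

In this sketch only the tabular rows feed the detector; a graph-based method would instead consume `features` and `edges` together, so that a job's score can depend on its neighbors in the workflow.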

