
Mining Scientific Workflows for Anomalous Data Transfers

03/22/2021
by Huy Tu, et al.

Modern scientific workflows are data-driven and are often executed on distributed, heterogeneous, high-performance computing infrastructures. Anomalies and failures in workflow execution cause loss of scientific productivity and inefficient use of the infrastructure. Hence, detecting, diagnosing, and mitigating these anomalies is immensely important for reliable and performant scientific workflows. Since these workflows rely heavily on high-performance network transfers with strict QoS constraints, accurately detecting anomalous network performance is crucial to ensuring reliable and efficient workflow execution. To address this challenge, we have developed X-FLASH, a network anomaly detection tool for faulty TCP workflow transfers. X-FLASH incorporates novel hyperparameter tuning and data mining approaches that improve the performance of machine learning algorithms in accurately classifying anomalous TCP packets. X-FLASH uses XGBoost as an ensemble model and couples it with a sequential optimizer, FLASH, borrowed from search-based software engineering, to learn the optimal model parameters. X-FLASH found configurations that outperformed the existing approach by up to 28%, 29%, and 40% (relative) in F-measure, G-score, and recall, respectively, in fewer than 30 evaluations. Given (1) the large improvement and (2) the simplicity of the tuning, we recommend that future research include a tuning study as standard practice, at least in the area of scientific workflow anomaly detection.
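To make the idea of a budgeted sequential optimizer concrete, the following is a minimal sketch, not the authors' implementation. It evaluates a small random sample of configurations, then repeatedly uses a cheap surrogate to pick the most promising untried configuration until the evaluation budget is spent. Everything named here is an assumption for illustration: the value grids in `SPACE`, the `toy_score` function (a stand-in for training XGBoost and measuring F-measure), and the nearest-neighbour surrogate, which is swapped in for the decision-tree (CART) surrogate that FLASH is usually described with, so the sketch needs only the standard library.

```python
import itertools
import random

# Hypothetical grid: keys mirror common XGBoost hyperparameter names,
# but these value ranges are illustrative assumptions, not the paper's.
SPACE = {
    "max_depth": [2, 4, 6, 8, 10],
    "learning_rate": [0.01, 0.05, 0.1, 0.3],
    "n_estimators": [50, 100, 200, 400],
}
KEYS = list(SPACE)


def decode(idx):
    """Map a tuple of grid indices to a {name: value} configuration."""
    return {k: SPACE[k][i] for k, i in zip(KEYS, idx)}


def toy_score(idx):
    """Made-up objective standing in for 'train the model, measure F-measure'.
    It peaks at max_depth=6, learning_rate=0.1, n_estimators=200."""
    cfg = decode(idx)
    penalty = (abs(cfg["max_depth"] - 6) / 8
               + abs(cfg["learning_rate"] - 0.1) / 0.29
               + abs(cfg["n_estimators"] - 200) / 350)
    return 1.0 - penalty / 3


def predict(idx, evaluated):
    """Nearest-neighbour surrogate (assumption; stands in for CART):
    average the scores of the closest already-evaluated configurations."""
    dists = {seen: sum(abs(a - b) for a, b in zip(idx, seen))
             for seen in evaluated}
    nearest = min(dists.values())
    scores = [evaluated[s] for s, d in dists.items() if d == nearest]
    return sum(scores) / len(scores)


def sequential_search(budget=30, initial=10, seed=0):
    """Spend at most `budget` real evaluations, guided by the surrogate."""
    rng = random.Random(seed)
    pool = list(itertools.product(*(range(len(SPACE[k])) for k in KEYS)))
    rng.shuffle(pool)
    # Step 1: evaluate a small random sample to seed the surrogate.
    evaluated = {idx: toy_score(idx) for idx in pool[:initial]}
    pool = pool[initial:]
    # Step 2: evaluate only the candidate the surrogate likes best.
    while len(evaluated) < budget and pool:
        candidate = max(pool, key=lambda idx: predict(idx, evaluated))
        pool.remove(candidate)
        evaluated[candidate] = toy_score(candidate)
    best = max(evaluated, key=evaluated.get)
    return decode(best), evaluated[best], len(evaluated)


if __name__ == "__main__":
    config, score, used = sequential_search()
    print(f"best config after {used} evaluations: {config} (score {score:.3f})")
```

The design point the sketch tries to show is that the expensive step (here `toy_score`, in practice a full model training run) is called only once per iteration, while the surrogate is queried across the whole remaining pool. That is what lets a search of this kind stay within a budget of a few dozen evaluations.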

03/17/2022

The Analysis of Online Event Streams: Predicting the Next Activity for Anomaly Detection

Anomaly detection in process mining focuses on identifying anomalous cas...
12/28/2020

Detecting Anomalous line-items by Modeling the Legal Case Lifecycle

Anomaly detection continues to be the subject of research and developmen...
06/04/2020

Portability of Scientific Workflows in NGS Data Analysis: A Case Study

The analysis of next-generation sequencing (NGS) data requires complex c...
02/22/2019

Bayesian Anomaly Detection and Classification

Statistical uncertainties are rarely incorporated in machine learning al...
08/31/2020

Chimbuko: A Workflow-Level Scalable Performance Trace Analysis Tool

Because of the limits input/output systems currently impose on high-perf...
08/16/2020

In-situ Workflow Auto-tuning via Combining Performance Models of Component Applications

In-situ parallel workflows couple multiple component applications, such ...
02/12/2018

client2vec: Towards Systematic Baselines for Banking Applications

The workflow of data scientists normally involves potentially inefficien...