BioWorkbench: A High-Performance Framework for Managing and Analyzing Bioinformatics Experiments

01/11/2018
by Maria Luiza Mondelli, et al.

Advances in sequencing techniques have led to exponential growth in biological data, demanding the development of large-scale bioinformatics experiments. Because these experiments are computation- and data-intensive, they require high-performance computing (HPC) techniques and can benefit from specialized technologies such as Scientific Workflow Management Systems (SWfMS) and databases. In this work, we present BioWorkbench, a framework for managing and analyzing bioinformatics experiments. The framework automatically collects provenance data, including both performance data from workflow execution and data from the scientific domain of the workflow application. Provenance data can be analyzed through a web application that abstracts a set of queries to the provenance database, simplifying access to provenance information. We evaluate BioWorkbench using three case studies: SwiftPhylo, a phylogenetic tree assembly workflow; SwiftGECKO, a comparative genomics workflow; and RASflow, a RASopathy analysis workflow. We analyze each workflow from both the computational and the scientific domain perspectives, using queries to a provenance and annotation database. Some of these queries are available as pre-built features of the BioWorkbench web application. Using the provenance data, we show that the framework is scalable and achieves high performance, reducing the execution time of the case studies by up to 98%. We also show how applying machine learning techniques can enrich the analysis process.

research
10/06/2022

WfBench: Automated Generation of Scientific Workflow Benchmarks

The prevalence of scientific workflows with high computational demands c...
research
10/10/2020

Designing for Recommending Intermediate States in A Scientific Workflow Management System

To process a large amount of data sequentially and systematically, prope...
research
08/17/2023

Towards Lightweight Data Integration using Multi-workflow Provenance and Data Observability

Modern large-scale scientific discovery requires multidisciplinary colla...
research
03/06/2022

Managing Complex Workflows in Bioinformatics - An Interactive Toolkit with GPU Acceleration

Bioinformatics research continues to advance at an increasing scale with...
research
02/14/2019

Theory-plus-code documentation of the DEPAM workflow for soundscape description

In the Big Data era, the community of PAM faces strong challenges, inclu...
research
06/19/2023

DFlow: Efficient Dataflow-based Invocation Workflow Execution for Function-as-a-Service

The Serverless Computing is becoming increasingly popular due to its eas...
research
07/25/2018

PaPaS: A Portable, Lightweight, and Generic Framework for Parallel Parameter Studies

The current landscape of scientific research is widely based on modeling...
