Managing Complex Workflows in Bioinformatics - An Interactive Toolkit with GPU Acceleration

by Dulani Meedeniya, et al.

Bioinformatics research continues to advance at an increasing scale with the help of techniques such as next-generation sequencing and the availability of tool support to automate bioinformatics processes. With this growth, biological data accumulates at an unprecedented rate, demanding high-performance and high-throughput computing technologies to process such datasets. Hardware accelerators, such as graphics processing units (GPUs), and distributed computing accelerate the processing of big data in high-performance computing environments. They enable higher degrees of parallelism, thereby increasing throughput. In this paper, we introduce BioWorkflow, an interactive workflow management system that automates bioinformatics analyses and can schedule parallel tasks using GPU-accelerated and distributed computing. This paper describes a case study carried out to evaluate the performance of a complex workflow with branching executed by BioWorkflow. The results indicate a speed-up of ×2.89 from utilizing GPUs and an average speed-up of ×2.832 (over n = 5 scenarios) from parallel execution of graph nodes during multiple sequence alignment calculations. A combined speed-up of ×1.71 is achieved for complex workflows. This confirms the expected higher speed-ups from parallelism through GPU acceleration and concurrent execution of workflow nodes compared with mainstream sequential workflow execution. The tool also provides a comprehensive user interface with better interactivity for managing complex workflows; a system usability scale score of 82.9 confirms the high usability of the system.
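The idea of running independent workflow nodes concurrently can be illustrated with a minimal Python sketch. This is not BioWorkflow's implementation: the function `run_workflow`, the task names, and the toy sequences are all hypothetical, chosen only to show how a branching workflow graph lets two nodes that depend on the same upstream node run at the same time.

```python
from concurrent.futures import ThreadPoolExecutor


def run_workflow(tasks, deps, max_workers=4):
    """Run a workflow graph, executing mutually independent nodes concurrently.

    tasks: node name -> callable taking a dict of its prerequisites' results
    deps:  node name -> list of prerequisite node names (hypothetical format)
    """
    results = {}
    remaining = set(tasks)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        while remaining:
            # A node is ready once all of its prerequisites have results.
            ready = sorted(
                n for n in remaining
                if all(d in results for d in deps.get(n, []))
            )
            if not ready:
                raise ValueError("cycle or missing dependency in workflow")
            # Submit every ready node at once; they run in parallel.
            futures = {
                n: pool.submit(tasks[n], {d: results[d] for d in deps.get(n, [])})
                for n in ready
            }
            for name, future in futures.items():
                results[name] = future.result()
            remaining -= set(ready)
    return results


# Toy branching workflow: one loader feeds two independent analysis nodes,
# whose outputs are merged by a final node.
tasks = {
    "load": lambda inp: ["ACGT", "ACGA", "TCGT"],
    "gc_content": lambda inp: [
        (s.count("G") + s.count("C")) / len(s) for s in inp["load"]
    ],
    "lengths": lambda inp: [len(s) for s in inp["load"]],
    "report": lambda inp: list(zip(inp["lengths"], inp["gc_content"])),
}
deps = {
    "gc_content": ["load"],
    "lengths": ["load"],
    "report": ["gc_content", "lengths"],
}

if __name__ == "__main__":
    print(run_workflow(tasks, deps)["report"])
```

In this sketch `gc_content` and `lengths` are submitted together because both depend only on `load`; a sequential executor would run them one after the other. Threads are used for brevity; CPU-bound or GPU-bound nodes would more plausibly be dispatched to separate processes or devices.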



