Efficient Scheduling for Scalable Bioinformatics Analysis Platform with Microservices

03/06/2022
by Dulani Meedeniya, et al.

With the advancement of biology and computer science, the amount of bioinformatics data has grown rapidly. Because of the increasing demand for performance and for testing new algorithms, bioinformaticians need to maintain efficient technological infrastructure. Hence, adopting novel technologies is necessary to cater to the growing demands of the industry. Furthermore, it is imperative to increase the productivity of existing systems while executing the large jobs associated with the domain. Various scheduling techniques, ranging from the classic First Come First Serve (FCFS) to recent cloud technologies such as MapReduce, can be used to execute these jobs in parallel. The work presented in this paper demonstrates an optimized platform that supports the execution of various bioinformatics computations on massively large datasets. The platform comprises a MapReduce model that adopts a multilevel feedback queue algorithm to schedule such large-scale, time-consuming jobs in parallel on a multicore processor. A broad comparison of common existing scheduling algorithms is conducted to identify the most suitable one. The paper also presents performance evaluation results of the proposed solution using a range of biological sequences and algorithms as inputs. The proposed solution improves time efficiency by 18x over the general FCFS algorithm when processing 1,000 sequences; the improvement falls to 10x at 10,000 sequences and to 3x at 50,000 sequences. Multilevel sequence alignment tools that are not optimized for GPU parallelism benefit most from our solution.
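To illustrate why a multilevel feedback queue (MLFQ) can outperform FCFS when long and short jobs are mixed, here is a minimal single-core sketch of a textbook MLFQ. It is not the authors' implementation and ignores the MapReduce and multicore parts of their platform. The three levels, the quanta `(2, 4, 8)`, the `Job` type, and the workload of one 40-unit job submitted ahead of five 3-unit jobs are all hypothetical choices made for illustration; all jobs are assumed to arrive at time 0.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class Job:
    name: str
    burst: int  # total CPU time units the job needs (hypothetical values)
    remaining: int = 0

    def __post_init__(self):
        # A fresh job still needs its whole burst.
        self.remaining = self.burst


def fcfs(jobs):
    """First Come First Serve: run each job to completion in submission order.
    Returns a dict mapping job name to completion time."""
    clock, done = 0, {}
    for job in jobs:
        clock += job.burst
        done[job.name] = clock
    return done


def mlfq(jobs, quanta=(2, 4, 8)):
    """Multilevel feedback queue (assumed textbook variant): every job starts in
    the top queue; a job that uses its whole quantum without finishing is
    demoted one level. The lowest level is served round-robin with its quantum."""
    queues = [deque() for _ in quanta]
    for job in jobs:
        # Copy the jobs so the caller's list is not mutated.
        queues[0].append(Job(job.name, job.burst))
    clock, done = 0, {}
    while any(queues):
        # Always serve the highest-priority non-empty queue.
        level = next(i for i, q in enumerate(queues) if q)
        job = queues[level].popleft()
        run = min(quanta[level], job.remaining)
        clock += run
        job.remaining -= run
        if job.remaining == 0:
            done[job.name] = clock
        else:
            queues[min(level + 1, len(queues) - 1)].append(job)
    return done


def mean(values):
    values = list(values)
    return sum(values) / len(values)


if __name__ == "__main__":
    # Hypothetical workload: one long job submitted ahead of five short ones.
    workload = [Job("long", 40)] + [Job(f"short{i}", 3) for i in range(5)]
    print("FCFS mean completion:", mean(fcfs(workload).values()))
    print("MLFQ mean completion:", mean(mlfq(workload).values()))
```

With this toy workload, FCFS makes every short job wait behind the long one (mean completion 47.5 time units), while MLFQ finishes the short jobs by time 21 and the long job still completes at 55 (mean 25.0). The speedups reported in the paper come from its own platform and datasets, not from this sketch.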
