Efficient Scheduling for Scalable Bioinformatics Analysis Platform with Microservices
With the advancement of biology and computer science, amount of bioinformatics data has grown at a rapid rate. Due to this increasing demand for performance and testing of new algorithms, bioinformaticians are required to maintain efficient technological infrastructures. Hence, adoption of such novel technologies is necessary to cater the increasing demand of the industry. Furthermore, it is imperative to increase the productivity of the existing systems and at the same time execute large jobs associated with the domain. Various scheduling techniques ranging from classic First Come First Serve to the latest cloud technologies such as MapReduce can be used to execute these jobs in parallel. The work presented in this paper demonstrates an optimized platform to support the execution of various bioinformatics computations that deal with massively large datasets. This platform comprises of a MapReduce model that adopt multilevel feedback queue algorithm in scheduling such large-scale, time-consuming jobs parallel in a multicore processor. A broad comparison of existing common scheduling algorithms is conducted, to identify the most suitable scheduling algorithm. The paper also presents the performance evaluation results of the proposed solution with a range of biological sequences and algorithms as inputs. The time efficiency of the proposed solution has a x18 improvement over general First Come First Serve algorithm, for processing 1000 sequences while it gives 10x improvement at 10000 sequences, dropping again to 3x at 50000. Multilevel sequence alignment tools that are not optimized for GPU parallelism are benefited mostly from our solution.
READ FULL TEXT 
  
  
     share
 share