An Adapter Architecture for Heterogeneous Data Processing in Bioinformatics Pipelines

03/06/2022
by Dulani Meedeniya, et al.

Bioinformatics is a growing field that spans computer science and biology. Many bioinformatics data processing tools exist today, and they take inputs and produce outputs in varying formats depending on the algorithms and processes they use. Because the output of one process may not be directly usable as the input of the next, a generic bioinformatics data format converter is needed to pipeline such processes. Although such converters exist, most are limited to text conversions and offer limited functionality. Moreover, these conversion functions could support parallelism to increase overall throughput. This paper proposes a solution that provides these conversion functions along with utility functions, while achieving high throughput through parallelism. One utility function of the system requires storing bioinformatics data locally; for this storage, the system achieves an average size reduction of 40%. An evaluation of the system on a set of 7,000,000 gene data entries showed a maximum retrieval time of 400 ms.
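The abstract does not describe the converter's internals, so the following is only a minimal Python sketch of the general adapter idea, assuming a shared intermediate record type. `SeqRecord`, `FormatAdapter`, `FastaAdapter`, `TsvAdapter`, `convert`, and `convert_many` are hypothetical names, not the authors' API, and the two-column TSV layout is an illustrative choice. Each format gets one adapter that translates to and from the common record, so any supported source format can be piped into any supported target format, and independent inputs can be converted in parallel.

```python
from abc import ABC, abstractmethod
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from typing import Iterable, Iterator, List


@dataclass(frozen=True)
class SeqRecord:
    """Hypothetical common intermediate representation shared by all adapters."""
    identifier: str
    sequence: str


class FormatAdapter(ABC):
    """Hypothetical adapter interface: each format converts to/from SeqRecord."""

    @abstractmethod
    def parse(self, text: str) -> Iterator[SeqRecord]: ...

    @abstractmethod
    def emit(self, records: Iterable[SeqRecord]) -> str: ...


class FastaAdapter(FormatAdapter):
    def parse(self, text: str) -> Iterator[SeqRecord]:
        identifier, chunks = None, []
        for line in text.splitlines():
            line = line.strip()
            if not line:
                continue
            if line.startswith(">"):
                # A new header closes the previous record, if any.
                if identifier is not None:
                    yield SeqRecord(identifier, "".join(chunks))
                header = line[1:].strip()
                # Keep only the first token of the header as the identifier.
                identifier = header.split()[0] if header else ""
                chunks = []
            else:
                # Sequence lines may be wrapped; join them later.
                chunks.append(line)
        if identifier is not None:
            yield SeqRecord(identifier, "".join(chunks))

    def emit(self, records: Iterable[SeqRecord]) -> str:
        return "".join(f">{r.identifier}\n{r.sequence}\n" for r in records)


class TsvAdapter(FormatAdapter):
    """Illustrative two-column layout: identifier<TAB>sequence."""

    def parse(self, text: str) -> Iterator[SeqRecord]:
        for line in text.splitlines():
            if line.strip():
                identifier, sequence = line.split("\t", 1)
                yield SeqRecord(identifier, sequence)

    def emit(self, records: Iterable[SeqRecord]) -> str:
        return "".join(f"{r.identifier}\t{r.sequence}\n" for r in records)


def convert(text: str, source: FormatAdapter, target: FormatAdapter) -> str:
    """Pipe any source format into any target format via SeqRecord."""
    return target.emit(source.parse(text))


def convert_many(texts: List[str], source: FormatAdapter,
                 target: FormatAdapter, workers: int = 4) -> List[str]:
    """Convert independent inputs concurrently; results keep input order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(lambda t: convert(t, source, target), texts))


fasta = ">g1 desc\nACGT\nTTGA\n>g2\nGGCC\n"
print(convert(fasta, FastaAdapter(), TsvAdapter()), end="")
```

With n formats, this design needs n adapters rather than a dedicated converter for every ordered pair of formats. The thread pool only illustrates batch-level parallelism; in CPython, CPU-bound parsing would typically need `ProcessPoolExecutor` to obtain real speedups.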

