A Big Data Approach for Sequences Indexing on the Cloud via Burrows Wheeler Transform

07/20/2020
by Mario Randazzo, et al.

Indexing sequence data is important in Precision Medicine, where large amounts of “omics” data must be collected and analyzed daily in order to categorize patients and identify the most effective therapies. Here we propose an algorithm for computing the Burrows Wheeler transform that relies on Big Data technologies, i.e., Apache Spark and Hadoop. Our approach is the first to distribute the index computation itself, and not only the input dataset, making it possible to take full advantage of the available cloud resources.
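For readers unfamiliar with the transform, the following is a minimal single-machine sketch of what the Burrows Wheeler transform computes. It is not the distributed Spark/Hadoop algorithm proposed in the paper: the function name `bwt`, the `$` sentinel, and the naive suffix-sorting strategy are assumptions chosen only for illustration.

```python
def bwt(text: str, sentinel: str = "$") -> str:
    """Naive Burrows Wheeler transform of `text` (illustrative sketch only).

    Assumption: `sentinel` does not occur in `text` and sorts before every
    character of `text` (true for "$" versus ASCII letters and digits).
    """
    if sentinel in text:
        raise ValueError("sentinel must not occur in the input text")
    s = text + sentinel
    # Sort the suffix start positions lexicographically. Because the sentinel
    # is unique, this order equals the order of the cyclic rotations of s.
    order = sorted(range(len(s)), key=lambda i: s[i:])
    # The transform is the character preceding each sorted suffix
    # (s[-1] wraps around to the sentinel for the suffix starting at 0).
    return "".join(s[i - 1] for i in order)


if __name__ == "__main__":
    print(bwt("banana"))  # annb$aa
```

Materializing and sorting every suffix in this way costs quadratic memory in the worst case, which is why a sketch like this only suits short strings; scaling the same computation to large sequence collections is the problem the paper addresses.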


research
07/07/2021

Burrows Wheeler Transform on a Large Scale: Algorithms Implemented in Apache Spark

With the rapid growth of Next Generation Sequencing (NGS) technologies, ...
research
12/09/2022

CopAS: A Big Data Forensic Analytics System

With the advancing digitization of our society, network security has bec...
research
07/04/2018

Analyzing Big Datasets of Genomic Sequences: Fast and Scalable Collection of k-mer Statistics

Distributed approaches based on the map-reduce programming paradigm have...
research
07/21/2018

Integrated IoT and Cloud Environment for Fingerprint Recognition

Big data applications involving the analysis of large datasets becomes a...
research
09/21/2018

S3BD: Secure Semantic Search over Encrypted Big Data in the Cloud

Cloud storage is a widely utilized service for both personal and enterpr...
research
09/22/2019

Cutting the Unnecessary Long Tail: Cost-Effective Big Data Clustering in the Cloud

Clustering big data often requires tremendous computational resources wh...
research
12/03/2020

WedgeChain: A Trusted Edge-Cloud Store With Asynchronous (Lazy) Trust

We propose WedgeChain, a data store that spans both edge and cloud nodes...
