Burrows Wheeler Transform on a Large Scale: Algorithms Implemented in Apache Spark

07/07/2021
by Ylenia Galluzzo, et al.

With the rapid growth of Next Generation Sequencing (NGS) technologies, large amounts of "omics" data are collected daily and need to be processed. Indexing and compressing large sequence datasets are among the most important tasks in this context. Here we propose algorithms for the computation of the Burrows-Wheeler transform that rely on Big Data technologies, namely Apache Spark and Hadoop. Our algorithms are the first to distribute the computation of the index itself, rather than only the input dataset, making it possible to fully exploit the available cloud resources.
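As a concrete, if simplified, illustration of what distributing the transform itself means, the sketch below computes the Burrows-Wheeler transform of a single sequence in PySpark by distributing the suffix sort across the cluster. It is not the authors' algorithm from the paper; the function name compute_bwt, the sentinel choice, and the naive full-suffix sort keys are illustrative assumptions only.

```python
# Minimal, illustrative sketch only: not the paper's algorithm.
# Assumes PySpark is available; compute_bwt and all names are hypothetical.
from pyspark.sql import SparkSession


def compute_bwt(spark, text, sentinel="$"):
    """Compute the BWT of `text` by sorting its suffixes with a distributed sort."""
    s = text + sentinel                 # sentinel is lexicographically smallest
    n = len(s)
    sc = spark.sparkContext
    bs = sc.broadcast(s)                # ship the sequence to every executor once
    # One record per suffix start position; sortBy performs a distributed sort.
    suffix_order = sc.parallelize(range(n)).sortBy(lambda i: bs.value[i:])
    # BWT[k] is the character cyclically preceding the k-th smallest suffix.
    chars = suffix_order.map(lambda i: bs.value[i - 1] if i > 0 else bs.value[-1])
    return "".join(chars.collect())


if __name__ == "__main__":
    spark = SparkSession.builder.appName("bwt-sketch").getOrCreate()
    print(compute_bwt(spark, "banana"))   # expected output: annb$aa
    spark.stop()
```

For "banana" this prints annb$aa. The quadratic suffix-string keys make the sketch viable only for short inputs; scaling to genomic datasets requires distributing a proper suffix-array or BWT construction, which is the kind of problem the paper's approach targets.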


Related research

07/20/2020
A Big Data Approach for Sequences Indexing on the Cloud via Burrows Wheeler Transform
Indexing sequence data is important in the context of Precision Medicine...

12/09/2022
CopAS: A Big Data Forensic Analytics System
With the advancing digitization of our society, network security has bec...

07/04/2018
Analyzing Big Datasets of Genomic Sequences: Fast and Scalable Collection of k-mer Statistics
Distributed approaches based on the map-reduce programming paradigm have...

02/14/2018
Classification of Scientific Papers With Big Data Technologies
Data sizes that cannot be processed by conventional data storage and ana...

11/18/2021
A Secure Experimentation Sandbox for the design and execution of trusted and secure analytics in the aviation domain
The aviation industry as well as the industries that benefit and are lin...

05/29/2019
Designing and Implementing Data Warehouse for Agricultural Big Data
In recent years, precision agriculture that uses modern information and ...

09/19/2018
The Read-Optimized Burrows-Wheeler Transform
The advent of high-throughput sequencing has resulted in massive genomic...
