SnakeLines: integrated set of computational pipelines for sequencing reads

06/25/2021
by Jaroslav Budis, et al.

Background: With the rapid growth of massively parallel sequencing technologies, more and more laboratories are using sequenced DNA fragments for genomic analyses. Interpretation of sequencing data, however, depends strongly on bioinformatics processing, which is often too demanding for clinicians and researchers without a computational background. Another problem is the reproducibility of computational analyses across separate computational centers with inconsistent versions of installed libraries and bioinformatics tools.

Results: We propose an easily extensible set of computational pipelines, called SnakeLines, for processing sequencing reads, including mapping, assembly, variant calling, viral identification, transcriptomics, metagenomics, and methylation analysis. Individual steps of an analysis, along with the methods and their parameters, can be readily modified in a single configuration file. The provided pipelines are embedded in virtual environments that ensure isolation of required resources from the host operating system, rapid deployment, and reproducibility of analysis across different Unix-based platforms.

Conclusion: SnakeLines is a powerful framework for the automation of bioinformatics analyses, with emphasis on simple set-up, modification, extensibility, and reproducibility.

Keywords: Computational pipeline, framework, massively parallel sequencing, reproducibility, virtual environment


