copMEM: Finding maximal exact matches via sampling both genomes

05/22/2018
by Szymon Grabowski et al.

Genome-to-genome comparisons require designating anchor points, which are given by maximal exact matches (MEMs) between the genome sequences. For large genomes this is a challenging problem, and the performance of existing solutions, even in parallel regimes, is not quite satisfactory. We present a new algorithm, copMEM, which sparsely samples both input genomes using coprime sampling steps. Despite being a single-threaded implementation, copMEM computes all MEMs of minimum length 100 between the human and mouse genomes in less than 2 minutes, using less than 10 GB of RAM.
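Why coprime steps allow sampling both genomes without missing matches can be illustrated with a toy sketch. The setup below is our assumption, not a description of copMEM's actual code: the K-mers of genome R are indexed only at positions that are multiples of k1, the K-mers of genome Q are probed only at multiples of k2, gcd(k1, k2) = 1, and k1 * k2 <= L - K + 1, where L is the minimum MEM length. Under these assumptions, a match of length at least L starting at position a in R and b in Q offers at least L - K + 1 >= k1 * k2 K-mer offsets d, and by the Chinese remainder theorem one of them satisfies a + d ≡ 0 (mod k1) and b + d ≡ 0 (mod k2). That sampled pair hits the shared K-mer, and extending it left and right recovers the whole MEM. The function name `find_mems_coprime` and all parameter values are hypothetical.

```python
from math import gcd


def find_mems_coprime(R: str, Q: str, L: int, K: int, k1: int, k2: int):
    """Toy MEM finder illustrating coprime sampling of both sequences.

    Hypothetical sketch, not copMEM's implementation. Returns sorted
    (start_in_R, start_in_Q, length) triples for all MEMs of length >= L.
    """
    if gcd(k1, k2) != 1:
        raise ValueError("sampling steps must be coprime")
    if k1 * k2 > L - K + 1:
        raise ValueError("need k1 * k2 <= L - K + 1 to guarantee no MEM is missed")

    # Index only the K-mers of R that start at multiples of k1.
    index = {}
    for i in range(0, len(R) - K + 1, k1):
        index.setdefault(R[i:i + K], []).append(i)

    mems = set()
    # Probe only the K-mers of Q that start at multiples of k2.
    for j in range(0, len(Q) - K + 1, k2):
        for i in index.get(Q[j:j + K], ()):
            a, b = i, j
            # Extend leftwards until a mismatch or the start of either sequence.
            while a > 0 and b > 0 and R[a - 1] == Q[b - 1]:
                a -= 1
                b -= 1
            # Extend rightwards from the left end to get the full match length.
            length = 0
            while (a + length < len(R) and b + length < len(Q)
                   and R[a + length] == Q[b + length]):
                length += 1
            if length >= L:
                # The set drops MEMs that several sampled pairs hit.
                mems.add((a, b, length))
    return sorted(mems)
```

For instance, with L = 20 and K = 8, the steps k1 = 3 and k2 = 4 are coprime and 3 * 4 = 12 <= 13, so `find_mems_coprime(R, Q, L=20, K=8, k1=3, k2=4)` indexes about a third of R's K-mers and probes about a quarter of Q's, yet reports every MEM of length at least 20. A Python dict stands in here for whatever hash table copMEM uses; this sketch says nothing about the speed or memory figures reported above.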

research
10/20/2020

Parallel String Graph Construction and Transitive Reduction for De Novo Genome Assembly

One of the most computationally intensive tasks in computational biology...
research
04/06/2020

SOPanG 2: online searching over a pan-genome without false positives

The pan-genome can be stored as elastic-degenerate (ED) string, a recent...
research
11/15/2018

Vectorized Character Counting for Faster Pattern Matching

Many modern sequence alignment tools implement fast string matching usin...
research
05/03/2022

Computing Maximal Unique Matches with the r-index

In recent years, pangenomes received increasing attention from the scien...
research
07/19/2012

Quick HyperVolume

We present a new algorithm to calculate exact hypervolumes. Given a set ...
research
03/04/2020

Minimum Enclosing Parallelogram with Outliers

We study the problem of minimum enclosing parallelogram with outliers, w...
research
12/02/2014

Learning interpretable models of phenotypes from whole genome sequences with the Set Covering Machine

The increased affordability of whole genome sequencing has motivated its...