Optimal Construction of Hierarchical Overlap Graphs

02/04/2021
by Shahbaz Khan, et al.

Genome assembly is a fundamental problem in Bioinformatics: given a set of overlapping substrings of a genome, the aim is to reconstruct the source genome. The classical approaches to this problem use assembly graphs, such as de Bruijn graphs or overlap graphs, which maintain partial information about such overlaps. For genome assembly algorithms, these graphs present a trade-off between the overlap information stored and scalability. The Hierarchical Overlap Graph (HOG) was therefore proposed to overcome the limitations of both approaches. For a given set P of n strings, the first algorithm to compute the HOG was given by Cazaux and Rivals [IPL20], requiring O(||P|| + n^2) time and superlinear space, where ||P|| is the cumulative sum of the lengths of the strings in P. This was improved by Park et al. [SPIRE20] to O(||P|| log n) time and O(||P||) space using segment trees, and further to O(||P|| log n / log log n) time in the word RAM model. Both papers left open the problem of computing the HOG in optimal O(||P||) time and space. In this paper, we achieve the desired optimal bounds by presenting a simple algorithm that does not use any complex data structures.
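
To make the quantities in the abstract concrete, the following is a minimal sketch of the overlap information that assembly graphs store, together with ||P||. It is a naive brute-force baseline and not the algorithm of this paper or of the cited works. The function names and the example set are hypothetical, and the sketch assumes that an overlap from u to v is a suffix of u that is also a prefix of v and is strictly shorter than both strings.

```python
def total_length(strings):
    """Return ||P||, the cumulative length of all strings in P."""
    return sum(len(s) for s in strings)


def longest_overlap(u, v):
    """Return the longest suffix of u that is also a prefix of v.

    Assumption for this sketch: the overlap must be strictly shorter
    than both u and v, so a string never overlaps itself completely.
    """
    max_len = min(len(u), len(v)) - 1
    # Try candidate lengths from longest to shortest; the first match wins.
    for k in range(max_len, 0, -1):
        if u[-k:] == v[:k]:
            return u[-k:]
    return ""  # no non-empty overlap


def all_pair_overlaps(strings):
    """Map every ordered pair (u, v), including u == v, to its longest overlap.

    This brute-force version examines all n^2 pairs and compares
    substrings directly, so it is far from the optimal bound.
    """
    return {(u, v): longest_overlap(u, v) for u in strings for v in strings}


if __name__ == "__main__":
    P = ["aabaa", "aacd", "cdb"]  # hypothetical example set
    print("||P|| =", total_length(P))
    for (u, v), ov in sorted(all_pair_overlaps(P).items()):
        print(f"ov({u}, {v}) = {ov!r}")
```

Even this simple version has to touch every ordered pair, which is where a quadratic term in n comes from; the open problem described above asks for time and space that depend only on ||P||.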

research
02/25/2021

A Linear Time Algorithm for Constructing Hierarchical Overlap Graphs

The hierarchical overlap graph (HOG) is a graph that encodes overlaps fr...
research
11/25/2020

Genome assembly, a universal theoretical framework: unifying and generalizing the safe and complete algorithms

Genome assembly is a fundamental problem in Bioinformatics, requiring to...
research
12/13/2017

Closing in on Time and Space Optimal Construction of Compressed Indexes

Fast and space-efficient construction of compressed indexes such as comp...
research
11/10/2020

A step towards neural genome assembly

De novo genome assembly focuses on finding connections between a vast am...
research
02/13/2018

Hierarchical Overlap Graph

Given a set of finite words, the Overlap Graph (OG) is a complete weight...
research
06/26/2023

Prefix-free graphs and suffix array construction in sublinear space

A recent paradigm shift in bioinformatics from a single reference genome...
research
10/13/2022

Fast genomic optical map assembly algorithm using binary representation

Reducing the cost of sequencing genomes provided by next-generation sequ...