Joint Overlap Analysis of Multiple Genomic Interval Sets

12/30/2018
by Burcak Otlu, et al.

Next-generation sequencing (NGS) technologies have produced large volumes of genomic data. One common operation on heterogeneous genomic data is genomic interval intersection. When finding overlaps in two or more interval sets, most existing tools impose restrictions such as disallowing nested intervals or requiring the intervals to be sorted. We proposed segment tree (ST) and indexed segment tree forest (ISTF) based solutions for intersecting multiple genomic interval sets in parallel. We implemented these methods in a tool, Joint Overlap Analysis (JOA), which takes n interval sets and finds overlapping intervals with no constraints on the given intervals. The indexed segment tree forest is a novel composite data structure that leverages indexing and the natural binning of a segment tree; we also presented construction and search algorithms for it. JOA is implemented in Java using the fork/join framework, which speeds up parallel processing by taking advantage of all available processor cores. We compared JOA ST and JOA ISTF with each other and with other interval intersection tools, both to verify correctness and to show that JOA attains comparable execution times. The segment tree and indexed segment tree forest methods turned out to be comparable with each other in terms of execution time and memory usage. Compared with other tools, JOA has comparable execution time and can further reduce its running time by using more processors per node. JOA can be run through its GUI or as a command-line tool. JOA is available with source code at https://github.com/burcakotlu/JOA/. A user manual is provided at https://joa.readthedocs.org.
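The abstract does not describe JOA's algorithms in detail, so the following is only a minimal Java sketch of the general technique it names, under several assumptions. Intervals are treated as closed integer ranges with start <= end. A single interval set is indexed in a classic static segment tree, and a batch of query intervals is searched in parallel with the fork/join framework (`RecursiveTask`, `ForkJoinPool`). The names `OverlapSketch`, `OverlapTree` and `QueryTask` are hypothetical and are not JOA's API. The sketch does not attempt the indexed segment tree forest. Records require Java 16 or later.

```java
import java.util.*;
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveTask;

// Illustrative sketch only: OverlapSketch, Interval, OverlapTree and QueryTask are
// hypothetical names, not JOA's API. Assumes closed integer intervals with start <= end.
public class OverlapSketch {
    record Interval(int start, int end) {}

    /** Static segment tree over the elementary segments induced by all interval endpoints. */
    static final class OverlapTree {
        private final List<Interval> intervals;
        private final int[] coords;                                   // sorted distinct endpoints of [s, e + 1)
        private final List<List<Integer>> nodes = new ArrayList<>();  // interval ids stored per node

        OverlapTree(List<Interval> intervals) {
            this.intervals = List.copyOf(intervals);
            TreeSet<Integer> pts = new TreeSet<>();
            for (Interval iv : intervals) { pts.add(iv.start()); pts.add(iv.end() + 1); }
            coords = pts.stream().mapToInt(Integer::intValue).toArray();
            int leaves = Math.max(1, coords.length - 1);              // leaf i = [coords[i], coords[i+1])
            for (int i = 0; i < 4 * leaves; i++) nodes.add(new ArrayList<>());
            for (int id = 0; id < this.intervals.size(); id++) {
                Interval iv = this.intervals.get(id);
                int lo = Arrays.binarySearch(coords, iv.start());
                int hi = Arrays.binarySearch(coords, iv.end() + 1) - 1;
                insert(1, 0, leaves - 1, lo, hi, id);
            }
        }

        // Store the id at the O(log n) canonical nodes whose ranges exactly tile leaves [lo, hi].
        private void insert(int node, int nl, int nr, int lo, int hi, int id) {
            if (hi < nl || nr < lo) return;
            if (lo <= nl && nr <= hi) { nodes.get(node).add(id); return; }
            int mid = (nl + nr) >>> 1;
            insert(2 * node, nl, mid, lo, hi, id);
            insert(2 * node + 1, mid + 1, nr, lo, hi, id);
        }

        /** Returns the stored intervals overlapping the closed query [qs, qe], in input order. */
        List<Interval> overlapping(int qs, int qe) {
            if (coords.length < 2) return List.of();
            // Leaf i overlaps [qs, qe + 1) iff coords[i + 1] > qs and coords[i] <= qe.
            int lo = Math.max(0, upperBound(qs) - 1);
            int hi = Math.min(coords.length - 2, upperBound(qe) - 1);
            if (lo > hi) return List.of();
            TreeSet<Integer> ids = new TreeSet<>();                   // one interval may sit in several nodes
            collect(1, 0, coords.length - 2, lo, hi, ids);
            return ids.stream().map(intervals::get).toList();
        }

        // Every interval stored at a node covers that node's whole range, so it overlaps the query.
        private void collect(int node, int nl, int nr, int lo, int hi, Set<Integer> ids) {
            if (hi < nl || nr < lo) return;
            ids.addAll(nodes.get(node));
            if (nl == nr) return;
            int mid = (nl + nr) >>> 1;
            collect(2 * node, nl, mid, lo, hi, ids);
            collect(2 * node + 1, mid + 1, nr, lo, hi, ids);
        }
        private int upperBound(int x) {                               // first index with coords[i] > x
            int lo = 0, hi = coords.length;
            while (lo < hi) { int mid = (lo + hi) >>> 1; if (coords[mid] <= x) lo = mid + 1; else hi = mid; }
            return lo;
        }
    }

    /** Fork/join task: splits a query batch in half until small, then searches the shared read-only tree. */
    static final class QueryTask extends RecursiveTask<List<List<Interval>>> {
        private static final int THRESHOLD = 1024;                    // arbitrary sequential cutoff
        private final OverlapTree tree; private final List<Interval> queries; private final int from, to;
        QueryTask(OverlapTree tree, List<Interval> queries, int from, int to) {
            this.tree = tree; this.queries = queries; this.from = from; this.to = to;
        }
        @Override protected List<List<Interval>> compute() {
            if (to - from <= THRESHOLD) {
                List<List<Interval>> out = new ArrayList<>();
                for (Interval q : queries.subList(from, to)) out.add(tree.overlapping(q.start(), q.end()));
                return out;
            }
            int mid = (from + to) >>> 1;
            QueryTask left = new QueryTask(tree, queries, from, mid);
            left.fork();                                              // run the left half asynchronously
            List<List<Interval>> right = new QueryTask(tree, queries, mid, to).compute();
            List<List<Interval>> out = new ArrayList<>(left.join());
            out.addAll(right);                                        // keep results in query order
            return out;
        }
    }

    public static void main(String[] args) {
        OverlapTree tree = new OverlapTree(List.of(new Interval(10, 20), new Interval(15, 40), new Interval(1, 100)));
        List<Interval> queries = List.of(new Interval(21, 30), new Interval(101, 200), new Interval(18, 18));
        System.out.println(ForkJoinPool.commonPool().invoke(new QueryTask(tree, queries, 0, queries.size())));
    }
}
```

Running `main` searches three query intervals against three stored intervals. For example, the query [21, 30] is reported as overlapping [15, 40] and [1, 100] but not [10, 20]. Because the tree is read-only after construction, the forked tasks can share it without locking. The `THRESHOLD` of 1024 queries per leaf task is an arbitrary choice, not a value taken from JOA.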

research
07/14/2018

A Simple and Space Efficient Segment Tree Implementation

The segment tree is an extremely versatile data structure. In this paper...
research
02/12/2020

Parameterized Complexity of Two-Interval Pattern Problem

A 2-interval is the union of two disjoint intervals on the real line. Tw...
research
05/08/2018

The interval number of a planar graph is at most three

The interval number of a graph G is the minimum k such that one can assi...
research
07/17/2012

Computation of the Hausdorff distance between sets of line segments in parallel

We show that the Hausdorff distance for two sets of non-intersecting lin...
research
11/01/2017

Determination of Checkpointing Intervals for Malleable Applications

Selecting optimal intervals of checkpointing an application is important...
research
04/22/2021

HINT: A Hierarchical Index for Intervals in Main Memory

Indexing intervals is a fundamental problem, finding a wide range of app...
research
11/07/2019

Parallel Data Distribution Management on Shared-Memory Multiprocessors

The problem of identifying intersections between two sets of d-dimension...