Safety and Completeness in Flow Decompositions for RNA Assembly

01/25/2022
by Shahbaz Khan, et al.

Decomposing a network flow into weighted paths has numerous applications. Some applications require any decomposition that is optimal with respect to some property, such as the number of paths, robustness, or length. Many bioinformatics applications instead require a specific decomposition, in which the paths correspond to the underlying data that generated the flow. For real inputs, no optimization criterion is guaranteed to uniquely identify the correct decomposition. Therefore, we propose to report safe paths, i.e., paths that are subpaths of at least one path in every flow decomposition. Ma, Zheng, and Kingsford [WABI 2020] addressed the existence of multiple optimal solutions, i.e., non-identifiability, in a probabilistic framework. Later [RECOMB 2021], they gave a quadratic-time algorithm based on a global criterion for solving a problem called AND-Quant, which generalizes the problem of reporting whether a given path is safe. We give the first local characterization of safe paths for flow decompositions in directed acyclic graphs (DAGs), leading to a practical algorithm for finding the complete set of safe paths. We evaluated our algorithms against the trivial safe algorithms (unitigs, extended unitigs) and the popular heuristic (greedy-width) for flow decomposition on RNA transcript datasets. While maintaining perfect precision, our algorithm reports significantly higher coverage (≈ 50% more) than the trivial safe algorithms. The greedy-width algorithm, though it reports better coverage, has significantly lower precision on complex graphs. Overall, our algorithm outperforms greedy-width (by ≈ 20%) on a unified metric (F-Score) when the dataset has a significant number of complex graphs. Moreover, it has superior time (3-5×) and space (1.2-2.2×) efficiency, resulting in a better and more practical approach for bioinformatics applications of flow decomposition.
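
To make the greedy-width baseline concrete, the following is a minimal Python sketch of the standard heuristic: repeatedly extract the source-to-sink path with the largest bottleneck flow and subtract it. It is not the authors' implementation. The function names (`widest_path`, `greedy_width`), the edge-dictionary input format, and the requirement that the caller supplies a topological order are all assumptions made for illustration.

```python
from collections import defaultdict


def widest_path(flow, source, sink, topo_order):
    """Return (width, path) for the source-to-sink path whose minimum
    remaining edge flow is largest; (0, []) if no such path remains."""
    best = {source: float("inf")}  # best bottleneck found so far per node
    parent = {}
    for u in topo_order:
        if u not in best:
            continue
        for v, f in flow[u].items():
            if f <= 0:
                continue  # edge already used up
            width = min(best[u], f)
            if width > best.get(v, 0):
                best[v] = width
                parent[v] = u
    if sink not in best:
        return 0, []
    path = [sink]
    while path[-1] != source:
        path.append(parent[path[-1]])
    return best[sink], path[::-1]


def greedy_width(edges, source, sink, topo_order):
    """Decompose a DAG flow into weighted paths, widest path first.

    `edges` maps (u, v) to a positive flow value (hypothetical input
    format); the input dictionary is not modified.
    """
    flow = defaultdict(dict)
    for (u, v), f in edges.items():
        flow[u][v] = f
    paths = []
    while True:
        width, path = widest_path(flow, source, sink, topo_order)
        if width == 0:
            break
        for u, v in zip(path, path[1:]):
            flow[u][v] -= width  # remove the extracted path from the flow
        paths.append((width, path))
    return paths


# Toy flow network (made up for illustration, not from the paper's datasets).
toy_edges = {
    ("s", "a"): 4, ("s", "b"): 2, ("a", "c"): 4, ("b", "c"): 2,
    ("c", "d"): 3, ("c", "e"): 3, ("d", "t"): 3, ("e", "t"): 3,
}
toy_paths = greedy_width(toy_edges, "s", "t", ["s", "a", "b", "c", "d", "e", "t"])
```

On this toy network the heuristic returns three paths, one of which is s-b-c-e-t with weight 2. The same flow also decomposes into s-a-c-e-t (weight 3), s-a-c-d-t (weight 1), and s-b-c-d-t (weight 2), in which no path contains b-c-e. The subpath b-c-e is therefore not safe in the sense defined above, which illustrates how a single heuristic decomposition can report paths that the underlying data did not necessarily contain.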

