A Safety Framework for Flow Decomposition Problems via Integer Linear Programming

01/30/2023
by Fernando H. C. Dias, et al.

Many important problems in Bioinformatics (e.g., assembly or multi-assembly) admit multiple solutions, while the final objective is to report only one. A common approach to dealing with this uncertainty is to find safe partial solutions (e.g., contigs), which are common to all solutions. Previous research on safety has focused on polynomial-time solvable problems, whereas many successful and natural models are NP-hard to solve, leaving a lack of "safety tools" for such problems. We propose the first method for computing all safe solutions for an NP-hard problem, minimum flow decomposition. We obtain our results by developing a "safety test" for paths based on a general Integer Linear Programming (ILP) formulation. Moreover, we provide implementations with practical optimizations aimed at reducing the total ILP time, the most efficient of these being based on a recursive group-testing procedure. Results: Experimental results on the transcriptome datasets of Shao and Kingsford (TCBB, 2017) show that all safe paths for minimum flow decompositions correctly recover up to 90% more than previously known safe paths, such as those of (Caceres et al., TCBB 2021), (Zheng et al., RECOMB 2021), and (Khan et al., RECOMB 2022, ESA 2022). Moreover, despite the NP-hardness of the problem, we can report all safe paths for 99.8% of the over 27,000 non-trivial graphs of this dataset in only 1.5 hours. Our results suggest that, on perfect data, there is less ambiguity than previously thought in the notoriously hard RNA assembly problem. Availability: https://github.com/algbio/mfd-safety
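For intuition, the notion of a safe path can be illustrated by brute force on a toy graph. The sketch below is not the ILP-based safety test or the group-testing procedure described in the abstract. It simply enumerates every minimum flow decomposition of a small DAG and checks whether a candidate path is a subpath of some path in each of them. All function names and both toy graphs are assumptions made for illustration, and the enumeration is exponential, so it is only usable on tiny inputs.

```python
def st_paths(flow, node, sink, prefix=()):
    """Yield every node sequence from `node` to `sink` over edges with positive flow.
    Assumes the graph is a DAG, so the recursion terminates."""
    prefix = prefix + (node,)
    if node == sink:
        yield prefix
        return
    for (u, v), f in flow.items():
        if u == node and f > 0:
            yield from st_paths(flow, v, sink, prefix)


def decompositions(flow, k, source, sink):
    """Yield every way to write `flow` as exactly k weighted source-sink paths.
    Orderings of the same solution are repeated; that is harmless here."""
    if k == 0:
        if all(f == 0 for f in flow.values()):
            yield []
        return
    for path in st_paths(flow, source, sink):
        edges = list(zip(path, path[1:]))
        for w in range(1, min(flow[e] for e in edges) + 1):
            residual = dict(flow)
            for e in edges:
                residual[e] -= w
            for rest in decompositions(residual, k - 1, source, sink):
                yield [(path, w)] + rest


def minimum_decompositions(flow, source, sink):
    """Return all decompositions using the smallest feasible number of paths."""
    for k in range(1, len(flow) + 1):
        found = list(decompositions(flow, k, source, sink))
        if found:
            return found
    return []


def contains(path, sub):
    """True if `sub` occurs as a contiguous subpath of `path` (both tuples)."""
    n = len(sub)
    return any(path[i:i + n] == sub for i in range(len(path) - n + 1))


def is_safe(flow, sub, source="s", sink="t"):
    """A path is safe if every minimum decomposition has a path containing it."""
    solutions = minimum_decompositions(flow, source, sink)
    return all(any(contains(p, sub) for p, _ in sol) for sol in solutions)


# Toy DAG (hypothetical): two branches merge at c and split again.
unit = {("s", "a"): 1, ("s", "b"): 1, ("a", "c"): 1, ("b", "c"): 1,
        ("c", "d"): 1, ("c", "e"): 1, ("d", "t"): 1, ("e", "t"): 1}
# Same topology, but the flow values pair the branches unambiguously.
weighted = {("s", "a"): 2, ("s", "b"): 1, ("a", "c"): 2, ("b", "c"): 1,
            ("c", "d"): 2, ("c", "e"): 1, ("d", "t"): 2, ("e", "t"): 1}

print(is_safe(unit, ("s", "a", "c")))      # safe in both minimum solutions
print(is_safe(unit, ("a", "c", "d")))      # one minimum solution routes a-c-e
print(is_safe(weighted, ("a", "c", "d")))  # flow values leave a single solution
```

With unit flows, the two minimum decompositions disagree on how the branches pair up at `c`, so `a-c-d` is not safe. With the weighted flows, only one minimum decomposition exists and the same path becomes safe. An ILP-based test replaces the explicit enumeration with a solver query, which is what makes the approach practical on real instances.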

research
08/31/2022

Minimum Flow Decomposition in Graphs with Cycles using Integer Linear Programming

Minimum flow decomposition (MFD) – the problem of finding a minimum set ...
research
05/17/2023

Revisiting the Complexity of and Algorithms for the Graph Traversal Edit Distance and Its Variants

The graph traversal edit distance (GTED) is an elegant distance measure ...
research
01/25/2022

Safety and Completeness in Flow Decompositions for RNA Assembly

Decomposing a network flow into weighted paths has numerous applications...
research
02/12/2021

Safety of Flow Decompositions in DAGs

Network flows are one of the most studied combinatorial optimization pro...
research
07/09/2020

Safety in s-t Paths, Trails and Walks

Given a directed graph G and a pair of nodes s and t, an s-t bridge of G...
research
02/24/2020

From omnitigs to macrotigs: a linear-time algorithm for safe walks – common to all closed arc-coverings of a directed graph

A partial solution to a problem is called safe if it appears in all solu...
research
07/07/2021

A new metaheuristic approach for the art gallery problem

In the problem "Localization and trilateration with the minimum number o...
