Minimum Segmentation for Pan-genomic Founder Reconstruction in Optimal Time

05/09/2018
by   Tuukka Norri, et al.
0

Given a threshold L and a set R = {R_1, ..., R_m} of m haplotype sequences, each having length n, the minimum segmentation problem for founder reconstruction is to partition the sequences into disjoint segments R[i_1+1,i_2], R[i_2+1, i_3], ..., R[i_r-1+1, i_r], where 0 = i_1 < ... < i_r = n and R[i_j-1+1, i_j] is the set {R_1[i_j-1+1, i_j], ..., R_m[i_j-1+1, i_j]}, such that the length of each segment, i_j - i_j-1, is at least L and K = _j{ |R[i_j-1+1, i_j]| } is minimized. The distinct substrings in the segments R[i_j-1+1, i_j] represent founder blocks that can be concatenated to form K founder sequences representing the original R such that crossovers happen only at segment boundaries. We give an optimal O(mn) time algorithm to solve the problem, improving over earlier O(mn^2). This improvement enables to exploit the algorithm on a pan-genomic setting of haplotypes being complete human chromosomes, with a goal of finding a representative set of references that can be indexed for read alignment and variant calling.

READ FULL TEXT

Please sign up or login with your details

Forgot password? Click here to reset