Duplication with transposition distance to the root for q-ary strings
We study the duplication with transposition distance between strings of length n over a q-ary alphabet and their roots. In other words, we investigate the number of duplication operations of the form x = (abcd) → y = (abcbd), where x and y are strings and a, b, c and d are their substrings, needed to obtain a q-ary string of length n starting from the set of strings without duplications. For exact duplication, we prove that the maximal distance between a string of length at most n and its root has the asymptotic order n/log n. For approximate duplication, where a β-fraction of symbols may be duplicated incorrectly, we show that the maximal distance exhibits a sharp transition from the order n/log n to the order log n at β = (q-1)/q. The motivation for this problem comes from genomics, where such duplications represent a special kind of mutation, and the distance between a given biological sequence and its root is the smallest number of transposition mutations required to generate the sequence.
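To make the operation x = (abcd) → y = (abcbd) concrete, the following is a minimal Python sketch of a single exact duplication with transposition. The function name `duplicate_with_transposition` and the index parameters `i`, `j`, `k` are illustrative assumptions, not notation from the paper; the sketch also assumes that b is non-empty and that c may be empty.

```python
def duplicate_with_transposition(x: str, i: int, j: int, k: int) -> str:
    """Apply one exact duplication x = (abcd) -> y = (abcbd).

    The split is a = x[:i], b = x[i:j], c = x[j:k], d = x[k:].
    Assumption for this sketch: b is non-empty, c may be empty.
    """
    if not (0 <= i < j <= k <= len(x)):
        raise ValueError("indices must satisfy 0 <= i < j <= k <= len(x)")
    a, b, c, d = x[:i], x[i:j], x[j:k], x[k:]
    # A copy of b is inserted immediately after c.
    return a + b + c + b + d


if __name__ == "__main__":
    # a = "A", b = "C", c = "G", d = "T"
    print(duplicate_with_transposition("ACGT", 1, 2, 3))  # ACGCT
    # Empty c: the copy of b lands directly after b itself.
    print(duplicate_with_transposition("ACGT", 1, 2, 2))  # ACCGT
```

Each application lengthens the string by the length of b, so the distance studied above counts how many such steps are needed to build a string from a duplication-free one.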