Permutation patterns in genome rearrangement problems

08/08/2018
by   Giulio Cerbai, et al.
UNIFI

In the context of the genome rearrangement problem, we analyze two well-known models, namely the block transposition and the prefix block transposition models, by exploiting the connection with the notion of permutation pattern. More specifically, for any k, we provide a characterization of the set of permutations having distance ≤ k from the identity (which is known to be a permutation class) in terms of what we call generating permutations, and we describe some properties of its basis, which allow us to compute such a basis for small values of k.


1 Introduction

One of the major trends in bioinformatics and biomathematics is the study of the genome rearrangement problem. Roughly speaking, given a genome, one is interested in understanding how the genome can evolve into another genome. To give a proper formalization, several models for rearranging a genome have been introduced, each of which defines a set of allowed elementary operations to be performed on a genome in order to obtain an adjacent one. For several models, it is possible to define a distance between two genomes by counting the minimum number of elementary operations needed to transform one genome into the other. The investigation of the main properties of such a distance then becomes a key point in understanding the main features of the model under consideration.


A common formalization of all such models consists of encoding a genome as a permutation (in linear notation) and describing an elementary operation as a combinatorial operation on the entries of such a permutation. Many genome rearrangement models have been studied within this general framework. Among them, the following ones are very well known.

  • The reversal model consists of a single operation, defined as follows: a new permutation is obtained from a given one by selecting a cluster of consecutive elements and reversing it. More formally, given $\pi = \pi_1 \cdots \pi_n$, a reversal is performed by choosing $1 \le i < j \le n$ and then forming the permutation $\pi_1 \cdots \pi_{i-1}\, \pi_j \pi_{j-1} \cdots \pi_i\, \pi_{j+1} \cdots \pi_n$. This model was introduced in [WEHM82], then studied for instance in [BP93, HP99].

  • A variant of the reversal model is the prefix reversal model, which is a specialization of the previous one in which the reversal operation can only be performed on a prefix of the given permutation. This is clearly an easier model to investigate; sorting by prefix reversals is also known as pancake sorting (see for instance [GP79]).

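  • A very popular and well-studied model is the transposition model, see [BP98]. Given a permutation $\pi = \pi_1 \cdots \pi_n$, a transposition operation consists of taking two adjacent clusters of consecutive elements and interchanging their positions. Formally, one has to choose indices $1 \le i < j < k \le n+1$, then form the permutation $\pi_1 \cdots \pi_{i-1}\, \pi_j \cdots \pi_{k-1}\, \pi_i \cdots \pi_{j-1}\, \pi_k \cdots \pi_n$.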

  • As for reversals, the transposition model also has a “prefix variant”: in the prefix transposition model, the leftmost of the two blocks to be interchanged is a prefix of the permutation. Sorting by prefix transpositions is studied in [DM02].

Independently of the chosen model, there are some general questions that can be asked in order to gain a better understanding of its combinatorial properties. First of all, the operations of the model often (but not always) allow us to define a distance $d(\sigma, \tau)$ between two permutations $\sigma$ and $\tau$, as the minimum number of elementary operations needed to transform $\sigma$ into $\tau$. Moreover, when the operations are nice enough, the above distance could even be left-invariant, meaning that, given permutations $\pi, \sigma, \tau$ (of the same length), $d(\pi\sigma, \pi\tau) = d(\sigma, \tau)$. As a consequence, choosing for instance $\pi = \tau^{-1}$, the problem of evaluating the distance $d(\sigma, \tau)$ reduces to that of sorting $\tau^{-1}\sigma$ with the minimum number of allowed elementary operations. Now, if $d$ is a left-invariant distance on the set of all permutations of the same length, define the $k$-ball of $d$ to be the set $B_k(d) = \{\pi \in S_n : n \ge 1,\ d(\pi, \mathrm{id}_n) \le k\}$, where $\mathrm{id}_n$ is the identity permutation of length $n$. The following questions are quite natural to ask:

  • compute the diameter of , i.e. the maximum distance between two permutations of ;

  • compute the diameter of , i.e. the maximum distance between two permutations of ;

  • characterize the permutations of , i.e. the permutations of having maximum distance from the identity;

  • characterize the permutations of , i.e. the permutations of having maximum distance from the identity;

  • characterize and enumerate the permutations of ;

  • design sorting algorithms and study the related complexity issues.

In the literature there are several results, concerning several evolution models, which give some insight into the above problems. Our work starts from the observation that, in many cases, the balls $B_k(d)$ can be characterized in terms of pattern avoidance. Recall that, given two permutations $\pi \in S_n$ and $\sigma \in S_m$, with $m \le n$, we say that $\sigma$ is a pattern of $\pi$ when there exist indices $1 \le i_1 < \cdots < i_m \le n$ such that $\pi_{i_1} \cdots \pi_{i_m}$ as a permutation is isomorphic to $\sigma$ (which means that $\pi_{i_1}, \ldots, \pi_{i_m}$ are in the same relative order as the elements of $\sigma$). This notion of pattern in permutations defines an obvious partial order, and the resulting poset is known as the permutation pattern poset. When $\sigma$ is not a pattern of $\pi$, we say that $\pi$ avoids $\sigma$. A down-set $C$ (also called a permutation class) of the permutation pattern poset can be described in terms of its minimal excluded permutations (or, equivalently, the minimal elements of the complementary up-set): these permutations are called the basis of $C$. The idea of studying the balls $B_k(d)$ in terms of pattern avoidance is not new. As far as we know, the first model investigated from this point of view is the (whole) tandem duplication-random loss model: Bouvel and Rossin [BR09] have in fact shown that, in such a model, the ball $B_k(d)$ is a class of pattern avoiding permutations, whose basis is the set of minimal permutations having $2^k$ descents (here minimal is intended in the permutation pattern order). Subsequent works [BP10, BF13, CGM11] have dealt with the enumeration of the basis permutations of such classes. More recently, Homberger and Vatter [HV16] described an algorithm for the enumeration of any polynomial permutation class, which can be fruitfully used for all the above mentioned distances, since the resulting balls are indeed polynomial classes. However, their results do not provide information on the bases of these classes.
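The pattern relation just recalled can be tested by brute force on small examples. The following Python sketch is our own illustration (the names `standardize` and `contains` are hypothetical, and the search is exponential in the pattern length, so it is only meant for small cases):

```python
from itertools import combinations


def standardize(seq):
    """Return the permutation of 1..len(seq) that is order-isomorphic to seq."""
    ranks = {v: r for r, v in enumerate(sorted(seq), start=1)}
    return tuple(ranks[v] for v in seq)


def contains(pi, sigma):
    """True if sigma is a pattern of pi, i.e. some subsequence of pi standardizes to sigma."""
    target = tuple(sigma)
    return any(
        standardize([pi[i] for i in idx]) == target
        for idx in combinations(range(len(pi)), len(sigma))
    )


# 1462537 contains the pattern 321 (e.g. the entries 6, 5, 3), while 1324 avoids it.
print(contains((1, 4, 6, 2, 5, 3, 7), (3, 2, 1)))  # True
print(contains((1, 3, 2, 4), (3, 2, 1)))           # False
```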

In the present work we try to extend what has been obtained in [HV16] in two directions. First, we aim at giving a structural characterization of the balls for some of the above distances, thus complementing the results in [HV16], which are more concerned with computational issues. Second, we provide some insight into the properties of the bases of such balls, hoping to gain a better understanding of them. We will be mainly concerned with the block transposition and the prefix block transposition models, leaving the reversal models to a future paper.


Some of the results of the present work are contained in the MSc thesis of the first author [Cer17].

2 Block transposition

Among the four models mentioned above, the block transposition one is probably the hardest to investigate.

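Denoting with $\mathrm{td}$ the transposition distance (so that $\mathrm{td}(\pi)$ is the minimum number of block transpositions needed to sort $\pi$), the permutation class $B_k(\mathrm{td}) = \{\pi : \mathrm{td}(\pi) \le k\}$ can be described in terms of its generating permutations, which we now define.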

A strip of $\pi = \pi_1 \cdots \pi_n$ is a maximal consecutive substring $\pi_i \pi_{i+1} \cdots \pi_j$ such that $\pi_{l+1} = \pi_l + 1$ for all $i \le l < j$.

A permutation $\pi$ of length $n$ is said to be plus irreducible [AS02] when $\pi_{i+1} \neq \pi_i + 1$ for all $1 \le i < n$. In other words, $\pi$ is a plus irreducible permutation when it does not have two points that are adjacent both in positions and in values, with values increasing. Equivalently, a permutation is plus irreducible if and only if all of its strips have length 1.

Any permutation $\pi$ can be associated with a plus irreducible permutation, denoted $\hat{\pi}$, which is obtained by replacing each strip of $\pi$ with its minimum element, then suitably rescaling the resulting word. For instance, if $\pi = 3451267$, whose strips are $345$, $12$ and $67$, then $\hat{\pi} = 213$. It is easy to observe that $\hat{\pi} \le \pi$ in the permutation pattern order. Moreover, for every permutation $\pi$, we have that $\mathrm{td}(\hat{\pi}) = \mathrm{td}(\pi)$.
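The definitions of strip, of plus irreducibility and of the reduction $\hat{\pi}$ translate directly into code. A minimal Python sketch, assuming permutations are given as nonempty tuples of the integers $1, \ldots, n$ (the function names are our own):

```python
def strips(pi):
    """Split a nonempty permutation into its strips (maximal runs of consecutive increasing values)."""
    runs = [[pi[0]]]
    for x in pi[1:]:
        if x == runs[-1][-1] + 1:
            runs[-1].append(x)   # x extends the current strip
        else:
            runs.append([x])     # x starts a new strip
    return runs


def is_plus_irreducible(pi):
    """True if no entry is immediately followed by its successor value."""
    return all(pi[i + 1] != pi[i] + 1 for i in range(len(pi) - 1))


def plus_reduce(pi):
    """Keep the minimum (i.e. the first entry) of each strip, then rescale to 1..m."""
    mins = [run[0] for run in strips(pi)]
    ranks = {v: r for r, v in enumerate(sorted(mins), start=1)}
    return tuple(ranks[v] for v in mins)


pi = (3, 4, 5, 1, 2, 6, 7)
print(strips(pi))       # [[3, 4, 5], [1, 2], [6, 7]]
print(plus_reduce(pi))  # (2, 1, 3)
```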

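Given $\pi \in S_n$, let $\alpha_1, \ldots, \alpha_n$ be nonnegative integers. The monotone inflation of $\pi$ through $(\alpha_1, \ldots, \alpha_n)$ is the permutation $\pi[\alpha_1, \ldots, \alpha_n]$ obtained from $\pi$ by replacing each element $\pi_i$ with the identity permutation of length $\alpha_i$, suitably rescaled so as to maintain the relative order of the elements of $\pi$. So, for instance, if $\pi = 231$ and $(\alpha_1, \alpha_2, \alpha_3) = (2, 1, 2)$, we have $\pi[2, 1, 2] = 34512$. In the following we will denote with $M(\pi)$ the set of all monotone inflations of a permutation $\pi$ and with $M(S)$ the set $\bigcup_{\pi \in S} M(\pi)$, for a given set $S$ of permutations. The notion of monotone inflation is clearly related to that of geometric grid class [AABRV13]. More specifically, given a $0/\pm 1$ matrix $\mathcal{M}$ and denoting with $\mathrm{Geom}(\mathcal{M})$ the geometric grid class of permutations determined by $\mathcal{M}$, if $\pi$ is a permutation and $\mathcal{M}_\pi$ is its permutation matrix, then it is not difficult to realize that: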

  1. ;

  2. ;

  3. .
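Monotone inflation can be sketched in Python as follows (a hypothetical helper `monotone_inflation` of our own). Consecutive value ranges are handed out to the entries of $\pi$ in increasing order of $\pi_i$, which is what preserves the relative order of the blocks; a zero length simply deletes the corresponding entry, so every pattern of $\pi$ lies in $M(\pi)$:

```python
def monotone_inflation(pi, alphas):
    """pi[alpha_1, ..., alpha_n]: replace pi_i by an increasing run of length alpha_i >= 0."""
    n = len(pi)
    start, nxt = {}, 1
    # Hand out consecutive value ranges to the entries of pi, smallest entry first.
    for i in sorted(range(n), key=lambda i: pi[i]):
        start[i], nxt = nxt, nxt + alphas[i]
    # Write the runs in the positional order of pi.
    return tuple(v for i in range(n) for v in range(start[i], start[i] + alphas[i]))


print(monotone_inflation((2, 3, 1), (2, 1, 2)))        # (3, 4, 5, 1, 2)
print(monotone_inflation((1, 3, 2, 4), (1, 0, 1, 1)))  # (1, 2, 3)
```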

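Now define a permutation $\pi$ to be generating for $B_k(\mathrm{td})$ when it is a maximal plus irreducible permutation of $B_k(\mathrm{td})$ (with respect to the pattern order). The set of all generating permutations for $B_k(\mathrm{td})$ will be called the generating set of $B_k(\mathrm{td})$ and denoted $\mathcal{G}_k(\mathrm{td})$. We thus have the following fact, whose easy proof is omitted.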

Proposition 2.1.

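For every $k$, $B_k(\mathrm{td}) = M(\mathcal{G}_k(\mathrm{td}))$, i.e. $B_k(\mathrm{td})$ consists of the monotone inflations of its generating permutations.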

A very natural description of the balls is then provided by their generating sets. Characterizing them is our first open problem.


Open problem. Characterize the generating permutations of $B_k(\mathrm{td})$, for every $k$.


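For instance, it is easy to realize that the generating set of $B_1(\mathrm{td})$ is $\{1324\}$, i.e. $B_1(\mathrm{td}) = M(1324)$. In the following, we will provide a structural description of the generating permutations of $B_k(\mathrm{td})$ for a generic $k$.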

Our approach is based on the observation that a generating permutation for $B_k(\mathrm{td})$ is a plus irreducible permutation, and so it will be convenient to work inside the poset of plus irreducible permutations, seen as a subposet of the classical permutation pattern poset (notice that this is also a subposet of the poset of peg permutations, as defined in [HV16]). In passing, we remark that the enumeration of plus irreducible permutations is well known: denoting with $p_n$ the number of plus irreducible permutations of length $n$, we have the recurrence relation

$$p_n = (n-1)\,p_{n-1} + (n-2)\,p_{n-2}$$

for $n \ge 3$. With initial conditions $p_1 = p_2 = 1$, we get the exponential generating function $\sum_{n \ge 1} p_n \frac{x^{n-1}}{(n-1)!} = \frac{e^{-x}}{(1-x)^2}$ and the closed form $p_n = \sum_{i=0}^{n-1} (-1)^i \frac{(n-1)!}{i!}\,(n-i)$. This is sequence A000255 in [S] (up to a shift of the index), see also [AAB07].
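The recurrence and the closed form for $p_n$ can be cross-checked against a brute-force count for small $n$. A Python sketch (the function names are ours; the brute force enumerates all $n!$ permutations, so it is only practical for small $n$):

```python
from itertools import permutations
from math import factorial


def p_recurrence(n_max):
    """p_1 = p_2 = 1 and p_n = (n-1) p_{n-1} + (n-2) p_{n-2} for n >= 3."""
    p = [None, 1, 1]  # p[0] is unused
    for n in range(3, n_max + 1):
        p.append((n - 1) * p[n - 1] + (n - 2) * p[n - 2])
    return p[1:n_max + 1]


def p_closed_form(n):
    """p_n = sum_{i=0}^{n-1} (-1)^i (n-1)!/i! (n-i); the division is always exact."""
    return sum((-1) ** i * factorial(n - 1) // factorial(i) * (n - i) for i in range(n))


def p_brute_force(n):
    """Count permutations of 1..n in which no entry is immediately followed by its successor."""
    return sum(
        all(s[i + 1] != s[i] + 1 for i in range(n - 1))
        for s in permutations(range(1, n + 1))
    )


print(p_recurrence(7))  # [1, 1, 3, 11, 53, 309, 2119]
```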


Suppose $\pi$ is a plus irreducible permutation of length $n$ in the generating set of $B_k(\mathrm{td})$. Inflate $\pi$ by choosing three (not necessarily distinct) indices $i \le j \le h$ and replacing $\pi_i$, $\pi_j$ and $\pi_h$ by strips of suitable lengths, as follows:

  • if the three indices are all distinct, take strips of length 2;

  • if two of the indices are equal, take the associated strip of length 3 (and a strip of length 2 for the remaining index);

  • if all indices are equal, take a strip of length 4.

If $I = \{i, j, h\}$ is the multiset of the selected indices, the resulting permutation (of length $n+3$) will be denoted $\pi^{I}$. Now observe that, in all of the above cases, there exists a unique block transposition $\tau$ that breaks all the new strips in such a way that, in the resulting permutation, each pair of adjacent elements of a new strip becomes either nonadjacent or adjacent in the reverse order; more specifically, $\tau$ is the transposition exchanging the block of positions $i+1, \ldots, j+1$ of $\pi^{I}$ with the adjacent block of positions $j+2, \ldots, h+2$. We call $\pi_{I}$ the permutation obtained from $\pi^{I}$ by applying $\tau$. As an example, consider the permutation $\pi = 1324$ and the multiset of indices $I = \{2, 2, 3\}$; then we get $\pi^{I} = 1456237$ and $\pi_{I} = 1462537$.
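The inflation $\pi^{I}$ and the block transposition $\tau$ can be sketched in Python as follows. This is our own reading of the construction, with the indices of $I$ taken as 1-based positions of $\pi$ and $\tau$ exchanging positions $i+1, \ldots, j+1$ with positions $j+2, \ldots, h+2$ of $\pi^{I}$; the function name is hypothetical. For instance, with $\pi = 1324$ and $I = \{2, 2, 3\}$ it returns $1456237$ and $1462537$:

```python
def inflate_and_transpose(pi, I):
    """Return (pi^I, pi_I) for a multiset I of three 1-based positions of pi."""
    i, j, h = sorted(I)
    n = len(pi)
    # A selected entry becomes a strip of length 1 + (its multiplicity in I).
    alphas = [1 + list(I).count(p) for p in range(1, n + 1)]
    start, nxt = {}, 1
    for q in sorted(range(n), key=lambda q: pi[q]):  # monotone inflation
        start[q], nxt = nxt, nxt + alphas[q]
    w = [v for q in range(n) for v in range(start[q], start[q] + alphas[q])]
    # Block transposition: swap w[i+1..j+1] and w[j+2..h+2] (1-based positions).
    a, b, c = i + 1, j + 2, h + 3
    result = w[:a - 1] + w[b - 1:c - 1] + w[a - 1:b - 1] + w[c - 1:]
    return tuple(w), tuple(result)


print(inflate_and_transpose((1, 3, 2, 4), (2, 2, 3)))
# ((1, 4, 5, 6, 2, 3, 7), (1, 4, 6, 2, 5, 3, 7))
```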

The following lemma gives some basic properties of the above described construction that will be useful in the sequel. The proof is easy and so left to the reader.

Lemma 2.2.

Let $\pi$ be a plus irreducible permutation of length $n$ and $I$ a multiset of indices of $\pi$ of cardinality 3. Then $\pi_{I}$ is a plus irreducible permutation of length $n+3$; moreover, if $\pi_1 = 1$ and $\pi_n = n$, then $(\pi_{I})_1 = 1$ and $(\pi_{I})_{n+3} = n+3$.

We are now ready to give an explicit description of the generating set of the ball $B_k(\mathrm{td})$. Such a result will be preceded by a technical proposition (stated without proof) which gives a recipe to recursively construct the set of permutations which are obtainable by means of a single block transposition. In the proof of the next theorem we also need the definition of breakpoint, which can be found for instance in [FLRTV09]. Given a permutation $\pi = \pi_1 \cdots \pi_n$, a breakpoint of $\pi$ is an integer $i \in \{1, \ldots, n-1\}$ such that $\pi_{i+1} \neq \pi_i + 1$. By convention, $0$ and $n$ are breakpoints whenever $\pi_1 \neq 1$ and $\pi_n \neq n$, respectively.
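Breakpoints, and the lower bound $\mathrm{td}(\pi) \ge b(\pi)/3$ on the transposition distance used in the proof of Theorem 2.4, can be computed as follows. In this Python sketch (function names are ours) the conventions for $0$ and $n$ are encoded by padding $\pi$ with $\pi_0 = 0$ and $\pi_{n+1} = n+1$:

```python
import math


def breakpoints(pi):
    """Breakpoints of pi: indices i in 0..n with pi_{i+1} != pi_i + 1 (pi_0 = 0, pi_{n+1} = n + 1)."""
    n = len(pi)
    ext = (0,) + tuple(pi) + (n + 1,)
    return [i for i in range(n + 1) if ext[i + 1] != ext[i] + 1]


def td_lower_bound(pi):
    """A block transposition removes at most 3 breakpoints, hence td(pi) >= ceil(b(pi) / 3)."""
    return math.ceil(len(breakpoints(pi)) / 3)


print(breakpoints((1, 4, 6, 2, 5, 3, 7)))     # [1, 2, 3, 4, 5, 6]
print(td_lower_bound((1, 4, 6, 2, 5, 3, 7)))  # 2
```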

Proposition 2.3.

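Let $\mathcal{M}_n$ be the set of all multisets of cardinality 3 of $\{1, \ldots, n\}$. For every plus irreducible permutation $\pi$ of length $n$, denote with $T(\pi)$ the set of all permutations which can be obtained with a single block transposition from any permutation of $M(\pi)$. Then

$$T(\pi) \subseteq \bigcup_{I \in \mathcal{M}_n} M(\pi_{I}).$$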

Theorem 2.4.

For every $k$, the generating set of $B_k(\mathrm{td})$ is the set of all plus irreducible permutations of length $3k+1$ having distance $k$ from the identity.

Proof.

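We start by showing that there exists a finite number of permutations $\gamma_1, \ldots, \gamma_r$ which are plus irreducible, of length $3k+1$ and at distance $k$ from the identity, such that $B_k(\mathrm{td}) = \bigcup_{s=1}^{r} M(\gamma_s)$. We proceed by induction on $k$. When $k = 1$, we have already observed that $B_1(\mathrm{td}) = M(1324)$, and $1324$ is plus irreducible, has length $3+1 = 4$ and has distance 1 from the identity $1234$. Now consider a permutation $\sigma \in B_k(\mathrm{td})$; this means, in particular, that there is a permutation $\sigma' \in B_{k-1}(\mathrm{td})$ such that $\sigma$ is obtained from $\sigma'$ by a single block transposition. Thus, using the induction hypothesis, we can say that there exists a plus irreducible permutation $\pi$ of length $3k-2$ having distance $k-1$ from the identity such that $\sigma' \in M(\pi)$. By Proposition 2.3, there exists $I \in \mathcal{M}_{3k-2}$ such that $\sigma \in M(\pi_{I})$. Notice that $\mathcal{M}_{3k-2}$ is finite and that, by Lemma 2.2, $\pi_{I}$ is plus irreducible and has length $3k+1$; so what remains to prove is that $\pi_{I}$ has distance $k$ from the identity. Clearly $\mathrm{td}(\pi_{I}) \le k$. On the other hand, $\pi$ starts with 1 and ends with $3k-2$ (being plus irreducible, it has at least $3k-3$ breakpoints, and by the bound recalled below it has at most $3(k-1)$ of them), so Lemma 2.2 implies that $\pi_{I}$ starts with 1 and ends with $3k+1$. Recalling that $\pi_{I}$ is plus irreducible, we have that $\pi_{I}$ has exactly $3k$ breakpoints, since the only indices that are not breakpoints are $0$ and $3k+1$. Therefore, denoting with $b(\pi)$ the number of breakpoints of $\pi$, since $\mathrm{td}(\pi) \ge b(\pi)/3$ (this follows from an observation in [BP98]), we have that

$$\mathrm{td}(\pi_{I}) \ge \frac{b(\pi_{I})}{3} = \frac{3k}{3} = k,$$

as desired.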

To conclude the proof, we now have to show that any plus irreducible permutation $\gamma$ of length $3k+1$ having distance $k$ from the identity is a generating permutation of $B_k(\mathrm{td})$. In fact, since $\gamma \in B_k(\mathrm{td})$, $\gamma$ is a monotone inflation of some generating permutation $\pi$ of $B_k(\mathrm{td})$. By the first part of the proof, $\pi$ is a plus irreducible permutation of length $3k+1$ at distance $k$ from the identity. So in particular $\gamma$ and $\pi$ have the same length, which means that necessarily $\gamma = \pi$. ∎

The above theorem allows us to design a procedure to list the generating set of $B_k(\mathrm{td})$: starting from the identity of length 1, perform the construction of Proposition 2.3 (monotone inflation of three entries followed by the block transposition $\tau$) $k$ times, so as to obtain all generating permutations of $B_k(\mathrm{td})$. This is similar to the approach used in [HV16].

For instance, when $k = 2$, the generating set of $B_2(\mathrm{td})$ consists of the eleven permutations 1324657, 1352647, 1354627, 1364257, 1426357, 1436527, 1462537, 1524637, 1536247, 1624357, 1632547.

Notice, however, that in this way the same generating permutation may be obtained several times, so duplicates have to be removed from the list produced by the above procedure. This is the main reason why the described approach is not useful for enumerating the generating set.
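The listing procedure, duplicates included, fits in a few lines of Python. This sketch relies on our own reading of the construction preceding Lemma 2.2 (positions in $I$ are 1-based, and $\tau$ swaps positions $i+1, \ldots, j+1$ with $j+2, \ldots, h+2$ of the inflated permutation); the names `step` and `generating_set_td` are hypothetical. Collecting the results in a set removes the duplicates, and for $k = 2$ the 20 multisets of $\{1, 2, 3, 4\}$ applied to $1324$ yield exactly the eleven permutations listed above:

```python
from itertools import combinations_with_replacement


def step(pi, I):
    """Inflate the entries of pi at the sorted 1-based positions I, then apply tau."""
    i, j, h = I
    n = len(pi)
    alphas = [1 + I.count(p) for p in range(1, n + 1)]
    start, nxt = {}, 1
    for q in sorted(range(n), key=lambda q: pi[q]):
        start[q], nxt = nxt, nxt + alphas[q]
    w = [v for q in range(n) for v in range(start[q], start[q] + alphas[q])]
    a, b, c = i + 1, j + 2, h + 3
    return tuple(w[:a - 1] + w[b - 1:c - 1] + w[a - 1:b - 1] + w[c - 1:])


def generating_set_td(k):
    """Start from the identity of length 1 and apply every construction step k times."""
    gens = {(1,)}
    for _ in range(k):
        gens = {
            step(pi, I)
            for pi in gens
            for I in combinations_with_replacement(range(1, len(pi) + 1), 3)
        }
    return gens


for g in sorted(generating_set_td(2)):
    print("".join(map(str, g)))
```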


Open problem. Enumerate the generating permutations of $B_k(\mathrm{td})$, for every $k$.


A very interesting piece of information that we can get on $B_k(\mathrm{td})$ concerns its basis. We start by recalling that sets of monotone inflations are particular geometric grid classes [AABRV13]; as a consequence, the general theory of geometric grid classes allows us to say that $B_k(\mathrm{td})$ is a permutation class having finite basis (and also that it is strongly rational, meaning that its generating function is rational, together with the generating functions of all of its subclasses). What we are able to do is to provide a nontrivial upper bound on the length of the basis elements, which is clearly of great help in effectively computing the basis itself.

Theorem 2.5.

Every permutation belonging to the basis of $B_k(\mathrm{td})$ has length at most $3k+1$.

Proof.

We start by observing that, given a basis permutation $\beta$ of $B_k(\mathrm{td})$, if $\beta$ were not plus irreducible, then necessarily $\hat{\beta} < \beta$, and we have already observed that $\mathrm{td}(\hat{\beta}) = \mathrm{td}(\beta)$; so $\beta$ would not be minimal among the permutations at distance greater than $k$ from the identity, which is impossible. Therefore we can assert that all basis permutations of $B_k(\mathrm{td})$ are plus irreducible.

Now it is easy to prove that every basis permutation has length at most $3k+2$. Indeed, it is not difficult to show that a plus irreducible permutation of length $n$ contains as a pattern at least one permutation of length $n-1$ that is plus irreducible as well. Thus, if $\beta$ were a basis permutation having length greater than $3k+2$, then, in the poset of plus irreducible permutations, there would exist at least one plus irreducible permutation $\gamma$ of length greater than $3k+1$ such that $\gamma < \beta$. Since all generating permutations of $B_k(\mathrm{td})$ have length $3k+1$, $\gamma$ cannot belong to $B_k(\mathrm{td})$, which is not possible since $\beta$ is a basis permutation.

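Moreover, suppose that $\beta = \beta_1 \cdots \beta_n$ is a permutation in the basis of $B_k(\mathrm{td})$. First of all we have that $\beta_1 \neq 1$ and $\beta_n \neq n$, since otherwise we could remove $\beta_1$ or $\beta_n$, thus obtaining a smaller permutation having the same distance from the identity, against the minimality of $\beta$. If $\beta$ had length $3k+2$, since $\beta$ is plus irreducible, there would exist $\gamma < \beta$ of length $3k+1$ which is plus irreducible as well. Since $\beta$ is minimal in the complement of $B_k(\mathrm{td})$, necessarily $\gamma \in B_k(\mathrm{td})$. This would imply that $\gamma$ is a generating permutation of $B_k(\mathrm{td})$. This is however impossible, since we would have $\gamma_1 = 1$ and $\gamma_{3k+1} = 3k+1$ by Lemma 2.2 and the construction shown in Theorem 2.4, while, by what we have proved above, $\beta_1 \neq 1$ and $\beta_{3k+2} \neq 3k+2$, and $\gamma$ is obtained from $\beta$ by removing a single entry. ∎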

The above theorem also suggests a procedure to determine the basis of $B_k(\mathrm{td})$. In the poset of plus irreducible permutations, consider the set of permutations of length $3k+1$ which are not generating. For each of them (say $\beta$), consider the set of permutations of length $3k$ covered by it: if all of them are also below some generating permutation of $B_k(\mathrm{td})$, then $\beta$ is in the basis of $B_k(\mathrm{td})$. Otherwise, just repeat the same procedure starting from the permutations covered by $\beta$ which do not belong to $B_k(\mathrm{td})$.

As an instance, we have the following result.

Proposition 2.6.

The basis of $B_1(\mathrm{td})$ is $\{321, 2143, 2413, 3142\}$.

Proof.

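Since the generating set of $B_1(\mathrm{td})$ is $\{1324\}$, we perform the above procedure with all plus irreducible permutations of length 4 except $1324$. A direct computation shows that the only ones which cover only elements of $B_1(\mathrm{td})$ are precisely $2143$, $2413$ and $3142$. Moreover, $321$ is the unique permutation of length 3 which is not in $B_1(\mathrm{td})$, and all of the permutations it covers are in $B_1(\mathrm{td})$, so $321$ is in the basis as well. ∎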
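As a sanity check, small balls can also be computed directly. The sketch below does not follow the procedure described above (which works inside the poset of plus irreducible permutations); it is a plain brute-force alternative of our own, with hypothetical function names. A breadth-first search from the identity gives $B_k(\mathrm{td})$ restricted to each length (a block transposition is undone by another block transposition, so distances from and to the identity coincide), and a permutation outside the ball is a basis element exactly when all its one-point deletions lie in the ball. By Theorem 2.5 it suffices to look at lengths up to $3k+1$; for $k = 1$ this returns $321$, $2143$, $2413$ and $3142$:

```python
from itertools import combinations, permutations


def block_transpositions(pi):
    """All permutations obtained by swapping pi[a..b-1] and pi[b..c-1] (1-based, a < b < c <= n+1)."""
    n = len(pi)
    for a, b, c in combinations(range(1, n + 2), 3):
        yield pi[:a - 1] + pi[b - 1:c - 1] + pi[a - 1:b - 1] + pi[c - 1:]


def ball(n, k, moves):
    """Permutations of length n within distance k of the identity (breadth-first search)."""
    frontier = {tuple(range(1, n + 1))}
    seen = set(frontier)
    for _ in range(k):
        frontier = {s for p in frontier for s in moves(p)} - seen
        seen |= frontier
    return seen


def standardize(seq):
    ranks = {v: r for r, v in enumerate(sorted(seq), start=1)}
    return tuple(ranks[v] for v in seq)


def basis(k, max_len, moves):
    """Minimal permutations of length <= max_len lying outside the k-ball."""
    result = set()
    for n in range(1, max_len + 1):
        inside, below = ball(n, k, moves), ball(n - 1, k, moves)
        for pi in permutations(range(1, n + 1)):
            if pi not in inside and all(
                standardize(pi[:i] + pi[i + 1:]) in below for i in range(n)
            ):
                result.add(pi)
    return result


print(sorted(basis(1, 4, block_transpositions), key=lambda p: (len(p), p)))
# [(3, 2, 1), (2, 1, 4, 3), (2, 4, 1, 3), (3, 1, 4, 2)]
```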

3 Prefix transposition

If we restrict the block transposition operation to pairs of blocks such that the first one is a prefix of the permutation, we obtain the so-called prefix transposition model. It is clearly a special case of the block transposition model and, as such, it is simpler to analyze. Denoting with $\mathrm{ptd}$ the prefix transposition distance, our first goal is to characterize the balls $B_k(\mathrm{ptd})$ in terms of generating permutations (defined as in the block transposition case). As a first example, it is easy to see that $B_1(\mathrm{ptd}) = M(213)$, i.e. the generating set of $B_1(\mathrm{ptd})$ is $\{213\}$. The approach we use to determine the generating set is slightly different from the one we have used for the block transposition model. In the present case, we are able to give an explicit construction of the generating set of $B_k(\mathrm{ptd})$ starting from the generating set of $B_{k-1}(\mathrm{ptd})$.

Proposition 3.1.

Let $\pi$ be a generating permutation of $B_{k-1}(\mathrm{ptd})$.

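  1. Suppose that $\pi = \alpha\, a\, \beta\, b\, \gamma$, where $a < b$ and $\alpha$, $\beta$, $\gamma$ are the subwords of $\pi$ determined by such a decomposition. Then the permutation $(a+1)\, \beta'\, (b+1)\, \alpha'\, a\, (b+2)\, \gamma'$ is a generating permutation of $B_k(\mathrm{ptd})$, where $\alpha', \beta', \gamma'$ are obtained from $\alpha, \beta, \gamma$ (respectively) by increasing by 1 all the entries between $a$ and $b$ and by increasing by 2 all the entries that are greater than $b$.

  2. Suppose that $\pi = \alpha\, a\, \beta\, b\, \gamma$, where $a > b$ and $\alpha$, $\beta$, $\gamma$ are the subwords of $\pi$ determined by such a decomposition. Then the permutation $(a+2)\, \beta'\, b\, \alpha'\, (a+1)\, (b+1)\, \gamma'$ is a generating permutation of $B_k(\mathrm{ptd})$, where $\alpha', \beta', \gamma'$ are obtained from $\alpha, \beta, \gamma$ (respectively) by increasing by 1 all the entries between $b$ and $a$ and by increasing by 2 all the entries that are greater than $a$.

  3. Suppose that $\pi = \alpha\, a\, \gamma$, where $a$ is an entry of $\pi$ and $\alpha$, $\gamma$ are the subwords of $\pi$ determined by such a decomposition. Then the permutation $(a+1)\, \alpha'\, a\, (a+2)\, \gamma'$ is a generating permutation of $B_k(\mathrm{ptd})$, where $\alpha', \gamma'$ are obtained from $\alpha, \gamma$ (respectively) by increasing by 2 all the entries that are greater than $a$.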

Proof.

We will give details only for the first case, the remaining two being analogous. Since the prefix transposition model is a special case of the block transposition one, we have that, if $\pi$ is a generating permutation for $B_{k-1}(\mathrm{ptd})$, we can construct generating permutations for $B_k(\mathrm{ptd})$ by suitably choosing two elements $a$ and $b$ of $\pi$ (possibly the same one), then suitably inflating them and performing the prefix transposition which exchanges the prefix block ending with $a$ with the adjacent block ending with $b$. This is done in analogy with the construction described before Lemma 2.2.

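If $a < b$ and $a$ precedes $b$ in $\pi$, then we can decompose $\pi$ as $\alpha\, a\, \beta\, b\, \gamma$. After inflating $a$ and $b$, we thus get the permutation $\alpha'\, a\, (a+1)\, \beta'\, (b+1)\, (b+2)\, \gamma'$, where the elements of $\alpha$, $\beta$ and $\gamma$ have been renamed, namely all entries greater than $a$ and smaller than $b$ have been increased by 1 and all entries greater than $b$ have been increased by 2. We can now perform the desired prefix transposition, which exchanges the prefix block $\alpha'\, a$ with the adjacent block $(a+1)\, \beta'\, (b+1)$, thus obtaining the predicted permutation $(a+1)\, \beta'\, (b+1)\, \alpha'\, a\, (b+2)\, \gamma'$. ∎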
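The three cases of Proposition 3.1 can be sketched in Python as follows. This is our own transcription (positions are 0-based and the helper names are hypothetical), not code from the paper. Starting from the identity of length 1 and applying the construction to every choice of $a$ and $b$, one step gives $213$ and two steps give six permutations of length 5:

```python
from itertools import combinations_with_replacement


def prefix_step(pi, p, q):
    """Proposition 3.1 applied to the entries a = pi[p] and b = pi[q] of pi, with p <= q."""
    a, b = pi[p], pi[q]
    lo, hi = min(a, b), max(a, b)

    def rename(x):
        # +2 above max(a, b), +1 strictly between a and b, unchanged below min(a, b).
        return x + 2 if x > hi else x + 1 if x > lo else x

    alpha = [rename(x) for x in pi[:p]]
    if p == q:  # case 3: a = b
        gamma = [rename(x) for x in pi[p + 1:]]
        return tuple([a + 1] + alpha + [a, a + 2] + gamma)
    beta = [rename(x) for x in pi[p + 1:q]]
    gamma = [rename(x) for x in pi[q + 1:]]
    if a < b:  # case 1
        return tuple([a + 1] + beta + [b + 1] + alpha + [a, b + 2] + gamma)
    return tuple([a + 2] + beta + [b] + alpha + [a + 1, b + 1] + gamma)  # case 2


def generating_set_ptd(k):
    gens = {(1,)}
    for _ in range(k):
        gens = {
            prefix_step(pi, p, q)
            for pi in gens
            for p, q in combinations_with_replacement(range(len(pi)), 2)
        }
    return gens


print(sorted(generating_set_ptd(2)))
# [(2, 4, 1, 3, 5), (2, 4, 3, 1, 5), (3, 1, 4, 2, 5), (3, 2, 4, 1, 5), (4, 1, 3, 2, 5), (4, 2, 1, 3, 5)]
```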

The above proposition gives a recipe for constructing generating permutations of $B_k(\mathrm{ptd})$ starting from those of $B_{k-1}(\mathrm{ptd})$. Notice that, if $\pi$ has length $n$, then the permutations obtained with the previous construction have length $n+2$. Since we have seen that the generating set of $B_1(\mathrm{ptd})$ is $\{213\}$, a simple inductive argument shows that the generating permutations of $B_k(\mathrm{ptd})$ we have produced all have length $2k+1$. Actually, we have something stronger, which is the analogue of Theorem 2.4 for the prefix transposition model. Since the proof is similar, we just give the statement.

Theorem 3.2.

For every $k$, the generating set of $B_k(\mathrm{ptd})$ is the set of all plus irreducible permutations of length $2k+1$ having distance $k$ from the identity.

However, in this case we can also enumerate the generating sets.

Theorem 3.3.

The generating set of $B_k(\mathrm{ptd})$ has cardinality $\frac{(2k)!}{2^k}$.

Proof.

We observe that, if $\rho$ is a generating permutation for $B_k(\mathrm{ptd})$, then it has been obtained from a generating permutation $\pi$ of $B_{k-1}(\mathrm{ptd})$ by one of the constructions described in Proposition 3.1. However, $\rho$ cannot be obtained in two different ways. This can be shown by considering the elements $\rho_1$ and $\rho_1 - 1$ (notice that, in this model, a generating permutation cannot start with 1).

  1. If the element on the right of $\rho_1 - 1$ in $\rho$ is larger than or equal to $\rho_1 + 2$, then $\rho$ is constructed as in 1 of Proposition 3.1.

  2. If the element on the right of $\rho_1 - 1$ in $\rho$ is smaller than or equal to $\rho_1 - 2$, then $\rho$ is constructed as in 2 of Proposition 3.1.

  3. If the element on the right of $\rho_1 - 1$ in $\rho$ is equal to $\rho_1 + 1$, then $\rho$ is constructed as in 3 of Proposition 3.1.

Since the three above cases are disjoint, we can conclude that $\rho$ comes from a unique generating permutation of $B_{k-1}(\mathrm{ptd})$ through the construction of Proposition 3.1. Thus, the total number of generating permutations of $B_k(\mathrm{ptd})$ is obtained by multiplying the number of generating permutations of $B_{k-1}(\mathrm{ptd})$ by the number of possible inflations of each of them, which is equal to the number of multisets of cardinality 2 of a set of cardinality $2k-1$, i.e. $\binom{2k}{2} = k(2k-1)$. Since the generating set of $B_1(\mathrm{ptd})$ has cardinality 1, a simple inductive argument shows that the required cardinality is indeed equal to $\frac{(2k)!}{2^k}$. ∎
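For completeness, writing $\mathcal{G}_k(\mathrm{ptd})$ for the generating set of $B_k(\mathrm{ptd})$, the closed form follows from the recursion by a routine computation, using the identity $(2k)! = 2^k\, k!\, (2k-1)!!$:

```latex
|\mathcal{G}_k(\mathrm{ptd})|
  = \prod_{i=1}^{k} \binom{2i}{2}
  = \prod_{i=1}^{k} i\,(2i-1)
  = k!\,(2k-1)!!
  = \frac{(2k)!}{2^k},
\qquad \text{e.g. } 1,\ 6,\ 90,\ 2520 \text{ for } k = 1, 2, 3, 4.
```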

We have already observed that, for $k = 1$, the generating set is $\{213\}$. For $k = 2$, the generating set is $\{24135, 24315, 31425, 32415, 41325, 42135\}$.

Concerning the basis of $B_k(\mathrm{ptd})$, we have been able to prove the analogue of Theorem 2.5; however, the proof is slightly more complicated, so we cannot reproduce it entirely here due to limited space.

Theorem 3.4.

Every permutation belonging to the basis of $B_k(\mathrm{ptd})$ has length at most $2k+1$.

Proof.

(Sketch.) The fact that the permutations in the basis of $B_k(\mathrm{ptd})$ must all have length at most $2k+2$ can be proved in a similar way as the first part of Theorem 2.5.

Now suppose that $\beta$ is a basis permutation for $B_k(\mathrm{ptd})$ of length $2k+2$, and set $n = 2k+2$. Then it can be shown that $\beta$ has to be plus irreducible and that $\beta_n \neq n$, i.e. the last element of $\beta$ is not its maximum. From a previous observation, we know that it is possible to remove one element of $\beta$ in such a way that the resulting permutation $\gamma$ of length $n-1$ is plus irreducible. However, since $\beta$ belongs to the basis of $B_k(\mathrm{ptd})$, $\gamma$ has to be a generating permutation of $B_k(\mathrm{ptd})$. Since it is possible to prove that the last element of any generating permutation of $B_k(\mathrm{ptd})$ is its maximum, there are only two possibilities: either the last element of $\beta$ is $n-1$ and $\gamma$ is obtained by removing $n$, or the second-to-last element of $\beta$ is $n$ and $\gamma$ is obtained by removing the last element.

Since the two cases are symmetric in a precise sense, we just consider the first one. Our goal is now to show that we can remove another element from $\beta$ (different from $n$) and obtain another plus irreducible permutation, which would then have to be a generating permutation: this is however impossible, since it does not end with its maximum. In many cases, if we remove the last element of $\beta$ (namely $n-1$), we do obtain a plus irreducible permutation. The only cases in which this does not work are those in which $n-2$ is immediately before $n$ in $\beta$. In such cases, if we remove $n-2$, we indeed get a plus irreducible permutation, unless $n-3$ is immediately before $n-1$ in $\beta$. By repeating this argument, we find that we are always able to remove an element different from $n$ and obtain a plus irreducible permutation, except for a single permutation of length $n$ (recall that $n = 2k+2$, so $n$ is even). Also in this last case, however, we can remove 1 from that permutation, and the permutation thus obtained is easily seen to be plus irreducible. ∎

Thanks to the above bound, in this case too we are able to compute the basis for small values of $k$.

Proposition 3.5.
  1. The basis of $B_1(\mathrm{ptd})$ is $\{132, 321\}$.

  2. The basis of $B_2(\mathrm{ptd})$ consists of three permutations of length 4, namely 1432, 2143, 4321, and fifteen permutations of length 5, namely 13524, 14253, 24351, 25314, 25413, 35142, 35214, 35241, 41352, 42513, 42531, 43152, 51324, 52413, 53142.
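Proposition 3.5 can be partially cross-checked by brute force. The sketch below is again our own alternative to the procedure used in the paper (the function names are hypothetical): a breadth-first search over prefix transpositions (each of which is undone by another prefix transposition), followed by a minimality test on one-point deletions, restricted by Theorem 3.4 to lengths at most $2k+1$. It recovers $\{132, 321\}$ for $k = 1$ and the three length-4 basis elements $1432$, $2143$, $4321$ for $k = 2$; the length-5 elements it returns can be compared with the list above:

```python
from itertools import combinations, permutations


def prefix_transpositions(pi):
    """Swap the prefix pi[1..b-1] with the adjacent block pi[b..c-1] (1-based, 2 <= b < c <= n+1)."""
    n = len(pi)
    for b, c in combinations(range(2, n + 2), 2):
        yield pi[b - 1:c - 1] + pi[:b - 1] + pi[c - 1:]


def ball(n, k):
    """Permutations of length n within prefix transposition distance k of the identity."""
    frontier = {tuple(range(1, n + 1))}
    seen = set(frontier)
    for _ in range(k):
        frontier = {s for p in frontier for s in prefix_transpositions(p)} - seen
        seen |= frontier
    return seen


def standardize(seq):
    ranks = {v: r for r, v in enumerate(sorted(seq), start=1)}
    return tuple(ranks[v] for v in seq)


def basis_ptd(k):
    """Minimal permutations of length <= 2k+1 outside the k-ball."""
    result = set()
    for n in range(1, 2 * k + 2):
        inside, below = ball(n, k), ball(n - 1, k)
        for pi in permutations(range(1, n + 1)):
            if pi not in inside and all(
                standardize(pi[:i] + pi[i + 1:]) in below for i in range(n)
            ):
                result.add(pi)
    return result


print(sorted(basis_ptd(1)))  # [(1, 3, 2), (3, 2, 1)]
b2 = basis_ptd(2)
print(sorted(p for p in b2 if len(p) == 4))  # [(1, 4, 3, 2), (2, 1, 4, 3), (4, 3, 2, 1)]
print(len([p for p in b2 if len(p) == 5]))
```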

Acknowledgements

Both authors are members of the INdAM Research group GNCS; they are partially supported by the INdAM-GNCS 2018 project “Proprietà combinatorie e rilevamento di pattern in strutture discrete lineari e bidimensionali” and by a grant of the “Fondazione della Cassa di Risparmio di Firenze” for the project “Rilevamento di pattern: applicazioni a memorizzazione basata sul DNA, evoluzione del genoma, scelta sociale”.

References

  • [AABRV13] M. H. Albert, M. D. Atkinson, M. Bouvel, N. Ruskuc and V. Vatter. Geometric grid classes of permutations. Transactions of the American Mathematical Society, 365:5859–5881, 2013.
  • [AAB07] M. H. Albert, M. D. Atkinson and R. Brignall. Permutation Classes of Polynomial Growth. Annals of Combinatorics, 11:249–264, 2007.
  • [AS02] M. D. Atkinson and T. Stitt. Restricted permutations and the wreath product. Discrete Mathematics, 259:19–36, 2002.
  • [BP93] V. Bafna and P. A. Pevzner. Genome rearrangements and sorting by reversals. 34th Annual Symposium on Foundations of Computer Science (Palo Alto, CA, 1993), IEEE Comput. Soc. Press, Los Alamitos, CA, pp. 148–157, 1993.
  • [BP98] V. Bafna and P. A. Pevzner. Sorting by transpositions. SIAM Journal on Discrete Mathematics, 11:224–240, 1998.
  • [BF13] M. Bouvel and L. Ferrari. On the enumeration of d-minimal permutations. Discrete Mathematics and Theoretical Computer Science, 15:33–48, 2013.
  • [BP10] M. Bouvel and E. Pergola. Posets and permutations in the duplication-loss model: minimal permutations with descents. Theoretical Computer Science, 411:2487–2501, 2010.
  • [BR09] M. Bouvel and D. Rossin. A variant of the tandem duplication-random loss model of genome rearrangement. Theoretical Computer Science, 410:847–858, 2009.
  • [Cer17] G. Cerbai. Pattern avoiding permutations in genome rearrangement problems. MSc Thesis, Dipartimento di Matematica e Informatica “U. Dini”, University of Firenze, Italy, 2017.
  • [CCMR06] K. Chaudhuri, K. Chen, R. Mihaescu and S. Rao. On the tandem duplication-random loss model of genome rearrangement. Proceedings of the Seventeenth Annual ACM-SIAM Symposium on Discrete Algorithms, ACM, New York, pp. 564–570, 2006.
  • [CGM11] W. Y. C. Chen, C. C. Y. Gu and K. J. Ma. Minimal permutations and 2-regular skew tableaux. Advances in Applied Mathematics, 47:795–812, 2011.
  • [DM02] Z. Dias and J. Meidanis. Sorting by prefix transposition. SPIRE2002, Lecture Notes in Computer Science, 2476:65–76, 2002.
  • [FLRTV09] G. Fertin, A. Labarre, I. Rusu, E. Tannier and S. Vialette. Combinatorics of Genome Rearrangements. MIT Press, Cambridge, MA, 2009.
  • [GP79] W. H. Gates and C. H. Papadimitriou. Bounds for sorting by prefix reversal. Discrete Mathematics, 27:47–57, 1979.
  • [HP99] S. Hannenhalli and P. A. Pevzner. Transforming cabbage into turnip: polynomial algorithm for sorting signed permutations by reversals. Journal of the ACM, 46:1–27, 1999.
  • [HV16] C. Homberger and V. Vatter. On the effective and automatic enumeration of polynomial permutation classes. Journal of Symbolic Computation, 76:84–96, 2016.
  • [S] N. J. A. Sloane. The on-line encyclopedia of integer sequences. At oeis.org.
  • [WEHM82] G. A. Watterson, W. J. Ewens, T. Hall and A. Morgan. The chromosome inversion problem. Journal of Theoretical Biology, 99:1–7, 1982.