# Mesosome Avoidance

We consider avoiding mesosomes – that is, words of the form xx' with x' a conjugate of x that is different from x – over a binary alphabet. We give a structure theorem for mesosome-avoiding words, count how many there are, characterize all the infinite mesosome-avoiding words, and determine the minimal forbidden words.


## 1 Introduction

Since the early 20th century, with the work of Axel Thue [11, 12], pattern avoidance in words has been a topic of interest. Thue proved that there exists an infinite word over a $3$-letter alphabet having no factor (i.e., a contiguous sub-block) of the form $xx$, where $x$ is a nonempty word. We say that such a word avoids squares. It is easy to see that every binary word of length $\geq 4$ has a square, so one cannot avoid squares over a $2$-letter alphabet. Thue also proved that there exists an infinite word over a $2$-letter alphabet avoiding overlaps, that is, factors of the form $axaxa$, where $a$ is a single letter and $x$ is a possibly empty word.

Since Thue’s pioneering work, many similar avoidance problems have been studied. For example, Erdős [2] posed the problem of avoiding abelian squares in words (i.e., words of the form $xx'$ where $x'$ is a permutation of $x$), and Keränen [5] showed that it is possible to avoid abelian squares over a $4$-letter alphabet; the alphabet size is optimal. In [6], the authors studied words avoiding $x\,\sigma(x)$, where $\sigma$ is a cyclic shift of the underlying alphabet. In [7], the authors studied words avoiding $x\,h(x)$, where $h$ is an arbitrary nonerasing morphism. In [8], the authors studied words avoiding $xx'$ where $x'$ is “nearly” identical to $x$. A classic problem, due independently to Justin [4], Brown and Freedman [1, Conjecture, pp. 595–596], Pirillo and Varricchio [9], and Halbeisen and Hungerbühler [3] and still unsolved, asks if it is possible to avoid additive squares (words of the form $xx'$ where $|x| = |x'|$ and $\sum x = \sum x'$) over some finite subset of $\mathbb{N}$, the natural numbers.

A natural pattern to consider is $xx'$, where $x'$ is a conjugate (cyclic shift) of $x$. For example, the English words enlist and listen are conjugates. However, since $x$ and $x'$ are conjugate if and only if there exist words $u, v$ with $x = uv$ and $x' = vu$, we see that a word contains this kind of pattern if and only if it contains a square. Thus, it is impossible to avoid such a pattern over a binary alphabet, although it can be avoided over a ternary alphabet.

Instead, in this note, we consider a minor variation of this pattern. We say a finite nonempty word $w$ is a mesosome if it is of the form $xx'$, with $x'$ a conjugate of $x$, and $x \neq x'$. The word mesosome itself is a mesosome, as some is a cyclic shift of meso. Every squarefree word, of course, avoids mesosomes, so mesosomes can certainly be avoided over a $3$-letter alphabet. Our goal is to consider mesosome avoidance over a binary alphabet.
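The definition can be made concrete with a small brute-force sketch. The function names below (`is_conjugate`, `is_mesosome`, `has_mesosome`) are hypothetical helpers introduced only for illustration; the conjugacy test uses the standard observation that $y$ is a cyclic shift of $x$ exactly when $|x| = |y|$ and $y$ occurs in $xx$.

```python
def is_conjugate(x: str, y: str) -> bool:
    """True if y is a cyclic shift of x (the two words may be equal)."""
    return len(x) == len(y) and y in x + x


def is_mesosome(w: str) -> bool:
    """True if w = x x' with x' a conjugate of x and x' != x."""
    if len(w) == 0 or len(w) % 2 == 1:
        return False
    half = len(w) // 2
    x, y = w[:half], w[half:]
    return x != y and is_conjugate(x, y)


def has_mesosome(w: str) -> bool:
    """True if some factor (contiguous block) of w is a mesosome."""
    n = len(w)
    return any(
        is_mesosome(w[s:s + 2 * h])
        for h in range(1, n // 2 + 1)      # h = |x| = |x'|
        for s in range(n - 2 * h + 1)      # s = start of the factor
    )


print(is_conjugate("enlist", "listen"))  # True
print(is_mesosome("mesosome"))           # True: "some" is a shift of "meso"
print(is_mesosome("0101"))               # False: a square, since x = x'
print(has_mesosome("00110"))             # True: contains the factor 0110
```

The check is cubic in the length of the word, which is adequate for the small experiments in this note but is not meant as an efficient algorithm.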

There are four principal results. First, we characterize all finite binary words avoiding mesosomes. Second, we count how many such words there are of length $n$, and show this number is a cubic function of $n$ (with coefficients depending on $n \bmod 4$). Third, we characterize all infinite binary words avoiding mesosomes. Finally, we characterize the minimal forbidden words for mesosome-avoiding words.

## 2 Characterizing the finite mesosome-avoiding words

We define $\overline{0} = 1$ and $\overline{1} = 0$, and extend this to finite and infinite words in the obvious way. Clearly a binary word $w$ has a mesosome iff $\overline{w}$ has a mesosome. Without loss of generality, then, in our characterization we can restrict our attention to nonempty words that begin with $0$. We make this assumption in what follows.

The basic idea is to classify mesosome-avoiding words starting with $0$ by the number of runs they contain. A run is a maximal block of contiguous identical symbols. With the notation $x \sim x'$, we mean that $x$ and $x'$ are conjugates (possibly equal).

One run: Clearly $0^i$ avoids mesosomes for all $i \geq 1$.

Two runs: Clearly $0^i 1^j$ avoids mesosomes for all $i, j \geq 1$.

Three runs:

###### Lemma 1.

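A word of the form $w = 0^i 1^j 0^k$ (for $i, j, k \geq 1$) avoids mesosomes iff $j$ is odd.

###### Proof.

If $w$ is mesosome-free, then $j$ must be odd. For otherwise $j = 2t$ and $w$ contains a mesosome factor of the form $(0 1^t)(1^t 0)$. For the converse, suppose $j$ is odd and $w$ contains a mesosome $xx'$. Since $x \neq x'$, both $x$ and $x'$ contain both letters. Then $x$ must start with $0$ and end with $1$ and $x'$ must start with $1$ and end with $0$. Since $x$ and $x'$ contain the same number of $1$’s, $j$ would be even, a contradiction. ∎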

Four runs:

###### Lemma 2.

The word $w = 0^i 1^j 0^k 1^\ell$ (for $i, j, k, \ell \geq 1$) avoids mesosomes iff

• $j = k = 1$ or

• $i < k$ and $\ell < j$ and $j$ and $k$ are odd.

###### Proof.

Let us show that the strings in Cases (a) and (b) avoid mesosomes.

Case (a): $w = 0^i 1 0 1^\ell$. Suppose $w$ has a mesosome. Then $w = y x x' z$ with $x \sim x'$ and $x \neq x'$. Suppose $x$ begins inside the first group of $0$’s in $w$. If it ends inside the first group of $0$’s, then either $x' = x$ or $x'$ contains a $1$, a contradiction. So $x$ contains a $1$. But then either $x = 0^a 1$, forcing $x' = x$, or $x = 0^a 1 0 1^b$, forcing $x'$ to have no $0$’s.

So $x$ begins with the first $1$. But then there is no possible choice for $x'$.

So $x$ begins at a position in $0 1^\ell$. But then either $x$ starts with $0$ and $x'$ contains no $0$, or $x = 1^a$, forcing $x' = x$. In all cases we get a contradiction.

Case (b): $w = 0^i 1^j 0^k 1^\ell$, with $i < k$ and $\ell < j$ and $j$ and $k$ are odd.

Suppose $w$ has a mesosome. Then again, $w = y x x' z$ with $x \sim x'$ and $x \neq x'$. Suppose $x$ begins inside the first group of $0$’s in $w$. Then as in the previous case, $x$ cannot end in the first group of $0$’s. Suppose $x$ ends inside the first group of $1$’s in $w$. Then since $j$ is odd, $x'$ must contain some $1$’s from the second group of $1$’s. This means that $x'$ has $k$ $0$’s. Since $i < k$, we see that $x'$ has more $0$’s than $x$ does, a contradiction. Suppose that $x$ ends after the first group of $1$’s. Then $x$ has at least $j$ $1$’s, and $x'$ has fewer than $j$ $1$’s, a contradiction.

So $x$ must begin within the first group of $1$’s or after. But then $x'$ must end in the second group of $1$’s. Since $k$ is odd and $\ell < j$, we can apply the argument above to $\overline{w}^R$, the complement of the reversal of $w$.

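Now we show that these are the only mesosome-avoiding binary words with four runs.

Suppose $w = 0^i 1^j 0^k 1^\ell$ with $i, j, k, \ell \geq 1$ is mesosome-free. Considering the factor $0^i 1^j 0^k$ gives us that $j$ is odd, by Lemma 1, and considering the factor $1^j 0^k 1^\ell$ gives us that $k$ is odd, also by Lemma 1.

Assume that neither (a) nor (b) holds. There are four cases to consider:

Case 1: $i \geq k$, $j \geq 3$: Write $w = 0^{i-k}\, x x'\, 1^{\ell-1}$, with $x = 0^k 1^{(j+1)/2}$ and $x' = 1^{(j-1)/2} 0^k 1$. Note that $x \sim x'$ and $x \neq x'$, a contradiction.

Case 2: $i \geq k$, $j = 1$, $k \geq 3$: Write $w = 0^{i-1}\, x x'\, 1^{\ell-1}$, with $x = 0 1 0^{(k-1)/2}$ and $x' = 0^{(k+1)/2} 1$. Note that $x \sim x'$ and $x \neq x'$, a contradiction.

Case 3: $\ell \geq j$, $k \geq 3$.

Case 4: $\ell \geq j$, $k = 1$, $j \geq 3$.

Cases 3 and 4 are completely parallel to Cases 1 and 2, by considering $\overline{w}^R$ instead of $w$. ∎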

Five runs:

###### Lemma 3.

If $w$ is mesosome-free and has 5 runs, then $w = 01010$.

###### Proof.

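Let $w = 0^i 1^j 0^k 1^\ell 0^m$. We can now apply Lemma 2 to the first four and last four runs. This gives four possibilities:

Case 1: $j = k = \ell = 1$. If $i \geq 2$, then $w = 0^{i-2}\, x x'\, 0^{m-1}$, with $x = 001$ and $x' = 010$. So $x \sim x'$ and $x \neq x'$. The case $m \geq 2$ follows analogously by considering the reversal $w^R$. So $i = m = 1$. In this case $w = 01010$, which has no mesosome.

Case 2: $j = k = 1$ and $j < \ell$ and $m < k$. But then $m < 1$, a contradiction.

Case 3: $k = \ell = 1$ and $i < k$ and $\ell < j$. But then $i < 1$, a contradiction.

Case 4: $i < k$ and $\ell < j$ and $j < \ell$ and $m < k$. But then $\ell < j$ and $j < \ell$, a contradiction. ∎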

More than $5$ runs: By applying Lemma 3 to the first five runs of $w$, we see that the first five runs must be $01010$. We can then apply the same lemma to each consecutive group of $5$ runs and conclude that $w = (01)^{n/2}$ or $w = (01)^{(n-1)/2} 0$, where $n = |w|$.

For the converse, we need to see that these words are mesosome-free. But this is clear, since any factor of the form $xx'$ with $|x| = |x'|$ either satisfies $x' = x$ if $|x|$ is even, or $x' = \overline{x}$ if $|x|$ is odd. In this latter case, $x$ contains one more copy of one letter than $x'$ does.

Thus we have proved the following result:

###### Theorem 1.

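The finite binary word $w$ is mesosome-free iff $w$ or $\overline{w}$ has one of the following forms:

• $0^i$, $i \geq 0$;

• $0^i 1^j$, $i, j \geq 1$;

• $0^i 1^j 0^k$ for $i, j, k \geq 1$ and $j$ odd;

• $0^i 1 0 1^\ell$ for $i, \ell \geq 1$;

• $0^i 1^j 0^k 1^\ell$ for $i, j, k, \ell \geq 1$ and $j, k$ odd, and $i < k$, $\ell < j$;

• $(01)^i$ or $(01)^i 0$ for $i \geq 2$.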
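The characterization lends itself to an exhaustive check on short words. The sketch below assumes the following reading of the six forms, stated for a word that begins with $0$ after complementing if necessary: one or two runs; three runs with an odd middle run; four runs $0^i 1^j 0^k 1^\ell$ with either $j = k = 1$, or $j, k$ odd with $i < k$ and $\ell < j$; or all runs of length one. The helper names are hypothetical.

```python
from itertools import groupby, product


def has_mesosome(w: str) -> bool:
    """True if some factor of w is x x' with x' a conjugate of x, x' != x."""
    n = len(w)
    for h in range(1, n // 2 + 1):
        for s in range(n - 2 * h + 1):
            x, y = w[s:s + h], w[s + h:s + 2 * h]
            if x != y and y in x + x:
                return True
    return False


def matches_theorem_1(w: str) -> bool:
    """Test w against the run-length conditions assumed in the lead-in."""
    if w.startswith("1"):                      # pass to the binary complement
        w = w.translate(str.maketrans("01", "10"))
    runs = [len(list(g)) for _, g in groupby(w)]
    if len(runs) <= 2:
        return True
    if len(runs) == 3:
        return runs[1] % 2 == 1
    if len(runs) == 4:
        i, j, k, l = runs
        odd = j % 2 == 1 and k % 2 == 1
        return (j == 1 and k == 1) or (odd and i < k and l < j)
    return all(r == 1 for r in runs)           # alternating word


def disagreements(max_len: int) -> list:
    """Binary words of length <= max_len where the two tests disagree."""
    bad = []
    for n in range(max_len + 1):
        for letters in product("01", repeat=n):
            w = "".join(letters)
            # The form test should hold exactly when no mesosome occurs.
            if matches_theorem_1(w) == has_mesosome(w):
                bad.append(w)
    return bad


print(disagreements(10))
```

An empty list means the run-length conditions and the brute-force search agree on every binary word up to the chosen length.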

## 3 Counting the mesosome-avoiding words

Based on the characterization of Section 2, we can now count the mesosome-avoiding binary words of length $n$. The following table gives the number $m_n$ of such words of length $n$ for the first few values of $n$:

| $n$ | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 | 15 | 16 | 17 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| $m_n$ | 1 | 2 | 4 | 8 | 14 | 24 | 32 | 42 | 54 | 68 | 82 | 98 | 118 | 140 | 162 | 186 | 216 | 248 |

It is sequence A341277 in the On-Line Encyclopedia of Integer Sequences [10].

###### Theorem 2.

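Let $m(n)$ denote the number of mesosome-avoiding binary words of length $n$. Write $n = 4k + a$ for $n \geq 5$ and $0 \leq a \leq 3$. Then

$$
\begin{aligned}
m(4k) &= (4k^3 + 15k^2 + 41k - 12)/3; \\
m(4k+1) &= (4k^3 + 18k^2 + 50k)/3; \\
m(4k+2) &= (4k^3 + 21k^2 + 59k + 12)/3; \\
m(4k+3) &= (4k^3 + 24k^2 + 68k + 30)/3.
\end{aligned}
\tag{1}
$$

###### Proof.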

In what follows we only consider the words that begin with $0$; to get the total of all such words, as provided in the statement of the theorem, it is necessary to multiply the intermediate results below by $2$.

Words with one run: There is only one, namely $0^n$.

Words with two runs: We must count the words of the form $0^i 1^j$ for $i, j \geq 1$ and $i + j = n$. Clearly there are $n - 1$ such words.

Words with three runs: We must count words of the form $0^i 1^j 0^k$ for $i, j, k \geq 1$ and $j$ odd. Once $j$ is fixed, there are clearly $n - j - 1$ such words, so letting $j = 2j' - 1$, we see that the total is

$$
\sum_{1 \leq j' \leq (n-1)/2} (n - 2j') =
\begin{cases}
n(n-2)/4, & \text{if } n \text{ is even;} \\
(n-1)^2/4, & \text{if } n \text{ is odd.}
\end{cases}
$$

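Words with four runs: Here there are two cases. Either $w = 0^i 1 0 1^\ell$ for $i, \ell \geq 1$, or $w = 0^i 1^j 0^k 1^\ell$ with $i, \ell \geq 1$, $i < k$, $\ell < j$, and $j, k$ odd.

In the former case there are clearly $n - 3$ such words.

To count the second case, let $f(r)$ denote the number of ways to write $r$ as the sum $i + k$ with $i, k \geq 1$ and $i < k$ and $k$ odd. Then it is easy to see that $f(r) = \lfloor r/4 \rfloor$. Then the second case is clearly $g(n) = \sum_{0 \leq r \leq n} f(r) f(n-r)$. To compute $g(n)$, rewrite $f(r)$ as

$$
r/4 - 3/8 + (-1)^r/8 + (1-i)\, i^r/8 + (1+i)(-i)^r/8,
$$

where $i$ here denotes the imaginary unit, and then sum in the standard way (or use Maple). We get

$$
g(n) = \frac{n-3}{96}\left(n^2 - 6n + \frac{3(-1)^n}{2} + \frac{1}{2}\right) + \frac{1}{192}\Big( \big(6in - 18i + 6(-1)^n\big)(-i)^n + \big({-6in} + 18i + 6(-1)^n\big)\, i^n \Big).
$$

Considering the possibilities mod 4, this gives

$$
\begin{aligned}
g(4n) &= 2n^3/3 - (3/2)n^2 + 5n/6 \\
g(4n+1) &= 2n^3/3 - n^2 + n/3 \\
g(4n+2) &= 2n^3/3 - n^2/2 - n/6 \\
g(4n+3) &= 2n^3/3 - 2n/3.
\end{aligned}
$$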
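The four-run count can be cross-checked numerically. The sketch below assumes the reading used here: $f(r) = \lfloor r/4 \rfloor$ counts the ways to write $r = i + k$ with $1 \leq i < k$ and $k$ odd, and $g(n)$ is the convolution $\sum_r f(r) f(n-r)$. The names `f_direct`, `g_convolution` and `g_closed` are hypothetical; exact rational arithmetic is used so that no rounding is involved.

```python
from fractions import Fraction


def f_direct(r: int) -> int:
    """Count pairs (i, k) with i + k = r, 1 <= i < k, and k odd."""
    return sum(1 for k in range(1, r, 2) if 1 <= r - k < k)


def f(r: int) -> int:
    """Assumed closed form for f_direct."""
    return r // 4


def g_convolution(n: int) -> int:
    """Number of (i, k, l, j) splits: i + k = r and l + j = n - r."""
    return sum(f(r) * f(n - r) for r in range(n + 1))


def g_closed(n: int) -> Fraction:
    """The four residue-class polynomials, evaluated at q = n // 4."""
    q, a = divmod(n, 4)
    q = Fraction(q)
    return (
        2 * q**3 / 3 - 3 * q**2 / 2 + 5 * q / 6,
        2 * q**3 / 3 - q**2 + q / 3,
        2 * q**3 / 3 - q**2 / 2 - q / 6,
        2 * q**3 / 3 - 2 * q / 3,
    )[a]


for n in range(8, 14):
    print(n, g_convolution(n), g_closed(n))
```

For instance, the only four-run word of the second kind with length $8$ that begins with $0$ is $01110001$, matching $g(8) = 1$.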

Words with five or more runs: From Lemma 3 we see that for $n \geq 5$, there is exactly one such word of length $n$.

Now, adding up all five cases, and multiplying by $2$, we get that the total number of mesosome-avoiding words of length $n$ is given by Eqn. (1). ∎
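As a sanity check on the count, the sketch below compares a brute-force enumeration with the tabulated values and with the four polynomials of Eqn. (1). It assumes the formulas are meant for $n \geq 5$; at $n = 4$ the polynomial gives $16$ rather than $14$, because no word of length $4$ has five runs. The function names are hypothetical.

```python
from itertools import product

# Values of m_n for n = 0..17, copied from the table above.
TABLE = [1, 2, 4, 8, 14, 24, 32, 42, 54, 68, 82, 98,
         118, 140, 162, 186, 216, 248]


def has_mesosome(w: str) -> bool:
    """True if some factor of w is x x' with x' a conjugate of x, x' != x."""
    n = len(w)
    for h in range(1, n // 2 + 1):
        for s in range(n - 2 * h + 1):
            x, y = w[s:s + h], w[s + h:s + 2 * h]
            if x != y and y in x + x:
                return True
    return False


def m_brute(n: int) -> int:
    """Count mesosome-free words of length n by exhaustive search."""
    if n == 0:
        return 1
    # Count words starting with 0 and double, by complement symmetry.
    tails = product("01", repeat=n - 1)
    return 2 * sum(not has_mesosome("0" + "".join(t)) for t in tails)


def m_formula(n: int) -> int:
    """Evaluate Eqn. (1); assumed valid only for n >= 5."""
    k, a = divmod(n, 4)
    numerator = (
        4 * k**3 + 15 * k**2 + 41 * k - 12,
        4 * k**3 + 18 * k**2 + 50 * k,
        4 * k**3 + 21 * k**2 + 59 * k + 12,
        4 * k**3 + 24 * k**2 + 68 * k + 30,
    )[a]
    return numerator // 3   # each numerator is divisible by 3


for n in range(5, 13):
    print(n, m_brute(n), TABLE[n], m_formula(n))
```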

## 4 Infinite binary words avoiding mesosomes

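Recall that by $x^\omega$, for a finite word $x$, we mean the infinite word $xxx\cdots$.

###### Theorem 3.

Let $i \geq 1$ and let $j \geq 1$ be odd. If $\mathbf{x}$ is an infinite binary mesosome-free word, then $\mathbf{x}$ has one of the following forms (or their binary complement):

• $0^\omega$;

• $0^i 1^\omega$;

• $0^i 1^j 0^\omega$;

• $0^i 1 0 1^\omega$;

• $(01)^\omega$.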

###### Proof.

First suppose that $\mathbf{x}$ is an infinite binary mesosome-free word. Let $x_n$ denote the first $n$ letters of $\mathbf{x}$, for all $n \geq 1$. Note that $x_n$ must be mesosome-free for all such $n$. This means that for all such $n$, $x_n$ must be of the form described in Theorem 1.

Now suppose to the contrary that for some $n$, $x_n$ has the form $0^i 1^j 0^k 1^\ell$ for $j, k$ odd, and $i < k$, $\ell < j$. Then if we add zeros at the end of $x_n$, we will have a word with five runs not of the form $(01)^m$ or $(01)^m 0$, and so we will have a mesosome. If we add ones at the end of $x_n$, then eventually $\ell \geq j$ and our new word will have a mesosome. Therefore, there is no way to add letters at the end of $x_n$ while ensuring that the new word is mesosome-free. Thus, $\mathbf{x}$ must contain a mesosome, a contradiction.

It follows that $x_n$ is of the form $0^n$, $0^i 1^{n-i}$, $0^i 1^j 0^{n-i-j}$, $0^i 1 0 1^{n-i-2}$, $(01)^{n/2}$, or $(01)^{(n-1)/2} 0$ for all $n$. It is easy to see by letting $n$ approach infinity that $\mathbf{x}$ must be of the required form.

Now suppose that $\mathbf{x}$ is of the required form. Then clearly for all natural $n$, $x_n$ is mesosome-free by Theorem 1, so $\mathbf{x}$ is mesosome-free. ∎
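A finite experiment cannot prove anything about infinite words, but it can illustrate the theorem on long prefixes. The sketch below assumes the infinite forms are $0^\omega$, $0^i 1^\omega$, $0^i 1^j 0^\omega$ with $j$ odd, $0^i 1 0 1^\omega$, and $(01)^\omega$ (a reading inferred from Theorem 1 and the proof above), and checks that their prefixes contain no mesosome. The generator `prefixes` and its parameters are hypothetical.

```python
def has_mesosome(w: str) -> bool:
    """True if some factor of w is x x' with x' a conjugate of x, x' != x."""
    n = len(w)
    for h in range(1, n // 2 + 1):
        for s in range(n - 2 * h + 1):
            x, y = w[s:s + h], w[s + h:s + 2 * h]
            if x != y and y in x + x:
                return True
    return False


def prefixes(length: int, max_i: int = 4, max_j: int = 5):
    """Yield length-`length` prefixes of the assumed infinite forms."""
    yield "0" * length                                   # 0^omega
    yield ("01" * length)[:length]                       # (01)^omega
    for i in range(1, max_i + 1):
        yield ("0" * i + "1" * length)[:length]          # 0^i 1^omega
        yield ("0" * i + "10" + "1" * length)[:length]   # 0^i 1 0 1^omega
        for j in range(1, max_j + 1, 2):                 # j odd
            yield ("0" * i + "1" * j + "0" * length)[:length]


print(any(has_mesosome(p) for p in prefixes(24)))   # False
# An even middle run breaks the pattern: 0 11 0 0 0 ... contains 0110.
print(has_mesosome("0" + "11" + "0" * 10))          # True
```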

## 5 Minimal forbidden words

A word is minimal forbidden if it is a mesosome, but no proper subword is a mesosome. For example, the smallest minimal forbidden words are $0110$ and $1001$. There are ten minimal forbidden words of length $6$, which are $001010$, $010001$, $010100$, $011101$, $011110$, and their binary complements. These are the only minimal forbidden words of length $6$. Note that a minimal forbidden word is always of the form $xx'$ with $|x| = |x'|$, so a minimal forbidden word always has even length.

Our first goal is to characterize the minimal forbidden words. We do this by considering cases depending on the number of runs that the words contain.

One or two runs: All words with one or two runs are mesosome-free, so none are minimal-forbidden.

Three runs: Let $w = 0^i 1^j 0^k$ be minimal forbidden. Then $w$ is a mesosome, so $j$ must be even by Lemma 1. Now, if $i > 1$, we can write $w = 0^{i-1}\, u\, 0^{k-1}$, where $u = 0 1^j 0$ is a mesosome, implying that $w$ is not minimal forbidden. Similarly, if $k > 1$, then $w$ is not minimal forbidden. Thus, we must have $i = k = 1$, and so $w = 0 1^j 0$ for even $j$.

Four runs:

###### Lemma 4.

A word of the form $w = 0^i 1^j 0^k 1^\ell$ is minimal forbidden iff $j$ and $k$ are odd and either

• $i = k$, $\ell = 1$, and $j \geq 3$ or

• $i = 1$, $\ell = j$, and $k \geq 3$.

###### Proof.

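Let $w = 0^i 1^j 0^k 1^\ell$ be minimal forbidden. Then $j$ and $k$ must be odd, because otherwise either $0 1^j 0$ or $1 0^k 1$ would be a proper subword and a mesosome. Having established that $j$ and $k$ are odd, we split into four cases as follows:

Case 1: $i \geq k$ and $\ell \geq j$. If $j = k = 1$ then $w$ is mesosome-free by Lemma 2, so $w$ is not minimal forbidden. Otherwise, suppose that $j \geq 3$. Then $\ell \geq 3$, so we can write $w = u1$ where $u = 0^i 1^j 0^k 1^{\ell-1}$ is a proper subword and (since $i \geq k$ and $j \geq 3$) contains a mesosome by Lemma 2. A similar argument applied to $\overline{w}^R$ will show that if $k \geq 3$, then $w$ is still not minimal forbidden, a contradiction.

Case 2: $i \geq k$ and $\ell < j$. Note that if $x = 0^k 1^{(j+1)/2}$ and $x' = 1^{(j-1)/2} 0^k 1$, then $xx'$ is minimal forbidden and $w = 0^{i-k}\, x x'\, 1^{\ell-1}$. It follows that in this case, $w$ is minimal forbidden iff $0^{i-k}$ and $1^{\ell-1}$ are empty; that is, if and only if $i = k$ and $\ell = 1$.

Case 3: $i < k$ and $\ell \geq j$. By considering $\overline{w}^R$, we see that this case is symmetrical to Case 2. It follows that in this case, $w$ is minimal forbidden iff $i = 1$ and $\ell = j$.

Case 4: $i < k$ and $\ell < j$. By Lemma 2, $w$ is mesosome-free, so $w$ cannot be minimal forbidden. ∎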

Five runs:

###### Lemma 5.

No word of length $n \geq 8$ of the form $0^i 1^j 0^k 1^\ell 0^m$ is minimal forbidden.

###### Proof.

Suppose to the contrary that $w = 0^i 1^j 0^k 1^\ell 0^m$ has length $n \geq 8$ and is minimal forbidden. Then clearly at least one of $i$, $j$, $k$, $\ell$, and $m$ must be greater than $1$. We split into two cases based on this.

Case 1: $j = k = \ell = 1$. Then $i \geq 2$ or $m \geq 2$. Suppose that $i \geq 2$. Then we can write $w = 0^{i-2}\, x x'\, 0^{m-1}$, where $x = 001$ and $x' = 010$. Note that $xx'$ is a mesosome of length $6$, and since $w$ has length $\geq 8$, it must be that $xx'$ is a proper subword of $w$, a contradiction. The case where $m \geq 2$ is symmetrical.

Case 2: at least one of $j$, $k$, and $\ell$ is different from one. Then as in the proof of Lemma 3, it can be shown that either $0^i 1^j 0^k 1^\ell$ or $1^j 0^k 1^\ell 0^m$ contains a mesosome, implying that $w$ is not minimal forbidden. ∎

Six runs or more: Suppose to the contrary that $w$ is minimal forbidden with six runs or more. Then $w$ must have at least one run of length $\geq 2$, because otherwise $w$ would be mesosome-free by Theorem 1. Therefore, we can write $w = y u z$, where $u$ has five runs, with one run of length $\geq 2$. Note that $y$ and $z$ cannot both be empty, because $w$ needs to have at least six runs. Thus, $u$ is a proper subword of $w$. Since $u$ has at least one run of length $\geq 2$, $u$ must also contain a mesosome by Theorem 1, a contradiction. We see that there are no minimal forbidden words with six runs or more.

We have now proved the following result:

###### Theorem 4.

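For $n \equiv 0$ (mod 4) and $n \geq 8$ there are $n - 2$ minimal forbidden words and they are as follows:

• $0^k 1^j 0^k 1$ for odd $j, k$ with $j \geq 3$ and $2k + j + 1 = n$;

• $0 1^j 0^k 1^j$ for odd $j, k$ with $k \geq 3$ and $2j + k + 1 = n$;

• $0 1^{n-2} 0$,

and their binary complements.

For $n \equiv 2$ (mod 4) and $n \geq 10$ there are $n$ minimal forbidden words and they are as follows:

• $0^k 1^j 0^k 1$ for odd $j, k$ with $j \geq 3$ and $2k + j + 1 = n$;

• $0 1^j 0^k 1^j$ for odd $j, k$ with $k \geq 3$ and $2j + k + 1 = n$;

• $0 1^{n-2} 0$,

and their binary complements.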
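The minimal forbidden words of small length can be enumerated directly from the definition. The sketch below assumes the following reading of the theorem for even $n \geq 8$: the words are $0 1^{n-2} 0$, the words $0^k 1^j 0^k 1$ with $j, k$ odd and $j \geq 3$, the words $0 1^j 0^k 1^j$ with $j, k$ odd and $k \geq 3$, and the complements of all of these. The function `predicted` encodes that assumption and is compared against a brute-force search; all names are hypothetical.

```python
from itertools import product


def is_mesosome(w: str) -> bool:
    """True if w = x x' with x' a conjugate of x and x' != x."""
    if len(w) == 0 or len(w) % 2 == 1:
        return False
    h = len(w) // 2
    x, y = w[:h], w[h:]
    return x != y and y in x + x


def has_mesosome(w: str) -> bool:
    """True if some factor of w is a mesosome."""
    n = len(w)
    return any(is_mesosome(w[s:s + 2 * h])
               for h in range(1, n // 2 + 1)
               for s in range(n - 2 * h + 1))


def minimal_forbidden(n: int) -> list:
    """Mesosomes of length n none of whose proper factors is a mesosome."""
    out = []
    for letters in product("01", repeat=n):
        w = "".join(letters)
        # Every proper factor of w lies inside w[1:] or inside w[:-1].
        if is_mesosome(w) and not has_mesosome(w[1:]) and not has_mesosome(w[:-1]):
            out.append(w)
    return out


def predicted(n: int) -> set:
    """Words given by the assumed reading of the theorem (even n >= 8)."""
    words = {"0" + "1" * (n - 2) + "0"}
    for k in range(1, n, 2):                  # family 0^k 1^j 0^k 1
        j = n - 2 * k - 1
        if j >= 3:
            words.add("0" * k + "1" * j + "0" * k + "1")
    for j in range(1, n, 2):                  # family 0 1^j 0^k 1^j
        k = n - 2 * j - 1
        if k >= 3:
            words.add("0" + "1" * j + "0" * k + "1" * j)
    comp = str.maketrans("01", "10")
    return words | {w.translate(comp) for w in words}


print(minimal_forbidden(4))
print(len(minimal_forbidden(6)))
for n in (8, 10, 12):
    print(n, set(minimal_forbidden(n)) == predicted(n), len(predicted(n)))
```

Length $6$ is excluded from the comparison because it also admits the five-run words $001010$ and $010100$ and their complements.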

## References

• [1] T. C. Brown and A. R. Freedman. Arithmetic progressions in lacunary sets. Rocky Mountain J. Math. 17 (1987), 587–596.
• [2] P. Erdős. Some unsolved problems. Magyar Tud. Akad. Mat. Kutató Int. Közl. 6 (1961), 221–254.
• [3] L. Halbeisen and N. Hungerbühler. An application of Van der Waerden’s theorem in additive number theory. INTEGERS: Elect. J. Combin. Number Theory 0 (2000), #A7.
• [4] J. Justin. Généralisation du théorème de van der Waerden sur les semi-groupes répétitifs. J. Combin. Theory. Ser. A 12 (1972), 357–367.
• [5] V. Keränen. Abelian squares are avoidable on 4 letters. In W. Kuich, editor, Proc. 19th Int’l Conf. on Automata, Languages, and Programming (ICALP), Vol. 623 of Lecture Notes in Computer Science, pp. 41–52. Springer-Verlag, 1992.
• [6] J. Loftus, J. O. Shallit, and M.-w. Wang. New problems of pattern avoidance. In G. Rozenberg and W. Thomas, editors, Developments in Language Theory, 1999, pp. 185–199. World Scientific, 2000.
• [7] T. Ng, P. Ochem, N. Rampersad, and J. Shallit. New results on pseudosquare avoidance. In R. Mercas and D. Reidenbach, editors, WORDS 2019, Vol. 11682 of Lecture Notes in Computer Science, pp. 264–274. Springer-Verlag, 2019.
• [8] P. Ochem, N. Rampersad, and J. Shallit. Avoiding approximate squares. Internat. J. Found. Comp. Sci. 19 (2008), 633–648.
• [9] G. Pirillo and S. Varricchio. On uniformly repetitive semigroups. Semigroup Forum 49 (1994), 125–129.
• [10] N. J. A. Sloane et al. The On-Line Encyclopedia of Integer Sequences. Web resource available at https://oeis.org, 2021.
• [11] A. Thue. Über unendliche Zeichenreihen. Norske vid. Selsk. Skr. Mat. Nat. Kl. 7 (1906), 1–22. Reprinted in Selected Mathematical Papers of Axel Thue, T. Nagell, editor, Universitetsforlaget, Oslo, 1977, pp. 139–158.
• [12] A. Thue. Über die gegenseitige Lage gleicher Teile gewisser Zeichenreihen. Norske vid. Selsk. Skr. Mat. Nat. Kl. 1 (1912), 1–67. Reprinted in Selected Mathematical Papers of Axel Thue, T. Nagell, editor, Universitetsforlaget, Oslo, 1977, pp. 413–478.