# Re-pairing brackets

Consider the following one-player game. Take a well-formed sequence of opening and closing brackets. As a move, the player can pair any opening bracket with any closing bracket to its right, erasing them. The goal is to re-pair (erase) the entire sequence, and the complexity of a strategy is measured by its width: the maximum number of nonempty segments of symbols (separated by blank space) seen during the play. For various initial sequences, we prove upper and lower bounds on the minimum width sufficient for re-pairing. (In particular, the sequence associated with the complete binary tree of height n admits a strategy of width sub-exponential in log n.) Our two key contributions are (1) lower bounds on the width and (2) their application in automata theory: quasi-polynomial lower bounds on the translation from one-counter automata to Parikh-equivalent nondeterministic finite automata. The latter result answers a question by Atig et al. (2016).

## 1 Introduction

Consider the following one-player game. Take a well-formed sequence of opening and closing brackets; that is, a word in the Dyck language. As a move, the player can pair any opening bracket with any closing bracket to its right, erasing them. The two brackets do not need to be adjacent or matched to each other in the word. The goal is to re-pair (erase) the entire word, and the complexity of a play is measured by its width: the maximum number of nonempty segments of symbols (‘islands’ separated by blank space) seen during the play. Here is an example:

Note that move  pairs up two brackets not matched to each other in the initial word; such moves are permitted without restrictions. At the beginning, there is a single segment, which splits into two after the first move. Both segments disappear simultaneously after the third move; the width of the play is equal to . In this example, width  is actually sufficient: a better strategy is to erase the endpoints first and then erase the two matched pairs in either order.
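To make the notion of width concrete, the following minimal Python sketch replays a play and counts the surviving islands after each move. The word `(()())` and both plays are illustrative assumptions chosen to fit the description above; they are not taken from the original figure. The helper names `islands` and `play_width` are hypothetical, and positions are 0-indexed.

```python
def islands(alive):
    """Count maximal runs of consecutive surviving positions."""
    return sum(1 for i in alive if i - 1 not in alive)


def play_width(word, moves):
    """Replay `moves` on `word` and return the largest number of islands seen."""
    alive = set(range(len(word)))
    width = islands(alive)
    for left, right in moves:
        # A move pairs a surviving "(" with a surviving ")" to its right.
        assert left < right and word[left] == "(" and word[right] == ")"
        assert left in alive and right in alive
        alive -= {left, right}
        width = max(width, islands(alive))
    assert not alive, "the play must erase the whole word"
    return width


word = "(()())"
# Assumed play: the second move pairs two brackets that are not matched in the word.
print(play_width(word, [(1, 2), (0, 4), (3, 5)]))  # 2
# Assumed better play: endpoints first, then the two matched pairs.
print(play_width(word, [(0, 5), (1, 2), (3, 4)]))  # 1
```

The assertions inside `play_width` enforce only the rules stated in the text: each move takes an opening bracket and a closing bracket to its right, and the play must erase everything.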

For a word , the width of  is the minimum width sufficient for re-pairing it. Is it true that all well-formed (Dyck) words, no matter how long, have bounded complexity, i.e., can be re-paired using width at most , where is independent of the word? The answer to this simply formulated combinatorial question turns out to be negative, but there does not appear to be a simple proof for this: strategies, perhaps surprisingly, turn out quite intricate. In the present paper, we study this and related questions.

Motivation. First of all, we find the re-pairing problem interesting in its own right—as a curious combinatorial game, easily explained using just the very basic concepts in discrete mathematics. So our motivation is, in part, driven by the appeal of the problem itself.

We first identified the re-pairing problem when studying an open question in automata theory, the complexity of translation of one-counter automata (OCA) on finite words into Parikh-equivalent nondeterministic finite automata (NFA) [ACH16]. This translation arose in model checking, namely in the context of availability expressions [AAMS15, HMO10]. It is more generally motivated by recent lines of research on the classic Parikh theorem [Par66]: on its applications in verification (see, e.g., [GM12, EGP14, HL12]) and on its extensions and refinements required for them [EGKL11, KT10, Kop15]. It has been unknown [ACH16] whether the translation in question can be made polynomial, and in this paper we answer this question negatively: using our results on the re-pairing problem, we obtain a quasi-polynomial lower bound on the blowup in the translation.

The re-pairing problem is a curious case study in the theory of non-uniform models of computation. As it turns out, restricted strategies, where paired brackets are always matched to each other in the word, have a close connection with the black-and-white pebble game on binary trees, a classic setting in computational complexity (see, e.g., surveys by Nordström [Nor13, Nor15]). We show that unrestricted strategies in the re-pairing game make it significantly more complex than pebbling, strengthening this model of computation.

Our two key contributions are (i) lower bounds on the width (against this model) and (ii) the connection to automata theory: lower bounds on the translation from OCA to Parikh-equivalent NFA.

Our lower bounds on the width are obtained by bounding the set of (Dyck) words that can be re-paired using width  (for each ). To put this into context, classic models of matrix grammars [Roz87] and deterministic two-way transducers [Roz85], as well as a more recent model of streaming string transducers [AC11, AC10], extend our model of computation with additional finite-state memory. (We refer the reader to surveys [FR16, Mus17, MP19] for more details.) In terms of transducers, our technique would correspond to determining the expressive power of machines of bounded size. Existing results of this kind (see [FR16, Mus17]) apply to variants of the model where concatenation is restricted: with [AR13] or without [DRT16] restrictions on output and in the presence of nondeterminism [BGMP16]. Our result is, to the best of our knowledge, the first lower bound against the unrestricted model.

In the application of the re-pairing problem to automata theory, our lower bounds on the size of NFA for the Parikh image can be viewed as lower bounds on the size of commutative NFA, i.e., nondeterministic automata over the free commutative monoid (cf. [Huy83, Huy85, Esp97, KT10, Kop15, HH16]). To the best of our knowledge, we are the first to develop lower bounds (on description size) for this simple non-uniform model of computation. It is well known that, even for usual NFA, obtaining lower bounds on the size (number of states) is challenging and, in fact, provably (and notoriously) hard; and that the available toolbox of techniques is limited (see, e.g., [GH06, HK08] and [HPS09]). From the ‘NFA perspective’, we first develop a lower bound for a stronger model of computation and then import this result to NFA using combinatorial tools, which we thus bring to automata theory: the Birkhoff–von Neumann theorem on doubly stochastic matrices (see, e.g., [Sch03, p. 301]) and the Nisan–Wigderson construction of a family of sets with pairwise low intersection [NW94]. The obtained lower bounds point to a limitation of NFA that does not seem to have the form of the usual communication complexity bottleneck (cf. [Sha09, Theorem 3.11.4], [HPS09], and the book by Hromkovič [Hro97]); exploring and exploiting this further is a possible direction for future research.

### Our contribution

In this paper, we define and study the re-pairing problem (game), as sketched above. Our main results are as follows:

1. We show that every well-formed (Dyck) word has a re-pairing of width , where is the length of .

This re-pairing always pairs up brackets that are matched to each other in ; we call re-pairings with this property simple. It is standard that well-formed words are associated with trees; for words  associated with binary trees, we show that the minimum width of a simple re-pairing is equal (up to a constant factor) to the minimum number of pebbles in the black-and-white pebble game on the associated tree, a quantity that has been studied in computational complexity [CS76, LT80, Nor13, Nor15] and captures the amount of space used by nondeterministic computation.

In particular, this means [Lou79, Mey81, LT80] that for the word associated with a complete binary tree of height , the minimum width of simple re-pairings is , which is logarithmic in the length of .

2. For , we show how to beat this bound, giving a (non-simple) recursive re-pairing strategy of width . This is a function sub-exponential in ; it grows faster than all , but slower than all , .

3. For and for a certain ‘stretched’ version of it, , we prove lower bounds on the width of re-pairings:

$$
\begin{aligned}
\operatorname{width}(Z(n)) &= \Omega\!\left(\frac{\log\log|Z(n)|}{\log\log\log|Z(n)|}\right) = \Omega\!\left(\frac{\log n}{\log\log n}\right), \\
\operatorname{width}(Y(\ell)) &= \Omega\!\left(\sqrt{\frac{\log|Y(\ell)|}{\log\log|Y(\ell)|}}\right) = \Omega(\ell).
\end{aligned}
\tag{1}
$$

4. As an application of our lower bounds, we prove that there is no polynomial translation from one-counter automata (OCA) on finite words to Parikh-equivalent nondeterministic finite automata (NFA). This shows that optimal translations must be quasi-polynomial, answering a question by Atig et al. [ACH16].

To prove this result, we consider OCA from a specific complete family, , identified by Atig et al. [ACH16]. (There is a polynomial translation from (any) OCA to Parikh-equivalent NFA if and only if these OCA  have Parikh-equivalent NFA of polynomial size.) We prove, for every Dyck word of length , a lower bound of on the minimum size of NFA accepting regular languages Parikh-equivalent to . Based on the words , we get a lower bound of

$$n^{\Omega\left(\sqrt{\log n / \log\log n}\right)}$$

on the size of NFA. Note that this holds for NFA that accept not just a specific regular language, but any language Parikh-equivalent to the one-counter language (there are infinitely many such languages for each ).

### Background and related work

##### Parikh image of one-counter languages.

The problem of re-pairing brackets in well-formed words is linked to the following problem in automata theory.

The Parikh image (or commutative image) of a word  over an alphabet is a vector of dimension in which the components specify how many times each letter from occurs in . The Parikh image of a language is the set of Parikh images of all words . It is well-known [Par66] that for every context-free language there exists a regular language  with the same Parikh image (Parikh-equivalent to ). If is generated by a context-free grammar of size , then there is a nondeterministic finite automaton (NFA) of size exponential in  that accepts such a regular language  (see [EGKL11]); the exponential in this translation is necessary in the worst case.
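As a small illustration of Parikh equivalence, the sketch below computes Parikh images of finite samples of two languages. The function names and the two sample languages (words of the form a^n b^n, which form a context-free language, and words of the form (ab)^n, which form a regular one) are illustrative assumptions, not examples from the paper.

```python
from collections import Counter


def parikh_image(word, alphabet):
    """Vector of letter counts of `word`, in the order given by `alphabet`."""
    counts = Counter(word)
    return tuple(counts[letter] for letter in alphabet)


def parikh_image_of_language(words, alphabet):
    """Set of Parikh images of a finite collection of words."""
    return {parikh_image(w, alphabet) for w in words}


alphabet = "ab"
context_free_sample = ["a" * n + "b" * n for n in range(5)]  # a^n b^n
regular_sample = ["ab" * n for n in range(5)]                # (ab)^n

# Both samples have the same Parikh image: the vectors (n, n) for n < 5.
print(parikh_image_of_language(context_free_sample, alphabet)
      == parikh_image_of_language(regular_sample, alphabet))
```

Letter order is forgotten by the Parikh image, which is why a regular language can stand in for a non-regular one here.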

When applying this translation to a language from a proper subclass of context-free languages, it is natural to ask whether this blowup in description size can be avoided. For languages recognized by one-counter automata (OCA; a fundamental subclass of pushdown automata), the exponential construction is suboptimal [ACH16]. If an alphabet  is fixed, then for every OCA with  states over  there exists a Parikh-equivalent NFA of polynomial size (the degree of this polynomial depends on ). And even in general, if the alphabet is not fixed, for every OCA with  states over an alphabet of cardinality at most  there exists a Parikh-equivalent NFA of size , quasi-polynomial in . Whether this quasi-polynomial construction is optimal has been unknown, and we prove in the present paper a quasi-polynomial lower bound.

We note that the gap between NFA of polynomial and quasi-polynomial size grows to exponential when the translation is applied iteratively, as is the case in Abdulla et al. [AAMS15].

##### Matrix grammars of finite index and transducers.

The question of whether all well-formed (Dyck) words can be re-paired using bounded width can be linked to a question on matrix grammars, a model of computation studied since the 1960s [Abr65]. Matrix grammars are a generalization of context-free grammars in which productions are applied in ‘batches’ prescribed by the grammar. This formalism subsumes many classes of rewriting systems, including controlled grammars, L systems, etc. (see, e.g., [DPS97]).

The index of a derivation in a matrix grammar is the maximum number of nonterminals in a sentential form in this derivation (this definition applies to ordinary context-free grammars as well) [Bra67, GS68]. Bounding the index of derivations, i.e., restricting grammars to finite index, is known to reduce the class of generated languages; this holds both for ordinary context-free [GS68, Sal69, Gru71] and matrix grammars [Bra67]. Languages generated by finite-index matrix grammars have many characterizations: as languages output by deterministic two-way transducers with one-way output tape [Raj72], or produced by EDT0L systems of finite index [Lat79, Proposition I.2]; images of monadic second-order logic (MSO) transductions [EH01]; and, most recently, output languages of streaming string transducers [AC11, AC10]. (See also the survey by Filiot and Reynier [FR16].)

Encoding the rules of our re-pairing problem in the matrix grammar formalism leads to a simple sequence of grammars with index  for subsets of the Dyck language ; the question of whether all Dyck words can be re-paired using bounded width is the same as asking if any of these grammars has in fact (bounded-index) derivations for all Dyck words. A 1987 paper by Rozoy [Roz87] is devoted to the proof that, in fact, no matrix grammar can generate all words in using bounded-index derivations without also generating some words outside . This amounts to saying that no finite-index matrix grammar generates ; and a non-constant lower bound on the width in the re-pairing problem could be extracted from the proof.

Unfortunately, the proof in that paper seems to be flawed. Fixing the argument does not seem possible, and we are not aware of an alternative proof. (We discuss the details and compare the proof to our construction in Appendix A.)

## 2 Basic definitions

##### The Dyck language.

We use non-standard notation for brackets in words from the Dyck language: the opening bracket is denoted by $+$ and the closing bracket by $-$; we call these symbols pluses and minuses, respectively. Moreover, in some contexts it is convenient to interpret $+$ and $-$ as the integers $+1$ and $-1$.

Let be an even integer. A word , , is a Dyck word (or a well-formed word) if it has an equal number of and and for every the inequality is satisfied. The height of a position  in a well-formed word  is . As usual, denotes the length of the word  (the number of symbols in it).
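The definition can be checked mechanically. The sketch below reads a word over `+` and `-` as a sequence of +1 and -1 and tests the two conditions (balanced overall, no negative prefix sum). Taking the height of a position to be the prefix sum up to and including that position is an assumption about the convention; the function names are hypothetical.

```python
def heights(word):
    """Running sums of the symbols, reading '+' as +1 and '-' as -1."""
    total, result = 0, []
    for symbol in word:
        total += 1 if symbol == "+" else -1
        result.append(total)
    return result


def is_dyck(word):
    """True if no prefix sum is negative and the total sum is zero."""
    hs = heights(word)
    return all(h >= 0 for h in hs) and (not hs or hs[-1] == 0)


print(heights("++-+--"))  # [1, 2, 1, 2, 1, 0]
print(is_dyck("++-+--"))  # True
print(is_dyck("+--+"))    # False: the prefix "+--" has sum -1
```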

Dyck words are naturally associated with ordered rooted forests (i.e., with sequences of ordered rooted trees). E.g., words  defined by

$$Z(1) = {+}{-}; \qquad Z(n+1) = {+}\,Z(n)\,Z(n)\,{-} \tag{2}$$

can be associated with complete binary trees of height . Recall that the height of a rooted tree is the maximum length of a path (number of edges) from the root to a leaf.
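Recurrence (2) translates directly into code. The following sketch builds the words iteratively; the function name `Z` mirrors the notation of the text, while the length formula in the comment is derived here from the recurrence (each step doubles the word and adds two symbols) rather than quoted from the paper.

```python
def Z(n):
    """Word defined by Z(1) = '+-' and Z(n+1) = '+' Z(n) Z(n) '-'."""
    if n < 1:
        raise ValueError("n must be at least 1")
    word = "+-"
    for _ in range(n - 1):
        word = "+" + word + word + "-"
    return word


print(Z(1))  # +-
print(Z(2))  # ++-+--
# |Z(n+1)| = 2|Z(n)| + 2 with |Z(1)| = 2, hence |Z(n)| = 2**(n + 1) - 2.
print(len(Z(5)) == 2 ** 6 - 2)  # True
```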

Note that we described a re-pairing of the word  in Section 1.

##### Re-pairings and their width.

A re-pairing of a well-formed word is a sequence of pairs

$$p = (p_1, \ldots, p_{N/2}), \quad \text{where } p_i = (\ell_i, r_i)$$

and the following properties are satisfied:

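1. (R1) $\ell_i < r_i$, the symbol at position $\ell_i$ is a plus, and the symbol at position $r_i$ is a minus, for all $i$;

2. (R2) every position of the word occurs in exactly one pair $p_i$.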

(We use the word ‘interval’ to refer to a set of the form .)

The intuition is that the index  corresponds to discrete time, and at time  the two symbols and are (re-)paired (or erased). Denote by the set of points from that correspond to symbols erased at times .

It is easy to see that re-pairings exist for every well-formed word. By induction on the length of the word one can prove a stronger statement: a sequence of pairs can be extended to a re-pairing iff all numbers in the pairs are different, property (R1) is satisfied, and the remaining signs (those which have not been erased) constitute a well-formed word. We now define the following quantities:

• The width of a set of integers, , is the smallest number of intervals the union of which is equal to .

• The width of a re-pairing  at time is .

• The width of a re-pairing  of a well-formed word , , is , i.e., the maximum of the width of this re-pairing over all time points.

• The width of a well-formed word , , is , where the minimum is over all re-pairings of .

We will look into how big the width of a well-formed word of length  can be, that is, we are interested in , where the maximum is over all well-formed words of length .
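For very short words the width can be found by exhaustive search. The sketch below is a brute-force illustration, not a method from the paper: it explores every sequence of legal moves, measures each reached set of erased positions by its number of maximal intervals, and minimizes the maximum. It assumes that the width at a given time is measured on the set of positions erased up to and including that time; the running time is exponential, so it is only meant for words of a dozen symbols or so.

```python
from functools import lru_cache


def num_intervals(positions):
    """Smallest number of integer intervals whose union is `positions`."""
    return sum(1 for i in positions if i - 1 not in positions)


def min_width(word):
    """Minimum, over all re-pairings of `word`, of the maximum width over time."""
    n = len(word)
    full = frozenset(range(n))

    @lru_cache(maxsize=None)
    def best(erased):
        here = num_intervals(erased)
        if erased == full:
            return here
        # Try every surviving plus together with every surviving minus to its right.
        options = [
            best(erased | {left, right})
            for left in range(n)
            if left not in erased and word[left] == "+"
            for right in range(left + 1, n)
            if right not in erased and word[right] == "-"
        ]
        # A state with no legal move is a dead end and gets infinite cost.
        return max(here, min(options, default=float("inf")))

    return best(frozenset())


print(min_width("++-+--"))    # 1
print(min_width("++--++--"))  # 2
```

In the second example one interval cannot suffice: the first move must erase an adjacent plus-minus pair, and after growing that interval to cover one half of the word, the neighbouring symbols are two pluses (or two minuses), which cannot be paired with each other.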

###### Remark 1.

Section 1 discussed the minimization of the maximum number of the “surviving” (non-erased) intervals. This quantity cannot differ from the width defined above by more than .

###### Remark 2.

A tree-based representation of re-pairings is described in Section 5 and, in more detail, in Appendix, Section B.2.

## 3 Simple bounds and simple re-pairings

In this section we establish several basic facts on the width of well-formed words and re-pairings (proofs are provided in Appendix, Section C). A careful use of bisection leads to the following upper bound:

###### Theorem 1.

for all well-formed words .

We call a re-pairing of a well-formed word  simple if at all times it pairs up two signs that are matching in the word . The re-pairing that the proof of Theorem 1 constructs is simple.

We now show a link between simple re-pairings and strategies in the following game. Let be an acyclic graph (in our specific case it will be a tree with edges directed from leaves to root). Define a black-and-white pebble game on  (see, e.g., [LT80, Nor15]) as follows. There is only one player, and black and white pebbles are placed on the nodes of the graph. The following moves are possible:

1. (M1) place a black pebble on a node, provided that all its immediate predecessors carry pebbles;

2. (M2) remove a black pebble from any node;

3. (M3) place a white pebble on any node; and

4. (M4) remove a white pebble from a node, provided that all its immediate predecessors carry pebbles.

(In a tree, immediate predecessors are immediate descendants, i.e., children. Rules (M1) and (M4) are applicable to all sources, i.e., leaves of .) At the beginning there are no pebbles on any nodes. A sequence of moves in the game is a strategy; it is successful if it achieves the goal: reaching a configuration in which all sinks of the graph carry pebbles and there are no white pebbles on any nodes. By we will denote the minimum number of pebbles sufficient for a successful strategy in the black-and-white pebble game on .
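The rules of the game can be replayed mechanically. The following sketch checks a given sequence of moves against rules (M1) to (M4) and reports the largest number of pebbles present at any time. The encoding of moves as `(kind, node)` pairs, the dictionary `children`, and the restriction to at most one pebble per node are assumptions made for this illustration.

```python
def replay_pebbling(children, sinks, moves):
    """Replay a strategy; return the maximum number of pebbles on the graph at any time."""
    black, white = set(), set()
    most = 0
    for kind, node in moves:
        pebbled = black | white
        # In a tree, the immediate predecessors of a node are its children.
        preds_ok = all(c in pebbled for c in children.get(node, ()))
        if kind == "place_black" and preds_ok and node not in pebbled:    # (M1)
            black.add(node)
        elif kind == "remove_black" and node in black:                     # (M2)
            black.remove(node)
        elif kind == "place_white" and node not in pebbled:                # (M3)
            white.add(node)
        elif kind == "remove_white" and preds_ok and node in white:        # (M4)
            white.remove(node)
        else:
            raise ValueError(f"illegal move {kind!r} on node {node!r}")
        most = max(most, len(black) + len(white))
    # Goal: every sink carries a pebble and no white pebbles remain.
    if white or not all(s in black for s in sinks):
        raise ValueError("strategy does not reach the goal configuration")
    return most


# A root "r" with two leaf children "a" and "b".
tree = {"r": ["a", "b"]}
strategy = [("place_black", "a"), ("place_white", "b"), ("place_black", "r"),
            ("remove_black", "a"), ("remove_white", "b")]
print(replay_pebbling(tree, ["r"], strategy))
```

The function only validates and measures a strategy; finding a strategy with the minimum number of pebbles is a separate search problem.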

###### Theorem 2.

Suppose the tree  associated with a well-formed word  is binary. Then the minimum width of a simple re-pairing for  is .

Since is a tree, it follows from the results of the papers [Lou79, Mey81, LT80] (see also [Sav98, pp. 526–528]) that the value of at most doubles if the strategies are not allowed any white pebbles. The optimal number of (black) pebbles in such strategies is determined by the so-called Strahler number (see, e.g., [ELS14] and [LT80]):

###### Corollary 1.

For binary trees, the following two quantities are within a constant factor of each other: the minimum width of a simple re-pairing for and the maximum height of a complete binary tree that is a graph-theoretic minor of the tree .
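The Strahler number mentioned above can be computed directly from a Dyck word by a single left-to-right scan. The sketch below uses the convention that a leaf has Strahler number 0, which is an assumption made here so that the value agrees with the height (in edges) of the largest embedded complete binary tree; other sources start from 1. The function names are hypothetical.

```python
def strahler_numbers(word):
    """Strahler number (leaf = 0) of each top-level tree of a Dyck word over '+' and '-'."""
    stack = [[]]  # stack[-1] collects the values of the finished children
    for symbol in word:
        if symbol == "+":
            stack.append([])          # open a new node
        else:
            kids = sorted(stack.pop(), reverse=True)
            if not kids:
                value = 0             # leaf
            elif len(kids) >= 2 and kids[0] == kids[1]:
                value = kids[0] + 1   # two children attain the maximum
            else:
                value = kids[0]       # the maximum is attained once
            stack[-1].append(value)
    return stack[0]


def Z(n):
    """Word defined by Z(1) = '+-' and Z(n+1) = '+' Z(n) Z(n) '-'."""
    word = "+-"
    for _ in range(n - 1):
        word = "+" + word + word + "-"
    return word


print(strahler_numbers(Z(4)))      # [3]
print(strahler_numbers("+++---"))  # [0]: a path contains no branching
```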

By Corollary 1, the upper bound in Theorem 2 has the same order of magnitude as (or lower than) the upper bound from Theorem 1. The latter gives a simple re-pairing too, but also holds for non-binary trees .

The lower bound in Theorem 2 relies on the re-pairing being simple. For instance, for the word associated with a complete binary tree (see (2)), the minimum width of a simple re-pairing is , but the (usual) width is for all (Section 4).

## 4 Upper bound for complete binary trees

Recall the words $Z(n)$, defined by equation (2).

###### Theorem 3.

.

The upper bound from the previous section gives , whilst the functions for are such that and for all .

To prove Theorem 3 we need a family of framed words . Denote by the word

 ++…++kσ−−…−−k. (3)

In bracket terminology, this is the word $\sigma$ enclosed by $k$ pairs of opening and closing brackets. We will call such words $k$-framed.

###### Remark 3.

If , then , because a re-pairing can erase the signs of from left to right, pairing each with a from the prefix and each with a from the suffix. This re-pairing is, of course, not simple.
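The re-pairing described in this remark can be written out explicitly. The sketch below is an illustration under stated assumptions: the frame is at least half as long as the framed word (so that the frame never runs out of symbols), each minus of the framed word takes the nearest unused plus of the prefix, each plus takes the nearest unused minus of the suffix, and leftover frame symbols are paired from the inside out. The exact condition and bound of the remark are not reproduced here; the width is measured as the number of maximal intervals of erased positions.

```python
def num_intervals(positions):
    """Smallest number of integer intervals whose union is `positions`."""
    return sum(1 for i in positions if i - 1 not in positions)


def framed_repairing(sigma, k):
    """Moves for the word '+'*k + sigma + '-'*k that erase sigma from left to right."""
    n = len(sigma)
    next_plus = k - 1    # rightmost unused plus of the prefix
    next_minus = k + n   # leftmost unused minus of the suffix
    moves = []
    for offset, symbol in enumerate(sigma):
        position = k + offset
        if symbol == "-":
            moves.append((next_plus, position))
            next_plus -= 1
        else:
            moves.append((position, next_minus))
            next_minus += 1
    while next_plus >= 0:  # pair the leftover frame symbols
        moves.append((next_plus, next_minus))
        next_plus -= 1
        next_minus += 1
    return moves


def width_of(word, moves):
    """Maximum number of erased intervals over the course of the re-pairing."""
    erased, width = set(), 0
    for left, right in moves:
        assert left < right and word[left] == "+" and word[right] == "-"
        assert left not in erased and right not in erased
        erased |= {left, right}
        width = max(width, num_intervals(erased))
    assert len(erased) == len(word)
    return width


sigma, k = "++-+--", 4
word = "+" * k + sigma + "-" * k
print(width_of(word, framed_repairing(sigma, k)))
```

At every moment the erased positions form at most two blocks: one growing around the left end of the framed word and one growing at the start of the suffix.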

We construct a family of re-pairings  of framed words , where and is a parameter. The definition will be recursive, and will control the ‘granularity’ of the recursion.

##### Overview.

On each step of the re-pairing  the leftmost remaining is erased. For , it is paired with the leftmost remaining . For , it is paired with the that we choose using the following recursive definition.

At each step of the re-pairing , we define an auxiliary subsequence of the word that forms a word . If the leftmost remaining minus is not in the subsequence, then we pair it with the leftmost non-erased plus. Otherwise we consider the re-pairing of the word , where we pick and below, and pair the minus using this re-pairing (more details to follow).

##### Stages of the re-pairing p(q,n,k).

The re-pairing is divided into stages, indexed by . Denote by , , the th leftmost occurrence (factor) of  in the word . Stage begins at the moment when all minuses to the left of the start position of the factor are erased, and ends when stage  begins. Define an integer sequence as follows:

 k2q⋅s+1=0, k2q⋅s+t=⌈log2t⌉−1 for 1

At the beginning of stage , the subsequence of is formed by the rightmost non-erased pluses to the left of ; followed by the symbols of the factor ; followed by the leftmost non-erased minuses to the right of the end position of . The symbols of written together, form the word .

Choose such that the width of the re-pairing is minimal. In the first part of stage , the re-pairing pairs the signs in according to the re-pairing . The first part ends when either all minuses to the left of the factor are erased or the sequence is exhausted (whichever is earlier). In the latter case the final part of stage  begins. At each step of this part, the leftmost non-erased minus is paired with the leftmost non-erased plus.

###### Claim 1.

Re-pairings are well-defined.

Define , where the minimum is over for and over for .

###### Claim 2.

for .

Somewhat strangely, we have been unable to find solutions to recurrences of this form in the literature.

###### Claim 3.

.

Since , this implies the upper bound of Theorem 3.

##### Proof idea for Claim 2.

(For complete proofs of these claims see Appendix, Section D.) Assume . We notice that at each step at most two of the factors , are partially erased. (All other factors either have been erased completely () or are yet untouched ().) Furthermore, non-erased signs to the left of partially erased factors form several intervals; each of them, except possibly the leftmost, has size at least .

Note that, at each moment in time, the non-erased signs form a well-formed (Dyck) word, so the height of each position in with respect to these signs only is nonnegative. Since the height of positions in the word  cannot exceed , it follows that a partially erased factor can be preceded by at most  non-erased intervals (runs of pluses). This leads to the recurrence of Claim 2.

## 5 Lower bounds

###### Theorem 4.

There exists a sequence of well-formed words  such that

$$\operatorname{width}(W_n) = \Omega\!\left(\sqrt{\log|W_n| / \log\log|W_n|}\right).$$

The words in this sequence are similar to the words  associated with complete binary trees. They are associated with a ‘stretched’ version of the complete binary tree, i.e., one in which every edge is subdivided into several edges. More precisely, let $a_0$, $a_1$, …, $a_k$ be a finite sequence of positive integers. Define the following sequence of well-formed words inductively:

$$X(a_0) = {+}^{a_0}\,{-}^{a_0}, \qquad X(a_0, \ldots, a_k) = {+}^{a_k}\,X(a_0, \ldots, a_{k-1})\,X(a_0, \ldots, a_{k-1})\,{-}^{a_k}.$$

The words we use to prove Theorem 4 have the form , where , , and . In particular, . (Notice that .) Our method applies both to and , giving the bounds in equation (1).
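The inductive definition of the stretched words is easy to turn into code. The sketch below builds the word for an arbitrary sequence of positive integers; it does not attempt to reproduce the specific parameters chosen for Theorem 4, which are not recoverable from the text above. The function name `X` mirrors the notation of the definition.

```python
def X(*a):
    """Word X(a_0, ..., a_k): each level k wraps two copies of the previous word in a_k brackets."""
    if not a or any(value < 1 for value in a):
        raise ValueError("expected a nonempty sequence of positive integers")
    word = "+" * a[0] + "-" * a[0]
    for a_k in a[1:]:
        word = "+" * a_k + word + word + "-" * a_k
    return word


print(X(1))        # +-
print(X(1, 1))     # ++-+--  (all parameters equal to 1 give the words of equation (2))
print(X(2, 1))     # +++--++---
print(len(X(3, 2, 1)))
```

The length satisfies the recurrence |X(a_0, …, a_k)| = 2 a_k + 2 |X(a_0, …, a_{k−1})|, which follows directly from the definition.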

We give a proof overview below; details are provided in Appendix, Section E. We use a tree representation of re-pairings. Informally, this tree tracks the sequence of mergers of erased intervals. This sequence, indeed, is naturally depicted as an ordered rooted binary tree as shown in Fig. 1(a). This tree is essentially a derivation tree for a word in an appropriate matrix grammar. Edges of a rooted tree are divided into levels according to the distance to the root. We think of this distance as a moment of time in the derivation process. The derived word can be read off the tree by following the left-to-right depth-first traversal.

For formal definitions see Appendix, Section B.2.2.

Our proof is inductive, and one of the key ideas is the choice of what to induct over. Observe that every factor of a Dyck word induces a connected subgraph, which we call a fragment; see Fig. 1(b). The width of a tree or a fragment is defined in a natural way: it is the maximum number of edges at a level of the tree. E.g., the fragment shown in Fig. 1(b) has width 2.

Our inductive statement applies to fragments. Fix a well-formed word ; in the sequel we specialize the argument to and . Denote by the maximum length of a factor  associated with a fragment of width at most  in trees that derive the word . Put differently, given , consider all possible trees that derive . Fragments of width at most  in these trees are associated with factors of the word , and is the maximum length of such a factor. Note that in this definition the width of the (entire) trees is not restricted.

It is clear from the definition that the sequence of numbers  is non-decreasing: . We obtain upper bounds on the numbers  by induction. For , we show that for big enough  and . Here and below, implicit constants in the asymptotic notation do not depend on and . From this we get . We observe that if , then . Since , it follows that every derivation tree of the word  must have width  satisfying , that is, .

For , we show a stronger inequality, , which is sufficient for a lower bound .

To prove the inductive upper bound on we need to show that narrow fragments cannot be associated with long factors. For this purpose we use two ideas.

##### Combinatorial properties of increases and drops in Z(n) and Y(ℓ).

Denote by the difference , where and are the start and end positions of a factor . The value is the increase in height on the factor.

The first property is that every factor of  of length  contains a sub-factor  with . The second combinatorial property of  is as follows: for sufficiently large  and every two factors  and of the word , if and  is located to the left of , then the distance between these factors is at least . Here and below the distance between the factors is the length of the smallest factor of containing both of them.

For the word , similar properties hold, but the functions and are replaced by the functions and , respectively.

##### Balance within a single time period.

Consider a factor  of the word  associated with a fragment of width at most  (in a tree derivation that generates ). Denote this fragment .

Notice that, in a Dyck word, every is matched by a somewhere to the left of it. Thus, for a factor , there exists a factor  to the left of with a matching height increase: . We strengthen this balance observation to identify a pair of matching factors , (with a slightly smaller height increase in ) which also satisfies the following conditions:

1. is large enough (of magnitude indicated by the first combinatorial property);

2. the factors and are derived during overlapping time intervals,

3. the factor sits to the left of and inside , and

4. the sub-fragment associated with the factor between and has width strictly smaller than the width of the entire fragment .

These conditions enable us to upper-bound the distance between and through a function of . On the other hand, this distance is lower-bounded by the second combinatorial property. Comparing the bounds shows how to bound , and thus , from above by a function of .

## 6 An application: Lower bounds for commutative NFA

In this section we link the re-pairing problem for well-formed (Dyck) words to the descriptional complexity (number of states in NFA) of the Parikh image of languages recognized by one-counter automata (OCA).

We consider a slightly simplified version of complete languages introduced by Atig et al. [ACH16] (see Section 1). Each of them is over the alphabet and can be recognized by an OCA with  states. We will assume throughout that  is even. In what follows, we need only the Parikh image  of this language. We call  the universal one-counter set (since it is provably the hardest case for translations from OCA to Parikh-equivalent NFA). This is the set of -dimensional vectors of nonnegative integers that satisfy the following conditions:

1. and the directed graph on vertices with the set of edges is a monotone path from to (i.e., one with in each edge); we call such paths chains;

2. (balance) the vector belongs to the cone  of balanced vectors:

$$K = \Bigl\{(x_0, \ldots, x_{n-1}) : \sum_{i=0}^{n-1} (-1)^i x_i = 0;\ \sum_{i=0}^{k} (-1)^i x_i \ge 0,\ 0 \le k$$

3. (compatibility) if for some , then for some ; if for some , then for some .
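The balance condition is a Dyck-like constraint on alternating sums, which the following sketch checks directly. Because the range of the index k is cut off in the formula above, checking every prefix of the vector is an assumption made here; the function name `in_cone` is hypothetical, and the vector is taken to consist of nonnegative integers.

```python
def in_cone(x):
    """Check that all alternating prefix sums are nonnegative and the full sum is zero."""
    total = 0
    for i, value in enumerate(x):
        total += value if i % 2 == 0 else -value  # term (-1)**i * x_i
        if total < 0:
            return False
    return total == 0


print(in_cone((2, 1, 1, 2)))  # True: prefix sums 2, 1, 2, 0
print(in_cone((1, 2, 1, 0)))  # False: prefix sum 1 - 2 is negative
print(in_cone((1, 0, 0, 0)))  # False: the total is not zero
```

Even-indexed coordinates play the role of pluses and odd-indexed coordinates the role of minuses, so the condition mirrors the definition of a well-formed word.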

We skip the standard definition of nondeterministic finite automata (NFA). The meaning of the numbers and is that they specify the number of occurrences of letters and on accepting paths in (the transition graph of) the NFA. An NFA recognizes a language with Parikh image  iff for each vector from the NFA has an accepting path with these counts of occurrences, and for all other vectors no such path exists. There exists [ACH16] an NFA with  states that recognizes a language with Parikh image . Our goal is to prove that this superpolynomial dependency on  is unavoidable.

###### Theorem 5.

Let be an arbitrary Dyck word of length . Suppose an NFA recognizes a language with Parikh image . Then the number of states of is at least .

###### Corollary 2.

If an NFA recognizes a language with Parikh image , then its number of states is .

Corollary 2 follows from Theorems 4 and 5.

Since is the Parikh image of a language recognized by an OCA with  states, it follows that there is no polynomial translation from OCA to Parikh-equivalent NFA.

Our proof of Theorem 5 proceeds in three steps:

1. In the NFA for we find accepting paths , parameterized by sets , , and extract re-pairings of from them. Roughly speaking, the parameter  determines the set of positions  for which the path has .

Intuitively, as goes through any strongly connected component (SCC) in the NFA, the re-pairing erases pairs such that a cycle in this SCC reads letters and . To get a bijection between even and odd indices, we use the Birkhoff–von Neumann theorem on doubly stochastic matrices (see, e.g., [Sch03, p. 301]).

2. With every SCC in the NFA, we associate an auxiliary set . We show that each path visits an SCC for which .

3. By making range in a family of sets with low intersection, we ensure that no other path can visit the SCC . So the NFA has at least SCCs, and therefore at least states. The low intersection property means that for all . We choose ; the family of size can be obtained by the Nisan–Wigderson construction [NW94].

For the details of the proof see Appendix, Section F.

## 7 Open problems

Our work suggests several directions for future research. The first is computing the width of as well as of other words, closing the gap between the upper and lower bounds. Obtaining super-constant lower bounds (for infinite families of words, both constructively and non-constructively) seems particularly difficult. Our lower bound on the width of leaves a gap between and for the size of blowup in an OCA to Parikh-equivalent NFA translation, and our second problem is to close this gap.

The third problem is to recover a proof of Rozoy’s statement that the Dyck language  is not generated by any matrix grammar of finite index [Roz87], or equivalently by any two-way deterministic transducer with one-way output tape [Roz85]. We expect that our lower bound construction for the width can be extended appropriately.

Last but not least, our re-pairing game corresponds to the following family of deterministic two-way transducers  generating Dyck words. The input to a transducer encodes a derivation tree of width , in the sense defined in Section 5. Symbols correspond to layers of the tree; there are symbols in the alphabet that encode the branching and symbols that encode the positions of a pair of brackets ( and ). The transducer  simulates a traversal of the tree and outputs the generated word; it has states. All words of width at most  are generated by .

Our final problem is to determine if there exist smaller transducers that generate all Dyck words of length , , and do not generate any words outside . Here is such that all words of length  have width at most .

## Acknowledgment

We are grateful to Georg Zetzsche for the reference to Rozoy’s paper [Roz87].

This research has been supported by the Royal Society (IEC\R2\170123). The research of the second author has been funded by the Russian Academic Excellence Project ‘5-100’. Supported in part by RFBR grant 17–51-10005 and by the state assignment topic no. 0063-2016-0003.

## References

• [AAMS15] Parosh Aziz Abdulla, Mohamed Faouzi Atig, Roland Meyer, and Mehdi Seyed Salehi. What’s decidable about availability languages? In FSTTCS’15, volume 45 of LIPIcs, pages 192–205, 2015.
• [Abr65] Samuel Abraham. Some questions of phrase-structure grammars I. Computational Linguistics, 4:61–70, 1965.
• [AC10] Rajeev Alur and Pavol Cerný. Expressiveness of streaming string transducers. In FSTTCS’10, pages 1–12, 2010.
• [AC11] Rajeev Alur and Pavol Cerný. Streaming transducers for algorithmic verification of single-pass list-processing programs. In POPL’11, pages 599–610, 2011.
• [ACH16] Mohamed Faouzi Atig, Dmitry Chistikov, Piotr Hofman, K. Narayan Kumar, Prakash Saivasan, and Georg Zetzsche. The complexity of regular abstractions of one-counter languages. In LICS’16, pages 207–216, 2016.
• [AR13] Rajeev Alur and Mukund Raghothaman. Decision problems for additive regular functions. In ICALP’13 (Proceedings, Part II), pages 37–48, 2013.
• [BGMP16] Félix Baschenis, Olivier Gauwin, Anca Muscholl, and Gabriele Puppis. Minimizing resources of sweeping and streaming string transducers. In ICALP’16, pages 114:1–114:14, 2016.
• [Bra67] Barron Brainerd. An analog of a theorem about context-free languages. Information and Control, 11(5/6):561–567, 1967.
• [CS76] Stephen A. Cook and Ravi Sethi. Storage requirements for deterministic polynomial time recognizable languages. J. Comput. Syst. Sci., 13(1):25–37, 1976.
• [DPS97] Jürgen Dassow, Gheorghe Păun, and Arto Salomaa. Grammars with controlled derivations. In Handbook of Formal Languages, Volume 2. Linear Modeling: Background and Application, pages 101–154. Springer, 1997.
• [DRT16] Laure Daviaud, Pierre-Alain Reynier, and Jean-Marc Talbot. A generalised twinning property for minimisation of cost register automata. In LICS’16, pages 857–866, 2016.
• [EGKL11] Javier Esparza, Pierre Ganty, Stefan Kiefer, and Michael Luttenberger. Parikh’s theorem: A simple and direct automaton construction. Inf. Process. Lett., 111(12):614–619, 2011.
• [EGP14] Javier Esparza, Pierre Ganty, and Tomás Poch. Pattern-based verification for multithreaded programs. ACM Trans. Program. Lang. Syst., 36(3):9:1–9:29, 2014.
• [EH01] Joost Engelfriet and Hendrik Jan Hoogeboom. MSO definable string transductions and two-way finite-state transducers. ACM Trans. Comput. Log., 2(2):216–254, 2001.
• [ELS14] Javier Esparza, Michael Luttenberger, and Maximilian Schlund. A brief history of Strahler numbers. In LATA’14, volume 8370 of Lecture Notes in Computer Science, pages 1–13, 2014.
• [Esp97] Javier Esparza. Petri nets, commutative context-free grammars, and basic parallel processes. Fundam. Inform., 31(1):13–25, 1997.
• [FR16] Emmanuel Filiot and Pierre-Alain Reynier. Transducers, logic and algebra for functions of finite words. SIGLOG News, 3(3):4–19, 2016.
• [GH06] Hermann Gruber and Markus Holzer. Finding lower bounds for nondeterministic state complexity is hard. Electronic Colloquium on Computational Complexity (ECCC), 13(027), 2006. Conference version in: Developments in Language Theory (DLT) 2006; Lecture Notes in Computer Science, vol. 4036, pp. 363–374, Springer.
• [GM12] Pierre Ganty and Rupak Majumdar. Algorithmic verification of asynchronous programs. ACM Trans. Program. Lang. Syst., 34(1):6:1–6:48, May 2012.
• [Gru71] Jozef Gruska. A few remarks on the index of context-free grammars and languages. Information and Control, 19(3):216–223, 1971.
• [GS68] Seymour Ginsburg and Edwin H. Spanier. Derivation-bounded languages. J. Comput. Syst. Sci., 2(3):228–250, 1968.
• [HH16] Christoph Haase and Piotr Hofman. Tightening the complexity of equivalence problems for commutative grammars. In STACS, volume 47 of LIPIcs, pages 41:1–41:14. Schloss Dagstuhl - Leibniz-Zentrum fuer Informatik, 2016.
• [HK08] Markus Holzer and Martin Kutrib. Nondeterministic finite automata – recent results on the descriptional and computational complexity. In CIAA’08, pages 1–16, 2008.
• [HL12] Matthew Hague and Anthony Widjaja Lin. Synchronisation- and reversal-bounded analysis of multithreaded programs with counters. In CAV, volume 7358 of Lecture Notes in Computer Science, pages 260–276. Springer, 2012.
• [HMO10] Jochen Hoenicke, Roland Meyer, and Ernst-Rüdiger Olderog. Kleene, Rabin, and Scott are available. In CONCUR, volume 6269 of Lecture Notes in Computer Science, pages 462–477. Springer, 2010.
• [HPS09] Juraj Hromkovic, Holger Petersen, and Georg Schnitger. On the limits of the communication complexity technique for proving lower bounds on the size of minimal NFA’s. Theor. Comput. Sci., 410(30-32):2972–2981, 2009.
• [Hro97] Juraj Hromkovic. Communication Complexity and Parallel Computing. Texts in Theoretical Computer Science. An EATCS Series. Springer, 1997.
• [Huy83] Dung T. Huynh. Commutative grammars: The complexity of uniform word problems. Information and Control, 57(1):21–39, 1983.
• [Huy85] Dung T. Huynh. The complexity of equivalence problems for commutative grammars. Information and Control, 66(1/2):103–121, 1985.
• [Kop15] Eryk Kopczynski. Complexity of problems of commutative grammars. Logical Methods in Computer Science, 11(1), 2015.
• [KT10] Eryk Kopczynski and Anthony Widjaja To. Parikh images of grammars: Complexity and applications. In LICS, pages 80–89. IEEE Computer Society, 2010.
• [Lat79] Michel Latteux. Substitutions dans le EDT0L systèmes ultralinéaires. Information and Control, 42(2):194–260, 1979.
• [Lou79] Michael Conrad Loui. The space complexity of two pebble games on trees. Technical memorandum TM-133, Laboratory for Computer Science, Massachusetts Institute of Technology (MIT), 1979.
• [LT80] Thomas Lengauer and Robert Endre Tarjan. The space complexity of pebble games on trees. Inf. Process. Lett., 10(4/5):184–188, 1980.
• [Mey81] Friedhelm Meyer auf der Heide. A comparison of two variations of a pebble game on graphs. Theor. Comput. Sci., 13:315–322, 1981.
• [MP19] Anca Muscholl and Gabriele Puppis. The many facets of string transducers (invited talk). In STACS, volume 126 of LIPIcs, pages 2:1–2:21. Schloss Dagstuhl - Leibniz-Zentrum fuer Informatik, 2019.
• [Mus17] Anca Muscholl. A tour of recent results on word transducers. In FCT’17, pages 29–33, 2017.
• [Nor13] Jakob Nordström. Pebble Games, Proof Complexity, and Time-Space Trade-offs. Logical Methods in Computer Science, Volume 9, Issue 3:1–63, September 2013.
• [Nor15] Jakob Nordström. New Wine into Old Wineskins: A Survey of Some Pebbling Classics with Supplemental Results, 2015.
• [NW94] Noam Nisan and Avi Wigderson. Hardness vs randomness. J. Comput. Syst. Sci., 49(2):149–167, 1994.
• [Par66] Rohit J. Parikh. On context-free languages. J. ACM, 13(4):570–581, 1966.
• [Raj72] Vaclav Rajlich. Absolutely parallel grammars and two-way finite state transducers. J. Comput. Syst. Sci., 6(4):324–342, 1972.
• [Roz85] Brigitte Rozoy. About two-way transducers. In FCT’85, pages 371–379, 1985.
• [Roz87] Brigitte Rozoy. The Dyck language is not generated by any matrix grammar of finite index. Inf. Comput., 74(1):64–89, 1987.
• [Sal69] Arto Salomaa. On the index of a context-free grammar and language. Information and Control, 14(5):474–477, 1969.
• [Sav98] John E. Savage. Models of Computation: Exploring the Power of Computing. Addison-Wesley, 1998.
• [Sch03] Alexander Schrijver. Combinatorial Optimization: Polyhedra and Efficiency. Springer-Verlag, 2003.
• [Sha09] Jeffrey Shallit. A Second Course in Formal Languages and Automata Theory. Cambridge University Press, 2009.

## Appendix A On matrix grammars of finite index

Rozoy’s 1987 paper [Roz87] is devoted to the proof that no matrix grammar can generate all words in the Dyck language  using bounded-index derivations without also generating some words outside . This amounts to saying that no finite-index matrix grammar can generate .

Unfortunately, the proof in that paper seems to be flawed. The main tool in the proof is an inductive statement, relying on several technical claims. The quantifiers in the use of one of these claims are swapped when compared against the formulation (and proof) of the claim. (The earlier conference version [Roz85] of that paper uses an equivalent formalism of deterministic two-way transducers instead of matrix grammars; while the high-level structure of the results is present in the conference version, it does not contain the details of the proofs.)

In more detail, the main tool in Rozoy’s proof is a certain inductive statement, , which is part of the proof of Proposition 6.2.1. This statement , , relies on several technical claims, and in particular on Corollary 6.1.2. This Corollary says, roughly, that, given a (part of a) derivation tree, a position  in the derived word, and a family of disjoint time intervals , …, , there exists an index  such that a certain quantity enjoys what one can call an averaging upper bound. The quantity is the height of the position  computed relative to just the symbols generated at times , but not at other times.

Unfortunately, the use of this Corollary does not match its formulation. This occurs in the proofs of the base cases  and , and in the proof of the inductive step. In each of these three cases, time intervals , , are chosen to correspond to occurrences of the factor in , denoted by , . But no position  is chosen. Then the Corollary is invoked to find an index  such that enjoys an averaging upper bound, where denotes the beginning of , the th occurrence of . It appears as if this use of Corollary would require a different formulation: instead of “for every position  there exists an index  such that is upper-bounded”, it would need a “given positions , , there exists an  such that for the pair the quantity is upper-bounded”.

Given the existing formulation, it seems possible that for each of the positions (the start positions of the occurrences of ), there indeed exists a suitable , but this  is different from . In other words, the quantities may be bounded for , but not for . A formulation with the swapped quantifiers, “there exists an  such that for all ”, would suffice to rule out this bad case, but it is not clear if it is possible to prove such a statement.

Rozoy’s construction has certainly influenced our lower bound construction (for the width of re-pairings). The family of statements  in her construction amounts to an inductive upper bound on the length of the factors  that can occur in the derivation tree for  if this tree has width at most  (we are glossing over some technical details here). In our proofs, instead of factors of the form  we consider arbitrary factors in Definition 1: the quantity upper-bounds the length of all factors (of ) that can be associated with fragments of width at most . We do not restrict the width of the entire tree either. The proof of the main inductive step (Lemma 3) then crucially depends on an upper bound for width  being available for all words, not just for words of a particular form, . We do not know if it is possible to use a weaker inductive statement, one mentioning words of the form only, for a lower bound on .

## Appendix B Trees, Dyck words, and re-pairings

In this section we review the various uses of trees in our work. The interpretation of Dyck words as trees (subsection B.1) is required by our results on simple re-pairings and pebble games (Section C below and Section 3 in the main text). The tree representation of re-pairings (subsection B.2) is required for our lower bounds on the width of re-pairings (Section E below and Section 5 in the main text).

We will use ordered rooted trees. Let us recall the standard definitions.

A tree with an empty set of nodes (an empty tree) is denoted by . Suppose the set of nodes  is nonempty; then it has a distinguished node  (the root of the tree). Consider a mapping . The node is the immediate ancestor (parent) of a node , and the node  is the immediate descendant (child) of . The transitive closure of the parent (resp. child) relation is the ancestor (resp. descendant) relation. No node is its own descendant, and all non-root nodes are descendants of the root. These are exactly the defining properties of a rooted tree. The set of children of each node is linearly ordered. A leaf is a node without descendants.

A node and the set of its descendants form an ordered rooted tree in a natural way, which will be referred to as the subtree rooted at .

If every node in a tree has at most two children, the tree is binary. If , then the node is the left child and is the right child.

For a non-root node , denote by the sibling of , that is, the node with the same parent different from , if such a node exists in the tree.

### B.1 Dyck words and trees

Well-formed words are naturally associated with ordered rooted forests (i.e., with sequences of ordered rooted trees) in the following way. Well-formed words are exactly those generated by the unambiguous grammar

$$S \to \varepsilon \mid {+}S{-}S.$$

The rooted forest  associated with the word  is defined recursively using the derivation in this grammar: , and

$$T(+\sigma_1 - \sigma_2) = \bigl(T(+\sigma_1 -),\ T(\sigma_2)\bigr),$$

where is the tree in which the children of the root are the roots of the trees from the rooted forest . The two symbols and in the word are associated with the root of the tree ; these symbols are also said to be matched to each other.

This way all symbols in the word  are split into pairs of symbols matched to each other, and every pair is associated with a node in the rooted forest .
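The correspondence between well-formed words and ordered rooted forests can be sketched with a single left-to-right scan. The encoding below is an assumption made for illustration: the word is a string over `+` and `-`, and each node is a tuple `(i, j, children)` holding the 0-based positions of the two matched symbols.

```python
def forest(word):
    """Ordered rooted forest of a well-formed word over {'+', '-'}.

    Each node is (i, j, children): i and j are the 0-based positions of the
    matched '+' and '-', and children is the list of child nodes in order.
    """
    levels = [[]]   # levels[-1] collects the children of the innermost open '+'
    opens = []      # positions of the '+' symbols that are not yet matched
    for pos, sign in enumerate(word):
        if sign == "+":
            opens.append(pos)
            levels.append([])
        elif sign == "-":
            if not opens:
                raise ValueError("word is not well-formed")
            children = levels.pop()
            levels[-1].append((opens.pop(), pos, children))
        else:
            raise ValueError("unexpected symbol: %r" % sign)
    if opens:
        raise ValueError("word is not well-formed")
    return levels[0]


example = forest("++-+--+-")
```

For the word `++-+--+-` this gives two trees: a root matched at positions 0 and 5 with two leaf children, followed by a single-node tree matched at positions 6 and 7, in line with the recursive rule for a word of the form +σ1−σ2.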

### B.2 Tree representation of re-pairings

To prove lower bounds on the width, we will need a different representation of re-pairings. Informally, this representation will track the sequence of mergers of erased intervals. This sequence is naturally depicted as an ordered rooted binary tree. The width of the tree will match the width of the re-pairing (in that the two numbers will differ by at most ). This tree is essentially a derivation tree for the word in an appropriate matrix grammar.

#### B.2.1 Trees and traversals

Here and below, all trees will be binary and edges in trees will be directed from root to leaves. That is, in an ordered rooted binary tree, the set of edges consists of ordered pairs of the form . A ranking function  will be defined on this set, and we will call it the time function. For edges  departing from the root, the time  is equal to ; for all pairs of edges , with a common endpoint the time function satisfies . Whenever , we will say that the edge  exists at time .

The width of a tree  with the set of nodes  and set of edges  is now defined as the maximum (over all time points ) number of edges existing at time , i.e.,

$$\mathrm{width}(I,E) = \max_t \left| T^{-1}(t) \right|.$$

The width of an arbitrary subset of edges, for , is defined analogously.
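As a small illustration of this definition, the sketch below assumes that the time function is stored as a dictionary from edges to integer time points. The example tree and its time values are hypothetical and are not taken from the paper.

```python
from collections import Counter


def tree_width(time_of_edge):
    """Maximum, over all time points, of the number of edges existing at that time."""
    counts = Counter(time_of_edge.values())
    return max(counts.values(), default=0)


# Hypothetical tree: two edges leave the root, one more edge hangs below "a".
times = {("root", "a"): 1, ("root", "b"): 1, ("a", "c"): 2}
whole_tree = tree_width(times)

# Width of a subset of edges: restrict the time function to that subset.
subset = {("root", "a"), ("a", "c")}
subset_width = tree_width({e: t for e, t in times.items() if e in subset})
```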

Recall the definition of the (left-to-right depth-first) traversal of the tree (we denote this traversal by ). This is a sequence of edges in which every edge occurs twice; it is defined recursively as follows. For a tree that consists of the root node only (and no edges), the traversal is the empty sequence. If and are the two children of the root, then

$$\tau(I,E) = (\mathrm{root}, v_1) \cdot \tau(I_1,E_1) \cdot (\mathrm{root}, v_1) \cdot (\mathrm{root}, v_2) \cdot \tau(I_2,E_2) \cdot (\mathrm{root}, v_2),$$

where denotes the concatenation of sequences, and and are the subtrees rooted at and , respectively. Similarly, if there is only one child , then

$$\tau(I,E) = (\mathrm{root}, v_1) \cdot \tau(I_1,E_1) \cdot (\mathrm{root}, v_1),$$

where is the subtree rooted at . Every tree has exactly one traversal.

An example of a traversal is shown in Fig. 2.
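The recursive definition of the traversal translates directly into code. The sketch below assumes that the tree is given as a dictionary mapping each node to the ordered list of its children; this representation and the example tree are choices made for illustration.

```python
def traversal(children, root):
    """Left-to-right depth-first traversal as a sequence of edges.

    Every edge (parent, child) appears exactly twice: once before and once
    after the traversal of the subtree rooted at the child.
    """
    sequence = []
    for child in children.get(root, []):
        sequence.append((root, child))
        sequence.extend(traversal(children, child))
        sequence.append((root, child))
    return sequence


# Hypothetical tree: the root has children "a" and "b"; "a" has one child "c".
tree = {"root": ["a", "b"], "a": ["c"]}
tau = traversal(tree, "root")
```

For deep trees the recursion depth grows with the height of the tree, so an explicit stack would be preferable in practice.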

For every , edges that exist at time  cut the traversal as follows.

###### Claim 4.

The edges , , of the tree that exist at time  occur in the traversal in pairs:

$$\tau(I,E) = U_0 e_1 D_1 e_1 U_1 e_2 D_2 e_2 \dots U_{k-1} e_k D_k e_k U_k. \tag{5}$$

All edges from exist at times before , and all edges from at times after .

###### Proof.

Assume that the edges are sorted according to the indices of their first occurrences in the traversal.

Let be the two endpoints of the edge , so that is closer to the root than . It is clear from the definition of the traversal that the part of the traversal between the two occurrences of —denote this part by —is the traversal of the subtree rooted at ; therefore, all edges in that part exist at times strictly greater than (after) .

Since an undirected tree has exactly one path between any two edges, we have whenever .

For the same reason, the part  of the traversal between the second occurrence of  and the first occurrence of  does not touch any edges from subtrees rooted at . The same also holds for the initial and final parts of the traversal: from the beginning of the traversal up until  and from the second occurrence of  to the end of the traversal. ∎

#### B.2.2 Tree derivations

Tree derivations, or derivations, are similar to re-pairings defined in section 2. Informally, at any time point , one  and one  are placed on edges that exist at time ; the plus must be placed to the left of the minus. A sign can be placed on an edge in two ways: on the left side of the edge or on the right side of it. This is shown in Fig. 3.

The formal definition is as follows. Let be a ordered rooted binary tree. The traversal defines a function

 τ(I,E):{1,…,2|E|}→E

in natural way (any sequence is formally a function of this form).

Consider partial functions of the form

$$\pi \colon \{1, \dots, 2|E|\} \to \{-1, +1\}.$$

These functions are partial words (or patterns) over the alphabet . We denote by the domain of the partial function, i.e., the set of all such that is defined.

For any subset a partial word defines a word over the alphabet  by the rule: if , where , then

$$\pi(S) = \pi(i_1) \cdot \pi(i_2) \cdot \ldots \cdot \pi(i_\ell).$$

Let be the set of places in the traversal occupied by edges existing at time . Formally,

$$E_t = \{\, i : T(\tau(I,E)(i)) = t \,\}.$$

For brevity we use notation .

Given a tree , a tree derivation (or a derivation) is a partial word such that, for all points in time , either the word  is empty, or it is .

Every edge occurs in the traversal exactly twice and thus may derive at most two signs in a tree derivation . Let , where . If is defined, then the sign is said to be placed on the left side of the edge. If is defined, the sign is said to be placed on the right side of the edge.

For the example of a derivation shown in Fig. 3, the corresponding partial function is as follows:

 i12345678π+1−1+1−1

Blank spaces in the table mean that the function is not defined.

Let be a derivation based on the tree . We will say that the word  is generated by the derivation , or, alternatively, that the word is derived by the tree. We will also sometimes say that a sign in the word is derived at time  if it is derived by an edge that exists at time .
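The definitions of this subsection can be checked mechanically on small examples. The sketch below makes several assumptions that are not stated in this form in the text: a partial word is a dictionary from 1-based traversal positions to +1 or −1, the word π(S) lists the defined signs at the positions of S in increasing order, and the only allowed nonempty word per time point is a plus followed by a minus. The traversal and time values in the example are hypothetical.

```python
def restrict(pi, positions):
    """Word pi(S): the signs of pi at the positions of S where pi is defined."""
    return [pi[i] for i in sorted(positions) if i in pi]


def is_derivation(pi, trav, time_of_edge):
    """Check that, for every time point t, pi(E_t) is empty or equal to (+1, -1)."""
    places = {}  # time point -> set of traversal positions of edges existing then
    for i, edge in enumerate(trav, start=1):
        places.setdefault(time_of_edge[edge], set()).add(i)
    return all(restrict(pi, e_t) in ([], [+1, -1]) for e_t in places.values())


def generated_word(pi):
    """The word generated by pi, written with '+' and '-'."""
    return "".join("+" if pi[i] == 1 else "-" for i in sorted(pi))


# Hypothetical tree with edges (root,a), (root,b) at time 1 and (a,c) at time 2.
trav = [("root", "a"), ("a", "c"), ("a", "c"), ("root", "a"), ("root", "b"), ("root", "b")]
times = {("root", "a"): 1, ("root", "b"): 1, ("a", "c"): 2}
pi = {1: +1, 2: +1, 3: -1, 6: -1}
```

In this example the plus at position 1 sits on the left side of the edge (root, a), and the minus at position 6 sits on the right side of the edge (root, b).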

#### B.2.3 Fragments

Consider an arbitrary factor of the tree traversal . The set of edges that occur in this factor at least once forms a subgraph in the tree , which we will call a fragment of this tree. It is easy to check that all fragments are (weakly) connected subgraphs of the graph . Thus, a fragment is a tree in graph theory terms.

Let be a derivation based on a tree . An interval defines a factor of the word and each factor of can be represented in this form. Moreover, w.l.o.g. we will assume that .

Suppose that for a factor of the word . The interval selects a factor in the tree traversal. We say that the fragment corresponding to this factor of the tree traversal is associated to the factor .

Note that exactly one fragment is associated with each factor of , but the converse is not true. A fragment is a set of edges by definition. It may correspond to different factors of the tree traversal because each edge occurs twice in the traversal.

### B.3 Derivations and re-pairings

We now show how to link re-pairings of words and derivations.

Let be a derivation based on a tree . Consider the word . Take any point in time  for which this word has two (paired) signs; they are said to be derived at this time point. Denote by the index of the sign from this pair in the word ; and by the index of the sign. We obtain a sequence

$$p_\pi = \bigl( (\ell_{t_{\max}}, r_{t_{\max}}), \dots, (\ell_{t_1}, r_{t_1}) \bigr),$$

where is the maximum point in time at which edges of the tree exist. The time in this sequence flows in direction opposite to that in the tree; indeed, as we already mentioned at the beginning of section B.2, the tree depicts a sequence of mergers of erased intervals.

Fig. 1(a) shows an example where a re-pairing of a word is obtained from a tree derivation.
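The passage from a derivation to a re-pairing can be sketched as follows. The representation is an assumption made for illustration: a derivation is a dictionary from 1-based traversal positions to +1 or −1, and the input is taken to be a valid derivation, so each time point carries either no signs or one plus followed by one minus. The example data are hypothetical.

```python
def repairing_from_derivation(pi, trav, time_of_edge):
    """Pairs (index of '+', index of '-') in the generated word, latest tree time first."""
    # 1-based index of each occupied traversal position within the generated word.
    rank = {i: k for k, i in enumerate(sorted(pi), start=1)}
    by_time = {}
    for i in sorted(pi):
        edge = trav[i - 1]
        by_time.setdefault(time_of_edge[edge], []).append(rank[i])
    # Time in the re-pairing flows opposite to time in the tree.
    return [tuple(by_time[t]) for t in sorted(by_time, reverse=True)]


# Hypothetical derivation of the word ++-- on a three-edge tree.
trav = [("root", "a"), ("a", "c"), ("a", "c"), ("root", "a"), ("root", "b"), ("root", "b")]
times = {("root", "a"): 1, ("root", "b"): 1, ("a", "c"): 2}
pi = {1: +1, 2: +1, 3: -1, 6: -1}
moves = repairing_from_derivation(pi, trav, times)
```

Here the signs derived at the later tree time (positions 2 and 3 of the word) are erased first, and the outer pair is erased last.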

###### Claim 5.

The word is well-formed, and the sequence is its re-pairing. The width of  does not exceed the width of the tree .

###### Proof.

It is easy to see that the word is well-formed and that the sequence satisfies the requirements on re-pairings. Indeed, if , then the plus is placed to the left of the minus by definition of words .

Consider a time point . Suppose  edges exist at this time, , …, if read from left to right. It follows from Claim 4 that the re-pairing has, at this time point, at most  erased intervals in the word . Indeed, in terms of the factorization (5) from Claim 4, each interval is formed by the symbols of the word that correspond to the edges from the subsequence (this is the traversal of the subtree of descendants of the edge ).

Some of these intervals can be empty; however, in any case they cover all signs that have been erased by time . ∎

We now describe a link in the opposite direction. Let be a re-pairing of the word . We will construct a tree of width  and a derivation  based on this tree such that .

Let us first construct a tree in which the nodes are pairs of the form

 (maximal erased interval, time point).

The time here is specified according to the sequence . The intervals are specified by their start and end positions. This way a node corresponds to a maximal interval , between positions  and , which has been erased completely by time .
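The nodes of this tree can be listed by replaying the re-pairing and recording the maximal erased intervals after each move. The sketch below is a plain simulation under assumed conventions: positions are 1-based, and a re-pairing is a list of pairs `(left, right)` in the order in which they are erased.

```python
def erased_intervals(n, repairing):
    """For each move, the maximal erased intervals (s, f) of a word of length n."""
    erased = [False] * (n + 2)   # positions 1..n; index n + 1 is a sentinel
    history = []
    for left, right in repairing:
        if not 1 <= left < right <= n:
            raise ValueError("each move must pair a position with one to its right")
        erased[left] = erased[right] = True
        intervals, start = [], None
        for pos in range(1, n + 2):
            if erased[pos] and start is None:
                start = pos                      # an erased run begins here
            elif not erased[pos] and start is not None:
                intervals.append((start, pos - 1))
                start = None
        history.append(intervals)
    return history


# Two hypothetical re-pairings of the word ++-- (length 4).
crossing = erased_intervals(4, [(1, 3), (2, 4)])
nested = erased_intervals(4, [(2, 3), (1, 4)])
```

Each entry of the returned list, together with its index, gives the pairs (maximal erased interval, time point) that serve as nodes. The simulation does not check that the erased symbols are an opening and a closing bracket.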

The ancestor–descendant relation on the pairs of this form is the conjunction of the set inclusion for intervals and the ordering on time points: a pair $([s_1,f_1], t_1)$ is a descendant of a pair $([s_2,f_2], t_2)$ if and only if

$$[s_1,f_1] \subseteq [s_2,f_2] \quad\text{and}\quad t_1 < t_2.$$

Here we have taken into account that the time on the tree and in the re-pairing flows in opposite directions.

All possible ways in which the erased intervals can evolve when a new pair of signs is erased in the re-pairing  are shown in Fig. 4.

All intervals not adjacent to the freshly erased (paired) signs remain unchanged (case (1)). The node of the tree that corresponds to this interval at the time right after these two signs are paired has exactly one child. The cases (2)–(6) are similar: in these cases the interval evolves, but does not merge with other intervals. For this reason, the nodes corresponding to such intervals at this time have exactly one child too. As a special case, these scenarios can capture a new interval emerging (the dashed area in the picture is empty), that is, a new leaf appearing in the tree .

In the case (7) one of the freshly erased (paired) signs is adjacent to two intervals. Dashed lines in the picture indicate positions of signs that may also be paired at this time. The node of the tree that corresponds to the new interval has two children in this case. Case (8) shows another way two intervals can merge, in which there are two signs between the intervals, and they are paired with one another at this point. There are no further cases of two-interval mergers: indeed, since just two signs are erased at a single time point, there may be at most two signs between the merging intervals.

Finally, case (9) shows the last possible scenario, one where three intervals are merged into a single one. The relative position of signs is determined uniquely in this case.

We will now transform the tree into a binary tree and a derivation  based on the latter tree such that . As seen from Fig. 4, the tree may have nodes with three children, in which case auxiliary nodes need to be inserted so that the tree would be binary.

The way the signs are positioned on the edges of the tree is based on Fig. 4 and shown in Fig. 5.

Note that due to the case (9) there is no bijection between the points in time in the tree and the points in time in the re-pairing. Still, the ordering of erased pairs in time in the tree is the reverse of the ordering in the re-pairing . This ensures the equality .

Another observation is that in the cases (5)–(6) the width of the tree becomes greater by  than the width of the re-pairing , because the signs are placed on (auxiliary) edges that lead to auxiliary leaves. Informally, one can think of this as a new interval (of and ) being “born” first and merged with an old interval second. In the re-pairing  these events occur simultaneously.

Combining all the arguments above, we obtain a new characterization of the width of a well-formed word.

###### Theorem 6.

The difference between the width of a well-formed word  and the minimum width of a tree  for which there exists a derivation  generating the word , i.e., , is at most .

## Appendix C Proofs for Section 3

### C.1 Proof of the logarithmic upper bound

In this subsection we prove Theorem 1. We will use several observations.

###### Claim 6.

Let be a factorization of a well-formed word into well-formed words. Then

 width(σ)≤1+max1≤i≤t(width(