 # Dissecting Power of a Finite Intersection of Context Free Languages

Let $\exp^{k,\alpha}$ denote a tetration function defined as follows: $\exp^{1,\alpha}=2^{\alpha}$ and $\exp^{k+1,\alpha}=2^{\exp^{k,\alpha}}$, where $k,\alpha$ are positive integers. Let $\Delta_n$ denote an alphabet with $n$ letters. If $L\subseteq\Delta_n^*$ is an infinite language such that for each $u\in L$ there is $v\in L$ with $|u|<|v|\leq\exp^{k,\alpha}|u|$, then we call $L$ a language with the growth bounded by $(k,\alpha)$-tetration. Given two infinite languages $L_1,L_2\subseteq\Delta_n^*$, we say that $L_1$ dissects $L_2$ if $|L_1\cap L_2|=\infty$ and $|(\Delta_n^*\setminus L_1)\cap L_2|=\infty$. Given a context free language $L$, let $\kappa(L)$ denote the size of the smallest context free grammar $G$ that generates $L$. We define the size of a grammar to be the total number of symbols on the right sides of all production rules. Given positive integers $n,k$ with $k\geq 2$, we show that there are context free languages $L_1,L_2,\dots,L_{3k-3}\subseteq\Delta_n^*$ with $\kappa(L_i)\leq 40k$ such that if $\alpha$ is a positive integer and $L\subseteq\Delta_n^*$ is an infinite language with the growth bounded by $(k,\alpha)$-tetration, then there is a regular language $M$ such that $M\cap(\bigcap_{i=1}^{3k-3}L_i)$ dissects $L$ and the minimal deterministic finite automaton accepting $M$ has at most $k+\alpha+3$ states.


## 1 Introduction

In the theory of formal languages, the regular and the context free languages are fundamental families that have attracted a lot of attention over the past several decades. Recall that every regular language is accepted by some deterministic finite automaton and every context free language is accepted by some pushdown automaton.

In contrast to regular languages, the context free languages are closed neither under intersection nor under complement. Intersections of context free languages have been systematically studied in [GINSBURG1966620, weiner1973, Wotschke1973, WOTSCHKE1978456, 10.1007/978-3-030-40608-0_24]. Let $\mathrm{CFL}_k$ denote the family of all languages such that for each $L\in\mathrm{CFL}_k$ there are $k$ context free languages $L_1,L_2,\dots,L_k$ with $L=\bigcap_{i=1}^{k}L_i$. For each $k$, it has been shown that there is a language $L\in\mathrm{CFL}_{k+1}$ such that $L\notin\mathrm{CFL}_k$. Thus the $k$-intersections of context free languages form an infinite hierarchy in the family of all formal languages lying between context free and context sensitive languages [weiner1973].

One of the topics in the theory of formal languages that has been studied is the dissection of infinite languages. Let $\Delta_n$ be an alphabet with $n$ letters, and let $L_1,L_2\subseteq\Delta_n^*$ be infinite languages. We say that $L_1$ dissects $L_2$ if $|L_1\cap L_2|=\infty$ and $|(\Delta_n^*\setminus L_1)\cap L_2|=\infty$. Let $\mathcal{C}$ be a family of languages. We say that a language $L_2$ is $\mathcal{C}$-dissectible if there is $L_1\in\mathcal{C}$ such that $L_1$ dissects $L_2$. Let $\mathrm{REG}$ denote the family of regular languages. In [YAMAKAMI2013116] the $\mathrm{REG}$-dissectibility has been investigated. Several families of $\mathrm{REG}$-dissectible languages have been presented. Moreover, it has been shown that there are infinite languages that cannot be dissected by a regular language. Some open questions on $\mathrm{REG}$-dissectibility can also be found in [YAMAKAMI2013116]. For example, it is not known whether the complement of a context free language is $\mathrm{REG}$-dissectible.

There is a related longstanding open question in [bucher1980]: Given two context free languages $L_1,L_2$ such that $L_1\subset L_2$ and $L_2\setminus L_1$ is an infinite set, is there a context free language $L_3$ such that $L_3\subset L_2$, $L_1\subset L_3$, and both the languages $L_3\setminus L_1$ and $L_2\setminus L_3$ are infinite? This question was also mentioned in [YAMAKAMI2013116].

Some other results concerning the dissection of infinite languages may be found in [10.1007/978-3-319-64419-6_13]. A similar topic is finding minimal covers of languages; see [Domaratzki2001MinimalCO]. Recall that, for a family of languages $\mathcal{C}$, a language $L$ is called $\mathcal{C}$-immune if there is no infinite language $L_1\subseteq L$ such that $L_1\in\mathcal{C}$. Immunity is also related to the dissection of languages; some results on this theme can be found in [10.1007/978-3-662-21545-6_34, post1944, 10.1007/978-3-030-40608-0_24].

Let denote the set of all positive integers. An infinite language is called constantly growing, if there is a constant and a finite set such that for each with there is a word and a constant such that . We say also that is -constantly growing. In [YAMAKAMI2013116], it has been proved that every constantly growing language is -dissectible.

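We define a tetration function (a repeated exponentiation) as follows: $\exp^{1,\alpha}=2^{\alpha}$ and $\exp^{k+1,\alpha}=2^{\exp^{k,\alpha}}$, where $k,\alpha$ are positive integers. The tetration function is known as a fast growing function. If $k,\alpha$ are positive integers and $L\subseteq\Delta_n^*$ is an infinite language such that for each $u\in L$ there is $v\in L$ with $|u|<|v|\leq\exp^{k,\alpha}|u|$ then we call $L$ a language with the growth bounded by $(k,\alpha)$-tetration.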
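The tetration function and the growth condition can be tried out on small inputs. The following is a minimal sketch, assuming that the bound is read as the product of the tetration value and the word length (the reading used by the inequalities in the tetration section). The names `tetration` and `growth_bounded_on_sample` are hypothetical, and a finite sample of lengths can only illustrate the condition, never establish it for an infinite language.

```python
def tetration(k: int, alpha: int) -> int:
    """Return exp^{k,alpha}: exp^{1,alpha} = 2**alpha, exp^{k+1,alpha} = 2**exp^{k,alpha}."""
    if k < 1 or alpha < 1:
        raise ValueError("k and alpha must be positive integers")
    value = 2 ** alpha
    for _ in range(k - 1):
        value = 2 ** value
    return value


def growth_bounded_on_sample(lengths, k: int, alpha: int) -> bool:
    """Check the growth condition on a finite sample of word lengths.

    For each length m the best witness is the next larger length in the
    sample; it has to be at most exp^{k,alpha} * m.  The largest length is
    exempt, because a finite sample cannot contain its successor.
    """
    bound = tetration(k, alpha)
    distinct = sorted(set(lengths))
    return all(nxt <= bound * cur for cur, nxt in zip(distinct, distinct[1:]))


if __name__ == "__main__":
    print(tetration(3, 1))                                    # value of exp^{3,1}
    print(growth_bounded_on_sample([1, 2, 4, 8, 16], 1, 1))   # doubling lengths
    print(growth_bounded_on_sample([1, 3, 10], 1, 1))         # gaps too large for the factor 2
```

Only the next larger length needs to be inspected, since any other witness would be longer still.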

Let be an infinite language with the growth bounded by -tetration, where . In the current article we show that there are:

• an alphabet with ,

• an erasing alphabetical homomorphism ,

• a nonerasing alphabetical homomorphism , and

• context free languages

such that the homomorphic image dissects the homomorphic image .

We sketch the basic ideas of our proof. Recall that a non-associative word on the letter is a “well parenthesized” word containing a given number of occurrences of . It is known that the number of non-associative words containing occurrences of is equal to the -th Catalan number. For example for we have distinct non-associative words: , , , , and . Every non-associative word contains the prefix for some , where denotes the -th power of the opening bracket. It is easy to verify that there are non-associative words such that equals “approximately” . We construct three context free languages whose intersection accepts such words; we call these words balanced non-associative words. By counting the number of opening brackets of a balanced non-associative word with occurrences of we can compute a logarithm of .
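The counting claim about non-associative words can be checked by brute force. The sketch below is only an illustration: the letter `a`, the round brackets, and the function names are hypothetical stand-ins for symbols lost from the text, and `balanced` merely mimics the intuition of a word whose number of leading brackets is the base-2 logarithm of its number of letters. It is not the construction by context free languages used later.

```python
from functools import lru_cache
from math import comb

LETTER = "a"  # hypothetical letter; stands in for the paper's symbol


@lru_cache(maxsize=None)
def non_associative_words(leaves: int) -> tuple:
    """All fully parenthesized words with `leaves` occurrences of LETTER."""
    if leaves == 1:
        return (LETTER,)
    words = []
    for left in range(1, leaves):
        for u in non_associative_words(left):
            for v in non_associative_words(leaves - left):
                words.append("(" + u + v + ")")
    return tuple(words)


def catalan(n: int) -> int:
    """The n-th Catalan number."""
    return comb(2 * n, n) // (n + 1)


def balanced(height: int) -> str:
    """A perfectly balanced word with 2**height letters and `height` leading brackets."""
    return LETTER if height == 0 else "(" + balanced(height - 1) * 2 + ")"


if __name__ == "__main__":
    for word in non_associative_words(4):
        print(word)
    print(len(non_associative_words(4)), catalan(3))
    print(balanced(3))
```

With four occurrences of the letter the enumeration yields five words, which agrees with the Catalan count.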

Let and . Our construction can be “chained” so that we construct context free languages, whose intersection accepts words with occurrences of and a prefix , where is “approximately” equal to and . If is a language with the growth bounded by a -tetration then the language is constantly growing. Less formally, by intersecting context free languages we reduce the problem of dissecting a language with the growth bounded by -tetration to the problem of dissecting a constantly growing language. This approach allows us to prove our result.

## 2 Preliminaries

Let denote the set of all positive real numbers.

Let be an ordered alphabet (set) of distinct opening brackets, and let be an ordered alphabet (set) of distinct closing brackets. We define the alphabet . The alphabet contains all opening brackets and all the closing brackets except the first one . It follows that .

Let denote the empty word. Given a finite alphabet , let denote the set of all finite nonempty words over the alphabet and let .

Given a finite alphabet , let denote the number of occurrences of the nonempty factor in the word . Let denote the set of all factors of a word . We define that ; i.e. the empty word and the word are factors of . Let denote the set of all prefixes of . We define that . Let denote the set of all suffixes of . We define that .
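The notions of occurrences, factors, prefixes, and suffixes can be made concrete as follows. This is a minimal sketch with hypothetical function names; it assumes that occurrences of a factor are counted with overlaps and that the empty word and the word itself count as a factor, a prefix, and a suffix.

```python
def occur(w: str, v: str) -> int:
    """Number of (possibly overlapping) occurrences of the nonempty factor v in w."""
    if not v:
        raise ValueError("the factor must be nonempty")
    return sum(1 for i in range(len(w) - len(v) + 1) if w.startswith(v, i))


def factors(w: str) -> set:
    """All factors of w, including the empty word and w itself."""
    return {w[i:j] for i in range(len(w) + 1) for j in range(i, len(w) + 1)}


def prefixes(w: str) -> set:
    """All prefixes of w, including the empty word and w itself."""
    return {w[:i] for i in range(len(w) + 1)}


def suffixes(w: str) -> set:
    """All suffixes of w, including the empty word and w itself."""
    return {w[i:] for i in range(len(w) + 1)}


if __name__ == "__main__":
    print(occur("aaa", "aa"))
    print(sorted(factors("ab")))
    print(sorted(prefixes("abc")))
    print(sorted(suffixes("abc")))
```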

Given two finite alphabets , a homomorphism from to is a function such that , where . It follows that in order to define a homomorphism , it suffices to define for every ; such a definition “naturally” extends to every word . We say that is a nonerasing alphabetical homomorphism if for every . We say that is an erasing alphabetical homomorphism if for every and there is at least one such that .
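A homomorphism is determined by the images of the letters, which the following sketch illustrates. It assumes that a nonerasing alphabetical homomorphism maps every letter to a single letter, and that an erasing alphabetical homomorphism maps every letter to a single letter or to the empty word, with at least one letter erased. The helper names and the example alphabet are hypothetical.

```python
def extend(letter_map: dict):
    """Extend a map defined on letters to a homomorphism on words."""
    def h(word: str) -> str:
        return "".join(letter_map[a] for a in word)
    return h


def is_nonerasing_alphabetical(letter_map: dict) -> bool:
    """Every letter is mapped to exactly one letter."""
    return all(len(image) == 1 for image in letter_map.values())


def is_erasing_alphabetical(letter_map: dict) -> bool:
    """Every letter is mapped to one letter or to the empty word; some letter is erased."""
    images = list(letter_map.values())
    return all(len(image) <= 1 for image in images) and any(image == "" for image in images)


if __name__ == "__main__":
    erase_brackets = {"(": "", ")": "", "z": "z"}   # hypothetical example map
    h = extend(erase_brackets)
    print(h("((zz)z)"))
    # The homomorphism property h(uv) = h(u)h(v):
    print(h("((zz") + h(")z)") == h("((zz)z)"))
```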

## 3 Balanced non-associative words

Suppose , where , and . To simplify the notation we define , , and ; it means that denotes the -th opening bracket, denotes the -th closing bracket, and denotes the -th opening bracket.

Let be an erasing alphabetical homomorphism defined as follows:

• ,

• ,

• .

• , where .

Let be a language, we define the language .

###### Remark 3.1.

For given the erasing alphabetical homomorphism sends all opening and closing brackets from and to the empty string with the exception of , , and .

Let be the context free language generated by the following context free grammar, where is a start non-terminal symbol, is a non-terminal symbol, and are terminal symbols (the letters from ).

• ,

• , where .

We call the words from non-associative words over the opening bracket , the closing bracket , and the letter .

###### Remark 3.2.

Let . To understand the definition of , note that the language is generated by the context free grammar defined by: To see this, just remove the non-terminal symbol in the definition of . The usage of the non-terminal symbol allows us to “insert” between any two letters of a word from the words from ; the set contains words from that have no occurrence of . This means that if , then , where and .

The reason for the name “non-associative words” is the obvious similarity between the words from and the “standard non-associative words” mentioned in the introduction section. For our purpose it is necessary that if and only if for every .

Recall that a pushdown automaton is a 6-tuple , where

• is a set of states,

• is an input alphabet,

• is a stack alphabet,

• is the initial state,

• is the initial symbol of the stack,

• is a transition function.

We define that a pushdown automaton accepts a word by the empty stack, hence we do not need to define the set of final states. Given a pushdown automaton , let denote the language accepted by .

Let denote the language accepted by the pushdown automaton , where:

• ,

• ,

• , where , , and ,

• , where ,

• , where ,

• , where ,

• , where ,

• , where ,

• , where ,

• , where ,

• , where ,

• , where ,

• , where ,

• ,

• , where ,

• , where , and

• , where and .

###### Remark 3.3.

Note that the letters from change neither the state of nor the stack. Hence, to illustrate the behavior of , we can consider only words over the alphabet . Then it is easy to see that the pushdown automaton pushes on the stack on the first occurrence of . For every other occurrence of the pushdown automaton pushes on the stack. Once the state is reached, one is removed from the stack for every occurrence of . The state works as a rejecting state. Note that after reaching the state the stack is not empty, the stack cannot be changed, and no other state can be reached from . The states and make it possible to recognize the first occurrence of . Once the states are reached, the states and cannot be reached any more.

Thus the pushdown automaton accepts all words, where the number of occurrences of after the first occurrence of is exactly one more than the number of occurrences of . Formally, if then we define as follows:

• If then .

• If then let be such that and .

Clearly is uniquely defined. Then we have that if and only if or . It follows that the words without any occurrence of are accepted. In the following we will consider the words from the intersection . Note that there are only two nonempty words , that have no occurrence of .

Recall that a “standard” non-associative word can be represented as a full binary rooted tree graph, where every inner node represents a corresponding pair of brackets and every leaf represents the letter . It is known that the number of inner nodes plus one is equal to the number of leaves in a full binary rooted tree graph. In the case of non-associative words from , let the leaves represent the factors and . Then the number of occurrences of is equal to the number of leaves and the number of occurrences of is equal to the number of inner nodes. Hence the intersection contains non-associative words that have no “unnecessary” brackets; for example , and .
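The relation between inner nodes and leaves of a full binary rooted tree can be verified exhaustively for small trees. The sketch below uses a hypothetical representation: a leaf is the string `"leaf"` and an inner node is a pair of subtrees.

```python
def full_binary_trees(leaves: int):
    """Yield every full binary rooted tree with the given number of leaves."""
    if leaves == 1:
        yield "leaf"
        return
    for left in range(1, leaves):
        for a in full_binary_trees(left):
            for b in full_binary_trees(leaves - left):
                yield (a, b)


def count_nodes(tree) -> tuple:
    """Return the pair (number of inner nodes, number of leaves)."""
    if tree == "leaf":
        return (0, 1)
    left_inner, left_leaves = count_nodes(tree[0])
    right_inner, right_leaves = count_nodes(tree[1])
    return (left_inner + right_inner + 1, left_leaves + right_leaves)


if __name__ == "__main__":
    for n in range(1, 7):
        counts = {count_nodes(t) for t in full_binary_trees(n)}
        print(n, counts)   # every tree with n leaves has n - 1 inner nodes
```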

Let be the context free language generated by the following context free grammar, where is a start non-terminal symbol, are non-terminal symbols, and are terminal symbols (the letters from ).

• ,

• ,

• ,

• ,

• ,

• , where .

We call the words from balanced words.

###### Remark 3.4.

Let . It is easy to see that the words from the language contain no factor of the form , where are distinct positive integers; hence the name “balanced” words. The non-terminal symbols ensure that if then has a prefix and a suffix for some .

The non-terminal symbol in the definition of has the same purpose as in the definition of .

Let

$$\Omega_{k,m}=\mathrm{Naw}_{k,m}\cap\mathrm{Bal}_{k,m}\cap\Lambda_{k,m}.$$

We call the words from balanced non-associative words over the opening bracket , the closing bracket , and a letter .

Let , where . The set contains the balanced non-associative words having exactly occurrences of the letter .

Given a word and , let

$$\mathrm{height}(w,a)=\max\{j\mid a^{j}\in\mathrm{Fac}(w)\}.$$

The height of a word is the maximal power of the letter that is a factor of . We show that if and is the height of the opening bracket in then is a prefix of and is a suffix of .
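The height of a letter in a word is the length of the longest run of that letter. A minimal sketch, with a hypothetical function name and round brackets standing in for the paper's bracket symbols:

```python
def height(w: str, a: str) -> int:
    """Largest j such that a**j (the letter a repeated j times) is a factor of w."""
    best = run = 0
    for letter in w:
        run = run + 1 if letter == a else 0
        best = max(best, run)
    return best


if __name__ == "__main__":
    print(height("(((zz)(zz))((zz)(zz)))", "("))   # longest run of opening brackets
    print(height("(((zz)(zz))((zz)(zz)))", ")"))   # longest run of closing brackets
```

A letter that does not occur in the word has height 0, since the empty word is always a factor.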

###### Lemma 3.5.

If and then and .

###### Proof.

Note that . Since , there is such that . To get a contradiction suppose that . Because it follows that for some with and .

Consider the prefix . Obviously . It is easy to see that if , and then . Thus . It follows that .

This is a contradiction, since for every prefix of a non-associative word we have that . We conclude that and . In an analogous way we can show that . This completes the proof. ∎

For a word , we show the relation between the height of and the number of occurrences of in .

###### Proposition 3.6.

If and then

$$2^{h-1}\leq\mathrm{occur}(w,z)\leq 2^{h}.$$
###### Proof.

We prove the proposition for all by induction:

• If then .

• If then .

• If then

Thus the proposition holds for and clearly we have that if then , where . Suppose the proposition holds for all . We prove the proposition holds for .

Let and . Lemma 3.5 implies that , , , and . Since it follows that . Because we have that . Clearly ; note that . Thus . Since we assumed that the proposition holds for all , we can derive that

$$\mathrm{occur}(w,z)=\mathrm{occur}(w_1,z)+\mathrm{occur}(w_2,z)\leq 2^{h_1}+2^{h_1}=2^{h_1+1}=2^{h}$$

and

$$\mathrm{occur}(w,z)=\mathrm{occur}(w_1,z)+\mathrm{occur}(w_2,z)\geq 2^{h_1-1}+2^{h_1-1}=2^{h_1}=2^{h-1}.$$

This completes the proof. ∎

Proposition 3.6 has the following obvious corollary.

###### Corollary 3.7.

If , , and then

$$\log_2 n\leq h\leq 1+\log_2 n.$$
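The step from Proposition 3.6 to Corollary 3.7 is just taking logarithms, which can be confirmed numerically. The sketch below lists, for a given number $n$ of occurrences, all integers $h\geq 1$ allowed by the bounds of Proposition 3.6. The function name is hypothetical, and the restriction to $h\geq 1$ is an assumption made for the illustration.

```python
from math import log2


def admissible_heights(n: int) -> list:
    """All integers h >= 1 with 2**(h-1) <= n <= 2**h."""
    return [h for h in range(1, n.bit_length() + 2) if 2 ** (h - 1) <= n <= 2 ** h]


if __name__ == "__main__":
    for n in (1, 5, 8, 100):
        heights = admissible_heights(n)
        # Each admissible h lies between log2(n) and 1 + log2(n).
        print(n, heights, log2(n), 1 + log2(n))
```

When $n$ is a power of two, two heights are admissible; otherwise the height is determined uniquely.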

Given , let denote the word built from by replacing the first occurrence of in by . Formally, if then . If and , where then .

We prove that the set of balanced non-associative words having occurrences of is nonempty for each .

###### Lemma 3.8.

If then .

###### Proof.

If then . Given with , let be such that . Obviously such exists and is uniquely determined. Let . Let . Clearly and . Note that . Let and . Let . Then one can easily verify that and .

Less formally, we construct a balanced non-associative word having occurrences of and then we replace a given number of occurrences of with the factor to achieve the required number of occurrences of . This completes the proof.

## 4 Intersection of balanced non-associative words

Let and let . We show that for all positive integers with there is a word such that has occurrences of the opening bracket .

###### Proposition 4.1.

If and then .

###### Proof.

Let . Let and let , where . Lemma 3.8 implies that such exist.

Let . Let . Note that . Then it is quite straightforward to see that and . This completes the proof.

To clarify the proof of Proposition 4.1, let us see the following example.

###### Example 4.2.

Let and . To make the example easy to read, we define and . It means that , , , , , , , and .

To fit the example into the width of the page, we define auxiliary words and :

• ,

• .

Then we have that

• ; ; ; ;

• ; ; ; ;

• ;

• .

This ends the example.

We define two technical functions and for all and as follows:

• and .

• and .

It is a simple exercise to prove the following lemma. We omit the proof.

###### Lemma 4.3.

If then there is a constant such that for each with we have

$$\log^{(j)}_2 t\leq\log^{[j]}_2 t\leq c_1+\log^{(j)}_2 t.$$
###### Remark 4.4.

Note in Lemma 4.3 that the constant depends on .

Using the function we present an upper and a lower bound for the height of words from .

###### Proposition 4.5.

If , then there is a constant such that for each , , and we have

$$\log^{(k)}_2 n\leq h\leq c_1+\log^{(k)}_2 n.$$
###### Proof.

It follows from Corollary 3.7 that Then the proposition follows from Lemma 4.3.

###### Remark 4.6.

Note in Proposition 4.5 that the constant depends on .

## 5 Dissection by a regular language

In [YAMAKAMI2013116] it was shown that every constantly growing language can be dissected by some regular language.

###### Lemma 5.1.

(see [YAMAKAMI2013116, Lemma ]) Every infinite constantly growing language is -dissectible.

From the proof of Lemma in [YAMAKAMI2013116] we can formulate the following Lemma.

###### Lemma 5.2.

If , , , , and is a -constantly growing language then there are such that both sets are infinite, where

$$H_i=\{w\mid w\in L\text{ and }|w|\equiv j_i\pmod{c+1}\}\text{ and }i\in\{1,2\}.$$
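The sets $H_i$ group the words of a language by their length modulo $c+1$. The sketch below splits a finite sample of words into those whose length has a chosen residue and the remaining ones. The function name and the sample are hypothetical, and a finite sample can only suggest which residue classes are infinite in the language itself.

```python
def split_by_length_residue(words, c: int, j: int):
    """Split words by membership in {w : |w| ≡ j (mod c+1)}."""
    modulus = c + 1
    inside = [w for w in words if len(w) % modulus == j % modulus]
    outside = [w for w in words if len(w) % modulus != j % modulus]
    return inside, outside


if __name__ == "__main__":
    sample = ["a" * n for n in (2, 4, 5, 7, 8)]   # hypothetical sample of a language
    inside, outside = split_by_length_residue(sample, c=2, j=2)
    print([len(w) for w in inside])    # lengths congruent to 2 modulo 3
    print([len(w) for w in outside])   # all other lengths
```

If both parts keep growing as the sample grows, the language of words with length congruent to $j$ modulo $c+1$, which is regular, is a candidate for dissecting the language.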

## 6 Tetration

Recall that a deterministic finite automaton is a 5-tuple , where is the set of states, is an input alphabet, is the initial state, is a transition function, and is the set of accepting states. Let denote the language accepted by ; is a regular language.
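The five components of a deterministic finite automaton translate directly into a small data structure. The following sketch is an illustration only: the class `DFA` and the helper `length_mod_dfa` are hypothetical names, and the automaton built here merely counts the length of the input modulo a fixed number. It is not the automaton constructed in the proofs of this section.

```python
from dataclasses import dataclass


@dataclass
class DFA:
    states: frozenset      # set of states
    alphabet: frozenset    # input alphabet
    initial: int           # initial state
    delta: dict            # transition function: (state, letter) -> state
    accepting: frozenset   # set of accepting states

    def accepts(self, word: str) -> bool:
        state = self.initial
        for letter in word:
            state = self.delta[(state, letter)]
        return state in self.accepting


def length_mod_dfa(alphabet: str, modulus: int, residue: int) -> DFA:
    """DFA accepting exactly the words w with |w| ≡ residue (mod modulus)."""
    states = frozenset(range(modulus))
    delta = {(q, a): (q + 1) % modulus for q in states for a in alphabet}
    return DFA(states, frozenset(alphabet), 0, delta, frozenset({residue % modulus}))


if __name__ == "__main__":
    m = length_mod_dfa("ab", modulus=3, residue=1)
    for word in ("a", "ab", "abba"):
        print(word, m.accepts(word))
```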

We prove that if is an infinite language of balanced non-associative words with the number of occurrences of “bounded” by -tetration then can be dissected by a regular language.

###### Proposition 6.1.

If , , and is an infinite language such that for each there is with then there is a regular language such that dissects .

###### Proof.

Let be such that

$$n_2\leq\exp^{k,\alpha}n_1, \tag{1}$$

where and .

Let and . Proposition 4.5 implies that there is a constant such that

$$h_1\geq\log^{(k)}_2 n_1\quad\text{and}\quad h_2\leq c_1+\log^{(k)}_2 n_2. \tag{2}$$

Note that the constant does not depend on . On the other hand the constant depends on .

From (1) and (2) we have that

$$h_2\leq c_1+\log^{(k)}_2 n_2\leq c_1+\log^{(k)}_2\left(\exp^{k,\alpha}n_1\right). \tag{3}$$

Observe that and that if and then . Then we have that

$$\log^{(j)}_2\left(\exp^{j,\alpha}n_1\right)=\log^{(j-1)}_2\left(\exp^{j-1,\alpha}+\log_2 n_1\right)\leq\log^{(j-1)}_2\left(\exp^{j-1,\alpha}\log_2 n_1\right). \tag{4}$$

From (4) it follows that

$$\log^{(k)}_2\left(\exp^{k,\alpha}n_1\right)\leq\log_2\left(\exp^{1,\alpha}\log^{(k-1)}_2 n_1\right)=\alpha+\log^{(k)}_2 n_1. \tag{5}$$

From (2), (3), and (5) we have that

$$h_2\leq c_1+\alpha+\log^{(k)}_2 n_1\leq c_1+\alpha+h_1. \tag{6}$$

Inequality (6) says that there is a constant such that for each there is with .
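The key estimate of the derivation, inequality (5), can be checked numerically for small parameters. The sketch below assumes that the iterated logarithm with superscript $(k)$ is the plain $k$-fold base-2 logarithm and that the tetration value multiplies $n_1$. The function names are hypothetical. The values of $n_1$ are chosen large enough that every intermediate logarithm is at least 2, which is what the step from a sum to a product in (4) requires.

```python
from math import log2


def tetration(k: int, alpha: int) -> int:
    """exp^{1,alpha} = 2**alpha, exp^{k+1,alpha} = 2**exp^{k,alpha}."""
    value = 2 ** alpha
    for _ in range(k - 1):
        value = 2 ** value
    return value


def iter_log2(j: int, t):
    """Plain j-fold iterated base-2 logarithm (assumed reading of the notation)."""
    for _ in range(j):
        t = log2(t)
    return t


def check_inequality_5(k: int, alpha: int, n1: int) -> bool:
    """Check log^{(k)}(exp^{k,alpha} * n1) <= alpha + log^{(k)}(n1) up to rounding."""
    lhs = iter_log2(k, tetration(k, alpha) * n1)
    rhs = alpha + iter_log2(k, n1)
    return lhs <= rhs + 1e-9


if __name__ == "__main__":
    # Smallest n1 for which log^{(j)}(n1) >= 2 holds for all j < k.
    smallest_n1 = {1: 1, 2: 4, 3: 16}
    for k, start in smallest_n1.items():
        for alpha in (1, 2, 3):
            ok = all(check_inequality_5(k, alpha, n1) for n1 in range(start, start + 200))
            print(k, alpha, ok)
```

For $k=1$ the two sides coincide, and for larger $k$ the gap between them reflects the slack introduced each time a sum is replaced by a product.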

Lemma 5.2 implies that there are such that both are infinite sets, where

 Hi={v∣v∈L and height(μk,k(v),xk)≡ji(modc+1