# New Size Hierarchies for Two Way Automata

We introduce a new type of nonuniform two–way automaton that can use a different transition function for each tape square. We also enhance this model by allowing it to shuffle the given input at the beginning of the computation. Then we present some hierarchy and incomparability results on the number of states for deterministic, nondeterministic, and bounded-error probabilistic models. For this purpose, we provide lower bounds for all three models based on the number of subfunctions, and we define two witness functions.


## 1. Introduction

Nonuniform models (such as circuits, branching programs, and uniform models using advice) have played significant roles in computational complexity and, naturally, they have also been investigated in automata theory (e.g. [7, 12, 14, 6]). The main computational resource for nonuniform automata is the number of internal states, which depends on the input size. Thus we can define linear, polynomial, or exponential size automata models. In this way, for example, nonuniform models allow us to formulate the analog of the “P versus NP problem” in automata theory: Sakoda and Sipser [24] conjectured that simulating a two–way nondeterministic automaton by two–way deterministic automata requires an exponential number of states in the worst case. However, the best known separation is only quadratic [9, 13], and researchers have succeeded in obtaining slightly better bounds only for some modified models (e.g. [21, 15, 11, 16]). Researchers have also considered similar questions for the OBDD model, which can be seen as a nonuniform automaton (e.g. [18], [20], [4], [2], [1], [19]).

In this paper, we present some hierarchy results for deterministic, nondeterministic, and bounded-error probabilistic nonuniform two–way automata models, which can also be seen as a “two–way” version of ordered binary decision diagrams (OBDDs) [27]. For each input length $n$, our models can have a different number of states, and, like branching programs or the data-independent models defined by Holzer [12], the transition functions can change during the computation. Holzer’s model can use a different transition function for each step. We restrict this property so that the transition function is the same for the same tape position, and so we can have at most $n$ different transition functions. Moreover, we enhance our models by shuffling the input symbols at the beginning of the computation. We give the definitions and related complexity measures in Section 2.

To obtain our main results, we start by presenting some generic lower bounds (Section 3) using the techniques given in [25] and [10]. Then, we define two witness Boolean functions in Section 3.2: the Shuffled Address Function, denoted $2\text{-}SAF_t$, which is a modification of Boolean functions given in [22] (see also [5], [8], [13], [17]), and its uniform version $2\text{-}USAF_t$. Moreover, regarding these functions, we provide two deterministic algorithms. In our results, we also use the well-known Equality function.

In Sections 4 and 5, we present our main results based on the size (the number of states) of models. We obtain linear size separations for deterministic models and quadratic size separations for nondeterministic and probabilistic models. Moreover, we investigate the effect of shuffling for all three types of models, and we show that in some cases shuffling can save a huge number of states, while in other cases shuffling cannot be size efficient. We also show that a constant number of states does not increase the computational power of deterministic and nondeterministic nonuniform models without shuffling.

## 2. Definitions

Our alphabet is binary, $\Sigma = \{0,1\}$. We mainly use the terminology of branching programs: our decision problems are computing Boolean functions. The automaton computing a function accepts the inputs on which the function takes the value true and rejects the inputs on which it takes the value false. For uniform models, on the other hand, our decision problems are recognizing languages: the automaton recognizing a language accepts every member and rejects every non-member.

A nonuniform head–position–dependent two–way deterministic automaton working on inputs of length/size $n$ (2DA) is a 6-tuple

$$D_n = (\Sigma, S, s_1, \delta = \{\delta_1, \ldots, \delta_n\}, s_a, s_r),$$

where (i) $S$ is the set of states (its size can be a function of $n$) and $s_1, s_a, s_r \in S$ are the initial, accepting, and rejecting states, respectively; and (ii) $\delta$ is a collection of $n$ transition functions such that $\delta_i$ is the transition function that governs the behaviour of $D_n$ when reading the $i$th symbol/variable of the input, where $1 \le i \le n$. Any given input is placed on a read-only tape with a single head, on the squares from 1 to $n$, so that the $i$th square holds the $i$th symbol of the input. When $D_n$ is in a non-halting state and reads the symbol on the $i$th square, it switches to the state and updates the head position as prescribed by $\delta_i$: the head moves one square to the left or to the right, or it stays on the same square. The transition functions $\delta_1$ and $\delta_n$ must be defined to guarantee that the head never leaves the input during the computation. Moreover, the automaton enters $s_a$ or $s_r$ only on the rightmost symbol, and then the input is accepted or rejected, respectively.
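To make the definition concrete, the following is a minimal simulation sketch, not the construction used in the paper. It assumes that each $\delta_i$ is encoded as a dictionary from (state, symbol) to (state, head move), that the head starts on the leftmost square, and that a step budget guards against infinite loops; the names `run_2da` and `parity_2da` and the example automaton are hypothetical.

```python
from typing import Dict, List, Tuple

State = str
Move = int  # -1 = left, 0 = stay, +1 = right
Delta = Dict[Tuple[State, str], Tuple[State, Move]]


def run_2da(deltas: List[Delta], start: State, accept: State, reject: State,
            word: str, max_steps: int = 10_000) -> bool:
    """Simulate a head-position-dependent 2DA; deltas[i] governs square i + 1."""
    n = len(word)
    assert len(deltas) == n, "one transition function per tape square"
    state, pos = start, 0  # ASSUMPTION: the head starts on the leftmost square
    for _ in range(max_steps):
        state, move = deltas[pos][(state, word[pos])]
        if state in (accept, reject):
            # Halting states may only be entered on the rightmost symbol.
            assert pos == n - 1
            return state == accept
        pos += move
        assert 0 <= pos < n, "the head must never leave the input"
    raise RuntimeError("step budget exhausted (the automaton may loop)")


def parity_2da(n: int) -> List[Delta]:
    """Hypothetical example: accept iff the input contains an odd number of 1s."""
    deltas: List[Delta] = []
    for i in range(n):
        last = (i == n - 1)
        delta: Delta = {}
        for state, bit in (("even", 0), ("odd", 1)):
            for sym in "01":
                parity = bit ^ int(sym)
                if last:
                    # Decide on the rightmost square only.
                    delta[(state, sym)] = ("acc" if parity else "rej", 0)
                else:
                    delta[(state, sym)] = ("odd" if parity else "even", +1)
        deltas.append(delta)
    return deltas
```

For instance, `run_2da(parity_2da(4), "even", "acc", "rej", "0111")` runs the example automaton on a four-symbol input. Only the last transition function differs from the others here, which already uses the freedom that a uniform automaton does not have.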

The nondeterministic counterpart of 2DA, denoted 2NA, can choose from more than one transition in each step. So, the range of each transition function is the power set of the corresponding deterministic range. Therefore, a 2NA can follow more than one computational path, and the input is accepted only if one of them ends with the decision of “acceptance”. Note that some paths end without any decision, since the transition function can yield the empty set for some transitions.

The probabilistic counterpart of 2DA, denoted 2PA, is a 2NA such that each transition is associated with a probability. Thus, 2PAs can be in a probability distribution over the deterministic configurations (the state and the position of the head form a configuration) during the computation. To be a well-formed machine, the total probability must be 1, i.e. the probabilities of the outgoing transitions from a single configuration must always sum to 1. Thus, each input is accepted and rejected by a 2PA with some probabilities. An input is said to be accepted/rejected by a (bounded-error) 2PA if the accepting/rejecting probability of the machine is at least $\frac{1}{2} + \varepsilon$ for some $\varepsilon > 0$.

A function $f$ is said to be computed by a 2DA (respectively, a 2NA or a 2PA) if the automaton accepts every input on which $f$ takes the value 1 and rejects every input on which $f$ takes the value 0.

The class is formed by the functions such that each is computed by a 2DA , the number of states of which is no more than , where is a non-negative integer. We can similarly define nondeterministic and probabilistic counterparts of this class, denoted and respectively.

We also introduce a generalization of our nonuniform models that can shuffle the input at the beginning of the computation with respect to a permutation. A nonuniform head–position–dependent shuffling two–way deterministic automaton working on inputs of length/size $n$ is a 2DA that shuffles the symbols of the input with respect to $\theta$, a permutation of $\{1, \ldots, n\}$, i.e. the $j$-th symbol of the input is placed on the $\theta(j)$-th place on the tape ($1 \le j \le n$), and then executes the 2DA algorithm on this new input. The nondeterministic and probabilistic shuffling models are defined in the same way.
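The shuffling step can be sketched as follows, assuming the permutation is given as a 1-based list in which entry $j$ is the tape position that receives the $j$-th input symbol. The function name `shuffle_input` and this encoding are illustrative choices, not taken from the paper.

```python
from typing import List


def shuffle_input(word: str, theta: List[int]) -> str:
    """Place the j-th input symbol on tape square theta[j - 1] (1-based)."""
    n = len(word)
    # theta must be a permutation of {1, ..., n}.
    assert sorted(theta) == list(range(1, n + 1))
    tape = [""] * n
    for j, symbol in enumerate(word, start=1):
        tape[theta[j - 1] - 1] = symbol
    return "".join(tape)
```

A shuffling automaton then runs its non-shuffling part on `shuffle_input(word, theta)` instead of `word`; the permutation depends only on the input length, not on the input itself.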

The class is formed by the functions such that each is computed by a 2DA whose number of states is no more than , where is a non-negative integer and is a permutation of . The nondeterministic and probabilistic classes are respectively represented by and .

Moreover, we consider uniform versions of two–way automata, namely 2DFA and 2NFA. A 2DFA is defined in the same way as a 2DA, but the automaton is identical for all input lengths $n$ and uses the same transition function on every tape square. Moreover, these models can use end-markers, between which the given input is placed on the input tape. We can define 2NFA similarly. The corresponding classes of languages recognized by 2DFAs and 2NFAs of a given size are defined analogously.

## 3. Lower bounds, Boolean functions, and algorithms

The key complexity measure behind our results is the number of subfunctions of a given function. It can be seen as the counterpart of “the equivalence classes of a language” in the sense of the Myhill-Nerode Theorem [23].

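Let $f$ be a Boolean function defined on the variables $x_1, \ldots, x_n$. We denote the set of all permutations of $\{1, \ldots, n\}$ by $\Theta(n)$. Let $\theta \in \Theta(n)$ be a permutation. We can order the variables with respect to $\theta$ and then split them into two disjoint non-empty (ordered) sets by picking an index $i \in \{1, \ldots, n-1\}$: the first $i$ variables in this order and the remaining $n - i$ variables. Consider a mapping that assigns a value to each variable of the first set. It defines a function on the variables of the second set that returns the value of $f$ when the variables of the first set are fixed by the mapping. Such a function is called a subfunction.

The total number of different subfunctions with respect to $\theta$ and $i$ is denoted by $N^{\theta}_i(f)$. Then, we focus on the maximum value by considering all possible indices:

$$N^{\theta}(f) = \max_{i \in \{1, \ldots, n-1\}} N^{\theta}_i(f).$$

After this, we focus on the best permutation, i.e. the one that minimizes the number of subfunctions:

$$N(f) = \min_{\theta \in \Theta(n)} N^{\theta}(f).$$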
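For very small $n$, these quantities can be computed by brute force. The sketch below assumes that a permutation is given as a tuple listing 0-based variable indices in the order induced by $\theta$, and that the first $i$ variables of that order are the fixed ones. The helper names and the 4-bit Equality example `eq4` are illustrative only.

```python
from itertools import permutations, product
from typing import Callable, Sequence, Tuple

BoolFn = Callable[[Tuple[int, ...]], int]


def n_theta_i(f: BoolFn, n: int, theta: Sequence[int], i: int) -> int:
    """Count distinct subfunctions when the first i variables of theta are fixed."""
    fixed, free = theta[:i], theta[i:]
    tables = set()
    for rho in product((0, 1), repeat=i):
        x = [0] * n
        for var, val in zip(fixed, rho):
            x[var] = val
        table = []
        # The truth table over the free variables identifies the subfunction.
        for tail in product((0, 1), repeat=n - i):
            for var, val in zip(free, tail):
                x[var] = val
            table.append(f(tuple(x)))
        tables.add(tuple(table))
    return len(tables)


def n_theta(f: BoolFn, n: int, theta: Sequence[int]) -> int:
    """Maximum over all split indices i in {1, ..., n-1}."""
    return max(n_theta_i(f, n, theta, i) for i in range(1, n))


def n_best(f: BoolFn, n: int) -> int:
    """Minimum over all n! orders; feasible only for tiny n."""
    return min(n_theta(f, n, theta) for theta in permutations(range(n)))


def eq4(x: Tuple[int, ...]) -> int:
    """Illustrative function: equality of the two 2-bit halves."""
    return int(x[0:2] == x[2:4])
```

On `eq4`, the natural order yields 4 subfunctions (one per value of the first half), while an interleaved order such as `(0, 2, 1, 3)` yields 3, which illustrates why the measure is minimized over permutations.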

Now, we describe the relation between the number of subfunctions of a Boolean function and the number of equivalence classes of a language.

Let $L$ be a language defined on $\Sigma$. For a given non-negative integer $n$, $L_n$ is the language composed of all members of $L$ with length $n$, i.e. $L_n = L \cap \Sigma^n$. For $r \le n$, two strings $u, v \in \Sigma^r$ are said to be equivalent if, for any $w \in \Sigma^{n-r}$, $uw \in L_n$ if and only if $vw \in L_n$.

We denote the number of non-equivalent strings of length $r$ by $R_r(L_n)$. Then, similar to the number of subfunctions,

$$R(L_n) = \max_{r \in \{1, \ldots, n-1\}} R_r(L_n) \quad \text{and} \quad R_n(L) = R(L_n).$$

The function $f_{L_n}$ denotes the characteristic Boolean function of the language $L_n$. Thus, we can say that

$$N^{id}(f_{L_n}) = R_n(L),$$

where $id$ is the natural order.
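The language-side measure can be checked in the same brute-force way. The sketch below assumes that the language is given as a membership predicate on binary strings and that two prefixes of length $r$ are equivalent exactly when they agree on every suffix of length $n - r$. The names `r_r`, `r_n`, and the example language of squares are illustrative.

```python
from itertools import product
from typing import Callable


def r_r(in_lang: Callable[[str], bool], n: int, r: int) -> int:
    """Number of pairwise non-equivalent prefixes of length r with respect to L_n."""
    signatures = set()
    for u in product("01", repeat=r):
        prefix = "".join(u)
        # The membership pattern over all suffixes identifies the class of the prefix.
        signature = tuple(in_lang(prefix + "".join(w))
                          for w in product("01", repeat=n - r))
        signatures.add(signature)
    return len(signatures)


def r_n(in_lang: Callable[[str], bool], n: int) -> int:
    """Maximum of r_r over r in {1, ..., n-1}."""
    return max(r_r(in_lang, n, r) for r in range(1, n))


def is_square(w: str) -> bool:
    """Illustrative language: strings of the form uu."""
    half = len(w) // 2
    return len(w) % 2 == 0 and w[:half] == w[half:]
```

For the language of squares and $n = 4$, the maximum is attained at $r = 2$, where each of the four prefixes forms its own class; this matches the number of subfunctions of the 4-bit Equality function under the natural order.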

### 3.1. Lower bounds

First, we give our lower bounds on the sizes of models in terms of $N(f)$. Note that all of the lower bounds for shuffling models are also valid for non-shuffling models.

###### Theorem 1.

If the function $f$ is computed by a 2DA of size $d$ for some permutation $\theta$, then

$$N(f) \le (d+1)^{d+1}.$$
###### Proof.

This result is easily obtained by using the standard and well-known conversion given by Shepherdson [25]. ∎

###### Corollary 1.

If the language $L$ is recognized by a 2DFA of size $d$, then

$$R_n(L) \le (d+1)^{d+1}.$$
###### Theorem 2.

If the function $f$ is computed by a 2NA of size $d$ for some permutation $\theta$, then

$$N(f) \le 2^{(d+1)^2}.$$
###### Proof.

This result follows from [26]. ∎

###### Corollary 2.

If the language $L$ is recognized by a 2NFA of size $d$, then

$$R_n(L) \le 2^{(d+1)^2}.$$
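Read contrapositively, these bounds turn a count of subfunctions into a lower bound on the number of states. The sketch below assumes the bounds are read as $(d+1)^{d+1}$ for the deterministic case and $2^{(d+1)^2}$ for the nondeterministic case; the function names are hypothetical.

```python
def min_states_2da(num_subfunctions: int) -> int:
    """Smallest d with (d + 1) ** (d + 1) >= num_subfunctions."""
    d = 0
    while (d + 1) ** (d + 1) < num_subfunctions:
        d += 1
    return d


def min_states_2na(num_subfunctions: int) -> int:
    """Smallest d with 2 ** ((d + 1) ** 2) >= num_subfunctions."""
    d = 0
    while 2 ** ((d + 1) ** 2) < num_subfunctions:
        d += 1
    return d
```

Any automaton of the corresponding type computing a function with the given number of subfunctions needs at least the returned number of states. Because the nondeterministic bound grows much faster in $d$, the same count gives a weaker lower bound for 2NAs than for 2DAs.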

Based on Theorems 1 and 2, we can obtain the following result.

###### Theorem 3.

For a constant integer $d$, the classes of functions computed by non-shuffling 2DAs and 2NAs of size at most $d$ contain only characteristic functions of regular languages.

###### Proof.

If $d$ is constant and a 2DA (or a 2NA) of size $d$ computes $f$, then by Theorem 1 (or Theorem 2) $N^{id}(f)$ is bounded by a constant, which is the same for each $n$. Hence $R_n(L)$ of the corresponding language $L$ is bounded by a constant too, so the language is regular, since the number of equivalence classes is finite, by the Myhill-Nerode Theorem [23]. ∎

###### Theorem 4.

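If the function $f$ is computed by a 2PA of size $d$ for some permutation $\theta$ with expected running time $T$ and error probability $\varepsilon$, then

$$N(f) \le \left\lceil \frac{4d(8 + 3\log T)}{\log\frac{1+2\varepsilon}{1+\varepsilon}} \right\rceil^{(d+1)^2}.$$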
###### Proof.

This result follows from the techniques given in [10].

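We need some additional definitions to present our lower bound for 2PA. Let $\beta \ge 1$. Two numbers $a$ and $a'$ are said to be $\beta$-close if either

• $a = a' = 0$, or

• $a > 0$, $a' > 0$, and $\beta^{-1} \le a/a' \le \beta$.

Two numbers $a$ and $a'$ are said to be $\beta$-close mod $\lambda$ if either

• $a < \lambda$ and $a' < \lambda$, or

• $a \ge \lambda$, $a' \ge \lambda$, and $a$ and $a'$ are $\beta$-close.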
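The two closeness notions can be written as small predicates. The sketch below assumes the definition of closeness from Dwork and Stockmeyer [10] (both numbers are zero, or both are positive with ratio between $\beta^{-1}$ and $\beta$) and reads the "mod $\lambda$" variant as first comparing both numbers against a threshold $\lambda$. The strictness of the threshold comparison and all function names are assumptions.

```python
from typing import Sequence


def beta_close(a: float, b: float, beta: float) -> bool:
    """Both zero, or both positive with ratio in [1/beta, beta]."""
    if a == 0 and b == 0:
        return True
    return a > 0 and b > 0 and 1 / beta <= a / b <= beta


def beta_close_mod(a: float, b: float, beta: float, lam: float) -> bool:
    """Both below the threshold lam, or both at least lam and beta-close."""
    # ASSUMPTION: strict '<' below the threshold, '>=' above it.
    if a < lam and b < lam:
        return True
    return a >= lam and b >= lam and beta_close(a, b, beta)


def chains_close_mod(p: Sequence[Sequence[float]], q: Sequence[Sequence[float]],
                     beta: float, lam: float) -> bool:
    """Entrywise comparison of two transition matrices of equal shape."""
    return all(beta_close_mod(x, y, beta, lam)
               for row_p, row_q in zip(p, q)
               for x, y in zip(row_p, row_q))
```

The threshold variant treats all sufficiently small probabilities as interchangeable, so two chains can be close even when one has a tiny positive entry where the other has zero.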

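Let $P$ be an $m$-state Markov chain with a starting state and two absorbing states; let $a(P)$ denote the probability that the Markov chain $P$ is absorbed in a designated one of the absorbing states when started in the starting state; and let $T(P)$ denote the expected time to absorption into one of the two absorbing states. Two Markov chains $P = \{p_{i,j}\}$ and $P' = \{p'_{i,j}\}$ are said to be $\beta$-close mod $\lambda$ if, for each pair $i, j$, the entries $p_{i,j}$ and $p'_{i,j}$ are $\beta$-close mod $\lambda$.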

Let be any partition such that is the order of the inputs for , and, and .

Let us consider configurations for . Configuration is initial configuration of the automata, is accepting state and position of head on the last symbol, is similar but in rejecting state. For , configuration is for position and state of the automata . For , configuration is for position and state of the automata . We will use three object for describing computational process for the automata: matrix

and vectors

. Matrix is block diagonal matrix with two blocks and . -th line and row of the matrix corresponding to configuration . Matrices , have following elements:

• If begins the computation from the configuration , then it reaches the configuration early than another for within probability

• if begins the computation from the configuration , then it reaches the configuration early than another for within probability .

The vector and have size and -th element of vectors also correspond to . The vector represents initial distribution of probability for configuration of the automata. So and other elements are . The vector is characteristic vector of accepting state. So, and other elements are

Let us discuss some properties.

###### Lemma 1.

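If the function $f$ is computed by a 2PA of size $d$ within error probability $\varepsilon$, then for any input $(\sigma, \gamma)$ on which $f$ takes the value 1, there is a $t'$ such that

$$p^0 \cdot (M_A(\sigma, \gamma))^{t'} \cdot q \ge \frac{1}{2} + \varepsilon.$$

On the other hand, for any input $(\sigma, \gamma)$ on which $f$ takes the value 0, there is no such $t'$.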

###### Proof.

Since the computation is probabilistic, there can be more than one path between two of the configurations considered above. By the construction of the matrix, all of these paths are taken into account. It is shown in [10] that a probabilistic computation can be modelled in this way. ∎

Thus, the computation of a 2PA can be modelled by a Markov chain specified by this matrix.

###### Lemma 2.

Let and are two -state Markov chains and . Let , and . If and are mod , then .

###### Proof.

Dwork and Stockmeyer [10] have shown that

 a(P′)≥(1−2λm3)β−2ma(P)−4√λmT.

Therefore the bound on may be obtained by substituting the values of and . ∎

Let be the set of all possible matrices such that and, for any and , and are mod .

The inequality can be obtained in the same way.111 includes all information about the behaviour of automaton on and so if there are two different and with different subfunctions but their matrices and are mod , it should be different on some . But the behaviour of the automaton on this input is the same and therefore matrices and are mod . This is a contradiction.

To estimate

we use technique similar to one from [10]. Let . Define an equivalence relation on matrices in as follows: . Let be a largest equivalence class. Since there are at most equivalence classes, . Size of is obtained in [10], since by substituting values and we have:

$$N^{id}(f) \le |M| \le 2^{c}\,|E(w)| \le \left\lceil \frac{4d(8 + 3\log T)}{\log\frac{1+2\varepsilon}{1+\varepsilon}} \right\rceil^{c}.$$

The proof of Theorem 4 is completed.

###### Corollary 1.

If the function $f$ is computed by a 2PA of size $d$ for some permutation $\theta$ with expected running time $T$ and error probability $\varepsilon$, then

$$N(f) \le (32\, d \log T)^{(d+1)^2}.$$
###### Proof.

For , , and, for , . ∎

### 3.2. Boolean Functions

We define two Boolean functions: (1) the Shuffled Address Function, denoted $2\text{-}SAF_t$, a modification of the Boolean functions given in [3, 17, 22, 8, 13], and (2) the Uniform Shuffled Address Function, denoted $2\text{-}USAF_t$, a modification of $2\text{-}SAF_t$. We also use the Equality language and its characteristic function.

#### 3.2.1. Boolean Function 2-SAFt:

We divide the whole input into two parts, and each part into blocks. Each block has an address and a value. Formally, the Boolean function $2\text{-}SAF_t(X): \{0,1\}^n \to \{0,1\}$ is defined for integer $t = t(n)$ such that

$$2t(2t + \lceil \log 2t \rceil) < n. \tag{1}$$

We divide the input variables (the symbols of the input) into $2t$ blocks. There are $\lfloor n/(2t) \rfloor$ variables in each block. After that, we divide each block into address and value variables (see Figure 1). The first $\lceil \log 2t \rceil$ variables of a block are the address variables and the other $b = \lfloor n/(2t) \rfloor - \lceil \log 2t \rceil$ variables of the block are the value variables. We write $x^p_0, \ldots, x^p_{b-1}$ for the value variables of the $p$th block.

Function is calculated based on the following five sub-routines:

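1. $Adr(X, p)$ gets the address of the $p$th block.

2. $Ind(X, a)$ gets the number of the block by its address $a$:

$$Ind(X, a) = \begin{cases} p, & \text{where } p \text{ is the minimal number such that } Adr(X, p) = a, \\ -1, & \text{if there is no such } p. \end{cases}$$

3. $Val(X, a)$ gets the value of the block with address $a$:

$$Val(X, a) = \begin{cases} \sum_{j=0}^{b-1} x^p_j \pmod{t}, & \text{where } p = Ind(X, a) \text{ for } p \ge 0, \\ -1, & \text{if } Ind(X, a) < 0. \end{cases}$$

Suppose that we are at the $i$-th step of the iteration.

4. $Step1(X, i)$ gets the first part of the $i$th step of the iteration:

$$Step1(X, i) = \begin{cases} -1, & \text{if } Step2(X, i-1) = -1, \\ Val(X, Step2(X, i-1)) + t, & \text{otherwise.} \end{cases}$$

5. $Step2(X, i)$ gets the second part of the $i$th step of the iteration:

$$Step2(X, i) = \begin{cases} -1, & \text{if } Step1(X, i) = -1, \\ 2, & \text{if } i = -1, \\ Val(X, Step1(X, i)), & \text{otherwise.} \end{cases}$$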
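The sub-routines above can be sketched in code under some explicit assumptions. The formula for the address is not recoverable from the text, so the sketch reads the address bits of a block as a binary number, most significant bit first. It also assumes $2t$ consecutive blocks of $\lfloor n/(2t) \rfloor$ symbols with $\lceil \log 2t \rceil$ leading address bits, checks the base case $i = -1$ before recursing, and propagates $-1$ when a block is not found. All function names are illustrative.

```python
from typing import List, Sequence, Tuple


def split_blocks(x: Sequence[int], t: int) -> List[Tuple[Sequence[int], Sequence[int]]]:
    """Return (address bits, value bits) for each of the 2t blocks."""
    addr_len = (2 * t - 1).bit_length()  # equals ceil(log2(2t)) for t >= 1
    block_len = len(x) // (2 * t)
    blocks = []
    for p in range(2 * t):
        block = x[p * block_len:(p + 1) * block_len]
        blocks.append((block[:addr_len], block[addr_len:]))
    return blocks


def adr(x: Sequence[int], t: int, p: int) -> int:
    # ASSUMPTION: the address is the binary number spelled by the address bits.
    bits, _ = split_blocks(x, t)[p]
    return int("".join(map(str, bits)), 2)


def ind(x: Sequence[int], t: int, a: int) -> int:
    """Minimal block number with address a, or -1 if there is none."""
    for p in range(2 * t):
        if adr(x, t, p) == a:
            return p
    return -1


def val(x: Sequence[int], t: int, a: int) -> int:
    """Sum of the value bits of the block with address a, modulo t."""
    p = ind(x, t, a)
    if p < 0:
        return -1
    return sum(split_blocks(x, t)[p][1]) % t


def step2(x: Sequence[int], t: int, i: int) -> int:
    if i == -1:  # base case, checked first so that the recursion terminates
        return 2
    s1 = step1(x, t, i)
    return -1 if s1 == -1 else val(x, t, s1)


def step1(x: Sequence[int], t: int, i: int) -> int:
    s2 = step2(x, t, i - 1)
    if s2 == -1:
        return -1
    v = val(x, t, s2)
    # ASSUMPTION: a missing block propagates -1 instead of yielding t - 1.
    return -1 if v == -1 else v + t
```

For example, with $t = 2$ and four blocks of seven bits each, `step1(x, 2, 0)` looks up the block with address 2, takes its value modulo 2, and adds $t$ to obtain an address in the second half of the address range.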

Function is computed iteratively:

1. We find the block with address in the first part and compute the value of this block, which is the address of the block for the second part.

2. We take the block from the second part with the computed address and compute value of the block, which is the address of the new block for the first part.

3. We find the block with new address in the second part and check value of this block. If the value is greater than , then value of is , and otherwise.

If we do not find block with searching address in any phase then value of is also . See the Figure 2 for the iterations of the function.

###### Theorem 5.

For integer $t = t(n)$, $N(2\text{-}SAF_t) \ge t^{t-2}$ whenever Inequality (1) holds.

###### Proof.

The proof is based on the following two technical Lemmas 3 and 4.

###### Lemma 3.

Let be some integers satisfying Inequality (1) and be a partition such that contains at least value variables from exactly blocks. Then, contains at least value variables from exactly blocks.

###### Proof.

We define contains at least value variables from th block. Let . Then, contains at most value variables from th block, so contains at least value variables from th block. By (1), we can get

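$$b - (t-1) = \left\lfloor \frac{n}{2t} \right\rfloor - \lceil \log 2t \rceil - (t-1)$$

which is bigger than

$$(2t + \lceil \log 2t \rceil) - \lceil \log 2t \rceil - (t-1) = 2t - (t-1) = t+1.$$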

Let be the numbers of all blocks and . Then, we can follow that . ∎

Let be any order. Then, we pick a partition such that contains at least value variables from exactly blocks. We define contains at least value variables from th block and . By the proof of Lemma 3, we know that .

Let be the partition for the input with respect to . We define the sets and for the input with respect to that satisfies the following conditions. For , , , and :

• For any , ;

• For any , ;

• There is such that ;

• The value of is for any and ;

• The value of is for any and ;

###### Lemma 4.

For any sequence , where , there are and such that for and .

###### Proof.

Let such that for . Remember that the value of is for any . Hence the value of depends only on the variables from . At least value variables of th block belong to . Hence we can choose input with ’s in the value variables of th block which belongs to . For set and , we can follow the same proof. ∎

Recall the statement of the theorem: for integer $t = t(n)$, if Inequality (1) holds, then

$$N(2\text{-}SAF_t) \ge t^{t-2}.$$

Here are the details for the proofs.

Let be an order. Then, we pick the partition such that contains at least value variables from exactly blocks.

Let be two different inputs and and be their corresponding mappings, respectively. We show that the subfunctions and are different. Let such that .

If , then we choose providing that , , and , where . That is,

• ,

• and ,

• and , and,

• and .

Thus, and .

If , then we choose providing that and . That is,

• and ,

• and ,

• and , and,

• and .

Hence and .

Therefore and also .

Now, we compute . For , we can get each value of for . It means due to Lemma 4. Therefore, , and, by definition of , we have . ∎

###### Theorem 6.

There is a of size that computes .

###### Proof.

Let be the input. We begin with the first part of automaton that computes . Automaton checks each th block for predicate . If it is true, then computes , and, it checks the next block, otherwise. If checks all blocks and does not find the block, it switches to the rejecting state. If finds , then it goes to one of the special state . From this state, the automaton returns back to the beginning of the input.

We continue with the second part of automaton that computes . From the state , checks each th block for predicate . If it is true, then computes , and, it checks the next block, otherwise. If checks all blocks and does not find the block, it switches to the rejecting state. If finds , then it goes to one of special state . From this state, the automaton returns back to the beginning of the input.

Now, we describe the third part of automaton that computes . From the state , checks each th block for predicate . If it is true, then computes , and, it checks the next block, otherwise. If checks all blocks and does not find the block, then it switches to the rejecting state. If finds , then it goes to one of special state . From this state automaton returns back to the beginning of the input.

The fourth part of the automaton computes . From the state , checks each th block for predicate . If it is true, then computes ; otherwise, it checks the next block. If checks all blocks and does not find the block, it switches to the rejecting state. If finds and , the automaton accepts the input; otherwise, it rejects the input.

In the first part, the block checking procedure uses only states. Computing uses states and there are states. So, the size of the first part is . In the second part, the block checking procedure uses only states and we have blocks to check the state pairs for each value of , that is states. Computing uses states and also has states. Therefore, the size of the second part is . Similarly, we can show that the size of the third part is . In the fourth part, we need states for the procedure checking and computing . also has one accepting and one rejecting state. So, the size of the fourth part is . Thus, the overall size of is . ∎

#### 3.2.2. Boolean Function 2-USAFt:

The definition of $2\text{-}USAF_t$ is as follows: $2\text{-}USAF_t(X): \{0,1\}^n \to \{0,1\}$ for integer $t = t(n)$ satisfying

$$4t(2t + \lceil \log 2t \rceil) < n. \tag{2}$$

We denote its language version as .

We divide the input variables (the symbols of the input) into blocks, each containing the same number of variables. After that, we divide each block into mark, address, and value variables. All variables that are in odd positions are mark variables. The type of the bit in an even position is determined by the value of the previous mark bit: the bit is an address bit if the previous mark bit’s value is 0, and a value bit otherwise. The first variables of a block that are in even positions denote the address, and the other variables of the block denote the value.

We call , and are the value, address, and the mark variables of the th block, respectively, for .

Function is calculated based on the following five sub-routines:

1. gets the address of a block: