# Sequentiality of String-to-Context Transducers

Transducers extend finite state automata with outputs, and describe transformations from strings to strings. Sequential transducers, which behave deterministically with respect to their input, are of particular interest. However, unlike finite-state automata, not every transducer can be made sequential. The seminal work of Choffrut makes it possible to characterise, amongst the functional one-way transducers, those that admit an equivalent sequential transducer. In this work, we extend the results of Choffrut to the class of transducers that produce their output string by simultaneously adding, at each transition, a string on the left and a string on the right of the string produced so far. We call them string-to-context transducers. We obtain a multiple characterisation of the functional string-to-context transducers admitting an equivalent sequential one, based on a Lipschitz property of the function realised by the transducer, and on a pattern (a new twinning property). Finally, we prove that, given a string-to-context transducer, determining whether there exists an equivalent sequential one is in coNP.

## 1 Introduction


Transducers are a fundamental model to describe programs manipulating strings. They date back to the very first works in theoretical computer science, and are already present in the pioneering works on finite state automata [23, 1]. While finite state automata are very robust w.r.t. modifications of the model such as non-determinism and two-wayness, this is not the case for transducers: both extensions affect the expressive power of the model. Non-determinism is a very useful feature for modelling and specification purposes. However, when one turns to implementation, deriving a sequential, i.e. input-deterministic, transducer is a major issue. A natural and fundamental problem is thus to decide, given a (non-deterministic) transducer, whether there exists an equivalent sequential transducer. This problem is called the sequentiality problem.

In [10], Choffrut addressed this problem for the class of functional (one-way) finite state transducers, which corresponds to so-called rational functions. He proved a multiple characterisation of the transducers admitting an equivalent sequential transducer. This characterisation includes a machine-independent property, namely a Lipschitz property of the function realised by the transducer. It also involves a pattern property, namely the twinning property, which is used to prove that the sequentiality problem is decidable in polynomial time for the class of functional finite state transducers [25]. This seminal work has led to developments on the sequentiality of finite state transducers [8, 7]. These results have also been extended to weighted automata [9, 19, 15] and to tree transducers [24]. See also [20] for a survey on sequentiality problems.

While the model of one-way transducers is now rather well understood, a current challenge is to address the so-called class of regular functions, which corresponds to functions realised by two-way transducers. This class has attracted a lot of interest in recent years. It is closed under composition [11] and enjoys alternative presentations using logic [14], a deterministic one-way model equipped with registers, named streaming string transducers [2] (SST for short), as well as a set of regular combinators [4, 6, 12]. This class of functions is much more expressive, as it captures for instance the mirror image and the copy. Yet, it has good decidability properties: equivalence and type-checking are decidable in PSpace [18, 3]. We refer the interested reader to [16] for a recent survey. Intuitively, two-way finite state transducers (resp. SST) extend one-way finite state transducers with two important features: firstly, they can go through the input word both ways (resp. they can prepend and append words to registers), and secondly, they can perform multiple passes (resp. they can perform register concatenation).

In this paper, we lift the results of Choffrut [10] to a class of transducers that can perform the first of the two features mentioned above, thus generalising the class of rational functions. More precisely, we consider transducers which, at each transition, extend the output word produced so far by prepending and appending two words to it. This operation can be defined as the extension of a word with a context, and we call these transducers string-to-context transducers. It is important to notice, however, that they still describe functions from strings to strings. We characterise the functional string-to-context transducers that admit an equivalent sequential string-to-context transducer in two ways: through a machine-independent property, namely that the function realised by the transducer satisfies a Lipschitz property involving an original factor distance, and through a pattern property of the transducer, which we call the contextual twinning property and which generalises the twinning property to contexts. We also prove that the sequentiality problem for these transducers is in the class coNP.

A key technical tool of the result of [10] was a combinatorial analysis of the loops, showing that the output words of synchronised loops have conjugate primitive roots. For string-to-context transducers, the situation is more complex, as the combinatorics may involve the words of the two sides of the context. Intuitively, when these words do commute with the output word produced so far, it is possible for instance to move to the right a part of the word produced on the left. In order to prove our results, we thus dig into the combinatorics of contexts associated with loops, identifying different possible situations, and we then use this analysis to describe an original determinisation construction.

Our results also have a strong connection with the register minimisation problem for SST. This problem consists in determining, given an SST and a natural number $k$, whether there exists an equivalent SST with $k$ registers. It has been proven in [13] that the problem is decidable for SST that can only append words to registers, and the proof crucially relies on the fact that the case $k = 1$ exactly corresponds to the sequentiality problem of one-way finite state transducers. Hence, our results constitute a first step towards register minimisation for SST without register concatenation. The register minimisation problem for non-deterministic SST has also been studied in [5] for the case of concatenation-free SST. Since the targeted model there is non-deterministic, the two problems are independent.

Due to lack of space, omitted proofs can be found in the Appendix.

## 2 Models


#### Words, contexts and partial functions

Let $A$ be a finite alphabet. The set of finite words (or strings) over $A$ is denoted by $A^*$. The empty word is denoted by $\epsilon$. The length of a word $u \in A^*$ is denoted by $|u|$. We say that a word $u$ is a prefix (resp. suffix) of a word $v$ if there exists a word $w$ such that $uw = v$ (resp. $wu = v$). We say that two words $u, v$ are conjugates if there exist two words $t_1, t_2$ such that $u = t_1 t_2$ and $v = t_2 t_1$. If this holds, we write $u \sim v$. The primitive root of a word $u$, denoted $\rho(u)$, is the shortest word $x$ such that $u = x^k$ for some $k \geq 1$.

l:Fine ([17]) Let $u, v \in A^*$. There exists $n \in \mathbb{N}$ such that if a power of $u$ and a power of $v$ have a common factor of length at least $n$, then $\rho(u) \sim \rho(v)$.
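As an illustration of the two combinatorial notions just introduced, the following minimal Python sketch computes primitive roots and tests conjugacy. The function names `primitive_root` and `are_conjugates` are ours and purely illustrative, and the brute-force approach is chosen for readability, not efficiency.

```python
def primitive_root(u: str) -> str:
    """Return the shortest word x such that u == x * k for some k >= 1."""
    n = len(u)
    for d in range(1, n + 1):
        # A candidate root must have a length dividing |u|.
        if n % d == 0 and u[:d] * (n // d) == u:
            return u[:d]
    return u  # only reached for the empty word


def are_conjugates(u: str, v: str) -> bool:
    """True iff u = t1 t2 and v = t2 t1 for some words t1, t2."""
    # v is a rotation of u exactly when |u| = |v| and v occurs in uu.
    return len(u) == len(v) and v in u + u
```

For instance, `primitive_root("abab")` is `"ab"`, and `"abc"` and `"bca"` are conjugates.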

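Given two words $u, v \in A^*$, the longest common prefix (resp. suffix) of $u$ and $v$ is denoted by $\mathrm{lcp}(u, v)$ (resp. $\mathrm{lcs}(u, v)$). We define the prefix distance between $u$ and $v$, denoted by $\mathrm{dist}_p(u, v)$, as $|u| + |v| - 2\,|\mathrm{lcp}(u, v)|$.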
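The prefix distance can be sketched in a few lines of Python, assuming the usual definition $\mathrm{dist}_p(u, v) = |u| + |v| - 2\,|\mathrm{lcp}(u, v)|$. The names `lcp`, `lcs` and `dist_p` below are illustrative choices mirroring the notation of this section.

```python
def lcp(u: str, v: str) -> str:
    """Longest common prefix of u and v."""
    i = 0
    while i < min(len(u), len(v)) and u[i] == v[i]:
        i += 1
    return u[:i]


def lcs(u: str, v: str) -> str:
    """Longest common suffix of u and v, computed on the mirrored words."""
    return lcp(u[::-1], v[::-1])[::-1]


def dist_p(u: str, v: str) -> int:
    """Prefix distance: letters to erase then append to go from u to v."""
    return len(u) + len(v) - 2 * len(lcp(u, v))
```

Intuitively, `dist_p(u, v)` counts how many letters must be removed from the end of `u` and then appended to reach `v`.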

Given a word $u$, we say that $v$ is a factor of $u$ if there exist words $x, y$ such that $u = xvy$. Given two words $u, v$, a longest common factor of $u$ and $v$ is a word $w$ of maximal length that is a factor of both $u$ and $v$. Note that this word is not necessarily unique. We denote such a word by $\mathrm{lcf}(u, v)$. The factor distance between $u$ and $v$, denoted by $\mathrm{dist}_f(u, v)$, is defined as $\mathrm{dist}_f(u, v) = |u| + |v| - 2\,|\mathrm{lcf}(u, v)|$. This definition is correct as $|\mathrm{lcf}(u, v)|$ is independent of the choice of the common factor of maximal length.

Using a careful case analysis, we can prove that $\mathrm{dist}_f$ is indeed a distance, the only difficulty lying in the subadditivity:

r:distf $\mathrm{dist}_f$ is a distance.
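To make the factor distance concrete, here is a minimal Python sketch, assuming the definition $\mathrm{dist}_f(u, v) = |u| + |v| - 2\,|\mathrm{lcf}(u, v)|$. It uses the textbook quadratic dynamic programme for the longest common factor; the names `lcf_length` and `dist_f` are ours.

```python
def lcf_length(u: str, v: str) -> int:
    """Length of a longest common factor (contiguous substring) of u and v."""
    best = 0
    # prev[j] = length of the longest common suffix of u[:i-1] and v[:j]
    prev = [0] * (len(v) + 1)
    for i in range(1, len(u) + 1):
        cur = [0] * (len(v) + 1)
        for j in range(1, len(v) + 1):
            if u[i - 1] == v[j - 1]:
                cur[j] = prev[j - 1] + 1
                best = max(best, cur[j])
        prev = cur
    return best


def dist_f(u: str, v: str) -> int:
    """Factor distance between u and v."""
    return len(u) + len(v) - 2 * lcf_length(u, v)
```

Only the length of a longest common factor is needed, which matches the remark that the distance does not depend on which factor of maximal length is chosen. For example, `"aab"` and `"baa"` share the factor `"aa"`, so their factor distance is 2, whereas they have no common prefix.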

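Given a finite alphabet $A$, a context on $A$ is a pair of words $(u, v) \in A^* \times A^*$. The set of contexts on $A$ is denoted $\mathcal{C}(A)$. The empty context is denoted by $c_\epsilon$. For a context $c = (u, v)$, we denote by $\overleftarrow{c}$ (resp. $\overrightarrow{c}$) its left (resp. right) component: $\overleftarrow{c} = u$ (resp. $\overrightarrow{c} = v$). The length of a context $c$ is defined by $|c| = |\overleftarrow{c}| + |\overrightarrow{c}|$. The lateralized length of a context $c$ is defined by $\|c\| = (|\overleftarrow{c}|, |\overrightarrow{c}|)$. For a context $c$ and a word $w$, we write $c[w]$ for the word $\overleftarrow{c}\, w\, \overrightarrow{c}$. We define the concatenation of two contexts $c_1, c_2$ as the context $c_1 c_2 = (\overleftarrow{c_2}\,\overleftarrow{c_1}, \overrightarrow{c_1}\,\overrightarrow{c_2})$, so that $(c_1 c_2)[w] = c_2[c_1[w]]$. Last, given a context $c$ and a word $w$, we denote by $c^{-1}[w]$ the unique word $w'$ such that $c[w'] = w$, when such a word exists.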
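The operations on contexts can be sketched as a small Python class. This is an illustrative model only: the class `Context` and its method names are ours, and the concatenation order (the receiver is applied first, the argument is wrapped around it) is an assumption made so that composing contexts follows the order in which a transducer would emit them.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Context:
    left: str
    right: str

    def length(self) -> int:
        """Total length of the two components."""
        return len(self.left) + len(self.right)

    def lateralized_length(self) -> Tuple[int, int]:
        """Pair of lengths (left component, right component)."""
        return (len(self.left), len(self.right))

    def apply(self, w: str) -> str:
        """The word c[w]: w surrounded by the two components."""
        return self.left + w + self.right

    def then(self, outer: "Context") -> "Context":
        # Assumed convention: self is applied first and outer is wrapped
        # around it, so self.then(outer).apply(w) == outer.apply(self.apply(w)).
        return Context(outer.left + self.left, self.right + outer.right)

    def unapply(self, w: str) -> Optional[str]:
        """The unique w' with c[w'] == w, or None when no such word exists."""
        fits = len(w) >= self.length()
        if fits and w.startswith(self.left) and w.endswith(self.right):
            return w[len(self.left):len(w) - len(self.right)]
        return None
```

The length check in `unapply` rules out overlapping matches: the context `("a", "a")` cannot be removed from the one-letter word `"a"`.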

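Given a set of contexts $C \subseteq \mathcal{C}(A)$, we denote by $\mathrm{lcc}(C)$ the longest common context of elements in $C$, defined as $(\mathrm{lcs}(\{\overleftarrow{c} \mid c \in C\}), \mathrm{lcp}(\{\overrightarrow{c} \mid c \in C\}))$. We also write $\mathrm{lcc}(c_1, c_2) = \mathrm{lcc}(\{c_1, c_2\})$.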
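A possible reading of the longest common context is sketched below, under the assumption that it pairs the longest common suffix of the left components with the longest common prefix of the right components (the parts closest to the surrounded word). Contexts are modelled as plain `(left, right)` tuples and the function name `lcc` is ours; `os.path.commonprefix` is used only because it computes a character-wise common prefix of a list of strings.

```python
from os.path import commonprefix
from typing import Iterable, Tuple


def lcc(contexts: Iterable[Tuple[str, str]]) -> Tuple[str, str]:
    """Longest common context of a collection of (left, right) pairs."""
    contexts = list(contexts)
    # Common suffix of the left components = mirrored common prefix
    # of the mirrored left components.
    mirrored_lefts = [left[::-1] for left, _ in contexts]
    rights = [right for _, right in contexts]
    return (commonprefix(mirrored_lefts)[::-1], commonprefix(rights))
```

For example, the contexts `("xab", "cd")` and `("yab", "ce")` share the common context `("ab", "c")`.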

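We consider two sets $X, Y$. Given , we let . We denote the set of partial functions from $X$ to $Y$ as $\mathcal{F}(X, Y)$. Given $f \in \mathcal{F}(X, Y)$, we write $f : X \hookrightarrow Y$, and we denote by $\mathrm{dom}(f)$ its domain. When more convenient, we may also see elements of $\mathcal{F}(X, Y)$ as subsets of $X \times Y$. Last, given , we let denote some such that and .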

#### String-to-Context and String-to-String Transducers

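Let $A, B$ be two finite alphabets. A string-to-context transducer (S2C for short) $T$ from $A^*$ to $B^*$ is a tuple $(Q, t_{init}, t_{final}, \Delta)$ where $Q$ is a finite set of states, $t_{init} : Q \hookrightarrow \mathcal{C}(B)$ (resp. $t_{final} : Q \hookrightarrow \mathcal{C}(B)$) is the finite initial (resp. final) function, and $\Delta \subseteq Q \times A \times \mathcal{C}(B) \times Q$ is the finite set of transitions.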

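A state $q$ is said to be initial (resp. final) if $q \in \mathrm{dom}(t_{init})$ (resp. $q \in \mathrm{dom}(t_{final})$). We depict as $\xrightarrow{c} q$ (resp. $q \xrightarrow{c}$) the fact that $t_{init}(q) = c$ (resp. $t_{final}(q) = c$). A run $\rho$ from a state $q_1$ to a state $q_{k+1}$ on a word $w = w_1 \cdots w_k \in A^*$ where, for all $i$, $w_i \in A$, is a sequence of transitions: $(q_1, w_1, c_1, q_2), \ldots, (q_k, w_k, c_k, q_{k+1})$. The output of such a run is the context $c = c_1 \cdots c_k$, and is denoted by $\mathrm{out}(\rho)$. We depict this situation as $q_1 \xrightarrow{w \mid c} q_{k+1}$. The set of runs of $T$ is denoted $\mathcal{R}(T)$. The run $\rho$ is said to be accepting if $q_1$ is initial and $q_{k+1}$ final. This string-to-context transducer computes a relation $[\![T]\!] \subseteq A^* \times B^*$ defined by the set of pairs $(w, (c' c\, c'')[\epsilon])$ such that there are $p, q \in Q$ with $\xrightarrow{c'} p \xrightarrow{w \mid c} q \xrightarrow{c''}$. Thus, even if its definition involves contexts on $B$, the semantics of $T$ is a relation between words on $A$ and words on $B$. Given an S2C , we define the constant as . Given , we denote by the S2C obtained by replacing with . An S2C is trimmed if each of its states appears in some accepting run. W.l.o.g., we assume that the string-to-context transducers we consider are trimmed. An S2C $T$ from $A^*$ to $B^*$ is functional if the relation $[\![T]\!]$ is a function from $A^*$ to $B^*$. An S2C is sequential if $\mathrm{dom}(t_{init})$ is a singleton and if, for all transitions $(p, a, c, q), (p, a, c', q') \in \Delta$, we have $q = q'$ and $c = c'$.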
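The semantics described above can be sketched as a direct simulation. The Python function below is a hypothetical encoding, not the paper's construction: `init` and `final` are dictionaries from states to `(left, right)` contexts, `transitions` is a list of tuples `(p, a, (left, right), q)`, and we assume that the initial context is applied to the empty word first, each transition then wraps the word produced so far, and the final context is applied last.

```python
from typing import Dict, List, Set, Tuple

Ctx = Tuple[str, str]
Transition = Tuple[str, str, Ctx, str]


def s2c_outputs(init: Dict[str, Ctx], final: Dict[str, Ctx],
                transitions: List[Transition], word: str) -> Set[str]:
    """All output words produced by accepting runs on the input word."""
    # A configuration records the current state and both sides of the output.
    configs = {(q, left, right) for q, (left, right) in init.items()}
    for a in word:
        configs = {
            (q, l2 + left, right + r2)  # prepend on the left, append on the right
            for (p, left, right) in configs
            for (p2, b, (l2, r2), q) in transitions
            if p2 == p and b == a
        }
    return {
        l2 + left + right + r2
        for (q, left, right) in configs
        if q in final
        for (l2, r2) in [final[q]]
    }


# A one-state S2C that prepends each letter read: it realises the mirror image.
MIRROR = [("q", "a", ("a", ""), "q"), ("q", "b", ("b", ""), "q")]
# A one-state S2C that adds each letter on both sides.
BOTH = [("q", "a", ("a", "a"), "q"), ("q", "b", ("b", "b"), "q")]
EMPTY = {"q": ("", "")}
```

With these definitions, `s2c_outputs(EMPTY, EMPTY, MIRROR, "aab")` is `{"baa"}`, and a functional transducer is one for which the returned set never has more than one element.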

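The classical model of finite-state transducers is recovered in the following definition: Let $A, B$ be two finite alphabets. A string-to-context transducer $T = (Q, t_{init}, t_{final}, \Delta)$ is a string-to-string transducer (S2S for short) from $A^*$ to $B^*$ if, for all $(p, a, c, q) \in \Delta$, $\overleftarrow{c} = \epsilon$, and for all contexts $c$ in the image of $t_{init}$ or $t_{final}$, $\overleftarrow{c} = \epsilon$.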

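Notations defined for S2C carry over to classical transducers unchanged. For an S2S, we write $p \xrightarrow{u \mid w} q$ (resp. $\xrightarrow{w} q$, and $q \xrightarrow{w}$) instead of $p \xrightarrow{u \mid (\epsilon, w)} q$ (resp. $\xrightarrow{(\epsilon, w)} q$, and $q \xrightarrow{(\epsilon, w)}$).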

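Given an S2C $T = (Q, t_{init}, t_{final}, \Delta)$, we define its right S2S, denoted $\overrightarrow{T}$, as the tuple $(Q, \overrightarrow{t_{init}}, \overrightarrow{t_{final}}, \overrightarrow{\Delta})$ where, for all $q \in Q$ on which the functions are defined, $\overrightarrow{t_{init}}(q) = (\epsilon, \overrightarrow{t_{init}(q)})$ and $\overrightarrow{t_{final}}(q) = (\epsilon, \overrightarrow{t_{final}(q)})$, and, for all $(p, a, c, q) \in \Delta$, $(p, a, (\epsilon, \overrightarrow{c}), q) \in \overrightarrow{\Delta}$. Its left S2S is defined similarly, and by applying the mirror image on its output labels.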

Two examples of S2C (not realisable by S2S) are depicted in e:StoC.

## 3 Lipschitz and Twinning Properties


We recall the properties considered in [10], and the associated results.

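We say that a function $f : A^* \hookrightarrow B^*$ satisfies the Lipschitz property if there exists $K \in \mathbb{N}$ such that $\forall u, v \in \mathrm{dom}(f),\ \mathrm{dist}_p(f(u), f(v)) \leq K \cdot \mathrm{dist}_p(u, v)$.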

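We consider an S2S $T$ and $L \in \mathbb{N}$. Two states $q_1$ and $q_2$ are said to be $L$-twinned if for any two runs $\xrightarrow{w_1} p_1 \xrightarrow{u \mid x_1} q_1 \xrightarrow{v \mid y_1} q_1$ and $\xrightarrow{w_2} p_2 \xrightarrow{u \mid x_2} q_2 \xrightarrow{v \mid y_2} q_2$, where $p_1$ and $p_2$ are initial, we have for all $j \geq 0$, $\mathrm{dist}_p(w_1 x_1 y_1^j, w_2 x_2 y_2^j) \leq L$. An S2S satisfies the twinning property (TP) if there exists $L \in \mathbb{N}$ such that any two of its states are $L$-twinned.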

([10]) Let $T$ be a functional S2S. The following assertions are equivalent:

1. there exists an equivalent sequential S2S,

2. $[\![T]\!]$ satisfies the Lipschitz property,

3. $T$ satisfies the twinning property.

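We present the adaptation of these properties to string-to-context transducers. We say that a function $f : A^* \hookrightarrow B^*$ satisfies the contextual Lipschitz property (CLip) if there exists $K \in \mathbb{N}$ such that $\forall u, v \in \mathrm{dom}(f),\ \mathrm{dist}_f(f(u), f(v)) \leq K \cdot \mathrm{dist}_p(u, v)$.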

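d:ctp We consider an S2C $T$ and $L \in \mathbb{N}$. Two states $q_1$ and $q_2$ are said to be $L$-contextually twinned if for any two runs $\xrightarrow{c_1} p_1 \xrightarrow{u \mid d_1} q_1 \xrightarrow{v \mid e_1} q_1$ and $\xrightarrow{c_2} p_2 \xrightarrow{u \mid d_2} q_2 \xrightarrow{v \mid e_2} q_2$, where $p_1$ and $p_2$ are initial, we have for all $j \geq 0$, $\mathrm{dist}_f((c_1 d_1 e_1^j)[\epsilon], (c_2 d_2 e_2^j)[\epsilon]) \leq L$. An S2C satisfies the contextual twinning property (CTP) if there exists $L \in \mathbb{N}$ such that any two of its states are $L$-contextually twinned.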
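The quantity bounded in the contextual twinning property can be explored numerically. The sketch below is illustrative only: it computes, for two given synchronised runs, the factor distance between the words obtained after $j$ iterations of the loops, for $j$ up to a chosen bound. Each run is encoded as a hypothetical triple `(c, d, e)` of `(left, right)` contexts (initial context, path to the loop, loop), applied in that order around the empty word. Inspecting finitely many values of $j$ does not decide the property; it only shows how the distance evolves.

```python
from typing import List, Tuple

Ctx = Tuple[str, str]
Run = Tuple[Ctx, Ctx, Ctx]


def lcf_length(u: str, v: str) -> int:
    """Length of a longest common factor of u and v (quadratic DP)."""
    best = 0
    prev = [0] * (len(v) + 1)
    for i in range(1, len(u) + 1):
        cur = [0] * (len(v) + 1)
        for j in range(1, len(v) + 1):
            if u[i - 1] == v[j - 1]:
                cur[j] = prev[j - 1] + 1
                best = max(best, cur[j])
        prev = cur
    return best


def dist_f(u: str, v: str) -> int:
    return len(u) + len(v) - 2 * lcf_length(u, v)


def pumped_word(run: Run, j: int) -> str:
    """Word produced by c, then d, then j iterations of the loop e."""
    c, d, e = run
    left = e[0] * j + d[0] + c[0]    # later contexts are prepended further left
    right = c[1] + d[1] + e[1] * j   # and appended further right
    return left + right


def ctp_profile(run1: Run, run2: Run, max_j: int) -> List[int]:
    """Factor distances between the two pumped words for j = 0..max_j."""
    return [dist_f(pumped_word(run1, j), pumped_word(run2, j))
            for j in range(max_j + 1)]


E = ("", "")
# Loops producing 'a' on the left versus on the right: same words, distance 0.
print(ctp_profile((E, E, ("a", "")), (E, E, ("", "a")), 3))
# Loops ('a','b') versus ('b','a'): a^j b^j against b^j a^j, distance grows.
print(ctp_profile((E, E, ("a", "b")), (E, E, ("b", "a")), 3))
```

The first pair of loops stays at distance 0 because both produce the word $a^j$, which illustrates how output can move from one side of the context to the other; the second pair yields $a^j b^j$ against $b^j a^j$, whose factor distance $2j$ is unbounded.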

## 4 Main Result


The main result of the paper is the following theorem, which extends to string-to-context transducers the characterisation of sequential transducers amongst functional ones.

t:main Let $T$ be a functional S2C. The following assertions are equivalent:

1. there exists an equivalent sequential string-to-context transducer,

2. $[\![T]\!]$ satisfies the contextual Lipschitz property,

3. $T$ satisfies the contextual twinning property.

###### Proof.

The implications $1 \Rightarrow 2$ and $2 \Rightarrow 3$ are proved in r:det-implies-lip and r:lip-implies-ctp respectively. The implication $3 \Rightarrow 1$ is more involved, and is based on a careful analysis of word combinatorics of loops of string-to-context transducers satisfying the CTP. This analysis is summarised in r:ctp-implies-2-loop and used in sec:construction to describe the construction of an equivalent sequential S2C. ∎

r:det-implies-lip Let $T$ be a functional S2C realizing the function $f$. If there exists an equivalent sequential S2C, then $f$ satisfies the contextual Lipschitz property.

###### Proof.

Let us consider the equivalent sequential S2C. We claim that is context-Lipschitzian with coefficient . Consider two input words in the domain of . If , then the result is trivial. Otherwise, let