1 Introduction
Words and trees are used almost universally in Computer Science, and
logical formalisms are among the most convenient tools for specifying
these objects, or sets thereof.
Automata constitute another class of tools, procedural in nature,
widely used to define languages;
underlying this formalism is a rich algebraic theory, through which
further tools from other areas of Mathematics can be used to better understand
the properties of word and tree languages.
Moreover,
the most significant classes of these languages happen to have
descriptions in several formalisms. For example,
the regular word languages are exactly those that are
recognized by finite automata and monoids, and those that
are definable by monadic second-order formulas with a
unary predicate for each letter and a “left-of” binary positional
predicate.
Similarly, the regular tree languages are simultaneously
those that are definable
by monadic second-order formulas with two
positional predicates (“ancestor-of” and “next-sibling-of”)
and those that are recognized by finite tree automata,
as well as by various sorts of algebraic structures
(e.g. finite term algebras, finite forest algebras, finitary preclones).
In the same vein, the word languages definable with
first-order logical formulas with the “left-of” predicate
have several algebraic and combinatorial descriptions;
see [pi97, sc65, krrh65, rhti89, th82].
In particular, these languages are
precisely those whose syntactic monoid is aperiodic;
thanks to this property,
whether a regular word language is first-order definable
can be determined with a straightforward algorithm.
In the world of tree languages,
however, none of the definitions of
aperiodicity of the
syntactic algebra tried so far has managed
to characterize precisely the
first-order definable languages
[he91, bostwa09],
and the techniques invented to show that
certain important subclasses of these languages
are indeed decidable [bese09, bosest08, bose08, plse11, plse15] did not seem to extend to the whole class.
Forest algebras combine two monoids (horizontal
and vertical) in a way which makes it easy for researchers to
apply techniques from the theory of monoids and word languages.
An encouraging harvest of results has already been obtained with this tool [bowa08, bosest08, bostwa09].
In this paper, we look at a counterpart, in the world of trees and forests, to the description of the
aperiodic monoids as
the variety of monoids generated by iterated block products of semilattices [rhti89].
This is a
description of the first-order definable languages,
developed in terms of the variety of
finitary preclones [eswe10] generated by iterated
block products of preclones that count occurrences
of node labels, regardless of the actual tree
structure.
Intuitively, a block product
works as the
combination of two tree automata
where at every node , the second automaton,
besides the label of , also reads
the current state reached by the first automaton
after reading the subtree rooted
at (“below” )
and the outcome of the
processing, by the first automaton,
of the context of within the tree (“above” ).
The description of first-order definable languages
developed in [hath87, mora03] suggests that
the threshold, period numerical congruences
are a fundamental feature in the combinatorics of first-order definable
languages.
Consistently with this, we use the same kind of counting
and the corresponding quotient, counter monoids
(in these notations,
the Boolean OR and the cyclic group
are respectively and ).
In our formalism,
we denote by the one-dimensional algebra where is the horizontal monoid,
by
the variety of forest algebras generated by
,
by
the variety generated by iterated block
products of algebras from ,
and by the closure of these varieties
under join.
Let
and denote
both the class of forest languages definable by first-order
formulas built with the
“ancestor” positional predicate and the usual
quantifiers only (for ) or the same with the
, , modular quantifiers,
and the varieties generated by their syntactic algebras.
Using the formalism of finitary preclones that they introduced in
[eswe05], Ésik and Weil
established in [eswe10] the correspondences
and
.
We explore two ways of defining from
another algebra, where multicontexts are the underlying objects.
The algebra of mappings , and the
“multivertical monoid” of derived from it
enable us to
define notions of pumping and aperiodicity that generalize two
of the known necessary conditions for membership in
, namely aperiodicity of the vertical monoid and the
“absence of vertical confusion on uniform multicontexts”
defined in [bostwa09].
The extended algebra , where
is the powerset of , makes explicit some properties of that are not directly visible in or .
An example is described in a later section:
this is a language whose syntactic algebra
has aperiodic vertical and multivertical monoids,
but where is divided by the group .
The language is defined with a formula
that, among other things, counts the parity of the length of
certain node-to-leaf paths;
a pair of elements of does precisely this counting.
An algebra lies outside of the variety
if, and only if, there exists
an infinite sequence of sets , one for each , of forests belonging
to different languages recognized by , such that
the elements of
cannot be told apart by any forest algebra in .
This sequence is usually described through an Ehrenfeucht-Fraïssé game.
We call such a sequence a proof of non-membership in .
An Ehrenfeucht-Fraïssé game actually builds a recursive proof,
where each forest of
is built by inserting
copies of the elements of at the ports of
the corresponding
element of a set of multicontexts.
Such a proof can be specified with an infinite sequence of
such sets ;
we denote by
the proposition that states the existence of a
that has the required properties;
RC stands for “recursive construction”.
We prove that
if, and only if, holds
for every ,
that is, every algebra that is not first-order has a recursive proof
of non-membership. Next, we observe that in the existing proofs, the circuit
is either identical to
, in which case each forest of
is built from copies of the same, finite set
of multicontexts (“proof-by-copy”), or
is obtained by pumping a starting set
of multicontexts (“proof-by-pumping”).
The questions of the existence of a
proof-by-copy
and of a restricted form of proof-by-pumping
are both recursively enumerable.
Section 2 contains background
on forests, multicontexts and circuits,
and on forest algebras
and the varieties .
In Section 3,
we define the algebras and ,
and the related notions of
pumping and aperiodicity.
In Section 4, we prove that an algebra is outside
if, and only if,
this can be asserted with
a recursive proof; we then explore the notions of
proof-by-copy and proof-by-pumping.
In Section 5, we
discuss in our formalism
some typical examples of
non-membership proofs.
We conclude with some comments and open questions.
2 Definitions and Background
2.1 Forests, Multicontexts, and Circuits
We consistently work with a finite alphabet
, which we assume to always contain a neutral letter , such that for every forest homomorphism ,
is the identity mapping. Let be another alphabet, disjoint from .
A multicontext over is a sequence of trees in which
a subset of the leaves consists of ports, and where
every non-port node
carries a label .
We denote by the set of all nodes in ,
by the set of its ports and by the set of the non-port nodes.
We work with multicontexts where each port either has a label ,
or has several labels, each
specified as a mapping from to a set that is disjoint from .
A forest is a multicontext without ports;
a context in the usual sense is a multicontext with a unique port that
carries the special label ,
called its port. Throughout the paper,
this port is considered apart from the others.
Given ,
we denote by the multicontext
consisting of all subtrees of rooted at the sons of .
The subtree rooted at , i.e. plus the node ,
is denoted .
The
context of within , with notation ,
is built from and by replacing
with a port.
The ancestors of this port constitute the trunk of the context.
If we deal with a set of multicontexts instead of an individual , we
use the notations , , , etc.
The sets of all forests and of all contexts over are
respectively denoted
and . We use the notations and ,
respectively,
for the set of all multicontexts over and for the
set of all multicontexts with a port
(the contexts-in-multicontexts, so to speak).
We use the
standard representation for individual forests or multicontexts,
where nodes are listed in preorder and where
concatenation and represent the father-son relation and
horizontal addition, respectively. For example,
is a tree with a root labelled and two sons labelled and ,
while
is a forest of two trees, where nodes and are roots,
and nodes and are leaves.
Inserting in a context consists in replacing the port of
with a copy of ; the resulting forest is
denoted , or .
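As an illustration only, insertion of a forest in a context can be sketched in Python. The encoding is our own assumption and is not part of the formalism of the paper: a forest is a tuple of (label, children) pairs, the hypothetical marker PORT stands for the port of a context, and the helper show prints terms with explicit parentheses for the father-son relation and + for horizontal addition.

```python
PORT = "*"  # hypothetical marker for the unique port of a context

def insert(context, forest):
    """Return the forest obtained by replacing the port of `context` with `forest`."""
    result = []
    for label, children in context:
        if label == PORT:
            # The inserted trees are spliced among the siblings of the port.
            result.extend(forest)
        else:
            result.append((label, insert(children, forest)))
    return tuple(result)

def show(forest):
    """Print a forest in preorder, with '+' between sibling trees."""
    return " + ".join(
        label + ("(" + show(children) + ")" if children else "")
        for label, children in forest
    )

# Context a(*) + b and forest c + d, both chosen arbitrarily for the example.
context = (("a", ((PORT, ()),)), ("b", ()))
forest = (("c", ()), ("d", ()))
print(show(insert(context, forest)))  # a(c + d) + b
```

Splicing, rather than nesting, reflects the fact that a port may receive a forest of several trees, which then become siblings of the nodes next to the port.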
Insertion in multicontexts is done
here either on a wholesale basis, i.e. something
is inserted at every port,
or
on a selective basis, when insertion occurs at a
prespecified set of ports.
The latter method is defined in Section 3;
the former is associated with circuits and the construction of witnesses,
as follows.
Let , and be three sets.
A circuit over
is a set with
an element for every ;
this component is a multicontext .
We can regard as having an input wire
for every , an output wire for every ,
and is the result of
unraveling into a tree all those nodes of from which
the output wire is accessible.
A set of forests over
is defined similarly, with an element
for every . The insertion of in
a circuit over consists in
inserting a copy of the forest
at every ;
the result is a set of forests over ,
denoted .
If and are circuits over
then inserting in
builds a circuit over .
It can be verified, using standard methods, that this operation is associative.
2.2 Forest algebras
The reader is assumed to be familiar with the notions of
semigroups and monoids, and their relations with regular languages,
word congruences and monoid homomorphisms (see [pi84, pi97]).
Two types of notations are used for
the monoids discussed in the article. There is an additive, or “horizontal” notation where
the identity and operation are denoted and , respectively, although this in no way
implies that the latter is commutative. In the multiplicative, or “vertical” notation, the neutral element is denoted
and the operation is written with or by concatenation of the arguments.
A transformation of a set is a mapping , i.e. an element
of the monoid .
A translation in a monoid (with the additive notation) is a mapping
, where ,
defined by . If is commutative, then
the translations are of the form .
The set with the composition of functions
is the translation monoid of .
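The following Python sketch enumerates the translations of a small finite monoid. It rests on an assumption that we state explicitly: a translation is taken to be a two-sided map x -> u + x + v, which reduces to x -> x + u in the commutative case. The function name translation_monoid and the two example monoids are ours, chosen for illustration.

```python
from itertools import product

def translation_monoid(elements, add):
    """Return every map x -> u + x + v as a tuple of images, indexed like `elements`."""
    return {
        tuple(add(add(u, x), v) for x in elements)
        for u, v in product(elements, repeat=2)
    }

# Commutative case: the cyclic group Z_3; each translation is x -> x + u.
Z3 = (0, 1, 2)
cyclic = translation_monoid(Z3, lambda x, y: (x + y) % 3)

# Non-commutative case: identity 0 plus two left zeros 1 and 2
# (x + y = x unless x = 0, in which case x + y = y).
LZ = (0, 1, 2)
left_zero = translation_monoid(LZ, lambda x, y: y if x == 0 else x)

print(len(cyclic), len(left_zero))  # 3 5
```

In the second example there are more translations than monoid elements, which shows why, in the non-commutative case, both the left and the right summand have to be taken into account.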
Definition 2.1
A forest algebra is a pair where is a monoid and is a submonoid of which contains .
Monoids and are the
horizontal and vertical monoids of , respectively.
Because is a submonoid of , its action on
is faithful.
Forest algebras were introduced in [bowa08] as pairs of abstract monoids;
in that case, faithfulness has to be specified in the definition.
A forest algebra homomorphism
from to
is a pair of mappings where
and are monoid homomorphisms
and , respectively, and such that
for every .
The free forest algebra over is
; since it is generated from
,
a homomorphism
is completely specified once and every , , are known.
A forest congruence in is a pair of
equivalence relations, both denoted by , such that in iff for every context ,
and
in iff for every forest .
A congruence refines another congruence
over the same domain,
when for all .
A homomorphism defines its nuclear congruence:
,
and conversely a congruence defines a homomorphism from to .
A set is
recognized by if there exist
a homomorphism
and a subset
such that for all ,
. A context language
is recognized in the same way, with an accepting set .
The syntactic congruences of these languages are refined by
.
A variety of forest algebras is a class of finite forest algebras closed under finite direct
product and division.
Given forest algebras
and , we say that is a subalgebra of iff and ,
and that it divides , with notation , if it is the homomorphic image of a subalgebra
of .
A variety of forest languages is formally defined
as a mapping such that, for every alphabet ,
is closed under finite Boolean operations, inverse homomorphism of free algebras
and context quotients.
With a language and , the context quotient of by is the
set ; a
forest algebra which recognizes also recognizes .
The lattices of varieties of forest algebras and of varieties of forest languages are isomorphic [bostwa12].
Let be a forest algebra and let .
An element is accessible from
when for some .
A set is
strongly connected when its elements are mutually accessible;
a strongly connected component of is a subset that is maximal for this property.
Let be such a set: we define from it the set
of all elements from which is accessible, and
its complement , which is an ideal,
that is,
a subset of closed under the
action of every element of .
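Accessibility, strongly connected components and the associated ideal can be computed directly on a small example. The Python sketch below is only an illustration under our own encoding: the horizontal monoid is range(n), each element of the vertical monoid is a tuple v with v[h] the image of h, and all function names are hypothetical. Since the vertical monoid contains the identity and is closed under composition, accessibility is reflexive and transitive, so a single application of an element suffices.

```python
def accessible_from(h, V):
    """Elements of the form v(h) for some v in the vertical monoid V."""
    return {v[h] for v in V}

def strongly_connected_components(H, V):
    """Maximal sets of mutually accessible elements of H."""
    reach = {h: accessible_from(h, V) for h in H}
    return {frozenset(g for g in reach[h] if h in reach[g]) for h in H}

def complement_ideal(component, H, V):
    """Elements of H from which no element of `component` is accessible."""
    return {h for h in H if not (accessible_from(h, V) & component)}

# Example 1: H = ({0, 1, 2}, max), V = its translations x -> max(x, u).
H = (0, 1, 2)
V_max = {(0, 1, 2), (1, 1, 2), (2, 2, 2)}
# Example 2: H = Z_3, V = its translations x -> x + u mod 3.
V_cyclic = {(0, 1, 2), (1, 2, 0), (2, 0, 1)}

components_max = strongly_connected_components(H, V_max)
components_cyclic = strongly_connected_components(H, V_cyclic)
ideal = complement_ideal(frozenset({1}), H, V_max)
```

In the first example every component is a singleton and the ideal associated with the component {1} is {2}; in the second example the whole of H forms a single component.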
Let .
The leaf-completion of a multicontext through
a mapping
is the forest
,
obtained by
labeling every port with .
Consistently with this,
the leaf-extension of
a homomorphism
to
is built by defining ,
for every and one-node tree
with label .
Then
is
the image by of the leaf-completion of through
.
2.3 Block product congruences
It is known that the equivalence relation over where every class consists of all forests that model the same set of formulas of quantifier depth , is a forest congruence. A generalized version of this congruence is , where are integers, defined as follows:

it is built around the threshold, period counting congruence over , defined by
the quotient monoid is denoted ;

given , we have if, and only if, for every , the number of nodes with label in and in are congruent under ; the quotient algebra is denoted ; the corresponding surjective homomorphism is ;

for , given that and the quotient algebra are already known, we define a relabeling operation which consists in replacing, at every node of , the label with the triple
this defines the relabeling alphabet ; the same is done in a context ; however, the new label of is different depending on whether is on the trunk, so that is a context over , where and are disjoint copies of ;

for and , we have if, and only if, and .
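The base level of this construction can be sketched in Python. We assume the usual convention for counting with threshold t and period q: two integers are congruent when they are equal, or when both are at least t and they are congruent modulo q. The forest encoding (a tuple of (label, children) pairs) and the function names are our own and serve only as an illustration.

```python
from collections import Counter

def reduce_count(n, t, q):
    """Canonical representative of n: exact below threshold t, modulo q from t on."""
    return n if n < t else t + (n - t) % q

def count_labels(forest):
    """Number of nodes carrying each label, regardless of the tree structure."""
    counts = Counter()
    stack = list(forest)
    while stack:
        label, children = stack.pop()
        counts[label] += 1
        stack.extend(children)
    return counts

def equivalent(s1, s2, t, q):
    """True when, for every label, the node counts of s1 and s2 are congruent."""
    c1, c2 = count_labels(s1), count_labels(s2)
    return all(
        reduce_count(c1[a], t, q) == reduce_count(c2[a], t, q)
        for a in set(c1) | set(c2)
    )

# s1 = a(b + b) has one a and two b; s2 = b(a(b)) + b has one a and three b.
s1 = (("a", (("b", ()), ("b", ()))),)
s2 = (("b", (("a", (("b", ()),)),)), ("b", ()))
print(equivalent(s1, s2, 2, 1), equivalent(s1, s2, 3, 1))  # True False
```

With t = 1 and q = 1 the counter only records whether a label occurs, which is the Boolean OR; with t = 0 it counts modulo q, which is the cyclic group of order q.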
Example:
we have and
, which illustrates
the distinction between trunk and off-trunk nodes.
The quotient algebra is isomorphic
to the one-dimensional algebra .
A one-dimensional forest algebra
is a pair such that for every homomorphism
and every , there exists such that
. (Footnote 1: Such algebras are also called flat algebras in previous works on
the topic: the homomorphic image of a forest is the image in a monoid
of a “flattened”,
“one-dimensional” version of the forest. The wording is also a
reference to the notion
that an
algebra recognizes
forest languages that are “more two-dimensional” than
those recognized by .) In such an algebra,
the homomorphic image of a forest is independent of its structure, that is, the algebra only
considers the string of its node labels, given in a predetermined order (e.g. in preorder).
Therefore, associates to a monoid homomorphism
, such that .
We denote by the (unique)
one-dimensional algebra built from .
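The fact that a one-dimensional algebra only sees the preorder string of node labels can be illustrated with a short Python sketch. The encoding of forests as tuples of (label, children) pairs, the names preorder_labels and flat_image, and the choice of string concatenation as the monoid are all assumptions made for this example.

```python
def preorder_labels(forest):
    """Yield the node labels of a forest in preorder."""
    for label, children in forest:
        yield label
        yield from preorder_labels(children)

def flat_image(forest, letter_image, op, identity):
    """Image of a forest: the product, in a monoid, of the images of its labels."""
    value = identity
    for a in preorder_labels(forest):
        value = op(value, letter_image[a])
    return value

# A non-commutative monoid for the example: strings under concatenation.
letters = {"a": "a", "b": "b", "c": "c"}
concat = lambda x, y: x + y

tree_1 = (("a", (("b", ()), ("c", ()))),)    # a(b + c)
tree_2 = (("a", (("b", (("c", ()),)),)),)    # a(b(c))
forest_3 = (("a", (("c", ()),)), ("b", ()))  # a(c) + b

print(flat_image(tree_1, letters, concat, ""))    # abc
print(flat_image(tree_2, letters, concat, ""))    # abc
print(flat_image(forest_3, letters, concat, ""))  # acb
```

The first two forests have different shapes but the same preorder string, hence the same image; the third one differs only because its labels appear in another order.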
The congruences can be defined algebraically, as follows.
Let
and
denote respectively the variety of monoids generated by
and the variety of forest algebras generated by the algebras
where .
Then for every language ,
its syntactic forest algebra belongs to
if, and only if, refines its syntactic congruence,
or equivalently, iff divides .
Next, every algebra is a block product
, with
.
We use to denote the variety
generated by block products of the form
with .
Finally, .
We will make abundant use of the following.
Proposition 2.2
The following statements on a finite-index congruence over are equivalent: ; ; the congruence refines .
Let denote the variety of all forest languages
definable with first-order logic formulas with the and
quantifiers and the ‘ancestor’ positional predicate; for ,
let denote the variety defined in terms of the same
sort of formulas, where now the , , modular quantifiers are
also allowed.
It was proved in
[eswe10] that the syntactic preclones of the languages in
generate the same variety as the iterated block products of
preclones defined in terms of counting under threshold one (i.e. the monoid );
adding to the generating preclones those
defined with counting under the congruence
yields a characterization for .
It can be verified that these equivalences
translate into
and
.
Remark. Actually,
, where
is the Boolean OR monoid, and
similarly
,
so that working in terms of nontrivial thresholds is not mandatory.
However, doing so makes it possible to follow
more closely the counting-under-threshold that seems to be inherent to the
construction of proofs of non-membership in ,
and is reminiscent of the description of the
first-order definable forest
languages developed in [hath87, mora03].
Note that a characterization
also exists,
where is the cyclic group of order ;
we put aside this special case in the current version of this paper.
3 Algebras for Multicontexts
Forest algebras were designed as tools to handle trees, forests, and contexts over . Dealing with multicontexts over as we do in this article demands that a suitable algebraic structure be developed to describe how a forest algebra works on them. A first approach consists, given a forest algebra , in regarding a multicontext as a specification for a multivariate mapping from to . This defines the algebra of mappings ; it is used to define the notion of pumping, which underlies the construction of certain Ehrenfeucht-Fraïssé games, and to associate to a threshold and a period that are consistent with those used so far in the literature. A second approach consists in considering that a port label specifies which elements of are allowed as inputs at that port. This leads to the definition of the extended algebra , which we use to generalize once more the notions of threshold, period, and aperiodicity. Necessary conditions for first-order definability that supersede some of the existing ones are defined from the latter.
3.1 Multicontexts
We use both and to denote the pair
consisting of a multicontext , where
every interior node carries a label ,
and a port labeling .
When this pair is equipped with a second port labeling
, we denote the resulting tuple
when is fixed and the emphasis is on as a whole,
and when it is understood that
is fixed and is one of several possible
second port labelings.
Next, instead of labeling a port directly with a horizontal
monoid element, as it was done in the previous section,
we take and in sets and ,
respectively, such that
, and are pairwise disjoint;
when dealing with specific algebras,
leaf extensions of the appropriate homomorphisms are
then defined on and .
Note that we are ultimately interested in
the recognition of languages over ,
so that and are artefacts used in this
process and the ultimate results should not depend on them.
The tuples over and ,
along with the contexts defined from them
by replacing a leaf
with a special port ,
constitute a forest algebra
;
those over and constitute
;
the reader can verify that both
are free algebras.
Besides the insertion in a context, i.e. the monoid operations in
, and ,
we define
an operation that does multiple, simultaneous insertions in a multicontext from .
Given sets of multicontexts and and ,
with for every ,
we denote by
the set of all multicontexts that can be built by taking an element ,
inserting at each port a multicontext
and replacing the label of with the neutral letter ;
with this new label, has no effect on the image by a homomorphism
of , while it remains available to be used in reasonings
and proofs.
No other label is modified, so that in particular
if is a copy of and ,
then the counterpart of in
satisfies .
Let and
let .
Then with ,
we use the notations and
for and ,
respectively. Given a congruence ,
we say that is stable
when every pair of ports satisfies
.
Next,
let and let be the set of all
ports with label in .
With we define the set
obtained by pumping times the set at , by:
and
.
This definition of pumping is consistent
with the definition of the
vertical monoid of
(where both and are singletons),
with
the
“vertical confusion” defined in [bostwa09] (where and are singletons),
and with the
“vertical confusion on uniform multicontexts”
also discussed in [bostwa09] (where and are singletons and
the ports of are indistinguishable
by any congruence).
3.2 The algebra of mappings
Let be a finite algebra and a surjective homomorphism. We look for a reasonable way of extending to , besides the one that consists in defining a leaf extension of to . With this in mind, we define the algebra of mappings of , which we denote . To do so, we show how to translate a congruence in into a congruence in , and vice versa. Define a mapping from to , that turns into a forest by replacing all port labels with the neutral letter ; define from to in exactly the same way; both mappings constitute a surjective homomorphism from to . Thus, given a forest congruence over , a forest congruence is defined in a natural way over . In the other direction, let be finite and let
be simultaneously regarded as a vector
and as a mapping . Given , define by . From and , we build a forest over by replacing in every port label with . We build from in the same way; the port of retains its label. We extend to by defining for every , so that in fact is one of the leaf extensions for that can be built on , and define a mapping by ; similarly, we define by . Next, we define and , the sets of all mappings and , respectively, and given and , the operations and vertical action , , and , so that the pair constitutes a forest algebra and defined by and is a homomorphism. (Footnote 2: A notation that mentions , e.g. , would actually be more accurate, if more cumbersome.) Let be the nuclear congruence of : we define from it an equivalence between nodes, also denoted . Let where is a set of multicontexts closed under : iff and and ;
nodes equivalent under this relation “cannot be told apart by ”. We also write and in order to specify where the nodes are located. Given and we define the mapping
Since is closed under , is the same for every and we can use the notation . Then with , we observe
Therefore, every operation satisfies the compatibility property for [busa00, Definition 5.1].
Proposition 3.1
Let and be built from and . Then the nuclear congruence of is refined by the congruence built recursively over , as in Section 2.3. Hence, .
Proof. By induction on . Recall that a given is regarded both as a mapping and as a vector . For the case, we associate to every a vector with components in and labels in , where , , and , , are respectively the number of nodes with label and the number of ports with . The algebra is isomorphic to , so that, with some abuse of notation, we can write , and given a mapping , the image of can be represented as a vector . Within , there is an equivalence class under for every possible value of , i.e. every vector in . Then given ,
From there, if , then . With , the induction hypothesis states that if , then for every mapping , the leaf completions of and through satisfy . Assume that . Two nodes or ports of and of receive the same label in the versions of and relabeled according to iff , and . By the induction hypothesis, the last two items imply, for every :
which means that and receive the same label in the versions of and relabeled according to , that is, .
The algebras and are not isomorphic, however. To see this, let and , so that , and , so that is isomorphic to , where, and finally . With and , we have , while is the constant function that maps onto .
3.3 Equivalence under pumping
We use the algebra of mappings to define a “threshold , period equivalence under pumping” congruence within . First, let denote the relation where two forests are equivalent iff they are the same up to horizontal permutations within a sum (we might as well use instead of ). Then we consider a special case of a multicontext where any two ports satisfy , that is, their contexts within are indistinguishable. Then the stable sets of ports are exactly the sets , ; we say that is suitable for pumping. Pumping the singleton along a stable set of ports , we obtain for each a singleton . (Footnote 3: Note that this formalism also covers the case where pumping is done “horizontally”, i.e. where we are dealing with a multicontext of the form and where .) We now define , for every . First, coincides with . Next, the forest congruence is generated by the pairs and the corresponding context congruence by the pairs , where , and . Then recursively for , given a set of multicontexts closed under and a set that is stable, is the congruence generated by the pairs and where , , , and . We denote by the (infinite) quotient algebra and by the corresponding surjective homomorphism.
Proposition 3.2
Every congruence of finite index over is refined by a congruence .
Proof. Let be finite. Let and let be suitable for pumping. Assume that ; we pump the singleton along . Given we define and the mapping
Observe that and in general,
The mapping generates a subsemigroup
of ;
from the threshold and period of
we obtain integers and
such that, for every combination of and ,
we have
as soon as
.
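The threshold and period of the subsemigroup generated by a single mapping can be computed by iterating the mapping until a power repeats. The Python sketch below is only an illustration: mappings on a finite set {0, ..., n-1} are encoded as tuples, and the names compose and threshold_and_period are ours, not taken from the paper.

```python
def compose(f, g):
    """Return the map x -> f(g(x)); maps on range(n) are stored as tuples."""
    return tuple(f[g[x]] for x in range(len(g)))

def threshold_and_period(f):
    """Least t >= 1 and p >= 1 such that f^(t+p) = f^t."""
    seen = {}        # power of f -> exponent at which it first appeared
    power, k = f, 1
    while power not in seen:
        seen[power] = k
        power, k = compose(f, power), k + 1
    t = seen[power]  # f^k = f^t with t < k
    return t, k - t

# f collapses everything onto 2 after two steps: period 1 (aperiodic behaviour).
print(threshold_and_period((1, 2, 2)))  # (2, 1)
# g swaps 0 and 1: its powers alternate, so the period is 2.
print(threshold_and_period((1, 0, 2)))  # (1, 2)
```

A period equal to 1 for every generated subsemigroup is the usual meaning of aperiodicity; a period greater than 1 reveals a nontrivial cyclic group among the powers.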
We prove by induction on that, with these and ,
for every