1 Introduction
VC dimension.
VC dimension is a widely used parameter measuring the complexity of set systems. Since its introduction in the 1970s in the seminal work of Vapnik and Chervonenkis [Vapnik71vcdim], it has become a fundamental notion in statistical learning theory. VC dimension has also found multiple applications in combinatorics and in algorithm design, particularly in the area of approximation algorithms.
The original definition states that the VC dimension of a set system $(U,\mathcal{F})$, where $U$ is the universe and $\mathcal{F}$ is the family of sets, is equal to the supremum of cardinalities of subsets of $U$ that are shattered by $\mathcal{F}$. Here, a subset $X\subseteq U$ is shattered by $\mathcal{F}$ if the restriction of $\mathcal{F}$ to $X$, defined as the set system $\{F\cap X : F\in\mathcal{F}\}$, is the whole powerset of $X$.
In many applications, the boundedness of the VC dimension is exploited mainly through the Sauer–Shelah Lemma [Sauer72sauershelahlemma, Shelah72sauershelahlemma], which states that a set system over a universe of size $n$ and of VC dimension $d$ contains only $\mathcal{O}(n^d)$ different sets. As a bound on VC dimension is inherited under restrictions, this implies that for every subset $A$ of the universe, the cardinality of the restriction of the set system to $A$ is $\mathcal{O}(|A|^d)$. This polynomial bound on the sizes of restrictions distinguishes set systems with bounded VC dimension from arbitrary set systems, where the exponential growth is witnessed by larger and larger shattered sets.
However, for many set systems appearing in various settings, the bound provided by the Sauer–Shelah Lemma is far from optimal: the degree of the best possible polynomial bound is much lower than the VC dimension. This motivates introducing a more refined notion of the VC density of a set system, which is (slightly informally) defined as the lowest possible degree of a polynomial bounding the cardinalities of its restrictions. See Section 2.1 for a formal definition. The Sauer–Shelah Lemma then implies that the VC density is never larger than the VC dimension, but in fact it can be much lower. This distinction is particularly important for applications in approximation algorithms, where having VC density equal to one (which corresponds to a linear bound in the Sauer–Shelah Lemma) implies the existence of $\varepsilon$-nets of size $\mathcal{O}(\frac{1}{\varepsilon})$ [Chan12wscapprox], while a superlinear bound implied by the boundedness of the VC dimension gives only $\varepsilon$-nets of size $\mathcal{O}(\frac{1}{\varepsilon}\log\frac{1}{\varepsilon})$ (see e.g. [Mustafa17epsnets]). This difference seems innocent at first glance, but shaving off the logarithmic factor actually corresponds to the possibility of designing constant-factor approximation algorithms [Chan12wscapprox].
Defining set systems in logic.
In this work we study set systems definable in different variants of logic over various classes of graphs. We concentrate on finding a precise understanding of the connection between the expressive power of the considered logic $\mathcal{L}$ and the structural properties of the investigated class of graphs $\mathcal{C}$ that are necessary and sufficient for the following assertion to hold: $\mathcal{L}$-formulas can define only simple set systems in graphs from $\mathcal{C}$, where simplicity is measured in terms of the VC parameters.
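To make this idea precise, we need a way to define a set system from a graph using a formula. Let $\varphi(\bar x,\bar y)$ be a formula of some logic $\mathcal{L}$ (to be made precise later) in the vocabulary of graphs, where $\bar x,\bar y$ are tuples of free vertex variables. Note here that the partition of free variables into $\bar x$ and $\bar y$ is fixed; in this case we say that $\varphi(\bar x,\bar y)$ is a partitioned formula. Then $\varphi$ defines in a graph $G=(V,E)$ the set system of $\varphi$-definable sets:
\[ S^{\varphi}(G)=\left(V^{\bar x},\ \left\{\{\bar u\in V^{\bar x} : G\models\varphi(\bar u,\bar v)\} : \bar v\in V^{\bar y}\right\}\right). \]
Here, $V^{\bar x}$ and $V^{\bar y}$ denote the sets of evaluations of variables of $\bar x$ and $\bar y$ in $V$, respectively. In other words, every $\bar v\in V^{\bar y}$ defines the set consisting of all those $\bar u\in V^{\bar x}$ for which $\varphi(\bar u,\bar v)$ is true in $G$. Then $S^{\varphi}(G)$ is a set system over universe $V^{\bar x}$ that comprises all subsets of $V^{\bar x}$ definable in this way.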
For example, if $|\bar x|=|\bar y|=1$ and $\varphi(x,y)$ verifies whether the distance between $x$ and $y$ is at most $r$, for some $r\in\mathbb{N}$, then $S^{\varphi}(G)$ is the set system whose universe is the vertex set of $G$, while the set family comprises all balls of radius $r$ in $G$.
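The ball example can be made concrete with a few lines of code. The sketch below is only an illustration: the adjacency-dictionary representation and the names `ball` and `ball_system` are assumptions introduced here, and the distance formula is replaced by a breadth-first search rather than evaluated as a logical formula.

```python
from collections import deque

def ball(adj, center, radius):
    """Vertices at distance at most `radius` from `center`, found by BFS."""
    dist = {center: 0}
    queue = deque([center])
    while queue:
        u = queue.popleft()
        if dist[u] == radius:
            continue  # do not explore beyond the radius
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return frozenset(dist)

def ball_system(adj, radius):
    """Family of all balls of the given radius, one per parameter vertex."""
    return {ball(adj, v, radius) for v in adj}

# Path on five vertices: 0 - 1 - 2 - 3 - 4 (hypothetical example graph).
path = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
family = ball_system(path, 1)
```

Each parameter vertex contributes one set to the family, so the family never has more members than the graph has vertices.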
The situation when the considered logic is the First Order logic $\mathsf{FO}$ was recently studied by Pilipczuk, Siebertz, and Toruńczyk [Pilipczuk18typesinsparsegraphs]. They showed that the simplicity of $\mathsf{FO}$-definable set systems in graphs is tightly connected to their sparseness, as explained formally next. On one hand, if $\mathcal{C}$ is a nowhere dense class of graphs, then for every partitioned $\mathsf{FO}$ formula $\varphi(\bar x,\bar y)$, $\varphi$ defines in graphs from $\mathcal{C}$ set systems of VC density at most $|\bar y|$. (Nowhere denseness is a notion of uniform sparseness in graphs. As it is not directly related to our investigations, we refrain from giving a formal definition, and refer the interested reader to the discussion in [Pilipczuk18typesinsparsegraphs] instead.) On the other hand, if $\mathcal{C}$ is not nowhere dense, but is closed under taking subgraphs, then there exists a partitioned $\mathsf{FO}$ formula that defines in graphs from $\mathcal{C}$ set systems of arbitrarily high VC dimension, hence also arbitrarily high VC density. Note that one cannot expect lower VC density than $|\bar y|$ for any nontrivial logic $\mathcal{L}$ and class $\mathcal{C}$, because already the very simple formula $\alpha(x,\bar y)=\bigvee_{i=1}^{|\bar y|}(x=y_i)$ defines set systems of VC density $|\bar y|$ in edgeless graphs. Thus, in some sense the result stated above provides a sharp dichotomy.
In this work we are interested in similar dichotomy statements for more expressive variants of logic on graphs, namely $\mathsf{MSO}_1$ and $\mathsf{MSO}_2$. Recall that $\mathsf{MSO}_1$ on graphs extends $\mathsf{FO}$ by allowing quantification over subsets of vertices, while in $\mathsf{MSO}_2$ one can in addition quantify over subsets of edges. This setting has been investigated by Grohe and Turán [Grohe04msotreevcdim]. They proved that if graphs from a graph class $\mathcal{C}$ have uniformly bounded cliquewidth (i.e. there is a constant that is an upper bound on the cliquewidth of every member of $\mathcal{C}$), then every $\mathsf{MSO}_1$ formula $\varphi(\bar x,\bar y)$ defines in graphs from $\mathcal{C}$ set systems with uniformly bounded VC dimension. They also gave a somewhat complementary lower bound showing that if $\mathcal{C}$ contains graphs of arbitrarily high treewidth and is closed under taking subgraphs, then there exists a fixed formula that defines in graphs from $\mathcal{C}$ set systems with unbounded VC dimension.
Our contribution.
We improve the results of Grohe and Turán [Grohe04msotreevcdim] in two aspects. First, we prove tight upper bounds on the VC density of the considered set systems, and not only on the VC dimension. Second, we clarify the dichotomy statements by showing that the boundedness of the VC parameters for set systems definable in $\mathsf{MSO}_1$ is tightly connected to the boundedness of cliquewidth, and there is a similar connection between the complexity of set systems definable in $\mathsf{MSO}_2$ and the boundedness of treewidth. Formal statements follow.
For the upper bounds, our results are captured by the following theorem. Here, $\mathsf{CMSO}_1$ and $\mathsf{CMSO}_2$ are extensions of $\mathsf{MSO}_1$ and $\mathsf{MSO}_2$, respectively, by modular predicates of the form $|X|\equiv a \bmod p$, where $X$ is a monadic variable and $a,p$ are integers. Also, $\mathsf{C_2MSO}_1$ is a restriction of $\mathsf{CMSO}_1$ where we allow only modular predicates with $p=2$, that is, checking the parity of the cardinality of a set.
Theorem 1.
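Let $\mathcal{C}$ be a class of graphs and $\varphi(\bar x,\bar y)$ be a partitioned formula. Additionally, assume that one of the following assertions holds:

(i) $\mathcal{C}$ has uniformly bounded cliquewidth and $\varphi(\bar x,\bar y)$ is a $\mathsf{CMSO}_1$-formula; or

(ii) $\mathcal{C}$ has uniformly bounded treewidth and $\varphi(\bar x,\bar y)$ is a $\mathsf{CMSO}_2$-formula.

Then there is a constant $c\in\mathbb{N}$ such that for every graph $G\in\mathcal{C}$ and nonempty vertex subset $A\subseteq V(G)$,
\[ |S^{\varphi}(G)[A]| \le c\cdot|A|^{|\bar y|}. \]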
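In particular, this implies that for a partitioned formula $\varphi(\bar x,\bar y)$, the class of set systems $S^{\varphi}(\mathcal{C})$ has VC density at most $|\bar y|$ whenever $\mathcal{C}$ has uniformly bounded cliquewidth and $\varphi$ is a $\mathsf{CMSO}_1$-formula, or $\mathcal{C}$ has uniformly bounded treewidth and $\varphi$ is a $\mathsf{CMSO}_2$-formula.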
Note that theorem 1 provides much better bounds on the cardinalities of restrictions of the considered set systems than bounding the VC dimension and using the Sauer–Shelah Lemma, as was done in [Grohe04msotreevcdim]. In fact, as argued in [Grohe04msotreevcdim, Theorem 12], even in the case of defining set systems over words, the VC dimension can be tower-exponentially high with respect to the size of the formula. In contrast, theorem 1 implies that the VC density will actually be much lower: at most $|\bar y|$. This improvement has an impact on some asymptotic bounds in learning-theoretical corollaries discussed by Grohe and Turán, see e.g. [Grohe04msotreevcdim, Theorem 1].
For lower bounds, we work with labelled graphs. For a finite label set $\Lambda$, a $\Lambda$-v-labelled graph is a graph whose vertices are labelled using labels from $\Lambda$, while in a $\Lambda$-ve-labelled graph we label both the vertices and the edges using $\Lambda$. For a graph class $\mathcal{C}$, by $\mathcal{C}^{\Lambda,1}$ we denote the class of all $\Lambda$-v-labelled graphs whose underlying unlabelled graphs belong to $\mathcal{C}$, while $\mathcal{C}^{\Lambda,2}$ is defined analogously for $\Lambda$-ve-labelled graphs. The discussed variants of $\mathsf{MSO}$ work over labelled graphs in the obvious way.
Theorem 2.
There exists a finite label set $\Lambda$ such that the following holds. Let $\mathcal{C}$ be a class of graphs and $\mathcal{L}$ be a logic such that either

(i) $\mathcal{C}$ contains graphs of arbitrarily large cliquewidth and $\mathcal{L}=\mathsf{C_2MSO}_1$; or

(ii) $\mathcal{C}$ contains graphs of arbitrarily large treewidth and $\mathcal{L}=\mathsf{MSO}_2$.

Then there exists a partitioned $\mathcal{L}$-formula $\varphi(x,y)$ in the vocabulary of graphs from $\mathcal{C}^{\Lambda,t}$, where $t=1$ if (i) holds and $t=2$ if (ii) holds, such that the family
\[ S^{\varphi}(\mathcal{C}^{\Lambda,t}) \]
contains set systems with arbitrarily high VC dimension.
Thus, the combination of theorem 1 and theorem 2 provides a tight understanding of the usual connections between $\mathsf{MSO}_1$ and cliquewidth, and between $\mathsf{MSO}_2$ and treewidth, also in the setting of definable set systems. We remark that the second connection was essentially observed by Grohe and Turán in [Grohe04msotreevcdim, Corollary 20], whereas the first seems new, but follows from a very similar argument.
As argued by Grohe and Turán in [Grohe04msotreevcdim, Example 21], some mild technical conditions, like closedness under labelings with a finite label set, are necessary for a result like theorem 2 to hold. Indeed, the class of subdivided complete graphs has unbounded treewidth and cliquewidth, yet $\mathsf{MSO}_1$- and $\mathsf{MSO}_2$-formulas can only define set systems of bounded VC dimension on this class, due to symmetry arguments. Also, the fact that in the case of unbounded cliquewidth we need to rely on logic $\mathsf{C_2MSO}_1$ instead of plain $\mathsf{MSO}_1$ is connected to the longstanding conjecture of Seese [Seese91msotwundecidable] about decidability of $\mathsf{MSO}_1$ in classes of graphs.
2 Preliminaries
2.1 Vapnik–Chervonenkis parameters
In this section we briefly recall the main definitions related to the Vapnik–Chervonenkis parameters. We only provide a terse summary of the relevant concepts and results, and refer to the work of Mustafa and Varadarajan [Mustafa17epsnets] for a broader context.
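A set system is a pair $\mathcal{S}=(U,\mathcal{F})$, where $U$ is the universe or ground set, while $\mathcal{F}$ is a family of subsets of $U$. While a set system is formally defined as the pair $(U,\mathcal{F})$, we will often use that term with a family $\mathcal{F}$ alone, and then $U$ is implicitly taken to be $\bigcup\mathcal{F}$. The size of a set system is $|\mathcal{F}|$.
For a set system $\mathcal{S}=(U,\mathcal{F})$ and $A\subseteq U$, the restriction of $\mathcal{S}$ to $A$ is the set system $\mathcal{S}[A]=(A,\mathcal{F}\cap A)$, where $\mathcal{F}\cap A=\{F\cap A : F\in\mathcal{F}\}$. We say that $A$ is shattered by $\mathcal{S}$ if $\mathcal{F}\cap A$ is the whole powerset of $A$. Then the VC dimension of $\mathcal{S}$ is the supremum of cardinalities of sets shattered by $\mathcal{S}$.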
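For small finite set systems, restriction, shattering and VC dimension can be checked by brute force. The following is a minimal sketch under the assumption that the family is nonempty and given as a collection of Python sets; the function names are introduced here only for illustration.

```python
from itertools import combinations

def restriction(family, subset):
    """Traces of the family on `subset`: all intersections F & subset."""
    subset = frozenset(subset)
    return {frozenset(f) & subset for f in family}

def is_shattered(family, subset):
    """A set is shattered when every one of its subsets appears as a trace."""
    return len(restriction(family, subset)) == 2 ** len(subset)

def vc_dimension(universe, family):
    """Largest size of a shattered subset (exhaustive search, tiny inputs only)."""
    best = 0
    for k in range(1, len(universe) + 1):
        if any(is_shattered(family, c) for c in combinations(universe, k)):
            best = k
        else:
            break  # subsets of shattered sets are shattered, so we can stop
    return best

# Hypothetical example: all nonempty intervals of {0, ..., 4}.
intervals = {frozenset(range(i, j + 1)) for i in range(5) for j in range(i, 5)}
```

In this example every pair of points is shattered, while no triple is, because an interval containing the two outer points must contain the middle one.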
As we are mostly concerned with the asymptotic behavior of restrictions of set systems, the following notion will be useful.
Definition 3.
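The growth function of a set system $\mathcal{S}=(U,\mathcal{F})$ is the function $\pi_{\mathcal{S}}\colon\mathbb{N}\to\mathbb{N}$ defined as:
\[ \pi_{\mathcal{S}}(n)=\max\left\{\,|\{F\cap A : F\in\mathcal{F}\}| \ :\ A\subseteq U,\ |A|\le n\right\}. \]
Clearly, for any set system $\mathcal{S}$ we have that $\pi_{\mathcal{S}}(n)\le 2^n$, but many interesting set systems admit asymptotically polynomial bounds. This is in particular implied by the boundedness of the VC dimension, via the Sauer–Shelah Lemma stated below.
Lemma 4 (Sauer–Shelah Lemma [Sauer72sauershelahlemma, Shelah72sauershelahlemma]).
If $\mathcal{S}$ is a set system of VC dimension $d$, then
\[ \pi_{\mathcal{S}}(n)\le\sum_{i=0}^{d}\binom{n}{i}\le\mathcal{O}(n^d). \]
Note that when the VC dimension of $\mathcal{S}$ is not bounded, then for every $n$ there is a set of size $n$ that is shattered by $\mathcal{S}$, which implies that $\pi_{\mathcal{S}}(n)=2^n$. This provides an interesting dichotomy: if $\pi_{\mathcal{S}}(n)$ is not bounded by a polynomial, it must be equal to the function $2^n$.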
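As a sanity check, the growth function of a small set system can be compared with the binomial sum from the Sauer–Shelah Lemma. The sketch below assumes the convention that the growth function maximizes over subsets with at most $n$ elements; the interval family and all function names are illustrative choices, not taken from the cited works.

```python
from itertools import combinations
from math import comb

def traces(family, subset):
    """All intersections of members of the family with `subset`."""
    s = frozenset(subset)
    return {frozenset(f) & s for f in family}

def growth(universe, family, n):
    """Largest number of traces on a subset with at most n elements."""
    return max(
        len(traces(family, a))
        for k in range(min(n, len(universe)) + 1)
        for a in combinations(universe, k)
    )

def sauer_shelah_bound(n, d):
    """Sum of binomial coefficients C(n, 0) + ... + C(n, d)."""
    return sum(comb(n, i) for i in range(d + 1))

# Nonempty intervals of {0, ..., 5}; this family has VC dimension 2.
universe = range(6)
intervals = {frozenset(range(i, j + 1)) for i in range(6) for j in range(i, 6)}
```

For this family the bound with $d=2$ is attained on three-element subsets, which illustrates that the lemma can be tight even though it is far from tight for many other systems.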
As useful as the Sauer–Shelah Lemma is, the upper bound on asymptotics of the growth function implied by it is quite weak for many natural set systems. Therefore, we will study the following quantity.
Definition 5.
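The VC density of a set system $\mathcal{S}$ with growth function $\pi_{\mathcal{S}}$ is the quantity
\[ \inf\{\alpha\in\mathbb{R}^{+} : \pi_{\mathcal{S}}(n)\in\mathcal{O}(n^{\alpha})\}. \]
Observe that the definition of the VC density of $\mathcal{S}$ makes little sense when the universe of $\mathcal{S}$ is finite, as then the growth function ultimately becomes constant, allowing a polynomial bound of arbitrarily small degree. Therefore, we extend the definition of VC density to classes of finite set systems (i.e., families of finite set systems) as follows: the VC density of a class $\mathcal{C}$ is the infimum over all $\alpha\in\mathbb{R}^{+}$ for which there is $c\in\mathbb{R}$ such that $\pi_{\mathcal{S}}(n)\le c\cdot n^{\alpha}$ for all $\mathcal{S}\in\mathcal{C}$ and all positive integers $n$. Note that this is equivalent to measuring the VC density of the set system obtained by taking the union of all set systems from $\mathcal{C}$ on disjoint universes. Similarly, the VC dimension of a class of set systems $\mathcal{C}$ is the supremum of the VC dimensions of the members of $\mathcal{C}$.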
Thus, informally speaking, the VC density of $\mathcal{S}$ is the lowest possible degree of a polynomial bound that fits the conclusion of the Sauer–Shelah lemma for $\mathcal{S}$. Clearly, the Sauer–Shelah lemma implies that the VC density is never larger than the VC dimension, but as it turns out, that connection goes both ways:
Lemma 6 ([Mustafa17epsnets]).
A set system satisfying for all has VC dimension bounded by .
Hence, a set system has finite VC dimension if and only if it has finite VC density, but the results showing their equivalence usually produce relatively weak bounds. As discussed in the introduction, VC density is often a finer measure of complexity than VC dimension for interesting problems.
2.2 Set systems definable in logic
We assume basic familiarity with relational structures. The domain (or universe) of a relational structure will be denoted by . For a tuple of variables and a subset , by we denote the set of all evaluations of in , that is, functions mapping the variables of to elements of . A class of structures is a set of relational structures over the same signature.
Consider a logic over some relational signature . A partitioned formula is an formula of the form , where the free variables are partitioned into object variables and parameter variables . Then for a structure , we can define the set system of definable sets in :
If is a class of structures, then we define the class of set systems .
Note that the universe of is , so the elements of can be interpreted as tuples of elements of of length . When measuring the VC parameters of set systems it will be convenient to somehow still regard as the universe. Hence, we introduce the following definition: a tuple set system is a pair , where is a universe and is a family of sets of tuples of elements of . Thus, can be regarded as an tuple set system with universe .
When is a tuple set system, for a subset of elements we define
This naturally gives us the definition of a restriction: . We may now lift all the relevant definitions — of shattering, of the VC dimension, of the growth function, and of the VC density — to tuple set systems using only such restrictions: to subsets . Note that these notions for tuple set systems are actually different from the corresponding regular notions, which would consider as a set system with universe . This is because, for instance for the VC dimension, in the regular definition we would consider shattering all possible subsets of tuples of the universe, while in the definition for tuple set systems we restrict attention to shattering sets of the form , where .
2.3 MSO and transductions
Recall that Monadic Second Order logic ($\mathsf{MSO}$) is an extension of the First Order logic ($\mathsf{FO}$) that additionally allows quantification over subsets of the domain (i.e. unary predicates), represented as monadic variables. Sometimes we will also allow modular predicates of the form $|X|\equiv a \bmod p$, where $X$ is a monadic variable and $a,p$ are integers, in which case the corresponding logic shall be named $\mathsf{CMSO}$. If only parity predicates may be used (i.e. $p=2$), we will speak about $\mathsf{C_2MSO}$ logic.
The main idea behind the proofs presented in the next sections is that we will analyze how complicated set systems one can define in $\mathsf{MSO}$ on specific simple structures: trees and grid graphs. Then these results will be lifted to more general classes of graphs by means of logical transductions.
For a logic (usually a variant of ) and a signature , by we denote the logic comprising all formulas over . Then deterministic transductions are defined as follows.
Definition 7.
Fix two relational signatures and . A deterministic transduction from structures to structures is a sequence of formulas: , where the length of matches the arity of .
The semantics we associate with this definition is as follows. Let be a structure and . Then is a structure given by:
In a nutshell, we restrict the universe of the input structure to the elements satisfying , and in this new domain we reinterpret the relations of using formulas evaluated in .
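The semantics just described can be mimicked on finite structures. In the sketch below, formulas are replaced by Python predicates evaluated in the input structure, which is an assumption made purely for illustration (it does not model any particular logic); the names `apply_transduction`, `has_predecessor` and `adjacent` are hypothetical.

```python
from itertools import product

def apply_transduction(universe, relations, domain_formula, relation_formulas):
    """Restrict the universe, then reinterpret each output relation.

    domain_formula(universe, relations, u) decides which elements survive.
    relation_formulas maps an output relation name to (arity, predicate);
    each predicate is evaluated in the INPUT structure.
    """
    new_universe = {u for u in universe if domain_formula(universe, relations, u)}
    new_relations = {}
    for name, (arity, predicate) in relation_formulas.items():
        new_relations[name] = {
            t for t in product(new_universe, repeat=arity)
            if predicate(universe, relations, t)
        }
    return new_universe, new_relations

# Input structure: a directed path 0 -> 1 -> 2 -> 3 with one relation "succ".
universe = {0, 1, 2, 3}
relations = {"succ": {(0, 1), (1, 2), (2, 3)}}

def has_predecessor(univ, rels, u):
    # Domain predicate: keep elements that have an incoming successor edge.
    return any((w, u) in rels["succ"] for w in univ)

def adjacent(univ, rels, t):
    # Output relation "E": symmetric closure of "succ".
    x, y = t
    return (x, y) in rels["succ"] or (y, x) in rels["succ"]

out_universe, out_relations = apply_transduction(
    universe, relations, has_predecessor, {"E": (2, adjacent)}
)
```

Note that the output relation only contains tuples over the restricted universe, even though the predicates themselves are evaluated over the whole input structure.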
We will sometimes work with nondeterministic transductions, which are the following generalization.
Definition 8.
Fix two relational signatures and . A nondeterministic transduction from structures to structures is a pair consisting of: a finite signature consisting entirely of unary relation symbols, which is disjoint from ; and a deterministic transduction from structures to structures. Transduction is called the deterministic part of .
We associate the following semantics with this definition. If is a structure, then by we denote the set of all possible structures obtained by adding valuations of the unary predicates from to . Then we define , which is again a set of structures. Thus, a nondeterministic transduction can be seen as a procedure that first nondeterministically selects the valuation of the unary predicates from in the input structure, and then applies the deterministic part.
If is a class of structures and is a transduction (deterministic or not), then by we denote the sum of images of over elements of . Also, if is a signature consisting of unary relation names that is disjoint from , then we write for the class of all possible structures that can be obtained from the structures from by adding valuations of the unary predicates from .
An important property of deterministic transductions is that formulas working over the output structure can be “pulled back” to formulas working over the input structure that select exactly the same tuples. All one needs to do is add guards for all variables, ensuring that the only entities we operate on are those accepted by , and replace all relational symbols of with their respective formulas which define the transduction. This translation is formally encapsulated in the following result.
Lemma 9 (Backwards Translation Lemma, cf. [Courcelle94msotransductionsurvey]).
Let be a deterministic transduction from structures to structures, and let . Then for every formula there is an formula such that for every structure and ,
The formula provided by Lemma 9 will be denoted by .
Finally, we remark that in the literature there is a wide variety of different notions of logical transductions and interpretations; we chose one of the simplest, as it will be sufficient for our needs. We refer a curious reader to a survey of Courcelle [Courcelle94msotransductionsurvey].
2.4 MSO on graphs
We will work with two variants of $\mathsf{MSO}$ on graphs: $\mathsf{MSO}_1$ and $\mathsf{MSO}_2$. Both these variants are defined as the standard notion of $\mathsf{MSO}$ logic, but applied to two different encodings of graphs as relational structures. When we talk about $\mathsf{MSO}_1$ formulas, we mean $\mathsf{MSO}$ formulas over structures representing graphs as follows: elements of the structure correspond to vertices and there is a single binary relation representing adjacency. The second variant, $\mathsf{MSO}_2$, encompasses $\mathsf{MSO}$ formulas over structures representing graphs as follows: the domain contains both edges and vertices of the graph, and there is a binary incidence relation that selects all pairs $(u,e)$ such that $e$ is an edge and $u$ is one of its endpoints. These two encodings of graphs will be called the adjacency encoding and the incidence encoding, respectively.
Thus, practically speaking, in $\mathsf{MSO}_1$ we may only quantify over subsets of vertices, while in $\mathsf{MSO}_2$ we allow quantification both over subsets of vertices and over subsets of edges. $\mathsf{MSO}_2$ is strictly more powerful than $\mathsf{MSO}_1$; for instance, it can express that a graph is Hamiltonian. We may extend $\mathsf{MSO}_1$ and $\mathsf{MSO}_2$ with modular predicates in the natural way, thus obtaining logics $\mathsf{CMSO}_1$, $\mathsf{C_2MSO}_1$, etc.
If $G$ is a graph and $\varphi(\bar x,\bar y)$ is an $\mathcal{L}$-formula over graphs, where $\mathcal{L}$ is any of the variants of $\mathsf{MSO}$ discussed above, then we may define the $|\bar x|$-tuple set system $S^{\varphi}(G)$ as before, where the universe of $S^{\varphi}(G)$ is the vertex set of $G$. We remark that in case of $\mathsf{MSO}_2$, despite the fact that formally an $\mathsf{MSO}_2$ formula works over a universe consisting of both vertices and edges, in the definition of $S^{\varphi}(G)$ we consider only the vertex set as the universe. That is, the parameter variables $\bar y$ range over $V(G)$ and each evaluation $\bar v\in V(G)^{\bar y}$ defines the set of evaluations $\bar u\in V(G)^{\bar x}$ satisfying $\varphi(\bar u,\bar v)$ which is included in $S^{\varphi}(G)$.
2.5 MSO and tree automata
When proving upper bounds we will use the classic connection between $\mathsf{MSO}$ and tree automata. Throughout this paper, all trees will be finite, rooted, and binary: every node may have a left child and a right child, though one or both of them may be missing. Trees will be represented as relational structures where the domain consists of the nodes and there are two binary relations, respectively encoding being a left child and a right child. In case of labeled trees, the signature is extended with a unary predicate for each label.
Definition 10.
Let $\Sigma$ be a finite alphabet. A (deterministic) tree automaton is a tuple $(Q,F,\delta)$ where $Q$ is a finite set of states, $F$ is a subset of $Q$ denoting the accepting states, while $\delta\colon(Q\cup\{\bot\})^2\times\Sigma\to Q$ is the transition function.
A run of a tree automaton over a $\Sigma$-labeled tree $T$ is the labeling of its nodes with states which is computed in a bottom-up manner using the transition function. That is, if a node $v$ bears symbol $a\in\Sigma$ and the states assigned by the run to the children of $v$ are $q_1$ and $q_2$, respectively, then the state assigned to $v$ is $\delta(q_1,q_2,a)$. In case $v$ has no left or right child, the corresponding state is replaced with the special symbol $\bot$. In particular, the state in every leaf is determined as $\delta(\bot,\bot,a)$, where $a$ is the label of the leaf. We say that a tree automaton accepts a finite tree $T$ if the state assigned by the run to the root of $T$ belongs to $F$.
The following statement expresses the classic equivalence of $\mathsf{MSO}$ and finite automata over trees.
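The bottom-up computation of a run is easy to sketch in code. In the example below the transition function is a Python function taking the left state, the right state and the label in that order (the argument order is an assumption), `None` stands for the special symbol used for a missing child, and the parity automaton is a hypothetical example.

```python
BOTTOM = None  # plays the role of the special symbol for a missing child

class Node:
    def __init__(self, label, left=None, right=None):
        self.label = label
        self.left = left
        self.right = right

def run(delta, node):
    """State assigned to `node`, computed bottom-up from its children."""
    if node is None:
        return BOTTOM
    return delta(run(delta, node.left), run(delta, node.right), node.label)

def accepts(delta, accepting, root):
    """The automaton accepts when the state at the root is accepting."""
    return run(delta, root) in accepting

def parity_delta(q_left, q_right, label):
    # Hypothetical automaton: tracks the parity of the number of 'a' labels.
    left = 0 if q_left is BOTTOM else q_left
    right = 0 if q_right is BOTTOM else q_right
    return (left + right + (1 if label == "a" else 0)) % 2

# A small tree with three nodes labelled 'a' and one labelled 'b'.
tree = Node("a", Node("b", Node("a")), Node("a"))
```

With accepting set `{1}` this automaton accepts exactly the trees with an odd number of `a`-labelled nodes, a property of the kind expressed by modular predicates.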
Lemma 11 ([Rabin69s2sdecidability]).
For every sentence over the signature of labeled trees there exists a tree automaton which is equivalent to in the following sense: for every labeled tree , if and only if accepts .
Since we are actually interested in formulas with free variables and not only sentences, we will need to change this definition slightly. Informally speaking, we will enlarge the alphabet in a way which allows us to encode valuations of the free variables. Let be a labelled tree and consider a tuple of variables along with its valuation . Then we can encode in by defining the augmented tree as follows: is the tree with labels from that is obtained from by enriching the label of every node with the function defined as follows: for , we have if and only if . As observed by Grohe and Turán [Grohe04msotreevcdim], formulas can be translated to equivalent tree automata working over augmented trees.
Lemma 12 ([Grohe04msotreevcdim]).
For every formula over the signature of labeled trees there exists a tree automaton over labelled trees which is equivalent to in the following sense: for every labelled tree and , if and only if accepts .
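The encoding of a valuation into node labels can be sketched as follows. The dictionary-based tree representation and the name `augment` are assumptions made for this illustration; each node receives its original label together with one 0/1 marker per variable.

```python
def augment(labels, variables, valuation):
    """Enrich every node label with markers recording the valuation.

    labels: dict mapping node -> original label.
    variables: tuple of variable names, in a fixed order.
    valuation: dict mapping each variable to the node it is assigned to.
    The marker for a variable is 1 exactly at the node the variable maps to.
    """
    return {
        node: (label, tuple(int(valuation[x] == node) for x in variables))
        for node, label in labels.items()
    }

# Hypothetical three-node tree and a valuation sending both variables to "l".
labels = {"r": "a", "l": "b", "rr": "a"}
augmented = augment(labels, ("x1", "x2"), {"x1": "l", "x2": "l"})
```

An automaton reading the augmented labels sees both the original symbol and the positions of the free variables, which is what allows it to evaluate a formula with free variables.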
3 Upper bounds
In this section we prove theorem 1. We start with investigating the case of definable set systems in trees. This case will be later translated to the case of classes with bounded treewidth or cliquewidth by means of transductions.
3.1 Trees
Recall that labelled binary trees are represented as structures with domains containing their nodes, two successor relations—one for the left child, and one for the right—and unary predicates for labels. It turns out that definable set systems over labelled trees actually admit optimal upper bounds for VC density. This improves the result of Grohe and Turán [Grohe04msotreevcdim] showing that such set systems have bounded VC dimension.
Theorem 13.
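Let $\mathcal{C}$ be a class of finite binary trees with labels from a finite alphabet $\Sigma$, and $\varphi(\bar x,\bar y)$ be a partitioned $\mathsf{CMSO}$ formula over the signature of $\Sigma$-labeled binary trees. Then there is a constant $c\in\mathbb{N}$ such that for every tree $T\in\mathcal{C}$ and a nonempty subset of its nodes $A$, we have
\[ |S^{\varphi}(T)[A]| \le c\cdot|A|^{|\bar y|}. \]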
Proof.
By lemma 12, is equivalent to a tree automaton over an alphabet of . We will now investigate how the choice of parameters can affect the runs of over .
Since we are really considering over the alphabet extended with binary markers for and , we will use to denote the extension of the labeling of where all binary markers are set to . That is, is the tree labeled with alphabet obtained from by extending each symbol appearing in with functions that map all variables of and to . Tree is defined analogously, where the markers for are set according to the valuation , while the markers for are all set to .
In $T$ we have natural ancestor and descendant relations; we consider every node its own ancestor and descendant as well. Let $B$ be the subset of nodes of $T$ that consists of:

the root of $T$;

all nodes of $A$; and

all nodes $u$ such that both the left child and right child of $u$ have a descendant that belongs to $A$.
Note that . For convenience, let be a function that maps every node of to the least ancestor of that belongs to .
We define a tree with as the set of nodes as follows. A node is the left child of a node in if the following holds in : is a descendant of the left child of and no internal vertex on the unique path from to belongs to . Note that every node has at most one left child in , for if it had two left children , then the least common ancestor of and would belong to and would be an internal vertex on both the to path and the to path. The right child relation in is defined analogously. The reader may think of as of with contracted to , for every ; see fig. 1.
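The set of nodes on which the compressed tree is built (the root, the selected nodes, and the nodes whose two subtrees both contain a selected node) can be computed with a single traversal. The sketch below assumes a tree given as a dictionary from a node to its pair of children, with `None` for a missing child; the name `closure` is introduced here for illustration.

```python
def closure(children, root, marked):
    """Root, marked nodes, and nodes with a marked node below both children."""
    result = {root} | set(marked)

    def has_marked(node):
        # True if the subtree rooted at `node` (including `node`) has a marked node.
        if node is None:
            return False
        left, right = children.get(node, (None, None))
        in_left = has_marked(left)
        in_right = has_marked(right)
        if in_left and in_right:
            result.add(node)  # branching node: both sides contain marked nodes
        return node in marked or in_left or in_right

    has_marked(root)
    return result

# Complete binary tree on nodes 1..7 (heap numbering), marked nodes 4, 5, 6.
children = {1: (2, 3), 2: (4, 5), 3: (6, 7)}
nodes = closure(children, 1, {4, 5, 6})
```

In the example, node 3 is not added because only its left subtree contains a marked node. Since branching nodes of this kind number fewer than the marked nodes, the resulting set stays linear in the size of the selected subset.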
Note that we did not define any labeling on the tree . Indeed, we treat as an unlabeled tree, but will consider different labelings of induced by various augmentations of . For this, we define alphabet
where denotes the set of functions from to . Now, for a fixed valuation of parameter variables and object variables , we define the labeled tree as follows. Consider any node and let be the context of : a tree obtained from by restricting it to the descendants of , and, for every child of in , replacing the subtree rooted at by a single special node called a hole. The automaton can be now run on the context provided that for every hole of we prescribe a state to which this hole should evaluate. Thus, running on defines a state transformation , which maps tuples of states assigned to the holes of to the state assigned to . Intuitively, encodes the compressed transition function of when run over the subtree of induced by , where it is assumed that on the input we are given the states to which the children of in are evaluated. Note that the domain of consists of pairs of states if has two children in , of one state if has one child in , and of zero states if has no children in . Thus
Note that for fixed and , is uniquely determined by the subset of variables of that maps to . This is because , while is the only node of that may belong to . Hence, with we can associate a function that given , outputs the transformation for any (equivalently, every) satisfying iff , for all . Then we define the labeled tree as with labeling . Note that the above construction can be applied to in the same way.
Now, for we define the labeled tree by augmenting with markers for the valuation ; note that this is possible because is contained in the node set of . We also define an automaton working on labeled trees as follows. uses the same state set as , while its transition function is defined by taking the binary valuation for in a given node , applying it to the label of to obtain a state transformation, verifying that the arity of this transformation matches the number of children of , and finally applying that transformation to the input states. Then the following claim follows immediately from the construction.
Claim 1.
For all and , the run of on is equal to the restriction of the run of on to the nodes of .
From Claim 1 it follows that if for two tuples we have , then for every , accepts if and only if accepts . As is equivalent to the formula in the sense of lemma 12, this implies that
In other words, and define the same element of . We conclude that the cardinality of is bounded by the number of different trees that one can obtain by choosing different .
Observe that for each , tree differs from by changing the labels of at most nodes. Indeed, from the construction of it follows that for each , the labels of in and in may differ only if maps some variable of to a node belonging to ; this can happen for at most nodes of . Recalling that and , the number of different trees is bounded by
where . As argued, this number is also an upper bound on the cardinality of , which concludes the proof.
3.2 Classes with bounded treewidth or cliquewidth
We now exploit the known connections between trees and graphs of bounded treewidth or cliquewidth, expressed in terms of the existence of suitable transductions, to lift theorem 13 to more general classes of graphs, thereby proving theorem 1. In fact, we will not rely on the original combinatorial definitions of these parameters, but on their logical characterizations proved in subsequent works.
The first parameter of interest is the cliquewidth of a graph, introduced by Courcelle and Olariu [CourcelleO00]. We will use the following well-known logical characterization of cliquewidth.
Theorem 14 ([CourcelleE95, EngelfrietO97]).
For every there is a finite alphabet and a deterministic transduction such that for every graph of cliquewidth at most there exists a labeled binary tree satisfying the following: is the adjacency encoding of .
Thus, one may think of graphs of bounded cliquewidth as of graphs that are interpretable in labeled trees. By combining theorem 14 with theorem 13 we can prove part (i) of theorem 1 as follows.
Fix a class with uniformly bounded cliquewidth and a partitioned formula over the signature of . Let be the upper bound on the cliquewidth of graphs from , and let and be the alphabet and the deterministic transduction provided by theorem 14 for . Then for every , we can find a labeled tree such that is the adjacency encoding of . Note that . Observe that for every and vertex subset , we have
where is the formula pulled back through the transduction , as given by lemma 9. As by theorem 13 we have for some constant , the same upper bound can be also concluded for the cardinality of . This proves theorem 1, part (i).
To transfer these results to the case of $\mathsf{CMSO}_2$ over graphs of bounded treewidth, we need to define an additional graph transformation. For a graph $G$, the incidence graph of $G$ is the bipartite graph with $V(G)\cup E(G)$ as the vertex set, where a vertex $u$ is adjacent to an edge $e$ if and only if $u$ is an endpoint of $e$. The following result links $\mathsf{CMSO}_2$ on a graph with $\mathsf{CMSO}_1$ on its incidence graph.
Lemma 15 ([Courcelle18, Courcelle18a]).
Let be a graph of treewidth . Then the cliquewidth of the incidence graph of is at most . Moreover, with any formula one can associate a formula such that for any graph and we have if and only if , where is the incidence graph of .
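The incidence graph used in lemma 15 is straightforward to build explicitly. The following sketch tags original vertices with `"v"` and edges with `"e"` to keep the two sides of the bipartition apart; this tagging and the function name are illustrative assumptions.

```python
def incidence_graph(vertices, edges):
    """Bipartite graph whose nodes are the vertices and the edges of a graph.

    A vertex node ("v", u) is adjacent to an edge node ("e", {a, b})
    exactly when u is one of the endpoints a, b.
    """
    adj = {("v", u): set() for u in vertices}
    for a, b in edges:
        edge_node = ("e", frozenset((a, b)))
        adj[edge_node] = {("v", a), ("v", b)}
        adj[("v", a)].add(edge_node)
        adj[("v", b)].add(edge_node)
    return adj

# Triangle on vertices 0, 1, 2.
triangle = incidence_graph([0, 1, 2], [(0, 1), (1, 2), (0, 2)])
```

Quantifying over sets of edge nodes in the incidence graph corresponds to quantifying over sets of edges in the original graph, which is the intuition behind translating formulas between the two encodings.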
Now lemma 15 immediately reduces part (ii) of theorem 1 to part (i). Indeed, for every partitioned formula , the corresponding formula provided by lemma 15 satisfies the following: for every graph and its incidence graph , we have
Observe that by lemma 15, if a graph class has uniformly bounded treewidth, then the class comprising the incidence graphs of graphs from has uniformly bounded cliquewidth. Hence we can apply part (i) of theorem 1 to the class and obtain an upper bound of the form for any , where is a constant. By the above containment of set systems, this upper bound carries over to restrictions of . This concludes the proof of part (ii) of theorem 1.
4 Lower bounds
We now turn to proving theorem 2. As in the work of Grohe and Turán [Grohe04msotreevcdim], the main idea is to show that the structures responsible for unbounded VC dimension of $\mathsf{MSO}$-definable set systems are grids. That is, the first step is to prove a suitable unboundedness result for the class of grids, which was done explicitly by Grohe and Turán in [Grohe04msotreevcdim, Example 19]. Second, if the considered graph class $\mathcal{C}$ has unbounded treewidth (resp., cliquewidth), then we give a deterministic $\mathsf{MSO}_2$-transduction (resp. $\mathsf{C_2MSO}_1$-transduction) from the corresponding class of labelled graphs to the class of grids. Such transductions are present in the literature and follow from known forbidden-structure theorems for treewidth and cliquewidth. Then we can combine these two steps into the proof of theorem 2 using the following generic statement. In the following, we shall say that logic $\mathcal{L}$ has unbounded VC dimension on a class of structures $\mathcal{C}$ if there exists a partitioned $\mathcal{L}$-formula $\varphi(\bar x,\bar y)$ over the signature of $\mathcal{C}$ such that the class of set systems $S^{\varphi}(\mathcal{C})$ has infinite VC dimension.
Lemma 16.
Let and be two classes of structures and . Suppose that there exists a deterministic transduction with input signature being the signature of and the output signature being the signature of such that . Then if has unbounded VC dimension on , then also has unbounded VC dimension on .
Proof.
Let formula witness that has unbounded VC dimension on . Then it is easy to see that the formula , provided by lemma 9, witnesses that has unbounded VC dimension on .
4.1 Grids
For , we denote . An grid is a relational structure over the universe with two successor relations. The horizontal successor relation selects all pairs of elements of the form , where and . Similarly, the vertical successor relation selects all pairs of elements of the form , where and . Note that these relations are not symmetric: the second element in the pair must be the successor of the first in the given direction.
Grohe and Turán proved the following.
Theorem 17 (Example 19 in [Grohe04msotreevcdim]).
has unbounded VC dimension on the class of grids.
The proof of theorem 17 roughly goes as follows. The key idea is that for a given set of elements it is easy to verify in the following property: is true if and only if the th bit of the binary encoding of is . This can be done on a row-by-row basis, by expressing that the elements of in every row encode, in binary, a number that is one larger than the number encoded by the elements of in the previous row. Using this observation, one can easily write a formula that selects exactly pairs of the form such that . Then shatters the set , as the binary encodings of numbers from to give all possible bit vectors of length when restricted to the first bits. Consequently, shatters a set of size in an grid, which enables us to deduce the following slight strengthening of theorem 17: has unbounded VC dimension on any class of structures that contains infinitely many different grids.

For the purpose of using existing results from the literature, it will be convenient to work with grid graphs instead of grids. An grid graph is a graph on vertex set where two vertices and are adjacent if and only if . When speaking about grid graphs, we assume the adjacency encoding as relational structures. Thus, the difference between grid graphs and grids is that the former are only equipped with a symmetric adjacency relation that does not distinguish directions, while in the latter we may use (oriented) successor relations, different for both directions. Fortunately, grid graphs can be reduced to grids using a well-known construction, as explained in lemma 18 below.
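As a sanity check on the counting argument behind theorem 17, the following Python sketch verifies the shattering claim on small instances. It does not evaluate any logical formula. It only assumes, for illustration, a grid with `2**k` rows and `k` columns in which row `r` stores the binary encoding of `r`, least significant bit first; the names `counter_set`, `traces_on_columns` and `is_shattered` are hypothetical.

```python
from itertools import combinations


def counter_set(k):
    """Cells (row, col) of a 2**k x k grid where bit `col` of `row` is 1."""
    return {
        (row, col)
        for row in range(2 ** k)
        for col in range(k)
        if (row >> col) & 1
    }


def traces_on_columns(k):
    """For every row, the set of columns selected in that row."""
    cells = counter_set(k)
    return {
        frozenset(col for col in range(k) if (row, col) in cells)
        for row in range(2 ** k)
    }


def is_shattered(k):
    """Check that the rows realise every subset of the k columns."""
    all_subsets = {
        frozenset(c)
        for size in range(k + 1)
        for c in combinations(range(k), size)
    }
    return traces_on_columns(k) == all_subsets
```

Under these assumptions, `is_shattered(k)` holds for every `k`, because each subset of columns is the set of one-bits of exactly one number below `2**k`. This reflects why the shattered set has size logarithmic in the number of rows.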
Lemma 18.
There exists a nondeterministic transduction from the adjacency encodings of graphs to grids such that for every class of graphs that contains arbitrarily large grid graphs, the class contains arbitrarily large grids.
Proof.
The transduction uses six additional unary predicates, that is, . We explain how the transduction works on grid graphs, which gives rise to a formal definition of the transduction in a straightforward way.
Given an grid graph , the transduction nondeterministically chooses the valuation of the predicates of as follows: for , selects all vertices such that and selects all vertices such that . Then the horizontal successor relation can be interpreted as follows: holds if and only if and are adjacent in , and are both selected by for some , and there is such that is selected by while is selected by . The vertical successor relation is interpreted analogously.
It is easy to see that if is an grid graph and the valuation of the predicates of is selected as above, then indeed outputs an grid. This implies that if contains infinitely many different grid graphs, then contains infinitely many different grids.
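The orientation trick in this proof can be illustrated by a small Python sketch. We assume here, as our reading of the construction, that the six unary predicates record the two coordinates of each vertex modulo 3, and that the horizontal successor of a vertex increases the first coordinate by one. The function names are hypothetical. Note that the decision for each pair uses only adjacency and the six colour classes, never the coordinates themselves.

```python
def grid_graph_edges(n, m):
    """Undirected edges of the n x m grid graph on {0..n-1} x {0..m-1}."""
    edges = set()
    for x in range(n):
        for y in range(m):
            if x + 1 < n:
                edges.add(frozenset({(x, y), (x + 1, y)}))
            if y + 1 < m:
                edges.add(frozenset({(x, y), (x, y + 1)}))
    return edges


def recover_successors(n, m):
    """Recover oriented successor relations from adjacency and mod-3 colours."""
    # The guessed valuation: colour classes of both coordinates modulo 3.
    colour_x = {(x, y): x % 3 for x in range(n) for y in range(m)}
    colour_y = {(x, y): y % 3 for x in range(n) for y in range(m)}
    horizontal, vertical = set(), set()
    for edge in grid_graph_edges(n, m):
        u, v = tuple(edge)
        for a, b in ((u, v), (v, u)):
            # Same second-coordinate colour, first-coordinate colour steps by one.
            if colour_y[a] == colour_y[b] and colour_x[b] == (colour_x[a] + 1) % 3:
                horizontal.add((a, b))
            # Symmetric condition for the vertical direction.
            if colour_x[a] == colour_x[b] and colour_y[b] == (colour_y[a] + 1) % 3:
                vertical.add((a, b))
    return horizontal, vertical
```

Three colours per coordinate suffice because, for adjacent vertices, a step of plus one and a step of minus one give different residues modulo 3, so the direction of each edge is determined.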
We may now combine lemma 18 with theorem 17 to show the following.
Lemma 19.
Suppose and is a class of structures such that there exists a nondeterministic transduction from to adjacency encodings of graphs such that contains infinitely many different grid graphs. Then there exists a finite signature consisting only of unary relation names such that has unbounded VC dimension on .
Proof.
As nondeterministic transductions are closed under composition for all three considered variants of logic (see e.g. [Courcelle94msotransductionsurvey]), from lemma 18 we infer that there exists a nondeterministic transduction such that contains infinitely many different grids. By definition, transduction has its deterministic part such that . It now remains to take and use lemma 16 together with theorem 17 (and the remark after it).
4.2 Classes with unbounded treewidth and cliquewidth
For part (ii) of theorem 2 we will use the following standard proposition, which essentially dates back to the work of Seese [Seese91msotwundecidable].
Lemma 20.
There exists a nondeterministic transduction from incidence encodings of graphs to adjacency encodings of graphs such that for every graph class whose treewidth is not uniformly bounded, the class contains all grid graphs.
Proof.
Recall that a minor model of a graph in a graph is a mapping from to connected subgraphs of such that subgraphs are pairwise disjoint, and for every edge there is an edge in with one endpoint in and the other in . Then contains as a minor if there is a minor model of in . By the Excluded Grid Minor Theorem [Robertson86excludedgridtheorem], if a class of graphs has unbounded treewidth, then every grid graph is a minor of some graph from . Therefore, it suffices to give a nondeterministic transduction from incidence encodings of graphs to adjacency encodings of graphs such that for every graph , contains all minors of .
The transduction works as follows. Suppose is a given graph and is a minor model of some graph in . First, in we nondeterministically guess three subsets:


a subset of vertices, containing one arbitrary vertex from each subgraph of ;

a subset of edges, consisting of the union of spanning trees of subgraphs (where each spanning tree is chosen arbitrarily);

a subset of edges, consisting of one edge connecting a vertex of and a vertex of for each edge , chosen arbitrarily.
Recall that graph is given by its incidence encoding, hence these subsets can be guessed using three unary predicates in . Now with sets in place, the adjacency encoding of the minor can be interpreted as follows: the vertex set of is , while two vertices are adjacent in if and only if in they can be connected by a path that traverses only edges of and one edge of . It is straightforward to express this condition in .
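The interpretation step above can be mirrored by a short Python sketch that computes the minor from the three guessed sets. This is an illustration only, not the logical formula itself: the parameter names `representatives`, `tree_edges` and `connecting_edges` are hypothetical stand-ins for the three guessed subsets, and connectivity along tree edges is computed with a union-find structure.

```python
def interpret_minor(representatives, tree_edges, connecting_edges):
    """Return (vertices, edges) of the minor described by the guessed sets.

    representatives: one chosen vertex per branch subgraph
    tree_edges: edges of the spanning trees of the branch subgraphs
    connecting_edges: one edge of the host graph per edge of the minor
    """
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    # Vertices joined by tree edges lie in the same branch subgraph.
    for a, b in tree_edges:
        parent[find(a)] = find(b)

    representative_of = {find(d): d for d in representatives}
    minor_edges = set()
    for a, b in connecting_edges:
        # A path of tree edges, one connecting edge, then tree edges again.
        ra = representative_of.get(find(a))
        rb = representative_of.get(find(b))
        if ra is not None and rb is not None and ra != rb:
            minor_edges.add(frozenset({ra, rb}))
    return set(representatives), minor_edges


# Example: contracting every other edge of a 6-cycle yields a triangle.
vertices, edges = interpret_minor(
    representatives={0, 2, 4},
    tree_edges={(0, 1), (2, 3), (4, 5)},
    connecting_edges={(1, 2), (3, 4), (5, 0)},
)
```

In the example, the branch subgraphs are the three contracted edges of the cycle, and the resulting minor has vertex set `{0, 2, 4}` with all three pairs adjacent.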
Observe that part (ii) of theorem 2 follows immediately by combining lemma 20 with lemma 19. Indeed, from this combination we obtain a partitioned formula and a finite signature consisting of unary relation names such that the class of set systems has infinite VC dimension. Here, we treat as the class of incidence encodings of graphs from . Now if we take the label set to be the powerset of , we can naturally modify to an equivalent formula working over ve-labelled graphs, where the label of every vertex encodes the subset of predicates of that select . Thus has infinite VC dimension, which concludes the proof of part (ii) of theorem 2.
To prove part (i) of theorem 2 we apply exactly the same reasoning, but with lemma 20 replaced with the following result of Courcelle and Oum [Courcelle07vertexminorsseese].
Lemma 21 (Corollary 7.5 of [Courcelle07vertexminorsseese]).
There exists a transduction from adjacency encodings of graphs to adjacency encodings of graphs such that if is a class of graphs of unbounded cliquewidth, then contains arbitrarily large grid graphs.