# Subsampling large graphs and invariance in networks

Specify a randomized algorithm that, given a very large graph or network, extracts a random subgraph. What can we learn about the input graph from a single subsample? We derive laws of large numbers for the sampler output, by relating randomized subsampling to distributional invariance: Assuming an invariance holds is tantamount to assuming the sample has been generated by a specific algorithm. That in turn yields a notion of ergodicity. Sampling algorithms induce model classes---graphon models, sparse generalizations of exchangeable graphs, and random multigraphs with exchangeable edges can all be obtained in this manner, and we specialize our results to a number of examples. One class of sampling algorithms emerges as special: Roughly speaking, those defined as limits of random transformations drawn uniformly from certain sequences of groups. Some known pathologies of network models based on graphons are explained as a form of selection bias.


## 1 Introduction

Consider a large graph or network, and invent a randomized algorithm that generates a subgraph. The algorithm can be understood as a model of an experimental design—a protocol used to collect data in a survey, or to sample data from a network—or as an actual program extracting data from a data base. Use the algorithm to extract a sample graph, small relative to input size. What information can be obtained from a single such sample? Certainly, that should depend on the algorithm. We approach the problem starting from a simple observation: Fix a sequence with entries. Generate a random sequence by sampling elements of , uniformly and independently with replacement. Then is exchangeable, and it remains so under a suitable distributional limit in input size. Similarly, one can generate an exchangeable graph (a random graph whose law is invariant under permutations of the vertex set) by sampling vertices of an input graph independently, and extracting the induced subgraph. The resulting class of random graphs is equivalent to graphon models [14, 15, 22]. Exchangeability is an example of distributional symmetry, that is, invariance of a distribution under a class of transformations [36]. Thus, a randomized algorithm (independent selection of elements) induces a symmetry principle (exchangeability of elements), and when applied to graphs, it also induces a model class (graphon models). The purpose of this work is to show how the interplay of these properties answers what can be learned from a single subgraph, both for the example above and for other algorithms.
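The observation about sequences can be checked on a toy case. The following sketch (an illustration, not part of the original argument) computes the exact law of $k$ draws, uniform and independent with replacement, from a fixed finite sequence, and verifies that this law is unchanged under every permutation of the output positions. The names `law_of_sample` and `is_exchangeable` are hypothetical helpers introduced here.

```python
from collections import Counter
from fractions import Fraction
from itertools import permutations, product


def law_of_sample(seq, k):
    """Exact law of k draws from seq, uniform and independent, with replacement."""
    n = len(seq)
    law = Counter()
    for idx in product(range(n), repeat=k):
        law[tuple(seq[i] for i in idx)] += Fraction(1, n**k)
    return law


def is_exchangeable(law, k):
    """Check that permuting the k output positions leaves every probability unchanged."""
    return all(
        law[outcome] == law[tuple(outcome[i] for i in perm)]
        for outcome in law
        for perm in permutations(range(k))
    )


law = law_of_sample(["a", "a", "b", "c"], k=3)
print(is_exchangeable(law, 3))  # True
```

The check is exhaustive only for this small input; it illustrates the symmetry and does not replace the limit argument.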

### 1.1 Overview

Start with a large graph $y$. The graph may be unadorned, or a “network” in which each edge and vertex is associated with some mark or observed value. We always assume that the “initial subgraph of size $n$”, denoted $y|_n$, is unambiguously defined. This may be the induced subgraph on the first $n$ vertices (if vertices are enumerated), the subgraph incident to the first $n$ edges (if edges are enumerated), the neighborhood of size $n$ around a fixed root, et cetera; details follow in Section 2. Now invent a randomized algorithm that generates a graph of size $k$ from the input $y|_n$, and denote this random graph $S_{n\to k}(y|_n)$. For example:

###### Algorithm 1.

i.) Select $k$ vertices of $y|_n$ independently and uniformly without replacement.

ii.) Extract the induced subgraph $S_{n\to k}(y|_n)$ of $y|_n$ on these vertices.

iii.) Label the vertices of $S_{n\to k}(y|_n)$ by $1,\ldots,k$ in order of appearance.
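A minimal sketch of Algorithm 1 for a finite input, under assumptions that are ours rather than the text's: the graph is simple and undirected, given as a set of vertex pairs, and vertices are numbered from 0 instead of 1. The function name `algorithm_1` is hypothetical.

```python
import random


def algorithm_1(edges, n, k, rng):
    """Select k of the vertices 0..n-1 uniformly without replacement and
    return the induced subgraph, relabelled 0..k-1 in order of appearance."""
    chosen = rng.sample(range(n), k)               # uniform, without replacement
    label = {v: i for i, v in enumerate(chosen)}   # new label = order of appearance
    return {
        frozenset((label[u], label[v]))
        for u, v in edges
        if u in label and v in label               # keep edges inside the sample
    }


rng = random.Random(0)
triangle_and_two_isolated = {(0, 1), (1, 2), (0, 2)}
print(algorithm_1(triangle_and_two_isolated, n=5, k=3, rng=rng))
```

The output depends on the seed; only its law is determined by the input graph.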

We assume $y$ is so large that it is modeled as infinite, and hence ask for a limit in input size: Is there a random variable $S_\infty(y)$ that can be regarded as a sample of infinite size from an infinite input graph $y$? That is, a variable such that, for each output size $k$, the restriction $S_\infty(y)|_k$ is the distributional limit in input size $n$,

$$S_{n\to k}(y) \;\xrightarrow{\;d\;}\; S_\infty(y)|_k \qquad\text{as } n\to\infty.$$

Necessary and sufficient conditions are given in Theorem 1. By sampling and passing to the limit, some information about $y$ is typically lost: $y$ cannot be reconstructed precisely from the sample output, which is of some importance to our purposes; see Remark 2.

The main problem we consider is inference from a single realization: By a model on $X$, we mean a set $P$ of probability measures on $X$. Observed is a single graph $x_k$ of size $k$. We choose a sampling algorithm as a modeling assumption on how the data was generated, and take its limit as above. For a given set $Y$ of input graphs, the algorithm induces a model

$$P := \{P_y \mid y \in Y\} \qquad\text{where}\qquad P_y := \mathcal{L}(S_\infty(y)).$$

The observed graph is modeled as a sample of size $k$ from an infinite input graph. Even as $k\to\infty$, this means that only a single draw from the model distribution is available. When $k$ is finite, this is further constrained to a partial realization.

We have already observed the sampler output is exchangeable for certain algorithms, or more generally invariant under some family of transformations of . A fundamental principle of ergodic theory is that invariance under a suitable family of transformations defines a set of distinguished probability measures, the -ergodic measures. Section 4 reviews the relevant concepts. The utility of invariance to our problem is that ergodic measures of suitable transformation families are distinguishable by a single realization: If a model is chosen as a subset of these -ergodic measures, and a single draw from some (unknown) element of is observed, the distribution is unambiguously determined by . The following consequence is made precise in Section 5:

If the limiting sampler can be shown to have a suitable invariance property, then inference from a single realization is possible in principle.

We make the provision in principle as the realization only determines abstractly—for an actual algorithm, there is no obvious way to derive from an observed graph. We hence ask under what conditions the expectations

$$\mathbb{E}[f(S_\infty(y))] = P_y(f) \qquad\text{for some } f \in L_1(P_y)$$

can be computed or approximated given an observed realization. (Here and throughout, we use the notation $P(f) := \int f\,dP$.) Theorem 5 provides the following answer: If $T$ is specifically a group satisfying certain properties, one can choose a specific sequence $A_1, A_2, \ldots$ of finite subsets of $T$ and define

$$F^x_k := \frac{1}{|A_k|}\sum_{t\in A_k}\delta_{t(x)} \qquad\text{hence}\qquad F^x_k(f) = \frac{1}{|A_k|}\sum_{t\in A_k} f(t(x)).$$

The sequence can be thought of as an empirical measure, and satisfies a law of large numbers: If a sequence of functions converges almost everywhere to a function , then

$$F^{S_\infty(y)}_k(f_k) \;\xrightarrow{\;k\to\infty\;}\; \mathbb{E}[f(S_\infty(y))] \tag{1}$$

under suitable conditions on the sampler. Theorem 5 formulates such convergence as a law of large numbers for symmetric random variables, and subsumes several known results on exchangeable structures, graphons, etc.

The graph in (1) is typically infinite. To work with a finite sample , we formulate additional conditions on that let transformations act on finite structures, which leads to a class of groups we call prefix actions. We then define sampling algorithms by randomly applying a transformation: Fix a prefix action , draw a random transformation uniformly from those elements of that affect only a subgraph of size , and define

$$S_{n\to k}(y) = \Phi_n(y|_n)|_k.$$

In words, randomly transform using and then truncate at output size . These algorithms turn out to be of particular interest: They induce various known models—graphons, edge-exchangeable graphs, and others—and generically satisfy -invariance and other non-trivial properties; see Theorem 9. The law of large numbers strengthens to

$$F^{S_k(y)}_k(f_k) \;\xrightarrow{\;k\to\infty\;}\; \mathbb{E}[f(y)]. \tag{2}$$

In contrast to (1), the approximation is now a function of a finite sample of size , and the right-hand side a functional of the input graph , rather than of . See Corollary 10.
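The construction "randomly transform, then truncate" can be sketched for the simplest case, in which the transformations are permutations of the first $n$ vertices. This choice of group, the 0-based vertex labels, and the helper names `transform_and_truncate` and `sample` are assumptions made for illustration; the general definition of a prefix action is not reproduced here.

```python
import random
from itertools import permutations


def transform_and_truncate(edges, perm, k):
    """Relabel vertex v as perm[v], then keep the induced subgraph on 0..k-1."""
    moved = {frozenset((perm[u], perm[v])) for u, v in edges}
    return {e for e in moved if max(e) < k}


def sample(edges, n, k, rng):
    """Draw a uniform permutation of 0..n-1, apply it, and truncate at size k."""
    perm = list(range(n))
    rng.shuffle(perm)
    return transform_and_truncate(edges, perm, k)


# Exact output law for the path 0-1-2 and k = 2, by enumerating all 3! permutations.
path = {(0, 1), (1, 2)}
outputs = [transform_and_truncate(path, p, 2) for p in permutations(range(3))]
nonempty = sum(1 for out in outputs if out)
print(nonempty, "of", len(outputs))  # 4 of 6
```

Two of the three vertex pairs of the path are edges, so the output contains an edge with probability 2/3, which is the probability Algorithm 1 assigns for the same input.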

With the general results in place, we consider specific algorithms. In some cases, the algorithm induces a known class of random graphs as its family of possible output distributions; see Table 1. Section 8 concerns Algorithm 1, exchangeable graphs, and graphon models. We consider modifications of Algorithm 1, and show how known misspecification problems that arise when graphons are used to model network data can be explained as a form of selection bias. Section 9 relates well-known properties of exchangeable sequences and partitions to algorithms sampling from fixed sequences and partitions. That serves as preparation for Section 10, on algorithms that select a random sequence of edges, and report the subgraph incident to those edges. If the input graph is simple, a property of $y$ can be estimated from the sample output if and only if it is a function of the degree sequence of $y$. If $y$ is a multigraph and the limiting relative multiplicities of its edges sum to 1, the algorithm generates edge-exchangeable graphs in the sense of [21, 18, 33]. The two cases differ considerably: For simple input graphs, the sample output is completely determined by vertex degrees, for multigraphs by edge multiplicities. If a sampling algorithm explores a graph by following edges (as many actual experimental designs do, see e.g. [40]), the stochastic dependence between edges tends to become more complicated, and understanding symmetries of such algorithms is much harder. Section 11 puts some previously known properties of algorithms that sample neighborhoods in the context of this work.

### 1.2 Related work

The ideas proposed here are used explicitly in forthcoming work of Veitch and Roy [49] and Borgs, Chayes, Cohn, and Veitch [17], both already available as preprints. Cf. Section 8.4.

Related previous work largely falls into two categories: One concerns random graphs, exchangeability, graph limits, and related topics. This work is mostly theoretical, and intersects probability, combinatorics, and mathematical statistics [14, 15, 22, 36, 5, 10, 27, 39, 45]. A question closely related to the problem considered here (what probabilistic symmetries aside from exchangeability of vertices are applicable to network analysis problems) was posed in [45]. One possible solution, due to Caron and Fox [19], is to require exchangeability of an underlying point process. This idea can be used to generalize graph limits to sparse graphs [48, 16]. Another answer is given by random multigraphs whose edges, rather than vertices, are exchangeable [21, 18, 33]. These are exchangeable partitions, in the sense of Kingman [38], of the upper triangular part of the set $\mathbb{N}^2$. The second related category of work covers experimental design in networks, and constitutes a substantial literature; see [40] for references. This literature tends to be more applied, although theoretical results have been obtained [e.g. 41]. The two bodies of work are largely distinct, with a few notable exceptions, such as the results on identifiability problems in [20].

The specific problem considered here—the relationship between sampling and symmetry—seems largely unexplored, but Aldous reasons about exchangeability in terms of uniform sampling in [2], and, in joint work with Lyons [1], extends the work of Benjamini and Schramm [8] from a symmetry perspective. (Kallenberg [34] and other authors use the term sampling differently, for the explicit generation of a draw from a suitable representation of a distribution.) More closely related from a technical perspective than experimental design in networks are two ideas popular in combinatorics. One is property testing, which samples uniform parts of a large random structure , and then asks with what probability is in a certain class; see [4]. The second is used to define convergence of discrete structures: Start with a set of such structures, and equip it with some initial metric (so convergence in distribution is defined). Invent a randomized algorithm that generates a “substructure” of a fixed structure , and call a sequence convergent if the laws converge weakly. The idea is exemplified by [8], but seems to date back further, and is integral to the construction of graph limits: The topology on dense graphs defined in this manner by Algorithm 1 is metrizable, by the “cut metric” of Frieze and Kannan [14, 15].

## 2 Spaces of graphs and discrete structures

Informally, we consider a space of infinite structures, spaces of finite substructures of size , and a map that takes a structure of size to its initial substructure of size . For example, if (resp. ) consists of labeled graphs with vertex set (resp. ), then may map to the induced subgraph on the first vertices. More formally, these objects are defined as follows: Let $X_n$, for $n \in \mathbb{N}$, be countable sets. Require that for each pair $m \le n$, there is a surjective map

$$\bullet|_m : X_n \to X_m \qquad\text{such that}\qquad x_n|_m|_k = x_n|_k \quad\text{if } x_n \in X_n \text{ and } k \le m \le n. \tag{3}$$

We write whenever . In words, is a substructure of . An infinite sequence

$$x := (x_1, x_2, \ldots) \qquad\text{with } x_n \in X_n \text{ and } x_n \preceq x_{n+1} \text{ for all } n \in \mathbb{N} \tag{4}$$

can then be regarded as a structure of infinite size. The set of all such sequences is denoted . The maps can be extended to by defining if as above.

If each point in is an infinite graph, two natural ways to measure size of subgraphs are by counting vertices, or by counting edges. Since the notion of size determines the definition of the restriction map , the two lead to different types of almost discrete spaces:

(i) Counting vertices. Choose as the set of graphs of a given type (e.g. simple and undirected) with vertex set , and as the analogous set of graphs with vertex set . The restriction map extracts the induced subgraph on the first vertices, i.e. is the graph with vertex set that contains those edges of with . Graph size is the cardinality of the vertex set.

(ii) Counting edges. A graph with vertices in is represented as a sequence of edges, , where . Each set consists of all graphs with edges, , and vertex set . The restriction map

$$x \mapsto x|_n := ((i_1,j_1),\ldots,(i_n,j_n)) \tag{5}$$

extracts the first $n$ edges, and graph size is the cardinality of the edge set.
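The two restriction maps in (i) and (ii) can be contrasted on a small finite example. The sketch below uses hypothetical helper names (`restrict_vertices`, `restrict_edges`) and represents a graph either as a set of vertex pairs or as a tuple of edges in order; it also checks the coherence condition (3) for both maps.

```python
def restrict_vertices(edges, n):
    """Counting vertices: induced subgraph on the first n vertices 1..n."""
    return {(i, j) for (i, j) in edges if i <= n and j <= n}


def restrict_edges(edge_seq, n):
    """Counting edges: the first n edges of the edge sequence."""
    return tuple(edge_seq[:n])


edge_seq = ((1, 5), (2, 3), (1, 2), (4, 5))
edge_set = set(edge_seq)

# Condition (3): restricting to m and then to k equals restricting to k directly.
assert restrict_edges(restrict_edges(edge_seq, 3), 2) == restrict_edges(edge_seq, 2)
assert restrict_vertices(restrict_vertices(edge_set, 4), 3) == restrict_vertices(edge_set, 3)

print(restrict_vertices(edge_set, 3))  # the edges (1, 2) and (2, 3), in some order
print(restrict_edges(edge_seq, 2))     # ((1, 5), (2, 3))
```

The same graph yields different initial substructures under the two notions of size, which is why they lead to different spaces.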

To define probabilities on requires a notion of measurability, and hence a topology. We endow each countable set with its canonical, discrete topology, and with the smallest topology that makes all maps continuous. A topological space constructed in this manner is called procountable. Any procountable space admits a canonical “prefix metric”

$$d(x,x') := \inf_{n\in\mathbb{N}} \{\,2^{-n} \mid x|_n = x'|_n\,\}, \tag{6}$$

which is indeed an ultrametric. The ultrametric space is complete. An almost discrete space is a procountable space that is separable, and hence Polish. Throughout, all sets of infinite graphs are almost discrete spaces, or subsets of such spaces. If every set is finite, is called profinite (or Boolean, or a Stone space) [28]. A space is profinite iff it is almost discrete and compact. A random element of is defined up to almost sure equivalence by a sequence satisfying (4) almost surely. A probability measure on is similarly defined by its “finite-dimensional distributions” on the spaces , by standard arguments [e.g. 35]. This representation can be refined to represent measures on topological subspaces of ; see Appendix A.
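The prefix metric (6) can be illustrated on finite sequences, used here as a stand-in for infinite structures: $x|_n$ is taken to be the first $n$ entries. Two conventions in this sketch are assumptions of ours: sequences that differ in the first entry are assigned distance 1, and sequences that agree on their whole (finite) length are assigned distance 0. The function name `prefix_metric` is hypothetical.

```python
from itertools import product


def prefix_metric(x, y):
    """d(x, y) = 2^-n, where n is the length of the longest common prefix."""
    if len(x) != len(y):
        raise ValueError("sequences must have equal length")
    agree = 0
    for a, b in zip(x, y):
        if a != b:
            break
        agree += 1
    if agree == len(x):          # identical on the whole observed length
        return 0.0
    return 2.0 ** -agree


# Exhaustive check of the ultrametric inequality on binary sequences of length 4.
seqs = list(product((0, 1), repeat=4))
assert all(
    prefix_metric(x, z) <= max(prefix_metric(x, y), prefix_metric(y, z))
    for x in seqs for y in seqs for z in seqs
)

print(prefix_metric((0, 1, 1, 0), (0, 1, 0, 0)))  # 0.25
```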

For example, if is the finite set of simple, undirected graphs with vertex set , and extracts the induced subgraph on the first vertices, is the space of all simple, undirected graphs on , and its topology coincides with the one inherited from the product topology on . This space is the natural habitat of graph limit theory. The “cut norm” topology [14] coarsens the almost discrete topology on .

## 3 Subsampling algorithms

To formalize the notion of subsampling, first consider a finite input graph . The algorithm has access to a random source, which generates a sequence of i.i.d. uniform random variables in , with joint law . Given and the first uniform variables, the algorithm generates a random output graph in . Formally, this is a (jointly) measurable map

$$S_{n\to k} : Y_n \times [0,1]^k \to X_k, \tag{7}$$

which we will often read as an -valued random variable , parametrized by . Since each sampling step augments the previously sampled graph, we require these random variables to cohere accordingly, as

$$S_{n\to k}(y_n) \preceq S_{n\to k+1}(y_n) \qquad\text{almost surely.} \tag{8}$$

It suffices to require that exists for sufficiently large: For each and , there is an such that is defined for all . A sampling algorithm is a family as in (7) that satisfies (8).
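To make (7) and (8) concrete, a sampler can be written as a deterministic function of the input graph and the first $k$ uniform variables. The sketch below is one possible such map for vertex sampling without replacement, not the definition used in the text: each uniform picks one of the vertices not yet selected. The name `sample_from_uniforms` and the 0-based labels are assumptions.

```python
def sample_from_uniforms(edges, n, uniforms):
    """Map (y_n, u_1..u_k) to a graph on 0..k-1; the i-th uniform selects one
    of the vertices that remain after the first i-1 selections."""
    remaining = list(range(n))
    chosen = []
    for u in uniforms:
        idx = min(int(u * len(remaining)), len(remaining) - 1)  # guard u == 1.0
        chosen.append(remaining.pop(idx))
    label = {v: i for i, v in enumerate(chosen)}
    return {
        frozenset((label[a], label[b]))
        for a, b in edges
        if a in label and b in label
    }


cycle = {(i, (i + 1) % 6) for i in range(6)}
u = [0.12, 0.80, 0.47, 0.05]
small = sample_from_uniforms(cycle, 6, u[:3])
large = sample_from_uniforms(cycle, 6, u)

# Coherence as in (8): the size-3 output is the initial subgraph of the size-4 output.
assert small == {e for e in large if max(e) < 3}
```

Because the first $k$ selections do not depend on later uniforms, the outputs for increasing $k$ are nested, which is what (8) requires.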

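To explain sampling from an infinite graph $y$, we ask whether there is a limiting variable $S_\infty(y)$ such that convergence $S_{n\to k}(y|_n) \to S_\infty(y)|_k$ holds in distribution as $n\to\infty$. For that to be the case, it is certainly necessary that the limits

$$t_{x_k}(y) := \lim_{n\to\infty} \mathbb{P}\{S_{n\to k}(y|_n) = x_k\}$$

exist for all $x_k \in X_k$. We call the limit $t_{x_k}(y)$ a prefix density. These prefix densities can be collected into a vector,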

$$t(y) := (t_{x_k}(y))_{k\in\mathbb{N},\,x_k\in X_k} \qquad\text{which is a measurable map}\qquad t : Y \to [0,1]^{\cup_k X_k}.$$

Our first main result shows the necessary condition that exists is indeed sufficient:

###### Theorem 1.

Let be an almost discrete space, and any subset, equipped with the restriction of the Borel sets of . Let be a sampling algorithm . If the prefix densities exist on , there exists a jointly measurable function

$$S_\infty : Y \times [0,1] \to X \qquad\text{satisfying}\qquad S_{n\to k}(y|_n, U) \;\xrightarrow{\;d\;}\; S_\infty(y,U)|_k \quad\text{as } n\to\infty$$

for all and .

There is hence a random variable , with values in , which can be interpreted as an infinite or “asymptotically large” sample from an infinite input graph . Each restriction represents a sample of size from . If repeated application of sampling preserves the output distribution, i.e. if

we call the algorithm idempotent.

###### Remark 2.

The limit in output size is an inverse limit: A growing graph is assembled as in (4), and all information in can be recovered from . In contrast, the limit in input size is distributional, so the input graph can typically not be reconstructed, even from an infinite sample . The limit of Algorithm 1, for example, will output an empty graph if has a finite number of edges, or indeed if the number of edges in grows sub-quadratically in . We regard as a measurement of properties of a “population” (see [40] for a discussion of populations in network problems, and [45] for graph limits as populations underlying exchangeable graph data). An infinitely large sample makes asymptotic statements valid, in the sense that any effect of finite sample size can be made arbitrarily small, but does not exhaust the population.
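The loss of information described in Remark 2 can be seen in a direct computation. For output size $k=2$, Algorithm 1 returns a single edge with probability equal to the fraction of vertex pairs of the input that are adjacent. The sketch below (hypothetical helpers `edge_probability`, `path`, `complete`; 0-based vertices) evaluates this probability exactly for a path, whose edge count grows linearly, and for a complete graph.

```python
from fractions import Fraction
from itertools import combinations


def edge_probability(edges, n):
    """Probability that two vertices drawn uniformly without replacement
    from 0..n-1 are adjacent."""
    edge_set = {frozenset(e) for e in edges}
    pairs = list(combinations(range(n), 2))
    hits = sum(1 for p in pairs if frozenset(p) in edge_set)
    return Fraction(hits, len(pairs))


def path(n):
    return [(i, i + 1) for i in range(n - 1)]


def complete(n):
    return list(combinations(range(n), 2))


for n in (10, 100, 1000):
    print(n, edge_probability(path(n), n), edge_probability(complete(n), n))
```

For the path the probability equals $2/n$ and vanishes as $n$ grows, so the limiting sample is empty, while for the complete graph it stays at 1.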

## 4 Background: Invariance and symmetry

We use the term invariance to describe preservation of probability distributions under a family of transformations; we also call an invariance a symmetry if this family is specifically a group. Let be a standard Borel space, and $T$ a family of measurable (but not necessarily invertible) transformations . A random element of is $T$-invariant if its law remains invariant under every element of $T$,

$$t(X) \overset{d}{=} X \qquad\text{for all } t \in T. \tag{9}$$

Analogously, a probability measure is -invariant if the image measure satisfies for all . We denote the set of all -invariant probability measures on by . It is trivially convex, though possibly empty if the family is “too large”.

### 4.1 Ergodicity

Inference from a single instance relies on the concept of ergodicity. A Borel set is invariant if for all , and almost invariant if

$$P(A \,\triangle\, t^{-1}A) = 0 \qquad\text{for all } t \in T \text{ and all } P \in \mathrm{Inv}(T).$$

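We denote the system of all invariant sets $\sigma(T)$, and that of all almost invariant sets $\overline{\sigma}(T)$. Both are $\sigma$-algebras. Recall that a probability $P$ is trivial on a $\sigma$-algebra $\mathcal{C}$ if $P(A) \in \{0,1\}$ for all $A \in \mathcal{C}$. For a probability measure $P$ on $X$, we define: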

$$P \text{ is } T\text{-ergodic} \quad:\Leftrightarrow\quad P \text{ is } T\text{-invariant, and trivial on } \overline{\sigma}(T).$$

The set of all ergodic measures is denoted .

### 4.2 Groups and symmetries

We reserve the term symmetry for invariance under transformation families that form a group. A useful concept in this context is the notion of a group action: For example, if is a space of graphs, a group of permutations may act on a graph by permuting its vertices, by permuting its edges, by permuting certain subgraphs, etc. Such different effects of one and the same group can be formalized as maps that explain how a permutation affects the graph . Formally, let be a group, with unit element . An action of on is a map , customarily denoted , with the properties

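$$\text{(i)}\;\; T_e(x) = x \text{ for all } x \in X \qquad\text{and}\qquad \text{(ii)}\;\; T_\phi \circ T_{\phi'} = T_{\phi\phi'} \text{ for all } \phi,\phi' \in G.$$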

If is equipped with a topology, and with the corresponding Borel sets, is a measurable action if it is jointly measurable in both arguments. Any measurable action on defines a family of transformations . Clearly, each element of is a bimeasurable bijection, and is again a group. The orbit of an element of under a group action is the set . The orbits of form a partition of into disjoint sets. If there exists a Polish topology on that makes measurable, each orbit is a measurable set [7, §2.3].
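The two axioms of a group action, and the notion of an orbit, can be verified by enumeration for permutations acting on the vertices of a small graph. This is an illustrative sketch only: the helpers `act`, `compose`, and `orbit` are hypothetical, a permutation is stored as a tuple `phi` with `phi[i]` the image of vertex `i`, and the product `phi psi` is taken to mean "apply `psi` first".

```python
from itertools import permutations


def act(phi, graph):
    """T_phi(x): relabel every vertex v of the graph as phi[v]."""
    return frozenset(frozenset(phi[v] for v in e) for e in graph)


def compose(phi, psi):
    """The product phi psi, i.e. i -> phi[psi[i]]."""
    return tuple(phi[psi[i]] for i in range(len(psi)))


def orbit(graph, n):
    """All images of the graph under permutations of 0..n-1."""
    return {act(phi, graph) for phi in permutations(range(n))}


path = frozenset({frozenset({0, 1}), frozenset({1, 2})})
group = list(permutations(range(3)))
identity = (0, 1, 2)

assert act(identity, path) == path                                  # axiom (i)
assert all(
    act(phi, act(psi, path)) == act(compose(phi, psi), path)        # axiom (ii)
    for phi in group for psi in group
)
print(len(orbit(path, 3)))  # 3
```

The orbit of the path has three elements, one for each choice of the middle vertex, whereas the orbit of a triangle would consist of the triangle alone.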

### 4.3 Characterization of ergodic components

The main relevance of ergodicity to our purposes is that, informally, the elements of a model chosen as a subset can be distinguished from one another by means of a single realization, provided that is not too complex. In other words, if a random variable is assumed to be distributed according to some distribution in , then a single draw from determines this distribution unambiguously within . That is a consequence of the ergodic decomposition theorem, whose various shapes and guises are part of mathematical folklore. To give a reasonably general statement, we have to formalize that be “not too complex”: Call separable if a countable subset exists that defines the same set of invariant measures as ,

$$\mathrm{Inv}(T_0) = \mathrm{Inv}(T). \tag{10}$$

If so, we call a separating subset. Criteria for verifying separability are reviewed in Appendix A. If is separating, both the ergodic probability measures and the almost invariant sets defined by the two families coincide,

$$\mathrm{Erg}(T_0) = \mathrm{Erg}(T) \qquad\text{and}\qquad \overline{\sigma}(T_0) = \overline{\sigma}(T). \tag{11}$$

The following form of the decomposition theorem is amalgamated from [25, 29, 43]:

###### Theorem 3 (Folklore).

Let be a separable family of measurable transformations of a standard Borel space . Then the -ergodic measures are precisely the extreme points of , and for every pair of ergodic measures, there is a set such that and . A random element of is -invariant if and only if there is a random probability measure on such that

$$\xi \in \mathrm{Erg}(T) \qquad\text{and}\qquad P[X \in \bullet \mid \xi] = \xi(\bullet) \tag{12}$$

almost surely. If so, the law of is uniquely determined by the law of .

The decomposition theorem can be understood as a generalization of the representation theorems of de Finetti, of Aldous and Hoover, and similar results: In expectation, the almost sure identity (12) takes the weaker but more familiar form

$$P(\bullet) = \int_{\mathrm{Erg}(T)} \nu(\bullet)\,\mu_\xi(d\nu) \qquad\text{where } \mu_\xi := \mathcal{L}(\xi).$$

In de Finetti’s theorem, the ergodic measures are the laws of i.i.d. sequences; in the Aldous-Hoover theorem applied to simple, undirected, exchangeable graphs, they are those distributions represented by graphons; etc. The generality of Theorem 3 comes at a price: The theorems of de Finetti, Kingman, and Aldous-Hoover provide constructive representations of (12): There is a collection of independent, uniform random variables on , and a class of measurable mappings , such that each ergodic random element can be represented as for some . The representation is non-trivial in that each finite substructure can be represented analogously by a finite subset of the collection . Kallenberg [36] calls such a representation a coding. Existence of a coding can be much harder to establish than (12), and not all invariances seem to admit codings.

### 4.4 Definitions of exchangeability

The term exchangeability generically refers to invariance under an action of either the finitary symmetric group , or of the infinite symmetric group of all bijections of . For the purposes of Theorem 3, both definitions are typically equivalent: The group inherits its natural topology from the product space , which makes the subgroup a dense subset. If is a continuous action of on a metrizable space, the image hence lies dense in in pointwise convergence, which in turn implies is a separating subset for (see Section A.2).

In terms of their orbits, the two actions differ drastically: If , for example, and permutes sequence indices, each orbit of is countable. Not so for : Let be the set of sequences containing both an infinite number of 0s and of 1s. For any two , there exists a bijection with . The set thus constitutes a single, uncountable orbit of , which is complemented by a countable number of countable orbits. That illustrates the role of almost invariant sets: By de Finetti’s theorem, the ergodic measures are factorial Bernoulli laws. For all Bernoulli parameters , these concentrate on , and does not subdivide further into strictly invariant sets. In other words, does not provide sufficient resolution to guarantee mutual singularity in Theorem 3, but the almost invariant sets do. Vershik [50] gives a detailed account. Unlike Theorem 3, more explicit results like the law of large numbers in Section 6 rely on the orbit structure, and must be formulated in terms of .

## 5 Sampling and symmetry

We now consider the fundamental problem of drawing conclusions from a single observed instance in the context of sampling. For now, we assume the entire, infinite output graph is available. Consider a sampling algorithm , with input set and output space , defined as in Section 3, whose prefix densities exist for all . We generically denote its output distributions

$$P_y := \mathcal{L}(S_\infty(y)).$$

Suppose a model is chosen as a subset of . Can two elements be distinguished from one another given a single sample ? That can be guaranteed only if

$$P_y(A) = 1 \quad\text{and}\quad P_{y'}(A) = 0 \qquad\text{for some Borel set } A \subset X. \tag{13}$$

To decide more generally which distribution in (and hence which input graph ) accounts for , we define

$$\Sigma := \bigcap_{y\in Y}\Sigma_y \qquad\text{where}\qquad \Sigma_y := \{A \in \mathcal{B}(X) \mid P_y(A) \in \{0,1\}\}.$$

Then $\Sigma$ is a $\sigma$-algebra. From (13), we conclude:

Determining the input graph based on a single realization of is possible if the output laws are pairwise distinct on .

The sampling algorithm does not typically preserve all information provided by the input graph, due to the distributional limit defining . Thus, demanding that all pairs be distinguishable may be too strong a requirement. The -algebra defines a natural equivalence relation on ,

$$y \equiv_S y' \quad:\Leftrightarrow\quad P_y(A) = P_{y'}(A) \text{ for all } A \in \Sigma.$$

More colloquially, means and cannot be distinguished given a single realization . We note does not generally imply : The measures may be distinct, but detecting that difference may require multiple realizations. We call the algorithm resolvent if

$$y \equiv_S y' \quad\text{implies}\quad P_y = P_{y'}. \tag{14}$$

Let denote the equivalence class of . If is resolvent, we can define , and formulate the condition above as

$$P_{\hat{y}} \text{ and } P_{\hat{y}'} \text{ are mutually singular on } \Sigma \text{ whenever } \hat{y} \ne \hat{y}'. \tag{15}$$

Establishing mutual singularity requires identifying a suitable system of sets in (13), which can be all but impossible: Since is Polish, each measure has a unique support (a smallest, closed set with ), but these closed support sets are not generally disjoint. To satisfy (13), is chosen more generally as measurable, but unlike the closed support, the measurable support of a measure is far from unique. One would hence have to identify a (possibly uncountable) system of not uniquely determined sets, each chosen just so that (13) holds pairwise.

If an invariance holds, Theorem 3 solves the problem. That motivates the following definition: A measurable action of a group on is a symmetry of the algorithm if all output distributions are -ergodic. If is countable, that is equivalent to demanding

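$$\text{(i)}\;\; T_\phi(S_\infty(y)) \overset{d}{=} S_\infty(y) \text{ for all } y \in Y,\ \phi \in G \qquad\text{and}\qquad \text{(ii)}\;\; \sigma(G) \subset \Sigma. \tag{16}$$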

If is uncountable, (ii) must be strengthened to . Clearly, an algorithm that admits a separable symmetry is resolvent; thus, symmetry guarantees (15). We note mutual singularity could be deduced without requiring is a group action; this condition anticipates the law of large numbers in Section 6.

### 5.1 A remark: What can be said without symmetry

If we randomize the input graph by substituting a random element of with law for , the resulting output distribution is the mixture . We define the set of all such laws as , where is the space of probability measures on . Clearly, is convex, with the laws as its extreme points. Without further assumptions, we can obtain the following result. It makes no appeal to invariance, and cannot be deduced from Theorem 3 above.

###### Proposition 4.

Let be a sampling algorithm with prefix densities. Then for every , there exists a measurable subset of probability measures such that all measures in are (i) mutually singular and (ii) 0–1 on . There exists a random probability measure on such that and almost surely.

A structure similar to Theorem 3 is clearly recognizable. That said, the result is too weak for our purposes: The set of representing measures depends on , which means it cannot be used as a model, and the result does not establish a relationship between the measure and the elements of . Note it holds if, but not only if, .

## 6 Symmetric laws of large numbers

Consider a similar setup as above: A random variable takes values in a standard Borel space , and its distribution is invariant under a measurable action of a group . Let be a function in . If is separable, Theorem 3 shows that is generated by drawing an instance of —that is, by randomly selecting an ergodic measure—and then drawing . The expectation of given the instance of that generated is

$$\xi(f) = \mathbb{E}[f(X) \mid \xi] = \mathbb{E}[f(X) \mid \overline{\sigma}(G)] \qquad\text{a.s.}$$

Again by Theorem 3, observing completely determines the instance of . In principle, hence completely determines . These are all abstract quantities, however; is it possible to compute from a given instance of ?

If the group is finite, the elementary properties of conditional expectations imply

$$\mathbb{E}[f(X) \mid \overline{\sigma}(G)] = \frac{1}{|G|}\sum_{\phi\in G} f(T_\phi(X)) \qquad\text{almost surely,}$$

so is indeed given explicitly. The groups arising in the context of sampling are typically countably infinite. In this case, the average on the right is no longer defined. It is then natural to ask whether can be approximated by finite averages, i.e. whether there are finite sets such that

Since is invariant under each , each average on the left must be invariant at least approximately: A necessary condition for convergence is certainly that, for any , the relative size of the displacement can be made arbitrarily small by choosing large. That is formalized in the next condition, (17)(i).

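A countable group $G$ is amenable if there is a sequence $A_1, A_2, \ldots$ of finite subsets of $G$ with the property: For some $c > 0$ and all $\phi \in G$,

$$\text{(i)}\;\; \frac{|\phi A_k \cap A_k|}{|A_k|} \;\xrightarrow{\;k\to\infty\;}\; 1 \qquad\text{and}\qquad \text{(ii)}\;\; \Bigl|\bigcup_{j<k} A_j^{-1}A_k\Bigr| \le c\,|A_k| \text{ for all } k. \tag{17}$$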

A sequence satisfying (i) is called almost invariant. This first condition turns out to be the crucial one: If a sequence satisfying (i) exists, it is always possible to find a sequence satisfying (i) and (ii), by passing to a suitable subsequence if necessary [42, Proposition 1.4]. Thus, $G$ is amenable if it contains a sequence satisfying (i). Amenable groups arise first and foremost in ergodic theory [e.g. 24], but also, for example, in hypothesis testing, as the natural class of groups satisfying the Hunt-Stein theorem [13]. If (17) holds, and $T$ is a measurable action of $G$ on $X$, we call the measurable mapping

$$(x,k) \mapsto F^x_k(\,\cdot\,) := \frac{1}{|A_k|}\sum_{\phi\in A_k}\delta_{T_\phi(x)}(\,\cdot\,) \tag{18}$$

an empirical measure for the action $T$.
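Almost invariance, condition (i) in the definition of amenability, can be checked numerically in a simple case. The sketch below assumes the additive group of integers with the intervals $A_k = \{-k, \ldots, k\}$, an example chosen for illustration and not taken from the text; `overlap_ratio` is a hypothetical helper.

```python
def overlap_ratio(shift, k):
    """Return |(shift + A_k) & A_k| / |A_k| for A_k = {-k, ..., k} in (Z, +)."""
    a_k = set(range(-k, k + 1))
    shifted = {shift + a for a in a_k}
    return len(shifted & a_k) / len(a_k)

# For a fixed group element (here: shift by 3), the ratio approaches 1 as k grows.
ratios = [overlap_ratio(3, k) for k in (5, 50, 500)]
print(ratios)
```

Shifting displaces only a bounded number of points at the boundary of the interval, so the displaced fraction vanishes as the interval grows.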

###### Theorem 5.

Let be a random element of a Polish space , and functions in , such that almost surely under the law of . Let be a measurable action of a countable group satisfying (17), and the empirical measure defined by . If is invariant under , then

$$F^X_k(f_k) \xrightarrow{k\to\infty} \xi(f) \quad \text{almost surely,} \tag{19}$$

where is the random ergodic measure in (12). If moreover there is a function such that for all , convergence in (19) also holds in .

The finitary symmetric group $\mathbb{S}_\infty$ satisfies (17) for $A_k := \mathbb{S}_k$. The law of large numbers (19) hence holds generically for any “exchangeable random structure”, i.e. for any measurable action of $\mathbb{S}_\infty$. Special cases include the law of large numbers for de Finetti’s theorem, the continuity of Kingman’s correspondence [46, Theorem 2.3], and Kallenberg’s law of large numbers for exchangeable arrays [34]. They can be summarized as follows:

###### Corollary 6.

If a random element $X$ of a Polish space is invariant under a measurable action $T$ of $\mathbb{S}_\infty$, the empirical measure $F^X_k$ converges weakly to $\xi$ as $k \to \infty$, almost surely under the law of $X$.

For sequences, the empirical measure can be broken down further into a sum over sequence entries, and redundancy of permutations then shrinks the sum from $k!$ to $k$ terms. Now suppose that $X$ is specifically the output of a sampling algorithm:

###### Corollary 7.

Let be a sampling algorithm whose prefix densities exist for all . Suppose a countable amenable group is a symmetry group of under a measurable action . If samples from a random input graph , then

$$F^{S_\infty(Y)}_k(f_k) \xrightarrow{k\to\infty} P_Y(f) \quad \mathcal{L}(Y)\text{-a.s.}$$

holds for any functions satisfying and -a.s. for -almost all .

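For example, one can fix a finite structure $x_j$ of size $j$, and choose $f$ as the indicator $f(x) := \mathbb{I}\{x|_j = x_j\}$. Corollary 7 then implies

$$\frac{1}{|A_k|}\sum_{\phi\in A_k}\mathbb{I}\{T_\phi(S_\infty(y))|_j = x_j\} \xrightarrow{k\to\infty} t_{x_j}(y),$$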

which makes a (strongly) consistent estimator of the prefix density from output generated by the sampler. Here, is still an infinite structure. If the action is such that the elements of each set affect only the initial substructure of size , we can instead define for graphs of size . Thus, , and . If a sample is generated from using ,

$$\frac{1}{|A_k|}\sum_{\phi\in A_k}\mathbb{I}\{T_\phi(S_k(y))|_j = x_j\} \xrightarrow{k\to\infty} t_{x_j}(y)$$

consistently estimates from a finite sample of increasing size. The sampling algorithms discussed in the next section admit such estimators.
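The finite-sample estimator above can be sketched for the simplest case. The code assumes that the sets $A_k$ consist of all permutations of the $k$ sampled vertices and that the fixed structure is a single edge on two vertices; both choices, the path-graph input, and the function names are illustrative assumptions.

```python
from itertools import permutations

def permuted_prefix(adj, perm, j):
    """Induced subgraph on the first j vertices after relabeling vertex i as perm[i]."""
    inverse = [0] * len(perm)
    for i, p in enumerate(perm):
        inverse[p] = i
    return tuple(tuple(adj[inverse[a]][inverse[b]] for b in range(j)) for a in range(j))

def prefix_density_estimate(adj, pattern):
    """Fraction of permutations whose relabeled graph has `pattern` as its size-j prefix."""
    k, j = len(adj), len(pattern)
    hits = total = 0
    for perm in permutations(range(k)):
        hits += permuted_prefix(adj, perm, j) == pattern
        total += 1
    return hits / total

# Hypothetical sample: a path on 5 vertices, given as an adjacency matrix.
sample = [[1 if abs(a - b) == 1 else 0 for b in range(5)] for a in range(5)]
edge = ((0, 1), (1, 0))
estimate = prefix_density_estimate(sample, edge)
print(estimate)
```

For the single-edge pattern, the permutation average reduces to the edge density of the sample, here 4 edges out of 10 vertex pairs. Enumerating all $k!$ permutations is feasible only for small $k$; for larger samples one would average over randomly drawn permutations instead.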

## 7 Sampling by random transformation

We now consider group actions where each element of the group changes only a finite substructure: replaces a prefix of by some other structure of size . We can hence subdivide the group into subsets , for each , consisting of elements which only affect the prefix of size . Thus, . If only affects a prefix of size , then typically so does its inverse, and each subset is itself a group. If each subgroup is finite, the group is hence of the form

$$G = \bigcup_{n\in\mathbb{N}} G_n \quad \text{for some finite groups } G_1 \subset G_2 \subset \cdots. \tag{20}$$

A group satisfying (20) is called a direct limit or direct union of finite groups. Since it is countable, any measurable action satisfies Theorem 3. Plainly, $G$ also satisfies (17), with $A_n := G_n$. Thus, for any measurable action $T$,

$$(x,n) \mapsto F^x_n(\,\cdot\,) = \frac{1}{|G_n|}\sum_{\phi\in G_n}\delta_{T_\phi(x)}(\,\cdot\,)$$

is an empirical measure, and satisfies the law of large numbers (19).

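If each $\phi$ affects only a finite substructure, the action must commute with restriction, in the sense that

$$T_n(\phi, x|_n) = T(\phi, x)\big|_n \quad \text{for an action } T_n: G_n \times \mathbf{X}_n \to \mathbf{X}_n \text{ and all } \phi \in G_n,\ x \in \mathbf{X}. \tag{21}$$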
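Condition (21) can be checked directly for vertex relabelings. The sketch below makes two assumptions for illustration: a finite adjacency matrix stands in for an infinite graph, and the action is a permutation of the first $n$ vertex labels; `restrict` and `act` are hypothetical helper names.

```python
import random

def restrict(adj, n):
    """Induced subgraph on the first n vertices."""
    return [row[:n] for row in adj[:n]]

def act(perm, adj):
    """Relabel the first len(perm) vertices by perm; all other labels stay fixed."""
    size = len(adj)
    full = list(perm) + list(range(len(perm), size))
    out = [[0] * size for _ in range(size)]
    for a in range(size):
        for b in range(size):
            out[full[a]][full[b]] = adj[a][b]
    return out

rng = random.Random(0)
size, n = 8, 4
adj = [[0] * size for _ in range(size)]
for a in range(size):
    for b in range(a + 1, size):
        adj[a][b] = adj[b][a] = rng.randint(0, 1)

perm = list(range(n))
rng.shuffle(perm)
# Acting and then restricting agrees with restricting and then acting.
commutes = restrict(act(perm, adj), n) == act(perm, restrict(adj, n))
print(commutes)
```

The check only concerns the prefix of size $n$: the relabeling maps the first $n$ labels onto themselves, so the induced subgraph on those labels depends on the input only through its own prefix.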

We call any action of a direct limit group that satisfies (21) a prefix action. In most cases, one can think of a prefix action as a map that removes the subgraph from by some form of “surgery”, and then pastes in another graph of the same size. The action is hence a subset of the group of all permutations of . If is finite, so is , which is hence a valid choice for . Prefix actions include, for example, the case where is the action of on the first vertices, but it is worth noting that can be much larger: is typically of size exponential in . We observe:

###### Proposition 8.

Prefix actions on almost discrete spaces are continuous.

### 7.1 Random transformations

Transformation invariance can be built into a sampling algorithm by constructing the algorithm from a random transformation. For a random element $\Phi_n$ of $G_n$, we define

$$S_{n\to k}(y) := T(\Phi_n, y)\big|_k \quad \text{for each } y \in \mathbf{Y} \subset \mathbf{X}. \tag{22}$$

If $T$ is a prefix action, one can equivalently substitute for on the right-hand side. Algorithm 1 can for instance be represented in this manner, by choosing $\Phi_n$ as a uniform random permutation of the first $n$ vertices. The next results assume the following conditions: The uniform random elements $\Phi_n$ used in the construction are only defined on finite subgroups, but whenever prefix densities exist, one can once again take the limit in input size and obtain a limiting sampler $S_\infty$. These samplers are particularly well-behaved:

###### Theorem 9.

Let be a sampling algorithm satisfying (7.1). Then for all ,

$$\text{(i)}\ \ t \circ T_\phi = t \ \text{ for } \phi \in G \qquad \text{(ii)}\ \ t(S_\infty(y)) \overset{\text{a.s.}}{=} t(y) \qquad \text{(iii)}\ \ t(y) = t(y') \ \text{ iff } \ y \equiv_S y'.$$

Each output distribution is -invariant, and the law of a sample of size is -invariant. The algorithm is idempotent and resolvent, and any two output distributions and are either identical, or mutually singular.

One can ask whether it is even possible to recover properties of the input graph: If and is a statistic, can be estimated based on ? Since the sampling algorithm does not resolve differences between two equivalent input graphs $y \equiv_S y'$, a minimal requirement is that $f$ be constant on equivalence classes,

$$f(y) = f(y') \quad \text{whenever } y \equiv_S y'. \tag{24}$$

For algorithms defined by random transformations, the law of large numbers strengthens to:

###### Corollary 10.

Suppose a sampling algorithm satisfies (7.1), and is a Borel function satisfying (24). Require is -ergodic. Let be a sequence of functions on . Then for every with (i) and (ii) -a.s.,

$$\frac{1}{|G_k|}\sum_{\phi\in G_k} f_k(T_\phi(S_\infty(y))) \xrightarrow{k\to\infty} f(y) \quad P_y\text{-a.s.}$$

If is replaced by a -valued random variable , and (i) and (ii) hold -a.s., convergence holds -a.s.

### 7.2 The topology induced by a sampling algorithm

Any sampling algorithm whose prefix densities exist on a set induces a topology on this set, the weak topology of (i.e. the smallest topology on that makes each prefix density continuous). Informally speaking, if the equivalence classes of coincide with the fibers of (as is the case in Theorem 9), this is the smallest topology that distinguishes input points whenever they are distinguishable by the sampler. If is defined by Algorithm 1, the prefix densities are precisely the “homomorphism densities” of graph limit theory—depending on the definition, possibly up to normalization [22]. The weak topology of is hence the cut norm topology [14]. The cut norm topology is defined on the set of undirected, simple graphs with vertex set , and coarsens the almost discrete topology on this set. One may hence ask how this property depends on the sampler: Under what conditions on the subsampling algorithm does the topology induced by the sampler coarsen the topology of the input space? If the algorithm is defined by random transformation as above, that is always the case:

###### Proposition 11.

Let be a sampling algorithm defined as in (22) by a prefix action on an almost discrete space . Let be any topological subspace of such that the prefix densities exist for each . Then is continuous on .

## 8 Selecting vertices independently

Throughout this section, we choose both the input space and the output space as the set of simple, undirected graphs with vertex set $\mathbb{N}$, and $x|_n$ extracts the induced subgraph on the first $n$ vertices.

### 8.1 Exchangeability and graphons

Algorithm 1 selects a subgraph uniformly from the set of all subgraphs of size of the input graph . Such uniform random subgraphs are integral to the definition of graphons [14, 15], and the prefix densities are in this case precisely the homomorphism densities of graph limit theory (up to normalization). It is thus a well-known fact that Algorithm 1 induces the class of graphon models, whose relationship to exchangeable random graphs has in turn been clarified by Diaconis and Janson [22] and Austin [6].

Applied to this case, our results take the following form: Algorithm 1 can equivalently be represented as a random transformation (22). Define $T$ as the action of $\mathbb{S}_\infty$ that permutes the vertex labels of a graph, and rewrite Algorithm 1 as:

###### Algorithm 2.
i.) Draw $\Phi_n \sim \text{Uniform}(\mathbb{S}_n)$.
ii.) Generate the permuted graph $X_n := \Phi_n(y|_n)$.
iii.) Report the subgraph $S_{n\to k}(y) := X_n|_k$.
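A minimal sketch of Algorithm 2 follows, assuming the input graph is supplied as a finite adjacency matrix that contains at least the first $n$ vertices. The function name `sample_subgraph` and the use of Python's `random.Random` as the source of randomness are illustrative choices, not part of the paper.

```python
import random

def sample_subgraph(adj, n, k, rng):
    """Permute the first n vertex labels uniformly and report the first k vertices."""
    perm = list(range(n))
    rng.shuffle(perm)  # Phi_n drawn uniformly from the permutations of n labels
    # X_n := Phi_n(y|_n): vertex i of the prefix receives the new label perm[i].
    x_n = [[0] * n for _ in range(n)]
    for a in range(n):
        for b in range(n):
            x_n[perm[a]][perm[b]] = adj[a][b]
    # Report the induced subgraph on the first k labels of the permuted graph.
    return [row[:k] for row in x_n[:k]]

# Hypothetical input: the complete graph on 6 vertices.
complete = [[0 if a == b else 1 for b in range(6)] for a in range(6)]
subgraph = sample_subgraph(complete, n=6, k=3, rng=random.Random(1))
print(subgraph)
```

Because the permutation is uniform, the $k$ reported vertices form a uniformly chosen ordered $k$-subset of the first $n$ input vertices, which is why the procedure matches uniform subgraph selection.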

Clearly, Algorithms 1 and 2 are equivalent. It is possible to construct pathological input graphs for which prefix densities do not exist; we omit details, and simply define . Then is invariant under , and we obtain from Theorems 5 and 9:

###### Corollary 12.

Algorithm 1 is idempotent, and the limiting random graph is exchangeable. Let be a function constant on each equivalence class of . Then if functions satisfy for all outside a -null set,

$$\frac{1}{k!}\sum_{\pi\in\mathbb{S}_k} f_k(T_\pi(S_k(y))) \xrightarrow{k\to\infty} f(y) \quad \text{almost surely.}$$

The equivalence classes of