 # Priors on exchangeable directed graphs

Directed graphs occur throughout statistical modeling of networks, and exchangeability is a natural assumption when the ordering of vertices does not matter. There is a deep structural theory for exchangeable undirected graphs, which extends to the directed case via measurable objects known as digraphons. Using digraphons, we first show how to construct models for exchangeable directed graphs, including special cases such as tournaments, linear orderings, directed acyclic graphs, and partial orderings. We then show how to construct priors on digraphons via the infinite relational digraphon model (di-IRM), a new Bayesian nonparametric block model for exchangeable directed graphs, and demonstrate inference on synthetic data.


## 1 Introduction

Directed graphs arise in many applications involving pairwise relationships among objects, such as friendships, communication patterns in social networks, and logical dependencies (Wasserman and Faust, 1994). In machine learning, latent variable models are popular tools for modeling relational data in applications such as clustering (Wang and Wong, 1987; Kemp et al., 2006; Xu et al., 2007; Airoldi et al., 2008), feature modeling (Hoff et al., 2002; Miller et al., 2009; Palla et al., 2012), and network dynamics (Fu et al., 2009; Blundell et al., 2012; Heaukulani and Ghahramani, 2013; Kim and Leskovec, 2013). Many such models assume exchangeability, i.e., that the joint distribution of the edges is invariant under permutations of the vertices.

Undirected exchangeable graphs have been extensively studied. The foundational Aldous–Hoover theorem (Aldous, 1981; Hoover, 1979) characterizes undirected exchangeable graphs in terms of certain measurable functions. Our perspective in this paper is closer to the equivalent characterization in terms of graphons due to Lovász and Szegedy (2006). A graphon is a symmetric, measurable function $W\colon [0,1]^2 \to [0,1]$. Given a graphon $W$, there is an associated countably infinite exchangeable graph $G(\mathbb{N}, W)$ with random adjacency matrix $(G_{ij})_{i,j\in\mathbb{N}}$ defined as follows (see Figure 1):

$$
\begin{aligned}
U_i &\overset{\text{iid}}{\sim} \mathrm{Uniform}[0,1] && \text{for } i \in \mathbb{N}, \\
G_{ij} \mid U_i, U_j &\overset{\text{ind}}{\sim} \mathrm{Bernoulli}\bigl(W(U_i, U_j)\bigr) && \text{for } i < j,
\end{aligned}
\tag{1}
$$

and set $G_{ji} = G_{ij}$ for $i < j$, and $G_{ii} = 0$. Every exchangeable undirected graph can be written as a mixture of such sampling procedures. For $n \in \mathbb{N}$, we write $G(n, W)$ to denote the finite random undirected graph on underlying set $\{1, \dots, n\}$ induced by this sampling procedure. For more details on graphons and exchangeable graphs, see the survey by Diaconis and Janson (2008) and book by Lovász (2012).
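As an illustration of the sampling procedure in Equation (1), the following is a minimal sketch using only the Python standard library. The function name `sample_graph` and the example graphon $W(x,y) = xy$ are assumptions made for this illustration and do not come from the paper.

```python
import random

def sample_graph(n, W, rng=None):
    """Sample the adjacency matrix of a finite graph from a graphon W (Equation 1)."""
    rng = rng or random.Random()
    U = [rng.random() for _ in range(n)]      # latent positions U_i ~ Uniform[0,1]
    G = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            # G_ij | U_i, U_j ~ Bernoulli(W(U_i, U_j)) for i < j
            G[i][j] = 1 if rng.random() < W(U[i], U[j]) else 0
            G[j][i] = G[i][j]                 # symmetrize; the diagonal stays 0
    return U, G

# Hypothetical example graphon W(x, y) = x * y (symmetric, [0,1]-valued).
U, G = sample_graph(20, lambda x, y: x * y, random.Random(0))
```

Passing an explicit `random.Random` instance makes a draw reproducible; the latent positions `U` are returned so that a sample can be rearranged by increasing $U_i$ for display.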

Most work involving priors on exchangeable graphs has focused on undirected graphs; for various extensions, see the end of Section 5. For directed graphs, much of the work has extended the undirected case by using a single asymmetric measurable function to model the directed graph (see Orbanz and Roy (2015, §4) for a survey of such models). While such an asymmetric function is appropriate for exchangeable bipartite graphs (Diaconis and Janson, 2008), this representation cannot express all exchangeable directed graph models (see Section 3.1). Exchangeable directed graphs are also characterized by a sampling procedure given by the Aldous–Hoover theorem. As with the undirected case, we will work with an equivalent formulation in terms of measurable objects known as digraphons (Diaconis and Janson, 2008); see also Offner (2009), Aroskar (2012), and Aroskar and Cummings (2014). The Aldous–Hoover theorem implies that exchangeable directed graphs are determined by specifying a distribution on digraphons. Indeed, a digraphon is a more complicated representation for exchangeable directed graphs than a single asymmetric measurable function; a digraphon describes the possible directed edges between each pair of vertices jointly, rather than independently. We define digraphons in Section 2; for related work, see Section 5.

### 1.1 Contributions

This paper presents two main contributions. We first show how digraphons can be used to model directed graphs, highlighting special cases that make use of dependence in the edge directions. In particular, we characterize the form of digraphons that produce tournaments, linear orderings, directed acyclic graphs, and partial orderings (Section 3). We briefly discuss how these formulations can be used to produce estimators for directed graph models (Section 3.3).

Next, we give an explicit example of a prior on digraphons: we present the infinite relational digraphon model (di-IRM), a Bayesian nonparametric block model for exchangeable directed graphs, which uses a Dirichlet process stick-breaking prior to partition the unit interval and Dirichlet-distributed weights for each pair of classes in the partition (Section 4). We derive a collapsed Gibbs sampling inference procedure (Section 6), and demonstrate applications of inference on synthetic data (Section 7), showing some limitations of using the infinite relational model with an asymmetric measurable function to model edge directions independently.

## 2 Background

We begin by defining notation and providing relevant background on directed exchangeable graphs. Our presentation largely follows Diaconis and Janson (2008).

### 2.1 Notation

Let $[n] := \{1, \dots, n\}$. For a directed graph (or digraph) $G$ whose vertex set $V(G)$ is $[n]$ or $\mathbb{N}$, we write $(G_{ij})_{i,j \in V(G)}$ for its adjacency matrix, i.e., $G_{ij} = 1$ if there is an edge from vertex $i$ to vertex $j$, and $0$ otherwise. We will omit mention of the set $V(G)$ when it is clear. In general, for a directed graph, $(G_{ij})$ may be asymmetric, and we allow self-loops, which correspond to values $G_{ii} = 1$ on the diagonal. The adjacency matrix of an undirected graph (without self-loops) is a symmetric array $(G_{ij})$ satisfying $G_{ii} = 0$ for all $i$.

We write $X \overset{d}{=} Y$ to denote that the random variables $X$ and $Y$ are equal in distribution.

### 2.2 Exchangeability for directed graphs

A random (infinite) directed graph $G$ on $\mathbb{N}$ is exchangeable if its joint distribution is invariant under all permutations $\pi$ of the vertices:

$$
(G_{ij})_{i,j\in\mathbb{N}} \overset{d}{=} (G_{\pi(i)\pi(j)})_{i,j\in\mathbb{N}}. \tag{2}
$$

By the Kolmogorov extension theorem, it is equivalent to ask for this to hold only for those permutations $\pi$ that move a finite number of elements of $\mathbb{N}$.

Such an array is sometimes called jointly exchangeable. The case where the distribution is preserved under permutation of each index separately, i.e., where $(G_{ij})_{i,j\in\mathbb{N}} \overset{d}{=} (G_{\pi(i)\sigma(j)})_{i,j\in\mathbb{N}}$ for arbitrary permutations $\pi$ and $\sigma$, is called separately exchangeable, and arises for adjacency matrices of bipartite graphs.

### 2.3 Digraphons

As described by Diaconis and Janson (2008), using the Aldous–Hoover theorem one may show that every exchangeable countably infinite directed graph is expressible as a mixture of $G(\mathbb{N}, W)$ with respect to some distribution on digraphons $W$.

We now define digraphons; in Section 2.4 we will describe the sampling procedure that yields $G(\mathbb{N}, W)$.

###### Definition 2.1.

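A digraphon is a 5-tuple $W = (W_{00}, W_{01}, W_{10}, W_{11}, w)$, where $W_{ab}\colon [0,1]^2 \to [0,1]$, for $a, b \in \{0,1\}$, and $w\colon [0,1] \to \{0,1\}$ are measurable functions satisfying the following conditions for all $x, y \in [0,1]$:

$$
\begin{aligned}
W_{00}(x,y) &= W_{00}(y,x); \\
W_{11}(x,y) &= W_{11}(y,x); \\
W_{01}(x,y) &= W_{10}(y,x);
\end{aligned}
\tag{3}
$$

and

$$
W_{00}(x,y) + W_{01}(x,y) + W_{10}(x,y) + W_{11}(x,y) = 1.
$$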

Given a digraphon $W$, write $W_4$ for the map $[0,1]^2 \to [0,1]^4$ given by $(W_{00}, W_{01}, W_{10}, W_{11})$.

The functions $W_{ab}$ represent the joint probability of $G_{ij}$ and $G_{ji}$ for $i \neq j$, i.e.,

$$
\Pr(G_{ij} = a, G_{ji} = b) = W_{ab}(U_i, U_j), \tag{4}
$$

conditioned on $U_i$ and $U_j$. In this way, $W_{00}$ determines the probability of having neither edge direction between vertices $i$ and $j$, $W_{01}$ of only having a single edge to $i$ from $j$ (“right-to-left”), $W_{10}$ of a single edge from $i$ to $j$ (“left-to-right”), and $W_{11}$ of directed edges in both directions between $i$ and $j$. The function $w$ represents the probability of $G_{ii} = 1$; because it is $\{0,1\}$-valued, this merely states whether or not $i$ has a self-loop.

(There is an equivalent alternative set of objects that may be used to specify an exchangeable digraph, where $W_{00}, W_{01}, W_{10}, W_{11}$ are as before and a single weight in $[0,1]$ gives the marginal probability of a self-loop, which is independent of the other edges; see Diaconis and Janson (2008) for details.)

### 2.4 Sampling from a digraphon

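The adjacency matrix $(G_{ij})_{i,j\in\mathbb{N}}$ of a countably infinite random graph $G(\mathbb{N}, W)$ is determined by the following sampling procedure:

1. Draw $U_i \overset{\text{iid}}{\sim} \mathrm{Uniform}[0,1]$ for $i \in \mathbb{N}$.

2. For each pair of distinct vertices $i, j \in \mathbb{N}$, assign the edge values for $G_{ij}$ and $G_{ji}$ jointly, according to an independent draw such that Equation (4) holds.

3. Assign self-loops $G_{ii} = w(U_i)$ for all $i \in \mathbb{N}$.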

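In other words, in step 2 we assign $(G_{ij}, G_{ji}) \mid U_i, U_j \sim \mathrm{Categorical}\bigl(W_{00}(U_i,U_j), W_{01}(U_i,U_j), W_{10}(U_i,U_j), W_{11}(U_i,U_j)\bigr)$, where we interpret the categorical random variable as a distribution over the choices $(0,0), (0,1), (1,0), (1,1)$, in that order. Note that step 2 is well-defined by the symmetry conditions in Equation (3): exchanging the roles of $i$ and $j$ gives the same joint distribution on the pair of edges. Figure 2 illustrates this sampling procedure via a schematic.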

An analogous sampling procedure yields finite random digraphs: Given $n \in \mathbb{N}$, in step 1, instead sample only $U_i$ for $i \in \{1, \dots, n\}$. Then determine $G_{ij}$ for $i, j \in \{1, \dots, n\}$ as before. We write $G(n, W)$ to denote the random digraph thereby induced on $\{1, \dots, n\}$.

Figure 2: Schematic illustrating digraphon sampling procedure for $W = (W_{00}, W_{01}, W_{10}, W_{11}, w)$. The $x$-axis is vertical and $y$-axis horizontal, with $(0,0)$ in the upper left, so that the notation $W_{ab}(x,y)$ coheres with the usual (row, column) convention for matrix indexing.
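The finite sampling procedure can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the names `sample_digraph`, `W4`, and `PAIRS` are hypothetical, the digraphon is passed as a function returning the 4-tuple of joint edge probabilities together with a $\{0,1\}$-valued self-loop function, and the constant example digraphon is chosen purely for demonstration.

```python
import random

# Joint outcomes (G_ij, G_ji), in the order (0,0), (0,1), (1,0), (1,1).
PAIRS = [(0, 0), (0, 1), (1, 0), (1, 1)]

def sample_digraph(n, W4, w, rng=None):
    """Sample a finite digraph on n vertices.

    W4(x, y) returns (W00, W01, W10, W11) at (x, y); w(x) returns 0 or 1.
    """
    rng = rng or random.Random()
    U = [rng.random() for _ in range(n)]                 # step 1: latent positions
    G = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            # step 2: draw both edge directions jointly for the pair {i, j}
            a, b = rng.choices(PAIRS, weights=W4(U[i], U[j]))[0]
            G[i][j], G[j][i] = a, b
        G[i][i] = w(U[i])                                # step 3: self-loops
    return U, G

# Constant digraphon (0, 1/2, 1/2, 0) with no self-loops: each pair of
# distinct vertices receives exactly one directed edge.
U, G = sample_digraph(20, lambda x, y: (0.0, 0.5, 0.5, 0.0), lambda x: 0,
                      random.Random(1))
```

Drawing the pair $(G_{ij}, G_{ji})$ in a single categorical step is what allows the two directions to be dependent; visiting only $i < j$ relies on the symmetry conditions of the digraphon.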

### 2.5 Aldous–Hoover theorem for directed graphs

Diaconis and Janson (2008) derived the following corollary of the Aldous–Hoover theorem for directed graphs.

###### Theorem 2.2 (Diaconis–Janson).

Every exchangeable random countably infinite directed graph is obtained as a mixture of $G(\mathbb{N}, W)$; in other words, as $G(\mathbb{N}, W)$ for some random digraphon $W$.

Therefore the problem of specifying the distribution of an infinite exchangeable digraph may be equivalently viewed as the problem of specifying a distribution on digraphons.

## 3 Digraphons and statistical modeling

We first motivate the use of digraphons instead of asymmetric measurable functions for modeling exchangeable directed graphs. We then discuss the representations via digraphons for several random structures which are special cases of directed graphs. Finally, we discuss how to estimate digraphons, in the context of both Bayesian and frequentist estimation.

### 3.1 Modeling limitations of asymmetric measurable functions

Asymmetric measurable functions characterize exchangeable bipartite graphs by the Aldous–Hoover theorem for separately exchangeable arrays; for details see Diaconis and Janson (2008, §8). These functions can also be used to generate and model directed graphs (without self-loops) by considering the edge directions $G_{ij}$ and $G_{ji}$ independently, i.e., with $G_{ij}$ and $G_{ji}$ independent for all $i \neq j$, conditioned on $U_i$ and $U_j$, according to the following sampling procedure:

$$
\begin{aligned}
U_i &\overset{\text{iid}}{\sim} \mathrm{Uniform}[0,1] && \text{for } i \in \mathbb{N}, \\
G_{ij} \mid U_i, U_j &\overset{\text{ind}}{\sim} \mathrm{Bernoulli}\bigl(W_{\mathrm{asym}}(U_i, U_j)\bigr) && \text{for } i \neq j,
\end{aligned}
$$

and $G_{ii} = 0$ for $i \in \mathbb{N}$. Currently priors on these asymmetric functions are popular in Bayesian modeling of directed graphs, as we note in Section 5.

Asymmetric measurable functions are also equivalent to the following special case of the digraphon representation. Via the above sampling procedure, every asymmetric measurable function $W_{\mathrm{asym}}$ yields the same distribution on directed graphs as the digraphon $W$ given pointwise by

$$
W(x,y) = \bigl((1-p)(1-q),\ (1-p)q,\ p(1-q),\ pq,\ 0\bigr),
$$

where $p = W_{\mathrm{asym}}(x,y)$ and $q = W_{\mathrm{asym}}(y,x)$. In particular, conditioned on $U_i$ and $U_j$, the presence of an edge from $i$ to $j$ and the presence of an edge from $j$ to $i$ are independent.
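The pointwise correspondence above can be checked numerically. The following sketch (the helper name `digraphon_from_asym` and the example function $x(1-y)$ are assumptions for illustration) builds the four joint edge probabilities from an asymmetric function and evaluates them at one point.

```python
def digraphon_from_asym(W_asym):
    """Return W4(x, y) = (W00, W01, W10, W11) for independent edge directions."""
    def W4(x, y):
        p = W_asym(x, y)   # probability of an edge in the x-to-y direction
        q = W_asym(y, x)   # probability of an edge in the y-to-x direction
        return ((1 - p) * (1 - q), (1 - p) * q, p * (1 - q), p * q)
    return W4

# Hypothetical asymmetric function, used only for this example.
W4 = digraphon_from_asym(lambda x, y: x * (1 - y))
w00, w01, w10, w11 = W4(0.3, 0.8)
```

The resulting 4-tuple sums to one, satisfies $W_{01}(x,y) = W_{10}(y,x)$, and always obeys $W_{00} W_{11} = W_{01} W_{10}$, which is the product structure characteristic of independent edge directions.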

On the other hand, many common kinds of digraphs are not obtainable from a single asymmetric function. Consider the following two classes:

1. Undirected graphs: between any two vertices $i$ and $j$, there are either no edges ($G_{ij} = G_{ji} = 0$), or edges in both directions ($G_{ij} = G_{ji} = 1$).

2. Tournaments: between any two vertices $i$ and $j$, there is exactly one directed edge, i.e., $G_{ij} = 1$ or $G_{ji} = 1$ but not both.

For digraphs of either of these two sorts, the directions are correlated, and hence not obtainable from the above sampling procedure for an asymmetric measurable function, as this procedure generates $G_{ij}$ and $G_{ji}$ independently. This demonstrates how the use of an asymmetric measurable function is poorly suited for graphs with correlated edge directions. Though constructing a model for general digraphs using the function $W_{\mathrm{asym}}$ leads to misspecification, one might hope to perform inference nevertheless; however, as we show in Section 7.2, doing so may fail to discern structure that may be discovered through posterior inference with respect to a prior on digraphons.
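To make the correlation concrete, here is a short worked calculation (added for illustration; it uses only the definition of a digraphon). Conditioned on $U_i = x$ and $U_j = y$, the marginal edge probabilities are $\Pr(G_{ij} = 1) = W_{10} + W_{11}$ and $\Pr(G_{ji} = 1) = W_{01} + W_{11}$, so

$$
\operatorname{Cov}(G_{ij}, G_{ji} \mid U_i = x, U_j = y) = W_{11} - (W_{10} + W_{11})(W_{01} + W_{11}),
$$

where every $W_{ab}$ is evaluated at $(x,y)$. For a digraphon arising from an asymmetric function, with $p = W_{\mathrm{asym}}(x,y)$ and $q = W_{\mathrm{asym}}(y,x)$, this gives $pq - pq = 0$. For the constant undirected digraphon with $(W_{00}, W_{01}, W_{10}, W_{11}) = (\tfrac12, 0, 0, \tfrac12)$ it gives $\tfrac12 - \tfrac14 = \tfrac14$, and for the constant tournament digraphon with $(W_{00}, W_{01}, W_{10}, W_{11}) = (0, \tfrac12, \tfrac12, 0)$ it gives $0 - \tfrac14 = -\tfrac14$. Neither of these two constant digraphons can therefore be of the product form induced by an asymmetric function.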

In contrast to the use of asymmetric measurable functions, where one considers edge directions independently, with digraphons one considers the edge directions between vertex $i$ and vertex $j$ jointly, as in Equation (4). Thus, digraphons give a more general and flexible representation for modeling digraphs.

### 3.2 Special cases

We discuss several special cases of directed graphs and specify the form of their digraphon representations.

#### 3.2.1 Undirected graphs

Undirected graphs can be viewed as directed graphs with no self-loops, where each pair of distinct vertices either has edges in both directions or in neither. Hence a digraphon that yields an undirected graph is one having no probability in the single edge directions, i.e., such that $W_{01} = 0$ (or equivalently, $W_{10} = 0$) and $w = 0$. Such a digraph is therefore determined by merely specifying the graphon $W_{11}$, where $W_{00} = 1 - W_{11}$ is implicit.

###### Example 3.1.

In Figure 3, we display an example of a digraphon whose samples are undirected Erdős–Rényi graphs with edge density $\tfrac12$, i.e.,

$$
(W_{00}, W_{01}, W_{10}, W_{11}, w) = (\tfrac12, 0, 0, \tfrac12, 0).
$$

This digraphon corresponds to the constant graphon $W \equiv \tfrac12$.

Figure 3: left: Erdős–Rényi undirected graph as a digraphon $W$; right: $G(20, W)$.

#### 3.2.2 Tournaments

A tournament is a directed graph without self-loops, where for each pair of vertices, there is an edge in exactly one direction. In other words, a tournament has $G_{ij} = 1$ if and only if $G_{ji} = 0$ for $i \neq j$, and $G_{ii} = 0$. Therefore a digraphon yielding a tournament is one satisfying $w = 0$ and $W_{00} = W_{11} = 0$ (or equivalently, $W_{01} + W_{10} = 1$).

###### Example 3.2.

An example of a tournament digraphon is displayed in Figure 4:

$$
(W_{00}, W_{01}, W_{10}, W_{11}, w) = (0, \tfrac12, \tfrac12, 0, 0).
$$

The random tournament induced by sampling from this digraphon is almost surely isomorphic to a countable structure known as the generic tournament. (For more details on this example, see Chung and Graham (1991) and Diaconis and Janson (2008, Example 9.2).)

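As discussed in Section 3.2.1, exchangeable undirected graphs can be specified in terms of single functions (graphons) and their associated sampling procedure (described in Equation 1). Similarly, tournaments also have a single-function representation and associated sampling procedure. Namely, a tournament digraphon is determined by a measurable function $T\colon [0,1]^2 \to [0,1]$ that is anti-symmetric in the sense that $T(x,y) = 1 - T(y,x)$ for all $x, y \in [0,1]$ (corresponding to the digraphon condition $W_{10}(x,y) = W_{01}(y,x)$, with $T = W_{10}$). To sample from $T$, first sample $U_i \overset{\text{iid}}{\sim} \mathrm{Uniform}[0,1]$ for $i \in \mathbb{N}$, and then set $G_{ij} \mid U_i, U_j \overset{\text{ind}}{\sim} \mathrm{Bernoulli}(T(U_i, U_j))$ for $i < j$ (and $G_{ji} = 1 - G_{ij}$). The digraphon in Example 3.2 corresponds to the anti-symmetric, measurable function $T \equiv \tfrac12$.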

Tournament digraphons have recently been studied in detail by Thörnblad (2016), which calls the single function a tournament kernel.

Statistical models for tournaments appear in the ranking theory literature, often using a variant of the Bradley–Terry model (Bradley and Terry, 1952), first described by Zermelo (1929). For more details, including the relation to graphons, see Chatterjee (2015, §2.7). This literature, and related estimation papers such as Chatterjee and Mukherjee (2016), is also often framed in terms of a single-function representation.

Figure 4: left: Digraphon $W$ that yields a generic tournament; right: $G(20, W)$.

#### 3.2.3 Linearly ordered sets

A digraph is a (strict) linear ordering when the directed edge relation is transitive, and every pair of distinct vertices has an edge in exactly one direction. Consider the digraphon given by $W_{00} = W_{11} = 0$ and $w = 0$, where

$$
W_{10}(x,y) =
\begin{cases}
1 & \text{if } x < y, \\
0 & \text{otherwise},
\end{cases}
$$

and $W_{01}(x,y) = W_{10}(y,x)$.

The countable directed graph induced by sampling from this digraphon is almost surely a linear order. In fact, this is essentially the only such example: by Glasner and Weiss (2002, §8), its distribution is the same as that of every exchangeable linear ordering. (In other words, any digraphon yielding the (unique) exchangeable linear ordering is weakly isomorphic to this one; see Section 7 for details.) Furthermore, the countable linear ordering obtained from sampling this digraphon is almost surely dense and without endpoints, and hence isomorphic to the rationals. A finite sample with $n$ vertices has distribution equal to the uniform measure on all ways of linearly ordering $\{1, \dots, n\}$.

This digraphon is displayed in Figure 5 alongside a 20-vertex random sample, rearranged by increasing $U_i$; note that for almost every sample, the corresponding rearranged graph will have all edges strictly above the diagonal.
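The rearrangement described above is easy to reproduce. In the following sketch (function names are hypothetical), an edge runs from $i$ to $j$ exactly when $U_i < U_j$, and sorting the vertices by their latent values places every edge strictly above the diagonal of the adjacency matrix.

```python
import random

def sample_linear_order(n, rng=None):
    """Sample n vertices from the linear-ordering digraphon."""
    rng = rng or random.Random()
    U = [rng.random() for _ in range(n)]
    # W10(x, y) = 1 if x < y: edge from i to j exactly when U_i < U_j
    G = [[1 if U[i] < U[j] else 0 for j in range(n)] for i in range(n)]
    return U, G

def rearrange(U, G):
    """Reorder rows and columns of G by increasing latent value U_i."""
    order = sorted(range(len(U)), key=lambda i: U[i])
    return [[G[i][j] for j in order] for i in order]

U, G = sample_linear_order(20, random.Random(2))
R = rearrange(U, G)
```

Because the latent values are almost surely distinct, the rearranged matrix `R` is the strictly upper-triangular all-ones matrix, and the induced ordering of the vertices is a uniformly random permutation.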

#### 3.2.4 Directed acyclic graphs

A directed acyclic graph (DAG) is a directed graph having no directed path from any vertex to itself. Various work has focused on models for DAGs (e.g., see Roverato and Consonni (2004)), and especially their use in describing random instances of directed graphical models (also known as Bayesian networks). DAGs also arise naturally as networks describing non-circular dependencies (e.g., among software packages), and in other key data structures.

One can show, using the main result of Hladký et al. (2015) (which we describe in Section 3.2.5), that any exchangeable DAG can be obtained from sampling a digraphon satisfying $W_{10}(x,y) = 0$ for $x \geq y$ and $W_{11} = w = 0$. Note that this constrains the digraphon to have the same zero-valued regions as those in the canonical presentation of a linear ordering digraphon (as described above and displayed in Figure 5), except that $W_{00}$ may be arbitrary. (Equivalently, for $x < y$, the value $W_{10}(x,y)$ may be chosen arbitrarily, so that the remaining terms are given by $W_{00}(x,y) = W_{00}(y,x) = 1 - W_{10}(x,y)$ and $W_{01}(y,x) = W_{10}(x,y)$.) A digraphon of this form thereby specifies one way in which the exchangeable DAG can be topologically ordered (i.e., extended to some exchangeable linear ordering).

Specifying a digraphon in this way always yields a DAG upon sampling, as the standard linear ordering on $[0,1]$ does not admit directed cycles, and one can show that all exchangeable DAGs arise in this way, as mentioned above.

###### Example 3.3.

An example of a digraphon that yields exchangeable DAGs is the generic DAG digraphon given by

$$
W_{00} = \tfrac12, \qquad
W_{10}(x,y) =
\begin{cases}
\tfrac12 & \text{if } x < y, \\
0 & \text{otherwise},
\end{cases}
\qquad
W_{11} = w = 0,
$$

where $W_{01}$ is such that $W_{01}(x,y) = W_{10}(y,x)$. This example is displayed in Figure 6. We can see that the reordered sample is indeed a DAG, as the edges clearly all lie above the diagonal in the adjacency matrix.

#### 3.2.5 Partially ordered sets

A partially ordered set, or poset, is a set with a binary relation $\preceq$ that is reflexive, antisymmetric, and transitive. A poset can be viewed as a digraph having a directed edge from $i$ to $j$ if and only if $i \preceq j$. Note that the transitive closure of any DAG is a poset, i.e., if in a DAG, there is a directed path from $i$ to $j$, the transitive closure has an edge from $i$ to $j$, thereby producing a partial ordering. (One can similarly define the “transitive closure digraphon” of a digraphon that yields DAGs to obtain a digraphon yielding the corresponding transitive closures.) Conversely, any poset (with self-loops removed) is already a DAG. Therefore exchangeable posets are obtainable by some digraphon of the form described in Section 3.2.4 (except with $w = 1$), though not all such digraphons yield posets. Analogously, representing an exchangeable poset via a digraphon of this form amounts to specifying a linearization of the poset.

Janson (2011) develops a theory of poset limits (or posetons) and their relation to exchangeable posets. By Hladký et al. (2015), any exchangeable poset is given by some digraphon $W$ for which $W_{10}(x,y) > 0$ implies that $x < y$, i.e., $W$ is compatible with the standard linear ordering on $[0,1]$.

###### Example 3.4.

Consider the following example of a digraphon that yields an exchangeable poset, specified by the following blockmodel:

$$
W_{10}(x,y) =
\begin{cases}
\tfrac12 & \text{if } x < \tfrac14 \text{ and } \tfrac14 \leq y < \tfrac34, \\
\tfrac12 & \text{if } \tfrac14 \leq x < \tfrac34 \text{ and } y \geq \tfrac34, \\
1 & \text{if } x < \tfrac14 \text{ and } y \geq \tfrac34, \text{ and} \\
0 & \text{otherwise},
\end{cases}
$$

where $W_{01}$ is such that $W_{01}(x,y) = W_{10}(y,x)$, where $W_{11} = 0$ and $w = 1$, and where $W_{00} = 1 - W_{01} - W_{10}$.

This example is displayed in Figure 7. In particular, the block structure of the model is reflected in the rearranged sample on the right. We can see that this is an exchangeable poset: if the loops (the diagonal) are removed from this digraph, it is a DAG (as all the edges in the rearranged sample are above the diagonal), and one can check that it is transitively closed.

This is a key example among posets. Work of Kleitman and Rothschild (1975) and Compton (1988), characterizing the combinatorial structure of a typical large finite poset, implies that the sequence of uniform distributions on labeled posets of size $n$ converges (in the sense of poset limits) to this example.

Figure 7: left: Example of a 3×3 SBM poset digraphon; right: $G(20, W)$.

### 3.3 Digraphon estimation

For undirected graphs, the graphon estimation problem has received considerable attention in recent years. In graphon estimation, one seeks to infer either the function $W$, or the associated probability matrix with entries $W(U_i, U_j)$, given a single sample (or multiple samples) of the graph. From the Bayesian modeling perspective, one places a prior on graphons and performs an inference procedure to estimate the parameters of the random function prior.

From the frequentist perspective, one is interested in producing an estimator for a fixed graphon, and many such algorithms have been developed, including histogram and degree-sorting based methods. To produce a frequentist digraphon estimator, one can extend methods developed for graphons. Just as a single asymmetric measurable function is insufficient for representing correlated edge directions, an estimator must likewise treat the edge directions jointly. Although a directed graph can be simply represented with a single asymmetric matrix, digraphon estimators must consider the impact on pairs of entries $(G_{ij}, G_{ji})$ jointly when partitioning, rearranging, or otherwise manipulating vertices.

##### Histogram estimators for digraphons

A histogram estimation procedure for graphons partitions the vertices into several classes, and then uses the average edge density across each pair of classes as an estimate of the probability of an edge between two vertices in those classes. This reduces the problem to that of estimating a partition that yields a good estimate of these edge densities. Many methods have been developed for this problem; for further details see the references within Borgs et al. (2015, §§1.3 and 1.7).

To estimate a digraphon (ignoring loops), we must estimate four edge-direction histograms, where the goal is to estimate a partition of the vertices that simultaneously yields good estimates of the four types of edge densities. After producing a partition of the vertices, one likewise computes the average edge density in each of the four cases, resulting in four histograms. (If considering loops, there is one additional 1-dimensional histogram whose estimates are to be jointly optimized by the partition.)
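The second half of this procedure, computing the four edge-direction histograms once a partition is given, can be sketched as follows. The function name `digraphon_histograms` and the toy graph are assumptions for illustration; estimating a good partition, which is the hard part of the problem, is not addressed here.

```python
def digraphon_histograms(G, z, K):
    """Block averages of the four joint edge types for a given partition.

    G is a 0/1 adjacency matrix, z[i] in range(K) is the class of vertex i.
    Returns H with H[r][s] = estimated (W00, W01, W10, W11) on block (r, s).
    """
    n = len(G)
    counts = [[[0, 0, 0, 0] for _ in range(K)] for _ in range(K)]
    totals = [[0] * K for _ in range(K)]
    for i in range(n):
        for j in range(n):
            if i == j:
                continue                       # loops are ignored
            ab = 2 * G[i][j] + G[j][i]         # index into (00, 01, 10, 11)
            counts[z[i]][z[j]][ab] += 1
            totals[z[i]][z[j]] += 1
    return [[[c / totals[r][s] if totals[r][s] else 0.0 for c in counts[r][s]]
             for s in range(K)] for r in range(K)]

# Toy example: every vertex in class 0 points to every vertex in class 1.
G = [[0, 0, 1, 1],
     [0, 0, 1, 1],
     [0, 0, 0, 0],
     [0, 0, 0, 0]]
H = digraphon_histograms(G, [0, 0, 1, 1], 2)
```

Because each ordered pair $(i,j)$ contributes the joint type $(G_{ij}, G_{ji})$ to block $(z_i, z_j)$, the estimates automatically satisfy the digraphon symmetry conditions, for instance `H[r][s][1] == H[s][r][2]`.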

The Frieze–Kannan and Szemerédi regularity lemmas lead to bounds on how well a large graph can be approximated using edge densities across a partition (Lovász, 2012, Chapters 9 and 10); see also Kallenberg (1999). The generalization of the Szemerédi regularity lemma to directed graphs by Alon and Shapira (2004) likewise provides a bound in terms of directed edge densities.

##### Degree-sorting estimators for digraphons

Many degree-sorting algorithms have been proposed for graphon estimation. These algorithms often involve “sorting” followed by “smoothing”. In the sorting step, the vertices are sorted by their degrees, where the degree of a vertex $i$ is defined to be $\sum_j G_{ij}$. In the smoothing step, the $\{0,1\}$-valued adjacency matrix is used to produce a $[0,1]$-valued matrix using some smoothing algorithm. For example, Chan and Airoldi (2014) compare a degree-sorting algorithm that uses total variation distance minimization as a smoothing step to one that uses universal singular value thresholding (Chatterjee, 2015) as the smoothing step. Degree-sorting estimators assume that the degree function $x \mapsto \int_0^1 W(x,y)\,dy$ of the graphon is strictly monotonizable, i.e., in order for sorting to be effective, the degrees of the vertices must vary.

This idea can be similarly applied to digraphon estimation: First sort the degrees of the vertices by the four types of edge directions, to obtain four adjacency matrices, and then smooth these matrices. It would suffice to require, after possibly applying a single measure-preserving transformation to $[0,1]$, that the map taking $x$ to the 4-tuple of degree functions $\int_0^1 W_{ab}(x,y)\,dy$, for $a, b \in \{0,1\}$, is strictly increasing with respect to the lexicographic ordering of $[0,1]^4$.

In this paper, we do not comment further on digraphon estimators, but many other graphon estimation techniques should generalize similarly. One way of describing the general pattern is to jointly consider the corresponding techniques applied to the four matrices obtained from the adjacency matrix restricted to each joint edge type.

##### Priors on digraphons

Bayesian approaches may also be used to estimate a digraphon; this is the focus of much of the rest of the paper. One may likewise use similar techniques to those that have been developed for graphons. Analogously to the case of undirected graphs, a Bayesian model for exchangeable directed graphs involves placing a prior on digraphons. This is justified by the characterization in Section 2.5 of exchangeable directed graphs in terms of random digraphons. We discuss such an approach in depth in Section 4, where we present a Bayesian nonparametric model based on random partitions using the Dirichlet process.

## 4 Infinite relational digraphon model

We now proceed to describe a prior on digraphons that makes use of block structure. For directed graphs, the infinite relational model (IRM) (Kemp et al., 2006) models edges between vertices using an asymmetric measurable function and is a nonparametric extension of the (asymmetric) stochastic block model. In this section, we present the infinite relational digraphon model (di-IRM), which gives a prior on digraphons. This model can be viewed as a generalization of the symmetric IRM, a graphon model, to the digraphon case. We then show how the di-IRM can be used to model a variety of digraphs, including ones that cannot be modeled using an asymmetric IRM.

### 4.1 Generative model

We present two equivalent representations of the di-IRM model: (1) a digraphon representation and (2) a clustering representation. The digraphon representation uses a stick-breaking Dirichlet process prior to partition the unit interval, while the clustering representation uses a Chinese restaurant process prior to partition the vertices. The difference between the two representations is analogous to that between the representations of the IRM given by Orbanz and Roy (2015, §4.1).

#### 4.1.1 Digraphon representation

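We first introduce some notation. Let $\alpha > 0$ be a concentration hyperparameter, and $\beta = (\beta^{(00)}, \beta^{(01)}, \beta^{(10)}, \beta^{(11)})$ be a hyperparameter vector for the weight matrices $\eta^{(ab)}$, where $\beta^{(ab)} \geq 0$ for $a, b \in \{0,1\}$. We allow some (but not all) of the Dirichlet parameters to take the value zero, in which case the corresponding components are degenerate (i.e., equal to zero). As a shorthand, we write $\eta_{r,s} := (\eta^{(00)}_{r,s}, \eta^{(01)}_{r,s}, \eta^{(10)}_{r,s}, \eta^{(11)}_{r,s})$ for the 4-tuple of weights of the classes $r$ and $s$, where $r, s \in \mathbb{N}$. The following generative process gives a prior on digraphons: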

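1. Draw a partition $\Pi$ of $[0,1]$:

   $$\Pi \mid \alpha \sim \text{DP-Stick}(\alpha).$$

2. Draw weights for each pair of classes of the partition:

   1. Draw weights for the upper diagonal blocks, where $r < s$:

      $$\eta_{r,s} \mid \beta \sim \mathrm{Dirichlet}(\beta).$$

   2. Draw weights for the diagonal blocks:

      $$\bigl(\eta^{(00)}_{r,r},\ \eta^{(01)}_{r,r} + \eta^{(10)}_{r,r},\ \eta^{(11)}_{r,r}\bigr) \mid \beta \sim \mathrm{Dirichlet}\bigl(\beta^{(00)},\ \beta^{(01)} + \beta^{(10)},\ \beta^{(11)}\bigr),$$

      subject to the constraint

      $$\eta^{(01)}_{r,r} = \eta^{(10)}_{r,r}.$$

   3. Set weights for the lower diagonal blocks, where $r > s$, such that the symmetry requirements in Equation (3) are satisfied:

      $$\eta^{(00)}_{r,s} = \eta^{(00)}_{s,r}, \quad \eta^{(11)}_{r,s} = \eta^{(11)}_{s,r}, \quad \eta^{(01)}_{r,s} = \eta^{(10)}_{s,r}, \quad \eta^{(10)}_{r,s} = \eta^{(01)}_{s,r}.$$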

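In Section 4.2 we show different types of random digraphons that arise from various settings of $\beta$. The partition is drawn from a Dirichlet stick-breaking process: for each $k \in \mathbb{N}$, draw $V_k \overset{\text{iid}}{\sim} \mathrm{Beta}(1, \alpha)$, and for every $k$, set $\pi_k = V_k \prod_{l < k} (1 - V_l)$, so that $\sum_k \pi_k = 1$ almost surely, thereby determining a random partition of $[0,1]$ into consecutive intervals of lengths $\pi_1, \pi_2, \dots$.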

The self-loops can be specified using the same partition of $[0,1]$, either with a deterministic $\{0,1\}$-valued function $w$ or a single weight, as described in Section 2.3. For our purposes, we assume $w = 0$. This generative process fully specifies a random digraphon $W$, from which random digraphs can then be sampled according to the process given in Section 2.4.
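A rough sketch of this generative process follows. It makes one simplification that is not part of the model: the stick-breaking construction is truncated to `K` classes, with the last class absorbing the leftover mass. The function names (`dirichlet`, `sample_di_irm`, `as_digraphon`) and the hyperparameter values in the example are likewise assumptions for illustration.

```python
import random

def dirichlet(params, rng):
    """Dirichlet draw via normalized Gamma variates; zero parameters get zero mass."""
    g = [rng.gammavariate(a, 1.0) if a > 0 else 0.0 for a in params]
    total = sum(g)
    return [x / total for x in g]

def sample_di_irm(alpha, beta, K, rng=None):
    """Draw class sizes and block weights, truncated to K classes (an assumption)."""
    rng = rng or random.Random()
    sticks, remaining = [], 1.0
    for _ in range(K - 1):
        v = rng.betavariate(1.0, alpha)        # V_k ~ Beta(1, alpha)
        sticks.append(v * remaining)
        remaining *= 1.0 - v
    sticks.append(remaining)                   # last class takes the leftover mass
    b00, b01, b10, b11 = beta
    eta = [[None] * K for _ in range(K)]
    for r in range(K):
        # diagonal block: 3-way Dirichlet, then split the middle mass equally
        e00, mid, e11 = dirichlet([b00, b01 + b10, b11], rng)
        eta[r][r] = (e00, mid / 2, mid / 2, e11)
        for s in range(r + 1, K):
            e = dirichlet(beta, rng)           # upper diagonal block
            eta[r][s] = tuple(e)
            eta[s][r] = (e[0], e[2], e[1], e[3])   # mirror: swap 01 and 10
    return sticks, eta

def as_digraphon(sticks, eta):
    """Return the step function (x, y) -> (W00, W01, W10, W11)."""
    def cls(x):
        acc = 0.0
        for k, p in enumerate(sticks):
            acc += p
            if x < acc:
                return k
        return len(sticks) - 1
    return lambda x, y: eta[cls(x)][cls(y)]

sticks, eta = sample_di_irm(1.0, (1.0, 1.0, 1.0, 1.0), 5, random.Random(3))
W4 = as_digraphon(sticks, eta)
```

The returned step function can be passed to any routine that samples a digraph from the four joint edge probabilities, with self-loops set to zero as assumed above.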

#### 4.1.2 Clustering representation

An alternative representation of the generative process for a partition described above can be formulated directly in terms of clustering: in this generative process, each vertex $i$ has a cluster assignment $z_i$. This yields an equivalent assignment to that given by the digraphon formulation if, after sampling the uniform random variable $U_i$, we assign vertex $i$ to the cluster corresponding to the class of the partition of $[0,1]$ that $U_i$ belongs to.

Thus, in place of the first step of the generative process given in the digraphon representation (Section 4.1.1), we draw a partition of the vertices from a Chinese restaurant process (CRP) (as described in, e.g., Aldous (1985)): $z \mid \alpha \sim \mathrm{CRP}(\alpha)$, where each $z_i$ gives the cluster assignment of vertex $i$, and $\alpha > 0$ is a hyperparameter. The weights $\eta_{r,s}$ are drawn in the same manner as in the second step of the digraphon representation of the di-IRM. Finally, edges are drawn analogously to the general digraphon sampling procedure: $(G_{ij}, G_{ji}) \mid z, \eta \overset{\text{ind}}{\sim} \mathrm{Categorical}(\eta_{z_i, z_j})$ for $i < j$, so that Equation (4) holds, where again the Categorical distribution is over the choices $(0,0), (0,1), (1,0), (1,1)$.
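The clustering representation can be sketched as below. The function names and the placeholder weights (uniform over the four joint edge types for every pair of classes) are assumptions for illustration; in the model itself the weights would be Dirichlet draws satisfying the symmetry requirements.

```python
import random

PAIRS = [(0, 0), (0, 1), (1, 0), (1, 1)]   # joint outcomes (G_ij, G_ji)

def sample_crp(n, alpha, rng=None):
    """Sequentially assign n vertices to clusters via a Chinese restaurant process."""
    rng = rng or random.Random()
    z, sizes = [], []
    for _ in range(n):
        # existing cluster k with weight sizes[k]; a new cluster with weight alpha
        k = rng.choices(range(len(sizes) + 1), weights=sizes + [alpha])[0]
        if k == len(sizes):
            sizes.append(0)
        sizes[k] += 1
        z.append(k)
    return z

def sample_edges(z, eta, rng=None):
    """Draw (G_ij, G_ji) jointly; eta(r, s) returns the 4-tuple for classes r, s."""
    rng = rng or random.Random()
    n = len(z)
    G = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            G[i][j], G[j][i] = rng.choices(PAIRS, weights=eta(z[i], z[j]))[0]
    return G

rng = random.Random(4)
z = sample_crp(20, 1.0, rng)
# Placeholder weights, chosen only so the example runs.
G = sample_edges(z, lambda r, s: (0.25, 0.25, 0.25, 0.25), rng)
```

Since edges are only drawn for $i < j$ while $z_i$ may be larger or smaller than $z_j$, the supplied weights must satisfy the mirror condition between `eta(r, s)` and `eta(s, r)`; the constant placeholder above does so trivially.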

This representation is particularly convenient for performing inference, especially when using a collapsed Gibbs sampling procedure, as we show in Section 6.

### 4.2 Special cases obtained from the di-IRM

In Figure 8, we display examples of random di-IRM draws using several settings of the hyperparameter vector $\beta$. The parameter settings were specifically chosen to illustrate some of the special cases the di-IRM model can cover.

##### Undirected

To get a prior on graphons using the di-IRM, we can set $\beta^{(01)} = \beta^{(10)} = 0$. Figure 8(a) shows a parameter setting that produces undirected graphs and is equivalent to a symmetric IRM when taking $W_{11}$ to be the IRM; we can see from the sample on the right that the graph is indeed undirected.

##### Tournaments

We can specify a di-IRM tournament prior by setting $\beta^{(00)} = \beta^{(11)} = 0$. Figure 8(c) shows one such parameter setting, which puts all the mass on the middle two functions $W_{01}$ and $W_{10}$. The tournament structure is easy to see in the 20-vertex sample; for distinct $i$ and $j$, whenever there is an edge from $i$ to $j$, there is not an edge from $j$ to $i$.

Figure 8(e) shows a less extreme (non-tournament) variant that still has strong correlations between the edge directions, by virtue of retaining most of the mass on the functions $W_{01}$ and $W_{10}$. Note that the block structure in a sample from this digraphon is more subtle than in the undirected sample, demonstrating the importance of counting all four edge-direction combinations rather than just marginals for the two directions.

##### Directed acyclic graphs

To obtain a directed acyclic graph from the di-IRM, we set the hyperparameters so that the resulting function $W_{11}$ is identically zero and $W_{10}$ has nonzero values only on blocks above the diagonal, as in Section 3.2.4. To achieve this, we set the Dirichlet weight parameters such that $\beta^{(01)} = \beta^{(11)} = 0$ for the weights $\eta_{r,s}$ where $r < s$, and for each class $r$ we use a separate 3-tuple of hyperparameters for the weights on the diagonal block, chosen so that all of the mass falls on $\eta^{(00)}_{r,r}$. With these hyperparameter choices, we obtain a directed acyclic di-IRM, as seen in Figure 8(g). We can see in both samples that the directed edges in the resorted sample lie above the diagonal. Note that we make use of the separate diagonal hyperparameters only in this section, to show how to get a DAG prior; in our later inference examples, we use the di-IRM model as introduced in the previous subsection with the single vector of hyperparameters $\beta$.

##### Near-ordering

Consider hyperparameter settings that place all of the mass on $\eta^{(10)}_{r,s}$ for the weights $\eta_{r,s}$ when $r < s$, and all of the mass on $\eta^{(00)}_{r,r}$ for every class $r$. The resulting digraph is “nearly” ordered, in the sense that it is linearizable and any two elements in different classes are comparable, as seen in Figure 8(i). Here $W_{10} = 1$ for any blocks above the diagonal, and the resulting partial ordering is apparent in both of the resorted samples, with all directed edges above the diagonal.

### 4.3 Other partitions for the di-IRM

Any block model digraphon can be specified in a similar manner: first define a partition of $[0,1]$, which then gives a partition of $[0,1]^2$ into blocks; next let each function $W_{ab}$ be constant on each block of $[0,1]^2$, such that the symmetry requirements in Equation (3) are satisfied.

In the case where the number of classes and the size of the classes are fixed parameters, the directed IRM behaves similarly to some random directed SBM. In addition to the CRP, one can also consider other random partitioning schemes. For instance, if one is interested in power law scaling in the number of clusters (and the sizes of particular clusters), the Pitman–Yor process (Pitman and Yor, 1997) provides a suitable generalization of the Dirichlet process. It has both a stick-breaking and an urn representation, analogous to those for the Dirichlet process.
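The urn representation can be sketched as follows. This is a minimal illustration of the standard two-parameter urn scheme, not the implementation used for the experiments in this paper; the function name `pitman_yor_partition` and the parameter names `alpha` (concentration) and `discount` are assumptions introduced here. Setting `discount = 0` recovers the CRP.

```python
import random

def pitman_yor_partition(n, alpha, discount, rng):
    """Sample cluster labels for n items from the two-parameter urn scheme.

    Assumed parametrization: 0 <= discount < 1 and alpha > -discount.
    """
    if not (0.0 <= discount < 1.0) or alpha <= -discount:
        raise ValueError("need 0 <= discount < 1 and alpha > -discount")
    labels, counts = [], []
    for _ in range(n):
        if not counts:
            k = 0  # the first item always opens the first cluster
        else:
            # An existing cluster k has weight counts[k] - discount;
            # a new cluster has weight alpha + discount * (number of clusters).
            weights = [c - discount for c in counts]
            weights.append(alpha + discount * len(counts))
            k = rng.choices(range(len(weights)), weights=weights)[0]
        if k == len(counts):
            counts.append(1)
        else:
            counts[k] += 1
        labels.append(k)
    return labels, counts

labels, counts = pitman_yor_partition(20, alpha=1.0, discount=0.5,
                                      rng=random.Random(0))
print(labels, counts)
```

A larger discount tends to produce more clusters with a heavier-tailed size distribution, which is the power law behavior mentioned above.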

## 5 Related work

The stochastic block model (see Holland et al. (1983) and Wasserman and Faust (1994)) has been well-studied in the case of directed graphs (Holland and Leinhardt, 1981; Wang and Wong, 1987), including from a Bayesian perspective (Wong, 1987; Gill and Swartz, 2004; Nowicki and Snijders, 2001). Although working within a restricted class of models, Holland and Leinhardt (1981) already consider the full joint distribution on edge directions, rather than making independence assumptions.

The directed stochastic blockmodel (di-SBM) can be represented as a digraphon given by four step-functions that are piecewise constant on a finite number of classes. We display an example of a directed SBM in Figure 9. The di-IRM model presented in this paper can be seen as a nonparametric extension of the di-SBM, just as the undirected IRM (introduced independently by Kemp et al. (2006) and Xu et al. (2007)) is a nonparametric undirected SBM.

Any prior on exchangeable undirected graphs can be described in terms of a corresponding prior on graphons. As alluded to in the introduction, many Bayesian nonparametric models for graphs admit a nice representation in this form (even if not originally described in these terms); see Orbanz and Roy (2015, §4) for additional details and examples from the machine learning literature, including the IRM. Likewise, priors on exchangeable digraphs (which have been less thoroughly explored) can be described in terms of the corresponding priors of digraphons, as we have begun to do here.

As noted in Lloyd et al. (2012), when existing models are expressed in these terms, various restrictions (and in particular, unnecessary independence assumptions) become more apparent. As we have seen, the use of the IRM on directed graphs models the edge directions as independent (see Kemp et al. (2004) for examples), a condition that can be straightforwardly relaxed when the model is expressed in the general setting provided by digraphons.

Exchangeable directed graphs have also been considered by Austin (2008), via an application of the Aldous–Hoover theorem, although this work does not describe digraphons explicitly. We conclude this section by describing several extensions of the graphon formalism, some of which can be combined with the directed case. In particular, edges may be more general than -valued. Variants of graphons for weighted and edge-colored graphs have been considered by Lovász (2012, Chapter 17) and Austin (2008). Graphs with edge multiplicity, or multigraphs, can be viewed as integer-valued arrays, a case also covered by the Aldous–Hoover theorem, although the corresponding extension of graphons is more complicated when the edge multiplicities are unbounded; see Kolossváry and Ráth (2011), Lovász (2012, Chapter 17), and Kunszenti-Kovács et al. (2014). Graphs (that are not necessarily symmetric) with real-valued edges are also covered by the Aldous–Hoover theorem through real-valued exchangeable arrays, and have many applications in statistics and machine learning; see Lloyd et al. (2012) and Orbanz and Roy (2015). The Aldous–Hoover theorem also covers real-valued -dimensional arrays for , although the corresponding extension of graphons to the case of hypergraphs is considerably more involved; for details, see Lovász (2012, Chapter 23.3), Austin (2008), and Lloyd et al. (2013).

## 6 Posterior inference

In this section, we perform collapsed Gibbs sampling for the di-IRM. We use the notation for the clustering representation of the di-IRM, so we can use Gibbs sampling to repeatedly sample the cluster assignment of each vertex.

Let be a digraph on ; for simplicity we assume that has no self-edges, and that, as in Section 4.1.1, the di-IRM parameters are chosen so that no self-edges are produced. Consider a partition of into a countably infinite number of clusters, and for , let denote the cluster assignment of . Write for the vector of all cluster assignments, and for the 4-tuple of weight matrices. Because of the symmetry requirement of the diagonal, we are able to simplify notation as follows: let , let , and let . Let be the 3-tuple of hyperparameters for the diagonal blocks.

The likelihood of being drawn from the di-IRM, given cluster assignments and weights , is given by

$$p(G \mid z, \eta) = \prod_{r \le s} \prod_{a,b} \bigl(\eta^{(ab)}_{r,s}\bigr)^{m^{(ab)}_{r,s}}$$

where denotes the number of directed edges of type between class and class , for and .

Since the weights have a factorized Dirichlet distribution prior, we have

 p(η|β) =[B(β)−1∏r

where is the multivariate beta function.

We sample each cluster assignment conditional on all other assignment variables:

$$z_i \mid z_{-i} \sim p(z_i \mid z_{-i}, G) \propto p(G \mid z)\, p(z_i \mid z_{-i}), \tag{5}$$

where denotes the vector of all assignments such that .

To compute the first term in Equation (5), we can integrate out the parameters :

 p(G|z) =[B(β)−1∏r

where we simplify calculations on the diagonal using the shorthand , and .
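For a single off-diagonal pair of classes, integrating a Dirichlet prior against a likelihood of the product form above yields a ratio of multivariate beta functions, $B(\beta + m)/B(\beta)$, where $m$ is the vector of edge-type counts for that pair. The sketch below computes this one factor in log space. The function names are assumptions introduced here, and the diagonal blocks (which use the 3-tuple shorthand) are not treated separately.

```python
from math import exp, lgamma

def log_multivariate_beta(v):
    """log B(v) = sum_k log Gamma(v_k) - log Gamma(sum_k v_k)."""
    return sum(lgamma(x) for x in v) - lgamma(sum(v))

def log_block_marginal(counts, beta):
    """Log marginal likelihood of one block's edge-type counts,
    with the Dirichlet-distributed weights integrated out."""
    if len(counts) != len(beta):
        raise ValueError("counts and beta must have the same length")
    posterior = [m + b for m, b in zip(counts, beta)]
    return log_multivariate_beta(posterior) - log_multivariate_beta(beta)

# One observed pair of type (0,0) under a uniform Dirichlet(1,1,1,1) prior:
print(exp(log_block_marginal([1, 0, 0, 0], [1.0, 1.0, 1.0, 1.0])))
```

Working in log space avoids overflow in the gamma functions when the counts are large; the log marginal of the whole digraph is then a sum of such terms over blocks.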

The second term in Equation (5) comes from the CRP distribution on z:

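$$p(z_i = r \mid z_{-i}) = \begin{cases} \dfrac{c_r}{i-1+\alpha} & \text{if } c_r > 0, \text{ and} \\[1ex] \dfrac{\alpha}{i-1+\alpha} & \text{if } r \text{ is a new cluster,} \end{cases}$$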

where denotes the number of elements in cluster , and is the concentration hyperparameter.
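As a small illustration of this prior term, the following sketch returns the CRP probabilities for one vertex given the sizes of the clusters formed by the other vertices. It assumes the vertex being resampled is treated as the last one to arrive (which exchangeability of the CRP permits), so the denominator is the number of other vertices plus the concentration parameter. The function name `crp_conditional` is hypothetical.

```python
def crp_conditional(other_counts, alpha):
    """CRP probabilities of joining each existing cluster, then a new one.

    other_counts: sizes of the nonempty clusters, excluding the vertex
    being resampled. alpha: concentration hyperparameter (> 0).
    """
    if alpha <= 0 or any(c <= 0 for c in other_counts):
        raise ValueError("need alpha > 0 and strictly positive cluster sizes")
    total = sum(other_counts) + alpha
    # Existing cluster r: c_r / total.  New cluster: alpha / total.
    return [c / total for c in other_counts] + [alpha / total]

print(crp_conditional([3, 1], alpha=1.0))
```

In a Gibbs sweep these prior probabilities would be multiplied by the corresponding collapsed likelihood terms and renormalized before sampling the new assignment.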

We can reconstruct the weights using their MAP estimate:

$$\eta^{(ab)}_{r,s} = \bigl(m^{(ab)}_{r,s} + \beta^{(ab)}\bigr) / N_{r,s}, \tag{6}$$

where

## 7 Experiments

In this section, we experimentally evaluate the di-IRM model on synthetic data. We present two examples: the first is meant to illustrate the correct behavior of inference on di-IRM parameters, and the second is designed to show the advantage of using a digraphon representation (given by the di-IRM) over using an asymmetric function (given by the IRM).

Multiple digraphons may induce the same distribution on exchangeable digraphs, in which case they are said to be weakly isomorphic. This is not just because a digraphon can be perturbed on a measure-zero set without changing the induced distribution on digraphs, but also because measurable rearrangements of the digraphon will also leave the distribution invariant (analogously to how relabeling the vertices of a digraph does not change it up to isomorphism). Hence a digraphon is not identifiable from the random digraph ; in general only its weak isomorphism class can be determined. For details (in the analogous setting of graphons), see Diaconis and Janson (2008, §7) and Orbanz and Roy (2015, §3.4).

Therefore, in the following inference problems, we can only expect to estimate a digraphon up to its weak isomorphism class. In a block model, this results in the nonidentifiability of the order of the blocks.

### 7.1 Random di-IRM from uniform weights

We first draw a random di-IRM with the weights , which is displayed in Figure 9(a). We then generate a 100-vertex sample from this digraphon (Figure 9(c)). We ran a collapsed Gibbs sampling procedure for 200 iterations, beginning from a random initial clustering. This inference procedure is able to recover the original weights, up to reordering; the inferred weight matrices are displayed in Figure 10, drawn in proportion to the inferred cluster sizes.

### 7.2 Half-undirected, half-tournament example

We consider the 2-class step-function digraphon with half the vertices in each class that is given by ,

$$W_{00}(x,y) = W_{11}(x,y) = \begin{cases} \tfrac12 & \text{if } x < \tfrac12 \text{ and } y < \tfrac12, \\ \tfrac12 & \text{if } x \ge \tfrac12 \text{ and } y \ge \tfrac12, \\ 0 & \text{otherwise,} \end{cases}$$

and

$$W_{01}(x,y) = W_{10}(x,y) = \begin{cases} \tfrac12 & \text{if } x \ge \tfrac12 \text{ and } y < \tfrac12, \\ \tfrac12 & \text{if } x < \tfrac12 \text{ and } y \ge \tfrac12, \\ 0 & \text{otherwise.} \end{cases}$$

This digraphon is displayed in Figure 10(a), and a schematic illustrating the model is in Figure 10(b). This example demonstrates the importance of being able to distinguish regions having different correlations between edge directions (but the same marginal left-to-right and right-to-left edge probabilities).
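To make the example concrete, here is a minimal sketch of sampling a finite digraph from this step-function digraphon. It assumes the convention that the four functions give the joint probabilities of the pair of edge indicators between two vertices, given their uniform latent positions; since the two mixed-type functions coincide in this example, the choice of which direction each one refers to does not affect the result. The function names are introduced here for illustration, and self-edges are omitted.

```python
import random

EDGE_TYPES = [(0, 0), (0, 1), (1, 0), (1, 1)]

def edge_type_probs(x, y):
    """Return (W00, W01, W10, W11) at (x, y) for the two-class example."""
    same_class = (x < 0.5) == (y < 0.5)
    if same_class:
        # Within a class: no edge or a bidirectional edge, each w.p. 1/2.
        return (0.5, 0.0, 0.0, 0.5)
    # Across classes: exactly one direction is present, each w.p. 1/2.
    return (0.0, 0.5, 0.5, 0.0)

def sample_digraph(n, rng):
    """Sample latent positions and an n-vertex adjacency matrix."""
    u = [rng.random() for _ in range(n)]
    adj = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            a, b = rng.choices(EDGE_TYPES,
                               weights=edge_type_probs(u[i], u[j]))[0]
            adj[i][j], adj[j][i] = a, b
    return u, adj

u, adj = sample_digraph(8, random.Random(0))
for row in adj:
    print(row)
```

Every off-diagonal entry of the sampled matrix is 1 with marginal probability 1/2, yet the two directions between a pair of vertices agree within a class and disagree across classes, which is exactly the structure that marginal edge probabilities alone cannot reveal.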

We generated a synthetic digraph sampled from and then ran a collapsed Gibbs sampling procedure for the di-IRM. We also ran a similar collapsed Gibbs sampler for the IRM. Both samplers began with a random clustering and ran until the cluster assignments approximately converged. The results are shown in Figure 10(c); here the random sample is displayed alongside the sample resorted according to the clusters inferred using the di-IRM model, as well as the clusters inferred by the IRM model. In both resorted images, the true clusters are colored: white indicates no edge, red indicates an edge between vertices from group 1, blue indicates an edge between vertices from group 2, and purple indicates an edge between vertices from different groups. Note that the true clusters are correctly inferred using the di-IRM model, as reordering the vertices according to the inferred clusters identifies the true groups, while the IRM model fails to discern the correct structure. The IRM only considers the marginal left-to-right and right-to-left edge probabilities, which do not distinguish the two clusters; in this particular inference run, almost all vertices were put into the first of the two clusters, which is consistent with not being able to distinguish between vertices with similar marginal edge probabilities. This result is what one would expect from an algorithm that has inferred uniform independent edge probabilities, i.e., the edge probabilities of an Erdős–Rényi graph.

## 8 Discussion

We have described how priors on digraphons can be used in the statistical modeling of exchangeable dense digraphs, and have exhibited several key classes of structures that one can model with particular subclasses of these priors. We have also illustrated why merely using asymmetric measurable functions is insufficient, as this produces a misspecified model for any exchangeable digraphs having correlations between the edge directions.

While models based on digraphons (and graphons) are almost surely dense (or empty) and not directly suitable for real-world network applications that are sparse, it is still useful to study models using digraphons (see, e.g., the discussion in Orbanz and Roy (2015, §7.1)). Some recent work, e.g., Borgs et al. (2015); Veitch and Roy (2015); Borgs et al. (2016); Herlau and Schmidt (2016); Cai et al. (2016); Crane and Dempsey (2016); Caron and Fox (2014), has pointed to methods for extending exchangeable graphs to the case of sparse graphs, but many interesting problems remain.

## Acknowledgments

A preliminary version of this material was presented at the 10th Conference on Bayesian Nonparametrics in Raleigh, NC during June 2015, and an extended abstract was presented at the NIPS 2015 workshop Bayesian Nonparametrics: The Next Generation during December 2015. The authors would like to thank Tamara Broderick, Vikash Mansinghka, Peter Orbanz, Daniel Roy, and Victor Veitch for helpful conversations, and Rehana Patel and Daniel Roy for comments on a draft.

This material is based upon work supported by the United States Air Force and the Defense Advanced Research Projects Agency (DARPA) under Contract Numbers FA8750-14-C-0001 and FA8750-14-2-0004. Work by D. C. was also supported by a McCormick Fellowship and Bernstein Award at the University of Chicago, an ISBA Junior Researcher Travel Award, and an ISBA@NIPS Special Travel Award. Work by C. F. was also supported by Army Research Office grant number W911NF-13-1-0212 and a grant from Google. Any opinions, findings and conclusions or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of the United States Air Force, Army, or DARPA.

## References

• Airoldi et al. (2008) E. M. Airoldi, D. M. Blei, S. E. Fienberg, and E. P. Xing. Mixed membership stochastic blockmodels. J. Machine Learning Res., 9:1981–2014, 2008.
• Aldous (1981) D. J. Aldous. Representations for partially exchangeable arrays of random variables. J. Multivariate Anal., 11(4):581–598, 1981.
• Aldous (1985) D. J. Aldous. Exchangeability and related topics. In École d’été de probabilités de Saint-Flour, XIII—1983, volume 1117 of Lecture Notes in Math., pages 1–198. Springer, Berlin, 1985.
• Alon and Shapira (2004) N. Alon and A. Shapira. Testing subgraphs in directed graphs. J. Comput. System Sci., 69(3):353–382, 2004.
• Aroskar and Cummings (2014) A. Aroskar and J. Cummings. Limits, regularity and removal for finite structures. To appear in J. Symb. Logic. ArXiv e-print 1412.8084, 2014.
• Aroskar (2012) A. Aroskar. Limits, regularity and removal for relational and weighted structures. PhD thesis, Carnegie Mellon University, 2012.
• Austin (2008) T. Austin. On exchangeable random variables and the statistics of large graphs and hypergraphs. Probab. Surv., 5:80–145, 2008.
• Blundell et al. (2012) C. Blundell, K. A. Heller, and J. M. Beck. Modelling reciprocating relationships with Hawkes processes. In Adv. Neural Inform. Process. Syst. (NIPS) 26, pages 2609–2617, 2012.
• Borgs et al. (2015) C. Borgs, J. T. Chayes, H. Cohn, and S. Ganguly. Consistent nonparametric estimation for heavy-tailed sparse graphs. ArXiv e-print 1508.06675v2, 2015.
• Borgs et al. (2016) C. Borgs, J. T. Chayes, H. Cohn, and N. Holden. Sparse exchangeable graphs and their limits via graphon processes. ArXiv e-print 1601.07134, 2016.
• Bradley and Terry (1952) R. A. Bradley and M. E. Terry. Rank analysis of incomplete block designs. I. The method of paired comparisons. Biometrika, 39:324–345, 1952.
• Cai et al. (2016) D. Cai, T. Campbell, and T. Broderick. Edge-exchangeable graphs and sparsity. In Adv. Neural Inform. Process. Syst. (NIPS) 29, 2016.
• Caron and Fox (2014) F. Caron and E. B. Fox. Sparse graphs using exchangeable random measures. ArXiv e-print 1401.1137, 2014.
• Chan and Airoldi (2014) S. H. Chan and E. Airoldi. A consistent histogram estimator for exchangeable graph models. In Proc. 31st Int. Conf. Mach. Learn. (ICML), pages 208–216, 2014.
• Chatterjee and Mukherjee (2016) S. Chatterjee and S. Mukherjee. On estimation in tournaments and graphs under monotonicity constraints. ArXiv e-print 1603.04556, 2016.
• Chatterjee (2015) S. Chatterjee. Matrix estimation by universal singular value thresholding. Ann. Statist., 43(1):177–214, 2015.
• Chung and Graham (1991) F. R. K. Chung and R. L. Graham. Quasi-random tournaments. J. Graph Theory, 15(2):173–198, 1991.
• Compton (1988) K. J. Compton. The computational complexity of asymptotic problems. I. Partial orders. Inform. and Comput., 78(2):108–123, 1988.
• Crane and Dempsey (2016) H. Crane and W. Dempsey. Edge exchangeable models for network data. ArXiv e-print 1603.04571, 2016.
• Diaconis and Janson (2008) P. Diaconis and S. Janson. Graph limits and exchangeable random graphs. Rend. Mat. Appl. (7), 28(1):33–61, 2008.
• Fu et al. (2009) W. Fu, L. Song, and E. P. Xing. Dynamic mixed membership blockmodel for evolving networks. In Proc. 26th Int. Conf. Mach. Learn. (ICML), pages 329–336, 2009.
• Gill and Swartz (2004) P. S. Gill and T. B. Swartz. Bayesian analysis of directed graphs data with applications to social networks. J. Roy. Statist. Soc. Ser. C, 53(2):249–260, 2004.
• Glasner and Weiss (2002) E. Glasner and B. Weiss. Minimal actions of the group of permutations of the integers. Geom. Funct. Anal., 12(5):964–988, 2002.
• Heaukulani and Ghahramani (2013) C. Heaukulani and Z. Ghahramani. Dynamic probabilistic models for latent feature propagation in social networks. In Proc. 30th Int. Conf. Mach. Learn. (ICML), pages 275–283, 2013.
• Herlau and Schmidt (2016) T. Herlau and M. Schmidt. Completely random measures for modelling block-structured sparse networks. In Adv. Neural Inform. Process. Syst. (NIPS) 29, 2016.
• Hladký et al. (2015) J. Hladký, A. Máthé, V. Patel, and O. Pikhurko. Poset limits can be totally ordered. Trans. Amer. Math. Soc., 367(6):4319–4337, 2015.
• Hoff et al. (2002) P. D. Hoff, A. E. Raftery, and M. S. Handcock. Latent space approaches to social network analysis. J. Amer. Statist. Assoc., 97(460):1090–1098, 2002.
• Holland and Leinhardt (1981) P. W. Holland and S. Leinhardt. An exponential family of probability distributions for directed graphs. J. Amer. Statist. Assoc., 76(373):33–65, 1981.
• Holland et al. (1983) P. W. Holland, K. B. Laskey, and S. Leinhardt. Stochastic blockmodels: first steps. Social Networks, 5(2):109–137, 1983.
• Hoover (1979) D. N. Hoover. Relations on probability spaces and arrays of random variables. Preprint, Institute for Advanced Study, Princeton, NJ, 1979.
• Janson (2011) S. Janson. Poset limits and exchangeable random posets. Combinatorica, 31(5):529–563, 2011.
• Kallenberg (1999) O. Kallenberg. Multivariate sampling and the estimation problem for exchangeable arrays. J. Theoret. Probab., 12(3):859–883, 1999.
• Kemp et al. (2006) C. Kemp, J. B. Tenenbaum, T. L. Griffiths, T. Yamada, and N. Ueda. Learning systems of concepts with an infinite relational model. In Proc. 21st Nat. Conf. Artificial Intelligence (AAAI-06), 2006.
• Kemp et al. (2004) C. Kemp, T. L. Griffiths, and J. B. Tenenbaum. Discovering latent classes in relational data. AI Memo 2004-019, Computer Science and Artificial Intelligence Laboratory, Massachusetts Institute of Technology, Cambridge, MA, 2004.
• Kim and Leskovec (2013) M. Kim and J. Leskovec. Nonparametric multi-group membership model for dynamic networks. In Adv. Neural Inform. Process. Syst. (NIPS) 27, pages 1385–1393, 2013.
• Kleitman and Rothschild (1975) D. J. Kleitman and B. L. Rothschild. Asymptotic enumeration of partial orders on a finite set. Trans. Amer. Math. Soc., 205:205–220, 1975.
• Kolossváry and Ráth (2011) I. Kolossváry and B. Ráth. Multigraph limits and exchangeability. Acta Math. Hungar., 130(1-2):1–34, 2011.
• Kunszenti-Kovács et al. (2014) D. Kunszenti-Kovács, L. Lovász, and B. Szegedy. Multigraph limits, unbounded kernels, and Banach space decorated graphs. ArXiv e-print 1406.7846, 2014.
• Lloyd et al. (2012) J. R. Lloyd, P. Orbanz, Z. Ghahramani, and D. M. Roy. Random function priors for exchangeable arrays with applications to graphs and relational data. In Adv. Neural Inform. Process. Syst. (NIPS) 25, pages 1007–1015, 2012.
• Lloyd et al. (2013) J. R. Lloyd, P. Orbanz, Z. Ghahramani, and D. M. Roy. Exchangeable databases and their functional representation. In NIPS Workshop on Frontiers of Network Analysis: Methods, Models, and Application, 2013.
• Lovász (2012) L. Lovász. Large networks and graph limits, volume 60 of Amer. Math. Soc. Colloq. Publ. Amer. Math. Soc., Providence, RI, 2012.
• Lovász and Szegedy (2006) L. Lovász and B. Szegedy. Limits of dense graph sequences. J. Combin. Theory Ser. B, 96(6):933–957, 2006.
• Miller et al. (2009) K. T. Miller, T. L. Griffiths, and M. I. Jordan. Nonparametric latent feature models for link prediction. In Adv. Neural Inform. Process. Syst. (NIPS) 22, pages 1276–1284, 2009.
• Nowicki and Snijders (2001) K. Nowicki and T. A. B. Snijders. Estimation and prediction for stochastic blockstructures. J. Amer. Statist. Assoc., 96(455):1077–1087, 2001.
• Offner (2009) D. Offner. Extremal problems on the hypercube. PhD thesis, Carnegie Mellon University, 2009.
• Orbanz and Roy (2015) P. Orbanz and D. M. Roy. Bayesian models of graphs, arrays and other exchangeable random structures. IEEE Trans. Pattern Anal. Mach. Intell., 37(2):437–461, 2015.
• Palla et al. (2012) K. Palla, D. A. Knowles, and Z. Ghahramani. An infinite latent attribute model for network data. In Proc. 29th Int. Conf. Mach. Learn. (ICML), 2012.
• Pitman and Yor (1997) J. Pitman and M. Yor. The two-parameter Poisson–Dirichlet distribution derived from a stable subordinator. Ann. Probab., 25(2):855–900, 1997.
• Roverato and Consonni (2004) A. Roverato and G. Consonni. Compatible prior distributions for directed acyclic graph models. J. R. Stat. Soc. Ser. B Stat. Methodol., 66(1):47–61, 2004.
• Thörnblad (2016) E. Thörnblad. Decomposition of tournament limits. ArXiv e-print 1604.04271, 2016.
• Veitch and Roy (2015) V. Veitch and D. M. Roy. The class of random graphs arising from exchangeable random measures. ArXiv e-print 1512.03099, 2015.
• Wang and Wong (1987) Y. J. Wang and G. Y. Wong. Stochastic blockmodels for directed graphs. J. Amer. Statist. Assoc., 82(397):8–19, 1987.
• Wasserman and Faust (1994) S. Wasserman and K. Faust. Social network analysis: methods and applications. Structural Analysis in the Social Sciences. Cambridge University Press, 1994.
• Wong (1987) G. Y. Wong. Bayesian models for directed graphs. J. Amer. Statist. Assoc., 82(397):140–148, 1987.
• Xu et al. (2007) Z. Xu, V. Tresp, S. Yu, K. Yu, and H. Kriegel. Fast inference in infinite hidden relational models. In Proc. of Mining and Learning with Graphs (MLG 2007), 2007.
• Zermelo (1929) E. Zermelo. Die Berechnung der Turnier-Ergebnisse als ein Maximumproblem der Wahrscheinlichkeitsrechnung. Math. Z., 29(1):436–460, 1929.