1.1 A typical use case - Amazon product reviews
Amazon Reviews data sets are available for many product categories. Sizes of five of these data sets are shown in Table 1, located in Section 4.6. Fix a category, say books, and consider a bipartite graph , where denotes reviewers, denotes books, and an edge corresponds to existence of a review of a specific book by a specific reviewer. Block structure in such a graph corresponds to a clustering of some set of reviewers around some (unstated) type of books. The four point test developed in this paper quantifies the amount of block structure on a scale from 0 to 1. For example, in the original ordering, books received a 0.3 score while digital music received a 0.6, implying that reviewers of Digital Music are more bound to their music genres than are book reviewers to their type of book.
1.2 Notation for incidence data
Association mining treats an incidence matrix, represented as an undirected bipartite graph with ordered left vertices , degrees ; and ordered right vertices , degrees . There are incidences of form , also written , where
At a finer level of detail , the joint degree matrix of is the integer matrix whose entry counts the number of edges whose left endpoint has degree , and whose right endpoint has degree :
Alternatively, we may view the data as:
A binary contingency table, i.e. a 0-1 matrix with given row sums and column sums . Here
A hypergraph with given vertex degrees and hyperedge weights. Here is in bijection with the left nodes , and
1.3 What is meant by absence of block structure?
Classical studies of association in small contingency tables, summarized in Agresti, focus on tests for statistical independence of rows in a small, dense random matrix. For reasons discussed in Appendix A.3, such notions are entirely unsuited to the discovery of block structure in large, sparse binary contingency tables. Instead we follow a non-parametric approach suggested by the quasirandomness literature [16, 18, 19, 20].
Repeat the following experiment many times: pick edges uniformly at random, sort according to left end point, and test whether all orderings of right end points are equally frequent. This approach is too crude to detect statistical dependence between a specific pair of rows in a large matrix, but is able to detect block structure, as we shall see in Section 2.3. Moreover we will not need to consider arbitrarily large ; the choice suffices.
The first tool we shall introduce is a method of converting a sample of edges into a random permutation of symbols. All the definitions of this section extend to bipartite multigraphs, in which a sample of edges might include two edges which have the same pair of endpoints.
Sample distinct edges
in a bipartite (multi)graph where and are ordered, and sort them by left end point, so . A permutation induced by the sample means one that is selected uniformly at random from those with the following property (if there are no ties in either sorted list, then exactly one permutation has this property):
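As a rough illustration of this definition, the following Python sketch computes a permutation induced by a sample of edges. The function name `induced_permutation` and the representation of edges as `(left, right)` pairs of comparable labels are assumptions made for the example. Ties are broken with independent uniform jitter, in the spirit of the tie-breaking step of Section 2.2.

```python
import random


def induced_permutation(edges, rng=random):
    """Return a permutation (1-based ranks) induced by a sample of edges.

    `edges` is a list of (left, right) pairs with comparable labels.
    Each endpoint is paired with an independent uniform jitter, so that
    ties between equal labels are broken at random.
    """
    keyed = [((u, rng.random()), (v, rng.random())) for u, v in edges]
    keyed.sort(key=lambda e: e[0])           # sort the sample by left endpoint
    right_keys = [e[1] for e in keyed]       # right endpoints in that order
    # Rank each right endpoint within the sample (1 = smallest).
    by_right = sorted(range(len(right_keys)), key=lambda i: right_keys[i])
    perm = [0] * len(edges)
    for rank, position in enumerate(by_right, start=1):
        perm[position] = rank
    return perm
```

For instance, the sample `[(3, 20), (1, 30), (2, 10)]` has no ties: sorting by left endpoint lists the right endpoints as 30, 10, 20, so the induced permutation is `[3, 1, 2]`.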
How could this allow us to test for absence of block structure? Suppose edges are picked uniformly at random, and sorted by left endpoint. Repeat this many times. If there is some such that right vertices tend to appear earlier on such a list than those where , then the possible orderings of right endpoints in the sample are not equally likely. In this case lower numbered left vertices would tend to be associated with right vertices . We shall study an explicit example in Section 2.3.
A sequence of random bipartite (multi)graphs , where , , is called asymptotically block-free of order if the distribution of the permutation induced by a uniform random sample of distinct edges converges to the uniform distribution on , as . If this condition holds for all , we call asymptotically block-free.
Remark: Lemma 5.1 shows how such a sequence may be constructed.
In a practical situation, we typically have a single large graph, from which we can draw many samples of size . By partitioning the edges randomly into sets of size , we obtain such samples, each of which induces one of permutations, as in Definition 1.1. A null hypothesis could now be phrased as: each of the possible outcomes in these independent multinomial trials has equal probability. An alternative hypothesis is that these possible outcomes are not equally likely. The goodness of fit test to the multinomial, with degrees of freedom would be a natural choice to test versus .
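The goodness of fit computation for a general sample size can be sketched as follows. This is only an illustration under stated assumptions: the name `chi_squared_uniform` is hypothetical, the statistic is assumed to be the usual Pearson chi-squared statistic against the uniform distribution on the k! permutations, and the degrees of freedom are taken to be k! - 1. The p-value is left to a statistics library.

```python
from collections import Counter
from itertools import permutations
from math import factorial


def chi_squared_uniform(observed_perms, k):
    """Pearson chi-squared statistic against the uniform law on S_k.

    `observed_perms` is a list of permutations of 1..k (as tuples or lists),
    one per sample of k edges. Returns (statistic, degrees_of_freedom).
    """
    n = len(observed_perms)
    if n == 0:
        raise ValueError("need at least one observed permutation")
    expected = n / factorial(k)              # equal expected count per outcome
    counts = Counter(tuple(p) for p in observed_perms)
    statistic = sum(
        (counts.get(p, 0) - expected) ** 2 / expected
        for p in permutations(range(1, k + 1))
    )
    return statistic, factorial(k) - 1
```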
This procedure is still burdensome: it seems we must repeat for all , and then decide how to combine the results. Fortunately it suffices to consider just the case . In other words, the only hypothesis we need to test is , which means that, for the given orderings of left and right vertices, the graph is block-free of order four.
1.4 Permutations induced by samples of four edges suffice
The main result of our paper is:
If a sequence of random bipartite (multi)graphs, as in Definition 1.2, is asymptotically block-free of order four, then it is asymptotically block-free of order for all .
The proof, which will be given later, comes from combining a combinatorics result of Král & Pikhurko  concerning quasirandom permutations, with a construction which maps a bipartite (multi)graph with edges to a permutation on symbols.
2 Application: measuring block structure
2.1 4-permutations and Lehmer codes
We are given a sparse bipartite (multi)graph with edges, with total orders on the left vertices and on the right vertices.
List the 24 elements of the permutation group as . Consider the following multinomial trial: sample four edges uniformly at random from , without replacement, sort these four edges by left endpoint, and record an outcome for the trial if the right endpoints are ordered according to the permutation , as in Definition 1.1 (i.e. break ties randomly). In practical computation, the ordering of right endpoints may be represented by the Lehmer code (suggested by Ryan Kaliszewski, personal communication), which maps the vector to
For example has Lehmer code . Next the mapping
is a bijection from to , bearing in mind that .
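A minimal sketch of the Lehmer code and of a bijection onto the integers 0 to 23 follows. It assumes one standard convention (entry i counts the later entries that are smaller than entry i, and the index is read in the factorial number system); the displayed formulas (4) and (5) may differ in detail, and the names `lehmer_code` and `lehmer_index` are hypothetical.

```python
from itertools import permutations


def lehmer_code(perm):
    """Entry i counts the later entries of `perm` that are smaller than perm[i]."""
    n = len(perm)
    return [sum(1 for j in range(i + 1, n) if perm[j] < perm[i]) for i in range(n)]


def lehmer_index(code):
    """Read a Lehmer code as a number in the factorial number system.

    For length 4 this equals 6*c1 + 2*c2 + c3, because the last entry
    of a Lehmer code is always zero.
    """
    n = len(code)
    index = 0
    for i, c in enumerate(code):
        index = index * (n - i) + c
    return index


# The 24 permutations of 1..4 map onto 0..23 without collisions.
all_indices = sorted(lehmer_index(lehmer_code(p)) for p in permutations(range(1, 5)))
```

Under this convention the permutation (3, 1, 4, 2) has Lehmer code (2, 0, 1, 0) and index 13.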
The null hypothesis says: each of the possible outcomes in this independent multinomial trial has equal probability . The alternative hypothesis says: in this multinomial trial, not all outcomes are equally likely. Here is how we propose to perform the test of versus in time, or indeed time if Steps 2 and 3 of Section 2.2 are distributed among processors.
2.2 Four point test: computational steps
Recall that and are ordered sets of vertices, inducing two partial orders on the set of edges, namely the partial order by left endpoint, and the partial order by right endpoint, respectively.
For tie-breaking purposes, select independently, and uniformly at random, total orders and on among the linear extensions of the partial orders induced by those on and , respectively. For example, if and are sets of integers, this can be achieved by jittering each to , where are pairs of independent Uniform random variables, for .
Draw samples of size four from , uniformly and without replacement, so that no edge is sampled more than once. In practice, order randomly, then partition it into blocks of length four, discarding any remainder.
Order each block of size four, say , , by :
so in if all these left vertices are distinct. Compute the Lehmer code (4) associated with the ordering of under , which coincides with the ordering of in , if all these right vertices are distinct. For example if the block of four is
sorting by left vertex gives
and the Lehmer code for is .
These samples yield a vector counting the frequencies of each Lehmer code using the mapping (5). Under the null hypothesis , stores independent multinomial trials, with probability vector .
Suppose we wish to test the null hypothesis , i.e. the graph is block-free of order four, for the given orderings of left and right vertices. Perform a goodness of fit test of with respect to the multinomial distribution. The expected value of each under the null hypothesis is . The four point chi-squared statistic, or 4PT-, is
which is to be compared to the upper tail of the distribution.
Suppose has been rejected, and we seek a scale-free measure of how much block structure the graph has, with respect to the given orderings of left and right vertices. We propose to use the total variation distance, or 4PT-TV, between the empirical probability measure which assigns mass to Lehmer code , and the uniform measure on the 24 Lehmer codes, namely
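The computational steps above can be sketched end to end in Python. This is an illustrative sketch, not the implementation described in Section 4.6: the function name `four_point_test` is hypothetical, the Lehmer index is assumed to follow the factorial number system convention, and the total variation distance is assumed to carry the usual factor of one half, which may differ from the normalization in (8).

```python
import random


def four_point_test(edges, rng=None):
    """Return (counts, chi_squared, total_variation) for a list of (left, right) edges."""
    rng = rng or random.Random()
    # Step 1: jitter both endpoints so that ties are broken uniformly at random.
    jittered = [((u, rng.random()), (v, rng.random())) for u, v in edges]
    # Step 2: order the edges randomly and cut into blocks of four,
    # discarding any remainder.
    rng.shuffle(jittered)
    n = len(jittered) // 4
    if n == 0:
        raise ValueError("need at least four edges")
    counts = [0] * 24
    for b in range(n):
        # Step 3: sort the block by left endpoint, then take the Lehmer code
        # of the resulting ordering of right endpoints.
        block = sorted(jittered[4 * b: 4 * b + 4], key=lambda e: e[0])
        right = [e[1] for e in block]
        code = [sum(1 for j in range(i + 1, 4) if right[j] < right[i]) for i in range(4)]
        # Step 4: tally the frequency of each Lehmer code.
        counts[6 * code[0] + 2 * code[1] + code[2]] += 1
    # Step 5: Pearson chi-squared statistic against equal expected counts n/24.
    expected = n / 24
    chi_squared = sum((c - expected) ** 2 / expected for c in counts)
    # Step 6: total variation distance from the uniform measure on 24 codes.
    total_variation = 0.5 * sum(abs(c / n - 1 / 24) for c in counts)
    return counts, chi_squared, total_variation
```

As a sanity check, a perfectly "diagonal" graph with edges (i, i) puts every block into the identity permutation, so all the mass lands on a single Lehmer code and the total variation distance under this normalization is 23/24.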
2.3 Basic example: bipartite graph with two blocks
Figure 1 shows an example where all the incidences in the bipartite graph fall either in or in , for some and . Suppose proportion of incidences fall into , and proportion fall into . As in Figure 1, suppose vertices in are listed before those in , and those in are listed before those in . Call this the ordered two block model.
Suppose four incidences , , , are selected uniformly at random, labelled so that
Hence out of the 24 permutations, the only possible ones when are those in the set
Likewise consists of permutations where 1 is in the first place, consists of permutations where 4 is in the last place, while . From this reasoning, we obtain the simple lemma:
In the ordered two block model, where a proportion of incidences fall into the block, and proportion fall into the block, the relative frequency of permutation is
where Binomial, and is the set of permutations which are possible under the constraint that the first of belong to .
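The relative frequencies in the lemma can be tabulated numerically. The sketch below assumes that, given that j of the four sampled incidences fall in the first block, the induced permutation is uniform on the set of permutations mapping the first j positions onto the first j values (a set of size j!(4-j)!), and that j follows a Binomial(4, p) law. The function name `two_block_frequencies` is hypothetical.

```python
from itertools import permutations
from math import comb, factorial


def two_block_frequencies(p):
    """Map each permutation of 1..4 to its relative frequency in the ordered two block model."""
    freqs = {}
    for sigma in permutations(range(1, 5)):
        total = 0.0
        for j in range(5):
            # sigma is possible when j incidences lie in the first block
            # iff it maps positions 1..j onto the values 1..j.
            if set(sigma[:j]) == set(range(1, j + 1)):
                weight = comb(4, j) * p ** j * (1 - p) ** (4 - j)
                total += weight / (factorial(j) * factorial(4 - j))
        freqs[sigma] = total
    return freqs
```

Under these assumptions, at p = 1/2 the identity permutation has relative frequency 35/192, while permutations that are possible only when all four incidences fall in the same block have the lowest value, 1/192.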
These relative frequencies are displayed in Figure 2 as a function of . This number of curves is less than 24 because there exist different choices of for which the functions coincide.
The permutations for which (the lowest value) are those in .
Figure 2 demonstrates that when vertex ordering reveals block structure in the incidence matrix, the relative frequencies of different permutations in are tilted.
3 How vertex ordering affects the four point test
3.1 A pair of superficially similar but structurally different matrices
We shall set up a pair of Bernoulli matrix models, and , each with rows and columns, whose marginal statistics and likelihood ratio statistics (see Section A.3) are almost indistinguishable, but whose structure is entirely different, and then apply the four point test to each. We will also show how changing the vertex ordering of one of them dramatically changes the results of the four point test.
Figures 3 and 4 illustrate, respectively, (1) the structural difference between the two associated bipartite graphs, one of which decomposes completely into two components, like the one shown in Figure 1, and (2) the superficial similarity of their incidence matrices, under suitably randomized vertex orderings.
3.2 Bernoulli matrix model lacking block structure
Here is an elaborate pseudo-random construction based on modular arithmetic. Select positive integers and , such that are coprime. The real number has the properties
Partition the residues modulo into in any way so that
The Bernoulli parameters of are defined as follows. Fix a reference constant . Let be the residue class of modulo . Take
This deterministic scheme ensures that (to within a small discrepancy),
There is a pseudo-random set of cells of the incidence table with parameter .
There is a disjoint pseudo-random set of cells of the incidence table with parameter ,
The remaining of the cells of the incidence table have parameter zero.
Furthermore the proportions of each of these three types of cell are almost the same in every row and column, thanks to the use of residue classes of modulo . Indeed every column total has mean about and every row total has mean about , since the weighted sum of the parameters in 1, 2, 3 is
The main point is that no causally sparser or denser blocks of the incidence matrix will ever appear, no matter how the rows and columns are ordered, because the placements of the zero parameters are essentially different in every row and column. A realization appears on the left in Figure 4.
3.3 Bernoulli matrix model with hidden block structure
We shall now modify the last example to produce a Bernoulli matrix model , with block structure, whose Bernoulli parameters are chosen so that the number of index pairs for which is about , the number for which is about , and the rest are zero, just as for the previous case. Recall .
Let denote a random sample of rows, and let denote a random sample of columns. The random choices of and effectively screen the block structure from visual detection, when is sufficiently small. The Bernoulli parameters of are defined as follows.
This resembles the example of Section 2.3, in that a proportion out of the expected total of incidences appear in the block, and a proportion in the block. See Figure 1. Here too every column total has mean and every row total has mean , although the variances are slightly different to those of Section 3.2. In simulations of the models 3.2 and 3.3, the resulting incidence matrices are statistically indistinguishable to the naked eye for ; see Figure 4.
3.4 Four point test applied to concrete instances
The pseudo-random model of Section 3.2 and the hidden block model of Section 3.3 were instantiated with , , and , and are presented in the left and right panels of Figure 4, with 3116 and 3065 incidences, respectively. Block structure is imperceptible on the right panel, because the index sets and were selected randomly.
The four point test was applied three times to each matrix. The pseudo-random matrix scores were well within the -th percentile of the distribution. The random sampling of 4-tuples of edges causes significant random variation in test scores. Similar scores were observed for the model with hidden block structure; the frequencies of different permutations are shown in Figure 5.
Finally the vertex ordering was changed for the model with hidden block structure, to make vertices in precede those in , and vertices in precede those in . Such a re-ordering could be inferred from a graph partition algorithm, such as the one (FindGraphPartition) that produced the right pair of globs in Figure 3. After this, three applications of the four point test produced scores , far in the tail of the distribution, and Figure 6 shows the highly imbalanced permutation frequencies.
3.5 Practical conclusions from the case study
The case study emphasises that, when block structure is present, the four point test will reveal it only when the vertices are ordered in a way to tilt the frequencies of the permutations in . Consecutive applications of the test to the same matrix will produce answers with a statistical variability which reflects the random sampling of 4-subsets of the edges.
4 Natural vertex order computation in a bipartite graph
4.1 Choosing right and left vertex orders
We have seen in Section 3 that the ordering of left and right vertices strongly affects the output of the four point test. In this section we describe one computationally efficient method to select a natural order for the left vertices and for the right vertices, which tends to highlight block structure and to boost the Four Point Statistic. See Figure 7 for a preview. This is not the only possible method: see Section 6.3.
4.2 Symmetric linear operators
Extending the notation of Section 1.2, introduce diagonal matrices
Rescale the incidence matrix to give the matrix:
Introduce two rescaled symmetrized Laplacian operators:
where denotes the identity matrix. Define vectors
The following well known facts are easily checked by matrix multiplications.
4.3 Positive symmetric linear operators
We shall now shift attention away from Laplacians, towards the positive symmetric operators and . We already know that is the unique eigenvector of eigenvalue 1 for , and likewise for . Introduce a new symmetric linear operator on which composes left multiplication by with projection orthogonal to , namely
Since and , we may write this operator as a rank one perturbation of :
The corresponding operator on is
Here are some facts about them, without proof.
Let denote the rank of , i.e. of , and suppose the associated bipartite graph is connected. Each of the operators and has the same set of positive eigenvalues , which belong to the set . All other eigenvalues of and are zero. Moreover if denotes the eigenvector of associated with , then is an unnormalized eigenvector of associated with , and
Remark: and are known as Fiedler Vectors for the induced graphs on the left and right vertex sets, respectively.
Without loss of generality, suppose ; otherwise work instead with . Hence we put emphasis on , and derive results for from Lemma 4.2.
4.4 Power method
Proposition 4.1 (Power Method).
Take a random vector whose components are independent normal random variables. Project orthogonal to , and rescale to norm 1 to obtain . Iterate for :
Let denote the angle such that . The event has probability 1, and in that case
This implies that, with probability 1, exists and is equal to or to . Provided , the convergence occurs at an exponential rate.
This iterative scheme is the power method described in Golub & Van Loan [12, Theorem 8.2.1] for the computation of the eigenvector with top eigenvalue of the symmetric linear operator . The cited theorem proves the bound on . ∎
Since an approximation suffices, we propose to fix some and to stop the iteration (12) at the first for which
For a given spectrum, (13) implies that matrix multiplies will suffice, each of which is work. We observe in practice that if the local structure of remains statistically similar as increases, the number of iterations before stopping does not vary as increases, implying that total work is . A crude upper bound for graph diameter can be obtained by selecting a left vertex uniformly at random, and taking to be twice the number of steps of breadth first search needed to cover the graph entirely. In the absence of an estimate for , we observed that in sparse graphs iterations were sufficient for convergence when . For more on the relation between graph spectrum and graph diameter, see Chung [8, Ch. 3]. The heuristic claim is that work suffices for computing an adequate natural order.
Probabilistic arguments show that the random variable in the upper bound (13) is .
In experiments, the ratio is typically less than , making the convergence faster than that implied by (13).
We have phrased the iteration (12) in terms of the symmetric operator in order to appeal to the literature on the symmetric eigenvalue problem. In computational implementation the matrix is typically given by two jagged arrays, one giving a look-up by row, and the other giving a look-up by column. The iteration (12) can be implemented under the rescaling :
where is the all ones vector. The normalization step need not be performed in the norm. It can, for example, be performed in the norm instead.
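A power iteration of this kind can be sketched in pure Python working directly from an edge list. The sketch rests on assumptions that should be checked against the displayed formulas: the incidence matrix is rescaled by the inverse square roots of the left and right degrees, the symmetric operator is that rescaled matrix times its transpose, the known top eigenvector is proportional to the square roots of the left degrees, and iteration stops when successive unit vectors differ by less than a tolerance. The name `natural_order_vector` and its parameters are hypothetical.

```python
import math
import random


def natural_order_vector(edges, n_left, tol=1e-10, max_iter=1000, seed=0):
    """Approximate the second eigenvector of the rescaled symmetric operator.

    `edges` is a list of (left, right) index pairs with left in range(n_left).
    Assumes the graph is connected and the operator has a non-zero second
    eigenvalue; otherwise the normalization below may divide by zero.
    """
    rng = random.Random(seed)
    n_right = 1 + max(v for _, v in edges)
    d_left, d_right = [0] * n_left, [0] * n_right
    for u, v in edges:
        d_left[u] += 1
        d_right[v] += 1
    m = len(edges)
    top = [math.sqrt(d / m) for d in d_left]     # unit top eigenvector

    def apply(x):
        # Two sparse passes over the edge list: rescaled transpose, then rescaled matrix.
        y = [0.0] * n_right
        for u, v in edges:
            y[v] += x[u] / math.sqrt(d_left[u] * d_right[v])
        z = [0.0] * n_left
        for u, v in edges:
            z[u] += y[v] / math.sqrt(d_left[u] * d_right[v])
        return z

    def project_and_normalize(x):
        dot = sum(a * b for a, b in zip(x, top))
        x = [a - dot * b for a, b in zip(x, top)]  # remove the top eigenvector
        norm = math.sqrt(sum(a * a for a in x))
        return [a / norm for a in x]

    x = project_and_normalize([rng.gauss(0.0, 1.0) for _ in range(n_left)])
    for _ in range(max_iter):
        x_new = project_and_normalize(apply(x))
        change = math.sqrt(sum((a - b) ** 2 for a, b in zip(x_new, x)))
        x = x_new
        if change < tol:
            break
    return x
```

On a small graph made of two dense blocks joined by a single edge, the components of the returned vector take one sign on the first block and the opposite sign on the second, which is the separation that the natural order exploits.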
4.5 Definition of natural order
The left vertex set is in natural order if vertices are in decreasing or increasing order of the corresponding components of the eigenvector , described in Lemma 4.2. Likewise components of supply a natural order for the right vertex set .
In this definition we do not insist that or be computed precisely. Indeed an approximation, constructed as in Proposition 4.1, suffices.
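Once approximate eigenvector components are available, placing vertices in natural order amounts to sorting by those components and relabelling the edges. A minimal sketch, with hypothetical helper names `natural_order` and `relabel_edges`:

```python
def natural_order(scores, decreasing=False):
    """Return vertex indices sorted by their eigenvector components."""
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=decreasing)


def relabel_edges(edges, left_order, right_order):
    """Rewrite each edge using the rank of its endpoints in the given orders."""
    left_rank = {vertex: rank for rank, vertex in enumerate(left_order)}
    right_rank = {vertex: rank for rank, vertex in enumerate(right_order)}
    return [(left_rank[u], right_rank[v]) for u, v in edges]
```

The relabelled edge list can then be passed to the four point test, since only the relative order of the vertices matters there.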
See Figure 7 for an illustration of an incidence matrix transformed into natural order of left and right vertices.
4.6 Scaling behavior in natural order and four point test computations
The natural order and four point test computations have been implemented both in a Mathematica prototype and in a performant Java 10 package called QuantifyBipartiteBlockStructure.
We simulated some -regular random hypergraphs on vertices, where the vertices in hyperedge were not picked uniformly, but were a weighted sample using weight for vertex , which tends to force incidences away from the diagonal. For , we simulated two instances of such random hypergraphs for parameter choices , , . Empty columns were discarded. Figure 7 shows one of the largest matrices, both before and after the natural order computation.
The four point total variation score (8) was always in the range for the raw matrix, and in the range for the naturally ordered matrix, regardless of scale. varied as much between two matrices of the same size as it did between two matrices of different sizes. This suggests the possibility of proving limit results for values of as under suitable assumptions about the matrix generation mechanism.
On the other hand the four point chi-squared statistic (7) scales in proportion to in these examples.
| Review Set | # edges | # left | # right | TV | giant | TV-NO | 4PT | N.O. |
4.7 Large natural order and four point test computations
We performed four point test and natural ordering computations on five sets of Amazon Reviews data (jmcauley.ucsd.edu/data/amazon), as shown in Table 1. In all cases left vertices were reviewers, and right vertices were products of a specific type. Reading the data took longer than performing the four point test, whose time scaled linearly in the number of edges, as expected; see Figure 8. It is noteworthy that execution times for natural ordering, which typically required about 25 iterations of the power method, also scaled linearly in the number of edges.
Only for Amazon Reviews of Books did the natural ordering improve the score in the four point total variation statistic (8). For the other four product categories, the original order yields a higher score. The high scores suggest that, for example, music tracks fall into music genres, and reviewers of one genre do not tend to review other genres. This effect is least for books: some reviewers may rate multiple types of literature.
5 Transition between random permutations and random bipartite graphs
The theory in this section leads to a proof of Theorem 1.1.
5.1 Fixed or random vertex degrees?
For modelling applications, and for algorithm development, we seek efficient ways to generate asymptotically block-free random bipartite (multi)graphs with arbitrary marginal degree distributions. In this section we describe one such natural construction. See also .
When considering heavy-tailed distributions, for example with finite mean but infinite variance, the case of random left and right vertex degree vectors is more important, since the maximum vertex degree may be very large. This will be treated in a separate paper .
5.2 Random permutation generates bipartite graph: fixed vertex degrees
This section is inspired by the half-edge construction due to Wormald, and the configuration model in Bollobás [4, Section II.4]. A totally ordered left vertex set and a totally ordered right vertex set are given. Fix a left vertex degree vector , and right vertex degree vector in advance, where both vectors sum to . It is required that has degree , and has degree . It is convenient to introduce the partial sums
with , .
Construct two sequences and of vertex labels, both of length , where when , and when . Thus contains symbols referring to , then symbols referring to , and so on:
while contains symbols referring to , then symbols referring to , and so on. We call and left and right half-edge vectors, respectively.
Given left and right half-edge vectors and , respectively, of length , the bipartite multigraph induced by a permutation is the graph whose edge set consists of the pairs
We estimate in Section B.1 the expected number of duplicate edges in . Blanchet & Stauffer  give necessary and sufficient conditions, also proved in Janson , for the asymptotic probability of obtaining a simple graph to be positive.
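The half-edge construction is short enough to sketch directly. In the Python below, vertices are labelled 0, 1, 2, ... rather than by the paper's symbols, and the names `half_edge_vector` and `induced_multigraph` are hypothetical. Edge i is assumed to join the i-th left half-edge to the right half-edge selected by a uniform random permutation.

```python
import random


def half_edge_vector(degrees):
    """Repeat each vertex label as many times as its degree, in vertex order."""
    return [vertex for vertex, d in enumerate(degrees) for _ in range(d)]


def induced_multigraph(left_degrees, right_degrees, rng=None):
    """Pair left and right half-edges through a uniform random permutation."""
    rng = rng or random.Random()
    left = half_edge_vector(left_degrees)
    right = half_edge_vector(right_degrees)
    if len(left) != len(right):
        raise ValueError("degree vectors must have the same sum")
    sigma = list(range(len(left)))
    rng.shuffle(sigma)                      # uniform random permutation
    # Duplicate pairs may occur, so the result is a multigraph in general.
    return [(left[i], right[sigma[i]]) for i in range(len(left))]
```

By construction the prescribed degrees are matched exactly, and the edges are produced already sorted by left endpoint.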
The following lemma is nearly a tautology, given the construction (14).
Suppose for each , and are left and right half-edge vectors of the same length , where and are the sets of distinct labels occurring in the respective vectors. Take to be the bipartite (multi)graph on induced, as in (14), by a uniform random permutation . If , , and , then is asymptotically block-free.
Fix . For any such that , select edges uniformly at random, say
where for brevity we have dropped the index from the notation. The left endpoints are already in increasing order. Since the permutation is uniformly random, the right endpoints are ordered uniformly at random. Thus for every , the sequence is asymptotically block-free of order . ∎
5.3 Inversion of the half edge construction
Let us elaborate on the construction of total orders on edges, introduced in Section 2.2. Fix an arbitrary total order on the edges, , with the property that, for all ,
In other words, the order on edges is consistent with the order on left vertices. Next generate i.i.d. Uniform random variables , which will be used as tie breakers in the following way. Extend the right half-edges above, i.e.
to a series of pairs
This yields another total order on the edges, namely lexicographic ordering using first the ordering on the , then the ordering on the . In other words,
if either , or else and .
From the constructions above, the following Inversion Lemma is a tautology.
5.4 Permutation terminology: Property
This terminology is reproduced from . Let consist of permutations on . We view each as a bijection , and we say that the length of is . For and with , let be the probability that a random -point subset of induces a permutation isomorphic to (that is, iff where consists of ). A sequence of permutations is said to have Property if their lengths tend to and for every . It is easy to see that implies .
5.5 Proof of Theorem 1.1
Take a sequence of random bipartite (multi)graphs, where , , and , which is asymptotically block-free of order 4.
Apply Definition 5.2 to convert each graph into a random permutation of symbols. From asymptotic block-freeness of order 4, and the auxiliary randomization (16), it follows that Property holds for the sequence , in the sense of Section 5.4. Theorem 1 of Král & Pikhurko shows that Property holds for all . Together with Lemma 5.2, this implies is asymptotically block-free of order , for all , as desired.
6 Open problems
In this new area of research, many topics remain to be explored.
6.1 Directed non-bipartite graphs
6.2 Vertex exchangeability and edge exchangeability
Caron & Fox  present constructions of random bipartite graphs where the left vertices are exchangeable, and the right vertices are exchangeable. A general case is described by Borgs, Chayes, Cohn and Holden . Cai, Campbell & Broderick  and Crane & Dempsey  have defined the notion of an edge-exchangeable graph sequence. What happens when one applies the four point test to vertex-exchangeable or edge-exchangeable graph sequences?
6.3 Minimum degree instead of natural order
The natural order defined in Section 4 is neither the only, nor the cheapest, approach to ordering rows and columns of a sparse matrix in order to expose something resembling block structure. For example, Duff et al. describe the minimum degree algorithm. This starts with all rows declared active, and terminates when no active rows remain. Active degree of column means the number of incidences of column with active rows. Iterate as follows:
Select some column uniformly at random from those of minimum non-zero active degree, and place it next in the column ordering.
Active rows incident to are placed next in the row ordering, and are then declared inactive.
Update active degrees of columns by subtracting counts of incidences with newly inactive rows.
Column labels left over when active rows are exhausted are placed in arbitrary order, after the others. We would like to know whether applying minimum degree to some kinds of sparse matrices leads to higher or lower 4PT-TV scores than applying natural order.
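The iteration just described can be sketched as follows. This is a naive illustration, not the algorithm of Duff et al. as published: it treats the matrix as 0-1 (repeated incidences are merged), rescans all columns at each step instead of using a priority structure, and appends rows with no incidences at the end. The function name `minimum_degree_order` is hypothetical.

```python
import random


def minimum_degree_order(edges, n_rows, n_cols, rng=None):
    """Return (row_order, col_order) for a list of (row, column) incidences."""
    rng = rng or random.Random()
    rows_of_col = [set() for _ in range(n_cols)]
    cols_of_row = [set() for _ in range(n_rows)]
    for r, c in edges:
        rows_of_col[c].add(r)
        cols_of_row[r].add(c)
    active = [True] * n_rows
    active_degree = [len(rows) for rows in rows_of_col]
    placed = [False] * n_cols
    row_order, col_order = [], []
    while True:
        candidates = [c for c in range(n_cols) if not placed[c] and active_degree[c] > 0]
        if not candidates:
            break
        d_min = min(active_degree[c] for c in candidates)
        # Pick uniformly among columns of minimum non-zero active degree.
        c = rng.choice([k for k in candidates if active_degree[k] == d_min])
        placed[c] = True
        col_order.append(c)
        for r in sorted(rows_of_col[c]):
            if active[r]:
                # Place the row next, deactivate it, and update active degrees.
                active[r] = False
                row_order.append(r)
                for c2 in cols_of_row[r]:
                    active_degree[c2] -= 1
    # Left-over columns, and rows with no incidences, go last in arbitrary order.
    col_order += [c for c in range(n_cols) if not placed[c]]
    row_order += [r for r in range(n_rows) if active[r]]
    return row_order, col_order
```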
6.4 Discrepancy measures in bipartite graphs
Given vertex sets and in a directed bipartite graph , let count the set of edges between and :
The total degree of vertices in , and in , respectively, is
Motivated by the notion of discrepancy, which gives one of the equivalent definitions of a quasirandom permutation , define the discrepancy in the bipartite graph to be the random variable
The open problem is to give computable bounds on the discrepancy of a sequence of random bipartite graphs which are asymptotically block-free in the sense of Definition 1.2. Possibly such bounds may be derived from concentration inequalities such as Theorem D.1 below.
6.5 Relation to quasirandom hypergraphs
Quasirandom hypergraphs are those which have the properties one would expect to find in “truly” random hypergraphs, in which a -edge contains vertices selected uniformly without replacement, and all -edges are statistically independent. Shapira & Yuster, Lenz and Mubayi, and other authors cited therein, study quasirandomness in sequences of dense -uniform hypergraphs, meaning that, for some , the number of hyperedges with vertices inside any is
The study of quasirandom structures lies at the core of recent proofs of Szemerédi’s Theorem obtained by Gowers, and by Rödl et al. We would like to clarify how this theory of dense quasirandom hypergraphs interacts with the approach to sparse hypergraphs (viewed in terms of bipartite graphs and quasirandom permutations) that we have taken here.
Appendix A: likelihood ratio statistic for sparse binary contingency tables
This section is intended to assuage the concerns of statisticians for whom tests of association in binary contingency tables necessarily involve the likelihood ratio statistic, as presented in texts such as Agresti. We will see that the likelihood ratio statistic detects association between row degree and column degree in a binary contingency table, and its variance detects non-uniform incidence rates, but it does not detect block structure in sparse tables, as we show in a painstaking example.
A.2 Likelihood ratio statistic in the Bernoulli matrix model
For simplicity consider first the Bernoulli matrix model of a binary contingency table, where incidence is Bernoulli, for some constants with values in . Let denote the set of index pairs for which . Define a log odds ratio
and a normalizing constant