Rectangle-Loop-Algorithm
None
view repo
Uniform sampling of binary matrix with fixed margins is an important and difficult problem in statistics, computer science, ecology and so on. The well-known swap algorithm would be inefficient when the size of the matrix becomes large or when the matrix is too sparse/dense. Here we propose the Rectangle Loop algorithm, a Markov chain Monte Carlo algorithm to sample binary matrices with fixed margins uniformly. Theoretically the Rectangle Loop algorithm is better than the swap algorithm in Peskun's order. Empirically studies also demonstrates the Rectangle Loop algorithm is remarkablely more efficient than the swap algorithm.
READ FULL TEXT VIEW PDFNone
The problem of sampling binary matrices with fixed row and column sums has attracted much attention in numerical ecology. In ecological studies, the binary matrix is called occurrence matrix. Rows usually corresponds to species, the columns, to locations. For example, the binary matrix shown on Table 1 is known as “Darwin’s Finch” dataset, which comes from Darwin’s studies of the finches on the Galapagos islands (an archipelago in the East Pacific). The matrix represents the presence/absence of 13 species of finches in 17 islands. A or in entry indicates the presence or absence of species at island . It is clear from Table 1 that some pairs of species tend to occur together (for example, species 9 and 10) while some other pairs tend to be disjoint. Therefore, it is of our interest to investigate whether the cooperation/competition influences the distribution of species on islands, or the patterns found are just by chance.
Island | |||||||||||||||||
Finch | A | B | C | D | E | F | G | H | I | J | K | L | M | N | O | P | Q |
1 | 0 | 0 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 0 | 1 | 1 | 1 | 1 | 1 | 1 |
2 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 0 | 1 | 0 | 1 | 1 | 0 | 0 |
3 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 0 | 1 | 1 | 0 | 0 |
4 | 0 | 0 | 1 | 1 | 1 | 0 | 0 | 1 | 0 | 1 | 0 | 1 | 1 | 0 | 1 | 1 | 1 |
5 | 1 | 1 | 1 | 0 | 1 | 1 | 1 | 1 | 1 | 1 | 0 | 1 | 0 | 1 | 1 | 0 | 0 |
6 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 1 | 0 | 0 | 0 | 0 |
7 | 0 | 0 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 0 | 0 | 1 | 0 | 1 | 1 | 0 | 0 |
8 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 |
9 | 0 | 0 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 0 | 1 | 0 | 0 | 1 | 0 | 0 |
10 | 0 | 0 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 0 | 1 | 0 | 1 | 1 | 0 | 0 |
11 | 0 | 0 | 1 | 1 | 1 | 0 | 1 | 1 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
12 | 0 | 0 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
13 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 |
Assuming different species have independent distributions on islands, then the observed binary matrix is simply a random sample from the uniform distribution of all the binary matrices with fixed margins. Table
2 gives an example of all configurations of binary matrices with as both row and column sums. Ideally, if we could list all the binary matrices with arbitrary size, then we could compare the pattern found in the observed matrix with others, to conclude whether the observed matrix is simply by chance. However, enumerating matrices with fixed margins is often impractical both theoretically and computationally for moderate size of matrices. Therefore, sampling such random matrices becomes the natural choice.A | B | C | D | E |
---|---|---|---|---|
The problem of sampling such matrices also occurs in many other fields, with different names. For example, an equivalent formulation is uniformly sampling undirected bipartite graphs with given vertex degrees. A bipartite graph is a graph whose vertices are divided into two disjointed sets, denoted by , . is called the edge set where every edge connects one vertex in to one in . The binary matrix is often called the bi-adjacency matrix of and is defined by
Bipartite graphs are often used in network studies to model the interaction between two objects, for example, customers and products. It is often required to sample graphs with preserved degree sequence in network analysis uniformly. Throughout this paper, we will use the term ‘binary matrix’ instead of ‘bipartite graph’ to avoid confusion, although they are equivalent.
The algorithms of sampling binary matrices with fixed margins are divided into two classes. The first class of algorithms relies on the rejection sampling or importance sampling techniques, see [Sni91], [MP04], [CDHL05], [HM13] [HJ96]
for examples. Importance sampling usually provides degree from non-uniform distribution, but it can be used to construct estimators to estimate the quantities of interest, such as the number of binary matrices with given margins. Chen et al.
[CDHL05] introduced a sequential importance sampling (SIS) scheme to test the hypothesis we mentioned at the beginning of this paper on “Darwin’s Finch” dataset.The second class falls into the Markov Chain Monte Carlo (MCMC) category and will be our main focus in this paper. The well-known “swap algorithm” has been used for decades. To the author’s best knowledge, it is first introduced by Besag and Clifford [BC89] in 1989 to solve a statistical testing problem. The swap algorithm has been formally proposed and analysed by Rao et al. [RJB96] and [KTV99] in the 1990s. A similar question is to sample matrices with non-negative integer entries, fixed row and column sums. Diaconis and Gangolli have proposed a random walk Metropolis algorithm [DG95]. Many variations and extensions of this algorithm are described by Diaconis and Sturmfels [DS98].
The swap method attempts to make a single swap in each iteration, but when the matrix is large, or is mostly filled (or unfilled), the efficiency of swap algorithm can be relatively low. In 2008, Verhelst [Ver08] proposed a new MCMC algorithm based on the idea of performing multiple swaps per iteration. In 2014, Strona et al. [SNB14] introduced the “Curveball algorithm”, which uses a ‘fair trade’ operation to replace the ‘swap’ operation in the swap algorithm, aiming for a faster mixing. The mathematical formulation of Curveball algorithm is equivalent to Verhelst’s algorithm, but with different implementation and reasoning. A nice survey and numerical comparisons of the existing algorithms can be found in a recent dissertation [Rec18]. The class of ‘multiple swaps’ algorithms tends to improve the mixing time empiricially. However, each step of the ‘multiple swaps’ algorithm is slower than the classical swap algorithm. Meanwhile, it is hard to compare the ‘multiple swaps’ algorithms and classical swap algorithm theoretically, as the corresponding Markov-chains have complicated behaviors and are therefore hard to analyze mathematically. The only existing result can be found in [CK18].
In this paper, we introduce a novel algorithm called Rectangle Loop algorithm. The algorithm is based on the classical swap algorithm, with a careful utilization of the matrix structure given by margins. We have also proved the resulting Markov Chain dominates the classical chain used in the swap algorithm in the sense of Peskun’s partial ordering [Pes73], and is easy to implement. Section 2 gives a review of swap algorithm and Curveball algorithm, including the details of both algorithms and a discussion. In Section 3 we introduce our new algorithm – Rectangle Loop algorithm. Section 4 proves the theoretical properties of Rectangle Loop algorithm. Section 5 gives numerical results.
The swap algorithm, or equivalently, swap chain is based on the idea of swapping checkerboard units. Here a checkerboard unit is a two by two matrix with one of the following forms:
A swap means changing one checkerboard unit to the other.
Starting from an initial matrix, one chooses two rows and two columns uniformly at random among all rows and columns. If the resulting submatrix with entries in the intersection of these rows and columns is a checkerboard unit, it is swapped, otherwise, do nothing.
Input: initial binary matrix , number of iterations
The swap algorithm is a Metropolis-type Markov chain Monte Carlo which converges to the uniform distribution.
The swap algorithm can often be inefficient, taking Darwin’s Finch data for example, there are submatrices with size , however, only about of them are swappable. This means it requires a very large (the number of iterations) to ensure the generated degree is close to uniformly distributed. The Curveball algorithm provides another solution.
Input: initial binary matrix , number of iterations
The Curveball algorithm uses ‘trade’ instead of ‘swap’ operation in each iteration. Steps 3-5 in Algorithm 2 gives an illustration of trading, it trades elements in column with elements in column for row and row , preserving their row and column sums. Though seemingly complicated, there is a very intuitive explanation of the Curveball algorithm. We refer the readers to [SNB14] for detailed illustrations.
The swap algorithm is proven to converge to uniform distribution. However, getting stuck at the same configuration is inefficient and thus the convergence could be very slow. For example, numerical experiments suggest that one would expect more than iterations before each successful swap using ‘Darwin’s Finch’ dataset. Assuming the randomly chosen row is mostly filled, such as row in Table 1, the two random chosen entries in this row would most likely be but the row of a ‘checkerboard unit’ has to be either or . Therefore swapping rarely happens when the chosen row/column is mostly filled (or equivalently, mostly unfilled).
The Rectangle Loop algorithm is designed to increase the chance of swapping. The idea is illustrated in Figure . In this example the target matrix is of size , with row names and column names . In each step, we choose one row and one column uniformly at random (Step A). Suppose and is chosen, with corresponding entry , the red number in the top middle plot of Figure 1. Then we randomly choose a among all the s in (Step B). Since there is only one in , which is at location , this is our only choice. Again, we scan through all the entries in the same column with the just chosen () and randomly choose a among all s (Step C). In our example, the s of are located at and . Suppose we have chosen . Now the three locations altogether give us the fourth one , making the four entries a rectangle (Step D). If the fourth entry equals , then we swap the submatrix as we did in swap method (Step E). Otherewise the fourth entries equals and the original matrix is not changed. After Step A - E is iterated for many times, the resulting randomized matrices are used as representatives of uniformly distributed matrix with fixed margins.
The main difference between the Rectangle Loop algorithm and the swap algorithm is the sampling scheme. The Rectangle Loop algorithm is performing ‘conditional sampling’, making it more efficient than swap method, which is doing ‘unconditional choosing’. For example, suppose both the swap method and Rectangle Loop algorithm have chosen and , an entry with value . Then has only one which is in column
. For the swap algorithm, the probability of correctly choosing
is only , as it is uniformly choosing among all columns. The Rectangle Loop algorithm, however, as the mechanism guarantees we only sample from the zero entries, chooses with probability . Therefore it significantly increase the swapping probability, leading to a faster convergence than the swap chain.The details of the Rectangle Loop algorithm is described in Algorithm 3. Noteworthy, when finding a entry, we sample with the same column as the . When finding a , we sample with the same row as the . This ‘symmetric’ design ensures the algorithm that converges to the correct distribution, as will be proved in Section 4. The paths of sampled entries in each iteration always form a rectangle, that is where the name ‘Rectangle Loop algorithm’ comes from.
Given row sums and column sums , we define be the set of all matrices with row sums r and column sums c. The suffcient and necessary condition for not being zero is given by Gale [G57], Ryser [Rys09] in 1957. We call two matrices , ‘swappable’ if one can transform to the other via one step swap algorithm. Equivalently, and only differs in a ‘checkerboard unit’. For the sake of simplicity, we assume henceforth for any , as otherwise we could simply delete that degenerate row/column. The following theorem characterizes the limit distribution and transition probability of the swap chain.
Given and an initial matrix , the swap algorithm defines a Metropolis-type Markov chain with stationary distribution , transition kernel:
and acceptance probability .
When and are swappable, there exists two rows and two columns such that and only differs in the submatrix extracted by row and column . Therefore the probability of swapping to equals
Recall that in the setting of Metropolis-Hastings, the acceptance probability from to is given by
notice here the stationary distribution is designed to be , . Hence it is clear that the swap algorithm is a Metropolis-type Markov chain with as stationary distribtuion and acceptance probability , which justifies the correctness of the swap algorithm. ∎
The key point in swap algorithm is symmetry. When two different states are swappable, the associated transition probability is symmetric, i.e.,
this ensures the chain has acceptance probability .
In order to compare the efficiency of different Markov kernels with the same distribution, Peskun [Pes73] first introduced the following partial-ordering.
Let , be two Markov transition kernels on the same state space with same stationary distribution , then dominates off the diagonal, , if
for all and measurable with .
When the state space is finite, as in our case, iff all the off-diagonal entries of are greater than or equal to the corresponding off-diagonal entries of . This indicates has lower probability to get stuck in the same state, and is exploring the state space in a more efficient way. The following theorem shows the rectangale loop algorithm also has uniform distribution as stationary distribution, and the corresponding chain dominates the swap chain off the diagonal. For simplicity, we will use and to denote transition kernel for the swap chain and rectangle loop chain, respectively, omitting its dependency on
Given and an initial matrix , the Rectangale loop algorithm defines a Metropolis-type Markov chain with stationary distribution . The transition kernel dominates off the diagonal.
Given any two swappable configurations and , we are aiming to show .
As are swappable, there exists two rows and two columns such that and only differs in the submatrix extracted by row and column . Without loss of generality, we assume the checkerboard unit corresponding to has the form , as shown in Figure 4. Notice that there are four vertices of the submatrix and the Rectangle Loop algorithm chooses one arbitray row and column at its first step. This suggests the probability of transforming to equals the summation of four probabilities, each one corresponds to choosing one specific vertex of the ‘checkerboard unit’ .
Figure 2 illustrates the calculation of one path, starting from the vertex . The possibility of choosing row and column is . Then one chooses a among all the s in row , and there are of them. Therefore the possibility of choosing column equals . Similarly, after choosing , one chooses a among all s in column , and there are of them. Hence the possibility of choosing row equals . The fourth entry is fixed after determining the first three entries thus the last step has probability . Multipling the possibilities above altogether, the possibility of transforming to , starting with , equals
The probability of transforming to with other starting vertex can be calculated accordingly. It turns out can be written as the following summation:
Following the same strategy, can also be calculated below. Figure 3 illustrates the calculation of starting from .
After matching all the terms of with , we conclude that , which justifies the Rectangle Loop algorithm has Unif() as stationary distribution.
To show , notice that . and
It is clear that can be written as the summation of four terms. Each term in the summation is greater than or equal to . Therefore we conclude that for any swappable . This indicates , the Rectangle Loop algorithm is exploring the state space in a more efficient way than the swap algorithm. ∎
Now we use the example used in [SNB14] and [MP04] to compare the existing algorithms and the Rectangle Loop algorithm. The example below is concrete. The transition matrix can be calculated explicitly and convergence can be assessed analytically.
The five matrices shown in Table 2 are all possible configurations of binary matrices with as both row and column sums. The transition matrices for swap algorithm, Curveball algorithm and Rectangale loop algorithm is shown in Table 3.
Swap | Curveball | Rectangle Loop |
---|---|---|
Figure 4 shows the comparison of the three algorithms. Here we measure the distance between transition kernel and the stationary distribution by total variation distance:
where denotes the power. It is clear that all the algorithms converge exponentially to uniform distribution, but the Rectangle Loop algorithm converges faster than swap algorithm and Curveball algorithm.
For larger matrices, it is infeasible to calculate the transition matrix theoretically. To provide empirical justification for the advantage of the Rectangle Loop algorithm. We have designed the experiment as follows. For each , a binary matrix is generated for which each entry has probability to be . We ran each algorithm for iterations and collected the corresponding number of swaps, as shown in Table 4. When the filled portion is small, the Rectangle Loop algorithm is extremely efficient, producing more than times more swaps than the swap algorithm. For large , the advantage of Rectangle Loop algorithm is reduced, but still very significant. For , the Rectangle Loop still produces
times more swaps than the swap algorithm. Noteworthy, the zeros and ones play the symmetric rule in a binary matrix, therefore it is not necessary to generate the random matrix for
.The result above justifies our theoretical result that the Rectangle Loop algorithm converges faster than the swap algorithm. However, the above experiments did not consider the running time for each iteration. In fact, one iteration of Rectangle Loop algorithm is computationally more expensive than that of swap algorithm. To investigate this issue, we also record the time per swap for both algorithms, as shown in the last column of Table 4. It turns out that the Rectangle Loop algorithm still has a significant advantage than the swap algorithm after the running time issue is taken into account. For , the Rectangle Loop is about times more efficient than the swap algorithm. Even for , Rectangle Loop algorithm is still about times more efficient than the swap algorithm.
Method | Filled portion | Number of swaps | Time per swap (/s) |
---|---|---|---|
Rectangle Loop | |||
Swap | |||
Rectangle Loop | |||
Swap | |||
Rectangle Loop | |||
Swap | |||
Rectangle Loop | |||
Swap | |||
Rectangle Loop | |||
Swap | |||
Rectangle Loop | |||
Swap | |||
Rectangle Loop | |||
Swap |
We have also used the pertubation score suggested by Strona et.al. [SNB14] to access convergence for both algorithm. Pertubation score of a matrix is defined by the fraction of cells differing from the corresponding ones of the initial matrix. It takes several iterations for each algorithm to stabilize around its expectation. It is shown in Figure 5 that it takes less iterations and less time for the Rectangle Loop algorithm to stabilize, suggesting a faster mixing than the swap algorithm.
Going back to ‘Darwin’s Finch’ dataset, we use the test statistics
suggested by Roberts and Stone [RS90] to compare the three algorithms. is defined bywhere is the number of rows of matrix , . For the finch data, . Suppose this number is too large or too small, comparing with its expectation over all the matrices having the same margins as finch data. We would like to conclude that the cooperation/competition do influence the distribution of species. To investigate this, we implemented the swap algorithm, Curveball algorithm and Rectangle Loop algorithm on the same data for iterations, using its average as an estimator for . The results are shown in Figure 6. After iterations, Rectangle Loop algorithm gives an estimate of
with standard deviation
, swap algorithm gives an estimate of with standard deviation , Curveball algorithm gives an estimate of , with standard deviation . Therefore the observed data falls outside the three standard deviation boundaries for all three algorithms, suggesting strong evidence that the observed occurence matrix is not just by chance. Meanwhile, both swap algorithm and Rectangle algorithm gives similar estimations and lower standard deviations, which seem to be more accurate than the Curveball algorithm. Lastly, there is a significant pattern in Figure 6 that for both and standard deviation estimation, the Rectangle Loop algorithm becomes stabilized much earilier than the swap algorithm, indicating a faster mixing.There is a growing tendency to study the behavior of binary matrices with fixed margins in numerous scientific fields, ranging from mathematics to natural science to social science. For example, mathematicians and computer scientists are interested in the total number of configurations of given margin sums. Ecologists use the so-called occurence matrix
to model the presence/absence of species in different locations. Biologists use the binary matrix to model neuronal networks. Social scientists use the binary matrix for studying social network features.
One of the central and difficult problems is uniformly sampling binary matrices with given margins. In this article we have developed the Rectangle Loop algorithm which is efficient, intuitive and easy to implement. Theoretically, the algorithm is superior to the classical swap algorithm in Peskun’s order. In practice, the Rectangle Loop algorithm is notably more efficient than the swap approach. For a fixed number of iterations, Rectangle Loop algorithm produces times more successful swaps than the swap algorithm. For a fixed amount of time, Rectangle Loop algorithm still produces times more successful swaps than the swap algorithm. This suggests the Rectangle Loop algorithm is efficient both statistically and computationally.
There are many other problems that remain. From a theoretical point of view, it is important to give sharp bounds on the convergence speed of a given Markov chain. However, giving a useful running time estimate is often challenging in practical problems. It would be very interesting if the swap algorithm, Curveball algorithm and the Rectangle Loop algorithm can be investigated analytically. From an applied point of view, there are many factors that influence the performance of algorithms, such as running time per step (swap algorithm is the fastest, while Curveball algorithm is the slowest), initialization of the matrix, size of the matrix, ratio between row number and column numbers, filled proportions. Our empirical studies suggest that all the factors have a significant impact on the convergence speed for all the algorithms. It would be beneficial if more numerical experiments are carried out, yielding a complete and comprehensive comparison between all the existing algorithms.
Approximation, Randomization, and Combinatorial Optimization. Algorithms and Techniques (APPROX/RANDOM 2018)
. Schloss Dagstuhl-Leibniz-Zentrum fuer Informatik, 2018.