Phase transition in random contingency tables with non-uniform margins

03/20/2019
by   Sam Dittmer, et al.

For parameters n, δ, B, and C, let X=(X_kℓ) be the random uniform contingency table whose first n^δ rows and columns have margin BCn and whose last n rows and columns have margin Cn. For every 0<δ<1, we establish a sharp phase transition of the limiting distribution of each entry of X at the critical value B_c=1+√(1+1/C). In particular, for 1/2<δ<1, we show that the distribution of each entry converges to a geometric distribution in total variation distance, whose mean depends sensitively on whether B<B_c or B>B_c. Our main result shows that E[X_11] is uniformly bounded for B<B_c, but has sharp asymptotics C(B-B_c) n^(1-δ) for B>B_c. We also establish a strong law of large numbers for the row sums in the top right and top left blocks.

1. Introduction

1.1. Random contingency tables

Contingency tables are fundamental objects in statistics for studying dependence structure between two or more variables, see e.g. [Eve92, FLL17, Kat14]. They also correspond to bipartite multi-graphs with given degrees and play an important role in combinatorics and graph theory, see e.g. [Bar09, DG95, DS98]. Random contingency tables have been intensely studied in a variety of regimes, yet remain largely out of reach in many interesting special cases, see e.g. [Bar10b, CM10].

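Let r = (r_1, …, r_m) and c = (c_1, …, c_n) be two nonnegative integer vectors with the same sum of entries. Denote by M(r, c) the set of all contingency tables with row sums r and column sums c, i.e.

(1.1)  M(r, c) := { (x_ij) ∈ ℤ_{≥0}^{m×n} : Σ_j x_ij = r_i for all i, Σ_i x_ij = c_j for all j }.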

Let  be the contingency table chosen uniformly at random from . The asymptotic properties of the entries of  as  are the subject of this paper. When the margins are uniform, i.e.  and , the exact asymptotics for  are known [CM10, GM08]. In fact, the distribution of individual entries  is asymptotically geometric and the dependence between the entries vanishes as the size of the table goes to infinity [CDS10].

In this paper we analyze random square contingency tables where the row and column sums have only two distinct values,  and . Viewing such  as block matrices (see Figure 1), it is natural to assume that the entries are again nearly independent and identically distributed within each block. However, there is still one degree of freedom remaining: the distribution of mass of each block. We establish a sharp phase transition for this distribution. The following corollary is a special case of general results we present in the next section.

Figure 1. Contingency table with parameters  and . First  rows and columns have margins , the last  rows and columns have margins .

Corollary 1.1 (see Theorem 2.2).

Fix constants and . Let be the uniform random contingency table with the first row and column margins , and the last row and column margins . Then:

where B_c = 1+√(1+1/C) is the critical value.
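As a quick numerical illustration of the threshold, the following sketch evaluates the critical value B_c = 1+√(1+1/C) and the leading-order supercritical growth C(B-B_c) n^(1-δ) quoted in the abstract. The function names and the sample parameter values are ours and purely illustrative; the asymptotic formula is only meaningful for B > B_c and large n.

```python
import math


def critical_value(C: float) -> float:
    """Critical value B_c = 1 + sqrt(1 + 1/C), as stated in the abstract."""
    if C <= 0:
        raise ValueError("C must be positive")
    return 1.0 + math.sqrt(1.0 + 1.0 / C)


def supercritical_leading_term(n: float, delta: float, B: float, C: float) -> float:
    """Leading-order term C (B - B_c) n^(1 - delta) for E[X_11] when B > B_c.

    This is only the asymptotic expression from the abstract, not an exact value.
    """
    Bc = critical_value(C)
    if B <= Bc:
        raise ValueError("the leading term applies only in the supercritical regime B > B_c")
    return C * (B - Bc) * n ** (1.0 - delta)


if __name__ == "__main__":
    # Illustrative values of C (assumptions, not taken from the paper).
    for C in (0.5, 1.0, 2.0):
        print(f"C = {C}: B_c = {critical_value(C):.6f}")
    print(supercritical_leading_term(n=10**6, delta=0.75, B=3.0, C=1.0))
```

Note that B_c decreases to 2 as C grows, so for large C the transition sits just above B = 2.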

In the next section we present various extensions and refinements of this result, including the precise constants implied by the  notation. We also extend the results to , although our results are not as strong in this case.

The story behind the phase transition in the corollary is quite interesting. For ,  and  the phenomenon of a large entry  was first observed by Barvinok in [Bar10b, 1.5] for the (non-uniform) distribution of “typical” contingency tables. In fact, Barvinok showed that for  and , the phase transition for typical tables happens at , see [Bar10b, Bar12].

In [DLP19+], the authors empirically tested uniform contingency tables using a new MCMC algorithm introduced in [DP18], and the experiments seem to confirm Barvinok’s conjectured value for the critical . In fact, the simulations show drastically different behavior in the subcritical and supercritical cases. Here we analyze the entry  in a random uniform contingency table with both margins , see Figure 2. This is the case , where .

Figure 2. Summary of simulations of entry in a random uniform contingency table with both margins  , where . The left graph corresponds to a subcritical value , and the right graph to a supercritical value .

In some sense, the simulations showed a sharper phase transition than for the typical matrices: not only did the expectation  jump from bounded to linear growth, but the distribution of  also switched from geometric in the subcritical case to normal in the supercritical case.
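For readers who wish to experiment with such simulations, here is a minimal sketch of a sampler for tables with fixed margins. It is not the algorithm of [DP18]; it is the classical 2×2 swap chain usually attributed to Diaconis and Gangolli [DG95], whose stationary distribution is uniform on the set of tables but whose mixing can be slow. The margins in the example are small hypothetical values chosen only to mimic the two-level block structure.

```python
import random


def northwest_corner(rows, cols):
    """Build one table with the given margins by the northwest-corner rule."""
    r, c = list(rows), list(cols)
    table = [[0] * len(c) for _ in r]
    i = j = 0
    while i < len(r) and j < len(c):
        t = min(r[i], c[j])
        table[i][j] = t
        r[i] -= t
        c[j] -= t
        if r[i] == 0:
            i += 1
        else:
            j += 1
    return table


def swap_step(table, rng):
    """One step of the 2x2 swap chain: add (+s, -s; -s, +s) on a random 2x2 minor."""
    i, k = rng.sample(range(len(table)), 2)
    j, l = rng.sample(range(len(table[0])), 2)
    s = rng.choice((1, -1))
    # Reject the move if any entry would become negative.
    if min(table[i][j] + s, table[k][l] + s, table[i][l] - s, table[k][j] - s) < 0:
        return
    table[i][j] += s
    table[k][l] += s
    table[i][l] -= s
    table[k][j] -= s


def sample_table(rows, cols, steps, seed=0):
    """Run the swap chain for `steps` steps from the northwest-corner table."""
    if sum(rows) != sum(cols):
        raise ValueError("row and column margins must have the same total")
    rng = random.Random(seed)
    table = northwest_corner(rows, cols)
    for _ in range(steps):
        swap_step(table, rng)
    return table


if __name__ == "__main__":
    # Hypothetical margins: two heavy rows/columns followed by six light ones.
    margins = [12, 12] + [4] * 6
    X = sample_table(margins, margins, steps=20000, seed=1)
    print("corner entry X_11 =", X[0][0])
```

Every move preserves all row and column sums, so the chain never leaves the set of tables with the prescribed margins.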

This paper gives the first rigorous proof of the phase transition in the uniform case. Although we do not cover the case  introduced by Barvinok, we conjecture that the phase transition extends to this case. In fact, our results go beyond what the simulations in [DLP19+] suggest, as we interpolate between the  and  (uniform) cases. Rather surprisingly, we show that for  the behavior of random uniform and typical matrices remains similar, with a geometric distribution in the supercritical case. We conjecture that there is an additional phase transition at , and that for  the distribution of  is normal in the supercritical case (see Conjecture 3.2). In the limiting case  this is supported by the simulations mentioned above. Further conjectures with more refined estimates are given in Section 3.

1.2. Background

Let  be the transportation polytope of real nonnegative contingency tables with margins  and , i.e. defined by (1.1) over . Clearly, . When , , the polytope  is the classical Birkhoff polytope, of interest in Combinatorics, Discrete Geometry, Combinatorial Optimization and Discrete Probability, see e.g. [DK14, Pak00]. The asymptotic behavior of vol  is known [CM09], as well as the exact value and the whole Ehrhart polynomial for , see [BP03], and numerical estimates for  [CV16]. Such sharp volume estimates were crucially used in [CDS10] to analyze the asymptotic behavior of the random contingency table  for uniform margins.

For non-uniform margins, the existing sharp asymptotic results cover only smooth margins, a technical condition which includes the case when all ratios  and  are bounded, see [BLSY10, BBK72, BC78, CM10] for precise statements. For general margins  and , upper and lower bounds on  were given in [Bar09, Bar10b] (see also Theorem 4.3). The proofs of our main results rely heavily on these bounds.

Now, in statistics, a popular practice is to sample from the hypergeometric (Fisher–Yates) distribution defined as follows:

(1.2)

and  denotes the total sum of a contingency table in . We refer to [DE85, Eve92] for an extensive discussion and to [FLL17, Kat14] for recent treatments.

The rationale behind this approach lies in the independence table , defined by . This table is both the expectation of the Fisher–Yates distribution and the unique maximizer of the following strictly concave function

(1.3)

in the transportation polytope , see [Goo63, Ex. (iv)]. Note that for each , if we view  as the ‘population contingency table’, where each entry  is understood as the marginal probability of the entry , then  defined at (1.3) is the entropy of the probability mass function . However, it is known that the hypergeometric distribution may not properly capture the behavior of the uniform contingency table .
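The relation between the Fisher–Yates distribution and the independence table can be checked by hand on a tiny example. The sketch below assumes that (1.2) is the standard Fisher–Yates formula, P(X) = (∏ r_i! ∏ c_j!) / (N! ∏ x_ij!), and that the independence table has entries r_i c_j / N; both are the textbook definitions, and the margins (3,2) and (2,3) are hypothetical values chosen for illustration.

```python
from fractions import Fraction
from math import factorial


def tables_2x2(rows, cols):
    """Enumerate all nonnegative integer 2x2 tables with the given margins."""
    (r1, r2), (c1, c2) = rows, cols
    for a in range(min(r1, c1) + 1):
        b, c = r1 - a, c1 - a
        d = r2 - c
        if d >= 0 and b + d == c2:
            yield ((a, b), (c, d))


def fisher_yates_prob(table, rows, cols):
    """Fisher-Yates probability: prod r_i! prod c_j! / (N! prod x_ij!)."""
    N = sum(rows)
    num = 1
    for m in list(rows) + list(cols):
        num *= factorial(m)
    den = factorial(N)
    for row in table:
        for x in row:
            den *= factorial(x)
    return Fraction(num, den)


rows, cols = (3, 2), (2, 3)
N = sum(rows)
probs = {t: fisher_yates_prob(t, rows, cols) for t in tables_2x2(rows, cols)}

# Expected value of the (1,1) entry under Fisher-Yates versus the independence table.
expected_x11 = sum(t[0][0] * p for t, p in probs.items())
independence_x11 = Fraction(rows[0] * cols[0], N)
print(sum(probs.values()), expected_x11, independence_x11)
```

In this example the three admissible tables receive probabilities 1/10, 3/5 and 3/10, so the distribution is far from uniform even though its mean agrees with the independence table.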

When one tries to find the marginal distribution for  that maximizes the overall entropy subject to the margin condition, instead of viewing each contingency table as a rescaled probability mass function, one finds that the entries must be independent and geometrically distributed. One can then further maximize the entropy by optimizing the mean of each entry. This leads to the notion of the typical table (see Definition 4.1), introduced by Barvinok in [Bar09] and further exploited in [Bar10a, Bar10b, BH10]. The behavior of typical and independence tables is known to be similar when the margins are relatively uniform [BLSY10], but could be drastically different when the margins are strongly asymmetric [Bar10b, 1.6].

In [Bar10b], Barvinok showed that there exists a phase transition in the behavior of the typical table for a simple model of contingency tables with asymmetric margins. Namely, let be the typical table for where  . For , all entries of are equal by the symmetry. In particular, the corner entry is bounded by for all . On the other hand, Barvinok [Bar10b] showed that for , the entry has linear growth

(1.4)

while all the other entries of are uniformly bounded by  . Hence, as passes a certain critical value , the ‘mass’ within the typical table suddenly concentrates at the corner entry . As we mentioned earlier, that is the starting observation for this paper.

1.3. Notation

We use    and  . For all  , denote    and  .

For all , we write  for a discrete random variable  with probability mass function (this notation is somewhat nonstandard, but is more convenient for our purposes)

(1.5)

Note that . We call  a geometric random variable with mean . For every two probability distributions , over a countable sample space , the total variation distance is defined as

(1.6)

Let and be random variables with distribution and , respectively. To simplify the notation, we write:

(1.7)
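To make the notation concrete, here is a small numerical sketch. It assumes that the geometric random variable with mean λ has probability mass function P(X = k) = (1/(1+λ)) (λ/(1+λ))^k for k = 0, 1, 2, …, and that the total variation distance is half the ℓ¹ distance between the mass functions; the displayed formulas (1.5) and (1.6) are not reproduced here, so both conventions are assumptions, although they are the standard ones consistent with the surrounding text. Infinite sums are truncated, which is harmless for moderate λ.

```python
def geom_pmf(lam: float, k: int) -> float:
    """P(X = k) for a geometric variable on {0, 1, 2, ...} with mean lam (assumed convention)."""
    q = lam / (1.0 + lam)
    return (1.0 - q) * q ** k


def geom_mean(lam: float, terms: int = 5000) -> float:
    """Truncated numerical mean, used to confirm that the parameter is the mean."""
    return sum(k * geom_pmf(lam, k) for k in range(terms))


def tv_distance(lam1: float, lam2: float, terms: int = 5000) -> float:
    """Total variation distance between two geometric laws, as half the l1 distance."""
    return 0.5 * sum(abs(geom_pmf(lam1, k) - geom_pmf(lam2, k)) for k in range(terms))


if __name__ == "__main__":
    print(geom_mean(2.0))          # close to 2
    print(tv_distance(1.0, 1.1))   # small: nearby means give nearby laws
    print(tv_distance(1.0, 50.0))  # large: very different means
```

Under this convention a geometric variable with mean 0 is the point mass at 0, and its distance to the geometric law with mean 1 is exactly 1/2.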

2. Statement of results

For parameters , , and , let , where

(2.1)

In other words,  is the set of contingency tables whose first  rows and columns have margin  and the other  rows and columns have margin , see Figure 1. Let  be the random contingency table sampled uniformly from . We are interested in the asymptotic behavior of the entry  as  for various choices of the parameters  and . Note that the entries within each of the four blocks in Figure 1 have the same distribution by symmetry.

We establish a sharp phase transition at B_c = 1+√(1+1/C) for the limiting expectation of the entries of . The following theorem shows that the limiting distribution of each entry of  is geometric with mean depending on whether B<B_c or B>B_c. See Figure 3 for an illustration.

Theorem 2.1.

Let be sampled from uniformly at random. Fix and let be as above.

(i)

[bottom right]  For all and , we have:

(2.2)

(ii)

[sides]  For all and , we have:

(2.3)

where

(2.4)

(iii)

[top left]  For all and , we have:

(2.5)

where

(2.6)

Figure 3. Limiting distributions of the entries in the uniform contingency table in the subcritical (left) and supercritical (right) regimes for thick bezels .

Our second result proves the phase transition of entries of random contingency tables in expectation.

Theorem 2.2.

Let    be sampled from    uniformly at random. Let be as above.

(i)

[bottom right]  For all , , and , we have:

(2.7)

(ii)

[sides]  For all , we have:

(2.8)

(iii)

[top left]  For all , we have:

(2.9)

and

(2.10)

Lastly, we establish a strong law of large numbers for row sums of entries of  in the top left and top right blocks.

Theorem 2.3.

Fix and . Let be sampled from uniformly at random, and let be as above. Then a.s., as , we have:

(2.11)

Furthermore, for all    and  , a.s. as , we have:

(2.12)

As we mentioned in the introduction, the proofs utilize Barvinok’s technology of “typical tables” and upper and lower bounds on the number of contingency tables for general margins. Except for the next section, where we summarize conjectural extensions of our theorems, the rest of the paper is dedicated to proofs of the results.

3. Conjectures

In view of the strong law of large numbers for the top right block of  given by Theorem 2.3, we conjecture that a central limit theorem also holds in the supercritical regime  for  at least when . However, we believe that central limit behavior should not be expected in the subcritical regime . Our rationale is that, according to Theorem 2.3, the first row sum in the top right block of  is asymptotically  for , which is the full row sum of . Hence there is not much room for each entry in the top right block to fluctuate. On the other hand, for , the row sum in the top right block contributes only to , which is independent of . Hence when  is large, there is enough room for the entries to fluctuate, and they would not feel the ‘bar’ of  since they must fluctuate around .

Conjecture 3.1.

Fix and . Let be sampled from uniformly at random. Denote

(3.1)

Then as , we have:

(3.2)

where  is the standard normal distribution and “ ” denotes weak convergence.

Note that  is the variance of the geometric distribution with mean . We remark that currently we only know that, for all  and , we have:

Unfortunately, we are not able to get rid of the truncation in this formula, in contrast to Lemma 3.4 in [CDS10] for the uniform margin case. This is partly because our argument relies on the loose estimates of given by Theorem 4.3. Replacing the LHS with    would be the first step in proving Conjecture 3.1.

Next, we conjecture that there exists a phase transition in with respect to the limiting distribution of in the supercritical regime . For the ‘thick bezel case’ , Theorem 2.1 shows that converges in distribution to a geometric random variable. For the ‘thin bezel case’ , we conjecture that it should converge to a normal distribution. Roughly speaking, the sum of terms is asymptotically a normal random variable by Conjecture 3.1. Hence if , then there are not enough terms in this sum to exhibit central limit behavior. Hence the limiting distribution of this sum should be some rescaled version of the marginal distribution of .

To make a more precise conjecture, let  be as in Conjecture 3.1. Then we write:

(3.3)

Assuming the summands in the left hand side are asymptotically uncorrelated, taking variance in each side gives

(3.4)

Hence we arrive at the following conjecture.

Conjecture 3.2.

Fix and . Let be sampled from uniformly at random. Then

(3.5)

Further conjectures and open problems are given in Section 9.

4. Concentration in blocks

As we mentioned in the introduction, Barvinok [Bar10b] introduced the notion of typical table for general contingency tables, which captures some “typical behavior” of the uniform random contingency table of fixed margins. We start with the precise definition:

Definition 4.1 (Typical table).

Fix margins and . Let denote the transportation polytope. For each , define

(4.1)

where the function is defined by

(4.2)

The typical table for is defined by

(4.3)

Here  is the transportation polytope of margins  and  defined in the introduction. Since the function  defined at (4.1) is strictly concave, it attains a unique maximizer on the transportation polytope, and thus the typical table is well-defined.
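The objective in Definition 4.1 has a simple probabilistic meaning, which the following sketch checks numerically. We assume, following [Bar09], that the one-variable function in (4.2) is f(x) = (x+1) log(x+1) − x log x and that the objective in (4.1) is the sum of f over all entries; the displayed formulas are not reproduced here, so this is an assumption. Under it, f(λ) is exactly the entropy of a geometric random variable with mean λ, which matches the maximum-entropy description of the typical table given in Section 1.2.

```python
import math


def f(x: float) -> float:
    """Assumed form of (4.2): f(x) = (x + 1) log(x + 1) - x log x, with f(0) = 0."""
    if x == 0:
        return 0.0
    return (x + 1.0) * math.log(x + 1.0) - x * math.log(x)


def g(table) -> float:
    """Assumed form of (4.1): the sum of f over all entries of a nonnegative real table."""
    return sum(f(z) for row in table for z in row)


def geometric_entropy(lam: float, terms: int = 5000) -> float:
    """Entropy -sum p_k log p_k of the geometric law with mean lam, computed in log space."""
    q = lam / (1.0 + lam)
    log_q, log_1mq = math.log(q), math.log(1.0 - q)
    total = 0.0
    for k in range(terms):
        log_p = k * log_q + log_1mq
        total -= math.exp(log_p) * log_p
    return total


if __name__ == "__main__":
    for lam in (0.5, 2.0, 10.0):
        print(lam, f(lam), geometric_entropy(lam))
```

Since f'(x) = log(1 + 1/x) is strictly decreasing, f is strictly concave, in line with the uniqueness claim above.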

One of the building blocks of our main result is Lemma 4.2, which says that the law of an entry in a large block of a random contingency table is attracted toward a geometric distribution, whose mean is dictated by the corresponding typical table. Given the set  of  contingency tables with margins  and , we call a set of indices  a block of  if

(4.4)

Observe that when is sampled from uniformly at random and is a block of , the entries for all have the same distribution by the symmetry. Moreover, the entries of the typical table within a block are the same.

Lemma 4.2.

Let be the set of all   contingency tables of margins and . Let be sampled from uniformly at random. Suppose are blocks in with . Then there exists an absolute constant , s.t. for each and , we have:

(4.5)

where  is the typical table for , and  is the random matrix of independent entries with , and .

Our proof of Lemma 4.2 relies upon the following results of Barvinok, see [Bar10b, Thm 1.7], [Bar09, Thm 1.1], and [Bar09, Lem 1.4].

Theorem 4.3 ([Bar09, Bar10b]).

Fix margins and . Let be the typical table for , and let be the random matrix of independent entries where  . Let    denote the total sum.

(i)

There exists some absolute constant such that

(4.6)

(ii)

Conditioned on being in , the table  is uniform on .

(iii)

For the constant in (i), we have

(4.7)

In other words, (ii) and (iii) of the above theorem say that the geometric matrix  with mean given by the typical table emulates the uniform random table in  with probability at least . Hence, for very rare events, we can ‘transfer’ some of the properties of this geometric matrix  to the uniform random contingency table .
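The mechanism behind part (ii) can be verified exactly on a small example. The sketch below relies on one assumption drawn from the Lagrange condition discussed in Section 5: for the typical table, the ratio z_ij/(1+z_ij) factors as a product a_i b_j of a row term and a column term. For any matrix of independent geometric entries with such product-form parameters, every table with the prescribed margins has the same probability, so conditioning on the margins gives the uniform distribution. The particular values of a_i and b_j below are arbitrary illustrative choices, not the actual typical table for these margins.

```python
from fractions import Fraction


def compositions(total, caps):
    """Yield lists x with sum(x) == total and 0 <= x[j] <= caps[j]."""
    if len(caps) == 1:
        if total <= caps[0]:
            yield [total]
        return
    for x in range(min(total, caps[0]) + 1):
        for tail in compositions(total - x, caps[1:]):
            yield [x] + tail


def tables(rows, cols):
    """Enumerate all nonnegative integer tables with the given margins (tiny sizes only)."""
    def fill(i, remaining):
        if i == len(rows):
            if all(c == 0 for c in remaining):
                yield []
            return
        for row in compositions(rows[i], remaining):
            left = [c - x for c, x in zip(remaining, row)]
            for rest in fill(i + 1, left):
                yield [row] + rest
    return fill(0, list(cols))


def geometric_matrix_prob(table, a, b):
    """P(Y = table) for independent geometric entries with q_ij = a_i * b_j < 1."""
    p = Fraction(1)
    for i, row in enumerate(table):
        for j, x in enumerate(row):
            q = a[i] * b[j]
            p *= (1 - q) * q ** x
    return p


rows, cols = (3, 2, 1), (2, 2, 2)
a = (Fraction(1, 2), Fraction(1, 3), Fraction(1, 5))  # hypothetical row factors
b = (Fraction(1, 2), Fraction(2, 3), Fraction(3, 4))  # hypothetical column factors
all_tables = list(tables(rows, cols))
probs = [geometric_matrix_prob(t, a, b) for t in all_tables]
print(len(all_tables), "tables, distinct probabilities:", len(set(probs)))
```

The reason is that the product of q_ij^{x_ij} over all entries equals ∏ a_i^{r_i} ∏ b_j^{c_j}, which depends on the table only through its margins.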

Now we are ready to prove the key lemma.

Proof of Lemma 4.2.

Let  be blocks such that . Let  be the typical table for , and let  denote the random matrix of independent entries where . Observe that we can choose a subset  such that  and every two elements of  have distinct coordinates. Fix measurable sets  and .

For a   matrix  and  , denote

(4.8)

Note that by the exchangeability of the entries of and in each block of  , variables    and    have the same distribution for all . In particular, we have:

(4.9)

Moreover, since are independent and since every two elements of    have non-overlapping coordinates, it follows that    are also independent.

Now note that from Theorem 4.3 (ii) and (iii), we have:

(4.10)

Also, by the Azuma–Hoeffding inequality, for every fixed , we have:

(4.11)

Hence, by conditioning on whether    is small or large, we get

(4.12)
(4.13)
(4.14)

Since is arbitrary, by absorbing the factor of 4 into , we obtain the result. ∎

Remark 4.4.

Following the arguments in [Bar09, Bar10b], it is not hard to see that a higher dimensional analogue of Theorem 4.3 holds. Namely, replace with and with in the theorem. Of course, the constant then depends on . Then a similar argument will show that a higher dimensional analogue of Lemma 4.2 also holds. Hence most of our main results should hold in higher dimensions. We do not justify this claim in the present paper.

5. Phase transition in the typical table and rate of convergence

Recall the definition of the typical table  for  given in Definition 4.1. Namely,  is the unique maximizer of the function  defined at (4.1) on the transportation polytope . Note that  is defined by the intersection of the hyperplanes in  given by

(5.1)
(5.2)

Note that the gradient is the   matrix , which has 1’s in the -th row and 0’s elsewhere. Similarly, is the   matrix , which has 1’s in the -th column and 0’s elsewhere.

On the other hand, it is easy to see that the gradient of the objective function  defined at (4.1) is given by

(5.3)

Hence, by the method of Lagrange multipliers, the gradient , when evaluated at the typical table , must be in the non-negative span of the ’s and ’s. This gives that there exist some non-negative constants  and  such that

(5.4)

or equivalently,

(5.5)

Now we consider , where the margins and are given at (2.1). By symmetry, there exist some constants , possibly depending on all parameters, such that

(5.6)

Furthermore, denote and . Then (5.5) gives

(5.7)

Note that the margin condition for reduces to

(5.8)

For a preliminary analysis of the solution of the equations in (5.8), observe that

(5.9)

In particular, as .

The main result in this section is the following lemma, which establishes the phase transition of the typical table and the rate of convergence of its entries.

Lemma 5.1.

Let be the typical table for , where . Let be as above.

(i)

If , then

(5.10)

(ii)

If , then the following expressions

(5.11)

are of order , where the constants in do not grow in .
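The phase transition of the typical table can be observed numerically. The sketch below is a reconstruction under stated assumptions rather than the computation used in the proofs. By symmetry, the typical table takes one value on each block: a on the top left block, b on the two side blocks, and c on the bottom right block. With m = ⌊n^δ⌋ heavy rows, we assume the margin conditions read m·a + n·b = BCn and m·b + n·c = Cn, and that the Lagrange condition reduces to (1+1/b)² = (1+1/a)(1+1/c). Substituting the margin conditions leaves a single monotone equation in a, which we solve by bisection. The function name and the parameter values are ours.

```python
import math


def typical_block_table(n, delta, B, C, iters=200):
    """Solve for the three block values (a, b, c) of the typical table (assumed equations)."""
    m = int(n ** delta)           # number of heavy rows/columns, assumed floor(n^delta)
    ratio = m / n
    if ratio * B >= 1.0:
        raise ValueError("n is too small for this sketch: need (m/n) * B < 1")

    def side(a):                  # b from the heavy-row margin: m*a + n*b = B*C*n
        return B * C - ratio * a

    def corner(b):                # c from the light-row margin: m*b + n*c = C*n
        return C - ratio * b

    def stationarity(a):          # increasing in a; zero at the typical table
        b = side(a)
        c = corner(b)
        return 2.0 * math.log1p(1.0 / b) - math.log1p(1.0 / a) - math.log1p(1.0 / c)

    lo, hi = 0.0, B * C / ratio   # a ranges over (0, B*C*n/m)
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if stationarity(mid) < 0.0:
            lo = mid
        else:
            hi = mid
    a = 0.5 * (lo + hi)
    b = side(a)
    return a, b, corner(b)


if __name__ == "__main__":
    C, delta = 1.0, 0.75
    Bc = 1.0 + math.sqrt(1.0 + 1.0 / C)
    for B in (2.0, 3.0):          # one subcritical and one supercritical value
        for n in (10**6, 10**9, 10**12):
            a, b, c = typical_block_table(n, delta, B, C)
            print(f"B={B}, n={n}: corner={a:.3f}, side={b:.4f}, bottom right={c:.4f}")
    print("B_c =", Bc)
```

For C = 1 and B = 2 < B_c the corner value stabilizes near 8, whereas for B = 3 > B_c it grows like C(B − B_c) n^(1−δ) while the side value approaches B_c·C, in agreement with the dichotomy described in the lemma.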

In the following proposition, we show that if , then the corner entry  of the typical table for  is uniformly bounded in . We remark that there is a more general result of this type. Namely, [BLSY10, Thm 3.5] states that if the row and column sums do not vary much, then the entries of the typical table are uniformly bounded by some constant independent of the size of the table. However, this result gives a sub-optimal lower critical value . In order to push this threshold up to the desired critical value , we optimize the proof of [BLSY10, Thm 3.5] for our model.

Proposition 5.2.

In notation above, suppose  . Then:

(5.12)

Proof.

For brevity, denote , , and . Then

(5.13)

Note that

(5.14)

Let us show that

(5.15)

Assume otherwise, that . The above inequality gives  , and we get

(5.16)

a contradiction.

By the definition of and , we have:

(5.17)

In order to upper bound , we consider maximizing the fraction in the left hand side. By (5.9), we know that  . Hence we have the following optimization problem:

(5.18)
(5.19)

It is not hard to see that the objective function is non-decreasing in and non-increasing in . Hence, as , the solution to the above problem approaches the limit . This implies

(5.20)

Now, since    by (5.9), we have:

(5.21)
(5.22)

This implies

(5.23)

This finishes the proof. ∎

Proof of Lemma 5.1.

Suppose . By Proposition 5.2, we know that is uniformly bounded in . Hence, from the first equation in (5.8), we obtain:

(5.24)

In particular, .

For , let and be as in (5.7). Recall that and . Also recall that as from (5.9). Hence, we have:

(5.25)

This implies    and    as  . It follows that    as  , which is the correct limit that (i) implies.

In order to obtain the rate of convergence of , first define a function  . Then, since  , it suffices to show

(5.26)

To this end, first note that  and  is a decreasing function in . Hence by the mean value theorem, for every constant , we have:

(5.27)

for all sufficiently large  .

Next, write

(5.28)

Since   and , the mean value theorem implies that

(5.29)

for all sufficiently large . Thus (5.9) gives . Since as , this also implies that the second term in (5.28) is of order . For the first term, note that

(5.30)

Since and both and converge as , the above expression is  . Thus , and (5.26) follows from (5.27). This shows (i).

Next, suppose . To show (ii), we first obtain a lower bound on . Note that (5.9) gives , so we have

(5.31)

Then from the first equation in (5.8) and the fact that , we have

(5.32)
(5.33)

Now we derive the limits in (ii). First, note that from (5.33), we have

(5.34)

Since , we must have . Moreover, by (5.9). Hence .

Finally, we derive the rate of convergence. First, using (