# Exchangeable Bernoulli distributions: high dimensional simulation, estimate and testing

We explore the class of exchangeable Bernoulli distributions building on their geometrical structure. Exchangeable Bernoulli probability mass functions are points in a convex polytope and we have found analytical expressions for their extremal generators. The geometrical structure turns out to be crucial to simulate high dimensional and negatively correlated binary data. Furthermore, for a wide class of statistical indices and measures of a probability mass function we are able to find not only their sharp bounds in the class, but also their distribution across the class. Estimate and testing are also addressed.


## 1 Introduction

De Finetti’s representation theorem asserts that an infinite sequence of exchangeable Bernoulli variables can be seen as a mixture of independent and identically distributed Bernoulli variables. Using this representation it is possible to easily estimate, test and simulate exchangeable Bernoulli variables also in high dimension, one reason why they are widely used in applications. The only drawback of De Finetti’s representation is that it requires an infinite sequence. A finite form of De Finetti’s theorem has been given in [2], based on the geometrical structure of the class $\mathcal{E}_d$ of $d$-dimensional exchangeable Bernoulli variables. In fact, $\mathcal{E}_d$ is proven to be a $d$-dimensional simplex and therefore each probability mass function in $\mathcal{E}_d$ has a unique representation as a mixture of its extremal points. This result has been extended in [4], where it is proved that the class of multivariate Bernoulli probability mass functions with some given moments is a convex polytope, i.e. a convex hull of extremal points. Furthermore, [3] provides an analytical expression of the extremal probability mass functions under exchangeability. The representation of exchangeable Bernoulli probability mass functions as points in a convex polytope and the ability to explicitly find the extremal points are key steps in simulating, also in high dimension, estimating and testing.

In this paper we study the statistical properties of $\mathcal{E}_d$ and $\mathcal{E}_d(p)$, i.e. the classes of $d$-dimensional exchangeable Bernoulli probability mass functions and of those with given mean $p$, building on their geometrical structure. We show that the values of a wide class of measures can be found as convex combinations of their values on the extremal points. As a consequence we are not only able to find their extremal values on the class, but we can also numerically find their distribution across the class.

Two important issues in the statistical literature are the simulation of high dimensional binary data with given correlation and the simulation of negatively correlated binary data. The geometrical structure of $\mathcal{E}_d(p)$ allows us to easily construct parametric families of probability mass functions that cover the whole correlation range. We can therefore select a multivariate Bernoulli probability mass function with any given correlation in the whole range of admissible correlations. This overcomes the limitation of the models used in the literature to simulate exchangeable binary data, which only cover positive correlations. This is not a big issue in high dimension, since in the limit negative correlation is not possible, but it turns out to be a limitation in lower dimensions: as an example, three-dimensional exchangeable Bernoulli random variables admit markedly negative correlations. Furthermore, in lower dimensions the geometrical structure also turns out to be crucial to perform uniform sampling from the convex polytope, which is limited only by the amount of computational effort required. Uniform sampling can be used to find the distribution across the class of general statistical indices that cannot be expressed as combinations of their extremal values.

The explicit form of the extremal points is important in estimation and testing, which are also addressed. We find the maximum likelihood estimator for a probability mass function in the classes $\mathcal{E}_d$ and $\mathcal{E}_d(p)$ and provide a generalized likelihood test for the null hypothesis of exchangeability or exchangeability with a given mean.

The results presented can be extended to the more general framework of partially exchangeable multivariate Bernoulli distributions. To give an overall idea, we show that partially exchangeable Bernoulli distributions can also be seen as points in a convex polytope, but we leave their investigation to future research.

The paper is organized as follows. Section 2 introduces the polytope of exchangeable Bernoulli distributions and studies the distribution of statistical indices and measures across the class. Section 3 addresses high dimensional simulation and discusses some applications. Section 4 finds the maximum likelihood estimator of exchangeable distributions, using the representation of a probability mass function as a convex combination of the extremal points, and provides a generalized likelihood ratio test for exchangeability. Section 5 opens the way to the generalization of this work to partially exchangeable Bernoulli distributions.

## 2 Exchangeable Bernoulli distributions

Let $\mathcal{E}_d$ be the class of $d$-dimensional exchangeable Bernoulli distributions and let $\mathcal{E}_d(p)$ be the subclass of exchangeable Bernoulli distributions with the same Bernoulli marginal distribution $\mathcal{B}(p)$. If $\mathbf{X} = (X_1, \ldots, X_d)$ is a random vector with joint distribution in $\mathcal{E}_d$, we denote

• its cumulative distribution function (cdf) by $F$ and its probability mass function (pmf) by $f$;

• the column vector which contains the values of $f$ over $\mathcal{X}_d = \{0, 1\}^d$ by $\boldsymbol{f}$; we make the non-restrictive hypothesis that the set of binary vectors $\mathcal{X}_d$ is ordered according to the reverse-lexicographical criterion. We assume that vectors are column vectors;

• the expected value of each $X_i$ by $p$, and we denote $q = 1 - p$;

• the set of permutations on $\{1, \ldots, d\}$ by $\mathcal{P}_d$.

Let us consider a pmf $f \in \mathcal{E}_d$. Since, by exchangeability, $f(x_1, \ldots, x_d) = f(x_{\sigma(1)}, \ldots, x_{\sigma(d)})$ for any $\sigma \in \mathcal{P}_d$, the values $f_j := f(\mathbf{x})$ for $\#\{x_h : x_h = 1\} = j$, $j = 0, \ldots, d$, determine any mass function in $\mathcal{E}_d$. Therefore we identify a mass function in $\mathcal{E}_d$ with the corresponding vector $(f_0, \ldots, f_d)$.

Let $\mathcal{S}_d(p)$ be the class of distributions of $Y = X_1 + \cdots + X_d$ with $\mathbf{X} \in \mathcal{E}_d(p)$. The pmf $p_Y$ of $Y$ is a discrete distribution on $\{0, 1, \ldots, d\}$. Let $p_j = P(Y = j)$ and $\boldsymbol{p} = (p_0, \ldots, p_d)$.

The map

$$H : \mathcal{E}_d(p) \to \mathcal{S}_d(p), \qquad f_j \mapsto p_j = \binom{d}{j} f_j \tag{2.1}$$

is a one-to-one correspondence between $\mathcal{E}_d(p)$ and $\mathcal{S}_d(p)$. Therefore we have

$$\mathcal{E}_d(p) \leftrightarrow \mathcal{S}_d(p). \tag{2.2}$$

The paper [3] also proves that the class $\mathcal{S}_d(p)$ coincides with the entire class of discrete distributions on $\{0, \ldots, d\}$ with mean $dp$, say $\mathcal{D}_d(dp)$.

Therefore the three classes $\mathcal{E}_d(p)$, $\mathcal{S}_d(p)$ and $\mathcal{D}_d(dp)$ are essentially the same class, i.e.

$$\mathcal{E}_d(p) \leftrightarrow \mathcal{S}_d(p) \equiv \mathcal{D}_d(dp). \tag{2.3}$$

Thanks to the above result we can look for the generators of $\mathcal{D}_d(dp)$ to find the generators of $\mathcal{S}_d(p)$, which are in one-to-one relationship with the generators of $\mathcal{E}_d(p)$. This approach simplifies the search.

The relationship between exchangeable and discrete distributions is more general, since in the same way it can be proved that

$$\mathcal{E}_d \leftrightarrow \mathcal{S}_d \equiv \mathcal{D}_d, \tag{2.4}$$

where $\mathcal{S}_d$ and $\mathcal{D}_d$ denote the corresponding classes without the constraint on the mean.

Furthermore, by exchangeability the moments depend only on their order; we therefore use $\mu_\alpha$ to denote a moment of order $\alpha$, where $\mu_\alpha = \mathbb{E}[X_{i_1} \cdots X_{i_\alpha}]$ for distinct indices $i_1, \ldots, i_\alpha$. For example we have $\mu_1 = p$. We also observe that the Pearson correlation $\rho$ between two Bernoulli variables $X_i$ and $X_j$ is related to the second-order moment as follows:

$$\mu_2 = \rho p q + p^2. \tag{2.5}$$

For the sake of simplicity we write $\mathbf{X} \in \mathcal{E}_d$ or $\mathbf{X} \in \mathcal{E}_d(p)$ meaning that the distribution of $\mathbf{X}$ belongs to $\mathcal{E}_d$ or $\mathcal{E}_d(p)$, respectively. Analogously for $Y \in \mathcal{S}_d(p)$ or $Y \in \mathcal{D}_d(dp)$.

### 2.1 Polytope of Exchangeable Bernoulli distributions

We recall that a polytope (or more specifically an $n$-polytope) is the convex hull of a finite set of points in $\mathbb{R}^n$, called the extremal points of the polytope. We say that a set of points is affinely independent if no point can be expressed as a convex linear combination of the others. For example, three points are affinely independent if they are not on the same line, four points are affinely independent if they are not on the same plane, and so on. The convex hull of $n + 1$ affinely independent points is called a simplex or $n$-simplex. For example, the line segment joining two points is a 1-simplex, the triangle defined by three points is a 2-simplex, and the tetrahedron defined by four points is a 3-simplex.

The class $\mathcal{D}_d$ of discrete distributions on $\{0, \ldots, d\}$ is a $d$-simplex, with extremal points the point masses $\delta_j$, $j = 0, \ldots, d$. By means of the equivalence and the map $H$ we have that the class $\mathcal{E}_d$ is a $d$-simplex. In 2006, [2] proved that $\mathcal{E}_d$ has extremal points $g'_j$, $j = 0, \ldots, d$, where $g'_j$ is the measure

$$g'_j(\mathbf{x}) = \begin{cases} \dfrac{1}{\binom{d}{j}} & \text{if } \#\{x_h : x_h = 1\} = j, \\[4pt] 0 & \text{otherwise.} \end{cases}$$

The same can be obtained by inverting the map $H$.

In [3], the authors proved that the class $\mathcal{D}_d(dp)$ of discrete distributions on $\{0, \ldots, d\}$ with mean $dp$ is a polytope, i.e. for any $p_Y \in \mathcal{D}_d(dp)$ there exist $\lambda_1, \ldots, \lambda_{n_p} \geq 0$ summing up to 1 and such that

$$p_Y = \sum_{j=1}^{n_p} \lambda_j r_j. \tag{2.6}$$

We call $r_j$, $j = 1, \ldots, n_p$, the extremal points or the extremal densities. The polytope $\mathcal{D}_d(dp)$ is the set of solutions of a linear system. As a consequence we can find the support of the extremal rays and also their analytical expression. The following two propositions are proved in [3].

###### Proposition 2.1.

Let us consider the linear system

$$A\mathbf{z} = \mathbf{0}, \qquad \mathbf{z} \in \mathbb{R}^{d+1}_{+}, \tag{2.7}$$

where $A$ is an $m \times (d+1)$ matrix and $\mathbb{R}^{d+1}_{+}$ is the non-negative orthant. The extremal rays of the system (2.7) have at most $m + 1$ non-zero components.

###### Proposition 2.2.

The extremal rays in (2.6) have support on two points $j_1 < pd < j_2$, where $j_1$ ranges up to the largest integer less than $pd$ and $j_2$ ranges from the smallest integer greater than $pd$ up to $d$. They are

$$r_j(y) = \begin{cases} \dfrac{j_2 - pd}{j_2 - j_1} & y = j_1, \\[4pt] \dfrac{pd - j_1}{j_2 - j_1} & y = j_2, \\[4pt] 0 & \text{otherwise.} \end{cases} \tag{2.8}$$

If $pd$ is integer the extremal rays also contain

$$r_{pd}(y) = \begin{cases} 1 & y = pd, \\ 0 & \text{otherwise.} \end{cases} \tag{2.9}$$

If $pd$ is not integer there are $n_p = (\lfloor pd \rfloor + 1)(d - \lfloor pd \rfloor)$ ray densities. If $pd$ is integer there are $n_p = pd(d - pd) + 1$ ray densities.

Using the equivalence (2.3), a pmf in $\mathcal{E}_d(p)$ corresponds to a pmf on $\{0, \ldots, d\}$ with mean $pd$. Thanks to the map $H$ in Eq. (2.1) this is also equivalent to stating that $\mathcal{E}_d(p)$ is a polytope, i.e. for any $f \in \mathcal{E}_d(p)$ there exist $\lambda_1, \ldots, \lambda_{n_p} \geq 0$ summing up to 1 and such that

$$f = \sum_{j=1}^{n_p} \lambda_j e_j. \tag{2.10}$$

We call $e_j$ the extremal points of $\mathcal{E}_d(p)$. The map $H^{-1}$ allows us to explicitly find the extremal points $e_j$. They are:

$$e_j(\mathbf{x}) = \begin{cases} \dfrac{r_j(k)}{\binom{d}{k}} & \text{if } \#\{x_h : x_h = 1\} = k, \\[4pt] 0 & \text{otherwise.} \end{cases} \tag{2.11}$$

We denote by $\mathbf{X}$ and $Y$ the random variables whose pmfs are $f$ and $p_Y$ respectively, and by $\mathbf{E}_j$ and $R_j$ the random variables whose pmfs are $e_j$ and $r_j$ respectively. We will refer to $e_j$, $r_j$, $\mathbf{E}_j$ and $R_j$ as extremal ray densities. If it is clear from the context, we will omit “extremal” for the sake of simplicity.

The following proposition is a consequence of the geometrical structure of the class of exchangeable distributions and their sums. It allows us to obtain an analytical expression for a wide class of statistical indices defined as functionals on $\mathcal{E}_d$ and $\mathcal{D}_d$, such as all the moments of exchangeable Bernoulli distributions.

###### Proposition 2.3.
1. Let $\mathbf{X} \in \mathcal{E}_d(p)$ [$\mathcal{E}_d$] and let $\phi_d : \mathcal{X}_d \to \mathbb{R}$ be a measurable function. Then

$$\mathbb{E}[\phi_d(\mathbf{X})] = \sum_{j=1}^{m} \lambda_j \mathbb{E}[\phi_d(\mathbf{E}_j)], \tag{2.12}$$

where $\mathbf{E}_j$ are the extremal rays of $\mathcal{E}_d(p)$ [$\mathcal{E}_d$].

2. Let $Y \in \mathcal{D}_d(dp)$ [$\mathcal{D}_d$] and let $\phi : \{0, \ldots, d\} \to \mathbb{R}$ be a measurable function. Then

$$\mathbb{E}[\phi(Y)] = \sum_{j=1}^{m} \lambda_j \mathbb{E}[\phi(R_j)], \tag{2.13}$$

where $R_j$ are the extremal rays of $\mathcal{D}_d(dp)$ [$\mathcal{D}_d$].

###### Proof.

We only prove part 1, because part 2 is analogous. It holds

$$\mathbb{E}[\phi_d(\mathbf{X})] = \sum_{\mathbf{x} \in \mathcal{X}_d} \phi_d(\mathbf{x}) f(\mathbf{x}) = \sum_{\mathbf{x} \in \mathcal{X}_d} \phi_d(\mathbf{x}) \sum_{j=1}^{m} \lambda_j e_j(\mathbf{x}) = \sum_{j=1}^{m} \lambda_j \mathbb{E}[\phi_d(\mathbf{E}_j)]. \tag{2.14}$$

∎

Functionals of the form $\mathbb{E}[\phi_d(\mathbf{X})]$, defined on a class of $d$-dimensional distributions, where $\mathbf{X}$ has pmf $f$, are commonly used in applications to define measures of risk [10]. We call expectation measures the measures defined by expectations and by their one-to-one transformations. Proposition 2.3 states that measures defined by expectation of the mass functions in a given class form themselves a convex polytope, whose extremal points are the measures evaluated on the extremal rays of the class. Therefore, they attain their bounds on the extremal points. Examples of such functionals in our framework are:

1. Moments and cross moments of distributions in $\mathcal{E}_d$ or $\mathcal{E}_d(p)$, and moments of discrete distributions in $\mathcal{D}_d$ or $\mathcal{D}_d(dp)$.

2. The entropic risk measure on $\mathcal{D}_d$ or $\mathcal{D}_d(dp)$ for $\gamma > 0$:

$$\Gamma(f) = \frac{1}{\gamma} \log\left(\mathbb{E}\left[e^{-\gamma Y}\right]\right), \tag{2.15}$$

where $Y$ has pmf $f$.

3. The excess loss function on $\mathcal{D}_d$ or $\mathcal{D}_d(dp)$, defined by

$$\Phi(Y) = \mathbb{E}[(Y - k)_{+}], \qquad k \in \mathbb{R}, \tag{2.16}$$

where $(y - k)_{+} = \max(y - k, 0)$.

4. Von Neumann–Morgenstern expected utilities on $\mathcal{D}_d$ or $\mathcal{D}_d(dp)$. These are expectation measures where the function $\phi$ is some increasing utility function.

Notice that the entropic risk measure also attains its bounds on the extremal points, since the logarithm is a monotone transformation.

We focus on the cross $\alpha$-moments $\mu_\alpha = \mathbb{E}[X_1 \cdots X_\alpha]$. We have

$$\mu_\alpha = \sum_{k=\alpha}^{d} \frac{\binom{d-\alpha}{k-\alpha}}{\binom{d}{k}} p_k,$$

where $p_k = P(Y = k)$. We recall that the second-order moment of $Y$ is given by

$$\mathbb{E}[Y^2] = \mathbb{E}[(X_1 + \cdots + X_d)^2] = pd + d(d-1)\mu_2. \tag{2.17}$$

In the next section, using Proposition 2.3 and convexity, we numerically find the distribution of the above measures across the class of pmfs on which they are defined.

### 2.2 Expectation measure distributions

We want to determine the distribution of an expectation measure $\mathbb{E}[\phi(Y_{\boldsymbol{p}})]$, where the pmf $\boldsymbol{p}$ has been chosen uniformly at random from $\mathcal{D}_d(dp)$ or, equivalently, from $\mathcal{E}_d(p)$. Since classes of multivariate Bernoulli distributions with pre-specified moments, as e.g. $\mathcal{E}_d(p)$, are polytopes, the representation of each mass function as a convex linear combination of the extremal points is not unique. Therefore we have to perform a triangularization of the polytope. To reduce dimensionality we aim to work on $\mathcal{D}_d(dp)$. This can be done using the map $H$ in (2.1) under the exchangeability condition; as a consequence the distribution can be studied in $\mathcal{D}_d(dp)$, and we make this assumption in this section. We perform a triangularization of $\mathcal{C} := \mathcal{D}_d(dp)$, which is equivalent to performing a triangularization of the polytope $\mathcal{E}_d(p)$. The dimension of $\mathcal{C}$ is $d - 1$ because $\mathcal{C}$ is defined by two constraints (total mass and mean). We can partition $\mathcal{C}$ into simplices (e.g. using a Delaunay triangulation)

$$\mathcal{C} = \bigcup_{i \in I} T_i, \tag{2.18}$$

where $I$ is a proper set of indices and $T_i \cap T_j$ has empty interior for $i \neq j$. We observe that from a geometric point of view the intersection between two simplices $T_i$ and $T_j$ is not empty, being the common part of their borders. But this common part has zero probability of being selected and so we can neglect it, assuming that each $T_i$ coincides with its interior.

Let us denote by $F_\phi$ the distribution of $\mathbb{E}[\phi(Y_{\boldsymbol{p}})]$, where $Y_{\boldsymbol{p}}$ is a random variable with pmf $\boldsymbol{p}$ from $\mathcal{D}_d(dp)$ or $\mathcal{D}_d$. We get

$$F_\phi(t) = P\left(\mathbb{E}[\phi(Y_{\boldsymbol{p}})] \leq t\right) = \sum_{i \in I} P(T_i)\, P\left(\mathbb{E}[\phi(Y_{\boldsymbol{p}})] \leq t \mid T_i\right). \tag{2.19}$$

If we assign a uniform measure on the space, the probability of sampling a probability mass function in the simplex $T_i$ is simply the ratio between the volume of $T_i$ and the total volume of $\mathcal{C}$, i.e.

$$P(T_i) = \frac{\operatorname{vol}(T_i)}{\operatorname{vol}(\mathcal{C})}. \tag{2.20}$$

The volume of $T_i$ can be easily computed because the volume of an $n$-simplex in $n$-dimensional space with vertices $v_0, \ldots, v_n$ is

$$\left| \frac{1}{n!} \det\left(v_1 - v_0, v_2 - v_0, \ldots, v_n - v_0\right) \right|,$$

where each column of the determinant is the difference between the vectors representing two vertices [8].
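The determinant formula above translates directly into code; a small sketch (the function name is ours):

```python
import numpy as np
from math import factorial

def simplex_volume(vertices):
    """Volume of an n-simplex given its n+1 vertices (as rows), computed
    as |det(v1 - v0, ..., vn - v0)| / n!."""
    v = np.asarray(vertices, dtype=float)
    n = v.shape[0] - 1
    diffs = v[1:] - v[0]           # edge vectors from the base vertex
    return abs(np.linalg.det(diffs)) / factorial(n)
```

For instance, the triangle with vertices $(0,0)$, $(1,0)$, $(0,1)$ has volume (area) $1/2$, and the corresponding unit tetrahedron has volume $1/6$.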

The probability $P(\mathbb{E}[\phi(Y)] \leq t \mid T_i)$ is the ratio between the volume of the region $R_{i,t} = \{\boldsymbol{p} \in T_i : \mathbb{E}[\phi(Y_{\boldsymbol{p}})] \leq t\}$ and the volume of $T_i$, i.e.

$$P\left(\mathbb{E}[\phi(Y)] \leq t \mid T_i\right) = \frac{\operatorname{vol}(R_{i,t})}{\operatorname{vol}(T_i)}. \tag{2.21}$$

The computation of the volume of $R_{i,t}$ will depend on the function $\phi$ defining the expectation measure.

We now consider the $k$-order moments of the random variable $Y$ whose pmf is $p_Y$:

$$\mu_k^{(Y)} = \mathbb{E}[Y^k] = \sum_{i=0}^{d} i^k p_Y(i) = \sum_{i=0}^{d} i^k p_i.$$

From Eq. (2.1) we have $p_i = \binom{d}{i} f_i$ for $i = 0, \ldots, d$.

The pmf $p_Y$ will lie in exactly one of the simplices $T_i$; let us denote this simplex by $T^*$. We can write $p_Y = \sum_{j \in J^*} \lambda_j r_j$, where $J^*$ is the set of indexes that defines the subset of rays which generate the simplex $T^*$. As a corollary of Proposition 2.3 we can write

$$\mu_k^{(Y)} = \sum_{j \in J^*} \lambda_j \mu_k^{(j)},$$

where $\mu_k^{(j)}$ are the $k$-moments of the ray random variables $R_j$, $j \in J^*$. For $k$-order moments the region $R_{i,t}$ is therefore the subset of the standard simplex defined by the linear inequality $\sum_{j \in J^*} \lambda_j \mu_k^{(j)} \leq t$. In this case the ratio of the volumes in Eq. (2.21) can be computed using an exact and iterative formula, see [9] and [1].

The same methodology can also be applied to other measures defined as functions of expected values, like the entropic risk measure.

## 3 High dimensional simulation

The representation of Bernoulli pmfs as convex combinations of ray densities allows us to sample from high dimensional exchangeable Bernoulli distributions in $\mathcal{E}_d(p)$. The simplest way to do so is to generate directly from a ray density. Ray densities are extremal and have support on two points; we can also use combinations of rays to choose a distribution in the interior of the polytope. As an example we could choose $\lambda_j = 1/n_p$ in (2.10); such a choice identifies a pmf inside the polytope. Furthermore, the ray densities allow us to simulate from a family of distributions that covers the whole range spanned by a measure of dependence. Let $M$ and $m$ be the maximum and minimum values of an expectation measure and let $e_M$ and $e_m$ be two corresponding ray densities. Then the parametric family $f_\lambda = \lambda e_m + (1 - \lambda) e_M$, $\lambda \in [0, 1]$, spans the whole range $[m, M]$. We can therefore use this family to simulate a sample of binary data from a pmf with a given value of the measure, or we can simulate binary variables with different values of the measure by moving $\lambda$. An important example is correlation. The problem of simulating from multivariate Bernoulli distributions with given correlations, and in particular negative correlations, is of interest in many applications and is addressed in the statistical literature, see [7]. The geometrical structure of $\mathcal{E}_d(p)$ allows us to solve this problem for exchangeable random vectors. In fact, we can choose a pmf with the required correlation simply by finding two ray densities in $\mathcal{E}_d(p)$ with the minimum and maximum allowed correlations, say $e_m$ and $e_M$ respectively. The pmfs in the family $f_\lambda$ then span the whole correlation range. Simulation of a pmf in this family is easy and can be performed in high dimension. We provide a general algorithm to simulate a high dimensional exchangeable Bernoulli distribution from $\mathcal{E}_d(p)$, given the vector $\boldsymbol{\lambda}$ in (2.6) or equivalently in (2.10). The vector $\boldsymbol{\lambda}$ can be chosen as discussed above, or it can be randomly chosen by giving a distribution on the simplex in (2.6). Once $\boldsymbol{\lambda}$ is selected, it represents a probability distribution on the set of ray densities, i.e. $\lambda_j$ is the probability of extracting the ray density $e_j$. Given the pmf $\boldsymbol{\lambda}$ on the set of extremal rays, the following algorithm allows us to simulate in high dimension.

We observe that the algorithm does not require storing any large structure and can therefore easily be used for large $d$.
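The steps above can be sketched as follows: draw a ray according to $\boldsymbol{\lambda}$, draw the number of ones $Y$ from its two-point support, then scatter the ones uniformly among the $d$ positions (which is exactly exchangeability). The function names are ours; rays are passed as pmfs on $\{0, \ldots, d\}$:

```python
import numpy as np

def sample_exchangeable(d, rays, lam, rng=None):
    """Draw one binary d-vector from the pmf f = sum_j lam_j e_j without
    ever storing the 2^d-dimensional pmf."""
    rng = np.random.default_rng() if rng is None else rng
    r = rays[rng.choice(len(rays), p=lam)]   # pick a ray density
    support = np.flatnonzero(r)
    y = rng.choice(support, p=r[support])    # number of ones Y ~ r
    x = np.zeros(d, dtype=int)
    x[rng.choice(d, size=y, replace=False)] = 1  # place the ones uniformly
    return x
```

Only a length-$(d+1)$ vector per ray is stored, so the scheme scales to large $d$.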

This case is interesting because the families of multivariate Bernoulli variables commonly used for the simulation of exchangeable binary variables incorporate only positive correlation. We consider here two families of exchangeable Bernoulli models used in the literature to simulate high dimensional binary data. The first family is proposed in [5]; we term it the family of one-factor models, taking the name from the one-factor models used in credit risk, which have a similar dependence structure. The second family is the mixture model based on De Finetti’s representation theorem.

The construction proposed in [5] provides an algorithm to generate binary data with given marginal Bernoulli distributions and an exchangeable dependence structure, meaning that the variables are equicorrelated. By assuming that the marginal parameters are equal to a common parameter $p$, their construction gives an exchangeable vector; this is the case considered here. We therefore define the multivariate Bernoulli variable in this framework. Let

$$X_i = (1 - U_i) Y_i + U_i Z, \qquad i = 1, \ldots, d, \tag{3.1}$$

where $Y_i \sim \mathcal{B}(p)$, $Z \sim \mathcal{B}(p)$, $U_i \sim \mathcal{B}(a)$ and all are independent. We say that $\mathbf{X}$ and its pmf have a one-factor structure. Clearly, $\mathbf{X}$ is exchangeable, the margins $X_i$ have distribution $\mathcal{B}(p)$ and common correlation $\rho = a^2$. By construction we have $\rho \geq 0$, and the case $a = 1$ implies that $X_i = Z$ for all $i$, which puts all the mass on the two constant vectors.
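A direct NumPy sketch of the construction (3.1); parametrizing $U_i \sim \mathcal{B}(\sqrt{\rho})$ follows from the correlation $\rho = a^2$, and the function name is ours:

```python
import numpy as np

def one_factor_bernoulli(d, p, rho, size=1, rng=None):
    """One-factor construction X_i = (1 - U_i) Y_i + U_i Z with
    Y_i, Z ~ Bernoulli(p) and U_i ~ Bernoulli(sqrt(rho)), all independent.
    Requires rho >= 0; the common correlation of the X_i is rho."""
    rng = np.random.default_rng() if rng is None else rng
    a = np.sqrt(rho)                      # P(U_i = 1)
    U = rng.random((size, d)) < a
    Y = rng.random((size, d)) < p
    Z = rng.random((size, 1)) < p         # common factor, broadcast over columns
    return np.where(U, Z, Y).astype(int)
```

Note that $\rho = 1$ forces $U_i \equiv 1$, so each sampled row is the constant vector $(Z, \ldots, Z)$.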

According to De Finetti’s theorem, if $f$ is the pmf of a random vector extracted from an exchangeable sequence, then the law of the sum has the representation

$$p_Y(k) = \binom{d}{k} \int_0^1 p^k (1 - p)^{d-k} \, d\Psi(p),$$

where $\Psi$ is a distribution function on $[0, 1]$. Clearly, these vectors can only have positive correlations. One of the most used mixed Bernoulli models is the beta-mixing model, where the mixing variable $\Psi$ is beta distributed with parameters $a$ and $b$. In this case we have

$$p = \mathbb{E}[\Psi], \qquad \mu_2 = \mathbb{E}[\Psi^2].$$

Therefore we estimate the parameters $a$ and $b$ by solving the equations

$$p = \frac{a}{a+b}, \qquad \mu_2 = \frac{a(a+1)}{(a+b)(a+b+1)}.$$

For this model $\rho \leq 0$ is not admissible; therefore the model cannot include independence. Notice that the model has two parameters and, once a beta-mixing model in $\mathcal{E}_d(p)$ is chosen, it can be parametrized by $\rho$.
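Combining the moment equations above with $\mu_2 = \rho p q + p^2$ gives $a + b = 1/\rho - 1$ (a short derivation of ours, not stated in the text). A sampling sketch under that parametrization:

```python
import numpy as np

def beta_mixture_bernoulli(d, p, rho, size=1, rng=None):
    """Beta-mixing model: draw a latent success probability from Beta(a, b),
    then d conditionally i.i.d. Bernoulli draws.  Moment matching
    p = a/(a+b), mu_2 = a(a+1)/((a+b)(a+b+1)) reduces to a + b = 1/rho - 1,
    so only rho in (0, 1) is reachable."""
    if not 0.0 < rho < 1.0:
        raise ValueError("beta mixing requires 0 < rho < 1")
    s = 1.0 / rho - 1.0                  # a + b
    a, b = p * s, (1.0 - p) * s
    rng = np.random.default_rng() if rng is None else rng
    P = rng.beta(a, b, size=(size, 1))   # latent probability, one per row
    return (rng.random((size, d)) < P).astype(int)
```

The `ValueError` branch mirrors the remark in the text: $\rho \leq 0$ (and hence independence) is not admissible for this family.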

In [3] the authors analytically found the correlation bounds for exchangeable pmfs in each dimension, and in particular the minimum attainable correlation. The bounds for the correlation are:

• if $pd$ is not integer,

$$\frac{\frac{1}{d(d-1)}\left[-j_{M_1}(j_{M_1} + 1) + 2 j_{M_1}\, pd\right] - p^2}{p(1-p)} \leq \rho \leq 1, \tag{3.2}$$

where $j_{M_1} = \lfloor pd \rfloor$;

• if $pd$ is integer,

$$-\frac{1}{d-1} \leq \rho \leq 1. \tag{3.3}$$

In both cases the minimal correlation goes to zero as the dimension increases, in accordance with De Finetti’s representation theorem. Therefore, the capability to generate binary data with negative correlations is more important in low dimensions, where we are also able to perform uniform sampling, as discussed in the next section.

### 3.1 Uniform simulation

Let us start by considering uniform sampling from $\mathcal{E}_d$. From Eq. (2.4) we know $\mathcal{E}_d \leftrightarrow \mathcal{D}_d$. It follows that sampling uniformly at random from $\mathcal{E}_d$ is equivalent to sampling uniformly at random from $\mathcal{D}_d$, and hence to sampling uniformly at random from the $d$-simplex, which is a standard topic in the statistical literature.
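Uniform sampling from the standard simplex can be done with the usual Dirichlet trick (normalize i.i.d. exponentials, i.e. a flat Dirichlet); a sketch with our naming:

```python
import numpy as np

def uniform_on_simplex(n, size=1, rng=None):
    """Sample uniformly from the standard n-simplex: n+1 nonnegative
    coordinates summing to 1, via normalized i.i.d. Exp(1) draws
    (equivalently, Dirichlet(1, ..., 1))."""
    rng = np.random.default_rng() if rng is None else rng
    e = rng.exponential(size=(size, n + 1))
    return e / e.sum(axis=1, keepdims=True)
```

Each returned row can be read as a pmf on $\{0, \ldots, n\}$, i.e. a uniformly drawn element of $\mathcal{D}_n$.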

Let us now consider uniform sampling from $\mathcal{E}_d(p)$. From Eq. (2.3) we know $\mathcal{E}_d(p) \leftrightarrow \mathcal{D}_d(dp)$. It follows that sampling uniformly at random from $\mathcal{E}_d(p)$ is equivalent to sampling uniformly at random from $\mathcal{D}_d(dp)$, and hence to sampling uniformly at random from the polytope $\mathcal{C}$. We therefore have to consider the triangularization of $\mathcal{C}$ in (2.18). We can consider sampling from $\mathcal{D}_d$ as a special case of sampling from $\mathcal{D}_d(dp)$: when sampling from $\mathcal{D}_d$ the triangularization is trivial, since $\mathcal{D}_d$ is itself a simplex. Uniform sampling allows us to find the empirical distribution of statistical indices that are not expectation measures, to which Proposition 2.3 does not apply. An important example is the $\alpha$-quantile, or value at risk, of a balanced portfolio of exchangeable Bernoulli variables, widely used in applications as a measure of risk. For $Y \in \mathcal{D}_d(dp)$, the quantile at level $\alpha \in (0, 1)$ is defined by

$$q_\alpha(Y) = \inf\{y \in \mathbb{R} : P(Y \leq y) \geq \alpha\}.$$

The $\alpha$-quantile is not a convex measure; nevertheless [3] proved the following proposition.

###### Proposition 3.1.

Let $Y \in \mathcal{D}_d(dp)$. Then

$$\min_{R} q_\alpha(R) \leq q_\alpha(Y) \leq \max_{R} q_\alpha(R),$$

where $R$ ranges over the random variables whose pmfs are the ray densities of $\mathcal{D}_d(dp)$.

Another well-known measure defined on classes of distributions is the entropy. For a discrete pmf $p_Y$ it is given by:

$$E(p_Y) = -\sum_{i=0}^{d} p_i \log p_i.$$

The entropy does not satisfy Proposition 2.3 and does not reach its bounds on the ray densities. We are therefore not able to use the geometry of a pmf in $\mathcal{E}_d(p)$ to numerically find its distribution across the class. However, we can address this goal using simulations. For other measures for which the exact value of the ratio in (2.21) cannot be computed, an estimate can simply be obtained by sampling uniformly at random over $T_i$ and determining the relative frequency of the points that fall in $R_{i,t}$:

$$\widehat{\left(\frac{\operatorname{vol}(R_{i,t})}{\operatorname{vol}(T_i)}\right)} = \frac{\#\{\boldsymbol{p}_k \in R_{i,t},\; k = 1, \ldots, N\}}{N},$$

where $N$ is the size of the sample. In these cases an estimate of the distribution is obtained.

### 3.2 Applications

We present two applications in which we study different scenarios. The first application is in dimension $d = 3$: the polytope is 2-dimensional and we can explicitly see the triangularization. The second example is performed in a higher dimension, where the structure of the polytope is more complex.

#### 3.2.1 Application 1

We study:

1. the exact distribution of the second-order moment in $\mathcal{E}_3(p)$, the class of exchangeable distributions of dimension $d = 3$ and given mean $p$;

2. a family of pmfs for simulations that spans the whole range of correlation in $\mathcal{E}_3(p)$;

3. the sampling distribution of the entropy in the same class $\mathcal{E}_3(p)$;

4. the joint distribution of the first-order moment and correlation in $\mathcal{E}_3$, the class of exchangeable distributions of dimension $d = 3$.

The corresponding ray densities are the columns in Table 1.

The ray densities are points in $\mathbb{R}^4$ which lie in a subspace of dimension 2. Using standard Principal Component Analysis we can project these 4 points to $\mathbb{R}^2$. The points inside the polygon in Figure 1 (left side) represent all the densities which belong to the class.

The right side of Figure 1 shows the triangularization. We have two triangles: $T_1$, the larger one, and $T_2$, the smaller one. The sampling probabilities $P(T_1)$ and $P(T_2)$ are the ratios of their areas to the area of the polytope, as in (2.20).

For $k$-order moments the region $R_{i,t}$ is the subset of the standard simplex defined by $\sum_{j \in J^*} \lambda_j \mu_k^{(j)} \leq t$. We recall that in this case the ratio of the volumes in Eq. (2.21) can be computed using an exact and iterative formula, see [9] and [1].

Figure 2 (left side) exhibits the exact numerical cumulative distribution function (cdf) of $\mu_2^{(Y)}$ across $T_1$. The distribution of $\mu_2^{(Y)}$ across $T_2$ is similar. The cdf across the whole polytope is obtained by mixing the conditional cdfs as in (2.19). Figure 2 (right side) shows the probability density function (pdf) of the mixture, obtained numerically from the cdf of $\mu_2^{(Y)}$.

The pmfs in the family $f_\lambda$ span the whole correlation range and can be simulated according to the proposed algorithm. Figure 3 shows the family in the polytope together with the family of beta-mixture models with mean $p$ and the family of one-factor models. It is evident that our approach also allows us to consider pmfs with negative correlations (green straight line in the figure), while the other two approaches provide only positive correlations (blue and red lines in the figure). In dimension three the range of negative correlation is wide, as evidenced in the figure.

The simulated pdf of the entropy in the class $\mathcal{E}_3(p)$, where the entropy of a pmf $f$ is defined to be the entropy of the corresponding $p_Y$, is found using the methodology in Section 3.1 and is shown in Figure 4.

The simulated pdf can obviously also be found for the second-order moment $\mu_2^{(Y)}$. It is shown in Figure 5 for completeness. The simulated pdf is in agreement with the exact numerical one (right side of Figure 2).

Figure 6 shows the simulated bivariate distribution of the mean and the correlation across $\mathcal{E}_3$. The joint behaviour of $p$ and $\rho$ is in accordance with the theoretical bounds found in [3] and recalled in (3.2) and (3.3).

#### 3.2.2 Application 2

We study:

1. the distribution of the second-order moment in $\mathcal{E}_d(p)$ for a higher dimension $d$ and a given mean $p$;

2. the distribution of the entropy in the same class $\mathcal{E}_d(p)$.

There are 12 ray densities; they are given in Table 2.

The ray densities are points which lie in a subspace of dimension 3. Using standard Principal Component Analysis we can project these 12 points to $\mathbb{R}^3$. The triangularization consists of 38 tetrahedra, 28 of which have almost zero volume.

We proceed as in the previous application to find the $k$-order moment distribution across the polytope. We first compute its distribution across each tetrahedron using the exact and iterative formula and then we mix the conditional cdfs as in (2.19). Figure 7 (left side) shows the pdf of the mixture obtained from the exact numerical cdf of $\mu_2^{(Y)}$.

The simulated pdf of the entropy in the same class is found using the methodology in Section 3.1 and is shown in the right side of Figure 7.

## 4 Estimate and testing

### 4.1 Maximum likelihood estimation

We focus on maximum likelihood estimation in the classes of exchangeable distributions and of exchangeable distributions with given margins, $\mathcal{E}_d$ and $\mathcal{E}_d(p)$ respectively.

###### Proposition 4.1.

A maximum likelihood estimator of $f \in \mathcal{E}_d$ ($f \in \mathcal{E}_d(p)$) always exists.

###### Proof.

$\mathcal{E}_d$ ($\mathcal{E}_d(p)$) is a closed and bounded convex set in a finite-dimensional space, hence it is compact, and the likelihood functions for the models in (4.6) are continuous. ∎

The maximum likelihood estimator (MLE) in the class $\mathcal{E}_d$ can be found analytically using the map $H$ in (2.1).

Let us assume we observe a sample of size $n$ drawn from a $d$-dimensional exchangeable Bernoulli distribution, and let $Y = X_1 + \cdots + X_d$ have pmf $\boldsymbol{p} = (p_0, \ldots, p_d)$ that gives rise to counts $\boldsymbol{N} = (N_0, \ldots, N_d)$, where $N_j$ is the number of observations with $j$ ones. The count vector $\boldsymbol{N}$ has a multinomial distribution with parameters $(n, \boldsymbol{p})$, where the parameter $\boldsymbol{p}$ belongs to the $d$-simplex. The likelihood function is

$$L(\boldsymbol{n}; \boldsymbol{p}) = P(N_j = n_j,\; j = 0, \ldots, d) = \binom{n}{n_0 \cdots n_d} \prod_{j=0}^{d} p_j^{n_j}, \tag{4.1}$$

where $\binom{n}{n_0 \cdots n_d}$ denotes the multinomial coefficient. The MLE is the solution of the constrained maximization problem

$$\max_{\boldsymbol{p}} \log L(\boldsymbol{n}; \boldsymbol{p}), \qquad \text{subject to } \sum_{j=0}^{d} p_j - 1 = 0. \tag{4.2}$$

By using Lagrange multipliers we find:

$$\hat{p}_j = \frac{N_j}{n}.$$

The MLE in the class $\mathcal{E}_d$ is

$$\hat{f}_j = \frac{\hat{p}_j}{\binom{d}{j}} = \frac{N_j}{n \binom{d}{j}}, \qquad j = 0, \ldots, d.$$
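This closed-form estimator is immediate to compute from data. A sketch (function name is ours): count how many observations have $j$ ones, then divide by $n$ and by $\binom{d}{j}$:

```python
import numpy as np
from math import comb

def mle_exchangeable(sample):
    """MLE in E_d from n observed binary d-vectors (rows): N_j counts the
    observations with j ones; p_hat_j = N_j / n estimates the pmf of the
    sum Y, and f_hat_j = p_hat_j / C(d, j) estimates the common pmf value
    of all binary vectors with j ones."""
    sample = np.asarray(sample)
    n, d = sample.shape
    counts = np.bincount(sample.sum(axis=1), minlength=d + 1)  # N_0, ..., N_d
    p_hat = counts / n
    f_hat = np.array([p_hat[j] / comb(d, j) for j in range(d + 1)])
    return p_hat, f_hat
```

For instance, four observations of dimension 3 with numbers of ones $1, 1, 3, 0$ give $\hat{\boldsymbol{p}} = (0.25, 0.5, 0, 0.25)$ and $\hat{f}_1 = 0.5/3$.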

We now consider the class $\mathcal{E}_d(p)$. The MLE in $\mathcal{S}_d(p)$ can be numerically found by solving the constrained maximization problem:

$$\max_{\boldsymbol{p}} \log L(\boldsymbol{n}; \boldsymbol{p}), \qquad \text{subject to } \sum_{j=0}^{d} p_j - 1 = 0, \quad \sum_{j=0}^{d} j\, p_j - pd = 0. \tag{4.3}$$

Then, to obtain the MLE in $\mathcal{E}_d(p)$, we use the map $H^{-1}$ again.

We can also use a direct approach and look for the MLE in $\mathcal{E}_d$ working with the pmf over $\mathcal{X}_d$. Let us assume we observe a sample of size $n$ drawn from a $d$-dimensional Bernoulli distribution with pmf $f$ that gives rise to counts $n_{\boldsymbol{x}}$, $\boldsymbol{x} \in \mathcal{X}_d$. Then the count vector has a multinomial distribution, where the parameter belongs to the $(2^d - 1)$-simplex. The likelihood function is

 L(n;fχ)=P(Ni=ni,i∈I)=(nn1⋯n