Stick-breaking processes, clumping, and Markov chain occupation laws

01/23/2019
by Zach Dietz, et al.
The University of Arizona

We consider the connections among `clumped' residual allocation models (RAMs), a general class of stick-breaking processes including Dirichlet processes, and the occupation laws of certain discrete space time-inhomogeneous Markov chains related to simulated annealing and other applications. An intermediate structure is introduced in a given RAM, where proportions between successive indices in a list are added or clumped together to form another RAM. In particular, when the initial RAM is a Griffiths-Engen-McCloskey (GEM) sequence and the indices are given by the random times that an auxiliary Markov chain jumps away from its current state, the joint law of the intermediate RAM and the locations visited in the sojourns is given in terms of a `disordered' GEM sequence, and an induced Markov chain. Through this joint law, we identify a large class of `stick breaking' processes as the limits of empirical occupation measures for associated time-inhomogeneous Markov chains.



1. Introduction and summary

In this article, we introduce an intermediate ‘clumped’ structure in residual allocation models of apportionment of a resource, such as Griffiths-Engen-McCloskey (GEM) models. Although this intermediate structure is perhaps of independent interest, through it, we identify the empirical occupation law limits in a class of time-inhomogeneous discrete space Markov chains, associated with simulated annealing and other applications, as new types of stick-breaking processes built from Markovian samples, including Dirichlet processes. On the one hand, GEM models and Dirichlet processes have wide application in population genetics, ecology, combinatorial stochastic processes, and Bayesian nonparametric statistics; see books and surveys [8], [9], [18], [19], [27], [41] and references therein. On the other hand, the time-inhomogeneous Markov chains that we consider are stylized models of simulated annealing and Gibbs samplers or types of mRNA dynamics; see [5], [11], [15], [17], [25], [46]. In a sense, one purpose of the paper is to observe a perhaps unexpected connection between these a priori different objects.

We now discuss some of the relevant background on GEM and Dirichlet measures, and time-inhomogeneous Markov chains, before turning to an informal discussion of our results on the intermediate structure in GEM sequences and their connections with the occupation laws of the Markov chains.

1.1. GEM and Dirichlet measures

Consider the infinite-dimensional simplex

Δ_∞ = { (p_1, p_2, …) : p_k ≥ 0 for k ≥ 1, Σ_{k≥1} p_k = 1 }

of all discrete (probability) distributions on ℕ = {1, 2, …}. A residual allocation model (RAM) is a distribution on Δ_∞, introduced in the 1940s [24] as a means to address problems of apportionment: Let u = (u_1, u_2, …) be independent [0, 1]-valued random variables, called ‘residual fractions’. Consider the associated process P = (P_1, P_2, …), given by P_1 = u_1 and

P_k = u_k ∏_{j<k} (1 − u_j) = u_k (1 − Σ_{j<k} P_j) for k ≥ 2;

see Lemma 3.1 for the induction leading to the last equality. If Σ_k P_k = 1 a.s., the distribution of P is the associated RAM. In general, P need not sum to 1 for a given realization. We note a simple condition equivalent to Σ_k P_k = 1 a.s. is that ∏_{k≥1} E[1 − u_k] = 0, the case for nontrivial, independent, identically distributed (iid) fractions (cf. Lemma 3.1).

The RAM when the fractions are iid Beta(1, θ) random variables, for a parameter θ > 0, is the well-known Griffiths-Engen-McCloskey GEM(θ) model. There are many characterizations and studies of the GEM sequence and its variants in recent years. For instance, the GEM model is the unique RAM with iid fractions that is invariant in law under size-biased permutation. Also, the GEM sequence is the unique invariant measure of ‘split and merge’ dynamics. In addition, there are important connections with Poisson-Dirichlet models. See for instance, among others, [1], [2], [10], [14], [20], [28], [29], [30], [35], [38], [39], [40], [42], and references therein.
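As a quick illustration of the stick-breaking mechanism just described, the following minimal sketch (function and variable names are ours) draws the first weights of a GEM(θ) sequence with NumPy:

```python
import numpy as np

def gem_sample(theta, n_sticks, rng=None):
    """Sample the first n_sticks weights of a GEM(theta) sequence.

    Each residual fraction u_k is iid Beta(1, theta); the k-th weight is
    u_k times the mass left unassigned after the first k-1 steps.
    """
    rng = np.random.default_rng(rng)
    u = rng.beta(1.0, theta, size=n_sticks)   # iid residual fractions
    remaining = np.concatenate(([1.0], np.cumprod(1.0 - u)[:-1]))
    return u * remaining                      # P_k = u_k * prod_{j<k} (1 - u_j)

weights = gem_sample(theta=2.0, n_sticks=2000, rng=0)
print(weights[:3], weights.sum())             # partial sums approach 1
```

With a long truncation, the leftover mass ∏(1 − u_k) is numerically negligible, reflecting that iid nontrivial fractions always yield a RAM.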

Moreover, the GEM sequence is a fundamental building block of Dirichlet processes, which often serve as priors in Bayesian nonparametric statistics [18], [19]. With respect to a measurable space (S, 𝒮), consider the space 𝒫(S) of probability measures endowed with the σ-field generated by the sets {μ : μ(A) < r} for A ∈ 𝒮 and r ∈ [0, 1]. We say that μ is a random probability sample from the Dirichlet process, with ‘parameters’ θ > 0 and base probability measure α on S, if for any finite measurable partition A_1, …, A_m of S, the vector (μ(A_1), …, μ(A_m)) has the Dirichlet distribution with parameters (θα(A_1), …, θα(A_m)).

The ‘stick breaking’ representation of the Dirichlet process with parameters (θ, α), in terms of a GEM(θ) sequence P = (P_1, P_2, …), and an independent sequence X = (X_1, X_2, …) of iid random variables with common distribution α, is given by

μ = Σ_{k≥1} P_k δ_{X_k}.   (1.1)

There is a large literature on Dirichlet processes stemming from the seminal works [4], [16]. See [40], [45] with respect to the ‘stick breaking’ construction, and books [18], [19], [36], [41] for more on their history, other representations including that with respect to the ‘Chinese restaurant process’, and their use in practice.

In this article, we will concentrate on discrete spaces S, that is, those composed of either a finite or a countably infinite number of elements. We note, when S is finite and α(x) > 0 for x ∈ S, the property that (μ(x))_{x∈S} is given by a Dirichlet distribution was first stated in a population genetics context in [12]; see also [26].
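On a finite space, the stick-breaking representation (1.1) can be simulated directly. The sketch below (names and parameters ours) draws truncated Dirichlet process samples and checks that the mean of μ recovers the base measure, via the standard identity E[μ(A)] = α(A):

```python
import numpy as np

def dirichlet_process_sample(theta, alpha, n_sticks, rng):
    """One (truncated) draw mu = sum_k P_k delta_{X_k} of a Dirichlet process.

    P is a GEM(theta) stick-breaking sequence, and X_k are iid draws from
    the base measure alpha, here a probability vector on a finite space.
    """
    u = rng.beta(1.0, theta, size=n_sticks)
    p = u * np.concatenate(([1.0], np.cumprod(1.0 - u)[:-1]))
    x = rng.choice(len(alpha), size=n_sticks, p=alpha)
    mu = np.zeros(len(alpha))
    np.add.at(mu, x, p)          # mu(i) = total stick mass assigned to atom i
    return mu / mu.sum()         # renormalize away the truncation remainder

rng = np.random.default_rng(1)
alpha = np.array([0.5, 0.3, 0.2])
draws = np.array([dirichlet_process_sample(2.0, alpha, 500, rng)
                  for _ in range(400)])
print(draws.mean(axis=0))        # close to alpha, since E[mu(i)] = alpha(i)
```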

1.2. Time-inhomogeneous Markov chains

Let G be a generator kernel on ℕ, that is, G(x, y) ≥ 0 for x ≠ y, and G(x, x) = −Σ_{y≠x} G(x, y). Suppose the entries of G are suitably bounded so that the kernel

P_n = I + G/n   (1.2)

is a stochastic kernel for all n large enough, and fix P_n as a stochastic kernel otherwise. Let X = (X_1, X_2, …) be the time-inhomogeneous Markov chain on the discrete space associated to the kernels (P_n). Consider G without zero rows. Then, every point in the space represents a valley from which the chain rarely but almost surely exits to enter another point valley. In this way, a certain ‘landscape’ is explored. The chain can be considered as a simplified model of simulated annealing or metastability (cf. [6], [17], [31], [37], [46]). From another view, continuous-time variants of such inhomogeneous chains have been used in the modeling of certain mRNA dynamics [25].
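A small simulation sketch of such a chain, with an illustrative 3-state generator of our choosing, may help fix ideas; note how the kernels I + G/n approach the identity, so jumps become rarer and sojourns longer as time grows:

```python
import numpy as np

# Illustrative 3-state generator G (off-diagonal entries nonnegative,
# rows summing to zero); the choice of entries here is ours.
G = np.array([[-1.0, 0.6, 0.4],
              [ 0.5, -1.0, 0.5],
              [ 0.3, 0.7, -1.0]])

def kernel(n, G):
    """Stochastic kernel P_n = I + G/n (valid once n >= max |G(x,x)|)."""
    return np.eye(len(G)) + G / n

def run_chain(G, steps, x0=0, rng=None):
    """Run the time-inhomogeneous chain driven by the kernels P_n."""
    rng = np.random.default_rng(rng)
    x, path = x0, [x0]
    for n in range(2, steps + 2):   # start at n = 2 so I + G/n is stochastic
        x = rng.choice(len(G), p=kernel(n, G)[x])
        path.append(x)
    return np.array(path)

path = run_chain(G, steps=5000, rng=0)
```

Since the diagonal of P_n tends to 1, later sojourns in a state typically last longer, which is the source of the ‘clump growth’ discussed below.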

Interestingly, for finite state spaces, it was noted in [17] and [46] that the sample means of these chains do not converge a.s. or in probability, as would be the case for a homogeneous Markov chain. For generators G without zero entries, weak convergence of the empirical occupation measures

π_n = (1/n) Σ_{k=1}^n δ_{X_k}   (1.3)

to a limit law ν was identified by computing its moments in [11]. Curiously, when G is of the form G = θ(P_∞ − I) for θ > 0 and P_∞ a stochastic matrix with constant rows equal to a probability vector π, it was also shown that ν is a Dirichlet distribution with parameters θπ, by matching the moments. Similar occupation laws were also derived in the continuous-time mRNA model in [25] as the stationary distributions of a promoter process on states, influencing levels of mRNA production.

In this context, part of our motivation is to understand this limit and its generalizations more constructively (Theorem 2.12).

1.3. Clumped structure and generalized ‘stick-breaking’ processes

We now describe a class of generalized stick-breaking processes. Let P = (P_1, P_2, …) be a GEM(θ) sequence and, to fix ideas, let X = (X_1, X_2, …) be an independent Markov chain with irreducible, recurrent transition kernel Q on a discrete space S with a given initial distribution, although we also consider more general Markov chains, not necessarily irreducible or composed only of recurrent states, in several of our results.

Another motivation of ours is to understand the random measures

μ = Σ_{k≥1} P_k δ_{X_k},   (1.4)

seen as a natural generalization of the stick-breaking representation of the Dirichlet process, with respect to Markovian samples instead of the iid ones in (1.1).

In general, μ is not exchangeable, in the sense that the GEM sequence may not be replaced by an arbitrary permutation of itself without changing the measure. In contrast, when the sample sequence is iid and μ is the Dirichlet process, such an exchangeability property holds; for example, the Poisson-Dirichlet order statistics of P may be used instead without changing the Dirichlet process (cf. [40]). We also note that other generalizations of Dirichlet processes have been considered, among them, Polya tree [33], Pitman-Yor [40], [43], and Beta processes [7].

We now introduce a clumped intermediate structure which will help analyze μ. Suppose τ_1 < τ_2 < ⋯ are the times when the Markov chain X jumps to a different state, with the convention τ_0 = 1. In particular, ‘skip-repetition’ is allowed: the chain can begin in a state, jump away at time τ_1, and then may jump back into that same state at a later switch time. We note that these times are not only those times when a state is observed for the first time, as used in the definition of size-biased permutations.

Consider the clumped proportions [P]_j, for j ≥ 1, where [P]_j sums the proportions P_k over the j-th sojourn interval of the chain. We show (cf. Theorems 2.4 and 2.7) that, conditional on the locations visited in the sojourns, the sequence [P] is a RAM where the associated fractions are Beta variables with parameters depending on the visited locations, a sort of ‘disordered’ GEM. Also, the law of the visited locations can be computed as another Markov chain on S with a transition kernel found in terms of Q. We will call the joint law of this pair a type of Markov Chain conditional GEM, or ‘MCcGEM’, distribution.
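A simulation sketch of this clumping (all names and the auxiliary two-state kernel are ours) groups a truncated GEM sequence along the sojourns of an independent chain; the clumped weights again form a probability sequence:

```python
import numpy as np

rng = np.random.default_rng(3)
theta, n = 2.0, 4000

# Truncated GEM(theta) weights via stick breaking.
u = rng.beta(1.0, theta, size=n)
p = u * np.concatenate(([1.0], np.cumprod(1.0 - u)[:-1]))

# Auxiliary two-state Markov chain with an illustrative kernel Q.
Q = np.array([[0.7, 0.3], [0.4, 0.6]])
x = [0]
for _ in range(n - 1):
    x.append(rng.choice(2, p=Q[x[-1]]))
x = np.array(x)

jumps = np.flatnonzero(x[1:] != x[:-1]) + 1   # times the chain switches state
starts = np.concatenate(([0], jumps))
clumped = np.add.reduceat(p, starts)          # intermediate 'clumped' RAM
visited = x[starts]                           # distinct sojourn locations
print(clumped[:5], visited[:5], clumped.sum())
```

Clumping only regroups mass, so the clumped weights sum to the same (near-1) total as the original truncated GEM weights, and the visited locations never repeat consecutively.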

In terms of the clumped intermediate structure, we see that

μ = Σ_{j≥1} [P]_j δ_{Y_j},   (1.5)

where Y = (Y_1, Y_2, …) is the sequence of sojourn locations.

This representation will allow us to identify the measure μ as the limit of occupation laws of a matched time-inhomogeneous Markov chain (Theorems 2.12, 2.13).

We will also see that the measure satisfies a ‘self-similarity’ equation (cf. Theorem 2.17), uniquely characterizing its distribution. This equation is reminiscent of the regenerative structure present in ‘stick-breaking’ [45], in integral constructions of the Dirichlet process [32], [44], and in other related settings [21], [22].

Moreover, when the state space is finite, we discuss the joint moments of the distribution in Theorem 2.19. Although a formula for the moments is given in [11], the description in Theorem 2.19 is more detailed, allowing identification of the marginal distributions as Beta products (cf. Theorem 2.18 and Corollary 2.20).

1.4. Occupation laws of time-inhomogeneous Markov chains

With respect to the time-inhomogeneous Markov chain X with kernels (1.2), starting from an initial distribution, consider the random empirical occupation measure on ℕ,

π_n = (1/n) Σ_{k=1}^n δ_{X_k}.

To connect with the intermediate clumping structure from the previous section, we will again implement a clumping procedure, this time to investigate local occupations, or clumped occupations, of the empirical measure of the chain up to a time n.

However, in a Markov chain with kernels (1.2), later clumps of the chain are typically larger than earlier clumps. To keep the clump sizes from tending to zero after normalization, we consider the clumps in reverse chronological order, starting from time n, so that the clumped occupations converge nontrivially in distribution.

Formally, let τ_1 < τ_2 < ⋯ be the successive times when the Markov chain changes state. Going backwards from time n, let L_1^n be the length of the last sojourn before time n, in state Y_1^n, let L_2^n be the length of the preceding sojourn, in state Y_2^n, and, in general, let L_j^n be the length of the j-th sojourn counted backwards from time n, in state Y_j^n. The normalized lengths L_j^n/n then sum to 1 over the clumps seen up to time n.

The figure below depicts, in a realization, the clumping boundaries marked in forward times, and the lengths of local occupations given backwards in time starting from time .
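From any finite realization, the backward clump lengths and their states can be read off mechanically; a minimal sketch (function name ours):

```python
import numpy as np

def backward_clumps(path):
    """Lengths of the sojourns (clumps) of a finite path, listed in reverse
    chronological order starting from the final time, together with the
    state of each clump."""
    path = np.asarray(path)
    boundaries = np.flatnonzero(path[1:] != path[:-1]) + 1  # switch times
    starts = np.concatenate(([0], boundaries))
    ends = np.concatenate((boundaries, [len(path)]))
    lengths = ends - starts
    return lengths[::-1], path[starts][::-1]

lengths, states = backward_clumps([0, 0, 1, 1, 1, 2, 0, 0])
print(lengths, states)   # [2 1 3 2] and [0 2 1 0]
```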


Then, the empirical occupation measure is written in terms of these backward clump lengths and locations.

We show (cf. Theorem 2.10), for generators satisfying natural conditions, that, conditionally on the visited values, the distributions of the normalized backward clump lengths converge, as n → ∞, to a disordered GEM with parameters given in terms of the generator. Also, the sequence of backward locations converges, as n → ∞, to a homogeneous Markov chain, with transition kernel in terms of the generator. In particular, the joint law of the lengths and locations converges, as n → ∞, to a Markov Chain conditional GEM distribution, denoted as the MCcGEM distribution with respect to the generator.

In Theorem 2.12, we will then be able to show that the empirical occupation measure converges to a random measure given in terms of the limit pair, in either ‘stick-breaking’ or ‘clumped’ form (1.4), (1.5). In particular, when G = θ(P_∞ − I), where P_∞ is a constant stochastic matrix with identical rows π, the associated limit sequences simplify, and the limit is identified in Subsection 2.2.2 as a Dirichlet process. Returning to one of our motivations, we comment that when the state space is finite these results represent a more constructive view of the limits (1.3) found in [11].

Organization of the paper. We develop notions, make remarks, and state the main results, Theorems 2.4, 2.7, 2.10, 2.12, 2.13, 2.17, 2.18, and 2.19, in this order, in Section 2. Proofs are then given in Section 3.

2. Statement of results

We now formalize notation and state our main results, and related remarks about them, in several subsections. Throughout, we will use the conventions that empty sums equal 0 and empty products equal 1; vectors are written in row form where indicated.

2.1. RAMs, GEMs and MCcGEM laws

A residual allocation model (RAM) is a way of defining a random probability measure on by iteratively assigning a random portion of the unassigned probability remaining to the next integer.

Definition 2.1 (Residual Allocation Model - RAM).

Let u = (u_1, u_2, …) be a collection of independent [0, 1]-valued random variables. Define P_1 = u_1 and

P_k = u_k ∏_{j<k} (1 − u_j) for k ≥ 2.   (2.1)

Then, if P = (P_1, P_2, …) is a.s. a probability measure on ℕ, that is, if Σ_k P_k = 1 a.s., we say P is a RAM. If u consists of iid fractions, and the associated P is a RAM, we say P is a self-similar RAM.

Consider now the following identity, verified in Lemma 3.1: For an arbitrary sequence of numbers u_1, u_2, … and n ≥ 1,

1 − Σ_{k=1}^n u_k ∏_{j<k} (1 − u_j) = ∏_{k=1}^n (1 − u_k).   (2.2)

Then, the sequence P in (2.1) satisfies 1 − Σ_{k=1}^n P_k = ∏_{k=1}^n (1 − u_k) for n ≥ 1 (cf. Proposition 3.2). Accordingly, we have the useful observation that P is a RAM exactly when ∏_{k=1}^n (1 − u_k) → 0 a.s.
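The identity (2.2) is easy to confirm numerically for arbitrary fractions; a quick sketch:

```python
import numpy as np

# Numerical check of the identity behind (2.2):
# 1 - sum_{k<=n} u_k * prod_{j<k} (1 - u_j) = prod_{k<=n} (1 - u_k).
rng = np.random.default_rng(0)
u = rng.uniform(0.0, 1.0, size=50)        # arbitrary fractions in [0, 1]

prefix = np.concatenate(([1.0], np.cumprod(1.0 - u)[:-1]))
p = u * prefix                            # the RAM weights P_k of (2.1)
lhs = 1.0 - p.sum()
rhs = np.prod(1.0 - u)
print(lhs, rhs)                           # agree up to rounding
```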

A specific, well-known example of a RAM is the Griffiths-Engen-McCloskey (GEM) sequence.

Definition 2.2 (GEM).

Fix θ > 0. Let u = (u_1, u_2, …) be a sequence of iid variables with common distribution Beta(1, θ). Then, the self-similar RAM P, constructed from u, is said to have the GEM(θ) distribution.

Also, consider a sequence θ = (θ_1, θ_2, …) of positive numbers, and let u be a sequence of independent random variables where u_k is Beta(1, θ_k) distributed for k ≥ 1. When the measure P, found in terms of u, is a RAM, we will say it is a disordered GEM sequence with parameters θ.

Now, in a RAM P, one can clump adjacent probabilities with respect to an increasing sequence n = (n_1, n_2, …), marking boundaries of clumps, to form a new probability measure on ℕ.

Definition 2.3 (Clumped measure).

Let n = (n_1, n_2, …) be an increasing sequence in ℕ ∪ {∞}, with n_0 = 0, and let P be a RAM. We clump P according to n to construct a new probability measure [P] on ℕ where, for j ≥ 1,

[P]_j = Σ_{k=n_{j−1}+1}^{n_j} P_k.

We remark, when n takes the value infinity at an entry in the sequence, necessarily [P] is a distribution supported on a finite set.
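The clumping in Definition 2.3 can be sketched directly for a finite truncation; the weight vector and boundary sequence below are illustrative:

```python
import numpy as np

def clump(p, n):
    """Clump a probability sequence p at deterministic boundaries
    n_1 < n_2 < ... (1-indexed): [P]_j = p_{n_{j-1}+1} + ... + p_{n_j}."""
    p = np.asarray(p, dtype=float)
    edges = [0] + list(n)
    return np.array([p[a:b].sum() for a, b in zip(edges[:-1], edges[1:])])

# A truncated RAM with constant fractions u_k = 1/2.
p = np.array([0.5, 0.25, 0.125, 0.0625, 0.0625])
print(clump(p, [1, 3, 5]))   # [0.5, 0.375, 0.125]
```

Since clumping only adds adjacent probabilities, the total mass is unchanged.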

An immediate question now is when the clumped measure is also a RAM. We will show that it is always a RAM as long as the clumping sequence is deterministic. However, the situation is more involved when a random sequence is used for the clumping.

Specifically, we will be interested in two types of random clumping sequences constructed from a Markov chain X on the discrete space S. The first sequence comes from considering clumps of repeated values in X; that is, it will keep track of the times when X switches values. The second sequence arises in considering the times when X returns to its initial value.

For example, if X = (x, x, y, y, y, x, …) is observed, with y ≠ x, we define τ_1 = 3 and τ_2 = 6. More formally, let τ_0 = 1 and, for j ≥ 1, set

τ_j = inf{ k > τ_{j−1} : X_k ≠ X_{τ_{j−1}} }.   (2.3)

In the case that X reaches an absorbing state, the chain is eventually constant and τ_j is eventually infinite. In the case that the initial state is transient, the chain returns to it finitely many times and the associated return-time sequence eventually takes the value infinity.

Define now Y = (Y_1, Y_2, …) by Y_j = X_{τ_{j−1}} for j ≥ 1. When X does not reach an absorbing state, we think of Y as the sequence of values taken by X without repetition. If, however, X meets an absorbing state z, Y will eventually be constant at value z.

In the following theorem, a reader may like to focus, on a first pass, on the case when the chain possesses no absorbing states and formulas simplify.

In what follows, we will say that a sequence is a ‘possible’ sequence for a Markov chain on S if each of its finite initial segments is observed by the chain with positive probability.

Theorem 2.4 (Clumped RAMs).

Let P be a RAM, and fix an increasing deterministic sequence in ℕ ∪ {∞} marking clump boundaries. Then,

  • The clumped measure is a RAM, with respect to fractions determined by those of P.

Let now X be a Markov chain, independent of P and with homogeneous transition kernel Q.

  • Then, the sequence Y of values taken by X without immediate repetition is a Markov chain with a homogeneous transition kernel determined by Q.

Consider possible sequences with respect to the two random clumping schemes, by switch times and by return times, described above.

  • Then, the correspondingly clumped measures, conditional on such possible sequences, are RAMs.

  • Also, if P is self-similar, the switch-time clumped measure is a RAM and, when the initial state is recurrent with respect to X, the return-time clumped measure is a self-similar RAM.

We remark that the specifications of the fractions and their distributions in the items above are given in the proof of Theorem 2.4. These specifications, in the case when P is a GEM sequence, are part of Theorem 2.7.

Also, in item (4) above, we note that the self-similarity of P is important to deduce in full generality that the clumped measure is a RAM. Later, in Example 2.9, we see that it may fail to be a RAM if P is not a self-similar RAM.

In addition, we observe that in item (4), when the initial state is transient, the sequence Y eventually takes a constant value, since the state is visited only finitely many times a.s. As the number of returns is a nontrivial variable, the associated fractions cannot be iid. However, one may consider an iid sequence, say on a different probability space, with the same marginal fraction distribution, and check that the self-similar RAM formed from these fractions has the same distribution as the clumped measure.

We now consider the clumping procedures with respect to a GEM distribution. It will be convenient to define the notion of a generator kernel or matrix; these terms are used interchangeably.

Definition 2.5 (Generator kernel).

Let G be a square matrix on the discrete space S. We say that G is a generator kernel if it satisfies G(x, y) ≥ 0 for x ≠ y and G(x, x) = −Σ_{y≠x} G(x, y). In addition, we will assume a boundedness condition, sup_x |G(x, x)| < ∞.

Every matrix of the form G = c(P − I), where c > 0 and P is a stochastic kernel on S, is a generator matrix. Moreover, we claim that every generator matrix can be (non-uniquely) decomposed in this fashion: the final condition in Definition 2.5 ensures that all entries of G are bounded, so that a normalizing constant c ≥ sup_x |G(x, x)| can be found, making P = I + G/c stochastic.
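This decomposition claim is easy to check mechanically; in the sketch below (function name ours), c is taken as the largest absolute diagonal entry, one of many valid normalizing choices:

```python
import numpy as np

def decompose_generator(G, c=None):
    """Write a bounded generator G as c * (P - I) with P stochastic.

    Any c >= max_x |G(x,x)| (with c > 0) works, so the decomposition is
    non-unique; P = I + G/c then has nonnegative entries and unit row sums.
    """
    G = np.asarray(G, dtype=float)
    if c is None:
        c = max(np.abs(np.diag(G)).max(), 1.0)
    P = np.eye(len(G)) + G / c
    return c, P

# Illustrative generator with rows summing to zero.
G = np.array([[-2.0, 2.0, 0.0],
              [ 1.0, -3.0, 2.0],
              [ 0.5, 0.5, -1.0]])
c, P = decompose_generator(G)
print(c)                       # 3.0
print(P @ np.ones(3))          # rows of P sum to 1
```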

We also observe that a generator matrix G has a zero row, that is, G(x, y) = 0 for all y for some x, exactly when x is an absorbing state for a corresponding P. In particular, when G does not have zero rows, any corresponding P does not have absorbing states.

We now formally define the notion of a Markov Chain conditional GEM (MCcGEM) joint distribution on the product space of fraction and location sequences, endowed with the product topology and product σ-field formed in terms of the Borel σ-fields on [0, 1] and S. This topology is discussed more in Subsection 3.4. By convention, we will say that a Beta(1, 0) random variable equals 1 a.s.

Definition 2.6 (MCcGEM distribution).

With respect to a generator matrix G, let Y = (Y_1, Y_2, …) be a homogeneous Markov chain with a given initial distribution and transition kernel on S given by

Q̂(x, y) = G(x, y)/(−G(x, x)) for y ≠ x, and Q̂(x, x) = 0, when G(x, x) ≠ 0; Q̂(x, x) = 1 when G(x, x) = 0.   (2.4)

Consider variables u = (u_1, u_2, …), on the same probability space as Y, such that u_j is Beta(1, −G(Y_j, Y_j)) distributed and, given Y, the u_j are independent. Define [P] where [P]_j = u_j ∏_{i<j} (1 − u_i) for j ≥ 1, and observe that, given Y, [P] is a disordered GEM with parameters (−G(Y_j, Y_j))_{j≥1} (see below).

We say that the pair ([P], Y) has MCcGEM distribution with respect to G.

To see that [P] is a disordered GEM, we need only observe that [P] is a probability distribution on ℕ. Here, Σ_j [P]_j = 1 a.s. exactly when Σ_j u_j diverges a.s. As the tail σ-field is trivial, the opposite event is the summability Σ_j u_j < ∞ a.s. By Kolmogorov's three-series theorem, and since u is composed of Beta random variables on [0, 1] with means 1/(1 − G(Y_j, Y_j)) and variances dominated by the means, almost sure summability holds exactly when Σ_j 1/(1 − G(Y_j, Y_j)) < ∞. For a generator matrix G, this is never the case, as the terms −G(Y_j, Y_j) are uniformly bounded above.

We now describe a relation between GEM distributions and MCcGEM laws through clumping with respect to a homogeneous Markov chain.

Theorem 2.7 (GEM to MCcGEM).

Let θ > 0 and let P be a GEM(θ) distribution. Let also X be an independent homogeneous Markov chain with kernel Q and a given initial distribution. Recall the associated switch times τ, the clumped distribution [P], and the Markov chain Y near (2.3).

Then, Y is a homogeneous Markov chain with kernel given by (2.4) for the generator θ(Q − I), and, given Y, [P] is a disordered GEM with parameters (θ(1 − Q(Y_j, Y_j)))_{j≥1}; that is, ([P], Y) has MCcGEM distribution with respect to θ(Q − I).

Some cases of interest are developed in the following examples.

Example 2.8.

Suppose P is GEM(θ) and that X is a homogeneous Markov chain with stochastic kernel Q, where Q has constant diagonal entries, Q(x, x) = p for all x and some p ∈ [0, 1). By Theorem 2.7, [P] is a disordered GEM sequence with parameters θ(1 − Q(Y_j, Y_j)) = θ(1 − p). However, since these parameters do not depend on Y, the clumped sequence [P] is actually a GEM(θ(1 − p)) sequence. In this case, the pair ([P], Y) consists of independent sequences.

More generally, suppose P is any random distribution on ℕ. Then, indeed, with respect to this Markov chain X, by the proof of Part (4) of Theorem 2.4 (cf. (3.8)), the fractions of [P] do not depend on Y, and so [P] and Y are independent.

Example 2.9.

We now consider a RAM P constructed from independent Beta fractions, a member of the well-known 2-parameter GEM family. Let X be a sequence of iid Bernoulli(1/2) variables. Thought of as a Markov chain on the 2-state space {0, 1}, every entry of the stochastic kernel of X equals 1/2. By the discussion in Example 2.8, as the diagonal entries are the constant 1/2, the clumped sequence [P] is independent of Y.

We now observe that [P] is not a RAM: if it were a RAM, consider the associated non-atomic fractions (cf. Part (1) of Theorem 2.4) and compute their joint moments.

One then finds that a product moment differs from the product of the corresponding individual moments. Hence, the non-atomic fractions are not independent, and [P] cannot be a RAM.

2.2. Clumping and time-inhomogeneous Markov chains

Of course, the notion of clumping can be applied to random probability measures on ℕ which are not RAMs. In particular, to capture the empirical occupation law limit of a Markov chain, we study its local occupations, or clumps of the sequence indexed in time, as it explores the space. As noted in the introduction, we will look at these local occupations in reverse order.

Let X be a Markov chain on the discrete space S, without absorbing states. Recall the definition of the switching times (cf. (2.3)), and let j(n) index the first switch after time n. For n ≥ 1 and j ≥ 1, define the length L_j^n of the j-th sojourn counted backwards from time n, and its location Y_j^n, as depicted in Subsection 1.4. Also, set

L̂_j^n = L_j^n/n,   (2.5)

the normalized backward clump lengths. Consider the sequences (L̂_1^n, L̂_2^n, …) and (Y_1^n, Y_2^n, …).

As a concrete example, for a fixed observed path, the local occupations up to a time n are summarized by eventually constant sequences of backward clump lengths and locations. For a more general depiction, please refer to the figure in Section 1.4.

Hence, for n ≥ 1, we have generally that the average Markov chain occupation of a state in the first n steps equals the sum of the normalized local occupations, or clumps, of that state seen in the chain through n steps. The notion suggested by this relation, part of the genesis of this article, is that we may study the limit average occupation law of X by investigating the limit of the pair describing local occupations.
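The relation between average occupation and local clumps can be confirmed on any fixed path: the time-n occupation of a state is exactly the sum of its clump lengths divided by n. A short sketch:

```python
import numpy as np

# Check, on a fixed path, that the time-n occupation of each state equals
# the sum of its local (clump) occupations, each clump contributing
# its length / n.
path = np.array([0, 0, 1, 1, 1, 2, 0, 0, 0, 1])
n = len(path)

occupation = np.array([(path == s).mean() for s in range(3)])

jumps = np.flatnonzero(path[1:] != path[:-1]) + 1
starts = np.concatenate(([0], jumps))
ends = np.concatenate((jumps, [n]))
clump_sum = np.zeros(3)
for a, b in zip(starts, ends):
    clump_sum[path[a]] += (b - a) / n   # one clump of state path[a]

print(occupation, clump_sum)            # identical vectors
```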

We now focus on a class of time-inhomogeneous Markov chains for which the limits of the local occupation pairs have a succinct representation. Specifically, we consider inhomogeneous Markov chains with transition kernels P_n = I + G/n, where G is a generator matrix with no zero entries on the diagonal. A finite space case, where G was taken to have no zero entries at all, was studied in [11]; see also [15], [5] for related developments.

In these chains, the clump lengths typically grow with n, unlike for homogeneous Markov chains. In particular, rather than an ergodic theorem, it was shown in [11] (cf. (1.3)) that the occupation laws converge weakly to a nontrivial distribution. Here, we consider a countable space generalization, allowing for reducibility and transient states, and formulate a characterization of these occupation limits through the reversed clumping device described above.

In the following statement, we say that a matrix is non-negative if all its entries are non-negative. Additionally, weak convergences here are in the sense of finite-dimensional distributions, the natural sense associated to the product space endowed with the product topology.

Theorem 2.10 (Time-inhomogeneous MC to MCcGEM).

Let G be a generator matrix on S without zero rows. Let also ν be a stochastic vector and π be a stationary distribution of an associated stochastic kernel so that, entry-wise,

(2.6)

holds. Define kernels

P_n = I + G/n, for all n large enough,   (2.7)

and let X be the inhomogeneous Markov chain with transition kernels (P_n) and initial distribution ν. Define the backward clump lengths and locations as above with respect to X, and also define a generator matrix Ĝ in terms of G and π by

(2.8)

Then, the backward location sequences converge weakly to the homogeneous Markov chain with kernel associated to Ĝ and initial distribution π. Also, for a possible sequence of locations, the normalized backward lengths converge weakly to a disordered GEM sequence with parameters given in terms of Ĝ. Therefore, the associated pairs converge weakly to a pair with MCcGEM distribution with respect to Ĝ.

Example 2.11.

In the context of Example 2.8, suppose the kernel has constant diagonal entries. Then, the local occupations of the inhomogeneous Markov chain would converge to a GEM distribution, not just conditionally in terms of a MCcGEM distribution.

We now characterize the limit occupation law of X in a ‘stick-breaking’ form with respect to either a MCcGEM distribution, or a paired GEM distribution and homogeneous Markov chain. In the following, weak convergence of the occupation measures is with respect to the topology on the space of probability measures induced by the discrete topology on the underlying space.

Theorem 2.12 (Occupation laws to MCcGEM and stick-breaking measures).

Consider the setting and assumptions of Theorem 2.10. Observe that the stationary distribution there is also stationary for the limit kernel, and let Y be the homogeneous and stationary Markov chain with that kernel and initial distribution. Let P be a GEM sequence independent of Y.

Then, the empirical occupation measures converge weakly to the random measure μ, where

μ = Σ_{k≥1} P_k δ_{Y_k}.   (2.9)

In a sense, reversing the procedure, starting from the stick-breaking process, we may identify it as the limit of the occupation measure of a matched time-inhomogeneous Markov chain, almost as a corollary of Theorem 2.12.

Theorem 2.13 (Stick-breaking measures to Occupation laws).

Let θ > 0 and let P be a GEM(θ) sequence. Let also Q be a stochastic matrix without absorbing states and with stationary distribution π. Suppose Y is an independent homogeneous Markov chain with kernel Q starting from π.

Then,

where the occupation law is defined with respect to an inhomogeneous Markov chain, as in the setting of Theorem 2.10, with respect to a generator matrix built from the GEM parameter and the kernel, starting from any distribution satisfying the entry-wise condition (2.6). Here, the relevant generator and stationary distribution are given explicitly in terms of the GEM parameter, the kernel, and its stationary distribution.

In the next two subsections, we discuss remarks on Theorems 2.10 and 2.12, and a case when the random measure is a Dirichlet process.

2.2.1. Remarks

We now make several comments on Theorems 2.10 and 2.12.

1. Although we have specified that G has no zero rows in Theorems 2.10 and 2.12, and therefore no absorbing states for the associated chains, one can extend some of the statements trivially to the case when there are absorbing states. In particular, suppose the limit in (2.6) is the unit point mass at an absorbing state. Then, that state is also an absorbing state for the inhomogeneous Markov chain, reached in finite time a.s. from the initial distribution, and the limit chain, starting from it, is the constant sequence at that state. In addition, the first normalized backward clump length tends to 1 a.s. We conclude that the normalized backward lengths converge weakly to a GEM with constant fractions, and the empirical distribution of the chain converges weakly to the point mass at the absorbing state.

2. There is a degree of freedom in picking a pair of a GEM parameter and a kernel. However, when specifying a MCcGEM distribution, each valid pair corresponds to the same generator matrix in this context. On the other hand, this family of pairs of a GEM distribution and Markov chain, indexed in the parameter, will have different joint distributions, although they all correspond to a single measure μ. We explore this notion in the case of Dirichlet processes in Subsection 2.2.2 below.

3. The convergence (2.6) is a condition on the structure of positive recurrent states of the homogeneous Markov chain run with the kernel associated to G. Since the limit π is a stationary distribution with respect to this kernel, the chain must have a positive recurrent state, and π is positive only on such states. The initial distribution must be such that observation of a positive recurrent state occurs with probability 1.

In general, π depends on the initial distribution when there is more than one irreducible class of positive recurrent states. We note, along with positive recurrent states, there may also be null recurrent and transient states associated with the kernel.

In the case that the kernel has a single class of positive recurrent states, then π will be the unique stationary distribution associated with it and will not depend on the initial distribution.

It could be that the kernel has an infinite number of null recurrent or transient states, in addition to positive recurrent states. But, the requirement that the limit in (2.6) be stochastic means that the chain cannot visit a null recurrent state or remain indefinitely on transient states a.s. This reflects that the limit of (2.6) corresponds to the long time average occupations of states.

4. Any null recurrent or transient state of the chain corresponds to a zero row of Ĝ, or in other words an absorbing state for the limit chains. However, such absorbing states are never visited in the limit: the initial distribution π is a stationary distribution which vanishes on these states. Moreover, as π is also stationary for the limit kernel, the limit chain can only move on the positive recurrent states.

Similarly, starting from , the chain moves only on states , given that when either or