# Ergodic properties of some Markov chains models in random environments

We study ergodic properties of some Markov chains models in random environments when the random Markov kernels that define the dynamic satisfy some usual drift and small set conditions but with random coefficients. In particular, we adapt a standard coupling scheme used for getting geometric ergodic properties for homogeneous Markov chains to the random environment case and we prove the existence of a process of randomly invariant probability measures for such chains, in the spirit of the approach of Kifer for chains satisfying some Doeblin type conditions. We then deduce ergodic properties of such chains when the environment is itself ergodic. Our results complement and sharpen existing ones by providing quite weak and easily checkable assumptions on the random Markov kernels. As a by-product, we obtain a framework for studying some time series models with strictly exogenous covariates. We illustrate our results with autoregressive time series with functional coefficients and some threshold autoregressive processes.


## 1 Introduction

Let $E$ and $F$ be two Polish spaces and, for any $x \in F$, consider a Markov kernel $P_x$ on $E$. We denote by $\mathcal{B}(E)$ the Borel sigma-field of $E$. Let also $X = (X_t)_{t \in \mathbb{Z}}$ be a stochastic process taking values in $F$. Our aim is to construct stationary Markov chains in a random environment, defined by

$$
\mathbb{P}\left(Y_t \in A \mid X, Y_{t-1}, Y_{t-2}, \ldots\right) = P_{X_{t-1}}(Y_{t-1}, A), \quad t \in \mathbb{Z}. \tag{1}
$$

The term Markov chain in random environments (MCRE) comes from the fact that the process $(Y_t)_{t \in \mathbb{Z}}$ defines a time-inhomogeneous Markov chain conditionally on an exogenous process $X$ called the random environment. Such a definition has already been used in many previous contributions on the topic; see in particular Cogburn [5], Orey [13] or Kifer [11]. However, these references are devoted to quite specific structures, with either discrete state spaces or Doeblin type conditions. Recently, Doukhan et al. [10] investigated the case of observation-driven models with unbounded state spaces. This specific class of models was studied without a small set assumption for the Markov kernels. For some of their examples, the assumptions are not always easy to check, or they require restrictive conditions on the noise density, whereas the existence of a small set could be obtained more easily. An interesting and recent contribution to MCRE that satisfy both drift and small set conditions can be found in Lovas and Rásonyi [12]. In particular, the authors proved a law of large numbers for such MCRE when the system is initialized at a given deterministic value. The present paper is mainly motivated by time series analysis and we study another type of problem, the existence of stationary and ergodic solutions for (1), which is one of the crucial and delicate points to check when studying stationary time series and their statistical inference. We also use different types of assumptions on the random Markov kernels, some of them being weaker.

Note that in (1), is automatically independent from conditionally on . In econometrics or time series analysis, such a conditional independence assumption is related to a standard notion called strict exogeneity. In this context, past information has an influence on the present value, but the latter does not affect the present or future values of the exogenous process $X$. See for instance Chamberlain [3] for a discussion of the various equivalences between exogeneity notions used in econometrics.

The study of ergodic properties of MCRE has numerous applications. Lovas and Rásonyi [12] provided examples in queuing theory, machine learning and linear autoregressive processes with random coefficients. More generally, one can study non-linear autoregressive processes of the form

$$
Y_t = f(X_{t-1}, Y_{t-1}, \varepsilon_t), \tag{2}
$$

with i.i.d. errors independent of the process . One can then extend some classical time series models usually studied using homogeneous Markov chains theory to more realistic models. We believe that such extensions are fundamental since in the applied statistical literature, the use of exogenous covariates for time series analysis is almost systematic. We will discuss applications of our results to some of these models.

Finally, we mention that the conditional distribution given in (1) only depends on $X_{t-1}$ and not on $X_t$. We choose such a formulation mainly for compatibility with time series analysis, where the present value of the process can be predicted using only past values of the observations. For a theoretical analysis of Markov chains in random environments, one can always replace the process by the shifted process to go back to the situation where the conditional distribution only depends on $X_t$ and not on $X_{t-1}$, so that both formulations are equivalent. Additionally, if the conditional distribution given in (1) has to depend on the whole past of the exogenous process, one can simply replace the single variable by the new random variable and the state space of the environment by the product space to go back to our initial formulation.

The paper is organized as follows. In Section 2, we give our main result when the drift and small set conditions are obtained from one iteration of the chain. An extension to more general chains is given in Section 3 whereas two examples of autoregressive processes satisfying our conditions are given in Section 4. The proofs of our results can be found in Section 5. Finally an Appendix section 6 contains the proof of an important lemma used in the previous section.

## 2 Assumptions and ergodicity result

For measurability issues, we impose the two following conditions. For any , the mapping is measurable. Obviously for any pair , has to be a probability measure on . In what follows, we denote by the set of measurable mappings such that , with for . The following assumptions will be used.

A1

The process is stationary.

A2

There exist a measurable mapping such that for all and two elements and of such that for all , . Moreover

$$
\limsup_{n} \prod_{i=1}^{n} \lambda(X_{-i})^{1/n} < 1 \quad \text{a.s.} \tag{3}
$$

A3

There exist a measurable mapping such that for any , one can find a probability kernel from to such that

$$
P_x(y, A) \geq \eta(R, x)\, \nu_R(x, A), \quad (x, y, A) \in F \times V^{-1}([0, R]) \times \mathcal{B}(E).
$$

#### Notes

1. Assumption A2 imposes a drift condition for the Markov kernels with some varying coefficients $\lambda$ and $b$. The required conditions for these coefficients are quite weak, e.g. existence of logarithmic moments. Moreover, under Assumption A1, condition (3) is automatically satisfied under ergodicity of the environment and the Lyapunov coefficient condition $\mathbb{E}[\log \lambda(X_0)] < 0$. Indeed, in this case, one can use Birkhoff's ergodic theorem to get

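$$
\prod_{i=1}^{n} \lambda(X_{-i})^{1/n} = \exp\left(\frac{1}{n} \sum_{i=1}^{n} \log \lambda(X_{-i})\right) \to \exp\left(\mathbb{E}\left[\log \lambda(X_0)\right]\right) < 1.
$$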

Such a condition is natural since for the simple case of a real-valued autoregressive process with random coefficients

$$
Y_t = a(X_{t-1})\, Y_{t-1} + \varepsilon_t \tag{4}
$$

with i.i.d ’s, the condition is known to be optimal for getting existence of a non-anticipative and stationary solution. See Bougerol and Picard [1], Theorem .

2. In the case of a stationary but not necessarily ergodic environment, one can still apply the ergodic theorem and the condition (3) simply writes as a.s. where is the set of shift-invariant measurable sets in such that , where denotes the shift operator defined by and denotes the sigma-algebra . It is difficult to find a more explicit condition in the latter case, except if we impose the more restrictive condition a.s. which is the classical condition used for studying stability of Markov chains with a deterministic environment. However, the random environment case offers more flexibility by allowing the varying coefficient to exceed one for some realizations of the exogenous process.

3. Condition (3) is generally weaker than the long-time contractivity condition used in Lovas and Rásonyi [12]. With our notations, the latter condition writes as

$$
\overline{\gamma} := \limsup_{n \to \infty} \mathbb{E}^{1/n}\left( b(X_0) \prod_{t=1}^{n} \lambda(X_t) \right) < 1 \tag{5}
$$

When , using stationarity, we deduce that there exist such that when is large enough. Using Markov’s inequality, we deduce that there exists such that and from the Borel-Cantelli lemma, we get a.s. which is equivalent to our condition. Note also that when the coordinates of are independent, (5) reduces to while our condition writes as which is weaker in general if we use Jensen’s inequality.

4. In the classical terminology of homogeneous Markov chains, Assumption A3 entails that any level set of the drift function is a small set for the kernels . This assumption is then more restrictive than the minorization condition of Lovas and Rásonyi [12], who only assumed the existence of a small set of the form where the value depends on the drift parameters and the constant defined in (5). However, inspection of the proof of Theorem 1 shows that we only require a large value of for the small set, see (16), but this value depends in a complicated way on the process . This is why we prefer to use A3. On the other hand, we do not impose any specific condition related to the behavior of the minorization function near , such as the assumption used by Lovas and Rásonyi [12]. As a consequence, for the simple autoregressive process (4), it is straightforward to get A3 when the noise term has a positive density lower-bounded on any compact set, while the condition of Lovas and Rásonyi [12] seems difficult to obtain when is unbounded and without additional restrictions on the noise density.
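The comparison made in Notes 1 to 3 can be checked numerically on a toy environment. The following is a minimal sketch, assuming an i.i.d. environment whose drift coefficient takes the values 0.5 and 1.6 with equal probability (these values and the function names are illustrative assumptions, not taken from the paper). In this toy case the mean of the coefficient exceeds one, while its logarithmic mean is negative, so the geometric mean appearing in (3) settles below one even though the coefficient exceeds one on half of the time points.

```python
import math
import random


def geometric_mean_of_drift(lams):
    """Return (prod_i lams[i]) ** (1 / n), computed through logarithms for stability."""
    n = len(lams)
    return math.exp(sum(math.log(lam) for lam in lams) / n)


def simulate_environment(n, values=(0.5, 1.6), seed=0):
    """Draw n i.i.d. drift coefficients uniformly from `values` (a toy assumption)."""
    rng = random.Random(seed)
    return [rng.choice(values) for _ in range(n)]


# Toy values of lambda(X_0), each taken with probability 1/2 (hypothetical).
values = (0.5, 1.6)
mean_lambda = sum(values) / len(values)                     # E[lambda(X_0)]
lyapunov = sum(math.log(v) for v in values) / len(values)   # E[log lambda(X_0)]

if __name__ == "__main__":
    lams = simulate_environment(200_000, values)
    print("E[lambda]        :", mean_lambda)
    print("exp(E[log lam])  :", math.exp(lyapunov))
    print("empirical product:", geometric_mean_of_drift(lams))
```

By Jensen's inequality the logarithmic mean never exceeds the logarithm of the mean, which is why a condition on the former is the weaker of the two in the independent case.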

We then get the following result.

###### Theorem 1.

Suppose that Assumptions A1-A3 hold true. There then exists a stationary process satisfying (1) and the distribution of such a process is unique. If in addition the process is ergodic, the process is also ergodic.

When the environment is ergodic, our result entails the following strong law of large numbers for the unique stationary solution. If with a measurable mapping such that , then a.s.

On the other hand, contrary to Lovas and Rásonyi [12], we do not provide a weak law of large numbers when , where , , denotes the iterations of chain (1) initialized with . Such a result would require more technical details and we did not investigate it.

One can also show that converges in total variation to as where is the marginal distribution of our stationary solution. We discuss this point just after the statement of Corollary 1, see (5.4). However, our assumptions do not help to get a rate of convergence for as it is done in Lovas and Rásonyi [12]. We believe that a precise rate could be more difficult to get only using (3).

## 3 Extension to more general chains

As in Lovas and Rásonyi [12], we now assume that the drift/small set condition is obtained after a given number of iterations of the Markov kernels. This kind of extension is natural when dealing with time-inhomogeneous Markov chains. In what follows, we recall that the product of two Markov kernels on is the Markov kernel defined by , . More precisely, we will assume the existence of a positive integer $p$ such that the following assumptions are satisfied.

A4

There exist measurable mappings and such that for ,

$$
\left[P_{x_1} \cdots P_{x_p}\right] V \leq \lambda(x_p, x_{p-1}, \ldots, x_1)\, V + b(x_p, x_{p-1}, \ldots, x_1).
$$

Moreover , and

$$
\limsup_{n \to \infty} \prod_{k=0}^{n-1} \lambda\left(X_{-1-kp}, \ldots, X_{-(k+1)p}\right)^{1/n} < 1 \quad \text{a.s.} \tag{6}
$$

A5

There exist a measurable mapping such that for any , one can find a probability kernel from to such that

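$$
\left[P_{x_1} \cdots P_{x_p}\right](y, A) \geq \eta(R, x_p, \ldots, x_1)\, \nu(x_p, \ldots, x_1, A), \quad (x, y, A) \in F^p \times V^{-1}([0, R]) \times \mathcal{B}(E).
$$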

We then get the following result which generalizes Theorem 1.

###### Theorem 2.

Suppose that Assumptions A1 and A4-A5 hold true. There then exists a stationary process satisfying (1) and the distribution of such a process is unique. If in addition the process is ergodic, the process is also ergodic.

#### Note.

Assume and set for , . Under the integrability conditions given in A4 and when the process is stationary and , Birkhoff's ergodic theorem implies that

$$
\lim_{n \to \infty} \prod_{k=0}^{n-1} \lambda(U_{-1-kp})^{1/n} = \exp\left(\mathbb{E}\left[\log \lambda(U_{-1}) \mid X^{-1}(\mathcal{I})\right]\right),
$$

with the set of measurable subsets of that are invariant for , i.e. . Now, if the process is only assumed to be ergodic, the conditional expectation appearing in the limit above is not necessarily an expectation. This differs from the case because a subsequence of an ergodic process is not necessarily ergodic. Let us mention that we obtain the limit when is mixing, i.e. for every pair of measurable subsets in , we have

$$
\lim_{n \to \infty} \mathbb{P}\left(X \in A, \theta^n X \in B\right) = \mathbb{P}(X \in A)\, \mathbb{P}(X \in B). \tag{7}
$$

Indeed, if we apply (7) for and with , we obtain that is necessarily or . The mixing property (7), which is stronger than ergodicity, is satisfied for instance when the process is mixing. Hence, when the process is mixing, (6) holds true as soon as . We refer the reader to Samorodnitsky [15], Chapter , for a concise introduction to the ergodicity and mixing concepts and a summary of the links between these notions.

## 4 Examples of autoregressive processes

In this section, we present two classical examples of autoregressive processes for which the stability properties are usually established using Markov chain techniques. We directly present a version with exogenous regressors and show that our results can be used to study these extensions. Let us mention that our examples are mainly illustrative, and we argue that our results make it possible to extend most of the non-linear autoregressive time series models usually studied with Markov chain techniques. We refer the reader to Douc et al. [8], Section , for additional examples. In these two examples, we consider two independent stochastic processes and taking values respectively in and . We further assume that the process is stationary and ergodic and that the s are i.i.d. The class of functions was defined at the beginning of Section 2.

### 4.1 Threshold autoregressive process

Let $a_1$, $a_2$, $b_1$, $b_2$ and $r$ be five measurable functions of the covariate. Set, for $t \in \mathbb{Z}$,

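$$
Y_t = \left(b_1(X_{t-1}) + a_1(X_{t-1})\, Y_{t-1}\right) \mathbb{1}_{\{Y_{t-1} \leq r(X_{t-1})\}} + \left(b_2(X_{t-1}) + a_2(X_{t-1})\, Y_{t-1}\right) \mathbb{1}_{\{Y_{t-1} > r(X_{t-1})\}} + \varepsilon_t. \tag{8}
$$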

Set .

###### Proposition 1.

Assume that , the distribution has a positive density lower-bounded on any compact subset of , for and that . There then exists a unique stationary and ergodic solution to equations (8).

#### Note.

When are deterministic, a process defined by (8) is well known and called a threshold autoregressive process. See Tong [17] or Tsay [18]. An extension with exogenous covariates has recently been investigated by Doukhan et al. [10], see their Section . However, we use here a much weaker assumption on the noise density and thus substantially improve their stationarity result for this model.
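To make recursion (8) concrete, here is a minimal simulation sketch. All regime functions, the Gaussian noise and the autoregressive covariate below are hypothetical choices made for illustration only; whether they meet the exact conditions of Proposition 1 is not checked here.

```python
import math
import random


# Hypothetical regime functions of the covariate x (not taken from the paper).
def a1(x):
    return 0.5


def a2(x):
    return -0.4 + 0.3 * math.tanh(x)


def b1(x):
    return 1.0


def b2(x):
    return -1.0


def r(x):
    return 0.2 * x


def tar_step(y_prev, x_prev, eps):
    """One step of recursion (8): the regime is selected by comparing y_prev with r(x_prev)."""
    if y_prev <= r(x_prev):
        mean = b1(x_prev) + a1(x_prev) * y_prev
    else:
        mean = b2(x_prev) + a2(x_prev) * y_prev
    return mean + eps


def simulate_tar(n, y0=0.0, seed=0):
    """Simulate n values of Y driven by an exogenous Gaussian AR(1) covariate (an assumption)."""
    rng = random.Random(seed)
    x_prev = rng.gauss(0.0, 1.0)
    y = y0
    path = []
    for _ in range(n):
        y = tar_step(y, x_prev, rng.gauss(0.0, 1.0))
        path.append(y)
        # The covariate evolves on its own and never looks at Y (strict exogeneity).
        x_prev = 0.7 * x_prev + rng.gauss(0.0, 1.0)
    return path


if __name__ == "__main__":
    print(simulate_tar(5))
```

The covariate update does not use the current value of the series, which mirrors the conditional independence structure behind (1).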

### 4.2 Functional coefficients autoregressive time series

We consider the following model, which is an extension of the functional coefficients autoregressive model of Chen and Tsay [4], see also Cai et al. [2]. Let be a positive integer.

$$
Y_t = \sum_{j=1}^{p} a_j(X_{t-1}, Y_{t-1}, \ldots, Y_{t-p})\, Y_{t-j} + \varepsilon_t, \quad t \in \mathbb{Z}. \tag{9}
$$

Let us first introduce some notations. For and , set which is assumed to be finite and

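$$
A(x) = \begin{pmatrix} b_1(x) \;\cdots\; b_{p-1}(x) & b_p(x) \\ I_{p-1} & 0_{p-1,1} \end{pmatrix},
$$

with $I_{p-1}$ being the identity matrix of size $p-1$ and $0_{p-1,1}$ the null column vector of size $p-1$. Note that $A(x)$ is the companion matrix associated to the polynomial . Finally let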

$$
\gamma(X) = \inf_{n \geq 1} \frac{1}{n}\, \mathbb{E}\left(\log \left\| A(X_{-1}) \cdots A(X_{-n}) \right\|\right),
$$

where denotes an arbitrary norm on the space of real matrices of dimension .

###### Proposition 2.

Assume that , the distribution has a positive density lower-bounded on any compact subset of and that . Assume furthermore that the process is mixing in the sense of (7). There then exists a unique stationary and ergodic solution to equation (9).

#### Note.

is called Lyapunov exponent of the sequence of random matrices . It is not straightforward to get a more explicit condition for the negativity of this coefficient. For a general sequence of random matrices , negativity of the Lyapunov exponent is a classical condition used for defining stationary solution of random affine transformations on , where is a stationary process. See [1]. A more explicit sufficient condition can be obtained if satisfies , since in this case the spectral radius of the companion matrix associated to is less than one. Note that is the classical condition used in the model without covariates. See Douc et al. [8], p. .
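Since the Lyapunov exponent rarely has a closed form, a numerical estimate is often the practical way to gauge its sign. The sketch below estimates the normalized log-norm of a product of companion matrices, renormalizing at each step to avoid underflow. The Frobenius norm, the helper names and the coefficient values in the usage example are assumptions made for illustration; a single finite product only approximates the quantity defined through an expectation and an infimum.

```python
import math
import random


def companion(b):
    """Companion matrix whose first row is b = (b_1, ..., b_p)."""
    p = len(b)
    rows = [list(b)]
    for i in range(p - 1):
        rows.append([1.0 if j == i else 0.0 for j in range(p)])
    return rows


def matmul(a, b):
    """Product of two square matrices stored as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def frobenius(m):
    """Frobenius norm (any norm gives the same limit)."""
    return math.sqrt(sum(v * v for row in m for v in row))


def lyapunov_estimate(matrices):
    """Return (1/n) * log || A_1 A_2 ... A_n ||, renormalizing after each product."""
    p = len(matrices[0])
    prod = [[1.0 if i == j else 0.0 for j in range(p)] for i in range(p)]
    log_norm = 0.0
    for a in matrices:
        prod = matmul(prod, a)
        s = frobenius(prod)
        if s == 0.0:
            return -math.inf  # the product vanished exactly
        log_norm += math.log(s)
        prod = [[v / s for v in row] for row in prod]
    return log_norm / len(matrices)


if __name__ == "__main__":
    rng = random.Random(1)
    # Hypothetical coefficients: b_1 switches with the environment, b_2 is fixed.
    mats = [companion([rng.choice([0.3, 0.5]), 0.2]) for _ in range(2000)]
    print(lyapunov_estimate(mats))
```

In the usage example the absolute values of the coefficients always sum to at most 0.7, so a negative estimate is expected, in line with the explicit sufficient condition mentioned in the note.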

## 5 Proof of the results

Our approach for proving Theorem 1 is inspired by that of Kifer [11]. Denoting , we prove the existence of some random probability measures such that for any ,

$$
\pi_{\xi_{t-1}} P_{X_t} := \int \pi_{\xi_{t-1}}(dy)\, P_{X_t}(y, \cdot) = \pi_{\xi_t} \quad \text{a.s.} \tag{10}
$$

To define such random probability measures, we impose that the mapping satisfies the definition of a probability kernel from to . In Stenflo [16], random probability measures satisfying (10) are called randomly invariant.

Following Kifer [11], natural candidates for are given by the almost sure limits of the backward iterations when where denotes the Dirac mass at point . For notational simplicity, set . We recall that the product of two Markov kernels and on is the Markov kernel on defined by for and . To prove existence of such a limit, a possible approach is to control the total variation distance

$$
\sup_{A \in \mathcal{B}(E)} \left| \delta_z Q^{\omega}_{0,n}(A) - \delta_{z'} Q^{\omega}_{0,n}(A) \right| \tag{11}
$$

for two initial state values and in and a fixed . Note that by stationarity of the process , it is only necessary to prove the existence of . Such a path-by-path control will be obtained from a coupling argument detailed below. In the rest of the section, we assume that Assumptions A1-A3 are satisfied.

### 5.1 Coupling strategy

For a given and a positive integer , we define a probability measure on in the following way. First, for , we denote by and the coordinate mappings, i.e.

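$$
Y_t\left((y_{-n+j}, \overline{y}_{-n+j})_{j \geq 0}\right) = y_t, \qquad \overline{Y}_t\left((y_{-n+j}, \overline{y}_{-n+j})_{j \geq 0}\right) = \overline{y}_t.
$$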

We then assume that and for , we define this probability measure as the distribution of a Markov chain defined as follows. For two real numbers and , we set . In what follows, we consider a positive real number $R$ that will be chosen later (see the formula (16) given below).

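• On the event , we set

$$
\overline{\mathbb{P}}_{n,\omega}\left(Y_t \in A, \overline{Y}_t \in \overline{A} \mid Y_{t-1}, \overline{Y}_{t-1}\right) = P_{X_{t-1}(\omega)}\left(Y_{t-1}, A \cap \overline{A}\right).
$$

• On the event , we set

$$
\overline{\mathbb{P}}_{n,\omega}\left(Y_t \in A, \overline{Y}_t \in \overline{A} \mid Y_{t-1}, \overline{Y}_{t-1}\right) = P_{X_{t-1}(\omega)}(Y_{t-1}, A)\, P_{X_{t-1}(\omega)}(\overline{Y}_{t-1}, \overline{A}).
$$

• Finally, on the event , we set

$$
\overline{\mathbb{P}}_{n,\omega}\left(Y_t \in A, \overline{Y}_t \in \overline{A} \mid Y_{t-1}, \overline{Y}_{t-1}\right) = \eta(R, X_{t-1}(\omega))\, \nu_R\left(X_{t-1}(\omega), A \cap \overline{A}\right) + \left(1 - \eta(R, X_{t-1}(\omega))\right) Q_{X_{t-1}(\omega)}(Y_{t-1}, A)\, Q_{X_{t-1}(\omega)}(\overline{Y}_{t-1}, \overline{A}),
$$

where for , and such that ,

$$
Q_x(y, A) = \frac{P_x(y, A) - \eta(R, x)\, \nu_R(x, A)}{1 - \eta(R, x)}.
$$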

This coupling scheme is classical for getting some bounds for geometric ergodicity of homogeneous Markov chains. See for instance Douc et al. [7] or Rosenthal [14] who applied such a technique to Markov chains satisfying both a drift and a small set condition. Let us mention that some bounds for controlling some quantities similar to (11) are also available for time-inhomogeneous Markov chains. However, one cannot use them in our context because the drift parameters considered in Douc et al. [7] are assumed to be less than one, which is not necessarily the case for the varying parameter we use in the present paper.

Let us give an interpretation of the proposed coupling scheme and for simplicity. First note that is the distribution of a non-homogeneous Markov chain such that, under this probability measure, the two coordinate processes and are both time-inhomogeneous Markov chains with successive transition kernels . In the first point, we see that when equals , the two next states also coincide and the common next state is simulated with the Markov kernel . When the two previous states are different, two situations can occur. As explained in the second point, when or is outside the small set, the two next states are simulated independently from each other with the same transition kernel . On the other hand, when both previous states are inside the small set, the two next states are equal with probability and the common next state is a realization of the dominating measure or, with probability , the two next states are simulated independently from each other and with the same Markov kernel .
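The three cases of the coupling can be sketched in code on a toy kernel for which the minorization is explicit. The sketch assumes that the kernel is the law of `a * y + U` with `U` uniform on `[-c, c]`, that the small set is `{|y| <= R}`, and that `0 <= a * R < c`. Under these assumptions (none of which come from the paper) the minorizing measure is uniform on `[a * R - c, c - a * R]`, the minorization constant is `(c - a * R) / c`, and the residual kernel is the original kernel restricted to the outside of that interval.

```python
import random


def residual_draw(y, a, c, lo, hi, rng):
    """Draw from the residual kernel: P_a(y, .) conditioned to fall outside [lo, hi]."""
    while True:
        z = a * y + rng.uniform(-c, c)
        if z < lo or z > hi:
            return z


def coupled_step(y, ybar, a, c, R, rng):
    """One transition of the coupled pair for the environment coefficient a."""
    if not 0.0 <= a * R < c:
        raise ValueError("this toy minorization needs 0 <= a * R < c")
    if y == ybar:
        # Case 1: the paths have already merged and move together.
        z = a * y + rng.uniform(-c, c)
        return z, z
    if abs(y) > R or abs(ybar) > R:
        # Case 2: at least one state is outside the small set, independent moves.
        return a * y + rng.uniform(-c, c), a * ybar + rng.uniform(-c, c)
    # Case 3: both states are in the small set.
    eta = (c - a * R) / c
    lo, hi = a * R - c, c - a * R
    if rng.random() < eta:
        z = rng.uniform(lo, hi)  # common draw from the minorizing measure
        return z, z
    return (residual_draw(y, a, c, lo, hi, rng),
            residual_draw(ybar, a, c, lo, hi, rng))


def coalescence_frequency(n_steps=200, n_runs=200, c=4.0, R=2.0, seed=0):
    """Fraction of runs in which two paths started at -5 and 5 have merged after n_steps."""
    rng = random.Random(seed)
    merged = 0
    for _ in range(n_runs):
        y, ybar = -5.0, 5.0
        for _ in range(n_steps):
            a = rng.choice((0.3, 1.2))  # toy i.i.d. environment, sometimes above one
            y, ybar = coupled_step(y, ybar, a, c, R, rng)
        merged += (y == ybar)
    return merged / n_runs


if __name__ == "__main__":
    print(coalescence_frequency())
```

Each coordinate, taken alone, still moves according to the original kernel, because mixing the minorizing measure and the residual kernel with weights `eta` and `1 - eta` recovers it. The environment coefficient is shared by both coordinates, as in the scheme above.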

Going back to our original goal, one can note that

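$$
\sup_{A \in \mathcal{B}(E)} \left| \delta_z Q^{\omega}_{0,n}(A) - \delta_{z'} Q^{\omega}_{0,n}(A) \right| \leq \overline{\mathbb{P}}_{n,\omega}\left(Y_0 \neq \overline{Y}_0\right)
$$

due to the dual expression of the total variation distance in terms of couplings, i.e. for two probability measures and on ,

$$
\sup_{A \in \mathcal{B}(E)} \left| \mu(A) - \mu'(A) \right| = \inf\left\{ \mathbb{P}(U \neq U') : U \sim \mu,\ U' \sim \mu' \right\}.
$$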
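Only the easy direction of this duality is needed for the bound above. As a reminder, for any pair $(U, U')$ defined on a common probability space with $U \sim \mu$ and $U' \sim \mu'$, and any measurable set $A$, the contributions of the event $\{U = U'\}$ cancel:

$$
\begin{aligned}
\mu(A) - \mu'(A) &= \mathbb{P}(U \in A) - \mathbb{P}(U' \in A) \\
&= \mathbb{P}(U \in A, U \neq U') - \mathbb{P}(U' \in A, U \neq U') \\
&\leq \mathbb{P}(U \neq U').
\end{aligned}
$$

Exchanging the roles of $U$ and $U'$ gives the same bound for $\mu'(A) - \mu(A)$, and taking the supremum over $A$ yields the inequality for any coupling.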

### 5.2 Proof strategy

In the time-homogeneous case, the previous coupling approach can be used in the following way. See in particular Rosenthal [14] for a more detailed discussion and specific results. We then first assume that the coefficients and are deterministic and we write instead of . Let , , the successive random times (starting here from time ) such that . We have for an arbitrary integer ,

 ¯¯¯¯Pn(Y0≠¯¯¯¯Y0)≤¯¯¯¯Pn(Tm≥n)+P(Tm

Since on the event , we have a probability greater than to get a coalescence of the path, we deduce that

$$
\overline{\mathbb{P}}_n\left(Y_0 \neq \overline{Y}_0\right) \leq \overline{\mathbb{P}}_n(T_m \geq n) + \eta^m.
$$

It then remains to bound the probability which can be obtained from the drift condition if is large enough. However, in the case of random environments, there are substantial difficulties due to the functions that take either very large or very small values for some time points, depending on the environment. This is why we use an approach comparable to that of Doukhan et al. [10]. In particular, we will consider some random time points, only depending on the environment and for which the function remains lower bounded. Moreover, these successive random time points are sufficiently spaced, so that the drift parameters of the corresponding subsampled chain remain under control. The effect of the coupling will be then analyzed only at these random time points. The goal of the next subsection is to introduce such random time points.

### 5.3 Control of the random environment

Our aim here is to define suitable random times only depending on the process . The following result will be central for this goal. We denote by the set of positive integers. In what follows, we denote by the mathematical expectation associated to .

###### Proposition 3.

There exist two random variables and an increasing sequence of random times , such that the following statements are valid.

1. , and for , , a.s.

2. If , let and such that . We then have

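$$
\overline{\mathbb{E}}_{n,\omega}\left[V\left(Y_{\tau_i(\omega)}\right) \mid Y_{\tau_i(\omega)-s}\right] \leq \left(1 - 1/C_1(\omega)\right) V\left(Y_{\tau_i(\omega)-s}\right) + C_1(\omega),
$$

$$
\overline{\mathbb{E}}_{n,\omega}\left[V\left(\overline{Y}_{\tau_i(\omega)}\right) \mid \overline{Y}_{\tau_{i-1}(\omega)-s}\right] \leq \left(1 - 1/C_1(\omega)\right) V\left(\overline{Y}_{\tau_{i-1}(\omega)-s}\right) + C_1(\omega).
$$

3. Setting , we have , a.s.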

4. and . Moreover if , then

$$
\lim_{n \to \infty} \frac{L_n}{n} > 0 \quad \mathbb{P}\text{-a.s.}
$$

To prove Proposition 3, we first state a lemma. Let be a pair of positive integers and with

$$
A_{2,C} = \left\{ x \in F^{\mathbb{N}} : \eta\left(2 C_1 (2 C_1 + 1), x_0\right) \geq 1/(C_2 + 1) \right\}
$$

and

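$$
A_{1,C_1} = \left\{ x \in F^{\mathbb{N}} : \sup_{j \geq C_1} \prod_{i=1}^{j} \lambda(x_i) \leq 1 - 1/C_1,\quad b(x_1) + \sum_{i \geq 2} \prod_{k=1}^{i-1} \lambda(x_k)\, b(x_i) \leq C_1 \right\}.
$$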

Clearly, is an element of the sigma-algebra generated by the cylinder sets on and from Birkhoff's ergodic theorem, we have

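$$
\lim_{n \to \infty} \frac{1}{n} \sum_{i=0}^{n} \mathbb{1}_{A_C}(\xi_i) = \lim_{n \to \infty} \frac{1}{n} \sum_{i=1}^{n} \mathbb{1}_{A_C}(\xi_{-i}) = \mathbb{P}\left(\xi_0 \in A_C \mid X^{-1}(\mathcal{I})\right) \quad \mathbb{P}\text{-a.s.}
$$

###### Lemma 1.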

The following assertions hold true.

1. .

2. Set . There exists a pair of positive, integer-valued random variables such that for almost all ,

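$$
\lim_{n \to \infty} \frac{1}{n} \sum_{i=0}^{n} \mathbb{1}_{A_{C(\omega)}}(\xi_i(\omega)) = \lim_{n \to \infty} \frac{1}{n} \sum_{i=1}^{n} \mathbb{1}_{A_{C(\omega)}}(\xi_{-i}(\omega)) = \rho_{C(\omega)}(\omega) > 0.
$$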

#### Proof of Lemma 1

1. Let us prove the first point. Let . Condition (3) guarantees the existence, for almost every $\omega$, of a positive integer $\tilde{C}$ such that

$$
\sup_{j \geq \tilde{C}} \lambda(X_{-1}(\omega)) \cdots \lambda(X_{-j}(\omega)) \leq 1 - 1/\tilde{C}. \tag{12}
$$

Let be the first positive integer such that (12) occurs. Next, we show that for almost , there exists a positive integer such that

$$
b(X_{-1}(\omega)) + \sum_{i \geq 2} \lambda(X_{-1}(\omega)) \cdots \lambda(X_{-i+1}(\omega))\, b(X_{-i}(\omega)) \leq C_4. \tag{13}
$$

To show (13), we use the Cauchy criterion. First, from the log-moment assumption on and the Borel-Cantelli lemma, we note that a.s. and using (3), we also get

$$
\limsup_{i \to \infty} \left[\lambda(X_{-1}) \cdots \lambda(X_{-i+1})\, b(X_{-i})\right]^{1/i} < 1 \quad \text{a.s.}
$$

which yields (13). By taking , we see that for almost , there exists a positive integer such that and the first point of the lemma follows.

2. Since the sequence of sets is increasing with respect to and a fixed and the function is positive, we have a.s.

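$$
\lim_{C_1 \to \infty} \lim_{C_2 \to \infty} \rho_C = \lim_{C_1 \to \infty} \mathbb{P}\left(\xi_0 \in A_{1,C_1} \mid X^{-1}(\mathcal{I})\right) = \mathbb{P}\left(\xi_0 \in \cup_{C_1 \geq 1} A_{1,C_1} \mid X^{-1}(\mathcal{I})\right) = 1,
$$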

where the last equality follows from the first point. Hence, for almost every , there exists a pair of positive integers such that . Indeed, if it were not the case, there would exist a subset of with positive probability and such that for any and any pair of positive integers , , which contradicts the limiting property given just above. One can always select in such a way that defines a random variable. To this end, one can simply take an ordering of and select the first pair of integers such that . Moreover, since the set of pairs of positive integers is countable, there exists an event with probability such that for any pair of positive integers and ,

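$$
\lim_{n \to \infty} \frac{1}{n} \sum_{i=0}^{n} \mathbb{1}_{A_C}(\xi_i(\omega)) = \lim_{n \to \infty} \frac{1}{n} \sum_{i=1}^{n} \mathbb{1}_{A_C}(\xi_{-i}(\omega)) = \rho_C(\omega).
$$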

Such a limit is then also valid for which proves the result.

We now proceed to the proof of Proposition 3.

#### Proof of Proposition 3

We define the successive random times and such that for almost , with being the random variable defined in the statement of Lemma 1. We next define the sequence of random times as follows. We set for any positive integer and for any non-negative integer . Note that we have a.s. for any integer . Setting

$$
M_n = \max\left\{ i \geq 1 : \tilde{\tau}_{-i} \geq -n \right\},
$$

we note that and from the second point of Lemma 1, we get

 liminfn→∞Mnn