# Higher-order ergodicity coefficients

Ergodicity coefficients for stochastic matrices provide valuable upper bounds for the magnitude of subdominant eigenvalues, allow one to bound the convergence rate of methods for computing the stationary distribution, and can be used to estimate the sensitivity of the stationary distribution to changes in the matrix. In this work we extend an important class of ergodicity coefficients, defined in terms of the 1-norm, to the setting of stochastic tensors. We show that the proposed higher-order ergodicity coefficients provide new explicit formulas that (a) guarantee the uniqueness of Perron Z-eigenvectors of stochastic tensors, (b) provide bounds on the sensitivity of such eigenvectors with respect to changes in the tensor, and (c) ensure the convergence of different types of higher-order power methods to the stationary distribution of higher-order and vertex-reinforced Markov chains.

07/10/2019


## 1 Introduction

A column-stochastic matrix $P\in\mathbb{R}^{n\times n}$ is entrywise nonnegative and such that $\sum_i P_{ij}=1$ for all $j$. This implies that the map $x\mapsto Px$ leaves the simplex of probability vectors invariant. A classical fixed point result of Brouwer [21] thus implies that there exists at least one stationary distribution for the Markov chain described by $P$. In other words, there exists at least one eigenvector $x$ such that $Px=x$ and $x$ has nonnegative entries that sum up to one. Brouwer's theorem holds in general for mappings leaving a closed convex set invariant. For the specific case of stochastic matrices, however, much more can be said. In particular, if the Markov chain described by $P$ is ergodic, then $P$ has a unique positive eigenvector $x$, which corresponds to the eigenvalue $1$; the magnitude of any other eigenvalue of $P$ is strictly smaller than one, and the power method $x_{t+1}=Px_t$ converges to $x$ for any starting point $x_0$ in the simplex, with a convergence rate depending on the largest subdominant eigenvalue. In a way, these properties characterize the concept of an ergodic chain, and the so-called ergodicity coefficients were introduced to estimate whether or not a Markov chain is ergodic without resorting to spectral properties [20, 35].

Hypermatrices, or tensors, with $m$ modes are a natural generalization of matrices. For example, a tensor with two modes is a matrix, whereas a tensor with three modes is a "cube" (or a "box" if the dimensions differ across modes). The multi-dimensional nature of tensors naturally gives rise to a variety of eigenvalue problems. In fact, the classical eigenvalue and singular value problems for a matrix can be generalized to the tensor setting following different constructions, which lead to different notions of eigenvalues and singular values for tensors, all of them reducing to the standard matrix case when the tensor has $m=2$ modes; see e.g. [17] and Chapter 2 of [31]. In this work we focus on tensors having three modes, all of the same dimension. Tensors of this kind are attracting growing interest due to their appearance in higher-order stochastic processes arising in the mathematical modeling of certain dynamics and in ranking schemes based on random walks on complex networks [3, 4, 18, 29]. By extending the wide and influential literature on ergodicity coefficients for matrices, we introduce a family of higher-order ergodicity coefficients for stochastic cubical tensors and discuss how these allow us to derive new conditions on the existence, uniqueness and computability of stationary distributions for different types of higher-order stochastic processes described by such tensors.

In purely linear algebraic terminology, our new conditions allow us to prove guarantees for the existence, uniqueness and computability of so-called $Z$-eigenvectors of stochastic tensors of order three. Eigenvectors of nonnegative tensors appear in many contexts dealing with high dimensional data; see e.g. [1, 4, 23, 24, 27, 31, 38] and references therein. While we focus here on $Z$-eigenvectors of stochastic tensors, we believe the results presented here can be further extended to more general eigenvector problems for nonnegative tensors and thus offer new insight into the developing Perron–Frobenius theory for tensors and multilinear maps.

The remainder of the paper is structured as follows. We fix the relevant notation in the next section. In Section 3 we review the concept of a higher-order Markov chain, its associated $Z$-eigenvector stationary distribution, and relevant existence and uniqueness results for that eigenvector. In Section 4 we recall the concept of ergodicity coefficient for a stochastic matrix and some of its properties. Then, in Section 5, we introduce our new higher-order ergodicity coefficients for stochastic cubical tensors and prove our main results. In Section 6 we show how these apply to higher-order Markov chains and dominant $Z$-eigenvectors, and how they compare with previous results. In particular, after recalling the concept of vertex-reinforced and spacey random walks, we introduce in Subsection 6.2 a general family of Markov processes with memory that includes the spacey random walk as a particular case, and we prove a new convergence result for this general stochastic process. Finally, in Section 7, we show how the proposed results can be used and how they compare with previous work in two example application settings: the computation of the multilinear PageRank and the convergence analysis of the shifted higher-order power method.

## 2 Notation

Throughout this paper we adopt the following notation. Let $e_i$ be the $i$-th canonical basis vector in $\mathbb{R}^n$ and let $\mathbf{1}$ be the all-ones vector. Define the sets $[n]=\{1,\dots,n\}$ and $\mathcal{S}=\{x\in\mathbb{R}^n:x\geq0,\ \mathbf{1}^Tx=1\}$, the simplex of probability vectors. A real cubical tensor of order three (or, equivalently, with three modes) is a three-way array with real entries of size $n\times n\times n$. We denote by $\mathbb{R}^{n\times n\times n}$ the set of such tensors and use capital bold letters to denote its elements. The $(i,j,k)$-th entry of $\boldsymbol{P}$ is denoted by $P_{ijk}$. Matrices are tensors with only two modes and are denoted with standard capital letters.

Given a tensor $\boldsymbol{P}$, several tensor-vector product operations can be defined. We write $\boldsymbol{P}xy$ to denote the tensor-vector multiplication over the second and third modes; namely, $\boldsymbol{P}xy$ denotes the vector entrywise defined by

 $(\boldsymbol{P}xy)_i=\sum_{j,k=1}^nP_{ijk}x_jy_k$

for $i=1,\dots,n$. Moreover, the product $\boldsymbol{P}x$ denotes the matrix associated to the linear map $y\mapsto\boldsymbol{P}yx$, that is,

 (1) $(\boldsymbol{P}x)_{ij}=\sum_{k=1}^nP_{ijk}x_k.$

A $Z$-eigenvalue of a tensor $\boldsymbol{P}$ is a real number $\lambda$ such that there exists a nonzero vector $x$ with $\boldsymbol{P}xx=\lambda x$. That vector is a $Z$-eigenvector associated to $\lambda$; see [31].

There are $3!=6$ possible transpositions of a tensor $\boldsymbol{P}$, each corresponding to a different permutation $\pi$ of the set $\{1,2,3\}$. Using the notation proposed in [34], the transposed tensor corresponding to the permutation $\pi$ can be denoted by $\boldsymbol{P}^{\langle\pi\rangle}$, namely,

 $(\boldsymbol{P}^{\langle\pi\rangle})_{i_1i_2i_3}=P_{i_{\pi(1)}i_{\pi(2)}i_{\pi(3)}}.$

As it will be of particular importance to us, we devote the special notation $\boldsymbol{P}^S$ to the tensor obtained by transposing the entries of $\boldsymbol{P}$ over the second and third modes, namely

 $\boldsymbol{P}^S=\boldsymbol{P}^{\langle[1\,3\,2]\rangle},\qquad(\boldsymbol{P}^S)_{ijk}=P_{ikj}.$

All inequalities in this work are meant entrywise. In particular, we write $\boldsymbol{P}\geq0$ (resp., $\boldsymbol{P}>0$) to denote a tensor such that $P_{ijk}\geq0$ (resp., $P_{ijk}>0$) for all indices. A tensor $\boldsymbol{P}$ is said to be column stochastic, or simply stochastic, if $\boldsymbol{P}\geq0$ and its first-mode entries all sum up to one, i.e.,

 $\sum_{i=1}^nP_{ijk}=1\qquad\forall\,j,k=1,\dots,n.$
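As a concrete illustration (ours, not part of the paper; it assumes NumPy), a random column-stochastic cubical tensor can be generated by normalizing each first-mode fiber:

```python
import numpy as np

# Build a random column-stochastic cubical tensor P of size n x n x n
# by normalizing each first-mode fiber P[:, j, k] to sum to one.
rng = np.random.default_rng(0)
n = 4
P = rng.random((n, n, n))
P /= P.sum(axis=0, keepdims=True)  # now sum_i P[i, j, k] = 1 for all j, k

assert P.min() >= 0
assert np.allclose(P.sum(axis=0), 1.0)
```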

A tensor acting as the identity on the unit sphere can be defined in the case of tensors with an even number of modes; see [24]. For tensors with three modes we define the following two left and right "one-sided identity" tensors:

 (2) $E^L_{ijk}=\delta_{ij}\quad\text{and}\quad E^R_{ijk}=\delta_{ik},\qquad\text{for all }i,j,k=1,\dots,n.$

Both $\boldsymbol{E}^L$ and $\boldsymbol{E}^R$ are stochastic tensors, and for all $v,x\in\mathcal{S}$ one has $\boldsymbol{E}^Lvx=v$ and $\boldsymbol{E}^Rxv=v$. Indeed,

 $(\boldsymbol{E}^Lvx)_i=\sum_{jk}E^L_{ijk}v_jx_k=\sum_{jk}\delta_{ij}v_jx_k=v_i\sum_kx_k=v_i$

and similarly for $\boldsymbol{E}^R$. Note that, letting $\boldsymbol{E}=\frac12(\boldsymbol{E}^L+\boldsymbol{E}^R)$, it holds $\boldsymbol{E}xx=x$ for all $x\in\mathcal{S}$.
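The identities $\boldsymbol{E}^Lvx=v$ and $\boldsymbol{E}^Rxv=v$ are easy to check numerically. The following sketch (our own illustration, assuming NumPy) builds both one-sided identities from (2):

```python
import numpy as np

n = 3
EL = np.einsum('ij,k->ijk', np.eye(n), np.ones(n))  # EL[i,j,k] = delta_ij
ER = np.einsum('ik,j->ijk', np.eye(n), np.ones(n))  # ER[i,j,k] = delta_ik

v = np.array([0.2, 0.5, 0.3])  # probability vectors (entries sum to one)
x = np.array([0.1, 0.1, 0.8])

ELvx = np.einsum('ijk,j,k->i', EL, v, x)  # (EL v x)_i = v_i * sum_k x_k
ERxv = np.einsum('ijk,j,k->i', ER, x, v)  # (ER x v)_i = v_i * sum_j x_j
assert np.allclose(ELvx, v)
assert np.allclose(ERxv, v)
```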

## 3 Higher-order Markov chains

Higher-order Markov chains are a natural extension of Markov chains, where the transitions depend on the past few states, rather than just the last one; for a plain introduction see e.g. [3, 39]. For example, a discrete-time, second-order Markov chain is defined by a third-order tensor $\boldsymbol{P}$, where $P_{ijk}$ is the conditional probability of transitioning to state $i$, given that the last state was $j$ and the second last state was $k$. More precisely, if $X(t)$ is the random variable describing the status of the chain on the state set $[n]$ at time $t$, then

 $P_{ijk}=\Pr\big(X(t+1)=i\mid X(t)=j,\,X(t-1)=k\big),$

where $\Pr$ denotes probability. Hence, the sequence obeys the rule

 (3) $\Pr(X(t+1)=i)=\sum_{j,k}P_{ijk}\Pr\big(X(t)=j,\,X(t-1)=k\big).$

Obviously it must hold $\sum_iP_{ijk}=1$ for all $j,k$, i.e., the tensor $\boldsymbol{P}$ is stochastic.

Let $x_t$ be the probability vector of the random variable $X(t)$, i.e., the vector with entries $(x_t)_i=\Pr(X(t)=i)$. Let $Y_t$ denote the joint probability function $(Y_t)_{jk}=\Pr(X(t)=j,\,X(t-1)=k)$. Then $x_t$ is the marginal probability of $Y_t$, i.e., the vector with entries $(x_t)_j=\sum_k(Y_t)_{jk}$. Hence, the dynamics of the second-order Markov chain (3) is described by the two-phase process

 (4) $\begin{cases}(Y_{t+1})_{ij}=\sum_kP_{ijk}(Y_t)_{jk}\\(x_{t+1})_i=\sum_j(Y_{t+1})_{ij}\end{cases}$

Note that both steps in (4) are linear, and thus their convergence can be analyzed using standard ergodicity arguments. In fact, the second-order Markov chain over the state set $[n]$ can easily be reduced to a first-order Markov chain with state set $[n]\times[n]$; see e.g. [3, 39]. Thus, under appropriate hypotheses on $\boldsymbol{P}$, the iteration (4) has a unique limit $Y$ such that

 (5) $Y_{ij}=\sum_{k=1}^nP_{ijk}Y_{jk}.$

However, this approach has a clear computational drawback: the size of the joint probability function, or that of the equivalent first-order Markov chain, is the square of that of the original chain. The situation gets even worse for an $m$-th order Markov chain due to the "curse of dimensionality" effect: the memory space required by the joint density grows exponentially with the order of the chain, requiring $n^m$ entries. Moreover, the convergence analysis of the iteration (4) and its natural extension to the $m$-th order setting becomes cumbersome.

In order to circumvent these issues, Raftery [32] proposed a technique to approximate higher-order Markov chains by means of a linear combination of first-order ones, by assuming that the joint probability distribution of the lagged random variables can be replaced by a mixture of its marginals. For example, in the second-order case, that assumption reduces to replacing the conditional probabilities $P_{ijk}$ by an expression of the form $\lambda A_{ij}+(1-\lambda)A_{ik}$, where $A$ is a stochastic matrix and $0\leq\lambda\leq1$. This technique, known as the Mixture Transition Distribution model, has been widely used to fit stochastic models with far fewer parameters than the fully parameterized model to multi-dimensional data in a variety of applications [5, 33].

A more recent and promising approach, which maintains all the information contained in the transition tensor $\boldsymbol{P}$, is the one proposed in [27]. Here, one assumes that the joint probability distribution of the higher-order Markov chain is the Kronecker product of its marginal distributions, that is, $(Y_t)_{jk}=(x_t)_j(x_{t-1})_k$. This hypothesis, which is equivalent to assuming that the random variables $X(t)$ and $X(t-1)$ are independent, is a conceptual simplification of the Markov chain formalism that is introduced in order to obtain a computationally tractable extension to the higher-order case. Using our tensor-vector product notation, this "reduced" higher-order Markov process boils down to the iteration

 (6) $x_{t+1}=\boldsymbol{P}x_tx_{t-1},$

which replaces (4) and is the higher-order counterpart of the usual power method for a stochastic matrix in the classical (first-order) Markov chain setting. The limit of this sequence, if it exists, is a nonnegative vector $x$ such that

 (7) $x=\boldsymbol{P}xx,$

that is, $x$ is a $Z$-eigenvector of $\boldsymbol{P}$ associated to the $Z$-eigenvalue $1$. Thus, it is natural to consider that vector as a stationary density of the Markov chain (3).

Note that the limit matrix $Y$ of (4) is such that $\mathbf{1}^TY=(Y\mathbf{1})^T$. Indeed, from (5) we have

 $(\mathbf{1}^TY)_j=\sum_iY_{ij}=\sum_kY_{jk}\sum_iP_{ijk}=\sum_kY_{jk}=(Y\mathbf{1})_j.$

But that row-column sum is generally different from the vector $x$ in (7). In fact, that solution corresponds to the case where $Y$ has rank one, $Y=xx^T$. This is not difficult to prove: if $Y=xx^T$ solves (5), then $x$ must solve (7). On the other hand, the converse implication is false in general; that is, if $x$ solves (7), then the matrix $Y=xx^T$ may fail to solve (5). Indeed, extensive numerical experiments reported in [39] show that the vector $x$ is strongly correlated with the row-column sum vector of $Y$, but the matrix $Y$ has full rank in general and $Y\neq xx^T$.

### 3.1 Uniqueness of the stationary distribution

The existence of a nonnegative solution of (7) is a direct consequence of Brouwer's fixed point theorem. However, unlike the matrix case, the irreducibility of $\boldsymbol{P}$ is not enough to ensure the uniqueness of $x$, and additional assumptions are required.

A quite general assumption is introduced in [8], where it is proved that uniqueness is ensured for any aperiodic tensor. The definition of an aperiodic tensor is given in terms of a special tensor-tensor product introduced therein, and can also be found in [10]. It is not difficult to see that any entrywise positive cubic stochastic tensor is aperiodic. However, if $\boldsymbol{P}$ has some zero entry, verifying whether $\boldsymbol{P}$ is aperiodic or not can be a challenging task. Other sufficient conditions can be given in terms of the entries of the tensor; see for example [11, 15, 17, 26, 27].

In the following we introduce a family of ergodicity coefficients for cubic stochastic tensors, and we then show, in Section 6, how they allow us to prove new conditions for the uniqueness of a positive solution to (7). The conditions we obtain in this way can be easily computed and are, to the best of our knowledge, among the weakest conditions available in the literature so far.

## 4 Coefficients of ergodicity

Let $d$ be a metric on $\mathcal{S}$ and consider a mapping $f:\mathcal{S}\to\mathcal{S}$. Although other notions of ergodicity coefficient are available in the literature (see e.g. [20]), for the purpose of this work a coefficient of ergodicity for $f$ is the best Lipschitz constant of $f$ with respect to $d$, that is,

 (8) $\tau_d(f)=\max_{x,y\in\mathcal{S},\,x\neq y}\frac{d(f(x),f(y))}{d(x,y)}.$

Different choices of the metric $d$ give rise to different notions of ergodicity coefficients. For example, if $d$ is the Hilbert projective distance

 (9) $d_H(x,y)=\log\Big(\max_i\frac{x_i}{y_i}\,\max_i\frac{y_i}{x_i}\Big),$

then (8) is the so-called Birkhoff contraction ratio [6], which we denote by $\tau_H$. This choice of metric is particularly interesting because it extends very naturally to the case of a mapping that leaves a generic proper cone invariant. Moreover, when $f$ is a linear map described by the matrix $A$, the Birkhoff–Hopf theorem [13] provides an explicit formula for $\tau_H(A)$, which we recall below:

 $\tau_H(A)=\tanh\Big(\frac14\log\Big(\max_{i,j,h,k}\frac{A_{ij}A_{hk}}{A_{ik}A_{hj}}\Big)\Big),$

where $\tanh$ denotes the hyperbolic tangent. More recently, in [15], an analogous explicit formula has been proved for the case where $f$ is a (weakly) multilinear mapping induced by a nonnegative tensor. In particular, this formula holds for the case of $Z$-eigenvectors of cubic stochastic tensors, and we will review it in this setting in Section 6.1.

Another popular and successful choice for the distance is $d(x,y)=\|x-y\|_p$, where $\|\cdot\|_p$ is the $\ell^p$-norm on $\mathbb{R}^n$. Norm-based coefficients were introduced by Dobrushin in 1956 [12] for the case of linear mappings and have been the subject of numerous investigations afterwards; see e.g. [20, 35].

In Section 5 we analyze properties of norm-based coefficients for mappings defined by a stochastic tensor. To this end, we first review some relevant properties of these coefficients in the case of linear maps.

### 4.1 Norm-based ergodicity coefficients for matrices

Let $P$ be a stochastic matrix and $p\geq1$. The $p$-norm ergodicity coefficient of $P$ is

 $\tau_p(P)=\sup_{x,y\in\mathcal{S},\,x\neq y}\frac{\|Px-Py\|_p}{\|x-y\|_p}.$

This definition extends obviously to any matrix, when appropriate. The linearity of $P$, the continuity of the norm, and the fact that the set $\{(x-y)/\|x-y\|_p:x,y\in\mathcal{S},\,x\neq y\}$ coincides with $\{z:\|z\|_p=1,\ \mathbf{1}^Tz=0\}$, which is compact, yield the equivalent formula

 $\tau_p(P)=\max_{\|x\|_p=1,\ x^T\mathbf{1}=0}\|Px\|_p.$

We review below relevant formulas and properties of $\tau_p$, and refer to [20, 35, 37] for proof details, further properties and discussion.

The following properties are direct consequences of the preceding definitions: if $P,Q$ are stochastic then

1. $0\leq\tau_p(P)\leq1$, and $\tau_p(P)=0$ if and only if $P=v\mathbf{1}^T$ for some vector $v$;

2. $\tau_p(PQ)\leq\tau_p(P)\,\tau_p(Q)$.

Ergodicity coefficients also play an important role in deriving perturbation bounds for the stationary probability vector of a Markov chain, as in the following result from Seneta [36]; see also [20, Thm. 3.14].

###### Theorem 4.1

Let $P,P'$ be two stochastic irreducible matrices, and let $x,x'$ be their corresponding stationary probability vectors. Then

 $\|x-x'\|_p\leq\frac{\|P-P'\|_p}{1-\tau_p(P)}.$

If $P$ is stochastic then, as $\mathbf{1}^TP=\mathbf{1}^T$, for any eigenvector $u$ of $P$ corresponding to an eigenvalue $\lambda\neq1$ we have $\mathbf{1}^Tu=\mathbf{1}^TPu=\lambda\mathbf{1}^Tu$, which implies $\mathbf{1}^Tu=0$. Therefore,

 $|\lambda|=\frac{\|\lambda u\|_p}{\|u\|_p}=\frac{\|Pu\|_p}{\|u\|_p}\leq\max_{\|x\|_p=1,\ x^T\mathbf{1}=0}\|Px\|_p=\tau_p(P),$

that is, $\tau_p(P)$ is an upper bound for the magnitude of any eigenvalue of $P$ different from 1. This observation implies the following well-known result.

###### Theorem 4.2

If $P$ is a stochastic matrix with $\tau_p(P)<1$ for some $p\geq1$, then $P$ is ergodic, i.e., there exists a unique eigenvector $x\in\mathcal{S}$ such that $Px=x$. Moreover, the power method $x_{t+1}=Px_t$ converges to $x$ for any $x_0\in\mathcal{S}$, and

 $\|x_t-x\|_p\leq\tau_p(P)^t\,\|x_0-x\|_p.$

The theorem above gives a sufficient condition for the ergodicity of $P$, which is very useful in practice when combined with a number of explicit formulas that allow one to compute $\tau_p(P)$ using only the entries of $P$ in the particular case $p=1$.

###### Theorem 4.3

Let $P\in\mathbb{R}^{n\times n}$. Then

 $\tau_1(P)=\frac12\max_{j,k}\sum_i|P_{ij}-P_{ik}|.$

Moreover, if $P$ is stochastic then

 $\tau_1(P)=1-\min_{j,k}\sum_i\min\{P_{ij},P_{ik}\}=1-\min_{I\subset\{1,\dots,n\}}\Big(\min_j\sum_{i\notin I}P_{ij}+\min_k\sum_{i\in I}P_{ik}\Big).$
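As a quick numerical sanity check (our own sketch, assuming NumPy), the two expressions of Theorem 4.3 can be compared on a random stochastic matrix:

```python
import numpy as np

def tau1(P):
    """tau_1(P) = (1/2) max_{j,k} sum_i |P_ij - P_ik| (Theorem 4.3)."""
    n = P.shape[1]
    return 0.5 * max(np.abs(P[:, j] - P[:, k]).sum()
                     for j in range(n) for k in range(n))

rng = np.random.default_rng(1)
P = rng.random((5, 5))
P /= P.sum(axis=0)  # column stochastic

# Equivalent formula for stochastic P: 1 - min_{j,k} sum_i min{P_ij, P_ik}
n = P.shape[1]
alt = 1.0 - min(np.minimum(P[:, j], P[:, k]).sum()
                for j in range(n) for k in range(n))
assert np.isclose(tau1(P), alt)
assert 0.0 <= tau1(P) <= 1.0
```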

We will devote Sections 5 and 6 to extending the ergodicity coefficient $\tau_1$ to three-mode tensors, to proving theorems analogous to the preceding ones, and to discussing further properties and applications.

### 4.2 Auxiliary results

Before proceeding further we recall here some useful preliminary results. Recall the notation $[n]=\{1,\dots,n\}$. In what follows we set $Z_1=\{y\in\mathbb{R}^n:\mathbf{1}^Ty=0,\ \|y\|_1=1\}$.

###### Lemma 1

For all $y\in\mathbb{R}^n$ with $\mathbf{1}^Ty=0$ it holds

 $\frac12\sum_i|y_i|=\max_{I\subset[n]}\sum_{i\in I}y_i.$
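A brute-force check of Lemma 1 on a small example (ours, assuming NumPy) enumerates all index subsets:

```python
import numpy as np
from itertools import chain, combinations

# Lemma 1: for a zero-sum vector y, (1/2)*sum_i |y_i| equals the
# maximum of sum_{i in I} y_i over all subsets I of the index set.
y = np.array([0.4, -0.1, -0.5, 0.2])
assert abs(y.sum()) < 1e-12  # zero-sum

idx = range(len(y))
subsets = chain.from_iterable(combinations(idx, r) for r in range(len(y) + 1))
best = max(sum(y[i] for i in I) for I in subsets)
assert np.isclose(0.5 * np.abs(y).sum(), best)  # both equal 0.6
```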

###### Lemma 2

Let $y$ be a zero-sum vector having $m$ nonzero entries. Then there exists a decomposition $y=\sum_k\alpha_kv_k$, where $\alpha_k\geq0$, $\sum_k\alpha_k=\|y\|_1$, and for each $k$ there exist integers $p(k),q(k)\in[n]$ such that

 $v_k=\frac12\big(e_{p(k)}-e_{q(k)}\big).$

The proof of the previous lemma can be found in [35, Lemma 2.4] and is omitted for brevity. The following lemma, instead, is borrowed from [20, p. 166].

###### Lemma 3

For any $s,t\in\mathbb{R}$,

 $\min\{s,t\}=\frac12\big((s+t)-|s-t|\big).$

Consequently,

 $\frac12|s-t|=\frac12(s+t)-\min\{s,t\}.$

## 5 Ergodicity coefficients for third order tensors

Let $\boldsymbol{P}$ be a cubic stochastic tensor. We define the following higher-order ergodicity coefficients:

 $T_L(\boldsymbol{P})=\max_{x\in\mathcal{S}}\max_{y\in Z_1}\|\boldsymbol{P}xy\|_1,\qquad T_R(\boldsymbol{P})=\max_{x\in\mathcal{S}}\max_{y\in Z_1}\|\boldsymbol{P}yx\|_1,\qquad T(\boldsymbol{P})=\max_{x\in\mathcal{S}}\max_{y\in Z_1}\|\boldsymbol{P}xy+\boldsymbol{P}yx\|_1.$

The preceding definitions extend obviously to any tensor, when appropriate. We remark the following immediate identities:

 $T_L(\boldsymbol{P})=T_R(\boldsymbol{P}^S),\qquad T(\boldsymbol{P})=2\,T_L\big(\tfrac12\boldsymbol{P}+\tfrac12\boldsymbol{P}^S\big)=2\,T_R\big(\tfrac12\boldsymbol{P}+\tfrac12\boldsymbol{P}^S\big).$

The relationship between the preceding definitions and the norm-based ergodicity coefficients considered in §4.1 can be revealed by considering the matrices $\boldsymbol{P}x$ and $\boldsymbol{P}^Sx$ defined as in (1); since $(\boldsymbol{P}^Sx)y=\boldsymbol{P}xy$ and $(\boldsymbol{P}x)y=\boldsymbol{P}yx$, we have

 $T_L(\boldsymbol{P})=\max_{x\in\mathcal{S}}\tau_1(\boldsymbol{P}^Sx),\qquad T_R(\boldsymbol{P})=\max_{x\in\mathcal{S}}\tau_1(\boldsymbol{P}x),\qquad T(\boldsymbol{P})=\max_{x\in\mathcal{S}}\tau_1(\boldsymbol{P}x+\boldsymbol{P}^Sx).$

The forthcoming results provide explicit formulas for computing the coefficients above from the knowledge of the tensor entries.

###### Theorem 5.1

Let $\boldsymbol{P}\in\mathbb{R}^{n\times n\times n}$. Then,

 (10) $T_L(\boldsymbol{P})=\frac12\max_{j,k_1,k_2}\sum_i|P_{ijk_1}-P_{ijk_2}|.$

Moreover, if $\boldsymbol{P}$ is stochastic then

 (11) $T_L(\boldsymbol{P})=1-\min_{j,k_1,k_2}\sum_i\min\{P_{ijk_1},P_{ijk_2}\}$

 (12) $\phantom{T_L(\boldsymbol{P})}=1-\min_{I\subset[n]}\min_j\Big(\min_{k_1}\sum_{i\notin I}P_{ijk_1}+\min_{k_2}\sum_{i\in I}P_{ijk_2}\Big).$

###### Proof

For any $y\in Z_1$ let $y=\sum_k\alpha_kv_k$ be a decomposition given by Lemma 2. From $x=\sum_jx_je_j$ we have

 $\boldsymbol{P}xy=\frac12\sum_k\alpha_k\boldsymbol{P}x\big(e_{p(k)}-e_{q(k)}\big)=\frac12\sum_{j,k}x_j\alpha_k\,\boldsymbol{P}e_j\big(e_{p(k)}-e_{q(k)}\big).$

By the triangle inequality,

 $\|\boldsymbol{P}xy\|_1\leq\frac12\sum_{j,k}|x_j|\,|\alpha_k|\,\big\|\boldsymbol{P}e_j\big(e_{p(k)}-e_{q(k)}\big)\big\|_1\leq\frac12\|x\|_1\|y\|_1\max_{j,k_1,k_2}\|\boldsymbol{P}e_j(e_{k_1}-e_{k_2})\|_1.$

Maximizing over $x\in\mathcal{S}$ and $y\in Z_1$, we conclude that

 $T_L(\boldsymbol{P})\leq\frac12\max_{j,k_1,k_2}\|\boldsymbol{P}e_j(e_{k_1}-e_{k_2})\|_1=\frac12\max_{j,k_1,k_2}\sum_i|P_{ijk_1}-P_{ijk_2}|.$

Since $e_j\in\mathcal{S}$ for all $j$ and $\frac12(e_{k_1}-e_{k_2})\in Z_1$ for all $k_1\neq k_2$, the reverse inequality holds. Hence we have (10). Moreover, if $\boldsymbol{P}$ is stochastic then from Lemma 3 we obtain

 $T_L(\boldsymbol{P})=\frac12\max_{j,k_1,k_2}\sum_i\big(P_{ijk_1}+P_{ijk_2}-2\min\{P_{ijk_1},P_{ijk_2}\}\big)=1-\min_{j,k_1,k_2}\sum_i\min\{P_{ijk_1},P_{ijk_2}\}$

and we have (11). Finally, since $\sum_i(P_{ijk_1}-P_{ijk_2})=0$, from Lemma 1 and (10) we get

 $T_L(\boldsymbol{P})=\max_{j,k_1,k_2}\max_{I\subset[n]}\sum_{i\in I}\big(P_{ijk_1}-P_{ijk_2}\big)=\max_I\max_{j,k_1,k_2}\Big(1-\sum_{i\notin I}P_{ijk_1}-\sum_{i\in I}P_{ijk_2}\Big)=1-\min_I\min_{j,k_1,k_2}\Big(\sum_{i\notin I}P_{ijk_1}+\sum_{i\in I}P_{ijk_2}\Big),$

so we have (12) and the proof is complete.
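Formula (10) translates directly into code. The sketch below (our illustration, assuming NumPy) also checks the degenerate case where the entries do not depend on the third index, for which $T_L$ vanishes (cf. Remark 1):

```python
import numpy as np

def TL(P):
    """T_L(P) = (1/2) max_{j,k1,k2} sum_i |P[i,j,k1] - P[i,j,k2]|, eq. (10)."""
    n = P.shape[0]
    return 0.5 * max(np.abs(P[:, j, k1] - P[:, j, k2]).sum()
                     for j in range(n) for k1 in range(n) for k2 in range(n))

rng = np.random.default_rng(2)
P = rng.random((4, 4, 4))
P /= P.sum(axis=0)        # column stochastic
assert 0.0 <= TL(P) <= 1.0

# TL vanishes when P[i,j,k] does not depend on k.
A = rng.random((4, 4))
A /= A.sum(axis=0)
Q = np.repeat(A[:, :, None], 4, axis=2)  # Q[i,j,k] = A[i,j] for every k
assert np.isclose(TL(Q), 0.0)
```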

The analogous formulas for the other higher-order coefficients are derived hereafter.

###### Corollary 1

Let $\boldsymbol{P}\in\mathbb{R}^{n\times n\times n}$. The following identities hold:

 (13) $T_R(\boldsymbol{P})=\frac12\max_{j_1,j_2,k}\sum_{i=1}^n|P_{ij_1k}-P_{ij_2k}|$

 (14) $T(\boldsymbol{P})=\frac12\max_{j,k_1,k_2}\sum_i|P_{ijk_1}-P_{ijk_2}+P_{ik_1j}-P_{ik_2j}|.$

Moreover, if $\boldsymbol{P}$ is stochastic then

 (15) $T_R(\boldsymbol{P})=1-\min_{j_1,j_2,k}\sum_{i=1}^n\min\{P_{ij_1k},P_{ij_2k}\}$

 (16) $\phantom{T_R(\boldsymbol{P})}=1-\min_{I\subset[n]}\min_{j_1,j_2,k}\Big(\sum_{i\notin I}P_{ij_1k}+\sum_{i\in I}P_{ij_2k}\Big)$

 (17) $T(\boldsymbol{P})=2-\min_{j,k_1,k_2}\sum_i\min\{P_{ijk_1}+P_{ik_1j},\,P_{ijk_2}+P_{ik_2j}\}$

 (18) $\phantom{T(\boldsymbol{P})}=2-\min_{I\subset[n]}\min_j\Big(\min_{k_1}\sum_{i\in I}\big(P_{ijk_1}+P_{ik_1j}\big)+\min_{k_2}\sum_{i\notin I}\big(P_{ijk_2}+P_{ik_2j}\big)\Big).$

###### Proof

Equations (13), (15) and (16) derive from the identity $T_R(\boldsymbol{P})=T_L(\boldsymbol{P}^S)$ and equations (10), (11) and (12), respectively. Now, define $\boldsymbol{Q}=\frac12(\boldsymbol{P}+\boldsymbol{P}^S)$. Note that if $\boldsymbol{P}$ is stochastic then $\boldsymbol{Q}$ is also stochastic. Since

 $\|\boldsymbol{P}xy+\boldsymbol{P}yx\|_1=\|\boldsymbol{P}xy+\boldsymbol{P}^Sxy\|_1=2\big\|\tfrac12(\boldsymbol{P}+\boldsymbol{P}^S)xy\big\|_1=2\|\boldsymbol{Q}xy\|_1,$

we have $T(\boldsymbol{P})=2T_L(\boldsymbol{Q})$. Hence, equations (14), (17) and (18) derive from this identity and equations (10), (11) and (12), respectively.

By (11), (15) and (17), it is immediate to observe that for a stochastic tensor $\boldsymbol{P}$ it holds $0\leq T_L(\boldsymbol{P}),T_R(\boldsymbol{P})\leq1$ and

 (19) $0\leq T(\boldsymbol{P})\leq T_L(\boldsymbol{P})+T_R(\boldsymbol{P})\leq2.$
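Using the explicit formulas (11), (15) and (17), the chain of inequalities (19) can be verified numerically on random stochastic tensors (our own sketch, assuming NumPy):

```python
import numpy as np

def TL(P):
    n = P.shape[0]
    return 1 - min(np.minimum(P[:, j, k1], P[:, j, k2]).sum()
                   for j in range(n) for k1 in range(n) for k2 in range(n))

def TR(P):
    n = P.shape[0]
    return 1 - min(np.minimum(P[:, j1, k], P[:, j2, k]).sum()
                   for j1 in range(n) for j2 in range(n) for k in range(n))

def T(P):
    n = P.shape[0]
    Q2 = P + np.swapaxes(P, 1, 2)  # Q2[i,j,k] = P[i,j,k] + P[i,k,j]
    return 2 - min(np.minimum(Q2[:, j, k1], Q2[:, j, k2]).sum()
                   for j in range(n) for k1 in range(n) for k2 in range(n))

rng = np.random.default_rng(5)
for _ in range(10):
    P = rng.random((3, 3, 3))
    P /= P.sum(axis=0)  # column stochastic
    assert 0 <= T(P) <= TL(P) + TR(P) + 1e-12 <= 2 + 1e-12
```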

Stronger inequalities can be easily obtained for positive tensors, as shown in the next result.

###### Corollary 2

Let $\boldsymbol{P}$ be a stochastic tensor. If there exists a positive number $\alpha$ such that $P_{ijk}\geq\alpha$ for all $i,j,k$ then

 $T_L(\boldsymbol{P})\leq1-n\alpha,\qquad T_R(\boldsymbol{P})\leq1-n\alpha,\qquad T(\boldsymbol{P})\leq2(1-n\alpha).$

###### Proof

The three inequalities in the claim follow immediately from equations (11), (15) and (17), respectively.

###### Remark 1

A close look at Theorem 5.1 reveals that, for any tensor $\boldsymbol{P}$, we have $T_L(\boldsymbol{P})=0$ if and only if $P_{ijk}=A_{ij}$ for some matrix $A$. In particular, $\boldsymbol{P}$ is stochastic if and only if $A$ is stochastic. Analogously, from Corollary 1 we derive that $T_R(\boldsymbol{P})=0$ if and only if $P_{ijk}=B_{ik}$ for some matrix $B$. Consequently, $T_L(\boldsymbol{P})+T_R(\boldsymbol{P})=0$ if and only if $P_{ijk}=v_i$ for some vector $v$. It is not difficult to prove that the latter is also equivalent to $\operatorname{rank}(\boldsymbol{P})=1$. Hence, if $\boldsymbol{P}$ is nonzero then

 $T_L(\boldsymbol{P})+T_R(\boldsymbol{P})=0\iff T(\boldsymbol{P})=0\iff\operatorname{rank}(\boldsymbol{P})=1.$

### 5.1 Bounding the variation of higher-order coefficients

When working with stochastic tensors, it is quite natural to endow $\mathbb{R}^{n\times n\times n}$ with the norm

 $\|\boldsymbol{P}\|_1=\max_{\|x\|_1=\|y\|_1=1}\|\boldsymbol{P}xy\|_1.$

In fact, standard linear algebraic techniques yield the explicit formula

 $\|\boldsymbol{P}\|_1=\max_{j,k}\sum_i|P_{ijk}|,$

so that, if $\boldsymbol{P}$ is stochastic, we have $\|\boldsymbol{P}\|_1=1$.

With the next theorem we prove a Lipschitz-continuity property of the higher-order ergodicity coefficients with respect to the tensor 1-norm above.

###### Theorem 5.2

For arbitrary $\boldsymbol{P},\boldsymbol{Q}\in\mathbb{R}^{n\times n\times n}$ we have

 $|T_*(\boldsymbol{P})-T_*(\boldsymbol{Q})|\leq T_*(\boldsymbol{P}-\boldsymbol{Q})\leq\|\boldsymbol{P}-\boldsymbol{Q}\|_1,$

where $T_*$ is either $T_L$ or $T_R$. Moreover,

 $|T(\boldsymbol{P})-T(\boldsymbol{Q})|\leq T(\boldsymbol{P}-\boldsymbol{Q})\leq2\|\boldsymbol{P}-\boldsymbol{Q}\|_1.$

###### Proof

Consider for definiteness $T_*=T_L$, the other case being completely analogous. Suppose that $T_L(\boldsymbol{P})\geq T_L(\boldsymbol{Q})$. Hence, for some $x\in\mathcal{S}$ and $y\in Z_1$ we have

 $T_L(\boldsymbol{P})=\|\boldsymbol{P}xy\|_1\leq\|(\boldsymbol{P}-\boldsymbol{Q})xy\|_1+\|\boldsymbol{Q}xy\|_1\leq T_L(\boldsymbol{P}-\boldsymbol{Q})+T_L(\boldsymbol{Q}).$

Hence, $T_L(\boldsymbol{P})-T_L(\boldsymbol{Q})\leq T_L(\boldsymbol{P}-\boldsymbol{Q})$. By reversing the roles of $\boldsymbol{P}$ and $\boldsymbol{Q}$ we obtain $T_L(\boldsymbol{Q})-T_L(\boldsymbol{P})\leq T_L(\boldsymbol{P}-\boldsymbol{Q})$, and we arrive at the first claim. Analogously, for some $x\in\mathcal{S}$ and $y\in Z_1$ we have

 $T(\boldsymbol{P})=\|\boldsymbol{P}xy+\boldsymbol{P}yx\|_1\leq\|(\boldsymbol{P}-\boldsymbol{Q})xy+(\boldsymbol{P}-\boldsymbol{Q})yx\|_1+\|\boldsymbol{Q}xy+\boldsymbol{Q}yx\|_1\leq T(\boldsymbol{P}-\boldsymbol{Q})+T(\boldsymbol{Q}).$

The inequality $T(\boldsymbol{Q})-T(\boldsymbol{P})\leq T(\boldsymbol{P}-\boldsymbol{Q})$ follows from the preceding one by exchanging $\boldsymbol{P}$ and $\boldsymbol{Q}$, and the second claim follows. The rightmost inequalities follow immediately from the definitions of the ergodicity coefficients.

## 6 Applications to second-order Markov chains and Z-eigenvectors

In this section we prove an analogue of Theorem 4.2 for tensor $Z$-eigenvectors. Precisely, given a stochastic tensor $\boldsymbol{P}$, we provide a new condition that ensures the existence and uniqueness of a positive vector $x\in\mathcal{S}$ such that $\boldsymbol{P}xx=x$. Moreover, we show that under the same condition the higher-order power method $x_{t+1}=\boldsymbol{P}x_tx_t$ always converges to $x$, and we provide an analogous, but stronger, condition that guarantees the global convergence of the alternate scheme $x_{t+1}=\boldsymbol{P}x_tx_{t-1}$.

The next theorem is the tensor analogue of Theorem 4.2.

###### Theorem 6.1

If $\boldsymbol{P}$ is a stochastic tensor with $T(\boldsymbol{P})<1$, then there exists a unique $Z$-eigenvector $x\in\mathcal{S}$ such that $\boldsymbol{P}xx=x$. Moreover, the higher-order power method $x_{t+1}=\boldsymbol{P}x_tx_t$ converges to $x$ for any $x_0\in\mathcal{S}$, and

 $\|x_t-x\|_1\leq T(\boldsymbol{P})^t\,\|x_0-x\|_1.$

###### Proof

Let $f:\mathcal{S}\to\mathcal{S}$ be given by $f(x)=\boldsymbol{P}xx$. Let $\boldsymbol{Q}=\frac12(\boldsymbol{P}+\boldsymbol{P}^S)$. Note that $\boldsymbol{Q}$ is a stochastic tensor such that $\boldsymbol{Q}^S=\boldsymbol{Q}$. Moreover, the equation $\boldsymbol{P}xx=x$ is equivalent to $\boldsymbol{Q}xx=x$. Then, for all $x,y\in\mathcal{S}$ we have

 $f(x)-f(y)=\boldsymbol{Q}xx-\boldsymbol{Q}yy+\boldsymbol{Q}xy-\boldsymbol{Q}xy=\boldsymbol{Q}xx-\boldsymbol{Q}yy+\boldsymbol{Q}yx-\boldsymbol{Q}xy=\boldsymbol{Q}(x+y)(x-y).$

Hence,

 $\tau_1(f)=\max_{x,y\in\mathcal{S},\,x\neq y}\frac{\|f(x)-f(y)\|_1}{\|x-y\|_1}=\max_{x,y\in\mathcal{S},\,x\neq y}\frac{2\big\|\boldsymbol{Q}\big(\tfrac12x+\tfrac12y\big)(x-y)\big\|_1}{\|x-y\|_1}=\max_{v\in\mathcal{S}}\max_{w\in Z_1}2\|\boldsymbol{Q}vw\|_1=2T_L(\boldsymbol{Q}).$

Since $T(\boldsymbol{P})=2T_L(\boldsymbol{Q})$, we arrive at $\|f(x)-f(y)\|_1\leq T(\boldsymbol{P})\|x-y\|_1$ for any $x,y\in\mathcal{S}$, which shows that $f$ is contractive with respect to the 1-norm whenever $T(\boldsymbol{P})<1$. By the Banach fixed point theorem, there exists a unique fixed point $x\in\mathcal{S}$ such that $f(x)=x$. Moreover, the iteration $x_{t+1}=f(x_t)$ converges to $x$ with $\|x_t-x\|_1\leq T(\boldsymbol{P})^t\|x_0-x\|_1$ for any $x_0\in\mathcal{S}$, and the claim follows.

For completeness, we include in this discussion the following result, which has been rederived many times by different authors [11, 15, 17, 26, 27], mainly from a well-known uniqueness result in fixed point theory [21].

###### Corollary 3

If $\boldsymbol{P}$ is a stochastic tensor such that $P_{ijk}>\frac{1}{2n}$ for all $i,j,k$, then there exists a unique $Z$-eigenvector $x\in\mathcal{S}$ such that $\boldsymbol{P}xx=x$, and the higher-order power method converges to $x$ for any $x_0\in\mathcal{S}$.

###### Proof

In the stated hypotheses we have $T(\boldsymbol{P})<1$ by virtue of Corollary 2. Hence, the claim is a direct consequence of Theorem 6.1.