1 Introduction
Hypermatrices, or tensors, with $d$ modes are a natural generalization of matrices. For example, a tensor with two modes is a matrix, whereas a tensor with three modes is a “cube” (or a “box” if the dimensions differ across modes). The multidimensional nature of tensors naturally gives rise to a variety of eigenvalue problems. In fact, the classical eigenvalue and singular value problems for a matrix can be generalized to the tensor setting following different constructions, which lead to different notions of eigenvalues and singular values for tensors, all of them reducing to the standard matrix case when the tensor has $d=2$ modes; see e.g. [17] and Chapter 2 of [31]. In this work we focus on tensors having three modes, all with the same dimension $n$. Tensors of this kind are attracting growing interest due to their appearance in higher-order stochastic processes arising in the mathematical modeling of certain dynamics and of ranking schemes based on random walks on complex networks [3, 4, 18, 29]. By extending the wide and influential literature on ergodicity coefficients for matrices, we introduce a family of higher-order ergodicity coefficients for stochastic cubical tensors and discuss how these allow us to derive new conditions on the existence, uniqueness, and computability of stationary distributions for different types of higher-order stochastic processes described by such tensors. In purely linear-algebraic terminology, our new conditions yield guarantees for the existence, uniqueness, and computability of so-called $Z$ eigenvectors of stochastic tensors of order three. Eigenvectors of nonnegative tensors appear in many contexts dealing with high-dimensional data; see e.g. [1, 4, 23, 24, 27, 31, 38] and references therein. While we focus here on eigenvectors of stochastic tensors, we believe the results presented here can be further extended to more general eigenvector problems for nonnegative tensors and thus offer new insight into the developing Perron–Frobenius theory for tensors and multilinear maps.

The remainder of the paper is structured as follows. We fix the relevant notation in the next section. In Section 3 we review the concept of a higher-order Markov chain, its associated eigenvector stationary distribution, and relevant existence and uniqueness results for that eigenvector. In Section 4 we recall the concept of an ergodicity coefficient for a stochastic matrix and some of its properties. Then, in Section 5, we introduce our new higher-order ergodicity coefficients for stochastic cubical tensors and prove our main results. In Section 6 we show how these apply to higher-order Markov chains and dominant eigenvectors, and how they compare with previous results. In particular, after recalling the concepts of vertex-reinforced and spacey random walks, we introduce in Subsection 6.2 a general family of Markov processes with memory that includes the spacey random walk as a particular case, and we prove a new convergence result for this general stochastic process. Finally, in Section 7, we show how the proposed results can be used, and how they compare with previous work, in two example application settings: the computation of the multilinear PageRank and the convergence analysis of the shifted higher-order power method.
2 Notation
Throughout this paper we adopt the following notation. Let $e_i$ be the $i$th canonical basis vector in $\mathbb{R}^n$ and let $\mathbb{1} = (1,\dots,1)^T$ be the all-ones vector. Define the sets $[n] = \{1,\dots,n\}$ and $\Delta = \{x \in \mathbb{R}^n : x \ge 0,\ \mathbb{1}^Tx = 1\}$. A real cubical tensor of order three (or, equivalently, with three modes) is a three-way array with real entries of size $n \times n \times n$. We denote by $\mathbb{R}^{n\times n\times n}$ the set of such tensors and use capital bold letters to denote its elements. The $(i,j,k)$th entry of $\boldsymbol{T}$ is denoted by $T_{ijk}$. Matrices are tensors with only two modes and are denoted by standard capital letters.

Given a tensor $\boldsymbol{T} \in \mathbb{R}^{n\times n\times n}$, several tensor-vector product operations can be defined. We write $\boldsymbol{T}xy$ to denote the tensor-vector multiplication over the second and third modes. Namely, $\boldsymbol{T}xy$ denotes the vector entrywise defined by
$$(\boldsymbol{T}xy)_i = \sum_{j,k=1}^n T_{ijk}\, x_j y_k,$$
for $i \in [n]$. Moreover, the product $\boldsymbol{T}x$ denotes the matrix associated to the linear map $y \mapsto \boldsymbol{T}xy$, that is,
$$(\boldsymbol{T}x)_{ik} = \sum_{j=1}^n T_{ijk}\, x_j. \tag{1}$$
A $Z$ eigenvalue of a tensor $\boldsymbol{T}$ is a real number $\lambda$ such that there exists a nonzero vector $x$ with $\boldsymbol{T}xx = \lambda x$. That vector is a $Z$ eigenvector associated to $\lambda$; see [31].
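The products above translate directly into tensor contractions. Below is a minimal sketch, assuming NumPy; the helper names `txy` and `t_mat` are ours, not notation from the text:

```python
import numpy as np

def txy(T, x, y):
    # (T x y)_i = sum_{j,k} T[i, j, k] * x[j] * y[k]
    return np.einsum('ijk,j,k->i', T, x, y)

def t_mat(T, x):
    # Matrix of the linear map y -> T x y (second mode contracted with x),
    # so that t_mat(T, x) @ y equals txy(T, x, y).
    return np.einsum('ijk,j->ik', T, x)
```

In particular, `txy(T, x, x)` evaluates the quadratic map whose fixed points on the simplex are the eigenvectors discussed below.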
There are $3! = 6$ possible transpositions of a tensor $\boldsymbol{T}$, each corresponding to a different permutation $\sigma$ of the set $\{1,2,3\}$. Using the notation proposed in [34], the transposed tensor corresponding to the permutation $\sigma$ can be denoted by $\boldsymbol{T}^{\sigma}$, namely,
$$(\boldsymbol{T}^{\sigma})_{i_1 i_2 i_3} = T_{i_{\sigma(1)} i_{\sigma(2)} i_{\sigma(3)}}.$$
As it will be of particular importance to us, we devote the special notation $\boldsymbol{T}^S$ to the tensor obtained by transposing the entries of $\boldsymbol{T}$ over the second and third modes, namely
$$(\boldsymbol{T}^S)_{ijk} = T_{ikj}.$$
All inequalities in this work are meant entrywise. In particular, we write $\boldsymbol{T} \ge 0$ (resp., $\boldsymbol{T} > 0$) to denote a tensor such that $T_{ijk} \ge 0$ (resp., $T_{ijk} > 0$) for all indices $i,j,k$. A tensor $\boldsymbol{T}$ is said to be column stochastic, or simply stochastic, if $\boldsymbol{T} \ge 0$ and its first-mode entries all sum up to one, i.e.,
$$\sum_{i=1}^n T_{ijk} = 1 \qquad \text{for all } j, k \in [n].$$
A tensor acting as the identity on the unit sphere can be defined in the case of tensors with an even number of modes; see [24]. For tensors with three modes we define the following two left and right “one-sided identity” tensors:
$$(\boldsymbol{I}_L)_{ijk} = \delta_{ij}, \qquad (\boldsymbol{I}_R)_{ijk} = \delta_{ik}, \tag{2}$$
where $\delta$ denotes the Kronecker delta. Both $\boldsymbol{I}_L$ and $\boldsymbol{I}_R$ are stochastic tensors, and for all $x, y \in \Delta$ one has $\boldsymbol{I}_L xy = x$ and $\boldsymbol{I}_R xy = y$. Indeed,
$$(\boldsymbol{I}_L xy)_i = \sum_{j,k=1}^n \delta_{ij}\, x_j y_k = x_i \sum_{k=1}^n y_k = x_i,$$
and similarly for $\boldsymbol{I}_R$. Note that, letting $\boldsymbol{I}_\alpha = \alpha \boldsymbol{I}_L + (1-\alpha)\boldsymbol{I}_R$ for any $\alpha \in [0,1]$, it holds $\boldsymbol{I}_\alpha xx = x$ for all $x \in \Delta$.
3 Higher-order Markov chains
Higher-order Markov chains are a natural extension of Markov chains, where the transitions depend on the past few states rather than just on the last one. For a plain introduction see e.g. [3, 39]. For example, a discrete-time, second-order Markov chain is defined by a third-order tensor $\boldsymbol{T}$, where $T_{ijk}$ is the conditional probability of transitioning to state $i$, given that the last state was $j$ and the second-to-last state was $k$. More precisely, if $S_t$ is the random variable describing the status of the chain on the set $[n]$ at time $t$, then $T_{ijk} = \Pr(S_{t+1} = i \mid S_t = j,\ S_{t-1} = k)$, where $\Pr$ denotes probability. Hence, the sequence $\{S_t\}$ obeys the rule
$$\Pr(S_{t+1} = i \mid S_t = j,\ S_{t-1} = k,\ S_{t-2},\ \dots) = T_{ijk}. \tag{3}$$
Obviously it must hold $\sum_{i=1}^n T_{ijk} = 1$ for all $j,k$, i.e., the tensor $\boldsymbol{T}$ is stochastic.
Let $x_t$ be the probability vector of the random variable $S_t$, i.e., the vector with entries $(x_t)_i = \Pr(S_t = i)$. Let $Z_t$ denote the joint probability function $(Z_t)_{jk} = \Pr(S_t = j,\ S_{t-1} = k)$. Then, $x_t$ is the marginal probability of $Z_t$, i.e., the vector with entries $(x_t)_j = \sum_{k=1}^n (Z_t)_{jk}$. Hence, the dynamics of the second-order Markov chain (3) is described by the two-phase process
$$(Z_{t+1})_{ij} = \sum_{k=1}^n T_{ijk}\, (Z_t)_{jk}, \qquad x_{t+1} = Z_{t+1}\mathbb{1}. \tag{4}$$
Note that both steps in (4) are linear, and thus their convergence can be analyzed using standard ergodicity arguments. In fact, the second-order Markov chain over the state set $[n]$ can easily be reduced to a first-order Markov chain with state set $[n] \times [n]$; see e.g. [3, 39]. Thus, under appropriate hypotheses on $\boldsymbol{T}$, the iteration (4) has a unique limit $Z$ such that
$$Z_{ij} = \sum_{k=1}^n T_{ijk}\, Z_{jk}. \tag{5}$$
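The two-phase process (4) can be sketched as follows. This is our illustration (NumPy), assuming the convention $T_{ijk} = \Pr(S_{t+1}=i \mid S_t=j,\ S_{t-1}=k)$; the function name is hypothetical:

```python
import numpy as np

def two_phase_step(T, Z):
    # One step of the two-phase process (4):
    # phase 1 updates the joint distribution, Z'_{ij} = sum_k T_{ijk} Z_{jk};
    # phase 2 extracts the marginal, x'_i = sum_j Z'_{ij}.
    Z_new = np.einsum('ijk,jk->ij', T, Z)
    x_new = Z_new.sum(axis=1)
    return Z_new, x_new
```

Since the first-mode sums of a stochastic tensor are one, each step preserves total probability mass, as the test below checks.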
However, this approach has a clear computational drawback: the size of the joint probability function, or that of the equivalent first-order Markov chain, is the square of that of the original chain. The situation gets even worse for an $m$th order Markov chain due to the “curse of dimensionality” effect: the memory space required by the joint density grows exponentially with the order of the chain, requiring $n^m$ entries. Moreover, the convergence analysis of the iteration (4) and its natural extension to the $m > 2$ setting becomes cumbersome.

In order to circumvent these issues, Raftery [32]
proposed a technique to approximate higher-order Markov chains by means of a linear combination of first-order ones, by assuming that the joint probability distribution of the lagged random variables $S_t, S_{t-1}, \dots$ can be replaced by a mixture of its marginals. For example, in the $m = 2$ case, that assumption amounts to replacing the conditional probabilities $T_{ijk}$ by an expression of the form $\lambda A_{ij} + (1-\lambda) A_{ik}$, where $A$ is a stochastic matrix and $\lambda \in [0,1]$. This technique, known as the Mixture Transition Distribution model, has been widely used to fit stochastic models with far fewer parameters than the fully parameterized model to multidimensional data in a variety of applications [5, 33].

A more recent and promising approach, which maintains all the information contained in the transition tensor $\boldsymbol{T}$, is the one proposed in [27]. Here, one assumes that the joint probability distribution of the higher-order Markov chain is the Kronecker product of its marginal distributions, that is, $\Pr(S_t = j,\ S_{t-1} = k) = (x_t)_j (x_t)_k$. This hypothesis, which is equivalent to assuming that the random variables $S_t$ and $S_{t-1}$ are independent, is a conceptual simplification of the Markov chain formalism that is introduced in order to obtain a computationally tractable extension to the higher-order case. Using our tensor-vector product notation, this “reduced” higher-order Markov process boils down to the iteration
$$x_{t+1} = \boldsymbol{T} x_t x_t, \tag{6}$$
which replaces (4) and is the higher-order counterpart of the usual power method for a stochastic matrix in the classical (first-order) Markov chain setting. The limit of this sequence, if it exists, is a nonnegative vector $x$ with $\mathbb{1}^T x = 1$ such that
$$\boldsymbol{T} x x = x, \tag{7}$$
that is, $x$ is a $Z$ eigenvector of $\boldsymbol{T}$ associated to the eigenvalue $\lambda = 1$. Thus, it is natural to consider that vector as a stationary density of the Markov chain (3).
Note that the limit matrix $Z$ of (4) is such that $Z\mathbb{1} = Z^T\mathbb{1}$. Indeed, from (5) we have
$$(Z^T\mathbb{1})_j = \sum_{i=1}^n Z_{ij} = \sum_{i,k=1}^n T_{ijk} Z_{jk} = \sum_{k=1}^n Z_{jk} = (Z\mathbb{1})_j.$$
But that row/column sum vector is generally different from the vector $x$ in (7). In fact, the two coincide exactly when $Z$ has rank one, $Z = xx^T$. This is not difficult to prove: if $Z = xx^T$ solves (5), then $x$ must solve (7). On the other hand, the converse implication is false in general; that is, if $x$ solves (7), then the matrix $xx^T$ may not be a solution of (5). Indeed, extensive numerical experiments reported in [39] show that the vector $x$ is strongly correlated with the row/column sum vector of $Z$, but $Z$ has full rank in general and $Z \neq xx^T$.
3.1 Uniqueness of the stationary distribution
The existence of a nonnegative solution of (7) is a direct consequence of Brouwer's fixed point theorem. However, unlike the matrix case, the irreducibility of $\boldsymbol{T}$ is not enough to ensure the uniqueness of $x$, and additional assumptions are required.
A quite general assumption is introduced in [8], where it is proved that uniqueness is ensured for any aperiodic tensor. The definition of an aperiodic tensor is given in terms of a special tensor-tensor product introduced therein, and can also be found in [10]. It is not difficult to see that any entrywise positive cubical stochastic tensor is aperiodic. However, if $\boldsymbol{T}$ has some zero entries, verifying whether $\boldsymbol{T}$ is aperiodic or not can be a challenging task. Other sufficient conditions can be given in terms of the entries of the tensor $\boldsymbol{T}$; see for example [11, 15, 17, 26, 27].
In the following we introduce a family of ergodicity coefficients for cubical stochastic tensors, and we then show, in Section 6, how they allow us to prove new conditions for the uniqueness of a positive solution to (7). The conditions we obtain in this way can be easily computed and are, to the best of our knowledge, among the weakest conditions available in the literature so far.
4 Coefficients of ergodicity
Let $\delta$ be a metric on a subset $D \subseteq \mathbb{R}^n$ and consider a mapping $f : D \to D$. Although other notions of ergodicity coefficient are available in the literature, see e.g. [20], for the purpose of this work a coefficient of ergodicity for $f$ is the best Lipschitz constant of $f$ with respect to $\delta$, that is,
$$\tau_\delta(f) = \sup_{\substack{x, y \in D \\ x \neq y}} \frac{\delta(f(x), f(y))}{\delta(x, y)}. \tag{8}$$
Different choices of the metric $\delta$ give rise to different notions of ergodicity coefficients. For example, if $\delta$ is the Hilbert projective distance
$$d_H(x, y) = \log \max_{i,j} \frac{x_i y_j}{x_j y_i}, \qquad x, y > 0, \tag{9}$$
then (8) is the so-called Birkhoff contraction ratio [6], which we denote by $\tau_B(f)$. This choice of metric is particularly interesting because it extends very naturally to the case of a mapping that leaves a generic proper cone invariant. Moreover, when $f$ is a linear map described by an entrywise positive matrix $A$, the Birkhoff–Hopf theorem [13] provides an explicit formula for $\tau_B(A)$, which we recall below:
$$\tau_B(A) = \tanh\Big(\frac14 \max_{i,j,k,l} \log \frac{A_{ik} A_{jl}}{A_{jk} A_{il}}\Big),$$
where $\tanh$ denotes the hyperbolic tangent. More recently, in [15], an analogous explicit formula has been proved for the case where $f$ is a (weakly) multilinear mapping induced by a nonnegative tensor. In particular, this formula holds for the case of eigenvectors of cubical stochastic tensors, and we will review it in this setting in Section 6.1.
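Assuming the Birkhoff–Hopf formula in the form recalled above (valid for an entrywise positive matrix), its computation can be sketched as follows; the function name is ours:

```python
import numpy as np

def birkhoff_ratio(A):
    # Birkhoff contraction ratio tanh(Delta/4) for an entrywise positive A,
    # with Delta the projective diameter max log(A_ik A_jl / (A_jk A_il)).
    L = np.log(A)                      # requires A > 0 entrywise
    # D[i, j, k, l] = L[i, k] + L[j, l] - L[j, k] - L[i, l]
    D = (L[:, None, :, None] + L[None, :, None, :]
         - L[None, :, :, None] - L[:, None, None, :])
    return np.tanh(D.max() / 4.0)
```

For a positive rank-one matrix the projective diameter vanishes, so the ratio is zero: the map is projectively constant.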
Another popular and successful choice for the distance is $\delta(x, y) = \|x - y\|_1$, where $\|\cdot\|_1$ is the $1$-norm on $\mathbb{R}^n$. Norm-based coefficients were introduced by Dobrushin in 1956 [12] for the case of linear mappings and have been the subject of numerous investigations afterwards; see e.g. [20, 35].
In Section 5 we analyze properties of norm-based coefficients for mappings defined by a stochastic tensor $\boldsymbol{T}$. To this end, we first review some relevant properties of these coefficients in the case of linear maps.
4.1 Norm-based ergodicity coefficients for matrices
Let $A$ be a stochastic matrix and $1 \le p \le \infty$. The $p$-norm ergodicity coefficient of $A$ is
$$\tau_p(A) = \sup_{\substack{\mathbb{1}^T z = 0 \\ z \neq 0}} \frac{\|Az\|_p}{\|z\|_p}.$$
This definition extends in the obvious way to any matrix $A$, when appropriate. The linearity of $z \mapsto Az$, the continuity of the norm, and the compactness of the set $\{z : \mathbb{1}^T z = 0,\ \|z\|_p = 1\}$ yield the equivalent formula
$$\tau_p(A) = \max_{\mathbb{1}^T z = 0,\ \|z\|_p = 1} \|Az\|_p.$$
We review below relevant formulas and properties of $\tau_p$ and refer to [20, 35, 37] for proof details, further properties and discussion.
The following properties are direct consequences of the preceding definitions. If $A, B$ are stochastic then
$\tau_p(AB) \le \tau_p(A)\,\tau_p(B)$;
$\tau_p(\alpha A + \beta B) \le \alpha\,\tau_p(A) + \beta\,\tau_p(B)$ for all $\alpha, \beta \ge 0$;
$0 \le \tau_1(A) \le 1$;
$\tau_1(A) = 0$ if and only if $A = x\mathbb{1}^T$ for some vector $x$.
Ergodicity coefficients also play an important role in deriving perturbation bounds for the stationary probability vector of a Markov chain, as in the following result from Seneta [36]; see also [20, Thm. 3.14].
Theorem 4.1
Let $A, \tilde{A}$ be two stochastic irreducible matrices with $\tau_1(A) < 1$, and let $x, \tilde{x}$ be their corresponding stationary probability vectors. Then
$$\|x - \tilde{x}\|_1 \le \frac{\|A - \tilde{A}\|_1}{1 - \tau_1(A)}.$$
If $A$ is stochastic then, as $\mathbb{1}^T A = \mathbb{1}^T$, for any eigenvector $v$ of $A$ corresponding to an eigenvalue $\lambda \neq 1$ we have $\mathbb{1}^T v = 0$, which implies $\|Av\|_1 = |\lambda|\,\|v\|_1 \le \tau_1(A)\,\|v\|_1$. Therefore,
$$|\lambda| \le \tau_1(A) \qquad \text{for every eigenvalue } \lambda \neq 1 \text{ of } A,$$
that is, $\tau_1(A)$ is an upper bound for the magnitude of any eigenvalue of $A$ different from 1. This observation implies the following well-known result.
Theorem 4.2
If $A$ is a stochastic matrix with $\tau_1(A^m) < 1$ for some integer $m \ge 1$, then $A$ is ergodic, i.e., there exists a unique eigenvector $x \ge 0$, $\mathbb{1}^T x = 1$, such that $Ax = x$. Moreover, the power method $x_{t+1} = Ax_t$ converges to $x$ for any $x_0 \in \Delta$, and
$$\|x_t - x\|_1 \le \tau_1(A^m)^{\lfloor t/m \rfloor}\, \|x_0 - x\|_1.$$
The theorem above gives a sufficient condition for the ergodicity of $A$ which is very useful in practice when combined with a number of explicit formulas that allow one to compute $\tau_p(A)$ using only the entries of $A$, in the particular case $p = 1$.
Theorem 4.3
Let $A \in \mathbb{R}^{n \times n}$. Then
$$\tau_1(A) = \frac12 \max_{j,l} \sum_{i=1}^n |A_{ij} - A_{il}|.$$
Moreover, if $A$ is stochastic then
$$\tau_1(A) = 1 - \min_{j,l} \sum_{i=1}^n \min\{A_{ij}, A_{il}\}.$$
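The first formula of Theorem 4.3 can be sketched as follows (NumPy, column-stochastic convention; the function name is ours):

```python
import numpy as np

def tau_1(A):
    # tau_1(A) = (1/2) * max over column pairs (j, l) of sum_i |A_ij - A_il|
    n = A.shape[1]
    return 0.5 * max(np.abs(A[:, j] - A[:, l]).sum()
                     for j in range(n) for l in range(n))
```

For stochastic matrices this agrees with the second, min-based formula, since for columns with equal sums $\sum_i |a_i - b_i| = 2 - 2\sum_i \min\{a_i, b_i\}$.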
4.2 Auxiliary results
Before proceeding further we recall here some useful preliminary results. Recall the notation $[n] = \{1, \dots, n\}$. In what follows we set $\Delta = \{x \in \mathbb{R}^n : x \ge 0,\ \mathbb{1}^T x = 1\}$.
Lemma 1
For all $a, b \ge 0$ it holds
$$|a - b| = a + b - 2\min\{a, b\}.$$
Lemma 2
Let $z$ be a zero-sum vector having $m$ nonzero entries. Then there exists a decomposition $z = \sum_{h=1}^{m-1} z^{(h)}$ where $\mathbb{1}^T z^{(h)} = 0$, $\sum_h \|z^{(h)}\|_1 = \|z\|_1$, and for each $h$ there exist integers $i_h, j_h$ such that $z^{(h)}$ is nonzero only in its $i_h$th and $j_h$th entries.
The proof of the previous lemma can be found in [35, Lemma 2.4] and is omitted for brevity. The following lemma, instead, is borrowed from [20, p. 166].
Lemma 3
For any $x, y \ge 0$ such that $\mathbb{1}^T x = \mathbb{1}^T y = 1$,
$$\tfrac12 \|x - y\|_1 = 1 - \sum_{i=1}^n \min\{x_i, y_i\}.$$
Consequently,
$$\tfrac12 \|x - y\|_1 = \sum_{i=1}^n \max\{0,\ x_i - y_i\}.$$
5 Ergodicity coefficients for third-order tensors
Let $\boldsymbol{T}$ be a cubical stochastic tensor. We define the following higher-order ergodicity coefficients:
$$\tau_1(\boldsymbol{T}) = \sup_{x \in \Delta}\ \sup_{\substack{\mathbb{1}^T z = 0 \\ z \neq 0}} \frac{\|\boldsymbol{T}xz\|_1}{\|z\|_1},$$
together with the analogous coefficients obtained from the transposed tensors, such as $\tau_1(\boldsymbol{T}^S)$. The preceding definitions extend in the obvious way to any tensor $\boldsymbol{T} \in \mathbb{R}^{n\times n\times n}$, when appropriate. We remark the following immediate identities: the coefficient of $\boldsymbol{T}^S$ is obtained from that of $\boldsymbol{T}$ by exchanging the roles of the second and third modes. The relationship between the preceding definitions and the norm-based ergodicity coefficients considered in §4.1 can be revealed by considering the matrices $\boldsymbol{T}x$ associated to the tensor-vector products defined as in (1): it holds $\tau_1(\boldsymbol{T}) = \sup_{x \in \Delta} \tau_1(\boldsymbol{T}x)$, where $\tau_1(\boldsymbol{T}x)$ is the matrix coefficient of §4.1.
The forthcoming results provide explicit formulas for computing the coefficients above from the knowledge of the tensor entries.
Theorem 5.1
Let $\boldsymbol{T} \in \mathbb{R}^{n\times n\times n}$. Then,
$$\tau_1(\boldsymbol{T}) = \frac12 \max_{j} \max_{k,l} \sum_{i=1}^n |T_{ijk} - T_{ijl}|. \tag{10}$$
Moreover, if $\boldsymbol{T}$ is stochastic then
$$\tau_1(\boldsymbol{T}) = 1 - \min_{j} \min_{k,l} \sum_{i=1}^n \min\{T_{ijk}, T_{ijl}\} \tag{11}$$
$$\tau_1(\boldsymbol{T}) = \max_{j} \max_{k,l} \sum_{i=1}^n \max\{0,\ T_{ijk} - T_{ijl}\}. \tag{12}$$
Proof
For any let be a decomposition given by Lemma 2. From we have
By the triangle inequality,
Maximizing over and , we conclude that
Since for all we have and , the reverse inequality holds. Hence we have (10). Moreover, if is stochastic then from Lemma 3 we obtain
and we have (11). Finally, since , from Lemma 1 and (10) we get
so we have (12) and the proof is complete.
The analogous formulas for the other higher-order coefficients are derived hereafter.
Corollary 1
Let $\boldsymbol{T} \in \mathbb{R}^{n\times n\times n}$. The following properties hold:
(13)  
(14) 
Moreover, if is stochastic then
(15)  
(16)  
(17)  
(18) 
Proof
By (11), (15) and (17), it is immediate to observe that for a stochastic tensor it holds and
(19) 
Stronger inequalities can be easily obtained for positive tensors, as shown in the next result.
Corollary 2
Let $\boldsymbol{T}$ be a stochastic tensor. If there exists a positive number $\delta$ such that $T_{ijk} \ge \delta$ for all $i, j, k$, then
$$\tau_1(\boldsymbol{T}) \le 1 - n\delta \qquad \text{and} \qquad \tau_1(\boldsymbol{T}^S) \le 1 - n\delta.$$
Proof
Remark 1
A close look at Theorem 5.1 reveals that, for any tensor $\boldsymbol{T}$, we have $\tau_1(\boldsymbol{T}) = 0$ if and only if $T_{ijk} = M_{ij}$ for some matrix $M$, i.e., the entries of $\boldsymbol{T}$ do not depend on the third index. In particular, in that case $\boldsymbol{T}$ is stochastic if and only if $M$ is stochastic. Analogously, from Corollary 1 we derive that $\tau_1(\boldsymbol{T}^S) = 0$ if and only if $T_{ijk} = N_{ik}$ for some matrix $N$. Consequently, $\tau_1(\boldsymbol{T}) = \tau_1(\boldsymbol{T}^S) = 0$ if and only if $T_{ijk} = x_i$ for some vector $x$. It is not difficult to prove that the latter is also equivalent to $\boldsymbol{T}yz = x$ for all $y, z \in \Delta$. Hence, if such a $\boldsymbol{T}$ is nonzero and stochastic, then its unique stationary density is the constant value $x = \boldsymbol{T}yz$.
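Since the coefficients above bound the $\ell^1$ Lipschitz constant of the quadratic map $x \mapsto \boldsymbol{T}xx$ on the simplex, a crude Monte-Carlo lower bound on that constant gives a useful sanity check for any implementation of the entrywise formulas. This sketch, with our own naming and sampling scheme, is illustrative and not the computation proposed in the text:

```python
import numpy as np

def lipschitz_lower_bound(T, trials=500, seed=0):
    # Monte-Carlo lower bound on  sup ||Txx - Tyy||_1 / ||x - y||_1
    # over pairs x, y in the probability simplex.
    rng = np.random.default_rng(seed)
    n = T.shape[1]
    best = 0.0
    for _ in range(trials):
        x = rng.random(n); x /= x.sum()
        y = rng.random(n); y /= y.sum()
        num = np.abs(np.einsum('ijk,j,k->i', T, x, x)
                     - np.einsum('ijk,j,k->i', T, y, y)).sum()
        best = max(best, num / np.abs(x - y).sum())
    return best
```

Being a sampled supremum, the returned value is a lower bound; any exact coefficient formula must dominate it.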
5.1 Bounding the variation of higher-order coefficients
When working with stochastic tensors, it is quite natural to endow $\mathbb{R}^{n\times n\times n}$ with the norm
$$\|\boldsymbol{T}\| = \sup_{x, y \in \Delta} \|\boldsymbol{T}xy\|_1.$$
In fact, standard linear algebraic techniques yield the explicit formula
$$\|\boldsymbol{T}\| = \max_{j,k} \sum_{i=1}^n |T_{ijk}|,$$
so that, if $\boldsymbol{T}$ is stochastic, we have $\|\boldsymbol{T}\| = 1$.
With the next theorem we prove a Lipschitz continuity condition for the higher-order ergodicity coefficients with respect to the tensor norm above.
Theorem 5.2
For arbitrary $\boldsymbol{S}, \boldsymbol{T} \in \mathbb{R}^{n\times n\times n}$ we have
$$|\tau(\boldsymbol{S}) - \tau(\boldsymbol{T})| \le \|\boldsymbol{S} - \boldsymbol{T}\|,$$
where $\tau$ is any of the higher-order ergodicity coefficients introduced above. Moreover, $\tau(\boldsymbol{S}) \le \|\boldsymbol{S}\|$ and $\tau(\boldsymbol{T}) \le \|\boldsymbol{T}\|$.
Proof
Consider for definiteness , the other case being completely analogous. Suppose that . Hence, for some and we have
Hence, . By reversing the roles of and we obtain and we arrive at the first claim. Analogously, for some and we have
The inequality follows from the preceding one by exchanging and , and the second claim follows. The rightmost inequalities follow immediately from the definition of the ergodicity coefficients.
6 Applications to second-order Markov chains and eigenvectors
In this section we prove an analogue of Theorem 4.2 for tensor eigenvectors. Precisely, given a stochastic tensor $\boldsymbol{T}$, we provide a new condition that ensures the existence and uniqueness of a positive vector $x$ such that $\boldsymbol{T}xx = x$. Moreover, we show that under the same condition the higher-order power method always converges to $x$, and we provide an analogous, but stronger, condition that guarantees the global convergence of the alternate scheme $x_{t+1} = \boldsymbol{T} x_t x_{t-1}$.

The next theorem is the tensor analogue of Theorem 4.2.
Theorem 6.1
If $\boldsymbol{T}$ is a stochastic tensor with $\tau_1(\boldsymbol{T}) + \tau_1(\boldsymbol{T}^S) < 1$, then there exists a unique eigenvector $x \in \Delta$ such that $\boldsymbol{T}xx = x$. Moreover, the higher-order power method $x_{t+1} = \boldsymbol{T} x_t x_t$ converges to $x$ for any $x_0 \in \Delta$, and
$$\|x_t - x\|_1 \le \big(\tau_1(\boldsymbol{T}) + \tau_1(\boldsymbol{T}^S)\big)^t\, \|x_0 - x\|_1.$$
Proof
Let $f : \Delta \to \Delta$ be given by $f(x) = \boldsymbol{T}xx$. Let $\boldsymbol{S} = \frac12(\boldsymbol{T} + \boldsymbol{T}^S)$. Note that $\boldsymbol{S}$ is a stochastic tensor such that $\boldsymbol{S}xy = \boldsymbol{S}yx$ for all $x, y$. Moreover, the equation $\boldsymbol{T}xx = x$ is equivalent to $\boldsymbol{S}xx = x$. Then, for all $x, y \in \Delta$ we have
$$f(x) - f(y) = \boldsymbol{S}xx - \boldsymbol{S}yy = \boldsymbol{S}x(x - y) + \boldsymbol{S}(x - y)y = \boldsymbol{S}x(x - y) + \boldsymbol{S}y(x - y).$$
Hence, as $\mathbb{1}^T(x - y) = 0$,
$$\|f(x) - f(y)\|_1 \le 2\,\tau_1(\boldsymbol{S})\,\|x - y\|_1 \le \big(\tau_1(\boldsymbol{T}) + \tau_1(\boldsymbol{T}^S)\big)\,\|x - y\|_1,$$
where the last step uses the subadditivity of $\tau_1$. Since $\tau_1(\boldsymbol{T}) + \tau_1(\boldsymbol{T}^S) < 1$, we arrive at $\|f(x) - f(y)\|_1 \le c\,\|x - y\|_1$ with $c < 1$ for any $x, y \in \Delta$, which shows that $f$ is contractive with respect to the $1$-norm. By the Banach fixed point theorem, there exists a unique fixed point $x \in \Delta$ such that $f(x) = x$. Moreover, the iteration $x_{t+1} = f(x_t)$ converges to $x$ with $\|x_t - x\|_1 \le c^t\,\|x_0 - x\|_1$ for any $x_0 \in \Delta$, and the claim follows.
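Taking the coefficient $\tau_1$ to be the best $\ell^1$ Lipschitz constant of $z \mapsto \boldsymbol{T}xz$ over $x$ in the simplex, its entrywise evaluation and the resulting contraction bound for $x \mapsto \boldsymbol{T}xx$ can be sketched as follows; the function names are ours, and the formula is the one consistent with that variational definition:

```python
import numpy as np

def tau1_tensor(T):
    # (1/2) * max_j max_{k,l} sum_i |T[i,j,k] - T[i,j,l]|
    n = T.shape[0]
    return 0.5 * max(np.abs(T[:, j, k] - T[:, j, l]).sum()
                     for j in range(n) for k in range(n) for l in range(n))

def contraction_constant(T):
    # Lipschitz bound tau_1(T) + tau_1(T^S) for the map x -> T x x,
    # where T^S swaps the second and third modes.
    return tau1_tensor(T) + tau1_tensor(T.transpose(0, 2, 1))
```

When this constant is below one, the higher-order power method is a contraction on the simplex in the $1$-norm.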
For completeness, we include in this discussion the following result, which has been rederived many times by different authors [11, 15, 17, 26, 27], mainly from a well-known uniqueness result in fixed point theory [21].
Corollary 3
If $\boldsymbol{T}$ is a stochastic tensor such that $T_{ijk} > \frac{1}{2n}$ for all $i, j, k$, then there exists a unique eigenvector $x \in \Delta$ such that $\boldsymbol{T}xx = x$, and the higher-order power method converges to $x$ for any $x_0 \in \Delta$.
Proof
In the stated hypotheses, setting $\delta = \min_{i,j,k} T_{ijk} > \frac{1}{2n}$, we have $\tau_1(\boldsymbol{T}) + \tau_1(\boldsymbol{T}^S) \le 2(1 - n\delta) < 1$ by virtue of Corollary 2. Hence, the claim is a direct consequence of Theorem 6.1.