 # A New Proof of Hopf's Inequality Using a Complex Extension of the Hilbert Metric

It is well known from the Perron-Frobenius theory that the spectral gap of a positive square matrix is positive. In this paper, we give a more quantitative characterization of the spectral gap. More specifically, using a complex extension of the Hilbert metric, we show that the so-called spectral ratio of a positive square matrix is upper bounded by its Birkhoff contraction coefficient, which in turn yields a lower bound on its spectral gap.


## 1 Introduction

Let $n$ be an integer greater than or equal to $2$. Let $A = (a_{ij})$ be an $n \times n$ positive matrix, i.e., $a_{ij} > 0$ for all $i, j$. By Perron's theorem [18], the largest eigenvalue (in modulus) of $A$, denoted by $\rho(A)$, is unique, real and positive, and therefore, the spectral ratio of $A$, defined as

$$\kappa(A) \triangleq \frac{\max\{|\lambda| : \lambda \text{ is an eigenvalue of } A,\ \lambda \neq \rho(A)\}}{\rho(A)},$$

is strictly less than $1$. Ostrowski strengthened this result and showed that

$$\kappa(A) \le \frac{M^2 - m^2}{M^2 + m^2}, \tag{1}$$

where $M = \max_{i,j} a_{ij}$ and $m = \min_{i,j} a_{ij}$. Inspired by Ostrowski's theorem, Hopf [11] further strengthened Perron's theorem and showed that

$$\kappa(A) \le \frac{M - m}{M + m}. \tag{2}$$

It has been observed that Hopf's strengthening is tight in the sense that there are examples of $A$ for which (2) holds with equality.

Though not the major concern of this work, let us mention that Frobenius [9, 10] generalized Perron's theorem to non-negative matrices; the result is popularly known as the Perron-Frobenius theorem. This result is the key pillar of the theory of non-negative matrices, which has a wide range of applications in multiple disciplines; see, e.g., [21, 14, 2, 1, 12]. Accordingly, there are numerous results characterizing the isolation of the largest eigenvalue of non-negative matrices, most of them in the form of upper bounds on the modulus of the second largest eigenvalue; see, e.g., [19] and the references therein. It is also worth noting that for certain special families of symmetric non-negative matrices (such as adjacency matrices of regular graphs and transition probability matrices of reversible stationary Markov chains), numerous Cheeger-type inequalities, which take the form of bounds on the difference between the largest and second largest eigenvalues, have been established; see, e.g., [5, 4, 15, 13] and the references therein.

Although it often shows up in the literature, the exact expression in (2) does not actually appear in [11]; it only follows from a theorem therein, which is stated for more general positive linear operators. As a matter of fact, a careful examination of the proof of that theorem reveals that it yields a bound stronger than (2).

To precisely state this stronger result, we need to introduce some notation and terminology. Let $W$ denote the standard simplex in the $n$-dimensional Euclidean space:

$$W = \Big\{ w = (w_1, w_2, \ldots, w_n) \in \mathbb{R}^n : \sum_{i=1}^{n} w_i = 1,\ w_i \ge 0 \text{ for all } i \Big\}, \tag{3}$$

and let $W^\circ$ denote its interior, consisting of all the positive vectors in $W$. Let $d_H$ denote the Hilbert metric on $W^\circ$, which is defined by

$$d_H(v, w) \triangleq \max_{i,j} \log\left( \frac{w_i / w_j}{v_i / v_j} \right), \quad \text{for any two vectors } v, w \in W^\circ. \tag{4}$$

(The Hilbert metric is often defined on a projective space (see, e.g., [21, 12]), which is equivalent to the definition in this paper up to a usual normalization.)

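For any positive vector $w = (w_1, w_2, \ldots, w_n)$, we define its normalized version as

$$N(w) = \frac{(w_1, w_2, \ldots, w_n)}{w_1 + w_2 + \cdots + w_n}, \tag{5}$$

which obviously belongs to $W^\circ$. The matrix $A$ induces a mapping $f_A : W^\circ \to W^\circ$, defined by

$$f_A(w) = N(Aw), \quad \text{for any vector } w \in W^\circ. \tag{6}$$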
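The definitions (4) to (6) translate directly into a few lines of code. The following is a minimal Python sketch, assuming NumPy is available; the function names `hilbert_metric`, `normalize` and `f_A`, as well as the sample matrix and vectors, are illustrative choices and do not come from the paper. It uses the identity $\log\big((w_i/w_j)/(v_i/v_j)\big) = r_i - r_j$ with $r_i = \log(w_i/v_i)$.

```python
import numpy as np

def hilbert_metric(v, w):
    """Hilbert metric (4) for positive vectors v and w."""
    r = np.log(np.asarray(w, dtype=float) / np.asarray(v, dtype=float))
    # The max over (i, j) of r_i - r_j is max(r) - min(r).
    return float(r.max() - r.min())

def normalize(w):
    """Normalization (5): rescale w so that its entries sum to 1."""
    w = np.asarray(w, dtype=float)
    return w / w.sum()

def f_A(A, w):
    """Induced map (6): f_A(w) = N(Aw)."""
    return normalize(np.asarray(A, dtype=float) @ np.asarray(w, dtype=float))

# Illustrative data (not from the paper).
A = np.array([[2.0, 1.0], [1.0, 2.0]])
v = normalize([1.0, 3.0])
w = normalize([2.0, 1.0])

print("d_H(v, w)           =", hilbert_metric(v, w))
print("d_H(f_A(v), f_A(w)) =", hilbert_metric(f_A(A, v), f_A(A, w)))
```

Because (4) only involves ratios of coordinates, the value is unchanged when either argument is rescaled by a positive constant, which is why the normalization in (5) is only a convention.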

It is well known that $f_A$ is a contraction mapping under the Hilbert metric and the contraction coefficient $\tau(A)$, defined by

$$\tau(A) \triangleq \sup_{v \neq w \in W^\circ} \frac{d_H(Av, Aw)}{d_H(v, w)}$$

and often referred to as the Birkhoff contraction coefficient, can be explicitly computed as

$$\tau(A) = \frac{1 - \sqrt{\phi(A)}}{1 + \sqrt{\phi(A)}}, \tag{7}$$

where

$$\phi(A) = \min_{i,j,k,l} \frac{a_{ik} a_{jl}}{a_{jk} a_{il}}. \tag{8}$$
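As a sanity check on (7) and (8), the closed form can be compared with sampled values of the ratio $d_H(Av, Aw)/d_H(v, w)$. The sketch below is an illustration under stated assumptions: it uses NumPy, the helper names (`phi`, `birkhoff_tau`, `max_observed_ratio`) and the sampling scheme are hypothetical, and random sampling only gives a lower estimate of the supremum.

```python
import numpy as np

def hilbert_metric(v, w):
    """Hilbert metric (4); it is scale invariant, so Av need not be normalized."""
    r = np.log(np.asarray(w, dtype=float) / np.asarray(v, dtype=float))
    return float(r.max() - r.min())

def phi(A):
    """phi(A) in (8): min over i, j, k, l of a_ik * a_jl / (a_jk * a_il)."""
    A = np.asarray(A, dtype=float)
    # ratios[i, j, k, l] = a_ik * a_jl / (a_jk * a_il)
    ratios = np.einsum("ik,jl->ijkl", A, A) / np.einsum("jk,il->ijkl", A, A)
    return float(ratios.min())

def birkhoff_tau(A):
    """Birkhoff contraction coefficient via the closed form (7)."""
    s = np.sqrt(phi(A))
    return float((1.0 - s) / (1.0 + s))

def max_observed_ratio(A, trials=1000, seed=0):
    """Largest sampled d_H(Av, Aw) / d_H(v, w); a lower estimate of tau(A)."""
    rng = np.random.default_rng(seed)
    n = A.shape[0]
    best = 0.0
    for _ in range(trials):
        v = rng.dirichlet(np.ones(n))
        w = rng.dirichlet(np.ones(n))
        best = max(best, hilbert_metric(A @ v, A @ w) / hilbert_metric(v, w))
    return best

A = np.array([[2.0, 1.0], [1.0, 2.0]])  # illustrative matrix: phi = 1/4, tau = 1/3
print(phi(A), birkhoff_tau(A), max_observed_ratio(A))
```

For a rank-one positive matrix every ratio in (8) equals $1$, so $\phi(A) = 1$ and $\tau(A) = 0$, matching the fact that $f_A$ is then constant.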

We are now ready to state the aforementioned stronger result:

###### Theorem 1.1.

For an $n \times n$ positive matrix $A$, we have

$$\kappa(A) \le \tau(A). \tag{9}$$

As mentioned before, Theorem 1.1 follows from a theorem in [11], which is a contraction result with respect to the Hopf oscillation. Ostrowski modified Birkhoff's argument in [3] and gave an alternative proof of Theorem 1.1, which however still used the Hopf oscillation. In this work, we give a new proof of Theorem 1.1 using a complex extension of the Hilbert metric in lieu of the Hopf oscillation. As it turns out, the complex Hilbert metric can be applied elsewhere; more specifically, it has been used [7, 8] to establish the analyticity of the entropy rate of hidden Markov chains and to specify the corresponding domain of analyticity.
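To see how (9) compares with (1) and (2), note that every ratio in (8) is at least $m^2/M^2$, so $\sqrt{\phi(A)} \ge m/M$ and (7) gives $\tau(A) \le (M-m)/(M+m)$; moreover $(M-m)/(M+m) \le (M^2-m^2)/(M^2+m^2)$ because $(M+m)^2 \ge M^2+m^2$. The following Python sketch checks this chain numerically on a few random positive matrices. It assumes NumPy; the helper names and the random test matrices are illustrative and not part of the paper.

```python
import numpy as np

def spectral_ratio(A):
    """kappa(A): second largest eigenvalue modulus divided by rho(A)."""
    mods = np.sort(np.abs(np.linalg.eigvals(A)))[::-1]
    return float(mods[1] / mods[0])

def birkhoff_tau(A):
    """tau(A) via (7) and (8)."""
    ratios = np.einsum("ik,jl->ijkl", A, A) / np.einsum("jk,il->ijkl", A, A)
    s = np.sqrt(ratios.min())
    return float((1.0 - s) / (1.0 + s))

def hopf_bound(A):
    """Right-hand side of (2)."""
    M, m = A.max(), A.min()
    return float((M - m) / (M + m))

def ostrowski_bound(A):
    """Right-hand side of (1)."""
    M, m = A.max(), A.min()
    return float((M**2 - m**2) / (M**2 + m**2))

def bounds(A):
    """Return (kappa, tau, Hopf bound, Ostrowski bound) for a positive matrix."""
    A = np.asarray(A, dtype=float)
    return spectral_ratio(A), birkhoff_tau(A), hopf_bound(A), ostrowski_bound(A)

rng = np.random.default_rng(0)
for n in (2, 3, 5):
    A = rng.uniform(0.1, 1.0, size=(n, n))  # random positive matrix
    print(n, bounds(A))

# A small example in which (2) and (9) hold with equality.
print(bounds([[2.0, 1.0], [1.0, 2.0]]))
```

For the symmetric $2 \times 2$ example the eigenvalues are $3$ and $1$, so $\kappa(A) = 1/3$, which coincides with both $\tau(A)$ and the bound in (2).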

## 2 A Complex Hilbert Metric

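Let $W_C = \{w = (w_1, w_2, \ldots, w_n) \in \mathbb{C}^n : \sum_{i=1}^{n} w_i = 1\}$ and let $W_C^+ = \{v \in W_C : \Re(v_i / v_j) > 0 \text{ for all } i, j\}$. The following complex extension of the Hilbert metric has been proposed in [8]:

$$d_H(v, w) = \max_{i,j} \left| \log\left( \frac{w_i / w_j}{v_i / v_j} \right) \right|, \quad \text{for any } v, w \in W_C^+, \tag{10}$$

where $\log$ is taken as the principal branch of the complex $\log(\cdot)$ function. Here we remark that there are other complex extensions of the Hilbert metric; see, e.g., [20, 6]. Our treatment, however, only uses the extension in (10), which will henceforth be referred to as the complex Hilbert metric. For any $\varepsilon > 0$, we define

$$W_C^\circ(\varepsilon) \triangleq \{ w = (w_1, w_2, \ldots, w_n) \in W_C : \exists\, v \in W^\circ \text{ such that } |w_i - v_i| \le \varepsilon v_i \text{ for all } i \}. \tag{11}$$

It can be easily verified that $W_C^\circ(\varepsilon) \subset W_C^+$ for $\varepsilon > 0$ small enough, and thereby the complex Hilbert metric is well-defined on $W_C^\circ(\varepsilon)$.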

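Extending the definition in (5), for any complex vector $w = (w_1, w_2, \ldots, w_n)$ with $w_1 + w_2 + \cdots + w_n \neq 0$, we define its normalized version as

$$N(w) = \frac{(w_1, w_2, \ldots, w_n)}{w_1 + w_2 + \cdots + w_n},$$

which obviously belongs to $W_C$. Furthermore, for any $\varepsilon > 0$, extending the definition in (6), we define $f_A : W_C^\circ(\varepsilon) \to W_C$ by:

$$f_A(w) = N(Aw), \quad \text{for any vector } w \in W_C^\circ(\varepsilon), \tag{12}$$

which is well-defined if $\varepsilon$ is small enough.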
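The complex metric (10) and the extended maps can be sketched in the same way as their real counterparts. The Python snippet below is a minimal illustration, assuming NumPy, whose `np.log` returns the principal branch for complex input; the function names and the sample points are hypothetical. It shows that (10) agrees with (4) on positive real vectors and is invariant under multiplication by a nonzero complex scalar.

```python
import numpy as np

def complex_hilbert_metric(v, w):
    """Complex Hilbert metric (10), using the principal branch of log."""
    v = np.asarray(v, dtype=complex)
    w = np.asarray(w, dtype=complex)
    # ratios[i, j] = (w_i / w_j) / (v_i / v_j)
    ratios = (w[:, None] / w[None, :]) / (v[:, None] / v[None, :])
    return float(np.abs(np.log(ratios)).max())

def normalize(w):
    """Complex normalization: divide by the (nonzero) sum of the entries."""
    w = np.asarray(w, dtype=complex)
    return w / w.sum()

def f_A(A, w):
    """Extended map (12): f_A(w) = N(Aw)."""
    return normalize(np.asarray(A, dtype=complex) @ np.asarray(w, dtype=complex))

# Illustrative points: two real points of the simplex and a small complex perturbation.
v = np.array([0.25, 0.75])
w = np.array([2.0 / 3.0, 1.0 / 3.0])
w_c = normalize(w * (1.0 + 0.01j * np.array([1.0, -1.0])))

print(complex_hilbert_metric(v, w))    # real inputs: same value as (4)
print(complex_hilbert_metric(v, w_c))  # slightly perturbed value
```

Only ratios of coordinates enter (10), so the choice of normalization again does not affect the distance.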

The following lemma has been implicitly established in [8]. We outline its proof for completeness and clarity. An interested reader may refer to the proofs and relevant lemmas therein for more technical details.

###### Lemma 2.1.

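Consider an $n \times n$ positive square matrix $A$. For any small enough $\varepsilon > 0$, there exists $\tau_\varepsilon(A)$ such that for any $x, y \in W_C^\circ(\varepsilon)$,

$$d_H(f_A(x), f_A(y)) \le \tau_\varepsilon(A)\, d_H(x, y), \tag{13}$$

and moreover, $\tau_\varepsilon(A)$ tends to $\tau(A)$ as $\varepsilon$ tends to $0$.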

###### Proof.

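First of all, we note, by the definition in (10), that for any $x, y \in W_C^\circ(\varepsilon)$ with $x \neq y$,

$$\frac{d_H(f_A(x), f_A(y))}{d_H(x, y)} = \frac{d_H(N(Ax), N(Ay))}{d_H(x, y)} = \max_{i,j} |L_{i,j}|,$$

where

$$L_{i,j} = \frac{\log\left( \dfrac{\sum_m a_{im} x_m}{\sum_m a_{jm} x_m} \right) - \log\left( \dfrac{\sum_m a_{im} y_m}{\sum_m a_{jm} y_m} \right)}{\max_{k,l} \left| \log(x_k / y_k) - \log(x_l / y_l) \right|}.$$

Letting $c_m = \log(x_m / y_m)$ for all $m$ and choosing $p, q$ such that $|c_p - c_q| = \max_{k,l} |c_k - c_l|$, we note that $L_{i,j}$ can be rewritten as

$$L_{i,j} = \frac{\log\left( \dfrac{\sum_m a_{im} e^{c_m - c_q} y_m}{\sum_m a_{jm} e^{c_m - c_q} y_m} \right) - \log\left( \dfrac{\sum_m a_{im} y_m}{\sum_m a_{jm} y_m} \right)}{|c_p - c_q|}.$$

An application of the mean value theorem then yields that there exists $\xi \in [0, 1]$ such that

$$|L_{i,j}| \le \left| \sum_l \frac{c_l - c_q}{|c_p - c_q|} \left( \frac{e^{(c_l - c_q)\xi} a_{il} y_l}{\sum_m e^{(c_m - c_q)\xi} a_{im} y_m} - \frac{e^{(c_l - c_q)\xi} a_{jl} y_l}{\sum_m e^{(c_m - c_q)\xi} a_{jm} y_m} \right) \right|.$$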

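By the definition of $W_C^\circ(\varepsilon)$, there exist $x^\circ, y^\circ \in W^\circ$ such that for some constant $C_1 > 0$,

$$|x_k - x^\circ_k| \le C_1 \varepsilon x^\circ_k, \qquad |y_k - y^\circ_k| \le C_1 \varepsilon y^\circ_k \quad \text{for all } k.$$

Now, let

$$D_l = \frac{e^{(c_l - c_q)\xi} a_{il} y_l}{\sum_m e^{(c_m - c_q)\xi} a_{im} y_m} - \frac{e^{(c_l - c_q)\xi} a_{jl} y_l}{\sum_m e^{(c_m - c_q)\xi} a_{jm} y_m},$$

and

$$D^\circ_l = \frac{e^{(c^\circ_l - c^\circ_q)\xi} a_{il} y^\circ_l}{\sum_m e^{(c^\circ_m - c^\circ_q)\xi} a_{im} y^\circ_m} - \frac{e^{(c^\circ_l - c^\circ_q)\xi} a_{jl} y^\circ_l}{\sum_m e^{(c^\circ_m - c^\circ_q)\xi} a_{jm} y^\circ_m},$$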

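where we have, similarly as above, defined $c^\circ_m = \log(x^\circ_m / y^\circ_m)$ for all $m$. It then follows from the established facts that, for some constant $C_2 > 0$,

$$\left| \sum_l \frac{c_l - c_q}{|c_p - c_q|} D_l - \sum_l \frac{c_l - c_q}{|c_p - c_q|} D^\circ_l \right| \le C_2 C_1 \varepsilon$$

and

$$\left| \sum_l \frac{c_l - c_q}{|c_p - c_q|} D^\circ_l \right| \le \tau(A)$$

that

$$\left| \sum_l \frac{c_l - c_q}{|c_p - c_q|} D_l \right| \le C_2 C_1 \varepsilon + \tau(A),$$

which immediately implies that

$$\frac{d_H(f_A(x), f_A(y))}{d_H(x, y)} \le C_2 C_1 \varepsilon + \tau(A).$$

Setting $\tau_\varepsilon(A) = C_2 C_1 \varepsilon + \tau(A)$ and noting that $\varepsilon$ can be chosen arbitrarily small, we establish (13) and conclude that $\tau_\varepsilon(A)$ tends to $\tau(A)$ as $\varepsilon$ tends to $0$. ∎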
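The statement of Lemma 2.1 can be probed numerically by perturbing real points of the simplex into the complex domain and measuring the contraction ratio. The Python sketch below is only an illustration under stated assumptions: it uses NumPy, the helper `perturb` and the sampling scheme are hypothetical, the matrix is an arbitrary positive example, and sampling can only suggest (not prove) that the observed ratios stay close to or below $\tau(A)$ for small $\varepsilon$.

```python
import numpy as np

def complex_hilbert_metric(v, w):
    """Complex Hilbert metric (10), principal branch of log."""
    v = np.asarray(v, dtype=complex)
    w = np.asarray(w, dtype=complex)
    ratios = (w[:, None] / w[None, :]) / (v[:, None] / v[None, :])
    return float(np.abs(np.log(ratios)).max())

def f_A(A, w):
    """Extended map (12): f_A(w) = N(Aw)."""
    u = A @ w
    return u / u.sum()

def birkhoff_tau(A):
    """tau(A) via (7) and (8)."""
    ratios = np.einsum("ik,jl->ijkl", A, A) / np.einsum("jk,il->ijkl", A, A)
    s = np.sqrt(ratios.min())
    return float((1.0 - s) / (1.0 + s))

def perturb(rng, base, eps):
    """Hypothetical helper: move each entry of `base` by a relative complex
    amount of modulus at most eps, then renormalize so the entries sum to 1
    (renormalizing changes the relative deviation only by a factor near 1)."""
    radius = eps * rng.uniform(size=base.size)
    angle = rng.uniform(0.0, 2.0 * np.pi, size=base.size)
    w = base * (1.0 + radius * np.exp(1j * angle))
    return w / w.sum()

def max_complex_ratio(A, eps, trials=2000, seed=0):
    """Largest sampled d_H(f_A(x), f_A(y)) / d_H(x, y) near the real simplex."""
    rng = np.random.default_rng(seed)
    n = A.shape[0]
    best = 0.0
    for _ in range(trials):
        x = perturb(rng, rng.dirichlet(np.ones(n)), eps)
        y = perturb(rng, rng.dirichlet(np.ones(n)), eps)
        ratio = complex_hilbert_metric(f_A(A, x), f_A(A, y)) / complex_hilbert_metric(x, y)
        best = max(best, ratio)
    return best

A = np.array([[1.0, 2.0, 3.0],
              [3.0, 1.0, 2.0],
              [2.0, 3.0, 1.0]])  # illustrative positive matrix
print("tau(A)              =", birkhoff_tau(A))
print("max sampled ratio   =", max_complex_ratio(A, eps=1e-3))
```

Increasing `eps` in this experiment gives a rough feel for how the admissible constant $\tau_\varepsilon(A)$ may grow with the size of the complex neighborhood.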

## 3 Proof of Theorem 1.1

For a subset $S$ of $W^\circ$, we generalize the definition in (11) and define

$$S_C(\varepsilon) \triangleq \{ w = (w_1, w_2, \ldots, w_n) \in W_C : \exists\, v \in S \text{ such that } |w_i - v_i| \le \varepsilon v_i \text{ for all } i \}.$$

We will need the following lemma, which, roughly speaking, asserts the equivalence between the Euclidean metric (denoted by $d_E$) and the Hilbert metric on a complex neighborhood of a compact subset of $W^\circ$.

###### Lemma 3.1.

For any compact subset $S$ of $W^\circ$, there exist $\varepsilon_0 > 0$ and constants $G_1, G_2 > 0$ such that for all $0 < \varepsilon < \varepsilon_0$ and for all $v, w \in S_C(\varepsilon)$,

$$G_1 d_H(v, w) \le d_E(v, w) \le G_2 d_H(v, w).$$

###### Proof.

The lemma follows from some straightforward arguments underpinned by the mean value theorem and the compactness of $S$, which are completely parallel to those used to prove the real version of this lemma. ∎

We are now ready for the proof of Theorem 1.1.

###### Proof.

Consider an $n \times n$ positive square matrix $A$. Let $x$ be the eigenvector corresponding to $\rho(A)$. By the Perron-Frobenius theorem, we can choose $x$ to be a positive vector with $\sum_{i=1}^{n} x_i = 1$, i.e., $x \in W^\circ$. Let $\lambda$ be an eigenvalue of $A$ that is different from $\rho(A)$ and let $y$ be a corresponding eigenvector. Here we remark that while $\rho(A)$ and $x$ are real, $\lambda$ and $y$ can be complex.

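Now, consider a compact subset $S$ of $W^\circ$ that contains $x$. It can be easily verified that for any $\varepsilon > 0$, there exists $k_0$ such that for any $k \ge k_0$,

$$N(A^k(x + y)) = N(\rho^k(A)\, x + \lambda^k y) \in S_C(\varepsilon).$$

Henceforth, we fix such a $k$ and let $v = \rho^k(A)\, x$ and $w = \lambda^k y$, so that $Av = \rho(A) v$ and $Aw = \lambda w$. For any positive integer $m$, it can be verified that

$$d_H(N(A^m v), N(A^m(v + w))) = d_H(N(\rho(A)^m v), N(\rho(A)^m v + \lambda^m w)) = d_H(N(v), N(v + \tilde{\lambda}^m w)),$$

where we have written $\lambda / \rho(A)$ as $\tilde{\lambda}$ for notational simplicity. Now, using the definition of the complex Hilbert metric, we continue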

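$$\begin{aligned}
d_H(N(A^m v), N(A^m(v + w)))
&= \max_{i,j = 1, 2, \ldots, n} \left| \log \frac{(v_i + \tilde{\lambda}^m w_i) / (v_j + \tilde{\lambda}^m w_j)}{v_i / v_j} \right| \\
&= \max_{i,j = 1, 2, \ldots, n} \left| \log \frac{1 + \tilde{\lambda}^m (w_i / v_i)}{1 + \tilde{\lambda}^m (w_j / v_j)} \right| \\
&= \max_{i,j = 1, 2, \ldots, n} \left| \log\left( 1 + \tilde{\lambda}^m \frac{(w_i / v_i) - (w_j / v_j)}{1 + \tilde{\lambda}^m (w_j / v_j)} \right) \right| \\
&= \max_{i,j = 1, 2, \ldots, n} \left| \log\left( 1 + \frac{(w_i / v_i) - (w_j / v_j)}{(1 / \tilde{\lambda}^m) + (w_j / v_j)} \right) \right| \\
&= \left| \log\left( 1 + \frac{(w_{i_0} / v_{i_0}) - (w_{j_0} / v_{j_0})}{(1 / \tilde{\lambda}^m) + (w_{j_0} / v_{j_0})} \right) \right|,
\end{aligned} \tag{14}$$

where we have assumed that $i_0, j_0$ achieve the maximum in (14). We note that $(w_{i_0} / v_{i_0}) - (w_{j_0} / v_{j_0}) \neq 0$, since otherwise it would mean that $w_i / v_i = w_j / v_j$ for all $i, j$, and therefore $w$ would be a scaled version of $v$, contradicting the fact that $\lambda$ is different from $\rho(A)$.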

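It follows from the fact that $|\tilde{\lambda}| < 1$ that there exists a constant $C_1 > 0$ such that for all $m$,

$$d_H(N(A^m v), N(A^m(v + w))) = \left| \log\left( 1 + \frac{(w_{i_0} / v_{i_0}) - (w_{j_0} / v_{j_0})}{(1 / \tilde{\lambda}^m) + (w_{j_0} / v_{j_0})} \right) \right| \ge C_1 \left| \frac{(w_{i_0} / v_{i_0}) - (w_{j_0} / v_{j_0})}{(1 / \tilde{\lambda}^m) + (w_{j_0} / v_{j_0})} \right|.$$

And by Lemmas 2.1 and 3.1, there exist $\varepsilon > 0$ and a constant $C_2 > 0$ such that

$$d_H(N(A^m v), N(A^m(v + w))) \le C_2\, \tau_\varepsilon^m(A)\, d_E(N(v), N(v + w)),$$

which immediately implies that

$$C_1 \left| \frac{1}{(1 / \tilde{\lambda}^m) + (w_{j_0} / v_{j_0})} \right| \le C_2\, \tau_\varepsilon^m(A)\, \frac{d_E(N(v), N(v + w))}{|(w_{i_0} / v_{i_0}) - (w_{j_0} / v_{j_0})|}.$$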

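One then verifies that there exists a constant $C_3 > 0$ such that

$$\frac{d_E(N(v), N(v + w))}{|(w_{i_0} / v_{i_0}) - (w_{j_0} / v_{j_0})|} \le C_3,$$

and furthermore, there exists a constant $C_4 > 0$ such that for all $m$,

$$\left| \frac{1}{(1 / \tilde{\lambda}^m) + (w_{j_0} / v_{j_0})} \right| \ge C_4 |\tilde{\lambda}|^m.$$

It then follows that after choosing $\varepsilon$ small enough and then $m$ large enough, we have

$$C_1 C_4 |\tilde{\lambda}|^m \le C_2 C_3\, \tau_\varepsilon^m(A),$$

which, upon taking $m$-th roots and letting $m$ tend to infinity, yields $|\tilde{\lambda}| \le \tau_\varepsilon(A)$, where we have used the fact that all the constants can be chosen independent of $m$. Moreover, using the fact that $\varepsilon$ can be chosen arbitrarily small, we apply Lemma 2.1 to obtain $|\tilde{\lambda}| \le \tau(A)$, which immediately leads to $\kappa(A) \le \tau(A)$, as desired. ∎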
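The mechanism behind the proof can be observed numerically: the distance $d_H(N(v), N(v + \tilde{\lambda}^m w))$ in (14) decays geometrically at rate $|\tilde{\lambda}|$, and this rate cannot exceed $\tau(A)$. The Python sketch below illustrates this on a positive circulant matrix with complex subdominant eigenvalues. It assumes NumPy; the matrix, the scaling factor `0.1` applied to the eigenvector, and the helper names are illustrative choices rather than part of the paper.

```python
import numpy as np

def complex_hilbert_metric(v, w):
    """Complex Hilbert metric (10), principal branch of log."""
    v = np.asarray(v, dtype=complex)
    w = np.asarray(w, dtype=complex)
    ratios = (w[:, None] / w[None, :]) / (v[:, None] / v[None, :])
    return float(np.abs(np.log(ratios)).max())

def birkhoff_tau(A):
    """tau(A) via (7) and (8)."""
    ratios = np.einsum("ik,jl->ijkl", A, A) / np.einsum("jk,il->ijkl", A, A)
    s = np.sqrt(ratios.min())
    return float((1.0 - s) / (1.0 + s))

A = np.array([[1.0, 2.0, 3.0],
              [3.0, 1.0, 2.0],
              [2.0, 3.0, 1.0]])  # illustrative positive circulant matrix

eigvals, eigvecs = np.linalg.eig(A)
order = np.argsort(-np.abs(eigvals))       # indices sorted by decreasing modulus
rho = eigvals[order[0]].real               # Perron eigenvalue rho(A)
v = eigvecs[:, order[0]].real
v = v / v.sum()                            # positive Perron eigenvector in the simplex
lam = eigvals[order[1]]                    # a subdominant (complex) eigenvalue
w = 0.1 * eigvecs[:, order[1]]             # small multiple of its eigenvector
lam_tilde = lam / rho

def distance_after(m):
    """d_H(N(v), N(v + lam_tilde^m w)), the quantity expanded in (14)."""
    u = v + lam_tilde**m * w
    return complex_hilbert_metric(v, u / u.sum())

d = [distance_after(m) for m in range(12)]
rate = d[11] / d[10]                       # empirical geometric decay rate
tau = birkhoff_tau(A)
print("|lambda~| =", abs(lam_tilde), " observed rate =", rate, " tau(A) =", tau)
```

Here the observed decay rate matches $|\tilde{\lambda}| = \sqrt{3}/6$, which is strictly below $\tau(A)$, so for this matrix the inequality (9) is not tight.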

Acknowledgement. This work is supported by the Research Grants Council of the Hong Kong Special Administrative Region, China, under Project 17301017 and by the National Natural Science Foundation of China, under Project 61871343.

## References

• [1] R. Bapat and T. Raghavan. Nonnegative Matrices and Applications, New York: Cambridge University Press, 1997.
• [2] A. Berman and R. Plemmons. Nonnegative Matrices in the Mathematical Sciences, Philadelphia, Pa.: Society for Industrial and Applied Mathematics, 1994.
• [3] G. Birkhoff. Extensions of Jentzsch's Theorem. Transactions of the American Mathematical Society, vol. 85, no. 1, pp. 219–227, 1957.
• [4] A. Brouwer and W. Haemers. Spectra of Graphs, Springer, New York, 2012.
• [5] F. Chung. Spectral Graph Theory, Providence, R.I.: Published for the Conference Board of the Mathematical Sciences by the American Mathematical Society, 1997.
• [6] L. Dubois. Projective metrics and contraction principles for complex cones. Journal of the London Mathematical Society, vol. 79, no. 3, pp. 719–727, 2009.
• [7] G. Han and B. Marcus. Analyticity of entropy rate of hidden Markov chains. IEEE Trans. Info. Theory, vol. 52, no. 12, pp. 5251–5266, 2006.
• [8] G. Han, B. Marcus and Y. Peres. A note on a complex Hilbert metric with application to domain of analyticity for entropy rate of hidden Markov processes. Entropy of Hidden Markov Processes and Connections to Dynamical Systems, London Mathematical Society Lecture Note Series, vol. 385, pp. 98–116, 2011.
• [9] G. Frobenius. Über Matrizen aus positiven Elementen. Sitzungsberichte Preussische Akademie der Wissenschaft, Berlin, pp. 471–476, 514–518, 1908, 1909.
• [10] G. Frobenius. Über Matrizen aus nicht negativen Elementen. Sitzungsberichte Preussische Akademie der Wissenschaft, Berlin, pp. 456–477, 1912.
• [11] E. Hopf. An inequality for positive linear integral operators. J. Math. Mech., vol. 12, no. 5, pp. 683–692, 1963.
• [12] B. Lemmens and R. Nussbaum. Nonlinear Perron-Frobenius Theory, Cambridge University Press, 2012.
• [13] D. Levin and Y. Peres. Markov Chains and Mixing Times, American Mathematical Society, 2nd Revised Edition, 2017.
• [14] H. Minc. Nonnegative Matrices, New York: Wiley, 1988.
• [15] R. Montenegro and P. Tetali. Mathematical Aspects of Mixing Times in Markov Chains, Foundations and Trends in Theoretical Computer Science, Now Publishers, 2006.
• [16] A. Ostrowski. On positive matrices. Math. Ann., vol. 150, no. 3, pp. 276–284, 1963.
• [17] A. Ostrowski. Positive matrices and functional analysis. Recent Advances in Matrix Theory, Madison: Univ. of Wisconsin Press, 1964.
• [18] O. Perron. Grundlagen für eine Theorie des Jacobischen Kettenbruchalgorithmus. Math. Ann., vol. 64, pp. 11–76, 1907.
• [19] U. Rothblum and C. Tan. Upper bounds on the maximum modulus of subdominant eigenvalues of nonnegative matrices. Linear Algebra Appl., vol. 66, pp. 45–86, 1985.
• [20] H. Rugh. Cones and gauges in complex spaces: Spectral gaps and complex Perron-Frobenius theory. Annals of Mathematics, vol. 171, no. 3, 2010.
• [21] E. Seneta. Non-negative Matrices and Markov Chains, Springer Series in Statistics, Springer-Verlag, New York Heidelberg Berlin, 1980.