1. Introduction and main results
Mixed discriminants were introduced by A.D. Alexandrov [Al38] in his work on mixed volumes and what was later called “the Alexandrov-Fenchel inequality”. Mixed discriminants generalize permanents and have also found independent applications in problems of combinatorial counting, see, for example, Chapter 5 of [BR97], as well as in determinantal point processes [C+17], [KT12]. Recently, they made a spectacular appearance in the “mixed characteristic polynomial” introduced by Marcus, Spielman and Srivastava in their solution of the Kadison-Singer problem [M+15]. Over the years, the problem of computing or approximating mixed discriminants efficiently attracted some attention [GS02], [Gu05], [CP16].
In this paper, we establish some stability properties of mixed discriminants (the absence of zeros in certain complex domains) and, as a corollary, construct efficient algorithms to approximate the mixed discriminant of some sets of matrices. For example, we show that the mixed discriminant of positive semidefinite matrices can be approximated within any given relative error $\epsilon > 0$ in quasi-polynomial time, provided the distance of each matrix to the identity matrix in the operator norm does not exceed some absolute constant. We also consider the case when all matrices have rank 2, shown to be #P-hard by Gurvits [Gu05], and provide a quasi-polynomial approximation algorithm in a particular situation.
(1.1) Definitions and properties
Let $Q_1, \ldots, Q_n$ be an $n$-tuple of $n \times n$ complex matrices. The mixed discriminant of $Q_1, \ldots, Q_n$ is defined by
$$D\left(Q_1, \ldots, Q_n\right) = \frac{\partial^n}{\partial t_1 \cdots \partial t_n} \det\left(t_1 Q_1 + \cdots + t_n Q_n\right).$$
The determinant in the right hand side is a homogeneous polynomial of degree $n$ in the complex variables $t_1, \ldots, t_n$, and $D(Q_1, \ldots, Q_n)$ is the coefficient of the monomial $t_1 \cdots t_n$. It is not hard to see that $D(Q_1, \ldots, Q_n)$ is a polynomial of degree $n$ in the entries of $Q_1, \ldots, Q_n$: assuming that $Q_k = \left(q^{(k)}_{ij}\right)$ for $k = 1, \ldots, n$, we have
$$(1.1.1) \qquad D\left(Q_1, \ldots, Q_n\right) = \sum_{\sigma, \tau \in S_n} \operatorname{sgn}(\sigma) \operatorname{sgn}(\tau) \prod_{k=1}^{n} q^{(k)}_{\sigma(k) \tau(k)},$$
where $S_n$ is the symmetric group of permutations of $\{1, \ldots, n\}$. It follows from (1.1.1) that $D(Q_1, \ldots, Q_n)$ is linear in each argument $Q_k$ and symmetric under permutations of $Q_1, \ldots, Q_n$, see, for example, Section 4.5 of [Ba16].
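As a numerical sanity check (not part of the paper; a Python sketch with NumPy, where the helper names are ours), the permutation formula (1.1.1) can be compared against a direct extraction of the coefficient of $t_1 \cdots t_n$ from $\det(t_1 Q_1 + \cdots + t_n Q_n)$; since the determinant is homogeneous of degree $n$, that coefficient equals the inclusion-exclusion sum $\sum_{S} (-1)^{n - |S|} \det(\sum_{k \in S} Q_k)$:

```python
import itertools
import numpy as np

def sign(perm):
    # parity of a permutation, computed by counting inversions
    s = 1
    for i in range(len(perm)):
        for j in range(i + 1, len(perm)):
            if perm[i] > perm[j]:
                s = -s
    return s

def mixed_discriminant(mats):
    # permutation formula (1.1.1):
    # D = sum over sigma, tau of sgn(sigma) sgn(tau) prod_k Q_k[sigma(k), tau(k)]
    n = len(mats)
    total = 0.0
    for sigma in itertools.permutations(range(n)):
        for tau in itertools.permutations(range(n)):
            term = sign(sigma) * sign(tau)
            for k in range(n):
                term *= mats[k][sigma[k], tau[k]]
            total += term
    return total

def mixed_discriminant_ie(mats):
    # coefficient of t_1...t_n in det(t_1 Q_1 + ... + t_n Q_n),
    # extracted by inclusion-exclusion over subsets of {1,...,n}
    n = len(mats)
    total = 0.0
    for r in range(n + 1):
        for S in itertools.combinations(range(n), r):
            M = sum((mats[k] for k in S), np.zeros((n, n)))
            total += (-1) ** (n - len(S)) * np.linalg.det(M)
    return total

rng = np.random.default_rng(0)
mats = [(M + M.T) / 2 for M in rng.standard_normal((3, 3, 3))]
assert abs(mixed_discriminant(mats) - mixed_discriminant_ie(mats)) < 1e-8
```

Both routines are exponential-time brute force; they serve only to illustrate that the two descriptions of $D$ agree.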
Mixed discriminants appear to be the most useful when the matrices $Q_1, \ldots, Q_n$ are positive semidefinite real symmetric (or complex Hermitian), in which case $D(Q_1, \ldots, Q_n) \geq 0$. For a vector $u \in \mathbb{R}^n$, let $u \otimes u$ denote the $n \times n$ matrix with the $(i, j)$-th entry equal to $u_i u_j$. It is then not hard to see that
$$(1.1.2) \qquad D\left(u_1 \otimes u_1, \ldots, u_n \otimes u_n\right) = \left(\det\left[u_1, \ldots, u_n\right]\right)^2,$$
where $\left[u_1, \ldots, u_n\right]$ is the $n \times n$ matrix with columns $u_1, \ldots, u_n$, see, for example, Section 4.5 of [Ba16]. Various applications of mixed discriminants are based on (1.1.2). Suppose that $S_1, \ldots, S_n \subset \mathbb{R}^n$ are finite sets of vectors. Let us define
$$Q_k = \sum_{u \in S_k} u \otimes u \quad \text{for } k = 1, \ldots, n.$$
From the linearity of $D(Q_1, \ldots, Q_n)$ in each argument, we obtain
$$(1.1.3) \qquad D\left(Q_1, \ldots, Q_n\right) = \sum_{u_1 \in S_1, \ldots, u_n \in S_n} \left(\det\left[u_1, \ldots, u_n\right]\right)^2.$$
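The identity (1.1.3) is easy to verify numerically for small instances; the sketch below (with hypothetical finite sets of random vectors, and our own brute-force helper for $D$) forms $Q_k = \sum_{u \in S_k} u \otimes u$ and compares the mixed discriminant with the sum of squared determinants over all choices:

```python
import itertools
import numpy as np

def mixed_discriminant(mats):
    # coefficient of t_1...t_n in det(t_1 Q_1 + ... + t_n Q_n),
    # extracted by inclusion-exclusion over subsets of {1,...,n}
    n = len(mats)
    return sum((-1) ** (n - len(S)) *
               np.linalg.det(sum((mats[k] for k in S), np.zeros((n, n))))
               for r in range(n + 1)
               for S in itertools.combinations(range(n), r))

rng = np.random.default_rng(1)
n = 3
sets = [rng.standard_normal((2, n)) for _ in range(n)]      # each S_k: 2 random vectors (rows)
Q = [sum(np.outer(u, u) for u in S) for S in sets]           # Q_k = sum of u (x) u over S_k
lhs = mixed_discriminant(Q)
# right hand side of (1.1.3): sum over all choices u_1 in S_1, ..., u_n in S_n
rhs = sum(np.linalg.det(np.column_stack(choice)) ** 2
          for choice in itertools.product(*[list(S) for S in sets]))
assert abs(lhs - rhs) < 1e-8
```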
One combinatorial application of (1.1.3) is as follows: given a connected graph $G$ with $n + 1$ vertices, color the edges of $G$ in $n$ colors. Then the number of spanning trees of $G$ containing exactly one edge of each color is naturally expressed as a mixed discriminant. More generally, this extends to counting “rainbow bases” in regular matroids with colored elements, cf. Chapter 5 of [BR97]. Another application of (1.1.3) is in determinantal point processes [C+17].
The mixed discriminant of (positive semidefinite) matrices generalizes the permanent of a (non-negative) matrix. Namely, given $n \times n$ diagonal matrices $Q_1, \ldots, Q_n$, we consider an $n \times n$ matrix $A$ whose $k$-th row is the diagonal of $Q_k$. It is easy to see that
$$D\left(Q_1, \ldots, Q_n\right) = \operatorname{per} A,$$
where the permanent of $A = \left(a_{ij}\right)$ is defined by
$$\operatorname{per} A = \sum_{\sigma \in S_n} \prod_{k=1}^{n} a_{k \sigma(k)}.$$
We note that if $Q_1, \ldots, Q_n$ are positive semidefinite then $A$ is a non-negative matrix, and that the permanent of any non-negative square matrix can be interpreted as the mixed discriminant of positive semidefinite (diagonal) matrices.
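The diagonal case is also easy to check numerically (an illustrative sketch, assuming the normalization above in which $D$ is the coefficient of $t_1 \cdots t_n$; the helpers are ours):

```python
import itertools
import numpy as np

def permanent(A):
    # per A = sum over permutations sigma of prod_i A[i, sigma(i)]
    n = A.shape[0]
    return sum(np.prod([A[i, s[i]] for i in range(n)])
               for s in itertools.permutations(range(n)))

def mixed_discriminant(mats):
    # coefficient of t_1...t_n in det(t_1 Q_1 + ... + t_n Q_n), by inclusion-exclusion
    n = len(mats)
    return sum((-1) ** (n - len(S)) *
               np.linalg.det(sum((mats[k] for k in S), np.zeros((n, n))))
               for r in range(n + 1)
               for S in itertools.combinations(range(n), r))

rng = np.random.default_rng(2)
A = rng.random((3, 3))                     # a non-negative matrix
Q = [np.diag(A[k]) for k in range(3)]      # k-th row of A becomes the diagonal of Q_k
assert abs(mixed_discriminant(Q) - permanent(A)) < 1e-8
```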
In their solution of the Kadison-Singer problem, Marcus, Spielman and Srivastava defined the mixed characteristic polynomial of $n \times n$ matrices $Q_1, \ldots, Q_n$ by
$$\mu\left[Q_1, \ldots, Q_n\right](x) = \left(\prod_{k=1}^{n} \left(1 - \frac{\partial}{\partial z_k}\right)\right) \det\left(x I + \sum_{k=1}^{n} z_k Q_k\right)\Bigg|_{z_1 = \cdots = z_n = 0},$$
[M+15], see also [MS17]. The coefficients of the mixed characteristic polynomial can be easily expressed as mixed discriminants, see Section 5.1. Finding efficiently the partition in Weaver's reformulation of the Kadison-Singer conjecture (the existence of such a partition is proven in [M+15]) reduces to bounding the roots of the mixed characteristic polynomial, which in turn makes computing the coefficients of the polynomial (which are expressed as mixed discriminants) of interest. It follows from our results that for any non-negative integer $k$ and any $\epsilon > 0$, fixed in advance, one can approximate the corresponding coefficient of the mixed characteristic polynomial within relative error $\epsilon$ in quasi-polynomial time, provided the distance from each matrix $Q_i$ to the identity matrix $I$ in the operator norm does not exceed an absolute constant.
Let $A$ be an $n \times n$ complex matrix. For a set $S \subset \{1, \ldots, n\}$, let $A_S$ be the submatrix of $A$ consisting of the entries $a_{ij}$ with $i, j \in S$. Gurvits [Gu05] noticed that computing the mixed discriminant of positive semidefinite matrices of rank 2 reduces to computing the sum
$$(1.1.4) \qquad \sum_{S \subset \{1, \ldots, n\}} \det{}^2 A_S,$$
where for $S = \emptyset$ the corresponding term is equal to 1. Indeed, applying a linear transformation if necessary, we assume that for $k = 1, \ldots, n$ we have $Q_k = e_k \otimes e_k + u_k \otimes u_k$, where $e_1, \ldots, e_n$ is the standard basis of $\mathbb{R}^n$ and $u_1, \ldots, u_n$ are some vectors in $\mathbb{R}^n$. From the linearity of the mixed discriminant in each argument and (1.1.2), it follows that
$$D\left(Q_1, \ldots, Q_n\right) = \sum_{S \subset \{1, \ldots, n\}} \det{}^2 A_S,$$
where $A$ is the Gram matrix of the vectors $u_1, \ldots, u_n$ with respect to the standard basis, that is, $a_{ij} = \langle u_i, e_j \rangle$, where $\langle \cdot, \cdot \rangle$ is the standard scalar product in $\mathbb{R}^n$, see [Gu05] for details. Computing a more general expression
$$(1.1.5) \qquad \sum_{S \subset \{1, \ldots, n\}} \left(\det A_S\right)^m$$
for an integer $m \geq 1$ is of interest in discrete determinantal point processes [KT12].
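Assuming the normal form $Q_k = e_k \otimes e_k + u_k \otimes u_k$ discussed above, with $A$ the matrix whose $i$-th row is $u_i$, the rank 2 reduction can be verified numerically for small $n$; here the sum runs over the squared principal minors $\det^2 A_S$, and the $S = \emptyset$ term equals 1 (the helpers below are our own brute-force sketches):

```python
import itertools
import numpy as np

def mixed_discriminant(mats):
    # coefficient of t_1...t_n in det(t_1 Q_1 + ... + t_n Q_n), by inclusion-exclusion
    n = len(mats)
    return sum((-1) ** (n - len(S)) *
               np.linalg.det(sum((mats[k] for k in S), np.zeros((n, n))))
               for r in range(n + 1)
               for S in itertools.combinations(range(n), r))

rng = np.random.default_rng(5)
n = 3
U = rng.standard_normal((n, n))            # row i plays the role of the vector u_i
E = np.eye(n)
Q = [np.outer(E[k], E[k]) + np.outer(U[k], U[k]) for k in range(n)]   # rank <= 2, PSD
# sum of squared principal minors det^2(A_S) over all S, including det of the
# empty submatrix, which numpy evaluates as 1 (the S = empty-set term)
rank2_sum = sum(np.linalg.det(U[np.ix_(S, S)]) ** 2
                for r in range(n + 1)
                for S in itertools.combinations(range(n), r))
assert abs(mixed_discriminant(Q) - rank2_sum) < 1e-8
```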
As a ramification of our approach, we present a quasi-polynomial algorithm of complexity $n^{O(\ln n - \ln \epsilon)}$ approximating (1.1.5) within relative error $0 < \epsilon < 1$ provided $\|A\| \leq 1 - \delta$ for any $0 < \delta < 1$, fixed in advance, where $\|A\|$ is the operator norm.
(1.2) Computational complexity
Since mixed discriminants generalize permanents, they are at least as hard to compute exactly or to approximate as permanents. Moreover, it appears that mixed discriminants are substantially harder to deal with than permanents. It is shown in [Gu05] that it is a #P-hard problem to compute $D(Q_1, \ldots, Q_n)$ even when $\operatorname{rank} Q_k = 2$ for $k = 1, \ldots, n$. In particular, computing (1.1.5) for a positive definite matrix $A$ and $m = 2$ is a #P-hard problem. In contrast, the permanent of a matrix with at most 2 non-zero entries in each row is trivial to compute. A Monte Carlo Markov Chain algorithm of Jerrum, Sinclair and Vigoda [J+04] approximates the permanent of a non-negative matrix in randomized polynomial time. Nothing similar is known or even conjectured to work for the mixed discriminant of positive semidefinite matrices. A randomized polynomial time algorithm from [Ba99] approximates the mixed discriminant of positive semidefinite matrices within a multiplicative factor of $e^{\gamma n}$, where $\gamma \approx 0.577$ is the Euler constant. A deterministic polynomial time algorithm of [GS02] approximates the mixed discriminant of positive semidefinite matrices within a multiplicative factor of $e^n$. As is shown in Section 4.6 of [Ba16], for any constant $\tau \geq 1$, fixed in advance, the scaling algorithm of [GS02] approximates the mixed discriminant within a polynomial in $n$ multiplicative factor, provided the largest eigenvalue of each matrix $Q_k$ is within a factor of $\tau$ of its smallest eigenvalue.
A combinatorial algorithm of [CP16] computes the mixed discriminant exactly in polynomial time for some class of matrices (of bounded tree width).
Our first result establishes the absence of complex zeros of the mixed discriminant $D(Q_1, \ldots, Q_n)$ when all the matrices $Q_k$ lie sufficiently close to the identity matrix $I$. In what follows, $\|Q\|$ denotes the operator norm of a matrix $Q$, which in the case of a real symmetric matrix $Q$ is the largest absolute value of an eigenvalue of $Q$.
(1.3) Theorem. There is an absolute constant $\delta_0 > 0$ such that if $Q_1, \ldots, Q_n$ are $n \times n$ real symmetric matrices satisfying
$$\left\| Q_k - I \right\| \leq \delta_0 \quad \text{for } k = 1, \ldots, n,$$
then for all $z_1, \ldots, z_n \in \mathbb{C}$ satisfying $|z_k| \leq 1$ for $k = 1, \ldots, n$, we have
$$D\left(z_1 Q_1 + \left(1 - z_1\right) I, \ldots, z_n Q_n + \left(1 - z_n\right) I\right) \neq 0.$$
We note that under the conditions of the theorem, the mixed discriminant is not confined to any particular sector of the complex plane (in other words, the reasons for the mixed discriminant to be non-zero are not quite straightforward). For example, if $Q_1 = \cdots = Q_n = (1 + \delta) I$ for a small $\delta > 0$, then
$$D\left(z Q_1 + (1 - z) I, \ldots, z Q_n + (1 - z) I\right) = n! \left(1 + \delta z\right)^n,$$
and the argument of the right hand side sweeps an interval of length linear in $n$ as $z$ ranges over the unit circle.
Applying the interpolation technique, see [Ba16], [PR17], we deduce that the mixed discriminant can be efficiently approximated if the matrices $Q_1, \ldots, Q_n$ are close to $I$ in the operator norm. Let $Q_1, \ldots, Q_n$ be matrices satisfying the conditions of Theorem 1.3. Since the mixed discriminant has no zeros in the simply connected domain (polydisc) $|z_k| < 1$, $k = 1, \ldots, n$, we can choose a branch of its logarithm in that domain. It turns out that the logarithm of the mixed discriminant can be efficiently approximated by a low (logarithmic) degree polynomial.
(1.4) Theorem. For any $0 < \delta < \delta_0$, where $\delta_0$ is the constant of Theorem 1.3, there is a constant $\gamma = \gamma(\delta) > 0$, and for any $0 < \epsilon < 1$, for any positive integer $n$, there is a polynomial $p = p_{n, \delta, \epsilon}$ in the entries of $n \times n$ real symmetric matrices $Q_1, \ldots, Q_n$, with complex coefficients, such that $\deg p \leq \gamma \left(\ln n - \ln \epsilon\right)$ and
$$\left| \ln D\left(Q_1, \ldots, Q_n\right) - p\left(Q_1, \ldots, Q_n\right) \right| \leq \epsilon$$
provided $\|Q_k - I\| \leq \delta$ for $k = 1, \ldots, n$, where the branch of the logarithm is chosen so that $\ln D$ is real for $Q_1 = \cdots = Q_n = I$.
We show that the polynomial $p$ can be computed in quasi-polynomial time $n^{O(\ln n - \ln \epsilon)}$, where the implicit constant in the “$O$” notation depends on $\delta$ alone. In other words, Theorem 1.4 implies that the mixed discriminant of positive definite matrices can be approximated within a relative error $0 < \epsilon < 1$ in quasi-polynomial time provided for each matrix $Q_k$, the ratio of any two eigenvalues is bounded by a constant $\tau > 1$, sufficiently close to 1 and fixed in advance. We note that the mixed discriminant of such $n$-tuples can vary within an exponentially large multiplicative factor of $c^n$ for some constant $c = c(\tau) > 1$.
Theorem 1.4 shows that the mixed discriminant can be efficiently approximated in some open domain in the space of $n$-tuples of symmetric matrices. A standard argument shows that unless #P-hard problems can be solved in quasi-polynomial time, the mixed discriminant cannot be computed exactly in any open domain in quasi-polynomial time: if such a domain existed, we could compute the mixed discriminant exactly at any $n$-tuple as follows. We choose a line through the desired $n$-tuple and an $n$-tuple in the domain; since the restriction of the mixed discriminant onto a line is a polynomial of degree at most $n$, we could compute it by interpolation from its values at $n + 1$ points inside the domain.
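The interpolation step in this argument is elementary; the sketch below recovers a hypothetical degree-5 polynomial (standing in for the restriction of the mixed discriminant to a line) from its values at 6 points inside $[0, 1]$, and then evaluates it far outside that interval:

```python
import numpy as np

# A polynomial of degree n is determined by its values at n + 1 points, so
# exact evaluation on any open set would yield exact evaluation everywhere
# along a line through that set.
n = 5
coeffs = np.array([3.0, -1.0, 0.0, 2.0, 4.0, -7.0])    # some degree-5 polynomial
ts = np.linspace(0.0, 1.0, n + 1)                      # n + 1 points "inside the domain"
vals = np.polyval(coeffs, ts)                          # exact values there
recovered = np.polyfit(ts, vals, n)                    # interpolation recovers the polynomial
assert np.allclose(recovered, coeffs, atol=1e-6)
# ... and hence its value anywhere on the line, e.g. far outside [0, 1]:
assert abs(np.polyval(recovered, 10.0) - np.polyval(coeffs, 10.0)) < 1e-3
```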
We deduce from the Marcus-Spielman-Srivastava bound on the roots of the mixed characteristic polynomial [MS17] the following stability result for mixed discriminants.
(1.5) Theorem. Let $\tau_0$ be the positive real solution of an explicit equation arising from the bound of [MS17], so that $\tau_0$ is an absolute constant. Suppose that $Q_1, \ldots, Q_n$ are positive semidefinite matrices such that $Q_1 + \cdots + Q_n = I$ and $\operatorname{tr} Q_k = 1$ for $k = 1, \ldots, n$. Then
$$D\left(\tau Q_1 + \left(1 - \tau\right)\frac{I}{n}, \ldots, \tau Q_n + \left(1 - \tau\right)\frac{I}{n}\right) \neq 0$$
for all complex $\tau$ with $|\tau| \leq \tau_0$.
As before, the interpolation argument produces the following algorithmic corollary.
(1.6) Theorem. For any $0 < \tau < \tau_0$, where $\tau_0$ is the constant of Theorem 1.5, there is a constant $\gamma = \gamma(\tau) > 0$, and for any $0 < \epsilon < 1$ and any positive integer $n$ there is a polynomial $p$ in the entries of $n \times n$ real symmetric matrices $Q_1, \ldots, Q_n$, with complex coefficients, such that $\deg p \leq \gamma\left(\ln n - \ln \epsilon\right)$ and
$$\left| \ln D\left(\tau Q_1 + \left(1 - \tau\right)\frac{I}{n}, \ldots, \tau Q_n + \left(1 - \tau\right)\frac{I}{n}\right) - p\left(Q_1, \ldots, Q_n\right) \right| \leq \epsilon$$
provided $Q_1, \ldots, Q_n$ are positive semidefinite matrices such that
$$(1.6.1) \qquad Q_1 + \cdots + Q_n = I \quad \text{and} \quad \operatorname{tr} Q_k = 1 \quad \text{for } k = 1, \ldots, n.$$
Again, the polynomial is constructed in quasi-polynomial time.
Some remarks are in order. An $n$-tuple $(Q_1, \ldots, Q_n)$ of positive semidefinite matrices satisfying (1.6.1) is called doubly stochastic. Gurvits and Samorodnitsky [GS02] proved that an $n$-tuple of positive definite matrices can be scaled (efficiently, in polynomial time) to a doubly stochastic $n$-tuple, that is, one can find an $n \times n$ matrix $T$, a doubly stochastic $n$-tuple $(B_1, \ldots, B_n)$, and positive real numbers $\lambda_1, \ldots, \lambda_n$ such that $Q_k = \lambda_k\, T B_k T^{T}$ for $k = 1, \ldots, n$, see also Section 4.5 of [Ba16] for an exposition. Then we have
$$D\left(Q_1, \ldots, Q_n\right) = \left(\lambda_1 \cdots \lambda_n\right) \left(\det T\right)^2 D\left(B_1, \ldots, B_n\right),$$
and hence computing the mixed discriminant for any $n$-tuple of positive semidefinite matrices reduces to that for a doubly stochastic $n$-tuple. The $n$-tuple $\left(\frac{I}{n}, \ldots, \frac{I}{n}\right)$ naturally plays the role of the “center” of the set of all doubly stochastic $n$-tuples. Let us contract the convex body of all doubly stochastic $n$-tuples towards its center with a constant coefficient $\tau$, $0 < \tau < \tau_0$. Theorem 1.5 implies that the mixed discriminants of all contracted $n$-tuples are efficiently (in quasi-polynomial time) approximable. In other words, there is a “core” of the convex body of doubly stochastic $n$-tuples, where the mixed discriminant is efficiently approximable, and that core is just a scaled copy (with a constant, small but positive, scaling coefficient) of the whole body.
Finally, we address the problem of computing (1.1.5). First, we prove the following stability result.
(1.7) Theorem. For an $n \times n$ complex matrix $A$ and a set $S \subset \{1, \ldots, n\}$, let $A_S$ be the submatrix of $A$ consisting of the entries $a_{ij}$ with $i, j \in S$. For an integer $m \geq 1$, we define a polynomial
$$p_{A, m}(z) = \sum_{S \subset \{1, \ldots, n\}} \left(\det A_S\right)^m z^{|S|},$$
with constant term 1 corresponding to $S = \emptyset$. If $\|A\| \leq 1$ then
$$p_{A, m}(z) \neq 0 \quad \text{provided } |z| < 1.$$
Consequently, by interpolation we obtain the following result.
(1.8) Theorem. For any $0 < \delta < 1$ and any integer $m \geq 1$ there is a constant $\gamma = \gamma(\delta, m) > 0$, and for any $0 < \epsilon < 1$ and any positive integer $n$ there is a polynomial $P$ in the entries of an $n \times n$ complex matrix $A$ such that $\deg P \leq \gamma\left(\ln n - \ln \epsilon\right)$ and
$$\left| \ln \left( \sum_{S \subset \{1, \ldots, n\}} \left(\det A_S\right)^m \right) - P(A) \right| \leq \epsilon$$
provided $A$ is an $n \times n$ matrix such that $\|A\| \leq 1 - \delta$.
The polynomial is constructed in quasi-polynomial time.
We prove Theorem 1.3 in Sections 2 and 3. We prove Theorem 1.4 in Section 4. In Section 5, we prove Theorems 1.5 and 1.6 and in Section 6, we prove Theorems 1.7 and 1.8.
(2.1) From matrices to quadratic forms
Let $\langle \cdot, \cdot \rangle$ be the standard scalar product in $\mathbb{R}^n$. With an $n \times n$ real symmetric matrix $Q$ we associate a quadratic form $q: \mathbb{R}^n \longrightarrow \mathbb{R}$,
$$q(x) = \langle Q x, x \rangle \quad \text{for } x \in \mathbb{R}^n.$$
Given quadratic forms $q_1, \ldots, q_n$, we define their mixed discriminant by
$$D\left(q_1, \ldots, q_n\right) = D\left(Q_1, \ldots, Q_n\right),$$
where $Q_k$ is the matrix of $q_k$. This definition does not depend on the choice of an orthonormal basis in $\mathbb{R}^n$ (as long as the scalar product remains fixed): if we change the basis, the matrices change as
$$Q_k \longmapsto U^{T} Q_k U$$
for some orthogonal matrix $U$ and all $k$, and hence the mixed discriminant does not change.
The advantage of working with quadratic forms is that it allows us to define the mixed discriminant of the restriction of the forms onto a subspace. Namely, if $q_1, \ldots, q_m$ are quadratic forms and $L \subset \mathbb{R}^n$ is a subspace with $\dim L = m$, we make $L$ into a Euclidean space with the scalar product inherited from $\mathbb{R}^n$ and define the mixed discriminant $D\left(q_1\big|_L, \ldots, q_m\big|_L\right)$ for the restrictions $q_k\big|_L: L \longrightarrow \mathbb{R}$.
We will use the following simple lemma.
(2.2) Lemma. Let $q_1, \ldots, q_n$ be quadratic forms and suppose that
$$q_n(x) = \sum_{i=1}^{r} \alpha_i \langle u_i, x \rangle^2,$$
where $\alpha_1, \ldots, \alpha_r$ are real numbers and $u_1, \ldots, u_r$ are unit vectors. Then
$$D\left(q_1, \ldots, q_n\right) = \sum_{i=1}^{r} \alpha_i\, D\left(q_1\big|_{u_i^{\perp}}, \ldots, q_{n-1}\big|_{u_i^{\perp}}\right),$$
where $u^{\perp}$ is the orthogonal complement to $u$.
Proof. This is Lemma 4.6.3 from [Ba16]. We give its proof here for completeness. By the linearity of the mixed discriminant in each argument, it suffices to check the formula when $q_n(x) = \langle u, x \rangle^2$, where $u$ is a unit vector. Let $Q_1, \ldots, Q_n$ be the matrices of $q_1, \ldots, q_n$ in an orthonormal basis $e_1, \ldots, e_n$, where $e_n = u$ is the $n$-th basis vector, and hence $Q_n$ is the matrix where the $(n, n)$-th entry is $1$ and all other entries are $0$. It follows from (1.1.1) that
$$D\left(Q_1, \ldots, Q_n\right) = D\left(Q_1', \ldots, Q_{n-1}'\right),$$
where $Q_k'$ is the upper left $(n-1) \times (n-1)$ submatrix of $Q_k$. We observe that $Q_k'$ is the matrix of the restriction $q_k\big|_{u^{\perp}}$. ∎
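The base case of the lemma, $D\left(q_1, \ldots, q_{n-1}, \langle u, x \rangle^2\right) = D$ of the restrictions, can be checked numerically in coordinates where $u = e_n$, so that restricting to $u^{\perp}$ amounts to taking upper left submatrices (a sketch under our reading of the statement; the brute-force helper is ours):

```python
import itertools
import numpy as np

def mixed_discriminant(mats):
    # coefficient of t_1...t_n in det(t_1 Q_1 + ... + t_n Q_n), by inclusion-exclusion
    n = len(mats)
    return sum((-1) ** (n - len(S)) *
               np.linalg.det(sum((mats[k] for k in S), np.zeros((n, n))))
               for r in range(n + 1)
               for S in itertools.combinations(range(n), r))

rng = np.random.default_rng(4)
n = 3
Q = [(M + M.T) / 2 for M in rng.standard_normal((n - 1, n, n))]   # q_1, ..., q_{n-1}
e_n = np.zeros(n)
e_n[-1] = 1.0
# left hand side: last form is <e_n, x>^2, i.e. the matrix e_n (x) e_n
lhs = mixed_discriminant(Q + [np.outer(e_n, e_n)])
# right hand side: mixed discriminant of the restrictions onto e_n-perp,
# i.e. of the upper left (n-1) x (n-1) submatrices
rhs = mixed_discriminant([q[:n - 1, :n - 1] for q in Q])
assert abs(lhs - rhs) < 1e-8
```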
(2.3) Comparing two restrictions
Let $q_1, \ldots, q_{n-1}$ be quadratic forms and let $u, v$ be unit vectors (we assume that $u \neq \pm v$). We would like to compare $D\left(q_1\big|_{u^{\perp}}, \ldots, q_{n-1}\big|_{u^{\perp}}\right)$ and $D\left(q_1\big|_{v^{\perp}}, \ldots, q_{n-1}\big|_{v^{\perp}}\right)$. Let $L = u^{\perp} \cap v^{\perp}$, so $L$ is a subspace of codimension 2. Let us identify $u^{\perp}$ and $v^{\perp}$ with $\mathbb{R}^{n-1}$ as Euclidean spaces (we want to preserve the scalar product but do not worry about bases) in such a way that $L$ gets identified with $\mathbb{R}^{n-2} \subset \mathbb{R}^{n-1}$. Hence the quadratic forms $q_k\big|_{u^{\perp}}$ get identified with some quadratic forms $\hat{q}_k$ and the quadratic forms $q_k\big|_{v^{\perp}}$ get identified with some quadratic forms $\tilde{q}_k$ for $k = 1, \ldots, n-1$. In particular,
$$(2.3.1) \qquad \hat{q}_k\big|_{\mathbb{R}^{n-2}} = \tilde{q}_k\big|_{\mathbb{R}^{n-2}} \quad \text{for } k = 1, \ldots, n-1.$$
Let us denote
$$h_k = \hat{q}_k - \tilde{q}_k \quad \text{for } k = 1, \ldots, n-1.$$
Hence $h_1, \ldots, h_{n-1}$ are quadratic forms on $\mathbb{R}^{n-1}$ and by (2.3.1) we have $h_k\big|_{\mathbb{R}^{n-2}} = 0$ for all $k$. It follows then that
$$h_k(x) = x_{n-1}\, \lambda_k(x) \quad \text{for } k = 1, \ldots, n-1,$$
where $x_{n-1}$ is the last coordinate of $x \in \mathbb{R}^{n-1}$ and $\lambda_1, \ldots, \lambda_{n-1}$ are some linear forms.
(2.4) Lemma. Suppose that $n \geq 3$. Let $\ell, \lambda_1, \lambda_2, \lambda_3$ be linear forms. For $k = 1, 2, 3$, let $g_k = \ell \lambda_k$ be quadratic forms and let $g_4, \ldots, g_n$ be some other quadratic forms. Then
$$D\left(g_1, \ldots, g_n\right) = 0.$$
Proof. Since the restriction of a linear form onto a subspace is a linear form on the subspace, repeatedly applying Lemma 2.2, we reduce the general case to the case of $n = 3$, in which case the mixed discriminant in question is just $D\left(g_1, g_2, g_3\right)$. On the other hand, for all real $t_1, t_2, t_3$ the quadratic form
$$t_1 g_1 + t_2 g_2 + t_3 g_3 = \ell \left(t_1 \lambda_1 + t_2 \lambda_2 + t_3 \lambda_3\right)$$
has rank at most 2, and hence the determinant of its matrix is identically 0. It follows by Definition 1.1 that $D\left(g_1, g_2, g_3\right) = 0$. ∎
(2.5) Corollary. Suppose that $n \geq 2$ and let $q_1, \ldots, q_{n-1}$ be quadratic forms. Let $u, v$ be unit vectors such that $u \neq \pm v$ and for $k = 1, \ldots, n-1$, let us define quadratic forms $\hat{q}_k$ and $\tilde{q}_k$ as in Section 2.3. Let $h_k = \hat{q}_k - \tilde{q}_k$ for $k = 1, \ldots, n-1$. Then
$$D\left(\hat{q}_1, \ldots, \hat{q}_{n-1}\right) = D\left(\tilde{q}_1, \ldots, \tilde{q}_{n-1}\right) + \sum_{k=1}^{n-1} D\left(\tilde{q}_1, \ldots, h_k, \ldots, \tilde{q}_{n-1}\right) + \sum_{1 \leq k < l \leq n-1} D\left(\tilde{q}_1, \ldots, h_k, \ldots, h_l, \ldots, \tilde{q}_{n-1}\right),$$
where the forms $h_k$ and $h_l$ replace $\tilde{q}_k$ and $\tilde{q}_l$ in the corresponding positions. If $n = 2$, the second sum in the right hand side is empty.
Proof. Since $\hat{q}_k = \tilde{q}_k + h_k$ for $k = 1, \ldots, n-1$, and since each $h_k$ is divisible by the linear form $x_{n-1}$, the proof follows by the linearity of the mixed discriminant in each argument and by Lemma 2.4. ∎
Finally, we will need a simple estimate.
(2.6) Lemma. Let $z$ and $w$ be non-zero complex numbers such that the angle between $z$ and $w$, considered as vectors in $\mathbb{R}^2 = \mathbb{C}$, does not exceed $\alpha$ for some $0 \leq \alpha < \pi$. Then
$$|z + w| \geq \left(|z| + |w|\right) \cos \frac{\alpha}{2}.$$
Proof. Rotating the plane if necessary, we write $z = r e^{i\phi}$ and $w = s e^{i\psi}$ for some real $\phi, \psi$ and $r, s > 0$ such that $|\phi| \leq \alpha/2$ and $|\psi| \leq \alpha/2$. Then
$$|z + w| \geq \Re\left(z + w\right) = r \cos \phi + s \cos \psi \geq (r + s) \cos \frac{\alpha}{2} = \left(|z| + |w|\right) \cos \frac{\alpha}{2}. \qquad ∎$$
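A numerical sanity check of the elementary estimate $|z + w| \geq (|z| + |w|)\cos(\alpha/2)$ for non-zero complex $z, w$ whose arguments lie within an angle $\alpha < \pi$ of each other (our reading of the lemma):

```python
import cmath
import math
import random

random.seed(0)
for _ in range(1000):
    alpha = random.uniform(0.0, math.pi - 1e-6)
    # place z and w so that both arguments lie in [-alpha/2, alpha/2];
    # then the angle between z and w is at most alpha
    phi = random.uniform(-alpha / 2, alpha / 2)
    psi = random.uniform(-alpha / 2, alpha / 2)
    r = random.uniform(0.1, 10.0)
    s = random.uniform(0.1, 10.0)
    z = cmath.rect(r, phi)
    w = cmath.rect(s, psi)
    assert abs(z + w) >= (r + s) * math.cos(alpha / 2) - 1e-9
```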
3. Proof of Theorem 1.3
We prove Theorem 1.3 by induction on the dimension $n$. Following Section 2, we associate with the (now complex) matrices $Q_1, \ldots, Q_n$ the (now complex-valued) quadratic forms $q_1, \ldots, q_n$, where $q_k$ is the quadratic form with matrix $Q_k$. If $L$ is a subspace then the restriction of $q_k$ onto $L$ is just the quadratic form $x \longmapsto \langle Q_k x, x \rangle$ for $x \in L$. The induction is based on the following two lemmas.
(3.1) Lemma. Let us fix a real $\delta$ such that $0 < \delta \leq \delta_0$, where $\delta_0$ is the constant of Theorem 1.3. Let $q_1, \ldots, q_n$ be quadratic forms and let $z_1, \ldots, z_n$ be complex numbers such that $|z_k| \leq 1$. Let us define the quadratic forms
$$f_k(x) = z_k q_k(x) + \left(1 - z_k\right) \|x\|^2 \quad \text{for } k = 1, \ldots, n,$$
and suppose that the following conditions hold:
Then for any unit vector $u$, we have
Proof. We write
$$f_n(x) = \sum_{i=1}^{n} \lambda_i \langle e_i, x \rangle^2,$$
where $e_1, \ldots, e_n$ are the orthonormal eigenvectors of $f_n$ and $\lambda_1, \ldots, \lambda_n$ are the corresponding eigenvalues. In particular, from condition (2) of the lemma, we have
$$\left| \lambda_i - 1 \right| \leq \delta \quad \text{for } i = 1, \ldots, n.$$
Then from Lemma 2.2, we obtain by the linearity of the mixed discriminant
$$(3.1.1) \qquad D\left(f_1, \ldots, f_n\right) = \sum_{i=1}^{n} \lambda_i\, D\left(f_1\big|_{e_i^{\perp}}, \ldots, f_{n-1}\big|_{e_i^{\perp}}\right).$$
Let us choose a unit vector $v$. Then
$$(3.1.2) \qquad D\left(f_1\big|_{e_i^{\perp}}, \ldots, f_{n-1}\big|_{e_i^{\perp}}\right) = \mu_i\, D\left(f_1\big|_{v^{\perp}}, \ldots, f_{n-1}\big|_{v^{\perp}}\right) \quad \text{for } i = 1, \ldots, n,$$
for some complex numbers $\mu_i$ as provided by Lemma 3.2.
Combining (3.1.1) and (3.1.2), we get
Applying Lemma 2.6, we obtain a lower bound (3.1.3) for the absolute value of the mixed discriminant.
The proof then follows from (3.1.3). ∎
(3.2) Lemma. Let us fix a real $\delta$ such that $0 < \delta \leq \delta_0$. Let $q_1, \ldots, q_{n-1}$ be quadratic forms and let $z_1, \ldots, z_{n-1}$ be complex numbers such that $|z_k| \leq 1$. Let us define the quadratic forms
$$f_k(x) = z_k q_k(x) + \left(1 - z_k\right) \|x\|^2 \quad \text{for } k = 1, \ldots, n-1,$$
and suppose that the following conditions hold:
Then for any two unit vectors $u$ and $v$, we have
$$D\left(f_1\big|_{u^{\perp}}, \ldots, f_{n-1}\big|_{u^{\perp}}\right) = \mu\, D\left(f_1\big|_{v^{\perp}}, \ldots, f_{n-1}\big|_{v^{\perp}}\right)$$
for some complex number $\mu = \mu(u, v)$ such that
Proof. As in Section 2.3, let us construct the quadratic forms $\hat{f}_k$ and $\tilde{f}_k$ for $k = 1, \ldots, n-1$ and the corresponding forms $h_k = \hat{f}_k - \tilde{f}_k$. Clearly, each $h_k$ vanishes on $\mathbb{R}^{n-2}$.
From condition (2) of the lemma, we have
From Corollary 2.5, we obtain an expansion (3.2.2) of $D\left(\hat{f}_1, \ldots, \hat{f}_{n-1}\right)$.
If $n = 2$ then the second sum is absent in the right hand side of (3.2.2).
We can write
$$h_k(x) = \alpha_k \langle a_k, x \rangle^2 + \beta_k \langle b_k, x \rangle^2,$$
where $\alpha_k$ and $\beta_k$ are the non-zero eigenvalues of $h_k$ with the corresponding unit eigenvectors $a_k$ and $b_k$ (since $h_k$ vanishes on a subspace of codimension 1, there are at most two such eigenvalues). By (3.2.1) we have
Applying Lemma 2.2, we obtain
where the final inequality follows by condition (1) of the lemma. For $n = 2$ we just have
Similarly, if $n \geq 3$, for every pair $1 \leq k < l \leq n-1$, we obtain
where each of the four terms in the right hand side is $0$ if the corresponding intersection of subspaces $a_k^{\perp} \cap a_l^{\perp}$, $a_k^{\perp} \cap b_l^{\perp}$, $b_k^{\perp} \cap a_l^{\perp}$ or $b_k^{\perp} \cap b_l^{\perp}$ fails to be $(n-3)$-dimensional. Hence we get