Computing Nearby Non-trivial Smith Forms

12/11/2018 · Mark Giesbrecht, et al. · University of Waterloo

We consider the problem of computing the nearest matrix polynomial with a non-trivial Smith Normal Form. We show that computing the Smith form of a matrix polynomial is amenable to numeric computation as an optimization problem. Furthermore, we describe an effective optimization technique to find a nearby matrix polynomial with a non-trivial Smith form. The results are then generalized to include the computation of a matrix polynomial having a maximum specified number of ones in the Smith form (i.e., with a maximum specified McCoy rank). We discuss the geometry and existence of solutions and how our results can be used for an error analysis. We develop an optimization-based approach and demonstrate an iterative numerical method for computing a nearby matrix polynomial with the desired spectral properties. We also describe an implementation of our algorithms and demonstrate their robustness with examples in Maple.


1 Introduction

For a given square matrix polynomial $\mathbf{A} \in \mathbb{R}[t]^{n \times n}$, one can find unimodular matrices $\mathbf{U}, \mathbf{V} \in \mathbb{R}[t]^{n \times n}$ such that $\mathbf{U} \mathbf{A} \mathbf{V}$ is a diagonal matrix $\mathbf{S}$. Unimodular means that there is a polynomial inverse matrix, or equivalently, that the determinant is a nonzero constant from $\mathbb{R}$. The unimodular matrices $\mathbf{U}$ and $\mathbf{V}$ encapsulate the polynomial row and column operations, respectively, needed for such a diagonalization. The best known diagonalization is the Smith Normal Form (SNF, or simply Smith form) of a matrix polynomial. Here

$$\mathbf{S} = \mathbf{U} \mathbf{A} \mathbf{V} = \mathrm{diag}(s_1, \ldots, s_n) \in \mathbb{R}[t]^{n \times n},$$

where $s_1, \ldots, s_n \in \mathbb{R}[t]$ are monic and $s_i \mid s_{i+1}$ for $1 \le i < n$. The Smith form always exists and is unique, though the associated unimodular matrices $\mathbf{U}$, $\mathbf{V}$ are not unique (see, e.g., (Kailath, 1980; Gohberg et al., 2009)). The diagonal entries $s_1, \ldots, s_n$ are referred to as the invariant factors of $\mathbf{A}$.
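For example (a small illustration of our own, not from the paper),

$$\begin{pmatrix} t & 1 \\ 0 & t \end{pmatrix} \ \rightsquigarrow\ \mathbf{S} = \begin{pmatrix} 1 & 0 \\ 0 & t^2 \end{pmatrix}, \qquad \begin{pmatrix} t & 0 \\ 0 & t \end{pmatrix} \ \rightsquigarrow\ \mathbf{S} = \begin{pmatrix} t & 0 \\ 0 & t \end{pmatrix}.$$

In the first case only a single invariant factor has positive degree, while in the second there are two non-trivial invariant factors; it is the latter situation, defined below as a non-trivial Smith form, whose nearby occurrence we seek.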

Matrix polynomials and their Smith forms are used in many areas of computational algebra, control systems theory, differential equations and mechanics. The Smith form is important as it effectively reveals the structure of the polynomial lattice of rows and columns, as well as the effects of localizing at individual eigenvalues. That is, it characterizes how the rank decreases as the variable $t$ is set to different values (especially eigenvalues, where the rank drops). The Smith form is closely related to the more general Smith–McMillan form for matrices of rational functions, a form that reveals the structure of the eigenvalue at infinity, or the infinite spectral structure. The eigenvalue at infinity is non-trivial if the leading coefficient matrix is rank deficient or, equivalently, if the determinant does not achieve the generic degree $nd$.

The algebra of matrix polynomials is typically described assuming that the coefficients are fixed and come from an exact arithmetic domain, usually the field of real or complex numbers. In this exact setting, computing the Smith form has been well studied, and very efficient procedures are available (see (Kaltofen and Storjohann, 2015) and the references therein). However, in some applications, particularly control theory and mechanics, the coefficients can come from measured data or contain some amount of uncertainty. Compounding this, for efficiency reasons such computations are usually performed in floating point arithmetic, approximating computations in $\mathbb{R}$ and introducing roundoff error. As such, algorithms must accommodate numerical inaccuracies, and are prone to numerical instability.

Numerical methods to compute the Smith form of a matrix polynomial typically rely on linearization and orthogonal transformations (Van Dooren and Dewilde, 1983; Beelen and Van Dooren, 1988; Demmel and Kågström, 1993a, b; Demmel and Edelman, 1995) to infer the Smith form of a nearby matrix polynomial via the Jordan blocks in the Kronecker canonical form (see (Kailath, 1980)). These linearization techniques are numerically backwards stable, and for many problems this is sufficient to ensure that the computed solutions are computationally useful when the underlying problem is continuous.

Unfortunately, the eigenvalues of a matrix polynomial are not necessarily continuous functions of its coefficients, and backwards stability is not always sufficient to ensure that computed solutions are useful in the presence of discontinuities. Previous methods are also unstructured, in the sense that the computed non-trivial Smith form may not be the Smith form of a matrix polynomial with a prescribed coefficient structure, for example one maintaining the degree of entries or not introducing additional non-zero entries or coefficients. In extreme instances, the unstructured backwards error can be arbitrarily small while the structured distance to an interesting Smith form is relatively large. Finally, existing numerical methods can also fail to compute meaningful results on some problems due to uncertainties. Examples of such problems include nearly rank deficient matrix polynomials, repeated eigenvalues or eigenvalues that are close together, and other ill-posed instances.

In this paper we consider the problem of computing a nearby matrix polynomial with a prescribed spectral structure: broadly speaking, the degrees and multiplicities of the invariant factors in the Smith form, or equivalently the structure and multiplicity of the eigenvalues of the matrix polynomial. The results presented in this paper extend those in the conference paper (Giesbrecht, Haraldson, and Labahn, 2018). This work is not so much about computing the Smith normal form of a matrix polynomial using floating point arithmetic; rather, our focus is on the computation of a nearby matrix polynomial with "an interesting" or non-generic Smith normal form. The emphasis in this work is on the finite spectral structure of a matrix polynomial, though the techniques described are easily generalized to handle the infinite spectral structure as a special case.

The lack of continuity and the resulting limitations of backward stability are not the only issues that need to be addressed when finding nearest objects in an approximate arithmetic environment. A second issue can be illustrated by recalling the well-known representation of the invariant factors of a matrix as ratios of determinantal divisors: $s_i = \delta_i / \delta_{i-1}$, where

$$\delta_i = \gcd\{\, \text{all } i \times i \text{ minors of } \mathbf{A} \,\}, \qquad \delta_0 = 1.$$

In the case of matrix polynomials, computing the nearest non-trivial Smith form is thus equivalent to finding the nearest matrix polynomial whose polynomial entries have a non-trivial GCD. Recall that approximate GCD problems can have infima that are unattainable: there are co-prime polynomials with nearby polynomials having a non-trivial GCD at distances arbitrarily close to an infimum, while at the infimum itself the GCD is trivial (see, e.g., (Giesbrecht, Haraldson, and Kaltofen, 2017a)).
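To illustrate the phenomenon (an example in the standard style, not the one from the cited reference): take $f = t$ and $g = t + 1$, viewed as polynomials of degree at most $2$. For $\varepsilon > 0$, the perturbed polynomials

$$\tilde{f} = t + \varepsilon t^2 = t\,(1 + \varepsilon t), \qquad \tilde{g} = t + 1 + (\varepsilon - \varepsilon^2)\,t^2$$

share the root $t = -1/\varepsilon$, and hence have a non-trivial GCD at distance $O(\varepsilon)$ from $(f, g)$. Yet as $\varepsilon \to 0$ the common root escapes to infinity, and at the infimum itself $\gcd(f, g) = 1$.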

The issue of unattainable infima extends to the Smith normal form. As an example, consider a matrix polynomial whose diagonal entries are co-prime polynomials. If we look for nearby diagonal entries of bounded degree with a non-trivial GCD at minimal distance, then it is shown in (Haraldson, 2015, Example 3.3.6) that this distance has an infimum that is approached asymptotically but not attained: at the infimum itself the perturbed diagonal entries are again co-prime, even though perturbations at distances arbitrarily close to the infimum produce a non-trivial GCD. For such a matrix to have a non-trivial Smith form we must perturb the diagonal entries so that they have a non-trivial GCD, and thus any such perturbation must be at a distance strictly exceeding the infimum, while the perturbation at distance precisely the infimum has a trivial Smith form. There is no merit to perturbing the off-diagonal entries.

Our work indirectly involves measuring the sensitivity of the eigenvalues and the determinant of $\mathbf{A}$. Thus we differ from most sensitivity and perturbation analyses (e.g., (Stewart, 1994; Ahmad and Alam, 2009)), since we also study how perturbations affect the invariant factors, instead of the roots of the determinant. Additionally, our theory is able to support $\mathbf{A}$ being rank deficient and having degree exceeding one. One may also approach the problem geometrically in the context of manifolds (Edelman et al., 1997, 1999). We do not consider the manifold approach directly since it does not yield numerical algorithms.

Determining what it means, numerically, for a matrix polynomial to have a non-trivial Smith form, and finding the distance from one matrix polynomial to another having an interesting or non-trivial Smith form, are formulated as finding solutions to continuous optimization problems. The main contributions of this paper are deciding when $\mathbf{A}$ has an interesting Smith form, providing bounds on a "radius of triviality" around $\mathbf{A}$, and a structured stability analysis of iterative methods to compute a structured matrix polynomial with desired spectral properties.

The remainder of the paper is organized as follows. In Section 2 we give the notation and terminology, along with some needed background. Section 3 discusses the approximate Smith form computation as an optimization problem and provides some new bounds on the distance to non-triviality. In Section 4 we present an optimization algorithm with local stability properties and rapid local convergence for computing a nearby matrix polynomial with a non-trivial Smith form, and discuss implementation details. A method to compute a matrix polynomial whose Smith form has at most a prescribed number of ones (that is, at most a prescribed McCoy rank) is discussed in Section 5. We discuss our implementation and include some examples in Section 6. The paper ends with a conclusion, along with topics for future research.

2 Preliminaries

In this section we give the notation and formal definitions needed to precisely describe the problems summarized above. We also present some existing results used as building blocks for our work. In addition, we provide a basic description of matrix functions and their first-order derivatives (Jacobian matrices) which will be needed for the optimization work central to our results.

2.1 Notation and Terminology

We make extensive use of the following terminology and definitions. A matrix polynomial $\mathbf{A} \in \mathbb{R}[t]^{m \times n}$ is an $m \times n$ matrix whose entries are polynomials in $\mathbb{R}[t]$. Typically we work with matrices whose entries have degree at most $d$, and use the notation $\mathbb{R}[t]^{m \times n}_d$ to describe this set. Alternatively, we may express matrix polynomials as $\mathbf{A}(t) = \sum_{k=0}^{d} A_k t^k$, where $A_k \in \mathbb{R}^{m \times n}$. The degree of a matrix polynomial, $\deg \mathbf{A}$, is defined to be the degree of the highest-degree non-zero entry of $\mathbf{A}$, or equivalently the largest index $k$ such that $A_k \neq 0$. We say that a square $\mathbf{A}$ has full rank or is regular if $\det \mathbf{A} \not\equiv 0$. As noted in the introduction, $\mathbf{A}$ is said to be unimodular if $\det \mathbf{A} \in \mathbb{R} \setminus \{0\}$. The (finite) eigenvalues are the roots of $\det \mathbf{A}(t)$. The norm of a polynomial $p = \sum_i p_i t^i \in \mathbb{R}[t]$ is defined as $\|p\| = \|p\|_2 = \big(\sum_i |p_i|^2\big)^{1/2}$. For matrix polynomials we define $\|\mathbf{A}\| = \|\mathbf{A}\|_F = \big(\sum_{i,j} \|\mathbf{A}_{ij}\|_2^2\big)^{1/2}$. Our choice of norm is a distributed coefficient norm, sometimes known as the Frobenius norm.

Definition 2.1 (SVD – Golub and Van Loan 2012).

The Singular Value Decomposition (SVD) of $A \in \mathbb{R}^{m \times n}$ is given by $A = U \Sigma V^T$, where $U \in \mathbb{R}^{m \times m}$ and $V \in \mathbb{R}^{n \times n}$ satisfy $U^T U = I$ and $V^T V = I$, and $\Sigma \in \mathbb{R}^{m \times n}$ is a diagonal matrix with non-negative real entries $\sigma_1 \ge \sigma_2 \ge \cdots \ge \sigma_{\min(m,n)} \ge 0$ in descending order of magnitude, the singular values of $A$. By the Eckart–Young theorem, the distance to the nearest (unstructured) matrix of rank $r$ is $\sigma_{r+1}$ in the spectral norm, and $\big(\sum_{i > r} \sigma_i^2\big)^{1/2}$ in the Frobenius norm.

For scalar matrices we frequently write $\sigma_1(A) = \sigma_{\max}(A)$ for the largest singular value, and $\sigma_{\min}(A)$ for the smallest singular value.
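As a concrete (if unstructured) illustration, the following sketch, our own and not from the paper, uses numpy to read the distance to the nearest rank $r$ matrix directly off the singular values:

```python
import numpy as np

A = np.array([[4.0, 0.0, 0.0],
              [0.0, 2.0, 0.0],
              [0.0, 0.0, 1e-8]])

sigma = np.linalg.svd(A, compute_uv=False)      # singular values, descending
r = 2
dist_spectral = sigma[r]                        # 2-norm distance to rank <= r
dist_frobenius = np.sqrt(np.sum(sigma[r:]**2))  # Frobenius distance to rank <= r
print(dist_spectral, dist_frobenius)            # both 1e-8 here
```

Structured distances, which are the subject of this paper, are in general strictly larger and much harder to compute.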

In this paper we are mainly concerned with coefficient structures that preserve the zero coefficient structure of a matrix polynomial; that is, we generally do not change zero coefficients to non-zero, or increase the degree of matrix entries.

Definition 2.2 (Affine/Linear Structure).

A nonzero matrix polynomial $\mathbf{A}$ of degree at most $d$ has a linear structure from a set $\mathcal{K} = \{\mathbf{K}_1, \ldots, \mathbf{K}_k\} \subset \mathbb{R}[t]^{m \times n}_d$ if

$$\mathbf{A} \in \mathrm{span}(\mathcal{K}) = \Big\{ \sum_{i=1}^{k} c_i \mathbf{K}_i \ :\ c_i \in \mathbb{R} \Big\}$$

as a vector space over $\mathbb{R}$, where the matrices $\mathbf{K}_1, \ldots, \mathbf{K}_k$ are linearly independent. If $\mathbf{A} = \mathbf{B} + \mathbf{C}$, where $\mathbf{B}$ is fixed and $\mathbf{C} \in \mathrm{span}(\mathcal{K})$, then $\mathbf{A}$ is said to have an affine structure from the set $\mathcal{K}$.

Linearly and affinely structured matrices are best thought of as imposing linear equality constraints on the entries. Examples of matrices with a linear structure include matrices with prescribed zero entries or coefficients, Toeplitz/Hankel matrices, Sylvester matrices, resultant-like matrices, Ruppert matrices and several other matrices appearing in symbolic-numeric computation. Matrices with an affine structure include all matrices with a linear structure and, in addition, matrices having prescribed non-zero constant entries or coefficients, for example monic matrix polynomials.

Recall that the rank of a matrix polynomial $\mathbf{A}$ is the maximum number of linearly independent rows or columns over the field of rational functions $\mathbb{R}(t)$. This is the same as the rank of the scalar matrix $\mathbf{A}(\omega)$ for any $\omega \in \mathbb{C}$ that is not an eigenvalue of $\mathbf{A}$. If we allow evaluation at eigenvalues, then the McCoy rank is the lowest rank attained when $\mathbf{A}$ is evaluated at an eigenvalue.

Definition 2.3 (McCoy Rank and Non-Trivial SNF).

The McCoy rank of $\mathbf{A} \in \mathbb{R}[t]^{n \times n}$ is $\min_{\omega \in \mathbb{C}} \mathrm{rank}(\mathbf{A}(\omega))$, the lowest rank possible when $\mathbf{A}$ is evaluated at any $\omega \in \mathbb{C}$. Note that the rank of $\mathbf{A}(\omega)$ only drops at all if $\omega$ is an eigenvalue of $\mathbf{A}$. The McCoy rank is also the number of ones in the Smith form: if $\mathbf{A}$ has $k$ non-trivial invariant factors, then the McCoy rank of $\mathbf{A}$ is $n - k$. The matrix polynomial $\mathbf{A}$ is said to have a non-trivial Smith normal form if the McCoy rank is at most $n - 2$, or equivalently, if it has two or more invariant factors of positive degree.
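For instance (a small illustration of our own), a $3 \times 3$ matrix polynomial with Smith form

$$\mathbf{S} = \mathrm{diag}\big(1,\ t,\ t(t-1)\big)$$

has two invariant factors of positive degree, hence McCoy rank $3 - 2 = 1$: at the eigenvalue $t = 0$ the rank drops to $1$, and the Smith form is non-trivial since the McCoy rank is at most $n - 2 = 1$.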

Problem 2.4 (Approximate SNF Problem).

Given a matrix polynomial $\mathbf{A} \in \mathbb{R}[t]^{n \times n}_d$, find the distance to a non-trivial SNF: find a matrix polynomial $\widehat{\mathbf{A}} \in \mathbb{R}[t]^{n \times n}_d$ of prescribed coefficient structure that has McCoy rank at most $n - 2$, such that $\|\mathbf{A} - \widehat{\mathbf{A}}\|$ is minimized, if such an $\widehat{\mathbf{A}}$ exists.

We will consider $\|\cdot\|$ to be the Frobenius norm. Computing the nearest (if it exists) matrix polynomial of McCoy rank at most $n - 2$ is the approximate SNF.

Problem 2.5 (Lower McCoy Rank Approximation Problem).

Computing the nearest (if it exists) matrix polynomial of McCoy rank at most $r$, for a prescribed $r \le n - 2$, is a lower McCoy rank approximation.

In a generic sense, the nearest matrix polynomial with an interesting SNF will have McCoy rank exactly $n - 2$ with probability one, but many matrices arising from applications are expected to have more interesting Smith forms nearby.

As described in the introduction, it is possible that the distance to a non-trivial SNF is not attainable; that is, there is a solution that is approached asymptotically, but the Smith form is trivial at the infimum itself. Fortunately, in most instances of interest, solutions will generally be attainable. We will also later discuss how to identify and compute unattainable solutions. Problem 2.4 and Problem 2.5 admit the nearest rank deficient matrix polynomial, or more generally the nearest matrix polynomial of prescribed rank, as a special case. However, the computational challenges are fundamentally different for non-trivial instances.

2.2 Embedding into Scalar Domains

In our study of nearest non-trivial Smith forms we often make use of the representation of the diagonal elements as ratios of GCDs of sub-determinants. When the entries are polynomials with numeric coefficients, it is helpful to embed the arithmetic operations of polynomial multiplication and polynomial GCD into a matrix problem having numeric coefficients (i.e., from $\mathbb{R}$). Such an embedding allows us to employ a number of tools, including condition numbers and perturbations of matrix functions.

We start with some basic notation for mapping matrices and polynomials to vectors.

Definition 2.6 (Vec Operator).

We define the operator $\mathrm{vec} : \mathbb{R}[t]_d \to \mathbb{R}^{d+1}$ as follows: for $p = \sum_{i=0}^{d} p_i t^i$,

$$\mathrm{vec}(p) = (p_0, p_1, \ldots, p_d)^T \in \mathbb{R}^{d+1}.$$

The vec operator is extended to map $\mathbb{R}[t]^{m \times n}_d$ to a single vector in $\mathbb{R}^{mn(d+1)}$ by stacking columns of (padded) coefficient vectors on top of each other.
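A minimal sketch of the vec operator (our own function names; the paper's implementation may differ), with coefficients stored in increasing degree:

```python
import numpy as np

def vec_poly(p, d):
    """Coefficient vector (p_0, ..., p_d), zero-padded to length d + 1."""
    return np.array(list(p) + [0.0] * (d + 1 - len(p)))

def vec_matrix_poly(A, d):
    """Stack the (padded) coefficient vectors of the entries of A,
    walking the matrix column by column."""
    m, n = len(A), len(A[0])
    cols = [vec_poly(A[i][j], d) for j in range(n) for i in range(m)]
    return np.concatenate(cols)

# A = [[t, 1], [0, t^2]], entries given as coefficient lists
A = [[[0.0, 1.0], [1.0]],
     [[0.0], [0.0, 0.0, 1.0]]]
print(vec_matrix_poly(A, d=2))   # a vector in R^{2*2*(2+1)} = R^12
```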

It is sometimes useful to reduce matrix polynomials to vectors of polynomials in $\mathbb{R}[t]$ rather than vectors over $\mathbb{R}$.

Definition 2.7 (Polynomial Vec Operator).

The pvec operator maps $\mathbf{A} \in \mathbb{R}[t]^{m \times n}$ to a vector of polynomials $\mathrm{pvec}(\mathbf{A}) \in \mathbb{R}[t]^{mn}$ by stacking the columns of $\mathbf{A}$ on top of each other.

We define the vectorization of matrix polynomials in this somewhat non-standard way to facilitate the later computation of derivatives of matrix-polynomial-valued functions.

To describe polynomial multiplication in terms of linear maps over scalars we have:

Definition 2.8 (Convolution Matrix).

Polynomial multiplication between polynomials $f$ and $g$, of degrees $d_f$ and $d_g$ respectively, may be expressed as a Toeplitz-matrix–vector product. We define the convolution matrix $\phi_{d_g}(f) \in \mathbb{R}^{(d_f + d_g + 1) \times (d_g + 1)}$, whose columns are shifted copies of $\mathrm{vec}(f)$, so that

$$\mathrm{vec}(fg) = \phi_{d_g}(f)\,\mathrm{vec}(g).$$

When $f$ is non-zero, we can also define division through pseudo-inversion or linear least squares. In a similar manner, we can define the product of matrix polynomials through a Toeplitz-block matrix.
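The convolution matrix is straightforward to construct; the sketch below (our own) builds $\phi_e(f)$ column by column and checks it against numpy's own convolution:

```python
import numpy as np

def conv_matrix(f, e):
    """phi_e(f): Toeplitz matrix of size (deg f + e + 1) x (e + 1) whose
    columns are shifted copies of vec(f); maps vec(g) to vec(f*g)."""
    f = np.asarray(f, dtype=float)
    d = len(f) - 1
    M = np.zeros((d + e + 1, e + 1))
    for j in range(e + 1):
        M[j:j + d + 1, j] = f
    return M

f = [1.0, 2.0]        # f = 1 + 2t
g = [3.0, 0.0, 1.0]   # g = 3 + t^2
print(conv_matrix(f, 2) @ np.array(g))   # vec(f*g) = [3, 6, 1, 2]
print(np.convolve(f, g))                 # the same product, as a check
```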

Definition 2.9 (Block Convolution Matrix).

We can express multiplication of a matrix of polynomials $\mathbf{A} \in \mathbb{R}[t]^{m \times n}$ of degree at most $d$ and a vector of polynomials $\mathbf{b} \in \mathbb{R}[t]^{n}$ of degree at most $e$ as a scalar linear system

$$\mathrm{vec}(\mathbf{A} \mathbf{b}) = \Phi_e(\mathbf{A})\,\mathrm{vec}(\mathbf{b}),$$

where

$$\Phi_e(\mathbf{A}) = \big(\phi_e(\mathbf{A}_{ij})\big)_{1 \le i \le m,\ 1 \le j \le n}$$

is the block matrix whose $(i, j)$ block is the convolution matrix of the entry $\mathbf{A}_{ij}$. When the degree is clear from context we simply write $\Phi(\mathbf{A})$.

The block convolution matrix is sometimes referred to as a "Sylvester matrix" associated with $\mathbf{A}$. However, we reserve the term "Sylvester matrix" for the more standard linear system arising from the GCD of two (or more) polynomials. The block convolution matrix is a scalar matrix whose entries have a linear (Toeplitz-block) structure.

Definition 2.10 (Kronecker Product).

The Kronecker product of $A \in \mathbb{R}[t]^{m \times n}$ and $B \in \mathbb{R}[t]^{p \times q}$, denoted $A \otimes B$, is the $mp \times nq$ matrix over $\mathbb{R}[t]$ defined as

$$A \otimes B = \begin{pmatrix} a_{11} B & \cdots & a_{1n} B \\ \vdots & \ddots & \vdots \\ a_{m1} B & \cdots & a_{mn} B \end{pmatrix}.$$

This definition of Kronecker product, sometimes referred to as the “outer product”, also holds for scalar matrices (and vectors).

Lemma 2.11.

For scalar matrices $A$, $B$, $C$ and $D$ of compatible dimension over $\mathbb{R}$, we have the mixed-product property

$$(A \otimes B)(C \otimes D) = (AC) \otimes (BD).$$

Likewise, for matrix polynomials $A$, $B$, $C$ and $D$ of compatible dimension over $\mathbb{R}[t]$, we have

$$(A \otimes B)(C \otimes D) = (AC) \otimes (BD).$$

The Kronecker product can also be used to re-write matrix equations of the form $AXB = C$, for matrices $A$, $X$, $B$ and $C$ of compatible dimensions, as

$$(B^T \otimes A)\,\mathrm{pvec}(X) = \mathrm{pvec}(C).$$
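Both identities are easy to confirm numerically for scalar matrices; the following check (our own) uses column-major flattening so that the flattened vector matches the column-stacking vec operator:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((2, 3)); C = rng.standard_normal((3, 2))
B = rng.standard_normal((2, 2)); D = rng.standard_normal((2, 2))

# mixed-product property: (A kron B)(C kron D) = (AC) kron (BD)
print(np.allclose(np.kron(A, B) @ np.kron(C, D), np.kron(A @ C, B @ D)))

# vec(M X N) = (N^T kron M) vec(X)
M = rng.standard_normal((4, 3)); X = rng.standard_normal((3, 2)); N = rng.standard_normal((2, 5))
vec = lambda Z: Z.flatten(order='F')     # stack columns
print(np.allclose(vec(M @ X @ N), np.kron(N.T, M) @ vec(X)))
```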

2.3 Derivatives of Matrix Polynomial Valued Functions

In this paper we will need to compute derivatives of some important matrix-polynomial-valued functions, namely the determinant and the adjoint. This problem is approached in the context of computing the Jacobian matrix of a vector valued function. The analysis in this section will be useful for showing that Lagrange multipliers typically exist in the optimization problems encountered. The quantities computed can also be used to derive first-order perturbation bounds for these matrix-polynomial-valued functions with respect to perturbations in the coefficients of $\mathbf{A}$.

Recall that the adjoint (adjugate) of a matrix polynomial $\mathbf{A} \in \mathbb{R}[t]^{n \times n}$, denoted by $\mathrm{Adj}(\mathbf{A})$, is the transpose of the cofactor matrix. Thus $\mathrm{Adj}(\mathbf{A})_{ij} = (-1)^{i+j} \det \mathbf{A}^{(j,i)}$, where $\mathbf{A}^{(j,i)}$ is the matrix formed by removing row $j$ and column $i$ from $\mathbf{A}$. When $\mathbf{A}$ has full rank, $\mathrm{Adj}(\mathbf{A})$ satisfies $\mathbf{A}\,\mathrm{Adj}(\mathbf{A}) = \det(\mathbf{A})\,I$.

The determinant of an $n \times n$ matrix polynomial having entries of degree at most $d$ can be viewed as a mapping from $\mathbb{R}^{n^2(d+1)} \to \mathbb{R}^{nd+1}$, since the determinant has degree at most $nd$. With this same viewpoint, we can view the adjoint of a matrix polynomial as a mapping from $\mathbb{R}^{n^2(d+1)} \to \mathbb{R}^{n^2((n-1)d+1)}$, since the degrees of the entries of the adjoint are at most $(n-1)d$. Our notation for computing derivatives of vector valued functions follows that of (Magnus and Neudecker, 1988).

It is not surprising that the determinant of a matrix polynomial has an identity (Magnus and Neudecker, 1988) similar to the well-known scalar identity $\frac{\partial}{\partial A_{ij}} \det(A) = \mathrm{Adj}(A)_{ji}$.

Theorem 2.12.

Let $\mathbf{A} \in \mathbb{R}[t]^{n \times n}$ have degree at most $d$. Then the Jacobian of $\mathrm{vec}(\det \mathbf{A}) \in \mathbb{R}^{nd+1}$ with respect to $\mathrm{vec}(\mathbf{A})$ is the matrix of convolution blocks $\big(\phi_d(\mathrm{Adj}(\mathbf{A})_{ji})\big)$, the blocks ordered to match the ordering of the entries in $\mathrm{vec}(\mathbf{A})$; entry-wise, $\frac{\partial \det(\mathbf{A})}{\partial \mathbf{A}_{ij}} = \mathrm{Adj}(\mathbf{A})_{ji}$.

Proof.

We note that, generalizing the scalar identity $\frac{\partial}{\partial A_{ij}} \det(A) = \mathrm{Adj}(A)_{ji}$, we can write a first-order expansion of the determinant as

$$\det(\mathbf{A} + \Delta) = \det(\mathbf{A}) + \mathrm{trace}\big(\mathrm{Adj}(\mathbf{A})\,\Delta\big) + O(\|\Delta\|^2),$$

and ignoring higher-order terms we obtain the scalar expression

$$\det(\mathbf{A} + \Delta) - \det(\mathbf{A}) \approx \mathrm{trace}\big(\mathrm{Adj}(\mathbf{A})\,\Delta\big) = \sum_{i,j} \mathrm{Adj}(\mathbf{A})_{ji}\,\Delta_{ij}.$$

The Jacobian can be extracted by writing this expression (padding with zero coefficient entries as necessary) as a matrix-vector product in $\mathrm{vec}(\Delta)$. Thus, using block-convolution matrices, we have

$$\mathrm{vec}\big(\mathrm{trace}(\mathrm{Adj}(\mathbf{A})\,\Delta)\big) = \big(\phi_d(\mathrm{Adj}(\mathbf{A})_{ji})\big)\,\mathrm{vec}(\Delta). \qquad \blacksquare$$
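The first-order expansion at the heart of the proof is easy to check numerically for scalar matrices (a sanity check of our own, not from the paper):

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((4, 4))
Delta = 1e-6 * rng.standard_normal((4, 4))

adjA = np.linalg.det(A) * np.linalg.inv(A)    # Adj(A) = det(A) A^{-1}
lhs = np.linalg.det(A + Delta) - np.linalg.det(A)
rhs = np.trace(adjA @ Delta)
print(lhs, rhs)    # agree up to O(||Delta||^2)
```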

Now that we have a closed-form expression for the derivative of the determinant, it is useful to derive a closed-form expression for the derivative of the adjoint matrix. The closed-form expression reveals rank information, and is independently useful for optimization algorithms requiring derivatives. The rank information is useful for obtaining insights about the existence of Lagrange multipliers. For example, if $\mathbf{A}$ has a non-zero leading coefficient matrix and a non-zero trailing coefficient matrix, then the Jacobian has full rank. If the Jacobian has full rank, then the linearly independent constraint qualification will hold for several constrained optimization problems involving the determinant.

Theorem 2.13.

Let $\mathbf{A} \in \mathbb{R}[t]^{n \times n}$ have degree at most $d$ and rank $n$. The Jacobian of $\mathrm{vec}(\mathrm{Adj}(\mathbf{A}))$ with respect to $\mathrm{vec}(\mathbf{A})$ is $J$ with

$$J = \Phi(I \otimes \mathbf{A})^{+}\,\Phi\big(\mathrm{pvec}(I)\,\mathrm{pvec}(\mathrm{Adj}(\mathbf{A})^T)^T - \mathrm{Adj}(\mathbf{A})^T \otimes I\big),$$

where $I$ is understood to be the $n \times n$ identity matrix and, for a scalar matrix of full rank, $(\cdot)^{+}$ is the Moore–Penrose pseudo-inverse arising from the SVD.

Proof.

First recall that if $\mathbf{A}$ has full rank, then $\mathbf{A}\,\mathrm{Adj}(\mathbf{A}) = \det(\mathbf{A})\,I$. This expression defines the adjoint matrix when $\mathbf{A}$ has full rank. Write $\mathbf{B} = \mathrm{Adj}(\mathbf{A})$, let $\Delta$ be a perturbation of $\mathbf{A}$ of degree at most $d$, and let $\delta\mathbf{B}$ denote the corresponding first-order change in $\mathbf{B}$. Applying the product rule to the left-hand side yields

$$\Delta\,\mathbf{B} + \mathbf{A}\,\delta\mathbf{B}. \tag{1}$$

Next we observe that (1) has the same coefficients as the expression

$$(\mathbf{B}^T \otimes I)\,\mathrm{pvec}(\Delta) + (I \otimes \mathbf{A})\,\mathrm{pvec}(\delta\mathbf{B}),$$

which, converted to a linear system over $\mathbb{R}$ via block convolution matrices, reduces to

$$\Phi(\mathbf{B}^T \otimes I)\,\mathrm{vec}(\Delta) + \Phi(I \otimes \mathbf{A})\,\mathrm{vec}(\delta\mathbf{B}). \tag{2}$$

We now have the derivative of the left-hand side of the expression $\mathbf{A}\,\mathrm{Adj}(\mathbf{A}) = \det(\mathbf{A})\,I$. Differentiation of the right-hand side yields, by Theorem 2.12, $\mathrm{trace}(\mathbf{B}\,\Delta)\,I$, which is equivalent to the expression

$$\mathrm{pvec}(I)\,\mathrm{pvec}(\mathbf{B}^T)^T\,\mathrm{pvec}(\Delta). \tag{3}$$

Converting (3) into a linear system over $\mathbb{R}$ leads to

$$\Phi\big(\mathrm{pvec}(I)\,\mathrm{pvec}(\mathbf{B}^T)^T\big)\,\mathrm{vec}(\Delta), \tag{4}$$

which is the derivative of the right-hand side. Combining (2) and (4) we have

$$\Phi(I \otimes \mathbf{A})\,\mathrm{vec}(\delta\mathbf{B}) = \Phi\big(\mathrm{pvec}(I)\,\mathrm{pvec}(\mathbf{B}^T)^T - \mathbf{B}^T \otimes I\big)\,\mathrm{vec}(\Delta).$$

Assuming that $\mathbf{A}$ has full rank, so that $\Phi(I \otimes \mathbf{A})$ has full column rank and is pseudo-invertible, we can write

$$\mathrm{vec}(\delta\mathbf{B}) = \Phi(I \otimes \mathbf{A})^{+}\,\Phi\big(\mathrm{pvec}(I)\,\mathrm{pvec}(\mathbf{B}^T)^T - \mathbf{B}^T \otimes I\big)\,\mathrm{vec}(\Delta),$$

which completes the proof. ∎

An observation that is important later is that the derivative of the adjoint has a Toeplitz-block structure. More importantly, this structure is banded, so only a small number of columns need to be computed explicitly, with the remaining columns following from the Toeplitz-block structure. We also note that $J$ may be padded with zeros, since $\mathrm{Adj}(\mathbf{A})$ may not have generic degrees.

Corollary 2.14.

If $\mathbf{A}$ has full rank, then the Jacobian $J$ of $\mathrm{vec}(\mathrm{Adj}(\mathbf{A}))$ has full rank.

Proof.

The matrix $\Phi(I \otimes \mathbf{A})$ has full rank since $\mathbf{A}$ has full rank. The matrix

$$\mathbf{B}^T \otimes I - \mathrm{pvec}(I)\,\mathrm{pvec}(\mathbf{B}^T)^T, \qquad \mathbf{B} = \mathrm{Adj}(\mathbf{A}), \tag{5}$$

is a rank one update to a matrix polynomial. By evaluating (5) at a complex number $\omega$ that is not an eigenvalue of $\mathbf{A}$ we can show that (5) has full rank. Let $B = \mathbf{B}(\omega)$, so $B^T \otimes I$ has full rank.

Using the Sherman–Morrison formula (Higham, 2002, pg. 487) for rank 1 updates to a matrix, we need to verify that

$$1 - \mathrm{vec}(B^T)^T\,(B^T \otimes I)^{-1}\,\mathrm{vec}(I) \neq 0$$

in order to ensure that (5) has full rank. We have that

$$\mathrm{vec}(B^T)^T\,(B^{-T} \otimes I)\,\mathrm{vec}(I) = \mathrm{vec}(B^T)^T\,\mathrm{vec}(B^{-1}) = \mathrm{trace}(B\,B^{-1}) = n \neq 1,$$

thus (5) has full rank. Note we used the identities, for matrices $X$, $Y$ and $Z$ of appropriate dimension, that $(Y^T \otimes X)\,\mathrm{vec}(Z) = \mathrm{vec}(XZY)$ and $\mathrm{vec}(X)^T\,\mathrm{vec}(Y) = \mathrm{trace}(X^T Y)$. Again, we have that

$$\Phi\big(\mathrm{pvec}(I)\,\mathrm{pvec}(\mathbf{B}^T)^T - \mathbf{B}^T \otimes I\big)$$

has full rank, thus $J$ is a product of two matrices of full rank, so must also have full rank. ∎

Corollary 2.14 implies that Lagrange multipliers will exist for several optimization problems involving the adjoint matrix as a constraint, since the Jacobian matrix of the adjoint has full rank. The linear independence constraint qualification or the constant rank constraint qualification will hold for several optimization problems whose constraints are reasonably prescribed in terms of the adjoint.

Remark 2.15.

If $\mathbf{A}$ is rank deficient, then the derivative of the adjoint is still defined, but not necessarily by Theorem 2.13. If $\mathrm{rank}(\mathbf{A}) \le n - 2$ then $\mathrm{Adj}(\mathbf{A}) = 0$, since all $(n-1) \times (n-1)$ minors vanish ($\mathrm{Adj}(\mathbf{A})$ consists of the coefficients of these minors). If $\mathrm{rank}(\mathbf{A}) = n - 1$ or $\mathrm{rank}(\mathbf{A}) \le n - 2$, then in both cases the derivative is still defined. However, it is not necessarily described by Theorem 2.13.

For several affine or linear perturbation structures (such as ones that preserve the degree of entries or the support of entries), Theorem 2.13 and the associated Corollary 2.14 will hold (after deleting some extraneous rows or columns).

3 When Does a Numerical Matrix Polynomial Have a Trivial SNF?

In this section we consider the question of determining whether a matrix polynomial has a non-trivial SNF, or rather how much the coefficients need to be perturbed for it to have a non-trivial SNF. We provide a lower bound on this distance by analyzing the distance to a reduced-rank generalized Sylvester matrix.

3.1 Embeddings into Generalized Sylvester Matrices and Approximate GCDs

In the introduction we demonstrated that some nearby non-trivial Smith forms are unattainable. In this subsection we investigate why these unattainable values occur. We first review some basic results needed to analyze the topology of the approximate Smith form problem.

For a matrix polynomial $\mathbf{A} \in \mathbb{R}[t]^{n \times n}$, we know that $s_n = \det(\mathbf{A})/\delta_{n-1}$, the quotient of the determinant and the GCD of all $(n-1) \times (n-1)$ minors. Since these minors are precisely the entries of the adjoint matrix (up to sign), it follows that $\mathbf{A}$ has a non-trivial Smith form if and only if the GCD of all entries of the adjoint is non-trivial, that is, $\deg \gcd\big(\mathrm{pvec}(\mathrm{Adj}(\mathbf{A}))\big) \ge 1$. In order to obtain bounds on the distance to a matrix having a non-trivial Smith form, we consider an approximate GCD problem of the form

$$\min_{\Delta} \|\Delta\| \quad \text{such that} \quad \deg \gcd\big(\mathrm{pvec}(\mathrm{Adj}(\mathbf{A} + \Delta))\big) \ge 1.$$

If this were a classical approximate GCD problem, then the use of Sylvester-like matrices would be sufficient; the computation of the GCD of many polynomials is typically embedded into a scalar matrix problem using the classical Sylvester matrix. However, in our problem we want to look at GCDs of nearby polynomials, with the added wrinkle that the degrees of the entries of the adjoint may change under perturbations. In order to perform such an analysis, we need to study a family of generalized Sylvester matrices that allow higher-degree zero coefficients to be perturbed.

Let $\mathbf{f} = (f_1, \ldots, f_k) \in \mathbb{R}[t]^{k}$ be a vector of polynomials with degrees $d_i = \deg f_i$, ordered so that $d_1 \ge d_i$ for $2 \le i \le k$. Set $d = d_1$ and $e = \max_{2 \le i \le k} d_i$, and suppose that for each $i$ we have $f_i \neq 0$.

Definition 3.1 (Generalized Sylvester Matrix).

The generalized Sylvester matrix of $\mathbf{f}$ is defined as the block matrix

$$\mathrm{Syl}(\mathbf{f}) = \begin{pmatrix} \phi_{d_1 - 1}(f_2) & -\phi_{d_2 - 1}(f_1) & & \\ \phi_{d_1 - 1}(f_3) & & -\phi_{d_3 - 1}(f_1) & \\ \vdots & & & \ddots \\ \phi_{d_1 - 1}(f_k) & & & -\phi_{d_k - 1}(f_1) \end{pmatrix},$$

whose kernel consists of the vectors $\big(\mathrm{vec}(v), \mathrm{vec}(u_2), \ldots, \mathrm{vec}(u_k)\big)$ with $f_i v = f_1 u_i$ for $2 \le i \le k$.

Some authors, e.g., (Fatouros and Karcanias, 2003; Vardulakis and Stoyle, 1978), refer to such a matrix as an expanded Sylvester matrix or generalized resultant matrix. The generalized Sylvester matrix has many useful properties pertaining to the Bézout coefficients. However, we are only concerned with the well known result that $\gcd(f_1, \ldots, f_k) = 1$ if and only if $\mathrm{Syl}(\mathbf{f})$ has full rank (Vardulakis and Stoyle, 1978).

Sometimes treating a polynomial of degree $d$ as one of larger degree is useful. This can be accomplished by constructing a similar matrix and padding rows and columns with zero entries. The generalized Sylvester matrix of degree at most $\mathbf{e} = (e_1, \ldots, e_k)$ (component-wise) of $\mathbf{f}$ is defined analogously as $\mathrm{Syl}_{\mathbf{e}}(\mathbf{f})$, taking $e_1$ to be the largest degree entry and $e_2$ to be the largest degree of the remaining entries of $\mathbf{e}$. Note that $e_i > d_i$ is possible and typical. If the entries of $\mathbf{f}$ have a non-trivial GCD (that is possibly unattainable) under a perturbation structure $\Delta\mathbf{f}$, then it is necessary that $\mathrm{Syl}_{\mathbf{e}}(\mathbf{f})$ is rank deficient, and often this will be sufficient.
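The kernel characterization above translates directly into code. The sketch below (our own construction, following the reconstruction of Definition 3.1 given earlier; the exact block layout in the original paper may differ) builds the generalized Sylvester matrix of a list of polynomials, ordered so that the first has the largest degree, and tests co-primality via its rank:

```python
import numpy as np

def conv_matrix(f, e):
    """phi_e(f): maps vec(g), deg g <= e, to vec(f*g); coefficients low-first."""
    f = np.asarray(f, dtype=float)
    d = len(f) - 1
    M = np.zeros((d + e + 1, e + 1))
    for j in range(e + 1):
        M[j:j + d + 1, j] = f
    return M

def gen_sylvester(fs):
    """Block matrix whose kernel vectors (v, u_2, ..., u_k) satisfy
    f_i v = f_1 u_i; it has full column rank iff gcd(f_1, ..., f_k) = 1.
    fs[0] must have the largest degree."""
    d = [len(f) - 1 for f in fs]
    k = len(fs)
    rows = []
    for i in range(1, k):
        blocks = [conv_matrix(fs[i], d[0] - 1)]
        for j in range(1, k):
            blocks.append(-conv_matrix(fs[0], d[j] - 1) if j == i
                          else np.zeros((d[0] + d[i], d[j])))
        rows.append(np.hstack(blocks))
    return np.vstack(rows)

S = gen_sylvester([[-1.0, 0.0, 1.0], [2.0, 1.0]])   # t^2 - 1, t + 2: co-prime
print(S.shape, np.linalg.matrix_rank(S))            # (3, 3), rank 3: full rank

S = gen_sylvester([[-1.0, 0.0, 1.0], [-1.0, 1.0]])  # t^2 - 1, t - 1: gcd = t - 1
print(S.shape, np.linalg.matrix_rank(S))            # (3, 3), rank 2: deficient
```

Padding an entry with higher-degree zero coefficients, as in $\mathrm{Syl}_{\mathbf{e}}$, amounts to appending zeros to its coefficient list before calling gen_sylvester.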

If we view the entries of $\mathbf{f}$ as polynomials of degrees $e_i > d_i$ for all $i$, then the entries of $\mathbf{f}$ have an unattainable GCD of distance zero, typically a factor whose root escapes to infinity as the perturbation tends to zero. In other words, the underlying approximate GCD problem is ill-posed, in the sense that the solution is unattainable. In order to study the theory of unattainable GCDs, sometimes referred to as GCDs at infinity, we need to study the notion of a degree reversed polynomial.

Lemma 3.2.

If $e_1 = d_1$ then $\mathrm{Syl}_{\mathbf{e}}(\mathbf{f})$ has full rank if and only if $\gcd(f_1, \ldots, f_k) = 1$, that is, if and only if $\mathrm{Syl}(\mathbf{f})$ has full rank.

Proof.

Let $d_1$ and $d_2$ be the degrees of the largest and second-largest degree entries of $\mathbf{f}$, and let $e_2$ be the second-largest entry of $\mathbf{e}$. The result follows from the main theorem of Vardulakis and Stoyle (1978) by considering the case of $e_2 \ge d_2$. ∎

This lemma characterizes the (generic) case when the elements of maximal degree of $\mathbf{f}$ do not change under perturbations, in which case the generalized Sylvester matrix still meaningfully encodes GCD information. However, it is possible that $\mathrm{Syl}(\mathbf{f})$ has full rank and $\mathrm{Syl}_{\mathbf{e}}(\mathbf{f})$ is rank deficient while the distance to a non-trivial GCD is not zero. This can occur when $e_i > d_i$ for some $i$. To understand the most general case, we need to look at generalized Sylvester matrices of the reversal of several polynomials.

Definition 3.3.

The degree reversal of $f \in \mathbb{R}[t]$ of degree at most $d$ is defined as $\mathrm{rev}_d(f) = t^d f(1/t)$; for example, $\mathrm{rev}_3(2t + 1) = t^3 + 2t^2$. For a vector $\mathbf{f}$ of polynomials of degrees at most $\mathbf{e} = (e_1, \ldots, e_k)$, the degree reversal of $\mathbf{f}$ is the vector $\mathrm{rev}_{\mathbf{e}}(\mathbf{f}) = (\mathrm{rev}_{e_1}(f_1), \ldots, \mathrm{rev}_{e_k}(f_k))$.

The following theorem enables us to determine if unattainable solutions are occurring in an approximate GCD problem with an arbitrary (possibly non-linear) structure on the coefficients.

Theorem 3.4.

Let $\mathbf{f}$ be a vector of non-zero polynomials of degrees $\mathbf{d}$, and suppose the admissible perturbations have degrees at most $\mathbf{e}$. Suppose that $\mathrm{Syl}(\mathbf{f})$ has full rank and $\mathrm{Syl}_{\mathbf{e}}(\mathbf{f})$ is rank deficient. Then $\mathbf{f}$ has an unattainable non-trivial GCD of distance zero under the perturbation structure if and only if $\mathrm{Syl}(\mathrm{rev}_{\mathbf{e}}(\mathbf{f}))$ is rank deficient.

Proof.

Suppose that $\mathrm{Syl}(\mathrm{rev}_{\mathbf{e}}(\mathbf{f}))$ has full rank. Then $\gcd(\mathrm{rev}_{\mathbf{e}}(\mathbf{f})) = 1$, hence $\mathbf{f}$ does not have an unattainable non-trivial GCD, since there is no common root at infinity. Conversely, suppose that $\mathrm{Syl}(\mathrm{rev}_{\mathbf{e}}(\mathbf{f}))$ is rank deficient. Then $t$ is a factor of $\gcd(\mathrm{rev}_{\mathbf{e}}(\mathbf{f}))$, although $t$ is not a factor of $\gcd(\mathrm{rev}_{\mathbf{d}}(\mathbf{f}))$. Accordingly, all entries of $\mathbf{f}$ may increase in degree under the perturbation structure, and so the distance of $\mathbf{f}$ to having a non-trivial GCD is zero, and the GCD is unattainable. ∎

If the generalized Sylvester matrix of $\mathbf{f}$ has full rank, but the generalized Sylvester matrix that encodes the perturbations is rank deficient, then either there is an unattainable GCD, or the generalized Sylvester matrix is rank deficient due to over-padding with zeros. Theorem 3.4 provides a reliable way to detect this over-padding.

Definition 3.5.

We say that $\mathbf{A}$ has an unattainable non-trivial Smith form if the Smith form of $\mathbf{A}$ is trivial, and yet $\mathbf{A} + \Delta$ has a non-trivial Smith form for an arbitrarily small perturbation $\Delta$ of prescribed affine structure.

It is important to carefully consider structured perturbations, because some matrix polynomials have an unattainable non-trivial SNF under unstructured perturbations, but an attainable non-trivial SNF under structured perturbations (perturbations that preserve the degree of entries or the support of entries are structured). Solutions that cannot be attained correspond to an eigenvalue at infinity of $\mathbf{A}$ with a non-trivial spectral structure. Such examples are easily constructed when $\mathbf{A}$ is unimodular or has non-generic degrees.

Example 3.6.

Let $\mathbf{A}$ be a unimodular matrix polynomial whose adjoint is likewise unimodular. Then $\mathbf{A}$ has an unattainable non-trivial Smith form if all perturbations to $\mathbf{A}$ are support or degree preserving (i.e., they preserve zero entries or do not increase the degree of each entry), both linear structures. Note that $\mathbf{A}$ and $\mathrm{Adj}(\mathbf{A})$ are both unimodular. However, small perturbations to the non-zero coefficients of $\mathbf{A}$ make $\mathbf{A}$ non-unimodular.

The Smith form of the degree reversal of $\mathbf{A}$ is non-trivial, which implies that the eigenvalue at infinity of $\mathbf{A}$ has a non-trivial spectral structure. The eigenvalue at infinity having a non-trivial spectral structure implies that the nearby non-trivial SNF of $\mathbf{A}$ is unattainable. Note that this is equivalent to saying that $\mathbf{A}(1/t)$ has a non-trivial Smith–McMillan form.

These examples are non-generic. Generically, the degree of all entries in the adjoint will be $(n-1)d$ and will remain unchanged locally under perturbations to the coefficients. Computing the distance to the nearest matrix polynomial with a non-trivial Smith form under a prescribed perturbation structure can thus be formulated as finding the nearest rank deficient (structured) generalized Sylvester matrix of the adjoint, or of the reversal of the adjoint.

3.2 Nearest Rank Deficient Structured Generalized Sylvester Matrix

Suppose that $\mathbf{A} \in \mathbb{R}[t]^{n \times n}$ of degree at most $d$ has a trivial Smith form and does not have an unattainable non-trivial Smith form. Then one method to compute a lower bound on the distance the entries of $\mathbf{A}$ need to be perturbed to have an attainable or unattainable non-trivial Smith form is to solve

$$\min_{\Delta} \|\Delta\| \quad \text{such that} \quad \mathrm{Syl}_{\mathbf{e}}\big(\mathrm{pvec}(\mathrm{Adj}(\mathbf{A} + \Delta))\big) \text{ is rank deficient}. \tag{6}$$

Here $\mathbf{e}$ is the vector of the largest possible degrees of each entry of $\mathrm{Adj}(\mathbf{A} + \Delta)$, and $\Delta$ has a prescribed linear or affine perturbation structure.

It is sufficient to compute $\mathbf{e}$, a quantity which will generically be $((n-1)d, \ldots, (n-1)d)$. For non-generic instances we require the computation of the degrees actually attainable under the perturbation structure. This optimization problem is non-convex, but multi-linear in each coefficient of $\Delta$.

We do not attempt to solve this problem directly via numerical techniques, since it enforces a necessary condition that is often sufficient. Instead we use it to develop a theory of solutions which can be exploited by faster and more robust numerical methods.

Lemma 3.7.

Let $\mathbf{f}$ be a vector of polynomials with degrees $\mathbf{d}$ and admissible perturbations $\Delta\mathbf{f}$ of degrees $\mathbf{e}$, where $\mathbf{e} \ge \mathbf{d}$ component-wise. Then the family of generalized Sylvester matrices of rank at least $r$ forms an open set under the perturbations $\Delta\mathbf{f}$.

Proof.

By the degree assumption on $\Delta\mathbf{f}$ we have, for an infinitesimal $\Delta\mathbf{f}$, that $\mathrm{Syl}_{\mathbf{e}}(\mathbf{f})$ and $\mathrm{Syl}_{\mathbf{e}}(\mathbf{f} + \Delta\mathbf{f})$ have the same dimension. Accordingly, let us suppose that $\mathrm{Syl}_{\mathbf{e}}(\mathbf{f})$ has rank at least $r$. Then the Sylvester matrix in question must have rank at least $r$ in an open neighborhood around it, since rank is lower semi-continuous. In particular, when $\Delta\mathbf{f} \to 0$ then $\mathrm{Syl}_{\mathbf{e}}(\mathbf{f} + \Delta\mathbf{f}) \to \mathrm{Syl}_{\mathbf{e}}(\mathbf{f})$, and the result follows. ∎

Theorem 3.8.

The optimization problem (6) has an attainable global minimum under linear perturbation structures.

Proof.

Let $\mathcal{S}$ be the set of all rank at most $r$ generalized Sylvester matrices of the shape prescribed by $\mathbf{d}$ and $\mathbf{e}$. Lemma 3.7 implies that $\mathcal{S}$ is topologically closed.

Let $\mathcal{F}$ be the set of feasible perturbations $\Delta$, where the generalized Sylvester matrices are padded with zeros to have the appropriate dimension if required. Since $\Delta$ has a linear perturbation structure, a feasible point always exists; for instance, perturbing all of the coefficients to zero yields a rank deficient generalized Sylvester matrix. By inspection, the subset of $\mathcal{F}$ whose elements have norm not exceeding that of this feasible point is seen to be non-empty, bounded and closed.

The functional $\|\Delta\|$ is continuous over this non-empty closed and bounded set. By Weierstrass's theorem it has an attainable global minimum over the set, which is the global minimum of (6). ∎

Note that if a feasible point exists under an affine perturbation structure, then a solution to the optimization problem exists as well. What this result says is that computing the distance to non-triviality is generally a well-posed problem, even though computing a matrix polynomial of minimum distance may be ill-posed (the solution may be unattainable). The same results also hold when working over the reversed coefficients. A similar argument is employed by (Kaltofen et al., 2007, Theorem 2.1).

3.3 Bounds on the Distance to Non-triviality

Suppose that $\mathbf{A} \in \mathbb{R}[t]^{n \times n}$, of degree at most $d$, has a trivial Smith form and does not have an unattainable non-trivial Smith form. This section provides some basic bounds on the distance the coefficients of $\mathbf{A}$ need to be perturbed in order to have a non-trivial Smith form. The bounds we derive are unstructured, although they can be generalized to several perturbation structures (such as ones that preserve the degree or support of entries) in a straightforward manner.

If we consider the mapping $\mathbf{A} \mapsto \mathrm{Adj}(\mathbf{A})$ as a vector-valued function from $\mathbb{R}^{n^2(d+1)}$ to $\mathbb{R}^{n^2((n-1)d+1)}$ (with some coordinates possibly fixed to zero), then we note that the mapping is locally Lipschitz. More precisely, there exists $c > 0$ such that, for all sufficiently small perturbations $\Delta$,

$$\|\mathrm{Adj}(\mathbf{A} + \Delta) - \mathrm{Adj}(\mathbf{A})\| \le c\,\|\Delta\|.$$

The quantity $c$ can be bounded above using the (scalar) Jacobian matrix of the adjoint evaluated at $\mathbf{A}$; a local upper bound for $c$ is approximately $\|J\|_2$. We can invoke Theorem 2.13 if $\mathbf{A}$ has full rank. By considering $\|\Delta\| \to 0$, we obtain the (absolute) first-order perturbation bound

$$\|\mathrm{Adj}(\mathbf{A} + \Delta) - \mathrm{Adj}(\mathbf{A})\| \le \|J\|_2\,\|\Delta\| + O(\|\Delta\|^2).$$

The entries of $J$ consist of the coefficients of the $(n-2) \times (n-2)$ minors of $\mathbf{A}$. This follows because the adjoint is a multi-linear vector mapping, and the derivative of each entry with respect to a given coefficient of $\mathbf{A}$ is a coefficient of such a minor. The size of each minor can be bounded above (albeit poorly) by Hadamard's inequality (Goldstein–Graham variant, see (Lossers, 1974)). As such, we obtain a sequence of upper bounds on $\|J\|$, where $\|\cdot\|$ is understood to be a vector norm on the coefficients and the corresponding induced matrix norm on $J$. The bound in question can be used in conjunction with the SVD to obtain a lower bound on the distance to a matrix polynomial with a non-trivial Smith form.

Theorem 3.9.

Suppose that $\mathbf{A} \in \mathbb{R}[t]^{n \times n}_d$ and $\mathrm{Adj}(\mathbf{A})$ has rank $n$. Then a lower bound on the distance to non-triviality is

$$\|\Delta\| \ \ge\ \frac{\sigma_{\min}\big(\mathrm{Syl}(\mathrm{pvec}(\mathrm{Adj}(\mathbf{A})))\big)}{c},$$

where $c$ is the local Lipschitz constant of the mapping $\mathbf{A} \mapsto \mathrm{Adj}(\mathbf{A})$ discussed above.

Proof.

We note that, for vectors of polynomials of compatible degrees, a perturbation of the coefficients of size $\varepsilon$ perturbs the corresponding generalized Sylvester matrix by at most a constant multiple of $\varepsilon$, since each coefficient appears in a bounded number of rows and columns. Accordingly, if $\Delta$ is a minimal perturbation to non-triviality, then $\mathrm{Syl}\big(\mathrm{pvec}(\mathrm{Adj}(\mathbf{A} + \Delta))\big)$ is rank deficient and

$$\sigma_{\min}\big(\mathrm{Syl}(\mathrm{pvec}(\mathrm{Adj}(\mathbf{A})))\big) \le \big\|\mathrm{Syl}\big(\mathrm{pvec}(\mathrm{Adj}(\mathbf{A} + \Delta))\big) - \mathrm{Syl}\big(\mathrm{pvec}(\mathrm{Adj}(\mathbf{A}))\big)\big\|_2 \le c\,\|\Delta\|,$$

and the theorem follows by a simple rearrangement. ∎

If $\mathrm{Adj}(\mathbf{A})$ has $k$ different entries, then the constant in question depends only on the largest and second-largest degrees occurring among the entries of $\mathrm{Adj}(\mathbf{A})$, since these determine how often each coefficient is repeated in the generalized Sylvester matrix. The lower bound provided can also be improved using the Karmarkar–Lakshman distance (Karmarkar and Lakshman, 1996) in lieu of the smallest singular value of the generalized Sylvester matrix of the adjoint or of the reversal of the adjoint, or using other approximate GCD lower bounds (e.g., (Beckermann and Labahn, 1998)).
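For the $2 \times 2$ case the adjoint of $\begin{pmatrix} a & b \\ c & d \end{pmatrix}$ is simply $\begin{pmatrix} d & -b \\ -c & a \end{pmatrix}$, so the quantity $\sigma_{\min}$ appearing in the bound is easy to compute. The sketch below (our own, relying on the generalized Sylvester layout reconstructed above; the division by the Lipschitz bound $c$ is left to the reader) illustrates the computation:

```python
import numpy as np

def conv_matrix(f, e):
    f = np.asarray(f, dtype=float)
    d = len(f) - 1
    M = np.zeros((d + e + 1, e + 1))
    for j in range(e + 1):
        M[j:j + d + 1, j] = f
    return M

def gen_sylvester(fs):
    d = [len(f) - 1 for f in fs]
    k = len(fs)
    rows = []
    for i in range(1, k):
        blocks = [conv_matrix(fs[i], d[0] - 1)]
        for j in range(1, k):
            blocks.append(-conv_matrix(fs[0], d[j] - 1) if j == i
                          else np.zeros((d[0] + d[i], d[j])))
        rows.append(np.hstack(blocks))
    return np.vstack(rows)

# A = [[t, 1], [-1, t + 2]]  ==>  Adj(A) = [[t + 2, -1], [1, t]]
adj_entries = [[2.0, 1.0], [0.0, 1.0], [-1.0], [1.0]]   # t+2, t, -1, 1
S = gen_sylvester(adj_entries)
sigma_min = np.linalg.svd(S, compute_uv=False)[-1]
print(sigma_min)   # dividing by a Lipschitz bound c gives the lower bound
```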

4 Approximate SNF via Optimization

In this section we formulate the approximate Smith form problem as the solution to a continuous constrained optimization problem. We assume that the solutions in question are attainable and develop a method with rapid local convergence. As the problem is non-convex, our convergence analysis will be local.

4.1 Constrained Optimization Formulation

An equivalent statement to $\mathbf{A} + \Delta$ having a non-trivial attainable Smith form is that $\mathrm{Adj}(\mathbf{A} + \Delta) = h\,\mathbf{B}$, where $\mathbf{B}$ is a vector (or matrix) of scalar polynomials and $h$ is a non-constant divisor of $\det(\mathbf{A} + \Delta)$. This directly leads to the following optimization problem:

$$\min_{\Delta} \|\Delta\|^2 \quad \text{such that} \quad \mathrm{Adj}(\mathbf{A} + \Delta) = h\,\mathbf{B}, \qquad \deg h \ge 1. \tag{7}$$

This is a multi-linearly