 # Probabilistic Analysis of Block Wiedemann for Leading Invariant Factors

We determine the structure-dependent probability that the block Wiedemann algorithm correctly computes leading invariant factors, and from it derive a tight structure-independent lower bound on that probability. We show that, using a block size slightly larger than r, the leading r invariant factors are computed correctly with high probability over any field. Moreover, an algorithm is provided to compute the probability bound for a given matrix size and thus to select the block size needed to obtain the desired probability. The worst-case probability bound is improved, post hoc, by incorporating partial information about the invariant factors.


## 1 Introduction

For a prime power $q$, let $\mathbb{F}_q$ denote a finite field of cardinality $q$. Let $A \in \mathbb{F}_q^{n \times n}$. For chosen block size $b$, let $U, V$ be uniformly random in $\mathbb{F}_q^{n \times b}$. Call the sequence $S = \{U^T A^i V\}_{i=0}^{\infty}$ the $(U,V)$-projection of $A$. The Wiedemann ($b = 1$) and Coppersmith's block Wiedemann ($b > 1$) algorithms compute the minimal generating polynomial, $F$, of $S$. This means that $\sum_{k=0}^{d} U^T A^{i+k} V F_k = 0$ for all $i \geq 0$, where $F = \sum_{k=0}^{d} F_k x^k$, and $\deg(\det(F))$ is minimal (Wiedemann, 1986; Coppersmith, 1994). Then the $i$-th largest invariant factor of $F$ divides the $i$-th largest invariant factor of $xI - A$, and is equal with high probability (Kaltofen and Villard, 2001, 2004) for a large enough field. Observations of the behavior for small fields were noted by Coppersmith, and the analysis has been extended by Villard (1997a, b) and Brent et al. (2003) to small fields subject to certain constraints. We call a projection and its minimal generator, $F$, $r$-faithful to $A$ if the $r$ largest invariant factors of $F$ are the $r$ largest invariant factors of $xI - A$.

Wiedemann and Coppersmith developed their algorithms for the purpose of solving linear systems and were not explicitly concerned with determining invariant factors. Prior analysis of the block Wiedemann algorithm was motivated by this problem, and is therefore one-sided (asymmetric in the treatment of projection from left and right). Given $A$ and $U$ chosen uniformly at random, where $U \in \mathbb{F}_q^{n \times b}$, Villard (1997a, b) gives a bound on the probability that the minimal generating polynomial of $\{U^T A^i V\}_{i=0}^{\infty}$ is $b$-faithful to the minimal generating polynomial of $\{A^i V\}_{i=0}^{\infty}$. An exact formula and tighter bound for this probability are given in Brent et al. (2003). These analyses depend on $A$ and the minimal generator of the projected sequence having at most $b$ nontrivial invariant factors, thus eliminating the "pathological" case discussed in Coppersmith (1994). They do not speak directly to two-sided analysis for situations in which $A$ has more than $b$ nontrivial invariant factors. Moreover, it is important for our purposes to have a probability bound that applies without regard to the matrix structure and that quantifies the increased confidence that can be achieved for computing $r$ invariants by selecting block size $b$ somewhat larger than $r$.

In this paper we develop an exact formula (Theorem 14), $P_{q,b,r}(A)$, for the probability that a random projection is $r$-faithful, for given eigenstructure of $A$. We then construct the worst case and derive a sharp lower bound (Theorem 16), $P_{q,b,r}$, on the probability that a random projection is $r$-faithful for an arbitrary matrix $A$. Since the worst case occurs when there are exactly $r$ invariant factors, the bounds from Brent et al. (2003) can be applied to estimate $P_{q,b,r}$ (Theorem 18). Knowing $P_{q,b,r}$ allows $b$ to be computed such that the probability meets any desired level $p$. Using this we show that with a block size slightly larger than $r$ the projection is $r$-faithful with high probability. This makes precise previous observations and estimates regarding block size. The results in this paper are an extension of our previous work, in which we presented formulas for the minimal polynomial case (Harrison et al., 2016).

The worst case bound can be improved by incorporating information about the invariant factors of the minimal generating matrix $F$. In the extreme case, where the sum of the degrees of the invariant factors of $F$ equals the matrix dimension, the invariant factors of $F$ are equal to those of $xI - A$. In less extreme cases the partial information obtained from $F$ can be used, post hoc, to improve the probability bounds for $F$ to be $r$-faithful (Theorem 22).

The main results of this paper have been presented, without proofs, as a poster at ISSAC 2016, with abstract (Harrison et al., 2017). They are presented here with full development and proofs, along with examples and new results on the post hoc analysis.

## 2 Probability Analysis

In this section we derive and prove an exact formula (Theorem 14), $P_{q,b,r}(A)$, for the probability that a random projection is $r$-faithful, for given eigenstructure of $A$. Similarly to the proofs in Villard (1997a, b) and Brent et al. (2003), our analysis reduces the probability calculation first to primary components and then to a direct sum of companion matrices of irreducible polynomials.

After introducing notation and some technical results, we show that the probability calculation can be split into independent consideration of the distinct primary components (Theorem 7). Then the probability for a primary component is reduced to that of a direct sum of companion matrices (Theorem 8). Finally, we show that the sequences generated by the individual companion matrices can be mapped to vector outer products (Lemma 9), which reduces the problem to a rank calculation (Theorems 11 and 14).

The following notation will be used throughout the paper. Starting with the finite field $\mathbb{F}_q$, we will be working with the ring of polynomials $\mathbb{F}_q[x]$ and its modular images $\mathbb{F}_q[x]/(f)$, for nonzero $f \in \mathbb{F}_q[x]$. We are concerned with matrix sequences of the form $S = \{S_i\}_{i=0}^{\infty}$, $S_i \in \mathbb{F}_q^{b \times b}$. Define an action of $\mathbb{F}_q[x]$ on such sequences by, for polynomial $g$ of degree $e$, $Sg = \{\sum_{k=0}^{e} g_k S_{i+k}\}_{i=0}^{\infty}$, where $g = \sum_{k=0}^{e} g_k x^k$. When $Sg = 0$ we say that $g$ generates $S$.
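To make the action concrete, here is a small runnable sketch (our own illustration, not code from the paper): a companion matrix $C_f$ satisfies $f(C_f) = 0$, so $f$ generates every projected sequence $\{U^T C_f^i V\}$. The prime $p = 5$, the polynomial $f$, and the random projections below are hypothetical choices.

```python
# Over F_p, a polynomial f with f(A) = 0 generates the projected sequence
# S_i = U^T A^i V, i.e. (Sf)_i = sum_k f_k S_{i+k} = 0 for all i.
import random

P = 5  # the field F_5

def mat_mul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) % P
             for col in zip(*B)] for row in A]

def companion(f):
    # companion matrix C_f of monic f = f_0 + f_1 x + ... + x^d
    d = len(f) - 1
    C = [[0] * d for _ in range(d)]
    for j in range(d - 1):
        C[j + 1][j] = 1            # subdiagonal of ones
    for i in range(d):
        C[i][d - 1] = (-f[i]) % P  # last column holds -f_0, ..., -f_{d-1}
    return C

f = [2, 0, 1, 1]        # f = 2 + x^2 + x^3 (monic), so f(C_f) = 0
d = len(f) - 1
A = companion(f)

rng = random.Random(0)
b = 2
U = [[rng.randrange(P) for _ in range(b)] for _ in range(d)]
V = [[rng.randrange(P) for _ in range(b)] for _ in range(d)]
UT = [list(r) for r in zip(*U)]

# S_i = U^T A^i V for i = 0, ..., 2d-1
S, M = [], [[int(i == j) for j in range(d)] for i in range(d)]
for _ in range(2 * d):
    S.append(mat_mul(mat_mul(UT, M), V))
    M = mat_mul(M, A)

# check the action: sum_k f_k S_{i+k} = 0 (mod P) for each i
for i in range(d):
    acc = [[0] * b for _ in range(b)]
    for k, fk in enumerate(f):
        acc = [[(x + fk * y) % P for x, y in zip(ra, rb)]
               for ra, rb in zip(acc, S[i + k])]
    assert all(x == 0 for row in acc for x in row)
```

The check holds because $\sum_k f_k S_{i+k} = U^T A^i f(A) V$ and $f(A) = 0$ for the companion matrix of $f$.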

Let denote the set of matrix sequences generated by the scalar polynomial . is a module over with respect to the action given above. For short we use to denote the scalar sequences. Let denote matrix polynomials and similarly for . We define a mapping, that is both a vector space isomorphism over and an isomorphism of right modules. It also satisfies the property , for being modulo (this is Corollary 4). Note that is a -module and for , we have whenever . This mapping is an extension of the mapping used by Wiedemann (1986) in his probabilistic analysis to blocks, with the details made explicit.

Now consider the action from the right of matrix polynomials on matrix sequences. We say $G$ generates $S$ if $SG = 0$, i.e., $\sum_k S_{i+k} G_k = 0$ for all $i$, where $G = \sum_k G_k x^k$. $G$ is minimal if its columns form a basis for the annihilator of $S$, or equivalently $\deg(\det(G))$ is minimal (Kaltofen and Yuhasz, 2013). It follows that two minimal generators for $S$ are unimodularly equivalent and thus have the same Smith normal form. By Theorem 2.12 in Kaltofen and Villard (2004), if $G$ is minimal then the $i$-th invariant factor of $G$ divides the $i$-th invariant factor of $xI - A$. This section analyzes the probability that, for random $U$ and $V$, the minimal generator of the projection is $r$-faithful to $A$.

###### Definition 1.

Let $A \in \mathbb{F}_q^{n \times n}$, let $(U, V)$ be uniformly random in $\mathbb{F}_q^{n \times b} \times \mathbb{F}_q^{n \times b}$, let $S = \{U^T A^i V\}_{i=0}^{\infty}$, and let $F$ minimally generate the projection $S$. Define $P_{q,b,r}(A)$ to be the probability that $F$ is $r$-faithful to $A$.

Let $A = W J W^{-1}$, where $W$ is nonsingular and $J$ is a generalized Jordan normal form. Because $U, V$ are chosen uniformly at random, $W^T U$ and $W^{-1} V$ are also uniformly random, and $U^T A^i V = (W^T U)^T J^i (W^{-1} V)$, so that $P_{q,b,r}(A) = P_{q,b,r}(J)$. Therefore, we can restrict our analysis to matrices in Jordan form: $J = \oplus_j J_{f_j^{e_j}}$, where the $f_j$ are monic irreducible polynomials and $J_{f^e}$ denotes the generalized Jordan block associated with $f^e$. Let $C_f$ denote the companion matrix of $f$, and note that $J_{f^1} = C_f$.
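The change-of-basis reduction can be checked numerically. The following sketch (our own, with a hypothetical matrix $J$, basis $W$, and the field $\mathbb{F}_5$) verifies that $U^T A^i V = (W^T U)^T J^i (W^{-1} V)$ for $A = W J W^{-1}$.

```python
# If A = W J W^{-1}, then U^T A^i V = (W^T U)^T J^i (W^{-1} V), so a uniform
# projection of A is distributed as a uniform projection of J.
import random

P = 5

def mat_mul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) % P
             for col in zip(*B)] for row in A]

def inv_mod(M):
    # Gauss-Jordan inverse over F_P
    n = len(M)
    A = [row[:] + [int(i == j) for j in range(n)] for i, row in enumerate(M)]
    for c in range(n):
        piv = next(i for i in range(c, n) if A[i][c] % P)
        A[c], A[piv] = A[piv], A[c]
        inv = pow(A[c][c], P - 2, P)
        A[c] = [x * inv % P for x in A[c]]
        for i in range(n):
            if i != c and A[i][c]:
                A[i] = [(x - A[i][c] * y) % P for x, y in zip(A[i], A[c])]
    return [row[n:] for row in A]

rng = random.Random(1)
n, b = 3, 2
J = [[0, 0, 3], [1, 0, 0], [0, 1, 2]]   # a hypothetical block in Jordan form
W = [[1, 2, 0], [0, 1, 3], [1, 0, 1]]   # a nonsingular change of basis
Wi = inv_mod(W)
A = mat_mul(mat_mul(W, J), Wi)

U = [[rng.randrange(P) for _ in range(b)] for _ in range(n)]
V = [[rng.randrange(P) for _ in range(b)] for _ in range(n)]
T = lambda M: [list(r) for r in zip(*M)]

Up, Vp = mat_mul(T(W), U), mat_mul(Wi, V)   # W^T U and W^{-1} V
Ai = [[int(i == j) for j in range(n)] for i in range(n)]
Ji = [[int(i == j) for j in range(n)] for i in range(n)]
for _ in range(4):
    assert mat_mul(mat_mul(T(U), Ai), V) == mat_mul(mat_mul(T(Up), Ji), Vp)
    Ai, Ji = mat_mul(Ai, A), mat_mul(Ji, J)
```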

$$C_f=\begin{bmatrix}0&0&\cdots&-f_0\\1&0&&-f_1\\&\ddots&\ddots&\vdots\\0&0&1&-f_{d-1}\end{bmatrix},\qquad J_{f^e}=\begin{bmatrix}C_f&0&\cdots&0\\I&C_f&\cdots&0\\&\ddots&\ddots&0\\0&0&I&C_f\end{bmatrix}.$$
###### Definition 2.

Let $f$ be a scalar polynomial of degree $d$. Define $\rho(a) = K_f(a)$ to be the regular representation of the polynomial algebra $\mathbb{F}_q[x]/(f)$. Here we are equating polynomials modulo $f$ with their column vectors of coefficients and, explicitly, $K_f(a)$ is the Krylov matrix generated by the companion matrix $C_f$ and $a$.

$$\rho(a)=K_f(a)=\sum_{i=0}^{d-1}a_iC_f^i=\begin{bmatrix}a&C_fa&\cdots&C_f^{d-1}a\end{bmatrix}.$$

Define $\omega_f(S) = (S_0, S_1, \ldots, S_{d-1})^T$, and then define $\phi_f(S) = P\,\omega_f(S)$, where $P$ is a nonsingular matrix satisfying $P\,\rho(a)^T = \rho(a)\,P$ for all $a$. The existence of such a $P$ is shown in (Taussky and Zassenhaus, 1959). Extend $\omega_f$ and $\phi_f$ componentwise to matrix sequences.
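As a sanity check on Definition 2, the following sketch (our own; $f$ and $a$ are hypothetical choices) verifies over $\mathbb{F}_5$ that $\rho(a) = \sum_i a_i C_f^i$ equals the Krylov matrix $[a \mid C_f a \mid \cdots \mid C_f^{d-1} a]$.

```python
# rho(a) = sum_i a_i C_f^i coincides with the Krylov matrix
# K_f(a) = [a | C_f a | ... | C_f^{d-1} a] built from the coefficient vector.
P = 5

def mat_mul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) % P
             for col in zip(*B)] for row in A]

def companion(f):
    d = len(f) - 1
    C = [[0] * d for _ in range(d)]
    for j in range(d - 1):
        C[j + 1][j] = 1
    for i in range(d):
        C[i][d - 1] = (-f[i]) % P
    return C

f = [2, 0, 1, 1]              # f = 2 + x^2 + x^3
d = len(f) - 1
C = companion(f)
a = [1, 3, 2]                 # a = 1 + 3x + 2x^2 as a coefficient column

# rho(a) = a_0 I + a_1 C + a_2 C^2
rho = [[0] * d for _ in range(d)]
Ck = [[int(i == j) for j in range(d)] for i in range(d)]
for ai in a:
    rho = [[(r + ai * c) % P for r, c in zip(rr, cr)]
           for rr, cr in zip(rho, Ck)]
    Ck = mat_mul(Ck, C)

# Krylov matrix: columns a, C a, C^2 a
col, kry_cols = a[:], []
for _ in range(d):
    kry_cols.append(col)
    col = [sum(C[i][j] * col[j] for j in range(d)) % P for i in range(d)]
kry = [[kry_cols[j][i] for j in range(d)] for i in range(d)]

assert rho == kry
```

The identity holds because $e_j = C_f^j e_0$ for $j < d$, so the $j$-th column of $\rho(a)$ is $\rho(a)e_j = C_f^j a$.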

###### Lemma 3.

Let $f$ be a polynomial of degree $d$, let $S$ be a sequence generated by $f$, and let $g$ be a polynomial with $\deg(g) < d$. Then $\phi_f(Sg) = \phi_f(S)\,g$.

###### Proof.

Because $S$ is generated by $f$, and $\deg(g) < d$, the sequences $S$ and $Sg$ are fully defined by their first $d$ elements. We can write $\omega_f(Sg)$ as a Hankel matrix times vector product and observe that

$$\begin{aligned}\omega_f(Sg)&=\begin{bmatrix}S_0&S_1&\cdots&S_{d-1}\\S_1&S_2&&S_d\\\vdots&&\ddots&\vdots\\S_{d-1}&S_d&\cdots&S_{2d-2}\end{bmatrix}\begin{bmatrix}g_0\\g_1\\\vdots\\g_{d-1}\end{bmatrix}=\begin{bmatrix}\omega_f^T(S)\\\omega_f^T(S)\,C_f\\\vdots\\\omega_f^T(S)\,C_f^{d-1}\end{bmatrix}\begin{bmatrix}g_0\\g_1\\\vdots\\g_{d-1}\end{bmatrix}\\&=\left(\omega_f^T(S)\begin{bmatrix}g&C_fg&\cdots&C_f^{d-1}g\end{bmatrix}\right)^T=\left(\omega_f^T(S)\,\rho(g)\right)^T=\rho^T(g)\,\omega_f(S).\end{aligned}$$

Therefore, $\phi_f(Sg) = P\,\omega_f(Sg)$ (by definition), which by the previous observation equals $P\,\rho^T(g)\,\omega_f(S)$. Using the defining property of $P$ and the definition of $\phi_f$, we have $\phi_f(Sg) = \rho(g)\,P\,\omega_f(S) = \rho(g)\,\phi_f(S) = \phi_f(S)\,g$. ∎

###### Corollary 4.

Let $S$ be a matrix sequence in $\mathcal{S}_f$ and let $G$ be a matrix over $\mathbb{F}_q[x]/(f)$ of conforming dimensions. Then $\phi_f(SG) = \phi_f(S)\,G$.

###### Proof.
$$\phi_f(SG)_{ij}=\phi_f\big((SG)_{ij}\big)=\phi_f\Big(\sum_{k}S_{ik}G_{kj}\Big)=\sum_{k}\phi_f(S_{ik}G_{kj})=\sum_{k}\phi_f(S_{ik})\,G_{kj}=\big(\phi_f(S)\,G\big)_{ij},$$

where the equality $\phi_f(S_{ik}G_{kj}) = \phi_f(S_{ik})\,G_{kj}$ is Lemma 3. ∎

In view of Corollary 4, $G$ generates $S$ if and only if $\phi_f(S)G = 0$. Motivated by this we will also speak of generating matrices over $\mathbb{F}_q[x]/(f)$: $G$ generates $M$ if $MG = 0$, and $G$ minimally generates $M$ if $\deg(\det(G))$ is minimal. Lemma 5 and Theorem 6 relate the Smith normal form (snf) of a matrix over $\mathbb{F}_q[x]/(f^e)$ and the Smith normal form of its minimal generating matrix in the case that the modulus is an irreducible power.

###### Lemma 5.

Let be an irreducible polynomial of degree and let be a positive integer. Let , and be the number of non-zero invariant factors of . If generates , then .

###### Proof.

Let , where are unimodular. Let denote the number of invariant factors of divisible by and let denote , in which is repeated times. Since , we have that , with being the count of invariant factors equal to . Moreover, since generates , , and since is unimodular, . Using this as the base case, it follows by induction that , for . Since and has ones followed by copies of along the diagonal and for , it follows that is a matrix whose first columns are zero.

$$Af^iX=\Big(\;\underbrace{0}_{b-g_i}\;\Big|\;\underbrace{\;*\;}_{g_i}\;\Big).$$

For a matrix in this form, multiplication from the right by has the same effect as multiplication by so that Finally, since has non-zero invariant factors, the maximum number of columns of which are zero is equal to and we conclude that and . ∎

###### Theorem 6.

Let $f$ be an irreducible polynomial of degree $d$, let $e$ be a positive integer, let $A$ be a matrix over $\mathbb{F}_q[x]/(f^e)$, and let $G$ minimally generate $A$. Let

$$\mathrm{snf}(A)=\mathrm{diag}(\overbrace{f^0,\ldots,f^0}^{m_0},\ldots,\overbrace{f^{e-1},\ldots,f^{e-1}}^{m_{e-1}},\overbrace{0,\ldots,0}^{m_e}).$$

Then

$$T\,\mathrm{snf}(G)\,T=\mathrm{diag}(\overbrace{f^e,\ldots,f^e}^{m_0},\ldots,\overbrace{f,\ldots,f}^{m_{e-1}},\overbrace{1,\ldots,1}^{m_e}),$$

where $T$ is the ones-on-antidiagonal matrix, i.e., we have reversed the order of the invariants for convenience. Moreover, $G$ has $t$ nontrivial invariant factors, where $t$ is the number of non-zero invariant factors of $A$.

###### Proof.

Observe that for and consequently . Let , where are unimodular, and let

$$H=Q^{-1}\,\mathrm{diag}(\overbrace{f^e,\ldots,f^e}^{m_0},\ldots,\overbrace{f,\ldots,f}^{m_{e-1}},\overbrace{1,\ldots,1}^{m_e}).$$

By definition, $H$ generates $A$. By Lemma 5, $H$ is minimal because no generators with lower determinantal degree exist. Since minimal generators are unimodularly equivalent, every minimal generator has the same Smith form. ∎

Suppose $S$ is generated by $fg$, where $\gcd(f, g) = 1$, $G_f$ minimally generates $S$ modulo $f$, and $G_g$ minimally generates $S$ modulo $g$. By the Chinese remainder theorem and Newman's Theorem II.14 (Newman, 1972), the Smith normal form of the minimal generator, $G$, of $S$ is $\mathrm{snf}(G_f)\,\mathrm{snf}(G_g)$. This observation leads to the following theorem, which reduces the probability calculation to primary components.

###### Theorem 7.

Let $A$ and $B$ be matrices with relatively prime minimal polynomials. Then

$$P_{q,b,r}(A\oplus B)=P_{q,b,r}(A)\,P_{q,b,r}(B).$$
###### Proof.

Let $f$ and $g$ be the minimal polynomials of $A$ and $B$, respectively, and let $S_f = \{U_1^T A^i V_1\}_{i=0}^{\infty}$ and $S_g = \{U_2^T B^i V_2\}_{i=0}^{\infty}$, with minimal generators $G_f$ and $G_g$, respectively. Then $fg$ is the minimal polynomial of $A \oplus B$. Let

$$S_{fg}=\left\{\begin{pmatrix}U_1^T&U_2^T\end{pmatrix}(A\oplus B)^i\begin{pmatrix}V_1\\V_2\end{pmatrix}\right\}_{i=0}^{\infty}$$

with minimal generator $G_{fg}$. Since $\mathrm{snf}(G_{fg}) = \mathrm{snf}(G_f)\,\mathrm{snf}(G_g)$, $G_{fg}$ is $r$-faithful if and only if $G_f$ and $G_g$ are $r$-faithful. ∎

In view of Theorem 7 we may focus on each primary component. Henceforward the matrix will be of the form $A = \oplus_{i=1}^{m} J_i$, where for a given irreducible polynomial $f$ and nonincreasing exponent sequence $e_1 \geq e_2 \geq \cdots \geq e_m$ we let $J_i$ denote the Jordan block $J_{f^{e_i}}$.

###### Theorem 8.

Using the primary component Jordan form notation just introduced, let $t$ be the greatest index such that $e_t = e_1$ (thus $t + 1$ is the first index such that $e_{t+1} < e_1$). For all $r \leq t$,

$$P_{q,b,r}(\oplus_{i=1}^{m}J_i)=P_{q,b,r}(\oplus_{i=1}^{t}J_1)=P_{q,b,r}(\oplus_{i=1}^{t}C_f).$$

[Lower order invariants don’t matter, and the effect of a Jordan block is the same as that of a companion matrix.]

###### Proof.

Let , let minimally generate , and let . By Theorem 6, is -faithful if the number of invariant factors of is at least . Because for all , , where are blocks of conforming to the blocks of . Furthermore, , where is nonsingular, and and are the rightmost and topmost blocks of and respectively (Harrison et al., 2016). Because is uniformly random and is nonsingular, is uniformly random, and therefore, . ∎

Thus we may focus attention on the probability in the case of companion matrices. To complete the picture we will reduce to the probability that a sum of outer products has a given rank. First (Lemma 9) we observe the relationship between sequences of projections of companion matrices and outer products. Then (Theorem 11) we relate the sum of outer products to the probability.

###### Lemma 9.

Let $S = \{U^T C_f^i V\}_{i=0}^{\infty}$, where $f$ is an irreducible polynomial of degree $d$, and $U, V \in \mathbb{F}_q^{d \times b}$ are chosen uniformly at random. Then $\phi_f(S)$ is the outer product of two uniformly random vectors in $\mathbb{F}_{q^d}^{b}$.

###### Proof.

The $(i,j)$ entry of $S$ satisfies

$$\omega_f(S_{ij})=\big(U_i^TV_j,\;U_i^TC_fV_j,\;\ldots,\;U_i^TC_f^{d-1}V_j\big)^T=\rho(V_j)^T\,U_i,$$

where $U_i$ denotes the $i$-th column of $U$ and similarly for $V_j$. Consequently, the $(i,j)$ entry of $\phi_f(S)$ is

$$\phi_f(S)_{ij}=P\,\rho(V_j)^T\,U_i=\rho(V_j)\,(PU_i)=V_j\,(PU_i).$$

Since $P$ is nonsingular and $U_i$ is uniformly random, $PU_i$ is also uniformly random. Therefore, $\phi_f(S)$ is the outer product of two uniformly random vectors in $\mathbb{F}_{q^d}^{b}$. ∎

###### Definition 10.

Let $Q_{q,b,r}(t)$ denote the probability that $\mathrm{rank}(M) = r$ when $M$ is a sum of $t$ outer products, $M = \sum_{i=1}^{t} u_i v_i^T$, and the vectors $u_i, v_i \in \mathbb{F}_q^{b}$ are chosen uniformly at random.

###### Theorem 11.

For irreducible $f$ of degree $d$,

$$P_{q,b,r}(\oplus_{i=1}^{t}C_f)=\sum_{i=r}^{t}Q_{q^d,b,i}(t).$$
###### Proof.

Let $S = \{U^T(\oplus_{i=1}^{t}C_f)^iV\}_{i=0}^{\infty}$, where $U, V$ are chosen uniformly at random. Let $G$ minimally generate $S$. By Theorem 6, $G$ has $s$ nontrivial invariant factors, where $s = \mathrm{rank}(\phi_f(S))$. By Lemma 9, $\phi_f(S)$ is the sum of $t$ outer products of uniformly random vectors in $\mathbb{F}_{q^d}^{b}$, and the probability that $s \geq r$ is $\sum_{i=r}^{t}Q_{q^d,b,i}(t)$. ∎

The probability that the sum of $t$ outer products has rank $r$ can be computed with the following recurrence (Harrison et al., 2016).

###### Theorem 12.
$$Q_{q,b,r}(t)=\begin{cases}0&\text{if }r<0\text{ or }r>\min(t,b),\\[2pt]1&\text{if }r=0\text{ and }t=0,\\[2pt]\psi_{t,r}&\text{otherwise},\end{cases}$$

where $\psi_{t,r}$ satisfies a recurrence in $t$ and $r$; see Harrison et al. (2016) for the explicit coefficients.
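Independently of the recurrence, $Q_{q,b,r}(t)$ can be estimated by direct sampling. The following Monte Carlo sketch (our own, not the paper's method) estimates the rank distribution of a sum of $t$ outer products over $\mathbb{F}_2$; the trial count and seed are arbitrary choices.

```python
# Estimate Q_{2,b,r}(t): the probability that a sum of t outer products
# u_i v_i^T of uniform vectors in F_2^b has rank r.
import random

def rank_mod2(M):
    # Gaussian elimination over F_2
    M = [row[:] for row in M]
    rows, cols = len(M), len(M[0]) if M else 0
    r = 0
    for c in range(cols):
        piv = next((i for i in range(r, rows) if M[i][c]), None)
        if piv is None:
            continue
        M[r], M[piv] = M[piv], M[r]
        for i in range(rows):
            if i != r and M[i][c]:
                M[i] = [x ^ y for x, y in zip(M[i], M[r])]
        r += 1
    return r

def estimate_Q(b, t, trials=20000, seed=1):
    rng = random.Random(seed)
    counts = [0] * (b + 1)
    for _ in range(trials):
        M = [[0] * b for _ in range(b)]
        for _ in range(t):
            u = [rng.randrange(2) for _ in range(b)]
            v = [rng.randrange(2) for _ in range(b)]
            for i in range(b):
                if u[i]:
                    for j in range(b):
                        M[i][j] ^= v[j]   # add u_i * v_j over F_2
        counts[rank_mod2(M)] += 1
    return [c / trials for c in counts]

# sanity check: for b = t = 1 over F_2, rank 1 requires u = v = 1,
# so Q_{2,1,1}(1) = 1/4
est = estimate_Q(1, 1)
assert abs(est[1] - 0.25) < 0.02
```

For $b = t = 1$ over $\mathbb{F}_2$, the rank is 1 exactly when both scalars are 1, so $Q_{2,1,1}(1) = 1/4$, which the estimate reproduces.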

Now a formula for $P_{q,b,r}(A)$ follows from Theorems 6, 8 and 11. A Schur complement argument is involved in separating the leading repeated block from the lower exponent blocks.

###### Lemma 13.

Let $f$ be an irreducible polynomial of degree $d$, and let $G = I_r \oplus 0$. Let $H = \begin{pmatrix}A&B\\C&D\end{pmatrix}$ be uniformly random with entries divisible by $f$, where $A$, $B$, $C$, and $D$ are blocks conforming to the dimensions of the blocks of $G$. Then $G + H$ is unimodularly equivalent to $(I_r + A) \oplus Y$, where $Y$ is a projection of $D$.

###### Proof.

Because the entries of $A$ are divisible by $f$ and $\mathbb{F}_q[x]/(f^e)$ is a local ring, the matrix $I_r + A$ is nonsingular. Let

$$P=\begin{pmatrix}I&0\\-C(I_r+A)^{-1}&I\end{pmatrix}\quad\text{and}\quad Q=\begin{pmatrix}I&-(I_r+A)^{-1}B\\0&I\end{pmatrix}.$$

Then,

$$P(G+H)Q=(I_r+A)\oplus\big(-C(I_r+A)^{-1}B+D\big)=(I_r+A)\oplus Y,$$

where $Y = D - C(I_r+A)^{-1}B$. $P$ and $Q$ are trivially unimodular, and $Y$ is the lower-right block of $P(G+H)Q$. Therefore, $Y$ is a projection of $D$. ∎
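The elimination in the proof can be seen in the smallest case. The following sketch (our own) works a scalar ($r = 1$) instance over the rationals for readability, although the lemma itself operates over the local ring $\mathbb{F}_q[x]/(f^e)$; the entries are hypothetical.

```python
# Block elimination: P (G+H) Q = (I+A) ⊕ Y with Schur complement
# Y = D - C (I+A)^{-1} B, here with 1x1 blocks over Q.
from fractions import Fraction as F

# G + H = [[I_r + A, B], [C, D]] with 1x1 blocks
a, b_, c, d_ = F(2), F(3), F(4), F(5)
M = [[a, b_], [c, d_]]

Pm = [[F(1), F(0)], [-c / a, F(1)]]   # P = [[I, 0], [-C(I+A)^{-1}, I]]
Qm = [[F(1), -b_ / a], [F(0), F(1)]]  # Q = [[I, -(I+A)^{-1}B], [0, I]]

def mm(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

R = mm(mm(Pm, M), Qm)
Y = d_ - c * b_ / a                   # Schur complement D - C(I+A)^{-1}B

assert R == [[a, F(0)], [F(0), Y]]    # P(G+H)Q = (I+A) ⊕ Y
```

Since $P$ and $Q$ have determinant 1, the reduction preserves unimodular equivalence, which is the point of the lemma.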

###### Theorem 14.

Let $f$ be an irreducible polynomial of degree $d$, let $A = \oplus_{i=1}^{m}J_i$ (notation of Theorem 8), and let $t$ be the greatest index such that $e_t = e_1$. Then

$$P_{q,b,r}(A)=\begin{cases}1&\text{if }A=0,\\[2pt]P_{q,b,t}(A)\,P_{q,b-t,r-t}(\oplus_{i=t+1}^{m}J_i)&\text{if }r>t,\\[2pt]\sum_{i=r}^{t}Q_{q^d,b,i}(t)&\text{if }r\leq t.\end{cases}$$
###### Proof.

The probability that the largest invariant factor is preserved at least $r$ times is given in Theorem 11, and by Theorem 8 it is independent of smaller invariant factors being preserved. Therefore, if $r \leq t$, the probability that the $r$ largest invariant factors are preserved is $\sum_{i=r}^{t}Q_{q^d,b,i}(t)$.

If $r > t$, the $t$ largest invariant factors must be successfully preserved along with the next $r - t$ invariant factors. Let $G$ minimally generate the projection. By Theorem 6, the leading invariant factors of $G$ are correct if the corresponding number of ones appears in the Smith normal form. Therefore, by a unimodular transformation, the projection is equivalent to a block matrix as in the hypothesis of Lemma 13, and consequently the remaining blocks are projected onto a smaller block. Since the projection was accomplished by a unimodular transformation, the probability that the remaining invariant factors are successfully preserved is the probability that a uniformly random projection preserves them. ∎

## 3 Examples

In this section we present several examples of how to compute $P_{q,b,r}(A)$ for given matrix structures. Let $q = 2$, and let $f, g, h$ be distinct irreducible polynomials with $\deg(f) = \deg(g) = 1$ and $\deg(h) = 2$. Note that these are the three lowest degree irreducible polynomials in $\mathbb{F}_2[x]$. For a matrix $A$ we list its invariant factors in decreasing order. For example, let $A = C_{f^2} \oplus C_f \oplus C_g \oplus C_g \oplus C_h$.

To compute $P_{q,b,r}(A)$, first $A$ is split into its distinct primary components (Theorem 7):

$$P_{q,b,r}(A)=P_{q,b,r}(C_{f^2}\oplus C_f)\,P_{q,b,r}(C_g\oplus C_g)\,P_{q,b,r}(C_h).$$

When $r = 1$, applying Theorem 14 yields

$$\begin{aligned}P_{q,b,r}(A)&=P_{q,b,1}(C_{f^2})\,P_{q,b,1}(C_g\oplus C_g)\,P_{q,b,1}(C_h)\\&=Q_{q,b,1}(1)\,\big(Q_{q,b,1}(1)+Q_{q,b,1}(2)\big)\,Q_{q^2,b,1}(1).\end{aligned}$$

Otherwise, when $r = 2$,

$$\begin{aligned}P_{q,b,r}(A)&=P_{q,b,1}(C_{f^2})\,P_{q,b-1,1}(C_f)\,P_{q,b,2}(C_g\oplus C_g)\,P_{q,b,1}(C_h)\\&=Q_{q,b,1}(1)\,Q_{q,b-1,1}(1)\,Q_{q,b,2}(2)\,Q_{q^2,b,1}(1)\\&=Q_{q,b,2}(2)\,Q_{q,b,2}(2)\,Q_{q^2,b,1}(1).\end{aligned}$$

To illustrate the effect of invariant structure on $P_{q,b,r}(A)$, Table 1 shows the probability computed for several matrices of the above form. Two of these are the worst case matrices for the respective parameter choices, using the worst case construction given in the following section.

We also performed an experimental check on the probabilities $P_{q,b,r}(A)$. In Novocin et al. (2015) the Ding-Yuan family of matrices was among those studied, with the goal of developing a formula for their ranks over $\mathbb{F}_2$. One matrix in this family has an invariant factor repeated 19 times. We ran ten thousand trials and obtained the results in Table 2, giving for each $r$ the probability that exactly $r$ invariants are correct and the percentage of the ten thousand trials in which exactly $r$ correct invariants resulted. The data is quite consistent with the theory. It turns out that 3 or more correct invariants is sufficient in this case to infer the rank. Only two trials failed to provide the first 3 invariants. The example is further discussed in Section 7.

## 4 Worst Case

Recall from the introduction that we define $P_{q,b,r}$ to be the worst case, over all matrices, of $P_{q,b,r}(A)$. The formula we will derive for $P_{q,b,r}$ can be used to determine the block size needed to preserve the leading $r$ invariant factors with a specified probability of success. It will show that with a block size modestly larger than $r$ the probability of preserving $r$ invariant factors is quite high, even for small fields. The construction and formula generalize Theorem 20 from (Harrison et al., 2016), which obtained a similar bound for preserving the minimal polynomial. To develop the formula, we begin with the following properties, derived from Theorems 7 and 14, to compute the probability for the leading Jordan block and the Schur complement to induct on the remaining blocks.

###### Lemma 15.

1. for .

2. Let and be irreducible polynomials of degree and respectively with , then
.

3. Let and be irreducible polynomials both of degree and let r be given. Then
, for is minimized when and .

###### Proof.

Parts 2 and 3 follow from Theorem 14 and are straightforward. Part 1, while intuitively clear, is more complicated. For part 1, let , where . Let