Parallel Robust Computation of Generalized Eigenvectors of Matrix Pencils

In this paper we consider the problem of computing generalized eigenvectors of a matrix pencil in real Schur form. In exact arithmetic, this problem can be solved using substitution. In practice, substitution is vulnerable to floating-point overflow. The robust solvers xTGEVC in LAPACK prevent overflow by dynamically scaling the eigenvectors. These subroutines are sequential scalar codes which compute the eigenvectors one by one. In this paper we discuss how to derive robust blocked algorithms. The new StarNEig library contains a robust task-parallel solver Zazamoukh which runs on top of StarPU. Our numerical experiments show that Zazamoukh achieves a super-linear speedup compared with DTGEVC for sufficiently large matrices.


1 Introduction

Let A ∈ R^(n×n) and B ∈ R^(n×n). The matrix pencil (A, B) consists of all matrices of the form A − λB, where λ ∈ C. The set of (generalized) eigenvalues of the matrix pencil (A, B) is given by

λ(A, B) = { λ ∈ C : det(A − λB) = 0 }. (1)

We say that x is a (generalized) eigenvector of the matrix pencil (A, B) if and only if x ≠ 0 and

(A − λB)x = 0. (2)

The eigenvalues of (A, B) can be computed by first reducing (A, B) to real Schur form (S, T). Specifically, there exist orthogonal matrices Q and Z such that S = Q^T A Z is quasi-upper triangular and T = Q^T B Z is upper triangular. It is clear that

λ(A, B) = λ(S, T). (3)

Moreover, if y is a generalized eigenvector of (S, T) corresponding to the eigenvalue λ, then x = Zy is a generalized eigenvector of (A, B) corresponding to the eigenvalue λ.

In this paper, we consider the parallel computation of eigenvectors of a matrix pencil in real Schur form. In exact arithmetic, this problem can be solved using substitution. However, substitution is very vulnerable to floating-point overflow.

In LAPACK [3] there exists a family xTGEVC of subroutines which compute the generalized eigenvectors of a matrix pencil in Schur form. They prevent overflow by dynamically scaling the eigenvectors. These subroutines are scalar codes which compute the eigenvectors one by one. In this paper we discuss the construction of algorithms which are not only robust, but blocked and parallel.
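The dynamic-scaling idea can be conveyed with a small sketch (pure Python; the function name, the tiny threshold `omega`, and the upper triangular real case are illustrative assumptions, and the actual xTGEVC logic is considerably more elaborate). The sketch back-substitutes for an eigenvector of an upper triangular pencil and downscales the entire partial eigenvector whenever a division would exceed the threshold:

```python
def eigvec_backsub(S, T, k, omega=1e4):
    """Eigenvector of (S, T) for the real eigenvalue lam = S[k][k]/T[k][k],
    S and T upper triangular, with downscaling to keep |y_i| <= omega.
    omega is an artificially tiny overflow threshold for illustration."""
    n = len(S)
    lam = S[k][k] / T[k][k]
    y = [0.0] * n
    y[k] = 1.0
    scale = 1.0                          # accumulated scaling factor
    for i in range(k - 1, -1, -1):
        # right-hand side: -sum_{j>i} (S[i][j] - lam*T[i][j]) * y[j]
        rhs = -sum((S[i][j] - lam * T[i][j]) * y[j] for j in range(i + 1, n))
        d = S[i][i] - lam * T[i][i]      # nonzero when eigenvalues are distinct
        if abs(rhs) > abs(d) * omega:    # the division rhs/d would exceed omega:
            s = abs(d) * omega / abs(rhs)
            y = [s * yj for yj in y]     # downscale the whole eigenvector
            scale *= s
            rhs *= s
        y[i] = rhs / d
    return y, scale
```

The returned `scale` records the product of all applied scalings, so `y / scale` is the unscaled eigenvector in exact arithmetic.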

Our paper is organized as follows. In Section 2 we consider the problem of computing the eigenvectors of a matrix pencil in real Schur form using real arithmetic. This problem is equivalent to solving a homogeneous matrix equation of the form

S Y D_β − T Y D_α = 0, (4)

where D_β is diagonal and D_α is block diagonal with diagonal blocks which are either 1-by-1 or 2-by-2. In Section 3 we present a blocked algorithm for solving this matrix equation. In Section 4 we discuss how to prevent overflow in this algorithm. The concept of an augmented matrix is central to this discussion. In Section 5 we describe Zazamoukh, a robust task-parallel solver which has been integrated into the new StarNEig library for solving non-symmetric eigenvalue problems [1, 9]. The performance of Zazamoukh is compared to LAPACK in Section 6. We outline several directions for future work in Section 7.

2 Real arithmetic

Let A ∈ R^(n×n) and B ∈ R^(n×n) be given. The set of generalized eigenvalues of the matrix pencil (A, B) can be computed by first reducing (A, B) to generalized real Schur form. Specifically, there exist orthogonal matrices Q and Z such that

S = Q^T A Z,  T = Q^T B Z

are upper block triangular; S is quasi-upper triangular with diagonal blocks S_jj of size 1-by-1 or 2-by-2 and T is upper triangular. It is clear that λ(A, B) = λ(S, T).

In order to simplify the discussion, we will make the following assumptions.

  1. If S_jj is 1-by-1, then the pencil (S_jj, T_jj) has a single real eigenvalue.

  2. If S_jj is 2-by-2, then the pencil (S_jj, T_jj) has two complex conjugate eigenvalues.

  3. All eigenvalues are distinct.

We follow the standard convention and represent the eigenvalues λ_j using an ordered pair (α_j, β_j), where α_j ∈ C and β_j ≥ 0. If β_j ≠ 0, then λ_j = α_j/β_j is a finite eigenvalue. The case of β_j = 0 and α_j ≠ 0 corresponds to an infinite eigenvalue. The case of α_j = 0 and β_j = 0 corresponds to an indefinite problem.
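The classification of an eigenvalue pair can be sketched as follows (a hypothetical helper for illustration, not part of LAPACK or StarNEig):

```python
def classify_eigenvalue(alpha, beta):
    """Classify the eigenvalue represented by the ordered pair (alpha, beta)."""
    if beta != 0.0:
        return "finite", alpha / beta    # lambda = alpha / beta
    if alpha != 0.0:
        return "infinite", None          # beta = 0, alpha != 0
    return "indefinite", None            # alpha = beta = 0: indefinite problem
```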

We now consider the problem of computing the generalized eigenvectors of (S, T). Our goal is to obtain an equivalent problem which can be solved using a blocked algorithm.

2.1 Computing a single eigenvector

In this subsection, we note that the problem of computing a single generalized eigenvector of (S, T) is equivalent to solving a tall homogeneous matrix equation involving real matrices. Let λ_j = (α_j, β_j) and let Y_j ∈ R^(n×m_j), where m_j = 1 if λ_j is real and m_j = 2 if λ_j is complex. Let a_j and b_j be given by

a_j = α_j,  b_j = β_j (5)

when m_j = 1 (or equivalently, when λ_j is real) and

a_j = [ Re(α_j)  Im(α_j) ; −Im(α_j)  Re(α_j) ],  b_j = β_j I_2 (6)

when m_j = 2 (or equivalently, when λ_j is complex). With this notation, the problem of computing an eigenvector is equivalent to solving the homogeneous linear equation

S Y_j b_j − T Y_j a_j = 0 (7)

with respect to Y_j. This is an immediate consequence of the following lemma.

Lemma 1

Let λ_j = (α_j, β_j) and let Y_j ∈ R^(n×m_j), where m_j ∈ {1, 2}. Then the following statements are true.

  1. If m_j = 1, then y_j = Y_j is a real eigenvector corresponding to the real eigenvalue λ_j if and only if Y_j has rank 1 and solves equation (7).

  2. If m_j = 2, then z_j = u_j + i v_j, where Y_j = [u_j  v_j], is a complex eigenvector corresponding to the complex eigenvalue λ_j if and only if Y_j has rank 2 and solves equation (7).

2.2 Computing all eigenvectors

In this subsection, we note that the problem of computing all generalized eigenvectors of (S, T) is equivalent to solving a homogeneous matrix equation involving real matrices. Specifically, let D_α and D_β be given by

D_α = diag(a_1, a_2, …, a_p),  D_β = diag(b_1, b_2, …, b_p), (8)

where the a_j and b_j are given by equations (5) and (6). Then Y = [Y_1  Y_2  …  Y_p] solves the homogeneous matrix equation

S Y D_β − T Y D_α = 0 (9)

if and only if each Y_j solves equation (7).
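The construction of the blocks in equations (5) and (6) can be sketched as follows (pure Python; the helper name and the list-of-lists matrix format are assumptions). For a complex eigenvalue α_j = a + ib the 2-by-2 block encodes multiplication by α_j acting on the real and imaginary columns:

```python
def eig_blocks(alpha, beta):
    """Return (a_j, b_j) as small real matrices for the eigenvalue pair
    (alpha, beta): 1-by-1 blocks for a real eigenvalue, 2-by-2 for a
    complex one, following the convention of equations (5) and (6)."""
    alpha = complex(alpha)
    if alpha.imag != 0.0:
        a, b = alpha.real, alpha.imag
        # [[a, b], [-b, a]] is the real 2-by-2 representation of a + ib
        return [[a, b], [-b, a]], [[beta, 0.0], [0.0, beta]]
    return [[alpha.real]], [[beta]]
```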

3 A blocked algorithm

It is straightforward to derive a blocked algorithm for solving the homogeneous matrix equation (9). Specifically, redefine S = [S_ij] to denote any partitioning of S into an N-by-N block matrix which does not split any of the 2-by-2 blocks along the diagonal of S. Apply the same partitioning to T, Y, D_α, and D_β. Then S, T, and Y are block upper triangular. It is straightforward to see that equation (9) can be solved using Algorithm 1.

for j = N, N−1, …, 1 do
       Compute generalized eigenvectors Y_jj of the pencil (S_jj, T_jj);
       for i = j−1, j−2, …, 1 do
             for k = i+1, i+2, …, j do
                   Y_ij ← Y_ij − (S_ik Y_kj (D_β)_jj − T_ik Y_kj (D_α)_jj);
            Solve
                   S_ii X (D_β)_jj − T_ii X (D_α)_jj = Y_ij (10)
            with respect to X and set Y_ij ← X;

Algorithm 1 Blocked computation of all generalized eigenvectors
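Choosing a partitioning that never splits a 2-by-2 diagonal block can be sketched as follows (`block_starts` is a hypothetical helper; a 2-by-2 block is detected by a nonzero subdiagonal entry of the quasi-upper triangular S):

```python
def block_starts(S, nb):
    """Start indices of a block partitioning of an n-by-n quasi-upper
    triangular S with target block size nb, chosen so that no 2-by-2
    diagonal block is split between two partitions."""
    n = len(S)
    starts, i = [], 0
    while i < n:
        starts.append(i)
        j = min(i + nb, n)
        # a nonzero subdiagonal entry S[j][j-1] means rows j-1 and j belong
        # to the same 2-by-2 block, so move the boundary one row down
        if j < n and S[j][j - 1] != 0.0:
            j += 1
        i = j
    return starts
```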

We see that Algorithm 1 can be implemented using three distinct kernels:

  1. Compute the eigenvectors corresponding to a small matrix pencil (S_jj, T_jj) in real Schur form.

  2. Execute linear updates of the form

    Y_ij ← Y_ij − (S_ik Y_kj (D_β)_jj − T_ik Y_kj (D_α)_jj), (11)

    where S_ik and T_ik are small dense matrices, (D_β)_jj is diagonal and (D_α)_jj is block diagonal with blocks of size 1-by-1 or 2-by-2.

  3. Solution of small matrix equations of the form

    S_ii X (D_β)_jj − T_ii X (D_α)_jj = C, (12)

    where (S_ii, T_ii) is a matrix pencil in real Schur form, (D_β)_jj is diagonal and (D_α)_jj is block diagonal with diagonal blocks of dimension 1-by-1 or 2-by-2.

Once these kernels have been implemented, it is straightforward to parallelize Algorithm 1 using a task-based runtime system such as StarPU [4].

4 Deriving a robust blocked algorithm

The subroutines xTGEVC prevent overflow using principles originally derived by Edward Anderson and implemented in the subroutines xLATRS [2]. These subroutines apply to triangular linear systems

Tx = b (13)

with a single right-hand side b. Mikkelsen and Karlsson [7] formalized the work of Anderson and derived a robust blocked algorithm for solving (13). Mikkelsen, Schwarz and Karlsson [8] derived a robust blocked algorithm for solving triangular linear systems

TX = B

with multiple right-hand sides. Their StarPU implementation (Kiya) is significantly faster than DLATRS when numerical scaling is necessary and not significantly slower than DTRSM when numerical scaling is unnecessary.

A robust variant of Algorithm 1 can be derived using the principles applied by Mikkelsen, Schwarz and Karlsson [8]. We can use the two real functions ProtectUpdate and ProtectDivision introduced by Mikkelsen and Karlsson [7]. They are used to prevent scalar divisions and linear updates from exceeding the overflow threshold Ω. They have the following key properties.

  1. If 0 < |t| ≤ Ω and |b| ≤ Ω, and

    ζ = ProtectDivision(|b|, |t|), (14)

    then ζ ∈ (0, 1] and |ζb| ≤ |t|Ω. It follows that the scaled division

    x = (ζb)/t (15)

    cannot exceed Ω.

  2. If Z = Y − XH is defined, with

    ‖Y‖_∞ ≤ Ω and ‖X‖_∞‖H‖_∞ ≤ Ω, (16)

    and

    ζ = ProtectUpdate(‖Y‖_∞, ‖X‖_∞, ‖H‖_∞), (17)

    then ζ ∈ (0, 1] and ζ(‖Y‖_∞ + ‖X‖_∞‖H‖_∞) ≤ Ω. It follows that

    Z = (ζY) − (ζX)H (18)

    can be computed without any intermediate or final result exceeding Ω.

Now consider the kernels needed for implementing Algorithm 1. It is easy to see that they can be implemented using loops, divisions and linear updates. It is therefore plausible that robust variants can be implemented using ProtectDivision and ProtectUpdate. However, resolving all the relevant details requires many pages. Here we explain how the updates given by equation (11) can be executed without exceeding the overflow threshold Ω. We will use the concept of an augmented matrix introduced by Mikkelsen, Schwarz and Karlsson [8].
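The flavour of ProtectDivision can be conveyed with a small sketch (an illustration of the property in item 1 above, not the actual routine from [7]; the threshold constant is an assumption):

```python
OMEGA = 1e300  # stand-in overflow threshold; the double-precision Omega is near 1.8e308

def protect_division(b, t):
    """Return zeta in (0, 1] such that |(zeta * b) / t| <= OMEGA,
    assuming 0 < |t| <= OMEGA and |b| <= OMEGA."""
    # compare |b| / OMEGA against |t| to avoid overflow in |t| * OMEGA
    if abs(b) / OMEGA > abs(t):
        return abs(t) / (abs(b) / OMEGA)   # zeta = |t| * OMEGA / |b|
    return 1.0
```

When the plain division b/t would stay below the threshold, the scaling factor is simply 1 and no work is wasted.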

Definition 1

Let Y ∈ R^(m×n) be partitioned into block columns

Y = [Y_1  Y_2  …  Y_p] (19)

and let α = (α_1, α_2, …, α_p) have α_j ∈ (0, 1]. The augmented matrix ⟨α, Y⟩ represents the real matrix Z given by

Z = [α_1^(−1) Y_1  α_2^(−1) Y_2  …  α_p^(−1) Y_p]. (20)

This is a trivial extension of the original definition, which only considered the case of p = 1. The purpose of the scaling factors α_j is to extend the normal representational range of our floating-point numbers.
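A minimal model of Definition 1 can be sketched as follows (an illustrative class with one scaling factor per block column; this is not StarNEig's internal data structure):

```python
class AugmentedMatrix:
    """<alpha, Y>: block columns stored together with scalings alpha_j in (0, 1]."""

    def __init__(self, alphas, block_cols):
        self.alphas = alphas          # one alpha_j per block column
        self.block_cols = block_cols  # each Y_j stored as a list of columns

    def represented(self):
        """Materialize the represented matrix [alpha_1^{-1} Y_1, ...],
        returned as a flat list of columns."""
        cols = []
        for a, Yj in zip(self.alphas, self.block_cols):
            cols.extend([x / a for x in col] for col in Yj)
        return cols
```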

Data: An augmented matrix ⟨α, Y⟩ where Y = [Y_1  Y_2  …  Y_p],
and a block diagonal matrix H = diag(H_1, H_2, …, H_p) such that ‖Y_j‖_∞ ≤ Ω, ‖H_j‖_∞ ≤ Ω and YH is defined.
Result: An augmented matrix ⟨β, Z⟩ such that
⟨β, Z⟩ = ⟨α, Y⟩H (21)
and Z can be computed without exceeding Ω.
for j = 1, 2, …, p do
       choose ζ_j ∈ (0, 1] such that ζ_j ‖Y_j‖_∞ ‖H_j‖_∞ ≤ Ω;
       β_j ← ζ_j α_j and Z_j ← (ζ_j Y_j) H_j;

Algorithm 2 Right updates with block diagonal matrix

Now W = YH, where H is block diagonal, can be computed without overflow using Algorithm 2. In our case H is diagonal, so we are merely scaling the columns of Y. Moreover, left updates of the form Z − TW can now be computed without overflow using Algorithm 3. It follows that updates of the form (11) can be computed without overflow using two applications of Algorithm 2 and Algorithm 3. Now suppose that the blocks are b-by-b. Then Algorithm 2 does O(b²) flops on O(b²) data. However, Algorithm 3 then does O(b³) flops on the same data. Therefore, the overall arithmetic intensity is O(b).

Data: A matrix H and augmented matrices ⟨α, X⟩ and ⟨β, Y⟩ where
X = [X_1  X_2  …  X_p],  Y = [Y_1  Y_2  …  Y_p], (22)
such that Y − HX is defined and
‖H‖_∞ ≤ Ω,  ‖X_j‖_∞ ≤ Ω,  ‖Y_j‖_∞ ≤ Ω. (23)
Result: An augmented matrix ⟨γ, Z⟩ such that
⟨γ, Z⟩ = ⟨β, Y⟩ − H⟨α, X⟩ (24)
and Z can be computed without exceeding Ω.
for j = 1, 2, …, p do
       δ_j ← min{α_j, β_j};
       X_j ← (δ_j/α_j) X_j;
       Y_j ← (δ_j/β_j) Y_j;
       ζ_j ← ProtectUpdate(‖Y_j‖_∞, ‖X_j‖_∞, ‖H‖_∞);
       Z_j ← (ζ_j Y_j) − H(ζ_j X_j);

γ ← (ζ_1 δ_1, ζ_2 δ_2, …, ζ_p δ_p);
return ⟨γ, Z⟩
Algorithm 3 Left update with dense matrix

In order to execute all linear updates needed for a robust variant of Algorithm 1 we require certain norms. Specifically, we need the infinity norms of all super-diagonal blocks of S and T. Moreover, we require the infinity norms of certain submatrices of Y. These submatrices consist of either a single column (a segment of a real eigenvector) or two adjacent columns (a segment of a complex eigenvector). The infinity norm must be recomputed whenever a submatrix has been initialized or updated. ProtectUpdate requires that its input arguments are bounded by Ω and failure is possible if they are not. It is therefore necessary to scale the matrices S and T to ensure that all blocks have infinity norm bounded by Ω.

5 Zazamoukh - a task-parallel robust solver

The new StarNEig library runs on top of StarPU and can be used to solve dense non-symmetric eigenvalue problems. A robust variant of Algorithm 1 has been implemented in StarNEig. This implementation (Zazamoukh) uses augmented matrices whose scaling factors are stored as signed 32-bit integers. Zazamoukh can compute the eigenvectors corresponding to any subset of the spectrum which is closed under complex conjugation. Zazamoukh is currently limited to shared memory, but an extension to distributed memory is planned.

5.1 Memory layout

Given block sizes m_b and n_b, Zazamoukh partitions S, T and the matrix Y of eigenvectors conformally by rows and columns. In the absence of any 2-by-2 blocks on the diagonal of S, the tiles of S and T are m_b-by-m_b and the tiles of Y are m_b-by-n_b. The only exceptions can be found along the right and lower boundaries of the matrices. This default configuration is adjusted minimally to prevent splitting any 2-by-2 block of S or separating the real part and the imaginary part of a complex eigenvector into separate block columns.

5.2 Tasks

Zazamoukh relies on four types of tasks

  1. Pre-processing tasks which compute all quantities needed for robustness. This includes the infinity norms of all super-diagonal tiles of S and T as well as all norms needed for the robust solution of equations of the type (12). If necessary, the matrices S and T are scaled minimally.

  2. Solve tasks which use DTGEVC to compute the lower tips of eigenvectors and a robust solver based on DLALN2 to solve equations of the type (12).

  3. Update tasks which execute updates of the type (11) robustly.

  4. Post-processing tasks which enforce a consistent scaling on all eigenvectors.

5.3 Task insertion order and priorities

Zazamoukh is closely related to Kiya which solves triangular linear systems with multiple right-hand sides. Apart from the pre-processing and post-processing tasks, the main task graph is the disjoint union of task sub-graphs, one for each block column of the matrix of eigenvectors. Zazamoukh uses the same task insertion order and priorities as Kiya to process each of the sub-graphs.

6 Numerical experiments

In this section we give the results of a set of experiments involving tiny (m ≤ 10000) and small (10000 < m ≤ 40000) matrices. Each experiment consisted of computing all eigenvectors of the matrix pencil. The run time was measured for LAPACK (DTGEVC) and Zazamoukh. Results related to somewhat larger matrices can be found in the NLAFET Deliverable 2.7 [10].

The experiments were executed on an Intel Xeon E5-2690v4 (“Broadwell”) node with 28 cores arranged in two NUMA islands with 14 cores each. The theoretical peak performance in double-precision arithmetic is 41.6 GFLOPS/s for one core and 1164.8 GFLOPS/s for a full node.

We used the StarNEig test program starneig-test to generate reproducible experiments. The default parameters produce matrix pencils where approximately 1 percent of all eigenvalues are zeros, 1 percent of all eigenvalues are infinities and there are no indefinite eigenvalues. Zazamoukh used the default tile size, which is 1.6 percent of the matrix dimension for large matrix pencils.

All experiments were executed with exclusive access to a complete node (28 cores). LAPACK was run in sequential mode, while Zazamoukh used 28 StarPU workers and 1 master thread. A summary of our results is given in Table 1. The speedup of Zazamoukh over LAPACK is initially very modest, as there are not enough tasks to keep 28 workers busy, but it picks up rapidly and Zazamoukh achieves a super-linear speedup over DTGEVC when m ≥ 10000. This is an expression of the fact that Zazamoukh uses a blocked algorithm, whereas DTGEVC computes the eigenvectors one by one.

    m   zeros   inf.  indef.      LAPACK    StarNEig    speedup
 1000      11     13       0         295         175     1.6857
 2000      25     16       0        1598         409     3.9071
 3000      24     30       0        6182         929     6.6545
 4000      42     49       0       15476        1796     8.6169
 5000      54     37       0       30730        2113    14.5433
 6000      61     64       0       53700        2637    20.3641
 7000      67     64       0       84330        3541    23.8153
 8000      56     69       0      122527        4769    25.6924
 9000      91     91       0      171800        6189    27.7589
10000     108     94       0      242466        7821    31.0019
20000     175    197       0     2034664       49823    40.8378
30000     306    306       0     7183746      162747    44.1406
40000     366    382       0    17713267      380856    46.5091

Table 1: Comparison of sequential DTGEVC to task-parallel Zazamoukh using 28 cores. For each dimension m we report the number of zero, infinite and indefinite eigenvalues, the runtimes in milliseconds (ms) for LAPACK and StarNEig, and the speedup of Zazamoukh over LAPACK. Values above 28 correspond to super-linear speedup. All eigenvectors were computed with a relative residual which is a modest multiple of the double-precision unit roundoff u.

7 Conclusion

Previous work by Mikkelsen, Schwarz and Karlsson has shown that triangular linear systems can be solved in parallel without overflow using augmented matrices. In this paper we have shown that the eigenvectors of a matrix pencil can be computed in parallel without overflow using augmented matrices. Certainly, robust algorithms are slower than non-robust algorithms when numerical scaling is not needed, but robust algorithms will always return a result which can be evaluated in the context of the user's application. To the best of our knowledge, StarNEig is the only library which contains a parallel robust solver for computing the generalized eigenvectors of a dense non-symmetric matrix pencil. The StarNEig solver (Zazamoukh) runs on top of StarPU and uses augmented matrices and scaling factors which are integer powers of 2 to prevent overflow. It achieves super-linear speedup compared with DTGEVC from LAPACK. In the immediate future we expect to pursue the following work:

  1. Extend Zazamoukh to also compute left eigenvectors. Here the layout of the loops is different and we must use the 1-norm instead of the infinity norm when executing the overflow protection logic.

  2. Extend Zazamoukh to distributed memory machines.

  3. Extend Zazamoukh’s solver to use recursive blocking to reduce the run-time further. The solve tasks all lie on the critical path of the task graph.

  4. Extend Zazamoukh to complex data-types. This case is simpler than real arithmetic because there are no 2-by-2 blocks on the main diagonal of S.

  5. Revisit the complex division routine xLADIV [6] which is the foundation for the DLALN2 routine used by Zazamoukh's solve tasks. In particular, the failure modes of xLADIV have not been characterized [5].

Acknowledgements

This work is part of a project (NLAFET) that has received funding from the European Union’s Horizon 2020 research and innovation programme under grant agreement No 671633. This work was supported by the Swedish strategic research programme eSSENCE. We thank the High Performance Computing Center North (HPC2N) at Umeå University for providing computational resources and valuable support during test and performance runs.

References

  • [1] StarNEig — A task-based library for solving nonsymmetric eigenvalue problems, https://github.com/NLAFET/StarNEig
  • [2] Anderson, E.: LAPACK Working Note No. 36: Robust Triangular Solves for Use in Condition Estimation. Tech. Rep. UT-CS-91-142, University of Tennessee, Knoxville, TN, USA (August 1991)

  • [3] Anderson, E., Bai, Z., Bischof, C., Blackford, S., Demmel, J., Dongarra, J., Du Croz, J., Greenbaum, A., Hammarling, S., McKenney, A., Sorensen, D.: LAPACK Users’ Guide. SIAM, Philadelphia, PA, 3rd edn. (1999)
  • [4] Augonnet, C., Thibault, S., Namyst, R., Wacrenier, P.A.: StarPU: A Unified Platform for Task Scheduling on Heterogeneous Multicore Architectures. CCPE - Special Issue: Euro-Par 2009 23, 187–198 (Feb 2011). https://doi.org/10.1002/cpe.1631, http://hal.inria.fr/inria-00550877
  • [5] Baudin, M.: Personal e-mail to C.C.K. Mikkelsen
  • [6] Baudin, M., Smith, R.L.: A robust complex division in Scilab. CoRR abs/1210.4539 (2012), http://arxiv.org/abs/1210.4539
  • [7] Kjelgaard Mikkelsen, C.C., Karlsson, L.: Blocked Algorithms for Robust Solution of Triangular Linear Systems. In: Wyrzykowski, R., Dongarra, J., Deelman, E., Karczewski, K. (eds.) Parallel Processing and Applied Mathematics. pp. 68–78. Springer International Publishing, Cham (2018)
  • [8] Kjelgaard Mikkelsen, C.C., Schwarz, A.B., Karlsson, L.: Parallel Robust Solution of Triangular Linear Systems. CCPE 0(0), e5064 (2018). https://doi.org/10.1002/cpe.5064
  • [9] Myllykoski, M., Kjelgaard Mikkelsen, C.C.: StarNEig - A Task-based Library for Solving Nonsymmetric Eigenvalue Problems. Submitted to PPAM-2019
  • [10] Myllykoski, M., Kjelgaard Mikkelsen, C.C., Schwarz, A., Kågström, B.: D2.7 eigenvalue solvers for nonsymmetric problems. Tech. rep., Umeå University (2019), http://www.nlafet.eu/wp-content/uploads/2019/04/D2.7-EVP-solvers-evaluation-final.pdf