Log In Sign Up

Iterative and doubling algorithms for Riccati-type matrix equations: a comparative introduction

We review a family of algorithms for Lyapunov- and Riccati-type equations which are all related to each other by the idea of doubling: they construct the iterate Q_k = X_2^k of another naturally-arising fixed-point iteration (X_h) via a sort of repeated squaring. The equations we consider are Stein equations X - A^*XA=Q, Lyapunov equations A^*X+XA+Q=0, discrete-time algebraic Riccati equations X=Q+A^*X(I+GX)^-1A, continuous-time algebraic Riccati equations Q+A^*X+XA-XGX=0, palindromic quadratic matrix equations A+QY+A^*Y^2=0, and nonlinear matrix equations X+A^*X^-1A=Q. We draw comparisons among these algorithms, highlight the connections between them and to other algorithms such as subspace iteration, and discuss open issues in their theory.


On the maximal solution of the conjugate discrete-time algebraic Riccati equation

In this paper we consider a class of conjugate discrete-time Riccati equ...

An efficient iteration for the extremal solutions of discrete-time algebraic Riccati equations

Algebraic Riccati equations (AREs) have been extensively applicable in l...

Numerical Method for a Class of Algebraic Riccati Equations

We study an iteration approach to solve the coupled algebraic Riccati eq...

The Hitchhikers Guide to Nonlinear Filtering

Nonlinear filtering is the problem of online estimation of a dynamic hid...

The convergence analysis of an accelerated iteration for solving algebraic Riccati equations

The discrete-time algebraic Riccati equation (DARE) have extensive appli...

Rational Krylov and ADI iteration for infinite size quasi-Toeplitz matrix equations

We consider a class of linear matrix equations involving semi-infinite m...

Non-autonomous multidimensional Toda system and multiple interpolation problem

We study the interpolation analogue of the Hermite-Padé type I approxima...

1 Introduction

Riccati-type matrix equations are a family of matrix equations that appears very frequently in literature and applications, especially in systems theory. One of the reasons why they are so ubiquitous is that they are equivalent to certain invariant subspace problems; this equivalence connects them to a larger part of numerical linear algebra, and opens up avenues for many solution algorithms.

Many books (and even more articles) have been written on these equations; among them, we recall the classical monography by Lancaster and Rodman LancasterRodman, a review book edited by Bittanti, Laub and Willems BittantiLaubWillems, various treatises which consider them from different points of view such as AbouKandil, bart1, bart2, bimbook, BoyEFB-book, Datta, IonescuOaraWeiss, Meh91-book, and recently also a book devoted specifically to doubling doublingbook.

This vast theory can be presented from different angles; in this exposition, we aim to present a selection of topics which differs from that of the other books and treatises. We focus on introducing doubling algorithms with a direct approach, explaining in particular that they arise as ‘doubling variants’ of other more basic iterations, and detailing how they are related to the subspace iteration, to ADI, to cyclic reduction and to Schur complements. We do not treat algorithms and equations with the greatest generality possible, to reduce technicalities; we try to present the proofs only up to a level of detail that makes the results plausible and allows the interested reader to fill the gaps.

The basic idea behind doubling algorithms can be explained through the ‘model problem’ of computing for a certain matrix , , and . There are two possible ways to approach this computation:

  1. Compute , for starting from ; then the result is .

  2. Compute , for , starting from ; then the result is (repeated squaring).

It is easy to verify that for each . Hence iterations of (2) correspond to iterations of (1). We say that (2) is a squaring variant, or doubling variant, of (1). Each of the two versions has its own pros and cons, and in different contexts one or the other may be preferred. If is moderate and

is large and sparse, one should favor variant (1): sparse matrix-vector products can be computed efficiently, while the matrices

would become dense rather quickly, and one would need to compute and store all their entries. On the other hand, if is a dense matrix of non-trivial size (let us say or ) and is reasonably large, then variant (2) wins: fewer iterations are needed, and the resulting computations are rich in matrix multiplications and BLAS level-3 operations, hence they can be performed on modern computers even more efficiently than their flop counts suggest. This problem is an oversimplified version, but it captures the spirit of doubling algorithms, and explains perfectly in which cases they work best.

Regarding competing methods: we mention briefly in our exposition Newton-type algorithms, ADI, and Krylov-type algorithms. We do not treat here direct methods, incuding Schur decomposition-based methods Lau79-schur, PapLS80, Van81, methods based on structured QR Bye86a, BunM86, Meh88, on symplectic URV decompositions BenMX98, ChuLM07, and linear matrix inequalities BoyEFB-book. Although these competitors may be among the best methods for dense problems, they do not fit the scope of our exposition and they do not lend themselves to an immediate comparison with the algorithms that we discuss.

The equations that we treat arise mostly from the study of dynamical systems, both in discrete and continuous time. In our exposition, we chose to start from the discrete-time versions: while continuous-time Riccati equations are simpler and more common in literature, it is more natural to start from discrete-time problems in this context. Indeed, when we discuss algorithms for continuous-time problems we shall see that often the first step is a reduction to a discrete-time problem (possibly implicit).

In the following, we use the notation (resp. ) to mean that is positive definite (resp. semidefinite) (Loewner order). We use to denote the spectral radius of , the symbol to denote the (open) left half-plane, and for the (open) right half-plane. We use the notation to denote the spectrum of

, i.e., the set of its eigenvalues. We use

to denote the conjugate transpose, and to denote the transpose without conjugation, which appears when combining vectorizations and Kronecker products with the identity  gvl Sections 1.3.6–1.3.7.

2 Stein equations

The simplest matrix equation that we consider is the Stein equation (or discrete-time Lyapunov equation).


for . This equation often arises in the study of discrete-time constant-coefficient linear systems


A classical application of Stein equations is the following. If solves (1), then by multiplying by and on both sides one sees that is decreasing over the trajectories of (2), i.e., . This fact can be used to prove stability of the dynamical system (2).

2.1 Solution properties

The Stein equation (1) is linear, and can be rewritten using Kronecker products as


If is a Schur factorization of , then we can factor the system matrix as


which is a Schur-like factorization where the middle term is lower triangular. One can tell when is invertible by looking at its diagonal entries: is invertible (and hence (1) is uniquely solvable) if and only if for each pair of eigenvalues of . This holds, in particular, when . When the latter condition holds, we can apply the Neumann inversion formula


which gives (after de-vectorization) an expression for the unique solution as an infinite series


It is apparent from (6) that . A reverse result holds, but with strict inequalities: if (1) holds with and , then Datta Exercise 7.10.

2.2 Algorithms

As discussed in the introduction, we do not describe here direct algorithms of the Bartels–Stewart family BarS72, ChuBS, EptonBS, GardinerBS (which, essentially, exploit the decomposition (4) to reduce the cost of solving (3) from to ) even if they are often the best performing ones for dense linear (Stein or Lyapunov) equations. Rather, we present here two iterative algorithms, which we will use to build our way towards algorithms for nonlinear equations.

The Stein equation (1) takes the form of a fixed-point equation; this fact suggests the fixed-point iteration


known as Smith method Smi68. It is easy to see that the th iterate is the partial sum of (6) (and (5)) truncated to terms, thus convergence is monotonic, i.e., . Moreover, some manipulations give

or, devectorizing,


This relation (8) implies for each , so convergence is linear when , and it typically slows down when .

A doubling variant comes from splitting the partial sums into two halves. The truncated sums of (5) to terms can be computed iteratively using the identity

without computing all the intermediate sums. Setting and , one gets the iteration


In view of the definitions, we have ; so this method computes the th iterate of the Smith method directly with operations, without going through all intermediate ones. Convergence is quadratic: for each . The method (9) is known as squared Smith. It has been used in the context of parallel and high-performance computing BenQQ, and reappeared in recent years, when it has been used for large and sparse equations Pen99, Sad12, BenES in combination with Krylov methods.

3 Lyapunov equations

Lyapunov equations


are the continuous-time counterpart of Stein equations. They arise from the study of continuous-time constant-coefficient linear systems


A classical application is the following. If solves (10), by multiplying on by and on both sides one sees that is decreasing over the trajectories of (11), i.e., . This fact can be used to prove stability of the dynamical system (11). Today stability is more often proved by computing eigenvalues, but Stein equations (1) and Lyapunov equations (10) have survived in many other applications in systems and control theory, for instance in model order reduction BenBD11, GugA04, Sim-review, or as the inner step in Newton methods for other equations (see for instance (46) in the following).

3.1 Solution properties

Using Kronecker products, one can rewrite (10) as


and a Schur decomposition produces


Again, this is a Schur-like factorization, where the middle term is lower triangular. One can tell when is invertible by looking at its diagonal entries: that matrix is invertible (and hence (10) is uniquely solvable) if and only if for each pair of eigenvalues of . This holds, in particular, if the eigenvalues of all lie in . When the latter condition holds, an analogue of (6) is


Indeed, this integral converges for every choice of if and only if the eigenvalues of all lie in .

Notice the pleasant symmetry with the Stein case: the (discrete) sum turns into a (continuous) integral; the stability condition for discrete-time linear time-invariant dynamical systems turns into the one for continuous-time systems. Perhaps a bit less evident is the equivalence between the condition (i.e., no two eigenvalues of are mapped into each other by reflection with respect to the imaginary axis) and (i.e., no two eigenvalues of are mapped into each other by circle inversion with respect to the complex unit circle).

Lyapunov equations can be turned into Stein equations and vice versa. Indeed, for a given , (10) is equivalent to

or, if is invertible,


If , then the right-hand side is positive semidefinite and (15) is a Stein equation. The stability properties of can be explicitly related to those of via the following lemma.

[properties of Cayley transforms] Let . Then,

  1. for , we have if and only if ;

  2. for a matrix , we have if and only if .

A geometric argument to visualize (1) is the following. In the complex plane, and are symmetric with respect to the imaginary axis, with lying to its left. Thus a point is closer to than to if and only if it lies in . Part (2) follows from facts on the behaviour of eigenvalues of a matrix under rational functions LancasterRodman Proposition 1.7.3, which we will often use also in the following.

Another important property of the solutions

of Lyapunov and Stein equations is the decay of their singular values in many practical cases. We defer its discussion to the following section, since a proof follows from the properties of certain solution algorithms.

3.2 Algorithms

As in the Stein case, one can implement a direct Bartels-Stewart algorithm BarS72 by exploiting the decomposition (13): the two outer factors have Kronecker product structure, and the inner factor is lower triangular, allowing for forward substitution. An interesting variant allows one to compute the Cholesky factor of directly from the one of  Ham82.

Again, we focus our interest on iterative algorithms. We will assume . Then, thanks to Lemma 3.1, we have , so we can apply the Smith method (7) to (15). In addition, we can change the value of at each iteration. The resulting algorithm is known as ADI iteration PeaR55, Wac88:


The sequence of shifts can be chosen arbitrarily, with the only condition that . By writing a recurrence for the error , one sees that


a formula which generalizes (8). When is normal, the problem of assessing the convergence speed of this iteration can be reduced to a scalar approximation theory problem. Note that

If one knows a region that encloses the eigenvalues of , the optimal choice of is the degree- rational function that minimizes


i.e., a rational function that is ‘as large as possible’ on and ‘as small as possible’ on . Finding this rational function is known as Zolotarev approximation problem, and it was solved by its namesake for many choices of , including : this choice of corresponds to having a symmetric positive definite for which a lower and upper bound on the spectrum are known. It is known that the optimal ratio (18) decays as , where is a certain value that depends on , related to its so-called logarithmic capacity. See the recent review by Beckermann and Townsend BecT19 for more details. Optimal choices for the shifts for a normal were originally studied by Wachspress Wac88, EllW91. When is non-normal, a similar bound can be obtained from its eigendecomposition , but it includes its eigenvalue condition number , and thus it is of worse quality.

An important case, both in theory and in practice, is when has low rank. One usually writes , where is a short-fat matrix, motivated by a standard notation in control theory. A decomposition can be derived from (16), and reads


Hence is obtained by concatenating horizontally terms of size each. Each of them contains a rational function of of increasing degree multiplied by . All the factors in parentheses commute: hence that the factors can be computed with the recurrence


This version of ADI is known as low-rank ADI (LR-ADI) BenLP08. After steps, , but note that in the intermediate steps the quantity differs from in (16). Indeed, in this factorized version the shifts appear in reversed order, starting from and ending with . Nevertheless, we can use LR-ADI as an iteration in its own right: since we keep adding columns to at each step, converges monotonically to . This version is particularly convenient for problems in which is large and sparse, because in each step we only need to solve linear systems with a shifted matrix , and we store in memory only the matrix . In contrast, iterations such as (9) are not going to be efficient for problems with a large and sparse , since powers of sparse matrices become dense.

The formula (19) displays the relationship between ADI and certain Krylov methods: since the LR-ADI iterates are constructed by applying rational functions of iteratively to , the LR-ADI iterate lies in the so-called rational Krylov subspace Ruh84


constructed with pole polynomial . This suggests a different view: what is important is not the form of the ADI iteration, but rather the approximation space to which its iterates belong. Once one has chosen suitable shifts and computed an orthogonal basis of , (10) can be solved via Galerkin projection: we seek an iterate of the form , and compute by solving the projected equation

which is a smaller () Lyapunov equation.

While the approximation properties of classical Krylov subspaces are related to polynomial approximation, those of rational Krylov subspaces are related to approximation with rational functions, as in the Zolotarev problem mentioned earlier. In many cases, rational approximation has better convergence properties, with an appropriate choice of the shifts. This happens also for Lyapunov equations: algorithms based on rational Krylov subspaces (21DruS, DruKS (including ADI which uses them implicitly) often display better convergence properties than equivalent ones in which is chosen as a basis of a regular Krylov subspace or of an extended Krylov subspace


Computing a basis for a rational Krylov subspace (21) is more expensive than computing one for an extended Krylov subspace (22): indeed, the former requires solving linear systems with for many values of , while the latter uses multiple linear systems with the same matrix . However, typically, their faster convergence more than compensates for it. Another remarkable feature is the possibility to use an adaptive procedure based on the residual for shift selection DruS.

See also the analysis in Benner, Li, Truhar BenLT, which shows that Galerkin projection can improve also on the ADI solution.

An important consequence of the convergence of these algorithms is that they can be used to give bounds on the rank of the solution . Since we can find rational functions such that (18) decreases exponentially, the formula (17) shows that can be approximated well with , which has rank at most in view of the decomposition (19). This observation has practical relevance, since in many applications is very small, and the exponential decay in the singular values of is very well visible and helps reducing the computational cost.

3.3 Remarks

There is vast literature already for linear matrix equations, especially when it comes to large and sparse problems. We refer the reader to the review by Simoncini Sim-review for more details. The literature typically deals with continuous-time Lyapunov equations more often than their discrete-time counterpart; however, Cayley transformations (15) can be used to convert one to the other.

In particular, it follows from our discussion that a step of ADI can be interpreted as transforming the Lyapunov equation (10) into a Stein equation (1) via a Cayley transform (15) and then applying one step of the Smith iteration (7). Hence the squared Smith method (9) can be interpreted as a doubling algorithm to construct the ADI iterate in iterations only, but with the significant limitation of using only one shift in ADI.

It is known that a wise choice of shifts has a major impact on the convergence speed of these algorithms; see e.g. Güttel Gut13. A major challenge for doubling-type algorithms seems incorporating multiple shifts in this framework of repeated squaring. It seems unlikely that one can introduce more than one shift per doubling iteration, but even doing so would be an improvement, allowing one to leverage the theory of rational approximation that underlies ADI and Krylov space methods.

4 Discrete-time Riccati equations

We consider the equation


to be solved for . This equation is known as discrete-time algebraic Riccati equation (DARE), and arises in various problems connected to discrete-time control theory Datta Chapter 10. Variants in which are not necessarily positive semidefinite also exist RanT93, Wil71, but we will not deal with them here to keep our presentation simpler. The non-linear term can appear in various slightly different forms: for instance, if for certain matrices , , then one sees with some algebra that


and all these forms can be plugged into (23) to obtain a slightly different (but equivalent) equation. In particular, from the versions in the last two rows one sees that is Hermitian, which is not evident at first sight. These identities become clearer if one considers the special case in which : in this case, one sees that the expressions in (24) are all different ways to rewrite the sum of the converging series .

Note that the required inverses exist under our assumptions, because the eigenvalues of coincide with those of .

4.1 Solution properties

For convenience, we assume in the following that is invertible. The results in this section hold also when it is singular, but to formulate them properly one must deal with matrix pencils, infinite eigenvalues, and generalized invariant subspaces (or deflating subspaces), a technical difficulty that we would rather avoid here since it does not add much to our presentation. For a more general pencil-based presentation, see for instance Mehrmann meh-cayley.

For each solution of the DARE (23), it holds that


Equation (25) shows that is an invariant subspace of


i.e., maps this subspace into itself. In particular, the eigenvalues (counted with multiplicity) of are a subset of the eigenvalues of : this can be seen by noticing that the matrix represents (in a suitable basis) the linear operator when restricted to said subspace. Conversely, if one takes a basis matrix for an invariant subspace of , and if is invertible, then is another basis matrix, the equality (25) holds, and is a solution of (23). Hence, (23) typically has multiple solutions, each associated to a different invariant subspace. However, among them there is a preferred one, which is the one typically sought in applications.

LancasterRodman Corollary 13.1.2 and Theorem 13.1.3 Assume that , and is d-stabilizable. Then, (23) has a (unique) solution such that

  1. ;

  2. for any other Hermitian solution ;

  3. .

If, in addition, is d-detectable, then . The hypotheses involve two classical definitions from control theory Datta: d-stabilizable (resp. d-detectable) means that all Jordan chains of (resp. ) that are associated to eigenvalues outside the set are contained in the maximal (block) Krylov subspace (resp. ). We do not discuss further these hypotheses nor the theorem, which is not obvious to prove; we refer the reader to Lancaster and Rodman LancasterRodman for details, and we just mention that these hypotheses are typically satisfied in control theory applications. This solution is often called stabilizing (because of property 3) or maximal (because of property 2).

Various properties of the matrix in (26) follow from the fact that it belongs to a certain class of structured matrices. Let . A matrix is called symplectic if , i.e., if it is unitary for the non-standard scalar product associated to . The following properties hold.

  1. A matrix in the form (26) is symplectic if and only if , and the two blocks called in (26) are one the conjugate transpose of the other.

  2. If

    is an eigenvalue of a symplectic matrix with right eigenvector

    , then is an eigenvalue of the same matrix with left eigenvector .

  3. Under the hypotheses of Theorem 4.1 (including the d-detectability one in the end), then the eigenvalues of are (counting multiplicities) the eigenvalues of inside the unit circle, and the eigenvalues , outside the unit circle. In particular, spans the unique invariant subspace of  of dimension all of whose associated eigenvalues lie in the unit circle.

Parts 1 and 2 are easy to verify from the form (26) and the definition of symplectic matrix, respectively. To prove Part 3, plug into (25) and notice that has eigenvalues inside the unit circle; these are also eigenvalues of . By Part 2, all other eigenvalues lie outside the unit circle.

4.2 Algorithms

The shape of (23) suggests the iteration


This iteration can be rewritten in a form analogous to (25):


Equivalently, one can write it as


This form highlights a connection with (inverse) subspace iteration (or orthogonal iteration), a classical generalization of the (inverse) power method to find multiple eigenvalues Wat-book. Indeed, we start from the matrix , and at each step we first multiply it by , and then we normalize the result by imposing that the first block is . In inverse subspace iteration, we would make the same multiplication, but then we would normalize the result by taking the factor of its QR factorization, instead.

It follows from classical convergence results for the subspace iteration (see e.g. Watkins Wat-book Section 5.1) that (29) converges to the invariant subspace associated to the largest eigenvalues (in modulus) of , i.e., the smallest eigenvalues of . In view of Part 3 of Lemma 4.1, this subspace is precisely . Note that this unusual normalization is not problematic, since at each step of the iteration (and in the limit) the subspace does admit a basis in which the first

rows form an identity matrix. This argument shows the convergence of (

27) to the maximal solution, under the d-detectability condition mentioned in Theorem 4.1, which ensures that there are no eigenvalues on the unit circle.

How would one construct a ‘squaring’ variant of this method? Note that that ; hence one can think of computing by iterated squaring to obtain in steps. However, this idea would be problematic numerically, because it amounts to delaying the normalization in subspace iteration until the very last step. The key to solve this issue is using the LU-like decomposition obtained from (26)

We seek an analogous decomposition for the powers of , i.e.,


The following result shows how to compute this factorization with just one matrix inversion.  PolR Let . The factorization


exists if and only if is invertible, and in that case its blocks are given by

A proof follows from noticing that the factorization (31) is equivalent to the existence of a matrix such that

and rearranging block columns in this expression.

One can apply Lemma 30 (with and ) to find a factorization of the term in parentheses in


and use it to construct a decomposition (30) of starting from that of . The fact that the involved matrices are symplectic can be used to prove that the relations , , will hold for the computed coefficients. We omit the details of this computation; what matters are the resulting formulas

with .

These formulas are all we need to formulate a ‘squaring’ version of (27): for each it holds that

hence , the th iterate of (27). It is not difficult to show by induction that , and we have already argued above that . In view of the interpretation as subspace iteration, the convergence speed of (27) is linear and proportional to the ratio between the absolute values of the st and th eigenvalue of , i.e., between and its inverse . The convergence speed of its doubling variant (33) is then quadratic with the same ratio doublingbook.

The iteration (33), which goes under the name of structure-preserving doubling algorithm, has been used to solve DAREs and related equations by various authors, starting from Chu, Fan, Lin and Wang chu-dare, but it also appears much earlier: for instance, Anderson And78 gave it an explicit system-theoretical meaning as constructing an equivalent system with the same DARE solution. The reader may find in the literature slightly different versions of (33), which are equivalent to them thanks to the identities (24).

More general versions of the factorization (30) and of the iteration (33), which guarantee existence and boundedness of the iterates under much weaker conditions, have been explored by Mehrmann and Poloni MehP12. Kuo, Lin and Shieh KuoLS studied the theoretical properties of the factorization (30) for general powers , , drawing a parallel with the so-called Toda flow for the QR algorithm.

The limit of the monotonic sequence also has a meaning: it is the maximal solution of the so-called dual equation


which is obtained swapping with and with in (23). Indeed, SDA for the DARE (34) is obtained by swapping with and with in (33), but this transformation leaves the formulas unchanged. The dual equation (34) appears sometimes in applications together with (23). From the point of view of linear algebra, the most interesting feature of its solution is that is a basis matrix for the invariant subspace associated to the other eigenvalues of , those outside the unit circle. Indeed, (30) gives

so is the limit of subspace iteration applied to instead of , with initial value . In particular, putting all pieces together, the following Wiener-Hopf factorization holds


This factorization relates explicitly the solutions to a block diagonalization of .

An interesting limit case is the one when only the first part of Theorem 4.1 holds, is not d-detectable, and the solution exists but . In this case, has eigenvalues on the unit circle, and it can be proved that all its Jordan blocks relative to these eigenvalues have even size: one can use a result in Lancaster and RodmanLancasterRodman Theorem 12.2.3, after taking a factorization with and using another result in the same book LancasterRodman Theorem 12.2.1 to show that the hypothesis holds.

It turns out that in this case the two iterations still converge, although (27) becomes sublinear and (33) becomes linear with rate . This is shown by Chiang, Chu, Guo, Huang, Lin and Xu Chi09-critical-doubling; the reader can recognize that the key step there is the study of the subspace iteration in presence of Jordan blocks of even multiplicity.

Note that the case in which the assumptions do not hold is trickier, because there are examples where (23) does not have a stabilizing solution and

has Jordan blocks of odd size with eigenvalues on the unit circle: an explicit example is


which produces a matrix with two simple eigenvalues (Jordan blocks of size ) with . Surprisingly, eigenvalues on the unit circle are a generic phenomenon for symplectic matrices, which is preserved under perturbations: a small perturbation of the matrices in (36) will produce a perturbed with two simple eigenvalues that satisfy exactly , because otherwise Part 2 of Lemma 4.1 would be violated.

5 Continuous-time Riccati equations

We consider the equation


to be solved for . This equation is known as continuous-time algebraic Riccati equation (CARE), and arises in various problems connected to continuous-time control theory Datta Chapter 10. Despite the very different form, this equation is a natural analogue of the DARE (23), exactly like Stein and Lyapunov equations are related to each other.

5.1 Solution properties

For each solution of the CARE, it holds


Hence, is an invariant subspace of


Like in the discrete-time case, this relation implies that the eigenvalues of are a subset of those of ; moreover, we can construct a solution to (37) from an invariant subspace