 # Lanczos-like method for the time-ordered exponential

The time-ordered exponential is defined as the function that solves any system of coupled first order linear differential equations with constant or non-constant coefficients. In spite of it being at the heart of much system dynamics, control theory and model reduction problems, the time-ordered exponential function remains elusively difficult to evaluate. Here we present a Lanczos-like algorithm capable of evaluating it by producing a tridiagonalization of the original differential system. The algorithm is presented in a theoretical setting. Nevertheless, a natural strategy for its numerical implementation is also outlined and will form the basis of a future work.

## Authors

##### This week in AI

Get the week's most popular data science and artificial intelligence research sent straight to your inbox every Saturday.

## 1 Introduction

### 1.1 Definition and importance

Let be real time variables and a time dependent matrix. The time-ordered exponential of is defined as the unique solution of the system of coupled linear differential equations with non-constant coefficients

 A(t′)U(t′,t)=ddt′U(t′,t),U(t,t)=Id, (1)

with

the identity matrix. If the matrix

commutes with itself at all times, i.e. for all , then the time-ordered exponential is simply given by a matrix exponential as In the general case however, when does not commute with itself at all times, the time-ordered exponential has no known explicit form in terms of and is rather denoted

 U(t′,t)=Texp(∫t′tA(τ)dτ),

with , the time-ordering operator . This expression, introduced by Dyson in 1952, is more a notation than an explicit form as the action of the time-ordering operator is very difficult to evaluate. In particular, does not have a Cauchy integral representation and it cannot be evaluated via ordinary diagonalization. It is unlikely that a fully explicit form for in terms of exists at all since even when is , can involve very complicated special functions [50, 51, 25].

The problem of evaluating time-ordered exponentials is a central question in the field of system dynamics, in particular in quantum physics where is the Hamiltonian. Situations where this operator does not commute with itself are routinely encountered  and the departure of the time-ordered exponential from a straightforward matrix exponential is responsible for many peculiar physical effects [53, 48, 45, 43, 52, 31, 2]. Further applications are found via differential Lyapunov and Riccati matrix equations which appear frequently in control theory, filter design and model reduction problems [44, 30, 9, 3, 5]. Indeed, The solutions of such differential equations involve time-ordered exponentials [29, 1], the evaluations of which form the crux of the computational difficulty; e.g., [23, 28].

### 1.2 Existing approaches: pitfalls and drawbacks

In spite of the paramount importance of the time-ordered exponential, it is usually omitted from the literature on matrix functions and suffers from a scarcity of methods capable of approximating it. Until 2015, only two families of approaches existed. The first one to have been devised relies on Floquet theory and necessitates to be periodic (see, e.g., ). This method transforms Eq. (1) into an infinite system of coupled linear differential equations with constant coefficients. This system is then solved perturbatively at very low order, as orders higher than 2 or 3 are typically too involved to be treated. The second method was developed in 1954 by Wilhelm Magnus . It produces an infinite series of nested commutators of with itself at different times, the ordinary matrix exponential of which provides the desired solution . A major drawback of Magnus series for is that they are rapidly and incurably divergent [34, 46]. In spite of this, Magnus series are very much in use nowadays  due to a lack of alternatives and because they guarantee the unitary of the approximated in quantum mechanical calculations .

In 2015, P.-L. G. proposed a third method to obtain ordered exponentials using graph theory and necessitating only the entries to be bounded functions of time . The method, termed path-sums, sees the matrix as the adjacency matrix of a dynamical graph . Suppose that is an matrix. Then will have vertices, and if for at least one value of , then an edge is drawn on from vertex to vertex with dynamical weight . Infinite series of walks on generate the time-ordered exponential of , while algebraic results on such series can transform them into a single finite, branched, continued fraction giving any desired entry or group of entries of . While this approach can provide exact solutions and has been shown to be unconditionally convergent, it suffers from a computational drawback. Indeed, it requires one to find all the simple cycles and simple paths of the graph . These are the walks on which are not self-intersecting. Unfortunately, the problem of enumerating such walks is P-complete , hindering the determination of exact solutions in large systems which must be treated using a further property of analytical path-sums called scale-invariance . The present work solves this issue with a numerical outlook, by effectively mapping the dynamical graph on a structurally simpler graph with well chosen time-dependent edge weights. On this graph the path-sum solution takes the form of a finite continued fraction.

### 1.3 Lanczos algorithms: background

Consider the simpler case in which is not time dependent. The solution is given by the matrix function which can be numerically approximated in several different ways (see, e.g., [35, 36, 24]). One possible method is Lanczos algorithm. Computing the element of is equivalent to computing the bilinear form , with vectors from the canonical Euclidean bases, and the usual Hermitian transpose (here it coincides with the transpose since the vector is real). The non-Hermitian Lanczos algorithm (e.g., [20, 21, 19, 32]) gives, when possible, biorthonormal basis

 Vn={v0,…,vn−1},Wn={w0,…,wn−1}

respectively for the Krylov subspaces

 {eℓ,Aeℓ,…,An−1eℓ},{ek,AHek,…,(AH)n−1ek}.

Note that for Hermitian and we get the Hermitian Lanczos algorithm (). The so call (complex) Jacobi matrix is a tridiagonal symmetric matrix which is the projection of given by

 Jn=WHnAVn.

Following the strategy described in , we get the approximation

 eHkexp(A)eℓ≈eH1exp(Jn)e1, (2)

which relies on the so called

matching moment property

, i.e.,

 eHk(A)jeℓ≈eH1(Jn)je1,j=0,1,…,2n−1; (3)

we refer to [19, 32] for the Hermitian case, and [40, 41] for the non-Hermitian one. Approximation (2) is connected with (formal) orthogonal polynomials, Gauss quadrature, continued fractions, the moment problem, and many other topics; we refer the reader to the books [10, 19, 32] and to the surveys [40, 41, 42, 39]. The approximation (2) is a model reduction in two senses. First, the size of is much larger than – the size of . Second, the structure of the matrix is much simpler since it is tridiagonal. From a graph prospective, if we look at and as adjacency matrices of, respectively, the graphs and , the possibly very complicated structure of is reduced to the path (with selfloops) . In this framework, property (3) shows that the weighted number of walks in of length from the node to the node is equal to the weighted number of closed walks of length in passing through the node , for ; see, e.g., [12, 4].

Inspired by approximation (2), we will introduce a method for the model reduction of a time-ordered expoenential by providing a time-dependent tridiagonal matrix satisfying properties analogous to the one described above. To this goal, we will use an biorthogonalization process with respect to a convolution-like product . We call such process -Lanczos algorithm since it resembles the Lanczos algorithm. Differently from the classical case, the -Lanczos algorithm works on vector distribution subspaces and it has to deal with a non-commutative product.111There exist other examples of variations of the Lanczos algorithm with non-commutative products, such as the global and the block Lanczos algorithms; see, e.g., [14, 15]

where several of such methods are classified in term of a matrix-valued inner product.

The time-dependent framework in which the proposed method works is much more complicated than the (time-independent) Krylov subspace approximation give by the (classical) Lanczos algorithm. In this paper, we will not deal with the behavior of the -Lanczos algorithm considering approximations and finite-precision arithmetic. Indeed, our aim is to show that it is possible to give an expression for the time-ordered exponential throughout our procedure in a finite number of steps. As it is well known, rounding errors deeply effects the behavior of (classical) Lanczos algorithm by loss of orthogonality of the computed Krylov subspaces basis (see, e.g., ). We expect to see a similar behavior in any numerical implementation of the -Lanczos. Such issue needs to be investigated further in order to confidently rely on the method in a computational setting. Especially since the proposed algorithm relies on short-recurrences. We want to stress it again, the described -Lanczos algorithm and the code presented later may not be computationally reliable due to rounding errors or inaccuracies given by needed approximations. This paper is a first step, on the basis of which further investigations will be developed.

The work is organized as follows: in Section 2, we present the Lanczos-like algorithm. This relies on a novel non-commutative -product between generalized functions of two time variables, which we introduce in Section 2.1. Then, in Section 2.2 we state the main result, Theorem 1, which underpins the Lanczos-like procedure. The Theorem establishes that the -moments of a certain tridiagonal matrix match the -moments of the original time dependent matrix . An algorithmic implementation of the procedure to construct is presented. Theorem 1 is proved with the tools developed in the subsequent Section 2.3. Section 3 is devoted to the convergence and breakdown properties of the algorithm while examples of its use are given in Section 4. We finish with Section 5, which outlines a way to numerically implement the Lanczos-like procedure and evaluates its computational cost. Appendix A gives technical results regarding -inversion.

## 2 ∗-Lanczos Algorithm

### 2.1 ∗-Product and moment matching

Let and be time variables in an interval and let and be (doubly) time-dependent generalized functions. We define the convolution-like product between and as

 (f2∗f1)(t′,t):=∫∞−∞f2(t′,τ)f1(τ,t)dτ,

Most of the generalized functions we will work with are so that when . This is because the linear algebraic structure associated with the time-ordered exponential imposes in all calculations, a complete explanation can be found in . To reflect this, we will often use generalized functions of the form , where is the Heaviside theta function with the convention that . Here and in the rest of the paper, the tilde indicates that is a regular function. The -product between now effectively reads as

 (f2∗f1)(t′,t) =∫∞−∞~f2(t′,τ)~f1(τ,t)Θ(t′−τ)Θ(τ−t)dτ, =Θ(t′−t)∫t′t~f2(t′,τ)~f1(τ,t)dτ.

The form with the integral over the interval is more convenient to wield and will be sufficient in most cases. Moreover, it shows that can be seen as a generalization of the product in . On the other side, the form with the integral from to will allow us to properly deal with distributions whenever they arise in calculations.

The -product extends to matrices composed of (doubly) time-dependent generalized functions. Consider two of such time-dependent matrices and , then

 (A2∗A1)(t′,t):=∫+∞−∞A2(t′,τ)A1(τ,t)dτ,

where the sizes of are compatible for the usual matrix product (here and in the following we omit the dependency on and when it is clear from the context). The -product is associative and distributive with respect to the addition but it is non-commutative. The identity element with respect to this product is with the identity matrix of appropriate dimension and is the Dirac delta distribution. The -product is also well defined for matrices which depend on less than two time variables. Consider the matrices and . Then and In other terms, the time variable of is always treated as the left time variable of a doubly time dependent matrix. These definitions extend straightforwardly to time-independent matrices.

Using the -product, the -resolvent of any matrix depending on at most two time variables is well defined, as exists provided every entry of is bounded for all . Then,

 U(t′,t)=Θ(t′−t)∫t′tR∗(A)(τ,t)dτ, (4)

is the time-ordered exponential of . Note that usually time-ordered exponentials are presented with only one time variable, corresponding to . Yet, in general .

In the spirit of Lanczos algorithm, given a time-dependent matrix , we will construct a matrix of size with a simpler (tridiagonal) structure and so that

 (A∗j(t′,t))k,ℓ=(T∗jn(t′,t))1,1, for j=0,…,2n−1,t′,t∈I. (5)

In particular, when property (5) stands for every , giving

 R∗(A)k,ℓ=(T∗jN)1,1.

Hence the solution is given by the path-sum techniques exploiting the fact that the graph associated with is a path with self-loops. Property (5) can be rewritten more generally as

 wH(A∗j(t′,t))v=eH1(T∗jn(t′,t))e1, % for j=0,…,2n−1,t′,t∈I,

where are time-independent vectors. Note that when the product is omitted it stands for the usual matrix-vector product.

### 2.2 Building up the Lanczos process

Given a doubly time dependent matrix and scalar generalized functions which play the role of the coefficients, we define the matrix -polynomial of degree as

 p(A)(t′,t):=k∑j=0(A∗j∗αj)(t′,t);

moreover we define the corresponding dual time-dependent matrix polynomial as

 pD(A)(t′,t):=k∑j=0(¯¯¯¯αj∗(A∗j)H)(t′,t),

where is the conjugated value of and is the Hermitian transpose of . Let be a vector not depending on time, then we can define the set of vectors , with a time-dependent matrix polynomial (the product between and is the usual matrix vector product). Such set is a vector space with respect to the product with the scalar coefficients (the addition is the usual addition between vectors). Similarly, given a vector not depending on time, we can define the vector space given by the dual vectors . In particular, we can define the time-dependent Krylov subspaces

 Kn(A,v)(t′,t) :={(p(A)v)(t′,t)|p of degree at most n−1}, KDn(AH,w)(t′,t) :={(wHpD(AH))(t′,t)|p of degree at most n−1}.

Clearly the vectors and are bases respectively for and . We aim to derive -biorthonormal bases and for the Krylov subspaces, i.e., so that

 wHi∗vj=δij1∗, (6)

with the Kronecker delta.

Assume for the moment that , we can use a non-Hermitian Lanczos like biorthogonalization process for the triplet . We shall call this method the -Lanczos process. The first vectors of the biorthogonal basis are

 v0=v1∗,wH0=wH1∗,

so that . Consider now a vector given by

 ˆv1=A∗v0−v0∗α0=Av−vα0.

If the vector satisfies the -biorthogonal condition , then

 α0=wH0∗A∗v0=wHAv. (7)

Similarly, we get the expression

 ˆwH1=wH0∗A−α0∗wH0=wHA−α0wH,

with given by (7). Hence the corresponding -biorthonormal vectors are defined by

 v1=ˆv1∗(ˆwH1∗ˆv1)∗−1,w1=ˆw1.

Assume now that we have the -biorthonormal bases and . Then we can build the vector

 ˆvn=A∗vn−1−n−1∑i=0vi∗γi,

where the are determined through the -biorthogonality condition for , giving

 γj=wHj∗A∗vn−1,j=0,…,n−1.

In particular, since we get for . This leads to the following three-term recurrences for using the convention , equationparentequation

 wHn =wHn−1∗A−αn−1∗wHn−1−βn−1∗wHn−2, (8a) vn∗βn =A∗vn−1−vn−1∗αn−1−vn−2, (8b)

with the coefficients given by

 αn−1=wHn−1∗A∗vn−1,βn=wHn∗A∗vn−1. (9)

Should not be -invertible, we would get a breakdown in the algorithm, since it would be impossible to compute . We developed a range of methods to determine the -inverse of functions of two time variables which are gathered in Appendix A. These methods show constructively that as long as is holonomic (i.e. D-finite) in either or then exists and can be computed. From now on, we assume that is indeed -invertible, while breakdowns are studied in Section 3.2.

The -orthogonalization process described above can be summarized in what we call the -Lanczos Algorithm (Table 1). The reason for this name is that the algorithm resembles the original Lanczos algorithm. Indeed, if all the inputs were time-independent, and if we substituted the product with the usual vector (or matrix-vector) product and with , then Algorithm 1 would be mathematically equivalent to the non-Hermitian Lanczos algorithm.

Let us define the tridiagonal matrix

 Tn:=⎡⎢ ⎢ ⎢ ⎢ ⎢ ⎢⎣α01∗β1α1⋱⋱⋱1∗βn−1αn−1⎤⎥ ⎥ ⎥ ⎥ ⎥ ⎥⎦, (10)

and the matrices and . Then the three-term recurrences Eqs. (8) read, in matrix form,

 A∗Vn=Vn∗Tn+(vn∗βn)eHn, WHn∗A=Tn∗WHn+enwHn.

Hence the tridiagonal matrix (10) is the projection of onto along the direction of , i.e.,

 Tn=WHn∗A∗Vn.

The following property of is fundamental for deriving our approximation. We will prove it, together with a polynomial interpretation of the -Lanczos Algorithm, in the following subsection.

###### Theorem 1 (Matching Moment Property)

Let and be as described above, then

 wH(A∗j)v=eH1(T∗jn)e1, for j=0,…,2n−1. (11)

Consider the time-ordered exponential given by the differential equation

 Tn(t′,t)Un(t′,t)=ddt′Un(t′,t). (12)

Theorem 1 and Eq. (4) justify the use of the approximation

 wHU(t′,t)v≈eH1Un(t′,t)e1=∫t′tR∗(Tn)1,1(τ,t)dτ. (13)

The approximation (13) can be seen as a reduced order model of the initial differential equation (1) from two points of view. First, may be much smaller than the size of ; in this sense, in Section 3, we will discuss the convergence behavior of the approximation using Theorem 1. Secondly, looking at and as adjacency matrices, may correspond to a graph with a complex structure, while corresponds to a very simple graph composed of one path with possible self-loops. Then the path-sum method gives

 R∗(Tn)1,1(t′,t)=(1∗−α0−(1∗−α1−(1∗−...)∗−1∗β2)∗−1∗β1)∗−1, (14)

see [17, 18]. Of course, this is entirely similar to the expression for the first diagonal entry of the inverse of an ordinary tridiagonal matrix  (see also [19, 32] for the case of Jacobi matrices), except here all operations are taken with respect to the -product.

For , we get . As a consequence, we get

 wH(A∗j)v=eH1(T∗jN−1)e1, for j=0,1,….

Therefore for the approximation (13) is exact.

### 2.3 Matching moment property by ∗-biorthonormal polynomials

In order to prove Theorem 1, we will describe and use the connection between -Lanczos Algorithm and families of -biorthonormal polynomials. Le us define the set of -polynomials

 P∗:={p(λ)=k∑j=0λ∗j∗γj(t′,t)},

with scalar generalized functions. Moreover, let be the polynomial whose coefficients are the conjugate of coefficients, i.e., . Consider a -sesquilinear form , i.e., so that

 [q∗α,p∗β] =¯¯¯¯α∗[q,p]∗β, =[q1,p1]+[q2,p1]+[q1,p2]+[q2,p2].

From now on we assume that every considered -sesquilinear form satisfies

 [λ∗q,p]=[q,λ∗p]. (15)

The -sesquilinear form is determined by its -moments defined as

 mj(t,t′):=[λ∗j,1]=[1,λ∗j],j=0,1,….

We want to define the sequences of polynomials and which are -biorthonormal with respect to , i.e., so that

 [qi,pj]=δij1∗, (16)

where the subindex in and corresponds to the degree of the polynomial. Assuming (without loss of generality) that gives . Consider the polynomial

 q1(λ)=λ∗q0(λ)−q0(λ)∗¯¯¯¯α0.

The orthogonality conditions (16) give

 α0=[λ∗q0,p0].

Similarly, we get the polynomial

 p1(λ)∗β1=λ∗p0(λ)−p0(λ)∗α0,

with

 α0=[q0,λ∗p0],β1=[q1,λ∗p0].

Repeating the -orthogonalization process, we obtain the three-term recurrences for equationparentequation

 qn(λ) =λ∗qn−1(λ)−qn−1(λ)∗¯¯¯¯αn−1−qn−2(λ)∗¯¯¯βn−1 (17a) pn(λ)∗βn =λ∗pn−1(λ)−pn−1(λ)∗αn−1−pn−2(λ), (17b)

with and

 αn−1=[qn−1,λ∗pn−1],βn=[qn,λ∗pn−1]. (18)

Note that deriving the recurrences needs property (15). Clearly, the sequences of -biorthonormal polynomials and exist under the assumption that are -invertible.

Let be a time-dependent matrix, and time-independent vectors such that . Consider the -sesquilinear form defined by

 [q,p]=wH¯¯¯qD(AH)∗p(A)v.

Assume that there exist -polynomials and -biorthonormal with respect to . Defining the vectors

 vj=pj(A)v,wHj=wH¯¯¯qDj(AH)

and using the recurrences (17) gives the -Lanczos recurrences (8). Moreover the coefficients in (18) are the -Lanczos coefficients in (9).

Let be a tridiagonal matrix as in (10) composed of the coefficients (18) associated with the -sesquilinear form . Then we can define the -bilineair form

 [q,p]n=eH1¯¯¯qD(THn)∗p(Tn)e1.

The following lemmas will show that

 mj=[λ∗j,1∗]=[λ∗j,1∗]n,j=0,…,2n−1,

proving Theorem 1.

###### Lemma 1

Let and be -biorthonormal polynomials with respect to the -sesquilinear form . Assume that the coefficients in the related recurrences (17) are -invertible. Then the -polynomials are also -biorthonormal with respect to the form defined above.

###### Proof

Consider the vectors and . Since the matrix is tridiagonal we have

 eHixj=0, for i≥j+2, and eHj+1xj=βj∗⋯∗β1, yHjei=0, for i≥j+2, and yHjej+1=1∗.

By assumption, the product is -invertible. Therefore there exist -polynomials and so that, for , we get

 eHi+1=eH1ˆqi(Tn),ei+1=ˆpi(Tn)e1.

Such polynomials are -biorthonormal with respect to since they satisfy

 [ˆqi,ˆpj]=eHi+1ej+1=δij1∗.

Moreover, for , the corresponding recurrence coefficients (18) are the same of the polynomials and . Indeed,

 ˆαi−1 =[ˆqi−1,λ∗ˆpi−1]=eHi−1Tnei−1=αi−1, ˆβi =[ˆqi,λ∗ˆpi−1]=eHiTnei−1=βi.

Therefore since , we get and for .

###### Lemma 2

Let and be -biorthonormal polynomials with respect to a -sesquilinear form and to a -sesquilinear form . If , then for .

###### Proof

We prove it by induction. Let and for . By the expression for the coefficients in (18) we get

 [q0,λ∗p0]A=α0=[q0,λ∗p0]B.

Hence . Assuming for we will prove that and , for . Considering the coefficient expressions in (18) gives

 [qk−1,λ∗pk−2]A=βk−1=[qk−1,λ∗pk−2]B.

The previous equation can be rewritten as

 k−1∑i=0k−2∑j=0¯ai∗mi+j+1∗bj=k−1∑i=0k−2∑j=0¯ai∗ˆmi+j+1∗bj,

with the coefficients respectively of and . The inductive assumptions implies

 ¯ak−1