 # Time integration of tree tensor networks

Dynamical low-rank approximation by tree tensor networks is studied for the data-sparse approximation to large time-dependent data tensors and unknown solutions of tensor differential equations. A time integration method for tree tensor networks of prescribed tree rank is presented and analyzed. It extends the known projector-splitting integrators for dynamical low-rank approximation by matrices and Tucker tensors and is shown to inherit their favorable properties. The integrator is based on recursively applying the Tucker tensor integrator. In every time step, the integrator climbs up and down the tree: it uses a recursion that passes from the root to the leaves of the tree for the construction of initial value problems on subtree tensor networks using appropriate restrictions and prolongations, and another recursion that passes from the leaves to the root for the update of the factors in the tree tensor network. The integrator reproduces given time-dependent tree tensor networks of the specified tree rank exactly and is robust to the typical presence of small singular values in matricizations of the connection tensors, in contrast to standard integrators applied to the differential equations for the factors in the dynamical low-rank approximation by tree tensor networks.

## Authors

##### This week in AI

Get the week's most popular data science and artificial intelligence research sent straight to your inbox every Saturday.

## 1 Introduction

For the approximate solution of the initial value problem for a (huge) system of differential equations for the tensor ,

 ˙A(t)=F(t,A(t)), (1)

we aim to construct in an approximation manifold  of much smaller dimension, which in the present work will be chosen as a manifold of tree tensor networks of fixed tree rank. This shall provide a data-sparse computational approach to high-dimensional problems that cannot be treated by direct time integration because of both excessive memory requirements and computational cost.

A differential equation for is obtained by choosing the time derivative as that element in the tangent space for which

 ∥˙Y(t)−F(t,Y(t))∥is minimal,

where the norm is chosen as the Euclidean norm of the vector of the tensor entries. In the quantum physics and chemistry literature, this approach is known as the Dirac–Frenkel time-dependent variational principle, named after work by Dirac in 1930 who used the approach in the context of what is nowadays known as the time-dependent Hartree–Fock method for the multi-particle time-dependent Schrödinger equation; see, e.g.,

[14, 15]. Equivalently, this minimum-defect condition can be stated as a Galerkin condition on the state-dependent approximation space ,

 ˙Y(t)∈TY(t)Msuch that⟨˙Y(t)−F(t,Y(t)),Z⟩=0 ∀Z∈TY(t)M.

Using the orthogonal projection onto the tangent space , this can be reformulated as the (abstract) differential equation on ,

 ˙Y(t)=P(Y(t))F(t,Y(t)). (2)

This equation needs to be solved numerically in an efficient and robust way. For fixed-rank matrix and tensor manifolds, the orthogonal projection turns out to be an alternating sum of subprojections, which reflects the multilinear structure of the problem. The explicit form of the tangent space projection in the low-rank matrix case as an alternating sum of three subprojections was derived in  and was used in  to derive a projector-splitting integrator for low-rank matrices, which efficiently updates an SVD-like low-rank factorization in every time step and which is robust to the typically arising small singular values that cause severe difficulties with standard integrators applied to the system of differential equations for the factors of the SVD-like decomposition of the low-rank matrices; see . The projector-splitting integrator was extended to tensor trains / matrix product states in ; see also  for a description of the algorithm in a physical idiom. The projector-splitting integrator was extended to Tucker tensors of fixed multilinear rank in . A reinterpretation was given in , in which the Tucker tensor integrator was rederived by recursively performing inexact substeps in the matrix projector-splitting integrator applied to matricizations of the tensor differential equation followed by retensorization. This interpretation made it possible to show that the Tucker integrator inherits the favorable robustness properties of the low-rank matrix projector-splitting integrator.

In the present paper we take up such a recursive approach to derive an integrator for general tree tensor networks, which is shown to be efficiently implementable (provided that the righthand side function can be efficiently evaluated on tree tensor networks in factorized form) and to inherit the robust convergence properties of the low-rank matrix, Tucker tensor and tensor train / matrix product state integrators shown previously in [11, 19]. The proposed integrator for tree tensor networks reduces to the well-proven projector-splitting integrators in the particular cases of Tucker tensors and tensor trains / matrix product states. We expect (but will not prove) that it can itself be interpreted as a projector-splitting integrator based on splitting the tangent space projection of the fixed-rank tree tensor network manifold.

In Section 2 we introduce notation and the formulation of tree tensor networks as multilevel-structured Tucker tensors and give basic properties, emphasizing orthonormal factorizations. The tree tensor network (TTN) is constructed from the basis matrices at the leaves and the connection tensors at the inner vertices of the tree in a multilinear recursion that passes from the leaves to the root of the tree.

In Section 3 we recall the algorithm of the Tucker tensor integrator of  and extend it to the case of several Tucker tensors with the same basis matrices. This extended Tucker integrator, which is nothing but the Tucker integrator for an extended Tucker tensor, is a basic building block of the integrator for tree tensor networks.

In Section 4

, the main algorithmic section of this paper, we derive the recursive TTN integrator and discuss the basic algorithmic aspects. In every time step, the integrator uses a recursion that passes from the root to the leaves of the tree for the construction of initial value problems on subtree tensor networks using appropriate restrictions and prolongations, and another recursion that passes from the leaves to the root for the update of the factors in the tree tensor network. The integrator only solves low-dimensional matrix differential equations (of the dimension of the basis matrices at the leaves) and low-dimensional tensor differential equations (of the dimension of the connection tensors at the inner vertices of the tree), alternating with orthogonal matrix decompositions of such small matrices and of matricizations of the connection tensors.

In Section 5 we prove a remarkable exactness property: if for a given tree tensor network of the specified tree rank, then the recursive TTN integrator for this tree rank reproduces exactly. This exactness property is proved using the analogous exactness property of the Tucker tensor integrator proved in , which in turn was proved using the exactness property for the matrix projector-splitting integrator that was discovered and proved in .

In Section 6 we prove first-order error bounds that are independent of small singular values of matricizations of the connecting tensors. The proof relies on the similar error bound for the Tucker integrator , which in turn relies on such an error bound for the fixed-rank matrix projector-splitting integrator proved in , in a proof that uses in an essential way the exactness property. The robustness to small singular values distinguishes the proposed integrator substantially from standard integrators applied to the differential equations for the basis matrices and connection tensors derived in . We note that the proposed TTN integrator foregoes the formulation of these differential equations for the factors. The ill-conditioned density matrices whose inverses appear in these differential equations are never formed, let alone inverted, in the TTN integrator.

The present paper thus completes a path to extend the low-rank matrix projector-splitting integrator of , together with its favorable properties, from the dynamical low-rank approximation by matrices of a prescribed rank to Tucker tensors of prescribed multilinear rank to general tree tensor networks of prescribed tree rank.

In Section 7 we present a numerical experiment which shows the error behaviour of the proposed integrator in accordance with the theory. We choose the example of retraction of the sum of a tree tensor network and a tangent network, which is an operation needed in many optimization algorithms for tree tensor networks; cf.  for the low-rank matrix case. The corresponding example was already chosen in numerical experiments for the low-rank matrix, tensor train and Tucker tensor cases in [17, 18, 19], respectively. It is beyond the scope of this paper to present the results of numerical experiments with the recursive TTN integrator in actual applications of tree tensor networks in physics, chemistry or other sciences. We note, however, that striking numerical experiments with precisely this integrator (given in an ad hoc formulation) are already reported in  for the Vlasov–Poisson equation of plasma physics, for a tree tensor network where the tree is given by the separation of the position and velocity variables, which are further separated into their Cartesian coordinates.

## 2 Preparation: Matrices, Tucker tensors, tree tensor networks, and their ranks

### 2.1 Matrices of rank r

The singular value decomposition shows that a matrix

is of rank if and only if it can be factorized as

 A=USV⊤,

where and have orthonormal columns, and has full rank . The real matrices of rank (exactly) are known to form a smooth embedded manifold in .

### 2.2 Tucker tensors of multilinear rank (ri)

For a tensor , the multilinear rank is defined as the -tuple of the ranks of the matricizations for , where . We recall that the th matricization aligns in the th row (for ) all entries of that have the index in the th position, usually ordered co-lexicographically. The inverse operation is retensorisation of the matrix, denoted by :

 A(i)=Mati(A)∈Rni×n¬i if and only if A=Teni(A(i))∈Rn1×⋯×nd.

It is known from  that the tensor has multilinear rank if and only if it can be factorized as a Tucker tensor (we adopt the shorthand notation from )

 A=C\large\sf Xdi=1Ui,i.e.,ak1,…,kd=r1∑l1=1…rd∑ld=1cl1,…,lduk1,l1…ukd,ld, (3)

where the basis matrices have orthonormal columns and the core tensor has full multilinear rank . (This requires a compatibility condition among the ranks: . In particular, this condition is satisfied if all ranks are equal.)

A useful formula for the matricization of Tucker tensors is

 Mati(C\large\sf Xdj=1Uj)=UiMati(C)⨂j≠iU⊤j, (4)

where denotes the Kronecker product of matrices.

The tensors of given dimensions and fixed multilinear rank are known to form a smooth embedded manifold in .

### 2.3 Orthonormal tree tensor networks of tree rank (rτ)

A tree tensor network is a multilevel-structured Tucker tensor where the configuration is described by a tree. The notion of a ‘tree tensor network’ was coined in the quantum physics literature , but tree tensor networks were actually already used a few years earlier in the multilayer MCTDH method of chemical physics . In the mathematical literature, tree tensor networks with binary trees have been studied as ‘hierarchical tensors’  and with general trees as tensors in ‘tree-based tensor format’ [5, 6]. We remark that Tucker tensors and matrix product states / tensor trains [21, 20] are particular instances of tree tensor networks, whose trees are trees of minimal height (bushes) and binary trees of maximal height, respectively. As there does not appear to exist a firmly established notation for tree tensor networks, we give a formulation from scratch that we find useful for our purposes.

[Ordered trees with distinct leaves] Let be a given finite set, to which we refer as the set of leaves. We define the set of trees with leaves in recursively as follows:

1. ,   for each .

2. If, for some ,

 τ1,…,τm∈T,L(τi)∩L(τj)=∅∀i≠j,

then the ordered -tuple

 τ:=(τ1,…,τm)∈T,L(τ):=˙⋃mi=1L(τi) .

The graphical interpretation is that (i) leaves are trees and (ii) every other tree is formed by connecting a root to several trees with distinct leaves. (Note that is excluded: the 1-tuple is considered as identical to .) is the set of leaves of the tree .

The trees are called direct subtrees of the tree , which together with direct subtrees of direct subtrees of etc. are called the subtrees of . We let be the set of subtrees of a tree , including . More formally, we set

 T(ℓ):={ℓ} for ℓ∈L,  and  T(τ):={τ}˙∪˙⋃mi=1T(τi) for τ=(τ1,…,τm).

In the graphical interpretation, the subtrees are in a bijective correspondence with the vertices of the tree, by assigning to each subtree its root; see Figure 2.1.

On the set of trees we define a partial ordering by writing, for ,

 σ≤τif and only ifσ∈T(τ), σ<τif and only ifσ≤τ  % and  σ≠τ.

On a given tree , we work with the following set of data:

• To each leaf we associate a dimension , a rank and a basis matrix of full rank .

• To every subtree we associate a rank and a connection tensor of full multilinear rank . We set .

This can be interpreted as associating a tensor to the root of each subtree. In this way every vertex of the given tree carries either a matrix — if it is a leaf — or else a tensor whose order equals the number of edges leaving the vertex.

With these data, a tree tensor network (TTN) is constructed in a recurrence relation that passes from the leaves to the root of the tree.

[Tree tensor network] For a given tree , we recursively define a tree tensor network as follows:

1. If , we set

 Xℓ:=U⊤ℓ∈Rrℓ×nℓ .
2. If , we set and

the identity matrix of dimension

, and

 Xτ:=Cτ×0Iτ\large\sf Xmi=1Uτi∈Rrτ×nτ1×⋯×nτm, Uτ:=Mat0(Xτ)⊤∈Rnτ×rτ .

We note that the expression on the righthand side of the definition of can be viewed as an -tuple of -tensors with the same basis matrices but different core tensors for . The vectorizations of these -tensors, which are of dimension , form the columns of the matrix . The index in and refers to the mode of dimension of , which we count as mode .

It is favorable to work with orthonormal matrices, so that each tensor is in the Tucker tensor format.

[Orthonormal tree tensor network] A tree tensor network (more precisely, its representation in terms of the matrices ) is called orthonormal if for each subtree , the matrix has orthonormal columns.

The following is a key lemma.

For a tree , let the matrices have orthonormal columns. Then, the matrix has orthonormal columns if and only if the matricization has orthonormal columns.

We have, by the definition of and and the unfolding formula (4),

 U⊤τ=Mat0(Xτ)=IτMat0(Cτ)m⨂i=1U⊤τi=Mat0(Cτ)m⨂i=1U⊤τi.

It follows that

 U⊤τUτ=Mat0(Cτ)(m⨂i=1U⊤τiUτi)Mat0(Cτ)⊤=(Mat0(Cτ)⊤)⊤Mat0(Cτ)⊤,

which proves the result.

We observe that due to the recursive definition of a tree tensor network it thus suffices to require that for each leaf the matrix and for every other subtree the matrix have orthonormal columns. Unless stated otherwise, we intend the tree tensor networks to be orthonormal in this paper.

The orthonormality condition of Lemma 2.3 is very useful, because it reduces the orthonormality condition for the large, recursively constructed and computationally inaccessible matrix to the orthonormality condition for the smaller, given matrix . To our knowledge, the first use of this important property was made in the chemical physics literature in the context of the multilayer MCTDH method .

A further consequence of Lemma 2.3

is that every tree tensor network has an orthonormal representation. This is shown using a QR decomposition of non-orthonormal matrices

and including the non-orthonormal factor in the tensor of the parent tree . It is of full tree rank if all these matrices are of full rank.

We rely on the following property, which we can be proved as in , where binary trees are considered (this corresponds to the case above); see also .

The set of tree tensor networks for a tree of given dimensions and ranks is a smooth embedded manifold in the tensor space .

## 3 Extended Tucker Integrator

In Subsection 3.1 we recapitulate the algorithm of the Tucker tensor integrator of [16, 19], and in Subsection 3.2 we extend the algorithm to -tuples of Tucker tensors with the same basis matrices. This will be a basic building block for the tree tensor network integrator derived in the next section.

### 3.1 Tucker tensor integrator

Let be the manifold of Tucker Tensors with fixed multi-linear rank r=. Let us consider an approximation to the initial data ,

 Y0=C0\large\sf Xdi=1U0i∈Rn1×⋯×nd.

The nested Tucker integrator is a numerical procedure that gives, after substeps, an approximation in Tucker tensor format, , to the full solution after a time step . The procedure is repeated over further time steps to yield approximations to . At each substep of the algorithm, only one factor of the Tucker representation is updated while the others, with the exception of the core tensor, remain fixed. The essential structure of the algorithm can be summarized in the following algorithmic scheme.

The update process in the Tucker integrator is the most intensive and technical part of the algorithm. In order to reduce the complexity of the explanation we introduce the subflows and , corresponding to the update of the basis matrices and of the core tensor, respectively. We set .

The remaining subflow describes the final update process of the core.

Finally, the result of the Tucker tensor integrator after one time step can be expressed in a compact way as

 Y1=Ψ∘Φ(d)∘⋯∘Φ(1)(Y0) . (5)

We refer the reader to  for a detailed derivation and major properties of this Tucker tensor integrator.

The efficiency of the implementation of this algorithm depends on the possibility to evaluate the functions without explicitly forming the large slim matrix and the tensor . This is the case if is a linear combination of Tucker tensors of moderate rank whose factors can be computed directly from the factors and without actually computing the entries of the Tucker tensor.

### 3.2 Extended Tucker Integrator

We consider the case of a -dimensional Tucker tensor where the first basis matrix in the decomposition of the initial data is the identity matrix of dimension ,

 Y0=C0×0Ir\large\sf X% di=1U0i∈Rr×n1×⋯×nd,

as appears in the recursive construction of orthonormal tree tensor networks. This can be viewed as a collection of Tucker tensors in with the same basis matrices . Recalling (5), the action of the Tucker integrator after one time step can be represented as

 Y1=Ψ∘Φ(d)∘⋯∘Φ(1)∘Φ(0)(Y0) .

The following result holds.

The action of the subflow on is trivial, i.e.

 Φ(0)(Y0)=Y0.

In the first step of the subflow , we matricize the core tensor in the zero mode and we perform a QR decomposition,

 Mat0(C0)⊤=Q00S0,⊤0.

We define

 V0,⊤0=Mat0(Ten0(Q0,⊤0)\large\sf Xdl=1U0,⊤l)

and we set

 K00=IrS00∈Rr×r.

The next step consists of solving the differential equation

 ˙K0(t)=Mat0(F(t,Ten0(K0(t)V0,⊤0))% V00, K0(t0)=K00.

We define

 K10=K0(t1)∈Rr×r

and we compute the corresponding QR decomposition,

 K10=IrK10.

To conclude, we solve the differential equation

 ˙S0(t)=−I⊤rMat0(F(t,Ten0(IrS0(t)V0,⊤0))V00, S0(t0)=K10.

We now observe that due to the presence of the negative sign, the solution of the equation can be directly computed, i.e.

 S0(t1)=K0(t0)=S00.

Therefore, and we conclude that .

During the update process the structure of the initial data is preserved and we can now introduce the extended Tucker integrator as follows.

## 4 Recursive tree tensor network integrator

We now come to the central algorithmic section of this paper. We derive an integrator for orthonormal tree tensor networks, which updates the orthonormal-basis matrices of the leaves and the orthonormality-constrained core tensors of the other vertices of the tree by a recursive TTN integrator.

### 4.1 Derivation

Let be a given tree with the set of leaves and a specified family of tree ranks, where we assume . For each subtree we introduce the space

 Vτ:=Rrτ×nτ1×⋯×nτm . (6)

In the following, we associate to each subtree of the given tree a tensor-valued function . Its actual recursive construction, starting from the root with and passing to the leaves, will be given in the next subsection.

Consider a tree and an extended Tucker tensor associated to the tree ,

 Y0τ=C0τ×0Iτ\large\sf Xmi=1U0τi .

Applying the extended Tucker integrator with the function we have that

 Y1τ=Ψτ∘Φ(m)τ∘⋯∘Φ(1)τ(Y0τ) .

We recall that the subflow gives the update process of the basis matrix . The extra subscript indicates that the subflow is computed for the function .

We have two cases:

1. If is a leaf, we directly apply the subflow and update the basis matrix.

2. Else, we apply only approximately (but call the procedure still ). We tensorize the basis matrix and we construct new initial data and a function . We iterate the procedure in a recursive way, reducing the dimensionality of the problem at each recursion.

This leads to the definition of the recursive tree tensor network (TTN) integrator. It has the same general structure as the extended Tucker integrator.

The difference to the Tucker integrator is that now the subflow is no longer the same as for the extended Tucker integrator, but it recursively uses the TTN integrator for the subtrees. This approximate subflow is defined in close analogy to the subflow for Tucker tensors, but the first differential equation is solved only approximately unless is a leaf.

The subflow is the same as for the Tucker integrator, for the function instead of .