Computing the solution of high-dimensional partial differential equations (PDEs) has become central to many new areas of application such as optimal mass transport[Osher2019, Villani], random dynamical systems [Klyatskin1, Venturi_MZ, Venturi_PRS], mean field optimal control [Weinan2019, Ruthotto2020], and functional-differential equations [VenturiSpectral, venturi2018]
. Classical numerical methods based on full tensor product discretizations are not viable in practice, due to the exponential growth of the degrees of freedom with the dimension. To address this problem, a significant research effort has been recently focused on developing effective approximation methods for high-dimensional PDEs. In particular, techniques such as sparse collocation[Bungartz, Chkifa, Barthelmann, Foo1, Akil]
, physics-informed neural networks[GK2020, Raissi, Raissi1, Zhu2019] and tensor methods [khoromskij, Bachmayr, rodgers2020stability, parr_tensor, Hackbusch_book, Kolda] were proposed to mitigate the exponential growth in complexity [Uschmajew2014], computational cost and memory requirements.
To describe the mathematical setting, let us consider the following initial value problem for an abstract nonlinear evolution equation
where is a -dimensional (time-dependent) scalar field defined in the domain (), and is a nonlinear operator which may be dependent on and may incorporate boundary conditions. Equation (1) is first approximated with respect to the variables , e.g., by finite differences [Strikwerda] or pseudo-spectral methods [spectral]. If is an element of a separable Hilbert space for all , then (1
) can be transformed into the following system of ordinary differential equations
where is multi-dimensional array of real numbers (the solution tensor), and is a tensor-valued nonlinear map (the discrete form of ). The structure of depends on the discretization scheme, as well as on the tensor format utilized for . Combining (2) with an ODE formula for the time-stepping, e.g., the Adams-Bashforth formula, yields a fully discrete system of nonlinear equations. For instance, if we discretize (2) in time with a one-step method on an evenly-spaced grid we obtain
where for etc. Of particular interest are low-rank tensor approximations of the solution to (2). Such approximations allow us to significantly reduce the number of degrees of freedom in the representation of the solution tensor , while maintaining accuracy. Low-rank tensor approximations of (2) can be constructed by using, e.g., rank-constrained temporal integration [lubich2013dynamical, koch2010dynamtucker, dektor2020dynamically, Alec2020] on a smooth Euclidean tensor manifold with constant rank [uschmajew2013geometry, Holtz_2012].
Alternatively, one can utilize the fully discrete scheme (3) followed by a rank-reduction (truncation) operation [grasedyck2010hierarchical, grasedyck2018distributed, kressner2014algorithm]. These methods will be called step-truncation methods [rodgers2020stability], and their analysis is one of the main objectives of the present paper. With reference to the time stepping scheme (3), a step-truncation method can be written as
where is a tensor rank reduction operation, i.e., a nonlinear projection to a tensor manifold with multilinear rank [uschmajew2013geometry]. The need for tensor rank-reduction when integrating (2) in time with (3) can be easily understood by noting that additions between tensors and the application of an operator to a tensor increase the tensor rank [kressner2014algorithm]. Hence, iterating (3) in the space of tensors with no rank reduction can yield a fast growth of the tensor rank which, in turn, can tax the memory requirements and the computational cost significantly, especially in high dimensions. To thoroughly analyze the step-truncation algorithms we propose in this paper, we will leverage the operator framework we recently introduced in [rodgers2020stability], where a projection onto a low-rank tensor manifold is seen as a nonlinear operator in the space of tensors. This allows us to prove various consistency results and errors estimates for a wide range of step-truncation methods. In particular, we establish consistency between the best step-truncation method and the best tangent space projection integrator proposed by Lubich et al. in [lubich2013dynamical, koch2010dynamtucker, Lubich2014].
This paper is organized as follows. In section 2 we introduce three distinct time integrators, namely the best step-truncation (B-ST) integrator, the SVD step-truncation (SVD-ST) integrator, and the best tangent space projection (B-TSP) integrator, to solve (2) on a tensor manifold with fixed rank. In section 3 and section 4 we develop a thorough analysis of B-ST and SVD-ST, respectively. In particular, we prove that B-ST is consistent with B-TSP as the time step goes to zero. In section 5 we obtain necessary and sufficient conditions for B-ST and SVD-ST to be high-order integrators. Numerical examples demonstrating the theoretical claims are presented and discussed in section 6. The main findings are summarized in section 7. We also include two brief appendices in which we develop a perturbation analysis of B-ST in two dimensions (Appendix A), and summarize the parallel implementation of the hierarchical Tucker tensor format we utilized for our numerical simulations (Appendix B).
2 Temporal integrators on tensor manifolds with fixed rank
Let be the manifold of hierarchical Tucker tensors with rank corresponding to a prescribed dimension tree [uschmajew2013geometry]. We are interested in developing consistent algorithms for computing the approximate solution of (2) in (closure111 It was shown in [uschmajew2013geometry] that is the set of of hierarchical Tucker tensors with ranks at most . of ). To this end, we begin by presenting three different nonlinear projections onto which are fundamental to the development of step-truncation algorithms. The first is the so-called “best approximation” as described by Grasedyck in [grasedyck2010hierarchical], and it is defined by the minimization principle
The second projector is known as high-order singular value decomposition[grasedyck2018distributed] (HOSVD, or more concisely SVD). Such projector is defined as composition of linear projections obtained from a sequence of singular value decompositions of appropriate matricizations of the tensor . These projections are denoted in terms of the layers of the dimension tree
When applied to linear multistep integration schemes this projector is proved to yield a stable method [rodgers2020stability]. The third projector we consider is the best projector onto tangent space of
. This projector takes a vectorand projects it onto the tangent space of at some point , i.e.
where denotes the tangent space of at . In all cases, the norm used is the standard 2-norm in the embedding space. This implies that the projection defined in (7) is a linear function of , since it is the solution to a linearly constrained least squares problem. Let
be a convergent one-step scheme222 As is well-known, the scheme (8) includes Runge-Kutta and also linear multi-step methods. In the latter case the vector is defined by stacking the solution at different time steps, while need to be modified accordingly (see [rodgers2020stability] for details). approximating the solution to the initial value problem (2). For example, the Heun method (explicit RK2) can be written in the form (8) provided we define
Given we have no guarantee that defined in (8) is still in . To make sure that this is the case, we can apply the nonlinear projection (5) or (6) to the right hand side of (8). This yields the following step-truncation methods:
Both methods aim at solving (2) in by performing a nonlinear projection onto at each time step. In applications, we will refer to them as the best step-truncation (B-ST) and the SVD step-truncation (SVD-ST) methods, respectively.
An alternative method which leads to a differential equation on a tensor manifold with a given rank (i.e., ) was developed in [lubich2013dynamical] (see also [Lubich2014, koch2007dynamical, koch2010dynamtucker]). The differential equation follows from the variational principle (7), with replaced by , and it can be written as
where is the projection (7) onto the tangent space of at . Since this method also comes from a minimization principle, we shall refer to it as the best tangent space projection (B-TSP) method. A temporal discretization of (12) with the one-step method (8) yields
3 Analysis of the best step-truncation integrator
Here we show that the best step-truncation scheme (10) is consistent with (13) at least to order one in , as goes to zero. To this end, we first notice that while gives an element in the closure of , the tangent projector is defined in terms of a given element on the interior. To circumvent this issue we resort to a perturbation analysis of the best step-truncation operator, i.e., a formal power series expansion of the form
where denotes the Jacobian of at , and . Since , we have that . This allows us to write (14) as
The following Proposition characterizes the Jacobian as a continuous function of in , and shows that it coincides with the projection operator (7) onto the tangent space .
Proposition 1 (Smoothness of the best truncation operator)
is continuously differentiable in . Moreover,
where is the projection (7) onto the tangent space of at .
As discussed in [grasedyck2010hierarchical], every tensor has an exact hierarchical Tucker (HT) decomposition. From [uschmajew2013geometry], we also know that every HT decomposition lies in some closure . Moreover, we know that the boundary of is a union of lower rank open (interior) manifolds. Thus, every point in is either the tensor of zeros everywhere or exists as an element of a fixed rank manifold. So we have that Proposition 1 applies everywhere but the origin.
To prove Proposition 1 we first need to ensure that the topology of is nice enough so as to guarantee that the Jacobian of exists everywhere in . For this, we have the following
Let be a point on the hierarchical Tucker manifold of constant rank. Let be an arbitrary vector in the tangent plane of at . Then there exists so that for all satisfying , we have . As a consequence, if is a closed and bounded set containing the origin, then there exists an open subset so that , for all .
First, consider a simpler problem, where we have two matrices where is full column rank. Consider the function
Clearly, is a polynomial and thus smooth in . Moreover, since is full column rank. Since is smooth, there exists some such that for all . Since the full-rank hierarchical Tucker manifold is defined via the full column rank constraints on an array of matrices corresponding to matricizations of the tensor [uschmajew2013geometry], we can apply the principle above to every full column rank matrix associated with the tree, using addition of a point and a tangent as referenced in Proposition 3 of [da2015optimization]. We have now proved the part one of the lemma where is taken to be the minimum over the tree nodes. As for existence of an open set, suppose is bounded. Now we apply the above matrix case to the boundary , giving us a star shaped set . Letting be the interior, completes the proof of the lemma.
At this point we have all elements to prove Proposition 1. Our proof is an adapted version of a general theorem for surfaces without boundary by Marz and Macdonald [marz2012calculus].
Let be a set which is open in the topology of . Also, let be a local parametrization at . For the parametrizing coordinates, we take an open subset of the tangent space embedded in . This means that the parametrization takes tangent vectors as inputs and maps them into tensors in , i.e.
Moreover, we assume that the coordinates are arranged in column major ordering as a vector. This allows for the Jacobian to be a basis for the tangent space . Note that is a matrix with real coefficients. Now, let be a matrix of column vectors spanning the space orthogonal to in . Since the two linear spaces are disjoint, we have a local coordinate map for the ball , given by
where is tangent and is normal (both column vectors). By construction,
which is smooth in both and
. Therefore, we can take the total derivative on the embedded space and apply the chain rule to obtain the Jacobian of. Doing so, we have
where the symbol denotes column concatenation of matrices, is the dimension of the normal space , is the -th column of , and is the -th component of . We can take since the above expression extends smoothly from the embedding space onto . Hence, the Jacobian of is the solution to the linear equation
Since the right factor of the left hand side has a pair of orthogonal blocks, we can write the inverse using the pseudo-inverse of the blocks, i.e.,
The right hand side is the block concatenation of the rows of each pseudo-inverse. Plugging the above expression in (24), we find
which is exactly the expression for the orthogonal projector onto the tangent space [lubich2013dynamical]. This completes the proof.
We now discuss how Proposition 1 can be used to prove consistency between the best step-truncation algorithm (10) and the best tangent space projection integrator (13). To this end, let be an order increment function and let . By using the perturbation series (15) and Proposition 1 we can conclude that
Therefore the best step-truncation scheme (10) is consitent at least to order 1 with the projective integration scheme (13). We emphasize that equation (27) includes also consistency of step-truncation algorithms in the setting of dynamically biorthogonal equations (DyBO) [cheng2013dynamically], dynamically orthogonal equations (DO) [sapsis2009dynamically], and double dynamically orthogonal equations (DDO) [koch2007dynamical].
Lemma 2 (Consistency of B-TSP)
Let be a solution to the initial value problem (12). Then for a time integration period where the fixed rank solution exists, we have
where is the orthogonal complementary error. Additionallly, there is an integration time where can be made arbitrarily small by increasing the rank .
To obtain (28), we note that
To show that can be made as small as we like, note that we can make the dimension of the tangent space as large as we like, with larger dimension as a function of the singular values of the hierarchical tensor format for . In particular, in [lubich2013dynamical, Theorem 5.2] rigorous bounds on were proven in terms of the smoothness of and the singular values of a particular solution for sufficiently small . Such bounds can be used here to conclude that can be made as small as we like with the rank .
We emphasize that Lemma 2 is a generalization of [cheng2013dynamically, Theorem 3.2]. At this point we have all elements to rigorously state consistency between the best step-truncation algorithm (10) and the integration scheme (12) based on tangent space projection.
Theorem 1 (Consistency of the best step-truncation integrator)
be consistent time stepping scheme for the initial value problem (2). Then the best step-truncation scheme
converges to the equation in the limit and for fixed hierarchical rank . Consequently, asymptotic error estimates in [lubich2013dynamical] are shared by the best step-truncation integrator (30).
4 Analysis of the SVD step-truncation integrator
Now that we have thoroughly analyzed the best truncation operator , we can apply those results to a simpler truncation operator based on high-order singular value decomposition [grasedyck2010hierarchical], i.e., . The key property which ties these two truncation operators is a bound on how bad the error of SVD truncation can get, proven in [grasedyck2010hierarchical],
where is the number of tensor product factors which form the space . In the context the PDE (1), is the dimension of the domain . Equation (31) allows us to obtain the an error bound for the SVD step-truncation scheme in terms of the best truncation. We will first show that the error bound of SVD step-truncation integrator is dominated by the error of the best step-truncation integrator.
Lemma 3 (Error bound on the SVD step-truncation integrator)
Let be the exact solution to equation (2) on a time interval , and an increment function with order of convergence . Then we have the following local error estimate for the SVD step-truncation integrator
First, we apply triangle inequality to
Since the increment function is, by assumption, of order we have
Next, we apply (31) to obtain
Another application of the triangle inequality yields
Collecting like terms yields the desired result.
The next Corollary shows that for fixed dimension , the SVD step-truncation integrator is consistent with the ODE (2) as the rank is increased.
Corollary 1 (Consistency of the SVD step-truncation integrator)
The Corollary follows immediately from the inequality (32), which implies that implies that the SVD truncation error can be made arbitrarily small if the best truncation error can be made arbitrarily small.
The following Corollary characterizes an error bound on the SVD step-truncation integrator which depends on the projection onto the tangent space of the tensor manifold . More precisely, by applying series expansion (27) to the result (32), we can construct another bound for the SVD step-truncation integrator that holds as is taken down to zero. We omit the proof of this corollary, as it essentially follows the same strategy we used in the proof of Lemma 3.
5 High-order step-truncation integrators
Now that we have given consistency results and error estimates for both B-ST and SVD-ST integrators, we are interested in how accurate of an estimate in time the best truncation method is for the original dynamics (2). We see from (27) that if the higher order derivatives of are not sufficiently small, then the overall order may be only 1. Hereafter we present a condition on for the order of accuracy to (12) to be equal to the order of . Subsequently, we transplant the error bounds obtained in this way to the SVD step-truncation integrator.
Proposition 2 (High-order B-ST integrator)
First, the forward direction of the proposition. Since we have that for all . Therefore,
implies that is at most away from its argument. Hence, by the triangle inequality, we have
Therefore, if then the best step-truncation method retains the order of the scheme (40). Now we will prove the converse. Suppose that
Employing the expansion (27), we have
Now we can directly apply the triangle inequality.
By taking the limit for going to zero we have
Proposition 2 suggests that we should expect a high-order of accuracy when applying the discretization scheme (13) if the rank of the solution to (2) does not change in time. Moreover, the series expansion (27) tells us that the order of accuracy of (13) is lower bounded by one. Hence, we can conclude that the equation (13) has order of accuracy which is dependent on the dynamics defined by (2).
We now apply these results to the SVD step-truncation integrator . The following Corollary characterizes the order of accuracy of the scheme (11).
Corollary 3 (High-order SVD step-truncation integrator)
If then the SVD step-truncation integrator
has the same order of accuracy (in time) as the scheme (40).
Though the best truncation has some desirable theoretical properties, we remark that it is the solution to a possibly large-scale optimization problem (5). Solving such a large problem can be computationally costly. However, many of the desirable features of are inherited by the SVD truncation operator . Moreover, the SVD truncation has efficient numerical implementations [grasedyck2018distributed, kressner2014algorithm].
6 An application to the Fokker–Planck equation
In this section we apply the proposed step-truncation algorithms to a Fokker-Planck equation with non-constant drift and constant diffusion coefficients, and demonstrate their accuracy in predicting relaxation to statistical equilibrium. To this end, consider the following stochastic model
, , and . In equation (45), are the phase variables of the model, which we assume periodic (i.e., ), and is a standard vector–valued Wiener process. The Fokker–Planck equation that corresponds to (45) has the form
Our numerical examples will be computed on a torus with dimension and .
6.1 Two-dimensional Fokker-Planck equation
In Figure 1 we plot the numerical solution of the Fokker–Planck equation (47) in dimension we obtained with three distinct methods: best tangent space projector (B-TSP) integrator (13), best step-trucation (B-ST) integrator (10), and full tensor product integrator (8) (benchmark solution, hereafter denoted as ). These solutions were computed by discretizing the initial condition
where is a normalization factor, on a grid with evenly–spaced interior points, and then truncating the corresponding tensor (matrix) to rank . The discretized Fokker–Planck operator apperaring in Eq. (2) is built on the same grid with a Fourier pseudo-spectral expansion. The time stepping scheme is a strong stability preserving RK3. This defines the iteration function in the schemes (8), (10) and (13). The steady state was determined for this computation by halting execution when was below a numerical threshold of . This happens at approximately for the initial condition (48). We can see from Figure 1 that even for a fairly small rank (), the best step-truncation method (10) and the best tangent space projection integrator (13
) still capture much of the qualitative behavior of the full state probability density. It is worth noting here that the ability for the various fixed-rank approximations to capture the behavior of an initial value problem is dependent on both initial condition and the dynamical system under consideration.
As is well known, in the two-dimensional setting we are considering in this section (discretized solution in the plane at each time), the best truncation operator coincides with the SVD truncation (classical SVD), i.e., we have . This allows us to easily provide a numerical verification of Theorem 1. The theorem makes two claims. The first claim is that there exists a rank , and a time step such that the local truncation error can be made smaller than any fixed . The second claim is that the best step-truncation integrator (10) is consistent with the projected tangent space dynamics (13) as on a matrix manifold with fixed rank. That is, if solves (12) then in a neighborhood around the point we have
Above, we have as the order of consistency and as the error scaling constant determined by the derivatives of . For these numerical examples, we estimate the accuracy by examining the error of a common RK3 scheme. This ensures that the numerical error order is less than the theoretical bound .
In Figure 2 we plot the norm of the solution we obtained with B-ST and and B-TSP. Nothing about our analysis implies that for a fixed rank will we see preservation of quantities such as probability mass. It is only the case that as rank is taken sufficiently large do we see proper retaining of those quantities. Numerically speaking, our discrete time integration scheme does not take into account any mass preservation properties which may be upheld before tangent space projection. However, by Theorem 30 the error can be made arbitrarily small. Thus we must have preservation in the limiting case. If the initial condition has a high rank with slowly decaying singular values, then as demonstrated in Figure 3 we will have a rather large error penalty from projecting the initial condition into the tensor manifold . Another interesting observation from Figure 3 is that the error between the best tangent space projection integrator and the SVD step-truncation method appears to have three characterizing regions. The first region is were the error slope is large and the error is large. This behavior could be explained by Lemma 1. In fact, if the perturbation is too large, then we do not guarantee that the numerical update of the solution is within and therefore the all results we have proved, which hold for small , do not apply. The second error region is one where the slope of the local truncation error is about one in scale. It is our numerical experience that this region begins shortly before a leveling-off in the error due to truncation of the initial condition onto a manifold of lower rank. Interestingly, the best step-truncation integrator does not suffer from these effects. Though, as expected, the asymptotic error for has the same level for a fixed rank, and is approached rather smoothly. Additionally, we remark that the rate at which the level is approached as need not be the same for each choice of rank. Our analysis indeed implies that the error can be made arbitrarily small, but does not put any order conditions on the rank. In fact, the rate will be in general dependent on the rank and singular value decay of the initial condition, as well as the full rank dynamics.
In Figure 4 we demonstrate the inequality (49) numerically. it is seen that for all the listed ranks that as nears machine accuracy, the order is . This bound holds most clearly for the low rank approximations. We then see that there is the expected cancellation error at machine accuracy of double precision floating point. Again, we see steep slopes for large ranks. This observation combined with Figure 3 gives numerical evidence of some unbearable stiffness in the best tangent space projection equation (12). In particular, the error constant for large would seem to be very large. However, a full analysis of the stiffness for these equations is beyond the scope of the present paper.
6.2 Four-dimensional Fokker-Planck equation
-dimensional probability density functionwith respect to and at different times. These solutions are computed by first truncating the initial condition (50) to a hirarchical Tucker balanced tree representation with maximal rank of 5, on a grid with interior points (evenly spaced). The discretized Fokker–Planck operator is built on the same grid with a Fourier pseudo-spectral expansion. The time stepping scheme is RK1 with . The steady state is determined for this computation by halting execution when is below a numerical threshold of