A vast number of todays technological advancements is due to the ever advancing improvement in modeling and solving of partial differential equation (PDE) problems. One class of especially challenging problems is PDE-constrained optimization which has a broad number of applications, reaching from engineering design and control problems to medical applications like cancer treatment[schenk2009, biegler2003large, fisher2009data].
We consider a typical PDE-constrained optimization problem on a space-time cylinder given by
where we are interested in the minimization of the functional constrained by a differential operator with space and time dependent state and control . An extensive analysis of this type of problems can be found e.g. in [Troeltzsch2010] or [Hinze2009]. In this work, we will follow the popular approach of first discretizing the problem and then formulating the discrete first-order optimality conditions as described in [book::IK08]. Finding an efficient numerical solution to the large system of equations resulting from these conditions is of high interest and many approaches exist to solve the resulting saddle point system such as using a block preconditioner to subsequently solve the preconditioned system with iterative solvers like Minres (cf. [stoll2010, pearson2012regularization, rees2010optimal, schoberl2007symmetric]).
However, todays PDE models often result in millions of variables and solving optimization problems governed by such huge models is still a challenging task due to their size and complexity. They often even prove to be impossible to solve on a normal computer due to the memory required to store the usually large and dense solution computed by standard algorithms. Therefore, memory efficient solvers, which compute reduced approximations to the solution are a desirable approach to solve these challenging problems. We here propose a widely applicable low-rank solver, which computes a projection of the solution onto a small subspace.
The method relies on the reformulation of the problem into a matrix equation, which will be illustrated in section 2. The numerical solution of this PDE-constrained optimization problem poses a significant challenge regarding the complexity of both storage and computational resources. We show that it is possible to separate spatial and temporal data via a novel low-rank matrix equation solver in section 3. We introduce a method to reduce the system of equations into a single low-rank matrix equation. In section 4 we discuss the choice of the approximation space, while in section 5 we analyze stopping criteria and a fast update scheme of the residual computation. And finally in section 6, we examine the performance and applicability of our method on various numerical examples. We show the method’s robustness with respect to the various problem parameters as well as assess the performance of our method compared with a previously proposed low-rank approach on a distributed heat equation model as well as a boundary control problem. Further, we show the applicability of our method to more challenging problems such as a partial state observation and a non-symmetric PDE constraint.
2 Problem formulation
We consider a PDE-constrained optimal control problem on a space-time cylinder with time interval and a spatial domain with a subdomain as in Equation (1)-(2). The functional we want to minimize reads,
where is the state, the given desired state, and the control, which is regularized by the control cost parameter . For the PDE subject to which we want to minimize the functional let us exemplarily consider the heat equation with and Dirichlet boundary condition,
There are two distinctive options to proceed with this problem. Either we formulate the optimality conditions and discretize them subsequently or we first discretize the problem and then formulate the discrete first-order optimality conditions, known as the Karush-Kuhn-Tucker (KKT) conditions [book::IK08]. In this work we follow the second approach.
We discretize space and time in an all-at-once approach. The spatial discretization is done with finite elements and to discretize in time, we split the time interval into intervals of length . Using a rectangle rule, the discretization of (3) becomes
where , and are spatial discretizations of , and of size for each time step . Using an implicit Euler-scheme the discrete formulation of the PDE (4) reads
Here, are the mass matrices of the spatial discretization and denotes the so-called stiffness matrix resulting from the discretization of the differential operator . The boundary constraints are incorporated into the stiffness matrix. On further details about the discretization of PDE operators we refer the reader to [elman2005]. We collect the discretizations of the variables in matrices
and denote their vectorizations by, respectively for and . With this we write the optimization problem in compact form as
with the matrices
where mass matrices and stiffness matrix repeat times each. The matrices and are block diagonal matrices like but with different mass matrices and respectively on the diagonal.
To solve the discrete problem (9) - (10), we have to solve the system of equations resulting from the first order optimality conditions [benzi_2005], They state that an optimal solution must be a saddle point of the Lagrangian of the problem,
The Lagrangian of this problem reads
Thus, the optimal solution solves the following set of linear equations,
Memory-wise, an effective approach to solve this large-scale problem is to use a low-rank approach, which finds a cheap representation of the solution matrix within a low-rank subspace range() and a reduced solution , whose number of columns is related to the number of employed time steps,
and similarly for the other matrix variables and . In this work we show that we can compute such a low-rank solution more efficiently than recent approaches by rearranging the large system of equations into a generalized Sylvester matrix equation of the form
The resulting matrix equation can be efficiently solved by using a tailored low-rank Krylov-subspace method. This new approach greatly reduces storage and time requirements while being robust to parameter changes. This matrix equation oriented methodology allows one to clearly identify space and time dimensions, and aims at reducing the problem dimension in the space variables; the time variable is handled using an all-at-once procedure. Similar strategies have been used, for instance, in [Breiten.Simoncini.Stoll.16, Palitta.19].
One assumption we make is that we have a low-rank approximation or representation of the desired state as
with , and . First, we introduce the auxiliary matrices
We exploit the relation
The mass matrix arising from a standard Galerkin method will always be symmetric and can be considered lumped, i.e., diagonal. Therefore, we can eliminate Equation (26) by setting in the remaining two equations,
which are equivalent to
where . For now, let us assume that but our approach can be easily generalized for non-symmetric as we will demonstrate later on. Now this representation corresponds to
We denote , , , , and , where with of low column and row rank, respectively. Then we get the desired format from Equation (18) as
where the left-hand coefficient matrices have size while the right-hand ones have size , so that .
3 Low rank solution
The generalized Sylvester equation in (33) replaces the large Kronecker product based system of equations (21) - (23). Since the solution matrix will be dense and potentially very large, to exploit the new setting it is desirable to find an appropriate approximation space and a low-rank reduced matrix approximation such that
where the orthonormal columns of generate the approximation space. With this setting we can construct a reduced version of the matrix equation (33). Let us assume we compute an approximation as in (34). Then the residual matrix associated with (33) reads
We impose the Galerkin orthogonality of the residual matrix with respect to the approximation space, which in the matrix inner product is equivalent to writing , so that our equation becomes
Let us denote the reduced coefficient matrices as , , and set . The resulting reduced equation
with . For a small subspace size this system of equations is significantly easier to solve and we can either use a direct or an iterative method to do so. If the obtained approximate solution is not sufficiently good, then the space can be expanded and a new approximation constructed, giving rise to an iterative method. The use of low-rank methods within optimization of large-scale systems has been successfully documented in several articles, and we refer the reader to [stoll2014, dolgov2016fast, cichocki2017tensor, benner2016block] for recent accounts.
4 Subspace computation
To construct the projection (34) we need an iterative subspace method, which constructs a relevant subspace for our problem. For this we make use of rational Krylov subspaces
with shifts as described in [Simoncini2016]
. This approach has proven to be very well suited to solve similarly structured problems as the shifts allow for efficient updates of the subspace using relevant information on the matrices’ eigenvalues. As an initial vector we take the right-hand side,, and to construct the subspace (39) we employ a tailored strategy to adapt to the different settings. More precisely:
i) Case , square, full rank. This corresponds to a setup where desired state and control are both distributed equally on the whole domain . We use the matrix . We observe that in this case and is a diagonal nonsingular matrix.
ii) Case , square, full rank. This corresponds to a setup where, e.g., the resulting state is only observed on a partial domain. We construct a mixed subspace where we add the following two new vectors in step ,
so that the space dimension grows by at most two per iteration, instead of one.
iii) Case , tall. In this case, we can only control a partial domain, e.g., the boundary. Here, while is not invertible. We thus define , this selection is justified below. In this case we use the following two new vectors in step ,
and once again the space dimension grows by at most two per iteration, instead of one.
iv) Case , square, full rank, . For a number of PDE constraints, like convection-diffusion problems, the stiffness matrix is non-symmetric. In this case, we have to slightly modify the first component in (33) to . Here, again we construct a mixed subspace where we add two new vectors in step ,
A strategy similar to (ii)-(iv) was successfully adopted in [Powell2017] for a multiterm linear matrix equation in a different context. The effectiveness of the mixed approaches in (ii)-(iii) relies on the fact that the generated space be rich in eigencomponents of both matrices. In (ii) where is diagonal and singular with a bounded and tight nonzero spectral interval, a good spectral homogeneity in the space construction can be reached by shifting the matrix by some so that the spectra of and are contained by the same interval. Hence, we introduce a shift parameter in (33) to get
With these premises, a good choice for is a rough approximation to the largest eigenvalue of . Due to the good properties of the transformed spectrum, the shifts were computed by only using the projection of the matrix onto the generated space; more details on the shift selection can be found in [simoncini2011]. In the cases i) and iv), is not singular but still the spectra of and differ greatly. Applying the same strategy to shift as in 43 also proved to be beneficial in these cases.
In (iii), the structure and singularity of was significantly more challenging, because inversely depends on , hence the spectrum of a shifted version of may not be enclosed in that of . We propose to consider the equivalent problem
where . With this formulation, we found particularly successful the use of the projection of the pair onto the current approximation space to determine the next shifts during the iteration.
Further subspace reduction. By computing we want to reduce storage requirements and computation time. This goal is only achieved if the approximation space dimension remains considerably smaller than . By enriching the subspace as outlined above, however, the space dimension may in principle grow up to in case the solution is not sufficiently accurate. To keep the subspace as small as possible when it is not optimal we include a truncation scheme which bounds the subspace dimension to less than
. Thus, in the worst case the solution will have maximum rank. To reduce the dimension we compute a singular value decomposition of, with where are the singular values of . In case some of the singular values are too small, say for some , it is clear that some of the subspace information is not needed for computing a good approximation to the solution. If this occurs we truncate the subspace with by setting
with denoting the first columns of . Note that the next subspace additions (40) are still conducted using from the latest generated vectors in the original subspace , and orthogonalized with respect to the reduced subspace. We refer the reader to the discussion of Table 4 for an illustration of the achievable memory savings by adopting this strategy.
5 Residuals and stopping criteria
which are closely related to the original system.
Computing the residuals poses a bottleneck in this scheme as straightforward computation is time consuming and would require forming the full solution . To avoid this and substantially speed up the computation time, we rely on a low-rank representation of the residual and calculate its norm making use of an updated QR method. The matrix residuals can be rewritten as
where denotes the -component of and denotes the component associated with respectively. Let any of the two quantities be denoted by
Here the matrices have dimensions and
. We consider a reduced QR decomposition of the tall and skinny matrix,
At each iteration new columns are added to the matrices and . To further reduce the computational cost, we update the QR decomposition so as to avoid a new factorization from scratch at each iteration. Assume that we have a current subspace of size , so that and we add another vector to the space. Thus, four new columns are added to , which we add to the end of , while keeping the same reordering for . Adding a new column gives
where only two column vectors and and a scalar value have to be computed. We have
Setting and constructing the vector
we can set and thus . With this, Equation (55) holds and the new column is orthogonal to ,
Now we can completely avoid forming the full residuals, as with the desired Frobenius norm of the residual becomes
To compute the residual norms in this form only a very small matrix of size has to be computed. Both of these residuals can be computed without forming the approximations and . The use of the trace square root may lead to inaccuracy at the level of the machine epsilon square root. However, our convergence tolerance is in general significantly larger, say , hence these problems were not encountered.
This computational procedure allows us to cheaply compute both and , that is the absolute value of the residual norms. Following the guidelines in [Higham2002] for linear matrix equations, we also monitor a scaled backward error norm of the two equations, that is
which takes into account the data order of magnitude. Summarizing, we stop our iteration whenever
where tol is a prescribed tolerance.
6 Numerical Results
We now present the performance and flexibility of our method on multiple examples for different PDE-constrained optimization problems. The spatial FE-discretizations of the PDE operator were conducted using the deal.II framework [deal2007] with Q1 finite elements and the method described in the previous section was implemented in matlab R2018b. All experiments were run on a desktop computer with an Intel Core i7-4770 Quad-core processor running at 4 3400 MHz with 32 GB of RAM.
We will show robustness with respect to the discretization sizes and as well as the control parameter . Furthermore, we demonstrate the applicability of our method to multiple different setups such as partial observation of the desired state, boundary control and a different non-symmetric PDE-operator.
Other low-rank solvers. To emphasize the competitiveness of our new approach we compare it with another low-rank method aimed at the same problem class. The idea of exploiting the system structure to avoid forming a large system of equations and finding a low-rank way to compactly represent the solution is not an entirely new approach. Earlier, in [stoll2014] the authors developed a technique rewriting the problem in a low-rank context and solving it with a preconditioned Krylov subspace solver, namely Minres introduced in [paige1975]. We denote this solver as lrminres (low-rank Minres ) 111The lrminres MATLAB code used for our comparison is available under https://www.tu-chemnitz.de/mathematik/wire/pubs/PubCodes.zip..
for Equation (25) and respectively for Equations (26) and (27). From here, the system of equations is solved with a tailored version of Minres . Opposed to the original version of Minres , the residual and all other upcoming variables are formulated as low-rank matrices rather than vectors and subsequently truncated to keep memory requirements low. Combined with exploiting the structure of the low-rank equation system in the upcoming matrix vector products and preconditioning, this method provides a memory efficient and fast alternative to established approaches for solving PDE constrained optimization problems.
A bottleneck in this approach, however, is the setup of a preconditioner. To construct the necessary low-rank preconditioner, a matrix equation needs to be solved which significantly affects the overall performance. Therefore, our approach of directly starting from a matrix equation formulation proofs to be superior in many setups.
6.1 Fully observed heat equation
As a first example PDE we use the heat equation with a full observation, thus and . This leads to a simplified version of Equation (18) as
First, we will use this simple setup to investigate the convergence behavior and our choice of stopping criteria by comparing the results to a solution obtained by a direct solver accurate up to machine precision. The constant-in-time desired state is displayed in Figure 1(a). This constant desired state leads to a low-rank representation of the right hand side of rank 1. The reference solution was obtained by solving the linear equation system of a small setup with MATLABs backslash operator. We used a discretization of and .
. We see that the monitored quantities are a good estimation for the magnitude of the errors asstays close above the actual errors. Inserting the direct solution into Equation (33) gives a residual of . Thus, the different scaling of the matrix equation prohibits further accuracy gains once a high accuracy is reached. This effect is reflected in Figure 1 as the last iterations do not provide further accuracy gains.
We continue with the above simple example, i. e. the heat equation with full observation, and and a constant-in-time desired state. We now investigate the performance of our method with respect to time and space discretization. We vary the number of time steps from 20 to 2500 and the number of spatial discretization nodes from to
which roughly resembles up to a total of 78 million degrees of freedom. Here, again we fix the control parameter to. As seen in Table 1, increasing the discretization size barely impacts the resulting subspace sizes. Additionally, the time needed to solve the optimization problems increases considerably slowly.
With the same data as in Example 6.1 we report on performance comparisons with the low-rank preconditioned Minres (lrminres ) proposed in [stoll2014] and introduced above. We fixed the number of time steps to and computed solutions for different discretization sizes and control parameters . The maximum discretization size here is . The results in Table 2 reveal that our method’s performance is superior regarding both required memory and CPU time. Here, our method is labeled as sys2mateq . The column denoted by states the subspace size for sys2mateq , while for lrminres it denotes the maximum rank per vector needed during the iterations of which up to 15 are required. The column states the rank of the final solution in both methods. Note that even though sometimes the ranks achieved by lrminres are smaller the required memory to store the solution is still greater. This is because the lrminres scheme needs to store a low-rank representation for each of the three variables as opposed to sys2mateq where the same subspace is used for all variables. Additionally, during the iterations our method needs to store only one subspace of size , whereas lrminres requires to store multiple vectors of size . Also note, that our scheme does not rely on the time consuming computation of a preconditioner as in lrminres which requires the solution of a matrix equation that gets increasingly difficult for large spatial discretizations combined with small . For the last entry of Table 2 lrminres did not reach convergence.
With the same settings as before, we now raise the rank of the desired state to 6 – leading to a larger rank matrix – and compare both methods once again. Table 3 displays the results as before. Again, our method successfully converges with a small subspace sizes within a very short amount of time, whereas lrminres was considerably slower especially for larger problem sizes and did not converge for the largest discretization with being very small.
We are also interested in the memory savings our method provides. Therefore, we monitor the memory consumption of sys2mateq , lrminres and a full rank Minres method for the same set-ups displayed in Table 3. For sys2mateq we monitor the memory needed to construct the subspace, the reduced solution and the setup of the reduced equation system (38). For lrminres we monitor the memory for all vectors that are used. For comparison we additionally report the memory consumed by a standard Minres method only to store the required vectors, not taking into account the system matrix and preconditioner. The results are shown in Table 4. The quantities refer to the overall memory consumption in megabytes during the process of solving the equation system with the respective method. We see that even with constructing the reduced equation system 38 the memory requirements stay well below those of a not low-rank approach.
6.2 Partial observation on heat equation
Until now we only investigated the case where and therefore the matrix being the identity. One highly interesting and challenging problem is the optimization under partial observation, meaning the desired state is only of interest on a partial domain . This leads to a singular matrix , which further increases the difficulty of the PDE-constrained optimization problem.When we set up the rational Krylov subspace with only for this setting, we do not reach convergence. Therefore, we add up to two new vectors in each iteration as outlined in Section 4.
We solved the problem with control parameter and 100 time steps on a grid of 1089 spatial DoFs. In Figure 1(a) we see the desired state for this problem and the white area is an example of non-observed nodes, roughly 10 %. Hence, the corresponding entries in are set to 0. Figure 1(b) and 1(c) show the resulting state and control respectively for a fixed time step . Table 5 shows the results when increasing the number of unobserved nodes from to which is about 90% of the nodes. Here, the constructed subspaces are not optimal. Thus, we make use of the truncation modification outlined in Chapter 4. We see that the time and iterations needed to solve this more challenging problem are higher than for the fully observed case in the table’s first row. For the non-truncated case with full , the columns denoted by for the subspace size and for the solution rank show a discrepancy which disappears with stricter truncation. When truncating with a tolerance of both values coincide for all cases. In some cases applying this truncation greatly increases the number of iterations needed to find a solution but the end result is a higher memory reduction. Thus, this option can be toggled to better reflect the desired behavior.
Unfortunately, the lrminres scheme from [stoll2014] is not adapted for the case of . Hence, we have no other low-rank method to compare the performance of our method with.
|full||truncated ,||truncated ,|
6.3 Boundary Control
Another setup of high interest is having a non-distributed control. Here, we take a boundary control problem as an example. This leads to the control not being distributed on but only on a subdomain . Therefore, we get a smaller mass matrix associated with present on spatial nodes, and is now rectangular (tall). Therefore, we have a modification of Equation (33) as
with and . Here, the subspace we compute is derived from and as in (40).
As before, we use time steps and a convergence threshold of for this setup. Results with a constant in time desired state for different levels of spatial discretization and different values for are displayed in Table 6. sys2mateq produces robust results with respect to the discretization size as we can only see a small increase in the ranks throughout discretization. Additionally, the results are acquired within a short amount of time even for small . For larger discretization sizes the required time grows only slightly opposed to lrminres where time requirements increase substantially for larger discretization sizes.
6.4 Non-symmetric convection-diffusion PDE
Until now we only investigated the performance of our method under the constraint of a heat equation. But for more challenging PDEs our method works just as well, e.g. the convection-diffusion equation,
with diffusion coefficient and velocity field . When , the equation is convection-dominated and solving it is a challenging task. As the stiffness matrix of the resulting equation system is non-symmetric, we have to adjust the matrix equation accordingly. Thus, we get another modification of Equation (33) for this setup as
We consider example 3.1.4 from [elman2005] with a recirculating wind as the underlying model. The velocity field is displayed in Figure 2(a). In Figures 2(b)-2(c) the desired state and a snapshot of a control solution for are displayed for and . The sys2mateq method produces reliable results throughout a range of different values for and as reported in Table 7. Even very small values in do not pose a problem to the solver.
7 Conclusion and Outlook
We proposed a new scheme to solve a class of large-scale PDE-constrained optimization problems in a low-rank format. This method relied on the reformulation of the KKT saddle point system into a matrix equation, which was subsequently projected onto a low dimensional space generated with rational Krylov type iterations. We showed that the method’s convergence is robust with respect to different discretizations and parameters. Furthermore we demonstrated higher memory and time savings compared to an established low-rank scheme. Additionally, the sys2mateq is very flexible with respect to different constraints such as non-symmetric PDE operators or partial state observation.
In the future, we plan on further investigating the subspace choice and the performance of our scheme for other challenging setups. Further improvements are expected from realizing a truncation or restarting mechanism in cases where the subspaces become unexpectedly large.
The work of the first and third authors was supported by the German ScienceFoundation (DFG) through grant 1742243256 - TRR 9. The second author is a member of Indam-GNCS. Its support is gratefully acknowledged. This work was also supported by the DAAD-MIUR Mobility Program 2018 “Optimization and low-rank solvers for isogeometric analysis (34876)” between TU-Chemnitz and the Università di Bologna.