Stochastic partial differential equations (SPDEs) are widely used to model physical problems with uncertainty. In this paper, we develop new computational methods for solving the stochastic unsteady Navier–Stokes equations, using stochastic Galerkin methods to address the stochastic nature of the problem and an all-at-once treatment of time integration.
For a time-dependent problem, the solutions at different time steps are usually computed in a sequential manner via time stepping. For example, a fully-implicit scheme with adaptive time step sizes was studied in [7, 14]. On the other hand, an all-at-once system can be formed by collecting the algebraic systems at all the discrete time steps into a single one, and the solutions are computed simultaneously. Such a formulation avoids the serial nature of time stepping, and allows parallelization in the time direction for accelerating the solution procedure [10, 18, 19]. A drawback, however, is that for large-size problems, the all-at-once system may require excessive storage. In this study, we address this issue by using a low-rank tensor representation of data within the solution methods.
We develop a low-rank iterative algorithm for solving the unsteady Navier–Stokes equations with an uncertain viscosity. The equations are linearized with Picard’s method. At each step of the nonlinear iteration, the stochastic Galerkin discretization gives rise to a large linear system, which is solved by a Krylov subspace method. Similar approaches have been used to study the steady-state problem [23, 27], where the authors also proposed effective preconditioners by taking advantage of the special structures of the linear systems. To reduce memory and computational costs, we compute low-rank approximations to the discrete solutions, which are represented as three-dimensional tensors in the all-at-once formulation. We refer to  for a review of low-rank tensor approximation techniques, and we will use the tensor train decomposition  in this work. The tensor train decomposition allows efficient basic operations on tensors. A truncation procedure is also available to compress low-rank tensors in the tensor train format to ones with smaller ranks.
Our goal is to use the low-rank tensors within Krylov subspace methods, in order to efficiently solve the large linear systems arising in each nonlinear step. The basic idea is to represent all the vector quantities that arise during the course of a Krylov subspace computation as low-rank tensors. With this strategy, much less memory is needed to store the data produced during the iteration. Moreover, the associated computations, such as matrix-vector products and vector additions, become much cheaper. The tensors are compressed in each iteration to maintain low ranks. This idea has been used for the conjugate gradient (CG) method and the generalized minimal residual (GMRES) method, with different low-rank tensor formats [1, 2, 5, 15]. In addition, the convergence of Krylov subspace methods can be greatly improved by an effective preconditioner. In conjunction with the savings achieved through low-rank tensor computations, we will derive preconditioners for the stochastic all-at-once formulation based on state-of-the-art techniques used for deterministic problems, and we will demonstrate their performance in numerical experiments. We also explore the idea of inexact Picard methods, in which the linear systems are solved inexactly at each Picard step to further reduce computational work, and we show that with this strategy very small numbers of iterations are needed for the Krylov subspace method.
We note that a different type of approach, alternating iterative methods [6, 13, 25], including the density matrix renormalization group (DMRG) algorithm and its variants, can be used for solving linear systems in the tensor train format. In these methods, each component of the low-rank solution tensor is optimized directly by projection onto a small local problem. This approach avoids the rank growth in intermediate iterates typically encountered in a low-rank Krylov subspace method. However, these methods were developed for solving symmetric positive definite systems and require nontrivial effort to adapt to a nonsymmetric Navier–Stokes problem.
The rest of the paper is organized as follows. In Section 2 we give a formal presentation of the problem. Discretization techniques that result in an all-at-once linear system at each Picard step are discussed in Section 3. In Section 4 we introduce the low-rank tensor approximation and propose a low-rank Krylov subspace iterative solver for the all-at-once systems. The preconditioners are derived in Section 5 and numerical results are given in Section 6.
2 Problem setting
Consider the unsteady Navier–Stokes equations for incompressible flows on a space-time domain ,
where $\vec{u}$ and $p$ stand for the velocity and pressure, respectively, $\nu$ is the viscosity, and the spatial domain is two-dimensional. The Dirichlet boundary consists of an inflow boundary and fixed walls, and Neumann boundary conditions are set for the outflow,
We assume the Neumann boundary is not empty so that the pressure is uniquely determined. The Dirichlet data specify a time-dependent inflow, typically growing from zero to a steady state, and are set to zero at the fixed walls. The initial conditions are zero everywhere for both the velocity and the pressure.
The uncertainty in the problem is introduced by a stochastic viscosity, which is modeled as a random field depending on a finite collection of random variables (collected in a vector). Specifically, we consider a representation as a truncated Karhunen–Loève (KL) expansion,
where the leading term is the mean viscosity, and the other terms are determined by the covariance function of the random field. We assume that the random parameters are independent and that the viscosity is bounded below by a positive constant almost surely. We refer to [23, 27] for different forms of the stochastic viscosity. The velocity and pressure solutions of Eq. 2.1 will then also be random fields, depending on the spatial variable, time, and the random variables.
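As a concrete illustration of the truncated KL expansion, the following sketch samples a random viscosity field on a unit interval. The eigenpairs here are illustrative stand-ins (algebraically decaying eigenvalues, sine eigenfunctions); a real covariance function would supply its own eigenpairs, and the amplitudes are chosen small enough that the positivity assumption holds.

```python
import numpy as np

def kl_viscosity(x, xi, nu0=1.0, decay=2.0):
    """Evaluate a truncated KL expansion nu(x, xi) = nu0 + sum_k sqrt(lam_k) phi_k(x) xi_k.

    Illustrative eigenpairs: lam_k ~ 0.05 * k^{-decay}, phi_k(x) = sqrt(2) sin(k pi x)
    on (0, 1); these are assumptions for the sketch, not the paper's covariance model.
    """
    m = len(xi)
    nu = np.full_like(x, nu0, dtype=float)
    for k in range(1, m + 1):
        lam_k = 0.05 * k ** (-decay)                   # assumed eigenvalue decay
        phi_k = np.sqrt(2.0) * np.sin(k * np.pi * x)   # assumed eigenfunction
        nu += np.sqrt(lam_k) * phi_k * xi[k - 1]
    return nu

x = np.linspace(0, 1, 101)
rng = np.random.default_rng(0)
xi = rng.uniform(-1, 1, size=5)   # independent uniform random parameters
nu = kl_viscosity(x, xi)
```

With these amplitudes the worst-case fluctuation is below the mean, so the sampled viscosity stays strictly positive, consistent with the assumption above.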
3 Discrete problem
In this section, we derive a fully discrete problem for the stochastic unsteady Navier–Stokes equations Eq. 2.1. This involves a time discretization scheme and a stochastic Galerkin discretization for the physical and parameter spaces at each time step. The discretizations give rise to a nonlinear algebraic system. Instead of solving such a system at each time step, we collect the systems from all time steps to form an all-at-once system, where the discrete solutions at all the time steps are solved simultaneously. The discrete problem is then linearized with Picard’s method, and a large linear system is solved at each step of the nonlinear iteration.
3.1 Time discretization
For simplicity we use the backward Euler method for time discretization, which is first-order accurate but unconditionally stable and dissipative. The all-at-once formulation discussed later in section 3.3 requires predetermined time steps. Divide the interval into uniform steps with step size and initial time . Given the solution at time , we need to solve the following equations for and :
After discretization (in physical space and parameter space) the implicit method requires solving an algebraic system at each time step. In the following we discuss how the system is assembled from the stochastic Galerkin discretization of Eq. 3.1.
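The unconditional stability of backward Euler can be seen on a scalar model problem. This is a minimal sketch (the rate and step size are illustrative): with a step size far beyond the explicit stability limit, the implicit iterates still decay monotonically.

```python
import numpy as np

# Backward Euler for u'(t) = -a u(t), u(0) = 1: each step solves the
# implicit system (1 + a*dt) * u_new = u_old (a scalar solve here).
a, dt, nt = 50.0, 0.1, 20   # stiff rate with a large step: explicit Euler would blow up
u = np.empty(nt + 1)
u[0] = 1.0
for n in range(nt):
    u[n + 1] = u[n] / (1.0 + a * dt)   # the implicit "solve" at each time step
```

Each iterate is damped by the factor 1/(1 + a*dt) < 1 regardless of the step size, which is the dissipative behavior noted above.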
3.2 Stochastic Galerkin method
At time step , the stochastic Galerkin method finds parametrized approximate velocity solutions and pressure solutions in finite-dimensional subspaces of and , where is the joint image of the random variables . The functional spaces are defined as follows,
The expectations are taken with respect to the joint distribution of the random variables. In the following we use to denote the expected value. Let the finite-dimensional subspaces be , , and . Let and be the spaces of functions in with Dirichlet boundary conditions and imposed for the velocity field, respectively. Then for Eq. 3.1 the stochastic Galerkin formulation entails the computation of and , satisfying the weak form
for any and . Here, denotes the inner product in . For the physical spaces, we use a div-stable Taylor–Hood discretization on quadrilateral elements, with biquadratic basis functions for the velocity and bilinear basis functions for the pressure. The stochastic basis functions are -dimensional orthonormal polynomials constructed from generalized polynomial chaos (gPC), satisfying an orthonormality condition with respect to the joint density of the random variables. The stochastic Galerkin solutions are expressed as linear combinations of the basis functions,
The coefficient vectors and similarly defined are computed from the nonlinear algebraic system
Here is the identity matrix, and denotes the Kronecker product of two matrices. The boldface matrices , , and are block-diagonal, with the scalar mass matrix , weighted stiffness matrix , and discrete convection operator as diagonal components, where
for . Note the dependency on comes from the nonlinear convection term , with convection velocity . Let . The discrete divergence operator , with
for and . The matrices and of Eq. 3.6 come from the stochastic basis functions and have entries
3.3 All-at-once system
As discussed in the beginning of the section, we consider an all-at-once system where the discrete solutions at all the time steps are computed together. Let
and let , , and be similarly defined. By collecting the algebraic systems Eq. 3.5 corresponding to all the time steps , we get the single system
where is block diagonal with as the th diagonal block, , and with . Note that the zero initial conditions are incorporated in Eq. 3.5 for . The all-at-once system Eq. 3.11 is nonsymmetric and blockwise sparse, and each part of the system is a sum of Kronecker products of three matrices. In fact, from Eq. 3.6,
We discuss later (see section 4.3) how the convection matrix can also be put in the Kronecker product form. It will be seen that this structure is useful for efficient matrix-vector product computations.
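The all-at-once construction can be sketched on a small deterministic heat-equation analogue (all operators and sizes here are illustrative): the per-step backward Euler systems stack into one block-bidiagonal matrix expressed with Kronecker products, and solving it reproduces sequential time stepping.

```python
import numpy as np

# All-at-once backward Euler for M du/dt + K u = f: stack the nt per-step systems
# (M/dt + K) u_n - (M/dt) u_{n-1} = f into one block-bidiagonal Kronecker-form matrix.
nx, nt, dt = 4, 5, 0.1
rng = np.random.default_rng(1)
K = np.diag(2 * np.ones(nx)) - np.diag(np.ones(nx - 1), 1) - np.diag(np.ones(nx - 1), -1)
M = np.eye(nx)
f = rng.standard_normal(nx)

C = np.eye(nt)                   # time-coupling: identity hits the current step
S = np.diag(np.ones(nt - 1), -1) # subdiagonal shift links step n to step n-1
A = np.kron(C, M / dt + K) - np.kron(S, M / dt)
b = np.tile(f, nt)               # same forcing at every step, zero initial condition
u_all = np.linalg.solve(A, b).reshape(nt, nx)

# sequential time stepping must give the same trajectory
u = np.zeros(nx)
for n in range(nt):
    u = np.linalg.solve(M / dt + K, f + M @ u / dt)
```

The Kronecker structure of `A` is exactly the feature exploited later for efficient matrix-vector products.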
3.4 Picard’s method
We use Picard’s method to solve the nonlinear equation Eq. 3.11. Picard’s method is a fixed-point iteration. Let , be the approximate solutions at the th step. Each Picard step entails solving a large linear system
Instead of Eq. 3.13, one can equivalently solve the corresponding residual equation for a correction of the solution. Let , . Then and satisfy
where the nonlinear residual is
Let denote the right-hand side of Eq. 3.11. The complete algorithm is summarized in Algorithm 3.1. The initial iterates , are obtained as the solution of a Stokes problem, for which in Eq. 3.13 the convection matrix is set to zero.
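Picard's method can be sketched on a small quadratic model system A(u)u = b, in which the "convection" part of the operator is lagged at the previous iterate. The operators below are random illustrative stand-ins, and the initial iterate drops the nonlinear term, mimicking the Stokes initialization described above.

```python
import numpy as np

# Picard (fixed-point) iteration for a quadratic system A(u) u = b, where
# A(u) = K + N(u) lags the "convection" part at the previous iterate.
n = 6
rng = np.random.default_rng(2)
K = np.eye(n) * 4 + rng.standard_normal((n, n)) * 0.1   # dominant linear part
D = rng.standard_normal((n, n)) * 0.1                   # "convection" operator

def A(u):
    return K + np.diag(u) @ D   # convection matrix built from the current iterate

b = rng.standard_normal(n)
u = np.linalg.solve(K, b)        # initial iterate: nonlinear term set to zero ("Stokes")
for it in range(50):
    res = b - A(u) @ u           # nonlinear residual
    if np.linalg.norm(res) < 1e-10 * np.linalg.norm(b):
        break
    u = np.linalg.solve(A(u), b) # Picard step: solve the lagged linear system
```

Because the linear part dominates, the fixed-point map is a contraction and the residual drops rapidly; in the paper the inner linear solve is the large stochastic Galerkin system instead of a dense solve.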
4 Low-rank approximation
In this section we discuss low-rank approximation techniques and how they can be used with iterative solvers. The computational cost of solving Eq. 3.14 at each Picard step is high due to the large problem size, especially when large numbers of spatial grid points or time steps are used to achieve high-resolution solutions. We address this using low-rank tensor approximations to the solution vectors. We develop efficient iterative solvers and preconditioners in which the solution is approximated in a compressed data representation, greatly reducing memory requirements and computational effort. The idea is to represent the iterates of a Krylov subspace method in a low-rank tensor format. The basic operations associated with the low-rank format are much cheaper, and as the Krylov subspace method converges it constructs a sequence of low-rank approximations to the solution of the system.
4.1 Tensor train decomposition
A tensor is a multidimensional array with entries , where , . The solution coefficients in Eq. 3.4 can be represented in the form of three-dimensional tensors (where ) and (), such that and . Equivalently, such tensors can be represented in vector format, where the vector version and are specified using the vectorization operation
where , and in a similar manner. In an iterative solver for the system Eq. 3.14, any iterate can be equivalently represented as a three-dimensional tensor . In the sequel we use the vector and tensor forms interchangeably. The tensor train decomposition  is a compressed low-rank representation used to approximate a given tensor and to perform tensor operations efficiently. Specifically, the tensor train format of is defined as
where , , are the tensor train cores, and and are called the tensor train ranks. It is easy to see that if the ranks are small, the memory cost of storing the tensor is reduced from the product of the three mode sizes to a much smaller sum of core sizes.
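A three-dimensional tensor train and its storage saving can be sketched directly (all sizes and ranks below are arbitrary illustrative choices):

```python
import numpy as np

# A 3D tensor train x(i1,i2,i3) = sum_{a,b} G1[i1,a] G2[a,i2,b] G3[b,i3],
# with TT ranks (r1, r2): storage drops from n1*n2*n3 entries to
# n1*r1 + r1*n2*r2 + r2*n3 entries.
n1, n2, n3, r1, r2 = 30, 40, 50, 3, 4
rng = np.random.default_rng(3)
G1 = rng.standard_normal((n1, r1))        # first core
G2 = rng.standard_normal((r1, n2, r2))    # middle core
G3 = rng.standard_normal((r2, n3))        # last core

# reconstruct the full tensor to check the definition
X = np.einsum('ia,ajb,bk->ijk', G1, G2, G3)

full_storage = n1 * n2 * n3
tt_storage = G1.size + G2.size + G3.size
```

Note that the first unfolding of the reconstructed tensor has rank at most r1, which is exactly what the truncation procedure discussed below exploits.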
The tensor train decomposition allows efficient basic operations on tensors. Most importantly, matrix-vector products can be computed much less expensively if the vector is in the tensor train format. For as in Eq. 4.2, the vector has an equivalent Kronecker product form 
where in the right-hand side , , and are vectors of length , , and , respectively, obtained by fixing the indices and in , , and . Then for any matrix , such as the blocks in Eq. 3.14,
The product is also in tensor train format, with the same ranks as on the right-hand side of Eq. 4.2, and it requires only matrix-vector products with the small component matrices. From left to right in the Kronecker products, the component matrices from Eq. 3.12 are sparse, with numbers of nonzeros proportional to , , and , respectively, and the computational cost is thus greatly reduced.
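The savings can be checked on a rank-1 example (sizes illustrative): applying a Kronecker-product matrix to a rank-1 vector reduces to three small matrix-vector products.

```python
import numpy as np

# For a rank-1 vector x = a (x) b (x) c, the product (A (x) B (x) C) x equals
# (A a) (x) (B b) (x) (C c): three small matvecs instead of one huge one.
rng = np.random.default_rng(4)
n1, n2, n3 = 7, 8, 9
A, B, C = (rng.standard_normal((n, n)) for n in (n1, n2, n3))
a, b, c = (rng.standard_normal(n) for n in (n1, n2, n3))

x = np.kron(np.kron(a, b), c)
y_full = np.kron(np.kron(A, B), C) @ x        # dense reference, cost O((n1 n2 n3)^2)
y_tt = np.kron(np.kron(A @ a, B @ b), C @ c)  # three small matvecs
```

For higher TT ranks the same identity is applied core by core, one small matvec per core slice.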
Other vector computations, including additions and inner products, are also inexpensive in the tensor train format. One thing to note is that the addition of two vectors in tensor train format tends to increase the ranks. This is easily seen from Eq. 4.2, since the addition of two low-rank tensors ends up with more terms in the summation on the right-hand side. An important operation for the tensor train format is truncation (or rounding), used to reduce the ranks of tensors that are already in the tensor train format but have suboptimally high ranks. For a given tensor as in Eq. 4.2, the truncation operation with tolerance computes
such that has smaller ranks than and satisfies a prescribed relative error bound. In the algorithm, a sequence of singular value decompositions (SVDs) is computed for the so-called unfolding matrices, obtained by reshaping the entries of a tensor into a two-dimensional array. Terms corresponding to small singular values are dropped such that the error in each truncated SVD satisfies a prescribed bound (see line 4 of Algorithm 4.1). It was shown in  that the algorithm produces a tensor train that satisfies
Thus, one can choose the SVD tolerances to achieve a desired relative error. Note that the algorithm is costly when applied to a full tensor, since it requires SVDs of large unfolding matrices. However, when the tensor is already in the tensor train format, the computation can be greatly simplified, and only SVDs of the much smaller tensor train cores are needed; in this case the cost of the truncation operation is much lower. We refer to  for more details. In the numerical experiments, we use TT-Toolbox  for tensor train computations.
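The truncation idea can be sketched for the first unfolding of a 3D tensor: a truncated SVD drops the trailing singular values whose cumulative norm is below a tolerance. This sketch operates on a full unfolding for clarity (the synthetic tensor has exact low unfolding rank plus noise); TT-Toolbox rounding instead works core by core.

```python
import numpy as np

# SVD-based rank truncation of the first unfolding of a 3D tensor: drop the
# trailing singular values whose cumulative norm is below the tolerance delta.
rng = np.random.default_rng(5)
n1, n2, n3, r = 20, 25, 30, 5
X = np.einsum('ia,ajk->ijk', rng.standard_normal((n1, r)),
              rng.standard_normal((r, n2, n3)))   # exact unfolding rank r
X += 1e-8 * rng.standard_normal((n1, n2, n3))     # small perturbation

U, s, Vt = np.linalg.svd(X.reshape(n1, n2 * n3), full_matrices=False)
delta = 1e-6 * np.linalg.norm(X)
tails = np.sqrt(np.cumsum(s[::-1] ** 2))[::-1]    # tails[k] = norm of s[k:]
k = int(np.sum(tails > delta))                    # smallest k with dropped norm <= delta
X_trunc = (U[:, :k] * s[:k]) @ Vt[:k]             # truncated unfolding
err = np.linalg.norm(X.reshape(n1, -1) - X_trunc)
```

The truncation recovers the rank-5 structure, and the discarded part has Frobenius norm at most delta by construction.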
4.2 Low-rank solver
The tensor train decomposition offers efficient tensor operations, and we use it in iterative solvers to reduce computational costs. The all-at-once system Eq. 3.14 to be solved at each step of Picard's method is nonsymmetric, so we use a right-preconditioned GMRES method to solve it. The complete algorithm for solving the system is summarized in Algorithm 4.2. The preconditioner entails an inner iterative process and is not fixed across GMRES iterations, and therefore a variant of the flexible GMRES method (see, e.g., ) is used. As discussed above, all the iterates in the algorithm are represented in the tensor train format for efficient computations, and a truncation operation with tolerance is used to compress the tensor train ranks so that they stay small relative to the problem size. It should be noted that since the quantities are truncated, the Arnoldi vectors do not form an orthogonal basis for the Krylov subspace, and thus this is not a true GMRES computation. When the algorithm is used for solving Eq. 3.14, the truncation operator is applied to quantities associated with the two tensor trains and separately. In Section 5, we construct effective preconditioners for the system Eq. 3.14.
We also use the tensor train decomposition to construct a more efficient variant of Algorithm 3.1. In particular, the updated solutions and in line 5 are truncated, with a tolerance , so that
Another truncation operation with is applied to compress the ranks of the nonlinear residual in line 7. We will use this truncated version of Algorithm 3.1 in numerical experiments; choices of the truncation tolerances will be specified in Section 6.
4.3 Convection matrix
We now show that in Eq. 3.12 if the velocity is in the tensor train format, the convection matrix can be represented as a sum of Kronecker products of matrices , which allows efficient matrix-vector product computations as in Eq. 4.4. Assume the coefficient tensor in Eq. 3.4 is approximated by a tensor train decomposition,
Note that the entries of are linear in and
Let . Then the th diagonal block of is
The convection matrix can be expressed as
Here is a vector obtained by fixing the index in , and is a diagonal matrix with on the diagonal. The result is a sum of Kronecker products of three smaller matrices. Such a representation can be constructed for any iterate in the tensor train format.
Given the number of terms in the summation on the right-hand side of Eq. 4.12, a matrix-vector product with will result in a dramatic tensor train rank increase, from to . Unless is very small, a tensor train with rank will require too much memory and also be expensive to work with. To overcome this difficulty, when solving the all-at-once system Eq. 3.14, we use a low-rank approximation of to construct . Specifically, let
with some truncation tolerance . Since has smaller ranks than , the approximate convection matrix contains a smaller number of terms in Eq. 4.12, and thus the rank increase becomes less significant when computing matrix-vector products with it. In other words, the linear system solved at each Picard step becomes
Note that the original is still used for computing the nonlinear residual in Picard’s method.
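The rank growth that motivates this approximation is easy to see in a small two-factor example (sizes illustrative): applying a sum of R Kronecker terms to a rank-1 vector produces a rank-R result.

```python
import numpy as np

# A matvec with a sum of R Kronecker terms maps a rank-1 vector to a rank-R one,
# which is why ranks must be truncated (or the operator compressed) between steps.
rng = np.random.default_rng(6)
n, R = 10, 4
As = [rng.standard_normal((n, n)) for _ in range(R)]
Bs = [rng.standard_normal((n, n)) for _ in range(R)]
a, b = rng.standard_normal(n), rng.standard_normal(n)

# sum_r (A_r (x) B_r) (a (x) b) = sum_r (A_r a) (x) (B_r b)
y = sum(np.kron(A @ a, B @ b) for A, B in zip(As, Bs))
rank_before = np.linalg.matrix_rank(np.outer(a, b))   # input rank: 1
rank_after = np.linalg.matrix_rank(y.reshape(n, n))   # output rank: up to R
```

Compressing the operator (fewer Kronecker terms) therefore directly limits the rank of the products it produces.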
5 Preconditioning
In this section we discuss preconditioning techniques for the all-at-once system Eq. 3.14 so that the Krylov subspace methods converge in a small number of iterations. To simplify the notation, we use instead of , and the associated approximate solution at the th time step is
with . In the following the dependence on in is omitted in most cases. We derive a preconditioner by extending ideas for more standard problems , starting with an “idealized” block triangular preconditioner
With this choice of preconditioner, the Schur complement is , and the idealized preconditioned system derived from a block factorization
has eigenvalues equal to 1 and Jordan blocks of order 2. (Here I denotes an identity block.) Thus a right-preconditioned true GMRES method will converge in two iterations. However, applying involves solving linear systems associated with and . These solves are too expensive for practical computation, and to develop preconditioners we construct inexpensive approximations to them. Specifically, we derive mean-based preconditioners that use results from the mean deterministic problem. Such preconditioners for the stochastic steady-state Navier–Stokes equations have been studied in . We generalize the techniques to the all-at-once formulation of the unsteady equations.
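The two-iteration claim can be verified numerically on a small saddle-point system with illustrative random blocks: with the idealized block triangular preconditioner built from the exact Schur complement, the right-preconditioned matrix is unit lower block triangular, so all its eigenvalues equal 1.

```python
import numpy as np

# Idealized block triangular preconditioner P = [[F, B^T], [0, -S]] with exact
# Schur complement S = B F^{-1} B^T: the right-preconditioned matrix A P^{-1}
# equals [[I, 0], [B F^{-1}, I]], so every eigenvalue is 1 (Jordan blocks of order 2).
rng = np.random.default_rng(7)
nv, npr = 8, 4                                             # velocity / pressure sizes
F = 3 * np.eye(nv) + 0.2 * rng.standard_normal((nv, nv))   # nonsymmetric F block
B = rng.standard_normal((npr, nv))
S = B @ np.linalg.solve(F, B.T)                            # exact Schur complement

A = np.block([[F, B.T], [B, np.zeros((npr, npr))]])
P = np.block([[F, B.T], [np.zeros((npr, nv)), -S]])
M = A @ np.linalg.inv(P)
eigs = np.linalg.eigvals(M)
```

The (1,2) block of the preconditioned matrix vanishes exactly, and the eigenvalues cluster at 1 up to roundoff; practical preconditioners replace the solves with F and S by cheap approximations, at the price of spreading this cluster.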
5.1 Deterministic operator
We review the techniques used for approximating the Schur complement in the deterministic case . The approximations are based on the fact that a commutator of the convection-diffusion operator with the divergence operator
is small under certain assumptions about smoothness and boundary conditions. The subscript means the operators are defined on the pressure space. For a discrete convection-diffusion operator (which is part of the mean problem we discuss later), as defined in Eq. 3.7, an approximation to the Schur complement is identified from a discrete analogue of Eq. 5.4,
where the subscript means the corresponding matrices constructed on the discrete pressure space. Equation 5.5 leads to an approximation to the Schur complement matrix,
The pressure convection-diffusion (PCD) preconditioner is constructed by replacing the mass matrices with approximations containing only their diagonal entries (denoted by a subscript ) in Eq. 5.6,
The least-squares commutator (LSC) preconditioner avoids the construction of matrices on the pressure space, with the approximation to ,