- (1) ttpy (https://github.com/oseledets/ttpy) library contains Python implementation of several important procedures for working with tensors in TT-format.
The Boltzmann kinetic equation (BKE) is the main mathematical model of the theory of rarefied gases. Due to the high dimensionality of the phase space and the complexity of the collision integral, the numerical solution of the BKE is much more complicated and computationally expensive than the numerical solution of macroscopic equations, such as the Navier-Stokes equations of the compressible gas Petrov:2018a.
There are several simplified collision models, which allow to simplify the Boltzmann equation, while preserving a number of it’s important properties. The simplest is the BGK model betahatnagar1954model. A more accurate approximation is given by the Shakhov model (S-model) Shakhov1968 and its extention to the diatomic gases by Rykov Rykov1975. Comparisons with calculations using the exact Boltzmann equation, the direct simulation Monte Carlo method, and with experimental data have confirmed good accuracy of the S-model, see e.g. Sharipov:1998a; Titarev:2012d; Frolova:2018b; titarev2018application; Titarev2020 and references therein.
In model kinetic equations the calculation of the collision integral requires only the knowledge of a certain number of macroparameters, or moments of the distribution function, i.e. 3-dimensional integrals over the velocity space. Despite this simplification, their numerical solution is still quite computationally demanding task, especially for three-dimensional applications. One of the approaches to reduce the computational cost and memory requirements of numerical methods for model kinetic equations is to use adaptive mesh in the velocity spaceArslanbekov:2013a; Baranger:2014; guo:sparse; titarev2016openmp; titarev2018application
. It should be noted that the use of an adaptive unstructured meshes significantly complicates the algorithm of the numerical method and often requires some a-priori information about the problem being solved. The simplest algorithm is constructed with the use of structured Cartesian meshes in the velocity space. In this case, values of the distribution function at all nodes of the mesh form a multidimensional array, which will be hereafter called ”tensor”. Therefore, the natural way to speed up the method and reduce the amount of required memory is to use low-rank tensor approximations, which are well-known in the field of linear algebra. This is justified by theoretical estimates showing that for tensors, generated by the values of smooth functions, such approximations always existTyrtyshnikov2004423; Tyrtyshnikov2004Sbornik.
There are many studies on this subject. In Khoromskij20071291 a special tensor format is proposed for approximation of tensors that arise from calculation of the exact collision integral on a tensor mesh. In Dolgov2014268 tensor approximations were successfully applied to the numerical method for the Vlasov equation with the BGK model for the collision integral. The memory consumption was reduced 17 times as compared with the standard numerical method on the same meshes. Another version of the numerical method for the Vlasov equation is described in Kormann2015B613. It is noted that the use of tensor decompositions reduces the memory by more than 100 times. A recent paper Boelens2018519
describes the general idea of using tensor decomposition in the numerical method for partial differential equations of a certain type and presents the results of test calculations of simple problems for the Boltzmann equation with the BGK model collision integral.
In the cited papers tensor decompositions are applied to tensors formed by values of distribution function on structured tensor mesh in both physical and velocity space. Such tensors have dimension 4 or 6 depending on the dimensionality of problem. For such dimensions low-rank approximations are especially effective. However, this approach is applicable only to problems with simple boundary conditions and simple geometry so that one can use a structured mesh in physical space, while in many applications computational domain has complex shape. For such problems with complex shape one has to use an unstructured mesh in physical space (for example, tetrahedral, or multi-block structured). In this regard, it is more convenient to approximate tensor formed by the distribution function values only on the velocity mesh at each point of physical space.
In this paper an analogue of the discrete velocity method is proposed, in which the tensors formed by the values of the distribution function on the velocity mesh at various spatial points are approximated using Tensor-Train format. Examples of test calculations are presented, which show that the proposed approach allows to reduce the computer memory consumption 30-50 times while maintaining satisfactory accuracy; CPU time increases only mildly.
2 Mathematical model
In general, the Boltzmann equation of a monatomic gas with a model collision integral has the following form:
where – value of distribution function,
– velocity vector,– vector of macroparameters, which are expressed through the moments of the distribution function:
In the Shakhov model Shakhov1968 the collision integral is given by
Here – dynamic viscosity, – Prandtl number, – locally Maxwell (equilibrium) distribution function, – gas constant.
At the boundaries of the computational domain in physical space it is necessary to specify distribution function values for molecules whose velocity vector is directed inside the domain. On the surface of the body, the boundary condition of diffuse reflection with full thermal accommodation to the surface temperature is used. The distribution function of reflected molecules is written as:
The density of reflected molecules is found from the impermeability condition:
where is the projection of the velocity onto the normal to the surface, directed outside the computational domain, is distribution function of molecules coming to the wall.
For symmetry planes the following boundary condition is set:
where – outward looking unit normal vector for plane.
For the free stream condition the distribution function is equal to the Maxwell distribution function for prescribed values of free-stream macroparameters.
3 Discrete velocity method
In this paper, we use a variant of the discrete velocity method described in titarev2010jvm, titarev2016openmp, Titarev201417. For brevity, we explain the main idea using an explicit first-order method, although implicit scheme of arbitrary approximation order can be used.
We introduce a uniform Cartesian mesh in the velocity space:
The integrals in the velocity space are replaced by the 2nd order quadrature formula:
The values of the distribution function at the nodes of the velocity mesh form a three-dimensional tensor, which is denoted by
Writing the kinetic equation at each node of the velocity mesh, we obtain a system of linear constant-coefficient equations with a source term. This system can be written in the tensor form:
where ”” denotes the component-wise (Hadamard) product of tensors, are tensors formed by the values of the velocity component at each node of the velocity mesh, .
A standard finite-volume method is used to discretize the left-hand side of the resulting system. The computational domain in physical space is divided into finite volumes (polyhedrons) , . System (9) is integrated over , the volume integral is replaced by the sum of surface integrals over the cell faces from the fluxes projected onto the normal to the face. Thus we obtain a semi-discrete scheme of the following form:
Here is the tensor formed by the values of the integral averages of the distribution function over cell , is the outer normal vector of the th face of the cell with index , is the face of the cell. The final form of the method depends on the flux approximation and time-marching scheme.
For brevity we consider first order method: distribution function is assumed to be piece-wise constant, the numerical flux is given by an exact or approximate solution of the one-dimensional Riemann problem along normal vector at each face center. In the case of the exact solution (CIR scheme), the expression for flux is the following:
here are the values to the left and right of the face with respect to the normal. If the cell is adjacent to the boundary, one of these values is set based on the boundary condition.
It should be noted, that in (11) instead of some estimate can be used. This may be interpreted as Riemann solver of the Rusanov type.
Using the explicit Euler method to solve the ODE system, we obtain the fully discrete one-step method:
Let us introduce additional notation for a brief description of the computational algorithm. We denote – the number of all faces in the mesh. We assume that the normal to the face is given on each face. Let be the number equal to if the normal to the face is external with respect to the cell , and otherwise. The procedure for calculating distribution function in each cell on the next time layer is listed in algorithm 1.
Pseudo-code of the function for computing the model collision integral is given in algorithm 2. The function sum calculates the sum of all elements of the tensor, the symbol denotes the tensor consisting of ones: , the function maxwell returns the tensor formed by values of Maxwell function for given macroparameters on the velocity mesh.
The main observation that can be made from listed algorithms is that one step of the numerical method requires only a few simple operations with tensors, namely:
component-wise sum of two tensors
component-wise product of two tensors
sum of all elements in a tensor, or, in the case of nonuniform Cartesian mesh in velocity space, convolution of the following form:
where are 1D vectors consisting of weights of a quadrature rule
It follows from this observation that if there is some parametric representation of tensors, storage of all tensor elements can be avoided.
The same applies for many implicit methods. In our code we implemented a version of LU-SGS method. This method is very effective, since it’s computational cost is only about 50% larger then one of the explicit method. For brevity we do not list all formulas, details of the implementation in the context of kinetic solvers can be found in Titarev:2012c; titarev2016openmp; Chikitkin2018503.
In the next section we briefly formulate general idea of tensor decompositions and describe Tensor-Train decomposition used in our work.
4 Tensor decompositions
Tensor decompositions extend the idea of separation of variables to multidimensional arrays. In the two-dimensional case, for any matrix of rank
the singular value decomposition (SVD) exists:
The Eckart-Young theorem states that the best approximation of the rank to the matrix in the 2-norm and the Frobenius norm is obtained by dropping the terms in SVD of , which correspond to the smallest singular numbers. The low-rank approximation allows one to reduce the required storage memory to , where is the size of the matrix (for the case of a square matrix) and reduce complexity of matrix-vector operations.
A direct generalization of the form (14) and of the definition of the rank of the tensor in the multidimensional case is the canonical decomposition (CANDECOMP, PARAFAC) DeLathauwer20001324.
where is called tensor rank.
It’s use in numerical methods is limited due to the lack of stable algorithms. Nevertheless, there are theoretical estimates, which show, that tensors formed by values of smooth function on Cartesian meshes can be approximated with high accuracy by low-rank tensor Tyrtyshnikov2004Sbornik.
In the three-dimensional case, the Tucker decomposition is often used tucker1963:
This representation allows to employ robust SVD based procedures for fast linear algebra operation for tensors in this format.
Obviously, Tucker decomposition does not allow to circumvent the “curse of dimensionality”, sinceelements are needed to store the core for dimension . However, in many problems the ranks are very small.
There are two formats for tensors of arbitrary dimension , which generalize idea of Tucker format: hierarchical-Tucker (HT) format Grasedyck2009 and Tensor-Train (TT) format Oseledets2011. Both formats are based on a dimensionality reduction tree and use the SVD of auxiliary matrices for a low-rank approximation of an arbitrary tensor.
In the current paper we use the TT format, and the ttpy library (https://github.com/oseledets/ttpy), in which all the basic operations with tensors in this format are implemented.
In the TT format a tensor is represented as:
are called TT-cores. Two cores - the first and the last - are matrices whereas all the rest are 3D tensors. The numbers are called TT- ranks.
A shorter form is given as a product of matrices:
Here is a row vector, is a column vector, all the others are matrices.
Storage of the TT tensor requires memory, therefore, for small ranks, significant memory savings are obtained compared to storage for the full tensor.
The following operations with TT tensors are important for applying TT decomposition in the discrete velocity method:
Computation of tensor in TT format with minimum TT ranks, which approximates the full tensor with a given relative accuracy:
– Frobenius norm. Algorithm requires operation, if .
Component-wise sum: if tensors and of the same size are represented in TT format, i.e.
then has TT-representation with cores:
Element-wise sum does not require any calculations while the TT ranks of the sum are equal to the sum of the TT ranks of the and .
The element-wise (Hadamard) product of two tensors is represented in TT format with cores:
where – Kronecker product of matrices.
Element-wise multiplication requires operations; the ranks of the product are equal to the product of the ranks of the factors.
Algorithm for tensor rounding in TT-format, i.e. for tensor in the TT format with ranks one can find tensor with lower ranks such that
The algorithm consists of a sequence of SVD and decompositions of auxiliary unfolding matrices and has complexity
Computation of convolution ():
All the listed basic procedures allow to rewrite the algorithm of discrete velocity method as a sequence of operations with tensors in TT-format. Element-wise operations are replaced by their TT-analogues, besides, intermediate rounding is added to prevent the growth of TT-ranks.
It should be noted that TT format is redundant for , because TT-cores are still 3D tensors. For the case the Tucker expansion may be more efficient. Nevertheless, the modification of the algorithm will be essentially the same for any tensor format, but with the TT format one can easily switch to higher dimension, for example, for problems of state-to-state kinetics, where the distribution function depends also on the energy level numbers. Besides, in low-dimensional problems it is possible to apply artificial increase of dimension, which often gives an additional gain dolgov2013QTT. For these reasons, the TT format was chosen in this work.
The next section describes the details of the adaptation of the algorithm.
5 Tensorized discrete velocity method
In the tensorized version of the method all low-rank arrays are constructed immediately in the TT form. Since the Maxwell distribution function is the product of 3 1D functions, we can construct the TT tensor with ranks 1 with corresponding TT-cores (projections of 1D functions onto 1D mesh).
Since the tensors also have rank 1, most of the tensors arising in the calculation of the collision integral have small ranks. For example, the tensor has TT ranks , regardless of the size of the velocity mesh.
The tensor on each face has a TT rank of (actually, ranks are at most ), because it is the sum of three rank-1 tensors:
The only bottleneck is the tensor or the tensor , in which the negative values in the tensor is replaced by 0. These tensors can not be approximated with high accuracy by tensors with small ranks since in general case the normal vector does not coincide with one of the coordinate axes and tensor is a projection of non-smooth function on the mesh.
Nevertheless, in the formula for the face flux (11) one can replace the tensor with some estimate. This can be interpreted as replacing the exactn numerical flux with a Rusanov-type flux. In our numerical experiments we used TT approximations of with ranks 4 for all faces.
The figure 1 shows a comparison between the cross sections at of the exact tensor and its low-rank approximation. It can be seen that the estimate mimics well the exact function. Tests have shown that for first- and second-order schemes this approximation does not significantly affect the accuracy of the computed solution, but for higher-order schemes, a better approximation may be needed.
After all operations which may lead to a large increase in TT ranks, TT rounding was added with prescribed relative error . It should be noted that when applying a specific tensor format, it is necessary to take into account the computational complexity of each element-wise operation and rounding, not only the asymptotic growth rate, but also the constants included in the estimates. For example, in the method under consideration, it makes no sense to insert rounding after each operation, which leads to an increase in ranks: it is more optimal to do rounding after several operations. In addition, it makes sense to reorder some operations, since it is more preferable to avoid Hadamard multiplication of two tensors with large ranks while element-wise sums for the same ranks are relatively cheap.
The key modification in the tensorized version of the implicit LU-SGS method is the simplified element-wise division by the following tensor in each cell:
where is face area. There is no algorithm for exact component-wise division in TT-format. One way to compute the result in TT-format is to use some cross-approximation technique Oseledets201070, which computes small number of elements of the resulting tensor and constructs low-rank approximation based on these values. We adopt a simpler approach: since tensor is used in a preconditioner we can use any estimate providing that . Therefore it is convenient to use 1-rank approximation of the form ( – 1d vectors), since exact element-wise division can be computed in operations for any TT-tensor:
Such operation can be easily implemented using NumPy package broadcasting ability.
For comparison between two methods both standard discrete velocity method and it’s tensorized version have been implemented in Python language. We use Python 2.7 since ttpy library is based on this Python version.
Program consists of three main Python modules:
read_starcd.py – an auxiliary module for reading an unstructured mesh in StarCD format. It contains class Mesh, constructor of this class takes path to the folder with mesh files and creates an object where all information needed in numerical method is stored (cell volumes, face normals, etc.) This object is then serialized using pickle module. After that in run script mesh object is read from serialization file.
solver.py – this module contains function implementing standard first order discrete velocity method and additional routines and structures.
solver_tt.py – contains implementation of tensorized version of the discrete velocity method.
Besides, there are four scripts for two test problems: the first is 1D shock wave structure problem, and the second – flow past planar circular cylinder (see section 7).
The shock wave test can be used for the first validation and experiments, since the spatial mesh is very small and so is the computational time.
The second test demonstrates that the tensorized version of algorithm provides a significant memory reduction in real-life problems.
The spatial mesh for additional tests can be created using any appropriate software. StarCD is a widespread format so one can convert mesh from almost any format to StarCD format. We used Ansys ICEM to create mesh for our tests.
In order to solve a new problem one need to create an object of the “Problem” class (see listing 1) and pass it to solver together with object of “Mesh” class.
7 Test problem
The problem of a high-speed rarefied gas flow past a circular cylinder is considered. The setup of the problem is taken from Lofthouse:2008a. The kinetic solution by the S-model equation and the exact Boltzmann equation was compared against the DSMC solution in a number of recent papers titarev2018application; Frolova:2018b; Titarev:2018c for large free-stream Mach numbers (up to 25) and good agreement was observed. The geometry of the computational domain along with the spatial mesh is shown in the figure 2. The problem is essentially two-dimensional, but we solve it as 3D on the 3D mesh with one cell in direction. Mesh is treated as unstructured by the solver. The flow is directed along -axis, boundary condition (4) is set on the wall. At the remaining boundaries, the symmetry boundary condition is used.
The following dimensional parameters was chosen: Free stream velocity m/s, , , wall temperature , the cylinder radius m. Knudsen number calculated by the parameters of the free stream and the radius of the cylinder , free stream Mach number equals . The power law was used for viscosity:
In the tensorized method the relative accuracy was used. Uniform velocity mesh contains nodes in each direction, number of cells in the spatial mesh equals . For this test case we choose a relatively coarse space mesh so that the standard method can be run on a desktop computer. Therefore, near the surface the mesh resolution is poor (the height of the first cell is too large), but here we concentrate on comparison between two methods rather than accurate computation of the heat transfer.
The figure 3 shows the temperature distribution obtained by the standard and tensorized methods. The figure 4 shows graphs of the temperature versus the normal coordinate for the stagnation line and at an angle of 45 degrees. Temperature was chosen for comparison since it is more sensible quantity, differences for density and velocity are much smaller. The plots show that tensorized method provides good accuracy even at the shock wave front.
In the figure 5 distribution of the ratio is shown, where are the TT-ranks of the distribution function in each cell, and - number of nodes in velocity mesh in one dimension. I can be seen that TT-rounding works like adaptive mesh refinement: near the inflow, where the distribution function is almost equilibrium, ranks are very small. Near the shock wave and surface ranks automatically increases in order to provide prescribed accuracy.
It is clear from presented figures that tensorized method yields almost the same accuracy as standard discrete velocity method. Memory size for storage of distribution function in tensorized method is more than smaller than in the standard method.
Another advantage of tensorized method is that it still allows to study behaviour of distribution function itself. Figure 6 shows -slice of distribution function tensor in cell with (near stagnation line on shock front). In this area flow is strongly non-equilibrium and distribution function has two peaks. It can be seen that difference is negligible, i.e. tensorized method successfully captures the main properties of distribution function.
Despite the significant memory reduction, for this test case computational time of both methods is approximately equal. The reason is the high cost of element-wise multiplication and TT-rounding. The same situation is reported in other studies, for instance Dolgov2014268. However, for this test we use very small velocity mesh ( nodes). For larger meshes tensorized algorithm would be faster the the standard one.
8 Concluding remarks and perspectives
The Boltzmann-T solver for numerical solution of kinetic Boltzmann equation is described. The solver provides a working example of implementation of a tensorized discrete velocity method. This implementation demonstrates prospects of using tensor decompositions for significant memory reduction in practical computations with discrete velocity method on unstructured space mesh.
From our experience we draw the following conclusions, which may be useful for other researchers dealing with tensorized versions of discrete velocity method:
For the present case Tucker format seems to be more efficient than Tensor Train format, since storage reduces to instead of in case of TT format.
Problem with tensors generated by non-smooth function (such as ) can be overcome if the spherical coordinate system is used in the velocity space. In this case tensors like are low-rank. One possible drawback is that spherical coordinates lead to more complicated quadrature formulas.
In this paper we consider the most straightforward approach for algorithm modification: all basic operation are replaced to tensorized analogues. The more elegant approach is to use cross-approximation techniques like Oseledets201070. Nevertheless, in our opinion, the straightforward approach is more robust and does not require deep understanding of underlying tensor algorithms.
In all tensor formats storage and operations cost are proportional to - length of original tensor in one direction. For large artificial increase of dimensionality or so-called quantized tensor formats dolgov2013QTT can be used in order to decrease memory consumption even further.
In future we plan to implement a parallel version of our solver using mpi4py
package and space mesh decomposition. Besides, we plan to add model collision integrals for diatomic gas with internal degrees of freedom. The numerical method will be extended to higher orders, tetrahedral space meshes, and unsteady problems.
The authors would like to thank Sergey Dolgov, Maxim Rakhuba and Ivan Oseledets for their helpful recommendations regarding tensor formats.
A. Chikitkin is supported by President Grant MK-2855.2019.1.