1 Introduction
1.1 Background
In this paper, we consider the time-dependent linear system (TDLS) of the form

(1)  $\frac{\mathrm{d}u(t)}{\mathrm{d}t} = \mathcal{A}\,u(t) + f(t), \quad t \in (0, T], \qquad u(0) = u_0,$

where $u(t) \in \mathbb{V}$, $\mathbb{V}$ is $\mathbb{C}^n$ or $\mathbb{R}^n$ ($n \in \mathbb{N}^+$), $u_0 \in \mathbb{V}$ is an initial value, $\mathcal{A}: \mathbb{V} \to \mathbb{V}$ is a linear operator, and $f(t)$ is a source term.
is a linear operator. TDLS appears in many branches of science and engineering, such as dynamical systems, quantum mechanics, linear Hamiltonian dynamics, linear kinetic equations, semidiscretization of partial differential equations, differential matrix equation, etc
[18, 13, 28, 24, 19, 2]. Among them, we focus on linear differential systems and linear matrix systems. If is a differential operator on space, TDLS becomes the linear differential system(2) 
where with is a bounded and open domain. is the initial condition and is the source term. The differential matrix equation, such as the Sylvester matrix equation, could be the following form
(3) 
where for each , are large and sparse, and are full rank matrices with . The initial condition is , where .
The TDLS Eq. 1 can be discretized by appropriate approaches, such as linear multistep schemes, Runge-Kutta methods, general linear methods, block implicit methods, and boundary value methods (BVMs) [31, 17, 10, 16, 22, 8, 7]. After time discretization, TDLS Eq. 1 becomes a linear algebraic equation, $\mathcal{M} u = b$, where $\mathcal{M}$ is a block sparse matrix. This linear system is usually large, and solving it directly may be expensive. Alternatively, iteration methods become a better choice. Alternating direction implicit (ADI) methods are a class of useful iteration schemes, which were initially designed to solve partial differential equations [25, 15]. Afterward, the idea of ADI methods has been extended to further branches, such as numerical algebra and optimization [29, 23].
Among these ADI methods, a large class of Hermitian and skew-Hermitian splitting (HSS) type methods have been developed to solve linear systems [3, 4, 30, 5]. More recently, a general ADI (GADI) framework has been presented that puts most existing ADI methods into a unified framework [21]. In this paper, we are concerned with the development of GADI methods for TDLS.

From the theoretical viewpoint, the convergence of existing GADI methods requires that one splitting matrix is positive definite and the other is positive semidefinite [21]. Similarly, the convergence of some HSS methods requires that the system matrix is positive definite [3, 4]. For the TDLS, these assumptions may be difficult to meet. Therefore, new GADI methods with more reasonable convergence assumptions need to be designed.
Further, as splitting schemes, ADI methods require splitting the matrix into different parts with splitting parameters, and their convergence and efficiency are very sensitive to these parameters. Therefore, choosing the optimal splitting parameters is critical. A common approach is to use theoretical analysis to estimate bounds on the splitting parameters in a case-by-case way [3, 11]. In practice, however, this approach has limitations for large-scale systems, and its efficiency heavily depends on the theoretical bound. Recently, Ref. [21] presented a data-driven approach, the Gaussian process regression (GPR) method with an appropriately chosen kernel, to predict relatively optimal splitting parameters. GPR has many advantages, such as being easy to implement, high precision, and high generalization capability. The GPR method can efficiently predict one splitting parameter at a time [21]. However, there may be two or more splitting parameters in GADI methods, and these parameters may have complicated connections. Independently predicting each splitting parameter would inevitably affect the accuracy of the original GPR method. Therefore, the GPR method needs to be improved so that it can predict multiple parameters simultaneously. Another critical component that determines the GPR method's effectiveness is the kernel function [33, 35]. The original GPR method chooses the kernel function according to the problem's properties [21], which might produce artificial error. Meanwhile, a kernel chosen for one kind of problem may be difficult to extend to other problems. Therefore, how to automatically learn kernel functions from problems is still a challenge in predicting splitting parameters.
1.2 Our work
In this work, we focus on developing new and efficient GADI methods to solve TDLS Eq. 1. Our contributions are summarized as follows:

- We prove that the GADI-KP method converges to the unique solution when all eigenvalues of one splitting matrix have positive real parts and those of the other have nonnegative real parts; positive definiteness is not required.

- We present a multi-task kernel learning GPR method to predict multiple splitting parameters simultaneously and to learn the kernel function automatically.

- Based on the GADI-KP method, we provide an effective preconditioner to accelerate Krylov subspace methods in solving TDLS Eq. 1. The spectrum of the preconditioned matrix is tightly clustered.

- We apply our developed methods to some standard TDLSs, including two-dimensional (2D) diffusion and convection-diffusion equations, and the differential Sylvester matrix equation. Numerical results demonstrate the efficiency of our methods.
1.3 Organization
The rest of this paper is organized as follows. In Section 2, we describe the discretization of TDLS by BVMs in detail and give two concrete applications: linear differential systems and linear matrix systems. In Section 3, we propose the GADI-KP method for TDLS and prove its convergence; we also present a preconditioning strategy based on the GADI-KP method to speed up Krylov subspace methods. Additionally, to improve the GPR method, we propose the multi-task kernel learning GPR method. In Section 4, we show the efficiency and superiority of our methods by solid numerical experiments. In Section 5, we summarize the whole paper and discuss future work.
2 Discretization of TDLS
In this section, we describe the discretization of TDLS Eq. 1 by BVMs and apply it to linear differential systems and linear matrix systems.
2.1 Notation
Throughout the paper, the sets of $m \times n$ complex and real matrices are denoted by $\mathbb{C}^{m \times n}$ and $\mathbb{R}^{m \times n}$, respectively. If $A \in \mathbb{C}^{n \times n}$, let $A^{T}$ and $A^{-1}$ denote the transpose and inverse of $A$; $\sigma(A)$ and $\rho(A)$ denote the spectral set and spectral radius of $A$. $I_n$ is an $n$th-order identity matrix. $\|\cdot\|_2$ represents the 2-norm of a vector. Let $A = (a_{ij}) \in \mathbb{C}^{m \times n}$ and $B \in \mathbb{C}^{p \times q}$; the Kronecker product of $A$ and $B$ is defined by

$A \otimes B = \begin{pmatrix} a_{11} B & \cdots & a_{1n} B \\ \vdots & & \vdots \\ a_{m1} B & \cdots & a_{mn} B \end{pmatrix} \in \mathbb{C}^{mp \times nq}.$

We recall some properties of Kronecker products. More context about the Kronecker product can be found in [20].
Lemma 1
Let $A \in \mathbb{C}^{m \times n}$, $B \in \mathbb{C}^{p \times q}$, and let $C$, $D$, $X$ be matrices of compatible sizes; then
(1) $(A \otimes B)^{T} = A^{T} \otimes B^{T}$;
(2) $(A \otimes B)(C \otimes D) = (AC) \otimes (BD)$;
(3) $(A \otimes B)^{-1} = A^{-1} \otimes B^{-1}$, provided $A$ and $B$ are invertible;
(4) $\mathrm{vec}(AXB) = (B^{T} \otimes A)\,\mathrm{vec}(X)$;
(5) if $A$ and $B$ are square with eigenvalues $\lambda_i$ and $\mu_j$, then the eigenvalues of $A \otimes B$ are the products $\lambda_i \mu_j$.
Here, the ‘vec’ operator transforms matrices into vectors by stacking columns.
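These identities are easy to check numerically. The following minimal sketch (illustrative only, not code from the paper) verifies properties (1)-(5) with NumPy:

```python
# Numerical check of the Kronecker product identities in Lemma 1.
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((3, 3))
B = rng.standard_normal((4, 4))
C = rng.standard_normal((3, 3))
D = rng.standard_normal((4, 4))
X = rng.standard_normal((3, 4))

vec = lambda M: M.flatten(order="F")          # stack columns, as in Lemma 1

# (1) transpose distributes over the Kronecker product
assert np.allclose(np.kron(A, B).T, np.kron(A.T, B.T))
# (2) mixed-product property
assert np.allclose(np.kron(A, B) @ np.kron(C, D), np.kron(A @ C, B @ D))
# (3) inverse distributes (A, B invertible with probability one here)
assert np.allclose(np.linalg.inv(np.kron(A, B)),
                   np.kron(np.linalg.inv(A), np.linalg.inv(B)))
# (4) vec(A X B) = (B^T kron A) vec(X)
assert np.allclose(vec(A @ X @ B), np.kron(B.T, A) @ vec(X))
# (5) eigenvalues of A kron B are the pairwise products
eig_AB = np.sort_complex(np.linalg.eigvals(np.kron(A, B)))
prods = np.sort_complex(np.outer(np.linalg.eigvals(A), np.linalg.eigvals(B)).ravel())
assert np.allclose(eig_AB, prods)
print("all Kronecker identities verified")
```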
2.2 Temporal discretization
We give a brief description of BVMs [8, 9]. Consider the uniform grid $t_j = j\tau$, $j = 0, 1, \ldots, s$, with step size $\tau = T/s$. For $k = k_1 + k_2$ ($k_1, k_2 \ge 0$), the $k$-step BVM is

(4)  $\sum_{i=-k_1}^{k_2} \alpha_{i+k_1} u_{j+i} = \tau \sum_{i=-k_1}^{k_2} \beta_{i+k_1} g_{j+i}, \qquad j = k_1, \ldots, s - k_2,$

where $\tau$ is the step size, $u_j \approx u(t_j)$, $g_j = \mathcal{A} u_j + f(t_j)$, and $\alpha_i$, $\beta_i$ are parameters. The extra initial and final equations are

(5a)  $\sum_{i=0}^{k} \alpha_i^{(j)} u_i = \tau \sum_{i=0}^{k} \beta_i^{(j)} g_i, \qquad j = 1, \ldots, k_1 - 1,$

(5b)  $\sum_{i=0}^{k} \alpha_{k-i}^{(j)} u_{s-i} = \tau \sum_{i=0}^{k} \beta_{k-i}^{(j)} g_{s-i}, \qquad j = s - k_2 + 1, \ldots, s,$

where the coefficients $\alpha_i^{(j)}$ and $\beta_i^{(j)}$ are chosen such that the truncation errors over all nodes are consistent.
By applying the BVM, the TDLS Eq. 1 is discretized as

(6)  $\mathcal{M} \mathbf{u} = \mathbf{b}, \qquad \mathcal{M} = A \otimes I_n - \tau B \otimes \mathcal{A},$

where $\mathbf{u} = (u_0^{T}, u_1^{T}, \ldots, u_s^{T})^{T}$, $\mathbf{b}$ collects the initial condition and the source terms, and $A, B \in \mathbb{R}^{(s+1) \times (s+1)}$ are banded matrices whose rows contain the BVM coefficients $\alpha_i$, $\beta_i$ of Eq. 4 together with the coefficients $\alpha_i^{(j)}$, $\beta_i^{(j)}$ of the initial and final equations Eq. 5.
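To make the structure of Eq. 6 concrete, the following sketch assembles $\mathcal{M}$ with SciPy. For illustration only, it uses the trapezoidal rule (the simplest one-step member of this family, $u_{j+1} - u_j = \frac{\tau}{2}(g_{j+1} + g_j)$) and a 1D finite-difference Laplacian as the operator $\mathcal{A}$; the coefficient matrices of an actual $k$-step BVM would simply replace $A$ and $B$ below.

```python
import numpy as np
import scipy.sparse as sp

s, n, T = 64, 50, 1.0                     # time steps, spatial dof, final time
tau = T / s

A = sp.lil_matrix((s + 1, s + 1))         # "alpha" (difference) matrix
B = sp.lil_matrix((s + 1, s + 1))         # "beta" (weight) matrix
A[0, 0] = 1.0                             # first row imposes the initial value
for j in range(1, s + 1):                 # trapezoidal-rule rows
    A[j, j - 1], A[j, j] = -1.0, 1.0
    B[j, j - 1], B[j, j] = 0.5, 0.5

h = 1.0 / (n + 1)                         # 1D Laplacian standing in for the operator in Eq. (1)
Lap = sp.diags([1.0, -2.0, 1.0], [-1, 0, 1], shape=(n, n)) / h**2

Msys = sp.kron(A.tocsr(), sp.eye(n)) - tau * sp.kron(B.tocsr(), Lap)
print(Msys.shape, Msys.nnz)               # ((s+1)n, (s+1)n) block sparse matrix
```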
2.3 Applications
In this work, we consider two types of TDLSs: the linear differential system Eq. 2 and the linear matrix system Eq. 3.
2.3.1 Linear differential system
By appropriate spatial discretization, such as finite difference and finite element methods, we can obtain the ordinary differential equation (ODE) system

(7)  $M \frac{\mathrm{d} y(t)}{\mathrm{d} t} = K y(t) + f(t), \quad t \in (0, T], \qquad y(0) = y_0,$

where $y(t) \in \mathbb{R}^{m}$ contains the approximate values of $u(x, t)$ over the spatial grid nodes, $m$ is the number of degrees of freedom of the spatial discretization, $y_0$ and $f(t)$ are the notations corresponding to $u_0(x)$ and $f(x, t)$, and $M, K \in \mathbb{R}^{m \times m}$ are the mass and stiffness matrices, respectively. Applying the BVM to Eq. 7, we arrive at the linear system

(8)  $\mathcal{M} y = b, \qquad \mathcal{M} = A \otimes M - \tau B \otimes K,$

where $y = (y_0^{T}, y_1^{T}, \ldots, y_s^{T})^{T} \in \mathbb{R}^{(s+1)m}$, $b$ collects the contributions of the initial condition and the source term, and $A$, $B$ are the BVM coefficient matrices of Eq. 6.
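For the 2D diffusion problems used later in Section 4, the matrices of Eq. 7 are easy to write down. The sketch below (an illustration assuming five-point finite differences, in which case the mass matrix is the identity) builds $K$ as a Kronecker sum:

```python
# Stiffness matrix K of Eq. (7) for the 2D diffusion operator on an
# N x N interior grid, assembled as a Kronecker sum; M = I for finite
# differences.
import scipy.sparse as sp

def diffusion_2d(N: int, nu: float = 1.0):
    h = 1.0 / (N + 1)
    T1 = sp.diags([1.0, -2.0, 1.0], [-1, 0, 1], shape=(N, N)) / h**2
    I = sp.eye(N)
    K = nu * (sp.kron(I, T1) + sp.kron(T1, I))   # Kronecker-sum Laplacian
    M = sp.eye(N * N)                            # mass matrix for FD
    return M.tocsr(), K.tocsr()

M2, K2 = diffusion_2d(32)
print(M2.shape, K2.nnz)
```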
2.3.2 Linear matrix system
Using the ‘vec’ operator and Lemma 1(4), the linear matrix system Eq. 3 can be rewritten in the TDLS form Eq. 1; applying the BVM then yields an all-at-once linear system with the same Kronecker product structure as Eq. 8.
3 Our methods
In this section, we propose the GADI-KP method, a new splitting iteration method, to solve TDLS Eq. 1. Compared with existing ADI methods, this method converges to the unique solution under a mild condition. More significantly, to improve the GPR method in multi-parameter prediction, we introduce the multi-task kernel learning GPR method. We also present a preconditioning strategy based on the GADI-KP method to accelerate Krylov subspace methods.
3.1 GADI-KP method
In this subsection, we propose the GADI-KP method and give a fast algorithm for it by using the properties of the Kronecker product. Write the coefficient matrix of Eq. 8 as $\mathcal{M} = P + Q$ with $P = A \otimes M$ and $Q = -\tau B \otimes K$. The generalized Kronecker product splitting (GKPS) method [11, 12] is

(10a)  $(\alpha I + P)\, y^{(k+1/2)} = (\alpha I - Q)\, y^{(k)} + b,$

(10b)  $(\beta I + Q)\, y^{(k+1)} = (\beta I - P)\, y^{(k+1/2)} + b.$
Inspired by the GADI framework [21], we add a viscosity term $\frac{\omega(\alpha+\beta)}{2}\big(y^{(k)} - y^{(k+1/2)}\big)$ to the right side of Eq. 10b. Then we obtain the GADI-KP scheme

(11)  $\begin{cases} (\alpha I + P)\, y^{(k+1/2)} = (\alpha I - Q)\, y^{(k)} + b, \\ (\beta I + Q)\, y^{(k+1)} = (\beta I - P)\, y^{(k+1/2)} + b + \frac{\omega(\alpha+\beta)}{2}\big(y^{(k)} - y^{(k+1/2)}\big), \end{cases}$

where $k = 0, 1, 2, \ldots$, the splitting parameters $\alpha, \beta > 0$, and $\omega \in [0, 2)$. Notice that, when $\omega = 0$, the GADI-KP method reduces to the GKPS scheme [12]; when $\omega = 0$ and $\alpha = \beta$, the GADI-KP method becomes the Kronecker product splitting (KPS) approach [11].
Eliminating the intermediate vector $y^{(k+1/2)}$, we can write the GADI-KP method in a fixed point form

$y^{(k+1)} = T(\alpha, \beta, \omega)\, y^{(k)} + G(\alpha, \beta, \omega)^{-1} b,$

where the iteration matrix of the GADI-KP method is

(12)  $T(\alpha, \beta, \omega) = (\beta I + Q)^{-1} \Big[ \big(\beta I - P - \tfrac{\omega(\alpha+\beta)}{2} I\big) (\alpha I + P)^{-1} (\alpha I - Q) + \tfrac{\omega(\alpha+\beta)}{2} I \Big]$

and

(13)  $G(\alpha, \beta, \omega) = \frac{2}{(2-\omega)(\alpha+\beta)} (\alpha I + P)(\beta I + Q).$
It is easy to see that there is a unique splitting

$\mathcal{M} = G(\alpha, \beta, \omega) - H(\alpha, \beta, \omega),$

where

$H(\alpha, \beta, \omega) = G(\alpha, \beta, \omega) - \mathcal{M}, \qquad T(\alpha, \beta, \omega) = G(\alpha, \beta, \omega)^{-1} H(\alpha, \beta, \omega).$

Notice that

(14)  $T(\alpha, \beta, \omega) = I - G(\alpha, \beta, \omega)^{-1} \mathcal{M}.$

Therefore, the GADI-KP method Eq. 11 reduces to

$y^{(k+1)} = y^{(k)} + G(\alpha, \beta, \omega)^{-1} \big(b - \mathcal{M} y^{(k)}\big).$
Using the properties of the Kronecker product, we can obtain a fast implementation of the GADI-KP method (see Algorithm 1).
Remark
For each iteration of the GADI-KP method, the computational complexity of a direct solver applied to the full system is $O\big(((s+1)m)^3\big)$. Based on Algorithm 1, which works on the Kronecker factors separately, the computational complexity per iteration is reduced to roughly cubic in $s$ and $m$ individually rather than in their product.
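Algorithm 1 is not reproduced here; as a substitute, the following sketch implements one full GADI-KP sweep of Eq. 11 naively, factorizing the two shifted matrices once with sparse LU and reusing them across iterations. It deliberately ignores the Kronecker structure that Algorithm 1 exploits, and all names are ours:

```python
# One GADI-KP iteration loop per Eq. (11).  P and Q are the sparse splitting
# factors (P = A kron M, Q = -tau * B kron K).
import numpy as np
import scipy.sparse as sp
import scipy.sparse.linalg as spla

def gadi_kp(P, Q, b, alpha, beta, omega, tol=1e-6, maxit=500):
    n = b.shape[0]
    I = sp.eye(n, format="csc")
    Msys = (P + Q).tocsc()                    # coefficient matrix of Eq. (8)
    F1 = spla.splu((alpha * I + P).tocsc())   # factor for the first half-step
    F2 = spla.splu((beta * I + Q).tocsc())    # factor for the second half-step
    c = 0.5 * omega * (alpha + beta)          # viscosity coefficient
    y = np.zeros(n)
    r0 = np.linalg.norm(b - Msys @ y)
    for k in range(maxit):
        y_half = F1.solve((alpha * I - Q) @ y + b)
        y = F2.solve((beta * I - P) @ y_half + b + c * (y - y_half))
        if np.linalg.norm(b - Msys @ y) < tol * r0:
            break
    return y, k + 1
```

Combined with the assembly sketches of Section 2, this gives a complete toy solver for experimentation.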
3.2 Convergence analysis of GADI-KP
In this subsection, we discuss the convergence of the GADI-KP method.
Theorem
Assume that all eigenvalues of $P = A \otimes M$ have positive real parts and all eigenvalues of $Q = -\tau B \otimes K$ have nonnegative real parts. Then, for any $\alpha > 0$, $\beta > 0$, and $\omega \in [0, 2)$, the GADI-KP method Eq. 11 converges to the unique solution of the linear system Eq. 8. The spectral radius of the iteration matrix of the GADI-KP method satisfies

$\rho\big(T(\alpha, \beta, \omega)\big) \le \delta(\alpha, \beta, \omega) < 1,$

where

(15)  $\delta(\alpha, \beta, \omega) = \max_{\lambda \in \sigma(P),\, \mu \in \sigma(Q)} \left| 1 - \frac{(2-\omega)(\alpha+\beta)(\lambda+\mu)}{2(\alpha+\lambda)(\beta+\mu)} \right|.$
Proof
Let $\lambda_i$ and $\mu_j$ be the eigenvalues of the matrices $P$ and $Q$, respectively. Then, by the property of the Kronecker product, the eigenvalues of the iteration matrix Eq. 12 have the following form:

$\theta_{ij} = 1 - \frac{(2-\omega)(\alpha+\beta)(\lambda_i + \mu_j)}{2(\alpha+\lambda_i)(\beta+\mu_j)}.$

Let $\lambda_i = a_i + \mathrm{i}\, b_i$ and $\mu_j = c_j + \mathrm{i}\, d_j$, then

$|\theta_{ij}| = \left| \frac{2(\alpha+\lambda_i)(\beta+\mu_j) - (2-\omega)(\alpha+\beta)(\lambda_i+\mu_j)}{2(\alpha+\lambda_i)(\beta+\mu_j)} \right| < 1,$

which is equivalent to

$\big| 2(\alpha+\lambda_i)(\beta+\mu_j) - (2-\omega)(\alpha+\beta)(\lambda_i+\mu_j) \big|^{2} < \big| 2(\alpha+\lambda_i)(\beta+\mu_j) \big|^{2},$

where $a_i > 0$ and $c_j \ge 0$ by assumption. Expanding both sides and using $\alpha, \beta > 0$ and $\omega \in [0, 2)$ verifies this inequality, so $\rho(T(\alpha, \beta, \omega)) \le \delta(\alpha, \beta, \omega) < 1$ and the GADI-KP method converges.
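The bound in Eq. 15 can be checked numerically. The toy sketch below uses diagonal $P$ and $Q$ with the assumed spectra (so that the eigenvalue formula in the proof is exact) and compares $\rho(T)$ against $\delta(\alpha, \beta, \omega)$; all sizes and parameters are arbitrary choices for illustration:

```python
# Sanity check of the spectral radius bound in Eq. (15).
import numpy as np

rng = np.random.default_rng(1)
m = 30
P = np.diag(rng.uniform(0.5, 3.0, m))      # spectrum with positive real parts
Q = np.diag(rng.uniform(0.0, 2.0, m))      # spectrum with nonnegative real parts
Mmat = P + Q                               # plays the role of the matrix in Eq. (8)
alpha, beta, omega = 1.0, 1.0, 0.5

# G from Eq. (13) and T from Eq. (14)
G = 2.0 / ((2 - omega) * (alpha + beta)) * (alpha * np.eye(m) + P) @ (beta * np.eye(m) + Q)
T = np.eye(m) - np.linalg.solve(G, Mmat)

# delta from Eq. (15), evaluated over all eigenvalue pairs
lam, mu = np.diag(P), np.diag(Q)
theta = 1 - (2 - omega) * (alpha + beta) * (lam[:, None] + mu[None, :]) \
        / (2 * (alpha + lam[:, None]) * (beta + mu[None, :]))
rho = np.max(np.abs(np.linalg.eigvals(T)))
print(f"rho(T) = {rho:.4f} <= delta = {np.max(np.abs(theta)):.4f} < 1")
```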
3.3 Krylov subspace acceleration with GADI-KP preconditioner
In this subsection, we give a preconditioning strategy based on the GADI-KP method to accelerate Krylov subspace methods.
The GADI-KP method can be used as a preconditioner to accelerate Krylov subspace methods such as GMRES [27]. From Eq. 14, the linear system Eq. 8 is equivalent to

(17)  $G(\alpha, \beta, \omega)^{-1} \mathcal{M}\, y = G(\alpha, \beta, \omega)^{-1} b,$

where $G(\alpha, \beta, \omega)$ is given by Eq. 13 and serves as a preconditioner. This equivalent system can be solved by GMRES. Algorithm 2 gives the solving process for Eq. 8 with the matrix $G(\alpha, \beta, \omega)$ as a preconditioner in GMRES. Notice that using the GADI-KP preconditioner within GMRES requires solving a linear system with $G(\alpha, \beta, \omega)$ at each iteration. We can use Algorithm 1 to solve this preconditioned linear system economically.
Assume that all the eigenvalues of $P$ have positive real parts, all the eigenvalues of $Q$ have nonnegative real parts, $\alpha, \beta > 0$, and $\omega \in [0, 2)$. Since $\lambda\big(G(\alpha, \beta, \omega)^{-1}\mathcal{M}\big) = 1 - \lambda\big(T(\alpha, \beta, \omega)\big)$ by Eq. 14, we can naturally conclude that all the eigenvalues of the preconditioned matrix are located in a circle centered at $1$ with radius $\delta(\alpha, \beta, \omega) < 1$. Compared with the KPS and GKPS preconditioners (see Figure 1), the GADI-KP preconditioner can further reduce the condition number of the preconditioned matrix, and the eigenvalue distribution of the preconditioned system has a tighter bound.
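In SciPy terms, the preconditioner can be wired into GMRES through a LinearOperator whose matvec applies $G^{-1}$ via the two sparse factor solves; this minimal sketch mirrors the role of Algorithm 2 (which is not reproduced here), and the function names are ours:

```python
# GMRES with the GADI-KP preconditioner G of Eq. (13).  Note that
# G^{-1} = 0.5*(2-omega)*(alpha+beta) * (beta I + Q)^{-1} (alpha I + P)^{-1}.
import numpy as np
import scipy.sparse as sp
import scipy.sparse.linalg as spla

def gmres_gadi_kp(P, Q, b, alpha, beta, omega, tol=1e-6):
    n = b.shape[0]
    I = sp.eye(n, format="csc")
    Msys = (P + Q).tocsc()
    F1 = spla.splu((alpha * I + P).tocsc())
    F2 = spla.splu((beta * I + Q).tocsc())
    scale = 0.5 * (2 - omega) * (alpha + beta)

    def apply_G_inv(r):
        return scale * F2.solve(F1.solve(r))

    G_inv = spla.LinearOperator((n, n), matvec=apply_G_inv)
    y, info = spla.gmres(Msys, b, M=G_inv, rtol=tol)  # use tol= on older SciPy
    return y, info
```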
3.4 Multi-task kernel learning GPR
In this subsection, we propose a multi-task kernel learning GPR method, which improves the GPR method in two respects: simultaneous prediction of multiple splitting parameters and data-driven kernel selection.
3.4.1 Multi-task GPR
GADI-type methods have two or more splitting parameters. As we know, the splitting parameters heavily affect the efficiency of GADI approaches. We now propose a practical multi-task GPR method to simultaneously predict these splitting parameters. In the multi-task setting, we are interested in learning $q$ related functions $f_t$, $t = 1, \ldots, q$, from training data $\{(x_i^t, y_i^t)\}_{i=1}^{n}$, with $x_i^t \in \mathbb{R}^{d}$ and $y_i^t \in \mathbb{R}$. Consider the following noised model to avoid singularity in numerical computation:

(18)  $y_i^t = f_t(x_i^t) + \epsilon_t, \qquad i = 1, \ldots, n, \quad t = 1, \ldots, q,$

where $y_i^t$ ($x_i^t$) denotes the $i$th output (input) of the $t$th task, and $\epsilon_t \sim \mathcal{N}(0, \sigma_t^2)$ is the white noise of the $t$th task. Let $\mathbf{y} = (y_1^1, \ldots, y_n^1, \ldots, y_1^q, \ldots, y_n^q)^{T}$ be the output vector, and let $\mathbf{f}$ be the corresponding vector of latent function values at the inputs. The multi-task regression problem can be presented as a Gaussian process prior over the latent function,

$\mathbf{f} \sim \mathcal{N}\big(\mathbf{0},\, K^{f} \otimes K^{x}\big),$

where $K^{f} \in \mathbb{R}^{q \times q}$ and $K^{x} \in \mathbb{R}^{n \times n}$ are the task and data covariance matrices, respectively [6]. The noised model Eq. 18 becomes

$\mathbf{y} \sim \mathcal{N}\big(\mathbf{0},\, \Sigma\big), \qquad \Sigma = K^{f} \otimes K^{x} + D \otimes I_n,$

where $D \in \mathbb{R}^{q \times q}$ is a diagonal matrix with $D_{tt} = \sigma_t^2$. The prediction distribution for the $t$th task on a new point $x_*$ has mean

$\bar{f}_t(x_*) = \big(k_t^{f} \otimes k_*^{x}\big)^{T} \Sigma^{-1} \mathbf{y},$

where $(K^{f})_{tt}$ and $k_t^{f}$ are the $t$th diagonal element and the $t$th column of $K^{f}$, respectively, $k_*^{x}$ is a covariances vector between the test point $x_*$ and the training points, and $K^{x}$ denotes the covariance matrix of all training points. Hyperparameters $\theta^{f}$ and $\theta^{x}$ appear in the task and data covariance functions, respectively. To estimate the hyperparameters, we apply the L-BFGS method to maximize the marginal likelihood function in logarithmic form,

$\log p(\mathbf{y} \mid X, \theta) = -\tfrac{1}{2}\, \mathbf{y}^{T} \Sigma^{-1} \mathbf{y} - \tfrac{1}{2} \log |\Sigma| - \tfrac{nq}{2} \log 2\pi.$

3.4.2 Kernel learning approach
Choosing an appropriate kernel function determines whether the GPR method succeeds. The original GPR method directly selects the kernel function from the features of the problem [21]. This might produce artificial error, and the chosen kernel can hardly be extended to other problems. For example, Fig. 3 shows that the GPR method with a manually chosen kernel function can produce a bad regression curve. How to choose kernel functions automatically is worth investigating. Here, we present a data-driven method to learn the kernel function. Given a kernel library $\{k_1, \ldots, k_L\}$ that contains basic kernel functions and their multiplicative combinations, for the $t$th ($t = 1, \ldots, q$) training task, the required kernel function is a linear combination of the library elements,

$k^{(t)}(x, x') = \sum_{l=1}^{L} w_{tl}\, k_l(x, x').$
For the multi-task training, the weight matrix is

$W = (w_{tl}) \in \mathbb{R}^{q \times L}.$

All weights can be obtained by training on the data from the concrete TDLS. A similar idea can also be found in pattern discovery [34, 32].
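As an illustration of this weight-learning step, the following minimal single-task sketch (with a three-element library and names of our own choosing) fits the weights by maximizing the log marginal likelihood with L-BFGS:

```python
# Fitting library weights by maximizing the log marginal likelihood.
import numpy as np
from scipy.optimize import minimize

def k_lin(X, Z):
    return X @ Z.T

def k_gau(X, Z, ell=1.0):
    d2 = ((X[:, None, :] - Z[None, :, :]) ** 2).sum(-1)
    return np.exp(-0.5 * d2 / ell**2)

library = [k_lin, k_gau, lambda X, Z: k_lin(X, Z) * k_gau(X, Z)]

def neg_log_marginal(log_w, X, y, noise=1e-6):
    w = np.exp(log_w)                        # keep weights positive
    K = sum(wl * kl(X, X) for wl, kl in zip(w, library))
    K += noise * np.eye(len(y))
    L = np.linalg.cholesky(K)
    a = np.linalg.solve(L.T, np.linalg.solve(L, y))
    return 0.5 * y @ a + np.log(np.diag(L)).sum() + 0.5 * len(y) * np.log(2 * np.pi)

X = np.linspace(0, 1, 40)[:, None]
y = np.sin(6 * X[:, 0]) + 0.3 * X[:, 0]      # synthetic training data
res = minimize(neg_log_marginal, np.zeros(len(library)), args=(X, y),
               method="L-BFGS-B")
print("learned library weights:", np.exp(res.x))
```

In the multi-task case, the same objective is used with $\Sigma = K^f \otimes K^x + D \otimes I_n$ in place of $K$.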
Remark
In GADI methods, the training data come from smaller systems [21]. It is a small data set, and learning a kernel function with a (deep) neural network from such small data is difficult. Therefore, we predetermine the kernel library via a priori knowledge instead of directly learning the kernel function from data.
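Putting Section 3.4.1 together, the following toy sketch (synthetic data and a fixed task covariance; in practice $K^f$, the kernel weights, and the noise levels are all learned by L-BFGS as described above) performs the multi-task prediction for two related functions:

```python
# Toy multi-task GPR prediction (intrinsic coregionalization, cf. [6]):
# Sigma = Kf kron Kx + D kron I, and the mean for task t at x* is
# (kf_t kron kx*)^T Sigma^{-1} y.
import numpy as np

def k_gau(a, b, ell=0.3):
    return np.exp(-0.5 * (a[:, None] - b[None, :]) ** 2 / ell**2)

rng = np.random.default_rng(2)
n, q = 25, 2                                    # points per task, tasks
X = np.sort(rng.uniform(0, 1, n))
y = np.concatenate([np.sin(5 * X), np.sin(5 * X + 0.4)])   # task-major stack

Kx = k_gau(X, X)                                # data covariance
Kf = np.array([[1.0, 0.9], [0.9, 1.0]])         # task covariance (fixed here)
D = 1e-2 * np.eye(q)                            # per-task noise variances
Sigma = np.kron(Kf, Kx) + np.kron(D, np.eye(n))

x_star = 0.5
kx_star = k_gau(X, np.array([x_star]))[:, 0]
alpha_vec = np.linalg.solve(Sigma, y)
for t in range(q):
    mean_t = np.kron(Kf[t], kx_star) @ alpha_vec
    print(f"task {t}: predicted {mean_t:.3f}, truth {np.sin(5*x_star + 0.4*t):.3f}")
```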
4 Numerical experiments
In this section, we present three numerical examples, 2D diffusion and convection-diffusion equations and the differential Sylvester matrix equation, to show the performance of our proposed GADI-KP and GMRES with GADI-KP preconditioner (GMRES-GADI-KP) methods. As a comparison, we also give the experimental data of the KPS, GKPS, GMRES, and GMRES with GKPS preconditioner (GMRES-GKPS) methods. All computations are carried out using PyCharm 2020.2 on a Mac laptop with a 2.3 GHz quad-core Intel Core i5. All tests are started with the zero vector. All iteration methods are terminated once the relative residual satisfies $\|r^{(k)}\|_2 / \|r^{(0)}\|_2 < \mathrm{tol}$, where $r^{(k)} = b - \mathcal{M} y^{(k)}$ is the $k$th step residual.
In the multi-task kernel learning GPR method, the kernel library contains linear, Gaussian, and periodic kernels, and their multiplicative combinations. Concretely, the three basic kernel functions are

Linear kernel:  $k_{\mathrm{Lin}}(x, x') = \sigma_0^2\, x^{T} x'$;

Gaussian kernel:  $k_{\mathrm{Gau}}(x, x') = \exp\!\Big(-\frac{\|x - x'\|_2^2}{2\ell^2}\Big)$;

Periodic kernel:  $k_{\mathrm{Per}}(x, x') = \exp\!\Big(-\frac{2 \sin^{2}\big(\pi \|x - x'\|_2 / p\big)}{\ell^{2}}\Big)$.
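For completeness, here is one way to materialize this six-element library (the three basic kernels and their three pairwise products) in code; the hyperparameters $\sigma_0$, $\ell$, and $p$ are placeholders to be tuned as in Section 3.4:

```python
# Building the kernel library: three basic kernels plus their pairwise
# multiplicative combinations.
import numpy as np
from itertools import combinations

def linear(x, z, s0=1.0):
    return s0**2 * np.outer(x, z)

def gaussian(x, z, ell=1.0):
    return np.exp(-np.subtract.outer(x, z) ** 2 / (2 * ell**2))

def periodic(x, z, ell=1.0, p=1.0):
    r = np.abs(np.subtract.outer(x, z))
    return np.exp(-2 * np.sin(np.pi * r / p) ** 2 / ell**2)

basic = {"lin": linear, "gau": gaussian, "per": periodic}
library = dict(basic)
for (n1, f1), (n2, f2) in combinations(basic.items(), 2):
    library[f"{n1}*{n2}"] = lambda x, z, f1=f1, f2=f2: f1(x, z) * f2(x, z)

print(sorted(library))   # 6 candidates: lin, gau, per, and the 3 products
```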