Physics-informed neural networks (PINNs) are increasingly used to solve a wide range of forward and inverse problems involving partial differential equations (PDEs), including fluid mechanics (Raissi et al., 2020), materials modeling (Liu and Wang, 2019), and safety verification (Bansal and Tomlin, 2021) and control (Onken et al., 2021) of autonomous systems. Although PINNs have successfully learned complex systems with the simple multilayer perceptron (MLP) architecture, large neural networks are often required to achieve high expressive power, which significantly increases the memory and computing cost of training a PINN. Furthermore, in practice a PINN often has to be retrained many times, whenever the problem setting (e.g., boundary conditions, measurement data, or safety specifications) changes.
It is therefore increasingly important to enable PINN training on resource-constrained edge devices. On one hand, safety-aware learning-based verification and control (Bansal and Tomlin, 2021; Onken et al., 2021) often require the PINN to be trained on a tiny embedded processor of an autonomous agent. On the other hand, emerging digital twins and smart manufacturing need AI-assisted design with IP protection (Stevens et al., 2020), where federated learning over many edge devices allows users to train shared AI models without disclosing their private data. In both cases, training has to be done on edge devices with very limited memory, computing, and energy budgets.
This paper proposes TT-PINN, an end-to-end tensor-compressed method for training PINNs. By combining a Tensor-Train (TT) compressed model representation with a physics-informed network that approximates the solutions of PDEs, the method achieves large reductions in parameters and memory during training. We use it to solve a Helmholtz equation and compare it with standard PINNs. With only thousands of parameters, our models significantly outperform standard PINNs of similar or larger sizes.
2 Background: PINN
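We consider the problem of solving a PDE
$$
\begin{aligned}
&\frac{\partial u}{\partial t}(\mathbf{x},t) + \mathcal{N}[u](\mathbf{x},t) = 0, && \mathbf{x}\in\Omega,\ t\in[0,T],\\
&u(\mathbf{x},0) = h(\mathbf{x}), && \mathbf{x}\in\Omega,\\
&u(\mathbf{x},t) = g(\mathbf{x},t), && \mathbf{x}\in\partial\Omega,\ t\in[0,T],
\end{aligned}
\tag{1}
$$
where $\mathbf{x}$ and $t$ are the spatial and temporal coordinates, respectively; $\Omega$ and $\partial\Omega$ denote the computational domain and its boundary; $\mathcal{N}[\cdot]$ is a general linear or nonlinear operator; and $u(\mathbf{x},t)$ is the solution of the above PDE with the initial condition $h(\mathbf{x})$ and the boundary condition $g(\mathbf{x},t)$. In PINNs (Raissi et al., 2019), a neural network approximation $u_\theta(\mathbf{x},t)$ parameterized by $\theta$ is substituted into the PDE (1), yielding the residual
$$
r_\theta(\mathbf{x},t) = \frac{\partial u_\theta}{\partial t}(\mathbf{x},t) + \mathcal{N}[u_\theta](\mathbf{x},t). \tag{2}
$$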
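We train the parameters $\theta$ by minimizing the loss function
$$
\mathcal{L}(\theta) = \mathcal{L}_r(\theta) + \mathcal{L}_b(\theta) + \mathcal{L}_0(\theta), \tag{3}
$$
where
$$
\begin{aligned}
\mathcal{L}_r(\theta) &= \frac{1}{N_r}\sum_{i=1}^{N_r}\left|r_\theta(\mathbf{x}_r^i, t_r^i)\right|^2,\\
\mathcal{L}_b(\theta) &= \frac{1}{N_b}\sum_{i=1}^{N_b}\left|u_\theta(\mathbf{x}_b^i, t_b^i) - g(\mathbf{x}_b^i, t_b^i)\right|^2,\\
\mathcal{L}_0(\theta) &= \frac{1}{N_0}\sum_{i=1}^{N_0}\left|u_\theta(\mathbf{x}_0^i, 0) - h(\mathbf{x}_0^i)\right|^2
\end{aligned}
\tag{4}
$$
penalize the residual of the PDE, the boundary conditions, and the initial conditions, respectively, and $N_r$, $N_b$, and $N_0$ are the numbers of data points for the corresponding loss terms.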
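The composite loss above can be made concrete with a small NumPy sketch. It uses a toy problem that is not from this paper, the 1D advection equation $u_t + c\,u_x = 0$ (so $\mathcal{N}[u] = c\,u_x$), and central finite differences as a stand-in for the automatic differentiation a real PINN would use. The helper names `residual` and `pinn_loss` and all point counts are hypothetical.

```python
import numpy as np

C = 1.0  # advection speed of the toy problem (hypothetical)

def h(x):                       # initial condition u(x, 0)
    return np.sin(2 * np.pi * x)

def g(x, t):                    # boundary values; also the exact solution
    return np.sin(2 * np.pi * (x - C * t))

def residual(u, x, t, eps=1e-4):
    """r(x, t) = u_t + N[u] with N[u] = C * u_x; central differences replace autodiff."""
    u_t = (u(x, t + eps) - u(x, t - eps)) / (2 * eps)
    u_x = (u(x + eps, t) - u(x - eps, t)) / (2 * eps)
    return u_t + C * u_x

def pinn_loss(u, x_r, t_r, x_b, t_b, x_0):
    """L = L_r + L_b + L_0, each term a mean of squared errors over its own points."""
    loss_r = np.mean(residual(u, x_r, t_r) ** 2)
    loss_b = np.mean((u(x_b, t_b) - g(x_b, t_b)) ** 2)
    loss_0 = np.mean((u(x_0, 0.0) - h(x_0)) ** 2)
    return loss_r + loss_b + loss_0

rng = np.random.default_rng(0)
x_r, t_r = rng.random(256), rng.random(256)    # N_r = 256 interior collocation points
x_b = rng.integers(0, 2, 64).astype(float)     # N_b = 64 boundary points at x = 0 or x = 1
t_b = rng.random(64)
x_0 = rng.random(64)                           # N_0 = 64 initial-condition points

exact = g                                      # the true solution of the toy problem
wrong = lambda x, t: np.sin(2 * np.pi * x) * np.exp(-t)   # correct at t = 0 only
print(pinn_loss(exact, x_r, t_r, x_b, t_b, x_0))   # near zero
print(pinn_loss(wrong, x_r, t_r, x_b, t_b, x_0))   # clearly positive
```

The exact solution drives all three terms to (numerically) zero, while a candidate that only matches the initial condition is penalized through the residual and boundary terms.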
3 The TT-PINN Method
3.1 TT-PINN Architecture
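In this work, we consider tensor-compressed training of a PINN based on a multilayer perceptron (MLP) network. A standard MLP uses an $L$-layer cascaded function
$$
\mathbf{y}^{(l)} = \sigma\left(\mathbf{W}^{(l)}\mathbf{y}^{(l-1)} + \mathbf{b}^{(l)}\right),\quad l = 1,\ldots,L-1,\qquad \mathbf{y}^{(L)} = \mathbf{W}^{(L)}\mathbf{y}^{(L-1)} + \mathbf{b}^{(L)}, \tag{5}
$$
with $\mathbf{y}^{(0)} = (\mathbf{x}, t)$ and $u_\theta(\mathbf{x},t) = \mathbf{y}^{(L)}$, to approximate the solution, where $\sigma$ is the activation function and $\theta = \{\mathbf{W}^{(l)}, \mathbf{b}^{(l)}\}_{l=1}^{L}$. The weight matrices $\mathbf{W}^{(l)}$ can consume a large amount of memory, making training unaffordable on edge devices. This challenge becomes more severe when the PDE operator involves highly inhomogeneous material properties or strongly scattered waves, because large neural networks are then needed to obtain high expressive power.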
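A minimal NumPy sketch of the cascaded MLP forward pass and its parameter count follows. The tanh activation, the Xavier-normal initializer, and the layer widths (2 inputs, three hidden layers of 256 neurons, 1 output) are illustrative assumptions. The point is that the dense hidden-to-hidden matrices dominate both the memory footprint and the cost of each matrix-vector product.

```python
import numpy as np

def init_mlp(widths, rng):
    """Xavier (Glorot) normal initialization; widths = [n_in, hidden..., n_out]."""
    params = []
    for n_in, n_out in zip(widths[:-1], widths[1:]):
        std = np.sqrt(2.0 / (n_in + n_out))
        params.append((rng.normal(0.0, std, size=(n_out, n_in)), np.zeros(n_out)))
    return params

def mlp_forward(params, x):
    """y^(l) = tanh(W^(l) y^(l-1) + b^(l)) for hidden layers; the last layer is linear."""
    y = x
    for W, b in params[:-1]:
        y = np.tanh(W @ y + b)      # dense mat-vec: O(M*N) memory and flops per layer
    W, b = params[-1]
    return W @ y + b

def count_params(widths):
    """Weights plus biases of every layer."""
    return sum(n_in * n_out + n_out for n_in, n_out in zip(widths[:-1], widths[1:]))

rng = np.random.default_rng(0)
widths = [2, 256, 256, 256, 1]     # hypothetical: input (x, y), 3 hidden layers, scalar output
params = init_mlp(widths, rng)
u = mlp_forward(params, np.array([0.1, -0.3]))
print(u.shape, count_params(widths))   # (1,) 132609
```

In this configuration the two $256\times256$ hidden matrices hold 131,072 of the 132,609 parameters, which is exactly the memory that TT compression targets.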
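As shown in Figure 1, TT-PINN replaces the weight matrix of an MLP layer with a series of TT-cores during training. For simplicity, we drop the layer index and let $\mathbf{W} \in \mathbb{R}^{M\times N}$ denote a generic weight matrix in an MLP layer. We factorize its dimension sizes as $M = \prod_{i=1}^{d} m_i$ and $N = \prod_{i=1}^{d} n_i$, fold $\mathbf{W}$ into a $2d$-way tensor $\mathcal{W} \in \mathbb{R}^{m_1\times\cdots\times m_d\times n_1\times\cdots\times n_d}$, and approximate $\mathcal{W}$ with the TT-decomposition (Oseledets, 2011):
$$
\mathcal{W}(i_1,\ldots,i_{2d}) = \mathbf{G}_1[i_1]\,\mathbf{G}_2[i_2]\cdots\mathbf{G}_{2d}[i_{2d}]. \tag{6}
$$
Here $\mathbf{G}_k[i_k] \in \mathbb{R}^{r_{k-1}\times r_k}$ is the $i_k$-th slice of the TT-core $\mathcal{G}_k \in \mathbb{R}^{r_{k-1}\times s_k\times r_k}$ obtained by fixing its 2nd index to $i_k$, where $s_k = m_k$ for $k \le d$ and $s_k = n_{k-d}$ for $k > d$. The vector $(r_0, r_1, \ldots, r_{2d})$ is called the TT-rank, with the constraint $r_0 = r_{2d} = 1$. This TT representation reduces the number of unknown variables in a weight matrix from $MN = \prod_{i=1}^{d} m_i n_i$ to $\sum_{k=1}^{2d} r_{k-1} s_k r_k$, so the compression ratio can be controlled by the TT-ranks. Recent approaches can learn suitable TT-ranks automatically during training via a Bayesian formulation (Hawkins and Zhang, 2021; Hawkins et al., 2022).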
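The following NumPy sketch builds random TT-cores for a hypothetical $256\times256$ weight matrix folded as $(4\cdot4\cdot4\cdot4)\times(4\cdot4\cdot4\cdot4)$, with all internal TT-ranks set to 8 (both choices are assumptions for illustration). Each core has shape $(r_{k-1}, s_k, r_k)$, where $s_k$ is the size of the $k$-th folded dimension. `tt_entry` evaluates one entry as a product of core slices, `tt_to_matrix` rebuilds the full matrix only to check the result, and `tt_num_params` counts $\sum_k r_{k-1} s_k r_k$; all three names are hypothetical.

```python
import numpy as np

def random_tt_cores(ms, ns, ranks, rng):
    """2d TT-cores G_k with shape (r_{k-1}, s_k, r_k), where s = (m_1..m_d, n_1..n_d)."""
    sizes = list(ms) + list(ns)
    assert len(ranks) == len(sizes) + 1 and ranks[0] == ranks[-1] == 1
    return [rng.normal(size=(ranks[k], s, ranks[k + 1])) for k, s in enumerate(sizes)]

def tt_entry(cores, idx):
    """W(i_1, ..., i_2d) = G_1[i_1] G_2[i_2] ... G_2d[i_2d], slicing each core's 2nd index."""
    prod = np.eye(1)
    for core, i in zip(cores, idx):
        prod = prod @ core[:, i, :]
    return prod[0, 0]

def tt_to_matrix(cores, ms, ns):
    """Rebuild the full M x N matrix (for checking only; training never needs it)."""
    full = cores[0]
    for core in cores[1:]:
        full = np.tensordot(full, core, axes=([-1], [0]))   # sum over the shared rank index
    return full.reshape(int(np.prod(ms)), int(np.prod(ns)))

def tt_num_params(ms, ns, ranks):
    """sum_k r_{k-1} * s_k * r_k, versus M * N for the dense matrix."""
    sizes = list(ms) + list(ns)
    return sum(ranks[k] * s * ranks[k + 1] for k, s in enumerate(sizes))

rng = np.random.default_rng(0)
ms, ns = (4, 4, 4, 4), (4, 4, 4, 4)        # hypothetical: 256 x 256 layer
ranks = [1, 8, 8, 8, 8, 8, 8, 8, 1]        # hypothetical TT-ranks
cores = random_tt_cores(ms, ns, ranks, rng)
W = tt_to_matrix(cores, ms, ns)
idx = (1, 2, 3, 0, 3, 1, 0, 2)             # row indices i_1..i_4, then column indices
row = np.ravel_multi_index(idx[:4], ms)
col = np.ravel_multi_index(idx[4:], ns)
print(W.shape, np.isclose(W[row, col], tt_entry(cores, idx)), tt_num_params(ms, ns, ranks))
# (256, 256) True 1600
```

Here the TT-cores hold 1,600 numbers instead of 65,536, a compression ratio of about 41 for this layer.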
Most existing work on TT layers (Novikov et al., 2015) uses a Tensor-Train-Matrix (TTM) decomposition, in which the weight matrix is represented by $d$ 4-way TT-cores instead of the $2d$ 3-way TT-cores described above. We adopt TT instead of TTM because the TT format allows easy tensor-network contraction (Section 3.2), which can greatly reduce the memory and computational cost in both forward and backward propagation.
3.2 Forward & Backward Propagation of TT-PINN
Unlike techniques that compress an already trained model for inference, TT-PINNs are trained directly in the compressed format: the TT-cores that approximate a weight matrix are used directly in forward propagation and updated directly in backward propagation.
Memory-Efficient Forward Propagation.
As shown in (5), the main cost of a forward pass is computing matrix-vector products such as $\mathbf{W}\mathbf{y}$. Instead of reconstructing $\mathbf{W}$ from its TT-cores, we obtain the result directly from the low-rank TT-cores. Specifically, let $\mathcal{Y} \in \mathbb{R}^{n_1\times\cdots\times n_d}$ be the folding of $\mathbf{y}$ into a $d$-way tensor; TT-PINN then computes a series of tensor-network contractions between $\mathcal{Y}$ and the TT-cores, as shown in Fig. 2.
We use tensor-network notation (Orús, 2014; Cichocki, 2014) to describe the computation. A generic $k$-way tensor is represented by a circle with $k$ edges; an edge shared by two tensors denotes a product (i.e., a contraction) along that dimension. Before the computation starts, the TT-cores are connected neither to each other nor to the tensor $\mathcal{Y}$. The process has three steps. First, $\mathcal{Y}$ contracts with the last TT-core $\mathcal{G}_{2d}$, as shown by the red dashed rectangle in Fig. 2(a), producing an intermediate tensor $\mathcal{Y}_1 \in \mathbb{R}^{n_1\times\cdots\times n_{d-1}\times r_{2d-1}}$, in which the size of the $d$-th dimension changes from $n_d$ to $r_{2d-1}$ and all other dimensions remain unchanged. Second, the remaining red TT-cores are contracted in sequence, from $\mathcal{G}_{2d-1}$ to $\mathcal{G}_{d+1}$. Fig. 2(b) shows the contraction between the first intermediate tensor $\mathcal{Y}_1$ and $\mathcal{G}_{2d-1}$ along two dimensions, producing a $(d-1)$-way tensor $\mathcal{Y}_2$. Similarly, at each step the $k$-th intermediate tensor $\mathcal{Y}_k$ contracts with the TT-core $\mathcal{G}_{2d-k}$, and the resulting tensor has one dimension fewer. After $\mathcal{Y}_{d-1}$ contracts with the last red TT-core $\mathcal{G}_{d+1}$, the resulting tensor $\mathcal{Y}_d$ has only one dimension, of size $r_d$, as shown in Parts (c) and (d) of Fig. 2. Finally, we contract $\mathcal{Y}_d$ with $\mathcal{G}_d$ and attach the remaining TT-cores by contracting sequentially with $\mathcal{G}_{d-1}$, $\mathcal{G}_{d-2}$, and so on down to $\mathcal{G}_1$, obtaining the final result as a vector of size $M$.
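The three contraction steps described above can be sketched in NumPy as follows. The core layout (shape $(r_{k-1}, s_k, r_k)$, with the $d$ row-factor cores first), the row-major folding of $\mathbf{y}$, the $256\times256$ example size, and the helper names are assumptions. The sketch checks the result against an explicitly reconstructed $\mathbf{W}$ and records the largest intermediate tensor, which is what bounds the activation memory of this product.

```python
import numpy as np

def random_tt_cores(ms, ns, rank, rng):
    """Hypothetical setup: 2d cores of shape (r_{k-1}, s_k, r_k), s = (m_1..m_d, n_1..n_d)."""
    sizes = list(ms) + list(ns)
    ranks = [1] + [rank] * (len(sizes) - 1) + [1]
    return [rng.normal(size=(ranks[k], s, ranks[k + 1])) for k, s in enumerate(sizes)]

def tt_matvec(cores, y, ms, ns):
    """Return (W @ y, size of the largest intermediate tensor) without forming W."""
    d = len(ms)
    t = y.reshape(ns)[..., np.newaxis]       # fold y to n_1 x ... x n_d, append r_2d = 1
    peak = t.size
    # Steps 1-2: contract with the column cores, last one first; each removes one n-dim.
    for k in range(2 * d - 1, d - 1, -1):
        # cores[k]: (ranks[k], n_j, ranks[k+1]); t: (..., n_j, ranks[k+1]) -> (..., ranks[k])
        t = np.tensordot(t, cores[k], axes=([-2, -1], [1, 2]))
        peak = max(peak, t.size)
    # t is now a vector of length r_d. Step 3: attach the row cores, from G_d down to G_1.
    for k in range(d - 1, -1, -1):
        # cores[k]: (ranks[k], m, ranks[k+1]); t: (ranks[k+1], ...) -> (ranks[k], m, ...)
        t = np.tensordot(cores[k], t, axes=([2], [0]))
        peak = max(peak, t.size)
    return t.reshape(-1), peak               # r_0 = 1, so this has M = m_1 ... m_d entries

def tt_to_dense(cores, ms, ns):
    """Reference only: rebuild W (M x N) by chaining the cores over their rank indices."""
    full = cores[0]
    for core in cores[1:]:
        full = np.tensordot(full, core, axes=([-1], [0]))
    return full.reshape(int(np.prod(ms)), int(np.prod(ns)))

rng = np.random.default_rng(0)
ms, ns = (4, 4, 4, 4), (4, 4, 4, 4)           # hypothetical 256 x 256 layer
cores = random_tt_cores(ms, ns, 8, rng)
y = rng.normal(size=256)
out, peak = tt_matvec(cores, y, ms, ns)
W = tt_to_dense(cores, ms, ns)
print(np.allclose(out, W @ y), peak, W.size)  # True 512 65536
```

In this example no intermediate tensor exceeds 512 entries, while the dense matrix has 65,536; the contraction order (column cores first) is what keeps every intermediate small.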
After forward propagation, TT-PINN evaluates the customized loss function in the same way as a standard PINN, and backward propagation then applies automatic differentiation (AD) (Baydin et al., 2018). During the forward pass, AD records each computation step and intermediate object in a computational graph, which it then uses to compute the gradient of the loss with respect to each object through the chain rule. We therefore obtain the gradient with respect to each TT-core and update the TT-cores directly with stochastic gradient-based optimization.
Throughout forward and backward propagation in TT-PINN, all computations are performed on the compressed parameters (i.e., the TT-cores) instead of full-size weight matrices. This end-to-end compressed training framework can therefore greatly reduce the memory cost of training.
4 Experiments and Results
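In this section, we present a series of numerical studies that assess the performance of the proposed TT-PINN against a standard MLP-based PINN. Specifically, we consider a two-dimensional Helmholtz PDE:
$$
\Delta u(x,y) + k^2 u(x,y) = q(x,y),\quad (x,y)\in\Omega, \qquad u(x,y) = g(x,y),\quad (x,y)\in\partial\Omega, \tag{7}
$$
where $\Delta$ is the Laplace operator, $k$ is the wave number, $q$ is a source term, and $g$ is the Dirichlet boundary data. The exact solution of this problem is available in closed form, and the source term $q$ is the one obtained by substituting that exact solution into (7).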
The PINN approximation to (7) can be constructed by parametrizing its solution with a deep neural network $\mathrm{NN}(x,y;\theta)$ as
$$
u_\theta(x,y) = g(x,y) + \ell(x,y)\,\mathrm{NN}(x,y;\theta), \tag{8}
$$
where $\ell$ vanishes on $\partial\Omega$. This transformation makes the network satisfy the Dirichlet boundary condition exactly (Lu et al., 2021), so the parameters $\theta$ can be identified by minimizing only the total PDE residual at collocation points randomly placed inside the domain $\Omega$.
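To show how an exact solution, its source term, and a hard-constrained parametrization fit together, here is a NumPy sketch built on hypothetical choices that are not taken from this paper: $\Omega = [-1,1]^2$, wave number $k = 2$, the manufactured solution $u = \sin(\pi x)\sin(\pi y)$ (which vanishes on $\partial\Omega$, so $g = 0$), and $\ell(x,y) = (1-x^2)(1-y^2)$, which is zero on the boundary. A five-point finite-difference Laplacian stands in for automatic differentiation, and a fixed closed-form function stands in for an untrained network.

```python
import numpy as np

K = 2.0   # hypothetical wave number

def u_exact(x, y):
    """Hypothetical manufactured solution on [-1, 1]^2; it vanishes on the boundary."""
    return np.sin(np.pi * x) * np.sin(np.pi * y)

def q(x, y):
    """Source term from substituting u_exact into Δu + k^2 u = q (here Δu = -2π^2 u)."""
    return (K**2 - 2 * np.pi**2) * u_exact(x, y)

def helmholtz_residual(u, x, y, eps=1e-3):
    """Δu + k^2 u - q, with a five-point finite-difference Laplacian in place of autodiff."""
    lap = (u(x + eps, y) + u(x - eps, y) + u(x, y + eps) + u(x, y - eps) - 4 * u(x, y)) / eps**2
    return lap + K**2 * u(x, y) - q(x, y)

def hard_constrained(net):
    """u_theta = l(x, y) * net(x, y) with l = (1 - x^2)(1 - y^2), zero on the boundary (g = 0)."""
    return lambda x, y: (1 - x**2) * (1 - y**2) * net(x, y)

rng = np.random.default_rng(0)
x, y = rng.uniform(-1, 1, 200), rng.uniform(-1, 1, 200)
print(np.abs(helmholtz_residual(u_exact, x, y)).max())      # only finite-difference error remains

u_theta = hard_constrained(lambda x, y: np.cos(3 * x) + y)  # stand-in for an untrained network
edge = np.linspace(-1.0, 1.0, 5)
print(u_theta(np.ones(5), edge), u_theta(edge, -np.ones(5)))  # boundary values are all zero
```

Because the boundary values are zero for any network output, only the interior residual needs to be minimized during training.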
We use this benchmark problem to compare TT-PINNs with PINNs in terms of the total number of parameters. Specifically, we consider a set of neural networks with 3 hidden layers and control the number of parameters by varying the number of neurons per layer for PINNs and the choice of TT-ranks for TT-PINNs. In TT-PINNs, the TT-ranks are determined by the desired compression ratio for each hidden layer: after each dimension of a fully connected layer's weight matrix $\mathbf{W}$ is factorized into smaller factors, the TT-ranks are chosen so that the TT-cores achieve the target compression. It is also possible to determine the TT-ranks automatically via Bayesian tensor rank determination (Hawkins et al., 2022; Hawkins and Zhang, 2021). To ensure convergence, all models are trained for 40,000 iterations. We use the Adam optimizer (Kingma and Ba, 2014) with an initial learning rate that is decayed by a factor of 0.9 every 1,000 iterations. The networks are initialized with the Xavier scheme (Glorot and Bengio, 2010), and an activation function is applied to each neuron.
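Rank selection for a target compression ratio can be sketched as a search over a single uniform TT-rank: for a hypothetical $256\times256$ layer folded as $(4\cdot4\cdot4\cdot4)\times(4\cdot4\cdot4\cdot4)$, pick the largest rank whose compression ratio still meets the target. Using one rank for all internal bonds is a simplifying assumption, not necessarily the paper's rule. `learning_rate` shows one reading of the decay rule above (a staircase schedule) with a hypothetical initial rate; all function names are hypothetical.

```python
import math

def tt_params(sizes, r):
    """Entries in TT-cores of shape (r_{k-1}, s_k, r_k), r_0 = r_2d = 1, all other ranks r."""
    ranks = [1] + [r] * (len(sizes) - 1) + [1]
    return sum(ranks[k] * s * ranks[k + 1] for k, s in enumerate(sizes))

def max_rank_for_ratio(ms, ns, target_ratio):
    """Largest uniform rank r with (M * N) / tt_params >= target_ratio (returns 1 if none does)."""
    sizes = list(ms) + list(ns)
    dense = math.prod(ms) * math.prod(ns)
    r = 1
    while dense / tt_params(sizes, r + 1) >= target_ratio:
        r += 1
    return r

def learning_rate(step, lr0, decay=0.9, every=1000):
    """Staircase decay: lr0 is multiplied by `decay` once per `every` iterations."""
    return lr0 * decay ** (step // every)

ms = ns = (4, 4, 4, 4)
for target in (40, 100):
    r = max_rank_for_ratio(ms, ns, target)
    print(target, r, tt_params(list(ms) + list(ns), r))   # prints 40 8 1600, then 100 5 640
print(learning_rate(40_000, 1e-3))                         # hypothetical lr0 = 1e-3
```

Under this schedule the learning rate after 40,000 iterations is $0.9^{40} \approx 1.5\%$ of its initial value, whatever that initial value is.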
Tables 1 and 2 summarize our results. The expressive power of both standard PINNs and the proposed TT-PINNs grows with model size: larger models approximate the ground-truth solution more accurately. However, TT-PINNs achieve satisfactory predictions with far fewer parameters than a fully connected 3-layer PINN with 256 neurons per layer. Note that the compression ratios reported in Table 2 refer to the tensorized hidden layers, not to the whole model, because we currently tensorize only the hidden layers and leave the input and output layers uncompressed.
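To see why hidden-layer compression ratios differ from whole-model ratios, here is a small parameter-count sketch for a hypothetical TT-PINN: 2 inputs, three hidden layers of 256 neurons, one output, the two $256\times256$ hidden-to-hidden matrices stored as TT-cores with uniform rank 8, and dense biases. All of these are assumptions for illustration, not the configurations reported in Table 2.

```python
def dense_params(widths):
    """Weights plus biases of a fully connected network with the given layer widths."""
    return sum(a * b + b for a, b in zip(widths[:-1], widths[1:]))

def tt_core_params(sizes, r):
    """Entries in TT-cores of shape (r_{k-1}, s_k, r_k) with r_0 = r_2d = 1, other ranks r."""
    ranks = [1] + [r] * (len(sizes) - 1) + [1]
    return sum(ranks[k] * s * ranks[k + 1] for k, s in enumerate(sizes))

def tt_pinn_params(widths, sizes, r):
    """Input and output layers stay dense; each hidden-to-hidden weight matrix is
    replaced by TT-cores, while its bias stays dense (an assumption)."""
    total = dense_params(widths[:2]) + dense_params(widths[-2:])
    for n_out in widths[2:-1]:
        total += tt_core_params(sizes, r) + n_out
    return total

widths = [2, 256, 256, 256, 1]          # hypothetical: 3 hidden layers of 256 neurons
sizes = [4] * 8                         # 256 x 256 folded as (4*4*4*4) x (4*4*4*4)
hidden_ratio = 256 * 256 / tt_core_params(sizes, 8)
model_ratio = dense_params(widths) / tt_pinn_params(widths, sizes, 8)
print(round(hidden_ratio, 2), round(model_ratio, 2))   # 40.96 27.99
```

Because the dense input and output layers (1,025 parameters here) and the hidden biases are not compressed, the whole-model ratio (about 28) is noticeably lower than the per-layer ratio (about 41).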
Figure 3 visually compares the predictions of TT-PINNs and PINNs. The PINN with 64 neurons per layer produces the worst prediction among the four models. Meanwhile, the proposed TT-PINN with only thousands of parameters, obtained by compressing its hidden-layer weight matrices during training, achieves a significantly better prediction. In addition, another TT-PINN yields a prediction as accurate as that of a larger PINN. These results show that, by approximating a larger neural network with a low-rank structure (i.e., TT-cores), TT-PINNs can preserve, to some extent, the expressive power of a larger PINN. This can substantially reduce the hardware resources required for edge computing.
5 Conclusion and Discussions
In this paper, we have proposed an end-to-end compressed architecture for training PINNs with fewer computing resources. To the best of our knowledge, this is the first time a low-rank structure has been used to achieve memory efficiency while maintaining satisfactory performance in PINN training. This work offers a promising route toward training PINNs on edge devices.
This work, however, is still at an early stage and therefore has clear limitations. First, the PDE considered here is relatively simple and does not exhibit the stiffness issues that frequently occur in engineering problems. Second, the network sizes considered are still relatively small; the performance of TT-PINN needs to be demonstrated on larger PINNs. Finally, deploying this framework on edge computing platforms (e.g., embedded GPUs or FPGAs) requires further algorithm/hardware co-design.
This work was supported by NSF # 1817037 and NSF # 2107321.
- Bansal and Tomlin (2021). DeepReach: a deep learning approach to high-dimensional reachability. In 2021 IEEE International Conference on Robotics and Automation (ICRA), pp. 1817–1824.
- Baydin et al. (2018). Automatic differentiation in machine learning: a survey. Journal of Machine Learning Research 18, pp. 1–43.
- Cichocki (2014). Tensor networks for big data analytics and large-scale optimization problems. arXiv preprint arXiv:1407.3124.
- Glorot and Bengio (2010). Understanding the difficulty of training deep feedforward neural networks. In Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics, pp. 249–256.
- Hawkins et al. (2022). Towards compact neural networks via end-to-end training: a Bayesian tensor approach with automatic rank determination. SIAM Journal on Mathematics of Data Science 4 (1), pp. 46–71.
- Hawkins and Zhang (2021). Bayesian tensorized neural networks with automatic rank selection. Neurocomputing 453, pp. 172–180.
- Kingma and Ba (2014). Adam: a method for stochastic optimization. arXiv preprint arXiv:1412.6980.
- Liu and Wang (2019). Multi-fidelity physics-constrained neural network and its application in materials modeling. Journal of Mechanical Design 141 (12).
- Lu et al. (2021). Physics-informed neural networks with hard constraints for inverse design. SIAM Journal on Scientific Computing 43 (6), pp. B1105–B1132.
- Novikov et al. (2015). Tensorizing neural networks. Advances in Neural Information Processing Systems 28.
- Onken et al. (2021). A neural network approach applied to multi-agent optimal control. In European Control Conference (ECC), pp. 1036–1041.
- Orús (2014). A practical introduction to tensor networks: matrix product states and projected entangled pair states. Annals of Physics 349, pp. 117–158.
- Oseledets (2011). Tensor-train decomposition. SIAM Journal on Scientific Computing 33 (5), pp. 2295–2317.
- Raissi et al. (2019). Physics-informed neural networks: a deep learning framework for solving forward and inverse problems involving nonlinear partial differential equations. Journal of Computational Physics 378, pp. 686–707.
- Raissi et al. (2020). Hidden fluid mechanics: learning velocity and pressure fields from flow visualizations. Science 367 (6481), pp. 1026–1030.
- Stevens et al. (2020). AI for science. Technical report, Argonne National Lab. (ANL), Argonne, IL (United States).