Auto-PINN: Understanding and Optimizing Physics-Informed Neural Architecture

05/27/2022
by Yicheng Wang, et al.
Texas A&M University
Rice University

Physics-informed neural networks (PINNs) are revolutionizing science and engineering practice by bringing the power of deep learning to bear on scientific computation. In forward modeling problems, PINNs are meshless partial differential equation (PDE) solvers that can handle irregular, high-dimensional physical domains. Naturally, the neural architecture hyperparameters have a large impact on the efficiency and accuracy of the PINN solver. However, this remains an open and challenging problem because of the large search space and the difficulty of identifying a proper search objective for PDEs. Here, we propose Auto-PINN, the first systematic, automated hyperparameter optimization approach for PINNs, which applies Neural Architecture Search (NAS) techniques to PINN design. Auto-PINN avoids manually or exhaustively searching the hyperparameter space associated with PINNs. A comprehensive set of pre-experiments using standard PDE benchmarks allows us to probe the structure-performance relationship in PINNs. We find that the different hyperparameters can be decoupled, and that the training loss function of PINNs is a good search objective. Comparison experiments demonstrate that Auto-PINN produces neural architectures with superior stability and accuracy compared with alternative baseline methods.



1 Introduction

Physics-informed neural networks (PINNs) raissi2019physics are promising partial differential equation (PDE) solvers that integrate machine learning with physical laws. Benefiting from the strong expressive power of deep neural networks, PINNs are widely adopted to solve various real-world problems, such as fluid mechanics cai2022physics; jin2021nsfnets; sun2020surrogate, material science haghighat2021physics; zhang2022analyses; zhang2021physics and biomedical engineering sahli2020physics; kissas2020machine; liu2020generic. PINNs do not require the time-consuming construction of elaborate grids, and can therefore be applied more easily to irregular and high-dimensional domains than traditional PDE solvers.

The structures of PINNs are usually simple multilayer perceptrons (MLPs). In the training process, the physical laws of the PDE system are rewritten as loss functions, and the PINN is trained to fit the ground-truth solution under the supervision of these loss functions. As in other deep learning tasks, the neural architecture configuration of the MLP network, such as its depth, width, and activation functions, has a great effect on the performance of PINNs. However, there is little research on this problem. For instance, raissi2019physics found that increasing the width and depth of PINNs improves predictive accuracy, but their experiments were limited to a single PDE problem within a very small search space. While the Tanh activation function is the default option for PINNs, some studies al2021time; markidis2021old report that Sigmoid or Swish RamachandranZL18 functions are more effective in some cases; however, they did not reach a conclusion about which activation function is preferred across different PDE problems. Therefore, further investigation is required to understand the relationship between PINN architectures and their performance. Moreover, there are a number of important hyperparameters for training PINNs, such as the learning rate, the number of training epochs, and the choice of optimizer. Manually tuning the architecture and hyperparameters is tedious and laborious. Therefore, we are motivated to study the following research question:

Can we automate the process of architecture and hyperparameters selection to improve the performance of PINNs?

Despite the recent progress of automated hyperparameter tuning optuna_2019; bergstra2013hyperopt and neural architecture search (NAS) liu2018darts; pham2018efficient; cai2018efficient, automating the neural architecture design of PINNs remains an open and challenging problem. First, the search space includes both discrete and continuous hyperparameters and is extremely large; existing hyperparameter optimization methods usually search the whole hyperparameter space, which can be inefficient. Second, the search objective for PINNs is unclear. Unlike other tasks that can naturally use a performance metric (e.g., accuracy) as the search objective, many PDEs have no exact solutions, so error values are not available. Therefore, we have to identify an alternative search objective.

To this end, we first conduct a comprehensive set of benchmarking pre-experiments to understand the search space by studying the relationship between each hyperparameter and the performance. We make two key observations from the experiments. First, we find that some design choices play a dominant role in the performance. For example, there is often one dominant activation function that works best for each PDE. This motivates us to reduce the search space by decoupling it in a certain order. For instance, we can determine the best activation function with a small number of search trials, then fix it and focus the search on the other hyperparameters. We observe similar phenomena for other hyperparameters such as the changing point, depth, and width, which enables us to decouple them in a similar fashion. Second, we discover that the loss values are highly correlated with the errors. This makes the loss value a desirable search objective, since it can be naturally obtained during the search for all PDEs.

Based on the above observations, we propose Auto-PINN, the first automated machine learning framework to optimize the neural architecture and the hyperparameters of PINNs. Auto-PINN adopts a step-by-step decoupling strategy for search. Specifically, we search one hyperparameter at each step with the others fixed to one or a few sets of options. This strategy decreases the scale of the search space drastically. We perform extensive experiments to evaluate Auto-PINN on seven PDE benchmarks with different training data sampling methods. The quantitative comparison results show that Auto-PINN outperforms the other optimization strategies in both accuracy and stability with fewer search trials.

We summarize our main contributions as follows:


  • We conduct a comprehensive set of benchmarking pre-experiments on the hyperparameter-performance relationships of PINNs. Our observations suggest that we can significantly reduce the search space by decoupling the search of different hyperparameters. We also identify the loss value of PINNs as a desirable search objective.

  • We propose Auto-PINN, the first automated neural architecture and hyperparameter optimization approach for PINNs. The decoupling strategy can substantially decrease the search complexity.

  • We evaluate Auto-PINN on a series of PDE benchmarks and compare it with other baseline methods. The results suggest that Auto-PINN consistently finds PINN architectures with good accuracy across different PDE problems, outperforming the other search algorithms.

2 Preliminaries

2.1 Partial Differential Equations (PDEs) and Physics-Informed Neural Networks (PINNs)

We briefly review the baseline PINN algorithm proposed by raissi2019physics. Consider a general form of a partial differential equation (PDE) defined on a bounded spatio-temporal domain $\Omega \times [0, T]$:

$\mathcal{N}[u(x,t)] = f(x,t), \quad x \in \Omega,\ t \in (0,T]$ (2.1)
$\mathcal{B}[u(x,t)] = g(x,t), \quad x \in \partial\Omega,\ t \in (0,T]$ (2.2)
$u(x,0) = h(x), \quad x \in \Omega$ (2.3)

where $\mathcal{N}[\cdot]$ is a spatio-temporal differential operator, $\mathcal{B}[\cdot]$ is a boundary operator, $\partial\Omega$ is the boundary of the domain, and $f$, $g$ and $h$ specify the source, boundary condition, and initial condition, respectively. Please refer to Appendix A for a few concrete examples of PDEs.

A PINN is a deep learning approximator of the solution of a PDE. Namely, a model $\hat{u}(x,t;\theta)$ with a set of learnable parameters $\theta$, e.g., a vanilla multilayer perceptron (MLP), is utilized to approximate $u$ in the domain $\Omega \times [0,T]$. The governing equation, boundary conditions, and initial conditions are rewritten into the training loss of the neural network as follows:

$\mathcal{L}(\theta) = \mathcal{L}_{\mathrm{PDE}}(\theta) + \mathcal{L}_{\mathrm{BC}}(\theta) + \mathcal{L}_{\mathrm{IC}}(\theta)$ (2.4)
$\mathcal{L}_{\mathrm{PDE}}(\theta) = \frac{1}{N_c}\sum_{i=1}^{N_c} \big|\mathcal{N}[\hat{u}(x_c^i,t_c^i;\theta)] - f(x_c^i,t_c^i)\big|^2$ (2.5)
$\mathcal{L}_{\mathrm{BC}}(\theta) = \frac{1}{N_b}\sum_{i=1}^{N_b} \big|\mathcal{B}[\hat{u}(x_b^i,t_b^i;\theta)] - g(x_b^i,t_b^i)\big|^2$ (2.6)
$\mathcal{L}_{\mathrm{IC}}(\theta) = \frac{1}{N_0}\sum_{i=1}^{N_0} \big|\hat{u}(x_0^i,0;\theta) - h(x_0^i)\big|^2$ (2.7)

where $\mathcal{L}_{\mathrm{PDE}}$ is the PDE residual loss on the $N_c$ collocation training points $(x_c^i, t_c^i)$ sampled randomly in the domain, $\mathcal{L}_{\mathrm{BC}}$ is the boundary condition loss on the $N_b$ boundary points $(x_b^i, t_b^i)$, and $\mathcal{L}_{\mathrm{IC}}$ is the initial condition loss on the $N_0$ initial points $(x_0^i, 0)$.
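To make these loss terms concrete, here is a minimal PyTorch sketch of the composite loss. It is our illustration, not the paper's code: the operator is instantiated as a 1D diffusion residual $u_t - u_{xx}$ with a Dirichlet boundary term, which is not tied to any specific benchmark here.

```python
import torch

def pinn_loss(model, x_c, t_c, x_b, t_b, g_b, x_0, h_0):
    """Composite PINN loss (2.4)-(2.7) for an illustrative residual u_t - u_xx."""
    # PDE residual loss on collocation points, Eq. (2.5)
    x_c = x_c.requires_grad_(True)
    t_c = t_c.requires_grad_(True)
    u = model(torch.cat([x_c, t_c], dim=1))
    ones = torch.ones_like(u)
    u_t = torch.autograd.grad(u, t_c, ones, create_graph=True)[0]
    u_x = torch.autograd.grad(u, x_c, ones, create_graph=True)[0]
    u_xx = torch.autograd.grad(u_x, x_c, torch.ones_like(u_x), create_graph=True)[0]
    loss_pde = ((u_t - u_xx) ** 2).mean()
    # Dirichlet boundary condition loss on boundary points, Eq. (2.6)
    loss_bc = ((model(torch.cat([x_b, t_b], dim=1)) - g_b) ** 2).mean()
    # Initial condition loss on initial points, Eq. (2.7)
    u_0 = model(torch.cat([x_0, torch.zeros_like(x_0)], dim=1))
    loss_ic = ((u_0 - h_0) ** 2).mean()
    return loss_pde + loss_bc + loss_ic  # composite training loss, Eq. (2.4)
```

In practice each term is often weighted, and choosing those weights is itself a tuning problem (see Section 6).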

Benchmarking PDEs. We select a set of standard PDE benchmarks for the experiments. The pre-experiments use four representative PDEs: two diffusion (heat) equations, a wave equation, and a Burgers' equation, all commonly used in PINN research lu2021deepxde. We refer to them as Heat_0, Heat_1, Burgers and Wave in the following sections for conciseness. These PDEs cover different kinds of differential operators and boundary/initial conditions, which makes them suitable for extracting general rules about PINNs in the pre-experiments. Moreover, we design different data sampling schemes for each PDE to improve the credibility of the results. These PDEs are also used in the formal comparison experiments, along with three additional PDEs: two advection equations (Advection_0 and Advection_1) and a reaction equation (Reaction). The details of these PDEs and the data sampling methods are given in Appendix A.

2.2 Search Space

In this section, we define the search space for the PINN architectures. Here we consider the following variables.


  • Width and Depth. For an MLP-structured PINN, the depth is the number of hidden layers and the width is the number of neurons in each hidden layer. Both are sampled from fixed discrete grids: a finer width grid is used for Heat_0 than for the other PDEs, and the depth takes consecutive integer values in a fixed range.

  • Activation Function. The activation function determines the non-linear elements of the network. We provide four options: Tanh, Sigmoid, ReLU and Swish RamachandranZL18. The definitions of these activation functions are given in Appendix B.

  • Changing Point. According to lu2021deepxde, PINNs reach their best performance when trained with the Adam optimizer in a first stage, to get closer to the minimum, and then switched to the second-order L-BFGS optimizer liu1989limited. We need to decide the timing of that change, so we introduce a hyperparameter named Changing Point, a float ranging from 0 to 1 that indicates the proportion of the total training epochs using Adam. For example, if the number of training epochs is set to 10000 with a 0.4 changing point, the PINN is trained with the Adam optimizer for 4000 epochs, followed by 6000 epochs of L-BFGS training; a minimal training sketch is given after this list. However, it makes little sense to search on a precise grid, so we only consider five discrete options, {0.1, 0.2, 0.3, 0.4, 0.5}.
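The following self-contained PyTorch sketch shows the two-stage schedule on a toy objective; the model, data, and loss function are placeholders standing in for a full PINN.

```python
import torch

# Toy stand-ins for a PINN and its training loss.
model = torch.nn.Sequential(torch.nn.Linear(2, 32), torch.nn.Tanh(),
                            torch.nn.Linear(32, 1))
x = torch.rand(128, 2)
loss_fn = lambda: (model(x) ** 2).mean()  # placeholder objective

total_epochs, changing_point = 10000, 0.4
adam_epochs = int(total_epochs * changing_point)  # 4000 epochs with Adam

adam = torch.optim.Adam(model.parameters(), lr=1e-3)
for _ in range(adam_epochs):
    adam.zero_grad()
    loss_fn().backward()
    adam.step()

# Switch to L-BFGS for the remaining 6000 epochs (each step may run
# several inner iterations of the line search).
lbfgs = torch.optim.LBFGS(model.parameters())
def closure():
    lbfgs.zero_grad()
    loss = loss_fn()
    loss.backward()
    return loss
for _ in range(total_epochs - adam_epochs):
    lbfgs.step(closure)
```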

Initially, we do not include the learning rate or the number of training epochs in the search. However, we will show results for these two training hyperparameters in Section 5.3, which indicate that they have only a small effect on the final architecture search results.

3 Pre-Experiments and Observations

As we mentioned previously, neural architecture optimization for PINNs is still an under-explored problem, so we first explore general rules for the hyperparameters of PINNs. Unlike in other deep learning tasks, the training strategy and the physical constraints of PINNs are unique. Therefore, we do not simply apply an off-the-shelf hyperparameter search algorithm, but first run pre-experiments to understand the behavior of PINNs. For all experiments in this section, the PINNs are trained for a fixed number of epochs with a fixed learning rate.

Figure 1: L2 Error heatmaps on different PDEs with different PINN configurations.

We first study the relationship between the structure and the performance of the PINNs. Throughout this paper, the main figure of merit for accuracy is the relative L2 error:

$\text{L2 Error} = \dfrac{\lVert \hat{u} - u \rVert_2}{\lVert u \rVert_2}$ (3.1)

where $\hat{u}$ and $u$ are, respectively, the PINN prediction and a high-fidelity PDE solution evaluated on a dense set of test points in the PDE domain. The reported errors are averages over three separate random initializations of the neural network weights in each case.
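For reference, the metric in Eq. (3.1) can be computed as follows (the function name is ours):

```python
import numpy as np

def relative_l2_error(u_pred: np.ndarray, u_true: np.ndarray) -> float:
    """Relative L2 error between a PINN prediction and a reference solution,
    both evaluated on the same dense set of test points."""
    return np.linalg.norm(u_pred - u_true) / np.linalg.norm(u_true)
```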

A set of heatmaps in Figure 1 displays the L2 error results. Each row corresponds to a different PDE, and each column to a different activation function. The x-tick labels of each heatmap encode the width and depth settings of the PINNs from smaller to larger scales, and the y-tick labels are the changing point options from 0.1 to 0.5. In each cell, the top number is the smallest L2 error value that the PINN actually reached during training; cells with darker colors correspond to smaller error values, i.e., better-performing PINNs. The number in parentheses in each cell is the absolute distance from the smallest actual error value to the error value at the point where the smallest training loss is reached. In a real scenario, no solution to the PDE is available and the L2 error is therefore unknown, so we report these distances to investigate whether the training loss is a good surrogate. At the top of each heatmap, we report the average, median and minimum error values across the heatmap; the numbers in parentheses beside the x and y tick labels are mean error values across columns and rows, which show the average performance when some of the hyperparameters are fixed. More heatmap results are shown in Appendix C. We obtain several observations from these heatmaps:

Observation 1

For each PDE, there is a dominant activation function that works better, and it can be found by searching only a small subset of the whole space. For example, it is easy to see that Tanh is the best choice for Heat_0. The median error value across the searched subset is a good metric for determining the dominant activation function.

Observation 2

Under the dominant activation function, larger changing points perform better than or comparably to smaller values, as can be seen in the average error values shown beside the y-axis.

Observation 3

The "wider and deeper PINNs are better" rule does not apply to all PDEs, e.g., the Wave PDE.

Observation 4

The error distances (the values in parentheses in the cells) are usually very small; i.e., when the loss function reaches its smallest value during training, the L2 error is close to its smallest value as well.

Observations 1 and 2 suggest that it is possible to decouple the activation function and the changing point from the rest of the search space. Observation 3 indicates that further investigation of the MLP structure is needed to determine widths and depths for PINNs. Observation 4 means that overfitting is not a problem for PINNs: the error value at the minimum loss is close to the minimum error that a PINN can actually reach. However, this is only a local summary for each cell; we still need to compare the loss-error relationship between cells to establish that the loss function is a good search objective over the entire space.

3.1 Structure-Error Relationship

Width and depth are the key structural hyperparameters of PINNs. However, as noted in Observation 3, the pre-experiments above cannot give a clear relationship between the structures and the error values of PINNs because of the low sampling rate in the spaces of these two hyperparameters. For that reason, we conduct more specific pre-experiments on the structure-error relationship.

The results are shown in Figure 2. Each data point (joined by lines) is the average L2 error over three random initializations. We fix the activation function and the changing point in each case and then sample the space of width and depth. More results can be found in Appendix C. Clearly, there are width regions in which the best-performing network is not the deepest, e.g., certain width ranges for Heat_0 and for Wave. This again confirms Observation 3: the "wider and deeper PINNs are better" rule does not always apply, and it is important to determine the optimal depth for a given width.

Figure 2: Structure-error relationship.
Observation 5

Good regions in the space of width can be identified, and then one can search for the optimal depth.

3.2 Loss-Error Relationship

We would like to investigate whether the loss function is a representative search objective across the entire configuration space. Hence, we report the loss-error relationship for different PDEs in Figure 3. The x-axis is the lowest loss value that each PINN reached during training, and the y-axis is the corresponding L2 error value. Note that Observation 4 has already told us that the smallest training loss values coincide with the smallest L2 error values. Each point in the figure represents one PINN architecture configuration from Figure 1. The R-squared values indicate a strong linear correlation between the logarithmic loss and error values across the whole search space. Therefore, it is appropriate to use these loss values to judge the real performance of the PINNs. Further evidence of this linear correlation is shown in Appendix C. Hence, we have another observation:

Figure 3: Loss-error relationship.
Observation 6

There is a strong linear correlation between the smallest log-loss values and the corresponding log-error values for PINNs with different hyperparameter configurations on a given PDE problem. We can therefore use the training loss values to assess the performance of the PINNs.
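A simple way to check this observation numerically is to fit a line in log-log space and inspect the R-squared value; a small sketch (our helper, not the authors' code):

```python
import numpy as np

def log_log_r_squared(losses: np.ndarray, errors: np.ndarray) -> float:
    """R^2 of a linear fit between log10(loss) and log10(error); a value
    near 1 supports using the training loss as the search objective."""
    x, y = np.log10(losses), np.log10(errors)
    slope, intercept = np.polyfit(x, y, 1)
    residuals = y - (slope * x + intercept)
    return 1.0 - residuals.var() / y.var()
```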

4 Auto-PINN: Automated Architecture Optimization for PINNs

The pre-experiments have provided a clear guideline on how to decouple the hyperparameter space, and we can now present our Auto-PINN approach. With the help of Observations 1-6, we decouple the hyperparameters in the large search space and find the best architectures with only a small number of search trials.

4.1 Methodology

Input: A PDE problem with training and testing data points, together with the hyperparameter search space defined in Section 2.2.

Search Objective: The smallest loss value reached by the PINN during training (Observations 3 and 6).

Preliminary: Fix the changing point at 0.5 (Observation 2) for Steps 1 through 2.2.

Step 1: Search the activation function (Observation 1). Sample the width space exponentially and the depth space uniformly. Use the median search objective to determine the dominant activation function; it becomes the only choice in the following steps.

Step 2: Search the depth and width.

Step 2.1: Search for the good-performance regions of width (Observation 5). Split the width space into several intervals and sample several width settings in each interval to represent it, combining them with uniformly sampled depth settings. Use the median search objective to select the best width intervals as the good-performance regions.

Step 2.2: Search for the best depth settings. Combine all width settings inside the good-performance regions with all depth settings, then search and keep the top candidate structure configurations.

Step 3: Search for the best changing point for each candidate.

Step 4: Verify the performance of the searched PINN architectures. Retrain every selected PINN five times with different initializations and report its median performance.

Output: A set of searched PINN architectures with small training loss values, which correlate with small testing errors (Observation 4).
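The following Python sketch paraphrases Steps 1-3 as a single search loop. It is our schematic reconstruction, not the authors' implementation: `train_pinn` is a hypothetical helper that trains one configuration and returns its smallest training loss, and the probe and subsampling rates are illustrative.

```python
import itertools
import statistics

def auto_pinn_search(train_pinn, activations, width_intervals, depths,
                     changing_points, n_regions=2, top_k=5):
    """Schematic Auto-PINN pipeline. train_pinn(width, depth, act, cp)
    returns the smallest training loss of one configuration."""
    fixed_cp = 0.5  # Observation 2: fix the changing point for Steps 1-2.2

    # Step 1: dominant activation function on a sparse width/depth probe.
    probe = [(iv[0], d) for iv in width_intervals for d in depths[::2]]
    best_act = min(
        activations,
        key=lambda a: statistics.median(
            train_pinn(w, d, a, fixed_cp) for w, d in probe))

    # Step 2.1: rank width intervals by the median search objective.
    def region_score(interval):
        return statistics.median(
            train_pinn(w, d, best_act, fixed_cp)
            for w, d in itertools.product(interval[::2], depths))
    good_regions = sorted(width_intervals, key=region_score)[:n_regions]

    # Step 2.2: best (width, depth) candidates inside the good regions.
    candidates = [(w, d) for iv in good_regions
                  for w, d in itertools.product(iv, depths)]
    candidates.sort(key=lambda wd: train_pinn(*wd, best_act, fixed_cp))
    candidates = candidates[:top_k]

    # Step 3: best changing point per candidate.
    return [(w, d, best_act,
             min(changing_points,
                 key=lambda cp: train_pinn(w, d, best_act, cp)))
            for w, d in candidates]
```

Caching `train_pinn` results would avoid retraining configurations that recur across steps.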

Please refer to Appendix C to see the default settings of the Auto-PINN algorithm.

4.2 Complexity Analysis

The scale of the search is greatly decreased by the proposed Auto-PINN algorithm: relative to exhaustively enumerating every configuration in the search space of Section 2.2, the decoupled pipeline needs only a small fraction of the trials to find the best architectures. A detailed analysis is given in Appendix C.

5 Experiments

5.1 Experimental Settings

In this section, we present the results of experiments that validate the effectiveness of the proposed method.

PDE Benchmarks. We conducted experiments using the Heat_0, Heat_1, Wave, Burgers, Advection_0, Advection_1 and Reaction PDEs, which are described in Appendix A.

Baseline Methods. Random Search and HyperOpt bergstra2013hyperopt are selected as the baseline methods for comparison with Auto-PINN.

Implementation Details. We use the same fixed learning rate and number of training epochs for every method. We implemented Auto-PINN and the baseline methods with the PINN package DeepXDE lu2021deepxde and the hyperparameter tuning package Tune liaw2018tune. DeepXDE is a user-friendly, open-source library for physics-informed machine learning that includes common PINNs and different training strategies; we use it with some modifications to its APIs. Tune is a Python library for experiment execution and hyperparameter tuning that supports all mainstream machine learning frameworks and a large number of hyperparameter optimization methods; we utilized its trial parallelism feature to make Auto-PINN more efficient. The underlying MLP models are built and trained with the PyTorch framework. We ran the experiments on 4 Nvidia 3090 GPUs.
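To illustrate how one search trial looks in this stack, here is a minimal, self-contained DeepXDE sketch. It is our construction: the PDE, sampling counts, and architecture values are illustrative stand-ins, module paths follow DeepXDE >= 1.0, and the `iterations` keyword is named `epochs` in older releases.

```python
import deepxde as dde
import numpy as np

# Illustrative 1D diffusion problem u_t = u_xx on [0, 1] x [0, 1].
geom = dde.geometry.Interval(0, 1)
timedomain = dde.geometry.TimeDomain(0, 1)
geomtime = dde.geometry.GeometryXTime(geom, timedomain)

def pde(x, u):
    u_t = dde.grad.jacobian(u, x, i=0, j=1)
    u_xx = dde.grad.hessian(u, x, i=0, j=0)
    return u_t - u_xx  # PDE residual, cf. Eq. (2.5)

bc = dde.icbc.DirichletBC(geomtime, lambda x: 0,
                          lambda _, on_boundary: on_boundary)
ic = dde.icbc.IC(geomtime, lambda x: np.sin(np.pi * x[:, 0:1]),
                 lambda _, on_initial: on_initial)
data = dde.data.TimePDE(geomtime, pde, [bc, ic],
                        num_domain=400, num_boundary=100, num_initial=50)

# One configuration from the search space: width, depth, activation, changing point.
width, depth, activation, changing_point, total_epochs = 64, 5, "tanh", 0.4, 10000
net = dde.nn.FNN([2] + [width] * depth + [1], activation, "Glorot normal")

model = dde.Model(data, net)
model.compile("adam", lr=1e-3)
model.train(iterations=int(total_epochs * changing_point))  # Adam stage
model.compile("L-BFGS")
losshistory, train_state = model.train()                    # L-BFGS stage
best_loss = min(np.sum(l) for l in losshistory.loss_train)  # search objective
```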

5.2 Comparison Results

The results are shown in Figure 4. Compared with the two baseline methods, Auto-PINN is more stable, as can be seen from its mostly shorter bars, whereas Random Search and HyperOpt frequently suffer from very large performance variances. Meanwhile, Auto-PINN is capable of finding architectures with good performance: in most cases, the architecture with the smallest error value (the bottom of each bar) found by Auto-PINN matches or exceeds the performance of the baseline methods. In summary, Auto-PINN outperforms the baseline methods in both stability and accuracy with fewer search trials.

Figure 4: Comparison Results. Each subfigure shows the range of median errors of the searched architectures in each PDE problem. Here, we use the median error value of each searched architecture as the figure of merit. The top and the bottom of each bar show the median error values of the worst and the best architectures, respectively. The x-axes of the subfigures are different data sampling methods.
Figure 5: Searched architecture distributions by Auto-PINN with different learning rate and epoch settings. The sizes of the markers display the relative performances of each PINN configuration. Larger markers mean better performance, that is, the corresponding PINNs have smaller testing error values.

5.3 Influence of Learning Rates and Training Epochs

As mentioned above, we do not include the learning rate or the number of training epochs in our search space. Further experimental results indicate that Auto-PINN is not sensitive to these two training hyperparameters. As shown in Figure 5, the architectures searched with different learning rates and epoch budgets congregate in a specific region of the search space, which means the architectures found by Auto-PINN remain valid within a range of reasonable learning rates and epoch counts. Therefore, there is no need to search again when the learning rate or the number of epochs changes. On the other hand, the PDEs show different structural preferences. For example, the Heat_0 PDE requires wider structures but is insensitive to depth, whereas the Wave PDE is insensitive to width but prefers shallower PINNs. Auto-PINN is able to identify consistent architectures for different PDEs, which is an important point for future research.

Figure 6: Searched architecture distributions by Auto-PINN with different data sampling methods.

5.4 Influence of Data Sampling

Different PDEs have distinct sensitivities to data sampling, as illustrated in Figure 6, where we employ random and uniform sampling schemes, each with two different sampling densities, for the collocation, boundary and initial points; see Appendix A for the details. The results confirm that the structure-sampling relationship is distinct for each PDE. Therefore, it is not appropriate to simply copy-and-paste the searched architectures when the sampling strategy changes. We remark that more sophisticated adaptive- and residual-based data sampling methods CiCP-29-930; nabian2021efficient; peng2022rang have been proposed for PINNs; these could be used as an internal setting of Auto-PINN to achieve better search accuracy in future work.
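For concreteness, the two schemes differ only in how the points are generated; e.g., for the 400 collocation points of the Heat_1 low-density setting in Table 1, on a unit spatio-temporal domain:

```python
import numpy as np

n = 400
random_pts = np.random.rand(n, 2)          # pseudo-random sampling ("Random")

side = int(np.sqrt(n))                     # 20 x 20 grid ("Uniform")
g = np.linspace(0, 1, side)
uniform_pts = np.stack(np.meshgrid(g, g), -1).reshape(-1, 2)
```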

6 Related Works

Neural Architecture Search (NAS). NAS elsken2019neural is an AutoML technique that automates architecture design for neural networks. Traditional NAS methods include random search, evolutionary algorithms and Bayesian optimization. Some recent works build NAS pipelines with reinforcement learning cai2018efficient; pham2018efficient or gradient-based strategies liu2018darts. However, most NAS methods are computationally expensive and specifically designed for convolutional neural networks; there is little research on NAS pipelines for MLPs.

NAS for PINNs. Most research on PINNs focuses on training strategies, such as loss function weighting mcclenny2020self; wang2021understanding; wang2022and. There are few studies on neural architecture search for PINNs. markidis2021old investigates simple features of PINNs under different configurations using a Poisson equation and then designs a hybrid PINN-traditional PDE solver. skomski2021automating proposes a genetic algorithm to search the network types and optimization functions for several dynamic systems. Another study guo2020stochastic simply applies existing hyperparameter optimization approaches to optimize PINN models for a groundwater flow problem. Beyond these, there are, to the best of our knowledge, no other studies on NAS for PINNs across different PDE benchmarks.

7 Conclusion and Future Work

In this paper, we proposed Auto-PINN, the first systematic neural architecture and hyperparameter optimization approach for PINNs, which can search for the best architectures and hyperparameters for different PDE problems within the large search spaces that are characteristic of PINNs. We conducted a comprehensive set of pre-experiments to understand the search space of PINNs, and based on the resulting observations we proposed a step-by-step decoupling strategy that reduces the search space and uses the loss value as the search objective. The comparison results demonstrate the stability and effectiveness of Auto-PINN. In addition, we performed experiments to analyze the influence of the learning rate, the number of training epochs, and the data sampling strategies: performance is not sensitive to the learning rate or the number of epochs, while the best configuration does depend on the adopted sampling strategy. We hope the insights gleaned from our observations will motivate future exploration of PINNs and that Auto-PINN can serve as a strong baseline in future research. As future work, we plan to incorporate more sophisticated data sampling strategies into the search space of Auto-PINN to achieve better performance.

References

Appendix A Benchmarking PDEs and Data Samplings

In this appendix, we present the PDEs and the data sampling strategies used in our experiments.

a.1 Benchmarking PDEs

We utilize some standard PDE benchmarks in our experiments, including two heat equations, a wave equation, a Burgers’ equation, two advection equations, and a reaction equation.

Heat_0: The heat equation describes the heat or temperature distribution in a given domain over time. This heat equation contains Dirichlet boundary conditions.

Equation: (A.1)
Boundary Condition: (A.2)
Initial Condition: (A.3)
Solution: (A.4)

Heat_1: This heat equation contains Neumann boundary conditions.

Equation: (A.5)
Boundary Condition: (A.6)
Initial Condition: (A.7)
Solution: (A.8)

Wave: The wave equation describes the propagation of oscillations in a space, such as mechanical and electromagnetic waves. This wave equation contains both Dirichlet and Neumann conditions.

Equation: (A.9)
Boundary Condition: (A.10)
Initial Condition: (A.11)
Solution: (A.12)

Burgers: The Burgers' equation has been leveraged to model shock flows, wave propagation in combustion chambers, vehicular traffic movement, and more. We use an accurate approximation of the solution john2015burgers.

Equation: (A.13)
Boundary Condition: (A.14)
Initial Condition: (A.15)

Advection_0: The advection equation describes the motion of a scalar field as it is advected by a known velocity vector field.

Equation: (A.16)
Boundary Condition: (A.17)
Initial Condition: (A.18)
Solution: (A.19)

Advection_1: This advection equation has different boundary conditions than Advection_0.

Equation: (A.20)
Boundary Condition: (A.21)
Initial Condition: (A.22)
Solution: (A.23)

Reaction: The reaction equation describes chemical reactions.

Equation: (A.24)
Boundary Condition: (A.25)
Initial Condition: (A.26)
Solution: (A.27)

a.2 Data Sampling Methods

In our experiments, we consider different sampling methods, including random sampling and uniform sampling. We provide the details of the different data sampling methods in Table 1.

PDEs Sampling Methods # Collocation Points # Boundary Points # Initial Points
Heat_0 Random1 105 40 20
Random2 512 200 100
Uniform1 105 40 20
Uniform2 512 200 100
Heat_1 Random1 400 100 50
Random2 2500 500 250
Uniform1 400 100 50
Uniform2 2500 500 250
Wave Random1 2025 200 200
Random2 5041 500 500
Uniform1 2025 200 200
Uniform2 5041 500 500
Burgers Uniform1 5040 500 500
Uniform2 10057 1000 1000
Advection_0 Random1 200 40 20
Random2 800 160 80
Advection_1 Random1 200 40 20
Random2 800 160 80
Reaction Random 800 160 80
Uniform 800 160 80
Table 1: Data sampling methods for each PDE. "Random" means pseudo-random sampling and "Uniform" is sampling on a uniform grid in the spatio-temporal domain. The numbers after each sampling method name distinguish between different sampling densities.

Appendix B Activation Functions

In this appendix, we present the mathematical expressions of the four activation functions in the search space.

Tanh:

$\tanh(x) = \dfrac{e^{x} - e^{-x}}{e^{x} + e^{-x}}$ (B.1)

Sigmoid:

$\sigma(x) = \dfrac{1}{1 + e^{-x}}$ (B.2)

ReLU:

$\mathrm{ReLU}(x) = \max(0, x)$ (B.3)

Swish RamachandranZL18 (with the scale parameter of RamachandranZL18 set to 1):

$\mathrm{Swish}(x) = x \cdot \sigma(x)$ (B.4)
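In code, these four options map directly onto framework primitives; a small sketch of how they might be registered for the search (our illustration, with Swish at its scale parameter equal to 1, i.e., SiLU):

```python
import torch

ACTIVATIONS = {
    "tanh": torch.tanh,
    "sigmoid": torch.sigmoid,
    "relu": torch.relu,
    "swish": lambda x: x * torch.sigmoid(x),  # Swish with beta = 1 (SiLU)
}
```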

Appendix C Additional Pre-Experiment Results and More Details about Auto-PINN

c.1 Additional Pre-Experiment Results

In this section, we provide additional results to further support the observations in Section 3.

Additional Error Heatmap Results. We report additional error heatmap results in Figures 7, 8, 9 and 10 for different PDEs. The heatmaps in each row follow the order of the sampling methods in Table 1. The heatmaps shown in the main paper are also included for reference.

Figure 7: Error heatmaps for Heat_0.
Figure 8: Error heatmaps for Heat_1.
Figure 9: Error heatmaps for Wave.
Figure 10: Error heatmaps for Burgers.
Figure 11: Structure-error relationship.
Figure 12: Loss-error relationship.

Additional Structure-Error Relationship Results. We report additional structure-error relationship results in Figure 11.

Additional Loss-Error Relationship Results. We report additional loss-error relationship results in Figure 12.

c.2 More Details about Auto-PINN

In this section, we present more details about the proposed method, Auto-PINN.

Default Settings. Here we present the default settings of our proposed method.

Preliminary: Fix the changing point at 0.5 (Observation 2) for Steps 1 through 2.2.

Step 1: Search the activation function (Observation 1). Sample widths exponentially and depths uniformly. Use the median search objective to determine the dominant activation function; it becomes the only choice in the following steps.

Step 2: Search the depth and width.

Step 2.1: Search for the good-performance regions of width (Observation 5). Split the width space into intervals uniformly and randomly sample a few width settings in each interval to represent it, combining them with the uniformly sampled depth settings. Use the median search objective to select the top width intervals as the good-performance regions.

Step 2.2: Search for the best depth settings. Combine the width settings inside the top good-performance regions with all depth settings, then search and keep the top candidate structure configurations.

Step 3: Search for the best changing point for each candidate among the available options.

Step 4: Verify the performance of the searched PINN architectures. Retrain every selected PINN five times with different initializations and report its median performance.

Complexity Analysis. Here we provide a more detailed complexity analysis in terms of the number of search trials.

In the default search space, there are four activation functions, five changing point options, and fixed grids of widths and depths, so the entire search space contains the product of these four option counts.

Under the default Auto-PINN settings, and not counting the verification runs of Step 4, the pipeline performs a bounded number of trials in each of Steps 1, 2.1, 2.2 and 3. Because these per-step counts add rather than multiply, the total number of search trials is only a small fraction of the whole search space.
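The arithmetic behind this saving can be sketched as follows; all grid sizes and per-step counts here are illustrative placeholders, not the paper's exact values. A joint grid search multiplies the option counts, while the decoupled pipeline adds per-step trial counts:

```python
# Illustrative placeholder grid sizes (not the paper's exact values).
n_act, n_cp, n_width, n_depth = 4, 5, 64, 8
joint = n_act * n_cp * n_width * n_depth   # 10240 configurations in a full grid

step1 = n_act * 8                          # sparse probe per activation
step2_1 = 4 * 3 * n_depth                  # sampled widths per interval x depths
step2_2 = 2 * 8 * n_depth                  # widths in top regions x depths
step3 = 5 * n_cp                           # top candidates x changing points
decoupled = step1 + step2_1 + step2_2 + step3
print(joint, decoupled)                    # 10240 vs. 281 in this illustration
```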

Best Architectures Searched by Auto-PINN. Table 2 lists the best architectures found by Auto-PINN for each PDE and sampling method. The diversity of the searched architectures suggests the effectiveness of Auto-PINN. All experiment settings, including the fixed learning rate and the number of training epochs, are the same as in the main paper.

PDEs Sampling Methods Width Depth Activation Function Changing Point
Heat_0 Random1 80 8 Swish 0.5
Random2 512 7 Tanh 0.4
Uniform1 464 5 Tanh 0.5
Uniform2 496 5 Tanh 0.4
Heat_1 Random1 236 8 Tanh 0.5
Random2 248 9 Tanh 0.5
Uniform1 232 8 Tanh 0.5
Uniform2 252 10 Tanh 0.4
Wave Random1 116 4 Swish 0.4
Random2 180 7 Tanh 0.4
Uniform1 136 3 Swish 0.5
Uniform2 40 7 Tanh 0.4
Burgers Uniform1 256 10 Tanh 0.5
Uniform2 212 10 Tanh 0.4
Advection_0 Random1 48 7 ReLU 0.5
Random2 16 5 Tanh 0.2
Advection_1 Random1 256 4 Tanh 0.4
Random2 84 5 Tanh 0.1
Reaction Random 32 3 Tanh 0.4
Uniform 156 4 Swish 0.2
Table 2: Best architectures searched by Auto-PINN for each PDE and sampling method, according to the smallest L2 error among the top results.