1 Introduction
Physics-informed neural networks (PINNs) raissi2019physics are promising partial differential equation (PDE) solvers that integrate machine learning with physical laws. Benefiting from the strong expressive power of deep neural networks, PINNs are widely adopted to solve various real-world problems, such as fluid mechanics cai2022physics; jin2021nsfnets; sun2020surrogate, materials science haghighat2021physics; zhang2022analyses; zhang2021physics, and biomedical engineering sahli2020physics; kissas2020machine; liu2020generic. PINNs do not require the time-consuming construction of elaborate grids, and can therefore be applied to irregular and high-dimensional domains more easily than traditional PDE solvers.

The structures of PINNs are usually simple multilayer perceptrons (MLPs). In the training process, the physical laws in PDE systems are rewritten as loss functions, and the PINNs are trained to fit the ground-truth solutions under the supervision of these loss functions. As in other deep learning tasks, the neural architecture configuration of the MLP, such as its depth, width, and activation function, has a great effect on the performance of PINNs. However, there is little research on this problem. For instance,
raissi2019physics found that increasing the width and depth of PINNs improves predictive accuracy, but their experiments are limited to a single PDE problem within a very small search space. While the Tanh activation function is the default option for PINNs, some studies al2021time; markidis2021old report that the Sigmoid or Swish RamachandranZL18 activation functions are more effective in some cases. However, they did not reach a conclusion about which activation function is preferred across PDE problems. Therefore, further investigation is required to understand the relationship between PINN architectures and their performance. Moreover, there are a number of important hyperparameters for training PINNs, such as the learning rate, the number of training epochs, and the choice of optimizer. Manually tuning the architecture and hyperparameters is tedious and laborious. Therefore, we are motivated to study the following research question:
Can we automate the process of architecture and hyperparameter selection to improve the performance of PINNs?

Despite the recent progress of automated hyperparameter tuning optuna_2019; bergstra2013hyperopt and neural architecture search (NAS) liu2018darts; pham2018efficient; cai2018efficient, automating the neural architecture design of PINNs remains an open and challenging problem. First, the search space includes both discrete and continuous hyperparameters, and is extremely large. Existing hyperparameter optimization methods usually search the whole hyperparameter space, which can be inefficient. Second, the search objective for PINNs is unclear. Unlike other tasks that can naturally use the performance metric (e.g., accuracy) as the search objective, many PDEs have no exact solutions, so the error values are not available. Therefore, we have to identify an alternative search objective.
To this end, we first conduct a comprehensive set of benchmarking pre-experiments to understand the search space by studying the relationship between each hyperparameter and the performance. We make two key observations. First, some design choices play a dominant role in the performance. For example, there is often a dominant activation function that works best for each PDE. This motivates us to reduce the search space by decoupling it in a certain order. For instance, we can determine the best activation function with a small number of search trials, then fix it and focus on searching the other hyperparameters. We observe similar phenomena for other hyperparameters such as the changing point, depth, and width, which enables us to decouple them in a similar fashion. Second, we discover that the loss values are highly correlated with the errors. This makes the loss value a desirable search objective, since it can be naturally obtained during the search for all PDEs.
Based on the above observations, we propose AutoPINN, the first automated machine learning framework to optimize the neural architecture and hyperparameters of PINNs. AutoPINN adopts a step-by-step decoupling strategy: at each step, we search one hyperparameter with the others fixed to one or a few sets of options. This strategy drastically decreases the scale of the search space. We perform extensive experiments to evaluate AutoPINN on seven PDE benchmarks with different training data sampling methods. The quantitative comparisons show that AutoPINN outperforms the other optimization strategies in both accuracy and stability with fewer search trials.
We summarize our main contributions as follows:


We conduct a comprehensive set of benchmarking pre-experiments on the hyperparameter-performance relationships of PINNs. Our observations suggest that we can significantly reduce the search space by decoupling the search of different hyperparameters. We also identify the loss value of PINNs as a desirable search objective.

We propose AutoPINN, the first automated neural architecture and hyperparameter optimization approach for PINNs. The decoupling strategy can substantially decrease the search complexity.

We evaluate AutoPINN on a series of PDE benchmarks and compare it with other baseline methods. The results suggest that AutoPINN consistently finds PINN architectures with good accuracy across different PDE problems, outperforming the other search algorithms.
2 Preliminaries
2.1 Partial Differential Equations (PDEs) and Physics-Informed Neural Networks (PINNs)
We briefly review the baseline PINN algorithm, proposed by raissi2019physics. Consider a general form of a partial differential equation (PDE) defined on a bounded spatiotemporal domain $\Omega \times [0, T]$:

(2.1) $\mathcal{N}[u](\mathbf{x}, t) = f(\mathbf{x}, t), \quad \mathbf{x} \in \Omega,\ t \in [0, T],$

(2.2) $\mathcal{B}[u](\mathbf{x}, t) = g(\mathbf{x}, t), \quad \mathbf{x} \in \partial\Omega,\ t \in [0, T],$

(2.3) $u(\mathbf{x}, 0) = h(\mathbf{x}), \quad \mathbf{x} \in \Omega,$

where $\mathcal{N}$ is a spatiotemporal differential operator, $\mathcal{B}$ is a boundary operator, $\partial\Omega$ is the boundary of the domain, and $f$, $g$, and $h$ specify the source, boundary condition, and initial condition, respectively. Please refer to Appendix A for a few concrete examples of PDEs.
A PINN is a deep learning approximator of the solution of a PDE. Namely, a model $u_\theta(\mathbf{x}, t)$ with a set of learnable parameters $\theta$, e.g., a vanilla multilayer perceptron (MLP), is utilized to approximate the solution $u(\mathbf{x}, t)$ in the domain $\Omega \times [0, T]$. The governing equation, boundary conditions, and initial conditions are rewritten into the training loss of the neural network as follows:

(2.4) $\mathcal{L}(\theta) = \mathcal{L}_f + \mathcal{L}_b + \mathcal{L}_0,$

(2.5) $\mathcal{L}_f = \frac{1}{N_f} \sum_{i=1}^{N_f} \left| \mathcal{N}[u_\theta](\mathbf{x}_f^i, t_f^i) - f(\mathbf{x}_f^i, t_f^i) \right|^2,$

(2.6) $\mathcal{L}_b = \frac{1}{N_b} \sum_{i=1}^{N_b} \left| \mathcal{B}[u_\theta](\mathbf{x}_b^i, t_b^i) - g(\mathbf{x}_b^i, t_b^i) \right|^2,$

(2.7) $\mathcal{L}_0 = \frac{1}{N_0} \sum_{i=1}^{N_0} \left| u_\theta(\mathbf{x}_0^i, 0) - h(\mathbf{x}_0^i) \right|^2,$

where $\mathcal{L}_f$ is the PDE residual loss on the $N_f$ collocation points $\{(\mathbf{x}_f^i, t_f^i)\}$ sampled randomly in the domain, $\mathcal{L}_b$ is the boundary condition loss on the $N_b$ boundary points $\{(\mathbf{x}_b^i, t_b^i)\}$, and $\mathcal{L}_0$ is the initial condition loss on the $N_0$ initial points $\{\mathbf{x}_0^i\}$.
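As a concrete illustration of how a composite loss of the form (2.4)-(2.7) becomes a trainable objective, the following is a minimal sketch in PyTorch for a 1D heat equation $u_t = u_{xx}$. The function names and the specific PDE are our illustrative choices, not the paper's implementation (the paper builds on DeepXDE):

```python
import torch

def heat_residual(model, x, t):
    # PDE residual u_t - u_xx for a 1D heat equation (illustrative choice).
    x = x.clone().requires_grad_(True)
    t = t.clone().requires_grad_(True)
    u = model(torch.stack([x, t], dim=-1)).squeeze(-1)
    u_t = torch.autograd.grad(u.sum(), t, create_graph=True)[0]
    u_x = torch.autograd.grad(u.sum(), x, create_graph=True)[0]
    u_xx = torch.autograd.grad(u_x.sum(), x, create_graph=True)[0]
    return u_t - u_xx

def pinn_loss(model, x_f, t_f, x_b, t_b, u_b, x_0, u_0):
    # Composite loss: PDE residual + boundary condition + initial condition.
    l_f = heat_residual(model, x_f, t_f).pow(2).mean()
    l_b = (model(torch.stack([x_b, t_b], dim=-1)).squeeze(-1) - u_b).pow(2).mean()
    t_0 = torch.zeros_like(x_0)
    l_0 = (model(torch.stack([x_0, t_0], dim=-1)).squeeze(-1) - u_0).pow(2).mean()
    return l_f + l_b + l_0
```

The derivatives in the residual are obtained from automatic differentiation of the network output with respect to its inputs, which is the mechanism that makes PINNs grid-free.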
Benchmarking PDEs. We select a set of standard PDE benchmarks for the experiments. We conduct pre-experiments on four representative PDEs: two diffusion (heat) equations, a wave equation, and a Burgers' equation, which are commonly used in PINN research lu2021deepxde. We refer to them as Heat_0, Heat_1, Burgers, and Wave in the following sections for conciseness. These PDEs include different kinds of differential operators and boundary/initial conditions, which makes them suitable for illustrating general patterns of PINN behavior in the pre-experiments. Moreover, we design different data sampling schemes for each PDE to improve the credibility of the results. These PDEs are also used in the formal comparison experiments, along with three more: two advection equations (Advection_0 and Advection_1) and a reaction equation (Reaction). The details of these PDEs and the data sampling methods are given in Appendix A.
2.2 Search Space
In this section, we define the search space for the PINN architectures. Here we consider the following variables.


Width and Depth. For an MLP-structured PINN, the depth is the number of hidden layers and the width is the number of neurons in each hidden layer. We set the width ranging in with the step of (only for Heat_0) or in with the step of (for other PDEs). The depth ranges in with the step of .
Changing Point. According to lu2021deepxde, PINNs reach their best performance when trained with the Adam optimizer in a first stage, to get closer to the minimum, before switching to the L-BFGS second-order optimizer liu1989limited. We need to decide the timing of that switch. Therefore, we introduce a hyperparameter named the Changing Point, a float ranging from 0 to 1 that indicates the proportion of the total training epochs that use Adam. For example, if the number of training epochs is set to 10000 with a 0.4 Changing Point, the PINN is trained with the Adam optimizer for 4000 epochs, followed by 6000 epochs of L-BFGS training. However, it makes little sense to search on a fine grid, so we only consider five discrete options.
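The two-stage schedule controlled by the changing point can be sketched as follows. This is a hedged minimal example in PyTorch (the paper's actual training pipeline uses DeepXDE), where `loss_fn` is assumed to return the full PINN training loss for the model:

```python
import torch

def train_two_stage(model, loss_fn, epochs=10000, changing_point=0.4, lr=1e-3):
    # Stage 1: Adam for the first `changing_point` fraction of the epochs.
    adam_epochs = int(epochs * changing_point)
    adam = torch.optim.Adam(model.parameters(), lr=lr)
    for _ in range(adam_epochs):
        adam.zero_grad()
        loss = loss_fn(model)
        loss.backward()
        adam.step()
    # Stage 2: L-BFGS (second-order) for the remaining epochs.
    lbfgs = torch.optim.LBFGS(model.parameters(), max_iter=epochs - adam_epochs)
    def closure():
        lbfgs.zero_grad()
        loss = loss_fn(model)
        loss.backward()
        return loss
    lbfgs.step(closure)
    return loss_fn(model).item()
```

Note that `torch.optim.LBFGS` requires a closure that re-evaluates the loss, unlike first-order optimizers.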
Initially, we do not include the learning rate and the number of training epochs in the search. However, we will show results on these two training hyperparameters in Section 5.3, which indicate that they have only a small effect on the final architecture search results.
3 PreExperiments and Observations
As mentioned previously, neural architecture optimization for PINNs is still an underexplored problem. Therefore, we should first explore general rules for the hyperparameters of PINNs. Unlike other deep learning tasks, the training strategy and the physical constraints of PINNs are unique. Therefore, we do not simply apply an off-the-shelf hyperparameter search algorithm, but first conduct pre-experiments to understand the behavior of PINNs. For all experiments in this section, the PINNs are trained with training epochs and the learning rate is set to .
We first study the relationship between the structure and the performance of PINNs. Throughout this paper, the main figure of merit to measure accuracy is the relative $L^2$ error:

(3.1) $\text{Error} = \frac{\| u_\theta - u \|_2}{\| u \|_2},$

where $u_\theta$ and $u$ are respectively the PINN prediction and a high-fidelity PDE solution on a dense set of test points in the PDE domain. The reported errors are averages over three separate random initializations of the neural network weights in each case.
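The relative L2 error is straightforward to compute on the test points; a small NumPy helper (the function name is ours) might look like:

```python
import numpy as np

def relative_l2_error(u_pred, u_true):
    # ||u_pred - u_true||_2 / ||u_true||_2 over the test points.
    u_pred = np.asarray(u_pred, dtype=float)
    u_true = np.asarray(u_true, dtype=float)
    return np.linalg.norm(u_pred - u_true) / np.linalg.norm(u_true)
```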
A set of heatmaps displaying the L2 error results is shown in Figure 1. Each row corresponds to a different PDE, and each column corresponds to a different activation function. The x-tick labels of each heatmap represent different width and depth settings of the PINNs, from small to large scales. For instance, "" means the width is and the depth is . The y-tick labels are the different changing points from to . For each cell in the heatmaps, the top number is the smallest L2 error value that the PINN actually reached during training. Darker cells correspond to smaller error values, i.e., better-performing PINNs. The number in parentheses in each cell is the absolute distance from the smallest actual error value to the error value at which the smallest loss value is reached. In a real scenario, no solution to the PDE is available and the L2 error is therefore unknown, so we report these distances to investigate whether the training loss is a good alternative. At the top of each heatmap, we report the average, median, and minimum error values across the heatmap. The numbers in parentheses next to the x and y labels are mean error values across columns and rows, which show the average performance when some of the hyperparameters are fixed. More heatmap results are shown in Appendix C. We obtain several observations from these heatmaps:
Observation 1
There is a dominant activation function for each PDE that works better for PINNs, and it can be easily found by searching a small subset of the whole space. For example, it is easy to see that Tanh is the best choice for Heat_0. The median error value across the subset is a good metric for determining the dominant activation function.
Observation 2
Under the dominant activation function, larger changing points perform better than or comparably to smaller values, as can be seen in the average error values shown beside the y-axis.
Observation 3
The "wider and deeper PINNs are better" rule does not apply to all PDEs, e.g., the Wave PDE.
Observation 4
The error distances (the values in parentheses in the cells) are usually very small, i.e., when the loss function reaches its smallest value during training, so does the L2 error.
Observations 1 and 2 suggest that it is possible to decouple the activation function and the changing point from the search space. Observation 3 indicates that further research on the MLP structure is needed to determine the widths and depths of PINNs. Observation 4 means that overfitting is not a problem for PINNs: the error value at the minimum loss value is close to the minimum error that a PINN can actually reach. However, this is only a local summary for each cell. We need to compare the loss-error relationship between cells to establish that the loss function is a good search objective over the entire space.
3.1 StructureError Relationship
Width and depth are the key structural hyperparameters of PINNs. However, as mentioned in Observation 3, the pre-experiments above cannot give a clear relationship between the structures and error values of PINNs because of the low sampling rate in the spaces of these two hyperparameters. For that reason, we conduct more specific pre-experiments on the structure-error relationship.
The results are shown in Figure 2. Each data point (joined by lines) is the average L2 error over three random initializations. We fix the activation function and the changing point in each case and then sample the space of width and depth. More results can be found in Appendix C. Clearly, there are many width regions in which the best-performing network is not the deepest, e.g., widths within for Heat_0 and around for Wave. This again confirms Observation 3 that the "wider and deeper PINNs are better" rule does not always apply, and that it is important to determine the optimal depth for a given width.
Observation 5
Good regions in the space of width can be identified, and then one can search for the optimal depth.
3.2 LossError Relationship
We would like to investigate whether the loss function is a representative search objective across the entire configuration space. Hence, we report the loss-error relationship for different PDEs in Figure 3. The x-axis is the lowest loss value that the PINN reached during training, and the y-axis is the corresponding L2 error value. Note that Observation 4 has informed us that the smallest training loss values agree with the smallest L2 error values. Each point in the figure represents a PINN architecture configuration shown in Figure 1. The R-squared values indicate a strong linear correlation between the logarithmic loss and error values across the whole search space. Therefore, it is appropriate to leverage these loss values to judge the real performance of the PINNs. More evidence of this linear correlation is shown in Appendix C. Hence, we have another observation:
Observation 6
There is a strong linear correlation between the smallest log-loss values and the corresponding log-error values for PINNs with different hyperparameter configurations on a given PDE problem. We can take advantage of the training loss values to assess the performance of the PINNs.
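Observation 6 can be checked numerically by fitting a line to the (log loss, log error) pairs and computing the coefficient of determination. A sketch with NumPy (our own helper, not the paper's code):

```python
import numpy as np

def loglog_r2(losses, errors):
    # R^2 of a least-squares line fit to (log10 loss, log10 error) pairs.
    x = np.log10(np.asarray(losses, dtype=float))
    y = np.log10(np.asarray(errors, dtype=float))
    slope, intercept = np.polyfit(x, y, 1)
    y_hat = slope * x + intercept
    ss_res = np.sum((y - y_hat) ** 2)
    ss_tot = np.sum((y - y.mean()) ** 2)
    return 1.0 - ss_res / ss_tot
```

A value close to 1 indicates that ranking configurations by their smallest training loss is nearly equivalent to ranking them by their (unknown) test error.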
4 AutoPINN: Automated Architecture Optimization for PINNs
The pre-experiments have provided us with a clear guideline on how to decouple the hyperparameter space, and we can now present our AutoPINN approach. With the help of Observations 1–6, we are ready to decouple the hyperparameters in the large search space and find the best architectures with only a small number of search trials.
4.1 Methodology
Input: A PDE problem with training and testing data points, and the hyperparameter search space described in Section 2.2.
Search Objective: The smallest loss value reached by the PINN during training (Observations 4 and 6).
Step 0: Set the changing point to 0.5 (Observation 2) for Steps 1 to 2.2.
Step 1: Search the activation function (Observation 1). Sample the width space exponentially and the depth space uniformly. Use the median search objective to determine the dominant activation function, which becomes the only choice in the following steps.
Step 2: Search the depth and width.
Step 2.1: Search the good-performance regions of width (Observation 5). Split the width space into several intervals. Sample several width settings in each interval to represent the performance of that interval, then combine them with the uniformly sampled depth settings. Use the median search objective to select the best width intervals as the good-performance regions.
Step 2.2: Search the best depth settings. Combine all width settings inside the good-performance regions with all depth settings. Search and keep the top candidate structure configurations.
Step 3: Search the best changing point for each candidate.
Step 4: Verify the performance of the searched PINN architectures. Retrain every selected PINN five times with different initializations and report its median performance.
Output: Several PINN architectures with small training loss values, which correlate with small testing errors (Observation 4).
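The steps above can be sketched in plain Python. Everything here (function names, interval handling, and the `evaluate` callback returning the smallest training loss of a configuration) is an illustrative simplification of the pipeline, not the paper's implementation:

```python
import itertools
import statistics

def autopinn_search(evaluate, activations, widths, depths, changing_points,
                    n_intervals=4, top_k=2):
    # `evaluate(act, width, depth, cp)` -> smallest training loss of that config.
    cp0 = 0.5  # Step 0: fix the changing point for Steps 1-2.2.

    # Step 1: dominant activation via the median loss on a coarse probe grid.
    probe = list(itertools.product(widths[::max(1, len(widths) // 3)],
                                   depths[::max(1, len(depths) // 2)]))
    act = min(activations, key=lambda a: statistics.median(
        evaluate(a, w, d, cp0) for w, d in probe))

    # Step 2.1: rank width intervals by the median loss of a representative width.
    size = max(1, len(widths) // n_intervals)
    intervals = [widths[i:i + size] for i in range(0, len(widths), size)]
    def interval_score(iv):
        rep = iv[len(iv) // 2]  # one representative width per interval
        return statistics.median(evaluate(act, rep, d, cp0) for d in depths)
    good = sorted(intervals, key=interval_score)[:top_k]

    # Step 2.2: exhaustively combine widths in the good intervals with all depths.
    cands = sorted((evaluate(act, w, d, cp0), w, d)
                   for iv in good for w in iv for d in depths)[:top_k]

    # Step 3: best changing point for each surviving candidate.
    return [min((evaluate(act, w, d, cp), act, w, d, cp) for cp in changing_points)
            for _, w, d in cands]
```

Because each step fixes the hyperparameters decided earlier, the number of trials grows roughly additively with the steps rather than multiplicatively over the full grid.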
Please refer to Appendix C to see the default settings of the AutoPINN algorithm.
4.2 Complexity Analysis
The proposed AutoPINN algorithm greatly decreases the scale of the search space. The entire search space in Section 2.2 contains possible configurations, while AutoPINN under the default settings needs at most trials to find the best architectures, which is only a small fraction of the whole search space. A detailed analysis is given in Appendix C.
5 Experiments
5.1 Experimental Settings
In this section, we present the results of experiments that validate the effectiveness of the proposed method.
PDE Benchmarks. We conducted experiments using the Heat_0, Heat_1, Wave, Burgers, Advection_0, Advection_1 and Reaction PDEs, which are described in Appendix A.
Baseline Methods. Random Search and HyperOpt bergstra2013hyperopt are selected as the baseline methods for comparison with AutoPINN.
Implementation Details. We set the learning rate to and the number of training epochs to . We implemented AutoPINN and the baseline methods with the PINN package DeepXDE lu2021deepxde and the hyperparameter tuning package Tune liaw2018tune. DeepXDE is a user-friendly open-source library for physics-informed machine learning that includes common PINNs and different training strategies; we use it with some modifications to its APIs. Tune is a Python library for experiment execution and hyperparameter tuning. It supports all mainstream machine learning frameworks and a large number of hyperparameter optimization methods; we utilized its trial parallelism feature to make AutoPINN more efficient. The underlying MLP models are built and trained with the PyTorch framework. We ran the experiments on 4 Nvidia 3090 GPUs.
5.2 Comparison Results
The results are shown in Figure 4. Compared with the two baseline methods, AutoPINN is more stable, as can be seen from its mostly shorter bar lengths, whereas Random Search and HyperOpt frequently suffer from very large performance variance. Meanwhile, AutoPINN is capable of finding architectures with good performance: in most cases, the architectures with the smallest error values (the bottoms of the bars) found by AutoPINN match or exceed the performance of the baseline methods. In summary, AutoPINN outperforms the baselines in both stability and accuracy with fewer search trials.
5.3 Influence of Learning Rates and Training Epochs
As mentioned, we do not include the learning rate and the number of training epochs in our search space. Further experimental results indicate that AutoPINN is not sensitive to these two training hyperparameters. As shown in Figure 5, the searched architectures congregate in a specific region of the search space across different learning rates and epoch counts, which means the architectures found by AutoPINN remain valid within a range of reasonable learning rates and epochs. Therefore, there is no need to search again when these hyperparameters change. On the other hand, we can see that the PDEs show different structural preferences. For example, the Heat_0 PDE requires wider structures but is insensitive to the depth, whereas the Wave PDE is not sensitive to the width but prefers shallower PINNs. AutoPINN is able to identify consistent architectures for different PDEs, which is an important point for future research.
5.4 Influence of Data Sampling
Different PDEs have distinct sensitivities to data sampling, as illustrated in Figure 6, where we employed random and uniform sampling schemes, each with two different sampling densities, for the collocation, boundary, and initial points; see Appendix A for details. The results confirm that the structure-sampling relationship is distinct for each PDE. Therefore, it is not appropriate to simply reuse the searched architectures when the sampling strategy changes. We remark that more sophisticated adaptive and residual-based data sampling methods CiCP29930; nabian2021efficient; peng2022rang have been proposed for PINNs. These sampling methods could be used as an internal setting of AutoPINN to achieve better search accuracy in future work.
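As an illustration of the two basic schemes, random sampling draws points i.i.d. from the domain, while uniform sampling places them on a regular grid. The helper below is our own sketch for a 2D (x, t) domain, not the DeepXDE implementation used in the paper:

```python
import numpy as np

def sample_collocation(n, x_range=(0.0, 1.0), t_range=(0.0, 1.0),
                       method="random", seed=0):
    if method == "random":
        rng = np.random.default_rng(seed)
        x = rng.uniform(*x_range, n)
        t = rng.uniform(*t_range, n)
        return np.stack([x, t], axis=-1)
    # Uniform: nearest square grid, so the count is rounded to a perfect square
    # (e.g., the 2025- and 5041-point settings in Table 1 are 45^2 and 71^2).
    side = int(round(np.sqrt(n)))
    x = np.linspace(*x_range, side)
    t = np.linspace(*t_range, side)
    grid_x, grid_t = np.meshgrid(x, t)
    return np.stack([grid_x.ravel(), grid_t.ravel()], axis=-1)
```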
6 Related Works
Neural Architecture Search (NAS). NAS elsken2019neural is an AutoML technique for automating the architecture design of neural networks. Traditional NAS methods include random search, evolutionary algorithms, and Bayesian optimization. Some recent works build NAS pipelines with reinforcement learning cai2018efficient; pham2018efficient or gradient-based strategies liu2018darts. However, most NAS methods are computationally expensive and specifically designed for convolutional neural networks. There is little research on NAS pipelines for MLPs.
NAS for PINNs. Most research on PINNs focuses on training strategies, such as the problem of loss function weighting mcclenny2020self; wang2021understanding; wang2022and. There are few studies on neural architecture search for PINNs. markidis2021old investigates simple features of PINNs under different configurations using a Poisson equation and then designs a hybrid PINN-traditional PDE solver. skomski2021automating proposes a genetic algorithm to search the network types and optimization functions for several dynamic systems. Another study, guo2020stochastic, applies existing hyperparameter optimization approaches to optimize PINN models for a groundwater flow problem. Beyond these, there are, as far as we know, no other studies on NAS for PINNs across different PDE benchmarks.
7 Conclusion and Future Work
In this paper, we proposed AutoPINN, the first systematic neural architecture and hyperparameter optimization approach for PINNs, which can search for the best architectures and hyperparameters for different PDE problems within the large search spaces that are characteristic of PINNs. We conducted a comprehensive set of pre-experiments to understand the search space of PINNs. Based on the observations, we proposed a step-by-step decoupling strategy to reduce the search space, using the loss value as the search objective. The comparison results demonstrate the stability and effectiveness of AutoPINN. In addition, we performed experiments to analyze the influence of the learning rate, the number of training epochs, and the data sampling strategies. We found that the performance is not sensitive to the learning rate or the number of training epochs, while the best configuration depends on the adopted sampling strategy. We hope the insights gleaned from our observations can motivate future exploration of PINNs and that AutoPINN can serve as a strong baseline in future research. As future work, we plan to incorporate more sophisticated data sampling strategies into the search space of AutoPINN to achieve better performance.
References
Appendix A Benchmarking PDEs and Data Sampling Methods
In this appendix, we present the PDEs and the data sampling strategies used in our experiments.
a.1 Benchmarking PDEs
We utilize some standard PDE benchmarks in our experiments, including two heat equations, a wave equation, a Burgers’ equation, two advection equations, and a reaction equation.
Heat_0: The heat equation describes the heat or temperature distribution in a given domain over time. This heat equation contains Dirichlet boundary conditions.
Equation:  (A.1)  
Boundary Condition:  (A.2)  
Initial Condition:  (A.3)  
Solution:  (A.4) 
Heat_1: This heat equation contains Neumann boundary conditions.
Equation:  (A.5)  
Boundary Condition:  (A.6)  
Initial Condition:  (A.7)  
Solution:  (A.8) 
Wave: The wave equation describes the propagation of oscillations in a space, such as mechanical and electromagnetic waves. This wave equation contains both Dirichlet and Neumann conditions.
Equation:  (A.9)  
Boundary Condition:  (A.10)  
Initial Condition:  (A.11)  
Solution:  (A.12) 
Burgers: The Burgers' equation has been used to model shock flows, wave propagation in combustion chambers, vehicular traffic movement, and more. We use an accurate approximation of the solution john2015burgers.
Equation:  (A.13)  
Boundary Condition:  (A.14)  
Initial Condition:  (A.15) 
Advection_0:
The advection equation describes the motion of a scalar field as it is advected by a known velocity vector field.
Equation:  (A.16)  
Boundary Condition:  (A.17)  
Initial Condition:  (A.18)  
Solution:  (A.19) 
Advection_1: This advection equation has different boundary conditions than Advection_0.
Equation:  (A.20)  
Boundary Condition:  (A.21)  
Initial Condition:  (A.22)  
Solution:  (A.23) 
Reaction: The reaction equation describes chemical reactions.
Equation:  (A.24)  
Boundary Condition:  (A.25)  
Initial Condition:  (A.26)  
Solution:  (A.27) 
a.2 Data Sampling Methods
In our experiments, we consider different sampling methods, including random and uniform sampling. We provide the details of the data sampling methods in Table 1.
PDEs  Sampling Methods  # Collocation Points  # Boundary Points  # Initial Points 

Heat_0  Random1  105  40  20 
Random2  512  200  100  
Uniform1  105  40  20  
Uniform2  512  200  100  
Heat_1  Random1  400  100  50 
Random2  2500  500  250  
Uniform1  400  100  50  
Uniform2  2500  500  250  
Wave  Random1  2025  200  200 
Random2  5041  500  500  
Uniform1  2025  200  200  
Uniform2  5041  500  500  
Burgers  Uniform1  5040  500  500 
Uniform2  10057  1000  1000  
Advection_0  Random1  200  40  20 
Random2  800  160  80  
Advection_1  Random1  200  40  20 
Random2  800  160  80  
Reaction  Random  800  160  80 
Uniform  800  160  80 
Appendix B Activation Functions
In this appendix, we present the mathematical expressions of the four activation functions in the search space.
Tanh:
(B.1) $\tanh(x) = \frac{e^{x} - e^{-x}}{e^{x} + e^{-x}}$
Sigmoid:
(B.2) $\sigma(x) = \frac{1}{1 + e^{-x}}$
ReLU:
(B.3) $\mathrm{ReLU}(x) = \max(0, x)$
Swish RamachandranZL18:
(B.4) $\mathrm{Swish}(x) = x \cdot \sigma(x)$
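For reference, the four activation functions (B.1)-(B.4) translate directly into NumPy (a small self-contained sketch):

```python
import numpy as np

def tanh(x):
    return np.tanh(x)                    # (B.1)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))      # (B.2)

def relu(x):
    return np.maximum(0.0, x)            # (B.3)

def swish(x):
    return x * sigmoid(x)                # (B.4)
```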
Appendix C Additional PreExperiment Results and More Details about AutoPINN
c.1 Additional PreExperiment Results
In this section, we provide additional results to further support the observations in Section 3.
Additional Error Heatmap Results. We report additional error heatmap results in Figures 7, 8, 9, and 10 for the different PDEs. The heatmap rows follow the order of the sampling methods in Table 1. The heatmaps shown in the main paper are also included for reference.
Additional StructureError Relationship Results. We report additional structureerror relationship results in Figure 11.
Additional LossError Relationship Results. We report additional losserror relationship results in Figure 12.
c.2 More Details about AutoPINN
In this section, we present more details about the proposed method, AutoPINN.
Default Settings. Here we present the default setting of our proposed method.
Step 0: Set the changing point to (Observation 2) for Steps 1 to 2.2.
Step 1: Search the activation function (Observation 1). Sample widths exponentially and depths uniformly. Use the median search objective to determine the dominant activation function, which becomes the only choice in the following steps.
Step 2: Search the depth and width.
Step 2.1: Search the good-performance regions of width (Observation 5). Split the width space into intervals uniformly. Randomly sample width settings in each interval to represent the performance of that interval, then combine them with the uniformly sampled depth settings. Use the median search objective to find the top width intervals as the good-performance regions.
Step 2.2: Search the best depth settings. Combine all width settings inside the top good-performance regions with all depth settings. Search and keep the top candidate structure configurations.
Step 3: Search the best changing point for each candidate from the available choices.
Step 4: Verify the performance of the searched PINN architectures. Retrain every selected PINN five times with different initializations and report its median performance.
Complexity Analysis. Here we provide a detailed analysis of the number of search trials.
In the default search space, we provide activation functions, options for the changing point, widths, and depths. Therefore, the entire search space contains configurations.
Under the default AutoPINN settings, and without counting the verification in Step 4, the pipeline needs trials in Step 1, trials in Step 2.1, at most trials in Step 2.2, and in Step 3. Therefore, AutoPINN requires at most search trials in total, which is of the whole search space.
Best Architectures Searched by AutoPINN. Here we list the architecture search results of AutoPINN. The diversity of the best architectures found, shown in Table 2, suggests the effectiveness of AutoPINN. All experiment settings are the same as those in the main paper: we set the learning rate to and the number of training epochs to .
PDEs  Sampling Methods  Width  Depth  Activation Function  Changing Point 

Heat_0  Random1  80  8  Swish  0.5 
Random2  512  7  Tanh  0.4  
Uniform1  464  5  Tanh  0.5  
Uniform2  496  5  Tanh  0.4  
Heat_1  Random1  236  8  Tanh  0.5 
Random2  248  9  Tanh  0.5  
Uniform1  232  8  Tanh  0.5  
Uniform2  252  10  Tanh  0.4  
Wave  Random1  116  4  Swish  0.4 
Random2  180  7  Tanh  0.4  
Uniform1  136  3  Swish  0.5  
Uniform2  40  7  Tanh  0.4  
Burgers  Uniform1  256  10  Tanh  0.5 
Uniform2  212  10  Tanh  0.4  
Advection_0  Random1  48  7  ReLU  0.5 
Random2  16  5  Tanh  0.2  
Advection_1  Random1  256  4  Tanh  0.4 
Random2  84  5  Tanh  0.1  
Reaction  Random  32  3  Tanh  0.4 
Uniform  156  4  Swish  0.2 