1 Introduction
Boundary value problems (BVPs) are ubiquitous in the sciences [41]. From elasticity to quantum electronics, BVPs have been fundamental to the development and engineering design of numerous transformative technologies of the 20th century. Historically, the formulation of many canonical problems in physics and engineering results in linear BVPs: from Fourier formulating the heat equation in 1822 [8] to more modern applications such as designing chip architectures in the semiconductor industry [15, 46]. Much of our theoretical understanding of BVPs comes from the construction of the fundamental solution of the BVP, commonly known as the Green’s function [40]. The Green’s function solution relies on a common property of many BVPs: linearity. Specifically, general solutions rely on linear superposition to hold, thus limiting their usefulness in many modern applications where BVPs are often heterogeneous and nonlinear. By leveraging modern deep learning, we are able to learn linearizing transformations of BVPs that render
nonlinear BVPs linear so that we can construct the Green’s function solution. Our deep learning of Green’s functions, DeepGreen, provides a transformative architecture for modern solutions of nonlinear BVPs. DeepGreen is inspired by recent works which use deep neural networks (DNNs) to discover advantageous coordinate transformations for dynamical systems [25, 5, 43, 27, 42, 47, 33, 21, 7, 9]. The universal approximation properties of DNNs [6, 13] are ideal for learning coordinate transformations that linearize nonlinear BVPs, ODEs, and PDEs. Specifically, such linearizing transforms fall broadly under the umbrella of Koopman operator theory [17], which has a modern interpretation in terms of dynamical systems theory [28, 29, 4, 30]. There are only limited cases in which Koopman operators can be constructed explicitly [3]. However, Dynamic Mode Decomposition (DMD) [39] provides a numerical algorithm for approximating the Koopman operator [37], with many recent extensions that improve on the DMD approximation [18]. More recently, neural networks have been used to construct Koopman embeddings [25, 43, 27, 42, 47, 33, 21, 9], an alternative to enriching the observables of DMD [31, 32, 44, 45, 16, 19, 34]. Thus, neural networks have emerged as a highly effective mathematical tool for approximating complex data [2, 11] with a linear model. DNNs have been used in this context to discover time-stepping algorithms for complex systems [36, 10, 38, 20, 22]. Moreover, DNNs have been used to approximate constitutive models of BVPs [14]. DeepGreen leverages the success of DNNs for dynamical systems to discover coordinate transformations that linearize nonlinear BVPs so that the Green’s function solution can be recovered. This allows for the discovery of fundamental solutions for nonlinear BVPs, opening many opportunities in the engineering and physical sciences. DeepGreen exploits physics-informed learning by using autoencoders (AEs) to take data from the original high-dimensional input space to new coordinates at the intrinsic rank of the underlying physics [25, 35, 5]. The architecture also leverages the success of Deep Residual Networks (DRNs) [12], which enables our approach to efficiently handle near-identity coordinate transformations [9].
The Green’s function constructs the solution to a BVP for any given forcing by superposition. Specifically, consider the classical linear BVP [40]
(1) Lu(x) = f(x), x ∈ Ω,
where L is a linear differential operator, f(x) is a forcing, x is the spatial coordinate, and Ω is an open set. The boundary conditions are imposed on ∂Ω with a linear operator B. The fundamental solution is constructed by considering the adjoint equation
(2) L†G(x, ξ) = δ(x − ξ),
where L† is the adjoint operator (along with its associated boundary conditions) and δ(x − ξ) is the Dirac delta function. Taking the inner product of (1) with respect to the Green’s function G(x, ξ) gives the fundamental solution
(3) u(x) = ∫_Ω G(x, ξ) f(ξ) dξ,
which is valid for any forcing f(x). Thus once the Green’s function is computed, the solution for arbitrary forcing functions can be easily extracted by integration. This integration represents a superposition of a continuum of delta function forcings that are used to represent f(x).
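The discrete analog of this construction can be checked numerically. The sketch below (numpy, using the model problem −u″ = f on (0, 1) with homogeneous Dirichlet conditions as an illustrative choice) builds the discrete operator, inverts it to obtain the discrete Green’s function, and verifies the superposition property in (3):

```python
import numpy as np

# Discretize -u''(x) = f(x), u(0) = u(1) = 0 on n interior grid points.
n = 64
h = 1.0 / (n + 1)
x = np.linspace(h, 1 - h, n)

# Second-order finite-difference Laplacian (the discrete operator L).
L = (np.diag(2.0 * np.ones(n)) - np.diag(np.ones(n - 1), 1)
     - np.diag(np.ones(n - 1), -1)) / h**2

# The discrete Green's function is the inverse of the operator matrix:
# column j of G is the response to a delta forcing at grid point j.
G = np.linalg.inv(L)

# Solve for two separate forcings and for their sum.
f1 = np.sin(np.pi * x)
f2 = np.cos(3 * np.pi * x)
u1, u2 = G @ f1, G @ f2
u_sum = G @ (f1 + f2)

# Linear superposition: the solution for f1 + f2 is u1 + u2.
assert np.allclose(u_sum, u1 + u2)

# Check against the exact solution of -u'' = sin(pi x): u = sin(pi x)/pi^2.
assert np.max(np.abs(u1 - np.sin(np.pi * x) / np.pi**2)) < 1e-3
```

Once G is in hand, every new forcing costs only a matrix–vector product, which is the property DeepGreen seeks to recover for nonlinear problems.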
In many modern applications, nonlinearity plays a fundamental role so that the BVP is of the form
(4) Nu(x) = F(x),
where N is a nonlinear differential operator. For this case, the principle of linear superposition no longer holds and the notion of a fundamental solution is lost. However, modern deep learning algorithms allow us the flexibility of learning coordinate transformations (and their inverses) of the form
(5a) v(x) = ψ(u(x)),
(5b) f(x) = φ(F(x)),
such that v and f satisfy the linear BVP (1) for which we generated the fundamental solution (3). This gives a nonlinear fundamental solution through use of this deep learning transformation.
DeepGreen is a supervised learning algorithm which is ultimately a high-dimensional interpolation problem [26] for learning the coordinate transformations ψ and φ. DeepGreen is enabled by a physics-informed deep autoencoder coordinate transformation which establishes superposition for nonlinear BVPs, thus enabling a Koopman BVP framework. The learned Green’s function enables accurate construction of solutions with new forcing functions in the same way as for a linear BVP. We demonstrate the DeepGreen method on a variety of nonlinear boundary value problems, including a nonlinear 2D Poisson problem, showing that such an architecture can be used in many modern and diverse applications in aerospace, electromagnetics, elasticity, materials, and chemical reactors.
2 Deep Autoencoders for Linearizing BVPs
Deep AEs have been used to linearize dynamical systems, which are initial value problems. We extend this idea to BVPs. To be precise, we consider BVPs of the form
(6a) N[u(x)] = F(x), x ∈ Ω,
(6b) B[u(x)] = 0, x ∈ ∂Ω,
where Ω is a simply connected open set in ℝⁿ with boundary ∂Ω, N is a nonlinear differential operator, F is the nonhomogeneous forcing function, B is a boundary condition operator, and u is the solution to the BVP. We wish to find a pair of coordinate transformations of the form (5) such that v and f satisfy a linear BVP
(7a) L[v(y)] = f(y), y ∈ Ω̂,
(7b) B̂[v(y)] = 0, y ∈ ∂Ω̂,
where L is a linear differential operator and y is the spatial coordinate in the transformed domain Ω̂ with boundary ∂Ω̂. Because L is linear, there is a Green’s function G(y, ξ) such that the solution to the BVP (7) can be obtained through convolution of the Green’s function and the transformed forcing function
(8) v(y) = ∫_Ω̂ G(y, ξ) f(ξ) dξ.
The coordinate transformations, along with the Green’s function of the linearized BVP, provide the analog of a Green’s function for the nonlinear BVP (6). In particular, for a forcing function F, the transformed forcing function is f = φ(F). The solution v to the linearized BVP can be obtained using the Green’s function as in (8). Then the solution to the nonlinear BVP (6) is obtained by inverting the coordinate transformation, u = ψ⁻¹(v).
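This encode–solve–decode pipeline can be sketched as three chained steps. In the snippet below the encoder and decoder are identity placeholders standing in for trained networks, so the pipeline collapses to an ordinary linear solve; the names phi, psi_inv, and solve_nonlinear_bvp are illustrative, not from the paper’s code:

```python
import numpy as np

# A sketch of the DeepGreen solution pipeline with placeholder transforms.
# phi and psi_inv stand in for trained encoder/decoder networks; G is the
# Green's function matrix of the latent linear operator.  With identity
# transforms the pipeline reduces to the ordinary linear solve, which lets
# us check the plumbing end to end.

n = 32
h = 1.0 / (n + 1)
Lmat = (2 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)) / h**2
G = np.linalg.inv(Lmat)

phi = lambda F: F          # encoder for the forcing (placeholder)
psi_inv = lambda v: v      # decoder for the solution (placeholder)

def solve_nonlinear_bvp(F):
    """Encode the forcing, apply the latent Green's function, decode."""
    f = phi(F)          # f = phi(F): transformed forcing
    v = G @ f           # v = G f: latent linear solve via Green's function
    return psi_inv(v)   # u = psi_inv(v): solution in original coordinates

x = np.linspace(h, 1 - h, n)
F = np.sin(np.pi * x)
u = solve_nonlinear_bvp(F)
assert np.allclose(Lmat @ u, F)
```

With trained, nonlinear encoders and decoders in place of the identities, the same three lines of `solve_nonlinear_bvp` constitute the nonlinear Green’s function solve.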
The question that remains is how to discover the appropriate coordinate transformations ψ and φ. We leverage the universal approximation properties of neural networks in order to learn these transformations. In order to use neural networks, we first need to discretize the BVP. Let u ∈ ℝᵐ be a spatial discretization of u(x) and F ∈ ℝᵐ be a discretization of F(x). Then the discretized version of the BVP (6) is
(9a) N(u) = F,
(9b) B(u) = 0.
Neural networks ψ and φ are used to transform u and F to the latent space vectors v and f:
(10a) v = ψ(u),
(10b) f = φ(F),
where v and f satisfy the linear equation
(11) Lv = f,
for some matrix L, which is also learned. In order to learn invertible transforms ψ and φ, we construct the problem as a pair of autoencoder networks.
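A minimal sketch of this dual-autoencoder construction is below (numpy; the layer widths, latent dimension, and random weights are illustrative stand-ins for trained networks):

```python
import numpy as np

rng = np.random.default_rng(0)
n, lat = 128, 20   # discretization size and latent dimension (both illustrative)

def make_mlp(sizes):
    """Random-initialized fully connected net; stands in for a trained encoder/decoder."""
    Ws = [rng.normal(scale=0.1, size=(a, b)) for a, b in zip(sizes[:-1], sizes[1:])]
    def forward(x):
        for W in Ws[:-1]:
            x = np.tanh(x @ W)   # hidden layers
        return x @ Ws[-1]        # linear output layer
    return forward

# Two autoencoders: one for solutions u, one for forcings F.
psi, psi_inv = make_mlp([n, 64, lat]), make_mlp([lat, 64, n])   # u <-> v
phi, phi_inv = make_mlp([n, 64, lat]), make_mlp([lat, 64, n])   # F <-> f

# Learnable latent operator, initialized to the identity.
L = np.eye(lat)

u = rng.normal(size=n)
F = rng.normal(size=n)
v, f = psi(u), phi(F)                    # encode both fields into the latent space
assert v.shape == (lat,) and f.shape == (lat,)
assert psi_inv(v).shape == (n,)          # decoder returns to full dimension
assert np.allclose(L @ v, v)             # identity init: L v = v before training
```

Training then tunes the four networks and the matrix L jointly so that the latent vectors satisfy Lv = f.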
In this construction, the transforms ψ and φ are the encoders and the transform inverses ψ⁻¹ and φ⁻¹ are the decoders. The network architecture and loss functions are shown in Figure 2. The neural network is trained using numerous and diverse solutions to the nonlinear BVP (9), which can be obtained with many different forcings F. Consider a dataset comprised of pairs of discretized solutions u_k and forcing functions F_k. The loss function for training the network is the sum of six losses, each of which enforces a desired condition. The loss functions can be split into three categories:

Autoencoder losses: We wish to learn invertible coordinate transformations given by equations (10a) and (10b). In order to do so, we use two autoencoders. The autoencoder for u consists of an encoder ψ which performs the transformation (10a) and a decoder ψ⁻¹ which inverts the transformation. In order to enforce that the encoder and decoder are inverses, we use the autoencoder loss
(12) 𝓛₁ = ‖u − ψ⁻¹(ψ(u))‖₂².
Similarly, there is an autoencoder for F where the encoder φ performs the transformation (10b). This transformation also has an inverse φ⁻¹ enforced by the associated autoencoder loss function
(13) 𝓛₂ = ‖F − φ⁻¹(φ(F))‖₂².
Linearity losses: In the transformed coordinate system, we wish for the BVP to be linear so that the operator can be represented by a matrix L. The matrix L and the encoded vectors v and f should satisfy equation (11). This is enforced with the linear operator loss
(14) 𝓛₃ = ‖Lψ(u) − φ(F)‖₂².
The major advantage of working with a linear operator is that linear superposition holds. We use a linear superposition loss, evaluated over a training batch, in order to further enforce the linearity of the operator in the latent space
(15) 𝓛₄ = ‖L Σ_k ψ(u_k) − Σ_k φ(F_k)‖₂².
Cross-mapping losses: The losses described above are theoretically sufficient to find coordinate transformations for u and F as well as a linear operator L. However, in practice the two autoencoders were not capable of generating the Green’s function solution. To rectify this, we add two “cross-mapping” loss functions that incorporate parts of both autoencoders. The first cross-mapping loss enforces a mapping from u to F: a solution u from the dataset is encoded with ψ, giving an approximation of v; this is multiplied by the matrix L, giving an approximation of f; the result is then decoded with φ⁻¹, giving an approximation of F. The u-to-F cross-mapping loss is given by the formula
(16) 𝓛₅ = ‖F − φ⁻¹(Lψ(u))‖₂².
We can similarly define a cross-mapping from F to u. A forcing function F from the dataset is encoded with φ, multiplied by the Green’s function matrix G = L⁻¹, and then decoded with ψ⁻¹ to give an approximation of u. The F-to-u cross-mapping loss is
(17) 𝓛₆ = ‖u − ψ⁻¹(Gφ(F))‖₂².
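The six losses can be written compactly once the encoders, decoders, and latent matrix are given. The sketch below uses identity placeholders so that it runs standalone; the batch form of the superposition loss 𝓛₄ is one plausible reading of the text, not necessarily the paper’s exact formula:

```python
import numpy as np

# The six training losses for a batch of (u, F) pairs, with placeholder
# identity transforms so the sketch runs standalone.  psi/phi are the
# encoders, psi_inv/phi_inv the decoders, L the latent operator.

lat = 8
psi = phi = lambda x: x
psi_inv = phi_inv = lambda x: x
L = np.eye(lat)
G = np.linalg.inv(L)

sq = lambda r: float(np.sum(r**2))

def deepgreen_losses(us, Fs):
    """us, Fs: lists of paired solution/forcing vectors (a training batch)."""
    L1 = sum(sq(u - psi_inv(psi(u))) for u in us)              # autoencoder, u
    L2 = sum(sq(F - phi_inv(phi(F))) for F in Fs)              # autoencoder, F
    L3 = sum(sq(L @ psi(u) - phi(F)) for u, F in zip(us, Fs))  # linear operator
    # Superposition over the batch (one plausible form of the batch combination).
    L4 = sq(L @ sum(psi(u) for u in us) - sum(phi(F) for F in Fs))
    L5 = sum(sq(F - phi_inv(L @ psi(u))) for u, F in zip(us, Fs))  # u -> F map
    L6 = sum(sq(u - psi_inv(G @ phi(F))) for u, F in zip(us, Fs))  # F -> u map
    return L1, L2, L3, L4, L5, L6

rng = np.random.default_rng(1)
us = [rng.normal(size=lat) for _ in range(4)]
Fs = [u.copy() for u in us]   # with identity transforms and L = I, u = F
assert all(abs(l) < 1e-12 for l in deepgreen_losses(us, Fs))
```

The total training objective is the sum of the six returned values; mismatched pairs drive the linearity and cross-mapping terms away from zero.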
Note that this final loss function gives the best indication of the performance of the network in solving the nonlinear BVP (9) using the Green’s function. The strategy for solving (9) for a given discrete forcing function F is to encode the forcing function to obtain f = φ(F), apply the Green’s function as in equation (8) to obtain v = Gf, and then decode this vector to get the solution u = ψ⁻¹(v). The discrete version of the convolution with the Green’s function given in equation (8) is multiplication by the matrix G = L⁻¹.
For the encoders ψ and φ and decoders ψ⁻¹ and φ⁻¹, we use a residual neural network (ResNet) architecture [12]. The ResNet architecture has been successful in learning coordinate transformations for physical systems [9] and is motivated by near-identity transformations in physics. The linear operator L is constrained to be a real symmetric matrix and therefore is self-adjoint. Additionally, L is initialized as the identity matrix. Therefore, L is strictly diagonally dominant for at least the early parts of training, which guarantees L is invertible and well-conditioned. For more information on the network architecture and training procedure, see Appendix B.
3 Results
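The invertibility guarantee can be illustrated directly: an identity-initialized operator plus a small symmetric update is strictly diagonally dominant, and the Gershgorin circle theorem then ensures it is nonsingular (the update scale below is an illustrative choice):

```python
import numpy as np

# Early in training, L = I + (small update) is strictly diagonally dominant,
# so the Gershgorin circle theorem guarantees invertibility.

rng = np.random.default_rng(2)
lat = 20
L = np.eye(lat) + 0.01 * rng.normal(size=(lat, lat))  # identity + small update
L = (L + L.T) / 2   # enforce the real-symmetric (self-adjoint) constraint

# Strict diagonal dominance: |L_ii| > sum_{j != i} |L_ij| for every row.
off_diag = np.sum(np.abs(L), axis=1) - np.abs(np.diag(L))
assert np.all(np.abs(np.diag(L)) > off_diag)

# Such a matrix is nonsingular and, here, well-conditioned.
assert np.linalg.cond(L) < 10.0
```

As training proceeds the updates grow, so strict dominance is only guaranteed early on, consistent with the statement above.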
The DeepGreen architecture, which is highlighted in Fig. 2 and whose detailed loss functions are discussed in the last section, is demonstrated on a number of canonical nonlinear BVPs. The first three BVPs are one-dimensional systems and the final one is a two-dimensional system. The nonlinearities in these problems do not allow for a fundamental solution, thus recourse is typically made to numerical computations to achieve a solution. DeepGreen, however, can produce a fundamental solution which can then be used for any new forcing of the BVP.
3.1 Cubic Helmholtz
The architecture and methodology are best illustrated using a basic example problem. The example problem uses a nonhomogeneous second-order nonlinear Sturm–Liouville model with constant coefficients and a cubic nonlinearity, thus making it a cubic Helmholtz equation. The differential equation is given by
(18a) u_xx + αu + εu³ = F(x), x ∈ Ω,
(18b) u = 0, x ∈ ∂Ω,
where u is the solution when the system is forced with F(x), and α and ε are constant coefficients. The notation u_xx denotes ∂²u/∂x². The dataset contains pairs of discretized solutions and forcings, (u_k, F_k). The forcing functions used for training are cosine and Gaussian functions; details of data generation and the forcing functions are provided in Appendix A.
The data is divided into three groups: training, validation, and test. The training and validation sets are used for training the model, and the test set is used to evaluate the results. Each set contains vector pairs u_k and F_k.
3.1.1 Training the Model
The autoencoders used in this example are constructed with fully connected layers. In both autoencoders, a ResNet-like identity skip connection connects the input layer to the layer before the dimension reduction in the encoder, and the first full-dimension layer in the decoder to the final output layer (see Figure 14).
The model is trained in a two-step procedure. First, the autoencoders are trained, without a connection in the latent space, to condition the networks as autoencoders. In this first phase, only the autoencoder loss functions listed in Figure 2 are active (𝓛₁ and 𝓛₂). After a set number of epochs, the latent spaces are connected by an invertible matrix operator, L, and the remaining four loss functions in Figure 2 become active (𝓛₃–𝓛₆). In the final phase of training, the autoencoder learns to encode a latent space representation of the system where properties associated with linear systems hold true, such as linear superposition.
Figure 3 shows a typical training loss curve. The vertical dashed line indicates the transition between the two training phases. The models in this work are trained for 75 epochs in the first autoencoder-only phase and 2750 epochs in the final phase. The first-phase epoch count was tuned empirically based on final model performance. The final-phase epoch count was selected for practical reasons; the training curve tended to flatten around 2750 epochs in all of our tested systems. The autoencoder latent spaces are critically important: the latent space is the transformed vector space where linear properties (e.g., superposition) are enforced, which enables the solution of nonlinear problems. In the one-dimensional problems, the latent space vectors v and f are reduced-dimension representations of u and F.
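The two-phase schedule can be sketched as a training loop that toggles which losses are active (the DummyModel and train_step names are placeholders for the real training code; the epoch counts are those quoted above):

```python
# Sketch of the two-phase training schedule.  train_step and DummyModel are
# placeholders for the real optimization code.

PHASE1_EPOCHS = 75     # autoencoder-only conditioning phase
PHASE2_EPOCHS = 2750   # all six losses active

def train(model, data):
    for epoch in range(PHASE1_EPOCHS + PHASE2_EPOCHS):
        if epoch < PHASE1_EPOCHS:
            active = ("L1", "L2")                          # autoencoder losses only
        else:
            active = ("L1", "L2", "L3", "L4", "L5", "L6")  # latent spaces linked by L
        model.train_step(data, active)

class DummyModel:
    """Minimal stand-in that records which losses were active each epoch."""
    def __init__(self):
        self.history = []
    def train_step(self, data, active):
        self.history.append(active)

m = DummyModel()
train(m, data=None)
assert len(m.history) == 2825                 # 75 + 2750 total epochs
assert m.history[74] == ("L1", "L2")          # last epoch of phase one
assert len(m.history[75]) == 6                # first epoch of phase two
```

In the real implementation the inactive losses are simply omitted from the summed objective during the first phase.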
The latent spaces did not have any obvious physical interpretation, and qualitatively appeared similar to the representations shown in Figure 4. We trained 100 models to check the consistency of the learned model and latent space representations, but discovered that the latent spaces varied considerably (see Appendix C). This implies the existence of infinitely many solutions to the coordinate transform problem, which indicates that further constraints could be placed on the model.
Despite lacking obvious physical interpretations, the latent space enables discovery of an invertible operator L which describes the linear system Lv = f. The operator matrix can be inverted to yield the Green’s function matrix G = L⁻¹, which allows computation of solutions to the linearized system, v = Gf. An example of the operator and its inverse is shown in Figure 5. The operator and Green’s function shown in Figure 5 display an important feature seen in all of the results: a diagonally dominant structure. We initialize the operator as an identity matrix, but the initialization had little impact on the diagonally dominant form of the learned operator and Green’s function matrices (see Appendix C). The diagonally dominant operators indicate that the deep learning network tends to discover a coordinate transform yielding a nearly orthonormal basis, which mirrors the common approach of diagonalization in spectral theory for Hermitian operators. Furthermore, diagonally dominant matrices guarantee favorable properties for this application, such as being well-conditioned and nonsingular.
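Extracting and using the learned Green’s function then amounts to a matrix inverse and a matrix–vector product, as in this sketch with a synthetic diagonally dominant operator standing in for a trained one:

```python
import numpy as np

# Given a learned diagonally dominant operator L, the latent Green's function
# is simply its inverse, and latent solves are matrix-vector products.
# (L here is a synthetic diagonally dominant matrix, not a trained one.)

rng = np.random.default_rng(3)
lat = 20
L = np.eye(lat) + 0.05 * rng.normal(size=(lat, lat))
L = (L + L.T) / 2                     # symmetric, as constrained in training
G = np.linalg.inv(L)                  # latent Green's function matrix

f = rng.normal(size=lat)              # an encoded forcing
v = G @ f                             # latent solution via the Green's function
assert np.allclose(L @ v, f)          # v indeed satisfies L v = f

# The diagonal of L dominates, mirroring the structure seen in Figure 5.
assert np.all(np.abs(np.diag(L)) > 0.5)
```

Because L stays close to the identity, both L and G remain well-conditioned and the solve is numerically benign.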
We emphasize that training parameters and model construction choices used in this work were not extensively optimized. We expect the model performance can be improved in a myriad of ways including extending training times, optimizing model architecture, modifying the size of the latent spaces, restricting the form of the operator, and applying additional constraints to the model. However, these topics are not the main scope of the present work; our focus is to illustrate the use of autoencoders as a coordinate transform for finding solutions to nonlinear BVPs.
3.1.2 Evaluating the Model
The goal for this model is to find a Green’s function for computing solutions u to a nonlinear BVP governed by (6) for a given forcing function F. Similarly, we can estimate the forcing term, F, given the solution u. The model is consequently evaluated by its ability to use the learned Green’s function and operator for predicting solutions and forcings, respectively, for new problems from a withheld test data set. Recall that the original model is trained on data where the forcing function is a cosine or Gaussian function. As shown in Figure 6, the model performs well on withheld test data where the forcing functions are cosine or Gaussian functions, producing a low cumulative loss. The solutions and forcings are depicted for the best, mean, and worst samples as scored by cumulative loss.
It is important to note that the test data used in Figure 6 is similar to the training and validation data. Because ML models typically work extremely well in interpolation problems, it is reasonable to expect the model to perform well on this test data set.
As an interesting test to demonstrate the ability of the model to extrapolate, we prepared a separate set of test data containing solutions whose forcing functions F are cubic polynomials. This type of data was not present in training, and it provides some insight into the generality of the learned linear operator and Green’s function matrices. Figure 7 shows examples of how the model performs on these cubic polynomial forcing functions. Similar to Figure 6, the best, mean, and worst samples are shown as graded by overall loss.
Figures 6 and 7 provide some qualitative insight into the model’s performance on specific instances selected from the pool of evaluated data. A quantitative perspective of the model’s performance is presented in Figure 8. This box plot shows statistics (median, first and third quartiles, and range) for four of the loss functions evaluated on the similar (cosine and Gaussian) test data. Note that the superposition loss function is not scored in this plot because it can only be evaluated within a single batch, and the loss depends on batch size and composition.
In conclusion, the DeepGreen architecture enables discovery of invertible, linearizing transformations that facilitate identification of a linear operator and Green’s function to solve nonlinear BVPs. It is tested on data both similar and dissimilar to the training data, and evaluated on the loss functions that guide the training procedure. The discovered operator and Green’s function take on a surprisingly diagonally dominant structure, which hints at the model’s preference to learn an optimal basis. The model appears to extrapolate beyond the test data, suggesting that the learned operator is somewhat general to the system.
3.2 Nonlinear Sturm–Liouville and Biharmonic Operators
In addition to the example system described above, the approach was applied to two other one-dimensional systems. We used the same training procedure and forcing functions that were described in Section 3.1. The first is a system governed by the nonlinear Sturm–Liouville equation
−(p(x)u_x)_x + q(x)u + εu³ = F(x),
where ε controls the extent of nonlinearity, and p(x) and q(x) are spatially varying coefficients. The final one-dimensional system is a biharmonic operator with an added cubic nonlinearity,
u_xxxx + pu_xx + qu + εu³ = F(x),
where p and q are the coefficients and ε controls the nonlinearity. As in the prior example, the forcing functions in the training data are cosine and Gaussian functions, which are described further in Appendix A.
Results for all the onedimensional models, including the cubic Helmholtz example from Section 3.1, are presented in Table 1. Model performance is quantitatively summarized by box plots and the Green’s function matrix is shown for each model.
Importantly, the learned Green’s function matrices consistently exhibit a diagonally dominant structure. The losses for the nonlinear cubic Helmholtz equation and the nonlinear Sturm–Liouville equation are similar, which indicates that spatially varying coefficients do not make the problem significantly more difficult for the DeepGreen architecture. In contrast, the losses for the nonlinear biharmonic equation are about an order of magnitude higher than for the other two systems. This result implies the fourth-order problem is more difficult than the second-order problems. Also of note is that the linear operator loss is consistently the highest loss across all models. Therefore, it is easier for DeepGreen to find invertible transformations for the solutions and forcing functions than it is to find a linear operator that connects the two latent spaces.
3.3 Nonlinear Poisson Equation
We also tested our method on a two-dimensional system. The two-dimensional model is a nonlinear version of the Poisson equation with Dirichlet boundary conditions
(19a) ∇²u + εu³ = F(x, y), (x, y) ∈ Ω,
(19b) u = 0, (x, y) ∈ ∂Ω,
where Ω ⊂ ℝ². Similar to the one-dimensional models, the forcing functions used to train the model are cosine and Gaussian functions, the details of which are provided in Appendix A. The sizes of the data sets are also similar to those of the one-dimensional data sets, with training, validation, and test sets of vector pairs u_k and F_k.
The network architecture of the encoders and decoders for the two-dimensional example differs from the one-dimensional examples. Instead of fully connected layers, convolutional layers are used in the encoders and decoders. However, we still use a ResNet architecture, and the latent space vectors are again reduced-dimension representations. Full details on the network architecture can be found in Appendix B. Note that the method proposed for discovering Green’s functions allows for any network architecture to be used for the encoders and decoders. For the one-dimensional examples, similar results were obtained using fully connected and convolutional layers. However, the convolutional architecture was better in the two-dimensional case and also allowed for a more manageable number of parameters for the wider network that resulted from discretizing the two-dimensional space.
The operator and Green’s function for the two-dimensional model are similar to those shown in Figure 5. The diagonal dominance is even more prevalent in this case than in the one-dimensional examples. The model was evaluated on test data containing cosine and Gaussian forcing functions. Figure 9a shows the true solution and forcing function as well as the network predictions for the example from the test data for which the model performed the best (i.e., the smallest value of the loss). The difference between the true and predicted functions is shown in the right column of Figure 9a and is scaled by the infinity norm of the true solution or forcing function. Figure 9b shows similar results but for the worst example from the test data. In both cases, the model gives a qualitatively correct solution for both u and F. Unsurprisingly, the network struggles most on highly localized forcing functions and has the highest error in the region where the forcing occurs.
The model was also evaluated on test data that has cubic polynomial forcing functions, a type of forcing function not found in the training data. The best and worst examples are shown in Figure 10. Although the model does not perform as well for test data which is not similar to the training data, the qualitative features of the predicted solutions are still consistent with the true solutions. Figure 11 shows a box plot of the model’s performance on the similar (cosine and Gaussian) forcing test data. The results are similar to the one-dimensional results and are, in fact, better than those of the biharmonic operator model.
4 Conclusion
We have leveraged the expressive capabilities of deep learning to discover linearizing coordinates for nonlinear BVPs, thus allowing for the construction of the fundamental solution or nonlinear Green’s function. Much like the Koopman operator for timedependent problems, the linearizing transformation provides a framework whereby the fundamental solution of the linear operator can be constructed and used for any arbitrary forcing. This provides a broadly applicable mathematical architecture for constructing solutions for nonlinear BVPs, which typically rely on numerical methods to achieve solutions. Our DeepGreen architecture can achieve solutions for arbitrary forcings by simply computing the convolution of the forcing with the Green’s function in the linearized coordinates.
Given the critical role that BVPs play in the mathematical analysis of constrained physical systems subjected to external forces, the DeepGreen architecture can be broadly applied in nearly every engineering discipline, since BVPs are prevalent in diverse problem domains including fluid mechanics, electromagnetics, quantum mechanics, and elasticity. Importantly, DeepGreen provides a bridge between a classic and widely used solution technique and nonlinear BVPs, which generically do not have principled solution techniques aside from brute-force computation. DeepGreen establishes this bridge by providing a transformation which allows linear superposition to hold. DeepGreen is a flexible, data-driven, deep learning approach to solving nonlinear boundary value problems (BVPs) using a dual-autoencoder architecture. The autoencoders discover an invertible coordinate transform that linearizes the nonlinear BVP and identify both a linear operator and a Green’s function which can be used to solve new nonlinear BVPs. We demonstrated that the method succeeds on a variety of nonlinear systems, including nonlinear Helmholtz and Sturm–Liouville problems, nonlinear elasticity, and a 2D nonlinear Poisson equation. The method merges the strengths of the universal approximation capabilities of deep learning with the physics knowledge of Green’s functions to yield a flexible tool for identifying fundamental solutions to a variety of nonlinear systems.
Acknowledgments
SLB is grateful for funding support from the Army Research Office (ARO W911NF-17-1-0306). JNK acknowledges support from the Air Force Office of Scientific Research (FA9550-19-1-0011).
References
 [1] (2015) The fenics project version 1.5. Archive of Numerical Software 3 (100). External Links: Document Cited by: §A.2.
 [2] (2006) Pattern Recognition and Machine Learning. Springer. Cited by: §1.
 [3] (2016) Koopman invariant subspaces and finite linear representations of nonlinear dynamical systems for control. PLOS ONE 11 (2), pp. 1–19. Cited by: §1.
 [4] (2012) Geometry of the ergodic quotient reveals coherent structures in flows. Physica D: Nonlinear Phenomena 241 (15), pp. 1255–1269. Cited by: §1.
 [5] (201911) Datadriven discovery of coordinates and governing equations. Proceedings of the National Academy of Sciences 116 (45), pp. 22445–22451 (en). External Links: Document, ISSN 00278424, 10916490 Cited by: §1, §1.

[6]
(198912)
Approximation by superpositions of a sigmoidal function
. Mathematics of Control, Signals, and Systems (MCSS) 2 (4), pp. 303–314. External Links: Document, ISSN 09324194, Link Cited by: §1.  [7] (2018) Parsimonious representation of nonlinear dynamical systems through manifold learning: a chemotaxis case study. Applied and Computational Harmonic Analysis 44 (3), pp. 759–773. Cited by: §1.
 [8] (1822) Théorie analytique de la chaleur. Chez Firmin Didot. Cited by: §1.
 [9] (2020) Deep learning models for global coordinate transformations that linearize PDEs. To appear in the European Journal of Applied Mathematics. Note: Preprint Available: arXiv:1911.02710 Cited by: §1, §1, §2.
 [10] (1998) Identification of distributed parameter systems: a neural net based approach. Computers & chemical engineering 22, pp. S965–S968. Cited by: §1.
 [11] (2016) Deep learning. MIT Press. Cited by: §1.

[12]
(2016)
Deep residual learning for image recognition.
In
Proceedings of the IEEE conference on computer vision and pattern recognition
, pp. 770–778. Cited by: §1, §2.  [13] (1990) Universal approximation of an unknown mapping and its derivatives using multilayer feedforward networks. Neural networks 3 (5), pp. 551–560. Cited by: §1.
 [14] (202009) Learning constitutive relations from indirect observations using deep neural networks. Journal of Computational Physics 416, pp. 109491. External Links: ISSN 00219991, Document, Link Cited by: §1.
 [15] (2007) Classical electrodynamics. John Wiley & Sons. Cited by: §1.
 [16] (2018) Datadriven model reduction and transfer operator approximation. Journal of Nonlinear Science 28 (3), pp. 985–1010. Cited by: §1.
 [17] (1931) Hamiltonian systems and transformation in Hilbert space. Proceedings of the National Academy of Sciences 17 (5), pp. 315–318. External Links: http://www.pnas.org/content/17/5/315.full.pdf+html, Link Cited by: §1.
 [18] (2016) Dynamic mode decomposition: datadriven modeling of complex systems. SIAM. Cited by: §1.

[19]
(2018)
Applied Koopman theory for partial differential equations and datadriven modeling of spatiotemporal systems
. Complexity 2018 (6010634), pp. 1–16. Cited by: §1.  [20] (2020) From fourier to koopman: spectral methods for longterm time series prediction. arXiv preprint arXiv:2004.00574. Cited by: §1.
 [21] (2017) Extended dynamic mode decomposition with dictionary learning: a datadriven adaptive spectral decomposition of the koopman operator. Chaos: An Interdisciplinary Journal of Nonlinear Science 27 (10), pp. 103111. Cited by: §1.
 [22] (2020) Hierarchical deep learning of multiscale differential equation timesteppers. arXiv preprint arXiv:2008.09768. Cited by: §1.
 [23] (2012) Automated solution of differential equations by the finite element method. Springer. External Links: Document, ISBN 9783642230981 Cited by: §A.2.
 [24] (2010) DOLFIN: automated finite element computing. ACM Transactions on Mathematical Software 37 (2). External Links: Document Cited by: §A.2.
 [25] (2018) Deep learning for universal linear embeddings of nonlinear dynamics. Nature communications 9 (1), pp. 4950. Cited by: §1, §1.
 [26] (2016) Understanding deep convolutional networks. Philosophical Transactions of the Royal Society A: Mathematical, Physical and Engineering Sciences 374 (2065), pp. 20150203. Cited by: §1.
 [27] (2018) VAMPnets: deep learning of molecular kinetics. Nature Communications 9 (5). Cited by: §1.
 [28] (2004) Comparison of systems with complex behavior. Physica D: Nonlinear Phenomena 197 (1–2), pp. 101 – 133. External Links: Document, ISSN 01672789, Link Cited by: §1.
 [29] (2005) Spectral properties of dynamical systems, model reduction and decompositions. Nonlinear Dynamics 41 (13), pp. 309–325. Cited by: §1.
 [30] (2013) Analysis of fluid flows via spectral properties of the Koopman operator. Annual Review of Fluid Mechanics 45, pp. 357–378. Cited by: §1.
 [31] (2013) A variational approach to modeling slow processes in stochastic dynamical systems. Multiscale Modeling & Simulation 11 (2), pp. 635–655. Cited by: §1.
 [32] (2014) Variational approach to molecular kinetics. Journal of chemical theory and computation 10 (4), pp. 1739–1752. Cited by: §1.
 [33] (2019) Linearlyrecurrent autoencoder networks for learning dynamics. SIAM Journal on Applied Dynamical Systems 18 (1), pp. 558–593. Cited by: §1.
 [34] (2018) Koopman analysis of burgers equation. Physical Review Fluids 3 (7), pp. 071901. Cited by: §1.
 [35] (2020) Physicsinformed probabilistic learning of linear embeddings of nonlinear dynamics with guaranteed stability. SIAM Journal on Applied Dynamical Systems 19 (1), pp. 480–509. Cited by: §1.
 [36] (1995) Nonlinear system identification using neural networks: dynamics and instabilities. Neural networks for chemical engineers, pp. 409–442. Cited by: §1.
 [37] (2009) Spectral analysis of nonlinear flows. J. Fluid Mech. 645, pp. 115–127. Cited by: §1.
 [38] (2019) Deep learning of dynamics and signalnoise decomposition with timestepping constraints. Journal of Computational Physics 396, pp. 483–506. Cited by: §1.
 [39] (2010) Dynamic mode decomposition of numerical and experimental data. Journal of Fluid Mechanics 656, pp. 5–28 (English). External Links: ISSN 0022-1120 Cited by: §1.
 [40] (2011) Green’s functions and boundary value problems. Vol. 99, John Wiley & Sons. Cited by: §1, §1.
 [41] (2000) Boundary value problems of mathematical physics: 2-volume set. Vol. 29, SIAM. Cited by: §1.
 [42] (2017) Learning Koopman invariant subspaces for dynamic mode decomposition. In Advances in Neural Information Processing Systems, pp. 1130–1140. Cited by: §1.
 [43] (2017) Time-lagged autoencoders: deep learning of slow collective variables for molecular kinetics. The Journal of Chemical Physics 148 (24), pp. 241703. Cited by: §1.
 [44] (2015) A data-driven approximation of the Koopman operator: extending dynamic mode decomposition. Journal of Nonlinear Science 25, pp. 1307–1346. Cited by: §1.
 [45] (2015) A kernel-based method for data-driven Koopman spectral analysis. Journal of Computational Dynamics 2 (2), pp. 247–265. Cited by: §1.
 [46] (1989) Quantum electronics. John Wiley & Sons. Cited by: §1.
 [47] (2019) Learning deep neural network representations for Koopman operators of nonlinear dynamical systems. In 2019 American Control Conference (ACC), pp. 4832–4839. Cited by: §1.
Appendix A Data Generation
A.1 1D Problems
The data for all of the one-dimensional systems are created using the same method and forcing functions. Each solution is computed on an evenly-spaced 128-point grid using MATLAB’s bvp5c solver with a relative error tolerance of and an absolute error tolerance of . The forcing functions are designed to yield a variety of solutions such that .
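As a point of reference, an analogous setup in Python uses scipy.integrate.solve_bvp in place of bvp5c. The sketch below assumes a cubic Helmholtz form u'' + αu + εu³ = f with homogeneous Dirichlet boundary conditions; the coefficients, tolerance, and Gaussian forcing parameters are illustrative choices, not the paper's exact values.

```python
import numpy as np
from scipy.integrate import solve_bvp

# Hypothetical cubic Helmholtz BVP: u'' + alpha*u + eps*u**3 = f(x),
# with u(0) = u(1) = 0.  alpha, eps, and f are illustrative only.
alpha, eps = -1.0, 0.4
f = lambda x: np.exp(-(x - 0.5)**2 / (2 * 0.1**2))  # Gaussian forcing

def rhs(x, y):
    # y[0] = u, y[1] = u'
    return np.vstack([y[1], f(x) - alpha * y[0] - eps * y[0]**3])

def bc(ya, yb):
    # Homogeneous Dirichlet boundary conditions
    return np.array([ya[0], yb[0]])

x = np.linspace(0.0, 1.0, 128)      # 128-point grid as in the text
y0 = np.zeros((2, x.size))          # trivial initial guess
sol = solve_bvp(rhs, bc, x, y0, tol=1e-5, max_nodes=5000)
u = sol.sol(x)[0]                   # solution sampled on the grid
```

Each forcing function in the parameter sweep would be solved this way, with solutions that fail the tolerance discarded as described below.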
The training data consists of two types of systems: Gaussian-forced and cosine-forced systems. The Gaussian-forced systems have forcing functions of the form
where , , and . The cosine forcing functions are of the form
where and . This gives a total of 5000 Gaussian-forced solutions and 7371 cosine-forced solutions. For the cubic Helmholtz equation and the nonlinear Sturm–Liouville equation with spatially-varying coefficients, all 12,371 solutions were within the error tolerance. However, 97 solutions of the nonlinear biharmonic equation did not meet the error tolerance and were therefore discarded. Of the remaining data, 10% are randomly chosen and withheld as test data; of the remainder, 80% are used as training data and 20% as validation data.
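The two forcing families and the data split can be sketched as follows. The parameter ranges are elided in the text above, so the grids and ranges here are placeholders for illustration only:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical parameter ranges -- the paper's exact values are elided
# in the text, so these are placeholders for illustration.
def gaussian_forcing(x, a, b, c):
    return a * np.exp(-(x - b)**2 / (2 * c**2))

def cosine_forcing(x, a, w):
    return a * np.cos(w * x)

x = np.linspace(0, 1, 128)
solutions = [gaussian_forcing(x, a, b, 0.1)
             for a in np.linspace(-5, 5, 50)
             for b in np.linspace(0.1, 0.9, 10)]

# 10% held-out test data, then an 80/20 train/validation split of the
# remainder, mirroring the split described above.
idx = rng.permutation(len(solutions))
n_test = len(idx) // 10
test_idx = idx[:n_test]
rest = idx[n_test:]
n_val = len(rest) // 5
val_idx, train_idx = rest[:n_val], rest[n_val:]
```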
In order to test the ability of the network to generalize, we also have another test data set that consists of solutions with cubic forcing functions of the form
where , and cubic forcing functions of the form
where , , and . There are a total of 4140 solutions with cubic forcing functions.
A.2 2D Problem
The two-dimensional data satisfies the nonlinear Poisson equation (19). The solutions are computed with a finite element method using the DOLFIN library [24] of the FEniCS Project [1, 23]. The forcing functions are similar to those of the one-dimensional data in that there are Gaussian and cosine forcing functions, along with a separate data set of cubic polynomial forcing functions used to test the ability of the network to generalize. The Gaussian forcing functions are of the form
where , , and . The cosine forcing functions are of the form
where and . The cubic forcing functions are of the form
where , and cubic forcing functions of the form
where , , and . There are 6250 solutions with Gaussian forcing functions, 7371 solutions with cosine forcing functions, and 4416 solutions with cubic forcing functions.
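The 2D solves above use FEniCS/DOLFIN. As a library-free illustration of the Newton linearization that such a nonlinear solve performs internally, the sketch below handles a 1D analogue on a finite-difference grid; the cubic nonlinearity u³ is an assumed stand-in for the nonlinearity of equation (19), and the forcing is illustrative.

```python
import numpy as np

# Newton's method for a 1D analogue of the nonlinear Poisson problem:
# -u'' + u**3 = f, with u(0) = u(1) = 0.  This is not the paper's 2D
# FEniCS setup, only a sketch of the same linearization.
n = 127                       # interior grid points
h = 1.0 / (n + 1)
x = np.linspace(h, 1 - h, n)
f = np.cos(2 * np.pi * x)     # illustrative forcing

# Second-difference matrix for -u'' with Dirichlet BCs
D2 = (np.diag(2 * np.ones(n)) - np.diag(np.ones(n - 1), 1)
      - np.diag(np.ones(n - 1), -1)) / h**2

u = np.zeros(n)
for _ in range(30):
    F = D2 @ u + u**3 - f          # residual of the discretized PDE
    J = D2 + np.diag(3 * u**2)     # Jacobian of the residual
    du = np.linalg.solve(J, -F)    # Newton step
    u += du
    if np.linalg.norm(du) < 1e-10:
        break
```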
Appendix B Neural Network Implementation Details
The model training procedure is kept constant for all of the examples in this work. The networks are optimized with an Adam optimizer (, ). Every numerical experiment starts by training a set of 20 models for a ‘small’ number of epochs. Each of the 20 models has a randomly selected learning rate for the Adam optimizer, uniformly selected between and . The initial training period consists of two phases: autoencoder-only (75 epochs) and full model (250 epochs). The autoencoder-only phase enforces only the autoencoder losses and
during backpropagation. A checkpoint algorithm is used to keep track of the model with the lowest overall loss throughout the training procedure. At the end of the initial period, the best model is selected and the others are eliminated. The best model is trained for an additional 2500 epochs.
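The model-selection loop described above can be sketched as follows. The learning-rate bounds are elided in the text, so the bounds below are hypothetical, and `train_model` is a mock stand-in for the two-phase autoencoder-only / full-model training:

```python
import math
import random

random.seed(0)

# Hypothetical learning-rate bounds (the paper's exact values are
# elided in the text).
LR_LOW, LR_HIGH = 1e-4, 1e-2

def train_model(lr, epochs):
    # Mock stand-in for real training: returns a fake "loss" so the
    # selection logic below is runnable.  A real implementation would
    # run the autoencoder-only phase, then the full-model phase, with
    # checkpointing of the lowest overall loss.
    return abs(math.log10(lr) + 3) + 0.01 * random.random()

# Initial period: 20 models, each with a randomly selected learning
# rate, trained for a 'small' number of epochs (75 + 250).
candidates = []
for _ in range(20):
    lr = random.uniform(LR_LOW, LR_HIGH)
    loss = train_model(lr, epochs=75 + 250)
    candidates.append((loss, lr))

# Keep the best model and discard the rest, then train it further.
best_loss, best_lr = min(candidates)
final_loss = train_model(best_lr, epochs=2500)
```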
There are two network architectures in this work. The architectures depicted in Figures 12 and 13 are applied to the two-dimensional nonlinear Poisson BVP. The architecture depicted in Figure 14 is applied to the one-dimensional problems.
The two architectures have a few training variables in common. Both models use variance scaling initialization, regularization (), and ReLU activation functions for fully connected (1D architecture) and convolutional (2D architecture) layers. Notably, the two layers immediately before and after the latent space do not have activation functions. A normalized mean squared error loss function is used for all of the loss functions, as described in Section 2. The models are trained in batches of 64 samples.

The 2D architecture utilizes convolutional layers and pooling layers, as shown in Figures 12 and 13. All convolutional layers use a kernel size of . There are differences between the convolutional layers in the encoder and those in the decoder. The encoder convolutional layers use a stride size of and an increasing number of filters (). The deconvolutional layers use a stride size of with a decreasing number of filters (). Pooling layers are similar for both the encoder and decoder, with a stride size of and a pool size of .
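A quick sanity check on how the feature-map sizes evolve through a strided convolutional encoder is shown below. The kernel size, strides, filter counts, and input resolution are elided in the text, so the values here (3×3 kernels, stride 2, padding 1, a 64×64 input, four layers) are assumptions for illustration:

```python
# Spatial output size of a convolution: floor((n + 2p - k)/s) + 1.
# Kernel/stride/padding values here are assumed, not the paper's.
def conv_out(size, kernel=3, stride=2, pad=1):
    return (size + 2 * pad - kernel) // stride + 1

size = 64                     # assumed input resolution
sizes = [size]
for _ in range(4):            # four strided conv layers (assumed)
    size = conv_out(size)
    sizes.append(size)
# Each stride-2 layer halves the grid: 64 -> 32 -> 16 -> 8 -> 4
```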
Appendix C Additional Results
The repeatability of the results and models learned by the DeepGreen architecture is interesting to study from the perspective of operator convergence and latent space representations. In both cases, we aim to investigate the convergence of the model parameters to determine whether the learned latent spaces and operators are unique.
C.1 Operator Initialization
We repeat the training procedure for DeepGreen with three different initialization approaches for the operator . Again, we train with data from the example nonlinear cubic Helmholtz model. This experiment compares the initial values of the operator with its final values at the end of training to determine whether the DeepGreen approach tends to converge to a specific operator construction. The results in Figure 15 show the initial and final operators for identity-initialized, randomly initialized, and Toeplitz-initialized operator matrices. Impressively, the network tends to learn operators with diagonal dominance for all of the tested initialization strategies. This preference draws strong parallels to the coordinate diagonalization approach commonly used in physics.
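The three initialization strategies, and the diagonal-dominance measure used to compare the trained operators, can be sketched as follows. The latent dimension and the Toeplitz stencil are illustrative assumptions:

```python
import numpy as np
from scipy.linalg import toeplitz

rng = np.random.default_rng(0)
n = 20  # latent dimension (illustrative; not restated in the text)

# Three initialization strategies for the latent operator, as compared
# in the experiment above.
L_identity = np.eye(n)
L_random = rng.standard_normal((n, n)) / np.sqrt(n)
# Banded Toeplitz matrix resembling a second-difference stencil
# (an assumed choice of Toeplitz structure).
L_toeplitz = toeplitz(np.r_[-2.0, 1.0, np.zeros(n - 2)])

def diag_dominance(L):
    """Fraction of the operator's total mass on the diagonal."""
    return np.abs(np.diag(L)).sum() / np.abs(L).sum()
```

The trained operators in Figure 15 concentrate their mass on the diagonal regardless of which of these initializations is used.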
C.2 Latent Space Analysis
We repeat the training procedure for the example system, the nonlinear cubic Helmholtz model, a total of one hundred times. A single sample was selected from the training data, and the latent space representations, and , of the input vectors and were computed. Statistics for the latent space representations are presented in Figure 16. It is evident that the latent space vectors are not identical between runs and that the values in the vector do not follow any particular statistical distribution. This implies that the learned weights in the model, and the learned latent space representations, vary for each training instance and do not converge to a single representation.
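The per-component bookkeeping behind this comparison is straightforward. In the sketch below the latent vectors are simulated with random draws purely to make the code self-contained; in the experiment they come from the one hundred separately trained models, and the latent dimension is an assumed value:

```python
import numpy as np

rng = np.random.default_rng(0)

# Latent vector for the same training sample, one row per training run.
# Random values stand in for the real model outputs here.
n_runs, latent_dim = 100, 20
v_runs = rng.standard_normal((n_runs, latent_dim))

comp_mean = v_runs.mean(axis=0)  # per-component mean across runs
comp_std = v_runs.std(axis=0)    # per-component spread across runs

# Spreads that are large relative to the means indicate the runs do
# not converge to a single latent representation.
```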
C.3 Residual Network Architecture
All of the autoencoders used in this work use a residual network (ResNet) architecture. In order to demonstrate the advantage of the ResNet architecture, we trained six models using the DeepGreen architecture for each of the four systems. Three of the models use the ResNet skip connections, while three do not use the ResNet architecture.
For the two simplest systems, the nonlinear cubic Helmholtz equation and the nonlinear Sturm–Liouville equation, the difference between the models with and without the ResNet skip connections was negligible. For the nonlinear cubic Helmholtz equation, the mean validation loss for the non-ResNet models was and the median validation loss was . Using the ResNet architecture resulted in a mean validation loss of and a median validation loss of . The ResNet architecture resulted in a lower median validation loss but a higher mean because one of the three models performed much more poorly than the other two. The results for the nonlinear Sturm–Liouville system are analogous. With a non-ResNet architecture, the mean validation loss was and the median validation loss was . With a ResNet architecture, the mean validation loss was and the median validation loss was . Therefore, the ResNet architecture produced results similar to a non-ResNet architecture for these two simple systems.
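The mean/median discrepancy noted above is a generic effect of one poorly converged run among three; a toy computation with made-up loss values shows it:

```python
import numpy as np

# Hypothetical validation losses for three models, one of which
# converged poorly.  The values are made up for illustration.
losses = np.array([1e-4, 1.2e-4, 5e-2])

mean_loss = losses.mean()       # dominated by the outlier model
median_loss = np.median(losses) # stays near the typical loss
```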
For the two systems with larger losses, the nonlinear biharmonic equation in 1D and the 2D nonlinear Poisson equation, the ResNet architecture was clearly superior to a non-ResNet architecture. For the nonlinear biharmonic equation, the ResNet architecture yields a mean validation loss of and a median validation loss of for the three models, compared with and , respectively, for the non-ResNet architecture. Therefore, the ResNet architecture performed better in terms of both the mean and median loss. The ResNet architecture is absolutely vital for the nonlinear Poisson system. Without it, the model essentially did not converge: both the mean and median validation losses were . In contrast, the ResNet architecture had a mean validation loss of and a median of .