DeepGreen: Deep Learning of Green's Functions for Nonlinear Boundary Value Problems

Boundary value problems (BVPs) play a central role in the mathematical analysis of constrained physical systems subjected to external forces. Consequently, BVPs frequently emerge in nearly every engineering discipline and span problem domains including fluid mechanics, electromagnetics, quantum mechanics, and elasticity. The fundamental solution, or Green's function, is a leading method for solving linear BVPs that enables facile computation of new solutions to systems under any external forcing. However, fundamental Green's function solutions for nonlinear BVPs are not feasible since linear superposition no longer holds. In this work, we propose a flexible deep learning approach to solve nonlinear BVPs using a dual-autoencoder architecture. The autoencoders discover an invertible coordinate transform that linearizes the nonlinear BVP and identifies both a linear operator L and Green's function G which can be used to solve new nonlinear BVPs. We find that the method succeeds on a variety of nonlinear systems including nonlinear Helmholtz and Sturm–Liouville problems, nonlinear elasticity, and a 2D nonlinear Poisson equation. The method merges the strengths of the universal approximation capabilities of deep learning with the physics knowledge of Green's functions to yield a flexible tool for identifying fundamental solutions to a variety of nonlinear systems.


1 Introduction

Boundary value problems (BVPs) are ubiquitous in the sciences [41]. From elasticity to quantum electronics, BVPs have been fundamental in the development and engineering design of numerous transformative technologies of the 20th century. Historically, the formulation of many canonical problems in physics and engineering results in linear BVPs: from Fourier formulating the heat equation in 1822 [8] to more modern applications such as designing chip architectures in the semiconductor industry [15, 46]. Much of our theoretical understanding of BVPs comes from the construction of the fundamental solution of the BVP, commonly known as the Green's function [40]. The Green's function solution relies on a common property of many BVPs: linearity. Specifically, general solutions rely on linear superposition, which limits their usefulness in many modern applications where BVPs are often heterogeneous and nonlinear. By leveraging modern deep learning, we are able to learn transformations that render nonlinear BVPs linear so that we can construct the Green's function solution. Our deep learning of Green's functions, DeepGreen, provides a transformative architecture for modern solutions of nonlinear BVPs.

DeepGreen is inspired by recent works which use deep neural networks (DNNs) to discover advantageous coordinate transformations for dynamical systems [25, 5, 43, 27, 42, 47, 33, 21, 7, 9]. The universal approximation properties of DNNs [6, 13] are ideal for learning coordinate transformations that linearize nonlinear BVPs, ODEs, and PDEs. Such linearizing transforms fall broadly under the umbrella of Koopman operator theory [17], which has a modern interpretation in terms of dynamical systems theory [28, 29, 4, 30]. There are only limited cases in which Koopman operators can be constructed explicitly [3]. However, Dynamic Mode Decomposition (DMD) [39] provides a numerical algorithm for approximating the Koopman operator [37], with many recent extensions that improve on the DMD approximation [18]. More recently, neural networks have been used to construct Koopman embeddings [25, 43, 27, 42, 47, 33, 21, 9], an alternative to enriching the observables of DMD [31, 32, 44, 45, 16, 19, 34]. Thus, neural networks have emerged as a highly effective mathematical tool for approximating complex data [2, 11] with a linear model. DNNs have been used in this context to discover time-stepping algorithms for complex systems [36, 10, 38, 20, 22]. Moreover, DNNs have been used to approximate constitutive models of BVPs [14].

DeepGreen leverages the success of DNNs for dynamical systems to discover coordinate transformations that linearize nonlinear BVPs so that the Green's function solution can be recovered. This allows for the discovery of fundamental solutions for nonlinear BVPs, opening many opportunities in the engineering and physical sciences. DeepGreen exploits physics-informed learning by using autoencoders (AEs) to take data from the original high-dimensional input space to new coordinates at the intrinsic rank of the underlying physics [25, 35, 5]. The architecture also leverages the success of Deep Residual Networks (DRNs) [12], which enables our approach to efficiently handle near-identity coordinate transformations [9].

The Green's function constructs the solution to a BVP for any given forcing by superposition. Specifically, consider the classical linear BVP [40]

(1) $\mathcal{L} u(x) = f(x), \quad x \in \Omega,$

where $\mathcal{L}$ is a linear differential operator, $f(x)$ is a forcing, $x$ is the spatial coordinate, and $\Omega$ is an open set. The boundary conditions are imposed on $\partial\Omega$ with a linear operator $\mathcal{B}$. The fundamental solution is constructed by considering the adjoint equation

(2) $\mathcal{L}^{\dagger} G(x,\xi) = \delta(x - \xi),$

where $\mathcal{L}^{\dagger}$ is the adjoint operator (along with its associated boundary conditions) and $\delta(x-\xi)$ is the Dirac delta function. Taking the inner product of (1) with respect to the Green's function gives the fundamental solution

(3) $u(x) = \int_{\Omega} G(x,\xi)\, f(\xi)\, d\xi,$

which is valid for any forcing $f(x)$. Thus once the Green's function is computed, the solution for arbitrary forcing functions can be easily extracted by integration. This integration represents a superposition of a continuum of delta function forcings that are used to represent $f(x)$.
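To make the role of the Green's function concrete, the following NumPy sketch discretizes a simple linear BVP (here $u_{xx} = f$ on $(0,1)$ with homogeneous Dirichlet conditions, an illustrative stand-in for equation (1)), forms the discrete Green's function as the inverse of the finite-difference operator, and recovers the solution for an arbitrary forcing by a single matrix multiplication. The operator, grid, and forcing are illustrative choices, not taken from the paper.

```python
import numpy as np

# Discretize u''(x) = f(x) on (0, 1) with u(0) = u(1) = 0.
n = 128                                # number of interior grid points
x = np.linspace(0, 1, n + 2)[1:-1]     # interior nodes
h = x[1] - x[0]

# Second-order finite-difference Laplacian: the discrete linear operator L.
L = (np.diag(-2.0 * np.ones(n)) +
     np.diag(np.ones(n - 1), 1) +
     np.diag(np.ones(n - 1), -1)) / h**2

# Discrete Green's function: the inverse of the discrete operator plays the role of G(x, xi).
G = np.linalg.inv(L)

# Any forcing is now handled by superposition, i.e. a single matrix product.
f = np.sin(3 * np.pi * x) + 0.5 * np.exp(-((x - 0.3) / 0.05) ** 2)
u = G @ f                              # discrete analog of u(x) = integral of G(x, xi) f(xi)

# Sanity check: applying the operator to the solution recovers the forcing.
assert np.allclose(L @ u, f)
```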

In many modern applications, nonlinearity plays a fundamental role so that the BVP is of the form

(4) $\mathcal{N} u(x) = F(x), \quad x \in \Omega,$

where $\mathcal{N}$ is a nonlinear differential operator. For this case, the principle of linear superposition no longer holds and the notion of a fundamental solution is lost. However, modern deep learning algorithms allow us the flexibility of learning coordinate transformations (and their inverses) of the form

(5a) $v = \psi_u(u),$
(5b) $f = \psi_F(F),$

such that $v$ and $f$ satisfy a linear BVP of the form (1), for which we can generate the fundamental solution (3). This gives a nonlinear fundamental solution through use of this deep learning transformation.

DeepGreen is a supervised learning algorithm, which is ultimately a high-dimensional interpolation problem [26], for learning the coordinate transformations $\psi_u$ and $\psi_F$. DeepGreen is enabled by a physics-informed deep autoencoder coordinate transformation which establishes superposition for nonlinear BVPs, thus enabling a Koopman BVP framework. The learned Green's function enables accurate construction of solutions with new forcing functions in the same way as a linear BVP. We demonstrate the DeepGreen method on a variety of nonlinear boundary value problems, including a nonlinear 2D Poisson problem, showing that such an architecture can be used in many modern and diverse applications in aerospace, electromagnetics, elasticity, materials, and chemical reactors.

Figure 1: DeepGreen solves nonlinear BVPs by identifying the Green's function of the nonlinear problem using a deep learning approach with a dual autoencoder architecture. A nonhomogeneous linear BVP can be solved using the Green's function approach, but a nonlinear BVP cannot. DeepGreen transforms a nonlinear BVP to a linear BVP, solves the linearized BVP, and then inverse transforms the linear solution to solve the nonlinear BVP.

2 Deep Autoencoders for Linearizing BVPs

Deep AEs have been used to linearize dynamical systems, which are initial value problems. We extend this idea to BVPs. To be precise, we consider BVPs of the form

(6a) $N[u(\mathbf{x})] = F(\mathbf{x}), \quad \mathbf{x} \in \Omega,$
(6b) $B[u(\mathbf{x})] = 0, \quad \mathbf{x} \in \partial\Omega,$

where $\Omega$ is a simply connected open set in $\mathbb{R}^d$ with boundary $\partial\Omega$, $N$ is a nonlinear differential operator, $F$ is the nonhomogeneous forcing function, $B$ is a boundary condition operator, and $u$ is the solution to the BVP. We wish to find a pair of coordinate transformations of the form (5) such that $v = \psi_u(u)$ and $f = \psi_F(F)$ satisfy a linear BVP

(7a) $L[v(\mathbf{y})] = f(\mathbf{y}), \quad \mathbf{y} \in \hat{\Omega},$
(7b) $\hat{B}[v(\mathbf{y})] = 0, \quad \mathbf{y} \in \partial\hat{\Omega},$

where $L$ is a linear differential operator and $\mathbf{y}$ is the spatial coordinate in the transformed domain $\hat{\Omega}$ with boundary $\partial\hat{\Omega}$. Because $L$ is linear, there is a Green's function $G(\mathbf{y},\boldsymbol{\xi})$ such that the solution to the BVP (7) can be obtained through convolution of the Green's function and the transformed forcing function,

(8) $v(\mathbf{y}) = \int_{\hat{\Omega}} G(\mathbf{y},\boldsymbol{\xi})\, f(\boldsymbol{\xi})\, d\boldsymbol{\xi}.$

The coordinate transformations along with the Green's function of the linearized BVP provide the analog of a Green's function for the nonlinear BVP (6). In particular, for a forcing function $F$, the transformed forcing function is $f = \psi_F(F)$. The solution $v$ to the linearized BVP can be obtained using the Green's function as in (8). Then the solution to the nonlinear BVP (6) is obtained by inverting the coordinate transformation, $u = \psi_u^{-1}(v)$.

The question that remains is how to discover the appropriate coordinate transformations $\psi_u$ and $\psi_F$. We leverage the universal approximation properties of neural networks in order to learn these transformations. In order to use neural networks, we first need to discretize the BVP. Let $\mathbf{u}$ be a spatial discretization of $u$ and $\mathbf{F}$ be a discretization of $F$. Then the discretized version of the BVP (6) is

(9a) $N[\mathbf{u}] = \mathbf{F},$
(9b) $B[\mathbf{u}] = \mathbf{0}.$

Neural networks $\psi_u$ and $\psi_F$ are used to transform $\mathbf{u}$ and $\mathbf{F}$ to the latent space vectors $\mathbf{v}$ and $\mathbf{f}$,

(10a) $\mathbf{v} = \psi_u(\mathbf{u}),$
(10b) $\mathbf{f} = \psi_F(\mathbf{F}),$

where $\mathbf{v}$ and $\mathbf{f}$ satisfy the linear equation

(11) $\mathbf{L}\mathbf{v} = \mathbf{f}$

for some matrix $\mathbf{L}$, which is also learned. In order to learn invertible transforms $\psi_u$ and $\psi_F$, we construct the problem as a pair of autoencoder networks.

In this construction, the transforms $\psi_u$ and $\psi_F$ are the encoders and their approximate inverses $\psi_u^{-1}$ and $\psi_F^{-1}$ are the decoders. The network architecture and loss functions are shown in Figure 2.

Figure 2: DeepGreen architecture. Two autoencoders learn invertible coordinate transformations that linearize a nonlinear boundary value problem. The latent space is constrained to exhibit properties of a linear system, including linear superposition, which enables discovery of a Green’s function for nonlinear boundary value problems.
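As a concrete reference point for the architecture in Figure 2, the following PyTorch sketch assembles a dual autoencoder with a learnable latent operator $\mathbf{L}$. The class names, layer widths, latent dimension (20), and exact placement of the skip connections are illustrative assumptions; the paper's actual layer sizes appear in Figure 14 and Appendix B.

```python
import torch
import torch.nn as nn

class ResNetAutoencoder(nn.Module):
    """Fully connected autoencoder with identity skip connections,
    mirroring the 1D DeepGreen encoders/decoders (layer sizes are illustrative)."""
    def __init__(self, n=128, latent=20):
        super().__init__()
        self.enc_hidden = nn.Sequential(
            nn.Linear(n, n), nn.ReLU(),
            nn.Linear(n, n), nn.ReLU(),
        )
        self.enc_out = nn.Linear(n, latent)   # dimension reduction (no activation)
        self.dec_in = nn.Linear(latent, n)    # back to full dimension (no activation)
        self.dec_hidden = nn.Sequential(
            nn.Linear(n, n), nn.ReLU(),
            nn.Linear(n, n),
        )

    def encode(self, x):
        return self.enc_out(self.enc_hidden(x) + x)   # skip: input to pre-reduction layer

    def decode(self, z):
        h = self.dec_in(z)
        return self.dec_hidden(h) + h                 # skip: first full-dim layer to output

class DeepGreen(nn.Module):
    def __init__(self, n=128, latent=20):
        super().__init__()
        self.ae_u = ResNetAutoencoder(n, latent)      # autoencoder for solutions u
        self.ae_F = ResNetAutoencoder(n, latent)      # autoencoder for forcings F
        self.L = nn.Parameter(torch.eye(latent))      # latent linear operator, identity init

    def forward(self, u, F):
        v, f = self.ae_u.encode(u), self.ae_F.encode(F)
        return v, f, self.ae_u.decode(v), self.ae_F.decode(f)
```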

The neural network is trained using numerous and diverse solutions to the nonlinear BVP (9), which can be obtained with many different forcings $\mathbf{F}$. Consider a dataset composed of $N$ pairs of discretized solutions and forcing functions, $\{(\mathbf{u}_j, \mathbf{F}_j)\}_{j=1}^{N}$. The loss function for training the network is the sum of six losses, each of which enforces a desired condition. The loss functions can be split into three categories:

  1. Autoencoder losses: We wish to learn invertible coordinate transformations given by equations (10a) and (10b). In order to do so, we use two autoencoders. The autoencoder for $\mathbf{u}$ consists of an encoder $\psi_u$ which performs the transformation (10a) and a decoder $\psi_u^{-1}$ which inverts the transformation. In order to enforce that the encoder and decoder are inverses, we use the autoencoder loss

    (12) $\mathcal{L}_1 = \frac{1}{N}\sum_{j=1}^{N} \frac{\|\mathbf{u}_j - \psi_u^{-1}(\psi_u(\mathbf{u}_j))\|_2^2}{\|\mathbf{u}_j\|_2^2}.$

    Similarly, there is an autoencoder for $\mathbf{F}$ where the encoder $\psi_F$ performs the transformation (10b). This transformation also has an inverse $\psi_F^{-1}$, enforced by the associated autoencoder loss function

    (13) $\mathcal{L}_2 = \frac{1}{N}\sum_{j=1}^{N} \frac{\|\mathbf{F}_j - \psi_F^{-1}(\psi_F(\mathbf{F}_j))\|_2^2}{\|\mathbf{F}_j\|_2^2}.$
  2. Linearity losses: In the transformed coordinate system, we wish for the BVP to be linear so that the operator can be represented by a matrix $\mathbf{L}$. The matrix $\mathbf{L}$ and the encoded vectors $\mathbf{v}_j = \psi_u(\mathbf{u}_j)$ and $\mathbf{f}_j = \psi_F(\mathbf{F}_j)$ should satisfy equation (11). This is enforced with the linear operator loss

    (14) $\mathcal{L}_3 = \frac{1}{N}\sum_{j=1}^{N} \frac{\|\mathbf{L}\,\psi_u(\mathbf{u}_j) - \psi_F(\mathbf{F}_j)\|_2^2}{\|\psi_F(\mathbf{F}_j)\|_2^2}.$

    The major advantage of working with a linear operator is that linear superposition holds. We use a linear superposition loss in order to further enforce the linearity of the operator in the latent space,

    (15) $\mathcal{L}_4 = \frac{1}{N_b^2}\sum_{j,k} \frac{\|\mathbf{L}\left(\psi_u(\mathbf{u}_j) + \psi_u(\mathbf{u}_k)\right) - \left(\psi_F(\mathbf{F}_j) + \psi_F(\mathbf{F}_k)\right)\|_2^2}{\|\psi_F(\mathbf{F}_j) + \psi_F(\mathbf{F}_k)\|_2^2},$

    where the sum is taken over pairs of samples within a training batch of size $N_b$.
  3. Cross-mapping losses: The losses described above are theoretically sufficient to find coordinate transformations for $\mathbf{u}$ and $\mathbf{F}$ as well as a linear operator $\mathbf{L}$. However, in practice the two autoencoders were not capable of generating the Green's function solution. To rectify this, we add two “cross-mapping” loss functions that incorporate parts of both autoencoders. The first cross-mapping loss enforces the following mapping from $\mathbf{u}$ to $\mathbf{F}$. First, one of the solutions $\mathbf{u}_j$ from the dataset is encoded with $\psi_u$, giving an approximation of $\mathbf{v}_j$. This is then multiplied by the matrix $\mathbf{L}$, giving an approximation of $\mathbf{f}_j$. Then the result is decoded with $\psi_F^{-1}$, giving an approximation of $\mathbf{F}_j$. The $\mathbf{u}$-to-$\mathbf{F}$ cross-mapping loss is given by the formula

    (16) $\mathcal{L}_5 = \frac{1}{N}\sum_{j=1}^{N} \frac{\|\psi_F^{-1}(\mathbf{L}\,\psi_u(\mathbf{u}_j)) - \mathbf{F}_j\|_2^2}{\|\mathbf{F}_j\|_2^2}.$

    We can similarly define a cross-mapping from $\mathbf{F}$ to $\mathbf{u}$. A forcing function $\mathbf{F}_j$ from the dataset is encoded with $\psi_F$, multiplied by the Green's function ($\mathbf{G} = \mathbf{L}^{-1}$), and then decoded with $\psi_u^{-1}$ to give an approximation of $\mathbf{u}_j$. The $\mathbf{F}$-to-$\mathbf{u}$ cross-mapping loss is

    (17) $\mathcal{L}_6 = \frac{1}{N}\sum_{j=1}^{N} \frac{\|\psi_u^{-1}(\mathbf{L}^{-1}\psi_F(\mathbf{F}_j)) - \mathbf{u}_j\|_2^2}{\|\mathbf{u}_j\|_2^2}.$

    A code sketch of all six losses follows this list.
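The following sketch, which reuses the (assumed) class names from the architecture sketch above, computes the six losses for a batch of solution/forcing pairs. The normalization by the squared norm of the target follows the normalized mean squared error described in Appendix B, and the pairing used for the superposition loss is an assumption.

```python
import torch

def nmse(pred, target, eps=1e-8):
    """Normalized mean squared error, averaged over the batch."""
    return (((pred - target) ** 2).sum(-1) / ((target ** 2).sum(-1) + eps)).mean()

def deepgreen_losses(model, u, F):
    """u, F: tensors of shape (batch, n). Returns the six losses L1..L6."""
    psi_u, psi_F, L = model.ae_u, model.ae_F, model.L
    v, f = psi_u.encode(u), psi_F.encode(F)

    L1 = nmse(psi_u.decode(v), u)                  # (12) autoencoder loss for u
    L2 = nmse(psi_F.decode(f), F)                  # (13) autoencoder loss for F
    L3 = nmse(v @ L.T, f)                          # (14) linear operator loss: L v ~ f
    v2, f2 = v.roll(1, dims=0), f.roll(1, dims=0)  # pair each sample with a batch neighbor
    L4 = nmse((v + v2) @ L.T, f + f2)              # (15) superposition loss (assumed pairing)
    L5 = nmse(psi_F.decode(v @ L.T), F)            # (16) u -> F cross-mapping
    G = torch.linalg.inv(L)                        # Green's function matrix
    L6 = nmse(psi_u.decode(f @ G.T), u)            # (17) F -> u cross-mapping
    return L1, L2, L3, L4, L5, L6
```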

Note that this final loss function gives the best indication of the network's ability to solve the nonlinear BVP (9) using the Green's function. The strategy for solving (9) for a given discrete forcing function $\mathbf{F}$ is to encode the forcing function to obtain $\mathbf{f} = \psi_F(\mathbf{F})$, apply the Green's function as in equation (8) to obtain $\mathbf{v} = \mathbf{G}\mathbf{f}$, and then decode to get the solution $\mathbf{u} = \psi_u^{-1}(\mathbf{v})$. The discrete version of the convolution with the Green's function in equation (8) is multiplication by the matrix $\mathbf{G} = \mathbf{L}^{-1}$.
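In code, and again using the assumed names from the earlier sketches, that three-step strategy is simply:

```python
import torch

def solve_nonlinear_bvp(model, F):
    """Predict the solution u of N[u] = F with the learned Green's function."""
    with torch.no_grad():
        f = model.ae_F.encode(F)          # encode the forcing:       f = psi_F(F)
        G = torch.linalg.inv(model.L)     # Green's function matrix:  G = L^{-1}
        v = f @ G.T                       # superposition in latent:  v = G f
        return model.ae_u.decode(v)       # decode the solution:      u = psi_u^{-1}(v)
```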

For the encoders $\psi_u$ and $\psi_F$ and decoders $\psi_u^{-1}$ and $\psi_F^{-1}$, we use a residual neural network (ResNet) architecture [12]. The ResNet architecture has been successful in learning coordinate transformations for physical systems [9] and is motivated by near-identity transformations in physics. The linear operator $\mathbf{L}$ is constrained to be a real symmetric matrix and is therefore self-adjoint. Additionally, $\mathbf{L}$ is initialized as the identity matrix. Therefore, $\mathbf{L}$ is strictly diagonally dominant for at least the early parts of training, which guarantees that $\mathbf{L}$ is invertible and well-conditioned. For more information on the network architecture and training procedure, see Appendix B.
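A minimal sketch of these constraints on $\mathbf{L}$, parameterizing a symmetric matrix initialized at the identity and checking strict diagonal dominance, is shown below; the latent dimension is an assumed value.

```python
import torch

latent = 20                                    # assumed latent dimension
A = torch.nn.Parameter(torch.eye(latent))      # trainable square matrix, identity init

def operator():
    """Symmetrize on every use so L is always real symmetric (self-adjoint)."""
    return 0.5 * (A + A.T)

def strictly_diagonally_dominant(M):
    """Strict row diagonal dominance implies invertibility (Levy-Desplanques theorem)."""
    d = M.abs().diag()
    off = M.abs().sum(dim=1) - d
    return bool((d > off).all())

print(strictly_diagonally_dominant(operator().detach()))   # True at initialization
```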

3 Results

The DeepGreen architecture, which is highlighted in Fig. 2 and whose loss functions are detailed in the previous section, is demonstrated on a number of canonical nonlinear BVPs. The first three BVPs are one-dimensional systems and the final one is a two-dimensional system. The nonlinearities in these problems do not allow for a fundamental solution, so recourse is typically made to numerical computation to obtain solutions. DeepGreen, however, can produce a fundamental solution which can then be used for any new forcing of the BVP.

3.1 Cubic Helmholtz

The architecture and methodology are best illustrated using a basic example problem. The example problem uses a nonhomogeneous second-order nonlinear Sturm–Liouville model with constant coefficients and a cubic nonlinearity, thus making it a cubic Helmholtz equation. The differential equation is given by

(18a) $u_{xx} + \alpha u + \epsilon u^3 = F(x), \quad x \in \Omega,$
(18b) $u = 0, \quad x \in \partial\Omega,$

where $u(x)$ is the solution when the system is forced with $F(x)$, and $\alpha$ and $\epsilon$ are constant coefficients. The notation $u_{xx}$ denotes $\frac{d^2 u}{dx^2}$. The dataset contains pairs of discretized solutions and forcings, $\{(\mathbf{u}_j, \mathbf{F}_j)\}$. The forcing functions used for training are cosine and Gaussian functions; details of data generation and the forcing functions are provided in Appendix A.
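The paper generates its 1D data with MATLAB's bvp5c (Appendix A); the sketch below performs an analogous solve in Python with scipy.integrate.solve_bvp for the cubic Helmholtz form of equation (18). The values of $\alpha$ and $\epsilon$, the domain, and the Gaussian forcing are illustrative placeholders, since those specifics are not recoverable from the text.

```python
import numpy as np
from scipy.integrate import solve_bvp

# u'' + alpha*u + eps*u**3 = F(x) on (0, 2*pi), u(0) = u(2*pi) = 0 (illustrative setup).
alpha, eps = -1.0, 0.3
F = lambda x: np.exp(-((x - np.pi) ** 2) / (2 * 0.3 ** 2))      # one Gaussian forcing

def rhs(x, y):
    # y[0] = u, y[1] = u'
    return np.vstack([y[1], F(x) - alpha * y[0] - eps * y[0] ** 3])

def bc(ya, yb):
    return np.array([ya[0], yb[0]])                             # homogeneous Dirichlet BCs

x_init = np.linspace(0, 2 * np.pi, 128)
y_init = np.zeros((2, x_init.size))
sol = solve_bvp(rhs, bc, x_init, y_init, tol=1e-6)

x_grid = np.linspace(0, 2 * np.pi, 128)                         # evenly spaced 128-point grid
u_grid, F_grid = sol.sol(x_grid)[0], F(x_grid)                  # one (u, F) training pair
```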

Figure 3: Learning curve. This is a typical learning curve for the DeepGreen architecture. The vertical dashed line indicates where the training procedure transitions from autoencoder-only training (only $\mathcal{L}_1$ and $\mathcal{L}_2$) to a full-network training procedure (all losses).
Figure 4: Latent space representations $\mathbf{v}$ and $\mathbf{f}$. The autoencoder transformation $\psi_u$ encodes $\mathbf{u}$ to the latent space, producing the vector $\mathbf{v}$ (orange). The forcing vector $\mathbf{F}$ is transformed by $\psi_F$ to the encoded vector $\mathbf{f}$ (blue).
Figure 5: Visualized operator and Green's function. Discovered Green's function $\mathbf{G}$ and corresponding linear operator $\mathbf{L}$.

The data is divided into three groups: training, validation, and test. The training and validation sets are used for training the model, while the withheld test set is used to evaluate the results. The proportions of the split are described in Appendix A.

3.1.1 Training the Model

The autoencoders used in this example are constructed with fully connected layers. In both autoencoders, a ResNet-like identity skip connection connects the input layer to the last full-dimension layer of the encoder (just before the dimension reduction), and connects the first full-dimension layer of the decoder to the final output layer (see Figure 14).

The model is trained in a two-step procedure. First, the autoencoders are trained, without a connection in the latent space, to condition the networks as autoencoders. In this first phase, only the autoencoder loss functions listed in Figure 2 are active ($\mathcal{L}_1$ and $\mathcal{L}_2$). After a set number of epochs, the latent spaces are connected by an invertible matrix operator, $\mathbf{L}$, and the remaining four loss functions in Figure 2 become active ($\mathcal{L}_3$–$\mathcal{L}_6$). In the final phase of training, the autoencoders learn to encode a latent space representation of the system in which properties associated with linear systems, such as linear superposition, hold true.

Figure 3 shows a typical training loss curve. The vertical dashed line indicates the transition between the two training phases. The models in this work are trained for 75 epochs in the first, autoencoder-only phase and 2750 epochs in the final phase. The first-phase epoch count was tuned empirically based on final model performance. The final-phase epoch count was selected for practical reasons; the training curve tended to flatten around 2750 epochs in all of our tested systems. The autoencoder latent spaces are critically important: the latent space is the transformed vector space where linear properties (e.g., superposition) are enforced, which enables the solution of nonlinear problems. In the one-dimensional problems, the latent space vectors $\mathbf{v}$ and $\mathbf{f}$ are of reduced dimension compared to the discretized solutions and forcings.

The latent spaces did not have any obvious physical interpretation, and qualitatively appeared similar to the representations shown in Figure 4. We trained 100 models to check the consistency in the learned model and latent space representations, but discovered the latent spaces varied considerably (see Appendix C). This implies the existence of an infinity of solutions to the coordinate transform problem, which indicates further constraints could be placed on the model.

Figure 6: Model predictions on test data. The top row shows the true solution $\mathbf{u}$ and the solution predicted by the network from the forcing $\mathbf{F}$ using the Green's function $\mathbf{G}$. The bottom row shows the true forcing function $\mathbf{F}$ compared to the forcing computed by applying the learned operator $\mathbf{L}$ to the solution $\mathbf{u}$. Three columns show the best, mean, and worst case samples as evaluated by the sum of normalized reconstruction errors.

Despite lacking obvious physical interpretations, the latent space enables discovery of an invertible operator $\mathbf{L}$ which describes the linear system $\mathbf{L}\mathbf{v} = \mathbf{f}$. The operator matrix can be inverted to yield the Green's function matrix $\mathbf{G} = \mathbf{L}^{-1}$, which allows computation of solutions to the linearized system, $\mathbf{v} = \mathbf{G}\mathbf{f}$. An example of the operator and its inverse is shown in Figure 5. The operator and Green's function shown in Figure 5 display a prominent feature seen in all of the results: a diagonally-dominant structure. We initialize the operator as an identity matrix, but the initialization had little impact on the diagonally-dominant form of the learned operator and Green's function matrices (see Appendix C). The diagonally-dominant operators indicate that the deep learning network tends to discover a coordinate transform yielding a nearly-orthonormal basis, which mirrors the common approach of diagonalization in spectral theory for Hermitian operators. Furthermore, diagonally-dominant matrices guarantee favorable properties for this application, such as being well-conditioned and non-singular.

We emphasize that training parameters and model construction choices used in this work were not extensively optimized. We expect the model performance can be improved in a myriad of ways including extending training times, optimizing model architecture, modifying the size of the latent spaces, restricting the form of the operator, and applying additional constraints to the model. However, these topics are not the main scope of the present work; our focus is to illustrate the use of autoencoders as a coordinate transform for finding solutions to nonlinear BVPs.

3.1.2 Evaluating the Model

The goal for this model is to find a Green's function for computing solutions to a nonlinear BVP governed by (6) for a given forcing function $\mathbf{F}$. Similarly, we can estimate the forcing term, $\mathbf{F}$, given the solution $\mathbf{u}$. The model is consequently evaluated by its ability to use the learned Green's function and operator for predicting solutions and forcings, respectively, for new problems from a withheld test data set.

Recall the original model is trained on data where the forcing function is a cosine or Gaussian function. As shown in Figure 6, the model performs well on withheld test data where the forcing functions are cosine or Gaussian functions, producing small cumulative losses. The solutions and forcings are depicted for the best, mean, and worst samples as scored by cumulative loss.

It is important to note that the test data used in Figure 6 is similar to the training and validation data. Because machine learning models typically perform extremely well on interpolation problems, it is reasonable to expect the model to perform well on this test data set.

As a more interesting test, to demonstrate the ability of the model to extrapolate, we prepared a separate set of test data containing solutions where the forcings $\mathbf{F}$ are cubic polynomial functions. This type of data was not present in training, and it provides some insight into the generality of the learned linear operator and Green's function matrices. Figure 7 shows how the model performs on these cubic polynomial forcing functions. Similar to Figure 6, the best, mean, and worst samples are shown, as graded by overall loss.

Figure 7: Model predictions on the cubic Helmholtz system with cubic polynomial forcings. The top row shows the true solution $\mathbf{u}$ and the solution predicted by the network from the forcing $\mathbf{F}$ using the Green's function $\mathbf{G}$. The bottom row shows the true forcing function $\mathbf{F}$ compared to the forcing computed by applying the learned operator $\mathbf{L}$ to the solution $\mathbf{u}$. Three columns show the best, mean, and worst case samples as evaluated by the sum of normalized reconstruction errors.
Figure 8: Model performance summary. Distributions of loss values are shown for every sample in the test data set. The model loss functions are minimized during training, making them a natural metric for summarizing performance.

Figures 6 and 7 provide qualitative insight into the model's performance on specific instances selected from the pool of evaluated data. A quantitative perspective of the model's performance is presented in Figure 8. This box plot shows statistics (median, interquartile range, and full range) for four of the loss functions evaluated on the similar (cosine and Gaussian) test data. Note the superposition loss function is not scored in this plot because it can only be evaluated within a single batch, and its value depends on batch size and composition.

In summary, the DeepGreen architecture enables discovery of invertible, linearizing transformations that facilitate identification of a linear operator and Green's function to solve nonlinear BVPs. It is tested on data both similar and dissimilar to the training data, and evaluated on the loss functions that guide the training procedure. The discovered operator and Green's function take on a surprisingly diagonally-dominant structure, which hints at the model's preference to learn an optimal basis. The model appears to extrapolate beyond the training data, suggesting that the learned operator is somewhat general to the system.

3.2 Nonlinear Sturm–Liouville and Biharmonic Operators

In addition to the example system described above, the approach was applied to two other one-dimensional systems. We used the same training procedure and forcing functions that were described in Section 3.1. The first is a system governed by a nonlinear Sturm–Liouville equation of the form

$-\left(p(x)\, u_x\right)_x + q(x)\left(u + \epsilon u^3\right) = F(x),$

where $\epsilon$ controls the extent of the nonlinearity, and $p(x)$ and $q(x)$ are spatially-varying coefficients. The final one-dimensional system is a biharmonic operator with an added cubic nonlinearity, of the form

$p\, u_{xxxx} + q\left(u + \epsilon u^3\right) = F(x),$

where $p$ and $q$ are constant coefficients and $\epsilon$ controls the nonlinearity. As in the prior example, the forcing functions in the training data are cosine and Gaussian functions, which are described further in Appendix A.

Results for all the one-dimensional models, including the cubic Helmholtz example from Section 3.1, are presented in Table 1. Model performance is quantitatively summarized by box plots and the Green’s function matrix is shown for each model.

Table 1: Summary of results for the three one-dimensional models: the nonlinear cubic Helmholtz equation (constant coefficients), the nonlinear Sturm–Liouville equation (spatially-varying $p$ and $q$), and the nonlinear biharmonic operator (constant coefficients). For each model, the Green's function learned by DeepGreen is shown alongside a summary box plot of the relative losses $\mathcal{L}_1$, $\mathcal{L}_2$, $\mathcal{L}_3$, $\mathcal{L}_5$, and $\mathcal{L}_6$.

Importantly, the learned Green's function matrices consistently exhibit a diagonally-dominant structure. The losses for the nonlinear cubic Helmholtz equation and the nonlinear Sturm–Liouville equation are similar, which indicates that spatially-varying coefficients do not make the problem significantly more difficult for the DeepGreen architecture. In contrast, the losses for the nonlinear biharmonic equation are about an order of magnitude higher than for the other two systems, which implies the fourth-order problem is more difficult than the second-order problems. Also of note is that the linear operator loss is consistently the highest loss across all models. Therefore, it is easier for DeepGreen to find invertible transformations for the solutions and forcing functions than it is to find a linear operator that connects the two latent spaces.

3.3 Nonlinear Poisson Equation

Figure 9: Model predictions for the (a) best and (b) worst examples from test data with Gaussian and cosine forcings. In both (a) and (b), the top row shows the true solution $u$, the predicted solution using the Green's function, and the difference between the true and predicted solutions. The bottom row shows the true forcing function $F$, the predicted forcing function, and the difference between the true and predicted forcings. In order to account for the difference in scale between $u$ and $F$, the differences are scaled by the infinity norm of the true solution or forcing function.

We also tested our method on a two-dimensional system. The two-dimensional model is a nonlinear version of the Poisson equation with Dirichlet boundary conditions,

(19a) $\nabla^2 u(\mathbf{x}) + \epsilon\, u(\mathbf{x})^3 = F(\mathbf{x}), \quad \mathbf{x} \in \Omega,$
(19b) $u(\mathbf{x}) = 0, \quad \mathbf{x} \in \partial\Omega,$

where $\epsilon$ controls the strength of the nonlinearity. Similar to the one-dimensional models, the forcing functions used to train the model are cosine and Gaussian functions, the details of which are provided in Appendix A. The sizes of the data sets are also similar to the one-dimensional data sets, with the data again split into training, validation, and test sets of vector pairs $(\mathbf{u}_j, \mathbf{F}_j)$.

The network architecture of the encoders and decoders for the two-dimensional example differs from the one-dimensional examples. Instead of fully connected layers, convolutional layers are used in the encoders and decoders, though we still use a ResNet-style architecture. Additionally, the latent space vectors are again of reduced dimension relative to the discretized fields. Full details on the network architecture can be found in Appendix B. Note that the method proposed for discovering Green's functions allows for any network architecture to be used for the encoders and decoders. For the one-dimensional examples, similar results were obtained using fully connected and convolutional layers. However, the convolutional architecture performed better in the two-dimensional case and also allowed for a more manageable number of parameters for the wider network that resulted from discretizing the two-dimensional domain.

Figure 10: Model predictions for the (a) best and (b) worst examples from test data with cubic polynomial forcings. In both (a) and (b), the top row shows the true solution $u$, the predicted solution using the Green's function, and the difference between the true and predicted solutions. The bottom row shows the true forcing function $F$, the predicted forcing function, and the difference between the true and predicted forcings. In order to account for the difference in scale between $u$ and $F$, the differences are scaled by the infinity norm of the true solution or forcing function.

The operator and Green's function for the two-dimensional model are similar to those shown in Figure 5. The diagonal dominance is even more prevalent in this case than in the one-dimensional examples. The model was evaluated on test data containing cosine and Gaussian forcing functions. Figure 9a shows the true solution and forcing function as well as the network predictions for the example from the test data on which the model performed the best (i.e., the smallest value of the loss). The difference between the true and predicted functions is shown in the right column of Figure 9a and is scaled by the infinity norm of the true solution or forcing function. Figure 9b shows similar results but for the worst example from the test data. In both cases, the model gives a qualitatively correct solution for both $u$ and $F$. Unsurprisingly, the network struggles most on highly localized forcing functions and has the highest error in the region where the forcing occurs.

The model was also evaluated on test data with cubic polynomial forcing functions, a type of forcing function not found in the training data. The best and worst examples are shown in Figure 10. Although the model does not perform as well on test data that is dissimilar to the training data, the qualitative features of the predicted solutions are still consistent with the true solutions. Figure 11 shows a box plot of the model's performance on the similar (cosine and Gaussian forcing) test data. The results are similar to the one-dimensional results and, in fact, better than those of the biharmonic operator model.

Figure 11: Two-dimensional Poisson model performance summary. Distributions of loss values are shown for every sample in the test data set. The model loss functions are minimized during training, making them a natural metric for summarizing performance.

4 Conclusion

We have leveraged the expressive capabilities of deep learning to discover linearizing coordinates for nonlinear BVPs, thus allowing for the construction of the fundamental solution or nonlinear Green’s function. Much like the Koopman operator for time-dependent problems, the linearizing transformation provides a framework whereby the fundamental solution of the linear operator can be constructed and used for any arbitrary forcing. This provides a broadly applicable mathematical architecture for constructing solutions for nonlinear BVPs, which typically rely on numerical methods to achieve solutions. Our DeepGreen architecture can achieve solutions for arbitrary forcings by simply computing the convolution of the forcing with the Green’s function in the linearized coordinates.

Given the critical role that BVPs play in the mathematical analysis of constrained physical systems subjected to external forces, the DeepGreen architecture can be broadly applied in nearly every engineering discipline, since BVPs are prevalent in diverse problem domains including fluid mechanics, electromagnetics, quantum mechanics, and elasticity. Importantly, DeepGreen extends a classic and widely used solution technique to nonlinear BVPs, which generically do not have principled techniques for achieving solutions aside from brute-force computation. DeepGreen establishes this bridge by providing a transformation under which linear superposition holds. DeepGreen is a flexible, data-driven, deep learning approach to solving nonlinear BVPs using a dual-autoencoder architecture. The autoencoders discover an invertible coordinate transform that linearizes the nonlinear BVP and identify both a linear operator $\mathbf{L}$ and a Green's function $\mathbf{G}$ which can be used to solve new nonlinear BVPs. We demonstrated that the method succeeds on a variety of nonlinear systems, including nonlinear Helmholtz and Sturm–Liouville problems, nonlinear elasticity, and a 2D nonlinear Poisson equation. The method merges the universal approximation capabilities of deep learning with the physics knowledge of Green's functions to yield a flexible tool for identifying fundamental solutions to a variety of nonlinear systems.

Acknowledgments

SLB is grateful for funding support from the Army Research Office (ARO W911NF-17-1-0306). JNK acknowledges support from the Air Force Office of Scientific Research (FA9550-19-1-0011).

References

  • [1] M. S. Alnæs, J. Blechta, J. Hake, A. Johansson, B. Kehlet, A. Logg, C. Richardson, J. Ring, M. E. Rognes, and G. N. Wells (2015) The FEniCS project version 1.5. Archive of Numerical Software 3 (100). External Links: Document Cited by: §A.2.
  • [2] C. Bishop (2006) Pattern Recognition and Machine Learning. Springer. Cited by: §1.
  • [3] S. L. Brunton, B. W. Brunton, J. L. Proctor, and J. N. Kutz (2016) Koopman invariant subspaces and finite linear representations of nonlinear dynamical systems for control. PLOS ONE 11 (2), pp. 1–19. Cited by: §1.
  • [4] M. Budišić and I. Mezić (2012) Geometry of the ergodic quotient reveals coherent structures in flows. Physica D: Nonlinear Phenomena 241 (15), pp. 1255–1269. Cited by: §1.
  • [5] K. Champion, B. Lusch, J. N. Kutz, and S. L. Brunton (2019-11) Data-driven discovery of coordinates and governing equations. Proceedings of the National Academy of Sciences 116 (45), pp. 22445–22451 (en). External Links: Document, ISSN 0027-8424, 1091-6490 Cited by: §1, §1.
  • [6] G. Cybenko (1989-12) Approximation by superpositions of a sigmoidal function. Mathematics of Control, Signals, and Systems (MCSS) 2 (4), pp. 303–314. External Links: Document, ISSN 0932-4194, Link Cited by: §1.
  • [7] C. J. Dsilva, R. Talmon, R. R. Coifman, and I. G. Kevrekidis (2018) Parsimonious representation of nonlinear dynamical systems through manifold learning: a chemotaxis case study. Applied and Computational Harmonic Analysis 44 (3), pp. 759–773. Cited by: §1.
  • [8] J. J. Fourier (1822) Théorie analytique de la chaleur. Chez Firmin Didot. Cited by: §1.
  • [9] C. Gin, B. Lusch, J. N. Kutz, and S. L. Brunton (2020) Deep learning models for global coordinate transformations that linearize PDEs. To appear in the European Journal of Applied Mathematics. Note: Preprint Available: arXiv:1911.02710 Cited by: §1, §1, §2.
  • [10] R. Gonzalez-Garcia, R. Rico-Martinez, and I. Kevrekidis (1998) Identification of distributed parameter systems: a neural net based approach. Computers & chemical engineering 22, pp. S965–S968. Cited by: §1.
  • [11] I. Goodfellow, Y. Bengio, and A. Courville (2016) Deep learning. MIT Press. Cited by: §1.
  • [12] K. He, X. Zhang, S. Ren, and J. Sun (2016) Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 770–778. Cited by: §1, §2.
  • [13] K. Hornik, M. Stinchcombe, and H. White (1990) Universal approximation of an unknown mapping and its derivatives using multilayer feedforward networks. Neural networks 3 (5), pp. 551–560. Cited by: §1.
  • [14] D. Z. Huang, K. Xu, C. Farhat, and E. Darve (2020-09) Learning constitutive relations from indirect observations using deep neural networks. Journal of Computational Physics 416, pp. 109491. External Links: ISSN 00219991, Document, Link Cited by: §1.
  • [15] J. D. Jackson (2007) Classical electrodynamics. John Wiley & Sons. Cited by: §1.
  • [16] S. Klus, F. Nüske, P. Koltai, H. Wu, I. Kevrekidis, C. Schütte, and F. Noé (2018) Data-driven model reduction and transfer operator approximation. Journal of Nonlinear Science 28 (3), pp. 985–1010. Cited by: §1.
  • [17] B. O. Koopman (1931) Hamiltonian systems and transformation in Hilbert space. Proceedings of the National Academy of Sciences 17 (5), pp. 315–318. External Links: http://www.pnas.org/content/17/5/315.full.pdf+html, Link Cited by: §1.
  • [18] J. N. Kutz, S. L. Brunton, B. W. Brunton, and J. L. Proctor (2016) Dynamic mode decomposition: data-driven modeling of complex systems. SIAM. Cited by: §1.
  • [19] J. N. Kutz, J. L. Proctor, and S. L. Brunton (2018) Applied Koopman theory for partial differential equations and data-driven modeling of spatio-temporal systems. Complexity 2018 (6010634), pp. 1–16. Cited by: §1.
  • [20] H. Lange, S. L. Brunton, and N. Kutz (2020) From Fourier to Koopman: spectral methods for long-term time series prediction. arXiv preprint arXiv:2004.00574. Cited by: §1.
  • [21] Q. Li, F. Dietrich, E. M. Bollt, and I. G. Kevrekidis (2017) Extended dynamic mode decomposition with dictionary learning: a data-driven adaptive spectral decomposition of the Koopman operator. Chaos: An Interdisciplinary Journal of Nonlinear Science 27 (10), pp. 103111. Cited by: §1.
  • [22] Y. Liu, J. N. Kutz, and S. L. Brunton (2020) Hierarchical deep learning of multiscale differential equation time-steppers. arXiv preprint arXiv:2008.09768. Cited by: §1.
  • [23] A. Logg, K. Mardal, G. N. Wells, et al. (2012) Automated solution of differential equations by the finite element method. Springer. External Links: Document, ISBN 978-3-642-23098-1 Cited by: §A.2.
  • [24] A. Logg and G. N. Wells (2010) DOLFIN: automated finite element computing. ACM Transactions on Mathematical Software 37 (2). External Links: Document Cited by: §A.2.
  • [25] B. Lusch, J. N. Kutz, and S. L. Brunton (2018) Deep learning for universal linear embeddings of nonlinear dynamics. Nature communications 9 (1), pp. 4950. Cited by: §1, §1.
  • [26] S. Mallat (2016) Understanding deep convolutional networks. Philosophical Transactions of the Royal Society A: Mathematical, Physical and Engineering Sciences 374 (2065), pp. 20150203. Cited by: §1.
  • [27] A. Mardt, L. Pasquali, H. Wu, and F. Noé (2018) VAMPnets: deep learning of molecular kinetics. Nature Communications 9 (5). Cited by: §1.
  • [28] I. Mezić and A. Banaszuk (2004) Comparison of systems with complex behavior. Physica D: Nonlinear Phenomena 197 (1–2), pp. 101 – 133. External Links: Document, ISSN 0167-2789, Link Cited by: §1.
  • [29] I. Mezić (2005) Spectral properties of dynamical systems, model reduction and decompositions. Nonlinear Dynamics 41 (1-3), pp. 309–325. Cited by: §1.
  • [30] I. Mezić (2013) Analysis of fluid flows via spectral properties of the Koopman operator. Annual Review of Fluid Mechanics 45, pp. 357–378. Cited by: §1.
  • [31] F. Noé and F. Nüske (2013) A variational approach to modeling slow processes in stochastic dynamical systems. Multiscale Modeling & Simulation 11 (2), pp. 635–655. Cited by: §1.
  • [32] F. Nüske, B. G. Keller, G. Pérez-Hernández, A. S. Mey, and F. Noé (2014) Variational approach to molecular kinetics. Journal of chemical theory and computation 10 (4), pp. 1739–1752. Cited by: §1.
  • [33] S. E. Otto and C. W. Rowley (2019) Linearly-recurrent autoencoder networks for learning dynamics. SIAM Journal on Applied Dynamical Systems 18 (1), pp. 558–593. Cited by: §1.
  • [34] J. Page and R. R. Kerswell (2018) Koopman analysis of Burgers equation. Physical Review Fluids 3 (7), pp. 071901. Cited by: §1.
  • [35] S. Pan and K. Duraisamy (2020) Physics-informed probabilistic learning of linear embeddings of nonlinear dynamics with guaranteed stability. SIAM Journal on Applied Dynamical Systems 19 (1), pp. 480–509. Cited by: §1.
  • [36] R. Rico-Martinez, I. Kevrekidis, and K. Krischer (1995) Nonlinear system identification using neural networks: dynamics and instabilities. Neural networks for chemical engineers, pp. 409–442. Cited by: §1.
  • [37] C. W. Rowley, I. Mezić, S. Bagheri, P. Schlatter, and D.S. Henningson (2009) Spectral analysis of nonlinear flows. J. Fluid Mech. 645, pp. 115–127. Cited by: §1.
  • [38] S. H. Rudy, J. N. Kutz, and S. L. Brunton (2019) Deep learning of dynamics and signal-noise decomposition with time-stepping constraints. Journal of Computational Physics 396, pp. 483–506. Cited by: §1.
  • [39] P. J. Schmid (2010-08) Dynamic mode decomposition of numerical and experimental data. Journal of Fluid Mechanics 656, pp. 5–28 (English). External Links: ISSN 0022-1120 Cited by: §1.
  • [40] I. Stakgold and M. J. Holst (2011) Green’s functions and boundary value problems. Vol. 99, John Wiley & Sons. Cited by: §1, §1.
  • [41] I. Stakgold (2000) Boundary value problems of mathematical physics: 2-volume set. Vol. 29, SIAM. Cited by: §1.
  • [42] N. Takeishi, Y. Kawahara, and T. Yairi (2017) Learning koopman invariant subspaces for dynamic mode decomposition. In Advances in Neural Information Processing Systems, pp. 1130–1140. Cited by: §1.
  • [43] C. Wehmeyer and F. Noé (2017) Time-lagged autoencoders: deep learning of slow collective variables for molecular kinetics. The Journal of Chemical Physics 148 (24), pp. 241703. Cited by: §1.
  • [44] M. O. Williams, I. G. Kevrekidis, and C. W. Rowley (2015) A data-driven approximation of the Koopman operator: extending dynamic mode decomposition. Journal of Nonlinear Science 25, pp. 1307–1346. Cited by: §1.
  • [45] M. O. Williams, C. W. Rowley, and I. G. Kevrekidis (2015) A kernel-based method for data-driven Koopman spectral analysis. Journal of Computational Dynamics 2 (2), pp. 247–265. Cited by: §1.
  • [46] A. Yariv (1989) Quantum electronics. John Wiley & Sons. Cited by: §1.
  • [47] E. Yeung, S. Kundu, and N. Hodas (2019) Learning deep neural network representations for Koopman operators of nonlinear dynamical systems. In 2019 American Control Conference (ACC), pp. 4832–4839. Cited by: §1.

Appendix A Data Generation

a.1 1D Problems

The data for all of the one-dimensional systems are created using the same method and forcing functions. Each solution is computed on an evenly-spaced 128-point grid using MATLAB's bvp5c solver with tight relative and absolute error tolerances. The forcing functions are designed to yield a variety of solutions.

The training data consist of two types of systems: Gaussian-forced and cosine-forced systems. The Gaussian-forced systems have forcing functions of the form

$F(x) = a \exp\!\left(-\frac{(x - b)^2}{2 c^2}\right),$

where the amplitude $a$, center $b$, and width $c$ are each swept over a grid of values. The cosine forcing functions are of the form

$F(x) = a \cos(\omega x),$

where the amplitude $a$ and frequency $\omega$ are likewise varied. This gives a total of 5000 Gaussian-forced solutions and 7371 cosine-forced solutions. For the cubic Helmholtz equation and the nonlinear Sturm–Liouville equation with spatially-varying coefficients, all 12371 solutions were within the error tolerance. However, 97 solutions of the nonlinear biharmonic equation did not meet the error tolerance and were therefore discarded. Of the remaining data, 10% are randomly chosen and withheld as test data; of the rest, 80% are used as training data and 20% as validation data.
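A sketch of how such a family of forcings could be assembled is shown below. The functional forms follow the reconstruction above, and the parameter grids are assumptions sized only to reproduce the stated counts of 5000 Gaussian and 7371 cosine forcings.

```python
import numpy as np

x = np.linspace(0, 2 * np.pi, 128)        # illustrative domain and 128-point grid

# Assumed parameter grids: 10 * 25 * 20 = 5000 Gaussian and 81 * 91 = 7371 cosine forcings.
gaussian_forcings = [a * np.exp(-((x - b) ** 2) / (2 * c ** 2))
                     for a in np.linspace(-5, 5, 10)
                     for b in np.linspace(1, 5, 25)
                     for c in np.linspace(0.1, 1.0, 20)]

cosine_forcings = [a * np.cos(w * x)
                   for a in np.linspace(0.5, 5, 81)
                   for w in np.linspace(0.5, 5, 91)]

forcings = np.array(gaussian_forcings + cosine_forcings)   # rows are forcing vectors F_j
print(forcings.shape)                                       # (12371, 128)
```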

In order to test the ability of the network to generalize, we also have another test data set that consists of solutions with cubic polynomial forcing functions. Two families of cubic polynomials are used, with the coefficients of each family swept over a grid of values. There are a total of 4140 solutions with cubic forcing functions.

a.2 2D Problem

The two-dimensional data satisfy the nonlinear Poisson equation (19). The solutions are computed with a finite element method using the DOLFIN library [24] of the FEniCS Project [1, 23]. The forcing functions are similar to the one-dimensional data in that there are Gaussian and cosine forcing functions, along with a separate data set of cubic polynomial forcing functions used to test the ability of the network to generalize. The Gaussian forcing functions are two-dimensional Gaussian bumps whose amplitude, center, and width are varied over a grid of values. The cosine forcing functions are two-dimensional cosine functions whose amplitude and frequency are varied. The cubic forcing functions come from two families of cubic polynomials in the spatial coordinates, with their coefficients varied over a grid. There are 6250 solutions with Gaussian forcing functions, 7371 solutions with cosine forcing functions, and 4416 solutions with cubic forcing functions.
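A minimal legacy-FEniCS (DOLFIN) sketch of one such finite element solve is shown below. It assumes the cubic form used in the reconstruction of equation (19); the mesh resolution, $\epsilon$, and the Gaussian forcing parameters are placeholders.

```python
from dolfin import *

mesh = UnitSquareMesh(63, 63)                        # illustrative resolution
V = FunctionSpace(mesh, "CG", 1)
bc = DirichletBC(V, Constant(0.0), "on_boundary")    # u = 0 on the boundary

u = Function(V)
v = TestFunction(V)
eps = Constant(0.3)                                  # assumed nonlinearity strength
F_rhs = Expression("a*exp(-(pow(x[0]-b0,2)+pow(x[1]-b1,2))/(2*c*c))",
                   a=2.0, b0=0.5, b1=0.5, c=0.1, degree=2)

# Weak form of  laplace(u) + eps*u^3 = F  with homogeneous Dirichlet data.
residual = (-inner(grad(u), grad(v)) + eps * u**3 * v - F_rhs * v) * dx
solve(residual == 0, u, bc)                          # Newton solve

u_values = u.compute_vertex_values(mesh)             # discretized solution for training
```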

Figure 12: Encoder network architecture for the two-dimensional data. All convolutional layers use the same kernel size with a stride of 1, zero-padding, and ReLU activation functions. All pooling layers are average pooling layers with a pool size of 2 and a stride of 2.

Figure 13: Decoder network architecture for the two-dimensional data. All transposed convolutional layers use the same kernel size with a stride of 2, zero-padding, and ReLU activation functions, except for the last layer, which has a stride of 1.
Figure 14: Layer-by-layer autoencoder architecture for 1D problems.

Appendix B Neural Network Implementation Details

The model training procedure is kept constant for all of the examples in this work. The networks are optimized with an Adam optimizer using standard momentum parameters $\beta_1$ and $\beta_2$. Every numerical experiment starts by training a set of 20 models for a small number of epochs. Each of the 20 models has a learning rate for the Adam optimizer selected uniformly at random from a fixed range. The initial training period consists of two phases: autoencoder-only (75 epochs) and full model (250 epochs). The autoencoder-only phase enforces only the autoencoder losses $\mathcal{L}_1$ and $\mathcal{L}_2$ during backpropagation. A checkpoint algorithm is used to keep track of the model with the lowest overall loss throughout the training procedure. At the end of the initial period, the best model is selected and the others are eliminated. The best model is trained for an additional 2500 epochs.
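A compact sketch of this selection-and-checkpointing procedure is shown below. The data, the tiny stand-in model, the simplified loss, and the learning-rate range are all placeholders; only the epoch counts and the two-phase loss schedule follow the text.

```python
import copy
import torch
import torch.nn as nn

# Stand-in data; in practice these are the discretized (u, F) training pairs.
u_data, F_data = torch.randn(256, 128), torch.randn(256, 128)

def make_model():
    class TinyDeepGreen(nn.Module):
        def __init__(self, n=128, latent=20):
            super().__init__()
            self.enc_u, self.dec_u = nn.Linear(n, latent), nn.Linear(latent, n)
            self.enc_F, self.dec_F = nn.Linear(n, latent), nn.Linear(latent, n)
            self.L = nn.Parameter(torch.eye(latent))
    return TinyDeepGreen()

def losses(model, u, F):
    v, f = model.enc_u(u), model.enc_F(F)
    ae = ((model.dec_u(v) - u) ** 2).mean() + ((model.dec_F(f) - F) ** 2).mean()
    full = ae + ((v @ model.L.T - f) ** 2).mean()          # plus operator/cross-mapping terms
    return ae, full

best = (float("inf"), None)
for _ in range(20):                                        # 20 candidate models
    model = make_model()
    lr = 10 ** torch.empty(1).uniform_(-5, -3).item()      # assumed learning-rate range
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    for epoch in range(75 + 250):                          # 75 AE-only + 250 full epochs
        ae, full = losses(model, u_data, F_data)
        loss = ae if epoch < 75 else full
        opt.zero_grad(); loss.backward(); opt.step()
        if full.item() < best[0]:                          # checkpoint the lowest overall loss
            best = (full.item(), copy.deepcopy(model.state_dict()))
# The best checkpoint is then restored and trained for an additional 2500 epochs.
```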

There are two network architectures in this work. The architectures depicted in Figures 12 and 13 are applied to the two-dimensional nonlinear Poisson BVP, and the architecture depicted in Figure 14 is applied to the one-dimensional problems.

The two architectures have a few training variables in common. Both models use variance-scaling weight initialization, weight regularization with a small penalty, and ReLU activation functions for the fully connected (1D architecture) and convolutional (2D architecture) layers. Notably, the two layers immediately before and after the latent space do not have activation functions. A normalized mean squared error loss function is used for all of the loss functions, as described in Section 2. The models are trained in batches of 64 samples.

The 2D architecture utilizes convolutional layers and pooling layers, as shown in Figures 12 and 13. All convolutional layers use the same kernel size. There are differences between the convolutional layers in the encoder and those in the decoder: the encoder convolutional layers use a stride of 1 and an increasing number of filters, whereas the transposed convolutional layers in the decoder use a stride of 2 with a decreasing number of filters. Pooling layers are similar for both the encoder and decoder, with a stride of 2 and a pool size of 2.
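The following sketch shows one encoder/decoder pair consistent with that description; the kernel size, filter counts, and latent dimension are illustrative assumptions, while the stride and pooling settings follow Figures 12 and 13.

```python
import torch
import torch.nn as nn

latent = 20                                   # assumed latent dimension
kernel = 3                                    # assumed kernel size

encoder = nn.Sequential(                      # stride-1 convolutions + average pooling
    nn.Conv2d(1, 8, kernel, stride=1, padding=1), nn.ReLU(),
    nn.AvgPool2d(2, stride=2),
    nn.Conv2d(8, 16, kernel, stride=1, padding=1), nn.ReLU(),
    nn.AvgPool2d(2, stride=2),
    nn.Conv2d(16, 32, kernel, stride=1, padding=1), nn.ReLU(),
    nn.AvgPool2d(2, stride=2),
    nn.Flatten(),
    nn.Linear(32 * 8 * 8, latent),            # no activation going into the latent space
)

decoder = nn.Sequential(
    nn.Linear(latent, 32 * 8 * 8),            # no activation coming out of the latent space
    nn.Unflatten(1, (32, 8, 8)),
    nn.ConvTranspose2d(32, 16, kernel, stride=2, padding=1, output_padding=1), nn.ReLU(),
    nn.ConvTranspose2d(16, 8, kernel, stride=2, padding=1, output_padding=1), nn.ReLU(),
    nn.ConvTranspose2d(8, 8, kernel, stride=2, padding=1, output_padding=1), nn.ReLU(),
    nn.ConvTranspose2d(8, 1, kernel, stride=1, padding=1),   # final layer: stride 1
)

F = torch.randn(4, 1, 64, 64)                 # a batch of discretized 2D forcings
assert decoder(encoder(F)).shape == F.shape
```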

Appendix C Additional Results

The repeatability of the results and of the models learned by the DeepGreen architecture is interesting to study from the perspective of operator convergence and latent space representations. In both cases, we aim to investigate the convergence of the model parameters to determine whether the learned latent spaces and operators are unique or non-unique.

c.1 Operator Initialization

We repeat the training procedure for DeepGreen with three different initialization approaches for the operator $\mathbf{L}$. Again, we train with data from the example nonlinear cubic Helmholtz model. This experiment compares the initial values of the operator with its final values at the end of training to determine whether the DeepGreen approach tends to converge to a specific operator structure. The results in Figure 15 show the initial and final operators for identity-initialized, randomly initialized, and Toeplitz-initialized operator matrices. Impressively, the network tends to learn diagonally-dominant operators for all of the tested initialization strategies. This structure, which DeepGreen appears to prefer, draws strong parallels to the coordinate diagonalization approach commonly used in physics.
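The three initializations compared in Figure 15 can be generated as follows; the He-normal scaling and the exact pattern of the "Toeplitz gradient" are assumptions.

```python
import numpy as np
from scipy.linalg import toeplitz

latent = 20                                   # assumed latent dimension
rng = np.random.default_rng(0)

L_identity = np.eye(latent)                                          # identity initialization
L_random = rng.normal(0.0, np.sqrt(2.0 / latent), (latent, latent))  # He-normal initialization
col = np.linspace(1.0, 0.0, latent)                                  # decay away from the diagonal
L_toeplitz = toeplitz(col)                                           # symmetric Toeplitz gradient
```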

Figure 15: Initial vs. learned operators $\mathbf{L}$ for different initializations. The top row shows identity matrix initialization, the middle row shows random initialization (He normal), and the bottom row shows a Toeplitz gradient initialization.

c.2 Latent Space Analysis

We repeat the training procedure for the example system, the nonlinear cubic Helmholtz model, a total of one hundred times. A single sample is selected from the training data, and the latent space representations, $\mathbf{v}$ and $\mathbf{f}$, of the input vectors $\mathbf{u}$ and $\mathbf{F}$ are computed for each trained model. Statistics of the latent space representations are presented in Figure 16. It is evident that the latent space vectors are not identical between runs and that the values in the vectors do not follow any particular statistical distribution. This implies that the learned weights in the model, and the learned latent space representations, vary for each training instance and do not appear to converge to a single representation.

Figure 16: Statistics of latent space values of a single sample over 100 experimental runs.

c.3 Residual network architecture

All of the autoencoders used in this work use a residual network (ResNet) architecture. In order to demonstrate the advantage of the ResNet architecture, we trained six models using the DeepGreen architecture for each of the four systems. Three of the models use the ResNet skip connections, while three do not use the ResNet architecture.

For the two simplest systems, the nonlinear cubic Helmholtz equation and the nonlinear Sturm–Liouville equation, the difference between the models with and without the ResNet skip connections was negligible. For the nonlinear cubic Helmholtz equation, the ResNet architecture resulted in a lower median validation loss than the non-ResNet models but a higher mean validation loss, due to one of the three ResNet models performing much more poorly than the other two. The results for the nonlinear Sturm–Liouville system are analogous. Therefore, the use of the ResNet architecture produced results similar to a non-ResNet architecture for these two simpler systems.

For the two systems that had larger losses, the nonlinear biharmonic equation in 1D and the 2D nonlinear Poisson equation, the ResNet architecture was clearly superior to a non-ResNet architecture. For the nonlinear biharmonic equation, the ResNet architecture yielded lower mean and median validation losses across the three models than the non-ResNet architecture. The ResNet architecture is absolutely vital for the nonlinear Poisson system: without it, the model essentially did not converge, whereas the ResNet models reached far lower mean and median validation losses.