Solving Forward and Inverse Problems Using Autoencoders

12/05/2019
by Hwan Goh, et al.

1 Abstract

This work develops model-aware autoencoder networks as a new method for solving scientific forward and inverse problems. Autoencoders are unsupervised neural networks that are able to learn new representations of data through appropriately selected architecture and regularization. The resulting mappings to and from the latent representation can be used to encode and decode the data. In our work, we set the data space to be the parameter space of a parameter of interest we wish to invert for. Further, as a way to encode the underlying physical model into the autoencoder, we enforce the latent space of the autoencoder to be the space of observations of physically-governed phenomena. In doing so, we leverage the well-known capability of a deep neural network as a universal function approximator to simultaneously obtain both the parameter-to-observation and observation-to-parameter maps. The results suggest that this simultaneous learning interacts synergistically to improve the inversion capability of the autoencoder.

2 Introduction

Deep learning and related neural network techniques have provided a useful framework for modelling physical systems. A significant question that arises in effective modelling is how one informs the neural network of the task at hand in order to improve its performance. Examples include physics-informed neural networks [19, 20, 21, 18], where the residual of the governing partial differential equation acts as a regularizing term to inform the network of the underlying physics; FEA-Net and MgNet [28, 7], which leverage the known structure of discrete PDE solvers; and many others [25, 11, 27, 5]. In the context of physically driven inverse problems, solving the PDE can be considered the forward problem. Neural networks have also been utilized in solving inverse problems [2, 1, 10, 12, 9, 16]. As with forward problems, many of the techniques used to solve inverse problems with neural networks introduce the neural network as a regularizer for the problem. In our work, we aim to equip autoencoders with the underlying forward model by enforcing the latent variables to be the observational data. This model-aware autoencoder, as shall be demonstrated, is a promising approach for solving inverse problems.

Autoencoders were first introduced in [22] to address the challenges of unsupervised learning by using the input data as the teacher to train a neural network. More modern uses of autoencoders involve selecting a particular architecture or optimization problem so that the trained network learns new representations of the data. For example, by selecting a network architecture with a bottleneck in the hidden layers, a network trained to recover its input through the bottleneck can be used as a dimensionality reduction operator [8, 24]. Another notable example is that, by corrupting the input with noise, the trained autoencoder can act as a denoiser of data [23]. In [3], a general framework for studying linear and nonlinear autoencoders was introduced.

Autoencoders have been used to solve inverse problems in [9, 29, 26, 14, 17, 10, 6, 13, 15], mostly in the context of image reconstruction. In [9], a two-layer autoencoder was used in the setting of compressed sensing in order to recover a high-dimensional signal from underdetermined linear measurements. In doing so, a generative model was trained such that an approximation of the signal can be obtained as a mapping from some latent space. Further, the work provided a proof that if the number of measurements is larger than twice the dimension of the generative model, the signal can be recovered from the measurements up to some distortion. In [6, 15], training autoencoders to obtain a signal-generating model was used in the context of medical imaging. In [13], a more generic patch-based reconstruction technique was introduced that can be applied to any imaging modality. In this paper we take a radically different approach: instead of considering the encoder-decoder transition layer as a latent space, we enforce it to be our measurement space, where we input data. In doing so, by training the autoencoder, we simultaneously learn the forward map as the encoder and the inverse map as the decoder.

Regularization is often used to prevent autoencoders from learning a trivial identity mapping from input to output. In our work, we consider the input and output data to be a parameter of interest and we use regularization in order to ensure that the autoencoder learns the inverse mapping from measurement data to the parameter. Specifically, we consider a loss function of the form

(1)   \mathcal{L}(\theta) = \sum_{d=1}^{D} \| u_d - \Phi_D(\Phi_E(u_d; \theta); \theta) \|^2 + \lambda \sum_{d=1}^{D} \| y_d - \Phi_E(u_d; \theta) \|^2

where u_d denotes the parameter data for the parameter of interest, \Phi_E the encoder, \Phi_D the decoder, \theta the autoencoder network weights, \lambda the regularization parameter, and y_d the observation of the state. By minimizing this loss function, we obtain the observation-to-parameter map as the decoder. Additionally, we also obtain the parameter-to-observation map as the encoder. A simple three-layer autoencoder is depicted in Figure 1. With this approach, measurement data can be input as an argument of the decoder portion of the autoencoder in order to obtain a parameter estimate as the output, thereby effectively solving the inverse problem.
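The structure of the loss (1) can be sketched numerically. The following minimal NumPy example uses hypothetical sizes, a random linear stand-in for the forward model, and single-layer encoder/decoder maps purely for illustration; none of these choices come from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes: 5 parameter nodes, 3 observation nodes.
n_param, n_obs = 5, 3

# Stand-in "forward model": a fixed random linear parameter-to-observation map.
F = rng.normal(size=(n_obs, n_param))

def encoder(u, W_e):
    # Parameter -> latent; training pushes the latent toward the observations y.
    return np.tanh(W_e @ u)

def decoder(z, W_d):
    # Latent (observation space) -> parameter estimate.
    return W_d @ z

def loss(u, y, W_e, W_d, lam):
    z = encoder(u, W_e)
    u_hat = decoder(z, W_d)
    misfit = np.sum((u - u_hat) ** 2)   # autoencoder reconstruction term
    latent = np.sum((y - z) ** 2)       # penalty forcing latent = observations
    return misfit + lam * latent

u = rng.normal(size=n_param)
y = F @ u                               # synthetic observation of the state
W_e = 0.1 * rng.normal(size=(n_obs, n_param))
W_d = 0.1 * rng.normal(size=(n_param, n_obs))
print(loss(u, y, W_e, W_d, lam=1.0))
```

With \lambda = 0 the penalty vanishes and the network is free to learn any latent code; the second term is what ties the latent layer to the measurement space.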

To the best of our knowledge, enforcing the latent space of an autoencoder to be a measurement space, thereby making the autoencoder aware of its task of learning the forward and inverse mappings simultaneously, is a novel concept.

Figure 1: Left: Autoencoder network. Center: The encoder network that models the forward problem. Right: The decoder network that models the inverse problem. The blue nodes depict the observation nodes where state measurement data is located; the remaining nodes denote the parameter of interest. The weights of the first and final layers are also indicated.

3 Results

In this section, we present preliminary results for the thermal fin problem. The temperature distribution within the fin, u, is governed by the following elliptic partial differential equation:

(2)   -\nabla \cdot (k \nabla u) = 0   \quad \text{in } \Omega
(3)   -k \nabla u \cdot n = \mathrm{Bi}\, u   \quad \text{on } \Gamma_{\text{ext}}
(4)   -k \nabla u \cdot n = -1   \quad \text{on } \Gamma_{\text{root}}

where k denotes the thermal heat conductivity, Bi is the Biot number, \Omega is the physical domain describing the thermal fin, \Gamma_{\text{root}} is the bottom edge of the fin, and \Gamma_{\text{ext}} is the set of exterior edges of the fin; equation (3) models convective heat losses through the external surface, and equation (4) models the heat source at the root. The experiment parameterizes the thermal conductivity as a function defined over a finite element mesh. The quantity of interest for the forward problem is the pointwise temperature across the fin. The quantity of interest for the inverse problem is the heat conductivity across the fin. The finite element mesh for the thermal fin is displayed in Figure 2. We consider two cases of parameter distribution. The first case is a piecewise constant distribution over the eight subfins and the central subdomain. The second case is a parameter distribution that is spatially varying over the whole fin.
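To illustrate the kind of physics involved, the following sketch solves a simplified one-dimensional fin analogue of the boundary value problem above by finite differences: a unit heat flux enters at the root, and heat is lost convectively along the fin. The geometry, coefficient values, and discretization are hypothetical stand-ins, not the paper's 2D finite element setup.

```python
import numpy as np

# 1D fin analogue: -k u'' + Bi*u = 0 on (0, 1), unit heat flux at the
# root x = 0, convective (Robin) loss at the tip x = 1. Illustrative values.
k, Bi, n = 1.0, 0.1, 101
h = 1.0 / (n - 1)

A = np.zeros((n, n))
b = np.zeros(n)

# Interior nodes: -k (u_{i-1} - 2 u_i + u_{i+1}) / h^2 + Bi u_i = 0
for i in range(1, n - 1):
    A[i, i - 1] = -k / h**2
    A[i, i] = 2 * k / h**2 + Bi
    A[i, i + 1] = -k / h**2

# Root: -k u'(0) = 1, i.e. unit flux entering the fin (one-sided difference).
A[0, 0], A[0, 1] = k / h, -k / h
b[0] = 1.0

# Tip: k u'(1) + Bi u(1) = 0, convective loss.
A[-1, -2], A[-1, -1] = -k / h, k / h + Bi

u = np.linalg.solve(A, b)
print(u[0], u[-1])  # temperature at root and tip
```

The temperature is highest at the heated root and decays toward the tip, which is the qualitative behaviour the 2D fin exhibits as well.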

Figure 2: Finite element mesh for the thermal fin.

3.1 Neural Network Model and Training Properties

Our neural network architecture consists of hidden layers each possessing 500 nodes. We consider two test cases. For the first case, we assume that measurements are taken over the full domain, and so the third hidden layer, where the measurement data is input, has as many nodes as the computational mesh of the thermal fin. The second case assumes that we have only boundary measurements: the measurement points lie along the outside of the thermal fin, and the third hidden layer has one node per measurement point. Optimization is performed using the Adam optimizer, and we consider several values of the regularization parameter \lambda for both the piecewise constant and the spatially varying parameter distributions. We use a data set of parameter and state measurement pairs for training. We compare the estimates obtained using our autoencoder with estimates obtained using standard feed-forward deep neural networks that model the parameter-to-observation and the observation-to-parameter maps. The loss functions we minimize to learn the parameter-to-observation and the observation-to-parameter maps are, respectively:

(5a)   \mathcal{L}_E(\theta) = \sum_{d=1}^{D} \| y_d - \Psi_E(u_d; \theta) \|^2 + \lambda \|\theta\|^2
(5b)   \mathcal{L}_D(\theta) = \sum_{d=1}^{D} \| u_d - \Psi_D(y_d; \theta) \|^2 + \lambda \|\theta\|^2

where \Psi_E and \Psi_D denote the feed-forward parameter-to-observation and observation-to-parameter networks.
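The two decoupled baseline losses can be sketched as follows, with single-layer stand-ins for the feed-forward networks and synthetic linear data; all names and dimensions are hypothetical, chosen only to show how each map is trained against its own misfit.

```python
import numpy as np

rng = np.random.default_rng(2)
n_param, n_obs, n_train = 5, 3, 8

# Synthetic linear "physics" and training pairs; sizes are illustrative.
F = rng.normal(size=(n_obs, n_param))
U = rng.normal(size=(n_train, n_param))   # parameter samples
Y = U @ F.T                               # matching observations

# Single-layer stand-ins for the two baseline feed-forward networks.
W_fwd = 0.1 * rng.normal(size=(n_obs, n_param))   # parameter -> observation
W_inv = 0.1 * rng.normal(size=(n_param, n_obs))   # observation -> parameter

loss_fwd = np.sum((Y - U @ W_fwd.T) ** 2)   # analogue of the (5a) misfit
loss_inv = np.sum((U - Y @ W_inv.T) ** 2)   # analogue of the (5b) misfit
print(loss_fwd, loss_inv)
```

Unlike the autoencoder loss, nothing couples the two objectives: each network is trained independently on its own data misfit.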

3.2 Piecewise Constant Parameter Distribution

We begin with the case of a piecewise constant distribution of the heat conductivity. In Figure 3 we display the estimates, with the accompanying relative errors displayed in Table 1. The training metrics are displayed in Figure 12 and Figure 13 of the Appendix for the \lambda = 0.01 and \lambda = 1 cases, respectively. For the case when \lambda = 1, we can see that the estimates are accurate. This is also quantitatively supported by the low relative errors. Furthermore, the training metrics displayed in Figure 13 of the Appendix show desirable behaviour, with the relative errors of the parameter and state predictions decreasing as the training loss decreases. For the case when \lambda = 0.01, the parameter and state estimates are considerably less accurate than when \lambda = 1, which suggests a dependence on the measurement data when training the network.

Figure 3: Encoder state estimates and decoder parameter estimates with full domain data. The first row shows the true parameter and the true state. The second and third rows show estimates of the true parameter and true state for the two values of the regularization parameter, \lambda = 0.01 and \lambda = 1.

         |        Full Data           |       Boundary Data
  \lambda | Parameter Est. | State Est. | Parameter Est. | State Est.
  0.01    | 32.119%        | 6.932%     | 37.218%        | 8.418%
  1       | 4.151%         | 0.409%     | 5.139%         | 0.3286%

Table 1: Relative errors of the decoder parameter estimates and the encoder state estimates with a piecewise constant parameter distribution.
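The relative errors reported in the tables are presumably the standard \ell^2 relative error between estimate and truth; a minimal helper, assuming that convention (the paper does not state the exact formula in this excerpt):

```python
import numpy as np

def relative_error(estimate, truth):
    # Relative error in percent: ||estimate - truth|| / ||truth|| * 100,
    # assuming the l2-norm convention used in the tables.
    return 100.0 * np.linalg.norm(estimate - truth) / np.linalg.norm(truth)

truth = np.array([1.0, 2.0, 3.0])
print(relative_error(np.array([1.1, 1.9, 3.0]), truth))  # ~3.78 (%)
```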

We now display results obtained using boundary data. In Figure 4 we display the estimates, with the accompanying relative errors displayed in Table 1. Note that there is no full-domain state prediction displayed, as the middle layer of the autoencoder, corresponding to the measurement data input, consists only of the boundary measurement nodes. Therefore, the encoder can only be used to estimate the boundary data and not the full domain distribution of the temperature. As with the full domain data results, for the case when \lambda = 0.01, the parameter and state estimates are considerably less accurate than when \lambda = 1, which again suggests a dependence on the measurement data when training the network. The training metrics, displayed in Figure 18 and Figure 19 of the Appendix for \lambda = 0.01 and \lambda = 1 respectively, again show desirable behaviour.

Figure 4: Decoder estimates with boundary data. The top row displays the true parameter. The bottom row displays the parameter estimates for the two values of the regularization parameter, \lambda = 0.01 and \lambda = 1.

We compare these estimates with estimates obtained from using a standard feed-forward deep neural network to model the parameter-to-observation and the observation-to-parameter map. The architecture of these networks matches that of the analogous encoder and decoder portions of the autoencoder; that is, each network consists of two hidden layers, each possessing 500 nodes. The loss functions we minimize to learn the parameter-to-observation and the observation-to-parameter maps are as displayed in (5).

We display the parameter estimates in Figure 5 and the state estimates in Figure 6. Note that there is no reconstruction displayable for the parameter-to-observation model when boundary data was used. From the quality of the estimates, it is clear that both neural networks possess enough capacity to accurately learn their respective maps. This is quantitatively supported by the relative errors displayed in Table 2. Therefore, our results suggest that simultaneous learning with an autoencoder does not significantly improve the learning of the parameter-to-observation and the observation-to-parameter map for the case of a piece-wise constant parameter distribution.

         |        Full Data           |       Boundary Data
  \lambda | Parameter Est. | State Est. | Parameter Est. | State Est.
  0.01    | 6.806%         | 5.581%     | 8.987%         | 6.014%

Table 2: Relative errors of the standard feed-forward model parameter and state estimates with a piecewise constant parameter distribution.
Figure 5: Standard feed-forward model estimates of the true parameter with a piecewise constant parameter distribution. Top row: the true parameter distribution. Bottom row: left is the estimate obtained using full domain data and right is the estimate obtained using boundary data.
Figure 6: Standard feed-forward model estimate of the true state with a piecewise constant parameter distribution. Left is the true temperature distribution and right is the estimate using full domain data.

3.3 Spatially Varying Parameter Distribution

We now display the results when the heat conductivity is spatially varying over all points of the domain. This distribution was drawn from a random Gaussian field. We begin with full domain data; the results are displayed in Figure 7 and Figure 8, with accompanying relative errors displayed in Table 3. In contrast to the case of piecewise constant parameters, we require a much larger value of the regularization parameter in order to achieve under 20% relative error. The training metrics are displayed in Figures 14, 15, 16, and 17 of the Appendix for \lambda = 0.01, 1, 10, and 50, respectively.

         |        Full Data           |       Boundary Data
  \lambda | Parameter Est. | State Est. | Parameter Est. | State Est.
  0.01    | 69.432%        | 77.152%    | 69.889%        | 87.426%
  1       | 24.321%        | 3.421%     | 26.679%        | 4.211%
  10      | 16.063%        | 1.601%     | 17.727%        | 1.427%
  50      | 14.021%        | 1.371%     | 15.892%        | 1.191%

Table 3: Relative errors of the decoder parameter estimates and the encoder state estimates with a spatially varying parameter distribution.
Figure 7: Encoder state estimates and decoder parameter estimates obtained using full domain data. The first row shows the true parameter and the true state. The second and third rows display estimates of the true parameter and true state obtained using regularization parameters \lambda = 0.01 and \lambda = 1.
Figure 8: Encoder state estimates and decoder parameter estimates obtained using full domain data. The first row displays the true parameter and the true state. The second and third rows display estimates of the true parameter and true state obtained using regularization parameters \lambda = 10 and \lambda = 50.

Finally, we display the results when the heat conductivity is spatially varying at all points of the domain and our observations are boundary data. The results are displayed in Figure 9, with accompanying relative errors displayed in Table 3. Again, in contrast to the case of piecewise constant parameters, we require a much larger value of the regularization parameter in order to achieve under 20% relative error. We also notice that the relative errors are slightly worse than when full domain data is used. The training metrics are displayed in Figures 20, 21, 22, and 23 of the Appendix for \lambda = 0.01, 1, 10, and 50, respectively.

Figure 9: Encoder state estimates and decoder parameter estimates with boundary data. The first row displays the true parameter and the true state. The second and third rows, from left to right, display estimates of the true parameter for the four values of the regularization parameter, \lambda = 0.01, 1, 10, and 50.

We compare these estimates with estimates obtained from using a standard feed-forward deep neural network to model the parameter-to-observation and the observation-to-parameter map. The architecture of these networks again matches that of the analogous encoder and decoder portions of the autoencoder, and the loss functions we minimize to learn the parameter-to-observation and the observation-to-parameter maps are as in (5). We display the results in Figure 10 for the parameter estimates and Figure 11 for the state estimates. Note that there is no estimate displayable for the parameter-to-observation model when boundary data was used. Unlike the case of a piecewise constant parameter distribution, here both the state and parameter estimates are worse than those obtained using the autoencoder. This is quantitatively supported by the relative errors displayed in Table 4, which are higher than the autoencoder errors displayed in Table 3.

         |        Full Data           |       Boundary Data
  \lambda | Parameter Est. | State Est. | Parameter Est. | State Est.
  0.001   | 17.930%        | 8.828%     | 19.027%        | 9.232%
  0.01    | 25.421%        | 17.697%    | 27.901%        | 10.907%
  0.1     | 37.790%        | 25.069%    | 39.585%        | 25.272%

Table 4: Relative errors of the standard feed-forward model parameter and state estimates with a spatially varying parameter distribution.
Figure 10: Standard feed-forward deep neural network estimates of the true parameter with a spatially varying parameter distribution. Top row: true parameter distribution. Left column: estimates using full domain data. Right column: estimates using boundary data. Second, third, and fourth rows: estimates with \lambda = 0.001, 0.01, and 0.1, respectively.
Figure 11: Standard feed-forward deep neural network estimates of the true state with a spatially varying parameter distribution, obtained using full domain measurements. Top row: true state distribution. Bottom row, left to right: estimates with \lambda = 0.001, 0.01, and 0.1.

4 Conclusion and Future Work

In this paper we introduce a new method for solving scientific forward and inverse problems through the use of autoencoders. At the heart of our method is encoding the forward model by enforcing the latent variables to be the observational data. This informs the autoencoder of the underlying forward model and hence improves its performance as an inversion method. Indeed, the results with the spatially varying parameter distribution suggest that there are synergistic advantages in learning the parameter-to-observation and observation-to-parameter maps simultaneously. However, care must be taken in the selection of the regularization parameter, as the resulting models of both maps are sensitive to this choice. Choosing an appropriate regularization parameter can be done through many existing methods, including cross validation and the L-curve criterion. Ongoing work includes understanding when the method breaks down and how to further encode knowledge of the underlying mathematical models during training, thus further improving the inversion capability of the method. Part of future work is to extend the proposed approach to more challenging inverse problems, including those governed by hyperbolic PDEs.
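The regularization-parameter sensitivity noted above can be illustrated on a toy Tikhonov-regularized linear inverse problem: sweeping \lambda and recording the data misfit against the solution norm yields exactly the quantities an L-curve analysis plots. Everything here (problem sizes, noise level, \lambda grid) is illustrative and not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy linear inverse problem y = F u + noise.
F = rng.normal(size=(20, 10))
u_true = rng.normal(size=10)
y = F @ u_true + 0.05 * rng.normal(size=20)

results = []
for lam in [1e-3, 1e-1, 1e1]:
    # Tikhonov-regularized solve: min ||F u - y||^2 + lam ||u||^2
    u_hat = np.linalg.solve(F.T @ F + lam * np.eye(10), F.T @ y)
    misfit = np.linalg.norm(F @ u_hat - y)   # data misfit (L-curve x-axis)
    size = np.linalg.norm(u_hat)             # solution norm (L-curve y-axis)
    results.append((lam, misfit, size))
    print(f"lam={lam:g}  misfit={misfit:.3f}  ||u||={size:.3f}")
```

As \lambda grows the misfit increases and the solution norm shrinks; the "corner" of the resulting curve is a common heuristic for the trade-off point.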

References

  • [1] J. Adler and O. Öktem (2017) Solving ill-posed inverse problems using iterative deep neural networks. Inverse Problems 33 (12), pp. 124007. Cited by: §2.
  • [2] J. Adler and O. Öktem (2018) Deep Bayesian inversion. arXiv preprint arXiv:1811.05910. Cited by: §2.
  • [3] P. Baldi (2012) Autoencoders, unsupervised learning, and deep architectures. In Proceedings of ICML Workshop on Unsupervised and Transfer Learning, pp. 37–49. Cited by: §2.
  • [4] D. Clevert, T. Unterthiner, and S. Hochreiter (2015) Fast and accurate deep network learning by exponential linear units (ELUs). arXiv preprint arXiv:1511.07289.
  • [5] N. B. Erichson, M. Muehlebach, and M. W. Mahoney (2019) Physics-informed autoencoders for lyapunov-stable fluid flow prediction. arXiv preprint arXiv:1905.10866. Cited by: §2.
  • [6] A. Gogna, A. Majumdar, and R. Ward (2016) Semi-supervised stacked label consistent autoencoder for reconstruction and analysis of biomedical signals. IEEE Transactions on Biomedical Engineering 64 (9), pp. 2196–2205. Cited by: §2.
  • [7] J. He and J. Xu (2019) MgNet: a unified framework of multigrid and convolutional neural network. Science China Mathematics, pp. 1–24. Cited by: §2.
  • [8] G. E. Hinton and R. R. Salakhutdinov (2006) Reducing the dimensionality of data with neural networks. Science 313 (5786), pp. 504–507. Cited by: §2.
  • [9] S. Jalali and X. Yuan (2019) Using auto-encoders for solving ill-posed linear inverse problems. arXiv preprint arXiv:1901.05045. Cited by: §2, §2.
  • [10] K. H. Jin, M. T. McCann, E. Froustey, and M. Unser (2017) Deep convolutional neural network for inverse problems in imaging. IEEE Transactions on Image Processing 26 (9), pp. 4509–4522. Cited by: §2, §2.
  • [11] R. King, O. Hennigh, A. Mohan, and M. Chertkov (2018) From deep to physics-informed learning of turbulence: diagnostics. arXiv preprint arXiv:1810.07785. Cited by: §2.
  • [12] H. Li, J. Schwab, S. Antholzer, and M. Haltmeier (2018) NETT: solving inverse problems with deep neural networks. arXiv preprint arXiv:1803.00092. Cited by: §2.
  • [13] A. Majumdar (2018) An autoencoder based formulation for compressed sensing reconstruction. Magnetic resonance imaging 52, pp. 62–68. Cited by: §2.
  • [14] X. Mao, C. Shen, and Y. Yang (2016) Image restoration using very deep convolutional encoder-decoder networks with symmetric skip connections. In Advances in neural information processing systems, pp. 2802–2810. Cited by: §2.
  • [15] J. Mehta and A. Majumdar (2017) RODEO: robust de-aliasing autoencoder for real-time medical image reconstruction. Pattern Recognition 63, pp. 499–510. Cited by: §2.
  • [16] D. Patel and A. A. Oberai (2019) Bayesian inference with generative adversarial network priors. arXiv preprint arXiv:1907.09987. Cited by: §2.
  • [17] D. Pathak, P. Krahenbuhl, J. Donahue, T. Darrell, and A. A. Efros (2016) Context encoders: feature learning by inpainting. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2536–2544. Cited by: §2.
  • [18] M. Raissi, P. Perdikaris, and G. E. Karniadakis (2019) Physics-informed neural networks: a deep learning framework for solving forward and inverse problems involving nonlinear partial differential equations. Journal of Computational Physics 378, pp. 686–707. Cited by: §2.
  • [19] M. Raissi, P. Perdikaris, and G. E. Karniadakis (2017) Physics informed deep learning (part i): data-driven solutions of nonlinear partial differential equations. arXiv preprint arXiv:1711.10561. Cited by: §2.
  • [20] M. Raissi, P. Perdikaris, and G. E. Karniadakis (2017) Physics informed deep learning (part ii): data-driven discovery of nonlinear partial differential equations. arXiv preprint arXiv:1711.10566. Cited by: §2.
  • [21] M. Raissi (2018) Deep hidden physics models: deep learning of nonlinear partial differential equations. The Journal of Machine Learning Research 19 (1), pp. 932–955. Cited by: §2.
  • [22] D. E. Rumelhart, G. E. Hinton, and R. J. Williams (1985) Learning internal representations by error propagation. Technical report California Univ San Diego La Jolla Inst for Cognitive Science. Cited by: §2.
  • [23] P. Vincent, H. Larochelle, I. Lajoie, Y. Bengio, and P. Manzagol (2010) Stacked denoising autoencoders: learning useful representations in a deep network with a local denoising criterion. Journal of Machine Learning Research 11 (Dec), pp. 3371–3408. Cited by: §2.
  • [24] W. Wang, Y. Huang, Y. Wang, and L. Wang (2014) Generalized autoencoder: a neural network framework for dimensionality reduction. In Proceedings of the IEEE conference on computer vision and pattern recognition workshops, pp. 490–497. Cited by: §2.
  • [25] J. Wu, J. Wang, H. Xiao, and J. Ling (2016) Physics-informed machine learning for predictive turbulence modeling: a priori assessment of prediction confidence. e-print. Cited by: §2.
  • [26] J. Xie, L. Xu, and E. Chen (2012) Image denoising and inpainting with deep neural networks. In Advances in neural information processing systems, pp. 341–349. Cited by: §2.
  • [27] Y. Yang and P. Perdikaris (2018) Physics-informed deep generative models. arXiv preprint arXiv:1812.03511. Cited by: §2.
  • [28] H. Yao, Y. Ren, and Y. Liu (2019) FEA-Net: a deep convolutional neural network with physics prior for efficient data driven PDE learning. In AIAA Scitech 2019 Forum, pp. 0680. Cited by: §2.
  • [29] K. Zeng, J. Yu, R. Wang, C. Li, and D. Tao (2015) Coupled deep autoencoder for single image super-resolution. IEEE Transactions on Cybernetics 47 (1), pp. 27–37. Cited by: §2.

Appendix A Training Metrics for Results

Figure 12: Training metrics with regularization parameter \lambda = 0.01, using full domain data. Top row: log-scale losses of the autoencoder, the parameter data, and the state data. Bottom row: relative error of the parameter estimates from the decoder and relative error of the state estimates from the encoder.
Figure 13: Training metrics with regularization parameter \lambda = 1, using full domain data. Top row: log-scale losses of the autoencoder, the parameter data, and the state data. Bottom row: relative error of the parameter estimates from the decoder and relative error of the state estimates from the encoder.
Figure 14: Training metrics with regularization parameter \lambda = 0.01, using full domain data. Top row: log-scale losses of the autoencoder, the parameter data, and the state data. Bottom row: relative error of the parameter estimates from the decoder and relative error of the state estimates from the encoder.
Figure 15: Training metrics with regularization parameter \lambda = 1, using full domain data. Top row: log-scale losses of the autoencoder, the parameter data, and the state data. Bottom row: relative error of the parameter estimates from the decoder and relative error of the state estimates from the encoder.
Figure 16: Training metrics with regularization parameter \lambda = 10, using full domain data. Top row: log-scale losses of the autoencoder, the parameter data, and the state data. Bottom row: relative error of the parameter estimates from the decoder and relative error of the state estimates from the encoder.
Figure 17: Training metrics with regularization parameter \lambda = 50, using full domain data. Top row: log-scale losses of the autoencoder, the parameter data, and the state data. Bottom row: relative error of the parameter estimates from the decoder and relative error of the state estimates from the encoder.
Figure 18: Training metrics with regularization parameter \lambda = 0.01, using boundary data. Top row: log-scale losses of the autoencoder, the parameter data, and the state data. Bottom row: relative error of the parameter estimates from the decoder and relative error of the state estimates from the encoder.
Figure 19: Training metrics with regularization parameter \lambda = 1, using boundary data. Top row: log-scale losses of the autoencoder, the parameter data, and the state data. Bottom row: relative error of the parameter estimates from the decoder and relative error of the state estimates from the encoder.
Figure 20: Training metrics with regularization parameter \lambda = 0.01, using boundary data. Top row: log-scale losses of the autoencoder, the parameter data, and the state data. Bottom row: relative error of the parameter estimates from the decoder and relative error of the state estimates from the encoder.
Figure 21: Training metrics with regularization parameter \lambda = 1, using boundary data. Top row: log-scale losses of the autoencoder, the parameter data, and the state data. Bottom row: relative error of the parameter estimates from the decoder and relative error of the state estimates from the encoder.
Figure 22: Training metrics with regularization parameter \lambda = 10, using boundary data. Top row: log-scale losses of the autoencoder, the parameter data, and the state data. Bottom row: relative error of the parameter estimates from the decoder and relative error of the state estimates from the encoder.
Figure 23: Training metrics with regularization parameter \lambda = 50, using boundary data. Top row: log-scale losses of the autoencoder, the parameter data, and the state data. Bottom row: relative error of the parameter estimates from the decoder and relative error of the state estimates from the encoder.