uq-vae
Solving Bayesian Inverse Problems via Variational Autoencoders
This work develops model-aware autoencoder networks as a new method for solving scientific forward and inverse problems. Autoencoders are unsupervised neural networks that are able to learn new representations of data through appropriately selected architecture and regularization. The resulting mappings to and from the latent representation can be used to encode and decode the data. In our work, we set the data space to be the parameter space of a parameter of interest we wish to invert for. Further, as a way to encode the underlying physical model into the autoencoder, we enforce the latent space of the autoencoder to be the space of observations of physically governed phenomena. In doing so, we leverage the well-known capability of a deep neural network as a universal function approximator to simultaneously obtain both the parameter-to-observation and observation-to-parameter maps. The results suggest that this simultaneous learning interacts synergistically to improve the inversion capability of the autoencoder.
Deep learning and related neural network techniques have provided a useful framework for modelling physical systems. A significant question that arises in effective modelling is how one informs the neural network of the task at hand in order to improve its performance. Examples include physics-informed neural networks [19, 20, 21, 18], where the residual of the governing partial differential equation acts as a regularizing term to inform the network of the underlying physics; FEA-net and MG-net [28, 7], which leverage the known structure of discrete PDE solvers; and many others [25, 11, 27, 5]. In the context of physically driven inverse problems, solving the PDE can be considered the forward problem. Neural networks have also been utilized in solving inverse problems [2, 1, 10, 12, 9, 16]. As with forward problems, many of the techniques used to solve inverse problems with neural networks introduce the neural network as a regularizer to the problem. In our work, we aim to equip autoencoders with the underlying forward model by enforcing the latent variables to be the observational data. This model-aware autoencoder, as shall be demonstrated, is a promising approach for solving inverse problems.
Autoencoders were first introduced in [22] to address the challenges of unsupervised learning by using the input data as the teacher to train a neural network. The more modern use of autoencoders involves selecting a particular architecture or optimization problem for training the autoencoder so that it is able to learn new representations of data. For example, by selecting a network architecture involving a bottleneck in the hidden layers, a network that is trained to recover the input through the bottleneck can then be used as a dimensionality reduction operator [8, 24]. Another notable example is that, by corrupting the input with noise, the trained autoencoder can act as a denoiser of data [23]. In [3], a general framework for studying linear and nonlinear autoencoders was introduced.
Autoencoders have been used in solving inverse problems in [9, 29, 26, 14, 17, 10, 6, 13, 15], mostly in the context of image reconstruction. In [9], a two-layer autoencoder was used in the setting of compressed sensing in order to recover a high-dimensional signal from underdetermined linear measurements. In doing so, a generative model was trained such that an approximation of the signal can be obtained as a mapping from some latent space. Further, the work provided a proof that if the number of measurements is larger than twice the dimension of the generative model, the signal can be recovered from the measurements up to some distortion. In [6, 15], training autoencoders to obtain a signal-generating model was used in the context of medical imaging. In [13], a more generic patch-based reconstruction technique was introduced that can be applied to any imaging modality. In this paper we take a radically different approach: instead of considering the encoder-decoder transition layer as the latent space, we enforce it to be our measurement space, where we input data. In doing so, by training the autoencoder, we simultaneously learn the forward map as the encoder and the inverse map as the decoder.
Regularization is often used to prevent autoencoders from learning a trivial identity mapping from input to output. In our work, we consider the input and output data to be a parameter of interest and we use regularization in order to ensure that the autoencoder learns the inverse mapping from measurement data to the parameter. Specifically, we consider a loss function of the form
(1)
Here the loss involves the parameter of interest, the parameter data, the encoder, the decoder, the autoencoder network weights, and the observation of the state. By minimizing this loss function, we obtain the observation-to-parameter map as the decoder and, additionally, the parameter-to-observation map as the encoder. A simple three-layer autoencoder is depicted in Figure 1. With this approach, measurement data can be supplied as the argument of the decoder portion of the autoencoder to obtain a parameter estimate as the output, thereby effectively performing an inverse problem solve.
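The idea of tying the latent layer to the observations can be illustrated with a minimal sketch. It assumes a squared reconstruction error plus a squared mismatch between the latent layer and the observations, weighted by a regularization parameter; the function name `model_aware_loss`, the argument names, and the exact placement of the weight `reg` are illustrative assumptions and are not taken from the paper.

```python
import numpy as np

def model_aware_loss(u, y_obs, encoder, decoder, reg=1.0):
    """Reconstruction error plus a weighted latent-versus-observation mismatch.

    u       : (n_samples, n_param) parameter samples (autoencoder input and target)
    y_obs   : (n_samples, n_obs) observations paired with each parameter sample
    encoder : callable, parameters -> latent layer (forward-map surrogate)
    decoder : callable, latent layer -> parameters (inverse-map surrogate)
    reg     : weight on the latent mismatch (assumed form of the regularization)
    """
    latent = encoder(u)
    recon = decoder(latent)
    recon_term = np.mean(np.sum((u - recon) ** 2, axis=1))
    latent_term = np.mean(np.sum((y_obs - latent) ** 2, axis=1))
    return recon_term + reg * latent_term

# Toy check with a linear forward map A and its exact inverse (hypothetical data).
A = np.array([[2.0, 0.0], [0.0, 0.5]])
u = np.array([[1.0, 2.0], [3.0, 4.0]])
y_obs = u @ A.T

# Encoder equal to the forward map, decoder equal to its inverse: both terms vanish.
loss_exact = model_aware_loss(u, y_obs, lambda v: v @ A.T,
                              lambda y: y @ np.linalg.inv(A).T)

# Trivial identity autoencoder: perfect reconstruction, but the latent layer
# does not match the observations, so the second term penalizes it.
loss_identity = model_aware_loss(u, y_obs, lambda v: v, lambda v: v)
```

In this toy setting the identity autoencoder reconstructs its input perfectly yet still incurs a positive loss, which is the mechanism that steers the encoder towards the parameter-to-observation map rather than a trivial mapping.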
To the best of our knowledge, equipping the latent space of an autoencoder as a measurement space, so that the autoencoder is aware of its task of learning the forward and inverse mappings simultaneously, is a novel concept.
In this section, we present preliminary results for the thermal fin problem. The temperature distribution within the fin is governed by the following elliptic partial differential equation:
(2)
(3)
(4)
The quantities appearing in these equations are the thermal heat conductivity, the Biot number, the physical domain describing the thermal fin, the bottom edge of the fin, and the exterior edges of the fin. Equation (3) models convective heat losses to the external surface, and equation (4) models the heat source at the root. The experiment parameterizes the thermal conductivity as a function defined over a finite element mesh. The quantity of interest for the forward problem is the pointwise temperature across the fin. The quantity of interest for the inverse problem is the heat conductivity across the fin. The finite element mesh for the thermal fin is displayed in Figure 2. We consider two cases of parameter distribution. The first case is a piecewise constant distribution over the eight subfins and central subdomain. The second case is a parameter distribution that is spatially varying over the whole fin.
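For orientation, one common way of writing a thermal fin model that matches the verbal description above is sketched below. All symbols are assumptions introduced here for illustration: $y$ for the temperature, $\kappa$ for the thermal conductivity, $\mathrm{Bi}$ for the Biot number, $\Omega$ for the fin, $\Gamma_{\mathrm{root}}$ for the bottom edge, $\Gamma_{\mathrm{ext}}$ for the exterior edges, and $\mathbf{n}$ for the outward unit normal. The unit magnitude of the heat source is likewise an assumption, and the paper's own equations (2) to (4) may use a different notation or scaling.

```latex
\begin{align*}
  -\nabla \cdot \left( \kappa \nabla y \right) &= 0
      && \text{in } \Omega, \\
  -\kappa \nabla y \cdot \mathbf{n} &= \mathrm{Bi}\, y
      && \text{on } \Gamma_{\mathrm{ext}}
      \quad \text{(convective heat loss)}, \\
  \kappa \nabla y \cdot \mathbf{n} &= 1
      && \text{on } \Gamma_{\mathrm{root}}
      \quad \text{(heat source at the root)}.
\end{align*}
```

With the outward normal convention, the second line states that the outgoing heat flux is proportional to the surface temperature, and the third line prescribes a fixed incoming flux at the root.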
Our neural network architecture consists of hidden layers, each possessing 500 nodes. We consider two test cases. For the first case, we assume that the measurements are taken over the full domain, so the third hidden layer, where the measurement data is input, has as many nodes as the computational mesh of the thermal fin. The second case assumes that we have boundary measurements, taken at points along the outside of the thermal fin, so the third hidden layer has one node per boundary measurement point. Optimization is performed using the Adam optimizer. We consider the regularization parameters 0.01 and 1 for the case of a piecewise constant parameter distribution (Table 1) and the regularization parameters 0.01, 1, 10, and 50 for the case of a spatially varying parameter distribution (Table 3). We use a data set of parameter and state measurement pairs for training. We compare the estimates obtained using our autoencoder with estimates obtained using standard feed-forward deep neural networks that model the parameter-to-observation and the observation-to-parameter maps separately. The loss functions we minimize to learn the parameter-to-observation and the observation-to-parameter maps are, respectively:
(5a)
(5b)
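The layer layout described above (two hidden layers of 500 nodes on either side of a middle layer sized to the measurements) can be sketched as a plain NumPy forward pass. This is an untrained illustration only: the ReLU activation, the weight initialization, and the value of `n_obs` are assumptions, and `n_param = 9` merely mirrors the eight subfins plus central subdomain of the piecewise constant case.

```python
import numpy as np

def init_mlp(sizes, rng):
    """Return a list of (weights, bias) pairs for consecutive layer sizes."""
    return [(rng.normal(0.0, 1.0 / np.sqrt(m), size=(m, n)), np.zeros(n))
            for m, n in zip(sizes[:-1], sizes[1:])]

def forward(layers, x):
    """Apply ReLU after every layer except the last, which stays linear."""
    for i, (w, b) in enumerate(layers):
        x = x @ w + b
        if i < len(layers) - 1:
            x = np.maximum(x, 0.0)
    return x

rng = np.random.default_rng(0)
n_param, n_obs, width = 9, 40, 500  # n_obs is a hypothetical measurement count

# Encoder: parameter -> 500 -> 500 -> measurement layer (forward-map surrogate).
encoder = init_mlp([n_param, width, width, n_obs], rng)
# Decoder: measurement layer -> 500 -> 500 -> parameter (inverse-map surrogate).
decoder = init_mlp([n_obs, width, width, n_param], rng)

u = rng.normal(size=(4, n_param))           # batch of parameter samples
y_pred = forward(encoder, u)                # predicted observations
u_rec = forward(decoder, y_pred)            # autoencoder reconstruction

# Inverse solve: feed measured data directly into the decoder.
y_measured = rng.normal(size=(4, n_obs))    # stand-in for real measurements
u_from_data = forward(decoder, y_measured)
```

Because the middle layer has the dimension of the measurement vector, the same decoder weights serve both the reconstruction path during training and the standalone inverse solve afterwards.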
We begin with the case of a piecewise constant distribution of the heat conductivity. In Figure 3 we display the estimates, with the accompanying relative errors displayed in Table 1. The training metrics are displayed in Figure 12 and Figure 13 of the Appendix for the regularization parameter values 0.01 and 1 respectively. When the regularization parameter is 1, the estimates are accurate, which is quantitatively supported by the low relative errors. Furthermore, the training metrics displayed in Figure 13 of the Appendix show desirable behaviour, with the relative errors of the parameter and state predictions decreasing as the training loss decreases. When the regularization parameter is 0.01, the parameter and state estimates are considerably less accurate, which suggests a dependence on the measurement data when training the network.
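Table 1: Relative errors of the autoencoder estimates for the piecewise constant parameter distribution.
Regularization parameter | Full data: parameter estimate | Full data: state estimate | Boundary data: parameter estimate | Boundary data: state estimate
---|---|---|---|---
0.01 | 32.119% | 6.932% | 37.218% | 8.418%
1 | 4.151% | 0.409% | 5.139% | 0.3286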
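Relative errors such as those reported in Table 1 can be computed along the following lines. The choice of the Euclidean norm and the function name `relative_error_percent` are assumptions; the text does not state which norm was used.

```python
import numpy as np

def relative_error_percent(estimate, truth):
    """Return 100 * ||estimate - truth|| / ||truth|| using the Euclidean norm."""
    estimate = np.asarray(estimate, dtype=float)
    truth = np.asarray(truth, dtype=float)
    return 100.0 * np.linalg.norm(estimate - truth) / np.linalg.norm(truth)

# Hypothetical two-component example: the error vector has norm 0.5
# and the true vector has norm 5.
truth = np.array([3.0, 4.0])
estimate = np.array([3.0, 4.5])
err = relative_error_percent(estimate, truth)
```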
We now display results obtained using boundary data. In Figure 4 we display the estimates, with the accompanying relative errors displayed in Table 1. Note that there is no state prediction displayed, as the middle layer of the autoencoder corresponding to the measurement data input only has as many nodes as there are boundary measurement points. Therefore, the encoder can only be used to estimate the boundary data and not the full domain distribution of temperature. As with the full domain data results, when the regularization parameter is 0.01 the parameter and state estimates are considerably less accurate than when it is 1, which suggests a dependence on the measurement data when training the network. The training metrics displayed in Figure 18 and Figure 19 of the Appendix again show desirable behaviour.
We compare these estimates with estimates obtained from using standard feed-forward deep neural networks to model the parameter-to-observation and the observation-to-parameter maps. The architecture of each of these networks matches that of the analogous encoder or decoder portion of the autoencoder. That is, each network consists of two hidden layers, each possessing 500 nodes. The loss functions we minimize to learn the parameter-to-observation and the observation-to-parameter maps are as displayed in (5). We set the regularization parameter to be 0.01.
We display the parameter estimates in Figure 5 and the state estimates in Figure 6. Note that there is no reconstruction displayable for the parameter-to-observation model when boundary data was used. From the quality of the estimates, it is clear that both neural networks possess enough capacity to accurately learn their respective maps. This is quantitatively supported by the relative errors displayed in Table 2. Therefore, our results suggest that simultaneous learning with an autoencoder does not significantly improve the learning of the parameter-to-observation and the observation-to-parameter maps for the case of a piecewise constant parameter distribution.
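Table 2: Relative errors of the feed-forward network estimates for the piecewise constant parameter distribution.
Regularization parameter | Full data: parameter estimate | Full data: state estimate | Boundary data: parameter estimate | Boundary data: state estimate
---|---|---|---|---
0.01 | 6.806% | 5.581% | 8.987% | 6.014%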
We now display the results when the heat conductivity is spatially varying over all points of the domain. This distribution was drawn from a Gaussian random field. We begin with full domain data; the results are displayed in Figure 7 and Figure 8, with accompanying relative errors displayed in Table 3. In contrast to the case of piecewise constant parameters, we require a much larger value of the regularization parameter in order to achieve under 20% relative error. The training metrics are displayed in Figures 14, 15, 16, and 17 of the Appendix for the four regularization parameter values in Table 3.
Table 3: Relative errors of the autoencoder estimates for the spatially varying parameter distribution.
Regularization parameter | Full data: parameter estimate | Full data: state estimate | Boundary data: parameter estimate | Boundary data: state estimate
---|---|---|---|---
0.01 | 69.432% | 77.152% | 69.889% | 87.426%
1 | 24.321% | 3.421% | 26.679% | 4.211%
10 | 16.063% | 1.601% | 17.727% | 1.427%
50 | 14.021% | 1.371% | 15.892% | 1.191%
Finally, we display the results when the heat conductivity is spatially varying at all points of the domain and our observations are boundary data. The results are displayed in Figure 9, with accompanying relative errors displayed in Table 3. Again, in contrast to the case of piecewise constant parameters, we require a much larger value of the regularization parameter in order to achieve under 20% relative error. Also, we notice that the parameter relative errors are slightly worse than when full domain data is used. The training metrics are displayed in Figures 20, 21, 22, and 23 of the Appendix for the four regularization parameter values in Table 3.
We compare these estimates with estimates obtained from using standard feed-forward deep neural networks to model the parameter-to-observation and the observation-to-parameter maps. The architecture of each of these networks again matches that of the analogous encoder or decoder portion of the autoencoder, and the loss functions we minimize to learn the parameter-to-observation and the observation-to-parameter maps are as in (5). We display the results in Figure 10 for the parameter estimates and Figure 11 for the state estimates. Note that there is no estimate displayable for the parameter-to-observation model when boundary data was used. Unlike the case of a piecewise constant parameter distribution, here both the state and parameter estimates are worse than the estimates obtained using the autoencoder. This is quantitatively supported by the relative errors displayed in Table 4; we see that these errors are higher than the best autoencoder errors displayed in Table 3.
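Table 4: Relative errors of the feed-forward network estimates for the spatially varying parameter distribution.
Regularization parameter | Full data: parameter estimate | Full data: state estimate | Boundary data: parameter estimate | Boundary data: state estimate
---|---|---|---|---
0.001 | 17.930% | 8.828% | 19.027% | 9.232%
0.01 | 25.421% | 17.697% | 27.901% | 10.907%
0.1 | 37.79% | 25.069% | 39.585% | 25.272%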
In this paper we introduce a new method for solving scientific forward and inverse problems through the use of autoencoders. At the heart of our method is encoding the forward model by enforcing the latent variables to be the observational data. This informs the autoencoder of the underlying forward model and hence improves its performance as an inversion method. Indeed, the results with the spatially varying parameter distribution suggest that there are synergistic advantages when learning the parameter-to-observation and observation-to-parameter maps simultaneously. However, care must be taken in the selection of the regularization parameter, as the resulting models of both the parameter-to-observation and observation-to-parameter maps display sensitivity to this choice. An appropriate regularization parameter can be chosen through many existing methods, including cross validation and the L-curve. Ongoing work includes understanding when the method breaks down and how to further encode knowledge of the underlying mathematical models in training, thus further improving the inversion capability of the method. Part of future work is to extend the proposed approach to more challenging inverse problems, including those governed by hyperbolic PDEs.
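As an illustration of choosing a regularization parameter by validation, the sketch below scans a grid of candidate values and keeps the one with the lowest error on held-out data. It is a generic hold-out procedure, not the paper's experiment: ridge regression stands in for the autoencoder, and the names `select_by_validation`, `fit_ridge`, and `mse`, the synthetic data, and the candidate grid are all hypothetical.

```python
import numpy as np

def select_by_validation(candidates, fit, score, train, val):
    """Return the candidate with the lowest validation score, plus all scores."""
    scores = {c: score(fit(train, c), val) for c in candidates}
    best = min(scores, key=scores.get)
    return best, scores

def fit_ridge(data, reg):
    """Toy stand-in model: ridge regression solved via the normal equations."""
    x, y = data
    return np.linalg.solve(x.T @ x + reg * np.eye(x.shape[1]), x.T @ y)

def mse(w, data):
    """Mean squared prediction error of weights w on a data split."""
    x, y = data
    return float(np.mean((x @ w - y) ** 2))

# Synthetic linear data with a little noise, split into training and validation.
rng = np.random.default_rng(0)
x = rng.normal(size=(60, 5))
y = x @ np.arange(1.0, 6.0) + 0.1 * rng.normal(size=60)
train, val = (x[:40], y[:40]), (x[40:], y[40:])

candidates = [0.001, 0.01, 0.1, 1.0, 10.0]
best, scores = select_by_validation(candidates, fit_ridge, mse, train, val)
```

For the autoencoder, `fit` would be replaced by a full training run at the given regularization value and `score` by the relative error of the decoder's parameter estimates on held-out pairs.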
Proceedings of ICML workshop on unsupervised and transfer learning, pp. 37–49.
MgNet: a unified framework of multigrid and convolutional neural network. Science China Mathematics, pp. 1–24.
Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 2536–2544.
The Journal of Machine Learning Research 19 (1), pp. 932–955.
Stacked denoising autoencoders: learning useful representations in a deep network with a local denoising criterion. Journal of Machine Learning Research 11 (Dec), pp. 3371–3408.
Coupled deep autoencoder for single image super-resolution. IEEE Transactions on Cybernetics 47 (1), pp. 27–37.