Deep Multi-fidelity Gaussian Processes

04/26/2016
by   Maziar Raissi, et al.

We develop a novel multi-fidelity framework that goes far beyond the classical AR(1) Co-kriging scheme of Kennedy and O'Hagan (2000). Our method can handle general discontinuous cross-correlations among systems with different levels of fidelity. A combination of multi-fidelity Gaussian Processes (AR(1) Co-kriging) and deep neural networks enables us to construct a method that is immune to discontinuities. We demonstrate the effectiveness of the new technology using standard benchmark problems designed to resemble the outputs of complicated high- and low-fidelity codes.


1 Motivation

Multi-fidelity modeling proves extremely useful when solving inverse problems, for instance. Inverse problems are ubiquitous in science. In general, the response of a system is modeled as a function $f(x)$ of a parameter vector $x$. The goal of model inversion is to find a parameter setting $x$ that matches a target response $y^\ast$. In other words, we are solving the following optimization problem:

$$\min_{x} \left\| f(x) - y^\ast \right\|,$$

for some suitable norm. In practice, $x$ is often a high-dimensional vector and $f$ is a complex, non-linear, and expensive-to-compute map. These factors render the solution of the optimization problem very challenging and motivate the use of surrogate models as a remedy for obtaining inexpensive samples of $f$ at unobserved locations. To this end, a surrogate model acts as an intermediate agent that is trained on available realizations of $f$, and is then able to perform accurate predictions of the response at a new set of inputs. A multi-fidelity framework can be employed to build efficient surrogate models of $f$. Our Deep Multi-fidelity GP algorithm is most useful when the function $f$ is very complicated, involves discontinuities, and when the correlation structures between different levels of fidelity have discontinuous, non-functional forms.

2 Introduction

Using deep neural networks, we build a multi-fidelity model that is immune to discontinuities. We employ Gaussian Processes (GPs) (see [5]), a non-parametric Bayesian regression technique. Gaussian Process regression is a popular and useful tool for approximating an objective function given some of its observations. It corresponds to a particular class of surrogate models which make the assumption that the response of the complex system is a realization of a Gaussian process. In particular, we are interested in Manifold Gaussian Processes [1], which are capable of capturing discontinuities. A Manifold GP is equivalent to jointly learning a data transformation into a feature space followed by a regular GP regression, and the model profits from standard GP properties. We show that the well-known classical multi-fidelity Gaussian Process model (AR(1) Co-kriging) [4] is a special case of our method. Multi-fidelity modeling is most useful when low-fidelity versions of a complex system are available; they may be less accurate but are computationally cheaper.

For the sake of clarity of presentation, we focus only on two levels of fidelity. However, our method can be readily generalized to multiple levels of fidelity. In the following, we assume that we have access to data $\{\mathbf{x}_L, \mathbf{y}_L\}$ and $\{\mathbf{x}_H, \mathbf{y}_H\}$ with two levels of fidelity, where the subscript $H$ denotes the higher level of fidelity. We use $n_H$ to denote the number of observations in $\{\mathbf{x}_H, \mathbf{y}_H\}$ and $n_L$ to denote the sample size of $\{\mathbf{x}_L, \mathbf{y}_L\}$. The main assumption is that $n_H \ll n_L$. This reflects the fact that high-fidelity data are scarce, since they are generated by an accurate but costly process. The low-fidelity data, on the other hand, are less accurate, cheap to generate, and hence abundant.

As for the notation, we employ the following convention. A boldface letter such as $\mathbf{x}$ is used to denote data. A non-boldface letter such as $x$ is used to denote either a vector or a scalar; which one is meant will be clear from the context.

3 Deep Multi-fidelity Gaussian Processes

A simple way to explain the main idea of this work is to consider the following structure:

$$\begin{bmatrix} f_L(x) \\ f_H(x) \end{bmatrix} \sim \mathcal{GP}\left( \begin{bmatrix} 0 \\ 0 \end{bmatrix}, \begin{bmatrix} k_L\big(h(x), h(x')\big) & \rho\, k_L\big(h(x), h(x')\big) \\ \rho\, k_L\big(h(x), h(x')\big) & \rho^2 k_L\big(h(x), h(x')\big) + k_H\big(h(x), h(x')\big) \end{bmatrix} \right), \quad (1)$$

where $h(x)$ is a deterministic transformation of the inputs into a feature space. The high fidelity system is modeled by $f_H$ and the low fidelity one by $f_L$. We use $\mathcal{GP}$ to denote a Gaussian Process. This approach can use any deterministic parametric data transformation $h(x)$. However, we focus on multi-layer neural networks

$$h(x) = \big(h^{(\ell)} \circ \cdots \circ h^{(1)}\big)(x),$$

where each layer of the network performs the transformation

$$h^{(k)}(z) = \sigma\big(W^{(k)} z + b^{(k)}\big),$$

with $\sigma$ being the transfer function, $W^{(k)}$ the weights, and $b^{(k)}$ the bias of the layer. We use $\theta_h$ to denote the parameters of the neural network. Moreover, $\theta_L$ and $\theta_H$ denote the hyper-parameters of the covariance functions $k_L$ and $k_H$, respectively. The parameters of the model are therefore given by

$$\theta = \big\{\theta_h, \theta_L, \theta_H, \rho\big\}.$$

It should be noted that the AR(1) Co-kriging model of [4] is a special case of our model in the sense that for AR(1) Co-kriging $h(x) = x$.
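For concreteness, the following is a minimal sketch (in Python/NumPy, with hypothetical layer sizes and parameter names, not the authors' implementation) of such a feature map $h(x)$ built from layers of the form $\sigma(Wz + b)$:

```python
import numpy as np

def sigmoid(z):
    # Sigmoid transfer function used as sigma in each layer.
    return 1.0 / (1.0 + np.exp(-z))

def feature_map(x, weights, biases):
    """Deterministic data transformation h(x): each layer applies W z + b
    followed by the sigmoid transfer function."""
    z = np.atleast_2d(x)          # shape (n, d_in)
    for W, b in zip(weights, biases):
        z = sigmoid(z @ W + b)    # shape (n, width of this layer)
    return z

# Hypothetical architecture: a 1-D input mapped to a 2-D feature space.
rng = np.random.default_rng(0)
weights = [rng.standard_normal((1, 2)), rng.standard_normal((2, 2))]
biases  = [np.zeros(2), np.zeros(2)]
print(feature_map(np.linspace(0, 1, 5)[:, None], weights, biases).shape)  # (5, 2)
```

Any differentiable parametric architecture could be substituted here; all that matters for the construction in (1) is that $h$ is a deterministic composition of layers.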

3.1 AR(1) Co-kriging

In [4], the authors consider the following autoregressive model

$$f_H(x) = \rho f_L(x) + \delta(x),$$

where $f_L$ and $\delta$ are two independent Gaussian Processes with

$$f_L(x) \sim \mathcal{GP}\big(0, k_L(x, x')\big)$$

and

$$\delta(x) \sim \mathcal{GP}\big(0, k_H(x, x')\big).$$

Therefore,

$$\operatorname{cov}\big(f_L(x), f_H(x')\big) = \rho\, k_L(x, x'), \qquad \operatorname{cov}\big(f_H(x), f_H(x')\big) = \rho^2 k_L(x, x') + k_H(x, x'),$$

and

$$\begin{bmatrix} f_L(x) \\ f_H(x) \end{bmatrix} \sim \mathcal{GP}\left( \begin{bmatrix} 0 \\ 0 \end{bmatrix}, \begin{bmatrix} k_L(x, x') & \rho\, k_L(x, x') \\ \rho\, k_L(x, x') & \rho^2 k_L(x, x') + k_H(x, x') \end{bmatrix} \right), \quad (2)$$

which is a special case of (1) with $h(x) = x$. The importance of $\rho$ is evident from (2). If $\rho = 0$, the high fidelity and low fidelity models are fully decoupled, and combining them yields no improvement in the prediction.
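To make the relation between (1) and (2) concrete, here is a minimal sketch that assembles the joint covariance matrix of the low- and high-fidelity training outputs for a given feature map $h$; taking $h$ to be the identity recovers the AR(1) Co-kriging covariance. The kernel signatures and function names are illustrative assumptions, not the authors' code.

```python
import numpy as np

def build_joint_covariance(xL, xH, kL, kH, rho, h=lambda x: x):
    """Assemble the 2x2 block covariance of [f_L(x_L); f_H(x_H)] from (1).

    kL, kH : kernel functions k(A, B) returning a (len(A), len(B)) matrix
    rho    : scalar correlation parameter
    h      : feature map; the identity gives the classical AR(1) Co-kriging model
    """
    zL, zH = h(xL), h(xH)
    K_LL = kL(zL, zL)                              # cov(f_L, f_L)
    K_LH = rho * kL(zL, zH)                        # cov(f_L, f_H)
    K_HH = rho**2 * kL(zH, zH) + kH(zH, zH)        # cov(f_H, f_H)
    return np.block([[K_LL,   K_LH],
                     [K_LH.T, K_HH]])
```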

4 Prediction

The Deep Multi-fidelity Gaussian Process structure (1) can be equivalently written in the following compact form of a multivariate Gaussian Process

$$g(x) \sim \mathcal{GP}\big(0, K\big), \quad (3)$$

with $g(x) := \begin{bmatrix} f_L(x) \\ f_H(x) \end{bmatrix}$, and $K$ the $2 \times 2$ block covariance of (1). This can be used to obtain the predictive distribution $p\big(f_H(x^\ast) \mid x^\ast, \mathbf{x}, \mathbf{y}\big)$ of the surrogate model for the high fidelity system at a new test point $x^\ast$ (see equation (4)). Note that the cross-covariance terms $\rho\, k_L\big(h(x), h(x')\big)$ model the correlation between the high-fidelity and the low-fidelity data and are therefore of paramount importance. The key role played by $\rho$ is already well-known in the literature [4]. Along the same lines, one can easily observe the effectiveness of learning the transformation function $h(x)$ jointly from the low fidelity and high fidelity data.

We obtain the following joint density:

$$\begin{bmatrix} \mathbf{y} \\ f_H(x^\ast) \end{bmatrix} \sim \mathcal{N}\left( \begin{bmatrix} 0 \\ 0 \end{bmatrix}, \begin{bmatrix} K & q_\ast^T \\ q_\ast & k_{\ast\ast} \end{bmatrix} \right),$$

where $\mathbf{y} := \begin{bmatrix} \mathbf{y}_L \\ \mathbf{y}_H \end{bmatrix}$, $K := \operatorname{cov}(\mathbf{y}, \mathbf{y})$ is the joint covariance matrix of the training data, $q_\ast := \operatorname{cov}\big(f_H(x^\ast), \mathbf{y}\big)$, and $k_{\ast\ast} := \operatorname{cov}\big(f_H(x^\ast), f_H(x^\ast)\big)$. From this, we conclude that

$$f_H(x^\ast) \mid \mathbf{y} \sim \mathcal{N}\big(\mu_H(x^\ast), \sigma_H^2(x^\ast)\big), \quad (4)$$

where

$$\mu_H(x^\ast) = q_\ast K^{-1} \mathbf{y}, \quad (5)$$
$$\sigma_H^2(x^\ast) = k_{\ast\ast} - q_\ast K^{-1} q_\ast^T, \quad (6)$$
$$q_\ast = \Big[ \rho\, k_L\big(h(x^\ast), h(\mathbf{x}_L)\big), \;\; \rho^2 k_L\big(h(x^\ast), h(\mathbf{x}_H)\big) + k_H\big(h(x^\ast), h(\mathbf{x}_H)\big) \Big]. \quad (7)$$
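A minimal sketch of the conditioning step behind (4)-(6), assuming the joint covariance matrix $K$, the cross-covariance vector $q_\ast$, and the prior variance $k_{\ast\ast}$ have already been assembled (all names are illustrative):

```python
import numpy as np

def predict_high_fidelity(K, y, q_star, k_star_star, jitter=1e-8):
    """Gaussian conditioning of f_H(x*) on the stacked data y = [y_L; y_H].

    K           : (n, n) joint covariance of the training targets
    q_star      : (n,) covariance between f_H(x*) and the training targets
    k_star_star : prior variance of f_H(x*)
    """
    n = K.shape[0]
    L = np.linalg.cholesky(K + jitter * np.eye(n))       # stabilised factorisation
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))  # K^{-1} y
    v = np.linalg.solve(L, q_star)                       # used for the variance term
    mean = q_star @ alpha                                # eq. (5): q_* K^{-1} y
    var = k_star_star - v @ v                            # eq. (6): k_** - q_* K^{-1} q_*^T
    return mean, var
```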

5 Training

The Negative Marginal Log Likelihood is given by

$$\mathcal{L}(\theta) := -\log p(\mathbf{y} \mid \mathbf{x}, \theta) = \frac{1}{2} \mathbf{y}^T K^{-1} \mathbf{y} + \frac{1}{2} \log |K| + \frac{n}{2} \log 2\pi, \quad (8)$$

where $n = n_L + n_H$ and $K := \operatorname{cov}(\mathbf{y}, \mathbf{y})$ is the joint covariance matrix of the training data.

The Negative Marginal Log Likelihood along with its gradient can be used to estimate the parameters $\theta = \{\theta_h, \theta_L, \theta_H, \rho\}$. Finding the gradient is discussed in the following. First observe that $\mathcal{L}$ depends on the parameters only through the covariance matrix $K$, and that $K$ depends on the neural network parameters $\theta_h$ only through the feature map $h$. Therefore,

$$\frac{\partial \mathcal{L}}{\partial \theta_j} = \frac{1}{2} \operatorname{tr}\left[ \left( K^{-1} - K^{-1} \mathbf{y} \mathbf{y}^T K^{-1} \right) \frac{\partial K}{\partial \theta_j} \right], \quad (9)$$

and

$$\frac{\partial K}{\partial \theta_h} = \frac{\partial K}{\partial h} \frac{\partial h}{\partial \theta_h}, \quad (10)$$

where $\frac{\partial h}{\partial \theta_h}$ is the Jacobian of the feature map with respect to the network parameters. We use backpropagation to find $\frac{\partial h}{\partial \theta_h}$. Backpropagation is a popular method of training artificial neural networks. With this method one can calculate the gradients of $h$ with respect to all the parameters in the network.
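As an illustration, a minimal sketch of evaluating the Negative Marginal Log Likelihood (8) for a given covariance matrix; in practice the gradient (9)-(10) would be obtained by backpropagation or automatic differentiation, which is not shown here:

```python
import numpy as np

def negative_marginal_log_likelihood(K, y, jitter=1e-8):
    """Evaluate eq. (8): 0.5 y^T K^{-1} y + 0.5 log|K| + 0.5 n log(2 pi)."""
    n = K.shape[0]
    L = np.linalg.cholesky(K + jitter * np.eye(n))
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))
    data_fit   = 0.5 * y @ alpha
    complexity = np.sum(np.log(np.diag(L)))    # 0.5 * log|K| via the Cholesky factor
    constant   = 0.5 * n * np.log(2 * np.pi)
    return data_fit + complexity + constant
```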

6 Summary of the Algorithm

The following summarizes our Deep Multi-fidelity GP algorithm.

  • First, we employ the Negative Marginal Log Likelihood (see eq. 8) to train the parameters and hyper-parameters of the model using the low- and high-fidelity data $\{\mathbf{x}_L, \mathbf{y}_L\}$ and $\{\mathbf{x}_H, \mathbf{y}_H\}$. We are therefore jointly training the neural network $h$ and the kernels $k_L$ and $k_H$ introduced in eq. 1.

  • Then, we use eq. 4 to predict the output of the high-fidelity function $f_H$ at a new test point $x^\ast$.
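Putting the pieces together, the sketch below shows one plausible way to wire the earlier snippets into the first step of the algorithm using a generic optimizer from SciPy; the authors train with gradient-based optimization and backpropagation, so this gradient-free variant and all helper names (e.g. `unpack`) are illustrative assumptions only.

```python
import numpy as np
from scipy.optimize import minimize

# feature_map, build_joint_covariance and negative_marginal_log_likelihood
# refer to the sketches given in the preceding sections.

def fit_deep_mf_gp(xL, yL, xH, yH, kL, kH, unpack, theta0):
    """Step 1: jointly train the network weights, the kernel hyper-parameters
    and rho by minimising the Negative Marginal Log Likelihood (8)."""
    y = np.concatenate([yL, yH])

    def objective(theta):
        # `unpack` is a hypothetical helper splitting the flat parameter vector.
        weights, biases, kL_params, kH_params, rho = unpack(theta)
        h = lambda x: feature_map(x, weights, biases)
        K = build_joint_covariance(xL, xH,
                                   lambda a, b: kL(a, b, *kL_params),
                                   lambda a, b: kH(a, b, *kH_params),
                                   rho, h)
        return negative_marginal_log_likelihood(K, y)

    # Gradient-free optimisation for illustration only; the paper uses
    # gradient-based training with backpropagation.
    return minimize(objective, theta0, method="Nelder-Mead")
```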

7 Numerical Experiments

To demonstrate the effectiveness of our proposed method, we apply our Deep Multi-fidelity Gaussian Processes algorithm to the following challenging benchmark problems.

7.1 Step Function

The high fidelity data is generated by the following step function

where

and the low fidelity data are generated by

where

In order to generate the training data, we pick uniformly distributed random points from the interval of interest. Out of these points, a small subset is chosen at random to constitute the high-fidelity set $\{\mathbf{x}_H, \mathbf{y}_H\}$ and a larger subset is picked at random to create the low-fidelity set $\{\mathbf{x}_L, \mathbf{y}_L\}$. We therefore obtain the dataset $\{\mathbf{x}_L, \mathbf{y}_L, \mathbf{x}_H, \mathbf{y}_H\}$. This dataset is depicted in figure 1.

Figure 1: Low-fidelity and High-fidelity dataset

We use a multi-layer neural network for the data transformation $h$. The transfer function $\sigma$ is the Sigmoid function, $\sigma(z) = \frac{1}{1 + e^{-z}}$, and the final layer of the network is two-dimensional, so that $h: \mathbb{R} \to \mathbb{R}^2$.

As for the kernels $k_L$ and $k_H$, we use squared exponential covariance functions with Automatic Relevance Determination (ARD) (see [5]) of the form

$$k(z, z') = \sigma_f^2 \exp\left( -\frac{1}{2} \sum_{d=1}^{D} \frac{(z_d - z_d')^2}{\ell_d^2} \right),$$

where $\sigma_f^2$ is the signal variance and $\ell_1, \ldots, \ell_D$ are the per-dimension length-scales.
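A minimal sketch of such an ARD squared exponential kernel, evaluated on the feature-space representations of two sets of points (the hyper-parameters are the signal variance and the per-dimension length-scales):

```python
import numpy as np

def ard_squared_exponential(A, B, signal_var, lengthscales):
    """k(z, z') = sigma_f^2 * exp(-0.5 * sum_d (z_d - z'_d)^2 / l_d^2)."""
    A = np.atleast_2d(A) / lengthscales       # scale each dimension separately
    B = np.atleast_2d(B) / lengthscales
    sq_dists = np.sum(A**2, 1)[:, None] + np.sum(B**2, 1)[None, :] - 2 * A @ B.T
    return signal_var * np.exp(-0.5 * np.maximum(sq_dists, 0.0))
```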

The predictive mean and two standard deviation bounds for our Deep Multi-fidelity Gaussian Processes method are depicted in figure 2.

Figure 2: Deep Multi-fidelity Gaussian Processes predictive mean and two standard deviations

The 2D feature space discovered by the nonlinear mapping $h(x)$ is depicted in figure 3. Recall that, for this example, we have $h: \mathbb{R} \to \mathbb{R}^2$.

Figure 3: The 2D feature space discovered by the nonlinear mapping $h(x)$. Notice that $h: \mathbb{R} \to \mathbb{R}^2$. Latent dimensions 1 and 2 correspond to the first and second dimensions of $h(x)$.

The discontinuity of the model is captured by the non-linear mapping $h(x)$. Therefore, the mapping from the feature space to the outputs is smooth and can be easily handled by a regular AR(1) Co-kriging model. In order to see the importance of the mapping $h(x)$, let us compare our method with AR(1) Co-kriging. This comparison is depicted in figure 4.

Figure 4: AR(1) Co-kriging predictive mean and two standard deviations

7.2 Forrester Function [3] with Jump

The low fidelity data are generated by

and the high fidelity data are generated by

In order to generate the training data, we pick uniformly distributed random points from the interval of interest. Out of these points, a small subset is chosen at random to constitute the high-fidelity set $\{\mathbf{x}_H, \mathbf{y}_H\}$ and a larger subset is picked at random to create the low-fidelity set $\{\mathbf{x}_L, \mathbf{y}_L\}$. We therefore obtain the dataset $\{\mathbf{x}_L, \mathbf{y}_L, \mathbf{x}_H, \mathbf{y}_H\}$. This dataset is depicted in figure 5.
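The exact expressions for this benchmark are not reproduced above. As a purely illustrative stand-in, the sketch below uses the standard Forrester function for the high-fidelity level and its commonly used shifted-and-scaled low-fidelity counterpart, with a hypothetical jump term added to mimic the discontinuity described here; these formulas are assumptions, not the paper's exact benchmark.

```python
import numpy as np

def forrester_high(x, jump_at=0.5, jump_size=3.0):
    """Standard Forrester function (6x-2)^2 sin(12x-4), plus a hypothetical jump."""
    f = (6 * x - 2) ** 2 * np.sin(12 * x - 4)
    return f + jump_size * (x > jump_at)        # assumed discontinuity

def forrester_low(x, jump_at=0.5, jump_size=3.0):
    """Commonly used low-fidelity Forrester variant: a scaled and shifted copy."""
    return 0.5 * forrester_high(x, jump_at, jump_size) + 10 * (x - 0.5) - 5
```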

Figure 5: Low-fidelity and High-fidelity dataset

Figure 6 depicts the relation between the low fidelity and the high fidelity data generating processes. One should notice the discontinuous and non-functional form of this relation.

Figure 6: Relation between the Low-fidelity and High-fidelity data generating processes.

Our choice of the neural network and covariance functions is as before. The predictive mean and two standard deviation bounds for our Deep Multi-fidelity Gaussian Processes method are depicted in figure 7.

Figure 7: Deep Multi-fidelity Gaussian Processes predictive mean and two standard deviations

The 2D feature space discovered by the nonlinear mapping $h(x)$ is depicted in figure 8.

Figure 8: The 2D feature space discovered by the nonlinear mapping $h(x)$. Notice that $h: \mathbb{R} \to \mathbb{R}^2$. Latent dimensions 1 and 2 correspond to the first and second dimensions of $h(x)$.

Once again, the discontinuity of the model is captured by the non-linear mapping $h(x)$. In order to see the importance of the mapping $h(x)$, let us compare our method with AR(1) Co-kriging. This comparison is depicted in figure 9.

Figure 9: AR(1) Co-kriging predictive mean and two standard deviations

7.3 A Sample Function

The main objective of this section is to demonstrate the types of cross-correlation structures that our framework is capable of handling. In the following, let the true mapping be a prescribed function $h(x)$, which is plotted in figure 10.

Figure 10: The true mapping $h(x)$.

Given $h(x)$, we generate a sample of the joint prior distribution (1). This gives us two sample functions $f_L(x)$ and $f_H(x)$, where $f_H$ is the high-fidelity one. In order to generate the training data, we pick uniformly distributed random points from the interval of interest. Out of these points, a small subset is chosen at random to constitute the high-fidelity set $\{\mathbf{x}_H, \mathbf{y}_H\}$ and a larger subset is picked at random to create the low-fidelity set $\{\mathbf{x}_L, \mathbf{y}_L\}$. We therefore obtain the dataset $\{\mathbf{x}_L, \mathbf{y}_L, \mathbf{x}_H, \mathbf{y}_H\}$. This dataset is depicted in figure 11.
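A minimal sketch of drawing such a sample, assuming a chosen mapping $h$ and reusing the block-covariance sketch from section 3.1; the kernels, the mapping, and the helper names are placeholders:

```python
import numpy as np

def sample_joint_prior(x, kL, kH, rho, h, jitter=1e-8, seed=0):
    """Draw one realisation of [f_L(x); f_H(x)] from the joint prior (1)."""
    K = build_joint_covariance(x, x, kL, kH, rho, h)   # from the earlier sketch
    n = len(x)
    rng = np.random.default_rng(seed)
    sample = rng.multivariate_normal(np.zeros(2 * n), K + jitter * np.eye(2 * n))
    f_low, f_high = sample[:n], sample[n:]
    return f_low, f_high
```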

Figure 11: Low-fidelity and High-fidelity dataset

Figure 12 depicts the relation between the low fidelity and the high fidelity data generating processes. One should notice the discontinuous and non-functional form of this relation.

Figure 12: Relation between the Low-fidelity and High-fidelity data generating processes.

Our choice of the neural network and covariance functions is as before. The predictive mean and two standard deviation bounds for our Deep Multi-fidelity Gaussian Processes method are depicted in figure 13.

Figure 13: Deep Multi-fidelity Gaussian Processes predictive mean and two standard deviations

The 2D feature space discovered by the nonlinear mapping $h(x)$ is depicted in figure 14. One should notice the discrepancy between the true mapping and the one learned by our algorithm. This discrepancy reflects the fact that the mapping from the inputs to the feature space is not necessarily unique.

Figure 14: The 2D feature space discovered by the nonlinear mapping $h(x)$. Notice that $h: \mathbb{R} \to \mathbb{R}^2$. Latent dimensions 1 and 2 correspond to the first and second dimensions of $h(x)$.

Once again, the discontinuity of the model is captured by the non-linear mapping $h(x)$. In order to see the importance of the mapping $h(x)$, let us compare our method with AR(1) Co-kriging. This comparison is depicted in figure 15.

Figure 15: AR(1) Co-kriging predictive mean and two standard deviations

8 Conclusion

We devised a surrogate model that is capable of capturing general discontinuous correlation structures between the low- and high-fidelity data generating processes. The model’s efficiency in handling discontinuities was demonstrated using benchmark problems. Essentially, the discontinuity is captured by the neural network. The abundance of low-fidelity data allows us to train the network accurately. We therefore need very few observations of the high-fidelity data generating process.

A major drawback of our method could be its overconfidence, which stems from the fact that, unlike Gaussian Processes, neural networks are not capable of modeling uncertainty. Modeling the data transformation function $h(x)$ as a Gaussian Process, instead of a neural network, might be a more principled way of modeling uncertainty. However, this becomes analytically intractable and more challenging, and could be a promising subject of future research. A good reference in this direction is [2].

Acknowledgments

This work was supported by the DARPA project on Scalable Framework for Hierarchical Design and Planning under Uncertainty with Application to Marine Vehicles (N66001-15-2-4055).

References