Neural-networks for geophysicists and their application to seismic data interpretation

by Bas Peters, et al.

Neural networks have seen a surge of interest for the interpretation of seismic images during the last few years. Network-based learning methods can provide fast and accurate automatic interpretation, provided there are sufficiently many training labels. We provide an introduction to the field aimed at geophysicists who are familiar with the framework of forward modeling and inversion. We explain the similarities and differences between deep networks and other geophysical inverse problems and show their utility in solving problems such as lithology interpolation between wells, horizon tracking, and segmentation of seismic images. The benefits of our approach are demonstrated on field data from the Sea of Ireland and the North Sea.



1 Introduction

Deep neural networks (DNNs) have revolutionized computer vision, image processing, and image understanding (see, for example, Deng et al., 2009; Krizhevsky and Hinton, 2009; Ronneberger et al., 2015; Goodfellow et al., 2016, and references therein). In particular, deep convolutional networks have solved long-standing problems such as image classification, segmentation, deblurring, denoising, and more. Most of these applications are based on supervised learning; that is, we are given some data and their corresponding interpretation or labels. The goal of the network is to empirically find the connection between the data and its labels.

Seismic interpretation can be viewed as a type of image understanding, where the 3D image is the seismic cube and the interpretation of the seismic data (e.g., horizons, faults) consists of the labeled features that need to be recovered. Using deep convolutional networks is therefore a straightforward extension of existing neural network technology and has been studied recently by many authors (see, for example, Peters et al., 2018, 2019; Wu and Zhang, 2018; Waldeland et al., 2018; Poulton, 2002; Leggett et al., 2003; Lowell and Paton, 2018; Zhao, 2018, and references therein).

However, while it seems straightforward to use such algorithms, there are some fundamental differences between vision-related applications and seismic processing. First, and perhaps most importantly, is the amount of labeled, or annotated, data available. While labeled data are easy to obtain in computer vision, they are much more difficult to obtain for seismic applications. Second, while the labels are likely to be correct in vision, they are much more uncertain in seismic interpretation. For example, when viewing an image, it is usually obvious whether an object such as a car exists within a frame; on the other hand, two geologists may argue about the existence or the exact location of a particular fault or a deep horizon. This makes the labels for the seismic problem uncertain and potentially biased by the interpreter. Third, even in applications with labeled data, the data are usually not fully labeled, and only small portions have been annotated. Finally, while most vision data are 2D, seismic data are typically 3D and should therefore be learned in 3D when possible. This makes using Graphics Processing Units (GPUs) challenging due to memory restrictions, especially when the networks are deep and wide.
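To put the memory restriction in rough numbers, the following sketch (an illustration, not part of the original study) counts the bytes needed to store the feature maps of a single layer. The grid of 256 samples per dimension, the 32 channels, and the 4-byte floating-point values are hypothetical choices.

```python
def activation_bytes(spatial_shape, channels, bytes_per_value=4):
    """Bytes needed to store one layer's feature maps (float32 by default)."""
    n_samples = 1
    for size in spatial_shape:
        n_samples *= size
    return n_samples * channels * bytes_per_value

# Hypothetical sizes: a 256 x 256 slice versus a 256^3 cube, both with 32 channels.
mib_2d = activation_bytes((256, 256), channels=32) / 2**20
gib_3d = activation_bytes((256, 256, 256), channels=32) / 2**30
print(f"2D: {mib_2d:.0f} MiB per layer, 3D: {gib_3d:.0f} GiB per layer")
# prints "2D: 8 MiB per layer, 3D: 2 GiB per layer"
```

Because training usually keeps the feature maps of every layer for back-propagation, the 3D figure grows roughly in proportion to the network depth, which quickly exceeds the memory of a single GPU.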

In this paper, we review and discuss recent work that we and others have done to tackle some of the challenges that arise when deep networks are applied to seismic interpretation. In particular, we address DNNs from a geophysicist’s point of view, in terms of network design and optimization. We show that the network can be interpreted as a forward problem, while the learning can be interpreted as an inverse problem. Any geophysicist who is familiar with the process of modeling and inversion can therefore understand the process and draw on her previous experience.

In the rest of the paper, we first give background information about deep networks. In particular, we discuss the connection between deep networks and differential equations, and show that the machine learning problem is similar to other well-studied problems in geophysics, such as full-waveform inversion or electromagnetic forward and inverse problems. This should make it easy for any geophysicist with such a background to understand and contribute to the field. We then discuss two applications that can be tackled using this framework. First, we explain how DNNs can interpolate lithology, given sparse borehole information and seismic data. Next, we show how networks can predict multiple horizons, including branching horizons. Finally, we summarize the paper and suggest future applications.

2 Deep Neural Networks - A Geophysicist’s View

Suppose we are given data, $\mathbf{Y}$, and its corresponding label map, $\mathbf{C}$. If there is a physical basis to obtain $\mathbf{C}$ from $\mathbf{Y}$, then one should use it. For example, assume that $\mathbf{Y}$ is a velocity model and $\mathbf{C}$ is a seismic cube. In this case, one can use the wave equation to obtain $\mathbf{C}$ from $\mathbf{Y}$. However, for many problems in science and engineering, such a mapping is unavailable. Since there is no physical basis to recover $\mathbf{C}$ from $\mathbf{Y}$, we turn to an empirical relationship. Many empirical models work well for different applications. For problems where $\mathbf{Y}$ and $\mathbf{C}$ have a spatial interpretation, deep neural networks have been successful in capturing the information and generating empirical relationships that hold well in practice.

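A deep network is a chain of nonlinear transformations of the data. In particular, we turn to recent work (He et al., 2015; Chang et al., 2018; Haber and Ruthotto, 2017) that uses residual networks of the form

$$\mathbf{Y}_{j+1} = \mathbf{Y}_j + h\,\sigma\big(\mathbf{K}_j\mathbf{Y}_j + \mathbf{b}_j\big), \qquad j = 0, \dots, n-1. \tag{1}$$

Here, $\mathbf{Y}_j$ are the states (with $\mathbf{Y}_0 = \mathbf{Y}$, the input data), $\mathbf{K}_j$ are convolution kernels, $\mathbf{b}_j$ are bias vectors, $h > 0$ is a step size, and $\sigma$ is a pointwise nonlinear activation function.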
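The forward propagation (1), followed by the linear classification step $\mathbf{C} = \mathbf{W}\mathbf{Y}_n$, can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions: dense channel-mixing matrices (equivalent to 1 × 1 convolutions) stand in for the spatial convolution kernels $\mathbf{K}_j$, each column of the state holds the channels at one pixel, and the ReLU activation, step size $h = 0.1$, and all array sizes are hypothetical.

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def resnet_forward(Y0, kernels, biases, h=0.1, sigma=relu):
    """Propagate Y_{j+1} = Y_j + h * sigma(K_j Y_j + b_j) through all layers.

    Y0:      (channels, pixels) array, the input data Y_0.
    kernels: list of (channels, channels) matrices standing in for K_j.
    biases:  list of (channels, 1) column vectors b_j.
    """
    Y = Y0
    for K, b in zip(kernels, biases):
        Y = Y + h * sigma(K @ Y + b)
    return Y

def predict_labels(Yn, W):
    """Linear classification step: C = W Y_n."""
    return W @ Yn

rng = np.random.default_rng(0)
channels, pixels, n_layers, n_classes = 4, 10, 5, 3
Y0 = rng.standard_normal((channels, pixels))
kernels = [rng.standard_normal((channels, channels)) for _ in range(n_layers)]
biases = [rng.standard_normal((channels, 1)) for _ in range(n_layers)]
W = rng.standard_normal((n_classes, channels))

C = predict_labels(resnet_forward(Y0, kernels, biases), W)
print(C.shape)  # (3, 10): one score per class and pixel
```

Setting all kernels and biases to zero leaves the state unchanged, because ReLU maps 0 to 0, so each layer is a small perturbation of the identity.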

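Given the network (1), one pushes the data forward through $n$ layers to obtain $\mathbf{Y}_n$. Given $\mathbf{Y}_n$, it is possible to predict the label $\mathbf{C}$ by simply multiplying $\mathbf{Y}_n$ by a matrix $\mathbf{W}$, that is,

$$\mathbf{C} = \mathbf{W}\mathbf{Y}_n. \tag{2}$$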
Let us review the process above from a geophysicist’s point of view and show that it is equivalent to many other forward problems in geophysics. To this end, the deep network (1) can be viewed as a discretization of a physical process, e.g., the wave or Maxwell’s equations. From this point of view, $\mathbf{Y}_j$ are the fields (e.g., acoustic or electromagnetic), and $\mathbf{K}_j$ and $\mathbf{b}_j$ are model parameters, analogous to seismic velocity or electric conductivity. Just as in any other forward problem, we assume that the model parameters are known, and we can therefore predict the fields $\mathbf{Y}_j$. The classification process in Equation (2) can be interpreted as projecting the fields to measure some of their properties. A similar process in geophysics is when $\mathbf{W}$ is a projection matrix that samples the fields at some locations, that is, at the receiver positions.
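A short worked derivation makes this analogy concrete, following the continuous view of residual networks in Haber and Ruthotto (2017). Consider the system of ordinary differential equations

$$\frac{d\mathbf{Y}(t)}{dt} = \sigma\big(\mathbf{K}(t)\mathbf{Y}(t) + \mathbf{b}(t)\big), \qquad t \in [0, T], \qquad \mathbf{Y}(0) = \mathbf{Y}_0 .$$

Discretizing it with the forward Euler method, using step size $h = T/n$ and writing $\mathbf{Y}_j \approx \mathbf{Y}(jh)$, $\mathbf{K}_j = \mathbf{K}(jh)$, and $\mathbf{b}_j = \mathbf{b}(jh)$, gives

$$\mathbf{Y}_{j+1} = \mathbf{Y}_j + h\,\sigma\big(\mathbf{K}_j\mathbf{Y}_j + \mathbf{b}_j\big), \qquad j = 0, \dots, n-1,$$

which is the residual network (1). The layer index plays the role of time and the network depth $n$ the number of time steps, in the same way that a time-stepping scheme advances the fields of the wave equation.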

It is important to stress that the network presented in Equation (1) is just one architecture that we can use. For problems of semantic segmentation, it has been shown that coupling a few of these networks, each operating at a different resolution, gives much better results than using a single resolution. The idea behind such networks is illustrated in Figure 1. We refer the reader to Ronneberger et al. (2015) for more details on efficient network architectures for data with multiple scales.

Figure 1: U-net, composed of several ResNets that operate at different scales (the original image and successively coarsened images). The networks are coupled by restriction and prolongation and are used to deal with data at different resolutions.
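To make the coupling between scales concrete, the sketch below shows one simple, assumed choice of restriction and prolongation operators for 2D images: restriction averages 2 × 2 blocks, and prolongation copies each coarse pixel into a 2 × 2 block. These particular operators are an illustration only; U-net implementations commonly use pooling or strided convolutions for coarsening and interpolation or transposed convolutions for refinement.

```python
import numpy as np

def restrict(x):
    """Coarsen a 2D image by a factor 2 by averaging 2 x 2 blocks (odd edges are cropped)."""
    n0, n1 = x.shape
    x = x[: n0 - n0 % 2, : n1 - n1 % 2]
    return x.reshape(x.shape[0] // 2, 2, x.shape[1] // 2, 2).mean(axis=(1, 3))

def prolong(x):
    """Refine a 2D image by a factor 2 by repeating each pixel in a 2 x 2 block."""
    return np.repeat(np.repeat(x, 2, axis=0), 2, axis=1)

image = np.arange(16, dtype=float).reshape(4, 4)
coarse = restrict(image)   # 2 x 2 image at the next-coarser scale
fine = prolong(coarse)     # back to 4 x 4, piecewise constant
```

With these choices, restricting a prolonged image returns the coarse image exactly, which is a convenient sanity check when experimenting with multi-resolution networks.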

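In general, the model parameters $\mathbf{K}_j$ and $\mathbf{b}_j$ are unknown in practice and need to be calibrated from the data. This process is similar to finding the seismic velocity model or electric conductivity from measured geophysical data. To this end, we assume that we have some observed labels, $\mathbf{C}^{\rm obs}$. The learning problem can be framed as a parameter estimation problem, or an inverse problem, where we fit the observed labels by minimizing the objective function

$$\min_{\boldsymbol{\theta}}\; \ell\big(\mathbf{W}\mathbf{Y}_n(\boldsymbol{\theta}),\, \mathbf{C}^{\rm obs}\big) + R(\boldsymbol{\theta}). \tag{3}$$

Here, $\boldsymbol{\theta}$ collects all model parameters (the kernels $\mathbf{K}_j$, the biases $\mathbf{b}_j$, and the classifier $\mathbf{W}$), $\ell$ is a misfit (loss) function that measures the discrepancy between predicted and observed labels, and $R(\boldsymbol{\theta})$ is a regularization term. Most of the literature assumes that $R$ is a simple Tikhonov regularization or, in the language of deep learning, weight decay, that is,

$$R(\boldsymbol{\theta}) = \frac{\alpha}{2}\|\boldsymbol{\theta}\|_2^2, \qquad \alpha > 0.$$

As we show in Section 3, such basic regularization may not be sufficient for problems that arise in seismic applications, and we review other, more appropriate regularizers for the problems presented here.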
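The name “weight decay” follows from a single gradient step with step size $\eta$ on an objective of the form (3) with Tikhonov regularization $R(\boldsymbol{\theta}) = \frac{\alpha}{2}\|\boldsymbol{\theta}\|_2^2$, where $\ell(\boldsymbol{\theta})$ is shorthand for the misfit term:

$$\boldsymbol{\theta} \leftarrow \boldsymbol{\theta} - \eta\big(\nabla_{\boldsymbol{\theta}}\ell(\boldsymbol{\theta}) + \alpha\boldsymbol{\theta}\big) = (1 - \eta\alpha)\,\boldsymbol{\theta} - \eta\,\nabla_{\boldsymbol{\theta}}\ell(\boldsymbol{\theta}).$$

Each step therefore shrinks all parameters uniformly by the factor $1 - \eta\alpha$ before applying the data-misfit update, regardless of their role in the network.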

While we have emphasized the similarities between the training problem and other geophysical problems, it is worth pointing out two fundamental differences between deep learning and geophysical inverse problems. First, and most important, in geophysics we are interested in the model itself. Such a model generally has physical attributes that we are interested in: it typically represents velocity, conductivity, porosity, or other physical properties. In machine learning, on the other hand, the model $\boldsymbol{\theta}$ has no real significance. It does not have any physical meaning (that we know of), and therefore it is hard to know what a “reasonable” model is. Second, optimizing the objective function in (3) is typically done using stochastic gradient descent (SGD) (Bottou and Bousquet, 2008), which updates $\boldsymbol{\theta}$ using gradients computed on small random subsets of the training data, and it has been shown that using SGD is crucial for solving this problem.
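As a hedged illustration of minibatch SGD on an objective of the form (3) with weight decay, the sketch below replaces the network by a linear model $\mathbf{c} \approx \mathbf{X}\boldsymbol{\theta}$. This is an assumption made so that the gradient can be written by hand; for a network, it would be obtained by back-propagation through (1) and (2). The function name, step size, batch size, and number of epochs are hypothetical.

```python
import numpy as np

def sgd_weight_decay(X, c, alpha=1e-3, lr=0.05, batch_size=8, epochs=200, seed=0):
    """Minibatch SGD on 0.5 * mean((X @ theta - c)**2) + 0.5 * alpha * ||theta||^2."""
    rng = np.random.default_rng(seed)
    m, d = X.shape
    theta = np.zeros(d)
    for _ in range(epochs):
        order = rng.permutation(m)  # reshuffle the samples every epoch
        for start in range(0, m, batch_size):
            idx = order[start:start + batch_size]
            residual = X[idx] @ theta - c[idx]
            # Misfit gradient on the minibatch plus the weight-decay term.
            grad = X[idx].T @ residual / len(idx) + alpha * theta
            theta -= lr * grad
    return theta

rng = np.random.default_rng(1)
X = rng.standard_normal((64, 3))
theta_true = np.array([1.0, -2.0, 0.5])
c = X @ theta_true  # noiseless synthetic "labels"
theta = sgd_weight_decay(X, c)
```

With noiseless synthetic data and a small $\alpha$, the estimate should end up close to `theta_true`, shrunk slightly by the weight decay.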

In the following sections, we discuss how we use the setting discussed above to solve a number of practical problems that arise in seismic interpretation.

3 Applications to seismic interpretation

In this section, we discuss the application of deep networks to two seismic interpretation problems. Both applications share the same forward propagation process; the main difference is the way we set up the loss function (misfit) and the regularization. We find it rather remarkable that similar network architectures work for such different problems, and this emphasizes the strength of deep learning applied to seismic interpretation.

One common feature that most geophysical problems share is that the labels, $\mathbf{C}^{\rm obs}$, are not available for the whole seismic image. For example, it is common to have part of the image labeled but not all of it. Another example is that we know only part of a horizon. This is in stark contrast to most computer vision problems, where the images are fully labeled. The difference results from the technical difficulty and expertise needed to label seismic data: while most non-specialists can identify a cat in an image, an expert may be needed to classify a seismic unit. We note, however, that this type of sparse measurement is common throughout geophysics. For example, we never observe the full wavefield in full-waveform inversion, and the misfit is calculated only at the observed points (where we record the data). We therefore modify common loss functions in DNN training to return the misfit only from the locations where the image is labeled.
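A partial loss of this kind can be sketched as follows. This is an illustration only: it assumes a least-squares misfit, encodes unknown labels as NaN, and uses a boolean mask; the same masking applies to other losses, such as the cross-entropy commonly used for segmentation.

```python
import numpy as np

def masked_least_squares(pred, labels, mask):
    """Least-squares misfit over labeled pixels only, and its gradient w.r.t. pred.

    pred, labels: arrays of the same shape (labels may contain NaN where unknown).
    mask: boolean array, True where a label is known.
    """
    # The residual is set to zero wherever the label is unknown.
    residual = np.where(mask, pred - labels, 0.0)
    loss = 0.5 * np.sum(residual ** 2)
    grad = residual  # d(loss)/d(pred): exactly zero at unlabeled pixels
    return loss, grad

# Toy 2 x 3 "image" where only three pixels are labeled (NaN marks unknown labels).
pred = np.array([[0.2, 0.9, 0.4], [0.1, 0.5, 0.7]])
labels = np.array([[0.0, 1.0, np.nan], [np.nan, np.nan, 1.0]])
mask = ~np.isnan(labels)
loss, grad = masked_least_squares(pred, labels, mask)
```

Because the gradient is zero at unlabeled pixels, those pixels do not drive the parameter update through the misfit; their predicted values are shaped only by the network and any regularization.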

3.1 Interpolation of lithology between wells using seismic data

Consider a few boreholes in which the geological lithology is observed. Our goal is to use the lithology information from the wells to interpret the seismic image (Figure 2(a)).

Specifically, we illustrate the benefits of being able to train on sparse labels, such as those in Figure 2(c), and to predict fully annotated images, as in Figure 2(b).

Figure 2: (a) A slice from a 3D seismic model. This is an example of an input for the network. (b) A fully annotated label image where each color indicates a rock/lithology type of interest. We do not use full labels as the target for our networks, because they are time-consuming to generate. (c) An example of a type of label that we use in our examples. The information corresponds to the lithological units derived from logs in two wells. The white space is not used to measure the misfit or compute a gradient; it is unknown information not used for training the network.

When minimizing the loss (3) discussed above, artifacts typically appear in the prediction. These artifacts result from the lack of labels in most of the image. To overcome this problem, we propose to add new regularization terms to the loss that penalize unwanted oscillations in the prediction maps.

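Note that the true label images that we hope to predict are ‘blocky’: each lithological unit occupies a contiguous region. This implies that the underlying probability of each lithological unit should be smooth; if the network is well trained, the probability of a particular class changes smoothly from low to high across an interface. We propose to mitigate the lack of labels everywhere by using the prior knowledge that the prediction per class should be smooth. This type of prior information fits into the neural-network training process as a penalty function on the output of the network. To this end, consider solving an optimization problem of the form

$$\min_{\boldsymbol{\theta}}\; \ell\big(\mathbf{W}\mathbf{Y}_n(\boldsymbol{\theta}),\, \mathbf{C}^{\rm obs}\big) + R\big(\mathbf{W}\mathbf{Y}_n(\boldsymbol{\theta})\big), \tag{4}$$

where the misfit $\ell$ is evaluated only at the labeled pixels. The regularization is chosen as

$$R(\mathbf{C}) = \frac{\alpha}{2}\|\mathbf{D}\mathbf{C}\|_2^2, \tag{5}$$

where $\mathbf{D}$ is a discrete gradient matrix (Haber, 2014) that can be implemented using convolutions with finite-difference kernels such as $[-1,\ 1]$ in each direction, and $\alpha > 0$ controls the amount of smoothing.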
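The smoothness regularizer $R(\mathbf{C}) = \frac{\alpha}{2}\|\mathbf{D}\mathbf{C}\|_2^2$ and its gradient $\alpha\mathbf{D}^\top\mathbf{D}\mathbf{C}$ can be sketched with forward differences (kernel $[-1,\ 1]$) along both axes of each class map. The array layout (classes, depth, lateral position) and the function name are assumptions made for this illustration.

```python
import numpy as np

def smoothness_penalty(C, alpha=1.0):
    """R(C) = alpha/2 * ||D C||^2 using forward differences along both image axes.

    C: (n_classes, nz, nx) network output. Returns the penalty and its gradient
    alpha * D^T D C with respect to C.
    """
    dz = np.diff(C, axis=1)  # vertical differences, shape (n_classes, nz-1, nx)
    dx = np.diff(C, axis=2)  # horizontal differences, shape (n_classes, nz, nx-1)
    value = 0.5 * alpha * (np.sum(dz ** 2) + np.sum(dx ** 2))
    # Apply D^T: each difference C[i+1] - C[i] contributes +d to i+1 and -d to i.
    grad = np.zeros_like(C)
    grad[:, 1:, :] += dz
    grad[:, :-1, :] -= dz
    grad[:, :, 1:] += dx
    grad[:, :, :-1] -= dx
    return value, alpha * grad

rng = np.random.default_rng(0)
C = rng.standard_normal((2, 5, 6))  # two hypothetical class maps on a 5 x 6 grid
value, grad = smoothness_penalty(C, alpha=0.5)
```

A constant class map has zero penalty, and oscillations between neighboring pixels are penalized quadratically, which is the behavior the regularizer is meant to encode.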

Note that the regularization always applies to the full network output, which is a full image regardless of the sparse sampling of the data and/or labels. We can still subsample the output when evaluating the regularization, to introduce randomization or for computational reasons. The network is trained using the loss function defined in Equation (4), with the quadratic smoothing regularization (5) applied to the network output. The prediction in Figure 3(a) is smooth, and the maximum predicted class probability per pixel in Figure 3(b) is a good approximation to the true map, as verified by Figure 4. Without regularization, the prediction contains many oscillatory artifacts.
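The effect of output regularization on unlabeled pixels can be seen in a deliberately simplified toy problem, which is our illustration rather than the experiment reported here: replace the network output by a free 1D vector $\mathbf{c}$, let a sampling matrix $\mathbf{P}$ pick out the labeled entries, and minimize $\frac12\|\mathbf{P}\mathbf{c} - \mathbf{c}^{\rm obs}\|_2^2 + \frac{\alpha}{2}\|\mathbf{D}\mathbf{c}\|_2^2$. The least-squares misfit, the operator $\mathbf{P}$, and the function name are assumptions.

```python
import numpy as np

def regularized_fit(n, labeled_idx, labeled_values, alpha):
    """Minimize 0.5*||P c - c_obs||^2 + 0.5*alpha*||D c||^2 over a free 1D vector c."""
    m = len(labeled_idx)
    P = np.zeros((m, n))
    P[np.arange(m), labeled_idx] = 1.0  # samples the labeled entries of c
    D = np.diff(np.eye(n), axis=0)      # (n-1) x n forward-difference matrix
    # Normal equations of the quadratic objective.
    A = P.T @ P + alpha * D.T @ D
    return np.linalg.solve(A, P.T @ np.asarray(labeled_values, dtype=float))

# Two labels at the ends of an 11-sample profile; everything in between is unlabeled.
c = regularized_fit(11, [0, 10], [0.0, 1.0], alpha=0.1)
```

With $\alpha > 0$, the unlabeled entries are filled in smoothly (here, linearly) between the labels. With $\alpha = 0$, the system matrix $\mathbf{P}^\top\mathbf{P}$ is singular because nothing constrains the unlabeled entries. A trained network is not this unconstrained, but the same lack of information at unlabeled pixels leaves room for oscillatory artifacts.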

Figure 3: (a) prediction for a single class and (b) maximum predicted class probability per pixel. Both are the result of training including regularization on the network output.
Figure 4: The predicted segmentation from Figure 3(b) (using network-output regularization) overlaid on the seismic input data.

3.2 Horizon tracking by interpolation of scattered picks

Our second application is tracking a horizon from a small number of horizon picks (seed points) in a few large seismic images.

Horizon tracking using neural networks has gone through several periods of varying activity (Harrigan et al., 1992; Veezhinathan et al., 1993; Liu et al., 2005; Huang, 2005; Huang et al., 2005; Kusuma and Fish, 2005; Alberts et al., 2005). Algorithms that are not based on learning have also made progress; see, e.g., Wu and Fomel (2018) for recent work that combines and extends multiple concepts in deterministic horizon tracking.

It was shown previously (Peters et al., 2018) that it is possible to track a single horizon using U-net-based networks and loss functions that compute losses and gradients from the sparse labels only. Therefore, there was no need to work in small patches around labeled points or to manually generate fully annotated label images. Here we answer two follow-up questions: 1) Can we train a network to track more than one horizon simultaneously? 2) How do networks deal with multiple horizons that merge and split? These two questions warrant a new look at the automatic horizon tracking/interpolation problem because results with merging horizons are very rarely published. Given the renewed surge of interest in using neural networks for seismic interpretation, we need to test the promise of networks against the more challenging situation posed by these two questions.

We demonstrate our method using a 3D seismic dataset from the North Sea. One of the slices is shown in Figure 5(a). An industrial partner provided us with the horizon x-y-z locations, picked by seismic interpreters because the partner’s auto-tracking algorithms had difficulty tracking the deeper horizons. We create a label image by convolving the horizon picks (seed points) with a Gaussian kernel in the vertical direction. This procedure adds a sense of uncertainty to the picks. We use approximately ten pick locations (label columns) per slice for training, as shown in Figure 5(b). Only the colored columns are used to train the network; in the white space, it is unknown if and where the horizon is. The loss function only uses the information in the known label columns. There are two horizons of interest, which merge near the right side of the figure and also approach each other at the left end. We train a single network to predict both horizons simultaneously, using the nonlinear regression and optimization approach detailed in Peters et al. (2018). The network design is as described earlier in this work.
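The label construction described above can be sketched as follows, assuming that picks are given as sample depths per labeled column. The kernel width `sigma`, the use of a pointwise maximum where two horizons are close or merge, the dictionary input format, and the returned mask are illustrative assumptions; the text does not specify these details.

```python
import numpy as np

def horizon_label_image(nz, nx, picks, sigma=3.0):
    """Build a sparse label image from horizon picks.

    picks: dict {column index: list of pick depths in samples}; columns not in
    the dict stay unlabeled. Returns (labels, mask), where mask marks known columns.
    """
    z = np.arange(nz, dtype=float)[:, None]  # depth axis as a column vector
    labels = np.zeros((nz, nx))
    mask = np.zeros((nz, nx), dtype=bool)
    for col, depths in picks.items():
        d = np.asarray(depths, dtype=float)[None, :]
        # Gaussian kernel in the vertical direction, centered on each pick.
        bumps = np.exp(-0.5 * ((z - d) / sigma) ** 2)
        # The maximum keeps the peak value at 1 where two horizons merge.
        labels[:, col] = bumps.max(axis=1)
        mask[:, col] = True
    return labels, mask

# Two labeled columns: two separate horizons in column 2, nearly merged ones in column 15.
labels, mask = horizon_label_image(50, 20, {2: [10, 30], 15: [20, 22]}, sigma=2.0)
```

The returned mask plays the role of the known label columns: a partial loss evaluated only where the mask is true ignores the white space entirely.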

Figure 5: (a) One of the data images. (b) A label image; about ten columns per image are known, and the network never uses the white space. The labels are the convolutions of a Gaussian kernel with the horizon picks. (c) Network output with training and testing picks. (d) Color-coded network horizon prediction on top of the data.

Figure 5(c) displays the network output, which ideally is the true horizon everywhere convolved with the Gaussian kernel that we used to generate the training label images. The training and evaluation picks are plotted on top and confirm that the network is able to predict both horizons accurately, including the point where they merge. In Figure 5(d), we show the network prediction plotted on top of the seismic data to provide more insight. The color-coding corresponds to the greyscale intensity of the previous panel. The colors and vertical spread indicate how confident the network is about the prediction.

From the results, we conclude that we can train a single network to simultaneously predict the location of multiple horizons that merge and branch. The symmetric convolutional U-net variant, with the same architecture as in the previous example and trained with a partial loss function on a small number of known horizon x-y-z locations, achieves excellent results. Data augmentation, and regularization as described in Section 3.1, can reduce the number of required training x-y-z picks.

4 Conclusions

In this paper, we have introduced deep neural networks from an inverse-problems point of view. We have shown that the network can be considered as the “forward problem” and the training as the “inverse problem”. We have explored the connection between deep networks and other geophysical inverse problems. We believe that approaching the learning problem in this way allows us to better understand the role of data fitting, regularization, the stability of the network itself, the propagation of noise within the network, and the associated uncertainties; all are topics that have received ample treatment in geophysical inverse problems.

We have demonstrated the capability of deep networks to deal with problems that arise in seismic interpretation. In our experience, neural networks can do exceptionally well on such problems, provided some thought is given to appropriate regularization and loss (misfit) functions.

When solving a particular problem, it is important to realize that geophysical problems are very different from common vision problems. Accurate training data are key to training the network, and they can be difficult to obtain in many applications. Another important aspect is the size of the data: while vision problems are typically 2D, many geophysical problems are 3D. We believe that new algorithms should be developed to deal with the size of geophysical images as well as with the uncertainty that is an inherent part of geophysical processing.