Deep neural networks (DNNs) have revolutionized computer vision, image processing, and image understanding (see, for example, Deng et al., 2009; Krizhevsky and Hinton, 2009; Ronneberger et al., 2015; Goodfellow et al., 2016; and references therein). In particular, deep convolutional networks have solved long-standing problems such as image classification, segmentation, deblurring, and denoising. Most of these applications are based on supervised learning; that is, we are given some data and its corresponding interpretation, or labels. The goal of the network is to empirically find the relationship between the data and its labels.
Seismic interpretation can be viewed as a type of image understanding, where the 3D image is the seismic cube and the interpretation of the seismic data (e.g., horizons and faults) provides the labeled features that need to be recovered. Using deep convolutional networks is therefore a straightforward extension of existing neural network technology, and this approach has been studied recently by many authors (see, for example, Peters et al., 2018, 2019; Wu and Zhang, 2018; Waldeland et al., 2018; Poulton, 2002; Leggett et al., 2003; Lowell and Paton, 2018; Zhao, 2018; and references therein).
However, while it seems straightforward to use such algorithms, there are some fundamental differences between vision-related applications and seismic processing. First, and perhaps most importantly, is the amount of labeled, or annotated, data available. While labeled data is easy to obtain in computer vision, it is much more difficult to obtain for seismic applications. Second, while the labels are likely to be correct in vision, they are much more uncertain in seismic interpretation. For example, when viewing an image it is usually obvious whether an object such as a car exists within a frame; on the other hand, two geologists may disagree about the existence or the exact location of a particular fault or a deep horizon. This makes the labels for the seismic problem subjective and possibly biased. Third, even in labeled datasets, the data are usually not fully labeled, and only small portions have been annotated. Finally, while most vision data is 2D, seismic data is typically 3D and should therefore be learned in 3D when possible. This makes using Graphics Processing Units (GPUs) challenging because of memory restrictions, especially when the networks are deep and wide.
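To make the memory restriction concrete, consider a back-of-the-envelope count for a single network state. The sizes are purely hypothetical assumptions chosen for round numbers (a $512^3$ cube, 32 feature channels, 4-byte single-precision values), not settings from our experiments:

$$\begin{aligned}
\text{3D state: } & 512^3 \times 32 \times 4\ \text{bytes} = 2^{27}\cdot 2^{5}\cdot 2^{2}\ \text{bytes} = 2^{34}\ \text{bytes} = 16\ \text{GiB},\\
\text{2D slice: } & 512^2 \times 32 \times 4\ \text{bytes} = 2^{18}\cdot 2^{5}\cdot 2^{2}\ \text{bytes} = 2^{25}\ \text{bytes} = 32\ \text{MiB}.
\end{aligned}$$

A single 3D state is therefore comparable to the entire memory of a typical GPU, and standard backpropagation stores the states of every layer, which multiplies this figure by the depth of the network.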
In this paper, we review and discuss recent work that we and others have done to tackle some of the challenges that arise when using deep networks for seismic interpretation. In particular, we address DNNs from a geophysicist’s point of view, in terms of network design and optimization. We show that the network can be interpreted as a forward problem, while the learning can be interpreted as the inverse problem. Any geophysicist who is familiar with modeling and inversion can therefore understand the process and draw on her previous experience.
In the rest of the paper, we first give background information about deep networks. In particular, we discuss the connection between deep networks and differential equations and show that the machine learning problem is similar to other well-studied problems in geophysics, such as full-waveform inversion or electromagnetic forward and inverse problems. This should make it easy for any geophysicist with such a background to understand and contribute to the field. We then discuss two applications that can be tackled using this framework. First, we explain how DNNs can interpolate lithology, given sparse borehole information and seismic data. Next, we show how networks can predict multiple horizons, including branching horizons. Finally, we summarize the paper and suggest future applications.
2 Deep Neural Networks - A Geophysicist’s View
Suppose we are given data, $\mathbf{y}$, and its corresponding label map, $\mathbf{c}$. If there is a physical basis to obtain $\mathbf{c}$ from $\mathbf{y}$, then one should use it. For example, assume that $\mathbf{y}$ is a velocity model and $\mathbf{c}$ is a seismic cube. In this case, one can use the wave equation to obtain $\mathbf{c}$ from $\mathbf{y}$. However, for many problems in science and engineering such a mapping is unavailable. Since there is no physical basis to recover $\mathbf{c}$ from $\mathbf{y}$, we turn to an empirical relationship. Many empirical models work well for different applications. For problems where $\mathbf{y}$ and $\mathbf{c}$ have a spatial interpretation, deep neural networks have been successful in capturing the information and generating empirical relationships that hold well in practice.
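A deep network is a chain of nonlinear transformations of the data. In particular, we turn to recent work (He et al., 2015; Chang et al., 2018; Haber and Ruthotto, 2017) that uses residual networks of the form

$$\mathbf{Y}_{j+1} = \mathbf{Y}_j - h\,\mathbf{K}_j^{\top}\,\sigma\left(\mathbf{K}_j\mathbf{Y}_j + \mathbf{b}_j\right), \qquad \mathbf{Y}_0 = \mathbf{y}, \qquad j = 0, \dots, N-1. \tag{1}$$

Here, $\mathbf{Y}_j$ are states, $\mathbf{K}_j$ are convolution kernels, $\mathbf{b}_j$ are bias vectors, $\sigma$ is a pointwise nonlinear activation function, and $h > 0$ is a step size.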
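Given the network (1), one pushes the data forward through $N$ layers to obtain $\mathbf{Y}_N$. Given $\mathbf{Y}_N$, it is possible to predict the label by simply multiplying $\mathbf{Y}_N$ by a matrix $\mathbf{W}$, that is,

$$\mathbf{c}^{\rm pred} = \mathbf{W}\mathbf{Y}_N. \tag{2}$$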
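The forward propagation in (1) and (2) can be sketched in NumPy under simplifying assumptions of our own: the symmetric residual layer $\mathbf{Y}_{j+1} = \mathbf{Y}_j - h\,\mathbf{K}_j^{\top}\sigma(\mathbf{K}_j\mathbf{Y}_j + \mathbf{b}_j)$, a ReLU activation for $\sigma$, and dense channel-mixing matrices standing in for convolutions. `forward` and `predict` are hypothetical helpers for illustration, not the implementation used in this work.

```python
import numpy as np

def relu(x):
    """Pointwise activation sigma (ReLU is an assumed choice)."""
    return np.maximum(x, 0.0)

def forward(y, kernels, biases, h=0.1):
    """Propagate data y (shape: channels x pixels) through the residual layers.

    Assumed symmetric layer: Y_{j+1} = Y_j - h * K_j^T sigma(K_j Y_j + b_j),
    with Y_0 = y. Each K_j is a dense (m x channels) matrix standing in for
    a convolution; each b_j has length m and is broadcast over pixels.
    """
    Y = y.copy()
    for K, b in zip(kernels, biases):
        Y = Y - h * K.T @ relu(K @ Y + b[:, None])
    return Y

def predict(y, kernels, biases, W, h=0.1):
    """Project the final state onto the labels: c_pred = W Y_N."""
    return W @ forward(y, kernels, biases, h)

# Usage with random (untrained) parameters.
rng = np.random.default_rng(0)
n_ch, m, n_pix, n_layers, n_classes = 4, 8, 100, 5, 3
y = rng.standard_normal((n_ch, n_pix))
kernels = [rng.standard_normal((m, n_ch)) for _ in range(n_layers)]
biases = [rng.standard_normal(m) for _ in range(n_layers)]
W = rng.standard_normal((n_classes, n_ch))
c_pred = predict(y, kernels, biases, W)
print(c_pred.shape)  # (3, 100)
```

In the geophysical analogy, `forward` plays the role of the field simulation for known model parameters, and `W` plays the role of the receiver projection.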
Let us review the process above from a geophysicist’s point of view and show that it is equivalent to many other forward problems in geophysics. To this end, the deep network (1) can be viewed as a discretization of a physical process, e.g., the wave or Maxwell’s equations. From this point of view, $\mathbf{Y}_j$ are the fields (e.g., acoustic or electromagnetic), and $\mathbf{K}_j$ and $\mathbf{b}_j$ are model parameters, such as seismic velocity or electrical conductivity. As in any other field, when considering the forward problem we assume that the model parameters are known, and we can therefore predict the fields $\mathbf{Y}_j$. The classification process in Equation (2) can be interpreted as projecting the fields to measure some of their properties. A similar process in geophysics occurs when $\mathbf{W}$ is a projection matrix that measures the field at some locations, that is, at the receiver positions.
It is important to stress that the network presented in Equation (1) is just one architecture that can be used. For semantic segmentation problems, it has been shown that coupling several of these networks, each operating at a different resolution, gives much better results than using a single resolution. The idea behind such networks is illustrated in Figure 1. We refer the reader to (Ronneberger et al., 2015) for more details on efficient network architectures that deal with data at multiple scales.
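The multi-resolution idea can be illustrated with a toy two-level block. This is a simplified sketch of our own, not the U-net of Ronneberger et al. (2015): `restrict`, `prolong`, and `two_level` are hypothetical helpers, coarsening is done by pairwise averaging of a 1D signal, refinement by repetition, and the per-level operators are arbitrary callables standing in for sub-networks.

```python
import numpy as np

def restrict(x):
    """Coarsen a 1D signal by a factor 2 by averaging neighbouring pairs.

    Assumes an even length.
    """
    return 0.5 * (x[0::2] + x[1::2])

def prolong(x):
    """Refine a coarse signal back to the fine grid by repeating each sample."""
    return np.repeat(x, 2)

def two_level(x, fine_op, coarse_op):
    """Toy two-resolution block: apply fine_op on the fine grid, apply
    coarse_op on the grid coarsened by a factor 2, refine that result,
    and add the two contributions."""
    fine = fine_op(x)
    coarse = prolong(coarse_op(restrict(x)))
    return fine + coarse

# Usage: switch off the fine path to see only the coarse contribution.
x = np.arange(8.0)
out = two_level(x, lambda v: 0.0 * v, lambda v: v)
print(out)  # [0.5 0.5 2.5 2.5 4.5 4.5 6.5 6.5]
```

A U-net nests several such levels and learns the operators at each of them. For the same kernel size, an operator applied on the coarse grid sees a larger spatial context, which is one motivation for combining resolutions.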
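In practice, the model parameters $\mathbf{K}_j$ and $\mathbf{b}_j$ are unknown and need to be calibrated from the data. This process is similar to finding the seismic velocity model or electrical conductivity from measured geophysical data. To this end, we assume that we have some observed labels $\mathbf{c}^{\rm obs}$. The learning problem can be framed as a parameter estimation problem, or an inverse problem, in which we fit the observed labels by minimizing the objective function

$$\min_{\theta}\ \ell\left(\mathbf{W}\mathbf{Y}_N(\theta),\, \mathbf{c}^{\rm obs}\right) + R(\theta). \tag{3}$$

Here we introduce the collection of model parameters $\theta = \{\mathbf{K}_j, \mathbf{b}_j, \mathbf{W}\}$, a loss (misfit) function $\ell$, and a regularization term $R(\theta)$. Most of the literature assumes that $R$ is a simple Tikhonov regularization or, in the language of deep learning, weight decay, that is,

$$R(\theta) = \frac{\alpha}{2}\|\theta\|_2^2,$$

where $\alpha > 0$ is a regularization parameter.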
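The regularized objective (3) with weight decay can be written in a few lines of NumPy. This sketch assumes a least-squares misfit for the loss and stores the parameters as a list of arrays; `misfit`, `weight_decay`, and `objective` are hypothetical names.

```python
import numpy as np

def misfit(c_pred, c_obs):
    """Least-squares data misfit 0.5 * ||c_pred - c_obs||^2 (an assumed loss)."""
    r = c_pred - c_obs
    return 0.5 * np.sum(r * r)

def weight_decay(theta, alpha):
    """Tikhonov regularization R(theta) = alpha/2 * ||theta||^2, where theta
    is a list of parameter arrays (kernels, biases, W)."""
    return 0.5 * alpha * sum(np.sum(p * p) for p in theta)

def objective(c_pred, c_obs, theta, alpha):
    """Regularized objective: data misfit plus weight decay."""
    return misfit(c_pred, c_obs) + weight_decay(theta, alpha)

# Usage with toy values.
theta = [np.ones((2, 2)), np.array([2.0])]
print(objective(np.array([1.0, 2.0]), np.zeros(2), theta, alpha=0.1))  # 2.9 (up to rounding)
```

The name weight decay comes from the gradient: since $\nabla_\theta R = \alpha\theta$, a gradient step $\theta \leftarrow \theta - \eta(\nabla_\theta \ell + \alpha\theta) = (1-\eta\alpha)\theta - \eta\nabla_\theta\ell$ shrinks the weights by a constant factor before applying the misfit gradient.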
As we show below, such basic regularization may not be sufficient for problems that arise from seismic applications, and we therefore review other, more appropriate regularization for the problems presented here.
While we have emphasized the similarities between the training problem and other geophysical problems, it is worthwhile at this point to note two fundamental differences between deep learning and geophysical inverse problems. First, and most important, in geophysics we are interested in the model, $\theta$. Such a model generally has physical attributes that we are interested in: it typically represents velocity, conductivity, porosity, or other physical properties. In machine learning, on the other hand, the model has no real significance. It does not have any physical meaning (that we know of), and therefore it is hard to know what a “reasonable” model is. Second, optimizing the objective function in (3) is typically done using stochastic gradient descent (SGD) (Bottou and Bousquet, 2008), rather than the deterministic Newton-type methods commonly used in geophysical inversion. It has been shown that using SGD is crucial for solving this problem.
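To illustrate the second difference, here is a minimal mini-batch SGD loop on a toy linear least-squares problem with weight decay. It is a stand-in for network training under our own assumptions: the "network" is a linear map, the misfit is averaged over samples, and the step size, batch size, and epoch count are illustrative values rather than settings used in this work. `sgd_least_squares` is a hypothetical helper.

```python
import numpy as np

def sgd_least_squares(A, b, alpha=1e-3, lr=0.01, batch=16, epochs=200, seed=0):
    """Minimize 0.5/n * ||A x - b||^2 + alpha/2 * ||x||^2 with mini-batch SGD.

    Each step uses the misfit gradient on a random subset of rows, which is
    an unbiased estimate of the gradient of the averaged misfit.
    """
    rng = np.random.default_rng(seed)
    n, d = A.shape
    x = np.zeros(d)
    for _ in range(epochs):
        perm = rng.permutation(n)            # reshuffle the samples every epoch
        for start in range(0, n, batch):
            idx = perm[start:start + batch]  # indices of the current mini-batch
            r = A[idx] @ x - b[idx]          # residual on the mini-batch only
            grad = A[idx].T @ r / len(idx) + alpha * x
            x -= lr * grad
    return x

# Usage on synthetic, noise-free data.
rng = np.random.default_rng(1)
A = rng.standard_normal((200, 5))
x_true = rng.standard_normal(5)
b = A @ x_true
x_est = sgd_least_squares(A, b)
print(np.linalg.norm(x_est - x_true) / np.linalg.norm(x_true))
```

Each update touches only a small random subset of the data, so its cost does not grow with the size of the training set; the price is a noisy gradient, which is one reason SGD behaves differently from the Newton-type solvers familiar from geophysical inversion.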
In the following sections, we discuss how we use the setting discussed above to solve a number of practical problems that arise in seismic interpretation.
3 Applications to seismic interpretation
In this section, we discuss the application of deep networks to two seismic interpretation problems. Both applications share the same forward propagation process; the main difference is how we set up the loss function (misfit) and the regularization. We find it rather remarkable that similar network architectures work for such different problems, which underlines the strength of deep learning applied to seismic interpretation.
One feature common to most geophysical problems is that the labels, $\mathbf{c}^{\rm obs}$, are not available for the whole seismic image. For example, it is common to have only part of the image labeled, or to know only part of a horizon. This is in stark contrast to most computer vision problems, where the images are fully labeled. The difference results from the technical difficulty and the expertise needed to label seismic data: while most non-specialists can identify a cat in an image, an expert may be needed to classify a seismic unit. We note, however, that most applications in geophysics share this type of sparse measurement. For example, we never have a fully observed wavefield in full-waveform inversion, and the misfit is calculated only at the observable points (where we record the data). We therefore modify common loss functions in DNN training to return the misfit only at the locations where the image is labeled.
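A minimal sketch of such a partial (masked) misfit, assuming a least-squares loss and a binary mask that is 1 at labeled pixels and 0 elsewhere; `masked_misfit` is a hypothetical helper, not the code used in this work.

```python
import numpy as np

def masked_misfit(c_pred, c_obs, mask):
    """Least-squares misfit evaluated only where labels exist.

    mask is a binary array (1 = labeled, 0 = unlabeled). For a binary mask,
    the gradient of 0.5 * ||mask * (c_pred - c_obs)||^2 with respect to
    c_pred is mask * (c_pred - c_obs), so unlabeled pixels contribute
    neither to the misfit nor to the gradient.
    Returns (misfit, gradient).
    """
    r = mask * (c_pred - c_obs)
    return 0.5 * np.sum(r * r), r

# Usage: the last pixel is unlabeled, so its placeholder value is ignored.
c_pred = np.array([1.0, 2.0, 3.0, 4.0])
c_obs = np.array([0.0, 0.0, 3.0, 100.0])
mask = np.array([1.0, 1.0, 1.0, 0.0])
value, grad = masked_misfit(c_pred, c_obs, mask)
print(value)  # 2.5
```

The mask plays the same role as the receiver sampling in full-waveform inversion. Placeholder values in unlabeled pixels must be finite, since multiplying NaN by zero still gives NaN.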
3.1 Interpolation of lithology between wells using seismic data
Consider a number of boreholes and assume that the geological lithology is observed within them. Our goal is to use the lithology information from the wells to interpret the seismic image (Figure 1(a)).
When minimizing the loss (3) discussed above, artifacts typically appear in the prediction. These artifacts result from the lack of labels away from the wells. To overcome this problem, we propose adding new regularization terms to the loss that penalize unwanted oscillations in the prediction maps.
Note that the true label images that we hope to predict are ‘blocky’. This implies that the underlying probability of each lithological unit should be smooth: if the network is well trained, the probability of a particular class changes smoothly from low to high across an interface. We propose to mitigate the lack of labels everywhere by using the prior knowledge that the prediction per class should be smooth. This type of prior information fits into the neural-network training process as a penalty function on the output of the network. To this end, consider solving an optimization problem of the form

$$\min_{\theta}\ \ell\left(\mathbf{W}\mathbf{Y}_N(\theta),\, \mathbf{c}^{\rm obs}\right) + R(\theta) + \beta\, S\left(\mathbf{W}\mathbf{Y}_N(\theta)\right). \tag{4}$$

The regularization is chosen as

$$S(\mathbf{c}) = \frac{1}{2}\|\nabla_h \mathbf{c}\|_2^2, \tag{5}$$

where $\nabla_h$ is a discrete gradient matrix (Haber, 2014) that can be implemented using convolutions with kernels of $[-1, 1]$, and $\beta > 0$ weights the smoothing penalty.
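The smoothing penalty (5) and its gradient can be sketched in NumPy for a single 2D class-probability map, assuming unit grid spacing and forward differences (convolution with $[-1, 1]$) along each axis. `smoothness` is a hypothetical helper; for several classes, one would sum it over the class maps.

```python
import numpy as np

def smoothness(c):
    """Quadratic smoothing penalty 0.5 * ||grad_h c||^2 for a 2D map c.

    Forward differences along each axis (kernel [-1, 1], unit spacing).
    Returns (penalty, gradient of the penalty with respect to c).
    """
    dz = np.diff(c, axis=0)  # vertical differences,   shape (nz-1, nx)
    dx = np.diff(c, axis=1)  # horizontal differences, shape (nz, nx-1)
    value = 0.5 * (np.sum(dz ** 2) + np.sum(dx ** 2))
    # Gradient = D^T D c: apply the transpose of each difference operator.
    grad = np.zeros_like(c)
    grad[:-1, :] -= dz
    grad[1:, :] += dz
    grad[:, :-1] -= dx
    grad[:, 1:] += dx
    return value, grad

# Usage: one sharp horizontal interface in a 6 x 4 map.
blocky = np.zeros((6, 4))
blocky[3:, :] = 1.0
print(smoothness(blocky)[0])  # 2.0
```

A sharp jump of height 1 costs 0.5 per column, whereas the same jump spread evenly over $k$ rows costs $0.5/k$ per column, so the penalty favours probability maps that change gradually across an interface.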
Note that the regularization always applies to the full network output. The output is a full image regardless of sparse sampling of data and/or labels. We can still subsample to introduce randomization or for computational reasons. The network is trained using the loss function defined in Equation (4) with quadratic smoothing regularization (5) applied to the network output. The prediction in Figure 2(a) is smooth and the maximum predicted class probability per pixel in Figure 2(b) is a good approximation to the true map as verified by Figure 4. Without regularization, the prediction contains many oscillatory artifacts.
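One way to obtain the maximum predicted class probability per pixel is to apply a softmax over the class axis of the network output and take the per-pixel maximum and argmax. The softmax is our assumption about how the outputs are turned into probabilities; `class_map` is a hypothetical helper.

```python
import numpy as np

def class_map(logits):
    """Turn network output of shape (n_classes, nz, nx) into per-pixel
    class probabilities (softmax over the class axis), the predicted class
    (argmax), and the maximum class probability per pixel."""
    z = logits - logits.max(axis=0, keepdims=True)  # for numerical stability
    p = np.exp(z)
    p /= p.sum(axis=0, keepdims=True)
    return p, p.argmax(axis=0), p.max(axis=0)

# Usage: three classes on a 2 x 2 grid; class 2 dominates at pixel (0, 1).
logits = np.zeros((3, 2, 2))
logits[2, 0, 1] = 10.0
p, cls, pmax = class_map(logits)
print(cls)
```

Where all classes are equally likely, the maximum probability drops to $1/n_{\rm classes}$, so the map of `pmax` also indicates where the prediction is least certain.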
3.2 Horizon tracking by interpolation of scattered picks
Our second application is tracking a horizon from a small number of horizon picks (seed points) in a few large seismic images.
Horizon tracking using neural networks has gone through several periods of varying activity (Harrigan et al., 1992; Veezhinathan et al., 1993; Liu et al., 2005; Huang, 2005; Huang et al., 2005; Kusuma and Fish, 2005; Alberts et al., 2005). Algorithms that are not based on learning have also made progress; see, e.g., (Wu and Fomel, 2018) for recent work that combines and extends multiple concepts in deterministic horizon tracking.
It was shown previously (Peters et al., 2018) that it is possible to track a single horizon using U-net-based networks and loss functions that compute losses and gradients based on the sparse labels only. Therefore, there was no need to work in small patches around labeled points or to manually generate fully annotated label images. Here we answer two follow-up questions: 1) can we train a network to track more than one horizon simultaneously? 2) how do networks deal with multiple horizons that merge and split? These two questions warrant a new look at the automatic horizon tracking/interpolation problem because results with merging horizons are very rarely published. Moreover, given the renewed surge of interest in using neural networks for seismic interpretation, we need to test the promise of networks against the more challenging situation posed by these two questions.
We demonstrate our method using a 3D seismic dataset from the North Sea. One of the slices is shown in Figure 4(a). An industrial partner provided us with the horizon x-y-z locations, which were picked by seismic interpreters because the partner’s auto-tracking algorithms had difficulty tracking the deeper horizons. We create a label image by convolving the horizon picks (seed points) with a Gaussian kernel in the vertical direction. This procedure adds a sense of uncertainty to the picks. We use approximately  locations per slice for training, as shown in Figure 4(b). Only the colored columns are used to train the network; in the white space, it is unknown whether and where the horizon is. The loss function uses only the information in the known label columns. There are two horizons of interest, which merge near the right side of the figure and also come close to each other at the left end. We train a single network to predict both horizons simultaneously, using the nonlinear regression and optimization approach detailed in (Peters et al., 2018). The network design is as described earlier in this work.
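The label construction can be sketched as follows. We assume one horizon per label image, an unnormalized Gaussian with peak value 1 centred on each pick, and a hypothetical width `sigma` in samples; `horizon_labels` is a hypothetical helper, and for two horizons one could, for instance, build one such image per horizon. The returned mask marks the known label columns used by the partial loss.

```python
import numpy as np

def horizon_labels(picks, nz, nx, sigma=2.0):
    """Build a label image and a mask from sparse horizon picks.

    picks maps a column index x to the (possibly fractional) depth index z0
    of the picked horizon. Each picked column receives a Gaussian in the
    vertical direction centred on z0; unpicked columns stay zero and are
    excluded from the loss by the mask.
    """
    z = np.arange(nz, dtype=float)
    labels = np.zeros((nz, nx))
    mask = np.zeros((nz, nx))
    for x, z0 in picks.items():
        labels[:, x] = np.exp(-0.5 * ((z - z0) / sigma) ** 2)
        mask[:, x] = 1.0
    return labels, mask

# Usage: two picked columns in a 32 x 12 slice.
labels, mask = horizon_labels({3: 10, 7: 20}, nz=32, nx=12)
print(labels[:, 7].argmax())  # 20
```

A larger `sigma` encodes more uncertainty about the vertical position of a pick, at the cost of a less sharply localized target for the network.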
Figure 4(c) displays the network output, which ideally is the true horizon everywhere convolved with the Gaussian kernel that we used to generate training label images. The training and evaluation picks are plotted on top, and validate that the network is able to predict both horizons accurately, including the point where they merge. In Figure 4(d) we show the network output prediction plotted on top of the seismic data to provide some more insight. The color-coding corresponds to the greyscale intensity of the previous figure. The colors and vertical spread indicate how ‘sure’ the network thinks it is about the prediction.
From these results, we conclude that we can train a single network to simultaneously predict the locations of multiple horizons that merge and branch. The symmetric convolutional U-net variant, with the same network architecture as in the previous example and trained by a partial loss function on a small number of known horizon x-y-z locations, achieves excellent results. Data augmentation and regularization as described in an earlier section can reduce the number of required training x-y-z picks.
In this paper, we have introduced deep neural networks from an inverse problems point of view. We have shown that the network can be considered as the “forward problem” and the training as the “inverse problem”. We have explored the connection between deep networks and other geophysical inverse problems. We believe that approaching the learning problem in this way allows us to better understand the role of data fitting, regularization, the stability of the network itself, the propagation of noise within the network, and the associated uncertainties, all of which have received ample treatment in geophysical inverse problems.
We have demonstrated the capability of deep networks to deal with problems that arise from seismic interpretation. In our experience, neural networks can do exceptionally well for such problems given some thought about appropriate regularization and loss or misfit functions.
When solving a particular problem, it is important to realize that geophysical problems are very different from common vision problems. The availability of accurate training data is key to training the network, and such data can be difficult to obtain in many applications. Another important aspect is the size of the data: while vision problems are typically 2D, many geophysical problems are 3D. We believe that new algorithms should be developed to deal with the size of geophysical images, as well as with the uncertainty that is an inherent part of geophysical processing.
- Alberts et al.  P. Alberts, M. Warner, and D. Lister. Artificial neural networks for simultaneous multi horizon tracking across discontinuities. In SEG Technical Program Expanded Abstracts 2000, pages 651–653, 2005. doi: 10.1190/1.1816150. URL https://library.seg.org/doi/abs/10.1190/1.1816150.
- Bottou and Bousquet  L. Bottou and O. Bousquet. The tradeoffs of large scale learning. In Advances in neural information processing systems, pages 161–168, 2008.
- Chang et al.  B. Chang, L. Meng, E. Haber, L. Ruthotto, D. Begert, and E. Holtham. Reversible architectures for arbitrarily deep residual neural networks. In AAAI Conference on AI, 2018.
- Deng et al.  J. Deng, W. Dong, R. Socher, L.-J. Li, K. Li, and L. Fei-Fei. ImageNet: A Large-Scale Hierarchical Image Database. In CVPR09, 2009.
- Goodfellow et al.  I. Goodfellow, Y. Bengio, and A. Courville. Deep Learning. MIT Press, 2016.
- Haber  E. Haber. Computational Methods in Geophysical Electromagnetics. SIAM, Philadelphia, 2014.
- Haber and Ruthotto  E. Haber and L. Ruthotto. Stable architectures for deep neural networks. Inverse Problems, 34(1):014004, dec 2017. doi: 10.1088/1361-6420/aa9a90.
- Harrigan et al.  E. Harrigan, J. R. Kroh, W. A. Sandham, and T. S. Durrani. Seismic horizon picking using an artificial neural network. In [Proceedings] ICASSP-92: 1992 IEEE International Conference on Acoustics, Speech, and Signal Processing, volume 3, pages 105–108 vol.3, March 1992. doi: 10.1109/ICASSP.1992.226265.
- He et al.  K. He, X. Zhang, S. Ren, and J. Sun. Deep residual learning for image recognition. CoRR, abs/1512.03385, 2015. URL http://arxiv.org/abs/1512.03385.
- Huang et al.  K.-Y. Huang, C.-H. Chang, W.-S. Hsieh, S.-C. Hsieh, L. K. Wang, and F.-J. Tsai. Cellular neural network for seismic horizon picking. In 2005 9th International Workshop on Cellular Neural Networks and Their Applications, pages 219–222, May 2005. doi: 10.1109/CNNA.2005.1543200.
- Huang  K. Huang. Hopfield neural network for seismic horizon picking. In SEG Technical Program Expanded Abstracts 1997, pages 562–565, 2005. doi: 10.1190/1.1885963. URL https://library.seg.org/doi/abs/10.1190/1.1885963.
- Krizhevsky and Hinton  A. Krizhevsky and G. Hinton. Learning multiple layers of features from tiny images. Technical report, Citeseer, 2009.
- Kusuma and Fish  T. Kusuma and B. C. Fish. Toward more robust neural‐network first break and horizon pickers. In SEG Technical Program Expanded Abstracts 1993, pages 238–241, 2005. doi: 10.1190/1.1822449. URL https://library.seg.org/doi/abs/10.1190/1.1822449.
- Leggett et al.  M. Leggett, W. A. Sandham, and T. S. Durrani. Automated 3-D Horizon Tracking and Seismic Classification Using Artificial Neural Networks, pages 31–44. Springer Netherlands, Dordrecht, 2003. ISBN 978-94-017-0271-3. doi: 10.1007/978-94-017-0271-3_3. URL https://doi.org/10.1007/978-94-017-0271-3_3.
- Liu et al.  X. Liu, P. Xue, and Y. Li. Neural network method for tracing seismic events. In SEG Technical Program Expanded Abstracts 1989, pages 716–718, 2005. doi: 10.1190/1.1889749. URL https://library.seg.org/doi/abs/10.1190/1.1889749.
- Lowell and Paton  J. Lowell and G. Paton. Application of deep learning for seismic horizon interpretation. In SEG Technical Program Expanded Abstracts 2018, pages 1976–1980, 2018. doi: 10.1190/segam2018-2998176.1. URL https://library.seg.org/doi/abs/10.1190/segam2018-2998176.1.
- Peters et al.  B. Peters, J. Granek, and E. Haber. Multi-resolution neural networks for tracking seismic horizons from few training images. arXiv preprint arXiv:1812.11092, 2018.
- Peters et al.  B. Peters, J. Granek, and E. Haber. Automatic classification of geologic units in seismic images using partially interpreted examples. arXiv preprint arXiv:1901.03786, 2019.
- Poulton  M. M. Poulton. Neural networks as an intelligence amplification tool: A review of applications. GEOPHYSICS, 67(3):979–993, 2002. doi: 10.1190/1.1484539. URL https://doi.org/10.1190/1.1484539.
- Ronneberger et al.  O. Ronneberger, P. Fischer, and T. Brox. U-net: Convolutional networks for biomedical image segmentation. Medical Image Computing and Computer-Assisted Intervention – MICCAI 2015, pages 234–241, 2015. ISSN 1611-3349. doi: 10.1007/978-3-319-24574-4_28. URL http://dx.doi.org/10.1007/978-3-319-24574-4_28.
- Veezhinathan et al.  J. Veezhinathan, F. Kemp, and J. Threet. A hybrid of neural net and branch and bound techniques for seismic horizon tracking. In Proceedings of the 1993 ACM/SIGAPP Symposium on Applied Computing: States of the Art and Practice, SAC ’93, pages 173–178, New York, NY, USA, 1993. ACM. ISBN 0-89791-567-4. doi: 10.1145/162754.162863. URL http://doi.acm.org/10.1145/162754.162863.
- Waldeland et al.  A. U. Waldeland, A. C. Jensen, L.-J. Gelius, and A. H. S. Solberg. Convolutional neural networks for automated seismic interpretation. The Leading Edge, 37(7):529–537, 2018. doi: 10.1190/tle37070529.1. URL https://doi.org/10.1190/tle37070529.1.
- Wu and Zhang  H. Wu and B. Zhang. A deep convolutional encoder-decoder neural network in assisting seismic horizon tracking. arXiv preprint arXiv:1804.06814, 2018.
- Wu and Fomel  X. Wu and S. Fomel. Least-squares horizons with local slopes and multigrid correlations. GEOPHYSICS, 83(4):IM29–IM40, 2018. doi: 10.1190/geo2017-0830.1. URL https://doi.org/10.1190/geo2017-0830.1.
- Zhao  T. Zhao. Seismic facies classification using different deep convolutional neural networks. In SEG Technical Program Expanded Abstracts 2018, pages 2046–2050, 2018. doi: 10.1190/segam2018-2997085.1. URL https://library.seg.org/doi/abs/10.1190/segam2018-2997085.1.