Multi-resolution neural networks for tracking seismic horizons from few training images

by   Bas Peters, et al.

Detecting a specific horizon in seismic images is a valuable tool for geological interpretation. Because hand-picking the locations of the horizon is a time-consuming process, automated computational methods were developed starting three decades ago. Older techniques for such picking include interpolation of control points however, in recent years neural networks have been used for this task. Until now, most networks trained on small patches from larger images. This limits the networks ability to learn from large-scale geologic structures. Moreover, currently available networks and training strategies require label patches that have full and continuous annotations, which are also time-consuming to generate. We propose a projected loss-function for training convolutional networks with a multi-resolution structure, including variants of the U-net. Our networks learn from a small number of large seismic images without creating patches. The projected loss-function enables training on labels with just a few annotated pixels and has no issue with the other unknown label pixels. Training uses all data without reserving some for validation. Only the labels are split into training/testing. Contrary to other work on horizon tracking, we train the network to perform non-linear regression, and not classification. As such, we propose labels as the convolution of a Gaussian kernel and the known horizon locations that indicate uncertainty in the labels. The network output is the probability of the horizon location. We demonstrate the proposed computational ingredients on two different datasets, for horizon extrapolation and interpolation. We show that the predictions of our methodology are accurate even in areas far from known horizon locations because our learning strategy exploits all data in large seismic images.



page 5

page 6

page 13

page 15

page 16

page 17

page 20

page 21


Automatic classification of geologic units in seismic images using partially interpreted examples

Geologic interpretation of large seismic stacked or migrated seismic ima...

Seismic horizon detection with neural networks

Over the last few years, Convolutional Neural Networks (CNNs) were succe...

Neural-networks for geophysicists and their application to seismic data interpretation

Neural-networks have seen a surge of interest for the interpretation of ...

Y-net: Multi-scale feature aggregation network with wavelet structure similarity loss function for single image dehazing

Single image dehazing is the ill-posed two-dimensional signal reconstruc...

Automatic Pavement Crack Detection Based on Structured Prediction with the Convolutional Neural Network

Automated pavement crack detection is a challenging task that has been r...

Cluster-to-Conquer: A Framework for End-to-End Multi-Instance Learning for Whole Slide Image Classification

In recent years, the availability of digitized Whole Slide Images (WSIs)...

On Choosing Training and Testing Data for Supervised Algorithms in Ground Penetrating Radar Data for Buried Threat Detection

Ground penetrating radar (GPR) is one of the most popular and successful...
This week in AI

Get the week's most popular data science and artificial intelligence research sent straight to your inbox every Saturday.

1 Introduction

Geologic horizons are interfaces between two rock types with distinct petrophysical properties. These horizons are of great importance for understanding the geology and targeting resources such as hydrocarbons and water. Seismic imaging of the subsurface is the method of choice to obtain high-resolution images, from shallow to relatively large depths (see Figure 1 for an example).

Seismic data is collected as a function of a shot and a recording location. The raw seismic data can be converted to images with the vertical axes representing either depth or time. In this work, we assume that the raw seismic data is already converted into such images. Traditionally, these seismic images are then manually interpreted by experts to identify and interpret the horizons. Hand picking the horizons in large seismic cross-sections or 3D volumes can be very time-consuming, especially where the resolution of the seismic image is sub-optimal or the geology is more complex.

1.1 Previous Work & Related Problems

Seismic attributes (e.g., pre-processed seismic images such as coherence, slope, energy, or dip) and automatic horizon tracking algorithms help seismic interpreters by tracking the horizon based on a few hand-picked reference locations, however, in more challenging environments can produce poor results and require added user supervision.

Automatic horizon detection algorithms come in various flavors for horizon interpretation. Neural network based methods have a long history for these applications. Early works Harrigan et al. (1992); Veezhinathan et al. (1993); Liu et al. (2005); Huang (2005); Huang et al. (2005); Kusuma and Fish (2005); Alberts et al. (2000)

use multi-layer perceptron or recurrent networks of a few layers. Neither the quantity or quality of data, nor the computing power used in these early works were comparable with today’s standards. Some of the earlier references were limited to work with one or a few time-recordings (traces) at a time, thereby limiting the spatial information the networks can exploit.

Wu and Zhang (2018); Zhao (2018) use a convolutional auto-encoder to perform segmentation of seismic images into a few regions. They pose the segmentation problem as a classification task where the horizons delineate the boundary between the class regions. Their training data is randomly selected out of a seismic volume and is, therefore, an example of interpolation of horizon locations. Di (2018)

proposes to train on a large number of small annotated patches, for classifying seismic data volumes as an integrated geologic interpretation. A key difference from our work is that we work with the largest images practically possible, such that we can exploit spatial information over long distances to help predictions. A comparison by

Zhao (2018) confirms training image-to-image leads to better predictions compared to predicting the class of the central pixel from a small patch.

There are also many algorithms for seismic horizon tracking that do not employ neural networks. These often require data pre-processing, or detect all horizons in a seismic image rather than one specific interface (Kusuma and Fish, 2005; Li et al., 2012). Wu and Fomel (2018) propose a method that uses information over multiple length-scales on coarsened computational grids.

Our goal of interpolating or extrapolating a specific seismic horizon is different from the related problems of salt-body (Waldeland et al., 2018; Shi et al., 2018), fault (Tingdahl and De Rooij, 2005; Araya-Polo et al., 2017), chimney detection (Meldahl et al., 2001), or multiple features (Alaudah et al., 2018). For these applications, binary classification is the most common formulation: either pixels are the target of interest (i.e., salt) or they are not. Horizon detection presents a different challenge because every seismic image will contain many horizons, but we are typically interested in a small specific subset. We thus not only need to learn how to detect a horizon but also characteristics (thickness, amplitude, position in the stratigraphic sequence, depth, curvature) which help to uniquely identify it from other linear features in a seismic image.

1.2 New contributions

We provide a new approach to the horizon detection problem in seismic images. First, given the multiscale nature of seismic data, we employ a recently proposed network architecture (Ronneberger et al., 2015)

which has been shown to produce best-in-class performance for image segmentation in other fields such as medical imaging. Second, in contrast to the majority of datasets using deep learning for image recognition

(Deng et al., 2009; Krizhevsky and Hinton, 2009), our dataset consists of a relatively small set of large images. To facilitate learning in such conditions, we introduce a partial loss function that enables training on partially labeled horizons. Our partial loss is different from methods that extract a small patch/cube around a label point (also known as a seed interpretation, Meldahl et al. (2005)) and classify the data patch by patch, yielding one classified pixel at a time. The partial loss enables us to train on sparse labels directly, without extracting a patch around the label point.

Contrary to most work based on neural networks, we do not frame our problem as a classification task. Instead, we formulate non-linear regression problems where the label image values correspond to the probability of a horizon being at that depth for a given location. This is a convenient way to include uncertainty information on the horizon labels explicitly. The network output is therefore also an image that naturally conveys the uncertainty in the horizon depth estimates. Note that a classification approach provides the probability map of a class, which corresponds to the probability of a geologic rock type at each location in the image. The horizon location follows from such information as the points where the maximum class probability changes from one class to another, however, this does not directly provide the probability of the horizon at each location.

Because we train on large images, there is no need to create small patches. We thereby avoid manual user input on the window/patch size which would impact the results, as well as any artifacts resulting from a tiled solution to the problem. The dataset used consists of seismic images, and the algorithm is trained without the use of any other attribute images, wavelet information, or pre-processing that earlier work on horizon tracking used as supplemental input, see e.g. Meldahl et al. (2001); Poulton (2002); Leggett et al. (2003); Huang et al. (2005); Alberts et al. (2000).

Finally, because our approach uses regression to train for the depth of the horizon, special consideration is necessary when preparing labels for the problem. We introduce a novel parametrization of the training label information which lends itself to a more transparent handling of uncertainty information and a probabilistic interpretation of the predicted results. Due to the sparsity of the horizon labels, the resulting training set can be very unbalanced. We handle this problem by re-balancing the training set at each iteration via per-class random sampling and demonstrate the importance of this step for the result.

1.3 Application to field data

We validate the proposed computational methods, loss function and learning strategy using seismic images from sedimentary areas in the North Sea and the Sea of Ireland. We demonstrate the effectiveness of our network architecture and new partial loss function, as well as investigate the difference between alternate problem setups, including interpolation versus extrapolation, and in-line versus cross-line predictions.

Figure 1: A seismic image where red dots indicate the x-y-z locations of a horizon of interest. This image is a small part of the size we use for training.

2 Label preparation and handling

The raw label data are x-y-z coordinates of the location of the horizon of interest. We can directly plot the x-y-z coordinates in an image, by assigning the number to non-horizon locations and the number to horizon locations. We found training and predictions from this type of training labels rather ineffective, and it does not include valuable information about the uncertainty of the horizon picks.

The horizon picks are either hand-picked or obtained using an automatic horizon tracker with some human assistance and quality control. The selected x-y-z locations are therefore not completely accurate. Another source of label errors is the seismic image itself, from which the labels are generated. The quality of the seismic image decreases if there is noise in the data, or if the geology violates the assumptions on the migration method that generated the seismic image from raw seismic data. A common assumption is that the geology that is almost laterally invariant, i.e., slowly varying in the horizontal direction. More advanced imaging algorithms (e.g., reverse time-migration) assume a background velocity model that is approximately a smoothed version of the true velocity model. Violations of the assumptions result in parts of the seismic image becoming blurred, and continuous layers are broken up. The exact location of the horizon is ambiguous in these situations.

To reflect the uncertainty in the provided horizon labels, we add information about the uncertainty as follows: we convolve the horizon locations with a normalized Gaussian kernel. The resulting values are probabilities of the horizon location. The provided x-y-z location has the highest value, and the probability of a horizon tapers off as the distance from the x-y-z pick increases. In Figure 1(b) we show an example of a data image and label for the case we are given a horizon and need to extrapolate it. These images are of the size that we use for training.

Figure 2: a) A single data image. b) A single label image for horizon extrapolation. The part of the label image that is on the left of the vertical magenta line is used for training. The part that is right of the vertical line is for evaluation only. The label is the convolution of a Gaussian kernel with the horizon x-y-z locations.

3 Network design

A key component in the prediction of an interface is the network architecture. Most work done in the past uses very few layers for the prediction. It has been shown that for many vision applications such networks can have a limited power of prediction. Recent architectures are based on deep networks that can contain tens if not hundreds of layers. One such stable design is the residual network (ResNet, He et al. (2015)) which can be written as


Here, are the layers in the network, are the parameters of each layer to be learned from the data and

is a nonlinear function that consists of a hyperbolic tangent or a rectified linear unit (ReLu). The transformation

is used to increase or decrease the number of channels of the network. For our problem, we start with a single channel (the seismic data) and open the network to a few tens of channels.

While ResNets have been very successful for image classification, they tend to be less accurate for segmentation problems. The main problem is scale; convolution is a local operator and therefore the network can have difficulties to learn features that span a number of scale-lengths.

In order to resolve this problem, we have used a U-Net Ronneberger et al. (2015) structure. U-Nets are similar to auto-encoders as they restrict (that is, downsample the image) as they go deeper. The network has two “arms”. In the down-scale arm, equation 1 is used, with a small modification


Here is a restriction operator that down sample the image using a full weighting (Briggs et al., 2000).

Let be the image sampled on the lowest resolution. In the second arm of the network the image is up-sampled to its original size, that is, the image is interpolated starting with by the equation:


Here, is the transpose of the restriction operator. In order to obtain symmetry for the two branches of the net, we choose the parameters of the up-going net to be the adjoints of the down going ones. In particular, we use the transpose of the convolutions of the weights that are down-going.

The combination of low-resolution and high-resolution features allow the network to communicate between different scales, which is crucial for our application where reflectors have both local and global features.

Layer # Feature size # of channels kernel size
1 4
2 4
3 4
4 6
5 6
6 6
7 8
8 8
9 8
10 12
11 12
12 12
13 16
14 16
15 16
16 24
17 24
18 32
19 32
Table 1: The down-scale arm of the U-net used for Sea of Ireland data. The up-sampled arm is the same in reverse order, and resuses the weights of the down-scale arm, resulting in parameters. The network output has the same size as the input image, and represents the probability of the horizon of interest.

4 Partial loss function

Consider a network

that maps from (vectorized) images

of size to images of the same size. The network weights are convolutional kernels, biases and a linear classifier

(in classification settings this would be a matrix). The last layer of the network reduces a tensor

to . The final network output is thus given by . We learn a single classifier that acts on every pixel of the image.

The least-squares loss is defined as


where is a vectorized label image. This is a separable function, so we can compute a partial loss over a selection of pixels as


where is the set of pixel indices where we have labels. Note that this is subsampling of the prediction, , which requires a full forward-pass through the network. The gradient computation uses the loss at the points in only.

Another interpretation of the partial loss that is more common in geophysical literature is in terms of a projection. Define as a projection matrix that projects onto the points in , i.e.,

contains a subset of the rows of the identity matrix. We can then write the partial loss in equation

5 as


where are the partial labels. In this work we use the norm, which is separable as well. The partial, or projected loss is defined as


The partial loss function enables us to train on partially known labels, as long as we know which pixels they are associated with, without labeling the whole seismic volume.

4.1 Stochastic optimization using a projected loss function

Many neural network training strategies for classification of datasets that contain a large number of small (

) images use random mini-batch stochastic gradient descent (SGD). At each iteration of SGD, the algorithm computes a gradient based on a small number of images and labels. For our applications, we typically only have access to a small number (

) of large images/labels (

), sometimes even only a single image. If we were to compute a gradient based on a single image/label, there is only a single gradient and no stochastic effects. It has long been observed that full gradient methods are not competitive to randomized and stochastic gradient-based optimization algorithms for non-convex optimization in machine learning, particularly neural networks

(Bottou and Bousquet, 2008). The subsampling of the image and label pixels as proposed in the previous section provides us with a stochastic optimization algorithm by using a random subset of the points in at each iteration.

4.2 Re-balancing for the projected loss

Seismic horizon detection problems have labels where most pixels have a value equal to , which means there is no probability the horizon is located at that pixel. In each column, there are only a few entries that have a non-zero label value. This imbalance (about times more zero labels than non-zero in our numerical examples) can lead to slower training and low-quality predictions. To mitigate these issues, we apply binary re-balancing and use an equal number of zero and non-zero pixel values.

In a randomized stochastic optimization algorithm, at each iteration we draw randomly selected samples out of the set of known label pixels . Binary re-balancing means there are samples that have a label value equal to zero - denoted by the set - and samples that correspond to a non-zero label value - denoted by the set . The union of the two subsets is .

We summarize the stochastic optimization algorithm for training neural networks using a partial loss function in combination with binary sample re-balancing in Algorithm 1. The numerical examples show that balancing of zero and non-zero labels result in better predictions.

//data images
//label images
//known label pixels in each image
//number of samples per iteration
//set learning rate
for  do
       for  do
             //draw random data, label image, known label indices
             //draw at random zero label indices from
             //draw at random non-zero label indices from
             , //partial loss and gradient
             //update parameters
       end for
end for
Algorithm 1 Stochastic gradient descent based training algorithm with partial labels and binary random sample re-balancing.

5 Field example of horizon tracking using neural networks

Our data consists of seismic images that are models of the reflectivity of the Earth. The amplitude in the data relates to the elastic impedance contrast between the geological layers. The raw data have been processed into a large 3D model. We work with 2D slices. The labels are a combination of hand picking and algorithm assisted horizon tracking. Both data and picks were previously generated as part of a commercial exploration project by an external company.

We present results for extrapolation by in-line continuation, as well as interpolation from scattered horizon picks. The results also indicate the effect of balancing the number of zero and non-zero label values in each random batch.

We use the same network design for both examples and train the two networks using the projected -loss as defined in equation 7. The initialization of the network parameters is random.

5.1 Horizon interpolation

Hand-picking horizon locations is a time-consuming task. Many interpreted horizons have sparse spatial sampling as a result. In this case, we want to interpolate the picks to obtain continuous horizon surfaces, as shown in Figure 3. To be able to train on just a few labeled points in large images, we need a loss function that measures the loss at the labeled points only, but not at the other parts of the image. For a seismic horizon image, this means that we compute the loss based on the columns that have a horizon label (Figure 3(b)). In each of these columns, there is one horizon location; the other column entries are labels that indicate there is no horizon. The columns where we do not have any labeled information are excluded from training by the projected loss-function as defined in equation 7; the network trains on all seismic data but only part of the label images.

Figure 3: Top view of horizon interpolation from scattered label points.

The training data (Figure 3(a) ) are full 2D slices of size of pixels, without windowing or splitting into patches. The label images are only known at on average nine random locations per slice, provided by an industrial partner. We convolve the horizon location with a Gaussian kernel (in the vertical direction only) to assign an uncertainty to the hand-picked location. All other entries in the same column have a value equal to zero, which indicates the horizon does not occur at that location, see Figure 3(b).

Figure 4: The data is the input for the network, the label is used in combination with the projected loss function that only computes the loss and gradient in the columns that have non-zero values. The non-zero label values are a Gaussian kernels centered at the provided horizon x-y-z locations - white areas do not have associated label values.

Training starts with epochs and a learning rate of . Every iteration of each epoch uses a single data and label image. Out of the approximately known label pixels, we randomly select samples per iteration. As a result, not all label pixels are shown to the network during each epoch. We distribute the samples between zero and non-zero values equally. Note that the Gaussian kernel that we convolve with the horizon x-y-z locations has a width of pixels, so there are on average non-zero label values per image. Training continues with another epochs and the learning rate is reduced by a factor of ten. The third and last training stage is epochs where we again reduce the learning rate by a factor of ten.

Figure 4(b) displays the prediction for two slices. Figure 5(b) shows the same information using color-coding for the predicted probability and overlaid on the data. The zoomed version in Figure 7 shows more details.

Figure 5: Predicted likelihood of the interface for two different slices.
Figure 6: Predicted likelihood of the interface for two different slices overlaid on the seismic data.

The results in Figures 4(b), 5(b), and 7 show excellent predictions. The network output displays the probability of a horizon directly and no additional post-processing was applied. The average of nine picks per slice is not a lower limit or recommended number. Getting good predictions using fewer picks is possible. We point out that we could train more to reduce the validation loss, see Figure 8. We also did not use any data-augmentation, which could benefit the training in the case of fewer label points.

With regards to the balancing procedure outlined in an earlier section, the loss function logs in Figure 8 clearly show that not balancing the number of zero and non-zero label points during each SGD iteration leads to a worse validation loss. Note that contrary to many works on horizon tracking using neural networks based on classification, our non-linear regression strategy does not have a prediction accuracy. Figure 9 shows a prediction from training without balancing, which is not close to the desired output in any way.

Figure 7: Zoomed version of Figure 5(b).
Figure 8: Training and evaluation loss for training with and without re-balancing the number of zero and non-zero label samples. The training loss is much smaller without re-balancing because most of the label samples have a value equal to zero. The evaluation loss is measured over all available labels and it is evident that re-balancing leads to better evaluation performance.
Figure 9: Prediction when the network is trained without re-balancing the number of zero and non-zero label values.

5.2 Horizon extrapolation

Points that indicate the x-y-z locations of a horizon are also called horizon picks. Given a collection of picks in an area, we can try to extrapolate the horizon away from the known locations. Much historical industrial work produced large quantities of horizon picks that we can use for training. A potential challenge is that the extrapolation can be in areas with different geology than where the training picks are.

There are multiple types of extrapolation, two of which are shown in Figure 10. Perhaps most similar to standard classifications or segmentation tasks on data sets containing many small images (e.g., MNIST, CIFAR), is to train on one set of images, then apply the trained network and classifier on another test set of images. We call this line-by-line or slice-by-slice learning. A slice refers to a 2D slice from a 3D tensor. The second strategy extrapolates a horizon in-line. The training procedure sees the full data (seismic image), but the label is only partially known.

Figure 10: Plan view of two different types of horizon extrapolation.

5.2.1 Line-by-line versus in-line extrapolation

We provide some insight about which of the two types of horizon extrapolation is preferable. First of all, it is important to realize that the interpretation of seismic images is different from other problems, such as segmentation of images from video for self-driving vehicles. That application has pre-recorded video/images along with segmented labels available for training. The testing data arrives in real-time and segmentation needs to happen in a short amount of time. In our case, the complete seismic 3D volume is available at the time of training. It is only the labels that are incomplete. Therefore, we would like to use all training data, together with the labels corresponding to a part of the training data. In-line extrapolation keeps a number of slices separately for testing, so the network never has access to those seismic images. Contrary, in-line extrapolation trains on the full seismic slices, but sees only part of the labels, see Figure 1(b). Because we will use a deep neural network with multiple convolutional layers and subsampling/upsampling stages, the data from the area without labels will influence the prediction in the area where we do have labels. For this reason, in-line extrapolation has the capability to utilize all data, and we focus on this method in the remainder of this paper.

For training, we use just images of size pixels. There are three training stages. We start with epochs and a learning rate of , followed by epochs with the learning rate reduced by a factor of ten. The last stage is another epochs where we reduce the learning rate by another factor of ten.

Figure 1(b) shows an example of the labels and data. The evaluation part of each data image is about , which is the extrapolation distance of interest to an industrial partner. In Figure 10(b) we display the predictions for two slices. The prediction on the right of the vertical line shows that we generally predict a continuous line, but it is difficult to see how accurate the prediction is. By color coding the predicted likelihood of the horizon in Figure 11(b), we see that the incorrectly highlighted areas have a much lower probability than the correct horizon locations. We also observe that our prediction on the training part is almost perfect. Figure 12(b) shows a zoomed-in section that better illustrates the relation between predicted probability and the seismic image.

Figure 11: Predictions for two slices on both training and evaluation data. The part of the figures left of the vertical line is the prediction on training data, on the right is the prediction on evaluation data.
Figure 12: Color-coded predictions overlaid on the seismic data for two slices of both training and evaluation data. Yellow colors indicate a high probability of an interface, purple corresponds to low probability. The true horizon location is indicated by a red dotted line. The part of the figures left of the vertical line is the prediction on training data, on the right is the prediction on evaluation data.
Figure 13: Zoomed section of Figure 11(b). Figure shows predicted probabilities, maximum predicted probability and the horizon locations that are hand-picked by an industrial partner.

6 Conclusions

In this work we provided a new look at the problem of detecting horizons in seismic images using neural networks. Specifically, we addressed extrapolation away from previously interpreted horizons, as well as the interpolation of a small number of scattered hand-picked horizon locations. The proposed networks, loss function, and learning strategies to overcome issues that limit the success of automatic interpretation using neural networks. We employ deep networks with a multi-resolution structure to train on a small number of large seismic images that take large-scale geological structures into account, in the sense that information propagates over long distances on multiple scales. This is not directly possible using standard network-based learning methods that train on small image patches. We proposed a projected loss function that enables training on label images with only a few annotated pixels. Generating such labels is easier and faster than working with conventional label images that need complete labeling of a full image or patch. The standard practice of splitting data and labels into training and test sets is no longer necessary when we train with the projected loss. In seismic imaging, we have access to all data during training. It is the labels that are incomplete. Our networks train on all available seismic images, and we compute the loss and gradient based on a small number of known label pixels. The data in areas without corresponding labels is still seen by the network, and because the network has multiple layers on multiple resolutions, the information influences the predictions and misfit at locations where we do have horizon picks. Application of the proposed network, loss function, and learning strategy to horizon extrapolation and interpolation showed that our methods provide accurate predictions and uncertainty estimates both close and farther from known horizon locations. Our experiments so far were restricted to sedimentary geological settings in the North Sea and Sea or Ireland. The proposed methods make automatic horizon detection possible using fewer horizon picks and take all available seismic data into account.