1 Introduction
Seismic interpretation requires detailed understanding of seismic acquisition, processing, and data models to infer geological meaning. The process of seismic data preprocessing and migration involves geometric transformation and analysis to produce an accurate image of Earth’s subsurface. Prestack and poststack preprocessing can introduce artifacts or coherent noise into seismic data, leading to falsepositive identification of seismic structures during interpretation. Hence, consistent accurate interpretation of seismic data requires many years of experience. Despite improvement in the quality of 3D migrated seismic data, thorough interpretation of certain geologic elements remains subjective. Subjective interpretation occurs due to the presence of many ‘valid’ interpretations or weak amplitude reflections. One cause of weak amplitude reflection is the absence of bottom simulating reflectors in the subsurface during acquisition bedle2019seismic
. In addition, interpreted seismic data are considered intellectual property by energy industries. Consequently, publicly annotated data are scarce. All these challenges necessitate the need for a modelbased framework that is objective to defined constraints, requires little or no humanassisted label, and powerful enough to learn deepdiverse patterns in seismic data. Recent deep learning success stories have further motivated research for automated seismic interpretation using machine learning. But the limited label problem is a challenge to training deep learning models imported from computer vision applications where labels ramp into millions in number. Consequently, deep learning models designed for seismic applications must be trained with limited data awareness. To address these challenges, we propose a
selfsupervised learning framework that does not require any labels from interpreters. Rather, the model is trained on the physics of seismic patterns, from which homogeneous patterns are separated out using constraints. Various computerassisted frameworks have been proposed in the literature, of which we make two broad categories: attributesbased interpretation and machine learningbased interpretation. Seismic attributesbased methods rely on mathematical computations to identify distinctive patterns of seismic amplitudes. These patterns are mapped to a database of previous successful patterns for successful interpretation chopra2005seismic. Because the computation of geometric attributes can be automated, many seismic attributes have been introduced to the geoscience community taner1994seismic; barnes1992calculation; chen1997seismic; shafiq2015detection; shafiq2017salt; shafiq2018role; shafiq2018novel. However, seismic attributes are usually designed to identify specific patterns of interest to an interpreter. Hence, patterns that deviate from the specific target are unidentified. This implies a suite of complementary attributes must be selected by an interpreter interested in identifying important events. Our proposed model, however, learns patterns without previous specification bias. In addition, it contains millions of parameters which is powerful enough to learn very complex patterns that would be missed by simpler algorithms such as attributes. Early adoption of machine learning models in seismic research began with supervised methods in which the model has access to labeled data. di2018patch; wu2018convolutional; xiong2018seismic; araya2017automated; dramsch2018deep. Several notable works have also attempted to overcome the effect of limited annotated data by employing semisupervised and weakly supervised techniques alaudah2016weakly; alaudah2018structure; di2019semi; babakhin2019semi; alfarraj2019semi; wu2019faultseg3d; liu20193d; di2018deep. In these frameworks, there is less dependence on fully labeled data. In semisupervised frameworks, for instance, researchers use fewer annotations augmented with pseudolabels. Weakly supervised learning models use weaker labels like image labels only
alaudah2018learning. Weak labels are easier to generate in numerous quantities compared to pixellevel annotations required in supervised frameworks. In contrast, our method does not require any annotated labels from interpreters. This ranks it at the same ease of use as attributes, with the exception that we applied a more powerful learnable model. Another relevant body of literature explores unsupervised learning works which also addresses the limited data problem. These frameworks are mostly based on KMeans clustering and Konohen’s selforganizing maps (SOMs)
barnes2002investigation. The basic workflow includes extracting attributes from seismic data and using a dimensionreduction algorithm, mostly principal component analysis (PCA), to identify the most important features. The principal features are then clustered to a specified number of centroids. Although these methods have produced great results, the groundtruth centroids of the attributes are unknown; and manually initializing centroids does not converge to groundtruth centroids. Secondly, PCA leads to information loss and the distance metric used in clustering algorithms is usually Euclidean based, which affects the cluster accuracy. Our proposed clustering framework does not require the number of centroids to be predefined, rather our algorithm explores multiscaled, directional spectral information in the images to extract highdimensional coefficients before clustering them using a customdistance metric. Lastly, we explore other relevant literature based on their target applications. For instance, detection of faults
araya2017automated; di2019improving; di2019semi; shafiq2018novel; wu2019faultseg3d; xiong2018seismic, delineation of salt bodies, di2018multi; di2018deep; shafiq2015detection, classification of facies liu20193d; dramsch2018deep; qian2017seismic; alaudah2019machine; alaudah2019facies, prediction of rock lithology from well logs alfarraj2019semi; das2018convolutional; das2019effect and segmentation of seismic layers are few areas of relevant applications of deep learning to seismic. In these literature, various depth of labels are used to train the network. Our proposed method eliminates this challenge by learning image labels using an unsupervised framework. dubrovina2019composite propose using the latent space of an encoderdecoder architecture to split and rearrange various parts of a 3D object. The latent space is projected to summable orthogonal subspaces. Each orthogonal subspace of the latent variable retains lowlevel features of various parts of the 3D object. However, the dataset includes groundtruth labels of 3D parts, our method does not include any groundtruths for partitioned parts. li2019orthogonal add an orthogonal penalty to latent variables and show that by using SOMs on the orthogonal features, more separability on pixelspace features is realized. The methodology presented in li2019orthogonalis similar to ours in applying orthogonality to latent variables. In our selfsupervised framework, we introduced projection matrices to factorize input images; hence, eliminating the need for SOM on the features. In this work, we use the F3 block as our dataset. The F3 block is an offshore block in the Dutch sector of the North Sea. The dataset is preprocessed using a dipmedian filter to remove random noise and to enhance the edges of the seismic reflections. Seismic traces in the volume are clipped above and below 4.0 times the standard deviation of the volume. All amplitude values are normalized to the range [1, 1]. The volume is split into 120x120 overlapping patches. We propose a new hierarchical clustering model to group these patches into
classes. Four classes out of all classes are passed to a deep encoderdecoder model, augmented with two discriminators. The latent space of the encoderdecoder model is projected unto two learned subspaces. We added constraints to guide the factorization of each input patch into two images. Each synthesized image corresponds to a learned subspace. The subspaces are further constrained to be orthogonal. Though we use discriminators, our method is selfsupervised because the adversarial part of our model is used in a multitask setting aside from learning the factorized latent space, which is done unsupervised. Lastly, we conclude by evaluating our proposed method against related research. We further show that our deep encoderdecoder model learns attributes that are qualitatively better than traditional attributes for structural delineation.2 Hierarchical Clustering Framework
We propose a hierarchical clustering framework to cluster images into clusters. Where is the number of clusters. The volume is subdivided into subset blocks. Each subset block contains 15 vertical sections in the inline or crossline direction. We separate inline and crossline subset blocks. The clustering framework consists of several layers. The first layer is initialized using a densitybased algorithm: densitybased spatial clustering of applications with noise (DBSCAN) ester1996density, to produce contiguous clusters. All other layers consist of hierarchical merges of clusters in proceeding layers. Three sections are extracted from each subset block taken at five intervals apart. A set of three sections at intervals of five apart, is called a category. Hence, each subsetblock contains three categories. We do not randomize all the images collected from the volume to make them independent and identically distributed, to avoid losing domain correlation information between sections. However, within each category, we randomize the order of all images. Each category contains images and images for inline and crossline respectively. Inline categories are clustered separately from crossline categories due to differences in feature representation in both sets.
2.1 Clustering with DBSCAN
DBSCAN uses densitybased metrics for cluster discovery. Two parameters are usually defined for DBSCAN: and . is the minimum number of points that can form an independent cluster, while is the maximum distance between two core points. Given a group of points, DBSCAN determines whether a point is a core or border point. The former belongs to a cluster with at least minimum points within distance of each other. The latter is within distance from any core point but not within distance to points. At initialization, one cluster is randomly chosen from the dataset and neighboring points are checked to find whether they are core or border points. If points are core points, a cluster is established and other core points and border points distant away are added to that cluster. Points that are neither core nor border point, are considered noise. In Figure 1, we show a toy example of DBSCAN with . The red points are core points because they have neighboring points at distance away. The yellow points are border points, because they are distant from less than points, but are in the neighborhood of a core point. All core points are considered densityreachable from other core points. The blue points are considered noise because they are not distant from any point in the group.
To specify a distance metric for DBSCAN between two seismic images, we used a similarity (SIM) metric proposed by alfarraj2016content. The authors showed that SIM compares curvelet coefficients from two images. Curvelets coefficients starck2002curvelet
are multiscaled decompositions of an image obtained by tiling the frequency domain with trapezoidalshaped tiles
alfarraj2016content. Additionally, SIM outperforms many other similarity measures for texture images like SSSIM wang2004image, CWSSIM gao2011cw, and STSIM zujovic2013structural. The tiled frequencies collected at various scales (here 32) of the image in various directions, make the algorithm a good edge detector. Hence, suitable for seismic images. Singular value decomposition (SVD) is applied to the curvelet coefficients extracted from the images to select the best coefficients. The reduced coefficients are then compared using Czenkanoski’s similarity to obtain a score in the range [0, 1]
alfarraj2016content:(1) 
The returns values from for each image pair, where and are images and and are the SVD reduced curvelet coefficients. Hence, a implies perfectly similar, while
denotes perfectly dissimilar. Inline and crossline images are fed sequentially into the DBSCAN algorithm. By experimenting with several hyperparameter values, we set
, for inline images and , for crosslines, while for both inline and crossline images. Thus, for any image occurring at any location within any section, if that image is not distant to any other image in the same category, it is marked as noise. We keep small to ensure thorough discrimination between clusters. The drawback of a small is that there will be many , small clusters per category. However, this remains solvable since a hierarchical algorithm is applied to merge contiguous clusters.2.1.1 Algorithm 1
In the first step, we merge intracategory clusters. As shown in Figure 2, we arrange the clusters in order of cluster size. Each circle represents a cluster. The blue column represent one category. We select a handful of images from the smallest cluster and compare these images to remaining clusters inside the category. The smallest cluster is then merged with the most similar cluster, in this case, cluster three. The algorithm is iterated until the number of clusters reduces from to a predefined small number (e.g., ).
2.1.2 Algorithm 2
In the second step, we merge intercategory clusters. In Figure 3, the blue column is category , the yellow column is category . A few images are randomly sampled from each cluster in and compared with all clusters in . All clusters in with a onetoone mapping, in best proximity to clusters in , are merged. For instance, clusters 1, 2, and 4 in all select cluster 2 in as their closest cluster (manytoone mapping), we do not merge any of these clusters. However, we merge clusters 3to1 and NtoN. After merging similar clusters, the remaining clusters in are transferred to .
Figure 4 illustrates how the hierarchical clustering module works. Both algorithms are applied alternatively on the data until the total number of categories are reduced to and the total number of clusters inside all categories are reduced to .
Lastly, we inspect all 14 clusters. The total number of clustered images was 6386. There are eight visually unique clusters, including noise. Hence, we combine repetitive clusters. For instance, dipping amplitudes and nondipping amplitudes can be combined into the ‘horizons’ cluster. In Figure 5, Cluster 0 is labeled as ‘chaotic’ because the dominant geologic component highlighted by the clustering algorithm is the chaotic facies. Cluster 1 contains images from the salt dome region of the volume. In this cluster, it is interesting that although the patterns of the reflection amplitudes are diverse, our algorithm clusters them accurately. Cluster 2 identifies parallel reflectors, that we label as ‘horizons’. The horizons cluster contains the most number of images of all clustered images, and the clustering algorithm also attains the highest accuracy in identifying this cluster.
Cluster 3 captures faulted regions of the volume and are labeled as faults. Notice some discontinuities are labeled as faults. Cluster 4 captures another variant of the chaotic class with some nonconformity in seismic reflections. Cluster 5 contains images taken from the brightspot region of the seismic volume. Interestingly, our algorithm differentiated this class of images from the horizon class in Cluster 2. Cluster 6 highlights images from the top of the salt dome and they were clustered into a different class by our algorithm because they did not fully represent structures captured inside the salt dome. Cluster 7 shows irregular seismic reflections. Two of them contain chaotic regions while others do not fit into any of the other classes. There are several classes like cluster 7 and 5 that would not be used in training our deep learning framework, but would be left for further study in future research. Figure 5 contains the output of our clustering framework. Four clusters were selected from Figure 5 for training the deep learning model. These clusters would be referred to as classes in future references. From the result in Figure 5 Horizons (class 2), faults (class 3), salt domes (class 1) and chaotic (class 0) labels are representative of the selected images. The ‘horizon’ cluster is the largest class with 1258 images and the Chaotic class is the smallest class with 389 images. The Fault and the salt dome classes had 720 and 500 images, respectively. We discarded images clustered as noise. Hence, for balanced classes, we randomly select 389 images from each class, making a total of 1556 training images.
3 Deep Learning Framework
We have a set of clustered seismic images in four classes, each with equal number of images. The assignment of image labels has been done by our clustering module similar to the example in Figure 6a. We propose an encoderdecoder model to learn annotations as shown in Figure 6b. Clustered images were not pixel annotated, Figure 6b is only for illustration. The architecture of the proposed model is shown in Figure 7. The set of all training images is designated as . The encoder is designated: and the decoder is designated . Additionally,
is the corresponding latent vector to
. The learning parameters in both encoder and decoder are and respectively, andis the probability distribution of all seismic images over parameter
. Thus, the logprobability of the data is: . Now, is intractable because the posterior is intractable kingma2013auto. For notational convenience, we will drop the subscript on and , as all future references to these variables refers to a single data instance during iterative training or inference. We introduce two projection matrices: and . Both matrices are fully connected layers of size 1024x1024. The matrices project to orthogonal and subspaces respectively. In Figure 6b, and are mapped to the blue and red annotations in each input image. Note that we do not assume there are no images with overlapping class representations in our clusters. In one forward pass, operators and are applied to thus:(2) 
and are synthesized from the decoder, . The decoder tries to learn some properties of each seismic image and furnish such knowledge into reconstructing both images.
(3) 
We desire the reconstructed image to be as similar to the input image as possible. Note that is usually blurry in encoderdecoder networks. Hence, we use a discriminator, , to ensure is sharp. A second discriminator, , is applied to the encoder to ensure latent variables of are spread out. In the original Bayes autoencoder architecture, kingma2013auto derive the evidence lower bound for reconstructing . Because we modify the behavior of the latent space, by projecting it to orthogonal latent spaces, we rederive the evidence lower bound in the context of projecting the latent space. The decoder distribution, is intractable because the posterior is intractable. Hence, we sample from an auxillary distribution, . We drop and for notation convenience knowing and are distributions over both parameters respectively. The derivation of the evidence lower bound (ELBO) on the loglikelihood of is as follows: is loglikelihood of generating the data. Next, we introduce and and sum over their marginals:
(4)  
We sample from and find the expectation over its conditional distribution because we cannot sample from the posterior :
(5)  
Simplifying expressions in (5) in divergence terms gives:
(6)  
We can write the terms in (6) in entropy terms:
(7)  
where
is the KullbackLeibler divergence between distributions
and . We could conclude from equation 7 that: evidence reconstruction + entropy of projected latent codes  entropy of actual latent priors. In equation 7, the entropy of the encoder is and the conditional entropy of the encoder due to the projected prior is and . The entropies of the priors in the orthogonal subspaces are: and . If the conditional entropies of the the encoder after projecting the latent space matches the entropies of the projected priors: and , then the reconstructed image will be a perfect reconstruction of . This would only happen if the projected priors are perfectly sampled from the actual priors of distributions representing the foreground and background of each . Two images are reconstructed by the decoder: and . Here, we assume that the encoder learns not only to generate the latent codes, but understands that it needs to generate two latent codes that match the two virtual distributions in the decoder. Hence, it must generate one latent code to encapsulate two virtual distributions in the decoder. The decoder learns parameters such that each reconstructed seismic image has two factors inherent in them. It remains that the decoder must be constrained to understand both factors.3.1 Latent space factorization using projection matrices
A projection matrix is an square matrix that gives a vector subspace representation that is a projection of to . For any real operator to be a valid projection matrix, it must satisfy the following properties:

( is the adjoint of ).

(idempotent property).
Figure 7 shows two projection matrices and designed to be mutually orthogonal. However, we impose no constraint to make either of them orthogonal projection matrices in which case we would need for a real
. To impose a projection matrix behavior on both matrices, we create a fully connected layer in the model and initialize it using a uniform distribution. Now, both matrices are learnable and both projection and orthogonality constraints can be imposed on them by solving an optimization objective. We formulate the following projection loss function:
(8) 
Where is the projection loss added to adversarial losses. We discuss the use of discriminators and their corresponding losses below.
3.2 Adversarial Training
goodfellow2014generative
introduce Generative Adversarial Networks (GANs). A GAN sets up an adversarial minmax game between a generator
and a discriminator . At training, the generator attempts to generate an image that lies in the dataset distribution space such that the discriminator is unable to distinguish if the image is real or generated. While the discriminator gets better at detecting real and generated (fake) images. This adversarial setup constrains the generator to generate sharp images almost at the quality of the input image. GANs are superior in the quality of generated images they produce compared to basic encoderdecoder models. Images reconstructed from encoderdecoder models are usually blurry. Blurry images are due to using the mean squared error (MSE) loss between the reconstructed and original image, implying the error between both images is Gaussian noise, which is not in the case in seismic images zhao2017towards. For notation purpose, our decoder would pass as a generator in this section, while our encoder will be represented as in adversarial setting. The two discriminators specified in Figure 7 retain their notations as: and . ensures a sharp reconstruction, , while helps in the spread of latent variables, . As seen in Figure 7, . We introduce an loss between to enforce structural difference:(9) 
Where is the penalty. Since discriminator takes as input, the adversarial loss further ensure is similar to . The adversarial loss for becomes:
(10)  
Similarly, we define an adversarial loss on the latent space . Training a GAN is very challenging. One experimental problem with training GANs is the problem of mode–collapse. Mode–collapse occurs when reconstructs the same output image for varying latent variable . In which case, converges to a local minimum of one or few images without generalizing properly on the whole data distribution. To solve this problem, we apply discriminator on . helps to impose a uniform distribution on and to flatten out the mixed Gaussian space of .
(11) 
Equation 11 defines the loss on
, to flatten the mode of the Gaussian distribution in the latent space. Thus, the modecollapse problem in the latent space is diminished. Equations
10 and 11 are defined as respectively.3.3 Reconstruction Loss
In equation 9, we impose a structural difference loss between and . We need a reconstruction loss to ensure both images do not become a trivial solution while minimizing in equation 9. A trivial solution to could be , a matrix of all 1s, and , a matrix of all 0s. To avoid this, we define a reconstruction loss on as follows:
(12) 
But . Hence,
(13) 
Equation 13 ensures both and are valid seismic images and ensures a cycle consistency with the input image. Lastly, the composite model is trained with losses (813
) over 300 epochs. After each forward pass, we backpropagate
to update , , , . is backpropagated to update models and . While is backpropagated to update models and . Finally, and losses are used to update and simultaneously.4 Results
(a) Chaotic Region Delineation  (b) Salt Region Delineation 
(c) Horizon Lines Delineation  (d) Fault Region Delineation 
We show qualitative results on the trained images and generalize our predictions on the output images to make predictions on full vertical seismic sections. In addition, we show how our method can be applied to attribute extraction. In our attribute extraction process, we analyze two complementary attributes and compare them to six existing attributes in literature. First, each output image from the entire workflow has both an assigned class and a pixel label corresponding to a region of interest. In Figure 8, the input image obtained from our clustering module is passed to the latent space model to obtain and . In each subfigure of Figure 8, and are synthesized by the model to be structurally complementary to each other such that their combination reconstructs . And the norm of their difference reveals a geologically meaningful region. In each subfigure, we show sparse inference regions using confidence values. Each pixel value is a probability value in , where is the probability a pixel is not part of a geological class and is otherwise. For improved performance, we thresholdout probability values below a threshold to obtain a cleaner binary image at the bottom right. An evidence to show that both and are factorized from , as we hypothesized, is to observe their meansquared error (MSE) against input image . In all four subfigures, both synthesized images (, ), have higher MSEs than the reconstruction . This implies that both images are not as similar to the input image as the reconstructed one, because they are factorized components of the input image. Further evidence is observed in in Figures 8(a), 8(c), 8(d) and in Figure 8(b). In all these instances, the model generates an image that is unlike any image in our training set. The orthogonal counterpart of these images are similar to our input images with more contrast. Hence, and are equivalent representation of the orthogonal subspaces to which their latent subsidiaries were projected. The image label and pixel annotation obtained in Figure 8 can be generalized to label sections of the F3 block to assist with seismic interpretation. We trained a segmentation model  DeepLab V3+ chen2017deeplab on all output images from our proposed pipeline and applied a softmax classification layer to make a prediction on each reflection amplitude to any of the predefined four classes. Furthermore, we ran another experiment to determine whether each amplitude in the seismic section belongs to a structure class. The later experiment fits into an attribute extraction framework to be discussed later. In all predicted pixel labels, we threshold out small probability values to improve our confidence on the pixel labels. Figure 9 show four levels of threshold values: and .
As shown in Figure 9, images with a cutoff of performed best on structural classification of the seismic amplitudes when trained on DeepLab V3+ model. Hence, we report results based on a threshold of . Note that the colorbars in Figure 9 do not reflect the thresholds. The thresholds are applied to the output images of the latent space model which are then used to train the segmentation network. Figure 10, shows the corresponding results for a test section at inline 350 of the F3 block. Figure 10a identifies parallel reflectors with dipping amplitude reflections. In Figure 10b, an approximate blurb is shown on the chaotic sedimentary strip of the section. The chaotic region to the right is not as well delineated as the region on the left. A possible explanation for this observation is that most of the images clustered into the chaotic class were taken from regions on the left of the F3 block. If the chaotic facies on the right differs from the one on the left, then this partial delineation is a derived anomaly that may be corrected by including images from the right chaotic region into our training set. However, since our method of model training is selfsupervised, we report our result as is. Figure 10c identifies the region containing fault structures on the left of the salt dome in the highlighted section. The highlighted region on the right of the salt dome is a false positive. However, the left identified region is delineated correctly. Note that the model delineates regions with faults without tracking the fault lines. Delineating regions of faults helps interpreters locate fault regions from which manual tracking of the faults may be done. Although salt domes are challenging to delineate, we show that our method identifies the salt dome boundary with significant accuracy in Figure 10d. Other nonsalt regions to the left and right of the salt are lightly highlighted. They are also false positives but we can dismiss them due to low probability values () in those regions.
We extended our sectionbased structural delineation in Figure 10 to 3D volume factorization of the F3 block. Figure 11
shows four facies in four separate classification instances in the F3 block. The delineated facies are marked using a white arrow. Evidence that factorization of the volume occurs can be seen by the relative absence of other structures apart from the classified one. For instance in Figure
11a, only parallel reflectors from the horizon class are shown while the salt dome, fault region, and chaotic facies are factorized out. In Figure 11b, the chaotic facies is delineated and the parallel reflectors shown in Figure 11b are absent. In fault regions shown in Figure 11c, the delineated region the right is a false positive corresponding to the false positive identified region on the right of the salt dome in Figure 10. However, the left white arrows identify the fault region. Note that in Figure 11d, the salt dome is delineated, but the horizons and chaotic regions are factorized out. The 3D delineation applies mostly to the boundary of the salt dome structure and corresponds to the regions delineated in Figure 10d.4.1 Attribute Extraction
We further demonstrate that our deep learning model can factorize the F3 block into foreground and background attributes, each acting as features to guide the seismic interpreter to make informed decisions. The DeepLab segmentation model is modified to classify each amplitude in a section into two classes: foreground and background. Hence, every amplitude is mapped to a binary class. For each section, we compare the two predicted classes to six attributes from the literature for qualitative assessment. Gradient of texture (GoT) is a recent seismic attribute based on the gradients of seismic amplitudes in a moving window. In the 3D version, the gradients in the x, y and z directions were measured and combined into one value per pixel. The 2D version calculates gradients only in the x, y coordinates. The GoT attribute has been applied to delineate the boundaries of salt domes in 2D sections and 3D seismic volumes shafiq2015detection. We compared our proposed attributes to 2D and 3D GoT.
Phase congruency shafiq2017salt is another recent seismic attribute that improved on GoT. Phase congruency is a dimensionless quantity and it is unaffected by changes in image illumination and contrast, unlike GoT. In addition, we compared our method against a saliencybased seismic attribute shafiq2018role
, which view the features in the seismic volume in a perception analogous to human vision. The component features of saliency attributes consist of fast Fourier Transform (FFT) coefficients. Similar to 3D GoT, saliencybased attributes can be computed in 2D and 3D variants. Lastly, we compare against graylevel cooccurrence matrices (GLCM) texture attributes
chopra2006applications and instantaneous amplitude white1991properties attributes. Figures 12g and 12h are our proposed attributes, while Figures 12a  12f are attributes we compared against. Figure 12g is the proposed background attribute and Figure 12h is the proposed foreground attribute. GoT and Phase congruency, in Figure 12a and 12c, show better delineation of parallel reflectors than our foreground attribute shown in Figure 12h. However, our method shows a less noisy delineation of the parallel reflectors and highlights the most important ones. Saliency, instantaneous amplitude and 2DGoT in Figure 12b, 12e, and 12f delineates mostly the salt boundary and the fault regions  which implies our framework performs better in delineating parallel reflectors. GLCM at Figure 12d performed least in delineating structural attributes among all eight attributes compared.4.2 Comparison with a Machine Learning Framework by alaudah2018structure
Our deep learning framework could also be applied to solve a labelmapping problem similar to the problem solved by alaudah2018structure. The author applied a nonnegative matrix factorization (NNMF) algorithm to predict pixel labels from image labels. The NNMF factorizes an input matrix into features and label assignments:
where is a matrix with images as column entries. is a features vector and gives the assignment of features in into their respective classes. The features learned during factorization were mapped to corresponding images to delineate geological structures. In Figure 13, we attempt to label pixels by mapping image labels learned from our clustering framework to pixel predictions made by our deep learning model and we compare the result with alaudah2018structure’s framework.
The author used four classes: chaotic, salt dome, faults, and others, where the latter include images that do not belong to the previous three classes. Two major differences between our framework and the author’s framework are as follows: in the latter, an interpreter assigns image labels to a few examples, uses an image retrieval technique to obtain labels for the remaining images which is then applied to the weaklysupervised framework. Secondly,
alaudah2018structure’s method is not based on a deep learning framework. In contrast, we do not use image labels and we train a deep learning frame for pixel and image labels. Figure 13 shows four classes of images. Note that in labeling pixels in all the classes, the accuracy of our framework is highest along edges while alaudah2018structure’s labeling is more region oriented. For instance, our framework assigns the highest confidence values of delineated geologic features to edges of peakstotroughs in all four classes. This is because we used an norm sparsity loss in generating the probability maps for predicting structures. In the chaotic features delineated, the performance of both frameworks is close. Because alaudah2018structure’s method was not trained on the horizon’s class specifically, the delineation of structures in this class of images was left out by the algorithm. However, in the same class, we are able to delineate lines of parallel reflectors. alaudah2018structure’s labeling captured the fault region more elegantly than ours, but our method highlights the salt regions elaborately than alaudah2018structure’s method which mostly labeled the edges or boundary of the salt structure. Salt imaging is challenging due to the rugose structure of salt domes, leading to complex amplitude patterns in salt facies. These complex patterns can conflict with other structural patterns in the feature factorization of alaudah2018structure’s framework leading to relatively poorer performance. Our method focuses attempts to separate interesting features from the surrounding background, hence it captures complex patterns better than the previous framework.5 Conclusion
We proposed two frameworks based on hierarchical clustering and a deep adversarial model. We showed that our hierarchical clustering model can be used for unsupervised clustering of seismic images. We also showed that our selfsupervised deep learning framework can be used for selfsupervised segmentation of geological structures using orthogonal latent space projection. One limitation of our selfsupervised model is that the delineated structures are partial, which leaves room for improvement. Furthermore, the hierarchical clustering part and the selfsupervised part are separate modules. In future work, we could create a comprehensive framework that would not need a preclustering module. The adversarial training method could also be improved for more stable training. Lastly, the latent space factorization methodology could be extended to multiple geological component delineations in each image compared to the current foreground and backgroundbased methodology.
6 Appendix  A Proof of Equation 7
Here, we provide detailed proof of equation 7. Let be our input data of images and . We can assume independence on each . Let the distribution of be . Assume there exists a decoder/generator from which we can generate . Let the distribution of this generator over some learned parameter , be . The loglikelihood of generating . We introduce an auxiliary distribution over another learning parameter , to approximate . We introduce because to reconstruct each from the likelihood function, we must know the underlying distribution. We can rewrite the loglikelihood as:
(14)  
(15) 
but the posterior is intractable, hence we use an auxiliary distribution,
, to estimate
. Note that we introduce a prior into equation 15. From an encoderdecoder framework, the prior is the latent space variable passed into the decoder/generator. We know the distribution ofis Gaussian with zero mean, unit variance due to batch normalization at the end of the encoder. Hence, we rewrite equation
15 as:(16) 
we can condition the likelihood on without loss of generality:
(17) 
Next, we introduce , into (17) follows:
(18)  
Simplifying,
(19)  
Applying Jensen’s inequality,
(20)  
Now we can write each term in their equivalence.
(21)  
We can rearrange equation 21 in entropy terms:
(22)  
Further rearranging, we arrive at the same form as equation 7:
(23)  
This proves equation (7).
Comments
There are no comments yet.