Stacked autoencoders based machine learning for noise reduction and signal reconstruction in geophysical data

07/07/2019, by Debjani Bhowick et al.

Autoencoders are neural network formulations where the input and output of the network are identical and the goal is to identify the hidden representation in the provided datasets. Generally, autoencoders project the data nonlinearly onto a lower dimensional hidden space, where the important features get highlighted and interpretation of the data becomes easier. Recent studies have shown that even in the presence of noise in the input data, autoencoders can be trained to reconstruct the noisefree component of the data from the reduced-dimensional hidden space. In this paper, we explore the application of autoencoders within the scope of denoising geophysical datasets using a data-driven methodology. The autoencoder formulation is discussed, and a stacked variant of deep autoencoders is proposed. The proposed method involves locally training the weights first using basic autoencoders, each comprising a single hidden layer. Using these initialized weights as starting points in the optimization model, the full autoencoder network is then trained in the second step. The applicability of denoising autoencoders has been demonstrated on a basic mathematical example and several geophysical examples. For all the cases, autoencoders are found to significantly reduce the noise in the input data.


1 Introduction

Machine learning has been a trending topic in the past two decades, and it has widely been used in various science and engineering disciplines for improved interpretation of big datasets. Creating a machine learning algorithm essentially refers to building a model that can output approximately correct information when fed with certain input data. These models can be thought of as black boxes: input goes in and output comes out, although the mapping from input to output can be fairly complex in itself. With the advent of powerful computers, it has become remarkably easy to train computers to identify hidden complex representations in provided datasets. For an overview of the applications of machine learning in various fields, see the recent review works presented in [58, 15, 26, 24, 25], among others.

Among the various machine learning methods, neural networks, in particular, have received enormous attention. Here, we list some of the early works related to the application of neural networks in various sectors. For a detailed overview, please see the references citing these papers. In the finance sector, neural networks are used for bankruptcy prediction of banks/firms [50], future options hedging and pricing [20], credit evaluation [23], interest rate prediction [35], inter-market analysis [49] and stock performance [5]. In human resources, the processes of personnel selection and workplace behavior prediction are automated using this technique [40, 13]. In the information sector, neural networks have widely been used for authentication or identification of computer users [43], recognition and classification of computer viruses [16], pictorial information retrieval [55], etc. Recently, neural networks have been used to solve several challenging problems in the field of medical imaging. For example, with this powerful tool, lung cancer can be effectively diagnosed and differentiated from benign lung diseases, normal controls and gastrointestinal cancers [17]. There is an unending list of other applications where neural networks have proved their worth, and not all of these can be listed here.

The discipline of geophysics is no exception, and neural networks have been used on various geophysical problems, e.g., inversion of electromagnetic, magnetotelluric and seismic data [39, 65, 46], waveform recognition and first-break picking [34], trace editing [32], lithological classification [30], guiding the geophysical and geological modeling process [42], creating ensemble models for the estimation of petrophysical parameters [8], etc.

The objective of most neural network based formulations is to mimic the internal representation of the highly nonlinear mapping from input to output. It could either be a classification problem, where the correct label needs to be identified for a given input, or a regression problem, where correct estimation of a response is desired. An autoassociative network (autoassociator) is an artificial neural network formulation which tries to learn the reconstruction of the input using backpropagation. Thus, for an autoassociator, the input is the same as the output, and an approximation to the identity mapping is obtained in a nonlinear setting. In the past, some researchers have used neural networks as autoassociators with the aim of extracting sparse internal representations of the input data (e.g. [3, 14, 12]). However, the use of insufficient layers restricted the generalization of these networks, therefore limiting their applicability. Kramer [27] used three hidden layers comprising linear and nonlinear activations and showed the applicability of their autoassociator for gross noise reduction in process measurements. However, where the features of a process are related through a complex nonlinear function, even three hidden layers may not be sufficient.

Bengio [7] presented the autoencoder, a form of autoassociator in a deep network framework, which allowed learning more accurate internal representations of the input. Since ‘autoencoder’ is the more common term in the recent literature, the rest of the paper uses it over ‘autoassociator’. Autoencoders have primarily been used to reduce the dimensionality of large datasets. A projection to a lower dimensional space helps to identify several hidden features and promotes improved interpretation of the data. Vincent et al. [61] used autoencoders for denoising tasks by cheaply generating input training data and corrupting it. This denoising procedure was aimed at making autoencoders more robust and allowed reducing the dimension of the data efficiently even in the presence of noise. Valentine and Trampert [59] presented a geophysical application of autoencoders for data reduction and quality assessment of waveform data. The paper presented a precise and clear overview of the classical autoencoder theory, followed by its application to seismic waveform data. Since then, autoencoders have been used for a few other problems, e.g., analysis of topographic features [60] and identification of geochemical anomalies [64].

Amongst several others, one of the biggest challenges in geophysics is the denoising of data. The problem of noise removal from data is common to various other disciplines as well, and it has been studied extensively in the past. Some approaches are based on local smoothing to blur the noise, e.g., nonlinear total variation based approaches [48], the anisotropic diffusion method [62], bilateral filtering [57], etc. Another category of denoising approaches involves learning on noise-free datasets and then processing noisy datasets [63, 21, 47]. One such way is to learn a wavelet representation and then shrink the coefficients to remove the noise [37, 38]. Wavelet-based shrinkage has been used on various geophysical problems, e.g., seismic noise attenuation [6, 29].

In the context of denoising, limited research has been done in the past to investigate the applicability of autoencoders. Recently, Burger et al. [10] used autoencoders in a deep network framework for denoising input images, and the approach was found to outperform some state-of-the-art denoising methods and to perform on par with others. Schuler et al. [53] presented a neural network based non-blind image deconvolution approach capable of sharpening blurred images. The network was trained on large datasets comprising noise-free as well as noisy images, and it was found to work well for the task of denoising. Ojha and Garg [36] used autoencoders for denoising high-resolution multispectral images. It was shown in this paper that, after training the model on a large set of noisy and denoised images, results comparable to the non-local means algorithm are obtained in significantly less time.

Clearly, the works on denoising outlined above have demonstrated the potential of autoencoders. In a recent work, we have briefly shown that denoising autoencoders work very well for geophysical problems [9], and it is of interest to explore further in this direction. In this paper, we study in detail the application of autoencoders for denoising geophysical data. This work revolves around using autoencoders to learn the representation of the signal and separate the noise content. We start by exploring the potential of shallow autoencoders for denoising purposes. Based on the identified limitations of these networks, deep autoencoders with several different numbers of hidden layers are tested. To further enhance the denoising characteristics of the autoencoders, a stacked formulation is presented, the application potential of which is demonstrated on various numerical examples.

The outline of the rest of the paper is as follows. The theoretical details of autoencoders are discussed in Section 2. This includes a brief description of the traditional autoencoder (Section 2.1) followed by its denoising variant (Section 2.2). The concept behind the stacking of autoencoders is discussed in Section 2.3. To demonstrate the working of autoencoders for denoising tasks, a basic mathematical example is presented in Section 3. The applications on geophysical problems are discussed in Section 4, and the final discussions and conclusions are presented in Sections 5 and 6, respectively.

2 Theory

2.1 Autoencoder

Autoencoders aim at learning the internal representation of data, typically an encoding, and at identifying the important hidden features [7]. In its simplest form, an autoencoder is very similar to a multilayer perceptron (MLP), which consists of an input layer, an output layer, and one or more hidden layers. For an ideal autoencoder, the input and output are the same, which implies that the hidden units need to be tuned such that an accurate nonlinear approximation to the identity function is obtained.

When neural networks are used, our interest is in learning an internal representation that relates the input to the output. Fig. 1 shows the schematic diagram of a neural network, where $x$ and $z$ are the input and output vectors, respectively. The output vector $z$ is obtained from $x$ through a series of linear/nonlinear mappings denoted by the functionals $f$ and $g$, respectively. For a deep network, these mappings could themselves comprise several hidden layers. The vector $y$ in Fig. 1 corresponds to a hidden layer in the network with reduced dimensionality.

For the network shown in Fig. 1 to be formulated as an autoencoder, $x$ and $z$ need to be ideally the same. Generally, the dimensionality of the hidden layers (e.g. that of $y$) is kept lower than that of the input and output, which allows autoencoders to learn an approximate compressed representation of the input. As discussed in [61], a natural criterion that any good representation should be expected to meet is that a significant amount of information about the input is retained. However, this condition alone is not sufficient to yield a good representation. Using hidden layers of the same or higher dimensionality can lead to a trivial identity mapping, which is unlikely to yield any useful information. Although traditionally followed, it is not necessary to use hidden layers of lower dimensions; rather, these can even be larger than the input.

With the constraint of reduced dimensionality in the hidden layers, autoencoders can provide an alternative reduced-dimensional representation of massive datasets, providing a novel insight into the data. Adding a sparsity constraint allows the use of higher dimensionalities, and it has been observed that such representations can provide very useful features (e.g. [44]). An advantage of sparse autoencoders is that they can handle variable-sized representations [61]. A simplified version of an autoencoder, in which no nonlinear transformations are used and a squared-loss error function is employed, is equivalent to performing principal component analysis (PCA) [4]. However, this is generally not true for traditional autoencoders, where sigmoid-based nonlinearity exists.

Figure 1: Schematic structure of a traditional neural network. For $z = x$, the network corresponds to an autoassociator/autoencoder.

A traditional autoassociator consists of two parts: an encoder and a decoder. Looking back at Fig. 1, let us assume that the shown neural network corresponds to an autoencoder with one hidden layer. Thus, $y$ corresponds to the hidden representation with reduced dimensionality. The mapping phase where the input is transformed into the hidden representation is termed the encoder. The decoder is the part of the autoencoder where the input is reconstructed back as $z$ from its hidden representation $y$. In Fig. 1, the encoding and decoding functions are denoted by $f_{\theta}$ and $g_{\theta'}$, respectively, and these mappings are parametrized by the vectors $\theta$ and $\theta'$, respectively. Typically, the mapping functions comprise an affine mapping followed by a certain nonlinearity and can be expressed as:

$y = f_{\theta}(x) = s(Wx + b),$   (1)
$z = g_{\theta'}(y) = s(W'y + b'),$   (2)

where $\theta = \{W, b\}$ and $\theta' = \{W', b'\}$ are parameter sets, with $W$ and $W'$ denoting the weight matrices and $b$ and $b'$ representing the bias vectors, respectively. Typically, the nonlinear mapping $s(\cdot)$ is achieved using sigmoid or radial basis functions.

The goal of the autoencoder presented in Fig. 1 is to minimize the reconstruction loss between $x$ and $z$. As the error (loss) function, typically a squared-error or cross-entropy loss is used, depending on the type of problem. The autoencoder presented in Fig. 1 consists of a single hidden layer. However, for autoencoders of higher complexity, several hidden layers can be used; the encoder and the decoder will then each comprise a series of mappings. For more details related to autoencoders, see [61].
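
To make the mappings of Eqs. (1) and (2) concrete, the short NumPy sketch below implements the forward pass of such a single-hidden-layer autoencoder. The layer sizes, the random initialization and the use of a sigmoid for both mappings are illustrative assumptions and not the configuration used later in this paper.

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

class BasicAutoencoder:
    """Forward pass of a 1-hidden-layer autoencoder, mirroring Eqs. (1)-(2)."""
    def __init__(self, n_in, n_hidden, seed=0):
        rng = np.random.default_rng(seed)
        # Encoder parameters theta = {W, b} and decoder parameters theta' = {W', b'}.
        self.W = rng.normal(scale=0.1, size=(n_hidden, n_in))
        self.b = np.zeros(n_hidden)
        self.Wp = rng.normal(scale=0.1, size=(n_in, n_hidden))
        self.bp = np.zeros(n_in)

    def encode(self, x):
        # Eq. (1): y = f_theta(x) = s(W x + b)
        return sigmoid(self.W @ x + self.b)

    def decode(self, y):
        # Eq. (2): z = g_theta'(y) = s(W' y + b')
        return sigmoid(self.Wp @ y + self.bp)

    def reconstruct(self, x):
        return self.decode(self.encode(x))

ae = BasicAutoencoder(n_in=9, n_hidden=5)
x = np.random.rand(9)
z = ae.reconstruct(x)          # untrained reconstruction of the input
```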

2.2 Denoising autoencoder

Figure 2: Schematic diagram of a denoising autoencoder showing the encoder and decoder segments. The network comprises 3 hidden layers of 7, 5 and 7 neurons, respectively.

Any model can be considered a good denoiser if it can output a clean signal from a noisy input. The first requirement for building a denoising autoencoder is to identify a mapping from the noisy domain to the noise-free domain. The complexity of this mapping depends on several factors (e.g., the level of noise) and cannot be expressed using a simple formula. However, if a large number of data samples exists, autoencoders can be used to determine a fitting empirical model [10].

As outlined in [61], the concept of denoising autoencoders is based on the following ideas:

  • The higher level representation of the input data (i.e. primary signal) is generally assumed to be more stable and robust to addition of noise.

  • Denoising approach should be able to capture the features associated with the primary signal in the inner hidden layers of the network.

Typically, it is assumed that our primary signal has a certain well-defined representation, while noise may or may not have one. For non-coherent noise, the denoiser needs to be trained to filter out the components of the input data which do not comprise any well-defined pattern. For coherent noise, the denoiser has to be trained such that it preserves the representation of the signal, but filters out the noise component. Removing coherent noise can be a challenging problem, especially when the signal-to-noise ratio is quite low.

Fig. 2 shows the schematic network representation of a basic denoising autoencoder. The autoencoder comprises three hidden layers of 7, 5 and 7 neurons, respectively, and the input consists of 9 features. Compared to the input, the dimensionality of the innermost representation is 44% lower, which means that a compressed representation of the input will be encoded, and there can possibly be a loss of certain features. The goal is to train this network to construct the clean output from the corrupted version of the input.

Through a series of two projections, the noisy signal $\tilde{x}$ is mapped onto a reduced-dimensional space, and the result is the hidden representation $y$. This mapping constitutes the encoder part of the autoencoder ($f_{\theta}$). Ideally, the noise-free signal then needs to be reconstructed from the hidden representation $y$, and this process is referred to as decoding ($g_{\theta'}$). During the optimization process, this is achieved by training the sets of parameters $\theta$ and $\theta'$ and obtaining the output $z$, such that the reconstruction error with respect to the noise-free signal is minimized.
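
As an illustration of this training setup, the sketch below assembles the small network of Fig. 2 (9 input features; hidden layers of 7, 5 and 7 neurons) with the Keras API and fits it on corrupted inputs against clean targets. The optimizer, the synthetic data and the noise model are assumptions made purely for demonstration.

```python
import numpy as np
import tensorflow as tf

n_features = 9

# Encoder-decoder network of Fig. 2: hidden layers of 7, 5 and 7 neurons.
inputs = tf.keras.Input(shape=(n_features,))
h = tf.keras.layers.Dense(7, activation="sigmoid")(inputs)      # encoder
code = tf.keras.layers.Dense(5, activation="sigmoid")(h)         # innermost representation
h = tf.keras.layers.Dense(7, activation="sigmoid")(code)         # decoder
outputs = tf.keras.layers.Dense(n_features, activation="linear")(h)
dae = tf.keras.Model(inputs, outputs)
dae.compile(optimizer="sgd", loss="mse")

# Training pairs: corrupted input and the corresponding clean target.
x_clean = np.random.rand(10000, n_features).astype("float32")
x_noisy = (x_clean + 0.25 * np.random.randn(10000, n_features)).astype("float32")
dae.fit(x_noisy, x_clean, epochs=10, batch_size=64, verbose=0)
```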

Vincent et al. [61] have provided a nice geometrical interpretation of denoising autoencoders. This interpretation is based on the so-called manifold assumption [11], according to which natural high-dimensional data concentrate close to a nonlinear low-dimensional manifold. Generally, the noise-free data can be understood as a combination of several principal components, and using a neural network architecture, it is possible to obtain these components. Noise is generally found to be shifted away from these manifolds, and in the process of minimizing the loss, the optimization tends not to include the noise component in the reduced-dimensional representation.

2.3 Stacked autoencoders

Figure 3: Schematic diagram of a stacked denoising autoencoder showing the two steps involved in the denoising process. The entire deep network is assumed to comprise $N$ hidden layers. During Step 1, the weights are trained for one hidden layer at a time, in a recursive manner. Finally, during Step 2, the entire network is trained at once using the pre-initialized weights obtained from Step 1.

Training the autoencoder to remove noise from a given dataset can be highly nonlinear and is not an easy problem to solve. Learning such complex representations requires deep multi-layered neural networks. The standard approach, comprising random initialization of weights and using gradient descent based backpropagation, is known to produce poor solutions for 3 or more hidden layers. In [28], this aspect has been studied in detail, and it has been observed that if efficient algorithms are used, deep architectures perform better than shallow ones.

In this paper, a stacked formulation for autoencoders is proposed, where multiple cycles of simple autoencoders are trained in a zoom-in fashion, followed by training the whole network at once. Fig. 3 shows the schematic diagram explaining the two steps of the stacked denoising autoencoder. It is assumed that the deep network architecture of the autoencoder comprises $N$ hidden layers. The details related to the two steps follow below.

2.3.1 Recursive pre-training of weights and biases

In this step, the weights and biases of the network are pretrained using simple autoencoders comprising one hidden layer each. For $N$ hidden layers in the deep autoencoder network, a total of $(N+1)/2$ such autoencoders need to be formulated, due to the symmetry of the network architecture. Here, it is assumed that $N$ is an odd number, as shown in Fig. 3. The weights corresponding to the hidden layers are determined starting with the outermost hidden layers and moving towards the innermost representation in a recursive manner.

Let $\tilde{x}$ and $x$ denote the noisy signal and its uncorrupted version, respectively. Let $f_1$ and $g_1$ denote the projection of $\tilde{x}$ onto the first hidden-layer space (where the representation is denoted by $y_1$) and the projection from this hidden layer back to the output space, respectively. This implies that $y_1 = f_1(\tilde{x})$ and $z = g_1(y_1)$. Note here that after the autoencoder has been trained, there will still be an approximation error, and $z$ may not necessarily be equal to $x$. Once the parametrization vectors $\theta_1$ and $\theta_1'$ have been trained, the next inner representation needs to be optimized.

For training the next-level representation, an autoencoder comprising a 3-layer neural network is formulated. The input and output vectors for this autoencoder are each set to $y_1$. The respective mappings onto the hidden space and the output space are denoted by the functionals $f_2$ and $g_2$, respectively. The parameter vectors $\theta_2$ and $\theta_2'$ are optimized, and $y_2 = f_2(y_1)$ is computed. This whole process of formulating an autoencoder and computing the next-level hidden representation is repeated until the innermost representation is reached, as shown in Fig. 3, and the representations $y_1, y_2, \ldots, y_{(N+1)/2}$ are obtained. At the same time, the corresponding parameter vectors $\theta_i$ and $\theta_i'$ have been optimized to certain values.

2.3.2 Training the full network

Once the parametrization vectors have been initialized, Step 2 involves further training the entire deep network at once. The deep network architecture is shown in Fig. 3. The input and output vectors for this network are set to the noisy signal $\tilde{x}$ and its uncorrupted version $x$, respectively, and the parameters for the layers from left to right are set to $\theta_1, \theta_2, \ldots, \theta_{(N+1)/2}, \theta_{(N+1)/2}', \ldots, \theta_2', \theta_1'$. Accordingly, the mapping functionals from left to right of the network are defined to be $f_1, f_2, \ldots, f_{(N+1)/2}, g_{(N+1)/2}, \ldots, g_2, g_1$, respectively. Once the whole network has been set up and the parameters have been initialized with the values computed in Step 1, optimization is performed and new representations for the hidden layers of the deep network are obtained.
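
A self-contained sketch of this two-step procedure, written with the Keras API, is given below for a hypothetical symmetric architecture with five hidden layers ({25, 20, 17, 20, 25}). The layer sizes, activations, optimizer, training schedule and placeholder data are illustrative assumptions and do not reproduce the exact settings used in this paper.

```python
import numpy as np
import tensorflow as tf

def single_ae(n_in, n_hidden, n_out):
    """A basic autoencoder with one hidden layer (the building block of Step 1)."""
    inp = tf.keras.Input(shape=(n_in,))
    hid = tf.keras.layers.Dense(n_hidden, activation="sigmoid")(inp)
    out = tf.keras.layers.Dense(n_out, activation="linear")(hid)
    ae = tf.keras.Model(inp, out)
    ae.compile(optimizer="adam", loss="mse")
    return ae

# Placeholder training data: noisy input and clean target (17 input units).
n_in = 17
x_clean = np.random.rand(5000, n_in).astype("float32")
x_noisy = (x_clean + 0.1 * np.random.randn(5000, n_in)).astype("float32")

# ---- Step 1: recursive pre-training, one hidden layer at a time -----------
hidden_sizes = [25, 20, 17]                   # outermost to innermost hidden layer
pretrained, level_in, level_out = [], x_noisy, x_clean
for size in hidden_sizes:
    ae = single_ae(level_in.shape[1], size, level_out.shape[1])
    ae.fit(level_in, level_out, epochs=20, batch_size=128, verbose=0)
    pretrained.append(ae)
    encoder = tf.keras.Model(ae.input, ae.layers[1].output)
    level_in = encoder.predict(level_in, verbose=0)   # hidden representation of this level
    level_out = level_in                               # the next autoencoder reconstructs it

# ---- Step 2: assemble the full deep network and train it at once ----------
inp = tf.keras.Input(shape=(n_in,))
x, paired = inp, []
for ae in pretrained:                          # encoder half, outermost layer first
    layer = tf.keras.layers.Dense(ae.layers[1].units, activation="sigmoid")
    x = layer(x)
    paired.append((layer, ae.layers[1]))
for ae in reversed(pretrained[1:]):            # decoder half mirrors the encoder
    layer = tf.keras.layers.Dense(ae.layers[2].units, activation="sigmoid")
    x = layer(x)
    paired.append((layer, ae.layers[2]))
out_layer = tf.keras.layers.Dense(n_in, activation="linear")
deep = tf.keras.Model(inp, out_layer(x))

# Initialise the deep network with the Step 1 weights (used only as a starting
# point), then fine-tune the whole network at once on noisy inputs vs clean targets.
for new, old in paired:
    new.set_weights(old.get_weights())
out_layer.set_weights(pretrained[0].layers[2].get_weights())
deep.compile(optimizer="adam", loss="mse")
deep.fit(x_noisy, x_clean, epochs=50, batch_size=128, verbose=0)
```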

3 A basic mathematical example

To provide a better understanding of how an autoencoder works, here we present a basic mathematical example. Since the goal of this paper is to demonstrate the use of autoencoders for noise reduction and data reconstruction, and not merely their usage as a dimensionality reduction tool, we restrict ourselves to denoising autoencoders. For readers interested in the application of autoencoders for dimensionality reduction in geophysics, we advise looking at the work of [59].

To start with, a simple mathematical problem with two model parameters is chosen, and a process model of the following form is used [18],

(3)

which relates an input field to an output field through the two model parameters. For the input field, a range of 0.05 to 1 is chosen with a sampling interval of 0.05, giving 20 sampling points. The input field is then used to generate the output signal. Once the output signal has been obtained, it is assumed that the process model is no longer known. Next, an autoencoder is trained to learn the internal representation of the output signal such that, for any noisy variant of it, the noise-free signal can be recovered. Learning the representation here refers to approximating the process model through a neural network and rejecting the component of the input which does not fit well with the model.

An autoencoder comprising 2 hidden layers is formulated. The number of neurons in each hidden layer is kept equal to the number of sampling points (20). Note that choosing the number of neurons equal to the number of input units does not lead to a plain identity function here, due to the large noise added to some of the signals. Table 1 states the neural network parameters used for this autoencoder. Nonlinear (sigmoidal) activation functions are used for the projections from the input layer to hidden layer 1 and from hidden layer 1 to hidden layer 2. For obtaining the output, linear activations are used. The error (loss) function is defined as

$E = \dfrac{1}{N_s} \sum_{i=1}^{N_s} \left\| x_i - \hat{x}_i \right\|^2,$   (4)

where $N_s$ refers to the number of samples, and $x_i$ and $\hat{x}_i$ refer to the noise-free and recovered samples, respectively.

A set of 20000 samples is generated using the process model stated in Eq. 3, and is further divided into 80% and 20% for training and validation samples, respectively. Random noise of up to 25% is added to the data points of 10000 samples, and the other 10000 samples are kept noise-free. For 50% of the noisy samples, the magnitude of the added noise scales with the local value at the respective point of the sample. For the remaining 50%, it scales with the mean of all the data points of the sample. For optimization, the traditional gradient descent algorithm is used. Due to the simplicity of the problem, no regularization is needed, and the convergence of the optimization problem is found to be very fast.
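
A sketch of this sample-generation procedure is given below. The process model of Eq. (3) is not reproduced here, so process_model and its parameter ranges are placeholders; only the split into noise-free and noisy samples and the two noise-scaling schemes follow the description above.

```python
import numpy as np

rng = np.random.default_rng(0)
x_grid = np.arange(0.05, 1.0 + 1e-9, 0.05)          # 20 sampling points

def process_model(x, p1, p2):
    # Placeholder for the two-parameter process model of Eq. (3).
    return p1 * np.exp(-p2 * x)

def make_samples(n_samples=20000):
    clean, noisy = [], []
    for i in range(n_samples):
        y = process_model(x_grid, rng.uniform(0.5, 2.0), rng.uniform(0.5, 5.0))
        clean.append(y)
        if i < n_samples // 2:
            noisy.append(y.copy())                    # half of the samples stay noise-free
        else:
            # Up to 25% random noise, scaled either by the local value at each
            # point or by the mean of the sample (half of the noisy samples each).
            scale = np.abs(y) if rng.random() < 0.5 else np.abs(y).mean()
            noisy.append(y + rng.uniform(-0.25, 0.25, y.shape) * scale)
    return np.array(noisy), np.array(clean)

x_noisy, x_clean = make_samples()                    # inputs and targets for training
```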

Parameter Value
TRAINING PHASE
No. of samples 20000
Noisy samples 10000
Noise type random noise (up to 25%)
Training samples 80%
Validation samples 20%
No. of hidden layers 2
Hidden units {20, 20}
Activation functions {sigmoid, sigmoid, linear}
TEST PHASE
Test samples 100
Noise level between 10% and 25%
Table 1: Parameters for the simple denoising autoassociative neural network used in Section 3.
Figure 4: (a) Relative noise reduction for the 100 noisy test samples, (b) noise-free, noisy and autoencoder (AE) corrected data for sample index 20 and (c) for sample index 70. A denoising autoencoder comprising 2 hidden layers with 20 neurons in each is used.

To test the accuracy of the learnt representation, 100 test samples are generated, and noise, chosen randomly between the levels of 10% and 25%, is added to each of these samples. Next, these data samples are passed through the learnt representation to reduce the noise and obtain the corrected output. The efficiency of the learnt representation for any noisy signal is measured by the relative noise reduction

$\eta = \left(1 - \dfrac{\left\| \hat{x} - x \right\|}{\left\| \tilde{x} - x \right\|}\right) \times 100\%,$   (5)

where $x$, $\tilde{x}$ and $\hat{x}$ denote the noise-free signal, its noisy version and the autoencoder output, respectively.

Fig. 4(a) shows the values of $\eta$ for the 100 noisy samples, measured using Eq. 5. For these samples, the mean value of $\eta$ indicates that the learnt autoencoder can reduce the noise by approximately 90%. This is a significant improvement and clearly demonstrates the applicability of autoencoders for denoising purposes. Figs. 4(b) and 4(c) show two data samples, their noisy versions as well as the autoencoder corrected signals. These data samples have been picked randomly from the set of 100 samples for demonstration purposes. From the results, it is clear that the autoencoder has learnt to identify the signal pattern within the data and to reject random noise.

4 Applications

4.1 Self-potential problem

Figure 5: Schematic diagram of a buried vertical cylinder, also showing some of the parameters that characterize the 1D SP anomaly caused due to it.

In a self-potential (SP) survey, the naturally occurring potential differences generated by electrochemical, electrokinetic and thermoelectric sources are measured. This approach has been used in a wide range of applications: in exploration mainly for sulphides and graphite [56], ground water investigations [52], detection of cavities [22] and geothermal exploration [66]. For some cases, SP anomaly can be modeled using simple geometries, e.g. sphere, cylinder and inclined sheet [51]. From the obtained field data, the set of parameters defining the buried source can be determined using various methods such as curve matching [33], gradient-based methods [2], global optimization [19], etc.

The SP data acquired in the field can also comprise noise from several sources, and without any post-processing, it is possible that the inverted set of parameters defining the buried source does not comply well with the actual properties. With gradient-based methods, there is a high chance of getting stuck in a local optimum. Global optimization methods have also been found to be sensitive to the level of noise in the data (e.g. [51, 19]). To circumvent this problem, we use autoencoders for reducing the noise content in the acquired data.

In the context of SP anomaly inversion, the application of autoencoders is demonstrated on data obtained for a 1D problem. A vertical cylinder buried in the subsurface, as shown in Fig. 5, is considered. The forward model for the SP anomaly at a measurement point $x$ along the profile is computed as

$V(x) = K \, \dfrac{(x - x_0)\cos\theta + z\sin\theta}{\left[(x - x_0)^2 + z^2\right]^{q}},$   (6)

where $z$, $\theta$, $K$ and $q$ denote the depth, polarization angle, current dipole moment and shape factor, respectively, and $x_0$ refers to the origin of the anomaly. These five variables are the parameters generally obtained by inverting the SP data. Note that in this work, we are not developing another efficient inversion approach. Rather, with the forward model known, our goal is to train the neural network to identify the component of the data that complies with it, and to reject the other parts.

Parameter Min Max
depth (m) 1 8
polarization angle (degrees) 25 75
electric dipole moment (mV) -1000 1000
shape factor 0.5 1.5
origin of the anomaly (m) -5 5
Table 2: Range of values for the parameters characterizing the forward model for 1D SP anomaly caused due to the burial of a vertical cylinder ([51]).

To start with, we define ranges for the parameters stated in Eq. 6, and these are shown in Table 2. Combinations of parameter values are randomly chosen from these ranges to generate data samples for training the neural network. The measurement position $x$ is varied from -20.0 m to 20.0 m with a spacing of 2.5 m. As stated in Table 3, a total of 60000 samples are used, out of which 40000 samples are corrupted with random noise. The entire dataset is divided into training and validation sets in the ratio of 4:1. All the layers comprise sigmoid activations, except the last one, which has only a linear activation. The loss (error) at every step of training is computed in a similar fashion as stated in Eq. 4. Further, to get a quantitative estimate of the noise reduction, we use the function stated in Eq. 5.
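
The sketch below illustrates how such training samples could be generated from the forward model of Eq. (6) with the parameter ranges of Table 2. The forward-model expression follows the standard formula for a polarized vertical cylinder, and the corruption of 50% of the units of a noisy sample with up to 50% random noise mirrors the test-phase settings of Table 3; the omission of any normalization step is a simplification.

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.arange(-20.0, 20.0 + 1e-9, 2.5)               # 17 measurement positions (m)

def sp_anomaly(x, z, theta_deg, K, q, x0):
    """1D SP anomaly of a buried vertical cylinder, following Eq. (6)."""
    theta = np.deg2rad(theta_deg)
    num = (x - x0) * np.cos(theta) + z * np.sin(theta)
    return K * num / ((x - x0) ** 2 + z ** 2) ** q

def make_sp_samples(n_samples, noisy_fraction=2.0 / 3.0):
    clean = np.empty((n_samples, x.size))
    noisy = np.empty_like(clean)
    for i in range(n_samples):
        v = sp_anomaly(x,
                       z=rng.uniform(1, 8),           # depth (m), Table 2
                       theta_deg=rng.uniform(25, 75), # polarization angle (deg)
                       K=rng.uniform(-1000, 1000),    # dipole moment (mV)
                       q=rng.uniform(0.5, 1.5),       # shape factor
                       x0=rng.uniform(-5, 5))         # origin of the anomaly (m)
        clean[i] = v
        noisy[i] = v
        if i < int(noisy_fraction * n_samples):       # 40000 of 60000 samples are noisy
            idx = rng.choice(x.size, size=x.size // 2, replace=False)   # 50% of the units
            noisy[i, idx] += rng.uniform(-0.5, 0.5, idx.size) * np.abs(v).max()
    return noisy, clean

x_noisy, x_clean = make_sp_samples(n_samples=1000)    # small run for illustration
```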

Parameter Value
TRAINING PHASE
No. of samples 60000
Noisy samples 40000
Noise type uniform random noise
Noisy units per sample 50%
Training samples 80%
Validation samples 20%
Activations all sigmoid and last as linear
TEST PHASE
Test samples 1000
Noisy units per sample 50%
Noise random (up to 50%)
Table 3: Network parameters for the various (stacked) denoising autoencoders used for reduction of noise in self-potential data.
Hidden nodes   Network type   Regularization   η (in %)
{4}   SA   0.0   17.6
{12}   SA   0.0   54.7
{25}   SA   0.0   55.4
{12, 12}   DA   0.0   58.4
{20, 25, 20}   DA   0.0   65.4
{17, 20, 25, 20}   DA   0.0   64.1
{20, 25, 30, 25, 20}   DA   0.0   63.7
{20, 25, 20}   SDA   {0.0, 0.0, 0.0}   61.8
{20, 25, 20}   SDA   {0.05, 0.05, 0.0}   70.9
{20, 25, 30, 25, 20}   SDA   {0.05, 0.05, 0.05, 0.0}   73.0
{20, 25, 30, 25, 20}   SDA-R   {0.05, 0.05, 0.05, 0.0}   78.5
{25, 20, 17, 20, 25}   SDA   {0.0, 0.0, 0.0, 0.0}   72.1
{35, 25, 17, 25, 35}   SDA   {0.05, 0.05, 0.05, 0.0}   77.2
{35, 25, 17, 25, 35}   SDA-R   {0.0, 0.0, 0.0, 0.0}   80.6
{35, 25, 17, 25, 35}   SDA-R   {0.05, 0.05, 0.05, 0.0}   81.3
Table 4: Information related to runs of denoising SP data using several autoencoder configurations. Here, SA, DA, SDA and SDA-R refer to shallow autoencoders, deep autoencoders, stacked deep autoencoders and stacked deep autoencoders with randomness, respectively. The regularization column lists the extent of regularization applied, and η denotes the percentage reduction in noise.

Several different autoencoder configurations are tested to understand how the complexity and composition of the neural networks affect the performance of autoencoders. Table 4 lists the number of neurons in each layer for the various autoencoders used. It is observed that with shallow autoencoders (SA), which comprise up to two hidden layers, the reduction in noise is less than 60%. For the network with one hidden layer comprising only 4 neurons, the noise reduction level is merely around 17%. This happens because such a compressed internal representation might not be enough to fully capture the signal pattern. Increasing the number of hidden neurons to 12 already pushes the efficiency of the autoencoder beyond 50%.

The use of deep autoencoders (DA) with 3 or more layers has been observed to further improve the performance. With 3-5 hidden layers, η reaches close to 65% (Table 4). An interesting observation is that with an increasing number of hidden layers, the value of η reduces. This is because as the network grows, training it becomes more difficult due to the increased number of variables and vanishing gradients. Clearly, with these bottlenecks, conventional deep networks are not the right solution to obtain very efficient autoencoders for the SP problem.

To circumvent the issues related to training deep networks in the standard manner, we explore the application of stacked autoencoders for denoising SP data. Several neural network configurations are tested using the two-step approach described in Fig. 3. In the first step, the weights corresponding to every hidden layer are trained using basic autoencoders with only one hidden layer each. Once the weights have been initialized, a full-fledged deep neural network is trained to reach the final solution. Stacked autoencoders have been found to push η beyond 70%. This can be improved further by regularizing the weights and avoiding over-fitting. With 5 hidden layers, the stacked autoencoder could achieve efficiencies close to 78%.

Fig. 6 shows three data samples chosen randomly out of the 1000 data samples in the test set. A stacked deep network with 5 hidden layers is used. It is observed that for the three cases, the autoencoder could significantly reduce the noise in the data. However, in Fig. 6(c), a certain amount of bias can be seen in some parts of the result obtained using the autoencoder. Although the employed autoencoder could smooth the data in that region, the associated values deviate significantly from the actual values. A reason could be that the training set did not comprise samples resembling this data, and the model was not sufficiently trained for it. Clearly, a remedy would be to further train the model in a feedback loop based on the fitting obtained for such examples.

We also observed that for stacked autoencoders, perturbing the weights obtained in Step 1 improves the convergence of Step 2. A randomly sampled 10% of the weights corresponding to each hidden layer are perturbed by up to 5%. With this configuration, an η value of 80.6% is obtained. This approach, when combined with regularization, can remove more than 81% of the noise from the SP data. With this level of improvement in the data, it can be claimed that stacked autoencoders could be a potential denoising tool for such problems.
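
A sketch of this perturbation step is given below. It assumes a Keras model (for instance, the stacked network from the sketch in Section 2.3) whose layers have already been initialized with the Step 1 weights; the 10% and 5% figures follow the description above, while the relative (multiplicative) form of the perturbation is an assumption.

```python
import numpy as np

def perturb_weights(model, fraction=0.10, amount=0.05, seed=0):
    """Randomly perturb `fraction` of each layer's weights by up to `amount` (relative)."""
    rng = np.random.default_rng(seed)
    for layer in model.layers:
        weights = layer.get_weights()
        if not weights:
            continue                                   # e.g. the input layer has no weights
        kernel = weights[0]
        mask = rng.random(kernel.shape) < fraction     # roughly 10% of the weights in this layer
        kernel[mask] *= 1.0 + rng.uniform(-amount, amount, int(mask.sum()))
        layer.set_weights([kernel] + weights[1:])

# Usage: call between Step 1 (pre-training) and Step 2 (training the full network), e.g.
# perturb_weights(deep)
```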

Figure 6: Noise-free, noisy and autoencoder (AE) corrected data for samples with index 27, 73 and 282. These samples have been chosen randomly out of the 1000 test samples. The noisy signal consists of up to 50% random noise in 50% of the points for each sample. A stacked deep network with the hidden structure of {25, 20, 17, 20, 25} is used.

4.2 Seismic data

Figure 7: An example seismic section considered for generating the training and test samples for this study.
Figure 8: Schematic diagram demonstrating the extraction of small image slices from a test image. For the seismic example case, the two image dimensions denote the number of time samples and the number of traces, respectively. The image slices are chosen with a fixed window size and are extracted using a certain stride.
Figure 9: Denoising of corrupted seismic data: (a) noise-free data, (b) noisy data, (c) corrected data obtained using a traditional deep autoencoder and (d) corrected data obtained using a stacked deep autoencoder. For the noisy data, some randomly chosen traces have been corrupted by replacing them with monofrequency sinusoidal traces in a frequency range of 100 to 220 Hz.

In this section, the applicability of autoencoders is explored for the removal of random noise from seismic data. As stated earlier, the application of autoencoders to seismic waveform data has been demonstrated in the past by [59], however, only in the context of dimensionality reduction. Here, our goal is to remove noise from seismic data. For simplicity, we do not discuss the headers associated with the data; rather, we treat the seismic data only as a two-dimensional matrix of amplitude values. Also, in this paper, our scope is restricted to non-coherent noise, and cases of coherent noise are not considered. Fig. 7 shows the seismic section that has been used in this study to generate training and test samples for the autoencoder network.

Parameter Value
TRAINING PHASE
Sample size 9 × 42 (traces × time samples)
No. of samples 1.1 million
Noisy samples 0.75 million
Noise type monofrequency sinusoidal noise
Noise frequency between 100 and 220 Hz
Noisy traces per sample 1
Training samples 95%
Validation samples 5%
Activations all sigmoid and last as linear
DA network {300, 400, 300}
SDA network {300, 400, 500, 400, 300}
Max. epochs 50000
TEST PHASE
Test image size 99 × 42 (traces × time samples)
Number of noisy traces 7
Noise type monofrequency sinusoidal noise
Noise frequency between 100 and 220 Hz
Window size 9 × 42 (traces × time samples)
Stride 1
Table 5: Network parameters for the various (stacked) denoising autoencoders used for reduction of noise in seismic data.

For the purpose of training the autoencoder, a total of 1.1 million small seismic samples are used. Details related to the training and test datasets are presented in Table 5. From the seismic section shown in Fig. 7, around 0.38 million smaller sample images are randomly chosen. Each sample image contains 9 × 42 data points, 9 being the number of traces and 42 denoting the number of data points along the time axis for every trace. These images are assumed to be the clean versions of the data. Further, each uncorrupted image is used to generate two noisy samples. A trace is randomly chosen from the clean image, and it is replaced by a monofrequency trace with a frequency in the range 100-220 Hz. The amplitude of the noisy trace is chosen randomly, scaled relative to the maximum amplitude observed in the seismic section. In this way, a total of around 1.1 million seismic samples are obtained.
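
The corruption scheme described above can be sketched as follows. The time-sampling interval and the exact way in which the amplitude is scaled with respect to the section's maximum amplitude are assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
dt = 0.002                                    # assumed time-sampling interval (s)

def corrupt_slice(clean_slice, a_max):
    """Replace one randomly chosen trace by a monofrequency sinusoid (100-220 Hz)."""
    noisy = clean_slice.copy()                # shape: (42 time samples, 9 traces)
    n_time, n_traces = noisy.shape
    trace = rng.integers(n_traces)
    freq = rng.uniform(100.0, 220.0)          # Hz
    amp = rng.uniform(0.0, 1.0) * a_max       # amplitude scaled by the section maximum
    t = np.arange(n_time) * dt
    noisy[:, trace] = amp * np.sin(2.0 * np.pi * freq * t)
    return noisy

clean = rng.standard_normal((42, 9))          # stand-in for a clean 9-trace sample
noisy = corrupt_slice(clean, a_max=np.abs(clean).max())
```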

The entire dataset is divided into training and validation sets in the ratio 95:5. To train the autoencoder, first we start with 3 hidden layers comprising 300, 400 and 300 neurons, respectively. All the activations are set to sigmoid, except the last one where linear activations are employed. The weights are initialized randomly and the entire network is trained for up to 50000 epochs.

The trained model is then tested on a seismic image comprising 99 traces with 42 data points in each. Fig. 9(a) shows the clean image used for generating the test data. Table 5 lists the complete details associated with testing the model. The noisy test data is generated by corrupting 7 traces in the noise-free test image, as shown in Fig. 9(b). The noisy traces correspond to monofrequency sinusoidal signals with frequencies randomly chosen from the range 100-220 Hz.

To feed the test image to the trained model, compatible image slices need to be chosen. These image slices are extracted using a window operator, as shown in Fig. 8. This window is slid along the row and column directions of the image using a certain stride, which refers to the jump made by the window per step. For a test image of 99 traces with 42 time samples each, a window size of 9 × 42 (traces × time samples) and a stride of 1, a total of (99 - 9)/1 + 1 = 91 image slices are obtained, as shown in Fig. 8. Thus, the test seismic image used in this study is represented using 91 image slices.

The outputs of the trained model are then summed up using a weighted approach to obtain the output test image of the seismic section. The image slices are properly aligned and stacked using weights proportional to the number of times a data point has been mapped into an image slice. Fig. 9(c) shows the denoised version of the noisy seismic image obtained using the autoencoder with 3 hidden layers. It is seen that the chosen autoencoder network could remove a significant part of the random noise from the data. However, the autoencoder regularizes the seismic image, due to which the resolution of the image is lost to a certain extent. Clearly, this is not desired, since the reduced resolution will lose information related to thin beds as well as other fine features present in the seismic section.
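
The slicing and recombination steps can be sketched as follows. The denoise function is a stand-in for the trained autoencoder, and simple averaging over the overlapping slices is used here as one reading of the weighted stacking described above.

```python
import numpy as np

def extract_slices(image, width=9, stride=1):
    """Slide a full-height window of `width` traces across the section (Fig. 8)."""
    n_time, n_traces = image.shape
    return [image[:, j:j + width] for j in range(0, n_traces - width + 1, stride)]

def reassemble(slices, n_traces, width=9, stride=1):
    """Overlap-add the denoised slices, normalising by how often each point is covered."""
    n_time = slices[0].shape[0]
    total = np.zeros((n_time, n_traces))
    count = np.zeros((n_time, n_traces))
    for k, s in enumerate(slices):
        j = k * stride
        total[:, j:j + width] += s
        count[:, j:j + width] += 1.0
    return total / count

def denoise(s):
    return s                                   # placeholder for the trained autoencoder

test_image = np.random.randn(42, 99)           # 42 time samples x 99 traces
slices = extract_slices(test_image)            # 91 slices for a width of 9 and stride of 1
denoised = reassemble([denoise(s) for s in slices], n_traces=99)
```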

Further, to test whether the random noise in seismic data can be reduced without compromising the resolution of the data too much, the potential of a stacked autoencoder network is explored. A network comprising 5 hidden layers with 300, 400, 500, 400 and 300 neurons, respectively, is employed. In the first step of autoencoding, the weights are initialized using several traditional autoencoders comprising a single hidden layer each. During the second step, the whole network is trained using the pre-initialized weights. The trained autoencoder is then used to denoise the test data. In the hidden layer comprising 500 neurons, an additional sparsity constraint is added, which ensures that not all the neurons of this layer are activated at the same time.

Fig. 9(d) shows the denoised seismic image obtained from the trained stacked autoencoder. The random noise has been significantly suppressed. Compared to Fig. 9(c), it can also be seen that the resolution of the output image has significantly increased. However, with this autoencoder configuration as well, the resolution of the output seismic data is compromised to a certain extent. Nevertheless, the stacked autoencoder configuration shows potential in suppressing noise in seismic data, and a future direction of research would be to design networks of even higher complexity that can produce better results.

Data source Porosity (%) Clay fraction (%) Hydrate saturation (%)
Synthetic 1 30 - 60 50 - 80 0 - 20
KG basin (NGHP-01-05) 0 - 90 85 - 95 0 - 30
Mt. Elbert-01, Alaska North Slope 40 (approx.) 0 - 40 0 - 60
Table 6: Range of values for the three property logs used to generate training dataset.

4.3 Well log data

Figure 10: Schematic diagram of an image slice provided as a training sample to the denoising autoencoder. Here, the superscripts 1, 2, 3, etc. denote the features used for training. For the well data denoising problem considered in this paper, the features are porosity, saturation, P-wave velocity and clay content. To denoise a given data point in the well logs, the image slice includes information from points above as well as below this data point in the well logs.

Well logging is the practice of obtaining detailed information related to the geological formations in an area through sensors deployed in a borehole. This approach has widely been used in the search for oil and gas, ground water and minerals, as well as for geotechnical studies. Several different properties, such as porosity, density, velocity, water saturation, etc., can be estimated using well log data. Often, for a certain geology, empirical linear/nonlinear relationships are established between two or more such properties, and these are further used as templates to calculate one property if the others are known. A detailed discussion on some such templates can be found in [31] and references therein.

Amongst a set of logs acquired in a borehole, it is possible that information in some parts of one log is either corrupted or missing. In case the data is missing, it can be calculated from other logs using appropriate empirical relationships. Here, the challenge is to identify the correct empirical relationship, since it depends very much on the local geology. For cases where one of the logs contains some random noise, it might not be easy to identify it.

In this paper, we explore the application of denoising autoencoders to solve the two issues outlined above. For the first case, we look at a suite of logs where parts of the data have been corrupted. Next, we train an autoencoder and use it to correct the data contained within this suite. In another test, parts of one of the logs are muted, and we let the trained autoencoder predict the information in those parts. For this study, we use well log data from two gas hydrate sites: the NGHP-01-05 site in the Krishna-Godavari (KG) basin of India [54] and the Mt. Elbert (ME) site in the Alaska North Slope region [45].

For the two sites, the available logs are porosity, shale volume and P-wave velocity. Using these logs, gas hydrate saturation values are calculated using the velocity-porosity transform proposed in [41]. The transform comprises 3 relations between velocity and porosity, which are as follows.

(7)
(8)
(9)

where the matrix and fluid velocities enter as parameters. The data from the two sites has contrasting lithological composition, the KG basin sediment being shaly with around 80-90% clay content in the rock matrix, and the ME data having low clay content and low porosity values. Due to this contrast, a single model that can fit both lithologies would have to be very nonlinear.

The goal of the denoising autoencoder for this case would be to receive a suite of logs (4 logs for the examples above), and identify the parts of the data in the logs which do not satisfy the model of [41]. One reason could be that one of the 4 logs for those parts of the data is corrupted with noise. Alternatively, it is possible that certain parts of the logs are missing, due to which the check cannot be done.

To train the denoising autoencoder, a large set of training examples needs to be generated. Prior information related to the possible ranges of values for porosity, clay fraction and hydrate saturation needs to be known. Table 6 lists the ranges that have been used to generate the training set for this example. These ranges correspond to the ranges of values observed at the KG basin and ME sites. In addition, a range of synthetic data values has also been added to make the autoencoder more robust and general. From these ranges, random values of porosity, clay fraction and hydrate saturation are chosen, and the corresponding P-wave velocity is calculated using the model proposed in [41]. The log properties are shifted using their respective means and normalized using their maximum and minimum values.

The training samples need to be fed to the autoencoder network in the form of image slices, as shown in Fig. 10, whose width equals the number of property logs available (4 for the example considered here). For the data point to be corrected, information from a number of continuous data points needs to be considered, which includes data points from above as well as below the point. In general, well log properties do not vary much between consecutive data points. Although we include this fact in our training data, we still allow the properties to vary by up to 20% between adjacent data points. In this manner, a total of 0.1 million clean images are generated.

The autoencoder needs corrupted images as well for training. For every clean image, 7 noisy images are generated. To generate a noisy image, one of the 4 logs is randomly chosen, and between 10% and 40% of the data points of this log property are modified. Either up to 10% random noise is added, or the data at these points is muted. The muted data points are set to -0.1 to differentiate them from the rest of the data. Since there are 4 logs, every image slice is 4 columns wide. Two different autoencoder models are trained with two different window sizes (3 and 70). The low value of 3 is used to capture the local characteristics of the data, while with 70 the goal is to capture the characteristics on a larger scale, so that the noisy or muted parts of the data can be differentiated from the noise-free parts.
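
The generation of clean and corrupted well-log training images can be sketched as follows. The rock-physics transform of [41] is not reproduced here, so velocity_from_logs is a placeholder; the adjacency rule (at most 20% variation between neighbouring points), the corruption fractions and the mute value of -0.1 follow the description above, while the log value ranges are loosely based on Table 6.

```python
import numpy as np

rng = np.random.default_rng(0)
N_LOGS = 4                                         # porosity, clay, saturation, Vp
MUTE_VALUE = -0.1

def velocity_from_logs(porosity, clay, saturation):
    # Placeholder for the velocity-porosity transform of [41] (Eqs. 7-9).
    return 1.0 - 0.5 * porosity + 0.2 * clay + 0.3 * saturation

def make_clean_image(n_points):
    """Synthetic logs in which adjacent points differ by at most 20%."""
    por, clay, sat = (np.empty(n_points) for _ in range(3))
    por[0], clay[0], sat[0] = rng.uniform(0.3, 0.6), rng.uniform(0.5, 0.8), rng.uniform(0.0, 0.2)
    for i in range(1, n_points):
        por[i] = np.clip(por[i - 1] * rng.uniform(0.8, 1.2), 0.0, 0.9)
        clay[i] = np.clip(clay[i - 1] * rng.uniform(0.8, 1.2), 0.0, 0.95)
        sat[i] = np.clip(sat[i - 1] * rng.uniform(0.8, 1.2), 0.0, 0.6)
    vp = velocity_from_logs(por, clay, sat)
    return np.stack([por, clay, sat, vp], axis=1)   # shape: (n_points, N_LOGS)

def corrupt_image(image):
    """Pick one log at random and either add noise to or mute 10-40% of its points."""
    noisy = image.copy()
    log = rng.integers(N_LOGS)
    n_points = image.shape[0]
    idx = rng.choice(n_points, size=max(1, int(rng.uniform(0.1, 0.4) * n_points)), replace=False)
    if rng.random() < 0.5:
        noisy[idx, log] *= 1.0 + rng.uniform(-0.1, 0.1, idx.size)   # up to 10% random noise
    else:
        noisy[idx, log] = MUTE_VALUE                                 # muted data points
    return noisy

clean = make_clean_image(n_points=70)
noisy_variants = [corrupt_image(clean) for _ in range(7)]            # 7 noisy images per clean image
```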

The trained autoencoders are then tested on the test set. The real datasets from the KG basin and ME sites are used for this purpose. Parts of one log from the ME site and of the porosity log from the KG basin site are muted, as shown in Figs. 11 and 12, respectively. Also, random noise is added in parts of one of the KG basin logs, as shown in Fig. 13. The objective of the trained autoencoders is to predict the correct information in these parts. For a quantitative impression of the added noise, the noise-free data is also shown in Figs. 11, 12 and 13.

From the test dataset, several test image slices are generated, as shown in Fig. 10. The concept of a sliding window, as discussed in Fig. 8, is used with a stride of 1. Next, the two trained autoencoders are applied to the input test data. The results obtained from the two trained autoencoders are then summed up using weights of 0.7 and 0.3, respectively. These weights have been obtained empirically, by trial and error.

The final denoised outputs are shown in Figs. 11, 12 and 13, respectively. It is observed that the trained autoencoder can predict the missing values for the ME site well, and the prediction error in the missing parts is found to be less than 10%. For the KG site also, the muted and noise-corrupted values are recovered very well. However, for the KG site, it is observed that the log data gets regularized to a larger extent, and the resolution is reduced. For all the test examples, some noise gets introduced in the noise-free parts of the data; however, the magnitude of this noise is significantly low. A direction of future research would be to minimize this undesired noise. Nevertheless, from these examples of well data, it is observed that autoencoders can be trained to effectively reduce noise in well log data, as well as to predict data in the missing parts of well logs.

Figure 11: Noise-free, noisy and autoencoder (AE) corrected values for the Mt. Elbert (ME) site, and the error associated with the corrupted and AE corrected logs. The noisy log comprises parts where the data is muted (set to -0.1), and the denoising autoencoder predicts the information in these parts.
Figure 12: Noise-free, noisy and autoencoder (AE) corrected porosity values for the Krishna-Godavari (KG) basin site, and the error associated with the corrupted and AE corrected logs. The noisy log comprises parts where the data is muted (set to -0.1), and the denoising autoencoder predicts the information in these parts.
Figure 13: Noise-free, noisy and autoencoder (AE) corrected values for the Krishna-Godavari (KG) basin site, and the error associated with the corrupted and AE corrected logs. The noisy log comprises parts where random noise has been added to the data, and the denoising autoencoder corrects the information in these parts.

5 Discussions

In this paper, the applicability of autoencoders has been investigated for the reduction of noise and signal reconstruction in geophysical data. A stacked denoising variant of deep autoencoder network has been proposed, which involves two-step training of the network. Through several numerical examples, we have shown that the proposed methodology works well on geophysical data. However, there are certain limitations of the current methodology, and several research directions can be outlined to improve further on this study. In this section, we briefly look at some of these important aspects.

For better denoising in a deep network regime, we presented a stacked version of deep denoising autoencoders. Note that the presented stacked denoising autoencoder should not be confused with the work of [61], where the stacked denoising formulation aimed at making the autoencoder more robust to noise for classification problems in particular. Moreover, the stacked formulation presented in their work differs significantly from the one presented in this paper. As stated above, during the first step of our stacked denoising autoencoder, the weights of the full network are trained in parts using basic autoencoders with one hidden layer each. Once the weights are initialized, the second step involves training the full network at once. We have observed that randomly perturbing some of the weights obtained from the first step helps the convergence of the training process in the second step. However, the exact effect of this randomness is not yet known, and identifying the optimal type of randomness as well as its optimal magnitude is still to be investigated.

Another important aspect that needs to be looked into is the quality check (QC) of the results obtained from the denoising autoencoders. While we confidently show that the proposed autoencoders can significantly reduce noise in the chosen examples, it is difficult to predict how the trained autoencoder will perform on data not represented in the training set. This is a known problem in the field of machine learning, and it is always advisable to limit the application of a trained network to datasets whose representation overlaps well with the training set. For physics-based problems, similar to the ones considered here, inaccurate removal of noise can change the internal representation of the data, and the whole modeling process can be adversely affected. In this regard, a direction of future research would be to devise a QC approach for machine learning methods applied to physics-based problems. This QC approach would be expected to provide the extent of uncertainty in the autoencoder output based on the difference between the provided input and the training set.

The focus of this paper has been restricted to non-coherent noise, and to restrict the length of the paper, we have only considered uniform random noise. Moreover, we use mean squared errors, which are particularly effective for Gaussian noise. Thus, it needs to be investigated whether choosing a different error function would help to further improve the performance of the denoisers. The application of autoencoders to other non-coherent noise types has briefly been studied in [10]. However, in the field of geophysics, removing coherent noise (e.g., ground roll, multiples, etc.) from seismic data is also a tough challenge. We believe that it would be of interest to the geophysical community to explore the application of stacked denoising autoencoders for the removal of these types of noise as well. Also, the variation of random data that forms an inclusive subset of the training data is limited for the examples considered in this paper. For example, for the seismic test problem considered here, we only assume noisy traces to comprise a single frequency component. However, the random noise might have a more complex representation comprising multifrequency components or other functions. Noise based on such representations should also be considered for the robustness of the denoiser.

Moreover, for the seismic problem, we assumed that at most one trace is corrupt in every small image slice used for training. However, in reality, multiple adjacent traces can be corrupted, and such scenarios should also be modeled. We believe this does not change the concept demonstrated here, except that the image slices chosen for the seismic problem would have to be significantly larger. Due to the limited computational resources available, a restriction was imposed on the slice dimensions; however, this would be an interesting direction to look into. Choosing larger image slices allows the autoencoder to interpret a more zoomed-out picture of the representation, thereby providing the capability to interpolate the values of multiple traces at the same time. A similar aspect has been looked into for the well data correction problem considered in this paper.

One limitation observed in the seismic and well data results is that the resolution of the data is compromised during the denoising process. This issue could be suppressed to a certain extent by the use of larger training sets as well as larger training samples within the set. Using a larger training set allows the identification of patterns from a wide range of frequencies, which cannot be identified in relatively smaller training sets. Having a wide frequency band plays an important role in improving the resolution of the data. Hence, when a larger training set is used, the resolution of the autoencoder output improves. At the same time, it is also important that the low-frequency components are preserved so that the zoomed-out representation of the data can be understood. For example, for cases of seismic or well data where multiple adjacent traces are corrupted, it is important that the pattern of the data is identified on a more global level, and this requires identifying the low-frequency representation of the data.

A few additional challenges have been identified that need further investigation for designing improved denoising autoencoders. We observe that in the seismic and well data results obtained from the autoencoder, some noise gets added in the clean parts of the data. This noise is undesired and needs to be prevented. It is believed that modifying the error (loss) function should help to tackle this issue, and this is still to be investigated. Another aspect is the choice of continuity in the synthetic well data used for training the autoencoder. In this paper, it is assumed that the properties between two adjacent samples do not vary by more than 20%. Based on some preliminary tests, we have observed that the trained autoencoder is very sensitive to the choice of this threshold value. Hence, a direction of research would be to obtain a detailed understanding of the effect of this parameter on the autoencoder's performance.

6 Conclusions

Autoencoders are capable of learning the internal representation of even very complex datasets. Since autoencoders can nonlinearly project data onto a lower dimensional space, several hidden features of the data can be identified that cannot be realized using traditional dimensionality reduction techniques. This capability allows identifying the pattern of the signal in the provided data and separating the noise component. In this paper, the application of autoencoders has been explored in the context of denoising geophysical data. A stacked variant of denoising autoencoders has been formulated, and its application is demonstrated on several numerical examples. For a basic mathematical example, it has been shown that more than 90% of the random noise can be reduced using denoising autoencoders. For SP anomaly data, the deep networks formulated in this paper could reduce around 80% of the random noise, when trained using the appropriate forward model. The stacked autoencoders are also found to perform very well on seismic and well log data, reducing the random noise and recovering the missing values to a significant extent.

Clearly, the presented stacked denoising autoencoders help to tackle the issue of noise reduction and recovery of missing values in geophysical data. For future work, our goal is to explore the application of these autoencoders on larger datasets, and in the presence of coherent noise. Nevertheless, based on the results presented in this study, it can already be argued that denoising autoencoders could serve as an important data-driven methodology for the elimination of noise in geophysical datasets.

Acknowledgements

Parts of the research in this paper have been carried out using TensorFlow, an open source software library for high performance numerical computation, especially in the space of machine learning research [1]. We would like to thank the developers of this software. Also, we express our thanks to Nikhil Kumar and Jai Gupta for their valuable suggestions and help in the completion of this paper.

References

  • Abadi et al. [2015] Abadi M, Agarwal A, Barham P, Brevdo E, Chen Z, Citro C, Corrado GS, Davis A, Dean J, Devin M, Ghemawat S, Goodfellow I, Harp A, Irving G, Isard M, Jia Y, Jozefowicz R, Kaiser L, Kudlur M, Levenberg J, Mané D, Monga R, Moore S, Murray D, Olah C, Schuster M, Shlens J, Steiner B, Sutskever I, Talwar K, Tucker P, Vanhoucke V, Vasudevan V, Viégas F, Vinyals O, Warden P, Wattenberg M, Wicke M, Yu Y, Zheng X (2015) TensorFlow: Large-scale machine learning on heterogeneous systems. URL https://www.tensorflow.org/, software available from tensorflow.org
  • Abdelrahman et al. [2003] Abdelrahman EM, El-Araby TM, Hassaneen AG, Hafez MA (2003) New methods for shape and depth determinations from SP data. Geophysics 68:1202–1210
  • Ackley et al. [1985] Ackley DH, Hinton GE, Sejnowski TJ (1985) A learning algorithm for Boltzmann machines. Cognitive Sci 9:147–169
  • Baldi and Hornik [1989] Baldi P, Hornik K (1989) Neural networks and principal component analysis: Learning from examples without local minima. Neural Networks 2:53–58
  • Bansal et al. [1993] Bansal A, Kauffman RJ, Weitz RR (1993) Comparing the modeling performance of regression and neural networks as data quality varies: a business value approach. J Manage Inform Syst 10:11–32
  • Beenamol et al. [2012] Beenamol M, Prabavathy S, Mohanalin J (2012) Wavelet based seismic signal de-noising using Shannon and Tsallis entropy. Comput Math Appl 64:3580–3593
  • Bengio [2009] Bengio Y (2009) Learning deep architectures for AI. Foundations and Trends in Machine Learning 2(1):1–127
  • Bhowmick et al. [2016] Bhowmick D, Shankar U, Maiti S (2016) Revisiting supervised learning in the context of predicting gas hydrate saturation. In: 78th EAGE Conf. Exhib., pp 1–4
  • Bhowmick et al. [2018] Bhowmick D, Gupta DK, Maiti S, Shankar U (2018) Deep autoassociative neural networks for noise reduction in seismic data. CoRR abs/1805.00291
  • Burger et al. [2012] Burger CH, Schuler CJ, Harmeling S (2012) Image denoising with multi-layer perceptrons, part 1: comparison with existing algorithms and with bounds. CoRR 1211.1544
  • Chapelle et al. [2006] Chapelle O, Scholkopf B, Zien A (2006) Semi-supervised learning. MIT Press, Cambridge, MA
  • Chauvin [1989] Chauvin Y (1989) Towards a connectionist model of symbolic emergence. In: Proc. 11th Ann. Conf. of the Cognitive Science Soc., pp 580–1587
  • Collins and Clark [1993] Collins JM, Clark MR (1993) An application of the theory of neural computation to the prediction of workplace behavior: an illustration and assessment of network analysis. Pers Psychol 46(3):503–522
  • Cottrell et al. [1987] Cottrell GW, Munro P, Zipser D (1987) Learning internal representations from gray-scale images: an example of extensional programming. In: Proc. 9th Ann. Conf. of the Cognitive Science Soc., pp 461–473
  • Crisci et al. [2012] Crisci C, Ghattas B, Perera G (2012) A review of supervised machine learning algorithms and their applications to ecological data. Ecol Model 240:113–122
  • Doumas et al. [1995] Doumas A, Mavroudakis K, Gritzalis D, Katsikas S (1995) Design of a neural network for recognition and classification of computer viruses. Comput Sec 14(5):435–448
  • Feng et al. [2012] Feng F, Wu Y, Nie G, Ni R (2012) The effect of artificial neural network model combined with six tumor markers in auxiliary diagnosis of lung cancer. J Med Syst 36:2973–2980
  • Gupta et al. [2012] Gupta DK, Arora Y, Singh UK, Gupta JP (2012) Recursive ant colony optimization for estimation of parameters of a function. In: Proc. RAIT-2012, IEEE, pp 1–7
  • Gupta et al. [2013] Gupta DK, Gupta JP, Arora Y, Shankar U (2013) Recursive ant colony optimization: a new technique for the estimation of function parameters from geophysical field data. Near Surf Geophys 11:325–339
  • Hutchinson et al. [1994] Hutchinson J, Lo AW, Poggio T (1994) A non-parametric approach to pricing and hedging derivative securities via learning networks. J Finance 11:325–339
  • Jain and Sebastian [2009] Jain V, Sebastian S (2009) Natural image denoising with convolutional networks. In: Koller D, Schuurmans D, Bengio Y, Bottou L (eds) Advances in Neural Information Processing Systems 21, Curran Associates, Inc., pp 769–776
  • Jardani et al. [2007] Jardani A, Revil A, Santos FAM, Fauchard C, Dupont JP (2007) Detection of preferential infiltration pathways in sinkholes using joint inversion of self-potential and EM-34 conductivity data. Geophy Prospec 55(5):749–760
  • Jenson [1992] Jenson HL (1992) Using neural networks for credit scoring. Manag Finance 18(6):15–26
  • Jones et al. [2016] Jones DE, Ghandehari H, Facelli JC (2016) A review of the applications of data mining and machine learning for the prediction of biomedical properties of nanoparticles. Comput Meth Prog Bio 132:93–103
  • Khan and Yairi [2018] Khan S, Yairi T (2018) A review on the application of deep learning in system health management. Mech Syst Signal Pr 107:241–265
  • Kourou et al. [2015] Kourou K, Exarchos TP, Exarchos KP, Karamouzis MV, Fotiadis DI (2015) Machine learning applications in cancer prognosis and prediction. Comput Struct Biotechnol J 13:8–17
  • Kramer [1992] Kramer MA (1992) Autoassociative neural networks. Computers Chem Engng 16(4):313–328
  • Larochelle et al. [2007] Larochelle H, Erhan D, Courville A, Bergstra J, Bengio Y (2007) An empirical evaluation of deep architectures on problems with many factors of variation. In: Proc. 24th Int. Conf. Machine Learning, ICML, pp 536–543
  • Li et al. [2017] Li J, Zhang Y, Qi R, Liu QH (2017) Wavelet-based higher order correlative stacking for seismic data denoising in the curvelet domain. IEEE J Sel Top Appl
  • Maiti et al. [2007] Maiti S, Tiwari RK, Kümpel HJ (2007) Neural network modelling and classification of lithofacies using well log data: A case study from KTB borehole site. Geophy J Int 169(2):733–746
  • Mavko et al. [2009] Mavko G, Mukerji T, Dvorkin J (2009) The Rock Physics Handbook: Tools for Seismic Analysis of Porous Media. Cambridge University Press, Cambridge
  • McCormack et al. [1993] McCormack MD, Zaucha DE, Dushek DW (1993) First-break refraction event picking and seismic data trace editing using neural networks. Geophysics 58:67–78
  • Meiser [1962] Meiser P (1962) A method of quantitative interpretation of self-potential measurements. Geophys Prosp 10:203–218
  • Murat and Rudman [1992] Murat ME, Rudman AJ (1992) Automated first arrival picking: A neural network approach. Geophys Prosp 40:587–604
  • Nikolopoulos and Fellrath [1994] Nikolopoulos C, Fellrath P (1994) A hybrid expert system for investment advising. Expert Syst 11(4):245–250
  • Ojha and Garg [2016] Ojha U, Garg A (2016) Denoising high resolution multispectral images using deep learning approach. In: 15th Int. Conf. Machine Learning and Appl., IEEE, pp 1–5
  • Pizurica et al. [2002] Pizurica A, Philips W, Lemahieu I, Acheroy M (2002) A joint inter- and intrascale statistical model for Bayesian wavelet based image denoising. IEEE T Image Process 11(5):545–557
  • Portilla et al. [2003] Portilla J, Strela V, Wainright MJ, Simoncelli EP (2003) Image denoising using scale mixtures of Gaussians in the wavelet domain. IEEE T Image Process 12(11):1338–1351
  • Poulton et al. [1992] Poulton MM, Sternberg BK, Glass CE (1992) Location of subsurface targets in geophysical data using neural networks. Geophysics 57:1534–1544
  • Proctor [1991] Proctor RA (1991) An expert system to aid in staff selection: a neural network approach. Int J Manpower 12(8):18–21
  • Raymer et al. [1980] Raymer LL, Hunt ER, Gardner JS (1980) An improved sonic transit time-to-porosity transform. In: Proc. SPWLA 21st Annual Logging Symposium, pp 1–13
  • Reading et al. [2015] Reading AM, Cracknell MJ, Bombardieri DJ, Chalke T (2015) Combining machine learning and geophysical inversion for applied geophysics. In: Proc. ASEG-PESA 2015 Conf., pp 1–4
  • Rogers [1995] Rogers J (1995) Neural network user authentication. AI Expert 10:29–33
  • Ranzato et al. [2007] Ranzato M, Poultney CS, Chopra S, LeCun Y (2007) Efficient learning of sparse representations with an energy-based model. In: Platt JC, Koller D, Singer Y, Roweis S (eds) Advances in Neural Information Processing Systems 19 (NIPS’06), MIT Press, pp 1137–1144
  • Rose et al. [2011] Rose K, Boswell R, Collett T (2011) Mount Elbert gas hydrate stratigraphic test well, Alaska North Slope: Coring operations, core sedimentology, and lithostratigraphy. Mar Pet Geo 28(2):311–331
  • Röth and Tarantola [1994] Röth G, Tarantola A (1994) Neural networks and inversion of seismic data. J Geophys R 99:6753–6768
  • Roth and Black [2009] Roth S, Black MJ (2009) Fields of experts. Int J Comput Vision 82(2):205–229
  • Rudin et al. [1992] Rudin LI, Osher S, Fatemi E (1992) Nonlinear total variation based noise removal algorithms. Physica D 60:259–268
  • Ruggiero Jr. [1994] Ruggiero Jr MA (1994) Training neural networks for intermarket analysis. Futures 23(9):42–44
  • Salchenberger et al. [1992] Salchenberger LM, Cinar EM, Lash NA (1992) Neural networks: a new tool for predicting thrift failures. Decis Sci 23:899–196
  • Santos [2010] Santos FAM (2010) Inversion of self-potential of idealized bodies’ anomalies using particle swarm optimization. Comput Geosci 36:1185–1190
  • Santos et al. [2002] Santos FAM, Almeida EP, Castro R, Nolasco M, Mendes-Victor L (2002) A hydrogeological investigation using EM34 and SP surveys. Earth Planets Space 54:655–662
  • Schuler et al. [2013] Schuler CJ, Burger HC, Harmeling S, Scholkopf B (2013) A machine learning approach for non-blind image deconvolution. In: Proc. IEEE Conf. Comp. Vision Patt. Recog. (CVPR), pp 1067–1074
  • Shankar et al. [2013] Shankar U, Gupta DK, Bhowmick D, Sain K (2013) Gas hydrate and free gas saturations using rock physics modelling at site NGHP-01-05 and 07 in the Krishna–Godavari basin, eastern Indian margin. J Petrol Sci Eng 106:62–70
  • Stafylopatis and Likas [1992] Stafylopatis A, Likas A (1992) Pictorial information retrieval using the random neural network. IEEE Trans Softw Eng 18(7):590–600
  • Sundararajan et al. [1998] Sundararajan N, Rao PS, Sunitha V (1998) An analytical method to interpret self-potential anomalies caused by 2d inclined sheets. Geophysics 63:1551–1555
  • Tomasi and Manduchi [1998] Tomasi C, Manduchi R (1998) Bilateral filtering for gray and color images. In: Proc. 6th Int. Conf. Comp. Vision (ICCV), pp 839–846
  • Tsai et al. [2009] Tsai C, Hsu Y, Lin C, Lin W (2009) Intrusion detection by machine learning: A review. Expert Syst Appl 36(10):11994–12000
  • Valentine and Trampert [2012] Valentine AP, Trampert J (2012) Data space reduction, quality assessment and searching of seismograms: autoencoder networks for waveform data. Geophys J Int 189(3):1183–1202
  • Valentine et al. [2013] Valentine AP, Kalnins LM, Trampert J (2013) Discovery and analysis of topographic features using learning algorithms: A seamount case study. Geophys Res Lett 40(12):3048–3054
  • Vincent et al. [2010] Vincent P, Larochelle H, Lajoie I, Bengio Y, Manzagol PA (2010) Stacked denoising autoencoders: Learning useful representations in a deep network with a local denoising criterion. J Mach Learn Res 11:3371–3408
  • Weickert [1998] Weickert J (1998) Anisotropic diffusion in image processing. ECMI Series, Teubner-Verlag, Stuttgart, Germany
  • Weiss and Freeman [2007] Weiss Y, Freeman WT (2007) What makes a good model of natural images? In: Proc. IEEE Int. Conf. Comp. Vis. Patt. Recog. (CVPR), pp 1–8
  • Xiong and Zuo [2016] Xiong Y, Zuo R (2016) Recognition of geochemical anomalies using a deep autoencoder network. Comput Geosci 86:75–82
  • Zhang and Paulson [1997] Zhang Y, Paulson KV (1997) Magnetotelluric inversion using regularized Hopfield neural networks. Geophys Prosp 45:725–743
  • Zlotnicki and Nishida [2003] Zlotnicki J, Nishida Y (2003) Review on morphological insights of self-potential anomalies on volcanoes. Surveys Geophy 24:291–338