I Introduction
Electrical resistivity survey (ERS), one of the most well-known and commonly applied investigation technologies (Loke et al. [1]), has been widely used in environmental investigations (LaBrecque et al. [2], Hayley et al. [3], Reynolds [4]), engineering prospecting (Kim et al. [5], Wilkinson et al. [6], Sjodahl et al. [7]), hydrological surveys (Park [8], Slater and Binley [9], Rucker [10], Cho et al. [11]) and mining applications (Legault et al. [12], Chambers et al. [13], Liu et al. [14]). Geological interpretations made directly from observed data usually fall far short of revealing the complex characteristics of subsurface properties. Geophysical inversion methods therefore aim to reconstruct more accurate and detailed subsurface properties.
After years of development, nonlinear optimization methods have been widely adopted for ERS inverse problems, such as the Genetic Algorithm (Schwarzbach et al. [15], Liu et al. [16]), the Simulated Annealing Algorithm (Sharma and Kaikkonen [17]) and Particle Swarm Optimization (PSO) (Shaw and Srivastava [18]). In particular, instead of optimizing the resistivity model, Artificial Neural Networks (ANNs) build the mapping between apparent resistivity data and the resistivity model directly by updating the network parameters. With ANNs, significant results in ERS inversion have been obtained in both synthetic and field tests (El-Qady and Ushijima [19], Neyamadpour et al. [20], Maiti et al. [21], Jiang et al. [22]). ANNs are usually optimized by gradients derived from loss functions. Because of the vanishing and exploding gradient problems, it was initially not easy to train deep and large ANNs with strong modeling capacity. Thus, limitations such as slow convergence, low accuracy and overfitting during training remain (Al-Abri and Hilal [23]).
Recently, many approaches have been proposed to help backpropagate gradients efficiently through the whole network, such as the residual units proposed by He et al. [24], the activation functions proposed by Maas et al. [25] and the normalization layer proposed by Ioffe and Szegedy [26]. With them, ANNs with many layers and a very large number of parameters can be optimized. Accordingly, deep ANNs are commonly referred to as Deep Neural Networks (DNNs). Methods based on DNNs are usually called deep learning, which has shown superior performance in many problems that require machine perception and decision-making (LeCun et al. [27]). Having first succeeded in computer vision on image perception tasks (e.g., Krizhevsky et al. [28], Simonyan and Zisserman [29], He et al. [24], Jiang et al. [30], etc.), DNNs are now prevalent in many fields, such as computer graphics and natural language processing (NLP). DNNs have also developed many variants with different computational logic, such as multilayer perceptrons (MLPs, also known as fully connected networks), convolutional neural networks (CNNs) and recurrent neural networks (RNNs), which further extend the application scenarios.
Many attempts have already verified that DNNs can approximate very complex nonlinear mapping functions, especially for ill-posed inverse problems such as image super-resolution (Dong et al. [31]), medical imaging (Jin et al. [32]) and 3D model reconstruction (Choy et al. [33]). These successes have also inspired many approaches to geophysical problems, particularly in seismic inversion. Araya-Polo et al. [34] use a velocity-related feature cube derived from raw seismic data to generate the velocity model with CNNs, while Wu, Lin, and Zhou [35] treat seismic inversion as image mapping and build the mapping from seismic profiles to the velocity model directly. Further, Li et al. [36] identify the weak spatial correspondence and the uncertain reflection-reception relationship between seismic data and the velocity model, and propose to first generate spatially aligned features with MLPs. The latter two works build the mapping from raw seismic data to the velocity model directly, without data preprocessing. All of these works demonstrate promising accuracy and computation speed, which brings new perspectives for ERS inversion. Besides, by utilizing big data, DNN-based inversion methods are more robust and have potential for practical application.
In this paper, deep learning inversion of ERS learns the mapping from input (apparent resistivity data) to output (resistivity model) directly with CNNs, as illustrated in Fig. 1. Typically, resistivity anomalies in the model cause responses in the apparent resistivity data. As can be seen from Fig. 1, the responses caused by different resistivity anomalies show different patterns, and these patterns exhibit a certain spatial correspondence to the anomalies in the model. Since the patterns are local and correspond spatially to the resistivity anomalies, the input and output can be regarded as natural images, and the task can be treated as a common mapping between images.
In this case, CNNs are preferred among the variety of DNN variants, because they are much more powerful in extracting local patterns and more efficient with regard to the number of parameters. The performance of CNNs has also received wide attention in remote sensing scene research (Maggiori et al. [37], Cheng et al. [38], Zhang et al. [39]). Apart from its similarities with natural images, apparent resistivity data has its own characteristic. As demonstrated in Fig. 2, when the same anomalous body is located at different vertical positions, the apparent resistivity data show markedly different patterns; that is, the patterns have a vertically varying characteristic. This characteristic poses a great challenge for CNNs and may make their outputs ambiguous. The reason is that, with local and weight-sharing convolutional kernels, CNNs have a certain receptive field and effective area; when CNNs are applied to data with the vertically varying characteristic, they may be required to give different outputs for similar patterns within the effective area. The details will be discussed in Sec. III. This may be the main difficulty in applying CNNs to invert electrical resistivity data.
We adopt the prevalent CNN-based U-Net architecture [40] to design our network (ERSInvNet). To capture the potential global change in resistivity distribution caused by resistivity anomalies, we design ERSInvNet with layers so that it has a sufficiently large receptive field. Then, to reduce the potential ambiguity caused by the vertically varying characteristic of apparent resistivity data, we supply ERSInvNet with vertical position information by concatenating an additional tier feature map to the input data. To address the common problem that deep anomalies are difficult to invert well, we introduce a depth weighting function in the loss function so that the network pays more attention to the deep region of resistivity models. Besides, we apply smooth constraints in the loss function to suppress potential false anomalies. In the experiments, synthetic examples from our proposed ERSInv dataset are used to verify feasibility and efficiency, and ERSInvNet is trained end-to-end without any data processing. Through comprehensive qualitative analysis and quantitative comparison, the proposed ERSInvNet consistently achieves promising performance in terms of real-time inference and inversion accuracy.
II Backgrounds
The basic measurements of ERS are made by injecting current into the ground through two current electrodes and measuring the potential difference between other pairs of electrodes. In a typical ERS scenario, the potential difference data are acquired at the Earth’s surface as observed data . The inverse problem involves inferring a set of parameters in model from a set of data , and it usually relies on the minimization of an objective function. Since we achieve resistivity inversion by building the mapping from the observed data to the resistivity model directly with CNNs, the mechanism of CNNs and related concepts are first introduced in this section.
Multilayer perceptrons (MLPs) are the most basic type of neural network and have been studied for decades; they are sometimes colloquially referred to as "vanilla" neural networks. An MLP consists of many layers of neurons, each of which weights and nonlinearly maps all inputs from the previous layer to outputs in the current layer. MLPs are a stack of fully connected layers, each of which is defined as
(1) 
where denotes the weights in the links from all the neurons in layer to the th neuron in layer while is the corresponding bias. represents used nonlinear activation function, such as and (Maas et al. [25]) that
(2)  
is the most common nonlinear activation function. However, it may cause the vanishing gradient problem as the number of layers increases. Thus, most recent works prefer rectified linear activation functions such as which are also more biologically plausible. Nowadays, dozens of nonlinear activation functions have been proposed with properties suited to different tasks. According to Hornik [41], even the simplest MLPs that contain three layers of neurons (an input layer, a hidden layer, and an output layer) can approximate continuous functions on compact subsets of under mild assumptions on the activation function, which is known as the universal approximation theorem. To optimize network parameters more efficiently, the backpropagation (BP) algorithm was proposed; it computes the error gradients layer by layer according to the chain rule. Using MLPs is the most direct way to construct a mapping between data with an uncertain relationship or with global dependency.
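As an illustration of the fully connected layer described above, the following is a minimal NumPy sketch, not the authors' implementation. It assumes each layer computes an activation applied to a weighted sum plus a bias; the function names, layer sizes, and the choice of sigmoid, ReLU, and leaky ReLU as example activations are assumptions made here for illustration.

```python
import numpy as np

def sigmoid(x):
    # Saturating activation; its gradient shrinks for large |x|.
    return 1.0 / (1.0 + np.exp(-x))

def relu(x):
    # Rectified linear activation.
    return np.maximum(0.0, x)

def leaky_relu(x, slope=0.01):
    # Rectifier with a small slope for negative inputs (slope value is an assumption).
    return np.where(x > 0, x, slope * x)

def dense_layer(a_prev, W, b, activation):
    # a_prev: (n_in,), W: (n_out, n_in), b: (n_out,)
    return activation(W @ a_prev + b)

def mlp_forward(x, params, activation=relu):
    # params is a list of (W, b) pairs, one per fully connected layer.
    a = x
    for W, b in params:
        a = dense_layer(a, W, b, activation)
    return a

rng = np.random.default_rng(0)
# Hypothetical sizes: 4 inputs -> 8 hidden neurons -> 2 outputs.
params = [(rng.normal(size=(8, 4)), np.zeros(8)),
          (rng.normal(size=(2, 8)), np.zeros(2))]
y = mlp_forward(rng.normal(size=4), params)
```

For simplicity the sketch also applies the activation to the last layer; a regression network would typically leave the output layer linear.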
Though MLPs have shown promising performance on many tasks, they are inefficient and impractical for inputs with large dimensions, such as images. Usually, the patterns in images are locally correlated, in that the pixels forming a pattern are spatially close to each other. Moreover, the patterns in images are independent of location, in that the same pattern may appear at any position. Given these characteristics, CNNs with local and weight-sharing convolutional kernels were proposed, which are more efficient and make the best use of the characteristics of natural images. The most basic convolutional operation with inputs in CNNs is defined as:
(3) 
where is the convolution kernel of size for layer and channel while is the corresponding bias, and is the feature map of layer while is the element of position in convolution results of layer . It follows that the feature map in layer is where is the nonlinear activation function. Please note that Eq. 3 is the most basic convolutional operation with inputs, without stride and dilation choice.
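The basic convolution described above (no stride, no dilation) can be sketched as follows. This is an illustrative NumPy version for a single-channel 2D input, assuming the cross-correlation form used by most deep learning libraries and no padding; the function name and test values are hypothetical.

```python
import numpy as np

def conv2d_valid(x, kernel, bias=0.0):
    # Slide the same kernel over every position of x (stride 1, no dilation,
    # no padding) and add a single shared bias.
    kh, kw = kernel.shape
    H, W = x.shape
    out = np.empty((H - kh + 1, W - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(x[i:i + kh, j:j + kw] * kernel) + bias
    return out

x = np.arange(25, dtype=float).reshape(5, 5)
k = np.ones((3, 3)) / 9.0      # 3x3 averaging kernel (hypothetical)
z = conv2d_valid(x, k)         # 3x3 result; a feature map would be activation(z)
```

Because the kernel and bias are reused at every position, the number of parameters does not depend on the input size.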
A three-layer CNN with only one channel in each layer is shown in Fig. 3. Within each layer, the kernel and the bias share the same weights that and , which is referred to as the weight sharing property of CNNs. Modern CNNs have many convolutional layers, and each layer includes convolution, nonlinear activation and some optional operations, such as batchnorm and pooling. Usually, the more layers a CNN has, the stronger its capacity is. One rule for deciding the number of layers is that the network should at least have a receptive field large enough to cover the whole pattern of the object while also having enough nonlinear expressiveness. The receptive field is the set of neighboring pixels or grid cells in the input that the CNN uses to produce one output element. As illustrated in Fig. 2, to classify an image with pentagrams, we concatenate two convolutional layers with kernels (stride and dilate ). In this way, each final output element has a receptive field of size in the input image and thus becomes aware of the existence of a pentagram nearby. Besides, according to Luo et al. [42], the center part of the receptive field usually has more influence than the surrounding parts, and is commonly referred to as the effective area. Certainly, there are many ways to satisfy the receptive field requirement, and the size of the receptive field is affected by many operations such as pooling and upsampling. As for the nonlinear expressiveness of CNNs, it is hard to know whether it is enough for the task, as the mechanism of CNNs has not been studied comprehensively, so people usually use more layers than strictly necessary to guarantee the nonlinearity.
III Methodology
III-A Approach
In this work, we intend to learn the mapping function from apparent resistivity data to resistivity model directly by DNNs that:
(4) 
As stated in Sec. I, because of the patterns shown in apparent resistivity data, our task can be treated as a common image-to-image mapping, for which CNNs are powerful and efficient thanks to their weight-sharing local convolution kernels. However, with patterns that have the vertically varying characteristic, CNNs may encounter ambiguous situations. In the following, we demonstrate and discuss the potential problem of using CNNs on apparent resistivity data.
Firstly, as illustrated in Fig. 4 and shown by Luo et al. [42], even for deep CNNs whose receptive field covers the whole apparent resistivity data, the patterns within the center part of the receptive field usually have much more influence on the corresponding output model value. The center part of the receptive field is therefore usually referred to as the effective area. Secondly, due to the definition of apparent resistivity, the patterns within the effective area in apparent resistivity data have the vertically varying characteristic. Situations such as those in the top two figures of Fig. 4 may occur: similar patterns within the effective area appear at different tier positions , but correspond to different anomalous bodies and model values. Consequently, during training, CNNs may become ambiguous when required to give different model values for similar input patterns within the effective area. By comparison, for natural images, as the bottom figure in Fig. 4 shows, the patterns of cats correspond to the same semantic meaning in the output no matter where they are located, which means natural images do not have a position varying characteristic.
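The ambiguity described above can be made concrete with a small NumPy sketch. It is an illustration only: the 9x9 grid, the random pattern and the kernels are hypothetical. A weight-sharing kernel returns exactly the same response for an identical pattern placed at a shallow and at a deep tier, so the local pattern alone cannot separate the two cases. Appending a channel that encodes the row (tier) index, which anticipates the remedy discussed next, makes the two responses differ.

```python
import numpy as np

def conv2d_valid(x, kernel):
    # x: (C, H, W), kernel: (C, kh, kw); stride 1, no padding, shared weights.
    _, kh, kw = kernel.shape
    _, H, W = x.shape
    out = np.empty((H - kh + 1, W - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(x[:, i:i + kh, j:j + kw] * kernel)
    return out

rng = np.random.default_rng(0)
pattern = rng.normal(size=(3, 3))
data = np.zeros((1, 9, 9))
data[0, 0:3, 3:6] = pattern      # pattern at shallow tiers
data[0, 6:9, 3:6] = pattern      # same pattern at deep tiers

kernel = rng.normal(size=(1, 3, 3))
resp = conv2d_valid(data, kernel)
shallow, deep = resp[0, 3], resp[6, 3]        # identical responses

# Extra channel holding the 1-based tier (row) number at every position.
tier = np.broadcast_to(np.arange(1, 10, dtype=float)[:, None], (9, 9))
data2 = np.concatenate([data, tier[None]], axis=0)
kernel2 = np.concatenate([kernel, np.ones((1, 3, 3))], axis=0)
resp2 = conv2d_valid(data2, kernel2)
shallow2, deep2 = resp2[0, 3], resp2[6, 3]    # now distinguishable
```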
The characteristics of natural images and the weight-sharing property of CNNs make CNNs powerful and efficient in dealing with natural images. To make the best use of CNNs in ERS, we should supplement the input data with more distinguishing information that is related to the input data, thereby reducing the potential ambiguity of CNNs when giving output. In surface ERS, when the distance between the two injecting electrodes is enlarged, apparent resistivity data at deeper tier positions in the vertical direction can be calculated to achieve the purpose of electrical sounding. Data at a deeper tier position have a stronger correlation with deeper anomalous bodies, which means that the tier position information helps CNNs distinguish the data patterns caused by anomalous bodies at different depths. Therefore, adding the tier position information of the data to the input benefits CNNs in building the mapping. As shown in Fig. 5, CNNs can determine the model values more easily from patterns and location information together than from possibly indistinguishable patterns alone. Finally, we let CNNs learn the mapping from data and location to model value:
(5) 
where denotes the tier positions in vertical directions of apparent resistivity data. In the following section, we will detail the architecture of CNNs and how we introduce .
III-B Networks
We design our networks based on the widely used U-Net architecture (Ronneberger et al. [40]), as shown in Fig. 7. U-Net is well known for its shortcut operation, which concatenates feature maps from shallow layers (low-level feature maps) to feature maps from deep layers (high-level feature maps). Normally, high-level features carry knowledge more closely related to the final result values, while low-level features carry knowledge about general concepts such as position and shape. The shortcut thus lets the last several layers produce outputs from high-level and low-level knowledge together, which helps obtain final results with both accurate values and accurate anomaly morphology. Moreover, the shortcut helps backpropagate gradients and accelerates parameter optimization in shallow layers. We also add several residual blocks (He et al. [24]) at the end of the U-Net to enhance its capacity. In total, there are layers with convolution operation ( kernel) and nonlinear activation function, and also layers with max-pooling operation, which results in a sufficiently large receptive field and sufficient nonlinearity for our data and task.
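To see how stacked convolution and pooling layers enlarge the receptive field, the standard recurrence for sequential layers can be sketched as below. The layer lists are hypothetical examples and do not reflect the actual ERSInvNet configuration, whose layer counts and kernel sizes are not restated here; the sketch also ignores skip connections and upsampling.

```python
def receptive_field(layers):
    # layers: list of (kernel_size, stride, dilation) along one spatial axis.
    # rf grows by (k - 1) * dilation * jump, where jump is the product of
    # the strides of all preceding layers.
    rf, jump = 1, 1
    for k, s, d in layers:
        rf += (k - 1) * d * jump
        jump *= s
    return rf

# Two 3x3 convolutions (stride 1, dilation 1): each output sees 5 input cells.
two_convs = receptive_field([(3, 1, 1), (3, 1, 1)])

# Conv 3x3 -> 2x2 max-pooling (stride 2) -> conv 3x3: pooling doubles the
# growth contributed by every later layer.
conv_pool_conv = receptive_field([(3, 1, 1), (2, 2, 1), (3, 1, 1)])
```

Here `two_convs` evaluates to 5 and `conv_pool_conv` to 8, which shows why a few pooling stages let a moderately deep network cover a whole data profile.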
To reduce the potential ambiguity of applying CNNs to our task, as discussed in the previous sections, we introduce a tier feature map and concatenate it to the input data to supply tier position information. In typical surface detection scenarios, apparent resistivity data at different tier positions in the vertical direction are obtained by changing the spacing between the injecting electrodes, which is the basic method of electrical sounding. Under each sounding condition, the electrode array moves horizontally along the survey line to form an apparent resistivity profile, which yields the data matrix of apparent resistivity. In this data matrix, data at different tier positions are strongly correlated with anomalous structures at the corresponding depths. In other words, introducing tier position information into the apparent resistivity data can be regarded as supplying depth information to the CNNs. Our tier feature map has a tier structure in which every element of a tier has a value equal to the tier number, and it has the same spatial dimensions as . We denote the tier feature map as and assume has the height of , thus has total tiers and = , where indicate vertical and horizontal locations respectively, as shown in Fig. 7. Concatenating to the input data is equivalent to treating as another channel of the input data, as illustrated in Fig. 7.
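A minimal sketch of the tier feature map and its concatenation to the input is given below. It assumes the input is stored as a (channels, height, width) array and that tiers are numbered from 1 at the top; the numbering origin, the array sizes and the function names are assumptions for illustration, and any rescaling of the tier channel is left out.

```python
import numpy as np

def tier_feature_map(height, width, start=1):
    # Every element in row i holds the tier number of that row.
    tiers = np.arange(start, start + height, dtype=np.float32)
    return np.repeat(tiers[:, None], width, axis=1)

def add_tier_channel(data):
    # data: (C, H, W) apparent resistivity channels -> (C + 1, H, W).
    _, h, w = data.shape
    t = tier_feature_map(h, w)
    return np.concatenate([data, t[None]], axis=0)

# Hypothetical input: two apparent resistivity channels on a 4 x 6 grid.
apparent = np.zeros((2, 4, 6), dtype=np.float32)
x = add_tier_channel(apparent)   # shape (3, 4, 6)
```

Since the map depends only on the grid shape, it can be built once and reused for every sample.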
III-C Loss Function
III-D Basic Metric and Weighting Function
For this value regression problem, we apply the widely used MSE metric for the data value term of the loss function. In classical ERS inversion, it is usually more difficult to obtain accurate inversion results for deep anomalies. Li and Oldenburg [43, 44] proposed a weighting function that counteracts the natural decay of the static field in order to overcome the tendency to put structure at the surface. Its effectiveness has been demonstrated (Kang and Oldenburg [45], Qin et al. [46]). We therefore adopt the idea of a depth weighting function in the loss function so that the network invests more capacity in the deep area. In this way, the inversion accuracy and resolution of deep anomalous bodies are expected to improve. The depth weighting function is defined as:
(6) 
where is the predicted value at position of the resistivity model. is the constant parameter related to the grid size and the location of the current electrodes. The parameter is a constant for controlling depth weight distribution.
Finally, we design our data value term as
(7) 
where is the inverse result by our networks and is the corresponding ground truth.
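The idea of a depth-weighted data term can be sketched as follows. The exact form of Eq. 6 is not reproduced here; the sketch assumes a weight that grows with the row index as (row + alpha)^(beta / 2), and the names `alpha` and `beta`, their values, and the averaging over all cells are hypothetical choices for illustration only.

```python
import numpy as np

def depth_weights(height, width, alpha=1.0, beta=2.0):
    # Assumed form: the weight increases with depth (row index).
    rows = np.arange(height, dtype=float)[:, None]
    w = (rows + alpha) ** (beta / 2.0)
    return np.repeat(w, width, axis=1)

def depth_weighted_mse(pred, target, weights):
    # Squared error scaled per cell by the depth weight, then averaged.
    return float(np.mean(weights * (pred - target) ** 2))

target = np.zeros((4, 4))
w = depth_weights(4, 4)

shallow_err = target.copy()
shallow_err[0, 0] = 1.0      # unit error in the top row
deep_err = target.copy()
deep_err[3, 0] = 1.0         # same error in the bottom row

loss_shallow = depth_weighted_mse(shallow_err, target, w)
loss_deep = depth_weighted_mse(deep_err, target, w)
```

With these assumed parameters the same unit error costs four times more in the deepest row than in the top row, which is the intended effect of paying more attention to the deep region.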
III-E Smooth Constraints
Inversion tasks are often mathematically ill-posed, in that the solutions are usually non-unique and unstable. One way to address this problem is to adopt the well-tested smoothness-constrained least-squares approach (Tikhonov et al. [47]). Under the smooth constraints, sudden changes between adjacent grids in the resistivity model are reduced. We implement the smooth constraints by introducing a smooth term:
(8) 
The smooth term plays the role of regularization and is also known as the total variation loss.
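A total-variation-style smooth term can be sketched as below. This is one common variant (sum of absolute differences between vertically and horizontally adjacent cells); whether the paper uses absolute or squared differences, and a sum or a mean, is not restated here, so those choices are assumptions.

```python
import numpy as np

def smooth_term(m):
    # Anisotropic total variation of a 2D model m.
    dv = np.abs(m[1:, :] - m[:-1, :])   # differences between vertical neighbors
    dh = np.abs(m[:, 1:] - m[:, :-1])   # differences between horizontal neighbors
    return float(dv.sum() + dh.sum())

flat = np.full((4, 4), 3.0)
tv_flat = smooth_term(flat)             # a constant model is perfectly smooth

block = np.zeros((4, 4))
block[1:3, 1:3] = 1.0                   # 2x2 anomaly of contrast 1
tv_block = smooth_term(block)           # equals perimeter (8 edges) x contrast
```

The penalty grows with both the boundary length and the contrast of an anomaly, which is why it suppresses small isolated false anomalies but also tends to blur sharp boundaries.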
III-F Final Formulation
Consequently, our final loss function is defined as
(9) 
where is the smoothness factor. All the operations and losses are differentiable, which yields our end-to-end network, which we call ERSInvNet.
IV Experiments
IV-A Dataset Preparation
For deep-learning-based geophysical inversion, the dataset should generally be sufficiently large and diverse. In our work, we therefore build a dataset with pairs of different resistivity models and corresponding apparent resistivity data, which we call the ERSInv Dataset. The resistivity models are designed with reference to real ERS scenarios. We generate synthetic data by predefining a few anomalous bodies with different resistivity values and then embedding them at different positions in a homogeneous medium ( ). The resistivity anomalous bodies consist of subsets as follows. Type I: Single rectangular body ( sample pairs), Type II: Two rectangular bodies ( sample pairs), Type III: Three rectangular bodies ( sample pairs), Type IV: Single declining bodies ( sample pairs) and Type V: Two declining bodies ( sample pairs). For each type, the anomalous bodies may have different resistivity values; in our dataset, a low resistivity anomaly has a body value from [ , , ] and a high resistivity anomaly has a body value from [ , , ]. Accordingly, we have five different types of resistivity models in total, and the corresponding apparent resistivity data are generated by forward modeling. A schematic diagram and the parameters of the anomalous bodies are shown in Tab. I.
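The construction of a synthetic resistivity model (rectangular bodies embedded in a homogeneous background, as in Types I to III) can be sketched as follows. The grid size, background value, body positions and resistivity values below are hypothetical placeholders, not the values used in the ERSInv Dataset; declining bodies (Types IV and V) are not covered by this sketch.

```python
import numpy as np

def make_model(shape, background, bodies):
    # bodies: list of (top, left, height, width, resistivity) in grid cells.
    model = np.full(shape, background, dtype=float)
    for top, left, h, w, rho in bodies:
        model[top:top + h, left:left + w] = rho
    return model

# Hypothetical example: one low- and one high-resistivity rectangular body.
model = make_model(
    shape=(16, 64),
    background=100.0,
    bodies=[(4, 10, 3, 6, 10.0),      # low resistivity body
            (8, 40, 4, 5, 1000.0)],   # high resistivity body
)
```

Each such model would then be passed to a forward solver to obtain the paired apparent resistivity data.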
The selection of the electrode configuration for ERS is crucial for acquiring the response of the observed target, because different electrode configurations have different horizontal and vertical resolutions [48]. The apparent resistivity data of the Wenner and Wenner-Schlumberger arrays are adopted in this work, since these two configurations offer good vertical resolution and appropriate horizontal resolution (Sasaki [49], Szalai et al. [50]) when they are used together. Our observation data are generated through forward modeling on the resistivity models.
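For reference, apparent resistivity is obtained from a measured potential difference and injected current through an array-dependent geometric factor. The sketch below uses the standard surface-array factors: 2πa for the Wenner array and πn(n+1)a for the Wenner-Schlumberger array, where a is the potential electrode spacing and n·a is the distance from each current electrode to the nearest potential electrode. The function names and numerical values are hypothetical.

```python
import math

def wenner_k(a):
    # Geometric factor of the Wenner array with electrode spacing a.
    return 2.0 * math.pi * a

def wenner_schlumberger_k(a, n):
    # Geometric factor of the Wenner-Schlumberger array.
    return math.pi * n * (n + 1) * a

def apparent_resistivity(k, delta_v, current):
    # rho_a = K * dV / I
    return k * delta_v / current

# Check on a homogeneous half-space of 100 ohm-m: for a Wenner array the
# potential difference is rho * I / (2 * pi * a), so rho_a recovers rho.
rho, a, current = 100.0, 2.0, 1.0
delta_v = rho * current / (2.0 * math.pi * a)
rho_a = apparent_resistivity(wenner_k(a), delta_v, current)
```

With n = 1 the Wenner-Schlumberger factor reduces to the Wenner factor, which is one way to see that the two arrays sample related but not identical parts of the subsurface.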
The simulated electrical fields are generated by finite-element methods based on the anomalous potential method. The values of both input and output during training are normalized to the range of since standardizing either input or target variables tends to make the training process better behaved (El-Qady and Ushijima [19]). With the tier feature map proposed in the last section, each input sample has three channels: two channels of apparent resistivity data and one channel of tier feature map. The dataset is randomly divided into training set, validation set and test set in a ratio of (training set: pairs; validation set: pairs; test set: pairs).
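The normalization and random split can be sketched as below. The target range and the split ratio are not restated in the text above, so the sketch assumes min-max scaling to [0, 1] and an 8:1:1 split purely as placeholders; both are hypothetical, as are the function names.

```python
import numpy as np

def min_max_normalize(x, lo=None, hi=None):
    # Scale values to [0, 1]; lo/hi can be fixed from the training set.
    lo = np.min(x) if lo is None else lo
    hi = np.max(x) if hi is None else hi
    return (x - lo) / (hi - lo)

def split_indices(n, ratios=(0.8, 0.1, 0.1), seed=0):
    # Random, non-overlapping train / validation / test index sets.
    rng = np.random.default_rng(seed)
    perm = rng.permutation(n)
    n_train = int(round(ratios[0] * n))
    n_val = int(round(ratios[1] * n))
    return perm[:n_train], perm[n_train:n_train + n_val], perm[n_train + n_val:]

scaled = min_max_normalize(np.array([2.0, 4.0, 6.0]))
train_idx, val_idx, test_idx = split_indices(100)
```

In practice the scaling bounds would be computed on the training set only and reused for validation and test data, so that no information leaks across the split.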
IV-B Implementation
The neural networks in this work are built using PyTorch. An SGD optimizer with batchsize , learning rate , momentum and weight decay is used to optimize the networks. During training, we carry out epochs of optimization in total, and perform one round of validation after each training epoch to check the training effect. The parameters and of the depth weighting function are set to be and respectively, and the smoothness factor is set to be . In this work, the hyperparameters such as , and are chosen according to the evaluation on the validation set. All computations are carried out on a machine with a single NVIDIA TITAN Xp. It is worth noting that in this environment, our ERSInvNet reaches real-time inference during testing with data.
To verify the proposed ERSInvNet, four experiments are arranged as follows: Experiment 1, ERSInvNet performance analysis; Experiment 2, the ablation study of the tier feature map; Experiment 3, the ablation study of the depth weighting function and smooth regularization; Experiment 4, comparison of ERSInvNet and linear least squares inversion. In addition to qualitative evaluation through visual judgment, the weighted mean square error (WMSE) and the weighted correlation coefficient (WR) are used to measure the performance quantitatively. They are given as follows:
(10)  
where and are the vectorized actual and predicted model values respectively, while is the vectorized weight, and we use to denote the average values of . The weight is designed so that regions far from anomalies in the resistivity model have large weights, because false anomalies far from true anomalies are undesirable, while false anomalies close to true anomalies are usually acceptable. is the number of samples. WMSE measures how well the predicted values fit the ground truth (lower is better), while WR measures the statistical relationship between prediction and ground truth (higher is better), with a value range in .
V Results and Discussion
V-A Results of Experiment 1
In Experiment 1, some examples are shown to demonstrate the inversion performance of the proposed method. The misfit of the locations and shapes of anomalous bodies, as well as of the resistivity values, are the major factors considered during evaluation. In Fig. 8, we randomly select five inversion results from the test set, one for each of the five model types. The images arranged from left to right are the ground truth, the apparent resistivity data, the ERSInvNet results and the vertical resistivity profiles, respectively.
From the first two columns, we can see that spatial correspondence exists extensively between the apparent resistivity data and the resistivity models. Overall, Fig. 8 shows that ERSInvNet predicts model values accurately and localizes anomalous bodies well, which demonstrates its promising inversion ability. To visualize the positions, shapes and resistivity values of anomalous bodies in the inversion results more intuitively, the resistivity change along the anomalous body profile (shown by the black line in Fig. 8) is plotted as curves in the fourth column. The resistivity curves of the inversion results and the models are almost aligned (with the error within ) and change synchronously in most places, except in regions near the boundaries of the anomalous bodies. This is because the smooth constraints restrict abrupt changes of the resistivity value near anomalous body boundaries. The effect of the smooth constraints will be discussed in Sec. V-C.
In the third example, three anomalous bodies at different depths are clearly and accurately reflected in the inversion results. Among them, the high resistivity bodies at the depth of and are close to the model value, while the resistivity of the deepest one (with value ) is lower than the background (with value ). The reason is that, when current is injected at the surface, the field responses caused by a deep anomaly do not show obvious patterns in the apparent resistivity data. The lack of an obvious pattern makes it hard for ERSInvNet to give accurate predictions. Besides, the inversion results show an interesting phenomenon: the boundary of a high resistivity body is described more accurately than that of a low resistivity body. More examples from the validation and test sets are shown in the supplementary material for further comparison.
In Fig. 9, we show the loss curves of our ERSInvNet on the training and validation sets. Both loss curves decrease gradually as the number of epochs increases, which indicates that no evident overfitting occurs during training. When the epochs reached times, the loss had dropped below and the downward trend appeared likely to continue.
V-B Results of Experiment 2
In Experiment 2, to verify the role of the tier feature map, we compare the ERSInvNet results with and without the tier feature map, with both the depth weighting function and the smooth constraints in the loss function. Examples with five different types of anomalous bodies are randomly selected. Inversion results with and without the tier feature map are shown in Fig. 10. From example , example and low resistivity body on the right side of example , we can see that supplying tier information helps improve the morphological accuracy of the inverted resistivity anomalies. The delineation of the boundaries of anomalous bodies is also improved. In example and , for multiple anomalous bodies, the tier feature map helps suppress the obvious false anomalies near true ones. Specifically, in example , the false anomalies around the low resistivity anomalous body are removed after introducing the tier feature map, and the shape of the anomalous body is more accurate. Similarly, three obvious high resistivity false anomalies in example are completely removed. Without the tier feature map, using CNNs on data with the vertically varying characteristic causes ambiguity, as discussed in Sec. III, and the networks make many guesses to lower the loss function, which finally results in false anomalies. (It is worth noting that what these guessed results look like depends on the samples the networks learn from during training.)
In summary, the tier feature map can suppress false anomalies. The same behavior is generally observed in other results. To check the overall performance on the validation and test sets, we quantitatively compare results by MSE and metrics in Tab. II. The quantitative evaluation also supports the positive effect of the tier feature map. Certainly, besides the effect of the tier feature map, the inversion performance also depends on the contribution of our smooth constraints and depth weighting function, which will be discussed in the next subsection.
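The weighted metrics introduced in Sec. IV-B (WMSE and WR) can be illustrated with the sketch below. It is a generic weighted mean square error and weighted Pearson correlation, assuming normalization by the sum of weights and weighted means; the exact normalization of Eq. 10 and the construction of the weight vector are not reproduced here, so the weights are simply taken as an input and the example values are hypothetical.

```python
import numpy as np

def wmse(m_true, m_pred, w):
    # Weighted mean square error (normalized by the sum of weights).
    return float(np.sum(w * (m_true - m_pred) ** 2) / np.sum(w))

def weighted_corr(m_true, m_pred, w):
    # Weighted Pearson correlation coefficient, in [-1, 1].
    mt = np.sum(w * m_true) / np.sum(w)
    mp = np.sum(w * m_pred) / np.sum(w)
    cov = np.sum(w * (m_true - mt) * (m_pred - mp))
    var_t = np.sum(w * (m_true - mt) ** 2)
    var_p = np.sum(w * (m_pred - mp) ** 2)
    return float(cov / np.sqrt(var_t * var_p))

m_true = np.array([1.0, 2.0, 3.0, 4.0])
w = np.array([1.0, 1.0, 2.0, 2.0])     # hypothetical weights

perfect_wmse = wmse(m_true, m_true, w)
perfect_wr = weighted_corr(m_true, 2.0 * m_true + 1.0, w)
inverse_wr = weighted_corr(m_true, -m_true, w)
```

The correlation is insensitive to an affine rescaling of the prediction, so it complements the error metric: one measures value fit, the other measures whether anomalies appear in the right places with the right sign.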
V-C Results of Experiment 3
| | Test | | Valid | |
| | MSE | | MSE | |
| With tier feature map | 0.000335 | 0.540852 | 0.000341 | 0.538876 |
| Without tier feature map | 0.000731 | 0.387227 | 0.000721 | 0.385944 |
In Experiment 3, we compare the results with and without the smooth constraints and the depth weighting function, with the tier feature map in the input data. We thus have four configurations in total: smooth constraints and depth weighting function together (SD), smooth constraints only (OS), depth weighting function only (OD) and neither (NA). In Fig. 11, the resistivity models and the inversion results of NA, OS, OD and SD are given from left to right.
By comparing the results of NA and OS (see the second and third columns), we see that the results with smooth constraints have fewer false anomalies but poorer boundary accuracy. By comparing NA and OD, we find that the results with the depth weighting function have more accurate anomaly morphology as well as anomaly values, especially in the deep area. That is to say, the main contribution of the smooth constraints is to suppress false anomalies, while the depth weighting function benefits inversion accuracy. The overall comparison indicates that ERSInvNet with the tier feature map, the smooth constraints and the depth weighting function together (SD) has the best performance.
| | Test | | Valid | |
| | MSE | | MSE | |
| SD | 0.000335 | 0.540852 | 0.000341 | 0.538876 |
| OD | 0.000608 | 0.387756 | 0.000592 | 0.387921 |
| OS | 0.000959 | 0.335904 | 0.000971 | 0.335029 |
| NA | 0.000549 | 0.425744 | 0.000555 | 0.421256 |
MSE and scores of the four different ERSInvNet configurations on the validation and test sets are listed in Tab. III. On the whole, SD has the highest as well as lowest MSE value, which is consistent with the results of the visual comparison. However, OS and OD are unexpectedly even worse than NA after introducing the smooth constraints and the depth weighting function, respectively. In particular, when the smooth constraints are applied in OS, the MSE increases by which means the smooth constraints sacrifice considerable accuracy in delineating the boundaries of anomalies. As for the performance decrease when applying the depth weighting function, we conjecture that it may be caused by overproduced false anomalies in the deep area. With the depth weighting function, to avoid missing anomalies in the deep area and incurring a high loss, the networks may make many guesses, which result in false anomalies. After introducing the smooth constraints and the depth weighting function together (SD), we obtain the best performance, which indicates that the two components benefit each other and restrain each other's negative effects.
V-D Results of Experiment 4
In this experiment, we benchmark ERSInvNet against the well-known iterative linear inversion implemented in the RES2DINV software, which is widely applied in ERS inversion. For a fair comparison, we use a synthetic model whose anomaly positions, sizes and resistivity values were not seen during the training of ERSInvNet, and the same configuration is used to generate the corresponding resistivity data for both methods. Fig. 12 (b) and (c) show the results of the linear method and the proposed ERSInvNet, respectively. Both methods reveal the existence of the anomalous bodies, but ERSInvNet predicts the location and shape of the conductive block and the resistivity values more precisely than the linear method. Compared with traditional methods, DNN-based methods can exploit data priors learned from the training set as well as human-introduced priors such as smoothness, and they have more powerful nonlinear approximation abilities. These advantages explain the promising results of DNNs in our task.
VI Conclusions
In this paper, we propose a CNN-based network called ERSInvNet for inverse problems on resistivity data. Although some attempts at CNN-based tomography have been made, ERS inversion differs from previous studies because of the vertically varying characteristic inherent in apparent resistivity data. This characteristic leads to ambiguity when CNNs are used directly. To address this issue, we supplement the input data with a tier feature map. In addition, to further reduce false anomalies and improve the prediction accuracy in the deep region, smooth constraints and a depth weighting function are introduced into the loss function during training.
To train, validate and test the proposed ERSInvNet, we collect a dataset that contains pairs of apparent resistivity data and resistivity models. Comparative experiments show that including the tier feature map helps to obtain more accurate inversion results and to suppress false anomalies. Used individually, the smooth constraints reduce false anomalies and the depth weighting function improves prediction accuracy in the deep region, but each sacrifices performance in other aspects. Qualitative analysis and quantitative comparison show that using both of them simultaneously achieves the best results. Moreover, compared with traditional methods, ERSInvNet can reach real-time inference during testing. In future research, we will focus on establishing a general dataset covering typical geological conditions and on the application to field data.
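The idea of supplementing the input with a tier feature map can be sketched as follows. The sketch assumes that the map is an extra input channel in which every cell stores the index of its tier (row) in the apparent resistivity section, normalised to [0, 1]. The function name `add_tier_feature_map` and the normalisation are illustrative assumptions, not necessarily the construction used for ERSInvNet.

```python
def add_tier_feature_map(apparent_res):
    """Stack a tier-index channel onto a 2-D apparent-resistivity section.

    apparent_res: list of rows, with row i holding the data of tier i
    (row 0 = shallowest tier). Returns [data_channel, tier_channel],
    where every cell of tier_channel stores its normalised tier index.
    """
    n_tiers = len(apparent_res)
    scale = max(n_tiers - 1, 1)  # avoid division by zero for a single tier
    tier_channel = [
        [i / scale] * len(row) for i, row in enumerate(apparent_res)
    ]
    return [apparent_res, tier_channel]


# Example: a section with three tiers and three data points per tier.
section = [[10.0, 12.0, 11.0], [9.0, 8.0, 7.0], [5.0, 6.0, 4.0]]
channels = add_tier_feature_map(section)
```

Because a convolution applies the same weights at every position, identical data patterns in different tiers would otherwise look the same to the network. The extra channel gives each position an explicit indication of its tier.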
VII Acknowledgments
This work was supported by grants from the National Natural Science Foundation of China (No. 51739007, 61702301), the United Fund of the National Natural Science Foundation of China (U1806226), the Development Program of China (No. 2016YFC0401805), the Key Research and Development Plan of Shandong Province (Z135050009107), the Interdiscipline Development Program of Shandong University (No. 2017JC002) and the Fundamental Research Funds of Shandong University.
References
 [1] M. Loke, J. Chambers, D. Rucker, O. Kuras, and P. Wilkinson, “Recent developments in the direct-current geoelectrical imaging method,” Journal of Applied Geophysics, vol. 95, pp. 135–156, 2013.
 [2] D. LaBrecque, A. Ramirez, W. Daily, A. Binley, and S. Schima, “ERT monitoring of environmental remediation processes,” Measurement Science and Technology, vol. 7, no. 3, pp. 375–385, 1996.
 [3] K. Hayley, L. Bentley, and M. Gharibi, “Time-lapse electrical resistivity monitoring of salt-affected soil and groundwater,” Water Resources Research, vol. 45, W07425, 2009.
 [4] J. Reynolds, An Introduction to Applied and Environmental Geophysics, 2nd ed. John Wiley & Sons, 2011.
 [5] J. Kim, M. Yi, Y. Song, S. Seol, and K. Kim, “Application of geophysical methods to the safety analysis of an earth dam,” Journal of Environmental and Engineering Geophysics, vol. 12, no. 2, pp. 221–235, 2007.
 [6] P. Wilkinson, J. Chambers, P. Meldrum, D. Gunn, R. Ogilvy, and O. Kuras, “Predicting the movements of permanently installed electrodes on an active landslide using time-lapse geoelectrical resistivity data only,” Geophysical Journal International, vol. 183, no. 2, pp. 543–556, 2010.
 [7] P. Sjodahl, T. Dahlin, and S. Johansson, “Using the resistivity method for leakage detection in a blind test at the Rossvatn embankment dam test facility in Norway,” Bulletin of Engineering Geology and the Environment, vol. 69, no. 4, pp. 643–658, 2010.
 [8] S. Park, “Fluid migration in the vadose zone from 3D inversion of resistivity monitoring data,” Geophysics, vol. 63, no. 1, pp. 41–51, 1998.
 [9] L. Slater and A. Binley, “Evaluation of permeable reactive barrier (PRB) integrity using electrical imaging methods,” Geophysics, vol. 68, no. 3, pp. 911–921, 2003.
 [10] D. Rucker, “A coupled electrical resistivity-infiltration model for wetting front evaluation,” Vadose Zone Journal, vol. 8, no. 2, pp. 383–388, 2009.
 [11] S. Cho, H. Jung, H. Lee, H. Rim, and S. K. Lee, “Real-time underwater object detection based on DC resistivity method,” IEEE Transactions on Geoscience and Remote Sensing, vol. 54, no. 11, pp. 6833–6842, 2016.
 [12] J. Legault, D. Carriere, and L. Petrie, “Synthetic model testing and distributed acquisition DC resistivity results over an unconformity uranium target from the Athabasca Basin, northern Saskatchewan,” The Leading Edge, vol. 27, no. 1, pp. 46–51, 2008.
 [13] J. Chambers, P. Wilkinson, D. Wardrop, A. Hameed, I. Hill, C. Jeffrey, M. Loke, P. Meldrum, O. Kuras, M. Cave, and D. Gunn, “Bedrock detection beneath river terrace deposits using three-dimensional electrical resistivity tomography,” Geomorphology, pp. 17–25, 2012.
 [14] X. Liu, F. Liu, J. Chen, Z. Zhao, A. Wang, and Z. Lu, “Resistivity logging through casing response of inclined fractured formation,” IEEE Transactions on Geoscience and Remote Sensing, vol. 56, no. 8, pp. 4919–4929, 2018.
 [15] C. Schwarzbach, R.-U. Börner, and K. Spitzer, “Two-dimensional inversion of direct current resistivity data using a parallel, multi-objective genetic algorithm,” Geophysical Journal International, pp. 685–695, 2005.
 [16] B. Liu, S. Li, L. Nie, J. Wang, X. L, and Q. Zhang, “3D resistivity inversion using an improved genetic algorithm based on control method of mutation direction,” Journal of Applied Geophysics, vol. 87, no. 12, pp. 1–8, 2012.
 [17] S. P. Sharma, “VFSARES - a very fast simulated annealing FORTRAN program for interpretation of 1-D DC resistivity sounding data from various electrode arrays,” Computers & Geosciences, pp. 177–188, 2012.
 [18] R. Shaw and S. Srivastava, “Particle swarm optimization: A new tool to invert geophysical data,” Geophysics, vol. 72, pp. 75–83, 2007.
 [19] G. El-Qady and K. Ushijima, “Inversion of DC resistivity data using neural networks,” Geophysical Prospecting, vol. 49, pp. 417–430, 2001.
 [20] A. Neyamadpour, W. W. Abdullah, S. Taib, and D. Niamadpour, “3D inversion of DC data using artificial neural networks,” Studia Geophysica et Geodaetica, vol. 54, pp. 465–485, 2010.
 [21] S. Maiti, V. C. Erram, G. Gupta, and R. K. Tiwari, “ANN based inversion of DC resistivity data for groundwater exploration in hard rock terrain of western Maharashtra (India),” Journal of Hydrology, 2012.
 [22] F. Jiang, L. Dong, and Q. Dai, “Electrical resistivity imaging inversion: An ISFLA trained kernel principal component wavelet neural network approach,” Neural Networks, vol. 104, pp. 114–123, 2018.
 [23] M. Al-Abri and N. Hilal, “Artificial neural network simulation of combined humic substance coagulation and membrane filtration,” Chemical Engineering Journal, vol. 141, pp. 27–34, 2008.

 [24] K. He, X. Zhang, S. Ren, and J. Sun, “Deep residual learning for image recognition,” in The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), June 2016.
 [25] A. L. Maas, A. Y. Hannun, and A. Y. Ng, “Rectifier nonlinearities improve neural network acoustic models,” in ICML Workshop on Deep Learning for Audio, Speech and Language Processing, 2013.
 [26] S. Ioffe and C. Szegedy, “Batch normalization: Accelerating deep network training by reducing internal covariate shift,” in Proceedings of the 32nd International Conference on Machine Learning, 2015.
 [27] Y. LeCun, Y. Bengio, and G. E. Hinton, “Deep learning,” Nature, vol. 521, no. 7553, pp. 436–444, 2015.
 [28] A. Krizhevsky, I. Sutskever, and G. E. Hinton, “ImageNet classification with deep convolutional neural networks,” in Advances in Neural Information Processing Systems 25. Curran Associates, Inc., 2012, pp. 1097–1105.
 [29] K. Simonyan and A. Zisserman, “Very deep convolutional networks for large-scale image recognition,” arXiv:1409.1556, 2014.
 [30] P. Jiang, F. Gu, Y. Wang, C. Tu, and B. Chen, “DifNet: Semantic segmentation by diffusion networks,” in Advances in Neural Information Processing Systems 31. Curran Associates, Inc., 2018.
 [31] C. Dong, C. C. Loy, K. He, and X. Tang, “Image super-resolution using deep convolutional networks,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 38, no. 2, pp. 295–307, 2016.
 [32] K. H. Jin, M. T. McCann, E. Froustey, and M. Unser, “Deep convolutional neural network for inverse problems in imaging,” IEEE Transactions on Image Processing, vol. 26, no. 9, pp. 4509–4522, 2017.
 [33] C. B. Choy, D. Xu, J. Gwak, K. Chen, and S. Savarese, “3D-R2N2: A unified approach for single and multi-view 3D object reconstruction,” in Proceedings of the European Conference on Computer Vision (ECCV), 2016.
 [34] M. Araya-Polo, J. Jennings, A. Adler, and T. Dahlke, “Deep-learning tomography,” The Leading Edge, vol. 37, no. 1, pp. 58–66, 2018.
 [35] Y. Wu, Y. Lin, and Z. Zhou, “InversionNet: Accurate and efficient seismic waveform inversion with convolutional neural networks,” in SEG Technical Program Expanded Abstracts 2018. Society of Exploration Geophysicists, 2018, pp. 2096–2100.
 [36] S. Li, B. Liu, Y. Ren, Y. Chen, S. Yang, Y. Wang, and P. Jiang, “Deep learning inversion of seismic data,” arXiv:1901.07733, 2019.
 [37] E. Maggiori, Y. Tarabalka, G. Charpiat, and P. Alliez, “Convolutional neural networks for large-scale remote-sensing image classification,” IEEE Transactions on Geoscience and Remote Sensing, vol. 55, no. 2, pp. 645–657, 2017.

 [38] G. Cheng, C. Yang, X. Yao, L. Guo, and J. Han, “When deep learning meets metric learning: Remote sensing image scene classification via learning discriminative CNNs,” IEEE Transactions on Geoscience and Remote Sensing, vol. 56, no. 5, pp. 2811–2821, 2018.
 [39] Q. Zhang, Q. Yuan, C. Zeng, X. Li, and Y. Wei, “Missing data reconstruction in remote sensing image with a unified spatial-temporal-spectral deep convolutional neural network,” IEEE Transactions on Geoscience and Remote Sensing, vol. 56, no. 8, pp. 4274–4288, 2018.
 [40] O. Ronneberger, P. Fischer, and T. Brox, “U-Net: Convolutional networks for biomedical image segmentation,” in Medical Image Computing and Computer-Assisted Intervention – MICCAI 2015, 2015.
 [41] K. Hornik, “Approximation capabilities of multilayer feedforward networks,” Neural Networks, vol. 4, no. 2, pp. 251–257, 1991.
 [42] W. Luo, Y. Li, R. Urtasun, and R. Zemel, “Understanding the effective receptive field in deep convolutional neural networks,” in Advances in Neural Information Processing Systems 29, 2016, pp. 4898–4906.
 [43] Y. Li and D. Oldenburg, “3D inversion of magnetic data,” Geophysics, vol. 61, no. 2, pp. 394–408, 1996.
 [44] Y. Li and D. Oldenburg, “3D inversion of gravity data,” Geophysics, vol. 63, no. 1, pp. 109–119, 1998.
 [45] S. Kang and D. Oldenburg, “On recovering distributed IP information from inductive source time domain electromagnetic data,” Geophysical Journal International, vol. 207, no. 1, pp. 174–196, 2016.
 [46] P. Qin, D. Huang, Y. Yuan, M. Geng, and J. Liu, “Integrated gravity and gravity gradient 3D inversion using the nonlinear conjugate gradient,” Journal of Applied Geophysics, vol. 126, pp. 52–73, 2016.
 [47] A. N. Tikhonov, A. V. Goncharsky, V. V. Stepanov, and A. G. Yagola, Numerical Methods for the Solution of Ill-posed Problems. Kluwer Academic Publishers, 1995.
 [48] B. Zhou and S. Greenhalgh, “Crosshole resistivity tomography using different electrode configurations,” Geophysical Prospecting, vol. 48, no. 5, pp. 887–912, 2000.
 [49] Y. Sasaki, “Resolution of resistivity tomography inferred from numerical simulation,” Geophysical Prospecting, vol. 40, pp. 453–464, 1992.
 [50] S. Szalai and L. Szarka, “On the classification of surface geoelectric arrays,” Geophysical Prospecting, vol. 56, no. 8, pp. 4274–4288, 2008.