1 Introduction
Geophysical properties (such as velocity, impedance, and density) play an important role in various subsurface applications including subsurface energy exploration, carbon capture and sequestration, estimating pathways of subsurface contaminant transport, and earthquake early warning systems to provide critical alerts. These properties can be obtained via seismic surveys, i.e., receiving reflected/refracted seismic waves generated by a controlled source. This paper focuses on reconstructing subsurface velocity maps from seismic measurements. Mathematically, the velocity map and seismic measurements are correlated through an acousticwave equation (a secondorder partial differential equation) as follows:
(1) 
where denotes the pressure wavefield at spatial location and time , represents the velocity map of wave propagation, and is the source term. Full Waveform Inversion (FWI) is a methodology that determines highresolution velocity maps of the subsurface via matching synthetic seismic waveforms to raw recorded seismic data , where represents the locations of the seismic receivers.
A velocity map describes the wave propagation speed in the subsurface region of interest. An example in 2D scenario is shown in Figure 1(a). Particularly, the xaxis represents the horizontal offset of a region, and the yaxis stands for the depth. The regions with the same geologic information (velocity) are called a layer in velocity maps. In a sample of seismic measurements (termed a shot gather in geophysics) as depicted in Figure 1(b), each grid in the xaxis represents a receiver, and the value in the yaxis is a 1D timeseries signal recorded by each receiver.
Existing approaches solve FWI in two directions: physicsdriven and datadriven. Physicsdriven approaches rely on the forward modeling of Equation 1
, which simulates seismic data from velocity map by finite difference. They optimize velocity map per seismic sample, by iteratively updating velocity map from an initial guess such that simulated seismic data (after forward modeling) is close to the input seismic measurements. However, these methods are slow and difficult to scale up as the iterative optimization is required per input sample. Datadriven approaches consider FWI problem as an imagetoimage translation task and apply convolution neural networks (CNN) to learn the mapping from seismic data to velocity maps
(Wu and Lin, 2019). The limitation of these methods is that they require paired seismic data and velocity maps to train the network. Such ground truth velocity maps are hardly accessible in realworld scenario because generating them is extremely timeconsuming even for domain experts.In this work, we leverage advantages of both directions (physics + data driven) and shift the paradigm to unsupervised learning of FWI by connecting forward modeling and CNN in a loop. Specifically, as shown in Figure 1, a CNN is trained to predict a velocity map from seismic data, which is followed by forward modeling to reconstruct seismic data. The loop is closed by applying reconstruction loss on seismic data to train the CNN. Due to the differentiable forward modeling, the whole loop can be trained endtoend. Note that the CNN is trained in an unsupervised manner, as the ground truth of velocity map is not needed. We name our unsupervised approach as UPFWI (Unsupervised Physicalinformed Full Waveform Inversion).
Additionally, we find that perceptual loss (Johnson et al., 2016) is crucial to improve the overall quality of predicted velocity maps due to its superior capability in preserving the coherence of the reconstructed waveforms comparing with other losses like Mean Squared Error (MSE) and Mean Absolute Error (MAE).
To encourage fair comparison on a large dataset with more complicate geological structures, we introduce a new dataset named OpenFWI, which contains 60,000 labeled data (velocity map and seismic data pairs) and 48,000 unlabeled data (seismic data alone). 30,000 of those velocity maps contain curved layers that are more challenge for inversion. We also add geological faults with various shift distances and tilting angles to all velocity maps.
We evaluate our method on this large dataset. Experimental results show that for velocity maps with flat layers, our UPFWI trained with 48,000 unlabeled data achieves 1146.09 in MSE, which is 26.77% smaller than that of the supervised method, and 0.9895 in Structured Similarity (SSIM), which is 0.0021 higher than the score of the supervised method; for velocity maps with curved layers, our UPFWI achieves 3639.96 in MSE, which is 28.30% smaller than that of supervised method, and 0.9756 in SSIM, which is 0.0057 higher than the score of the supervised method.
Our contribution is summarized as follows:

We propose to solve FWI in an unsupervised manner by connecting CNN and forward modeling in a loop, enabling endtoend learning from seismic data alone.

We find that perceptual loss is helpful to boost the performance comparable to the supervised counterpart.

We introduce a largescale dataset as benchmark to encourage further research on FWI.
2 Preliminaries of Full Waveform Inversion (FWI)
The goal of FWI in geophysics is to invert for a velocity map from seismic measurements , where and denote the horizontal and vertical dimensions of the velocity map, is the number of sources that are used to generate waves during data acquisition process, denotes the number of samples in the wavefields recorded by each receiver, and represents the total number of receivers.
In conventional physicsdriven methods, forward modeling is commonly referred to the process of simulating seismic data from given estimated velocity maps . For simplicity, the forward acousticwave operator can be expressed as
(2) 
Given this forward operator , the physicsdriven FWI can be posed as a minimization problem (Virieux and Operto, 2009)
(3) 
where is the the distance between true seismic measurements and the corresponding simulated data , is a regularization parameter and is the regularization term which is often the or norm of . This requires optimization per sample, which is slow as the optimization involves multiple iterations from an initial guess.
Datadriven methods leverage convolutional neural networks to directly learn the inverse mapping as (Adler et al., 2021)
(4) 
where is the approximated inverse operator of parameterized by . In practice, is usually implemented as a convolutional neural network (Adler et al., 2021; Wu and Lin, 2019)
. This requires paired seismic data and velocity maps for supervised learning. However, the acquisition of large volume of velocity maps in field applications can be extremely challenging and computationally prohibitive.
3 Method
In this section, we present our Unsupervised Physicsinformed solution (named UPFWI), which connects CNN and forward modeling in a loop. It addresses limitations of both physicsdriven and datadriven approaches, as it requires neither optimization at inference (per sample), nor velocity maps as supervision.
3.1 UPFWI: Connecting CNN and Forward Modeling
As depicted in Figure 1, our UPFWI connects a CNN and a differentiable forward operator to form a loop. In particular, the CNN takes seismic measurements as input and generates the corresponding velocity map . We then apply forward acousticwave operator (see Equation 2) on the estimated velocity map to reconstruct the seismic data . Typically, the forward modeling employs finite difference (FD) to discretize the wave equation (Equation 1). The details of forward modeling will be discussed in the subsection 3.3. The loop is closed by the reconstruction loss between input seismic data and the reconstructed seismic data . Notice that the ground truth of velocity maps is not involved, and the training process is unsupervised
. Since the forward operator is differentiable, the reconstruction loss can be backpropagated (via gradient descent) to update the parameters
in the CNN.3.2 CNN Network Architecture
We use an encoderdecoder structured CNN (similar to Wu and Lin (2019) and Zhang and Lin (2020)) to model the mapping from seismic data to velocity map
. The encoder compresses the seismic input and then transforms the latent vector to build the velocity estimation through a decoder. Since the number of receivers
and the number of timesteps in seismic measurements are unbalanced (), we first stack a 71 and six 31 convolutional layers (with stride 2 every the other layer to reduce dimension) to extract temporal features until the temporal dimension is close to
. Then, six 33 convolutional layers are followed to extract spatialtemporal features. The resolution is downsampled every the other layer by using stride 2. Next, the feature map is flattened and a fully connected layer is applied to generate the latent feature with dimension 512. The decoder first repeats the latent vector by 25 times to generate a 55512 tensor. Then it is followed by five 3
3 convolutional layers with nearest neighbor upsampling in between, resulting in a feature map with size 808032. Finally, we centercrop the feature map (7070) and apply a 33 convolution layer to output a single channel velocity map. All the aforementioned convolutional and upsampling layers are followed by a batch normalization
(Ioffe and Szegedy, 2015)and a leaky ReLU
(Nair and Hinton, 2010)3.3 Differentiable Forward Modeling
We apply the standard finite difference (FD) in the space domain and time domain to discretize the original wave equation. Specifically, the secondorder central finite difference in time domain ( in Equation 1) is approximated as follows:
(5) 
where denotes the pressure wavefields at timestep , and and are the wavefields at and , respectively. The Laplacian of can be estimated in the similar way on the space domain (see Appendix). Therefore, the wave equation can then be written as
(6) 
where here denotes the discrete Laplace operator.
The initial wavefield at the timestep 0 is set zero (i.e. ). Thus, the gradient of loss with respect to estimated velocity at spatial location
can be computed using the chain rule as
(7) 
where indicates the length of the sequence.
3.4 Loss Function
The reconstruction loss of our UPFWI includes a pixelwise loss and a perceptual loss as follows:
(8) 
where and are input and reconstructed seismic data, respectively. The pixelwise loss combines and distance as:
(9) 
where and are two hyperparameters to control the relative importance. For the perceptual loss , we extract features from conv5 in a VGG16 network (Simonyan and Zisserman, 2015)
pretrained on ImageNet
(Krizhevsky et al., 2012) and combine the and distance as:(10) 
where represents the output of conv5 in the VGG16 network, and and are two hyperparameters. Compared to the pixelwise loss, the perceptual loss is better to capture the regionwise structure, which reflects the waveform coherence. This is crucial to boost the overall accuracy of velocity map (e.g. the quantitative velocity values and the structural information).
4 OpenFWI Dataset
We introduce a new largescale geophysics FWI dataset OpenFWI, which consists of 108K seismic data for two types of velocity maps: one with flat layers (named FlatFault) and the other one with curved layers (named CurvedFault). Each type has 54K seismic data, including 30K with paired velocity maps (labeled) and 24K unlabeled. The 30K labeled pairs of seismic data and velocity maps are splitted as 24K/3K/3K for training, validation and testing respectively. Samples are shown in Appendix.
The shape of curves in our dataset follows a sine function. Velocity maps in CurvedFault are designed to validate the effectiveness of FWI methods on curved topography. Compared to the maps with flat layers, curved velocity maps yield much more irregular geological structures, making inversion more challenging. Both FlatFault and CurvedFault contain 30,000 samples with 2 to 4 layers and their corresponding seismic data. Each velocity map has dimensions of 70
70, and the grid size is 15 meter in both directions. The layer thickness ranges from 15 grids to 35 grids, and the velocity in each layer is randomly sampled from a uniform distribution between 3,000 meter/second and 6,000 meter/second. The velocity is designed to increase with depth to be more physically realistic. We also add geological faults to every velocity map. The faults shift from 10 grids to 20 grids, and the tilting angle ranges from 123
to 123.To synthesize the seismic data, five sources are evenly distributed on the surface with a spacing of 255 meter, and the seismic traces are recorded by 70 receivers positioned at each grid with an interval of 15 meter. The source is a Ricker wavelet with a central frequency of 25 Hz Wang (2015). Each receiver records timeseries data for 1 second, and we use a 1 millisecond sample rate to generate 1,000 timesteps. Therefore, the dimensions of seismic data become 5100070. Compared to the existing datasets (Yang and Ma, 2019; Moseley et al., 2020), OpenFWI is significantly larger. It includes more complicated and physically realistic velocity maps. We hope it establishes a more challenging benchmark for the community.
5 Experiments
In this section, we present experimental results of our proposed UPFWI evaluated on the OpenFWI dataset. We also discuss different factors that affect the performance of our method.
5.1 Implementation Details
Training Details: The input seismic data are normalized to range [1, 1]. We employ AdamW (Loshchilov and Hutter, 2018) optimizer with momentum parameters , and a weight decay of to update all parameters of the network. The initial learning rate is set to be , and we reduce the learning rate by a factor of 10 when validation loss reaches a plateau. The minimum learning rate is set to be . The size of minibatch is set to be 16. All tradeoff hyperparameters
in our loss function are set to be 1. We implement our models in Pytorch and train them on 8 NVIDIA Tesla V100 GPUs. All models are randomly initialized.
Evaluation Metrics: We consider three metrics for evaluating the velocity maps inverted by our method: MAE, MSE and Structural Similarity (SSIM). Both MAE and MSE have been employed in the existing methods (Wu and Lin, 2019; Zhang and Lin, 2020) to measure the pixelwise error. Considering that the layeredstructured velocity maps contain highly structured information, degradation or distortion in velocity maps can be easily perceived by a human. To better align with human vision, we employ SSIM to measure the perceptual similarity. It is important to note that for MAE and MSE calculation, we denormalize velocity maps to their original scale while we keep them in normalized scale [1, 1] for SSIM according to the algorithm.
Comparison: We compare our method with three stateoftheart algorithms: two pure datadriven methods, i.e., InversionNet (Wu and Lin, 2019) and VelocityGAN (Zhang and Lin, 2020), and a physicsinformed method HPGNN (Sun et al., 2021). We follow the implementation described in these papers and search for the best hyperparameters for OpenFWI dataset. Note that we improve HPGNN by replacing the network architecture with the CNN in our UPFWI and adding perceptual loss, resulting in a significant boosted performance. We refer our implementation as HPGNN+, which is a strong supervised baseline. Our method has two variants (UPFWI24K and UPFWI48K), using 24K and 48K unlabeled seismic data respectively.
5.2 Main Results
Dataset  Supervision  Method  MAE  MSE  SSIM 
FlatFault  Supervised  InversionNet  15.83  2156.00  0.9832 
VelocityGAN  16.15  1770.31  0.9857  
HPGNN+ (our implementation)  12.91  1565.02  0.9874  
Unsupervised  UPFWI24K (ours)  16.27  1705.35  0.9866  
UPFWI48K (ours)  14.60  1146.09  0.9895  
CurvedFault  Supervised  InversionNet  23.77  5285.38  0.9681 
VelocityGAN  25.83  5076.79  0.9699  
HPGNN+ (our implementation)  24.19  5139.60  0.9685  
Unsupervised  UPFWI24K (ours)  29.59  5712.25  0.9652  
UPFWI48K (ours)  23.56  3639.96  0.9756 
Results on FlatFault: Table 1 shows the results of different methods on FlatFault. Compared to datadriven InversionNet and VelocityGAN, our UPFWI24K performs better in MSE and SSIM, but is slightly worse in MAE score. Compared to the physicsinformed HPGNN+, there is a gap between our UPFWI24K and HPGNN+ when trained with the same amount of data. However, after we double the size of unlabeled data (from 24K to 48K), a significant improvement is observed in our UPFWI48K for all three metrics, and it outperforms all three supervised baselines in MSE and SSIM. This demonstrates the potential of our UPFWI for achieving higher performance with more unlabeled data involved.
The velocity maps inverted by different methods are shown in Figure 4. Consistent with our quantitative analysis, more accurate details are observed in the velocity maps generated by UPFWI48K. For instance, in the first row of Figure 4, although all models somehow reveal the small geophysical fault near to the right boundary of the velocity map, only UPFWI48K reconstructs a clear interface between layers as highlighted by the yellow square. In the second row, we find both InversionNet and VelocityGAN generate blurry results in deep region, while HPGNN+, UPFWI24K and UPFWI48K yield much clearer boundaries. We attribute this finding as the impact of seismic loss. We further observe that the slope of the fault in deep region is different from that in the shallow region, yet only UPFWI48K replicates this result as highlighted by the green square.
Results on CurvedFault Table 1 shows the results of CurvedFault. Performance degradation is observed for all models, due to the more complicated geological structures in CurvedFault. Although our UPFWI24K underperforms the three supervised baselines, our UPFWI48K significantly boosts the performance, outperforming all supervised methods in terms of all three metrics. This demonstrates the power of unsupervised learning in our UPFWI that greatly benefits from more unlabeled data when dealing with more complicated curved structure.
Figure 5 shows the visualized velocity maps in CurvedFault obtained using different methods. Similar to the observation in FlatFault, our UPFWI48K yields more accurate details compared to the results of supervised methods. For instance, in the first row, only our UPFWI24K and UPFWI48K precisely reconstruct the fault beneath the curve around the topleft corner as highlighted by the yellow square. Although some artifacts are observed in the results of UPFWI24K around the layer boundary in deep region, they are eliminated in the results of UPFWI48K. As for the example in the second row, it is obvious that the shape of geological anomalies in the shallow region is best reconstructed by our UPFWI24K and UPFWI48K as highlighted by the red square. More visualization results are shown in the Appendix.
5.3 Ablation Study
Below we study the contribution of different loss functions: (a) pixelwise distance (MSE), (b) pixelwise distance (MAE), and (c) perceptual loss. All experiments are conducted on FlatFault using 24,000 unlabeled data.
Figure 5(a) shows the predicted velocity maps for using three loss combinations (pixel, pixel, pixel+perceptual) in UPFWI. The ground truth seismic data and velocity map are shown in the left column. For each loss option, we show the difference between the reconstructed and the input seismic data (on the top) and predicted velocity (on the bottom). When using pixelwise loss in distance alone, there are some obvious artifacts in both seismic data (around 600 millisecond) and velocity map. These artifacts are mitigated by introducing additional pixelwise loss in distance. With perceptual loss added, more details are correctly retained (e.g. seismic data from 400 millisecond to 600 millisecond, velocity boundary between layers). Figure 5(b) compares the reconstructed seismic data (in terms of residual to the ground truth) at a slice of 525 meter offset (orange dash line in Figure 5(a)). Clearly, the combination of pixelwise and perceptual loss has the smallest residual.
The quantitative results are shown in Table 2. They are consistent with the observation in qualitative analysis (Figure 5(a)). In particular, using pixelwise loss in distance has the worst performance. The involvement of distance mitigates all velocity errors but is slightly worse on MSE and SSIM of seismic error. Adding perceptual loss further boosts the performance in all performance metrics by a clear margin. This shows that perceptual loss is helpful to retain waveform coherence, which is correlated to the velocity boundary, and validates our proposed loss function (combining pixelwise and perceptual loss).
Loss  Velocity Error  Seismic Error  

pixel  pixel  perceptual  MAE  MSE  SSIM  MAE  MSE  SSIM 
✓  32.61  10014.47  0.9735  0.0167  0.0023  0.9978  
✓  ✓  21.71  2999.55  0.9775  0.0155  0.0025  0.9977  
✓  ✓  ✓  16.27  1705.35  0.9866  0.0140  0.0021  0.9984 
6 Discussion
Our UPFWI has two major limitations. Firstly, it needs further improvement on a small number of challenging velocity maps where adjacent layers have very close velocity values. We find that the lack of supervision is not the cause as our UPFWI can yield comparable or even better results compared to its supervised counterparts. Another limitation is the speed and memory consumption for forward modeling, as the gradient of finite difference (see Equation 6) need to be stored for backpropagation. We will explore different loss functions (e.g. adversarial loss) and the methods that can balance the requirement of computation resources and the accuracy in the future work. We believe the idea of connecting CNN and PDE to solve full waveform inversion has potential to be applied to other inverse problems with a governing PDE such as medical imaging and flow estimation.
7 Related Work
Physicsdriven Methods There are two primary physicsdriven methods, depending on the complexity of the forward model. The simpler one is via travel time inversion (Tarantola, 2005), which has a linear forward operator, but provides results of inferior accuracy (Lin et al., 2015). FWI techniques (Virieux and Operto, 2009), being the other one, provide superior solutions by modeling the wave propagation, but the forward operator is nonlinear and computationally expensive. Furthermore the problem is illposed (Virieux and Operto, 2009), making a prior model of the solution space essential. Since regularized FWI solved via iterative techniques need to apply the forward model many times, these solutions are very computationally expensive. In addition, existing regularized FWI methods employ relatively simplistic models of the solution space (Hu et al., 2009; Burstedde and Ghattas, 2009; Ramírez and Lewis, 2010; Lin and Huang, 2017, 2015b, 2015a; Guitton, 2012; Treister and Haber, 2016), leaving considerable room for improvement in the accuracy of the solutions. Another common approach to alleviate the illposedness and nonlinearity of FWI is via multiscale techniques (Bunks et al., 1995; Boonyasiriwat et al., 2009; Feng and Schuster, 2019). Rather than matching the seismic data all at once, the multiscale techniques decompose the data into different frequency bands so that the lowfrequency components will be updated first and then followed with higher frequency components.
Datadriven Methods
Recently, a new type of methods has been developed based on deep learning.
ArayaPolo et al. (2018) proposed a model based on a fully connected network. Wu and Lin (2019) further converted FWI into an imagetoimage translation task with an encoderdecoder structure that can handle more complex velocity maps. Zhang and Lin (2020)adopted GAN and transfer learning to improve the generalization.
Li et al. (2020) designed SeisInvNet to solve the misaligned issue when dealing sources from different locations. In Yang and Ma (2019), a UNet architecture was proposed with skip connections. Feng et al. (2021) proposed a datadriven multiscale framework by considering different frequency components. RojasGómez et al. (2020) developed an adaptive data augmentation method to improve the generalization. Ren et al. (2020) combined the datadriven and physicsbased methods and proposed HPGNN model. Some similar ideas were developed on different physical forward models. Wang et al. (2020) proposed a model by connecting two CNNs approximating the forward model and inversion process, and their model was tested on welllogging data. Alfarraj and AlRegib (2019) utilized the forward model to constrain the training of convolutional and recurrent neural layers to invert welllogging seismic data for elastic impedance. All of those aforementioned works were developed based on supervised learning. Biswas et al. (2019) designed an unsupervised CNN to estimate subsurface reflectivity using prestack seismic angle gather. Comparing to FWI, their problem is simpler because of the approximated and linearized forward model.8 Conclusion
In this study, we introduce an unsupervised method named UPFWI to solve FWI by connecting CNN and forward modeling in a loop. Our method can learn the inverse mapping from seismic data alone in an endtoend manner. We demonstrate through a series of experiments that our UPFWI trained with sufficient amount of unlabeled data outperforms the supervised counterpart on our dataset to be released. The ablation study further substantiates that perceptual loss is a critical component in our loss function and has a great contribution to the performance of our UPFWI.
References
 Deep learning for seismic inverse problems: toward the acceleration of geophysical analysis workflows. IEEE Signal Processing Magazine 38 (2), pp. 89–119. Cited by: §2.
 Semisupervised sequence modeling for elastic impedance inversion. Interpretation 7 (3), pp. SE237–SE249. Cited by: §7.
 Deeplearning tomography. The Leading Edge 37 (1), pp. 58–66. Cited by: §7.
 Prestack and poststack inversion using a physicsguided convolutional neural network. Interpretation 7 (3), pp. SE161–SE174. Cited by: §7.
 An efficient multiscale method for timedomain waveform tomography. Geophysics 74 (6), pp. WCC59–WCC68. Cited by: §7.
 Multiscale seismic waveform inversion. Geophysics 60 (5), pp. 1457–1473. Cited by: §7.
 Algorithmic strategies for full waveform inversion: 1D experiments. Geophysics 74 (6), pp. 37–46. Cited by: §7.
 Application of the perfectly matched absorbing layer model to the linear elastodynamic problem in anisotropic heterogeneous media. Geophysics 66 (1), pp. 294–307. Cited by: §A.1.
 Multiscale datadriven seismic fullwaveform inversion with field data study. IEEE Transactions on Geoscience and Remote Sensing (), pp. 1–14. External Links: Document Cited by: §7.
 Transmission+ reflection anisotropic waveequation traveltime and waveform inversion. Geophysical Prospecting 67 (2), pp. 423–442. Cited by: §7.
 Blocky regularization schemes for full waveform inversion. Geophysical Prospecting 60, pp. 870–884. Cited by: §7.
 Simultaneous multifrequency inversion of fullwaveform seismic data. Geophysics 74 (2), pp. 1–14. Cited by: §7.

Batch normalization: accelerating deep network training by reducing internal covariate shift.
In
International Conference on Machine Learning
, pp. 448–456. Cited by: §3.2. 
Perceptual losses for realtime style transfer and superresolution
. InEuropean Conference on Computer Vision
, pp. 694–711. Cited by: §1.  ImageNet classification with deep convolutional neural networks. Advances in Neural Information Processing Systems 25, pp. 1097–1105. Cited by: §3.4.
 Deeplearning inversion of seismic data. IEEE Transactions on Geoscience and Remote Sensing 58 (3), pp. 2135–2149. External Links: Document Cited by: §7.
 Acoustic and elasticwaveform inversion using a modified TotalVariation regularization scheme. Geophysical Journal International 200 (1), pp. 489–502. External Links: Document Cited by: §7.
 Quantifying subsurface geophysical properties changes using doubledifference seismicwaveform inversion with a modified TotalVariation regularization scheme. Geophysical Journal International 203 (3), pp. 2125–2149. External Links: Document Cited by: §7.
 Building subsurface velocity models with sharp interfaces using interfaceguided seismic fullwaveform inversion. Pure and Applied Geophysics 174 (11), pp. 4035–4055. External Links: Document Cited by: §7.
 Doubledifference traveltime tomography with edgepreserving regularization and a priori interfaces. Geophysical Journal International 201 (2), pp. 574. External Links: Document Cited by: §7.
 Decoupled weight decay regularization. In International Conference on Learning Representations, Cited by: §5.1.
 Deep learning for fast simulation of seismic waves in complex media. Solid Earth 11 (4), pp. 1527–1549. Cited by: §4.
 Rectified linear units improve restricted Boltzmann machines. In Proceedings of the 27th International Conference on Machine Learning (ICML10), pp. 807–814. Cited by: §3.2.
 Regularization and fullwaveform inversion: a twostep approach. In 80th Annual International Meeting, SEG, Expanded Abstracts, pp. 2773–2778. Cited by: §7.
 A physicsbased neuralnetwork way to perform seismic full waveform inversion. IEEE Access 8, pp. 112266–112277. Cited by: §7.
 Physicsconsistent datadriven waveform inversion with adaptive data augmentation. IEEE Geoscience and Remote Sensing Letters. Cited by: §7.
 Very deep convolutional networks for largescale image recognition. In International Conference on Learning Representations, Cited by: §3.4.
 Physicsguided deep learning for seismic inversion with hybrid training and uncertainty analysis. Geophysics 86 (3), pp. R303–R317. Cited by: Figure 3, §5.1.
 Inverse problem theory and methods for model parameter estimation. SIAM. Cited by: §7.
 Full waveform inversion guided by travel time tomography. SIAM Journal on Scientific Computing 39, pp. S587–S609. Cited by: §7.
 An overview of fullwaveform inversion in exploration geophysics. Geophysics 74 (6), pp. WCC1–WCC26. Cited by: §2, §7.
 Frequencies of the Ricker wavelet. Geophysics 80 (2), pp. A31–A37. Cited by: §4.
 Welllogging constrained seismic inversion based on closedloop convolutional neural network. IEEE Transactions on Geoscience and Remote Sensing 58 (8), pp. 5564–5574. Cited by: §7.
 InversionNet: an efficient and accurate datadriven full waveform inversion. IEEE Transactions on Computational Imaging 6 (1), pp. 419–433. Cited by: §1, §2, §3.2, §5.1, §5.1, §7.
 Deeplearning inversion: a nextgeneration seismic velocity model building method. Geophysics 84 (4), pp. R583–R599. Cited by: §4, §7.
 Datadriven seismic waveform inversion: a study on the robustness and generalization. IEEE Transactions on Geoscience and Remote Sensing 58, pp. 6900–6913. Cited by: §3.2, §5.1, §5.1, §7.
Appendix A Appendix
a.1 Derivation of Forward Modeling in Practice
Similar to the finite difference in time domain, in 2D situation, by applying the fourthorder central finite difference in space, the Laplacian of can be discretized as
(11) 
where , , , , and and stand for the horizontal offset and the depth of a 2D velocity map, respectively. For convenience, we assume that the vertical grid spacing is identical to the horizontal grid spacing .
During the simulation of the forward modeling, the boundaries of the velocity maps should be carefully handled because they may cause reflection artifacts that interfere with the desired waves. One of the standard methods to reduce the boundary effects is to add absorbing layers around the original velocity map. Waves are trapped and attenuated by a damping parameter when propagating through those absorbing layers. Here, we follow Collino and Tsogka (2001) and implement the damping parameter as
(13) 
where denotes the overall thickness of absorbing layers, indicates the distance between the current position and the closest boundary of the original velocity map, and is the theoretical reflection coefficient chosen to be . With absorbing layers added, Equation 6 can be ultimately written as
(14) 