The seismic reflection method is paramount for the location of possible hydrocarbon accumulation in the subsurface. Besides providing structural imaging and high area coverage in a short time, this geophysical method, in conjunction with additional data, provides invaluable information about rock properties, fluid content and lithology.
In this context, using the knowledge provided by the seismic method, one may evaluate not only the location of possible reservoirs but also the economic viability with reasonable accuracy, therefore reducing risk and potential losses. However, the interpretation procedure of the seismic data is a human-intensive and time-consuming task, performed by geoscientists who are continually dealing with tight deadlines and the ever-increasing size of datasets.
In this scenario, researchers have proposed the application of computer-aided systems to assist geoscientists in several tasks involving seismic interpretation. For example, [2, 3, 4, 5] aim at the identification of specific structures on seismic images and [6, 7] propose techniques to automate part of the seismic facies analysis process.
Other domains facing similar problems are using neural networks and machine/deep learning techniques with great success to support tasks that deal with high volumes of data and are considered human-centered, for instance, image classification and segmentation [8, 9], and object detection and recognition [10, 11]. Nevertheless, these methods require datasets with a sufficient amount of data for the training, testing, and validation of the proposed methodology. In the machine learning community, it is common practice to make these datasets publicly available. Some examples are MNIST (60k images), PASCAL-VOC (40k images), MS-COCO (330k images), and ImageNet (14 million images). These datasets allow researchers to compare their advances, extend the state of the art, and find new possible applications.
Although we have seen an increasing interest in machine learning in geosciences, to the best of our knowledge, there is no public labeled dataset targeting the seismic interpretation task. For this reason, we propose a new dataset to be made publicly available. The Penobscot interpretation dataset consists of 7 horizons and more than 100,000 labeled seismic images derived from the Penobscot 3D seismic data, already in the public domain. The seismic lines were segmented into different portions based on their seismic facies. The proposed dataset has already been used in some applications, for example, the works of Chevitarese et al. [17, 18, 19], which will be discussed in Section VI.
The present paper is organized as follows: in the next section we describe the regional geology where the Penobscot survey lies. In Sections III and IV we present the Penobscot 3D seismic dataset and our interpretation procedure. Section V presents the proposed dataset and describes its main characteristics. Section VI briefly discusses two works in which the proposed dataset was employed, and we conclude our work in Section VII.
II Geological Settings
The Penobscot 3D dataset was acquired in the Scotian Basin, located on the Scotian Shelf, offshore Nova Scotia, Canada (Figure 1). The basin was formed during the break-up of Pangaea (the separation of the North American and African plates) and covers an area of 300,000 km² with a maximum sediment thickness of 18 km. The rifting process that took place from the Triassic to the Early Jurassic developed several sub-basins (Shelburne, Sable, Abenaki, South Whale, and Orpheus Graben) and, later in a passive margin configuration, two plateaus (Banquereau and La Have) [21, 22].
During the break-up phase, which began in the Middle Triassic, the main infill of the several interconnected sub-basins was fluvial red bed sediments, along with volcanic rocks associated with the rifting process. In the Late Triassic, a shallow marine environment was established, with the development of the Eurydice Formation, primarily formed by siliciclastic and carbonate sediments. Favorable climatic conditions promoted the evaporation of the marine waters, depositing salt layers that correspond to the Argo Formation. Subsequently, through the Late Triassic and Early Jurassic, the rifting process continued until the Break-Up Unconformity and the onset of the proto-ocean.
Following the Jurassic succession lithostratigraphy, a shallow marine environment allowed the deposition of tidally influenced dolomites with anhydrites and siliciclastics. Afterwards, the Mohican Formation sediments were deposited, which comprise muds and shales derived from a fluviomarine environment. The formation is overlain by the Abenaki Formation, deposited in the Jurassic–Early Cretaceous during the spreading of the sea floor. This formation consists of mudstones and thick carbonate beds, predominantly limestones and dolomites, the latter resulting from a carbonate platform along the basin margin.
In the Late Jurassic, the Mic-Mac Formation, along with the Verrill Canyon Formation and the Mohawk Formation, was deposited. These formations primarily consist of sands interbedded with shales and intercalated with carbonates, marking the initial phase of uplift and delta progradation. In the Early Cretaceous, the Scotian Basin underwent a marine regression phase, resulting in the progradation and deposition of thick fluvio-deltaic sediments of the Mississauga Formation. Subsequently, thick shale packages with sand beds were deposited due to intense deltaic sedimentation in a transgression phase, forming the Logan Canyon Formation. Still in the retrogradation phase, the Dawson Canyon Formation primarily consists of deep marine shale deposits and some limestones located across the Scotian Shelf. This sedimentation is calcareous or marly at the top, becoming shaley and silty towards its base.
The cessation of the deltaic sedimentation during the Cretaceous allowed the deposition of the Wyandot Formation. This deposit is the most distinctive and recognized lithologic unit on the Scotian Shelf, consisting of chalky carbonates grading from pure chalk to marl. Overlying the Wyandot Formation, the Banquereau Formation consists of Tertiary marine shelf mudstones, shelf sands, and conglomerates. This formation has a varying thickness, reaching more than 4 km beneath the continental slope, due to halokinesis.
The overview of the Scotian Basin lithostratigraphy and additional information about eustatic variation and possible hydrocarbon reservoirs are displayed in Figure 2.
III Seismic Data
The seismic data used for the generation of the proposed dataset is a public 3D seismic survey called Penobscot 3D, contributed by the Nova Scotia Department of Energy and the Canada Nova Scotia Offshore Petroleum Board, and managed by the dGB Earth Sciences Open Seismic Repository. The dataset consists of 87 km² of time-migrated 3D seismic data, with 601 inlines and 482 crosslines, located offshore Nova Scotia, Canada (Figure 1).
The seismic data cover a time range of 6,000 ms with a sampling interval of 2 ms. The signal has low resolution below 3,000 ms (approximately 5 km) and follows the SEG standard polarity. The acquisition parameters are a 12.5 m × 25 m bin size (inline × crossline) with 60-fold coverage. Along with the 3D seismic data, the repository also provides two wells, L-30 and B-41 (with some markers and no geophysical logs), pre-stack gathers, 2D seismic data, stacking velocities, and 5 interpreted horizons.
IV Seismic Interpretation
The Penobscot 3D seismic dataset was imported into OpendTect and then interpreted by two geoscientists. It is important to note that although other data are available in the repository, only the 3D seismic data were used to produce the interpretation. The interpretation was performed disregarding the horizons provided by the Open Seismic Repository, since they sometimes comprise more than one significant texture, which could hinder the performance of the machine learning algorithms.
Seven horizons were interpreted: H1, H2, H3, H4, H5, H6, and H7, numbered from the deepest to the shallowest. They divide the seismic cube into eight intervals with different pattern configurations. Figure 3 shows the 7 interpreted horizons along with two seismic lines. We emphasize that these horizons do not necessarily have a direct relationship with the geological settings. This means that the surfaces may not correspond to the top of formations or stratal interfaces. Four faults were also interpreted to assist the horizon interpretation. For the purposes of machine/deep learning, horizon surfaces were created to separate different seismic facies intervals.
The analysis of seismic facies consists of the identification of seismic reflection parameters, based primarily on configuration patterns that indicate geological factors such as lithology, stratification, and depositional systems. In the following list we briefly explain the seismic facies of each of the horizon intervals based on the amplitude and continuity of reflectors.
H1: the facies unit below H1 is characterized primarily by parallel, concordant, high-amplitude reflectors. It is also possible to identify chaotic reflectors, but that may be a consequence of the decrease of seismic frequency with depth.
H2-H1: the facies unit is characterized by parallel to subparallel, continuous, high-amplitude reflectors. Parallel/subparallel configuration reflects the uniform deposition of fluvio-deltaic sediments of the Mississauga Formation.
H3-H2: facies unit containing parallel to subparallel reflectors, like the previous interval, but less continuous.
H4-H3: reflectors below this horizon are continuous but have low amplitude, which makes it difficult to identify them. This is expected since the sedimentary package consists of deep marine shales and limestone showing little lithological contrast.
H5-H4: reflectors are predominantly subparallel and present varying amplitude.
H6-H5: the package consists mostly of parallel, high-amplitude reflectors. A facies unit with chaotic seismic reflectors is also noticed and may be associated with marine slump deposits.
H7-H6: the facies unit is composed of high-amplitude reflectors. Although most of the reflectors are continuous, some have dipping angles and others are truncated, evidencing a high-energy environment.
V Penobscot Interpretation Dataset
The Penobscot interpretation dataset consists primarily of 7 interpreted horizons in XYZ format and 2,166 images (1,083 seismic lines in TIFF format and 1,083 labeled images in PNG format). To create the labeled images, we took the intersection between the horizon surfaces and each seismic line and labeled the pixels from 0 to 7, following each horizon interval. Figure 4 shows a pair of an inline image (cropped in the figure) and its respective labels. In this paper, we present two applications: a classification and a semantic segmentation of seismic images. For the user’s convenience, we provide the image tiles used in the classification task along with the dataset. The Penobscot interpretation dataset is available at: https://doi.org/10.5281/zenodo.1324463.
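The labeling step described above can be illustrated with a minimal sketch. It assumes that each horizon has already been intersected with a seismic line and converted to a sample (row) index per trace, and that label 0 is assigned to the interval above the shallowest horizon; neither detail is stated in the text, and the function name `label_line` is hypothetical.

```python
import numpy as np

def label_line(horizon_rows: np.ndarray, n_samples: int) -> np.ndarray:
    """Build a label image for one seismic line (illustrative sketch).

    horizon_rows: integer array of shape (n_horizons, n_traces); entry [h, t]
        is the row index at which horizon h crosses trace t (assumed input).
    n_samples: number of time samples, i.e. rows of the seismic image.
    Returns an (n_samples, n_traces) uint8 image whose value at each pixel is
    the number of horizons located at or above that pixel.
    """
    rows = np.arange(n_samples)[:, None, None]        # (n_samples, 1, 1)
    # Compare every row index with every horizon position on every trace.
    crossed = rows >= horizon_rows.T[None, :, :]      # (n_samples, n_traces, n_horizons)
    return crossed.sum(axis=2).astype(np.uint8)

# Toy example: 7 flat horizons on a line with 5 traces and 80 samples.
toy_horizons = np.tile(np.arange(10, 80, 10)[:, None], (1, 5))
toy_labels = label_line(toy_horizons, n_samples=80)
```

With 7 horizons, this counting scheme yields the 8 labels (0 to 7) mentioned in the text, one per interval.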
To produce the classification dataset, we break the seismic images into tiles of 40×40 pixels. Each tile has the majority of its area belonging to only one class. In the provided dataset, we allow 30% of interference from other classes as discussed in [17, 31]. The entire process of creating tiles from a seismic image comprises the following steps:
V-1 Split the image files into training and test sets.
V-2 Shuffle the selected images.
V-3 Process the images by removing extreme amplitudes and rescaling the values to between 0 and 255. By doing this, we reduce the search space and make the data smaller.
V-4 Generate tiles from the processed images.
V-5 Balance the training and test datasets. Although seismic images are imbalanced regarding the areas of each seismic facies unit, machine learning methods usually rely on a uniform class distribution to optimize their parameters. By balancing the classes, we make the training process simpler, allowing us, for example, to use well-known metrics such as accuracy and precision.
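Two of the steps above, amplitude rescaling and class balancing, can be sketched as follows. This is only an illustration under stated assumptions: the text does not say how extreme amplitudes are detected or how the classes are balanced, so the 1st/99th percentile clipping and the random undersampling to the smallest class are choices made for this example, and both function names are hypothetical.

```python
import numpy as np

def rescale_amplitudes(img, low_pct=1.0, high_pct=99.0):
    """Clip extreme amplitudes and rescale to integers in [0, 255].

    The percentile thresholds are an assumption made for this sketch.
    """
    lo, hi = np.percentile(img, [low_pct, high_pct])
    if hi <= lo:                      # constant image: nothing to rescale
        return np.zeros(img.shape, dtype=np.uint8)
    clipped = np.clip(img, lo, hi)
    scaled = (clipped - lo) / (hi - lo) * 255.0
    return np.round(scaled).astype(np.uint8)

def balance_classes(labels, rng):
    """Return sorted indices that keep the same number of tiles per class.

    Balancing by random undersampling to the smallest class is an assumption.
    """
    labels = np.asarray(labels)
    classes, counts = np.unique(labels, return_counts=True)
    n_keep = counts.min()
    keep = [rng.choice(np.flatnonzero(labels == c), size=n_keep, replace=False)
            for c in classes]
    return np.sort(np.concatenate(keep))

# Toy usage with synthetic amplitudes and an imbalanced label list.
toy_img = np.linspace(-1000.0, 1000.0, 10_000).reshape(100, 100)
toy_scaled = rescale_amplitudes(toy_img)
toy_tile_labels = np.array([0] * 10 + [1] * 3 + [2] * 5)
kept = balance_classes(toy_tile_labels, np.random.default_rng(0))
```

Storing the rescaled values as 8-bit integers is what makes the data smaller, at the cost of quantizing the original amplitudes.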
The provided classification dataset comprises 6,124 crossline and 4,706 inline seismic tiles per class along with their respective labels. Table I describes the files in the dataset.
| File | Format | # Files | Total size (MB) |
|---|---|---|---|
| Seismic tiles (train) | PNG | 75,810 | 116 |
| Seismic labels (train) | JSON | 2 | 1.5 |
| Seismic tiles (test) | PNG | 28,000 | 116 |
| Seismic labels (test) | JSON | 2 | 0.5 |
In this section, we present two applications that demonstrate the utility of the Penobscot dataset for seismic classification and segmentation using deep learning.
VI-A Seismic Facies Classification
The first application is the classification of seismic facies presented in earlier works by Chevitarese et al. The authors successfully trained deep neural networks to discriminate the seismic facies present in the Penobscot dataset. They assume that one may distinguish different classes (facies) by their textural features as discussed in [5, 32].
For the presented classification task, it was necessary to break the seismic images into smaller parts (tiles), so that the majority of a tile’s area belongs to only one class. Tiles are the input of the deep neural network that classifies each tile as one of the possible classes.
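A minimal sketch of such a tiling procedure is given below, using the 40×40 tile size and the 30% interference limit mentioned in Section V. The non-overlapping stride, the function name `extract_tiles`, and its parameters are assumptions made for illustration; the cited works may generate tiles differently.

```python
import numpy as np

def extract_tiles(image, labels, tile=40, max_interference=0.30, n_classes=8):
    """Cut a seismic image into square tiles and keep the nearly pure ones.

    A tile is kept when the fraction of pixels that do not belong to its
    dominant class is at most `max_interference`; it then receives the
    dominant class as its label. Non-overlapping tiles are an assumption.
    """
    tiles, tile_labels = [], []
    n_rows, n_cols = labels.shape
    for r in range(0, n_rows - tile + 1, tile):
        for c in range(0, n_cols - tile + 1, tile):
            window = labels[r:r + tile, c:c + tile]
            counts = np.bincount(window.ravel(), minlength=n_classes)
            dominant = int(counts.argmax())
            interference = 1.0 - counts[dominant] / window.size
            if interference <= max_interference:
                tiles.append(image[r:r + tile, c:c + tile])
                tile_labels.append(dominant)
    return tiles, tile_labels

# Toy label image (80x80) containing pure, slightly mixed, and half-mixed tiles.
toy_mask = np.zeros((80, 80), dtype=np.uint8)
toy_mask[40:70, :40] = 1   # bottom-left tile: 75% class 1 ...
toy_mask[70:80, :40] = 2   # ... and 25% class 2 (kept, labeled 1)
toy_mask[40:60, 40:] = 1   # bottom-right tile: 50% class 1 ...
toy_mask[60:80, 40:] = 2   # ... and 50% class 2 (discarded)
toy_seismic = np.zeros((80, 80), dtype=np.uint8)
toy_tiles, toy_classes = extract_tiles(toy_seismic, toy_mask)
```

In the toy example, the two pure tiles and the tile with 25% interference are kept, while the tile split evenly between two classes is dropped.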
The authors tested multiple tile sizes, numbers of examples per class, and interference percentages, among many other parameters. The reported results show that one can train a neural network in 4 minutes using 25 inline slices and still obtain 89% accuracy. Moreover, they reached up to 97% accuracy in 30 minutes using 276 inline slices. Figure 5 shows the impact of the number of slices on the final classification accuracy.
VI-B Seismic Facies Segmentation
The second application in which the Penobscot interpretation dataset was used is the semantic segmentation of seismic facies. The authors first trained a deep neural network for the classification task. Then, they modified the final part of the model to produce pixel-wise predictions. Finally, they trained the resulting model using the Penobscot interpretation dataset.
For the segmentation task, the authors also divided the input seismic images into tiles. However, the tiles are larger than the ones used for the classification task since they need to comprise more than one class. Next, they applied the network throughout the image to generate the final prediction. By doing this, they achieved an intersection over union (IoU) of more than 97%. Figure 6 shows that the model produced masks very close to the actual interpretation with very little discontinuity. Note that the authors merged classes 2 and 3, and classes 4 and 5.
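For readers who want to reproduce this kind of evaluation, the sketch below shows one common way to compute the IoU between a predicted mask and the interpreted labels after merging classes 2 and 3, and 4 and 5. Averaging the IoU over classes is an assumption, since the text does not specify how the reported value was aggregated, and the function names are hypothetical.

```python
import numpy as np

def merge_classes(labels, groups=((2, 3), (4, 5))):
    """Relabel each group of classes with the first class of the group."""
    merged = labels.copy()
    for group in groups:
        for c in group[1:]:
            merged[labels == c] = group[0]
    return merged

def mean_iou(pred, target):
    """Mean intersection over union across classes present in either mask.

    The per-class averaging is an assumption made for this sketch.
    """
    ious = []
    for c in np.union1d(np.unique(pred), np.unique(target)):
        inter = np.logical_and(pred == c, target == c).sum()
        union = np.logical_or(pred == c, target == c).sum()
        ious.append(inter / union)   # union > 0 because c occurs in a mask
    return float(np.mean(ious))

# Toy masks: the prediction confuses 2 with 3 and 4 with 5, and misses class 7.
toy_target = np.array([[0, 0, 2, 3], [4, 5, 6, 7]])
toy_pred = np.array([[0, 0, 3, 2], [5, 4, 6, 6]])
toy_score = mean_iou(merge_classes(toy_pred), merge_classes(toy_target))
```

After merging, confusions inside a merged pair no longer count as errors, which is why the merge affects the reported score.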
We argued in this letter that the expansion of machine learning in other fields was only possible, in part, because of the number of public datasets that have been made available in the past years. The Penobscot interpretation dataset is our contribution to foster the development of machine learning in seismic interpretation which, in our view, has gained increasing interest but still needs to build this common basis. With our dataset, we provide geoscientists, and machine learning practitioners working in the field, with more than 100,000 labeled images that can be used to develop their methods and compare their results more easily and quickly.
In the experiments presented, the authors were able to successfully apply state-of-the-art deep learning techniques to the proposed dataset and reach high-accuracy results for seismic facies classification and segmentation. However, these are only two among many possible applications that could benefit from the dataset, such as clustering, retrieval, and transfer learning. As future work, we intend to produce a similar dataset for another public seismic survey, Netherlands F3. These two datasets together will provide valuable data for developing and testing machine learning methods for seismic interpretation.
[1] T. Randen, E. Monsen, C. Signer, A. Abrahamsen, J. O. Hansen, T. Sæter, and J. Schlaf, “Three-dimensional texture attributes for seismic data analysis,” in SEG Technical Program Expanded Abstracts 2000, pp. 668–671, SEG, 2000.
[2] P. Guillen, G. Larrazabal, G. González, D. Boumber, and R. Vilalta, “Supervised learning to detect salt body,” in SEG Technical Program Expanded Abstracts 2015, pp. 1826–1829, SEG, 2015.
[3] C. Zhang, C. Frogner, M. Araya-Polo, and D. Hohl, “Machine-learning based automated fault detection in seismic traces,” in 76th EAGE ACE, 2014.
[4] D. Gao, “Latest developments in seismic texture analysis for subsurface structure, facies, and reservoir characterization: A review,” Geophysics, vol. 76, no. 2, pp. W1–W13, 2011.
[5] A. B. Mattos, R. S. Ferreira, R. M. D. G. Silva, M. Riva, and E. V. Brazil, “Assessing texture descriptors for seismic image retrieval,” in 2017 30th SIBGRAPI, pp. 292–299, Oct 2017.
[6] B. P. West, S. R. May, J. E. Eastwood, and C. Rossen, “Interactive seismic facies classification using textural attributes and neural networks,” The Leading Edge, vol. 21, no. 10, pp. 1042–1049, 2002.
[7] C. Song, Z. Liu, H. Cai, Y. Wang, X. Li, and G. Hu, “Unsupervised seismic facies analysis with spatial constraints using regularized fuzzy c-means,” Journal of Geophysics and Engineering, vol. 14, no. 6, p. 1535, 2017.
[8] V. Badrinarayanan, A. Kendall, and R. Cipolla, “SegNet: A Deep Convolutional Encoder-Decoder Architecture for Image Segmentation,” Nov 2015.
[9] E. Shelhamer, J. Long, and T. Darrell, “Fully Convolutional Networks for Semantic Segmentation,”
[10] J. Redmon, S. Divvala, R. Girshick, and A. Farhadi, “You only look once: Unified, real-time object detection,” in CVPR, June 2016.
[11] S. Ren, K. He, R. Girshick, and J. Sun, “Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks,” in Advances in NIPS 28 (C. Cortes, N. D. Lawrence, D. D. Lee, M. Sugiyama, and R. Garnett, eds.), pp. 91–99, Curran Associates, Inc., 2015.
[12] Y. LeCun, C. Cortes, and C. J. Burges, “Mnist handwritten digit database,” AT&T Labs [Online]. Available: http://yann.lecun.com/exdb/mnist, vol. 2, 2010.
[13] M. Everingham, L. Van Gool, C. K. Williams, J. Winn, and A. Zisserman, “The pascal visual object classes (voc) challenge,” IJCV, vol. 88, no. 2, pp. 303–338, 2010.
[14] T. Lin, M. Maire, S. Belongie, J. Hays, P. Perona, D. Ramanan, P. Dollár, and C. L. Zitnick, “Microsoft coco: Common objects in context,” in ECCV, pp. 740–755, Springer, 2014.
[15] J. Deng, W. Dong, R. Socher, L. Li, K. Li, and L. Fei-Fei, “Imagenet: A large-scale hierarchical image database,” in CVPR, pp. 248–255, IEEE, 2009.
[16] O. S. R. dGB Earth Sciences, “Penobscot 3d - survey,” 2017.
[17] D. S. Chevitarese, D. Szwarcman, R. M. G. e Silva, and E. V. Brazil, “Deep learning applied to seismic facies classification: a methodology for training,” in EAGE Saint Petersburg, 2018.
[18] D. Chevitarese, D. Szwarcman, R. M. D. Silva, and E. V. Brazil, “Seismic facies segmentation using deep learning,” in AAPG ACE 2018.
[19] D. Chevitarese, D. Szwarcman, R. M. D. Silva, and E. V. Brazil, “Transfer learning applied to seismic images classification,” in AAPG ACE 2018.
[20] D. M. Hansen, J. W. Shimeld, M. A. Williamson, and H. Lykke-Andersen, “Development of a major polygonal fault system in upper cretaceous chalk and cenozoic mudrocks of the sable subbasin, canadian atlantic margin,” Marine and Petroleum Geology, vol. 21, no. 9, pp. 1205–1219, 2004.
[21] M. Albertz, C. Beaumont, J. W. Shimeld, S. J. Ings, and S. Gradmann, “An investigation of salt tectonic structural styles in the scotian basin, offshore atlantic canada: 1. comparison of observations with geometrically simple numerical models,” Tectonics, vol. 29, no. 4, 2010.
[22] Y. A. Kettanah, “Hydrocarbon fluid inclusions in the argo salt, offshore canadian atlantic margin,” Canadian Journal of Earth Sciences, vol. 50, no. 6, pp. 607–635, 2013.
[23] C.-N. S. O. P. Board, Technical Summaries of Scotian Shelf Significant and Commercial Discoveries. Canada-Nova Scotia Offshore Petroleum Board, 1997.
[24] F. Qayyum, O. Catuneanu, and C. E. Bouanga, “Sequence stratigraphy of a mixed siliciclastic-carbonate setting, scotian shelf, canada,” Interpretation, vol. 3, no. 2, pp. SN21–SN37, 2015.
[25] A. Mandal and E. Srivastava, “Enhanced structural interpretation from 3d seismic data using hybrid attributes: New insights into fault visualization and displacement in cretaceous formations of the scotian basin, offshore nova scotia,” Marine and Petroleum Geology, vol. 89, pp. 464–478, 2018.
[26] C.-N. S. O. P. Board, “Regional geology overview,” 2017.
[27] B. U. Haq, J. Hardenbol, and P. R. Vail, “Mesozoic and cenozoic chronostratigraphy and cycles of sea-level change,” 1988.
[28] dGB Earth Sciences, “Open seismic repository,” 2018.
[29] dGB Earth Sciences, “Seismic interpretation software & services,” 2018.
[30] L. F. Brown Jr, “Seismic stratigraphic interpretation and petroleum exploration,” Course Notes AAPG, no. 16, p. 181p, 1980.
[31] D. S. Chevitarese, D. Szwarcman, E. V. Brazil, and B. Zadrozny, “Efficient classification of seismic textures,” in 2018 IJCNN, pp. 2984–2991, July 2018.
[32] S. Chopra and V. Alexeev, “Applications of texture attribute analysis to 3d seismic data,” The Leading Edge, vol. 25, no. 8, pp. 934–940, 2006.