Accurate Tissue Interface Segmentation via Adversarial Pre-Segmentation of Anterior Segment OCT Images

05/07/2019 ∙ by Jiahong Ouyang, et al. ∙ Carnegie Mellon University ∙ UPMC

Optical Coherence Tomography (OCT) is an imaging modality that has been widely adopted for visualizing corneal, retinal and limbal tissue structure with micron resolution. It can be used to diagnose pathological conditions of the eye, and for developing pre-operative surgical plans. In contrast to the posterior retina, imaging the anterior tissue structures, such as the limbus and cornea, results in B-scans that exhibit increased speckle noise patterns and imaging artifacts. These artifacts, such as shadowing and specularity, pose a challenge during the analysis of the acquired volumes as they substantially obfuscate the location of tissue interfaces. To deal with the artifacts and speckle noise patterns and accurately segment the shallowest tissue interface, we propose a cascaded neural network framework, which comprises a conditional Generative Adversarial Network (cGAN) and a Tissue Interface Segmentation Network (TISN). The cGAN pre-segments OCT B-scans by removing undesired specular artifacts and speckle noise patterns just above the shallowest tissue interface, and the TISN combines the original OCT image with the pre-segmentation to segment the shallowest interface. We show the applicability of the cascaded framework to corneal datasets, demonstrate that it precisely segments the shallowest corneal interface, and also show its generalization capacity to limbal datasets. We also propose a hybrid framework, wherein the cGAN pre-segmentation is passed to a traditional image analysis-based segmentation algorithm, and describe the improved segmentation performance. To the best of our knowledge, this is the first approach to remove severe specular artifacts and speckle noise patterns (prior to the shallowest interface) that affect the interpretation of anterior segment OCT datasets, thereby resulting in the accurate segmentation of the shallowest tissue interface.


References

  • [1] D. Huang et al., “Optical Coherence Tomography”, Science 254, 1178-1181 (1991).
  • [2] J. Fujimoto et al., “Optical coherence tomography (OCT) in Ophthalmology: Introduction”, 17(5), 3978-3979 (2009).
  • [3] J. Izatt et al., “Micrometer-Scale Resolution Imaging of the Anterior Eye In Vivo With Optical Coherence Tomography”, Arch Ophthalmol. 112(12), 1584-1589 (1994).
  • [4] K. Lathrop et al., “Optical Coherence Tomography as a Rapid, Accurate, Noncontact Method of Visualizing the Palisades of Vogt”, 53(3), 1381-1387 (2012).
  • [5] A. Kuo et al., “Corneal Biometry from Volumetric SDOCT and Comparison with Existing Clinical Modalities”, 3(6), 1279-1290 (2012).
  • [6] N. Venkateswaran et al., “Optical Coherence Tomography for Ocular Surface and Corneal Diseases: A Review”, Eye and Vision 5(1), 1-13 (2018).
  • [7] B. Keller et al., “Real-time Corneal Segmentation and 3D Needle Tracking in Intrasurgical OCT”, 9, 2716-2732 (2018).
  • [8] K. Bizheva et al., “In Vivo Volumetric Imaging of the Human Corneo-Scleral Limbus with Spectral Domain OCT”, 2(7), 1794-1802 (2011).
  • [9] K. Bizheva et al., “In-Vivo Imaging of the Palisades of Vogt and the Limbal Crypts with Sub-Micrometer Axial Resolution Optical Coherence Tomography”, 8(9), 4141-4151 (2017).
  • [10] M. Haagdorens et al., “A method for quantifying limbal stem cell niches using OCT imaging”, Br. J. Ophthalmol., 101(9), 1250-1255 (2017).
  • [11] F. LaRocca et al., “Robust Automatic Segmentation of Corneal Layer Boundaries in SDOCT Images using Graph Theory and Dynamic Programming”, 2(6), 1524-1538 (2011).
  • [12] L. Ge et al., “Automatic Segmentation of the Central Epithelium Imaged With Three Optical Coherence Tomography Devices”, Eye & Contact Lens, 38(3), 150-157 (2012).
  • [13] D. Williams, Y. Zheng, F. Bao, and A. Elsheikh, “Automatic segmentation of anterior segment optical coherence tomography images”, J. Biomed. Opt. 18, 056003 (2013).
  • [14] Y. Li et al., “Corneal Pachymetry Mapping with High-speed Optical Coherence Tomography”, Ophthalmology 113, 792-799 (2006).
  • [15] D. Williams et al., “Reconstruction of 3D Surface Maps from Anterior Segment Optical Coherence Tomography Images using Graph Theory and Genetic Algorithms”, Biomed. Sig. Proc. Cont., 25, 91-98 (2016).
  • [16] H. Rabbani et al., “Obtaining Thickness Maps of Corneal Layers Using the Optimal Algorithm for Intracorneal Layer Segmentation”, Int. J. Biomed. Imag., 2016, (2016).
  • [17] M. Jahromi et al., “An Automatic Algorithm for Segmentation of the Boundaries of Corneal Layers in Optical Coherence Tomography Images using Gaussian Mixture Model”, J. Med. Signals Sensors, 4, 171-180 (2014).
  • [18] T. Schmoll et al., “Precise thickness measurements of Bowman’s layer, epithelium, and tear film”, Optom. & Vis. Sci. 89, 795-802 (2012).
  • [19] T. Zhang et al., “A Novel Technique for Robust and Fast Segmentation of Corneal Layer Interfaces Based on Spectral-Domain Optical Coherence Tomography Imaging”, IEEE Access, 5, 10352-10363 (2017).
  • [20] T. Mathai et al., “Visualizing the Palisades of Vogt: Limbal Registration by Surface Segmentation”, IEEE International Symposium on Biomedical Imaging, 1327-1331 (2018).
  • [21] B. Davidson et al. “Application of optical coherence tomography to automated contact lens metrology”, J. Biomed. Opt., 15, 15-24 (2010).
  • [22] M. Shen et al. “Extended scan depth optical coherence tomography for evaluating ocular surface shape”, J. Biomed. Opt., 16(5) (2011).
  • [23] D. Fernandez et al., “Automated detection of retinal layer structures on optical coherence tomography images”, 13, 10200-10216 (2005).
  • [24] H. Ishikawa et al., “Macular Segmentation with Optical Coherence Tomography”, 46(6) (2005).
  • [25] T. Fabritius et al., “Automated segmentation of the macula by optical coherence tomography”, 17, 15659-15669 (2009).
  • [26] K. Li et al., “Optimal surface segmentation in volumetric images-a graph-theoretic approach”, IEEE Trans. Pattern Anal. Mach. Intell. 28(1), 119-134 (2006).
  • [27] A. P. Dufour et al., “Graph-based multi-surface segmentation of OCT data using trained hard and soft constraints”, IEEE Trans. Med. Imaging 32(3), 531-543 (2013).
  • [28] A. Shah et al., “Multiple Surface Segmentation Using Truncated Convex Priors”, in Medical Image Computing and Computer Assisted Intervention 97-104 (2015).
  • [29] J. Tian et al., “Real-time automatic segmentation of optical coherence tomography volume data of the macular region”, PloS One 10(8), e0133908 (2015).
  • [30] S. J. Chiu et al., “Automatic segmentation of seven retinal layers in SDOCT images congruent with expert manual segmentation”, 18(18), 19413-19428 (2015).
  • [31] Y. Boykov et al., “Graph cuts and efficient N-D image segmentation”, Int. J. Comp. Vis. 70(2), 109-131 (2006).
  • [32] M. K. Garvin et al., “Automated 3D intraretinal layer segmentation of macular spectral-domain optical coherence tomography images”, IEEE Trans. Med. Imag. 28(9), 1436-1447 (2009).
  • [33] F. Shi et al., “Automated 3D retinal layer segmentation of macular optical coherence tomography images with serous pigment epithelial detachments”, IEEE Trans. Med. Imag. 34(2), 441-452 (2015).
  • [34] K. Lee et al., “Segmentation of the optic disc in 3-d oct scans of the optic nerve head”, IEEE Trans. Med. Imag. 29(1), 159-168 (2010).
  • [35] Q. Song et al., “Optimal multiple surface segmentation with shape and context priors”, IEEE Trans. Med. Imag. 32(2), 376-386 (2013).
  • [36] A. Shah et al., “Automated surface segmentation of internal limiting membrane in spectral-domain optical coherence tomography volumes with a deep cup using a 3D range expansion approach”, IEEE International Symposium on Biomedical Imaging, 1405-1408 (2014).
  • [37] A. Yazdanpanah et al., “Intraretinal layer segmentation in optical coherence tomography using an active contour approach”, in Medical Image Computing and Computer Assisted Intervention, 649-656 (2009).
  • [38] S. Niu et al., “Automated geographic atrophy segmentation for SD-OCT images using region-based CV model via local similarity factor”, 7(2), 581-600 (2016).
  • [39] L. de Sisternes et al., “Automated intraretinal segmentation of SD-OCT images in normal and age-related macular degeneration eyes”, 8(3), 1926-1949 (2017).
  • [40] A. Lang et al., “Retinal layer segmentation of macular OCT images using boundary classification”, 4(7), 1133-1152 (2013).
  • [41] Z. Ma et al., “A review on the current segmentation algorithms for medical images”, International Conference on Imaging Theory and Applications, (2009).
  • [42] R. Kafieh et al., “A review of algorithms for segmentation of optical coherence tomography from retina”, J. Med. Sig. Sens. 3(1), 45 (2013).
  • [43] B. J. Antony et al., “A combined machine learning and graph-based framework for the segmentation of retinal surfaces in SD-OCT volumes”, 4(12), 2712-2728 (2013).
  • [44] L. Fang et al., “Automatic segmentation of nine retinal layer boundaries in OCT images of non-exudative AMD patients using deep learning and graph search”, 8(5), 2732-2744 (2017).
  • [45] M. Chen et al., “Automated segmentation of the choroid in EDI-OCT images with retinal pathology using convolution neural networks”, in Fetal, Infant and Ophthalmic Medical Image Analysis, 177-184 (2017).
  • [46] X. Sui et al., “Choroid segmentation from optical coherence tomography with graph edge weights learned from deep convolutional neural networks”, J. Neurocomp. 237, 332-341 (2017).
  • [47] F. Venhuizen et al., “Robust total retina thickness segmentation in optical coherence tomography images using convolutional neural networks”, 1(8), 3292-3316 (2017).
  • [48] A. G. Roy et al., “Relaynet: Retinal layer and fluid segmentation of macular optical coherence tomography using fully convolutional network”, 8, 3627-3642 (2017)
  • [49] A. Shah et al., “Simultaneous multiple surface segmentation using deep learning”, in Deep Learning in Medical Image Analysis and Multimodal Learning for Clinical Decision Support, 3-11 (2017).
  • [50] C. S. Lee et al., “Deep-learning based, automated segmentation of macular edema in optical coherence tomography”, 8, 3440-3448 (2017)
  • [51] T. Mathai et al., “Learning to Segment Corneal Tissue Interfaces in OCT Images”, arXiv:1810.06612 (2018).
  • [52] V. Santos et al., “CorneaNet: fast segmentation of cornea OCT scans of healthy and keratoconic eyes using deep learning”, 10, 622-641 (2019).
  • [53] M. Szkulmowski et al., “Efficient reduction of speckle noise in Optical Coherence Tomography”, 20, 1337-1359 (2012).
  • [54] A. E. Desjardins et al., “Angleresolved optical coherence tomography with sequential angular selectivity for speckle reduction”, 15(10), 6200-6209 (2007).
  • [55] M. Hughes et al., “Speckle noise reduction in optical coherence tomography of paint layers”, 49(1), 99-107 (2010).
  • [56] M. Pircher et al., “Measurement and imaging of water concentration in human cornea with differential absorption optical coherence tomography”, 11(18), 2190-2197 (2003).
  • [57] J. Rogowska et al., “Image Processing Techniques for Noise Removal, Enhancement and Segmentation of Cartilage OCT Images”, Phys. Med. Bio. 47 (4), 641-655 (2002).
  • [58] D. C. Adler et al., “Speckle reduction in optical coherence tomography images by use of a spatially adaptive wavelet filter”, 29(24), 2878-2880 (2004).
  • [59] A. Ozcan et al., “Speckle reduction in optical coherence tomography images using digital filtering”, 24(7), 1901-1910 (2007).
  • [60] P. Puvanathasan et al., “Speckle noise reduction algorithm for optical coherence tomography based on interval type II fuzzy set”, 15(24), 15747-15758 (2007).
  • [61] M. Gargesha et al., “Denoising and 4D visualization of OCT images”, 16(16), 12313-12333 (2008).
  • [62] S. Chitchian et al., “Denoising during optical coherence tomography of the prostate nerves via wavelet shrinkage using dual-tree complex wavelet transform”, J. Biomed. Opt. 14(1), 014031 (2009).
  • [63] A. Wong et al., “General Bayesian Estimation for Speckle Noise Reduction in Optical Coherence Tomography Retinal Imagery”, 18(8), 8338-8352 (2010).
  • [64] R. Bernardes et al., “Improved Adaptive Complex Diffusion Despeckling Filter”, 18 (23), 24048-24059 (2010).
  • [65] Z. Hongwei et al., “Adaptive Wavelet Transformation for Speckle Reduction in Optical Coherence Tomography Images”, IEEE International Conference on Signal Processing, Communications and Computing, 1-5 (2011).
  • [66] S. Moon et al., “Reference Spectrum Extraction and Fixed-pattern Noise Removal in Optical Coherence Tomography”, 18 (24), 395-404 (2010).
  • [67] S. Vergnole et al., “Artifact Removal in Fourier-domain Optical Coherence Tomography with a Piezoelectric Fiber Stretcher”, 33(7), 732-734 (2008).
  • [68] D. Marks et al., “Speckle Reduction by I-divergence Regularization in Optical Coherence Tomography”, 22 (11), 2366-2371 (2005).
  • [69] I. Goodfellow et al., “Generative Adversarial Nets”, in Advances in Neural Information Processing Systems, 2672-2680 (2014).
  • [70] P. Isola et al., “Image-to-Image Translation with Conditional Adversarial Networks”, IEEE Computer Vision and Pattern Recognition (2017).
  • [71] A. Radford, L. Metz, and S. Chintala, “Unsupervised Representation Learning with Deep Convolutional Generative Adversarial Networks”, in International Conference on Learning Representations (2016).
  • [72] Y. Ma et al., “Speckle noise reduction in optical coherence tomography images based on edge-sensitive cGAN”, 9(11), 5129-5146 (2018).
  • [73] S. Apostolopoulos et al., “Pathological OCT Retinal Layer Segmentation Using Branch Residual U-Shape Networks”, in Medical Image Computing and Computer Assisted Intervention, 10435 (2017).
  • [74] L. Gondara, “Medical Image Denoising Using Convolutional Denoising Autoencoders”, IEEE International Conference on Data Mining Workshops, 241-246 (2016).
  • [75] O. Ronneberger et al., “U-Net: Convolutional Networks for Biomedical Image Segmentation”, in Medical Image Computing and Computer Assisted Intervention, 9351 (2015).
  • [76] A. Shah et al., “Multiple Surface Segmentation using Convolution Neural Nets: Application to Retinal Layer Segmentation in OCT Images”, 9, 4509-4526 (2018).
  • [77] F. Yu and V. Koltun, “Multi-Scale Context Aggregation by Dilated Convolutions”, in International Conference on Learning Representations (2016).
  • [78] S. Devalla et al., “DRUNET: a Dilated-Residual U-Net Deep Learning Network to Segment Optic Nerve Head Tissues in Optical Coherence Tomography Images”, 9(3), 244-265 (2018).
  • [79] S. Ioffe and C. Szegedy, “Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift”, arXiv:1502.03167 (2015).
  • [80] C. Szegedy et al., “Going Deeper with Convolutions”, IEEE Computer Vision and Pattern Recognition (2015).
  • [81] K. He et al., “Deep Residual Learning for Image Recognition”, IEEE Computer Vision and Pattern Recognition (2016).
  • [82] G. Huang et al., “Densely Connected Convolutional Networks”, IEEE Computer Vision and Pattern Recognition, 2261-2269 (2017).
  • [83] S. Jegou et al., “The One Hundred Layers Tiramisu: Fully Convolutional Densenets for Semantic Segmentation”, IEEE Computer Vision and Pattern Recognition Workshops, 1175-1183 (2017).
  • [84] N. Khosravan et al.,“S4ND: Single-Shot Single-Scale Lung Nodule Detection”, in Medical Image Computing and Computer Assisted Intervention, 11071 (2018).
  • [85] A. Odena et al., “Deconvolution and Checkerboard Artifacts”, Distill (2016).
  • [86] H. Noh et al., “Learning Deconvolution Network for Semantic Segmentation”, IEEE International Conference on Computer Vision (2015).
  • [87] J. Long et al., “Fully Convolutional Networks for Semantic Segmentation”, IEEE Computer Vision and Pattern Recognition (2015).
  • [88] B. Wang et al., “Gold Nanorods as a Contrast Agent for Doppler Optical Coherence Tomography”, PLoS ONE 9(3) (2014).
  • [89] V. Srinivasan et al., “High-Definition and 3-Dimensional Imaging of Macular Pathologies with High-Speed Ultrahigh- Resolution Optical Coherence Tomography”, Ophthalmology 113(11), 1-14 (2014).
  • [90] Leica Envisu C2300 system specifications, https://www.leica-microsystems.com/fileadmin/downloads/Envisu%20C2300/Brochures/Envisu_C2300_EBrochure_2017_en.pdf.
  • [91] O. Russakovsky et al., “ImageNet Large Scale Visual Recognition Challenge”, International Journal of Computer Vision, 115(3), 211-252 (2015).
  • [92] P. Simard et al., “Best Practices for Convolutional Neural Networks Applied to Visual Document Analysis”, IEEE International Conference on Document Analysis and Recognition, 958-963 (2003).
  • [93] D. Kingma et al., “Adam: a Method for Stochastic Optimization”, in International Conference on Learning Representations (2015).
  • [94] F. Milletari et al., “V-Net: Fully Convolutional Neural Networks for Volumetric Medical Image Segmentation”, Int. Conf. 3D Vision, 565-571 (2016).
  • [95] T.S. Mathai et al., “Graphics Processor Unit (GPU) Accelerated Shallow Transparent Layer Detection in Optical Coherence Tomographic (OCT) Images for Real-Time Corneal Surgical Guidance”, in Medical Image Computing and Computer Assisted Intervention Workshops, 8678 (2014).
  • [96] W. Cleveland, “LOWESS: A Program for Smoothing Scatterplots by Robust Locally Weighted Regression”, The American Statistician, 35(1) (1981).
  • [97] M. Felsberg and G. Sommer, “The Monogenic Signal”, IEEE Transactions on Signal Processing, 49(12), 3136-3144 (2001).

1 Introduction

Optical coherence tomography (OCT) is a non-invasive and non-contact imaging technique that has been widely adopted for imaging sub-surface tissue structures with micrometer depth resolution in clinical ophthalmology [1, 2]. OCT is a popular method to visualize structures in the eye, especially those in the retina [1], cornea [3], and limbus [4]. Specific to the anterior segment of the eye, OCT has been clinically used to characterize the changes that occur during the progression of Keratoconus [5, 6], diagnose benign and malignant conjunctival and corneal pathologies, such as Ocular Surface Squamous Neoplasia [4, 6], and monitor potential complications for many anterior segment surgical procedures, such as Deep Anterior Lamellar Keratoplasty (DALK) [7] and Descemet Membrane Endothelial Keratoplasty (DMEK) [6]. Furthermore, OCT has been used to image the limbus [4, 8, 9], and enabled the analysis of the Palisades of Vogt (POV) [10].

In all these applications, accurate estimation of the corneal or limbal tissue boundaries is required to determine a quantitative parameter for diagnosis or treatment. For example, in [5], the corneal tissue interfaces were identified to estimate corneal biometric parameters. In [10], the shallowest limbal interface was first identified, and then the tissue structure visualized in the image was “flattened” [11, 10, 20] to enable the measurement of the palisade density. However, precise estimation of the corneal and limbal tissue interface location is challenging in anterior segment OCT imaging. The low signal-to-noise ratio (SNR), increased speckle noise patterns, and predominant specular artifacts pose barriers towards automatic delineation of the tissue interfaces (see Fig. 1). Furthermore, datasets are typically acquired in a clinical setting using different OCT scanners (including custom-built OCT scanners for clinical research) from different vendors as shown in Fig. 1. The scan settings of these OCT machines are usually different, thereby resulting in datasets with different image dimensions, SNR, speckle noise patterns, and specular artifacts.



Figure 1: Original B-scans from (a) a 6×6 mm corneal volume acquired by a custom SD-OCT scanner, (b) a 6×6 mm corneal volume and (c) a 3×3 mm corneal volume acquired by a UHR-OCT scanner, (d) a 4×4 mm limbal volume acquired by a hand-held Leica SD-OCT scanner, and (e)-(f) 4×4 mm limbal volumes acquired by a UHR-OCT scanner. Specular artifacts in (a)-(d) and poor visibility in (e)-(f) affect the precise delineation of the tissue interfaces.

Speckle noise patterns and specular artifacts are major factors that influence the correct interpretation of anterior segment OCT images. To mitigate these degradations, there are many hardware- and software-based approaches that process each B-scan before it is analyzed in a segmentation pipeline. Hardware-based speckle noise reduction techniques [53, 54, 55] rely on the acquisition of multiple tomograms with decorrelated speckle patterns, such that they can be averaged to obtain images with lower speckle contrast. These techniques usually require modification of the OCT system’s optical configuration and/or its scanning protocols. Software-based methods include wavelet transformations [58, 59, 60, 61, 62, 65], local averaging and median filtering [56, 57], percentile and bilateral filtering [20], regularization [68], local Bayesian estimation [63], and diffusion filtering [64]. Efforts were also made to remove artifacts by using the reference spectrum [66, 11], and piezoelectric fiber stretchers [67] in the Fourier domain. However, these methods only work when a fixed type of artifact is encountered, such as the horizontal artifacts in [66, 11], and they do not generalize to datasets where the assumption of the artifact presence is violated [51], as seen in Fig. 2. Furthermore, none of the prior methods is robust when the SNR dropoff is substantial, which is typically the case while imaging the limbus; the anatomic curvature (and thus orientation toward the OCT scanner) changes when moving away from the cornea and towards the limbus, thereby causing a significant decrease in the visibility of tissue boundaries, as seen in Figs. 1(d)-1(f). Particularly in our case, datasets were acquired by OCT scanners that imaged the limbal junction; the OCT scanner commenced scanning at the limbus and crossed over to the cornea, thereby incorporating the limbal junction during image acquisition. At the limbus, often only the shallowest interface is visible, and as the scanner crosses the limbal junction to image the cornea, other interfaces, such as Bowman’s Layer, gradually become visible. In this work, we focus on delineating the shallowest tissue interface in all corneal and limbal datasets.

Towards the goal of mitigating these image degrading factors, a recent learning-based method featuring a conditional Generative Adversarial Network (cGAN) [69, 70] was proposed to remove speckle noise patterns in retinal OCT images [71, 72]. It also generalized to datasets acquired from multiple OCT scanners. Although qualitatively good results were obtained, the central premise of their approach was little to no eye motion between frames during imaging. The ground truth data was generated using a compounding technique; the same tissue area was imaged multiple times, and individual volumes were registered to yield averaged B-scans for training, which corresponded to the gold standard despeckled images. However, in our case, this methodology to generate ground truth data for training is not feasible as corneal datasets exhibit large motion when acquired in vivo, which makes registration and compounding challenging. In addition, existing research databases, from which corneal datasets can be extracted for use in algorithmic development, rarely contain multiple scans of the same tissue area for compounding. Moreover, the authors in [72] opined that it is difficult to judge the efficacy of a despeckling algorithm using existing metrics, such as SNR or Contrast-to-Noise Ratio (CNR), as no single metric is a good determining factor of the quality of the denoised image. They suggested that an alternate way to analyze the utility of a despeckling method is to estimate the improvement in segmentation accuracy following denoising.

To deal with these challenging scenarios, it is desirable for a tissue-interface segmentation algorithm to possess the following characteristics: 1) Robustness in the presence of speckle noise and artifacts, 2) Generalization capacity across datasets acquired from multiple OCT scanners with different scan settings, and 3) Applicability to different (anterior segment) anatomical regions. Currently, there are a myriad of prior approaches that directly segment corneal and retinal tissue interfaces. They can be broadly grouped into four categories: 1) Traditional image analysis-based segmentation algorithms, 2) Graph-based segmentation methods, 3) Contour modeling-based segmentation methods, and 4) Machine learning-based (including deep learning-based) segmentation algorithms. Traditional image analysis-based approaches filter the individual B-scans to enhance the contrast of tissue interfaces, and then threshold the image to segment the corneal [21, 22, 20] and retinal [23, 24, 25] interface boundaries. These filters are typically hand-tuned and chosen for the explicit purpose of reducing speckle noise patterns and enhancing edges in the image for easier segmentation. Graph-based methods [26, 27, 28, 31, 32, 33, 34, 35, 36] pose the segmentation of the interfaces as an optimization problem, wherein tissue interfaces are detected subject to surface smoothness priors and distance constraints between interfaces. Other graph-based methods [29, 30] pose the boundary segmentation problem as a shortest-path finding approach, wherein the shortest path between a source node and a sink node is deduced, given costs assigned to the nodes between them. Contour modeling approaches utilize active contours that dynamically change their shape based on shape metrics, such as deviation from a second-order polynomial [37, 38], or edge gradients underlying the contour [39].

Machine learning techniques express the segmentation problem as a classification task; features related to the tissue interfaces to be segmented are extracted, and then classified as belonging to the tissue boundary or background [40, 41, 42]. In other cases, learning-based methods are an element of a hybrid system [43, 44], wherein the generated output, or the intermediate learned features, improve or assist the performance of traditional, graph-based, or contour modeling approaches. Currently, deep neural networks are the state-of-the-art algorithms [44, 45, 46, 47, 48, 49, 50] of choice for the segmentation task as they can learn highly discriminative multi-scale features from training data, thereby outperforming all other segmentation approaches. These neural network models are alluring because key algorithm parameters are learned from the training data, parameters that must often be manually tuned in other approaches - for example, the hand-crafted parameters in traditional image analysis-based [21, 22, 20, 23, 24, 25] and active contour-based approaches [37, 38, 39]. They can also be applied to pathological patients if appropriate datasets are introduced during the training procedure.



Figure 2: (a),(d) Original B-scans from a 4×4 mm limbal dataset acquired using a hand-held SD-OCT scanner and from a 3×3 mm corneal dataset acquired using a UHR-OCT scanner respectively. As proposed in previous algorithms [5, 7, 11, 12, 13, 14, 15, 16, 17, 18, 19], vertical lines (magenta) denote the division of the image into three regions in order to deal with specular artifacts. (b),(e) Segmentation of the shallowest interface (cyan contour) by these algorithms failed due to the presence of specular artifacts in different regions of the image. (c),(f) Segmentation result (red curve) from the proposed cascaded framework that accurately determined the location of the shallowest tissue interface.

However, among all the aforementioned methods, the majority of traditional methods [27, 28, 32, 33, 34, 35, 36, 29, 30] and learning-based methods [44, 45, 46, 47, 48, 49, 50, 73] are focused on retinal interface segmentation. Corneal interface segmentation algorithms are predominantly based on traditional approaches [5, 7, 11, 12, 13, 14, 15, 16, 17, 18, 19], with limited learning-based approaches [51, 52] being proposed. Similarly, prior work on limbal interface segmentation is limited to a traditional image analysis-based approach [20]. Moreover, most of the prior work is suited towards the task of segmenting tissue interfaces of only one particular type of anatomy, such as the retina or cornea, and these prior approaches are not easily generalizable across different types of anatomy. As shown in Fig. 2, most of the traditional approaches were not resilient when the methodology was transferred to our datasets obtained from different OCT scanners, which contained bulk tissue motion, severe specular artifacts and speckle noise patterns.

As seen in Figs. 2(b) and 2(e), previous segmentation approaches would divide (A-scan-wise) the OCT image into three sections, and assume that the location of the central specular artifact was limited to the center of the OCT image (region between the vertical magenta lines) [5, 7, 11, 12, 13, 14, 15, 16, 17, 18, 19]. But as seen in Fig. 2, this assumption can be violated when the central artifacts are located in different image regions [51]. In such cases, prior approaches failed to accurately segment the tissue interface as shown in Figs. 2(b) and 2(e). From our experiments, we postulated that most traditional algorithms are confounded by the presence of these strong specular artifacts and speckle noise patterns. Yet, once the shallowest interface is identified, these traditional approaches were able to delineate other interfaces, such as Bowman’s Layer and the Endothelium.

Furthermore, there were two independent and concurrently published deep learning-based corneal interface segmentation approaches [51, 52]. One of these approaches [52] acquired data from a single OCT scanner, and focused only on the region centered around the corneal apex in these OCT sequences as the drop in SNR was greater when moving away from this region. The other approach is our recent publication [51], where we utilized the entire OCT sequence from multiple scanners containing strong specular artifacts and low SNR regions, and successfully segmented three tissue interfaces. Yet, our previously proposed approach did not readily provide intermediate outputs, wherein the specular artifacts and speckle noise patterns were ameliorated, which could be used as input to the traditional approaches [5, 7, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20] for segmentation.

To this end, in this paper, we propose the first approach (to the best of our knowledge) to accurately identify the shallowest tissue interface in OCT images by mitigating speckle noise patterns and severe specular artifacts. We propose the creation of an intermediate OCT image representation that can influence the performance of a segmentation approach. Our major contributions in this paper are three-fold:

  1. Cascaded Framework: We present a cascaded neural network framework, which comprises a conditional Generative Adversarial Network (cGAN) and a Tissue Interface Segmentation Network (TISN). The cGAN pre-segments OCT images by removing undesired specular artifacts and speckle noise patterns just prior to the shallowest tissue interface. The pre-segmentation output of the cGAN is an intermediate output. Following pre-segmentation, the TISN predicts the final segmentation using both the original and pre-segmented images, and the shallowest interface is extracted and fitted with a curve.

  2. Hybrid Framework: The intermediate pre-segmentation output yielded by the cGAN is used as the image input to another tissue-interface segmentation algorithm, e.g. [20]. In general, the pre-segmentation can be used by any segmentation algorithm, but in the Hybrid Framework the second-stage segmentation algorithm does not have access to the original OCT image.

  3. cGAN Weighted Loss: We propose a task-specific weighted loss for the cGAN, which enforces the preservation of details related to the tissue structure, while removing specular artifacts and speckle noise patterns just prior to the shallowest interface in a context-aware manner.

Our cascaded framework was first applied to corneal datasets, which were acquired using two different OCT systems and different scan protocols. Encouraged by our cascaded framework’s performance on corneas, we diversified our training to also include limbal datasets (also acquired with different OCT systems). It seemed reasonable to seek generalized learning since the characteristics of limbal datasets are similar to corneal datasets in terms of low SNR, speckle noise patterns, and specular artifacts. In all these datasets, we segmented the shallowest interface that could be extracted in each B-scan.

A key motivation for the proposed hybrid framework was to directly integrate the output of the cGAN into the image acquisition pipeline of custom-built OCT scanners. As we postulated earlier, the varying degrees of specular artifacts and speckle noise patterns confound traditional segmentation algorithms. If the cGAN were integrated into the imaging pipeline and OCT B-scans were pre-segmented after acquisition, then we hypothesized that previously proposed segmentation algorithms should benefit from the removal of specular artifacts and speckle noise patterns just above the shallowest interface. Thus, our goal with the development of the hybrid framework was to show that the pre-segmented OCT image enabled one of these segmentation algorithms [20] to generate lower segmentation errors.

To quantify the performance of our proposed frameworks, we compared the results of the following baselines: 1) A traditional image analysis-based algorithm [20] that directly segmented the tissue interface, 2) The hybrid framework, 3) A deep learning-based approach [51] that directly segmented the tissue interface, and 4) The cascaded framework. We provide a summary of the major results below:

  1. We show that our approach is generalizable to datasets acquired from multiple scanners displaying varying degrees of speckle noise, specular artifacts, and bulk tissue motion.

  2. Our proposed frameworks segment the shallowest interface in datasets where the scanner starts by imaging the limbus, crosses over the limbal junction, and images the cornea.

  3. By executing a traditional image analysis-based algorithm on the pre-segmentation, the segmentation error was always reduced.

  4. We always accurately segmented the shallowest interface in corneal datasets using our proposed frameworks.

  5. In a majority of limbal datasets (15/18), we were able to precisely delineate the shallowest interface with our proposed frameworks.

2 Methods

2.1 Problem Statement

Given an OCT image x, the task of a conditional Generative Adversarial Network (cGAN) is to find a mapping G: {x, z} → y that maps each pixel in x, using a random noise vector z, to the corresponding pixel in a pre-segmented output image y. The pixels in y just prior to the tissue interface are mapped to 0 (black), while those at and below the interface are retained. y can then be used in a hybrid framework by any other segmentation algorithm.

Next, the task of the Tissue Interface Segmentation Network (TISN) is to determine a mapping S: {x, y} → m, wherein every corresponding pixel in x and y is assigned a label in the final segmentation m. In this paper, we only segment the shallowest tissue interface in the image, and thus assign pixels in m as: (0) pixels just above the tissue interface, (1) pixels at and below the tissue interface. Our frameworks are pictorially shown in Fig. 3.
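For concreteness, the cascaded mapping just described can be sketched in a few lines of Python. The callables cgan_generator, tisn, and fit_curve below are hypothetical stand-ins for the trained networks and the curve-fitting step, so this is an illustrative sketch rather than the authors' implementation.

```python
import numpy as np

def cascaded_inference(x, cgan_generator, tisn, fit_curve):
    """x: a single OCT B-scan as a 2D float array; the callables are assumed
    to wrap the trained cGAN generator, the TISN, and a curve-fitting routine."""
    # Stage 1: cGAN pre-segmentation y - pixels above the shallowest interface
    # are driven to 0 (black), pixels at and below it are retained.
    y = cgan_generator(x)

    # Stage 2: the TISN consumes both the original image and the pre-segmentation
    # (a two-channel input) and labels each pixel as background (0) or
    # foreground (1, at and below the interface).
    m = tisn(np.stack([x, y], axis=0))
    foreground = (m > 0.5)

    # The shallowest interface is the topmost foreground pixel in each A-scan
    # (image column); a curve is then fitted to those boundary points.
    boundary_rows = foreground.argmax(axis=0)
    return fit_curve(np.arange(x.shape[1]), boundary_rows)
```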

Figure 3: Our proposed approach contains two frameworks: a cascaded framework (purple) and a hybrid framework (orange). First, a conditional Generative Adversarial Network (cGAN) takes an input OCT image and produces an intermediate pre-segmentation image. In the pre-segmentation, pixels just prior to the shallowest tissue interface are set to 0 (black), while the others are retained. In the cascaded framework, the pre-segmentation, along with the input image, is passed to a Tissue Interface Segmentation Network (TISN). The TISN predicts the location of the shallowest interface by generating a binary segmentation mask (overlaid on the original image with a false color overlay; red - foreground, turquoise - background). In the hybrid framework, the pre-segmentation can be utilized by other segmentation algorithms. Ultimately, both frameworks fit a curve to the interface to produce the final segmentation.

2.2 Architecture

We first describe the neural network architecture that was used as the base for both the cGAN (generator) and the TISN. As mentioned in Sec. 1, images of the anterior segment of the eye acquired using OCT contain low SNR, strong specular artifacts, and faintly discernible interfaces that are corrupted by speckle noise patterns. In our previous work [51], we showed that the CorNet architecture captures faintly visible features across multiple scales. It produced state-of-the-art results on corneal datasets acquired using different OCT systems and scan protocols. The errors were 2× lower than non-proprietary state-of-the-art segmentation algorithms, including traditional image analysis-based [11, 19] and deep learning-based approaches [48, 75, 73].

The CorNet architecture was built upon the BRUNET architecture [73], and enhanced the reuse of features generated in the network through residual connections [81], dense connections [82], and dilated convolutions [77, 78, 80]. It alleviated the vanishing gradient problem, and prevented the holes in the segmentation generated by current deep learning-based approaches [75, 48, 73]. It could accurately extract poorly defined corneal interfaces, such as the Endothelium, which is very common in anterior segment OCT imaging [51].

As shown in Fig. 4, the CorNet architecture comprised contracting and expanding branches; each branch consisted of a building block, which was inspired by the Inception block [80], followed by a bottleneck block. The building block extracted features related to edges and boundaries at different resolutions. The bottleneck block compactly represented the salient attributes, and these properties (even from earlier layers) were encouraged to be reused throughout the network. Thereby, faint tissue boundaries essential to our segmentation task were distinguished from speckle noise patterns, and pixels corresponding to the tissue interface and those below it were correctly predicted. In addition, extensive experiments were conducted in [51] to determine the right feature selection mechanisms [86, 87, 85, 48, 84] for segmentation, such as max-pooling [84] for downsampling and nearest-neighbor interpolation followed by a 3×3 convolution [85] for upsampling.
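The following PyTorch sketch conveys the flavor of such a building block: parallel (dilated) convolution branches fused by a 1×1 bottleneck with a residual connection. The channel counts, dilation rates, and layer ordering are illustrative assumptions and do not reproduce the published CorNet configuration.

```python
import torch
import torch.nn as nn

class BuildingBlock(nn.Module):
    """Illustrative Inception-style block with a dilated branch, a bottleneck,
    and a residual connection; not the exact CorNet building block."""
    def __init__(self, channels):
        super().__init__()
        # Parallel branches capture boundary features at different receptive fields.
        self.branch1 = nn.Sequential(nn.Conv2d(channels, channels, 3, padding=1),
                                     nn.BatchNorm2d(channels), nn.ReLU(inplace=True))
        self.branch2 = nn.Sequential(nn.Conv2d(channels, channels, 3, padding=2, dilation=2),
                                     nn.BatchNorm2d(channels), nn.ReLU(inplace=True))
        # A 1x1 bottleneck compactly fuses the concatenated branch outputs.
        self.bottleneck = nn.Conv2d(2 * channels, channels, 1)

    def forward(self, x):
        fused = self.bottleneck(torch.cat([self.branch1(x), self.branch2(x)], dim=1))
        return fused + x  # residual connection encourages feature reuse
```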

Figure 4: The CorNet model is the base architecture used for training both the cGAN and TISN. The input to the cGAN is a two-channel image, the input OCT image and binary mask (see Sec. 3.1.2), and the output is a pre-segmented OCT image (orange box). The TISN gets a two-channel input (magenta and orange boxes), and the output is a binary mask (yellow box). The dark green blocks in the contracting path represent downsampling operations, while the blue blocks constitute upsampling computations. This model uses residual and dense connections to efficiently pre-segment the OCT image, and predict the location of the shallowest interface in the final output. The light blue module at the bottom of the model did not upsample feature maps, instead it functioned as a bottleneck to create outputs with the same size as those from the last layer.

2.3 Conditional Generative Adversarial Network (cGAN)

2.3.1 Original cGAN

Conditional Generative Adversarial Networks [70] are currently popular choices for image-to-image translation tasks, such as image super-resolution and painting style transfer. In these tasks, the cGAN learns to generate an output by being introduced to (conditioned on) an input image. The cGAN framework consists of two entities: a Generator (G) and a Discriminator (D). The generator G takes an input image x and a random noise vector z, and generates a prediction G(x, z) that is similar to the desired gold standard output y. Next, the input x is paired with y and with G(x, z), thereby creating two pairs of images: the true gold standard pair (x, y) and the predicted pair (x, G(x, z)). Then, the discriminator D identifies the pair that most accurately represents the desired gold standard output. These two entities are trained in conjunction, such that they compete with each other; G tries to fool D by producing an output that closely resembles the gold standard, while D tries to improve its ability to distinguish the two pairs of images.

Initially, G generates a prediction that poorly resembles y. It learns to produce more realistic predictions by minimizing the objective function shown in Eq. (1). On the other hand, D tries to maximize this objective by accurately distinguishing the generated prediction G(x, z) from the true gold standard y. The objective function comprises two losses: L_cGAN in Eq. (2) and L_L1 in Eq. (3), with λ being a hyper-parameter. The L1 loss penalizes regions in the generated output that differ from the ground truth image provided, thereby making the loss a “structured” loss [70]. It forces the output of the generator to be close to the ground truth in the L1 sense. This loss proved to result in less blurry outputs when compared against the original GAN formulation [69]. The PatchGAN [70] discriminator was employed to output the probability of a pair of images being real or fake.

G* = arg min_G max_D L_cGAN(G, D) + λ L_L1(G)    (1)
L_cGAN(G, D) = E_{x,y}[log D(x, y)] + E_{x,z}[log(1 − D(x, G(x, z)))]    (2)
L_L1(G) = E_{x,y,z}[ ||y − G(x, z)||_1 ]    (3)

Directly transferring the full cGAN implementation with the objective in Eq. (1) to our OCT datasets resulted in checkerboard artifacts [85] in the generated predictions. Moreover, as shown in Fig. 5, parts of the tissue boundary that needed to be preserved were removed instead. From our experiments, we made two empirical observations: 1) The U-Net generator architecture [75] that was utilized in the cGAN paper [70] created checkerboard artifacts in the generated pre-segmentation and did not preserve tissue boundaries correctly; it has been shown in prior work [85, 73, 51] that the original U-Net implementation is not the optimal choice; 2) The L1 loss in Eq. (3) penalizes all pixels in the image equally.



Figure 5: Comparison of the pre-segmentations generated by the U-Net architecture used in the original cGAN implementation [70] against those generated by the CorNet architecture [51]. Original B-scans from two different limbal datasets are shown in (a) and (e) respectively, the pre-segmentations from the cGAN U-Net are shown in (b) and (f), and the pre-segmentations from the CorNet are shown in (c) and (g). Note that in (b) and (f), the U-Net did not remove the speckle patterns above the shallowest tissue interface, while also encroaching upon the tissue boundaries without preserving them accurately. (d) and (h) show heat maps of the difference between the original and the CorNet pre-segmented OCT B-scans.

2.3.2 Modified cGAN with Weighted Loss

The required output of the cGAN is a pre-segmented OCT image, wherein the background pixels just prior to the shallowest tissue interface are to be eliminated, and the region at and below the interface is to be preserved. As mentioned before, the L1 loss in Eq. (3) equally penalizes all pixels in the image without imparting a higher penalty to the background pixels above the shallowest tissue interface, which contain the specular artifacts and speckle noise patterns hindering segmentation. To mitigate this problem, a novel task-specific weighted loss, defined in Eq. (4), is proposed in this paper. In Eq. (4), ⊙ denotes the pixel-wise product, and λ_w is the hyper-parameter that imparts a higher weight to the background pixels over the foreground pixels.

L_wL1(G) = E_{x,y,z}[ ||(1 + (λ_w − 1) M) ⊙ (y − G(x, z))||_1 ]    (4)

As the preservation of pixels at and below the interface is paramount, our loss function incorporated a binary mask M, which imparted different weights to the foreground and background pixels. This mask was generated from the gold standard annotation of an expert grader for each image in the training dataset, and its design is further described in Sec. 3.1.2. We replaced the L1 loss in Eq. (1) with our weighted loss in Eq. (4), and this eliminated the speckle patterns and specular artifacts just prior to the shallowest interface.
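A minimal PyTorch rendition of such a weighted L1 term is given below. The exact way λ_w enters the weighting (here, background pixels are penalized λ_w times more than foreground pixels) is an assumption consistent with the description above, not the authors' released code.

```python
import torch

def weighted_l1_loss(pred, target, background_mask, lambda_w=10.0):
    """Weighted L1 loss in the spirit of Eq. (4).

    background_mask: 1 for pixels above the (shifted) interface annotation,
                     0 for pixels at and below it.
    Assumption: background pixels receive a weight of lambda_w, foreground
    pixels a weight of 1.
    """
    weights = 1.0 + (lambda_w - 1.0) * background_mask
    return torch.mean(weights * torch.abs(target - pred))
```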

2.4 Tissue Interface Segmentation Network (TISN)

As mentioned in Sec. 2.2, the CorNet architecture was used as the base model in order to segment the shallowest tissue interface. The intermediate pre-segmented OCT image from the cGAN, along with the original OCT image, is passed to the TISN to delineate the shallowest tissue interface. The output of the TISN is a binary mask, wherein pixels corresponding to the tissue interface and those below it were labeled as the foreground (1) and those above the interface were labeled as the background (0). As shown in Figs. 3 and 4, the shallowest interface was extracted from this binary mask [95] and fitted with a curve [96].
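A simple CPU-side sketch of this extraction and curve-fitting step is shown below. It substitutes a LOWESS fit from statsmodels for the GPU-based extraction [95] and the curve fitting [96] used in the paper, so the routine and its parameters are assumptions.

```python
import numpy as np
from statsmodels.nonparametric.smoothers_lowess import lowess

def extract_and_fit_interface(mask, frac=0.2):
    """mask: binary TISN output (rows x cols), 1 = tissue, 0 = background.

    Returns a smoothed row coordinate of the shallowest interface for every
    A-scan (column); a CPU stand-in for the pipeline described in Sec. 2.4.
    """
    cols = np.arange(mask.shape[1])
    rows = mask.argmax(axis=0).astype(float)   # topmost foreground pixel per column
    valid = mask.any(axis=0)                   # skip columns with no tissue pixels
    smoothed = lowess(rows[valid], cols[valid], frac=frac, return_sorted=False)
    fitted = np.full(mask.shape[1], np.nan)
    fitted[valid] = smoothed
    return fitted
```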

3 Experiments and Results

3.1 Data

3.1.1 Acquisition

25 corneal datasets and 25 limbal datasets, totaling 50 datasets, were randomly selected from an existing research database [51]. These datasets were acquired using different scan protocols from three different OCT scanners: a custom Bioptigen Spectral Domain OCT (SD-OCT) scanner (Device 1) that has been described before [88], a high-speed ultra-high resolution OCT (hsUHR-OCT) scanner (Device 2) [89], and a Leica (formerly Bioptigen) Envisu C2300 SD-OCT system (Device 3) [90]. Device 1 was used to scan a 6×6 mm area on the cornea. Device 2 was used to scan two areas of sizes 6×6 mm and 3×3 mm respectively; this system had a 1.3 μm axial and a 15 μm lateral spacing while interrogating the 6×6 mm tissue area, and the same axial spacing but a finer lateral spacing of 7.5 μm while imaging the 3×3 mm area. Device 3 had a 2.44 μm axial and a 12 μm lateral spacing when fitted with the 18 mm anterior imaging lens. Devices 1 and 2 were solely used to scan the cornea, with the former producing datasets of dimensions 1024×1000×50 pixels, and the latter generating datasets of dimensions 400×1024×50 pixels. Devices 2 and 3 were used to scan the limbus, resulting in volumes of varying dimensions; the number of A-scans across all limbal datasets varied between 256 and 1024, with a constant axial depth of 1024 pixels, and the number of B-scans across all datasets varied between 25 and 375.

3.1.2 Data Preparation

From the 50 datasets, we had a total of 1250 corneal images and 4437 limbal images. Of the 50 corneal and limbal datasets, 14 were randomly chosen for training the cGAN, and the remaining were used for testing. These datasets were chosen such that they came from both eyes; the number of patients that were imaged could not be ascertained as the database contained de-identified datasets. From the total set, we chose the training set to comprise a balanced number of limbal and corneal datasets (7 each) that exhibited different magnitudes of specular artifacts, shadowing, and speckle. The training set contained 350 corneal and 1382 limbal images respectively, and the remaining images were set aside in the testing set. Considering the varying dimensions of the OCT images acquired from the three OCT systems used in this work, along with the limited GPU RAM available for training, it was challenging to train a framework using full-width images while preserving the pixel resolution. Similar to previous approaches [48, 51], we sliced the input images width-wise to produce a set of images of dimensions 256×1024 pixels, and in this way, we preserved the OCT image resolution. We used the same datasets that were selected in the training set for training both the cGAN and the TISN.

An example annotation by an expert grader is shown in Fig. 6(a). To generate the gold standard pre-segmentation images for training, we eliminated the speckle noise and specular artifacts by setting the region just above the annotated surface to 0 (black), and kept the original pixel intensities corresponding to the tissue structure at the annotation contour and for all pixels below it - see Fig. 6(b). The binary mask M that was used in Eq. (4) is shown in Fig. 6(c). Using the image in Fig. 6(d) as reference, we detail the process of obtaining M. In Fig. 6(d), the original annotation of the tissue interface boundary by the grader is shown in red, and this red annotated contour was shifted down by 50 pixels to the position of the magenta contour. The magenta contour, along with the blue region below the contour, was considered the foreground, while all pixels above the magenta contour belong to the background. The background in the binary mask was set to 1 and the foreground was set to 0, with the background being weighted λ_w times higher than the foreground.

In order to understand the effect of the proposed mask design, let us consider an alternate binary mask design M′. Let M′ represent the mask of the expert annotation in Fig. 6(a), wherein the pixels above the annotation (without shifting it down) are the background and those at and below the annotation are the foreground, with the background weighted λ_w times higher than the foreground. When the cGAN used this mask M′, it mistakenly eroded the tissue interface and the regions below it, similar to the image in Fig. 5(b). In such a scenario, no large penalty is applied to the erosion of these pixels by Eq. (4). In order to correct this mistake, it would be necessary to impart a higher penalty to the region that was eroded. To do so, we measured the maximum extent of structural erosion (at the tissue interface and/or the pixels below it) from the shallowest interface in the U-Net pre-segmentation outputs. Using this value (rounded up to the nearest multiple of 10), we shifted the expert annotation down (by 50 pixels) in our binary mask M, and conferred the same weight to the regions (green + red + gray) to avoid the erosion of the tissue interface.
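The following NumPy sketch summarizes this target-generation procedure. The array conventions and function name are illustrative assumptions; only the 50-pixel shift is taken from the text.

```python
import numpy as np

def make_training_targets(image, boundary_rows, shift=50):
    """image: OCT B-scan (H x W); boundary_rows: expert-annotated interface
    row for each column (length W). A sketch of the target generation in
    Sec. 3.1.2, not the authors' code."""
    h, w = image.shape
    rows = np.arange(h)[:, None]  # (H, 1) row index grid for broadcasting

    # Gold-standard pre-segmentation: black out everything strictly above the
    # annotated interface, keep the interface and all pixels below it.
    presegmentation = np.where(rows < boundary_rows[None, :], 0, image)

    # Binary mask for Eq. (4): the annotation is shifted down by `shift` pixels;
    # pixels above the shifted contour are background (1), pixels at and below
    # it are foreground (0).
    background_mask = (rows < (boundary_rows + shift)[None, :]).astype(np.float32)
    return presegmentation, background_mask
```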



Figure 6: (a) Expert annotation of an original B-scan in a 6×6 mm limbal volume acquired by Device 3, (b) Gold standard pre-segmentation image for training, (c) Binary mask used in Eq. (4) for training the cGAN, (d) Label map detailing the process of generating the binary mask M (see Sec. 3.1.2).

3.1.3 Data Augmentation

As our training datasets were smaller than those typically available in computer vision tasks, such as image recognition [91], we augmented our datasets to increase the variety of the images seen during training. These augmentations [92] included horizontal flips, gamma adjustment, elastic deformations, Gaussian blurring, median blurring, bilateral blurring, Gaussian noise addition, cropping, and affine transformations. The full set of augmented images was used to train the TISN as it required substantially larger amounts of data to generalize to new test inputs. On the other hand, the cGAN can be trained with smaller quantities of input training data as it has been shown to perform well on small training datasets [70]. For the cGAN, augmentation was done by simply flipping each input slice horizontally.
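Two of these augmentations, the horizontal flip used for the cGAN and a random gamma adjustment, are sketched below in NumPy; the gamma range is an illustrative assumption.

```python
import numpy as np

def augment_horizontal_flip(image, mask):
    """Mirror a B-scan and its label mask left-right (the only augmentation
    used for the cGAN)."""
    return np.fliplr(image), np.fliplr(mask)

def augment_gamma(image, gamma_range=(0.7, 1.4), rng=np.random.default_rng()):
    """Random gamma adjustment of an image scaled to [0, 1]; the gamma range
    here is assumed for illustration."""
    gamma = rng.uniform(*gamma_range)
    return np.clip(image, 0.0, 1.0) ** gamma
```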

3.2 Experimental Setup

3.2.1 cGAN Training

Training of the cGAN commenced from scratch using the architecture shown in Fig. 4. The input to the generator was a two-channel image; the first channel corresponds to the input OCT image, and the second channel corresponds to the binary mask M. We used λ = 100 and λ_w = 10 in the final objective function, and optimized the network parameters using the ADAM optimizer [93]. We used 90% of the input data for training, and the remaining 10% for validation. We trained the network for 100 epochs. In order to prevent the network from over-fitting to the training data, early stopping was applied when the validation loss did not decrease for 10 epochs. At the last layer of the generator, a convolution operation, followed by a TanH activation, was used to convert the final feature maps into the desired output pre-segmentation with pixel values mapped to the range of [-1, 1]. A NVIDIA Tesla V100 16GB GPU was used for training the cGAN with a batch size of 4. During test time, the input OCT image is replicated to produce a two-channel input to the cGAN.
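A condensed sketch of one adversarial training step consistent with Eqs. (1)-(4) is given below. The optimizer handling, the PatchGAN label convention, and the way the weighted L1 term is combined with the adversarial loss are assumptions rather than the authors' exact training code.

```python
import torch
import torch.nn.functional as F

def cgan_train_step(G, D, opt_G, opt_D, x, y, background_mask,
                    lam=100.0, lambda_w=10.0):
    """One adversarial update; x is the two-channel generator input, y the
    gold-standard pre-segmentation, D is assumed to output patch logits."""
    # Discriminator update: real pairs -> 1, generated pairs -> 0.
    with torch.no_grad():
        y_fake = G(x)
    d_real = D(torch.cat([x, y], dim=1))
    d_fake = D(torch.cat([x, y_fake], dim=1))
    loss_D = (F.binary_cross_entropy_with_logits(d_real, torch.ones_like(d_real)) +
              F.binary_cross_entropy_with_logits(d_fake, torch.zeros_like(d_fake)))
    opt_D.zero_grad(); loss_D.backward(); opt_D.step()

    # Generator update: fool D and stay close to y under the weighted L1 term.
    y_fake = G(x)
    d_fake = D(torch.cat([x, y_fake], dim=1))
    adv = F.binary_cross_entropy_with_logits(d_fake, torch.ones_like(d_fake))
    weights = 1.0 + (lambda_w - 1.0) * background_mask   # assumed weighting scheme
    wl1 = torch.mean(weights * torch.abs(y - y_fake))
    loss_G = adv + lam * wl1
    opt_G.zero_grad(); loss_G.backward(); opt_G.step()
    return loss_D.item(), loss_G.item()
```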

3.2.2 TISN Training

The same datasets from cGAN training were used for training the TISN from scratch. The input to the TISN is a two-channel image; the first channel corresponds to the original input image, and the second channel corresponds to the predicted pre-segmentation obtained from the cGAN. The two-channel input allowed the TISN to focus on the high frequency regions, corresponding to the interface, in the image. The Mean Squared Error (MSE) loss, along with the ADAM optimizer [93], was used for training. In this work, we used the MSE loss to be consistent with the original CorNet implementation [51], but it can easily be substituted with the cross-entropy loss [75] or the Dice loss [94]. The batch size used for training was set to 2 slices, as we wanted to fully utilize the memory of a NVIDIA Titan Xp GPU. Validation data comprised 10% of the training data. We trained the network for a total of 150 epochs. When the validation loss did not improve for 5 epochs, the learning rate was decreased by a factor of 2. Finally, in order to prevent over-fitting, the training of the TISN was halted through early stopping when the validation loss did not improve for 10 consecutive epochs.

The feature maps in the final layer of the network are activated using the softmax function to produce a two-channel output. Once the network was trained, it was used to segment the shallowest interface in our testing datasets. At test time, the TISN yielded a two-channel output; the first channel corresponded to the foreground tissue segmentation, and the second channel corresponded to the background pixel segmentation (above the tissue interface). The foreground pixels corresponded to the boundary of the interface and those pixels below it, while the pixels above the tissue boundary denoted the background. Finally, the predicted segmentation was fitted with a curve [96] after the tissue interface was identified using a fast GPU-based method [95]. We show our final results in Figs. 7, 8 and 15 along with the supplementary video visualizations.

Figure 7: Corneal interface segmentation results for datasets acquired using Devices 1 and 2. Columns from left to right: (a) Original B-scans in corneal OCT datasets, (b) Pre-segmented OCT images from the cGAN with the specular artifact and speckle noise patterns removed just prior to the shallowest tissue interface, (c) Binary segmentation from the TISN overlaid in false color (red - foreground, turquoise - background) on the original B-scan, (d) Curve fit to the shallowest interface (red contour).
Figure 8: Limbal interface segmentation results for datasets acquired using Devices 2 and 3. Columns from left to right: (a) Original B-scans in the limbal OCT datasets, (b) Pre-segmented OCT images from the cGAN with the specular artifact and speckle noise patterns removed above the shallowest tissue interface, (c) Binary segmentation from the TISN overlaid in false color (red - foreground, turquoise - background) on the original B-scan, (d) Curve fit to the shallowest interface (red contour).

3.3 Baseline Comparisons

Extensive evaluation of the performance of our approach was conducted across all the testing datasets. First, we wanted to investigate the accuracy of a traditional image analysis-based algorithm [20] that directly segmented the interface in our test datasets. Briefly, this algorithm filtered the OCT image to reduce speckle noise and artifacts, extracted the monogenic signal [97], and segmented the tissue interface. We denote this baseline in the rest of the paper by the acronym: Traditional WithOut Pre-Segmentation (TWOPS).

Second, we designed a hybrid framework, where the pre-segmented OCT image from the cGAN is used by the traditional image analysis-based algorithm [20] to segment the shallowest interface. We wanted to determine the improvement in segmentation accuracy when the traditional algorithm used the pre-segmentation instead of the original OCT image. Going forward, we denote this baseline by the acronym: Traditional With Pre-Segmentation (TWPS).

Third, we trained a CorNet architecture [51] to directly segment the foreground in the input OCT image, without including the cGAN pre-segmentation as an additional input channel. We compared this direct segmentation result against our cascaded framework. Henceforth, we refer to the direct deep learning-based segmentation approach by the acronym: Deep Learning WithOut Pre-Segmentation (DLWOPS). Finally, we refer to our cascaded framework as: Deep Learning With Pre-Segmentation (DLWPS).

To summarize, the following baseline methods were considered for performance evaluation:

  1. TWOPS - A traditional image analysis-based algorithm [20] that directly segmented the tissue interface.

  2. TWPS - The hybrid framework.

  3. DLWOPS - A deep learning-based approach [51] that directly segmented the tissue interface.

  4. DLWPS - The cascaded framework.

3.4 Evaluation

3.4.1 Annotation

Each corneal dataset was annotated by an expert grader (G1; Grader 1) and a trained grader (G2; Grader 2). However, only expert annotations were available for the limbal datasets in the research database. The graders were asked to annotate the shallowest interface in all test datasets. For each dataset, the graders annotated the interface using a 5-pixel width band with an admissible annotation error of 3 pixels. All the annotations were fitted with a curve for comparison with the different baselines. We also estimated the inter-grader annotation variability for the corneal datasets, and refer to it in the rest of the paper by the acronym: IG.

3.4.2 Metrics

In order to compare the segmentation accuracy across the different baselines, we calculated the following metrics: 1) Mean Absolute Difference in Layer Boundary Position (MADLBP) and 2) Hausdorff Distance (HD) between the fitted curves. These metric values were determined over all testing datasets, and only for the shallowest interface. In Eqs. (5) and (6), the sets of points that represent the gold standard annotation and the segmentation to which it is compared (each fitted with curves) are denoted by $G$ and $S$ respectively. We denote by $y_G(x)$ the Y-coordinate (rounded down after curve fitting) of the point in $G$ whose X-coordinate is $x$, and $y_S(x)$ is the Y-coordinate (rounded down) of the point in $S$ with the same X-coordinate. $d(g, S)$ is the distance of a point $g \in G$ to the closest point in $S$, and $d(s, G)$ is defined similarly for a point $s \in S$.

We chose MADLBP in Eq. (5) as one of our error metrics since it was used in [20] to compare the segmentation accuracy between the automatic segmentations and grader annotations. Although MADLBP quantifies error in pixels, it does not measure the Euclidean distance error; instead, it simply measures the positional difference between the detected boundary location and the annotation along the same A-scan. The Hausdorff distance in Eq. (6), on the other hand, captures the greatest of all distances between the points in the segmentation and the annotation. It therefore quantifies the worst segmentation error in microns, which is more clinically relevant (e.g., for detecting structural changes over time). In this work, we did not compute the Dice similarity as it does not provide the segmentation error in microns.

$\mathrm{MADLBP} = \frac{1}{N} \sum_{x=1}^{N} \left| y_G(x) - y_S(x) \right|$  (5)
$\mathrm{HD} = \max\left\{ \max_{g \in G} d(g, S),\ \max_{s \in S} d(s, G) \right\}$  (6)
where $N$ is the number of A-scans (image columns) over which the fitted curves are compared.
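A minimal NumPy/SciPy sketch of these two metrics is given below; the function names and the pixel-to-micron scaling argument are our own, and the curves are assumed to be supplied as fitted point sets.

```python
import numpy as np
from scipy.spatial.distance import cdist

def madlbp(y_gold, y_seg):
    """Eq. (5): mean absolute difference in boundary position (pixels) along
    each A-scan, between the fitted gold standard and predicted curves
    (arrays of per-column y-values, rounded down per the definition above)."""
    return np.mean(np.abs(np.floor(y_gold) - np.floor(y_seg)))

def hausdorff(curve_g, curve_s, microns_per_pixel=1.0):
    """Eq. (6): symmetric Hausdorff distance between the two fitted curves,
    given as N x 2 arrays of (x, y) points. The pixel-to-micron scaling
    factor is an assumption that depends on the scanner."""
    d = cdist(curve_g, curve_s)                         # pairwise Euclidean distances
    hd = max(d.min(axis=1).max(), d.min(axis=0).max())  # greatest closest-point distance
    return hd * microns_per_pixel
```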

In Fig. 9, the HD and MADLBP errors across all baselines are compared for the corneal datasets acquired from Devices 1 and 2. In Fig. 10, the benefit of pre-segmenting the OCT image was verified by first grouping the baselines into two categories - Traditional Comparison (TC; TWOPS vs TWPS) and Deep Learning Comparison (DLC; DLWOPS vs DLWPS) - and then contrasting the maximum HD error per dataset for each category and for each grader. We also determined the HD and MADLBP errors across the limbal datasets in Figs. 11 and 14. Again, in Fig. 12, we estimated the benefit of pre-segmenting the limbal datasets by grouping the baselines into the two categories, TC and DLC, and comparing the maximum HD error per dataset for each category. Moreover, we found a few instances where our cascaded framework failed to correctly segment the tissue interface, as seen in Fig. 12 (results after the red vertical line).

Figure 9: (a)-(c) HD error and (d)-(f) MADLBP error comparison for the corneal datasets acquired with Devices 1 and 2 respectively. In the boxplots, the segmentation results obtained for each baseline method are contrasted against expert grader (blue) and trained grader (red) annotations, while the Inter-Grader (IG) variability is shown in yellow.
Figure 10: Quantitative estimation of the benefit of pre-segmenting the corneal OCT image. All the baselines were grouped into two categories: Traditional Comparison (TC; TWOPS vs TWPS), and Deep Learning Comparison (DLC; DLWOPS vs DLWPS). The first column corresponds to the former, and the second column to the latter. For each corneal test dataset, the image with the maximum HD error was found over all images in the sequence, and its location in the sequence was stored. This was done only for the TWOPS and DLWOPS baselines respectively. The stored location indices were then used to retrieve the corresponding HD errors from the TWPS and DLWPS baselines respectively. This procedure was repeated for each grader and plotted. G1: without pre-segmentation (purple curve), with pre-segmentation (black curve). G2: without pre-segmentation (yellow curve), with pre-segmentation (gray curve).
Figure 11: (a)-(b) HD error and (c)-(d) MADLBP error comparison for the limbal datasets acquired with Devices 2 and 3 respectively. For the limbal datasets, the segmentation results obtained for each baseline method were contrasted exclusively against the expert annotations (G1). This graph plots the errors across all limbal datasets, including the failure cases. In contrast to Fig. 14, note the increased segmentation error in the DLWPS baseline due to imprecise pre-segmentations.
Figure 12: Quantitative estimation of the benefit of pre-segmenting the limbal OCT image. All the baselines were grouped into two categories: TC (TWOPS vs TWPS), and DLC (DLWOPS vs DLWPS). The first column corresponds to the former, and the second column to the latter. For each test dataset, the image with the maximum HD error was found over all images in the sequence, and its location in the sequence was stored. This was done only for the TWOPS and DLWOPS baselines respectively. The stored location indices were then used to retrieve the corresponding HD errors from the TWPS and DLWPS baselines respectively. This procedure was done only for the expert grader and plotted. G1: without pre-segmentation (purple curve), with pre-segmentation (black curve). Errors shown after the red vertical line correspond to the failure cases of our approach.

4 Discussion

4.1 Segmentation Accuracy of Corneal Interface

From the HD and MADLBP errors in Fig. 9, the error is worst for the TWOPS baseline method, where the traditional algorithm [20] used the original OCT image (without the pre-segmentation) to directly segment the interface. The hand-crafted features in this baseline algorithm failed to handle severe specular artifacts and noise patterns, as seen in Fig. 2. In contrast, the TWPS baseline (hybrid framework), which uses the pre-segmented image instead of the original OCT image, produced a lower segmentation error. To quantify these observations, a paired t-test between the TWOPS and TWPS baselines was computed for each error metric, and the results were statistically significant ($p$ = 4.2747e-05 and $p$ = 1.2859e-05 for the two error metrics). From these results, we concluded that the traditional algorithm fared better in the hybrid framework, i.e., when the pre-segmented OCT image was used to segment the corneal tissue interface.
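As a minimal illustration of how such a comparison could be computed, a paired t-test can be run per metric with SciPy; the variable names below are hypothetical per-dataset error arrays, aligned so that entry i of each array refers to the same test dataset.

```python
from scipy.stats import ttest_rel

# hd_twops, hd_twps, madlbp_twops, madlbp_twps: assumed per-dataset error arrays
_, p_hd = ttest_rel(hd_twops, hd_twps)
_, p_madlbp = ttest_rel(madlbp_twops, madlbp_twps)
print(f"HD: p = {p_hd:.4e}, MADLBP: p = {p_madlbp:.4e}")
```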

The DLWOPS baseline in Fig. 9 had lower HD and MADLBP errors than the TWPS baseline when compared against the expert grader annotations. However, the errors were higher against the trained grader's annotations, especially on the 3×3 mm datasets from Device 2, as seen in Figs. 9(c) and 9(f), owing to the large inter-grader variability. On the other hand, our DLWPS approach, which used the pre-segmented image, fared better than the other three baselines. Again, we computed paired t-tests between the DLWPS approach and all other baselines to quantify the improvement in segmentation accuracy for each error metric. From the $p$-values in Table 1 and Fig. 9, the cascaded framework improved upon the other baselines, with statistically significant results across all corneal datasets ($p$ < 0.05).

          TWOPS          TWPS           DLWOPS
          5.1929e-06     2.2079e-04     5.1454e-04
          2.6848e-06     1.9264e-04     2.0734e-04
Table 1: Statistical significance ($p$-values from paired t-tests) of our cascaded framework (DLWPS) against each baseline method for all the corneal datasets from Devices 1 and 2; each row corresponds to one of the two error metrics.

We also wanted to determine the improvement in segmentation accuracy on a per-image basis in each of the corneal test datasets. To do so, we first grouped the baselines into two categories: only traditional image analysis-based approaches (TWOPS vs. TWPS), and only deep learning-based approaches (DLWOPS vs. DLWPS). Next, we searched for the image in each corneal dataset that had the maximum HD error over all images in that dataset, and noted its index in the sequence. This was done only for the TWOPS and DLWOPS baselines respectively, and we plotted these maximum HD errors for each grader in Fig. 10 (purple and yellow curves). Then, we queried the errors for the same images (using the image indices) in the TWPS and DLWPS baselines respectively, and plotted the corresponding HD errors for each grader in Fig. 10 (black and gray curves). From Fig. 10, we noted that the baselines incorporating the pre-segmented OCT image performed better than those that did not include the pre-segmentation. The pre-segmentation always improved the segmentation performance of the traditional image analysis-based approach when incorporated into the hybrid framework, and also improved the accuracy of the deep learning-based approach in a majority of corneal datasets when used in the cascaded framework. This quantitatively attests to the benefit of utilizing the pre-segmented OCT image as part of a segmentation framework.
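The per-dataset pairing procedure above can be summarized by the following sketch, assuming the per-image HD errors of each baseline are available as arrays (one array per dataset); the function and variable names are illustrative.

```python
import numpy as np

def paired_max_hd(errors_without, errors_with):
    """For each test dataset, find the B-scan with the maximum HD error in the
    baseline that lacks the pre-segmentation, then look up the error of the
    same B-scan in the baseline that uses it (cf. Figs. 10 and 12).
    Inputs: lists of per-image HD error arrays, one array per dataset."""
    max_without, matched_with = [], []
    for err_wo, err_w in zip(errors_without, errors_with):
        idx = int(np.argmax(err_wo))       # image index with the worst error (no pre-segmentation)
        max_without.append(err_wo[idx])
        matched_with.append(err_w[idx])    # error of the same image when the pre-segmentation is used
    return np.array(max_without), np.array(matched_with)
```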

4.2 Segmentation Accuracy of Limbal Interface

We plotted the segmentation errors for the baseline methods executed on the limbal datasets in Figs. 11, 12 and 14. In Fig. 11, we plotted the errors across all limbal test datasets, including the instances when the cascaded and hybrid frameworks failed to accurately segment the shallowest interface. In Fig. 14, we plotted the errors only for the successful instances of interface segmentation. From Figs. 11 and 14, the error for the TWOPS baseline was the worst amongst all baselines, as it failed to handle strong specular artifacts and severe speckle noise. On the other hand, the TWPS baseline fared better, with lower errors than the TWOPS baseline. Similar to Sec. 4.1, we also assessed the improvement in segmentation accuracy on a per-image basis for each of the 18 limbal datasets, and plotted these errors in Fig. 12. From the errors (after the red vertical dashed line) in Figs. 12(a) and 12(c), the hybrid framework (TWPS baseline) was able to reduce the segmentation error even with an incorrect OCT image pre-segmentation. Therefore, the incorporation of the pre-segmented OCT image in the hybrid framework led to lower errors for the traditional image analysis-based approach.

The DLWOPS baseline had lower errors than the TWOPS and TWPS baselines, as shown in Figs. 11 and 14. But at an image level, it sometimes yielded higher segmentation errors, as seen in Figs. 12(b) and 12(d). On the other hand, the DLWPS baseline (cascaded framework) improved the segmentation error in a majority of the datasets, with the exception of three datasets, which are our failure cases. As shown in Fig. 13, two of these datasets presented with saturated tissue regions that were washed out by specular artifacts. The third dataset contained regions where the interface was barely visible, as it was obfuscated by speckle noise of the same amplitude. For these reasons, the incorrect pre-segmented OCT image degraded the segmentation performance of the TISN, and consequently the segmentation errors of the TWPS (hybrid framework) and DLWPS (cascaded framework) baselines increased. As seen in Fig. 12 (after the red vertical dashed line), the DLWOPS baseline performed the best among all baselines for these datasets.

We expound on the aforementioned reasons for segmentation failure. First, the contextual information available to the cGAN to remove the speckle noise patterns and specular artifacts is hindered when the pixel intensities on the tissue interface are either washed out due to saturation of the line scan camera [11, 20, 51], as shown in Fig. 13(a) (top two rows), or blend in with the background and specular artifacts of the same amplitude [11], as seen in Fig. 13(a) (bottom). In such outlier cases, the boundary becomes difficult to delineate across multiple scales through the downsampling and upsampling operations in the encoder and decoder blocks, such that even the dilated convolutions and dense connections employed in the network are insufficient to recover context from surrounding boundary regions when localizing the interface.

Second, the TISN over-relied on the pre-segmentation in order to generate the final segmentation. During training of the TISN, the original image was coupled with the gold standard pre-segmentation output (see Fig. 6) into a two-channel input. The TISN learned that the tissue boundary in the gold standard pre-segmentation marked the start of the true boundary. However, the TISN was not trained with gold standard pre-segmented images that were artificially corrupted with noise, such as the images shown in Fig. 13(b). Hence, the performance of the TISN on such incorrectly pre-segmented OCT images was poor.

One way to address this issue is to re-train the framework with gold standard pre-segmentations that have corrupted boundaries. In this pilot work, we did not introduce any corruption to the gold standard pre-segmentation used during training as we wanted to directly measure the performance of the TISN when provided with a pre-segmentation from the cGAN (without regard to any imprecise pre-segmentation). Another option is to exploit the temporal correlation between B-scans in the dataset through recurrent neural networks, which retain long-term information in memory in order to deal with such challenging datasets. We intend to pursue these ideas in our future work.
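As a rough illustration of the first idea, the sketch below corrupts a gold standard pre-segmentation by randomly eroding its boundary. This augmentation was not implemented in this work; the perturbation scheme and parameter values are hypothetical.

```python
import numpy as np

def corrupt_presegmentation(preseg, max_shift=15, rng=None):
    """Illustrative augmentation (not used in this work): randomly erode the
    visible boundary in a gold standard pre-segmented B-scan so that a TISN
    trained on it learns not to over-rely on a perfect pre-segmentation.
    `max_shift` is a hypothetical parameter (in pixels)."""
    if rng is None:
        rng = np.random.default_rng()
    corrupted = preseg.copy()
    for x in range(preseg.shape[1]):                  # iterate over A-scans (columns)
        ys = np.flatnonzero(preseg[:, x] > 0)
        if ys.size == 0:
            continue
        shift = int(rng.integers(0, max_shift + 1))   # random per-column boundary perturbation
        corrupted[ys[0]:ys[0] + shift, x] = 0         # push the visible boundary deeper
    return corrupted
```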

In this work, we set aside these three challenging failure cases, and estimated the improvement in segmentation accuracy across the remaining 15 limbal datasets. We conducted a paired t-test between the TWOPS and TWPS baselines for each error metric, and determined that the differences were statistically significant ($p$ = 0.0471 and $p$ = 0.0313 for the two error metrics). We also calculated paired t-tests between the DLWPS baseline and all other baselines to determine the statistical significance of our results for each error metric. As seen in Table 2, our DLWPS cascaded framework generated statistically significant results ($p$ < 0.05).

          TWOPS      TWPS       DLWOPS
          0.0240     0.0014     1.0335e-04
          0.0126     0.0012     0.0344
Table 2: Statistical significance ($p$-values from paired t-tests) of our cascaded framework (DLWPS) against each baseline method for 15 (out of 18) limbal datasets acquired from Devices 2 and 3; each row corresponds to one of the two error metrics.
Figure 13: Failure cases of our cascaded framework on three challenging limbal OCT datasets. Columns from left to right: (a) Original B-scans in the limbal OCT volumes, (b) cGAN pre-segmentation results that imprecisely removed speckle noise patterns and specular artifacts above the shallowest tissue interface, (c) The binary segmentation masks from the TISN overlaid in false color (red - foreground, turquoise - background) on the original B-scans, (d) Curve fit to the shallowest interface (red contour).
Figure 14: (a)-(b) HD error and (c)-(d) MADLBP error comparison for the limbal datasets acquired with Devices 2 and 3 respectively. For the limbal datasets, the segmentation results obtained for each baseline method were contrasted exclusively against the expert annotations (G1). These graphs plot errors for the successful segmentation results on 15 limbal test datasets.

4.3 Interface Segmentation at Limbal Junction

During imaging of the limbal region, it is common for the same dataset to contain B-scans of both the cornea and the limbus, since the scan pattern of the OCT scanner sometimes encompasses sections of both regions. Bulk tissue motion between B-scans is also customary during image acquisition. Therefore, it is crucial to capture the shallowest tissue interface of the limbus and the cornea, as it enables distinguishing between these two distinct regions. By correctly locating these interfaces, a registration algorithm can potentially align regions at and below the interfaces while compensating for bulk tissue motion. To the best of our knowledge, our approach is the first to accurately detect the shallowest corneal and limbal interface in OCT images acquired at the limbal junction, even in the presence of severe speckle noise patterns and specular artifacts. Results of our approach are shown in Fig. 15, wherein the shallowest interface is identified in B-scans that partially overlap both the cornea and the limbus.

Figure 15: Segmenting the shallowest tissue interface in OCT datasets, wherein the OCT scanner commenced imaging from the limbus and crossed over into the cornea, thereby encompassing the limbal junction. (a),(b) B-scans #1 and #300 in an OCT dataset corresponding to the limbus and the cornea respectively. (c),(d) B-scans #1 and #220 in a different OCT dataset corresponding to the limbus and the cornea respectively. (e),(f),(g),(h) Segmentation (red curve) of the shallowest tissue interface in images shown in (a),(b),(c) and (d) respectively. Note the partial overlap of the limbal (left) and corneal (right) region in the B-scan in (d), and the correct identification of the shallowest interface in (h).

4.4 Choice of Framework Design

In this work, we proposed to generate an intermediate representation of the OCT image, i.e., the cGAN pre-segmentation, that can influence the performance of a segmentation algorithm. To this end, we proposed a cascaded and a hybrid segmentation framework. However, other framework designs could be implemented instead of the proposed approaches. For example, we could have utilized a GAN directly to segment the tissue interface from the OCT image, or trained a multi-task neural network framework (CNN, GAN, etc.) to provide both the pre-segmentation and the final interface segmentation. We next reiterate the motivations that led us to the proposed frameworks over these alternative designs: 1) to generate a pre-segmentation that could be utilized in a hybrid framework, 2) to integrate the pre-segmentation into the image acquisition pipeline of custom-built OCT scanners, and 3) to incorporate the pre-segmentation in a cascaded framework and compare its segmentation performance against that of a state-of-the-art CNN-based segmentation method [51].

Utilizing a GAN to directly yield the final interface segmentation would not provide an intermediate output that can be integrated into a hybrid framework. A multi-task framework, in contrast, would provide both the pre-segmented OCT image and the final interface segmentation, and the pre-segmentation could be directly used in the hybrid framework and in the imaging pipeline. However, the final segmentation would only be influenced by the shared weights of the multi-task network, and not by the pre-segmentation itself, which may differ from the final segmentation. Thus, if the pre-segmentation must influence the final interface segmentation (as it should), it may be necessary to train an additional framework, in a cascaded fashion with the multi-task network, that takes the pre-segmentation as input. For these reasons, and in line with our motivations, we believe that our choice of framework design was warranted.

5 Conclusion

In this paper, we generated an intermediate OCT image representation that can influence the performance of a segmentation algorithm. The intermediate representation is a pre-segmentation, generated by a cGAN, wherein speckle noise patterns and specular artifacts are eliminated just prior to the shallowest tissue interface in the OCT image. We proposed two frameworks that incorporate this intermediate representation: a cascaded framework and a hybrid framework. The cascaded framework comprised a cGAN and a TISN; the cGAN pre-segmented the OCT image by removing the undesired specular artifacts and speckle noise patterns that confounded boundary segmentation, while the TISN segmented the final tissue interface by combining the original image and the pre-segmentation as inputs. In the hybrid framework, a traditional image analysis-based segmentation method exploited the cGAN pre-segmentation to generate the final tissue interface segmentation. The frameworks were trained on corneal and limbal datasets acquired from three different OCT scanners with different scan protocols. They were able to handle varying degrees of specular artifacts, speckle noise patterns, and bulk tissue motion, and delivered consistent segmentation results. We compared the results of our frameworks against those from state-of-the-art image analysis-based and deep learning-based algorithms. To the best of our knowledge, this is the first approach for OCT-based tissue interface segmentation that integrates a cGAN pre-segmentation in a hybrid fashion. We have shown the benefit of pre-segmenting the OCT image through the lower segmentation errors that resulted. Finally, we have shown the utility of our algorithm in segmenting the tissue interface at the limbal junction. We believe that the cGAN pre-segmentation output can be easily integrated into the image acquisition pipelines of custom-built OCT scanners.

Acknowledgments

We thank our funding sources: NIH 1R01EY021641, Core Grant for Vision Research EY008098-28, DOD awards W81XWH-14-1-0371 and W81XWH-14-1-0370, CMU GSA. We thank NVIDIA Corporation for their GPU donations.

Disclosures

The authors declare that there are no conflicts of interest related to this article.