1 Introduction
Optical coherence tomography (OCT) is a widely used noninvasive and nonionizing modality for retinal imaging that can acquire 3D images of the retina rapidly [8]. The depth information provided by OCT enables measurements of retinal layer thicknesses, which are known to change with certain diseases [11]. Fast, automated retinal layer segmentation tools are therefore crucial for large cohort studies of these diseases.
Automated methods for retinal layer segmentation have been well explored [12, 1]. State-of-the-art methods use machine learning (e.g., random forests (RF) [9]) for coarse pixel-wise labeling and then level set [3] or graph methods [6, 9] to guarantee the segmentation topology (i.e., the anatomically correct retinal layer ordering) and obtain the final boundary surfaces. These methods are limited by the manually selected features used for the pixel-wise labeling task and by the manually tuned parameters of the graph. To build the graph, spatially varying boundary distances and smoothness constraints must be assigned experimentally. The manually selected features and finely tuned graph parameters limit application across cohorts.

Deep learning automatically extracts relevant image features from the training data and performs segmentation in a feed-forward fashion. The fully convolutional network (FCN) proposed by Long et al. [10] is a successful deep learning segmentation method, and its U-Net variant [14] is widely used for medical image segmentation. Both Roy et al. [15] and He et al. [7] proposed FCNs for retinal layer segmentation (the former also included fluid segmentation). However, these FCN methods provide pixel-wise labeling without explicitly utilizing high-level priors such as shape, and neither guarantees the correct topology. Examples of FCNs giving anatomically infeasible results are shown in Fig. 4.
In order to obtain structured output directly from deep networks, Zheng et al. [16] implemented a conditional random field as a recurrent neural network. This method can provide better label consistency but cannot guarantee global topology. BenTaieb et al. [2] proposed to explicitly integrate topology priors into the loss function during training, and Romero et al. [13] used a second autoencoder network to learn the output shape prior. Although these methods can improve segmentation results by utilizing shape and topology priors, they still cannot guarantee the correct topology.

To obtain a topologically correct segmentation of the retinal layers from a deep network in a single feed-forward propagation, we propose a cascaded FCN framework that transforms the layer segmentation problem from pixel labeling into a boundary position regression problem. Instead of outputting the boundary positions directly, we use the network to output the distance between two boundaries, i.e., the layer thickness. The first network (S-Net) performs pixel labeling, and the second, regression network (R-Net) takes the topologically unconstrained S-Net results and outputs a layer thickness for each layer at each position. ReLU [4] activation is used as the final operation of R-Net, which guarantees the non-negativity of the output layer thicknesses. Since each boundary position is obtained by summing the corresponding non-negative layer thicknesses, the ordering of the reconstructed boundaries is guaranteed, even at the fovea where the distances between boundaries can be zero.
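As a minimal sketch of this reconstruction step (NumPy, with illustrative function and variable names rather than the exact implementation), summing non-negative thicknesses yields boundaries that can never cross:

```python
import numpy as np

def boundaries_from_thicknesses(thicknesses, top_boundary):
    """Reconstruct ordered boundary positions from non-negative thicknesses.

    thicknesses:  (num_layers, num_ascans) array, >= 0 by the ReLU output.
    top_boundary: (num_ascans,) row position of the topmost boundary.

    Returns (num_layers + 1, num_ascans) boundary positions. Because each
    thickness is non-negative, the cumulative sum preserves the anatomical
    ordering even where a layer thickness is zero (e.g., at the fovea).
    """
    offsets = np.cumsum(thicknesses, axis=0)
    return np.vstack([top_boundary, top_boundary + offsets])
```

By construction, `boundaries[k + 1] >= boundaries[k]` holds at every A-scan, which is exactly the topology guarantee.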
2 Method
Fig. 1 shows a schematic of our framework. We describe each step in our processing below.
Preprocessing A typical retinal B-scan is large and can require large amounts of memory if processed directly by a deep network. To address this, we approximately identify Bruch's membrane, flatten the retina, and crop the image to remove the black background. Overlapping patches (each spanning 128 A-scans) are then extracted and segmented by the deep network.
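The flattening and patch extraction can be sketched as follows (a simplified illustration under our own assumptions: the Bruch's membrane row per A-scan is taken as given, and the 64-pixel stride is hypothetical; only the 128 A-scan patch width comes from the text):

```python
import numpy as np

def flatten_retina(bscan, bm_rows, target_row):
    """Shift each A-scan (column) so the estimated Bruch's membrane lands on
    a common row, removing retinal curvature before cropping the background.
    (np.roll wraps around; a full implementation would pad instead.)"""
    flat = np.empty_like(bscan)
    for col in range(bscan.shape[1]):
        flat[:, col] = np.roll(bscan[:, col], target_row - int(bm_rows[col]))
    return flat

def extract_patches(image, width=128, stride=64):
    """Extract overlapping full-height patches spanning `width` A-scans."""
    return [image[:, s:s + width]
            for s in range(0, image.shape[1] - width + 1, stride)]
```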
Segmentation Network (S-Net) Overview Our segmentation FCN (S-Net) is based on the U-Net [14]. It takes an image patch as input and outputs a segmentation probability map, which includes probability maps for the eight retinal layers and the backgrounds above and below the retina (vitreous and choroid, respectively). Fig. 2 shows the details of S-Net; specifically, four max-pooling and 19 convolution layers are used.

Regression Network (R-Net) Overview The R-Net consists of two parts: a U-Net identical to our S-Net (except for the input channels) and a dense layer. The input to the R-Net is the topologically unconstrained output of the S-Net. The R-Net learns the shape and topology priors of the layer structures while being resistant to segmentation defects (see Fig. 4 for examples). The dense layer of the R-Net uses ReLU activation and thus guarantees a non-negative output vector. The size of this output is 9 × 128 = 1152, corresponding to the thicknesses of the nine layers at each of the 128 A-scans being segmented.
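A hedged PyTorch sketch of the R-Net output head is given below; only the 9 × 128 = 1152 output size comes from the text, while `in_features` and the remaining details are placeholders:

```python
import torch
import torch.nn as nn

class RNetHead(nn.Module):
    """Dense regression head: the final ReLU keeps every thickness >= 0."""

    def __init__(self, in_features, num_layers=9, num_ascans=128):
        super().__init__()
        self.fc = nn.Linear(in_features, num_layers * num_ascans)
        self.num_layers, self.num_ascans = num_layers, num_ascans

    def forward(self, unet_features):
        flat = unet_features.flatten(start_dim=1)   # (batch, in_features)
        thickness = torch.relu(self.fc(flat))       # non-negative by ReLU
        return thickness.view(-1, self.num_layers, self.num_ascans)
```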
2.1 Training
We train our framework in two steps. S-Net is trained with a common pixel-wise labeling scheme, since every pixel in the training data can be treated as an independent training sample, which enlarges the effective training set [5]. R-Net is trained with augmented ground truth masks to learn the shape and topology priors. An alternative would be to train R-Net to map the S-Net output to the ground truth layer thicknesses. However, training in this manner would be suboptimal: the S-Net output is not the ground truth mask, so training pairs of S-Net outputs and ground truth thicknesses are biased, which would bias the resulting R-Net. We therefore train the two networks independently. We note that training R-Net separately on simulated training masks allows this network to be generalized for use with other layered structures.
S-Net training The S-Net is trained with a common pixel-wise labeling scheme, namely with the cross-entropy loss function:
$$\mathcal{L}_S(\theta) = -\sum_{i}\sum_{l} \mathbb{1}(y_i = l)\,\log p_{\theta}(l \mid x_i) \qquad (1)$$
Here, $\mathbb{1}(y_i = l)$ is an indicator function on the ground truth label $y_i$ of pixel $i$, and $p_{\theta}(l \mid x_i)$ is the predicted probability from the deep network that pixel $i$ belongs to layer $l$. Standard backpropagation is used to minimize the loss and update the network parameters $\theta$.
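In PyTorch terms, minimizing Eq. (1) reduces to the standard pixel-wise cross-entropy loop sketched below; `snet`, `loader`, and the optimizer settings are placeholders, and the ten output channels (eight layers plus vitreous and choroid) follow the description above:

```python
import torch
import torch.nn as nn

criterion = nn.CrossEntropyLoss()  # pixel-wise cross-entropy of Eq. (1)
optimizer = torch.optim.Adam(snet.parameters(), lr=1e-4)  # settings illustrative

for patch, label in loader:        # label: (batch, H, W) integer class map
    logits = snet(patch)           # (batch, 10, H, W): 8 layers + 2 backgrounds
    loss = criterion(logits, label)
    optimizer.zero_grad()
    loss.backward()                # standard backpropagation
    optimizer.step()
```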
R-Net training The purpose of the regression network is to map the pixel-wise segmentation probability maps to layer thicknesses. We simulate topology defects on the ground truth masks and train R-Net to recover the correct layer thicknesses. R-Net is trained by minimizing the mean squared loss below with standard backpropagation:
$$\mathcal{L}_R(\phi) = \frac{1}{N}\sum_{i=1}^{N} \big\| t_i - \hat{t}_{\phi}(g_i + d_i) \big\|_2^2 \qquad (2)$$
Here, $g_i$ is a ground truth mask, $t_i$ is the corresponding ground truth layer thickness vector, $\hat{t}_{\phi}(\cdot)$ is the prediction from the regression network, and $d_i$ denotes the simulated defects and Gaussian noise [13] added to the ground truth mask. The simulated defects are random ellipses of varying magnitude. Examples of the simulated input masks to R-Net are shown in Fig. 3. To prevent R-Net from overfitting to the thickness values, we also randomly shift the masks vertically and dilate or shrink them.
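A sketch of this mask corruption is given below, assuming one-hot ground-truth masks of shape (classes, height, width); the ellipse radii, defect magnitudes, and noise level are illustrative rather than the values used in the paper:

```python
import numpy as np

def simulate_defects(mask, rng, num_ellipses=3, noise_std=0.1):
    """Corrupt a ground truth mask with random elliptical perturbations and
    Gaussian noise to produce training inputs for R-Net."""
    corrupted = mask.astype(np.float64)
    n_classes, h, w = corrupted.shape
    yy, xx = np.mgrid[0:h, 0:w]
    for _ in range(num_ellipses):
        cy, cx = rng.integers(0, h), rng.integers(0, w)
        ry, rx = rng.integers(3, 16), rng.integers(3, 16)
        inside = ((yy - cy) / ry) ** 2 + ((xx - cx) / rx) ** 2 <= 1.0
        layer = rng.integers(0, n_classes)
        corrupted[layer][inside] += rng.uniform(-1.0, 1.0)  # illustrative magnitude
    corrupted += rng.normal(0.0, noise_std, corrupted.shape)
    return np.clip(corrupted, 0.0, 1.0)

# usage: simulate_defects(gt_mask, np.random.default_rng(0))
```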
3 Experiments
Ten fully delineated Spectralis spectral-domain OCT (SD-OCT) macular scans were used for training. Twenty overlapping patches were extracted from each B-scan for training both networks, yielding the full set of training samples. Twenty additional SD-OCT macular scans were acquired for validation. Ten subjects in our validation cohort were diagnosed with multiple sclerosis (MS) and the remaining ten were healthy controls.
3.1 Results
Boundary segmentation accuracy was evaluated by comparing the automatic segmentation results with manual delineations along every A-scan. The mean absolute distance (MAD), root mean square error (RMSE), and mean signed difference (MSD) were calculated for the state-of-the-art RF + graph method (RF+G) [9] (code for Lang et al. downloaded from https://www.nitrc.org/projects/aura_tools/) and for our proposed deep networks (S-Net + R-Net). The Wilcoxon signed-rank test was used to compare the two methods, and the 95% quantile range of the MSD is also reported. These results are shown in Table 1; the depth resolution is 3.9 μm. From the table, both methods have MAD and RMSE of less than one pixel, and our proposed method achieves similar or slightly better results than the state-of-the-art graph method. The MSD and 95% quantile range show that, compared to our proposed method, the graph method is more biased. Figs. 4 and 5 show examples in which, when the image is of poor quality or the boundaries are unclear, the S-Net results can be wrong, whereas R-Net guarantees the correct topology while maintaining state-of-the-art accuracy.
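For reference, the three reported error statistics reduce to simple functions of the signed per-A-scan boundary difference (a sketch; boundary positions are assumed to be in μm):

```python
import numpy as np

def boundary_errors(pred, truth):
    """MAD, RMSE, and MSD between predicted and manually delineated boundary
    positions, evaluated along every A-scan of a (num_ascans,) pair."""
    diff = pred - truth                 # signed error per A-scan
    mad = np.mean(np.abs(diff))         # mean absolute distance
    rmse = np.sqrt(np.mean(diff ** 2))  # root mean square error
    msd = np.mean(diff)                 # mean signed difference (bias)
    return mad, rmse, msd
```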
The total segmentation time of our proposed deep network for one scan is 10 s (preprocessing and reconstruction included), of which the deep network inference takes 5.85 s. The segmentation is performed in Python 3.6, with the preprocessing performed in Matlab R2016b called directly from the Python environment. The RF+G method had a total segmentation time of 100 s in Matlab R2016b, of which the RF classification took 62 s and the graph method 20 s.
Table 1. MAD and RMSE (μm) between each method and manual delineation, with Wilcoxon signed-rank test p-values.

Boundary        RF+G (MAD)  S+R-Net (MAD)  p     RF+G (RMSE)  S+R-Net (RMSE)  p
Vitreous-RNFL   2.24        2.22           0.92  2.88         2.81            0.78
RNFL-GCL        2.90        2.95           0.76  4.28         4.45            0.51
IPL-INL         3.10        2.99           0.82  4.42         3.97            0.16
INL-OPL         3.09        3.22           0.20  4.31         4.18            0.90
OPL-ONL         2.74        2.78           0.76  3.92         3.82            0.95
ELM             2.32        2.63           0.07  2.94         3.29            0.06
IS-OS           2.38        2.12           0.76  2.91         2.62            0.97
OS-RPE          3.34        3.44           0.74  4.43         4.39            0.92
RPE             3.33        3.02           0.84  3.94         3.72            0.90
Overall         2.83        2.82           0.42  3.78         3.69            0.56
MSD (μm), with the 95% quantile range in parentheses.

Boundary        RF+G                 S+R-Net
Vitreous-RNFL   1.02 (3.92, 6.77)    0.32 (5.06, 6.12)
RNFL-GCL        0.37 (9.14, 7.26)    0.43 (9.13, 7.21)
IPL-INL         0.68 (8.26, 8.70)    0.67 (7.03, 8.75)
INL-OPL         0.04 (8.70, 8.00)    1.08 (9.44, 6.89)
OPL-ONL         0.53 (7.75, 7.98)    0.80 (6.78, 8.42)
ELM             0.13 (5.69, 6.34)    0.94 (7.51, 5.66)
IS-OS           1.53 (3.09, 7.31)    0.07 (4.70, 5.83)
OS-RPE          1.15 (6.85, 12.20)   0.06 (9.26, 10.39)
RPE             2.53 (3.27, 12.27)   1.10 (5.67, 10.72)
Overall         0.80 (6.62, 9.08)    0.03 (7.54, 8.17)
4 Discussion and conclusion
In this paper, we presented a fast, topology-guaranteed deep learning method for retinal OCT segmentation. Our method adds a thickness regression network after a conventional pixel-wise labeling network and utilizes ReLU activation to guarantee the non-negativity of the output thicknesses and thus the correct topology. Since R-Net is trained on masks that can be easily generated, our proposed framework can provide a topology-guaranteed segmentation solution for other layered structures as well.
5 Acknowledgments
This work was supported by the NIH/NEI under grant R01EY024655.
References
 [1] Antony, B.J., Miri, M.S., Abràmoff, M.D., Kwon, Y.H., Garvin, M.K.: Automated 3D segmentation of multiple surfaces with a shared hole: segmentation of the neural canal opening in SD-OCT volumes. In: 17th International Conference on Medical Image Computing and Computer-Assisted Intervention (MICCAI 2014). Lecture Notes in Computer Science, vol. 8673, pp. 739–746. Springer Berlin Heidelberg (2014)
 [2] BenTaieb, A., Hamarneh, G.: Topology Aware Fully Convolutional Networks for Histology Gland Segmentation. In: 19th International Conference on Medical Image Computing and Computer-Assisted Intervention (MICCAI 2016). Lecture Notes in Computer Science, vol. 9901, pp. 460–468. Springer Berlin Heidelberg (2016)
 [3] Carass, A., Lang, A., Hauser, M., Calabresi, P.A., Ying, H.S., Prince, J.L.: Multiple-object geometric deformable model for segmentation of macular OCT. Biomed. Opt. Express 5(4), 1062–1074 (2014)
 [4] Dahl, G.E., Sainath, T.N., Hinton, G.E.: Improving deep neural networks for LVCSR using rectified linear units and dropout. In: 2013 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). pp. 8609–8613. IEEE (2013)
 [5] Dou, Q., Yu, L., Chen, H., Jin, Y., Yang, X., Qin, J., Heng, P.A.: 3D deeply supervised network for automated segmentation of volumetric medical images. Medical Image Analysis (2017)
 [6] Garvin, M.K., Abràmoff, M.D., Wu, X., Russell, S.R., Burns, T.L., Sonka, M.: Automated 3D intraretinal layer segmentation of macular spectral-domain optical coherence tomography images. IEEE Trans. Med. Imag. 28(9), 1436–1447 (2009)
 [7] He, Y., Carass, A., Yun, Y., Zhao, C., Jedynak, B.M., Solomon, S.D., Saidha, S., Calabresi, P.A., Prince, J.L.: Towards topological correct segmentation of macular OCT from cascaded FCNs. In: Fetal, Infant and Ophthalmic Medical Image Analysis, pp. 202–209. Springer (2017)
 [8] Hee, M.R., Izatt, J.A., Swanson, E.A., Huang, D., Schuman, J.S., Lin, C.P., Puliafito, C.A., Fujimoto, J.G.: Optical coherence tomography of the human retina. Arch. Ophthalmol. 113(3), 325–332 (1995)
 [9] Lang, A., Carass, A., Hauser, M., Sotirchos, E.S., Calabresi, P.A., Ying, H.S., Prince, J.L.: Retinal layer segmentation of macular OCT images using boundary classification. Biomed. Opt. Express 4(7), 1133–1152 (2013)
 [10] Long, J., Shelhamer, E., Darrell, T.: Fully convolutional networks for semantic segmentation. In: The IEEE Conference on Computer Vision and Pattern Recognition (CVPR). pp. 3431–3440 (June 2015)
 [11] Medeiros, F.A., Zangwill, L.M., Alencar, L.M., Bowd, C., Sample, P.A., Susanna Jr., R., Weinreb, R.N.: Detection of Glaucoma Progression with Stratus OCT Retinal Nerve Fiber Layer, Optic Nerve Head, and Macular Thickness Measurements. Invest. Ophthalmol. Vis. Sci. 50(12), 5741–5748 (2009)
 [12] Rathke, F., Schmidt, S., Schnörr, C.: Probabilistic Intra-Retinal Layer Segmentation in 3D OCT Images Using Global Shape Regularization. Medical Image Analysis 18(5), 781–794 (2014)
 [13] Romero, A., Drozdzal, M., Erraqabi, A., Jégou, S., Bengio, Y.: Image segmentation by iterative inference from conditional score estimation. arXiv preprint arXiv:1705.07450 (2017)
 [14] Ronneberger, O., Fischer, P., Brox, T.: U-Net: Convolutional Networks for Biomedical Image Segmentation. In: 18th International Conference on Medical Image Computing and Computer-Assisted Intervention (MICCAI 2015). Lecture Notes in Computer Science, vol. 9351, pp. 234–241. Springer Berlin Heidelberg (2015)
 [15] Roy, A.G., Conjeti, S., Karri, S.P.K., Sheet, D., Katouzian, A., Wachinger, C., Navab, N.: ReLayNet: Retinal Layer and Fluid Segmentation of Macular Optical Coherence Tomography using Fully Convolutional Network. CoRR abs/1704.02161 (2017)
 [16] Zheng, S., Jayasumana, S., Romera-Paredes, B., Vineet, V., Su, Z., Du, D., Huang, C., Torr, P.H.S.: Conditional random fields as recurrent neural networks. In: The IEEE International Conference on Computer Vision (ICCV) (December 2015)