Optical coherence tomography (OCT) is a widely used non-invasive and non-ionizing modality for retinal imaging that can rapidly acquire 3D images of the retina. The depth information that OCT provides enables measurement of retinal layer thicknesses, which are known to change in certain diseases. Fast, automated retinal layer segmentation tools are crucial for large cohort studies of these diseases.
State-of-the-art methods use machine learning (e.g., random forest (RF)) for coarse pixel-wise labeling and then level set or graph methods [6, 9] to guarantee the segmentation topology (i.e., the anatomically correct retinal layer ordering) and obtain the final boundary surfaces. They are limited by the manually selected features for the pixel-wise labeling task and the manually tuned parameters of the graph. To build the graph, spatially varying boundary distances and smoothness constraints need to be assigned experimentally. The manually selected features and fine-tuned graph parameters limit the application of these methods across cohorts.
Deep learning automatically extracts relevant image features from the training data and performs the segmentation in a feed-forward fashion. The fully convolutional network (FCN) proposed by Long et al. is a successful deep learning segmentation method, and the U-Net variant is widely used for medical image segmentation. Both Roy et al. and He et al. proposed FCNs for retinal layer segmentation (the former also included fluid segmentation). However, these FCN methods provide pixel-wise labeling without explicitly utilizing high-level priors such as shape, and neither guarantees the correct topology. Examples of FCNs giving anatomically infeasible results are shown in Fig. 4.
In order to obtain structured output directly from deep networks, Zheng et al. implemented a conditional random field as a recurrent neural network. This method can provide better label consistency but cannot guarantee global topology. BenTaieb et al. proposed to explicitly integrate topology priors into the loss function during training, and Romero et al. used a second auto-encoder network to learn the output shape prior. Although these methods can improve the segmentation results by utilizing shape and topology priors, they still cannot guarantee the correct topology.
To obtain a topologically correct segmentation of the retinal layers from a deep network in a single feed-forward propagation, we propose a cascaded FCN framework that transforms the layer segmentation problem from pixel labeling into a boundary position regression problem. Instead of outputting the boundary position directly, we use the network to output the distance between two boundaries, i.e., the layer thickness. The first network (S-Net) performs pixel labeling, and the second, a regression network (R-Net), takes the topologically unconstrained S-Net results and outputs layer thicknesses for each layer and each position. ReLU activation is used as the final operation of R-Net, which guarantees the non-negativity of the output layer thicknesses. Since each boundary position is obtained by summing the corresponding non-negative layer thicknesses, the ordering of the reconstructed boundaries is guaranteed even at the fovea, where the distance between boundaries can be zero.
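The ordering argument can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function name `boundaries_from_thickness`, the random stand-in for the network output, and the choice of measuring depth from the top of the patch are assumptions made for illustration. The shape (9 layers by 128 A-scans) follows the R-Net output described later in the text.

```python
import numpy as np

def boundaries_from_thickness(raw_output, top=0.0):
    """Convert unconstrained per-layer outputs into ordered boundary depths.

    raw_output: array of shape (n_layers, n_ascans), one row per layer.
    top: depth of the top of the patch (assumed reference, hypothetical).
    """
    # ReLU: every layer thickness becomes non-negative.
    thickness = np.maximum(raw_output, 0.0)
    # Boundary k is the sum of the thicknesses of layers 0..k, so the
    # boundaries are non-decreasing with k in every A-scan.
    return top + np.cumsum(thickness, axis=0)

# Stand-in for the pre-activation output of a regression network.
rng = np.random.default_rng(0)
raw = rng.normal(size=(9, 128))
boundaries = boundaries_from_thickness(raw)

# Adjacent boundaries may coincide (zero thickness) but never cross.
ordered = bool(np.all(np.diff(boundaries, axis=0) >= 0))
print("boundaries ordered:", ordered)
```

Because the ordering follows from the construction itself, it holds for any network weights, including an untrained network.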
Fig. 1 shows a schematic of our framework. We describe each step in our processing below.
Preprocessing A typical retinal B-scan can require large amounts of memory if directly processed by a deep network. To address this, we approximately identify Bruch’s membrane, flatten the retina, and crop the image to remove the black background. Overlapped patches are then extracted and segmented by the deep network.
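The patch extraction step could look like the following sketch. The patch width of 128 A-scans and the count of 20 patches per B-scan are taken from elsewhere in the text. The even spacing of the patches, the toy B-scan size, and the function name `extract_patches` are assumptions, since the exact scheme is not specified here.

```python
import numpy as np

def extract_patches(bscan, patch_width=128, n_patches=20):
    """Cut evenly spaced, overlapping column windows from a flattened B-scan.

    bscan: 2D array (depth, n_ascans). Returns (patches, start_columns).
    """
    depth, width = bscan.shape
    if width < patch_width:
        raise ValueError("B-scan is narrower than one patch")
    # Evenly spaced start columns covering the full width (assumed scheme).
    starts = np.linspace(0, width - patch_width, n_patches).round().astype(int)
    patches = np.stack([bscan[:, s:s + patch_width] for s in starts])
    return patches, starts

# Toy B-scan with a hypothetical size of 64 x 1024 pixels.
bscan = np.arange(64 * 1024, dtype=float).reshape(64, 1024)
patches, starts = extract_patches(bscan)
print(patches.shape)  # (20, 64, 128)
```

With these toy numbers, neighbouring patches start fewer than 128 columns apart, so every A-scan is covered and adjacent patches overlap.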
Segmentation Network (S-Net) Overview Our segmentation FCN (S-Net) is based on the U-Net. It takes an image patch as input and outputs a segmentation probability map, which includes probability maps for the eight retinal layers and for the background above and below the retina (vitreous and choroid, respectively). Fig. 2 shows the details of S-Net; specifically, four max pooling and 19 convolution layers are used.
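Since each pixel is labeled independently, nothing in such a probability map forces the labels to follow the anatomical order along depth. The sketch below converts a probability map into a label map and flags A-scans whose labels violate that order. It assumes that class indices are numbered from top (vitreous) to bottom (choroid), which is an illustrative convention rather than something stated in the text; the function names are hypothetical.

```python
import numpy as np

def labels_from_probabilities(prob):
    """prob: (n_classes, depth, width) -> (depth, width) most likely class."""
    return np.argmax(prob, axis=0)

def topology_violations(labels):
    """Return a boolean per A-scan (column).

    True means the label index decreases somewhere along depth, i.e. a
    deeper class appears above a shallower one.
    """
    return np.any(np.diff(labels, axis=0) < 0, axis=0)

# Toy label map with 3 classes, depth 3, and 2 A-scans.
# Column 0 reads 0, 1, 2 (valid); column 1 reads 0, 2, 1 (invalid).
toy_labels = np.array([[0, 0],
                       [1, 2],
                       [2, 1]])
violations = topology_violations(toy_labels)
print(violations)  # [False  True]
```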
Regression Net (R-Net) Overview The R-Net consists of two parts: a U-Net identical to our S-Net (except for the input channels) and a dense layer. The input to the R-Net is the topologically unconstrained result from the S-Net. R-Net is applied to learn the shape and topology priors of the layer structures while being resistant to segmentation defects (see Fig. 4 for examples). The dense layer of the R-Net uses ReLU activation and thus guarantees a non-negative output vector. The size of this output is 9 × 128 = 1152, corresponding to the thicknesses of the 9 layers over the 128 A-scans being segmented.
We train our framework in two steps. S-Net is trained with a common pixel-wise labeling scheme, because every pixel in the training data can be treated as an independent training sample, which enlarges the effective training set. R-Net is trained with augmented ground truth masks to learn the shape and topology priors. An alternative way to train the R-Net is to take the S-Net output as input and regress the ground truth layer thicknesses. However, training in this manner would be sub-optimal because the S-Net output is not the ground truth mask. The training pairs of S-Net output and ground truth thicknesses would therefore be biased, which would bias the resulting R-Net. For this reason, we train the two networks independently. We note that training the R-Net separately with simulated training masks allows this network to be generalized for use with other layered structures.
S-Net training The S-Net is trained with a common pixel-wise labeling scheme, namely the cross-entropy loss function:
$$L(\theta) = -\sum_{x}\sum_{l} g_l(x)\,\log p_l(x;\theta)$$
Here, $g_l(x)$ is an indicator function on the ground truth label of pixel $x$ and $p_l(x;\theta)$ is the prediction probability from the deep network that the pixel $x$ belongs to layer $l$. Standard back-propagation is used to minimize the loss and update the network parameters $\theta$.
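As a minimal sketch of the pixel-wise cross-entropy described above, the following NumPy function sums, over all pixels, the negative log probability that the network assigns to the ground truth class. Summing rather than averaging, the small constant `eps`, and the function name are assumptions made for illustration. A training framework would normally supply this loss together with its gradient.

```python
import numpy as np

def pixelwise_cross_entropy(prob, labels, eps=1e-12):
    """Cross-entropy summed over pixels.

    prob:   (n_classes, H, W) predicted class probabilities per pixel.
    labels: (H, W) integer ground truth class per pixel.
    eps:    small constant to avoid log(0) (hypothetical choice).
    """
    n_classes = prob.shape[0]
    # Indicator of the ground truth label: 1 where class index == label.
    one_hot = np.arange(n_classes)[:, None, None] == labels[None]
    return float(-np.sum(one_hot * np.log(prob + eps)))

# Toy case: 4 classes, uniform prediction on a 2 x 3 image.
uniform = np.full((4, 2, 3), 0.25)
toy_labels = np.zeros((2, 3), dtype=int)
loss_uniform = pixelwise_cross_entropy(uniform, toy_labels)
print(loss_uniform)  # approximately 6 * log(4)
```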
R-Net training The purpose of the regression net is to find a mapping from the pixel-wise segmentation probability maps to layer thicknesses. We simulate topology defects on the ground truth mask and use R-Net to recover the correct layer thicknesses. The training of R-Net is based on minimizing the mean squared loss function below with standard back-propagation,
$$L(\theta) = \left\lVert R(M + \epsilon;\theta) - T \right\rVert_2^2$$
Here, $M$ is the ground truth mask, $T$ is the corresponding ground truth layer thickness, $R(M + \epsilon;\theta)$ is the prediction from the regression net, and $\epsilon$ is the simulated defects and Gaussian noise added to the ground truth mask. The simulated defects are random ellipses of varying magnitude. Examples of the simulated input masks to R-Net are shown in Fig. 3. To prevent R-Net from over-fitting the thickness values, we randomly shift the masks vertically and dilate or shrink them.
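One way to generate such training pairs is sketched below. It is an illustration under stated assumptions, not the authors' code: the number of ellipses, their size range, the defect magnitude, and the noise standard deviation are all hypothetical values, and the vertical shift and dilation augmentations are omitted. The ground truth thickness is computed by counting the pixels of each label along every A-scan.

```python
import numpy as np

def thickness_from_mask(mask, n_classes):
    """Per-column pixel count of each label: shape (n_classes, width)."""
    return np.stack([(mask == c).sum(axis=0) for c in range(n_classes)]).astype(float)

def corrupt_mask(mask, n_classes, rng, n_ellipses=3, magnitude=1.0, noise_std=0.1):
    """One-hot encode a label mask, then add elliptical defects and noise.

    All numeric defaults are hypothetical illustration values.
    """
    h, w = mask.shape
    one_hot = (np.arange(n_classes)[:, None, None] == mask[None]).astype(float)
    yy, xx = np.mgrid[0:h, 0:w]
    for _ in range(n_ellipses):
        channel = rng.integers(n_classes)              # channel to damage
        cy, cx = rng.uniform(0, h), rng.uniform(0, w)  # ellipse centre
        ry, rx = rng.uniform(2, h / 4), rng.uniform(2, w / 4)  # radii
        inside = ((yy - cy) / ry) ** 2 + ((xx - cx) / rx) ** 2 <= 1.0
        one_hot[channel, inside] += rng.uniform(-magnitude, magnitude)
    # Gaussian noise over the whole input.
    return one_hot + rng.normal(0.0, noise_std, size=one_hot.shape)

# Toy mask: 4 classes stacked in order, 10 pixels thick each, 16 A-scans.
rng = np.random.default_rng(0)
toy_mask = np.repeat(np.arange(4), 10)[:, None] * np.ones((1, 16), dtype=int)
target = thickness_from_mask(toy_mask, n_classes=4)       # regression target
net_input = corrupt_mask(toy_mask, n_classes=4, rng=rng)  # corrupted input
print(net_input.shape, target.shape)  # (4, 40, 16) (4, 16)
```

Because the inputs are synthesized from label masks alone, pairs like these can be generated without any additional image data.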
Ten fully delineated Spectralis spectral-domain OCT (SD-OCT) scans were used for training. Twenty overlapped patches were extracted from each B-scan to train both networks. Twenty macular SD-OCT scans were acquired for validation. Ten of the scans in our validation cohort came from subjects diagnosed with multiple sclerosis (MS), and the remaining ten came from healthy controls.
Boundary segmentation accuracy was evaluated by comparing the automatic segmentation results with manual delineation along every A-scan. The mean absolute distance (MAD), root mean square error (RMSE), and mean signed difference (MSD) were calculated for the state-of-the-art RF + Graph method (RF+G; code for Lang et al. downloaded from https://www.nitrc.org/projects/aura_tools/) and our proposed deep networks (S-Net + R-Net). The Wilcoxon signed-rank test was used to compare the two methods, and the 95% quantile of the MSD is also reported. These results are shown in Table 1. The depth resolution is 3.9 μm. From the table, both methods have MAD and RMSE of less than 1 pixel, and our proposed method achieves similar or slightly better results than the state-of-the-art graph method. The MSD and its 95% quantile show that the graph method is more biased than our proposed method. Figs. 4 and 5 show examples in which the image is of poor quality or the boundaries are unclear: the S-Net results can be wrong, whereas R-Net guarantees the correct topology while maintaining state-of-the-art accuracy.
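The three error measures can be computed from per-A-scan boundary positions as in the sketch below. The sign convention (automatic minus manual) and the function name are assumptions, and the quantile and significance test are left out. The default pixel size of 3.9 μm is the depth resolution stated above.

```python
import numpy as np

def boundary_errors(auto, manual, pixel_size_um=3.9):
    """MAD, RMSE, and MSD between automatic and manual boundary depths.

    auto, manual: boundary positions in pixels, same shape.
    The signed difference is taken as auto - manual (assumed convention).
    """
    diff = (np.asarray(auto, float) - np.asarray(manual, float)) * pixel_size_um
    return {
        "MAD": float(np.mean(np.abs(diff))),        # mean absolute distance
        "RMSE": float(np.sqrt(np.mean(diff ** 2))),  # root mean square error
        "MSD": float(np.mean(diff)),                 # mean signed difference
    }

# Toy boundary positions in pixels along three A-scans.
errors = boundary_errors([1, 2, 3], [1, 1, 5], pixel_size_um=1.0)
print(errors)
```

MAD and RMSE measure the size of the error regardless of direction, whereas MSD keeps the sign and therefore reveals a systematic bias toward one side of the manual boundary.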
The total segmentation time of our proposed deep network for one scan is 10 s (preprocessing and reconstruction included), of which the deep network inference takes 5.85 s. The segmentation is performed with Python 3.6, and the preprocessing is performed in Matlab R2016b called directly from the Python environment. The RF+G method had a total segmentation time of 100 s in Matlab R2016b, of which RF classification took 62 s and the graph method took 20 s.
|MAD (μm)||RMSE (μm)|
|MSD (μm) (95% quantile)|
|Boundary||RF+G||S-Net + R-Net|
|Vitre-RNFL||1.02 (-3.92, 6.77)||0.32 (-5.06, 6.12)|
|RNFL-GCL||-0.37 (-9.14, 7.26)||-0.43 (-9.13, 7.21)|
|IPL-INL||0.68 (-8.26, 8.70)||0.67 (-7.03, 8.75)|
|INL-OPL||-0.04 (-8.70, 8.00)||-1.08 (-9.44, 6.89)|
|OPL-ONL||0.53 (-7.75, 7.98)||0.80 (-6.78, 8.42)|
|ELM||0.13 (-5.69, 6.34)||-0.94 (-7.51, 5.66)|
|IS-OS||1.53 (-3.09, 7.31)||-0.07 (-4.70, 5.83)|
|OS-RPE||1.15 (-6.85, 12.20)||-0.06 (-9.26, 10.39)|
|RPE||2.53 (-3.27, 12.27)||1.10 (-5.67, 10.72)|
|Overall||0.80 (-6.62, 9.08)||0.03 (-7.54, 8.17)|
4 Discussion and conclusion
In this paper, we presented a fast, topology-guaranteed deep learning method for retinal OCT segmentation. Our method adds a thickness regression network after a conventional pixel-wise labeling network and uses ReLU activation to guarantee the non-negativity of the output and thus the topology. Since the R-Net is trained on masks that can be easily generated, our proposed framework can provide a topology-guaranteed segmentation solution for other layered structures.
This work was supported by the NIH/NEI under grant R01-EY024655.
-  Antony, B.J., Miri, M.S., Abràmoff, M.D., Kwon, Y.H., Garvin, M.K.: Automated 3D segmentation of multiple surfaces with a shared hole: segmentation of the neural canal opening in SD-OCT volumes. In: 17th International Conference on Medical Image Computing and Computer Assisted Intervention (MICCAI 2014). Lecture Notes in Computer Science, vol. 8673, pp. 739–746. Springer Berlin Heidelberg (2014)
-  BenTaieb, A., Hamarneh, G.: Topology Aware Fully Convolutional Networks for Histology Gland Segmentation. In: 19th International Conference on Medical Image Computing and Computer Assisted Intervention (MICCAI 2016). Lecture Notes in Computer Science, vol. 9901, pp. 460–468. Springer Berlin Heidelberg (2016)
-  Carass, A., Lang, A., Hauser, M., Calabresi, P.A., Ying, H.S., Prince, J.L.: Multiple-object geometric deformable model for segmentation of macular OCT. Biomed. Opt. Express 5(4), 1062–1074 (2014)
-  Dahl, G.E., Sainath, T.N., Hinton, G.E.: Improving deep neural networks for LVCSR using rectified linear units and dropout. In: Acoustics, Speech and Signal Processing (ICASSP), 2013 IEEE International Conference on. pp. 8609–8613. IEEE (2013)
-  Dou, Q., Yu, L., Chen, H., Jin, Y., Yang, X., Qin, J., Heng, P.A.: 3D deeply supervised network for automated segmentation of volumetric medical images. Medical Image Analysis (2017)
-  Garvin, M.K., Abràmoff, M.D., Wu, X., Russell, S.R., Burns, T.L., Sonka, M.: Automated 3-D intraretinal layer segmentation of macular spectral-domain optical coherence tomography images. IEEE Trans. Med. Imag. 28(9), 1436–1447 (2009)
-  He, Y., Carass, A., Yun, Y., Zhao, C., Jedynak, B.M., Solomon, S.D., Saidha, S., Calabresi, P.A., Prince, J.L.: Towards topological correct segmentation of macular OCT from cascaded FCNs. In: Fetal, Infant and Ophthalmic Medical Image Analysis, pp. 202–209. Springer (2017)
-  Hee, M.R., Izatt, J.A., Swanson, E.A., Huang, D., Schuman, J.S., Lin, C.P., Puliafito, C.A., Fujimoto, J.G.: Optical coherence tomography of the human retina. Arch. Ophthalmol. 113(3), 325–332 (1995)
-  Lang, A., Carass, A., Hauser, M., Sotirchos, E.S., Calabresi, P.A., Ying, H.S., Prince, J.L.: Retinal layer segmentation of macular OCT images using boundary classification. Biomed. Opt. Express 4(7), 1133–1152 (2013)
-  Medeiros, F.A., Zangwill, L.M., Alencar, L.M., Bowd, C., Sample, P.A., Susanna Jr., R., Weinreb, R.N.: Detection of Glaucoma Progression with Stratus OCT Retinal Nerve Fiber Layer, Optic Nerve Head, and Macular Thickness Measurements. Invest. Ophthalmol. Vis. Sci. 50(12), 5741–5748 (2009)
-  Rathke, F., Schmidt, S., Schnörr, C.: Probabilistic Intra-Retinal Layer Segmentation in 3-D OCT Images Using Global Shape Regularization. Medical Image Analysis 18(5), 781–794 (2014)
-  Romero, A., Drozdzal, M., Erraqabi, A., Jégou, S., Bengio, Y.: Image segmentation by iterative inference from conditional score estimation. arXiv preprint arXiv:1705.07450 (2017)
-  Ronneberger, O., Fischer, P., Brox, T.: U-Net: Convolutional Networks for Biomedical Image Segmentation. In: 18th International Conference on Medical Image Computing and Computer Assisted Intervention (MICCAI 2015). Lecture Notes in Computer Science, vol. 9351, pp. 234–241. Springer Berlin Heidelberg (2015)
-  Roy, A.G., Conjeti, S., Karri, S.P.K., Sheet, D., Katouzian, A., Wachinger, C., Navab, N.: ReLayNet: Retinal Layer and Fluid Segmentation of Macular Optical Coherence Tomography using Fully Convolutional Network. CoRR abs/1704.02161 (2017)
-  Zheng, S., Jayasumana, S., Romera-Paredes, B., Vineet, V., Su, Z., Du, D., Huang, C., Torr, P.H.S.: Conditional random fields as recurrent neural networks. In: The IEEE International Conference on Computer Vision (ICCV) (December 2015)