Changes in the retinal vasculature carry substantial diagnostic information for many vascular and systemic diseases. Moreover, diseases may affect arterioles and venules differently. For example, in hypertensive patients the arterioles usually narrow faster than the venules, whereas in diabetic patients dilation of the venules is usually observed first. Therefore, accurately segmenting and classifying retinal arterioles and venules has great potential to improve the diagnosis and management of these diseases.
Methods for segmenting and classifying retinal arterioles and venules fall into two major types: image-processing methods [1, 2] and deep-learning-based methods [3, 4]. Image-processing methods usually adopt a two-step strategy of vessel segmentation followed by arteriovenous classification. A binary vessel mask is first generated by segmenting the retinal vessels in the fundus image. Next, the vessel centerlines are computed from the binary mask using morphological operations. Various hand-crafted image features are then extracted around the centerlines and used to classify the arterioles and venules in the image.
Deep learning has also been applied to arteriole and venule classification. Welikala et al. present a two-stage deep learning method for automated arteriole and venule classification: retinal vessels are first segmented using a linear detector, and the centerlines of arterioles and venules are then classified by a 6-layer neural network. AlBadawi et al. report a deep learning method that combines an encoding-decoding model with a graph-based approach to classify arterioles and venules. These methods substantially improve accuracy over traditional image-processing approaches, but their multi-stage pipelines may undermine robustness, because errors made in an early stage propagate to later stages.
We propose a novel deep convolutional neural network architecture for segmenting and classifying arterioles and venules in retinal fundus images. We initially adopted the encoding-decoding structure (Unet) as the backbone network of our model. However, with the classic convolutional layers of Unet, the model produced poor segmentation and classification results. One possible explanation is that retinal vessels in fundus images follow a strict topological organization: vessels of the same type do not cross one another. The fixed receptive fields of classic convolutional layers are insufficient to capture such global image information. Therefore, to improve segmentation and classification accuracy, we develop a dedicated encoding path that couples InceptionV4 modules with Cascading Dilated Convolutions (CDCs) on top of the backbone network. The model is thus able to extract and fuse high-level semantic features from multi-scale receptive fields. The network structure is shown in Fig. 1.
The network takes the original color fundus image as input and produces a multi-class label map as output. It is trained end to end and requires only limited pre- and post-processing of the image data. All image features are computed and used internally by the network; no hand-crafted features or task-specific assumptions are involved. The proposed method is evaluated on the DRIVE dataset and achieves state-of-the-art performance.
We take the encoding-decoding structure (Unet) as the backbone network of our model. In the encoding stage, we use Inception convolutional modules to extract and fuse high-level features through multi-branch feature extraction. To further enhance information collection, we apply Cascading Dilated Convolutions (CDCs) to extract features with enlarged receptive fields. In the decoding stage, a step-by-step upsampling process restores the image resolution. Skip connections between the encoding and decoding paths let information flow directly between corresponding layers, which preserves spatial detail lost during down-sampling and mitigates vanishing gradients during training. Fig. 1 shows the overall architecture of the proposed model.
2.1 Inception Convolutional Module
In fundus images, the retinal vasculature is complex because of the irregular distribution of vessel bifurcations and intersections. We therefore adopt the Inception convolutional module for feature extraction of retinal arterioles and venules, as shown in Fig. 2. The Inception module comprises a convolutional block and a down-sampling block. The convolutional block contains multiple branches: each branch first applies a 1x1 convolution to reduce the number of feature channels, and convolutions with 3x3, 1x7, and 7x1 kernels then compute feature maps at different scales. Finally, the multi-scale information is fused by concatenating all feature maps. The down-sampling block concatenates the outputs of a max-pooling layer and a convolution with a stride of 2 to resample the input or intermediate feature maps. Compared with plain max-pooling, this structure reduces information loss during down-sampling.
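To make the savings of the factorized kernels concrete, the following sketch counts convolution weights and traces the output shape of a down-sampling block. It is a minimal sketch under assumptions not stated in the text: bias terms are ignored, a single 1x1, 1x7, 7x1 branch is compared with one 7x7 convolution, and both down-sampling branches use 3x3 windows with stride 2 and padding 1. The function names and channel counts are hypothetical.

```python
def conv_weights(kh, kw, c_in, c_out):
    """Number of weights in a kh x kw convolution (bias terms ignored)."""
    return kh * kw * c_in * c_out


def factorized_branch_weights(c_in, c_mid, c_out):
    """Hypothetical Inception-style branch: 1x1 bottleneck, then 1x7 and 7x1 convolutions."""
    return (conv_weights(1, 1, c_in, c_mid)      # channel reduction
            + conv_weights(1, 7, c_mid, c_mid)   # horizontal context
            + conv_weights(7, 1, c_mid, c_out))  # vertical context


def conv_out_size(n, k, stride, pad):
    """Spatial output size of a convolution or pooling window along one axis."""
    return (n + 2 * pad - k) // stride + 1


def downsampling_block_shape(h, w, c_in, c_conv):
    """Output (H, W, C) of a block that concatenates max pooling with a strided convolution.

    Assumption: both branches use a 3x3 window, stride 2 and padding 1, so their
    spatial sizes always match and the outputs can be concatenated along channels.
    """
    out_h, out_w = conv_out_size(h, 3, 2, 1), conv_out_size(w, 3, 2, 1)
    # Max pooling keeps the c_in input channels; the strided convolution adds c_conv.
    return out_h, out_w, c_in + c_conv


print(conv_weights(7, 7, 64, 64))                  # 200704
print(factorized_branch_weights(64, 32, 64))       # 23552
print(downsampling_block_shape(512, 512, 32, 32))  # (256, 256, 64)
```

Under these assumptions the factorized branch needs about 12% of the weights of a single 7x7 convolution, while the 1x7 and 7x1 kernels applied in sequence still cover a 7x7 neighborhood.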
2.2 Cascading Dilated Convolutions
In arteriovenous classification of retinal fundus images, vessels usually have long, curvy shapes. However, the small receptive field of a classic convolution captures only relatively local features and performs poorly when classifying most of the main branches of retinal arterioles and venules. Therefore, we adopt dilated convolution (see Fig. 3). In its original design, multiple convolutions with different dilation rates are combined in parallel, extracting features at different scales through enlarged receptive fields. Unlike that parallel design, which targets segmenting multiple objects of different sizes, we implement a novel structure of cascading dilated convolutions, called CDCs. The cascading design suits the continuous structure of retinal vessel trees, which have thick trunks and thin branches. The proposed CDCs stack four dilated convolution layers with dilation rates of 2, 4, 8, and 12, which correspond to progressively larger receptive fields. The CDCs are shown in Fig. 4.
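The benefit of cascading can be checked with receptive-field arithmetic. The sketch below assumes 3x3 kernels and stride 1 in every dilated layer (the kernel size is not stated in the text) and compares the cascade of rates 2, 4, 8, and 12 with a parallel arrangement of the same rates, in which each branch sees the input through a single dilated convolution. The function names are illustrative.

```python
def dilated_extent(kernel, rate):
    """Span in pixels covered by one dilated convolution: 1 + (k - 1) * d."""
    return 1 + (kernel - 1) * rate


def cascade_receptive_field(kernel, rates):
    """Receptive field of stride-1 dilated convolutions applied one after another."""
    rf = 1
    for d in rates:
        rf += (kernel - 1) * d  # each layer widens the field by (k - 1) * d pixels
    return rf


def parallel_receptive_field(kernel, rates):
    """Largest receptive field among parallel branches, each a single dilated convolution."""
    return max(dilated_extent(kernel, d) for d in rates)


RATES = (2, 4, 8, 12)
print(cascade_receptive_field(3, RATES))   # 53
print(parallel_receptive_field(3, RATES))  # 25
```

Under these assumptions the cascade spans 53 pixels after four layers, while the widest parallel branch spans 25; stacking lets each layer build on the context already aggregated by the previous one, which suits long, continuous vessels.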
3 Experiments and Results
We conduct our experiments on a publicly available dataset, the DRIVE dataset, which contains 40 retinal fundus images with annotations of arterioles and venules. An example of a retinal fundus image and its annotation is shown in Figure XXX. The image size is 584×565 pixels. We use a randomly selected subset of 30 images to train the network and the remaining 10 images to evaluate the results. Because the dataset is small, the images are first randomly cropped to 512×512, and scaling, panning, random clipping, and other transformations are then applied to augment the dataset. In total, the training set is expanded to 2,490 images. A case-level five-fold cross-validation is performed within the training set. To counter the class imbalance between vessel and background pixels, we set the class weight of both arterioles and venules to 5 during the experiment.
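A case-level split keeps every augmented crop of an image in the same fold, so near-duplicate crops of one image cannot appear in both the training and validation parts of a fold. The sketch below assumes, purely for illustration, that each of the 30 training images contributes 83 augmented samples (30 × 83 = 2,490); the actual per-image count is not stated, and the function name is hypothetical.

```python
import random


def case_level_folds(samples, case_of, k=5, seed=0):
    """Assign samples to k folds so that all samples from one case share a fold."""
    cases = sorted({case_of(s) for s in samples})
    random.Random(seed).shuffle(cases)
    # Round-robin over the shuffled cases gives folds with (nearly) equal case counts.
    fold_of_case = {c: i % k for i, c in enumerate(cases)}
    folds = [[] for _ in range(k)]
    for s in samples:
        folds[fold_of_case[case_of(s)]].append(s)
    return folds


# Hypothetical samples: (image index, augmentation index) pairs.
samples = [(img, aug) for img in range(30) for aug in range(83)]
folds = case_level_folds(samples, case_of=lambda s: s[0])
print([len(f) for f in folds])  # [498, 498, 498, 498, 498]
for i, val in enumerate(folds):
    train = [s for j, f in enumerate(folds) if j != i for s in f]
    # Validation images never appear among the training samples of the same fold.
    assert not {s[0] for s in val} & {s[0] for s in train}
```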
As shown in Fig. 5, the annotations provide five labels: red for arterioles, blue for venules, green for intersections, black for background, and white for uncertain pixels. In the experiment, we remove the uncertain-pixel label, leaving four classes for model training and prediction. Furthermore, because intersections share textural characteristics with both arterioles and venules, we assign the intersection class a very small weight (1e-12) to avoid ambiguity during training.
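The label handling and class weighting described above can be sketched as a pixel-wise weighted cross-entropy. Several details are assumptions: the exact RGB values of the annotation colors, a background weight of 1.0, and treating the removed uncertain pixels as ignored (excluded from the loss) rather than merged into another class. The loss is normalized by the sum of the weights of the counted pixels, one common convention; all names are illustrative.

```python
import math

# Assumed RGB values of the annotation colors; class indices are illustrative.
COLOR_TO_CLASS = {
    (0, 0, 0): 0,      # background (black)
    (255, 0, 0): 1,    # arteriole (red)
    (0, 0, 255): 2,    # venule (blue)
    (0, 255, 0): 3,    # intersection (green)
}
IGNORE = -1  # white "uncertain" pixels (and any unknown color) are left out of the loss
# Arterioles and venules weighted 5, intersections 1e-12; background weight 1.0 is assumed.
CLASS_WEIGHTS = [1.0, 5.0, 5.0, 1e-12]


def encode_label(rgb):
    """Map one annotation pixel to a class index, or IGNORE."""
    return COLOR_TO_CLASS.get(tuple(rgb), IGNORE)


def softmax(logits):
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def weighted_cross_entropy(logits_per_pixel, labels, weights=CLASS_WEIGHTS):
    """Class-weighted mean cross-entropy over pixels whose label is not IGNORE."""
    loss, weight_sum = 0.0, 0.0
    for logits, y in zip(logits_per_pixel, labels):
        if y == IGNORE:
            continue
        loss -= weights[y] * math.log(softmax(logits)[y])
        weight_sum += weights[y]
    return loss / weight_sum if weight_sum > 0 else 0.0


pixels = [(255, 0, 0), (0, 0, 255), (255, 255, 255), (0, 0, 0)]
labels = [encode_label(p) for p in pixels]
print(labels)  # [1, 2, -1, 0]
logits = [[0.0, 2.0, 0.5, 0.0], [0.0, 0.5, 2.0, 0.0], [1.0, 0.0, 0.0, 0.0], [2.0, 0.0, 0.0, 0.0]]
print(weighted_cross_entropy(logits, labels))
```

With a weight of 1e-12, intersection pixels contribute almost nothing to the loss, so the network is not pushed toward either vessel class at crossings.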
3.2 Model Regularization
To prevent overfitting when training on a small dataset, we apply the following approaches: (1) a batch normalization layer is added after each convolutional layer; (2) in the encoding stage, a dropout operation with a rate of 0.2 is added after each convolution; (3) the number of parameters in the convolutional layers is reduced: the numbers of channels are set to 32, 32, 64, and 128 in the encoding stage and to 128, 64, and 32 in the decoding stage.
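Dropout with a rate of 0.2 zeroes each activation with probability 0.2 during training. The sketch below shows the common "inverted" variant, which rescales the surviving activations by 1/(1 - 0.2) so that no rescaling is needed at inference; whether the authors' framework uses exactly this variant is an assumption, and the function is illustrative.

```python
import random


def inverted_dropout(activations, rate=0.2, training=True, rng=None):
    """Zero each activation with probability `rate`; rescale survivors by 1 / (1 - rate)."""
    if not training or rate == 0.0:
        return list(activations)  # identity at inference time
    rng = rng or random.Random()
    keep = 1.0 - rate
    return [a / keep if rng.random() < keep else 0.0 for a in activations]


out = inverted_dropout([1.0] * 100_000, rate=0.2, rng=random.Random(0))
print(sum(out) / len(out))  # close to 1.0: the expected activation is unchanged
print(inverted_dropout([1.0, 2.0], training=False))  # [1.0, 2.0]
```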
3.3 Experimental Results
The model is trained with standard backpropagation and gradient descent. The learning rate is initialized to 1e-4 and, to speed up convergence, decayed with the poly learning-rate policy. We use the cross-entropy between the prediction and the label as the loss function. The batch size is set to 4, and two NVIDIA 1080 GPUs are used for parallel training.
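The poly policy scales the initial learning rate by (1 - step/max_steps)^power. The sketch below uses the initial rate of 1e-4 from the text; the total number of steps and the power of 0.9 (a value commonly used with this policy) are assumptions, since neither is reported.

```python
def poly_lr(base_lr, step, max_steps, power=0.9):
    """Poly learning-rate decay: base_lr * (1 - step / max_steps) ** power."""
    step = min(step, max_steps)  # clamp so 1 - step / max_steps stays non-negative
    return base_lr * (1.0 - step / max_steps) ** power


MAX_STEPS = 10_000  # hypothetical training length
for step in (0, 2_500, 5_000, 7_500, 10_000):
    print(step, poly_lr(1e-4, step, MAX_STEPS))
```

The rate starts at 1e-4 and decays smoothly to zero at the final step, taking large steps early and fine steps near convergence.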
We use the true positive rate (TPR) and accuracy to evaluate the classification results of the proposed model.
where the subscripts at and ve refer to arterioles and venules, respectively, and TP and FP denote true positives and false positives. The experimental results are shown in Table 1. Fig. 6 shows an example of the segmentation and classification results.
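As a hedged illustration, the sketch below computes per-class TPR and accuracy with common definitions restricted to ground-truth vessel pixels: the TPR of a class is the fraction of its pixels predicted as that class, and accuracy is the fraction of arteriole and venule pixels assigned the correct class. These definitions, the restriction to vessel pixels, and the class indices (1 for arterioles, 2 for venules) are assumptions and may differ from the formulation used for Table 1.

```python
def av_metrics(y_true, y_pred, classes=(1, 2)):
    """Per-class TPR and overall accuracy over pixels whose ground truth is in `classes`."""
    hits = {c: 0 for c in classes}
    counts = {c: 0 for c in classes}
    for t, p in zip(y_true, y_pred):
        if t in counts:  # background and other pixels are not evaluated
            counts[t] += 1
            hits[t] += int(p == t)
    tpr = {c: hits[c] / counts[c] if counts[c] else float("nan") for c in classes}
    n = sum(counts.values())
    accuracy = sum(hits.values()) / n if n else float("nan")
    return tpr, accuracy


# 1 = arteriole, 2 = venule, 0 = background (hypothetical flattened label maps)
tpr, acc = av_metrics([1, 1, 2, 2, 0], [1, 2, 2, 2, 1])
print(tpr, acc)  # {1: 0.5, 2: 1.0} 0.75
```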
In this work, we present a novel deep learning framework for automated segmentation and classification of arterioles and venules. We adopt the Unet (encoding-decoding) structure as our backbone network, which provides a reasonable baseline for this work. Furthermore, we exploit the Inception convolutional module to extract and fuse high-level features. During the experiments, we found that classic convolutions collect information ineffectively from complex vascular structures because of their small receptive fields. To address this, we develop a novel structure named cascading dilated convolutions (CDCs), which progressively enlarges the receptive field to segment and classify arterioles and venules. Finally, several regularization methods are applied to prevent overfitting during model training. The experimental results demonstrate the superior performance of the proposed method.
-  Xiayu Xu, Wenxiang Ding, Michael D Abràmoff, and Ruofan Cao, “An improved arteriovenous classification method for the early diagnostics of various diseases in retinal image,” Computer methods and programs in biomedicine, vol. 141, pp. 3–9, 2017.
-  Claudia Kondermann, Daniel Kondermann, and Michelle Yan, “Blood vessel classification into arteries and veins in retinal images,” in Medical Imaging 2007: Image Processing. International Society for Optics and Photonics, 2007, vol. 6512, p. 651247.
-  RA Welikala, PJ Foster, PH Whincup, Alicja R Rudnicka, Christopher G Owen, DP Strachan, SA Barman, et al., “Automated arteriole and venule classification using deep learning for retinal images from the uk biobank cohort,” Computers in biology and medicine, vol. 90, pp. 23–32, 2017.
-  Sufian AlBadawi and MM Fraz, “Arterioles and venules classification in retinal images using fully convolutional deep neural network,” in International Conference Image Analysis and Recognition. Springer, 2018, pp. 659–668.
-  Joes Staal, Michael D Abràmoff, Meindert Niemeijer, Max A Viergever, and Bram Van Ginneken, “Ridge-based vessel segmentation in color images of the retina,” IEEE transactions on medical imaging, vol. 23, no. 4, pp. 501–509, 2004.
-  Olaf Ronneberger, Philipp Fischer, and Thomas Brox, “U-net: Convolutional networks for biomedical image segmentation,” in International Conference on Medical image computing and computer-assisted intervention. Springer, 2015, pp. 234–241.
-  Christian Szegedy, Sergey Ioffe, Vincent Vanhoucke, and Alexander A Alemi, “Inception-v4, inception-resnet and the impact of residual connections on learning,” in AAAI, 2017, vol. 4, p. 12.
-  Liang-Chieh Chen, George Papandreou, Iasonas Kokkinos, Kevin Murphy, and Alan L Yuille, “Deeplab: Semantic image segmentation with deep convolutional nets, atrous convolution, and fully connected crfs,” IEEE transactions on pattern analysis and machine intelligence, vol. 40, no. 4, pp. 834–848, 2018.
-  W Liu, A Rabinovich, and AC Berg, “Parsenet: Looking wider to see better,” arXiv preprint arXiv:1506.04579, 2015.
-  Xiayu Xu, Rendong Wang, Peilin Lv, Bin Gao, Chan Li, Zhiqiang Tian, Tao Tan, and Feng Xu, “Simultaneous arteriole and venule segmentation with domain-specific loss function on a new public database,” Biomedical Optics Express, vol. 9, no. 7, pp. 3153–3166, 2018.
-  Xiayu Xu, Wenxiang Ding, Xuemin Wang, Ruofan Cao, Maiye Zhang, Peilin Lv, and Feng Xu, “Smartphone-based accurate analysis of retinal vasculature towards point-of-care diagnostics,” Scientific reports, vol. 6, pp. 34603, 2016.
-  Jiong Zhang, Behdad Dashtbozorg, Erik Bekkers, Josien PW Pluim, Remco Duits, and Bart M ter Haar Romeny, “Robust retinal vessel segmentation via locally adaptive derivative frames in orientation scores,” IEEE transactions on medical imaging, vol. 35, no. 12, pp. 2631–2644, 2016.
-  Qiaoliang Li, Bowei Feng, LinPei Xie, Ping Liang, Huisheng Zhang, and Tianfu Wang, “A cross-modality learning approach for vessel segmentation in retinal images,” IEEE Trans. Med. Imaging, vol. 35, no. 1, pp. 109–118, 2016.