Retinal imaging is the only feasible way to directly inspect blood vessels and the central nervous system of the human body in vivo, and it provides informative signs of possible disorders. Fundoscopy has thus become an important method and a routine examination that helps diagnose many diseases, including diabetes, hypertension, arterial hardening, and so forth chatziralli2012value. Fundoscopy is easy to perform, quick, accurate, and relatively low in cost. Physicians other than ophthalmologists are also considering wider use of fundoscopy.
However, like other types of medical images, retinal images exhibit high complexity and huge diversity JIN2019. Sufficiently trained specialists are required to handle the ever-increasing requests to read such images. Moreover, under such high demand, reading by specialists can be error-prone. To that end, computer-aided diagnosis that automatically analyzes retinal images is a promising technical breakthrough.
Various high-level tasks of retinal image analysis are built on top of vessel segmentation and artery/vein (A/V) classification, such as calculating the central artery equivalent, the central vein equivalent, and the artery-to-vein diameter ratio huang2018artery, as well as detecting retinal artery occlusion and retinal vein occlusion woo2016associations. These tasks can reveal risks of stroke, cerebral atrophy, cognitive decline, myocardial infarction, etc. Extensive research has addressed both components. For vessel segmentation, most early attempts relied on local information in retinal images Cheng2014; 7042289, such as intensity, color, and hand-crafted features. In recent years, UNet-based segmentation models UNet have become more popular 8036917; 8341481. As for A/V classification, a classic approach classifies vessels that have already been segmented in retinal images HUANG2018197, where structural priors on vessels have been leveraged for better performance (alam2018combining; 8598955). Deep models have also been explored and have achieved state-of-the-art performance 10.1007/978-3-319-93000-8_71. Meanwhile, the lack of large-scale labeled datasets has motivated data augmentation with generative adversarial networks 8055572.
Although many approaches have been proposed in this area, their performance is not yet satisfactory. This is because retinal images are usually complicated and noisy. It is hard to extract all vessels, including minor ones, without introducing too many false vessel pixels. Moreover, the available training data are very limited. In most of the public datasets, the number of retinal images for training is no more than
. Furthermore, the task becomes even harder when the vessels must be classified into arteries and veins, because this further increases the imbalance between the number of artery or vein pixels and the number of background (non-vessel) pixels.
In this paper, we propose a method for automatically analyzing retinal images, such as the one in Fig. LABEL:fig_story. Our method consists of two components: (i) a neural model, coined SeqNet, that segments vessels and classifies each pixel into artery and vein, and (ii) post-processing that refines the initial classification by SeqNet. The main idea behind our neural model is to train it jointly while making the segmentation and classification streams sequential rather than simultaneous, as shown in Fig. LABEL:fig_structure. The segmentation stream only handles vessel extraction, while the classification stream uses the segmentation results to shield itself from cluttered backgrounds in the input images. Existing methods that perform segmentation and classification simultaneously suffer from severely biased label distributions, since background pixels dominate retinal images. Our sequential model remedies this imbalance by dividing the task into background/vessel classification (i.e., segmentation) and artery/vein classification, where we employ the state-of-the-art model li2019iternet for the segmentation stream.
Some errors may still remain in the classification results. This is because fully convolutional network-like models (such as UNet-based ones 10.1007/978-3-319-93000-8_71; hemelings2019artery; 8759380), or more generally convolution operations, are better suited to extracting local features than to capturing global context. Hence, the predictions of UNet-based models depend on local cues, such as color and contrast, rather than on the structure of the whole vessel system. This locality leads to many minor errors, as shown in Fig. LABEL:fig_segments_analysis_2(a) and (b).
We therefore incorporate the global context, i.e., the structure of the vessel system, into our method via post-processing to further improve performance. We divide the extracted vessels into many small segments and unify the pixel-level predictions within each segment into a single prediction, which we call intra-segment label unification. We also propose a new strategy called inter-segment prediction propagation (PP), which further refines the classification results by propagating predictions to neighboring segments after judging whether two segments are connected parts of the same vessel or merely belong to two crossing vessels.
Our main contributions are threefold:
We design a joint segmentation and classification model based on the UNet architecture UNet, which handles the two tasks sequentially to balance the label distributions for better training.
We propose a post-processing procedure, consisting of intra-segment label unification and inter-segment prediction propagation, that refines the classification results by leveraging global information and smooths each pixel’s label along the structure of the vessel system.
We experimentally demonstrate that our method, including SeqNet and the post-processing, achieves state-of-the-art performance on two public datasets. The code is available at https://github.com/conscienceli/SeqNet.
Our method consists of SeqNet (Fig. LABEL:fig_structure) for the initial segmentation/classification and post-processing for refinement. The following sections detail these two components.
Some existing methods formulate A/V classification as a ternary classification task, in which each pixel is labeled as artery, vein, or background. This can degrade performance by further imbalancing the labels, since background pixels far outnumber artery and vein pixels. As discussed in Section 3, most state-of-the-art models in fact suffer from poor segmentation ability. Unlike these methods, SeqNet applies vessel/background segmentation and A/V classification sequentially within a single network, while still training both jointly.
As shown in Fig. LABEL:fig_structure, SeqNet mainly consists of two streams (the upper stream with the blue and green blocks and the lower stream with the orange block). The upper stream performs segmentation. For it, we adopt IterNet li2019iternet, which iteratively refines the segmentation result with smaller UNets (the green block in Fig. LABEL:fig_structure) after the initial segmentation by the blue block. This model has achieved state-of-the-art performance on the mainstream datasets staal:2004-855; 5740926. In SeqNet, the green block is repeated three times, following the original implementation in li2019iternet. The two streams use separate cross-entropy losses and are trained jointly with a batch size of . IterNet is trained against the segmentation labels, while the classification stream is trained against the A/V labels. Adam kingma2014adam is used as the optimizer with a learning rate of .
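The joint training described above can be sketched as two pixel-wise cross-entropy terms, one per stream, added into a single objective. This is a minimal sketch in plain Python over flattened per-pixel lists; the equal weighting of the two terms (`cls_weight=1.0`), the clipping constant `eps`, and all function names are assumptions for illustration, not details given in the text.

```python
import math


def binary_cross_entropy(probs, targets, eps=1e-7):
    """Mean binary cross-entropy between vessel probabilities and 0/1 labels."""
    total = 0.0
    for p, y in zip(probs, targets):
        p = min(max(p, eps), 1.0 - eps)  # clip to avoid log(0)
        total -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total / len(probs)


def categorical_cross_entropy(prob_vectors, labels, eps=1e-7):
    """Mean cross-entropy for per-pixel softmax outputs over
    (background, artery, vein); labels are class indices 0, 1, 2."""
    total = 0.0
    for probs, label in zip(prob_vectors, labels):
        total -= math.log(max(probs[label], eps))
    return total / len(labels)


def joint_loss(seg_probs, seg_targets, cls_probs, cls_labels, cls_weight=1.0):
    """Segmentation loss plus weighted classification loss, optimized together.
    The equal default weighting is an assumption of this sketch."""
    seg = binary_cross_entropy(seg_probs, seg_targets)
    cls = categorical_cross_entropy(cls_probs, cls_labels)
    return seg + cls_weight * cls


loss = joint_loss([0.9, 0.2], [1, 0],
                  [[0.8, 0.1, 0.1], [0.1, 0.7, 0.2]], [0, 1])
print(round(loss, 4))
```

Because a single scalar is minimized, one optimizer step updates both streams at once, which is what "trained jointly" amounts to in this sketch.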
Given the input retinal image $X \in \mathbb{R}^{W \times H \times 3}$ and the vessel map $V \in \mathbb{R}^{W \times H}$ refined by IterNet, where $W$ and $H$ are the width and height of the input image and vessel map, we apply another full-size UNet block, shown in orange in Fig. LABEL:fig_structure, to classify each pixel into artery or vein. The possible output labels are background, artery, and vein. We mask background pixels in the input image by

$$X' = X \odot V,$$

where $\odot$ is the element-wise multiplication (applied to each color channel). This masking reduces the complexity of the input retinal image, so that the classification stream can fully focus on differences in color, thickness, shape, etc., among the vessels. We put a gradient-blocking layer before the element-wise multiplication to prevent back-propagation from the classification stream to the segmentation stream, so that each stream is responsible for its own task and the two streams can be trained in a multi-task manner.
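A minimal sketch of the masking step, assuming the image is an H x W grid of RGB tuples and the vessel map is an H x W grid of values (plain nested lists stand in for tensors). Plain Python has no autodiff, so the gradient block is not modeled here; in a framework it would correspond to a stop-gradient on the vessel map before the multiplication (for example `tf.stop_gradient` in TensorFlow or `Tensor.detach()` in PyTorch), though which mechanism the original implementation uses is not stated. The function name `mask_image` is hypothetical.

```python
def mask_image(image, vessel_map):
    """Multiply every color channel of each pixel by its vessel-map value."""
    if len(image) != len(vessel_map):
        raise ValueError("image and vessel map must have the same height")
    masked = []
    for img_row, vmap_row in zip(image, vessel_map):
        if len(img_row) != len(vmap_row):
            raise ValueError("image and vessel map must have the same width")
        masked.append([
            tuple(channel * v for channel in pixel)  # same factor for R, G, B
            for pixel, v in zip(img_row, vmap_row)
        ])
    return masked


image = [[(200, 100, 60), (10, 20, 30)],
         [(40, 80, 120), (255, 255, 255)]]
vessel_map = [[1.0, 0.0],
              [0.5, 0.25]]
print(mask_image(image, vessel_map))
```

Pixels with a vessel value of 0 become black, so the classification stream sees only (soft-weighted) vessel pixels.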
The output from the classification stream is merged with the segmentation result. Let , where denote the softmax output of the classification stream.
2.2 Intra-segment Label Unification
There are mainly two types of errors in classification results. The first is inconsistency along a single vessel, i.e., both artery and vein labels appear in one vessel, as shown in Fig. LABEL:fig_segments_common_mistakes1. This happens because the underlying convolutional network does not take the structure of the vessel system into account and makes decisions mainly based on local features, such as color and shape, which are easily influenced by environmental factors, e.g., illumination and the retinal camera used. The second type is mixed-up prediction, which happens mostly near crossing and branching points, as shown in Fig. LABEL:fig_segments_common_mistakes2, because local features of both vessel types may be observed there. To remedy these two kinds of errors, we design a post-processing algorithm consisting of intra-segment label unification for the label inconsistency problem and inter-segment prediction propagation for the mixed-up prediction problem.
Intra-segment label unification first generates a binary image $B$ of detected vessels from SeqNet’s output $Y$ by

$$b_i = \begin{cases} 1 & \text{if } y_i > \tau, \\ 0 & \text{otherwise,} \end{cases}$$

where $b_i$ and $y_i$ are the $i$-th pixels in $B$ and $Y$, respectively, and $\tau$ is a predefined threshold. We then extract binary skeletons using a multiple-threshold method introduced in Appendix A, as shown in Fig. LABEL:fig_segments_analysis(a). We detect all key-points, which include the crossing points between vessels and the terminal points (i.e., start and end points) of vessels (Fig. LABEL:fig_segments_analysis(b)). Crossing points are detected as skeleton pixels with more than two neighbors, while terminal points have at most one neighbor. Skeletal pixels between connected key-points are extracted as a segment, as in Fig. LABEL:fig_segments_analysis(c).
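The thresholding and key-point rules above can be sketched as follows. Assumptions: the skeleton is a nested list of 0/1 values, neighbors are counted with 8-connectivity, and the strict inequality in `binarize` is a guess (the text only says a predefined threshold is used). Skeletonization itself is not reproduced; the function names are hypothetical.

```python
def binarize(prob_map, tau):
    """b_i = 1 if y_i > tau else 0 (strict inequality is an assumption)."""
    return [[1 if y > tau else 0 for y in row] for row in prob_map]


def count_neighbors(skel, r, c):
    """Number of skeleton pixels among the 8 neighbors of (r, c)."""
    h, w = len(skel), len(skel[0])
    total = 0
    for dr in (-1, 0, 1):
        for dc in (-1, 0, 1):
            if dr == 0 and dc == 0:
                continue
            rr, cc = r + dr, c + dc
            if 0 <= rr < h and 0 <= cc < w and skel[rr][cc]:
                total += 1
    return total


def find_keypoints(skel):
    """Return (crossing_points, terminal_points) of a binary skeleton:
    crossings have more than two neighbors, terminals at most one."""
    crossings, terminals = [], []
    for r, row in enumerate(skel):
        for c, value in enumerate(row):
            if not value:
                continue
            n = count_neighbors(skel, r, c)
            if n > 2:
                crossings.append((r, c))
            elif n <= 1:
                terminals.append((r, c))
    return crossings, terminals


skeleton = [
    [1, 0, 0, 0, 1],
    [0, 1, 0, 1, 0],
    [0, 0, 1, 0, 0],
    [0, 0, 1, 0, 0],
    [0, 0, 1, 0, 0],
]
print(find_keypoints(skeleton))
```

For this Y-shaped skeleton, the junction (2, 2) is the only crossing point and the three tips are terminal points. On real skeletons, several adjacent pixels around a junction can each have more than two neighbors, so in practice such clusters may need to be merged into one key-point.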
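Let $\mathcal{S} = \{S_1, \dots, S_K\}$ be the set of all segments extracted from the binary vessel image $B$, where $S_k$ is the set of pixels in segment $k$. We compute the confidence $c_k^t$ that segment $S_k$ belongs to class $t \in \{\mathrm{a}, \mathrm{v}\}$ (artery or vein) by

$$c_k^t = \sum_{i \in S_k} p_i^t,$$

where $p_i^t$ is the value in SeqNet’s confidence map for class $t$ corresponding to pixel $i$. $c_k^t$ can be viewed as the unified label confidence of $S_k$ for class $t$, and the actual prediction is made by comparing the two confidences, i.e., $S_k$ is an artery if $c_k^{\mathrm{a}} > c_k^{\mathrm{v}}$ and a vein otherwise.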
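A minimal sketch of intra-segment label unification: the per-pixel artery and vein confidences are summed over each segment (Appendix D describes the segment confidence as this sum), and every pixel of the segment takes the label with the larger total. Representing confidence maps as dicts from pixel coordinates to probabilities, and the function names, are assumptions of this sketch.

```python
def unify_segment(segment, artery_conf, vein_conf):
    """Sum class confidences over a segment's pixels and pick one label."""
    c_artery = sum(artery_conf[p] for p in segment)
    c_vein = sum(vein_conf[p] for p in segment)
    label = "artery" if c_artery > c_vein else "vein"  # ties go to vein
    return label, c_artery, c_vein


def unify_all(segments, artery_conf, vein_conf):
    """Assign the unified label of each segment to all of its pixels."""
    labels = {}
    for segment in segments:
        label, _, _ = unify_segment(segment, artery_conf, vein_conf)
        for p in segment:
            labels[p] = label
    return labels


segment = [(0, 0), (0, 1), (0, 2)]
artery_conf = {(0, 0): 0.9, (0, 1): 0.4, (0, 2): 0.8}
vein_conf = {(0, 0): 0.1, (0, 1): 0.6, (0, 2): 0.2}
print(unify_all([segment], artery_conf, vein_conf))
```

Pixel (0, 1) would be labeled vein on its own, but the segment total favors artery, so the inconsistent pixel is overwritten.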
2.3 Inter-segment Prediction Propagation
To address errors around crossing and branching points, we introduce additional post-processing, coined inter-segment prediction propagation, in which the label of a segment is propagated to its connected segments. This is based on the observation that classification failures usually come with low confidence in their labels and can be corrected by the influence of connected segments with high confidence. Propagation should depend on the similarity between connected segments in terms of shape, direction, etc. If two segments have similar shapes, are located nearby, and flow in similar directions, they very likely belong to the same vessel, so the influence between them should be strong.
Based on this observation, we update the confidence of segment according to the following rule:
where is the index of a segment connected to . is the coefficient that determines the influence of on , given by
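Let $\mathbf{t}_k$ be the unit tangent vector of segment $S_k$ at a certain key-point, computed from the key-point pixel position $\mathbf{p}_0$ and the position $\mathbf{p}_5$ of the fifth pixel along the skeleton, i.e., $\mathbf{t}_k = (\mathbf{p}_5 - \mathbf{p}_0) / \lVert \mathbf{p}_5 - \mathbf{p}_0 \rVert$. The angle term involves the angle between $\mathbf{t}_k$ and the tangent vector $\mathbf{t}_l$ of a connected segment $S_l$, defined as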
where is the angle formed by segments and and is given by
where is the pre-defined maximum value, decided by observing the vessel systems in the training images. This function normalizes into . gives 1 if the tangent vectors point in opposite directions (i.e., the angle is 180 degrees).
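The tangent vectors and the angle between them can be sketched as below. Assumptions: a segment is an ordered list of (x, y) pixel positions starting at the key-point, the key-point counts as index 0 so "the fifth pixel" is `path[5]`, and a segment shorter than that falls back to its last pixel. The normalization by the pre-defined maximum value is not reproduced because its exact form is not given here; the function names are hypothetical.

```python
import math


def unit_tangent(skeleton_path, offset=5):
    """Unit vector from the key-point path[0] toward the offset-th pixel."""
    x0, y0 = skeleton_path[0]
    x1, y1 = skeleton_path[min(offset, len(skeleton_path) - 1)]
    dx, dy = x1 - x0, y1 - y0
    norm = math.hypot(dx, dy)
    if norm == 0:
        raise ValueError("degenerate segment: cannot compute a tangent")
    return dx / norm, dy / norm


def angle_between(t_k, t_l):
    """Angle in degrees between two unit vectors (dot product clamped to [-1, 1])."""
    dot = t_k[0] * t_l[0] + t_k[1] * t_l[1]
    return math.degrees(math.acos(max(-1.0, min(1.0, dot))))


path_k = [(5 - i, 0) for i in range(8)]  # runs left from the key-point (5, 0)
path_l = [(5 + i, 0) for i in range(8)]  # runs right from the same key-point
t_k, t_l = unit_tangent(path_k), unit_tangent(path_l)
print(t_k, t_l, angle_between(t_k, t_l))
```

Because both tangents point away from the shared key-point, two segments that continue the same straight vessel form an angle of 180 degrees, which is the case the text maps to the maximum value of 1.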
handles a potential missing connection between two segments, which is defined as
where is a unit vector from ’s key-point to ’s, and the angle computed by is normalized by in the same way as Eq. (7). gives a value close to 1 if one of ’s key-points lies on the line described by .
Vessel thickness is also an informative cue for finding connected vessels, since connected segments have similar thickness. We encode this by , defined as
where gives the difference between the mean thicknesses of and , computed along the skeleton pixels. gives a small value if and are far from each other. We define this as
Both and are defined in the same way as Eq. (7).
We apply this update rule to all extracted segments. The detailed algorithm is presented in Appendix D. The label confidence evolves as shown in Fig. LABEL:fig_propagation_steps, where several iterations correct the predicted labels. Note that a segment has two end points, while , , and involve a single end point in each of segments and . We therefore update the confidence for all four combinations of end points.
This propagation process is not allowed to change the segments in the cup area, indicated by the magenta circle in Fig. LABEL:fig_segments_analysis(b), because vessels in this area are too dense for their relationships to be analyzed reliably, i.e., which segments are actually connected and which are merely crossing. Moreover, the higher brightness in the cup area causes many segmentation failures, which may lead to failures of PP.
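The propagation loop, including the rule that cup-area segments are never changed, can be sketched as below. This is a generic sketch under loud assumptions: the exact update rule is not recoverable from this text, so an additive form (each segment's artery and vein confidences grow by the coefficient-weighted confidences of its connected segments) is assumed, and the coefficient is supplied as a caller-defined function `weight(k, l)` standing in for the product of angle, gap, thickness, and distance terms. Updates are synchronous (every segment reads the previous iteration's values), which is a design choice of this sketch; Appendix D's loop may update segments in place. All names are hypothetical.

```python
def propagate(confidences, neighbors, weight, frozen=frozenset(), iterations=1):
    """Sketch of inter-segment prediction propagation.

    confidences: dict seg_id -> (c_artery, c_vein) after label unification.
    neighbors:   dict seg_id -> list of connected seg_ids.
    weight:      function (k, l) -> influence of segment l on segment k.
    frozen:      seg_ids (e.g. in the cup area) whose confidences never change.
    """
    current = dict(confidences)
    for _ in range(iterations):
        updated = {}
        for k, (ca, cv) in current.items():
            if k in frozen:
                updated[k] = (ca, cv)  # cup-area segments are left untouched
                continue
            for l in neighbors.get(k, []):
                w = weight(k, l)
                ca += w * current[l][0]  # assumed additive update
                cv += w * current[l][1]
            updated[k] = (ca, cv)
        current = updated  # synchronous: all segments used last iteration's values
    labels = {k: ("artery" if ca > cv else "vein") for k, (ca, cv) in current.items()}
    return labels, current


labels, conf = propagate(
    {"A": (10.0, 2.0), "B": (4.0, 5.0), "C": (1.0, 9.0)},
    {"A": ["B"], "B": ["A"], "C": ["B"]},
    weight=lambda k, l: 0.5,
    frozen={"C"},
)
print(labels, conf)
```

Segment B starts with a weak vein majority but is flipped to artery by the confident artery segment A, while the frozen segment C keeps its original confidences even though it is connected to B.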
3 Performance Evaluation
We use two popular public datasets to evaluate our method, namely DRIVE staal:2004-855 with the artery/vein labels from 10.1007/978-3-642-40763-5_54, and LES-AV orlando2018towards. On the DRIVE dataset, we compare our method with two recent methods, i.e., uncertainty-aware (UA) 8759380 and fully convolutional network (FCN) hemelings2019artery.
One problem is that existing methods use different evaluation strategies. Although most of them use accuracy as the performance metric, they usually compute it over different pixel masks, including the whole image, the discovered vessel pixels, the ground-truth vessel pixels, the major vessel pixels, etc. To lower the barrier to reproducing and testing A/V classification methods, we adopt the recently proposed evaluation procedure of hemelings2019artery, which includes a series of pixel masks, such as the full image, the center-lines of discovered vessels, the center-lines of major discovered vessels (width), the amount of discovered vessels, etc.
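To see why the choice of pixel mask matters, here is a minimal sketch of accuracy restricted to a binary evaluation mask. The label strings and the function name are illustrative assumptions; the masks actually used follow hemelings2019artery and are not reproduced here.

```python
def masked_accuracy(pred, truth, mask):
    """Fraction of correctly labeled pixels inside a binary evaluation mask.

    pred, truth: equally sized grids of labels (e.g. "a", "v", "b").
    mask: grid of truthy/falsy values selecting the pixels to evaluate
          (full image, vessel center-lines, major vessels, ...).
    """
    correct = total = 0
    for p_row, t_row, m_row in zip(pred, truth, mask):
        for p, t, m in zip(p_row, t_row, m_row):
            if m:
                total += 1
                correct += p == t
    if total == 0:
        raise ValueError("empty evaluation mask")
    return correct / total


pred = [["a", "v"], ["v", "b"]]
truth = [["a", "a"], ["v", "b"]]
print(masked_accuracy(pred, truth, [[1, 1], [1, 1]]))  # full image
print(masked_accuracy(pred, truth, [[1, 1], [0, 0]]))  # first row only
```

The same prediction scores 0.75 on the full image but 0.5 on the first row alone, so accuracies reported under different masks are not directly comparable.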
The results in Tables LABEL:table_results_drive and LABEL:table_results_les_av show that our method achieves a better AUC than the other models, as separating segmentation from classification prevents the segmentation performance from deteriorating. Moreover, our full method (SeqNet & LU & PP) achieves higher accuracy on both datasets.
In this paper, we propose SeqNet for accurate vessel segmentation and artery/vein classification in retinal images, together with a post-processing algorithm. SeqNet performs segmentation and classification sequentially rather than simultaneously, because the simultaneous approach may degrade segmentation performance due to the imbalanced label distribution. Our post-processing algorithm then corrects the classification results by propagating highly confident labels to the surrounding vessel segments. Experimental results show that our method is effective and achieves state-of-the-art performance on two public datasets.
This work was supported by Council for Science, Technology and Innovation (CSTI), cross-ministerial Strategic Innovation Promotion Program (SIP), “Innovative AI Hospital System” (Funding Agency: National Institute of Biomedical Innovation, Health and Nutrition (NIBIOHN)). This work was also supported by JSPS KAKENHI Grant Number 19K10662.
Appendix A Multiple Thresholds in Segments Extraction
To propagate the influence correctly, we have to extract the vessel segments accurately. Otherwise, the vessel map may be erroneous, resulting in unreasonable propagation, as shown in Fig. LABEL:fig_multiple_thresholds(a), where a missing important segment causes a wrong label to be propagated to the segment on the right-hand side. Therefore, we generate several binary skeletons with different thresholds and combine them into a complete vessel map. This is also detailed in Algorithm D.
Appendix B Example Results of Intra-Segment Label Unification
Fig. LABEL:fig_segments_analysis_2(a) shows the direct output from the classification stream, in which we can see many prediction errors. Figs. LABEL:fig_segments_analysis_2(b) and (c) are the results of vessel skeleton extraction and label unification, respectively, where most label inconsistencies within a single vessel segment have been resolved.
Appendix C Common Prediction Errors
Figs. LABEL:fig_segments_common_mistakes1 and LABEL:fig_segments_common_mistakes2 respectively show two common errors in classification, i.e., inconsistency along one single vessel segment and mixed-up prediction that happens around the crossing and branching points in most cases.
Appendix D Post-Processing Algorithm
We detail the proposed post-processing in Algorithm D, including multiple thresholds fusion, segment extraction, label unification, and prediction propagation.
The thresholds we select in our implementation are , , and . They are in descending order because a higher threshold yields a skeleton with higher confidence by focusing on major vessels, while smaller thresholds also cover minor vessels.
As introduced in Section 2.2, label unification is based on the confidence associated with each segment, which is the sum of the prediction confidences of the pixels in that segment. The confidence value is also used in PP, which may need several iterations to reach a good result. In our experiment, the number of iterations is set to .
Output: Refined prediction result
// Start searching segments in the vessel map
for each threshold tr (in descending order) do
    BS ← Skeletonize(Binarify(prediction, threshold=tr))
    keypoints ← FindEndPoints(BS) + FindCrossingPoints(BS)
    segments ← segments + FindSegments(keypoints)
end for
// Start unifying the segments
for each s in segments do
    CalculateTotalConfidence(s)   // using Eq. 3
    UnifyResultAlongOneSegment(s)
end for
// Start prediction propagation
count ← 0
while count < number of iterations do
    for each s in segments do
        UpdateConfidence(s, segments)   // using Eq. 4, 5
        ChangeSegmentCategory(s)
    end for
    count ← count + 1
end while
Appendix E Example Prediction Results
Fig. LABEL:fig_experiment_figures_drive shows an example result on the DRIVE dataset.