Optical coherence tomography angiography (OCTA) is a non-invasive imaging technique that is widely used for retinal vascular imaging. Unlike conventional vascular imaging techniques such as fluorescein angiography (FA), OCTA avoids the potential side effects and risks associated with dye injection. Compared to color fundus imagery, as shown in Fig. 1 (B), OCTA is able to capture the microvasculature in the foveal and parafoveal regions. Therefore, it is often used to quantify important clinical indicators, such as vessel density, curvature and fractal dimension, to assist clinicians in the diagnosis and treatment of retina-related diseases [26, 4, 8]. However, the diagnosis of some retinal diseases requires the analysis of vascular structure over a larger field of view (FOV). For example, diabetic retinopathy (DR) mainly presents with vascular changes in the macular periphery, but some important vessels are often not visible in OCTA scans because of the limited imaging range. Meanwhile, at a fixed sampling frequency the resolution of OCTA images is inversely proportional to the FOV, which hampers clinicians' analysis of larger vascular areas. Consequently, high-resolution OCTA images are highly desirable when a larger FOV is required.
Current OCTA imaging devices, such as the RTVue XR Avanti SD-OCT system (Optovue, USA) and CIRRUS HD-5000 (Carl Zeiss, Germany), can produce OCTA en face images with different fields of view, e.g., , , and . The and scans are two commonly used scanning approaches in clinical practice. The former has a higher scanning density, which means it has a higher image resolution and can depict the retinal foveal avascular zone (FAZ) and surrounding capillaries more clearly, as shown in Fig. 1 (B) and (D). By contrast, as shown in Fig. 1 (C) and (E), the scan has a larger FOV but a lower scanning density, which reduces the visibility of small capillaries. This is because a high scanning/sampling frequency is hard to achieve in clinical practice at a larger FOV, given the longer inter-frame and total imaging time it requires. Artifacts caused by involuntary movements, blinking and tear film evaporation arise during a long acquisition and interfere with clinical diagnosis. Thus, ensuring adequate image resolution at a larger FOV has become a challenging issue.
In recent years, several works have sought to achieve high imaging resolution within larger-FOV images by improving the optical system directly. Despite these significant efforts, current state-of-the-art OCTA acquisition systems still need to make a trade-off between image resolution and FOV size. Hence, pre-processing techniques such as image enhancement have been developed to compensate for the resolution loss and have proved valuable. For example, Zang et al. proposed to minimize image artifacts in order to extend the total sampling and thus obtain high-resolution images with a larger FOV. Uji et al. employed a multiple en face image averaging method to enhance the image quality of OCTA; Tan et al. developed a vessel enhancement algorithm based on a modified Bayesian residual transform, which can improve the contrast and visibility of vascular features. However, such procedures may require more repeated acquisitions, which in turn increases the total sampling time.
Image super-resolution techniques offer an alternative, as they are not only able to perform high-resolution (HR) reconstruction of images with larger FOVs, but also reduce the need for long acquisition times. Currently, image super-resolution techniques can be divided into three types: interpolation-based, statistics-based [32, 44, 45], and learning-based methods [27, 51, 38]. Many image super-resolution techniques struggle to recover detailed, realistic textures or are limited to small upscaling factors, while learning-based methods are capable of recovering missing high-frequency information from large quantities of low-resolution (LR) and HR pairs, and thus have achieved significant improvements in super-resolution reconstruction [7, 37, 29, 24, 9]. For example, since the introduction of the Super-Resolution Convolutional Neural Network (SRCNN), convolutional neural networks (CNNs) have led to state-of-the-art performance in super-resolution reconstruction of medical images. Umehara et al. applied SRCNN to CT images, and their approach outperforms many of the traditional linear interpolation methods. For MRI images, Shi et al. developed a residual learning-based super-resolution method. Feng et al. proposed a multi-stage integration network for multi-contrast MRI super-resolution, which explicitly models the dependencies between multi-contrast images at different stages. Zhou et al. proposed a two-stage GAN to improve the reconstruction of ultrasound image structures and details by cascading a U-Net network at the front end of the generator and a modified GAN network at the back end. The network combined multi-scale global residual and local residual learning, which can effectively capture high-frequency details. These successes have motivated our investigation of OCTA super-resolution methods using CNNs. Hence, in this paper, we aim to enhance the image quality of OCTA via a learning-based super-resolution method to alleviate the resolution loss caused by under-sampling.
In general, learning-based super-resolution methods require pairs of LR-HR images. In practice, it is extremely difficult to obtain a realistic HR image in OCTA acquisition. Since the smaller-FOV OCTA images have higher resolution and are also fully contained within the larger-FOV images, we can use realistic smaller-FOV OCTA images as the reference HR images to guide the reconstruction of realistic larger-FOV images. This strategy requires registering the former to the latter. However, it is difficult to achieve a perfect registration result, because local capillary differences caused by eye motion and signal variations exist between the two acquisitions, as shown in Fig. 1. Such discrepancy in the spatial mapping between the two types of images has been observed in several OCTA repeatability and reproducibility studies [11, 20, 25, 19]. Ignoring this domain discrepancy during the learning process could result in over-smoothed reconstructions with less detailed information. Reducing the domain gap between the two types of images is therefore key to high-resolution reconstruction of vascular structures.
Domain adaptation aims to use labeled source domains to learn models that perform well on unlabeled target domains. Recently, adversarial-based domain adaptation methods have been proposed for solving challenging dense prediction tasks [28, 49, 35]. For example, Tsai et al. proposed an adversarial-based domain adaptation method for semantic segmentation, which uses the spatial similarity of the source and target domain outputs to reduce the domain differences. By means of appropriate adaptation strategies, models trained on synthetic datasets have achieved performance comparable to that of models trained with realistic labeled datasets.
Inspired by these adversarial-based domain adaptation methods, we propose a novel framework for OCTA image super-resolution reconstruction from a domain adaptation perspective, which can mitigate the difficulties in pairwise training of and images. In the proposed network, a multi-level super-resolution model serves as the generator for image reconstruction, and PatchGAN is used as the discriminator to align the reconstruction results with the features of HR images. In addition, we introduce a sparse edge-aware loss to optimize the vascular structure of the reconstruction results via dynamic alignment of edges between the reconstructed HR image and the reference HR image. Finally, we evaluate the quality of the super-resolution results on two OCTA sets with vascular and FAZ pixel annotations. The experimental results show that our proposed method achieves state-of-the-art results in terms of visual quality and clinical evaluation on both sets.
Our contributions can be summarized as follows:
An adversarial-based super-resolution model is designed in this work. It aligns the spatial features between the generated HR image and the reference HR image to address the domain gap between LR images and their reference HR images.
We introduce a sparse edge-aware loss to dynamically optimize the reconstruction of local structures in LR images. This new loss provides the flexibility to learn similar edge features from the reference HR images.
The proposed method has undergone rigorous quantitative and qualitative evaluation, which demonstrates that it improves OCTA image segmentation performance.
In this work, a new SUper-resolution REconstruction dataset (SURE) of OCTA images is constructed. SURE includes two subsets, SURE-O and SURE-Z, whose images were acquired by two commonly used OCT imaging devices: the Optovue Avanti RTVue XR (Optovue, Fremont, USA) and the Zeiss Cirrus HD-OCT 5000 (Zeiss Meditec, Dublin, USA), respectively.
SURE-O contains 559 pairs of OCTA images from 320 subjects (including 150 with Alzheimer’s disease (AD) and 170 healthy controls) with FOV of and separately. Both FOVs have the same image resolution of . We randomly assigned the 559 pairs of images to the training, testing, and validation sets in a 4:1:1 ratio.
SURE-Z contains 261 pairs of OCTA images from 172 subjects (including 159 with diabetic retinopathy (DR) and 13 healthy controls) with FOVs of and separately. Both FOVs have the same image resolution of . Similar to the SURE-O dataset, we divided the SURE-Z into the training, testing, and validation sets in the ratio of 4:1:1.
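A random 4:1:1 split of image pairs can be sketched as follows. This is only an illustration under stated assumptions: the helper name `split_pairs`, the fixed seed, and the rule that leftover pairs go to the validation set are hypothetical choices, and the split is made per image pair (not per subject), following the wording above.

```python
import random

def split_pairs(n_pairs, ratios=(4, 1, 1), seed=0):
    """Shuffle pair indices and split them into train/test/val by ratio."""
    indices = list(range(n_pairs))
    random.Random(seed).shuffle(indices)  # seeded for reproducibility
    total = sum(ratios)
    n_train = n_pairs * ratios[0] // total
    n_test = n_pairs * ratios[1] // total
    train = indices[:n_train]
    test = indices[n_train:n_train + n_test]
    val = indices[n_train + n_test:]  # the remainder goes to validation
    return train, test, val

train, test, val = split_pairs(559)  # 559 pairs, as in SURE-O
print(len(train), len(test), len(val))
```

Because 559 is not divisible by 6, any implementation has to decide where the remaining pairs go; the sketch assigns them to the last subset.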
The super-resolution reconstruction task usually requires aligned image pairs in the training phase. To this end, we prepare both synthetic and realistic LR-to-HR image pairs in this work. We define the original OCTA images from both sets as HR images. The corresponding low-resolution images were obtained by two different strategies. First, the original OCTA images were downsampled with bicubic interpolation to obtain the synthetic LR image (); the original image was degraded from to . Second, we cropped the corresponding scanned area from the larger-FOV image and defined it as the realistic LR image, with the Generalised Dual Bootstrap-ICP (GDB-ICP) registration method employed for alignment. Fig. 3 illustrates the strategies of LR image generation.
In addition, we use retinal vessel and FAZ segmentation to confirm the impact of the proposed reconstruction model on subsequent analysis tasks. Three well-trained imaging experts manually labeled the retinal vessels and FAZ, and then two senior ophthalmologists reviewed and refined the manual annotations. Their consensus was defined as the ground truth in this study. The inter-annotator agreement is higher than 0.90 at the pixel level. 37 and 36 OCTA images (all with scan) from the testing sets of SURE-O and SURE-Z, respectively, were randomly selected for annotation. All the data described in this section have appropriate approvals from the ethics committees of Ningbo Institute of Industrial Technology, Chinese Academy of Sciences, and written informed consent was obtained from each participant in accordance with the Declaration of Helsinki.
As mentioned above, due to the noise and contrast variations in the realistic LR-to-HR pairs, it is challenging to obtain satisfactory reconstruction performance by relying solely on the realistic pairs to supervise a pixel-by-pixel learning process. Therefore, the key to our network lies in how synthetic data pairs can guide the super-resolution reconstruction of realistic data pairs. To this end, we propose a semi-supervised framework which exploits the concept of domain adaptation reconstruction. The proposed Sparse-based domain Adaptation Super-Resolution (SASR) method consists of three components, i.e., a multi-level super-resolution network (MLSR), a discriminator network, and a sparse edge-aware loss function. It is worth noting that we define the adversarial network including the MLSR and patch discrimination modules as a domain adaptation super-resolution network (DASR). Fig. 3 shows the overall network architecture.
3.1 Multi-level super-resolution network
Unlike image classification, where adaptation takes place in the global feature space, dense estimation tasks are better served by adaptive learning in a low-dimensional space. Our intuition is that the reconstructed OCTA image has strong spatial and local similarities to the HR OCTA image. Therefore, we exploit this property to adapt the low-dimensional output of super-resolution reconstruction by means of generative adversarial learning, in which an MLSR network serves as the super-resolution generative network and a patch discriminator as the discriminative network.
The MLSR network is introduced for the supervised super-resolution reconstruction of , while and share the network parameters to generate high-resolution reconstruction results of the realistic LR images. As shown in Fig. 3, the MLSR consists of three main components: a backbone network, an encoder-decoder network, and an up-sampling fusion network. The backbone network adopts the RDN structure, which consists of four main parts: the shallow feature extraction net (SFENet), residual dense blocks (RDBs), dense feature fusion (DFF), and the up-sampling net (UPNet). In our method, we remove the UPNet structure from the RDN network and preserve the framework of SFENet, RDBs and DFF. As shown in Fig. 3 (a), SFENet consists of two convolutional layers, and the RDBs are composed of six dense residual modules and one convolutional layer, forming a contiguous memory mechanism. The contiguous memory mechanism refers to dense connection, which is realized by passing the state of the preceding RDB to each layer of the current RDB. This mechanism ensures continuous storage and memory of low-level and high-level information. We denote the output of this module as .
The backbone network extracts dense multi-scale features of the image. However, the multi-resolution features are not fully utilized by this network. Therefore, in a parallel sub-network, we introduce an encoder-decoder network to extract the multi-resolution features of the image and optimize the overall structure. This sub-network consists of three encoder-decoder layers with skip-connection blocks. As can be seen in Fig. 3 (b), each green box represents an encoder layer, which includes two residual modules and a max-pooling layer. The residual module contains three convolutional layers and three batch normalization layers. The blue boxes represent the decoder layers, each of which includes an up-sampling layer and a residual module, where the up-sampling layer employs nearest-neighbor interpolation. The white boxes represent the RDB modules, which are formed by combining a residual block and a dense connection block to enhance the transmission of information and gradients. We add the RDB module to the skip connection with the aim of improving the ability to extract and transfer spatial information to the input feature map. The output of the encoder-decoder network is denoted by .
Afterwards, we concatenate and into a new feature map as the input to the up-sampling fusion network. This network mainly consists of a dynamic convolution layer and an up-sampling layer. Dynamic convolution allows the layer parameters to be updated adaptively according to the feature maps of the different input channels, which can synthesize and optimize the reconstruction results of the output at different stages. As shown in Fig. 4, the weight matrix for each dynamic convolutional layer, where , , and are the number of kernels, input channel, output channel and kernel size, can be defined as:
where and represent the weight matrix and attention map of the kernel, and we set in the experiments. Note that is obtained by using a Squeeze and Excitation (SE) module to obtain a discriminative representation. More specifically, the SE module is composed of a global average pooling (GAP) layer and a Multi-Layer Perceptron (MLP) with a ReLU-activated hidden layer, followed by a Softmax layer. The output of is denoted as , which is fed into the PixelShuffle layer to obtain high-resolution reconstruction results of the same size as . The PixelShuffle layer first feeds the low-resolution feature map into a convolutional layer for channel expansion, and then performs multi-channel reorganisation to generate a high-resolution map through periodic shuffling. Compared to other upsampling methods, PixelShuffle is able to increase the frequency of information extraction to enrich the detail texture of the image. In the training stage, we combine the Mean Squared Error (MSE) and SSIM as the super-resolution loss to optimize the output of this network. The loss can be defined as:
where and are set to 1 and 0.5 empirically.
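The attention-weighted kernel aggregation behind dynamic convolution can be illustrated with a small pure-Python sketch. It follows the common formulation in which the effective weight is the sum of the candidate kernels, each scaled by a softmax attention score obtained from GAP followed by a small MLP. All function names, the toy shapes, and the use of two kernels are assumptions made for illustration; they are not taken from the network described above.

```python
import math

def global_average_pool(feature_map):
    """Reduce each channel (a 2D list) to its mean value."""
    return [sum(sum(row) for row in ch) / (len(ch) * len(ch[0]))
            for ch in feature_map]

def linear(x, weight, bias):
    """Dense layer; `weight` holds one row per output unit."""
    return [sum(w * v for w, v in zip(row, x)) + b
            for row, b in zip(weight, bias)]

def softmax(z):
    m = max(z)  # subtract the maximum for numerical stability
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]

def kernel_attention(feature_map, w1, b1, w2, b2):
    """GAP -> hidden layer with ReLU -> output layer -> softmax over kernels."""
    pooled = global_average_pool(feature_map)
    hidden = [max(h, 0.0) for h in linear(pooled, w1, b1)]
    return softmax(linear(hidden, w2, b2))

def aggregate_kernels(kernels, attention):
    """Attention-weighted sum of flattened kernels of equal length."""
    return [sum(a * k[i] for a, k in zip(attention, kernels))
            for i in range(len(kernels[0]))]

# Toy example: 2 input channels, 2 candidate kernels (hypothetical values).
feature_map = [[[1.0, 3.0], [5.0, 7.0]], [[0.0, 0.0], [0.0, 0.0]]]
w1, b1 = [[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0]
w2, b2 = [[0.0, 0.0], [0.0, 0.0]], [0.0, 0.0]  # zero weights give uniform attention
attention = kernel_attention(feature_map, w1, b1, w2, b2)
weights = aggregate_kernels([[1.0, 2.0], [3.0, 4.0]], attention)
print(attention, weights)
```

Because the attention scores depend on the input feature map, the aggregated kernel changes from input to input, which is what distinguishes dynamic convolution from a fixed convolution.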
3.2 Patch discrimination
To guide the realistic LR-to-HR image reconstruction, we use a discriminator to assist the generative network in achieving spatial feature alignment between synthetic and realistic data. In this paper, we use PatchGAN to classify the reconstruction results of synthetic and realistic LR images as true or false samples. Compared with traditional discriminators, PatchGAN makes judgments at the patch level rather than the whole-image level. As shown in Fig. 5, PatchGAN is composed of five convolutional layers with a stride of 2 in the first three layers and a stride of 1 in the last two layers. Each of the first four convolutional layers is followed by a Leaky ReLU layer with a slope of 0.2 and a batch normalization (BN) layer. A Sigmoid layer is applied after the last convolutional layer to obtain the probability score of the output feature map. In the above setting, the receptive field size of PatchGAN is , which means that PatchGAN has a faster inference speed than traditional discriminators, but can still guide the generator to generate realistic results. Consequently, each pixel in the final output feature map represents the probability that the corresponding patch of the input image is from a real sample. To this end, the loss function in the adversarial process of discriminator and generator is defined as:
where and are discriminator and MLSR, respectively.
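The receptive field of a stack of convolutions follows from the kernel sizes and strides alone, which the short sketch below computes. The strides (2, 2, 2, 1, 1) come from the description above; the 4×4 kernel size is an assumption borrowed from the commonly used PatchGAN configuration, since the kernel size is not stated here.

```python
def receptive_field(layers):
    """Receptive field of stacked convolutions.

    `layers` is a list of (kernel_size, stride) tuples from input to output.
    """
    rf, jump = 1, 1  # jump = distance in input pixels between adjacent outputs
    for kernel, stride in layers:
        rf += (kernel - 1) * jump
        jump *= stride
    return rf

# Five layers: stride 2 for the first three, stride 1 for the last two.
# The kernel size of 4 is an assumed value.
patchgan_layers = [(4, 2), (4, 2), (4, 2), (4, 1), (4, 1)]
print(receptive_field(patchgan_layers))  # 70
```

Under these assumptions each output score depends on a 70×70 input patch; a different kernel size would change this figure.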
3.3 Sparse edge-aware loss
Unlike general domain adaptation approaches, the inputs to our proposed framework are paired synthetic and realistic LR images. Even though and have some discrepancy in spatial mapping, their vascular structures are very similar. In order to further improve the local vascular reconstruction of realistic LR images, we propose to use an edge similarity loss for structure optimization. However, since background noise affects the extraction of some vessel structures, optimizing all vessel edges would introduce mislabeling and thus generate over-smoothed structures. To solve this issue, a sparse edge-aware loss is designed to adaptively constrain the reconstruction results. First, as shown in Fig. 6, we use the Canny operator to extract the edge structures of the and . Then MSE is employed to compute the distances between the two edge images as follows:
where and denote the patches with the size of from and with , respectively. denotes the distance of each pair of patches. Then we adopt a hard shrinkage operation to promote the sparsity:
where is also known as ReLU activation, and is a very small positive scalar. is used to screen vascular regions of with structures similar to . Finally, we compute the edge distances between realistic LR image reconstruction result and label :
where and denote the patches with the size of from and . Therefore, the sparse edge-aware loss is defined as:
To this end, the total loss function is denoted as:
where and are set as 1 and 0.1 in our paper.
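One plausible reading of the sparse gating idea is sketched below in pure Python: per-patch MSE distances between two edge maps are turned into a near-binary gate by a shrinkage of the form relu(λ − d) / (|λ − d| + ε), and only the gated patches contribute to the edge loss. The exact shrinkage formula, the function names, and the choice of which image pair feeds the gate are assumptions; the edge maps are assumed to be precomputed (for example with a Canny detector) and given as 2D lists.

```python
def patch_mse(a, b, size):
    """Mean squared error of each non-overlapping size x size patch pair."""
    h, w = len(a), len(a[0])
    dists = []
    for top in range(0, h - size + 1, size):
        for left in range(0, w - size + 1, size):
            sq = 0.0
            for i in range(top, top + size):
                for j in range(left, left + size):
                    sq += (a[i][j] - b[i][j]) ** 2
            dists.append(sq / (size * size))
    return dists

def shrink_gate(dists, lam=0.05, eps=1e-12):
    """Close to 1 where the distance is below lam, exactly 0 otherwise."""
    return [max(lam - d, 0.0) / (abs(lam - d) + eps) for d in dists]

def sparse_edge_loss(gate_a, gate_b, recon_edges, ref_edges, size, lam=0.05):
    """Average patch distance between reconstruction and reference edges,
    counting only patches whose gating pair is structurally similar."""
    gates = shrink_gate(patch_mse(gate_a, gate_b, size), lam)
    dists = patch_mse(recon_edges, ref_edges, size)
    return sum(g * d for g, d in zip(gates, dists)) / len(dists)

# Toy 2x4 edge maps split into two 2x2 patches (hypothetical values).
gate_a = [[0, 0, 1, 1], [0, 0, 1, 1]]
gate_b = [[0, 0, 0, 0], [0, 0, 0, 0]]
recon = [[1, 1, 1, 1], [1, 1, 1, 1]]
ref = [[0, 0, 0, 0], [0, 0, 0, 0]]
print(sparse_edge_loss(gate_a, gate_b, recon, ref, size=2))
```

In the toy example only the left patch passes the gate, so the dissimilar right patch is excluded from the loss; this is the behaviour that prevents noisy regions from driving the reconstruction towards over-smoothed edges.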
4.1 Implementation details
The proposed method was implemented with the publicly available PyTorch library on an Nvidia GeForce TITAN Xp GPU. In the training phase, we employed an Adam optimizer to optimize the deep model. We used a gradually decreasing learning rate, starting from 0.0001, and a momentum of 0.9. In each iteration, we took a random patch of the image for training, and the batch size was set to 8. We trained the network for 300 epochs, and the network reached convergence at around the 150th epoch. The training lasted approximately 20 and 8 hours on the SURE-O and SURE-Z datasets, respectively. In addition, online data augmentation with a random rotation from to was employed to enlarge the training set. In the sparse edge-aware loss, the parameter in Equation (7) was set to 0.05 by counting the average distance between the edges of and . and are set to 16 and 36, respectively.
Table 1: FAZ segmentation results on realistic images (mean±std).
|Method||AUC||SEN||G-mean||Kappa||FDR||Dice||HD|
|RDN (baseline)||0.942±0.076||0.846±0.166||0.913±0.107||0.863±0.192||0.110±0.205||0.867±0.186||29.12±30.77|
|Two-stage GAN||0.947±0.066||0.852±0.165||0.916±0.106||0.866±0.193||0.106±0.206||0.870±0.186||24.80±31.30|
|SASR (Our Method)||0.961±0.034||0.890±0.093||0.940±0.056||0.886±0.147||0.104±0.177||0.890±0.142||17.58±29.58|
Table 2: FAZ segmentation results on realistic images (mean±std).
|Method||AUC||SEN||G-mean||Kappa||FDR||Dice||HD|
|RDN (baseline)||0.941±0.031||0.830±0.073||0.908±0.042||0.840±0.090||0.134±0.110||0.846±0.087||69.92±87.87|
|Two-stage GAN||0.928±0.041||0.799±0.093||0.889±0.055||0.791±0.116||0.194±0.147||0.799±0.112||63.13±85.29|
|SASR (Our Method)||0.958±0.033||0.889±0.070||0.940±0.039||0.897±0.063||0.086±0.061||0.900±0.060||41.01±42.88|
Table 3: Vessel segmentation results on realistic images (mean±std).
|Method||AUC||SEN||G-mean||Kappa||FDR||Dice|
|RDN (baseline)||0.793±0.032||0.579±0.047||0.740±0.029||0.571±0.048||0.235±0.093||0.655±0.044|
|Two-stage GAN||0.789±0.033||0.578±0.047||0.738±0.029||0.567±0.048||0.239±0.096||0.652±0.045|
|SASR (Our Method)||0.807±0.036||0.649±0.049||0.775±0.030||0.594±0.060||0.269±0.093||0.684±0.055|
Table 4: Vessel segmentation results on realistic images (mean±std).
|Method||AUC||SEN||G-mean||Kappa||FDR||Dice|
|RDN (baseline)||0.734±0.040||0.526±0.040||0.681±0.027||0.433±0.048||0.322±0.060||0.591±0.040|
|Two-stage GAN||0.732±0.037||0.522±0.039||0.680±0.027||0.436±0.048||0.316±0.059||0.591±0.040|
|SASR (Our Method)||0.747±0.038||0.541±0.049||0.688±0.030||0.442±0.051||0.323±0.051||0.601±0.045|
4.2 Evaluation Metrics
For the pairs of realistic OCTA images, it is unreliable to evaluate the image quality improvement with common quality measures, because inherent variations in signal and noise levels arise during image acquisition at different scales, i.e. and . Nevertheless, the similarity of clinically significant structures such as blood vessels provides an alternative means of quality evaluation, i.e., by considering the improvements to FAZ and vascular segmentations.
For the synthetic OCTA images, we use peak signal-to-noise ratio (PSNR), structural similarity (SSIM) and learned perceptual image patch similarity (LPIPS) to evaluate the reconstruction quality of different super-resolution methods. The PSNR is defined as:
$$\mathrm{PSNR} = 10 \log_{10} \frac{\mathrm{MAX}^2}{\frac{1}{HW}\sum_{i=1}^{H}\sum_{j=1}^{W}\left(X(i,j) - Y(i,j)\right)^2}$$
where $X$ and $Y$ represent the reference and reconstructed images, both of size $H \times W$, and $\mathrm{MAX}$ is the maximum possible pixel value. The SSIM is defined as:
$$\mathrm{SSIM}(X, Y) = \frac{(2\mu_X \mu_Y + c_1)(2\sigma_{XY} + c_2)}{(\mu_X^2 + \mu_Y^2 + c_1)(\sigma_X^2 + \sigma_Y^2 + c_2)}$$
where $\mu_X$ and $\mu_Y$ are the mean values of $X$ and $Y$, respectively. $\sigma_X^2$ and $\sigma_Y^2$ denote the variance of $X$ and $Y$, respectively. $\sigma_{XY}$ represents the covariance of $X$ and $Y$. $c_1$ and $c_2$ are constants that maintain stability. PSNR and SSIM are commonly used metrics for the evaluation of image super-resolution, and they focus on image fidelity rather than visual quality. In contrast, LPIPS is more concerned with whether the visual features of the images are similar. LPIPS uses a pre-trained AlexNet to extract image features and then calculates the distance between the two sets of features. Therefore, the smaller the LPIPS, the closer the generated image is to the ground truth.
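PSNR and SSIM can be sketched in plain Python as follows. The sketch makes two simplifying assumptions: images are 2D lists with an 8-bit range (`max_val=255`), and SSIM is computed once over the whole image, whereas common implementations average it over local windows. The constants `k1=0.01` and `k2=0.03` are the values usually used with SSIM; the function names are illustrative only.

```python
import math

def _flatten(img):
    return [p for row in img for p in row]

def mse(x, y):
    a, b = _flatten(x), _flatten(y)
    return sum((p - q) ** 2 for p, q in zip(a, b)) / len(a)

def psnr(x, y, max_val=255.0):
    """Peak signal-to-noise ratio in dB; infinite for identical images."""
    err = mse(x, y)
    return math.inf if err == 0 else 10.0 * math.log10(max_val ** 2 / err)

def global_ssim(x, y, max_val=255.0, k1=0.01, k2=0.03):
    """SSIM computed from whole-image statistics (single window)."""
    a, b = _flatten(x), _flatten(y)
    n = len(a)
    mu_a, mu_b = sum(a) / n, sum(b) / n
    var_a = sum((p - mu_a) ** 2 for p in a) / n
    var_b = sum((q - mu_b) ** 2 for q in b) / n
    cov = sum((p - mu_a) * (q - mu_b) for p, q in zip(a, b)) / n
    c1, c2 = (k1 * max_val) ** 2, (k2 * max_val) ** 2
    num = (2 * mu_a * mu_b + c1) * (2 * cov + c2)
    den = (mu_a ** 2 + mu_b ** 2 + c1) * (var_a + var_b + c2)
    return num / den

reference = [[10, 20], [30, 40]]
reconstructed = [[12, 18], [33, 41]]
print(psnr(reference, reconstructed), global_ssim(reference, reconstructed))
```

Higher PSNR and SSIM indicate higher fidelity to the reference; LPIPS is not covered here because it depends on a pre-trained network.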
4.3 Performances on realistic low resolution image
In this subsection, we evaluated the reconstruction performance when the realistic low resolution image was utilized as input. To prove the superiority of the proposed method, the following methods were compared:
(2) Deep learning methods for natural synthetic images: Super-Resolution Convolutional Neural Network (SRCNN) , Enhanced Deep Super-Resolution network (EDSR) , Densely Residual Laplacian Network (DRLN)  and Residual Dense Network (RDN) .
(3) Deep learning methods for realistic images: Zero-Shot Super-Resolution (ZSSR) , Realistic degradation framework for Super-Resolution (RealSR) , CycleGAN , Two-stage GAN  and High-resolution Angiogram Reconstruction Network (HAR-Net) . Particularly, the HAR-Net was specialized in OCTA image super-resolution task.
4.3.1 Performance of reconstruction results from realistic OCTA images
Fig. 7 shows the super-resolution results visually; our method shows relatively better performance in terms of feature preservation and contrast. This might be because our method pays more attention to multi-scale structure and edge details. On closer inspection, our method yields better vascular details than the other methods. The main reason is that the proposed network emphasizes the vascular signal and suppresses the effect of background noise during the reconstruction process. As can be seen from Fig. 7, the reconstruction results of our method avoid the generation of redundant vascular information. The vague capillary details in LR images are properly reconstructed in the generated HR images with better visibility. We also report the number of parameters and the test time for a single image for different super-resolution models, as shown in Table 5. Compared to the RDN, the difference in test time of our method on the two sets is small, despite the increased number of parameters of the proposed model. This is because the discriminative network is not involved in testing. The testing time of the proposed model is within acceptable limits compared with other adversarial transfer methods.
4.3.2 Performance of FAZ segmentation
The FAZ is a highly specialized region and a useful marker of foveal health. To confirm the reconstruction performance on realistic images, we further performed FAZ segmentation of the reconstructed images on the two OCTA sets. Specifically, we employed a trained OCTA-Net to segment the FAZ in both the reconstructed HR images and the reference HR images. This model had been pre-trained on HR images with the ground truth of FAZ contours. Note that the ground truth was obtained by manual segmentation in the reference HR image, and was used to calculate the FAZ segmentation metrics for the reconstructed and reference images.
|Method||Parameter||Time (SURE-Z)||Time (SURE-O)|
The following metrics are calculated and compared:
Area Under the ROC Curve (AUC);
Sensitivity (SEN) = TP / (TP + FN);
Accuracy (ACC) = (TP + TN) / (TP + TN + FP + FN);
Kappa score = $(\mathrm{ACC} - p_e) / (1 - p_e)$;
False Discovery Rate (FDR) = FP / (FP + TP);
G-mean score = $\sqrt{\mathrm{SEN} \times \mathrm{Specificity}}$, with Specificity = TN / (TN + FP);
Dice coefficient (Dice) = 2 TP / (FP + FN + 2 TP);
where TP is true positive, FP is false positive, TN is true negative, and FN is false negative. $p_e$ in the Kappa score represents the chance agreement between the ground truth and prediction, and it is denoted as:
$$p_e = \frac{(TP + FN)(TP + FP) + (TN + FP)(TN + FN)}{(TP + TN + FP + FN)^2}$$
Hausdorff distance (HD) is denoted as:
$$\mathrm{HD}(A, B) = \max\{h(A, B),\, h(B, A)\}, \qquad h(A, B) = \max_{a \in A} \min_{b \in B} \lVert a - b \rVert$$
where $h(A, B)$ calculates the directed Hausdorff distance between the point set $A$ of the edge contour in the reconstructed or reference HR image and the point set $B$ of the edge contour in the ground truth.
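The count-based metrics and the Hausdorff distance can be computed as in the following sketch. It assumes the standard definitions of Cohen's kappa (with chance agreement derived from the confusion counts) and of G-mean as the geometric mean of sensitivity and specificity, and it represents contours as lists of (x, y) points; the function names are illustrative. AUC is omitted because it requires probability maps rather than confusion counts.

```python
import math

def segmentation_metrics(tp, fp, tn, fn):
    """Pixel-level metrics from confusion-matrix counts."""
    n = tp + fp + tn + fn
    sen = tp / (tp + fn)
    spe = tn / (tn + fp)  # specificity, needed for G-mean
    acc = (tp + tn) / n
    # Chance agreement between ground truth and prediction.
    p_e = ((tp + fn) * (tp + fp) + (tn + fp) * (tn + fn)) / n ** 2
    return {
        "SEN": sen,
        "ACC": acc,
        "Kappa": (acc - p_e) / (1 - p_e),
        "FDR": fp / (fp + tp),
        "G-mean": math.sqrt(sen * spe),
        "Dice": 2 * tp / (fp + fn + 2 * tp),
    }

def directed_hausdorff(a, b):
    """Largest distance from a point of `a` to its nearest point in `b`."""
    return max(min(math.dist(p, q) for q in b) for p in a)

def hausdorff(a, b):
    """Symmetric Hausdorff distance between two point sets."""
    return max(directed_hausdorff(a, b), directed_hausdorff(b, a))

print(segmentation_metrics(tp=40, fp=10, tn=40, fn=10))
print(hausdorff([(0, 0), (1, 0)], [(0, 0), (4, 0)]))  # 3.0
```

The brute-force Hausdorff computation is quadratic in the number of contour points, which is acceptable for FAZ contours but may be slow for dense vessel boundaries.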
|RDN  (baseline)||0.942||0.846||0.913||0.863||0.110||0.867||29.12||0.941||0.830||0.908||0.840||0.134||0.846||69.92|
|SASR (Our Method)||0.961||0.890||0.940||0.886||0.104||0.890||17.58||0.958||0.889||0.940||0.897||0.086||0.900||41.01|
|RDN  (baseline)||0.793||0.740||0.571||0.740||0.235||0.655||0.734||0.526||0.681||0.433||0.322||0.591|
|SASR (Our Method)||0.807||0.649||0.775||0.594||0.269||0.684||0.747||0.541||0.688||0.442||0.323||0.601|
Fig. 8 shows the FAZ segmentation results of the reconstructed images using different methods on the two OCTA sets, where the red area represents under-segmentation and the blue area represents over-segmentation. It can be seen that the FAZ segmentation results of our method are closest to the HR segmentation results and the ground truth, with the smallest under-segmented and over-segmented areas. In addition, Tables 1 and 2 provide the FAZ segmentation results of all methods on the realistic images of the two sets, respectively. In both sets, our method outperforms the other methods in terms of all metrics (results in boldface). Compared with the baseline network RDN, AUC improves by 1.9% and 1.7%, G-mean improves by 2.7% and 3.2%, Kappa improves by 2.3% and 5.7%, and HD improves by 11.54 and 28.91 on the two sets, respectively. These improvements support the effectiveness of the proposed modules in our network. We also compared our method with HAR-Net, which specializes in the OCTA image super-resolution reconstruction task. The evaluation results show that our proposed network achieves better performance in terms of all metrics. Meanwhile, SEN improves by 4.5% and 6.0%, FDR improves by 0.6% and 4.8%, and Dice improves by 2.4% and 5.5% on the two sets, respectively.
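The improvements over the RDN baseline quoted for AUC, G-mean, Kappa and HD can be re-derived from the mean values in the two FAZ result tables, as the short check below shows. The column order (AUC, SEN, G-mean, Kappa, FDR, Dice, HD) is an assumption based on the order in which the metrics are listed in the text, and the keys `table_1` and `table_2` are labels chosen for this sketch.

```python
# Column order is assumed, see the note above.
COLUMNS = ["AUC", "SEN", "G-mean", "Kappa", "FDR", "Dice", "HD"]
LOWER_IS_BETTER = {"FDR", "HD"}

RDN = {
    "table_1": [0.942, 0.846, 0.913, 0.863, 0.110, 0.867, 29.12],
    "table_2": [0.941, 0.830, 0.908, 0.840, 0.134, 0.846, 69.92],
}
SASR = {
    "table_1": [0.961, 0.890, 0.940, 0.886, 0.104, 0.890, 17.58],
    "table_2": [0.958, 0.889, 0.940, 0.897, 0.086, 0.900, 41.01],
}

def improvements(baseline, ours):
    """Signed gain per metric; positive means SASR is better."""
    gains = {}
    for name, b, o in zip(COLUMNS, baseline, ours):
        delta = b - o if name in LOWER_IS_BETTER else o - b
        gains[name] = round(delta, 3)
    return gains

for key in RDN:
    gains = improvements(RDN[key], SASR[key])
    print(key, {m: gains[m] for m in ("AUC", "G-mean", "Kappa", "HD")})
```

The ratio metrics are absolute differences, so a gain of 0.019 in AUC corresponds to the 1.9% quoted above.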
4.3.3 Performance of capillary segmentation
Capillary segmentation is another important means of evaluating super-resolution reconstruction on realistic OCTA images. Thus, we compared the segmentation results based on the reconstructions from all the different methods. Similarly, we employed the trained vessel segmentation model (OCTA-Net) to extract the small capillaries in both the reconstructed HR images and the reference HR images. This model had been pre-trained on HR images with manually labeled vascular networks. To evaluate the performance of segmentation after reconstruction, we used the same metrics as for FAZ segmentation: AUC, SEN, G-mean, Kappa, FDR and Dice. In addition, we calculated vascular-related measures to verify the consistency of the super-resolution results from different perspectives: 1) Vascular Length Density (VLD): the ratio between the total number of microvascular centreline pixels and the area of the analyzed region. 2) Vascular Tortuosity (VT): a metric of how tortuous the vasculature is, computed by applying the method proposed by .
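Given a binary centreline map, VLD reduces to a pixel count divided by the analyzed area, as in the minimal sketch below. It assumes the centreline has already been extracted (for example by skeletonizing the vessel segmentation) and that the analyzed region is the whole map; the function name is illustrative.

```python
def vascular_length_density(centreline):
    """Ratio of centreline pixels to the area of a 2D binary map."""
    count = sum(sum(row) for row in centreline)
    area = len(centreline) * len(centreline[0])
    return count / area

# Toy 4x4 centreline map with five centreline pixels.
centreline = [
    [0, 1, 0, 0],
    [0, 1, 0, 0],
    [0, 1, 1, 1],
    [0, 0, 0, 0],
]
print(vascular_length_density(centreline))  # 0.3125
```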
It is important to emphasize that the region for segmentation comparison is defined within the FOV around the macula for the following reasons: (1) The area around the macula is more correlated with disease; (2) The small capillaries in this selected region have better visibility and can produce a higher segmentation confidence level for evaluation purpose.
The top row and the third row of Fig. 9 show reconstruction results using different methods on the two sets. It can be observed that more capillaries are identified in our SASR reconstructed images, where the improvements are indicated by yellow arrows. The vessel segmentation results are also more accurate owing to the significantly improved contrast between the vessels and the background area of the reconstructed images. From the segmentation results in Table 3 and Table 4, we can observe that our approach achieves the best performance on both sets. Specifically, our method outperforms the state-of-the-art DRLN by 1.3% and 1.4% in AUC, 7.0% and 1.3% in SEN, and 2.9% and 1.0% in Dice, respectively. Compared with ZSSR, SEN is 7.5% and 1.9% higher, G-mean is 3.8% and 0.8% higher, and Dice is 3.2% and 1.1% higher, respectively. Moreover, our method outperforms CycleGAN in all metrics on both sets. This is because unpaired transfer learning lacks the constraint of structural consistency, while the proposed sparse edge-aware loss enhances the texture details of the reconstructed images. Our method also outperforms the baseline model by 7.0% and 1.5% in SEN, 3.5% and 0.7% in G-mean, and 2.3% and 0.9% in Kappa, respectively. These quantitative evaluations indicate that our modules bring a clear improvement over the baseline network. These improvements in performance are consistent with the segmentation results indicated by the yellow arrows in Fig. 9, where the segmentation model can successfully extract small capillaries with good continuity and integrity from the reconstructed images of the proposed method, while the reconstructed images of the other methods yield relatively low capillary correspondence. Furthermore, Table 6 reports the vascular measurement results for the different methods. The results show that the VT and VLD results of the proposed method are closest to those of the high-resolution images on both sets.
4.4 Performance on synthetic low-resolution images
In this section, we evaluate the reconstruction performance when synthetic low-resolution images are used as input.
Table 9 shows the super-resolution reconstruction results of all methods on the two synthetic sets. Our method achieves the best PSNR on both sets, which indicates that its reconstruction fidelity exceeds that of the other methods. Our method also achieves the best LPIPS on the two OCTA datasets, indicating that our results are much closer to the HR images in terms of visual characteristics. Although we adopted a simple image degradation method, the reconstruction quality on the SURE-O dataset is far inferior to that on the SURE-Z dataset, mainly because the SURE-Z images are of relatively higher quality and contain less background noise. On the SURE-Z dataset, all methods achieve comparable reconstruction performance. Nevertheless, our proposed network still achieves the best results in all metrics. Compared with EDSR, the proposed method is better in all metrics on both sets, with PSNR gains of 0.41 dB and 3.36 dB, respectively. The LPIPS is also improved by 4.3% and 3.4%, respectively. In addition, compared with HAR-Net, our method achieves significant improvements in all three metrics: PSNR is 0.39 dB and 3.85 dB higher, SSIM is 0.3% and 2.4% higher, and LPIPS is improved by 4.5% and 3.0%, respectively.
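To make the PSNR differences reported above easier to interpret, the sketch below computes PSNR from the mean squared error and converts a gain in dB into the corresponding MSE ratio. It is an illustration only: the function names `psnr` and `mse_ratio_from_gain`, the flattened pixel representation, and the 8-bit peak value of 255 are assumptions, not details taken from the experimental setup.

```python
import math


def psnr(reference, reconstructed, max_value=255.0):
    """Peak signal-to-noise ratio (dB) between two flattened images.

    `reference` and `reconstructed` are equal-length sequences of pixel
    intensities; `max_value` is the peak intensity (255 assumed for 8-bit).
    """
    if len(reference) != len(reconstructed) or len(reference) == 0:
        raise ValueError("images must be non-empty and of equal size")
    mse = sum((r - s) ** 2
              for r, s in zip(reference, reconstructed)) / len(reference)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_value ** 2 / mse)


def mse_ratio_from_gain(delta_db):
    """MSE(new) / MSE(old) implied by a PSNR gain of `delta_db` dB."""
    return 10.0 ** (-delta_db / 10.0)


if __name__ == "__main__":
    # Toy 4-pixel example, not data from the paper.
    hr = [0, 255, 128, 64]
    sr = [0, 250, 130, 60]
    print(f"PSNR: {psnr(hr, sr):.2f} dB")
    for gain in (0.41, 3.36):
        print(f"{gain} dB gain -> MSE ratio {mse_ratio_from_gain(gain):.3f}")
```

For a single image pair, a gain of 0.41 dB corresponds to an MSE roughly 9% lower, and a gain of 3.36 dB to an MSE roughly 54% lower. If the tabulated PSNR values are averages over a test set, this conversion is only indicative.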
4.5 Ablation studies
The proposed method builds its super-resolution framework from three modules: a multi-level super-resolution model (MLSR), patch discrimination, and a new sparse edge-aware loss. To evaluate the effectiveness of each module, we validated the reconstruction performance using different combinations of these modules on the two OCTA sets separately.
4.5.1 Multi-level super-resolution model (MLSR)
The effectiveness of the MLSR module was assessed by comparing reconstruction results separately on the synthetic and realistic OCTA data. For the synthetic data, Table 9 shows that MLSR outperforms RDN in all metrics on both sets, with PSNR gains of 0.37 dB and 2.16 dB, respectively. The LPIPS is also improved by 3.7% and 1.4%, respectively. For the realistic data, Tables 1-4 show that MLSR achieves higher performance on both FAZ and vessel segmentation compared to other approaches. These results indirectly suggest that the super-resolution performance of the model on synthetic data affects its performance on realistic data as long as the domain gap between synthetic and realistic data has not been mitigated.
4.5.2 Patch discrimination
Furthermore, we validated the effectiveness of patch discrimination in the domain adaptation super-resolution framework, and report the FAZ and vessel segmentation results on both sets. It is worthwhile to emphasize that one of the main contributions of this work is the domain adaptation framework, which guides the super-resolution reconstruction of realistic LR images by reducing the domain gap between synthetic and realistic data. On both OCTA sets, Figs. 7-9 show that DASR outperforms MLSR in terms of the visual quality of the super-resolution reconstruction results and the performance of FAZ and vessel segmentation. The experimental results in Table 7 and Table 8 also show that DASR outperforms MLSR in all metrics for FAZ and vessel segmentation, respectively. This indicates that DASR can enhance the clarity and contrast of the vessels in the reconstructed images, which leads to improved FAZ and vessel segmentation results.
4.5.3 Sparse edge-aware loss (SE-loss)
Based on the proposed DASR, we also introduced the SE-loss to further improve the reconstruction results. The SE-loss is mainly designed to optimize the reconstruction by exploiting the highly similar characteristics of blood vessels in LR and HR images. Fig. 9 shows that the proposed SE-loss can alleviate the problem of vascular discontinuity in OCTA reconstruction results while also enhancing image quality. The quantitative vessel segmentation results in Table 8 also show that the SE-loss is beneficial to precise vessel reconstruction in LR images. In local regions of high similarity between LR and HR images, super-resolution reconstruction is performed in a supervised manner by taking advantage of the HR vascular information, whereas in regions of low similarity, the reconstruction of LR regions is achieved via unsupervised learning. This helps to enhance local visibility in regions with rich capillary details.
In general, and FOVs are two typical OCTA acquisition criteria. The scan quality of larger FOV angiography is significantly lower compared to images, which leads to the presence of noise and the invisibility of small capillaries. Such limitations make it harder for ophthalmologists and researchers to reach a judgment when a larger FOV is needed. To alleviate this problem, we have proposed a super-resolution method based on domain adaptation and constructed two OCTA sets from two different devices. First, we used the bicubic method to degrade the HR images. We then proposed a domain adaptation-based super-resolution method that reconstructs realistic OCTA images by reducing the difference between the spatial feature domains of the synthetic and the realistic images. In our experiments, we evaluated the reconstruction performance in terms of quality improvement and segmentation accuracy separately. The experimental results show that our method achieves superior performance on both synthetic and realistic images.
In future work, we will extend the proposed single-image super-resolution method to more image modalities, such as OCT and ultrasound images, for quality improvement. We will also apply our reconstruction technique to clinical data to support dedicated disease analysis, while improving the reliability and confidence of our method from a clinical perspective.
-  (2020) Densely residual laplacian super-resolution. IEEE Trans. Pattern Anal. Mach. Intell.. Cited by: §4.3, Table 1, Table 2, Table 3, Table 4.
-  (2005) Canny edge detection enhancement by scale multiplication. IEEE Trans. Pattern Anal. Mach. Intell. 27 (9), pp. 1485–1490. Cited by: §3.3.
- Unsupervised pixel-level domain adaptation with generative adversarial networks. In Proc. IEEE Conf. Comput. Vis. Pattern Recog., pp. 3722–3731. Cited by: §1.
-  (2017) Retinal microvascular network alterations: potential biomarkers of cerebrovascular and neural diseases. Am. J. Physiol.-Heart Circul. Physiol. 312 (2), pp. H201–H212. Cited by: §1.
-  (2020) Dynamic convolution: attention over convolution kernels. In Proc. IEEE Conf. Comput. Vis. Pattern Recog., pp. 11030–11039. Cited by: §3.1.
-  (2015) A review of optical coherence tomography angiography (OCTA). Int. J. Ret. Vit. 1 (1), pp. 5. Cited by: §1.
-  (2015) Image super-resolution using deep convolutional networks. IEEE Trans. Pattern Anal. Mach. Intell. 38 (2), pp. 295–307. Cited by: §1, §4.3, Table 1, Table 2, Table 3, Table 4.
-  (2017-10) Automatic blood vessels segmentation based on different retinal maps from OCTA scans. Comput. Biol. Med. 89, pp. 150–161 (en). External Links: Cited by: §1.
-  (2021) Multi-contrast mri super-resolution via a multi-stage integration network. In Proc. Int. Conf. Med. Image Comput. Comput. Assist. Intervent. (MICCAI), pp. 140–149. Cited by: §1.
-  (2020) Reconstruction of high-resolution 6×6-mm OCT angiograms using deep learning. Biomed. Opt. Express 11 (7), pp. 3585–3600. Cited by: §4.3, Table 1, Table 2, Table 3, Table 4.
-  (2020) Intra-session repeatability of quantitative metrics using widefield optical coherence tomography angiography (OCTA) in elderly subjects. Acta Ophthalmol. 98 (5), pp. e570–e578. Cited by: §1.
-  (2010) Image quality metrics: PSNR vs. SSIM. In Proc. ICPR, pp. 2366–2369. Cited by: §4.2.
-  (2018) Squeeze-and-excitation networks. In Proc. IEEE Conf. Comput. Vis. Pattern Recog., pp. 7132–7141. Cited by: §3.1.
-  (2009) Multi chaotic systems based pixel shuffle for image encryption. Opt. Commun. 282 (11), pp. 2123–2127. Cited by: §3.1.
-  (2020) Real-world super-resolution via kernel estimation and noise injection. In Proc. IEEE Conf. Comput. Vis. Pattern Recog. Workshops, pp. 466–467. Cited by: §4.3, Table 1, Table 2, Table 3, Table 4.
-  (2019) Reducing the hausdorff distance in medical image segmentation with convolutional neural networks. IEEE Trans. Med. Imag. 39 (2), pp. 499–513. Cited by: §4.3.2.
-  (1981) Cubic convolution interpolation for digital image processing. IEEE Trans. Acoust., Speech, Signal Process. 29 (6), pp. 1153–1160. Cited by: §4.3, Table 1, Table 2, Table 3, Table 4.
-  (2014) Adam: a method for stochastic optimization. arXiv preprint arXiv:1412.6980. Cited by: §4.1.
-  (2019) Repeatability of vessel density measurements using optical coherence tomography angiography in retinal diseases. Br. J. Ophthalmol. 103 (5), pp. 704–710. Cited by: §1.
-  (2017) Repeatability and reproducibility of superficial macular retinal vessel density measurements using optical coherence tomography angiography en face images. JAMA Ophthalmol. 135 (10), pp. 1092–1098. Cited by: §1.
-  (2016) Precomputed real-time texture synthesis with markovian generative adversarial networks. In Proc. Eur. Conf. Comput. Vis., pp. 702–716. Cited by: §1.
-  (2017) Enhanced deep residual networks for single image super-resolution. In Proc. IEEE Conf. Comput. Vis. Pattern Recog. Workshops, pp. 136–144. Cited by: §4.3, Table 1, Table 2, Table 3, Table 4.
-  (2020) ROSE: a retinal OCT-angiography vessel segmentation dataset and new model. IEEE Trans. Med. Imag. 40 (3), pp. 928–939. Cited by: §1, §4.3.2.
-  (2019) Image super-resolution using progressive generative adversarial networks for medical image analysis. Comput. Med. Imag. Graph. 71, pp. 30–39. Cited by: §1.
-  (2017) Repeatability of vessel density measurement in human skin by OCT-based microangiography. Skin Res Technol. 23 (4), pp. 607–612. Cited by: §1.
-  (2019) CS-net: channel and spatial attention network for curvilinear structure segmentation. In Proc. Int. Conf. Med. Image Comput. Comput. Assist. Intervent. (MICCAI), pp. 721–730. Cited by: §1.
-  (2020) Single image super-resolution via a holistic attention network. In Proc. Eur. Conf. Comput. Vis., pp. 191–207. Cited by: §1.
-  (2017) Unsupervised domain adaptation for semantic segmentation with gans. arXiv preprint arXiv:1711.06969 2 (2), pp. 2. Cited by: §1.
-  (2018) Super-resolution reconstruction of MR image with a novel residual learning network algorithm. Phys. Med. Biol. 63 (8), pp. 085011. Cited by: §1.
-  (2018) “Zero-shot” super-resolution using deep internal learning. In Proc. IEEE Conf. Comput. Vis. Pattern Recog., pp. 3118–3126. Cited by: §4.3, Table 1, Table 2, Table 3, Table 4.
-  (2015) Image artifacts in optical coherence angiography. Retina 35 (11), pp. 2163. Cited by: §1.
-  (2008) Image super-resolution using gradient profile prior. In Proc. IEEE Conf. Comput. Vis. Pattern Recog., pp. 1–8. Cited by: §1.
-  (2018) Enhancement of morphological and vascular features in OCT images using a modified bayesian residual transform. Biomed. Opt. Express 9 (5), pp. 2394–2406. Cited by: §1.
-  (2009) The edge-driven dual-bootstrap iterative closest point algorithm for registration of multimodal fluorescein angiogram sequence. IEEE Trans. Med. Imag. 29 (3), pp. 636–649. Cited by: §2.
-  (2018) Learning to adapt structured output space for semantic segmentation. In Proc. IEEE Conf. Comput. Vis. Pattern Recog., pp. 7472–7481. Cited by: §1.
-  (2018) Multiple enface image averaging for enhanced optical coherence tomography angiography imaging. Acta Ophthalmol. 96 (7), pp. e820–e827. Cited by: §1.
-  (2018) Application of super-resolution convolutional neural network for enhancing image resolution in chest CT. J. Digit. Imag. 31 (4), pp. 441–450. Cited by: §1.
-  (2018) ESRGAN: enhanced super-resolution generative adversarial networks. In Proc. Eur. Conf. Comput. Vis. Workshops, pp. 0–0. Cited by: §1.
- Deep networks for image super-resolution with sparse prior. In Proc. IEEE. Int. Conf. Comput. Vision, pp. 370–378. Cited by: §1.
-  (2004) Image quality assessment: from error visibility to structural similarity. IEEE Trans. Image Process 13 (4), pp. 600–612. Cited by: §4.2.
-  (2020) High-resolution wide-field OCT angiography with a self-navigation method to correct microsaccades and blinks. Biomed. Opt. Express 11 (6), pp. 3234–3245. Cited by: §1.
-  (2019) 75-degree non-mydriatic single-volume optical coherence tomographic angiography. Biomed. Opt. Express 10 (12), pp. 6286–6295. Cited by: §1.
-  (2010) Multi-megahertz OCT: high quality 3d imaging at 20 million a-scans and 4.5 gvoxels per second. Opt. Express 18 (14), pp. 14685–14704. Cited by: §1.
-  (2013) Fast direct super-resolution by simple functions. In Proc. IEEE. Int. Conf. Comput. Vision, pp. 561–568. Cited by: §1.
-  (2010) Image super-resolution via sparse representation. IEEE Trans. Image Process 19 (11), pp. 2861–2873. Cited by: §1, §4.3, Table 1, Table 2, Table 3, Table 4.
-  (2020) Detection of clinically unsuspected retinal neovascularization with wide-field optical coherence tomography angiography. Retina. Cited by: §1.
-  (2016) Automated motion correction using parallel-strip registration for wide-field en face OCT angiogram. Biomed. Opt. Express 7 (7), pp. 2823–2836. Cited by: §1.
- The unreasonable effectiveness of deep features as a perceptual metric. In Proc. IEEE Conf. Comput. Vis. Pattern Recog., pp. 586–595. Cited by: §4.2.
-  (2017) Curriculum domain adaptation for semantic segmentation of urban scenes. In Proc. IEEE. Int. Conf. Comput. Vision, pp. 2020–2030. Cited by: §1.
-  (2018) Residual dense network for image super-resolution. In Proc. IEEE Conf. Comput. Vis. Pattern Recog., pp. 2472–2481. Cited by: §3.1, §4.3, Table 1, Table 2, Table 3, Table 4, Table 7, Table 8.
-  (2021) Large scale image completion via co-modulated generative adversarial networks. arXiv preprint arXiv:2103.10428. Cited by: §1.
-  (2020) Automated tortuosity analysis of nerve fibers in corneal confocal microscopy. IEEE Trans. Med. Imag. 39 (9), pp. 2725–2737. Cited by: §4.3.3.
-  (2019) Image quality improvement of hand-held ultrasound devices with a two-stage generative adversarial network. IEEE Trans. Biomed. Eng 67 (1), pp. 298–311. Cited by: §1, §4.3, Table 1, Table 2, Table 3, Table 4.
- Unpaired image-to-image translation using cycle-consistent adversarial networks. In Proc. IEEE Conf. Comput. Vis. Pattern Recog., pp. 2223–2232. Cited by: §4.3, Table 1, Table 2, Table 3, Table 4.