Robust and efficient computation of retinal fractal dimension through deep approximation

by Justin Engelmann, et al.

A retinal trait, or phenotype, summarises a specific aspect of a retinal image in a single number. This can then be used for further analyses, e.g. with statistical methods. However, reducing an aspect of a complex image to a single, meaningful number is challenging. Thus, methods for calculating retinal traits tend to be complex, multi-step pipelines that can only be applied to high quality images. This means that researchers often have to discard substantial portions of the available data. We hypothesise that such pipelines can be approximated with a single, simpler step that can be made robust to common quality issues. We propose Deep Approximation of Retinal Traits (DART), where a deep neural network is used to predict the output of an existing pipeline on high quality images from synthetically degraded versions of these images. We demonstrate DART on retinal Fractal Dimension (FD) calculated by VAMPIRE, using retinal images from UK Biobank that previous work identified as high quality. Our method shows very high agreement with FD VAMPIRE on unseen test images (Pearson r=0.9572). Even when those images are severely degraded, DART can still recover an FD estimate that shows good agreement with FD VAMPIRE obtained from the original images (Pearson r=0.8817). This suggests that our method could enable researchers to discard fewer images in the future. Our method can compute FD for over 1,000 img/s using a single GPU. We consider these to be very encouraging initial results and hope to develop this approach into a useful tool for retinal analysis.



1 Introduction

Retinal fundus images are non-invasive and low-cost. They are important for ophthalmology and also capture a detailed picture of the retinal vasculature. Thus, they can be used for studying and potentially predicting diseases such as diabetes, stroke, hypertension and neurovascular disease [macgillivray2014retinal]. To analyse the relationships between aspects of the retina and other quantities of interest, retinal traits (also called features, parameters or phenotypes) are used as a quantitative description of a specific aspect of the retinal image. Reducing a complex image to a single, meaningful number is necessary for standard statistical methods, yet it is a challenging task. It is challenging to identify a potentially salient aspect of the retina in the first place and to then design a method that can reliably quantify this aspect. This is further complicated by the large variability in retinal images, stemming from idiosyncrasies of the imaged retinas (e.g. due to retinal diseases or rare phenotypes) and image quality (e.g. due to operator inexperience or time pressures in large scale cohort studies). Thus, pipelines for extracting such retinal traits tend to be complex, comprise multiple steps, and can only be applied to images of sufficient quality.

Poor image quality is a key problem in retinal image analysis. Particularly in large scale studies such as UK Biobank, many images are of poor quality, being blurred, obscured, or hazy [macgillivray2015suitability]. Imaging artefacts such as noise, non-uniform illumination or blur can also lead to poor vessel segmentations [mookiah2021review]. Previous work analysing 2,690 UK Biobank participants found that only 60% had an image that could be adequately analysed by VAMPIRE [macgillivray2015suitability]. Two recent large-scale studies using retinal Fractal Dimension (FD) for predicting cardiovascular disease risk discarded 26% [zekavat2022deep] and 43% [velasco2021decreased] of the images in UK Biobank. Although necessary, this is unfortunate as it leads to lower sample sizes and makes it hard to study rare diseases in particular.

Figure 1: Overview of our proposed framework. a) A typical pipeline for computing FD: an encoder-decoder neural network for segmentation, potentially some refinement steps like optic disc segmentation and removal, and a method to calculate FD of the segmentation (e.g. box counting or multifractal). b) DART, our proposed approach, outputs a deep approximation of FD in a single step using an encoder-only neural network, with drastically reduced complexity. c) We can train our model to be robust to image quality issues by synthetically degrading input images and training our model to minimise the loss between its output and the FD obtained with the original high quality image.

We hypothesise that it is possible to approximate pipelines for calculating retinal traits with a single, simpler step and propose Deep Approximation of Retinal Traits (DART). Fig. 1 gives a high-level overview of our approach. DART trains a deep neural network (DNN) to predict the output of an original method (OM) for calculating a retinal trait. We can then train the model to be robust to image quality issues by synthetically degrading the input images during training and asking the DNN model to predict the output of the OM on the original high quality image. The intuition behind this approach is that obtaining a high quality segmentation of the entire retina is a much harder task than describing an aspect of the vasculature, like vascular complexity, directly. DART offers a segmentation-free way of computing retinal traits related to the vasculature, but can also be applied to any other retinal image analysis method, like feature extraction for disease grading or pathology segmentation.

In the present work, we focus on retinal Fractal Dimension (FD), a key retinal trait that has been used to predict cardiovascular disease risk [velasco2021decreased, zekavat2022deep] and is associated with neurodegeneration and stroke [lemmens2020systematic]. We use FD as calculated by VAMPIRE [trucco2013novel] with the multifractal [stosic2006multifractal] method as the OM we apply DART to. At minimum, FDDART should have very high agreement with FDVAMPIRE on high quality images so that it can be interpreted in the same way. To be a useful method, it should further be robust to image quality issues and efficient. Robustness would enable researchers to discard fewer images than currently necessary, while efficiency allows analyses to be conducted at large scale without requiring large compute resources.

2 Deep Approximation of Retinal Traits (DART)

2.1 Motivation and theory

We hypothesise that it is possible to approximate the entire pipeline of an original method (OM) for calculating a retinal trait in a single, simpler step. We denote the distribution of high quality retinal fundus images as D, where each image x ∈ R^(H×W×C) has height H, width W, and C channels. The OM can be interpreted as a function f_OM: R^(H×W×C) → R that maps from image space to the one-dimensional retinal trait space (in our case, FD), i.e. given an image x, the FD computed by the OM is f_OM(x). Our goal is to find an alternative function f_DART that is both simpler than f_OM and has high agreement with it for all images of sufficient quality for the OM to be used, i.e. f_DART(x) ≈ f_OM(x) for all x ∈ D.

Designing such a simpler function by hand would be very challenging. Thus, we use a deep neural network (DNN). DNNs are universal function approximators in theory and very effective for image analysis in practice. We can then find a good approximation of f_OM by simply updating the model parameters θ (weights, biases, normalisation layer parameters) to minimise some differentiable measure of divergence between f_DART(x; θ) and f_OM(x), e.g. the mean squared error.

2.1.1 Accuracy

The output of the OM is fully determined by the given image, so we would expect that very high accuracy can be achieved. This contrasts with other problems, e.g. clinicians take into account additional information like symptoms and family history, and might disagree with each other, or even with themselves, if shown the same image multiple times.

2.1.2 Simplicity & Efficiency

Some readers might not perceive DNNs as simple or efficient. However, modern pipelines for retinal image analysis tend to use DNNs for vessel segmentation, so not requiring additional steps implies strictly lower complexity both computationally and in terms of required code. Furthermore, segmentation models tend to have an encoder-decoder structure (e.g. UNet) whereas models for classification/regression only need an encoder and small prediction head, making them more parameter-, memory-, and compute-efficient. Finally, given the widespread adoption of deep learning, the frameworks are very mature and can be very efficiently GPU-accelerated.

2.1.3 Robustness

We hypothesise that there are images of lower quality such that a) current pipelines would not produce a useful FD number, but b) there is still sufficient information to give an accurate estimate of the FD number we would have obtained on a counterfactual high quality image. For example, in an image with an obstruction, only part of the retina might be visible. Thus, the resulting vessel segmentation map would be poor and the FD of this map would be very different from that of the counterfactual high quality image, yet the visible parts of the retina might contain sufficient information about the vascular complexity of the retina as a whole to recover an accurate estimate of the FD.

As we do not observe counterfactual high quality images or objective ground truth FD values, we artificially degrade high quality images x with a degradation function d and train our model to minimise the difference between the predicted FD for the degraded image, f_DART(d(x)), and the OM's FD for the high quality image, f_OM(x). If there indeed is sufficient information in the degraded images, then our model should be able to predict the OM's FD for the high quality image reasonably well. However, this is a much harder task than matching the OM on high quality images, as the degradations lose information and for a given degraded image there are multiple possible counterfactual high quality images.
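This training signal can be sketched in a few lines. Below, `degrade` and `fake_fd` are hypothetical stand-ins for the degradation function d and the OM f_OM (the real pipeline uses VAMPIRE on fundus images); only the pairing of degraded inputs with clean-image targets is the point:

```python
import numpy as np

def degrade(img: np.ndarray, rng: np.random.Generator) -> np.ndarray:
    # Hypothetical stand-in for the degradation function d:
    # add pixel-wise Gaussian noise and darken the image slightly.
    noisy = img + rng.normal(0.0, 0.05, size=img.shape)
    return np.clip(noisy * 0.9, 0.0, 1.0)

def training_pairs(images, fd_om, rng):
    """Pair each degraded image with the OM's FD of the *original* image.

    fd_om is a callable standing in for the original method; the model
    only ever sees d(x) as input, but is trained towards f_OM(x)."""
    inputs = [degrade(x, rng) for x in images]
    targets = [fd_om(x) for x in images]  # computed on the clean images
    return inputs, targets

rng = np.random.default_rng(0)
images = [rng.random((8, 8, 3)) for _ in range(4)]
fake_fd = lambda x: float(x.mean())  # placeholder for f_OM
xs, ys = training_pairs(images, fake_fd, rng)
# Loss a model would minimise: MSE between prediction and clean-image FD.
mse = float(np.mean([(fake_fd(xd) - y) ** 2 for xd, y in zip(xs, ys)]))
```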

2.2 Implementation

2.2.1 Model & Training

Our model consists of a pretrained ResNet18 [he2016deep] backbone that extracts a feature map from the images, followed by spatial average pooling and a small multi-layer perceptron with two hidden layers of 128 and 32 units, and a single output. Each hidden layer is followed by layernorm [ba2016layer] and GELU [hendrycks2016gaussian] activation. No activation is applied to the final output. ResNet is a well-established architecture that has been shown to perform competitively with more recent architectures when using modern training techniques [bello2021revisiting, wightman2021resnet]. We use ResNet18 as it is the most light-weight member of the ResNet family. We initialise the backbone with weights pre-trained on natural images from Instagram [yalniz2019billion]. Those images are very different from retinal images, so this is merely a minor refinement on random initialisation. We resize images to 224x224 pixels for computational efficiency and lower memory requirements. Apart from standard normalisation using channel-wise ImageNet means and standard deviations, no further preprocessing is done and all 3 colour channels are kept.

We train our model with a batch size of 256 to minimise the mean squared error between prediction and target, after normalising the target to zero mean and unit variance using the mean and standard deviation of the training data to avoid data leakage. The model output can then be mapped back to the FD range by applying the inverse transformation. We use the AdamW optimiser [loshchilov2017decoupled] with weight decay and a cosine learning rate schedule [loshchilov2016sgdr]. We train for 35 epochs with a linear learning rate warmup for the first 5 epochs, followed by 3 cycles of 10 epochs each. During each cycle, the per-epoch learning rate is set according to a cosine schedule, and after each cycle the learning rate is decayed by taking the square root. We apply generic data augmentations (horizontal and vertical flips; mild affine transformations: rotation by up to ±10°, shear of up to ±5°, and scaling by ±5%) as well as the image degradations described in the next section (sampling all 5 severity levels uniformly) during training. We implemented our code in Python 3.9 using PyTorch and timm [rw2019timm] and plan to make it publicly available upon publication.

2.2.2 Synthetic degradations


Severity                                    1              2              3              4              5
Brightness/Contrast/Gamma                  ±5%            ±10%           ±15%           ±20%           ±25%
Mini Artifacts (holes/height/width)    2-20/1-3/5-8   2-24/1-5/5-12  2-28/1-5/5-16  2-32/1-3/5-20  2-40/1-3/5-24
Square Artifacts (side length)              25             50             75            100            125
Chop Artifacts (% of image removed)       10-15          10-25          10-35          10-45          10-50
Advanced Blur (kernel size/sigma)      3-5/0.2-0.5    3-7/0.2-0.7    3-9/0.2-0.8    3-11/0.2-0.9   3-13/0.2-1.0
Gaussian Noise (variance)                  1-10           5-10           5-20           5-25           5-30

Table 1: Severity levels for the degradations. Brightness, contrast and gamma changes are independently sampled from the given interval. Dimensions in pixels.

We focus on three types of quality issues in retinal images [mookiah2021review, macgillivray2015suitability]: lighting issues, artifacts/obstructions, and imaging issues. To simulate general lighting issues, we independently change the brightness, contrast and gamma of the image. To simulate artifacts/obstructions and severely inconsistent lighting, we introduce one of three artifacts: a) many smaller rectangular holes placed across the retina, b) a single large square hole, or c) "chopping" off the bottom or top part of the image. The latter is inspired by the observation that in UK Biobank some images only have the top or bottom part properly illuminated. To simulate general imaging issues, we add pixel-wise Gaussian noise and blur the image. Standard isotropic Gaussian blur kernels do not mimic realistic image blur, so we use an advanced anisotropic blurring technique developed for image super-resolution [wang2021real], where the standard deviations for both dimensions of the kernel are sampled independently, and the kernel is then rotated and has some noise added before being applied to the image.

We specify degradation parameters for five levels of severity, shown in Table 1. For a given level, we sample parameters for each image independently from the given ranges. Degradations are applied after images have already been downsized to 224x224. We apply an artifact with a probability that depends on the severity level. If an image was chosen to have an artifact applied to it, we then choose one of the Mini, Square, or Chop Artifacts at random. Degradations are implemented using the albumentations package [info11020125].
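Two of the artifacts can be illustrated with simplified numpy versions (the actual implementation uses albumentations; `frac` and `side` correspond to the chop percentage and square side length in Table 1, and the function names are ours):

```python
import numpy as np

def chop_artifact(img: np.ndarray, frac: float, top: bool = True) -> np.ndarray:
    """Black out the top (or bottom) `frac` of the image rows."""
    out = img.copy()
    n = int(round(img.shape[0] * frac))
    if top:
        out[:n] = 0
    else:
        out[img.shape[0] - n:] = 0
    return out

def square_artifact(img: np.ndarray, side: int,
                    rng: np.random.Generator) -> np.ndarray:
    """Place a single black square of the given side length at random."""
    out = img.copy()
    h, w = img.shape[:2]
    y = int(rng.integers(0, max(1, h - side)))
    x = int(rng.integers(0, max(1, w - side)))
    out[y:y + side, x:x + side] = 0
    return out

rng = np.random.default_rng(0)
img = np.ones((224, 224, 3), dtype=np.float32)
chopped = chop_artifact(img, 0.25)       # chop off top 25% of the rows
squared = square_artifact(img, 50, rng)  # severity-2 square artifact
```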

Figure 2: Random examples of synthetically degraded versions of the same fundus image. Best viewed zoomed in, especially for the advanced blur. UK Biobank asks to only reproduce imaging data where necessary, so we demonstrate the degradations on an image taken from DRIVE [staal2004ridge] which is similar in appearance to those in UK Biobank.

3 Experiments

3.1 Data

We apply our DART framework to multi-fractal FD [stosic2006multifractal] calculated with VAMPIRE [trucco2013novel]. We use only images that had been identified as high quality in a previous study [velasco2021decreased], as for those images FDVAMPIRE should be reliable and can be considered a reasonable "ground truth". We randomly split the data into train, validation, and test sets containing 70, 10, and 20% of the participants in UK Biobank, resulting in 52,242 / 7,478 / 14,907 images belonging to 32,300 / 4,614 / 9,229 participants in each set. We split at the participant level such that no images of the same participant occur in different sets. Images are cropped to square to remove black non-retinal regions and processed at 224x224 as described above.

3.2 Results

3.2.1 Agreement & Robustness

Degradations   R²       Pearson r (p-value)   Spearman ρ (p-value)   OLS regression fit
None           0.9160   0.9572 (0.0000)       0.9561 (0.0000)        y = 0.01 + 1.00x
Severity 1     0.8957   0.9467 (0.0000)       0.9446 (0.0000)        y = 0.01 + 0.99x
Severity 2     0.8859   0.9414 (0.0000)       0.9396 (0.0000)        y = 0.01 + 0.99x
Severity 3     0.8623   0.9287 (0.0000)       0.9282 (0.0000)        y = 0.00 + 1.00x
Severity 4     0.8309   0.9116 (0.0000)       0.9103 (0.0000)        y = 0.01 + 0.99x
Severity 5     0.7773   0.8817 (0.0000)       0.8840 (0.0000)        y = 0.02 + 0.99x

Table 2: Agreement between FDVAMPIRE obtained on high quality images, and FDDART for different levels of degradation measured on 14,907 held-out test set images.

We find very high agreement between FDVAMPIRE and FDDART on the original images, with Pearson r=0.9572 and Spearman ρ=0.9561. Table 2 shows results for different levels of degradation. When degrading the images and asking our model to predict the FDVAMPIRE obtained from the high quality image, agreement goes down as the images become more degraded, which is what we would expect as these degradations remove substantial information about the retinal vasculature. However, despite this, we still observe good agreement with the FDVAMPIRE obtained on the original image even at severity level 5, where extreme degradations are applied (Pearson r=0.8817 and Spearman ρ=0.8840). This suggests that DART can recover good estimates of the retinal trait that would have been obtained from a counterfactual high quality image even if the available image has very poor quality. Thus, it might allow for discarding far fewer images than currently necessary.
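The agreement metrics in Table 2 can be computed with numpy alone, as sketched below; the arrays are synthetic placeholder values for illustration, not UK Biobank data:

```python
import numpy as np

def pearson_r(a: np.ndarray, b: np.ndarray) -> float:
    # Off-diagonal entry of the 2x2 correlation matrix.
    return float(np.corrcoef(a, b)[0, 1])

def spearman_rho(a: np.ndarray, b: np.ndarray) -> float:
    # Rank-transform both variables, then take the Pearson correlation
    # of the ranks. Ties are unlikely for continuous FD values, so a
    # simple argsort-based ranking suffices for this sketch.
    rank = lambda v: np.argsort(np.argsort(v)).astype(float)
    return pearson_r(rank(a), rank(b))

rng = np.random.default_rng(0)
fd_vampire = rng.normal(1.5, 0.05, size=1000)            # placeholder FDs
fd_dart = fd_vampire + rng.normal(0, 0.015, size=1000)   # noisy approximation
r = pearson_r(fd_vampire, fd_dart)
rho = spearman_rho(fd_vampire, fd_dart)
```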

Figure 5: Agreement results for 14,907 held-out test set images. Best viewed zoomed in. a) Scatterplots of FDDART against FDVAMPIRE obtained from the original images, for different levels of degradation; red line: best linear fit; dashed black line: identity. b) Boxplots of the residuals for different levels of degradation; faint red line: zero residual; vertical black lines: ± one interquartile range (IQR) of FDVAMPIRE for reference.

For comparison, a previous study comparing FD for arteries and veins separately between VAMPIRE and SIVA [mcgrory2018towards] found very poor agreement between the measures of the two tools, for both arteries and veins. Another study comparing vessel calibre-related retinal traits obtained with VAMPIRE, SIVA, and IVAN found that they agreed with Pearson r's of 0.29 to 0.86. Thus, the observed agreement between FDVAMPIRE and FDDART (Pearson r=0.9572, Spearman ρ=0.9561) is very high, and even when DART is applied to the most degraded images the agreement (Pearson r=0.8817, Spearman ρ=0.8840) is higher than what could be expected when using two different tools on the same high quality images.

Finally, our method shows very low bias even as degradation severity is increased (Fig. 5). The best OLS fit is very close to the identity line for all levels of severity, or equivalently, the optimal linear translation function from FDDART to FDVAMPIRE is almost simply the identity function. This also implies that no post-hoc adjustment for image quality is needed and FDDART values obtained for images of varying quality are on the same scale out-of-the-box. As degradation severity increases, the variance of the residuals also increases but most residuals are still less than one interquartile range (IQR), a robust equivalent of the standard deviation, even when applying the strongest degradation.

3.2.2 Speed

Images were loaded into RAM so that hard disk speed is not a factor. We then measured the time it took to process all 52,242 training images, including normalisation and moving them from RAM to GPU VRAM, as well as the time to move the results back to RAM. We used a modern workstation (Intel i9-9920X CPU with 12 cores / 24 threads, a single Nvidia RTX A6000 24GB GPU, 126GB of RAM) and a batch size of 440. With ResNet18 as backbone, our model processed all 52,242 images in 48.5s ± 93.6ms (mean ± std over 5 runs), yielding a rate of 1,077 img/s.

4 Conclusion

We have shown that we can use DART to approximate the multi-step pipeline for obtaining FDVAMPIRE with very high agreement. Our resulting model can compute FDDART at over 1,000 img/s using a GPU. Furthermore, our model can compute FDDART values from severely degraded images that still match the FDVAMPIRE values obtained on the high quality images well. This could allow researchers interested in studying retinal traits to discard fewer images than currently necessary and thus have higher sample sizes. We consider these to be very encouraging initial results.

There are a number of directions for future work. First, the proposed framework can be easily applied to other retinal traits like vessel tortuosity or width, or FD as calculated by other pipelines. We would expect this to be similarly successful. Second, the robustness of the resulting DART model should be evaluated in more depth, and the cases with extreme residuals should be manually examined. We expect that robustness can be further improved, especially if we identify common failure cases and use those as data augmentations. Third, many straightforward, incremental technical improvements should be possible, such as improved training procedures to further increase performance, trying different architectures and resolutions, and further speeding up inference through common tricks like fusing batch norm layers into the convolutional layers. Finally, we hope that our approach will eventually enable other researchers to conduct better analyses, e.g. by not having to discard as many images and thus having a larger sample size available.


We thank our colleagues for their help and support.

This research has been conducted using the UK Biobank Resource under project 72144. This work was supported by the United Kingdom Research and Innovation (grant EP/S02431X/1), UKRI Centre for Doctoral Training in Biomedical AI at the University of Edinburgh, School of Informatics. For the purpose of open access, the author has applied a creative commons attribution (CC BY) licence to any author accepted manuscript version arising.