With the advent of smartphone indirect ophthalmoscopy, teleophthalmology – the use of specialist ophthalmology assets at a distance from the patient – has experienced a breakthrough, promising enormous benefits especially for healthcare in distant, inaccessible or ophthalmologically underserved areas, where specialists are either unavailable or too few in number. However, accurate teleophthalmology requires high-quality ophthalmoscopic imagery. This paper considers three feature families – statistical metrics, gradient-based metrics and wavelet transform coefficient derived indicators – as possible metrics to identify unsharp or blurry images. By using standard machine learning techniques, the suitability of these features for image quality assessment is confirmed, albeit on a rather small data set. With the increased availability and decreasing cost of digital ophthalmoscopy on one hand and the increased prevalence of diabetic retinopathy worldwide on the other, creating tools that can determine whether an image is likely to be diagnostically suitable can play a significant role in accelerating and streamlining the teleophthalmology process. This paper highlights the need for more research in this area, including the compilation of a diverse database of ophthalmoscopic imagery, annotated with quality markers, to train the Point of Acquisition error detection algorithms of the future.
1 Introduction
Figure 1. Comparison of a normal versus a blurry ophthalmoscopy image from Köhler et al. Note the reduced vascular definition over the optic disc and the obliteration of fine vasculature contrast in the blurry image.
With increasing frequency, ophthalmoscopy is performed digitally, using digital indirect fundus cameras. However, owing to the advent of smartphone digital ophthalmoscopy,[6, 3] what was once the preserve of ophthalmoscopic practice in the developed world has become an important tool of the ‘democratisation’ of eye care and retinal screening, not least where ophthalmological experts are not available on site or where other reasons for telemedicine and image storage prevail. The ubiquity of relatively high quality cameras and serious computing power on smartphones has fuelled this trend globally, with far-reaching ramifications for public health worldwide: it offers the opportunity to screen patients effectively and direct them to adequate treatment, even in areas where neither specialised ophthalmological expertise nor specialised high quality tools for digital ophthalmoscopy are available. In spite of this, of all the recognised DICOM modalities, fundoscopy imagery (DICOM modality OP) has remained relatively neglected from the standpoint of automated analytics, compared to more popular modalities such as magnetic resonance imaging, computed tomography, nuclear medicine and ultrasonography. Considering the potential of teleophthalmology in ophthalmic trauma, ophthalmic infectious disease triage, early recognition of congenital defects and the growing public health threat of diabetic retinopathy, the importance of algorithmic analysis of ophthalmic imagery cannot be overstated.
From the perspective of machine learning, retinal imagery presents two difficulties. Firstly, the typical ophthalmoscopy image is enormously feature-rich and contextual. Not only are there numerous confounding features, but the clinical meaning of those features is largely contingent on their location. So for instance on a normal, non-pathological image, a slightly hyperpigmented (dark) spot, very slightly smaller (approximately 1.5mm or ) than the optic disc (approximately 1.77mm or ), is positioned roughly halfway between the optic disc and the edge of the fundoscopic image. This is the macula, the area responsible for sharp central colour vision. However, at any other location, the same sort of darker spot might be indicative of localised disease, including e.g. large diffuse haemorrhages occurring in diabetic retinopathy. Simple object recognisers therefore fail at resolving this problem adequately, by struggling to consider that the meaning of a feature is dependent on its milieu (the features surrounding it and its location within the total fundus image). For this reason, most conventional methods of machine learning from images encounter significant obstacles when faced with retinal imagery.
The sine qua non of successful teleophthalmology is the acquisition of high quality retinal imagery. Since teleophthalmology customarily relies on general practitioners, nurses, health visitors and other public health workers without specialist training in ophthalmology and retinal imaging to acquire images that must later be useful to the reviewing ophthalmologist, a Point of Acquisition quality assessment tool is crucial to the success of teleophthalmology programmes. One of the most serious problems is posed by out-of-focus or blurry images: unlike in direct ophthalmoscopy, where the clinician can adjust the focus on the ophthalmoscope, digital ophthalmoscopic images, even those taken using adequate equipment, are often compromised by blur that can no longer be corrected after acquisition. Image blur may be caused by eye motion during acquisition or, more often, incorrect focus. The problem is even more significant with smartphone ophthalmoscopy, where focus is set automatically by the camera firmware, which was not designed specifically for ophthalmoscopy.
In order to assess image sharpness, we are making use of three distinct types of metrics. Statistical metrics give overall quantifications of the image, whereas gradient-based metrics are summarised local gradient metrics that are highly locationally sensitive and wavelet transform coefficient based metrics reveal more about the underlying patterns of the image.
2.1 Statistical metrics
We have employed three statistical metrics from the realm of information theory: overall energy, Shannon entropy and Atkinson’s Entropy Focus Criterion (in its original and normalised forms, EFC and NEFC). The latter is predominantly used in magnetic resonance imaging (MRI), where it serves to identify intra-slice motion artifacts. To the best of the author’s knowledge, this is the first time it has been used to assess blurriness in ophthalmic fundoscopic imagery, although its use for iris recognition has been documented previously by Grabowski et al.
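The overall energy of an image $I$ of $m \times n$ pixels is defined as the sum of square pixel intensities. For multi-channel images, both the overall energy of the grayscale image

$$E(I_{\mathrm{gray}}) = \sum_{i=1}^{m} \sum_{j=1}^{n} I_{\mathrm{gray}}(i,j)^2$$
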
can be calculated, as well as the root mean square (RMS) of the channels , so that
Grayscale conversion is performed using the coefficients (R: 0.2125, G: 0.7154, B: 0.0721), pursuant to ITU-R Recommendation BT.709, using the conversion code in skimage.color.rgb2gray. Conceptually, image energy is the discretised spatial equivalent of the overall spectral energy of a continuous-time signal $x(t)$, which is calculated as

$$E_x = \int_{-\infty}^{\infty} |x(t)|^2 \, dt$$

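This is implemented by the sum of squares representation. To compensate for the dependence on image size, the mean pixel energy metric of an $m \times n$ image $I$,

$$\bar{E}(I) = \frac{1}{mn} \sum_{i=1}^{m} \sum_{j=1}^{n} I(i,j)^2$$
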
is used in lieu of overall energy.
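The energy metrics above can be sketched in a few lines of numpy. This is an illustrative sketch rather than the code used for the paper: the function names are hypothetical, and the grayscale weights are assumed to be those documented for skimage.color.rgb2gray.

```python
import numpy as np

# Luma weights documented for skimage.color.rgb2gray (assumed here).
GRAY_WEIGHTS = np.array([0.2125, 0.7154, 0.0721])


def to_gray(rgb):
    """Convert an (m, n, 3) RGB array to an (m, n) grayscale array."""
    return np.asarray(rgb, dtype=float) @ GRAY_WEIGHTS


def overall_energy(gray):
    """Sum of squared pixel intensities."""
    gray = np.asarray(gray, dtype=float)
    return float(np.sum(gray ** 2))


def mean_pixel_energy(gray):
    """Overall energy divided by the number of pixels (size-independent)."""
    gray = np.asarray(gray, dtype=float)
    return overall_energy(gray) / gray.size


# Hypothetical 4 x 6 RGB image with every channel set to 0.5.
demo = np.full((4, 6, 3), 0.5)
demo_gray = to_gray(demo)
```

Because the weights sum to one, the demo image maps to a uniform grayscale value of 0.5, so its overall energy scales with the pixel count while its mean pixel energy does not.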
2.1.2 Shannon entropy
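For an image $I$, the corresponding Shannon entropy is defined as the expected value of the negative logarithm of the probability of the possible outcomes for the pixels within the image. Shannon entropy can be understood as a metric of the information content in the overall image and is calculated as

$$H(I) = \sum_{k} \operatorname{entr}(p_k)$$

where $p_k$ is the relative frequency of the $k$-th distinct pixel value in $I$ and $\operatorname{entr}$ is the element-wise entropy function

$$\operatorname{entr}(x) = \begin{cases} -x \log x & x > 0 \\ 0 & x = 0 \end{cases}$$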
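A minimal sketch of this calculation is given below. It assumes that the entropy is taken over the relative frequencies of the distinct pixel values and that it is reported in bits (base 2); the paper does not state the base, so the `base` parameter and the function name are assumptions.

```python
import numpy as np


def shannon_entropy(image, base=2.0):
    """Entropy of the pixel value distribution of an image."""
    # Count how often each distinct pixel value occurs.
    _, counts = np.unique(np.asarray(image), return_counts=True)
    p = counts / counts.sum()
    # Every p is strictly positive here, so the logarithm is always defined.
    return float(-np.sum(p * np.log(p)) / np.log(base))


# Hypothetical test images: one flat, one with two equally frequent levels.
flat = np.zeros((4, 4), dtype=np.uint8)
two_level = np.array([[0, 255], [255, 0]], dtype=np.uint8)
```

A flat image carries no information and has zero entropy, whereas two equally frequent grey levels yield exactly one bit.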
2.1.3 Atkinson’s Entropy Focus Criterion (EFC)
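For an image $I$, Atkinson’s Entropy Focus Criterion (EFC) is the entropy of the pixel values, using $e$ as the base, where each pixel value is normalised by $B_{\max}$, the intensity that a single pixel would have if the entire energy of the image were concentrated in it. For

$$B_{\max} = \sqrt{\sum_{i} \sum_{j} I(i,j)^2}$$

the adjusted entropy is defined as

$$\mathrm{EFC}(I) = -\sum_{i} \sum_{j} \frac{I(i,j)}{B_{\max}} \ln \frac{I(i,j)}{B_{\max}}$$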
2.1.4 Atkinson’s Normalised EFC (NEFC)
Similarly to Atkinson’s EFC (on which see subsubsection 2.1.3), the normalised EFC is a metric of the relationship between individual pixel values and the maximum image entropy. However, it is robust and invariant in respect of changes in image dimensions by using an adjustment factor. Thus, for an image
where is Atkinson’s Entropy Focus Criterion as described in subsubsection 2.1.3.
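The two focus criteria can be sketched as follows. The EFC follows the definition of Atkinson et al.; the adjustment factor for the normalised variant is an assumption, namely the EFC of a perfectly uniform image of the same size, which is the largest value the criterion can take. Function names are hypothetical.

```python
import numpy as np


def efc(image):
    """Entropy Focus Criterion: natural-log entropy of normalised intensities."""
    x = np.asarray(image, dtype=float).ravel()
    b_max = np.sqrt(np.sum(x ** 2))  # intensity if all energy sat in one pixel
    if b_max == 0:
        return 0.0
    r = x[x > 0] / b_max  # zero pixels contribute nothing to the sum
    return float(-np.sum(r * np.log(r)))


def nefc(image):
    """EFC divided by its value for a uniform image (assumed normalisation)."""
    n = np.asarray(image).size
    if n < 2:
        raise ValueError("normalisation needs at least two pixels")
    max_efc = np.sqrt(n) * np.log(np.sqrt(n))
    return efc(image) / max_efc


# Hypothetical extremes: energy spread evenly versus held by a single pixel.
uniform = np.full((4, 4), 7.0)
spike = np.zeros((4, 4))
spike[1, 2] = 3.0
```

Under this assumption the normalised criterion ranges from 0 (all energy in one pixel) to 1 (energy spread evenly), regardless of image dimensions.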
2.2 Gradient based metrics
In addition to the statistical metrics, we will make use of three primary gradient-based metrics: the sum of thresholded root of squared Sobel kernels (also known as the Tenengrad metric [26, 23]), the mean absolute Laplacian (MAL),[15, 27] and the log variance of the mean absolute Laplacian, which will be referred to henceforth as the log Pech-Pacheco metric. In addition, we are using two derived metrics, which are perivascularity-weighted equivalents of the Tenengrad and MAL metrics, respectively.
2.2.1 Tenengrad gradient magnitude
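Tenengrad is a gradient magnitude metric first described by Tenenbaum and later explored by Schlag et al. and Krotkov among others. Tenengrad combines magnitude information from two Sobel kernels,

$$S_x = \begin{bmatrix} -1 & 0 & 1 \\ -2 & 0 & 2 \\ -1 & 0 & 1 \end{bmatrix}, \qquad S_y = S_x^{\mathsf{T}} = \begin{bmatrix} -1 & -2 & -1 \\ 0 & 0 & 0 \\ 1 & 2 & 1 \end{bmatrix}$$
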
Then, given a source image , the gradient magnitude at , denoted , is calculated as the sum of squared convolutions, i.e.
where is the convolution of the source image with and is the convolution of the source image with , respectively, at the point . For a threshold value ,
and the thresholded Tenengrad measure is then defined as
while the unthresholded Tenengrad measure corresponds to
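The Tenengrad computation can be illustrated with the numpy-only sketch below. Several details are assumptions rather than the paper's exact formulation: the gradient magnitude is taken as the root of the sum of the squared Sobel responses, magnitudes at or below the threshold are discarded before summation, and the image border is dropped instead of padded. The helper `filter3x3` is hypothetical.

```python
import numpy as np

SOBEL_X = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], dtype=float)
SOBEL_Y = SOBEL_X.T


def filter3x3(image, kernel):
    """Valid-mode 3x3 cross-correlation; the output has shape (m - 2, n - 2).

    For the Sobel kernels this differs from a true convolution only in sign,
    which does not affect the gradient magnitude.
    """
    image = np.asarray(image, dtype=float)
    m, n = image.shape
    out = np.zeros((m - 2, n - 2))
    for di in range(3):
        for dj in range(3):
            out += kernel[di, dj] * image[di:di + m - 2, dj:dj + n - 2]
    return out


def gradient_magnitude(image):
    """Root of the summed squared horizontal and vertical Sobel responses."""
    gx = filter3x3(image, SOBEL_X)
    gy = filter3x3(image, SOBEL_Y)
    return np.sqrt(gx ** 2 + gy ** 2)


def tenengrad(image, threshold=0.0):
    """Sum of gradient magnitudes that exceed the threshold."""
    g = gradient_magnitude(image)
    return float(np.sum(g[g > threshold]))


# Hypothetical test images: a sharp vertical edge and a smooth ramp.
sharp = np.zeros((6, 6))
sharp[:, 3:] = 1.0
ramp = np.tile(np.linspace(0.0, 1.0, 6), (6, 1))
```

With a threshold of 2.0, the sharp edge keeps its full score while the smooth ramp, whose local gradients are all small, scores zero, which is the behaviour a blur detector relies on.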
2.2.2 Mean of the absolute Laplacian (MAL) and energy of the Laplacian
For an image $I$, the Laplacian is defined as

$$\nabla^2 I = \frac{\partial^2 I}{\partial x^2} + \frac{\partial^2 I}{\partial y^2}$$

which in a discrete space can be approximated by the convolution of with the kernel
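Then, the mean of the absolute values of the Laplacian over the entire image space of $m \times n$ pixels is

$$\mathrm{MAL}(I) = \frac{1}{mn} \sum_{i=1}^{m} \sum_{j=1}^{n} \left| (I * L)(i,j) \right|$$

where $I * L$ denotes the convolution of $I$ with the Laplacian kernel $L$ (see Equation 16). For an image $I$, the energy of the Laplacian is defined as

$$E_L(I) = \sum_{i=1}^{m} \sum_{j=1}^{n} (I * L)(i,j)^2$$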
2.2.3 Logarithmic Pech-Pacheco (Laplacian log variance) metric
Given an image , the log variance of the mean of the absolute Laplacian is defined, by reference to (see subsubsection 2.2.2), as
where is the Laplacian kernel described in Equation 16.
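The Laplacian-based metrics of the last two subsections can be sketched together. The sketch rests on assumptions that the text does not settle: the kernel is taken to be the common 4-neighbour discrete Laplacian, the log Pech-Pacheco metric is read as the natural logarithm of the variance of the absolute Laplacian response, and averages run over the valid (unpadded) region. All names are hypothetical.

```python
import numpy as np

# 4-neighbour discrete Laplacian (assumed; other discretisations exist).
LAPLACIAN = np.array([[0, 1, 0], [1, -4, 1], [0, 1, 0]], dtype=float)


def filter3x3(image, kernel):
    """Valid-mode 3x3 filtering; the kernel is symmetric, so this equals convolution."""
    image = np.asarray(image, dtype=float)
    m, n = image.shape
    out = np.zeros((m - 2, n - 2))
    for di in range(3):
        for dj in range(3):
            out += kernel[di, dj] * image[di:di + m - 2, dj:dj + n - 2]
    return out


def laplacian_metrics(image):
    """Return the MAL, the energy of the Laplacian and the log variance."""
    lap = filter3x3(image, LAPLACIAN)
    abs_lap = np.abs(lap)
    variance = float(abs_lap.var())
    return {
        "mal": float(abs_lap.mean()),
        "energy": float(np.sum(lap ** 2)),
        # A perfectly flat response has zero variance; report -inf in that case.
        "log_variance": float(np.log(variance)) if variance > 0 else float("-inf"),
    }


# Hypothetical test image: a single bright pixel on a dark background.
delta = np.zeros((5, 5))
delta[2, 2] = 1.0
metrics = laplacian_metrics(delta)
```

For the single-pixel test image, the valid 3 x 3 response contains one value of -4 and four values of 1, which makes the three metrics easy to verify by hand.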
2.2.4 Mean perivascular Tenengrad and absolute Laplacian
The perivascular area – the sharp margin between the relatively homogeneous blood vessel and the equally homogeneous retinal surface – is an outstanding indicator of image definition, as it shows how well an unconditional ground truth (the sharp vascular margin) is reflected in the image. Therefore, perivascular weighting of the Tenengrad (see subsubsection 2.2.1) and absolute Laplacian (see subsubsection 2.2.2) metrics can give a better picture of image sharpness and definition. To facilitate this, a perivascular mask is constructed, which is then used to weight the vascularity-naive Tenengrad and MAL metrics. The results are then re-averaged to arrive at the mean perivascular Tenengrad magnitude and the mean perivascular absolute Laplacian, respectively.
The perivascular mask is computed by reference to a filtered image , where is the multiscale vessel enhancement filter described by Frangi et al. (1998). This yields the perivascular mask so that
where is the structure tensor of the image at with . This creates a ‘vascular neighbourhood’ image that can be used to weight gradients in the vascular environment. Since the vascular environment tends to contain sharp edges (the vascular boundary, which in a sharp ophthalmoscopic photograph appears as a steep gradient), weighting the vascular neighbourhood can ‘focus’ gradient metrics (for an example, see Figure 2).
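The weighting step itself can be sketched independently of how the mask is built. In the sketch below, the mask is assumed to be a precomputed array of non-negative weights with the same shape as the per-pixel metric map (for instance, derived from a vessel enhancement filter), and the re-averaging is assumed to be a weighted mean normalised by the total mask weight. The function name is hypothetical.

```python
import numpy as np


def perivascular_mean(metric_map, mask):
    """Mask-weighted mean of a per-pixel sharpness map."""
    metric_map = np.asarray(metric_map, dtype=float)
    mask = np.asarray(mask, dtype=float)
    if metric_map.shape != mask.shape:
        raise ValueError("metric map and mask must have the same shape")
    total_weight = mask.sum()
    if total_weight <= 0:
        raise ValueError("mask contains no perivascular pixels")
    return float(np.sum(metric_map * mask) / total_weight)


# Hypothetical 2 x 2 gradient map in which only the bottom row is perivascular.
gradient_map = np.array([[1.0, 2.0], [3.0, 4.0]])
vessel_mask = np.array([[0.0, 0.0], [1.0, 1.0]])
weighted = perivascular_mean(gradient_map, vessel_mask)
```

Normalising by the total mask weight keeps the result comparable between images with different amounts of visible vasculature.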
2.3 Wavelet coefficients
Let be the discrete 2D wavelet transform of with the wavelet , with coefficients . Then, the ordered set is the ordered set of the horizontal, vertical and diagonal coefficients of the wavelet transform. Then, for , let the variance of wavelet coefficients be defined as the ordered set
in which each of the elements over the third axis corresponds to the horizontal, vertical and diagonal coefficients of the wavelet transform, respectively. Let furthermore the sum of squared coefficients be
For the purposes of this paper, two Daubechies wavelets and , the biorthogonal wavelet and the Haar wavelet were used. To prevent spurious signals arising from JPEG compressed source images, a mask was applied onto the images based on the shape of the raw ophthalmoscopic images to filter out any values in the dark areas.
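The wavelet features can be illustrated with a single-level 2D Haar transform written directly in numpy. This is a sketch under assumptions: only the Haar wavelet is shown, the sum of squared coefficients is taken over the three detail subbands, and the dark-area masking is omitted. For the other wavelet families, a library such as PyWavelets (`pywt.dwt2`) would be the natural choice. Function names are hypothetical.

```python
import numpy as np


def haar_dwt2(image):
    """Single-level orthonormal 2D Haar transform of an even-sized image."""
    x = np.asarray(image, dtype=float)
    if x.shape[0] % 2 or x.shape[1] % 2:
        raise ValueError("image dimensions must be even")
    # The four pixels of every non-overlapping 2 x 2 block.
    a, b = x[0::2, 0::2], x[0::2, 1::2]
    c, d = x[1::2, 0::2], x[1::2, 1::2]
    approx = (a + b + c + d) / 2.0
    horizontal = (a + b - c - d) / 2.0  # difference between adjacent rows
    vertical = (a - b + c - d) / 2.0    # difference between adjacent columns
    diagonal = (a - b - c + d) / 2.0
    return approx, (horizontal, vertical, diagonal)


def wavelet_features(image):
    """Variance of each detail subband and the sum of squared detail coefficients."""
    _, details = haar_dwt2(image)
    variances = tuple(float(np.var(band)) for band in details)
    sum_of_squares = float(sum(np.sum(band ** 2) for band in details))
    return variances, sum_of_squares


# Hypothetical 4 x 4 checkerboard: all detail energy sits in the diagonal band.
checker = np.tile(np.array([[0.0, 1.0], [1.0, 0.0]]), (2, 2))
variances, sum_of_squares = wavelet_features(checker)
```

Blur suppresses fine detail, so the detail subbands of a blurry image tend to carry less energy and lower variance than those of a sharp image of the same scene.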
In order to understand to what extent image blurriness prejudices diagnostic suitability, the reference dataset by Köhler et al. was used. This diagnostic data set contains 18 pairs of images, each acquired by a Canon CR-1 fundus camera (Canon Inc. Medical Equipment Group, Tokyo, Japan) with a field of view, with each pair containing one ’good’ and one ’bad’ quality image. Each image is stored as a 3456x5184px 3-channel RGB JPEG image. Image processing was performed using Python 3.6, primarily using skimage and scipy. The pandas package was used to aggregate and analyse data. Wavelets were implemented using the PyWavelets package.
3.1 Principal components analysis (PCA)
Figure 3. 2-dimensional principal components analysis of all feature variables. Note the relatively tight groupings of normal-valued images’ PCAs surrounded by blurry images’ PCAs.
A PCA was performed to elicit grouping density and dimensional reducibility. PCA was conducted on standard-scaled values, where the standard score of each value $x$ from a sample with mean $\bar{x}$ and standard deviation $s$ was defined as

$$z = \frac{x - \bar{x}}{s}$$

The principal components were separately analysed for three distinct variable clusters: statistical indicators (Atkinson’s EFC and NEFC metrics, and Shannon entropy), gradients and gradient-derived indicators (Tenengrad, MAL, perivascular weighted Tenengrad and perivascular weighted MAL) and, for wavelet transforms, the variance of the diagonal, vertical and horizontal coefficients and the sum of squared coefficients by wavelet (Figure 3).
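Standard scaling followed by a two-component PCA can be sketched with a singular value decomposition. The feature matrix below is random placeholder data with 36 rows (one per image in the 18 pairs) and five hypothetical feature columns; it does not reproduce the paper's measurements, and the function names are assumptions.

```python
import numpy as np


def standard_scale(features):
    """Column-wise standard scores; assumes no column is constant."""
    features = np.asarray(features, dtype=float)
    return (features - features.mean(axis=0)) / features.std(axis=0)


def pca_2d(features):
    """Project standard-scaled features onto their first two principal axes."""
    z = standard_scale(features)
    # Rows of vt are the principal axes, ordered by decreasing singular value.
    _, singular_values, vt = np.linalg.svd(z, full_matrices=False)
    scores = z @ vt[:2].T
    explained = singular_values ** 2 / np.sum(singular_values ** 2)
    return scores, explained[:2]


# Placeholder data: 36 images x 5 hypothetical features.
rng = np.random.default_rng(0)
placeholder_features = rng.normal(size=(36, 5))
scores, explained = pca_2d(placeholder_features)
```

The returned `explained` values are the fractions of total variance captured by each of the two components, which indicates how far the feature set can be reduced.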
3.2 Machine learning algorithms over the image metrics
The data set was randomly split into a 25% test set and a 75% training set. It is important to note that the scarcity of data was suboptimal, and that for a more thorough understanding of the predictive potential of each of the indicators, as would be discernible from e.g. Bayesian logistic regression using a Monte Carlo sampling algorithm like NUTS (No U-Turn Sampling)[28, 8] or simple Metropolis-Hastings sampling, a larger dataset would be indispensable.
Figure 4. Comparative performance of classifiers as exemplified by ROC curves. Note the outstanding performance of both the sigmoid kernel SVM and the crossvalidated logistic regression. Both the sigmoid kernel SVM classifier and the crossvalidated logistic regression arrived at the same boundaries and thus overlap on the ROC curve. This is largely due to the scarce test set.
In this stage, three classifiers were applied:
logistic regression with 5-fold crossvalidation and F1 optimised scoring,
a random forest classifier, and
a support vector classifier with a sigmoid kernel.
Overall, the performance of both the sigmoid kernel SVM classifier and logistic regression was excellent, with an Area Under Curve (AUC) of 0.95 for both. However, when considering the F1 score, a metric that considers both precision and recall, logistic regression far outperforms the sigmoid kernel SVM classifier and leaves the random forest model far behind. At least partially at fault for this is the relatively small sample count, and it would be valuable to repeat these assessments with a larger data set of professionally annotated blurry versus sharp images.
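Since the comparison hinges on the F1 score, a short sketch of how it is derived from precision and recall may help. The labels below are invented for illustration (1 denoting a blurry image, 0 a sharp one) and are not the paper's results; the function name is hypothetical.

```python
def f1_score(y_true, y_pred):
    """Harmonic mean of precision and recall for the positive class (label 1)."""
    pairs = list(zip(y_true, y_pred))
    true_pos = sum(1 for t, p in pairs if t == 1 and p == 1)
    false_pos = sum(1 for t, p in pairs if t == 0 and p == 1)
    false_neg = sum(1 for t, p in pairs if t == 1 and p == 0)
    if true_pos == 0:
        return 0.0  # no correct positive predictions
    precision = true_pos / (true_pos + false_pos)
    recall = true_pos / (true_pos + false_neg)
    return 2 * precision * recall / (precision + recall)


# Invented labels for six images: 1 = blurry, 0 = sharp.
y_true = [1, 1, 1, 0, 0, 0]
y_pred = [1, 1, 0, 1, 0, 0]
score = f1_score(y_true, y_pred)
```

On a test set this small, a single misclassified image shifts the F1 score substantially, which is one reason why classifiers with identical AUC values can still differ markedly in F1.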
At the same time, the conclusion is inescapable: by the combination of simple statistical indicators, together with vascularity weighted gradients and wavelet-based metrics drawn from an ensemble of relatively divergent wavelets (, and are asymmetric orthogonal bi-orthogonal wavelets while is a symmetric non-orthogonal biorthogonal wavelet), a fast and efficient method for assessing image blurriness can be generated that is able to triage ophthalmoscopic images with great certainty. For the future of teleophthalmoscopy, the development, standardisation and testing of such an ensemble algorithm will no doubt be a significant first step.
Already a decade ago, Hossain et al. labelled obesity and diabetes in the developing world ”a growing challenge”. The emergence of this growing public health issue, fuelled by dietary and cultural changes especially in urban areas in developing nations, will bring with itself a new and unprecedented challenge: ensuring that populations already underserved when it comes to ophthalmic care will receive adequate care for diabetic retinopathy, the prevalence of which can only be expected to increase.[25, 5] Teleophthalmology can be a cornerstone of providing this care to a wide range of the population, even in areas where specialist ophthalmologic resources are scarce, unavailable on site, or difficult to access.[21, 2] While smartphone ophthalmoscopy holds enormous potential to provide screening services to underserved populations, its potential beneficial impact is contingent on the ability to acquire high-quality images. Especially where images are acquired with nonspecialist equipment, such as smartphones, pre-analysis Quality Assurance, performed at point of image acquisition, is going to be the crucial determinative factor of whether a healthcare system can successfully leverage teleophthalmology to alleviate the increasing workload arising from the rise in diabetic retinopathy.[19, 14, 29] As computing power in smartphones increases rapidly while their cost rapidly decreases (along with the decreasing cost and increasing ease of access to smartphone ophthalmoscopic adapters, some of which can now be manufactured using a commercial off-the-shelf 3D printer), the case for on-device quality control becomes increasingly warranted. 
Since the operational concept of smartphone ophthalmoscopy envisages such devices being used by nonspecialists, such as general practitioners, nurses, health visitors and other public health workers, device-based feedback will be the primary method of quality control and quality triage to ensure that only images that are of sufficient quality to be further evaluated by a specialist are further transmitted into the teleophthalmology system. Fast and high-performance detection of blurry, unsharp or out of focus images is a crucial first step towards ensuring this.
This paper disclosed an ensemble method of multi-component feature extraction, followed by training and evaluating the results on some of the most frequently used machine learning algorithms. Unfortunately, while the initial results are encouraging, the small sample size (a balanced set of) draws attention to the fact that more data is needed to isolate the most influential components that assist in determining whether an image is too blurry to be diagnostically useful. It is also important to note that the images used in this project were nonpathological, i.e. taken of generally healthy persons. Testing whether the algorithms are robust with respect to pathologies, including those that manifest as large, diffuse and undifferentiated areas (such as the large white reflex spots typical of retinoblastoma or the diffuse appearance of the optic disc in severe acute papilloedema) is paramount before they can play a valuable clinical role. Given the magnitude of the challenge inherent in providing quality ophthalmic care to a growing population in need and already underserved in many respects, creating a large, professionally annotated data set for the detection of various image quality defects should be considered a global research priority.
The author wishes to thank Tamás Marton and Katie von Csefalvay for their helpful suggestions in finalising the manuscript, and his colleagues at Starschema for fostering his interest in the subject and encouraging the work that led to this article. All errors and omissions are the author’s own.
The author has no relevant affiliations or financial involvement with any organization or entity with financial interest or financial conflict with the subject matter or materials discussed in the manuscript apart from those disclosed.
- 1. D. Atkinson, D. L. Hill, P. N. Stoyle, P. E. Summers, and S. F. Keevil. Automatic correction of motion artifacts in magnetic resonance images using an entropy focus criterion. IEEE Transactions on Medical Imaging, 16(6):903–910, 1997.
- 2. R. L. Bashshur, G. W. Shannon, B. R. Smith, and M. A. Woodward. The empirical evidence for the telemedicine intervention in diabetes management. Telemedicine and e-Health, 21(5):321–354, 2015.
- 3. A. Bastawrous. Smartphone fundoscopy. Ophthalmology, 119(2):432–433, 2012.
- 4. A. F. Frangi, W. J. Niessen, K. L. Vincken, and M. A. Viergever. Multiscale vessel enhancement filtering. In International Conference on Medical Image Computing and Computer-assisted Intervention, pages 130–137. Springer, 1998.
- 5. D. S. Friedman, F. Ali, and N. Kourgialis. Diabetic retinopathy in the developing world: how to approach identifying and treating underserved populations. American Journal of Ophthalmology, 151(2):192–194, 2011.
- 6. M. E. Giardini, I. A. Livingstone, S. Jordan, N. M. Bolster, T. Peto, M. Burton, and A. Bastawrous. A smartphone based ophthalmoscope. 2014.
- 7. K. Grabowski, W. Sankowski, M. Zubert, and M. Napieralska. Focus assessment issues in iris image acquisition system. In 2007 14th International Conference on Mixed Design of Integrated Circuits and Systems, pages 628–631. IEEE, 2007.
- 8. M. D. Hoffman and A. Gelman. The No-U-Turn sampler: adaptively setting path lengths in Hamiltonian Monte Carlo. Journal of Machine Learning Research, 15(1):1593–1623, 2014.
- 9. P. Hossain, B. Kawar, and M. El Nahas. Obesity and diabetes in the developing world—a growing challenge. 2009.
- 10. ITU. ITU-R Recommendation BT.709 Basic parameter values for the HDTV standard for the studio and for international programme exchange. Technical report, International Telecommunications Union, 1990.
- 11. E. Jones, T. Oliphant, and P. Peterson. SciPy: Open source scientific tools for Python. 2014.
- 12. T. Köhler, A. Budai, M. F. Kraus, J. Odstrčilik, G. Michelson, and J. Hornegger. Automatic no-reference quality assessment for retinal fundus images using vessel segmentation. In Proceedings of the 26th IEEE International Symposium on Computer-Based Medical Systems, pages 95–100. IEEE, 2013.
- 13. E. Krotkov. Focusing. International Journal of Computer Vision, 1(3):223–237, 1988.
- 14. S. Kumar, E.-H. Wang, M. J. Pokabla, and R. J. Noecker. Teleophthalmology assessment of diabetic retinopathy fundus images: smartphone versus standard office computer workstation. TELEMEDICINE and e-HEALTH, 18(2):158–162, 2012.
- 15. V. Laparra, J. Ballé, A. Berardino, and E. P. Simoncelli. Perceptual image quality assessment using a normalized Laplacian pyramid. Electronic Imaging, 2016(16):1–6, 2016.
- 16. G. Lee, F. Wasilewski, R. Gommers, K. Wohlfahrt, A. O’Leary, and H. Nahrstaedt. PyWavelets: Wavelet Transforms in Python, 2006.
- 17. B. Liesenfeld, E. Kohner, W. Piehlmeier, S. Kluthe, S. Aldington, M. Porta, T. Bek, M. Obermaier, H. Mayer, G. Mann, et al. A telemedical approach to the screening of diabetic retinopathy: digital fundus photography. Diabetes Care, 23(3):345–348, 2000.
- 18. W. McKinney. pandas: a foundational Python library for data analysis and statistics. Python for High Performance and Scientific Computing, 14, 2011.
- 19. M. Mohammadpour, Z. Heidari, M. Mirghorbani, and H. Hashemi. Smartphones, tele-ophthalmology, and VISION 2020. International Journal of Ophthalmology, 10(12):1909, 2017.
- 20. D. Myung, A. Jais, L. He, M. S. Blumenkranz, and R. T. Chang. 3D printed smartphone indirect lens adapter for rapid, high quality retinal imaging. Journal of Mobile Technology in Medicine, 3(1):9–15, 2014.
- 21. F. J. Pasquel, A. M. Hendrick, M. Ryan, E. Cason, M. K. Ali, and K. V. Narayan. Cost-effectiveness of different diabetic retinopathy screening modalities. Journal of Diabetes Science and Technology, 10(2):301–307, 2016.
- 22. J. L. Pech-Pacheco, G. Cristóbal, J. Chamorro-Martinez, and J. Fernández-Valdivia. Diatom autofocusing in brightfield microscopy: a comparative study. In Proceedings 15th International Conference on Pattern Recognition. ICPR-2000, volume 3, pages 314–317. IEEE, 2000.
- 23. J. F. Schlag, A. C. Sanderson, C. P. Neuman, and F. C. Wimberly. Implementation of automatic focusing algorithms for a computer vision system with camera control. Technical report, Institute of Robotics, Carnegie-Mellon University, Pittsburgh, PA, 1983.
- 24. T. S. Surendran and R. Raman. Teleophthalmology in diabetic retinopathy. Journal of Diabetes Science and Technology, 8(2):262–266, 2014.
- 25. H. R. Taylor and J. E. Keeffe. World blindness: a 21st century perspective. British Journal of Ophthalmology, 85(3):261–266, 2001.
- 26. J. M. Tenenbaum. Accommodation in computer vision. Technical report, Stanford University, Department of Computer Science, 1970.
- 27. L. J. van Vliet, I. T. Young, and G. L. Beckers. A nonlinear Laplace operator as edge detector in noisy images. Computer Vision, Graphics, and Image Processing, 45(2):167–195, 1989.
- 28. M. Xu, B. Lakshminarayanan, Y. W. Teh, J. Zhu, and B. Zhang. Distributed Bayesian posterior sampling via moment sharing. In Advances in Neural Information Processing Systems, pages 3356–3364, 2014.
- 29. Y. Ye, J. Wang, Y. Xie, J. Zhong, Y. Hu, B. Chen, X. He, and H. Zhang. Global teleophthalmology with iPhones for real-time slitlamp eye examination. Eye & Contact Lens, 40(5):297–300, 2014.