Over the past years, digital imaging technologies have been established in ophthalmology to examine the human retina in an in-vivo and non-invasive way. Digital fundus cameras and scanning laser ophthalmoscopes (SLO) are among the most commonly used systems to capture single images or video sequences of the retina. This is an essential part of the diagnosis of retinal diseases, e. g. in computer-assisted glaucoma or diabetic retinopathy screening. In intraoperative applications, the slit lamp is a common technique for live examination of the eye background. Common to all of these approaches is the strong need to capture high-resolution images with a wide field of view (FOV) for diagnosis or intervention planning. However, in retinal imaging this is difficult for technological and economic reasons. First, for a wide FOV the pupil should be dilated. Moreover, the spatial resolution is limited by the characteristics of the camera sensor and optics. Modern fundus cameras and SLO provide images with sufficient resolution to support diagnostic procedures, but they are relatively expensive and not mobile, which limits their benefit for low-cost screening applications.
For these reasons, two complementary software-based strategies have emerged as a field of research. (i) Image mosaicing, which registers and combines multiple views showing different regions of the retina, has been proposed to increase the FOV. Can et al. [6] and later Zheng et al. [7] have developed feature-based registration schemes based on vascular landmark matching that are applicable to mosaicing of high-resolution fundus images acquired longitudinally. Similarly, interest points can be used for feature-based registration. In more recent approaches, intensity-based or hybrid image registration [5, 10] has been proposed to avoid the need for accurate feature extraction. Common to these methods is that they either rely on high-quality data or are applicable to images of poor quality but cannot enhance them, e. g. in terms of resolution. (ii) For spatial resolution enhancement of digital images, super-resolution has been investigated. In recent works by Köhler et al. [11] and Thapa et al. [12], multiple low-resolution frames from a video sequence showing the same FOV but with small geometric displacements relative to each other are fused to reconstruct a high-resolution image. This approach exploits complementary information across different frames that arises from small natural eye motion during an examination. Unlike image mosaicing, the super-resolution paradigm cannot be employed to increase the FOV as all frames have to cover the same region. Early work in computer vision suggests combining both by applying super-resolution reconstruction to a mosaic.
This paper proposes a novel multi-stage framework to reconstruct super-resolved retinal mosaic images. Unlike many related approaches, we exploit video sequences rather than longitudinally acquired images. As a key idea, we use a mobile and video-capable but low-cost camera to scan different regions on the retina as typically done, e. g. using fundus video cameras or slit lamps. We propose eye tracking to select appropriate views for our reconstruction method in a fully automatic way. Complementary to related concepts [5, 13], these are first fused by super-resolution reconstruction followed by image mosaicing. For accurate combination of the views, we introduce robust intensity-based registration and a novel adaptive weighting scheme. Our experimental evaluation performed with a low-cost fundus camera demonstrates the clinical practicality of our method.
2 Super-Resolved Mosaicing Framework
We consider a video sequence of $K$ frames denoted by the set $\mathcal{Y} = \{\mathbf{y}^{(1)}, \ldots, \mathbf{y}^{(K)}\}$, where each frame $\mathbf{y}^{(k)}$ is represented in vector notation. The frames in $\mathcal{Y}$ show different regions of the retina, where each region is captured by several frames and is referred to as a view. To apply our mosaicing framework within the scope of one examination, we scan different regions on the retina by exploiting eye movements induced by patient guidance. Our approach aims at reconstructing a super-resolved mosaic in a three-stage procedure as depicted in Fig. 1. (i) In order to select appropriate views for super-resolved mosaicing, we employ eye tracking to trace the eye position during the examination. (ii) For the $n$ views $\mathcal{Y}^{(i)} \subset \mathcal{Y}$ automatically selected by the eye tracking, we reconstruct high-resolution images $\mathcal{X} = \{\mathbf{x}^{(1)}, \ldots, \mathbf{x}^{(n)}\}$ by means of multi-frame super-resolution. (iii) Finally, the complementary views in $\mathcal{X}$ are registered to a common reference and are combined by image stitching.
2.1 Super-Resolution View Reconstruction
Appropriate views for mosaicing are selected based on eye tracking in an initial stage. For this purpose, we employ the geometry-based tracking-by-detection introduced by Kürten et al. [14]. This enables real-time tracking of the optic disk as a robust feature to describe eye motion. For each frame $\mathbf{y}^{(k)}$, the tracking yields the optic disk radius $r^{(k)}$ and the pixel coordinates $\mathbf{u}^{(k)}$ of its center point. Based on the coordinates $\mathbf{u}^{(k)}$, we decompose the entire input sequence $\mathcal{Y}$ into disjoint subsets $\mathcal{Y}^{(i)}$. For each view $\mathcal{Y}^{(i)}$, we compute the Euclidean distance $d(\mathbf{u}^{(k)}, \mathbf{u}_0^{(i)}) = \|\mathbf{u}^{(k)} - \mathbf{u}_0^{(i)}\|_2$ of consecutive positions $\mathbf{u}^{(k)}$ relative to $\mathbf{u}_0^{(i)}$, which describes the eye position in the first frame of $\mathcal{Y}^{(i)}$. Each view is composed of the consecutive frames that satisfy $d(\mathbf{u}^{(k)}, \mathbf{u}_0^{(i)}) \leq d_{\max}$, where $d_{\max}$ is the maximum amount of motion accepted within one view. The starting positions are selected such that $d(\mathbf{u}_0^{(i)}, \mathbf{u}_0^{(i-1)}) \geq d_{\min}$, where $d_{\min}$ is the minimum distance between two successive views to gain an improvement in terms of FOV. The view $\mathcal{Y}^{(r)}$ with the closest distance to the centroid of all views is selected as reference to ensure that $\mathcal{Y}^{(r)}$ has sufficient overlap with all other views.
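The view selection described above can be illustrated with a short sketch. It is not the authors' implementation: the greedy scan, the function names `select_views` and `reference_view`, and the parameter names `d_max` and `d_min` are assumptions chosen for illustration, and the optic disk centers are taken as given by some tracker.

```python
import numpy as np


def select_views(centers, d_max, d_min):
    """Greedily split tracked optic disk centers into views.

    centers: (K, 2) sequence of per-frame center coordinates in pixels.
    Returns a list of (first_frame, last_frame) index pairs, inclusive.
    """
    centers = np.asarray(centers, dtype=float)
    views = []
    k = 0
    while k < len(centers):
        start = centers[k]
        # A new view must start far enough from the previous view's start.
        if views and np.linalg.norm(start - centers[views[-1][0]]) < d_min:
            k += 1
            continue
        # Extend the view while the eye stays within d_max of its start.
        end = k
        while (end + 1 < len(centers)
               and np.linalg.norm(centers[end + 1] - start) <= d_max):
            end += 1
        views.append((k, end))
        k = end + 1
    return views


def reference_view(centers, views):
    """Index of the view whose start position is closest to the centroid."""
    centers = np.asarray(centers, dtype=float)
    starts = np.asarray([centers[first] for first, _ in views])
    centroid = starts.mean(axis=0)
    return int(np.argmin(np.linalg.norm(starts - centroid, axis=1)))
```

Choosing the view nearest to the centroid as reference keeps the largest displacement to any other view small, which is what makes sufficient overlap likely.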
For each view $\mathcal{Y}^{(i)}$, we obtain a super-resolution reconstruction $\mathbf{x}^{(i)}$ based on the frames corresponding to this view. This reconstruction exploits subpixel motion in $\mathcal{Y}^{(i)}$ that is related to small, natural eye movements that occur during an examination. We adopt the adaptive algorithm presented in our prior work [11], which formulates the reconstruction as the regularized minimization problem in Eq. (1). In this formulation, the term $R(\mathbf{x})$ denotes bilateral total variation (BTV) regularization and $\lambda$ denotes an adaptive regularization weight. The parameter vector $\boldsymbol{\theta}$ encodes subpixel motion to describe eye movements by an affine transformation and $\mathbf{W}$ is the system matrix to model this motion, sub-sampling and the camera point spread function. The multiplicative and additive parameters $\boldsymbol{\gamma}_m$ and $\boldsymbol{\gamma}_a$ model spatial and temporal illumination changes that are estimated by bias field correction. The regularization weight $\lambda$ is adaptively selected using image quality self-assessment. To achieve uniform noise and sharpness characteristics across all views and hence the consistency required for mosaicing, this parameter is initially determined for the reference view as the weight $\hat{\lambda}$ that yields the best value of $Q(\mathbf{x}^{(r)}(\lambda))$, where $Q$ denotes the quality measure to assess the appearance of the reference $\mathbf{x}^{(r)}(\lambda)$ that is reconstructed using the weight $\lambda$. Once $\hat{\lambda}$ is determined, Eq. (1) is solved for each view with this fixed regularization weight using scaled conjugate gradient iterations.
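To make the role of the regularizer and its weight more concrete, the following sketch evaluates a BTV term and picks a weight from a candidate list. It rests on several assumptions: the BTV term follows the commonly used definition (weighted L1 differences between the image and its shifted copies), borders are handled by wrap-around, and the weight is chosen by a plain search over candidates with a higher quality score being better. The names `btv`, `select_weight`, `radius` and `alpha` are hypothetical, and the callables `reconstruct` and `quality` stand in for the super-resolution solver and the quality measure, which are not implemented here.

```python
import numpy as np


def btv(image, radius=2, alpha=0.7):
    """Bilateral total variation of a 2-D image.

    Sums alpha**(|l|+|m|) * ||x - shift(x, l, m)||_1 over a
    (2*radius+1)^2 window. np.roll wraps around at the border,
    which is a simplifying assumption.
    """
    image = np.asarray(image, dtype=float)
    total = 0.0
    for l in range(-radius, radius + 1):
        for m in range(-radius, radius + 1):
            if l == 0 and m == 0:
                continue
            shifted = np.roll(image, shift=(l, m), axis=(0, 1))
            total += alpha ** (abs(l) + abs(m)) * np.abs(image - shifted).sum()
    return total


def select_weight(candidates, reconstruct, quality):
    """Return the candidate weight whose reconstruction scores best."""
    scores = [quality(reconstruct(lam)) for lam in candidates]
    return candidates[int(np.argmax(scores))]
```

In the setting of this section, `select_weight` would be run once on the reference view and the resulting weight reused for all other views.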
2.2 Mosaic Image Reconstruction
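We propose a fixed-reference registration scheme for robust mosaicing that is insensitive to error accumulation. The super-resolved views $\mathbf{x}^{(i)}$, $i \neq r$, are registered to the reference $\mathbf{x}^{(r)}$ as selected by the automatic tracking procedure. For view registration, we employ a 12 degrees of freedom (DoF) quadratic transformation to account for the spherical surface of the retina. A point $\mathbf{u} = (u, v)^\top$ in $\mathbf{x}^{(i)}$ is transformed to $\mathbf{u}'$ in $\mathbf{x}^{(r)}$ according to:
$$\mathbf{u}' = \begin{pmatrix} p_1 & p_2 & p_3 & p_4 & p_5 & p_6 \\ p_7 & p_8 & p_9 & p_{10} & p_{11} & p_{12} \end{pmatrix} \begin{pmatrix} u^2 & uv & v^2 & u & v & 1 \end{pmatrix}^\top \tag{3}$$
where $\mathbf{p} = (p_1, \ldots, p_{12})^\top$ denotes the transformation parameters. To apply this model for view registration, we propose intensity-based registration. This approach does not rely on accurate feature detection, e. g. vascular tree segmentation, that is hard to achieve in retinal images of lower quality. As photometric differences between multiple views are an additional issue, we adopt the correlation coefficient $\rho$ as a similarity measure. This measure has been investigated to estimate projective transformations using an enhanced correlation coefficient (ECC) optimization algorithm [15] to maximize $\rho$ iteratively. Iterations are performed according to $\mathbf{p}_t = \mathbf{p}_{t-1} + \Delta\mathbf{p}_t$, where $\Delta\mathbf{p}_t$ is the increment for the parameters at iteration $t$ computed from a scaled version of the Jacobian of the warped image. The proposed method generalizes this framework to the quadratic model in Eq. (3), where the Jacobian of $\mathbf{u}'$ with respect to $\mathbf{p}$ is computed per pixel as:
$$\frac{\partial \mathbf{u}'}{\partial \mathbf{p}} = \begin{pmatrix} u^2 & uv & v^2 & u & v & 1 & 0 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 & u^2 & uv & v^2 & u & v & 1 \end{pmatrix} \tag{4}$$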
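The 12 DoF quadratic model, its per-pixel Jacobian and the correlation coefficient can be sketched as follows. This is an illustration under stated assumptions rather than the registration code itself: the ordering of the monomials $(u^2, uv, v^2, u, v, 1)$ and of the parameters (row-wise in a $2 \times 6$ matrix) is a choice made here, the function names are hypothetical, and the iterative ECC update is not reproduced.

```python
import numpy as np


def monomials(points):
    """Quadratic basis (u^2, uv, v^2, u, v, 1) for an (N, 2) array of points."""
    points = np.asarray(points, dtype=float)
    u, v = points[:, 0], points[:, 1]
    return np.stack([u * u, u * v, v * v, u, v, np.ones_like(u)], axis=1)


def quadratic_warp(points, params):
    """Map (N, 2) points with a 12-parameter quadratic model (2 x 6 matrix)."""
    return monomials(points) @ np.reshape(params, (2, 6)).T


def warp_jacobian(point):
    """2 x 12 Jacobian of the warped point with respect to the parameters."""
    m = monomials([point])[0]
    jac = np.zeros((2, 12))
    jac[0, :6] = m  # first output coordinate depends on p_1..p_6 only
    jac[1, 6:] = m  # second output coordinate depends on p_7..p_12 only
    return jac


def correlation(a, b):
    """Zero-mean normalized correlation coefficient of two images."""
    a = np.ravel(a) - np.mean(a)
    b = np.ravel(b) - np.mean(b)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))
```

Because the model is linear in its parameters, the Jacobian depends only on the pixel position and not on the current estimate, and the correlation coefficient is invariant to gain and offset changes between the two images.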
The registration of view $\mathbf{x}^{(i)}$ to the reference $\mathbf{x}^{(r)}$ is implemented in a hierarchical scheme to avoid getting stuck in a local optimum. We employ the eye positions $\mathbf{u}_0^{(i)}$ and $\mathbf{u}_0^{(r)}$ of both views obtained from the tracking procedure to estimate a translational motion and initialize our model in Eq. (3) with this translation. Based on this initialization, we estimate a model equivalent to a 6 DoF affine model, which is defined by setting the quadratic coefficients to zero. Finally, we estimate the full quadratic model in Eq. (3) using the affine model as initial guess. In addition to the geometric registration, mosaicing requires photometric registration to compensate for illumination differences across the views. Therefore, histogram matching is employed to determine a monotonic mapping that adjusts the histogram of each view $\mathbf{x}^{(i)}$, $i \neq r$, to the histogram of the reference $\mathbf{x}^{(r)}$.
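Two steps of this paragraph lend themselves to a small sketch: the translational initialization of the quadratic model and the histogram matching. Both functions are illustrative assumptions. `translation_init` presumes the parameters form a $2 \times 6$ matrix acting on $(u^2, uv, v^2, u, v, 1)$, and `match_histogram` uses a standard CDF-based mapping, which is one common way to obtain a monotonic intensity transfer and not necessarily the variant used in the experiments.

```python
import numpy as np


def translation_init(center_view, center_ref):
    """Quadratic model reduced to a pure translation between two eye positions.

    Parameter layout (an assumption): rows of a 2 x 6 matrix acting on
    (u^2, uv, v^2, u, v, 1).
    """
    t = np.asarray(center_ref, dtype=float) - np.asarray(center_view, dtype=float)
    return np.array([[0, 0, 0, 1, 0, t[0]],
                     [0, 0, 0, 0, 1, t[1]]], dtype=float)


def match_histogram(view, reference):
    """Monotonic intensity mapping of `view` onto the histogram of `reference`."""
    view = np.asarray(view)
    reference = np.asarray(reference)
    values, inverse, counts = np.unique(
        view.ravel(), return_inverse=True, return_counts=True)
    ref_values, ref_counts = np.unique(reference.ravel(), return_counts=True)
    cdf = np.cumsum(counts) / view.size
    ref_cdf = np.cumsum(ref_counts) / reference.size
    # For each source quantile, look up the reference intensity at that quantile.
    mapped = np.interp(cdf, ref_cdf, ref_values)
    return mapped[inverse].reshape(view.shape)
```

Starting from a pure translation and only then releasing the affine and quadratic coefficients keeps the early iterations in a low-dimensional search space, which is the motivation for the hierarchical scheme.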
Applying geometric and photometric registration to $\mathbf{x}^{(i)}$ yields the registered view $\tilde{\mathbf{x}}^{(i)}$, whereas for $i = r$ we set $\tilde{\mathbf{x}}^{(r)} = \mathbf{x}^{(r)}$. To reconstruct a mosaic from $\tilde{\mathbf{x}}^{(1)}, \ldots, \tilde{\mathbf{x}}^{(n)}$ by image stitching, we propose a pixel-wise adaptive averaging in which each mosaic pixel at position $\mathbf{u}$ is the average of the registered views at this position weighted by adaptive weights $w_i(\mathbf{u})$. For reliable mosaicing, the weights need to be selected such that seams between overlapping views are suppressed. Moreover, robust mosaicing needs to account for the registration uncertainty of individual views. In our approach, these issues are addressed by adaptive weights that are built from the following quantities. $\delta_i(\mathbf{u})$ is an indicator function with $\delta_i(\mathbf{u}) = 1$ if the $i$-th view contributes to the mosaic at position $\mathbf{u}$ and $\delta_i(\mathbf{u}) = 0$ otherwise. The spatially varying weights $d_i(\mathbf{u})$ are computed by a distance map of $\delta_i$ that decays from a maximum at the image center to zero at the boundary. The correlations $\rho_v$ and $\rho_I$ are evaluated on the vesselness filtered images [17] and on the intensity images of the registered view and the reference, respectively, to assess the registration uncertainty in overlapping views. To exclude incorrectly registered views from mosaicing based on their consistency with the reference, both correlation values are compared to pre-defined thresholds.
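A minimal sketch of such a weighted stitching is given below. Several points are assumptions made for illustration: the views are already resampled onto a common grid, the distance map is the distance to the nearest border of a rectangular footprint (a warped view would need a distance transform of its mask instead), and a view is dropped entirely as soon as either of its two correlation values falls below its threshold. All function and parameter names are hypothetical.

```python
import numpy as np


def border_distance(height, width):
    """Distance (in pixels) of each pixel to the nearest image border."""
    rows = np.arange(height)
    cols = np.arange(width)
    d_rows = np.minimum(rows, height - 1 - rows)[:, None]
    d_cols = np.minimum(cols, width - 1 - cols)[None, :]
    return np.minimum(d_rows, d_cols).astype(float)


def blend(views, masks, distances, rho_vessel, rho_intensity,
          thr_vessel, thr_intensity):
    """Pixel-wise weighted average of registered views on a common grid."""
    shape = np.asarray(views[0]).shape
    numerator = np.zeros(shape)
    denominator = np.zeros(shape)
    for view, mask, dist, rv, ri in zip(
            views, masks, distances, rho_vessel, rho_intensity):
        # Drop views whose agreement with the reference is too low.
        if rv < thr_vessel or ri < thr_intensity:
            continue
        weight = np.asarray(mask) * np.asarray(dist)
        numerator += weight * np.asarray(view, dtype=float)
        denominator += weight
    mosaic = np.zeros(shape)
    # Pixels without any contributing view stay at zero.
    np.divide(numerator, denominator, out=mosaic, where=denominator > 0)
    return mosaic
```

Since the weights fall to zero at the border of each view, the contribution of a view fades out smoothly where it ends, which is what suppresses visible seams in overlapping regions.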
3 Experiments and Results
We demonstrate the application of our framework for fundus imaging using the mobile and low-cost camera presented in [18] to acquire monochromatic images with a frame rate of 25 Hz. We examined the left eye of seven healthy subjects with a FOV of in vertical and in horizontal direction without pupil dilation. To scan different regions on the retina, we fixed the camera position and asked the subjects to fixate on different positions on a fixation target. The duration of each video was 15 s and images were given in VGA resolution (640 × 480 px). In total, we acquired 24 data sets. We aligned the camera such that the optic disk was centered in the first frame, which was selected as the reference view. We varied the number of views between and . The view selection was performed with px, px and . We employ our publicly available super-resolution toolbox (the latest version is available on our webpage www5.cs.fau.de/research/software/multi-frame-super-resolution-toolbox/) to reconstruct super-resolved views with magnification and apply mosaicing with and . The super-resolved mosaics for the 24 data sets were assessed by three human experts in retinal imaging. Each image was rated in the following categories: (i) Quality of the geometric registration and appearance of anatomical structures. (ii) Homogeneity of the illumination on the retina. (iii) Overall appearance of the reconstructed image. Each category was graded on a scale ranging from ‘perfect’ (grade: 1) to ‘not usable’ (grade: 4). The overall grade for each image was chosen to be the worst of the three categories. Two example images obtained with different grades are shown in Fig. 2. These were obtained by horizontal and vertical eye movements and the FOV was increased up to . The overall distribution of the quality grades is summarized in Fig. 3 (left).
To compare our approach to cameras used in clinical practice, we captured color fundus images with a Kowa nonmyd camera ( FOV, px). Fig. 4 compares a mosaic with a grayscale converted fundus image captured from the same subject, where mosaicing enhanced the horizontal FOV to based on views. This was comparable to the FOV provided by the Kowa camera. For quantitative evaluation, we examined the blind signal-to-noise ratio $Q_{\mathrm{snr}}$ and the edge preservation $Q_{\mathrm{edge}}$. For $Q_{\mathrm{snr}}$, $\mu_{\mathrm{flat}}$ and $\sigma_{\mathrm{flat}}$ are the mean and standard deviation of the intensity within a homogeneous region of interest (ROI) in the image. Similarly, for $Q_{\mathrm{edge}}$, $w_j$, $\mu_j$ and $\sigma_j$ with $j \in \{b, f\}$ denote the weight, the mean and the standard deviation of a Gaussian mixture model fitted for background (b) and foreground (f) in an ROI containing a transition between two structures, and $\mu$ is the mean intensity in this ROI. Fig. 3 (right) compares the statistics of $Q_{\mathrm{snr}}$ and $Q_{\mathrm{edge}}$ for original video data, super-resolved mosaics and the Kowa images for five subjects using boxplots. For both measures, we evaluated four manually selected ROIs per image.
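The two measures can be sketched from the quantities named above. The exact formulas are assumptions: the blind signal-to-noise ratio is taken here as the ratio of mean to standard deviation in a homogeneous ROI on a decibel scale, and the edge preservation as the ratio of between-class to within-class variance of the two mixture components. The mixture parameters are passed in directly, so fitting the Gaussian mixture model is outside this sketch, and the function names `q_snr` and `q_edge` are hypothetical.

```python
import numpy as np


def q_snr(roi):
    """Blind SNR of a homogeneous ROI in decibels (10*log10 is an assumption)."""
    roi = np.asarray(roi, dtype=float)
    return 10.0 * np.log10(roi.mean() / roi.std())


def q_edge(w_b, mu_b, sigma_b, w_f, mu_f, sigma_f, mu):
    """Between-class over within-class variance of a two-component mixture.

    (w_b, mu_b, sigma_b): weight, mean, std of the background component.
    (w_f, mu_f, sigma_f): weight, mean, std of the foreground component.
    mu: mean intensity of the whole ROI.
    """
    between = w_b * (mu_b - mu) ** 2 + w_f * (mu_f - mu) ** 2
    within = w_b * sigma_b ** 2 + w_f * sigma_f ** 2
    return between / within
```

Under these assumptions, a sharper transition separates the two component means further relative to their spread and therefore raises the edge measure, while lower noise in flat regions raises the signal-to-noise measure.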
In our experiments, % of the mosaics were rated as ‘acceptable’ or ‘perfect’ without noticeable artifacts, and severe registration errors were alleviated by our adaptive mosaicing scheme, see Fig. 2 (top). No image was graded as ‘not usable’ and % were graded as ‘poor’ due to a low contrast of videos for individual subjects, which is not enhanced by our framework. In the remaining cases, images were graded as ‘poor’ due to inaccurate geometric registrations in individual regions, see Fig. 2 (bottom). In these experiments with a non-mydriatic camera, which makes imaging with wide FOV challenging, the proposed framework was able to provide a spatial resolution and FOV comparable to those of high-end cameras, see Fig. 4. In particular, our method was able to double the FOV compared to the original video. In terms of signal-to-noise ratio and edge preservation, we obtained substantial improvements by super-resolved mosaicing compared to original video data and competitive results compared to the Kowa camera. Unlike related methods, the benefit of our framework is that mosaicing is applicable even with mobile and low-cost video hardware within one examination of a few seconds rather than longitudinal examinations. This is essential in computer-aided screening, where our approach provides a relevant alternative to expensive and non-mobile cameras as typically used in clinical practice.
4 Conclusion and Future Work
In this work, we have proposed a fully automatic framework to reconstruct high-resolution retinal images with wide FOV from low-resolution video data showing complementary regions on the retina. Our approach exploits super-resolution to obtain multiple super-resolved views that are stitched to a common mosaic using intensity-based registration with a quadratic transformation model. Using a mobile and low-cost video camera, our framework is able to reconstruct retinal mosaics that are comparable to photographs of commercially available high-end cameras in terms of resolution and FOV.
One direction for future work is mosaicing of peripheral retinal areas, as our current framework processes central areas around the optic nerve head. Another promising direction is the formulation of super-resolution and mosaicing in a joint optimization approach to further enhance the robustness of mosaic reconstruction.
[1] N. Patton, T. M. Aslam, T. MacGillivray, I. J. Deary, B. Dhillon, R. H. Eikelboom, K. Yogesan, and I. J. Constable, “Retinal image analysis: Concepts, applications and potential,” Progress in Retinal and Eye Research, vol. 25, no. 1, pp. 99–127, 2006.
[2] M. D. Abramoff, M. K. Garvin, and M. Sonka, “Retinal Imaging and Image Analysis,” IEEE Rev Biomed Eng, vol. 3, pp. 169–208, 2010.
[3] T. Köhler, G. Bock, J. Hornegger, and G. Michelson, “Computer-aided diagnostics and pattern recognition: Automated glaucoma detection,” in Teleophthalmology in Preventive Medicine, G. Michelson, Ed., pp. 93–104. Springer, 2015.
[4] X. Zhang, G. Thibault, E. Decenciere, B. Marcotegui, B. Laÿ, R. Danno, G. Cazuguel, G. Quellec, M. Lamard, P. Massin, A. Chabouis, Z. Victor, and A. Erginay, “Exudate detection in color retinal images for mass screening of diabetic retinopathy,” Med Image Anal, vol. 18, no. 7, pp. 1026–1043, 2014.
[5] R. Richa, R. Linhares, E. Comunello, A. von Wangenheim, J. Schnitzler, B. Wassmer, C. Guillemot, G. Thuret, P. Gain, G. Hager, and R. Taylor, “Fundus image mosaicking for information augmentation in computer-assisted slit-lamp imaging,” IEEE Trans Med Imaging, vol. 33, no. 6, pp. 1304–1312, 2014.
[6] A. Can, C. V. Stewart, B. Roysam, and H. L. Tanenbaum, “A feature-based, robust, hierarchical algorithm for registering pairs of images of the curved human retina,” IEEE PAMI, vol. 24, no. 3, pp. 347–364, 2002.
[7] Y. Zheng, E. Daniel, A. Hunter, R. Xiao, J. Gao, H. Li, M. G. Maguire, D. H. Brainard, and J. C. Gee, “Landmark matching based retinal image alignment by enforcing sparsity in correspondence matrix,” Med Image Anal, vol. 18, no. 6, pp. 903–913, 2014.
[8] P. C. Cattin, H. Bay, L. Van Gool, and G. Székely, “Retina Mosaicing Using Local Features,” in Proc. MICCAI 2006, 2006, pp. 185–192.
[9] K. M. Adal, R. M. Ensing, R. Couvert, P. van Etten, J. P. Martinez, K. A. Vermeer, and L. J. van Vliet, “A Hierarchical Coarse-to-Fine Approach for Fundus Image Registration,” in Proc. WBIR 2014, 2014, pp. 93–102.
[10] T. Chanwimaluang, G. Fan, and S. R. Fransen, “Hybrid retinal image registration,” IEEE Trans Inf Technol Biomed, vol. 10, no. 1, pp. 129–142, 2006.
[11] T. Köhler, A. Brost, K. Mogalle, Q. Zhang, C. Köhler, G. Michelson, J. Hornegger, and R. P. Tornow, “Multi-frame Super-resolution with Quality Self-assessment for Retinal Fundus Videos,” in Proc. MICCAI 2014, 2014, pp. 650–657.
[12] D. Thapa, K. Raahemifar, W. R. Bobier, and V. Lakshminarayanan, “Comparison of super-resolution algorithms applied to retinal images,” Journal of Biomedical Optics, vol. 19, no. 5, p. 056002, 2014.
[13] A. Zomet and S. Peleg, “Efficient super-resolution and applications to mosaics,” in Proc. ICPR 2000, 2000, vol. 1, pp. 579–583.
[14] A. Kürten, T. Köhler, A. Budai, R. P. Tornow, G. Michelson, and J. Hornegger, “Geometry-based optic disk tracking in retinal fundus videos,” in Proc. BVM 2014, 2014, pp. 120–125.
[15] G. D. Evangelidis and E. Z. Psarakis, “Parametric image alignment using enhanced correlation coefficient maximization,” IEEE PAMI, vol. 30, no. 10, pp. 1858–1865, 2008.
[16] D. Capel, Image mosaicing and super-resolution, Springer, 2004.
[17] A. F. Frangi, W. J. Niessen, K. L. Vincken, and M. A. Viergever, “Multiscale vessel enhancement filtering,” in Proc. MICCAI 1998, 1998, vol. 1496, pp. 130–137.
[18] R. P. Tornow, R. Kolar, and J. Odstrcilik, “Non-mydriatic video ophthalmoscope to measure fast temporal changes of the human retina,” in Proc. SPIE Novel Biophotonics Techniques and Applications, 2015, vol. 9540, pp. 954006–954006-6.