Probe-based Confocal Laser Endomicroscopy (pCLE) is a recent optical fibre bundle based medical imaging modality with utility in a range of clinical indications and organ systems, including gastrointestinal, urological and respiratory tracts Fugazza2016.
The pCLE probe relies on a coherent fibre bundle comprising many (10k) cores that are irregularly distributed across the field of view (FoV). The nature of image acquisition through coherent fibre bundles constitutes a source of inherent limitations in pCLE, with a direct, negative impact on image quality. The raw data that pCLE devices produce therefore remain challenging to use for both clinicians and computerised decision support systems.
Raw pCLE images are distorted by artefacts such as a honeycomb pattern, and so need to be corrected before reconstruction. During calibration and restoration, the raw image is transformed into a vector of corrected fibre signals and their locations in the space of the fibre FoV le2004towards. The irregular sampling domain of the signals can be accurately discretised as a set of locations in an over-sampled regular grid and then interpolated.
Existing pCLE image reconstruction approaches typically use Delaunay triangulation to linearly interpolate irregularly sampled signals onto a Cartesian grid vercauteren2006robust. These interpolation methods reconstruct a Cartesian image, yet they neither enhance image quality nor take into account any prior knowledge of the image space beyond regularisation-related properties. Moreover, they are themselves prone to generating artefacts, such as triangle edge highlights or additional blur vercauteren2006robust.
Reconstructed pCLE images can be post-processed by restoration and super-resolution (SR) techniques to improve image quality. It was shown that state-of-the-art CNN-based single-image super-resolution (SISR) techniques improve the quality of pCLE images ravi2018effective. A potential limitation in the current CNN approaches is that the analysis starts from already reconstructed pCLE images, including reconstruction artefacts as mentioned above.
A few studies have focused on allowing sparse data as CNN input eldesokey2018propagating; hua2018normalized; uhrig2017sparsity. Generalising the conclusions of these studies to pCLE suggests that applying CNNs directly to irregularly sampled pCLE data is far from trivial. We propose a solution that allows sparse images to be used directly as the input of the SR CNN, without the need for prior reconstruction and without the edge artefacts that reconstruction introduces into the input images, and we compare it to classical SR methods and to the reconstruction algorithm.
Since the vast majority of deep learning (DL) techniques, including the SISR methods used for pCLE reconstruction, rely on Cartesian images, we propose a unified, computationally efficient methodology that generalises Nadaraya-Watson (NW) kernel regression as part of a DL framework and allows it to be optimised. The main focus of this work is to compare pCLE image reconstructions obtained from the classical interpolation method and from dedicated DL architectures applied either directly to the irregularly sampled data or to reconstructed Cartesian images.
For the comparison study, we design a novel trainable convolutional layer called an NW layer, which integrates Nadaraya-Watson (NW) kernel regression nadaraya1964estimating into the DL framework, allowing effective handling of irregularly sampled data in the CNN network. It is the first work proposing a unique CNN architecture (referred to as NWNetSR), which takes advantage of the potential of both the NW kernel regression and the SISR to benefit pCLE image reconstruction. To the best of our knowledge, we are the first to propose using NW kernel regression embedded in a CNN framework to design a network for medical image super-resolution reconstruction from irregularly sampled pCLE signals.
2 Related work
pCLE image reconstruction:
The pCLE image reconstruction algorithm in current clinical use is based on Delaunay triangulation. It yields sharp images, but they contain triangulation artefacts tom_phd.
Image reconstruction from sparse signals has been widely studied. Specifically, in the context of pCLE, Vercauteren et al. implemented reconstruction from scattered pCLE data with NW kernel regression using handcrafted Gaussian weighting kernels vercauteren2006robust. They demonstrated that the method efficiently reconstructs pCLE images and mosaics, at the price of some additional blur in comparison to Delaunay reconstruction.
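As an illustration of this classical approach, the following is a minimal NumPy sketch of NW kernel regression with a single handcrafted Gaussian kernel. It is not the implementation of vercauteren2006robust; the function name `nw_gaussian_reconstruct`, the kernel width `sigma` and the random test data are assumptions made for the example.

```python
import numpy as np

def nw_gaussian_reconstruct(points, values, shape, sigma=1.5):
    """Nadaraya-Watson estimate on a regular grid from scattered samples.

    points : (N, 2) array of (row, col) sample positions (may be non-integer)
    values : (N,) array of sample intensities
    shape  : (height, width) of the output Cartesian image
    sigma  : width of the handcrafted Gaussian kernel in pixels (assumed value)
    """
    rows, cols = np.mgrid[0:shape[0], 0:shape[1]]
    grid = np.stack([rows.ravel(), cols.ravel()], axis=1).astype(float)
    # Squared distance between every grid pixel and every sample: shape (P, N)
    d2 = ((grid[:, None, :] - points[None, :, :]) ** 2).sum(axis=-1)
    weights = np.exp(-d2 / (2.0 * sigma ** 2))
    # Weighted average of the samples, normalised by the total weight per pixel
    estimate = (weights @ values) / weights.sum(axis=1)
    return estimate.reshape(shape)

# Hypothetical scattered samples standing in for fibre signals
rng = np.random.default_rng(0)
points = rng.uniform(0, 15, size=(40, 2))
values = rng.uniform(0.0, 1.0, size=40)
image = nw_gaussian_reconstruct(points, values, (16, 16), sigma=1.5)
```

Because every output pixel is a convex combination of the samples, the estimate never leaves the range of the input values, which is consistent with the smoothing (and the additional blur) noted above.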
It has been shown that CNNs can effectively improve pCLE image quality. In the study ravi2018effective, researchers compared the performance of state-of-the-art SISR networks in reconstructing SR pCLE images. Due to the lack of ground truth high-resolution (HR) images, they used mosaicing to estimate synthetic HR images and simulated pCLE signal loss to create synthetic low-resolution (LR) images. They confirmed both quantitatively and qualitatively that CNNs trained on the synthetic data can improve the pCLE image quality.
Another work showing a promising proof-of-concept for SR reconstruction uses a DenseNet trained on pairs of LR and HR patches izadi2018can. The LR endomicroscopy was simulated by bicubic down-sampling of HR confocal laser endomicroscopy (CLE) images. These CLE images were generated by the Pentax EC-3870FK and are not affected by the same distortions as pCLE images, because the Pentax device does not use a fibre bundle as an imaging guide. The authors show that their DL solution outperforms classical interpolation by recovering HR details and reducing pixelation artefacts when used to super-resolve synthetic LR images.
In the absence of HR pCLE images, unsupervised blind super-resolution has also been proposed. That work contributes a novel architecture for pCLE based on adversarial training with cycle consistency ravi2019adversarial. However, all these works use reconstructed Cartesian pCLE and CLE images as input to the networks.
Sparse CNN inputs:
While convolution layers are widely used, they have been identified as sub-optimal for dealing with sparse data uhrig2017sparsity. Much of the available literature on sparsity in the context of CNN input deals with irregular data in an intuitive but ad-hoc way: non-informative pixels are assigned zero, creating an artificial Cartesian image. For example, Li et al. used this technique and assigned zeros to the missing points on an LR image li2016vehicle. A similar workaround is to use an additional channel to encode the validity of each pixel, as in Kohler et al., who passed a binary mask to the network kohler2014mask. These solutions suffer from redundancy in the image representation, because spurious data are fed to the convolutional layers.
In a recent study, Uhrig et al. proposed a convolutional layer which jointly processes sparse images and sparse masks to achieve sparsity invariant CNNs uhrig2017sparsity. Their sparse layer is designed to account for missing data during the convolution operation by modelling the location of data points with the use of a mask. This is achieved by convolving the mask with a constant kernel of ones while optimising the solution through convolving the sparse image with trainable kernels.
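The operation described above can be sketched for a single channel and a single kernel as follows. This is a simplified NumPy illustration rather than the code of uhrig2017sparsity; the helper names, the `eps` term and the zero padding are assumptions. The mask update shown (a pixel is valid if any pixel in its window was valid) corresponds to max pooling of the binary mask with stride 1.

```python
import numpy as np

def correlate2d_same(image, kernel):
    """Zero-padded 'same' cross-correlation of a 2-D image with an odd-sized kernel."""
    kh, kw = kernel.shape
    padded = np.pad(image, ((kh // 2, kh // 2), (kw // 2, kw // 2)))
    windows = np.lib.stride_tricks.sliding_window_view(padded, (kh, kw))
    return np.einsum("uvij,ij->uv", windows, kernel)

def sparse_conv(sparse, mask, kernel, bias=0.0, eps=1e-12):
    """Sparsity-aware convolution: normalise by the number of valid pixels per window."""
    # Trainable kernel applied to the valid pixels only
    numerator = correlate2d_same(sparse * mask, kernel)
    # Mask convolved with a constant kernel of ones: count of valid pixels in each window
    count = correlate2d_same(mask, np.ones_like(kernel))
    features = numerator / (count + eps) + bias
    # Propagated binary mask: valid wherever at least one input pixel was valid
    new_mask = (count > 0).astype(float)
    return features, new_mask

# A single valid pixel of value 2.0 in an otherwise empty 9x9 image
mask = np.zeros((9, 9))
mask[4, 4] = 1.0
sparse = 2.0 * mask
features, new_mask = sparse_conv(sparse, mask, np.ones((3, 3)))
```

In this toy case the single observation is spread over its 3x3 neighbourhood without being attenuated, because each window is divided by the number of valid pixels it contains rather than by the kernel area.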
Following the success of sparse CNNs, Hua et al. proposed to implement normalised convolution as an extension of sparse convolution hua2018normalized. They showed that using shared positive kernels for convolution with both an image and a mask is beneficial for upsampling depth maps. In both aforementioned works, the information on sparsity is propagated to consecutive layers by the binary mask.
A demonstrated improvement over these solutions is to use soft certainty maps rather than propagating binary masks eldesokey2018propagating. These maps are produced by updating the mask with the convolution. This method worked well in a guided depth upsampling task, where it uses both RGB data and LiDAR to reconstruct depth maps.
3 Materials and methods
Since common Image Quality Assessment (IQA) relies on ground-truth images used as a reference in metrics such as the Peak signal-to-noise ratio (PSNR), the lack of ground-truth high-resolution pCLE images makes it difficult to evaluate the quality of SR reconstructions.
To address the lack of HR pCLE images, the authors in ravi2017deep proposed to use a first-generation SR method, offline mosaicing, to simulate HR endomicroscopy. They used mosaics as a source of HR content. Unfortunately, these mosaics are not a sufficiently accurate estimate of HR images. Mosaicing resolves an SR image by exploiting the overlap between video frames; it therefore suffers from misregistration artefacts and from non-uniform overlap of the frames, so its SR-resolving capability is not uniform across the surface of the SR image. Mosaicing is also time-consuming, which makes it unsuitable for the real-time workflow of pCLE.
In this work, similarly to ravi2017deep, we used a triangulation based reconstruction algorithm to simulate synthetic HR and LR endomicroscopy. However, in contrast, we took advantage of the availability of histopathological images as a source of HR signals instead of using imperfect mosaics. During the diagnostic process, histopathological images play a similar role to pCLE. Since histopathological images are acquired with a digital camera, histology does not suffer from the problems created by irregularly distributed fibre signals.
Synthetic images were created from high quality, large histopathological images. We created three sets: a training set built with 540 files, a validation set built with 227 files randomly selected from publicly available databases (https://zenodo.org/record/1214456#.XbBaDnVKhy0, http://www.andrewjanowczyk.com/deep-learning/, https://dataverse.harvard.edu/dataset.xhtml?persistentId=doi:10.7910/DVN/SI32FV) of histological images from data sets data2016; patho; DVN, and a synthetic test set. The synthetic test set was created with ten histopathologies from publicly available data called Kather (https://zenodo.org/record/53169#.W6HwwP4zbOQ) published by kather2016multi. The synthetic test set facilitates a comparative study between the baseline solution, our proposed methodology, and classical CNN solutions. The simulation steps are illustrated in Figure 1.
3.1 Simulation of synthetic pCLE images
Synthetic HR pCLE:
In the first part of the simulation, we transformed RGB HR histological images into grayscale HR pCLE-like videos. The simulation starts with transforming RGB images into grayscale images. Next, we randomly selected original pCLE videos from Smart Atlas andre2011smart, retrieved information on the bundle FoV and fibre locations for each video, and matched the pCLE metadata randomly with the histological images. To crop pCLE-like frames from the grayscale image, we moved a bounding box of the fibre bundle FoV from left to right and from top to bottom across the image, with a step size equal to half of the bounding box size. These pCLE-like frames were stacked to create the synthetic HR pCLE video sequence.
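The cropping step can be sketched as a sliding window with a stride of half the box size. The snippet below is an illustrative NumPy sketch; the function names are hypothetical, and the grayscale conversion uses the common ITU-R BT.601 luminance weights as an assumption, since the conversion actually used is not specified here.

```python
import numpy as np

def rgb_to_gray(rgb):
    """Convert an (H, W, 3) RGB image to grayscale (assumed BT.601 luminance weights)."""
    return rgb[..., 0] * 0.299 + rgb[..., 1] * 0.587 + rgb[..., 2] * 0.114

def crop_frames(gray, box_h, box_w):
    """Slide a (box_h, box_w) bounding box over the image with half-box steps.

    Frames are collected left to right, then top to bottom, and stacked
    into a (num_frames, box_h, box_w) video-like array. The image must be
    at least as large as the bounding box.
    """
    step_h, step_w = max(box_h // 2, 1), max(box_w // 2, 1)
    frames = []
    for top in range(0, gray.shape[0] - box_h + 1, step_h):        # top to bottom
        for left in range(0, gray.shape[1] - box_w + 1, step_w):   # left to right
            frames.append(gray[top:top + box_h, left:left + box_w])
    return np.stack(frames)

# Hypothetical 8x12 RGB image and a 4x4 bounding box
rgb = np.random.default_rng(0).random((8, 12, 3))
gray = rgb_to_gray(rgb)
video = crop_frames(gray, 4, 4)   # 3 vertical x 5 horizontal positions
```

With a half-box stride, consecutive frames overlap by 50%, so each region of the histological image appears in several frames of the synthetic sequence.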
Synthetic LR pCLE:
To simulate LR pCLE videos, we used the physically-inspired pCLE-specific downsampling presented in ravi2017deep. In our case, the sources of irregular signals for physically-inspired pCLE-specific downsampling are HR synthetic pCLE videos. They are rich in high frequencies and pixel-level details.
For every synthetic HR pCLE video, we used the fibre locations to build Voronoi diagrams with each fibre at the centre of a Voronoi cell. Every cell corresponds to one fibre signal, yet the cell covers several pixels around the fibre in the HR image. Thus, to simulate signal loss, all HR pixels in a cell are averaged, and the average HR signal is used as the new fibre signal in the LR image.
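On a pixel grid, the Voronoi cell of a fibre is the set of pixels for which that fibre is the nearest one, so the averaging can be sketched with a nearest-fibre assignment. The following NumPy sketch uses a brute-force distance computation for clarity; the function name and the toy data are assumptions, and a spatial index would be preferable for full-size images.

```python
import numpy as np

def voronoi_average(hr, fibres):
    """Average HR pixels over the discrete Voronoi cell of each fibre.

    hr     : (H, W) high-resolution frame
    fibres : (N, 2) array of (row, col) fibre centres
    Returns the (N,) averaged fibre signals and the (H, W) cell label map.
    """
    rows, cols = np.mgrid[0:hr.shape[0], 0:hr.shape[1]]
    pixels = np.stack([rows.ravel(), cols.ravel()], axis=1).astype(float)
    # Squared distance from every pixel to every fibre: shape (H*W, N)
    d2 = ((pixels[:, None, :] - fibres[None, :, :]) ** 2).sum(axis=-1)
    cell = d2.argmin(axis=1)                      # nearest fibre = Voronoi cell label
    sums = np.bincount(cell, weights=hr.ravel(), minlength=len(fibres))
    counts = np.bincount(cell, minlength=len(fibres))
    signals = sums / np.maximum(counts, 1)        # guard against empty cells
    return signals, cell.reshape(hr.shape)

# Toy frame: left half has intensity 1, right half has intensity 3
hr = np.hstack([np.full((4, 2), 1.0), np.full((4, 2), 3.0)])
fibres = np.array([[1.5, 0.5], [1.5, 2.5]])
signals, cells = voronoi_average(hr, fibres)
```

Each fibre signal is thus a local mean of the HR content, which is what removes the high-frequency, pixel-level detail from the simulated LR data.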
In real pCLE, noise present in the fibre signals is propagated to the reconstructed image by the interpolation. To reproduce this, we add the simulated pCLE noise to the new fibre signals before the interpolation step. We add multiplicative and additive Gaussian noise to mimic calibration imperfections and acquisition noise, respectively.
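A minimal sketch of such a noise model is given below. The exact parameterisation is not stated in the text, so the form used here (a per-fibre Gaussian gain centred on one plus a zero-mean Gaussian offset) and the standard deviations `sigma_mult` and `sigma_add` are assumptions chosen for illustration only.

```python
import numpy as np

def add_pcle_noise(signals, sigma_mult=0.05, sigma_add=0.02, rng=None):
    """Apply multiplicative and additive Gaussian noise to fibre signals.

    sigma_mult : spread of the per-fibre gain (calibration imperfection), assumed value
    sigma_add  : spread of the additive term (acquisition noise), assumed value
    """
    rng = np.random.default_rng() if rng is None else rng
    gain = rng.normal(1.0, sigma_mult, size=signals.shape)
    offset = rng.normal(0.0, sigma_add, size=signals.shape)
    return gain * signals + offset

clean = np.linspace(0.0, 1.0, 5)
noisy = add_pcle_noise(clean, rng=np.random.default_rng(0))
```

Adding the noise to the fibre signals, rather than to the final image, means that the interpolation spreads each noisy sample over its neighbourhood, as happens in real acquisitions.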
The noisy fibre signals are used as the irregularly sampled signals and reconstructed into the synthetic noisy Cartesian LR image. The reconstruction is performed as interpolation based on the Delaunay triangulation LeGoualher2004. The synthetic LR pCLE image has the size of the HR image, yet it is characterised by lower image quality, noise, and reduced information content due to the simulated signal loss.
Because the signal distribution is simulated through the geometrical position of the fibres in the bundle, the synthetic endomicroscopy resembles real pCLE, including its typical triangulation artefacts and noise patterns. In our experiments, synthetic endomicroscopies are used as a synthetic equivalent of real pCLE images.
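Linear interpolation over a Delaunay triangulation is available in SciPy through `scipy.interpolate.griddata` with `method="linear"`, which can serve as a stand-in for this reconstruction step. The sketch below is not the clinical implementation; the function name, the fill value outside the convex hull and the toy data are assumptions.

```python
import numpy as np
from scipy.interpolate import griddata

def delaunay_reconstruct(fibres, signals, shape, fill_value=0.0):
    """Linearly interpolate scattered fibre signals onto a Cartesian grid.

    fibres  : (N, 2) array of (row, col) fibre positions
    signals : (N,) array of fibre signals
    shape   : (height, width) of the reconstructed image
    Pixels outside the convex hull of the fibres receive fill_value.
    """
    rows, cols = np.mgrid[0:shape[0], 0:shape[1]]
    return griddata(fibres, signals, (rows, cols), method="linear", fill_value=fill_value)

# Toy check with a linear intensity ramp, which linear interpolation reproduces exactly
rng = np.random.default_rng(0)
corners = np.array([[0.0, 0.0], [0.0, 7.0], [7.0, 0.0], [7.0, 7.0]])
fibres = np.vstack([corners, rng.uniform(0.5, 6.5, size=(30, 2))])
signals = 2.0 * fibres[:, 0] + 3.0 * fibres[:, 1]
recon = delaunay_reconstruct(fibres, signals, (8, 8))
```

Within each triangle the result is an affine function of position, so intensity gradients change abruptly across triangle edges; this is the origin of the triangle edge highlights mentioned earlier.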
3.2 Trainable NW kernel regression
Irregularly sampled data can be represented, with an arbitrary approximation quality, on a fine Cartesian grid as the sparse artificial Cartesian image with all non-informative pixels set to zeros. To reconstruct sparse images to Cartesian images, the missing information is typically interpolated. This also means that the reconstructed images are over-sampled, and only a subset of the pixels carry information ravi2018effective.
We represent the sparse pCLE image as an artificial Cartesian image $S$. Intuitively, the image sparsity is encoded by ones and zeros for the informative and non-informative pixels respectively, as a binary mask $M$ whose shape is the same as that of $S$. A sample $S$ and its corresponding $M$ are depicted in Figure 2.
$S$ can be input into any CNN. Yet, a convolutional layer is defined on the Cartesian grid and considers all image pixels as equally important regardless of their position. Thus, the CNN has to learn not only the mutual relations of informative pixels but also how to handle their sparsity, which makes the optimisation a difficult task.
We compare the applicability of a classical CNN and the proposed generalisation of NW kernel regression to pCLE image reconstruction from sparse data. To incorporate NW kernel regression into the CNN framework, we propose a novel trainable CNN layer, henceforth referred to as an NW layer, which models the relation of the data points by use of custom trainable kernels to perform the local interpolation. We define the core NW operation as:
$$S^{(n+1)} = \frac{\left(S^{(n)} \odot M^{(n)}\right) * W^{(n)}}{M^{(n)} * \left|W^{(n)}\right|} \qquad (1)$$
The NW layer takes as input the feature maps $S^{(n)}$ and the corresponding mask $M^{(n)}$, where $S^{(0)}$ is the sparse input image, $W^{(n)}$ are the trainable kernels, and $\odot$ and $*$ denote element-wise multiplication and convolution respectively. The mask can be seen as a probabilistic sparsity map. Initially $M^{(0)}$ is the binary mask, and the next mask $M^{(n+1)}$ is updated as described in equation 2
$$M^{(n+1)} = M^{(n)} * \left|W^{(n)}\right| \qquad (2)$$
and becomes an approximation of the probability of obtaining reconstructions $S^{(n+1)}$ given $M^{(n)}$. $M^{(n+1)}$ holds the arbitrary probabilistic sparsity patterns, which are then propagated deeper to the consecutive NW layer. The outputs of the NW layer are reconstructed feature maps $S^{(n+1)}$, estimated using an NW regression, and updated probabilistic sparsity masks $M^{(n+1)}$. Finally, a bias is added to $S^{(n+1)}$. The graphical representation of the NW layer is presented in Figure 2.
Classical NW kernel regression uses handcrafted, positive kernels. For a generalisation of the kernel regression, however, our trainable NW layer allows for negative values. It is necessary for the convolution of the mask to rely on the absolute value of $W^{(n)}$, as this operation is meant to capture the geometric influence of neighbouring pixels on the predicted values of $S^{(n+1)}$. For numerical stability, we also normalise the kernels such that $\sum_{i,j} |w_{i,j}| = 1$, where $w_{i,j}$ is the weight at position $(i,j)$ in the kernel $W^{(n)}$.
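One possible reading of the NW layer described above, for a single input channel and a single kernel, is sketched below in NumPy. It is an illustration under stated assumptions, not the authors' implementation: the helper names, the `eps` term that avoids division by zero, the zero padding and the placement of the normalisation inside the forward pass are all choices made for the example, and no training is shown.

```python
import numpy as np

def correlate2d_same(image, kernel):
    """Zero-padded 'same' cross-correlation of a 2-D image with an odd-sized kernel."""
    kh, kw = kernel.shape
    padded = np.pad(image, ((kh // 2, kh // 2), (kw // 2, kw // 2)))
    windows = np.lib.stride_tricks.sliding_window_view(padded, (kh, kw))
    return np.einsum("uvij,ij->uv", windows, kernel)

def nw_layer(features, mask, kernel, bias=0.0, eps=1e-12):
    """Forward pass of one NW layer (single channel, single kernel)."""
    # Normalise the kernel so that the absolute weights sum to one
    kernel = kernel / np.abs(kernel).sum()
    # Numerator: kernel-weighted sum of the informative pixels (weights may be negative)
    numerator = correlate2d_same(features * mask, kernel)
    # Denominator and updated mask: mask convolved with the absolute kernel
    new_mask = correlate2d_same(mask, np.abs(kernel))
    new_features = numerator / (new_mask + eps) + bias
    return new_features, new_mask

# Sparse image with a constant signal of 3.0 at roughly 20% of the pixels
rng = np.random.default_rng(0)
mask = (rng.random((16, 16)) < 0.2).astype(float)
sparse = 3.0 * mask
kernel = rng.random((9, 9)) + 0.1          # positive kernel for this check
features, soft_mask = nw_layer(sparse, mask, kernel)

# Stacking: the outputs of one layer feed the next one
features2, soft_mask2 = nw_layer(features, soft_mask, rng.normal(size=(3, 3)))
```

With a positive kernel the layer reduces to classical NW regression, so a constant signal is reproduced wherever the window contains at least one fibre. Because the absolute kernel weights sum to one, the updated mask stays within [0, 1], which is what allows it to be read as a soft sparsity map.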
Multiple NW layers can be stacked to generalise NW kernel regression and to benefit from it for irregularly sampled pCLE data. This in turn facilitates end-to-end pipelines that can incorporate sparse inputs by combining NW layers with classical CNNs. Figure 2 illustrates how to combine NW layers into a deep(er) network. Each NW layer $n$ has its own kernels $W^{(n)}$. The first layer takes as input the sparse image $S^{(0)}$ and the binary mask $M^{(0)}$ and returns feature maps $S^{(1)}$ and updated sparsity masks $M^{(1)}$, which become the input for the next NW layer. The last NW layer of the NWNet framework returns only feature maps, and the masks are discarded.
Application to endomicroscopy image reconstruction:
NWNet in combination with complex deep learning models, such as EDSR lim2017enhanced, may allow for reconstruction of higher quality pCLE images than the more typically used interpolation.
Inspired by the EDSR network lim2017enhanced, we design two architectures, CNNnetSR and NWnetSR, presented in Figure 3. CNNnetSR was designed to perform the SR task from Cartesian or sparse LR pCLE images. The network is very similar to the EDSR architecture lim2017enhanced, but with two small modifications. First, we do not use an upsampling layer, because the synthetic LR and HR images have the same size. In place of the upsampling layer, we put a convolutional layer with 32 filters. Second, the last convolutional layer aims at fusing the output feature maps from the penultimate layer into the Cartesian image. For this layer, we find it more beneficial to use a kernel of size one, which is commonly used to reduce the number of feature maps, than a kernel of size three. All other convolutional layers have kernel size 3. The last layer uses a linear activation function.
We want NW kernel regression to benefit from SISR, so we design NWnetSR based on CNNnetSR by partly replacing standard convolutions with NW layers. For the first NW layer, the kernel size is 9 along each image dimension. The size was chosen based on the known distribution of fibres across a Cartesian image to ensure that each convolution would capture more than 10 informative pixels. For deeper NW layers, the kernel size is 3. The NW weights were initialised with a truncated normal distribution with mean and standard deviation equal to 0.2 and 0.05 respectively.
3.3 Implementation details
To facilitate training, video sequences were normalised for each frame individually by subtracting the mean and dividing by the standard deviation of the LR frame as follows: $LR_{norm} = (LR - \mu_{LR}) / \sigma_{LR}$ and $HR_{norm} = (HR - \mu_{LR}) / \sigma_{LR}$, and then scaled to the range [0,1]. Synthetic LR images were transformed into synthetic sparse LR images by setting to zero all pixels which do not correspond to any fibre signal. The masks were generated as binary images, where fibre positions from the LR image are set to ones. Lastly, to perform batch-based training, we extracted non-overlapping sparse and Cartesian patches for the training and validation sets. The test sets were built with sparse and Cartesian full-size synthetic pCLE images.
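The construction of the sparse inputs, the masks and the non-overlapping patches can be sketched as follows. This is an illustrative NumPy sketch: the function names are hypothetical, fibre positions are assumed to be rounded to the nearest pixel, and border pixels that do not fill a complete patch are simply dropped.

```python
import numpy as np

def sparse_and_mask(lr, fibres):
    """Build the sparse LR image and its binary mask from fibre positions.

    lr     : (H, W) reconstructed Cartesian LR frame
    fibres : (N, 2) array of (row, col) fibre positions, rounded to the nearest pixel
    """
    idx = np.rint(fibres).astype(int)
    mask = np.zeros(lr.shape, dtype=float)
    mask[idx[:, 0], idx[:, 1]] = 1.0      # ones at fibre positions
    return lr * mask, mask                # zeros everywhere else in the sparse image

def extract_patches(image, size):
    """Split an image into non-overlapping (size, size) patches, row by row."""
    h, w = image.shape[0] // size, image.shape[1] // size
    trimmed = image[:h * size, :w * size]
    return trimmed.reshape(h, size, w, size).swapaxes(1, 2).reshape(-1, size, size)

lr = np.full((4, 4), 0.5)
fibres = np.array([[0.2, 0.9], [2.6, 3.1]])
sparse, mask = sparse_and_mask(lr, fibres)

image = np.arange(36, dtype=float).reshape(6, 6)
patches = extract_patches(image, 3)
```

Applying the same patch grid to the sparse image, the mask and the Cartesian image keeps the three inputs spatially aligned within a training batch.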
To achieve the best training results for each model individually, networks were trained with Population Based Training (PBT) jaderberg2017population. PBT is an optimisation technique designed to find the best training parameters for the network. During PBT training, a population of models with different parameters is trained, these models are periodically validated, and the weights from the best performing model in the population are copied to the other members of the population. We set the population size to 6 workers, where each member of the population uses a single GPU, optimising a network for epoch in one PBT iteration for a total of 100 iterations. The perturbation interval was set to 20 iterations. The hyperparameter search was applied to the 6 learning rates, which were initially set to where . We used the Adam optimiser and set $\beta_1$=0.9, $\beta_2$=0.999, . Based on the results presented in ravi2018effective, the models were trained with the SSIM+L1 loss zhao2015loss. Finally, the best performing model from the population is used to generate results on the test sets described in Section 3.
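The exploit-and-explore cycle of PBT can be illustrated with a toy example in plain Python. Everything specific in this sketch is an assumption made for illustration: the quadratic objective stands in for network training, the initial learning rates are hypothetical, and the perturbation factors 0.8 and 1.2 are a common choice rather than values taken from this work. Only the population size (6), the number of iterations (100) and the perturbation interval (20) follow the text.

```python
import copy
import random

def exploit_and_explore(population, rng, factors=(0.8, 1.2)):
    """Copy the best member's weights to the others and perturb their learning rates."""
    best = max(population, key=lambda member: member["score"])
    for member in population:
        if member is not best:
            member["weights"] = copy.deepcopy(best["weights"])   # exploit
            member["lr"] = best["lr"] * rng.choice(factors)      # explore
    return best

def train_one_iteration(member):
    """Toy stand-in for one training iteration: gradient descent on f(w) = w^2."""
    w = member["weights"][0]
    member["weights"][0] = w - member["lr"] * 2.0 * w
    member["score"] = -member["weights"][0] ** 2    # higher is better

rng = random.Random(0)
# Six workers with hypothetical initial learning rates
population = [{"weights": [5.0], "lr": 10.0 ** -(k + 1), "score": None} for k in range(6)]

for iteration in range(1, 101):             # 100 PBT iterations
    for member in population:
        train_one_iteration(member)
    if iteration % 20 == 0:                 # perturbation interval of 20 iterations
        exploit_and_explore(population, rng)

best = max(population, key=lambda member: member["score"])
```

In a real setting each member would train on its own GPU and the score would be a validation metric; the sketch only shows how weights and learning rates flow through the population.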
We compared the performance of three DL models: CART for CNNnetSR trained with Cartesian input, SPARSE for CNNnetSR trained with sparse input, and NW for NWnetSR trained with sparse input and the corresponding mask. These models handle irregular signals at the input in different ways. As a baseline method, we used the gold-standard reconstruction algorithm currently implemented in the clinical setup, which is based on linear interpolation and Delaunay triangulation vercauteren2006robust. We also provide a comparison to reconstructions obtained using NW kernel regression with a single Gaussian kernel tom_phd. The final performance of the models is evaluated by comparing reconstructed SR pCLE images with the HR synthetic images from the test set.
To quantify how the standard convolution performs on sparse pCLE images in comparison to using Cartesian reconstructions as the input, we trained two models based on the CNNnetSR network: CART, trained using reconstructed Cartesian images, and SPARSE, trained with sparse images.
To test whether NW kernel regression benefits from generalisation via learning multiple kernels, we trained NWnetSR as a SISR network for the task of pCLE SR reconstruction with sparse input images and masks.
To measure the image quality of the SR pCLE reconstructions, we design an image quality assessment (IQA) procedure which consists of two complementary metrics typically used for this task: the peak signal-to-noise ratio (PSNR) and the Structural SIMilarity index (SSIM) wang2004image.
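For reference, PSNR follows directly from the mean squared error between the reference and the reconstruction. The sketch below is a minimal NumPy version; the function name and the default `data_range` of 1.0 (matching images scaled to [0,1]) are assumptions. SSIM is computed over local windows and is not sketched here.

```python
import numpy as np

def psnr(reference, estimate, data_range=1.0):
    """Peak signal-to-noise ratio in dB between a reference and an estimate."""
    reference = np.asarray(reference, dtype=float)
    estimate = np.asarray(estimate, dtype=float)
    mse = np.mean((reference - estimate) ** 2)
    if mse == 0:
        return float("inf")       # identical images
    return 10.0 * np.log10(data_range ** 2 / mse)

reference = np.zeros((8, 8))
estimate = np.full((8, 8), 0.1)   # uniform error of 0.1, so MSE = 0.01
value = psnr(reference, estimate)
```

A uniform error of 0.1 on a unit intensity range gives an MSE of 0.01 and therefore a PSNR of about 20 dB.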
The results computed for the images from the test set, described in Section 3, are shown in Table 1. The main observation is that each DL model, including NW, outperforms the baseline interpolation technique. Training NWnetSR as a SISR network generalises NW kernel regression and outperforms traditional NW regression with a hand-crafted Gaussian kernel, indicating that many kernels estimated from the data set are more beneficial than a single custom Gaussian kernel. There is no significant difference in PSNR and SSIM scores between the DL models; their performance is almost indistinguishable. We also provide example reconstructions in Figure 4, where the images from all DL models are almost identical.
We analysed the SR reconstructions qualitatively and observed two tendencies for all DL models. First, SR reconstructions differ slightly on a pixel level, but these differences do not affect how the video is perceived as a whole, and they may be the reason for the slightly different metric scores during IQA. Second, in the opinion of our clinical collaborators, SR reconstructions are more aesthetically pleasing than the baseline reconstructions INTER and NW GAUSS. None of the SR reconstructions has triangulation artefacts, and every SR reconstruction has significantly reduced noise, additionally benefiting from improved contrast and visibility of details.
The results confirm that the NW layer, along with other layers handling sparse data, is a viable choice for image reconstruction and yields good image quality when reconstructing Cartesian SR images from sparse pCLE data.
5 Discussion and conclusions
The proposed CNN layer enables the use of sparse images as input to the CNNs and learns the sparse image representation. In the context of pCLE, this is the first work which proposes end-to-end deep learning image reconstructions from irregularly sampled fibre data.
The NW layer is used as a building block in the dedicated architecture NWNetSR, which performs deep generalised Nadaraya–Watson kernel regression. NWNetSR captures data sparsity and learns reconstruction kernels for sparse data. We demonstrated that, along with other CNN-based solutions, NWNetSR also improves the image quality of pCLE images. We showed that super-resolved pCLE images have better quality than those interpolated with the baseline method; thus the proposed super-resolution pipeline outperforms the currently used reconstruction method.
Deep learning methods for regularly sampled data may be transferred to sparse data by adopting the NW layer in standard CNNs. We have shown the successful implementation of a reconstruction pipeline which combines NW and CNN layers and is trained in a supervised manner as SISR, reconstructing super-resolved images from sparse input images.
This work was supported by Wellcome Trust: 203145Z/16/Z; WT101957; 203148/Z/16/Z, and EPSRC: NS/A000050/1; NS/A000027/1; EP/N027078/1. This work was undertaken at UCL and UCLH, which receive a proportion of funding from the DoH NIHR UCLH BRC funding scheme. The PhD studentship of Agnieszka Barbara Szczotka is funded by Mauna Kea Technologies, Paris, France. Tom Vercauteren is supported by a Medtronic / Royal Academy of Engineering Research Chair: RCSRF1819/7/34.
6 Compliance with ethical standards
Conflict of Interest: The PhD studentship of Agnieszka Barbara Szczotka is funded by Mauna Kea Technologies, Paris, France. Tom Vercauteren owns stock in Mauna Kea Technologies, Paris, France.
The other authors declare no conflict of interest.
Ethical approval: All procedures performed in studies involving human participants were in accordance with the ethical standards of the institutional and/or national research committee and with the 1964 Helsinki declaration and its later amendments or comparable ethical standards.
Informed consent: For this type of study formal consent is not required. This article does not contain patient data.