1 Introduction
Pneumonia is characterized by a host inflammatory response to a pathogenic infectious burden in the distal lung and is usually caused by bacterial infection [1]. However, accurate quantitative diagnosis and monitoring of suspected pneumonia is challenging with currently available imaging and diagnostic tools [2]. Optical endomicroscopy (OEM) is a popular method for in vivo imaging of the distal lung, and has recently gained prominence for investigating the presence of bacteria using targeted SmartProbes [1]. SmartProbes are molecular imaging agents, delivered to the imaging site, that make bacteria fluoresce.
Outlier/anomaly detection problems can usually be addressed using unsupervised or supervised methods [3]. In unsupervised approaches, the objects/anomalies to be detected are learned from the data by fitting suitable distributions, without explicitly provided labels [4, 5, 6, 7, 8, 9, 10, 11]. In supervised approaches, by contrast, the dataset is usually divided into training and testing sets. In the training phase, a model is trained by pairing inputs with their expected outputs, also known as the ground truth. The trained model can then be used to estimate the outputs for the test data [12, 13, 14, 15].
In this work, we investigate the performance of a supervised approach for bacterial detection in datasets of OEM lung images [16, 17, 18, 19, 20, 21]. The main contributions of this work are threefold. First, we formulate the problem of simultaneous bacteria detection and background estimation as a (robust) sparse coding problem and solve it using the alternating direction method of multipliers (ADMM). To the best of our knowledge, this is the first time this problem has been addressed with a sparse representation approach. Second, we present experiments on real datasets covering different bacterial concentrations, including control cases in which no bacteria are present, and different SmartProbes producing weak and strong bacterial fluorescence. Third, we compare the results of the proposed model with bacteria annotations performed by a trained clinician and with three widely used spot-detection algorithms, using both dot-annotation and count-annotation methods.
2 Outlier Detection Formulation
Figure 1 shows an example of an OEM image in which bacteria, annotated by a trained clinician, are circled. Bacteria appear as high-intensity dots, but background structures such as elastin and collagen can be equally bright, which makes differentiating bacteria a challenging task. The bacteria detection problem is formulated as follows. Given a test image, a data matrix is formed by splitting the image into a set of $M$ overlapping square patches of $N$ pixels each. These patches are vectorized and gathered in $\mathbf{Y} = [\mathbf{y}_1, \ldots, \mathbf{y}_M] \in \mathbb{R}^{N \times M}$. The data matrix can then be well approximated by a sparse linear model, except for a small number of pixels (the outliers) that deviate significantly from this model. The collection is described as
$$\mathbf{Y} = \mathbf{D}\mathbf{X} + \mathbf{R} + \mathbf{E}, \qquad (1)$$
where $\mathbf{D} \in \mathbb{R}^{N \times K}$ is a dictionary of $K$ atoms assumed to be known, $\mathbf{X} \in \mathbb{R}^{K \times M}$ is the sparse coefficient matrix, $\mathbf{R} \in \mathbb{R}^{N \times M}$ has few nonzero elements and represents sparse deviations from the linear model $\mathbf{D}\mathbf{X}$ of the background image structures, and $\mathbf{E}$ is a low-energy noise component, assumed to be independent and identically distributed (i.i.d.) Gaussian.
The primary objective is to estimate the outlier matrix $\mathbf{R}$ in Eq. (1). Since the sparse coefficients in $\mathbf{X}$ are also unknown, we propose to estimate $(\mathbf{X}, \mathbf{R})$ jointly from the observation matrix $\mathbf{Y}$ using an optimization-based method.
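A minimal NumPy sketch of how a test image might be turned into the data matrix of Eq. (1): overlapping square patches are vectorized and stacked as columns. The function name, the patch size and the step (which sets the overlap) are illustrative assumptions, since the values used in the experiments are not given here.

```python
import numpy as np

def image_to_patch_matrix(image, patch_size, step):
    """Split a 2-D image into overlapping square patches and stack them as columns.

    Returns the (patch_size**2, M) data matrix and the top-left corner of each
    patch, so that the patches can later be put back in place.
    Note: if (H - patch_size) is not a multiple of step, the last rows/columns
    of the image are not covered.
    """
    H, W = image.shape
    rows = range(0, H - patch_size + 1, step)
    cols = range(0, W - patch_size + 1, step)
    corners = [(r, c) for r in rows for c in cols]
    # Each column is one vectorized (row-major) patch.
    Y = np.stack([image[r:r + patch_size, c:c + patch_size].ravel()
                  for r, c in corners], axis=1)
    return Y, corners
```

For example, a 6 x 6 image with 3 x 3 patches and a step of 1 pixel yields 16 patches, i.e. a 9 x 16 data matrix.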
3 Proposed Model
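The recovery of $\mathbf{X}$ and $\mathbf{R}$ in Eq. (1) is formulated as the following unconstrained minimization problem:
$$\min_{\mathbf{X},\mathbf{R}} \; \frac{1}{2}\|\mathbf{Y}-\mathbf{D}\mathbf{X}-\mathbf{R}\|_F^2 + \lambda_1\|\mathbf{X}\|_{1,1} + \lambda_2\|\mathbf{R}\|_{1,1}, \qquad (2)$$
where $\|\cdot\|_F$ denotes the Frobenius norm, $\|\mathbf{X}\|_{1,1} = \sum_{k,m}|x_{k,m}|$, similarly $\|\mathbf{R}\|_{1,1} = \sum_{n,m}|r_{n,m}|$, and $\lambda_1$ and $\lambda_2$ are two positive scalar parameters controlling the degree of sparsity of $\mathbf{X}$ and $\mathbf{R}$, respectively. Problem (2) favours solutions in which the data are explained by the sparse representation $\mathbf{D}\mathbf{X}$; for the outliers that cannot be represented by $\mathbf{D}\mathbf{X}$, it permits nonzero entries in $\mathbf{R}$.
The optimization problem in Eq. (2), although convex, cannot be solved using standard gradient-based methods because the $\ell_1$ terms are nonsmooth. The core idea is to convert this unconstrained problem into a constrained one by variable splitting, and then to solve the resulting constrained problem using ADMM [22, 23, 19]. With a careful choice of the new variables, the initial problem is converted into a sequence of much simpler problems that can be solved iteratively. Specifically, we introduce a new variable $\mathbf{V} = \mathbf{X}$ in the regularization term of $\mathbf{X}$ in order to decouple it from the data fidelity term. The constrained version of problem (2) can then be written as
$$\min_{\mathbf{X},\mathbf{R},\mathbf{V}} \; \frac{1}{2}\|\mathbf{Y}-\mathbf{D}\mathbf{X}-\mathbf{R}\|_F^2 + \lambda_1\|\mathbf{V}\|_{1,1} + \lambda_2\|\mathbf{R}\|_{1,1} \quad \text{subject to} \quad \mathbf{V} = \mathbf{X}. \qquad (3)$$
The augmented Lagrangian corresponding to problem (3) can be written as
$$\mathcal{L}_\mu(\mathbf{X},\mathbf{R},\mathbf{V},\mathbf{G}) = \frac{1}{2}\|\mathbf{Y}-\mathbf{D}\mathbf{X}-\mathbf{R}\|_F^2 + \lambda_1\|\mathbf{V}\|_{1,1} + \lambda_2\|\mathbf{R}\|_{1,1} + \frac{\mu}{2}\|\mathbf{X}-\mathbf{V}-\mathbf{G}\|_F^2,$$
where $\mathbf{G}$ is the (scaled) Lagrange multiplier matrix corresponding to the splitting and $\mu > 0$ is a penalty parameter. The ADMM algorithm used to solve Eq. (3), and hence Eq. (2), is shown in Algorithm 1. At each iteration, $\mathcal{L}_\mu$ is minimized with respect to $\mathbf{X}$ (step 2), $\mathbf{R}$ (step 3) and $\mathbf{V}$ (step 4), and then the Lagrange multipliers are updated (step 5).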
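Each step of Algorithm 1 has a simple closed form. The derivation below is a sketch under the assumption that the splitting is $\mathbf{V}=\mathbf{X}$ and that the augmented Lagrangian takes the scaled form given in the first comment line, with multiplier $\mathbf{G}$ and penalty $\mu$. It is our reconstruction, not a verbatim copy of Algorithm 2.

```latex
% Assumed scaled augmented Lagrangian for the splitting V = X:
% L_mu(X,R,V,G) = 1/2||Y - DX - R||_F^2 + lambda_1||V||_{1,1} + lambda_2||R||_{1,1} + mu/2||X - V - G||_F^2
% X-step: the gradient -D^T(Y - DX - R) + mu(X - V - G) is set to zero.
\begin{align*}
\mathbf{X}^{(t+1)} &= \arg\min_{\mathbf{X}} \tfrac12\|\mathbf{Y}-\mathbf{D}\mathbf{X}-\mathbf{R}^{(t)}\|_F^2
  + \tfrac{\mu}{2}\|\mathbf{X}-\mathbf{V}^{(t)}-\mathbf{G}^{(t)}\|_F^2 \\
 &= (\mathbf{D}^\top\mathbf{D} + \mu\mathbf{I}_K)^{-1}
  \big(\mathbf{D}^\top(\mathbf{Y}-\mathbf{R}^{(t)}) + \mu(\mathbf{V}^{(t)}+\mathbf{G}^{(t)})\big),\\
\mathbf{R}^{(t+1)} &= \arg\min_{\mathbf{R}} \tfrac12\|\mathbf{Y}-\mathbf{D}\mathbf{X}^{(t+1)}-\mathbf{R}\|_F^2
  + \lambda_2\|\mathbf{R}\|_{1,1}
  = \mathrm{soft}\big(\mathbf{Y}-\mathbf{D}\mathbf{X}^{(t+1)},\,\lambda_2\big),\\
\mathbf{V}^{(t+1)} &= \arg\min_{\mathbf{V}} \lambda_1\|\mathbf{V}\|_{1,1}
  + \tfrac{\mu}{2}\|\mathbf{X}^{(t+1)}-\mathbf{V}-\mathbf{G}^{(t)}\|_F^2
  = \mathrm{soft}\big(\mathbf{X}^{(t+1)}-\mathbf{G}^{(t)},\,\lambda_1/\mu\big),\\
\mathbf{G}^{(t+1)} &= \mathbf{G}^{(t)} - \big(\mathbf{X}^{(t+1)}-\mathbf{V}^{(t+1)}\big),\\
\mathrm{soft}(\mathbf{A},\tau) &= \operatorname{sign}(\mathbf{A}) \odot \max\big(|\mathbf{A}|-\tau,\,0\big)
  \quad \text{(element-wise)}.
\end{align*}
```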
Solving the minimization problems in Algorithm 1 in closed form leads to Algorithm 2, where soft(·) denotes the element-wise soft-thresholding function [24]. The penalty parameter of the augmented Lagrangian is updated within the algorithm to keep the primal and dual residual norms within a fixed factor of one another. The algorithm stops when the sum of the primal and dual residuals falls below a small tolerance [22, 19].
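A minimal NumPy sketch of these iterations is given below, assuming the splitting V = X with scaled multipliers G. Unlike the adaptive scheme described above, this sketch keeps the penalty μ fixed for simplicity. The function name `robust_sparse_coding_admm` and its default values are hypothetical.

```python
import numpy as np

def soft(A, t):
    """Element-wise soft thresholding, the proximal operator of t * ||.||_1."""
    return np.sign(A) * np.maximum(np.abs(A) - t, 0.0)

def robust_sparse_coding_admm(Y, D, lam1, lam2, mu=1.0, max_iter=500, tol=1e-6):
    """ADMM sketch for  min_{X,R} 0.5*||Y - D X - R||_F^2 + lam1*||X||_1 + lam2*||R||_1,
    using the splitting V = X and scaled multipliers G (mu is kept fixed here)."""
    K, M = D.shape[1], Y.shape[1]
    X = np.zeros((K, M))
    V = np.zeros((K, M))
    G = np.zeros((K, M))
    R = np.zeros_like(Y, dtype=float)
    A = D.T @ D + mu * np.eye(K)  # system matrix of the X-update, constant for fixed mu
    for _ in range(max_iter):
        # X-update: solve (D^T D + mu I) X = D^T (Y - R) + mu (V + G)
        X = np.linalg.solve(A, D.T @ (Y - R) + mu * (V + G))
        # R-update: outliers are what the background model D X cannot explain
        R = soft(Y - D @ X, lam2)
        # V-update: sparsity-promoting proximal step on the split copy of X
        V_old = V
        V = soft(X - G, lam1 / mu)
        # Scaled multiplier update for the constraint V = X
        G = G - (X - V)
        # Stop when the sum of primal and dual residual norms is small
        primal = np.linalg.norm(X - V)
        dual = mu * np.linalg.norm(V - V_old)
        if primal + dual < tol:
            break
    return V, R
```

As a toy check, with a single unit-norm atom D = [0.5, 0.5, 0.5, 0.5]^T, one patch Y = [6, 1, 1, 1]^T, λ1 = 0.01 and λ2 = 0.5, the sketch converges to a coefficient of about 2.32 and assigns the excess intensity of the first pixel to the outlier term (R ≈ [4.34, 0, 0, 0]^T), which matches the optimality conditions of problem (2) for this case.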
4 Experimental Results
4.1 Datasets
The proposed algorithm is assessed using two datasets of ex vivo ventilated whole ovine lungs with bacteria present. Dataset I contains seven videos covering combinations of fluorescent dyes (SmartProbes) and bacterial types, including control segments. It contains (i) three videos of ovine lungs instilled with methicillin-sensitive Staphylococcus aureus (MSSA) stained with PKH67 (Sigma-Aldrich), a commercially available, highly fluorescent cell membrane dye; (ii) two videos of ovine lungs instilled with bacteria (Gram-positive MSSA and Gram-negative Pseudomonas PA3284) stained in situ with an in-house bacterial detection SmartProbe [1]; and (iii) two control videos of ovine lungs without any bacteria. Videos 1 to 5 are instilled with a single bacterial concentration, equivalent to an optical density (OD595nm) of 2.
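Table 1: Summary of Dataset I.
Video | Frames | OD595nm | Fluorophore | Bacteria
1 | 26 | 2 | PKH67 | MSSA
2 | 19 | 2 | PKH67 | MSSA
3 | 13 | 2 | PKH67 | MSSA
4, 5 | 32, 19 | 2 | SmartProbe | MSSA, PA3284
6 | 12 | NA | NA | Control (no bacteria)
7 | 12 | NA | NA | Control (no bacteria)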
Dataset II contains four videos with increasing bacterial concentrations (OD595nm 0.004, 0.04, 0.4 and 4), all labelled with an in-house bacterial detection SmartProbe. This dataset is used to verify that the counts of both the clinician and the algorithm increase with bacterial concentration. Tables 1 and 2 summarise Datasets I and II, respectively.
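Table 2: Summary of Dataset II.
Video | Frames | OD595nm | Fluorophore
1 | 14 | 0.004 | SmartProbe
2 | 14 | 0.04 | SmartProbe
3 | 15 | 0.4 | SmartProbe
4 | 15 | 4 | SmartProbe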
The Cellvizio fibred confocal OEM imaging platform (Mauna Kea Technologies, Paris, France) [16, 17] is used to acquire all data in this study. Image sequences are captured at 12 frames per second. Representative frames from each video sequence are selected by a trained clinician: 133 frames for Dataset I and 58 frames for Dataset II, as described in Tables 1 and 2, respectively. In each frame, the clinician marked the coordinates of features thought with high confidence to be bacteria; ambiguous points were ignored.
4.2 Dictionary Learning for Bacterial Detection
Each dataset is split into training and testing sets. In the training phase, one dictionary is learned for each dataset from its videos. Each set of frames within a video shares a characteristic elastin and collagen pattern, so one frame from each set is chosen as a representative, yielding 12 training frames from Dataset I and 17 from Dataset II. Features are then extracted from each training frame by dividing it into overlapping square patches of fixed size. Patches annotated by the clinician as containing bacteria are excluded from the training data (see Fig. 1). The remaining bacteria-free patches are vectorised and gathered in one training matrix per dataset. The method of optimal directions (MOD) [25] is then applied to learn the dictionaries. The K-SVD algorithm [26] was also investigated but gave results similar to MOD, so these results are not reported here. Figure 2 shows 30 dictionary atoms learned from a selection of frames from Dataset I.
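For illustration, here is a compact NumPy sketch of MOD [25]. It alternates a sparse-coding step with the least-squares dictionary update D = Y Xᵀ(X Xᵀ)⁺ and renormalises the atoms after each update. The sparse-coding step uses a basic orthogonal matching pursuit (OMP) with a fixed number of nonzeros per patch. That choice, the random initialisation from training patches and all parameter values are assumptions, not details taken from the experiments.

```python
import numpy as np

def omp(D, y, n_nonzero):
    """Orthogonal matching pursuit: greedy sparse code of y over unit-norm atoms D."""
    residual = y.copy()
    support = []
    coef = np.zeros(0)
    for _ in range(n_nonzero):
        k = int(np.argmax(np.abs(D.T @ residual)))  # atom most correlated with residual
        if k in support:                            # no new atom helps; stop early
            break
        support.append(k)
        coef, *_ = np.linalg.lstsq(D[:, support], y, rcond=None)
        residual = y - D[:, support] @ coef
    x = np.zeros(D.shape[1])
    x[support] = coef
    return x

def mod_dictionary_learning(Y, n_atoms, n_nonzero, n_iter=20, seed=0):
    """Method of optimal directions on bacteria-free training patches Y (columns)."""
    rng = np.random.default_rng(seed)
    # Initialise with randomly chosen training patches, normalised to unit norm.
    D = Y[:, rng.choice(Y.shape[1], n_atoms, replace=False)].astype(float)
    D /= np.linalg.norm(D, axis=0, keepdims=True) + 1e-12
    for _ in range(n_iter):
        # Sparse-coding step: code every training patch with OMP.
        X = np.column_stack([omp(D, Y[:, m], n_nonzero) for m in range(Y.shape[1])])
        # Dictionary update (MOD): least-squares fit D = Y X^T (X X^T)^+.
        D = Y @ X.T @ np.linalg.pinv(X @ X.T)
        # Renormalise atoms (unused atoms stay at zero).
        D /= np.linalg.norm(D, axis=0, keepdims=True) + 1e-12
    return D, X
```

The returned codes X are those computed with the dictionary of the last iteration before its final update, which is sufficient for inspecting the sparsity pattern.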
4.3 Algorithm evaluation
In the testing phase, once the dictionaries have been learned, Algorithm 2 is run on each of the remaining frames of Datasets I and II, yielding an estimated outlier matrix for each frame. The outlier image is then reconstructed from these overlapping patches by averaging their intensities, normalized, and thresholded; pixels exceeding the threshold are counted as potential bacteria. Since each bacterium corresponds to a set of connected pixels, each group of connected detections is counted as a single detection. The estimated number of bacteria is thus the number of groups, and their positions are computed as the barycenter of each region.
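The post-processing steps above can be sketched as follows. The sketch makes three assumptions that the text does not state: only positive outlier values are kept and divided by their maximum to normalise to [0, 1]; pixels are grouped with 8-connectivity; and the barycenter is the unweighted mean of the pixel coordinates. The function names are hypothetical.

```python
import numpy as np
from collections import deque

def patches_to_image(R, corners, patch_size, shape):
    """Put outlier patches (columns of R) back in place, averaging where they overlap."""
    acc = np.zeros(shape)
    count = np.zeros(shape)
    for m, (r, c) in enumerate(corners):
        acc[r:r + patch_size, c:c + patch_size] += R[:, m].reshape(patch_size, patch_size)
        count[r:r + patch_size, c:c + patch_size] += 1
    return acc / np.maximum(count, 1)

def detect_spots(outlier_img, threshold):
    """Normalise, threshold, group 8-connected pixels and return one barycentre per group."""
    pos = np.clip(outlier_img, 0.0, None)       # assumption: only bright outliers are kept
    peak = pos.max()
    norm = pos / peak if peak > 0 else pos      # scale to [0, 1]
    mask = norm > threshold
    H, W = mask.shape
    seen = np.zeros((H, W), dtype=bool)
    centres = []
    for i in range(H):
        for j in range(W):
            if not mask[i, j] or seen[i, j]:
                continue
            # Breadth-first search over the 8-connected group containing (i, j).
            queue, pixels = deque([(i, j)]), []
            seen[i, j] = True
            while queue:
                a, b = queue.popleft()
                pixels.append((a, b))
                for da in (-1, 0, 1):
                    for db in (-1, 0, 1):
                        na, nb = a + da, b + db
                        if 0 <= na < H and 0 <= nb < W and mask[na, nb] and not seen[na, nb]:
                            seen[na, nb] = True
                            queue.append((na, nb))
            # Unweighted barycentre (row, col) of the group.
            centres.append(tuple(np.mean(pixels, axis=0)))
    return centres
```

The bacterial count for a given threshold is then simply the number of returned barycentres.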
Because this two-class problem (presence/absence of bacteria) is imbalanced, we use precision-recall curves to assess bacteria detection performance, with the clinician's annotations as the reference. A precision-recall curve plots precision against recall for different cut-off thresholds applied to the outlier amplitude image. Precision and recall are computed as $\mathrm{Precision} = \frac{\mathrm{TP}}{\mathrm{TP}+\mathrm{FP}}$ and $\mathrm{Recall} = \frac{\mathrm{TP}}{\mathrm{TP}+\mathrm{FN}}$, where TP, FN and FP are the numbers of true positives, false negatives and false positives, respectively. Given the pixel location of each bacterium annotated by the clinician, we define a disk of fixed radius around it [27]. Any detection inside a disk is considered a match (TP), any detection outside all disks is an FP, and any clinician annotation whose disk contains no detection is an FN.
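A sketch of the disk-based matching rule and of the precision and recall formulas above follows. The radius is a free parameter here, and the function names are hypothetical. Following the rule as stated, several detections inside the same disk all count as TPs; a one-to-one assignment between detections and annotations would be a stricter alternative.

```python
import numpy as np

def match_detections(detections, annotations, radius):
    """Count (TP, FP, FN) with the disk rule.

    A detection is a TP if it lies within `radius` of any annotation, otherwise an FP.
    An annotation is an FN if no detection lies within `radius` of it.
    """
    det = np.asarray(detections, dtype=float).reshape(-1, 2)
    ann = np.asarray(annotations, dtype=float).reshape(-1, 2)
    if len(det) == 0 or len(ann) == 0:
        return 0, len(det), len(ann)
    # Pairwise Euclidean distances, shape (n_detections, n_annotations).
    dist = np.linalg.norm(det[:, None, :] - ann[None, :, :], axis=-1)
    inside = dist <= radius
    tp = int(inside.any(axis=1).sum())
    fp = len(det) - tp
    fn = int((~inside.any(axis=0)).sum())
    return tp, fp, fn

def precision_recall(tp, fp, fn):
    """Precision = TP / (TP + FP), Recall = TP / (TP + FN); 0 when undefined."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall
```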
We test different parameter settings to evaluate the performance of the proposed algorithm. First, we fix the regularization parameter associated with the sparse coefficient matrix and vary the outlier regularization parameter. Second, we investigate the impact of the number of atoms in the learned dictionary. Finally, we vary the threshold applied to the outlier amplitude image and construct the precision-recall curves accordingly. After choosing the best combination of these parameters, we statistically compare the bacterial counts (count-annotation) and detections (dot-annotation) of the trained clinician with the algorithm output.
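The text does not state how the area under the precision-recall curve (AUC) is integrated. The sketch below assumes trapezoidal integration over the (recall, precision) points obtained by sweeping the threshold. `evaluate` is a hypothetical callable that runs detection and matching at one threshold and returns (precision, recall).

```python
import numpy as np

def pr_auc(recalls, precisions):
    """Trapezoidal area under a precision-recall curve sampled at a few points."""
    r = np.asarray(recalls, dtype=float)
    p = np.asarray(precisions, dtype=float)
    order = np.argsort(r)                 # integrate along increasing recall
    r, p = r[order], p[order]
    return float(np.sum(np.diff(r) * (p[1:] + p[:-1]) / 2))

def sweep_thresholds(evaluate, thresholds):
    """Evaluate (precision, recall) at each cut-off and return the AUC over the sweep."""
    points = [evaluate(t) for t in thresholds]
    precisions, recalls = zip(*points)
    return pr_auc(recalls, precisions)
```

Because only the recall range covered by the sweep is integrated, AUC values obtained this way are comparable only when computed over the same threshold grid.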
4.4 Results and Discussion
Dot-annotation effect: Figure 3(a) plots the maximum area under the precision-recall curve (AUC) achieved for different fluorophores (represented by video ranges) and different numbers of dictionary atoms. Bacteria detection performance improves as the number of dictionary atoms increases. Although the strongly fluorescent PKH67 dye is used for videos 1:3, their AUC is close to that of videos 4:5, for which a weaker SmartProbe is used. This is because videos 4 and 5 contain fewer elastin and collagen structures, so false positive detections are less likely. For the control cases (videos 6 and 7), the optimal outlier regularization parameter (printed in red above each bar) is always higher than for the bacteria-stained videos (videos 1:3 and 4:5), which promotes sparser outliers and hence fewer counts. Moreover, the AUCs of the control videos are lower than those of videos 1:3 and 4:5: these videos are not stained with any fluorophore, so bacteria-like dots fluoresce weakly and are harder to discriminate, which stresses the need for SmartProbes for bacterial detection. We also observed that a broad range of outlier regularization parameter values yields very similar precision-recall curves, so the results are not very sensitive to this parameter.
Figure 3(b) plots the maximum AUC achieved for the four concentrations (represented by video numbers) and different numbers of atoms. The maximum AUC differs little between the two tested numbers of dictionary atoms, and again a broad range of regularization parameter values gives the same AUC.


Count-annotation effect: For Dataset I, the algorithm counts are compared with the clinician counts in each frame, as shown in Fig. 4. For each fluorophore, we use the outlier regularization parameter that provides the maximum AUC, together with the corresponding cut-off threshold on the outlier amplitude images. We observe an almost linear relationship between the clinician counts and the algorithm counts, as measured by the empirical linear correlation between manually and automatically detected anomalies. Furthermore, a similar trend between the number of clinician annotations and the algorithm counts is observed both for videos 1, 2 and 3, in which the highly fluorescent PKH67 dye is used, and for videos 4 and 5, in which an in-house SmartProbe producing weaker fluorescence signals is used; this trend also depends on the type of bacteria in the samples. Videos 6 and 7, the controls, show minimal annotations and counts, which reflects the ability of the algorithm to distinguish bacterial loads from controls.
Similarly, for Dataset II, we compare the clinician and algorithm counts for different cut-off thresholds and report the results in Fig. 5; these counts correspond to the parameter value that maximizes the AUC. The counts of both the algorithm and the clinician increase with bacterial concentration, which reflects the agreement between the proposed approach and the clinician's annotations.
We also observe that the algorithm counts are higher than the clinician counts for both datasets, which is expected because the algorithm can identify dots that are barely visible to the naked eye. Moreover, the clinician deliberately left ambiguous dots unannotated. These unannotated dots, together with false positives, are the main reason why the algorithm counts exceed the clinician counts.


Figure 5: Mean number of detections per selected frame in videos 1 to 4 of Dataset II, with the corresponding standard deviation: (a) clinician's annotations, (b) proposed method.
4.5 Comparison with existing approaches
In this subsection, we compare the proposed approach with popular spot-detection methods from the literature, namely the Laplacian of Gaussian (LoG) filter and its approximation, the difference of Gaussians (DoG) filter [28, 11], and the grey-scale opening top-hat filter (GSOTH) [10, 9]. These methods, although simple, are widely used for spot and blob detection in various applications. Based on preliminary trials to optimize performance, the LoG filter is implemented by applying a LoG kernel with a tuned standard deviation to each frame. Similarly, the DoG filter is implemented as the difference of two Gaussian kernels with different standard deviations. The GSOTH is implemented by first smoothing the input image with a Gaussian kernel to reduce noise, then computing the morphological opening of the image with a flat disc structuring element whose size was chosen to give the best detection results, and finally subtracting the opening from the original image. The same post-processing steps described earlier (pixel grouping and computation of the barycenters) are also applied. The comparison is conducted in terms of the AUC of the resulting precision-recall curves and of computation time.
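A NumPy-only sketch of two of these baselines, the DoG filter and a Gaussian-smoothed grey-scale opening top-hat with a flat disc, is given below. The σ values and disc radius are hypothetical and should be tuned as described above. In this sketch, the opening is subtracted from the smoothed image; the description above could also be read as subtracting it from the unsmoothed input. A LoG response can be obtained directly, for example, with `scipy.ndimage.gaussian_laplace`.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def gaussian_blur(img, sigma):
    """Separable Gaussian smoothing with reflect padding; output has the input shape."""
    radius = max(1, int(np.ceil(3 * sigma)))
    x = np.arange(-radius, radius + 1)
    k = np.exp(-x ** 2 / (2 * sigma ** 2))
    k /= k.sum()
    padded = np.pad(img, radius, mode="reflect")
    rows = np.apply_along_axis(lambda v: np.convolve(v, k, mode="valid"), 1, padded)
    return np.apply_along_axis(lambda v: np.convolve(v, k, mode="valid"), 0, rows)

def dog(img, sigma_small, sigma_large):
    """Difference of Gaussians: positive response at bright spots of the small scale."""
    return gaussian_blur(img, sigma_small) - gaussian_blur(img, sigma_large)

def disk(radius):
    """Boolean flat disc structuring element."""
    y, x = np.mgrid[-radius:radius + 1, -radius:radius + 1]
    return x ** 2 + y ** 2 <= radius ** 2

def grey_filter(img, footprint, reducer, pad_value):
    """Grey-scale erosion (reducer=np.min) or dilation (reducer=np.max) over a footprint."""
    r = footprint.shape[0] // 2
    padded = np.pad(img, r, mode="constant", constant_values=pad_value)
    windows = sliding_window_view(padded, footprint.shape)   # shape (H, W, k, k)
    masked = np.where(footprint, windows, pad_value)         # ignore pixels outside disc
    return reducer(masked, axis=(-2, -1))

def white_tophat(img, radius):
    """Image minus its morphological opening (erosion followed by dilation)."""
    fp = disk(radius)
    eroded = grey_filter(img, fp, np.min, np.inf)
    opened = grey_filter(eroded, fp, np.max, -np.inf)
    return img - opened

def gsoth(img, sigma, radius):
    """Gaussian smoothing followed by a flat-disc white top-hat."""
    return white_tophat(gaussian_blur(img, sigma), radius)
```

Spots smaller than the disc survive the top-hat while the slowly varying background is removed, so the output can be passed to the same thresholding and grouping steps as the outlier image.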
Table 3 compares the maximum AUC achieved by the proposed algorithm on Datasets I and II with those of the three methods described above. The proposed algorithm provides the highest AUC for both datasets. Although the grey-scale opening top-hat filter gives competitive results for videos 1:3 and 4:5 of Dataset I, it does not handle the control cases as well as the proposed approach. The LoG and DoG filters perform similarly to each other and, on average, below the other two methods.

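Table 3: Maximum AUC achieved by the proposed method and the three spot-detection methods.
Dataset | Videos | Proposed | LoG | DoG | GSOTH
Dataset I | 1:3 | 0.754 | 0.58 | 0.56 | 0.749
Dataset I | 4:5 | 0.8 | 0.53 | 0.63 | 0.786
Dataset I | 6:7 | 0.27 | 0.175 | 0.104 | 0.172
Dataset I | Average | 0.61 | 0.43 | 0.43 | 0.569
Dataset II | 1 | 0.32 | 0.142 | 0.14 | 0.257
Dataset II | 2 | 0.43 | 0.18 | 0.268 | 0.322
Dataset II | 3 | 0.30 | 0.09 | 0.116 | 0.184
Dataset II | 4 | 0.26 | 0.136 | 0.115 | 0.226
Dataset II | Average | 0.33 | 0.137 | 0.16 | 0.247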
The average computation times of the proposed method, LoG, DoG and GSOTH are 0.4, 0.11, 0.05 and 0.22 seconds, respectively. The experiments were conducted on an Acer laptop with a 2.0 GHz Intel Core i3 processor and 8 GB of RAM. Although the proposed approach is slower than the three filters, it offers clearly higher detection performance.
5 Conclusion and Future Work
In this work, we have demonstrated the performance of a supervised approach for bacterial detection in OEM images of distal lung tissue using targeted SmartProbes. We learned a dictionary for the background image structures (elastin, collagen, etc.), which was then used to identify deviating outliers in test frames. Experiments on two ovine lung datasets instilled with bacteria showed that the estimated bacterial counts correlate with those of a clinician and that good AUC values are achieved. However, care is needed when learning the dictionaries for such problems. When annotating ground truth, annotators are likely to make mistakes: they may annotate noise as a bacterium, or miss bacteria because of their large numbers in each frame. Such errors are common in any annotation process, but they may have a more severe impact on dictionary learning here because the target objects are dots with similar structure. Wrongly annotated or unannotated bacteria can therefore bias the dictionary atoms and cause errors in the estimation process. Current investigations include unsupervised and robust methods for learning the dictionary when annotations are absent or unreliable.
References
[1] A. R. Akram, N. Avlonitis, T. Craven, M. Vendrell, N. McDonald, E. Scholefield, A. Fisher, P. Corris, C. Haslett, M. Bradley, and K. Dhaliwal, “Structural modifications of the antimicrobial peptide ubiquicidin for pulmonary imaging of bacteria in the alveolar space,” The Lancet, vol. 387, p. S17, Feb. 2016, Spring Meeting for Clinician Scientists in Training 2016.
[2] L. Thiberville, S. Moreno-Swirc, T. Vercauteren, E. Peltier, C. Cavé, and G. Bourg-Heckly, “In vivo imaging of the bronchial wall microstructure using fibered confocal fluorescence microscopy,” American Journal of Respiratory and Critical Care Medicine, vol. 175, no. 1, pp. 22–31, Jan. 2007.
[3] I. Smal, M. Loog, W. Niessen, and E. Meijering, “Quantitative comparison of spot detection methods in fluorescence microscopy,” IEEE Trans. on Medical Imag., vol. 29, no. 2, pp. 282–301, Feb. 2010.
[4] X. Ding, L. He, and L. Carin, “Bayesian robust principal component analysis,” IEEE Trans. on Image Process., vol. 20, no. 12, pp. 3419–3430, Dec. 2011.
[5] Y. Altmann, S. McLaughlin, and A. Hero, “Robust linear spectral unmixing using anomaly detection,” IEEE Trans. Comput. Imag., vol. 1, no. 2, pp. 74–85, June 2015.
[6] P. McCool, Y. Altmann, A. Perperidis, and S. McLaughlin, “Robust Markov random field outlier detection and removal in subsampled images,” in IEEE Statistical Signal Processing Workshop (SSP), Palma de Mallorca, Spain, June 2016, pp. 1–5.
[7] I. Smal, E. Meijering, K. Draegestein, N. Galjart, I. Grigoriev, A. Akhmanova, M. Van Royen, A. B. Houtsmuller, and W. Niessen, “Multiple object tracking in molecular bioimaging by Rao-Blackwellized marginal particle filtering,” Medical Image Analysis, vol. 12, no. 6, pp. 764–777, Dec. 2008.
[8] I. Smal, W. Niessen, and E. Meijering, “A new detection scheme for multiple object tracking in fluorescence microscopy by joint probabilistic data association filtering,” in IEEE International Symposium on Biomedical Imaging: From Nano to Macro (ISBI), Paris, France, May 2008, pp. 264–267.
[9] D. S. Bright and E. B. Steel, “Two-dimensional top hat filter for extracting spots and spheres from digital images,” Journal of Microscopy, vol. 146, no. 2, pp. 191–200, May 1987.
[10] Y. Kimori, N. Baba, and N. Morone, “Extended morphological processing: a practical method for automatic spot detection of biological markers from microscopic images,” BMC Bioinformatics, vol. 11, no. 1, pp. 1–13, July 2010.
[11] F. He, B. Xiong, C. Sun, and X. Xia, “A Laplacian of Gaussian-based approach for spot detection in two-dimensional gel electrophoresis images,” in International Conference on Computer and Computing Technologies in Agriculture, Beijing, China: Springer, Sept. 2010, pp. 8–15.
[12] S. Seth, A. R. Akram, K. Dhaliwal, and C. K. Williams, “Estimating bacterial and cellular load in FCFM imaging,” Journal of Imaging, vol. 4, no. 1, pp. 1–11, Jan. 2018.
[13] S. Jiang, X. Zhou, T. Kirchhausen, and S. T. Wong, “Detection of molecular particles in live cells via machine learning,” Cytometry Part A, vol. 71, no. 8, pp. 563–575, Aug. 2007.
[14] C. Arteta, V. Lempitsky, and A. Zisserman, “Counting in the wild,” in European Conference on Computer Vision (ECCV), Amsterdam, The Netherlands: Springer, Oct. 2016, pp. 483–498.
[15] V. Lempitsky and A. Zisserman, “Learning to count objects in images,” in Advances in Neural Information Processing Systems, Dec. 2010, pp. 1324–1332.
[16] N. Ayache, T. Vercauteren, G. Malandain, F. Oberrietter, N. Savoire, and A. Perchant, “Processing and mosaicing of fibered confocal images,” in Medical Image Computing and Computer-Assisted Intervention (MICCAI): Workshop on Microscopic Image Analysis with Applications in Biology (MIAAB), Copenhagen, Denmark: Springer, 2006, pp. 1–5, invited. [Online]. Available: https://hal.inria.fr/inria-00615589
[17] G. Le Goualher, A. Perchant, M. Genet, C. Cavé, B. Viellerobe, F. Berier, B. Abrat, and N. Ayache, “Towards optical biopsies with an integrated fibered confocal fluorescence microscope,” in Medical Image Computing and Computer-Assisted Intervention (MICCAI), Saint-Malo, Brittany, France: Springer, Sept. 2004, pp. 761–768.
[18] N. Krstajić, A. R. Akram, T. R. Choudhary, N. McDonald, M. G. Tanner, E. Pedretti, P. A. Dalgarno, E. Scholefield, J. M. Girkin, A. Moore et al., “Two-color widefield fluorescence microendoscopy enables multiplexed molecular imaging in the alveolar space of human lung tissue,” Journal of Biomedical Optics, vol. 21, no. 4, p. 046009, 2016.
[19] A. K. Eldaly, Y. Altmann, A. Perperidis, N. Krstajic, T. R. Choudhary, K. Dhaliwal, and S. McLaughlin, “Deconvolution and restoration of optical endomicroscopy images,” IEEE Trans. Comput. Imag., vol. 4, no. 2, pp. 194–205, March 2018.
[20] A. K. Eldaly, Y. Altmann, A. Perperidis, and S. McLaughlin, “Deconvolution of irregularly subsampled images,” in IEEE Statistical Signal Processing Workshop (SSP), Freiburg, Germany, June 2018, pp. 303–307.
[21] A. Perperidis, H. E. Parker, A. Karam-Eldaly, Y. Altmann, K. Dhaliwal, R. R. Thomson, M. G. Tanner, and S. McLaughlin, “Characterization and modelling of inter-core coupling in coherent fiber bundles,” Optics Express, vol. 25, no. 10, pp. 11932–11953, May 2017.
[22] M. V. Afonso, J. M. Bioucas-Dias, and M. A. Figueiredo, “An augmented Lagrangian approach to the constrained optimization formulation of imaging inverse problems,” IEEE Trans. on Image Process., vol. 20, no. 3, pp. 681–695, March 2011.
[23] J. Nocedal and S. Wright, Numerical Optimization. Springer Science & Business Media, 2006.
[24] P. L. Combettes and V. R. Wajs, “Signal recovery by proximal forward-backward splitting,” Multiscale Modeling & Simulation, vol. 4, no. 4, pp. 1168–1200, Nov. 2005.
[25] K. Engan, S. O. Aase, and J. H. Husøy, “Multi-frame compression: Theory and design,” Signal Processing, vol. 80, no. 10, pp. 2121–2140, Oct. 2000.
[26] M. Aharon, M. Elad, A. Bruckstein et al., “K-SVD: An algorithm for designing overcomplete dictionaries for sparse representation,” IEEE Trans. on Signal Process., vol. 54, no. 11, p. 4311, Nov. 2006.
[27] O. Mandula, I. Š. Šestak, R. Heintzmann, and C. K. Williams, “Localisation microscopy with quantum dots using non-negative matrix factorisation,” Optics Express, vol. 22, no. 20, pp. 24594–24605, Sept. 2014.
[28] T. Lindeberg, “Feature detection with automatic scale selection,” International Journal of Computer Vision, vol. 30, no. 2, pp. 79–116, July 1998.