Automatic analysis of cells, microorganisms, or other subcellular features within microscope images is essential for a wide range of biomedical and diagnostic applications. Over the past several years, the application of convolutional neural networks (CNNs) has dramatically improved the performance of counting tasks in various scenarios, including cancer and tumor diagnosis[9, 10], infectious disease detection[47, 11], and computerized automation of complete blood cell (CBC) counting. However, the overall efficiency of automated microscope-based cell counting is still constrained by the limited field of view (FOV) of conventional microscopes, especially for tasks that require high-resolution images. This limitation is compounded by the fact that, for many applications, the diagnostically relevant features of interest are highly sparse. For example, to diagnose infection with the malaria parasite (Plasmodium falciparum), a trained expert often needs to examine a blood smear with a 100× microscope objective across several hundred fields of view, with the aim of visually identifying just one or a few parasites, which can create a serious bottleneck within the diagnostic pipeline.
A core reason why we cannot simultaneously obtain high spatial resolution over a large FOV stems from lens aberrations: the higher the desired spatial resolution, the more difficult it is for lens designers to correct aberrations at off-axis positions. As a result, most standard microscopes are limited to a space-bandwidth product (SBP) of tens of megapixels. Current solutions to the limited-SBP problem include whole slide imaging (WSI) scanners [53, 54], Fourier ptychographic microscopy (FPM)[6, konda2020fourier, zheng2021concept], and multi-aperture systems [brady2012multiscale, fan2019video]. However, WSI requires high-precision mechanical scanning of the sample or the imaging system [bueno2014automated], which in turn may require repeated focus adjustment, making the entire process time-consuming and expensive [evans2018us].
In this work, we propose a new solution that overlaps multiple microscope images from different FOVs onto a single image sensor. After acquiring a single snapshot, a machine learning algorithm then extracts relevant information over the larger effective FOV for cell counting tasks. Our method takes advantage of two qualities of many cell counting tasks. First, the targets are often sparse and thus have a low probability of overlapping with one another in the composite image. Second, the positions of the cells are not important; thus, unlike previous overlap-based multiplexed imaging approaches[15, 16, 17], we do not need to reconstruct the extended FOV, and the CNN can instead be trained and deployed on image patches. In forming superimposed images, our approach is therefore efficient not only in the data capture, but also in the computational analysis. We demonstrate our approach experimentally on the specific clinical tasks of white blood cell (WBC) [50, 49] and malaria parasite counting.
1 Proposed method
1.1 Principle and design of overlapped imaging
Fig. 1 presents an overview of the general principle of overlapped imaging for rapid classification and counting. Instead of relying on a single objective lens, our goal here is to use an array of N sub-lenses, each of which images a unique, independent FOV onto a common sensor. With N sub-lenses, our approach can in principle capture light from an N-times-larger FOV as compared with a standard microscope, albeit from disjoint slide areas (i.e., FOVs that are not necessarily directly adjacent). While the individual sub-images overlap and thus reduce image contrast, experimental results show this approach is still effective for tasks where the goal is to search for sparsely distributed features across a relatively uniform or repetitive background, such as detecting WBCs or malaria parasites in blood smears.
For most of our experiments, we used N = 7 sub-lenses (Fig. 1b(ii)) to image up to seven unique FOVs, as this number of lenses leads to an efficient hexagonal packing geometry and offers a good balance between the increase in effective FOV and the proportional decrease in dynamic range. In our imaging configuration, all lenses are placed parallel to the image sensor (Sony IMX477R) to ensure all the FOVs are approximately in focus at the same plane. The sub-lens size, spacing, object distance, and image distance (detailed in Table 1) were chosen to ensure that the individual image from every sub-lens covers the majority of the sensor. While WBC and malaria parasite detection tasks are traditionally implemented with microscopes at more than 40× magnification, here we used a slightly lower magnification (25×), based upon findings in our prior work.
Specifically, we set the working distance to the microscope slide, d_o, to 1.2 mm and the image distance, d_i, to 30 mm, creating sub-images with magnification M = d_i/d_o = 25×. The diameter of each sub-FOV at the object plane, w_o, and its associated diameter at the image plane, w_i, are set by the selected lens and obey the relationship w_i = M·w_o. In our experiments, w_o = 0.84 mm and w_i = 21 mm. The distance between each lens within our lens array, d = 3.7 mm, defines the distance between the centers of the FOVs of the sub-lenses at the sample plane. Each sub-FOV of diameter w_o = 0.84 mm was thus separated from adjacent sub-FOVs by approximately 3.7 mm as well (Fig. 1b(i,iii)). Using N = 7 lenses in total leads to a total array diameter of approximately 11.1 mm, which approximately matches the width of a typical blood smear slide (typically around 1.5 cm × 2 cm). Pinhole illumination arrays are also used to prevent cross-talk of light from individual FOVs. A complete list of the parameters used in our initial experiments is presented in Table 1.
| Parameter | Value |
|---|---|
| Number of lenses | 7 |
| Object distance | 1.2 mm |
| Image distance | 30 mm |
| Inter-lens spacing | 3.7 mm |
| Resolution (full pitch) | 2 µm |
| Sub-FOV diameter, object | 0.84 mm |
| Sub-FOV diameter, image | 21 mm |
| Total lens array width | 11.1 mm |
| Sensor dimension (width) | 6.287 mm |
| Number of pixels | 12.3 MP |
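The geometric relationships above can be verified numerically from the Table 1 values; modeling the 7-lens hexagonal array as three lenses across (width 3d) is an assumption of this sketch:

```python
# Geometry of the overlapped imaging prototype (values from Table 1).
d_o = 1.2    # object (working) distance, mm
d_i = 30.0   # image distance, mm
w_o = 0.84   # sub-FOV diameter at the object plane, mm
d = 3.7      # inter-lens spacing, mm

M = d_i / d_o        # magnification
w_i = M * w_o        # sub-FOV diameter at the image plane, mm
array_width = 3 * d  # 7-lens hexagonal array spans 3 lenses across

print(round(M), round(w_i), round(array_width, 1))  # 25 21 11.1
```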
Although overlapped imaging increases the effective FOV of our microscope, the general approach suffers from two disadvantages. First, the dynamic range of each sub-image, DR_sub, decreases with an increase in the image overlap number, N. Image sensors exhibit a limited total dynamic range, DR; typically, DR = 256 grayscale values per pixel for an 8-bit sensor. To avoid sensor saturation, these values must be divided amongst all sub-images, so on average we can expect the sub-image dynamic range to be DR_sub = DR/N. For example, with N = 7 overlapped images, an 8-bit sensor can only dedicate 36-37 grayscale values to each sub-image. For tasks where grayscale variations are important for accurate classification, it is beneficial to perform overlapped imaging with either a high-dynamic-range sensor or a high-dynamic-range capture strategy on a standard sensor. Second, the signal-to-noise ratio (SNR) of each sub-image decreases with an increase in the image overlap number, N. Assuming that the image sensor can capture n photons before saturation when detecting just a single non-overlapped image (N = 1), there are on average n/N photons per sub-image. Assuming shot noise as the dominant noise source, the SNR per sub-image scales with the square root of the number of photons per sub-image, i.e., it decreases as 1/√N.
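These two trade-offs can be quantified with a short helper; `full_well` is an assumed well depth for illustration, not a measured property of our sensor:

```python
import math

def sub_image_budget(n_overlap, bit_depth=8, full_well=10000):
    """Per-sub-image dynamic range and relative SNR when n_overlap
    sub-images share one sensor (shot-noise-limited model)."""
    dr_total = 2 ** bit_depth            # total grayscale levels
    dr_sub = dr_total / n_overlap        # levels available per sub-image
    photons_sub = full_well / n_overlap  # photons per sub-image at saturation
    # SNR relative to the non-overlapped case: sqrt(n/N)/sqrt(n) = 1/sqrt(N)
    snr_rel = math.sqrt(photons_sub) / math.sqrt(full_well)
    return dr_sub, snr_rel

dr, snr = sub_image_budget(7)
print(int(dr), round(snr, 3))  # 36 grayscale levels, SNR reduced to 0.378
```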
As we will see, these disadvantages may nonetheless not dramatically impact the accuracy of current deep learning-based classification for a wide selection of tasks. Images often contain a high amount of redundancy, especially when the end goal is a global classification decision. To verify this, we show how WBCs and malaria parasites can still be accurately detected from overlapped images with extended FOVs, using both simulation data with a realistic noise model and experimental data from our prototype microscope.
1.2 Deep learning-based cell detection
In this work, we adopted a 10-layer VGG-like framework with He-normal initialization[he2015delving] and leaky ReLU activation to detect target objects in acquired overlapped image data. We use five sets of convolutional filters and a single fully connected layer, with each set containing two 2D convolutions, the second of which has a non-unit stride to reduce the spatial dimensions of the tensor.
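The downsampling schedule implied by this design can be traced without any deep-learning framework; the 96-pixel input size and 'same' padding are assumptions of this sketch:

```python
def vgg_like_shapes(input_size, n_sets=5):
    """Trace the spatial size through n_sets of [conv(stride 1), conv(stride 2)]
    blocks, assuming 'same' padding so only the strided conv shrinks the map."""
    sizes = [input_size]
    for _ in range(n_sets):
        # first conv: stride 1, 'same' padding -> size unchanged
        # second conv: stride 2 -> ceil(size / 2)
        sizes.append(-(-sizes[-1] // 2))
    return sizes

print(vgg_like_shapes(96))  # [96, 48, 24, 12, 6, 3]
```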
In a number of learning-based tasks, annotation of regions of interest (ROIs), such as bounding boxes and segmentation masks, is required. To avoid the relatively tedious process of segmentation mask annotation, and to keep our model as widely applicable to different input data types as possible without requiring specialized annotations, we adopted a binary-classification framework for image post-analysis. During training, we utilized image patches as input. During testing and experimental use across full-FOV image data, we then generated classification heatmaps via a standard sliding-window approach[74, 75, 76], which is widely used for spatial density prediction. The resulting heatmaps encode the probability of the presence of target objects at each pixel over the whole FOV of the various overlapped images, from which various statistics, including counts, are subsequently derived.
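A minimal sketch of the sliding-window heatmap generation, assuming a 96-pixel patch and a 10-pixel step (the step size used later in Sec. 2.2); the toy classifier stands in for the trained CNN:

```python
import numpy as np

def sliding_window_heatmap(image, classifier, patch=96, step=10):
    """Apply a patch classifier over a full-FOV image to build a probability
    heatmap, one score per window position."""
    h, w = image.shape
    rows = (h - patch) // step + 1
    cols = (w - patch) // step + 1
    heat = np.zeros((rows, cols))
    for r in range(rows):
        for c in range(cols):
            window = image[r * step:r * step + patch, c * step:c * step + patch]
            heat[r, c] = classifier(window)  # probability of a target here
    return heat

# toy classifier: "probability" = mean intensity of the patch
img = np.zeros((196, 196))
img[90:106, 90:106] = 1.0  # one bright 16x16 "cell"
heat = sliding_window_heatmap(img, lambda p: p.mean())
print(heat.shape)  # (11, 11)
```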
2 Simulation procedures and datasets
2.1 Simulation of overlapped images with correct noise statistics
We first demonstrated our method in simulation by digitally overlapping images (Fig. 2). For a given number of lenses N, we created synthetic overlapped imaging datasets by digitally averaging N images. To counteract the resultant artifactual improvement in SNR, as expected from averaging independent measurements, we added Gaussian noise whose standard deviation was scaled based on the averaged value at each pixel location. Finally, the resultant image was quantized into 8 bits, simulating sensor digitization. See Appendix for noise model details.
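The digital overlap procedure can be sketched as follows; treating grayscale values directly as photon counts, and the (N-1)/N noise-scale constant (following our reading of the Appendix noise model), are assumptions of this sketch:

```python
import numpy as np

def simulate_overlap(images, rng=None):
    """Digitally overlap N single-FOV images (float arrays, grayscale units):
    average them, re-inject signal-dependent Gaussian noise to undo the
    artificial SNR gain from averaging, then quantize to 8 bits."""
    rng = rng or np.random.default_rng(0)
    N = len(images)
    avg = np.mean(images, axis=0)
    # zero-mean Gaussian noise whose variance scales with the local mean
    c = (N - 1) / N  # assumed noise-scale constant (see Appendix)
    noisy = avg + rng.normal(0.0, np.sqrt(c * np.clip(avg, 0, None)))
    return np.clip(np.round(noisy), 0, 255).astype(np.uint8)

frames = [np.full((32, 32), 120.0) for _ in range(7)]
out = simulate_overlap(frames)
print(out.dtype, out.shape)
```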
For each cell-counting task, we dynamically overlapped small image patches across the available FOVs. The overlapped images were assigned labels based on the presence of target objects in at least one of the sub-FOV images. Fig. 2b illustrates this process for the N = 3 case. Our synthetic approach allowed us to easily vary the number of overlapped images, and we capitalized on this degree of freedom to characterize the performance of our system as a function of the number of overlapping images for each task. This method of synthetically creating overlapped images also serves to augment our data for CNN training: by varying which regions are overlapped, the number of unique overlapped images increases combinatorially with N.
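The combinatorial growth of the augmented dataset can be illustrated directly; `n_unique_overlaps` is a hypothetical helper counting unordered choices of N patches from a pool:

```python
from math import comb

def n_unique_overlaps(n_patches, n_overlap):
    """Number of distinct overlapped images that can be synthesized by
    choosing which n_overlap of n_patches single-FOV patches to combine
    (order-independent, since averaging is commutative)."""
    return comb(n_patches, n_overlap)

print([n_unique_overlaps(100, N) for N in (1, 3, 7)])
# [100, 161700, 16007560800] -> combinatorial growth in N
```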
2.2 Datasets for cell-counting tasks
We first conducted a simulation experiment to study the effectiveness of our method for automatically detecting malaria parasites in thick blood smears. This task is based on an open-source dataset that we term SimuData, in which thick blood smears were stained and imaged via a modified mobile phone camera. This dataset contains 1800 large-FOV images captured at 100× magnification, each with 3024×4032 pixels, from 150 individual patients. Each image was labelled by an expert to indicate where in the FOV the malaria parasite was visible. This task represents an ideal scenario for overlapped imaging, given the high contrast and sparsity of the parasites. The patients were split into training, validation, and test sets. For the training and validation datasets, square regions of 96×96 pixels were extracted from the full FOVs (1400 for training, 600 for validation) and marked as infected if a parasite annotation lay within the inner third (32×32 pixels) of the patch. The datasets were balanced, so that the infection to no-infection ratio was 1:1.
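The inner-third labeling rule can be sketched as a short helper; the function name and coordinate convention are illustrative:

```python
def label_patch(annotations, patch_xy, patch=96):
    """Label a 96x96 patch as infected (1) if any parasite annotation falls
    within the central third (32x32 pixels) of the patch, else 0."""
    x0, y0 = patch_xy
    lo, hi = patch // 3, 2 * patch // 3  # central band: pixels 32..63
    for (ax, ay) in annotations:
        if x0 + lo <= ax < x0 + hi and y0 + lo <= ay < y0 + hi:
            return 1
    return 0

print(label_patch([(50, 50)], (0, 0)),   # inside the central 32x32 -> 1
      label_patch([(10, 50)], (0, 0)))   # near the patch edge -> 0
```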
Using the proposed overlapped imaging system (Fig. 1) with the physical parameters specified in Table 1, we collected images from Wright-stained human peripheral blood smear slides (Carolina) to form the DukeData dataset. The task of interest was to automatically identify and count WBCs within images acquired with different amounts of overlap. For experimental data collection, we first imaged peripheral blood smear slides with the seven sub-lenses of our microscope individually, using the illumination provided by a single white LED for each captured image. This produced a set of seven non-overlapped images per specimen position. Subsequently, from the seven captured “sub-images”, we cropped and resized 500 WBC and 500 RBC patches to a uniform patch size to form a “single-lens” dataset. This dataset was used to create synthetically overlapped image data using the procedure described above for SimuData, with which we trained CNNs for a variable number of overlaps N. The simulated datasets were split into training and validation sets with a ratio of 7:3.
Finally, N = 2-7 sub-lenses were used to simultaneously image and capture physically overlapped images to form experimental datasets. We captured 35 groups of data, where one “group” of data includes 7 non-overlapped, single-FOV images and 6 overlapped images with different levels of overlap (N = 2-7, obtained by blocking sub-apertures). The non-overlapped, single-FOV images were used to provide accurate annotations of the locations of WBCs in the corresponding overlapped images. In these 35 groups, a total of 43 WBCs were observed. A sliding-window approach with a 10-pixel step size was used to split the whole image into patches. Patches containing whole WBCs were labeled as positive, while patches containing only RBCs and background were labeled as negative. The CNNs trained on synthetically overlapped data were applied to these experimentally overlapped data to evaluate performance.
3 Results
3.1 Simulation results
We first investigated the impact of overlapped imaging (i.e., the resulting reduced contrast and SNR) on classification accuracy in simulation with a malaria parasite counting task (SimuData). We characterized classification performance across a wide range of N by digitally adding images and noise as discussed above and in the Appendix (Fig. 3). Fig. 3a shows how the number of overlapped images impacts task performance. In particular, although performance degrades roughly linearly with the number of overlapped images, at N = 7 a detection accuracy above 80% is still maintained for both the training and validation sets.
A receiver operating characteristic (ROC) curve for classifying the malaria parasite for N = 1-7 overlapped images is shown in Fig. 3b, with the area under the curve (AUC) for each curve displayed in the legend. The ROC gives a different perspective on task performance, showing that a high true positive rate can be achieved at a relatively low false positive rate, even for the highly overlapped condition of N = 7. The simulation results thus show that, up to a certain degree of overlap, the CNN model can still identify targets with relatively high accuracy.
3.2 Overlapped imaging system characterization
We characterized the resolution of our overlapped imaging system by capturing unoverlapped images of a USAF target (Fig. 4a-c). First, we used an iris to restrict the incident light to illuminate the sample beneath just one sub-lens at a time. This allowed us to effectively capture each of the seven sub-images, one at a time, by simply moving the position of the iris. An example segment of one image of the resolution target, positioned beneath and illuminated through the center sub-lens, is shown in Fig. 4a. Here, we can resolve group 8 element 6, demonstrating a maximum full-pitch resolution of approximately 2 µm. We note here that the illumination source (a single white LED placed 10 mm away) provides spatially coherent illumination to the sample, suggesting that it may be possible to improve image resolution using a lower-coherence source. At the same time, our source has a spatial coherence length less than w_o, thus ensuring that the overlapped image is an incoherent superposition of all sub-images.
With the resolution target in the same position, we then opened the iris to illuminate the entire resolution target slide, imaging through all 7 sub-lenses and capturing an N = 7 overlapped image (Fig. 4a, bottom). As our USAF target slide only contained features across a 0.5 mm diameter area, most of the other sub-images that contribute to this overlapped image do not contain any obvious features and simply decrease the image contrast, as expected. We quantified the decrease in image contrast by taking traces through the resolution target at similar locations (group 8, element 1, shown as the colored horizontal line in each image). These trace values are plotted in Fig. 4d (averaged over 20-pixel rows). We used the maximum difference between peak and valley in each trace curve to define the contrast of the corresponding image. Here, a normalized contrast of 0.9 for the single non-overlapped image (Fig. 4d, top) dropped to approximately 0.15 for the N = 7 overlapped image (Fig. 4d, bottom), roughly as expected. Deviations from the expected factor-of-7 contrast drop may be attributed to a slightly non-uniform brightness of each sub-image across the image plane.
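The contrast metric used here can be sketched as follows, with toy trace values chosen to mirror the measured 0.9 and 0.15 contrasts; normalizing by the peak-plus-valley sum is one possible choice:

```python
def trace_contrast(trace):
    """Contrast of a line trace through a resolution-target element, defined
    (as in the text) by the maximum peak-to-valley difference, normalized."""
    peak, valley = max(trace), min(trace)
    return (peak - valley) / max(peak + valley, 1e-12)

single = [0.95, 0.05, 0.95, 0.05, 0.95]           # high-contrast bars (toy)
overlapped = [0.575, 0.425, 0.575, 0.425, 0.575]  # same bars after ~7x dilution
print(round(trace_contrast(single), 2),
      round(trace_contrast(overlapped), 2))  # 0.9 0.15
```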
We repeated this resolution target imaging experiment for all 7 of the sub-lenses, shifting both the resolution target (to lie beneath each sub-lens) and the iris (to selectively illuminate the sample) to 7 unique positions. At each position, we captured a non-overlapped image and then opened the iris to capture an N = 7 overlapped image. Results from performing this experiment with two other sub-lenses in our lens array are shown in Fig. 4b-c. We can see that the resolution is approximately constant over the entire sub-FOV of each sub-lens, with a cutoff resolution of approximately 2 µm (see blue boxes). Furthermore, the contrast drop for each sub-lens is approximately constant. However, the sub-lenses do contain some non-uniform intensity variations, which we attribute to imperfections in the mounting process, as well as non-uniformities across the sample.
3.3 Experimental results
As a preliminary investigation of automated cell counting with our experimental microscope for overlapped imaging, we collected and processed the DukeData dataset (see Sec. 2.2), with results shown in Figs. 5-7. When only the center lens is used, the system has a resolution comparable to that of a standard 25× microscope, allowing our proposed system to maintain crucial morphological features of the different types of WBCs (see Fig. 5a).
As a first test, we digitally overlapped DukeData images and examined WBC classification accuracy as a function of N (see Fig. 5). As shown in Fig. 5b, the distinction between a WBC and red blood cells (RBCs) or other background material becomes visually unclear at N = 3 or 4 overlapped images. However, our CNN classifier still maintains 95% accuracy at N = 5, with accuracy monotonically decreasing with increasing overlap (Fig. 5c), as expected. Due to dynamic augmentation during training, the validation accuracies in this series of tests are slightly higher than the training accuracies. These trends are consistent with those observed in the malaria parasite counting task (Fig. 3).
Next, we attempted to automatically identify WBCs in experimentally overlapped images, using CNN models pre-trained on digitally overlapped data (as described in Sec. 2.2). A unique CNN model was used for each value of N. Each model was independently trained 3 times with different random seeds, and their predictions were averaged. Results are reported as both ROC curves (Fig. 6a) and confusion matrices (Fig. 6b-h) for each overlap condition (N = 1-7). For the confusion matrices, we chose the threshold for the CNN outputs as that which maximizes the geometric mean of the true positive rate and the true negative rate[xie2020effect], which accounts for dataset imbalance. In this experiment, we obtained the following aggregate detection accuracies for N = 1-7: 96.7%, 89.7%, 81.7%, 68.9%, 71.1%, 64.0%, and 59.6%.
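The geometric-mean threshold selection can be sketched as follows; the scores and labels are toy values:

```python
import math

def gmean_threshold(scores, labels):
    """Pick the decision threshold that maximizes the geometric mean of the
    true positive rate and true negative rate (robust to class imbalance)."""
    best_t, best_g = None, -1.0
    for t in sorted(set(scores)):
        tp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 1)
        fn = sum(1 for s, y in zip(scores, labels) if s < t and y == 1)
        tn = sum(1 for s, y in zip(scores, labels) if s < t and y == 0)
        fp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 0)
        tpr = tp / max(tp + fn, 1)
        tnr = tn / max(tn + fp, 1)
        g = math.sqrt(tpr * tnr)
        if g > best_g:
            best_t, best_g = t, g
    return best_t, best_g

scores = [0.1, 0.2, 0.35, 0.6, 0.7, 0.9]
labels = [0,   0,   0,    1,   1,   1]
t, g = gmean_threshold(scores, labels)
print(t, round(g, 2))  # 0.6 1.0
```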
The models obtained relatively high true positive rates at low overlap (N = 1-4: 94.0%, 84.5%, 74.5%, 74.0%), with overall performance generally decreasing with N, which is consistent with our previous simulation results. The ROC curves show similar trends (Fig. 6a). Deviations from the expected strictly monotonic trend, and the fact that performance is overall worse than with the simulated datasets, are likely due to imperfections in our first experimental prototype, such as differences in brightness or color balance for different overlapping conditions.
In a third set of experiments, we applied our trained CNNs to entire images via a sliding window to generate co-registered classification heatmaps (Fig. 7). Fig. 7a shows the non-overlapped images collected through the 7 individual sub-lenses of the proposed system, which were used for ground-truth WBC annotations (dotted circles). Fig. 7b shows the overlapped images experimentally captured by the proposed microscope, with the classification heatmaps overlaid in the bottom row. These results qualitatively confirm that models trained with digitally overlapped data can still identify the WBCs accurately under N = 4 overlap for subsequent counting, despite the decreased contrast. In future work, we aim to investigate post-processing strategies to facilitate high-accuracy cell counting from such overlapped image heatmaps.
4 Discussion and conclusion
In this work, we have demonstrated a new imaging system that can capture and overlap images of multiple independent FOVs on a common detector, which may offer significant speed-ups for tasks requiring analysis of large FOVs. For the malaria parasite and WBC counting tasks, we investigated the relationship between CNN-based classification accuracy and the number of overlapped images. We showed that it is in principle possible to overlap up to N = 4 images on standard 8-bit CMOS detectors while maintaining over 90% accuracy. We then presented initial results from our prototype hardware system, which can capture 2-µm resolution images over several square millimeters using a set of seven small lenses. The resolution and imaging FOV of one sub-lens of our first prototype are comparable to those of a standard 25× objective lens. We also showed that CNNs trained entirely on synthetic data are able to generalize to experimental data for the WBC counting task.
There are a number of avenues for future work extending our proof-of-principle experiments. Apart from exploring different CNN architectures, such as those tailored to segmentation tasks [ronneberger2015u], we could consider other types of annotation appropriate for cell-counting tasks, such as a global count. In addition to sparse cell-counting tasks, we envision our technique being applied to other similar problems, such as identifying defects in otherwise pristine surfaces (e.g., semiconductor wafers). Alternative hardware implementations may also improve performance. For example, while our current prototype uses brightfield illumination, it may be advantageous to use darkfield illumination, which would substantially reduce the background and thus improve the contrast of the overlapped images. Further, higher-dynamic-range sensors could compensate for the contrast lost due to overlapping. Another potential direction is to use the recently proposed random access parallel microscopy setup, which images multiple FOVs sequentially with a single parabolic mirror as a common tube lens. Such a setup has the advantage that all sub-images overlap completely on the sensor.
We are hopeful that our initial demonstration will encourage additional exploration into the various benefits of overlapped imaging, in particular when coupled with machine learning to automatically process the acquired data. As machine learning techniques continue to hit impressive benchmarks, we believe that our approach can leverage successive advances and inspire new research into high-throughput imaging devices that do not necessarily clearly resolve an entire scene or sample, but can still excel at specific tasks.
Appendix A Proof of the noise model
Here, we provide additional details about the noise model used to ensure that our synthetically overlapped image data exhibits an experimentally accurate SNR. We prove that adding the Gaussian random variable G(i, j) pixel-wise (indexed by i and j) to the digitally superimposed images produces an image with the correct noise statistics in the shot-noise limit. The factor of DR/n_max in Eq. 1, where n_max is the pixel well depth, simply converts the photoelectron count to a grayscale value.
Let n_k be the number of photons coming from the k-th FOV, which follows a Poisson distribution with an unknown rate parameter λ. Then the total number of photons detected from all fields of view is n_tot = Σ_{k=1}^{N} n_k, where n_tot ~ Poisson(Nλ). Thus, E[n_tot] = Var[n_tot] = Nλ, which are the desired target statistics. However, in the case of our digitally simulated overlapped images, we collect single-field-of-view images with N-times-larger illumination intensity, such that n_k ~ Poisson(Nλ). Then, as described in the main text, the simulated value is S = (1/N) Σ_{k=1}^{N} n_k, where we are ignoring discretization effects for now. Thus, E[S] = Nλ, as desired, but Var[S] = λ ≠ Nλ. We thus require a new transformed variable T such that
it has the correct mean and variance. We posit one possible function,

T = S + G,   G | S ~ N(0, cS),

where c is a constant that does not depend on λ (as λ is unknown). Because G is zero-mean, T has the same (correct) mean of Nλ. However, we desire a value of c such that
Var[T] = Var[S] + Var[G] + 2 Cov(S, G) = Nλ,   (4)

as required by Poisson statistics. Of these three terms, we know only the first, Var[S] = λ. The second and third terms can be computed from the joint distribution, p(S, G).
Unfortunately, Gaussian and Poisson distributions are not conjugate, meaning further analysis would not permit analytical solutions. From Bayesian statistics, if we have a Gaussian likelihood with a known mean and an unknown variance, an inverse-gamma distribution on the variance is a conjugate prior. Thus, we approximated the distribution of S with an inverse-gamma distribution that has the same mean and variance (respectively, Nλ and λ):
Then the joint distribution p(S, G) = p(G | S) p(S) is a normal-inverse-gamma distribution.
From this joint distribution, we know that Var[G] = E[Var(G | S)] = c E[S] = cNλ, and Cov(S, G) = E[S · E(G | S)] = 0. Evaluating Eq. 4, we obtain c = (N - 1)/N,
which ensures that Var[T] = λ + cNλ = Nλ, thus justifying Eq. 1.
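A quick numeric check of this variance bookkeeping, assuming the noise-scale constant works out to c = (N - 1)/N:

```python
def corrected_variance(N, lam):
    """Variance of T = S + G with G | S ~ N(0, c*S) and c = (N-1)/N:
    should recover the Poisson target Var[T] = N * lam."""
    var_S = lam            # variance of the N-frame average
    c = (N - 1) / N
    var_G = c * N * lam    # E[Var(G|S)] = c * E[S] = c * N * lam
    cov_SG = 0.0           # E[G|S] = 0  =>  Cov(S, G) = 0
    return var_S + var_G + 2 * cov_SG

for N in (2, 4, 7):
    assert abs(corrected_variance(N, 100.0) - N * 100.0) < 1e-9
print("Var[T] = N*lambda for all tested N")
```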