Modern microscopy techniques rely on many components that are remotely controllable, which makes it possible to implement control loops that limit the need for human-supervised operation. Auto-focusing systems, in particular, are used extensively in the acquisition of time-lapses in developmental or cellular biology, or to automatically image slides in a slide scanner. In the former application, imaged specimens tend to drift from the focal plane over time because of specimen growth, flow of the medium, or motion caused by temperature changes. In the latter case, variability in the mounting of the slides requires per-slide adjustment.
Autofocus (AF) systems seek to determine the optimal shift by which to adjust the axial position of the sample to maximize image sharpness. AF solutions can be hardware-based (e.g., laser-based sensing of sample drift [liron_laser_2006] or phase detection by an auxiliary sensor [silvestri_rapid_2017]) or image-based; the latter do not require any modification of the microscope's optical path, as the focus score is retrieved from the image itself [sun_autofocusing_2004].
We can classify image-based AF algorithms into two categories. The first comprises AF methods that use iterative minimization of a one-dimensional objective function, the focus score, to move the object to the point at which it is sharpest. Because the output of this function depends on the sample and cannot be predicted in advance, the AF routine has to acquire tens to hundreds of images at different axial positions in order to converge to a non-local optimum [sun_autofocusing_2004]. A high number of image acquisitions can be damaging for the sample, especially in fluorescence microscopy [magidson_circumventing_2013].
Additionally, existing objective functions only give a meaningful result in the neighborhood of the focal plane and lose information (i.e., the gradient of the curve is zero) farther away from it. Furthermore, depending on the software implementation and the imaging modality, the acquisition of hundreds of images can take up to several minutes. The second category comprises single-shot AF techniques, which need only one or a few images. Using end-to-end convolutional neural networks (CNNs), they take an image as input and directly infer the optimal shift to bring the sample into focus [wei_neural_2018, jiang_transform-_2018, pinkard_deep_2019]. The drawback of these direct methods is that a long, computationally intensive CNN training on a microscope objective-specific dataset must be repeated whenever the optical system changes. Furthermore, these methods are not directly available in open microscope control software such as µManager [edelstein_computer_2010].
In this paper, we propose a local, CNN-based focus scoring function that remains nearly invariant when imaging different types of samples or modalities on any given microscope. We developed a correlation-based AF algorithm that takes advantage of the broad shape and unimodal minimum of this function, which speeds up convergence and remains effective even when the imaged object is far from the focal plane (several times the depth of field (DOF), see Fig. 1). Since our CNN-based method does not require a microscope-specific training dataset beyond a single stack of an arbitrary object, it is essentially plug-and-play.
This paper is organized as follows. In Section 2, we present the blurriness scoring function, the calibration process, and the AF algorithm. In Section 3, we experimentally verify the scoring function's assumed invariance to a variety of samples and characterize its performance with respect to the number of images and in comparison to common AF scoring functions, using both simulated and experimentally acquired data. We discuss our findings and conclude in Section 4.
2.1 Problem statement
We consider a specimen, modeled as a 2D manifold in 3D space (such as a thin microscopy slide), that we wish to image with a widefield microscope in bright field, fluorescence, or phase contrast. The entire specimen or some regions in the field of view (FOV) can be out of focus and outside of the DOF (see Fig. 1). We assume the microscope has a motorized stage for adjusting the focus. We aim to find the optimal axial shift by which to adjust the sample position such that it is in focus. We seek a solution that (i) does not require a manually selected reference image to be matched (so that the method can be used both for maintaining focus in live time-lapses and for imaging collections of fixed samples), (ii) requires a minimal number of images (to limit photodamage), and (iii) does not require imaging calibration specimens (PSF measurement beads, etc.) or large-scale, microscope-specific training.
2.2 Method description
The principle behind our proposed algorithm is to measure a blurriness score for a small number of images acquired at different focus positions, resulting in a set of position-score pairs, and to determine the necessary focal shift for which these pairs best match a microscope objective-specific, sample-invariant depth-blurriness response curve, using cross-correlation. A similar curve-invariance assumption underlies the model-based curve-fitting approach of [yazdanfar_simple_2008].
For this approach to work, we need a focus estimation function that is invariant to the sample shape or texture (sample-invariance) but co-variant with the sample's axial position, and that remains informative beyond the immediate vicinity of the focal plane. To this end, we chose an estimator of the local optical properties of the microscope objective [shajkofci_semi-blind_2018]. Briefly, it relies on a trained CNN to regress the parameters of a Zernike polynomial PSF model [von_zernike_beugungstheorie_1934], given a blurry image patch as input. Here, we use the estimated Zernike coefficient corresponding to focus as a blurriness score, which provides, given an image acquired at a given depth as input, a local blurriness score at that depth.
The trained CNN [shajkofci_semi-blind_2018] does not require re-training when used on different microscopes or different microscope objectives and produces a curve whose shape (up to an axial scaling) is invariant to the sample (an aspect that we verify experimentally in Section 3.1). To determine the axial scaling, which is instrument-dependent, we require a calibration step consisting of the acquisition of a full stack of an arbitrary planar and textured object. This yields a blurriness curve, which we center so that its minimum lies at the origin.
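As an illustration of this calibration step, the centering of the measured curve can be sketched as follows (a minimal sketch; the function name and the synthetic quadratic scores are our own stand-ins for the CNN output, not the plugin's implementation):

```python
import numpy as np

def center_calibration_curve(z_positions, scores):
    """Shift the axial axis so that the minimum of the depth-blurriness
    curve lies at the origin (stand-in for the paper's calibration on a
    stack of an arbitrary planar, textured object)."""
    z = np.asarray(z_positions, dtype=float)
    s = np.asarray(scores, dtype=float)
    z_sharpest = z[np.argmin(s)]  # coarse location of the sharpest plane
    return z - z_sharpest, s

# Usage on a synthetic curve whose sharpest plane lies at z = 2.0:
z = np.linspace(-10.0, 14.0, 49)
s = (z - 2.0) ** 2  # hypothetical blurriness scores
z_centered, s_centered = center_calibration_curve(z, s)
```

After centering, the curve's minimum sits at the origin, so measured scores from any stage position can be matched against it by a pure axial shift.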
We now describe our proposed AF algorithm, which follows the structure illustrated in Fig. 2 and is summarized in the following steps:
1. Fit the calibration curve to a Moffat distribution [moffat_theoretical_1969] and extract its FWHM. Initialize the iteration counter, let the current stage position define the initial focal plane estimate, and initialize a golden-section search (GSS) algorithm with an interval derived from the FWHM. Acquire images at the initial search positions and compute, using the CNN, their blurriness scores.
2. Check the convexity of the measured scores by fitting them to a quadratic polynomial. If the R² of the quadratic fit is higher than the R² of a linear fit, go to Step 6. Otherwise go to Step 3.
3. Increment the iteration counter, update the GSS triplet, and move to the new axial position.
4. Acquire an image at the current axial position.
5. Compute, using the CNN, the blurriness score of this image and go to Step 2.
6. Compute, using cross-correlation, the local optimal shift Δz minimizing the squared distance

Δz = arg min_Δz' Σ_i [ s_i − C(z_i + Δz') ]²,

where s_i is the blurriness score measured at axial position z_i and C is the centered calibration curve.
7. Move the sample by Δz, averaged over the regions of interest (ROIs) in the plane.
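To make the flow concrete, the convexity test of Step 2 and the correlation-based shift search of Step 6 could be sketched as follows (a simplified sketch under our own assumptions: a brute-force grid over candidate shifts rather than the plugin's actual implementation, and synthetic quadratic curves in place of CNN scores; sign conventions depend on the stage orientation):

```python
import numpy as np

def convex_enough(z, s):
    """Step 2 (sketch): accept when a quadratic fit explains the measured
    scores better than a linear fit, compared via R^2."""
    def r2(degree):
        residuals = s - np.polyval(np.polyfit(z, s, degree), z)
        return 1.0 - np.sum(residuals ** 2) / np.sum((s - s.mean()) ** 2)
    return r2(2) > r2(1)

def optimal_shift(z_meas, s_meas, calib_z, calib_s):
    """Step 6 (sketch): slide the measured (position, score) pairs along
    the centered calibration curve and keep the shift minimizing the
    squared distance to it."""
    candidates = np.linspace(calib_z[0], calib_z[-1], 2001)
    costs = [np.sum((s_meas - np.interp(z_meas + d, calib_z, calib_s)) ** 2)
             for d in candidates]
    return candidates[int(np.argmin(costs))]

# Usage: three scores measured near focus, calibration curve centered at 0
calib_z = np.linspace(-10.0, 10.0, 2001)
calib_s = calib_z ** 2                      # stand-in calibration curve
z_meas = np.array([0.5, 1.5, 2.5])
s_meas = (z_meas - 2.0) ** 2                # true focus sits at z = 2.0
shift = optimal_shift(z_meas, s_meas, calib_z, calib_s)
```

The grid search over candidate shifts is the simplest way to realize the cross-correlation matching; any 1D optimizer over the shift variable would serve equally well.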
3.1 Characterization of regression invariance to image diversity
Since our AF algorithm relies on the invariance of the blurriness score to the type of imaged sample, we investigated whether our proposed CNN indeed satisfies this condition and whether other (existing) focus metrics could be substituted for it.
We gathered images from the evaluation dataset of [shajkofci_semi-blind_2018] and blurred them with Gaussian PSFs mimicking a 10× objective for points sampled over a range of depths. In addition, we acquired stacks of fixed rat brain slices tagged with three fluorescent stains using a widefield transmission light microscope with a 10×, NA 0.3 objective over a range of depths. We then computed the focus scores using DeepFocus and other methods, including HPF, LAPV, SML [nayar_shape_1990], Tenengrad [sun_secrets_2010], EWC [hanghang_tong_blur_2004], and WS [liebling_autofocus_2004], which cover a broad range of focus measures, as reviewed in [price_comparison_1994, sun_autofocusing_2004, mateos-perez_comparative_2012, ali_analysis_2018].
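For reference, two of these hand-crafted scores can be sketched in a few lines of code (simplified variants: we use central differences in place of the Sobel kernels of the original Tenengrad, and the function names are ours):

```python
import numpy as np

def tenengrad(img):
    """Tenengrad-style score: mean squared gradient magnitude (here with
    central differences rather than the usual Sobel kernels)."""
    i = np.asarray(img, dtype=float)
    gy, gx = np.gradient(i)
    return float(np.mean(gx ** 2 + gy ** 2))

def lapv(img):
    """LAPV: variance of a 4-neighbour discrete Laplacian; sharper images
    retain more high-frequency content, hence a larger variance."""
    i = np.asarray(img, dtype=float)
    lap = (i[:-2, 1:-1] + i[2:, 1:-1] + i[1:-1, :-2] + i[1:-1, 2:]
           - 4.0 * i[1:-1, 1:-1])
    return float(lap.var())

# A high-frequency (sharp) image scores higher than a flat (blurred-out) one:
rng = np.random.default_rng(0)
sharp = rng.random((32, 32))
flat = np.full((32, 32), 0.5)
```

Both scores grow with high-frequency content, which is precisely why they plateau far from focus, where the blur removes that content.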
In Table 1, we report the average standard deviation (SD) of the focus score over all input images (scores normalized such that 1 and 0 correspond to the blurriest and sharpest values, respectively). We noticed, as illustrated in Fig. 3 (a) and (b), that DeepFocus' SD increased as the distance from focus increased (i.e., when the acquired pictures contain medium-to-high blur). A low SD implies that the score curve is similar across different types of imaged specimens. The other methods had a higher SD, which confirms the variance of these focus metrics with image diversity.
3.2 Characterization of the information content of the scoring function
We next investigated how robustly our proposed DeepFocus measure reports (de)focus information as the distance from focus increases, up to 10 times the DOF. We observed (Fig. 3) that focus metrics other than ours were unable to provide any information about the axial distance beyond a few times the DOF, as they reach a plateau that no longer varies as the distance increases further. Since the gradient in such plateau regions is small, minimization algorithms cannot converge quickly there. To quantify these visual observations regarding the uncertainty of recovering the axial distance z from a given score s, we computed the conditional entropy

H(Z | S) = − Σ_{s∈𝒮} Σ_{z∈𝒵} p(s|z) p(z) log ( p(s|z) p(z) / p(s) ),

where S and Z are random variables representing the calibration blurriness score and the axial distance, 𝒮 and 𝒵 their support sets, and p(s|z) the probability of a score s given the distance z. A high conditional entropy value implies a high uncertainty in detecting the right position z for a given score s. The results, compiled in Table 1, reveal that DeepFocus had a smaller conditional entropy than any of the other scoring functions. In the case of experimental acquisitions, we again observed an improvement in terms of entropy over the other methods. We further determined the threshold distance beyond which no distance information can be inferred from the image, i.e., when the image is too blurry for the AF to converge. DeepFocus retained depth information over a range equivalent, using the diffraction-limited DOF formula, to 11 times the DOF with the 10× objective. In comparison, metrics such as WS and SML achieved ranges of only 4 and 7 times the DOF, respectively.
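The conditional entropy above can be estimated from sampled (distance, score) pairs by histogram binning, using the identity H(Z|S) = H(Z, S) − H(S) (a sketch under our own binning assumptions; the paper does not specify its estimator):

```python
import numpy as np

def conditional_entropy(distances, scores, bins=16):
    """Estimate H(Z | S) in bits by discretizing the joint distribution
    of axial distance Z and blurriness score S into a 2D histogram."""
    joint, _, _ = np.histogram2d(distances, scores, bins=bins)
    p_joint = joint / joint.sum()
    p_score = p_joint.sum(axis=0)            # marginal over distances
    nz = p_joint > 0
    h_joint = -np.sum(p_joint[nz] * np.log2(p_joint[nz]))
    h_score = -np.sum(p_score[p_score > 0] * np.log2(p_score[p_score > 0]))
    return h_joint - h_score                 # H(Z, S) - H(S)

# A score that determines the distance leaves ~0 bits of uncertainty;
# a score independent of the distance leaves the uncertainty high.
z = np.linspace(0.0, 1.0, 1000)
h_deterministic = conditional_entropy(z, z)
h_independent = conditional_entropy(z, np.random.default_rng(0).random(1000))
```

An informative score function thus corresponds to a low H(Z|S): knowing the score pins down the axial distance.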
3.3 Characterization of the AF error as a function of the number of acquisitions
We finally investigated how accurately DeepFocus could retrieve the focal distance as a function of the number of images acquired. We used 100 blurred images from the dataset generated in Section 3.1, each with a known in-focus position, and computed the distance between that position and the position output by the AF. We also compared our method to other autofocus scoring functions (for which we used a bounded Brent's method as the optimizer). The results are summarized in Fig. 4.
We observed that our proposed AF converged rapidly (3 iterations), while the other focus functions needed more than twice as many images to reach a similar focus accuracy. With 8 iterations or more, we did not notice better accuracy with our method compared to Tenengrad or HPF.
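For context on the baselines' image budget, a plain golden-section search over a unimodal focus score illustrates how each score evaluation costs one acquired image (a sketch; the comparison above used a bounded Brent optimizer, which has a comparable per-image cost):

```python
import math

def golden_section_search(score, a, b, n_images=10):
    """Minimize a unimodal focus score over [a, b]; every call to
    score() models one image acquisition, so n_images caps the budget."""
    inv_phi = (math.sqrt(5.0) - 1.0) / 2.0   # 1 / golden ratio
    c, d = b - inv_phi * (b - a), a + inv_phi * (b - a)
    fc, fd = score(c), score(d)
    for _ in range(n_images - 2):            # two images already spent
        if fc < fd:                          # minimum lies in [a, d]
            b, d, fd = d, c, fc
            c = b - inv_phi * (b - a)
            fc = score(c)
        else:                                # minimum lies in [c, b]
            a, c, fc = c, d, fd
            d = a + inv_phi * (b - a)
            fd = score(d)
    return (a + b) / 2.0

# Usage: a hypothetical score whose sharpest plane sits at z = 3.0
z_best = golden_section_search(lambda z: (z - 3.0) ** 2, -10.0, 10.0)
```

The bracketing interval shrinks by a constant factor per acquired image, which is why purely iterative schemes need several times more images than the correlation-based shortcut.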
4 Discussion and conclusion
In our experiments, we showed that the variance of the focus score over multiple images was usually lower with DeepFocus than with other focus scoring functions, especially near the focal plane. Our explanation is that the CNN, already known to be translation-invariant [lecun_learning_2012], has been trained specifically to recognize the PSF parameters without discriminating on the input image type and position. By contrast, hand-crafted features such as HPF are computed directly from the image content and therefore differ from one image to another. When the image is acquired at a large distance from the focal plane, we noticed a loss of spatial features in the acquired image, due to the large FWHM of the PSF that degraded it. Nevertheless, we were able to retrieve depth information from images up to 2.5 times farther away from the focal plane than with other methods. This can mostly be explained by the fact that DeepFocus computes features from a 128×128 px window, while gradient-based methods use a much smaller window, such as 3×3 or 5×5.
In summary, we developed an AF method based on the combination of a CNN scoring function and optimization algorithms that rely on the invariance of that scoring function. We showed that DeepFocus is robust to changes amongst samples, which enables the retrieval of the optimal axial shift using a correlation-based optimization process that needs as few as 3 images to converge. Our method is currently limited to imaging thin samples, and further work will investigate the procedure for thicker objects. We implemented the calibration step and the AF algorithm as two plugins (Java with a PyTorch [paszke_automatic_2017] backend) for the µManager microscopy acquisition engine [edelstein_computer_2010], which we will make available upon acceptance.