In recent years, data-driven artificial intelligence (AI) methods have dominated automated pattern search. In particular, deep learning (DL) approaches successfully reduce the need for feature engineering by leveraging a large amount of data to optimize the model’s parameters and find the most important data features[lecun2015deep]. DL is more and more broadly used in many scientific disciplines. Deep neural networks trained on huge datasets are irreplaceable in modern imaging diagnostics in medicine and allow for fast and precise clinical decisions and medical procedure implementations [NMI1, NMI3, NMI4, NMI5, NMI6]. Another example is the application of DL in cancer research and diagnostics [CR1, CR2, CR3, CR4], where it improves automation and opens new avenues for artificial intelligence-assisted precision oncology and medicine in general. Overall, the implementation of AI provides a new quality of medical tools and diagnostics enabling faster and more efficient healthcare procedures.
Another area where AI could be implemented is microbiology with regard to standard procedural requirements in the pharmaceutical, cosmetic, or food industries. The manufacturing processes are subjected to strict policies and regulations listed, for instance, in Pharmacopoeia [bib:M1, bib:M2, bib:M3] or European Medicines Agency (EMA) guidelines [bib:M4], including extensive regulations governing microbiological purity in industrial areas [bib:M5, bib:M6]. The above requirements obligate manufacturers to perform constant microbiological monitoring, which means thousands of samples analysed by experienced microbiologists. In most production plants, microbiological procedures are entirely manual - starting from sample collection and culturing to the final plate evaluation with microbial colony count and identification. Large companies start implementing laboratory automation systems to speed up the operating process [bib:Ferrari2017, bib:FaronCOP]. Taking into account the current industry demand, we aim to develop an effective automatic analysis of microbiological samples based on DL algorithms, performing an efficient and precise microbial sample analysis.
The existing solutions for microbial colony analysis provide only some level of automation. Commonly, the process is limited to generating the photo of a microbial culture grown on a Petri dish, which is further investigated and manually curated by microbiologists. However, the full automation of microbial sample analysis might be accomplished by the implementation of deep neural networks providing robust and accurate algorithms for microbial colony localization, classification, and counting. DL-based solutions can generalize different acquisition setups and new microbe species easier than taditional computer vision algorithms, which strongly depend on hand-crafted feature selection[ZD1]. As DL requires well-balanced and diverse data to optimally fit the model’s parameters, there is a need to create huge publicly available datasets for neural network training and, consequently, achieving their best performance.
The main objective of our work was to develop a DL-based methodology to identify and count microbial colonies based on the photo of a standard agar plate culture. At first, we prepared the Annotated Germs for Automated Recognition (AGAR) dataset consisting of over 330k labeled microbial colonies (Staphylococcus aureus, Bacillus subtilis, Pseudomonas aeruginosa, Escherichia coli, and Candida albicans) localized on 18k annotated Petri dish photos. Secondly, we evaluated the selected deep neural networks, such as Faster R-CNN [bib:Faster2015] and Cascade R-CNN [bib:Cascade2018] architectures for object detection, with several different backbones – ResNet [bib:resnet2016], ResNeXt [bib:resnext2017], and HRNet [bib:hrnet2020] – to set benchmarks for the task. The whole process of collecting and annotating the data as well as the deep learning analysis is presented in Fig.1.
AGAR dataset preparation
Designing a dataset requires several decisions to be made. In the case of data imaging for microbial colony counting and identification, one needs to select microorganism species to study, provide proper growth conditions (applied medium, incubation temperature and time, culture dilution factors, etc.), and choose an appropriate photo acquisition setup (light source location, external light handling, or camera type). Such decisions for the AGAR dataset were based on the goal of making a diverse dataset for automatic colony counting on agar plates which could be easily reproduced in any microbiological laboratory.
The selection of standard microorganisms for the AGAR dataset preparation was based on Pharmacopoeia [bib:M1, bib:M2, bib:M3] regulations and ATCC guidelines [bib:M15]. Five representatives from different bacterial groups (S. aureus subsp. aureus ATCC 6538, B. subtilis subsp. spizizenii ATCC 6633, P. aeruginosa ATCC 9027, E. coli ATCC 8739), and C. albicans ATCC 10231 as a yeast strain were chosen. A series of 10-fold dilutions of refreshed cultures were made and 100 ul were inoculated onto Trypticase Soy Agar (TSA) plates in five technical replicates. The microbial cultures were then incubated at 37oC for 18-24 hours. For data acquisition, single and mixed cultures (S. aureus & P. aeruginosa/E. coli/C. albicans; P. aeruginosa & E. coli/C. albicans; E. coli & C. albicans) were used. The images of colonies grown on agar plates were taken using one of two cameras. Thus, the whole photo dataset was divided into two major subsets: higher-resolution (4000 x 6000 px) and lower-resolution (2048 x 2048 px). Moreover, three different subgroups were distinguished in the former subset: bright, dark, and vague based on different illumination conditions in which the photos were taken. For visualization of plates captured in the various setups see Fig. 3, whereas details on microbial colony growth conditions and acquisition setups are provided in Supplementary Sections 1.1 and 1.2.
A web application was developed for microbiologists to upload and annotate agar plate culture photos. Each sample recorded in the database was marked as countable, uncountable, or empty (Fig. 2(d)). It was important to establish a countable range for manual counting, which is generally between 30 - 300 colonies for a standard 100 mm Petri dish [bib:Tomasiewicz1980, bib:Sutton2011]. Every countable sample was manually labeled by microbiologists using a bounding box and species label per colony. Samples were classified as uncountable if more than 300 colonies were present, or if no discrete colonies were seen due to their agglomeration or blurred shape.
In total, 18 000 photos taken under diverse lighting conditions were collected in the AGAR database. Altogether, 336 442 colonies of 5 standard microbes were annotated. The summary of the AGAR dataset statistics is presented in Fig. 2(b). The AGAR dataset achieves a good balance in the number of instances for different microbes (see Fig. 2(a)), which is important for building a robust deep learning model. The interquartile range for all 12 271 countable samples is between 4 and 38 colonies, and around 84.4% of these samples have less than 50 colonies. The histograms of the number of annotated colonies per countable image are presented in Fig. 2(c). Additional information of provided annotations and bounding box size distribution among microbial classes can be found in the Supplementary Sections 1.2 and 1.3.
Deep Learning Analysis
To verify the suitability of the AGAR dataset for building deep learning models for image-based microorganism recognition, and to set a benchmark for the task, we evaluated the performance of the two neural network architectures for object detection: Faster R-CNN [bib:Faster2015] and Cascade R-CNN [bib:Cascade2018], with four different backbones: ResNet-50 [bib:resnet2016], ResNet-1010 [bib:resnet2016], ResNeXt-101 [bib:resnext2017], and HRNet [bib:hrnet2020]. The implementation from the MMDetection [bib:mmdetection2019]
toolbox was used. The backbones’ weights were initialized using models pre-trained on ImageNet[bib:imagenet2009], available in the torchvision
package of the PyTorch library. More technical details are provided in Supplementary Section 4.1.
The quantitative results for each subset of the AGAR dataset analyzed separately are reported, as well as the results for models trained on the whole dataset. Each subset was split randomly into approximately 75% samples, which constituted the training set, and 25% for the validation and testing set. Due to the relatively high resolution of all images, the samples were divided into patches of 512×512 px to avoid scaling down, which would make the recognition of small colonies more challenging. If necessary, zero paddings were applied to get a proper aspect ratio. As the goal of this study was to establish a baseline for the microbe detection task, the neural network architecture was not modified, thus the patches were resized to match the default input layer size. More details on the data split and patch preparation can be found in Supplementary Section 2.1.
Several approaches have been undertaken for data augmentation, including the addition of Gaussian blur, salt and pepper noise, changing the image color space to LAB and HSV, preparing histogram equalization, rotating images in the [-45, 45] degree range, or cropping around the annotated bounding boxes, but always keeping a visible microbe object. However, the best results were obtained through splitting the image into patches and normalizing it using means and standard deviation per channel values, the same as for the Common Object in Context (COCO) dataset[bib:COCO2014].
The evaluation metric for colony detection was based on the mean Average Precision (mAP) previously established for the COCO competition[bib:COCO2014]. The Average Precision (AP) is calculated as the average value of precision on the precision-recall (PR) curve for a given Intersection over Union (IoU) threshold. IoU describes the level of overlapping between truth and predicted bounding box. The mean value of AP for different IoU thresholds and/or classes gives the mAP. To measure the effectiveness of the colony counting, two separate metrics were provided, namely Mean Absolute Error (MAE), its cumulated variation (cMAE), and symmetric Mean Absolute Percentage Error (sMAPE), and evaluated for different models on every data subset. Our post-processing of the predicted bounding boxes is described in detail in the Supplementary Section 2.2, while the calculations of all used metrics (mAP, MAE, cMAE, sMAPE) in the Supplementary Sections 3.1 and 3.2.
The analysis for each of the major subsets of the AGAR dataset (higher- and lower-resolution) was performed independently. The vague subset was excluded from the studies. It was later used as a different domain representation in the studies on the models’ ability to generalize to new acquisition setups, see Supplementary Sections 4.7 and 4.8 for details. The results for the baseline model (Faster R-CNN with ResNet-50), along with more complex, multi-stage detectors (Cascade R-CNN with HRNet), are presented in Fig. 4. The visualisation of predictions made by Cascade R-CNN with HRNet backbone is shown in Fig. 3. Detailed descriptions of results achieved by other tested detectors are placed in Supplementary Sections 4.2 and 4.3.
Our main outcomes in terms of mAP are presented in Fig. 4 (a). In general, Cascade R-CNN performs slightly better with mAP scores equal to 52.0% for the higher- and 59.4% for the lower-resolution subset, compared to Faster R-CNN, which achieved 49.3% and 56.0%, respectively. A comprehensive study showed that the increase of backbone complexity gives just slightly better results (see Supplementary Section 4.2), which suggests that, in this class of architectures, the model’s capability is reached and extending backbone networks further might not lead to much better performance.
The average precision calculated per microbe is higher for microbes forming smaller colonies, namely S. aureus and C. albicans
. The rest of the studied microorganisms tend to aggregate or overlap, especially in low dilution samples (more colonies on the plate surface). Moreover, larger colonies tend to have blurred edges with lower contrast with regard to agar substrate. This all leads to less accurate microbe detection in terms of mAP. The mAP score measures the quality of detection using IoU, which is related to true and predicted bounding boxes overlapping. For the colony counting, the recognition of microbes is crucial, but the exactness of their location is of secondary importance. Therefore, the sMAPE and MAE metrics were used for tuning the algorithms applied in the post-processing stage to merge predictions for individual patches into the whole test image. The values of two parameters, namely classification probability threshold and NMS threshold for the modified soft non-maximum suppression algorithmm[bib:NMS2017], were established on the training dataset and then used to eliminate the excess of bounding boxes (e.g. appearing twice through the edges of neighbouring overlapped patches) for the test data.
The results of microbe counting are presented in Fig. 4(b). Overall, Cascade R-CNN with HRNet predictions are more accurate with sMAPE equal 4.86% for higher- and 3.81% for lower-resolution subsets. It is worth noting that the most lightweight of the considered models (Faster R-CNN with ResNet-50) is also very effective with 5.32% and 4.68% sMAPE for the same subsets. As these metrics do not include the error coming from microorganism misclassification, cMAE being a sum of MAE calculated for every microbe separately is also provided (in brackets in altogether row). For cMAE, only properly recognized microbes are taken as a correct count. The small difference between MAE and cMAE (e.g. 1.57 vs 1.76 in case of Cascade with HRNet for lower-resolution subset) proves the great ability of the models to distinguish between different species.
The selected models’ predictions for microbial counting on test AGAR subsets are presented in Fig. 4(c). The straight black lines represent perfect results, while the blue ones indicate a 10% error to highlight the acceptable error range. The detectors tend to underestimate the exact number of colonies for more populated samples. This is largely due to the nature of the sMAPE metric, which weights errors inversely to the number of instances. Moreover, the AGAR dataset is dominated by plates with less than 50 colonies (see Fig. 2(c)), which impacts the values of the probability and NMS thresholds for post-processing. The extension for higher populated samples using double thresholding approach is described in Supplementary Section 4.6.
Although empty and uncountable
plates come with no colony annotations, qualitative studies were carried out to evaluate the performance of the trained models. Experiments on uncountable samples showed that many plates labeled by microbiologists as having more than 300 colonies were, in fact, within the countable range, as recognized by the detectors. As discussed above, the models are more accurate for samples with less than 50 colonies. However, they give very good estimates for plates with hundreds or even thousands of instances, correctly identifying single colonies in highly populated samples, with the maximum number of predicted colonies on one plate equal to 2782. It is worth noting that it takes seconds for the deep learning system, while it could take up to an hour in the case of manual counting.
The analysis of samples marked by microbiologists as empty presented the great capability of deep learning methods for microbiological quality control. The detectors were able to recognize colonies difficult to see and missed by human specialists. Overlooked microbes were usually small and located either at the edge of a Petri dish or on a plate factory marking. This is a strong indication that human-AI interaction can lead to a higher quality of microbiological analyses.
Further discussion on empty and uncountable samples can be found in Supplementary Section 4.5. The complementary analysis of the impact of the models’ initialization parameters are provided in Supplementary Section 4.4. The detectors’ ability to generalize to different AGAR subsets with transfer learning technique is described in Supplementary Section 4.7. The models’ performance on every subset when trained on the whole dataset is provided in Supplementary Section 4.8.
Discussion and Conclusions
There is a growing interest in applying computer vision algorithms to microbiological analyses, especially in industry and pharmaceutical branches, but many existing approaches are based on traditional image processing techniques [bib:Zhang2008, bib:Brugger2012, bib:Zhu2018, bib:A1, bib:C1]. Although deep learning methods have recently become more commonly used [bib:Nie2015, bib:Zielinski2017, bib:Talo2019, bib:Ferrari2015, bib:Ferrari2017, bib:Savardi2018], the proposed procedures are usually not end-to-end solutions and cannot be used independently to process an image of an entire Petri plate. The lack of effective DL-based approaches for microbiological purposes possibly results from the poor availability of huge datasets needed for deep neural network training. To fill this gap, the AGAR dataset is introduced with 18 000 photos of agar plate cultures (Petri dish) and 336 442 annotated colonies of five microorganisms (staphylococci, Gram-positive and negative bacilli, and yeast) most commonly used according to the Pharmacopoeia guidelines.
An earlier study by Ferrari et al. [bib:Ferrari2015, bib:Ferrari2017, bib:Savardi2018]
presented the creation of the MicrobIA, a smaller dataset composed of about 29 000 labeled segments (fragments of the Perti dish with single agglomerates) of bacterial colonies grown on blood and chromogenic agar plates. The segments were extracted with the WASPlab system from photos of whole Petri plates and manually assigned to one of seven classes (segments containing from 1 to 6 colonies and outliers). The MicrobIA dataset was then used to train a convolutional neural network to classify images into one of the six colony counts or the outliers category (bubbles, dust or dirt on the agar). Authors reported the per-colony error of 28% measured on the individual segments[bib:Ferrari2017]. Another study using the MicrobIA dataset [bib:Andreini2020] explored the possibility of image generation for agar plates’ segmentation and investigated the domain shift problem. It was proved that synthetic data can be applied to train a deep neural network for agar plate image segmentation.
In contrast, our study demonstrates the possibility of exploiting deep learning-based detectors to build end-to-end solutions for multiclass microbial colony recognition and counting. The provided benchmarks of the chosen DL models prove that the AGAR dataset can be used to build robust models, which adjust well to real data achieved in various acquisition setups, resulting in different illumination and resolution of a Petri plate photo. The exhaustive analysis of eight different deep neural networks for object detection was performed on two AGAR subsets referred to as higher- and lower-resolution. The best performing model used in our study (Cascade R-CNN with HRNet backbone) achieved the low counting error of 4.92% and 3.81%, respectively on both subsets. This is a significant improvement over previous reports describing the microbial colony counting with computer vision algorithms [bib:Ferrari2017]. In the case of detection, the mAP scores within the range from 49.3% to 59.4% were achieved for different detectors, which is an excellent result compared to other reports (44.6% for Cascade R-CNN and 36.7% for Faster R-CNN ) done with the same architectures on the COCO dataset.
In summary, the selected R-CNN models perform very well in detecting microbial colonies. This is likely because the detected instances have similar shapes and all species of microbes are well represented in the training data. Moreover, the results obtained with base Faster R-CNN and more complex Cascade R-CNN do not differ much. For that reason, exploring different state of the art AI approaches, such as Transformers [bib:Zhu2020], seems to be the right future direction for the implementation of deep learning in microbiology.
There is a visibly increasing demand for artificial intelligence in numerous human activities, including social interactions, industry, agriculture, and medicine. The AGAR dataset, which compiles a huge variety of Petri dish culture photos, is a great data source for further exploration. The use of generative models on AGAR might, for example, improve the performance of colony detection on a wider variety of images from other domains, as well as with a larger number of microbial species. Additionally, the large variety in the distribution of colony sizes shows the potential of the AGAR dataset for detailed studies of colony growth dynamics, which can, for example, help us better understand antibiotic resistance of bacteria.
Project “Development of a new method for detection and identifying bacterial colonies using artificial neural networks and machine learning algorithms” is co-financed from European Union funds under the European Regional Development Funds as part of the Smart Growth Operational Program. Project implemented as part of the National Centre for Research and Development: Fast Track (grant no. POIR.01.01.01-00-0040/18).
Author contributions statement
SM, JP, TB, AH and AP contributed to the DL analysys code. GG and JR prepared microbiological samples, captured photos of standard agar plates culture and annotated them. AL and TG prepared an application to annotate images. SM analyzed and interpreted the AGAR datasets statistics. SM and JP planned the experiments and developed DL-based methodology to identify and count microbial colonies. JP and SM performed calculations and analysed the results of the experiments. ZDK led the microbiologist team, while TG led the DL team during the whole project. TG and ZDK supervised the project.
All authors provided critical feedback and helped shape the research, the analysis and the manuscript.