1 Introduction
Dilution experiments are an important tool to detect the presence of microbes, even in very low concentrations, relying on basic microbiology techniques and relatively simple laboratory equipment. The microbes are first sampled from a natural environment, such as soil or drinking water, or from an engineered benchtop reactor system where they can be grown planktonically or in a biofilm before potentially being exposed to some treatment. Whatever the case, the sample containing microbes is transferred into a volume of liquid in tube 0, most commonly buffered water (Rice et al., 2017) but any appropriate liquid medium may be used. It is then of interest to estimate the number of microbes in this tube 0, i.e., the abundance of microbes in the original sample. Analyzing the original sample directly may be possible using specialized equipment and more complex lab processes (e.g. a confocal scanning laser microscope Pitts and Stewart, 2008; Parker et al., 2018a). Instead, subvolumes from tube 0 may be “spreadplated” or “dropplated” (Herigstad and Hamilton, 2001) onto growth media in a Petri dish to let bacteria grow until distinguinshable by the naked eye (see Figure 1(c)). When plating directly from samples that contain a high density of microbes, it is not possible to identify and count individual colonies, in which case it is necessary to dilute the original volume until only a few colonies can be counted after plating in a Petri dish. From these counts at some dilution(s), the microbial abundance is inferred in tube 0 and hence in the original sample. The process is called a dilution series and is described as follows.
To begin the dilution series, from the volume in tube 0 a subvolume is taken which is diluted at a factor , typically , to form a new volume in a new tube at dilution . This process is repeated for : a subvolume is taken from the volume in tube , and then diluted by a factor to form a new volume in a new tube at dilution . A smaller volume is then removed from each tube and then plated onto growth media in a Petri dish, and kept at optimal environment conditions (e.g., temperature and pH) to allow any viable microbes to grow and reproduce. Individual microbes or clumps of microbes will start forming colonies in the Petri dish, until, at a high enough dilution, they are distinguishable to the naked eye as dots called Colony Forming Units (CFUs).
Estimating microbial abundances from colonies dates at least as far back as the seminal work of Robert Koch in the 1880’s (Prescott et al., 1996, p. 9). These estimates are expressed as CFUs rather than as the number of microbes because of a number of known limitations, the most obvious being the questionable assumption that each colony arises from an individual cell (Prescott et al. (1996, p.119); see also Cundell (2015) for a more recent discussion). Still, especially for a single microbial species isolated from a consortia in an environmental sample, or for a monoculture grown in the laboratory, the CFU remains a useful quantitative measure for estimating microbial abundances. Dilution series for CFU counting are performed routinely in many government, academic and private laboratories for experimentation as well as for testing and public standard compliance (see for example BenDavid and Davidson, 2014; FDA, 2018, or a “dilution series” search in fda.gov). Indeed, CFU counting is a required international metric for assessing the efficacy of antimicrobial treatments in North America and Europe (Parker et al., 2018b).
In a dilution series, for some lower dilutions, too many CFUs may cluster and will be impossible to count and are reported as “too numerous to count” (TNTC). For higher dilutions there will eventually be no CFUs (no microbial activity). Commonly, one dilution is then selected, namely dilution , , for CFU counting, having a minimum number of distinguishable CFUs per plate or drop, and referred to as the lowest countable dilution. From this, the microbial abundance in the original sample is to be estimated. The crudest estimate is the (number of CFUs) , where and is the ratio of the dilution tube volume over the volume plated, or drop volume, . The usual formula used by practitioners is (number of CFUs) , which is precisely equal to the latter. See Figure 1 for an illustration of the process, CFU counting and basic data analysis.
Instead of taking solely the lowest countable dilution, a possible variant is to weightedly average the CFUs across multiple dilutions (Hedges, 2002; Maturin and Peeler, 1998; Niemela, 1983; Niemi and Niemela, 2001; Parkhurst and Stern, 1998) that is motivated by the HorvitzThompson estimator, popular in field ecology (Horvitz and Thompson, 1952). Hamilton and Parker (2010) argue that the added information is minimal for common dilution experimental designs. Our investigation (presented here) leads to this same conclusion, which supports the microbiologist’s conventional practice of using data from only the first countable dilution to estimate the microbial abundance in the original sample.
(a)  (b)  (c) 
In most situations the abundance of microbes in the original sample spans several orders of magnitude and therefore it is common to estimate a log abundance that is the transform of the CFU count in dilution 0. The estimated mean log abundance from statistical analyses can easily be normalized to a mean log density per unit of the volume or surface area of the original specimen, , by simply subtracting by .
The efficacy of an antimicrobial treatment is usually quantified by a log reduction (LR) which is the estimated mean log abundance of microbes that survived the treatment subtracted from the estimated mean log abundance of microbes in a concurrent control. The microbes in the control samples are subjected to the same conditions as the treated samples with the exception that the controls are subjected to a placebo treatment. Perhaps not surprisingly, the log abundances of the microbes in the control samples are typically much less variable than the microbes subjected to an antimicrobial (Parker et al., 2018b)
. Hence the constant variance assumption of many statistical models is often violated when analyzing a data set that includes both control and treated samples.
There are two common approaches to analyzing CFU data. One uses a Poisson likelihood model of the counts (Haas et al., 1999), while the other uses a lognormal likelihood model of the counts (Hamilton et al., 2013). Both maximize the likelihood to provide microbial abundance estimates and quantify uncertainty. Both also can be extended to handle random effects (Zuur et al., 1999) due to samples being repeatedly collected from the same site, experiment and/or the same laboratory. Software is readily available to fit either of these types of models (see, e.g., Bates et al., 2015).
Haas et al. (1999, p. 213) argue that a Poisson maximum likelihood estimator (MLE) approach is preferred over the lognormal MLE. One reason is that the Poisson likelihood naturally deals with zero CFU counts. The obvious downside to the Poisson model is the requirement that the variance is equal to the mean. The control data that we present here clearly do not adhere to this restriction. The Generalized Poisson or Negative binomial are both extensions of the Poisson that allow for the variance to be independently estimated from the mean (Joe and Zhu, 2005). Nonetheless, neither of these Poisson approaches allow one to directly analyze LRs.
The lognormal MLE approach overcomes the restriction that the mean equals the variance by including separate parameters for the mean and variance. To deal with the differing variability of microbes treated by antimicrobials versus controls, one can aggregate the log abundance estimates into LRs and then apply a normal model to the LRs (Hamilton et al., 2013). Unfortunately, this approach does not allow one to separately model or estimate the variance among counts on different plates at different dilutions. When presented with CFU data including a zero or TNTC, a common tactic when using a lognormal likelihood is to substitute in a small value (CFU = 0.5 or 1) for zero and to substitute in the largest count for TNTC (30 for the drop plate method, 300 for the spread plate method). Many published simulation studies show that as long as there are not too many censored data ( of the total data set), the lognormal model has little bias and mean squared error when estimating the mean log abundance of organisms (see, eg., Clarke, 1998; Haas and Scheff, 1990; Singh and Nocerino, 2002; EPA, 1996).
The approach that we present here has several advantages over the conventional approaches just described.

Any censored data are directly modeled in the binomial likelihood (i.e., no substitution rules are employed) resulting in zeros and TNTCs handled systematically by the same model.

Counts from multiple dilutions are directly incorporated into the model (ie. they are not aggregated together before statistical analysis), which allows us to separate out the variance among counts on different plates and on different dilutions from other sources of variance that contribute to the variance of log abundances and LRs.

Our model accounts for clustering of the cells (with the miscount probability
) that violates the assumption that one microbe generates one CFU. This aspect of the model can also deal with miscounts by the technician who actually counts the CFUs. 
Instead of summarizing the results from each experiment by a LR and then analyzing the LRs using a normal model, we provide an overarching hierarchy for the analysis of LRs that has the explicit information regarding the CFUs and dilutions that led to the LR.
The paper is organized as follows. In section 2 we build our binomial model from the description of the experimental design. Using a basic theorem a simplification is obtained leading a straightforward likelihood. We also describe our Bayesian inference process and a hierarchical modeling strategy to analyze intra and interlab data and the MCMC method used. Two examples are presented in section 3 considering real data and in section 4 we present a discussion of the paper.
2 Model and Inference
As explained above, microbial samples are treated with, for example, a chemical agent at some concentration, or water temperature for a specified contact time, and the effect of this treatment is observed as the number of CFUs from surviving microbes using a dilution series. Commonly, several experiments may be conducted with the same treatment, and within each experiment, multiple repetitions or samples are also considered. To ease analysis and notation, we concentrate on one experiment for a single treatment. We make a comment on the analysis of multiple treatments in section 4.
Throughout, we will refer to the plate or drop from which CFUs were counted only as ‘drop’; our model may consider in principle any such volume from which CFUs are counted simply by using a different division factor .
Let be the number of repetitions of microbial samples to be analyzed in one experiment. Each sample, grown in an original surface area, volume etc. , is transported into a volume (typically buffered water) and carefully homogenized in tube 0 of each repetition . Some fraction of the volume in this first tube is taken to tube 1, diluted in volume and carefully homogenized. This process is repeated to produce tubes for each repetition. Tubes 1 through will have volume and is the proportion of volume in tube 0 vs the common volume in the rest of the dilution series tubes. In our examples (intralab example studying heat treatments) and and (interlab example studying bleach treatments).
‘drops’ of volume are then plated from each dilution tube. For example, for the drop plate method, a total of drops, each with volume l, are usually plated. For the spread plate method, drops, each with volume ml or ml, is common. One or several Petri dishes may be used to grow CFUs from these drops, but we will not distinguish between different dishes from the same dilution tube. For the proportion plated is and for .
Let be the r.v. representing the number of CFUs in dilution for repetition and let be any particular realization for it (we use the standard probabilistic notation of upper case being the r.v. and lower case a particular value for it, eg. , the probability of being less or equal to the particular value conditional on ) . Let be the CFU count in drop , for dilution for repetition . Our approach is to build a hierarchical model following the dilution process and the counting process just described. Due to homogenization, we can safely assume that and
are binomial random variables. Namely
(1)  
(2) 
for , where is the Dirac function. Here is included as an additional probability that each individual microbe in each drop actually does form a distinguishable colony and adds to the CFU count.
The proposed model above brings up an interesting philosophical point. This binomial model is simply describing the way that the experiment is conducted, as opposed to a Poisson or lognormal. As explained in the introduction, the above model is in fact describing the experimental design with no further assumptions in the statistical modeling than those already assumed by the experimenters conducting the dilution experiments. These assumptions are that: through careful homegenization, a drop from dilution has a proportion of CFUs from dilution 0 and dilution has a proportion of CFUs from dilution . Moreover, only a small proportion (maybe less that 5%) of CFUs fail to make countable colonies.
Moreover, use of the binomial permits direct estimation of the number of CFUs (’s) with no need for scaling and alleviates the main issues of the other models: over and underdispersion in the Poisson case and the use of substitution rules for 0’s and TNTCs in the lognormal case.
To model all repetitions in a single experiment, and because it is common to consider log abundances when is large, we take the usual approach of constructing a model linking all log abundances , , as realizations from a population having a common mean with some dispersion. This again reflects/models what is being done in the lab: The intention of performing repetitions is to try to estimate the (log) abundance of the experiment itself, and asses its variability by performing repetitions under repeatitable conditions. Accordingly is interpreted as the mean log abundance for a single experiment, which we will infer, and using a Bayesian approach, it will be taken as a r.v. and its posterior distribution estimated. However, before describing the details of the latter, we first comment on the following two points.
First, as opposed to usual practices, from the onset we add 1 when calculating the log abundance to properly define it when , that is, when there are only zero CFUs as occurs when no microbes survive an antimicrobial treatment. Note that for (as almost always occurs for control samples), is already nearly the same as , so the interpretation of the log abundance defined as should be straightforward for these large values of . When is closer to 0, then a log abundance is less useful. Instead, one can examine directly and its posterior for statistical inference. Still, Taylor’s Theorem shows that for small , which suggests defining the log abundance as even in this case. Moreover, this definition will permit us not only to be consistent in all cases, but also to discuss the limit of detection (LOD) when no counts are detected; see section 2.2.
Second, it is common to normalize abundances to the volume or surface area of the original specimen, , via . We recommend applying this normalization to convert log abundances to log densities by . This brings up an interesting point that is not well appreciated by microbiologists. Changing the units via clearly changes the mean log density but leaves the variance unchanged (due to the multiplicative property of the log transform). For example, for biofilm samples, when changing the units from CFU/mm to CFU/cm,
decreases by a factor of 100, so that the mean log density increases by 2 but the standard error of the sample mean (SEM) and the SD of the individual log densities remain unchanged. Hence, any frequentist hypothesis test of the population mean of
that depends on a ratio of the sample mean to the SEM will always become “statistically significant” for a drastic enough change in units. This issue is mitigated when considering the log abundance of microbes compared to a concurrent control via a log reduction (LR), as occurs when assessing antimicrobial treatments. This is because the LR is unitless.Returning to the experimental mean log abundance , the first and nearly default modeling approach would be taking
(3) 
or . This model might indeed be appropriate for relatively small , to mantain positive, but the Gaussian model is certainly not well suited in general. To make a more robust modeling approach, first assume that
(4) 
ie. . Second we introduce as a dispersion parameter, to stipulate the model
(5) 
Here we use the parametrization for the Gamma distribution
where is the ‘scale’ parameter, and therefore the expected value above is preciselyas required. Moreover, its standard deviation is
and its signaltonoise ratio (ie. mean over standard deviation or the inverse of the coefficient of variation) is
, representing the unitless dispersion in the model which is specially well suited for positive r.v.’s such as . Note also that the above gamma model correctly generalizes the Gaussian model (3) since for large (eg. ) the gamma distribution is already close to a Gaussian and and this is precisely the case when we have relatively small variances (low coefficient of variation), making the Gaussian model appropriate. Then simply, (5) generalizes the default Gaussian model in (3) to positive only values, only using a different, and perhaps better suited, parametrization.The specification above in fact creates a hierarchical model and using a simple well known result we may integrate out the and the binomial model in (1) becomes
(6) 
See further details on this and other technical elements of our model in Appendix A.
We use a Bayesian approach to make inferences for the parameters of interest by first stating prior distributions and performing an MCMC to sample from the posterior.
We require prior distributions for . We take a pragmatic approach in which the prior for
is a discrete uniform distribution from 0 to
, and , is a scale parameter. is a maximum physical capacity of CFUs for the surface with surface area or the volume with volume , etc. put to treatment. In engineered reactor systems especially, experimentalists know the maximum microbial abundance in a sample (i.e., they know ). Indeed, they choose which dilutions to plate based on this knowledge. is the shape parameter of the Gamma conditional distribution in (5). A simple approach is to take the prior for as exponential, resulting with most of its mass from 0 to . In the drop plate examples below we use and . This prior parameter for A () was calibrated such that each had nearly the same prior as , that is a (ie. a priori is approximately for all ; this may be done calibrating with simulated samples from (5)).We use the twalk (Christen and Fox, 2010) to produce a MCMC algorithm to sample from the resulting posterior distribution. The twalk is a self adjusting MCMC algorithm, that requires the log posterior and two initial points. In the resulting MCMC in all examples we typically generated chains of length 500,000 with an IAT (Geyer, 1992) of 50 leading to an effective sample size of roughly 10,000. In our Python implementation the corresponding computations took 50 seconds on a 2.2 GHz processor. By simulating initial values for each from its ‘free’ posterior (see Appendix A) the burnin resulted very short indeed in most cases. The initial value for is taken as the mean of the s and the initial value for is taken by simulating from its prior distribution. Overall, the MCMC is very robust, working nearly unsupervised in all the examples we tested, including all those presented here.
2.1 Goodness of fit of the binomial model
Our binomial model simply follows what is actually done in the laboratory and therefore we claim it models dilution series count data correctly. For example, it accounts for all extreme or censored data cases. However, there may be unaccounted sources of variability that could question the appropriateness of the binomial model. To support our claim that our approach is an overall better model for making the correct inferences, we here compare it to an alternative model. Obvious models to compare the binomial model with include the Poisson, negative binomial, or generalized Poisson. The generalized Poisson or negative binomial are both Poisson mixtures (Joe and Zhu, 2005), and would be good candidates for a comparison but it is not clear how to model the dilution series, as we did in (1) and (2) with respect to the parameter of interest
(the abundance in the original sample) to provide a fair comparison with our model. Certainly alternative definitions can be attempted to use some Poisson mixture as a model for dilution data, but a straightforward generalization of our binomial approach is a Betabinomial distribution.
The Betabinomial distribution is the result of a mixture of a binomial and a Beta distribution, and as such is a generalization of the binomial distribution. Namely, if
and then . That is, this generalizes the binomial distribution by letting the sucess probability to be random, resulting in a variance that is independent of the mean, whereas for the binomial model, its expected value over variance is always .It is clear now how to extend our binomial model in (6), namely,
where . Here as required, for any . That is, instead of setting as in (6) we let become a parameter, the r.v. , with , and in this way generalize the binomial model to a Betabinomial. Here
is an additional hyperparameter for the above Beta (prior) distribution for
.We use the Bayesian model comparison machinery, calculating the posterior odds of the Betabinomial vs the binomial model, namely, the Bayes Factors (BF) of the Betabinomial in favor of the binomial model
(see Kass and Raftery, 1995, for example). In either case, the posterior probability for
is discrete and its normalization constant is calculated with a sum; this normalization constant is calculated for each of the binomial and the Betabinomial models. The posterior probability of each model is proportional to its normalization constant and the BF is the ratio of these posterior probabilities or in fact the ratio of the normalization constants of each model.We still need to fix to stipulate the Beta prior distribution for in each case. If the prior mode for is at zero and would bias the Betabinomial model towards zero CFU counts in all cases. To avoid this we need , that is . We have seen that not restricting leads to a BF of practically zero in favor of the Betabinomial model, besides cases with zero CFU counts, as expected (results not shown). Since in most cases is quite large, then setting represents a neutral choice.
We performed the Bayesian model comparison on the data sets mentioned in section 3. For the 51 CBE and 18 interlab individual dilution series count data we found only 4 BF’s above 3: one for the former, with a BF = 15 (tube 3 of experiment 70 C, 10 min), and three for the latter data set, with BF’s of 15, 5.5 (from tube 1 of lab 5, tube 1 of lab 6 ) and 3.45. Leaving out those BF’s above 4, the rest of the BF’s are plotted in the boxplots in Figure 2. We are using the common recommendation that a BF above 3 provides positive evidence agains the default model (Kass and Raftery, 1995). These results do not provide strong evidence for the Betabinimial over the binomial model. Moreover, most of the BF’s were below 1 which is evidence in favor of the binomial model over the alternative, more general, Betabinomial.
2.2 Censored data, zero counts and level of detection
As explained in the introduction, the usual practice when counting CFUs is that, besides the actual dilution selected for counting, the other s are not recorded. Therefore, the CFUs in the first uncounted dilutions could be considered as censored data since we know that drops below the selected dilution are TNTC, and also CFU counts in drops above the selected dilution could also be counted, even if all zero. However, this added difficulty in recoding and analysis does not seem to add any substantial information (Hamilton and Parker, 2010), but this will depend on the particular design, mainly on the dilution parameters and . We have confirmed this using our model and the usual drop plate design using a simulation study (see Appendix B); we do not discuss this possibility any further.
Accordingly, let be the dilution selected for counting (i.e., the lowest countable dilution) for repetition and to ease notation let , the actual CFU count in the th drop. The selected dilution is part of our experimental design and is taken as the first dilution such that ( when drop plating, when spread plating). If zero CFUs appeared even in the first tube, we let and . Dealing with zero counts, that is (ie. no CFU counts even at dilution 0), represents no problem and may be dealt with consistently since (6) leads to a likelihood from which a posterior may be calculated; see Figure 3(a).
In the case where even at the highest dilution still the CFU count is above the threshold , that is then the CFU is recorded as TNTC, and we may treat this as right censored data. The likelihood in this case is and this extreme case can also be dealt with, although with a higher computational burden; it involves calculating the binomial cdf in the likelihood. An example of this saturated data is presented in Figure 3(b) and (c). In the real data examples presented in Sections 3.1 and 3.2 no saturated counts are present.
Regarding the miscount parameter , note that for usual drop plate experimental designs ml, , the success probability in the binomial model is (6) which will be quite similar to for reasonable miss count probabilities . The effect of will be barely noticeable. For other experimental settings, the effect of could be more important, in which case could be also considered a (nuisance) parameter to be included in the posterior, with a tight prior for small. However, it would be advisable to make an experimental design to learn about the miscount parameter , with a reference sample with a known microbial abundance and several repetitions. However, to our knowledge, these hypothetical experiments have not been conducted as yet. In the examples presented in Section 3 we simply fix , this should barely have an effect, as seen in Figure 3(a). Nonetheless, is included for overall consistency of our approach.
(a)  (b)  (c) 
Regarding the lower limit of detection (LOD, Currie, 1968), that is, the minimum number of CFUs that can be detected in dilution 0 with the chosen experimental design, we may calculate such that , for various repetitions
. That is, setting all drop counts to zero, we define the lower LOD as the 95% upper quantile
of the corresponding posterior distribution for the microbial abundance . In Figure 4 we present examples of these posteriors for the drop plate design (, drops) with and repetitions. The results are and . A substantial increase in the lower LOD is obtained when increasing from 1 to 3 repetitions, but the increase in the lower LOD is slower then onwards, coinciding with the current practices of performing repetitions per experiment. This same approach can be applied to similarly assess the upper LOD for any experimental design (a useful quantity to estimate when some drop counts are TNTC). We further comment on LODs in our approach in section 4.2.3 Interlaboratory hierarchical model analysis
If the treatment has also been repeatedly studied in different laboratories, each lab will have an independent hierarchical variable, denoted by . Analogously we may define a global variable and where
(7) 
and again this generalizes the default Gaussian approach to positive values. This also describes what practitioners initially intend to do: infer an overall across many labs and asses its variability. This constitutes an additional hierarchy, where now represents the global mean for , for the experiment across laboratories. The twalk can be used to generate a chain of posterior samples, as was the case for the hierarchical models within each lab. We use this approach in the interlaboratory examples in Section 3.2.
3 Examples
All Python code and data from examples in sections 3.1 and 3.2 are available in the supplemental material.
3.1 Intralab analysis
Data are taken from a series of experiments performed in the Center for Biofilm Engineering, Montana State University, MT, USA. Biofilms of Sphingomonas parapaucimobilis were grown on a cylinder with surface area 4.52 cm and then subjected to different temperatures for varying contact times. Each temperature and time combination represents a treatment and each treatment was applied to replicate biofilm samples, as described in Wahlen et al. (2016). By sonication the biofilm is harvested from the cylinders into ml of buffered water to form dilution . To begin a fold dilution series, next ml is taken from dilution 0 and then 9 ml of water is added to form dilution , etc. up to dilution . Ten drops of each are plated and placed at C for h and in turn inspected for CFU formation. Therefore we have and .
In Figure 5 we present the results of our analysis of 4 treatments: Room temperature for min, 65 C for min, 70 C for min and 75 C for
min. Also included are results from the classic simple analysis (using the lognormal approach) of calculating confidence intervals from a mean and a standard deviation on the estimated log abundances. The resulting intervals, in general, do not coincide with the variability in the posterior distributions and in some cases may result in confidence intervals that include negative values; see Figure
5(d).(a)  (b) 
(c)  (d) 
A more extreme example is when most CFU counts are zero. This case is simply untreatable using a mean and standard deviation. For the and min treatment, two of the three repetitions had only zero counts (as mentioned in Section 2.2, we set and all ) and the third repetition had one drop with one CFU only at dilution 0, that is and . We show the corresponding posterior in Figure 6(b). Moreover, to appreciate the effect of the hierarchical model we include the ‘free’ posterior distributions for the s (see Appendix A), that is not considering the hierarchical model and each independent posterior for based only on the likelihood based on (6); see Figure 6(a). Since for repetitions we only had zero counts there is a positive probability that , and results in that the mode of this free posterior is precisely at 0. However, since repetition had one CFU then this renders logically impossible, and therefore it has zero posterior probability at , see 6(a).
The corresponding marginal posterior probabilities, now using the hierarchical model, seen in Figure 6(b), do not match completely with the former posteriors transformed to the scale. This is indeed a result of the hierarchical model and the shift in the individual marginal posteriors is a case of “borrowing strength” from one repetition to the other.
In many cases it is desired to study the log abundance of a treatment with respect to its concurrent untreated control, that is the log reduction (LR). For example, it is common for antimicrobial products to make claims such as “kills 99.9%” of bacteria which corresponds to . A considerable advantage of using a Bayesian approach is that the LR is analyzed explicitly through its constitutive parts (the controls and treated samples) which can open new possibilities for more comprehensive and goal oriented analyses for dilution series experiments (see also the LR analysis in Section 3.2 and the ‘activation probability’ analysis discussed in Section 4). Namely, the same hierarchical model may be fitted to the control data obtaining a posterior sample for the hierarchical log abundance, which we call and inference regarding the LR is simply based on the posterior distribution of . Since we have MCMC posterior samples from both variables, obtaining a posterior sample for is immediate; see for an example Figure 6(c). Moreover, we may calculate which in this case equals 0.9993 ie. “with near certainty the for 2 min treatment killed at least 99.9% of the biofilm”.
(a)  (b)  (c) 
3.2 Interlab analysis
In this example we report on results from a multilab study of the socalled “single tube method” (ASTM, 2013b, a), recently standardized by ASTM International (https://www.astm.org/), to test antimicrobial efficacy against biofilms of Pseuodomonas aeruginosa. In this example we focus on 3 labs from the interlaboratory study of the single tube method, and a single experiment from each lab. In each experiment, we analyze control samples and samples treated with a low concentration of bleach (i.e., 123 ppm of sodium hypochlorite) against biofilms. The biofilms in the labs were harvested into a ml volume to form dilution 0. As in the previous Section, to begin a fold dilution series, 1 ml is taken from dilution 0 and then 9 ml of water is added to form dilution 1, etc. up to dilution 6 (so there is a volume 10 ml in each dilution tube). Two drops of l each are spread plated, and placed at C for hr and in turn inspected for CFU formation. Therefore we have , and . Figure 7 presents the results of our analyses of the log densities () for each of the control and bleach treatments, and also the LRs.
(a)  (b)  (c) 
Unlike other fields of science where there is a demonstrable lack of reproducibility assessments, in the field of antimicrobial science, many organizations have pooled resources to quantifiably assess the reproducibility, across different laboratories, of methods that assess antimicrobial efficacy (Parker et al., 2018b). The reproducibility of test results of antimicrobials is of paramount importance for public health: regulators want to keep ineffective products out of the marketplace, and producers of highly effective products want to bring their products to market. All stakeholders want laboratory methods that accurately and reproducibly make decisions regarding which products are effective.
The reproducibility issue seems apparent in this example since the results from the LRs for this treatment, as seen in Figures 7(b) and (c) span , orders of magnitude. In a classical setting an analysis of variance (ANOVA) is performed on the dilution series data from multiple labs, but that inherits the problems of the log normal model already mentioned. In our setting, a formal reproducibility question may be put forward and then assessed with a posterior distribution. A first choice would be to compare the models for the mean log abundnce for each individual lab versus the full hierarchical model for the global mean log abundance , calculating the posterior distribution of each case, for example comparing the individual lab LR’s vs. the global LR ; see Table 1 for a tentative discussion. This approach is one possibility, but a more in depth analysis is needed to address other reproducibility questions that stakeholders may phrase. The best approach is to establish the necessary posterior probabilities and, to provide guidance for the involved decisions to be taken, to maximize posterior expected utilities. Our Bayesian setting opens up these possibilities, but not without further effort.
Laboratories  

1.6443  1.4053  1.2654  
0.5348  0.0014  0.2257 
4 Discussion
A key aspect of our model is the use of a binomial likelihood () with the parameter of interest being , the abundance of microbes in the original sample. We prefer the binomial likelihood because it models what the technicians actually do in the lab. This is in stark contrast to other common approaches (e.g., the lognormal and Poisson approaches) that provide only an approximation, and markedly different from the case where the statistician imposes an experimental design solely for convenience of the statistical analysis.
This work provides contributions to both the statistical and applied sciences. To the former, we show how to apply a Bayesian hierarchical model with a binomial likelihood to estimate and quantify uncertainty about microbial abundances () from dilution series experiments. The binomial likelihood has a rich history analyzing microbiological data in the case when, instead of CFUs, only a presence/absence response is measured from each tube in a dilution series from which can be estimated using MLE theory (McCrady, 1915; Cochran, 1950; Garthright, 1993). Interesting, this MLE approach is called the “most probable number” technique, coined by McGrady in 1915 before MLE theory had been developed (Hamilton, 2011). Our contribution is the first time to our knowledge that CFU data have been modeled with a binomial likelihood. Regarding our contribution to the applied sciences, we provide a sound alternative to the lognormal and Poisson modeling approaches that are commonly applied in the analysis of CFU data (Hamilton et al., 2013; Haas et al., 1999). We have shown that these common approaches have serious deficiencies when modeling CFUs. For example: in the Poisson case, the restriction on the variance does not hold for CFUs collected from control conditions (that tend to exhibit high abundances and low variability); and in the lognormal case, it is not possible to deal with 0’s and TNTC’s (without ad hoc substitution rules). The Negative binomial or Generalized Poisson models have been used to extend the Poisson model (Joe and Zhu, 2005). Instead, we present a Bayesian approach with a binomial likelihood that allows us to directly estimate the abundance of microbes without a severe constraint on the variance like the Negative binomial or Generalized Poisson would do, and unlike these approaches, our approach can directly incorporate data from multiple dilutions, directly analyze log reductions from the CFUs recovered from control and treated samples that have different levels of variability, and account for miscounts (recently a shifted Poisson model has been suggested to also handle miscounts BenDavid and Davidson, 2014). Furthermore, while there is readily available software for application of mixed effects models with either a lognormal or Poisson likelihood to assess reproducibility (see, e.g., Bates et al., 2015), we are not aware of similar extensions for the Negative binomial or Generalized Poisson approaches. We show how our approach may be applied for reproducibility assessments, by comparing the posterior for each individual lab to the posterior for the population of all labs. While this is an area of active research, the results we have presented seem promising.
A point not directly addressed in this paper is the analysis and comparison of multiple treatments, since here we concentrated on the novel Bayesian binomial approach using hierarchies for multiple repetitions and multiple laboratories for a single treatment. The usual objective in analyzing a series of treatments is to fit a regression model to test and/or predict their effect (see for example Wahlen et al., 2016). One of the main benefits of performing a Bayesian inference is that we may ask as many questions of interest as we require regarding our parameter of interest, and these are answered with its corresponding posterior probability. A simple initial approach, that exploits our Bayesian analysis, would be to calculate the actual posterior probability of the desired result, and compare such posterior probabilities across treatments. For example, commonly in microbiological experiments we are interested in the number of surviving microbes after treatment. This translates in calculating an ‘activation probability’ of a treatment intended to kill microbial activity, ie. for some (small) agreed threshold. If compared to a control , then we could calculate the posterior probability of a log reduction above some threshold, that is etc., see Figure 8 for an example. This approach lacks the formal predictive approach of fitting a regression model using covariates.
Although not trivial to do, a regression model may indeed be incorporated using a Bayesian approach, embedded in our hierarchical model and binomial likelihood, in a multiexperiment multilab scenario. These interesting possibilities are the focus of future research. In general, using a Bayesian approach opens the door to a formal Uncertainty Quantification (UQ) approach to the analysis of dilution series data, by exploiting the posterior distribution obtained and the ease in calculation of expectations or posterior probabilities given the MCMC sample obtained.
Instances of 0’s and TNTCs relate directly to the lower and upper limits of detection (LOD) for the process used to generate CFUs (Currie, 1968). The process includes how the dilution experiment was conducted such as the dilution factors ( and ), which dilutions were plated (), and the volume plated (). The process also includes properties about the method used to harvest the bacteria from the sample (e.g., sonication, scraping, stomaching or a RODAC plate), the method used to disaggregate/homogenize the bacteria into single cells in the original volume and the environmental conditions used to incubate the bacteria after plating. In microbiology, it is common to refer to either or as the LOD of a CFU counting method. This is unfortunate since neither of these values necessarily are associated with a high level of statistical confidence or probability regarding how many viable microbes survive in the sample. In our analyses (Figure 4) we defined the LOD of the process used to generate CFUs as the lowest microbial abundance that can survive in the sample with probability 0.95 when there were no CFUs recovered from the sample. From the Bayesian analyses that we present, this calculation is straightforward: the LOD is simply the 95 percentile from the posterior distribution for ; see Figures 3(a) and 4. Our hierarchical model also easily and consistently can provide the LOD as a function of dilution experiment design (i.e., and ) and the number of repetitions by using the posterior for . For example, in Figure 4, it is seen the expected result that the LOD decreases as the number of repetitions increases. The LOD for a particular experimental design can be calculated, beforehand, at minimal computational cost, to asses if the proposed design meets any LOD requirements.
Regarding the LOD or any other characteristic desired in an experiment, an intensive search can be conducted to optimize the design parameters in Bayesian analyses (Weaver et al., 2016; Huan and Marzouk, 2014; Christen and Buck, 1998). This, however, involves calculations of far higher costs, commonly needing parallel computing and other sophisticated software and numerical analysis resources. Given a design, parameters are simulated from the prior and in turn synthetic data sets from the model, to calculate the corresponding posterior for quantities of interest to asses the expected information gain in data from the design. This then needs to be repeated from a set of candidate designs to find an optimal design. We leave this interesting dilution experiment design possibility for future research.
Acknowledegments
JAC is partially funded by CONACYT CB201601284451, RDECOMM and ONRG grants. AEP is partially funded by NSF grant 1516951. The authors gratefully acknowledge the industrial associates of the CBE, Lindsey Lorenz and Professor Emeritus Martin Hamilton.
Appendix A Details on the hierarchical model
Our hierarchical model may be summarized with the Directed Acyclical Graph (DAG) depicted in Figure 9(a). As mentioned in the main text, using the following well known result we may integrate out the . That is, if and then . Using it recursively (1) becomes for
(8) 
and the corresponding DAG is as shown in Figure 9(b). This substantiates the equation (6) provided earlier. Since considering is important and may indeed happen note that it must be the case that . Accordingly we define as the Dirac’s delta at zero.
Note that, strictly speaking, is a discrete variable and we are modeling it here as a continuos r.v. The binomial model may be changed to accept a noninteger “number of trials” using gamma functions in the combination function, having as a particular case the conventional binomial pmf (this we do in our implementation providing no further details here). Then for mathematical convenience is modeled as a continuos r.v. while still is taken discrete.
Moreover, by ignoring the hierarchy involving and we may calculate the posterior distribution of each independently. In this case, since it is a single discrete parameter, calculating its posterior pmf is straightforward, constructing a likelihood from (LABEL:AppAeqn:model3b). For illustration purposes and comparisons this is done in Section 2.2 and in Figure 6(a). We call this the free posterior for , the microbial abundance in the .
(a) 
(b) 
The full set of parameters are . Only binomial and Gamma densities are involved and therefore the corresponding likelihood is simple to write and calculate. The likelihood, indeed since repetitions are exchangeable, is
(9) 
where is the corresponding binomial pmf stated in (8) and arises from (5), and are the r.v.’s of observed CFU counts arranged in matrices such that and . is established using the fact that if then .
Appendix B Multidilution data analysis
(a)  (b) 
We present results on whether it is worth analyzing CFU counts over all dilutions vs over only the first countable dilution. This question is difficult to answer in complete generality. We will then concentrate on a two common designs that correspond to data in Sections 3.1 and 3.2. Namely, for the drop plate data in Section 3.1 and for the spread plate data in Section 3.2.
We study the extreme case when the first dilution can be counted, that is all counts are below the TNTC threshold , and there is only one repetition (). We fix a true value for and simulate data at all dilutions using the binomial model in (8). Then we calculate the discrete posterior pmf of and repeat the process averaging over all resulting posteriors over 120 simulated data sets. The process is done calculating the posterior when only the first dilution is used and when all dilution counts are considered in the calculation of the posterior. For the true value of the abundance we considered only a set of representative values for both designs, below a maximum to maintain expected simulated counts below . Namely, 500, 5,000, 10,000, 20,000, 30,000 for the drop plate data design and 50, 500, 1,000, 2,000, 3,000 for the spread plate data design. The posteriors with the most noticeable differences, although still quite low, were at and , respectively; we present these posteriors in Figure 10. In all other cases the differences in posteriors were even smaller (not shown), suggesting that the added difficulty in counting, recording and analyzing CFUs at all dilutions is not worth the expected information gain, and therefore we recommend only to record and analyze the first countable dilution.
The reason that the posteriors are so similar can be seen in Figure 3. The likelihood for TNTCs in dilution becomes flat and provides no information for higher values than . If there are countable drops for dilution , the likelihood for these data is placed at fold higher values well beyond and therefore including the TNTCs likelihood in dilution will not have any significant effect on the posterior, at least for the common case where . Then including the censored likelihood terms for TNTC’s will not add substantial information and does not justify the added computational cost.
References
 ASTM (2013a) ASTM (2013a). An interlaboratory study was conducted by eight laboratories testing three samples of varying targeted results to establish a precision statement for test method e2871. ASTM International, West Conshohocken, PA, USA, https://www.astm.org/DATABASE.CART/RESEARCH_REPORTS/RRE351008.htm.
 ASTM (2013b) ASTM (2013b). Standard test method for evaluating disinfectant efficacy against pseudomonas aeruginosa biofilm grown in cdc biofilm reactor using single tube method. ASTM International, West Conshohocken, PA, USA, https://www.astm.org/Standards/E2871.htm.
 Bates et al. (2015) Bates, D., Mächler, M., Bolker, B., and Walker, S. (2015). Fitting linear mixedeffects models using lme4. Journal of Statistical Software, 67(1):1–48.
 BenDavid and Davidson (2014) BenDavid, A. and Davidson, C. E. (2014). Estimation method for serial dilution experiments. Journal of Microbiological Methods, 107:214–221.
 Christen and Fox (2010) Christen, J. and Fox, C. (2010). A general purpose sampling algorithm for continuous distributions (the twalk). Bayesian Analysis, 5(2):263–282.
 Christen and Buck (1998) Christen, J. A. and Buck, C. E. (1998). Sample selection in radiocarbon dating. Journal of the Royal Statistical Society: Series C (Applied Statistics), 47:543–557.
 Clarke (1998) Clarke, J. (1998). Evaluation of censored data to allow statistical comparisons among very small samples with below detection limit observations. Environmental Science and Technology, 32:177–183.
 Cochran (1950) Cochran, W. G. (1950). Estimation of bacterial densities by means of the “most probable number”. Biometrics, 6:263–282.
 Cundell (2015) Cundell, T. (2015). The limitations of the colonyforming unit in microbiology. European Pharmaceutical Review, 6.
 Currie (1968) Currie, L. (1968). Limits for qualitative detection and quantitative determination. Anal. Chem., 40(586).
 EPA (1996) EPA (1996). Guidance for data quality assessment. Office of Research and Development, US Enivironmental Protection Agency.
 FDA (2018) FDA (2018). BAM 4: Enumeration of Escherichia coli and the Coliform Bacteria. US Food and Drug Administration, https://www.fda.gov/food/laboratorymethodsfood/bam4enumerationescherichiacoliandcoliformbacteria.
 Garthright (1993) Garthright, W. E. (1993). Bias of the logarithm of microbial density estimates from serial dilutions. Biometrical Journal, 35(3):299–314.

Geyer (1992)
Geyer, C. (1992).
Practical markov chain monte carlo.
Statistical Science, 7(4):473–511.  Haas et al. (1999) Haas, C., Rose, J., and Gerba, C. (1999). Quantitative Microbial Risk Assessment.
 Haas and Scheff (1990) Haas, C. N. and Scheff, P. A. (1990). Estimation of averages in truncated samples. Environental Science Technology, 24:912–919.
 Hamilton et al. (2013) Hamilton, M., Hamilton, G., Goeres, D., and Parker, A. (2013). Guidelines for the statistical analysis of a collaborative study of a laboratory disinfectant product performance test method. JAOAC International, 96(5):1138–1151.
 Hamilton (2011) Hamilton, M. A. (2011). The P/N formula for the log reduction when using a semiquantitative disinfectant test of type SQ1. Center for Biofilm Engineering.
 Hamilton and Parker (2010) Hamilton, M. A. and Parker, A. E. (2010). Enumerating viable cells by pooling counts for several dilutions. Center for Biofilm Engineering.
 Hedges (2002) Hedges, A. J. (2002). Estimating the precision of serial dilutions and viable bacterial counts. International Journal of Food Microbiology, 76:207–214.
 Herigstad and Hamilton (2001) Herigstad, B. R. and Hamilton, M. (2001). How to optimize the drop plate method for enumerating bacteria. J Microbiological Methods, 44(2):121–129.
 Horvitz and Thompson (1952) Horvitz, D. G. and Thompson, D. J. (1952). A generalization of sampling without replacement from a finite universe. Journal of the American Statistical Association, 47:663–685.
 Huan and Marzouk (2014) Huan, X. and Marzouk, Y. M. (2014). Gradientbased stochastic optimization methods in Bayesian experimental design. International Journal for Uncertainty Quantification, 4(6):479–510.

Joe and Zhu (2005)
Joe, H. and Zhu, R. a. (2005).
Generalized poisson distribution: the property of mixture of poisson and comparison with negative binomial distribution.
Biometrical Journal, 47(2):219–229.  Kass and Raftery (1995) Kass, R. E. and Raftery, A. E. (1995). Bayes factors. Journal of the American Statistical Association, 90(430):773–795.
 Maturin and Peeler (1998) Maturin, L. and Peeler, J. T. (1998). Chapter 3 – aerobic plate count, section: Conventional plate count method. In Council, B., editor, FDA Bacteriological Analytical Manual. US Food and Drug Administration.
 McCrady (1915) McCrady, M. H. (1915). The numerical interpretation of fermentationtube results. J. Infectious Disease, 17:183–212.
 Niemela (1983) Niemela, S. (1983). Statistical evaluation of results from quantitative microbiological examinations. Nordic Committee on Food Analysis. Ord Form AB, Uppsala.
 Niemi and Niemela (2001) Niemi, M. M. and Niemela, S. I. (2001). Measurement uncertainty in microbiological cultivation methods. Accreditation and Quality Assurance, 6:372–375.
 Parker et al. (2018a) Parker, A., Pitts, B., Lorenz, L., and Stewart, P. (2018a). Polynomial accelerated solutions to a large gaussian model for imaging biofilms: in theory and finite precision. Journal of the American Statistical Association, 113(524):1431–1442.
 Parker et al. (2018b) Parker, A. E., Hamilton, M. A., and Goeres, D. M. (2018b). Reproducibility of antimicrobial test methods. Scientific Reports, 8(1):12531.
 Parkhurst and Stern (1998) Parkhurst, D. F. and Stern, D. A. (1998). Determining average concentrations of cryptosporidium and other pathogens in water. Environmental Science and Technology, 32:3424–34–3429.
 Pitts and Stewart (2008) Pitts, B. and Stewart, P. (2008). Confocal laser microscopy on biofilms: Successes and limitations. Microscopy Today, 16(4):18–21.
 Prescott et al. (1996) Prescott, L., Harley, J., and Kline, D. (1996). Microbiology. Wm. C. Brown Publishers, third edition edition.
 Rice et al. (2017) Rice, E., Baird, R., and Eaton, A. (2017). Standard Methods for the Examination of Water and Wastewater. American Public Health Association, American Water Works Association, Water Environment Federation, 23rd edition edition.
 Singh and Nocerino (2002) Singh, A. and Nocerino, J. (2002). Robust estimation of mean and variance using environmental data sets with below detection limit observations. Chemometrics and intelligent laboratory systems, 60:69–86.
 Wahlen et al. (2016) Wahlen, L., Parker, A., Walker, D., Pasmore, M., and Sturman., P. (2016). Predictive modeling for hot water inactivation of planktonic and biofilmassociated sphingomonas parapaucimobilis to support hot water sanitization programs. Biofouling, 32(7).
 Weaver et al. (2016) Weaver, B. P., Williams, B. J., AndersonCook, C. M., and Higdon, D. M. (2016). Computational enhancements to bayesian design of experiments using gaussian processes. Bayesian Analysis, 11:191–213.
 Zuur et al. (1999) Zuur, A., Ieno, E., Walker, N., Savelliev, A., and Smith, G. (1999). Mixed Effects Models and Extensions in Ecology with R.
Comments
There are no comments yet.