Nonparametric Bayesian multi-armed bandits for single cell experiment design

by   Federico Camerlenghi, et al.

The problem of maximizing cell type discovery under budget constraints is a fundamental challenge in the collection and the analysis of single-cell RNA-sequencing (scRNA-seq) data. In this paper, we introduce a simple, computationally efficient, and scalable Bayesian nonparametric sequential approach to optimize the budget allocation when designing a large scale collection of scRNA-seq data for the purpose of, but not limited to, creating cell atlases. Our approach relies on i) a hierarchical Pitman-Yor prior that recapitulates biological assumptions regarding cellular differentiation, and ii) a Thompson sampling multi-armed bandit strategy that balances exploitation and exploration to prioritize experiments across a sequence of trials. Posterior inference is performed using a sequential Monte Carlo approach, which allows us to fully exploit the sequential nature of our species sampling problem. We empirically show that our approach outperforms state-of-the-art methods and achieves near-Oracle performance on simulated and real data alike. HPY-TS code is available at



There are no comments yet.


page 18


Medoids in almost linear time via multi-armed bandits

Computing the medoid of a large number of points in high-dimensional spa...

Combinatorial Multi-armed Bandits for Real-Time Strategy Games

Games with large branching factors pose a significant challenge for game...

Scalable Discrete Sampling as a Multi-Armed Bandit Problem

Drawing a sample from a discrete distribution is one of the building com...

Myopic Bayesian Design of Experiments via Posterior Sampling and Probabilistic Programming

We design a new myopic strategy for a wide class of sequential design of...

Sequential Monte Carlo Bandits

In this paper we propose a flexible and efficient framework for handling...

Ultra Fast Medoid Identification via Correlated Sequential Halving

The medoid of a set of n points is the point in the set that minimizes t...

Bandit-PAM: Almost Linear Time k-Medoids Clustering via Multi-Armed Bandits

Clustering is a ubiquitous task in data science. Compared to the commonl...
This week in AI

Get the week's most popular data science and artificial intelligence research sent straight to your inbox every Saturday.

1 Introduction

Technological developments in high-throughput genomics have generated a wealth of data allowing researchers to measure and quantify RNA levels of individual cells (single_cell_1; single_cell_2). Benefiting from experimental and computational advances alike, single-cell RNA-seq (scRNA-seq) allows the characterization of cell types and cellular diversity, offering invaluable insights at scales unattainable in previous bulk gene expression studies (bulk_v_sc_1). In particular, to understand the diversity of the thousands of cell types and subtypes across different organisms, recent incentives aim at the molecular profiling of all cell types of complex organisms such as mouse or human (regev2017science; han2018mapping). Despite the decreasing cost of technologies for single cell sequencing, cell atlases are still expensive to collect and hard to coordinate across species, cells, tissues, organs, diseases, technologies, and labs. A principled way of collecting data is therefore paramount: given the experimental cost limiting the number of cells to be sequenced, and given multiple related experimental scenarios (e.g., developmental time, biological region, tumor site), how can one allocate the cellular sequencing budget in order to minimize experimental cost and maximize the number of distinct cells types obtained? In this paper we give an effective Bayesian nonparametric approach to address this problem.

Recent work (bubeck2013optimal; battiston2016multi; dumitrascu2018gt) proposed the use of classical multi-armed bandit strategies, e.g., upper confidence bounds (UCB) (lai1985; auer2002) and Thompson sampling (TS) (thompson1933)

, for devising sequential approaches to maximize the number of distinct species discovered by sampling over multiple populations. These sampling strategies balance the exploration of the experimental choices—which populations are sampled—with the exploitation of populations that maximize current estimates of the expected rewards–the observed species diversity within a population. In the classical multi-armed bandit setting, a gambler is presented with slot machines (

one-armed bandits

is the colloquial term for a slot machine in American slang) that each pay out a random reward sampled from an arm-specific probability distribution. The gambler commits to querying a given arm for a single trial before switching to another arm, and her goal is to select a sequence of arms to play in order to maximize her rewards over subsequent trials. At each step, the gambler estimates the expected rewards of a single trial from each machine’s arm, both queried and not. She must then balance exploiting the arm with the current highest estimate and exploring undersampled arms to improve estimates of the arms’ expected rewards.

A natural variation of the above multi-armed bandit setting is when the gambler commits to querying a given arm for a pre-determined number of consecutive trials before switching to another arm. This variation is readily applicable to the experimental design problem of guiding the sequential selection of samples through single cell sequencing technologies: we may sequence some number of cells from one of multiple tissues or sample sites. In detail, we consider this problem as a set of sequential trials where a scientist may choose a subset of tissue samples to assay. Each organ, tissue type, sample site, or experimental condition represents an arm to be pulled. When choosing a specific arm, the scientist commits to sequencing a number of cells proportional to the maximum number of new cell type discoveries expected in a future sample from the given experimental condition. The reward of each experimental trial is given by the number of new cell types uncovered in the sequenced sample. A first attempt to address this problem, within the context of scRNA-seq data, was proposed in dumitrascu2018 by combining a class of Good-Toulmin (GT) estimators (good1953population; good1956number; efron1976estimating; orlitsky2016optimal) with the TS strategy.

In this paper, we follow ideas from battiston2016multi and dumitrascu2018 to introduce a Bayesian nonparametric counterpart of the previous Good-Toulmin Thompson sampling (GT-TS) approach (dumitrascu2018). Because of the purely nonparametric nature of smoothed GT estimators, the GT-TS approach does not allow us to take into account the structure of cell type diversity. As cell types arise through cellular differentiation (rizvi2017single), they organize themselves in suitable hierarchies or developmental landscapes (waddington1957strategy)

. Hierarchical structures can be imposed on the cell types through Bayesian nonparametric priors, as was done for cell trajectory reconstruction and Bayesian inference on developmental lineages

(heaukulani2014beta; shiffman2018reconstructing).

A natural choice for a nonparametric prior to model cell type diversity is the hierarchical Pitman-Yor process (HPY) (teh2006hierarchical; teh2010). The HPY process has previously been used in the context of species discovery problems in multiple populations, and it has been shown to have good performance in small data sets (camerlenghi2018; bassetti2018). Yet, species sampling problems considered in these recent studies are not sequential problems: a Bayesian nonparametric model with a HPY prior is fit to the data de novo each time new data become available. This makes current posterior sampling procedures designed for the HPY prior infeasible for our sequential species sampling problem of rapidly-growing single cell data sets.

We propose a simple, computationally efficient, and scalable Bayesian nonparametric sequential approach for guiding the selection of samples for single cell sequencing technologies with the goal of maximizing the diversity of cell types discovered. Our approach has two main contributions. First, we introduce a multi-armed bandit strategy that combines the TS approach with a Bayesian nonparametric counterpart of the GT estimator under the HPY prior, extending previous work that allowed only a single trial to a pre-determined number of consecutive trials before switching arms (battiston2016multi). The TS strategy encodes the sequential exploration-exploitation process associated with data collection from any given region, whereas the use of the HPY prior incorporates biologically relevant information regarding the relationships among cell types to guide the allocation of resources. Second, we devise an efficient posterior sampling scheme that relies on sequential Monte Carlo methods (west1993a; liu2001combined)

. Sequential Monte Carlo (SMC) allows us to fully exploit the sequential nature of our species sampling problem, thus avoiding the overwhelming computational burden of the Markov chain Monte Carlo scheme proposed in

battiston2016multi. We compare out method to the previous method (GT-TS) and to an oracle in simulations and in a data set based on the Mouse Cell Atlas (han2018mapping). Since our motivation lies in the realm of single cell experimental design, we illustrate how, given a per trial budget, the resulting algorithm leverages information across tissues to inform subsequent experiments in order to maximize cell type discovery in the Mouse Cell Atlas (han2018mapping).

The paper is structured as follows. Section 2 contains preliminaries on: i) the multi-armed bandit setting within the context of prioritizing single cell sampling across populations, i.e., organs, tissues, regions, and experimental conditions; ii) the definition of HPY prior, and some of its marginal sampling properties. In Section 3, we introduce our Bayesian nonparametric sequential approach, referred to as the HPY-TS strategy, for guiding the selection of samples through single cell sequencing technologies. A detailed description of the sequential Monte Carlo approach for posterior sampling is presented in Section 4. Section 5 contains numerical illustrations of our approach through a simulation study and an application to a data set derived from the Mouse Cell Atlas. In Section 6, we summarize our work and briefly discuss extensions to our HPY-TS strategy.

2 Preliminaries

Let denote the set of labels representing the cell types of an organism being studied. The cell type composition within each of the possible populations (arms, experiments) is characterized by a probability distribution over – cell types are shared across populations. Precisely, we denote by the probability distribution on in population , for . Let be the number of cells (pulls) observed from the th population, let

be the vector of

observations from the th population, whereas is the joint sample corresponding to a budget . Assume now to have an additional budget that constraints the number of cells that can be collected per trial in a future experiment. If populations are available, a multi-armed bandit iteratively selects a subset of population to sample from as well as the appropriate number of cells to sample in that population. At each step, the arm is chosen with the goal of maximizing the number of novel cell types observed. Therefore, in order to set up our strategy, we need to estimate the number of thus far unseen species (cells) that are going to be sampled for every possible arm , as .

A possible strategy to address this sequential problem was first proposed by dumitrascu2018gt. This approach relies on a smoothed version of the Good-Toulmin estimator of the number of unseen species (orlitsky2016optimal). However, while the smoothed Good-Toulmin estimator presents attractive statistical properties, as discussed by orlitsky2016optimal, it is designed for a single population scenario. In this paper, we focus on data coming from multiple related populations. Indeed, we discover cell types across diverse tissue types assayed in scRNA-seq experiments. It is then important to guarantee two key properties in our model: that it (i) preserves data heterogeneity for different tissues; and (ii) allows borrowing of information across the different tissues. Hierarchical Bayesian nonparametric priors are tailored for such situations: the data are divided into distinct populations (according to the tissue they are derived from), and at the same time the hierarchical construction allows a borrowing of information across the diverse populations of cell types.

2.1 The Pitman-Yor process

The Bayesian nonparametric (BNP) approach relies on the choice of a prior distribution for the cell type labels. The Dirichlet process (DP) (ferguson1973bayesian) is a well-known Bayesian nonparametric distribution. In this paper, we make use of a generalization of the DP, the Pitman-Yor (PY) process (pitman1997). The PY process is a random probability measure that depends on two parameters , respectively called the concentration and the mass parameter, with a base measure on the space of labels . The admissible values we consider here for these parameters are and . The most simple way to define the PY process uses a stick-breaking procedure (sethuraman1994constructive). More specifically, is a discrete random probability measure such that



is a sequence of i.i.d. random variables as

, and

is a collection of independent beta-distributed random variables with parameters

. The two sequences and are assumed to be independent. We write to denote the distribution of . The classical DP prior can be found as a limiting case of the PY process, letting .

It is worthwhile to highlight the differences between the PY and the DP process with respect to predictive distributions. In both the cases, the predictive distribution may be represented as a Chinese restaurant process (CRP); see pitman1997 for a detailed account and references. Consider a sample of size from the PY process. The almost sure discreteness of the random probability measure allows for ties within the sample. Then, let be the number of distinct values within the sample , denoted as and having multiplicities . Then, the predictive distribution of the st observation given past observations is


In other words, Equation (2.1) tells us that the probability of observing an old value is proportional to . Intuitively, the more samples of a species we observe, the higher the probability of sampling it again in future trials; this is referred to as “the rich get richer” behavior. Alternatively, the probability of sampling a new observation from the base measure is proportional to . Notice that the clustering structure of the PY depends on two parameters, and , whereas in the DP it is governed only by . This more complex parametrization offers more flexible clustering rates and cluster size tail behaviors for the PY process (ishwaran2001gibbs).

2.2 Hierarchies of Pitman-Yor processes

When considering observations sampled from multiple populations, it is natural in the Bayesian framework to model the group structure with a hierarchical framework. Here we use a hierarchical structure based on the PY process. We denote by the distribution of cell type labels across all of the populations (experimental design conditions). The probability distribution is almost surely discrete with an unknown number of atoms, and we select a PY prior to model the distribution of cell type labels with parameters and non–atomic base measure on the space of labels. Then each population-specific distribution is modeled using a PY process prior with parameters , and we further suppose that the common base measure for all the s is the PY process . Summing up, we have specified the following hierarchical PY (HPY) process prior:


The hierarchical specification introduces dependencies among different populations (experimental conditions or arms), thus allowing the borrowing of information across populations since the base measure is common to the different collections of observations (teh2006hierarchical; camerlenghi2018). Conditional on the base measure , the s are independent PY processes. In particular, the interpretation of the parameters is the same as in the single population case described above.

The predictive distribution and the combinatorial structure induced by hierarchical processes can be thought of in terms of the Chinese restaurant franchise (CRF) metaphor (teh2010). According to this culinary metaphor, each sample identifies the dishes chosen by the customers of restaurant (group) , for any . People sitting at the same table eat the same dish, and the same dish can be served within the same restaurant or across different restaurants, since we use the same a.s. discrete base measure for all of the groups. We denote by the distinct dishes across the samples, and represents the number of customers in restaurant eating dish . The vector encodes all the frequencies for a specific population .

The combinatorial structure induced by the HPY process is usually formally described by the so called partially exchangeable partition probability function (pEPPF) defined by


In other words, this represents the probability of observing a specific configuration of the dishes across the restaurants. A tractable expression of the pEPPF (2.3) was found in camerlenghi2018, resorting to auxiliary latent variables, which have to be seen as tables in the CRF language. More specifically, each observation (customer) is associated with a latent tag identifying the table of the restaurant to which the specific customer is seated. We have the constraints

where we denote by the number of tables in restaurnat serving dish , namely ; denotes the number of customers in restaurnat sitting at table , eating dish . In order to fix the notation, in the sequel it will be useful to denote by the number of distinct values in the th group , indicated by , which is a subset of the set .

The introduction of latent tables leads to a refinement of the partition of the observations defined in Equation (2.3). Indeed, now we can look for the probability that the observations are partitioned into a set of distinct groups according to both tables and dishes. In particular, such a probability coincides with an augmented version of Equation (2.3) derived in camerlenghi2018, i.e.,


where the functions and denote the so-called exchangeable partition probability function (EPPF) induced by and , respectively. Then we have

where is the Pochhammer symbol, and where represents the total number of tables in group , and is the number of tables across restaurants. A completely analogous formula holds for as well. One can then obtain an expression for Equation (2.3) by integrating out the tables in Equation (2.4).

The CRF provides a simple and meaningful interpretation of the predictive distributions for observed species within and across populations. In particular, conditional on , the predictive distribution for a new observation of the th population is the same as the CRP in the single population case. On the other hand, integrating out , we obtain the predictive distribution for the new species in population with respect to the unique species in the joint sample (across populations). That is, we have

where are distinct species in the joint sample from populations, and is the number of observations in population from species .

Notice that is the number of times that species has been observed in the joint sample. Intuitively, a high value of leads to a high probability of observing in all populations, even if has not yet been sampled in some of the populations. In particular, this probability is proportional to : the number of times that we observe minus the discount parameter of the base distribution . In other words, the parameters allow us to control the total number of species in the joint sample and the extent of sharing of species across different populations. If is low then, in expectation, the total number of distinct species in the joint sample will be low in expectation. If is high then, in expectation, the distinct populations will share fewer species.

3 The HPY-TS strategy

In this section, we present our Bayesian nonparametric sequential approach, referred to as HPY-TS, for guiding the selection of samples for single cell sequencing technologies. HPY-TS is a multi-armed bandit strategy that combines the TS strategy with a Bayesian nonparametric counterpart of the GT estimator under the HPY prior. Our HPY-TS strategy may be viewed as follows: i) an extension of the strategy proposed by battiston2016multi from a single trial before switching arms to a pre-determined number of consecutive trials before switching arms; ii) a Bayesian nonparametric counterpart of the GT-TS strategy proposed by dumitrascu2018, where the smoothed GT estimator is replaced by its Bayesian nonparametric counterpart under the HPY prior including a hierarchical structure on the species.

Consider cells that are simultaneously observed from multiple populations. These populations, i.e., organs, tissues, regions, or experimental conditions, represent arms to be selected for experimentation. Under the HPY prior assumption for the unknown composition of the populations, the HPY-TS strategy prescribes to select the population in such a way as to maximize the number of new distinct cell types that we expect to observe in additional cells from the selected experiment. We define the set of hitherto unobserved cells as , and we denote by the random number of new distinct cell types that will be observed in an additional sample of size collected from population (or arm) . We compute the expectation of the posterior distribution of , here denoted by , for each arm , and then we choose the arm that corresponds to the maximum value of .

Under the HPY prior, provides the natural Bayesian nonparametric counterpart of the smoothed GT estimator in dumitrascu2018. The posterior expectation was first described in battiston2016multi. In the next proposition, we simplify the expression for the posterior expectation in Proposition 2 of battiston2016multi. We denote by the beta distribution with parameters . Let be the unknown cell type proportions of population . Let represent the unknown cell type proportions for the collection of cells that have not yet been sampled from population .

Proposition 3.1.

Let the unknown cell type proportions of population be modeled according to the HPY process (Equation (2.2)). Conditionally on random variables and , where

one has



Here, random variable , for any , counts the number of distinct values in a random sample of size from a PY process with updated parameters , and is the probability that .


We recall that function gives the probability of having distinct values in a sample of size drawn from a PY process with parameters . Then, we write the following,


denotes the generalized factorial coefficient. Then, according to Proposition 2 of battiston2016multi, the expectation in Equation ((3.1)) may be expressed as follows

By exchanging the order of the summations in , , and , we get the following expression


The sum with respect to in Equation ((3.2)) may be evaluated using the expected value of the number of distinct values sampled from the PY process with parameters and . See Equation (3.13) in pitman1996. Then we have

The thesis follows by straightforward calculations and by observing that . ∎

Following from the result of Proposition 3.1, one can recover a formula for by simply integrating Equation (3.1) with respect to the distribution of and the distribution of . Using this observation, we can infer that the computational complexity of computing the formula for is proportional to . Having found the posterior expectation of for all populations in Proposition (3.1), our HPY-TS strategy selects the population with the highest expected rewards, computed from a posterior sample. More specifically, we sample and from the distribution described in Proposition 3.1. Then, conditional on these realizations, we compute according to Equation (3.1). Finally, we choose the population with the highest realized value. Details of the HPY-TS strategy are described in Section 4. Our HPY-TS strategy is based on the Thompson’s sampling approach (Algorithm 1), with parameters updated sequentially according to Algorithm 2.

With regards to the choice of the prior distribution for the hyperparameters of the HPY, we put a uniform prior on

for both parameters and . Moreover, we put a gamma prior with parameters for both parameters and . All prior distributions are assumed to be independent. Note that Algorithm 1 depends on parameters

and on the table counts of the CRF, which are encoded by the vector . It is worth stressing that the collection of table counts are latent variables that have not been observed in the initial sample. Therefore, before running Algorithm 1, we estimate these latent variables using a Gibbs sampler based on the expression of the pEPPF from camerlenghi2018, using a suitable adaptation of their algorithm. Specifically, we exploit the sequential structure of the problem to update the vector of parameters : we describe the new and efficient algorithm for the updating of in the next section.

4 Sequential parameter updates

The multi-armed bandit problem is a sequential allocation problem, where the goal is to find the best allocation strategy to sample new observations from different populations at every experimental time step. Whenever the new cells are sampled from a population one has to update the parameters of the HPY process in a computationally feasible way. A possible approach to this problem was suggested in battiston2016multi, where the authors use Markov chain Monte Carlo (MCMC) to estimate the posterior distribution of the hyperparameters of the HPY. However such an approach does not take advantage of the sequential nature of the species sampling problem and, more importantly, is not computationally feasible with large data sets. The computational burden of the approach of battiston2016multi makes its direct application almost impossible, except for toy examples with small numbers of arms. In this section, we suggest a computationally tractable approach that leverages the sequential structure of the problem (Algorithm 1) and is based on a filtering algorithm of liu2001combined.

  for :number of new samples do
     for  do
        draw Compute according to Proposition 3.1
     end for Compute ; Draw the next sample from population ; Update the HPY parameters according to Algorithm 2;
  end for
Algorithm 1 HPY-TS

The HPY-TS strategy selects the arm to sample from, then one sequentially samples the batch of cells from the selected arm. After that, one updates the model parameters with the new observation encoded by to select the new arm to sample from. We then consider discrete time points , and we clarify how to sequentially update the parameters of our model in a computationally feasible way. Let us fix some notation: is the vector of observations from the arm selected at time , and is the set of observations available at time . Thus, we can think of a model that is described by a distribution evolving in time and depending on a vector of model parameters . At each iteration, we select an arm and observe , and we sample the updated parameters from the posterior distribution , as . Note that this posterior distribution is proportional to

due to Bayes Theorem. We can think of

as the density function of at time . Our aim is to sample a new set of parameters from the posterior distribution of in presence of a new observation . The key idea is to approximate the distribution of with a mixture of Gaussian kernels, i.e.,

where is the set of importance sampling weights for at time , is the number of importance samples at each time step, and

is the density function of a Gaussian distribution with mean

and covariance . Moreover, is the estimate of the covariance with respect to the Monte Carlo posterior, and is a smoothing parameter. liu2001combined suggest to choose as a decreasing function of the number of importance samples. In our simulations, we set . In order to avoid “loss of information” over time, earlier work (west1993a; west1993b) propose shrinkage kernel locations and suggest setting and , where is the mean of the Monte Carlo sample of size at time . In particular, with these choices we preserve the covariance over time.

  Evaluate for each :
which are the prior point estimates of . Construct a posterior approximation of with weights and samples , for , as follows.
  for  do
  • Sample an auxiliary integer variable from the set with probability proportional to:

  • Sample a new parameter vector from the th normal component of the kernel density, namely:

  • Evaluate the corresponding weights


    in other words is proportional to the pEPPF defined in (2.4), depending on the parameters and on the information up to time .

  end for
  Resample according to the importance weights to obtain a set of parameters with equal weights–in other words, a Monte Carlo approximation of the posterior.
Algorithm 2 Filtering algorithm

In our framework, one only needs to evaluate the conditional distribution , which may be easily recovered from the expression of the pEPPF in Equation (2.4). We initially run a Gibbs sampler (camerlenghi2018) to obtain random samples of the latent table counts and parameters . The output of the initial sample may be regarded as an importance sample of with equal weights at time . Algorithm 2 describes the sequential updates of parameters of the HPY.

5 Applications

5.1 Simulation study

We first demonstrate the utility of our improved HPY-TS Algorithm in the context of simulated data. We consider a setup with arms, representing a samples corresponding to different species. The true distribution of each arm follow a Zipf’s law, such that the mass assigned to the most common species in any given population is

where is the number of species in population , is a real parameter that controls the distribution of mass among the support: a large indicates the total mass is concentrated on a few points, and a small value indicates the mass is shared across many points. Hence, an arm with a low is a ‘winning arm’ or an arm with high species diversity. Among the arms we consider winning arms (Zipf parameter ), and less diverse arms (Zipf parameter ). An optimal strategy should balance exploration and exploitation, and query the less diverse arms occasionally, while focusing on the winning arms.

We evaluate the performance of our TS strategy by comparing it with three baselines: an Oracle strategy, a Uniform strategy, and the Good-Toulmin Thompson sampling (GT-TS) strategy of dumitrascu2018gt. In particular, the Oracle strategy is used to compare performance with optimal behavior; the Oracle strategy is allowed to see into the future or pre-sample from all the arms and make the optimal decision at every iteration. The Uniform strategy allocates the budget uniformly across the arms. The GT-TS strategy is based on a smoothed version of the Good-Toulmin estimator. More precisely, the smoothed Good-Toulmin estimator (orlitsky2016optimal) estimates the number of new species that will be sampled in an additional sample of size for a fixed population ; this estimator is defined as

where is known as the extrapolation factor, denotes the number of species occurring with frequency in , the sample from the th population, and is an independent random nonnegative integer. Common choices for the distribution of

include the Poisson distribution and the binomial distribution

(orlitsky2016optimal). The Good-Toulmin diversity estimator can be incorporated into the multi-armed bandit framework as follows (dumitrascu2018gt). At each step, an arm is chosen based on its probability of yielding the greatest novel diversity. The probability that the th population is chosen during a trial is based on the weight of its Good-Toulmin estimator . Upon collecting new samples from the chosen arm, the reward (the number of novel cell types) is observed, and the parameters of the Good-Toulmin estimator for the chosen population are reestimated with the new samples and reward.

In implementing our HPY-TS strategy, we use an initial sample of observations from each of the arms, with observations sampled at each iteration. We use sampling steps, and the results are averaged over runs. The computations are performed in parallel, and the code is available at We observe that the HPY-TS algorithm performs better than the GT-TS strategy and the Uniform strategy; the latter two methods explore, but fail to exploit the most diverse arms (Fig. 1). As expected, the HPY algorithm discovers fewer new species than the Oracle strategy, but the HPY approach comes close to Oracle behavior. The results show similarities with the positive performance previously reported in the work of battiston2016multi for a simulation scenario with a small number of arms ( arms), with the added benefit of scalability to an order of magnitude more arms.

Figure 1: Simulation results. We considered a multi-armed bandit setting for population sampling with arms in which the species diversity follows Zipf’s laws with parameters ( high diversity, winning arms) and ( low diversity arms). An initial sample of cells were collected from each of the arms, with additional cells sampled at each iteration. We used sampling steps and averaged the results over

runs. We compared HPY-TS (red, dashed) to two baselines—the GT-TS sampler (black, dotted) and a Uniform sampling strategy (green, dot-dashed line)—and to an Oracle estimator (blue, solid). The shaded bands are within one standard deviation to the average performance, computed as the mean across simulations.

5.2 Application to single cell RNA-seq experimental design

We further illustrate the advantage of our approach in the context of a simulation study based on the Mouse Cell Atlas data (han2018mapping). The Mouse Cell Atlas aims to provide the first high-throughout transcriptome-based single-cell atlas in a mammalian system. The project assayed over cells from all of the major mouse organs and identified previously uncharacterized cell populations (Fig. 2). Following technical noise correction, high-quality cells were sequenced, representing distinct tissues and major cell types across four developmental stages – embryo, fetal, neonatal, and adult. In the collection process, equal numbers of cells were sampled uniformly across organs and developmental stages. We show that our experimental design approach achieves similar cell type diversity while requiring substantially fewer samples when compared to related methods. We follow the simulation setup developed in prior work (dumitrascu2018gt), outlined below.

Figure 2: Summary of the single cell RNA-seq data from the Mouse Cell Atlas (han2018mapping). Panel A: Cell type distributions across tissues together with the corresponding cells and specimens. Panel B: Cell type distributions per arm: aggregated tissue types and developmental stages.

In our simulation, we envision a setting in which the cells were assayed in smaller batches than in the actual experiments. Smaller batches are common in single cell experiments that use technologies that are less noisy but more expensive; thus experimental design plays an important role in minimizing cost (angerer2017single). Moreover, larger batches would quickly saturate the available data, so we evaluate on batch sizes that are typically smaller than are used. The mouse organs were aggregated across the four developmental stages, fetal, adult, embryo, and newborn, resulting in a heterogeneous data set, and cells were sampled with replacement from each of the four experimental categories, representing the developmental stages. An experimental round corresponds to an allocation step in which budgets of cells are are distributed across experimental conditions.

We consider two ways of allocating samples: the incidence case and the delayed abundance case. In the incidence case (Fig. 3), a most informative experimental condition is chosen and samples come from that single condition. In the delayed abundance case (Fig. 4), samples are allocated across all the available experimental conditions in parallel. In both cases, the budget allocation step is applied using the HPY-TS strategy as follows. In the incidence case, we allocate more cells to the experiment (i.e., developmental stage) with a higher probability of yielding new cell types based on previous trials. Following the initial sampling step with samples from each arm, additional trials were performed. At each time step, all cells were sampled from one chosen experiment. In the delayed abundance case, after the initial samples from each arm, a budget of cells were distributed across arms according to the HPY-TS estimated probabilities, across sequential trials. The results were averaged over runs for each algorithm, and the HPY-TS sequential Monte Carlo strategy used sampling steps. We compare HPY-TS to three other approaches–the GT-TS sampler, a Uniform sampling strategy, and an Oracle estimator.

Figure 3: Performance of HPY-TS on the Mouse Cell Atlas data (incidence case). An initial sample of cells were collected from each of four populations: fetal, adult, embryo, and newborn. Following the initial sampling step with samples, sequential trials were performed. At each time step, all cells were sampled from one chosen experiment. The results were averaged over runs of each algorithm. We compared HPY-TS (red, dashed) to two baselines—the GT-TS sampler (black, dotted) and a Uniform sampling strategy (green, dot-dashed line)—and to an Oracle estimator (blue, solid). The shaded bands are within one standard deviation to the average performance, computed as the mean across simulations.

Our results show that the HPY-TS approach achieves substantial improvement in efficiency as compared to the baseline GT-TS estimator and to the Uniform sampling strategy (Fig. 3). Moreover, the HPY-TS approach shows nearly optimal performance, as compared with the performance of the Oracle. When compared to the Uniform strategy, our HPY-TS approach leads to, on average, as much as more distinct cell types identified, with an average consistent margin of additional distinct cell types identified across trials (Fig. 3, 4). The baseline GT-TS approach approximates the probability of observing a new cell type according to a model that assumes the cell types are distributed according to a Poisson process (orlitsky2016optimal). In contrast, the HPY-TS algorithm assumes that all arms share a baseline distribution given by the base measure, information that is diffused across the developmental landscape to generate the developmental stage-specific cell type distributions. Sharing information across experiments using this prior appears to substantially improve performance by allowing updates of the parameters governing experiments similar the chosen experiment at each iteration, instead of only the chosen experiment parameters.

Figure 4: Performance of HPY-TS on the Mouse Cell Atlas data (delayed abundance). An initial sample of cells were collected from the four populations: fetal, adult, embryo and newborn. Following the initial sampling step, additional trials were performed and, at each time step, samples were distributed across the arms following a diversity estimation step. The results were averaged over runs of each algorithm. We compared HPY-TS (red, dashed) to two baselines—the GT-TS sampler (black, dotted) and a Uniform sampling strategy (green, dot-dashed line)—and to an Oracle estimator (blue, solid). The shaded bands are within one standard deviation to the average performance, computed as the mean across simulations.

6 Discussion

In this paper, we propose a Thompson sampling strategy using a hierarchical Pitman-Yor process to optimize species discovery in experimental design. Our HPY-TS strategy was shown to substantially improve cell type discovery in the setting of experimental design for single cell sequencing collection. Our multi-armed bandit setup readily applies to cases where the number of arms corresponding to experimental conditions have substantial structure across those conditions. In particular, as cell atlases emerge, the strategy developed here is crucial to efficiently and effectively study cell type variability across new and growing experimental conditions including many thousands of simultaneous cellular perturbations (e.g., Perturb-seq (dixit2016perturb)) and combinatorial interventions (horlbeck2018mapping). The improvements that the HPY-TS strategy achieves over uniform experimental design strategies in both simulated and real data justify incorporating these types of methods in the data collection pipeline during the experimental process.

From a statistical standpoint, our work proposed a sequential Monte Carlo scheme that, unlike previous work (battiston2016multi), scales to a multi-sample setting and allows for inference across a large number of experiments as one finds in cell atlas development or Perturb-seq experiments. This makes our HPY-TS strategy appropriate for experimental setups with a large and growing number of arms. We demonstrate a number of advantages in using Bayesian experimental design to maximize cell type discovery within a budget during single cell RNA-sequencing experiments. We further show evidence that modeling the cell type structure of single cell data using an HPY prior captures the developmental constraints guiding cell type diversity and allows each sample to inform all of the arms, leading to near-oracle behavior.


The authors would like to thank David B Dunson, Jordan Bryan for helpful comments. Federico Camerlenghi and Stefano Favaro received funding from the European Research Council (ERC) under the European Union’s Horizon 2020 research and innovation programme under grant agreement No 817257. Federico Camerlenghi and Stefano Favaro gratefully acknowledge the financial support from the Italian Ministry of Education, University and Research (MIUR), “Dipartimenti di Eccellenza” grant 2018-2022. Federico Ferrari is partially supported by grant 1R01ES028804-01 of the National Institute of Environmental Health Sciences of the United States Institutes of Health.