## 1 Introduction

As cancerous diseases still cause many deaths in human population, there is a high need of improving the understanding of tumor growth dynamics and treatment strategies. For that purpose and for treatment optimization, mathematical approaches can help with adequate quantitative descriptions and predictions. Numerous mathematical models and methods are already available (see [1, 2]), including differential equations of all types, stochastic models, phenomenological models as well as more mechanistic ones. Despite good new experimental techniques, e.g. measuring tumor growth also in vivo [3, 4, 5], it is still difficult to quantify growth by determining parameter values in realistic situations without harming a patient or affecting a potential treatment. To get a better quantitative understanding of the underlying processes, investigating tumor cell cultures and even single cells under well-controlled conditions provides a promising way to get at least parts of the necessary information (see e.g. [6]). But even under laboratory conditions, parameters can underlie variations and cells may behave individually to some extent. To consider this, uncertainty quantification can be an adequate tool. Having such in vitro

experiments, it is much easier to separate and measure different environmental influences on the tumor growth. The probably most relevant environmental influences are the availability of nutrients and oxygen. For modeling purposes, one often focuses on the limiting nutrient, without specifying it in detail. Nutrient deprivation causes “stress” to any cells, but other factors may cause stress to the cells, as well. Thus, it is worth to consider the concept of stress as a relevant factor for modeling growth and death of tumor cells.

To approach this problem, we suggest the introduction of an “environmental stress level” as an immeasurable auxiliary variable, which quantifies to what extent viable cells would get stressed out under certain environmental conditions. These may include quantities like nutrient saturation, oxygen level, drug concentrations or mechanical forces. On the one hand, this environmental stress level can e.g. inhibit cell growth and promote death by inducing necrosis for present cells. On the other hand, it is assumed to have an influence on cell movement, as cells favor areas with a low environmental stress level, which may cause metastasis formation, as well. In this work, this approach is presented in a simplified spatially homogeneous setting, where only the nutrient saturation effects the environmental stress level. Nevertheless, the presented model can be easily extended to more environmental variables, which provides several advantages from a modeling point of view. Even in this very simplified setting, model comparison shows a slight preference for the newly introduced model.

For comparison, we present two systems of ordinary differential equations, which model tumor cell population dynamics respectively with and without considering an environmental stress level. To do so, time-resolved *in vitro* data (from [7]

) are used to perform model parameter calibration with Bayesian inversion. The latter approach provides a quantitative estimate of the uncertainty in the estimated parameters and hence brings more information than a deterministic inversion method, at the price of a higher computational cost. In the Bayesian inversion setting several methods are available for sampling. Among the most popular ones are Markov Chain Monte Carlo (MCMC) methods

[8, Ch. 6-7], which are very simple to implement, but they need ad hoc tuning when the dimension of the parameter space is high or for complex posterior distributions [9]. In the last decade, particle based methods have gained popularity for their higher efficiency and robustness compared to simple MCMC, especially in the presence of high dimensional parameter spaces and multimodal posterior distributions [10]. Most of these algorithms are population based, which means they deal with a collection of samples in every iteration. In this category are sequential importance sampling and resampling (annealed importance sampling [11] and Sequential Monte Carlo [12, 13]) and population MCMC [14, 15]. To calibrate the unknown parameters of the presented models, we use Sequential Monte Carlo (SMC). It is worth mentioning that the ensemble Kalman filter

[16] can also be used to solve the inverse problem. It can be viewed as an approximate SMC with Gaussian approximation of the posterior [17]. However, the rigorous analysis of this method is confined for the moment to linear forward operators

[18, 17]. Eventually, the calibration results are used to compare the models via a validation metric and the Bayes factor. For this purpose, the chosen SMC method is advantageous opposed to other algorithms (e.g. MCMC), since its structure provides easy access to the Bayes factor

[10].The article is structured as follows: The upcoming Section 2 explains the investigated mathematical models with the underlying experimental setup and provides background about the applied methods for uncertainty quantification, parameter calibration and model comparison. The results from the model calibrations are summarized and discussed in Section 3. Finally, Section 4 concludes the presented results and the use of the novel modeling approach. At the end of this article, more details about mathematical analysis of the models and the calibration results are enclosed in a supplementary section.

## 2 Materials and method

Section 2.1 presents different models for population dynamics of tumor cells and introduces the idea of using a stress level to model environmental influences on the cells collectively. To investigate the effect of nutrient changes on viable cells, we consider a special case of the models. The underlying biological setting and the measurement methods for the corresponding in vitro experiments are explained in Section 2.2. With these data we calibrate the unknown parameters of the models using Bayesian inversion and Sequential Monte Carlo methods. Section 2.3 provides the theoretical background behind the algorithms. Finally, we present the validation metric and the Bayes factor in Section 2.4, which are used to investigate and compare the resulting model calibrations.

### 2.1 Mathematical modeling

The presented ODE models describe the basic growth and death dynamics of tumor cells in a spatially homogeneous, avascular environment under consideration of the present nutrient saturation. The time-dependent variables are the density of viable tumor cells , the nutrient saturation , and the environmental stress level at day .

We consider two possibilities to model the influence of the nutrient saturation on the cells: Model assumes that the nutrient saturation directly affects the growth/death rates of the cells, while in model the nutrient supply influences the cell population indirectly by changing the environmental stress level, which itself has an effect on the viability of the cells. The nutrient saturation is bounded and normalized: . The bounds represent the complete absence of nutrients () and a nutrient supply, which creates optimal growth conditions for the cells (). In the latter case, both models can be reduced to a simplified model, which is independent of the nutrient saturation . We denote this model by (“opt” short for “optimal” growth conditions).

A detailed description of each model follows in the consecutive paragraphs. All models are biologically reasonable in a first check, as they have been mathematically analyzed in terms of positivity and boundedness of the solutions as well as steady states and their stability. The last paragraph of this section provides a summary of the variables, parameters, and mathematical features of the models.

#### What if the cells find optimal nutrient conditions?

External nutrients serve as an energy source for cell proliferation. Growth can happen with a maximal possible proliferation rate , which is virtually reached under optimal nutrient conditions (). Under these circumstances, cell death is assumed to occur only for nutrient-independent reasons (e.g. dying of old age) with a constant rate . These dynamics can be modeled with a combination of a generalized logistic growth term (first proposed in [19]) and an exponential death term. The population growth is limited by the carrying capacity of the biological system, taking into account e.g. limited space but not nutrient shortage. In particular, cell growth is inhibited in a cell density dependent manner, which is known as “proliferation contact inhibition” [20]. The strength of this phenomenon is specific for the cell type and is modeled by the shape parameter of the logistic growth. A small value for indicates strong contact inhibition, i.e. reduction of proliferation already starts at small cell densities. Therefore, can be interpreted as the strength of contact inhibition. Overall, this results in the following initial value problem:

() |

As a feature of the pure logistic term, population growth for very small populations (i.e. ) can be approximated by exponential growth with rate . It is a reasonable assumption that, independently of the initial population size , the cell population’s size actually increases over time under optimal nutrient conditions, which translates to a parameter relation: . With this constraint, the ODE can be rewritten as a purely logistic growth term with a new rate and capacity :

Therefore, under optimal nutrient conditions, the cell population actually grows logistically with

For the known analytical solution of such an initial value problem, i.e. for model , is

(2.1) |

#### What if suboptimal nutrient conditions directly affect the cell dynamics?

Exposing the cells to suboptimal nutrient conditions () can affect the cells’ metabolism by decelerating proliferation or even lead to cell death by starvation. To include this, an additional death term is needed, whose rate as well as the proliferation rate are scaled appropriately to the nutrient supply.

A time-dependent nutrient supply can be described by adding an additional differential equation to the system: with . The reaction term can e.g. include terms for nutrient consumption by viable cells or an external nutrient source. In the considered *in vitro* experiments, whose measurements are used to calibrate the model later on, the nutrient concentration is maintained approximately constant. Therefore, all presented models assume

The present nutrient saturation contributes to the cells’ reproductivity as well as to their ability to survive. If there are no nutrients available, the cells die from starvation with a maximal possible rate , assumed to happen in an exponential manner. With increasing nutrient supply, the starvation rate decreases and the proliferation rate is up-regulated. Adapting the previous model accordingly yields the more general initial value problem

() |

with appropriate nutrient-dependent functions and , scaling the total growth and death rates:

The scaling functions need to fulfill certain criteria to be biologically meaningful in our model setting. First, the cells should not die from starvation but proliferate with the maximal possible rate under optimal nutrient conditions. In contrast, without nutrients the cells should not be able to reproduce but starve with a maximal possible rate. These features translate to the properties

Second, increasing the nutrient saturation should promote growth and reduce starvation: resp. need to be monotonically increasing resp. decreasing functions bounded by . Therefore, a possible choice are Hill type functions with Hill coefficient , i.e.

(2.2) |

where denotes the nutrient threshold, for which the cells proliferate/starve with half-maximal rate. For simplicity, we do not assume any hysteresis effect, motivating the relation . The scaling functions can be interpreted as influence functions, which describe how tolerant viable cells are to nutrient changes. In this context, the parameter can be seen as a nutrient sensitivity threshold. Figure 1 shows a qualitative plot of the behavior of the scaling functions.

It can be shown (for calculations see Section 5.1 in the supplement) that for a well-defined analytical solution of problem () is

We note that in the extreme situation of having no nutrient supply (i.e. ), it holds and . This results in and . By inserting this special case of into the given analytical solution, it simplifies to

This is an expected result, being the solution of a simple exponential decay model without growth:

#### Advantages of using the environmental stress level (ESL) instead to directly affect the cells.

For an alternative way to include the effect of the nutrient conditions in model , we introduce an auxiliary variable: the environmental stress level . It is designed to be an immeasurable quantity, describing to what extent viable cells would get stressed out, if exposed to certain environmental conditions, like in this case the nutrient saturation. We assume that the stress level is limited by a maximal level, which is set to one: .

In a general setting, the the ESL can be influenced by multiple environmental factors (e.g nutrient/oxygen/drug concentration), where each can be mathematically represented by a system variable. Let be a set of such environmental variables, with

being the given reaction equations. Then the ODE for the environmental stress level can be given by

(2.3) |

The first term includes the dynamics raising the ESL. Since , the increase is bounded by one from above via the factor . If the environmental factors generate beneficial growth conditions for the cells, the ESL decreases according to the second term. The parameters resp. are variable-specific “sensitivity rates”, denoting how fast the stress level increases/decreases, if the value of the associated variable enters/leaves a range, which is critical for survival. With distinguishing between and per variable, it is possible to mathematically consider that cells react e.g. faster to the lack of an environmental factor, which is essential for their survival, than they recover from the deprivation once beneficial conditions are restored. Which values of are considered to be critical, is given by the explicit shape of the corresponding “influence functions” and . In particular, large values of promote stress, whereas large values of inhibits stress.

Collecting the influences of the environment in the stress level gives rise to some practical advantages. We can choose an arbitrary number of environmental variables and include them easily into the stress reaction due to the modular structure of (2.3). With the parameters and we can consider different sensitivities of the cells to changes in different environmental factors. Although neither these parameters nor the stress level can be measured directly in experiments, appropriate data of cell viability can provide enough information to estimate the values of the sensitivity rates. For this e.g. statistical methods for parameter calibration can be used, where the only prior assumption is a rough range, in which the parameter values are expected. Then, by grouping these sensitivity rates according to their magnitude, we can distinguish the corresponding environmental variables between having a fast or slow or even no impact on the cells. Under a quasi-steady state assumption, the ODE for can then be simplified by choosing a time scale of interest and omitting environmental variables with no influence.

In the considered experimental setting, we have the special case of the constant nutrient saturation being the only environmental variable:

We assume the ESL to increase and decrease with the same “nutrient sensitivity rate” . The corresponding “nutrient influence functions” are then and , as introduced previously in (2.2) for model . Hence, we have

i.e. given the initial condition we consider the ODE

Since this equation is independent of the variable , the exact solution can be calculated by separation of variables and for it is given as:

(2.4) |

Instead of the nutrient-dependent scaling functions in the basic model , now the ESL influences the proliferation and starvation rate. A high stress level causes slower proliferation and faster starvation, whereas a low stress level has the opposite effect. Since by definition , this results in the following initial value problem:

() |

By inserting the explicit time-dependent analytical solution (2.4), the system reduces to a non-autonomous ODE for , which is solved numerically. We observe that model has an interesting relation to the model : under the assumption that critical nutrient changes immediately influence the viable cells, i.e. assuming , we return to model , since for it holds

The same result can be achieved by a quasi-steady state assumption, stating that changes in the ESL happen at a much faster time scale than changes in the tumor cell number. In this case, the stress level reaches its steady state virtually instantly, which is:

We are aware that having multiple environmental factors included into the model would make the advantages of using the concept of the environmental stress level more evident. However, due to the lack of corresponding data, we focus on the special case of having only one environmental variable (nutrient saturation ). We consider the investigations in this manuscript as a proof of principle that the environmental stress level is a feasible alternative way to model the effect of the environment on the cells.

#### Overview over all models and their mathematical properties.

The last model has been constructed from model , which itself uses model as a basis. Therefore, the number of variables and parameters increases with adding more complex dynamics. An overview over the variables and strictly positive model parameters of each system as well as abbreviating notations for important parameter terms/functions can be found in Tables 1, 2 and 3 below. The units of the variables and parameters are motivated by the biological setting and the measurement methods, which are explained in more detail in the following Section 2.2.

Variable | Meaning | Unit | |||
---|---|---|---|---|---|

✓ | ✓ | ✓ | Density of viable tumor cells | ||

✓ | ✓ | Nutrient saturation | |||

✓ | Environmental stress level | – |

A mathematical analysis of the models yields positivity and boundedness of the solutions, which are important features for biological reasonableness. Table 4 summarizes the computed bounds as well as the steady states reps. and their stability. The corresponding calculations providing these results can be found in the supplementary Section 5.2.

### 2.2 Experimental data

We use data from the experiments described in [7] to calibrate the unknown parameters from Table 2. A CellTiter-Blue^{®} assay was used to monitor the viability of tumor cells. In particular, viable cells metabolize a provided chemical and they emit measurable light as a result of this process. Hence, the data points are fluorescence intensity measurements. This way, viability was measured once every day and there are four biological replicates of each measurement to estimate statistical significance and repeatability.

Let be an intensity measurement of a specific cell line at time point . Excluding the corresponding background intensity of the cell-free medium, the fluorescence intensity produced by viable cells is assumed to be directly proportional to the density of viable tumor cells . The experiments were performed with different initial cell densities between and cells per milliliter, which motivates the unit of and eventually leads to the relation

where denotes the proportionality constant translating fluorescence intensity to cell density. Whenever we refer to “intensity measurements” in the following, we mean the fluorescence produced by the cells and neglect the superscript “” for better readability.

For nutrition the cells are supplemented with a particular concentration between to of fetal bovine serum (FBS). A supplementation with does not provide the cells with any nutrients, whereas generates optimal growth conditions. The nutrient supply is kept constant throughout the whole duration of each experiment, i.e. the nutrient saturation is assumed to stay at its initial level at any time. Overall, this motivates:

The remaining variable, the environmental stress level , is an auxiliary variable and especially immeasurable. Hence, it has no experimental counterpart and is used as a dimensionless quantity.

#### How are uncertainties considered?

In reality, the equation is not rigorously fulfilled. This can be due to e.g. model inadequacy or biological fluctuations of the cells’ metabolism, which affect measurement accuracy. To capture this uncertainty, we assume a multiplicative noise for each element of a set of measurements and a set of model solutions considering the corresponding values of , , and . In the Bayesian framework that we adopt (see Section 2.3

) we model this to be a random variable

such that for :(2.5) |

Let be i.i.d. and have the unimodal and continuous distribution of a random variable

with probability density function (PDF)

. Then, the following properties should hold:Different distributions are possible to accomplish these properties. A small number of shape parameters and an easy calculation to ensure property [P2] are desirable. Therefore, we choose a Gamma distribution

, with a few restrictions. This is a plausible choice for multiplicative noise and often used in imaging theory (see e.g. [21, 22, 23]). Property [P1] and the desired behavior near infinity of [P3] are satisfied by definition. The corresponding PDF is given by , where denotes the Gamma function. To fulfill the remaining properties, we observeNote that by constraining the shape of the distribution depends only on the parameter . In fact,

is directly related to the standard deviation

and hence the variance of the distribution:

. Therefore, for the uncertainty factor for a particular measurement can be modeled by(2.6) |

We use the percentiles and of the Gamma distributed uncertainty factors to define the

(2.7) |

In particular, given a specific noise variance , the model expects 90% of the measurements within this interval, whereas respectively 5% are expected below and above it. The left side of Figure 2 depicts exemplary plots of the PDF for different values of

. It also shows the positive skewness of the distribution. Under consideration of the measurement method, this is a reasonable feature for the uncertainty factors, assuming the cells might not metabolize the assay to their full potential. On the right side of Figure

2 we see an example of a 90% uncertainty range around a solution.The reason for considering a multiplicative (and, for instance, not additive) noise term is twofold. From a mathematical perspective, it allows to preserve positivity of the data. A more practical motivation is the reasonable assumption that the fluorescence noise of the intensity measurements is proportional to the density of viable cells. This is also supported by the observation of a larger variance for experiments with larger cell numbers in our data. Furthermore, multiplicative noise has demonstrated to be better suited for fluorescence than additive noise in other experimental settings [24].

#### How can the measurements be utilized?

One of the experiments in [7] monitors the dependence between cell viability and nutrient supply by providing five different nutrient concentrations over a period of 7 days: this results in five data sets “D1”–“D5”. These measurements are employed to calibrate the unknown model parameters for models and with Bayesian inversion methods (for details see following Section 2.3). Another experiment in [7] was designed to investigate cell behavior under optimal growth conditions, i.e. the cells were provided with over a period of 21 days. We denote the data set generated by this experiment with “D6”. It is used additionally to D1–D5 to validate the calibration results with model . All experiments start with several populations of different initial size . Table 5 shows the experimental setup of all data sets and the resulting initial conditions in the models.

Nutrition | Duration | Initial values in the models | |||
---|---|---|---|---|---|

Data set | (in ) | (in days) | |||

D1 | 10.0 | 7 | |||

D2 | 7.5 | 7 | |||

D3 | 5.0 | 7 | |||

D4 | 2.5 | 7 | |||

D5 | 0.0 | 7 | |||

D6 | 10.0 | 21 | () | () |

### 2.3 Parameter calibration: Bayesian inversion and Sequential Monte Carlo (SMC)

The task to identify the unknown true parameters from given data is called the “inverse problem”. We solve it using a Bayesian approach, which leads, under mild assumptions, to a naturally well-posed inverse problem [25]. Furthermore, this approach allows to quantify the uncertainty in the estimated parameters and hence it brings more information than a deterministic inversion method, at the price of a higher computational cost.

We collect all parameters to be estimated in a vector

(), where denotes the parameter space. Assuming the considered data consist of intensity measurements (excluding background intensity), we collect them in a vector . Defining with as the forward operator mapping parameter values to the corresponding intensities, such a measurement can be rewritten as(2.8) |

where is the corresponding model solution to (), () or () using as parameters and is the multiplicative noise. Note that we use parts of the data sets D1–D5 for parameter estimation, hence each measurement can refer to a different nutrient condition, initial cell density, and time point. Therefore, has to be calculated in consideration of the corresponding values for , and .

Now, we want to consider all measurements in collectively. The parameter vector and the noise are modeled as multi-dimensional random variables taking values in and respectively. The Bayesian formulation of the problem is the following: Given a prior (measure) on , compute the posterior (measure) given the data . The prior in the Bayesian setting is the correspondent of a regularization in deterministic inverse problems [25] and it reflects the knowledge about the parameters before including any information given by the data, whereas the posterior describes the knowledge after seeing the data. Let and denote the probability densities of and , respectively. By Bayes’ formula, we have

(2.9) |

where is the data likelihood [25]. The proportionality constant of relation (2.9) depends only on . It is called “model evidence” and it can be used to quantitatively compare two models (see Section 2.4). We remind from the previous Section 2.2 that (2.6) states i.i.d. with for every measurement , . Using this together with (2.8), the data likelihood of is:

#### How to sample from the posterior?

In order to make predictions, we want to sample from the given posterior distribution (2.9). However, this has a complicated, concentrated density, so we cannot sample from it exactly with a random number generator. To approximate the posterior measure, we use therefore the Sequential Monte Carlo (SMC) method, which we explain now based on [26]. In SMC one considers a sequence of intermediate distributions , such that is the prior and coincides with the posterior distribution. The probability density of the intermediate measure can be defined by

(2.10) |

where and are normalizing constants and

is the likelihood associated to observation with normalization constant , . The intermediate densities could also be constructed with an adaptive approach using tempering [9]. However, since the considered data measures the quantity of interest in a time series, the presented filtering method is computationally more efficient. The SMC algorithm samples sequentially from the intermediate measures using a weighted swarm of samples, called particles. Let be the sample size, i.e. the number of particles. At the -th iteration () the algorithm leads to a collection of particles with associated weights , which gives the approximation

The SMC algorithm is summarized in Algorithm 1: we achieve appropriately weighted particles to approximate by starting with uniformly weighted particles distributed according to the prior (line 1) and iteratively move the samples from the previous measure to in a selection (lines 3-7) and a mutation step (line 8) [27, Ch. 5], which are explained in detail in the following paragraphs.

Selection step. We start with a collection of particles , distributed according to . Their weights are updated to by importance sampling: for we have

We see that is normalized to ensure that

, i.e. having a probability distribution. Note that the importance sampling only changes the weights and not the particles. However, if there are many particles with low weights, the estimation is only as accurate as a Monte Carlo approximation with a very small number of particles

[28]. In this case, the reweighing step is followed by a resampling step, where the particles are replaced according to their updated weights. Resampling is needed, if the effective sample sizeis small, which we check by comparison with a threshold :

This discards particles with low weight and improves the representation of the distribution .

Mutation step. Performing only selection steps will eventually lead to degeneracy in the diversity of the particle population. In particular, after some resampling steps, few particles will survive and be replicated. Therefore, we introduce diversity in the particles by moving them according to a Markov Chain Monte Carlo (MCMC) kernel . This kernel is -invariant, i.e. it does not modify the particle distribution. We adopt the adaptive strategy developed in [9] to construct such a MCMC kernel. A random walk Metropolis-Hastings (MH) proposal is used on each univariate component, conditionally independently. More precisely, remembering that each particle is a vector , MH proposes with by computing

where denotes the -th component () and the scale is tuned to the acceptance rate of the previous SMC iteration. This means, we choose adaptively as