Log In Sign Up

Bayesian Metabolic Flux Analysis reveals intracellular flux couplings

by   Markus Heinonen, et al.

Metabolic flux balance analyses are a standard tool in analysing metabolic reaction rates compatible with measurements, steady-state and the metabolic reaction network stoichiometry. Flux analysis methods commonly place unrealistic assumptions on fluxes due to the convenience of formulating the problem as a linear programming model, and most methods ignore the notable uncertainty in flux estimates. We introduce a novel paradigm of Bayesian metabolic flux analysis that models the reactions of the whole genome-scale cellular system in probabilistic terms, and can infer the full flux vector distribution of genome-scale metabolic systems based on exchange and intracellular (e.g. 13C) flux measurements, steady-state assumptions, and target function assumptions. The Bayesian model couples all fluxes jointly together in a simple truncated multivariate posterior distribution, which reveals informative flux couplings. Our model is a plug-in replacement to conventional metabolic balance methods, such as flux balance analysis (FBA). Our experiments indicate that we can characterise the genome-scale flux covariances, reveal flux couplings, and determine more intracellular unobserved fluxes in C. acetobutylicum from 13C data than flux variability analysis. The COBRA compatible software is available at


Genome-scale estimation of cellular objectives

Cellular metabolism is predicted accurately at the genome-scale using co...

Inferring Signaling Pathways with Probabilistic Programming

Cells regulate themselves via dizzyingly complex biochemical processes c...

Adversarial α-divergence Minimization for Bayesian Approximate Inference

Neural networks are popular models for regression. They are often traine...

Bayesian Origin-Destination Estimation in Networked Transit Systems using Nodal In- and Outflow Counts

We propose a Bayesian inference approach for static Origin-Destination (...

Hidden Hamiltonian Cycle Recovery via Linear Programming

We introduce the problem of hidden Hamiltonian cycle recovery, where the...

1 Introduction

Metabolic modelling considers networks of up to thousands of chemical reactions that transform metabolite molecules within cellular organisms (Palsson, 2015). The key problem of metabolism is estimation of the reaction rates, or fluxes, of the system of the highly interdependent intracellular fluxes from measurements of few exchange fluxes that transfer nutrients or products between the external medium and the cell.

The dominant approach to flux estimation is the celebrated Flux Balance Analysis (FBA) framework that finds reaction rates that maximise prespecified cellular growth function (Feist and Palsson, 2010), while assuming the cell is in a steady-state, where concentrations of intracellular metabolites do not change (Almaas et al., 2004). The FBA problem can be casted as a convenient and computationally efficient linear programming problem of solving a system of linear steady-state constraints while maximising a linear growth target (Orth et al., 2010), and where flux measurements can be encoded as constraints to the fluxes (Carreira et al., 2014). FBA is commonly used to characterise intra-cellular fluxes in various simulated target conditions (Mo et al., 2010). In metabolic flux analysis (MFA) values of unknown fluxes are directly estimated based on measurements of some determined fluxes without explicit maximal growth assumption (Kim et al., 2008). In both approaches a point estimate for up to thousands of highly interdependent fluxes are determined (Bordbar et al., 2014).

The standard metabolic analyses suffer from several approximations that warrant careful methodological protocol to achieve biologically meaningful results. First, the exact steady-state constraint is an unrealistic assumption since metabolites can accumulate or deplete (Pakula et al., 2016)

. Second, in FBA maximal growth is assumed, while it only holds at the highest growth phase in practise. Finally, due to a large number of metabolic reactions and limited number of experimental data, flux point estimates commonly used in the field completely ignore the notable uncertainty involved in FBA and MFA solutions. The flux variances are key in characterizing metabolic systems and uncertainties emerging from the use of insufficient and noisy data.

Numerous separate extensions to flux analysis have been introduced to alleviate these limitations. The robust FBA framework considers the effect of measurement uncertainties to the maximal growth (Zavlanos and Julius, 2011). The steady-state assumption was recently relaxed by the RAMP model (MacGillivray et al., 2017). In contrast to point flux estimates of FBA and MFA, the flux variability analysis (FVA) characterises the sensitivity of the target function to independent flux perturbations, resulting in upper and lower bounds around the FBA solution (Mahadevan and Schilling, 2003; Gudmundsson and Thiele, 2010)

. In principal flux mode analysis the eigenvectors of steady-state flux cone characterise the flux variability

(Bhadra et al., 2018). Alternatively the solution space of the fluxes can be sampled (Schellenberger et al., 2010) by considering only optimal fluxes from border of the flux hypercone (Bordel et al., 2010) or by sampling also inoptimal fluxes from the inside the hypercone (Mo et al., 2010; Saa and Nielsen, 2016). The sampling methods use the ’Hit-and-Run’ (Smith, 1984) or ’Artificially Centered Hit-and-Run’ (Kaufman and Smith, 1998) algorithms to cope with the large flux space. A related approach uses possibility calculus (Dubois et al., 1996) to iteratively refine the estimate of possible and impossible flux states (Llaneras et al., 2009).

There is a distinct lack of approaching metabolism with statistical modelling. The sole statistical approach for flux analysis remains the Metabolica framework that proposed modelling distributions of fluxes of skeletal muscle metabolism (Heino et al., 2007, 2010), but did not include modelling of maximal growths or genome-scale metabolic models. Bayesian methods have also been developed for 13C labeling data (Kadirkamanathan et al., 2006; Theorell et al., 2017) by assuming fixed steady-state and without incorporating any target growth condition.

In this paper we tackle all three limitations simultaneously by introducing a novel paradigm of Bayesian metabolic flux analysis where the genome-scale, interdependent flux vector distributions are estimated. In Bayesian formalism prior distributions on flux random variables are determined, and subsequently updated by incorporating the measurement information, resulting in posterior flux distributions. We place priors on flux distributions, and estimate posterior distributions that characterise and quantify the probability of all flux states that are compatible with flux measurements, steady-state assumption and stoichiometry. Our model can reveal flux dependencies in explicit form, and characterise the full space of flux states in principled fashion. The Bayesian flux analysis can be used as a drop-in replacement to standard FBA, MFA, FVA and sampling methods. We provide public implementation of the Bayesian flux analysis using the standard COBRA framework

(Becker et al., 2007; Schellenberger et al., 2011).

2 Methods

The goal of this paper is a probabilistic formulation of static steady-state metabolic systems that can be applied to whole genome metabolic flux analysis (MFA), flux balance analysis (FBA) and flux variability analysis (FVA). We propose the Bayesian method as a direct replacement to these classic FBA, MFA and FVA tools. We start by assuming a metabolic system of metabolites and reactions has been characterised by a constant stoichiometric matrix , where the rows denote metabolite participations in all reactions, while the columns denote reactants and products of metabolites by individual reactions. The flux vector denotes the reaction rates of the system. The steady-state equation can be stated as

which encodes that metabolite concentration changes are zero and hence the metabolite concentrations do not change. Throughout the paper we assume a subset of fluxes have been observed or determined (for instance, some of the exchange fluxes), while the remaining fluxes are unknown. Our goal is to infer the distribution of all unknown fluxes given the observed fluxes, the steady-state constraints and the flux lower and upper bounds.

2.1 Bayesian metabolic model

We formulate a Bayesian flux model, which starts by assuming multivariate Gaussian priors for fluxes as

with means and diagonal covariances . The prior means are set to zero, or to the closest value to zero considering the flux upper and lower bounds. The variances

are hyperparameters that characterise the

a priori values the flux can take. The prior distribution converges towards an uninformative uniform prior as the prior variances increase.

We assume a Gaussian prior also for the metabolite changes

where are the a priori mean accumulations or depletions of metabolite species. The diagonal covariances encode the variances around prior metabolite changes. In strict steady-state, the prior for metabolite change becomes Dirac’s delta function at zero. By increasing the variances we can relax the steady-state assumption on individual metabolites, and encode allowance for accumulations or depletions of them.

The joint distribution of fluxes

and metabolite changes

can now be stated as a joint multivariate Gaussian distribution

that encodes the exact111We add small numerical tolerance within the inverse to ensure invertibility of the matrix. relation . The conditional distribution of fluxes given a specific realisation of metabolite changes (e.g. ) is then from standard Gaussian identities

where . Since we do not in general have access to exact metabolite change values , we marginalise the conditional flux distribution over the change prior distribution resulting in


where and .

2.2 Conditioning the model with observations

Assume we have access to noisy observations from a subset of observed fluxes . The observations can be empirical measurements, 13C flux estimations, or flux hypotheses determined by the user. We assume independent additive Gaussian noise with variances collected in a matrix , and hence the likelihood of observed fluxes is

The joint distribution of all fluxes and noisy flux observations is now

which gives a conditional distribution of all fluxes given the observations as


where is the noisy covariance, is the full covariance matrix, is the covariance matrix between observed fluxes and all fluxes, and is the covariance matrix between observed fluxes. Note that observed fluxes are in both and in . Note also that the model works with no observations at all as the conditional distribution in Eq. (2) reduces back to the prior in Eq. (2.1).

Finally, we add the flux upper and lower bounds by truncating the distribution with the known flux lower and upper bounds resulting in the final truncated normal flux posterior

The posterior encodes the distribution of bounded fluxes that are compatible with the flux observations, flux priors, and where steady-state applies according to the tolerances determined by the steady-state prior means and variances .

The derived flux posterior is an unimodal truncated multivariate normal (TMVN) distribution where flux dependencies are represented through the covariance matrix , which encodes all flux relationships with high rank. The flux posterior as a whole characterises the distribution of all valid flux vectors. The main characterisations of interest are the individual flux distributions (the marginals) and flux combination distributions (multi-variate marginals). Marginals of TMVN’s are not analytically tractable, nor are they TMVN distributions (Horrace, 2005). We resort to MCMC sampling from the TMVN flux distribution to reveal individual flux, flux pair or flux group distributions.

2.3 Gibbs sampling truncated MVN’s

A recent review summarises sampling approaches for Truncated MVN’s (Altmann et al., 2014). The conditionals of TMVN’s are still TMVNs (Horrace, 2005), which has lead to many Gibbs based samplers (Geweke, 1991; Kotecha and Djuric, 1999; Horrace, 2005; Emery et al., 2014; Li and Ghosh, 2015), while also HMC samplers (Pakman and Paninski, 2014) have been proposed, while elliptical slice samplers would also fit well to the problem (Murray et al., 2010). We experimented with the three main approaches, and found out that Gibbs sampling has consistently the best performance in genome-scale metabolic models up to 4000 fluxes (data not shown). In the remainder of the paper we apply Gibbs sampling.

To sample the distribution we begin by transforming it into whitened domain by Cholesky decomposition with transformed fluxes with white distribution , where and . We sample from the univariate conditional distributions


which is a standard Normal with bounds in the white domain are (We refer to Li and Ghosh (2015) for detailed explanation):

The bounds are functions of the remaining (whitened) fluxes and need to be updated after each change to flux values. We iteratively update each whitened flux by sampling a new value from the conditional distribution with the minimax tilting method (Botev, 2016), which we found out to outperform the alternative Chopin’s algorithm (Chopin, 2011). Finally, we transform the whitened variables back into original domain by .

Organism Model thin thin
E. coli core min min
E. coli iJR904 hr d hr
E. coli iAF1260 hr d hr
E. coli iJO1366 hr d hr
B. subtilis iYO844 hr d hr
C. acetobutylicum Wallenius (2013) min hr
S. cerevisiae iMM904 min d
S. cerevisiae 7.6 hr d hr
T. reesei CORECO hr d hr
Table 1: Metabolic models analysed by Bayesian flux analysis by sampling samples from the flux posteriors. The runtime is shown for thinning rate and .

We notice that the method of Li and Ghosh (2015) can be further optimised by running multiple chains in parallel by considering a flux sample matrix containing whitened flux vector chains . The bound function is then represented as

where is the sample matrix without the ’th row, is the Cholesky matrix without the ’th column. By sampling several chains in parallel with one CPU node we can utilise the full bandwidth of the CPU. This is especially useful for Gibbs sampling.

2.4 Sampling the flux posterior

Figure 1: The effective number of independent simulation draws for a individual fluxes for a subset of models from Bayesian flux analysis with different thinning parameters. The x-axis corresponds to individual fluxes sorted based on the effective number of samples.

We set the initial flux vector to the maximum a posteriori

(MAP) of the truncated normal distribution, which we compute using quadratic programming. The truncated normal distribution is unimodal, and hence we begin sampling from the mode of the distribution providing efficient optimization.

The fluxes can be arranged into distinct bounded fluxes and unbounded fluxes , where . The conditional distribution of a truncated normal is still a truncated normal (Horrace, 2005). We only need to sample the bounded fluxes with Gibbs MCMC, and afterwards the distribution of unbounded fluxes conditioned on the bounded flux samples can be drawn from untruncated normal as

where we arrange and .

We implement the MCMC sampling in Matlab. We run multiple independent Markov chains in a vectorised form. By default we run

chains of flux samples for a total of flux vector samples. We use thinning and only accept every ’th flux sample. The MCMC chains have converged if successive samples are uncorrelated, chains are indistinguishable and have effectively forgotten the initial value. Convergent chains indicate that the MCMC sampler has characterised the whole flux posterior. We use potential scale reduction factor to approximate convergence (Gelman and Rubin, 1992). An optimal value of indicates convergence, while values are considered sufficient for convergence. We also compute the effective number of samples per flux (Gelman et al., 2013).

We assume that flux means are fixed to either zero or the lower bound of each respective flux. The model has then two main hyperparameters and that affect the posterior. The determines how much the mass balance can be relaxed, and can be set according to the prior knowledge of the modeller. To enforce mass balance, a small value such as should be chosen. The prior flux variance determines how much fluxes are driven towards zero a priori, but also should be set to sufficiently high value not to exclude possibly high fluxes. In practise we set the variance for all fluxes .

2.5 Sampling FBA solutions

The presented Bayesian model is a metabolic flux analysis (MFA) model designed to characterise the global flux configurations compatible with mass balance assumption, observations, and bounds. The method can as well be applied as an flux balance analysis (FBA) method, where a target function – such as biomass reaction – is maximised. For FBA mode we first find the standard FBA solution target flux with linear programming, and encode it as a flux observation

, where the variance determines how closely we sample from the maximal target. By default, we set the standard deviation to

of the target flux. To run maximum growth Bayesian FBA we would set an observation for the biomass pseudoreaction to the classical FBA maximum growth value and condition the model with the pseudo-measurement as in Sec. 2.2.

Figure 2: Posterior flux distributions of E. coli core model. The blue color indicates fluxes in growth, the green color maximal growth, the red color maximal growth with exchange fluxes specified, and the conventional FBA solution is a black line.

3 Results

We first perform in silico experiments to highlight the capabilities of the Bayesian FBA and MFA models in Sections 3.13.3. Our goal is to compare the computational approach against the conventional FBA and FVA methods, and to showcase the method’s in silico performance in various metabolic models. The main experiment of this paper is application of the Bayesian flux analysis to the 13C analysis of the C. acetobutylicum in Section 3.4, where we can elucidate fluxes on a genome-scale from a small set of intracellular flux measurements.

3.1 In silico metabolic models

Table 1 indicates the stoichiometric models that were considered. We considered four organisms, seven genome-scale metabolic models and one core model. All models were downloaded from the BiGG database222 For all models we run the Bayesian model in FBA mode – by specifying a growth target – with standard exchange flux measurements included as bounds. We sampled chains of flux vector samples from the full space containing all intracellular and extracellular fluxes. These flux vectors represent all possible flux configurations compatible with the experimental setting. The sampling thinning parameter determines how uncorrelated successive MCMC samples are. We applied thinning values of and , with linear effect on the running time.

The effective number of independent simulation draws for all models from Bayesian flux analysis with different thinning parameter are shown in Figure 1 using the potential scale reduction factor. The x-axis corresponds to individual fluxes sorted based on the effective number of samples. The number of reactions is different for different models. In all cases majority of the fluxes have over effective number of independent samples, which indicates that the samples represent the flux posterior well. A minority of the fluxes have low effective sample sizes. These are usually central branching fluxes that are highly dependent across the genome-scale metabolism, and hence converge slowly. The thinning parameter has a large effect on some models (core, iJR904, iAF1260, iJO1366) whereas for some other models there are not much change (CORECO, iYO844, iMM904, 7.6).

3.2 Bayesian FBA and MFA

We illustrate the characteristics of the Bayesian model using E. coli central carbon metabolism model333BiGG model e_coli_core. The model contains fluxes and metabolites. It should be noted that the model was not further constrained and do not represent the native E. coli strain as such, while it allows e.g. carbon fixation. The model was rather used to theoretically compare our modelling approach to conventional FBA methodology. The conventional FBA solution achieves a growth flux by limiting the glucose exchange flux with a . We consider three cases of Bayesian analysis: (i) growth by defining only the biomass flux observation , (ii) maximal growth scenario by defining the biomass flux observation , and (iii) maximal growth with additional observations for key exchange fluxes: GLC, O2, CO2, H2O, H+, HPO4, SO4, NH4 and ethanol that were all set to their conventional FBA solutions with a standard deviations of . In all three experiments the remaining fluxes were free with only a prior distribution with a standard deviation of specified. We defined a nearly strict steady-state by defining . We sample a total of flux vectors with the Gibbs sampler using chains and

samples each. We use 1-dimensional kernel density estimates as proxies of marginal flux posterior distributions. The small jaggedness of the distributions are an artefact from the MCMC sampling. By considering longer chains these would eventually smoothen out.

Figure 2 shows the flux distributions of fluxes. The blue color indicates the growth flux distributions, the green color the maximal growth distributions, the red color maximal growth with exchange fluxes specified, and the conventional FBA is shown with a black line. The Bayesian distributions represent the space of all allowed steady-state flux configurations given the observations and target function. Figure 2 shows that maximal growth can still support a large variance in many fluxes, with the FBA point estimate misleading by only considering one flux configuration. Similarly to conventional FVA our approach elucidates directly the possible variance in a given flux. For instance, the pentose-phosphate pathway flux G6PDH2r: D-Glucose 6-phosphate + NADP 6-Phosphogluconolactone + H + NADPH can vary between and in maximal growth. The conventional FBA yields zero flux for glyoxylate cycle flux MALS: Acetyl-CoA + Glyoxylate + HO CoA + H + Malate, while the flux space indicates that values up to are possible.

The red distributions indicate how the intracellular fluxes get more and more specified as the model is better specified by inclusion of exchange measurements. Variance of almost all fluxes reduces by more than half. For instance, the flux FUM: Fumarate + HO Malate is specified to a range of from a range of without exchange measurements.

The blue color indicates the cellular flux state when the cell is only growing at of the maximum growth rate. Most fluxes have a higher variance in this scenario. Interestingly the glucose intake is still kept at a relatively high rate. Instead of biomass production, the excess carbon from glucose can be diverted to other carbon sinks, such as formate and ethanol production. The ethanol and formate effluxes can grow up to 3 and 15, respectively. The carbon dioxide exchange decreases by over half into a range of from maximal growth exchange range of .

Figure 3: Examples of flux covariance distributions of E. coli core network. Blue points represent growth, green points maximal growth, red points maximal growth with exchange fluxes specified, and the conventional FBA solution is a black dot. Scatter plots represent pair-wise (2-dimensional) marginal posterior distributions as obtained from the MCMC samples.

3.3 Flux couplings

The flux variations are in general not independent from each other. To understand the intracellular flux space, we need to consider higher-order flux dependencies. The flux sample covariances indicate flux couplings, where the variation in one flux is constrained by other fluxes. Figure 3 highlights example flux pair patterns out of the total of in the core model. Blue points represent growth, green points maximal growth, red points maximal growth with exchange fluxes specified, and the conventional FBA solution is a black dot.

Figure 4: Global flux standard deviation reduction due to addition of twelve 13CMFA internal flux measurements in glucose limited condition. The green points indicates the standard deviation of fluxes given only exchange measurements. The red points indicate the corresponding standard deviations after inclusion of twelve 13CMFA intracellular flux measurements. The blue circles highlight the 13CMFA measured fluxes.

The flux covariations become consistently more constrained while traversing from the loose growth model (blue) towards maximal growth (green). By measuring the exchange fluxes (red), we can already pinpoint most flux patterns very closely around the theoretically optimal flux pattern as defined by the conventional FBA (black).

Figure 5: Pair-wise marginal posterior fluxes presented together the the metabolic network map. The visualized fluxes are highlighted. Blue points represent flux values in growth, green points in maximal growth, red points in maximal growth with exchange fluxes specified, and the conventional FBA solution is a black dot.

Multiple patterns of covariation can be immediately identified. There is an exact coupling between pentose-phosphate pathway reactions GND: 6-Phospho-D-gluconate + NADP CO2 + NADPH + D-Ribulose 5-phosphate and TALA: Glyceraldehyde 3-phosphate + Sedoheptulose 7-phosphate D-Erythrose 4-phosphate + D-Fructose 6-phosphate, as expected from the stoichiometry. Glycolysis related FBA: D-Fructose 1,6-bisphosphate Dihydroxyacetone phosphate + Glyceraldehyde 3-phosphate and pentose-phosphate pathway related G6PDH2r have also a strong, but not exact, negative correlation. The flux PGI: D-Glucose 6-phosphate D-Fructose 6-phosphate and carbon dioxide exchange have no correlation, but the maximal growth still pinpoints to a carbon dioxide exchange value of approximately . The dependency of glyoxylate cycle related MALS and TCA cycle related SUCOAS: ATP + CoA + Succinate HPO + ADP + Succinyl-CoA on maximal growth requirement can be observed in Figure 3. Under maximal growth a negative correlation between the two fluxes emerges. The conventional FBA solution pinpoints an optimal values as zero MALS with SUCOAS around , while the Bayesian model reveals that MALS can still have a flux value of around as long as SUCOAS tends towards simultaneously.

The patterns of G6PDH2r and FBA and FUM indicate a linear inequality for these fluxes. Especially with FUM this is natural since the pentose-phosphate pathway flux G6PDH2r limits the TCA cycle flux FUM. The same effect is also seen with G6PDH2r and ICDHyr: Isocitrate + NADP 2-Oxoglutarate + CO + NADPH, another TCA cycle flux.

To get more insight into the biology behind the flux couplings, the flux pair patterns can also be illustrated in the metabolic network (Figure 5). Figure 5 shows the samples of the flux distributions for several example pairs of reactions. These scatter plots indicate the dependency of the flux configurations between two reactions across the reaction. There is natural correlation between adjacent or subsequent fluxes but also correlation between fluxes in different pathways, such as glycolysis and TCA cycle (See the SUCOAS and TKT1 pair).

3.4 Intracellular flux elucidation of C. acetobutylicum

We consider the results obtained from 13CMFA of Clostridium acetobutylicum grown in chemostat, i.e. in continuous cultivation maintaining steady-state, with reference condition, glucose limited condition and butanol stimulus with the goal of inferring the internal fluxes. We effectively repeat the study of Wallenius et al. (2016), where FBA and FVA were performed and constrained on 12 intracellular fluxes determined by 13CMFA and 7 exchange fluxes. The model for Clostridium acetobutylicum consists of 451 metabolites and 604 reactions, and is given as an .xml file in the supplement of Wallenius et al. (2016).

The data from different measurements in glucose limited condition are shown in Table 2. For the reference and the butanol stimulated conditions, see Supplementary Tables 2 and 3. The internal fluxes are obtained from 13CMFA analysis, whereas the exchange fluxes were measured by chromatographical methods or transferred from the 13CMFA results. Flux values were normalized to the specific growth rate which was given the value of , except for the reference condition, the measured growth is . Exchange fluxes measured from the cultivations were given to Bayesian MFA as mean and standard deviations and fluxes obtained from 13CMFA were given as ranges. In all Bayesian MFA experiments, the steady state relaxation was . Finally, 500 samples were drawn from the posterior with thinning 1000.

To study the FBA’s, FVA’s and BMFA’s performance to predict the measured distributions of metabolic fluxes, three sets of models were generated: (A) a model with reaction directions for the 12 13CMFA determined internal fluxes set according to the data (bounds in table 2 ), (B) a set of 12 models where each reaction among 12 13CMFA determined internal fluxes is unconstrained at a time (the reaction direction is still constrained) while the rest 11 fluxes are constrained according to the measurements (leave-one-out), and (C) a model with all 12 13CMFA determined internal fluxes constrained to their measured values. In all three cases we constrain the model with measured values for the 6 exchange reactions and the measured growth. For the (B) set of models, we test how well we can predict the 13CMFA determined flux interval for the leave-one-out reaction by comparing the prediction with 13CMFA determined flux. For each set of models, the standard MFA with Taxicab penalty, FVA, and Bayesian MFA were performed. The MFA and FVA were performed by The Cobra Toolbox’s optimizeCbModel and fluxVariability functions by maximizing growth with the growth lower and upper bounds set to the measured value .

Reaction KEGG ID Bounds Glucose limited
Exchange fluxes
Glucose exchange C00031
Acetate exchange C00033
Acetone exchange C00207
Butanol exchange C06142
  Butyrate exchange C00246
Ethanol exchange C00469
EPS production
F1 scores %
ex only LOO ex + 13C
Malate DHase R00342 19 7 65 19 68 23
3P-glycerate DHase R01513 1 53 12 59 100 100
Acetaldehyde DHase R00228 8 56 12 73 95 92
Triosephosphate DHase R01061 10 95 5 75 63 85
Acetolactate synthase R04672 17 67 17 68 68 68
Aspartate transaminase R00355 1 32 14 29 69 69
5,10-CH=THF hydrolyase R01655 0 3 0 3 100 100
Malate hydrolyase R01082 2 83 59 54 62 67
Ribulose-5P epimerase R01529 1 0 1 0 100 100
Pyruvate carboxylase R00344 19 36 65 33 66 26
Carbonate hydrolyase R10092 10 57 97 31 100 53
C-acetyl transferase R00212 16 2 84 13 100 81
Average 9 41 36 38 83 72
Table 2: The 6 exchange flux, the growth and EPS production, and the 12 internal flux measurements. Measurements from the cultivations include standard deviations, while fluxes determined by 13CMFA are given as a range. The unit used for the model fluxes is: (CDW). : measured by chromatographic methods, obtained from 13CMFA
Figure 6: For the glucose limited condition, flux distributions of the 12 internal fluxes predicted solely from exchange fluxes (a), and distributions after seeing 13CMFA data in leave-one-out experiment (b) and after seeing all 12 internal reactions (c).

We study how the flux variances from Bayesian MFA results decrease when adding 13CMFA constraints. The Figure 4 shows the reduction of standard deviations of flux distributions of all fluxes of C. acetobutylicum in glucose limited condition when all 12 13CMFA constraints are added to the model (model set C). When adding the 13CMFA constraints, the variance of most unmeasured internal fluxes decreases, demonstrating how Bayesian MFA propagates the information about 12 measured fluxes to several tens of other internal fluxes. The reduction of standard deviations of flux distributions for the reference and butanol stimulated conditions are shown in Supplementary Figures 2 and 3.

For the glucose limited experiment, the MFA, FVA and Bayesian MFA results for model sets A, B and C are shown for 12 13CMFA determined reactions in Figure 6. To quantify the performance of FVA and Bayesian MFA to predict the 13CMFA measured range of flux values, precision, recall and the F score were computed for each reaction and model set. The F score values are shown in Table 2

as percentage and the precision and recall values are shown in the Supplementary Table 1. From Table

2 it can be concluded that the Bayesian MFA outperforms FVA for most of the reactions, especially in the leave-one-out testing. In Figure 6, for the model set A, the FVA gives a wide range of solutions whereas the distribution from Bayesian MFA is narrower and closer to the true values for almost all reactions. Figure 6b shows the results for the model set B: for reactions 3P-glycerate dehydrogenase, Acetaldehyde dehydrogenase, Triosephosphate dehydrogenase and Acetolactate synthase the flux distribution obtained by Bayesian MFA matches the test data better. In Figure 6c the resulting flux ranges for FVA and Bayesian MFA distributions are always within the true measured range, but the Bayesian MFA captures the probability in the flux values. Similar results are obtained for the butanol stimulated and reference conditions (See Supplementary Tables 2 and 3 Supplementary Figures 4 and 5).

4 Discussion

The conventional FBA formalism is a powerful framework for flux analysis that however assumes several unrealistic simplifying model approximations. Several approaches from robust flux analysis and sampling to flux variability analyses indicate the need to alleviate the approximations towards a more principled model.

We proposed the Bayesian flux analysis formalism that considers fluxes as distributions instead of point estimates. The model learns a posterior distribution of fluxes given prior information, flux measurements, upper and lower bounds and steady-state assumptions into account. The degree of belief in the measurements and steady-state can be adjusted via measurement noise variances and biological knowledge as encoded in (subjective) priors. The model characterises the complete space of possible flux configurations by modeling the uncertainties of fluxes and flux combinations. The Bayesian formalism can be seen as a drop-in replacement for deterministic flux analysis tools — such as FBA and FVA — at the cost of added running time necessary to properly characterise the flux distributions. The runtime can be effectively alleviated by only considering the core parts of the metabolic model or by running multiple MCMC chains in parallel.

Our results show that the conventional FBA and FVA tools provides an overly simplistic view of the flux capabilities of the cellular system under study, while the Bayesian model expresses the full variance in the flux configurations. The Bayesian model of metabolism opens doors for building flux analysis models in a Bayesian way. We will leave experimental design, knock-outs and strain design using the Bayesian modelling basis for future work. In future we expect the Bayesian formalism to provide an alternative statistical approach for majority of current FBA and MFA based tools with the benefit of rigorous uncertainty modeling and improved interpretation.


This work has been supported by the Academy of Finland Center of Excellence in Systems Immunology and Physiology, the Academy of Finland grants no. 260403 and 299915, and the Finnish Funding Agency for Innovation Tekes (grant no 40128/14, Living Factories).


  • Almaas et al. (2004) Almaas, E., Kovács, B., Vicsek, T., Oltvai, Z., and Barabási, A. (2004). Global organization of metabolic fluxes in the bacterium escherichia coli. Nature, 427, 839–843.
  • Altmann et al. (2014) Altmann, Y., McLaughlin, S., and Dobigeon, N. (2014). Sampling from a multivariate gaussian distribution truncated on a simplex: a review. In Statistical Signal Processing.
  • Becker et al. (2007) Becker, S. A., Feist, A. M., Mo, M. L., Hannum, G., Palsson, B. Ø., and Herrgard, M. J. (2007). Quantitative prediction of cellular metabolism with constraint-based models: the cobra toolbox. Nature protocols, 2(3), 727.
  • Bhadra et al. (2018) Bhadra, S., Blomberg, P., Castillo, S., Rousu, J., and Wren, J. (2018). Principal metabolic flux mode analysis. Bioinformatics.
  • Bordbar et al. (2014) Bordbar, A., Monk, J., King, Z., and Palsson, B. (2014). Constraint-based models predict metabolic and associated cellular functions. Nat Rev Genet, 15, 107–120.
  • Bordel et al. (2010) Bordel, S., Agren, R., and Nielsen, J. (2010). Sampling the solution space in genome-scale metabolic networks reveals transcriptional regulation in key enzymes. PLOS Computational biology, 6.
  • Botev (2016) Botev, Z. (2016). The normal law under linear restrictions: Simulation and estimation via minimax tilting. Journal of the Royal Statistical Society Series B.
  • Carreira et al. (2014) Carreira, R., Evangelista, P., Maia, P., Vilaca, P., Pont, M., Tomb, J.-F., Rocha, I., and Rocha, M. (2014). Cbfa: phenotype prediction integrating metabolic models with constraints derived from experimental data. BMC Systems Biology, 8, 123.
  • Chopin (2011) Chopin, N. (2011). Fast simulation of truncated Gaussian distributions. Statistics and Computing, 21(2), 275–288.
  • Dubois et al. (1996) Dubois, D., Fargier, H., and Prade, H. (1996). Possibility theory in constraint satisfaction problems: handling priority, preference and uncertainty. Applied Intelligence, 6, 287–309.
  • Emery et al. (2014) Emery, X., Arroyo, D., and Pelaez, M. (2014). Simulating large gaussian random vectors subject to inequality constraints by gibbs sampling. Mathematical Geosciences, 46, 265–283.
  • Feist and Palsson (2010) Feist, A. and Palsson, B. (2010). The biomass objective function. Current opinion in Microbiology, 13, 344–349.
  • Gelman and Rubin (1992) Gelman, A. and Rubin, D. (1992). Inference from iterative simulation using multiple sequences. Statistical Science, 7, 457–511.
  • Gelman et al. (2013) Gelman, A., Carlin, J., Hunson, D., Vehtari, A., and Rubin, D. (2013). Bayesian Data Analysis, chapter 11.4–11.5. CRC Press, 3rd edition.
  • Geweke (1991) Geweke, J. (1991). Efficient simulation from the multivariate normal and student-t distributions subject to linear constraints and the evaluation of constraint probabilities. In Computing science and statistics, pages 571–578.
  • Gudmundsson and Thiele (2010) Gudmundsson, S. and Thiele, I. (2010). Computationally efficient flux variability analysis. BMC bioinformatics, 11(1), 489.
  • Heino et al. (2007) Heino, J., Tunyan, K., Calvetti, D., and Somersalo, E. (2007). Bayesian flux balance analysis applied to a skeletal muscle metabolic model. Journal of theoretical biology, 248, 91–110.
  • Heino et al. (2010) Heino, J., Calvetti, D., and Somersalo, E. (2010). Metabolica: a statistical research tool for analyzing metabolic networks. Comput Methods Programs Biomed, 97, 151–167.
  • Horrace (2005) Horrace, W. (2005). Some results on the multivariate truncated Normal distribution.

    Journal of Multivariate Analysis

    , 94(1), 209–221.
  • Kadirkamanathan et al. (2006) Kadirkamanathan, V., Yang, J., Billings, S., and Wright, P. (2006). Markov chain monte carlo algorithm based metabolic flux distribution analysis on corynebacterium glutamicum. Bioinformatics, 22, 2681–2687.
  • Kaufman and Smith (1998) Kaufman, D. and Smith, R. (1998). Direction choice for accelerated convergence in hit-and-run sampling. Oper. Res, 46, 84–95.
  • Kim et al. (2008) Kim, H., Kim, T., and Lee, S. (2008). Metabolic flux analysis and metabolic engineering of microorganisms. Molecular BioSystems, 4, 113–120.
  • Kotecha and Djuric (1999) Kotecha, J. and Djuric, P. (1999). Gibbs sampling approach for generation of truncated multivariate gaussian random variables. In Acoustics, Speech, and Signal Processing.
  • Li and Ghosh (2015) Li, Y. and Ghosh, S. (2015). Efficient sampling methods for truncated multivariate normal and student-t distributions subject to linear inequality constraints. Journal of Statistical Theory and Practice, 9(4), 712–732.
  • Llaneras et al. (2009) Llaneras, F., Sala, A., and Pico, J. (2009). A possibilistic framework for constraint-based metabolic flux analysis. BMC Systems biology, 3.
  • MacGillivray et al. (2017) MacGillivray, M., Ko, A., Gruber, E., Sawyer, M., Almaas, E., and Holder, A. (2017). Robust analysis of fluxes in genome-scale metabolic pathways. Scientific Reports, 7(268).
  • Mahadevan and Schilling (2003) Mahadevan, R. and Schilling, C. (2003). The effects of alternate optimal solutions in constraint-based genome-scale metabolic models. Metabolic engineering, 5(4), 264–276.
  • Mo et al. (2010) Mo, M., Palsson, B., and Herrgård, M. (2010). Connecting extracellular metabolomic measurements to intracellular flux states in yeast. BMC Systems Biology, 3.
  • Murray et al. (2010) Murray, I., Prescott, A., and MacKay, D. (2010). Elliptical slice sampling. JMLR.
  • Orth et al. (2010) Orth, J., Thiele, I., and Palsson, B. (2010). What is flux balance analysis. Nature Biotechnology, 28(3), 245–248.
  • Pakman and Paninski (2014) Pakman, A. and Paninski, L. (2014). Exact hamiltonian monte carlo for truncated multivariate gaussians. Journal of Computational and Graphical Statistics, 23(2), 518–542.
  • Pakula et al. (2016) Pakula, T. M., Nygren, H., Barth, D., Heinonen, M., Castillo, S., Penttilä, M., and Arvas, M. (2016). Genome wide analysis of protein production load in trichoderma reesei. Biotechnology for Biofuels, 9(1), 132.
  • Palsson (2015) Palsson, B. (2015). Systems Biology: Constraint-based Reconstruction and Analysis. Cambridge University Press.
  • Saa and Nielsen (2016) Saa, P. and Nielsen, L. (2016). ll-achrb: a scalable algorithm for sampling the feasible solution space of metabolic networks. Bioinformatics, 32(15), 2330–2337.
  • Schellenberger et al. (2010) Schellenberger, J., Park, J., Conrad, T., and Palsson, B. (2010). Bigg: a biochemical genetic and genomic knowledgebase of large scale metabolic reconstructions. BMC bioinformatics, 11.
  • Schellenberger et al. (2011) Schellenberger, J., Que, R., Fleming, R. M., Thiele, I., Orth, J. D., Feist, A. M., Zielinski, D. C., Bordbar, A., Lewis, N. E., Rahmanian, S., et al. (2011). Quantitative prediction of cellular metabolism with constraint-based models: the cobra toolbox v2. 0. Nature protocols, 6(9), 1290.
  • Smith (1984) Smith, R. (1984).

    Efficient monte-carlo procedures for generating points uniformly distributed over bounded regions.

    Oper. Res, 32, 1296–1308.
  • Theorell et al. (2017) Theorell, A., Leweke, S., Wiechert, W., and Nöh, K. (2017).

    To be certain about the uncertainty: Bayesian statistics for 13c metabolic flux analysis.

    Biotechnology and Bioengineering, 114, 2668–2684.
  • Wallenius et al. (2016) Wallenius, J., Maaheimo, H., and Eerikäinen, T. (2016). Carbon 13-metabolic flux analysis derived constraint-based metabolic modelling of Clostridium acetobutylicum in stressed chemostat conditions. Bioresource Technology, 219, 378 – 386.
  • Zavlanos and Julius (2011) Zavlanos, M. and Julius, A. (2011). Robust flux balance analysis of metabolic networks. In American Control Conference (ACC), 2011, pages 2915–2920. IEEE.