1 Introduction
Metabolic modelling considers networks of up to thousands of chemical reactions that transform metabolite molecules within cellular organisms (Palsson, 2015). The key problem of metabolism is estimation of the reaction rates, or fluxes, of the system of the highly interdependent intracellular fluxes from measurements of few exchange fluxes that transfer nutrients or products between the external medium and the cell.
The dominant approach to flux estimation is the celebrated Flux Balance Analysis (FBA) framework that finds reaction rates that maximise prespecified cellular growth function (Feist and Palsson, 2010), while assuming the cell is in a steadystate, where concentrations of intracellular metabolites do not change (Almaas et al., 2004). The FBA problem can be casted as a convenient and computationally efficient linear programming problem of solving a system of linear steadystate constraints while maximising a linear growth target (Orth et al., 2010), and where flux measurements can be encoded as constraints to the fluxes (Carreira et al., 2014). FBA is commonly used to characterise intracellular fluxes in various simulated target conditions (Mo et al., 2010). In metabolic flux analysis (MFA) values of unknown fluxes are directly estimated based on measurements of some determined fluxes without explicit maximal growth assumption (Kim et al., 2008). In both approaches a point estimate for up to thousands of highly interdependent fluxes are determined (Bordbar et al., 2014).
The standard metabolic analyses suffer from several approximations that warrant careful methodological protocol to achieve biologically meaningful results. First, the exact steadystate constraint is an unrealistic assumption since metabolites can accumulate or deplete (Pakula et al., 2016)
. Second, in FBA maximal growth is assumed, while it only holds at the highest growth phase in practise. Finally, due to a large number of metabolic reactions and limited number of experimental data, flux point estimates commonly used in the field completely ignore the notable uncertainty involved in FBA and MFA solutions. The flux variances are key in characterizing metabolic systems and uncertainties emerging from the use of insufficient and noisy data.
Numerous separate extensions to flux analysis have been introduced to alleviate these limitations. The robust FBA framework considers the effect of measurement uncertainties to the maximal growth (Zavlanos and Julius, 2011). The steadystate assumption was recently relaxed by the RAMP model (MacGillivray et al., 2017). In contrast to point flux estimates of FBA and MFA, the flux variability analysis (FVA) characterises the sensitivity of the target function to independent flux perturbations, resulting in upper and lower bounds around the FBA solution (Mahadevan and Schilling, 2003; Gudmundsson and Thiele, 2010)
. In principal flux mode analysis the eigenvectors of steadystate flux cone characterise the flux variability
(Bhadra et al., 2018). Alternatively the solution space of the fluxes can be sampled (Schellenberger et al., 2010) by considering only optimal fluxes from border of the flux hypercone (Bordel et al., 2010) or by sampling also inoptimal fluxes from the inside the hypercone (Mo et al., 2010; Saa and Nielsen, 2016). The sampling methods use the ’HitandRun’ (Smith, 1984) or ’Artificially Centered HitandRun’ (Kaufman and Smith, 1998) algorithms to cope with the large flux space. A related approach uses possibility calculus (Dubois et al., 1996) to iteratively refine the estimate of possible and impossible flux states (Llaneras et al., 2009).There is a distinct lack of approaching metabolism with statistical modelling. The sole statistical approach for flux analysis remains the Metabolica framework that proposed modelling distributions of fluxes of skeletal muscle metabolism (Heino et al., 2007, 2010), but did not include modelling of maximal growths or genomescale metabolic models. Bayesian methods have also been developed for 13C labeling data (Kadirkamanathan et al., 2006; Theorell et al., 2017) by assuming fixed steadystate and without incorporating any target growth condition.
In this paper we tackle all three limitations simultaneously by introducing a novel paradigm of Bayesian metabolic flux analysis where the genomescale, interdependent flux vector distributions are estimated. In Bayesian formalism prior distributions on flux random variables are determined, and subsequently updated by incorporating the measurement information, resulting in posterior flux distributions. We place priors on flux distributions, and estimate posterior distributions that characterise and quantify the probability of all flux states that are compatible with flux measurements, steadystate assumption and stoichiometry. Our model can reveal flux dependencies in explicit form, and characterise the full space of flux states in principled fashion. The Bayesian flux analysis can be used as a dropin replacement to standard FBA, MFA, FVA and sampling methods. We provide public implementation of the Bayesian flux analysis using the standard COBRA framework
(Becker et al., 2007; Schellenberger et al., 2011).2 Methods
The goal of this paper is a probabilistic formulation of static steadystate metabolic systems that can be applied to whole genome metabolic flux analysis (MFA), flux balance analysis (FBA) and flux variability analysis (FVA). We propose the Bayesian method as a direct replacement to these classic FBA, MFA and FVA tools. We start by assuming a metabolic system of metabolites and reactions has been characterised by a constant stoichiometric matrix , where the rows denote metabolite participations in all reactions, while the columns denote reactants and products of metabolites by individual reactions. The flux vector denotes the reaction rates of the system. The steadystate equation can be stated as
which encodes that metabolite concentration changes are zero and hence the metabolite concentrations do not change. Throughout the paper we assume a subset of fluxes have been observed or determined (for instance, some of the exchange fluxes), while the remaining fluxes are unknown. Our goal is to infer the distribution of all unknown fluxes given the observed fluxes, the steadystate constraints and the flux lower and upper bounds.
2.1 Bayesian metabolic model
We formulate a Bayesian flux model, which starts by assuming multivariate Gaussian priors for fluxes as
with means and diagonal covariances . The prior means are set to zero, or to the closest value to zero considering the flux upper and lower bounds. The variances
are hyperparameters that characterise the
a priori values the flux can take. The prior distribution converges towards an uninformative uniform prior as the prior variances increase.We assume a Gaussian prior also for the metabolite changes
where are the a priori mean accumulations or depletions of metabolite species. The diagonal covariances encode the variances around prior metabolite changes. In strict steadystate, the prior for metabolite change becomes Dirac’s delta function at zero. By increasing the variances we can relax the steadystate assumption on individual metabolites, and encode allowance for accumulations or depletions of them.
The joint distribution of fluxes
and metabolite changescan now be stated as a joint multivariate Gaussian distribution
that encodes the exact^{1}^{1}1We add small numerical tolerance within the inverse to ensure invertibility of the matrix. relation . The conditional distribution of fluxes given a specific realisation of metabolite changes (e.g. ) is then from standard Gaussian identities
where . Since we do not in general have access to exact metabolite change values , we marginalise the conditional flux distribution over the change prior distribution resulting in
(1) 
where and .
2.2 Conditioning the model with observations
Assume we have access to noisy observations from a subset of observed fluxes . The observations can be empirical measurements, 13C flux estimations, or flux hypotheses determined by the user. We assume independent additive Gaussian noise with variances collected in a matrix , and hence the likelihood of observed fluxes is
The joint distribution of all fluxes and noisy flux observations is now
which gives a conditional distribution of all fluxes given the observations as
(2) 
where is the noisy covariance, is the full covariance matrix, is the covariance matrix between observed fluxes and all fluxes, and is the covariance matrix between observed fluxes. Note that observed fluxes are in both and in . Note also that the model works with no observations at all as the conditional distribution in Eq. (2) reduces back to the prior in Eq. (2.1).
Finally, we add the flux upper and lower bounds by truncating the distribution with the known flux lower and upper bounds resulting in the final truncated normal flux posterior
The posterior encodes the distribution of bounded fluxes that are compatible with the flux observations, flux priors, and where steadystate applies according to the tolerances determined by the steadystate prior means and variances .
The derived flux posterior is an unimodal truncated multivariate normal (TMVN) distribution where flux dependencies are represented through the covariance matrix , which encodes all flux relationships with high rank. The flux posterior as a whole characterises the distribution of all valid flux vectors. The main characterisations of interest are the individual flux distributions (the marginals) and flux combination distributions (multivariate marginals). Marginals of TMVN’s are not analytically tractable, nor are they TMVN distributions (Horrace, 2005). We resort to MCMC sampling from the TMVN flux distribution to reveal individual flux, flux pair or flux group distributions.
2.3 Gibbs sampling truncated MVN’s
A recent review summarises sampling approaches for Truncated MVN’s (Altmann et al., 2014). The conditionals of TMVN’s are still TMVNs (Horrace, 2005), which has lead to many Gibbs based samplers (Geweke, 1991; Kotecha and Djuric, 1999; Horrace, 2005; Emery et al., 2014; Li and Ghosh, 2015), while also HMC samplers (Pakman and Paninski, 2014) have been proposed, while elliptical slice samplers would also fit well to the problem (Murray et al., 2010). We experimented with the three main approaches, and found out that Gibbs sampling has consistently the best performance in genomescale metabolic models up to 4000 fluxes (data not shown). In the remainder of the paper we apply Gibbs sampling.
To sample the distribution we begin by transforming it into whitened domain by Cholesky decomposition with transformed fluxes with white distribution , where and . We sample from the univariate conditional distributions
(3) 
which is a standard Normal with bounds in the white domain are (We refer to Li and Ghosh (2015) for detailed explanation):
The bounds are functions of the remaining (whitened) fluxes and need to be updated after each change to flux values. We iteratively update each whitened flux by sampling a new value from the conditional distribution with the minimax tilting method (Botev, 2016), which we found out to outperform the alternative Chopin’s algorithm (Chopin, 2011). Finally, we transform the whitened variables back into original domain by .
Runtime  

Organism  Model  thin  thin  
E. coli  core  min  min  
E. coli  iJR904  hr  d hr  
E. coli  iAF1260  hr  d hr  
E. coli  iJO1366  hr  d hr  
B. subtilis  iYO844  hr  d hr  
C. acetobutylicum  Wallenius (2013)  min  hr  
S. cerevisiae  iMM904  min  d  
S. cerevisiae  7.6  hr  d hr  
T. reesei  CORECO  hr  d hr 
We notice that the method of Li and Ghosh (2015) can be further optimised by running multiple chains in parallel by considering a flux sample matrix containing whitened flux vector chains . The bound function is then represented as
where is the sample matrix without the ’th row, is the Cholesky matrix without the ’th column. By sampling several chains in parallel with one CPU node we can utilise the full bandwidth of the CPU. This is especially useful for Gibbs sampling.
2.4 Sampling the flux posterior
We set the initial flux vector to the maximum a posteriori
(MAP) of the truncated normal distribution, which we compute using quadratic programming. The truncated normal distribution is unimodal, and hence we begin sampling from the mode of the distribution providing efficient optimization.
The fluxes can be arranged into distinct bounded fluxes and unbounded fluxes , where . The conditional distribution of a truncated normal is still a truncated normal (Horrace, 2005). We only need to sample the bounded fluxes with Gibbs MCMC, and afterwards the distribution of unbounded fluxes conditioned on the bounded flux samples can be drawn from untruncated normal as
where we arrange and .
We implement the MCMC sampling in Matlab. We run multiple independent Markov chains in a vectorised form. By default we run
chains of flux samples for a total of flux vector samples. We use thinning and only accept every ’th flux sample. The MCMC chains have converged if successive samples are uncorrelated, chains are indistinguishable and have effectively forgotten the initial value. Convergent chains indicate that the MCMC sampler has characterised the whole flux posterior. We use potential scale reduction factor to approximate convergence (Gelman and Rubin, 1992). An optimal value of indicates convergence, while values are considered sufficient for convergence. We also compute the effective number of samples per flux (Gelman et al., 2013).We assume that flux means are fixed to either zero or the lower bound of each respective flux. The model has then two main hyperparameters and that affect the posterior. The determines how much the mass balance can be relaxed, and can be set according to the prior knowledge of the modeller. To enforce mass balance, a small value such as should be chosen. The prior flux variance determines how much fluxes are driven towards zero a priori, but also should be set to sufficiently high value not to exclude possibly high fluxes. In practise we set the variance for all fluxes .
2.5 Sampling FBA solutions
The presented Bayesian model is a metabolic flux analysis (MFA) model designed to characterise the global flux configurations compatible with mass balance assumption, observations, and bounds. The method can as well be applied as an flux balance analysis (FBA) method, where a target function – such as biomass reaction – is maximised. For FBA mode we first find the standard FBA solution target flux with linear programming, and encode it as a flux observation
, where the variance determines how closely we sample from the maximal target. By default, we set the standard deviation to
of the target flux. To run maximum growth Bayesian FBA we would set an observation for the biomass pseudoreaction to the classical FBA maximum growth value and condition the model with the pseudomeasurement as in Sec. 2.2.3 Results
We first perform in silico experiments to highlight the capabilities of the Bayesian FBA and MFA models in Sections 3.1–3.3. Our goal is to compare the computational approach against the conventional FBA and FVA methods, and to showcase the method’s in silico performance in various metabolic models. The main experiment of this paper is application of the Bayesian flux analysis to the ^{13}C analysis of the C. acetobutylicum in Section 3.4, where we can elucidate fluxes on a genomescale from a small set of intracellular flux measurements.
3.1 In silico metabolic models
Table 1 indicates the stoichiometric models that were considered. We considered four organisms, seven genomescale metabolic models and one core model. All models were downloaded from the BiGG database^{2}^{2}2http://bigg.ucsd.edu. For all models we run the Bayesian model in FBA mode – by specifying a growth target – with standard exchange flux measurements included as bounds. We sampled chains of flux vector samples from the full space containing all intracellular and extracellular fluxes. These flux vectors represent all possible flux configurations compatible with the experimental setting. The sampling thinning parameter determines how uncorrelated successive MCMC samples are. We applied thinning values of and , with linear effect on the running time.
The effective number of independent simulation draws for all models from Bayesian flux analysis with different thinning parameter are shown in Figure 1 using the potential scale reduction factor. The xaxis corresponds to individual fluxes sorted based on the effective number of samples. The number of reactions is different for different models. In all cases majority of the fluxes have over effective number of independent samples, which indicates that the samples represent the flux posterior well. A minority of the fluxes have low effective sample sizes. These are usually central branching fluxes that are highly dependent across the genomescale metabolism, and hence converge slowly. The thinning parameter has a large effect on some models (core, iJR904, iAF1260, iJO1366) whereas for some other models there are not much change (CORECO, iYO844, iMM904, 7.6).
3.2 Bayesian FBA and MFA
We illustrate the characteristics of the Bayesian model using E. coli central carbon metabolism model^{3}^{3}3BiGG model e_coli_core. The model contains fluxes and metabolites. It should be noted that the model was not further constrained and do not represent the native E. coli strain as such, while it allows e.g. carbon fixation. The model was rather used to theoretically compare our modelling approach to conventional FBA methodology. The conventional FBA solution achieves a growth flux by limiting the glucose exchange flux with a . We consider three cases of Bayesian analysis: (i) growth by defining only the biomass flux observation , (ii) maximal growth scenario by defining the biomass flux observation , and (iii) maximal growth with additional observations for key exchange fluxes: GLC, O2, CO2, H2O, H+, HPO4, SO4, NH4 and ethanol that were all set to their conventional FBA solutions with a standard deviations of . In all three experiments the remaining fluxes were free with only a prior distribution with a standard deviation of specified. We defined a nearly strict steadystate by defining . We sample a total of flux vectors with the Gibbs sampler using chains and
samples each. We use 1dimensional kernel density estimates as proxies of marginal flux posterior distributions. The small jaggedness of the distributions are an artefact from the MCMC sampling. By considering longer chains these would eventually smoothen out.
Figure 2 shows the flux distributions of fluxes. The blue color indicates the growth flux distributions, the green color the maximal growth distributions, the red color maximal growth with exchange fluxes specified, and the conventional FBA is shown with a black line. The Bayesian distributions represent the space of all allowed steadystate flux configurations given the observations and target function. Figure 2 shows that maximal growth can still support a large variance in many fluxes, with the FBA point estimate misleading by only considering one flux configuration. Similarly to conventional FVA our approach elucidates directly the possible variance in a given flux. For instance, the pentosephosphate pathway flux G6PDH2r: DGlucose 6phosphate + NADP 6Phosphogluconolactone + H + NADPH can vary between and in maximal growth. The conventional FBA yields zero flux for glyoxylate cycle flux MALS: AcetylCoA + Glyoxylate + HO CoA + H + Malate, while the flux space indicates that values up to are possible.
The red distributions indicate how the intracellular fluxes get more and more specified as the model is better specified by inclusion of exchange measurements. Variance of almost all fluxes reduces by more than half. For instance, the flux FUM: Fumarate + HO Malate is specified to a range of from a range of without exchange measurements.
The blue color indicates the cellular flux state when the cell is only growing at of the maximum growth rate. Most fluxes have a higher variance in this scenario. Interestingly the glucose intake is still kept at a relatively high rate. Instead of biomass production, the excess carbon from glucose can be diverted to other carbon sinks, such as formate and ethanol production. The ethanol and formate effluxes can grow up to 3 and 15, respectively. The carbon dioxide exchange decreases by over half into a range of from maximal growth exchange range of .
3.3 Flux couplings
The flux variations are in general not independent from each other. To understand the intracellular flux space, we need to consider higherorder flux dependencies. The flux sample covariances indicate flux couplings, where the variation in one flux is constrained by other fluxes. Figure 3 highlights example flux pair patterns out of the total of in the core model. Blue points represent growth, green points maximal growth, red points maximal growth with exchange fluxes specified, and the conventional FBA solution is a black dot.
The flux covariations become consistently more constrained while traversing from the loose growth model (blue) towards maximal growth (green). By measuring the exchange fluxes (red), we can already pinpoint most flux patterns very closely around the theoretically optimal flux pattern as defined by the conventional FBA (black).
Multiple patterns of covariation can be immediately identified. There is an exact coupling between pentosephosphate pathway reactions GND: 6PhosphoDgluconate + NADP CO2 + NADPH + DRibulose 5phosphate and TALA: Glyceraldehyde 3phosphate + Sedoheptulose 7phosphate DErythrose 4phosphate + DFructose 6phosphate, as expected from the stoichiometry. Glycolysis related FBA: DFructose 1,6bisphosphate Dihydroxyacetone phosphate + Glyceraldehyde 3phosphate and pentosephosphate pathway related G6PDH2r have also a strong, but not exact, negative correlation. The flux PGI: DGlucose 6phosphate DFructose 6phosphate and carbon dioxide exchange have no correlation, but the maximal growth still pinpoints to a carbon dioxide exchange value of approximately . The dependency of glyoxylate cycle related MALS and TCA cycle related SUCOAS: ATP + CoA + Succinate HPO + ADP + SuccinylCoA on maximal growth requirement can be observed in Figure 3. Under maximal growth a negative correlation between the two fluxes emerges. The conventional FBA solution pinpoints an optimal values as zero MALS with SUCOAS around , while the Bayesian model reveals that MALS can still have a flux value of around as long as SUCOAS tends towards simultaneously.
The patterns of G6PDH2r and FBA and FUM indicate a linear inequality for these fluxes. Especially with FUM this is natural since the pentosephosphate pathway flux G6PDH2r limits the TCA cycle flux FUM. The same effect is also seen with G6PDH2r and ICDHyr: Isocitrate + NADP 2Oxoglutarate + CO + NADPH, another TCA cycle flux.
To get more insight into the biology behind the flux couplings, the flux pair patterns can also be illustrated in the metabolic network (Figure 5). Figure 5 shows the samples of the flux distributions for several example pairs of reactions. These scatter plots indicate the dependency of the flux configurations between two reactions across the reaction. There is natural correlation between adjacent or subsequent fluxes but also correlation between fluxes in different pathways, such as glycolysis and TCA cycle (See the SUCOAS and TKT1 pair).
3.4 Intracellular flux elucidation of C. acetobutylicum
We consider the results obtained from ^{13C}MFA of Clostridium acetobutylicum grown in chemostat, i.e. in continuous cultivation maintaining steadystate, with reference condition, glucose limited condition and butanol stimulus with the goal of inferring the internal fluxes. We effectively repeat the study of Wallenius et al. (2016), where FBA and FVA were performed and constrained on 12 intracellular fluxes determined by ^{13C}MFA and 7 exchange fluxes. The model for Clostridium acetobutylicum consists of 451 metabolites and 604 reactions, and is given as an .xml file in the supplement of Wallenius et al. (2016).
The data from different measurements in glucose limited condition are shown in Table 2. For the reference and the butanol stimulated conditions, see Supplementary Tables 2 and 3. The internal fluxes are obtained from ^{13C}MFA analysis, whereas the exchange fluxes were measured by chromatographical methods or transferred from the ^{13C}MFA results. Flux values were normalized to the specific growth rate which was given the value of , except for the reference condition, the measured growth is . Exchange fluxes measured from the cultivations were given to Bayesian MFA as mean and standard deviations and fluxes obtained from ^{13C}MFA were given as ranges. In all Bayesian MFA experiments, the steady state relaxation was . Finally, 500 samples were drawn from the posterior with thinning 1000.
To study the FBA’s, FVA’s and BMFA’s performance to predict the measured distributions of metabolic fluxes, three sets of models were generated: (A) a model with reaction directions for the 12 ^{13C}MFA determined internal fluxes set according to the data (bounds in table 2 ), (B) a set of 12 models where each reaction among 12 ^{13C}MFA determined internal fluxes is unconstrained at a time (the reaction direction is still constrained) while the rest 11 fluxes are constrained according to the measurements (leaveoneout), and (C) a model with all 12 ^{13C}MFA determined internal fluxes constrained to their measured values. In all three cases we constrain the model with measured values for the 6 exchange reactions and the measured growth. For the (B) set of models, we test how well we can predict the ^{13C}MFA determined flux interval for the leaveoneout reaction by comparing the prediction with ^{13C}MFA determined flux. For each set of models, the standard MFA with Taxicab penalty, FVA, and Bayesian MFA were performed. The MFA and FVA were performed by The Cobra Toolbox’s optimizeCbModel and fluxVariability functions by maximizing growth with the growth lower and upper bounds set to the measured value .
Reaction  KEGG ID  Bounds  Glucose limited  
Exchange fluxes  
Glucose exchange  C00031  
Acetate exchange  C00033  
Acetone exchange  C00207  
Butanol exchange  C06142  
Butyrate exchange  C00246  
Ethanol exchange  C00469  
EPS production  
Growth  
F1 scores %  
ex only  LOO  ex + 13C  
FVA  BMFA  FVA  BMFA  FVA  BMFA  
Malate DHase  R00342  19  7  65  19  68  23  
3Pglycerate DHase  R01513  1  53  12  59  100  100  
Acetaldehyde DHase  R00228  8  56  12  73  95  92  
Triosephosphate DHase  R01061  10  95  5  75  63  85  
Acetolactate synthase  R04672  17  67  17  68  68  68  
Aspartate transaminase  R00355  1  32  14  29  69  69  
5,10CH=THF hydrolyase  R01655  0  3  0  3  100  100  
Malate hydrolyase  R01082  2  83  59  54  62  67  
Ribulose5P epimerase  R01529  1  0  1  0  100  100  
Pyruvate carboxylase  R00344  19  36  65  33  66  26  
Carbonate hydrolyase  R10092  10  57  97  31  100  53  
Cacetyl transferase  R00212  16  2  84  13  100  81  
Average  9  41  36  38  83  72 
We study how the flux variances from Bayesian MFA results decrease when adding ^{13C}MFA constraints. The Figure 4 shows the reduction of standard deviations of flux distributions of all fluxes of C. acetobutylicum in glucose limited condition when all 12 ^{13C}MFA constraints are added to the model (model set C). When adding the ^{13C}MFA constraints, the variance of most unmeasured internal fluxes decreases, demonstrating how Bayesian MFA propagates the information about 12 measured fluxes to several tens of other internal fluxes. The reduction of standard deviations of flux distributions for the reference and butanol stimulated conditions are shown in Supplementary Figures 2 and 3.
For the glucose limited experiment, the MFA, FVA and Bayesian MFA results for model sets A, B and C are shown for 12 ^{13C}MFA determined reactions in Figure 6. To quantify the performance of FVA and Bayesian MFA to predict the ^{13C}MFA measured range of flux values, precision, recall and the F score were computed for each reaction and model set. The F score values are shown in Table 2
as percentage and the precision and recall values are shown in the Supplementary Table 1. From Table
2 it can be concluded that the Bayesian MFA outperforms FVA for most of the reactions, especially in the leaveoneout testing. In Figure 6, for the model set A, the FVA gives a wide range of solutions whereas the distribution from Bayesian MFA is narrower and closer to the true values for almost all reactions. Figure 6b shows the results for the model set B: for reactions 3Pglycerate dehydrogenase, Acetaldehyde dehydrogenase, Triosephosphate dehydrogenase and Acetolactate synthase the flux distribution obtained by Bayesian MFA matches the test data better. In Figure 6c the resulting flux ranges for FVA and Bayesian MFA distributions are always within the true measured range, but the Bayesian MFA captures the probability in the flux values. Similar results are obtained for the butanol stimulated and reference conditions (See Supplementary Tables 2 and 3 Supplementary Figures 4 and 5).4 Discussion
The conventional FBA formalism is a powerful framework for flux analysis that however assumes several unrealistic simplifying model approximations. Several approaches from robust flux analysis and sampling to flux variability analyses indicate the need to alleviate the approximations towards a more principled model.
We proposed the Bayesian flux analysis formalism that considers fluxes as distributions instead of point estimates. The model learns a posterior distribution of fluxes given prior information, flux measurements, upper and lower bounds and steadystate assumptions into account. The degree of belief in the measurements and steadystate can be adjusted via measurement noise variances and biological knowledge as encoded in (subjective) priors. The model characterises the complete space of possible flux configurations by modeling the uncertainties of fluxes and flux combinations. The Bayesian formalism can be seen as a dropin replacement for deterministic flux analysis tools — such as FBA and FVA — at the cost of added running time necessary to properly characterise the flux distributions. The runtime can be effectively alleviated by only considering the core parts of the metabolic model or by running multiple MCMC chains in parallel.
Our results show that the conventional FBA and FVA tools provides an overly simplistic view of the flux capabilities of the cellular system under study, while the Bayesian model expresses the full variance in the flux configurations. The Bayesian model of metabolism opens doors for building flux analysis models in a Bayesian way. We will leave experimental design, knockouts and strain design using the Bayesian modelling basis for future work. In future we expect the Bayesian formalism to provide an alternative statistical approach for majority of current FBA and MFA based tools with the benefit of rigorous uncertainty modeling and improved interpretation.
Acknowledgements
This work has been supported by the Academy of Finland Center of Excellence in Systems Immunology and Physiology, the Academy of Finland grants no. 260403 and 299915, and the Finnish Funding Agency for Innovation Tekes (grant no 40128/14, Living Factories).
References
 Almaas et al. (2004) Almaas, E., Kovács, B., Vicsek, T., Oltvai, Z., and Barabási, A. (2004). Global organization of metabolic fluxes in the bacterium escherichia coli. Nature, 427, 839–843.
 Altmann et al. (2014) Altmann, Y., McLaughlin, S., and Dobigeon, N. (2014). Sampling from a multivariate gaussian distribution truncated on a simplex: a review. In Statistical Signal Processing.
 Becker et al. (2007) Becker, S. A., Feist, A. M., Mo, M. L., Hannum, G., Palsson, B. Ø., and Herrgard, M. J. (2007). Quantitative prediction of cellular metabolism with constraintbased models: the cobra toolbox. Nature protocols, 2(3), 727.
 Bhadra et al. (2018) Bhadra, S., Blomberg, P., Castillo, S., Rousu, J., and Wren, J. (2018). Principal metabolic flux mode analysis. Bioinformatics.
 Bordbar et al. (2014) Bordbar, A., Monk, J., King, Z., and Palsson, B. (2014). Constraintbased models predict metabolic and associated cellular functions. Nat Rev Genet, 15, 107–120.
 Bordel et al. (2010) Bordel, S., Agren, R., and Nielsen, J. (2010). Sampling the solution space in genomescale metabolic networks reveals transcriptional regulation in key enzymes. PLOS Computational biology, 6.
 Botev (2016) Botev, Z. (2016). The normal law under linear restrictions: Simulation and estimation via minimax tilting. Journal of the Royal Statistical Society Series B.
 Carreira et al. (2014) Carreira, R., Evangelista, P., Maia, P., Vilaca, P., Pont, M., Tomb, J.F., Rocha, I., and Rocha, M. (2014). Cbfa: phenotype prediction integrating metabolic models with constraints derived from experimental data. BMC Systems Biology, 8, 123.
 Chopin (2011) Chopin, N. (2011). Fast simulation of truncated Gaussian distributions. Statistics and Computing, 21(2), 275–288.
 Dubois et al. (1996) Dubois, D., Fargier, H., and Prade, H. (1996). Possibility theory in constraint satisfaction problems: handling priority, preference and uncertainty. Applied Intelligence, 6, 287–309.
 Emery et al. (2014) Emery, X., Arroyo, D., and Pelaez, M. (2014). Simulating large gaussian random vectors subject to inequality constraints by gibbs sampling. Mathematical Geosciences, 46, 265–283.
 Feist and Palsson (2010) Feist, A. and Palsson, B. (2010). The biomass objective function. Current opinion in Microbiology, 13, 344–349.
 Gelman and Rubin (1992) Gelman, A. and Rubin, D. (1992). Inference from iterative simulation using multiple sequences. Statistical Science, 7, 457–511.
 Gelman et al. (2013) Gelman, A., Carlin, J., Hunson, D., Vehtari, A., and Rubin, D. (2013). Bayesian Data Analysis, chapter 11.4–11.5. CRC Press, 3rd edition.
 Geweke (1991) Geweke, J. (1991). Efficient simulation from the multivariate normal and studentt distributions subject to linear constraints and the evaluation of constraint probabilities. In Computing science and statistics, pages 571–578.
 Gudmundsson and Thiele (2010) Gudmundsson, S. and Thiele, I. (2010). Computationally efficient flux variability analysis. BMC bioinformatics, 11(1), 489.
 Heino et al. (2007) Heino, J., Tunyan, K., Calvetti, D., and Somersalo, E. (2007). Bayesian flux balance analysis applied to a skeletal muscle metabolic model. Journal of theoretical biology, 248, 91–110.
 Heino et al. (2010) Heino, J., Calvetti, D., and Somersalo, E. (2010). Metabolica: a statistical research tool for analyzing metabolic networks. Comput Methods Programs Biomed, 97, 151–167.

Horrace (2005)
Horrace, W. (2005).
Some results on the multivariate truncated Normal distribution.
Journal of Multivariate Analysis
, 94(1), 209–221.  Kadirkamanathan et al. (2006) Kadirkamanathan, V., Yang, J., Billings, S., and Wright, P. (2006). Markov chain monte carlo algorithm based metabolic flux distribution analysis on corynebacterium glutamicum. Bioinformatics, 22, 2681–2687.
 Kaufman and Smith (1998) Kaufman, D. and Smith, R. (1998). Direction choice for accelerated convergence in hitandrun sampling. Oper. Res, 46, 84–95.
 Kim et al. (2008) Kim, H., Kim, T., and Lee, S. (2008). Metabolic flux analysis and metabolic engineering of microorganisms. Molecular BioSystems, 4, 113–120.
 Kotecha and Djuric (1999) Kotecha, J. and Djuric, P. (1999). Gibbs sampling approach for generation of truncated multivariate gaussian random variables. In Acoustics, Speech, and Signal Processing.
 Li and Ghosh (2015) Li, Y. and Ghosh, S. (2015). Efficient sampling methods for truncated multivariate normal and studentt distributions subject to linear inequality constraints. Journal of Statistical Theory and Practice, 9(4), 712–732.
 Llaneras et al. (2009) Llaneras, F., Sala, A., and Pico, J. (2009). A possibilistic framework for constraintbased metabolic flux analysis. BMC Systems biology, 3.
 MacGillivray et al. (2017) MacGillivray, M., Ko, A., Gruber, E., Sawyer, M., Almaas, E., and Holder, A. (2017). Robust analysis of fluxes in genomescale metabolic pathways. Scientific Reports, 7(268).
 Mahadevan and Schilling (2003) Mahadevan, R. and Schilling, C. (2003). The effects of alternate optimal solutions in constraintbased genomescale metabolic models. Metabolic engineering, 5(4), 264–276.
 Mo et al. (2010) Mo, M., Palsson, B., and Herrgård, M. (2010). Connecting extracellular metabolomic measurements to intracellular flux states in yeast. BMC Systems Biology, 3.
 Murray et al. (2010) Murray, I., Prescott, A., and MacKay, D. (2010). Elliptical slice sampling. JMLR.
 Orth et al. (2010) Orth, J., Thiele, I., and Palsson, B. (2010). What is flux balance analysis. Nature Biotechnology, 28(3), 245–248.
 Pakman and Paninski (2014) Pakman, A. and Paninski, L. (2014). Exact hamiltonian monte carlo for truncated multivariate gaussians. Journal of Computational and Graphical Statistics, 23(2), 518–542.
 Pakula et al. (2016) Pakula, T. M., Nygren, H., Barth, D., Heinonen, M., Castillo, S., Penttilä, M., and Arvas, M. (2016). Genome wide analysis of protein production load in trichoderma reesei. Biotechnology for Biofuels, 9(1), 132.
 Palsson (2015) Palsson, B. (2015). Systems Biology: Constraintbased Reconstruction and Analysis. Cambridge University Press.
 Saa and Nielsen (2016) Saa, P. and Nielsen, L. (2016). llachrb: a scalable algorithm for sampling the feasible solution space of metabolic networks. Bioinformatics, 32(15), 2330–2337.
 Schellenberger et al. (2010) Schellenberger, J., Park, J., Conrad, T., and Palsson, B. (2010). Bigg: a biochemical genetic and genomic knowledgebase of large scale metabolic reconstructions. BMC bioinformatics, 11.
 Schellenberger et al. (2011) Schellenberger, J., Que, R., Fleming, R. M., Thiele, I., Orth, J. D., Feist, A. M., Zielinski, D. C., Bordbar, A., Lewis, N. E., Rahmanian, S., et al. (2011). Quantitative prediction of cellular metabolism with constraintbased models: the cobra toolbox v2. 0. Nature protocols, 6(9), 1290.

Smith (1984)
Smith, R. (1984).
Efficient montecarlo procedures for generating points uniformly distributed over bounded regions.
Oper. Res, 32, 1296–1308. 
Theorell et al. (2017)
Theorell, A., Leweke, S., Wiechert, W., and Nöh, K. (2017).
To be certain about the uncertainty: Bayesian statistics for 13c metabolic flux analysis.
Biotechnology and Bioengineering, 114, 2668–2684.  Wallenius et al. (2016) Wallenius, J., Maaheimo, H., and Eerikäinen, T. (2016). Carbon 13metabolic flux analysis derived constraintbased metabolic modelling of Clostridium acetobutylicum in stressed chemostat conditions. Bioresource Technology, 219, 378 – 386.
 Zavlanos and Julius (2011) Zavlanos, M. and Julius, A. (2011). Robust flux balance analysis of metabolic networks. In American Control Conference (ACC), 2011, pages 2915–2920. IEEE.