Bayesian inference for system identification and rare event reliability analysis can both be formulated as Bayesian updating problems, which means that they can both be solved using the same algorithms [7, 11, 43, 20]
. In this work we consider Sequential Tempered Markov Chain Monte Carlo (ST-MCMC) algorithms for solving these updating problems. This family of algorithms, based on Sequential Monte Carlo, allows us to gradually transform the prior probability distribution describing the system’s uncertainty to the updated posterior distribution that describes the system uncertainty conditioned on data or knowledge of a failure’s occurrence. Previously, separate algorithms from this family have been used to solve these problems, such as TMCMC for posterior sampling in Bayesian system identification and Subset Simulation 
for estimating prior probabilities of rare events. These algorithms share many commonalities and can be combined in the framework of ST-MCMC to enable full posterior probabilities of rare events to be estimated by a single algorithm, which has not been done previously in this framework. The unification of Bayesian updating and uncertainty quantification for posterior reliability problems has been considered before in[5, 36, 3], but has been significantly held back by inefficient MCMC methods that have made it generally computationally expensive. The development of a new MCMC sampler within ST-MCMC, the Rank One Modified Metropolis Algorithm presented in this work, allows for more efficient sampling of the posterior failure region than previous methods. Moreover, we find that the benefits of ST-MCMC and ROMMA are quite broad. Therefore, the contributions of this work are:
Presenting a general framework for understanding Sequential Tempered MCMC algorithms like TMCMC and Subset Simulation
Showing that this framework can be used to efficiently solve the posterior reliability problem while being robust to modeling uncertainty
Introducing the Rank-One Modified Metropolis Algorithm to speed up sampling in ST-MCMC
Typical MCMC methods rely on generating samples by sequentially evolving a Markov Chain which explores the posterior distribution and estimates expectations with respect to the posterior based upon the ergodicity of the chain. These single chain samplers are difficult to parallelize, tune, and adapt to complex posterior environments such as unidentifiable and locally identifiable models. This makes solving for the posterior failure probability difficult since the problem is often high dimensional and the posterior may be quite complex in shape [4, 28].
Sequential Tempered MCMC methods are population-based methods that can more efficiently generate samples from complex posteriors since they evolve a population of chains which captures the global structure of the posterior, allowing for better adaptation. Examples of methods that fit in the framework of ST-MCMC include transitional/multilevel MCMC [12, 38], AIMS , AlTar/CATMIP , Subset Simulation , and more generally, forms of Sequential Monte Carlo (SMC) [27, 19, 25, 14]. These methods exploit parallelism by evolving a population of Markov chains simultaneously through a series of intermediate distribution levels until, as a population, they reach the final posterior. Since there are many chains sampling the final level, the mixing time of the Markov chain while sampling the ultimate posterior distribution is less relevant and so it can be much more efficient to use these methods. The intermediate levels also enable the algorithm to estimate the model evidence for solving model selection problems [12, 34] and rare event failure probabilities .
We also introduce a better proposal method for the MCMC step in ST-MCMC to sample high dimensional distributions. Moving beyond standard Metropolis algorithms, like Random Walk Metropolis (RWM), which suffers from the curse of dimensionality, we adapt the Modified Metropolis Algorithm[1, 45], for use when the proposal distribution is a multivariate Gaussian with any covariance structure. This new algorithm, called the Rank-One Modified Metropolis Algorithm (ROMMA), performs a series of rank-one updates according to the prior distribution to form a proposal candidate. This means ROMMA can sample posterior distributions very effectively when they are significantly informed by the prior, which is common for many inference problems for physical systems. ROMMA avoids many of the high-dimensional scaling issues seen with RWM, particularly when the prior distribution has bounded support, since the proposed candidate is adapted to any prior constraint information.
The paper is organized as follows. First, we formulate the Bayesian updating and reliability problems in Section 2. Then we discuss Sequential Tempered MCMC (ST-MCMC) algorithms and how they can be used to solve Bayesian inference and reliability problems in Section 3. Next, we introduce the ROMMA algorithm in Section 4. Then we present experimental results of using ST-MCMC to solve a robust reliability assessment problem for estimating small failure probabilities for a water distribution system in Section 5. Finally, we provide concluding remarks in Section 6. The appendix contains proofs, implementation details, and discussion about algorithm tuning.
2 Bayesian Formulation
The Bayesian framework is a rigorous method for quantifying uncertainty using probability distributions. This philosophy is rooted in probability as a logic [2, 16, 17, 26]. Within this framework, probability distributions are used to quantify uncertainty due to insufficient information, regardless of whether that information is believed to be unavailable (epistemic uncertainty) or is believed to not exist because of inherent randomness (aleatory uncertainty). This notion of uncertainty makes the Bayesian framework well suited for posing system identification problems, where postulated system models have uncertain, rather than random, mathematical structure and parameters. Therefore, we view system identification as updating a probability distribution that represents our beliefs about a system based on new information from system response data. This uncertainty description is then used to quantify the prediction of future system behaviors.
2.1 Bayesian Inference
Bayesian inference uses Bayes’ theorem to update our uncertainty about a system using data, where uncertainty is quantified by assigning a probability functionto different system descriptions. The Bayesian inference formulation of parameter estimation and system identification is given in : Given observation data and assuming a system description in terms of a model class , find the posterior distribution
that represents the updated beliefs about a model parameter vectorafter incorporating the data. The data is made up of measurements of the output and possibly measurements of the input . The model class consists of (a) a set of predictive forward models, , describing the plausibility of the predicted output for a parameter vector , and (b) a prior distribution, over representing our initial belief about the plausibility of different values of the model parameter vector and their corresponding forward models. To find the posterior distribution, Bayes’ Theorem is used:
The likelihood function, , denotes the probability of observing the data according to the predictive model of the system with the measured input and output substituted into it. The normalizing factor in equation (1), , is called the evidence (or marginal likelihood) for the model class and is given by:
If multiple model classes exist to describe the uncertain behavior of the system, the evidence for each model class can be used to solve the corresponding model class selection problem .
Typically, the model’s prediction of the system’s behavior has uncertain prediction accuracy, so we choose a probability model based on additive independent measurement and prediction errors, , for the prediction of the measurement of the th output . The stochastic predictive forward model then becomes
where is the model prediction equation for the th output and is the PDF for the combined measurement and model prediction errors. With modern sensors, the measurement error will usually be quite small compared to the model prediction error. The deterministic prediction equation may be constructed from underlying physical principles while may be chosen based on a sensor model and Jaynes’ principle of maximum information entropy [2, 26].
We can broadly classify the posterior probability distributions from solving the inference problem for a model class into three types: globally identifiable, locally identifiable, and unidentifiable[4, 28]. Globally identifiable model classes have a posterior distribution with a single peak around a unique maximum. Locally identifiable model classes have a posterior distribution with several separated peaks with approximately the same significance. Unidentifiable models have a manifold in the parameter space with approximately equally plausible parameter values. When the problem results in a locally identifiable or unidentifiable distribution, Bayesian methods are essential since they can capture the distribution unlike optimization-based methods. However, these problems are still challenging since it is often difficult to find and explore the peaks or the manifold of most plausible parameter values. This necessitates the development of better MCMC methods.
2.2 Bayesian Uncertainty Quantification
Taking a Bayesian perspective to studying complex systems enables scientists and engineers to quantitatively integrate all forms of uncertainty into their planning and decision making process using probability; for example, to make prior and posterior robust predictions about future system performance that take into account all available sources of uncertainty [30, 13, 5, 33]. This is done by marginalizing over the collective modelling uncertainty and other sources of uncertainty.
Within this framework, we can assess the probability of a system in the future being in some failure domain of unsatisfactory system performance that is defined as where is a performance failure function. For convenience, we now view the parameter vector as an expanded vector that captures both the modeling uncertainty in the system description and the uncertain future inputs. Therefore, the uncertainty for some of the components of will be described by the posterior in (1) while for the remaining components it will be described by a prior distribution. When the system understanding only uses prior information about the model description of the system, , for a single model class , then we can define the prior failure probability [5, 36] as the expected value:
where it is to be understood that some of the components of may not be informed by the data and so the uncertainty in their values is controlled by the prior distribution. Using MCMC and importance sampling via ST-MCMC, we can provide estimates of the posterior failure probability. However, the MCMC method introduced in the Subset Simulation algorithm  for small prior failure probabilities (rare-event simulation) in (4) requires modification to be efficient computationally in the posterior case (5) because the data induces significant correlations in that were not considered in the original algorithm.
2.3 Solving Bayesian Updating Problems
Bayesian inference and updating problems are usually analytically intractable [2, 22] due to the required integration in (2). Common sampling methods for Bayesian inference are Markov Chain Monte Carlo (MCMC) methods , which do not require normalization of the posterior in (1
). Generating samples through MCMC is computational intensive as often thousands to millions of model evaluations are needed to fully populate the high probability content of a complex posterior. While the central limit theorem implies that the estimate quality for the mean of a finite-variance stochastic variable scales independently of the dimension, given independent samples, MCMC methods produce correlated samples, which can introduce poor high dimensional scaling. As a result, solving updating problems using Bayesian methods is often prohibitively expensive for high dimensional or complex problems.
Recall that the basic idea of Monte Carlo estimation is to estimate expected values of with respect to the posterior distribution by using a population of posterior samples for as follows:
Assuming certain conditions hold, the quality of this estimate and its convergence can be assessed by the Markov chain central limit theorem .
2.4 Basic Markov Chain Monte Carlo Algorithm
The basis for many MCMC methods is the Metropolis-Hastings algorithm, which produces a Markov chain with a desired stationary distribution, , by designing a transition kernel, , such that the Markov chain is ergodic and reversible [23, 39]. Ergodicity ensures that the Markov chain has a unique stationary distribution, if it exists, while reversibility is a sufficient condition for the existence of a stationary distribution, . Any proposal distribution such that for , can be used to construct such a . Given the Markov chain is in state , this is done by proposing a candidate sample according to and then accepting the candidate with probability given by:
If the candidate is rejected, the initial sample is repeated. This leads to the Metropolis-Hastings (MH), Algorithm 1. After settling into the stationary distribution, the resulting Markov chain produces samples from which are correlated.
The major challenge for the MH algorithm is designing the proposal distribution . A good proposal will cause the Markov chain to (1) converge quickly to the stationary distribution, that is, have a short burn-in time, and (2) have low correlation while sampling the stationary distribution. A common Metropolis-Hastings implementation is Random Walk Metropolis (RWM) in which
is a multivariate Gaussian distribution centered at.
2.5 Limitations of Traditional MCMC
Generally, finding a proposal distribution that escapes the “curse of dimensionality” is difficult because the high probability region of the posterior distribution lives on a low dimensional manifold in the high dimensional space. Therefore, it is important to find and sample this manifold efficiently. Even when starting on the manifold, randomly sampling the region around it without detailed knowledge of the structure will lead to a low acceptance rate of proposed samples in the Markov chain. Thus, if the proposal distribution is ill informed, very short steps are needed to ensure high acceptance rates. This leads to highly correlated samples. These types of posterior distributions are common in inverse problems for physical systems where the data is not sufficiently rich to detangle the complex relationships produced by the system.
Further, even for simple distributions without complicated geometry, RWM requires many model evaluations to produce a satisfactory sample population. Practitioners often run chains for hundreds of thousands or millions of iterations to ensure they have enough uncorrelated samples. For simple problems, the length of the chain required to decorrelate the samples scales linearly with the dimension of the problem . Avoiding slow mixing by developing more efficient samplers is therefore critical for solving inference problems involving systems where evaluating the forward model is computationally intensive.
Finally, the standard MH algorithm is constrained by the fact that it forms a Markov chain. This means it requires sequential evaluation and it has a limited ability to adapt to its state without jeopardizing its reversibility and ergodicity. The sequential updating of the chain makes MH MCMC unsuitable for high performance computing because it cannot exploit parallelism. For efficient solution of computationally intensive inverse problems, algorithms are sought that exploit parallelism and are adaptive based upon global information.
3 Sequential Tempered MCMC Algorithm
ST-MCMC methods provide many advantages over single chain methods because of the information and adaptation they gain through the population of samples and by tempering. The population aspect enables parallelism and the ability to capture and learn from the global structure of the posterior. The tempering enables the samples to gradually transition from the prior to the posterior. This means they better adapt to handle complicated distributions, like multi-modal distributions, without getting stuck sampling in one area when the chains have difficulty exploring the posterior. These methods have been shown to be very effective in many problems [12, 32, 1, 27]. Thinking about all these algorithms within a single framework is helpful to implement and tune them, particularly for posterior robust uncertainty quantification.
ST-MCMC methods can be divided into three basic parts: tempering/annealing, importance resampling, and MCMC. The annealing step introduces intermediate distributions that gradually evolve the samples from the prior to the posterior, as illustrated in Figure 1. The importance resampling step discards relatively implausible samples and multiplies highly plausible samples to maintain and rebalance the samples with respect to the changes in the intermediate distributions from one level to the next. Then the MCMC step allows the samples to be perturbed in order to explore the intermediate distributions and to adjust to changes as the distributions evolve. This family of algorithms can be interpreted as a form of Sequential Monte Carlo as found in , which provides the theoretical foundation of these algorithms.
A fully general ST-MCMC algorithm is described in Algorithm 2. The choice for resampling, adaptation, and MCMC may vary. This algorithm begins with samples from the prior distribution and then evolves them to generate samples from the posterior distribution , which is the distribution conditioned on additional data or information . For each of the intermediate distributions , the previous level’s population samples are weighted based upon their relative likelihood according to the initial and new distribution, and then resampled. Then each sample initiates a Markov chain which explores the intermediate distribution according to the MCMC process defined by the proposal distribution . The next level proposal and next intermediate distribution are typically adapted based upon information from the current sample population, with growing closer to the posterior as increases. Once the chains are sufficiently decorrelated from their initial seeds, the samples at the end of the Markov chains serve as the initial samples to start the weighting and resampling process for the next level. This process repeats until the samples are distributed according to the posterior distribution , meaning that is for the final . For a discussion for the general tuning and parameterizations of this algorithm, see the appendix A.
3.1 ST-MCMC for Bayesian Inference
An approach to using ST-MCMC for Bayesian inference is found in the TMCMC  and CATMIP  algorithms or in the context of an SMC algorithm as in . A general implementation of these ideas is found in Algorithm 3. When initializing the algorithm, the initial sample population is drawn from the prior distribution. The number of samples in the population is typically fixed at all levels to be . Parameters used for the algorithm are then initialized, such as the annealing factor , level counter , and parameters that define the proposal distribution .
At the beginning of each subsequent level , the annealing factor is computed. The annealing factor controls the influence of the data at every level by gradually transitioning the level stationary distributions from the prior at to the posterior at . The increment at each level is chosen to ensure that the intermediate distributions are not too far apart, otherwise the sample population does a poor job representing the next level distribution . This increment is controlled by looking at the degeneracy of the sample importance weights, which are weights that allow us to transform samples from making estimates with respect to to . This process is illustrated in Figure 2. This weight function takes the form of . The degeneracy in the weights is measured by computing their coefficient of variation and trying to find a such that it is equal to some target threshold . We use the sample coefficient of variation and so we must solve for in
This equation is typically solved using a bisector method since we have an upper and lower bound for . Based upon the theory of importance sampling, the coefficient of variation is an estimate of the effective sample size (ESS), making it a good proxy for degeneracy . Therefore, we set to have a certain target ESS in the sample population so that the population represents the intermediate level distribution well. The ESS is the equivalent number of independent samples that will provide the same variance estimate of a quantity of interest. For a more detailed discussion of the change of ESS during ST-MCMC, see A.
Once and are found, the sample population can be used to make expectation estimates with respect to the th level PDF using the normalized importance weights :
The algorithm is now ready to produce the next level sample population, . Importance resampling produces an initial sample population that is asymptotically distributed according to for increasing population size. Multinomial resampling is often used, where each is randomly picked from the samples . The probability of choosing is . Because the new samples are a subset of the previous level population, there is added degeneracy. To add diversity, MCMC is used to evolve the sample population according to .
The MCMC step for ST-MCMC is defined by the type of proposal distribution used and the length of the chain. Both of these factors can significantly influence the performance of the algorithm as the chains must be allowed to evolve sufficiently to ensure that they explore the distribution and decorrelate from each other. Typically, within Metropolis-Hastings MCMC, a Gaussian proposal is used where the candidate , and is the sample covariance matrix computed using the weighted samples while is the scaling factor, which is adapted using the acceptance rate of the sampler at the previous MCMC level. We use a feedback controller-based method to tune the scale factor and a theoretical method for choosing the chain length, both of which are discussed in the appendix A.2. Other methods besides RWM can be used as discussed in Section 4.
The final sample population at level is at the end of these chains. The algorithm then iterates until in which case the final samples are from the posterior distribution.
3.2 ST-MCMC for Estimating Prior Failure Probabilities
Sequential Tempered MCMC can also be used to estimate small failure probabilities for reliability analysis as described in Algorithm 4. This algorithm as implemented is essentially a formulation of Subset Simulation . Here, the intermediate distributions are intermediate failure domains for increasing threshold levels of a failure function , where failure is defined when . So by defining the intermediate failure region by for some , this region contains the full failure region plus additional points. When the distribution over the parameters is the prior distribution since no failure information has been incorporated. When , the distribution is now the posterior distribution since it is the distribution conditioned on the failure event . By moving from to , we anneal down to the final failure region through a set of nested intermediate failure domains. With these prior, posterior, and intermediate distributions, we use ST-MCMC to find, sample, and estimate the probability contained in the failure region.
The ST-MCMC method used to estimate small failure probabilities is simpler than that used for Bayesian updating since . This means that the weights are either or depending if the sample is inside or outside, respectively, the failure region defined by . Therefore, multinomial resampling is not needed. The next is determined in Step 2 of Algorithm 4 by finding a such that a specified fraction, , of the samples satisfy . The fraction controls the ESS of the sample population i.e. if then the ESS is . These samples are then replicated to regenerate the initial size of the sample population. The samples are then decorrelated and explore the intermediate failure domain distribution using MCMC.
It is important to note that Algorithm 4 differs from the Subset Simulation algorithm presented in  in one aspect. In place of Step 4, the original Subset Simulation keeps the fraction of the samples that satisfy and then evolves each of the chains steps in Step 6 to recover a total of samples from the th level failure PDF . In contrast, we replicate the samples times then evolve the resulting chains in parallel for a number of steps that is determined by how well the chain is mixing (Step 6). While this requires more function evaluations, it is very parallelizable and also more robust when long chains are needed to ensure decorrelation and better exploration of .
3.3 ST-MCMC for Estimating Posterior Failure Probabilities
To estimate and sample the posterior failure region , we can combine Algorithms 3 and 4. One approach is to use Algorithm 3 to transform samples from to samples from and then use these samples to seed Algorithm 4, which then transforms samples from to samples from , yielding posterior failure samples. In this case, is essentially the prior distribution for the failure assessment and the intermediate distributions are where is the intermediate failure domain. Therefore, it is important for the MCMC method to be able to efficiently explore , as we see in the experiments in Section 5 when we solve a posterior reliability analysis using this approach. Otherwise, Algorithm 4 proceeds as described in Section 3.2.
Alternatively, different paths to transform samples from to could be used. The most efficient path will depend on the computational cost of computing the likelihood and failure functions, along with the level of difficulty of sampling the domains. This question of developing a strategy for finding the optimal path to minimize effort is left to future study.
3.4 ST-MCMC for Integration
One of the original motivations for the development of Sequential Tempered MCMC methods like TMCMC  was to solve Bayesian model selection problems, for which population MCMC algorithms do well . Such problems can be very difficult to solve because they require computing the model evidence, that is, the likelihood of the data given the model class, , which is given by the integral in (2). Similarly, when ST-MCMC is used for computing failure probabilities , we must compute , which can be expressed as integral (4). The similarity of the model selection and failure probability estimation problem indicate that the same methods can be used to solve both problems. In this section, we describe such a method for computing failure probabilities using ST-MCMC. We discuss computing model evidences in the appendix to illustrate the similarities between the two methods, see B.
3.4.1 Estimating Rare Event Failure Probability
Computing the failure probability for a rare events is dicussed in . In this case, is
This integral could be naively estimated using Monte Carlo sampling of the prior distribution . This estimate would be computationally inefficient when the probability of failure is small, since the probability of randomly generating a sample in the failure region is low. However, the intermediate levels of ST-MCMC solve this problem by decomposing the computation over the intermediate failure domains. The integral is then expressed as the product of Monte Carlo estimates of intermediate conditional failure probabilities:
For each intermediate level, we can perform a fairly accurate Monte Carlo estimate between the previous level and the current level since these distributions are designed to be relatively close to each other in terms of the relative ESS (effective sample size) of samples coming from the previous level. Having a high ESS means Monte Carlo sampling will be effective. Noting that , the Monte Carlo estimate for in (11) takes the form
where . Then, if the fraction of intermediate failure samples at each level of the algorithm is set to , the total estimate of the failure probability becomes
By breaking up the estimation of the failure probability into a number of levels, the number of model evaluations needed to check whether a system configuration leads to failure scales sub-linearly with the inverse failure probability . Normal Monte Carlo requires samples to estimate the failure probability while ST-MCMC algorithms require since it requires approximately levels to reach the failure domain.
4 Rank-One Modified Metropolis Algorithm
The most computationally intensive step of ST-MCMC is evolving the population of Markov chains during the MCMC step. This is particulary significant for high dimensional inference problems where exploring the posterior distribution is challenging. For certain types of problems where the posterior distribution is informed by the prior, integrating prior information in the proposal can reduce the computational cost. This avoids the wasted computational effort of computing the likelihood function when the candidate was rejected mostly due to the influence of prior information and not the likelihood. The Modified Metropolis Algorithm (MMA), developed in , does this under the assumption that the proposal distribution is a Gaussian with independent variables and that the prior has independent components. The Rank-One Modified Metropolis Algorithm (ROMMA) presented in this work generalizes the MMA algorithm to any prior distribution and any Gaussian proposal distribution.
In general, ROMMA provides speed ups when the distribution conditioned on information or data is still significantly informed by the prior distribution . There are two common situations that lead to these types of posteriors. The first case is where the prior enforces a constraint such as a prior inequality constraint on model parameter. The second case is inference problems where the data is only rich enough to inform the parameter distribution along certain directions, also known as active subspaces . This typically occurs for unidentifiable model classes which have a posterior distribution with a manifold of approximately equally plausible parameter values . Both of these situations are common in posterior reliability problems for complex models of physical systems.
4.1 Modified Metropolis Algorithm
The Modified Metropolis Algorithm was developed in  to overcome the curse of dimensionality that comes from sampling high dimensional posteriors when estimating small failure probabilities. This algorithm originally assumed that the posterior is the product of the prior and an indicator function but it can be expanded to the more general Bayesian inference setting as in Algorithm 5. A proof of its reversibility is given in C.1. The algorithm assumes that the prior distribution has independent components such that , which is a common assumption for many prior reliability problems. In order to evolve the Markov chain that samples the posterior , the authors break it up into a two-step proposal. The first step of the proposal deals with the prior and can be done component-wise. The second step deals with the likelihood , which is done by evaluating the forward model. By separating out the prior and evolving it component-wise, this algorithms avoids the poor dimensional scaling introduced by the prior. This is particularly important for priors with bounded support because they often have a significant impact on the posterior. However, the independence assumptions for the prior and proposal distributions pose a significant drawback for applying MMA to general Bayesian updating problems where there is significant correlation induced by the data.
4.2 Rank-One Modified Metropolis Algorithm (ROMMA)
We develop a similar two-step proposal process for a more general setting where the proposal and prior may not correspond to independent variables. In particular, we study the case of a multivariate Gaussian proposal and a general prior. The key idea is that instead of thinking of the algorithm as a set of component-wise proposals, think of it as a set of linearly independent rank one proposals. By employing this algorithm, we can significantly reduce the number of forward model evaluations, which provides a significant speed up. The tradeoff is that this algorithm requires an increased number of prior evaluations, which scales linearly with dimension, and it is sensitive to the proposal covariance used to generate the rank-one updates. However, when used as part of ST-MCMC, the covariance structure and scaling can be well estimated.
ROMMA for MCMC is described in Algorithm 6. In this algorithm, the correlation structure in the Gaussian proposal is handled by computing the matrix square root, , of the proposal covariance, ; however, in principle, any matrix decomposition may be used. We also need two permutation matrices, and , where
is the identity matrix and corresponds to performing the rank-one updates in the forward direction whilecorresponds to reversing or flipping the indices of the variables and performing the updates in the reverse direction. Using these two permutation matrices is necessary to produce a reversible sampler.
Then, for each step in the Markov chain, we initialize the candidate to be the current sample and randomly choose the permutation to be the forward or reverse ordering with equal probability. Based upon the choice of the permutation, the transformed matrix square root is formed. The th column of , , will be the th rank one update. Finally, we draw a random standard Normal vector , as when generating a zero-mean multivariate Gaussian with transformed covariance using .
Iterating through all of the rank-one updates, we construct a proposed candidate based upon the current rank-one update vector , as . We then compute the ratio of the priors, and choose whether to accept or reject the proposed rank one change according to a Metropolis step. If the component is rejected, remains the same, else is updated to . These two steps are performed for all rank-one updates until we reach the final . This set of rank-one proposals can be thought of as evolving the Markov chain according to the prior since the prior distribution would be the invariant distribution of this Markov chain in the absence of the likelihood evaluation step that follows.
After choosing , we then perform a Metropolis step to accept or reject the entire vector according to only the likelihood . Thus, we compute the ratio, , of the likelihood for the candidate and current parameter vectors. If the sample is accepted, then , else .
5 Example: Reliability of a Water Distribution Network
Assessing the performance reliability of water distribution networks is important given the increasing age of many of these infrastructure systems that leads to component degradation and failure [42, 31]. We consider first identifying leak positions and intensity within a water distribution system and then making robust predictions about the reliability of the distribution system given uncertainty in demand, leak position, and leak intensity. The Bayesian leak detection problem was previously considered in  but the posterior reliability problem has not been addressed before. This style of posterior reliability problem has been formulated in  but, in general, it remains computationally intractable using existing methods. This test problem has been designed to have many parameters whose values are not significantly informed by the data, which makes the problem reflect many physical systems. Using ST-MCMC and ROMMA, we are able to solve this problem significantly faster than ST-MCMC approaches based upon Random Walk Metropolis (RWM) or the Modified Metropolis Algorithm (MMA), with similar accuracy as judged by looking at the failure estimates and posterior marginal distributions.
5.1 Test System
We consider the Hanoi water distribution model, Figure 3, found in [21, 18] and assume it is operating under steady state conditions. The network model considers frictional losses in the pipes and pressure-dependent leak rates. The static hydraulic system model is determined by solving mass conservation equations at the nodes and energy conservation along loops. Mass conservation at each node of the system is captured by
where and are the sets of in-flow and out-flow pipes to node , respectively. is the flow rate of pipe and is the demand or leak rate at node . In this model we treat leaks as nodes where the demand is pressure dependent according to the hydraulic head at the node so that if is a leakage node.
Second, energy conservation is enforced for each loop :
where the change in the hydraulic head from the beginning to the end of pipe , , is captured by the Hazen-Williams equation expressing fictional losses:
Here, and are the hydraulic head at the start and end of the pipe, respectively. is the pipe length and is its diameter. There are four fixed model parameters: the numerical conservation constant, , the pipe roughness, , and the regression coefficients, and . For leaks at an unknown position along a pipe, we parameterize the location of that leak node along pipe using such that the length of the pipe between the source and the leak is . The “source” direction of each pipe is defined in the model but this does not constrain the flows since the flow can either be positive or negative. Therefore, we can compute the hydraulic head at the leak on pipe k as
The combined equations that describe the mass conservation, energy conservation, and leaks is solved using Newton’s method to find the vector of flows along the pipes and the vector of hydraulic heads at the nodes. This approach follows standard techniques in the water distribution community as described in .
The Hanoi network in  is a reservoir fed system with 31 nodes and 34 pipes, leading to 34 possible leaks. Therefore, the network state is parameterized by 31 nodal loads and 34 leak sizes and positions, leading to 99 parameters in the model parameter vector . The hydraulic head at the reservoir is fixed at m. The network description and the physics captured by Equations (14) - (17) define a model class used in the analysis. The prior distribution on the nodal demands are modeled as a multiplicative factor on a nominal high value that follows a Gaussian distribution with mean and standard variation . The leak sizes have an exponential prior with mean , while the leak position has a uniform prior over the length of the pipe. Our choice of an exponential leak size prior means that most of the leaks will be small but a few large leaks are possible. For more details, see D.
In this test system, failure is defined as being unable to provide a minimum level of hydraulic head at each node, which in this example is set to m. The head will be influenced by the uncertain demand and leak properties. This operations constraint is mapped to the failure function by defining . Here, are the heads for each of the demand nodes. The prediction of the hydraulic head for failure estimation is treated as deterministic and does not include model prediction uncertainty. This could be included at the cost of a more complex failure model.
The investigation is divided into three phases: 1) prior failure estimation where the failure probability is assessed based upon the prior uncertainty in the nodal demands and the leak properties, 2) leak identification based upon pressure data observed from the network under different known loading conditions, and 3) posterior failure probably estimation based upon the prior uncertainty on the demand and posterior uncertainty on the leak properties. For the posterior analysis, we consider two cases: Case 1, where failure is likely because of a large leak in a sensitive part of the network and Case 2, where failure is unlikely because of limited damage to sensitive areas of the network.
For each of these investigations, we compare ST-MCMC based on ROMMA to ST-MCMC using MMA and RWM. The methods use chains and their chain lengths at each level are determined by evolving the chains until a correlation target, , is reached. For inference (Algorithm 3), the correlation is assessed as the maximum correlation of a parameter to its starting value in the chains. For failure probability estimation (Algorithm 4), since multimodality is common to insure the chains explore the likelihood levels of each mode, the correlation of the log posterior likelihood is used. For these algorithms, the next intermediate distribution is chosen such that the effective sample size is approximately . For inference this is done by setting in Algorithm 3. For failure probability estimation this is done by setting in Algorithm 4. For a more detailed discussion of these choices see A.
The sampling based uncertainty estimates for Subset Simulation from 
are used to capture sampling uncertainty in terms of variance but they do not capture any bias due to the chains not mixing sufficiently to adequately decorrelate and explore the intermediate distributions. The quoted uncertainties are the sampling standard deviations.
5.2 Prior Failure Estimation
The estimation of the prior failure probability for systems has been extensively studied and the Subset Simulation algorithm using Modified Metropolis is generally quite effective. Here, the prior uncertainty for nodal demand, leak size, and leak position is considered, yielding 99 uncertain parameters. Two ST-MCMC algorithms were used that are similar to Subset Simulation, one based upon MMA and one based upon ROMMA. The two algorithms had the same tuning parameters so their expected accuracy should be similar. The estimated prior failure probabilities were (MMA) and (ROMMA). The prior failure probability was found to be based upon Monte Carlo samples. We see in Figure 4 that ROMMA improves on the efficiency of MMA for solving this problem since it takes only 60% of the time. This improvement is significant but the full power of ROMMA comes when solving the posterior problem, since handling the data-induced correlation becomes much more important.
The fact that ROMMA better handles correlation can be seen in Figure 4 by the fact that the number of model evaluations needed at each level to reduce the correlation in the chains to a desired threshold does not significantly change as increases. This increase is seen in MMA since it does not efficiently handle the correlation in the failure domain as .
5.3 CASE 1: Leak Identification
The leak identification problem seeks to estimate the size and position of leaks given observations under an arbitrary set of nodal demands. For the first experiment, data was generated from the water distribution system subject to a large leak in a critical section of the system. This data corresponds to noisy observations of the hydraulic head at all demand nodes under 10 known demand conditions. Therefore, 10 model evaluations are needed for every likelihood evaluation. The observation uncertainty was modeled as Gaussian with m. Using this data, the leak sizes and positions are inferred using two ST-MCMC algorithms based upon Algorithm 3, standard RWM and ROMMA, to sample from the posterior
. The means and Bayesian credibility intervals of the leak size and position posterior parameters are shown in Figures5 - 6. We see that many of the parameters are not significantly informed by the data and that they are similar to their prior distributions. This is particularly true for the leak position variables. Further, the data implies some significant correlations between some parameters, but most parameters remain uncorrelated as in the prior. Along many directions the posterior is very similar to the prior, enabling ROMMA to give very high performance since it explicitly integrates the prior information into the proposal and can also handle the directions where correlation is imposed by the data. Indeed, Figure 7 shows that ROMMA enables much higher computational performance, requiring significantly fewer model evaluations. The performance gains of ROMMA do decrease relative to RWM as increases since the intermediate distribution gets farther from the prior .
Finally, the distributions of posterior samples from ST-MCMC with RWM and ROMMA are practically the same, as we would expect from a converged MCMC algorithm. The comparisons of the posterior parameter marginal distributions and the posterior parameter correlations for ROMMA and RWM can be seen in the appendix, Figures 18 - 20.
5.4 CASE 1: Posterior Failure Estimation
In this case, since there is a large leak in a critical part of the network, failure is likely. We compute the failure probability using ST-MCMC with Algorithm 4, based upon RWM, MMA, and ROMMA. In Figure 8, we see that ROMMA outperforms both RWM and MMA significantly. MMA requires smaller steps because it cannot effectively handle the correlation introduced by the data (see Figure 21 for parameter correlations), leading to slower sampling. RWM also requires smaller steps because of the high dimensionality of the problem where it is hard for large random steps to stay in the high probability region of the posterior. The three algorithms are in good agreement with their posterior failure estimates. ROMMA estimates the failure probability as , RWM estimates it as , and MMA as .
Figures 9 - 10 compare the structure of the prior and posterior failure distributions and for the leak size and demand parameters. We see that the large leak in the system matches well with the prior failure distribution so qualitatively the prior and posterior failure regions look very similar. Further, there is really only one failure mode present so the distribution is unimodal, making it work well with our algorithm tuning theory and implementation. For a full look at the posterior failure marginal distributions and correlations for the three algorithms, see the appendix, Figures 21 - 24.
5.5 CASE 2: Leak Identification
Similar to Case 1, leak sizes and positions are inferred from noisy observations of the hydraulic head under different conditions. However, in this case there is no catastrophic leak. The true leak parameters used to generate the data can be seen in Figures 11 - 12. Again, when solving this identification problem, we compare a ST-MCMC method that uses RWM to one using ROMMA. Figure 13 shows a considerable speed up where ROMMA take only 3% of the time to solve the problem than RWM, giving it about a thirty times speed up. While both RWM and ROMMA use a tuned proposal covariance, because ROMMA samples the prior very efficiently through its rank-one proposal in the first part of the algorithm, it is much more efficient than RWM. RWM must take significantly smaller proposal steps than ROMMA to maintain an appropriate acceptance rate.
The posterior is still significantly influenced by the prior since the majority of leaks are small and close to the inequality constraint that their flow rate must be non-negative. Further, the data is not rich enough to really inform the posterior and constrain all the leak sizes and positions any more than the prior already does, although it does introduce a few significant parameter correlations. We do see from Figures 11 - 12 that only the leak size parameters that influence the system have been significantly identified. Figures 28 - 27 in the appendix gives a more complete view of the parameter correlations and marginal distributions of the posterior. ROMMA excels in this environment and samples very efficiently. However, again we see in Figure 13 that the number of model evaluations needed for ROMMA increases as increases since the posterior is moving further away from the prior as the data is being integrated and some of the parameters are being constrained.
5.6 CASE 2: Posterior Failure Estimation
When we combine the two ST-MCMC algorithms to estimate the posterior robust failure probability, we still see significant speed ups in Figure 14 using ROMMA instead of RWM or MMA. The latter require taking much smaller step sizes than ROMMA, causing higher computational cost. The algorithms give the posterior failure probability to be for ROMMA, for RWM, and for MMA. Therefore, the posterior information about the leaks indicates that the system is significantly less likely to fail than when only considering prior information about the leaks (Section 5.2). Estimating the very small posterior failure probability based upon Monte Carlo samples would be very computationally challenging since the estimate needs to incorporate the data through importance sampling, which would be very inefficient. Using ROMMA with 16384 samples instead of 1024 samples gives the failure probability estimate . All these estimates agree within an order of magnitude, but their variation reflects that the quality of the estimate degrades for very small failure probabilities with complex failure domains.
The high fidelity ROMMA simulation with 16384 samples shows that the posterior failure domain is tri-modal, corresponding to failures due to large loads on nodes 24, 43, or 64 in Figure 3
. This tri-modality means that using the covariance estimate from the data will lead to a suboptimal proposal mechanism. This could contribute to the variation in the failure probability estimates. The algorithms generally agree on the marginal distributions over the failure domain. The means and confidence intervals are seen in Figures15-16 and the marginals are seen in the supplemental information in the appendix, Figures 29-31.
6 Concluding Remarks
In this work we explore using Sequential Tempered MCMC (ST-MCMC) with the proposed Rank-One Modified Metropolis Algorithm (ROMMA) to solve problems in Bayesian updating and failure estimation arising in system identification, where we wish to update our understanding of a system given data, and in reliability assessment, where given our current understanding of a system we want to estimate the probability of failure based on some performance quantity. Our main contributions are as follows:
Presenting a general framework for understanding Sequential Tempered MCMC algorithms like TMCMC and Subset Simulation which have been used for solving Bayesian updating problems and reliability problems separately in the past.
Showing that this framework can be used to solve posterior reliability problems efficiently and robustly with respect to modeling uncertainty by combining multiple algorithms within the class of ST-MCMC algorithms.
Introducing the Rank-One Modified Metropolis Algorithm to speed up sampling in ST-MCMC for high-dimensional distributions with inequality constraints.
ST-MCMC combines tempering, importance resampling, and MCMC into a single algorithm that gradually transforms a population of samples from being distributed according to the prior to being distributed from the data-updated posterior. These methods gain efficiency because they can be easily parallelized and because they can adapt through the tempering process and learn from global population information. Further, these methods can be used to efficiently estimate Bayesian model evidence and failure probabilities for systems. ST-MCMC type algorithms have been used separately to solve the Bayesian inference problem and the prior failure probability assessment problem but in this work, we combine them to solve the joint posterior failure probability problem. We demonstrate efficient estimation of the prior and posterior failure probabilities for the reliability of a water distribution network subject to uncertain user demand and uncertain leak conditions. This high-dimensional problem reflects realistic complex systems where the data is often uninformative about many of the parameters. ROMMA with ST-MCMC is shown to perform very well under these conditions.
The efficiency for solving the posterior reliability problem is achieved by speeding up the MCMC sampling step within ST-MCMC using ROMMA, which builds upon the Modified Metropolis Algorithm (MMA) introduced in the original Subset Simulation reliability algorithm to allow scaling to high dimensional spaces. Unlike MMA, ROMMA does not require the prior and proposal distributions to be expressed in terms of independent stochastic variables, making it much more suited to posterior estimation that often involves high correlation induced by the data. ROMMA especially speeds up problems where the prior distribution significantly informs the posterior, as is the case where the prior enforces certain constraints or where the data only informs the posterior along a certain parameter subspace. This performance gain comes by first sequentially updating a candidate sample in the chain along rank-one directions using prior information and only then accepting or rejecting the candidate based upon the likelihood of the data.
There are many future opportunities for expanding this work. First, ST-MCMC and ROMMA could be further accelerated by making better approximations of the global structure of the intermediate distributions based upon the sample population to inform the MCMC proposal step to speed up sampling. Second, the path taken for transforming the prior to the posterior failure region by ST-MCMC could be better optimized to minimize the computational effort. Finally, the underlying theory of ST-MCMC needs to be further explored, as discussed in the appendix, to better inform its tuning for some desired level of performance, particularly for approximations of integrals, such as estimating the failure probability.
The authors wish to acknowledge the Department of Energy Computational Science Graduate Fellowship and the John von Neumann Fellowship granted to the first author. Sandia National Laboratories is a multimission laboratory managed and operated by National Technology & Engineering Solutions of Sandia, LLC, a wholly owned subsidiary of Honeywell International Inc., for the U.S. Department of Energy’s National Nuclear Security Administration under contract DE-NA0003525. SAND no. 2018-3857 J
-  S.-K. Au and J. L. Beck, Estimation of small failure probabilities in high dimensions by subset simulation, Probabilistic Engineering Mechanics, 16 (2001), pp. 263–277.
-  J. L. Beck, Bayesian system identification based on probability logic, Structural Control and Health Monitoring, 17 (2010), pp. 825–847.
-  J. L. Beck and S.-K. Au, Bayesian updating of structural models and reliability using markov chain monte carlo simulation, Journal of Engineering Mechanics, 128 (2002), pp. 380–391, https://doi.org/10.1061/(ASCE)0733-9399(2002)128:4(380).
J. L. Beck and L. S. Katafygiotis,
Updating models and their uncertainties. I: Bayesian statistical framework, Journal of Engineering Mechanics, 124 (1998), pp. 455–461.
-  J. L. Beck and A. A. Taflanidis, Prior and posterior robust stochastic predictions for dynamical systems using probability logic, International Journal for Uncertainty Quantification, 3 (2013).
-  J. L. Beck and K. M. Zuev, Asymptotically independent Markov sampling: a new Markov Chain Monte Carlo scheme for Bayesian inference, International Journal for Uncertainty Quantification, (2013), (2013).
-  W. Betz, I. Papaioannou, J. L. Beck, and D. Straub, Bayesian inference with subset simulation: strategies and improvements, Computer Methods in Applied Mechanics and Engineering, 331 (2018), pp. 72–93.
-  S. Brooks, A. Gelman, G. Jones, and X.-L. Meng, Handbook of Markov Chain Monte Carlo, CRC press, 2011.
B. Calderhead and M. Girolami,
Estimating bayes factors via thermodynamic integration and population mcmc, Computational Statistics & Data Analysis, 53 (2009), pp. 4028–4045.
-  T. A. Catanach, Computational Methods for Bayesian Inference in Complex Systems, PhD thesis, California Institute of Technology, 2017.
-  M. Chiachio, J. L. Beck, J. Chiachio, and G. Rus, Approximate bayesian computation by subset simulation, SIAM Journal on Scientific Computing, 36 (2014), pp. A1339–A1358.
-  J. Ching and Y.-C. Chen, Transitional Markov Chain Monte Carlo method for Bayesian model updating, model class selection, and model averaging, Journal of Engineering Mechanics, 133 (2007), pp. 816–832.
-  O. Chkrebtii, D. A. Campbell, M. A. Girolami, and B. Calderhead, Bayesian uncertainty quantification for differential equations, arXiv preprint arXiv:1306.2365, (2013).
-  N. Chopin, A sequential particle filter method for static models, Biometrika, 89 (2002), pp. 539–552.
-  P. G. Constantine, Active subspaces: Emerging ideas for dimension reduction in parameter studies, SIAM, 2015.
-  R. T. Cox, Probability, frequency and reasonable expectation, American Journal of Physics, 14 (1946), pp. 1–13.
-  R. T. Cox, The Algebra of Probable Inference, Johns Hopkins University Press, 1961.
-  M. d. C. Cunha and J. Sousa, Water distribution network design optimization: simulated annealing approach, Journal of Water Resources Planning and Management, 125 (1999), pp. 215–221.
-  P. Del Moral, A. Doucet, and A. Jasra, Sequential monte carlo samplers, Journal of the Royal Statistical Society: Series B (Statistical Methodology), 68 (2006), pp. 411–436.
-  F. DiazDelaO, A. Garbuno-Inigo, S. Au, and I. Yoshida, Bayesian updating and model class selection with subset simulation, Computer Methods in Applied Mechanics and Engineering, 317 (2017), pp. 1102–1121.
-  O. Fujiwara and D. B. Khang, A two-phase decomposition method for optimal design of looped water distribution networks, Water Resources Research, 26 (1990), pp. 539–549, https://doi.org/10.1029/WR026i004p00539, http://dx.doi.org/10.1029/WR026i004p00539.
-  A. Gelman, J. B. Carlin, H. S. Stern, and D. B. Rubin, Bayesian data analysis, vol. 2, Taylor & Francis, 2014.
-  C. Geyer, Introduction to Markov Chain Monte Carlo, Handbook of Markov Chain Monte Carlo, (2011), pp. 3–48.
-  W. Härdle and L. Simar, Canonical Correlation Analysis, Springer Berlin Heidelberg, Berlin, Heidelberg, 2007, pp. 321–330, https://doi.org/10.1007/978-3-540-72244-1_14, http://dx.doi.org/10.1007/978-3-540-72244-1_14.
-  A. Jasra, D. A. Stephens, A. Doucet, and T. Tsagaris, Inference for lévy-driven stochastic volatility models via adaptive sequential monte carlo, Scandinavian Journal of Statistics, 38 (2011), pp. 1–22.
-  E. T. Jaynes, Probability theory: the logic of science, Cambridge University Press, 2003.
-  N. Kantas, A. Beskos, and A. Jasra, Sequential Monte Carlo methods for high-dimensional inverse problems: A case study for the Navier-Stokes equations, SIAM/ASA Journal on Uncertainty Quantification, 2 (2014), pp. 464–489.
-  L. S. Katafygiotis and J. L. Beck, Updating models and their uncertainties. II: Model identifiability, Journal of Engineering Mechanics, 124 (1998), pp. 463–467.
-  L. S. Katafygiotis and H.-F. Lam, Tangential-projection algorithm for manifold representation in unidentifiable model updating problems, Earthquake Engineering & Structural Dynamics, 31 (2002), pp. 791–812, https://doi.org/10.1002/eqe.122, http://dx.doi.org/10.1002/eqe.122.
-  M. C. Kennedy and A. O’Hagan, Bayesian calibration of computer models, Journal of the Royal Statistical Society: Series B (Statistical Methodology), 63 (2001), pp. 425–464.
-  M. Maskit and A. Ostfeld, Leakage calibration of water distribution networks, Procedia Engineering, 89 (2014), pp. 664–671.
-  S. Minson, M. Simons, and J. Beck, Bayesian inversion for finite fault earthquake source models I-theory and algorithm, Geophysical Journal International, (2013), p. ggt180.
-  H. N. Najm, Uncertainty quantification and polynomial chaos techniques in computational fluid dynamics, Annual Review of Fluid Mechanics, 41 (2009), pp. 35–52.
-  R. M. Neal, Annealed importance sampling, Statistics and Computing, 11 (2001), pp. 125–139, https://doi.org/10.1023/A:1008923215028, https://doi.org/10.1023/A:1008923215028.
-  A. B. Owen, Monte Carlo theory, methods and examples, 2013.
-  C. Papadimitriou, J. L. Beck, and L. S. Katafygiotis, Updating robust reliability using structural test data, Probabilistic Engineering Mechanics, 16 (2001), pp. 103–113.
-  Z. Poulakis, D. Valougeorgis, and C. Papadimitriou, Leakage detection in water pipe networks using a bayesian probabilistic framework, Probabilistic Engineering Mechanics, 18 (2003), pp. 315–327.
-  E. Prudencio and S. H. Cheung, Parallel adaptive multilevel sampling algorithms for the Bayesian analysis of mathematical models, International Journal for Uncertainty Quantification, 2 (2012).
-  C. Robert and G. Casella, A short history of MCMC: Subjective recollections from incomplete data, Handbook of Markov Chain Monte Carlo, (2011), pp. 49–66.
-  G. O. Roberts, J. S. Rosenthal, et al., Optimal scaling for various metropolis-hastings algorithms, Statistical Science, 16 (2001), pp. 351–367.
-  J. S. Rosenthal et al., Optimal proposal distributions and adaptive MCMC, Handbook of Markov Chain Monte Carlo, (2011), pp. 93–112.
-  A. Scheidegger, J. P. Leitão, and L. Scholten, Statistical failure models for water distribution pipes–a review from a unified perspective, Water Research, 83 (2015), pp. 237–247.
-  M. K. Vakilzadeh, Y. Huang, J. L. Beck, and T. Abrahamsson, Approximate bayesian computation by subset simulation using hierarchical state-space models, Mechanical Systems and Signal Processing, 84 (2017), pp. 2–20.
-  K. M. Zuev, J. L. Beck, S.-K. Au, and L. S. Katafygiotis, Bayesian post-processor and other enhancements of subset simulation for estimating failure probabilities in high dimensions, Computers & structures, 92 (2012), pp. 283–296.
-  K. M. Zuev and L. S. Katafygiotis, Modified metropolis–hastings algorithm with delayed rejection, Probabilistic Engineering Mechanics, 26 (2011), pp. 405–412.
Appendix A Discussion of Algorithm Implementation and Tuning
a.1 General ST-MCMC Tuning
a.1.1 Effective Sample Size
The number of effective samples is a common measure used in sampling problems like MCMC and importance sampling. Because of sample weights or sample correlations, the estimate quality does not necessary behave the same as if the estimate was made using independent samples. Thus, the Effective Sample Size (ESS), given by:
estimates the size of the population of independent samples that would give the same variance of the estimate as the weighted or correlated sample population. For example, in the case of a simple mean estimate, . In order to estimate the ESS, for Sequential Tempered MCMC methods, we must consider the effects of weighting the samples in the Importance Sampling step, performing the resampling, and evolving the population during the MCMC step. In general, weighting and resampling reduce the ESS while the MCMC step increases the ESS.
An approximation of the evolution of the ESS for ST-MCMC is presented in . This evolution of the effective sample size of the population is a function of the target coefficient of variation and correlation between the start and end of the Markov chains at level :
If the population size is large and the COV and correlation targets are constant for all steps, we can find a condition for the existence of a non-zero stationary number of effective samples by finding the fixed point of (19) as the number of levels increases:
If this condition holds, then an asymptotic expression for the effective number of samples is
In general, this analysis gives us a guide for setting the target COV and correlation to obtain a satisfactory number of samples from the posterior distribution. The region where learning is possible (i.e. the ESS will be non-zero at the end of all the levels) is found in Figure 17. For example, if , then and it requires for learning.
a.1.2 Measure of Correlation
Finding a good measure of the correlation of a multivariate sample population is important to insure that the sample population converges to the correct distribution. A simple measure is to look at the component-wise correlations of each of the parameters. This technique is commonly used when estimating the autocorrelation of the Markov Chain for more typical effective sample size analyses for MCMC.
However, even if the parameter-wise correlations are small, there might be large correlations in some transformed set of coordinates. One strategy to mitigate this issue is to use Canonical Correlation Analysis (CCA) . CCA is a technique to efficiently find the direction and magnitude of maximum correlation between two populations, i.e., find vectors and to maximize . By minimizing the canonical correlation, we can insure the correlation target is achieved. This approach was used when solving the Bayesian inference problem for the leak identification.
When the posterior distribution is multimodal, as is often the case for posterior reliability problems like Case 2, looking at the correlation of the components will cause the algorithm to be impractically slow. This is because it is very hard for the Markov Chain to traverse the multiple components which means the correlation will always be high. In this case, the rebalancing of samples between the components will be achieved by the importance resampling step so the most important factor is decorrelating the sample weights and making sure the chain explores the mode it is confined too. Therefore we use as correlation the log likelihood of the start and end of each chain.
a.2 ROMMA Acceptance Rate Tuning
The scaling for the spread of a MCMC proposal distribution is typically tuned by trying to find a scaling factor that achieves an acceptance rate target. In ROMMA, because there are multiple Metropolis steps, finding the appropriate definition of the acceptance rate is non-trivial. For example, having the function that relates the scaling factor to the acceptance rate be monotonic is important for many of the tuning algorithms to achieve a target acceptance rate that corresponds to a scaling factor that is neither too large or too small and thus induces low correlation. However, the acceptance rate in the second part of the ROMMA algorithm does not have this property. When the scale factor is small, the second Metropolis step acceptance rate is high and generally decreases. Then once the scale factor gets sufficiently large, the acceptance rate starts to increase again since most of the rank one proposals in the first step of the algorithm are now getting rejected. This causes the bifurcation of the correlation with respect to the acceptance rate.
An alternative definition of the acceptance rate is to look at the acceptance rate for a specific rank one component. This means the probability that a specific component proposal is accepted during step one of the ROMMA algorithm and also accepted as part of the combined candidate in step 2. Since there are multiple rank one components, we take the minimum acceptance rate among all of them. This quantity is generally monotonic since as the scale factor grows very large, the higher acceptance rate in step 2 is balanced by the higher rejection rate in step 1. Since it is monotonic, it is a much better tuning mechanism to find a scaling factor that leads to low correlation.
The Gaussian proposal distribution in ST-MCMC, , was tuned over the levels of the ST-MCMC algorithm. The proposal covariance was chosen to be a scaled version of the sample population covariance, where the scaling, , was adapted using a feedback controller to get a desired acceptance rate. The feedback controller is described in Algorithm 7. The motivation and derivation of this feedback controller for MCMC can be found in . Based upon [41, 40], , and was chosen to be based upon .
Appendix B ST-MCMC for Model Selection
The model evidence is the high dimensional normalization factor in Bayes’ Theorem for that MCMC was developed to avoid computing. The evidence can be thought of as the expectation of the probability of the data with respect to the prior distribution generated by the model class and it is important for computing the posterior probability of :
This integral could be naively estimated using Monte Carlo sampling of the prior distribution . This estimate would be very computationally inefficient when the data is informative, since the high probability content of the prior may be very far from the high probability content of the posterior. However, the intermediate levels of ST-MCMC enable us to address this problem by decomposing the evidence computation over the intermediate levels [34, 12]. This can be thought of as thermodynamic integration as in . Let denote the ratio of the evidences for the intermediate levels and of the levels, then:
where and . For each intermediate level, we can perform a fairly accurate Monte Carlo estimate between the previous level and the current level since these distributions are designed to be relatively close to each other in terms of the relative effective sample size of samples coming from the previous level. Having a high ESS means Monte Carlo sampling will be effective. This leads to the Monte Carlo estimate:
This integral can be thought of as the evidence provided by level where is the data likelihood added by level and is the prior for level . The combined estimate of the level model evidences
provides an asymptotically unbiased estimate of the total model evidence.
Appendix C Proofs of MMA and ROMMA Reversibility
Reversibility is a sufficient condition for the existence of a stationary distribution, , that satisfies the detailed-balance condition:
This sufficient condition means that any transition kernel may be chosen to maintain the stationary distribution , as long as the reversibility condition (25) holds. Further, the composition of kernels which have the same invariant distribution, , also has as its invariant distribution . This method can be used to create non-reversible Markov chains with the correct stationary distribution.
c.1 Proof of MMA Reversibility
The MMA MCMC Markov process step from to , with transition distribution denoted by , forms a reversible Markov chain whose invariant measure is the posterior distribution . For this algorithm we assume that the proposal and prior distributions have independent components i.e. and .
Let denote the probability density that describes moving from to under the Markov chain proposal, then the transition density from to , , is:
We define to be the ith component of the candidate and as the full transition probability according to both the ith component update and the Metropolis accept/reject step. The component-wise proposals is introduced in Algorithm 5. Each factor in this product can be express in two different ways that depend on whether the candidate was accepted or rejected at the ith step:
This leads to the ratio:
Therefore, we can put these results together to find:
Substituting this result into the Markov chain transition probability ratio , we can prove the reversibility of the Markov chain with respect to the posterior distribution: