Bayesian Inference for Latent Chain Graphs

08/12/2019 ∙ by Deng Lu, et al. ∙ King Abdullah University of Science and Technology National University of Singapore Johns Hopkins Medicine Yale-NUS College 0

In this article we consider Bayesian inference for partially observed Andersson-Madigan-Perlman (AMP) Gaussian chain graph (CG) models. Such models are of particular interest in applications such as biological networks and financial time series. The model itself features a variety of constraints which make both prior modeling and computational inference challenging. We develop a framework for the aforementioned challenges, using a sequential Monte Carlo (SMC) method for statistical inference. Our approach is illustrated on both simulated data as well as real case studies from university graduation rates and a pharmacokinetics study.

READ FULL TEXT VIEW PDF
POST COMMENT

Comments

There are no comments yet.

Authors

page 1

page 2

page 3

page 4

This week in AI

Get the week's most popular data science and artificial intelligence research sent straight to your inbox every Saturday.

1 Introduction

Two common approaches to probabilistically describe conditional dependence structures are based on undirected graphs (Markov networks) and directed acyclic graphs (DAG) (Bayesian networks), e.g. 

[25]. The vertices of the graph represent variables, while the presence (absence) of an edge between two vertices indicates possible dependence (independence) between the two corresponding variables. Applications of Markov networks include biological networks, e.g., to study the dependence structure among genes from expression data [9, 11] and financial time series for forecasting and predictive portfolio analysis [8, 33]; while Bayesian networks occur in expert system research for providing rapid absorption and propagation of evidence [18, 24], path analysis for computing implied correlations [34], and psychometrics for causal pathways [6, 16].

Chain graphs [17, 19, 35]

provide an elegant unifying point of view on Markov and Bayesian networks. These models allow for edges that are both directed and undirected and do not contain any semi-directed cycles. Although chain graphs were first introduced in the late eighties, most research has focused on Bayesian networks and Gaussian Graphical Models. Recently they are receiving more attention as they have proved to be very useful in applications due to their ability to represent symmetric and non-symmetric relationships between the random variables of interest. However, there exist in the literature several different ways of interpreting chain graphs and what conditional independencies they encode, giving rise to different so-called

chain graph interpretations. Whilst a directed edge works as in DAGs when it comes to representing independence, the undirected edge can be understood in different ways, giving rise to the different interpretations. This implies that chain graphs can represent every independence model achievable by any DAG, whereas the opposite does not hold. The most common interpretations are the Lauritzen-Wermuth-Frydenberg (LWF) interpretation, the Andersson-Madigan-Perlman interpretation and the multivariate regression (MVR) interpretation. Each interpretation has its own way of determining conditional independences in a CG and each interpretation subsumes another in terms of representable independence models; see [30]. Moreover, [20] discuss causal interpretation of chain graphs.

This work focuses on chain graphs of the AMP type [2, 22]

and develops a statistical framework to infer the chain graph structure from data measured on a set of variables of interest with the goal of understanding their association pattern. Many inferential procedures have concentrated on the estimation of the parameters characterizing the graph,

given the chain graph topology. For example, [13]

consider likelihood-based parameter inference for chain graph models that are directly observed. Nevertheless, in the machine learning literature, methods based on optimization have been proposed to perform inference on the graph structure itself in the context of chain graphs, e.g. 

[23, 26]. We consider the case where not only are the parameters unknown, but the chain graph itself is unobserved. The main objective is to perform Bayesian inference in such a context.

There are several contributions that propose Bayesian methods in related contexts. In [28]

, the author proposes a method to perform full Bayesian inference for acyclic directed mixed graphs using Markov chain Monte Carlo (MCMC) methods. There are several similarities between the model there (based upon structural equation models (SEM) e.g . 

[7]) and the one developed here. However, the main difference is given by the structure of the latent graph and the associated constraints in our case. These latter constraints are more stringent and lead to more complexity in the computational strategy. In [32] a similar model to the one presented in [28] is developed, which uses a spike-and-slab prior to induce sparsity in the graph. The computational scheme is expected to outperform that in [28] for a large number of nodes. The prior structure in [32] could in principle be extended to accommodate chain graph structures at the cost of introducing constraints that could make computations infeasible.

The contributions of this article are as follows. Based upon the likelihood method in [13] we develop a new Bayesian model for latent AMP chain graphs. This model can also incorporate covariate information, if available. We introduce a sequential Monte Carlo (SMC) method as in [10] that improves upon MCMC-based methods in this context. Our approach is applied to real case studies from university graduation rates and a pharmacokinetics study. We find the performance of our algorithm to be stable and robust.

This article is structured as follows. In Section 2 we describe our model and prior specifications. In Section 2.3 we introduce the SMC algorithm. In Section 3 we present a simulation study and two real-life applications. In the appendix we detail further elements of our algorithm in Section 2.3.

2 Model

2.1 Likelihood

Let

be a random vector whose elements correspond to

nodes of a graph. We assume we have observations on : , .

A graph is described by a set of nodes and edges , with variables placed at the nodes. The edges define the global conditional independence structure of the distribution. An AMP chain graph is a graph whose every edge is directed or undirected such that it does not contain any semi-directed cycles, that is, it contains no path from a node to itself with at least one directed edge such that all directed edges have the same orientation. Each graph can be identified by an adjacency matrix. Let be a matrix with entries where (let , which is a notation used later on) with , for

and, given the upper-triangular part of , for

Given , a labelling of the nodes and the adjacency matrix define a graph . Let denote the set of possible chain graphs for a set of vertices. Given , let be a positive definite (real-valued) matrix, such that if then . In addition, given , for an arbitrary real-valued matrix , then, if . Finally set

where is the identity matrix. To inherit the AMP chain graph property, should be a positive definite matrix.

The absence of semi-directed cycles implies that the vertex set of a chain graph can be partitioned into so-called chain components such that edges within a chain component are undirected whereas the edges between two chain components are directed and point in the same direction. More precisely, the vertex set of a chain graph can be partitioned into subsets such that all edges within each subset are un-directed and edges between two different subsets are directed. In the following, we assume that the partition is maximal, that is, any two vertices in a subset are connected by an un-directed path. Recall that the parents of a set of nodes of is the set . Let be parents of , and sub-matrices of and , respectively. For a single observation , , where, for , denotes the

dimensional Gaussian distribution with mean

and covariance matrix . Then the joint density of given is

The likelihood is then given by

that is, the observations are i.i.d. given . We remark that this structure is similar to the structural equation model proposed by [29], where also a Gaussian distribution is assumed for the variables corresponding to the nodes of the graph. Moreover, these above assumptions on the precision matrix ensure that the corresponding graph satisfies the AMP Markox property as discussed in [2].

2.2 Prior Distribution on the Chain Graph Space

We specify the prior on the set of possible chain graphs, by specifying a prior on the elements of the corresponding adjacency matrix. For , let

where . We assume each vector to follow a Dirichlet distribution with parameter and the to be conditionally independent given . Note that, if available, covariate information could be incorporated to model , as in [31] for example. Marginally, we have

where

The prior for a graph then becomes:

Note that the prior does not integrate to 1, but has a finite mass.

We assume that given , has a Wishart prior distribution ([21]) of parameters , and given the non-zero entries of are i.i.d. univariate Gaussian random variables with mean

and variance

. Let be the cone of real-valued positive definite matrices. Then the joint prior for and is given by:

which does not integrate to 1.

2.3 Posterior Inference

The posterior distribution of all the unknown of interest is given by:

The structure of the model bears some resemblance to the models considered in [28, 32]. Besides the graph structure being different (namely, we do not allow bi-directional edges), we also impose the AMP constraint, mathematically represented by the term in , which leads to a much more complex structure of the graph space than the aforementioned references. As a result, posterior exploration of graph space is more challenging and computationally demanding and requires carefully devised algorithms, in a sense more sophisticated, than the ones considered in [28, 32].

Our approach is to design an adaptive SMC sampler as in [15] (see [10] for the original algorithm and [5] for convergence results). SMC based algorithms offer the advantage of being easily parallelisable, often reducing computation times over serial methods in some scenarios. The strategy involves simulating samples (particles) to approximate the sequence of densities

where

and . The motivation for this algorithm is well-documented (see the aforementioned references for details), and SMC algorithms have been successfully employed in may contexts to sample from high dimensional posterior distributions.

In the implementation of the algorithm, we require a Markov kernel (e.g., a MCMC kernel) , that admits as an invariant distribution; this step is detailed in the appendix. The algorithm is summarised in Algorithm 1. For notational convenience, we set . In the initialization stage, one simulates exactly from using rejection sampling. The sequence of is set as proposed in [37], and the choice of parameters for the MCMC kernel follows [15].

  • Initialize. Set , for sample from .

  • Iterate: Set . Determine . If set , otherwise determine the parameters of the MCMC kernel .

    • Resample using the weights where, for ,

    • Sample from for .

Algorithm 1 SMC Sampler.

3 Simulation and Real Data Study

In the following numerical experiments, we set and for the G-Wishart prior, and for the distribution of the non-zero element of . The number of particles for the SMC is . The MCMC steps are Metropolis-Hastings kernels.

3.1 Simulated Example

In the simulated example, we assume that the random variables corresponding to the nodes of the graph are independent, i.e., for all . This is a benchmark example to evaluate the ability of our algorithm to recover the structure as this is a very simple graph structure. We consider vertices and observations. To generate the data, we set the precision matrix equal to identity matrix , and the entries of matrix are set to be . For the Dirichlet prior, we consider

. This prior implies a higher probability of no connection and assumes the same prior probability for any type of edge between two nodes. The reason we prefer this choice of hyper-parameters to

(which corresponds to a uniform prior) is due to the computational time required by the initialization step to generate the chain graph.

Figure 1: Simulation results for the independent case: (a) ESS in each SMC step; (b) plot of

across on particles; (c) - (e) acceptance rates in each MCMC step; (f) distribution of the log(target) (i.e., log of posterior probabilities) of the samples at the end of the algorithm.

The simulation results are summarized in Figure 1. The effective sample size (ESS) drops very fast as the algorithm begins and goes into a steady lower state after several resampling procedures. The acceptance rate is acceptable as it does not fall below a very small value. Using the weighted samples obtained at the last step of SMC, we estimate the posterior probability by

and summarize the results in Table 1. From the table, we can see that all the estimated posterior probabilities are greater than 0.7, and most of them are above 0.9. This suggests that our algorithm is able to recover the structure of the graph (independence) used to generate the data.

Nodes
0.882 0.892 0.920 0.932 0.920 0.870 0.874 0.908 0.934
0.902 0.906 0.828 0.898 0.934 0.784 0.804 0.906
0.914 0.890 0.880 0.890 0.908 0.900 0.890
0.900 0.866 0.882 0.770 0.918 0.900
0.788 0.952 0.912 0.830 0.918
0.806 0.932 0.906 0.908
0.904 0.738 0.916
0.918 0.740
0.952
Table 1: Posterior probability .

3.2 University Graduation Rates

We investigate the performance of our algorithm when the latent graph has a more complex structure. We consider data first presented in [14] that stem from a study for college ranking carried out in 1993. After initial analysis, Druzdzel and Glymour focus on variables:

average spending per student,
student-teacher ratio,
faculty salary,
rejection rate,
percentage of admitted students who accept university’s offer,
average test scores of incoming students,
class standing of incoming freshmen, and
average percentage of graduation.

Based on universities, the correlation matrix of these eight variables is estimated in [14]. Conditional on this correlation matrix, [13] obtain a chain graph through the SIN model selection with significance level 0.15. To get a chain graph with different AMP and LWF Markov properties, [13] further deleted the undirected edge between and and the undirected edge between and , and introduced an undirected edge between and . The resulting graph is shown in Figure 2, and the corresponding adjacency matrix is shown in Table 2. This graph has three chain components and . In the original article [14] provide some insight about the causal relationship between some of the variables. The average spending per student (), student-teacher ratio () and faculty salary () are determined based on budget considerations and are not influenced by any of the five remaining variables. Rejection rate () and percentage of students who accept the university’s offer from among those who are offered admission () precede the average test scores () and class standing () of incoming freshmen. The latter two are measures taken over matriculating students. Finally, graduation rate () does not influence any of the other variables.

Figure 2: Chain Graph Estimate presented in [13].
strat spend salar top10 tstsc rejr pacc apgra strat 0 1 1 2 0 0 0 0 spend 1 0 1 2 2 2 0 0 salar 1 1 0 0 2 2 2 2 top10 3 3 0 0 1 0 1 0 tstsc 0 3 3 1 0 1 0 2 rejr 0 3 3 0 1 0 1 0 pacc 0 0 3 1 0 1 0 2 apgra 0 0 3 0 3 0 3 0 Table 2: The adjacency matrix corresponding to chain graph in Figure 2.

To test the performance of the proposed Bayesian chain graph model, we build an empirical baseline graph through the following procedure. Starting with a graph with only one node denoted by , each time we add a new node to the existing graph. For every node in the subgraph not containing the new node, we consider the candidate graphs that are generated by adding one of four types of correlation (no edge, undirected edge and directed edges in either direction) between the new node and the node in the subgraph. We choose the graph with smallest BIC value obtained by fitting a structural equation model (SEM). SEM is a multivariate statistical analysis technique that is commonly used to analyse structural relationships. In general, the formulation of the SEM is given by the basic equation , where and are vectors of random variables. The parameter matrix contains regression coefficients and the matrix gives covariances among the elements of . The chain graph model described in this work can be written as , where is a multivariate Gaussian random vector with covariance matrix . This is consistent with the structural equation model formulation. Figure 3 shows the derived graph and Table 3 shows the corresponding adjacency matrix.

Figure 3: Empirical Graph.
strat spend salar top10 tstsc rejr pacc apgra strat 0 1 2 2 2 2 2 2 spend 1 0 1 2 2 2 2 2 salar 3 1 0 1 2 2 3 2 top10 3 3 1 0 2 1 0 2 tstsc 3 3 3 3 0 1 0 2 rejr 3 3 3 1 1 0 3 0 pacc 3 3 2 0 0 2 0 2 apgra 3 3 3 3 3 0 3 0 Table 3: The adjacency matrix corresponding to the chain graph in Figure 3.

We compare posterior inference from our model with the chain graphs in Figure 2 and Figure 3. For the SMC sampler, we set the number of samples . For the Dirichlet prior, we first consider a prior based on the analysis of [13], i.e., choosing by matching the probabilities of each type of edges in Figure 2. More precisely, the number of no-edges, un-directed edges and directed edges from to are 11, 7 and 10 according to the graphs in Figure 2. So the corresponding proportions of these three types of edge are the first three components of . The last component of is chosen to be 0.05 to ensure the occurrence of directed edge from to and the probability of this type of edge is not large compared to the other three types. We also perform posterior inference with , which is a uniform prior, and with , which favours more connections. The remaining parameters are specified as in the previous section.

Based on the weighted samples , we first estimate the posterior probability of occurrence of each edge by

Then the entries of the estimated adjacency matrix are given by

Figure 4 shows the estimated chain graphs obtained under each prior. The structure of the chain graph obtained from setting is obviously more similar to Figure 2 than that obtained using . This is not surprising since the prior with is very informative and forces the posterior distribution of the edges to be similar to Figure 2. The estimated chain graph obtained setting the hyper-parameters equal to favours more connections compared with other two priors. If we compare our results with the conclusions in [14], the estimated chain graph obtained under the informative prior indeed shows that , and are parents of other variables, and is a child of the others. However, it does not show that and precede and (the graph just shows the opposite relation). Similarly, the graph obtained under the prior with shows that is a parent of , and , which is a contrast to the available prior information. This can be improved by modifying the ordering of nodes and choosing a small value of .

(a)
(b)
(c)
Figure 4: (a): posterior estimated chain graph using a Dirichlet prior with ; (b) posterior estimated chain graph using a Dirichlet prior with ; (c) posterior estimated chain graph using a Dirichlet prior with .

Finally, we compare our results with those obtained using SEM. We use package sem in R to fit a SEM to these data. Note that when fitting the SEM model the graph topology is fixed. Table 4 presents a brief summary of three different chain graphs using the sem function. From the table, we can see that the chain graph selected by the Bayesian chain graph model with prior has smaller AIC and BIC values compared with the graph selected by the SIN model selection procedure and the empirical graph, which suggests that our algorithm can take advantage of available prior information and perform better.

Base chain graph Chain graph selected by SIN
Edge p-value Edge p-value Edge p-value
strat — spend 1.630e-14 pacc salar 1.137e-06 strat — spend 1.630e-14
strat salar 1.382e-06 pacc rejr 1.470e-03 strat — salar 1.082e-05
spend — salar 2.629e-11 strat apgra 8.237e-02 strat top10 1.935e-09
strat top10 4.743e-07 spend apgra 6.067e-02 spend — salar 7.156e-13
spend top10 2.822e-28 salar apgra 4.794e-03 spend top10 5.979e-34
top10 — salar 8.931e-03 top10 apgra 4.253e-01 spend tstsc 3.995e-12
strat tstsc 3.140e-03 tstsc apgra 1.096e-10 spend rejr 2.909e-03
spend tstsc 6.634e-01 pacc apgra 2.711e-03 salar tstsc 2.350e-05
salar tstsc 2.008e-04 salar rejr 1.323e-03
top10 tstsc 6.831e-19 salar pacc 1.827e-14
strat rejr 1.954e-01 salar apgra 1.570e-02
spend rejr 3.621e-03 top10 — tstsc 1.256e-09
salar rejr 2.575e-04 top10 — pacc 5.020e-01
top10 — rejr 1.816e-04 tstsc — rejr 8.297e-03
tstsc — rejr 1.003e-02 tstsc apgra 8.352e-19
strat pacc 2.585e-02 rejr — pacc 5.617e-03
spend pacc 4.109e-07 pacc apgra 5.481e-03
AIC BIC AIC BIC
67.887 -13.319 80.838 -24.919
Chain graph selected by algorithm Chain graph selected by algorithm Chain graph selected by algorithm
() () ()
Edge p-value Edge p-value Edge p-value
strat — spend 1.630e-14 spend strat 2.150e-52 spend strat 1.536e-42
strat salar 1.727e-06 strat salar 9.127e-07 salar strat 4.952e-06
strat top10 3.597e-09 spend salar 2.484e-24 strat top10 1.636e-07
spend salar 4.956e-33 top10 spend 2.068e-25 tstsc strat 8.237e-01
spend top10 1.086e-33 salar pacc 7.304e-10 salar spend 9.659e-11
spend tstsc 4.059e-12 apgra salar 3.751e-11 spend top10 3.881e-10
salar tstsc 5.524e-05 top10 — tstsc 2.163e-14 tstsc spend 2.886e-08
salar pacc 1.012e-18 top10 rejr 2.504e-03 tstsc salar 3.654e-27
salar apgra 3.201e-05 tstsc rejr 2.847e-04 salar rejr 8.844e-02
top10 — tstsc 2.295e-10 tstsc apgra 1.428e-43 salar pacc 1.062e-08
top10 rejr 1.951e-03 rejr pacc 1.480e-04 salar apgra 1.036e-04
tstsc rejr 2.348e-04 apgra pacc 3.587e-03 tstsc top10 1.033e-20
tstsc apgra 1.367e-18 top10 rejr 9.160e-03
rejr pacc 1.032e-03 tstsc rejr 5.443e-03
pacc — apgra 5.315e-03 tstsc apgra 2.644e-17
rejr pacc 3.048e-04
apgra pacc 5.759e-03
AIC BIC AIC BIC AIC BIC
61.372 -50.522 102.93 -18.169 58.401 -47.356
Table 4: Summaries of different chain graphs using package SEM.

3.3 Tenofovir study

In this section, we illustrate our Bayesian model for AMP chain graphs on a real data application from the RMP-02/MTN-006 study ([3]). Tenofovir (TFV) is a medication used to treat chronic hepatitis B and to prevent and treat HIV. TFV gel demonstrated protective efficacy in women using the gel within 12 hours before and after sexual activity in the Centre for the AIDs Programme of Research in South Africa 004 study [1]. Daily dosing of tenofovir disoproxil fumarate (TDF)/emtricitabine provides 62 to protection against HIV transmission in serodiscordant men and women enrolled in the Partners PrEP study [4]. The RMP-02 study was designed to evaluate the systemic safety and biologic effects of oral TFV combined with a gel formulation of the drug for application rectally and vaginally. The study enrolled 18 patients, all of whom received a single oral dose of TFV, and randomized each patient to receive either the gel formulation or a placebo several weeks later. Details about the phase 1 study are given in [3]. [27] present analyses of the ancillary studies into the treatment’s biologic effects.

The biologic effects we examine here concern the pharmacokinetics (PK) of TFV and its active metabolite tenofovir diphosphate (TFVdp), as well as the pharmacodynamics of the drug. The PK studies evaluate subjects’ TFV and TFVdp concentrations in multiple physiologic compartments (i.e., tissues and cells) across multiple time points during the study. Table 5 lists the compartments.

Compound Compartment Notation
TFV Blood plasma TFV
TFV Rectal biopsy tissue TFV
TFV Rectal fluid TFV
TFVdp Rectal biopsy tissue TFVdp
TFVdp Total mononuclear cells in rectal tissue Total
TFVdp CD4 lymphocytes from MMC CD4
TFVdp CD4 lymphocytes from MMC CD4
Table 5: Tissues and cell types examined in the PK studies

[27] demonstrate that tissue HIV infectibility (cumulative p24) is correlated with in vivo concentrations of both TFV and TFVdp. Statistically significant, non-linear dose-response relationships with reduced tissue infectibility are found for one TFV compartment and four TFVdp compartments; the dose-response relationships are highly significant for TFVdp in whole rectal tissue, CD4, CD4 and Total compartments. Furthermore, [36] conduct a comprehensive pharmacokinetic study of rectally administered TFV gel that describes the distribution of TFV and TFVdp into various tissue compartments relevant to HIV infection. They argue that TFV rectal fluid concentrations may be reasonable bio-indicators of plasma and rectal tissue concentrations, making it easier to estimate adherence and TFV concentrations in the target tissue. Therefore, the correlations between the TFV and TFVdp in compartments can be helpful in studying HIV suppression procedure and providing a measure of drug efficacy, enabling more advanced population pharmacokinetic modelling methods. In the following analysis, we investigate the correlation structure of p = 7 concentration levels in Table 5 collected at visit 12. From clinical knowledge, we would expect the following associations:

  • TFV is associated with TFV (blood levels and tissue levels),

  • TFV is associated with TFV (rectal tissue and rectal fluid),

  • TFV is associated with Total (rectal tissue and mononuclear cells in rectal tissue),

  • Total is associated with CD4 and CD4 (total and constituents).

We normalise the observations of each variable to have zero mean and standard deviation of one. The number of observation is

, and the number of particles is set to . We fit the model using as hyper-parameters in the Dirichlet prior both , which corresponds to the uniform prior, and , which favours the presence of connections. This latter prior choice is more suitable for small sample sizes. The remaining parameters are set as in the previous section.

Based on weighted samples , we calculate the posterior probabilities . As posterior estimate of the resulting chain graph, we report the one obtained from the adjacency matrix where , and is shown in Figure 5 (for both prior settings). Obviously, the chain graph obtained under the prior with has more connections. Moreover, this graph also shows that the TFV is related to TFV and TFV, the TFV is associated with Total through TFV, and CD4 causes Total and CD4. This is consistent with the clinical knowledge mentioned before. However, some of these associations are missing in the chain graph obtained under the uniform prior, e.g. the edge between TFV and TFV (they are independent given Total and CD4 under the AMP chain graph property).

(a)
(b)
Figure 5: Chain graph with highest posterior probability.

We compare our results to those obtained by fitting a SEM model using the sem function in the R package sem. We summarise the results in Table 7. The AIC and BIC values of the model corresponding to the chain graph obtained under the uniform prior are missing, which may due to the singular Hessian matrix when estimating the covariance matrix of the parameter. From the table, it is evident that the p-values of some edges are large. For example, the edge CD4 TFV in the chain graph obtained under uniform prior has a p-value 0.704. These results suggest that these relationships are not significant in a SEM model.

When inadequate fit of a structural equation model is observed, model modification is often conducted followed by retesting of the modified model. The most popular statistic is the modification index, which is a chi-square score test statistic with degree of freedom one. The modification index provides an estimated value in which the model’s chi-square test statistic would decrease if the corresponding parameter is added to the model and respecified as a free parameter. We perform model modification using the

modIndices function in sem package, which calculates modification indices and estimates parameter changes for the fixed and constrained parameters in a structural equation model. Table 5 shows the five largest modification indices for both the matrix and matrix for the model corresponding to the chain graph obtained under the Dirichlet prior with . The modification indices suggest that a better fit to the data would be achieved by adding association between TFV and TFV to the model. The small sample size makes it obviously challenging to estimate associations. Furthermore, the level in rectal tissue may depend on whether or not the patient received the TFV gel or the placebo, which may make the correlation of TFV with other variables unclear.

Chain graph selected by algorithm Chain graph selected by algorithm
() ()
Edge p-value Edge p-value
CD4 Total 9.881e-19 CD4 TFVdp 5.687e-01
CD4 CD4 4.091e-81 CD4 CD4 2.941e-46
CD4 TFV 1.496e-01 CD4 TFV 1.210e-02
Total — TFVdp 1.589e-02 CD4 Total 7.028e-10
TFVdp — CD4 1.812e-02 CD4 TFV 1.874e-03
CD4 — TFV 8.583e-03 TFVdp — CD4 8.477e-02
Total TFV 1.815e-03 TFVdp TFV 2.991e-01
CD4 TFV 7.043e-01 TFVdp TFV 2.584e-01
TFVdp Total 1.162e-13
CD4 Total 2.352e-22
TFV TFV 4.259e-01
TFV — Total 5.719e-02
TFV TFV 7.550e-01
Total TFV 2.337e-02
AIC BIC AIC BIC
Inf Inf 46.875 -11.909
Table 6: Summaries of two different chain graphs using package SEM.
5 largest modification indices, matrix 5 largest modification indices, matrix
(regression coefficients) (variances/covariances)
TFV Total 3.281 TFV — Total 3.394
TFV TFV 2.219 TFV — TFV 2.881
CD4 TFV 0.709 TFV — CD4 0.709
TFV TFVdp 0.654 TFV — TFVdp 0.709
TFV CD4 0.555 TFV — TFVdp 0.389
Table 7: Summaries of modification indices for the model corresponding to the chain graph obtained under prior with .

4 Conclusions

In this article we propose a novel Bayesian model for latent AMP chain graphs, for which observations are available only on the nodes of the graph. Posterior inference is performed through a specially devised SMC algorithm. We investigate the ability of the model to recover a range of structures, also when prior knowledge is available. The performance of the SMC sampler is stable and consistent in our numerical study. However, the sampler is not suitable when the number of nodes is large, as the initializing step is difficult. Moreover, the proposed algorithm does not scale well with respect to . The computational cost of computing the probabilities in the adjacency matrix is , and the computation of the normalizing constant in the -Wishart distribution is quite expensive when is large (approximately ).

Several extensions of this work are possible. First, the algorithm can be extended to large by choosing a more efficient initial proposal and by actually exploiting parallel computing techniques. Second, the model can be extended to accommodate multiple groups of observations, allowing borrowing information across groups or time periods. Third, we could avoid sampling and by using a Laplace approximation when calculating the posterior probability.

Appendix A MCMC Kernel

The updates are taken in the order together, then and . We use Metropolis-Hastings steps. The updates for and are preformed element-wise. For each element of and a Gaussian random walk with variance is used as proposal. The acceptance probability is calculated as usual and note that if any of the constraints on or are violated, then the acceptance probability is zero.

The update for together is much more complicated. We denote with the current values of and the proposed states with . The move proceeds by picking an uniformly at random and then proposing one of the other possible 3 values with uniform probability. Note that at this stage, we will check if the proposed graph is a chain graph and if not, the move is rejected instantly. Based upon this proposal, we propose new values of and as detailed below. Suppose the selected edge is , consider the following scheme:

  1. If the transformation of the selected edge is , we take except for elements and . For these elements, we set and . Finally, we set .

  2. If the transformation of the selected edge is , we take except for element . For this element, we set . Finally, we set .

  3. If the transformation of the selected edge is , we take except for element . For this element, we set . Finally, we set .

  4. If the transformation of the selected edge is , we take except for elements and . For these elements, we set . Finally, we set .

  5. If the transformation of the selected edge is , we take except for elements and . For these elements, we set . Finally, we take except for element . For this element, we set .

  6. If the transformation of the selected edge is , we take except for elements and . For these elements, we set . Finally, we take except for element . For this element, we set .

  7. If the transformation of the selected edge is , we take except for element . For this element, we set . Finally, we set .

  8. If the transformation of the selected edge is , we take except for element . For this element, we set . Finally, we take except for elements and . For these elements, we set and .

  9. If the transformation of the selected edge is , we take except for elements and . For these elements, we set , and . Finally, we set .

  10. If the transformation of the selected edge is , we take except for element . For this element, we set . Finally, we set .

  11. If the transformation of the selected edge is , we take except for element . For this element, we set . Finally, we take except for elements and . For these elements, we set and .

  12. If the transformation of the selected edge is , we take except for elements and . For these elements, we set , and . Finally, we set .

The acceptance probabiity for this move is easily calculated.

References

  • [1] Abdool, K. Q., Abdool, K. S. S., Frohlich, J. A., et al. (2010). Effectiveness and safety of tenofovir gel, an antiretroviral microbicide, for the prevention of HIV infection in women. Science, 329, 1168-1174.
  • [2] Andersson, S. A., Madigan, D. & Perlman, M. D. (2001). Alternative Markov properties for chain graphs. Scand. J. Statist. 28, 33-85.
  • [3] Anton, P. A., Cranston, R. D., Kashuba, A., Hendrix, C. W., Bumpus, N. N., Harman, N. R., Elliott, J., Janocko, L., Khanukhova, E., Dennis, R., Cumberland, W. G., Ju, C., Dieguez, A. C., Mauck, C., & McGowan, I. (2012). RMP-02/MTN-006: A phase rectal safety, acceptability, pharmacokinetic, and pharmacodynamic study of tenofovir gel compared with oral tenofovir disoproxil fumarate. AIDS Res Hum Retroviruses 28(11), 1412-1421.
  • [4] Baeten, J. M., Donnell, D., Ndase, P., et al. (2012). Antiretroviral prophylaxis for HIV prevention in heterosexual men and women. N Engl J Med, 367, 399-410.
  • [5] Beskos, A., Jasra, A., Kantas, N. & Thiery, A. (2016). On the convergence of adaptive sequential Monte Carlo. Ann. Appl. Probab., 26, 1111-1146.
  • [6] Boerebach, B. C., Lombarts, K. M., keijzer, C., Heineman, M. J., & Arah. O. A. (2012). The teacher, the physician and the person: how faculty’s teaching performance influences their role modeling. PLoS One 7:e32089. doi:10.1371/journal.pone.0032089.
  • [7] Bollen, K. (1989). Structural Equation Models with Latent Variables. Wiley: New York.
  • [8] Carvalho, C. M. & West, M. (2007). Dynamic Matrix-Variate Graphical Models. Bayesian Anal., 2, 69-98.
  • [9] Chun, H., Zhang, X. & Zhao, H. (2015). Gene regulation network inference with joint sparse Gaussian graphical models. J. Comp. Graph. Statist., 24, 954-974.
  • [10] Del Moral, P., Doucet, A. & Jasra, A. (2006). Sequential Monte Carlo samplers. J. Roy. Statist. Soc. Ser. B, 68, 411-436.
  • [11] Dobra, A., Hans, C., Jones, B., Nevins, J. R., Yao, G. & West, M. (2004). Sparse graphical models for exploring gene expression data. J. Mult. Anal., 90, 196-212.
  • [12] Drton, M. & Perlman, M. D. (2004). A SINful approach to Gaussian graphical model selection. Technical Report 457, University of Washington.
  • [13] Drton, M. & Eichler, M. (2006). Maximum Likelihood Estimation in Gaussian Chain Graph Models under the Alternative Markov Property, Scand. J. Statist., 33, 247–257.
  • [14] Druzdel, M. J & Glymour, C. (1999). Causal inferences from databases: why universities lose students. In Computation, causation, and discovery (eds C. Glymour & G. F. Cooper), Chapter 19, 521-539. AAAI Press, Menlo Park, CA.
  • [15] Jasra, A., Stephens, D. A., Doucet, A. & Tsagaris, T. (2011). Inference for Lévy driven stochastic volatility models via adaptive sequential Monte Carlo. Scand. J. Statist., 38, 1–22 .
  • [16] Kanayama, G., Pope, H. G., & Hudson, J. I. (2018). Associations of anabolic-androgenic steroid use with other behavioral disorders: an analysis using directed acyclic graphs. Psychol Med, 48, 2601-2608.
  • [17] Lauritzen, S.L. & Wermuth, N. (1984). Mixed interaction models. Institut for Elektroniske Systemer, Aalborg Universitetscenter.
  • [18] Lauritzen, S.L. & Spiegelhalter, D. J. (1988). Local computations with probabilities on graphical structures and their applications to expert systems (with discussion). J. R. Statist. Soc. B, 50, 157-224.
  • [19] Lauritzen, S. L. & Wermuth, N. (1989). Graphical models for association between variables, some of which are qualitative and some quantitative. Ann. Statist. 17, 31-57.
  • [20] Lauritzen, S.L. & Richardson, T.S.  (2002). Chain graph models and their causal interpretations. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 64(3), 321–348.
  • [21] Lenkoski, A. & Dobra, A. (2011). Computational aspects related to inference in Gaussian graphical models with the G-Wishart prior. Journal of Computational and Graphical Statistics, 20(1), 140–157.
  • [22] Levitz, M., Perlman, M.D. & Madigan

    , D. (2001). Separation and completeness properties for AMP chain graph Markov models.

    Annals of statistics, 1751–1784.
  • [23] McCarter, C. & Kim, S. (2014). On sparse Gaussian chain graph models. Advances in Neural Information Processing Systems (NIPS).
  • [24] Pearl, J. (1986). A constraint propagation approach to probabilistic reasoning. Kanal, L. M. & Lemmer, J., Eds.

    Uncertainty in Artificial Intelligence

    , North-Holland, Amsterdam, 357-370.
  • [25] Pearl, J. (2014). Probabilistic reasoning in intelligent systems: networks of plausible inference. Elsevier.
  • [26] Pena, J.M. (2014). Learning marginal AMP chain graphs under faithfulness. In European Workshop on Probabilistic Graphical Models (82–395). Springer, Cham.
  • [27] Richardson-Harman, N., Hendrix, C. W., Bumpus, N. N., Mauck, C., Cranston, R. D., Yang, K., Elliott, J., Tanner, K., & McGowan, I. (2014). Correlation between compartmental tenofovir concentrations and an ex vivo rectal biopsy model of tissue infectibility in the RMP-02/MTN-006 phase 1 study. PLoS One. 9(10), e111507.
  • [28] Silva, R. (2013). A MCMC Approach for Learning the Structure of Gaussian Acyclic Directed Mixed Graphs. In Giudici,P., Ingrassia, S. & Vichi, M. (Eds) Statistical Models for Data Analysis, pp. 343–351, Springer: New York.
  • [29] Silva, R. & Ghahramani, Z. (2009). The Hidden Life of Latent Variables: Bayesian learning with mixed graph models. J. Mach. Learn. Res., 10, 1187–1238.
  • [30] Sonntag, D. & Pena, J.M.  (2016). On expressiveness of the chain graph interpretations. International Journal of Approximate Reasoning, 68, 91–107.
  • [31] Tan, L. Jasra, A., De Iorio, M & Ebbels, T. (2017). Bayesian Inference for multiple Gaussian graphical models. Ann. Appl. Stat., 11, 2222-2251.
  • [32] Wang, H. (2015). Scaling It Up: Stochastic search structure learning in graphical models. Bayes. Anal., 10, 351-377.
  • [33] Wang, H., Reesony, C. & Carvalho, C. M. (2011). Dynamic financial index models: modeling conditional dependencies via graphs. Bayesian Anal., 6, 639-664.
  • [34] Wermuth, N. (1980). Linear recursive equations, covariance selection and path analysis. J. Am. Statist. Assoc, 75, 963-972.
  • [35] Wermuth, N. & Lauritzen, S. L. (1990). On substantive research hypotheses, conditional independence graphs and graphical chain models (with discussion). J. Roy. Statist. Soc. Ser. B 52, 21-72.
  • [36] Yang, K. H., Hendrix, H., Bumpus, N., Elliott, J., et. al. (2014). A multi-compartment single and multiple dose pharmacokinetic comparison of rectally applied tenofovir gel and oral tenofovir disoproxil fumarate. PLOS One 9, e106196.
  • [37] Zhou, Y., Johansen, A. M. & Aston, J. A. (2016). Towards Automatic Model Comparison: An Adaptive Sequence Monte Carlo Approach. J. Comp. Graph. Statist., 25, 701-726.