
Analysis of an interventional protein experiment using a vine copula based structural equation model

While there is considerable effort to identify signaling pathways from data using linear Gaussian Bayesian networks, there is less emphasis on understanding and quantifying the conditional densities and probabilities of nodes given their parents in the identified Bayesian network. Most graphical models for continuous data assume a multivariate Gaussian distribution, which might be too restrictive. We re-analyse data from an experimental setting considered in Sachs et al. (2005) to illustrate the effects of such restrictions. For this we propose a novel non Gaussian nonlinear structural equation model based on vine copulas. In particular, the D-vine regression approach of Kraus and Czado (2017) is adapted. We show that this model class is better suited to fit the data than the standard linear structural equation model based on the biological consent graph given in Sachs et al. (2005). The modelling approach also allows one to study which pathway edges are supported by the data and which can be removed. For the experiment cd3cd28+aktinhib this approach identified three edges which are no longer supported by the data. For each of these edges a plausible explanation based on the underlying experimental conditions could be found.



1 Introduction

One important task in biology is to identify signalling pathways triggered by a stimulus defined by an interventional experimental setting. The resulting cascade of chemical reactions on proteins in a cell has to be understood and quantified. A well known and publicly available data set for this kind of setup is the data collected and initially analyzed by Sachs2005. It involves measurements on proteins of cells under different experimental conditions. Often the underlying pathways are learned from these cell measurements using graphical models on directed acyclic graphs (DAGs), which are also called Bayesian networks (Pearl1988, Lauritzen1996, KollerFriedman2009). Here the associations between the random variables $X_1,\ldots,X_d$ are modelled by a statistical model on a graph $\mathcal{G} = (V, E)$ with node set $V$ and edge set $E$. Using Markov assumptions and allowing only for acyclic graphs, the joint density of $(X_1,\ldots,X_d)$ can be expressed as the product of conditional densities

$$f(x_1,\ldots,x_d) = \prod_{j=1}^{d} f(x_j \mid \boldsymbol{x}_{pa(j)}), \qquad (1)$$

where $pa(j)$ is the parent set of node $j$, i.e. the set of nodes which have an edge going to $j$. More precisely, $pa(j) := \{k \in V : (k \to j) \in E\}$. Note that $pa(j)$ can be the empty set, in which case $f(x_j \mid \boldsymbol{x}_{pa(j)}) = f(x_j)$. The set of observed values of the parent nodes of $j$ is denoted by $\boldsymbol{x}_{pa(j)}$.

Sachs2005 used a single DAG to model the pooled data collected under all experimental conditions. Since the Sachs data set contains only continuous measurements, Gaussian DAGs or undirected Gaussian graphical models (dempster1972covariance) are often utilized. Gaussian DAG models are often constructed using a structural equation model (SEM) (mulaik2009linear, kaplan2008structural).

Many approaches have been followed to learn the underlying pathway network from this data. To name a few, friedman2008sparse used the graphical lasso to construct an undirected graph for the pooled data, while ellis2008learning discretized the data and used a graphical model for discrete data. peterson2015bayesian constructed a selection prior for different experimental conditions to conduct a joint Bayesian MCMC analysis. Earlier, luo2011bayesian developed a non graphical Bayesian hierarchical approach, where the regression coefficients for each experimental condition are linked by a prior. scutari2013identifying used bootstrapping methods to identify significant edges for the Sachs data. A penalized maximum likelihood estimator of the interventional Markov equivalence classes under linear Gaussian assumptions was derived in hauser2015jointly. castelletti2018learning and castelletti2019objective use an objective Bayes approach to learn the Markov equivalence classes in a Gaussian DAG for the Sachs data. ramsey2018fask introduced a fast adjacency skewness (FASK) algorithm, which orients edges derived from a fitted undirected network by incorporating skewness properties of the variables. Genetic algorithms have also been applied to this data set.

Many of the pathway learning approaches assume that there is an underlying joint Gaussian model, which renders the conditional densities in (1) Gaussian. However, this might not be true for the data considered. hauser2015jointly already mentioned that the Gaussian assumption for the Sachs data is questionable. This was also noticed by voorman2014graph and zhang2017mixture. voorman2014graph developed a nonparametric approach to include non Gaussian behavior in a graphical model, while zhang2017mixture built a Bayesian network using a mixture copula to model the conditional distribution of a node given the observed values of the parents, accommodating both the non Gaussianity of the data and the pooling over the different experimental conditions. The latter followed the approach of elidan2010copula, who was the first to use a copula based representation of the conditional density in (1).

One special feature of the Sachs data is that there exists a biological consent graph, given in Figure 3 of Sachs2005. It consists of 20 edges connecting the eleven nodes of the Sachs data. This consent graph is shown in Panel (a) of Figure 3.

Figure 3: Panel (a): Consent DAG of the Sachs data (all edges). Panel (b): Modelled edges in all non Gaussian D-vine based SEMs in blue and red; edges missing in the Gaussian SEM are colored red.

In this paper we want to extend the flexibility by modeling the conditional densities in Equation (1) using a copula based approach. For this we suggest the use of vine copulas (joe2014dependence, czadobook), which have become very popular among practitioners. One major reason for this is the enormous flexibility of this class compared to standard multivariate copula families such as the class of elliptical or Archimedean copulas. In particular, the vine copula class is constructed using only bivariate copulas, which can be chosen separately and in a data driven way. For the conditional densities needed in the DAG model we use the ones derived from a fitted vine distribution of the node $j$ and its parents $pa(j)$. The vine distribution is chosen in such a way that the conditional density of a node given its parents is available in closed form. The choice of the vine model is further optimized such that the conditional log-likelihood is maximized when parent nodes are included in a forward selection. For this we use the approach of kraus2017d derived for vine copula based quantile regression models. The forward selection of the parent nodes will also be used to identify which edges in the consent graph are supported by the data, allowing for a potential reduction of the edges in the graph.

Overall this approach allows us to fit very general vine copula based Bayesian networks, thus extending the copula Bayesian network of elidan2010copula. We call the resulting models vine copula based SEMs and show the need for this novel model class for one experimental setting of the Sachs data using the consent graph. We restrict ourselves to a single experimental setting to avoid accommodating clusters present in the pooled data set arising from different experimental conditions.

This new approach allows us to quantify the effects of ignoring the non Gaussianity in the data with regard to the changes in the estimated conditional densities given the parent nodes as well as in simulated conditional quantiles based on the fitted models. In summary, this extension provides an important building block for learning non Gaussian networks. It is easy to fit to a given graph, can remove nonsignificant edges in the graph and allows one to quantify differences in conditional densities as well as probabilities. For one experimental setting of the Sachs data we show that this approach leads to a better understanding and quantification of the graphical structure.

The paper is organized as follows: In Section 2 we introduce statistical models on DAGs including Gaussian DAGs more formally. The needed background on copulas and especially on vine copulas is given in Section 3. This includes the subclass of D-vine copulas. General vine copula based regression models are discussed in Section 4. The included D-vine regression forms the building block of the novel D-vine copula based SEMs introduced in Section 5. In Section 6 we present D-vine copula based structural equation analysis for the experimental setting of the Sachs data. In Section 7 we provide simulation based evidence for the better fit of the D-vine copula based SEM compared to the standard linear Gaussian Bayesian network fit. In Section 8 we quantify differences between the fitted models based on conditional probabilities and densities. The paper closes with a summary and an outlook to future research directions.

2 Statistical models on DAGs

To describe the behavior of many variables it is important to identify how the variables interact. For many applications it is useful to describe these interactions by a network with directed edges. Following Pearl1988, a Bayesian network is a directed acyclic graph (DAG), where each node $j$ represents a random variable $X_j$. The edges connecting the nodes indicate direct causal influence between the linked variables, and their strengths are quantified by conditional probabilities, or densities in the case of continuous random variables.

Since we are dealing with continuous data in this paper we start with a linear SEM. For this we assume that we have a graph $\mathcal{G} = (V, E)$ with node set $V = \{1,\ldots,d\}$ and edge set $E$ with directed weighted edges. Let $A$ be an associated $d \times d$ dimensional adjacency matrix with $(k,j)$th element given by weight $a_{kj}$, which will be estimated. Further, $a_{kj} \neq 0$ if and only if $(k,j) \in E$, i.e. there is an edge $k \to j$. Let $\boldsymbol{\varepsilon} = (\varepsilon_1,\ldots,\varepsilon_d)^\top$ be a $d$ dimensional multivariate normal random vector with zero mean vector and diagonal covariance matrix $\Sigma_{\varepsilon} = \operatorname{diag}(\sigma_1^2,\ldots,\sigma_d^2)$. Then a linear SEM for $\boldsymbol{X} = (X_1,\ldots,X_d)^\top$ is defined as

$$\boldsymbol{X} = A^\top \boldsymbol{X} + \boldsymbol{\varepsilon}. \qquad (2)$$

It follows that $\boldsymbol{X}$ is multivariate normal with zero mean vector and covariance matrix $\Sigma_X$ satisfying $\Sigma_X = (I - A^\top)^{-1} \Sigma_{\varepsilon} (I - A^\top)^{-\top}$. Using conditioning arguments the factorization (1) follows, where the conditional densities are univariate normal densities. In general such a factorization is equivalent to the Markov assumption with respect to the graph (see Theorem 3.27 of Lauritzen1996).
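As a numerical check of the covariance identity, the following small simulation sketch may help; the three-node graph, its weights, and the noise scales are invented for illustration, and the SEM is assumed in the common form $\boldsymbol{X} = A^\top \boldsymbol{X} + \boldsymbol{\varepsilon}$ with diagonal noise covariance:

```python
import numpy as np

rng = np.random.default_rng(0)
d = 3
# hypothetical weighted adjacency matrix, upper triangular in a topological order:
# edges 1 -> 2 (weight 0.8) and 2 -> 3 (weight -0.5)
A = np.array([[0.0, 0.8, 0.0],
              [0.0, 0.0, -0.5],
              [0.0, 0.0, 0.0]])
sigma_eps = np.array([1.0, 0.5, 0.7])            # residual standard deviations

# X = A^T X + eps  <=>  X = (I - A^T)^{-1} eps
M = np.linalg.inv(np.eye(d) - A.T)
Sigma_X = M @ np.diag(sigma_eps**2) @ M.T        # implied covariance of X

# Monte Carlo confirmation of the implied covariance
n = 200_000
eps = rng.normal(scale=sigma_eps, size=(n, d))
X = eps @ M.T
print(np.allclose(np.cov(X, rowvar=False), Sigma_X, atol=0.02))
```

The sample covariance of the simulated draws agrees with the implied covariance up to Monte Carlo error.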

Since the graph is acyclic, there is at least one permutation of the numbers $1$ to $d$ such that every edge points from an earlier to a later node in the permutation, i.e. parents precede their children. Such a permutation is called a topological order. One such topological order, resulting from the consent graph in Figure 3 applied to the eleven protein variables on the logarithmic scale of the Sachs data, is given in Table 1.

pip3 plc pip2 pkc pka p38 jnk raf mek erk akt
Table 1: Topological order of the consent graph of Figure 3
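A topological order can be computed from the adjacency matrix with Kahn's algorithm; the three-node mini-graph below is a hypothetical stand-in for the eleven-node consent graph:

```python
import numpy as np

def topological_order(A):
    """Kahn's algorithm: A[k, j] != 0 encodes an edge k -> j."""
    B = np.asarray(A) != 0
    d = B.shape[0]
    indeg = B.sum(axis=0).astype(int)
    order = []
    ready = [j for j in range(d) if indeg[j] == 0]
    while ready:
        k = ready.pop()
        order.append(k)
        for j in np.flatnonzero(B[k]):
            indeg[j] -= 1
            if indeg[j] == 0:
                ready.append(j)
    if len(order) < d:
        raise ValueError("graph contains a cycle")
    return order

# toy graph: pip3 -> plc, pip3 -> pip2, plc -> pip2 (indices 0, 1, 2)
A = np.array([[0, 1, 1],
              [0, 0, 1],
              [0, 0, 0]])
print(topological_order(A))
```

Any valid order returned by the algorithm places each parent before its children, which is all that the recursive evaluation of (2) requires.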

Reordering the columns and rows of $A$ according to the topological order allows one to express $A$ as a strict upper triangular matrix, and thus (2) can be evaluated recursively along the topological order. Utilizing (1) and the topological order, we can therefore express the joint density of the Sachs data with the eleven protein variables based on the consent graph as

$$f(x_1,\ldots,x_{11}) = \prod_{j=1}^{11} f(x_j \mid \boldsymbol{x}_{pa(j)}), \qquad (3)$$

where each parent set $pa(j)$ is read off the consent graph.
Hence instead of estimating one eleven dimensional density we can estimate eleven univariate conditional densities, each of which has at most three conditioning variables. We use the special case of a linear Gaussian Bayesian network (LGBN) as a reference model, where the weights of the conditional densities derived from the SEM (2) are estimated. In particular it satisfies

$$X_j \mid \boldsymbol{X}_{pa(j)} = \boldsymbol{x}_{pa(j)} \sim N\Big(\sum_{k \in pa(j)} \beta_{kj}\, x_k,\; \sigma_j^2\Big) \qquad (4)$$

for regression parameters $\beta_{kj}$ and residual variances $\sigma_j^2$, $j = 1,\ldots,11$. The corresponding maximized log-likelihoods as well as AIC and BIC statistics are given in Table 2 for the experimental setting of the Sachs data. We see that the protein akt contributes more to the overall fit compared with the other proteins, while pip2 does the opposite. This is expected in this experimental setting, since the reagents used imply activation effects for some proteins and inhibition of akt.
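The node-wise LGBN fit of (4) amounts to an ordinary least squares regression of each node on its parents; here is a sketch on synthetic data (the three-node graph and all coefficients are invented, not the consent graph, and an intercept is included although the SEM above is zero mean):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 845
# synthetic stand-in for the protein data, with a hypothetical parent structure
x = {"pip3": rng.normal(size=n)}
x["plc"] = 0.6 * x["pip3"] + rng.normal(scale=0.8, size=n)
x["pip2"] = 0.5 * x["pip3"] - 0.4 * x["plc"] + rng.normal(scale=0.7, size=n)
parents = {"pip3": [], "plc": ["pip3"], "pip2": ["pip3", "plc"]}

def fit_node(node):
    """OLS fit of a node on its parents; returns coefficients, log-lik, AIC, BIC."""
    y = x[node]
    Z = np.column_stack([x[p] for p in parents[node]] + [np.ones(n)])
    beta, *_ = np.linalg.lstsq(Z, y, rcond=None)
    sigma2 = np.mean((y - Z @ beta) ** 2)          # ML estimate of the residual variance
    ll = -0.5 * n * (np.log(2 * np.pi * sigma2) + 1)
    k = Z.shape[1] + 1                             # coefficients plus the variance
    return beta, ll, -2 * ll + 2 * k, -2 * ll + np.log(n) * k

total_ll = sum(fit_node(v)[1] for v in parents)
```

Summing the node-wise log-likelihoods gives the overall fit of the LGBN, as reported in the last row of Table 2.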

While the SEM (2) and the LGBN (4) induce marginal normality, an inspection of the marginal histograms of the Sachs experiment shows skewed and multimodal histograms (see first column of Figure A4 of Appendix A1), thus indicating that the reference model might be inadequate. To extend beyond the LGBN we now discuss a copula based approach. This will allow us to construct nonlinear non Gaussian SEMs.

Node log-likelihood AIC BIC
pip3 -986.90 1977.81 1987.29
plc -921.77 1849.53 1863.75
pip2 -1249.37 2506.74 2525.70
pkc -952.32 1912.64 1931.60
pka -830.92 1667.84 1686.80
p38 -570.72 1149.45 1168.40
jnk -991.87 1991.74 2010.70
raf -969.82 1947.64 1966.60
mek -551.50 1112.99 1136.69
erk -968.07 1944.14 1963.09
akt -196.03 402.06 425.75
Total -9189.29 18462.58 18666.37
Table 2: Log-likelihood, AIC and BIC of the fitted LGBN for the experiment of the Sachs data on the original scale (F)

3 Copulas and vine distributions

For the analysis of large multivariate data sets, multivariate statistical models are required which can adequately describe not only the data center but also the tail behavior. For this, standard multivariate distributions such as the multivariate normal or Student t distribution are insufficient. They require that all univariate and multivariate marginal distributions are of the same type and might result in symmetric tail dependence of pairs of variables. These characteristics are often not satisfied, and so the copula approach developed by Sklar allows one to separate the marginal behavior from the dependence structure. In particular, a $d$-dimensional copula $C$ is a multivariate distribution function on the $d$-dimensional hyper cube $[0,1]^d$ with uniformly distributed marginals. The corresponding copula density for an absolutely continuous copula we denote by $c$. The fundamental theorem of Sklar for a $d$-dimensional random vector $\boldsymbol{X} = (X_1,\ldots,X_d)^\top$ with joint distribution function $F$ and marginal distribution functions $F_1,\ldots,F_d$ is given by

$$F(x_1,\ldots,x_d) = C\big(F_1(x_1),\ldots,F_d(x_d)\big) \qquad (5)$$

for some $d$-dimensional copula $C$. When $F$ is absolutely continuous, the associated density can be expressed as

$$f(x_1,\ldots,x_d) = c\big(F_1(x_1),\ldots,F_d(x_d)\big) \prod_{j=1}^{d} f_j(x_j) \qquad (6)$$

with marginal densities $f_j$ and copula density $c$.

For absolutely continuous distributions the copula $C$ is unique. Using (5) or (6), flexible multivariate distributions can be constructed if $d$-dimensional copulas are used. By inversion of (5) we can use any $d$-dimensional distribution function to obtain the corresponding copula. Examples of such copulas are the Gaussian and the Student t copula. Note that using these copula families together with arbitrary margins results in multivariate distributions, sometimes also called meta distributions, which are much more flexible than the multivariate distribution classes used in the inversion. Another class of parametric copulas, built directly using generator functions, is the class of Archimedean copulas. The Gumbel, Clayton and Frank copula families are prime examples. A nice introduction to copulas is given in genest2007, and more theoretical treatments are the books by nelsen2007introduction and Joe1997.
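A small sketch of such a meta distribution: a Gaussian copula combined with non-Gaussian margins (all parameter values below are arbitrary choices for illustration):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
rho = 0.7
R = np.array([[1.0, rho], [rho, 1.0]])
n = 100_000

# sample from the Gaussian copula: Z ~ N(0, R), then U = Phi(Z) has uniform margins
Z = rng.multivariate_normal(np.zeros(2), R, size=n)
U = stats.norm.cdf(Z)

# meta distribution: plug U into arbitrary quantile functions, cf. inverting (5)
X1 = stats.expon(scale=2.0).ppf(U[:, 0])   # exponential margin
X2 = stats.t(df=4).ppf(U[:, 1])            # heavy-tailed Student t margin

# the rank dependence comes from the copula alone: for the Gaussian copula,
# Kendall's tau equals 2/pi * arcsin(rho), whatever the margins
tau, _ = stats.kendalltau(X1, X2)
```

Despite the exponential and Student t margins, the empirical Kendall's tau matches the value implied by the copula parameter, illustrating the separation of margins and dependence.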

From (6) for $d = 2$ we can immediately derive expressions for the conditional density and distribution function, which are needed later. In particular, the conditional density and distribution function of $X_1$ given $X_2 = x_2$ can be expressed as

$$f_{1|2}(x_1 \mid x_2) = c_{12}\big(F_1(x_1), F_2(x_2)\big)\, f_1(x_1), \qquad (7)$$

$$F_{1|2}(x_1 \mid x_2) = \frac{\partial}{\partial u_2}\, C_{12}\big(F_1(x_1), u_2\big)\Big|_{u_2 = F_2(x_2)}. \qquad (8)$$
Since vine copulas are built out of bivariate copulas, we give some properties of bivariate copulas. For this we use pairwise dependence measures such as Kendall's $\tau$ and Spearman's $\rho_S$, which are completely determined by the copula, to measure the overall strength of the dependence. To assess the tail dependence, upper and lower tail dependence coefficients can be used. It is easy to see that the Gaussian and the Frank copula do not have tail dependence, while the Student t copula has symmetric tail dependence. Further, the Clayton copula has lower but no upper tail dependence, while the Gumbel copula has upper but no lower tail dependence. For more details see for example Chapter 2 of czadobook.
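The lower tail dependence of the Clayton copula can be checked empirically; this sketch samples a Clayton copula by conditional inversion (the parameter $\theta = 2$ is an arbitrary choice) and compares the empirical lower tail probability with $\lambda_L = 2^{-1/\theta}$:

```python
import numpy as np

rng = np.random.default_rng(3)
theta = 2.0                # Clayton parameter; lambda_L = 2**(-1/theta)
n = 500_000

# conditional inversion sampler for the Clayton copula C(u,v) = (u^-t + v^-t - 1)^(-1/t)
u = rng.uniform(size=n)
w = rng.uniform(size=n)
v = (u ** (-theta) * (w ** (-theta / (theta + 1)) - 1) + 1) ** (-1 / theta)

q = 0.01                   # look at the joint lower 1% corner
emp = np.mean((u < q) & (v < q)) / q    # empirical estimate of P(V < q | U < q)
lam_L = 2 ** (-1 / theta)
```

The empirical conditional probability in the lower corner is already close to the limiting coefficient $\lambda_L \approx 0.707$, whereas for a Gaussian copula it would decay towards zero as $q \to 0$.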

To allow for a visual comparison between different bivariate copula families, marginally normalized contour plots are helpful. For this we consider three different scales: the original scale ($X$), the copula scale ($U = F(X)$) and the marginally normalized scale or z-scale ($Z = \Phi^{-1}(U)$). Here $F$ denotes the distribution function of a random variable and $\Phi$ the standard normal distribution function. Comparison of contours on the copula scale for different families is difficult, since copula densities are in general unbounded at the corners of $[0,1]^2$. This is not the case if one works on the z-scale. Here $(Z_1, Z_2)$ has $N(0,1)$ margins, and thus any non-elliptical contour shape indicates a deviation from Gaussian dependence.

We now turn to estimation in a parametric setup. In a copula based model specified by (5) or (6) we have to estimate both the marginal and the copula parameters. For this, joint maximum likelihood estimation can be used if the number of parameters is not too large. However, it is more common to use a two step approach, first estimating the marginal parameters based on i.i.d. observations $x_{ij}$ for $i = 1,\ldots,n$ and $j = 1,\ldots,d$. This can be done separately for each of the $d$ margins. In a second step, pseudo copula data is formed by setting

$$u_{ij} := \hat{F}_j(x_{ij}), \qquad (9)$$

using the estimated probability integral transforms (PIT) $\hat{F}_j$. Then the copula parameters are estimated based on the pseudo copula data. If parametric marginal models are used we speak of an inference for margins (IFM) approach, and if the empirical distribution function is applied to the margins we have a semiparametric approach. The efficiency of the IFM approach has been investigated by joe2005, while the semiparametric approach was proposed by Genest.
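A minimal IFM-style sketch on simulated data (the margins and the tau-inversion estimator in step 2 are illustrative simplifications; full ML over the copula parameter would also work):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(4)
n = 2000
rho_true = 0.6
Z = rng.multivariate_normal([0, 0], [[1, rho_true], [rho_true, 1]], size=n)
U = stats.norm.cdf(Z)
x1 = stats.expon(scale=3.0).ppf(U[:, 0])          # exponential margin
x2 = stats.norm(loc=5.0, scale=2.0).ppf(U[:, 1])  # normal margin

# step 1: fit each margin separately by maximum likelihood
scale1 = x1.mean()                   # ML estimate of the exponential scale
mu2, sd2 = x2.mean(), x2.std()       # ML estimates of the normal margin

# pseudo copula data via the estimated PITs, as in (9)
u1 = stats.expon(scale=scale1).cdf(x1)
u2 = stats.norm(mu2, sd2).cdf(x2)

# step 2: estimate the Gaussian copula parameter, here by inverting Kendall's tau
tau, _ = stats.kendalltau(u1, u2)
rho_hat = np.sin(np.pi * tau / 2)
```

The two-step estimate recovers the copula parameter used in the simulation up to sampling error.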

In the case where parametric marginal and copula models do not fit the data, we can also use a nonparametric approach. For the margins, kernel density based estimates of the distribution function can be used. For the bivariate copula case many nonparametric estimators are also available. Their efficiency has been investigated in nagler2014kernel. The results showed that bivariate kernel density based estimation is preferred, and this option is also available in the rvinecopulib package (rvinecopulib2019) to fit the more general class of vine distributions, which are built using bivariate copulas. We now give a short exposition of this class.

While the catalogue of bivariate parametric copula families is large, this is not the case for dimensions $d \geq 3$. Therefore the aim of early research in vine copulas was to find a way to construct multivariate copulas using only bivariate copulas as building blocks. The appropriate tool to obtain such a construction is conditioning. Joe96 gave the first pair copula construction of a multivariate copula in terms of distribution functions, while Bedford3 and Bedford2 independently developed constructions in terms of densities. Additionally, they provided a general framework to identify all possible constructions. We shortly illustrate this construction for $d = 3$, starting with the recursive factorization

$$f(x_1, x_2, x_3) = f_{1|23}(x_1 \mid x_2, x_3)\, f_{2|3}(x_2 \mid x_3)\, f_3(x_3) \qquad (10)$$

and treat each term separately. To determine $f_{1|23}$ we consider the bivariate conditional density $f_{13|2}(x_1, x_3 \mid x_2)$. This density has $f_{1|2}$ and $f_{3|2}$ as marginal densities with associated conditional copula density $c_{13;2}$. More specifically, $c_{13;2}(\cdot,\cdot; x_2)$ denotes the copula density associated with the conditional distribution of $(X_1, X_3)$ given $X_2 = x_2$. Applying (6) to $f_{13|2}$ gives

$$f_{13|2}(x_1, x_3 \mid x_2) = c_{13;2}\big(F_{1|2}(x_1 \mid x_2), F_{3|2}(x_3 \mid x_2); x_2\big)\, f_{1|2}(x_1 \mid x_2)\, f_{3|2}(x_3 \mid x_2). \qquad (11)$$

Dividing (11) by $f_{3|2}(x_3 \mid x_2)$ yields the first term of (10),

$$f_{1|23}(x_1 \mid x_2, x_3) = c_{13;2}\big(F_{1|2}(x_1 \mid x_2), F_{3|2}(x_3 \mid x_2); x_2\big)\, f_{1|2}(x_1 \mid x_2). \qquad (12)$$

Now $f_{1|2}$ is the conditional density of $X_1$ given $X_2$, which can be determined using (7), yielding

$$f_{1|2}(x_1 \mid x_2) = c_{12}\big(F_1(x_1), F_2(x_2)\big)\, f_1(x_1). \qquad (13)$$

Finally, direct application of (7) gives

$$f_{2|3}(x_2 \mid x_3) = c_{23}\big(F_2(x_2), F_3(x_3)\big)\, f_2(x_2). \qquad (14)$$

Inserting (12), (13) and (14) into (10) yields a pair copula decomposition of an arbitrary three dimensional density as

$$f(x_1, x_2, x_3) = f_1(x_1)\, f_2(x_2)\, f_3(x_3)\, c_{12}\big(F_1(x_1), F_2(x_2)\big)\, c_{23}\big(F_2(x_2), F_3(x_3)\big)\, c_{13;2}\big(F_{1|2}(x_1 \mid x_2), F_{3|2}(x_3 \mid x_2); x_2\big). \qquad (15)$$

We see that the joint three dimensional density can be expressed in terms of bivariate copulas and (conditional) distribution functions. However, this decomposition is not unique: reordering the variables in (10) leads to different decompositions, for example in terms of the pairs $(1,3)$, $(2,3)$ and $(1,2;3)$. In these decompositions we have in general different conditional copula densities when the value of the conditioning variable changes. This is intractable for estimation, and therefore often the so called simplifying assumption is made, i.e. the dependence on the conditioning value is ignored. For example, we set $c_{13;2}(\cdot,\cdot; x_2) = c_{13;2}(\cdot,\cdot)$ in (15) for every choice of $x_2$. Bivariate copulas are also referred to as pair copulas. Using the simplifying assumption in (15), we speak of a pair copula construction rather than a decomposition. In this setup we can estimate both marginal and copula parameters in a parametric setup or use a nonparametric approach for both margins and bivariate copulas. Note that the specification of marginal and pair copula distributions can be done independently; thus we have constructed a very flexible class of three dimensional densities.
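For Gaussian pair copulas the construction (15) under the simplifying assumption can be verified numerically against the trivariate normal density, with $c_{13;2}$ parameterized by the partial correlation (all correlation values below are arbitrary):

```python
import numpy as np
from scipy import stats

def c_gauss(u1, u2, rho):
    """Bivariate Gaussian copula density."""
    z1, z2 = stats.norm.ppf(u1), stats.norm.ppf(u2)
    det = 1 - rho ** 2
    return np.exp((2 * rho * z1 * z2 - rho ** 2 * (z1 ** 2 + z2 ** 2)) / (2 * det)) / np.sqrt(det)

def h_gauss(u1, u2, rho):
    """h-function: conditional cdf F(u1 | u2) of the Gaussian pair copula."""
    z1, z2 = stats.norm.ppf(u1), stats.norm.ppf(u2)
    return stats.norm.cdf((z1 - rho * z2) / np.sqrt(1 - rho ** 2))

rho12, rho23, rho13_2 = 0.5, 0.3, 0.4   # rho13_2 plays the role of a partial correlation
x = np.array([0.3, -1.1, 0.7])          # evaluation point; N(0,1) margins for simplicity
u = stats.norm.cdf(x)

# pair copula construction, cf. (15), under the simplifying assumption
pcc = (np.prod(stats.norm.pdf(x))
       * c_gauss(u[0], u[1], rho12)
       * c_gauss(u[1], u[2], rho23)
       * c_gauss(h_gauss(u[0], u[1], rho12), h_gauss(u[2], u[1], rho23), rho13_2))

# direct evaluation: trivariate normal whose (1,3) correlation follows from the partial correlation
rho13 = rho12 * rho23 + rho13_2 * np.sqrt((1 - rho12 ** 2) * (1 - rho23 ** 2))
R = np.array([[1, rho12, rho13], [rho12, 1, rho23], [rho13, rho23, 1]])
direct = stats.multivariate_normal(cov=R).pdf(x)
```

Both evaluations agree to numerical precision, since with Gaussian pairs and the simplifying assumption the construction reproduces the trivariate Gaussian density exactly.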

This construction principle involving only marginal distributions and pair copulas can be extended to arbitrary dimensions. For arbitrary dimension $d$ we need a building plan identifying which bivariate conditional distributions and their associated bivariate copulas are needed in the construction. This building plan is specified through a set of linked trees, where the nodes of the next tree are the edges of the current tree. Further, the edges of the next tree have to satisfy the so called proximity condition. This tree structure was called a vine tree structure by Bedford3 and Bedford2. The corresponding edges identify the pair copulas needed to construct the $d$-dimensional density of the resulting regular (R-) vine distribution. Since the number of such vine tree structures grows super exponentially in $d$ (morales2011count), we need a greedy algorithm such as the one developed by dissmann2013selecting. Under the simplifying assumption, parameter estimation can be performed in a stepwise fashion over the pair copula terms tree by tree. This approach was developed by aas2009pair and applied to financial data. There are two simple subclasses of R-vine distributions. The first one is called a D-vine distribution and allows as tree structures only paths through all nodes. The class of C-vines occurs when all trees are stars with a root node in the center.

For a detailed introduction to pair copula constructions and the resulting vine distributions consult czadobook and joe2014dependence. The use of such vine copula based models has grown tremendously. aas2016pair and czadonagler2022 give reviews and point to current developments.

4 Vine copula based regression models

After constructing multivariate distributions, we are interested in constructing a vine copula based regression model to allow for nonlinear regression effects. In a first approach, chang2019prediction, chang19a and cooke2019vine build a vine distribution for the covariates only and then add the response to the vine tree structure such that the resulting structure remains a vine tree structure and allows the conditional response distribution to be specified in closed form. In this way, numerical integration over the covariates is avoided. This initial focus on the joint distribution of the covariates is not natural in the context of a regression problem. An alternative is therefore to start with the response as first node and add covariates one by one in such a way that the resulting structure is again a vine tree structure and the associated density of the response given the covariates is available in closed form. The adding of covariates stops when the conditional (penalized) log-likelihood no longer increases as a further covariate is added. This approach was first developed for D-vine models by kraus2017d and later extended to certain R-vine structures by zhu2021simplified. tepegjozova2021nonparametric considered both D- and C-vines, but generalized the procedure to look two steps ahead before adding a new covariate. While the D- and C-vine based regression methods are feasible in large dimensions, the search for R-vine based regression is restricted to smaller dimensions because of the huge number of allowed R-vine tree structures.
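A stripped-down sketch of the forward selection idea, restricted to Gaussian pair copulas with parameters obtained by tau inversion and a crude AIC-type stopping rule (the actual procedure of kraus2017d allows arbitrary pair copula families and penalized criteria; everything below is a simplification):

```python
import numpy as np
from scipy import stats

def rho_tau(u, v):
    tau, _ = stats.kendalltau(u, v)
    return np.sin(np.pi * tau / 2)           # Gaussian pair copula parameter via tau inversion

def log_c(u, v, rho):
    z1, z2 = stats.norm.ppf(u), stats.norm.ppf(v)
    det = 1 - rho ** 2
    return (2 * rho * z1 * z2 - rho ** 2 * (z1 ** 2 + z2 ** 2)) / (2 * det) - 0.5 * np.log(det)

def h(u, v, rho):                            # conditional cdf F(u | v) of the Gaussian pair
    z1, z2 = stats.norm.ppf(u), stats.norm.ppf(v)
    return stats.norm.cdf((z1 - rho * z2) / np.sqrt(1 - rho ** 2))

def dvine_forward(v, covs):
    """Greedily order covariates by conditional log-likelihood gain (D-vine regression sketch)."""
    covs = {k: w.copy() for k, w in covs.items()}
    order, total = [], 0.0
    while covs:
        gains = {k: log_c(v, w, rho_tau(v, w)).sum() for k, w in covs.items()}
        best = max(gains, key=gains.get)
        if gains[best] <= 1.0:               # AIC-type threshold: one extra parameter
            break
        rho = rho_tau(v, covs[best])
        v = h(v, covs[best], rho)            # condition the response on the chosen covariate
        wb = covs.pop(best)
        covs = {k: h(w, wb, rho_tau(w, wb)) for k, w in covs.items()}
        order.append(best)
        total += gains[best]
    return order, total

# toy data: y depends strongly on x1, moderately on x2, not at all on x3
rng = np.random.default_rng(5)
n = 1000
z1, z2, z3, e = rng.normal(size=(4, n))
y = 0.9 * z1 + 0.5 * z2 + 0.6 * e
to_u = lambda a: stats.rankdata(a) / (n + 1)     # rank-based pseudo observations
order, _ = dvine_forward(to_u(y), {"x1": to_u(z1), "x2": to_u(z2), "x3": to_u(z3)})
```

On this toy example the strongest covariate is selected first, and the irrelevant one tends to be left out by the stopping rule.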

5 D-vine copula based structural equation models

We now extend the LGBN model (4) to allow for nonlinear effects of the parents on a node. For this we recall the factorization (1) and formulate for the conditional density of a node given its parents a vine copula based regression model. If we use the D-vine based regression formulation discussed in Section 4, and denote the ordered parents of node $j$ by $pa_1(j), \ldots, pa_{p_j}(j)$, where $p_j$ is the number of parents of node $j$, then we can express the conditional density of node $j$ given its parents as

$$f(x_j \mid \boldsymbol{x}_{pa(j)}) = f_j(x_j) \prod_{i=1}^{p_j} c_{j,\, pa_i(j);\, pa_1(j),\ldots,pa_{i-1}(j)} \Big( F\big(x_j \mid x_{pa_1(j)},\ldots,x_{pa_{i-1}(j)}\big),\; F\big(x_{pa_i(j)} \mid x_{pa_1(j)},\ldots,x_{pa_{i-1}(j)}\big) \Big), \qquad (18)$$

where for $i = 1$ the conditioning sets are empty and the arguments reduce to $F_j(x_j)$ and $F_{pa_1(j)}(x_{pa_1(j)})$. This will now form the building block for the factorization (1), and in view of (4) we call the model specified by (1) and (18) a D-vine copula based SEM.
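With Gaussian pair copulas and standard normal margins, a conditional density of the form (18) for a node with two parents can be cross-checked against the usual multivariate normal conditioning formula (the correlation values below are arbitrary illustration choices):

```python
import numpy as np
from scipy import stats

def c_gauss(u1, u2, rho):
    z1, z2 = stats.norm.ppf(u1), stats.norm.ppf(u2)
    det = 1 - rho ** 2
    return np.exp((2 * rho * z1 * z2 - rho ** 2 * (z1 ** 2 + z2 ** 2)) / (2 * det)) / np.sqrt(det)

def h_gauss(u1, u2, rho):
    z1, z2 = stats.norm.ppf(u1), stats.norm.ppf(u2)
    return stats.norm.cdf((z1 - rho * z2) / np.sqrt(1 - rho ** 2))

# node 1 with ordered parents (2, 3); pair parameters of the D-vine on (1, 2, 3)
rho12, rho23, rho13_2 = 0.6, 0.2, -0.3
x1, x2, x3 = 0.4, -0.8, 1.2
u1, u2, u3 = stats.norm.cdf([x1, x2, x3])

# conditional density, cf. (18): f(x1 | x2, x3) = f1(x1) * c12 * c13;2(h(u1|u2), h(u3|u2))
cond_pcc = (stats.norm.pdf(x1)
            * c_gauss(u1, u2, rho12)
            * c_gauss(h_gauss(u1, u2, rho12), h_gauss(u3, u2, rho23), rho13_2))

# reference: conditional normal derived from the implied joint correlation matrix
rho13 = rho12 * rho23 + rho13_2 * np.sqrt((1 - rho12 ** 2) * (1 - rho23 ** 2))
R = np.array([[1, rho12, rho13], [rho12, 1, rho23], [rho13, rho23, 1]])
S12, S22 = R[0, 1:], R[1:, 1:]
w = np.linalg.solve(S22, S12)
mu = w @ np.array([x2, x3])
var = 1 - w @ S12
cond_direct = stats.norm(mu, np.sqrt(var)).pdf(x1)
```

In the Gaussian special case the two evaluations coincide; with non-Gaussian pair copulas the same product structure yields genuinely nonlinear, non-Gaussian conditional densities.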

The order of the parent nodes in the D-vine based conditional density (18) is a priori not fixed; we will later choose the order obtained by the methods developed in kraus2017d and described in Section 4. Thus we start with the most important parent node with regard to the conditional (penalized) log-likelihood of the node and continue adding parent nodes until this quantity can no longer be improved or all parent nodes are included.

Since the forward selection of the parent nodes also includes the possibility of not including all parents, it gives a way to remove edges from the starting consent graph. Such edges are then not supported by the data. We will now apply this class of copula based SEMs to the experimental setting of the Sachs data.
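The "edge supported by the data" check can be sketched as a test of the independence copula on the pseudo observations, e.g. via the Kendall's tau independence test (a simplification of the model-selection criterion actually used in the forward selection):

```python
import numpy as np
from scipy import stats

def edge_supported(u_node, u_parent, alpha=0.05):
    """Keep the edge only if the pair shows significant rank dependence."""
    tau, pval = stats.kendalltau(u_node, u_parent)
    return pval < alpha

rng = np.random.default_rng(6)
n = 845
u = rng.uniform(size=n)
u_indep = rng.uniform(size=n)      # independent of u: the edge should be dropped
u_dep = stats.norm.cdf(0.8 * stats.norm.ppf(u) + 0.6 * rng.normal(size=n))
```

For the dependent pair the edge is retained; for the independent pair the estimated Kendall's tau is close to zero and the pair copula is close to the independence copula.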

6 Analysis of experimental setting from the Sachs data

For this analysis we used the data from the cd3cd28+aktinhib experiment on the logarithmic scale. Recall that the consent graph given in Figure 3 yields the factorization of the joint density of the eleven protein variables into eleven conditional densities. Next we want to utilize the D-vine copula based SEM introduced in Section 5: every conditional density term in this factorization is modeled by the D-vine regression introduced in Section 4, i.e. takes the form (18).
Note that if a node has more than one parent, the order of the parents in the D-vine might change depending on the chosen marginals and copulas. In this case we use numbers to denote the position in the order of the D-vine instead of the explicit nodes: "parent 1" denotes the first parent of the modeled node, "parent 2" the second parent in the D-vine regression, and so on. For our analysis we have 845 observations available for the cd3cd28+aktinhib experiment. In this experiment the reagent Anti-CD3/CD28 (a general perturbation: it activates T cells and induces proliferation and cytokine production; induced signaling through the T cell receptor (TCR) activates ZAP70, Lck, PLC-g, Raf, Mek, Erk and PKC, and the TCR signaling converges on the transcription factors NFkB, NFAT and AP-1 to initiate IL-2 transcription; Sachs2005) has an activation effect, while the AKT inhibitor (an inhibitor specific perturbation: it binds the inositol pleckstrin domain of AKT and blocks AKT translocation to the membrane, where normally AKT becomes phosphorylated and active; inhibition of AKT and of the phosphorylation of AKT substrates affects cell survival; Sachs2005) has an inhibiting effect on akt, according to Figure 2 of Sachs2005.

To estimate the parameters of the marginal and copula terms in Equation (6) we follow the IFM approach discussed in Section 3. So in a first step we have to estimate the margins and then form appropriate pseudo data as in (9). In the second step we fit a D-vine copula to the node as first variable and its parents as remaining variables in the described forward approach.

To see how the fit of the model varies when using different marginals (M) or copulas (C), we fit three different copula based SEMs. For the first model we use kernel density estimates as margins and allow for parametric and nonparametric copulas (denoted MC(kde)); for the second we use Gaussian mixture margins, since we observe some multimodal margins, together with the same set of copulas (MC(gmm)); and for the last model we only allow for Gaussian margins and Gaussian copulas (MC(Gauss)). A nonparametric pair copula is chosen when it fits better than any implemented parametric family; this is implemented in the R package vinereg (vinereg).

The fit of the different marginals is shown in Figure A4 of Appendix A1, comparing marginal histograms to the fitted density, QQ plots, and checking whether the data, after applying the PIT using the estimated marginal densities, is approximately uniformly distributed. We observe that using Gaussian margins for the PIT in general does not result in uniformly distributed data, indicating a bad fit. On the other hand, Gaussian mixture margins and kernel density estimates fit the margins very well; it is hard to distinguish between these two models. We also consider the goodness of fit measures log-likelihood, AIC and BIC of the fitted marginals. The results are displayed in Table A1 in Appendix A2 and show that Gaussian margins give the worst fit.

To support the use of copula based SEMs instead of linear Gaussian ones, we give normalized contour plots in Figure A8 of Appendix A1. There we see non elliptical shapes in the lower triangular panels regardless of which marginal models were used, thus pointing towards the need to capture non Gaussian dependence.

In the next step we use the fitted marginals to transform the data to the copula scale by applying the estimated PIT. On these transformed data sets we then fit copulas according to the three models defined above.

Using kernel density estimates for the PIT and then fitting a D-vine regression model for each node with at least one parent results in a log-likelihood of 2316.23, an AIC of -4228.73 and a BIC of -3272.06 for the copula terms. Using Gaussian mixture margins and the same set of copula families results in a log-likelihood of 2292.34, an AIC of -4189.49 and a BIC of -3252.98 for the copula terms. The last model, where Gaussian margins and Gaussian copulas are used, worsens the three measures to a log-likelihood of 1497.84, an AIC of -2971.68 and a BIC of -2914.80 for the copula terms. More detailed results on the copula scale are displayed in Table A2 and Table A3, as well as contour plots of the fitted copulas in Figure A12 of Appendix A3. They show that Gaussian pair copulas are only seldom chosen in the non Gaussian SEMs. Further, the parametric pair copula families contained in vinereg are not sufficient, and many nonparametric pair copulas (tll) are selected. We see strong similarities on the fitted copula scale between the MC(kde) model and the MC(gmm) model.

If a parent $k$ is not selected in the D-vine regression for the node $j$, this indicates that the corresponding pair copula density is very close to the bivariate independence copula density ($c(u_1, u_2) = 1$ for every $(u_1, u_2) \in [0,1]^2$), and thus the edge $k \to j$ is not supported by the data. This implies that in both copula based non Gaussian SEMs only 17 of the 20 dependencies given by the biological consent graph are present for this experimental setting (see Panel (b) of Figure 3). For each of the three removed edges a plausible explanation can be found in the experimental conditions: one removal cuts a pathway into a protein that is inhibited in this setting, and another is sensible since the protein involved is directly activated in this experiment. Two of the removed edges were also identified as edges to be removed from the consent graph in the original analysis of Sachs2005.

The number of supported edges decreases further for the MC(Gauss) model: here only 12 edges are modelled. The removal of these additional edges (colored red in Panel (b) of Figure 3) is questionable, since, as we show now, the fit of the MC(Gauss) model is inferior to that of the other two copula based SEMs.

We now include the marginal fits of the different SEMs, based on Equation (18), to compute overall fit measures, which also allow a comparison with the performance of the standard LGBN. These are contained in Table 7.
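The information criteria reported in the tables follow the usual conventions, AIC = -2·loglik + 2k and BIC = -2·loglik + k·log(n), with n = 845 observations in this experiment. As a sanity check, the sketch below approximately reproduces the Gaussian-margin entry for erk in Table A1 (log-likelihood -997.54 with k = 2 parameters); small discrepancies come from the rounding of the reported log-likelihood.

```python
# Sketch: the AIC/BIC convention used throughout the tables.
import math

def aic(loglik, k):
    """Akaike information criterion: -2*loglik + 2k."""
    return -2 * loglik + 2 * k

def bic(loglik, k, n):
    """Bayesian information criterion: -2*loglik + k*log(n)."""
    return -2 * loglik + k * math.log(n)

print(aic(-997.54, 2))       # ~1999.08; Table A1 reports 1999.07
print(bic(-997.54, 2, 845))  # ~2008.56; Table A1 reports 2008.55
```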

(a) Log-likelihood of the conditional densities

Model                          raf   mek   plc   pip2   pip3  erk   akt   pka   pkc   p38   jnk   Sum
LGBN                           -970  -552  -922  -1249  -987  -968  -196  -831  -952  -571  -992  -9189
MC (kernel density margins)    -902  -399  -827  -921   -895  -822  -79   -766  -770  -246  -837  -7464
MC (Gaussian mixture margins)  -907  -416  -826  -934   -917  -830  -103  -768  -775  -263  -845  -7587
MC (Gaussian margins)          -970  -552  -922  -1249  -987  -968  -197  -831  -952  -571  -993  -9193

(b) AIC of the conditional densities

Model                          raf   mek   plc   pip2   pip3  erk   akt   pka   pkc   p38   jnk   Sum
LGBN                           1948  1113  1850  2507   1978  1944  402   1668  1913  1149  1992  18463
MC (kernel density margins)    1858  876   1718  1938   1813  1695  236   1552  1556  565   1750  15558
MC (Gaussian mixture margins)  1854  889   1710  1967   1844  1702  276   1555  1569  590   1747  15703
MC (Gaussian margins)          1948  1113  1850  2505   1978  1943  401   1666  1911  1148  1991  18453

(c) BIC of the conditional densities

Model                          raf   mek   plc   pip2   pip3  erk   akt   pka   pkc   p38   jnk   Sum
LGBN                           1967  1137  1864  2526   1987  1963  426   1687  1932  1168  2011  18666
MC (kernel density margins)    1989  1057  1870  2165   1868  1817  421   1599  1596  738   1930  17051
MC (Gaussian mixture margins)  1946  1021  1847  2202   1867  1800  441   1597  1611  740   1883  16957
MC (Gaussian margins)          1967  1132  1864  2519   1987  1957  420   1676  1925  1162  2005  18614

(d) Number of (effective) parameters of the conditional densities

Model                          raf   mek   plc   pip2   pip3  erk   akt   pka   pkc   p38   jnk   Sum
LGBN                           4     5     3     4      2     4     5     3     4     4     4     42
MC (kernel density margins)    28    38    32    48     12    26    39    10    8     37    38    315
MC (Gaussian mixture margins)  20    28    29    49     5     21    35    9     9     32    29    265
MC (Gaussian margins)          4     4     3     3      2     3     4     2     3     3     3     34

Table 7: Goodness of fit measures of the fitted copula based SEMs and the LGBN

We observe that the LGBN and the D-vine based SEM with Gaussian margins and Gaussian copulas yield almost identical results for all three goodness of fit measures. This is reasonable, as both specify a multivariate Gaussian distribution, following Koller and Friedman (2009) and Morales et al. (2008). The parameters of the conditional Gaussian distribution of each node in the two models are displayed in Table S3 of Supplement S1. Many similarities between these two models are visible, even though all possible dependencies are modelled in the LGBN, since it was fitted by optimizing the conditional log-likelihood with all parents of a node, whereas this is not the case for the Gaussian copula based SEM.

Comparing these two models to the SEMs with kernel density and Gaussian mixture margins, we see a strong improvement in all goodness of fit measures for the models with parametric and nonparametric copulas. This is no surprise, since we have already seen that Gaussian mixture margins and kernel density estimates provide a much better fit than Gaussian margins, and that allowing flexibility in the choice of the pair copula families provides an additional advantage. Between the two non Gaussian models only slight differences are visible, mostly due to the large number of effective parameters needed for the kernel density estimates, which influences the BIC.
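The Gaussian mixture margins referred to above can be illustrated with a minimal two-component EM fit; `fit_gmm2`, the simulated bimodal data, and the fixed component count are all illustrative assumptions, not the paper's actual marginal fits (which may use more components).

```python
# Sketch: minimal EM for a univariate two-component Gaussian mixture,
# compared against a single-Gaussian margin on clearly bimodal data.
import numpy as np
from scipy.stats import norm

def fit_gmm2(x, iters=200):
    # crude initialisation: components at the data extremes
    w, mu, sd = 0.5, np.array([x.min(), x.max()]), np.array([x.std(), x.std()])
    for _ in range(iters):
        # E-step: responsibility of component 1 for each observation
        p1 = w * norm.pdf(x, mu[0], sd[0])
        p2 = (1 - w) * norm.pdf(x, mu[1], sd[1])
        r = p1 / (p1 + p2)
        # M-step: weighted updates of weight, means and standard deviations
        w = r.mean()
        mu = np.array([np.average(x, weights=r), np.average(x, weights=1 - r)])
        sd = np.sqrt([np.average((x - mu[0]) ** 2, weights=r),
                      np.average((x - mu[1]) ** 2, weights=1 - r)])
    loglik = np.log(w * norm.pdf(x, mu[0], sd[0])
                    + (1 - w) * norm.pdf(x, mu[1], sd[1])).sum()
    return w, mu, sd, loglik

rng = np.random.default_rng(2)
x = np.concatenate([rng.normal(2, 0.5, 400), rng.normal(6, 0.8, 445)])
w, mu, sd, ll_mix = fit_gmm2(x)
ll_gauss = norm.logpdf(x, x.mean(), x.std()).sum()   # single-Gaussian margin
```

On bimodal data the mixture log-likelihood clearly exceeds the single-Gaussian one, mirroring the gap between the Gaussian mixture and Gaussian columns of Table A1.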

7 Model validation

Up to this point we have only compared the goodness of fit measures between the four models, but have not evaluated whether they really reproduce structures visible in the observed data. For this we sampled 845 times from each model, starting in the root node and sampling it using its fitted marginal density. We then follow the topological order, i.e., for each variable in this order we simulate data for the node using the already simulated data for its parents. For the LGBN we use the fact that the conditional distribution can be expressed as a conditional normal distribution. To sample from the D-vine copula based SEMs we utilize the corresponding sampling algorithm.
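Forward sampling along a topological order can be sketched with a toy linear Gaussian SEM on a short chain; the chain pip3 → plc → pip2 matches the consent graph, but all coefficients and noise scales below are made up, not the fitted values from the paper.

```python
# Sketch: forward sampling along a topological order in a toy
# linear Gaussian SEM (root first, then each node given its parents).
import numpy as np

rng = np.random.default_rng(3)
n = 845
sample = {}
sample["pip3"] = rng.normal(3.5, 0.6, n)                      # root: marginal draw
sample["plc"] = 0.4 * sample["pip3"] + rng.normal(0, 0.5, n)  # node given its parent
sample["pip2"] = (0.6 * sample["plc"] + 0.3 * sample["pip3"]  # node given both parents
                  + rng.normal(0, 0.4, n))
```

For the copula based SEMs the conditional normal draw is replaced by an inverse-PIT draw from the fitted D-vine conditional distribution, but the traversal order is the same.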


(a) MC
(b) MC
(c) MC
(d) LGBN
Figure 8: Scatter plots for each pair of nodes of the respective model in the lower panel and of the data set in the upper panel. Green plots in the lower panel indicate that an edge exists between the two nodes in the model. For better visibility the same pairs are colored red in the upper panel

To compare the induced pairwise dependence structures we consider scatter plots of node pairs for the observed data and for the simulated data sets from the different models in Figure 8. Looking at the scatter plots of the two non Gaussian copula models, we observe almost no differences between them. This is not surprising, as we have already seen that their fitted copulas are almost identical. Comparing their scatter plots to the observed ones from the data set, we see a close match. For all variable pairs the shapes, even the more complicated ones, are similar between the simulated data of the two fitted non Gaussian SEMs and the original data.

However, comparing the scatter plots of the Gaussian copula based SEM and the LGBN instead, it is clear that both have problems modeling more complicated shapes, as they can only represent elliptically shaped dependencies. The Gaussian copula model seems to do slightly better, some of its pairs plots being closer to those from the original data set than the ones from the LGBN. Since Figure 8 only considers bivariate dependence properties, we also check whether high or low values in several variables at once appear with similar frequency in the observed and the simulated data. Hence, we sum over the variables in each sample and fit kernel density estimates to the resulting data. We can then compare the kernel density estimates fitted to the sums of the observed data and of the simulated data from the investigated models. The results are displayed in Figure 9.
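The sum-plus-KDE comparison just described can be sketched as follows; both data matrices are simulated placeholders standing in for the observed and model-generated data sets.

```python
# Sketch: compare observed and simulated data through the distribution of
# the per-sample sum over all nodes, smoothed with kernel density estimates.
import numpy as np
from scipy.stats import gaussian_kde

rng = np.random.default_rng(4)
observed = rng.normal(3.0, 1.0, size=(845, 11))    # 845 samples, 11 nodes (placeholder)
simulated = rng.normal(3.1, 1.1, size=(845, 11))   # draws from some fitted model

kde_obs = gaussian_kde(observed.sum(axis=1))       # KDE of the observed sums
kde_sim = gaussian_kde(simulated.sum(axis=1))      # KDE of the simulated sums
grid = np.linspace(10, 60, 200)
# Overlaying kde_obs(grid) and kde_sim(grid) gives a Figure-9-style comparison.
```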

Figure 9: Histogram of the sum over all nodes in the experiment and kernel density estimates fitted to the simulated data sets of the different models

Comparing the fitted kernel density estimates of the sum over all nodes, we see that those fitted to the simulated data from the two non Gaussian copula models are very close to the original data set and are only slightly shifted to the left. On the other hand, the one fitted to the sum of the simulated data from the Gaussian copula model does not reach the height of the mode of the original data set, whereas the LGBN overestimates the height of the mode. Hence, considering the ability of the different models to recreate the data set they were fitted on, the two non Gaussian copula models outperform the other two.

8 Conditional density and medians for given values of parent nodes

To see the effect of allowing a more general dependence structure in the SEM (18) compared to a LGBN, we investigate the behavior of the conditional density of a node given its parents for the studied models. Here we would like to investigate the effect when median and extreme values of the parents are used. If a node has only one parent in the DAG, we chose one central and two tail empirical quantiles of the parent node as conditioning values. This is appropriate for the two nodes with a single parent, and the associated conditioning values are shown in Figure S1 of Supplement S2.

Since there is no simple equivalent of quantiles in two or three dimensional distributions, a different approach had to be used for all other nodes. If a node has two or three parents, we fitted a two- or three-dimensional kernel density estimate to the joint distribution of the parent nodes using the data. We then manually chose one point close to the mode of the fitted density, two in the tails, and two more which are neither close to the mode nor in the tails. This is illustrated in Figure S2 for nodes with two parents and in Figure S9 for nodes with three parents, respectively, in Supplement S2. Note that while six nodes have two parents, three of them (p38, jnk and raf) have the same set of parents, namely pka and pkc (see Table 8), so we only obtain four cases for the values of the conditioning variables. The chosen values of the parent nodes are summarized in Table 8.
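One simple automated variant of this manual selection is to take the data point with the highest (or lowest) fitted kernel density as a "Mode" (or "Tail") candidate. The sketch below does this for a toy two-parent case; the parent distribution and the selection rule are illustrative assumptions, since the paper's points were chosen by hand.

```python
# Sketch: candidate "Mode" and "Tail" conditioning points for a node with
# two parents, via a 2-D kernel density estimate of the parent data.
import numpy as np
from scipy.stats import gaussian_kde

rng = np.random.default_rng(5)
parents = rng.multivariate_normal([2.5, 3.5], [[0.3, 0.1], [0.1, 0.4]], 845)

kde = gaussian_kde(parents.T)           # gaussian_kde expects shape (dim, n)
dens = kde(parents.T)                   # fitted density at each observation
mode_point = parents[np.argmax(dens)]   # densest observation: "Mode" candidate
tail_point = parents[np.argmin(dens)]   # least dense observation: "Tail" candidate
```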

Node           Parent   Mode   Middle  Middle  Tail   Tail
plc            pip3     2.68   3.56    4.38
pip2           plc      2.40   1.80    2.80    3.80   2.90
               pip3     3.50   3.60    3.90    4.30   2.10
pkc            plc      2.50   1.90    2.90    1.90   4.00
               pip2     4.90   5.20    4.80    2.50   5.00
pka            pkc      1.80   2.93    3.46
p38, jnk, raf  pka      6.00   6.00    7.00    7.50   6.50
               pkc      3.00   2.50    3.00    2.50   1.75
mek            raf      3.62   4.79    3.49    4.43   1.97
               pka      6.10   6.52    7.40    7.64   6.63
               pkc      2.88   3.44    3.10    2.04   2.71
erk            mek      3.10   3.50    2.90    4.50   3.00
               pka      6.10   6.20    6.40    6.00   7.50
akt            erk      2.57   4.22    2.10    2.58   1.54
               pka      6.19   7.15    5.89    6.28   6.63
               pip3     3.73   3.68    3.99    5.41   4.66
Table 8: Chosen values of the parent nodes used for the conditional density of a node given its parents. For nodes with a single parent, the three listed values are the chosen empirical quantiles of the parent

In the following we denote the points chosen close to the mode of the fitted parent distribution as "Mode", the points in the tails as "Tail" and the points between them as "Middle". If a node has only one parent, we assign the two outer quantile points to "Tail" and the central quantile point to "Mode". Note that since not all dependencies are modelled in the Gaussian copula based SEM, the number of parents of a node can change with the model.

As before, we sample 845 times from each node of the model given the values of the parent nodes and then fit kernel density estimates to these samples. The resulting density plots are displayed in Figure 10. As expected, we observe only small differences in the general shape of the conditional density plots between the two non Gaussian copula models. The same holds when comparing the Gaussian copula model and the LGBN. However, the shapes of the conditional densities can be very different between the non Gaussian copula based SEMs and the Gaussian SEMs, illustrating very different conditional effects for the chosen parent values.

Column panels of Figure 10, from left to right: Mode, Middle, Tail.

Figure 10: Density plots of the kernel density estimates fitted to the simulated data based on different structural equation models conditioned on values for the parent nodes in Table 8 following the order there. Blue: MC. Green: MC. Brown: MC. Orange: LGBN

As a final visualisation of the differences in relevant conditional quantities, we consider the conditional median of each node given its parents. We follow the topological order as given in Table 1. For the root node pip3 we use a sample quantile at a chosen level as its value. Next we determine the median of plc given this value using the D-vine quantile regression of Kraus and Czado (2017). For the next node in the order we determine the conditional median given the previously determined values, again using D-vine quantile regression, and we continue in this way until all conditional medians are determined. The needed parents for the different models can be found in Table A3 of Appendix A3. We investigate three different quantile levels, and the results are visualized in Figure 14. We see pronounced differences for some of the nodes.
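The propagation of conditional medians down the topological order can be sketched with a toy linear Gaussian chain, where the conditional median coincides with the conditional mean; the coefficients and the marginal of pip3 below are made-up placeholders, and the copula models would use D-vine quantile regression at each step instead.

```python
# Sketch: propagating conditional medians along the topological order
# pip3 -> plc -> pip2 in a toy linear Gaussian SEM.
from scipy.stats import norm

alpha = 0.95                                   # starting quantile level
pip3_q = norm.ppf(alpha, loc=3.5, scale=0.6)   # quantile of the root node pip3
plc_med = 0.4 * pip3_q                         # median of plc | pip3 = pip3_q
pip2_med = 0.6 * plc_med + 0.3 * pip3_q        # median of pip2 | plc, pip3
```

Repeating this for several levels alpha traces out curves like those compared across models in Figure 14.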

Figure 14: Conditional median of a node given the fitted median values of its parents, starting with pip3 at a chosen sample quantile level (Brown: MC, blue: MC, green: MC, orange: LGBN)

9 Summary and future research directions

In this paper we suggested using a D-vine copula based SEM to analyze data which obeys a graphical structure. At least a starting graph has to be known, but the data does not need to follow a joint Gaussian distribution. The copula regression model for the conditional distribution of a node given its parents is very flexible, since the bivariate building blocks of the D-vine regression can be chosen independently, in addition to the marginal models for the node and its parents.

The proposed approach uses data reduction in two ways: first by the choice of the graph and second by the forward selection of parent nodes ordered by importance. Edges pointing to a node in the starting graph can be removed when the corresponding parent node is not selected as a covariate in the D-vine regression of the node. This allows for an edge selection without relying on a Gaussian assumption for the data. For the setting studied here, this approach identified three edges in the network which are no longer supported by the data under the experimental conditions investigated. The removal of these edges was found to be plausible given the specific experimental conditions.

We illustrated the approach using the experimental setting of the Sachs data. Here we showed a much better fit of the copula based SEM with margins fitted by mixtures of univariate normals or nonparametric kernel density estimation and using non Gaussian pair copulas.

These D-vine copula based SEMs are easy to fit and can be applied to large networks, since D-vine regressions are feasible for several hundred covariates. In addition it is easy to simulate from the model and assess conditional densities or probabilities of the nodes given observed values of the selected parent nodes.

These D-vine copula based SEMs can also be utilized to conduct a causal analysis using the pooled Sachs data. For this we would subdivide the data in a node specific way: for a given node we would select only observations from experimental settings which do not inhibit or activate the node. This would remove the interventional effects present in the data; such an analysis is currently under investigation.

While the requirement of an initial graph may seem restrictive, the approach can be combined with any structure learning approach. In particular, to allow for non Gaussian structure selection, the vine copula based approach of Bauer et al. (2012) to test for conditional independence within a PC algorithm (Kalisch and Bühlmann 2007; Spirtes et al. 2011) can be used, and the selected graph can then be fitted using the proposed D-vine copula based SEM.

Two immediate extensions of the proposed modeling approach are to allow for R-vine based SEMs, or to improve the forward selection procedure by looking two steps ahead instead of one, as proposed by Tepegjozova et al. (2021).

A non Gaussian analysis of the complete Sachs data would require the development of efficient mixtures of vine regression models to accommodate the different clusters induced by the different experimental settings. A first step in this direction is Sahin and Czado (2021), who developed a data driven algorithm for the specification of vine distributions in a mixture setup for clustering. Such a development might further improve the fit for this experimental setting, since for some nodes we can observe clusters in the data (see Figure A8).


A1: Marginal fits for the copula based structural equation model

(a) M
(b) M
(c) M
Figure A4: Density plot and qq plot of the chosen margins (first two columns) and histogram of the data after applying the distribution function of the fitted marginal distribution (third column), for all nodes and all chosen marginal specifications on the data set
      Kernel density estimates               Gaussian mixture margins            Gaussian margins
Node  Log-lik.   AIC       BIC      # Par.   Log-lik.  AIC       BIC      # Par.  Log-lik.   AIC       BIC
pip3  -895.20    1813.47   1868.14  11.54    -916.83   1843.66   1867.36  5       -986.90    1977.81   1987.29
plc   -952.81    1927.03   1977.73  10.70    -951.96   1919.92   1957.83  8       -979.48    1962.97   1972.44
pip2  -1214.93   2447.95   2490.80  9.04     -1225.04  2466.07   2503.99  8       -1338.18   2680.35   2689.83
pkc   -771.76    1558.08   1592.58  7.28     -777.10   1570.20   1608.11  8       -954.10    1912.21   1921.69
pka   -767.31    1552.31   1594.24  8.85     -769.54   1555.08   1592.99  8       -831.22    1666.44   1675.92
p38   -765.47    1546.82   1584.45  7.94     -778.79   1567.58   1591.28  5       -925.20    1854.40   1863.88
jnk   -923.74    1874.36   1938.03  13.44    -929.73   1875.46   1913.37  8       -1008.16   2020.31   2029.79
raf   -921.52    1865.16   1917.58  11.06    -926.74   1863.48   1887.18  5       -971.98    1947.95   1957.43
mek   -710.24    1450.37   1521.20  14.94    -725.23   1460.47   1484.16  5       -785.21    1574.41   1583.89
erk   -991.42    1997.43   2032.01  7.30     -997.54   1999.07   2008.55  2       -997.54    1999.07   2008.55
akt   -865.88    1753.80   1806.03  11.02    -880.65   1771.30   1794.99  5       -912.43    1828.85   1838.33
Sum   -9780.28   19786.78  20322.79 113.11   -9879.15  19892.29  20209.81 67      -10690.40  21424.77  21529.04
Table A1: Marginal log-likelihood, AIC and BIC of the fitted margins. For the fitted kernel density estimates and for the fitted Gaussian mixture margins the number of (effective) parameters is displayed

We see that all three measures are very similar for the fitted Gaussian mixture margins and kernel density estimates. For the log-likelihood and the AIC the kernel density estimates are slightly better, whereas for the BIC the Gaussian mixture margins marginally outperform the kernel density estimates. This difference is due to the higher number of parameters necessary to fit kernel density estimates. Compared to these two approaches, fitting Gaussian margins results in an overall worse fit.

A2: Exploring pairwise dependence under different marginal specifications

(a) Kernel density estimates
(b) Gaussian mixture margins
(c) Gaussian margins

Figure A8: Normalized contour plots in the lower left triangle, normalized scatter plots in the upper right triangle and histograms on the diagonal, after applying the PIT with the different fitted margins to the data of the experiment

A3: Copula fits for the copula based SEM

      Kernel density estimates                      Gaussian mixture margins                    Gaussian margins
Node  Order          Log-lik.  AIC       BIC       Order          Log-lik.  AIC       BIC      Order     Log-lik.  AIC       BIC
plc   pip3           126.09    -209.26   -107.55   pip3           125.75    -209.73   -110.76  pip3      57.72     -113.43   -108.69
pip2  pip3, plc      293.69    -509.66   -325.46   pip3, plc      290.73    -498.59   -302.22  pip3      88.80     -175.61   -170.87
pkc   pip2           1.82      -1.64     3.09      pip2           1.84      -1.68     3.06     pip2      1.73      -1.46     3.28
pka   pkc            1.08      -0.16     4.58      pkc            1.13      -0.26     4.48     -         0         0         0
p38   pkc, pka       519.59    -982.02   -846.57   pkc, pka       515.32    -977.28   -850.79  pkc       354.14    -706.28   -701.54
jnk   pkc, pka       86.66     -124.26   -7.99     pkc, pka       84.79     -128.34   -30.61   pkc       15.54     -29.08    -24.34
raf   pka, pkc       20.00     -6.88     71.59     pka, pkc       19.32     -9.59     59.28    pka, pkc  2.15      -0.30     9.18
mek   raf, pkc, pka  310.82    -574.83   -463.88   raf, pkc, pka  308.73    -571.78   -463.55  raf, pkc  232.86    -461.72   -452.24
erk   pka            169.47    -302.22   -215.22   pka            167.42    -297.22   -208.06  pka       29.16     -56.33    -51.59
akt   erk, pka       787.00    -1517.80  -1384.65  erk, pka       777.31    -1495.02  -1353.81 erk, pka  715.74    -1427.47  -1417.99
Sum                  2316.23   -4228.73  -3272.06                 2292.34   -4189.49  -3252.98           1497.84   -2971.68  -2914.80
Table A2: Log-likelihood of the copula terms, AIC, BIC and order of the D-vine for each node in the D-vine regression models
(a) MC
Node  Pair copula  Family  Effective # parameters  Copula log-lik.  AIC  BIC  Est. Kendall's tau
plc plc pip3 tll 21.46 126.09 -209.26 -107.55 0.24
pip2 pip2 pip3 tll 19.77 175.60 -311.66 -217.98 0.05
pip2 plc; pip3 tll 19.10 118.10 -198.00 -107.48 0.29
pkc pkc pip2 clayton 1.00 1.82 -1.64 3.09 -0.01
pka pka pkc frank 1.00 1.08 -0.16 4.58 -0.03
p38 p38 pkc tll 27.58 517.89 -980.62 -849.91 0.60
p38 pka; pkc joe 1.00 1.70 -1.40 3.34 -0.01
jnk jnk pkc tll 23.53 85.59 -124.11 -12.58 0.15
jnk pka; pkc gumbel 1.00 1.08 -0.15 4.59 0.00
raf raf pka tll 15.56 18.74 -6.36 67.37 0.05
raf pkc; pka clayton 1.00 1.26 -0.52 4.22 0.01
mek mek raf tll 21.41 304.00 -565.19 -463.72 0.48
mek pkc; raf gaussian 1.00 4.54 -7.09 -2.35 -0.06
mek pka; raf pkc gumbel 1.00 2.28 -2.55 2.19 0.01
erk erk pka tll 18.36 169.47 -302.22 -215.22 0.10
akt akt erk tll 26.10 664.89 -1277.59 -1153.92 0.67
akt pka; erk bb8 2.00 122.11 -240.21 -230.73 0.26

(b) MC
Node  Pair copula  Family  Effective # parameters  Copula log-lik.  AIC  BIC  Est. Kendall's tau
plc plc pip3 tll 20.88 125.75 -209.73 -110.76 0.24
pip2 pip2 pip3 tll 20.32 174.14 -307.64 -211.35 0.05
pip2 plc; pip3 tll 21.12 116.59 -190.95 -90.87 0.29
pkc pkc pip2 clayton 1.00 1.84 -1.68 3.06 -0.01
pka pka pkc frank 1.00 1.13 -0.26 4.48 -0.03
p38 p38 pkc tll 25.69 513.63 -975.89 -854.14 0.60
p38 pka; pkc joe 1.00 1.69 -1.39 3.35 -0.01
jnk jnk pkc tll 19.62 83.69 -128.13 -35.14 0.15
jnk pka; pkc gumbel 1.00 1.10 -0.21 4.53 0.00
raf raf pka tll 13.53 18.11 -9.16 54.97 0.05
raf pkc; pka clayton 1.00 1.21 -0.43 4.31 0.01
mek mek raf tll 20.84 301.89 -562.11 -463.35 0.48
mek pkc; raf gaussian 1.00 4.67 -7.33 -2.59 -0.06
mek pka; raf pkc gaussian 1.00 2.17 -2.34 2.39 0.01
erk erk pka tll 18.81 167.42 -297.22 -208.06 0.10
akt akt erk tll 27.80 652.93 -1250.27 -1118.54 0.67
akt pka; erk bb8 2.00 124.38 -244.75 -235.27 0.26

(c) MC
Node  Pair copula  Family  Copula log-lik.  AIC  BIC  Par.  Est. Kendall's tau
plc plc pip3 gaussian 57.72 -113.43 -108.69 0.36 0.24
pip2 pip2 pip3 gaussian 88.80 -175.61 -170.87 0.44 0.05
pkc pkc pip2 gaussian 1.73 -1.46 3.28 0.06 -0.01
p38 p38 pkc gaussian 354.14 -706.28 -701.54 0.75 0.60
jnk jnk pkc gaussian 15.54 -29.08 -24.34 0.19 0.15
raf raf pka gaussian 1.08 -0.15 4.59 0.05 0.05
raf pkc; pka gaussian 1.08 -0.15 4.59 0.05 0.01
mek mek raf gaussian 227.58 -453.17 -448.43 0.65 0.48
mek pkc; raf gaussian 5.27 -8.55 -3.81 -0.11 -0.06
erk erk pka gaussian 29.16 -56.33 -51.59 0.26 0.10
akt akt erk gaussian 530.82 -1059.63 -1054.89 0.85 0.67
akt pka; erk gaussian 184.92 -367.84 -363.10 0.60 0.26

Table A3: Summary of all copulas fitted in the different D-vine regression models
(a) MC
(b) MC
(c) MC
Figure A12: Normalized contour plots of the fitted pair copulas for the data set under the three marginal specifications


Supplementary Material

S1: Comparison of linear Gaussian Bayesian network and structural equation model based on a Gaussian copula with Gaussian margins