Multivariate Conditional Transformation Models

06/07/2019 ∙ by Nadja Klein, et al. ∙ Humboldt-Universität zu Berlin

Regression models describing the joint distribution of multivariate response variables conditional on covariate information have become an important aspect of contemporary regression analysis. However, a limitation of such models is that they often rely on rather simplistic assumptions, e.g. a constant dependency structure that is not allowed to vary with the covariates. We propose a general framework for multivariate conditional transformation models that overcomes such limitations and describes the full joint distribution in simple, interpretable terms. Among the particular merits of the framework are that it can be embedded into likelihood-based inference and allows the dependence structure to vary with the covariates. In addition, the framework scales beyond bivariate response situations, which were the main focus of most earlier investigations. We illustrate the application of multivariate conditional transformation models in a trivariate analysis of childhood undernutrition and demonstrate empirically that even complex multivariate data-generating processes can be inferred from observations.




1 Introduction

By definition, univariate regression models involve only a single response variable. However, many research questions involve two or more response variables that are expected to be associated with each other. The separate analysis of each of these variables may lead to model misspecification, biased estimators and information loss. A more appropriate approach in this context is a multivariate regression model, in which all response variables are modeled jointly, conditional on a set of covariates.

Among various multivariate regression models, the most commonly used one is the seemingly unrelated regression (SUR) model, which uses a vector of correlated normal error terms to combine several linear regression specifications with a common correlation structure that does not depend on any of the covariates (Zellner, 1962). Lang et al. (2003) extended SUR models by replacing the frequently used linear predictor with a structured additive predictor that combines nonlinear, spatial and random effects, while retaining the assumption that the correlation structure does not depend on the covariates. Multivariate probit models use a latent SUR model for a multivariate set of latent utilities that, via thresholds, are transformed to the observed binary response vector (Heckman, 1978). The approach of Klein, Kneib, Klasen and Lang (2015) embeds bivariate SUR-type specifications into generalised additive models for location, scale and shape (GAMLSS, Rigby and Stasinopoulos, 2005) by allowing all distribution parameters to be related to additive predictors.

Beyond SUR-type models, copulas provide a flexible approach to the construction of multivariate distributions and regression models. As a major advantage, the model-building process is conveniently decomposed into the specification of the marginals, on the one hand, and the selection of an appropriate copula function that fully defines the dependence structure, on the other hand (see Joe, 1997; Nelsen, 2006, for reviews on copula models and their properties). The copula modeling framework allows a high level of flexibility in terms of model formulation, as arbitrary margins can be linked together using copula functions.

Analogous to GAMLSS, bivariate copula models with parametric marginal distributions, one-parameter copulas and semiparametric specifications for the predictors of all parameters of both the marginals and the copula have been developed by Marra and Radice (2017), using a penalised likelihood framework, and by Klein and Kneib (2016), using a Bayesian framework. Following these lines, Marra and Radice (2019) recently extended the framework to copula link-based survival models. Alternatives to simultaneous estimation are two-step procedures that first estimate the marginals and then the copula given the marginals; these have been proposed by, e.g., Vatter and Chavez-Demoulin (2015) and Yee (2015). However, all of these approaches are limited to the bivariate case.

Nonparametric attempts to simultaneously study multivariate response variables have been reported in the context of multivariate quantiles. Because no natural ordering exists beyond univariate settings, definitions of multivariate quantiles are challenging, and there has been considerable debate regarding their desirable properties (see Serfling, 2002, for an introduction to the different definitions). For example, one group of definitions draws on the concept of data depth (see, for example, Mosler, 2010), utilising multivariate depth functions based either on distances, such as the Mahalanobis and Oja depths or weighted distances, or on half-spaces. For more information on depth functions and multivariate quantiles, we refer the reader to Chernozhukov et al. (2017) and Carlier et al. (2016, 2017).

To address the lack of a coherent approach to multivariate regression analysis, we propose a novel class of multivariate conditional transformation models (MCTMs). These allow direct estimation of, and inference for, the multivariate conditional cumulative distribution function (CDF) of a J-dimensional response vector given covariate information under rather weak assumptions. A key feature of MCTMs is that they extend likelihood-based inference in univariate conditional transformation models (CTMs, Hothorn et al., 2018) to the multivariate situation in a natural way. As in the case of copulas, a feature of the model specification process is that joint distributions are constructed by their decomposition into marginals and the dependence structure. Unlike in many existing copula models, however, model estimation is performed simultaneously for all model components, thus avoiding the need for two-step estimators. One of the major advantages of our approach is that it neither requires strong parametric assumptions (unlike multivariate GAMLSS) nor separates the model estimation process into local properties (unlike multivariate quantiles). At the same time, model inference is conveniently embedded in the maximum likelihood framework, such that estimation is done simultaneously for all parameters of the model and theoretical results on optimality properties, such as consistency and asymptotic normality, can be derived as well. Gaussian copulas (Song, 2000) with arbitrary marginal distributions are treated as a special case in this paper, where both the marginal distributions and the correlation parameters of the copula can be covariate-dependent. The method scales well to situations beyond the bivariate case (J > 2) and readily allows determination of both the marginal distributions of subsets of the response vector and the conditional distributions of some response elements, given the others.

The paper is structured as follows: Section 2 provides details on the specification of multivariate transformation models for the unconditional case of absolutely continuous responses. Likelihood-based inference, optimality properties and extensions to discrete and censored responses are treated in the subsequent Section 3. Section 4 considers covariate-dependent multivariate conditional transformation models, illustrated by a trivariate analysis of childhood undernutrition indicators. Section 5 summarises the simulation-based evidence on the model’s performance. Finally, Section 6 proposes directions for future research.

2 Multivariate Transformation Models

2.1 Basic Model Setup

First, unconditional transformation models are developed for the joint multivariate distribution of a J-dimensional, absolutely continuous random response vector Y with density f_Y and CDF F_Y. These unconditional models are then extended to the regression case in Section 4.

The key component of multivariate transformation models is an unknown, bijective, strictly monotonically increasing transformation function h. This function maps the vector Y = (Y_1, …, Y_J)^T, whose distribution is unknown and is estimated from data, to a set of J independent and identically distributed, absolutely continuous random variables Z_1, …, Z_J with an a priori defined distribution F_Z, such that Z = (Z_1, …, Z_J)^T = h(Y).
For an absolutely continuous distribution F_Z with log-concave density f_Z, it can easily be shown that a unique, monotonically increasing transformation function h exists for arbitrary, absolutely continuous distributions of Y (Hothorn et al., 2018). Thus, the model class is effectively limited only by the flexibility of the specific choice of h in an actual model. As a default for F_Z, for reasons of convenience we consider the standard normal distribution (i.e. F_Z = Φ with mean zero and variance one) but discuss alternative choices in Section 2.5.
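The role of the reference distribution can be illustrated with a small numerical sketch (not from the paper): for any absolutely continuous response with CDF F, the composition Φ^{-1} ∘ F is exactly such a monotone transformation to the standard normal. The exponential marginal used below is a hypothetical choice for illustration.

```python
import math
import random
from statistics import NormalDist

# Illustration (not from the paper): for a continuous Y with CDF F,
# the transformation h(y) = Phi^{-1}(F(y)) maps Y to a standard normal Z.
# Here Y is hypothetically exponential with rate 1, so F(y) = 1 - exp(-y).

std_normal = NormalDist()  # mean 0, variance 1

def h(y):
    """Monotone transformation function for an exponential(1) response."""
    return std_normal.inv_cdf(1.0 - math.exp(-y))

random.seed(1)
y_sample = [random.expovariate(1.0) for _ in range(20000)]
z_sample = [h(y) for y in y_sample]

mean_z = sum(z_sample) / len(z_sample)
var_z = sum((z - mean_z) ** 2 for z in z_sample) / len(z_sample)
print(round(mean_z, 2), round(var_z, 2))
```

The transformed sample is (up to Monte Carlo error) standard normal, regardless of the shape of the original marginal.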

Under this transformation model, the task of estimating the distribution of Y simplifies to the task of estimating h. Because h is strictly monotonically increasing in each element, it has a positive definite Jacobian, i.e. det(∂h(y)/∂y) > 0.

The density of Y implied by the transformation model is then

f_Y(y) = ∏_{j=1}^J f_Z(h_j(y)) · det(∂h(y)/∂y).

However, in this generality, the model is cumbersome. Thus, in the following, we introduce simplified parameterisations of that lead to interpretable models.

2.2 Models with Recursive Structure

In a first step, we impose a triangular structure on the transformation function by assuming h_j(y) = h_j(y_1, …, y_j), i.e. the j-th component of the transformation function depends only on the first j elements of its argument y. Consequently, this model formulation depends inherently on the ordering of the elements in Y. Because any multivariate distribution can be factored into a sequence of conditional distributions, the triangular structure does not pose a limitation in the representation of any J-dimensional continuous distribution, as long as the transformation functions are appropriately chosen. Furthermore, the triangular structure of h considerably simplifies the determinant of the Jacobian (2), which reduces to

det(∂h(y)/∂y) = ∏_{j=1}^J ∂h_j(y_1, …, y_j)/∂y_j.

In a second step, we assume that the triangularly structured transformation functions are linear combinations of marginal transformation functions h̃_1, …, h̃_J, i.e.

h_j(y_1, …, y_j) = ∑_{i=1}^{j} λ_{ji} h̃_i(y_i),

where each h̃_i increases strictly monotonically and λ_{jj} > 0 for all j to ensure the bijectivity of h. Because the last coefficient, λ_{jj}, cannot be separated from the marginal transformation function h̃_j, we use the restriction λ_{jj} = 1. Thus, the parameterisation of the transformation function finally reads

h_j(y_1, …, y_j) = ∑_{i=1}^{j-1} λ_{ji} h̃_i(y_i) + h̃_j(y_j).


Each of the marginal transformation functions includes an intercept, such that no additional intercept term can be inserted in (3). The determinant of the Jacobian of h further simplifies to ∏_{j=1}^J ∂h̃_j(y_j)/∂y_j, and the model-based density function for Y is therefore

f_Y(y) = ∏_{j=1}^J [ f_Z( ∑_{i=1}^{j} λ_{ji} h̃_i(y_i) ) · ∂h̃_j(y_j)/∂y_j ].

Summarising the model’s specifications, our multivariate transformation model is characterised by a set of marginal transformations h̃_j, j = 1, …, J, each applying to only a single component of the vector y, and by a lower triangular matrix Λ = (λ_{ji}) of transformation coefficients, with unit diagonal elements λ_{jj} = 1 and λ_{ji} = 0 for i > j.

Under the standard normal reference distribution F_Z = Φ, the coefficients in Λ characterise the dependence structure via a Gaussian copula (see Section 2.3), while the marginal transformation functions allow the generation of arbitrary marginal distributions for the components of Y.
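A minimal numerical sketch of the role of Λ, assuming the bivariate case J = 2 with a single hypothetical coefficient λ21: solving the triangular system recovers the transformed responses, whose covariance Λ^{-1}Λ^{-T} has a simple closed form.

```python
import random

# Bivariate sketch (J = 2): with reference Z ~ N(0, I) and
#   z1 = h1(y1)         = zt1
#   z2 = h2(y1, y2)     = lam21 * zt1 + zt2,
# the transformed responses zt = Lambda^{-1} z are N(0, Sigma) with
# Sigma = Lambda^{-1} Lambda^{-T}; for J = 2 this gives
# Sigma = [[1, -lam21], [-lam21, 1 + lam21**2]].

lam21 = 0.7  # hypothetical coefficient, chosen for illustration

# analytic correlation implied by the Gaussian copula
rho = -lam21 / (1.0 + lam21 ** 2) ** 0.5

# Monte Carlo check: draw z iid N(0, 1), invert the triangular system
random.seed(2)
n = 50000
zt1 = [random.gauss(0, 1) for _ in range(n)]
z2 = [random.gauss(0, 1) for _ in range(n)]
zt2 = [z2[i] - lam21 * zt1[i] for i in range(n)]  # zt2 = z2 - lam21 * zt1

mean1 = sum(zt1) / n
mean2 = sum(zt2) / n
cov = sum((zt1[i] - mean1) * (zt2[i] - mean2) for i in range(n)) / n
var1 = sum((v - mean1) ** 2 for v in zt1) / n
var2 = sum((v - mean2) ** 2 for v in zt2) / n
rho_mc = cov / (var1 * var2) ** 0.5
print(round(rho, 3), round(rho_mc, 3))
```

The empirical correlation of the transformed pair agrees with the analytic value -λ21/√(1 + λ21²) implied by the lower-triangular parameterisation.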

2.3 Relation to Gaussian Copula Models

The relationship between multivariate transformation models and Gaussian copulas can be made more precise by defining random variables Z̃_j = h̃_j(Y_j). Under a standard normal reference distribution F_Z = Φ, the vector Z̃ = (Z̃_1, …, Z̃_J)^T follows a zero-mean multivariate normal distribution with covariance matrix Σ = Λ^{-1} Λ^{-T}. As a consequence, the elements of Z̃ are marginally normally distributed as Z̃_j ~ N(0, σ_j^2), where the variances σ_j^2 can be determined from the diagonal elements of Σ. Furthermore, this implies that the entries in Λ determine the conditional independence structure between the transformed responses (and therefore, implicitly, also the observed responses Y) because

Z̃_i ⊥ Z̃_j | Z̃_{-(i,j)}  if and only if  Ω_{ij} = 0,

where Ω = Σ^{-1} = Λ^T Λ is the precision matrix of Z̃, and Y_{-(i,j)} and Z̃_{-(i,j)} denote the vectors of the observed and transformed responses, excluding elements i and j.

For the transformation functions h̃_j, the explicit representation

h̃_j(y_j) = σ_j Φ^{-1}(F_j(y_j))

is obtained, where F_j is the univariate marginal CDF of Y_j. In summary,

F_Y(y) = Φ_Σ( σ_1 Φ^{-1}(F_1(y_1)), …, σ_J Φ^{-1}(F_J(y_J)) ),

and therefore the CDF of Y has exactly the same structure as a Gaussian copula, except that our representation relies on a different parameterisation of Σ through Λ. This is compensated for by the inclusion of univariate Gaussians with variances σ_j^2 different from one acting on the marginals. However, because standardising by the marginal standard deviations σ_j leaves the copula of a multivariate normal distribution unchanged, the two formulations are exactly equivalent.

2.4 Model Implied Marginal and Conditional Distributions

The relationship of unconditional transformation models to Gaussian copulas can now be employed to facilitate the derivation of model-implied marginal and conditional distributions. The univariate marginal distributions of the elements Y_j are given by

F_j(y_j) = Φ( h̃_j(y_j) / σ_j ),

but more general versions are also easily obtained. Assume that the random vector Y is partitioned into Y_A and Y_B, where A is a non-empty set of indices and B is its complement, consisting of all indices not contained in A. The vectors Z̃_A and Z̃_B can be similarly defined. The marginal distribution of Z̃_A is then given by Z̃_A ~ N(0, Σ_A), where Σ_A is the submatrix of Σ containing the elements related to the subset A. We can therefore deduce both the marginal CDF and the density of Y_A as those of the corresponding multivariate normal distribution evaluated at z̃_A = (h̃_j(y_j))_{j ∈ A}, multiplied, in the case of the density, by the Jacobian terms of the marginal transformations.
We use a similar process for the conditional distribution of Y_A, given Y_B, and note that Z̃_A | Z̃_B = z̃_B is again multivariate normal. From the rules for multivariate normal distributions we furthermore obtain

Z̃_A | Z̃_B = z̃_B ~ N( Σ_{AB} Σ_B^{-1} z̃_B, Σ_A − Σ_{AB} Σ_B^{-1} Σ_{BA} ),

where Σ_{AB} and Σ_{BA} denote the sub-blocks of Σ corresponding to the respective index sets. Therefore, the conditional density of Y_A given Y_B follows by the change of variables y_A ↦ z̃_A.

Finally, using the marginal CDFs and densities, the marginal quantiles and moments can be derived. The latter can be computed by solving simple univariate numerical integrals, for example

E(Y_j) = ∫ y f_j(y) dy for the marginal means or Var(Y_j) = ∫ (y − E(Y_j))^2 f_j(y) dy for the marginal variances.
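These univariate integrals can be sketched numerically. The example below assumes, purely for illustration, σ_j = 1 and a hypothetical linear marginal transformation h̃(y) = (y − 2)/3, so that Y_j ~ N(2, 9) and the true moments are known for checking.

```python
import math

# Sketch: marginal moments via univariate numerical integration.
# Assume (for illustration only) sigma_j = 1 and a linear marginal
# transformation h_tilde(y) = (y - 2) / 3, so that Y_j ~ N(2, 9) and
# the model-implied marginal density is f(y) = phi(h_tilde(y)) * h_tilde'(y).

def phi(z):
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def h_tilde(y):
    return (y - 2.0) / 3.0

def f_marginal(y):
    return phi(h_tilde(y)) * (1.0 / 3.0)  # h_tilde'(y) = 1/3

def trapezoid(g, a, b, n=4000):
    """Composite trapezoidal rule for the integral of g over [a, b]."""
    step = (b - a) / n
    total = 0.5 * (g(a) + g(b))
    for i in range(1, n):
        total += g(a + i * step)
    return total * step

mean = trapezoid(lambda y: y * f_marginal(y), -25.0, 29.0)
var = trapezoid(lambda y: (y - mean) ** 2 * f_marginal(y), -25.0, 29.0)
print(round(mean, 3), round(var, 3))
```

The quadrature recovers the known mean 2 and variance 9; for nonlinear Bernstein-type transformations the same integrals would simply use the corresponding density.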

2.5 Alternative Reference Distributions

Although we have discussed our model specification in the context of a normal reference distribution and a Gaussian copula, these choices can readily be modified. In particular, if a reference distribution F_Z other than the standard normal is chosen, the transformation function has to be modified accordingly. We therefore obtain independent random variables Z_j ~ F_Z, and the model then implies corresponding marginal distributions for the components of Y.

Attractive alternative choices for the reference distribution, including for the regression setup, are the standard logistic distribution and the minimum extreme value distribution, because regression coefficients can then be interpreted as log-odds ratios and log-hazard ratios respectively (in fact, in the latter case, the marginal model is a Cox proportional hazards model) (see Hothorn et al., 2018, for a discussion of such models in the univariate context).

Extensions beyond the Gaussian copula structure are also conceivable when the linear combination of marginal transformations is replaced by nonlinear specifications. However, those types of models easily lead to identification problems and do not provide direct links to existing parametric copula classes. Accordingly, we leave this topic for future research.

3 Transformation Analysis

This section defines the maximum likelihood estimator and establishes its consistency and asymptotic normality, based on suitable parameterisations of the marginal transformation functions h̃_j.

3.1 Inference

Following Hothorn et al. (2018), the marginal transformation functions are parameterised as linear combinations of the basis-transformed argument y_j, such that h̃_j(y_j) = a_j(y_j)^T θ_j. The basis functions a_j, with their corresponding derivatives a_j', are problem-specific. Thus, for continuous responses Y_j, the marginal transformation functions, and therefore also the plug-in estimators of the marginal CDFs, should be smooth with respect to y_j, such that in principle any polynomial or spline basis is a suitable choice for a_j. The empirical results of Section 4 rely on Bernstein polynomials. The basis functions are then densities of beta distributions, a choice that is computationally appealing because strict monotonicity can be formulated as a set of linear constraints on the components of the parameters θ_j; see Curtis and Ghosh (2011) and Farouki (2012) for details, and the discussion in Hothorn et al. (2018) for further options.
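A small sketch of this parameterisation (the order M = 5, interval [a, b] and coefficient vector below are hypothetical choices, not values from the paper): a Bernstein polynomial with strictly increasing coefficients yields a strictly increasing transformation, so monotonicity is indeed a set of linear constraints on θ.

```python
import math

# Sketch of a Bernstein-polynomial parameterisation (order M = 5 assumed
# for illustration): h(y) = sum_k theta_k * b_{k,M}(y) on [a, b].
# Strict monotonicity of h is a linear constraint: theta_0 < theta_1 < ...

M = 5
a, b = -2.0, 2.0

def bernstein_basis(y):
    """Evaluate the M + 1 Bernstein basis polynomials at y in [a, b]."""
    t = (y - a) / (b - a)
    return [math.comb(M, k) * t ** k * (1.0 - t) ** (M - k)
            for k in range(M + 1)]

def h(y, theta):
    return sum(t_k * b_k for t_k, b_k in zip(theta, bernstein_basis(y)))

# Increasing coefficients -> monotonically increasing transformation.
theta = [-2.0, -1.5, -0.2, 0.4, 1.1, 2.5]

grid = [a + i * (b - a) / 200 for i in range(201)]
values = [h(y, theta) for y in grid]
is_increasing = all(v1 < v2 for v1, v2 in zip(values, values[1:]))
print(is_increasing)
```

A convenient side effect of this basis is that h(a) = θ_0 and h(b) = θ_M, so the coefficients directly bound the range of the transformation.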

In the following, we denote the set of parameters describing all marginal transformation functions as θ = (θ_1^T, …, θ_J^T)^T, while λ contains all unknown elements of Λ, and ϑ = (θ^T, λ^T)^T comprises all unknown model parameters. The parameter space is denoted as Θ, corresponding to the space of all strictly monotonic triangular transformation functions. Consequently, the problem of estimating the unknown transformation function h, and thus the unknown distribution function F_Y, reduces to the problem of estimating the parameter vector ϑ. With the construction of multivariate transformation models, this is conveniently achieved using likelihood-based inference. For an independent sample of size N, the log-likelihood contribution of a given datum y_n = (y_{n1}, …, y_{nJ})^T, n = 1, …, N, is

ℓ_n(ϑ) = ∑_{j=1}^J [ log f_Z( ∑_{i=1}^{j} λ_{ji} h̃_i(y_{ni}) ) + log( ∂h̃_j(y_{nj})/∂y_{nj} ) ],

with corresponding score contributions obtained by differentiating ℓ_n with respect to the elements of θ_j and λ_{ji} for i ≤ j (and zero otherwise). We furthermore define the n-th contribution to the observed Fisher information accordingly. Explicit expressions for the entries are given in Appendix A. Note that, despite the estimation of a fairly complex multivariate distribution with a Gaussian copula dependence structure and arbitrary marginals, the log-likelihood contributions have a very simple form. In addition, the log-concavity of f_Z ensures the concavity of the log-likelihood and thus the existence and uniqueness of the estimated transformation function ĥ.
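The simple form of this log-likelihood can be verified numerically. The toy model below (illustration only) assumes identity marginal transformations h̃_j(y) = y and a single hypothetical coefficient λ21, in which case the transformation-model log-likelihood must coincide with the bivariate normal log-density with precision matrix Λ^T Λ.

```python
import math

# Numerical check of the continuous log-likelihood contribution for a
# toy bivariate model (illustration only): identity marginal
# transformations h_tilde_j(y) = y and a single coefficient lam21.
# Then Y ~ N(0, Sigma) with Sigma = Lambda^{-1} Lambda^{-T}, and the
# transformation-model log-likelihood must equal the bivariate normal
# log-density.

lam21 = -0.4
y1, y2 = 0.8, -1.3

def log_phi(z):
    return -0.5 * math.log(2.0 * math.pi) - 0.5 * z * z

# transformation-model form: sum_j log f_Z(h_j(y_1,...,y_j)) + log-Jacobian
# (the Jacobian term vanishes because h_tilde_j'(y) = 1)
loglik = log_phi(y1) + log_phi(lam21 * y1 + y2)

# direct bivariate normal log-density with precision Lambda^T Lambda
# (note det Sigma = 1 because Lambda has unit diagonal)
quad = (1 + lam21 ** 2) * y1 ** 2 + 2 * lam21 * y1 * y2 + y2 ** 2
logpdf = -math.log(2.0 * math.pi) - 0.5 * quad
print(round(loglik, 6), round(logpdf, 6))
```

The two expressions agree exactly, since the unit diagonal of Λ forces det Σ = 1.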

Definition 1.

(Maximum likelihood estimator.) The maximum likelihood estimator for the parameters of a multivariate transformation model is given by

ϑ̂_N = argmax_{ϑ ∈ Θ} ∑_{n=1}^N ℓ_n(ϑ).

Based on the maximum likelihood estimator ϑ̂_N, maximum likelihood estimators for the marginal and joint CDFs are also obtained by plugging in ϑ̂_N. Specifically, the estimated marginal CDFs are given by F̂_j(y_j) = Φ( h̃_j(y_j; ϑ̂_N) / σ̂_j ), where σ̂_j^2 is the j-th diagonal entry of Σ̂ = Λ̂^{-1} Λ̂^{-T}. The estimated joint CDF reads

F̂_Y(y) = Φ_Σ̂( h̃_1(y_1; ϑ̂_N), …, h̃_J(y_J; ϑ̂_N) ).

3.2 Parametric Inference

The direct extension of univariate likelihood-based inference for transformation models (Hothorn et al., 2018) to the multivariate case discussed herein also allows asymptotic results for the former to be applied to the latter situation. Assume ϑ_0 ∈ Θ, where ϑ_0 denotes the true parameter vector; then the following assumptions are made:

  • (A1) The parameter space Θ is compact.

  • (A2) The expected log-likelihood is continuous in ϑ and uniquely maximised at ϑ_0.

  • (A3) The Fisher information matrix is nonsingular.

  • (A4)–(A5) The log-likelihood contributions satisfy standard moment and smoothness conditions in a neighbourhood of ϑ_0.

Corollary 1.

Assuming (A1)–(A2), the sequence of estimators ϑ̂_N converges in probability to the true parameter vector, ϑ̂_N → ϑ_0, as N → ∞.

The proof of Corollary 1 follows from Theorem 5.8 of van der Vaart (1998).

Corollary 2.

Assuming (A1)–(A5), the sequence of estimators ϑ̂_N is asymptotically normally distributed, with covariance matrix given by the inverse Fisher information.

By further assumption, the log-likelihood contribution is continuously differentiable in ϑ for all y_n, and the score is continuous in ϑ, due to Eqs. (7) and (8). Thus, by Lemma 7.6 of van der Vaart (1998), the model is differentiable in quadratic mean. Furthermore, based on Assumptions (A3)–(A5) and Corollary 1, the claim follows from Theorem 5.39 of van der Vaart (1998). ∎

3.3 Parametric Bootstrap

The asymptotic results allow, at least in principle, the derivation of confidence intervals also for transformed model parameters by using the delta method. However, many quantities of practical interest, such as the correlation matrix of the implied Gaussian copula or the densities of marginal distributions, are highly nonlinear functions of the parameter vector ϑ. Accordingly, a parametric bootstrap is a more promising alternative.

Because the proposed multivariate transformation models allow a direct evaluation of estimated joint CDFs, drawing bootstrap samples from the joint distribution is straightforward. For b = 1, …, B bootstrap samples, and with estimated marginal transformation functions h̃_j (evaluated at ϑ̂_N) and estimated covariance matrix Σ̂, the parametric bootstrap can be implemented by Algorithm 1.

Given h̃_1, …, h̃_J (evaluated at ϑ̂_N) and Σ̂:

1: for b = 1, …, B do ▷ Loop over bootstrap samples
2:     Generate z_b ~ N_J(0, I_J)
3:     Compute z̃_b = Λ̂^{-1} z_b
4:     Compute y*_b recursively from y*_{bj} = h̃_j^{-1}(z̃_{bj}), j = 1, …, J
5:     Re-fit the model to obtain ϑ̂*_b.
6:     Compute summaries using θ̂*_b, λ̂*_b
Algorithm 1 Parametric bootstrap

The inverse h̃_j^{-1} exists because the estimated marginal distribution function is strictly monotonically increasing. For simple basis functions (for example, linear functions), the inverse can be computed analytically. For more complex basis functions, numerical inversion has to be applied.
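The sampling steps 2–4 of Algorithm 1 can be sketched as follows. All "fitted" quantities below (linear marginal transformations with parameters mu, s, and a coefficient lam21) are hypothetical stand-ins for illustration, and the re-fitting step 5 is omitted.

```python
import random

# Sketch of the sampling steps of Algorithm 1 (bivariate case, with a
# hypothetical fitted model): linear marginal transformations
# h_tilde_j(y) = (y - mu_j) / s_j (analytically invertible) and a fitted
# coefficient lam21. Draw z ~ N(0, I), solve the triangular system for
# zt = Lambda^{-1} z, then invert the marginals: y*_j = h_tilde_j^{-1}(zt_j).

lam21 = 0.6
mu = [1.0, -0.5]   # hypothetical fitted marginal parameters
s = [2.0, 0.5]

def h_tilde_inv(j, z):
    return mu[j] + s[j] * z

random.seed(3)
n = 40000
ys = []
for _ in range(n):
    z1, z2 = random.gauss(0, 1), random.gauss(0, 1)
    zt1 = z1                 # forward substitution with unit diagonal
    zt2 = z2 - lam21 * zt1
    ys.append((h_tilde_inv(0, zt1), h_tilde_inv(1, zt2)))

# sample correlation of the bootstrap draws
m1 = sum(y[0] for y in ys) / n
m2 = sum(y[1] for y in ys) / n
cov = sum((y[0] - m1) * (y[1] - m2) for y in ys) / n
v1 = sum((y[0] - m1) ** 2 for y in ys) / n
v2 = sum((y[1] - m2) ** 2 for y in ys) / n
rho_hat = cov / (v1 * v2) ** 0.5
rho_model = -lam21 / (1.0 + lam21 ** 2) ** 0.5
print(round(rho_hat, 3), round(rho_model, 3))
```

The bootstrap draws reproduce the correlation implied by the fitted copula parameters, as they must, since correlations are invariant under the strictly increasing marginal inversions.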

3.4 Extensions to Discrete and Censored Observations

Conceptually, our framework easily carries over to multivariate random vectors that are discrete or censored. In particular, both discrete and censored data can be interpreted as incomplete information, where, instead of exact observations y, only lower and upper boundaries y̲ and ȳ are observed, with y ∈ (y̲, ȳ]. For discrete data, the underlying rationale would be that discrete realisations are obtained by discretisation of an underlying continuous process. For censored data, the interval boundaries result from the censoring mechanism, and random right censoring, left censoring as well as interval censoring can be handled by appropriate choices of y̲ and ȳ.

For F_Z = Φ, the log-likelihood contributions are then given by

ℓ_n(ϑ) = log P( z̲_n < Z̃ ≤ z̄_n ) with Z̃ ~ N_J(0, Σ),

where z̲_{nj} = h̃_j(y̲_{nj}) and z̄_{nj} = h̃_j(ȳ_{nj}). Numerical approximations need to be applied in evaluating these likelihood contributions. For J ≥ 2, the quasi-Monte-Carlo algorithm by Genz (1992) seems especially appropriate because it relies on the Cholesky factor of the precision matrix Λ^T Λ rather than the covariance matrix Σ. Nonetheless, the simplicity and explicit structure of the score contributions from the previous sections do not carry over to these more general cases, and the numerical evaluation of the log-likelihood becomes more demanding.
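The rectangle probabilities entering these likelihood contributions can be sketched numerically. The example below uses plain Monte Carlo rather than the Genz quasi-Monte-Carlo algorithm, assumes identity marginal transformations (so the transformed bounds equal the observed bounds), and uses the independence case λ21 = 0 as an analytic check.

```python
import random
from statistics import NormalDist

# Sketch of an interval-censored likelihood contribution: the probability
# that Y falls in the hyperrectangle (y_lower, y_upper] equals the
# N(0, Sigma) probability of the transformed rectangle. Here we estimate
# it by plain Monte Carlo (the paper points to the quasi-Monte-Carlo
# algorithm of Genz (1992) instead); identity marginal transformations
# are assumed for illustration, so the zt bounds equal the y bounds.

def rect_prob_mc(lam21, lo, hi, n=200000, seed=4):
    """P(lo < Zt <= hi) for Zt = Lambda^{-1} Z, Z iid N(0, 1), J = 2."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n):
        z1, z2 = rng.gauss(0, 1), rng.gauss(0, 1)
        zt1, zt2 = z1, z2 - lam21 * z1
        if lo[0] < zt1 <= hi[0] and lo[1] < zt2 <= hi[1]:
            hits += 1
    return hits / n

lo, hi = (-1.0, -0.5), (0.5, 1.5)

# independence case (lam21 = 0): the probability factorises over margins
Phi = NormalDist().cdf
exact_indep = (Phi(hi[0]) - Phi(lo[0])) * (Phi(hi[1]) - Phi(lo[1]))
p_mc = rect_prob_mc(0.0, lo, hi)
print(round(p_mc, 3), round(exact_indep, 3))
```

For λ21 ≠ 0 the same estimator applies, but no closed form is available, which is exactly why efficient quasi-Monte-Carlo routines matter in higher dimensions.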

4 Extensions to Multivariate Regression

4.1 Multivariate Conditional Transformation Models

Multivariate regression models, i.e. models for conditional multivariate distributions given a specific configuration of covariates, can be derived from the unconditional multivariate transformation models introduced in Section 2. The transformation function has to be extended to include a potential dependency on covariates x, and the corresponding joint CDF is defined by a conditional transformation function h(y | x). By extending the unconditional transformation function (3), we define the components of a multivariate conditional transformation function given covariates x as

h_j(y_1, …, y_j | x) = ∑_{i=1}^{j-1} λ_{ji}(x) h̃_i(y_i | x) + h̃_j(y_j | x),

where the marginal transformation functions h̃_j(y_j | x) and the coefficients λ_{ji}(x) are again expressed in terms of basis function expansions.

For the marginal (with respect to the response Y_j) conditional (given covariates x) transformation functions, this leads to a parameterisation

h̃_j(y_j | x) = a_j(y_j, x)^T θ_j,

where the basis functions a_j(y_j, x), in general, depend on both the element Y_j of the response and the covariates x. These can, for example, be constructed as a composition of the basis functions a_j(y_j) for Y_j only from the previous section, combined with a basis b(x) depending exclusively on x. Specifically, a purely additive model results from stacking the two bases, and a flexible interaction from the tensor product a_j(y_j) ⊗ b(x). A simple linear transformation model for the marginal conditional distribution can be parameterised as

h̃_j(y_j | x) = a_j(y_j)^T θ_j − x^T β_j,

with parameters θ_j and β_j. This model restricts the impact of the covariates to a linear shift. For arbitrary choices of F_Z, the marginal distribution given covariates x is then

F_j(y_j | x) = F_Z( a_j(y_j)^T θ_j − x^T β_j )

and, consequently, the regression coefficients β_j can be directly interpreted as marginal log-odds ratios (for the standard logistic F_Z) or log-hazard ratios (for the minimum extreme value F_Z; this is a marginal Cox model). Details of the parameterisations for the marginal transformation functions and a discussion of the practical aspects in different areas of application are provided in Hothorn et al. (2018).
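The log-odds interpretation can be demonstrated numerically. In the sketch below, the baseline transformation h and the coefficient β are hypothetical choices for illustration; the point is that, with a standard logistic reference, the log-odds difference between two covariate values is constant in y and equal to the shift.

```python
import math

# Sketch: with a standard logistic reference F_Z and a linear shift
# h_tilde(y | x) = h(y) - beta * x, the covariate effect beta acts as a
# constant log-odds ratio. The baseline transformation h and the value
# of beta below are hypothetical choices for illustration.

def expit(z):
    return 1.0 / (1.0 + math.exp(-z))

def h(y):          # some strictly increasing baseline transformation
    return 0.5 * y + 0.1 * y ** 3

beta = 0.8

def cdf(y, x):
    return expit(h(y) - beta * x)

def log_odds(y, x):
    p = cdf(y, x)
    return math.log(p / (1.0 - p))

# the log-odds difference between x = 1 and x = 0 equals -beta at every y
diffs = [log_odds(y, 1.0) - log_odds(y, 0.0) for y in (-2.0, 0.0, 1.5)]
print([round(d, 6) for d in diffs])
```

Since log(p/(1 − p)) inverts the logistic link exactly, the difference is −β at every response value, which is the defining property of a proportional-odds model.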

For practical applications, an important and attractive feature of multivariate transformation models for multivariate regression is the possible dependency of Λ on covariates x. Thus, the dependence structure of Y potentially changes as a function of x, if suggested by the data. This feature is implemented by covariate-dependent coefficients λ_{ji}(x) of Λ(x). A simple linear model of the form

λ_{ji}(x) = β_{ji0} + x^T β_{ji}

is one option. The case β_{ji} = 0 implies that the correlation between Y_i and Y_j does not depend on x. More complex forms of additive models would also be suitable. Of course, the number of parameters grows quadratically in J, such that models that are too complex may require additional penalisation terms in the likelihood.
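A minimal sketch of a covariate-dependent coefficient (the intercept and slope values below are hypothetical): in the bivariate case, λ21(x) translates directly into a correlation that varies with x, and a zero slope recovers a constant dependence structure.

```python
# Sketch: a covariate-dependent coefficient lam21(x) = b0 + b1 * x
# (hypothetical values b0, b1) makes the implied Gaussian-copula
# correlation between Y_1 and Y_2 vary with x; b1 = 0 would recover a
# constant dependence structure.

b0, b1 = 0.2, 0.5

def lam21(x):
    return b0 + b1 * x

def rho(x):
    """Implied correlation for J = 2: from Sigma = Lambda^{-1} Lambda^{-T},
    rho = -lam21 / sqrt(1 + lam21**2)."""
    l = lam21(x)
    return -l / (1.0 + l * l) ** 0.5

print(round(rho(0.0), 3), round(rho(1.0), 3), round(rho(2.0), 3))
```

Here the correlation decreases monotonically in x because λ21(x) increases; an additive or spline specification for λ21(x) would allow more flexible patterns at the cost of additional parameters.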

4.2 Application: Trivariate Conditional Transformation Models for Undernutrition in India

To demonstrate several practical aspects of the parameterisation and interpretation of MCTMs, we present a trivariate analysis of undernutrition in India in the following. Childhood undernutrition is among the most urgent problems in developing and transition countries. A rich database available from the Demographic and Health Surveys (DHS) provides nationally representative information about the health and nutritional status of populations in many of those countries. Here we use data from India that were collected in 1998. Overall, the data set comprised 24,316 observations after the removal of outliers and inconsistent observations. We used three indicators, stunting, wasting and underweight, as the trivariate response vector, where stunting refers to stunted growth, measured as an insufficient height of a child with respect to his or her age, while wasting and underweight refer to insufficient weight for height and insufficient weight for age respectively. Hence, stunting is an indicator of chronic undernutrition, wasting reflects acute undernutrition, and underweight reflects both. Our aim was to model the joint distribution of stunting, wasting and underweight conditional on the age of the child.

Model Specification.

The focus of our analysis was the variation in the trivariate undernutrition process with respect to the age of the child (in months). We parameterised the marginal conditional transformation functions as

h̃_j(y_j | age) = a_j(y_j)^T θ_j − β_j(age),

where the basis functions a_j were Bernstein polynomials of order eight. We modelled a potentially nonlinear shift in age by the term β_j(age), using a Bernstein polynomial of order three. The coefficients λ_{ji}(age) of Λ(age) were parameterised as

λ_{ji}(age) = b(age)^T β_{ji},

where b(age) were the same Bernstein polynomial bases of order three. The parameters θ_j were estimated under the constraint Dθ_j > 0, where D is a difference matrix. This leads to monotonically increasing estimated marginal transformation functions (Hothorn et al., 2018). No such shape constraint was applied to the functions of age, i.e. the parameters of β_j(age) and β_{ji} were estimated freely.

Results for Marginal Distributions.

Figure LABEL:fig:margins depicts the estimated marginal conditional CDFs , with the different colours indicating the ages of the children. Clearly, the shapes of the margins differ for the three indicators and change with the increasing age of the children. A shift to the left in the margins representing older ages indicates a higher risk of lower undernutrition scores. All of the distributions, but especially those of wasting, are asymmetric, as with increasing age the lower tails can be seen to vary less strongly than the upper tails.

Results for the Dependence Structure.

Figure LABEL:fig:undernutrition depicts the pairwise conditional correlations between stunting, wasting and underweight as functions of age along with the point and interval estimates obtained from parametrically drawn bootstrap samples (see Algorithm