Using relative weight analysis with residualization to detect relevant nonlinear interaction effects in ordinary and logistic regressions

06/26/2021
by   Maikol Solís, et al.
Universidad De Costa Rica

Relative weight analysis is a classic tool for detecting whether one variable or interaction in a model is relevant. In this study, we focus on the construction of relative weights for nonlinear interactions using restricted cubic splines. Our aim is to provide an accessible method for analyzing a multivariate model and identifying the subset with the most representative set of variables. Furthermore, we developed a procedure for treating control, fixed, free and interaction terms simultaneously in the relative weight analysis. The interactions are properly residualized against their main effects to preserve their true effects in the model. We tested this method using two simulated examples.




1 Introduction

A traditional fit of an ordinary or logistic regression estimates the best set of parameters for a dataset. In tandem, we usually execute further analyses: hypothesis tests on the nullity of coefficients, estimation of confidence intervals, checks of the distribution of the residuals, or checks of the variability of the fit along the data. These checks are standard practice in modern statistics.

Depending on the number of variables in the data, we can also assess their influence on the model. Here, we refer to the capacity of a variable to significantly impact the output. Non-influential variables should be excluded from the analysis. Doing so helps simplify the problem, makes the results easier to analyze, and supports better decisions.

General solutions to this problem have been proposed. Some examples use principal component analysis, stepwise regression, lasso, ridge, or elastic net regression HastieElements2011. These methods can detect and remove non-influential variables, but they obscure the impact of removing or adding a given variable relative to the other variables.

For a detailed measurement of the influence of each variable, we can mention measures such as zero-order correlations, standardized regression weights, and semi-partial correlations JohnsonHistory2004. However, if multicollinearity exists among the variables, these measures are inadequate. Other techniques, such as dominance analysis BudescuDominance1993 and relative weight analysis, have arisen to capture the complexity of a model JohnsonHeuristic2000. Both aim to estimate a variable's importance with respect to the output when multiple correlated predictors exist LebretonMonte2004.

In this paper, we focus on the relative weight analysis originally proposed by JohnsonHeuristic2000. Using ordinary least squares (OLS) regression, the technique creates a new set of predictors that are orthogonal representations of the original ones. In this new space, the standardized importance score of each variable is estimated. The scores are then transformed back to the original space of variables through a transformation matrix.

The OLS regression method relies on assumptions such as homoscedasticity, linearity and residual normality. However, in areas such as sociology or psychology, the dependent variable is often categorical, and thus a logistic approach is necessary; all the aforementioned assumptions are violated in that case. The work in TonidandelDetermining2010 proposed a solution to this problem by estimating relative weights for logistic regression. The solution creates a new orthogonal space and estimates the standardized scores using the standard deviation of the logistic response variable.

For interactions, the work of LeBreton2013 proposed residualizing the interaction terms to account for only the true effect among the variables. They fit an auxiliary regression where the dependent variable is the interaction term and the covariates are its main effects. The residual error then captures the pure interaction effect between the variables.

Regarding the type of model we use (OLS or logistic), the techniques mentioned intrinsically depend on the underlying way we adjust the predictors. The usual linear regression works for most problems. However, if the data have nonlinear structure, the linearity assumption is insufficient; it can therefore be extended using a nonlinear smoother.

In this work, we perform a relative weight analysis where the dependent variable is either continuous or binary, using restricted splines to model the predictors. Even if we have clear knowledge about the main effects of the model, the interactions remain an obscure part of the problem, making it difficult to match the two covariates that provide the most information. Thus, we add the most relevant interaction terms to the regression to capture the information not seen by the main effects. Our procedure includes all the interactions among the main effects and then searches for the best subset through a stepwise selection process. Even though there are some criticisms of the stepwise technique MillerSubset2002,HarrellRegression2015, we obtain accurate results.

To build a flexible procedure adjusted to multiple needs, we allow the use of control variables, fixed variables, free variables and pairwise interactions. The control variables are static and enter only as main effects; the fixed and free variables are used to create interactions. The difference is that the former remain in the final model, while the latter can be removed in the stepwise process.

The remainder of this paper is organized as follows. Section 2 presents preliminaries on setting regression models with restricted splines. Section 3 contains the core of the paper, where we explain all the details of building a relative weight analysis procedure with residualization. We devote Section 4 to testing the algorithm's capabilities using two simulated examples. Finally, in Section 5, we present the conclusions, with some discussion of future lines of research.

2 Preliminaries

In a generalized linear model, we link a set of input variables X1, …, Xp to an output variable Y through a link function g. We also have a set of n observations for each p-tuple of variables. Formally, we state the model as

Y = g(β0 + β1 X1 + ⋯ + βp Xp + ε).

The parameters β0, β1, …, βp are estimated by an iteratively reweighted least squares procedure. For a detailed review of logistic regression, we refer the reader to HastieElements2011.

In this work, we will use two cases for the function g.

Gaussian:

The output variable Y takes values in the real line, we set the link function as the identity g(z) = z, and the errors ε are independent and normally distributed.

Binomial:

The output variable is classified as 0 or 1 (Y ∈ {0, 1}). The link function is defined as g(z) = 1 / (1 + exp(-z)), a function mapping the real line into (0, 1). The noise variable ε has an independent logistic distribution.

In the following sections, and to simplify the notation, we will use parameters named β in different contexts without implying that they are the same. Each equation has exactly the parameters that we want to present, unless otherwise stated. Additionally, we will use Y to denote either the continuous or the binary output. In the numerical examples, we will retake the notation Y and Y_b to differentiate each case.

2.1 Control, fixed, and free variables

We assume a generalized linear model with the form

Y = g( β0 + Σi fi(Ci) + Σj fj(Fj) + Σk fk(Vk) + ε ).   (1)

The model is formed by the control variables Ci, the fixed variables Fj and the free variables Vk.

The functions f represent the smoother used for each variable. In the classic setting, for a variable X we put f(X) = βX. However, to allow a more flexible structure in the model, we introduce nonlinear functions.

We use restricted cubic splines to model the functions in Equation (1) Stone1985. These functions are linear at the endpoints and require fewer parameters than classic cubic splines. We call k the number of knots used to define the restricted spline functions, and we identify three cases according to the desired level of smoothness for each variable. Theoretically, it is possible to set any arbitrary positive integer value of k. The spline has the form

f(X) = β1 X + Σ_{j=1}^{k-2} β_{j+1} C_j(X),

where, for j = 1, …, k-2,

C_j(X) = (X - t_j)_+^3 - (X - t_{k-1})_+^3 (t_k - t_j)/(t_k - t_{k-1}) + (X - t_k)_+^3 (t_{k-1} - t_j)/(t_k - t_{k-1}),

with (u)_+ = max(u, 0) and t_1 < ⋯ < t_k the knots.

The model's complexity increases with k. To keep a sane number of parameters, the recommendation is to use k equal to 3, 4 or 5. The exact positions of the t_j's are defined as the empirical quantiles of X at the levels q_j given in Table 1 for a given number of knots k.

k   Quantile levels (q_j)
3   0.10  0.50  0.90
4   0.05  0.35  0.65  0.95
5   0.05  0.275  0.50  0.725  0.95
Table 1: Quantile levels (q_j) defining the knot positions for restricted cubic spline functions.
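To make the construction above concrete, the following sketch builds the restricted cubic spline basis with knots at the Table 1 quantile levels. It is an illustrative Python/NumPy version (the paper's own implementation is in R); the function name rcs_basis and the Harrell-style scaling by the squared knot range are our own choices.

```python
import numpy as np

def rcs_basis(x, knots):
    """Restricted cubic spline basis: one linear column plus k-2
    nonlinear columns that are linear beyond the boundary knots."""
    x = np.asarray(x, dtype=float)
    t = np.asarray(knots, dtype=float)
    k = len(t)
    cube = lambda u: np.maximum(u, 0.0) ** 3      # truncated cubic (u)_+^3
    d = t[-1] - t[-2]
    cols = [x]                                    # the linear term
    for j in range(k - 2):
        cj = (cube(x - t[j])
              - cube(x - t[-2]) * (t[-1] - t[j]) / d
              + cube(x - t[-1]) * (t[-2] - t[j]) / d)
        cols.append(cj / (t[-1] - t[0]) ** 2)     # scale for numerical stability
    return np.column_stack(cols)

# knots at the Table 1 quantile levels for k = 3
rng = np.random.default_rng(1)
x = rng.uniform(-np.pi, np.pi, 500)
knots = np.quantile(x, [0.1, 0.5, 0.9])
B = rcs_basis(x, knots)        # 500 x 2 design: linear + one spline column
```

Beyond the boundary knots every column of the basis is exactly linear, which is what keeps the fitted curve stable in the tails.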

3 Methodology

In this section, we explain the series of steps used to determine the most relevant variables in a model, combining the classic relative weight analysis, nonlinear smoothing with restricted cubic splines, and residualization of interactions. We can summarize our procedure in three steps:

  1. Select the best submodel given a set of control, free, fixed and interaction terms.

  2. Residualize all the interaction terms to remove the effects of the main variables and keep only the pure interaction effect.

  3. Apply the relative weight analysis to detect which variables are the most significant.

The steps are explained thoroughly in the following sections.

3.1 Model selection with interactions

Given a model (Gaussian or logistic), and once the structure of the variables (linear or nonlinear) is defined, we define the pairwise interactions between the variables. We use restricted interaction multiplication.

The restricted interaction removes the doubly nonlinear terms, allowing us to drop non-essential terms. For example, assume we decide to model X1 and X2 with 3-knot splines, so that each has the basis {X, C_1(X)}. The interaction between X1 and X2 is denoted by

X1 : X2 = α1 X1 X2 + α2 X1 C_1(X2) + α3 C_1(X1) X2,

for some constants α1, α2 and α3; the doubly nonlinear term C_1(X1) C_1(X2) is excluded. A similar pattern follows in the 4-knot case.

Setting a full set of interactions among p variables, we should fit p(p-1)/2 interaction terms. For a model with k-knot spline functions, each term adds several distinct parameters, so the total grows rapidly with p and k: already in the 3-knot case a full interaction model requires far too many parameters, and the 4-knot case even more. Any model with that many parameters is inadequate; the overfitting leads to erroneous results, especially on small samples.

To solve this issue, we opted for selecting a submodel from the full model. We use the classic stepwise method based on the change in the Bayesian Information Criterion (BIC). Recall that BIC = k log(n) - 2 log(L), where L represents the likelihood of the submodel and k the number of variables used to fit it. In the procedure, we search to minimize the BIC. The factor k log(n) strongly penalizes large models unless the gain in log-likelihood compensates for it.

In our context, the BIC presents some advantages over the Akaike Information Criterion (AIC). The AIC is estimated as AIC = 2k - 2 log(L). This criterion also penalizes the submodel to exclude unnecessary variables, as the BIC does, but using the factor 2k. However, it smooths the selection, allowing more variables into the final model. Such models are appropriate for prediction rather than parsimony [see DziakSensitivity2020]. Since we focus on inferring the most relevant features of the data, we consider the BIC a better criterion for the stepwise selection process.
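As a concrete check of the two penalties, the following sketch computes both criteria for a Gaussian OLS fit, where the log-likelihood has a closed form. The helper gaussian_ic is hypothetical, not part of the paper's implementation.

```python
import numpy as np

def gaussian_ic(y, X):
    """BIC and AIC of an OLS fit, using the Gaussian log-likelihood."""
    n, k = X.shape
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    sigma2 = resid @ resid / n                    # MLE of the error variance
    loglik = -0.5 * n * (np.log(2 * np.pi * sigma2) + 1)
    bic = k * np.log(n) - 2 * loglik              # penalty k*log(n)
    aic = 2 * k - 2 * loglik                      # penalty 2k
    return bic, aic

rng = np.random.default_rng(0)
n = 200
X = np.column_stack([np.ones(n), rng.normal(size=n)])
y = 1.0 + 2.0 * X[:, 1] + rng.normal(size=n)
bic, aic = gaussian_ic(y, X)
```

Since log(200) ≈ 5.3 > 2, each extra parameter costs more under the BIC than under the AIC, which is why the BIC yields sparser models.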

In our implementation, we start with a model using only main effects. Then, we continue adding or removing main effects or interactions until the algorithm cannot improve the BIC value. The implementation of the procedure was taken from the package MASS VenablesModern2002 using the function stepAIC. To use the BIC, as mentioned before, we set the parameter k = log(n). The function allows a parameter scope that consists of a list with lower and upper models. Given the structure of Equation (1), we define the lower and upper models as,

Lower Model
Upper Model

The implementation includes in the lower model only the control and fixed variables. The upper model contains all the main effects and all the possible interactions.
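The greedy search from the lower toward the upper scope can be sketched as a simple forward selection that minimizes the BIC. This Python version is only illustrative (the paper uses MASS::stepAIC with k = log(n) in R); the helpers bic_of and forward_bic are our own names, and the sketch omits the backward (removal) moves of the full stepwise procedure.

```python
import numpy as np

def bic_of(y, X):
    """Gaussian BIC of an OLS fit (see Section 3.1)."""
    n, k = X.shape
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    r = y - X @ beta
    sigma2 = r @ r / n
    loglik = -0.5 * n * (np.log(2 * np.pi * sigma2) + 1)
    return k * np.log(n) - 2 * loglik

def forward_bic(y, cols, lower):
    """Greedy forward selection: start from the lower scope and add the
    candidate term that most improves the BIC, stopping when none does."""
    selected = list(lower)
    candidates = [j for j in range(len(cols)) if j not in selected]
    best = bic_of(y, np.column_stack([cols[j] for j in selected]))
    improved = True
    while improved and candidates:
        improved = False
        scores = [(bic_of(y, np.column_stack([cols[i] for i in selected + [j]])), j)
                  for j in candidates]
        score, j = min(scores)
        if score < best:
            best, improved = score, True
            selected.append(j)
            candidates.remove(j)
    return selected

rng = np.random.default_rng(42)
n = 300
x = rng.normal(size=(n, 4))
y = 2.0 * x[:, 0] + rng.normal(size=n)              # only the first variable matters
cols = [np.ones(n)] + [x[:, j] for j in range(4)]   # col 0 = intercept (lower scope)
sel = forward_bic(y, cols, lower=[0])
```

The log(n) penalty makes each added column pay its way, so pure-noise candidates are usually rejected in the very first pass.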

After selecting the most relevant variables in the model, we are left with the subset of selected main effects and the subset of selected interactions, together with their respective counts and indices.

The final model after the stepwise selection is

(2)

Note finally that either subset may be empty. If the subset of interactions is empty, the model has only main effects, meaning none of the interactions were added. For every interaction added, its main effects are also included automatically to preserve the hierarchy principle.

3.2 Residualized relative importance

Interactions play a key role in this work. We want to know whether each included interaction adds relevant information to the model, or whether it is negligible compared to its main effects. In the latter case, the interaction contains little information and blurs the other results in the model. The objective is to separate the relevant effects of the main terms from those of the interactions.

Notice that in the case of a simple linear interaction like X1 X2, there are three effect types:

  1. The effect solely from X1.

  2. The effect solely from X2.

  3. The effect solely from the interaction X1 X2.

Interactions with restricted splines are handled equally, except that the effects are more diffuse across the terms. For example, for a 3-knot spline, the interaction between X1 and X2 is

α1 X1 X2 + α2 X1 C_1(X2) + α3 C_1(X1) X2.

The three terms contain mixed information about X1 and X2. Therefore, if we apply the relative weight analysis to this interaction directly, the different effects are blurred. By first controlling for the other variables, we isolate the pure interaction effect.

Suppose that we have a simpler model, where and are 3-knots restricted splines,

An extended way to state this model is also

The procedure proposed in LeBreton2013 is the following:

  1. Replace the higher-order term with the residual obtained after regressing the interaction on its main effects.

    Here the resulting variables represent the residuals of those regressions. We perform the regression without an intercept to capture the full effect in the residual.

  2. Refit the model with the form

    (3)

    That is, the interactions are replaced by the residuals from the previous step. We denote the resulting term as the residualized interaction.

The main advantage of residualizing the interaction effects is that we create a new set of interaction variables uncorrelated with their main effects. The procedure separates the true synergy between the variables from the main effects. However, the residualized effects can still be correlated with each other.

We apply this algorithm to all the restricted interactions of the reduced model. Therefore, the relative weight analysis is applied to the interactions through the residuals of an intermediate linear regression on the main effects.
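The residualization step of Section 3.2 amounts to a no-intercept least squares regression of each interaction column on its main-effect columns, keeping the residual. A minimal sketch, with hypothetical variable names:

```python
import numpy as np

def residualize(interaction, mains):
    """Regress an interaction column on its main-effect columns
    (without intercept) and return the residual: the pure interaction."""
    coef, *_ = np.linalg.lstsq(mains, interaction, rcond=None)
    return interaction - mains @ coef

rng = np.random.default_rng(7)
n = 500
x1, x2 = rng.normal(size=n), rng.normal(size=n)
mains = np.column_stack([x1, x2])
inter = x1 * x2
e12 = residualize(inter, mains)       # residualized interaction term
```

By the normal equations, the residual is orthogonal to the main-effect columns, so its relative weight can no longer absorb main-effect variance.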

3.3 Relative Weight Analysis

Relative Weight Analysis is a tool used to separate the relevance of each variable or interaction. The technique creates a new set of predictors which are an orthogonal representation of the observed predictors (with a one-to-one correspondence). We estimate the influence in this new space and then return the results to the original space. In this way, the problem presented by correlated predictors disappears, given that the importance estimation occurs in the orthogonal space.

We are interested in the importance of the main effects versus the interactions. The design matrix can have multiple patterns; for example, the right side of Equation (3), where a mix of linear and nonlinear variables is present. We are interested in determining the effect of the whole variables instead of their individual components. Therefore, we define the design matrix X by grouping the columns that belong to each variable or interaction.

The matrix X thus contains the control variables and all the main effects and interactions resulting from the model selection procedure. Also, the interactions contain only the information from the pure synergy between the variables, due to the residualization.

The algorithm to estimate the relative weights is the following:

  1. Start with the design matrix X.

  2. Standardize the columns of X.

  3. Estimate the singular value decomposition X = P Δ Q^T, where

    1. P is the matrix of eigenvectors associated with X X^T.

    2. Q is the matrix of eigenvectors associated with X^T X.

    3. Δ is the diagonal matrix of singular values, equivalent to the square roots of the eigenvalues of X^T X.

  4. Create the orthogonal version Z = P Q^T of X using the SVD decomposition. The columns of Z are one-to-one orthogonal representations of the columns of X.

  5. Standardize the columns of Z.

  6. Estimate the fully standardized coefficients β* of the model of Y against Z. Depending on the link function, the steps differ.

    Gaussian:

    1. Estimate the standardized ordinary least squares coefficients of Y against Z.

    Logistic:

    1. Obtain the unstandardized coefficients b of the model of Y against the matrix Z. Here the b can be calculated using a least squares procedure, given the orthogonality of Z.

    2. Compute the logit prediction. Alternatively, regress the logit against Z and recover the b.

    3. Estimate the standard deviation of the logit prediction.

    4. Estimate the fully standardized coefficients, scaling each b by the standard deviation of the corresponding column of Z and dividing by the standard deviation of the logit prediction.

  7. Create the link transformation Λ = Q Δ Q^T.

  8. Estimate the relative weights as ε_j = Σ_k Λ_jk² (β*_k)².

The procedure projects the original columns of X onto an orthonormal space Z with a one-to-one correspondence between their columns. In this new space, we obtain the fully standardized coefficients of the model, named β*. A particular property is that the (β*)² are the relative importances of the columns of Z. Given that Z is orthonormal, these weights refer to an uncorrelated set of variables. To return the weights to the original space, we estimate the transformation matrix Λ, which maps the importances of Z back to the original scale of X. The sum of the relative weights equals the R² or the pseudo-R², depending on the case. Finally, each variable has an associated percentage contribution to the explained variance.
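For the Gaussian case, steps 1-8 collapse into a few lines of linear algebra. The sketch below follows Johnson's heuristic under our own naming; the weights it returns are nonnegative and sum to the R² of the fit.

```python
import numpy as np

def relative_weights(X, y):
    """Johnson-style relative weights for the Gaussian case.
    Returns one nonnegative weight per column of X; the weights
    sum to the R^2 of the regression of y on X."""
    n = len(y)
    Xs = (X - X.mean(axis=0)) / X.std(axis=0)            # step 2: standardize X
    ys = (y - y.mean()) / y.std()
    P, delta, Qt = np.linalg.svd(Xs, full_matrices=False)  # step 3: X = P D Q^T
    Z = P @ Qt                                           # step 4: orthogonal counterpart
    beta = Z.T @ ys / np.sqrt(n)                         # step 6: standardized coefficients
    Lam = Qt.T @ (delta[:, None] * Qt) / np.sqrt(n)      # step 7: link transformation
    return (Lam ** 2) @ (beta ** 2)                      # step 8: relative weights

# correlated toy predictors: x1 drives y the most
rng = np.random.default_rng(3)
n = 1000
x1 = rng.normal(size=n)
x2 = 0.5 * x1 + rng.normal(size=n)
x3 = rng.normal(size=n)
X = np.column_stack([x1, x2, x3])
y = 2.0 * x1 + x2 + rng.normal(size=n)
w = relative_weights(X, y)
```

Because Z is orthonormal, the squared coefficients are importances of uncorrelated columns; Λ maps them back to the original, correlated columns, and the weights sum to R² by construction.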

4 Results

We now show the results of our algorithm. To test it, we generate a sample of n realizations of the random variables determined by each example. In the simulated examples, we tested the capability of the method to determine the most relevant variables in both the Gaussian and the binomial cases.

We follow these steps to generate two output variables, a continuous Y and its dichotomized version Y_b, from a model with input variables X1, …, Xp:

  1. Generate a variable Y with the functional form of the model.

  2. Apply the inverse logit (logistic) function, p = 1 / (1 + exp(-Y)).

  3. Generate a Bernoulli random variable Y_b with outcome probability p.

  4. Repeat this procedure for each element of the sample, and call the resulting vector Y_b.

The described steps generate a Bernoulli random variable Y_b with values 0 or 1, generated from the underlying model given by Y.
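The four steps above, instantiated on the Ishigami model of the next subsection as an example, can be sketched as follows. The noise level sigma is our own choice, since the paper does not state one here.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1000

# Inputs of the Ishigami model, X_i ~ Uniform(-pi, pi)
X = rng.uniform(-np.pi, np.pi, size=(n, 3))

# 1. Continuous output with the functional form of the model (a=7, b=0.1)
a, b, sigma = 7.0, 0.1, 0.5        # sigma: assumed noise level
Y = (np.sin(X[:, 0]) + a * np.sin(X[:, 1]) ** 2
     + b * X[:, 2] ** 4 * np.sin(X[:, 0]) + sigma * rng.normal(size=n))

# 2. Inverse logit transform to a probability
p = 1.0 / (1.0 + np.exp(-Y))

# 3-4. Bernoulli draw with that probability, element by element
Yb = rng.binomial(1, p)
```

The binary vector Yb inherits the structure of Y but discards most of its variance, which is why the binary fits below explain less than the continuous ones.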

For testing, we will use classic models: the Ishigami model, the Moon model, and a synthetic model of 15 variables by OakleyProbabilistic2004.

4.1 Ishigami function

The Ishigami function is a classic choice to test the sensitivity and reliability of variables in models. It presents strong non-linearity and non-monotonicity. In order of relevance, the important terms are X2, X1 and the interaction between X1 and X3. All other variables and interactions have zero theoretical relevance. The form of the model is

Y = sin(X1) + a sin²(X2) + b X3⁴ sin(X1) + ε,

where Xi ~ Uniform(-π, π) for i = 1, 2, 3, with a = 7 and b = 0.1. We also set ε to add uncorrelated noise to the model.

In the work of IshigamiImportance1990, the authors split the model's variance to determine the relevance of each predictor. They concluded that the percentages of variance due to X1, X2 and the interaction X1 X3 are 31.4%, 44.2% and 24.4%, respectively. The other effects contribute 0% of the variance.

We applied our procedure to the continuous output Y and the binary output Y_b. Table 2 presents the results for both cases. Notice how the variables X1, X2 and the interaction X1 X3 are captured correctly. For the three mentioned effects, we chose restricted cubic splines with 5 knots. Recall that the model tested all the main effects and cross terms (X1 X2, X1 X3, X2 X3, and so on). However, as we use the BIC as selection criterion, all the non-relevant terms were removed by its large penalty.

In terms of goodness-of-fit, the R² for the continuous output is higher than for the binary one. The results are expected: the Ishigami model is continuous, and from it we create the two outputs Y and Y_b. In the binary case we lose power to explain the variance, given the change in the dependent variable. Even in that case, the active effects were detected.

Variable   Weight   Type
Continuous output Y:
X1        34.52%    Free
X2        48.27%    Free
X3         0.09%    Free
X1 X3     17.12%    Interaction
Binary output Y_b:
X1        31.33%    Free
X2        53.25%    Free
X3         2.90%    Free
X1 X3     12.52%    Interaction
Table 2: Relative weights for the Ishigami model when using the continuous output Y and the binary output Y_b. We use 5-knot restricted cubic splines as smoother.

To present the capabilities of the algorithm, we set two unfavorable scenarios for the Ishigami model. In these settings, we use only the continuous output Y.

First, we set five additional variables as fixed. This setting forces the model to fit five noisy variables with little information about Y. The model fits all the possible interactions and removes those that worsen the BIC. However, the fixed variables remain in all models.

The other scenario uses a control variable. Here, X3 is included in all models, but no interaction contains it. This setting is useful when the study requires controlling for some variables without any further interference from them (e.g., gender, age, smoking status, etc.).

Table 3 presents the values for the mentioned scenarios. In the upper part of the table, we notice how the inclusion of the non-important variables does not affect the weights compared with Table 2. In fact, all such variables present values between 0.04% and 0.09%. If needed, the analyst can remove variables with weights of this order of magnitude without losing any inferential power. The lower part presents the case where we control for X3 in the model. Notice how the model relies on the weights of X1 and X2, because we restricted the use of the interaction X1 X3.

Variable   Weight   Type
Five noisy variables fixed:
X1        35.57%    Free
X2        48.12%    Free
X3         0.06%    Free
(noisy)    0.06%    Fixed
(noisy)    0.05%    Fixed
(noisy)    0.04%    Fixed
(noisy)    0.09%    Fixed
(noisy)    0.05%    Fixed
X1 X3     15.96%    Interaction
Controlled X3:
X1        42.69%    Free
X2        57.16%    Free
X3         0.15%    Control
Table 3: Relative weights when fixing five extra noisy variables (upper) and when controlling for X3 (lower). All the results were estimated with the continuous output Y.

4.2 The Moon function

The work of MoonDesign2010,MoonTwoStage2012 established an algorithm to detect active and inactive variables in complex configurations. They propose a 20-dimensional function with 5 active main effects, 4 active two-way interaction effects, and an active quadratic term.

Assuming uniformly distributed inputs, the explicit form of the model is given in MoonDesign2010.

They studied other complex configurations, including or removing the small terms and amplifying the active effects. To test whether the procedure detects the relevant variables, we use the version where all the active effects are tripled while the small terms remain constant.

The small-terms part consists of 189 elements among main, quadratic and interaction terms with low impact on the output. Their coefficients can be found in MoonDesign2010 (Table 3.11). The aim of testing both the base and the tripled settings is to compare the changes in the relative weights between unfavorable and favorable scenarios.

Table 4 presents the results of our algorithm for the base and the tripled versions. We estimated the absolute difference between both weight sets to compare their change. The relative weights for the base version give significant weight to non-important variables. This behavior arises because the output presents more blurred information about the true active effects. Nevertheless, the active effects are well represented in the model.

With the tripled version, all the non-important variables decrease their value and the true active ones greatly increase their participation. Several active variables increase by more than 10 percentage points with respect to the base model, while other active terms increase by a smaller amount.

The case of the active quadratic term is interesting: its weight is low in both scenarios. Recall, however, that we are using restricted cubic splines with 5 knots to fit the main and interaction effects. Given the residualization step, most of the participation of the quadratic term may already be represented by the nonlinear main-effect spline, so after extracting the interaction effects from the main ones, the remaining quadratic effect presents a lower weight.

We observe that the relative weights are more concentrated on the active effects in the tripled case than in the base case. The result is expected, because the small terms are relatively stronger in the base case than in the tripled one.

Relative weights
Variable   base   tripled   Δ (p.p.)
1.26% 1.38% 0.13
1.21% 0.13% -1.08
16.30% 2.97% -13.33
2.51% 0.36% -2.15
3.24% 0.56% -2.68
0.49% 0.05% -0.44
1.25% 12.21% 10.97
1.40% 0.22% -1.17
1.97% 0.26% -1.71
2.20% 14.29% 12.10
0.95% 0.13% -0.81
3.27% 0.71% -2.57
11.71% 1.69% -10.02
5.05% 0.95% -4.10
4.25% 0.56% -3.69
0.83% 2.62% 1.79
3.25% 0.51% -2.74
10.19% 11.42% 1.24
9.00% 10.18% 1.18
6.77% 8.32% 1.56
12.92% 30.46% 17.55
Table 4: Relative weights for the Moon model using the base and the tripled versions. The absolute difference between both weights (Δ) is estimated in percentage points (p.p.). We use 5-knot restricted cubic splines as smoother.

5 Conclusion

In this paper, we explore a method to detect relevant variables using the relative weight analysis technique. The main contribution of our algorithm is the residualization of the interactions to capture their true effect. In a classic setting, the relative weights for the main effects and interactions are evaluated as is, without considering the real nature of the latter. In other words, the interaction of two variables contains information on each variable separately, and only the remainder belongs to the true interaction effect. In this work, we emphasize this technique following the works of JohnsonHeuristic2000,TonidandelDetermining2010,LeBreton2013.

One aim was to create a flexible algorithm beyond the classic linear model. To this end, we include restricted cubic splines as smoothers to determine the relationship between the inputs and outputs. This feature allows us to detect nonlinear patterns in the data, even in complex settings. More work should be done to make the procedure flexible for a wider set of cases.

There are R packages performing relative weight analysis, like rwa ChanRwa2020 or flipRegression DisplayrFlipRegression2021, but they rely on the analysis of main effects only. In this study, we implemented a version that allows the user to set control, fixed, free and interaction terms in the model. Besides, the predictors are modeled using restricted spline smoothers. In this context, the analysis should be more accurate, especially if the phenomenon is highly nonlinear. The great advantage with respect to other techniques is that analysts can include some variables in the model even if they are non-important, or compare the weights of main effects against interactions to choose a particular model.

The stepwise procedure produced effective results, even if there are some points against it [e.g.,][]MillerSubset2002,HarrellRegression2015. Other model selection techniques were also considered, like elastic nets ZouRegularization2005 or PLS regression MartensReliable2001,MevikPls2007; they represent interesting lines of research for the future. In this study, we chose stepwise selection because of its simplicity. Stepwise regression combined with the Bayesian Information Criterion (BIC) is a fast and effective method for building the most relevant and simplest model. The BIC cuts all the non-relevant variables in the first steps, allowing the model to include only those interactions that add real value. Other information criteria DziakSensitivity2020 can be explored in the future.

Finally, even if the results establish that our procedure correctly identifies the relevant parts of our models, a proper validation must be done to check the dispersion of the relative weights. If we run a configuration over different realizations, the remaining questions are whether the selected variables stay constant and how much the relative weights differ from each other. A proper validation of the results is beyond the scope of this paper; however, the work of TonidandelDetermining2009 conducted an extensive simulation study with bootstrap confidence intervals. The implementation of that technique will be a priority for future developments.

Declarations

Funding

The authors acknowledge the financial support of the Escuela de Matemática de la Universidad de Costa Rica, through CIMPA, the Centro de Investigaciones en Matemática Pura y Aplicada, via the project 821-B8-A25.

Code availability

All the calculations in this paper were made using our own R package residualrwa. The package is published on GitHub at https://github.com/maikol-solis/residualrwa.

Conflict of interest

On behalf of all authors, the corresponding author states that there is no conflict of interest.