A Mathematical Programming Approach for Integrated Multiple Linear Regression Subset Selection and Validation

12/12/2017
by Seokhyun Chung, et al.

Subset selection for multiple linear regression aims to construct a regression model that minimizes errors by selecting a small number of explanatory variables. Once a model is built, various statistical tests and diagnostics are conducted to validate the model and to determine whether the regression assumptions are met. Most traditional approaches require human decisions at this step; for example, the user adds or removes a variable until a satisfactory model is obtained. However, this trial-and-error strategy cannot guarantee finding a subset that minimizes the errors while satisfying all regression assumptions. In this paper, we propose a fully automated model-building procedure for multiple linear regression subset selection that integrates model building and validation based on mathematical programming. The proposed model minimizes the mean squared error while ensuring that the majority of the important regression assumptions are met. When no subset satisfies all of the considered regression assumptions, our model provides an alternative subset that satisfies most of them. Computational results show that our model yields better solutions (i.e., subsets satisfying more regression assumptions) than benchmark models while maintaining similar explanatory power.
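The selection rule described above (minimize mean squared error, but prefer subsets that pass the regression-assumption checks, and fall back to the subset that passes the most checks when none passes all) can be illustrated with a small sketch. It does not reproduce the paper's mathematical programming formulation. Instead, as a swapped-in technique, it enumerates every size-k subset exhaustively, which is practical only for a handful of variables, and ranks subsets lexicographically by (number of failed checks, MSE). The two checks, their thresholds (VIF at most 10 for multicollinearity, Durbin-Watson statistic in [1.5, 2.5] for independence of errors), and all function names are hypothetical choices made for illustration.

```python
from itertools import combinations


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [list(row) + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        if abs(M[piv][col]) < 1e-12:
            raise ValueError("singular normal equations")
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x


def ols_residuals(cols, y):
    """Fit y ~ intercept + cols by least squares (normal equations); return residuals."""
    n = len(y)
    design = [[1.0] * n] + [list(c) for c in cols]
    p = len(design)
    xtx = [[sum(design[i][t] * design[j][t] for t in range(n)) for j in range(p)]
           for i in range(p)]
    xty = [sum(design[i][t] * y[t] for t in range(n)) for i in range(p)]
    beta = solve(xtx, xty)
    return [y[t] - sum(beta[i] * design[i][t] for i in range(p)) for t in range(n)]


def max_vif(cols):
    """Largest variance inflation factor, VIF_j = 1 / (1 - R_j^2), over the columns."""
    worst = 1.0
    for j, col in enumerate(cols):
        resid = ols_residuals(cols[:j] + cols[j + 1:], col)
        mean = sum(col) / len(col)
        sst = sum((v - mean) ** 2 for v in col)
        r2 = 1.0 - sum(e * e for e in resid) / sst
        worst = max(worst, float("inf") if r2 >= 1.0 else 1.0 / (1.0 - r2))
    return worst


def vif_check(limit=10.0):
    """Hypothetical multicollinearity check: every VIF must be <= limit."""
    return lambda cols, resid: max_vif(cols) <= limit


def durbin_watson_check(lo=1.5, hi=2.5):
    """Hypothetical independence check: Durbin-Watson statistic within [lo, hi]."""
    def check(cols, resid):
        sse = sum(e * e for e in resid)
        if sse == 0.0:
            return True
        dw = sum((resid[t] - resid[t - 1]) ** 2 for t in range(1, len(resid))) / sse
        return lo <= dw <= hi
    return check


def select_subset(X_cols, y, k, checks):
    """Enumerate size-k subsets; keep the one with fewest failed checks, then lowest MSE."""
    best = None
    for subset in combinations(range(len(X_cols)), k):
        cols = [X_cols[j] for j in subset]
        resid = ols_residuals(cols, y)
        mse = sum(e * e for e in resid) / len(y)
        violated = sum(1 for check in checks if not check(cols, resid))
        if best is None or (violated, mse) < best[:2]:
            best = (violated, mse, subset)
    violated, mse, subset = best
    return subset, mse, violated


# Toy data (made up): x2 is a near-copy of x1, and y lies exactly in span{1, x1, x2}.
x1 = [1, 2, 3, 4, 5, 6, 7, 8]
d = [0.01, -0.01, 0.02, -0.02, 0.01, 0.0, 0.02, -0.01]
x2 = [a + b for a, b in zip(x1, d)]
x3 = [2, -1, 3, 0, -2, 1, -3, 2]
y = [a + 40 * b for a, b in zip(x1, d)]
X = [x1, x2, x3]

print(select_subset(X, y, 2, [vif_check(float("inf"))]))        # MSE only
print(select_subset(X, y, 2, [vif_check(10.0), durbin_watson_check()]))
```

On this toy data, ranking by MSE alone picks the collinear pair (x1, x2), whose VIF is far above 10; adding the VIF check steers the choice to a pair containing x3. The lexicographic key also covers the fallback case: if every subset fails some check, the subset with the fewest failures (ties broken by MSE) is returned.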
