Multifold Cross-Validation Model Averaging for Generalized Additive Partial Linear Models
Generalized additive partial linear models (GAPLMs) are appealing for model interpretation and prediction. However, for GAPLMs, the covariates and the degree of smoothing in the nonparametric parts are often difficult to determine in practice. To address this model selection uncertainty issue, we develop a computationally feasible model averaging (MA) procedure. The model weights are data-driven and selected based on multifold cross-validation (CV) (instead of leave-one-out) for computational saving. When all the candidate models are misspecified, we show that the proposed MA estimator for GAPLMs is asymptotically optimal in the sense of achieving the lowest possible Kullback-Leibler loss. In the other scenario where the candidate model set contains at least one correct model, the weights chosen by the multifold CV are asymptotically concentrated on the correct models. As a by-product, we propose a variable importance measure to quantify the importances of the predictors in GAPLMs based on the MA weights. It is shown to be able to asymptotically identify the variables in the true model. Moreover, when the number of candidate models is very large, a model screening method is provided. Numerical experiments show the superiority of the proposed MA method over some existing model averaging and selection methods.
READ FULL TEXT