Selective Sequential Model Selection

12/08/2015
by William Fithian, et al.

Many model selection algorithms produce a path of fits specifying a sequence of increasingly complex models. Given such a sequence and the data used to produce it, we consider the problem of choosing the least complex model that is not falsified by the data. Extending the selected-model tests of Fithian et al. (2014), we construct p-values for each step in the path that account for the adaptive selection of the model path using the data. In the case of linear regression, we propose two specific tests: the max-t test for forward stepwise regression, which generalizes a proposal of Buja and Brown (2014), and the next-entry test for the lasso. These tests improve on the power of the saturated-model test of Tibshirani et al. (2014), sometimes dramatically. In addition, our framework extends beyond linear regression to a much more general class of parametric and nonparametric model selection problems. To select a model, we can feed our single-step p-values as inputs into sequential stopping rules such as those proposed by G'Sell et al. (2013) and Li and Barber (2015), achieving control of the familywise error rate or false discovery rate (FDR) as desired. The FDR-controlling rules require the null p-values to be independent of each other and of the non-null p-values, a condition not satisfied by the saturated-model p-values of Tibshirani et al. (2014). We derive intuitive and general sufficient conditions for independence, and show that our proposed constructions yield independent p-values.
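
To make the final step concrete, the following is a minimal sketch of how a sequence of single-step p-values might be passed to a sequential stopping rule. It uses the ForwardStop rule of G'Sell et al., which keeps the largest k for which the running average of -log(1 - p_i) over the first k steps is at most the target FDR level alpha. The function name `forward_stop`, the default `alpha`, and the example p-values are illustrative assumptions and do not come from the paper; the sketch also takes the p-values as given and does not compute the max-t or next-entry tests.

```python
import math


def forward_stop(p_values, alpha=0.1):
    """Sketch of the ForwardStop rule.

    Returns k_hat, the number of steps of the path to keep
    (0 means no step is selected).
    """
    k_hat = 0
    running_sum = 0.0
    for k, p in enumerate(p_values, start=1):
        # Clamp 1 - p away from zero so that p == 1 does not raise.
        running_sum += -math.log(max(1.0 - p, 1e-300))
        # The rule takes the LARGEST k meeting the bound, so keep
        # scanning even after the bound has failed at an earlier k.
        if running_sum / k <= alpha:
            k_hat = k
    return k_hat


# Hypothetical p-values for a five-step path (made up for illustration).
example_p_values = [0.001, 0.002, 0.01, 0.6, 0.8]
selected_steps = forward_stop(example_p_values, alpha=0.1)
```

Because the rule takes the largest qualifying k rather than stopping at the first failure, a moderately large p-value early in the path does not necessarily end the selection. The FDR guarantee for rules of this kind relies on the independence condition described in the abstract.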
