Targeted Cross-Validation

09/14/2021
by Jiawei Zhang, et al.

In many applications, we have access to the complete dataset but are only interested in prediction over a particular region of the predictor variables. A standard approach is to find the globally best modeling method from a set of candidates. However, it is rare in reality that one candidate method is uniformly better than the others. A natural approach in this scenario is to apply a weighted L_2 loss in performance assessment to reflect the region-specific interest. We propose targeted cross-validation (TCV) to select models or procedures based on a general weighted L_2 loss. We show that TCV is consistent in selecting the best-performing candidate under the weighted L_2 loss. Experimental studies demonstrate the use of TCV and its potential advantage over global CV or the approach of using only local data to model a local region.

Previous investigations of CV have relied on the condition that, once the sample size is large enough, the ranking of two candidates stays the same. However, in many applications with changing data-generating processes or highly adaptive modeling methods, the relative performance of the methods is not static as the sample size varies. Even with a fixed data-generating process, the ranking of two methods may switch infinitely many times. In this work, we broaden the concept of selection consistency by allowing the best candidate to switch as the sample size varies, and then establish the consistency of TCV. This flexible framework applies to high-dimensional and complex machine learning scenarios where the relative performance of modeling procedures is dynamic.
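The core idea of a weighted L_2 cross-validation score can be sketched in a few lines. This is a minimal illustration, not the paper's procedure: the quadratic toy data, the indicator weight on x ≤ 0.3, and the two candidate fitters (`linear`, `knn`) are assumptions chosen only to show how a region-specific weight can flip which candidate wins.

```python
import numpy as np

def tcv_score(fit_predict, X, y, weight_fn, n_splits=5, seed=0):
    """Average held-out weighted L2 loss: errors on each test fold are
    weighted by weight_fn(x), which encodes the region of interest."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(X)), n_splits)
    losses = []
    for k in range(n_splits):
        test = folds[k]
        train = np.concatenate([folds[j] for j in range(n_splits) if j != k])
        pred = fit_predict(X[train], y[train], X[test])
        w = weight_fn(X[test])
        losses.append(np.sum(w * (y[test] - pred) ** 2) / np.sum(w))
    return float(np.mean(losses))

# Candidate 1: a global linear fit (degree-1 polynomial).
def linear(Xtr, ytr, Xte):
    a, b = np.polyfit(Xtr, ytr, deg=1)
    return a * Xte + b

# Candidate 2: a k-nearest-neighbour average, a locally adaptive method.
def knn(Xtr, ytr, Xte, k=10):
    d = np.abs(Xte[:, None] - Xtr[None, :])    # pairwise distances
    nn = np.argpartition(d, k, axis=1)[:, :k]  # indices of k nearest
    return ytr[nn].mean(axis=1)

# Toy data: quadratic signal with noise on [0, 1].
rng = np.random.default_rng(0)
X = rng.uniform(0, 1, 400)
y = X ** 2 + rng.normal(scale=0.05, size=400)

# Only the region x <= 0.3 matters: use an indicator weight there.
weight = lambda x: (x <= 0.3).astype(float)

scores = {f.__name__: tcv_score(f, X, y, weight) for f in (linear, knn)}
best = min(scores, key=scores.get)
```

A global CV (constant weight) would average errors over all of [0, 1], whereas the indicator weight scores each candidate only where predictions will actually be used.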


