Model Specification Test with Unlabeled Data: Approach from Covariate Shift

11/02/2019
by Masahiro Kato, et al.

We propose a novel framework for model specification testing in regression that uses unlabeled test data. Statistical inference is often conducted under the assumption that the model is correctly specified, but it is difficult to confirm whether this assumption holds. To address this problem, existing works have devised statistical tests for model specification. These works define a correctly specified regression model as one whose error term has zero conditional mean over the training data only. Extending this conventional definition, we define a correctly specified model as one whose error term has zero conditional mean under any distribution of the explanatory variable. This definition is a natural consequence of the orthogonality between the explanatory variable and the error term. A model that does not satisfy this condition may lack robustness to distribution shift. The proposed method enables us to reject a model that is misspecified under our definition. By applying it, we can obtain a model that predicts the labels of the unlabeled test data well without losing interpretability. In experiments, we show how the proposed method works on synthetic and real-world datasets.
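The abstract does not give the test statistic, but the definition suggests why unlabeled test data helps. If the error has zero conditional mean under every covariate distribution, then reweighting the training sample toward the test covariate distribution should not move the least-squares fit. Under misspecification, the best linear approximation depends on where the covariates lie, so the two fits drift apart. Below is a minimal sketch of that idea as a Hausman-style comparison of ordinary and importance-weighted least squares. It is an illustration under stated assumptions, not the authors' procedure: the density ratio p_test(x)/p_train(x) is assumed known (in practice it would have to be estimated from unlabeled test covariates), and the helper names `fit_ls` and `hausman_style_test`, the Gaussian shift, and both data-generating processes are hypothetical.

```python
import numpy as np


def fit_ls(X, y, w=None):
    """(Weighted) least squares: solve (X' W X) beta = X' W y.

    Returns the estimate and the averaged Gram matrix (X' W X) / n.
    """
    if w is None:
        w = np.ones(len(y))
    gram = X.T @ (w[:, None] * X)
    beta = np.linalg.solve(gram, X.T @ (w * y))
    return beta, gram / len(y)


def hausman_style_test(x, y, density_ratio):
    """Hypothetical test: compare OLS with importance-weighted LS on labeled training data.

    density_ratio(x) is ASSUMED to be the known ratio p_test(x) / p_train(x).
    Returns (beta_iw - beta_ols, statistic, p-value).
    """
    n = len(y)
    X = np.column_stack([np.ones(n), x])  # intercept + slope: 2 parameters
    w = density_ratio(x)
    beta_ols, A = fit_ls(X, y)
    beta_iw, A_w = fit_ls(X, y, w)
    e = y - X @ beta_ols
    e_w = y - X @ beta_iw
    # Per-sample influence functions of each estimator (sandwich form).
    psi = np.linalg.solve(A, (X * e[:, None]).T).T
    psi_w = np.linalg.solve(A_w, (X * (w * e_w)[:, None]).T).T
    d = psi_w - psi
    # Estimated covariance of (beta_iw - beta_ols); d sums to zero by the normal equations.
    V = d.T @ d / n**2
    diff = beta_iw - beta_ols
    stat = float(diff @ np.linalg.solve(V, diff))
    # Under the null, stat is approximately chi-square with 2 degrees of freedom,
    # whose survival function is exp(-t / 2).
    p_value = float(np.exp(-stat / 2))
    return diff, stat, p_value


def gaussian_shift_ratio(t):
    # Hypothetical shift: train x ~ N(0, 1), test x ~ N(1, 1),
    # so p_test(t) / p_train(t) = exp(t - 1/2).
    return np.exp(t - 0.5)


rng = np.random.default_rng(0)
n = 5000
x = rng.normal(0.0, 1.0, n)
y_ok = 1 + 2 * x + rng.normal(0.0, 1.0, n)  # linear model correctly specified
y_bad = 1 + 2 * x + x**2 + rng.normal(0.0, 1.0, n)  # linear model misspecified

diff_ok, stat_ok, p_ok = hausman_style_test(x, y_ok, gaussian_shift_ratio)
diff_bad, stat_bad, p_bad = hausman_style_test(x, y_bad, gaussian_shift_ratio)
print(f"correct model:      diff={diff_ok}, p={p_ok:.3g}")
print(f"misspecified model: diff={diff_bad}, p={p_bad:.3g}")
```

In the misspecified case, the best linear fit is roughly (2, 2) under the training distribution but roughly (1, 4) under the shifted one, so the weighted and unweighted estimates separate and the statistic grows with n. In the correctly specified case, both estimate (1, 2). The design choice of reweighting the labeled training data, rather than fitting on test data, is what lets unlabeled test covariates enter the test at all.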

research
05/04/2022

Estimation of prediction error with known covariate shift

In supervised learning, the estimation of prediction error on unlabeled ...
research
01/31/2020

Stable Prediction with Model Misspecification and Agnostic Distribution Shift

For many machine learning algorithms, two main assumptions are required ...
research
07/11/2021

Positive-Unlabeled Classification under Class-Prior Shift: A Prior-invariant Approach Based on Density Ratio Estimation

Learning from positive and unlabeled (PU) data is an important problem i...
research
06/16/2016

Unsupervised Risk Estimation Using Only Conditional Independence Structure

We show how to estimate a model's test error from unlabeled data, on dis...
research
07/05/2022

Adapting to Online Label Shift with Provable Guarantees

The standard supervised learning paradigm works effectively when trainin...
research
11/18/2019

Does Regression Approximate the Influence of the Covariates or Just Measurement Errors? A Model Validity Test

A criterion is proposed for testing hypothesis about the nature of the e...
research
04/13/2023

Unified Out-Of-Distribution Detection: A Model-Specific Perspective

Out-of-distribution (OOD) detection aims to identify test examples that ...
