On regression and classification with possibly missing response variables in the data

12/06/2022
by   Majid Mojirsheibani, et al.

This paper considers the problem of kernel regression and classification with possibly unobservable response variables in the data, where the mechanism that causes the absence of information is unknown and can depend on both the predictors and the response variables. Our proposed approach involves two steps. In the first step, we construct a family of models (possibly infinite dimensional) indexed by the unknown parameter of the missing probability mechanism. In the second step, a search is carried out to find the empirically optimal member of an appropriate cover (or subclass) of the underlying family in the sense of minimizing the mean squared prediction error. The main focus of the paper is on the theoretical properties of these estimators. The issue of identifiability is also addressed. Our methods use a data-splitting approach, which is quite easy to implement. We also derive exponential bounds on the performance of the resulting estimators in terms of their deviations from the true regression curve in general Lp norms, where the size of the cover or subclass is allowed to diverge as the sample size n increases. These bounds immediately yield various strong convergence results for the proposed estimators. As an application of our findings, we consider the problem of statistical classification based on the proposed regression estimators and also look into their rates of convergence under different settings. Although the results are mainly stated for kernel-type estimators, they can also be extended to other popular local-averaging methods such as nearest-neighbor and histogram estimators.
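The two-step procedure described in the abstract can be illustrated with a rough sketch. The abstract does not give the form of the missing probability mechanism or the exact empirical criterion, so everything specific below is an assumption made for illustration only: a logistic mechanism P(delta = 1 | x, y) = 1 / (1 + exp(g(x) + gamma * y)) with a scalar parameter gamma, a Gaussian kernel with fixed bandwidth `h`, a finite grid `gammas` standing in for the cover, and an inverse-probability-weighted squared error on the held-out half as the selection criterion. The function names (`fit_parts`, `regression_estimate`, `select_gamma`) are hypothetical and do not come from the paper.

```python
import numpy as np

def kernel_weights(x_query, x_train, h):
    """Gaussian kernel weights, shape (n_query, n_train)."""
    u = (x_query[:, None] - x_train[None, :]) / h
    return np.exp(-0.5 * u ** 2)

def fit_parts(x_query, x, y, delta, gamma, h):
    """Kernel estimates of the pieces used by the regression and the weights."""
    K = kernel_weights(x_query, x, h)
    y0 = np.where(delta == 1, y, 0.0)            # missing responses are never used
    tilt = delta * np.exp(gamma * y0)            # delta * exp(gamma * Y)
    total = K.sum(axis=1)
    obs_mean = (K @ (delta * y0)) / total        # estimates E[delta * Y | X = x]
    p_miss = (K @ (1.0 - delta)) / total         # estimates P(delta = 0 | X = x)
    tilt_sum = K @ tilt
    tilted_mean = (K @ (tilt * y0)) / tilt_sum   # estimates E[Y | X = x, delta = 0]
    odds_scale = (K @ (1.0 - delta)) / tilt_sum  # estimates exp(g(x))
    return obs_mean, p_miss, tilted_mean, odds_scale

def regression_estimate(x_query, x, y, delta, gamma, h):
    """Step 1: one member of the family of estimators, indexed by gamma."""
    obs_mean, p_miss, tilted_mean, _ = fit_parts(x_query, x, y, delta, gamma, h)
    return obs_mean + p_miss * tilted_mean

def select_gamma(x, y, delta, gammas, h, seed=0):
    """Step 2: data splitting, then pick the gamma with the smallest criterion."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(x))
    tr, va = idx[: len(x) // 2], idx[len(x) // 2:]
    y_va = np.where(delta[va] == 1, y[va], 0.0)
    scores = []
    for gamma in gammas:
        obs_mean, p_miss, tilted_mean, odds_scale = fit_parts(
            x[va], x[tr], y[tr], delta[tr], gamma, h)
        pred = obs_mean + p_miss * tilted_mean
        # Assumed criterion: squared error on observed held-out cases,
        # weighted by the estimated 1 / P(delta = 1 | x, y).
        inv_prob = 1.0 + odds_scale * np.exp(gamma * y_va)
        scores.append(np.mean(delta[va] * (y_va - pred) ** 2 * inv_prob))
    scores = np.array(scores)
    return gammas[int(np.argmin(scores))], scores

# Simulated data (purely illustrative): missingness depends on the response.
rng = np.random.default_rng(1)
n = 400
x = rng.uniform(-2, 2, n)
y_full = np.sin(x) + 0.3 * rng.standard_normal(n)
p_obs = 1.0 / (1.0 + np.exp(-1.0 + 0.8 * y_full))
delta = (rng.uniform(size=n) < p_obs).astype(float)
y = np.where(delta == 1, y_full, np.nan)         # NaN marks a missing response

gammas = np.linspace(-2, 2, 9)                   # finite grid standing in for the cover
best_gamma, scores = select_gamma(x, y, delta, gammas, h=0.3)
```

When every response is observed and gamma is 0, `regression_estimate` reduces to the ordinary Nadaraya-Watson kernel estimator, which gives a quick sanity check on the sketch. The grid size and bandwidth are fixed here for simplicity; the abstract allows the size of the cover to grow with the sample size n.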


