Predicting Regression Probability Distributions with Imperfect Data Through Optimal Transformations

by Jerome H. Friedman, et al.
Stanford University

The goal of regression analysis is to predict the value of a numeric outcome variable y given a vector of joint values of other (predictor) variables x. Usually a particular x-vector does not specify a repeatable value for y, but rather a probability distribution of possible y-values, p(y|x). This distribution has a location, scale and shape, all of which can depend on x, and all of which are needed to infer likely values for y given x. Regression methods usually assume that training data y-values are perfect numeric realizations from some well-behaved p(y|x). Often, however, actual training data y-values are discrete, truncated and/or arbitrarily censored. Regression procedures based on an optimal transformation strategy are presented for estimating the location, scale and shape of p(y|x) as general functions of x, in the possible presence of such imperfect training data. In addition, validation diagnostics are presented to ascertain the quality of the solutions.
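To make the notion of imperfect y-values concrete, the sketch below shows one common way to represent discrete (rounded) and censored outcomes: each observation becomes an interval known to contain the true value, and is scored by the probability mass a candidate distribution places on that interval. This is an illustration only, not the optimal transformation procedure summarized above. The normal distribution, the function names, and the sample intervals are all assumptions introduced for the example.

```python
import math


def normal_cdf(z):
    """Standard normal CDF, computed from the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def interval_log_likelihood(lower, upper, loc, scale):
    """Log-probability that a Normal(loc, scale) draw falls in [lower, upper].

    Assumption for this sketch: p(y|x) is normal. Pass -math.inf or
    math.inf for an open-ended (left- or right-censored) bound.
    """
    if scale <= 0:
        raise ValueError("scale must be positive")
    cdf_lower = 0.0 if lower == -math.inf else normal_cdf((lower - loc) / scale)
    cdf_upper = 1.0 if upper == math.inf else normal_cdf((upper - loc) / scale)
    mass = cdf_upper - cdf_lower
    # A naive CDF difference underflows to zero far out in the tails.
    return math.log(mass) if mass > 0.0 else -math.inf


def total_log_likelihood(intervals, loc, scale):
    """Sum the interval log-likelihoods over all observations."""
    return sum(interval_log_likelihood(lo, hi, loc, scale) for lo, hi in intervals)


# Hypothetical observations: two values rounded to the nearest integer
# (3 and 5), and one value known only to exceed 10 (right-censored).
observations = [(2.5, 3.5), (4.5, 5.5), (10.0, math.inf)]
```

Calling `total_log_likelihood(observations, loc, scale)` for several candidate `(loc, scale)` pairs gives a simple way to compare how well each candidate explains interval-valued data. In a regression setting, `loc` and `scale` would themselves be functions of x.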




