Predictive modeling for limited distributed targets

03/24/2023
by Björn Bokelmann, et al.

Many forecasting applications have a limited distributed target variable, which is zero for most observations and positive for the rest. The econometrics literature contains much research on statistical model building for such target variables. In particular, it proposes two-component model approaches, in which one model is built for the probability that the target is positive and another for the actual value of the target, given that it is positive. However, the econometric literature focuses on effect estimation and does not provide theory for predictive modeling. Nevertheless, some concepts, such as the two-component model approach and Heckman's sample selection correction, also appear in the predictive modeling literature without a sound theoretical foundation. In this paper, we theoretically analyze predictive modeling for limited dependent variables and derive best practices. By analyzing various real-world data sets, we also use the derived theoretical results to explain which predictive modeling approach works best in which application.
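
To make the two-component idea concrete, the following is a minimal sketch and not the method analyzed in the paper. It relies on the identity E[y | x] = P(y > 0 | x) * E[y | y > 0, x], which holds for a target that is either zero or positive. The synthetic data, the single categorical feature `groups`, and the helper `two_component_predict` are assumptions made for illustration only; both components are estimated by simple per-group frequencies and means.

```python
import numpy as np

def two_component_predict(groups, y, n_groups):
    """Per-group two-component prediction: P(y > 0 | g) * E[y | y > 0, g].

    Hypothetical helper for illustration; both components are plain
    per-group sample estimates.
    """
    preds = np.zeros(n_groups)
    for g in range(n_groups):
        y_g = y[groups == g]
        positive = y_g[y_g > 0]
        # Component 1: estimated probability that the target is positive.
        p_pos = positive.size / y_g.size if y_g.size else 0.0
        # Component 2: estimated mean of the target, given that it is positive.
        mean_pos = positive.mean() if positive.size else 0.0
        preds[g] = p_pos * mean_pos
    return preds

# Synthetic limited target (assumed data): mostly zeros, positive otherwise.
rng = np.random.default_rng(0)
n, n_groups = 10_000, 3
groups = rng.integers(0, n_groups, size=n)
p_true = np.array([0.05, 0.20, 0.50])
is_pos = rng.random(n) < p_true[groups]
y = np.where(is_pos, rng.gamma(shape=2.0, scale=10.0 * (groups + 1)), 0.0)

two_part = two_component_predict(groups, y, n_groups)
# Single-model baseline: the per-group mean of y, zeros included.
direct = np.array([y[groups == g].mean() for g in range(n_groups)])

print(two_part)
print(direct)
```

With these fully saturated per-group estimates, the two-component prediction and the direct per-group mean agree up to floating-point error. With parametric or regularized component models the two generally differ, and that is the setting in which the choice between the approaches matters.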


