Data Quality in Regression Problems (original title: Datenqualität in Regressionsproblemen)

01/16/2017
by Wolfgang Doneit, et al.

Regression models are increasingly built from datasets that do not follow a design of experiments. Instead, the data are gathered, for example, by passive, automated monitoring of a technical system. As a consequence, the input data themselves already reflect phenomena of the system and violate statistical distribution assumptions: they can show correlations, clusters, or other patterns. Moreover, the distribution of the input data influences the reliability of a regression model. We therefore propose criteria that quantify typical phenomena in regression input data and demonstrate their suitability on simulated benchmark datasets.
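To make the idea of quantifying a phenomenon in the input data concrete, here is a minimal sketch in Python with NumPy. It is not one of the criteria proposed in the paper, which the abstract does not specify. It assumes a simple stand-in measure, the largest absolute pairwise Pearson correlation between input columns. The function name `max_abs_input_correlation` and both synthetic datasets are hypothetical and chosen only for illustration.

```python
import numpy as np


def max_abs_input_correlation(X):
    """Largest absolute pairwise Pearson correlation between input columns.

    Illustrative stand-in measure only; not a criterion taken from the paper.
    """
    corr = np.corrcoef(X, rowvar=False)
    # Drop the diagonal, which is always 1.
    off_diag = corr[~np.eye(corr.shape[0], dtype=bool)]
    return float(np.max(np.abs(off_diag)))


rng = np.random.default_rng(0)
n = 500

# Inputs drawn independently, loosely mimicking a designed experiment.
X_designed = rng.uniform(-1.0, 1.0, size=(n, 2))

# Inputs as a monitoring system might log them: x2 follows x1 closely.
x1 = rng.uniform(-1.0, 1.0, size=n)
x2 = x1 + rng.normal(0.0, 0.1, size=n)
X_observed = np.column_stack([x1, x2])

score_designed = max_abs_input_correlation(X_designed)
score_observed = max_abs_input_correlation(X_observed)
print(f"designed: {score_designed:.3f}, observed: {score_observed:.3f}")
```

Under these assumptions, the independently sampled inputs should score close to 0 and the passively observed inputs close to 1. A score near 1 indicates that the effects of the two inputs on the output are hard to separate in a regression.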


