Improving predictions by nonlinear regression models from outlying input data

03/17/2020
by William W. Hsieh, et al.

When machine learning/statistical methods are applied to the environmental sciences, nonlinear regression (NLR) models often perform only slightly better, and occasionally worse, than linear regression (LR). The proposed reason for this conundrum is that NLR models can give predictions much worse than LR when given input data that lie outside the domain used in model training. Continuous unbounded variables are widely used in the environmental sciences, so it is not uncommon for new input data to lie far outside the training domain. For six environmental datasets, inputs in the test data were classified as "outliers" and "non-outliers" based on the Mahalanobis distance from the training input data. The prediction scores (mean absolute error, Spearman correlation) showed that NLR outperformed LR for the non-outliers but often underperformed LR for the outliers. An approach based on Occam's Razor (OR) was proposed, in which linear extrapolation was used instead of nonlinear extrapolation for the outliers. The linear extrapolation to the outlier domain was based on the NLR model within the non-outlier domain. This NLR_OR approach reduced occurrences of very poor extrapolation by NLR, and it tended to outperform both NLR and LR for the outliers. In conclusion, input test data should be screened for outliers. For outliers, the unreliable NLR predictions can be replaced by NLR_OR or LR predictions, or by a "no reliable prediction" warning.
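
The screening and extrapolation steps described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation, and several details are assumptions made only for illustration: the function names (`mahalanobis_distance`, `flag_outliers`, `predict_with_linear_extrapolation`), the outlier threshold (a quantile of the training distances), and the extrapolation rule (a finite-difference slope taken just inside the non-outlier boundary, along the ray from the training mean to the test point). The abstract does not specify any of these choices.

```python
import numpy as np


def mahalanobis_distance(X_train, X_new):
    """Mahalanobis distance of each row of X_new from the training inputs."""
    mu = X_train.mean(axis=0)
    cov = np.atleast_2d(np.cov(X_train, rowvar=False))
    cov_inv = np.linalg.pinv(cov)  # pseudo-inverse guards against a singular covariance
    diff = X_new - mu
    d2 = np.einsum("ij,jk,ik->i", diff, cov_inv, diff)
    return np.sqrt(np.maximum(d2, 0.0))


def flag_outliers(X_train, X_test, quantile=0.99):
    """Flag test inputs whose distance exceeds a quantile of the training distances.

    The quantile threshold is an assumption of this sketch, not taken from the paper.
    """
    threshold = np.quantile(mahalanobis_distance(X_train, X_train), quantile)
    return mahalanobis_distance(X_train, X_test) > threshold, threshold


def predict_with_linear_extrapolation(model, X_train, x, threshold, eps=1e-3):
    """Predict for one input x, extrapolating linearly if x is an outlier.

    `model` is any callable mapping an (n, d) array to n predictions.
    The ray-based finite-difference rule below is a hypothetical choice.
    """
    mu = X_train.mean(axis=0)
    d = mahalanobis_distance(X_train, x[None, :])[0]
    if d <= threshold:
        return float(model(x[None, :])[0])  # non-outlier: use the model as is
    # The distance grows linearly along the ray mu -> x, so this point sits on the boundary.
    x_b = mu + (x - mu) * (threshold / d)
    x_in = mu + (x - mu) * (threshold / d) * (1.0 - eps)  # slightly inside the boundary
    y_b = model(x_b[None, :])[0]
    y_in = model(x_in[None, :])[0]
    slope = (y_b - y_in) / np.linalg.norm(x_b - x_in)  # directional slope at the boundary
    return float(y_b + slope * np.linalg.norm(x - x_b))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X_train = rng.normal(size=(500, 2))
    X_test = np.array([[0.0, 0.0], [10.0, 10.0]])
    is_outlier, thr = flag_outliers(X_train, X_test)

    def nonlinear_model(X):
        # Stand-in for a fitted NLR model (hypothetical).
        return (X ** 2).sum(axis=1)

    for x, flag in zip(X_test, is_outlier):
        y = predict_with_linear_extrapolation(nonlinear_model, X_train, x, thr)
        print(x, "outlier" if flag else "non-outlier", y)
```

In this sketch a test point far from the training data receives a prediction that continues the model's local trend at the edge of the training domain, rather than following the nonlinear function outward. The same `flag_outliers` mask could instead be used to fall back to LR predictions or to issue a "no reliable prediction" warning, as the abstract suggests.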


