Missing values: processing with the Kohonen algorithm

01/05/2007
by   Marie Cottrell, et al.

Processing data that contain missing values is a complicated and consistently awkward problem when the data come from real-world contexts. In applications, we very often face observations for which not all values are available, and this can occur for many reasons: typing errors, fields left unanswered in surveys, etc. Most statistical software (SAS, for example) simply discards incomplete observations. This has no practical consequence when the data are very numerous, but if too few observations remain, the results can lose all significance. To avoid discarding data in this way, one can replace a missing value with the mean of the corresponding variable, but this approximation can be very poor when the variable has a large variance. It is therefore worth noting that the Kohonen algorithm (as well as the Forgy algorithm) deals perfectly well with data that contain missing values, without having to estimate them beforehand. We are particularly interested in the Kohonen algorithm for its visualization properties.
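The abstract does not spell out how the Kohonen algorithm accommodates incomplete observations. A minimal sketch of one common way to do it, under stated assumptions: missing entries are encoded as NaN, the winning unit is found using only the observed components of each observation, and only those components of the code vectors are updated. The function names, the one-dimensional map, the linear decay schedules, and all parameter values below are hypothetical choices made for illustration, not the authors' implementation. The sketch also assumes every variable is observed at least once.

```python
import numpy as np


def train_som_with_missing(data, n_units=5, n_epochs=50, lr0=0.5, radius0=2.0, seed=0):
    """Train a 1-D Kohonen map on data whose missing entries are NaN (illustrative sketch)."""
    rng = np.random.default_rng(seed)
    observed = ~np.isnan(data)

    # Initialise code vectors near the per-variable means of the observed values.
    col_mean = np.nanmean(data, axis=0)
    col_std = np.nanstd(data, axis=0)
    weights = col_mean + 0.1 * col_std * rng.standard_normal((n_units, data.shape[1]))

    positions = np.arange(n_units)  # unit coordinates on the 1-D map
    n_steps = n_epochs * len(data)
    step = 0
    for _ in range(n_epochs):
        for i in rng.permutation(len(data)):
            mask = observed[i]
            frac = step / n_steps
            step += 1
            if not mask.any():
                continue  # an observation with no observed value carries no information

            lr = lr0 * (1.0 - frac)                        # assumed linear decay
            radius = max(radius0 * (1.0 - frac), 0.5)      # assumed linear decay with a floor

            # Distance restricted to the observed components of this observation.
            diff = data[i, mask] - weights[:, mask]
            winner = np.argmin((diff ** 2).sum(axis=1))

            # Gaussian neighbourhood around the winner on the map.
            h = np.exp(-((positions - winner) ** 2) / (2.0 * radius ** 2))

            # Update only the observed components of every code vector.
            weights[:, mask] += lr * h[:, None] * diff
    return weights


def best_matching_unit(x, weights):
    """Index of the closest code vector, ignoring the NaN components of x."""
    mask = ~np.isnan(x)
    return int(np.argmin(((x[mask] - weights[:, mask]) ** 2).sum(axis=1)))


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    # Two synthetic groups in 3 variables, with roughly 20% of the values removed.
    data = np.vstack([rng.normal(0.0, 0.5, (40, 3)), rng.normal(10.0, 0.5, (40, 3))])
    data[rng.random(data.shape) < 0.2] = np.nan
    weights = train_som_with_missing(data)
    print(best_matching_unit(np.array([0.0, np.nan, 0.0]), weights))
    print(best_matching_unit(np.array([10.0, np.nan, 10.0]), weights))
```

Because no value is ever imputed, the code vectors themselves remain complete, so in this sketch an incomplete observation can still be assigned to a unit and displayed on the map.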


research
02/20/2023

Transformed Distribution Matching for Missing Value Imputation

We study the problem of imputing missing values in a dataset, which has ...
research
02/26/2019

Optimal Clustering with Missing Values

Missing values frequently arise in modern biomedical studies due to vari...
research
04/26/2023

Regression with Sensor Data Containing Incomplete Observations

This paper addresses a regression problem in which output label values a...
research
06/19/2020

Bayesian Optimization with Missing Inputs

Bayesian optimization (BO) is an efficient method for optimizing expensi...
research
11/24/2020

To Explore What Isn't There – Glyph-based Visualization for Analysis of Missing Values

This paper contributes a novel visualization method, Missingness Glyph, ...
research
08/13/2016

An approach to dealing with missing values in heterogeneous data using k-nearest neighbors

Techniques such as clusterization, neural networks and decision making u...
research
04/02/2018

Process Control with Highly Left Censored Data

The need to control industrial processes, detecting changes in process p...
