An approach to dealing with missing values in heterogeneous data using k-nearest neighbors

08/13/2016
by Davi E. N. Frossard, et al.

Techniques such as clustering, neural networks and decision making usually rely on algorithms that are not well suited to handling missing values. However, real-world data frequently contains them. The simplest solutions are either to substitute each missing value with a best-guess value or to disregard the missing values entirely. Unfortunately, both approaches can lead to biased results. In this paper, we propose a technique for dealing with missing values in heterogeneous data using imputation based on the k-nearest neighbors algorithm. It can handle real-valued data (which we refer to as crisp henceforth), interval data and fuzzy data. The effectiveness of the algorithm is tested on several datasets, and the numerical results are promising.
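
The abstract does not specify the distance measure or how interval and fuzzy values are combined, so the following is only a minimal, hypothetical sketch of k-nearest-neighbor imputation over mixed value types, not the authors' method. It assumes a made-up encoding (float for crisp, `(lo, hi)` for an interval, `(a, b, c)` for a triangular fuzzy number, `None` for missing), maps intervals to `(lo, midpoint, hi)` so all types share one form, measures distance as the root-mean-square difference over attributes observed in both records, and fills each gap with the component-wise mean of the k nearest donors. The names `as_triangle`, `distance`, `mean_value` and `knn_impute` are illustrative.

```python
import math
from typing import List, Sequence, Tuple, Union

# Hypothetical encoding (an assumption, not taken from the paper):
#   float       -> crisp value
#   (lo, hi)    -> closed interval
#   (a, b, c)   -> triangular fuzzy number with a <= b <= c
#   None        -> missing value
Value = Union[float, Tuple[float, ...], None]


def as_triangle(v) -> Tuple[float, ...]:
    """Map crisp, interval and triangular fuzzy values onto a common (a, b, c) form."""
    if isinstance(v, tuple):
        if len(v) == 2:  # interval: use its midpoint as the peak
            lo, hi = v
            return (lo, (lo + hi) / 2, hi)
        return v  # already a triangular fuzzy number
    return (v, v, v)  # crisp value as a degenerate triangle


def distance(x: Sequence[Value], y: Sequence[Value]) -> float:
    """RMS distance over attributes observed in both records (inf if none are shared)."""
    total, shared = 0.0, 0
    for u, v in zip(x, y):
        if u is None or v is None:
            continue
        tu, tv = as_triangle(u), as_triangle(v)
        total += sum((p - q) ** 2 for p, q in zip(tu, tv)) / 3
        shared += 1
    return math.sqrt(total / shared) if shared else math.inf


def mean_value(values: List[Value]) -> Value:
    """Average one attribute's values, component-wise for intervals and fuzzy numbers."""
    if isinstance(values[0], tuple):
        return tuple(sum(c) / len(values) for c in zip(*values))
    return sum(values) / len(values)


def knn_impute(data: List[List[Value]], k: int = 3) -> List[List[Value]]:
    """Return a copy of `data` where each None is replaced by the mean of its k nearest donors."""
    result = [list(row) for row in data]
    for i, row in enumerate(data):
        for j, v in enumerate(row):
            if v is not None:
                continue
            # Donors: other records that actually observe attribute j.
            donors = [
                (distance(row, other), other[j])
                for m, other in enumerate(data)
                if m != i and other[j] is not None
            ]
            donors.sort(key=lambda d: d[0])
            nearest = [val for _, val in donors[:k]]
            if nearest:
                result[i][j] = mean_value(nearest)
    return result
```

For example, with the records `[1.0, (2.0, 4.0), (0.0, 1.0, 2.0)]`, `[1.2, (2.5, 4.5), None]`, `[5.0, (8.0, 9.0), (6.0, 7.0, 8.0)]` and `[0.9, None, (0.5, 1.0, 1.5)]`, calling `knn_impute(data, k=2)` fills the missing fuzzy value with `(0.25, 1.0, 1.75)` and the missing interval with `(2.25, 4.25)`. Imputing in each attribute's native representation keeps the output type consistent with its column, at the cost of ignoring any membership-function shape beyond three points.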
