Adaptive RBF Interpolation for Estimating Missing Values in Geographical Data

08/10/2019
by   Kaifeng Gao, et al.

The quality of datasets is a critical issue in big data mining: higher-quality datasets allow more interesting patterns to be mined. Missing values in geographical data degrade the quality of big datasets. To improve data quality, missing values generally need to be estimated using machine learning algorithms or mathematical methods such as approximation and interpolation. In this paper, we propose an adaptive Radial Basis Function (RBF) interpolation algorithm for estimating missing values in geographical data. In the proposed method, the samples with known values are treated as the data points, while the samples with missing values are treated as the interpolated points. For each interpolated point, a local set of data points is first adaptively determined. Then, the missing value of the interpolated point is imputed by RBF interpolation over the local set of data points. Moreover, the shape factors of the RBF are also adaptively determined by considering the distribution of the local set of data points. To evaluate the performance of the proposed method, we compare it with the commonly used k Nearest Neighbors (kNN) interpolation and Adaptive Inverse Distance Weighted (AIDW) methods in three groups of benchmark experiments. Experimental results indicate that the proposed method outperforms the kNN interpolation and AIDW in terms of accuracy, but is worse than both in terms of efficiency.


1 Introduction

Datasets are the key elements in big data mining, and their quality has an important impact on the results of big data analysis. From a higher-quality dataset, hidden rules can often be mined, and these rules can lead to interesting findings. At present, big data mining technology is widely used in various fields, such as geographic analysis [10, 16], financial analysis, smart cities, and biotechnology. Such research usually needs a high-quality dataset, but in practice datasets always contain noisy data or missing values [9, 14, 17]. To improve data quality, various machine learning algorithms [7, 15] are often required to estimate the missing values and clean the noisy data.

The RBF interpolation algorithm is a popular method for estimating missing values [4, 5, 6]. In large-scale computing, the cost can be minimized by using an adaptive scheduling method [1]. An RBF is a distance-based function that is meshless and independent of dimension, so it is inherently suitable for processing multidimensional scattered data. Many scholars have studied RBFs. Skala [13] used CSRBF to analyze big datasets; Cuomo et al. [2, 3] studied the reconstruction of implicit curves and surfaces by RBF interpolation; Kedward et al. [8] used multiscale RBF interpolation to study mesh deformation. In RBF interpolation, the shape factor is an important factor affecting accuracy, and several empirical formulas for the optimum shape factor have been proposed.

In this paper, our objective is to estimate missing values in geographical data. We propose an adaptive RBF interpolation algorithm, which adaptively determines the shape factor from the density of the local dataset. To evaluate its performance in estimating missing values, we use three datasets for verification experiments and compare the accuracy and efficiency of adaptive RBF interpolation with those of kNN interpolation and AIDW.

The rest of the paper is organized as follows. Section 2 mainly introduces the implementation process of the adaptive RBF interpolation algorithm, and briefly introduces the method to evaluate the performance of adaptive RBF interpolation. Section 3 introduces the experimental materials. Section 4 presents the estimated results of missing values, then discusses the experimental results. Section 5 draws some conclusions.

2 Methods

In this paper, our objective is to develop an adaptive RBF interpolation algorithm to estimate missing values in geospatial big data, and to compare the results with those of kNN and AIDW. In this section, we first introduce the specific implementation process of the adaptive RBF interpolation algorithm, and then briefly introduce the method used to evaluate its performance.

2.1 Adaptive RBF Interpolation Algorithm

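The basic ideas behind the RBF interpolation are as follows. To construct a high-dimensional function $F(x)$, $x \in \mathbb{R}^n$, suppose there is a set of discrete points $x_i \in \mathbb{R}^n$, $i = 1, 2, \ldots, N$, with associated data values $F(x_i)$. The function $F(x)$ can then be expressed as a linear combination of RBFs in the form (Eq. (1)):

$$F(x) = \sum_{j=1}^{N} a_j \, \phi\left(\lVert x - x_j \rVert_2\right) \tag{1}$$

where $N$ is the number of interpolation points, $a_j$ is the undetermined coefficient, and the function $\phi$ is a type of RBF.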

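The kernel function selected in this paper is the Multi-quadric RBF (MQ-RBF), which has the form (Eq. (2)):

$$\phi(r) = \sqrt{r^2 + c^2} \tag{2}$$

where $r$ is the distance between the interpolated point and the data point, and $c$ is the shape factor. Substituting the data points into Eq. (1), the interpolation conditions become (Eq. (3)):

$$F(x_i) = \sum_{j=1}^{N} a_j \, \phi\left(\lVert x_i - x_j \rVert_2\right), \quad i = 1, 2, \ldots, N \tag{3}$$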
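The interpolation conditions form an N-by-N linear system in the coefficients. The following is a minimal NumPy sketch of that step, assuming the multiquadric kernel in its common form sqrt(r^2 + c^2) and a dense direct solve; the function names (`mq_rbf`, `rbf_fit`, `rbf_eval`) and the toy data are illustrative and not taken from the paper.

```python
import numpy as np

def mq_rbf(r, c):
    """Multiquadric kernel, assumed here as phi(r) = sqrt(r^2 + c^2)."""
    return np.sqrt(r ** 2 + c ** 2)

def rbf_fit(points, values, c):
    """Solve the interpolation conditions A a = F for the coefficients a."""
    # Pairwise Euclidean distances between all data points, shape (N, N).
    dist = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)
    return np.linalg.solve(mq_rbf(dist, c), values)

def rbf_eval(query, points, coeffs, c):
    """Evaluate the RBF expansion at the query points, shape (M, dim)."""
    dist = np.linalg.norm(query[:, None, :] - points[None, :, :], axis=-1)
    return mq_rbf(dist, c) @ coeffs

# Toy data (hypothetical): five 2D points with associated values.
points = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, 0.5]])
values = np.array([1.0, 2.0, 3.0, 4.0, 2.5])
coeffs = rbf_fit(points, values, c=0.5)

# The interpolant reproduces the known values at the data points.
reproduced = rbf_eval(points, points, coeffs, c=0.5)
# Estimate at a point with a "missing" value.
estimate = rbf_eval(np.array([[0.25, 0.75]]), points, coeffs, c=0.5)
```

A dense solve costs O(N^3), which is one reason the method restricts each interpolated point to a small local set of data points.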

When using the RBF interpolation algorithm on a big dataset, it is not practical to compute an interpolated point from all data points. The closer a data point is to the interpolated point, the greater its influence on the interpolation result; beyond a certain distance, the influence of a data point on the interpolated point is almost negligible. Therefore, we calculate the distances from an interpolated point to all data points and select the 20 points with the smallest distances as the local dataset for the interpolated point.
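The selection of the local dataset can be sketched as a brute-force nearest-neighbor search. This is a minimal illustration, assuming 2D coordinates in a NumPy array; the function name `local_dataset` and the random test data are hypothetical.

```python
import numpy as np

def local_dataset(query_point, data_points, k=20):
    """Return indices and distances of the k data points nearest to query_point."""
    dist = np.linalg.norm(data_points - query_point, axis=1)
    k = min(k, len(data_points))
    # argpartition finds the k smallest distances without a full sort.
    idx = np.argpartition(dist, k - 1)[:k]
    # Order the selected neighbors from nearest to farthest.
    idx = idx[np.argsort(dist[idx])]
    return idx, dist[idx]

# Hypothetical data: 1000 scattered points in a 100 x 100 region.
rng = np.random.default_rng(0)
data_points = rng.uniform(0.0, 100.0, size=(1000, 2))
query = np.array([50.0, 50.0])
idx, dist = local_dataset(query, data_points, k=20)
```

Computing all distances costs O(N) per interpolated point. For datasets with around a million points, a spatial index such as a k-d tree or a grid would normally replace the brute-force scan; that choice is not specified in the text.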

In Eq. (2), the value of the shape factor $c$ in MQ-RBF has a significant influence on the interpolation result. Following the method proposed by Lu and Wong [11, 12], we adaptively determine the value of $c$ for each interpolated point from the density of the local dataset. The expected density $D_{exp}$ is calculated by (Eq. (4)):

$$D_{exp} = \frac{N_{dp}}{(X_{\max} - X_{\min})(Y_{\max} - Y_{\min})} \tag{4}$$

where $N_{dp}$ is the number of data points in the dataset, $X_{\max}$ and $X_{\min}$ are the maximum and minimum values of $x$ for the data points in the dataset, and $Y_{\max}$ and $Y_{\min}$ are the maximum and minimum values of $y$ in the dataset.

The local density $D_{loc}$ is calculated by (Eq. (5)):

$$D_{loc} = \frac{N_{loc}}{(x_{\max} - x_{\min})(y_{\max} - y_{\min})} \tag{5}$$

where $N_{loc}$ is the number of data points in the local dataset (in this paper, we set $N_{loc}$ to 20), $x_{\max}$ and $x_{\min}$ are the maximum and minimum values of $x$ for the data points in the local dataset, and $y_{\max}$ and $y_{\min}$ are the maximum and minimum values of $y$ in the local dataset.
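Both densities are described as a point count divided by the area of an axis-aligned bounding box, so a single helper can serve for the whole dataset and for the local dataset. The sketch below assumes that reading and 2D (x, y) coordinates; `bbox_density` is an illustrative name.

```python
import numpy as np

def bbox_density(points):
    """Number of points divided by the area of their axis-aligned bounding box."""
    x_min, y_min = points.min(axis=0)
    x_max, y_max = points.max(axis=0)
    area = (x_max - x_min) * (y_max - y_min)
    if area <= 0.0:
        # Collinear or coincident points have no bounding-box area.
        raise ValueError("bounding box has zero area")
    return len(points) / area

# Hypothetical example: 5 points inside a 2 x 5 box give a density of 0.5.
pts = np.array([[0.0, 0.0], [2.0, 0.0], [0.0, 5.0], [2.0, 5.0], [1.0, 1.0]])
density = bbox_density(pts)
```

Applied to all data points this gives the expected density, and applied to the 20 nearest neighbors of an interpolated point it gives the local density.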

With both the local density $D_{loc}$ and the expected density $D_{exp}$, the local density statistic can be expressed as (Eq. (6)):

$$D(s_0) = \frac{D_{loc}}{D_{exp}} \tag{6}$$

where $s_0$ is the location of an interpolated point. Then the measure $D(s_0)$ is normalized to $D_{nom}$ by a fuzzy membership function (Eq. (7)):

(7)

Then the shape factor $c$ is determined by a triangular membership function (Eq. (8)):

(8)

where $c_1, c_2, c_3, c_4, c_5$ are five levels of shape factor.
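The two membership steps can be sketched as follows. This sketch assumes the paper follows the forms used in the AIDW method of Lu and Wong [11]: a cosine-shaped fuzzy membership function that maps the statistic from the range [0, 2] to [0, 1], and a triangular membership function that blends linearly between five levels at the breakpoints 0.1, 0.3, 0.5, 0.7 and 0.9. The bounds, the breakpoints, the level values, and their ordering are all assumptions here, not values confirmed by the text.

```python
import math
import numpy as np

# Breakpoints of the triangular membership function (assumed, after Lu and Wong).
KNOTS = (0.1, 0.3, 0.5, 0.7, 0.9)

def normalize_statistic(d_stat, d_min=0.0, d_max=2.0):
    """Map the local density statistic to [0, 1] with a cosine membership function."""
    if d_stat <= d_min:
        return 0.0
    if d_stat >= d_max:
        return 1.0
    return 0.5 - 0.5 * math.cos(math.pi * (d_stat - d_min) / (d_max - d_min))

def shape_factor(d_loc, d_exp, levels):
    """Pick a shape factor by blending five levels according to the density ratio."""
    d_stat = d_loc / d_exp              # local density statistic
    mu = normalize_statistic(d_stat)    # normalized measure in [0, 1]
    # Piecewise-linear blend between neighboring levels; np.interp clamps
    # to the first and last level outside the breakpoint range.
    return float(np.interp(mu, KNOTS, levels))

# Hypothetical shape-factor levels, chosen only for illustration.
levels = (0.5, 1.0, 1.5, 2.0, 2.5)
c_sparse = shape_factor(d_loc=0.0, d_exp=1.0, levels=levels)
c_typical = shape_factor(d_loc=1.0, d_exp=1.0, levels=levels)
c_dense = shape_factor(d_loc=3.0, d_exp=1.0, levels=levels)
```

With these assumptions, a local density equal to the expected density maps to the middle level, while very sparse or very dense neighborhoods map to the first or last level.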

After determining the shape factor $c$, the next steps are the same as in the general RBF calculation method. The specific process of the adaptive RBF interpolation algorithm is illustrated in Fig. 1.

Figure 1: Flowchart of the adaptive RBF interpolation algorithm

2.2 Evaluating the Performance of Adaptive RBF Interpolation

To evaluate the computational accuracy of the adaptive RBF interpolation algorithm, we use the Root Mean Square Error (RMSE) as the metric. The RMSE measures the deviation between the estimated values and the true values. We then compare the accuracy and efficiency of the adaptive RBF estimator with those of the kNN and AIDW estimators.
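The RMSE is the square root of the mean squared difference between estimated and true values. A minimal sketch, with an illustrative function name and made-up numbers:

```python
import numpy as np

def rmse(estimated, true):
    """Root Mean Square Error between estimated and true values."""
    estimated = np.asarray(estimated, dtype=float)
    true = np.asarray(true, dtype=float)
    return float(np.sqrt(np.mean((estimated - true) ** 2)))

# Hypothetical elevations: only the third estimate is off, by 2 units.
error = rmse([1.0, 2.0, 3.0], [1.0, 2.0, 5.0])
```

A lower RMSE indicates that the estimator reproduces the held-out values more closely.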

3 Experimental Design

To evaluate the performance of the presented adaptive RBF interpolation algorithm, we test it on three datasets. The details of the experimental environment are listed in Table 1.

In our experiments, we use three datasets from the digital elevation models (DEM) of three cities (Fig. 2(a) to Fig. 2(c)); the range of the three DEM maps is the same. We randomly select 10% of the observed samples from each dataset as the samples with missing values, and treat the rest as the samples with known values. Note that the samples with missing values actually have elevation values; for testing, we assume that these elevations are missing. Basic information on the datasets is listed in Table 2.
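The hold-out procedure described above can be sketched as a random split. This is an illustration under assumptions: samples are stored as rows of (x, y, elevation), the split uses a seeded NumPy generator, and the name `holdout_split` is hypothetical.

```python
import numpy as np

def holdout_split(samples, missing_fraction=0.10, seed=0):
    """Randomly split samples into known samples and samples treated as missing."""
    rng = np.random.default_rng(seed)
    n = len(samples)
    n_missing = int(round(n * missing_fraction))
    perm = rng.permutation(n)
    missing = samples[perm[:n_missing]]   # true elevations kept for evaluation
    known = samples[perm[n_missing:]]
    return known, missing

# Hypothetical DEM samples: columns are x, y, elevation.
rng = np.random.default_rng(1)
samples = rng.uniform(0.0, 100.0, size=(1000, 3))
known, missing = holdout_split(samples)
```

The elevations of the held-out samples are hidden from the estimator and later compared with the estimates to compute the error.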

| Specification | Details |
| OS | Windows 7 Professional |
| CPU | Intel(R) i5-4210U |
| CPU Frequency | 1.70 GHz |
| CPU RAM | 8 GB |
| CPU Core | 4 |
Table 1: Details of experimental environment

| Dataset | Number of known values | Number of missing values | Illustration |
| Beijing | 1,111,369 | 123,592 | Fig. 2(a) |
| Chongqing | 1,074,379 | 97,525 | Fig. 2(b) |
| Longyan | 1,040,670 | 119,050 | Fig. 2(c) |
Table 2: Details of the testing data

(a) DEM map of Beijing City
(b) DEM map of Chongqing City
(c) DEM map of Longyan City
Figure 2: The DEM maps of three cities for the experimental tests.

4 Results and Discussion

We compare the accuracy and efficiency of the adaptive RBF estimator with those of the kNN and AIDW estimators.

Figure 3: Comparisons of the computational accuracy
Figure 4: Comparisons of the computational efficiency

As shown in Fig. 3 and Fig. 4, the adaptive RBF estimator is the most accurate and the kNN estimator is the least accurate. As the number of known data points in the datasets decreases, the accuracy of all three estimators decreases significantly. Moreover, the computational efficiency of the adaptive RBF estimator is worse than that of the kNN and AIDW estimators; among the three, kNN has the best computational efficiency. With the increase of data quantity, the disadvantage of the computational efficiency of the kNN estimator becomes more and more obvious.

The data points selected from the DEMs are evenly distributed, and the shape factor of the adaptive RBF interpolation algorithm is adapted according to the density of the points in the local dataset. Therefore, when missing data are estimated in a dataset with uniformly distributed data points, the advantages of the adaptive RBF interpolation algorithm may not be fully realized. Further research is needed on datasets with unevenly distributed data points.

5 Conclusions

In this paper, we proposed an adaptive RBF interpolation algorithm for estimating the missing values in geographical data. We performed three groups of experimental tests to evaluate the computational accuracy and efficiency of the proposed adaptive RBF interpolation by comparing it with the kNN interpolation and AIDW methods. We found that the adaptive RBF interpolation is more accurate than kNN interpolation and AIDW on regularly distributed datasets, but its efficiency is worse than that of both.

Acknowledgments

This research was jointly supported by the National Natural Science Foundation of China (11602235), and the Fundamental Research Funds for China Central Universities (2652018091, 2652018107, and 2652018109).

References

  • [1] G. B. Barone, V. Boccia, D. Bottalico, R. Campagna, L. Carracciuolo, G. Laccetti, and M. Lapegna (2016) An approach to forecast queue time in adaptive scheduling: how to mediate system efficiency and users satisfaction. International Journal of Parallel Programming 45 (5), pp. 1–30. Cited by: §1.
  • [2] S. Cuomo, A. Galletti, G. Giunta, and L. Marcellino (2016) Reconstruction of implicit curves and surfaces via rbf interpolation. Applied Numerical Mathematics 116, pp. 60–63. Cited by: §1.
  • [3] S. Cuomo, A. Galletti, G. Giunta, and A. Starace (2013) Surface reconstruction from scattered point via rbf interpolation on gpu. Cited by: §1.
  • [4] Z. Ding, G. Mei, S. Cuomo, H. Tian, and N. Xu (2018) Accelerating multi-dimensional interpolation using moving least-squares on the gpu. Concurrency Computation 30 (24). Cited by: §1.
  • [5] Z. Ding, M. Gang, S. Cuomo, Y. Li, and N. Xu (2018) Comparison of estimating missing values in iot time series data using different interpolation algorithms. International Journal of Parallel Programming, pp. 1–15. Cited by: §1.
  • [6] Z. Ding, M. Gang, S. Cuomo, N. Xu, and T. Hong (2017) Performance evaluation of gpu-accelerated spatial interpolation using radial basis functions for building explicit surfaces. International Journal of Parallel Programming (157), pp. 1–29. Cited by: §1.
  • [7] D. Gao, Y. Liu, J. Meng, Y. Jia, and C. Fan (2018) Estimating significant wave height from sar imagery based on an svm regression model. Acta Oceanologica Sinica 37 (3), pp. 103–110. Cited by: §1.
  • [8] L. Kedward, C. B. Allen, and T. C. S. Rendall (2017) Efficient and exact mesh deformation using multiscale rbf interpolation. Journal of Computational Physics 345, pp. 732–751. Cited by: §1.
  • [9] R. H. Keogh, S. R. Seaman, J. W. Bartlett, and A. M. Wood (2018) Multiple imputation of missing data in nested case-control and case-cohort studies. Biometrics 74 (4). Cited by: §1.
  • [10] Z. Liang, Z. Na, H. Wei, Z. Feng, Q. Qiao, and M. Luo From big data to big analysis: a perspective of geographical conditions monitoring. pp. 1–15. Cited by: §1.
  • [11] G. Y. Lu and D. W. Wong (2008) An adaptive inverse-distance weighting spatial interpolation technique. Computers & Geosciences 34 (9), pp. 1044–1055. Cited by: §2.1.
  • [12] G. Mei, N. Xu, and L. Xu (2016) Improving gpu-accelerated adaptive idw interpolation algorithm using fast knn search. Springerplus 5 (1), pp. 1389. Cited by: §2.1.
  • [13] V. Skala (2017) RBF interpolation with csrbf of large data sets. Procedia Computer Science 108, pp. 2433–2437. Cited by: §1.
  • [14] D. Sovilj, E. Eirola, Y. Miche, K. M. Björk, N. Rui, A. Akusok, and A. Lendasse (2016) Extreme learning machine for missing data using multiple imputations. Neurocomputing 174 (PA), pp. 220–231. Cited by: §1.
  • [15] T. Tang, S. Chen, Z. Meng, H. Wei, and J. Luo (2018) Very large-scale data classification based on k-means clustering and multi-kernel svm. Soft Computing (1), pp. 1–9. Cited by: §1.
  • [16] P. Thakuriah, N. Y. Tilahun, and M. Zellner (2016) Big data and urban informatics: innovations and challenges to urban planning and knowledge discovery. Cited by: §1.
  • [17] H. Tomita, H. Fujisawa, and M. Henmi (2018) A bias-corrected estimator in multiple imputation for missing data. Statistics in Medicine 47 (1), pp. 1–16. Cited by: §1.