Introduction
Big Data is indispensable for the development of machine learning [Manyika et al.2011]. Besides humanannotated data, geodistributed sensors are great source for data collection, which benefits the development of machine learning methods in understanding the environmental dynamics. In a common sensing or crowd sensing campaign [Chong and Kumar2003], sensors at different locations collect the environmental data during a time period. However, most sensing campaigns suffer from systematic or accidental missing data mechanism, like broken sensors, communication errors and etc. Such unfortunate information loss throws importance upon imputing missing values in the sensor data.
Sensor data recovery is a great challenge due to the remarkable portion of missing entries and their stochastic distribution. A handful of studies attempt to leverage the locality in the observational feature space via conventional methods like inverse distance weighting [Chen and Liu2012] or ARMA [Valipour, Banihabib, and Behbahani2013] or nearest neighbors [Pan and Li2010]
. Such methods yield unsatisfactory results as they fail to capture the latent, complex, and potentially higherorder temporal dynamics. In contrast, we aim to capture such dynamics in the latent (hidden) feature space by neural networks, which have proven rather effective in learning latent dynamics of timeseries data
[Längkvist, Karlsson, and Loutfi2014, Mei and Eisner2017].However, common deep learning procedure cannot directly be used with incomplete training data. We develop a flexible scheme to deal with this—first initialize the entries using simple statistic estimates, and then update the estimated value via a novel multilayer Iterative Imputing Network (IIN). The core component of our Iterative Imputing Network is a multilayer Long ShortTerm Memory (LSTM) network, which consumes a sequence of timestamped items and summarizes the representation and context information for each of them. An output layer is then stacked on this LSTM and projects the representation of any missing observation to a readable imputation. We propose to use two different versions of LSTM—the standard LSTM
[Hochreiter and Schmidhuber1997] for regularly sampled sensor data and PhaseLSTM [Neil, Pfeiffer, and Liu2016] for the irregularly sampled case. We call it Imputing Network (IN).Our novel Iterative Imputing Network (IIN) is a multilevel cascade of Imputing Networks (IN) that share the same set of weights, with the output imputation of any member IN block being fed into the higherlevel one as input. It is thus mathematically equivalent to iteratively training the same Imputing Network with the same sequence, until the imputation accuracy achieves a satisfactory level.
Why is this important? Because by iteratively connecting (training) the network, the issue of data sparsity, known as a natural enemy of neural models, is well handled—the Imputation Network is able to gradually adapt itself by iteratively refining its missing value imputation on one single sequence sample. Note that such data sparsity issue is especially troublesome in the task of missing data imputation, because 1) missing values naturally and effectively reduce data adequacy; 2) missing values sometimes are clustered together forming missing blocks, which are even more challenging to deal with.
Our Iterative Imputing Network has essential advantages over previous methods. First, our model summarizes higherorder temporal dynamics in the timeseries, by representing each timestamped observation with a deep neural network, while previous methods highly reply on locality in the observational feature space and could only summarize loworder (if more than one) temporal dependency in the timeseries. Second, by representing the sequences in hidden space and iteratively refining the missing value imputation, our model better deals with missing blocks. Our model is consistent with the imputations within a missing block, and effectively adopt more information than the previous methods that skip missing values. According to these advantages, our model outperforms all previous methods on a hard benchmark Beijing air quality and meteorological dataset. Moreover, we demonstrate with our experiments that the superiority of our model is consistent with varying missing data rates.
In summary, our main contributions are as follows:

We propose a practical scheme for sensor data recovery, enabling the use of deep learning procedure by initializing missing entries via flexible methods.

We design a novel Iterative Imputing Network (IIN), capturing the highorder latent temporal dynamics and iteratively refining the estimation of missing values.

Our method significantly improves the stateoftheart result on a hard benchmark, and shows robustness with varying missing rates.
Related Work
In the following, we review existing works related to our problem, including: (1) sensor data recovery; (2) deep learning for time series.
Sensing Data Recovery
When wireless sensor network emerged, [Doherty and others2000] pointed out the importance of data recovery. Once all missing values have been imputed, the dataset can then be analyzed using standard techniques for complete data. [Yozgatligil et al.2013, Lee, Kulic, and Nakamura2008] considered the temporal dependencies of sensing data series using statistical analysis like ARMA. Some studies include the spatial cue into the missing data recovery. [Pan and Li2010] presented a Knearest neighbor method for jointly spatial and temporal data imputation. [Yi et al.2016] considered spatial similarities together with their temporal similarities. Nevertheless, both of them only captured features on the surface, falling short in learning the internal dynamic of the temporal data. [Gruenwald et al.2010] applied treebased data mining techniques to handle missing data on reallife and synthetic datasets. [Lindström et al.2014b] utilized matrix analysis, while [Sorjamaa et al.2010] proposed a linear projection method called empirical orthogonal functions. However, their methods did not show enough robustness to the high rates and random distribution of missing data. Additionally, our approaches do not explicitly model the spatial similarities, since they can involve great uncertainties or noise. Instead, we feed all sensors’ data into one network without seperating each sensor, enabling our network to benefit from shared trends and common property of different sensors.
Deep Learning for Series
Deep learning has been known for its great capability of learning data representations automatically, instead of using handcraft features. Recurrent neural networks (RNN) and Long shortterm memory (LSTM) networks
[Gers, Eck, and Schmidhuber2000, Malhotra et al.2015] showed the efficacy for time series regression. However, few studies dive deep in dealing with missing values. [Parveen and Green2004] used binary indicators to handle missing values representing a pause between two sentences or a possibly interrupt for speech signals. Their missing values did not contain much information, which could cause much degradation of accuracy. They did not truly solve missing data recovery, but just ingore that. In contrast, our scheme aims to tackle the problem of missing data recovery, which has great significance due to the great amount but low quality of sensor data. Inspired by [Dai, He, and Sun2016]which proposed a segmentation cascade model in computer vision, we design our multilevel cascade Iterative Imputing Network (IIN) to model the recovery of time series.
Overview
Sensor data usually suffer from missing values because of errors in data collection and transmission. The missing entries have large and random distribution, illustrated by Figure 0(a). It is hard for us to utilize the sensor data for subsequent analysis and visualization unless we tackle the prevalent innate missing entries in the dataset.
To formulate the problem, we treat the series of data collected by one of the sensors as a sequence . Partial entries are missing, while the other entries are numerical values like temperatures and humidities. are in an unknown random distribution and some of them locate in consecutive values, forming a block missing. Our goal is to learn from and get a betterestimated value for missing readings . Based on , we split the data into the training set and test set . For testing set , let denote . Then we extract a portion of nonmissing entries denoted by from as ground truth, and learn how to recover from . Our intuition is that first initialize the entries using simple methods with few costs, and then apply deep learning model to learn and give prediction based on highorder latent temporal dynamics. The mean absolute error (MAE) and mean relative error (MRE) are used as metrics evaluating the quality of recovery data.
The Model
Imputing Network
Imputing Network, as the core component of our model, summarizes the context of each missing value by consuming its left and right neighboring observations or imputations with a forward and backward recurrent neural network (RNN) respectively, as shown in Figure 2. An output layer is then stacked upon the representations extracted by these RNNs and learns to impute the current missing value.
We choose Long ShortTerm Memory (LSTM) networks as our RNNs, because they have proven very effective at sequential modeling—they are able to handle longrange dependency along sequences and to prevent the gradients from exploding or vanishing with the memory cell.
Formally, our multilayer LSTMs recurrently consume context of each tobeimputed position in the sequence, as follow:
(1a)  
(1b) 
where the superscripts and denote ‘forward’ and ‘backward’ respectively, is the (possibly imputed) observation at each position, and . After and both reach , the imputation is computed by passing and through the output layer, as follow:
(2) 
.
In this paper, we adopt two versions of LSTMs, as shown in Figure 3. The standard LSTM [Hochreiter and Schmidhuber1997] is used to model regularly sampled sensor data. The update equations of standard LSTM is as follows.
(3a)  
(3b)  
(3c)  
(3d)  
(3e) 
PhaseLSTM [Neil, Pfeiffer, and Liu2016] incorporates time gate into LSTM cells in order to process irregularly sampled data, which is triggered by events generated in continuoustime ^{1}^{1}1For convenience, we call both two options LSTM and only differentiate them when it is needed.. The update equations of PhaseLSTM are shown as follows:
(4a)  
(4b)  
(4c)  
(4d) 
Iterative Imputing Network
To learn refined imputation and deal with possible data sparsity caused by missing values, we propose a novel Iterative Imputing Network (IIN), which is a multilevel cascade of Imputing Networks that share the same set of weights, with the output imputation of any member IN block being fed into the higherlevel one as input. This is important (as we can show shortly in experiments), because such design enables the Imputing Network to gradually adapt itself by iteratively refining its missing value imputation on one single sequence sample! This benefits the model learning by 1) jointly utilizing the information from not only visible observations but also previously imputed missing values and 2) well handling (potential) data sparsity caused by missing data.
Figure 4 illustrates the INN architecture. Note that this design is mathematically equivalent to iteratively training the same Imputing Network with the same sequence, until the imputation accuracy achieves a satisfactory level. Therefore, we naturally have two options for training the model—training the Iterative Imputing Model as a whole by gradientbased methods or iteratively training the same Imputing Network. As the former option is straightforward, we only elaborate the latter one in this section.
Our iterative recipe to train the model is illustrated in Algorithm 1.
This algorithm is essentially analogous to expectationmaximization (EM) algorithm: using currentlyestimated model weights to impute the missing values (expectation) and then updating these weights by maximizing the loglikelihood given the observational data (maximization)
[Allison2002].Experimental Analysis
In this section, we evaluate the effectiveness of our model on the benchmark Beijing air quality and meteorological dataset [Yu Zheng2013], elaborate the experimental details, and analyze the results.
Dataset Preparation
We conduct experiments on the Beijing air quality and meteorological dataset. The geodistributed air quality data and meteorological data was recorded every hour. We select the subset of PM2.5 from the air quality dataset and select the subsets of temperature (TEMP), humidity (HUM) and wind speed (WS) from the meteorological dataset. Figure 0(b) shows humidity data series collected by sensors in two different places. Sensor 00601 and 00602 are close to each other, while Sensor 00401 and 00403 are in the adjacent areas. Looking from the time dimension, the sensing data do not show an obvious periodicity. The sensing data may be affected by sparse asynchronous streams of events, which makes the learning and understanding of the internal data pattern a very challenging task. Worse still, there are noticeable missing entries in sensing dataset because of collecting errors and network errors, illustrated by Figure 0(a).
Besides the original missing entries in the sensing dataset, we need to prepare our training set and testing set by setting aside some entries as groundtruth. For PM2.5 dataset, we apply the method in [Yi et al.2016] to generate missing values. First, we record the positions of all missing values in each month’s data. Then we manually remove the values on the same position in the next month. (For instance, if the entry for a sensor at 20140504 14:00:00 is missing, then we drop out the value of this sensor at 20140604 14:00:00). For the other three datasets(temperature, humidity and wind speed), we randomly set aside of the total nonmissing values as the groundtruth of missing entries. In our experiment, we use the sensor recordings in the , , and month as testing set and the rest as training set.
Anchor Selection
A common practice in sensor data recovery is to operate within a sliding window that is centered at the tobeimputed entry. Such a sliding window is usually called the anchor [Ren et al.2015] of this entry ^{2}^{2}2 An anchor is associated with a window size, which could be large in order to cover several sensing cycles. ^{3}^{3}3Anchors can be extended to multiscale, i.e. combining anchors with multiple window sizes, or covering multiple sensors at the same time, but here we discuss the basic anchor for a single series.. An example anchor of size in the dataset is shown in Figure 5. As we can see, the tobeimputed entry is neighbored by its left (previous) and right (subsequent) context. The special value NA denotes the missing observations.
We aim to train our model only with anchors that carry adequate information about the dynamics. Therefore, we define each anchor to be valid for use, only if more than of the entries are observable (i.e. not missing). As with how to fill the missing blanks during training, we use the imputed values estimated by our model (with recently updated weights), which will be shown more effective than other common practices shortly in the results section.
Implementation Details
In our Imputing Network, we use a twolayer forward and backward LSTM to model the latent patterns of series. The first layer takes a sequence with a half of the window size as input, then outputs hidden units each of which is of dimensions. The second layer will output the last dimensional unit. We concatenate the outputs of forward and backward LSTM and then use dropout at a rate of . We set aside of the data in training set as our validation set and maintain the models which show best performance on the validation set.
To train the Imputation Network, we use mean absolute error (MAE) as our loss function and we apply Nesterovaccelerated adaptive moment estimation (Nadam) algorithm
[Sutskever et al.2013, Dozat2016]to optimize our neural network. Nadam incorporates Nesterov Momentum into Adam, making it consistently outperform Adam and RMSProp. For the LSTM cells, we initialize kernel weight matrix in a glorot uniform distribution, the recurrent kernel weight matrix in an orthogonal distribution, and set the bias to be zeros.
To implement IIN, a multilevel cascade of Imputing Networks, we maintain the index of the missing entries. We iteratively update the missing entries and pass the output into Imputing Network for two or three times. To update the missing entries, we maintain their indices. Cascade IIN will refine the data recovery as the number of iteration increases.
The entire process can also be seen as the adaption of EM iterative strategy to deep learning. At the first round, the large ratio of missing reading in a sensing dataset makes it unable to apply deep learning methods. Therefore, we actually use some combinations of statistical values of the data, which can be viewed as an unsupervised feature extraction. This is similar to E step. After we impute the missing entries from observed data, we get ”artificially” intact data and pass them to the LSTMbased IIN networks. The Imputing Network will learn the dynamics of the sensor data and give the most probable outputs for missing entries. This is similar to M step. After the first round, we could omit the E step—use the network outputs in the previous round as input data and do M step directly.
Recovery Accuracy and Error
We measure the performance by Mean Absolute Error (MAE) and Mean Relative Error (MRE),
where and is the estimated and groundtruth value respectively at time (with index ), and is the total number of observations.
Missing values occur in sensing dataset in a stochastical way. To eliminate the dominance of extreme cases and corner cases on MAE and MRE, we compute the mean error in a general scenario, which we denote as general missing. Similar to [Yi et al.2016], for the general missing we do not consider the spatial missing block (the missing values that records of all sensors are simultaneously absent) and the temporal missing block (the records of a sensor are missing in a certain length of time window, in our experiments). For fair comparisons, we compute MAE and MRE for all missing entries, which scenario we denote as overall missing.
We compare our method with 10 baselines, including ARMA [Valipour, Banihabib, and Behbahani2013], stKNN [Pan and Li2010], Kriging [Wu and Li2013], SARIMA [Yozgatligil et al.2013], DESM [Gruenwald et al.2010], AKE [Pan and Li2010], IDW+SES [Gardner1985], CF [Sarwar et al.2001, Su and Khoshgoftaar2009], NMF [Lee and Seung2001, Lindstrom et al.2014a], STMVL [Yi et al.2016]. Prior best results on Beijing air quality and meteorological dataset are achieved by STMVL, which use statistical methods to extract four features from the observed feature space.
We evaluate two types of IIN, based on standard LSTM and phased LSTM respectively. We denote them as IINS and IINP. Table 6 shows the comparison of different missing scenarios between different methods ^{4}^{4}4Results of different methods come from [Yi et al.2016].. As we can see, IIN outperform other baselines significantly in two missing scenarios. STMVL, the prior best work, improves on general missing than before. In contrast, IIN’s MAE improves on the results of STMVL. Moreover, IIN predict estimation for all spatial blocks and temporal blocks, while prior methods skip some extremely hard missing entries when computing the accuracy. In spite of this, IIN still outperforms significantly the prior methods.
Performance with Different Missing Rates
We consider the impact of different missing rates over our training process. The miss rate refers to the ratio of the number of missing entries over the total entries. Higher missing rate bring severe bias to data recovery. Here we prepare our dataset by randomly dropping out the data records with different missing rates. Illustrated by Figure 7, Our methods are always better than STMVL.
Missing Value Initialization Analysis
To kick off the training of our model, we fill the missing entries with appropriate initialization. Our schemes are flexible in different initialization methods, and then refine the estimation using deep learning approaches. We initialize the missing entries using the method proposed by [Yi et al.2016], which regresses each missing entry initialization on four commonly used corpuslevel statistics.
In Figure 8, we also evaluate the impacts of other initialization methods on our recovery accuracy. We compute the recovery error after passing the initialized data into Imputing Network. STMVL is the best initialization method, probably because it combines four statistical factors. Whatever initialization methods are used, IIN can always help refine the estimation of the results. (For MAE, AKE: to ; CF: to ; STMVL: to .) We should note that even if we initialize with AKE, IIN can further help to reduces the MAE to , lower than STMVL without IIN.
Separate or Mix Different Sensors?
In our recovery scheme, we utilize data from all sensors in order for jointly training of our neural networks. However, it seems natural to separately train and predict for the data of different sensors, in order to avoid their data patterns interfering with each other. We experimented in this way, applying our schemes on separate sensors and taking an average of their error. We denote this separate version as IIN(sep). The results are illustrated as Table 1. IIN shows better performance on all datasets than the stateofart methods STMVL. Nevertheless, IIN(sep) is not always better than STMVL. IIN(sep) shows an edge over STMVL on PM2.5 dataset, but fails on temperature dataset.
Therefore, mixing sensor data does not introduce much noise. Instead we argue that this enables our network to benefit from shared trends or common properties of different sensors. Some geodistributed sensor data have strong correlations, which benefits the recovery process on the missing data. For example, if we need to impute the missing values from an anchor of sensor 1, we find its neighboring sensor 2 has complete data of its anchor in the same period. Then the anchor of sensor 2 will help if we pass it into IIN, which gives prediction based on common data patterns in the latent feature space.
Dataset  PM2.5  TEMP  HUM  WS  

STMVL  MAE  12.12  0.68  3.37  1.89 
MRE  0.1740  0.0459  0.0591  0.2985  
IIN(Sep)  MAE  10.78  0.74  3.10  1.88 
MRE  0.1558  0.0496  0.0544  0.2958  
IIN  MAE  10.63  0.63  2.90  1.87 
MRE  0.1531  0.0422  0.0509  0.2953 
Conclusion
Besides the humanannotated data that is usually rather costly, sensor data has been for a long time playing an important role in machine learning tasks. However, systematic or accidental misoperations often result in a variety of missing data, which significantly adds up the noise in the collected dataset. While previous work only imputes the missing values by interpolating in the observational feature space, we aim to model the latent (hidden) temporal dynamics by summarizing each observation’s context with a novel Iterative Imputing Network. Our model significantly outperforms previous work on the benchmark Beijing air quality and meteorological dataset, and also yields consistent superiority over other methods in cases of different missing rates.
Acknowledgments
We sincerely thank Hongyuan Mei for many helpful discussions and comments on the manuscript. We also thank Xinsong Zhang and Weijia Jia for their introduction of the dataset and their encouragement.
References
 [Allison2002] Allison, P. D. 2002. Missing data: Quantitative applications in the social sciences. British Journal of Mathematical and Statistical Psychology 55(1):193–196.
 [Chen and Liu2012] Chen, F.W., and Liu, C.W. 2012. Estimation of the spatial rainfall distribution using inverse distance weighting (idw) in the middle of taiwan. Paddy and Water Environment 10(3):209–222.
 [Chong and Kumar2003] Chong, C.Y., and Kumar, S. P. 2003. Sensor networks: evolution, opportunities, and challenges. Proceedings of the IEEE 91(8):1247–1256.

[Dai, He, and Sun2016]
Dai, J.; He, K.; and Sun, J.
2016.
Instanceaware semantic segmentation via multitask network cascades.
In
Computer Vision and Pattern Recognition (CVPR)
, 3150–3158.  [Doherty and others2000] Doherty, L., et al. 2000. Algorithms for position and data recovery in wireless sensor networks. UC Berkeley EECS Masters Report.
 [Dozat2016] Dozat, T. 2016. Incorporating nesterov momentum into adam.
 [Gardner1985] Gardner, E. S. 1985. Exponential smoothing: The state of the art. Journal of Forecasting 4(1):1–28.
 [Gers, Eck, and Schmidhuber2000] Gers, F. A.; Eck, D.; and Schmidhuber, J. 2000. Applying lstm to time series predictable through timewindow approaches. international conference on artificial neural networks 669–676.
 [Gruenwald et al.2010] Gruenwald, L.; Sadik, M. S.; Shukla, R.; and Yang, H. 2010. Dems: a data mining based technique to handle missing data in mobile sensor network applications. In Proceedings of the Seventh International Workshop on Data Management for Sensor Networks, 26–32.
 [Hochreiter and Schmidhuber1997] Hochreiter, S., and Schmidhuber, J. 1997. Long shortterm memory. Neural computation 9(8):1735–1780.
 [Längkvist, Karlsson, and Loutfi2014] Längkvist, M.; Karlsson, L.; and Loutfi, A. 2014. A review of unsupervised feature learning and deep learning for timeseries modeling. Pattern Recognition Letters 42:11–24.
 [Lee and Seung2001] Lee, D. D., and Seung, H. S. 2001. Algorithms for nonnegative matrix factorization. Neural information processing systems (NIPS) 556–562.

[Lee, Kulic, and
Nakamura2008]
Lee, D.; Kulic, D.; and Nakamura, Y.
2008.
Missing motion data recovery using factorial hidden markov models.
In IEEE International Conference on Robotics and Automation (ICRA), 1722–1728.  [Lindstrom et al.2014a] Lindstrom, J.; Szpiro, A. A.; Sampson, P. D.; Oron, A. P.; Richards, M.; Larson, T.; and Sheppard, L. 2014a. A flexible spatiotemporal model for air pollution with spatial and spatiotemporal covariates. Environmental and Ecological Statistics 21(3):411–433.
 [Lindström et al.2014b] Lindström, J.; Szpiro, A. A.; Sampson, P. D.; Oron, A. P.; Richards, M.; Larson, T. V.; and Sheppard, L. 2014b. A flexible spatiotemporal model for air pollution with spatial and spatiotemporal covariates. Environmental and ecological statistics 21(3):411–433.

[Malhotra et al.2015]
Malhotra, P.; Vig, L.; Shroff, G.; and Agarwal, P.
2015.
Long short term memory networks for anomaly detection in time series.
In European Symposium on Artificial Neural Networks, Computational Intelligence and Machine Learning.  [Manyika et al.2011] Manyika, J.; Chui, M.; Brown, B.; Bughin, J.; Dobbs, R.; Roxburgh, C.; and Byers, A. H. 2011. Big data: The next frontier for innovation, competition, and productivity. Analytics.
 [Mei and Eisner2017] Mei, H., and Eisner, J. 2017. The neural Hawkes process: A neurally selfmodulating multivariate point process. In Advances in Neural Information Processing Systems (NIPS).
 [Neil, Pfeiffer, and Liu2016] Neil, D.; Pfeiffer, M.; and Liu, S.C. 2016. Phased lstm: Accelerating recurrent network training for long or eventbased sequences. In Advances in Neural Information Processing Systems (NIPS), 3882–3890.
 [Pan and Li2010] Pan, L., and Li, J. 2010. Knearest neighbor based missing data estimation algorithm in wireless sensor networks. Wireless Sensor Network 2(02):115.
 [Parveen and Green2004] Parveen, S., and Green, P. 2004. Speech enhancement with missing data techniques using recurrent neural networks. In IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), volume 1, I–733.
 [Ren et al.2015] Ren, S.; He, K.; Girshick, R.; and Sun, J. 2015. Faster rcnn: Towards realtime object detection with region proposal networks. In Advances in neural information processing systems (NIPS), 91–99.
 [Sarwar et al.2001] Sarwar, B. M.; Karypis, G.; Konstan, J. A.; and Riedl, J. 2001. Itembased collaborative filtering recommendation algorithms. International world wide web conferences (WWW) 285–295.
 [Sorjamaa et al.2010] Sorjamaa, A.; Lendasse, A.; Cornet, Y.; and Deleersnijder, E. 2010. An improved methodology for filling missing values in spatiotemporal climate data set. Computational Geosciences 14(1):55–64.

[Su and Khoshgoftaar2009]
Su, X., and Khoshgoftaar, T. M.
2009.
A survey of collaborative filtering techniques.
Advances in Artificial Intelligence
4.  [Sutskever et al.2013] Sutskever, I.; Martens, J.; Dahl, G.; and Hinton, G. 2013. On the importance of initialization and momentum in deep learning. In International conference on machine learning (ICML), 1139–1147.
 [Valipour, Banihabib, and Behbahani2013] Valipour, M.; Banihabib, M. E.; and Behbahani, S. M. R. 2013. Comparison of the arma, arima, and the autoregressive artificial neural network models in forecasting the monthly inflow of dez dam reservoir. Journal of hydrology 476:433–441.
 [Wu and Li2013] Wu, T., and Li, Y. 2013. Spatial interpolation of temperature in the united states using residual kriging. Applied Geography 44:112–120.
 [Yi et al.2016] Yi, X.; Zheng, Y.; Zhang, J.; and Li, T. 2016. Stmvl: filling missing values in geosensory time series data. In International joint conference on artificial intelligence (IJCAI), 2704–2710.
 [Yozgatligil et al.2013] Yozgatligil, C.; Aslan, S.; Iyigun, C.; and Batmaz, I. 2013. Comparison of missing value imputation methods in time series: the case of turkish meteorological data. Theoretical and applied climatology 112(12):143–167.
 [Yu Zheng2013] Yu Zheng, Furui Liu, H.P. H. 2013. Uair: When urban air quality inference meets big data. Knowledge discovery and data mining (KDD) 1436–1444.