
Multistage Large Segment Imputation Framework Based on Deep Learning and Statistic Metrics

by   JinSheng Yang, et al.

Missing value is a very common and unavoidable problem in sensors, and researchers have made numerous attempts for missing value imputation, particularly in deep learning models. However, for real sensor data, the specific data distribution and data periods are rarely considered, making it difficult to choose the appropriate evaluation indexes and models for different sensors. To address this issue, this study proposes a multistage imputation framework based on deep learning with adaptability for missing value imputation. The model presents a mixture measurement index of low- and higher-order statistics for data distribution and a new perspective on data imputation performance metrics, which is more adaptive and effective than the traditional mean squared error. A multistage imputation strategy and dynamic data length are introduced into the imputation process for data periods. Experimental results on different types of sensor data show that the multistage imputation strategy and the mixture index are superior and that the effect of missing value imputation has been improved to some extent, particularly for the large segment imputation problem. The codes and experimental results have been uploaded to GitHub.





I Introduction

With the fast growth of sensor applications, the amount of sensor data has increased tremendously in recent years [4]. The missing value has been a major impediment to the development of the sensor data analysis process [34]. Missing data values can be caused by various factors, including sensor failure, data loss, irregular sampling data, manual recording errors, sensor maintenance, and debugging [15]. Missing value is a widespread and difficult problem to avoid; it complicates future data analysis for researchers and engineers.

Currently, there are primarily two types of methods [17] for dealing with missing values, namely, deletion and imputation. Deletion directly discards data with missing values; this not only loses specific information but also leaves the time series incomplete, thereby affecting subsequent data analysis. The imputation methods [8] are divided into traditional machine learning methods and deep learning methods. Traditional machine learning methods include neighbor-based methods [14], constraint-based methods [33], regression-based methods [42], statistical-based methods [43], matrix factorization-based methods [27], expectation maximization-based methods [10], and multivariate imputation by chained equations-based methods [38, 1]. These methods are better suited to situations with a small amount of data and a low missing rate. Deep learning methods include fully connected neural network (FCNN)-based methods [39], convolutional neural network (CNN)-based methods [11], recurrent neural network (RNN)-based methods [40, 21, 18, 6, 3, 35, 20, 28, 24], generative adversarial network (GAN)-based methods [22, 12, 25, 16], attention-based methods [36, 23], and transformer-based methods [31]. Deep learning methods are better suited to situations with a large amount of data and a high missing rate.

Although deep learning methods have made significant progress in the problem of missing value imputation, their assumptions and settings are quite different. In practice, the characteristics of the missing values for sensor data are complicated, with factors such as missing rate, missing location, and maximum missing length, all of which will influence the difficulty of the imputation task. Making many assumptions in advance may result in unexpected outcomes that cannot be used in practice. The current deep learning imputation methods have three types of issues.

First, the measurement indexes of missing value imputation warrant further study [15]. To measure the imputation effect point to point, the current practice is to use indicators from the mean squared error (MSE) family, namely MSE, mean absolute error (MAE), root MAE (RMAE), and root MSE (RMSE). These are supervised indexes: a subset of the values is first removed from the original data, the model imputes this subset, and RMSE or MAE then measures the gap between the removed and imputed values. MSE and MAE are the indexes mainly used in imputation tasks [15]. However, they overemphasize the difference between the dropped and imputed values while neglecting the difference between the truly missing and imputed values. Furthermore, when part of the data is removed in advance, the data can no longer be used in full for model training and imputation. This issue is especially prominent when the data has a high missing rate.

Second, as we all know, when processing a large amount of time-series data in deep learning, the data is divided into small segments, each of which is a subsequence of the original time-series data, also known as a sample in deep learning. As a result of this crucial step, the missing rate can be divided into two categories: local missing and global missing rate, with the global missing rate representing the missing rate of all data and the local missing rate representing the missing rate of each short segment. In essence, the model should account for the local missing rate. The local missing rate for different data segments can range from 0% to 100%, depending on the missing rate of the data and the length of the segment. By contrast, the global missing rate is fixed. However, most of the existing deep learning imputation methods [39, 11, 40, 21, 18, 6, 3, 35, 12, 25, 23, 22, 20, 31, 36, 16, 28, 24] do not distinguish between these two concepts. At this point, two subproblems arise.
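The distinction between the two rates can be illustrated with a minimal sketch. It assumes missing values are stored as `None` and that samples are non-overlapping fixed-length segments; the function names and the toy series are illustrative and are not taken from the paper.

```python
def missing_rate(values):
    """Fraction of entries that are missing (None)."""
    return sum(v is None for v in values) / len(values)


def local_missing_rates(series, seg_len):
    """Missing rate of each non-overlapping segment of length seg_len."""
    return [missing_rate(series[i:i + seg_len])
            for i in range(0, len(series), seg_len)]


# Toy series in which the three missing points cluster together.
series = [1.0, 1.2, None, None, None, 1.1, 0.9, 1.0, 1.3, 1.2, 1.1, 1.0]

global_rate = missing_rate(series)            # one fixed number for all data
local_rates = local_missing_rates(series, 4)  # one number per sample
print(global_rate, local_rates)
```

With a segment length of 4, the global missing rate is 0.25 while the local rates are 0.5, 0.25, and 0.0, so a model trained on such samples never sees the single rate that the global figure suggests.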

i) The imputation task and the model’s target are not aligned. The task that needs to be accomplished is to fill in the missing values in the data. However, it is easy to transform the problem of imputing fixed global missing rates into imputing fixed local missing rates during evaluation. Missing data in actual sensor data easily cluster together; hence, the local missing rate easily becomes zero.

ii) The choice of using disguised missing data is unrealistic. Most of the aforementioned studies only consider nearly or completely complete data on missing value imputation. Removing values with different missing rates makes it easier to train and evaluate the model with more complete data. However, removing and calculating the loss when there are missing values in the sample data is unclear in the training phase; in the testing phase, how to select test data and what indicators to be used to measure the model’s quality are also very sensitive. In practice, the imputed data contains missing values; how to train and evaluate this type of data is a topic worth discussing.

Third, more adaptable imputation strategies are required for different missing value problems. The conventional approach of treating missing value imputation as a one-stage problem overlooks the fact that samples have different local missing rates. Fig. 1 shows a one-stage flowchart. Liu [19] proved that the iterative approach improves imputation performance. When data with different missing conditions are handled by the same method, the missing rate seen by the model should be kept constant, which can be accomplished through an iterative approach. In addition, most researchers segment long sensor data into fixed-length pieces; however, fixed-length imputation cannot deal with large missing sections appropriately. Liu [19] has also shown that, for a fixed sample length, RMSE increases with the gap length, i.e., the larger the missing gap in the data, the more information the model requires.

Fig. 1: The flowchart of one-stage imputation.

To address the aforementioned issues, we propose a Multistage Large Segment Imputation Framework (MLSIF) in this study. The MLSIF introduces a new statistical indicator that allows missing data values to participate in the entire stage of the training and evaluation process, allowing the MLSIF to impute sensor data with missing data. To improve the imputation effect, the framework design uses an iterative multistage imputation method and uses different data segment lengths in each imputation stage to better process different missing conditions. The real sensor data is used in the experimental part to verify the model’s quality, and by simulating the missing situation of real data, the problem of separation from the real situation caused by artificial regulations is avoided. The deep learning model, NRTSI [31], is used as the basic imputation model structure of the framework. The main contributions of this study are summarized as follows.

  • A new statistical indicator is presented. The newly proposed indicator can include the missing values from the data itself in the entire training and testing process and can be generated based on the distribution of the data without removing the original data. Therefore, it is more consistent with real-world situations and can better measure the effect of imputation to a certain extent.

  • A new method of constructing missing values is adopted, i.e., using complete or nearly complete data to simulate the missing situation of real data to avoid the deviation from the actual situation caused by an artificial setting.

  • A new MLSIF is proposed that can handle the missing value imputation task for actual missing data. The large-missing gaps are divided into different stages and imputed among the multistage. MLSIF dynamically changes the length of training and imputing data based on the missing situation of the current data, and longer missing gaps are provided with longer observation data, which means more information. Hence, the real-life sensor data can be handled.

  • The experimental results on both the benchmark and real sensor data show the effectiveness of the proposed MLSIF. The superiority of the multistage imputation strategy and the mixture loss have been highlighted, and the effect of missing value imputation has been improved to some extent, especially for the large missing gap imputation problem.

II Problem Formulation

This study considers the problem of missing value imputation in univariate sensor data, a class of time series data. Let $X = \{x_1, x_2, \dots, x_n\}$ be a sequence of sensor data with length $n$, where $x_i \in \mathbb{R}$, and some $x_i$ may be missing. To identify the missing values, define a mask sequence $M = \{m_1, m_2, \dots, m_n\}$ that corresponds to $X$, where

$$m_i = \begin{cases} 0, & \text{if } x_i \text{ is missing}, \\ 1, & \text{otherwise}. \end{cases} \tag{1}$$

Denote $X_{i:j} = \{x_i, \dots, x_j\}$ as a sample, which is a subsequence of $X$, where $i, j \in \mathbb{Z}^+$ and $\mathbb{Z}^+$ is the set of positive integers. The problem we need to solve is how to impute the missing part.

Consequently, $X$ can be decomposed into multiple subsequences with length $l$ without crossover. Define a set $D$ that contains all the subsequences of $X$, where $N$ represents the number of samples. When the data cannot be completely segmented, the last few data points will form a sample with the previous consecutive data. Thus, $N$ can be calculated using Equation (2).

$$N = \left\lceil \frac{n}{l} \right\rceil \tag{2}$$

where $\lceil \cdot \rceil$ is the least integer function.
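The formulation can be sketched in a few lines. The sketch assumes that missing values are stored as `None`, that the mask stores 1 for observed and 0 for missing entries, and that a leftover tail is completed by reusing the points immediately before it; `build_mask` and `split_samples` are hypothetical helper names.

```python
import math


def build_mask(series):
    """Return 1 for observed entries and 0 for missing ones (None)."""
    return [0 if v is None else 1 for v in series]


def split_samples(series, length):
    """Split a series into ceil(n / length) samples of equal length."""
    n = len(series)
    if not 0 < length <= n:
        raise ValueError("length must be between 1 and len(series)")
    starts = list(range(0, n - length + 1, length))
    if n % length:
        # Leftover points: reuse the preceding points so that the last
        # sample still has `length` elements.
        starts.append(n - length)
    samples = [series[s:s + length] for s in starts]
    assert len(samples) == math.ceil(n / length)  # Equation (2)
    return samples


series = [1.0, None, 3.0, 4.0, 5.0, 6.0, 7.0]
mask = build_mask(series)
samples = split_samples(series, 3)
print(mask)
print(samples)
```

For this series of length 7 and a sample length of 3, the split yields three samples, and the last one overlaps its predecessor by two points.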

III Related Works

III-A Deep Learning Models

Recently, there have been several studies on imputation deep learning models in the field of missing value imputation. It should be noted that sensor data is a type of time-series data. Therefore, time-series models can be applied to sensor data.

FCNN, RNN, and CNN were applied as the three basic structures in the early days of neural network development. As the first proposed network structure, FCNN can achieve good results on several tasks, including missing value imputation [39]. However, due to its insensitivity to time and its large number of parameters, it is gradually being replaced by RNN for time-series data and by CNN for image data. As a network structure designed for time-series data, RNN is frequently used in missing value imputation tasks, for example as unidirectional RNN [35, 18, 32], bidirectional RNN [3, 6, 44], and variant RNN [40, 20]. CNN has achieved unparalleled results in image processing; however, it has not shown unique advantages on time-series data. Guo [11] attempted to use CNN for missing value imputation and obtained acceptable results.

The breadth and depth of neural networks have steadily increased with the exponential growth of data and processing throughout this century. Some new network structures, such as GAN and attention, have gradually emerged. GAN has shown promising results in image generation; thus, some researchers have attempted to apply it to missing value generation [21, 22, 12, 25, 16]. Attention is a model designed for language sequence data that can also be used for missing value imputation in time-series data. GLIMA [36] argues that the information shared between the parts and the whole of the data should be fully considered; therefore, it constructs a structure that extracts local information while also considering global information. Ma [23] used attention to extract information from data to realize missing value imputation.

Based on the attention structure, the transformer structure begins to show better results on a wide range of tasks. It is possible to think of it as a compound neural network structure based on attention. NRTSI [31] uses a nonregression method to impute the missing values in the field of missing value imputation. It reinterprets time series as a set of (time, data) tuples and proposes a time-series imputation method based on a permutation equivariance model, achieving excellent results so far in time-series imputation experimental results.

III-B Imputation Frameworks

Compared with imputation models that use processed sequences and samples, a more integral and comprehensive approach is to use a strategic framework to complete the imputation, which refers to the entire imputation process from missing data to complete data.

Some researchers have attempted to solve the problem of missing value imputation using a framework. Farhangfar [9] provided a framework that can be used with almost any method to generate weights representing the quality of each estimate to perform boosting. Applying it to an imputation method can, on average, significantly improve the imputation accuracy while maintaining the same asymptotic computational complexity. Rahman [29] subsequently proposed a framework for imputing missing values based on co-appearance, correlation, and similarity analysis. It introduces a novel missing value imputation technique that uses existing dataset patterns such as co-occurrence of attribute values, correlations between attributes, and similarity of attribute values.

Although some general frameworks exist, there are some flaws for specific problems, such as their inability to distinguish between local and global missing rates and their poor performance on the large gap missing problem.

III-C Performance Metrics

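Metrics are the criteria by which a model is measured, evaluated, and selected in the task of imputing missing values. There are three commonly used kinds of metrics [15]: MSE-like evaluation (MSE, MAE, RMSE, and RMAE) and the two metrics defined in Equations (5) and (6). Assume two sequences of time-series data $Y = \{y_1, \dots, y_n\}$ and $\hat{Y} = \{\hat{y}_1, \dots, \hat{y}_n\}$ of equal length $n$. Let $y_i$ be the observed values removed artificially and $\hat{y}_i$ be the imputed values at the corresponding positions. MSE and MAE can be defined using Equations (3) and (4).

$$\mathrm{MSE}(Y, \hat{Y}) = \frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2 \tag{3}$$

$$\mathrm{MAE}(Y, \hat{Y}) = \frac{1}{n} \sum_{i=1}^{n} |y_i - \hat{y}_i| \tag{4}$$
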
RMSE and RMAE are the root of MSE and MAE, respectively. Moreover, and are defined as follows:


where and

are the mean and standard deviation of

and .

To use the above indicators, missing values must be artificially created in nonmissing value data. Specifically, first, remove a portion of the values from the data and subsequently impute the removed values. Furthermore, the imputation effect is measured by comparing the difference between the removed values and the imputed values.
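The remove-impute-compare procedure can be sketched as follows. The sketch assumes missing values are stored as `None`; the mean imputer is only a stand-in for a real imputation model, and `evaluate_on_removed`, `drop_ratio`, and `seed` are illustrative names rather than anything defined in the paper.

```python
import math
import random


def evaluate_on_removed(series, impute, drop_ratio=0.2, seed=0):
    """Hide some observed values, impute them, and score the result."""
    rng = random.Random(seed)
    observed = [i for i, v in enumerate(series) if v is not None]
    dropped = rng.sample(observed, max(1, int(drop_ratio * len(observed))))
    corrupted = list(series)
    for i in dropped:
        corrupted[i] = None          # artificially created missing values
    filled = impute(corrupted)
    errors = [filled[i] - series[i] for i in dropped]
    mse = sum(e * e for e in errors) / len(errors)
    mae = sum(abs(e) for e in errors) / len(errors)
    return {"MSE": mse, "MAE": mae,
            "RMSE": math.sqrt(mse), "RMAE": math.sqrt(mae)}


def mean_impute(values):
    """Stand-in imputer: fill every gap with the mean of the known values."""
    known = [v for v in values if v is not None]
    mean = sum(known) / len(known)
    return [mean if v is None else v for v in values]


series = [1.0, 1.5, None, 2.5, 3.0, 2.0, None, 1.0, 1.5, 2.0]
print(evaluate_on_removed(series, mean_impute))
```

Only the artificially removed positions contribute to the scores; the two positions that were missing from the start are imputed but never evaluated, which is the limitation discussed in this section.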

MSE-type metrics have a higher tolerance for results that are close to the mean; however, they are sensitive to extreme values, resulting in less attention being given to the overall data distribution. and are commonly used in statistical analysis and focus on the difference between the total data and the mean, which is the measure of dispersion. More importantly, the above metrics must have corresponding true values before they can be calculated, i.e., one must first know the corresponding original value before measuring it. This requirement is unrealistic in practice. When the rate of missing data is low, it can be used to assess how effectively the imputed values are performing. However, when the missing rate of the data is high, the artificially constructed missing values will jeopardize the data’s integrity and reduce the amount of valid information in the data.

III-D Imputation Losses

In practice, the loss directs the model training. However, it is difficult to directly find computable targets for most tasks in practice; hence approximating methods are used to achieve the goal. The goal of the missing value imputation task is to minimize the expectation of loss between missing and imputed values in Equation (7).



represents a loss function,

is the complete time-series data, represents element-wise multiplication and is an imputation model with parameter . denotes the observed portion of and denotes the missing portion of .

This task is difficult to achieve, because is not known. Therefore, the traditional regression method sets an approximate loss construction method shown in Equation (8).


where represents the two-norm.

However, T.M. Choi [5] emphasized that the loss determined by the traditional regression method does not correspond to the task to be completed. The difference between the regressed and observed values is calculated using Equation (8). There is an implicit assumption in this loss that when the imputed values are close enough to the real values, the imputation result is satisfactory. Although this loss is simple to calculate, it differs significantly from Equation (7). It calculates the regression loss rather than the imputation loss, which does not match the target well. Therefore, training imputation networks with Equation (8) can be called implicit training. For these reasons, T.M. Choi [5] proposed a new training method for explicit training based on random drop imputation with self-training (RDIS). Random drop data are generated by randomly removing existing values in the time-series data . The random drop data is denoted as and , where


The loss function of RDIS can be expressed as follows:


where represents the dropped part of the observations, and represents the remaining part of the observations after dropping.

Compared with Equation (8), Equation (10) is more advanced, where it adds a second section to the loss function, making it more similar to Equation (7). It removes the data and converts the information that will be used as input to the model. This approach is useful for model training when the sample missing rate is low. When the sample missing rate is high, the information of the input data is further destroyed.

Compared to the regression loss, the nonregression method [20, 31] only outputs the missing position values. Therefore, the loss can only be calculated by artificially removing certain observed data. The loss function is shown in Equation (11).


Loss (11) is a step closer to the objective in Equation (7) than Equation (10): it eliminates the influence of the observed data on the loss and guides model training by directly calculating the loss on the missing data. However, all of these losses face two problems. First, when the sample missing rate is high, adding missing values degrades the original data. Second, when an MSE-like loss is used to guide model training, the imputation result tends to be very close to the mean because of the loss's sensitivity to extreme values [15].

Overall, the four aspects of work mentioned above are crucial for the missing value imputation task. The framework determines how imputation is performed and controls the input and output; the model imputes the input data and outputs the result; the loss guides the model's training direction; and the metric is the standard for measuring imputation quality. The four aspects are interconnected, yet each can be improved independently, and improvements in any of them may benefit the missing value imputation task. This study addresses the three problems raised in Section I from the perspectives of loss function and metrics, experimental design, and framework.

IV Statistical Indexes Variation Loss and Evaluation Indicator

This section proposes a statistical indicator that can be calculated directly on the original data with missing values. This is referred to as statistical indexes variation (SIV). SIV, like the MSE-type indicator, could be used both as a loss to guide the model training as well as an evaluation index to assess the quality of imputation results.

There are three types of assumptions for missing values [30, 13]: missing completely at random (MCAR), missing at random, and missing not at random. It is difficult to say which type of assumption is appropriate for real sensor data with missing values. However, the statistical characteristics of the time-series data with and without missing values can be calculated and compared, and these characteristics should not differ significantly when a small amount of data is missing. Consequently, we begin by presenting four statistical indexes: mean, standard deviation, skewness, and kurtosis, each of which indicates distinct data distribution characteristics. Then, SIV is used to calculate the change in statistical indexes before and after data imputation.

IV-A Statistical Indexes

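For any given sequence $X = \{x_1, \dots, x_n\}$, its statistical characteristics, namely the mean ($\mu$), the standard deviation ($\sigma$), and the skewness ($S$) and kurtosis ($K$) raised to the power of the corresponding order, are calculated using Equations (12)-(15).

$$\mu = \frac{1}{n} \sum_{i=1}^{n} x_i \tag{12}$$

$$\sigma = \sqrt{\frac{1}{n} \sum_{i=1}^{n} (x_i - \mu)^2} \tag{13}$$

$$S = \sqrt[3]{\frac{1}{n} \sum_{i=1}^{n} (x_i - \mu)^3} \tag{14}$$

$$K = \sqrt[4]{\frac{1}{n} \sum_{i=1}^{n} (x_i - \mu)^4} \tag{15}$$

where $x_i$ is the $i$-th element in $X$, $i = 1, 2, \dots, n$.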

The mean ($\mu$) describes the middle point of the sample set, and the standard deviation ($\sigma$) describes the typical distance between each sample point in the sample set and the mean. The skewness ($S$) indicates that a distribution "leans" one way or the other and has an asymmetric tail [2]; it reflects how the data is distributed on the two sides of the distribution center. When the skewness is positive, the distribution has a longer right tail; when the skewness is negative, it has a longer left tail. Kurtosis ($K$) is associated with the distribution's tail, shoulder, and peak [2]. Generally, the smaller the kurtosis, the flatter the data distribution, and the greater the kurtosis, the more concentrated the data distribution. Skewness and kurtosis can be thought of as third- and fourth-order distances from each sample point in the sample set to the mean, just as the standard deviation is a second-order distance. We raise $S$ and $K$ to the power of the corresponding order to unify the dimensions.
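The four indexes can be sketched in code. The sketch rests on two assumptions that the text does not confirm: moments are normalized by $n$ (population form), and "unifying the dimensions" means taking the cube root of the third central moment and the fourth root of the fourth central moment. Missing values are stored as `None` and skipped.

```python
import math


def signed_root(value, order):
    """Real root of the given order that keeps the sign of `value`."""
    return math.copysign(abs(value) ** (1.0 / order), value)


def statistical_indexes(values):
    """Return (mean, std, root skewness, root kurtosis) of the known values."""
    xs = [v for v in values if v is not None]   # ignore missing entries
    n = len(xs)
    mean = sum(xs) / n

    def central_moment(k):
        return sum((x - mean) ** k for x in xs) / n

    std = math.sqrt(central_moment(2))
    skew = signed_root(central_moment(3), 3)    # assumed cube-root form
    kurt = central_moment(4) ** 0.25            # assumed fourth-root form
    return mean, std, skew, kurt


print(statistical_indexes([1.0, 2.0, None, 3.0, 4.0, 5.0]))
```

Under these assumptions all four values share the unit of the data, so they can be compared or summed directly. The signed root is needed because the third central moment can be negative.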

IV-B SIV

Assuming there are two sequences and , the SIV is calculated using Equation (16).


where can be obtained by Equation (17).


Notably, SIV assesses the differences in statistical features between the two sequences. SIV has no requirement for the sequence length; it can compare two sequences of different lengths. Thus, it can be used to directly measure the difference between sequences with missing values before and after imputation. Applied to the missing value imputation problem, SIV is given by Equation (18).


where and are the observed and completed sequences following imputation, respectively. This equation can be applied as a loss in training objective functions as well as an evaluation index in model selection.

The SIV loss function is notable for its ability to be calculated directly on original and imputed data. Furthermore, rather than calculating missing values point to point, SIV considers the data distribution characteristics segment by segment. However, it should be noted that these statistical indicators neglect the time information in time-series data. Simultaneously, when the missing data rate is low and the number of imputed values is relatively low, it is simple to validate the effectiveness of SIV. When the missing rate of data is high, it is unclear if the statistical indicators before and after imputation will differ significantly.

SIV, as an evaluation indicator, can reflect the quality of the imputation effect to some extent. In particular, we present two SIV indexes in experiments. The first calculates the SIV of the overall data before and after imputation using all data as an object; the second calculates and sums the SIV of each piece of data before and after imputation using each piece of data as an object. The value obtained by the first is referred to as "Global SIV", and the value obtained by the second is referred to as "Local SIV".

The SIV proposal addresses the issue of MSE being overused in the task of missing value imputation. On the one hand, SIV can be used as a loss to participate in the model’s training. SIV, on the other hand, can assess the quality of missing value imputation from a certain perspective. The most important aspect is that SIV can make the missing values in the data participate in the model’s training and evaluation.
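The Global and Local SIV described in this section can be sketched under an explicitly assumed form of SIV: the sum of absolute differences of the four indexes. This form, the root-moment definitions, the use of `None` for missing values, and the rule that fully missing pieces are skipped are all assumptions of the sketch, not the paper's confirmed definitions.

```python
import math


def indexes(values):
    """Mean, std, root skewness, root kurtosis of the known values."""
    xs = [v for v in values if v is not None]
    n = len(xs)
    mean = sum(xs) / n
    m2, m3, m4 = (sum((x - mean) ** k for x in xs) / n for k in (2, 3, 4))
    return (mean, math.sqrt(m2),
            math.copysign(abs(m3) ** (1 / 3), m3), m4 ** 0.25)


def siv(before, after):
    """ASSUMED form: sum of absolute differences of the four indexes."""
    return sum(abs(a - b) for a, b in zip(indexes(before), indexes(after)))


def global_siv(before, after):
    """SIV computed once over the whole series."""
    return siv(before, after)


def local_siv(before, after, seg_len):
    """Sum of per-piece SIV values; fully missing pieces are skipped."""
    total = 0.0
    for i in range(0, len(before), seg_len):
        piece = before[i:i + seg_len]
        if any(v is not None for v in piece):
            total += siv(piece, after[i:i + seg_len])
    return total


before = [1.0, None, 3.0, 2.0, 2.5, None]
after = [1.0, 2.0, 3.0, 2.0, 2.5, 2.2]
print(global_siv(before, after), local_siv(before, after, 3))
```

Because the indexes are computed on known values only, the "before" sequence can contain gaps and the two arguments need not have the same number of usable points.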

V Multistage Imputation Framework

Fig. 2: The flow chart of MLSIF.

In this section, we present MLSIF for the missing value imputation problem. MLSIF adds a cyclic process of selecting and imputing the data to the single-stage imputation method. In particular, MLSIF employs two techniques: multistage imputation and dynamic data length. Multistage is reflected in the cyclic structure of the framework, and each cycle is a stage. Each stage imputes the data while keeping the missing rate lower than a preset threshold. The term dynamic data length means that the data length will change at each stage. The MLSIF flowchart is shown in Fig. 2.

Previous experimental results in the literature [9] show that as the missing rate of data increases, the imputation accuracy decreases. Furthermore, Liu [19] demonstrated that the iterative approach improves imputation performance. Therefore, iterative multistage is used to echo the dynamic data length. It can alleviate the problem of insufficient effective information caused by large-missing gaps, which makes imputation difficult, if not impossible. Additionally, the data imputed at each stage is the simplest data to impute. Therefore, the goal of MLSIF using dynamic length is to keep the missing rate at a low level.

In MLSIF, a mixture loss that combines MSE and SIV is used to guide the model training. Using only the MSE loss causes the model to cluster imputed values around the mean, whereas using only SIV leaves the loss unable to capture temporal characteristics, so that the imputed values are scattered within the data distribution range with no regularity.

V-A Framework Process

In each stage in MLSIF, the missing data are imputed once, and each stage contains the four steps described below.

Step 1: Select the samples whose missing rate is lower than the threshold using Algorithm 1

The goal of this step is to select high-quality samples for subsequent training. The missing rate threshold is introduced to make it easier to impute the selected samples: only samples with missing rates less than the threshold are selected.

0:  The data: ; Splitting lengths parameter: ; Missing rate threshold: .
0:  Selected samples set
1:  .
2:  Initialize an empty set .
3:  While the samples in without missing values:
4:        .
5:        Split data as .
6:        .
7:        for in :
8:            if the missing rate of is lower than :
9:                Add to set .
Algorithm 1 Select the samples whose missing rate is lower than .

In Algorithm 1, the input is data with missing values, and the output is the set of selected samples (segments of the data) whose missing rate is less than the threshold. Two hyperparameters must be predetermined: the splitting length parameter and the missing rate threshold. First, initialize the splitting length to 0 and create an empty set as a container for samples with a missing rate less than the threshold. After the splitting length is increased, the data is divided into small segments of that length, called samples, which together form the set of all samples. The assignment that follows involves a set that is initially empty, so it is only relevant from the second iteration of the loop onward. Subsequently, iterate over all samples and add those with a missing rate less than the threshold to the selected set. The loop ends only when there are missing values in the selected set.

Step 1 therefore selects samples with no missing values together with samples whose missing rate is less than the threshold. These samples are safer to train on and easier to impute than all samples or the samples with a missing rate greater than the threshold.

Step 2: Train the imputation model on the selected samples using Algorithm 2

The goal of this step is to train the imputation model using the samples selected in Step 1 and the MSE + SIV loss.

The inputs of Algorithm 2 are the training set selected in Step 1 and the deep learning model for imputation, for which NRTSI [31] is selected. Additionally, the number of training epochs and three non-negative parameters are required: the first represents the proportion of values dropped during training, the second represents the proportional relationship between the imputed and observed values in the loss, and the third represents the proportional relationship between the MSE and the SIV in the model loss.

In this step, we use the model to learn about the data’s characteristics. The SIV and MSE coexist. Therefore, we combine SIV and MSE to form the model’s loss to guide model training. The model’s loss is defined in Equation (19).


where hyperparameter represents the weight between the MSE and SIV.

This algorithm contains two nested loops. The outer loop runs over the training epochs, and the inner loop traverses the training set. For each sample, the first operation drops a portion of the sample's known values according to the drop proportion. The dropped values form one part and the remaining values form the other. As the results of previous stages are retained in subsequent stages, the dropped values can be further subdivided into two groups, depending on whether they were imputed at a previous stage or observed in the original data. The model is then used to impute the dropped values, and its output is divided into two groups corresponding to the positions of the previously imputed and the originally observed dropped values. The purpose of splitting the output is to limit error propagation: every imputation iteration introduces some error, so in multistage imputation the original and imputed values should not carry the same weight, and a weight parameter controls the balance between the two parts. Finally, the model parameters are updated by minimizing the loss given in Equation (19).

The innovation of this step lies primarily in the choice of the loss function. We combine MSE and SIV as the model loss, allowing each to contribute its respective strengths.

0:  The selected samples set: ; Initialized deep learning imputation model: ; The epoch of model training: ; The non-negative parameters: .
0:  Trained imputation model
1:  for in :
2:        for in :
3:            Random drop values in , where          is the number of the known values in          .
4:            Denote drop values as and data remained          as . In particular, denote the drop data          imputed at previous stages as and the          drop data from obvious as .
6:            Corresponding to the positions of          and , is divided into           and .
7:            Update the parameters of model by         minimizing calculated by Equation          (19), where                                       ,         .
Algorithm 2 Train the imputation model.
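The loss used in Algorithm 2 can be illustrated with a small numerical sketch. Equation (19) is not reproduced here, so the form below is an assumption: a convex combination of a weighted MSE and an SIV term, where the SIV term is taken to be the summed absolute change of mean, variance, skewness, and kurtosis. The names `four_stats`, `siv`, `mixed_loss`, `weight_siv`, and `weight_imputed` are hypothetical, and NumPy is used only to show how the value is computed, not to train a model.

```python
import numpy as np


def four_stats(x):
    """Mean, variance, skewness and kurtosis of a 1-D array (assumed SIV statistics)."""
    x = np.asarray(x, dtype=float)
    mean = x.mean()
    var = x.var()
    std = np.sqrt(var) + 1e-12  # guard against division by zero
    z = (x - mean) / std
    return np.array([mean, var, (z ** 3).mean(), (z ** 4).mean()])


def siv(reference, imputed):
    """Hypothetical SIV: summed absolute change of the four statistics."""
    return float(np.abs(four_stats(reference) - four_stats(imputed)).sum())


def mixed_loss(target, pred, from_imputed, weight_siv=0.9, weight_imputed=0.5):
    """Assumed form: (1 - weight_siv) * MSE + weight_siv * SIV.

    `from_imputed` marks dropped values that were imputed at an earlier stage;
    their squared error is scaled by `weight_imputed` (an illustrative choice).
    """
    target = np.asarray(target, dtype=float)
    pred = np.asarray(pred, dtype=float)
    from_imputed = np.asarray(from_imputed, dtype=bool)
    sample_weight = np.where(from_imputed, weight_imputed, 1.0)
    mse = float((sample_weight * (pred - target) ** 2).sum() / sample_weight.sum())
    return (1.0 - weight_siv) * mse + weight_siv * siv(target, pred)
```

Under these assumptions, a prediction that equals the target gives a loss of zero, while a constant prediction at the mean is penalized by the SIV term even though its MSE may be moderate.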

Step 3: Impute the missing values of the selected sample by Algorithm 3

The goal of this step is to use the trained model to impute the missing values in the training data.

0:  Trained imputation model: ; The selected samples set: .
0:   without missing values
1:  for in :
2:        if has missing values:
4:            Replace missing values in with .
Algorithm 3 Impute missing values in training data.

In Algorithm 3, the model and the selected samples set are input. Use the model to impute the missing values for the samples in . Finally, the training data is obtained with no missing values. However, outside of the algorithm, the imputed data will replace the missing values in the corresponding missing positions of the original data for later imputation stages.
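The replacement step of Algorithm 3 can be sketched as follows. This is a minimal illustration, assuming that missing values are stored as NaN and that the trained model is a callable returning a full-length prediction; `impute_selected` and `mean_model` are hypothetical names, and `mean_model` is only a stand-in, not the NRTSI interface.

```python
import numpy as np


def impute_selected(model, samples):
    """Fill NaN entries of each selected sample with the model's output.

    `model` is any callable mapping a 1-D array with NaNs to a full-length
    prediction (a hypothetical interface, not the NRTSI API).
    """
    completed = []
    for sample in samples:
        sample = np.asarray(sample, dtype=float).copy()
        missing = np.isnan(sample)
        if missing.any():
            prediction = np.asarray(model(sample), dtype=float)
            sample[missing] = prediction[missing]  # observed values stay untouched
        completed.append(sample)
    return completed


def mean_model(sample):
    """Stand-in model: predicts the mean of the observed values everywhere."""
    return np.full(sample.shape, np.nanmean(sample))
```

Only the missing positions are overwritten, so observed values and values imputed at earlier stages are carried forward unchanged.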

Step 4: Are all missing values imputed?

In this step, determine whether all missing values have been imputed and subsequently decide whether to continue the loop. If yes, output the result and end. If no, proceed to Step 1.

V-B Algorithm Summary and Example

MLSIF investigates flexible imputation strategies. Its entire process consists of the four steps listed above, which are all interconnected. Step 1 selects the data for Steps 2 and 3, and Step 2 can only train the model on that data. The model used in Step 3 is the one trained on the training data, and only the missing values in the training data can be imputed. In other words, the first step serves as the foundation for all subsequent steps, and both the training and imputing steps are required.

Overall, there is a strong link between the various stages. Algorithm 2 states that the values imputed at previous stages will be provided as information for subsequent stages. This is because imputation becomes more difficult as the number of missing values increases. When the number of missing values decreases, the difficulty of imputation decreases; hence, we begin with low-difficulty tasks first while also providing more information for later high-difficulty imputation tasks; these are the advantages of the multistage and dynamic lengths in MLSIF.

(a) Step 1 of Stage 1
(b) Result of Stage 1
(c) Step 1 of Stage 2
(d) Result of Stage 2
Fig. 3: A toy example. The blue points represent the observed values, orange transparent points represent the positions of the missing values, red points represent the values imputed in the current stage, and green points represent the values imputed in the previous stage. The black numbers above the image are the labels of each piece of data, and (*) indicates that the piece has been selected. The red numbers on the picture axis indicate the missing rate of each piece of data.

Fig. 3 illustrates the entire imputation process with an example. We are given sensor data with a length of 240. First, the data is segmented with a length of 24, yielding 10 pieces of data (that is, 10 samples), as shown in Fig. 3(a). Each piece is then filtered, and the pieces that meet the conditions are used to train the model. In this example, only the second, fourth, and seventh pieces do not meet the requirements (missing rate is less than 10%); the remaining pieces (marked with * in Fig. 3(a)) are fed into the model for training. After training, the selected samples are imputed, as shown in Fig. 3(b). As missing values are still present, we proceed to Stage 2.

In Stage 2, after receiving the imputed result from the previous stage, the operation is repeated with a longer segmentation length. The segmentation result is shown in Fig. 3(c). After the imputation of this stage is completed, the data no longer contains missing values. Hence, we obtain the final result shown in Fig. 3(d). Only the final result is shown in the Experimental section.
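The stage loop of the toy example can be sketched in a few lines. This is a simplified illustration under stated assumptions, not the authors' implementation: training is omitted, a linear interpolator (`interp_model`) stands in for the trained model, and the selection rule (`min_observed`), the growth factor (`growth`), and the fallback in the final stage are hypothetical choices.

```python
import numpy as np


def interp_model(segment):
    """Stand-in imputer: linear interpolation over the observed points."""
    idx = np.arange(segment.size)
    known = ~np.isnan(segment)
    return np.interp(idx, idx[known], segment[known])


def multistage_impute(series, start_len=24, growth=2, min_observed=0.5):
    """Repeat select -> impute with a growing segment length until complete.

    A segment is selected when it has missing values and at least
    `min_observed` of its points are known (an assumed selection rule).
    Once a single segment spans the whole series, any observed value suffices.
    """
    if start_len < 1 or growth < 2:
        raise ValueError("start_len must be >= 1 and growth must be >= 2")
    data = np.asarray(series, dtype=float).copy()
    if np.isnan(data).all():
        raise ValueError("series has no observed values")
    seg_len, stages = start_len, 0
    while np.isnan(data).any():
        stages += 1
        threshold = 0.0 if seg_len >= data.size else min_observed
        for start in range(0, data.size, seg_len):
            segment = data[start:start + seg_len]  # view into `data`
            known = ~np.isnan(segment)
            if (~known).any() and known.any() and known.mean() >= threshold:
                segment[~known] = interp_model(segment)[~known]
        seg_len = min(seg_len * growth, data.size)
    return data, stages
```

Values imputed in an early stage count as known in later stages, which is why a large gap that is skipped at length 24 can become imputable once the segment length has grown.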

VI Experimental

The proposed framework is evaluated in this section by comparing it with several baselines on real sensor data. We select NRTSI [31] as the basic imputation model for the proposed framework. The results are visualized. For the datasets, we use the University of California, Irvine (UCI) air quality dataset [7] and four geological sensor datasets collected by physical sensors and uploaded to GitHub.

VI-A UCI Air Quality Dataset

This dataset includes 9358 hourly averaged responses from a set of five metal oxide chemical sensors embedded in an air quality chemical multisensor device. When artificially simulating missing values, this study adopts a strategy that narrows the gap between the experimental setting and reality. Rather than removing real data at random, the strategy transplants the missing pattern of a column with a high missing rate onto columns with a low missing rate. The imputation results therefore reflect what is required in practice, not just performance in theory. Specifically, this experiment selects the columns with a low missing rate (”C6H6(GT)”, ”PT08.S1(CO)”, ”PT08.S2(NMHC)”, ”PT08.S3(NOx)”, ”PT08.S4(NO2)”, and ”PT08.S5(O3)”) from the dataset [7], removes their values at the positions where the column with a high missing rate (”NOx(GT)”) is missing, and compares the dropped values with the imputed ones.
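The masking strategy described above can be sketched as follows, assuming that missing values are encoded as NaN and that both columns share the same time index. The function names `transplant_missing` and `mse_on_dropped` are hypothetical and serve only to illustrate the idea.

```python
import numpy as np


def transplant_missing(low_missing, high_missing):
    """Remove values of `low_missing` wherever `high_missing` is NaN.

    Returns the masked copy and a boolean array marking the positions that
    were observed before masking, i.e. the ones usable for evaluation.
    """
    low = np.asarray(low_missing, dtype=float)
    high = np.asarray(high_missing, dtype=float)
    if low.shape != high.shape:
        raise ValueError("both columns must have the same length")
    pattern = np.isnan(high)
    evaluable = pattern & ~np.isnan(low)
    masked = low.copy()
    masked[pattern] = np.nan
    return masked, evaluable


def mse_on_dropped(original, imputed, evaluable):
    """MSE restricted to the artificially dropped positions."""
    original = np.asarray(original, dtype=float)
    imputed = np.asarray(imputed, dtype=float)
    diff = imputed[evaluable] - original[evaluable]
    return float(np.mean(diff ** 2))
```

Because the mask comes from a real column, the dropped positions keep the segment lengths and clustering of genuine sensor outages, which random removal would not reproduce.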

Experiment 1

First, we evaluate the effectiveness of the proposed multistage framework and the SIV indicator. We compare four cases: imputation without the multistage framework and without SIV (OFOS), imputation without the multistage framework but with SIV (OFWS), imputation with the multistage framework but without SIV (WFOS), and imputation with the multistage framework and SIV (WFWS).

The imputation result diagram for ”C6H6(GT)” and the metrics of all imputation results are shown in Fig. 4 and Table I, respectively. The result diagrams for the other data are available on GitHub. In Fig. 4, the first subpicture shows the original data, and the last four subpictures show the imputation results of OFOS, OFWS, WFOS, and WFWS, respectively.

Fig. 4: The result of ”C6H6(GT)” in Experiment 1. Neither the second nor the third picture uses the framework, but their losses differ: the loss of the second graph is MSE, and that of the third graph is SIV. The framework is used in the last two graphs; the loss of the fourth graph is MSE, and the loss of the last graph is the mixture loss. The last four pictures correspond to cases OFOS, OFWS, WFOS, and WFWS, respectively.

As shown in Fig. 4, the most obvious phenomenon is the difference between the models with and without the multistage framework. Comparing OFOS and OFWS with WFOS and WFWS, the imputation results show a clear horizontal line near the mean whenever the multistage framework is not used. In the region marked by the green box in Fig. 4, OFOS and OFWS impute little more than the mean. One likely explanation is that when the missing length of a sample is close or equal to the input length, the model receives too little effective information to impute it, so the output stays comparable to the input. This is why a fixed-length model is insufficient for long missing segments, and it makes the benefit of the framework clear.

Comparing WFOS with WFWS, the model in WFOS is trained using the multistage framework and the MSE loss, and its imputation results are clustered around the mean. Conversely, the imputation results of WFWS, trained with the multistage framework and the mixed loss, are almost consistent with the real data distribution. Even when compared with the first subimage (original data), the differences are difficult to distinguish with the naked eye. In terms of imputation outcomes, WFWS appears superior to the other three models, as shown in Fig. 4.

Furthermore, we examine the quality of the imputation results based on the evaluation indicators. We use MSE, MAE, , , Global SIV, and Local SIV as the comparison indicators. The result is shown in Table I. Except for and indexes, which improve as their value increases, all other indexes improve as their value decreases. The best results have been highlighted in bold. Labels at the end of the values indicate better () or worse () than OFOS (baseline).

Dataset Case MSE MAE Global SIV Local SIV
C6H6(GT) OFOS(Baseline) 0.7878 0.6518 0.2074 0.5658 0.0139 134.1547
OFWS 1.8541 1.0706 0.0928 0.1423 0.0126 156.5508
WFOS 0.8874 0.7901 0.2625 0.5359 0.0125 13.8166
WFWS 0.7620 0.6651 0.2432 0.6186 0.0081 4.3440
PT08.S1(CO) OFOS(Baseline) 0.7453 0.7095 0.2321 0.5204 0.1315 142.4815
OFWS 0.9629 0.8226 0.0205 0.3652 0.1387 148.3941
WFOS 0.7446 0.7219 0.2783 0.5471 0.0142 17.3981
WFWS 0.6839 0.6863 0.3284 0.6270 0.0108 12.4037
PT08.S2(NMHC) OFOS(Baseline) 0.8092 0.7259 0.2459 0.5298 0.067 143.9391
OFWS 1.8586 1.0986 0.1162 0.1973 0.0650 161.8961
WFOS 0.8711 0.7757 0.2600 0.5052 0.0130 19.3278
WFWS 0.7369 0.6600 0.2639 0.6376 0.0125 9.2668
PT08.S3(NOx) OFOS(Baseline) 0.5686 0.5782 0.4126 0.7178 0.0607 122.021
OFWS 0.9851 0.7594 0.0219 0.3144 0.0682 123.2433
WFOS 0.8360 0.6879 0.1234 0.4001 0.0143 6.3858
WFWS 0.6205 0.6301 0.3611 0.7477 0.0070 5.6478
PT08.S4(NO2) OFOS(Baseline) 0.5817 0.5596 0.2886 0.5671 0.1045 131.3349
OFWS 0.7088 0.6466 0.1190 0.4335 0.1062 129.3343
WFOS 0.6331 0.5856 0.2395 0.4835 0.0178 20.3843
WFWS 0.5354 0.5482 0.3386 0.6609 0.0114 13.3867
PT08.S5(O3) OFOS(Baseline) 0.5930 0.6056 0.2410 0.6032 0.0375 102.3785
OFWS 0.8888 0.7905 0.0163 0.3828 0.0426 111.7164
WFOS 0.6971 0.6867 0.2196 0.5289 0.0150 21.7660
WFWS 0.6525 0.6445 0.1418 0.4728 0.0200 15.6181
  • In the comparison, the optimal index value is bolded. Labels at the end of the values indicate better () or worse () than OFOS (Baseline).

TABLE I: Metric Results of Experiment 1.

The first four indicators (MSE, MAE, , and ) show similar trends and achieve their optimal values on the same model, except for ”C6H6(GT)” and ”PT08.S3(NOx)”. Most of these indicators are optimal in WFWS, whereas some are optimal in OFOS. Notably, once the multistage framework is introduced, WFOS does not improve significantly over OFOS. One possible explanation is that with the framework the model imputes every missing position, whereas without it the model imputes only some values; that is, the model makes more attempts, and more attempts mean larger accumulated losses. Once all missing values are imputed, results similar to or even better than those of OFOS can still be achieved, which may help explain the effectiveness of WFWS. We believe that the absence of temporal information in the new metrics is the reason why OFWS is not better than OFOS.

The two metrics proposed in this study are shown in the last two columns. Global SIV measures the variation of the statistical indexes of the overall data before and after imputation, whereas Local SIV computes the variation of the statistical indexes for each piece of data during the imputation process and sums the results. The introduction of the multistage framework considerably improves the imputation results on these two metrics. The most noticeable improvement is in Local SIV, whose value drops by roughly an order of magnitude. The relationship between the trends of these two indicators and the other indicators, as well as between these two indicators and the imputation results, is explored further in Experiment 2.
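The difference between the two metrics can be made concrete with a small sketch. The exact SIV definition is not reproduced here, so the code assumes that the statistical indexes are mean, variance, skewness, and kurtosis and that their absolute changes are summed; `four_stats`, `global_siv`, `local_siv`, and `piece_len` are hypothetical names.

```python
import numpy as np


def four_stats(x):
    """Mean, variance, skewness and kurtosis (assumed statistical indexes)."""
    x = np.asarray(x, dtype=float)
    mean, var = x.mean(), x.var()
    z = (x - mean) / (np.sqrt(var) + 1e-12)  # guard against zero variance
    return np.array([mean, var, (z ** 3).mean(), (z ** 4).mean()])


def global_siv(original, imputed):
    """Change of the statistics computed once over the whole series."""
    return float(np.abs(four_stats(original) - four_stats(imputed)).sum())


def local_siv(original, imputed, piece_len):
    """Sum of the per-piece changes of the statistics."""
    original = np.asarray(original, dtype=float)
    imputed = np.asarray(imputed, dtype=float)
    total = 0.0
    for start in range(0, original.size, piece_len):
        stop = start + piece_len
        total += global_siv(original[start:stop], imputed[start:stop])
    return total
```

Reversing a series leaves its overall statistics unchanged, so the global value stays near zero, while the per-piece means shift and the local value grows. Under these assumptions, the local variant reacts to where values are placed, not only to which values occur.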

From Experiment 1, it is shown that the use of the framework can solve the issue of insufficient imputation caused by fixed length and one-stage. Furthermore, using the mixed loss can improve the imputation effect of the model and alleviate the problem of MSE as the model loss.

Experiment 2

In this experiment, we explore the impact of the weights between MSE and SIV in the mixed loss. At the same time, the relationship between SIV and MSE as metrics is determined by analyzing the imputation results. Here, we present and analyze the imputation results of ”PT08.S2(NMHC)”, as shown in Fig. 5 and Fig. 6, which show the model’s imputation details for small and large missing gaps, respectively. Results on the other datasets can be found on GitHub. In Fig. 5 and Fig. 6, the blue points represent the real data, red points represent the imputed data, and orange points represent the removed data. The distribution of the original and imputed data is represented by the blue and red lines on the right side of each image, respectively. The variation of each index with the loss weight is shown in Fig. 7, where the dots circled in red mark the location of the optimal value.

Fig. 5: Imputation results for small missing gaps in data ”PT08.S2(NMHC)”, with the blue points representing observation values, orange points representing removed values, and red points representing imputed values. The blue and red lines on the right side of each image represent the distribution of the original and imputed data, respectively.
Fig. 6: Imputation results for large missing gaps in data ”PT08.S2(NMHC)”, with the blue points representing observation values, orange points representing removed values, and red points representing imputed values. The blue and red lines on the right side of each image represent the distribution of the original and imputed data, respectively.

In Fig. 5, the most noticeable feature is a horizontal line close to the mean throughout the imputation results. This phenomenon is alleviated when the loss weight is greater than 0.98. The same phenomenon can also be observed in the data distribution diagram. When the loss weight is low, the imputed data cluster around the mean, and they gradually spread out as it increases. When , the distribution of imputed values is almost identical to that of the original data.

In Fig. 6, there are large missing gaps in the data. The most obvious phenomenon in the distribution map is that the imputed data become more dispersed as the loss weight increases. When the loss weight is low, the imputed data tend to cluster around the mean. As it increases, the data become more dispersed, and when it equals 1, the data are fully dispersed, with a degree of dispersion close to that of the original data. Of the two losses, the MSE-type loss can capture certain temporal characteristics but is prone to imputing values around the mean, whereas SIV cannot capture temporal characteristics but allows the model to learn the dispersion of the data; combining the two therefore yields better results. Figures 5 and 6 show that most models perform well with small missing gaps, whereas MLSIF also handles large missing gaps.

(a) MSE
(b) MAE
(e) Global SIV
(f) Local SIV
Fig. 7: Metrics change with the loss weight. The abscissa represents the value of the loss weight, and the ordinate represents the value of each indicator.

In Fig. 7(a)-(d), the indicators follow a consistent trend. Generally, as the loss weight increases, the metrics first improve and then deteriorate. When the loss weight is small, the changes in these indicators are subtle, and most optimal values are found between 0.8 and 0.98. When it exceeds 0.98, the indicators deteriorate rapidly. The reason is that at large weights the MSE plays an insignificant role in the loss and the SIV dominates, so the imputed data scatter within a certain range and these indicators take large values. One thing is certain: using the SIV to form the mixed loss improves the results on both metrics; however, using only the SIV makes the result worse than using only the MSE.

In Fig. 7(e) and (f), the Global SIV and Local SIV metrics show a clear downward trend, which is consistent with expectations: as the proportion of the SIV loss increases, the SIV of the final imputed result also decreases. Unlike the other indicators, these two did not change significantly at . The optimal values are all obtained at around . At this point, the SIV plays a dominant role in the loss. It shows that between and , MSE and SIV can be traded off, resulting in both being relatively low.

By observing and analyzing the experimental results, we discovered that the Global SIV is more concerned with the difference between the imputed and original value. The Global SIV is not particularly large if there are few points that deviate from the discrete degree of the original data. The Local SIV calculates the changes in four statistical indicators for each small sample. Consequently, when the Global SIV is smaller, the data appear more compact. Furthermore, when the Local SIV is smaller, the data are more coherent. Therefore, the Local SIV can be used to assess the consistency between the imputed and true value.

In summary, Experiment 2 further investigates the mixed loss and its impact on various indicators and imputation results. The experimental results show that using the mixed loss is better than using the MSE or the SIV loss alone in all metrics. Furthermore, it was found that most of the optimal values are obtained at around . These six metrics correspond to three measurement perspectives. The first type (MSE, MAE) focuses on the difference between each imputed point and the corresponding removed point. The second type () focuses on the distribution between the imputed and mean of the dropped data. The third type (SIV) focuses on the distribution difference between the original and imputed data.

VI-B Physical Sensor Data

Due to the difficulty of acquiring and maintaining geological sensor data, there are frequently a significant number of missing values, making the subsequent analysis and research difficult. The sensitive information in the timestamps and numerical values of these data has been processed, and the data have been uploaded to GitHub.

Experiment 3

The selected data in this experiment, named 35717443_temp, contains more than 20,000 data points and has approximately 3% missing values. In this section, we test the framework’s performance on these data with different missing rate scenarios. In contrast to other methods for randomly constructing missing values, the strategy used in this study is to construct large segments of missing values, i.e., a random position is selected and a portion of the original data is removed near this position. We compare the following methods when confronted with such a dataset:

  • ZOO (From R Package) [41]: three of the methods were chosen, namely, mean imputation, linear interpolation, and cubic spline interpolation, called na.aggregate, na.approx, and na.spline, respectively.

  • ImputeTS (From R Package) [26]: two of the methods were chosen, namely, linear interpolation and Kalman smoothing on a structural time-series model, named na.interpolation and na.kalman, respectively.

  • BRITS and BRITS-I [3]: a method based on recurrent neural networks for missing value imputation in time-series data.

  • CSDI [37]: a time-series imputation method that utilizes score-based diffusion models conditioned on observed data.

  • NAOMI [20]: a nonautoregressive approach to impute long-range sequences given arbitrary missing patterns.

  • NRTSI [31]: reformulates time series as permutation-equivariant sets and does not impose any recurrent structure when imputing missing data.
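The construction of large missing segments used in this experiment (a random position is selected and a portion of the data near it is removed) can be sketched as follows. The block length `segment_len`, the stopping rule, and the function name `drop_large_segments` are illustrative assumptions, not details taken from the experiment.

```python
import numpy as np


def drop_large_segments(series, missing_rate, segment_len, seed=0):
    """Remove contiguous blocks around random positions until at least
    `missing_rate` of the points are missing.

    `segment_len` and the stopping rule are illustrative assumptions.
    """
    if not 0.0 <= missing_rate < 1.0:
        raise ValueError("missing_rate must be in [0, 1)")
    if segment_len < 1:
        raise ValueError("segment_len must be at least 1")
    rng = np.random.default_rng(seed)
    data = np.asarray(series, dtype=float).copy()
    target = int(np.ceil(missing_rate * data.size))
    while np.isnan(data).sum() < target:
        center = int(rng.integers(0, data.size))  # random position
        start = max(0, center - segment_len // 2)
        data[start:start + segment_len] = np.nan  # drop the block around it
    return data
```

Since whole blocks are removed, the final missing rate can exceed the requested rate by up to one block, and blocks may overlap, which produces gaps longer than `segment_len`.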

(a) 0% missing
(b) 10% missing
(c) 20% missing
(d) 30% missing
(e) 40% missing
(f) 50% missing
Fig. 8: Imputation results of data with different missing rates using deep learning methods.

The imputation results and corresponding metrics (MSE and Global SIV) are shown in Fig. 8 and Table II, respectively. In Fig. 8, only the results of deep learning methods in data imputation are shown due to space constraints, whereas the metrics for all methods are displayed in Table II.

It can be observed from Fig. 8 that compared with other deep learning methods, MLSIF has a good imputation ability in the face of small- and large-missing segments. Especially when dealing with large-missing segments, MLSIF can still impute the missing parts according to the known data. Other deep learning methods fail when confronted with the same problem, imputing either the same or random values. The results of the metrics in Table II are consistent with the intuition. MLSIF can almost achieve the best results under both metrics when the missing rate is greater than 20%. Other methods are still competitive when the missing rate is less than 10%. It is also worth noting that, in most cases, the effect of deep learning is better than that of statistical methods in terms of indicators.

Methods MSE(0%) SIV(0%) MSE(10%) SIV(10%) MSE(20%) SIV(20%) MSE(30%) SIV(30%) MSE(40%) SIV(40%) MSE(50%) SIV(50%)
aggregate - 0.0016 0.4465 0.7038 0.9901 0.9163 1.4468 0.8061 1.6181 0.0111 - -
approx - 0.1018 0.3973 1.1184 1.3879 0.0264 1.841 0.2198 1.455 1.4625 1.1858 1.6105
spline - 0.0974 0.4134 1.201 42.9614 7.3364 13.677 2.041 1214.531 451.4788 51.2403 19.3022
interpolation - 0.0017 0.4904 0.0326 1.9098 0.049 1.8354 0.2663 1.4855 1.7797 1.1639 1.9565
kalman - 0.7093 52.7797 3.9657 9655.416 1833.317 3.4357 0.2284 249505.5 100113.9 2615.578 1250.991
BRITS - 0.0244 0.2806 1.0958 0.5528 0.0856 0.8542 1.5182 0.935 0.1286 1.0389 1.5358
BRITS-I - 0.034 0.5155 0.6348 0.4463 1.1412 0.696 0.8921 0.885 0.1078 1.1024 0.9712
CSDI - 0.0001 0.9817 0.0045 0.4935 0.8739 0.9886 1.2531 2.1484 0.3234 2.5333 0.3766
NAOMI - 0.0168 0.5195 0.3827 0.5224 0.823 0.7152 0.9769 0.9572 0.1097 1.1177 0.9732
NRTSI - 0.0001 0.5188 0.0707 0.5096 0.0848 0.7104 0.8733 0.9563 0.1027 1.1141 1.1545
MLSIF - 0.0043 0.7134 0.0098 0.4358 0.0178 0.6015 0.0337 0.8468 0.0791 1.2143 0.0987
TABLE II: MSE and Global SIV of data imputation with different missing rates using deep learning and statistical methods.

Experiment 4

In this experiment, we apply the model to three real geological sensor datasets with missing values (45710421_x, 45710421_y, and 45710422_x). There are approximately 20,000 timestamps in each dataset used in this experiment, with a missing rate greater than 30%.

(a) 45710421_x imputed by R
(b) 45710421_x imputed by deep learning methods
(c) 45710421_y imputed by R
(d) 45710421_y imputed by deep learning methods
(e) 45710422_x imputed by R
(f) 45710422_x imputed by deep learning methods
Fig. 9: The imputation results of Experiment 4 by the R packages and deep learning methods. The blue on the abscissa marks the coordinates at which an observed value exists.

The experimental results are shown in Fig. 9. Viewed over the whole data, the imputation results of the statistical methods are very good where small segments are missing, almost indistinguishable from the original data. However, traces of imputation can be seen at the positions of large missing segments. For the deep learning methods, model training becomes unstable when a sample contains no real values, so this type of sample is removed during training. The results show that although the validation loss decreases (down to the 0.01 level), the imputation results do not improve. As in the previous experiment, when faced with a large segment of missing values, the other deep learning methods impute either the same value or random values around a certain value.

The MLSIF can not only impute good results for small missing gaps but also successfully impute large-missing gaps. The MLSIF imputation results are almost consistent with the original data distribution in the positions where small segments are missing. The imputation results are nearly identical to the original data trend and adhere to the data distribution in the large missing gap position in the middle.

methods 45710421_x 45710421_y 45710422_x
aggregate 0.02761 0.02201 0.02148
approx 0.03271 0.01106 0.05718
spline 0.02346 0.01013 0.07715
interpolation 0.03365 0.01066 0.07081
kalman 0.03412 0.01099 0.05648
BRITS 1.00396 3.07141 4.08062
BRITS-I 0.93489 2.97145 3.77598
CSDI 0.98591 2.72885 5.75567
NAOMI 0.02827 0.03034 0.03183
NRTSI 0.02691 0.02758 0.02592
MLSIF 0.01474 0.01048 0.01691
TABLE III: Global SIV metric results of three geosensor datasets.

Table III shows the Global SIV on the three geosensor datasets. As these datasets already contain many missing values, removing further values would compromise the data’s integrity, which also prevents MSE, MAE, , and from being calculated. Local SIV is excluded from the table because only MLSIF can calculate this metric.

The application of MLSIF to practical problems is reflected in Experiment 4. It has been discovered that many models fail to produce good imputation results when faced with significant missing data gaps; however, MLSIF can handle this problem and produce good results.

VI-C Computational Efficiency Analysis

Unlike a single train-and-impute framework, MLSIF requires many cycles of training and imputing. For example, the six datasets in Experiment 3 go through 12, 59, 108, 151, 200, and 246 cycles of training and imputing, respectively, to obtain the results. Compared with a single-stage model, the running time increases with the number of cycles. In practice, the following tricks help save time:

  • For models that are not sensitive to sequence length, such as NRTSI [31], the number of training iterations can be reduced by inheriting, in each stage, the parameters of the previous stage (the measure taken in this paper);

  • For models that are sensitive to sequence length, such as NAOMI [20], the number of training iterations can be reduced by inheriting the parameters of a model trained with the same input length.

Generally, statistical methods are much more computationally efficient than deep learning methods; however, they sacrifice a certain level of accuracy. In contrast, deep learning methods trade time for precision, and MLSIF spends even more time for higher precision.

VII Conclusion

This study proposes SIV and MLSIF for missing value imputation in sensor data. The SIV loss improves the imputation models, and the SIV metric measures the effectiveness of imputation. MLSIF uses a multistage imputation method that takes the imputation results of previous stages as knowledge to facilitate model learning and improve the imputation. During the imputation process, the framework dynamically adjusts the data length according to the unimputed data, which allows it to adapt to tasks with different degrees of missingness. In the experimental design, we departed from the usual assumptions and simulated the missing patterns of real data, so that the verification process does not deviate from reality. This paper only discusses one-dimensional sensor data. Extending the proposed method to multidimensional data imputation should be researched in future studies. Additionally, more measurement methods based on missing values must be explored.


  • [1] M. J. Azur, E. A. Stuart, C. Frangakis, and P. J. Leaf (2011) Multiple imputation by chained equations: what is it and how does it work?. International journal of methods in psychiatric research 20 (1), pp. 40–49. Cited by: §I.
  • [2] M. K. Cain, Z. Zhang, and K. Yuan (2017) Univariate and multivariate skewness and kurtosis for measuring nonnormality: prevalence, influence and estimation. Behavior research methods 49 (5), pp. 1716–1735. Cited by: §IV-A.
  • [3] W. Cao, D. Wang, J. Li, H. Zhou, L. Li, and Y. Li (2018) Brits: bidirectional recurrent imputation for time series. arXiv preprint arXiv:1805.10572. Cited by: §I, §I, §III-A, 3rd item.
  • [4] S. Chen, H. Xu, D. Liu, B. Hu, and H. Wang (2014) A vision of iot: applications, challenges, and opportunities with china perspective. IEEE Internet of Things journal 1 (4), pp. 349–359. Cited by: §I.
  • [5] T. Choi, J. Kang, and J. Kim (2020) RDIS: random drop imputation with self-training for incomplete time series data. arXiv preprint arXiv:2010.10075. Cited by: §III-D.
  • [6] J. J. Dabrowski and A. Rahman (2019) Sequence-to-sequence imputation of missing sensor data. In Australasian Joint Conference on Artificial Intelligence, pp. 265–276. Cited by: §I, §I, §III-A.
  • [7] S. De Vito, E. Massera, M. Piga, L. Martinotto, and G. Di Francia (2008) On field calibration of an electronic nose for benzene estimation in an urban pollution monitoring scenario. Sensors and Actuators B: Chemical 129 (2), pp. 750–757. Cited by: §VI-A, §VI.
  • [8] C. Fang and C. Wang (2020) Time series data imputation: a survey on deep learning approaches. arXiv preprint arXiv:2011.11347. Cited by: §I.
  • [9] A. Farhangfar, L. A. Kurgan, and W. Pedrycz (2007) A novel framework for imputation of missing values in databases. IEEE Transactions on Systems, Man, and Cybernetics-Part A: Systems and Humans 37 (5), pp. 692–709. Cited by: §III-B, §V.
  • [10] Z. Ghahramani and M. I. Jordan (1994) Supervised learning from incomplete data via an em approach. In Advances in neural information processing systems, pp. 120–127. Cited by: §I.
  • [11] Z. Guo, Y. Wan, and H. Ye (2019) A data imputation method for multivariate time series based on generative adversarial network. Neurocomputing 360, pp. 185–197. Cited by: §I, §I, §III-A.
  • [12] M. Gupta and R. Beheshti (2020) Time-series imputation and prediction with bi-directional generative adversarial networks. arXiv preprint arXiv:2009.08900. Cited by: §I, §I, §III-A.
  • [13] E. Hallaji, R. Razavi-Far, and M. Saif (2021) DLIN: deep ladder imputation network. IEEE Transactions on Cybernetics. Cited by: §IV.
  • [14] A. T. Hudak, N. L. Crookston, J. S. Evans, D. E. Hall, and M. J. Falkowski (2008) Nearest neighbor imputation of species-level, plot-scale forest structure attributes from lidar data. Remote Sensing of Environment 112 (5), pp. 2232–2245. Cited by: §I.
  • [15] H. Junninen, H. Niska, K. Tuppurainen, J. Ruuskanen, and M. Kolehmainen (2004) Methods for imputation of missing values in air quality data sets. Atmospheric Environment 38 (18), pp. 2895–2907. Cited by: §I, §I, §III-C, §III-D.
  • [16] S. C. Li, B. Jiang, and B. Marlin (2019) Misgan: learning from incomplete data with generative adversarial networks. arXiv preprint arXiv:1902.09599. Cited by: §I, §I, §III-A.
  • [17] W. Lin and C. Tsai (2020) Missing value imputation: a review and analysis of the literature (2006–2017). Artificial Intelligence Review 53 (2), pp. 1487–1509. Cited by: §I.
  • [18] Z. C. Lipton, D. C. Kale, R. Wetzel, et al. (2016) Modeling missing data in clinical time series with rnns. Machine Learning for Healthcare 56. Cited by: §I, §I, §III-A.
  • [19] Y. Liu, T. Dillon, W. Yu, W. Rahayu, and F. Mostafa (2020) Missing value imputation for industrial iot sensor data with large gaps. IEEE Internet of Things Journal 7 (8), pp. 6855–6867. Cited by: §I, §V.
  • [20] Y. Liu, R. Yu, S. Zheng, E. Zhan, and Y. Yue (2019) Naomi: non-autoregressive multiresolution sequence imputation. arXiv preprint arXiv:1901.10946. Cited by: §I, §I, §III-A, §III-D, 5th item, 2nd item.
  • [21] Y. Luo, X. Cai, Y. Zhang, J. Xu, and X. Yuan (2018) Multivariate time series imputation with generative adversarial networks. In Proceedings of the 32nd International Conference on Neural Information Processing Systems, pp. 1603–1614. Cited by: §I, §I, §III-A.
  • [22] Y. Luo, Y. Zhang, X. Cai, and X. Yuan (2019) E2gan: end-to-end generative adversarial network for multivariate time series imputation. In AAAI Press, pp. 3094–3100. Cited by: §I, §I, §III-A.
  • [23] J. Ma, Z. Shou, A. Zareian, H. Mansour, A. Vetro, and S. Chang (2019) CDSA: cross-dimensional self-attention for multivariate, geo-tagged time series imputation. arXiv preprint arXiv:1905.09904. Cited by: §I, §I, §III-A.
  • [24] Q. Ma, S. Li, L. Shen, J. Wang, J. Wei, Z. Yu, and G. W. Cottrell (2019) End-to-end incomplete time-series modeling from linear memory of latent variables. IEEE transactions on cybernetics 50 (12), pp. 4908–4920. Cited by: §I, §I.
  • [25] X. Miao, Y. Wu, J. Wang, Y. Gao, X. Mao, and J. Yin (2021) Generative semi-supervised learning for multivariate time series imputation. In Proceedings of the AAAI Conference on Artificial Intelligence, Vol. 35, pp. 8983–8991. Cited by: §I, §I, §III-A.
  • [26] S. Moritz and T. Bartz-Beielstein (2017) ImputeTS: time series missing value imputation in R. R J. 9 (1), pp. 207. Cited by: 2nd item.
  • [27] M. Morup, D. M. Dunlavy, E. Acar, and T. G. Kolda (2010) Scalable tensor factorizations with missing data. Technical report, Sandia National Laboratories. Cited by: §I.
  • [28] A. W. Mulyadi, E. Jun, and H. Suk (2021) Uncertainty-aware variational-recurrent imputation network for clinical time series. IEEE Transactions on Cybernetics. Cited by: §I, §I.
  • [29] M. G. Rahman and M. Z. Islam (2014) Fimus: a framework for imputing missing values using co-appearance, correlation and similarity analysis. Knowledge-Based Systems 56, pp. 311–327. Cited by: §III-B.
  • [30] D. B. Rubin (1976) Inference and missing data. Biometrika 63 (3), pp. 581–592. Cited by: §IV.
  • [31] S. Shan, Y. Li, and J. B. Oliva (2021) NRTSI: non-recurrent time series imputation. arXiv preprint arXiv:2102.03340. Cited by: §I, §I, §I, §III-A, §III-D, §V-A, 6th item, 1st item, §VI.
  • [32] L. Shen, Q. Ma, and S. Li (2018) End-to-end time series imputation via residual short paths. In Asian conference on machine learning, pp. 248–263. Cited by: §III-A.
  • [33] S. Song, A. Zhang, J. Wang, and P. S. Yu (2015) SCREEN: stream data cleaning under speed constraints. In Proceedings of the 2015 ACM SIGMOD International Conference on Management of Data, pp. 827–841. Cited by: §I.
  • [34] J. A. Stankovic (2014) Research directions for the internet of things. IEEE internet of things journal 1 (1), pp. 3–9. Cited by: §I.
  • [35] Q. Suo, L. Yao, G. Xun, J. Sun, and A. Zhang (2019) Recurrent imputation for multivariate time series with missing values. In 2019 IEEE International Conference on Healthcare Informatics (ICHI), pp. 1–3. Cited by: §I, §I, §III-A.
  • [36] Q. Suo, W. Zhong, G. Xun, J. Sun, C. Chen, and A. Zhang (2020) GLIMA: global and local time series imputation with multi-directional attention learning. In 2020 IEEE International Conference on Big Data (Big Data), pp. 798–807. Cited by: §I, §I, §III-A.
  • [37] Y. Tashiro, J. Song, Y. Song, and S. Ermon (2021) CSDI: conditional score-based diffusion models for probabilistic time series imputation. Advances in Neural Information Processing Systems 34, pp. 24804–24816. Cited by: 4th item.
  • [38] S. Van Buuren and K. Groothuis-Oudshoorn (2011) Mice: multivariate imputation by chained equations in r. Journal of statistical software 45, pp. 1–67. Cited by: §I.
  • [39] J. Yoon, J. Jordon, and M. Schaar (2018) Gain: missing data imputation using generative adversarial nets. In International Conference on Machine Learning, pp. 5689–5698. Cited by: §I, §I, §III-A.
  • [40] J. Yoon, W. R. Zame, and M. van der Schaar (2017) Multi-directional recurrent neural networks: a novel method for estimating missing data. In Time Series Workshop at the 34th International Conference on Machine Learning, pp. 1–5. Cited by: §I, §I, §III-A.
  • [41] A. Zeileis and G. Grothendieck (2005) Zoo: s3 infrastructure for regular and irregular time series. arXiv preprint math/0505527. Cited by: 1st item.
  • [42] A. Zhang, S. Song, J. Wang, and P. S. Yu (2017) Time series data cleaning: from anomaly detection to anomaly repairing. Proceedings of the VLDB Endowment 10 (10), pp. 1046–1057. Cited by: §I.
  • [43] A. Zhang, S. Song, and J. Wang (2016) Sequential data cleaning: a statistical approach. In Proceedings of the 2016 International Conference on Management of Data, pp. 909–924. Cited by: §I.
  • [44] Y. Zhang, P. J. Thorburn, W. Xiang, and P. Fitch (2019) SSIM—a deep learning approach for recovering missing time series sensor data. IEEE Internet of Things Journal 6 (4), pp. 6618–6628. Cited by: §III-A.

VIII Biography Section

Jin-Sheng Yang received a bachelor’s degree from Sichuan University in 2016. Currently, he is a postgraduate student at Hainan University. His main research direction is time series data analysis.

Yuan-Hai Shao received his B.S. degree in information and computing science from the College of Mathematics, Jilin University, a master’s degree in applied mathematics, and a Ph.D. degree in operations research and management from the College of Science, China Agricultural University, China, in 2006, 2008, and 2011, respectively. Currently, he is a Full Professor at the School of Management, Hainan University, Haikou, China. His research interests include support vector machines, optimization methods, machine learning, and data mining. He has published over 100 refereed papers in these areas, in venues including IEEE TPAMI, IEEE TNNLS, IEEE TC, PR, and NN.

Chun-Na Li received her master’s and Ph.D. degrees from the Department of Mathematics, Harbin Institute of Technology, China, in 2009 and 2012, respectively. Currently, she is a professor at the Management School, Hainan University. Her research interests include optimization methods, machine learning, and data mining.

Wen-si Wang received his master’s and Ph.D. degrees in microelectronics from Tyndall National Institute, Republic of Ireland, in 2008 and 2012, respectively. He was a visiting scholar at the Georgia Institute of Technology, Atlanta, in 2012. From 2013 to 2015, he was with Tyndall National Institute as a postdoctoral researcher and assistant researcher. Since 2015, he has been with the Beijing University of Technology as an associate professor. He is also the co-founder of SuperVision, a medical R&D company focused on A.I. in medical applications. He has published over 30 papers and filed over 20 patents in this area.