Bayesian Reconstruction of Missing Observations

04/23/2014, by Shun Kataoka, et al.

We focus on an interpolation method referred to as Bayesian reconstruction in this paper. Whereas standard interpolation methods interpolate missing data deterministically, Bayesian reconstruction interpolates missing data probabilistically using a Bayesian treatment. In this paper, we address the framework of Bayesian reconstruction and its application to the traffic data reconstruction problem in the field of traffic engineering. In the latter part of this paper, we describe the evaluation of the statistical performance of our Bayesian traffic reconstruction model using a statistical mechanical approach and clarify its statistical behavior.


1 Introduction

Methods for interpolating missing data are important in various scientific fields. Standard interpolation methods, such as spline interpolation, are deterministic techniques. An alternative, probabilistic, interpolation technique has been developed in the last few years. In this probabilistic technique, which is called Bayesian reconstruction, a Bayesian treatment is used to interpolate and reconstruct missing regions. To the best of our knowledge, Bayesian reconstruction was first implemented in the digital image inpainting filter, which is used to reconstruct lost or deteriorated parts of images

[1] (see figure 1).

Figure 1: Example of digital image inpainting filter. The left image is the original scratched image and the center image is masked by the black regions. We reconstructed the masked region using the digital image inpainting filter to restore the original damaged image. The right image is the reconstructed image obtained by using the digital image inpainting method proposed in reference [2].

Previously, two of the authors applied Bayesian reconstruction to the digital image inpainting filter [3, 4]. Bayesian reconstruction is now becoming the standard technique in digital image inpainting [5].

It can be expected that the framework of Bayesian reconstruction will be utilized in various reconstruction problems, and therefore, its use should not be limited to image processing. Recently, the authors applied Bayesian reconstruction to the traffic data reconstruction problem [6]. Traffic data reconstruction is an important step that precedes traffic prediction tasks such as travel time prediction, density prediction, and route planning. In order to provide accurate information to drivers, a broad-scale database of real-time vehicular traffic over an entire city is required. However, in practice, it is difficult to collect the traffic data for an entire city, because traffic sensors are not installed on all roads. Therefore, the objective of traffic data reconstruction is to reconstruct the states of unobserved roads, where traffic sensors are not installed, by using information from observed roads, where traffic sensors are installed.

In the first part of this paper, we introduce the details of Bayesian reconstruction and subsequently give an overview of the Bayesian traffic data reconstruction method proposed in reference [6], together with some new numerical results. In the latter part of this paper, we present a statistical mechanical analysis of our Bayesian traffic reconstruction and clarify its statistical performance. The remainder of this paper is organized as follows. In section 2, we introduce the framework of Bayesian reconstruction based on Markov random fields (MRFs). We explain a machine learning strategy for model selection based on maximum likelihood estimation (MLE) in section 2.2. In section 3, we present an overview of Bayesian traffic data reconstruction according to the method proposed in reference [6], and we show some new numerical results in section 3.2. We describe our evaluation of the statistical performance of our Bayesian traffic reconstruction in terms of a statistical mechanical analysis in section 4. Finally, we present the conclusions of this paper and outline future work in section 5.

2 Scheme of Bayesian Reconstruction of Missing Observations

In the Bayesian framework, we suppose that observations (observed data) are probabilistically drawn from a specific probability distribution, referred to as the prior probability, because observations suffer from uncertainty originating from physical noise, the incompleteness of some elements, and so on.

Suppose that there exists an -dimensional observation , which is generated from prior probability

, and that we cannot observe a part of the elements in the observation for some reason. Since the observation is probabilistically generated, we can treat its elements as random variables. We define the set of labels of missing elements by

and the complementary set of by notation , i.e., , and therefore, is the set of labels of observed elements. Given an observation , we describe the values of the observed elements by notation to distinguish them from unobserved elements and collectively express the observed elements by . The values of the elements in set are fixed by the observation . Bayesian reconstruction considered in this paper consists of reconstructing the unobserved elements, , in the observation by using the observed elements, ; in other words, the objective is to estimate the values of by using , where notation is the set of belonging to set , i.e., .

In order to reconstruct the missing elements in terms of the Bayesian point of view, we first formulate the posterior probability of the missing elements,

, by the Bayesian rule

(1)

where the value of is fixed by the observation. By using Dirac’s delta, we have

(2)

It should be noted that, if are discrete variables, Dirac’s delta is replaced by Kronecker’s delta. From equations (1) and (2), we have

(3)

where

Probability is referred to as the likelihood in the Bayesian framework. In Bayesian reconstruction, we consider the suitable reconstructed values of unobserved elements, , to be the values of that maximize the posterior probability in equation (3), i.e.,

(4)

The above reconstruction scheme requires prior probabilities that describe the hidden probabilistic mechanisms of observations. Unfortunately, in almost all situations we do not know the details of the prior probabilities. Therefore, in order to implement a Bayesian reconstruction system, we must model the unknown prior probabilities.
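Before turning to prior modeling, the reconstruction rule of equation (4) can be illustrated on a toy problem. The following sketch (with an assumed two-variable joint table, not from the paper) reconstructs a missing binary element as the value that maximizes the posterior given the observed element:

```python
import numpy as np

# Hypothetical toy joint distribution P(x1, x2) over two binary variables.
# Suppose x2 is observed (x2 = 1) and x1 is missing; Bayesian reconstruction
# picks the x1 that maximizes the posterior P(x1 | x2 = 1), as in equation (4).
joint = np.array([[0.10, 0.30],   # P(x1=0, x2=0), P(x1=0, x2=1)
                  [0.40, 0.20]])  # P(x1=1, x2=0), P(x1=1, x2=1)

observed_x2 = 1
posterior = joint[:, observed_x2] / joint[:, observed_x2].sum()
x1_hat = int(np.argmax(posterior))  # maximum a posteriori estimate

print(posterior)  # [0.6 0.4]
print(x1_hat)     # 0
```

In practice the joint table is unknown, which is exactly why the prior must be modeled, as discussed next.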

2.1 Prior Modeling based on Markov Random Fields

One of the most widely used models of the prior probability of observed data is the MRF. MRFs can easily treat the complex spatial interactions among observational data points that create a variety of appearance patterns.

Consider an undirected graph , where is the set of vertices and is the set of undirected edges between the vertices. An MRF is usually defined on such an undirected graph by assigning each variable to the corresponding vertex . Edge expresses a spatial interaction between variable and variable . To construct a probabilistic model, we define the joint probability of , . On the undirected graph , if we assume a spatial Markov property among the random variables, , and the positivity of the model , then, by the Hammersley-Clifford theorem, the model can be expressed as

(5)

without loss of generality. The first term in the exponent, , is a potential function on vertex that determines the characteristic of , and the second term in the exponent, , is a potential function between vertices and that determines the interaction between and . represents the summation running over all edges, and denotes the normalization constant, sometimes referred to as the partition function, defined by

The model in equation (5) is the MRF that is most frequently used. In the MRF, let us consider the conditional probability of expressed as

(6)

where denotes the set of all variables except : . Equations (5) and (6) lead to

(7)

where denotes the set of vertices connected to vertex in the graph, and denotes the set of variables on the vertices belonging to , that is, is the set of nearest-neighbor variables of : . Equation (7) states that, in the conditional probability, the variable depends only on its nearest-neighbor variables, and this constitutes the spatial Markov property of the MRF.
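The spatial Markov property of equation (7) can be verified numerically on a small example. The sketch below (a binary pairwise MRF on a three-vertex path with assumed potential values, not taken from the paper) checks that the conditional probability of the middle variable computed from the full joint distribution coincides with the local formula involving only its neighbors:

```python
import itertools
import numpy as np

# A minimal sketch: a binary pairwise MRF on the path graph 0 - 1 - 2.
# Vertex and edge potential values below are illustrative assumptions.
edges = [(0, 1), (1, 2)]
h = np.array([0.2, -0.1, 0.3])   # vertex potentials
J = {(0, 1): 0.5, (1, 2): -0.4}  # edge potentials

def energy(x):
    return -sum(h[i] * x[i] for i in range(3)) - sum(
        J[e] * x[e[0]] * x[e[1]] for e in edges)

# Full joint distribution over x in {-1, +1}^3.
states = list(itertools.product([-1, 1], repeat=3))
w = np.array([np.exp(-energy(x)) for x in states])
p = w / w.sum()

# Conditional P(x1 = +1 | x0 = +1, x2 = -1) from the full joint...
num = sum(p[k] for k, x in enumerate(states) if x == (1, 1, -1))
den = sum(p[k] for k, x in enumerate(states) if x[0] == 1 and x[2] == -1)
cond_full = num / den

# ...equals the local formula that uses only x1's neighbors {x0, x2},
# which is the spatial Markov property of equation (7).
field = h[1] + J[(0, 1)] * 1 + J[(1, 2)] * (-1)
cond_local = np.exp(field) / (np.exp(field) + np.exp(-field))

assert np.isclose(cond_full, cond_local)
```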

2.2 Model Selection using Parametric Machine Learning

In order to implement the MRF in equation (5), the forms of potential functions should be determined. This is one of the most important points in MRF modeling. Parametrically, we model the potential functions by certain parametric functions with parameter ,

(8)

Thus, we should find the optimal values of the parameters. The standard method for achieving this is provided by the field of machine learning theory described as follows.

The objective of MRF modeling is to model the unknown prior probability of observation . Therefore, the optimal values of the parameters, , should minimize some distance between the prior probability and our model

. The Kullback-Leibler divergence (KLD)

(9)

is often utilized as a measure of the discrepancy between two distinct probability distributions, and . The value of the KLD is always non-negative and is zero when the two probabilities are equivalent. Thus, we consider two distinct probabilities to be close to each other when the value of the KLD is small. In terms of the KLD, we suppose the optimal values of the parameters are given by minimizing the value of the KLD between the prior probability and our model,
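The properties of the KLD stated above are easy to check numerically. A minimal sketch for discrete distributions (the example values are illustrative assumptions):

```python
import numpy as np

# KLD between two discrete distributions p and q (with q > 0 wherever p > 0):
# D(p || q) = sum_x p(x) log(p(x) / q(x)).
def kld(p, q):
    p, q = np.asarray(p, float), np.asarray(q, float)
    return float(np.sum(p * np.log(p / q)))

p = [0.5, 0.3, 0.2]
q = [0.4, 0.4, 0.2]

print(kld(p, q) >= 0.0)  # True: non-negativity (Gibbs' inequality)
print(kld(p, p) == 0.0)  # True: zero when the distributions are equal
```

Note that the KLD is not symmetric in its arguments, so it is a divergence rather than a true distance, but it suffices as a closeness measure for model fitting.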

However, we cannot perform this minimization because we do not know the prior probability.

Since we do not know the prior probability, we suppose instead that we have many complete observations ("complete" means that each observation includes no missing points) generated from the prior probability. We describe the observations by , and we define the empirical distribution of the complete observations by

(10)

The empirical distribution is the frequency distribution of the complete observations. It should be noted that, if are discrete variables, Dirac’s delta is again replaced by Kronecker’s delta. We suppose that the empirical distribution has some important properties of the prior probability and that the optimal values of the parameters are approximately obtained by minimizing the value of the KLD between the empirical distribution and our model,

(11)

This minimization can be performed if we have the complete observations generated from the prior probability. Equation (11) is rewritten as

This corresponds to the MLE in statistics. In the above scheme, we assumed that there are no missing points in the observations used in the estimation of the parameters. If the observations include missing points, we must use an alternative strategy and apply the expectation-maximization (EM) algorithm.
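The equivalence between minimizing the KLD to the empirical distribution and the MLE can be illustrated with a simple parametric family. The sketch below (assuming a one-dimensional Gaussian model and synthetic data, not the paper's MRF) shows that the closed-form MLE attains the maximum of the mean log-likelihood, and hence the minimum of the KLD, since the two objectives differ only by a parameter-independent entropy term:

```python
import numpy as np

# Synthetic "complete observations" from an assumed 1-D Gaussian prior.
rng = np.random.default_rng(0)
data = rng.normal(loc=2.0, scale=1.5, size=10_000)

# Mean log-likelihood of a Gaussian model N(mu, sigma^2); minimizing the
# KLD to the empirical distribution maximizes this quantity.
def mean_log_likelihood(mu, sigma):
    return np.mean(-0.5 * np.log(2 * np.pi * sigma**2)
                   - (data - mu)**2 / (2 * sigma**2))

# For the Gaussian, the MLE is available in closed form.
mu_hat, sigma_hat = data.mean(), data.std()

# Any other parameter setting gives a lower mean log-likelihood,
# i.e., a larger KLD to the empirical distribution.
assert mean_log_likelihood(mu_hat, sigma_hat) >= mean_log_likelihood(1.0, 1.0)
assert mean_log_likelihood(mu_hat, sigma_hat) >= mean_log_likelihood(2.5, 2.0)
```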

From the above arguments, the (parametric) Bayesian reconstruction system is summarized as follows. Before the reconstructions, we design the potential functions in our MRF model and estimate the optimal values of the parameters, , in advance by using many complete observations and equation (11). Then, the reconstruction is approximately performed using the constructed model instead of the prior probability in equation (4), i.e.,

(12)

Since

the suitable reconstructed values of unobserved missing points, , are the values that maximize the conditional probability of our model,

with fixed by the observation .

3 Overview of Bayesian Traffic Data Reconstruction

In this section, we give an overview of the application of the Bayesian reconstruction scheme presented in the previous section to the traffic data reconstruction problem proposed by the authors [6], together with some new numerical results.

3.1 MRF model for Bayesian Traffic Data Reconstruction

We applied the Bayesian reconstruction scheme to traffic data reconstruction. Our goal is to reconstruct the states of roads, and therefore, random variables are assigned to roads. In order to formulate an MRF for a road network, we construct an undirected graph as follows: we assign a vertex to each road and draw an edge between two roads that are connected to each other at a traffic intersection (see figure 2).

Figure 2: Undirected graph representation for road network. (a) Road network with six roads and two intersections. (b) Vertices are assigned to roads. (c) Edges are drawn between two roads that are connected to each other at intersections.

On the undirected graph, we define the MRF by

(13)

where are the parameters of the model (although this expression seems to differ slightly from the original model proposed in reference [6], it is essentially equivalent to the original model). The variable expresses the state of road . In this paper, we consider as traffic densities according to the method in reference [6]. The traffic density on road is defined by the number of cars per unit area on road ; high densities tend to lead to traffic jams. This MRF is obtained by setting and in equation (5). Parameter is the bias that controls the level of the traffic density of road , and parameter controls the variances in the traffic densities. The interaction term, the last term in the exponent in equation (13), corresponds to our assumption that the traffic densities of neighboring roads take close values. Parameter controls the strength of this assumption. This MRF forms a multi-dimensional Gaussian and is known as the Gaussian graphical model (GGM).

As in the previous section, we represent the set of unobserved roads by and the set of observed roads by . After determining the values of the parameters by the machine learning method in equation (11), from equation (12), the reconstructed densities on the unobserved roads are obtained by

(14)

where represents the densities on the observed roads. Since our model in equation (13) is multi-dimensional Gaussian, the conditional probability is also multi-dimensional Gaussian. Therefore, equation (14) is rewritten as

(15)

Hence, we find that the reconstructed densities, , are the expectations of . The expectations are obtained by solving the simultaneous equations,

(16)

by an iteration method, where denotes the number of elements in the assigned set and

Equation (16) is known as the mean-field equation, which is obtained by the naive mean-field approximation for , and is also known as the Gauss-Seidel method. It is known that, in GGMs, the mean-field equation and the Gauss-Seidel method are equivalent in general and that the mean-field equation always provides exact expectations [7].
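The reconstruction step can be sketched concretely for a GGM. In the example below (a toy four-node graph with assumed precision-matrix and bias values, not the Sendai model), the conditional mean of the missing block given the observed block is computed both by a direct linear solve and by the Gauss-Seidel iteration corresponding to the mean-field equation (16); the two agree, illustrating the exactness stated above:

```python
import numpy as np

# A GGM with density p(x) proportional to exp(-x'Ax/2 + b'x). The conditional
# mean of the missing block m given the observed block o solves
# A_mm mu_m = b_m - A_mo x_o. Parameter values below are assumptions.
A = np.array([[ 2.0, -0.5,  0.0, -0.3],
              [-0.5,  2.0, -0.4,  0.0],
              [ 0.0, -0.4,  2.0, -0.5],
              [-0.3,  0.0, -0.5,  2.0]])  # precision matrix (positive definite)
b = np.array([0.2, -0.1, 0.3, 0.0])

missing, observed = [0, 1], [2, 3]
x_obs = np.array([0.5, -0.2])  # values on the observed "roads"

# Exact conditional mean by a direct linear solve.
rhs = b[missing] - A[np.ix_(missing, observed)] @ x_obs
exact = np.linalg.solve(A[np.ix_(missing, missing)], rhs)

# Gauss-Seidel (naive mean-field) iteration on the same equations.
A_mm = A[np.ix_(missing, missing)]
m = np.zeros(len(missing))
for _ in range(100):
    for i in range(len(missing)):
        s = rhs[i] - A_mm[i] @ m + A_mm[i, i] * m[i]  # exclude the i-th term
        m[i] = s / A_mm[i, i]

assert np.allclose(m, exact)
```

For a diagonally dominant precision matrix such as this one, the iteration converges quickly; in the GGM, the fixed point is exactly the conditional expectation used as the reconstructed value.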

3.2 Results of Numerical Simulation using Road Network of Sendai-city

In this section, we describe the application of our Bayesian traffic data reconstruction to the road network of the city of Sendai (shown in figure 3) and show the performance of our model.

Figure 3: Road network of Sendai, Japan. This network consists of about ten thousand roads.

Figure 3 shows the road network of Sendai, which consists of about ten thousand roads. According to our Bayesian traffic data reconstruction scheme, we first defined the MRF model shown in equation (13) for the road network. The structure of the MRF was constructed according to figure 2.

The parameters were determined by the maximum likelihood estimation described in section 2.2 with regularization [6]. Regularization is frequently used to avoid over-fitting to noise in the training data. In the maximum likelihood estimation, we used complete traffic data generated by a traffic simulator for the road network of Sendai. Although the traffic data were not real, they were presumed to represent the typical behavior of traffic in Sendai. Using the MRF model, we reconstructed the traffic densities of Sendai.

Figure 4: True traffic density used in our numerical experiment. Each road is colored according to its traffic density. The network in the left panel is the entire network and the network in the right panel is an enlarged image of the center of Sendai.

Figure 4 shows the traffic density data that were not used in the learning, namely, the test data. In order to visually represent the densities, we quantized the densities into five stages at 0.03 intervals; each road is colored according to its quantized traffic density. The densities increase in the following order: black, blue, green, yellow, and red. Thus, a road colored black has a density in the interval .
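The quantization rule used for the visualization can be sketched as follows (the cap on the highest stage is an assumption about how densities beyond the fifth interval are displayed):

```python
# A minimal sketch of the visualization rule described above: traffic
# densities are quantized into five stages at 0.03 intervals, and each
# stage is mapped to a color (black < blue < green < yellow < red).
colors = ["black", "blue", "green", "yellow", "red"]

def density_color(rho):
    # Densities at or above the top of the fourth interval are assumed
    # to fall in the highest (red) stage.
    stage = min(int(rho / 0.03), 4)
    return colors[stage]

print(density_color(0.01))  # black: density in [0, 0.03)
print(density_color(0.05))  # blue:  density in [0.03, 0.06)
print(density_color(0.13))  # red:   density of 0.12 or above
```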

For the traffic density data shown in figure 4, we suppose that the densities on some roads are unobserved, and we randomly select the unobserved roads with probability .

Figure 5: Positions of unobserved roads, where the unobserved roads are colored red. About 80 % of roads are unobserved. The network in the left panel is the entire network and the network in the right panel is an enlarged image of the center of Sendai.

Figure 5 shows the positions of the unobserved roads, which are colored red. We reconstructed the densities on the unobserved roads using the densities on the observed roads, which are colored black in the figure. Our Bayesian reconstruction result is shown in figure 6.

Figure 6: Reconstruction result by using our Bayesian reconstruction, where each road is colored according to its traffic density. The network in the left panel is the entire network and the network in the right panel is an enlarged image of the central area of Sendai.

The mean square error (MSE) on the unobserved roads between the true densities shown in figure 4 and the reconstructed densities shown in figure 6 is approximately 0.001447, where the MSE is defined by

(17)

where is the true density on road and is the density on road reconstructed by our method. The scatter plot of this reconstruction is shown in figure 7.

Figure 7: Scatter plot of the true densities in figure 4 and the reconstructed densities in figure 6.

The correlation coefficient of this scatter plot is approximately 0.919. It can be seen that the correlation coefficient is close to one. Thus, our simple MRF model can be expected to capture a static statistical property of the traffic data.
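The two performance measures used above are straightforward to compute. A minimal sketch with synthetic data (not the Sendai results) evaluating the MSE of equation (17) over the unobserved roads and the correlation coefficient of the scatter plot:

```python
import numpy as np

# Synthetic true and reconstructed densities; the noise level is an
# illustrative assumption, not a measured property of the Sendai data.
rng = np.random.default_rng(1)
true = rng.uniform(0.0, 0.15, size=1000)          # true densities
recon = true + rng.normal(0.0, 0.01, size=1000)   # reconstructed densities

# MSE over the unobserved roads, as in equation (17).
mse = np.mean((true - recon) ** 2)

# Correlation coefficient of the scatter plot of true vs reconstructed.
corr = np.corrcoef(true, recon)[0, 1]

print(mse < 0.001)  # small reconstruction error
print(corr > 0.9)   # strongly correlated with the truth
```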

Next, we address the average performance of our reconstruction method versus the value of the missing probability . Figure 8 shows the MSE versus the value of the missing probability .

Figure 8: MSE versus the missing probability . The solid curve is the Bezier interpolation of points.

Each point is the average value of MSE over 100 trials using the leave-one-out cross-validation method. It can be seen that the error increases with the value of the missing probability .

4 A Statistical Mechanical Analysis of Bayesian Traffic Data Reconstruction

In this section, we clarify the relationship between the model parameters in the MRF model in equation (13) and the reconstruction performance from a statistical mechanical point of view.

In our analysis, we assume that the prior probability of traffic data has the same form as our model in equation (13),

(18)

and assume the values of are independently drawn from the identical distribution , where notation is the normalization constant and notation is the summation running over all distinct pairs of vertices and in , i.e., . Although road networks have complex structures, we neglect these structures and, for simplicity of analysis, employ the fully connected model with no structure. For the observations generated from the prior probability in equation (18), we conduct the reconstructions by using the model taking the form

(19)

where . The bias parameters in the reconstruction model in equation (19) are defined by , and we assume that the values of are independently drawn from the identical distribution . If , , and , the prior model and the reconstruction model are equivalent.

For the observation with some missing elements, , generated from the prior model, as shown in equation (15), the reconstruction using the reconstruction model in equation (19) is conducted by

(20)

for . We measure the statistical performance of the reconstruction by the MSE in equation (17) averaged over all the possible observations generated from the prior probability and over all the possible values of bias parameters and that are generated from and , respectively. The averaged MSE is expressed by

(21)

where

(22)

is the MSE averaged over all the possible observations for the specific biases, where is the number of missing elements. Equation (21) represents the MSE of our Bayesian reconstruction averaged over all the possible situations that appear under our assumption for the prior model.

Since road networks are quite large, we consider the thermodynamic limit of the mean square error by taking limits where is fixed at a finite constant. Parameter corresponds to the missing rate; it must be in the interval . In the thermodynamic limit, from equations (19) and (38), we have

(23)

where

(24)

(see appendix A for the detailed derivation). Notations and are the averages of and , respectively. From equation (37), we have

where . Thus, equation (22) can be rewritten as

Similarly, from equation (37),

This means that coincides with with a probability of one. This leads to relation

and then, equation (22) is rewritten as

(25)

Since we find

from equation (36), we obtain the average of in equation (23) with respect to the prior probability as

(26)

By using equations (21), (25), (26), and (36), with a straightforward calculation, we finally obtain the explicit form of equation (21) as

(27)

where and denote the variances of and , respectively. Equation (27) does not require information about which roads are selected as unobserved. The dependency on these data disappears as a result of the averaging operations.

It is obvious that takes the minimum value

(28)

when there is no model error between the prior model in equation (18) and the reconstruction model in equation (19), that is, when , , and . In the following, we examine numerically the relation between the averaged MSE and the parameters in the reconstruction model. For the numerical experiments, we set the parameters in the prior model to , , , and .

First, we examine the relation when a model error exists only in the interaction parameters, that is, when , , and . The parameter is the error of interaction. Figure 9 shows the plot of against error when .

Figure 9: Plot of against error when , , , and . The vertical axis is in equation (27). The inset is an enlarged plot around .

It can be seen that, where is larger than , the performance level is relatively robust. In contrast, the performance level decreases drastically when is smaller than .

From equation (27), it can be seen that the dependency on the missing rate arises when model errors exist in the bias parameters or in variance parameters and . Next, we consider the case where model errors exist in the bias parameters and the variance parameters.

Figure 10: Plot of against missing rate when , , , and . The vertical axis is in equation (27).

Figure 10 shows the plot of against missing rate when , , , and . The reconstruction performance level decreases with the increase in the value of . This performance behavior seems to be qualitatively similar to the behavior of our numerical traffic density reconstruction in figure 8. From this result, we can presume that the behavior of MSE in figure 8 is caused primarily by the model errors in either the biases or the variance, or both.

The parameters in our reconstruction model were determined by the training data as described in section 3.2. However, in the training, it was not easy to find the truly optimal values of parameters from the training data, because the number of training data was much smaller than the number of parameters. This could be one of the reasons why the model errors exist.

5 Conclusion

In this paper, we introduced the Bayesian reconstruction framework, in which missing data are probabilistically interpolated, and gave an overview of the application of Bayesian reconstruction to the problem of traffic data reconstruction in the field of traffic engineering. Our traffic reconstruction model in equation (13) neglects some real traffic properties, for example, traffic lanes, contraflows, and so on. Nevertheless, the results of our reconstruction seem to be accurate. It can be expected that our simple GGM captures the static statistical properties of traffic data and that it can be used as an important base model for Bayesian traffic reconstruction. The extension of our model by taking real traffic properties into account should be addressed in the next study.

In the latter part of this paper, we evaluated the statistical performance of our reconstruction by using a statistical mechanical analysis, that is, a mean-field analysis. In our analysis, we used a simplified reconstruction model with no network structure for the convenience of the calculations. However, since real road networks have network structures, the result of our evaluation can only be a rough approximation. The authors proposed a method based on belief propagation to evaluate the statistical properties of Bayesian reconstruction on structured networks in the context of image processing [8]. We strongly suggest that this method can be applied to Bayesian traffic reconstruction and can lead to more realistic evaluations.

Appendix A Mean-field Analysis for Traffic Data Reconstruction Model

The partition function in equation (18) is

By using the Hubbard-Stratonovich transformation, we have

The Gaussian integral leads to

(29)

Therefore, the free energy (per variable) for the prior model in (18) is expressed by

(30)

so that the first- and the second-order moments of the prior model are given by

(31)

and

(32)

respectively, where and is Kronecker’s delta.

Next, we find the free energy for the conditional probability of the reconstruction model in equation (20). The conditional probability is expressed by

where is defined in equation (24) and represents the summation running over all distinct pairs of vertices and in . represents the partition function defined by

Using almost the same derivation as the above mean-field derivation for the prior model, we find

(33)

where is the number of missing elements defined in section 4 and is associated with the missing rate. From equation (33), we can obtain the free energy (per variable) for the conditional probability of the reconstruction model by

(34)

so that the first-order moments of the conditional probability of the reconstruction model are given by

(35)

for , where we define .

We consider the thermodynamic limit by taking limits , where is fixed at a finite constant. In the thermodynamic limit, the free energies in equations (30) and (34) are reduced to

and