1 Introduction
Revealing correlations of data is the simplest way to analyze relationships of given samples. However, the simple way has a lot of unignorable problems such as noise robustness, interpretability, and treatments of discrete data. Specifically, using simple correlations only, large input dimension causes difficulty of finding significant relationships because the reasonable threshold selection is not trivial. In order to find the essential relationships between variables, the sparse modeling, such as LASSO (Least Absolute Shrinkage and Selection Operator)[8], is focused in these decades[4]. Graphical Lasso [2] is a representative method which achives sparse covariance structure analysis. Since an inverse covariance matrix corresponds to a partial correlation matrix with appropriate scaling, we can discover robust and essential relationships of data by sparse covariance structure analysis. However, because of the assumption of a Gaussian Graphical Model (GGM) for observed data, Graphical Lasso can not treat count data.
In order to overcome this limitation of Graphical Lasso, we propose a hierarchical Bayesian model for Poisson distributed observations with sparse covariance structure. In our model, the latent variables, which indicate “
potential risks” of events, follow a Bayesian Graphical LASSO(BGL) [9], and the occurrences of events follow the homogeneous Poisson processes. We apply the proposed model to spatial crime data analysis and investigate the effectiveness with numerical experiments.1.1 Graphical Lasso
Graphical Lasso [2] is a wellknown and powerful model for sparse covariance structure analysis of GGM. Let a zeromeaned data matrix
follows a Gaussian distribution independently with given a precision matrix
denoting each component as in .(1) 
The Graphical Lasso optimizes the following objective which consists of the term from maximizing likelihood estimator of under the penalty
(2) 
where indicates the set of all positive semidefinite matrices and is a regularization parameter. Here, the norm is defined by . Note that maximizing the objective of Graphical Lasso (2) is equivalent to obtain the maximum a posteriori (MAP) estimator of the following model, which is the combination of the Laplace distributions of the offdiagonal components of denoting as of the form
and the exponential distribution of the diagonal components
of the form :(3)  
(4) 
where is a normalizing term and is an indicator function defined by
(5) 
1.2 Bayesian Graphical Lasso
Bayesian Graphical Lasso (BGL) [9] realizes a fully Bayesian treatment of Graphical Lasso. Using the fact that the double exponential distributions can be represented as a mixture of Gaussian and exponential distributions, (3) is modified to
(6)  
(7) 
where
denote the vector of the upper offdiagonal and diagonal entries of
and is the latent scale parameters. From the decomposition of the double exponential distributions, we derive a dataaugmented block Gibbs sampling algorithm for the BGL model(6).1.3 The Poisson Process and Crime Data
A Poisson distribution gives probability masses of the numbers of occurred events per unit time. More precisely, if the time intervals of events’ occurrences follow identical exponential distributions, the numbers of events per unit time follow identical Poisson distributions. This model is called a homogeneous Poisson process. The Poisson process is a simple but reasonable model to represent occurrences for rare events. Common examples of Poisson processes are customers calling a help
[10], radioactive decay in atoms [7], and crime occurrence [6][5], and so on.If data follow a multivariate point process, it is significant to understand the intervariate relationships of data. When the data consists of continuous values, we can evaluate the structure of variables by calculating correlations or applying Graphical Lasso. However, for count data, the correlation matrix will lead to biased evaluation for the structure. Hence, we introduce a novel Bayesian framework of sparse covariance structure analysis for count data.
The remainder of this paper is organized as follows. Section 2 gives the concrete formulation of the proposed model. In Section 3, we show the effectiveness of our model through experiments with spatial crime data and Section 4 shows the discussion of our model. Finally, Section 5 is devoted to a summary of this study.
2 The Proposed Method
2.1 Our Model
Fig. 1 shows the graphical model of the proposed model. At First, we define the set of nonnegative integers . When we obtain dimensional sequential data for timesteps, we assume that the elements of count data matrix follows conditionally independent Poisson distributions
(8)  
(9) 
and we also give priors as
(10)  
(11)  
(12)  
(13) 
Here, we assume the linear predictor as the potential risk of occurrence of events. Thus, indicates averaged potential risk of th dimension and represents dispersities from . Since follows the BGL prior, we can extract the sparse and essential cooccurrence structures of count data. In addition, we discussion about the effects of choices of in Section 3. Therefore, the joint posterior distribution can be expressed as
(14) 
2.2 Sampling Scheme
We evaluate the posterior (14
) with a Markov chain Monte Carlo method which consists of two different sampling schemes. The first scheme is block Gibbs sampling for the parameters in the BGL,
i.e. and . The second one is about the parameters of the potential risks, that is, and . We adopt the MetropolisHastings scheme for correlated count data [1].For example, we describe the MetropolisHastings sampling scheme from the fully conditional distribution of here. When tries to transition to new state , we define the acceptance probability for simplicity
(15) 
The sampling procedure is as follows:
 Step 1

Find optimal about the fully conditional distribution(16) by use of the NewtonRaphson method.
 Step 2

Sample a candidate state from proposal distribution defined as the multivariate tdistribution:(17) where is the Hessian matrix of (16) and
is a userdefined hyperparameter which indicates the degree of freedom.
 Step 3

Accept the candidate state with probability .
Update the state if accepted, and keep the state to be if rejected.
Sampling from is also executed by the similar way to . In summary, we repeat sampling from the fully conditional distributions of and respectively with MetropolisHastings within Gibbs sampler.
3 Synthetic Data Analyses
3.1 Synthetic Data
To assess the performance of our proposed model, we generate four synthetic datasets, whose size are . We fix and generated as with , and zero otherwise for entire simulations, where and represent constant values. Given these true parameters, we sample and the observed data matrix from (8) and (11) respectively.
3.2 Analyzing Effects of Hyperparameter Selection
In our MetropolisHastings sampling scheme, we empirically find that an appropriate selection for the degree of freedom in proposal tdistribution gives a significant effect for the sampling efficiency. In our simulations, slightly small , such as , is better than bigger . Because a tdistribution with
is equivalent to Cauchy distribution and
is corresponding to a Gaussian distribution, our choice means intermediate form of them.Next, we determine the parameters in priors. For , which is the hyperparameter of the regularization parameter , we adopt for and . Moreover, a value seems to be reasonable for the prior . Since and determine the parameters of the Poisson distribution within the exponential function in the proposed model, the absolute value of should be small.
3.3 Simulation Results
Here we show the estimation results of , , in the case that the size of data is . We adopt the hyperparameters and updated all elements at once both and in the MetropolisHastings algorithms.
Fig. 2 shows the result of . We show the solid line estimates of with the MAP estimator with 95% credible intervals. The dotted line represents the true values. It is assumed that small is difficult to estimate because the linear predictor is exponentially transformed to be the parameter of Poisson distribution. However, we find the 95% the true values are held within credible intervals.
Fig. 3 shows the MAP estimation results of and their true values. The subfigure (c) shows the differences between MAP estimated and true in absolute value.
Fig. 4 shows the estimation result of the value of . Note that is estimated to be as shown in Fig. 2. The solid line is the MAP estimation results of with the 95% credible intervals, and the markers are their true values. For the positive , the true values are within the credible intervals of the estimation result. On the other hand, the true values seem to be out of the 95% credible intervals for the negative . However, considering that the parameter of Poisson distribution becomes a small value, the estimation results are reasonable.
3.4 Evaluating Partial Correlations
We can calculate partial correlation matrix as
(18) 
from the estimated precision matrix . In Fig. 5, we show the MAP estimated for . We compare the estimated partial correlation coefficients and true one, which consists of small number of nonzero components. We confirm the partial correlation coefficients that correspond to the nonzero components are relatively larger than the other components that correspond to the zero components in the true partial correlation coefficients. We also find that nondiagonal and nonzero estimates have shrinkage from their true values. This is affected by the Lassolike priors of BGL.
4 Analysis Of Crime Spots Data
4.1 Spatial Crime Data
As an application example of the proposed model, we employed the proposed model to spatial crime data which is obtained from [3]. The spatial crime data published by the National Institute of Justice (NIJ) contains criminal occurrences and their locations in the Portland City, Oregon, USA.
In the crime data, the latent variable represents the average potential risk of criminal occurrences of the th area. Also, represents dispersities of the potential risks at each time point. Therefore, the interaction structure is captured by .
For simplicity, we used the crimes that occurred in 2016 and extracted 60 areas. Furthermore, we aggregate the number of crimes per week into one time point. Hence, the size of data matrix becomes is .
4.2 The Partial Correlation Of On Crime Data
The estimated is the inverse of the sparse covariance matrix that contributes to the influence between variables and temporal variation. Therefore, the partial correlation calculated from represents a sparse correlation of crime occurrence risk between areas.
A strong positive correlation between two areas means that the risk of the crime in one area intends to increase when the other area risk increases.
4.3 Visualization Of Partial Correlations
The partial correlation coefficients matrix is calculated from (18) with the estimated result of . Fig. 6 shows a visualized sparse partial correlation coefficients on the Portland city map. The coefficients shown on the map are the top 2% of the whole in absolute value. The correlation of crime risks could lead to a visual understanding by expressing a strong correlation on the map.
5 Conclusion
The proposed model can estimate reasonable values of and latent variables and . On the other hand, when the parameter is less than 1 in the Poisson process, the number of events tends to be , so that it is difficult to estimate the negative true value of the latent variable of the simulation data in the proposed model. One of the possible solutions tackle this problem is to find an alternative transformation function which maps to for the linear predictor of Poisson distribution. For example, is one of the candidates.
It is possible to find useful features by analyzing the estimation results of latent variables by the proposed model. We have been able to obtain sparse correlations between crime risk areas, by applying the proposed model to crime data. As a future issue, because there are many samples related to time in the Poisson process in general, we would like to make our model possible to catch timeseries dependence.
References
 [1] (2001) Marginal likelihood from the metropolis–hastings output. Journal of the American Statistical Association 96 (453), pp. 270–281. Cited by: §2.2.
 [2] (2008) Sparse inverse covariance estimation with the graphical lasso. In Biostatistics, Vol. 9, pp. 432–441. Cited by: §1.1, §1.
 [3] Realtime crime forecasting challenge. Note: https://nij.ojp.gov/funding/realtimecrimeforecastingchallenge Last accessed 18 May 2020 Cited by: §4.1.
 [4] (201603) Three levels of datadriven science. Journal of Physics: Conference Series 699, pp. 012001. External Links: Document, Link Cited by: §1.
 [5] (2006) Adaptive event detection with timevarying poisson processes. In Proceedings of the 12th ACM SIGKDD international conference on Knowledge discovery and data mining, pp. 207–216. Cited by: §1.3.

[6]
(2000)
Poissonbased regression analysis of aggregate crime rates
. Journal of quantitative criminology 16 (1), pp. 21–43. Cited by: §1.3.  [7] (2015) Limitations of poisson statistics in describing radioactive decay. Physica Medica 31 (8), pp. 1105–1107. Cited by: §1.3.
 [8] (1996) Regression shrinkage and selection via the lasso. Journal of the Royal Statistical Society. Series B (Methodological) 58 (1), pp. 267–288. External Links: ISSN 00359246, Link Cited by: §1.
 [9] (2012) Bayesian graphical lasso models and efficient posterior computation. Bayesian Analysis 7 (4), pp. 867–886. Cited by: §1.2, §1.
 [10] (2007) Bayesian forecasting of an inhomogeneous poisson process with applications to call center data. Journal of the American Statistical Association 102 (480), pp. 1185–1198. Cited by: §1.3.
Comments
There are no comments yet.