I Introduction
It is well recognized that crime prediction is of great importance for enhancing urban public security and thereby improving citizens’ quality of life [1, 2]. Accurate crime prediction helps advance sustainable urban development and reduce the financial losses caused by urban violence. Therefore, there is a rising need for precise crime prediction. Efforts have been made to construct crime prediction models that predict either the total crime amount [1, 3] or specific types of crime such as Burglary [4], Felony Assault [5], Grand Larceny [6], Murder [7], Rape [8], Robbery [9, 10], and Vehicle Larceny [11]. In other words, most existing crime prediction methods either do not distinguish between different types of crime or consider each crime type separately.
According to criminology and recent studies, different types of crime behave differently but are intrinsically correlated. For instance, social disorganization theory [12] and broken windows theory [13] suggest that a series of minor crimes such as vandalism or graffiti may lead to an increase in more severe crimes such as assaults and weapon violence. Likewise, the relations between different types of crime in London are investigated in [14], where bicycle theft, burglary, robbery, and theft from the person are observed to be closely related in terms of spatial distribution. These theories and findings indicate that different types of crime are intrinsically related to each other, and that exploiting the correlations among crime types could improve the accuracy of crime prediction.
Recently, driven by advances in big urban data collection and integration techniques, a great quantity of urban data has been collected, such as crime complaint data, stop-and-frisk data, meteorological data, point-of-interest (POI) data, human mobility data, and 311 public-service complaint data. Such data contains rich and useful context information about crime. For example, in the near future, more crimes tend to occur in areas with many crime complaints [1]; the POI density can characterize neighborhood functions, which have a strong impact on criminal activities according to criminal theories [15]; and public-service complaint data reveals citizens’ dissatisfaction with government services, and is thus associated with crime. In addition, big urban data contains fine-grained information about where and when the data is collected. Such spatio-temporal information not only enables us to study the geographical factors of crime, such as urban configuration, but also allows us to understand the dynamics and evolution of crime over time [16]. According to environmental criminology, such as awareness theory [17] and crime pattern theory [18], the distribution of urban crime is highly influenced by space and time. Therefore, the spatio-temporal understanding gained from big urban data provides unprecedented opportunities to build more accurate crime prediction models.
In this paper, we jointly explore cross-type and spatio-temporal correlations for crime prediction by leveraging big urban data. Specifically, we seek answers to two challenging questions: (1) what correlations can be observed among different types of crime, and (2) how to mathematically model cross-type and spatio-temporal correlations for crime prediction. For cross-type correlations, we investigate the temporal and spatial patterns of different types of crime as well as their relationships; for spatio-temporal correlations, we focus on mathematically modeling (1) intra-region temporal correlation, which describes how crime evolves over time within a region, and (2) inter-region spatial correlation, which describes the spatial relationship across regions in the city [1, 3, 19]. We propose a novel framework, CCC, which jointly captures Cross-type and spatio-temporal Correlations for Crime prediction based on urban data. Our major contributions can be summarized as follows:

We verify the existence of correlations among different types of crime from temporal and spatial perspectives;

We propose a novel crime prediction framework, CCC, which jointly captures cross-type and spatio-temporal correlations in a coherent model; and

We conduct extensive experiments on real big urban data to validate the effectiveness of the proposed framework and the contributions of different correlations to crime prediction.
II Problem Statement
In this section, we first introduce the mathematical notations and then formally define the problem we study in this work. We employ bold letters to represent vectors and matrices, e.g., and ; we leverage non-bold letters to denote scalars, e.g., and ; and we use Greek letters as parameters, e.g., and . Let denote the observed numbers of crime where is the number of crime type observed at region in time slot. Here we suppose that there are (1) regions in a city, i.e., , (2) time slots (e.g., days, weeks, or months) in the dataset, i.e., , and (3) types of crime (e.g., burglary, robbery, and grand larceny), i.e., . Suppose that denotes the set of feature vectors, where is the feature vector of region in time slot, and is the number of features. Note that the feature vector is the same for all types of crime of region in time slot. More details about the features are given in the experiment section.
With the above-mentioned notations and definitions, we formally state the problem of crime prediction as: Given the historical observed crime amounts and feature vectors , we aim to predict the crime amount of time slot (or time slots later) for each type of crime based on and .
It should be noted that our goal is to predict the crime amount for a future time slot . However, if the feature vector is constructed based on data in time slot of region, the future feature vector of time slot is not available. Therefore, in this paper, we actually construct using data in time slot rather than time slot of region. Without loss of generality, in the following sections, we leverage for illustration, i.e., performing crime prediction for time slot.
III Preliminary Study
In this section, we investigate spatio-temporal and cross-type correlations for different types of crime. This preliminary analysis is based on crime data collected from New York City, which contains 7 types of crime, i.e., Burglary, Felony Assault, Grand Larceny, Murder, Rape, Robbery, and Vehicle Larceny. We study cross-type correlations from temporal and spatial perspectives.
III-A Spatio-Temporal Correlations
Within a region, the amount of crime should change smoothly over time. We assume the crime amount is and for time and . To study temporal correlation, we show how the crime amount difference changes with on average over all regions. The result is illustrated in Figure 1(a), where the x-axis is (days) and the y-axis is . From Figure 1(a), we can observe that the crime differences are highly related to . To be specific, (i) two consecutive time slots share similar crime amounts; (ii) with the increase of , the crime difference is likely to increase.
For regions in the city, if two regions are spatially close to each other, they are likely to have similar crime amounts in the same time slot. Given a pair of regions, we leverage as their spatial distance and use as their absolute crime difference. We show how changes with averaged over all time slots in Figure 1(b), where the x-axis is and the y-axis is . We note that (i) when two regions are spatially close, they have similar crime amounts and (ii) with the increase of distance , the crime difference tends to increase.
The above observations suggest the existence of temporal and spatial correlations for each type of urban crime. Note that Figure 1 illustrates the observations for grand larceny; we omit the other types of crime, which show similar patterns.
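The two analyses behind Figure 1 can be sketched in a few lines of NumPy. This is an illustrative sketch only: the array layout (`counts` with one row per region and one column per time slot), the use of planar region coordinates, and the function names are assumptions made here, not details given in the paper.

```python
import numpy as np

def mean_abs_diff_by_lag(counts, max_lag):
    """Mean |y_t - y_{t+lag}| over all regions and slots, for lag = 1..max_lag.

    counts: array of shape (num_regions, num_slots) for one crime type.
    """
    counts = np.asarray(counts, dtype=float)
    result = []
    for lag in range(1, max_lag + 1):
        diff = np.abs(counts[:, lag:] - counts[:, :-lag])
        result.append(diff.mean())
    return result

def mean_abs_diff_by_distance(counts, coords, bin_edges):
    """Mean absolute crime difference of region pairs, grouped by distance bin.

    coords: array of shape (num_regions, dim) with hypothetical region centroids.
    Returns {bin_index: mean difference}, bins as defined by np.digitize.
    """
    counts = np.asarray(counts, dtype=float)
    coords = np.asarray(coords, dtype=float)
    i, j = np.triu_indices(len(coords), k=1)            # every unordered region pair
    dist = np.linalg.norm(coords[i] - coords[j], axis=1)
    diff = np.abs(counts[i] - counts[j]).mean(axis=1)   # averaged over time slots
    which = np.digitize(dist, bin_edges)
    return {int(b): float(diff[which == b].mean()) for b in np.unique(which)}
```

If the smoothness assumptions hold, both outputs should grow with the lag and with the distance bin, which is the pattern reported for Figure 1.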
III-B Cross-Type Correlations
To investigate temporal correlations among different types of crime, we study how the crime amounts of each type change with the day of the year. The average daily crime amounts from 2006 to 2015 are shown in Figure 2, where the x-axis denotes the day of the year and the y-axis is the crime amount for each type of crime. From the figure, we observe temporal correlations between different types of crime. Specifically, the daily crime amounts of most types tend to increase from March to September and decrease from October to February. Furthermore, the crime amounts of some types such as Burglary, Grand Larceny, and Robbery tend to increase before Christmas, but decrease dramatically during Christmas and New Year.
To study the spatial correlations among different types of crime, we show how crimes were spatially distributed in New York City in 2012 in Figure 3. From Figure 3, we observe that (1) the majority of types concentrate in the Bronx, except for Grand Larceny; (2) Manhattan is also a hot district for some types, especially Grand Larceny and Burglary; and (3) Burglary, Rape, Robbery, and Vehicle Larceny share some hotspots in Brooklyn and Queens.
To study the correlations from both temporal and spatial perspectives, we first construct matrices , where each . Each element is the crime amount for type of crime in region of time slot. To study the spatio-temporal correlations between two types (e.g., the and type) of crime, we calculate a variant of cosine similarity between and as follows:
(1) 
where and is the Frobenius norm. The result is shown in Figure 4. We can observe that most types of crime are indeed correlated with each other. The weakest spatio-temporal correlation is between Grand Larceny and Murder, which is consistent with Figure 2 and Figure 3.
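A matrix-level cosine similarity normalized by Frobenius norms can be sketched as follows. The exact variant used in Eq. (1) is not recoverable from the text, so the formula below (the Frobenius inner product divided by the product of Frobenius norms) is an assumption, as is the function name.

```python
import numpy as np

def matrix_cosine_similarity(a, b):
    """Cosine similarity between two region-by-time crime matrices.

    Assumed form: <A, B>_F / (||A||_F * ||B||_F); the paper's variant may differ.
    """
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    denom = np.linalg.norm(a) * np.linalg.norm(b)  # Frobenius norm by default for 2-D input
    if denom == 0.0:
        return 0.0  # convention chosen here for all-zero matrices
    return float(np.sum(a * b) / denom)
```

Because crime counts are non-negative, this quantity lies in [0, 1], so a heat map over all pairs of crime types (as in Figure 4) can be read directly as a similarity scale.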
To sum up, we demonstrate the existence of temporal and spatial correlations among different types of crime. These observations provide the groundwork for leveraging cross-type correlations for accurate crime prediction.
IV The Proposed Crime Prediction Framework
In the previous section, we validated the correlations among different types of crime. In this section, we first present the basic model without cross-type and spatio-temporal correlations, and then detail how cross-type and spatio-temporal correlations are introduced into a coherent framework. Finally, we discuss the optimization process of the proposed framework and how to use the framework to perform crime prediction.
IV-A The Basic Model
Without considering cross-type and spatio-temporal correlations, we build a basic and individual model of crime type for region in time slot. Correspondingly, there is a weight vector for crime type of region in time slot, which can map to as: . All can be learned by solving the following regression problem:
(2) 
where the first term is the square loss function for the regression task in this work. Note that it is straightforward to leverage other loss functions such as logistic loss and hinge loss. We employ (controlled by a non-negative parameter ) to avoid overfitting. This basic and individual model completely neglects the correlations among different types of crime and the spatio-temporal correlations within each type of crime. In the following subsections, we discuss how to model cross-type correlations as well as spatio-temporal correlations based on this basic model.
IV-B Modeling Cross-Type Correlations
Our preliminary study in Section III-B validates the existence of correlations among different types of crime. In this subsection, we introduce the model component that captures cross-type correlations.
To exploit correlations of urban crimes, we first decompose the weight vector into the sum of two components , where we use to capture the common features shared by all crime types of region in time slot, while captures the specific features for crime type. For instance, some common features lead to the concentration of most crime types in the Bronx, while some specific features cause Grand Larceny to concentrate in Manhattan. We will leverage different regularization terms on and to exploit different correlations.
can represent the crime type, which paves the way to capturing cross-type correlations. We first combine all the type-specific weight vectors into a weight matrix, i.e., . Then, adopting the task relationship regularization component in [20], the relationships among can be modeled as follows:
(3)  
where is the crime type covariance matrix of region in time slot to learn and is a non-negative parameter to control the contribution of cross-type correlations. Since is a covariance matrix, the matrix should be positive semidefinite (or ). We introduce this regularization component to capture the correlations among different types of crime of region in time slot based on and .
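A task-relationship regularizer of the kind described here is commonly written as the trace of Q Ω⁻¹ Qᵀ, with Ω constrained to be positive semidefinite with unit trace. The sketch below follows that common multi-task formulation and its well-known closed-form minimizer for Ω; whether Eq. (3) and Eq. (13) take exactly this form cannot be confirmed from the text, so the symbols `q` and `omega`, the unit-trace constraint, and both function names are assumptions.

```python
import numpy as np

def relationship_penalty(q, omega):
    """tr(Q Omega^{-1} Q^T) for Q of shape (num_features, num_types)."""
    q = np.asarray(q, dtype=float)
    return float(np.trace(q @ np.linalg.solve(omega, q.T)))

def update_omega(q, eps=1e-8):
    """Closed-form minimizer of the penalty over PSD matrices with unit trace:
    Omega = (Q^T Q)^{1/2} / tr((Q^T Q)^{1/2}).
    eps is a small ridge (an assumption here) that keeps Omega invertible.
    """
    q = np.asarray(q, dtype=float)
    gram = q.T @ q                                   # (num_types, num_types), PSD
    vals, vecs = np.linalg.eigh(gram)
    root = (vecs * np.sqrt(np.clip(vals, 0.0, None))) @ vecs.T
    root += eps * np.eye(root.shape[0])
    return root / np.trace(root)
```

Starting from `omega = np.eye(K) / K` corresponds to treating all K crime types as unrelated; off-diagonal entries of the updated matrix then indicate which types have learned similar type-specific weights.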
IV-C Modeling Intra-Region Temporal Correlation
Crime within a region is observed to follow the intra-region temporal correlation described in Section III-A: (1) two consecutive time slots tend to share similar crime amounts; and (2) as the distance between two time slots increases, the difference in crime amounts is likely to increase. Inspired by this observation, we propose a temporal regularization component to model the temporal correlations of crime amounts within each region.
To be specific, considering the smooth evolution of crime amounts, the weight vectors should also change smoothly. Therefore, we adopt a series of discrete weight vectors over time to represent the temporal dynamics of crime amounts, and we add a temporal regularization component to the basic model as follows:
(4) 
where the non-negative parameter is introduced to control the contribution of intra-region temporal correlation from the temporal regularization component. The first term pushes closer to , i.e., the weight vector for common features shared by all crime types of region changes smoothly over time, while the second term captures the smooth evolution of the weight vector for each specific crime type within a region. Note that we define as in this work, which makes it possible to encourage weight vectors of two consecutive time slots to be exactly the same. We do not use norm since it is likely to cause “wiggly” cost dynamics, which are not robust to noise and may hurt generalization performance [21]. Eq. (4) can be rewritten as:
(5) 
where and . is a sparse matrix. More specifically, for , and all other entries are 0.
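The matrix rewriting of the temporal regularizer can be illustrated with a first-difference matrix. The sketch assumes that the weight vectors of one region are stacked as columns of a matrix, that the sparse matrix has +1 on entry (t, t) and -1 on entry (t+1, t), and that the norm is the elementwise ℓ1 norm (as the later soft-thresholding step suggests); the names `temporal_difference_matrix` and `temporal_penalty` are hypothetical.

```python
import numpy as np

def temporal_difference_matrix(num_slots):
    """Sparse-structured matrix H of shape (T, T-1): H[t, t] = 1, H[t+1, t] = -1."""
    h = np.zeros((num_slots, num_slots - 1))
    idx = np.arange(num_slots - 1)
    h[idx, idx] = 1.0
    h[idx + 1, idx] = -1.0
    return h

def temporal_penalty(weights, lam=1.0):
    """lam * ||W H||_1, where column t of W is the weight vector of time slot t.

    Column t of W @ H equals w_t - w_{t+1}, so the penalty sums the l1 gaps
    between consecutive time slots.
    """
    weights = np.asarray(weights, dtype=float)
    h = temporal_difference_matrix(weights.shape[1])
    return lam * float(np.abs(weights @ h).sum())
```

A dense array is used here for readability; for long time series, a `scipy.sparse` matrix would be the natural replacement since each column has only two non-zero entries.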
IV-D Modeling Inter-Region Spatial Correlation
As mentioned in Section III-A, aside from intra-region temporal correlation, the crime amounts across all regions follow inter-region spatial correlation: (1) two spatially close regions tend to have similar crime amounts; and (2) as the geographical distance between two regions in a city increases, the crime difference between these two regions in a given time slot is likely to increase. This observation inspires us to develop a spatial regularization component to capture the spatial correlation of crime amounts across regions in a city.
Specifically, we choose to minimize the following spatial component to capture interregion spatial correlation:
(6) 
where is the spatial distance between and region. is a power law exponential function, which is non-increasing in terms of , where is the parameter controlling the degree of spatial correlations. Thus, when and regions are closer (i.e., is smaller), becomes larger, which forces the weight vectors of the two regions to be closer. A similar analysis applies when the distance between and is larger.
Similar to intra-region temporal correlation, the first term pushes and to be closer, which means that the weight vectors for common features of all types of crime in and region are similar if the regions are spatially close to each other. The second term captures the proximity across regions for each type of crime. This spatial component encodes Tobler’s first law of geography [22] and acts as a soft constraint that spatially close regions tend to have similar weight vectors. We can rewrite Eq. (6) as:
(7) 
where and . is a sparse matrix. To be specific, we have and for and , while all other entries are 0.
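The spatial regularizer can be sketched in the same spirit. The code below assumes a distance weight of the form d^(-alpha) (one common reading of a "power law" decay), Euclidean distances between hypothetical region centroids, and an ℓ1 gap between the weight vectors of each region pair; none of these specifics is confirmed by the text.

```python
import numpy as np

def spatial_penalty(weights, coords, alpha=1.0, lam=1.0):
    """lam * sum over region pairs of d_ij^(-alpha) * ||w_i - w_j||_1.

    weights: (num_regions, num_features) weight vectors for one time slot.
    coords:  (num_regions, dim) region centroids (assumed distinct).
    """
    weights = np.asarray(weights, dtype=float)
    coords = np.asarray(coords, dtype=float)
    i, j = np.triu_indices(len(coords), k=1)          # each unordered pair once
    dist = np.linalg.norm(coords[i] - coords[j], axis=1)
    closeness = dist ** (-alpha)                      # larger for nearby regions
    l1_gap = np.abs(weights[i] - weights[j]).sum(axis=1)
    return lam * float(np.sum(closeness * l1_gap))
```

With this weighting, the same disagreement between two weight vectors costs more when the regions are neighbors than when they are far apart, which is the soft constraint the text describes.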
IV-E An Optimization Method
With the aforementioned components capturing cross-type and spatio-temporal correlations, the proposed framework solves the following optimization task:
(8)  
where the first term is the basic regression model, the second term captures cross-type correlations, the third term models intra-region temporal correlations, and the last term captures inter-region spatial correlations. Figure 5 illustrates the proposed framework with two types of crime, where orange arrows denote cross-type correlations between the two types of crime, green arrows denote temporal correlations, and blue arrows denote spatial correlations.
In this work, we leverage the ADMM technique [23] to optimize the objective function in Eq. (8). We first suppose , , and , where , , and are auxiliary variable matrices in ADMM. Then the objective function becomes:
(9)  
Then the scaled form of the ADMM formulation of Eq. (9) can be written as:
(10)  
where is the Frobenius norm of a matrix. We introduce scaled dual variable matrices , , and of ADMM. The penalty for the violation of the equality constraints , , , is controlled by a non-negative parameter . According to the ADMM technique, each optimization iteration of Eq. (10) consists of the following steps:
(11)  
(12)  
(13)  
(14)  
(15)  
(16)  
(17)  
(18)  
(19)  
(20)  
(21) 
where is the learning rate of gradient descent. The derivative of with respect to is:
(22)  
where is the row of , is the row of . The derivative of with respect to is:
(23)  
where is the column of . The soft thresholding operator is defined as follows:
(24) 
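The soft-thresholding operator has a standard elementwise form, S_κ(a) = sign(a) · max(|a| − κ, 0), which is what the sketch below implements. The symbol names in Eq. (24) are not recoverable from the text, so `values` and `kappa` are names chosen here.

```python
import numpy as np

def soft_threshold(values, kappa):
    """Elementwise soft thresholding: shrink each entry toward zero by kappa,
    and set entries with magnitude at most kappa to exactly zero."""
    values = np.asarray(values, dtype=float)
    return np.sign(values) * np.maximum(np.abs(values) - kappa, 0.0)
```

Entries that fall inside the threshold become exactly zero, which is how an ℓ1 penalty on differences of weight vectors can make neighboring weight vectors identical rather than merely close.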
The details of the ADMM optimization procedure are shown in Algorithm 1. We first initialize weight matrices and , auxiliary variable matrices , , and , and scaled dual variable matrices , , and randomly (line 1). Note that we initialize according to the assumption that all types of crime are unrelated initially. In each iteration of ADMM, we first use gradient descent with the gradients in Eq. (22) and Eq. (23) to update the current and (lines 4 and 6). Note that all and are fixed. Then we update according to Eq. (13) in line 8. Next we update , , , , , , and using the aforementioned update rules from line 10 to line 25. When the ADMM optimization approaches convergence, Algorithm 1 outputs the well-trained weight vectors and , for respectively.
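To make the structure of a scaled-form ADMM iteration concrete, the following toy example applies it to a much smaller problem than Algorithm 1: smoothing a single sequence y by minimizing 0.5‖w − y‖² + λ‖Dw‖₁, where D takes first differences. It is not the paper's algorithm. Here the primal update has a closed form, whereas the paper uses gradient descent, and all names (`admm_fused_signal`, `rho`, `num_iters`) are assumptions. What carries over is the three-step pattern: primal update, soft-thresholded auxiliary update, and scaled dual update.

```python
import numpy as np

def soft_threshold(values, kappa):
    return np.sign(values) * np.maximum(np.abs(values) - kappa, 0.0)

def admm_fused_signal(y, lam, rho=1.0, num_iters=300):
    """Scaled-form ADMM for min_w 0.5*||w - y||^2 + lam*||D w||_1, with z = D w."""
    y = np.asarray(y, dtype=float)
    n = len(y)
    d = np.diff(np.eye(n), axis=0)        # (n-1, n): row i is e_{i+1} - e_i
    z = np.zeros(n - 1)                   # auxiliary variable standing in for D w
    u = np.zeros(n - 1)                   # scaled dual variable
    lhs = np.eye(n) + rho * d.T @ d
    w = y.copy()
    for _ in range(num_iters):
        # Primal step: minimize 0.5*||w - y||^2 + (rho/2)*||D w - z + u||^2.
        w = np.linalg.solve(lhs, y + rho * d.T @ (z - u))
        # Auxiliary step: the l1 term reduces to soft thresholding.
        z = soft_threshold(d @ w + u, lam / rho)
        # Dual step: accumulate the violation of the constraint z = D w.
        u = u + d @ w - z
    return w
```

With `lam=0` the output reproduces `y`, and with a very large `lam` every entry collapses to the mean of `y`; intermediate values give piecewise-constant fits, mirroring how the temporal and spatial ℓ1 terms tie neighboring weight vectors together.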
Next we discuss the computational cost of Algorithm 1. In each iteration of ADMM, calculating according to Eq. (23) is the most time-consuming step. First we consider the time complexity of the first term in Eq. (23), in which and can be computed in , then subtracting and multiplying can be computed in , so the time complexity of the first term is . The second term can be computed in . For the third term, since the matrix representation of is very sparse, i.e., each row or column of has at most two non-zero elements, its time complexity is . The multiplication by can then be computed in , so the time complexity of the third term is . Similarly, the last term can be computed in . Therefore, considering that there are regions, time slots, and types of crime, the computational cost of each ADMM iteration is .
IV-F Crime Prediction Task
Once ADMM converges, Algorithm 1 outputs the well-trained weight vectors and , for all . In this subsection, we introduce how to perform crime prediction for a future time slot (i.e., time slot) based on all and .
As mentioned in Section II, we actually construct the feature vector using data in time slot rather than time slot of region. Thus, for the time slot, we can construct based on data in time slot. Therefore, in order to predict the crime amount for type of crime in region of time slot, we need the mapping vectors and . To sum up, the problem becomes estimating and based on and .
The mapping vectors and should be related to those of previous time slots according to intra-region temporal correlation. Therefore, we assume that is the weighted sum of its previous time slots as:
(25)  
where should be a non-increasing function of , i.e., should be larger when is smaller, since should be more closely related to its most recent time slots. In this work, we use a power law exponential function of , where is introduced to control the contributions from . Note that when , contributes equally to . We propose to automatically estimate the optimal from the training data by solving the following optimization problem:
(26) 
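The extrapolation step can be sketched as a power-law weighted combination of the most recent weight vectors. Several details below are assumptions rather than the paper's definitions: the decay is taken to be j^(-beta) for a slot j steps in the past, the coefficients are normalized to sum to one, and the decay parameter is chosen by a simple grid search that scores extrapolated weight vectors against held-back ones (a stand-in for Eq. (26), which is fitted on crime amounts).

```python
import numpy as np

def extrapolate_weights(past_weights, beta):
    """Weighted sum of past weight vectors, ordered oldest to newest.

    The newest row has lag 1 and therefore the largest coefficient;
    beta = 0 weights all rows equally.
    """
    past_weights = np.asarray(past_weights, dtype=float)
    n = len(past_weights)
    lags = np.arange(n, 0, -1).astype(float)   # oldest row gets lag n
    coeffs = lags ** (-beta)
    coeffs /= coeffs.sum()                     # normalization is an assumption
    return coeffs @ past_weights

def fit_beta(history, n, candidates):
    """Pick the candidate beta whose n-slot extrapolation best matches
    each later row of history (squared error)."""
    history = np.asarray(history, dtype=float)

    def error(beta):
        return sum(
            np.sum((extrapolate_weights(history[t - n:t], beta) - history[t]) ** 2)
            for t in range(n, len(history))
        )

    return min(candidates, key=error)
```

The predicted crime amount for the next slot would then be the inner product of the extrapolated weight vector with the feature vector built from the most recent available data.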
Figure 6 illustrates how we learn for type of crime in region, where we use . In this example, we aim to predict the crime amount in time slot based on the training data of previous time slots and the well-trained weight vectors from Algorithm 1. We use previous time slots to predict as . By solving Eq. (26), we can estimate based on samples, i.e., , and