Perfecting the Crime Machine

01/14/2020 ∙ by Yigit Alparslan, et al.

This study explores using different machine learning techniques and workflows to predict crime-related statistics, specifically crime type in Philadelphia. We use crime location and time as the main features, extract additional features from these two raw features, and build models that work with a large number of class labels. We extract features with several techniques, including unsupervised learning, and use them to predict the crime type. The models we use include Support Vector Machines, Decision Trees, Random Forest, and K-Nearest Neighbors. We find that Random Forest is the best performing model for predicting crime type, with a log loss of 2.3120.


I Introduction

Crime is a problem that our society faces every day. Although there are various reasons behind it, many crimes can be attributed to socio-economic causes. Urban areas and cities have also been shown to have a higher density of crime[1]. Crime also depends on factors such as education, culture, the economic level of neighborhoods, and unemployment. There is a strong push towards using machine learning models to produce crime statistics and predictions: to explain why crimes occur, when they will occur, and to whom[2][3][4][5][6][7]. One reason we chose to work with crime data was individual incidents on the Drexel University campus, such as a rape incident around September 2019 that caused widespread backlash in the Drexel University and Philadelphia communities over why Public Safety had not taken enough precautions. Philadelphia, one of the six most populous cities in the United States and our home, appeals to us as a city we can study: we wanted to see whether building predictive models could reveal underlying reasons for crime, and whether those reasons could be found systematically with robust workflows. The workflow we follow in this study consists of feature extraction, model selection, parameter tuning for those models, and feature selection.

II Related Work

There is a strong push towards building predictive models to fight crime. A technique widely used in this field is to examine how densely crime points are distributed on a map. Although changes in crime depend on many interacting factors, the existence of crime-dense areas has been shown to indicate where future crime will occur, and it is widely accepted as such an indicator. In this study, we aim to differentiate our work through the following approaches:

  1. Work with a very large number of classes (30 labels)

  2. Create features that do not depend on the city

  3. Find the optimal number of clusters in a data set

  4. Compute cluster centers and use the distance to them as a feature in our predictive models

  5. Combine the aspects above in different supervised learning models, in the hope of increasing model accuracy

Researchers have studied crime from both a time and a location perspective[8]. The time perspective is the predictive aspect of crime: one can overlay a grid on a city, count the crime points in each grid cell, and pose the problem as a regression over a time series[9]. The other perspective uses location. This may sound similar to the time perspective, but the difference lies in the fact that crime locations barely change over short periods. So, if one were to study the crime-dense neighborhoods of Philadelphia over a decade and then predict the crime-dense neighborhoods for the next year or month, one potential solution would be to flag the existing crime-dense areas and predict them as the future crime-dense areas. The literature calls such an area a crime hot spot. There are mathematical models that label an area as a crime hot spot or not based on Euclidean distance, for example through kernel functions.

Even though the current literature is built on top of these approaches, we wanted to remove one of its assumptions, even if this meant deviating from the established approaches. For this reason, there is only narrow common ground on which our findings can be compared with prior work. Removing the assumption meant creating models that do not depend on the city. For example, as we have seen with the time perspective, a model that poses the crime problem as a regression needs crime counts over time. To get these counts, most researchers had to overlay a grid on the city, count the crimes in each grid cell, and sum them over different periods of time[10][11]. One can then run a regression on each grid cell and, for example, predict the crime count expected in that cell in the future. One difficulty with this approach is that data scientists or researchers have to find a way to divide the city into grid cells, which can be a problem since not every major city is square or has a shape that can be treated as a rectangle. To mitigate this, we propose creating clusters in our data and using the clusters, instead of a grid, as the units in which crime is counted. More specifically, we create clusters from the crime points of each year and then stack the cluster centers of all years on top of each other to remove the time element. Because this constraint is removed, we can still get information from the time dimension of our data without explicitly using it in our models. Creating clusters removes the dependency on overlaying a grid on the city, and therefore removes most of the preprocessing a data scientist has to do to work with crime data.

III Problem Statement

Problems that we are tackling are as follows:

  1. Can we predict crime type given location and time?

  2. Can we predict accurately when the number of classes is very large?

  3. Does incorporating features from unsupervised learning techniques improve our supervised models for predicting crime type?

  4. Can we develop a systematic workflow that combines supervised and unsupervised learning techniques for the crime data set we work with?

To answer these questions, we investigate creating clusters and looking at different supervised machine learning models.

IV Basic Approach

Our approach includes the following phases:

  1. Data preprocessing and feature extraction

  2. Finding the optimal number of clusters in our data set

  3. Creating clusters for each year and stacking the cluster centers

  4. Calculating the Euclidean distance from each crime point to the cluster centers

  5. Adding the distance features to the previous features and training different models, including K-Nearest Neighbors, Logistic Regression, Decision Tree, Random Forest, and Multi Layer Perceptron

Fig. 1: A snapshot of our workflow. First, we plot the crime points; then we find the optimal number of clusters; then we compute the Euclidean distance from each crime point to its closest cluster center, which is used as one of the features of our supervised learning models.

Our data set has two raw features: time and location. From these, we generate the following features through data processing:

  1. Hour

  2. Month

  3. Year

  4. DayOfWeek

  5. Is_Weekend

  6. X

  7. Y

  8. Is_Intersection

  9. Is_Block

  10. Police District

  11. Street_Type (St, Blv, Ave, etc.)
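The time- and address-derived features in the list above can be sketched with pandas as follows. This is a minimal sketch, not the authors' code: it assumes the `Date` format shown in Table I and a hypothetical `Address` column holding strings such as `1500 BLOCK MARKET ST` or `BROAD ST & MARKET ST`; the real address field and its format may differ. Police District is assumed to be available as its own column and is not derived here.

```python
import pandas as pd


def time_and_address_features(df: pd.DataFrame) -> pd.DataFrame:
    """Derive time and street features from a raw crime table.

    Assumes a 'Date' column formatted like '4/3/2009 8:46' and a hypothetical
    'Address' column such as '1500 BLOCK MARKET ST' or 'BROAD ST & MARKET ST'.
    """
    out = pd.DataFrame(index=df.index)
    t = pd.to_datetime(df["Date"], format="%m/%d/%Y %H:%M")
    out["Hour"] = t.dt.hour
    out["Month"] = t.dt.month
    out["Year"] = t.dt.year
    out["DayOfWeek"] = t.dt.dayofweek  # Monday = 0, ..., Sunday = 6
    out["Is_Weekend"] = (out["DayOfWeek"] >= 5).astype(int)

    addr = df["Address"].str.upper()
    # Assumption: an '&' marks an intersection of two streets.
    out["Is_Intersection"] = addr.str.contains("&", regex=False).astype(int)
    # Assumption: ' BLOCK ' marks a block-level address.
    out["Is_Block"] = addr.str.contains(" BLOCK ", regex=False).astype(int)
    # Street type = last token of the first street name, e.g. 'ST', 'AVE'.
    first_street = addr.str.split("&").str[0].str.strip()
    out["Street_Type"] = first_street.str.split().str[-1]
    return out
```

For example, `4/3/2009 8:46` yields Hour 8 and DayOfWeek 4 (a Friday), so Is_Weekend is 0.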

X           Y          Date             Description
-75.174324  39.986978  4/3/2009 8:46    Other Assaults
-75.238710  39.953566  2/2/2008 7:56    Robbery Firearm
-75.069437  40.034939  4/8/2007 2:54    Driving Under Influence
-75.113286  39.996494  5/19/2006 11:37  Thefts
-75.065362  40.046056  7/26/2006 13:35  Other Assaults
TABLE I: Raw features in our data. We extracted further features from these raw ones: location let us use Cartesian and polar coordinates together, and time let us use day, hour, minute, a weekend indicator, and day of week.
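The caption above mentions combining Cartesian and polar coordinates, and Table IV lists features named Radius, Angle, and Rot30X through Rot60Y. The sketch below shows one plausible way to compute them; centering on the mean location and the rotation convention are our assumptions, not details taken from the paper.

```python
import numpy as np
import pandas as pd


def coordinate_features(x, y, angles_deg=(30, 45, 60)) -> pd.DataFrame:
    """Polar and rotated-axis features from longitude (x) and latitude (y).

    Assumption: coordinates are centred on their mean before transforming.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    xc, yc = x - x.mean(), y - y.mean()
    feats = {
        "X": x,
        "Y": y,
        "Radius": np.hypot(xc, yc),   # distance from the mean location
        "Angle": np.arctan2(yc, xc),  # polar angle in radians, in (-pi, pi]
    }
    for a in angles_deg:
        t = np.deg2rad(a)
        # Coordinates of each point in axes rotated counter-clockwise by a degrees.
        feats[f"Rot{a}X"] = xc * np.cos(t) + yc * np.sin(t)
        feats[f"Rot{a}Y"] = -xc * np.sin(t) + yc * np.cos(t)
    return pd.DataFrame(feats)
```

Rotated axes let a tree-based model place splits along diagonal directions, which axis-aligned splits on raw X and Y cannot express in one step.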

V Data

The data set used in this study has about 1.3 million samples and was collected by the Open Data Philly City Council Organization[12]. For our supervised learning models, we used an 80/20 training/testing split, which gave about 838,860 samples for training and 262,030 for testing. This equates to using the first 9 years, beginning in 2006, as training data and the remaining year as the testing data set; the data set spans the years 2006 to 2015. Some rows had missing values, which required data pre-processing. Pre-processing is essential to improve data quality. Common pre-processing techniques include data cleaning, feature selection, outlier detection, and component reduction and transformation, and some pre-processing is usually performed on a data set before a classification algorithm is applied.

The features are the location and time of each crime point. The time of a crime point is the dispatch time recorded by the operator at the 911 call center, so it is precise to the minute. The location of a crime point is its X and Y coordinates. Latitude measures the angular distance of a point north or south of the equator. Longitude is the angular measure east or west of the Prime Meridian, which has a longitude of 0 since it is the origin of that measure. Latitude values change along the vertical (Y) axis, and longitude values change along the horizontal (X) axis. Philadelphia's latitude range is slightly greater than its longitude range, which might make the Y feature more important; our feature selection analysis supports this.

More importantly, other city councils that gather the same type of crime data will find that crime analysis is tightly connected to the city's structure and the distribution of its neighborhoods. For this reason, data scientists and researchers usually need prior knowledge of the city's physical layout. In our study, we tried to remove assumptions about the city's land area and focused on features that generalize well, such as the dispatch time of a crime and the angular position of the crime point on Earth, i.e., its longitude and latitude.
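A split by whole years, as described above, can be sketched as follows; holding out the final year(s) rather than shuffling rows keeps future crimes out of the training set. Column names follow Table I, `test_years` is a hypothetical parameter, and the sketch drops every row with any missing value, which may be stricter than the authors' pre-processing.

```python
import pandas as pd


def chronological_split(df: pd.DataFrame, test_years, date_col: str = "Date"):
    """Drop rows with missing values, then hold out whole calendar years for testing."""
    clean = df.dropna()
    years = pd.to_datetime(clean[date_col], format="%m/%d/%Y %H:%M").dt.year
    is_test = years.isin(list(test_years))
    return clean[~is_test], clean[is_test]
```

For the setting described above, `chronological_split(df, [2015])` would train on 2006 to 2014 and test on 2015.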

We also look at individual classes to find underlying patterns. One pattern we observe is that there is less crime during the cold seasons than during the warm seasons of a year. Interestingly, there is a dip in the number of crimes in 2009. One might have expected the opposite: during the recession, when people panicked, one would have expected more crime because people were more desperate. Around 6am is the safest hour of the day, presumably because most criminals are asleep. There is also a peak in the crime count around lunch time: since Philadelphia is a densely populated urban city, more crimes occur at lunch time than in the morning or shortly after lunch. Overall, the crime count peaks in the evening between 8pm and 10pm and stays very high until 1am.
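The hourly, monthly, and yearly trends discussed here and in the figures below come from tables of incident counts. A minimal pandas sketch that builds such a table (column names follow Table I; plotting is omitted):

```python
import pandas as pd


def crime_counts_by(df: pd.DataFrame, unit: str, date_col: str = "Date",
                    label_col: str = "Description") -> pd.DataFrame:
    """Count crimes per time unit ('hour', 'month' or 'year') and crime type."""
    t = pd.to_datetime(df[date_col], format="%m/%d/%Y %H:%M")
    key = getattr(t.dt, unit).rename(unit)
    # Rows: time unit; columns: crime type; cells: number of incidents.
    return pd.crosstab(key, df[label_col])
```

Calling `crime_counts_by(df, "hour").plot()` would then draw one line per crime type over the hours of the day.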

Fig. 2: Crime points plotted for all years on Philadelphia map. Python’s sci-kit library was used to plot the points
Fig. 3: Crime counts for entire data set with crime labels. All the class labels can be seen on the horizontal axis. The vertical axis shows the running sum of those counts as percentage as well as absolute counts.
Fig. 4: All crimes in the data set over years (i), months (ii), and hours (iii). Plotting the crime counts over years, months, and hours reveals some of the underlying trends: colder months and early-morning hours have lower crime counts.
Fig. 5: Distribution of crime counts over all class labels. The crime counts are heavily unbalanced towards the 'other' class label, which suggests that our models might be skewed.

Fig. 6: Total crimes by Philadelphia police district. There are about 30 police districts in Philadelphia, and the districts in Center City, Northeast Philadelphia, and West Philadelphia see more crime than the other districts.

Next, we look at specific crime types and aggregate them over hours, months, and years. Some crime types, such as prostitution and sex offenses, occur very frequently at night, while others, such as thefts and vandalism, occur at a stable rate throughout the day. Driving under the influence occurs at a very high rate between 10pm and 2am, a natural time window for drivers leaving parties after drinking.

When we change the time scope from hours to months, we see fewer crime incidents during the cold months, while the warm months of spring and summer see an increase in certain crime types such as thefts and prostitution.

When we aggregate crime types over years, the trends become significantly harder to see, but some general trends remain. Some crime types, such as vandalism, occur less often in recent years, while others, such as thefts, increase: we see no decline in the number of theft incidents.

Fig. 7: Rape (i) and Loitering (ii) crime counts over a day. There are peaks at lunchtime and in the evening hours.
Fig. 8: Rape (i) and Loitering (ii) crime counts over months. The colder months witness less crime.
Fig. 9: Rape (i) and Loitering (ii) crime counts over years. On a yearly scale the underlying pattern is hard to see, but loitering decreases significantly in recent years.
Fig. 10: Other sex offenses (not commercialized) (i) and prostitution (ii) crime counts aggregated over a day. Prostitution increases in the lunch time window.
Fig. 11: Other sex offenses (not commercialized) (i) and prostitution (ii) crime counts aggregated over months. There is less crime in fall and winter.
Fig. 12: Other sex offenses (not commercialized) (i) and prostitution (ii) crime counts aggregated over a decade. The underlying patterns are hard to see at this zoomed-out scope, but there is a general decline in the crime count for prostitution.
Fig. 13: All other offenses (i) and other assaults (ii) crime counts aggregated over a day. There are more crime incidents in the evening hours.
Fig. 14: All other offenses (i) and other assaults (ii) crime counts aggregated over months. There are fewer crime incidents in cold months, such as in winter and fall.
Fig. 15: All other offenses (i) and other assaults (ii) crime counts aggregated over years. The crime counts decline in recent years.
Fig. 16: Residential burglary (i) and aggravated assaults with no firearm (ii) crime counts aggregated over hours. There is a large jump in the evening hours.
Fig. 17: Residential burglary (i) and aggravated assaults with no firearm (ii) crime counts aggregated over months. There are fewer crime incidents in cold months, such as in winter and fall.
Fig. 18: Residential burglary (i) and aggravated assaults with no firearm (ii) crime counts aggregated over years. There are fewer crime incidents in recent years, although at a yearly scope the underlying patterns are hard to tell.
Fig. 19: Thefts (i) and Vandalism (ii) crime counts aggregated over hours. Theft occurs at all hours when people are awake, and vandalism peaks significantly during daylight.
Fig. 20: Thefts (i) and Vandalism (ii) crime counts aggregated over months. Cold months witness fewer crime incidents.
Fig. 21: Thefts (i) and Vandalism (ii) crime counts aggregated over years. The yearly scale makes the underlying patterns hard to see, but vandalism occurs less in recent years, and thefts seem to be increasing.

VI Experiment Results, Analysis and Performance Evaluation

A quick summary of the results is as follows:

  1. Random Forest is most sensitive to the minute and the hour.

  2. Random Forest is the best performing model in terms of log loss, which aligns with the current literature.

  3. Support Vector Machines over 30 labels fail to run to completion on the Google Cloud Compute Engine service.

  4. The optimal number of clusters for all years combined is 7, but when each year is treated as a separate data set, the optimal number of clusters varies between 7 and 10.

  5. Naive Bayes works comparatively well with 30 class labels, achieving around 27% mean accuracy, compared to 5% for logistic regression and 19% for K-Nearest Neighbors.

We now discuss these results in detail.

VI-A Unsupervised Learning

Unsupervised learning techniques are methods that are systematically applied to an unlabeled data set to uncover its underlying structure. The technique we use in this study is the K-Means clustering algorithm. An important aspect of this method is finding the optimal number of clusters in the data set. For this, we employed two different methods: the elbow method and the gap statistic. The elbow method can be applied as follows:

  For each k value, we initialized k-means and used the inertia attribute to obtain the sum of squared distances of the samples to their nearest cluster center.
  As k increases, the sum of squared distances tends to zero: if k is set to its maximum value n (the number of samples), each sample forms its own cluster and the sum of squared distances equals zero.
  If the plot of this sum against k looks like an arm, the elbow of the arm is the optimal k.
Algorithm 1 Calculate optimal number of clusters - Elbow Method
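A minimal sketch of Algorithm 1 with scikit-learn's `KMeans`, whose `inertia_` attribute is the sum of squared distances described above. The elbow is normally read off the plot by eye; the `pick_elbow` rule below, which takes the k with the largest second difference of the inertia curve, is our own illustrative heuristic rather than the authors' procedure.

```python
import numpy as np
from sklearn.cluster import KMeans


def elbow_inertias(X, k_max=10, seed=0):
    """K-Means inertia (sum of squared distances to the nearest center) for k = 1..k_max."""
    return np.array([
        KMeans(n_clusters=k, n_init=10, random_state=seed).fit(X).inertia_
        for k in range(1, k_max + 1)
    ])


def pick_elbow(inertias):
    """Illustrative heuristic: the k where the inertia curve bends most sharply,
    i.e. the largest second difference I(k-1) - 2 I(k) + I(k+1)."""
    second = inertias[:-2] - 2 * inertias[1:-1] + inertias[2:]
    return int(np.argmax(second)) + 2  # second[0] corresponds to k = 2
```

On about a million crime points, scikit-learn's `MiniBatchKMeans` could be substituted to keep each fit fast.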
  Cluster the observed data, varying the number of clusters from k = 1, …, kmax, and compute the corresponding total within-cluster variation $W_k$.

  Generate B reference data sets from a uniform random distribution.

  Cluster each of these reference data sets with a varying number of clusters k = 1, …, kmax, and compute the corresponding total within-cluster variation $W_{kb}$.

  Compute the estimated gap statistic as the deviation of the observed $\log W_k$ from its expected value under the null hypothesis, estimated from the reference data sets:

  $$\mathrm{Gap}(k) = \frac{1}{B}\sum_{b=1}^{B} \log W_{kb} - \log W_k$$

  Compute also the standard deviation of the statistic, $s_k = \mathrm{sd}_k \sqrt{1 + 1/B}$, where $\mathrm{sd}_k$ is the standard deviation of $\log W_{kb}$ over the B reference data sets.

  Choose the number of clusters as the smallest value of k such that the gap statistic is within one standard deviation of the gap at k+1: $\mathrm{Gap}(k) \ge \mathrm{Gap}(k+1) - s_{k+1}$.
Algorithm 2 Calculate optimal number of clusters - Gap Statistics
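Algorithm 2 can be sketched as follows, using K-Means inertia as $W_k$ and drawing the B reference sets (`n_refs`) uniformly over the bounding box of the data. Uniform sampling over each feature's range is one of the reference distributions proposed for the gap statistic; the seed handling and default sizes are our assumptions.

```python
import numpy as np
from sklearn.cluster import KMeans


def _log_wk(X, k, seed):
    """log of the total within-cluster sum of squares W_k (K-Means inertia)."""
    return np.log(KMeans(n_clusters=k, n_init=10, random_state=seed).fit(X).inertia_)


def gap_statistic(X, k_max=10, n_refs=10, seed=0):
    """Return (chosen k, gap values for k = 1..k_max) using the one-standard-error rule."""
    rng = np.random.default_rng(seed)
    lo, hi = X.min(axis=0), X.max(axis=0)
    gaps = np.zeros(k_max)
    s = np.zeros(k_max)
    for i, k in enumerate(range(1, k_max + 1)):
        # B uniform reference sets over the bounding box of the data.
        ref = np.array([_log_wk(rng.uniform(lo, hi, size=X.shape), k, seed)
                        for _ in range(n_refs)])
        gaps[i] = ref.mean() - _log_wk(X, k, seed)
        s[i] = ref.std() * np.sqrt(1 + 1 / n_refs)
    # Smallest k with Gap(k) >= Gap(k+1) - s_{k+1}.
    for i in range(k_max - 1):
        if gaps[i] >= gaps[i + 1] - s[i + 1]:
            return i + 1, gaps
    return k_max, gaps
```

The caption of Fig. 23 describes a simpler variant that takes the k with the highest gap value; `int(np.argmax(gaps)) + 1` gives that choice from the same `gaps` array.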
Fig. 22: Plot of all crimes for 2006-2018. The X and Y coordinates in a Cartesian coordinate system can be used to plot all crime points as a map, without explicit Philadelphia borders.
Fig. 23: Gap Values by Crime count over a range to determine the optimal cluster count. We use the Gap statistics over the range of 1 to 16 and pick the index of the highest gap value as the optimal cluster count. This method gives 7 as the optimal cluster number
Fig. 24: Elbow Method to determine the optimal cluster values. This method gives 3 as the optimal cluster count.

The gap statistic gives 7 as the optimal cluster number when applied to all years in our data set, while the elbow method gives 3. Because we have about 1 million data points and want to maximize the variance of the distances we calculate to the cluster centers, we go with k = 7. We apply the K-Means clustering algorithm and show the results in Figure 27.
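Steps 3 and 4 of our approach (per-year clusters, stacked centers, Euclidean distances) could look like the following sketch. It assumes NumPy arrays `xy` (X/Y coordinates) and `years`; returning both the nearest-center distance and the distances to every stacked center is an illustrative choice, since the text does not state exactly which distance columns were fed to the models.

```python
import numpy as np
from scipy.spatial.distance import cdist
from sklearn.cluster import KMeans


def stacked_year_centers(xy, years, k=7, seed=0):
    """Fit K-Means separately on each year's points and stack all centers."""
    centers = [
        KMeans(n_clusters=k, n_init=10, random_state=seed)
        .fit(xy[years == y]).cluster_centers_
        for y in np.unique(years)
    ]
    return np.vstack(centers)  # shape: (n_years * k, 2)


def distance_features(xy, centers):
    """Euclidean distance to the nearest center, followed by distances to all centers."""
    d = cdist(xy, centers, metric="euclidean")
    return np.column_stack([d.min(axis=1), d])
```

With about 1.3 million points and 70 stacked centers (10 years at k = 7), the full distance matrix holds roughly 91 million floats (about 730 MB), so processing the points in chunks would be prudent.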

Fig. 25: Average similarity distance when PCA is applied to all 25 adversarial examples with respect to percentage of components used to craft the adversarial effect during PCA

We plot only the cluster centers for each year and show the results in Figure 28. The cluster centers align along a west-to-northeast direction, indicating that crime points are concentrated around the West Philadelphia and Northeast Philadelphia areas. To see the effect of the cluster centers on the map, we apply a Gaussian density function and draw contours in which the height of the surface indicates the density of crime, read as where crime is likely to happen in the future. This closely resembles the approach described in our Related Work section, where we introduced the two perspectives the literature takes in this field: the time aspect and the location aspect. Northeast and West Philadelphia reach the tallest surface heights.
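The density surface can be sketched with SciPy's `gaussian_kde`, evaluated on a grid whose heights matplotlib's `contour` can draw. The text does not fully specify which points were passed to the density function (crime points or cluster centers), so the sketch takes any array of X/Y points; the grid resolution `n` is an arbitrary choice.

```python
import numpy as np
from scipy.stats import gaussian_kde


def density_surface(xy, n=100):
    """Evaluate a 2-D Gaussian KDE of the points on an n-by-n grid."""
    kde = gaussian_kde(xy.T)  # gaussian_kde expects shape (n_dims, n_points)
    xs = np.linspace(xy[:, 0].min(), xy[:, 0].max(), n)
    ys = np.linspace(xy[:, 1].min(), xy[:, 1].max(), n)
    gx, gy = np.meshgrid(xs, ys)
    z = kde(np.vstack([gx.ravel(), gy.ravel()])).reshape(gx.shape)
    return gx, gy, z  # e.g. plt.contour(gx, gy, z)
```

The bandwidth defaults to Scott's rule; `gaussian_kde(..., bw_method=...)` can widen or narrow the hot spots.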

Fig. 26: Average similarity distance when PCA is applied to all 25 adversarial examples with respect to percentage of components used to craft the adversarial effect during PCA
Fig. 27: Average similarity distance when PCA is applied to all 25 adversarial examples with respect to percentage of components used to craft the adversarial effect during PCA

VI-B Supervised Learning

Supervised learning methods employ training data, a cost function, and testing data: the training data is used to fit a model, and the testing data is used to report how well the fit generalizes. Out of the 1.3 million samples, we used about 838,860 for training and 262,030 for testing. This equates to using the first 9 years, beginning in 2006, as training data and the remaining years, 2015 through 2018, as the testing data set. Some rows had missing values, which we dropped during pre-processing. The class labels can be seen in Table II. The models that we employed to predict the crime types are as follows:

  1. K Nearest Neighbors

  2. Naive Bayesian Inference

  3. Decision Tree

  4. Random Forest

  5. Logistic Regression

  6. Support Vector Machine

  7. Multi Layer Perceptron

We chose KNN with 5 neighbors. KNN is a classifier that outputs the majority vote of the k nearest neighbors. Naive Bayes methods perform inference under the naive assumption that the features are conditionally independent given the class. We also used a Decision Tree with a confidence factor of 0.3. Decision Trees are supervised learning models that predict the value of the target variable by learning simple decision (splitting) rules from the data set. The Random Forest that we used had 10 trees. A random forest combines several decision trees fitted on sub-samples of the data set and uses averaging to improve predictive accuracy; since it uses several trees, it is also expected to generalize well and avoid overfitting. Logistic Regression is a supervised learning method that is well suited to binary classification and can also be used for multi-class problems. It uses the logistic function (or its multi-class generalization, the softmax function) to produce probability values over classes, which are then used to predict classes. Support Vector Machines (SVMs) are supervised learning models that generalize well from a limited number of training patterns. Our SVM uses a linear kernel, i.e., it tries to separate the crime classes with hyperplanes in the feature space. Multi Layer Perceptrons are layered neural networks whose hidden layers let them learn nonlinear decision boundaries. We employed one hidden layer of 150 neurons.
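The model line-up above can be sketched with scikit-learn as follows. Hyperparameters follow the text where it gives them (5 neighbors, 10 trees, one hidden layer of 150 neurons); everything else is a scikit-learn default or our assumption. In particular, scikit-learn's `DecisionTreeClassifier` has no confidence-factor parameter (that setting comes from C4.5-style trees), so defaults are used, and `GaussianNB` is assumed as the Naive Bayes variant. The SVM is omitted, as in Table V; a linear `SVC(kernel="linear", probability=True)` would be needed to obtain the probabilities that log loss requires.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, log_loss
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.tree import DecisionTreeClassifier


def build_models(seed=0):
    return {
        "K Neighbors": KNeighborsClassifier(n_neighbors=5),
        "Naive Bayes": GaussianNB(),
        "Decision Tree": DecisionTreeClassifier(random_state=seed),
        "Random Forest": RandomForestClassifier(n_estimators=10, random_state=seed),
        "Logistic Regression": LogisticRegression(max_iter=1000),
        "MLP": MLPClassifier(hidden_layer_sizes=(150,), max_iter=500, random_state=seed),
    }


def evaluate(models, X_train, y_train, X_test, y_test):
    """Fit each model and report (log loss, accuracy) on the test set."""
    results = {}
    for name, clf in models.items():
        clf.fit(X_train, y_train)
        proba = clf.predict_proba(X_test)
        # labels= keeps the columns aligned even if a class is absent from y_test.
        results[name] = (log_loss(y_test, proba, labels=clf.classes_),
                         accuracy_score(y_test, clf.predict(X_test)))
    return results
```

A fully grown decision tree outputs 0/1 probabilities, so a single confident mistake is heavily penalized by log loss even when accuracy is reasonable.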

During our study, we used a free Google Cloud Compute Engine service with 12 free hours of GPU access in order to take advantage of fast cloud computing; we refer the reader to its specifications[12]. Training the Support Vector Machine and Multi Layer Perceptron models timed out several times, so we do not report their results in this study.

Class Label Index  Class Label
0 Aggravated Assault Firearm
1 Aggravated Assault No Firearm
2 All Other Offenses
3 Arson
4 Burglary Non-Residential
5 Burglary Residential
6 Driving Under Influence
7 Disorderly Conduct
8 Embezzlement
9 Forgery and Counterfeiting
10 Fraud
11 Gambling Violations
12 Homicide - Criminal
13 Homicide - Gross Negligence
14 Homicide - Justifiable
15 Liquor Law Violations
16 Motor Vehicle Theft
17 Narcotic / Drug Law Violations
18 Offenses Against Family and Children
19 Other Assaults
20 Other Sex Offenses (Not Commercialized)
21 Prostitution and Commercialized Vice
22 Public Drunkenness
23 Rape
24 Receiving Stolen Property
25 Recovered Stolen Motor Vehicle
26 Robbery Firearm
27 Robbery No Firearm
28 Theft from Vehicle
29 Thefts
30 Vagrancy/Loitering
31 Vandalism/Criminal Mischief
32 Weapon Violations
TABLE II: Class Labels that were used for the supervised learning models. Labels were given an integer to make the plot and training easier.

All models were implemented with Python's scikit-learn library. For pre-processing, we first read the data from an Excel file and split it into two sets, one for training and one for testing, with 80% and 20% of the data respectively.

Fig. 28: All crime occurrences in the training data. Crime occurrences in the training data are seen over the class labels, which are encoded as integers. We see that some classes have much greater crime count compared to others.
Fig. 29: Histogram of different class labels and log loss values after predicting each label. Mean log loss gives us how well we are doing collectively and individually when we are predicting the crime types.

We also examine how many principal components are needed to retain most of the variance. This can help us eliminate components that add little or no information.

Fig. 30: Principal component analysis over the features. PCA ranks components by the variance they explain, so the components carrying the most information can be kept.

When we apply principal component analysis, we see that it decreases our model's performance. This can be attributed to the fact that we are working with a very small number of features: essential information is lost as soon as we start removing components.
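The number of components needed to keep a given share of the variance can be read off the cumulative explained-variance ratio. A minimal scikit-learn sketch; standardizing the features first and the 95% threshold are our assumptions.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler


def components_for_variance(X, threshold=0.95):
    """Smallest number of principal components whose cumulative explained variance >= threshold."""
    Xs = StandardScaler().fit_transform(X)  # PCA is sensitive to feature scale
    cum = np.cumsum(PCA().fit(Xs).explained_variance_ratio_)
    return int(np.searchsorted(cum, threshold) + 1), cum
```

When the answer is close to the total number of features, as it tends to be with few, weakly correlated features, PCA has little redundancy to remove.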

For each row, a uniform probability prediction (no machine learning required), in which each label has probability 1/34, would give a log loss score of:

$$-\log\frac{1}{34} = \log 34 \approx 3.526$$

By calculating the log loss score per label, we can therefore see for which labels we perform worse than this baseline.
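The per-label comparison against the uniform baseline can be sketched as follows. It assumes the columns of `proba` are ordered like the sorted `classes` array (as scikit-learn's `classes_` is) and clips probabilities at 1e-15 to avoid log(0); the baseline for n equally likely labels is log n.

```python
import numpy as np


def per_label_log_loss(y_true, proba, classes, eps=1e-15):
    """Mean log loss of the rows belonging to each true label."""
    y_true = np.asarray(y_true)
    classes = np.asarray(classes)
    col = np.searchsorted(classes, y_true)  # column of each row's true label
    p_true = np.clip(proba[np.arange(len(y_true)), col], eps, 1.0)
    losses = -np.log(p_true)
    return {c: float(losses[y_true == c].mean()) for c in classes if np.any(y_true == c)}


def labels_worse_than_uniform(per_label, n_classes):
    """Labels whose mean log loss exceeds the uniform-guess baseline log(n_classes)."""
    baseline = np.log(n_classes)
    return sorted(c for c, loss in per_label.items() if loss > baseline)
```

With 34 labels, `np.log(34)` is the 3.526 baseline given above.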

Fig. 31: Distribution of predictions over all classes. Our Random Forest is not doing well for classes with a very large number of samples compared to classes with fewer samples.
Fig. 32: Smoothing probability values over all class labels. We get the lowest score of: 2.281062, with smoothing parameter: 0.000170. The improvement is 0.169367%

As Figure 32 shows, we apply a smoothing parameter in order to improve the log loss of our models: we add a small value to all probability predictions so that no probability is exactly 0. After this addition, the probabilities in each row of the prediction matrix no longer sum to 1; this is not an issue, because the log loss function used in Python rescales each row back to a sum of 1. We get the lowest score, 2.281062, with smoothing parameter 0.000170, an improvement of 0.169367%.
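The smoothing search can be sketched as follows: add α to every predicted probability, renormalize each row explicitly instead of relying on the metric to do it, and keep the α with the lowest log loss. The candidate grid `alphas` is an assumption; the text reports only the best value found (0.000170).

```python
import numpy as np
from sklearn.metrics import log_loss


def smoothed_log_loss(y_true, proba, alpha, labels):
    """Log loss after adding alpha to every probability and renormalizing rows."""
    p = np.asarray(proba, dtype=float) + alpha
    p /= p.sum(axis=1, keepdims=True)
    return log_loss(y_true, p, labels=labels)


def best_smoothing(y_true, proba, labels, alphas):
    """Return (best alpha, its log loss) over a grid of candidate alphas."""
    scores = [smoothed_log_loss(y_true, proba, a, labels) for a in alphas]
    i = int(np.argmin(scores))
    return alphas[i], scores[i]
```

Smoothing leaves each row's most likely class unchanged, so it can lower log loss without changing accuracy.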

Label No of Mispredictions Mean Log Loss
0 4 1.654153
1 18 1.918882
2 159416 2.156223
4 344 1.337823
5 1663 2.427808
6 1510 2.124929
7 1409 1.636606
10 6561 2.401358
16 446 1.805052
17 7 2.574474
18 24767 2.382460
20 92597 2.577907
21 1 5.540110
22 952 1.709928
23 15 1.773899
26 3573 2.454954
27 9 2.280118
28 34 2.523617
29 34812 2.343397
30 49930 2.108511
31 386 1.826013
32 37763 2.546528
33 79 1.984121
TABLE III: Distribution of mispredictions over all classes and the mean log loss of the mispredictions
Rank Feature Weight
0 Hour Zone 0.091309
1 Hour 0.089656
2 Y 0.062149
3 Rot60X 0.060207
4 Radius 0.058731
5 Rot45X 0.057662
6 Angle 0.057590
7 X 22 0.05119
8 Rot30Y 0.056596
9 Rot30X 0.056554
10 Rot60Y 0.055187
11 Rot45Y 0.054602
12 Street1 0.0038214
13 Minute 0.032564
14 WeekOfYear 0.031246
15 Year 0.028269
16 Day 0.02737
17 DayOfWeekNum 0.019447
18 PdDistrictNum 0.017935
19 Month 0.017296
20 Street2 0.010806
21 Season 0.008322
22 IsWeekend 0.007317
23 IsIntersection 0.003908
24 StreetType 0.000005
25 IsBlock 0.00000
TABLE IV: Feature rankings and weights. A feature's contribution to the predictions tells us which features to focus on. In this ranking, X, Y, and other time and location features, such as the minute and hour, contributed the most to our predictions.

The feature rankings we computed are shown in Table IV.
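Rankings like Table IV can be produced from a fitted Random Forest. The sketch below assumes the weights are scikit-learn's impurity-based `feature_importances_` (the text does not say which importance measure was used) and reuses the 10-tree setting.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier


def rank_features(X, y, feature_names, n_estimators=10, seed=0):
    """Return (name, weight) pairs sorted by impurity-based importance, highest first."""
    rf = RandomForestClassifier(n_estimators=n_estimators, random_state=seed).fit(X, y)
    weights = rf.feature_importances_  # non-negative, normalized to sum to 1
    order = np.argsort(weights)[::-1]
    return [(feature_names[i], float(weights[i])) for i in order]
```

Impurity-based importances tend to favor features with many distinct values, such as Hour or X/Y, so `sklearn.inspection.permutation_importance` on held-out data is a useful cross-check.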

Fig. 33: Actual vs Predicted Labels. Actual labels are the true integers that represent the crime type. Predicted labels for Random Forest are the classification outputs by the Random Forest.
Model Log Loss Accuracy
Random Forest 2.312060 0.218282
Naive Bayes 4.846123 0.274343
Decision Tree 8.787213 0.322790
K Neighbors 19.703055 0.195351
Logistic Regression 9.2131214 0.052230
SVM NaN NaN
MLP NaN NaN
TABLE V: Model Performance Metrics

VII Conclusion

In this paper we have proposed a novel approach to predict multi-class crime type by incorporating unsupervised learning techniques, and we relaxed some of the assumptions made in the current literature. We kept all class labels and, although this gave lower accuracy values, we found that the best performing models were the same as in the literature. By combining supervised and unsupervised learning techniques, our workflow also produces results that can easily be generalized to other cities, since we do not overlay a grid on the city as other studies have done. Because of the small number of raw features and the large number of class labels, we systematically crafted features to obtain better fitting models. Specifically, we have described a methodology that runs clustering algorithms on the data set and then uses the distance to the cluster centers as a feature in our supervised learning models. We achieved a log loss of 2.2323 with our Random Forest model, the best among the models we used. We hope this workflow of combining unsupervised and supervised learning models will inspire robust crime prediction workflows in the fight against crime.

VIII Acknowledgements

We would like to acknowledge Dr. Andrew Cohen from Department of Electrical and Computer Engineering for teaching this course, Dr. Robert Kane from Department of Criminology and Justice Studies and Dr. Matthew Burlick from Department of Computer Science and Informatics for advising us.

IX Future Work

We used the Euclidean distance to calculate the distance from the cluster centers to the crime points. Since the crimes are urban crimes, in future work we would like to see the effect of choosing a different distance, such as the city-block distance.
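If the distance features are computed with SciPy's `cdist`, swapping the metric is a one-argument change; a minimal sketch comparing the two. Note that city-block distance on raw longitude/latitude is measured in degrees along the coordinate axes, so it only approximates travel along streets where the street grid is aligned with those axes.

```python
import numpy as np
from scipy.spatial.distance import cdist


def nearest_center_distance(xy, centers, metric="euclidean"):
    """Distance from each point to its nearest center under the given metric."""
    return cdist(xy, centers, metric=metric).min(axis=1)


points = np.array([[0.0, 0.0], [3.0, 4.0]])
centers = np.array([[0.0, 0.0]])
print(nearest_center_distance(points, centers))               # [0. 5.]
print(nearest_center_distance(points, centers, "cityblock"))  # [0. 7.]
```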

References

  • [1] Johnson-Hart, L., and Kane, R. (2016). Deserts of Disadvantage: The Diffuse Effects of Structural Disadvantage on Violence in Urban Communities. Crime & Delinquency. DOI: 10.1177/0011128716682228.
  • [2] Chainey, S., Tompson, L., Uhlig, S.: The utility of hotspot mapping for predicting spatial patterns of crime. Security Journal 21, 4-28 (2008)
  • [3] Kim S., Joshi P., Kalsi P.S. and Taheri P.: Crime Analysis Through Machine Learning, doi: 10.1109/IEMCON.2018.8614828
  • [4] Hochreiter, Sepp, and Jürgen Schmidhuber, Long short-term memory. Neural Computation 9.8, 1997, pp. 1735-1780

  • [5] Stalidis P., Semertzidis T., Daras P.: Examining Deep Learning Architectures for Crime Classification and Prediction, arXiv:1812.00602 (2018)
  • [6] Stec A., Klabjan D.: Forecasting Crime with Deep Learning, arXiv: 1806.01486v1 (2018)
  • [7] Zhuang Y., Almeida M., Morabito M., Ding W.: Crime Hot Spot Forecasting: A Recurrent Model with Spatial and Temporal Information, IEEE International Conference on Big Knowledge (2017)
  • [8] Weisburd, David, and Cody W. Telep, Hot Spots Policing, what we know and what we need to know, Journal of Contemporary Criminal Justice, Vol 30, 2014, pp. 200-220
  • [9] Yu, C. H., Ding, W., Chen, P., and Morabito, M, Crime forecasting using spatio-temporal pattern with ensemble learning. Pacific-Asia Conference on Knowledge Discovery and Data Mining. Springer International Publishing, 2014.
  • [10] Wang, D., Ding, W., Stepinski, T., Salazar, J., Lo, H., and Morabito, M., Optimization of criminal hotspots based on underlying crime controlling factors using geospatial discriminative pattern. International Conference on Industrial, Engineering and Other Applications of Applied Intelligent Systems. Springer Berlin Heidelberg, 2012; White, M., and Kane, R. (2013). Pathways to Career-Ending Police Misconduct: An Examination of Patterns, Timing and Organizational Responses to Officer Malfeasance in the NYPD. Criminal Justice & Behavior.
  • [11] Cesario, Eugenio, Charlie Catlett, and Domenico Talia, Forecasting Crimes Using Autoregressive Models. IEEE 14th Intl Conf on Dependable, Autonomic and Secure Computing, 14th Intl Conf on Pervasive Intelligence and Computing, 2nd Intl Conf on Big Data Intelligence and Computing and Cyber Science and Technology Congress (DASC/PiCom/DataCom/CyberSciTech). IEEE, 2016

  • [12] Open Philly Publicly available Crime Data Set, URL: https://www.opendataphilly.org/
  • [13] Google Cloud Compute Engine Free Service, Specifications URL: https://colab.research.google.com/drive/151805XTDg--dgHb3-AXJCpnWaqRhop_2

X Appendix

Here we report additional crime-type counts aggregated over hours, months, and years. We encourage the reader to explore these figures to see the underlying patterns for specific crime types.

Fig. 34: Public Drunkenness (i) and Homicide-Criminal (ii) crime counts aggregated over hours.
Fig. 35: Public Drunkenness (i) and Homicide-Criminal (ii) crime counts aggregated over months.
Fig. 36: Public Drunkenness (i) and Homicide-Criminal (ii) crime counts aggregated over years.
Fig. 37: Arson (i) and Embezzlement (ii) crime counts aggregated over hours. Arson and embezzlement show opposite hourly patterns.
Fig. 38: Arson (i) and Embezzlement (ii) crime counts aggregated over months. Fewer incidents occur in colder months.
Fig. 39: Arson (i) and Embezzlement (ii) crime counts aggregated over years.
Fig. 40: DUI (i) and Robbery No Firearm (ii) crime counts aggregated over hours.
Fig. 41: DUI (i) and Robbery No Firearm (ii) crime counts aggregated over months.
Fig. 42: DUI (i) and Robbery No Firearm (ii) crime counts aggregated over years.
Fig. 43: Offenses Against Family and Children (i) and Gambling Violations (ii) crime counts aggregated over hours.
Fig. 44: Offenses Against Family and Children (i) and Gambling Violations (ii) crime counts aggregated over months.
Fig. 45: Offenses Against Family and Children (i) and Gambling Violations (ii) crime counts aggregated over years.
Fig. 46: Liquor Law Violations (i) and Forgery and Counterfeiting (ii) crime counts aggregated over hours.
Fig. 47: Liquor Law Violations (i) and Forgery and Counterfeiting (ii) crime counts aggregated over months.
Fig. 48: Liquor Law Violations (i) and Forgery and Counterfeiting (ii) crime counts aggregated over years.
Fig. 49: Receiving Stolen Property (i) and Homicide-Criminal (ii) crime counts aggregated over hours.
Fig. 50: Receiving Stolen Property (i) and Homicide-Criminal (ii) crime counts aggregated over months.
Fig. 51: Receiving Stolen Property (i) and Homicide-Criminal (ii) crime counts aggregated over years.
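The hour, month, and year aggregations behind these figures can be sketched with the Python standard library. This is a hypothetical illustration, not the authors' pipeline: the `(crime type, timestamp)` record layout, the `"%Y-%m-%d %H:%M:%S"` timestamp format, the toy records, and the helper name `aggregate_counts` are all assumptions. The actual OpenDataPhilly export [12] may use different column names and formats.

```python
from collections import Counter
from datetime import datetime

# Hypothetical toy records: (crime type, incident timestamp).
records = [
    ("Arson", "2016-01-03 02:15:00"),
    ("Arson", "2016-07-19 23:40:00"),
    ("Embezzlement", "2017-07-05 10:05:00"),
    ("Arson", "2017-01-22 02:50:00"),
]


def aggregate_counts(records, crime_type):
    """Count incidents of one crime type by hour of day, month, and year."""
    by_hour, by_month, by_year = Counter(), Counter(), Counter()
    for kind, stamp in records:
        if kind != crime_type:
            continue  # skip other crime types
        t = datetime.strptime(stamp, "%Y-%m-%d %H:%M:%S")
        by_hour[t.hour] += 1
        by_month[t.month] += 1
        by_year[t.year] += 1
    return by_hour, by_month, by_year


hours, months, years = aggregate_counts(records, "Arson")
print(sorted(hours.items()))   # [(2, 2), (23, 1)]
print(sorted(months.items()))  # [(1, 2), (7, 1)]
print(sorted(years.items()))   # [(2016, 2), (2017, 1)]
```

A `Counter` returns 0 for missing keys, so when plotting, iterating over `range(24)` or `range(1, 13)` and reading `hours[h]` or `months[m]` fills in hours or months with no incidents.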