Highlights

Predictive analysis is applied to crime data;

Correlation between crime and urban metrics;

Quantification of the importance of urban metrics in predicting crime.
1 Introduction
Social phenomena increasingly attract the attention of physicists, driven by the successful application of methods from statistical physics for modeling and describing several social systems, including the collective phenomena emerging from the interactions of individuals Castellano et al. (2009), the spread of ideas in social networks Pentland (2014), epidemic spreading Pastor-Satorras et al. (2015), criminal activity D’Orsogna and Perc (2015), political corruption Ribeiro et al. (2018), vaccination strategies Wang et al. (2016), and human cooperation Perc et al. (2017). In the particular case of crime, this interest traces back to the works of Quetelet, who coined the term “social physics” in the 19th century Quetlet (1869). On the one hand, traditional physics methods have proved to be useful in understanding phenomena outside conventional physics Galam (2012); Conte et al. (2012). On the other hand, several problems from physics have recently been addressed through the lens of machine learning methods, including topics related to phases of matter Carrasquilla and Melko (2017), the quantum many-body problem Carleo and Troyer (2017); van Nieuwenburg et al. (2017), and phases of strongly correlated fermions Ch’ng et al. (2017), among others. As physicists have added these new tools to the box Zdeborová (2017), naturally, social physics problems can also be addressed using such ideas. In particular, in this work, we are interested in understanding the relationships between crime and urban metrics by using statistical learning.

Crime and violence are ubiquitous in society. Throughout history, organized societies have tried to prevent crime following several approaches Gordon et al. (2009). In this context, understanding the features associated with crime is essential for achieving effective policies against these illegal activities. Studies have linked crime to several factors, including psychological traits Kamaluddin et al. (2015); Gottfredson and Hirschi (1990), environmental conditions Gamble and Hess (2012); Hsiang et al. (2013), spatial patterns Short et al. (2008); Alves et al. (2015); D’Orsogna and Perc (2015), and social and economic indicators Becker (1968); Ehrlich (1973); Wilson and Kelling (1982); Glaeser et al. (1996). However, it is easy to find controversial explanations for the causes of crime Gordon (2010). Methodological problems in data aggregation and selection Levitt (2001); Spelman (2008), errors related to data reporting Maltz and Targonski (2002), and wrong statistical hypotheses Gordon (2010) are just a few issues that can lead to misleading conclusions.
A significant fraction of the literature on statistical analysis in criminology tries to relate the number of occurrences of a particular crime (e.g. robbery) with explicative variables such as unemployment Raphael and Winter-Ebmer (2001) and income Kelly (2000). In general, these analyses are carried out by using ordinary-least-squares (OLS) linear regressions Alves et al. (2015). These standard linear models usually assume that the predictors have weak exogeneity (error-free variables), linearity, constant variance (homoscedasticity), normal residual distribution, and lack of multicollinearity. However, when trying to model crime, several of these assumptions are often not satisfied. When these hypotheses do not hold, conclusions about the factors affecting crime are likely to be misconceptions.
Recently, researchers have made impressive progress on the analysis of cities, where one of the main findings is that the relationship between urban metrics and population size is not linear but is well described by a power-law function Bettencourt et al. (2007, 2010); Alves et al. (2013a, b, 2014); Hanley et al. (2016); Leitao et al. (2016). Crime indicators scale as a superlinear function of the population size of cities Bettencourt et al. (2010); Alves et al. (2013b, 2015). Other indicators (commonly used as predictors in linear regression models for crime forecasting) also exhibit power-law behavior with population size. These metrics are categorized into sublinear (e.g. family income Alves et al. (2013b)), linear (e.g. sanitation Alves et al. (2013b)), and superlinear (e.g. GDP Bettencourt et al. (2007); Alves et al. (2013b, 2015, 2018)), depending on the power-law exponent characterizing the allometric relationship with the population size Bettencourt et al. (2007). In addition, the relationships between crime and population size, as well as between urban metrics and population size, have some degree of heteroscedasticity Bettencourt et al. (2007), and most of these urban indicators also follow heavy-tailed distributions Marsili and Zhang (1998); Alves et al. (2014). Thus, it is not surprising to find controversial results about the importance of variables for crime prediction when so many assumptions of linear regressions are not satisfied.

A possible approach to overcome some of these issues is to apply a transformation to the data in order to satisfy the assumptions of linear regressions. For instance, Bettencourt et al. Bettencourt et al. (2010) (see also Alves et al. (2013b, 2015)) employed scale-adjusted metrics to linearize the data and provide a fair comparison between cities with different population sizes. By considering these variables and applying corrections for heteroscedasticity Davidson and MacKinnon (1993), it is possible to describe the variance of the number of homicides as a function of urban metrics Alves et al. (2013b). Also, by the same approach, researchers have shown that simple linear models account for a significant fraction of the observed variance in the data and correctly reproduce the average of the scale-adjusted metrics Alves et al. (2015). However, the data still have collinearities, which can lead to misinterpretation of the coefficients in the linear models Alves et al. (2013b, 2015).
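As a minimal illustration of the allometric scaling discussed above, the exponent of a power-law relation Y ~ N^β can be estimated as the slope of a log-log linear fit. The sketch below uses synthetic city data; the exponent 1.15 and all other values are illustrative assumptions, not the paper's estimates:

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic allometric relation Y ~ a * N^beta for 500 hypothetical cities;
# beta > 1 mimics a superlinear indicator such as crime counts.
N = rng.uniform(1e4, 1e6, 500)                    # population sizes (illustrative)
beta_true = 1.15                                  # assumed exponent, not a fitted value
Y = 0.01 * N**beta_true * rng.lognormal(0.0, 0.2, 500)

# The scaling exponent is the slope of the linear fit in log-log space.
slope, intercept = np.polyfit(np.log(N), np.log(Y), 1)
print(round(slope, 2))
```

With multiplicative (lognormal) noise, the log-log OLS slope recovers the exponent well; heavier heteroscedastic noise is exactly what makes this estimate fragile in real urban data.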
A better approach to crime prediction is the use of statistical learning methods (e.g. Kang and Kang (2017)). Regression models based on machine learning can handle all the above-mentioned issues and are more suitable for the analysis of large complex datasets Breiman (2003). For instance, decision trees are known to require little preparation of the data when performing regression Breiman (2001); Hastie et al. (2013); James et al. (2014); Death and Fabricius (2000). Tree-based approaches are also considered a nonparametric method because they make no assumptions about the data. Among other advantages, these learning approaches map nonlinear relationships well, usually display good accuracy when predicting data, and are easy to interpret Breiman (2001); Hastie et al. (2013); James et al. (2014); Death and Fabricius (2000).

Here, we consider the random forest algorithm Ho (1995); Amit and Geman (1997); Ho (1998); Breiman (2001) to predict crime and to quantify the importance of urban indicators for crime prediction. We use data from urban indicators of all Brazilian cities to train the model and study the conditions necessary for preventing underfitting and overfitting. After training the model, we show that the algorithm predicts the number of homicides in cities with high accuracy in terms of the variance explained. Because of the high accuracy and easy interpretation of this ensemble tree model, we identify the important features for homicide prediction. Unlike simple linear models adjusted through OLS, we show that the importance of the features is stable under slight changes in the dataset and that these results can be used as a guide for crime modeling and by policymakers.
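A small synthetic sketch of the "little data preparation" property mentioned above: decision trees split on feature thresholds, so any monotone transformation of the features (here, a logarithm) leaves the learned partition, and hence the predictions, unchanged. The data below are illustrative, not the paper's:

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(8)

# Heavy-tailed synthetic features, as urban indicators tend to be.
X = rng.lognormal(0, 1, (300, 3))
y = X[:, 0] + 0.1 * rng.normal(0, 1, 300)

# Fit the same tree on raw and on log-transformed features.
tree_raw = DecisionTreeRegressor(max_depth=5, random_state=0).fit(X, y)
tree_log = DecisionTreeRegressor(max_depth=5, random_state=0).fit(np.log(X), y)

# Monotone transforms preserve the ordering of samples along each feature,
# so both trees learn the same partition and predict identically.
print(np.allclose(tree_raw.predict(X), tree_log.predict(np.log(X))))
```

This invariance is why no rescaling or linearization of the indicators is needed before fitting tree-based models.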
2 Methods and Results
2.1 Data
For our analysis, we choose the number of homicides at the city level as the crime indicator to be predicted. Homicide is the ultimate expression of violence against a person and thus a reliable crime indicator because it is almost always reported. In Brazil, the report of this particular crime to the Public Health System is compulsory, and these data are aggregated at the city level and made freely available by the Department of Informatics of the Brazilian Public Health System – DATASUS Brazil’s Public healthcare System (SUS) (2017). As possible predictor variables of crime, we select 10 urban indicators (also at the city level) available from the Brazilian National Census that took place in 2000. They are: child labor (fraction of the population aged 10 to 15 years who is working or looking for work), elderly population (citizens aged 60 years or older), female population, gross domestic product (GDP), illiteracy (citizens aged 15 years or older who are unable to read and write at least a simple note in the language they know), family income (average household income of family residents, in Brazilian currency), male population, population, sanitation (number of houses that have piped water and sewerage), and unemployment (citizens aged 16 years or older who are without work or looking for work). We have also considered the number of traffic accidents and suicides as possible crime-predicting variables (these data are also aggregated at the city level and from the same year as the census). We have chosen these indicators because they are common in studies correlating crime with socioeconomic indicators Gordon (2010). We have further included the number of homicides in the year 2000 as a predictor for homicides in 2010 to investigate a possible autocorrelated behavior. There are thus 13 urban indicators (including the homicide indicator) from the year 2000 in our dataset that will be used as predictors of the number of homicides 10 years later. We choose a time interval of 10 years because the characteristic time for changes in social indicators has been estimated to be of the order of decades Bettencourt et al. (2010); Alves et al. (2015). However, we also investigate how the accuracy changes as the time lag increases from one to ten years, considering homicide data from the years 2001 to 2010.

2.2 Problems with usual linear models
Many statistical models try to relate crime with possible explicative variables through OLS linear regressions Gordon (2010). Thus, it is usually assumed that crime is a linear function of the explicative variables, i.e., crime rate $= f(\text{explicative variables})$, where $f$ is a linear function Gordon (2010). In the context of our dataset, a naive approach to model homicides in Brazilian cities is to consider the following linear regression

$$y = \beta_0\, x_0 + \sum_{i=1}^{12} \beta_i\, x_i + \xi\,, \qquad (1)$$

where the dependent variable $y$ is the number of homicides in the year 2010, $x_0$ is the number of homicides in 2000, $x_i$ is the $i$-th ($i = 1, \dots, 12$) urban indicator in 2000, $\beta_0$ and $\beta_i$ are the linear coefficients, and $\xi$ is a random noise normally distributed (with zero mean and unitary variance) that accounts for unobserved determinants of homicides. It is worth noting that this model has a lagged dependent variable as an instrumental variable, which may cause problems when the error $\xi$ is autocorrelated in time Wooldrige (2003).

The OLS linear regression of Eq. 1 requires the residuals to follow a normal distribution. However, this condition is not satisfied, as shown by the quantile-quantile plot in Fig. 1A and confirmed by the Kolmogorov–Smirnov normality test. Usually, researchers account for this problem by applying some transformation to the data. This may eventually solve the normality problem, but collinearities may exist among urban indicators, and ignoring this fact can lead to controversial results, even when linear models display high predicting accuracies.

According to the urban scaling hypothesis Bettencourt et al. (2007, 2010); Alves et al. (2013b, 2015); Leitao et al. (2016); Hanley et al. (2016), the urban indicators in our dataset are dependent on the population size. Thus, each indicator $Y$ can be written in terms of the population size $N$ as $Y \sim N^{\beta}$, where $\beta$ is the urban scaling exponent, a condition that introduces correlations among all indicators. This property in regression models is called multicollinearity of the variables, and we can check for it in our data by evaluating the Pearson correlation coefficient among all urban indicators, as shown in Fig. 1B. From this correlation matrix, we verify that there are significant correlations between practically all pairs of variables, violating another condition assumed by most regression models.
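The two diagnostics above (residual normality and multicollinearity) can be sketched with synthetic data as follows; the indicator names and generating process are illustrative assumptions, not the DATASUS data:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)

# Synthetic stand-ins: a heavy-tailed population and a GDP tied to it by scaling.
n = 1000
population = rng.lognormal(10, 1, n)
gdp = 0.5 * population**1.2 * rng.lognormal(0, 0.3, n)
X = np.column_stack([population, gdp])

# Homicide-like outcome with non-normal (lognormal) noise.
y = 1e-4 * population + rng.lognormal(0, 1, n)

# OLS via least squares; standardized residuals of the fit.
A = np.column_stack([np.ones(n), X])
coef, *_ = np.linalg.lstsq(A, y, rcond=None)
resid = y - A @ coef
z = (resid - resid.mean()) / resid.std()

# Diagnostic 1: Kolmogorov-Smirnov test rejects residual normality.
ks_stat, p_value = stats.kstest(z, "norm")
# Diagnostic 2: Pearson correlation reveals strong collinearity.
r, _ = stats.pearsonr(np.log(population), np.log(gdp))

print(p_value < 0.05, r > 0.9)
```

Both violations appear here by construction, mirroring what Fig. 1 shows for the real indicators.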
If we ignore the assumptions required by linear regression models and perform the OLS fitting, we may eventually find good predictions. In fact, the linear model of Eq. 1 explains a large fraction of the variance in our dataset. This is an appealing result, since it is easy to interpret the coefficients of a linear model. However, the feature importance quantified by regression models can be very sensitive to changes in the dataset, such as those introduced by undersampling or bootstrapping methods Efron and Tibshirani (1994). For instance, by bootstrapping our dataset and applying the linear model of Eq. 1, we find that (depending on the sample used) a city’s income can be positively correlated, negatively correlated, or even uncorrelated with homicides. These facts could explain some of the inconsistencies reported in the literature about crime. For instance, Entorf et al. Entorf and Spengler (2000) found that higher income and urbanization are related to higher crime rates, whereas Fajnzylber et al. Fajnzylber et al. (2002) claimed that average income is not correlated with violent crime, but that higher urbanization is associated with higher robbery rates, and not with homicides. The same disagreements happen for unemployment, punishment and deterrence, among other indicators Gordon et al. (2009); Alves et al. (2013b, 2015). A possible way to improve the OLS regression model could be to add the variables in a stepwise fashion or to consider nonlinear models (e.g., the negative binomial regression) to overcome some of these problems. Unfortunately, in the crime literature, we often find results based on simple OLS linear regression, which can lead to misleading conclusions about the causes of crime. We could further explore and check for other inconsistencies in this simple linear regression by following the several approaches proposed in the literature on linear regression Wooldrige (2003).
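The coefficient instability described above can be reproduced with a toy bootstrap experiment: when two predictors are nearly collinear, the bootstrap distribution of an OLS coefficient becomes much wider (and its sign less reliable) than in a model without the collinear companion. All data below are synthetic:

```python
import numpy as np

rng = np.random.default_rng(2)

# Synthetic, nearly collinear stand-ins for "population" and "income";
# the outcome depends on population only.
n = 300
population = rng.lognormal(10, 1, n)
income = 2.0 * population * rng.lognormal(0, 0.05, n)
y = 1e-3 * population + rng.normal(0, 50, n)

def boot_coef(design, col, n_boot=200):
    """Bootstrap distribution of one OLS coefficient."""
    out = []
    for _ in range(n_boot):
        idx = rng.integers(0, n, n)                 # resample with replacement
        coef, *_ = np.linalg.lstsq(design[idx], y[idx], rcond=None)
        out.append(coef[col])
    return np.array(out)

with_income = np.column_stack([np.ones(n), population, income])
alone = np.column_stack([np.ones(n), population])

sd_collinear = boot_coef(with_income, 1).std()   # population coef, collinear model
sd_alone = boot_coef(alone, 1).std()             # population coef, single-predictor model

print(sd_collinear > sd_alone)   # variance inflation from multicollinearity
```

The inflated variance is what lets the same coefficient come out positive in one sample and negative in another.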
Nevertheless, we now focus on an alternative approach that overcomes these limitations and provides a ranking of features that is stable under slight changes in the dataset. Our approach is much more flexible, requiring no transformation or data preparation to achieve good results in terms of prediction accuracy and interpretability.
2.3 Random forest algorithm
Random forest is an ensemble learning method used for classification or regression that fits several decision trees using various subsamples of the dataset and aggregates their individual predictions to form a final output and reduce overfitting Breiman (1996, 2001); Hastie et al. (2013); James et al. (2014); Death and Fabricius (2000). In the random forest regression, several independent decision trees are constructed by bootstrap sampling of the dataset. The final output is the result of a majority vote among the trees (estimators) or the average over all trees. The process of “bagging features” selects the best metrics to describe the data splits more often and, consequently, makes them more important in the majority voting Ho (1998). Unlike usual linear models, the random forest is invariant under scaling and various other transformations of the feature values. It is also robust to the inclusion of irrelevant features and produces very accurate predictions Hastie et al. (2013). These properties of the random forest algorithm make it especially suitable for crime forecasting because of the multicollinearity and nonlinearities present in urban data.

Within the framework of random forest regression, the linear model of Eq. 1 can be rewritten as

$$y = \frac{1}{T} \sum_{t=1}^{T} f_t(\mathbf{x}; \Theta_t)\,, \qquad (2)$$

where $T$ is the number of trees and $f_t(\mathbf{x}; \Theta_t)$ represents each tree adjusted to the data $\mathbf{x} = (x_0, x_1, \dots, x_{12})$ with parameters expressed by the vector $\Theta_t$. The components of $\Theta_t$ are related to the maximum depth of the tree, the splitting variables, the cutting points at each node, and the terminal-node values Breiman (2003).

One common question that arises when using machine learning algorithms is related to underfitting and overfitting. These behaviors appear when estimating the best trade-off that minimizes the bias and variance errors James et al. (2014). Bias is related to errors in the assumptions made by the algorithm that cause the model to miss relevant information about the features describing the data. In other words, the model is too simple to describe the data behavior, which is called underfitting. On the other hand, if we increase the complexity of the model by adding a great number of parameters, the model can become quite sensitive to the noise of the training set, causing an increase in the variance of the outputs, which is called overfitting. The random forest has two main parameters controlling the trade-off between bias and variance errors: the number of trees and the maximum depth of each tree. If a forest has only a few trees with a depth of only a few layers of nodes, we are likely underfitting the data. However, if there are too many trees splitting the data into a lot of nodes to make the decisions, there is a chance that we are making choices based on the noise of the data rather than the true underlying behavior.
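Eq. 2 can be checked directly with scikit-learn (the library used later in the text): a fitted RandomForestRegressor's prediction equals the average of its individual trees' predictions. The data below are synthetic:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(3)

# Synthetic stand-in for the feature matrix (13 indicators per city).
X = rng.uniform(0, 1, (500, 13))
y = X[:, 0] * X[:, 1] + 0.1 * rng.normal(0, 1, 500)   # nonlinear target

forest = RandomForestRegressor(n_estimators=200, max_depth=100, random_state=0)
forest.fit(X, y)

# Eq. 2: the forest output is the average over the individual trees f_t.
x_new = X[:5]
manual_avg = np.mean([tree.predict(x_new) for tree in forest.estimators_], axis=0)
print(np.allclose(manual_avg, forest.predict(x_new)))
```

Each tree in `forest.estimators_` was fitted on its own bootstrap sample, so the averaging is exactly the aggregation step described above.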
To determine the set of parameters which avoids underfitting and overfitting of the data, we use stratified k-fold cross-validation Kohavi et al. (1995); Hastie et al. (2013) to estimate the validation curves for a range of values of the number of trees (Fig. 2A) and the maximum depth (Fig. 2B). If the number of trees and the maximum depth are both slightly smaller than 10, we find that the training and validation scores are small. As we increase the number of trees and the maximum depth, both training and cross-validation scores increase and reach constant plateaus, which indicates that within the considered ranges of these parameters there is no overfitting.
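The validation-curve analysis can be sketched with scikit-learn's `validation_curve` on synthetic data (for a regression target, plain k-fold splitting is used here; the parameter range is illustrative):

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import validation_curve

rng = np.random.default_rng(4)

# Synthetic regression task standing in for the homicide data.
X = rng.uniform(0, 1, (400, 5))
y = X[:, 0] * X[:, 1] + 0.1 * rng.normal(0, 1, 400)

# Cross-validated R^2 as a function of the number of trees.
n_trees = [1, 5, 20, 80]
train_scores, val_scores = validation_curve(
    RandomForestRegressor(random_state=0), X, y,
    param_name="n_estimators", param_range=n_trees, cv=5)

mean_val = val_scores.mean(axis=1)
print(mean_val[-1] > mean_val[0])   # scores improve with more trees, then plateau
```

Plotting `mean_val` against `n_trees` reproduces the qualitative shape of Fig. 2A: a rise followed by a plateau, with no drop that would signal overfitting.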
Another question is how much data we need to train our model. This is a relevant question because we can eventually introduce more noise into the model by adding unnecessary data. To answer this question, we again apply stratified k-fold cross-validation to estimate the learning curves for a range of random fractions of the dataset (Fig. 2C). We observe that the more data used for training the model, the better the cross-validated scores. However, we already obtain good cross-validated scores with a small fraction of the data (about 1000 cities).
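Similarly, the learning-curve analysis can be sketched with `learning_curve` (synthetic data; the training fractions are illustrative):

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import learning_curve

rng = np.random.default_rng(5)

# Synthetic regression task; cross-validated score vs. amount of training data.
X = rng.uniform(0, 1, (600, 5))
y = X[:, 0] * X[:, 1] + 0.1 * rng.normal(0, 1, 600)

sizes, train_scores, val_scores = learning_curve(
    RandomForestRegressor(n_estimators=50, random_state=0), X, y,
    train_sizes=[0.2, 0.5, 1.0], cv=5)

mean_val = val_scores.mean(axis=1)
print(mean_val[-1] > mean_val[0])   # more training data, better validated score
```

The point where `mean_val` saturates indicates how much data is actually needed, mirroring the reading of Fig. 2C.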
2.4 Predicting crime with the random forest regressor
We have to tune the model by searching for the combination of parameters that best enhances the performance of the random forest algorithm. The previous analysis of the validation and learning curves helps to build a grid of parameters over which to seek the combination that improves our accuracy. To find the best combination of the number of trees and the maximum depth, we use a grid-search algorithm with stratified k-fold cross-validation, as implemented in the Python library scikit-learn Pedregosa et al. (2011). This procedure exhaustively searches for the combination of parameters that optimizes the score over a specified grid of parameter values. For our data, we find that the best accuracy (on average) is achieved with 200 trees and a maximum depth equal to 100.
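A sketch of the tuning-and-prediction pipeline with `GridSearchCV` and a random train/test split (synthetic data; the grid values are illustrative, not the paper's full grid):

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import r2_score
from sklearn.model_selection import GridSearchCV, train_test_split

rng = np.random.default_rng(6)

# Synthetic regression task standing in for the homicide data.
X = rng.uniform(0, 1, (400, 5))
y = X[:, 0] * X[:, 1] + 0.05 * rng.normal(0, 1, 400)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=0)

# Exhaustive search over an illustrative grid of forest parameters.
grid = {"n_estimators": [50, 200], "max_depth": [10, 100]}
search = GridSearchCV(RandomForestRegressor(random_state=0), grid, cv=5)
search.fit(X_tr, y_tr)

# Out-of-sample accuracy of the tuned model.
r2 = r2_score(y_te, search.predict(X_te))
print(sorted(search.best_params_), r2 > 0.5)
```

Repeating the split with different random seeds yields a distribution of R² values, which is how a figure like Fig. 3B is built.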
Having properly trained the model, we can now use the random forest regressor to make crime predictions. Randomly splitting the data into training and testing sets, we obtain accurate predictions with a high average adjusted R². Previous results using the same data and simple linear models combined with scale-adjusted metrics were able to predict homicides with considerably lower accuracy (see the supplementary material of reference Alves et al. (2015)). Figure 3A shows the empirical data versus the random forest predictions for one realization (one data splitting) of the algorithm. Because the dataset is split randomly, different runs can return different outputs and different scores. Figure 3B depicts the probability distribution (computed via the kernel density estimation method) of the adjusted R² for 100 different splits, where we observe that the values are concentrated around a well-defined peak.

2.5 Feature importance
The ranking of the importance of the features (urban indicators) is another intriguing question for a better understanding of crime. As previously discussed, usual linear regression models have found different answers to this question Gordon et al. (2009); Entorf and Spengler (2000); Fajnzylber et al. (2002). Here, we use the random forest algorithm to identify the most important urban indicators for predicting crime and test whether this ranking of features is robust under slight changes in the dataset.
The importance of a feature in a tree can be computed by assessing its contribution to the decision process. In each node of a single tree, the variable that yields the best improvement in the squared-error risk is used to split the region associated with that node into two subregions Hastie et al. (2013). Thus, Breiman et al. Breiman et al. (1984) proposed that the relative importance of a feature in a tree can be calculated as the sum of the improvements over all internal nodes of the tree where that feature is used for splitting. The generalization of this measure to the random forest is the average importance of the feature over all trees in the model, as implemented in the Python library scikit-learn Pedregosa et al. (2011).
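Following the notation of Hastie et al. (2013) (a transcription in our own symbols, so the symbol names are an assumption), this importance measure can be written as:

```latex
% Squared relative importance of feature X_ell in a single tree T with
% J - 1 internal nodes: i_t^2 is the improvement in squared-error risk
% obtained at node t, and v(t) is the feature used to split node t.
I_\ell^2(T) = \sum_{t=1}^{J-1} i_t^2 \,\mathbb{1}\!\left[v(t) = \ell\right]

% Generalization to a forest of M trees: the average over the trees.
I_\ell^2 = \frac{1}{M} \sum_{m=1}^{M} I_\ell^2(T_m)
```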
We use this metric to calculate the importance of the features describing the number of homicides in our data. Because of the slight modifications in the dataset caused by the splits into training and test samples, the feature importance varies for each realization of the algorithm. To verify whether these changes affect the importance ranking of the features, we calculate the importance of the features for 100 different samples and compute a boxplot ranked by the median importance of the outputs returned by the different samples (Fig. 4A). We check whether the differences in importance are statistically significant by computing the p-values of Student’s t-test (testing whether two samples have identical average values) with Bonferroni corrections (which adjust the p-values for multiple comparisons Rupert Jr (2012)). The resulting p-values are shown in Fig. 4B, ranked by the median in correspondence with the boxplot. From this figure, we note that unemployment is the most important feature for describing crime, followed by illiteracy and male population. The next urban indicators in order of importance form groups of indistinguishable importance (squares on the diagonal matrix of Fig. 4B), i.e., the fourth most important feature is actually a group that includes female population, population, and sanitation. This group of features is followed by three other groups: child labor and homicides (at the fifth position), traffic accidents and elderly population (at the sixth position), and suicides and income (at the seventh position). The least important feature for predicting crime is the GDP of cities. It is interesting to note that the number of homicides in the past is a poor predictor of homicides in the future. It is also worth remarking that, despite some fluctuations caused by the different samples used to train the model, the ranking of features remains essentially the same. This result demonstrates the robustness of the random forest algorithm for detecting the importance of features.
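The stability check can be sketched as follows: recompute the impurity-based importances over repeated train/test splits and compare the resulting ranks; a t-test plays the role of the significance comparison. The data are synthetic, with one deliberately dominant feature:

```python
import numpy as np
from scipy import stats
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(7)

# Synthetic data in which feature 0 dominates the target.
X = rng.uniform(0, 1, (500, 4))
y = 3 * X[:, 0] + X[:, 1] + 0.1 * rng.normal(0, 1, 500)

importances = []
for seed in range(10):
    X_tr, _, y_tr, _ = train_test_split(X, y, test_size=0.25, random_state=seed)
    forest = RandomForestRegressor(n_estimators=100, random_state=0).fit(X_tr, y_tr)
    importances.append(forest.feature_importances_)
importances = np.array(importances)

# The top-ranked feature is the same in every split ...
ranks = importances.argmax(axis=1)
# ... and the gap between the two leading features is significant.
_, p_value = stats.ttest_ind(importances[:, 0], importances[:, 1])
print(set(ranks), p_value < 0.05)
```

In the paper's analysis, features whose importance distributions are *not* significantly different form the indistinguishable groups of Fig. 4B.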
3 Discussion and conclusions
The accurate predictions obtained through statistical learning suggest that crime is quite dependent on urban indicators. The easy interpretation and good accuracy of the random forest algorithm show that this model is an excellent solution for predicting crime and identifying the importance of features, even under small perturbations of the training dataset. We cannot assert from the ranking of Fig. 4A which features have a positive or negative contribution to the number of homicides. We could try to decompose the contributions of each feature in each node of the trees and calculate an average over all nodes and trees to identify the gradients, as described by Hastie et al. Hastie et al. (2013). However, this decomposition is problematic because features can contribute differently depending on thresholds imposed by the constraints of the sample chosen to train the model. The signs of the contributions of a particular feature can vary for different thresholds, even when the average rank of importance remains the same Breiman et al. (1984). Thus, despite the great stability of the results, it is hard to identify whether the variables cause a positive or a negative effect on the number of homicides. While one can speculate whether the most important indicators found in our results cause crime to increase or decrease, further investigation is necessary to understand the local effects of urban metrics on crime.
Because unemployment has very large variability at the local level Levitt (2001), Blumstein Blumstein (2002) argues that different data aggregations can lead to different conclusions on whether this indicator affects crime or not. There is also evidence supporting the idea that crime is affected by unemployment only when this indicator exceeds a given threshold Alves et al. (2013b). This is somewhat similar to what the random forest does when separating the hyperplane of features by thresholds and classifying the number of homicides according to the different values of the urban indicators. The algorithm cannot indicate whether unemployment contributes positively to crime; however, it indicates that this indicator is the most important for describing crime among the set of 12 features in our dataset. In particular, a recent work has shown that the rise of school shootings is related to the unemployment rate across different geographic aggregation levels (national, regional, and city) Pah et al. (2017), which is in agreement with our findings.

The second most important feature for describing crime is illiteracy, which has been associated with violence by other works Davis et al. (1999). Previous works on scale-adjusted metrics have also shown that the levels of illiteracy are correlated with the number of homicides Alves et al. (2013b, 2015). A report by the Canadian Police further shows that people with low literacy skills are less likely to be involved in group activities than those with higher literacy skills Literacy and policing project of the Canadian association of chiefs of police (2008). Consequently, people with low literacy often feel isolated and vulnerable, making them more likely to become involved in violence and crime Literacy and policing project of the Canadian association of chiefs of police (2008). The male population is the third most important feature for describing homicides and has also been linked to high levels of violence Hesketh and Xing (2006); Alves et al. (2013b, 2015). As discussed by Hesketh and Xing Hesketh and Xing (2006), a surplus of male population increases marginalization in society and is linked to antisocial behavior and violence. A similar result was reported in Ref. Alves et al. (2013b), where it was found that cities with more men than expected (in terms of the urban scaling) usually have more crimes. Similarly, our results suggest that the male population plays an important role in defining the levels of crime in cities. Notice that, together, unemployment, illiteracy, and male population are responsible for explaining a large fraction of the variance in our dataset.
It is worth mentioning that previous results based on the same data and simple linear models combined with scale-adjusted metrics were able to predict homicides with much lower accuracy than the scores obtained with the random forest approach (see the supplementary material of reference Alves et al. (2015)). However, more sophisticated methods could improve this accuracy even further, and this could be the target of future investigations on crime. The crime literature still lacks a full comparison with other methods (such as nonlinear models, Bayesian regression, and other machine learning techniques), a work that could unveil new properties of the data as well as improve the accuracy of predictions.
Finally, we believe that the application of machine learning for identifying urban indicators that correlate with crime helps to settle the discussion about whether an indicator is important or not for describing a particular crime type. The results of our analysis can further be used as a guide for building other crime models and may help policymakers in the search for better strategies for reducing crime. Indeed, our results indicate that unemployment and illiteracy levels play an important role in defining the number of homicides in Brazil.
Acknowledgments
L.G.A.A. acknowledges FAPESP (Grant No. 2016/16987-7) for financial support. H.V.R. acknowledges CNPq (Grant No. 440650/2014-3) for financial support. F.A.R. acknowledges CNPq (Grant No. 307748/2016-2) and FAPESP (Grant No. 2016/25682-5 and Grant No. 13/07375-0) for financial support.
References
 Castellano et al. (2009) C. Castellano, S. Fortunato, V. Loreto, Statistical physics of social dynamics, Reviews of Modern Physics 81 (2009) 591.
 Pentland (2014) A. Pentland, Social Physics: how good ideas spread — the lessons from a new science, EBL-Schweitzer, Scribe Publications Pty Limited, 2014.
 Pastor-Satorras et al. (2015) R. Pastor-Satorras, C. Castellano, P. Van Mieghem, A. Vespignani, Epidemic processes in complex networks, Reviews of Modern Physics 87 (2015) 925.
 D’Orsogna and Perc (2015) M. R. D’Orsogna, M. Perc, Statistical physics of crime: A review, Physics of Life Reviews 12 (2015) 1–21.
 Ribeiro et al. (2018) H. V. Ribeiro, L. G. Alves, A. F. Martins, E. K. Lenzi, M. Perc, The dynamical structure of political corruption networks, Journal of Complex Networks cny002 (2018) 1–15.
 Wang et al. (2016) Z. Wang, C. T. Bauch, S. Bhattacharyya, A. d’Onofrio, P. Manfredi, M. Perc, N. Perra, M. Salathé, D. Zhao, Statistical physics of vaccination, Physics Reports 664 (2016) 1–113.
 Perc et al. (2017) M. Perc, J. J. Jordan, D. G. Rand, Z. Wang, S. Boccaletti, A. Szolnoki, Statistical physics of human cooperation, Physics Reports 687 (2017) 1–51.
 Quetlet (1869) A. Quetlet, Physique sociale, Bachelier, Paris, 1869.
 Galam (2012) S. Galam, Sociophysics: a physicist’s modeling of psychopolitical phenomena, Springer Science & Business Media, 2012.
 Conte et al. (2012) R. Conte, N. Gilbert, G. Bonelli, C. Cioffi-Revilla, G. Deffuant, J. Kertesz, V. Loreto, S. Moat, J.-P. Nadal, A. Sanchez, et al., Manifesto of computational social science, European Physical Journal Special Topics 214 (2012) 325.
 Carrasquilla and Melko (2017) J. Carrasquilla, R. G. Melko, Machine learning phases of matter, Nature Physics (2017).
 Carleo and Troyer (2017) G. Carleo, M. Troyer, Solving the quantum many-body problem with artificial neural networks, Science 355 (2017) 602–606.
 van Nieuwenburg et al. (2017) E. P. van Nieuwenburg, Y.-H. Liu, S. D. Huber, Learning phase transitions by confusion, Nature Physics 13 (2017) 435–439.
 Ch’ng et al. (2017) K. Ch’ng, J. Carrasquilla, R. G. Melko, E. Khatami, Machine learning phases of strongly correlated fermions, Physical Review X 7 (2017) 031038.
 Zdeborová (2017) L. Zdeborová, Machine learning: New tool in the box, Nature Physics 13 (2017) 420–421.
 Gordon et al. (2009) M. B. Gordon, J. R. Iglesias, V. Semeshenko, J.P. Nadal, Crime and punishment: the economic burden of impunity, The European Physical Journal B 68 (2009) 133–144.
 Kamaluddin et al. (2015) M. R. Kamaluddin, N. Shariff, A. Othman, K. H. Ismail, G. A. M. Saat, Linking psychological traits with criminal behaviour: A review, ASEAN Journal of Psychiatry 16 (2015) 13–25.
 Gottfredson and Hirschi (1990) M. R. Gottfredson, T. Hirschi, A General Theory of Crime, Stanford University Press, 1990.
 Gamble and Hess (2012) J. L. Gamble, J. J. Hess, Temperature and violent crime in Dallas, Texas: relationships and implications of climate change, Western Journal of Emergency Medicine 13 (2012) 239.
 Hsiang et al. (2013) S. M. Hsiang, M. Burke, E. Miguel, Quantifying the influence of climate on human conflict, Science 341 (2013) 1235367.
 Short et al. (2008) M. B. Short, M. R. D’Orsogna, V. B. Pasour, G. E. Tita, P. J. Brantingham, A. L. Bertozzi, L. B. Chayes, A statistical model of criminal behavior, Mathematical Models and Methods in Applied Sciences 18 (2008) 1249–1267.
 Alves et al. (2015) L. G. A. Alves, E. K. Lenzi, R. S. Mendes, H. V. Ribeiro, Spatial correlations, clustering and percolation-like transitions in homicide crimes, EPL 111 (2015) 18002.
 Becker (1968) G. S. Becker, Crime and punishment: An economic approach, in: The Economic Dimensions Of Crime, Springer, 1968, pp. 13–68.
 Ehrlich (1973) I. Ehrlich, The deterrent effect of capital punishment: A question of life and death, 1973.
 Wilson and Kelling (1982) J. Q. Wilson, G. L. Kelling, Broken windows, Atlantic Monthly 249 (1982) 29–38.
 Glaeser et al. (1996) E. L. Glaeser, B. Sacerdote, J. A. Scheinkman, Crime and social interactions, The Quarterly Journal of Economics 111 (1996) 507–548.
 Gordon (2010) M. B. Gordon, A random walk in the literature on criminality: A partial and critical view on some statistical analyses and modelling approaches, European Journal of Applied Mathematics 21 (2010) 283–306.
 Levitt (2001) S. D. Levitt, Alternative strategies for identifying the link between unemployment and crime, Journal of Quantitative Criminology 17 (2001) 377–390.
 Spelman (2008) W. Spelman, Specifying the relationship between crime and prisons, Journal of Quantitative Criminology 24 (2008) 149–178.
 Maltz and Targonski (2002) M. D. Maltz, J. Targonski, A note on the use of county-level UCR data, Journal of Quantitative Criminology 18 (2002) 297–318.
 Raphael and Winter-Ebmer (2001) S. Raphael, R. Winter-Ebmer, Identifying the effect of unemployment on crime, The Journal of Law and Economics 44 (2001) 259–283.
 Kelly (2000) M. Kelly, Inequality and crime, The Review of Economics and Statistics 82 (2000) 530–539.
 Alves et al. (2015) L. G. A. Alves, R. S. Mendes, E. K. Lenzi, H. V. Ribeiro, Scale-adjusted metrics for predicting the evolution of urban indicators and quantifying the performance of cities, PLoS ONE 10 (2015) e0134862.
 Bettencourt et al. (2007) L. M. Bettencourt, J. Lobo, D. Helbing, C. Kühnert, G. B. West, Growth, innovation, scaling, and the pace of life in cities, Proceedings of the National Academy of Sciences 104 (2007) 7301–7306.
 Bettencourt et al. (2010) L. M. Bettencourt, J. Lobo, D. Strumsky, G. B. West, Urban scaling and its deviations: Revealing the structure of wealth, innovation and crime across cities, PLoS ONE 5 (2010) e13541.
 Alves et al. (2013a) L. G. A. Alves, H. V. Ribeiro, R. S. Mendes, Scaling laws in the dynamics of crime growth rate, Physica A 392 (2013a) 2672–2679.
 Alves et al. (2013b) L. G. A. Alves, H. V. Ribeiro, E. K. Lenzi, R. S. Mendes, Distance to the scaling law: a useful approach for unveiling relationships between crime and urban metrics, PLoS ONE 8 (2013b) e69580.
 Alves et al. (2014) L. G. A. Alves, H. V. Ribeiro, E. K. Lenzi, R. S. Mendes, Empirical analysis on the connection between power-law distributions and allometries for urban indicators, Physica A 409 (2014) 175–182.
 Hanley et al. (2016) Q. S. Hanley, D. Lewis, H. V. Ribeiro, Rural to urban population density scaling of crime and property transactions in English and Welsh parliamentary constituencies, PLoS ONE 11 (2016) e0149546.
 Leitao et al. (2016) J. C. Leitao, J. M. Miotto, M. Gerlach, E. G. Altmann, Is this scaling nonlinear?, Royal Society Open Science 3 (2016) 150649.
 Alves et al. (2018) L. G. A. Alves, H. V. Ribeiro, F. A. Rodrigues, The role of city size and urban metrics on crime modeling, in: Understanding Crime Through Science – Interdisciplinary Analysis and Modeling of Criminal Activities, arXiv preprint arXiv:1712.02817, 2018.
 Marsili and Zhang (1998) M. Marsili, Y.-C. Zhang, Interacting individuals leading to Zipf’s law, Physical Review Letters 80 (1998) 2741.
 Davidson and MacKinnon (1993) R. Davidson, J. G. MacKinnon, Estimation and inference in econometrics, JSTOR, 1993.
 Kang and Kang (2017) H.-W. Kang, H.-B. Kang, Prediction of crime occurrence from multimodal data using deep learning, PLoS ONE 12 (2017) e0176244.
 Breiman (2003) L. Breiman, Statistical modeling: The two cultures, Quality Control and Applied Statistics 48 (2003) 81–82.
 Breiman (2001) L. Breiman, Random forests, Machine Learning 45 (2001) 5–32.
 Hastie et al. (2013) T. Hastie, R. Tibshirani, J. Friedman, The elements of statistical learning: Data mining, inference, and prediction, Springer Series in Statistics, Springer New York, 2013.
 James et al. (2014) G. James, D. Witten, T. Hastie, R. Tibshirani, An Introduction to Statistical Learning: with Applications in R, Springer Texts in Statistics, Springer New York, 2014.
 De’ath and Fabricius (2000) G. De’ath, K. E. Fabricius, Classification and regression trees: a powerful yet simple technique for ecological data analysis, Ecology 81 (2000) 3178–3192.
 Ho (1995) T. K. Ho, Random decision forests, in: Proceedings of the Third International Conference on Document Analysis and Recognition, volume 1, IEEE, 1995, pp. 278–282.
 Amit and Geman (1997) Y. Amit, D. Geman, Shape quantization and recognition with randomized trees, Neural Computation 9 (1997) 1545–1588.
 Ho (1998) T. K. Ho, The random subspace method for constructing decision forests, IEEE Transactions on Pattern Analysis and Machine Intelligence 20 (1998) 832–844.
 Brazil’s Public Healthcare System (SUS) (2017) Brazil’s Public Healthcare System (SUS), Department of Data Processing (DATASUS), 2017. Accessed: 2017-06-01.
 Wooldridge (2003) J. Wooldridge, Introductory Econometrics: A Modern Approach, South-Western College Pub, 2003.
 Efron and Tibshirani (1994) B. Efron, R. Tibshirani, An Introduction to the Bootstrap, Chapman & Hall/CRC Monographs on Statistics & Applied Probability, 1994.
 Entorf and Spengler (2000) H. Entorf, H. Spengler, Socioeconomic and demographic factors of crime in Germany: Evidence from panel data of the German states, International Review of Law and Economics 20 (2000) 75–106.
 Fajnzylber et al. (2002) P. Fajnzylber, D. Lederman, N. Loayza, What causes violent crime?, European Economic Review 46 (2002) 1323–1357.
 Breiman (1996) L. Breiman, Bagging predictors, Machine Learning 24 (1996) 123–140.
 Kohavi et al. (1995) R. Kohavi, et al., A study of cross-validation and bootstrap for accuracy estimation and model selection, in: IJCAI, volume 14, Stanford, CA, 1995, pp. 1137–1145.
 Pedregosa et al. (2011) F. Pedregosa, G. Varoquaux, A. Gramfort, V. Michel, B. Thirion, O. Grisel, M. Blondel, P. Prettenhofer, R. Weiss, V. Dubourg, J. Vanderplas, A. Passos, D. Cournapeau, M. Brucher, M. Perrot, E. Duchesnay, Scikit-learn: Machine learning in Python, Journal of Machine Learning Research 12 (2011) 2825–2830.
 Breiman et al. (1984) L. Breiman, J. Friedman, C. J. Stone, R. A. Olshen, Classification and regression trees, CRC press, 1984.
 Rupert Jr (2012) G. M. Rupert Jr, Simultaneous statistical inference, Springer Science & Business Media, 2012.
 Blumstein (2002) A. Blumstein, Crime modeling, Operations Research 50 (2002) 16–24.
 Pah et al. (2017) A. Pah, J. Hagan, A. Jennings, A. Jain, K. Albrecht, A. Hockenberry, L. Amaral, Economic insecurity and the rise in gun violence at us schools, Nature Human Behaviour 1 (2017) 0040.
 Davis et al. (1999) T. C. Davis, R. S. Byrd, C. L. Arnold, P. Auinger, J. A. Bocchini, Low literacy and violence among adolescents in a summer sports program, Journal of Adolescent Health 24 (1999) 403–411.
 Literacy and Policing Project of the Canadian Association of Chiefs of Police (2008) Literacy and Policing Project of the Canadian Association of Chiefs of Police, Literacy awareness resource manual for police, 2008. Accessed: 2017-06-01.
 Hesketh and Xing (2006) T. Hesketh, Z. W. Xing, Abnormal sex ratios in human populations: causes and consequences, Proceedings of the National Academy of Sciences 103 (2006) 13271–13275.