Introduction
In criminology, social cohesion among neighbours has been linked to their willingness to cooperate in order to solve common problems and reduce violence Graif and Sampson (2009); Sampson and Groves (1989); Sampson (1997). Cooperation among neighbours, as opposed to disorganization, is indeed believed to create the mechanisms by which residents themselves achieve guardianship and public order Sampson and Groves (1989). This mechanism also finds its roots in urban planning, where specific aspects of urban architecture Newman (1972) and urban physical characteristics Jacobs (1961) have been related to security. However, neighbourhoods are not to be considered islands unto themselves, as they are embedded in a citywide system of social interactions. On a daily basis, people’s routines expose residents to different conditions and possibilities Wang et al. (2018), and may favour crime Cohen and Felson (1979). Yet, mainstream studies focus on just a subset of static factors at a time, often in a single city (e.g. Chicago or New York), thus neglecting the complex urban interplay between crime, people, places, culture and human mobility.
Criminology widely recognizes the importance of places. Crime occurs in small areas such as street segments, buildings or parks. However, neighbourhoods and their contextual characteristics are also believed to influence offenders’ activities. Studies on small areas and neighbourhoods roughly come from two streams of literature. The first stream focuses on the routine activity and crime pattern theories Cohen and Felson (1979); Felson and Clarke (1998); Brantingham and Brantingham (1993), and on small areas. These studies suggest that crime occurs when an offender, a suitable target, and the absence of any deterrence system, such as police or even ordinary citizens Felson and Boba (2010), converge at a place. The presence of people influences the number of offenders and targets, while the daily routine of residents exposes homes and people to predatory crimes Hindelang et al. (1978). The built environment was also found to affect criminal activities, as physical disorder and specific locations (e.g. bars, taverns) attract offenders and suitable targets O’Brien and Sampson (2015); Murray and Roncek (2008); Salesses et al. (2013). The second stream of literature builds upon the social disorganization theory Sampson and Groves (1989); Sampson (1997), which found high crime concentration in socially and economically disadvantaged neighbourhoods. In these studies, census data is the primary source used to measure social cohesion through socioeconomic disadvantage, ethnic diversity and residential instability Sampson and Groves (1989); Sampson (1997, 1985). In some cases, new sources of data were used. For example, scholars exploited synthetic social ties to simulate neighbourhood cohesion Hipp et al. (2013), and mobility flows to indicate crime opportunities and connections between neighbourhoods Song et al. (2019). Others leveraged crowdsourced Points of Interest (POIs), taxi flows Wang et al. (2016), and dynamic population mapping from satellite imagery Andresen (2006, 2011) and mobile phone activity Bogomolov et al. (2014); Malleson and Andresen (2015) to assess the presence of people. Altogether, these results highlight the tight relation between socioeconomic, built environment and mobility conditions, and their impact on criminal activities. Although the two streams of theory are often seen as competing, we argue that they can complement each other. However, very limited work has integrated socioeconomic, built environment and mobility conditions together in multiple cities and in small areas. Existing literature focuses on a single city, often describes crimes at the neighbourhood level, and relies on census boundaries. These limitations result in a fragmented and incomplete picture of how the numerous factors influence crime in the urban context and limit the impact of the conclusions.
Here, we seek to shed light on the diverse set of factors at play in urban crime by exploring how it relates, at the same time, to social disorganisation, built environment characteristics and human mobility. Specifically, we analyse crime at the level of blocks, considering both the local features of the block and its surrounding context, represented by all the blocks within a half mile. The contribution of this paper is twofold. First, we address the need for a comprehensive study that explores crime patterns at fine-grained resolution across multiple cities of the world, analysing Bogotá, Boston, Los Angeles and Chicago. Second, we show that accounting for the previously neglected complex interplay between crime, people, places, and human mobility can significantly improve the performance of crime inference. We make use of massive and ubiquitous data sources such as mobile phone records and geographical data, implying that the resulting framework can be replicated at scale. Our generated insights can help recommend effective policies and interventions that improve urban security.
Results
We study criminal activity in Bogotá (Colombia), Boston (USA), Chicago (USA) and Los Angeles (USA), four very different cities with respect to cultural, urban and socioeconomic conditions. The selected unit of analysis is the census block group, the smallest geographical unit for which the census publishes data, which measures on average 378 square meters. We account for the contextual characteristics around the block group, here called core, by computing a corehood, defined as the set of all the surrounding block groups within a half mile from the core (see Figure 4). Note that neighbouring cores have overlapping corehoods. We tested different sizes of the corehood, finding the half-mile distance to be the best to describe the neighbourhood effect (see the Supplementary Information (SI) Note 11).
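The corehood construction described above can be sketched in a few lines. This is a minimal illustration and not the authors' implementation: it assumes each block group is reduced to a centroid in projected coordinates (metres), and the names `build_corehoods` and `HALF_MILE_M` are hypothetical. Excluding the core from its own corehood, and measuring distance between centroids rather than polygon boundaries, are also assumptions.

```python
import math

HALF_MILE_M = 804.672  # half a mile expressed in metres


def build_corehoods(centroids, radius=HALF_MILE_M):
    """Map each core id to the ids of the surrounding block groups.

    centroids: dict of block-group id -> (x, y) in metres (assumed projected CRS).
    The core itself is excluded from its own corehood (an assumption).
    """
    corehoods = {}
    for core, (cx, cy) in centroids.items():
        corehoods[core] = {
            other
            for other, (ox, oy) in centroids.items()
            if other != core and math.hypot(ox - cx, oy - cy) <= radius
        }
    return corehoods


# Toy example: three block groups on a line, 500 m apart.
centroids = {"a": (0.0, 0.0), "b": (500.0, 0.0), "c": (1000.0, 0.0)}
corehoods = build_corehoods(centroids)
```

In the toy example, the corehoods of "a" and "c" both contain "b", which illustrates why neighbouring cores have overlapping corehoods.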
Criminal activity is provided by police agencies, which record through police reports the geographic location, date, time of day and category of each crime event. We analyse crimes belonging to two broad categories, violent and property crimes, which include homicides, sexual and non-sexual aggravated assaults, robbery, motor vehicle thefts and arson. We assign each crime to a corehood through its position.
In order to estimate the number of crimes in a given core, we compute two types of features. First, we consider the characteristics of the core itself. We include features that were previously found to attract potential offenders and targets Wang et al. (2016), such as the residential population and the number of nightlife, shops and food POIs. Then, to account for the fact that environmental (neighbourhood) characteristics influence crime Jacobs (1961); Sohn (2016), we consider corehood features in our model. We group them in Social disorganization (SD), Built Environment (BE) and Mobility (M) features. The SD characteristics include the disadvantage, instability and ethnic diversity of the corehood. Consistently with the literature Sampson et al. (1999); Sampson and Groves (1989); Sampson (1997); Kubrin and Weitzer (2003), disadvantage and instability are composite variables built from the two largest principal components of: (i) unemployment rate, (ii) poverty rate, defined as people living below the poverty line, and (iii) residential mobility rate, defined as the percentage of people who recently changed residency. Again, in accordance with the literature Sampson (2013); Sampson and Groves (1989); Sampson and Graif (2009), ethnic diversity is computed as the Hirschman-Herfindahl index across six population groups (e.g. Hispanic, Black, White people). Additional details are present in the Methods section. Note that we excluded all race-specific variables that are usually employed (e.g. percentage of black people) to build an evidence-based and race-neutral model.
The BE features are based on Jane Jacobs’ theory Jacobs (1961), which states that four conditions have to hold to ensure a virtuous loop between the presence of people and a vibrant neighbourhood life. First, a district should serve at least two functions so that streets are continuously used by residents and strangers. Second, street blocks should be small and short to ensure both high walkability and frequent meeting of people at street intersections. Third, diverse buildings make it possible to have low- and high-rent spaces, and thus a mixture of people and enterprises. The fourth condition is about dense concentration, which ensures a sufficient presence of people and enterprises to continuously attract dwellers from different neighbourhoods. This idea is summarized by the statement that "a well-used city street is apt to be a safe street and a deserted city street is apt to be unsafe" Jacobs (1961). Moreover, walkability promotes social relations Leyden (2003) and is connected to the local cohesion of neighbours. Thus, in accordance with the literature De Nadai et al. (2016), we operationalize the four conditions as: i) land-use mix; ii) block size; iii) building age diversity; iv) population density; and walkability, which is related to the second condition but also to the density and reachability of POIs in the area. The details of these metrics are available in the Methods section.
The M features are built upon recent mobility and criminology literature. We account for the average number of people at risk in the core by measuring the core ambient population Andresen (2006) and the attractiveness of the corehood, where the latter is measured as the number of trips to the corehood for reasons other than travelling to work or home. Ambient population and attractiveness are computed by simulating realistic urban traces using TimeGeo Jiang et al. (2016), a state-of-the-art model for human mobility, in combination with mobile phone data. We do not include M features for Chicago, as we do not have mobile phone traces for this city.
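As an illustration of the two mobility features, the sketch below counts people who stop in a core for at least one hour (the threshold given in the Methods) and the non-home-based (NHB) trips that end in a corehood. The record layouts, the function names, and the choice to count distinct people are assumptions made for this example; the simulated TimeGeo output may be organised differently.

```python
from collections import Counter

MIN_STAY_HOURS = 1.0  # stay threshold stated in the Methods


def ambient_population(stays):
    """Count distinct people stopping at each core for at least one hour.

    stays: iterable of (person_id, core_id, duration_hours) tuples
    (hypothetical layout).
    """
    people_per_core = {}
    for person, core, hours in stays:
        if hours >= MIN_STAY_HOURS:
            people_per_core.setdefault(core, set()).add(person)
    return {core: len(people) for core, people in people_per_core.items()}


def attractiveness(trips, corehoods):
    """Count NHB trips whose destination core belongs to each corehood.

    trips: iterable of (trip_type, destination_core) tuples,
    with trip_type in {"HBW", "HBO", "NHB"}.
    corehoods: dict of core id -> set of core ids forming its corehood.
    """
    nhb_by_destination = Counter(dest for kind, dest in trips if kind == "NHB")
    return {
        core: sum(nhb_by_destination[member] for member in members)
        for core, members in corehoods.items()
    }


# Toy data: p1 stays twice in core "a", p2 stays there too briefly to count.
stays = [("p1", "a", 2.0), ("p2", "a", 0.5), ("p3", "b", 1.0), ("p1", "a", 3.0)]
trips = [("NHB", "b"), ("HBW", "b"), ("NHB", "c"), ("NHB", "a")]
toy_corehoods = {"a": {"b"}, "b": {"a", "c"}}

ambient = ambient_population(stays)
attract = attractiveness(trips, toy_corehoods)
```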
We model the relation of crime with core and corehood features through a spatially filtered Bayesian Negative Binomial model, which is specifically tailored for discrete data, accounts for the overdispersion of crime events, and models uncertainty. The model accounts for spatial autocorrelation, thus avoiding the biased parameters of non-spatial models Griffith and Peres-Neto (2006); Tiefelsdorf and Griffith (2007). We identify spatial autocorrelation of crime events using a matrix indicating spatial proximity, and by modelling spatial random effects. Specifically, criminal activity is explained by a linear combination of an intercept, fixed effects (i.e. the input features), and random effects, which represent the unexplained variance that emerges from the spatial autocorrelation of neighbouring areas. Although we find high spatial correlation in crime events, we did not find any significant spatial autocorrelation in the residuals of our spatial model (see Note 4 in the SI). The reader can refer to the Methods section for additional details about the model and its formulation.
| Model | Bogotá $R^2_m$ ($R^2_c$) | Bogotá LOO | Boston $R^2_m$ ($R^2_c$) | Boston LOO | Los Angeles $R^2_m$ ($R^2_c$) | Los Angeles LOO | Chicago $R^2_m$ ($R^2_c$) | Chicago LOO |
| Core | 0.54 (0.75) | 3897 | 0.21 (0.64) | 2035 | 0.18 (0.68) | 9665 | 0.09 (0.68) | 8415 |
| Social-disorganization (SD) | 0.57 (0.75) | 3891 | 0.55 (0.68) | 2019 | 0.53 (0.72) | 9529 | 0.66 (0.78) | 8019 |
| Built environment (BE) | 0.61 (0.76) | 3881 | 0.36 (0.68) | 2014 | 0.27 (0.69) | 9629 | 0.21 (0.69) | 8371 |
| Mobility (M) | 0.64 (0.80) | 3804 | 0.42 (0.70) | 2001 | 0.25 (0.70) | 9570 | | |
| SD+BE | 0.64 (0.76) | 3881 | 0.65 (0.72) | 1987 | 0.56 (0.72) | 9508 | | |
| SD+M | 0.66 (0.81) | | 0.67 (0.73) | 1973 | 0.55 (0.73) | 9467 | | |
| BE+M | 0.68 (0.80) | 3819 | 0.50 (0.72) | 1989 | 0.30 (0.70) | 9585 | | |
| SD+BE+M (Full) | | 3808 | | | | | | |
Description and prediction of crime
For each city, we evaluate our model under various feature combinations to assess the contribution of each group of features. We measure the capability of the model to describe crime through the marginal $R^2_m$ Nakagawa et al. (2017), which measures the proportion of variance explained by the fixed effects (i.e. the input features). As a reference, we also measure the conditional $R^2_c$ Nakagawa et al. (2017), which takes into account the variance explained by both the fixed and the random effects (i.e. the spatial autocorrelation). Additionally, we use the Pareto-smoothed importance sampling Leave-One-Out cross-validation (LOO) Vehtari et al. (2017) to assess the pointwise out-of-sample prediction accuracy (the higher, the better).
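For reference, the marginal and conditional $R^2$ of Nakagawa et al. (2017) are usually written in the general form below. The notation is assumed for this sketch and is not taken from this paper: $\sigma^2_f$ is the variance of the fixed-effect predictions, $\sigma^2_\alpha$ the variance of the random effects, and $\sigma^2_\varepsilon$ the distribution-specific observation-level variance.

```latex
R^2_m = \frac{\sigma^2_f}{\sigma^2_f + \sigma^2_\alpha + \sigma^2_\varepsilon},
\qquad
R^2_c = \frac{\sigma^2_f + \sigma^2_\alpha}{\sigma^2_f + \sigma^2_\alpha + \sigma^2_\varepsilon}
```

Under this form the conditional value is never smaller than the marginal one, and the gap between the two reflects the share of variance attributed to the random effects.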
First, we evaluate the baseline model that includes only the core variables. Table 1 shows that the core-only model performs poorly in Chicago, Los Angeles and Boston, while it has a high $R^2_m$ in Bogotá. The difference between $R^2_m$ and $R^2_c$ highlights that in all cities there is a significant unexplained variance that is captured by the spatial random effects, but not by the input features.
The SD, BE and M features significantly increase the explanatory power of our model. Particularly, in US cities, the $R^2_m$ increases by up to 161%, 194% and 633% in Boston, Los Angeles and Chicago, respectively. Notably, and not surprisingly, the SD features are very important, especially in Chicago, where the "Chicago school" forged the Social Disorganization theory and further elaborated the role of collective efficacy in dealing with crime. In contrast, the increase in Bogotá is less pronounced, suggesting that the neighbourhood impact on crime is limited. Turning to M and BE features, we find that they describe crime, but they are often not as meaningful as the SD features for crime prediction. However, the importance of mobility confirms the importance of the floating population in describing the microdynamic behaviour of criminal activity Caminha et al. (2017). We observe that in all cities the conditional $R^2_c$ increases when adding the SD, BE and M features, revealing that the included variables also help explain the variance of crime across cores.
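The quoted percentages can be checked against Table 1. The short sketch below assumes they are the relative increase of the marginal value from the Core model to the SD-only model; this reading is ours, and it reproduces the reported figures up to rounding.

```python
def relative_increase(baseline, value):
    """Relative increase of `value` over `baseline`, in percent."""
    return (value - baseline) / baseline * 100.0


# Marginal values of the Core and SD models, copied from Table 1.
core = {"Boston": 0.21, "Los Angeles": 0.18, "Chicago": 0.09}
sd = {"Boston": 0.55, "Los Angeles": 0.53, "Chicago": 0.66}

increases = {city: relative_increase(core[city], sd[city]) for city in core}
for city, pct in increases.items():
    print(f"{city}: +{pct:.1f}%")
```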
Overall, Table 1 shows that considering SD, BE and M variables together results in the highest descriptive ($R^2_m$) and predictive (LOO) performance. This result means that, in order to model crime, one needs to account for multiple aspects of urban life, including Social Disorganization, the physical characteristics of the neighbourhoods, and mobility. This result also holds against different combinations of the features (i.e. SD+BE, SD+M and BE+M). Nonetheless, some of the SD+BE and SD+M models are very competitive and might be considered when not all data sources are available. Particularly, the ambient population (i.e. the average number of people who stop at the core) is one of the most important variables in the model and allows us to better assess the number of people at risk, as suggested by previous works on aggregated mobility Caminha et al. (2017), satellite imagery Andresen (2006), Twitter Malleson and Andresen (2015) and census data Mburu and Helbich (2016). However, we found that it might generate large errors due to places that are outliers of mobility in densely populated areas or hotspots of activity (see Figure S7 and Figure S8 in the SI).
$R^2_m$ improvements indicate that the model relies less on the random effects and is better at explaining crime from the input features. Figure 2 shows the spatial gain in performance from the baseline in Bogotá. First, it reveals that our Full model prediction resembles the ground truth data (Figure 2 D-E), as confirmed by the high value of $R^2$. Second, it shows that, while the SD and BE models achieve localized improvements (Figure 2 A-B), the Full model improves the prediction almost everywhere. However, the Full model performs quite poorly in a specific area of Bogotá (see Figure 2 C), part of the Engativá neighbourhood. By inspecting the coefficients of the model, we find that this area is an outlier as it is densely populated, thus resulting in an inflated prediction of crime, due to the high importance of residential and ambient population in the Bogotá model. Note, however, that our prediction is at the block level and the citywide goodness of fit is .
The difference between $R^2_m$ and $R^2_c$ represents the unexplained variance due to spatial autocorrelation, which might suggest missing effects and variables. In Bogotá, our model points out that the touristic and dangerous neighbourhood La Candelaria, and the populous district of Engativá have significant unexplained variance that our input features cannot capture (see Figure S4 in the SI). In Boston, the area near Franklin Park indicates missing local factors (see Figure S3 in SI). In Los Angeles, unexplained variance seems to be tied to places with a large number of people, namely the international airport and the UCLA campus (see Figure S5 in SI). Again, in Chicago, missing variables are suggested near the prison and the southern area (see Figure S6 in SI). Altogether, these signals could help policymakers include the most relevant factors for each city and enact policies that prevent crime.
Previous results suggested that the use of mobility flows between different regions might help describe crime Wang et al. (2016); Wang and Li (2017). Thus, we test our model against this hypothesis by using the Origin-Destination matrix of people’s trips to model the autocorrelation between corehoods. The idea here is that human mobility might better explain the relation between corehoods than geographical closeness. However, we find that mobility flows significantly worsen the performance of our model (see Note 5 of SI).
While the effects of urban environment characteristics, socioeconomic conditions, and mobility have been empirically tested separately De Nadai et al. (2016); Graif et al. (2017); Lee et al. (2017a); Sung and Lee (2015); Sampson (1997), to the best of our knowledge, this is the first study to support with large-scale data the association of crime with socioeconomic conditions, the built environment, and mobility. However, we find that these aspects do not play the same role across cities, and only some of them contribute to the crime prediction model.
Neighborhood variables across cities
In this section, we turn our attention to the standardized coefficients that reveal how features correlate with criminal activity.
First, we focus on the coefficients of the Full model, which combines socioeconomic features with the characteristics of the built environment and human mobility. Note that here Chicago is excluded for lack of data. Figure 3 shows that the coefficients vary greatly across cities. For example, land-use mix correlates negatively with criminal activity in Bogotá and Los Angeles, but positively in Boston. Similarly, higher population building age diversity is present in low-crime areas in Boston and Los Angeles, but in high-crime areas in Bogotá. Social disorganization variables are no less different, as corehood instability is correlated with crime activity only in Bogotá, differently from what is expected from the theory Shaw and McKay (1942); Sampson and Groves (1989).
The discrepancies between cities could be explained by the different spatial and socioeconomic processes at play. When we look at the bivariate correlations across features, we observe interesting patterns. For example, in Los Angeles and Boston, walkability is strongly positively correlated with population density and neighbourhood attractiveness, as expected Shaw and McKay (1942); Sampson and Groves (1989), and slightly correlated with advantaged neighbourhoods. In contrast, walkable areas in Bogotá have low population density and are highly advantaged, while attractiveness is only slightly correlated with walkability (see Figure S11 in SI). A possible reason for the disagreement between coefficients lies in the multicollinearity of the input features. Although we use the QR decomposition and a Ridge penalty to shrink the variables that are not necessary, the difference between the coefficients is also present in simpler models.
The difference between the results across cities also suggests that crime correlates differently with space and people. For example, we observe that in Bogotá high-crime areas relate to advantaged neighbourhoods, while in Boston and Los Angeles higher crime seems to be linked to disadvantaged neighbourhoods, in accordance with the theory Shaw and McKay (1942); Sampson and Groves (1989). A possible explanation might be related to underreporting and police disrespect, which seem to be a problem particularly in Bogotá Godoy et al. (2018). However, the literature has shown how neighbourhood cultural codes, informal local control, and problematic policing are also related to violent criminal activities Kubrin and Weitzer (2003).
However, some features behave similarly in all the cities. We find that corehoods with high disadvantage and ethnic diversity but, surprisingly, smaller blocks have higher crime activity. In the core, instead, we find that the presence of shops, food POIs, and population (both residential and ambient) correlates positively with criminal activity. These results resonate with literature showing that the presence of POIs and ambient population increases crime due to a higher number of potential targets and offenders in an area. Additionally, we find that corehood attractiveness has a strong connection with crime, suggesting that the presence of people who neither live nor work in the area might influence crime. This result is in contrast with literature based on Jacobs’ theory Jacobs (1961); Traunmueller et al. (2014), but resonates with Oscar Newman’s, which argues that a high number of visitors results in higher anonymity and, thus, crime Newman (1972). Additionally, a recent empirical study based on survey data Boivin and Felson (2018) agrees with our result, which was obtained instead with large-scale and passively collected information. In the supplementary materials (SI), we compare all the cities in detail.
To test the possibility of having a universal model that predicts crime, we test a model that uses only the features that behave in the same direction in all the cities. This model consistently performs worse than the Full model (see Note 10 in SI), showing that, at this moment, no single model can be easily applied to all cities. We also studied to what extent a model trained in one city can be applied to another city. We found that US cities are, as expected, more similar to each other than to Bogotá, and that Los Angeles behaves similarly to Chicago.
Discussion
In this paper, we modelled the presence of crime across four cities, widely different with respect to cultural, economic, historical and geographical aspects. We found that the variability of the dynamics and history of each city poses a challenge to the existence of a model that "fits it all", able to learn from one city and to predict on another one. Instead, we presented a model that could describe and disentangle the role of diverse factors in urban crime and draw some theoretical and practical implications.
The goal of this research goes beyond crime prediction in time (i.e. forecasting). Offences are concentrated in a small number of places Lee et al. (2017b), and are tightly coupled with places that are stable over time Weisburd et al. (2012). Thus, the easiest way to predict crime is modelling those few places with the highest number of crimes, also known as hotspots Bogomolov et al. (2014); Short et al. (2010). On the contrary, we seek to shed light on the diverse set of factors at play in urban crime and to make predictions for those areas without crime statistics (i.e. nowcasting).
Our cumulative results show little evidence in support of Jane Jacobs’ theory, which argues that specific urban features and people on the street generate higher security. On the contrary, we often found that Jacobs’ features and urban vibrancy increase people’s vulnerability to crime, suggesting that further work has to be done in this direction.
We found that different theories often seen as competing can complement each other in models that take into account the socioeconomic, built environment and mobility conditions together. The importance of mobility and built environment characteristics showed that competitive descriptive and predictive models can be built from data available at large scale without the necessity of costly in-field survey studies. However, we found that aspects related to social disorganisation are important for crime description and prediction. Therefore, it is crucial to consider alternative sources of data to infer social cohesion and interactions and overcome the use of census information, which is costly to collect and rarely updated. There have been multiple attempts at inferring social interactions Eagle et al. (2009), poverty Blumenstock et al. (2015), well-being Pappalardo et al. (2016) and unemployment Toole et al. (2015), but so far very little work has been done at micro spatial levels.
Comparing multiple cities in different countries does not come without limitations. First, our analysis ignores temporal variation such as opening times of POIs or temporal variation in mobility. Second, due to a lack of consistent data, we did not account for variables such as political and housing policies, security perception, community participation, and social ties within family and within neighbourhoods that were previously found to be related to crime Faust and Tita (2019); Salesses et al. (2013); Tran et al. (2013). Finally, official crime data do not come without errors, given that not all crimes are reported or recorded Small (2018), and there is no "ground truth" data to gauge any bias in police records.
Our work seeks to make headway on the previous limitation of a single site of study origin. While recent works have started the use of street units and blocks to study criminal activity Contreras (2017); Hipp et al. (2019); Kim and Hipp (2020); Rosser et al. (2017), they often relied on a small subset of variables and one city. Analysing multiple cities together exposed criminology theories to discrepancies and differences. Descriptive modelling can help policymakers to understand the use of urban space and deploy future investments and resources thoughtfully. Moreover, from the scientific perspective, descriptive modelling can provide insights for strong predictors, and potentially for explanatory variables, to be further investigated by explanatory modelling and experiments Kenett et al. (2018). Thus, we hope that additional research keeps exploring multidimensional aspects related to crime, to clarify potential crime causes and design better cities.
Methods
The socioeconomical and Jane Jacobs’ urban theories are dependent upon the actions and activities at work in communities. Thus, we identified corehoods as social and geographical units of analysis. Then, we obtained and aggregated the data for each corehood of Bogotá, Boston, Los Angeles and Chicago.
Crime data
Data collection mechanisms and crime categories can vary from country to country. The Uniform Crime Reporting (UCR) Program (https://ucr.fbi.gov/) is a US statistical effort to make crime reports uniform across the country. The UCR divides crime into two main groups: Part 1 and Part 2 offences. The former is composed of violent crimes (aggravated assault, forcible rape, robbery and murder) and property crimes (larceny-theft, motor vehicle theft, burglary and arson), while the latter are considered less serious and include offences such as simple assaults and nuisance crimes. For each city we thus collect the georeferenced data of committed crimes and filter out those crimes not belonging to Part 1 of the UCR, similarly to most of the criminology literature. We categorized crimes in Bogotá consistently with UCR categories and released the mapping for future comparisons. We reference crimes to cores and, when a crime event happens in a street segment shared between cores, we evenly assign the event to both cores. Due to the limited accuracy of GPS positioning, we create a buffer of 30 meters for each crime, which is the distance usually employed for stop location detection algorithms De Nadai et al. (2019). More details are presented in the SI. We summed crime events over one year to minimize seasonal fluctuations.
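The assignment rule for crimes on shared street segments can be illustrated as follows. This sketch reads "evenly assign" as equal fractional shares, which is an assumption (counting the event once in each core is another possible reading). It also assumes that the intersection between each 30 m buffer and the cores has been computed upstream, for example in a GIS; the function name `assign_crimes` is hypothetical.

```python
from collections import defaultdict


def assign_crimes(crime_to_cores):
    """Split each crime evenly among the cores its buffer touches.

    crime_to_cores: dict of crime id -> list of core ids intersecting
    the crime's buffer (assumed to be precomputed).
    """
    counts = defaultdict(float)
    for cores in crime_to_cores.values():
        if not cores:
            continue  # the crime falls outside every core
        share = 1.0 / len(cores)
        for core in cores:
            counts[core] += share
    return dict(counts)


# Toy example: c2 lies on a segment shared by cores "a" and "b".
counts = assign_crimes({"c1": ["a"], "c2": ["a", "b"], "c3": []})
```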
Mobile phone data
We computed the ambient population and the OD matrices for Bogotá, Boston and Los Angeles through the TimeGeo modelling framework Jiang et al. (2016). We fitted the model starting from aggregated and anonymized Call Detail Records (CDRs) collected from December 1, 2013 to May 31, 2014, 6 weeks in 2010, and October 15, 2012 to November 24, 2012 for Bogotá, Boston and Los Angeles respectively. The anonymized data for the three cities was collected for billing purposes by two mobile operators, who also kindly provided us with the data for the present research. TimeGeo is an agent-based model that simulates the activity of people from mobile phone data. To be consistent with the travel surveys of each city, it simulates the time, duration, direction and type of travel within the city. The types of travel are classified as Home-Based from/to Work (HBW), Home-Based from/to Other type of locations (HBO) and Non-Home-Based from/to Other type of locations (NHB). To build the ambient population we counted the number of people who stop at a specific location for at least one hour, while we built the corehood attractiveness by counting the number of NHB trips with the corehood as destination.
Spatial and census data
Census blocks, population, employment and poverty for US cities were drawn from the American Community Survey (ACS) (https://www.census.gov/programs-surveys/acs). For US cities we also used some city-specific datasets that are described in the SI. The census data of Bogotá was obtained from the Departamento Administrativo Nacional de Estadística (DANE), which organized the 2005 general census for the city (http://www.dane.gov.co). The poverty data of Bogotá was extracted from the Sisbén in the Identification System III of 2014. The detailed list of datasets and URLs is provided in the SI.
Built environment features
We operationalize the Jane Jacobs conditions through state-of-the-art metrics defined in the literature De Nadai et al. (2016). The land-use mix is computed as the average entropy among land uses: $\mathrm{LUM}_i = -\sum_{j \in n} \frac{P_{i,j} \log P_{i,j}}{\log |n|}$, where $P_{i,j}$ is the percentage of square meters having land use $j$ in unit $i$, and $n$ represents the considered land uses in the metric. The LUM ranges between 0, wherein the unit is composed of only one land use (e.g. residential), and 1, wherein the developed area is equally shared among the land uses.
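A minimal sketch of the land-use mix follows, assuming it is the Shannon entropy of the land-use shares normalised by the logarithm of the number of considered land uses, which gives the 0 to 1 range described above. The function name and input layout are hypothetical.

```python
import math


def land_use_mix(areas):
    """Normalised Shannon entropy of land-use areas in one unit.

    areas: dict of land-use name -> square metres. Every considered land
    use should appear as a key, with 0 for uses absent from the unit.
    """
    total = sum(areas.values())
    n_uses = len(areas)
    if total <= 0 or n_uses < 2:
        return 0.0
    entropy = 0.0
    for area in areas.values():
        if area > 0:  # a zero share contributes nothing to the entropy
            share = area / total
            entropy -= share * math.log(share)
    return entropy / math.log(n_uses)


single_use = land_use_mix({"residential": 100.0, "commercial": 0.0, "park": 0.0})
even_mix = land_use_mix({"residential": 50.0, "commercial": 50.0, "park": 50.0})
```

A unit with a single land use scores 0, and a unit whose area is shared equally among all considered uses scores 1.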
Then, for each corehood we determine the walkability through the accessibility of the core to the nearest points of interest (e.g. convenience stores, restaurants, sport facilities). Consistently with literature wal , we define the weighted walkability score as: , where is the set of categories (i.e., Food, Shops, Grocery, Schools, Entertainment, Parks and outside, Coffee, Banks, Books), is the street-network distance decay function, and is the set of POIs of category . The distance decay function gives a weight (importance) to each POI reachable from a starting point. Additional information about the walkability score can be found in the SI.
We then compute the average block area among the set of blocks $B_i$ in unit $i$ as $\frac{1}{|B_i|} \sum_{b \in B_i} \mathrm{area}(b)$, and the building age diversity as the standard deviation of building ages in the corehood.
Finally, we operationalize Jacobs’ density condition with the dwelling units density, computed from census data. Additional details are described in the SI.
Social disorganization
We create the disadvantage and instability features through the two largest PCA principal components of: (i) unemployment rate, (ii) poverty rate, defined as people living below the poverty line, and (iii) residential mobility rate, defined as the percentage of people who recently changed residency (one year for US cities and five years for Bogotá). From the loadings of the PCA linear combination we verified that disadvantage is mainly a linear combination of poverty rate and unemployment, while instability is mainly about residential mobility rate.
In the Socialdisorganization variables we do not include any ethnicspecific variables (e.g. percentage of black people) other than diversity because they might be present only in some places and not in others (e.g. native Americans in Bogotá), and to avoid any ethnicspecific bias. Ethnic diversity represents the difficulties of a community to communicate and collaborate for a common goal. Accordingly to the literature, it is computed as the HirschmanHerfindahl diversity index of six population groups , where is the proportion of people belonging to the ethnicity , and is the number of ethnicities. Consistently with the literature we include for US cities: Hispanics, nonHispanic Blacks, Whites, Asians, Native Hawaiians  Pacific Islanders and others. For Bogotá we include: Indigenous, Rom, Islanders (San Andrés), Palenquero, Black and others.
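A diversity index of the Hirschman-Herfindahl family can be sketched as follows. The exact form used here, one minus the sum of squared group proportions, is an assumption (it is the form commonly used for ethnic heterogeneity); the function name `ethnic_diversity` is hypothetical.

```python
def ethnic_diversity(group_counts):
    """Diversity index: 1 - sum of squared population shares.

    `group_counts` is a sequence with the number of residents in each
    population group. The result is 0.0 when everyone belongs to one group
    and approaches 1 - 1/N when N groups are equally represented.
    """
    total = sum(group_counts)
    if total <= 0:
        raise ValueError("the unit must have a positive population")
    # Assumption: diversity is measured as one minus the concentration index.
    return 1.0 - sum((count / total) ** 2 for count in group_counts)
```

With six equally sized groups the index equals 5/6, the maximum attainable for six groups under this form.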
Bayesian model
Let $y_i$ be the discrete number of crimes in region $i$, for a set of spatial regions $i = 1, \dots, n$. We approximate the relation between crimes and spatial features through a Negative Binomial approach that models the non-negative nature of the crime counts in a city, as well as the overdispersion found in the data (Note 4 in the SI). Specifically, $y_i \sim \mathrm{NB}(\mu_i, \phi)$ with $\log \mu_i = \mathbf{x}_i^\top \boldsymbol{\beta} + \theta_i$, where $\mathbf{x}_i$ is the input data and $\boldsymbol{\beta}$ the coefficients of the model. $\theta_i$ are the random effects that account for the unexplained variability of crime (i.e. the spatial autocorrelation). In this paper, we account for the spatial autocorrelation with the Bayesian Spatial Filtering (BSF) Hughes (2017), which defines $\boldsymbol{\theta} = \mathbf{E}\boldsymbol{\gamma}$, where $\boldsymbol{\gamma}$ are coefficients to be found. $\mathbf{E}$ is instead defined as the first principal components of $\mathbf{M}\mathbf{W}\mathbf{M}$, where $\mathbf{W}$ is a spatial matrix that describes the graph between spatial locations, while $\mathbf{M}$ is a projection matrix chosen so that the filter approximates the spatial error model Tiefelsdorf and Griffith (2007). We tested for the presence of spatial autocorrelation in the residuals of all the spatial models without finding significant autocorrelation. As the results might change with different definitions of $\mathbf{W}$, we tested all the models for three definitions: i) $\mathbf{W}$ is a binary adjacency matrix identifying whether a corehood overlaps another corehood, ii) $\mathbf{W}$ is an inverse distance matrix between corehoods, iii) $\mathbf{W}$ describes the flow of people between corehoods, which is extracted from mobile phone data. We found that the binary matrix consistently outperforms the other definitions. Additional details of the presented models, the definition of $\mathbf{W}$, and the other competing models tested are presented in the SI.
As we have to account for collinearity, we apply a Ridge penalty to all fixed effects.
Model calibration and evaluation
Model calibration is carried out by means of a Markov Chain Monte Carlo (MCMC) approach. Convergence was verified through the Gelman-Rubin convergence statistic, after discarding the first 15,000 iterations and running the model for 5,000 iterations.
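The text does not state which variant of the Gelman-Rubin statistic was used. As a minimal sketch, the classic potential scale reduction factor for a single parameter can be computed from several chains of equal length as below; the function name `gelman_rubin` is an assumption for this example.

```python
from statistics import mean, variance


def gelman_rubin(chains):
    """Classic potential scale reduction factor (R-hat) for one parameter.

    `chains` is a list of equally long lists of post-warm-up draws.
    Values close to 1 suggest that the chains have mixed.
    """
    m = len(chains)
    n = len(chains[0])
    if m < 2 or n < 2 or any(len(chain) != n for chain in chains):
        raise ValueError("need at least two chains of equal length >= 2")
    chain_means = [mean(chain) for chain in chains]
    # W: average of the within-chain sample variances.
    within = mean([variance(chain) for chain in chains])
    # B: n times the sample variance of the chain means.
    between = n * variance(chain_means)
    # Pooled estimate of the posterior variance.
    pooled = (n - 1) / n * within + between / n
    return (pooled / within) ** 0.5
```

Chains that explore different regions yield a value well above 1, which is the usual signal that more iterations are needed.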
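We assess how well the models describe crime through the conditional and the marginal $R^2$ Nakagawa et al. (2017), which adapt the popular coefficient of determination to generalized linear mixed-effects models. They are defined as:
$R^2_{\mathrm{marginal}} = \frac{\sigma_f^2}{\sigma_f^2 + \sigma_\alpha^2 + \sigma_\varepsilon^2}, \qquad R^2_{\mathrm{conditional}} = \frac{\sigma_f^2 + \sigma_\alpha^2}{\sigma_f^2 + \sigma_\alpha^2 + \sigma_\varepsilon^2}$
where $\sigma_f^2$ is the variance explained by the fixed effects, $\sigma_\alpha^2$ is the variance explained by the random effects, and $\sigma_\varepsilon^2$ is the variance of the residuals. Specifically, $\sigma_\varepsilon^2$ is specific to the Negative Binomial and is defined, following Nakagawa et al. (2017), in terms of the mean of the counts and of the shape parameter of the Negative Binomial distribution.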
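Given the three variance components described above, the two coefficients follow directly. The sketch below assumes the components have already been estimated; the function name `nakagawa_r2` and its argument names are hypothetical.

```python
def nakagawa_r2(var_fixed, var_random, var_residual):
    """Marginal and conditional R^2 for a mixed-effects model.

    var_fixed:    variance explained by the fixed effects
    var_random:   variance explained by the random effects
    var_residual: residual (distribution-specific) variance
    """
    total = var_fixed + var_random + var_residual
    if total <= 0:
        raise ValueError("the total variance must be positive")
    marginal = var_fixed / total                    # fixed effects only
    conditional = (var_fixed + var_random) / total  # fixed + random effects
    return marginal, conditional
```

The conditional value can never be smaller than the marginal one, since it adds the variance captured by the random effects to the numerator.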
We assess the out-of-sample predictive accuracy through Pareto-smoothed importance sampling Leave-One-Out cross-validation (LOO) Vehtari et al. (2017), which is the state of the art for evaluating Bayesian models.
Data Availability
We are pleased to make available the source code and datasets accompanying this research. The project files are available at https://github.com/denadai2/bayesiancrimemultiplecities/.
Acknowledgements
We thank Paolo Bosetti and Junpeng Lao for their helpful comments. We especially thank Andrés Clavijo for his support on the data; we all hope that this work can make Bogotá better. This work was supported by the Berkeley DeepDrive and the ITS Berkeley 2018-19 SB1 Research Grant (to M.C.G.); the French Development Agency and the World Bank (to M.D.N., B.L. and E.L.).
Author contributions statement
M.D.N., E.L., M.C.G. and B.L. designed research and experiments; M.D.N., Y.X., M.C.G. and B.L. performed research and experiments; M.D.N., M.C.G. and B.L. contributed new analytic tools; M.D.N. and Y.X. analysed the data; and M.D.N., M.C.G. and B.L. wrote the paper. All authors read, reviewed and approved the final manuscript.
Competing Interests
The authors declare no competing interests.
References
 Graif and Sampson (2009) C. Graif and Robert J. Sampson, “Spatial Heterogeneity in the Effects of Immigration and Diversity on Neighborhood Homicide Rates,” Homicide Studies 13, 242–260 (2009).
 Sampson and Groves (1989) Robert J. Sampson and W. Byron Groves, “Community structure and crime: Testing socialdisorganization theory,” American Journal of Sociology 94, 774–802 (1989).
 Sampson (1997) Robert J. Sampson, “Neighborhoods and Violent Crime: A Multilevel Study of Collective Efficacy,” Science 277, 918–924 (1997).
 Newman (1972) Oscar Newman, Defensible space (Macmillan New York, 1972).
 Jacobs (1961) Jane Jacobs, The death and life of great American cities (Vintage, 1961).
 Wang et al. (2018) Qi Wang, Nolan Edward Phillips, Mario L Small, and Robert J Sampson, “Urban mobility and neighborhood isolation in america’s 50 largest cities,” PNAS 115, 7735–7740 (2018).
 Cohen and Felson (1979) Lawrence E Cohen and Marcus Felson, “Social change and crime rate trends: A routine activity approach,” American sociological review , 588–608 (1979).
 Felson and Clarke (1998) Marcus Felson and Ronald V Clarke, “Opportunity makes the thief,” Police research series, paper 98, 1–36 (1998).
 Brantingham and Brantingham (1993) Patricia L Brantingham and Paul J Brantingham, “Nodes, paths and edges: Considerations on the complexity of crime and the physical environment,” Journal of environmental psychology 13, 3–28 (1993).
 Felson and Boba (2010) Marcus Felson and Rachel L Boba, Crime and everyday life (Sage, 2010).
 Hindelang et al. (1978) Michael J Hindelang, Michael R Gottfredson, and James Garofalo, Victims of personal crime: An empirical foundation for a theory of personal victimization (Ballinger Cambridge, MA, 1978).
 O’Brien and Sampson (2015) Daniel Tumminelli O’Brien and Robert J Sampson, “Public and private spheres of neighborhood disorder: Assessing pathways to violence using largescale digital records,” Journal of research in crime and delinquency 52, 486–510 (2015).
 Murray and Roncek (2008) Rebecca K Murray and Dennis W Roncek, “Measuring diffusion of assaults around bars through radius and adjacency techniques,” Criminal Justice Review 33, 199–220 (2008).
 Salesses et al. (2013) Philip Salesses, Katja Schechtner, and César A Hidalgo, “The collaborative image of the city: mapping the inequality of urban perception,” PloS one 8 (2013).
 Sampson (1985) Robert J. Sampson, “Neighborhood and crime: The structural determinants of personal victimization,” Journal of Research in Crime and Delinquency 22, 7–40 (1985).
 Hipp et al. (2013) John R Hipp, Carter T Butts, Ryan Acton, Nicholas N Nagle, and Adam Boessen, “Extrapolative simulation of neighborhood networks based on population spatial distribution: Do they predict crime?” Social Networks 35, 614–625 (2013).
 Song et al. (2019) Guangwen Song, Wim Bernasco, Lin Liu, Luzi Xiao, Suhong Zhou, and Weiwei Liao, “Crime feeds on legal activities: Daily mobility flows help to explain thieves’ target location choices,” Journal of Quantitative Criminology 35, 831–854 (2019).
 Wang et al. (2016) Hongjian Wang, Daniel Kifer, Corina Graif, and Zhenhui Li, “Crime rate inference with big data,” in ACM SIGKDD, KDD ’16 (ACM, New York, NY, USA, 2016) pp. 635–644.
 Andresen (2006) Martin A Andresen, “Crime measures and the spatial analysis of criminal activity,” British Journal of criminology 46, 258–285 (2006).
 Andresen (2011) Martin A Andresen, “The ambient population and crime analysis,” The Professional Geographer 63, 193–212 (2011).
 Bogomolov et al. (2014) Andrey Bogomolov, Bruno Lepri, Jacopo Staiano, Nuria Oliver, Fabio Pianesi, and Alex Pentland, “Once upon a crime: towards crime prediction from demographics and mobile data,” in ICMI (ACM, 2014) pp. 427–434.
 Malleson and Andresen (2015) Nick Malleson and Martin A Andresen, “Spatiotemporal crime hotspots and the ambient population,” Crime science 4, 10 (2015).
 Sohn (2016) DongWook Sohn, “Residential crimes and neighbourhood built environment: Assessing the effectiveness of crime prevention through environmental design (cpted),” Cities 52, 86–93 (2016).
 Sampson et al. (1999) Robert J Sampson, Jeffrey D Morenoff, and Felton Earls, “Beyond social capital: Spatial dynamics of collective efficacy for children,” American sociological review , 633–660 (1999).
 Kubrin and Weitzer (2003) Charis E Kubrin and Ronald Weitzer, “Retaliatory homicide: Concentrated disadvantage and neighborhood culture,” Social problems 50, 157–180 (2003).
 Sampson (2013) Robert J Sampson, “The place of context: a theory and strategy for criminology’s hard problems,” Criminology 51, 1–31 (2013).
 Sampson and Graif (2009) Robert J Sampson and Corina Graif, “Neighborhood social capital as differential social organization: Resident and leadership dimensions,” American Behavioral Scientist 52, 1579–1605 (2009).
 Leyden (2003) Kevin M Leyden, “Social capital and the built environment: the importance of walkable neighborhoods,” American journal of public health 93, 1546–1551 (2003).
 De Nadai et al. (2016) Marco De Nadai, Jacopo Staiano, Roberto Larcher, Nicu Sebe, Daniele Quercia, and Bruno Lepri, “The Death and Life of Great Italian Cities: A Mobile Phone Data Perspective,” in WWW (International World Wide Web Conferences Steering Committee, Republic and Canton of Geneva, Switzerland, 2016) pp. 413–423.
 Jiang et al. (2016) Shan Jiang, Yingxiang Yang, Siddharth Gupta, Daniele Veneziano, Shounak Athavale, and Marta C. González, “The TimeGeo modeling framework for urban mobility without travel surveys,” PNAS 113, E5370–E5378 (2016).

 Griffith and Peres-Neto (2006) Daniel A Griffith and Pedro R Peres-Neto, “Spatial modeling in ecology: the flexibility of eigenfunction spatial analyses,” Ecology 87, 2603–2613 (2006).
 Tiefelsdorf and Griffith (2007) Michael Tiefelsdorf and Daniel A Griffith, “Semiparametric filtering of spatial autocorrelation: the eigenvector approach,” Environment and Planning A 39, 1193–1221 (2007).
 Nakagawa et al. (2017) Shinichi Nakagawa, Paul CD Johnson, and Holger Schielzeth, “The coefficient of determination R² and intra-class correlation coefficient from generalized linear mixed-effects models revisited and expanded,” Journal of the Royal Society Interface 14, 20170213 (2017).
 Vehtari et al. (2017) Aki Vehtari, Andrew Gelman, and Jonah Gabry, “Practical bayesian model evaluation using leaveoneout crossvalidation and waic,” Statistics and computing 27, 1413–1432 (2017).
 Caminha et al. (2017) Carlos Caminha, Vasco Furtado, Tarcisio HC Pequeno, Caio Ponte, Hygor PM Melo, Erneson A Oliveira, and José S Andrade Jr, “Human mobility in large cities as a proxy for crime,” PloS one 12, e0171609 (2017).
 Mburu and Helbich (2016) Lucy W. Mburu and Marco Helbich, “Crime Risk Estimation with a CommuterHarmonized Ambient Population,” Annals of the American Association of Geographers 106, 804–818 (2016).
 Wang and Li (2017) Hongjian Wang and Zhenhui Li, “Region representation learning via mobility flow,” in Proceedings of the 2017 ACM on Conference on Information and Knowledge Management (2017) pp. 237–246.
 Graif et al. (2017) Corina Graif, Alina Lungeanu, and Alyssa M Yetter, “Neighborhood isolation in chicago: Violent crime effects on structural isolation and homophily in interneighborhood commuting networks,” Social Networks (2017).
 Lee et al. (2017a) Sugie Lee, Chisun Yoo, Jaehyun Ha, and Jeemin Seo, “Are perceived neighbourhood built environments associated with social capital? Evidence from the 2012 Seoul survey in South Korea,” International Journal of Urban Sciences , 1–17 (2017a).
 Sung and Lee (2015) Hyungun Sung and Sugie Lee, “Residential built environment and walking activity: Empirical evidence of jane jacobs’ urban vitality,” Transportation Research Part D: Transport and Environment 41, 318–329 (2015).
 Shaw and McKay (1942) Clifford Robe Shaw and Henry Donald McKay, Juvenile delinquency and urban areas. (University of Chicago Press, 1942).
 Godoy et al. (2018) Juan Felipe Godoy, C Rodriguez, and H Zuleta, “Security and sustainable development in bogota, colombia,” Geneva: DCAF (2018).
 Traunmueller et al. (2014) Martin Traunmueller, Giovanni Quattrone, and Licia Capra, “Mining mobile phone data to investigate urban crime theories at scale,” in International Conference on Social Informatics (Springer, 2014) pp. 396–411.
 Boivin and Felson (2018) Remi Boivin and Marcus Felson, “Crimes by visitors versus crimes by residents: The influence of visitor inflows,” Journal of Quantitative Criminology 34, 465–480 (2018).
 Lee et al. (2017b) YongJei Lee, John E. Eck, SooHyun O, and Natalie N. Martinez, “How concentrated is crime at places? a systematic review from 1970 to 2015,” Crime Science 6, 6 (2017b).
 Weisburd et al. (2012) David Weisburd, Elizabeth R Groff, and SueMing Yang, The criminology of place: Street segments and our understanding of the crime problem (Oxford University Press, 2012).
 Short et al. (2010) Martin B Short, P Jeffrey Brantingham, Andrea L Bertozzi, and George E Tita, “Dissipation and displacement of hotspots in reactiondiffusion models of crime,” PNAS (2010).
 Eagle et al. (2009) Nathan Eagle, Alex Sandy Pentland, and David Lazer, “Inferring friendship network structure by using mobile phone data,” PNAS 106, 15274–15278 (2009).
 Blumenstock et al. (2015) Joshua Blumenstock, Gabriel Cadamuro, and Robert On, “Predicting poverty and wealth from mobile phone metadata,” Science 350, 1073–1076 (2015).
 Pappalardo et al. (2016) Luca Pappalardo, Maarten Vanhoof, Lorenzo Gabrielli, Zbigniew Smoreda, Dino Pedreschi, and Fosca Giannotti, “An analytical framework to nowcast wellbeing using mobile phone data,” International Journal of Data Science and Analytics 2, 75–92 (2016).
 Toole et al. (2015) Jameson L. Toole, Yuru Lin, Erich Muehlegger, Daniel Shoag, Marta C. González, and David Lazer, “Tracking employment shocks using mobile phone data,” Journal of The Royal Society Interface 12, 20150185 (2015), arXiv:1505.06791 .
 Faust and Tita (2019) Katherine Faust and George E Tita, “Social networks and crime: Pitfalls and promises for advancing the field,” Annual Review of Criminology 2, 99–122 (2019).
 Tran et al. (2013) Van C Tran, Corina Graif, Alison D Jones, Mario L Small, and Christopher Winship, “Participation in context: Neighborhood diversity and organizational involvement in boston,” City & Community 12, 187–210 (2013).
 Small (2018) Mario L Small, “Understanding when people will report crimes to the police,” Proceedings of the National Academy of Sciences 115, 8057–8059 (2018).
 Contreras (2017) Christopher Contreras, “A blocklevel analysis of medical marijuana dispensaries and crime in the city of los angeles,” Justice Quarterly 34, 1069–1095 (2017).
 Hipp et al. (2019) John R Hipp, YoungAn Kim, and Kevin Kane, “The effect of the physical environment on crime rates: Capturing housing age and housing type at varying spatial scales,” Crime & Delinquency 65, 1570–1595 (2019).
 Kim and Hipp (2020) YoungAn Kim and John R Hipp, “Street egohood: An alternative perspective of measuring neighborhood and spatial patterns of crime,” Journal of Quantitative Criminology 36, 29–66 (2020).
 Rosser et al. (2017) Gabriel Rosser, Toby Davies, Kate J Bowers, Shane D Johnson, and Tao Cheng, “Predictive crime mapping: Arbitrary grids or street networks?” Journal of Quantitative Criminology 33, 569–594 (2017).
 Kenett et al. (2018) Ron S. Kenett, Danny Pfeffermann, and David M. Steinberg, “Election polls—a survey, a critique, and proposals,” Annual Review of Statistics and Its Application 5 (2018), 10.1146/annurevstatistics031017100204.
 De Nadai et al. (2019) Marco De Nadai, Angelo Cardoso, Antonio Lima, Bruno Lepri, and Nuria Oliver, “Strategies and limitations in app usage and human mobility,” Scientific reports 9, 1–9 (2019).
 (61) Front Seat Walk Score Methodology, Tech. Rep., available online at http://pubs.cedeus.cl/omeka/files/original/b6fa690993d59007784a7a26804d42be.pdf. Last accessed on 3 January 2020.
 Hughes (2017) John Hughes, “Spatial regression and the bayesian filter,” arXiv preprint arXiv:1706.04651 (2017).
 Kadar et al. (2017) Cristina Kadar, Raquel Rosés Brüngger, and Irena Pletikosa, “Measuring ambient population from locationbased social networks to describe urban crime,” in International Conference on Social Informatics (Springer, 2017) pp. 521–535.
 dat (a) “Ideca,” https://www.ideca.gov.co/es/servicios/mapadereferencia/tablamapareferencia?tid_1=All&title=&submitb=Filtrar (a).
 dat (b) “Us census tiger,” ftp://ftp2.census.gov/geo/tiger/TIGER2014/TABBLOCK/ (b).
 dat (c) “Boston maps open data site,” http://bostonopendataboston.opendata.arcgis.com/datasets/142500a77e2a4dbeb94a86f7e0b568bc_0 (c).
 dat (d) “Chicago boundaries,” https://data.cityofchicago.org/FacilitiesGeographicBoundaries/BoundariesCommunityAreascurrent/cauq8yn6 (d).
 dat (e) “La boundaries,” https://data.lacity.org/AWellRunCity/Neighborhoods/ykhezspy/data (e).
 dat (f) “Bogota buildings,” https://www.ideca.gov.co/es/servicios/mapadereferencia/tablamapareferencia?tid_1=All&title=&submitb=Filtrar (f).
 dat (g) “Boston buildings,” https://data.boston.gov/dataset/buildings (g).
 dat (h) “Chicago buildings,” https://data.cityofchicago.org/Buildings/BuildingFootprintsdeprecatedAugust2015/qv973bvb (h).
 dat (i) “Los angeles buildings,” https://egis3.lacounty.gov/dataportal/2016/11/03/countywidebuildingoutlines2014updatepublicdomainrelease/ (i).
 dat (j) “Boston crime,” https://data.boston.gov/dataset/crimeincidentreportsjuly2012august2015sourcelegacysystem (j).
 dat (k) “Chicago crime,” https://data.cityofchicago.org/PublicSafety/Crimes2001topresent/ijzpq8t2 (k).
 dat (l) “Los angeles crime,” https://data.lacity.org/ASafeCity/Crimes20122015/s9rjh3s6 (l).
 dat (m) “Us census factfinder,” https://factfinder.census.gov (m).
 dat (n) “Boston landuse,” https://data.boston.gov/dataset/parcels2016datafull (n).
 dat (o) “Chicago landuse,” https://datacatalog.cookcountyil.gov/GISMaps/ccgisdataParcel2014/2m9hcq6j (o).
 dat (p) “Los angeles landuse,” https://egis3.lacounty.gov/dataportal/2015/03/10/assessorparcel/ (p).
 Moran (1948) Patrick AP Moran, “The interpretation of statistical maps,” Journal of the Royal Statistical Society. Series B (Methodological) 10, 243–251 (1948).
 Arnold et al. (1999) N Arnold, ANDREW Thomas, L Waller, and E Conlon, “Bayesian models for spatially correlated disease and exposure data,” in Bayesian Statistics 6: Proceedings of the Sixth Valencia International Meeting, Vol. 6 (Oxford University Press, 1999) p. 131.
 Gelman et al. (2006) Andrew Gelman et al., “Prior distributions for variance parameters in hierarchical models (comment on article by browne and draper),” Bayesian analysis 1, 515–534 (2006).
 Potthoff (2006) Richard F Potthoff, “Homogeneity, potthoffwhittinghill tests of,” Encyclopedia of Statistical Sciences (2006).
 Chun et al. (2016) Yongwan Chun, Daniel A Griffith, Monghyeon Lee, and Parmanand Sinha, “Eigenvector selection with stepwise regression techniques to construct eigenvector spatial filters,” Journal of Geographical Systems 18, 67–85 (2016).
 Lin and Zhang (2007) Ge Lin and Tonglin Zhang, “Loglinear residual tests of moran’s i autocorrelation and their applications to kentucky breast cancer data,” Geographical Analysis 39, 293–310 (2007).
 Murakami and Griffith (2019) Daisuke Murakami and Daniel A Griffith, “Eigenvector spatial filtering for large data sets: fixed and random effects approaches,” Geographical Analysis 51, 23–49 (2019).
Appendix A Walkability
We determine the walkability of a neighbourhood through its accessibility to the nearest Points Of Interest (e.g., convenience stores, restaurants, sport facilities). Walkability is empirically calculated in many different ways; one of the most widely accepted is Walk Score (61). We here describe and compute the walkability score for our cities consistently with its methodology (61), as Walk Score is not available for all the cities we consider.
Thus, for each city block $b$ we first collect an ordered list of the closest Points Of Interest (POIs) belonging to category $c$:
$P_c(b) = (p_1^c, p_2^c, \dots, p_{k_c}^c)$ (1)
where $p_1^c$ is the closest POI of category $c$ to $b$, $p_2^c$ is the second closest, and so on. We then compute the walkability score as:
$\mathrm{walk}(b) = \sum_{c \in C} \sum_{j=1}^{k_c} w_{c,j}\, f(\mathrm{dist}(b, p_j^c))$ (2)
where $C$ is the set of categories (i.e. Food, Shops, Grocery, Schools, Entertainment, Parks and outside, Coffee, Banks, Books), $f$ is the street-network distance decay function (explained later), and $w_{c,j}$ is a weighting factor that depends on both the category $c$ and the $j$-th closest POI.
In categories where depth of choice is important, multiple POIs are considered (i.e. $k_c > 1$). For example, restaurants and bars are combined in a single category due to their overlapping function. They are the most frequent walking destinations, hence we include 10 places to account for the depth of offer in the neighbourhood. The shopping category represents all the retail outlets where people can buy products such as clothes, gifts, etc. They are common walking destinations and are commonly described as important for the attractiveness of a place. Thus, we considered 5 places for this category. Coffee shops are also important for the neighbourhood, but not as important as restaurants and shopping places. Thus, we considered 2 places for this category. For the other categories only the distance from the nearest POI is calculated. These parameters are consistent with Walk Score (61). The definitions of $k_c$ and $w_{c,j}$ are summarized in Table 2.
The amenities are extracted from Foursquare, a crowdsourced project in which people take part in an online game by checking in at the places they visit.
Category  Number of POIs considered
Grocery  1
Food  10
Shops  5
Schools  1
Entertainment  1
Parks and outside  1
Coffee  2
Banks  1
Books  1
The distance decay function computes the importance weight of each POI reachable from a starting point. Similarly to Walk Score, we use a polynomial distance decay that assigns the maximum score to amenities within a given number of meters from the starting point; the score then decays quickly until 1500 meters, where it first slows down and then goes to zero. The distance is measured along the street network, rather than as the geometric distance (see Figure 4).
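The walkability computation of this appendix can be sketched as follows. Several elements are assumptions made only for illustration: the per-POI weights are taken as equal within a category, the decay is a simple linear stand-in for the polynomial decay of Walk Score, and the 400-meter full-score distance is a hypothetical value. Only the number of POIs per category and the 1500-meter cutoff come from the text.

```python
# Number of closest POIs considered per category (from Table 2).
POIS_PER_CATEGORY = {
    "Grocery": 1, "Food": 10, "Shops": 5, "Schools": 1, "Entertainment": 1,
    "Parks and outside": 1, "Coffee": 2, "Banks": 1, "Books": 1,
}


def linear_decay(distance_m, full_score_m=400.0, max_m=1500.0):
    """Hypothetical stand-in for the polynomial distance decay function."""
    if distance_m <= full_score_m:
        return 1.0
    if distance_m >= max_m:
        return 0.0
    return (max_m - distance_m) / (max_m - full_score_m)


def walkability(distances_by_category, decay=linear_decay):
    """Walkability of one block, rescaled to [0, 1].

    `distances_by_category` maps a category name to the street-network
    distances (in meters) from the block to the POIs of that category.
    """
    score = 0.0
    for category, k in POIS_PER_CATEGORY.items():
        # Keep only the k closest POIs of this category.
        nearest = sorted(distances_by_category.get(category, []))[:k]
        weight = 1.0 / k  # assumption: equal weights within a category
        score += sum(weight * decay(d) for d in nearest)
    return score / len(POIS_PER_CATEGORY)
```

Passing a different `decay` callable makes it possible to swap in a polynomial decay without changing the aggregation logic.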
Appendix B Crime measure
In criminology, there are usually two ways to assess crime: crime counts and crime rates. The main difference between the two lies in how the population at risk is modelled. The former assumes that the importance of the population at risk has to be found by the model, while the latter assumes that this importance is equal to one. Specifically, crime-rate models normalize the crime counts by the population at risk, which is usually the residential one. Recently, some scholars have discussed the costs and benefits of alternative denominators such as the ambient population Andresen (2006). There are different ways to compute it, including the use of satellite imagery Andresen (2006), census data Mburu and Helbich (2016) and Foursquare check-ins Kadar et al. (2017).
However, it is not clear whether residential or ambient rates should be used, nor what the policy implications of using one or the other are. Moreover, the bias of ambient rates can potentially lead to a misleading interpretation of crime. In this research, we thus prefer to describe crime counts while controlling for residential and ambient population. Hence, it is also possible to describe their relative role through the coefficients of the regression.
Appendix C Data sources
Type  City  Open Data  Data is shared (Raw)  Data is shared (Aggregated)
Blocks  Bogotá  ✓  ✓  
Boston  ✓  ✓  ✓  
Chicago  ✓  ✓  ✓  
LA  ✓  ✓  ✓  
Census Blocks  Bogotá  ✓dat (a)  ✓  ✓ 
Boston  ✓dat (b)  ✓  ✓  
Chicago  ✓dat (b)  ✓  ✓  
LA  ✓dat (b)  ✓  ✓  
Boundaries  Bogotá  ✓  ✓  
Boston  ✓dat (c)  ✓  ✓  
Chicago  ✓dat (d)  ✓  ✓  
LA  ✓dat (e)  ✓  ✓  
Buildings  Bogotá  ✓dat (f)  ✓  ✓ 
Boston  ✓dat (g)  ✓  ✓  
Chicago  ✓dat (h)  ✓  ✓  
LA  ✓dat (i)  ✓  ✓  
Crime  Bogotá  ✓  
Boston  ✓dat (j)  ✓  ✓  
Chicago  ✓dat (k)  ✓  ✓  
LA  ✓dat (l)  ✓  ✓  
Employment and Ethnic mix  Bogotá  ✓  
Boston  ✓dat (m)  ✓  ✓  
Chicago  ✓dat (m)  ✓  ✓  
LA  ✓dat (m)  ✓  ✓  
Land Use  Bogotá  ✓dat (a)  ✓  ✓ 
Boston  ✓dat (n)  ✓  ✓  
Chicago  ✓dat (o)  ✓  ✓  
LA  ✓dat (p)  ✓  ✓  
Mobile phone data  All but Chicago  ✓  
POIs  All  ✓  
Population  Bogotá  ✓  
Boston  ✓dat (m)  ✓  ✓  
Chicago  ✓dat (m)  ✓  ✓  
LA  ✓dat (m)  ✓  ✓  
Poverty  Bogotá  ✓  
Boston  ✓dat (m)  ✓  ✓  
Chicago  ✓dat (m)  ✓  ✓  
LA  ✓dat (m)  ✓  ✓  
Residential stability  Bogotá  ✓  
Boston  ✓dat (m)  ✓  ✓  
Chicago  ✓dat (m)  ✓  ✓  
LA  ✓dat (m)  ✓  ✓  
Street network  All  ✓  ✓  ✓ 
Distribution of crime
Crime is not evenly distributed in space and its distribution in the neighbourhoods of the analysed cities can be seen in Figure 5.
Appendix D The spatial model
In each city, we tested the presence of spatial autocorrelation of crime through Moran's I coefficient $I$ Moran (1948). When $I > 0$ there is positive autocorrelation, negative otherwise. In the former case, places with high crime tend to be near places with high crime; in the latter, places with high crime are near places with low crime. When $I$ is not near zero, regression models might exhibit spatial correlation in the residuals, thus invalidating the assumption of independence of the errors. In these cases, regression models should account for the spatial autocorrelation between spatial units, as we did.
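For reference, the standard form of Moran's I can be computed with a few lines of code. The sketch below uses plain Python lists; the function name `morans_i` and the dense-matrix input are assumptions for this example.

```python
def morans_i(values, weights):
    """Standard Moran's I for `values` under spatial weights `weights`.

    `values` is a list with one observation per spatial unit and `weights`
    is a square list of lists, where weights[i][j] is the spatial weight
    between units i and j (0 when they are not connected).
    """
    n = len(values)
    mean_value = sum(values) / n
    deviations = [v - mean_value for v in values]
    weight_sum = sum(sum(row) for row in weights)
    # Cross-products of deviations for every connected pair of units.
    numerator = sum(
        weights[i][j] * deviations[i] * deviations[j]
        for i in range(n)
        for j in range(n)
    )
    denominator = sum(d * d for d in deviations)
    return (n / weight_sum) * (numerator / denominator)
```

On a chain of four units, clustered values such as `[1, 1, 5, 5]` give a positive coefficient, while alternating values such as `[1, 5, 1, 5]` give a negative one.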
In our paper, we model the crime counts in a city with a Negative Binomial (NB) regression, and we account for the spatial autocorrelation with the Bayesian Spatial Filtering (BSF) Hughes (2017) approach. The NB model is defined as:
$y_i \sim \mathrm{NB}(\mu_i, \phi)$ (3)
where the mean and variance of $y_i$ are $\mathbb{E}[y_i] = \mu_i$ and $\mathrm{Var}[y_i] = \mu_i + \mu_i^2 / \phi$. We use the logarithm as the link function for the NB.
The BSF is defined in terms of $\mathbf{Q}$, the Laplacian of the spatial matrix $\mathbf{W}$, and of a parameter $\tau$ that follows a Gamma distribution with a large mean, which discourages artifactual spatial structures in the posterior Arnold et al. (1999); Hughes (2017). Since we want to account for the correlation between the features and to generalize well, we apply a Ridge penalty to the coefficients and a QR decomposition to decorrelate the covariates and, thus, the resulting posterior distribution.
The alternative formulation that does not account for the autocorrelation is an NB model with a Ridge penalty on the $\beta$ coefficients. Its priors use the half-Cauchy distribution with a location parameter of zero and a scale parameter of one; we chose the half-Cauchy as suggested by Gelman et al. (2006).
In Table 4 we show that the NB model exhibits a strong positive spatial autocorrelation in the residuals, while the BSF model does not, as expected. The models based on BSF are also superior in terms of LOO and $R^2$.
Test for overdispersion
The NB model is motivated by the extra-Poisson variability of the crime distribution in the city. We can test the need for an overdispersion parameter through the Potthoff-Whittinghill test and the Lagrange multiplier test.
The Potthoff-Whittinghill index of dispersion test Potthoff (2006) rejects the hypothesis of no overdispersion. Its statistic is based on the ratio between the variance and the mean of the counts, and it approximately follows a chi-square distribution.
We also apply the Lagrange multiplier test. With one degree of freedom, the test is significant: the hypothesis of no overdispersion is again rejected.
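An index-of-dispersion check of the kind discussed above can be sketched as follows. The statistic used here, the sum of squared deviations from the mean divided by the mean and compared with a chi-square with n - 1 degrees of freedom, is the textbook form and is an assumption about the exact variant applied in the paper; the function name is hypothetical.

```python
def index_of_dispersion(counts):
    """Index-of-dispersion statistic for a list of non-negative counts.

    Returns the statistic and its degrees of freedom. Under a Poisson model
    with a common mean the statistic is approximately chi-square distributed,
    so values far above the degrees of freedom point to overdispersion.
    """
    n = len(counts)
    if n < 2:
        raise ValueError("need at least two observations")
    mean_count = sum(counts) / n
    if mean_count <= 0:
        raise ValueError("the mean count must be positive")
    statistic = sum((c - mean_count) ** 2 for c in counts) / mean_count
    return statistic, n - 1
```

For instance, the counts `[0, 0, 0, 20]` give a statistic of 60 with 3 degrees of freedom, far above the 5% chi-square critical value of about 7.81.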
Selection of E eigenvectors
Following the seminal literature on eigen-based spatial modelling and filtering Tiefelsdorf and Griffith (2007), we select the first eigenvectors of $\mathbf{M}\mathbf{W}\mathbf{M}$, where $\mathbf{W}$ is a spatial matrix that describes the graph between spatial locations, while $\mathbf{M}$ is a projection matrix chosen to approximate the spatial error model.
The associated set of eigenvalues ($\lambda_1, \dots, \lambda_n$) from the decomposition assesses the strength of a spatial pattern. A vector $\mathbf{e}_i$ with $\lambda_i > 0$ describes positive spatial autocorrelation, while vectors with $\lambda_i < 0$ describe negative spatial autocorrelation. Spatial models are notoriously inefficient at dealing with negative spatial autocorrelation, which is also rather rare. Thus, consistently with the literature Chun et al. (2016), we focus on positive autocorrelation and select those vectors whose eigenvalue is sufficiently large relative to $\lambda_{\max}$, the maximum value among the eigenvalues.
Test for residual autocorrelation
We test for the presence of autocorrelation in the models’ residuals through Moran’s I. However, since our model is not an Ordinary Least Squares regression, we used a corrected version of Moran’s I that is specifically tailored for log-linear relationships Lin and Zhang (2007). The index is defined as:
$I = \frac{n}{\sum_{i}\sum_{j} w_{ij}} \cdot \frac{\mathbf{e}^\top \mathbf{W} \mathbf{e}}{\mathbf{e}^\top \mathbf{e}}$
where $n$ is the number of spatial units, $\mathbf{W}$ is the spatial matrix with entries $w_{ij}$, and $\mathbf{e}$ is the vector of residual errors.
We do not find significant residual spatial autocorrelation in our spatial models.
City  Model  BSF LOO  BSF R²  BSF Moran's I (residuals)  NB LOO  NB R²  NB Moran's I (residuals)
Bogota  Core  3897  0.75  0.034  4126  0.53  0.455 
Socialdisorganization (SD)  3891  0.75  0.043  4079  0.58  0.354  
Built environment (BE)  3881  0.76  0.036  4061  0.61  0.371  
Mobility (M)  3804  0.80  0.042  4034  0.64  0.460  
SD+BE  3880  0.76  0.035  4013  0.65  0.287  
SD+M  3795  0.81  0.050  3988  0.67  0.374  
BE+M  3819  0.80  0.025  3980  0.68  0.361  
SD+BE+M (Full)  3809  0.80  0.040  3941  0.71  0.284  
Boston  Core  2035  0.64  0.005  2209  0.22  0.418 
Socialdisorganization (SD)  2019  0.68  0.003  2088  0.55  0.236  
Built environment (BE)  2014  0.68  0.033  2169  0.37  0.309  
Mobility (M)  2001  0.70  0.026  2140  0.45  0.351  
SD+BE  1987  0.72  0.043  2030  0.65  0.108  
SD+M  1973  0.73  0.030  2011  0.67  0.105  
BE+M  1989  0.72  0.033  2109  0.52  0.264  
SD+BE+M (Full)  1957  0.75  0.040  1993  0.70  0.084  
LA  Core  9665  0.68  0.032  10757  0.17  0.647 
Socialdisorganization (SD)  9529  0.72  0.005  10042  0.55  0.416  
Built environment (BE)  9629  0.69  0.005  10618  0.27  0.615  
Mobility (M)  9570  0.70  0.018  10658  0.24  0.628  
SD+BE  9508  0.72  0.010  9989  0.57  0.366  
SD+M  9467  0.73  0.002  10003  0.57  0.444  
BE+M  9585  0.70  0.011  10571  0.30  0.618  
SD+BE+M (Full)  9453  0.74  0.011  9967  0.58  0.388  
Chicago  Core  8415  0.68  0.117  9350  0.09  0.543 
Socialdisorganization (SD)  8019  0.78  0.016  8391  0.66  0.295  
Built environment (BE)  8371  0.69  0.093  9237  0.21  0.519  
SD+BE  8003  0.79  0.003  8357  0.68  0.282 
Appendix E Alternative spatial models
We tested alternative spatial models that could explain the residual spatial autocorrelation. Here, we compare the BSF with two other similar, competing models: the Random Effects Eigenvector Spatial Filtering (RE-ESF) Murakami and Griffith (2019) and the linear ESF model Tiefelsdorf and Griffith (2007). The ESF model is defined in terms of the vector of the eigenvalues associated with the selected eigenvectors, together with a parameter chosen to constrain the spatial random effects and to prevent them from penalizing the fixed effects too much. To ensure limited variance, this parameter is limited to an upper value of 2.
The RE-ESF instead assumes the eigenvector coefficients to be random, with a multiplier that represents the scale of the spatial variance and a parameter to be found.
Table 5 shows that no model clearly outperforms the others, suggesting that they are almost equivalent in a fully Bayesian setting.
City  Model  BSF LOO  BSF Moran's I (residuals)  RE-ESF LOO  RE-ESF Moran's I (residuals)  ESF LOO  ESF Moran's I (residuals)
Bogota  Core  3897  0.034  3899  0.045  3902  0.041 
Socialdisorganization (SD)  3891  0.043  3895  0.052  3896  0.049  
Built environment (BE)  3881  0.036  3882  0.045  3884  0.042  
Mobility (M)  3804  0.042  3803  0.048  3807  0.046  
SD+BE  3880  0.035  3882  0.043  3884  0.040  
SD+M  3795  0.050  3796  0.057  3798  0.056  
BE+M  3819  0.025  3817  0.033  3822  0.032  
SD+BE+M (Full)  3809  0.040  3810  0.049  3810  0.046  
Boston  Core  2035  0.005  2035  0.016  2035  0.014 
Social disorganization (SD)  2019  0.003  2020  0.017  2020  0.016  
Built environment (BE)  2014  0.033  2013  0.044  2014  0.044  
Mobility (M)  2001  0.026  1999  0.035  1999  0.035  
SD+BE  1987  0.043  1987  0.057  1986  0.057  
SD+M  1973  0.030  1972  0.046  1971  0.045  
BE+M  1989  0.033  1988  0.043  1988  0.042  
SD+BE+M (Full)  1957  0.040  1957  0.054  1956  0.053  
LA  Core  9665  0.032  9663  0.028  9671  0.029 
Social disorganization (SD)  9529  0.005  9530  0.001  9535  0.000  
Built environment (BE)  9629  0.005  9629  0.002  9638  0.001  
Mobility (M)  9570  0.018  9569  0.014  9576  0.014  
SD+BE  9508  0.010  9510  0.015  9514  0.014  
SD+M  9467  0.002  9468  0.006  9472  0.006  
BE+M  9585  0.011  9585  0.008  9591  0.007  
SD+BE+M (Full)  9453  0.011  9455  0.015  9458  0.015  
Chicago  Core  8415  0.117  8413  0.115  8414  0.115 
Social disorganization (SD)  8019  0.016  8019  0.012  8019  0.013  
Built environment (BE)  8371  0.093  8369  0.090  8370  0.090  
SD+BE  8002  0.003  8004  0.001  8006  0.000 
Appendix F Alternative Connectivity Matrices
Incorporating spatial relationships in spatial models requires the definition of a connectivity matrix that describes the relationship (if any) between one spatial unit and all the others. One of the most common connectivity matrices encodes a binary relationship between spatial units, also called topology representation Griffith and Peres-Neto (2006), which usually results in a sparse matrix:
(6) 
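A binary, topology-based connectivity matrix of this kind can be sketched as follows. This is a minimal illustration rather than the implementation used in the study: the function name `contiguity_matrix`, the neighbour-list input format, and the four-unit toy layout are all assumptions made for the example.

```python
def contiguity_matrix(neighbours, n_units):
    """Build a symmetric binary connectivity matrix from neighbour lists.

    neighbours: dict mapping a unit index to the indices of the units
    that share a border with it (hypothetical input format).
    """
    C = [[0] * n_units for _ in range(n_units)]
    for i, adjacent in neighbours.items():
        for j in adjacent:
            if i != j:  # a unit is never its own neighbour
                C[i][j] = 1
                C[j][i] = 1  # enforce symmetry
    return C


# Hypothetical toy layout: four units in a row, 0 - 1 - 2 - 3
neighbours = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
C = contiguity_matrix(neighbours, 4)
```

Most entries are zero because each unit touches only a few others, which is why such matrices are usually sparse.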
An alternative formulation is based on distance. For example, Griffith and Peres-Neto (2006) define:
(7) 
where is chosen as the maximal distance that keeps all the spatial units connected, while is the (Euclidean) distance between the centroids of unit and . is computed through the maximal distance of a Minimum Spanning Tree (MST) computed on the distance matrix .
We also test an additional formulation that accounts for the connectivity of spatial units, extracted from mobile phone data. Here, the assumption is that the more connections two sites have, the stronger the similarity between them. Thus , where is the number of trips made, on average, from the unit to unit and vice versa. As the matrix is not symmetrical and it does not have the diagonal equal to zero:
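The threshold selection described above can be sketched as follows. The sketch computes the Euclidean distances between centroids, finds the longest edge of the Minimum Spanning Tree with Prim's algorithm, and uses it as the cut-off. The helper names and toy centroids are hypothetical, and the final binary truncation is an assumption made for illustration only: it is not the distance-decay weighting of the cited formulation.

```python
import math


def distance_matrix(centroids):
    """Pairwise Euclidean distances between unit centroids."""
    return [[math.dist(p, q) for q in centroids] for p in centroids]


def mst_max_edge(D):
    """Return the longest edge of the Minimum Spanning Tree of D (Prim)."""
    n = len(D)
    in_tree = [False] * n
    best = [math.inf] * n  # cheapest known link from the tree to each unit
    best[0] = 0.0
    longest = 0.0
    for _ in range(n):
        # pick the unit outside the tree that is cheapest to attach
        u = min((i for i in range(n) if not in_tree[i]), key=lambda i: best[i])
        in_tree[u] = True
        longest = max(longest, best[u])
        for v in range(n):
            if not in_tree[v] and D[u][v] < best[v]:
                best[v] = D[u][v]
    return longest


def truncated_connectivity(D, threshold):
    """Assumed binary truncation: connect units no farther than threshold."""
    n = len(D)
    return [[1 if i != j and D[i][j] <= threshold else 0 for j in range(n)]
            for i in range(n)]


# Hypothetical centroids (arbitrary planar coordinates)
centroids = [(0.0, 0.0), (1.0, 0.0), (3.0, 0.0), (3.0, 1.0)]
D = distance_matrix(centroids)
t = mst_max_edge(D)
C = truncated_connectivity(D, t)
```

Using the longest MST edge as the threshold is the smallest cut-off that still leaves every unit reachable from every other one.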
(8) 
and for all .
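The two adjustments mentioned above, making the trip matrix symmetric and setting its diagonal to zero, can be sketched as follows. Averaging the two directions of flow is an assumption chosen for the example, and the function name and toy trip counts are hypothetical.

```python
def mobility_connectivity(trips):
    """Symmetrize a directed trip matrix and zero its diagonal.

    trips[i][j] is the average number of trips from unit i to unit j.
    Averaging both directions is an assumption made for this sketch.
    """
    n = len(trips)
    C = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            weight = (trips[i][j] + trips[j][i]) / 2.0
            C[i][j] = weight
            C[j][i] = weight
    return C  # the diagonal is never written, so it stays at zero


# Hypothetical average trip counts between three units
trips = [
    [5.0, 2.0, 0.0],
    [4.0, 7.0, 1.0],
    [0.0, 3.0, 9.0],
]
C = mobility_connectivity(trips)
```

Trips that start and end in the same unit (the diagonal of `trips`) are discarded, so the matrix only describes links between distinct units.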
As shown in Table 6, the BSF model with contiguity matrix achieves better performance in all the urban settings.
City  Model  Contiguity  Distance  Mobility  

LOO  LOO  LOO  
Bogota  Core  3897  0.034  3999  0.016  4104  0.073 
Social disorganization (SD)  3891  0.043  3969  0.010  4051  0.060  
Built environment (BE)  3881  0.036  3971  0.003  4051  0.048  
Mobility (M)  3804  0.042  3906  0.010  4021  0.037  
SD+BE  3880  0.035  3949  0.006  4000  0.042  
SD+M  3795  0.050  3879  0.002  3964  0.031  
BE+M  3819  0.025  3889  0.003  3975  0.027  
SD+BE+M (Full)  3809  0.040  3873  0.001  3929  0.027  
Boston  Core  2035  0.005  2078  0.036  2208  0.118 
Social disorganization (SD)  2019  0.003  2037  0.024  2081  0.107  
Built environment (BE)  2014  0.033  2040  0.010  2168  0.076  
Mobility (M)  2001  0.026  2014  0.010  2094  0.029  
SD+BE  1987  0.043  2007  0.002  2029  0.022  
SD+M  1973  0.030  1991  0.011  2011  0.001  
BE+M  1989  0.033  2006  0.006  2108  0.075  
SD+BE+M (Full)  1957  0.040  1976  0.008  1993  0.002  
LA  Core  9665  0.032  9966  0.061  10691  0.134 
Social disorganization (SD)  9529  0.005  9708  0.014  9977  0.033  
Built environment (BE)  9629  0.005  9767  0.017  10528  0.113  
Mobility (M)  9570  0.018  9875  0.050  10596  0.118  
SD+BE  9508  0.010  9668  0.005  9922  0.026  
SD+M  9467  0.002  9647  0.016  9898  0.029  
BE+M  9585  0.011  9733  0.021  10514  0.111  
SD+BE+M (Full)  9453  0.011  9613  0.003  9891  0.027  
Chicago  Core  8415  0.117  8716  0.076     
Social disorganization (SD)  8019  0.016  8257  0.028      
Built environment (BE)  8371  0.093  8623  0.058      
SD+BE  8003  0.003  8244  0.032     
Appendix G Spatial model decomposition
The fit of our model can be decomposed into fixed effects, which are the contribution of the input variables; random effects, which capture the otherwise unexplained variance through spatial autocorrelation; and residuals, which are the errors of the model.
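The decomposition can be sketched on the scale of the linear predictor as follows. This is an illustrative sketch under stated assumptions: the names `X`, `beta`, `E` and `gamma` (design matrix, fixed-effect coefficients, spatial eigenvectors and their coefficients) are hypothetical labels, the numbers are toy values, and any link function used by the actual model is ignored.

```python
def decompose_fit(y, X, beta, E, gamma):
    """Split observations into fixed effects, random effects and residuals.

    Assumes an additive predictor: y = X @ beta + E @ gamma + residual.
    """
    fixed = [sum(x * b for x, b in zip(row, beta)) for row in X]
    random_part = [sum(e * g for e, g in zip(row, gamma)) for row in E]
    residuals = [yi - f - r for yi, f, r in zip(y, fixed, random_part)]
    return fixed, random_part, residuals


# Toy values for three spatial units (not taken from the study)
y = [3.0, 1.0, 2.0]
X = [[1.0, 2.0], [1.0, 0.0], [1.0, 1.0]]   # intercept plus one covariate
beta = [1.0, 0.5]
E = [[1.0], [-1.0], [0.0]]                  # one spatial eigenvector
gamma = [0.5]

fixed, random_part, residuals = decompose_fit(y, X, beta, E, gamma)
```

Mapping each of the three returned vectors back onto the spatial units gives the kind of per-component maps discussed in this appendix.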
In Figures 6 D, 7 D, 8 D, and 9 D we do not observe any clear spatial pattern in the residuals, confirming that the BSF model mitigates the spatial autocorrelation as expected.
Inspecting the random effects can help locate local spatial effects that are not captured by the fixed effects. In Bogotá, the model suggests that significant unexplained variance is present near the touristic and dangerous neighbourhood La Candelaria, and near the populous district of Engativá (see Figure 7). In Boston, the area near Franklin Park indicates missing local factors (see Figure 6). In Los Angeles, unexplained variance seems to be tied to places with large numbers of people, namely the international airport and the UCLA campus (see Figure 8). Finally, in Chicago, missing variables are suggested near the prison and in the southern area (see Figure 9).
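A visual check of the residual maps can be complemented by a numerical one. The sketch below computes Moran's I, a common statistic for spatial autocorrelation. Its use here is an assumption made for illustration, since this appendix does not state which diagnostic was applied, and the connectivity matrix and values are toy data.

```python
def morans_i(values, W):
    """Moran's I of `values` under the connectivity matrix W."""
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    s0 = sum(sum(row) for row in W)  # total connectivity weight
    num = sum(W[i][j] * dev[i] * dev[j] for i in range(n) for j in range(n))
    den = sum(d * d for d in dev)
    return (n / s0) * (num / den)


# Toy connectivity: four units in a row, 0 - 1 - 2 - 3
W = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]

smooth = morans_i([1.0, 2.0, 3.0, 4.0], W)         # neighbours are similar
alternating = morans_i([1.0, -1.0, 1.0, -1.0], W)  # neighbours are opposite
```

Values well above zero indicate that similar residuals cluster in space, values well below zero indicate that neighbouring residuals tend to differ, and values near zero are consistent with no clear spatial pattern.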
Appendix H Improvement analysis
Appendix I Autocorrelation of features
Figure 14 shows that the features do not act with the same strength and direction in all cities.
Appendix J The minimal model
Table 7 shows the results of the minimal model, which employs only the features that play the same role in all the cities. Results show that no minimal setting is better at predicting crime in all cities.
City  Model  LOO  

Bogota  Minimal  3872  0.046 
Full  3809  0.040  
Boston  Minimal  1994  0.028 
Full  1957  0.040  
Los Angeles  Minimal  9534  0.005 
Full  9453  0.003  
Chicago  Minimal  8009  0.012 
Full  8003  0.003 
Appendix K Corehood tests
Table 8 shows the results of the Full model (the SD+BE model in Chicago) for different Corehood sizes. The results indicate that half a mile is the best size for inferring the neighbourhood effect.
City  Model  Corehood size  LOO  

Bogota  Full  0.5 miles  3809  0.040 
1 mile  3860  0.005  
Boston  Full  0.5 miles  1957  0.040 
1 mile  2026  0.011  
Los Angeles  Full  0.5 miles  9453  0.003 
1 mile  9644  0.001  
Chicago  SD+BE  0.5 miles  8003  0.003 
1 mile  8066  0.016 