Entropy as a measure of attractiveness and socioeconomic complexity in Rio de Janeiro metropolitan area

by Maxime Lenormand, et al.

Defining and measuring spatial inequalities across the urban environment remains a complex and elusive task that has been facilitated by the increasing availability of large geolocated databases. In this study, we rely on a mobile phone dataset and an entropy-based metric to measure the attractiveness of a location in the Rio de Janeiro Metropolitan Area (Brazil) as the diversity of visitors' location of residence. The results show that the attractiveness of a given location measured by entropy is an important descriptor of the socioeconomic status of the location, and can thus be used as a proxy for complex socioeconomic indicators.






Introduction

While cities have long been recognized as the cradle of modern civilization by providing a safe place for cultural development, the unequal distribution of wealth and services remains the most pressing issue threatening the sustainability of modern societies. Despite the large technological advances that apparently make our lives easier, economic inequality has been on the rise worldwide since 1980. The issue has become so acute that recent datasets show that the top 1% of the wealthy population captured twice as much of the global income growth as the bottom 50% Alvaredo et al. (2018). While such disparities among urbanites and the resulting social stratification are currently under deep scrutiny among economists, including a spatial component in such descriptions imposes additional methodological difficulties, given the vagility of human nature and the heterogeneity of the spatial distribution of resources.

While different views exist regarding the origins of socio-spatial inequalities across cities Ruiz-Tagle (2013), the consequences of poorly integrated societies deeply affect opportunities in key realms of social life in ways that hamper social cohesion at local and societal levels Jargowsky (1997); Massey (1990); Wilson (2012). While some discuss causal factors behind socio-spatial inequalities, evidence from natural experiments has shown direct impacts on particularly vulnerable groups Cutler and Glaeser (1997). Such evidence, among others, has tied inequalities to societal imbalances leading to critical states in terms of security, health, and wealth distribution Ruiz-Tagle (2013); Cutler and Glaeser (1997); Garreton and Sánchez (2016); Krieger (1999); Massey and Denton (1988), undermining social cohesion and precluding possibilities of enriching the social capital at particular locations Bolt et al. (1998); Farber et al. (2015, 2013); Forrest and Kearns (2001). Defining and measuring spatial inequalities remains a complex and elusive task for which scientists have recognized several dimensions that are, so far, poorly integrated into a general conceptual framework Louf and Barthelemy (2016); Massey (1990); Netto et al. (2018). For instance, its precise understanding is often linked to the objects of study at hand and the particular methodology employed to study them. Dimensions of inequalities often include the localized concentration of particular groups within cities, the spatial homogeneity of social groups, their accessibility, or more particularly, their distance to downtown Caldeira (2012). Hence, devising appropriate tools to characterize the spatial distribution of complex socioeconomic factors may contribute to the urgently needed development of integrative urban planning.

The explosive use of Information and Communication Technologies (ICT), such as cellphones and large databases of user spending behavior, has made huge volumes of non-conventional data available for urban research purposes Batty (2013); Bettencourt et al. (2014); Blondel et al. (2015); Louail et al. (2017); Barbosa et al. (2018). Knowing the cellphone tower to which we connect permits the reconstruction of our daily trajectories, providing a surprisingly high spatio-temporal resolution of our social interactions Onnela et al. (2007); Panigutti et al. (2017). This approach has been widely used recently to assess a variety of topics going from individual mobility patterns Gonzalez et al. (2008) and land use patterns Lenormand et al. (2015), to the detection of relevant places of high social activity within the city Beiró et al. (2018), thereby unveiling the structure and function of cities Louail et al. (2014); Lenormand et al. (2015); Sotomayor-Gómez and Samaniego (2020). Devising an efficient mobility infrastructure has long been known as a means for city integration and the increasing availability of ICT data allows for a new understanding of spatial integration patterns and its relationship to mobility, socioeconomic and ethnic stratification Lamanna et al. (2018). Such highly resolved datasets provide a contextual understanding of land use that is readily available to derive new measures of social integration in its spatial context, thereby contributing to accurate, and near-real-time, descriptions of urban dynamics Dannemann et al. (2018); Jiang et al. (2017); Motte et al. (2016); Rubim and Leitão (2013); Toole et al. (2015); Song et al. (2010). Many of these studies are based on the concept of activity space Hägerstrand (1970); Jiang et al. (2017); Schönfelder and Axhausen (2003), defined as the set of locations visited by a traveler throughout their daily activities. 
Different measures describing the activity space have been studied to understand daily mobility patterns Phithakkitnukoon et al. (2012); Barbosa et al. (2018). Among these, metrics based on the Shannon entropy are particularly well suited to the study of human mobility patterns. Indeed, “Mobility Entropy” indicators have been widely used to measure the diversity of users’ movement patterns Lin et al. (2012); Pappalardo et al. (2016); Vanhoof et al. (2018); Cottineau and Vanhoof (2019). They can be used at different scales to evaluate the diversity of trips made by an individual Pappalardo et al. (2015, 2016), the diversity of locations visited by an individual Lin et al. (2012); Cottineau and Vanhoof (2019) or a group of individuals Lamanna et al. (2018); Lenormand et al. (2018).

In this work, we rely on the concept of “Mobility Entropy” from the point of view of visiting locations in order to deepen our understanding of human mobility in the context of urban computing by focusing on the concept of attractiveness. We particularly look into mapping the entropy of urban structure using increasingly available mobile phone datasets as a tool to provide highly resolved descriptions of the relationship between attractiveness and several key aspects of the urban environment such as productivity, education and ethnic origin in the Rio de Janeiro Metropolitan Area of Brazil. We focus here on the diversity of visitors’ residence to measure the attractiveness of a location and then compare our results to economic and social indicators to assess how entropy effectively relates to socioeconomic indicators. We show that entropy is an important descriptor of socioeconomic complexity across this vastly populated area.

Figure 1: Rio de Janeiro Metropolitan Area (RJMA). The RJMA is composed of 49 locations, 16 municipalities outside the Capital represented in grey and 33 sub-districts inside the Capital, grouped into 5 districts.

Materials and Methods

The study area and dataset

The study area is the Rio de Janeiro Metropolitan Area (RJMA), the second largest urban area in Brazil with 12,145,734 inhabitants. Administratively, the RJMA is part of Rio de Janeiro State, of which Rio de Janeiro city (Rio for short) is the state Capital and the largest municipality, with 6,320,446 inhabitants and an area of about 1,200 km².

The organization responsible for the demographic census in Brazil is the Brazilian Institute of Geography and Statistics (IBGE), which follows global standards to aggregate census tracts into sub-district, district, city, state, and country levels, such that this partitioning can be used for most regions in the world and at different scales. This study relies on such partitioning, dividing the study area into 49 locations (Figure 1), where the city of Rio is divided into 33 sub-districts aggregated into 5 districts. Districts are called Planning Areas (AP) and represent macro zones of the city: AP1, the center; AP2, the southern zone; AP3, the northern zone; AP4, Barra-Jacarepaguá; and AP5, the western zone.

Our analysis is based on mobile phone data provided by a Brazilian telecommunication operator. The dataset was collected during 363 days between January and December 2014 across the phone area code 21. We use $2.1 \times 10^{9}$ call records originating from $2.9 \times 10^{6}$ anonymized subscribers. Only outgoing voice call data were made available for this work. We first focused on the identification of each user’s residence. The algorithm to detect places of residence is based on the analysis of the most frequently visited locations on evenings and weekends (see the Appendix for more details). This step allows us to discard users not living in the RJMA and to remove users with no significant activity from the analysis. After this filtering, residences were identified for about 2 million users.
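The residence-detection step can be illustrated with a short Python sketch. It is not the algorithm detailed in the Appendix: the evening window (19:00 to 07:00), the minimum number of qualifying calls and the names `is_home_time` and `detect_home` are assumptions made for this example only.

```python
from collections import Counter
from datetime import datetime

def is_home_time(ts, evening_start=19, morning_end=7):
    """True for weekend timestamps or weekday evenings/nights (assumed window)."""
    if ts.weekday() >= 5:  # Saturday = 5, Sunday = 6
        return True
    return ts.hour >= evening_start or ts.hour < morning_end

def detect_home(calls, min_calls=3):
    """Return the location most often visited during home time.

    calls: iterable of (datetime, location_id) pairs for one user.
    Returns None when fewer than `min_calls` calls fall in home time
    (hypothetical threshold for "no significant activity").
    """
    counts = Counter(loc for ts, loc in calls if is_home_time(ts))
    if sum(counts.values()) < min_calls:
        return None
    return counts.most_common(1)[0][0]

# Toy call history for one user (illustrative, not from the dataset).
calls = [
    (datetime(2014, 3, 3, 21, 15), "Rocinha"),     # Monday evening
    (datetime(2014, 3, 4, 10, 30), "Centro"),      # Tuesday work hours, ignored
    (datetime(2014, 3, 8, 14, 0), "Rocinha"),      # Saturday
    (datetime(2014, 3, 9, 11, 0), "Jacarezinho"),  # Sunday
]
home = detect_home(calls)
```

Users whose detected home lies outside the 49 locations, or for whom `detect_home` returns `None`, would then be dropped before aggregation.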

We then aggregate the data in space and time. Aggregated records represent the number of users $N_{ij}^t$ living in location $i$ and visiting location $j$ at time $t$. We spatially aggregate the antennas’ Voronoi polygons in order to obtain locations matching the 49 locations composing the RJMA shown in Figure 1. We also divide each day into four 6-hour shifts (Morning, Work, Afternoon and Night) and label each time period as either weekday or weekend, the latter including holidays. More details regarding the data preprocessing are available in the Appendix.
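The temporal labelling and aggregation described above might look as follows in Python. The shift boundaries are not given in this section, so the 04:00 start of the Morning shift is an assumption, as is counting each user once per (period, home, visited) combination; `time_period` and `aggregate` are hypothetical names.

```python
from collections import defaultdict
from datetime import date, datetime

SHIFTS = ("Morning", "Work", "Afternoon", "Night")

def time_period(ts, holidays=frozenset(), first_shift_hour=4):
    """Map a timestamp to (day_type, shift); shift boundaries are assumed."""
    shift = SHIFTS[((ts.hour - first_shift_hour) % 24) // 6]
    is_off = ts.weekday() >= 5 or ts.date() in holidays
    return ("weekend" if is_off else "weekday", shift)

def aggregate(records, holidays=frozenset()):
    """Count distinct users per (period, home, visited) triple.

    records: iterable of (user_id, home_location, visited_location, datetime).
    """
    users = defaultdict(set)
    for user, home, visited, ts in records:
        users[(time_period(ts, holidays), home, visited)].add(user)
    return {key: len(ids) for key, ids in users.items()}

# Toy records (illustrative only).
records = [
    ("u1", "Rocinha", "Centro", datetime(2014, 3, 4, 11, 0)),
    ("u1", "Rocinha", "Centro", datetime(2014, 3, 5, 12, 0)),    # same user, counted once
    ("u2", "Rocinha", "Centro", datetime(2014, 3, 4, 14, 0)),
    ("u2", "Rocinha", "Rocinha", datetime(2014, 4, 21, 11, 0)),  # Monday, national holiday
]
counts = aggregate(records, holidays={date(2014, 4, 21)})
```

Each value of `counts` plays the role of one aggregated record: the number of users living in one location and visiting another during a given period.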

Entropy as a measure of attractiveness

For each time interval $t$, the probability $p_{ij}^t$ that a user living in location $i$ visits location $j$ is given by:

$$p_{ij}^t = \frac{N_{ij}^t}{\sum_{k} N_{ik}^t} \qquad (1)$$

where $N_{ij}^t$ denotes the aggregated number of users living in $i$ and observed in $j$ during $t$. This probability describes the production of visitors and is normalized by the total number of users living in location $i$. In this study, we are interested in the diversity of visitors’ location of residence as a measure of the attractiveness of the destination. We therefore need to compute the probability that a user visiting location $j$ lives in location $i$. To do so, we combine the probability $p_{ij}^t$ with census data to estimate $\hat{N}_{ij}^t$, the number of individuals living at location $i$ and visiting location $j$ at time $t$, using the following equation:

$$\hat{N}_{ij}^t = M_i \, p_{ij}^t \qquad (2)$$

where $M_i$ is the population of location $i$ according to the 2010 IBGE census. We can now compute the probability $\hat{p}_{ij}^t$ that an individual visiting $j$ at time $t$ lives in $i$ (Equation 3):

$$\hat{p}_{ij}^t = \frac{\hat{N}_{ij}^t}{\sum_{k} \hat{N}_{kj}^t} \qquad (3)$$

This second probability is thus related to the attraction of visitors, normalized at destination, and allows us to compute the normalized Shannon entropy as follows:

$$S_j^t = -\frac{\sum_{i=1}^{n} \hat{p}_{ij}^t \log \hat{p}_{ij}^t}{\log n} \qquad (4)$$

where $n = 49$ is the number of locations. Large entropy values ($S_j^t \approx 1$) mean that people visiting location $j$ at time $t$ are evenly distributed among all 49 locations, whereas smaller values of entropy mean that people visiting location $j$ at time $t$ tend to be concentrated among a few residence locations. The entropy has been widely used to analyze and model human mobility patterns. It can be used in spatial analysis to describe the diversity of individual movement patterns Vanhoof et al. (2018) or in spatial interaction modeling to estimate trip distributions by entropy maximization Wilson (1969), to name a few. It is worth noting that we focus in this work on the analysis of entropy as a measure of attractiveness that can be used as a proxy for complex socioeconomic indicators.
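The chain from observed counts to the normalized entropy (Equations 1 to 4) can be sketched in plain Python for a single time period. The function name `visitor_entropy`, the three-location toy matrix and the uniform populations are assumptions for illustration; the sketch also assumes that every location has at least one observed resident and one visitor, so no division by zero occurs.

```python
import math

def visitor_entropy(N, M):
    """Normalized Shannon entropy of visitors' residences, per destination.

    N[i][j]: observed users living in i and visiting j (one time period).
    M[i]:    census population of location i.
    """
    n = len(N)
    # Equation 1: share of the residents of i observed in j.
    p = [[N[i][j] / sum(N[i]) for j in range(n)] for i in range(n)]
    # Equation 2: rescale the shares by the census population of i.
    N_hat = [[M[i] * p[i][j] for j in range(n)] for i in range(n)]
    entropy = []
    for j in range(n):
        total = sum(N_hat[i][j] for i in range(n))
        # Equation 3: probability that a visitor of j lives in i.
        p_hat = [N_hat[i][j] / total for i in range(n)]
        # Equation 4: zero-probability terms contribute nothing to the sum.
        s = -sum(q * math.log(q) for q in p_hat if q > 0) / math.log(n)
        entropy.append(s)
    return entropy

# Toy example: location 1 draws visitors from everywhere,
# locations 0 and 2 are only visited by their own residents.
N = [[5, 5, 0],
     [0, 10, 0],
     [0, 5, 5]]
M = [100, 100, 100]
S = visitor_entropy(N, M)
```

In this toy case the central location reaches an entropy close to one, while the two locations visited only by their own residents have an entropy of zero.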

It is important to keep in mind that a given entropy value can cover a large variety of situations regarding the distance traveled by visitors. Here, we characterize the relationship between traveled distance and entropy by computing the radius of attraction $R_j^t$ of a location $j$ as the average distance traveled by people visiting $j$ at time $t$:

$$R_j^t = \sum_{i} \hat{p}_{ij}^t \, d_{ij} \qquad (5)$$

where $\hat{p}_{ij}^t$ is the probability that an individual visiting $j$ at time $t$ lives in $i$ (Equation 3) and $d_{ij}$ is the distance from location $i$ to $j$ along the road network between the locations’ centroids, computed using the Google Maps API (https://developers.google.com/maps/documentation/distance-matrix/). This calculation is particularly important in the case of Rio due to the presence of mountains, lakes and the Guanabara Bay, which make road distances between certain locations very different from the Euclidean distances.

Finally, we also consider the ratio $A_j^t$ between the number of visitors and the population as a complementary measure of attractiveness:

$$A_j^t = \frac{\sum_{i} \hat{N}_{ij}^t}{M_j} \qquad (6)$$

where $\hat{N}_{ij}^t$ is the estimated number of individuals living in $i$ and visiting $j$ at time $t$, and $M_j$ is the census population of $j$.
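The two complementary indicators can be computed from the same estimated flows. The sketch below assumes that residents observed in their own location count as visitors and that the intra-location distance is zero; neither choice is stated in this section, and the function name and toy values are illustrative.

```python
def radius_and_attractiveness(N_hat, d, M):
    """Radius of attraction and attractiveness ratio for each destination.

    N_hat[i][j]: estimated individuals living in i and visiting j.
    d[i][j]:     road distance between the centroids of i and j.
    M[j]:        census population of location j.
    """
    n = len(N_hat)
    radius, ratio = [], []
    for j in range(n):
        visitors = sum(N_hat[i][j] for i in range(n))
        # Average distance traveled, weighted by the origin of visitors.
        radius.append(sum(N_hat[i][j] * d[i][j] for i in range(n)) / visitors)
        # Number of visitors relative to the resident population.
        ratio.append(visitors / M[j])
    return radius, ratio

# Toy flows, distances in km and populations (not from the study).
N_hat = [[50, 50, 0],
         [0, 100, 0],
         [0, 50, 50]]
d = [[0, 10, 20],
     [10, 0, 10],
     [20, 10, 0]]
M = [100, 100, 100]
radius, ratio = radius_and_attractiveness(N_hat, d, M)
```

Here the central location attracts twice its population with an average trip of 5 km, while the two other locations have a ratio below one and a radius of zero.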

Figure 2: Results of the clustering analysis. Log-log scatter plot of (a) the attractiveness and (b) the radius of attraction in terms of the entropy index. The inset in (a) shows the relationship after removing one outlier (cluster C4). Each dot represents a location within the study area. Indicators have been averaged over the work shift time period during weekdays.

Entropy, economic and sociodemographic indicators

Because our entropy index represents a synoptic representation of mobility across the RJMA, we finally seek to describe its impact in terms of well known economic, social and demographic indicators as collected by the IBGE. We therefore evaluated how the diversity of visitors relates to the economic performance of the city by plotting the number of jobs and income levels against our entropy estimation. Sociodemography, in turn, was assessed by establishing the relationship between education levels in primary and secondary (high school) education among the population resident in each partitioned area. Finally, two developmental indices were chosen to evaluate entropy performance across the RJMA.


Classification of locations according to their attractiveness

We start our analysis by performing a clustering analysis to group together locations exhibiting similar features regarding their attractiveness. As a first step, we focus on two features across the urban landscape, the diversity at the origin location and the attractiveness at work locations. This led us to average the three indicators for each location (Equations 4, 5 and 6) over the work shift time periods on weekdays. Locations are clustered using the k-means algorithm based on the three standardized averaged metrics. The number of clusters was chosen based on the ratio between the within-group variance and the total variance (see the Appendix for more details). We obtained four clusters. Clustering results and the relationships between the different metrics are shown in Figure 2. We observe a positive relationship between metrics, in which attractiveness and radius of attraction tend to increase with the entropy. There is nevertheless a strong dispersion around these tendencies, with attractiveness and radius of attraction values that can double for a given entropy value.
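The clustering step can be sketched with a small, dependency-free k-means. This is an illustration under stated assumptions rather than the procedure used in the study: the start from evenly spaced rows replaces the random initialisation that libraries typically use, and the six rows of (entropy, attractiveness, radius) values are made up.

```python
import math

def standardize(rows):
    """Z-score each column (population standard deviation)."""
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    stds = [math.sqrt(sum((x - m) ** 2 for x in c) / len(c))
            for c, m in zip(cols, means)]
    return [[(x - m) / s for x, m, s in zip(r, means, stds)] for r in rows]

def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def centroid(points):
    return [sum(c) / len(c) for c in zip(*points)]

def kmeans(points, k, n_iter=100):
    """Lloyd's algorithm with a deterministic start (k evenly spaced rows)."""
    centers = [list(points[i * len(points) // k]) for i in range(k)]
    labels = None
    for _ in range(n_iter):
        new_labels = [min(range(k), key=lambda c: sq_dist(p, centers[c]))
                      for p in points]
        if new_labels == labels:  # assignments stable: converged
            break
        labels = new_labels
        for c in range(k):
            members = [p for p, l in zip(points, labels) if l == c]
            if members:  # keep the old center if a cluster empties
                centers[c] = centroid(members)
    return labels, centers

def within_to_total_variance(points, labels, centers):
    """Within-cluster over total sum of squares (lower means tighter clusters)."""
    within = sum(sq_dist(p, centers[l]) for p, l in zip(points, labels))
    total = sum(sq_dist(p, centroid(points)) for p in points)
    return within / total

# Hypothetical (entropy, attractiveness ratio, radius in km) per location.
rows = [[0.30, 0.6, 8.0], [0.32, 0.7, 9.0], [0.55, 1.0, 15.0],
        [0.57, 1.1, 16.0], [0.80, 1.8, 25.0], [0.82, 1.9, 26.0]]
points = standardize(rows)
labels, centers = kmeans(points, k=3)
ratio = within_to_total_variance(points, labels, centers)
```

Repeating the last two lines for increasing values of `k` and looking for the point where `ratio` stops dropping sharply is one common way to pick the number of clusters.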

Figure 3 shows the spatial distribution of the four resulting clusters across the whole studied area. Clusters are determined by a certain level of attractiveness and can be described as follows:

  • C1 (red) represents a low attractive cluster composed of 17 locations. It is characterized by a low entropy, an attractiveness ratio lower than one and a low radius of attraction. Locations in C1 are either far from the Rio city center or segregated areas inside the Capital;

  • C2 (green) is a cluster of 22 locations, mostly located inside the city. This cluster is characterized by medium values of entropy of visitors and radius of attraction, while having an attractiveness ratio close to one;

  • C3 (blue) is an attractive group of 8 locations, mostly near the sea inside the Capital. This cluster shares high entropy values, an attractiveness ratio between 1 and 2 and a large radius of attraction;

  • C4 (orange) is composed of only one location that can be considered an outlier due to its very high attractiveness. The remaining three clusters do not change if this outlier is removed before clustering. This location is the business center (Centro) of the city, and is a very attractive cluster with a very large entropy, attractiveness ratio and radius of attraction. This location concentrates most of the jobs and attracts visitors from all over the RJMA.

Figure 3: Map of the RJMA displaying the spatial distribution of the four clusters.
Figure 4: Zoom on Rio de Janeiro city. (a) Favela sub-districts (in purple) and business center (in orange) locations. We focus the discussion on five locations: (1) Complexo do Alemão, (2) Jacarezinho, (3) Rocinha, (4) Complexo da Maré, and (5) Cidade de Deus. (b) Spatial distribution of the clusters. (c) Municipal Human Development Index (MHDI) from 2013. (d) Social Progress Index (IPS) from 2016.

Our methodology allows us to detect segregated areas with a very low diversity of visitors and attractiveness. Figure 4 shows the comparison of the clustering results with two social development indexes. We focus the discussion on five locations shown in Figure 4a: (1) Complexo do Alemão, (2) Jacarezinho, (3) Rocinha, (4) Complexo da Maré and (5) Cidade de Deus. The first four locations are classified by Rio City Hall as favela sub-districts (http://bit.ly/2O9SEdA, in Portuguese) and are shown in purple in Figure 4a. The term “favela” is used here in the sense of subnormal agglomerate as defined by IBGE (http://bit.ly/337gQlb): “a form of irregular occupation of land usually characterized by an irregular urban pattern, with scarce essential public services and located in areas not proper or allowed for housing use”. In a broad sense, favela also includes urbanized areas, areas that were once subnormal agglomerates but have been urbanised, and also housing estates. The favela sub-districts marked in purple in Figure 4a are defined by Rio City Hall as the locations with more than 50% of their population living in subnormal agglomerates. In Cidade de Deus, only 13% of the population lives in subnormal agglomerates, as it is mostly composed of housing estates, yet its socioeconomic indexes are similar to those of the favela sub-districts.

Figure 5: Economic analysis. Number of jobs (a) and income (in Brazilian Reals) (b) as a function of the entropy index. The entropy has been averaged over the work shift time periods on weekdays.

Figure 4b shows a zoom on the clustering results. The main favela sub-districts were classified in the low attractive cluster (C1), as was Cidade de Deus. Complexo da Maré also has many housing estates and 54% of its population living in subnormal agglomerates. It was classified in the medium attractive cluster (C2), possibly because it is crossed by two of the main expressways that lead out of the city. In the dataset used in this work, a visitor is detected in a given location by a call recorded within the location, so some detected visitors may merely be passing through the location to reach another destination.

Figures 4c and 4d show two social development indexes. Figure 4c shows the Municipal Human Development Index (MHDI), which is an adaptation of the Human Development Index (HDI) for municipalities. The MHDI data were obtained from the Atlas of Human Development in Brazil (www.atlasbrasil.org.br), where the MHDI computed in 2013 is available at the census tract level, as well as aggregated values for all municipalities and at the district level in metropolitan areas. In Rio, the MHDI is available for the macro zones shown in Figure 1, and the values for the five locations of interest in Figure 4a were obtained from the census tract level. The classes and colours used in Figure 4c were suggested by the Atlas. All five locations marked in Figure 4a were classified as medium MHDI, and many locations classified in the high attractive cluster (C3) have a very high MHDI.

The MHDI is a global index intended to compare social development across the whole country. The Rio City Hall has adopted the Social Progress Index (IPS), which is more focused on the city characteristics and is based on 32 indicators in three dimensions. The data used in this work were computed in 2016 and obtained from the open data portal of Rio City Hall (www.data.rio). The colours and levels presented in Figure 4d are the ones used by the Rio City Hall. It can be seen from Figure 4d that all four locations assigned to the low attractive cluster (C1) have a low IPS (IPS < 50). The Complexo da Maré sub-district has a medium IPS (50 < IPS < 60) and was assigned to the medium attractive cluster (C2). Moreover, most locations assigned to the high attractive cluster (C3) have a very high IPS (IPS > 70). There is a very good agreement between the clusters computed from mobility and the IPS, as cluster C1 corresponds to IPS < 50, cluster C2 corresponds to 50 < IPS < 70 and cluster C3 corresponds to IPS > 70.

In the next section, we discuss the relationship between the mobility indicators and the economic and social indicators selected for this study.

Figure 6: Sociodemographic analysis. Percentage of primary school level education (a), high school level education (b), black people (c), and white people (d) as a function of the entropy index. The entropy has been averaged over the work shift time periods on weekdays.
Figure 7: Social development indexes. MHDI (a) and IPS (b) as a function of the entropy index. The entropy has been averaged over the work shift time periods on weekdays.

Economic activity and sociodemographic factors

While transportation mobility has largely been recognized as a major player in the urban economy Duranton and Puga (2004), the recent scrutiny of Call Detail Records (CDR) has expanded our understanding of how mobility relates to economic activity across cities Cottineau and Vanhoof (2019); Xu et al. (2018). We here evaluate how entropy relates to officially reported job numbers and income levels (Figure 5). In spite of the large informal job market known to exist in the RJMA, our analysis shows a positive and exponential relationship between formal jobs and entropy (Figure 5a). Similar patterns emerge when relating income level with entropy (Figure 5b) as well as with Gross Domestic Product (GDP) (see Appendix).
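An exponential relationship of this kind can be checked with a log-linear least-squares fit. The sketch below is only an illustration of that generic technique: the source does not state how the relationship was fitted, and the data are synthetic, generated from an assumed law of 1000 × exp(5 × entropy).

```python
import math

def fit_exponential(x, y):
    """Least-squares fit of y = A * exp(b * x) via linear regression on log(y)."""
    log_y = [math.log(v) for v in y]
    n = len(x)
    mean_x = sum(x) / n
    mean_ly = sum(log_y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (li - mean_ly) for xi, li in zip(x, log_y))
    b = sxy / sxx                          # slope in log space
    A = math.exp(mean_ly - b * mean_x)     # intercept back-transformed
    return A, b

# Synthetic data (not from the study).
entropy = [0.3, 0.4, 0.5, 0.6, 0.7, 0.8]
jobs = [1000 * math.exp(5 * s) for s in entropy]
A, b = fit_exponential(entropy, jobs)
```

On real data, a roughly straight line when plotting the logarithm of the number of jobs against entropy, with a positive slope `b`, would be consistent with the pattern described for Figure 5a.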

Interestingly, opposite trends emerge when entropy is plotted against demographic indices such as the percentage of the population having completed primary education and high school degrees. In Figure 6, “primary school” refers to the percentage of individuals having primary school or lower education level and “high school” refers to individuals having high school or higher education level. School degrees are positively correlated with income, meaning that higher income locations tend to have higher education levels. Race is likewise correlated with income: white individuals are more prevalent in higher income locations and black individuals in lower income locations. As entropy is related to income (Figure 5), locations where a large fraction of the population has a primary school diploma or lower exhibit lower entropy values (Figure 6a), while locations with a large proportion of residents with high school or higher education exhibit higher entropy values (Figure 6b). This is strikingly similar to the pattern exhibited by ethnic origin. The percentage of black population, like the percentage of primary school education, shows a negative relation to entropy (Figure 6c), while areas with a larger percentage of white population tend to exhibit higher entropy values (Figure 6d).

The entropy of visitors, computed from CDR, reflects the complexity of indicators usually computed using classical approaches. In fact, entropy seems to be positively associated with socioeconomic indicators such as MHDI and IPS (Figure 7), consistent with the patterns shown in Figures 5 and 6.

Figure 8: Temporal evolution of the three metrics. From top to bottom, entropy, attractiveness and radius of attraction as a function of time by cluster. The values are averaged by cluster and normalized by the value obtained for the work shift during weekdays. A similar plot displaying boxplots instead of average values is available in the Appendix.

Temporal evolution of the attractiveness

To study the temporal evolution of entropy, attractiveness and radius of attraction, we plot the normalized average metric values for each cluster across time shifts (Figure 8). Normalizations are performed using the reference values obtained for the work shift time period on weekdays. We consider relative, instead of absolute, values in order to make the average attractiveness of clusters of locations comparable over time. Entropy tends to decrease globally along the day on both weekdays and weekends for every location, whatever the cluster it belongs to. It is, however, interesting to note that the entropy is relatively higher during weekday nights and weekends for locations classified as low attractive during weekday work shifts than for highly attractive locations. Indeed, while the location of cluster C4 exhibits an entropy index 50% lower than the reference value, the index stays at about 80% of the reference for clusters C2 and C3, and at more than 90% for locations belonging to cluster C1. A similar behavior is observed for the radius of attraction. The situation is slightly different, however, for the attractiveness, which increases during afternoon and night shifts on weekdays for the low attractive cluster C1 and then reaches a plateau during weekend days. The location of cluster C4 shows the opposite behavior, with an attractiveness that decreases along the day to reach a plateau during weekend days. The attractiveness remains more or less constant for locations belonging to clusters C2 and C3.
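The normalization used for these temporal profiles amounts to dividing each cluster-averaged value by the value of the weekday work shift. A minimal sketch, with made-up values and a hypothetical function name:

```python
def normalize_by_reference(series, reference=("weekday", "Work")):
    """Express each period's value relative to the reference period."""
    ref = series[reference]
    return {period: value / ref for period, value in series.items()}

# Hypothetical cluster-averaged entropy per (day type, shift).
cluster_entropy = {
    ("weekday", "Work"): 0.8,
    ("weekday", "Night"): 0.4,
    ("weekend", "Work"): 0.6,
}
relative = normalize_by_reference(cluster_entropy)
```

A relative value of 0.5 for a period then reads as an index 50% lower than during the weekday work shift, which is how the percentages in the paragraph above are expressed.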


Discussion

The impact of socio-spatial inequalities on urban systems has largely been treated in the urban economics and sociological literature, but the increasing availability of large mobile phone databases has opened the possibility of providing a clearer picture of how different aspects of urban life impact economic and sociodemographic aspects of cities Blondel et al. (2015). In this direction, this work presents the results of the processing of 2.1 billion records collected from 2 million users in the Rio de Janeiro Metropolitan Area, Brazil, during the whole year of 2014, placing this research among the largest analyses, to our knowledge, relating mobility to socioeconomic complexity in Brazil. We hereby illustrate the potential of combining mobile phone data with entropy-based metrics to measure the attractiveness of a location. This may prove useful to urban planners and managers when it comes to describing and planning for complex socioeconomic indicators. While it is known that mobility is related to economic activity, this work presents an effective and simple way to measure such relationships from increasingly available ICT data such as mobile phone datasets.

While most capital cities in South America suffer from disproportionate growth compared to other urban settlements Henderson (1991), common patterns of spatial inequalities show that underprivileged populations establish themselves away from highly productive central zones Sabatini (2006); Dannemann et al. (2018), often with clear differences in the usage of urban infrastructure Lotero et al. (2016). In this sense, the particular and complex topography of Rio de Janeiro would suggest the existence of shared usage patterns of the city among urbanites coming from different social contexts. Because the spatial partitioning employed in our study closely matches the IBGE delineation, we are able to compare official statistics with measures derived from CDR data and to offer specific insights regarding the use of ICT data, such as mobile phone datasets, as proxies for the spatial distribution of complex socioeconomic indicators. Our analysis shows that the attractiveness of a district, measured by the diversity of visitors’ place of residence, is correlated with the income and the number of jobs, in spite of the large informal job market of Rio Motte et al. (2016).

We also show that the attractiveness is lower in areas hosting a large percentage of the population of African descent and/or in locations where primary school training is prevalent (Figure 6a,c). While this is in line with previous descriptions showing how available schooling options closely reproduce residential patterns of socio-spatial segregation Flores (2008); Li et al. (2013), the spatial mismatch with the highly productive Centro area, where work opportunities in the RJMA are concentrated, leads us to think that residential segregation of the poorest is reinforced by new inequalities when taking into account daily mobility opportunities. Unfortunately, and in spite of using state-of-the-art descriptors of urban diversity, we corroborate a well-known trend in which areas with large populations of African descent still stand out as an indicator of social inequality. This poses important planning challenges to historical areas such as the RJMA, where almost one million enslaved Africans were estimated to arrive in the XVII century Karasch (1987).

The observed results concur with recent developments in the scientific literature that show how mobile phone information can be used to evaluate the socioeconomic state of spatially heterogeneous regions Pappalardo et al. (2015); Eagle et al. (2010); Blumenstock et al. (2015), especially in developing countries. Moreover, the RJMA is a very particular case study in which socioeconomically isolated districts are located in between richer areas as well as in the periphery, the latter being more common in the large cities of developing countries. This particular characteristic of the city allows us to validate the results, as the clusters accurately identified favelas and other socioeconomically isolated districts, as shown in Figure 4.

In summary, this manuscript serves to illustrate the potential of mobile phone data combined with entropy-based metrics for measuring the attractiveness of a location, which can be used as a proxy for complex socioeconomic indicators. Even if the spatial partitioning used in this study tends to reduce the level of spatial uncertainty inherent in this type of data source Lenormand et al. (2016), it would be interesting to reproduce the results with different datasets coming from different sources of mobility information.


JCC, VFV, MAHBS and AGE acknowledge the funding granted by The Rio de Janeiro State Research Agency (FAPERJ) and by the Getulio Vargas Foundation. The work of ML was funded by the French National Research Agency (grant number ANR-17-CE03-0003). HS was funded by FONDECYT-CONICYT Chile (grant no. 1161280).


  • [1] F. Alvaredo, L. Chancel, T. Piketty, E. Saez, and G. Zucman (2018) World inequality report 2018. Belknap Press. Cited by: Introduction.
  • [2] H. Barbosa, M. Barthelemy, G. Ghoshal, C. R. James, M. Lenormand, T. Louail, R. Menezes, J. J. Ramasco, F. Simini, and M. Tomasini (2018) Human mobility: models and applications. Physics Reports 734, pp. 1–74. Cited by: Introduction.
  • [3] M. Batty (2013) Big data, smart cities and city planning. Dialogues in Human Geography 3 (3), pp. 274–279. Cited by: Introduction.
  • [4] M. G. Beiró, L. Bravo, D. Caro, C. Cattuto, L. Ferres, and E. Graells-Garrido (2018) Shopping mall attraction and social mixing at a city scale. EPJ Data Science 7 (1), pp. 28. Cited by: Introduction.
  • [5] L. Bettencourt, H. Samaniego, and H. Youn (2014) Professional diversity and the productivity of cities. Scientific Reports 4, pp. 5393. Cited by: Introduction.
  • [6] V. D. Blondel, A. Decuyper, and G. Krings (2015) A survey of results on mobile phone datasets analysis. EPJ data science 4 (1), pp. 10. Cited by: Introduction, Discussion.
  • [7] J. Blumenstock, G. Cadamuro, and R. On (2015) Predicting poverty and wealth from mobile phone metadata. Science 350 (6264), pp. 1073–1076. Cited by: Discussion.
  • [8] G. Bolt, J. Burgers, and R. Van Kempen (1998) On the social significance of spatial location; spatial segregation and social inclusion. Netherlands Journal of Housing and the built Environment 13 (1), pp. 83. Cited by: Introduction.
  • [9] T. Caldeira (2012) Fortified enclaves: the new urban segregation. In The urban sociology reader, pp. 419–427. Cited by: Introduction.
  • [10] C. Cottineau and M. Vanhoof (2019) Mobile Phone Indicators and Their Relation to the Socioeconomic Organisation of Cities. ISPRS International Journal of Geo-Information 8 (1), pp. 19. Cited by: Introduction, Economic activity and sociodemographic factors.
  • [11] D. Cutler and E. Glaeser (1997) Are ghettos good or bad?. The Quarterly Journal of Economics 112 (3), pp. 827–872. Cited by: Introduction.
  • [12] T. Dannemann, B. Sotomayor-Gómez, and H. Samaniego (2018) The time geography of segregation during working hours. Royal Society open science 5 (10), pp. 180749. Cited by: Introduction, Discussion.
  • [13] G. Duranton and D. Puga (2004) Micro-foundations of urban agglomeration economies. In Handbook of Regional and Urban Economics, Vol. 4, pp. 2063–2117. External Links: ISBN 978-0-444-50967-3 Cited by: Economic activity and sociodemographic factors.
  • [14] N. Eagle, M. Macy, and R. Claxton (2010) Network diversity and economic development. Science 328 (2010), pp. 1029–1031. Cited by: Discussion.
  • [15] S. Farber, T. Neutens, H. Miller, and X. Li (2013) The social interaction potential of metropolitan regions: a time-geographic measurement approach using joint accessibility. Annals of the Association of American Geographers 103 (3), pp. 483–504. Cited by: Introduction.
  • [16] S. Farber, M. O’Kelly, H. Miller, and T. Neutens (2015) Measuring segregation using patterns of daily travel behavior: a social interaction based model of exposure. Journal of Transport Geography 49, pp. 26–38. Cited by: Introduction.
  • [17] C. A. Flores (2008) Residential segregation and the geography of opportunities: a spatial analysis of heterogeneity and spillovers in education. Ph.D. Thesis, LBJ School of Public Affairs, University of Texas. Cited by: Discussion.
  • [18] R. Forrest and A. Kearns (2001) Social cohesion, social capital and the neighbourhood. Urban studies 38 (12), pp. 2125–2143. Cited by: Introduction.
  • [19] M. Garreton and R. Sánchez (2016) Identifying an optimal analysis level in multiscalar regionalization: a study case of social distress in greater santiago. Computers, Environment and Urban Systems 56, pp. 14–24. Cited by: Introduction.
  • [20] M. C. Gonzalez, C. A. Hidalgo, and A.-L. Barabási (2008) Understanding individual human mobility patterns. Nature 453 (7196), pp. 779. Cited by: Introduction.
  • [21] T. Hägerstrand (1970) What about people in regional science?. Papers in regional science 24 (1), pp. 6–21. Cited by: Introduction.
  • [22] J. V. Henderson (1991) Urban development: theory, fact, and illusion. Oxford University Press. Cited by: Discussion.
  • [23] P. A. Jargowsky (1997) Poverty and place: ghettos, barrios, and the american city. Russell Sage Foundation. Cited by: Introduction.
  • [24] S. Jiang, J. Ferreira, and M. C. Gonzalez (2017) Activity-based human mobility patterns inferred from mobile phone data: a case study of singapore. IEEE Transactions on Big Data 3 (2), pp. 208–219. Cited by: Introduction.
  • [25] M. C. Karasch (1987) Slave life in rio de janeiro, 1808-1850. Princeton University Press. External Links: ISBN 0691077088 Cited by: Discussion.
  • [26] N. Krieger (1999) Embodying inequality: a review of concepts, measures, and methods for studying health consequences of discrimination. International journal of health services 29 (2), pp. 295–352. Cited by: Introduction.
  • [27] F. Lamanna, M. Lenormand, M. H. Salas-Olmedo, G. Romanillos, B. Gonçalves, and J. J. Ramasco (2018) Immigrant community integration in world cities. PLOS ONE 13 (3), pp. e0191612. Cited by: Introduction.
  • [28] M. Lenormand, T. Louail, M. Barthelemy, and J. J. Ramasco (2016) Is spatial information in ICT data reliable?. In proceedings of the 2016 Spatial Accuracy Conference, 9-17, Montpellier, France.. Cited by: Discussion.
  • [29] M. Lenormand, S. Luque, J. Langemeyer, P. Tenerelli, G. Zulian, I. Aalders, S. Chivulescu, P. Clemente, J. Dick, J. van Dijk, M. van Eupen, R. C. Giuca, L. Kopperoinen, E. Lellei-Kovács, M. Leone, J. Lieskovský, U. Schirpke, A. C. Smith, U. Tappeiner, and H. Woods (2018) Multiscale socio-ecological networks in the age of information. PLOS ONE 13 (11), pp. 1–16. Cited by: Introduction.
  • [30] M. Lenormand, M. Picornell, O. G. Cantú-Ros, T. Louail, R. Herranz, M. Barthelemy, E. Frías-Martínez, M. San Miguel, and J. J. Ramasco (2015) Comparing and modelling land use organization in cities. Royal Society open science 2 (12), pp. 150449. Cited by: Introduction.
  • [31] H. Li, H. Campbell, and S. Fernandez (2013) Residential segregation, spatial mismatch and economic growth across us metropolitan areas. Urban Studies 50 (13), pp. 2642–2660. Cited by: Discussion.
  • [32] M. Lin, W.-J. Hsu, and Z. Q. Lee (2012) Predictability of individuals’ mobility with high-resolution positioning data. In Proceedings of the 2012 ACM Conference on Ubiquitous Computing, UbiComp ’12, Pittsburgh, Pennsylvania, pp. 381–390. Cited by: Introduction.
  • [33] L. Lotero, R. G. Hurtado, L. M. Floría, and J. Gómez-Gardeñes (2016) Rich do not rise early: spatio-temporal patterns in the mobility networks of different socio-economic classes. Royal Society open science 3 (10), pp. 150654. Cited by: Discussion.
  • [34] T. Louail, M. Lenormand, J. M. Arias, and J. J. Ramasco (2017) Crowdsourcing the robin hood effect in cities. Applied network science 2 (1), pp. 11. Cited by: Introduction.
  • [35] T. Louail, M. Lenormand, O. G. Cantú-Ros, M. Picornell, R. Herranz, E. Frias-Martinez, J. J. Ramasco, and M. Barthelemy (2014) From mobile phone data to the spatial structure of cities. Scientific reports 4, pp. 5276. Cited by: Introduction.
  • [36] R. Louf and M. Barthelemy (2016) Patterns of residential segregation. PloS one 11 (6), pp. e0157476. Cited by: Introduction.
  • [37] D. S. Massey and N. A. Denton (1988) The dimensions of residential segregation. Social forces 67 (2), pp. 281–315. Cited by: Introduction.
  • [38] D. S. Massey (1990) American apartheid: segregation and the making of the underclass. American journal of sociology 96 (2), pp. 329–357. Cited by: Introduction.
  • [39] B. Motte, A. Aguilera, O. Bonin, and C. D. Nassi (2016) Commuting patterns in the metropolitan region of rio de janeiro. what differences between formal and informal jobs?. Journal of Transport Geography 51, pp. 59–69. Cited by: Introduction, Discussion.
  • [40] V. Netto, E. Brigatti, J. Meirelles, F. Ribeiro, B. Pace, C. Cacholas, and P. Sanches (2018) Cities, from information to interaction. Entropy 20 (11), pp. 834. Cited by: Introduction.
  • [41] J.-P. Onnela, J. Saramäki, J. Hyvönen, G. Szabó, D. Lazer, K. Kaski, J. Kertész, and A.-L. Barabási (2007) Structure and tie strengths in mobile communication networks. Proceedings of the national academy of sciences 104 (18), pp. 7332–7336. Cited by: Introduction.
  • [42] C. Panigutti, M. Tizzoni, P. Bajardi, Z. Smoreda, and V. Colizza (2017) Assessing the use of mobile phone data to describe recurrent mobility patterns in spatial epidemic models. Royal Society open science 4 (5), pp. 160950. Cited by: Introduction.
  • [43] L. Pappalardo, D. Pedreschi, Z. Smoreda, and F. Giannotti (2015) Using big data to study the link between human mobility and socio-economic development. In 2015 IEEE International Conference on Big Data (Big Data), pp. 871–878. Cited by: Introduction, Discussion.
  • [44] L. Pappalardo, M. Vanhoof, L. Gabrielli, Z. Smoreda, D. Pedreschi, and F. Giannotti (2016) An analytical framework to nowcast well-being using mobile phone data. International Journal of Data Science and Analytics 2 (1), pp. 75–92. Cited by: Introduction.
  • [45] S. Phithakkitnukoon, Z. Smoreda, and P. Olivier (2012) Socio-geography of human mobility: a study using longitudinal mobile phone data. PloS one 7 (6), pp. e39253. Cited by: Introduction.
  • [46] B. Rubim and S. Leitão (2013) O plano de mobilidade urbana e o futuro das cidades. Estudos avançados 27 (79), pp. 55–66. Cited by: Introduction.
  • [47] J. Ruiz-Tagle (2013) A theory of socio-spatial integration: problems, policies and concepts from a us perspective. International Journal of Urban and regional research 37 (2), pp. 388–408. Cited by: Introduction.
  • [48] F. Sabatini (2006) The Social Spatial Segregation in the Cities of Latin America. Technical report Inter-American Development Bank. Cited by: Discussion.
  • [49] S. Schönfelder and K. W. Axhausen (2003) Activity spaces: measures of social exclusion?. Transport policy 10 (4), pp. 273–286. Cited by: Introduction.
  • [50] C. Song, Z. Qu, N. Blumm, and A.-L. Barabási (2010) Limits of predictability in human mobility. Science 327 (5968), pp. 1018–1021. Cited by: Introduction.
  • [51] B. Sotomayor-Gómez and H. Samaniego (2020) City limits in the age of smartphones and urban scaling. Computers, Environment and Urban Systems 79, pp. 101423. Cited by: Introduction.
  • [52] J. L. Toole, S. Colak, B. Sturt, L. P. Alexander, A. Evsukoff, and M. C. González (2015) The path most traveled: travel demand estimation using big data resources. Transportation Research Part C: Emerging Technologies 58, pp. 162–177. Cited by: Introduction.
  • [53] M. Vanhoof, W. Schoors, A. V. Rompaey, T. Ploetz, and Z. Smoreda (2018) Comparing regional patterns of individual movement using corrected mobility entropy. Journal of Urban Technology 25 (2), pp. 27–61. Cited by: Introduction, Entropy as a measure of attractiveness.
  • [54] A. G. Wilson (1969) The use of entropy maximising models, in the theory of trip distribution, mode split and route split. Journal of Transport Economics and Policy 3 (1), pp. 108–126. Cited by: Entropy as a measure of attractiveness.
  • [55] W. J. Wilson (2012) The truly disadvantaged: the inner city, the underclass, and public policy. University of Chicago Press. Cited by: Introduction.
  • [56] Y. Xu, A. Belyi, I. Bojic, and C. Ratti (2018) Human mobility and socioeconomic status: analysis of singapore and boston. Computers, Environment and Urban Systems 72, pp. 51–67. Cited by: Economic activity and sociodemographic factors.


Data preprocessing

Spatial aggregation

In this work, in order to link mobility results directly to socioeconomic data, each geographic unit is the union of the Voronoi polygons of antennas (Figure S1) that matches the geographic limits of one of the 49 locations (Figure S2). The set of regions outlined here is therefore directly related to the respective locations and, consequently, to the census data and many other sources.

Figure S1: Spatial distribution of antennas with their respective Voronoi polygons.
Figure S2: Spatial overlap between the aggregation of Voronoi cells and the districts’ spatial polygons.

Temporal aggregation

In addition to the spatial partitioning, the results were aggregated for each day in the data set. The four time shifts reflect the distribution of activity over the course of a day. The hour with the smallest number of records was 04:00 AM, as shown in Figure S3.

Figure S3: Number of calls per hour and the partition of time shifts. Total number of calls made in the RJMA in 2014 (including weekdays and weekends).

Identification of the user’s place of residence

The presumed residence of each user was computed as the most visited Voronoi cell between 08:00 PM and 06:00 AM during workdays and during the entire day on Sundays and holidays. We additionally required the user to be regularly detected in this cell (at least five times) and the number of visits to the most frequented cell to be greater than the number of visits to the second most frequented cell. The final dataset contains only mobile phone users with an identified residence. As mentioned above, the data were aggregated spatially by assigning each Voronoi cell to one of the 49 districts. The identification of the user's place of residence was then evaluated using data from the IBGE 2010 census. As can be observed in Figure S4, we obtained a good match between the census data and the residence identified with mobile phone data, with a Pearson correlation coefficient equal to 0.9.
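The residence rule described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' implementation: the record layout `(user_id, cell_id, timestamp)`, the function names, and the treatment of every non-Sunday, non-holiday day as a workday are all hypothetical choices made for the example.

```python
from collections import Counter
from datetime import datetime


def is_home_window(ts, holidays=frozenset()):
    """Return True if the timestamp falls in the residence-detection window."""
    # Sundays (weekday() == 6) and holidays count for the entire day.
    if ts.weekday() == 6 or ts.date() in holidays:
        return True
    # Assumption: every other day is a workday; keep 08:00 PM to 06:00 AM.
    return ts.hour >= 20 or ts.hour < 6


def infer_residence(records, min_visits=5, holidays=frozenset()):
    """Map each user to a presumed home cell.

    records: iterable of (user_id, cell_id, datetime) tuples (hypothetical layout).
    A user is kept only if the top cell has at least `min_visits` records and
    strictly more records than the second most frequented cell.
    """
    counts = {}
    for user, cell, ts in records:
        if is_home_window(ts, holidays):
            counts.setdefault(user, Counter())[cell] += 1

    homes = {}
    for user, cell_counts in counts.items():
        ranked = cell_counts.most_common(2)
        top_cell, top_n = ranked[0]
        second_n = ranked[1][1] if len(ranked) > 1 else 0
        if top_n >= min_visits and top_n > second_n:
            homes[user] = top_cell
    return homes
```

Users whose two most frequented cells tie, or who are seen fewer than five times in their top cell, are simply left without a residence and would be excluded from the final dataset.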

Figure S4: Number of mobile phone users with an identified residence in the RJMA as a function of the number of inhabitants in the 49 locations.
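The comparison with the census summarized in Figure S4 amounts to counting identified residents per district and correlating those counts with census population. The sketch below shows one way to do this; the cell-to-district mapping, the user identifiers, and the census figures are invented toy values for illustration only, and the helper names are hypothetical.

```python
from collections import Counter
from math import sqrt


def users_per_district(homes, cell_to_district):
    """Count users with an identified residence in each district."""
    return Counter(cell_to_district[cell] for cell in homes.values())


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


# Toy inputs (hypothetical, not taken from the study).
homes = {"u1": "c1", "u2": "c2", "u3": "c3", "u4": "c3"}
cell_to_district = {"c1": "D1", "c2": "D1", "c3": "D2"}
census = {"D1": 1000, "D2": 1100, "D3": 400}

counts = users_per_district(homes, cell_to_district)
districts = sorted(census)
r = pearson([counts[d] for d in districts], [census[d] for d in districts])
```

Districts with no identified resident contribute a count of zero, because `Counter` returns 0 for missing keys.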

Clustering analysis

Figure S5: Ratio between the within-group variance and the total variance as a function of the number of clusters.
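The quantity plotted in Figure S5 can be computed for any given partition as the within-group sum of squared distances to cluster centroids divided by the total sum of squared distances to the overall centroid. The sketch below assumes Euclidean distance and takes the cluster labels as given; the clustering algorithm itself is not specified here, and the function name is hypothetical.

```python
def within_total_variance_ratio(points, labels):
    """Ratio of within-group to total sum of squares for a labelled partition.

    points: list of equal-length numeric tuples; labels: one label per point.
    Assumes the points are not all identical (otherwise the total is zero).
    """
    def centroid(rows):
        n = len(rows)
        return [sum(col) / n for col in zip(*rows)]

    def sq_dist(p, q):
        return sum((a - b) ** 2 for a, b in zip(p, q))

    overall = centroid(points)
    total = sum(sq_dist(p, overall) for p in points)

    # Group the points by cluster label.
    groups = {}
    for p, lab in zip(points, labels):
        groups.setdefault(lab, []).append(p)

    within = 0.0
    for rows in groups.values():
        c = centroid(rows)
        within += sum(sq_dist(p, c) for p in rows)
    return within / total
```

The ratio equals 1 when all points share one cluster and 0 when every point is its own cluster, so plotting it against the number of clusters shows how much variance each additional cluster explains.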

Economic activity

Figure S6: Gross Domestic Product (GDP) as a function of the entropy index. The entropy has been averaged over the work shift time periods on weekdays.

Temporal evolution

Figure S7: Temporal evolution of the three metrics. From the top to the bottom, Tukey boxplots of the entropy, attractiveness and radius of attraction as a function of time by cluster. The values are normalized by the value obtained for the work shift during weekdays.