News and the city: understanding online press consumption patterns through mobile data

07/04/2019 ∙ by Salvatore Vilella, et al. ∙ Università di Torino, Universidad del Desarrollo, ISI Foundation

Ever-increasing mobile connectivity affects every aspect of our daily lives, including how and when we keep ourselves informed and consult news media. By studying mobile web data provided by one of the major Chilean telecommunication companies, we investigate how different cohorts of the population of Santiago de Chile consume news media content through their smartphones. We address the issue of inequalities in access to information, trying to understand to what extent socio-demographic factors affect the preferences and habits of users.

1 Introduction

The Internet, the World Wide Web and, more recently, the pervasiveness of mobile technologies have radically transformed the way individuals consume cultural content. One of the areas most affected by new forms of digital media is journalism: how newspapers are accessed, how news is consumed, and the abundance and diversity of topics have all been permanently altered. Gaining a better understanding of how different population groups use and benefit from on-line news services is now even more important since, by leveraging mobile information, it is possible to map the variability of news consumption patterns onto inequalities in socio-demographic features such as education level, age and income.

In this paper we present the results of a study on accesses to on-line news media through mobile devices by individuals living in the city of Santiago de Chile (SCL), based on deep packet inspection (DPI) data. The focus is on the time window from 6 July 2016 to 2 August 2016. The goal is to examine the possible relationship between the patterns of news consumption, as revealed through the mobile connections to on-line news outlets, and the socio-demographic characteristics of the areas from which the mobile connections originated. We focus on understanding whether there are differences and possible inequalities in the news consumption patterns of those living in the many diverse areas of SCL, from the most deprived to the wealthiest ones.

2 Related Work

News consumption patterns have been increasingly studied in the last decade, especially since digital media started providing a huge variety of new platforms that facilitate, and at the same time make more complex, the consumption of news. Access to news outlets from mobile devices, for example, is more fragmented and shallow than what is observed with traditional media [1]. Technology also drives the consumption of content in terms of continuity of access ("anywhere, anytime") [2].

Moreover, socio-demographic features have a strong impact on news media consumption patterns. This has traditionally been investigated through surveys among various population groups ([3, 4, 5, 6], just to mention a few). Mobile technology could overcome some of the limitations of these traditional methods of data collection, such as surveys and self-reported behavioural information, which are usually expensive, biased, time consuming, and unlikely to scale efficiently. Indeed, many diverse research questions have been addressed using mobile communication data. For example, such data have been used to build indexes of the economic development of a region [7], or to study traffic flows, urban human mobility and social mixing phenomena [8, 9, 10, 11, 12, 13], as well as epidemic spreading on fine spatial and temporal scales [14].

Most of the aforementioned works make use of CDRs (Call Detail Records) and XDRs (Data Detail Records). Our dataset is based on DPI data, which are rather different (for a detailed description, see Sec. 3.1.2) and, to the best of our knowledge, there are as yet no other studies based on this kind of private data, especially not in the field that we are investigating. Nevertheless, such data can be useful to better understand the relationship between news consumption behaviour and the socio-demographics of the urban districts from which news is accessed.

Chilean news media have been studied to outline, for example, their ownership structures or their political bias, and to understand the effects of owners' manipulation of the content shown [15]. These results are based on hypotheses born out of an operationalization of Herman and Chomsky’s Propaganda Model [16], whose so-called “filters” provide an assessment of how the media behave. One particular hypothesis is that the media will manipulate content so that news appeals to a certain audience. This has also been explored using Twitter data [17]. What we study here is a specialization of that study; namely, how people of different socio-economic backgrounds access news media information using their mobile phones, with additional insight into particular news outlets, at the finest possible level of granularity.

3 Methods

3.1 Datasets and data pre-processing

In this section we present a description of the data used and of the procedures that we followed for data cleaning and pre-processing.

3.1.1 Census data

To characterize Chile socio-economically, we used the official 2017 Chilean census (released by the Instituto Nacional de Estadística (INE), see https://www.censo2017.cl), which surveyed 17,574,003 people (51.1% females). Geopolitically, from largest to smallest, Chile consists of “regions”, the largest administrative division, followed by “comunas” (similar to counties in the United States), census districts, census zones, and finally blocks, the finest level of geographical granularity. The public dataset is available at the level of blocks. Here, we focus on the Metropolitan Region of Santiago de Chile (RM, for short). The RM consists of 52 “comunas” for a total of 7,112,808 individuals (51.3% females), accounting for about 40% of the total population of Chile. In this work we correlate census information with data from mobile web traffic records. Since minors cannot sign mobile phone subscriptions, we decided to be conservative and only work with adults in the census, to have a better match between the sample populations of the mobile and census data sets. We ended up with 5,450,592 individuals in the census and 2,455,148 unique users in the general phone dataset.

The 2017 Census contains a great deal of information on several aspects of the socio-economic composition of the Chilean population. Its main limitation is that it is only an abridged version of the usual census surveys, since a more comprehensive one will be released in 2022. Given the particular structure of the questionnaire, we were able to perform a manual selection of the census features. Indeed, the vast majority of the questions in the survey revolve around a few topics (e.g., on top of the basic question "Do you consider yourself as a member of a native minority?" there are additional questions to distinguish between the different native minorities). We thus decided to select our features only among the following basic socio-demographic information:

  • Age;

  • Escolaridad (years of formal education attained);

  • Student status (whether an individual is still studying or not);

  • Membership in a native population.

The census does not report on individuals’ income directly. Thus, information about the economic situation had to be inferred. To do so, we chose escolaridad as a proxy. In Chile, at least, there is a well-known strong correlation between formal education and income distribution and inequality [18, 19]. Moreover, the variable escolaridad weighs heavily in the calculation of the Human Development Index (HDI), a widely used indicator of wellness and quality of life, which in turn can be used to get a first, qualitative check of the soundness of our results (a map of the Santiago HDI distribution at the level of municipalities can be seen in Fig. 1). Census data can be aggregated at the different levels already mentioned: region, comuna, districts, zones and blocks. As granularity increases, census information becomes less specific in order to avoid identification and preserve the privacy of individuals. We have chosen the level of district as a good trade-off between granularity, privacy preservation and availability of information (see Fig. 2). Finally, each value was expressed as a percentage or a mean, depending on the type of quantity, over the population of each district.

Figure 1: HDI distribution for the comunas of the Region Metropolitana de Santiago
Figure 2: Census districts and comunas in the urban area of Santiago. We grouped census data at the level of census districts, the smaller areas surrounded by grey borders. The figure also shows the boundaries of the comunas (black lines), the administrative areas into which the city is divided. The red dots represent the dummy towers, obtained by clipping the coordinates of each antenna to the second decimal digit. All the traffic flow outgoing from each antenna was aggregated in the dummy tower to which it belongs.

Other than the provisos made above, the rest of the census dataset was not modified or preprocessed in any other way.

3.1.2 Building a mobile web connections data set

From one of the largest companies in Chile in terms of mobile subscription market share (more than 37%, see Sec. 4.1.1), we had access to a DPI (Deep Packet Inspection) dataset, which is a record of internet connections from smartphones. This dataset consists of almost a month (between 6 July 2016 and 2 August 2016) of anonymized events. An event is defined as a connection of an individual device to an IP address through a cell tower (or antenna). In order to preserve privacy, information was aggregated by antenna and by hour, without any single-user information, making it virtually impossible to de-anonymize news consumers.

Our dataset, henceforth called dsUsers or DPI, includes the number of unique users that connected from an antenna to an IP address at a certain hour (00, 01, 02… 23), as in the following:

     antenna    date       hour   ip                usrs
1    00000000   20160706   11     200.12.26.117     1
2    00000000   20160706   14     190.153.242.131   1
3    00000000   20160706   14     200.12.20.11      1
4    00000000   20160706   15     190.110.123.219   1

Table 1: The first lines of the DPI dataset.

In the case of the small (real) sample above, the first row of the raw dataset tells us that from antenna 00000000, on July 6, 2016, at 11 in the morning there was one user visiting 200.12.26.117.

The city of Santiago de Chile contains more than 15,000 antennas, both outdoors and indoors: their exact latitude and longitude positions are known, allowing us to georeference our analysis. We can group together all the antennas within a 1.1 km radius, obtaining a lattice of about 700 points that will be our new, “fictitious” antennas (see Fig. 2). These should be regarded as “sensors” that effectively count accesses to outlets while also helping to preserve privacy, since we only report connections at a granularity of 1 square kilometer.
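The grouping of antennas into dummy towers can be sketched as follows. This is a minimal illustration, not the authors' pipeline: it assumes that "clipping" means truncating each coordinate toward zero at the second decimal digit, and the function names, the antenna dictionary and the event tuples are hypothetical. Note also that summing hourly unique-user counts across antennas may count the same device more than once if it connects through several antennas of the same dummy tower.

```python
from collections import defaultdict
from decimal import Decimal, ROUND_DOWN


def clip_coord(value, digits=2):
    """Truncate a coordinate toward zero at `digits` decimal places (assumed meaning of 'clipping')."""
    step = Decimal(1).scaleb(-digits)  # Decimal('0.01') when digits == 2
    return Decimal(str(value)).quantize(step, rounding=ROUND_DOWN)


def aggregate_by_dummy_tower(antennas, events):
    """Sum user counts of all antennas that fall on the same clipped (lat, lon) point.

    antennas: {antenna_id: (lat, lon)}                      (hypothetical layout)
    events:   iterable of (antenna_id, date, hour, ip, usrs) rows
    """
    totals = defaultdict(int)
    for antenna_id, date, hour, ip, usrs in events:
        lat, lon = antennas[antenna_id]
        tower = (clip_coord(lat), clip_coord(lon))
        totals[(tower, date, hour, ip)] += usrs
    return dict(totals)


# Hypothetical antennas: "a" and "b" share a dummy tower, "c" does not.
antennas = {"a": (-33.4569, -70.6483), "b": (-33.4512, -70.6401), "c": (-33.4601, -70.65)}
events = [
    ("a", "20160706", 11, "200.12.26.117", 1),
    ("b", "20160706", 11, "200.12.26.117", 2),
    ("c", "20160706", 11, "200.12.26.117", 4),
]
towers = aggregate_by_dummy_tower(antennas, events)
```

Using `Decimal` on the string form of each coordinate avoids the floating-point surprises that multiplying by 100 and truncating can produce near a grid boundary.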

Our dataset is a part of the complete deep packet inspection dataset of the telco provider. This subset contains only connections to those IP addresses that belong to some kind of news media outlet. In order to navigate among these outlets, we used a curated list of news organizations analyzed in a previous work [15]. The list contains around 400 news outlet accounts (see https://bit.ly/2Ukpbyl), among them twenty-six outlets whose economic and political bias was known [17].

Identifying each IP address in order to associate it with the name of a website was not straightforward. There was often a many-to-one relationship between websites and IPs; for example, there are clusters of two or more websites that share the same IP. This is a critical issue, since in the dataset we can only see IP addresses, without any DNS resolution. In any case, most of the websites that share the same IP belong to the same owner. This allows us to label each IP by its owner: in this way we lose knowledge about the individual news outlet, which we are often unable to identify, but we still keep a satisfactory amount of information by creating a unique matching between each IP and the editorial group (EG) that owns it. Thus, whenever needed, the EG was considered instead of the individual news outlet. Also, as shown in [15], the power structure in Chilean news media is strongly biased towards very few groups that share an identifiable editorial line. The only newspaper outside the above list is The Clinic, a Chilean satirical newspaper that is usually identified as leftist, which we added in order to cover the political spectrum as widely as possible and to use it as a baseline for informational extremes (“El Mercurio” would be right-conservative, and “The Clinic” would be left-liberal). The complete list of news outlets examined is shown in Table 2.

List of the news outlets examined
BioBioChile
El Mercurio editorial group
Cooperativa
AdnRadioChile
The Clinic
Tele 13
Publimetro Chile
Diario Financiero
Table 2: List of news outlets (or editorial groups) examined in the dsUsers dataset.
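The labeling of IP addresses by owner described above can be illustrated with a short sketch. Everything in it is hypothetical: the IP addresses come from a documentation range, the site names and groups are placeholders, and the function name is ours. It only shows the idea of keeping an IP when all the websites behind it belong to a single editorial group and flagging it otherwise.

```python
def label_ips_by_owner(ip_to_sites, site_to_group):
    """Map each IP to its editorial group when all hosted sites share one owner.

    Returns (labels, ambiguous), where `ambiguous` lists IPs whose sites
    belong to more than one group and therefore cannot be labeled uniquely.
    """
    labels = {}
    ambiguous = []
    for ip, sites in ip_to_sites.items():
        owners = {site_to_group[site] for site in sites}
        if len(owners) == 1:
            labels[ip] = owners.pop()
        else:
            ambiguous.append(ip)
    return labels, ambiguous


# Placeholder data (not taken from the DPI dataset).
ip_to_sites = {
    "192.0.2.10": ["site-a.example", "site-b.example"],  # two sites, same owner
    "192.0.2.20": ["site-c.example"],
    "192.0.2.30": ["site-a.example", "site-c.example"],  # mixed owners
}
site_to_group = {
    "site-a.example": "Group A",
    "site-b.example": "Group A",
    "site-c.example": "Group B",
}
labels, ambiguous = label_ips_by_owner(ip_to_sites, site_to_group)
```

Under this scheme the per-outlet detail is lost for shared IPs, but each labeled IP maps to exactly one editorial group, which is the property the analysis relies on.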

Finally, since we are crossing mobile data with census data, which contain socio-demographic information about the residents of a certain area, we need to take into account the phenomenon of the floating population, i.e. people moving from one place to another, especially at commuting hours and on working days. This matter has been extensively studied, and there are case studies set in Santiago that show exactly how the city is affected by the phenomenon (and how mobile data could be used to understand urban mobility, see for example [20]). To tackle this issue, we examined the temporal patterns of the connections, illustrated in Fig. 3. By comparing the trends between weekends (Saturdays and Sundays) and working days (Mondays to Fridays) we can easily notice the circadian rhythms of the city: connections start to rise when people wake up, stay high while they commute to work, rise again at lunch time and peak once more when they go back home at 6 pm. This last peak is what differentiates working days from weekends: on Saturdays and Sundays no afternoon peak can be observed, and connections decline smoothly towards the night hours instead. The typical effects of the floating population phenomenon seem much attenuated during weekends. If we check the correlation between the number of unique accesses and the number of residents in each comuna, we get a Pearson coefficient P = 0.75, which further supports the representativeness of our dataset. Therefore, by considering only the connections at weekends, we reduce the noise caused by the floating population phenomenon.

Figure 3: Total connections compared over the whole period: working days and weekends.
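The weekend filter and the representativeness check can be sketched in a few lines. This is an illustrative sketch under stated assumptions: dates are taken to be in the `YYYYMMDD` form shown in Table 1, the function names are ours, and the Pearson coefficient is computed with the textbook formula rather than with whatever tool the authors used.

```python
from datetime import datetime
from math import sqrt


def is_weekend(date_str):
    """True for Saturdays and Sundays; `date_str` is assumed to be YYYYMMDD."""
    return datetime.strptime(date_str, "%Y%m%d").weekday() >= 5


def weekend_events(events):
    """Keep only (antenna, date, hour, ip, usrs) rows recorded on a weekend."""
    return [row for row in events if is_weekend(row[1])]


def pearson(xs, ys):
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    norm_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    norm_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (norm_x * norm_y)


rows = [
    ("00000000", "20160706", 11, "200.12.26.117", 1),  # a Wednesday
    ("00000000", "20160709", 11, "200.12.26.117", 1),  # a Saturday
]
kept = weekend_events(rows)
```

In the study, the two sequences passed to such a function would be the number of unique accesses and the number of residents per comuna.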

3.2 Ethical considerations

Working with mobile data could raise privacy concerns. For this reason, DPI data was handed over by the mobile provider already anonymized and grouped per tower: data appeared as shown in Table 1, displaying only the number of smartphones connected to a certain antenna, without any knowledge about the identity or the profile of the customer. Researchers always worked with highly aggregated data and, for good measure, the resolution was then made even coarser, as described in Sec. 3.1.2. The same applies to census data: even though they are completely anonymized and publicly available, the research was not carried out at the finest possible granularity, in order to strike a good balance between the amount of information available and privacy preservation.

No attempt was made to infer statistics about individuals: the nature of DPI itself prevents this from happening. All the results are general trends of very large groups of residents - between hundreds of thousands and millions of people.

3.3 Mapping census data to districts

All unique connections to the websites are grouped by hour and by antenna: this provides a good approximation of the users’ position in the city. Our goal is to study the connections based on the census features of the areas from which they originate. Thus, we want to assign to each census district (CD) a label that helps us distinguish between the different levels of census: we will call them census levels. To estimate the most appropriate number of levels, we cluster the CDs based on the census features that we selected earlier. We use a k-means algorithm, running the procedure several times for different values of the number of clusters $k$. Since we need a clear and easily interpretable classification, we limit our choice to $3 \leq k \leq 8$: having only 2 census levels would not be enough for such an analysis, while more than 8 would be hard to interpret. For each $k$ we compute the Gini coefficient of the distribution of the population (Fig. 4): a low Gini represents a situation of order, in which the population is almost equally distributed across all the clusters, while a high value is obtained for a heterogeneous distribution across the different groups. The final value, $k = 5$, is chosen as it is the value that maximizes the Gini coefficient and, thus, the heterogeneity of the distribution of the population. This is based on the assumption that a very ordered situation, in which we have almost the same number of people for each level of census, would be unrealistic.

Figure 4: Gini coefficient for different values of $k$. $k = 5$ was chosen as it maximizes the coefficient. The green area refers to the values of $k$ that we took into consideration for the final choice.
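The selection criterion can be made concrete with a small sketch. It assumes that the clustering has already been run for each candidate number of clusters and that the population of every resulting cluster is known; the Gini formula below is the standard one for a finite list of non-negative values and may differ in normalization from the one used by the authors. The function names and the population figures in the example are hypothetical.

```python
def gini(values):
    """Gini coefficient of a list of non-negative values (0 means perfectly equal)."""
    xs = sorted(values)
    n = len(xs)
    total = sum(xs)
    if n == 0 or total == 0:
        return 0.0
    # Rank-weighted sum over the sorted values, ranks starting at 1.
    weighted = sum((rank + 1) * x for rank, x in enumerate(xs))
    return 2 * weighted / (n * total) - (n + 1) / n


def choose_k(populations_by_k):
    """Return the number of clusters whose population split has the highest Gini."""
    return max(populations_by_k, key=lambda k: gini(populations_by_k[k]))


# Hypothetical cluster populations for two candidate values of k.
candidates = {
    3: [100, 100, 100],       # perfectly even split
    4: [10, 20, 30, 240],     # strongly uneven split
}
best_k = choose_k(candidates)
```

A perfectly even split yields a coefficient of zero, so under this criterion it is always the least preferred option, in line with the assumption that equally sized census levels would be unrealistic.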

By setting $k = 5$ and inspecting the average values of the features in each cluster (see Table 3), we observe an almost hierarchical relationship among the different clusters. This is particularly true for the escolaridad variable, which correlates very well with income, as mentioned above. The clusters were simply named K1-K2-K3-K4-K5, with K1 being the wealthiest area, and all the others following in an ordinal fashion.

Cluster   Mean age   Avg years of schooling   % of students   % of people of indigenous ethnicity
K1        46.25      16.91                    0.15            0.05
K2        38.78      16.50                    0.18            0.07
K3        42.05      14.65                    0.14            0.10
K4        46.36      14.30                    0.12            0.10
K5        44.62      12.86                    0.11            0.13

Table 3: Insights into the clusters. The table shows the mean values of the features for each cluster.

The resulting map, shown in Fig. 5, resembles the distribution of the HDI shown in Fig. 1. In particular, the clustering procedure captures very well the presence of a very rich and highly segregated area (cluster K1) in the north-eastern part of the urban area, while the other census groups are distributed more heterogeneously throughout the city.

Figure 5: Result of the k-means clustering on the census districts.

3.3.1 Study of the global spatial autocorrelation

As further evidence of the segregation that emerges from the k-means algorithm, we dig deeper into the spatial distribution of the census features. To do this, we measure Moran’s Index [21] on census data and test the global spatial autocorrelation of the features against a random null model. Moran’s I for a variable $y$ measured over $n$ spatial units is defined as

$$I = \frac{n}{\sum_{i}\sum_{j} w_{ij}} \, \frac{\sum_{i}\sum_{j} w_{ij}\, z_i z_j}{\sum_{i} z_i^2}$$

where $w_{ij}$ are the elements of a spatial weight matrix $W$, and $z_i = y_i - \bar{y}$ is the deviation of $y$ from its mean value in the spatial unit $i$. The matrix $W$ is essential in spatial autocorrelation analysis, since it provides the model with a measure of spatial contiguity. The definition of spatial contiguity is usually specified as a neighborhood relationship between spatial units. Since we are dealing with census districts, namely areas of the city, we decided to use the Queen neighborhood (Fig. 6(a)). Thus, we defined as contiguous all those areas that surround the zone we are observing: all those areas interact, communicate and most likely influence one another.

(a) Queen neighborhood
(b) Rook neighborhood
Figure 6: Types of neighborhood for the spatial weight matrix

We measured the global spatial autocorrelation for the census since we need to corroborate any evidence of clustering resulting from the application of the k-means algorithm. Moran’s I is a univariate measure of correlation, hence we measured it for every census feature. All the results were then compared with the Moran’s I calculated for the case of a completely random spatial distribution of the features.
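As an illustration of this procedure, the sketch below computes Moran's I with binary Queen-contiguity weights and compares it with values obtained after randomly shuffling the feature across spatial units. It is a toy version under explicit assumptions: the spatial units form a regular grid (real census districts are irregular polygons), the null model is a simple permutation test, and all function names are ours rather than the authors'.

```python
import random


def queen_weights(nrows, ncols):
    """Binary Queen contiguity on a regular grid: {cell index: [neighbour indices]}."""
    weights = {}
    for r in range(nrows):
        for c in range(ncols):
            nbrs = []
            for dr in (-1, 0, 1):
                for dc in (-1, 0, 1):
                    rr, cc = r + dr, c + dc
                    if (dr, dc) != (0, 0) and 0 <= rr < nrows and 0 <= cc < ncols:
                        nbrs.append(rr * ncols + cc)
            weights[r * ncols + c] = nbrs
    return weights


def morans_i(y, weights):
    """Global Moran's I for values `y` and binary neighbour lists `weights`."""
    n = len(y)
    mean = sum(y) / n
    z = [v - mean for v in y]
    s0 = sum(len(nbrs) for nbrs in weights.values())  # sum of all weights
    num = sum(z[i] * z[j] for i, nbrs in weights.items() for j in nbrs)
    den = sum(v * v for v in z)
    return (n / s0) * num / den


def permutation_test(y, weights, n_perm=999, seed=0):
    """Observed I and a pseudo p-value against random spatial reshuffling."""
    rng = random.Random(seed)
    observed = morans_i(y, weights)
    shuffled = list(y)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if morans_i(shuffled, weights) >= observed:
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)


# Toy feature on a 4x4 grid: high values in the two left columns, low in the right.
w = queen_weights(4, 4)
feature = [1, 1, 0, 0] * 4
observed_i, p_value = permutation_test(feature, w)
```

A strongly segregated toy pattern such as this one gives a clearly positive index that random reshuffles rarely reach, which is the kind of evidence the comparison with the random case is meant to provide.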

To gain a more local view of the spatial autocorrelation, we examined the Moran scatterplot of the data points for each feature. The analysis of the Moran scatterplot was first introduced by Anselin [22], and it is based on the interpretation of Moran’s Index as the coefficient of an ordinary least squares (OLS) regression of the spatially lagged variable on the variable itself:

$$I = \frac{\sum_{i} z_i \sum_{j} w_{ij} z_j}{\sum_{i} z_i^2}$$

where $\sum_{j} w_{ij} z_j$ is the spatial lag of the variable $z$ (which in turn is expressed as deviation from the mean), and the weights are taken to be row-standardized, so that $\sum_{i}\sum_{j} w_{ij} = n$. Hence the spatial lag is a weighted average of the variable over the neighbours of $i$. Note that the interpretation of $I$ as a coefficient of an OLS regression is valid for any statistic that can be expressed as a ratio between a quadratic form and the sum of the squares [22], which is exactly how Moran’s I is defined.

Plotting the spatial lag of $z$ as a function of $z$ allows us to have a local view of the spatial autocorrelation. The plot is divided into four quadrants, going from the 1st quadrant of high values of both $z$ and the spatial lag (high-high correlation) to the 4th quadrant of high-low correlation, passing through the so-called low-high and low-low correlation. A point far up in the first quadrant of the Moran scatterplot of the average schooling years feature represents an area that has a very high mean value of schooling and is surrounded by neighbours with similarly high values.
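The quantities behind a Moran scatterplot can be computed as in the following sketch. It assumes row-standardized weights, so that the spatial lag of a unit is the plain average of the deviations of its neighbours; the neighbour lists, the values and the function names are hypothetical, and zero deviations are arbitrarily assigned to the "high" side.

```python
def spatial_lag(z, neighbours):
    """Row-standardized spatial lag: mean of z over each unit's neighbours."""
    return [sum(z[j] for j in neighbours[i]) / len(neighbours[i]) for i in range(len(z))]


def moran_scatter(y, neighbours):
    """Return (z, lag, quadrant labels, OLS slope of lag on z)."""
    mean = sum(y) / len(y)
    z = [v - mean for v in y]
    lag = spatial_lag(z, neighbours)
    labels = [
        ("high" if zi >= 0 else "low") + "-" + ("high" if li >= 0 else "low")
        for zi, li in zip(z, lag)
    ]
    # Slope of the regression through the origin of lag on z.
    slope = sum(a * b for a, b in zip(z, lag)) / sum(a * a for a in z)
    return z, lag, labels, slope


# Four units on a line: each unit neighbours the adjacent ones.
neighbours = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
schooling = [10, 9, 1, 0]  # hypothetical feature values
z, lag, labels, slope = moran_scatter(schooling, neighbours)
```

In this toy case the first two units fall in the high-high quadrant and the last two in the low-low quadrant, and the slope of the fitted line is positive, which is the pattern one expects from a spatially segregated feature.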

As we can see in Fig. 7, there is strong evidence of segregation for almost all the census features, in particular for the extreme census segments K1 and K5 which, depending on the feature, are usually located way up near the bisector of the first and third quadrants. The census segments in between, instead, display a far more mixed situation.

Figure 7: Moran scatterplot of geo-located census data.

4 Results and discussion

4.1 Geo-referenced analysis of the dsUsers dataset

The labeling of the census zones based on socio-demographic features finally enables us to proceed with a geo-referenced analysis of the connections towards the selected news outlets websites. The goal is to find differences in the consumption of news media content by areas of the city that have different socio-demographical attributes.

A list of the news outlets can be found in Table 2: wherever we encountered the issues described in Sec. 3.1.2, we grouped the news outlets by owner and considered the resulting group as an individual entity. This is the case for all the outlets belonging to the El Mercurio editorial group, which we included in our final list as a single entry.

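In Fig. 9 we plot, for each news outlet (NO), the number of unique users in each CD, normalized over the number of residents of the cluster the CD belongs to:

$$\rho_{K}^{NO}(t) = \frac{1}{N_K} \sum_{CD \in K} u_{CD}^{NO}(t)$$

where $u_{CD}^{NO}(t)$ is the number of unique users from the antennas aggregated by census zones, $K$ the clusters, $CD$ the census districts, $N_K$ the number of residents of cluster $K$, and $t$ the time (hour of the day).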
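The per-cluster normalization described here can be sketched as follows. The data layout and the function name are assumptions made for illustration: user counts are keyed by census district, outlet and hour, each district is mapped to its cluster, and the totals are divided by the number of residents of that cluster. The resident figures for K1 and K2 are the ones quoted in the text; the user counts are invented.

```python
from collections import defaultdict


def normalised_activity(users, cd_to_cluster, cluster_residents):
    """Unique users per resident, aggregated by (cluster, outlet, hour).

    users:             {(census district, outlet, hour): unique users}
    cd_to_cluster:     {census district: cluster label}
    cluster_residents: {cluster label: number of residents}
    """
    totals = defaultdict(int)
    for (cd, outlet, hour), count in users.items():
        totals[(cd_to_cluster[cd], outlet, hour)] += count
    return {key: total / cluster_residents[key[0]] for key, total in totals.items()}


# Hypothetical counts for three districts at 10:00.
users = {
    ("cd1", "BioBioChile", 10): 30,
    ("cd2", "BioBioChile", 10): 20,
    ("cd3", "BioBioChile", 10): 10,
}
cd_to_cluster = {"cd1": "K2", "cd2": "K2", "cd3": "K1"}
cluster_residents = {"K1": 682053, "K2": 255699}
activity = normalised_activity(users, cd_to_cluster, cluster_residents)
```

Dividing by the cluster population rather than by the district population makes the curves of differently sized clusters directly comparable.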

Fig. 8 shows the same quantity considering all the NOs together, i.e. the total traffic towards news outlets websites during the examined weekends of July and August 2016.

Figure 8: Total connections detailed per hour and disaggregated by census segment of the area of origin.

Before analyzing these plots, it is worth going back to Table 3. The grouping into 5 clusters portrays a very clear division between the first two census segments and the remaining three. Indeed, groups K1 and K2 display a very high average value of schooling years (above 16, see Table 3), with group K2 being composed of much younger individuals and a higher percentage of students. Groups K3, K4 and K5 are instead more comparable with each other, displaying progressively lower values of schooling years and percentage of students, and a higher share of people of indigenous ethnicity. As pointed out in Sec. 3.3, these three groups are spatially mixed, while K1 and K2 are segregated in the north-eastern and, to a lesser extent, in the central part of the urban area of the Region Metropolitana de Santiago.

Given this classification, we can inspect the results shown in the plots. Fig. 8 provides the first clear evidence: group K2 completely dominates the chart. This means that the majority of the connections originate in areas characterized by a high percentage of young and highly educated residents. As said, for each cluster the connections have been normalized over the number of people in it, meaning that we are looking at a measure of the activity of a cluster with respect to its size. This dominance of group K2 and, to a lesser degree, of K1 is also confirmed by computing the Pearson correlation between the number of connections and the number of residents in each cluster. We get P = -0.92, a very high anti-correlation, meaning that, in proportion to their size, the smallest clusters show the highest activity. Indeed, the smallest clusters are K2 (with only 255,699 people) and K1 (682,053 inhabitants), while the remaining three all have more than 1M residents each.

We can also analyze the same trends for the individual news media websites. From Fig. 9 we can see that the areas belonging to group K2 are again on top in the majority of the plots. The only exception is for the websites belonging to El Mercurio, a right-wing editorial group, which is most accessed by those areas belonging to the K1 segment. Comparable numbers between K1 and K2 can also be found for the website of Diario Financiero, a financial newspaper. As for the other census groups no clear pattern can be identified, but it still seems clear that, in both Fig. 8 and Fig. 9, group K3 is always the least active. In general, segments K3, K4 and K5 display an activity comparable to that of groups K1 and K2 only for those news media, like Biobio, Cooperativa or Tele13, that can be classified as containers of generic news and information, unlike Diario Financiero, which is highly specialised in financial matters, or El Mercurio, which is openly politically oriented.

Figure 9: Total connections over the whole time window studied, detailed per hour and per news outlet.

As expected ([23, 24, 12]), the circadian rhythm of the population can still be seen from these figures, even though to a lower extent than the same analysis performed on the weekdays (Fig. 3), as explained in Sec. 3.1.2. The decline in connections towards night hours is smooth, and the trends are very similar across all the census segments.

4.1.1 Addressing the limitations

This method presents some limitations. First of all, we acknowledge a possible bias in our dataset: while the census is representative of the whole population, mobile data account only for the customers of the service provider, who can be skewed towards a certain segment of the population depending on multiple factors (types of contracts offered, marketing strategies, etc.). Nonetheless, bearing in mind that this provider alone holds more than 37% of the market share, a correlation check between our dataset and the number of residents in each census zone gave us satisfactory results, as pointed out in Sec. 3.1.2. Another influence on the results could come from the different penetration rates of WiFi technologies among the various census groups. One of our basic assumptions is that people, during weekends, access the web mostly from their houses or from nearby locations. This means that, especially in their houses but also in some public places, people could be surfing the web via WiFi instead of using their own mobile data. This, again, strongly depends on the specific country and on its telco market, and it could also be related to the fact that, in general, WiFi is more popular among those individuals that have, for example, a bank account (sometimes needed even just to get a broadband contract in the first place) or at least a certain yearly income. Thus, we acknowledge that these census groups, in particular the K1 and K2 segments according to our classification, could be slightly under-represented in our dataset. Nonetheless, the positive correlation between the number of unique accesses and the residents in each municipality is reassuring evidence in this respect. Moreover, for the particular case of Chile, the effects of this issue should be limited: according to official data (Subtel), nearly 80% of all internet access is made from mobile phones [25].

5 Conclusions

The main purpose of this work was to understand if different areas of the city of Santiago de Chile, and in particular the most deprived ones, had access to the same amount of news media content as the wealthier areas. In order to do so, we studied a record of a month (July-August 2016) of anonymized and geo-referenced accesses to several websites of Chilean news media through the cellphone network of a major telco provider.

Age and education seem to play the most important role in defining the news consumption patterns of the inhabitants of Santiago de Chile: by applying a k-means algorithm to official Chilean census data, in order to group the population into 5 different census levels, we find that the wealthiest areas of the city, K1 and K2, are very similar in terms of years of schooling, with K2 members being, on average, much younger than those of K1. The other three clusters (K3-K4-K5) are quite far from the first two in terms of average education level, and far more spatially mixed: K1 and K2 are mostly grouped in the north-eastern quadrants of the city.

To reduce the effects of the floating population phenomenon, we analyzed only a portion of the connections, narrowing the window to the weekends only. The news access patterns tell us that segment K2 is overall the most active group, as well as the most comprehensively informed. Indeed, its members dominate the charts for almost every news outlet (the exception being El Mercurio, which is most accessed from K1 areas), covering all kinds of content and political positions. This means that the wealthiest and youngest are also the most informed, clearly ahead of older people of similar social background. The same cannot be said about segments K3-K4-K5. There is a gap between these levels and the more educated ones; this is also confirmed by studying accesses to the individual news outlets, where they often stand very far from the wealthier groups, getting closer to K1 only for the most generic news media like Cooperativa or Biobio.

These findings confirm empirically that socio-demographic features have an influence on the consumption of news. Our results suggest that the consumption of news media content does not always increase with the education level of the user. While it is true that the highly educated are those who access news media websites the most, the opposite cannot be inferred: indeed group K3, which includes all those areas of the city whose residents have an average level of education (see Table 3), is the cluster that displays the lowest activity.

The predominance of group K2 also tells us that age plays a key role in this context. Young generations are likely more comfortable in using mobile devices in their daily lives, and this has been also explored in the specific field of news consumption [26].

In summary, although we found clear signals of correlation between the consumption of online news media content and the socio-demographics of users, this relationship seems to be non-trivial. While the highly educated are the most eager users of mobile news media outlets, there are important clues that a group of well-educated people, which we could identify as middle class, shows a lack of interest in accessing news media via mobile. This could have a significant impact on the civic engagement of a relevant share of the population, as well as on the economic and political life of a country, not to mention the media market itself.

There are also positive signals of correlation with the other two features: student status and percentage of people of native ethnicity. Therefore, a more detailed analysis of the ties between these features and the consumption patterns of the individual news outlets (namely, the habits and preferences of students and native minorities) could be of much interest for studying inequalities in access between different cohorts of the population, and it is left as future work.

Competing interests

The authors declare that they have no competing interests.

Authors’ contributions

Conceptualization: Leo Ferres, Daniela Paolotti, Giancarlo Ruffo, Salvatore Vilella; Data curation: Salvatore Vilella, Leo Ferres; Formal analysis and Methodology: Daniela Paolotti, Giancarlo Ruffo, Leo Ferres, Salvatore Vilella; Writing: Salvatore Vilella, Leo Ferres, Daniela Paolotti, Giancarlo Ruffo.

Acknowledgements

Daniela Paolotti and Salvatore Vilella acknowledge support from the Lagrange Project of the Institute for Scientific Interchange Foundation (ISI Foundation) funded by Fondazione Cassa di Risparmio di Torino (Fondazione CRT). The authors acknowledge financial support from Movistar - Telefónica Chile, the Chilean government initiative CORFO 13CEE2-21592 (2013-21592-1-INNOVA_ PRODUCCION2013-21592-1), Conicyt PAI Networks (REDES170151) “Geo-Temporal factors in disease spreading and prevention in Chile”, and Project PLU180009, Tenth “Fondo de Estudios sobre el Pluralismo en el Sistema Informativo Nacional”, 2018, “Geo-Temporal Access to Chilean news outlets using digital traces” (LF).

Data availability

2017 Chilean Census data is available at https://www.censo2017.cl. The DPI dataset is private and therefore not publicly available.

References

  • [1] Logan Molyneux. Mobile news consumption: A habit of snacking. Digital Journalism, 6(5):634–650, 2018.
  • [2] Pablo J Boczkowski, Eugenia Mitchelstein, and Mora Matassi. “News comes across when I’m in a moment of leisure”: Understanding the practices of incidental news consumption on social media. New Media & Society, 20(10):3523–3539, 2018.
  • [3] Usha M Rodrigues and Yin Paradies. News consumption habits of culturally diverse Australians in the digital era: Implications for intercultural relations. Journal of Intercultural Communication Research, 47(1):38–51, 2018.
  • [4] Stephanie Edgerly, Kjerstin Thorson, Esther Thorson, Emily K Vraga, and Leticia Bode. Do parents still model news consumption? Socializing news use among adolescents in a multi-device world. New Media & Society, 20(4):1263–1281, 2018.
  • [5] Richard Fletcher and Rasmus Kleis Nielsen. Paying for online news: A comparative analysis of six countries. Digital Journalism, 5(9):1173–1191, 2017.
  • [6] Koen Matthijs, David De Coninck, Marlies Debrael, Leen d’Haenens, and Rozane De Cock. Unpacking attitudes on immigrants and refugees: a focus on household composition and news media consumption. Media and Communication, 7(1):43–55, 2019.
  • [7] Huina Mao, Xin Shuai, Yong-Yeol Ahn, and Johan Bollen. Quantifying socio-economic indicators in developing countries from mobile phone communication data: applications to Côte d’Ivoire. EPJ Data Science, 4(1):15, 2015.
  • [8] Marta C Gonzalez, Cesar A Hidalgo, and Albert-Laszlo Barabasi. Understanding individual human mobility patterns. Nature, 453(7196):779, 2008.
  • [9] Francesco Calabrese, Massimo Colonna, Piero Lovisolo, Dario Parata, and Carlo Ratti. Real-time urban monitoring using cell phones: A case study in Rome. IEEE Transactions on Intelligent Transportation Systems, 12(1):141–151, 2011.
  • [10] Francesco Calabrese, Laura Ferrari, and Vincent D Blondel. Urban sensing using mobile phone network data: a survey of research. ACM Computing Surveys (CSUR), 47(2):25, 2015.
  • [11] Md Shahadat Iqbal, Charisma F Choudhury, Pu Wang, and Marta C González. Development of origin–destination matrices using mobile phone call data. Transportation Research Part C: Emerging Technologies, 40:63–74, 2014.
  • [12] Eduardo Graells-Garrido, Leo Ferres, Diego Caro, and Loreto Bravo. The effect of Pokémon Go on the pulse of the city: a natural experiment. EPJ Data Science, 6(1):23, 2017.
  • [13] Mariano G Beiró, Loreto Bravo, Diego Caro, Ciro Cattuto, Leo Ferres, and Eduardo Graells-Garrido. Shopping mall attraction and social mixing at a city scale. EPJ Data Science, 7(1):28, 2018.
  • [14] Amy Wesolowski, Caroline O Buckee, Kenth Engø-Monsen, and Charlotte Jessica Eland Metcalf. Connecting mobility to infectious diseases: the promise and limits of mobile phone data. The Journal of Infectious Diseases, 214(suppl_4):S414–S420, 2016.
  • [15] Jorge Bahamonde, Johan Bollen, Erick Elejalde, Leo Ferres, and Barbara Poblete. Power structure in Chilean news media. PLoS ONE, 13(6):e0197150, 2018.
  • [16] Edward S Herman and Noam Chomsky. Manufacturing consent: A propaganda model. Manufacturing Consent, 1988.
  • [17] Erick Elejalde, Leo Ferres, and Eelco Herder. On the nature of real and perceived bias in the mainstream media. PLoS ONE, 13(3):e0193765, 2018.
  • [18] Jose De Gregorio and Jong-Wha Lee. Education and income inequality: new evidence from cross-country data. Review of Income and Wealth, 48(3):395–416, 2002.
  • [19] John Jerrim and Lindsey Macmillan. Income inequality, intergenerational mobility, and the Great Gatsby curve: Is education the key? Social Forces, 94(2):505–533, 2015.
  • [20] Eduardo Graells-Garrido and Diego Saez-Trumper. A day of your days: estimating individual daily journeys using mobile data to understand urban flow. In Proceedings of the second international conference on IoT in urban space, pages 1–7. ACM, 2016.
  • [21] Patrick AP Moran. Notes on continuous stochastic phenomena. Biometrika, 37(1/2):17–23, 1950.
  • [22] Luc Anselin. The Moran scatterplot as an ESDA tool to assess local instability in spatial association. Regional Research Institute, West Virginia University Morgantown, WV, 1993.
  • [23] Marco De Nadai, Jacopo Staiano, Roberto Larcher, Nicu Sebe, Daniele Quercia, and Bruno Lepri. The death and life of great Italian cities: a mobile phone data perspective. In Proceedings of the 25th international conference on world wide web, pages 413–423. International World Wide Web Conferences Steering Committee, 2016.
  • [24] Eduardo Graells-Garrido, Oscar Peredo, and José García. Sensing urban patterns with antenna mappings: the case of Santiago, Chile. Sensors, 16(7):1098, 2016.
  • [25] 2017 Digital News Report, Reuters Institute for the Study of Journalism and University of Oxford. http://www.digitalnewsreport.org/survey/2017/chile-2017/#fn-6155-1. Accessed: 2019-02-26.
  • [26] Michael Chan. Examining the influences of news use patterns, motivations, and age cohort on mobile news use: The case of Hong Kong. Mobile Media & Communication, 3(2):179–195, 2015.