Data from social media are increasingly used in the digital phenotyping of individual users and in characterising population-level behaviours to answer health-related questions [1, 2, 3, 4, 5, 6, 7]. Sentiment analysis, the detection of mood from text, is a class of natural language processing methods that have been used in this area to evaluate reactions and attitudes to current events, health interventions such as vaccination, human mobility, and health outcomes such as seasonal affective disorder and obesity [11, 12, 13].
When using sentiment analysis tools to detect changes in the sentiment of a population, researchers must navigate complicated interactions between the tools they use and the spatiotemporal and social factors that modify mood and emotion. For example, positive and negative affect have been shown to be associated with the time of day and day of week [14, 15, 16], weather [17, 18, 19, 20], and the quality of social interactions.
Studies applying sentiment analysis to Twitter data have confirmed the periodicity of positive and negative affect by time of day and day of week [11, 20, 22, 23, 24]. However, results and conclusions vary from study to study, and these differences may depend on the tools used to measure sentiment in text, the methods used to aggregate sentiment across sets of tweets or users, or challenges associated with validating results against external information. Studies examining variation in sentiment by geography or weather are relatively rare compared with those that measure temporal variation [25, 26, 27, 28]. Studies that analyse different types of social interaction on Twitter do not appear to have measured differences in sentiment between social tweets (those that mention, reply to, or quote other users) and tweets that are simply broadcast messages.
To extend studies in this area and examine how spatiotemporal and social factors might introduce biases in public health studies that apply sentiment tools to Twitter data, we aimed to construct models of positive and negative sentiment using time of day, day of week, interaction type, weather, and city as factors. We then used the full model and degenerate versions of it (omitting one or more factors) to identify unexpected differences between expected and observed sentiment.
To address our aims, we aggregated sentiment scores for each hour in each of 100 cities and constructed multivariable models to explain differences in the proportion of tweets expressing positive or negative sentiment, using city, interaction type, weather, time of day, and day of week as factors.
2.1.1 Twitter data
We used the Twitter streaming Application Programming Interface (API) to collect tweets between 13 July 2017 and 30 November 2017 (see Figure 1). Each tweet contains information about the user, including name, location, tweet count, follower count, and following count, as well as information about the tweet itself, such as its timestamp, whether it was a reply to a previous tweet, and the users it mentions. We used this information to label each tweet as either non-social (retweets and tweets that do not mention other users) or social (replies and tweets that mention other users).
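The labelling rule described above can be sketched in a few lines of Python. This is a minimal illustration, not the study's code: it assumes tweets arrive as Twitter API v1.1 JSON objects and reads the `retweeted_status`, `in_reply_to_status_id`, and `entities.user_mentions` fields; the function name `label_interaction` is hypothetical, and quote tweets receive no special handling because the rule as stated does not mention them.

```python
from typing import Any, Mapping


def label_interaction(tweet: Mapping[str, Any]) -> str:
    """Label a tweet 'social' or 'non-social' (assumes Twitter API v1.1 JSON)."""
    # Retweets are broadcasts, even though they name the retweeted account,
    # so they are checked before mentions.
    if tweet.get("retweeted_status") is not None:
        return "non-social"
    # Replies are social interactions.
    if tweet.get("in_reply_to_status_id") is not None:
        return "social"
    # Any explicit mention of another user also counts as social.
    mentions = tweet.get("entities", {}).get("user_mentions", [])
    return "social" if mentions else "non-social"
```

Checking for retweets first matters: a retweet's `user_mentions` usually contains the original author, which would otherwise make every retweet look social.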
2.1.2 Location data
Identifying the home locations of Twitter users is challenging because few posts carry precise location information (geotags) and because user-defined location information must be parsed using a gazetteer. Fewer than 0.5% of tweets are geotagged, and fewer than 50% of Twitter users have provided useful home locations in their profiles. To infer where tweets were posted, we took the user-defined text from the location field in Twitter user profiles and used Nominatim, a gazetteer that returns structured geographical information and a score associated with the confidence in the answer.
Not all Twitter accounts represent individuals. In a similar manner to the celebrity removal approach used by Rahimi et al. for location inference, we removed tweets from accounts with more than 300,000 followers, on the assumption that organisations and brands may include an identifiable city in their user biographies but that what they post is less likely to represent that location than posts from other users.
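The location and account filters can be sketched as below, under stated assumptions rather than as the study's pipeline. The sketch takes already-fetched Nominatim search results (as returned with `format=json` and `addressdetails=1`, which include an `importance` value and an `address` object), treats `importance` as the confidence score (an assumption), keeps the highest-scoring candidate, and rejects it below a hypothetical `MIN_IMPORTANCE` cut-off, since the threshold used in the study is not stated. Only the 300,000-follower limit comes from the text; the function names are hypothetical.

```python
from typing import Any, Mapping, Optional, Sequence

# From the text: accounts with more followers than this are excluded.
MAX_FOLLOWERS = 300_000
# Hypothetical confidence cut-off; the study's actual value is not given.
MIN_IMPORTANCE = 0.5


def resolve_city(results: Sequence[Mapping[str, Any]]) -> Optional[str]:
    """Pick a 'City, CC' label from Nominatim-style results, or None."""
    best = max(results, key=lambda r: float(r.get("importance", 0.0)), default=None)
    if best is None or float(best.get("importance", 0.0)) < MIN_IMPORTANCE:
        return None  # no candidate, or not confident enough
    address = best.get("address", {})
    city = address.get("city")
    country = address.get("country_code")
    if city is None or country is None:
        return None  # resolved, but not to city level
    return f"{city}, {country.upper()}"


def keep_user(user: Mapping[str, Any]) -> bool:
    """Drop likely organisations, brands and celebrities by follower count."""
    return int(user.get("followers_count", 0)) <= MAX_FOLLOWERS
```

Restricting to the `city` key is a simplification: Nominatim may report smaller places under keys such as `town` or `village`.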
2.1.3 Weather data
Hourly weather data were collected for the top 100 cities using the OpenWeatherMap API (openweathermap.org), which provides detailed weather information such as temperature, humidity, and weather descriptions. Weather for each hour in each city was mapped to one of 7 values: clear, clouds, fog, haze, rain, snow, or storm.
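One possible way to map weather descriptions to the seven values is sketched below. It assumes the input is OpenWeatherMap's `main` condition group (for example `Rain`, `Drizzle`, or `Thunderstorm`); how these groups were assigned to the study's categories is not stated, so the `WEATHER_CATEGORY` table is a hypothetical choice for illustration.

```python
from typing import Optional

# Hypothetical mapping from OpenWeatherMap "main" condition groups to the
# seven categories used in the study; the study's exact mapping may differ.
WEATHER_CATEGORY = {
    "Clear": "clear",
    "Clouds": "clouds",
    "Mist": "fog",
    "Fog": "fog",
    "Haze": "haze",
    "Smoke": "haze",
    "Dust": "haze",
    "Sand": "haze",
    "Ash": "haze",
    "Drizzle": "rain",
    "Rain": "rain",
    "Snow": "snow",
    "Thunderstorm": "storm",
    "Squall": "storm",
    "Tornado": "storm",
}


def categorise_weather(owm_main: str) -> Optional[str]:
    """Map an OpenWeatherMap condition group to one of 7 categories (None if unknown)."""
    return WEATHER_CATEGORY.get(owm_main)
```

Returning `None` for unrecognised groups makes unmapped hours visible instead of silently assigning them to a default category.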
2.2 Sentiment measures
Sentiment analysis of written text is a widely studied problem in natural language processing [33, 34, 35]. In this study, we consider sentiment in a simple form (positive or negative affect) and applied SentiStrength, a widely used sentiment analysis tool designed for short informal texts such as tweets. SentiStrength is a dictionary-based method that uses a lexicon of words categorised as positive or negative, each with a score for polarity and strength. For a given tweet, SentiStrength identifies the terms from its lexicon that are present and computes the sentiment of the text from the scores of the words found. Each tweet is labelled with two scores, one indicating positive sentiment (from 1 to 5, least positive to most positive) and one indicating negative sentiment (from 1 to 5, least negative to most negative).
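To make the dual-scale output concrete, the toy scorer below reports the strongest positive term and the strongest negative term found in a text, roughly in the spirit of SentiStrength's dual scoring. It is not SentiStrength: the lexicon is invented for illustration, and the real tool also handles negation, booster words, emoticons, and repeated letters.

```python
import re
from typing import Dict, Tuple

# Toy lexicon (hypothetical): strengths 2..5 on each scale; 1 means "no sentiment".
POSITIVE: Dict[str, int] = {"good": 2, "happy": 3, "love": 3, "amazing": 4}
NEGATIVE: Dict[str, int] = {"bad": 2, "sad": 3, "hate": 4, "awful": 4}


def dual_score(text: str) -> Tuple[int, int]:
    """Return (positive, negative) scores, each on a 1..5 scale."""
    tokens = re.findall(r"[a-z']+", text.lower())
    # Each scale takes its strongest matching term; unmatched tokens score 1.
    pos = max([POSITIVE.get(t, 1) for t in tokens], default=1)
    neg = max([NEGATIVE.get(t, 1) for t in tokens], default=1)
    return pos, neg
```

For example, `dual_score("I love this but hate the rain")` returns `(3, 4)`: one text can carry both positive and negative sentiment, which is why the two scales are kept apart.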
Sentiment scores were aggregated across a set of tweets as the proportion of tweets with a positive sentiment score or the proportion with a negative sentiment score. Methods for aggregating scores across groups of tweets matter because they can influence interpretation and lead to different conclusions. To aggregate sentiment scores, researchers have used counts, averages, proportions, ratios, and weighted averages [11, 22, 23, 37, 38, 39, 40, 41, 42]. Some have combined positive and negative scores into a single measure [8, 22, 23, 40, 41], while others have kept them separate [38, 39, 43]. Following Scott et al., we used positive and negative sentiment scores separately because positive and negative affect can co-exist [44, 45], and because, when aggregated, a population can exhibit higher levels of both positive and negative sentiment at the same time. Thus, a low positive score indicates the absence of positive emotion across a set of tweets, not the presence of negative emotion.
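The aggregation step can be sketched as follows. The sketch assumes that a tweet counts as positive when its positive score exceeds 1 and as negative when its negative score exceeds 1 (a score of 1 meaning no sentiment); that threshold is an assumption, not stated in the text, and the function name is hypothetical. Because one tweet can count towards both proportions, the two are computed and stored separately rather than combined.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# One record per tweet: (city, hour, positive_score, negative_score),
# with both scores on SentiStrength's 1-5 scale.
Record = Tuple[str, str, int, int]


def aggregate(records: Iterable[Record]) -> Dict[Tuple[str, str], Tuple[float, float, int]]:
    """Return {(city, hour): (proportion_positive, proportion_negative, n_tweets)}."""
    counts: Dict[Tuple[str, str], List[int]] = defaultdict(lambda: [0, 0, 0])
    for city, hour, pos, neg in records:
        c = counts[(city, hour)]
        c[0] += int(pos > 1)  # any positive sentiment (assumed threshold)
        c[1] += int(neg > 1)  # any negative sentiment (assumed threshold)
        c[2] += 1             # total tweets in this city-hour
    return {key: (p / n, q / n, n) for key, (p, q, n) in counts.items()}
```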
2.3 Analysis and modelling
In the first part of the analysis, we examined how each of the factors (interaction type, time of day, day of week, weather, and city) was associated with differences in the proportions of tweets expressing positive or negative sentiment in a city in an hour. To do this, we constructed multivariable regression models using each factor individually and then in combination, reporting the percentage of the variation in sentiment explained by each model and Pearson’s R between the values predicted by the model and the observed data in a set of testing data, distinct from the period of observation used to construct the models.
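A dependency-free sketch of this modelling step follows: ordinary least squares in which categorical factors (for example city, hour, day, and weather, passed as strings) are dummy-coded against a reference level, numeric factors (for example the proportion of social tweets) enter linearly, and fit is summarised by r-squared and Pearson's R. The class and function names are hypothetical and the normal-equation solver is for illustration only; a statistics package would normally be used, and the study's exact specification may differ.

```python
from typing import Dict, List, Optional, Sequence, Tuple, Union

Row = Dict[str, Union[str, float]]


def solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [bi] for row, bi in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular design matrix (collinear factor levels?)")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


class LinearModel:
    """OLS with dummy-coded string factors and linear numeric factors."""

    def __init__(self, factors: Sequence[str]):
        self.factors = list(factors)
        self.columns: List[Tuple[str, Optional[str]]] = []
        self.beta: List[float] = []

    def _design(self, row: Row) -> List[float]:
        x = [1.0]  # intercept
        for factor, level in self.columns:
            if level is None:  # numeric factor enters linearly
                x.append(float(row[factor]))
            else:  # dummy; unseen levels fall back to the reference level
                x.append(1.0 if row[factor] == level else 0.0)
        return x

    def fit(self, rows: Sequence[Row], y: Sequence[float]) -> "LinearModel":
        self.columns = []
        for f in self.factors:
            if isinstance(rows[0][f], str):
                levels = list(dict.fromkeys(str(r[f]) for r in rows))
                # First level seen is the reference, so k levels add k - 1 columns.
                self.columns += [(f, lvl) for lvl in levels[1:]]
            else:
                self.columns.append((f, None))
        xs = [self._design(r) for r in rows]
        k = len(xs[0])
        # Normal equations: (X'X) beta = X'y.
        xtx = [[sum(x[i] * x[j] for x in xs) for j in range(k)] for i in range(k)]
        xty = [sum(x[i] * yi for x, yi in zip(xs, y)) for i in range(k)]
        self.beta = solve(xtx, xty)
        return self

    def predict(self, rows: Sequence[Row]) -> List[float]:
        return [sum(b * v for b, v in zip(self.beta, self._design(r))) for r in rows]


def r_squared(y: Sequence[float], y_hat: Sequence[float]) -> float:
    mean = sum(y) / len(y)
    ss_res = sum((a - b) ** 2 for a, b in zip(y, y_hat))
    ss_tot = sum((a - mean) ** 2 for a in y)
    return 1.0 - ss_res / ss_tot


def pearson_r(a: Sequence[float], b: Sequence[float]) -> float:
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (z - mb) for x, z in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((z - mb) ** 2 for z in b)
    return cov / (va * vb) ** 0.5
```

Fitting on training city-hours and then calling `pearson_r(observed, model.predict(test_rows))` on a later period mirrors the train/test split described in the text; degenerate models are obtained simply by passing fewer factor names to `LinearModel`.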
In the second part of the analysis, we used the models constructed in the first part as a baseline for detecting deviations from the expected proportions of positive and negative sentiment tweets per city per hour. The objective was to determine whether baseline differences in spatiotemporal and social factors would introduce biases in the detection of extreme deviations in sentiment that occur during major localised news events, and whether accounting for them in a baseline model could address these biases. To do this, we compared the expected and observed proportions of positive and negative sentiment tweets per city per hour using a chi-square test, and used the resulting p-value as an indicator of the magnitude of the deviation.
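For a single city-hour and a single sentiment, one reading of this comparison is a one-degree-of-freedom chi-square goodness-of-fit test over two cells (tweets with and without the sentiment). The sketch below assumes that form; `expected_prop` stands for the baseline model's predicted proportion, and the p-value uses the identity that the chi-square survival function with one degree of freedom equals `erfc(sqrt(x / 2))`. The function name is hypothetical.

```python
import math
from typing import Tuple


def chi_square_deviation(observed: int, n_tweets: int, expected_prop: float) -> Tuple[float, float]:
    """1-df chi-square goodness-of-fit for one city-hour; returns (statistic, p_value)."""
    if not 0.0 < expected_prop < 1.0:
        raise ValueError("expected proportion must lie strictly between 0 and 1")
    expected = expected_prop * n_tweets
    # Two cells (with / without the sentiment) reduce to (O - E)^2 / (n p (1 - p)).
    stat = (observed - expected) ** 2 / (n_tweets * expected_prop * (1.0 - expected_prop))
    # Survival function of chi-square with 1 df: P(X > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value
```

For example, 50 positive tweets out of 100 against an expected proportion of 0.4 gives a statistic of about 4.17 and a p-value of about 0.041. The test itself is two-sided, so the sign of `observed - expected_prop * n_tweets` is needed to tell whether the deviation lies above or below the baseline.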
Given a set of chi-square test statistics, we then defined recurrence intervals based on how often deviations of each magnitude occurred during the time period. The recurrence interval is the number of days of observation divided by the number of events of at least that magnitude across all cities in the analyses. For example, given 60 days of observation in the test period, a recurrence interval of 30 days corresponds to an event whose test statistic was reached or exceeded only twice during the test period (including the event itself).
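The sketch below computes recurrence intervals under that reading: an event's interval is the observation period divided by the number of events, pooled across all cities, whose statistic is at least as large as its own. The function name is hypothetical; ranking by descending statistic is equivalent to ranking by ascending p-value.

```python
import bisect
from typing import List, Sequence


def recurrence_intervals(statistics: Sequence[float], days_observed: float) -> List[float]:
    """Recurrence interval (days) for each event's test statistic."""
    ascending = sorted(statistics)
    n = len(ascending)
    intervals = []
    for s in statistics:
        # Number of events at least as extreme as this one (ties share a count).
        at_least_as_large = n - bisect.bisect_left(ascending, s)
        intervals.append(days_observed / at_least_as_large)
    return intervals
```

With 60 days and statistics `[9, 25, 4, 16]`, the intervals are 20, 60, 15, and 30 days: the second-largest statistic is reached or exceeded by two events, giving the 30-day interval in the example above.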
To characterise an event by its magnitude, we also needed to account for extreme sentiment that persisted for multiple hours or was expressed across multiple cities within a country. To do this, we merged significant differences between the observed and predicted numbers of positive or negative sentiment tweets that occurred in consecutive hours, and labelled each merged event with the lowest p-value in the period. Similarly, we merged cities within a country if significant events occurred at the same time.
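The merging rules can be sketched with a union-find structure. In this hypothetical sketch, significant city-hours (for one sentiment direction at a time) are linked when they fall in the same city in consecutive hours or in the same country in the same hour; each connected group becomes one event labelled by its lowest p-value. The tuple layout and the function name are assumptions for illustration.

```python
from typing import Dict, List, Sequence, Tuple

# (country, city, hour_index, p_value); hour_index counts hours from the start of the period.
Event = Tuple[str, str, int, float]


def merge_events(events: Sequence[Event]) -> List[Tuple[List[Event], float]]:
    """Merge significant city-hours into events, most extreme (lowest p) first."""
    parent = list(range(len(events)))

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving keeps trees shallow
            i = parent[i]
        return i

    def union(i: int, j: int) -> None:
        parent[find(i)] = find(j)

    # Index each significant city-hour so neighbours can be looked up.
    by_city_hour: Dict[Tuple[str, str, int], int] = {
        (country, city, hour): i for i, (country, city, hour, _) in enumerate(events)
    }
    first_in_country_hour: Dict[Tuple[str, int], int] = {}
    for i, (country, city, hour, _) in enumerate(events):
        # Rule 1: same city, consecutive hours (extreme sentiment that persists).
        prev = by_city_hour.get((country, city, hour - 1))
        if prev is not None:
            union(i, prev)
        # Rule 2: same country, same hour (extreme sentiment across cities).
        key = (country, hour)
        if key in first_in_country_hour:
            union(i, first_in_country_hour[key])
        else:
            first_in_country_hour[key] = i

    groups: Dict[int, List[Event]] = {}
    for i, event in enumerate(events):
        groups.setdefault(find(i), []).append(event)
    # Label each merged event with the lowest p-value among its city-hours.
    labelled = [(group, min(e[3] for e in group)) for group in groups.values()]
    return sorted(labelled, key=lambda item: item[1])
```

Keying cities by country as well as name avoids merging places that share a name, such as London in the United Kingdom and London in Canada.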
We then compared the events identified from the full model with those produced by degenerate forms of the full model (e.g. excluding city or interaction type as a factor), and used the differences to evaluate how baseline spatiotemporal modelling affected the identification and ranking of extreme sentiment events. We expected that, under the degenerate models, the distribution of events would be biased towards certain cities, weather types, times of day, days of the week, or events where social interactions were more or less likely.
The study was an ecological study of tweets posted by Twitter users, collected through the Twitter streaming API between 13 July 2017 and 30 November 2017. On average, we received 3.6 million tweets a day for 141 days, for a total of 507.6 million tweets from 27.4 million unique users. Only 29.8% (151.2 million) of these tweets were tagged as English language; of these, 65.7% (99.3 million) had location information available in the user biography, and only 16.7% (16.5 million) of those were successfully resolved by the gazetteer to city-level addresses. These data formed the basis of the study.
After ranking cities by the total number of English-language tweets posted by users whose locations could be resolved using the gazetteer, we included the top 100 cities in this study: 52 cities in North America (45 in the United States, 6 in Canada, and 1 in Mexico), 11 in the United Kingdom, 6 elsewhere in Europe, 16 in Asia and South-East Asia, 9 in Africa, 3 in Australasia, 2 in the Middle East, and 1 in South America.
3.1 Analysis of spatiotemporal and social factors
The training data used to construct the multivariable models comprised 8.39 million tweets from the first 81 days of data collection (13 July 2017 to 30 September 2017). Of these, 39.7% (3.33 million) were labelled as expressing positive sentiment and 28.1% (2.36 million) as expressing negative sentiment. Each model described below was constructed to estimate the proportion of tweets that expressed positive or negative sentiment in a city in an hour, and results are presented as the correlation between the estimated and observed proportions within the training data (Table 1 and Table 2).
| Model | | r-squared | Pearson’s R (95% CI) |
|---|---|---|---|
| all factors | 136 (108) | 9.345% | 0.306 (0.301-0.310) |
| social, city, hour, day | 130 (107) | 9.338% | 0.306 (0.301-0.310) |
| social, city | 101 (80) | 8.831% | 0.297 (0.292-0.302) |
| hour, day | 30 (26) | 0.486% | 0.070 (0.065-0.075) |
| city | 100 (81) | 8.736% | 0.296 (0.291-0.300) |
| hour of day | 24 (20) | 0.298% | 0.055 (0.049-0.060) |
| day of week | 7 (7) | 0.191% | 0.044 (0.039-0.049) |
| weather | 7 (5) | 0.193% | 0.044 (0.039-0.049) |
| social proportion | 2 (2) | 0.010% | 0.010 (0.005-0.015) |
| Model | | r-squared | Pearson’s R (95% CI) |
|---|---|---|---|
| all factors | 136 (107) | 5.584% | 0.236 (0.231-0.241) |
| social, city, hour, day | 130 (107) | 5.580% | 0.236 (0.231-0.241) |
| social, city | 101 (85) | 4.671% | 0.216 (0.211-0.221) |
| hour, day | 30 (26) | 1.330% | 0.115 (0.110-0.133) |
| city | 100 (90) | 3.732% | 0.193 (0.188-0.198) |
| hour of day | 24 (21) | 1.271% | 0.113 (0.108-0.118) |
| day of week | 7 (6) | 0.053% | 0.023 (0.018-0.028) |
| weather | 7 (5) | 0.170% | 0.041 (0.036-0.046) |
| social proportion | 2 (2) | 1.387% | 0.118 (0.113-0.123) |
Users across the 100 cities posted more tweets from Monday to Thursday and slightly fewer from Friday to Sunday. Users were typically most active between midday and 1pm (an average of 7,652 tweets across the 100 cities) and least active between 4am and 5am (an average of 1,745 tweets across the 100 cities). A model combining both temporal factors was significantly correlated with the proportion of tweets expressing negative sentiment (R=0.070; 95% CI 0.065-0.075). The association was stronger for the proportion of tweets expressing positive sentiment (R=0.115; 95% CI 0.110-0.133), although this model explained only 1.33% of the variance. For both positive and negative sentiment outcomes, adding the day of the week to the hour of the day produced a significant improvement in the model.
Positive and negative sentiment also varied by interaction type: social tweets (tweets that mention or reply to another user) were much more likely to express positive sentiment than non-social tweets (tweets that do not mention or reply to another user). In hours in which a higher proportion of tweets were social interactions, the proportion of tweets expressing positive sentiment was higher (R=0.118; 95% CI 0.113-0.123) and the proportion expressing negative sentiment was lower (R=0.010; 95% CI 0.005-0.015), although the latter association was much weaker. This suggests that social tweets tend to be positive and that the share of social tweets has little influence on the number of negative tweets posted. In multivariable models, adding the proportion of social tweets as a factor significantly improved model performance in all cases.
The median number of tweets per city during the testing period was 48,974, ranging from 24,825 (Istanbul, Turkey) to 856,471 (New York City, United States). Tweet counts were generally aligned with city populations (Figure 2), except in countries where languages other than English predominate. Cities in the United States tended to have higher proportions of negative sentiment tweets and lower proportions of positive sentiment tweets (Figure 3). Of all the single-factor models, the model using only city information was most strongly correlated with the hourly proportions of positive and negative sentiment tweets, explaining 8.74% of the variance in negative sentiment (R=0.296; 95% CI 0.291-0.300) and 3.73% of the variance in positive sentiment (R=0.193; 95% CI 0.188-0.198).
The number of tweets in each weather category ranged from 230 during snow (and 189,201 during storms) to 3,247,680 during cloudy weather. Weather exhibited relatively weak associations with the proportions of tweets expressing positive (R=0.041; 95% CI 0.036-0.046) or negative sentiment (R=0.044; 95% CI 0.039-0.049). Adding weather as a factor improved the performance of multivariable models. However, because the coefficients for weather were orders of magnitude smaller than those for other factors such as city and social proportion, weather did not appear to be a useful addition to baseline models intended to detect variation in sentiment caused by exogenous factors.
3.2 Detecting deviations in city-level expression of positive or negative sentiment
We then used the models constructed above to predict the expected sentiment in city-hour pairs constructed from a separate set of 8.02 million tweets from the following 60 days (1 October 2017 to 30 November 2017). Differences between the expected and observed proportions of positive and negative tweets were then used to define the magnitude of localised deviations in positive or negative sentiment. The proportions of tweets expressing positive sentiment (39.9%; 3.20 million) or negative sentiment (28.4%; 2.28 million) were similar to the proportions in the training data.
Using the full model to identify unexpected deviations in the proportion of positive or negative sentiment tweets in the test period, we ranked events by the magnitude of the deviation; the top ten are listed in Table 3. After accounting for city-level differences in the baseline proportions of positive and negative sentiment tweets, the highest-ranked events were distributed across 7 countries and could be retrospectively matched with major news stories specific to each city. Under the degenerate models that do not account for city-level baseline differences, the United States accounted for a much higher proportion of extreme negative city-hour pairs and a much lower proportion of extreme positive city-hour pairs (Figure 4). This occurs because cities in the United States tend to exhibit higher rates of negative sentiment and lower rates of positive sentiment than cities in other countries. Models that do not account for this difference may overestimate the importance of negative deviations (shifting negative sentiment events in the United States upwards, which makes violence in Barcelona or Nairobi seem less important) or underestimate the importance of positive deviations (shifting positive sentiment events, such as the Thanksgiving Day parade in New York City, New York, or the World Series win in Houston, Texas, downwards) (Table 3).
|Time and location||
|49.6% (28.7%)||31.3% (38.1%)||>60 days||
|12.1% (22.9%)||73.2% (45.7%)||30 days||
|61.5% (30.7%)||48.3% (40.5%)||20 days||
|60.9% (23.8%)||14.7% (39.6%)||12 days||
|67.4% (23.8%)||17.8% (39.7%)||10 days||
|14.4% (31.6%)||56.6% (38.2%)||8.6 days||
|20.4% (29.0%)||50.5% (37.4%)||7.5 days||
|8.1% (25.0%)||92.1% (39.0%)||6 days||Diwali festival|
|48.5% (26.5%)||22.1% (37.3%)||5.5 days||
|8.0% (21.0%)||71.7% (43.3%)||5 days||
|35.5% (26.5%)||47.1% (37.9%)||4.6 days||
Figure 5 visualises several of the extreme events listed in Table 3 and illustrates different types of deviation from the baseline. In each example, the expected baseline is the expected proportion of positive or negative sentiment tweets in an hour multiplied by the number of tweets from that city in that hour. Unexpected deviations occur when the observed number of positive or negative sentiment tweets is much higher or much lower than the baseline (coloured red or blue in Figure 5). The patterns differed visibly between events that unfold over a longer period (e.g. riots after an election in Nairobi, a day of attempted voting in Barcelona) and events that occur within one or several hours (the Houston Astros winning the World Series). Other events not pictured include the outpouring of grief across multiple cities in the United States after a mass shooting, which decayed more slowly, over a period of days.
When applying sentiment analysis tools to Twitter data to characterise a population over time, it is useful to account for baseline spatiotemporal differences before attempting to detect deviations in mood. The first contribution of this work was to show that hour of day, day of week, the proportion of social tweets, the locations of the users posting the tweets, and the weather are each independently correlated with both positive and negative sentiment. Second, while these factors together account for less than 10% of the variance in positive and negative sentiment, ignoring them can affect the detection of unexpected deviations. Third, we confirmed that in studies aggregating across populations (ecological designs), positive and negative sentiment can rise and fall separately, and aggregating them into a single measure may mean losing important information that helps characterise the mood of a population.
5 Comparisons with prior literature and implications
A range of studies have applied sentiment analysis tools to social media data to examine changes in mood or emotion in relation to current events, weather and season, or circadian and daily rhythms. Our results extend these analyses to demonstrate the relative importance of each of these factors.
We found that the time of day and day of week were more closely correlated with positive sentiment than with negative sentiment. For positive sentiment, models built using these temporal factors generally explained less of the variance than models that used social interactions and cities as factors. Previous studies investigating hourly and daily patterns of sentiment on Twitter vary in structure from cohort designs, where individual users are followed [11, 46], to ecological designs where signals from a population are aggregated [22, 24, 47]. The results of these studies and the conclusions they draw appear to be related to design choices including the tools used to measure sentiment and the methods used to aggregate measures of sentiment across populations.
The results of the study are consistent with previous studies that have found associations between weather and sentiment on Twitter [26, 27, 28]. Despite the observed independent correlations between weather and sentiment, weather explained very little of the variance in positive or negative sentiment. These results should not be confused with seasonal variation in weather or sunlight; our results did not extend across a full range of seasons and other studies have examined the use of Twitter data for its potential to observe seasonal affective disorder [11, 12].
Mitchell et al. examined the geography of happiness in 373 cities in the United States using Twitter data and found that happiness was correlated with socio-economic status and health-related census data, among other factors. We found that negative sentiment was generally more common, and positive sentiment less common, in tweets from many cities in the United States; although we did not examine socio-economic status directly, our results appear consistent with this existing research.
Tweets that involve social interactions on Twitter (typically replies and mentions) are commonly used in applications of network science. Our results show a positive correlation between the proportion of social interactions in a city in an hour and positive sentiment, and a much weaker correlation with negative sentiment. Future applications that couple network analysis with content analysis would benefit from recognising these correlations.
However, studies in the area are at risk of producing incomparable results and inconsistent conclusions if sampling methods vary in ways that skew towards certain locations or certain times of the day or week. Practitioners are already aware of the risks of selecting only geotagged tweets, but the spatiotemporal differences highlighted here are typically not discussed or accounted for in applications that use Twitter data to answer public health questions.
6 Limitations and future work
The study has several limitations. First, Twitter users represent a biased sample of countries and a biased sample of the population within countries [51, 52, 53, 54, 55], and we neither inferred demographics nor applied any re-weighting methods to adjust for differences between the users posting English-language tweets and the populations of the cities we examined. Further, users who include enough biographical information to be located within a city may represent a biased subset of the overall Twitter population. For these reasons, the study only captures deviations that were important to the population studied. However, there is growing evidence that Twitter data can be used to model or predict real-world outcomes such as heart disease mortality or vaccination coverage despite sampling biases [6, 57]. Second, we did not use any external source of information to compare the importance of individual events with the recurrence intervals we observed. In the absence of an objective measure of event importance in relation to sentiment, we assumed that positive and negative sentiment deviations should be more balanced within and across cities in different countries. Alternative approaches to ranking events by importance would require an externally validated list of important events, which would be challenging to produce robustly. Third, certain events are less localised and affect multiple cities or even multiple countries, and others may extend across many hours, days, or weeks. Methods for handling the spatiotemporal granularity of these events would be a useful addition to the methods used to analyse sentiment (or other measures that can be observed in social media datasets). Real-time event detection on Twitter is an active area of research [58, 59], and our aim was not to add to this literature. Rather, we sought to improve the robustness of observational studies that use sentiment analysis of Twitter to make sense of how populations react to real-world events. Other methods for constructing models may be more useful for detecting events, and further work aimed at embedding this research into event detection methods may improve the robustness of observational studies of Twitter data in public health applications.
In this study we showed that baseline spatiotemporal and social factors explain some of the difference in sentiment on Twitter, and accounting for these differences may improve the detection of exogenous factors that affect the mood of a city. The first contribution of this research is the consistent evaluation of a broad set of factors—making it easier to compare the importance of location, time, and social interactions on positive and negative sentiment. The second contribution is the use of these factors to construct a model of the expected variation in positive and negative sentiment on Twitter, and a demonstration of that approach for use in identifying the events that shape the moods of cities.
-  Centola D. Social Media and the Science of Health Behavior. Circulation. 2013;127(21):2135-44.
-  Salathé M, Bengtsson L, Bodnar TJ, Brewer DD, Brownstein JS, Buckee C, et al. Digital Epidemiology. PLOS Computational Biology. 2012;8(7):e1002616.
-  Dredze M. How social media will change public health. IEEE Intelligent Systems. 2012;27(4):81-4.
-  Paul MJ, Dredze M. You are what you Tweet: Analyzing Twitter for public health. ICWSM. 2011;20:265-72.
-  Coppersmith G, Dredze M, Harman C, editors. Quantifying mental health signals in Twitter. Proceedings of the Workshop on Computational Linguistics and Clinical Psychology: From Linguistic Signal to Clinical Reality; 2014.
-  De Choudhury M, Counts S, Horvitz E. Social media as a measurement tool of depression in populations. Proceedings of the 5th Annual ACM Web Science Conference; Paris, France. 2464480: ACM; 2013. p. 47-56.
-  Althouse BM, Scarpino SV, Meyers LA, Ayers JW, Bargsten M, Baumbach J, et al. Enhancing disease surveillance with novel data streams: challenges and opportunities. EPJ Data Science. 2015;4(1):17.
-  Bollen J, Mao H, Pepe A. Modeling public mood and emotion: Twitter sentiment and socio-economic phenomena. ICWSM. 2011;11:450-3.
-  Salathé M, Khandelwal S. Assessing Vaccination Sentiments with Online Social Media: Implications for Infectious Disease Dynamics and Control. PLOS Computational Biology. 2011;7(10):e1002199.
-  Frank MR, Mitchell L, Dodds PS, Danforth CM. Happiness and the patterns of life: A study of geolocated tweets. arXiv preprint arXiv:1304.1296. 2013.
-  Golder SA, Macy MW. Diurnal and seasonal mood vary with work, sleep, and daylength across diverse cultures. Science. 2011;333(6051):1878-81.
-  Coppersmith G, Dredze M, Harman C, Hollingshead K, editors. From ADHD to SAD: Analyzing the Language of Mental Health on Twitter through Self-Reported Diagnoses. CLPsych@ HLT-NAACL; 2015.
-  Gore RJ, Diallo S, Padilla J. You Are What You Tweet: Connecting the Geographic Variation in America’s Obesity Rate to Twitter Content. PLOS ONE. 2015;10(9):e0133505.
-  Super DE. A life-span, life-space approach to career development. Career choice and development: Applying contemporary theories to practice, 2nd ed. The Jossey-Bass management series and The Jossey-Bass social and behavioral science series. San Francisco, CA, US: Jossey-Bass; 1990. p. 197-261.
-  Stone AA, Schneider S, Harter JK. Day-of-week mood patterns in the United States: On the existence of ‘Blue Monday’, ‘Thank God it’s Friday’ and weekend effects. The Journal of Positive Psychology. 2012;7(4):306-14.
-  Egloff B, Tausch A, Kohlmann C-W, Krohne HW. Relationships between time of day, day of the week, and positive mood: Exploring the role of the mood measure. Motivation and Emotion. 1995;19(2):99-110.
-  Howarth E, Hoffman MS. A multidimensional approach to the relationship between mood and weather. British Journal of Psychology. 1984;75(1):15-23.
-  Denissen JJ, Butalid L, Penke L, Van Aken MA. The effects of weather on daily mood: A multilevel approach. Emotion. 2008;8(5):662.
-  Klimstra TA, Frijns T, Keijsers L, Denissen JJ, Raaijmakers QA, van Aken MA, et al. Come rain or come shine: individual differences in how weather affects mood. Emotion. 2011;11(6):1495.
-  Baylis P. Temperature and temperament: Evidence from a billion tweets. Energy Institute at HAAS working paper. 2015.
-  Berry DS, Hansen JS. Positive affect, negative affect, and social interaction. Journal of Personality and Social Psychology. 1996;71(4):796.
-  Dodds PS, Harris KD, Kloumann IM, Bliss CA, Danforth CM. Temporal patterns of happiness and information in a global social network: Hedonometrics and Twitter. PloS one. 2011;6(12):e26752.
-  O’Connor B, Balasubramanyan R, Routledge BR, Smith NA. From tweets to polls: Linking text sentiment to public opinion time series. ICWSM. 2010;11(122-129):1-2.
-  Larsen ME, Boonstra TW, Batterham PJ, O’Dea B, Paris C, Christensen H. We feel: mapping emotion on Twitter. IEEE journal of biomedical and health informatics. 2015;19(4):1246-52.
-  Mitchell L, Frank MR, Harris KD, Dodds PS, Danforth CM. The Geography of Happiness: Connecting Twitter Sentiment and Expression, Demographics, and Objective Characteristics of Place. PLOS ONE. 2013;8(5):e64417.
-  Park K, Lee S, Kim E, Park M, Park J, Cha M, editors. Mood and weather: Feeling the heat? ICWSM; 2013.
-  Hannak A, Anderson E, Barrett LF, Lehmann S, Mislove A, Riedewald M, editors. Tweetin’ in the Rain: Exploring Societal-Scale Effects of Weather on Mood. ICWSM; 2012.
-  Li J, Wang X, Hovy E, editors. What a nasty day: Exploring mood-weather relationship from twitter. proceedings of the 23rd ACM International Conference on Conference on Information and Knowledge Management; 2014: ACM.
-  Giachanou A, Crestani F. Like it or not: A survey of twitter sentiment analysis methods. ACM Computing Surveys (CSUR). 2016;49(2):28.
-  Mahmud J, Nichols J, Drews C. Home location identification of twitter users. ACM Transactions on Intelligent Systems and Technology (TIST). 2014;5(3):47.
-  Teske D. Geocoder accuracy ranking. Process Design for Natural Scientists: Springer; 2014. p. 161-74.
-  Rahimi A, Cohn T, Baldwin T. Twitter user geolocation using a unified text and network prediction model. arXiv preprint arXiv:1506.08259. 2015.
-  Ravi K, Ravi V. A survey on opinion mining and sentiment analysis: tasks, approaches and applications. Knowledge-Based Systems. 2015;89:14-46.
-  Ribeiro FN, Araújo M, Gonçalves P, Gonçalves MA, Benevenuto F. Sentibench-a benchmark comparison of state-of-the-practice sentiment analysis methods. EPJ Data Science. 2016;5(1):1-29.
-  Reagan AJ, Danforth CM, Tivnan B, Williams JR, Dodds PS. Sentiment analysis methods for understanding large-scale texts: a case for using continuum-scored words and word shift graphs. EPJ Data Science. 2017;6(1):28.
-  Thelwall M, Buckley K, Paltoglou G, Cai D, Kappas A. Sentiment strength detection in short informal text. Journal of the American Society for Information Science and Technology. 2010;61(12):2544-58.
-  Thelwall M. Sentiment analysis and time series with Twitter. In: Twitter and Society. Peter Lang Publishing; 2014. p. 83-96.
-  Alves ALF, de Souza Baptista C, Firmino AA, de Oliveira MG, de Paiva AC. A spatial and temporal sentiment analysis approach applied to Twitter microtexts. Journal of Information and Data Management. 2016;6(2):118.
-  Balog K, Mishne G, De Rijke M. Why are they excited? Identifying and explaining spikes in blog mood levels. Proceedings of the Eleventh Conference of the European Chapter of the Association for Computational Linguistics: Posters & Demonstrations; 2006: Association for Computational Linguistics.
-  Bollen J, Mao H, Zeng X. Twitter mood predicts the stock market. Journal of Computational Science. 2011;2(1):1-8.
-  Antweiler W, Frank MZ. Is all that talk just noise? The information content of internet stock message boards. The Journal of Finance. 2004;59(3):1259-94.
-  Kramer AD. An unobtrusive behavioral model of gross national happiness. Proceedings of the SIGCHI Conference on Human Factors in Computing Systems; 2010: ACM.
-  Gilbert E, Karahalios K. Widespread Worry and the Stock Market. ICWSM; 2010.
-  Diener E, Emmons RA. The independence of positive and negative affect. J Pers Soc Psychol. 1984;47(5):1105-17.
-  Clark LA, Watson D. Mood and the mundane: relations between daily life events and self-reported mood. J Pers Soc Psychol. 1988;54(2):296-308.
-  Bollen J, Gonçalves B, van de Leemput I, Ruan G. The happiness paradox: your friends are happier than you. EPJ Data Science. 2017;6(1):4.
-  Burnap P, Williams ML. Us and them: identifying cyber hate on Twitter across multiple protected characteristics. EPJ Data Science. 2016;5(1):11.
-  An J, Quercia D, Cha M, Gummadi K, Crowcroft J. Sharing political news: the balancing act of intimacy and socialization in selective exposure. EPJ Data Science. 2014;3(1):12.
-  Salathé M, Vu DQ, Khandelwal S, Hunter DR. The dynamics of health behavior sentiments on a large online social network. EPJ Data Science. 2013;2(1):4.
-  Volkova S, Charles LE, Harrison J, Corley CD. Uncovering the relationships between military community health and affects expressed in social media. EPJ Data Science. 2017;6(1):9.
-  Sloan L, Morgan J. Who Tweets with Their Location? Understanding the Relationship between Demographic Characteristics and the Use of Geoservices and Geotagging on Twitter. PLoS ONE. 2015;10(11):e0142209.
-  Sloan L, Morgan J, Burnap P, Williams M. Who Tweets? Deriving the Demographic Characteristics of Age, Occupation and Social Class from Twitter User Meta-Data. PLoS ONE. 2015;10(3):e0115545.
-  Mislove A, Lehmann S, Ahn Y-Y, Onnela J-P, Rosenquist JN. Understanding the Demographics of Twitter Users. Fifth International AAAI Conference on Weblogs and Social Media. 2011;11:554-7.
-  Sadah AS, Shahbazi M, Wiley TM, Hristidis V. A Study of the Demographics of Web-Based Health-Related Social Media Users. J Med Internet Res. 2015;17(8):e194.
-  Malik MM, Lamba H, Nakos C, Pfeffer J. Population bias in geotagged tweets. ICWSM Workshop on Standards and Practices in Large-Scale Social Media Research; 2015.
-  Dunn AG, Surian D, Leask J, Dey A, Mandl KD, Coiera E. Mapping information exposure on social media to explain differences in HPV vaccine coverage in the United States. Vaccine. 2017;35(23):3033-40.
-  Eichstaedt JC, Schwartz HA, Kern ML, Park G, Labarthe DR, Merchant RM, et al. Psychological Language on Twitter Predicts County-Level Heart Disease Mortality. Psychological Science. 2015.
-  Atefeh F, Khreich W. A survey of techniques for event detection in twitter. Computational Intelligence. 2015;31(1):132-64.
-  Weng J, Lee B-S. Event detection in twitter. ICWSM. 2011;11:401-8.
Funding for this research: National Health and Medical Research Council (NHMRC Project APP1128968).
Z.S., E.C., K.D.M., and A.G.D. designed the study; Z.S. and P.N. collected the data; Z.S. and A.G.D. analysed the data; Z.S. and A.G.D. drafted the manuscript; Z.S., P.N., E.C., K.D.M., and A.G.D. critically revised the manuscript and approved its submission.
Competing financial interests
The authors declare no competing financial interests.