The Human Geography of Twitter

by   Rudy Arthur, et al.

Given the centrality of regions in social movements, politics and public administration, we aim to quantitatively study inter- and intra-regional communication for the first time. This work uses social media posts to first identify contiguous geographical regions with a shared social identity and then investigate patterns of communication within and between them. Our case study uses over 150 days of located Twitter data from England and Wales. In contrast to other approaches (e.g. phone call data records or online friendship networks), we have the message contents as well as the social connection. This allows us to investigate not only the volume of communication but also the sentiment and vocabulary. We find that the South-East and North-West regions are the most talked about; regions tend to be more positive about themselves than about others; and people talk politics much more between regions than within. This methodology gives researchers a powerful tool to study identity and interaction within and between social-geographic regions.




Studies of human social interaction using phone call data and online social networks [5, 6, 3, 1, 7, 4, 2] have found that, contrary to some expectations [8], geography is alive and well. Despite digital technology decoupling distance and difficulty of communication, spatial proximity remains one of the key factors in determining who communicates with whom. Regions determined from records of telephone communication closely reflect traditional regional and local identities [1]. This has been confirmed for numerous countries [2] and for various different forms of electronic interaction [3].

The notion of a region therefore has much more than a purely bureaucratic meaning. Discussions of regional identity pervade social theory [9, 10], and many stereotypes, sporting rivalries and political differences occur at the regional level. In this paper we will study the regions of England and Wales, which is of particular relevance at a time when British national identity is being challenged by Brexit, regional devolution, and the economic disparity between North and South. However, given that national and international policies are often implemented at the regional level (e.g. the European Union (EU) cohesion policy, accessed June 2018), questions of regional identity have wider geo-political relevance. Given that geographic regions are so fundamental, our question is: how can we quantitatively study ideas like ‘regional identity’, ‘regional rivalry’ or the ‘cultural dominance’ of regions?

We begin with the observation that online social networks tend to have similar properties to offline, spatially embedded, social networks. In fact spatial structure in communication networks is robust enough to have instrumental value. Much recent research, especially on social media, has focused on exploiting the strong spatial correlations that exist in friendship networks to infer the locations of users (e.g. [11, 12, 13]). Other work linking social networks to geography has, for example, attempted to determine the amount of commerce in a given area [14] or the location of a city’s ‘heart’ [15]. This field of network geography has mostly focused on the network’s topology and how this influences interaction and accessibility [16]. It is our aim to move beyond this by studying a social network where the links carry much richer metadata.

We analyse a social network of interactions on Twitter. This social network is constructed from ‘mention’ interactions, in which one user explicitly mentions or replies to one or more other users, and thus it has a few interesting properties. Firstly, connections are intrinsically directional. Alice can mention Bob on Twitter without Bob’s permission, and Bob does not have to reciprocate. This allows for asymmetries in communication, so (at the network level) some regions can be the target of more mentions than others. Secondly, and crucially, unlike either phone call networks (where the call content is unknown) or friend/follower networks (which do not imply communication) here we have both the directed link between users and the content of the message.

The plan of the paper is as follows. We first demonstrate that user communities, identified algorithmically from Twitter mentions, are geographically contiguous and loosely correspond to our expectations, based on administrative boundaries and ‘folk’ conceptions of British regions. This approach makes no a priori assumptions about the number, location or boundaries of different regions, and is independent of administrative demarcations that may or may not reflect real regional identities. Next we use these emergent regions as the subjects of a comparative study of intra- and inter-region communication. We will compare the vocabulary and topics used by members of a region when speaking to each other compared with those used to speak to ‘outsiders’. We will then look at the volume and sentiment of messages sent within and between regions. Thus we can ask questions like “What does the South-West say to North Yorkshire?” and answer them in a concrete way e.g. “They talk about sport, and the sentiment of the communication is slightly more negative than average”.

Materials and methods

Our dataset of tweets from all of England and Wales (defined by a bounding box with lower left longitude and latitude (-5.8,49.9) and upper right (-1.2,55.9)) was obtained from four separate collections, one for the South-West (which was ongoing from previous work [17]) and the other three chosen to sample an area containing million people each. This is to avoid any potential to hit the rate limit for the Twitter streaming API (1% of all tweets globally). This collection lasted from 01/10/2017 to 22/03/2018.

All of our tweets have geographical information attached as GPS co-ordinates or ‘place-tags’. Previous work [17] has shown that GPS tagged posts are predominantly shares from other social media platforms or automated accounts, while place-tagged tweets represent direct human interactions on Twitter. Thus we exclusively use place tags for location and discard GPS-tagged tweets. We locate users by assigning them to grid tiles proportionally to the frequency of their tweeting within the tile. For example, we can have 0.5 of a user in tile 1 and 0.5 in tile 2 if that user tweets equally often from 1 and 2. This is preferred to using the user location field which is often blank, doesn’t contain location information or is too vague (e.g. ‘England’). We end up with 4513957 useful tweets authored by users in England and Wales and which mention users in England or Wales (excluding self mentions). All of our analyses are performed with this set.
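The fractional assignment of users to tiles can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes each place tag has already been reduced to a single representative (longitude, latitude) point, and the grid size `n`, the function names and the example tweets are all hypothetical.

```python
from collections import Counter, defaultdict

# Bounding box as quoted in the text: (min_lon, min_lat, max_lon, max_lat).
BOX = (-5.8, 49.9, -1.2, 55.9)

def tile_of(lon, lat, n, box=BOX):
    """Map a point to the (column, row) tile of an n x n grid over the box."""
    min_lon, min_lat, max_lon, max_lat = box
    col = int((lon - min_lon) / (max_lon - min_lon) * n)
    row = int((lat - min_lat) / (max_lat - min_lat) * n)
    # Clamp so that points on (or just outside) the boundary land in an edge tile.
    return max(0, min(col, n - 1)), max(0, min(row, n - 1))

def user_tile_weights(tweets, n):
    """tweets: iterable of (user, lon, lat). Returns {user: {tile: fraction}}."""
    counts = defaultdict(Counter)
    for user, lon, lat in tweets:
        counts[user][tile_of(lon, lat, n)] += 1
    weights = {}
    for user, per_tile in counts.items():
        total = sum(per_tile.values())
        weights[user] = {tile: c / total for tile, c in per_tile.items()}
    return weights

# Hypothetical example: "alice" tweets equally often from two different tiles.
tweets = [
    ("alice", -2.60, 51.45), ("alice", -2.60, 51.45),
    ("alice", -2.24, 53.48), ("alice", -2.24, 53.48),
    ("bob", -3.18, 51.48),
]
weights = user_tile_weights(tweets, n=4)
```

With these made-up inputs, `alice` is split half and half between two tiles, matching the "0.5 of a user" example in the text.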

Identifying Regions with Tweets

In our set of 4.5 million tweets there are unique users who mention another user in the target area. The mention network is constructed by treating each grid tile as a node and then adding an edge, $e_{ij}$, between every pair of tiles $i$ and $j$ when a user in tile $i$ mentions a user in tile $j$. Edges are directed and have weight $w_{ij}$ equal to the number of mentions sent from tile $i$ to tile $j$. Self-edges (i.e. $i = j$) are allowed.

The Louvain method [18] is used to find communities within the resulting network; this method of community detection is robust, fast and automatically determines the best (modularity maximising) number of communities. However, this method is intended to work on undirected graphs. To turn the directed mention network into an undirected graph we set the edge weight between every pair of tiles as the total number of tweets sent in either direction ($w_{ij} + w_{ji}$, where $w_{ij}$ is the number of mentions sent from tile $i$ to tile $j$), ignoring self-edges. We run the Louvain algorithm with 100 random restarts (to sample multiple local maxima) and choose the community partition with highest modularity from the set of 100 outcomes.
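The symmetrisation and "keep the best of many runs" steps can be sketched as below. This is not the authors' implementation and it does not implement Louvain itself. The candidate partitions are hard-coded stand-ins for what repeated runs of a Louvain implementation with different random seeds might return. The toy graph and all names are hypothetical.

```python
from collections import defaultdict

def symmetrise(directed):
    """directed: {(i, j): mentions from i to j}.
    Returns {frozenset({i, j}): total in both directions}, dropping self-edges."""
    undirected = defaultdict(float)
    for (i, j), w in directed.items():
        if i != j:
            undirected[frozenset((i, j))] += w
    return dict(undirected)

def modularity(undirected, community):
    """Newman modularity of a weighted undirected graph without self-edges.
    community: {node: community label}."""
    m = sum(undirected.values())            # total edge weight
    internal = defaultdict(float)           # edge weight inside each community
    strength = defaultdict(float)           # summed node strength per community
    for edge, w in undirected.items():
        i, j = tuple(edge)
        strength[community[i]] += w
        strength[community[j]] += w
        if community[i] == community[j]:
            internal[community[i]] += w
    return sum(internal[c] / m - (strength[c] / (2 * m)) ** 2 for c in strength)

# Toy directed mention counts: two tight triangles joined by one weak link.
directed = {
    ("a", "b"): 3, ("b", "a"): 2, ("b", "c"): 5, ("a", "c"): 5,
    ("c", "d"): 1,
    ("d", "e"): 5, ("e", "f"): 5, ("d", "f"): 5,
    ("a", "a"): 7,  # self-edge, ignored when symmetrising
}
graph = symmetrise(directed)

# Stand-ins for the partitions returned by repeated randomised runs.
partitions = [
    {"a": 0, "b": 0, "c": 0, "d": 1, "e": 1, "f": 1},
    {"a": 0, "b": 0, "c": 0, "d": 0, "e": 0, "f": 0},
    {"a": 0, "b": 0, "c": 1, "d": 1, "e": 1, "f": 1},
]
best = max(partitions, key=lambda p: modularity(graph, p))
```

Selecting by modularity after the fact means the restarts only need to return a partition each; the comparison is done with one shared scoring function.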

Figure 1 shows the resulting regional communities, presented as a spatial grid with each tile coloured by its community label. These communities were found (with modularity 0.209) using a network constructed on the Medium-resolution grid described in Supporting Information 1. The sensitivity of the community structure to the grid resolution is analysed in Supporting Information 1, which finds that communities are quite robust to variation in the size of the grid boxes. Supporting Information 2 shows an image of the network itself, as well as summary statistics.

Fig 1: Communities in England and Wales determined from Twitter mentions. The Louvain algorithm suggests 9 communities is optimal for this grid resolution. The largest city in each community is labelled. White space means no tweets were recorded. The regions identified correspond roughly with England and Wales’ administrative regions (shown in grey on the map), apart from London which subsumes two other administrative regions.

Examining the map shown in Figure 1, there is a striking geographical coherence to the communities, with 9 contiguous regions easily identified. There are some ‘outliers’, tiles belonging to a different community from their neighbours. These outliers typically have low populations, and hence low numbers of Twitter users and small edge weights, so their assignment has very little effect on the total modularity. Overall the communities reflect ‘folk’ preconceptions of where the regions of the UK should be, have a reasonable correspondence to administrative regions, and also agree with previous work using phone call networks [1]. The main difference occurs in the London region, where London, the South-East and part of East Anglia are incorporated into one region. This area comprises (broadly) the extent of London’s ‘commuter belt’ and is likely an effect of London’s enormous economic and cultural influence. Henceforth we will label each region by its largest city, for ease of reference and since, by examination of Figure 6, most of the communication volume originates from the largest city in a region.

Comparison of Vocabulary

Given that people are more likely to be ‘friends’ with people living nearby [11], the types and topics of communications within and between communities may be different. The field of topic modelling, in general and for Twitter, has a large literature (e.g. [19, 20, 21]). Other research has studied dialect differences on Twitter, particularly in the USA [22, 23, 24]. Here we use a simple approach to compare the words and topics used in intra- and inter-community communication.

We first create a lexicon, $L$, containing all distinct case-insensitive words from all tweets. We removed user names prefaced by an ‘@’ symbol, URLs, special characters (e.g. emojis), as well as the ‘#’ symbol prefacing hashtags, though we kept the hashtag itself. $L$ defines our word-vector space within which we construct 9 vectors, $\mathbf{v}_a$, representing the word sets obtained from tweets originating in each of the 9 regions. We use cosine-similarity, $\cos\theta_{ab} = \frac{\mathbf{v}_a \cdot \mathbf{v}_b}{\lVert \mathbf{v}_a \rVert \, \lVert \mathbf{v}_b \rVert}$, to measure the similarity between regional vocabularies. Figure 2 shows that all regions are quite similar to each other. Cardiff is the most dissimilar, and there is a suggestion that the ‘northern’ regions (Birmingham, Nottingham, Leeds, Manchester and Newcastle) are more similar to each other than to the southern regions (Bristol, London and Norwich).

To investigate this further we calculate the tf-idf (term frequency-inverse document frequency) score. Tf-idf assigns high scores to words that differentiate documents within a corpus. We create 9 ‘documents’ by aggregating the tweets originating from each region. The top tf-idf words in Cardiff are mainly local place names or words in the Welsh language. Other regions’ highest tf-idf words are mainly local place names or sports clubs (Supporting Information 3). Tf-idf thus successfully detects the Welsh language, while the cosine similarity picks up the difference between Welsh and English tweets and also suggests a small dialect difference between the North and South of England.

Fig 2: Cosine similarity for word vectors corresponding to each community’s lexicon. This metric equals 1 for identical vectors. Wales is notably less similar to all other regions while the more northern regions are more similar to each other than the southern ones.
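The two vocabulary measures used above can be sketched in a few lines. The sketch makes assumptions the text does not settle: regional vectors are taken to be raw word counts, and tf-idf uses the common form (relative term frequency) × log(number of documents / number of documents containing the word). The two tiny "documents" are made up for illustration.

```python
import math
from collections import Counter

def cosine_similarity(u, v):
    """Cosine of the angle between two sparse word-count vectors (dicts)."""
    dot = sum(u[w] * v[w] for w in u.keys() & v.keys())
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v)

def tf_idf(docs):
    """docs: {region: Counter of word counts}. Returns {region: {word: score}}."""
    n_docs = len(docs)
    # Number of documents in which each word appears at least once.
    doc_freq = Counter(w for counts in docs.values() for w in counts)
    scores = {}
    for region, counts in docs.items():
        total = sum(counts.values())
        scores[region] = {
            w: (c / total) * math.log(n_docs / doc_freq[w])
            for w, c in counts.items()
        }
    return scores

# Made-up regional "documents" (aggregated word counts).
docs = {
    "Cardiff": Counter({"the": 4, "tywydd": 2}),
    "Bristol": Counter({"the": 5, "harbour": 1}),
}
similarity = cosine_similarity(docs["Cardiff"], docs["Bristol"])
scores = tf_idf(docs)
top_cardiff = max(scores["Cardiff"], key=scores["Cardiff"].get)
```

A word shared by every document (here "the") gets a tf-idf of zero, which is why the top-scoring terms tend to be region-specific place names, clubs or, for Cardiff, Welsh words.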

Local, Regional and National Communication

To compare how ‘locals’ communicate with each other to how they communicate with ‘outsiders’ we divide tweets originating from a region into two categories: tweets sent within the region and tweets sent to other regions. Let $f^{loc}_a(w)$ denote word frequencies in tweets sent within community $a$ and $f^{out}_a(w)$ denote word frequencies in tweets sent from community $a$ to any other community. We can then assign each word a frequency rank from most common ($r = 1$) to least ($r = |L|$, the number of distinct words), so $f^{loc}_a(w) \rightarrow r^{loc}_a(w)$ and $f^{out}_a(w) \rightarrow r^{out}_a(w)$. Looking at the rank differences, $\Delta r_a(w) = r^{loc}_a(w) - r^{out}_a(w)$, then gives us a rough indication of how the vocabulary varies in tweets sent within regions compared to those sent between regions. Positive rank differences indicate a word is more common in inter-community messages and negative rank differences indicate the word is more common in intra-community messages. In order to avoid distraction by rare words with spurious large rank differences, we restrict our analyses to words with frequency per tweet (calculated separately in each region and for loc or out) greater than 0.1%.

We can look at pairs of regions in the same way. We rank the words used to communicate between communities $a$ and $b$, $r_{ab}(w)$, and the words used to communicate between $a$ and all other communities (excluding itself and $b$), $r_{a\bar{b}}(w)$, and look at $r_{a\bar{b}}(w) - r_{ab}(w)$ to see what words are characteristic of communication between $a$ and $b$ specifically.
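A minimal sketch of the rank-difference calculation follows. Several details are assumptions rather than the authors' stated procedure: ties in frequency are broken alphabetically, only words present in both the local and the outgoing sets are compared, and the 0.1% per-tweet threshold is applied to both sets. The word counts are invented.

```python
from collections import Counter

def ranks(counts):
    """Rank words from most common (1) to least; ties broken alphabetically."""
    ordered = sorted(counts, key=lambda w: (-counts[w], w))
    return {w: r for r, w in enumerate(ordered, start=1)}

def rank_differences(loc_counts, out_counts, n_loc_tweets, n_out_tweets,
                     min_freq=0.001):
    """Local rank minus outgoing rank for every sufficiently common word.
    Positive: relatively more common in inter-region tweets.
    Negative: relatively more common in intra-region tweets."""
    r_loc, r_out = ranks(loc_counts), ranks(out_counts)
    diffs = {}
    for w in r_loc.keys() & r_out.keys():
        frequent_locally = loc_counts[w] / n_loc_tweets > min_freq
        frequent_outside = out_counts[w] / n_out_tweets > min_freq
        if frequent_locally and frequent_outside:
            diffs[w] = r_loc[w] - r_out[w]
    return diffs

# Invented counts for one region: 100 local tweets and 100 outgoing tweets.
loc = Counter({"villa": 50, "lunch": 30, "brexit": 5, "nhs": 3})
out = Counter({"brexit": 40, "nhs": 20, "villa": 4, "lunch": 2})
diffs = rank_differences(loc, out, n_loc_tweets=100, n_out_tweets=100)
```

The same function can be reused for a pair of regions by passing the counts for "all other regions" as the first argument and the counts for the specific partner as the second.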

Figure 3 and Table 1 show that intra-community words (negative rank difference) primarily refer to local issues like sports (pigeonswoop, villa, mufc) and places in the region (wigan, bradford, chester) similar to the high ranking tf-idf words. Inter-community words primarily refer to national issues (brexit, eu, nhs, tory). Table 2 shows an example for two neighbouring southern regions. Sport shows up again in pairwise communication, as it does in local communication, but not nationally - indicating that sporting rivalries are playing out on a regional level, as one might expect.

Fig 3: Loc rank versus out rank for Bristol region. Words with largest magnitude rank difference are indicated. Intra-region words include local place names while inter-region words refer to national politics.

| London | Manchester | Birmingham | Leeds |
| --- | --- | --- | --- |
| liverpool(613) | brexit(694) | brexit(803) | brexit(682) |
| sleep(369) | government(612) | disabled(663) | eu(594) |
| trending(337) | tory(572) | extremely(654) | leaving(570) |
| ff(316) | nhs(506) | inclusion(654) | nhs(539) |
| topic(304) | eu(493) | wwfc(518) | ref(520) |
| co(-396) | wigan(-475) | thursday(-491) | pop(-497) |
| brighton(-423) | chester(-504) | pigeonswoop(-504) | south(-504) |
| lunch(-449) | lunch(-515) | villa(-557) | thursday(-512) |
| awards(-472) | gt(-564) | art(-692) | council(-564) |
| greater(-818) | mufc(-606) | blues(-700) | bradford(-586) |

Table 1: Top and bottom five rank differences for the 4 most populous regions. See Supporting Information 3 for full list.

| Rank difference | Cardiff to Bristol |
| --- | --- |
| 30622 | busygettingbetter |
| 22063 | anglowelshcup |
| 21903 | younglivesvscancer |
| 17430 | bigwednesdayshow |
| 14548 | wenurses |

| Rank difference | Bristol to Cardiff |
| --- | --- |
| 19869 | tywydd |
| 20289 | scarlets |
| 29627 | tandyout |
| 40499 | tandy |
| 40727 | goalscorers |

Table 2: Top and bottom five rank differences for Bristol to Cardiff and vice versa. Most of these terms are popular hashtags. Cardiff is talking to Bristol about health related issues, a radio show and a sporting event. Bristol is talking to Cardiff about sports and ‘tywydd’, Welsh for ‘weather’.

Communication Flow Between Regions

Now that we have an assignment of each tile to a community, based on the undirected network, we form a directed network induced by the assignment of communities in Figure 1. We want to know about the net flow of mentions, i.e. does London mention Manchester more than vice versa? Let there be $N$ communities and let $M_{ab}$ be the number of mentions of (users in) community $b$ by (users in) community $a$. If $M_{ab} > M_{ba}$ we draw a directed edge from $a$ to $b$ with weight $M_{ab} - M_{ba}$. If $M_{ba} > M_{ab}$ the edge goes from $b$ to $a$ with weight $M_{ba} - M_{ab}$. Thus our arrows always point towards the region which is mentioned more, weighted by the net difference in communication volume. We show this network in Figure 4 (left).

Fig 4: Left: Number of mentions sent between UK regions. Arrows show the direction of flow e.g. Manchester mentions London more than London mentions Manchester. Node colour shows number of mentions sent within a community. Right: The flow of mentions computed via the null model, node colours same as left image.
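The construction of the net-flow network shown in the left panel can be sketched as follows. The mention counts are invented for illustration and the function name is hypothetical. The only rule taken from the text is that each arrow points towards the region that is mentioned more, with weight equal to the net difference.

```python
def net_flow(mentions):
    """mentions[(a, b)]: number of mentions of region b by region a.
    Returns {(source, target): net weight}, one edge per unequal pair,
    pointing towards the region that is mentioned more."""
    regions = sorted({r for pair in mentions for r in pair})
    edges = {}
    for idx, a in enumerate(regions):
        for b in regions[idx + 1:]:
            a_to_b = mentions.get((a, b), 0)
            b_to_a = mentions.get((b, a), 0)
            if a_to_b > b_to_a:
                edges[(a, b)] = a_to_b - b_to_a
            elif b_to_a > a_to_b:
                edges[(b, a)] = b_to_a - a_to_b
            # Equal volumes in both directions produce no arrow.
    return edges

# Invented counts: both other regions mention London more than the reverse.
mentions = {
    ("Manchester", "London"): 120, ("London", "Manchester"): 80,
    ("Leeds", "London"): 60, ("London", "Leeds"): 30,
    ("Leeds", "Manchester"): 40, ("Manchester", "Leeds"): 40,
}
flow = net_flow(mentions)
```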

We see that London (the most populous region) is always mentioned more by the other regions than vice versa. This is perhaps expected, as tweets referencing politics are likely to be directed towards the capital. Manchester (containing the second largest city) is mentioned more by all but London. In light of previous work [17] showing that high population density leads to a super-linear increase in the amount of Twitter activity, this is perhaps not surprising; London and Manchester are very densely populated regions, so contain a lot of users and hence present a large ‘target’ for other regions. However, population is not a perfect predictor: contrast Newcastle (always in deficit of mentions) with Norwich. Despite Newcastle having a larger population (2.7 million versus 1.6 million, see Table 3; data from the UK Census, accessed April 2018), it is mentioned much less than Norwich, perhaps an indication that its geographic isolation (it is far from the large population centres in the North-West and South-East) is leading to some social isolation.

To see which communities talk more or less than expected we establish a null model by cutting all the outgoing edges and rewiring them randomly, while keeping the in- and out-degree of every node fixed, to account for the relative size and activity of each region. This null model generates the expected pattern of inter-region communication based on Twitter activity in each region, assuming no bias in inter-regional communication. We do not redirect self-edges since the communities are, roughly, chosen to maximise self interaction by the Louvain algorithm. Comparison to a null model which randomly reassigns self-edges will thus show that the observed graph has more self interaction, by construction. It is more informative to focus on inter-community communication only. A community $a$ has $O_a$ outgoing mentions and $I_a$ incoming mentions. If mentions are distributed equally to all regions, in proportion to their share of incoming edges, then we would expect the fraction of mentions from $a$ to $b$ to be $I_b / (I - I_a)$ where $I = \sum_c I_c$.
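One way to read this null model in closed form is sketched below. It is an interpretation, not the authors' procedure: rather than randomly rewiring edges, each region's outgoing inter-region mentions are shared among the other regions in proportion to those regions' incoming mentions. The counts and names are invented, and self-mentions are ignored as in the text.

```python
def expected_flows(mentions):
    """mentions[(a, b)]: observed mentions of region b by region a.
    Returns {(a, b): expected mentions from a to b} for a != b, assuming a's
    outgoing mentions are split in proportion to the incoming totals of the
    other regions."""
    regions = sorted({r for pair in mentions for r in pair})
    out_total = {a: 0 for a in regions}
    in_total = {a: 0 for a in regions}
    for (src, dst), w in mentions.items():
        if src != dst:                      # self-mentions are left untouched
            out_total[src] += w
            in_total[dst] += w
    grand_total = sum(in_total.values())
    expected = {}
    for a in regions:
        available = grand_total - in_total[a]   # incoming mentions of the others
        if available == 0:
            continue                            # nothing a could be rewired to
        for b in regions:
            if a != b:
                expected[(a, b)] = out_total[a] * in_total[b] / available
    return expected

# Invented counts, including a self-edge that the null model ignores.
mentions = {
    ("Manchester", "London"): 120, ("London", "Manchester"): 80,
    ("Leeds", "London"): 60, ("London", "Leeds"): 30,
    ("Leeds", "Manchester"): 40, ("Manchester", "Leeds"): 40,
    ("London", "London"): 500,
}
expected = expected_flows(mentions)
```

By construction each region's expected outgoing total equals its observed outgoing total, so observed and expected flows can be compared pair by pair. Unlike a true degree-preserving rewiring, this shortcut does not exactly preserve incoming totals.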

We compare the expected to observed in Figure 4. The main observation here is that all regions communicate more with London (and to a lesser degree Manchester) than the null model predicts. The volume of communication between other regions is therefore less than expected; however, the direction of net communication flow is preserved for all pairs but one: Norwich talks more to Nottingham, whereas the null model predicts the opposite. For each region we look at the ratio of total incoming to total outgoing mentions in Table 3. This paints a similar picture; only London has a ratio greater than one and the North-East (Newcastle) has the lowest ratio of all 9 regions.

| Region | Population | Incoming/outgoing mentions |
| --- | --- | --- |
| London | 22650335 | 1.629 |
| Manchester | 7733168 | 0.981 |
| Bristol | 4623903 | 0.838 |
| Leeds | 5571395 | 0.774 |
| Birmingham | 5038212 | 0.746 |
| Norwich | 1561239 | 0.733 |
| Nottingham | 5493704 | 0.690 |
| Cardiff | 2818706 | 0.689 |
| Newcastle | 2744728 | 0.638 |

Table 3: Regional populations (using our discovered regions) and ratio of number of incoming mentions to number of outgoing mentions.

Inter-region Sentiment

Regional identities and rivalries lead to strong emotions about sport, politics or any number of issues. Local stereotypes may lead to negative associations with a particular place. By analysing the text of the messages exchanged between regions we can ask if these expectations are reflected in the sentiment of the communication.

Sentiment analysis on Twitter is another large topic. Early work used sentiment analysis of tweets to try to predict elections [25], movie box-office returns [26] and brand sentiment [27], demonstrating the power of the approach. Much research has been done on improving sentiment analysis for short texts like tweets or SMS messages [28, 29, 30]. We use a popular lexicon-based sentiment analyser [31, 32] to assign a polarity to each tweet. Polarity is a number between -1 and 1 measuring how negative or positive the sentiment is in a text.

We explore the message sentiment in two ways, again using the induced graph. Figure 5 (left) shows the average polarity of a message sent between any two communities, $\bar{p}_{ab}$ (for messages sent from community $a$ to community $b$). Polarity is on average positive, indicating the average tweet which mentions another user is positive. This is in line with research on sentiment in other corpora which finds a general trend towards positive polarity [33]. Self-polarity is shown as the node colour, indicating that southern regions are more positive in both inter- and intra-community communication.

Fig 5: Left: Arrows: average sentiment per tweet between regions. Node: average sentiment for tweets sent within region. Right: Arrows and nodes show the same averages minus the background sentiment of the sending region, i.e. sentiment corrected for baseline of each region.

To determine the background level of sentiment for each region, for each community $a$ let $\bar{p}_a$ be the average polarity of the messages it sends. Each arrow or node in Figure 5 (right) shows $\bar{p}_{ab} - \bar{p}_a$, where $\bar{p}_{ab}$ is the average polarity of messages sent from $a$ to $b$. As differences in regional vocabulary may lead to some regions sending tweets with lower measured polarity scores, this procedure allows us to look at inter-region communication relative to the baseline in each region. See Supporting Information 3 for exact values, with errors. As we have so many mentions, most average polarity measurements are quite precise, with errors smaller than the resolution of the colormap; the largest errors are between distant pairs of small regions, e.g. Newcastle and Norwich. Figure 5 (right) shows that after correcting for background sentiment, southern regions are still relatively more positive about themselves. The ‘friendliest’ pair of regions, i.e. the pair with the highest baseline-corrected sentiment, are the neighbouring Midland regions Nottingham and Birmingham, while the least friendly are Birmingham and Cardiff. This implies spatial proximity alone does not account for inter-region sentiment.
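The baseline correction can be sketched as below. The exact definition of the background sentiment is an assumption here: it is taken to be the mean polarity over all tweets a region sends, including those sent within the region. The polarity values and function names are invented for illustration.

```python
from collections import defaultdict

def mean_by_key(items):
    """items: iterable of (key, value). Returns {key: mean of its values}."""
    sums, counts = defaultdict(float), defaultdict(int)
    for key, value in items:
        sums[key] += value
        counts[key] += 1
    return {key: sums[key] / counts[key] for key in sums}

def corrected_sentiment(tweets):
    """tweets: iterable of (source region, target region, polarity).
    Returns {(source, target): pair mean minus the source's baseline}."""
    tweets = list(tweets)
    pair_mean = mean_by_key(((src, dst), pol) for src, dst, pol in tweets)
    # Assumed baseline: mean polarity of everything the source region sends.
    baseline = mean_by_key((src, pol) for src, _dst, pol in tweets)
    return {(src, dst): m - baseline[src] for (src, dst), m in pair_mean.items()}

# Invented polarities in [-1, 1].
tweets = [
    ("Bristol", "Bristol", 0.4), ("Bristol", "Bristol", 0.2),
    ("Bristol", "Cardiff", 0.0),
    ("Cardiff", "Bristol", 0.1), ("Cardiff", "Cardiff", 0.3),
]
corrected = corrected_sentiment(tweets)
```

Subtracting a per-sender baseline means a region that simply writes in a more (or less) positive register does not appear systematically friendlier (or more hostile) towards every other region.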

We calculate two additional metrics based on polarity, $S_a$ and $P_a$. $S_a$ measures how positive a region is in communication with itself compared to its communication with other regions, its ‘self-regard’. $P_a$ measures how positive other regions are about region $a$, its ‘popularity’. Values are shown in Table 4. Southern regions have slightly positive $S_a$, so they are more positive about themselves than about other regions; northern regions tend to be neutral or negative. Perhaps surprisingly, given its centrality in political discussions, London is the region with the highest incoming polarity from the other regions, i.e. the most popular.

| Region | Self-regard | Region | Popularity |
| --- | --- | --- | --- |
| Newcastle | -0.0116(8) | Manchester | -0.0108(3) |
| Nottingham | -0.0059(5) | Newcastle | -0.0049(7) |
| Manchester | 0.0012(4) | Norwich | -0.0023(8) |
| Leeds | 0.0036(5) | Leeds | -0.0006(4) |
| London | 0.0072(2) | Nottingham | -0.0006(5) |
| Birmingham | 0.0130(5) | Bristol | 0.0014(4) |
| Cardiff | 0.0145(6) | Birmingham | 0.0038(4) |
| Bristol | 0.0145(6) | Cardiff | 0.0038(6) |
| Norwich | 0.0180(11) | London | 0.0042(2) |

Table 4: Self minus outgoing sentiment (self-regard) and average incoming sentiment (popularity), which measures the sentiment of the other regions for the target region. Norwich has the most self-regard, Newcastle the least. London is the most popular region and Manchester is the least.


Traditional regional identities are reflected in social media interactions. Located tweets are a unique resource that allow both community identification and analysis of inter-community communication. We have examined the volume of messages sent between regions, the vocabulary/topics within a region versus the vocabulary/topics used to communicate with other regions, as well as sentiment between regions.

We must be cautious in our analysis and recognise that Twitter (like all surveys, solicited or not) is not giving us an unbiased view of society at large. It is heavily urban [17, 34] and over-represents e.g. younger, higher-income people [35]. Twitter is also a platform that is used more to discuss news and social issues than personal communication, in contrast to say LinkedIn or Facebook which have different characteristic uses. This is both a feature, allowing us potential access to contentious or divisive topics, and a bug e.g. in sentiment analysis, we could be examining an unusually negative corpus. This also has consequences for the volume of tweets - the news and politics focus of Twitter is perhaps another reason that London, seat of government, finance, as well as many news organisations, is over-represented.

Nevertheless, this combination of community identification with text analysis has widespread application. Marketing and political campaigns could potentially use this methodology (perhaps at a smaller scale than national) to identify relevant local issues, or to determine whether they are targeting single or multiple ‘communities’ which may respond better to different messages. Beyond practical applications, this methodology has the potential to build a quantitative, econometric basis for the study of cultural exchange. The agents of this quantitative theory are the emergent regions, and we can use this combination of social-media data, network science and text analysis to shed light on regional discourse, dialect, connectivity or possibly even regional tension in an area more fractious than the UK. This method provides a way to characterise regions, suggests interesting social questions (e.g. why does Norwich have such a large ‘influence’ relative to its population?) and provides the empirical data to quantitatively test explanatory theories. In general we believe this methodology will help expose the relationship between people, social media, space and place.

Supporting Information 1

Here we study how the partition of England and Wales into regions depends on how we aggregate users.

Varying the Grid Resolution

As in previous work [1], we look at the connections between grid tiles rather than users themselves. The size of the grid tile is chosen by us: we simply divide our bounding box into an $n \times n$ grid. Given this somewhat arbitrary choice it is important to investigate the dependence on the chosen grid resolution.

Figure 6 shows three grids: Coarse, Medium (which we use in the main text) and Fine. We see that the Coarse grid identifies 5 regions (roughly: South-East, Wales & South-West, Midlands, North-East and North-West). Clustering on the Medium grid identifies the 9 regions discussed in the text. Clustering on the Fine grid identifies the same 9 regions, plus two small additional regions, centered around the cities of Stoke-on-Trent and Southampton. Modularity increases going from the Coarse (0.086) to Medium (0.209) to Fine (0.255) grid resolutions.

Fig 6: Left to right: Coarse, Medium and Fine grids.

There is no a priori reason to prefer one grid resolution over another. In this case, different choices allow different hierarchical levels of structure to be distinguished. Our choice of the Medium grid for primary study is motivated by the observation that it is the coarsest grid that roughly reproduces the administrative regions of England and Wales. Very small communities have low volumes of communication to and from them, which makes it difficult to make statistically meaningful statements.

As we increase the resolution further we find further subdivision of the regions e.g. a small part of the South-East splits into its own community centered around Southampton. In the limit where we consider every user individually we find 1000s of communities. Our final choice is a compromise between high modularity and large volumes of communication data flowing between the regions. Furthermore, the same users and tweets are included in the common regions identified by the Medium or Fine grids. This means we would obtain the same results if we performed the analysis using the Fine grid, though with some edge cases differing slightly and some communities splitting at the periphery. Although the modularity is bigger for the Fine grid, these considerations lead us to choose the Medium grid in preference.

Supporting Information 2

We require a non-zero number of internal mentions, i.e. a non-zero self-edge weight, in each tile; this ad hoc condition removes very sparsely populated tiles which can be assigned to any community without significantly affecting the modularity score. We are left with nodes/tiles in the graph. The associated undirected mention network contained edges (i.e. density of ) with mean node degree and mean weighted node degree . The network has no isolates i.e. it is equal to its giant component.

Twitter Mention Network for England and Wales

Fig 7: The undirected network of Twitter mentions in England and Wales aggregated on a grid. The node sizes correspond to the number of tweets sent within a tile i.e. the size of the self edge . Colours correspond to the community allocation discussed in the text. Only connections where are shown. The map is reflective of population density, showing large numbers of tweets originating in the south-east and north-west. We also see significant communication flow between the regions, supporting our assertion that this data set can be used to study inter-region communication and sentiment.

Supporting Information 3

Top 100 ‘Local’ Terms by TF-IDF.


trndnl mymyselfandmyphotos afirmremain kenonmkfm greenwichhour ealinghour womened jbombs catford freelancephotographer artconnects drivingforward mcmlvp cosplaygirls countryradio automotiveindustry rbemusicshowcase sneakerjeans mcmcomiccon ldnmcmcomiccon sqsc partyclubmix mcmmidlands willink filteredphotos britishsports forkster noskillphotography cosplayer lewisham controlyourbody southsea farmoor tooting gohartsc wearelondon headington brownout gistwithcynthia commercialvehicle leighonsea cheam hertshour govia iicsa eyesofladyw trumpdossier mylocalculture ukbtsarmy toxicsurv upthehamlet wristinstability cosplaymasquerade lovemk removerutnamskcb southeastern mictizzle ncsupperclubs newham bwsfb ldnmovesme visitbanbury upthestoke muckymerton feedthepoor meaow coycards cobbett cultureofukafrobeats boweldisease wearerugby londonfashionweek thelindaeshow samplesale osteopathy midcarpalinstability dhfc elft runwithandy bostikpremier gamificationeurope ulcerativecolitis northlondonhour otmoor mertonculture southlondon londres lovemycity ifgs herberts digme moveitshow axschat wristpain wandsworth successhour rbwm ruinadatein engagors beckenham


thefootball oldhamhour marktokerrys beeryview cheerstobeers adelaidies ffrindiau evostikleague lbe leadinggm urmston tweacon tameside camrgb oneport aufc sthelenshour eclcm savemajorcrimes calderstones bolllocks chestertweets corinne itsliverpool cfba mdso fabcan saletown poggy newbrew stav kirbs larklane cumbrianbeer isplenty churchgate theys kerrys prestonhour ourmanchester bevys faithinyoungpeople ancoats ministryofsport knitswithbeer jukeboxthursdays manchestercentralfc swanfest gmcuk purpleandproud janeballlandscapes wxmafc codarmy gan prosandcoms pieceofrubbish oafc loveforever upthetics prelovedhour chorlton majorcrimes lancashirehour sthelens wallasey fmcreators thinkfaster wythenshawe bestshowontv teamukfast onesalefc timperley proudtofoster proudtoadopt uhn mancmade teamorrell janeballphotography raggies wwoolfall altrincham indiehour mcrchristmas ourtownsteam rra hulme piccadillyward lunchtimers dazzzzer chorley cwfl moston poggys hogblog slingthemesh pnefc gbar fitnessisakeyrequirement baltictriangle refperformance


niks cornwallgate exeterhour bathindiechat swindonhour ukcachehour cmonbris mojorocks communityownership somersethour cornishhake exeterlive freshcornish countrykids devourbrischat champrugby nerdschatting remoanersbs mikevigorfanclub ourchelt dailyinvinciblecover wnukrt theendofallthings glawsfamily devonhour englishriviera makingbristolproud savebathfromboring crediton daretodisrupt selworthy smwbristol musichouruk philskillerthree glosbiz barnstaple ericalovesyou packthegate torbayhour winchcombe torbay merrybrexmas lovethebarbican bristolrugby womaninbiz teahour jacki officialgettogether newlyn walt nsomerset exmoor finzelsreach tiverton cind glaws hagx bewhatyouare cockington swtawards cornishfootball ukwinehour signedupsaturday remainfakenews bris futurecity saltash babbacombe capetown devonfoodhour wickedontour nbtproud babber opmfamily amazingarchie cwlbirds treesofthisworld paignton teignmouth glos boosttorbay brivbed maymedia cornishlove remainlies veawards hetourism bristolandproud truro gloshour falmouth poyf propertyladder newtonabbothour eatmorehake transhealth amateurradio bristolbizhour topsham dartmoor


euers yesladhull kirklate sheffieldissuper hcafc wharfedalelinks ukcolumn safclive htafc kirklatemarket bcafc euer gmy gewgo doncasterisgreat lovinleeds bigupbradford weareyork cookridge dearne connectingthepieces northstarchat hullhour theworldiswatching maboxing headingley realaleawaydays tvaddict dts ofosheffield hazlegreaves utm marketingweek standwithduchess ukwinehour backinghuddersfield sheffbeerweek ehab calumshullcrew topoftown fanthursday getskyhome nyp leedshour yesladlogo brexitnow syp dazl fido teamdylan otweeklds ijw gayleeds barnsleyisbrill nomorefelling filey taddy teamtheo scc allam projekt horsforth nostell halfhourofpower torbayhour gafc isdead stophs amey geofleurshop bdxmas dewsbury saveshefftrees teambentsgreen dwba wahaw ramember selloutsaturday ataw ryuvoelkel liff teambradford meersbrook sheffieldhour allams geddes hkr burnsy goc menston calderdale roundhay cits fearthefin midstaff supportlocalmarkets enjoyyourprivilege abilitiesnotdisabilities teamyas brighouse


yn introbiz dwi oneclubonecounty introbizexpo herefordgoals usw ddim wedi jdwpl valueadded betws iawn nawr roath edrych llongyfarchiadau ukdiscoversaudi chdi ymlaen efo oedd alfieandwilf heddiw sportcardiff mewn ydi hynny rbcdn jofli heno redbalwncoch ytad letsgodevils yna hefyd cael llanelli ond gyfer savemajorcrimes mwynhau uswrugby uptheport daysofselfcare bawb ffordd theglovesareon meddwl caerdydd murenger fel wruin wrth aberdare pubofdreams ndwg wneud pigparker heretoplay siarad penarth gyda gyd poorparkingcdf hellocolin hefo neud bigtuesdayshow uhw erbyn gymraeg rwan fynd ringland ysgol depressionrecovery llanedeyrn rhondda rhaglen cardiffmet llanishen newydd bethespark chwarae angen vitallis llandaff ystrad wyburn wearredforwalesandvelindre cathays sydd virtualworldrideforrity ammanvalley sesiwn siwr grangetown gweld roedd


norfolkhour skullswasps livinghistory allezallezallez norwichhour euroland theveryear suffolkhour upthepeckers holtbirdcafe norfolkfootball felixstowe alycia yarmouth uea tryyy autisticandproud autismaware norvwas wickeduk charitygala bigdaddyfamily badtouch rolloff facoachmentor liveworklovenorfolk garboldisham burystedmunds wearewsft drinkextraordinary northnorfolkcoast fidel caw marketmen cromer vhurrell oldchimneysbrewery nmfc erwfl holkham soyuztour akingsransom planetsuffolk weareabode wasvlei tryyyy autisticsinger promnation proudcanariesfc edpphotographer lavenham pltv forecastingchallenge grandmentors pdnreflect canarycall ueailm lovethekick communitypub gomagpies dereham beccles mfta swanlavenham commitioner srbeny norwichpubs wasvhar klose pricklypals greatyarmouth teamnnuh fawomenscup itlfc pricklypal tayfen cihcareersweek otbc fawpl woodbridge hotelschool teamwicked emwfl kyley plasterersarms sudbury thetford harvwas needhammarketfc kingslynn stowmarket hoglet sundayatthemusicals holthogcafe trowse norfolkrestaurantweek buildthenest norwichlighttunnel norwichlights foxythanks


covhour morphettes thisiscoventry brumhour worcestershirehour jewelleryquarter covhourlive leamingtonhour sociallyshared noflyzone kingsheath xhx lovebrum suahour bizw ipex cwrocks bilbrook thepaguide lovelydrop eastsidejazzclub nearbywild digbeth tuffnut pusb kfe progmill sutcolhour pedsicu thekindnessofpeople salop newbuckshead harrisgibbshair dayswild jonworcesterman willhomewoodbirmingham afrin soulstew secondsinbrum warwickshire tonetaxi covcab grumpys teamwolves cwbf atruegent loveleam harborne biggen gbccexpo nowplaying yeworcestershire blazemotm rockly yecompany studley wearebeingwarned paintwithcars mtptproject haircolour silvermountainagency dosanjh carlingo wehavebeenwarned brumbloggers wolvesaywe stirchley bypy moleend lovesolihull chamberlinkdaily bcccawards hellobrum whitneyout nuno tasteofsrilanka olton pyfproms ckob blackcountry malvernhills jq birminghamartists bonser bubblebobauk lifeecho fosun bridgnorth huis mijn zyderney getinvolvedbrum thriveawards glassboys pelsall brumindependents brumtechawards fwaw boldmere malvernhillshour


lollol nccc wherehistorybegins fankew blatherwick nonlge doigy dilnot pineappleoregg whowzers dogtanian lbororugby cousion betteridge cousions jaqui luking gudvkuking boppy simmos bforbhour gasteric byepass changingbeers jannet teamdmu blatherwicks proo earlycrew panthersnation tkofficialmc sethslegacy lborowalksonwater doigys hauntedthursday askuon psyed bennifit includeing footbsllers rebete agentr disolsys lborofamily thff picheal teambeswicks nantwich lboro earings pvfc trentham westerne lovestoke pancanstory lwow southerton wearelincoln votecomedy jsnnet stokeontrent faydeesupporter greennwhites wlechat teamngh armyred boppys farndale proudtobestaffs thrapston nffcclub faceofsot bfbd farndales twitterposh ripleys faydeearmy hcfc antoney greenandgold dilnots djfestlei nearley marksfacuptour loveintheafternoon togetherforautism leicscomedyfest myk globelist tennereefe valproate fatrophyquest theverseandguests lanzorottee ukmodel jasondenoshow kenty laughterloft welovenewhomes fingertoescrossed


votecategories consett nefollowers tnarmy prayfortidal leadgate voterfraud coymmp pelaw sparkysrunningclub sundayya totalsport votingcountmodel cheryll northeasthour morekidsonbikes enlscores goffy mancrushforever thisismine northeast corbynistalife northeastcreatives newcastlescaleupsummit letsgoeagles newcastlegateshead veteranslivesmatter utb howayblyth stillflowering hawaythescholars cillamoment templeofboom morpeth getnorth epdp uptheglebe digitalshowcase ultravires boldon hebburn metrocentre cdsou teesvalley teesside chesterlestreet stopthehate welcometosunderland southshields allhailtotheale tidesoflove turfeoke justwrestle amandain madeinnewcastle thisnortherngirlcan edinburghisblackandwhite gateshead hitthebar shinethelightbrightaootherscansee ouseburn aycliffe cramlington tvbcmembers robmcavoy derwentside sackrodwell narrowmind heworth stockton mackem votercountmodel seaham whitleybay uofefutsal cakelicious nonpolitical intothewoods nebloggers boostyourbusiness blobbys babyitschristmas trfc kone bewateraware jesmond teammcjonesforthewin sunderlandfutures boty secundus uptheesh mushroomworks weekendanthems nulli inrafawetrust longbenton roker teammcjones newcastleupontyne haway
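As a rough illustration of how lists like these could be produced, the sketch below scores terms by TF-IDF, treating each region's pooled tweets as one document. The exact weighting used for the lists above is not given in this section, so the formula (relative term frequency times the log of inverse document frequency), the function name `top_local_terms` and the toy tokens are all assumptions.

```python
import math
from collections import Counter


def top_local_terms(region_tokens, k=100):
    """Return the k highest TF-IDF terms per region.
    Assumption: one 'document' per region, tf = relative frequency,
    idf = log(number of regions / regions containing the term)."""
    counts = {region: Counter(tokens) for region, tokens in region_tokens.items()}
    n_regions = len(counts)
    doc_freq = Counter()
    for term_counts in counts.values():
        doc_freq.update(term_counts.keys())
    result = {}
    for region, term_counts in counts.items():
        total = sum(term_counts.values())
        scores = {
            term: (n / total) * math.log(n_regions / doc_freq[term])
            for term, n in term_counts.items()
        }
        # Sort by descending score, breaking ties alphabetically.
        result[region] = sorted(scores, key=lambda t: (-scores[t], t))[:k]
    return result


# Toy example: terms used everywhere score zero, local terms rise to the top.
toy = {
    "London": ["brexit", "tooting", "tooting", "lunch"],
    "Cardiff": ["brexit", "iawn", "iawn", "iawn", "lunch"],
    "Leeds": ["brexit", "headingley", "lunch", "lunch"],
}
local = top_local_terms(toy, k=1)
```

Terms that occur in every region get an inverse document frequency of zero, which is why place names, local hashtags and Welsh-language words dominate such lists while common vocabulary drops out.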

Largest Magnitude Rank Difference Words

London Manchester Bristol Leeds Cardiff Norwich Birmingham Nottingham Newcastle
liverpool(613) brexit(694) labour(783) brexit(682) brexit(709) song(597) brexit(803) eu(787) country(567)
sleep(369) government(612) brexit(603) eu(594) eu(653) hell(533) disabled(663) brexit(696) mum(558)
trending(337) tory(572) nik(568) leaving(570) series(580) cannot(507) extremely(654) labour(568) bro(548)
ff(316) nhs(506) retweet(500) nhs(539) nhs(568) film(481) inclusion(654) nhs(442) vote(538)
topic(304) eu(493) eu(467) ref(520) labour(537) answer(461) wwfc(518) tour(437) sweet(508)
co(-396) wigan(-475) lunch(-596) pop(-497) ar(-564) students(-515) thursday(-491) dave(-509) nowt(-502)
brighton(-423) chester(-504) breakfast(-617) south(-504) coach(-570) latest(-517) pigeonswoop(-504) lincoln(-520) boro(-585)
lunch(-449) lunch(-515) swindon(-654) thursday(-512) photos(-582) involved(-522) villa(-557) steve(-530) students(-656)
awards(-472) gt(-564) cheltenham(-656) council(-564) project(-597) ncfc(-529) art(-692) council(-542) gateshead(-677)
greater(-818) mufc(-606) council(-716) bradford(-586) iawn(-852) lunch(-664) blues(-700) aswell(-881) safc(-778)
Table 5: Top and bottom five rank differences for all regions.
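A rank difference of this kind can be computed by ranking words by frequency in two corpora and subtracting the ranks. The sketch below is a guess at the procedure, not the authors' implementation: it assumes the two corpora are mentions within a region and mentions between regions, and that the difference is taken as within-region rank minus between-region rank, so that a positive value marks a word that is relatively more prominent between regions. Both the corpus definition and the sign convention are assumptions.

```python
from collections import Counter


def frequency_ranks(tokens):
    """Map each word to its frequency rank (1 = most frequent).
    Ties are broken alphabetically so the ranking is deterministic."""
    counts = Counter(tokens)
    ordered = sorted(counts, key=lambda w: (-counts[w], w))
    return {word: i + 1 for i, word in enumerate(ordered)}


def rank_differences(within_tokens, between_tokens):
    """Rank within the region minus rank between regions, for shared words."""
    within = frequency_ranks(within_tokens)
    between = frequency_ranks(between_tokens)
    shared = within.keys() & between.keys()
    return {w: within[w] - between[w] for w in shared}


# Toy corpora: "brexit" is common between regions, "lunch" within them.
within_tokens = ["lunch"] * 5 + ["council"] * 3 + ["brexit"]
between_tokens = ["brexit"] * 5 + ["lunch"] * 2 + ["council"]
diffs = rank_differences(within_tokens, between_tokens)
```

Under this convention the political terms at the top of Table 5 and the place names and everyday words at the bottom would correspond to between-region and within-region vocabulary respectively.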

Inter-Region Sentiment

From To Polarity
Bristol Bristol 0.164(1)
Bristol Cardiff 0.149(2)
Bristol London 0.154(1)
Bristol Manchester 0.140(1)
Bristol Nottingham 0.146(2)
Bristol Newcastle 0.150(3)
Bristol Birmingham 0.157(2)
Bristol Leeds 0.152(2)
Bristol Norwich 0.149(4)

Cardiff Bristol 0.152(2)
Cardiff Cardiff 0.156(1)
Cardiff London 0.161(1)
Cardiff Manchester 0.136(1)
Cardiff Nottingham 0.134(3)
Cardiff Newcastle 0.136(5)
Cardiff Birmingham 0.124(2)
Cardiff Leeds 0.150(3)
Cardiff Norwich 0.142(6)

London Bristol 0.148(1)
London Cardiff 0.143(1)
London London 0.151(0)
London Manchester 0.132(1)
London Nottingham 0.140(1)
London Newcastle 0.148(1)
London Birmingham 0.151(1)
London Leeds 0.146(1)
London Norwich 0.143(2)

Manchester Bristol 0.137(1)
Manchester Cardiff 0.152(2)
Manchester London 0.138(0)
Manchester Manchester 0.139(0)
Manchester Nottingham 0.133(2)
Manchester Newcastle 0.127(2)
Manchester Birmingham 0.138(1)
Manchester Leeds 0.137(1)
Manchester Norwich 0.140(3)

Nottingham Bristol 0.148(2)
Nottingham Cardiff 0.154(3)
Nottingham London 0.146(1)
Nottingham Manchester 0.135(1)
Nottingham Nottingham 0.141(1)
Nottingham Newcastle 0.143(3)
Nottingham Birmingham 0.162(1)
Nottingham Leeds 0.139(1)
Nottingham Norwich 0.146(4)

Newcastle Bristol 0.145(3)
Newcastle Cardiff 0.148(4)
Newcastle London 0.147(1)
Newcastle Manchester 0.133(2)
Newcastle Nottingham 0.141(3)
Newcastle Newcastle 0.130(1)
Newcastle Birmingham 0.148(3)
Newcastle Leeds 0.147(2)
Newcastle Norwich 0.126(7)

Birmingham Bristol 0.152(2)
Birmingham Cardiff 0.130(2)
Birmingham London 0.149(1)
Birmingham Manchester 0.132(1)
Birmingham Nottingham 0.153(2)
Birmingham Newcastle 0.129(3)
Birmingham Birmingham 0.152(1)
Birmingham Leeds 0.138(2)
Birmingham Norwich 0.134(4)

Leeds Bristol 0.119(2)
Leeds Cardiff 0.133(3)
Leeds London 0.132(1)
Leeds Manchester 0.125(1)
Leeds Nottingham 0.135(2)
Leeds Newcastle 0.137(2)
Leeds Birmingham 0.137(2)
Leeds Leeds 0.136(1)
Leeds Norwich 0.139(4)

Norwich Bristol 0.144(4)
Norwich Cardiff 0.164(6)
Norwich London 0.148(1)
Norwich Manchester 0.128(3)
Norwich Nottingham 0.153(5)
Norwich Newcastle 0.136(7)
Norwich Birmingham 0.158(4)
Norwich Leeds 0.140(4)
Norwich Norwich 0.164(1)
Table 6: Average polarity of mentions sent from the region in the first column to the region in the second column. The number in brackets is the error on the last digit.
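The bracket notation used in Table 6 can be reproduced with a small helper. The sketch below assumes that the quoted error is the standard error of the mean over the polarity scores of individual mentions; this section does not say how the error was estimated, so that choice, the function names and the sample scores are assumptions.

```python
import math


def mean_and_stderr(values):
    """Sample mean and standard error of the mean (assumed error estimate)."""
    n = len(values)
    mean = sum(values) / n
    if n < 2:
        return mean, float("nan")
    variance = sum((v - mean) ** 2 for v in values) / (n - 1)
    return mean, math.sqrt(variance / n)


def format_with_error(mean, err, decimals=3):
    """Write a value as e.g. 0.164(1): the bracket holds the error
    expressed in units of the last quoted digit."""
    last_digit_error = round(err * 10 ** decimals)
    return f"{mean:.{decimals}f}({last_digit_error})"


# Hypothetical per-mention polarity scores for one region pair.
scores = [0.1, 0.2, 0.3]
mean, err = mean_and_stderr(scores)
label = format_with_error(0.1642, 0.0011)
```

With this notation an entry such as 0.151(0) indicates an error that rounds to zero at the third decimal place, which is what one would expect for region pairs with very many mentions.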


  •  1. Ratti C, Sobolevsky S, Calabrese F, Andris C, Reades J, Martino M, Claxton R, Strogatz SH. Redrawing the Map of Great Britain from a Network of Human Interactions. PLoS ONE. 2010;5(12):e14248.
  •  2. Sobolevsky S, Szell M, Campari R, Couronne T, Smoreda Z, Ratti C. Delineating Geographical Regions with Networks of Human Interactions in an Extensive Set of Countries. PLoS ONE. 2013;8(12):e81707.
  •  3. Lengyel B, Varga A, Ságvári B, Jakobi Á, Kertész J. Geographies of an Online Social Network. PLoS ONE. 2015;10(9):e0137248
  •  4. Yin J, Soliman A, Yin D, Wang S. Depicting urban boundaries from a mobility network of spatial interactions: a case study of Great Britain with geo-located Twitter data. International Journal of Geographical Information Science. 2017;31(7):1293–1313
  •  5. Stephens M, Poorthuis A. Follow thy neighbor: Connecting the social and the spatial networks on Twitter. Computers, Environment and Urban Systems. 2014;53:87–95
  •  6. Takhteyev Y, Gruzd A, Wellman B. Geography of Twitter networks. Social Networks. 2012;34(1):73–81
  •  7. Blondel VD, Decuyper A, Krings G. A survey of results on mobile phone datasets analysis.

    EPJ Data Science. 2015;4(10)

  •  8. Cairncross F. The Death of Distance: How the Communications Revolution Is Changing our Lives. Boston: Harvard Business School Press. 1997.
  •  9. Paasi A. Region and Place: Regional Identity in Question. Progress in Human Geography. 2003-08;27:475–485.
  •  10. Paasi A. The resurgence of the ‘region’ and ‘regional identity’: theoretical perspectives and empirical observations on the regional dynamics in Europe. Review of International Studies. 2009;35(1):121–146.
  •  11. Backstrom L, Sun E, Marlow C. Find me if you can: improving geographical prediction with social and spatial proximity. WWW ’10 Proceedings of the 19th international conference on World wide web. 2010.
  •  12. Sadilek A, Kautz HA, Bigham JP. Finding your friends and following them to where you are. WSDM ’12 Proceedings of the fifth ACM international conference on Web search and data mining. 2012.
  •  13. Jurgens DJ. That’s What Friends Are For: Inferring Location in Online Social Media Platforms Based on Social Relationships. Proceedings of the AAAI International Conference on Weblogs and Social Media. 2013.
  •  14. Porta S, Strano E, Iacoviello V, Messora R, Latora V, Cardillo A, Wang F, Scellato S. Street Centrality and Densities of Retail and Services in Bologna, Italy. Environment and Planning B Planning and Design. 2009-10;36:450–465
  •  15. Louail T, Lenormand M, Cantu Ros OG, Picornell M, Herranz R, Frías-Martínez E, Ramasco JJ, Barthelemy M. From mobile phone data to the spatial structure of cities. Scientific reports. 2014.
  •  16. Batty M. Network geography: Relations, interactions, scaling and spatial processes in GIS. Re-presenting GIS. 2005.
  •  17. Arthur R, Willaims HTP. Scaling laws in geo-located Twitter data. arxiv:1711.09700
  •  18. Blondel VD, Guillaume JL, Lambiotte R, Lefebvre E. Fast unfolding of communities in large networks. Journal of Statistical Mechanics: Theory and Experiment. 2008;10:P10008.
  •  19. Michelson M, Macskassy SA. Discovering users’ topics of interest on twitter: a first look. Proceedings of the fourth workshop on Analytics for noisy unstructured text data. 2010.
  •  20. Weng J, Lee BS. Event Detection in Twitter. Fifth International AAAI Conference on Weblogs and Social Media. 2011.
  •  21. Atefeh F, Khreich W. A Survey of Techniques for Event Detection in Twitter. Comput. Intell. 2015;31(1):132–164.
  •  22. Yuan H, Guo D, Kasakoff A, Grieve J. Understanding U.S. regional linguistic variation with Twitter data analysis. Computers, Environment and Urban Systems. 2016;59;244–255.
  •  23. Blodgett SL, Green L, O’Connor BT. Demographic Dialectal Variation in Social Media: A Case Study of African-American English.

    Proceedings of the Conference on Empirical Methods in Natural Language Processing. 2016.

  •  24. Donoso G, Sanchez D. Dialectometric analysis of language variation in Twitter. VarDial. 2017.
  •  25. Tumasjan A, Sprenger TO, Sandner PG, Welpe IM. Predicting Elections with Twitter: What 140 Characters Reveal about Political Sentiment. Fourth International AAAI Conference on Weblogs and Social Media. 2010.
  •  26. Thelwall M, Buckley K, Paltoglou G. Sentiment in Twitter Events. J. Am. Soc. Inf. Sci. Technol. 2011-02;62(2):406–418.
  •  27. Mostafa MM. More than words: Social networks’ text mining for consumer brand sentiments. Expert Systems with Applications. 2013;40(10):4241–4251.
  •  28. Kiritchenko S, Zhu X, Mohammad SM. Sentiment Analysis of Short Informal Texts.

    Journal of Artificial Intelligence Research. 2014;50(10):723–762.

  •  29. Martínex-Cámara E, Martín-Valdivia MT, Ureña-López LA, Montejo-Ráez A. Sentiment analysis in Twitter. Natural Language Engineering. 2014;20(1):1–28.
  •  30. Giachanou A, Crestani F. Like It or Not: A Survey of Twitter Sentiment Analysis Methods. ACM Comput. Surv. 2016;49(2):1–28.
  •  31. De Smedt T, Daelemans W. Pattern for Python.

    Journal of Machine Learning Research. 2012;13:2031–2035.

  •  32. Loria S. Textblob: Simplified Text Processing. 2010. Accessed: 01-03-2018.
  •  33. Dodds PS, Clark EM, Desu S, Frank MR, Reagan AJ, Williams JR, Mitchell L, Harris KD, Kloumann IM, Bagrow JP, Megerdoomian K, McMahon MT, Tivnan BF, Danforth CM. Human language reveals a universal positivity bias. Proceedings of the National Academy of Sciences. 2015;112(8):2389–2394.
  •  34. Hecht B, Stephens M. A Tale of Cities: Urban Biases in Volunteered Geographic Information. Proceedings of the 8th International Conference on Weblogs and Social Media. 2014-01;14:197–205.
  •  35. Malik MM, Lamba H, Nakos C, Pfeffer J. Population Bias in Geotagged Tweets. Ninth International AAAI Conference on Web and Social Media. 2015.