A Deep Multi-View Learning Framework for City Event Extraction from Twitter Data Streams

05/28/2017 ∙ by Nazli Farajidavar, et al. ∙ 0

Cities have been a thriving place for citizens over the centuries due to their complex infrastructure. The emergence of Cyber-Physical-Social Systems (CPSS) and context-aware technologies has boosted interest in analysing, extracting and eventually understanding city events, which can subsequently be utilised to leverage citizens' observations of their cities. In this paper, we investigate the feasibility of using Twitter textual streams for extracting city events. We propose a hierarchical multi-view deep learning approach to contextualise citizen observations of various city systems and services. Our goal has been to build a flexible architecture that can learn representations useful for tasks, thus avoiding excessive task-specific feature engineering. We apply our approach to a real-world dataset consisting of event reports and tweets of over four months from the San Francisco Bay Area, and to additional datasets collected from London. The results of our evaluations show that our proposed solution outperforms the existing models and can be used for extracting city related events with an averaged accuracy of 81 classes. To further evaluate the impact of our Twitter event extraction model, we have used two sources of authorised reports, collecting road traffic disruption data from the Transport for London API and parsing the Time Out London website for sociocultural events. The analysis showed that 49.5 traffic comments are reported approximately five hours prior to the authorities' official records. Moreover, we discovered that amongst the scheduled sociocultural event topics; tweets reporting transportation, cultural and social events are 31.75 Twitter comments than sport, weather and crime topics.




1 Introduction

Recent advances in ubiquitous computing and context-aware technologies have boosted the interest in smart city framework designs. These frameworks endeavour to provide authorities and citizens with real-time information and assistance in the decision-making and resource allocation processes. Meanwhile, the departmental structure of a city can be very complex, and its management continues to be strained by various factors, such as the dynamic nature of its services, population growth and a continuously shrinking pool of available financial resources. Figure 1(a) illustrates some of the common departments that provide public support and management for London and their budget re-allocations within the past two years (footnote 1: https://www.gov.uk/government/publications/public-expenditure-statistical-analyses-2015).

Figure 1: (a) Common city departments (© [9]); (b) tweets reporting various concerns about a city, spanning power supply, water quality, traffic jams, and public transport delays (© [3]).

Some of the services offered by these departments are dynamic, e.g., transportation services, and their behaviour may vary in response to social and cultural events, accidents, and weather conditions. In this sense, understanding events occurring in cities is of great contemporary interest [33, 28, 23] to city authorities, who seek to enhance their management and to optimise operations and interactions among various city departments and services. A possible way to do this is by obtaining continuous feedback and event reports from citizens, who are the front-end users of these services.

Meanwhile, the emergence of social networks, such as Twitter (footnote 2: http://twitter.com/), Facebook (footnote 3: http://facebook.com/) and Instagram (footnote 4: http://www.instagram.com), offers enormous information that can be exploited for citizen sensing. This could be used to notify citizens as well as authorities regarding the events occurring in smart urban spaces (Figure 1(b) depicts samples of real-world city events reported directly by citizens on social media). However, the citizen sensing [38, 10] component that can provide complementary or corroborative information is often ignored in state-of-the-art analytics for smart cities [15].

In this article we propose a hybrid pipeline for real-time sensing in cities through the utilisation of complementary dynamic data sources, namely Twitter, London road disruption reports from traffic sensors, and Time Out London. The proposed data processing pipeline involves data wrappers, a novel Natural Language Processing (NLP) component based on multi-view learning, and multi-sensor correlation analysis. We presented a preliminary version of this pipeline in earlier work; in this article we focus further on the detailed theoretical design aspects of the model and include extended experiments to showcase its performance. The multi-view learning component combines the output of a Convolutional Neural Network (CNN) with a named entity event extraction to enable near real-time city-related event extraction from the short, informal text corpus of Twitter. Developing a scalable automatic city event annotation system, we show that our proposed solution achieves a performance boost compared to the state-of-the-art approaches [3, 36]. To the best of our knowledge, this is the first time that a multi-view deep learning algorithm has been proposed in the context of city event extraction. Subsequently, we conduct a similarity analysis on the processed data from social media, road sensors, and the Web of Data, and discover the associations between incidents in near real-time.

At a high level, the research contributions are four-fold and can be summarised as follows: i) automated real-time data collection wrappers for Twitter and city sensors; ii) a near real-time NLP component for classifying Twitter data; iii) a correlation analysis for detecting the dependencies between the Twitter stream, city sensors and web-derived data records; iv) a web interface for displaying and visualising the city's event highlights. The fine-grained contributions of the proposed NLP component are as follows: ii-i) real-time multi-label event extraction from Twitter; ii-ii) a novel multi-view deep learning formulation for event extraction based on graphical models; ii-iii) late fusion of classification results for enhanced event location extraction from tweets.

The paper is organised as follows. Section 2 describes the benchmark task of interest - Tweet annotation - where we discuss related works. In Section 3, we describe the proposed multi-view pipeline. Section 4 details our experimental setup and discusses the evaluation results. Finally, in Section 5 we derive a conclusion for our work and provide future directions.

2 Related work

Typically, a city has many departments such as public safety, urban planning, energy, water, transportation, social programs, and education [6, 7]. Live updates on the performance and quality of the services offered by these departments are important for city authorities to improve the management of city resources, and for citizens to make more informed decisions when using city services and to interact better with the surrounding environment. Meanwhile, social media networks such as Twitter offer near real-time communication platforms which can be utilised for this purpose. Such information can complement sensor data and textual reports collected from conventional sources or city departments, and it can help to enhance public services. For example, sensors deployed on a road may report a reduced speed of vehicles, which can be explained by a procession obstructing traffic that is reported on social media.

The design of such a platform, which utilises social media as a source for public sensing in the context of city-related event extraction, needs to address the following research questions: How can city infrastructure related events be extracted from Twitter? How can event and location knowledge-bases be exploited for event extraction? And finally, how accurately do these Twitter-extracted events match the reality of city events?

Studies such as [32, 3] assumed the presence of event data sources such as sensory data (e.g., loop detectors) and formal reports of events (e.g., eventful (footnote 5: http://eventful.com/)) in a city. While such a formal data source can serve as a reliable basis for training an automated event extraction system, such resources may not be available with short latency, or may not exist at all, in many cities. Therefore, we need alternative and complementary data sources for training such a model for different cities.

Event extraction from a textual corpus can be categorised into two groups according to the structure of the text: formal vs informal corpora. The former refers to grammatical text such as news documents, and the latter to user-generated content with no overt structure, which might contain a lot of slang and non-standard abbreviations and notations (as is the case in data obtained from Twitter).

In the formal text analysis domain, Liu et al. [29] proposed to alleviate information overload in daily news by extracting the key entities and significant events of news documents. A bipartite graph was induced in [4], based on the entities and their associations to documents, using a mutual reinforcement principle to capture salient entities; the documents with salient entities were then used to rank the news events. Extraction of local events from blog entries was carried out by [34]. The use of lightweight patterns to extract global crisis events from news text was presented in [39]. Event extraction in the context of detecting infectious disease outbreaks was achieved by [19], where the event schema consisted of date range, geo-location, disease name, organism type, number affected by the disease, and the organism survival information. The event extraction was then obtained by finite-state pattern matching on the tokenized input text. More recently, adding convolutional layers to the neural network language model of Bengio et al. [8], Collobert et al. [12] developed their convolutional neural network model that shared representations across the tasks of language modelling, part of speech tagging, chunking, named entity recognition, semantic role labelling, and syntactic parsing. Although the proposed model was not specifically designed for event extraction, its performance surpassed the state-of-the-art methods on the majority of the language modelling tasks.

Event extraction from informal text (which is our main focus in this paper due to the informal nature of Twitter textual content) is also addressed in the literature [5, 36, 3]. In [5], the authors used temporal (volume changes), social (replies, broadcast), topical (coherence of clusters), and Twitter-centric (multi-word hashtags) features to train a classifier that performed better than the baseline. Ritter et al. [36] solved the task in an unsupervised manner by building a calendar of significant events such as sports, concerts, protests, politics, TV, and religion. Their approach utilised the Latent Dirichlet Allocation (LDA) method to model each entity in terms of a mixture of event types and each event type in terms of a mixture of entities. Recent studies in [41, 44] utilised LDA for hit-and-run crime and traffic related event extraction, respectively. In [30], the authors used the latent topic model for the semantic role labelling task in Twitter data. A generalised linear regression model was also used to capture the association between topics and crimes from a training dataset. Lampos and Cristianini proposed to use an optimised feature selection approach with a regressor to estimate the intensity of environmental and epidemiological events based on event markers.

Under the same assumption as [32], Anantharam et al. [3] developed an automatic data annotation unit to obtain ground truth by using officially reported traffic events (footnote 6: http://511.org) and location (footnote 7: https://www.openstreetmap.org/) knowledge-bases. The authors then used this annotated data to train a CRF-based event extraction model to capture long-term word dependencies for Twitter analysis. While their proposed approach for the preparation of the ground-truth data has shown a good word-tagging performance, the proposed CRF-based event extraction had some limitations. The model was designed to extract only traffic events. More precisely, since the automatic annotation unit was trained with the officially reported ground-truth traffic events of a limited time period, the model performed poorly in the prediction of future incidents, specifically those reported by new users. Besides, although the location terms were extracted, they were not utilised to associate locations with extracted events. Instead, the authors assumed that the tweet's geo-location tag (the location from which the user tweets about the event) can serve as the event location, which is not always valid.

In the multi-view learning literature, Chen et al. [11] developed a statistical framework that learns a predictive subspace shared by multiple views based on a generic multi-view latent space Markov network. Kumar et al. [25] co-trained unsupervised learning models and proposed a spectral clustering algorithm for multi-view data. Quadrianto and Lampert studied the metric learning problem in cross-media retrieval tasks, with the aim of learning metrics with which the original multi-view higher dimensional features can be projected into a shared feature space, so that the Euclidean distance in this space is meaningful not only within a single view, but also among different views. In our multi-view learning model we used the Restricted Boltzmann Machine formulation to be consistent with the rest of the neural network architecture of CNNs.

While in all proposed platforms for event extraction from Twitter the main focus has been on training an NLP model using the informal text corpus of tweets, human learning does not work this way. As humans, we initially learn the semantic meaning of the words in a language. We are then taught the syntactic structure (grammar rules) of sentences at school by means of a formal corpus (i.e. books). Analogously, NLP approaches which jointly attempt to accomplish PoS and NER tagging using only the informal Twitter corpus will not extend well to future data, due to this intrinsic limitation. Taking this into consideration, we have proposed an NLP framework for informal text classification which is not only applicable to future data but also addresses the limitations of the other state-of-the-art approaches. We utilised the CRF-based Named Entity Recognition (NER) model of [3] and, extending it beyond traffic event extraction, proposed a multi-view learning pipeline which fuses the CRF output with the part of speech (PoS) tags extracted from the Convolutional Neural Network (CNN) model of [12], to improve city event extraction.

Utilising a CNN model trained on formal texts for PoS tagging of tweet words is plausible, since the underlying syntactic roles of words in a language remain valid even in informal texts such as Twitter corpora, despite their variation in grammatical sentence structure. In terms of CRF training, unlike the model of Anantharam et al. [3], our proposed model is trained on more generic categorical data and is capable of detecting a wider categorical range of city events. This allows the model to better generalise to future events and incidents. While various neural network architectures [13, 18, 40] have been proposed in the literature and their performance investigated for Twitter sentiment classification, to the best of our knowledge this is the first time that CNN text analysis has been utilised for city event extraction from informal text and its result integrated with a CRF NER tagger in a deep multi-view learning framework to obtain enhanced sentence-level inference and event extraction. To further validate the extracted events, we have parsed data from the London Traffic API and Time Out London sociocultural resources and evaluated the veracity of the Twitter-extracted events through a graph-based similarity analysis.

3 Methodology

Our proposed hybrid approach is based on undirected graphical models. Figure 2 depicts the diagram of the proposed hybrid approach. We developed three data wrappers to collect data from the city: the Twitter stream API (footnote 8: https://dev.twitter.com/streaming), the Transport for London API (footnote 9: http://data.tfl.gov.uk/tfl), and a Time Out London (footnote 10: http://www.timeout.com/london) parser. Furthermore, we developed a data processing component that consists of two main parts: i) Natural Language Processing (NLP) on Twitter data streams and ii) similarity analysis on Twitter, road sensor data, and scheduled events collected from the Time Out London website. We used the Google Translate API to automatically detect the source language of non-English tweets and translate them into English to facilitate the text analysis step.

Figure 2: The proposed hybrid pipeline

3.1 Twitter NLP Component

Figure 3 shows the data processing units of the proposed NLP component which is composed of three sub-components: a semantic embedding subspace learning, a syntactic embedding subspace learning, and a multi-view event extraction.

Figure 3: NLP component: detailed event detection pipeline. Note that the semantic embedding view is modelled with a CRF and the syntactic embedding view is modelled through CNN.

Given a tweet text represented by $x$, we are interested in associating it with one or multiple city-related event classes from the events set $Y$ = {TransportationEvent, WeatherEvent, CulturalEvent, SocialEvent, SportEvent, FoodEvent, CriminalEvent}, along with a Location tag. To assign event tags to tweets, we have assumed that each tweet contains only one sentence. Considering the 140-character limit of a tweet, this assumption is plausible. We then decompose sentences into semantic and syntactic embeddings, where the former deals with the meaning of the words in the sentence and the latter addresses its grammatical structure.

The fusion of these embeddings is used to provide explicit insight into the meaning of sentences and so facilitate their classification. This fusion can be formulated as a multi-view learning task where each embedding contributes a distinct view of the same training data. Although baseline methods such as the one proposed by [3] have shown an acceptable performance on time- and location-dependent annotation tasks, they do not generalise well to annotation tasks with varying locations and times. To address this generalisation issue, we have estimated the semantic and syntactic embedding matrices offline and independently, using more comprehensive data.

Inspired by human cognitive ability, we believe that a Part of Speech (PoS) word tagging approach which has been trained on an encyclopedia corpus can help in extracting a more realistic syntactic embedding of the tweet. In practice, this can resemble a human's general grammar knowledge. To do so, we have adopted the CNN model proposed in [12], which had been trained on the entire English Wikipedia.

To align the formulation of the semantic embedding extraction with the CNN-based syntactic embedding, and to capture the long-term dependencies in noun phrases, we have chosen the Conditional Random Field (CRF) formulation of undirected graphical models for Named Entity Recognition (NER). To do so, we have used phrases, short reports and location terms extracted from official websites and authority reports (listed in Table 1) to build class-conditional corpora. These conditional corpora are then used to train CRF models for named entity recognition.

Event Class     | Vocabulary Source
Crime           | http://www.shouselaw.com/crimes-a-z.html
Cultural event  | http://en.wikipedia.org/wiki/Category:Cultural_events
Food            | http://www.foodterms.com/encyclopedia
Location        | https://www.openstreetmap.org
Social event    | http://en.wikipedia.org/wiki/Category:Social_events
Sport           | sport dictionaries of [36]
Weather         | http://www.erh.noaa.gov/er/box/glossary.htm
Transportation  | http://511.org
Table 1: City-related event classes and their corresponding sample tweets

To fuse the information gained from these two embeddings, we have proposed a multi-view learning approach. In order to be consistent with the rest of the architecture, we have chosen a supervised undirected graphical model, the Restricted Boltzmann Machine (RBM). In practice, this formulation uses the tags obtained from the two previous embeddings to mutually validate and score them for a final sentence-level inference (footnote 11: Note that retraining the last supervised learning layer of a deep architecture is a common practice in deep learning).

An example of sentence level inference is in the case of tweets such as “seeing someone being given a parking ticket” where individual words “parking” and “ticket” can belong to classes Transportation and Cultural events respectively while considering these words’ grammar roles can resolve this confusion.

The output of the system can be represented as a tuple $(e, l, g, t, s)$, where $e$ and $l$ represent the event type and location extracted by the proposed NLP analysis framework, $g$ and $t$ are the tweet's geo-location and time of report (metadata obtained from the Twitter Streaming API), and finally $s$ denotes the event impact. The event impact score is calculated as the product of the event severity and event likelihood scores, as in [42].
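The output record described above can be sketched as a small data structure. This is only an illustration: the class name `CityEvent`, its field names, and the example values are assumptions made here, and the severity and likelihood inputs are taken as given rather than computed as in [42].

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class CityEvent:
    """Hypothetical container for one annotated tweet (field names are assumptions)."""
    event_types: Tuple[str, ...]          # one or more event classes (multi-label)
    location: Optional[str]               # location term extracted from the text
    geo: Optional[Tuple[float, float]]    # tweet geo-location metadata (lat, lon)
    timestamp: str                        # time of report from the stream metadata
    severity: float                       # event severity score
    likelihood: float                     # event likelihood score

    @property
    def impact(self) -> float:
        # Impact is the product of severity and likelihood.
        return self.severity * self.likelihood


event = CityEvent(
    event_types=("TransportationEvent",),
    location="Piccadilly",
    geo=(51.51, -0.13),
    timestamp="2017-05-28T10:00:00Z",
    severity=0.8,
    likelihood=0.5,
)
```

Keeping the extracted location separate from the geo-location tag reflects the point made earlier that the place a tweet is sent from is not always the place of the event.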

3.1.1 CRF Name Entity Tagging

The CRF is an undirected graphical model [24] containing nodes that correspond to the tags $t_1, \dots, t_n$ and the words $w_1, \dots, w_n$ of a sequence. The model defines factors between (a) neighbouring tags, $\phi(t_i, t_{i+1})$, and (b) tags and words in a sequence, $\phi(t_i, w_i)$. A factor function maps all possible combinations of values of its input variables to real numbers (also known as the potential of the input variable combination) and can be formulated as $\phi(t_i, t_{i+1}) : \mathrm{Val}(t_i, t_{i+1}) \to \mathbb{R}$, where, e.g., $\phi(t_i, t_{i+1})$ captures the number of times $t_i$ appears before $t_{i+1}$ in a text. Concretely, if $t_i$ is B-Location, representing the beginning location term, and $t_{i+1}$ is I-Location, representing the intermediate location term, $\phi(t_i, t_{i+1})$ maps to the number of times this sequence appears in the corpus, which may not be a normalised value. The factor for each word, $\phi(t_i, w_i)$ (where $w_i$ is always observed), captures the number of times the word $w_i$ was labelled with the tag $t_i$. For example, if $w_i$ is the word "Piccadilly" and $t_i$ is B-Location, then $\phi(t_i, w_i)$ captures the number of times the word "Piccadilly" was labelled with the tag B-Location in the corpus.

More specifically, if there are $n$ words in a tweet sequence, we need $n-1$ factors to define the relations between neighbouring tags and $n$ factors to define the relations between tags and words. Finding the most likely tag assignment for the words in a tweet can be formalised as maximising the probability $P(t_1, \dots, t_n \mid w_1, \dots, w_n)$, as shown in Table 2 (a).

Essentially, the tag assignment resulting in the highest probability score is chosen as the final tag assignment for all the words. Even though the model captures the relation between adjacent tags, tag assignment is done based on the global maximum i.e., tags that result in highest overall score are assigned to all the words. Such a global assignment of tags naturally captures long distance dependencies in text.
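The global tag assignment described above can be illustrated with Viterbi decoding over count-based factors. This is a minimal sketch and not the LingPipe implementation used in the pipeline: the function name `viterbi`, the add-one smoothing, and the toy counts are assumptions made for illustration.

```python
from math import log


def viterbi(words, tags, transition, emission, smoothing=1.0):
    """Return the tag sequence with the highest overall score.

    transition[(prev_tag, tag)] and emission[(tag, word)] hold raw counts,
    used as unnormalised factors; smoothing avoids log(0) for unseen pairs.
    """
    if not words:
        return []

    def t_score(prev, cur):
        return log(transition.get((prev, cur), 0) + smoothing)

    def e_score(tag, word):
        return log(emission.get((tag, word), 0) + smoothing)

    # best[i][tag] = (score of the best path ending in tag at position i, previous tag)
    best = [{tag: (e_score(tag, words[0]), None) for tag in tags}]
    for word in words[1:]:
        column = {}
        for cur in tags:
            score, prev = max((best[-1][p][0] + t_score(p, cur), p) for p in tags)
            column[cur] = (score + e_score(cur, word), prev)
        best.append(column)

    # Pick the globally best final tag, then follow the back-pointers.
    tag = max(tags, key=lambda t: best[-1][t][0])
    path = [tag]
    for column in reversed(best[1:]):
        tag = column[tag][1]
        path.append(tag)
    return path[::-1]


tags = ["O", "B-Location", "I-Location"]
transition = {("B-Location", "I-Location"): 5, ("O", "B-Location"): 3, ("O", "O"): 10}
emission = {("B-Location", "Piccadilly"): 4, ("I-Location", "Circus"): 4, ("O", "near"): 6}
decoded = viterbi(["near", "Piccadilly", "Circus"], tags, transition, emission)
```

Because the final tag is chosen from complete path scores, an early word can be tagged differently from its locally best tag when that yields a higher overall score.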

The location and event tagging module uses the linear chain CRF model presented in Table 2 (b), which is implemented in LingPipe [2]. In a linear chain CRF model, each tag type and its positions in a corpus are extracted using a feature extractor function $f$ which takes the position and the tags as input. The first word in the sequence will have "null" as the previous tag. For the rest of the words in the input sequence, the feature function is invoked with all possible tags. $\beta_1, \dots, \beta_K$ are the coefficient vectors learned for each output tag in the tag set, where $K$ is the number of tags from the corpus. The corresponding scores for tag assignment given the words are provided by a regression model and are not normalised. To get the probability of a tag assignment, these scores need to be normalised by summation over all possible tags, as shown in Table 2 (b). Though the features are extracted locally using the function $f$, the global normalisation captures long distance relationships in the word sequences.

(a) (b)
Table 2: Formalisation of sequence labelling task (a) a generic Conditional Random Field (CRF), (b) LingPipe CRF implementation which we used in our pipeline.
Training the CRF Model

The objective is to spot event and location terms in tweets. Identifying locations in a tweet is challenging as location references in the text are hard to recognise especially in the presence of non-standard abbreviations, spellings, and capitalisation convention. To address these challenges, we train the sequence model with the knowledge of locations from Open Street Maps (OSM) [20].

On the other hand, identifying event terms is even more challenging, especially given the open-domain nature of city related events. To address this issue, background knowledge consisting of domain dictionaries is obtained from event reports on different web pages (see Table 1), e.g. sport, weather and locations are such categories of events. The CRF is trained on short reports from these event categories and then applied to our data for named entity recognition of event terms. The result of this step (shown in Fig. 3) forms the semantic embedding view and will be denoted by $y$. This embedding can also be considered as a naive projection (embedding) of the output label space, $Y$.

3.1.2 CNN Word Tagging

The CNN model takes the input sentence and learns several layers of feature extraction that process the input tweets. The features computed by the deep layers of the network are automatically trained by back-propagation. Fig. 4 depicts the CNN network architecture.

Figure 4: Convolutional Neural Network architecture, source: [12]
Word-Level Feature Extraction

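The CNN Word Tagging unit considers a fixed-sized word dictionary $D$ (footnote 12: Unknown words are mapped to a special unknown word. Number references are also mapped to a "number" word). Given a sentence of $N$ words $w_1, \dots, w_N$, where $w_n \in D$, each word is first embedded into a $d$-dimensional vector space, where the index $w_n$ is taken from a finite dictionary of size $|D|$, by applying a look-up table operation:

$$LT_W(w_n) = W \, (0, \dots, 0, 1, 0, \dots, 0)^{\top} = W_{w_n} \qquad (1)$$

where the single 1 is at index $w_n$. The matrix $W \in \mathbb{R}^{d \times |D|}$ represents the parameters to be trained in this look-up layer. Each column $W_{w_n} \in \mathbb{R}^{d}$ corresponds to the embedding of the $w_n$-th word in the dictionary $D$.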

Having in mind the matrix-vector notation in Eq. 1, the look-up table applied over the sentence can be seen as an efficient implementation of a convolution with a kernel width of size one. The parameters of the look-up table are thus initialised randomly and trained as any other neural network layer. These representations have been trained on the English Wikipedia corpus (footnote 13: Available for download at http://download.wikipedia.org) after using the Penn Treebank tokenizer (footnote 14: Available at http://www.cis.upenn.edu/ treebank/tokenization.html.) and after removing all paragraphs containing non-roman characters and all MediaWiki markup. The extracted features contain syntactic and semantic information which appears to be useful for inference.
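The equivalence between the look-up table and a matrix-vector product with a one-hot vector can be checked with a few lines of plain Python. This is a toy sketch: the helper names, the three-word vocabulary, and the embedding values are assumptions for illustration, not the trained Wikipedia embeddings.

```python
def one_hot(index, size):
    """One-hot column vector with a single 1 at the given index."""
    return [1.0 if i == index else 0.0 for i in range(size)]


def matvec(matrix, vector):
    """Plain matrix-vector product; matrix is a list of rows."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]


def lookup(matrix, index):
    """Look-up table operation: select one column of the embedding matrix."""
    return [row[index] for row in matrix]


def embed_sentence(words, vocab, matrix, unknown="<unk>"):
    """Embed each word; out-of-dictionary words share the unknown-word column."""
    return [lookup(matrix, vocab.get(word, vocab[unknown])) for word in words]


# Toy embedding matrix with d = 2 rows and |D| = 3 columns (values are made up).
W = [[0.1, 0.2, 0.3],
     [0.4, 0.5, 0.6]]
vocab = {"<unk>": 0, "traffic": 1, "jam": 2}

by_lookup = lookup(W, vocab["traffic"])
by_product = matvec(W, one_hot(vocab["traffic"], 3))
sentence = embed_sentence(["traffic", "delay"], vocab, W)
```

Selecting a column avoids multiplying by the zeros of the one-hot vector, which is why the look-up form is the efficient one in practice.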

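In practice, it is common that one wants to represent a word with more than one feature. In such a scenario, both the lower-cased word and a "caps" feature can be used, $w_n = (\mathrm{lowcaps}(w_n), \mathrm{caps}(w_n))$. To obtain this, one needs to apply a different look-up table for each discrete feature ($LT_{W^{words}}$ and $LT_{W^{caps}}$), and the final word embedding is formed by concatenating the outputs of all these look-up tables:

$$LT_{W^{words}, W^{caps}}(w_n) = \big( LT_{W^{words}}(\mathrm{lowcaps}(w_n)),\; LT_{W^{caps}}(\mathrm{caps}(w_n)) \big)^{\top} \qquad (2)$$

For simplicity, we followed the suggestion of [12] and considered only one look-up table.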

Sentence-Level Representation

Scores for all tags and all words in the sentence are produced by applying a classical Convolutional Neural Network over the look-up table embeddings obtained from Eq. 1. More precisely, all successive windows of text (of size $k$) are considered by sliding over the sentence, from position $1$ to $N$. At position $n$, the neural network of the structural inference step is given the vector $x_n$ resulting from the concatenation of the embeddings:

$$x_n = \big( W_{w_{n-(k-1)/2}}^{\top}, \dots, W_{w_{n+(k-1)/2}}^{\top} \big)^{\top} \qquad (3)$$

The words with indices exceeding the sentence boundaries ($n-(k-1)/2 < 1$ or $n+(k-1)/2 > N$) are mapped to a special padding word. As any classical neural network, the architecture proposed by Collobert et al. performs several matrix-vector operations on its inputs, interleaved with a non-linear transfer function $h(\cdot)$. It outputs a vector of size $|\mathcal{T}|$ for each word at position $n$, interpreted as a score for each tag in the tag set $\mathcal{T}$ and each word $w_n$ in the sentence:

$$s(x_n) = M_2 \, h(M_1 x_n) \qquad (4)$$

where $H$ denotes the number of hidden units and the matrices $M_1 \in \mathbb{R}^{H \times kd}$ and $M_2 \in \mathbb{R}^{|\mathcal{T}| \times H}$ are the parameters to be trained in the network. The "hard" version of the hyperbolic tangent function is utilised as the transfer function:

$$h(a) = \begin{cases} -1 & \text{if } a < -1 \\ a & \text{if } -1 \le a \le 1 \\ 1 & \text{if } a > 1 \end{cases} \qquad (5)$$

Fine details of the adopted CNN architecture are explained in [12].
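The window-based scoring step can be sketched in plain Python as follows. This is a simplified illustration of the window approach, not the architecture of [12] itself: the function names, the zero padding vector, and the tiny weight matrices are assumptions, and an odd window size $k$ is assumed.

```python
def hard_tanh(a):
    """Hard hyperbolic tangent: clip the input to the interval [-1, 1]."""
    return max(-1.0, min(1.0, a))


def matvec(matrix, vector):
    """Plain matrix-vector product; matrix is a list of rows."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]


def windows(embeddings, k, padding):
    """Concatenate the k embeddings centred on each position (k assumed odd).

    Positions outside the sentence are filled with the padding embedding.
    """
    half = (k - 1) // 2
    padded = [padding] * half + embeddings + [padding] * half
    return [
        [value for vector in padded[n:n + k] for value in vector]
        for n in range(len(embeddings))
    ]


def tag_scores(x, m1, m2):
    """Score every tag for one window: a linear layer, hard tanh, a linear layer."""
    hidden = [hard_tanh(a) for a in matvec(m1, x)]
    return matvec(m2, hidden)


# Toy sentence of three one-dimensional word embeddings and a window of size 3.
embeddings = [[1.0], [2.0], [3.0]]
contexts = windows(embeddings, k=3, padding=[0.0])

m1 = [[1.0, 0.0, 0.0], [0.0, 0.0, 1.0]]   # H = 2 hidden units, k * d = 3 inputs
m2 = [[1.0, 1.0], [1.0, -1.0]]            # 2 tags, H = 2 hidden units
scores = tag_scores(contexts[1], m1, m2)
```

Each position yields one score per tag, so a sentence of $N$ words produces $N$ score vectors regardless of the window size.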

3.1.3 Multi-view Learning for tweet Annotation

The dictionary-based NER approaches explained in the previous sections are beneficial when a text (i.e. a tweet) contains event terms that have previously been seen by the model among the predefined general English words.

Given a sentence, these methods extract event terms by searching for word sequences that match the lexical entries, and create a token graph according to the word order. The next step is to estimate the score of every path using the weights of node and edges estimated by training CRF (or CNN) and selecting the best path in a joint learning model.

While combining the two proposed NER tagging approaches can lead to a performance enhancement, when term ambiguity and variability are very high, specifically in the case of tweets with their short-sentence nature, dictionary-based Named Entity Recognition (NER) may not be an ideal solution, even though large-scale terminological resources are available [37].

A common solution to enhance the performance would be the addition of named entities to a Named Entity dictionary. However, in the case of multi-class annotation this might increase the risk of class confusions. Moreover, retraining of NER models is required to guarantee achieving task specific class labels.

The consensus principle of multi-view learning, as a joint learning model, aims to maximise the agreement between multiple distinct views. Suppose the available Twitter data sample has two views: the semantic view, $y$, which is obtained from CRF+CNN NER word tagging, and the syntactic view, $x$, which is derived from CNN PoS tagging. An example is therefore viewed as $(x, y, c)$, where $c$ is the final label assigned to the sample.

While the PoS tagging output of the CNN model sheds light on the grammatical structure of the text (i.e. the tweet) and possibly facilitates the global inference of the tweet's meaning, its location and organisation named entity recognition output can be utilised for boosting the Location named entity recognition of the CRF model.

Since retraining the last supervised fully connected layer of a convolutional neural network to adapt the learning to a new task is a common practice in deep learning, we adopted the Restricted Boltzmann Machine (RBM) [16] formulation to perform the multi-view learning with the aim of event classification.

The RBM is a Markov Random Field associated with a bipartite undirected graph. In the Bernoulli RBM, our focus in this work, the visible and hidden variables are assumed to take values in $[0, 1]$. Each value encodes the probability that the specific feature would be active. Perceiving the RBM as an energy model [1, 17], the RBM feature learning encodes an input vector $v$ using a vector of latent variables $h$. Therefore each column $w_k$ of the weight matrix $W$ can be viewed as a filter which corresponds to the network's hidden variable $h_k = \sigma(w_k^{\top} v)$, in which $\sigma$ is a non-linearity, such as the sigmoid, $\sigma(a) = 1 / (1 + e^{-a})$. The weight parameters are then estimated by maximising the likelihood of the observations via Gibbs sampling [21], based on a set of training examples and the activity of the hidden units. The model is defined as the sum over the filter responses:

$$h = \sigma\big( W^{\top} ( U^{\top} v \odot U^{\top} v ) \big) \qquad (6)$$

where $\odot$ is the element-wise product and the columns of $U$ contain subspace projection filters that are learned along with $W$ from data.

Figure 5: Modelling the multi-view learning through an RBM energy model applied to the concatenation of the semantic and syntactic embeddings of a sentence.

When an energy model is applied to a concatenation of two views of the data, the response obtained is closely related to that of a multi-view sparse coding model. Inspired by Memisevic’s [31] multi-view image correlation model, we formulate our view fusion problem by defining a compatibility energy function that encodes the relationship between the two views, as shown in Fig. 5. We have modelled the visible layer of the RBM as the concatenation of two embeddings: one part of the filter in Eq. 6 is applied to the input sentence to extract its syntactic embedding, and the other part extracts the semantic embedding of the sentence, which can be perceived as a projection of the label space Y. The parameter matrix in this formulation has a size determined by the dimensionalities of the two view embeddings and the cardinality of the label set.

Substituting in Eq. 6 with a concatenation of view projection matrices and and with , the hidden unit activities in the multi-view feature learning scenario, take the form:


In this formulation, the quadratic terms in Eq. 7 are view-specific optimisation problems which have already been solved through the prior CNN and CRF training steps.

Having an estimate of the subspace projection matrices, we now just need to learn the weight matrix from in-domain data (tweet instances). Given that the activity of the second layer in the proposed architecture (see Fig. 5) is the concatenation of the projected views, the last layer can be trained by simply maximising the log-likelihood over the training set. Given a sentence, the network, with its energy function and parameter set, computes a score for each event label. In order to transform these scores into a conditional probability distribution of labels given the sentence and the set of network parameters, one can apply a softmax operation to the scores:


Taking the logarithm of both sides of Eq. 8:


One can then use gradient descent to minimise the negative log-probability with respect to the network parameters. The back-propagation algorithm is a natural choice for efficiently computing the gradients of the network architecture, as stated in [12].
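The softmax and negative log-likelihood steps described here can be sketched numerically. This is a minimal illustration that assumes a single linear scoring layer on top of the concatenated view embedding; the names `W`, `features`, `nll_and_grad` and `train_step`, as well as the learning rate, are illustrative assumptions and not taken from the paper.

```python
import numpy as np


def softmax(scores):
    """Turn raw label scores into a probability distribution."""
    shifted = scores - np.max(scores)  # subtract the max for numerical stability
    exp_scores = np.exp(shifted)
    return exp_scores / exp_scores.sum()


def nll_and_grad(W, features, label):
    """Negative log-likelihood of `label` and its gradient with respect to W.

    W        : (n_labels, n_features) weights of the last layer (assumed shape)
    features : (n_features,) concatenated view embedding of one sentence
    label    : index of the ground-truth event class
    """
    probs = softmax(W @ features)
    loss = -np.log(probs[label])
    # Gradient of the loss with respect to the scores: probs - one_hot(label)
    delta = probs.copy()
    delta[label] -= 1.0
    grad = np.outer(delta, features)
    return loss, grad


def train_step(W, features, label, learning_rate=0.1):
    """One plain gradient-descent update on a single training example."""
    loss, grad = nll_and_grad(W, features, label)
    return W - learning_rate * grad, loss
```

In a deeper network the same `delta` term would be propagated backwards through the earlier layers, which is the role of back-propagation.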

3.2 Tweet Impact Estimation

Many factors contribute to the reliability of an event extracted from Twitter data: there might be multiple references to the same event, and the event’s extracted location might differ from the location where the tweet was published.

To capture this, we define the tweet impact factor as the product of event severity and event likelihood scores following [42]:


where the event severity score is calculated following the spatio-temporal event grouping approach of [3], referred to as Thematic Coherence. The Thematic Coherence approach considers events with similar entities, reported within the same grid cell (out of the set of all grid cells in a city) and the same time window, as multiple references to the same event, and reports the severity score as the total number of events meeting these criteria.

In our evaluations, we have fixed the time window to five minutes. Unlike [3], who used the tweet’s geo-tag for computing the thematic coherence, we have used the extracted event location where available (the output of the multi-view tweet annotation described in section 3.1.3), along with the predicted event type, for grouping and computing the event severity scores.
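The grouping behind the severity score can be sketched as follows. This is a simplified illustration, not the authors' implementation: it assumes square grid cells of a hypothetical size `cell_size_deg`, fixed (non-overlapping) five-minute windows, and uses the predicted event type in place of the full entity-similarity test.

```python
import math
from collections import Counter

WINDOW_SECONDS = 5 * 60  # five-minute window, as stated in the text


def severity_scores(events, cell_size_deg=0.01):
    """Return, for each event, the number of events sharing its group.

    events: list of dicts with keys 'lat', 'lon', 'timestamp' (seconds)
            and 'event_type'. Two events fall in the same group when they
            share a grid cell, a time window and an event type.
    """
    def group_key(event):
        cell = (math.floor(event["lat"] / cell_size_deg),
                math.floor(event["lon"] / cell_size_deg))
        window = int(event["timestamp"] // WINDOW_SECONDS)
        return cell, window, event["event_type"]

    counts = Counter(group_key(e) for e in events)
    return [counts[group_key(e)] for e in events]


# Hypothetical events: the first two are treated as references to one event.
events = [
    {"lat": 51.515, "lon": -0.105, "timestamp": 10, "event_type": "Transport"},
    {"lat": 51.516, "lon": -0.106, "timestamp": 200, "event_type": "Transport"},
    {"lat": 51.515, "lon": -0.105, "timestamp": 400, "event_type": "Transport"},
    {"lat": 51.515, "lon": -0.105, "timestamp": 20, "event_type": "Crime"},
]
scores = severity_scores(events)
```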

We have also formulated the event likelihood score computation as follows:


where the distance function measures the Vincenty distance [22] between two geolocation coordinates. The coordinate pairs correspond to the city-specific centroid and bounding-box information, respectively, which are estimated using Flickr’s Geo API Explorer (https://www.flickr.com/places/info/44418) and set accordingly for the city of London. In practice, the event likelihood score assigns more impact to events that are reported closer to the city centre.

3.3 Similarity Analysis Graph Representation

For the similarity analysis, we narrowed our focus to the event classes for which authority event records were accessible, namely the transport and traffic reports and the scheduled sociocultural records. To do this, we collected officially registered traffic reports from the London open data store portal and parsed the Time Out London web page to obtain a list of scheduled sociocultural events taking place in London, along with their timestamps and locations.

Two graph structures have been considered to represent the spatial distribution of the traffic and sociocultural records: a traffic record graph and a sociocultural record graph, each with as many nodes as there are records in the corresponding set. The edge values in these graphs are the pair-wise Euclidean distances between the nodes, which are located in 3D coordinates and are described by feature vectors. The first three variables of each feature vector represent the spatial coordinates of a point after polar-to-Cartesian conversion, and the remaining two are the event timestamp and event type, respectively. We employed these two graphs to detect the nearest node to each of the automatically annotated Twitter events. Moreover, we have taken into account the spatio-spectral topography of the city of London presented in Fig. 6.
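The nearest-node lookup can be illustrated with a short sketch. It assumes a spherical Earth of radius 6371 km for the polar-to-Cartesian conversion and ignores the timestamp and event-type components of the feature vectors; the function names and the sample coordinates are hypothetical.

```python
import math

EARTH_RADIUS_KM = 6371.0  # spherical-Earth assumption


def to_cartesian(lat_deg, lon_deg, radius=EARTH_RADIUS_KM):
    """Convert (latitude, longitude) in degrees to 3D Cartesian coordinates."""
    lat, lon = math.radians(lat_deg), math.radians(lon_deg)
    return (radius * math.cos(lat) * math.cos(lon),
            radius * math.cos(lat) * math.sin(lon),
            radius * math.sin(lat))


def nearest_record(tweet_latlon, records):
    """Return (index, distance_km) of the authority record closest to a tweet.

    records: non-empty list of (lat, lon) pairs for the authority records.
    """
    point = to_cartesian(*tweet_latlon)
    distances = [math.dist(point, to_cartesian(*rec)) for rec in records]
    best = min(range(len(records)), key=distances.__getitem__)
    return best, distances[best]


# Hypothetical authority records and one annotated tweet location.
records = [(51.507, -0.127), (51.60, 0.10)]
index, distance_km = nearest_record((51.51, -0.12), records)
```

A brute-force scan like this is linear in the number of records per tweet; a spatial index would be a natural replacement for larger record sets.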

Figure 6: Colour-coded map of the London zones showing the spectral topography of the city. Note that the use of colours on the map is only to depict the spectral distribution of the city locations.

As one can note, the spectral structure of the city implies a spectral weighting of the computed distances: the farther we move away from the city centre, the less prominent a given distance becomes. Taking this into account, given seven target graphs, each representing the Twitter events of one class, and simplifying the location vector of each node, we formulated the graph dissimilarities with respect to the authority graphs (e.g. traffic) as follows:


where the distance term is the Euclidean distance between two points, the point superscripts indicate graph membership, and the normalising count is the number of tweets classified as a given event type. The reference point is the Cartesian conversion of the city-centre geo-coordinates (following Flickr, (-0.127, 51.507) is taken as the (longitude, latitude) pair describing the city centre). In practice, the value of the weighting parameter imposes a higher weight on close-to-centre events than on off-centre events.

4 Experimental Setup and Results

Our experimental objective is to evaluate the performance of the proposed framework and its extensibility to tweet classification where the data is collected from new locations and at different times with respect to the training data. To showcase this, we conducted experiments on textual Twitter data collected from two geographically different locations: the San Francisco Bay Area and London.

Our objective in the following evaluations is three-fold: i) to quantify the extent to which our framework can extract city events from Twitter, comparing our approach with the state-of-the-art baselines [3, 36] on San Francisco data; ii) to evaluate the performance boost of the proposed MV-RBM approach for sentence inference, rather than just word tagging, by testing the model on a locally collected dataset from London; and iii) to perform similarity analysis and study how well the events extracted from Twitter match authority reports.

4.1 Datasets

For the evaluation, we constrain our experiments to the domain of city-related events. The proposed approach is generic enough to be applied to any other city for which Twitter data is available. The final aim of the proposed pipeline is to assign one or more labels to each tweet out of a set of city event classes {Crime, Transportation (Trans.), Cultural event, Sport, Social event, Food, Weather and Location}. We leverage the open-domain knowledge available for a city, specifically the vocabulary related to each of these categories from official and authorised web reports, as summarised in Table 1. For example, the Transportation (Trans.) vocabulary is constructed using phrases taken from the 511.org web page (http://511.org), the Open Street Map (OSM, http://www.openstreetmap.org) of the cities is used for extracting the city location terms, and the Wikipedia cultural activities hierarchy (http://en.wikipedia.org/wiki/Category:Cultural_events) is used for constructing the Cultural event terms.

4.1.1 Data Collection through Twitter Streaming API

The Twitter data used in this study has been collected via the Twitter Streaming API, which allows searching for keywords, hashtags, user Ids and geographic bounding boxes simultaneously. The filter API facilitates the search by providing a continuous stream of tweets matching the search criteria. Three key parameters are used for the search:

  • Follow: a comma-separated list of user Ids to follow, which returns all of publicly published tweets in the stream.

  • Track: a comma-separated list of keywords to track.

  • Location: a comma-separated list of geographic bounding boxes containing the coordinates of the southwest point and the northeast point as a pair.

The Twitter Streaming API limits the number of parameters that can be supplied in one request: up to 400 keywords, 25 geographic bounding boxes and 5,000 user Ids. In addition, the API returns all matching tweets up to a volume equal to the streaming cap, which is currently set to 1% of the total current volume of tweets published on Twitter [26].
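The three search parameters can be assembled into a single request as sketched below. The sketch only builds the comma-separated parameter strings and makes no network call; the function name `build_filter_params`, the user Ids, the keywords and the bounding-box coordinates are illustrative assumptions, while `follow`, `track` and `locations` are the parameter names of the Twitter v1.1 filter endpoint.

```python
def build_filter_params(user_ids=(), keywords=(), bounding_boxes=()):
    """Build the query parameters for a streaming filter request.

    bounding_boxes: iterable of (sw_lon, sw_lat, ne_lon, ne_lat) tuples,
                    i.e. south-west corner first, longitude before latitude.
    """
    params = {}
    if user_ids:
        params["follow"] = ",".join(str(uid) for uid in user_ids)
    if keywords:
        params["track"] = ",".join(keywords)
    if bounding_boxes:
        params["locations"] = ",".join(
            str(coord) for box in bounding_boxes for coord in box
        )
    return params


# Hypothetical values for illustration only.
params = build_filter_params(
    user_ids=[1, 2],
    keywords=["traffic", "tube"],
    bounding_boxes=[(-0.51, 51.28, 0.33, 51.69)],
)
```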

We used the San Francisco Twitter data collected by [3] over a period of four months (Aug 2013 to Nov 2013). While the original dataset contained over 8 million tweets for this time period, the authors sub-sampled the data, resulting in a test dataset of 500 tweets for testing their trained model. We have used the same test dataset for our comparative evaluations. This dataset is referred to as San Francisco. Additionally, we have collected data from London using all API parameters (Location bounding box, tracking, and following official news agency user names and user Ids) at two different times, yielding two London datasets. The first is composed of 3000 tweets collected in May 2015 and manually cleaned and annotated for training and testing the MV-RBM model. The manual annotation results underwent a second investigation to ensure their consistency and validity: we asked a group of technical users who work in the field of smart cities to peer-review the annotations. The second dataset was collected in February 2016. In section 4.4, we use this dataset to examine the similarity of the events extracted from Twitter with the road sensor data and the scheduled events parsed from the Web.

The temporal distributions of daily tweets in the San Francisco and London datasets are shown in Fig. 7.

Figure 7: (a) A comparison of the temporal distributions of daily tweets in two different cities within a 15-day time period: London for the period 15/05/2015 to 01/06/2015 and San Francisco for the period 01/08/2013 to 15/08/2013. (b) View of the tweet annotation tool.

4.1.2 Ground-truth Annotation Tool

To facilitate the ground-truth annotation of the data, we have developed a GUI tool. A view of this tool is shown in Fig. 7 (the annotation tool is available for download at http://goo.gl/UBTKQp).

For this study, we have used this tool to annotate the 3000 tweets, constructing a training set of 2000 tweets and a test set of 1000 tweets for training and evaluating our model. We have asked a group of seven technical users who work on smart city research in our team to peer-review the annotations. The annotators were asked to consider the following criteria:

  • The user is asked to assign tweets to one or more classes of events, bearing in mind the potential effect of the event on the city’s daily pattern.

  • The “Social” class includes voting, elections, protests and other city-related group activities.

  • The “Other” class indicates tweets whose content either cannot be associated with any of the provided classes (e.g. personal messages and opinion sharing) or is hard to understand (e.g. ambiguous notes and texts that are hard to follow in the absence of background knowledge).

The result of the ground-truth annotation is used for training the proposed multi-view learning model and to perform evaluations.

4.2 Performance Evaluation I: Word Level Annotation

In this section we evaluate the performance of the three proposed city-event classification algorithms. We present detailed performance measures on the two datasets collected from two different cities and show how the different steps of the proposed multi-view learning model help to achieve enhanced performance by considering all the different views of the same data.

4.2.1 CRF Tweets Tagging Evaluation

In order to evaluate our CRF named-entity recognition model and to assess the efficiency of the proposed class-conditioned report catalogues, we compared the performance of our model against two state-of-the-art approaches, [36] and [3]. Unlike our supervised event extraction, the two baseline approaches are unsupervised. Therefore, we had to combine all event classes of our framework against the non-event (Other) class and mainly focus on the coverage of events and event locations. We have used the San Francisco dataset for this comparison, as the baseline approaches had been evaluated using this dataset. Table 3 shows the results of this evaluation.

            Other                 Location              Events                Precision
            (1)     (2)     Ours  (1)     (2)     Ours  (1)     (2)     Ours  (1)     (2)     Ours
Other       3936    4267    4227  590     175     68    178     9       40    0.84    0.96    0.97
Location    336     76      46    459     983     972   20      2       0     0.56    0.93    0.95
Event       26      14      29    4       0       13    70      85      225   0.70    0.86    0.84
Recall      0.91    0.98    0.98  0.43    0.85    0.92  0.26    0.88    0.85
Table 3: Comparison of the two baseline methods of Anantharam et al. (columns (1) and (2) in each group) with our universal English CRF dictionary tagging approach (Ours).

We found that our CRF-NER model performed as well as the best performing baseline model, achieving 0.95 vs. 0.93 precision on Location term detection and 0.84 vs. 0.86 precision on event detection (Table 3). Note that while the method of [3] had been trained on a large corpus of tweets collected from San Francisco, our CRF model was trained only on the generic, city-independent report catalogues of Table 1, which means that we did not provide domain (city) specific prior knowledge for training our CRF model. Instead, we used a CRF-NER tagging approach that is more flexible because it benefits from a more generic set of conditional class terms. This enabled us to remove any geographical or temporal bias. Nevertheless, our model performed as well as the baseline model, which is specifically trained for a controlled domain (the San Francisco Bay Area) with access to official traffic reports of a given time period. Overall, we developed a model that is more flexible and adaptable, while producing results that were as good as those of the baseline approach. Therefore, our approach can be used for other cities with potentially varying event distributions.

To demonstrate the granular performance on each of our defined city event classes, we present the NER tagging results as a confusion matrix in Table 4. In order to obtain the ground-truth NER tags for this confusion matrix, we have used the same tagging schema as proposed in [3], with the B- and I- prefixes referring to beginning and intermediate tags, respectively, where an entity phrase contains multiple consecutive tags (note that in this way each detected event or location phrase in a given tweet might be composed of multiple terms and thereby multiple tags).

CRF dic. tagging  Ground-truth labels
                  Crime  Cultural  Food  Location  Other  Social  Sport  Weather  Trans.  Total
Crime             12     0         0     0         4      0       0      0        0       16
Cultural          0      39        0     0         1      0       0      0        0       40
Food              0      0         63    4         1      0       0      0        0       68
Location          0      0         0     974       46     0       0      0        0       1020
Other             1      9         6     68        4227   5       4      2        6       4335
Social            0      1         0     4         17     27      0      0        0       49
Sport             0      0         0     1         5      0       19     0        0       25
Weather           0      0         0     2         0      0       0      21       0       23
Trans.            0      0         0     2         1      0       2      0        44      49
Total             13     49        69    1055      4302   32      25     23       50
Recall            0.92   0.79      0.91  0.92      0.98   0.84    0.76   0.91     0.88
Precision         0.75   0.97      0.93  0.95      0.97   0.55    0.76   0.91     0.90
F-measure         0.83   0.88      0.92  0.94      0.97   0.67    0.76   0.91     0.89
1 vs. All Acc.    0.80   0.79      0.75  0.96      0.95   0.77    0.78   0.94     0.83
Table 4: Evaluation results of the CRF dictionary based annotation on the San Francisco data subset

In Table 4 we report precision, recall and F-measure scores for class-dependent word tagging on the same dataset. The one-vs-all tweet classification accuracies are computed by dividing the number of true positives by the total number of samples of a class.
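As a worked example of how the per-class scores follow from the confusion matrix counts, the sketch below recomputes the Crime column of Table 4 (12 correctly tagged words, 16 words tagged as Crime, 13 Crime words in the ground truth). The function name is illustrative.

```python
def precision_recall_f(true_positives, predicted_total, actual_total):
    """Per-class precision, recall and F-measure from confusion-matrix counts."""
    precision = true_positives / predicted_total if predicted_total else 0.0
    recall = true_positives / actual_total if actual_total else 0.0
    if precision + recall == 0.0:
        return precision, recall, 0.0
    f_measure = 2 * precision * recall / (precision + recall)
    return precision, recall, f_measure


# Crime class: row total 16 (tagged as Crime), column total 13 (ground truth).
precision, recall, f_measure = precision_recall_f(12, 16, 13)
print(f"{precision:.2f} {recall:.2f} {f_measure:.2f}")  # prints "0.75 0.92 0.83"
```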

Noteworthy is the slight difference between the one-vs-all tweet classification accuracy and the word-tagging recall rate in Table 4. This is caused by the intrinsic difference between word-level tagging and the tweet annotation made by human experts, who take the meaning of the whole tweet into account and perform inference. Investigating the ground-truth tweet annotations by expert users shows that annotation differences occur under two general circumstances. The first is where CRF-based label prediction mistakes originate from the assumptions made in sentence class label associations. As also reported by Anantharam et al. [3], subtle changes in context result in diverse interpretations of a tweet, and subtle differences in location and event references can cause a loss of precision. An example is where tweets are assigned to the class “Other”. This class association is based on the absence of any non-Other class word tags within a tweet. This means that if a word in a tweet is tagged as “Location”, the tweet will be labelled as “Location” regardless of its global meaning.
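The tag-occurrence rule described in this paragraph can be written down in a few lines. The sketch assumes that untagged words carry a hypothetical `"O"` marker, as in common B-/I- tagging schemes; the function name is illustrative.

```python
def tweet_labels(word_tags):
    """Derive tweet-level classes from word-level tags such as 'B-Location'.

    A tweet receives every class that appears among its word tags and falls
    back to 'Other' only when no word carries a class tag.
    """
    classes = []
    for tag in word_tags:
        if tag == "O":  # assumed marker for words without a class tag
            continue
        # Strip the B- (beginning) or I- (intermediate) prefix if present.
        event_class = tag[2:] if tag[:2] in ("B-", "I-") else tag
        if event_class not in classes:
            classes.append(event_class)
    return classes or ["Other"]
```

For instance, `tweet_labels(["O", "B-Location", "I-Location"])` yields `["Location"]` whatever the rest of the tweet says, which is exactly the limitation that motivates sentence-level reasoning.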

The second is where wrong ground-truth labels are assigned to tweets due to the experts’ lack of common knowledge. This has two origins: i) people are normally not aware of all events taking place, or of all locations existing, in a city, and ii) there are oddly phrased tweets that are quite challenging to understand without following the tweets posted on a specific topic by a specific user (user Ids were not included in our data due to the user privacy policy). While we have minimised the probability of mistakes caused by a lack of individual general knowledge through our peer-reviewed annotation scheme, the second cause remains, as following a user’s historical data is not allowed on the Twitter Streaming API. Such ground-truth annotation mistakes by users demonstrate the necessity of an automated machine annotation model.

In order to partially tackle the failures of the CRF-dictionary annotation, we have proposed an alternative to the CRF learning of Anantharam et al.: boosting the CRF dictionary knowledge view with a CNN-generated view, which jointly aims at enhancing the Location word tagging and providing the grammatical roles of words. The two views of the data are then fed to a multi-view learning framework to enable sentence-level reasoning and classification. In the next two sections, we further investigate and evaluate these claims.

4.2.2 CNN-enhanced LOCATION Tagging Evaluations

As mentioned earlier, two alternative solutions can help in enhancing the word level Tweet annotation: i) boosting the word tagging through fusion of multiple approaches and ii) training a model which considers a sentence level reasoning for Tweet annotations (i.e. classification) rather than solely relying on event-tag occurrences.

To achieve the former enhancement, we used the CNN-derived tags, which boost the tagging accuracy of the LOCATION class over the value previously reported in Table 4. This Location tagging enhancement is important in our framework since it enables us to assign more accurate locations to each extracted event, rather than assigning the events to their tweeted locations as was reported in previous studies [3] (a tweet’s geo-tag is in most cases different from the actual event’s location, as people rarely publish their thoughts about an event at the actual venue of that event).

To better evaluate our claims, we have also tested our proposed word tagging approaches on a more realistic dataset collected from London. Unlike the San Francisco test data [3], the London data has not been cleaned prior to evaluation, though basic pre-processing steps such as tokenisation and stop-word removal have been included in the pipeline. The dataset is divided into two sub-corpora for training and testing the fused NER model. The results are reported in Table 5.

Location tagging performance  CRF-dictionary Tagging        CNN-enhanced LOCATION Tagging
                              Recall  Precision  F-measure  Recall  Precision  F-measure
San Francisco data            0.96    0.93       0.94       0.99    0.87       0.93
London data                   0.49    0.83       0.61       1.00    0.43       0.59
Table 5: Evaluation results of the CRF-dictionary vs. CNN-enhanced tagging on the London and San Francisco datasets

Comparing these results with the performance measured on the San Francisco data shows a degradation in tweet event annotation performance for all classes. The main reason is that the San Francisco data used in previous experiments had gone through additional data cleaning (see details in [3]) prior to testing, which in turn helped to improve the final performance. Moreover, the effect of the CNN tagging, which enhances the San Francisco Location term tagging, is less clear-cut in the case of the London data. While San Francisco Twitter users more frequently and correctly used the “@” and “#” characters for referring to locations and organisation names, London Twitter users have been observed to ignore these rules more frequently (one should also note that the San Francisco Bay Area is populated by industrial companies and organisations, which in practice leads to more harmonic text patterns in tweets published within its bounding box).

4.3 Performance Evaluation II: Multi-View Learning

Benefiting from the named-entity word tags and the syntactic roles of words assigned by the CNN, in this section we investigate how the two views can be trained simultaneously to achieve an enhanced tweet class inference. As one might have noticed in Table 3, the number of tweets belonging to the “Other” category makes the Twitter corpora quite unbalanced for a supervised learning task. To tackle this issue, we have sub-sampled the “Other” class data prior to our multi-view training step. Sub-sampling after the CRF NER tagging is plausible, as the absence of named-entity tags from all non-Other classes in a tweet can be taken as a coarse classification of that tweet as class “Other”. Table 6 shows the performance evaluation of the proposed multi-view approach (presented in Sec. 4.2.2). The performance is reported in terms of one-vs-all class accuracies, and the multi-view approach improves on the single-view model that classified tweeted events according to the CRF NER word tagging.

1 vs. All Class Acc.  Crime  Cultural  Food  Social  Sport  Weather  Trans.
Word tagging          0.53   0.36      0.35  0.16    0.52   0.80     0.67
Multi-view (MV-RBM)   0.60   0.37      0.48  0.21    0.54   0.80     0.69
Table 6: Numerical results of the multi-view learning evaluation on London data

However, one can note a degradation in performance compared with the San Francisco data classification results (last row of Table 4). As mentioned before, this degradation was expected since, unlike the San Francisco data (whose collection and filtering are described in [3]), none of the London datasets has been cleaned prior to the system annotation.

We obtained a lower performance on “Social”, “Cultural” and “Food” topics compared to other topics. This can be explained by two reasons: class conditional dictionary term similarities in “Social” and “Cultural” categories and incomplete class conditional dictionaries in “Food” category.

Although extra care has been taken in constructing the class-specific dictionaries, there were some terms and phrases that appear in more than one dictionary. This has made the event extraction process more challenging in such scenarios.

The effect of the dictionary problem can be reduced by increasing the training data for the categories that provide less accurate results. However, we chose not to bias our data by providing more training samples for these categories, and report the results based on a fair number of annotated tweets per category. An example is the tweet “Rainbow food @ The Good Life Eatery”, where the phrase “rainbow food” will be tagged as “B-Weather” and “B-Food” by the single-view CRF NER tagging, while the actual tags should have been “B-Food” and “I-Food”. Such a mistake is resolved in our proposed multi-view learning step, once the NER tagging outcome is jointly weighted with the PoS tags derived from the CNN.

The multi-view training step can in fact provide more flexibility in expanding class-specific dictionaries. However, adding more terms will require the model to be trained with a larger size training data and will cause higher time and computational complexity and requires more manual annotation effort.

4.4 Performance Evaluation III: Similarity Analysis

We measured the similarity of the extracted Twitter events against the road sensor data and the scheduled sociocultural events parsed from the Web. The London dataset collected in February 2016 is used for this experiment.

The comparison results are reported in terms of each class’s average distance from its nearest authority (web) record, as described in Sec. 3.3, along with its variance and a similarity measure. To compute the similarities from the average distance values, we first rescaled the average distances by the maximum class average distance (the Sport class distance) and then subtracted the result from one. As a result, the similarity values are confined between 0 and 1, where values closer to one indicate higher similarity and values close to zero indicate a higher level of dissimilarity. The first-row results in Table 7 show the different Twitter event distributions compared against the ground-truth traffic sensor data, where (latitude, longitude) polar coordinates are converted to Cartesian values (we used the Haversine formula explained at https://en.wikipedia.org/wiki/Haversine_formula) and the Euclidean distance is used as the metric.
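The rescaling step can be sketched as below, using the TimeOut average distances listed in Table 7 as input. The function name is illustrative, and because the table values are rounded, the recomputed similarities need not coincide with the printed ones in every cell.

```python
def similarity_from_distances(avg_distances):
    """Map class average distances to similarities in [0, 1].

    Each distance is divided by the largest class average distance and the
    result is subtracted from one, so the most distant class scores 0.
    """
    max_distance = max(avg_distances.values())
    return {cls: 1.0 - dist / max_distance
            for cls, dist in avg_distances.items()}


# Average distances of each Twitter class to the nearest TimeOut record.
timeout_distances = {
    "Crime": 3.0, "Cultural": 2.95, "Food": 3.60, "Social": 2.00,
    "Sport": 5.26, "Weather": 3.77, "Trans.": 1.73,
}
similarities = similarity_from_distances(timeout_distances)
```

With these inputs the Sport class, which has the largest average distance, receives a similarity of zero and the Trans. class receives the highest similarity.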

              Crime         Cultural     Food          Social       Sport         Weather       Trans.
Road Rep.     1.00 (8.79)   1.00 (3.67)  0.95 (5.14)   0.90 (3.74)  1.69 (8.72)   1.72 (8.21)   0.74 (2.34)
Similarity    0.44          0.44         0.47          0.50         0.00          0.04          0.59
TimeOut Rep.  3.0 (31.27)   2.95 (7.95)  3.60 (45.45)  2.00 (7.49)  5.26 (31.17)  3.77 (17.46)  1.73 (5.03)
Similarity    0.41          0.44         0.32          0.60         0.00          0.28          0.67
Table 7: Similarity analysis on the London dataset: different Twitter class distributions compared against the ground-truth traffic sensor driven data distribution (first row), and different Twitter class distributions compared against the ground-truth sociocultural data parsed from TimeOut London (second row). Each Rep. cell gives the average distance with its variance in parentheses.

The results showed that the smallest average distance of 0.74, considering the variance intervals, corresponds to the traffic events, which indicates an acceptable tweet classification performance. Additionally, a comparison of Twitter traffic report times with the timestamps of their nearest-neighbour authority traffic records showed that 49.5% of the Twitter traffic alerts are reported approximately five hours prior to the authority’s official reports. This finding highlights the advantage of utilising social media, particularly Twitter-driven knowledge, in facilitating and speeding up city traffic management and potentially smoothing task handling. The second-row results in Table 7 show the different Twitter class distributions compared with the ground-truth sociocultural data parsed from Time Out London, computed in the same way as the traffic similarities. In our experiments, we discovered higher similarity measurements for the traffic, cultural and social tweet event classes, with average distance values of 1.73, 2.95 and 2.00, respectively. This suggests that popular and well-advertised cultural activities are potential high-traffic zones. It is important to point out that the Time Out London sociocultural event set does not separate social events, such as special events held in pubs and restaurants, from cultural events (e.g. exhibitions and ceremonies), while our proposed pipeline labels these events differently.

4.5 Case Studies and Web Interface

Table 8 presents case studies of classified events. Investigating the content of the misclassified tweets (shown in the last three rows of the table), one can spot the adverse effect of conversational language complexity on the classification task, where people occasionally use metaphors to emphasise their point, e.g. “shooting ourselves in the foot” to describe the extent of a spoiled situation and “the desserts miss the rain” to describe a lingering sense of longing.

Tweet Location (extracted/geo-tag) Time Event type Impact
Wind 5 km/h NNW. Barometer 1012.1 mb Rising slowly. Temperature 4.9 °C. Rain today 0.0 mm. Humidity 73% (51.34,-0.08) 00:00:40 Weather
If Leicester win the league I will shave what’s left of my hair off (51.44,-0.03) 00:00:48 Sport
RT …: ”When a woman and 2 children are killed it’s not a domestic ””incident”” it’s a crime @…” (51.46,0.11) 00:01:02 Crime
Left Selhurst Park at 10pm left Clapham Junction at 12.40am and got to catch a bus at Basingstoke as line closed! #3points #2amHome (51.47,-0.17) 00:03:13 Transport
Cinema at its most gripping journalism at its most courageous. Well done @… hopefully many more awards to come! (51.29,-0.51) 00:27:20 Cultural
I’ve seen way too many horror films in my time to not feel at ease wandering around uni and/or halls in the windy darkness (51.44,-0.03) 01:08:22 Weather
… im legit so hyped for the Marina concert (51.52,-0.02) 01:09:26 Cultural
Traffic is disgusting this morning 😣 🔫 (51.51,0.06) 07:51:15 Transport
Good morning campus @ King’s College London (51.51, -0.11) 07:51:09 Social
Man’s body found after triple murder @SkyNews (51.39, 0.23) 07:54:17 Crime
RT @…: @… please attend debate consider voting for bigger spend leads to better outcomes for health & economy (51.64, -0.38) 07:59:17 Social
@… Ayew’s a villa player isn’t he? That’s us shooting ourselves in the foot. (51.51,-0.21) 00:03:05 Crime
And i miss you the desserts miss the rain…. (51.49, -0.35) 00:03:58 Weather
@… over 8 shots saved by keeper or cleared off the line. Could’ve been 5-0. These games happen - just have to keep going. #OurYear? (51.51, -0.21) 00:31:09 Transport
Table 8: Case studies for Twitter data classified events

While in a recent study Zuao et al. [43] proposed to model the intrinsic geometry of tweets through a low-rank, non-linear manifold in order to visualise the distribution of tweets in a two-dimensional Euclidean space, in this study we have focused on a visualisation in real space. To facilitate this, we have developed a Web interface. The interface displays the classification results on a Google map in near real time and is composed of four elements: a Google map canvas layer on which the processed and annotated tweets are displayed with their class-specific icons; a live London traffic layer from the Google traffic API, shown as colour-coded paths on the map; a bar chart panel that presents the class distribution histogram of daily tweets; and a panel for displaying the Twitter timeline. The map data is updated every 60 seconds by appending the past minute’s tweets to the existing ones, up to a 60-minute time window. In practice, the whole data is updated on an hourly basis. Clicking on an event opens a dialogue box on the map that reveals the underlying tweet content along with its timestamp. The Twitter user Ids and names are anonymised for privacy purposes. The web interface (accessible at http://iot.ee.surrey.ac.uk/citypulse-social/) uses JavaScript and HTML to read the results saved in a CSV format and displays the tweets on the map in near real time with less than 60 seconds latency. Fig. 8 shows a screenshot of the web interface.
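The update policy of the map layer (append the last minute of tweets, keep at most a 60-minute window) can be sketched with a double-ended queue. The actual interface is written in JavaScript; this Python sketch only illustrates the windowing logic, and the class name `TweetWindow`, its method `refresh` and the assumption that tweets arrive in timestamp order are all hypothetical.

```python
from collections import deque

WINDOW_SECONDS = 60 * 60  # 60-minute display window


class TweetWindow:
    """Keep the tweets of the most recent time window, oldest first."""

    def __init__(self, window_seconds=WINDOW_SECONDS):
        self.window_seconds = window_seconds
        self._tweets = deque()  # (timestamp_seconds, tweet) pairs

    def refresh(self, now, new_tweets):
        """Append the newly arrived tweets and drop those outside the window.

        Intended to be called once per refresh interval (every 60 seconds in
        the text) with tweets sorted by timestamp.
        """
        self._tweets.extend(new_tweets)
        cutoff = now - self.window_seconds
        while self._tweets and self._tweets[0][0] < cutoff:
            self._tweets.popleft()
        return list(self._tweets)
```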

Figure 8: A screenshot of the developed web interface
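The refresh behaviour described above (append the past minute's tweets, keep at most a 60-minute window) can be illustrated with a small sliding-window buffer. This is only a sketch under stated assumptions, not the code of the deployed interface, which is written in JavaScript and reads CSV files. The names `AnnotatedTweet`, `TweetWindow`, `refresh` and `class_histogram` are hypothetical, and the sketch assumes that batches arrive in chronological order.

```python
from collections import deque
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class AnnotatedTweet:
    """Hypothetical record for one classified, geotagged tweet."""
    text: str
    lat: float
    lon: float
    timestamp: datetime
    label: str  # event class, e.g. "Transport", "Social", "Weather"


class TweetWindow:
    """Holds the tweets currently shown on the map (assumed 60-minute span)."""

    def __init__(self, span: timedelta = timedelta(minutes=60)):
        self.span = span
        self._tweets = deque()  # kept ordered from oldest to newest

    def refresh(self, new_tweets, now):
        """Append the latest batch and drop tweets older than the window."""
        # Assumption: each batch is newer than everything already stored,
        # so sorting within the batch keeps the whole deque ordered.
        self._tweets.extend(sorted(new_tweets, key=lambda t: t.timestamp))
        cutoff = now - self.span
        while self._tweets and self._tweets[0].timestamp < cutoff:
            self._tweets.popleft()
        return list(self._tweets)

    def class_histogram(self):
        """Count visible tweets per event class (for a bar chart panel)."""
        counts = {}
        for tweet in self._tweets:
            counts[tweet.label] = counts.get(tweet.label, 0) + 1
        return counts
```

In such a design, a scheduler would call `refresh` once per minute with the newly classified tweets and redraw the map markers and the bar chart from the returned list and from `class_histogram()`.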

5 Conclusions

In this paper, we developed a multi-label event detection framework for live annotation and classification of Twitter data, in a smart city context.

We have introduced a set of common event dictionaries with minimal overlap to facilitate the detection of city-related events, and developed a GUI to facilitate the ground-truth annotation of tweets for machine learning tasks. More importantly, we have developed a novel multi-view learning framework that combines Convolutional Neural Network (CNN) features with CRF-dictionary driven NER tags for sentence-level event annotation. The proposed model was tested on geographically distinct English-speaking cities and the results proved promising. The evaluation results showed that our proposed solution is capable of annotating tweet words with an averaged accuracy of

over all event classes. The multi-view deep learning model also improved performance over a single-view event classification approach by 5%.

We have also performed a similarity analysis, which showed how authority-driven traffic reports and scheduled sociocultural events can affect the traffic pattern and how citizens reflect on them through social media. The evaluations showed that 49.5% of the Twitter traffic comments are reported approximately five hours before the authorities' official records, and that scheduled sociocultural events were observed to influence the distribution of Twitter comments in the traffic class as well as in the cultural and social classes. The study highlights the possibility of utilising social media as human probes for realising a real-time Physical-Cyber-Social platform for ranking, completing and potentially speeding up city service delivery. Finally, we have developed a live stream analysis interface to present the analysis results and case studies on a Google map. The proposed interface enables the public to visualise event patterns in their city and neighbourhood.

The proposed model serves as a proof of concept, and improvements can be made at several stages of the pipeline. For example, the model can be tested on non-English Twitter streams, and the multi-view learning labels can be updated with an online recursive learning model that is more adaptive and provides performance feedback to the rest of the pipeline.

Acknowledgement: This work has been carried out in the scope of the European Commission’s Seventh Framework Programme funded project CityPulse (FP7-609035).


References

  • [1] Edward H. Adelson and James R. Bergen. Spatiotemporal energy models for the perception of motion. J. Opt. Soc. Am. A, 2(2):284–299, 1985.
  • [2] Alias-i. Lingpipe 4.1.0. http://alias-i.com/lingpipe. Accessed: 2015-05-05.
  • [3] Pramod Anantharam, Payam Barnaghi, Krishnaprasad Thirunarayan, and Amit P. Sheth. Extracting city traffic events from social streams. In ACM Transactions on Intelligent Systems and Technology, volume -, New York, NY, USA, 2015. ACM.
  • [4] Armen S. Asratian, Tristan M. J. Denley, and Roland Häggkvist. Bipartite Graphs and Their Applications. Cambridge University Press, New York, NY, USA, 1998.
  • [5] Hila Becker, Mor Naaman, and Luis Gravano. Beyond trending topics: Real-world event identification on twitter. In Lada A. Adamic, Ricardo A. Baeza-Yates, and Scott Counts, editors, Proceedings of the Fifth International Conference on Weblogs and Social Media, Barcelona, Catalonia, Spain, July 17-21, 2011. The AAAI Press, 2011.
  • [6] Jennifer Bélissent. Getting clever about smart cities: new opportunities require new business models. In Vendor Strategy Professionals, 2010.
  • [7] Jennifer Bélissent and Frederic Giron. Service providers accelerate smart city projects. In Forrester, July 2013.
  • [8] Yoshua Bengio, Réjean Ducharme, Pascal Vincent, and Christian Janvin. A neural probabilistic language model. J. Mach. Learn. Res., 3:1137–1155, March 2003.
  • [9] Anthony Bernal and Chief Programmer. Building a smarter planet, one city at a time. online resource, May 2011. Industry Solutions, IBM.
  • [10] J. Burke, D. Estrin, M. Hansen, A. Parker, N. Ramanathan, S. Reddy, and M. B. Srivastava. Participatory sensing. In Workshop on World-Sensor-Web (WSW’06): Mobile Device Centric Sensor Networks and Applications, pages 117–134, 2006.
  • [11] Ning Chen, Jun Zhu, and Eric P. Xing. Predictive subspace learning for multi-view data: a large margin approach. In John D. Lafferty, Christopher K. I. Williams, John Shawe-Taylor, Richard S. Zemel, and Aron Culotta, editors, NIPS, pages 361–369. Curran Associates, Inc., 2010.
  • [12] Ronan Collobert, Jason Weston, Léon Bottou, Michael Karlen, Koray Kavukcuoglu, and Pavel Kuksa. Natural language processing (almost) from scratch. J. Mach. Learn. Res., 12:2493–2537, November 2011.
  • [13] Cicero dos Santos and Maira Gatti. Deep convolutional neural networks for sentiment analysis of short texts. In Proceedings of COLING 2014, the 25th International Conference on Computational Linguistics: Technical Papers, pages 69–78. Dublin City University and Association for Computational Linguistics, 2014.
  • [14] Nazli FarajiDavar, Sefki Kolozali, and Payam M. Barnaghi. Physical-cyber-social similarity analysis in smart cities. In 3rd IEEE World Forum on Internet of Things, WF-IoT 2016, Reston, VA, USA, December 12-14, 2016, pages 484–489, 2016.
  • [15] Luca Filipponi, Andrea Vitaletti, Giada Landi, Vincenzo Memeo, Giorgio Laura, and Paolo Pucci. Smart city: An event driven architecture for monitoring public spaces with heterogeneous sensors. In Fourth International Conference in Sensor Technologies and Applications SENSORCOMM, pages 281–286. IEEE, 2010.
  • [16] Asja Fischer and Christian Igel. An introduction to restricted boltzmann machines. In Luis Álvarez, Marta Mejail, Luís Gómez Déniz, and Julio C. Jacobo, editors, CIARP, volume 7441 of Lecture Notes in Computer Science, pages 14–36. Springer, 2012.
  • [17] D. Fleet, H. Wagner, and D. Heeger. Neural encoding of binocular disparity: Energy models, position shifts and phase shifts. Vision Research, 36(12):1839–1857, 1996.
  • [18] Xavier Glorot, Antoine Bordes, and Yoshua Bengio. Domain adaptation for large-scale sentiment classification: A deep learning approach. In Proceedings of the Twenty-eighth International Conference on Machine Learning (ICML’11), volume 27, pages 97–110, June 2011.
  • [19] Ralph Grishman, Silja Huttunen, and Roman Yangarber. Real-time event extraction for infectious disease outbreaks. In Proceedings of the Second International Conference on Human Language Technology Research, HLT ’02, pages 366–369, San Francisco, CA, USA, 2002. Morgan Kaufmann Publishers Inc.
  • [20] Mordechai (Muki) Haklay and Patrick Weber. Openstreetmap: User-generated street maps. IEEE Pervasive Computing, 7(4):12–18, October 2008.
  • [21] Geoffrey E. Hinton, Simon Osindero, and Yee-Whye Teh. A fast learning algorithm for deep belief nets. Neural Comput., 18(7):1527–1554, July 2006.
  • [22] Charles F. F. Karney. Algorithms for geodesics. In Journal of Geodesy, volume 87, pages 43–55. Springer, 2013.
  • [23] Michael Kehoe, Michael Cosgrove, SD Gennaro, Colin Harrison, Wim Harthoorn, John Hogan, John Meegan, Pam Nesbitt, and Christina Peters. Smarter cities series: a foundation for understanding IBM smarter cities. An IBM Redguide publication, 2011.
  • [24] D. Koller and N. Friedman. Probabilistic Graphical Models: Principles and Techniques. MIT Press, 2009.
  • [25] Abhishek Kumar, Piyush Rai, and Hal Daumé III. Co-regularized multi-view spectral clustering. In Proceedings of the Conference on Neural Information Processing Systems (NIPS), Granada, Spain, 2011.
  • [26] Shamanth Kumar, Fred Morstatter, and Huan Liu. Twitter Data Analytics. Springer, New York, NY, USA, 2013.
  • [27] Vasileios Lampos and Nello Cristianini. Nowcasting events from the social web with statistical learning. ACM Trans. Intell. Syst. Technol., 3(4):72:1–72:22, September 2012.
  • [28] Greg Lindsay. Cisco’s big bet on new songdo: creating cities from scratch, 2010. http://www.fastcompany.com/.
  • [29] Mingrong Liu, Yicen Liu, Liang Xiang, Xing Chen, and Qing Yang. Extracting key entities and significant events from online daily news. In Intelligent Data Engineering and Automated Learning - IDEAL 2008, 9th International Conference, Daejeon, South Korea, November 2-5, 2008, Proceedings, pages 201–209, 2008.
  • [30] Lluís Màrquez, Xavier Carreras, Kenneth C. Litkowski, and Suzanne Stevenson. Semantic role labeling: An introduction to the special issue. Comput. Linguist., 34(2):145–159, June 2008.
  • [31] Roland Memisevic. On multi-view feature learning. CoRR, abs/1206.4609, 2012.
  • [32] Dunja Mladenić and Alexandra Moraru. Complex event processing and data mining for smart cities. In Conference on Data Mining and Data Warehouses (SiKDD 2012), 2012.
  • [33] Milind Naphade, Guruduth Banavar, Colin Harrison, Jurij Paraszczak, and Robert Morris. Smarter cities and their innovation challenges. Computer, 44(6):32–39, June 2011.
  • [34] Masayuki Okamoto and Masaaki Kikuchi. Discovering volatile events in your neighborhood: Local-area topic extraction from blog entries. In Gary Geunbae Lee, Dawei Song, Chin-Yew Lin, Akiko N. Aizawa, Kazuko Kuriyama, Masaharu Yoshioka, and Tetsuya Sakai, editors, AIRS, volume 5839 of Lecture Notes in Computer Science, pages 181–192. Springer, 2009.
  • [35] Novi Quadrianto and Christoph H. Lampert. Kernel-based learning. In Encyclopedia of Systems Biology. Springer New York, New York, 2013.
  • [36] Alan Ritter, Mausam, Oren Etzioni, and Sam Clark. Open domain event extraction from twitter. In Proceedings of the 18th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD ’12, pages 1104–1112, New York, NY, USA, 2012. ACM.
  • [37] Yutaka Sasaki, Yoshimasa Tsuruoka, John McNaught, and Sophia Ananiadou. How to make the most of ne dictionaries in statistical ner. In Proceedings of the Workshop on Current Trends in Biomedical Natural Language Processing, BioNLP ’08, pages 63–70, Stroudsburg, PA, USA, 2008. Association for Computational Linguistics.
  • [38] Amit P. Sheth. Citizen sensing, social signals, and enriching human experience. In IEEE Transactions on Internet Computing, volume 13, pages 87–92, 2015.
  • [39] Hristo Tanev, Jakub Piskorski, and Martin Atkinson. Real-time news event extraction for global crisis monitoring. In Epaminondas Kapetanios, Vijayan Sugumaran, and Myra Spiliopoulou, editors, NLDB, volume 5039 of Lecture Notes in Computer Science, pages 207–218. Springer, 2008.
  • [40] Duyu Tang, Furu Wei, Bing Qin, Ting Liu, and Ming Zhou. Coooolll: A deep learning system for twitter sentiment classification. In Proceedings of the 8th International Workshop on Semantic Evaluation (SemEval 2014), pages 208–212. Association for Computational Linguistics, 2014.
  • [41] Xiaofeng Wang, Matthew S Gerber, and Donald E Brown. Automatic crime prediction using events extracted from twitter posts. In Social Computing, Behavioral-Cultural Modeling and Prediction, pages 231–238. Springer, 2012.
  • [42] Harry Yang, Jianchun Zhang, Binbing Yu, and Wei Zhao. Statistical Methods for Immunogenicity Assessment. Chapman and Hall/CRC, Sep 2015.
  • [43] Deyu Zhou, Tianmeng Gao, and Yulan He. Jointly event extraction and visualization on twitter via probabilistic modelling. In Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics, ACL 2016, August 7-12, 2016, Berlin, Germany, Volume 1: Long Papers, 2016.
  • [44] Yuchao Zhou, Suparna De, and Klaus Moessner. Real world city event extraction from twitter data streams. Procedia Computer Science, 98:443–448, 2016.