Location reference identification from tweets during emergencies: A deep learning approach

01/24/2019 ∙ by Abhinav Kumar, et al. ∙ NIT Patna

Twitter has recently been used during crises to communicate with officials and to support rescue and relief operations in real time. The geographical location of the event, as well as of the users, is vitally important in such scenarios. Identifying the geographic location is challenging because the location information fields, such as the user location and the place name of tweets, are not reliable. Extracting location information from tweet text is difficult because the text contains a lot of non-standard English, grammatical errors, spelling mistakes, non-standard abbreviations, and so on. This research aims to extract location words used in the tweet using a Convolutional Neural Network (CNN) based model. We achieved an exact matching score of 0.929, a Hamming loss of 0.002, and an $F_1$-score of 0.96 for the tweets related to the earthquake. Our model was able to extract even three- to four-word long location references, which is also evident from the exact matching score of over 92%. The findings of this paper can help in early event localization, emergency situations, real-time road traffic management, localized advertisement, and in various location-based services.


1 Introduction

Tweets are very responsive to real-world events, and are sometimes even more immediate than traditional news channels. Therefore, it is possible to keep track of the latest information by following tweets. In several cases the news was first reported on Twitter, such as an airplane crash over the Hudson River in New York in 2009 (Sakaki et al., 2013), and the death of former British Prime Minister Margaret Thatcher in April 2013 and the explosions at the Boston Marathon in 2013 (both reported at http://www.guardian.co.uk/technology/2013/apr/23/twitter-first-source-investment-news). In recent years, Twitter has been used extensively in the course of natural and human-made disasters such as earthquakes, floods, fire, terrorist attacks, civil unrest, and so on (Alexander, 2014; Landwehr et al., 2016; Laylavi et al., 2017, 2016; Luna & Pennock, 2018; Mejri et al., 2017; Mendoza et al., 2010; Sakaki et al., 2013; Singh et al., 2017; Yuan & Liu, 2018). Government and non-government agencies use Twitter during a crisis so that rescue operations can leap into action, to disseminate information to a wider audience, and to recognize the ground reality (Imran et al., 2014a, 2015; Landwehr et al., 2016; Laylavi et al., 2017, 2016; Rossi et al., 2018; Sakaki et al., 2013; Zhou et al., 2017). In an American Red Cross survey, individuals were asked whom they contacted in an emergency. Twenty-eight percent of Americans turned to Twitter for help if they were unable to reach the emergency contact number (911) (http://www.ehstoday.com/fire_emergencyresponse/communications/red-cross-social-media-help-disaster-0232). Twitter is also used in real-time road traffic monitoring (Gu et al., 2016; Hoang et al., 2016), event localization (Giridhar et al., 2015; Panteras et al., 2015), and in various location-based services (Ikawa et al., 2012; Xu et al., 2015). The estimation and detection of the location of events and users from tweets is a major concern for all of the above-mentioned tasks.


Twitter provides three location information fields for sharing a user’s location: (1) User location; (2) Place name; and (3) Geo-coordinate. The user location field has 140 character spaces (previously it was limited to 30 characters) in which users can write their home location while creating their profile. This field is optional, and the user can write any arbitrary words or leave it blank. In many instances, users write meaningless words that do not refer to any location name. Hecht et al. (2011) found that 34% of users do not reveal their “user location” information. Cheng et al. (2010) found that only 26% of users use city-level or below-city-level location names in their user location field. Moreover, this field cannot be treated as the current location of the user, as it is entered at the time of creating the profile and is rarely updated afterwards. The second field is the “place name,” which can be attached to a tweet when it is posted. The place name is represented by a location name with an array of latitude-longitude pairs that form the location’s boundary coordinates. These place names are predefined in the Twitter database, but they do not provide granular location information. Kumar et al. (2017) found that only 47.33% of tweets contain place names, and that 12% of those place names are incorrect in terms of their spatiotemporal information. The third field provided by Twitter is for the “geo-coordinates” (geographical footprints of latitude and longitude) that can be attached at the time of posting a tweet using a GPS- (Global Positioning System) enabled device. Most researchers (Huang et al., 2014; Nakaji & Yanai, 2012; Yuan et al., 2013) have considered geo-coordinates, i.e., tweets associated with latitude-longitude information, as the most explicit and precise location information. However, tweets with geo-coordinate information are infrequent. Cheng et al. (2010), Morstatter et al. (2013), and Kumar et al. (2017) determined that only 0.42%, 3.17%, and 7.90% of tweets respectively are geo-tagged. Kumar et al. (2017) further reported that although geo-coordinates are the most precise location information, they are not always authentic in terms of their spatiotemporal information if the tweet is posted from third-party applications such as Instagram (https://www.instagram.com/). Hence, all three location information fields, available in tweets and user accounts, have their own limitations and cannot be completely relied on.

Along with the location fields mentioned above, people also make location references in their tweet texts when asking for help or reporting a disaster event. People from a disaster-affected area tend to mention their location in their tweet text (Vieweg et al., 2010). The location information available in tweet texts is vitally important as it represents the location of an event or user during emergencies. Hence, the location information mentioned in the tweet text may be considered the most authentic source of geographic evidence in an emergency. The tweet text is a free-text field limited to 280 characters (previously it was 140 characters). Location information can be extracted from these tweet texts using either the gazetteer-based approach (Itoh et al., 2016; Li & Sun, 2014; Malmasi & Dras, 2015; Middleton et al., 2014; Sankaranarayanan et al., 2009; Zhang & Gelernter, 2014) or the Named Entity Recognition (NER) based approach (Gelernter & Mushegian, 2011; Giridhar et al., 2015; Unankard et al., 2015). A gazetteer is a corpus of location names (e.g., GeoNames, http://geonames.org). In the gazetteer-based approach, the words of tweets are looked up in the gazetteer to find the location names. However, there are some inherent problems with this approach: (i) gazetteers are not available for all regions; and (ii) a location name mentioned in the text may have some other non-geographic meaning in context, e.g., the word “Reading” may refer to a location in England or it may be used in another sense. A further problem with this approach is geo-ambiguity (distinct locations have the same name, e.g., Paris has 140 possibilities). The second approach is Named Entity Recognition (NER). The NER technique generally tokenizes the tweet and tags each word using language-specific part-of-speech tagging, then detects the groups of words that probably refer to named entities. This approach works well for well-written English sentences, but it does not work well for tweet texts as they contain grammatical mistakes, nonstandard abbreviations, and spelling mistakes (Ajao et al., 2015; Gelernter & Mushegian, 2011; Ozdikis et al., 2017; Zheng et al., 2018). Temnikova et al. (2015) did an extensive analysis of the readability of tweets during crises and made several recommendations for writing understandable tweets. In many cases, a number of English language rules are violated, e.g., the first letter of a proper noun is often not capitalized. The grammar is also incorrect in many scenarios, e.g., missing prepositions. Further, most users do not use the correct spelling in their tweets. They often shorten words by removing the vowels. To resolve the aforementioned problems and find location references, several efforts have been made by researchers: Lingad et al. (2013) re-trained a Named Entity Recognition tool for the Twitter environment, while Li et al. (2012), Liu et al. (2011), and Ritter et al. (2011) built their own Named Entity Recognition frameworks. Some other works combined the gazetteer and NER approaches to find named entities in tweets (Gelernter & Balaji, 2013; Middleton et al., 2018).
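The two weaknesses of gazetteer look-up described above (non-geographic senses and geo-ambiguity) can be illustrated with a minimal Python sketch. The toy gazetteer, its entries, and the function name `gazetteer_lookup` are assumptions made for illustration only; they are not taken from GeoNames or from any of the cited systems.

```python
import re

# Toy gazetteer: the entries are illustrative and hypothetical, not real
# GeoNames data. A real system would load a full gazetteer resource.
TOY_GAZETTEER = {
    "reading": ["Reading, England", "Reading, Pennsylvania"],
    "paris": ["Paris, France", "Paris, Texas"],
    "new york": ["New York, USA"],
}


def gazetteer_lookup(text, gazetteer, max_ngram=3):
    """Return (phrase, candidates) pairs for n-grams found in the gazetteer."""
    tokens = re.findall(r"[a-z]+", text.lower())
    matches = []
    i = 0
    while i < len(tokens):
        # Prefer the longest n-gram that starts at position i.
        for n in range(min(max_ngram, len(tokens) - i), 0, -1):
            phrase = " ".join(tokens[i:i + n])
            if phrase in gazetteer:
                matches.append((phrase, gazetteer[phrase]))
                i += n
                break
        else:
            # No n-gram starting here is a known place name.
            i += 1
    return matches


result = gazetteer_lookup(
    "I am reading about the earthquake in New York", TOY_GAZETTEER
)
for phrase, candidates in result:
    print(phrase, "->", candidates)
```

In this example the verb "reading" is matched as a place name and is mapped to more than one candidate location, which shows why plain look-up needs additional context to be reliable.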

In most of the earlier work, NER-based approaches used POS tagging and extracted all named entities such as person name, product, group, corporation, location, etc. In the current work, we concentrate on the extraction of location words and ignore other named entities. For this, instead of using POS tagging, we train a Convolutional Neural Network- (CNN) based system to extract location names present in the tweet. We represent the tweet text as normal sentences and highlight the words containing location information. We assume that there is already a system that filters tweets based on their relatedness to a particular event; several works have been reported on this task (Chowdhury et al., 2013; Imran et al., 2014a; Nguyen et al., 2017a; Olteanu et al., 2014; Singh et al., 2017). Once the tweets are found to be related to the event, our model finds the location-referring words in each tweet. We treat this problem under the supervised learning paradigm: a dataset of tweets and their corresponding location words is built to train the system. Since the input is a sentence (tweet text), we had several options to represent it, such as LDA (Blei et al., 2003), PLSA (Hofmann, 1999), and word embedding (Pennington et al., 2014). LDA and PLSA are generative statistical models that represent a document as a mixture of a small number of topics. They are widely used for grouping tweets related to a specific event. Our target is to preserve the sentence structure so that each word position can be marked as a location word or not. This is why we prefer word embedding over other techniques, such as LDA or PLSA.

For the supervised learning model, there are several options, ranging from conventional machine-learning models, such as SVM, Naive Bayes, and Random Forest, to deep-learning models, such as the Recurrent Neural Network (RNN) and the Convolutional Neural Network (CNN). Conventional machine-learning models require hand-crafted features to learn to associate input with output. Therefore, the performance of these systems depends heavily on feature engineering. This is why we choose deep learning models. RNNs are good for sequential or long-text data. Tweets have short sentences, which favors the use of a CNN over an RNN. The intuition behind using a CNN is that the convolutional layer can automatically learn a better representation of the input data, and the dense layers can then use this representation to identify location references. Our objective for the current work is formulated as: (i) Find whether a tweet contains a location name; and (ii) If there are location names present in a tweet, then highlight those words.
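One way to make the two-part objective concrete is to mark every word position of a tweet with a binary label. The Python sketch below is only an illustration under that assumption; the helper names `encode_location_labels`, `contains_location`, and `highlight`, and the fixed length `max_len`, are hypothetical and the exact target encoding used by the model is not specified in this part of the text.

```python
def encode_location_labels(tweet_tokens, location_words, max_len):
    """Return a 0/1 label per word position (1 = location word), zero-padded.

    Assumption: membership is decided by a simple case-insensitive match.
    """
    location_set = {word.lower() for word in location_words}
    labels = [
        1 if token.lower() in location_set else 0
        for token in tweet_tokens[:max_len]
    ]
    labels += [0] * (max_len - len(labels))
    return labels


def contains_location(labels):
    """Objective (i): does the tweet contain any location word?"""
    return any(labels)


def highlight(tweet_tokens, labels):
    """Objective (ii): return the words marked as location words."""
    return [token for token, flag in zip(tweet_tokens, labels) if flag]


tokens = "earthquake in new york back in 2012".split()
labels = encode_location_labels(tokens, ["New", "York"], max_len=10)
print(labels)
print(contains_location(labels), highlight(tokens, labels))
```

With such a per-position encoding, a prediction can be compared with the annotation either as a whole vector or position by position.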


The remainder of the paper is organized as follows: in Section 2, we briefly present the related literature. Our proposed framework is presented in Section 3. The findings of the proposed system are presented in Section 4. Section 5 discusses the results and implications of the current research. We conclude the paper in Section 6.

2 Related work

Recently, a number of works have been reported for better utilization of social media for emergency purposes (Ajao et al., 2015; Alexander, 2014; Imran et al., 2015; Ozdikis et al., 2017; Shibuya, 2017; Zheng et al., 2018). Olteanu et al. (2015) investigated several natural hazards and human-induced disasters in a systematic way to better understand the effective use of social media for information gathering processes during emergencies. Most of the existing works focus on event detection and location estimation of events or users during emergencies. In event detection, some researchers tried to detect an event as soon as possible, whereas some researchers tried to classify event-related tweets into predefined classes to conduct further analysis. In the location estimation, researchers tried to find the location of the events or users from social media. We are dividing this section into two subsections to better organize the existing works: (i) Event detection; and (ii) Location estimation.

2.1 Event detection

Imran et al. (2013) proposed a system that used machine-learning methods to detect informative messages during a crisis. After detecting informative messages, their system automatically extracted nuggets of information from them. Olteanu et al. (2014) proposed a methodology for building an effective lexicon for crisis events. Their approach could improve the recall in the sampling of Twitter communication, which could greatly help in situation awareness during a crisis. Imran et al. (2014b) proposed an AIDR (Artificial Intelligence for Disaster Response) platform for automatic classification of disaster-related messages into user-defined classes. Chowdhury et al. (2013) used content-based features, such as n-gram and the tense of the message to automatically classify messages into pre-incident, during-incident, and post-incident classes. Perol et al. (2018) used a convolutional neural network for earthquake detection and location estimation from seismograms. Nguyen et al. (2017b) proposed a convolutional neural network based model for classifying earthquake-related tweets into informative or non-informative classes. Their system could detect earthquake events earlier than the announcement from the official government website. Yang et al. (2017) used several classifiers to identify tweets related to flood victims and volunteers. They then proposed first come first served, static priority, and hybrid rescue-scheduling algorithms to provide help for victims as soon as possible. An extensive survey of event-detection techniques for Twitter can be found in Atefeh & Khreich (2015).

2.2 Location estimation

Jurgens et al. (2015) presented a comprehensive analysis of network-based approaches for predicting the geo-location of users. Do et al. (2017) proposed a deep multiview learning model that combines the textual, network, and metadata features for predicting the geo-location of a tweet. Qian et al. (2017) proposed a probabilistic model, integrating the content and network features learned from social media, to predict the location of a user. Lourentzou et al. (2017) utilized a neural network architecture to predict the geo-location of users. They found that the choice of appropriate network architecture and hyper-parameter selection can give better accuracy in predicting geo-location. A group of researchers (Do et al., 2017; Laylavi et al., 2016; Qian et al., 2017) used tweet text with other metadata such as “user location”, “geo-coordinates”, and “place name” to estimate location information. Another group (Gelernter & Balaji, 2013; Lingad et al., 2013; Malmasi & Dras, 2015) used only tweet texts to find location information. The tweet text is used because (i) geo-tagged tweets are infrequent, and (ii) the user location field cannot be treated as the current location of the users as this field is mostly outdated. Gazetteer-based and Named Entity Recognition-based approaches are common techniques for finding location references in tweet texts. We divide this section into three subsections: (i) the studies using gazetteers for finding location references; (ii) the studies using Named Entity Recognition for finding location references; and (iii) the studies to resolve the issue of noisy text by developing new methods.

2.2.1 Gazetteer Based Approach

Middleton et al. (2014) used gazetteers, street maps, and volunteered geographic information to develop real-time crisis-mapping by geoparsing the tweet text. They used 2,000 human-labeled tweets to evaluate their results and found high precision for street-level location names. They stated that a high precision (0.90 or above) can be achieved in location finding from real-time tweets by preloading location information for the area that is at risk of a disaster. Sankaranarayanan et al. (2009) clustered several different news topics and applied a Part-of-Speech (POS) tagger and a Named Entity Recognizer to find location names from the tweet texts. However, they concluded that the Named Entity Recognizer fails to give good results in the case of tweets because it is difficult for the system to work efficiently over noisy text. Further, they applied TF-IDF to extract key phrases from tweets and used the GeoNames (http://geonames.org) gazetteer to find the location names. They used the extracted location names with other metadata to assign the location to clusters. Itoh et al. (2016) built their own gazetteer from geo-located posts submitted by the users of location-based services such as Foursquare. They enriched the gazetteer by adding named entities referring only to specific locations. They also used a part-of-speech tagger to tag proper nouns from the text and added them to the list of the specific locations. They obtained 38,504 entries in their gazetteer to map a spatiotemporal visualization of sports games and earthquake events. Malmasi & Dras (2015) proposed an unsupervised approach based on Noun Phrase extraction and n-gram-based matching using the GeoNames gazetteer. They claimed that their system is better for noisy microblog text. They used 2,000 manually annotated tweets to train the system and tested it with 1,000 tweets. They achieved an $F_1$-score of 0.792. Li & Sun (2014) proposed a framework named PETAR, which included two components: a Point of Interest (POI) inventory and a time-aware POI tagger. The POI inventory is built using Foursquare check-ins, which consist of formal names of POIs as well as informal abbreviations. The POI tagger is based on the Conditional Random Field model. They performed their analysis on 4,000 manually labeled tweets and achieved an $F_1$-score of 0.87. Zhang & Gelernter (2014) used supervised machine-learning algorithms and utilized the gazetteer to build a model. They evaluated their model on 956 manually labeled tweets to find the location references mentioned there. Al-Olimat et al. (2017) proposed a system that extracts the location names from the text using n-gram statistics and location name gazetteers. Their location name extraction tool used augmented and filtered region-specific gazetteers to detect boundaries of multi-word location names.

2.2.2 Named Entity Recognition

Unankard et al. (2015) used clustering on tweets and then used a Stanford Named Entity Recognizer (Ritter et al., 2011) to extract location names from the tweet text. They found correlations between user location and event location to localize events such as the Indonesian earthquake and the Queensland election 2012. Finally, they found the most frequent location name present in the cluster and took it as the location of the event. Giridhar et al. (2015) used road traffic Twitter data from three major cities in California and clustered the tweets mentioning an event in a specific group. They tokenized each tweet and tagged each word using a Part-of-Speech (POS) tagger to find location names. Besides POS tagging, they observed that the location names were preceded by prepositions such as in, around, between, and after, and they applied this grammar-based rule as well to find location names. After extracting location names, they obtained the geo-coordinates of each of the extracted location names using the Google Maps API (https://cloud.google.com/maps-platform/) and then averaged them to find the centroid of the events. Gelernter & Mushegian (2011) studied the Stanford Named Entity Recognizer (https://nlp.stanford.edu/software/CRF-NER.html) to assess its effectiveness in finding location information in tweets. They found that the Stanford Named Entity Recognizer could find the location names that are proper nouns, but fails to recognize local street names, buildings, nonstandard place abbreviations, misspellings, and location names not starting with a capital letter. They commented that the result should improve if several named entity recognition algorithms are configured to work together. Sikdar & Gambäck (2016) proposed a named entity system for extracting named entities from tweets and then classifying those named entities into ten different classes. For the named entity recognition they used several lexical, character, and context-based features of the tweets. Their system achieved an $F_1$-score of 0.63 for named entity recognition and 0.40 for named entity classification.

2.2.3 Efforts for Twitter Named Entity Recognition

Some researchers tried to train existing NER systems with related social media text to better learn the named entities mentioned in them. Lingad et al. (2013) used several named entity recognizers, namely Stanford NER (http://nlp.stanford.edu/software/CRF-NER.shtml), OpenNLP (http://opennlp.apache.org), Yahoo! PlaceMaker (http://developer.yahoo.com/boss/geo/), and TwitterNLP (Ritter et al., 2011), to find the location names in disaster-related tweets. They retrained Stanford NER and OpenNLP using the disaster-related tweets. They achieved an $F_1$-score of 0.902 for the re-trained Stanford NER and an $F_1$-score of 0.833 for OpenNLP. Li et al. (2012) developed a novel two-step unsupervised Named Entity Recognition system named TwiNER using Wikipedia and the Web N-Gram corpus. Their TwiNER named entity recognizer achieved comparable performance with other conventional NER systems for the real-life targeted tweet stream. They achieved $F_1$-scores of 0.772 and 0.419 for the two different ground-truth labeled datasets. Ritter et al. (2011) experimented with the conventional NER tools and found that the accuracy dropped from 0.97 to 0.80 when it was applied to news and tweet corpus respectively. They addressed this problem by rebuilding the NLP pipeline starting with POS tagging, through chunking, to named-entity recognition. Gelernter & Balaji (2013) used open-source Named Entity Recognition software and machine-learning techniques to identify location references, such as streets, addresses, buildings, location names, place acronyms, and abbreviations. To identify streets, buildings, and location names they used lexico-semantic pattern recognition, a Named Entity Recognizer, and a gazetteer, respectively. They found an $F_1$-score of 0.85 for streets, 0.86 for buildings, 0.96 for location names, and 0.88 for abbreviated place names. Overall, they found an $F_1$-score of 0.90 in identifying location references. Middleton et al. (2018) proposed two different location extraction techniques: (i) entity matching by utilizing the OpenStreetMap database; and (ii) a language model that makes use of numerous gazetteers and a large social media tag dataset. They also experimented with three different models that used third-party applications, such as GeoNames, Google Geocoder API, etc. They found that the OpenStreetMap database performed best among all five approaches, with $F_1$-scores between 0.90 and 0.97 for the English and Italian tweets and an $F_1$-score of 0.66 for Turkish tweets. Liu et al. (2013, 2011) proposed a named entity recognition framework combining three components: tweet normalization, K-Nearest Neighbors (KNN) with a linear Conditional Random Field, and a semi-supervised framework. They performed their analysis on 12,245 manually labeled tweets and found an overall $F_1$-score of 0.80 in finding named entities, such as the person, product, location, and organization. Dutt et al. (2018) proposed a system to infer location names mentioned in the text of tweets in an unsupervised fashion. They applied several preprocessing steps to tweets and then used a POS tagger to find proper nouns. After that, they used a gazetteer-based approach to find the location names mentioned in the tweet text with an $F_1$-score of 0.79. Deep neural models have also been used for Named Entity Recognition by several researchers (Chiu & Nichols, 2015; Collobert et al., 2011; Huang et al., 2015; Lample et al., 2016). Limsopatham & Collier (2016) used bidirectional Long Short-Term Memory (LSTM), which learns the orthographic features of tweets. They extracted both character-based word representation and word-vector representation corresponding to each word of the tweet and found an $F_1$-score of 65.89 in finding named entities. Däniken & Cieliebak (2017) used transfer learning and sentence level features for named entity recognition on tweets and achieved an $F_1$-score of 40.78.

Most of the earlier works used Named Entity Recognition (NER) and gazetteer-based approaches to find the location information in the tweet text. Existing works require a predefined set of features and location-specific gazetteers as the input for extracting location information. Therefore, the performance of these systems is heavily dependent on feature engineering. We eliminate the feature extraction and POS tagging by using a deep Convolutional Neural Network (CNN) to find the location references mentioned in the tweets.

3 Methodology

The proposed convolutional neural network-based model learns the continuous representation of tweets and then picks salient features from them to predict the location names present in the tweets. The proposed architecture has three parts: (i) a word embedding that represents tweets in vector form; (ii) a convolutional model that learns the salient features from the tweet representation; and (iii) a fully connected layer that interprets the extracted features to predict the output. The detailed proposed architecture is presented in Figure 1.

3.1 Data Collection, Preprocessing, and Labeling

We collected tweets related to earthquakes using the keywords earthquake and #earthquake from the Twitter streaming API (https://developer.twitter.com/en/docs). Olteanu et al. (2014) proposed a system for selecting the keywords to extract relevant tweets from social media during an emergency. The data collection was carried out between 20th October 2017 and 15th March 2018 for several earthquakes across different parts of the world, such as Iran, Mexico, Iraq, the Philippines, New York, Algeria, the United States, and Peru, to name a few. We collected a total of 103,384 tweets related to earthquakes in JSON (JavaScript Object Notation) format. The tweets contained the tweet text along with metadata, such as the posting time of tweets, user ID, tweet ID, and so on. We randomly selected a subset of these tweets to annotate the location references mentioned in the tweet text (Karimi & Yin, 2012). We kept the tweet text only and discarded other metadata for the current work as we wanted to focus on finding location words in the text only. We pre-processed the tweets to first remove non-English tweets and then removed duplicate tweets, mentioned user names, URL links, and emoticons. The duplicate tweets were removed by finding RT (re-tweets) in the tweet text. Hashtags were replaced with the corresponding word (e.g., #Mexico to Mexico). The text was converted to lower case. The stopwords were kept in the tweet as their occurrence may indicate the start of location words. We kept all the words of the entire tweet collection to build the word representation, even if they occurred only once. After pre-processing, the dataset has only the tweet text without any user identification marks. Hence, user privacy has not been compromised in the current research.
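The pre-processing steps described above can be sketched in Python as follows. This is a minimal illustration, not the pipeline used in the study: the regular expressions, the emoji ranges, and the function names are assumptions. The language filter is omitted, text emoticons such as ":)" are not handled, and the exact-duplicate check after cleaning is an extra step that the text does not mention.

```python
import re

URL_RE = re.compile(r"https?://\S+|www\.\S+")
MENTION_RE = re.compile(r"@\w+")
HASHTAG_RE = re.compile(r"#(\w+)")
# Rough emoji/pictograph ranges: an approximation, not an exhaustive list.
EMOJI_RE = re.compile("[\U0001F300-\U0001FAFF\u2600-\u27BF]")


def is_retweet(text):
    """Treat tweets that start with the token RT as re-tweets."""
    return re.match(r"\s*RT\b", text) is not None


def clean_tweet(text):
    """Remove URLs, mentions, and emoji; unwrap hashtags; lower-case."""
    text = URL_RE.sub(" ", text)
    text = MENTION_RE.sub(" ", text)
    text = HASHTAG_RE.sub(r"\1", text)  # "#Mexico" -> "Mexico"
    text = EMOJI_RE.sub(" ", text)
    text = text.lower()
    return " ".join(text.split())  # collapse repeated whitespace


def preprocess(tweets):
    """Drop re-tweets, clean the rest, and skip exact duplicates."""
    seen = set()
    cleaned = []
    for raw in tweets:
        if is_retweet(raw):
            continue
        text = clean_tweet(raw)
        if text and text not in seen:  # assumption: exact-match de-duplication
            seen.add(text)
            cleaned.append(text)
    return cleaned


print(preprocess([
    "RT @a: quake in Iran",
    "Quake in #Iran",
    "quake in iran http://x.co/1",
]))
```

Stopwords are deliberately left in place, in line with the observation that they may signal the start of a location reference.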

In our dataset, tweets have a diverse granularity of location references, such as street name, building name, city, district, and even country name. We observed that several tweets have more than one piece of location information, i.e., tweets with multiple location references. Some location names need more than one word to represent them in the tweet. For example, the tweet “I had the same experience with the earthquake in New York back in 2012. I felt my office shake but nobody knew what happened until I saw Twitter” has two words, New and York, that refer to the location name New York. Three postgraduate students volunteered to annotate the location references mentioned in the tweets. They individually annotated each tweet for words related to location references. We considered only those location references on which at least two students agreed. Finally, we obtained a total of 5,107 annotated tweets with 6,690 location references; a detailed description of the dataset used in this study is listed in Table 1. The sample tweets with location references and their annotations are listed in Table 2.

| Number of tweets (Total = 5,107) | Number of words referring to location names in a tweet | Total number of words referring to location names (Total = 6,690) |
|---|---|---|
| 1897 | 0 | 0 |
| 1300 | 1 | 1300 |
| 1016 | 2 | 2032 |
| 499 | 3 | 1497 |
| 240 | 4 | 960 |
| 155 | 5 | 901 |

Table 1: Description of the tweets containing words referring to location names


| Tweets | Location references |
|---|---|
| Hey @AppleSupport my friend @carloxito lost everything in #Mexico #earthquake, incl his iMac. Can you help him fix? http://bit.ly/2yA8HHI | Mexico |
| Help out! Give to ’Relief for Earthquake Victims in Kurdistan’. https://www.generosity.com/fundraisers/2269042 … #generosity via @generosity | Kurdistan |
| Moderate earthquake, 5 mag has occurred near Maasin in Philippines - https://wp.me/p5bFdp-rQQ #earthquake #quake | Maasin, Philippines |
| Small earthquake felt here, Missouri, Tennessee... https://fb.me/2K4lPX0f1 | Missouri, Tennessee |
| There was an earthquake of seismic intensity 4 in Tokyo earlier. There is no damage.💥 | Tokyo |
| Earthquake hits central Iraq, felt in Baghdad - Reuters http://fxmb.info/Q9m1rY #hng #earthquake http://earthcentral.org | Iraq, Baghdad |
| I had the same experience with the earthquake in New York back in 2012. I felt my office shake but nobody knew what happened until I saw Twitter | New York |

Table 2: Sample tweets containing words referring to location names

Figure 1: Overall architecture for the Convolutional Neural Network (CNN)

3.2 Word embedding to represent tweets

We used word embeddings of tweets as the input to the model. A word embedding is a real-valued vector representation of each word of the text corpus in a predefined, fixed dimension; words with similar meanings receive similar vectors. To build the word vectors, we created a bag-of-words from all unique words in our tweet texts. After that, for each word $w_i$, we used a look-up matrix $E$ to get its embedding as an $m$-dimensional vector, represented by $e(w_i) \in \mathbb{R}^m$, where $\mathbb{R}^m$ represents the $m$-dimensional vector space. Two types of initialization can be used for the look-up matrix $E$. First, all word vectors in $E$ can be randomly initialized from a uniform distribution (Socher et al., 2013). Second, $E$ can hold word vectors pre-trained on a large corpus of text using an embedding learning algorithm (Mikolov et al., 2013; Pennington et al., 2014). In our case, we used the pre-trained word embedding GloVe (Global Vectors for word representation) (Pennington et al., 2014) as the look-up matrix for the experiment. Each word of the tweet is fed as input to the Embedding Input through the input layer, where the weight matrix between the input and embedding layer is the pre-trained look-up matrix $E$. We used GloVe (‘glove.twitter.27B.100d.txt’, freely available at https://nlp.stanford.edu/projects/glove/), which provides 100-dimensional embeddings trained by the Stanford NLP group on 27 billion tokens from tweets. The use of the pre-trained GloVe embedding reduces the computation overhead and normally offers better results, as it is trained over a massive corpus of text (Goldberg, 2016).
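As a rough illustration of how a pre-trained look-up matrix of this kind might be assembled, the sketch below parses lines in the plain-text GloVe format (a word followed by its space-separated vector components) and builds one row per vocabulary word. The function names `parse_glove_lines` and `build_lookup_matrix`, the toy three-dimensional vectors, and the use of a zero vector for words missing from the embedding file are assumptions made for this example; the paper does not state how out-of-vocabulary words were handled.

```python
from typing import Dict, Iterable, List


def parse_glove_lines(lines: Iterable[str]) -> Dict[str, List[float]]:
    """Parse GloVe-style text lines of the form 'word v1 v2 ... vm'."""
    vectors: Dict[str, List[float]] = {}
    for line in lines:
        parts = line.rstrip().split(" ")
        if len(parts) < 2:
            continue  # skip blank or malformed lines
        vectors[parts[0]] = [float(x) for x in parts[1:]]
    return vectors


def build_lookup_matrix(vocab: List[str],
                        vectors: Dict[str, List[float]],
                        dim: int) -> List[List[float]]:
    """One row per vocabulary word; unknown words get a zero vector (assumption)."""
    return [vectors.get(word, [0.0] * dim) for word in vocab]


# Toy stand-in for a few lines of the embedding file (3 dimensions instead of 100).
toy_lines = [
    "earthquake 0.1 0.2 0.3",
    "tokyo 0.4 0.5 0.6",
]
glove = parse_glove_lines(toy_lines)
lookup = build_lookup_matrix(["earthquake", "tokyo", "kermadec"], glove, dim=3)
```

With the real file, the same parser could be fed an open file handle, for example `parse_glove_lines(open("glove.twitter.27B.100d.txt", encoding="utf-8"))`, with `dim=100`.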

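To represent the complete tweet in its matrix form, we concatenated the embedding of each word of the tweet. Suppose $T$ represents a tweet of length $n$ (padded where necessary) with words $w_1, w_2, \dots, w_n$; the complete tweet in matrix form can be represented by equation 1.

$$T_{1:n} = e(w_1) \oplus e(w_2) \oplus \dots \oplus e(w_n) \tag{1}$$

where $\oplus$ represents the concatenation operator, $e(w_i) \in \mathbb{R}^m$ is the $m$-dimensional embedding of word $w_i$, and $T_{1:n}$ represents the concatenation of the embeddings of words $w_1$ to $w_n$. Padding is used to fix the length of each tweet to the same size. The complete tweet matrix is represented as given below:

$$T = \begin{bmatrix} e_{11} & e_{12} & \cdots & e_{1n} \\ e_{21} & e_{22} & \cdots & e_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ e_{m1} & e_{m2} & \cdots & e_{mn} \end{bmatrix}$$

where column $i$, $(e_{1i}, e_{2i}, \dots, e_{mi})^{\top}$, represents the embedding of word $w_i$, so that $T$ is a tweet matrix having $n$ words of dimension $m$. The pictorial view can be seen in Figure 1, where the embedding input of each word is concatenated one after the other in sequence to form a complete tweet representation. This tweet representation is then used by the Convolutional Neural Network (CNN) to learn the location references present in the tweet.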
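The padding and concatenation step can be sketched as follows. This is a minimal illustration, not the authors' code: the helper name `tweet_to_matrix`, the toy two-dimensional embeddings, and the choice to pad (and to represent unknown words) with zero vectors are all assumptions. The matrix is stored as a list holding one embedding vector per word position.

```python
def tweet_to_matrix(tokens, embeddings, max_len, dim):
    """Return max_len embedding vectors, one per word, zero-padded at the end."""
    zero = [0.0] * dim
    # Truncate to max_len words and look up each embedding (zero vector if unknown).
    vectors = [list(embeddings.get(tok, zero)) for tok in tokens[:max_len]]
    # Pad short tweets so every tweet has exactly max_len positions.
    vectors += [list(zero) for _ in range(max_len - len(vectors))]
    return vectors


toy_embeddings = {
    "earthquake": [0.1, 0.2],
    "in": [0.3, 0.4],
    "tokyo": [0.5, 0.6],
}
matrix = tweet_to_matrix("earthquake in tokyo".split(), toy_embeddings,
                         max_len=5, dim=2)
```

Here `matrix` has five positions: three word embeddings followed by two padding vectors.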

3.3 Context dependent feature extraction

The convolution process of the CNN model is used to extract the semantic features of the sentence (Kalchbrenner et al., 2014; Socher et al., 2013) by using n-gram information (Collobert et al., 2011). In the CNN model, the convolution process involves a filter $F \in \mathbb{R}^{m \times h}$ that spans $h$ words, each of dimension $m$ (the same as the embedding vector). The filter is applied to a window of the tweet matrix by element-wise multiplication, and the sum of all resulting values is passed through a non-linear function to produce a new feature. The filter is then moved one column to the right, convolved with the next $h$ words of the matrix, and passed through the non-linear function to produce the next feature, and so on. Writing $T_{i:i+h-1}$ for the embeddings of words $i$ to $i+h-1$ of a tweet of $n$ words, a feature $c_i$ for this window of words can be generated as:

$$c_i = f\left(F \cdot T_{i:i+h-1} + b\right) \tag{2}$$

where $b \in \mathbb{R}$ is a bias, $f$ is a non-linear function, and $\cdot$ denotes element-wise multiplication followed by summation. The filter is applied to every feasible window of $h$ words, $\{T_{1:h}, T_{2:h+1}, \dots, T_{n-h+1:n}\}$, to produce a feature map:

$$c = [c_1, c_2, \dots, c_{n-h+1}] \tag{3}$$

where $c \in \mathbb{R}^{n-h+1}$.

We used a Rectified Linear Unit (ReLU) (Nair & Hinton, 2010) as the activation function. The ReLU activation function is defined as $f(x) = \max(0, x)$; that is, for $x < 0$ it returns $0$, and for $x \geq 0$ it returns $x$ itself. We used this function because it speeds up the training of the CNN, as ReLU is cheap to compute. After obtaining the feature map $c = [c_1, c_2, \dots]$, we applied a max-over-time pooling operation (Collobert et al., 2011) and took the maximum value from a window of size $k$, as given by equation 4.

$$\hat{c} = \max\{c_i, c_{i+1}, \dots, c_{i+k-1}\} \tag{4}$$

The purpose of applying the pooling operation is to get the most important feature in each of the windows, i.e., the one with the highest value. In the same way, we obtained one pooled feature matrix for each of the filters. After the convolution layer, we concatenated these matrices and flattened them into a single feature vector, as can be seen in Figure 1. The neurons of the flattened layer are fully connected to the dense layer with a sigmoid activation function, which predicts the output as the probability of occurrence of location names in the tweet. To reduce over-fitting at the dense layers, we used dropout (Srivastava et al., 2014) as the regularization technique. Dropout prevents interdependency between the hidden neurons by randomly dropping them with a probability of $p$. This allows the neural network to learn more robust features and speeds up the training.
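The convolution, ReLU, and pooling steps described in this section can be sketched in plain Python as below. This is a toy illustration under stated assumptions rather than the authors' implementation: the function names, the two-dimensional toy embeddings, the hand-picked filter weights, and the use of non-overlapping pooling windows (stride equal to the window size) are all choices made for the example.

```python
def relu(x):
    """Rectified Linear Unit: 0 for negative inputs, x otherwise."""
    return max(0.0, x)


def convolve(tweet, filt, bias=0.0):
    """Slide a filter spanning h words over the tweet.

    tweet: list of n word vectors; filt: list of h weight vectors.
    Returns n - h + 1 ReLU-activated features, one per word window.
    """
    h = len(filt)
    features = []
    for i in range(len(tweet) - h + 1):
        total = bias
        for word_vec, filt_vec in zip(tweet[i:i + h], filt):
            # Element-wise multiplication followed by summation.
            total += sum(a * b for a, b in zip(word_vec, filt_vec))
        features.append(relu(total))
    return features


def max_pool(features, window):
    """Maximum of each non-overlapping window (stride = window is an assumption)."""
    return [max(features[i:i + window])
            for i in range(0, len(features) - window + 1, window)]


# Five words with 2-dimensional toy embeddings and one 2-gram filter.
tweet = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [-1.0, 0.0], [0.0, -1.0]]
bigram_filter = [[1.0, 0.0], [0.0, 1.0]]
feature_map = convolve(tweet, bigram_filter)
pooled = max_pool(feature_map, window=2)
```

A tweet of five words and a 2-gram filter yield a feature map of length 5 - 2 + 1 = 4, which the pooling step reduces to two values.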

3.4 Representation of labels at the output layer

The location references present in the tweet are presented to the output layer in the form of a zero-one vector. The location words are encoded as $1$ and the non-location words are encoded as $0$. For the tweets “earthquake occurred near tazeh abad kermanshah at utc earthquake tazehabad,” “hey my friend lost everything in mexico earthquake incl his imac can you help him fix,” and “help out give to relief for earthquake victims in kurdistan generosity via” the location names are present at word indexes (4, 5, 6, 10), (7), and (10) respectively, counting from 1. So, we put $1$ at those word indexes and $0$ at the rest.
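The zero-one encoding can be sketched as below. The helper name `encode_labels` is hypothetical, positions are treated as 1-based word indexes, and the default length of 60 mirrors the output-layer size reported in the paper's results section.

```python
def encode_labels(tokens, location_positions, max_len=60):
    """Return a zero-one vector of length max_len.

    location_positions holds 1-based word indexes of location words;
    every other position (including padding) stays 0.
    """
    labels = [0] * max_len
    for pos in location_positions:
        if 1 <= pos <= min(len(tokens), max_len):
            labels[pos - 1] = 1
    return labels


tokens = ("help out give to relief for earthquake victims in "
          "kurdistan generosity via").split()
# Locate the single location word and convert to a 1-based index.
position = tokens.index("kurdistan") + 1
labels = encode_labels(tokens, [position])
```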

3.5 Loss Function and Optimizer

A loss function is used to calculate the difference between the actual and predicted values at the output layer. This loss is then back-propagated (Hecht-Nielsen, 1992) from the output layer to adjust the weights of the neurons in the network. In our case, we have a multi-labeled dataset with more than one location name, so we used binary cross entropy loss with a sigmoid activation function at the output layer (Nam et al., 2014), as is common for multi-labeled datasets. Binary cross entropy loss and the sigmoid function are defined as:

$$\text{Loss} = -\frac{1}{N}\sum_{i=1}^{N}\left[y_i \log(\hat{y}_i) + (1 - y_i)\log(1 - \hat{y}_i)\right]$$

$$\sigma(x) = \frac{1}{1 + e^{-x}}$$

where $N$ represents the total number of labels in the tweet, and $y_i$ and $\hat{y}_i$ represent the actual and predicted values of the network, respectively. We used the Adaptive Moment Estimation (Adam) optimizer (Kingma & Ba, 2014) to adjust the weights by back-propagating the calculated loss.
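For concreteness, the sigmoid activation and the binary cross entropy loss can be computed for a single tweet as in the sketch below. The function names, the toy logits, and the small clipping constant `eps` (used to avoid taking the logarithm of zero) are assumptions of this example; the loss is averaged over the labels.

```python
import math


def sigmoid(x):
    """Squash a real-valued score into the interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def binary_cross_entropy(y_true, y_pred, eps=1e-7):
    """Mean binary cross entropy over all labels of one tweet."""
    total = 0.0
    for y, p in zip(y_true, y_pred):
        p = min(max(p, eps), 1.0 - eps)  # clip to avoid log(0)
        total += y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return -total / len(y_true)


# Toy output-layer scores for three word positions and their true labels.
logits = [-3.0, 2.5, 0.0]
y_true = [0, 1, 1]
y_pred = [sigmoid(z) for z in logits]
loss = binary_cross_entropy(y_true, y_pred)
```

Confident, correct predictions (the first two positions) contribute little to the loss, while the undecided third position (probability 0.5) contributes the most.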

3.6 Hyper-parameter setting

We define several hyper-parameters for the proposed CNN network, which can be seen in Table 3. In our preliminary analysis, we first experimented with several optimization algorithms, namely Stochastic Gradient Descent (SGD), RMSProp, and Adam, keeping all other hyper-parameters constant as listed in Table 3. We found that Adam, with binary cross entropy, produced the lowest training loss, so we used Adam for the proposed model. Next, we experimented with 64, 128, and 256 filters for each filter size. The best result was found with 128 filters for each filter size and a max pooling operation with a window size of 5. The proposed model was then tested with batch sizes of 50, 100, and 150; the best result was found with a batch size of 50. We tested the model with dropout values of 0.2, 0.3, and 0.5; the performance of the model was best with a dropout of 0.2. Similarly, the system was tested by varying the number of epochs; the performance of the model did not change much after 100 epochs, so we fixed the number of epochs to 100 for all our other experiments.

Description | Values
Filter region size | 2, 3, 4
Feature maps | 128
Pooling window size | 5
Pooling | Max pooling
Activation function | ReLU
Dense layer | 60 neurons
Dropout rate | 0.2
Learning rate | 0.001
Batch size | 50
Epochs | 100
Table 3: Description of the Hyper-parameters

4 Result

We performed several experiments to evaluate our proposed model and extract location words from the earthquake-related tweets. To minimize the bias, we used 10-fold cross validation (Kohavi et al., 1995). It is a technique to randomly partition the data sample into ten equal subsamples in which one subsample is used to validate the system, whereas the remaining nine subsamples are used to train the model. This process is repeated ten times, with each of the ten subsamples used just once as the validation data. The results from each of the folds are averaged to estimate the overall system performance. According to our observation, in most of the tweets, the number of words is 60 at the most. Hence, we used 60 neurons at the input layer to represent the words of each tweet and 60 neurons at the output layer to encode the presence or absence of location references.
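The 10-fold procedure described above can be sketched as an index generator. This is an illustrative sketch only: the function name `k_fold_indices`, the fixed random seed, and the round-robin assignment of shuffled samples to folds are assumptions, and with 5,107 tweets the folds can only be approximately equal (510 or 511 tweets each).

```python
import random


def k_fold_indices(n_samples, k=10, seed=0):
    """Yield (train_idx, val_idx) pairs; each sample is validated exactly once."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)  # random partition, reproducible via seed
    folds = [indices[i::k] for i in range(k)]  # k near-equal subsamples
    for i in range(k):
        val = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, val


splits = list(k_fold_indices(5107, k=10))
fold_sizes = [len(val) for _, val in splits]
```

Each of the ten `(train, val)` pairs would be used to train and validate one model, and the ten sets of scores would then be averaged.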

4.1 Evaluation metrics

To evaluate the proposed model, we used Precision, Recall, $F_1$-score, the Hamming loss, the Jaccard similarity, and the Exact matching score. These metrics are widely used in the case of a multi-labeled dataset (Charte & Charte, 2015). Say a multi-labeled dataset contains a total of $N$ instances; each instance can be represented as $(x_i, Y_i)$, where $x_i$ is the set of attributes. $L$ is the set of labels, where $|L|$ represents the total number of labels used in the dataset. Suppose $Y_i \subseteq L$ and $Z_i \subseteq L$ represent the subsets of true and predicted labels, respectively, for the $i$-th instance; then the metrics can be described for the $i$-th instance by the following formulae.

  • Precision: This is the ratio of the number of accurately predicted location words to the total number of predicted location words. It is computed as given in equation 5. Precision ranges between 0 and 1, where 1 is the best and 0 is the worst value.

    $$\text{Precision}_i = \frac{|Y_i \cap Z_i|}{|Z_i|} \tag{5}$$
  • Recall: This is the ratio of the number of accurately predicted location words to the total number of actual location words in the tweet. It is computed as given in equation 6. Recall ranges between 0 and 1, where 1 is the best and 0 is the worst value.

    $$\text{Recall}_i = \frac{|Y_i \cap Z_i|}{|Y_i|} \tag{6}$$
  • $F_1$-score: This is the harmonic mean of Precision and Recall, which gives a balanced evaluation between them. It can be represented by equation 7. The $F_1$-score ranges between 0 and 1, where 1 is the best and 0 is the worst value.

    $$F_{1,i} = \frac{2 \times \text{Precision}_i \times \text{Recall}_i}{\text{Precision}_i + \text{Recall}_i} \tag{7}$$
  • Hamming Loss: This is the ratio of the number of wrong predictions to the total number of predictions. It is calculated by equation 8, where $Y_i \,\triangle\, Z_i$ is the set of labels that appear in exactly one of $Y_i$ and $Z_i$. The indicator function $I(\cdot)$ returns 1 when the expression is true; otherwise it returns 0. The Hamming loss ranges between 0 and 1, where 0 is the best and 1 is the worst value.

    $$\text{Hamming loss}_i = \frac{1}{|L|} \sum_{l \in L} I\left(l \in Y_i \,\triangle\, Z_i\right) \tag{8}$$
  • Jaccard similarity: This is the ratio of the number of accurately predicted location words to the size of the union of actual and predicted location words. It is represented in equation 9. The Jaccard index ranges between 0 and 1, where 1 is the best and 0 is the worst value.

    $$\text{Jaccard}_i = \frac{|Y_i \cap Z_i|}{|Y_i \cup Z_i|} \tag{9}$$
  • Exact Matching score: This can be computed by equation 10. The indicator function $I(\cdot)$ returns 1 if all the predicted location and non-location words are the same as the actual ones; otherwise it returns 0.

    $$\text{Exact match}_i = I\left(Y_i = Z_i\right) \tag{10}$$
Approach: 2-CNN+2-Dense+Dropout (all rows)
Filter size | Precision | Recall | F1-score | Hamming loss | Jaccard similarity | Exact matching
2 | 0.57 | 0.50 | 0.52 | 0.018 | 0.531 | 0.429
3 | 0.60 | 0.45 | 0.50 | 0.016 | 0.551 | 0.473
4 | 0.62 | 0.47 | 0.52 | 0.015 | 0.559 | 0.475
5 | 0.65 | 0.47 | 0.53 | 0.015 | 0.568 | 0.484
2,3 | 0.97 | 0.86 | 0.90 | 0.003 | 0.900 | 0.849
2,4 | 0.95 | 0.89 | 0.91 | 0.003 | 0.924 | 0.886
2,5 | 0.96 | 0.92 | 0.94 | 0.002 | 0.948 | 0.918
3,4 | 0.97 | 0.89 | 0.92 | 0.003 | 0.914 | 0.875
3,5 | 0.98 | 0.93 | 0.95 | 0.002 | 0.949 | 0.922
4,5 | 0.97 | 0.90 | 0.92 | 0.003 | 0.926 | 0.890
2,3,4 | 0.97 | 0.93 | 0.95 | 0.002 | 0.953 | 0.924
2,3,5 | 0.98 | 0.92 | 0.94 | 0.002 | 0.938 | 0.906
2,4,5 | 0.98 | 0.91 | 0.94 | 0.002 | 0.949 | 0.925
3,4,5 | 0.98 | 0.90 | 0.93 | 0.002 | 0.926 | 0.892
2,3,4,5 | 0.99 | 0.91 | 0.94 | 0.002 | 0.934 | 0.892
Table 4: Result of the 2-CNN and 2-Dense with dropout model with combinations of 2-, 3-, 4-, and 5-gram filter size

Say there is a tweet that says “very strong earthquake felt here, kermadec island, new zealand”. This tweet has four location words, kermadec, island, new, and zealand, occurring at word positions 6, 7, 8, and 9 respectively. So the real output can be encoded as $Y$ = [0 0 0 0 0 1 1 1 1]. In case 1, our system predicted the output $Z$ = [0 0 0 0 1 0 0 1 1]. In this prediction, two location words, new and zealand, are predicted correctly, while two location words, kermadec and island, are wrongly predicted as non-location words. One non-location word, here, was wrongly predicted as a location word. So, from the definition of the evaluation metrics, precision = number of accurately predicted location words [at positions (8, 9)] / number of predicted location words [at positions (5, 8, 9)] = 2/3 = 0.67, recall = number of accurately predicted location words [at positions (8, 9)] / number of actual location words [at positions (6, 7, 8, 9)] = 2/4 = 0.5, $F_1$-score = harmonic mean of precision and recall = (2 × 0.67 × 0.5)/(0.67 + 0.5) = 0.57, Hamming loss = number of wrong predictions [at positions (5, 6, 7)] / total number of predictions = 3/9 = 0.33, Jaccard similarity = number of accurately predicted location words [at positions (8, 9)] / size of the union of actual and predicted location words [at positions (5, 6, 7, 8, 9)] = 2/5 = 0.4, and Exact matching score = 0 (as not all location and non-location words are correctly predicted).

In case 2, suppose the system predicted the output $Z$ = [0 0 0 0 0 1 1 1 1], meaning that all location and non-location words are correctly predicted. Then, precision = 4/4 = 1.0, recall = 4/4 = 1.0, $F_1$-score = (2 × 1.0 × 1.0)/(1.0 + 1.0) = 1.0, Hamming loss = 0/9 = 0.0, Jaccard similarity = 4/4 = 1.0, and Exact matching score = 1.0 (as all location and non-location words are correctly predicted).
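The per-tweet metrics can be checked with a short script that follows the verbal definitions given in Section 4.1, applied to the Kermadec Island tweet from the worked example. The function name `multilabel_metrics` is hypothetical, and the handling of empty label sets (0 for precision, recall, and F1 when a denominator is zero; 1 for the Jaccard similarity when both sets are empty) is an assumption of this sketch, since the paper does not state a convention.

```python
def multilabel_metrics(y_true, y_pred):
    """Per-tweet metrics over zero-one vectors of equal length."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)       # correct location words
    n_pred = sum(y_pred)                                     # predicted location words
    n_true = sum(y_true)                                     # actual location words
    n_union = sum(1 for t, p in pairs if t == 1 or p == 1)   # union of both sets
    n_wrong = sum(1 for t, p in pairs if t != p)             # wrong predictions

    precision = tp / n_pred if n_pred else 0.0   # assumption for empty prediction
    recall = tp / n_true if n_true else 0.0      # assumption for empty truth
    f1 = 2 * precision * recall / (precision + recall) if tp else 0.0
    return {
        "precision": precision,
        "recall": recall,
        "f1": f1,
        "hamming_loss": n_wrong / len(y_true),
        "jaccard": tp / n_union if n_union else 1.0,  # assumption: both empty
        "exact_match": int(n_wrong == 0),
    }


actual = [0, 0, 0, 0, 0, 1, 1, 1, 1]
case_1 = [0, 0, 0, 0, 1, 0, 0, 1, 1]
case_2 = [0, 0, 0, 0, 0, 1, 1, 1, 1]

metrics_case_1 = multilabel_metrics(actual, case_1)
metrics_case_2 = multilabel_metrics(actual, case_2)
```

Note that a perfect prediction gives a Hamming loss of 0, because the Hamming loss counts wrong predictions.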

4.1.1 Filter size estimation

Since a tweet can contain more than one location word, determining a suitable filter size is a primary concern. We started experimentation with four different models: (i) 1-CNN + 2-Dense; (ii) 1-CNN + 2-Dense with Dropout; (iii) 2-CNN + 2-Dense; and (iv) 2-CNN + 2-Dense with Dropout. We used different combinations of 2-gram, 3-gram, 4-gram, and 5-gram filters with each of the models to extract the features from the tweet. The best-performing model was found to be 2-CNN + 2-Dense with dropout. The results of this model for different filter sizes are listed in Table 4. In the analysis, we found that a single filter size was not adequate, as all the models performed poorly with it. It can be seen from Table 4 that the individual 2-gram, 3-gram, 4-gram, and 5-gram filters did not perform well, yielding $F_1$-scores of 0.52, 0.50, 0.52, and 0.53 respectively. Using two filter sizes together performed better than the individual filters. The best result was achieved when we used 2-gram, 3-gram, and 4-gram filters together, which yielded a precision of 0.97, a recall of 0.93, an $F_1$-score of 0.95, a Hamming loss of 0.002, a Jaccard similarity of 0.953, and an exact matching score of 0.924. So, for all further experiments, we fixed the filter sizes to 2-gram, 3-gram, and 4-gram and used them together.

4.1.2 Effect of varying the dense layers

To determine the suitable number of dense layers, we experimented with ten different combinations of CNN, dense, and dropout layers. The results of different combinations are tabulated in Table 5. As can be seen from Table 5, two dense layers yielded the best performance metrics with 2-CNN + 2-Dense + Dropout model. So for further experimentation, we fixed the number of dense layers to two.

Approach | Precision | Recall | F1-score | Hamming loss | Jaccard similarity | Exact matching
1-CNN + 1-Dense (Baseline) | 0.83 | 0.77 | 0.79 | 0.005 | 0.843 | 0.800
1-CNN + 2-Dense | 0.85 | 0.76 | 0.79 | 0.007 | 0.815 | 0.751
1-CNN + 2-Dense + Dropout | 0.91 | 0.83 | 0.86 | 0.004 | 0.886 | 0.840
1-CNN + 3-Dense | 0.89 | 0.81 | 0.84 | 0.005 | 0.860 | 0.810
1-CNN + 3-Dense + Dropout | 0.91 | 0.84 | 0.86 | 0.004 | 0.887 | 0.859
2-CNN + 1-Dense | 0.95 | 0.88 | 0.90 | 0.003 | 0.906 | 0.849
2-CNN + 2-Dense | 0.94 | 0.86 | 0.89 | 0.003 | 0.898 | 0.843
2-CNN + 2-Dense + Dropout | 0.97 | 0.93 | 0.95 | 0.002 | 0.953 | 0.924
2-CNN + 3-Dense | 0.95 | 0.87 | 0.90 | 0.003 | 0.899 | 0.849
2-CNN + 3-Dense + Dropout | 0.96 | 0.90 | 0.92 | 0.003 | 0.933 | 0.900
Table 5: Result of the proposed Convolutional Neural Network (CNN)-based model with filter sizes of 2-gram, 3-gram, and 4-gram

4.1.3 Effect of varying the CNN layers

The number of CNN layers was the most important parameter to decide, as adding one more CNN layer to the 1-CNN model outperformed all previous 1-CNN models. So, we experimented further by adding more CNN layers to the model. To do that, we developed models ranging from 1-CNN to 4-CNN, each with two dense layers and filter sizes of 2-grams, 3-grams, and 4-grams together. The results of 1-CNN, 2-CNN, 3-CNN, and 4-CNN in combination with 2-Dense and dropout are listed in Table 6. We found that 3-CNN yielded the best result. Our final model can be summarized as: (i) Number of CNN layers = 3; (ii) Filter size = 2, 3, 4; (iii) Number of dense layers = 2; and (iv) Dropout = 0.2.

Approach | Precision | Recall | F1-score | Hamming loss | Jaccard similarity | Exact matching
1-CNN + 2-Dense + Dropout | 0.91 | 0.83 | 0.86 | 0.004 | 0.886 | 0.840
2-CNN + 2-Dense + Dropout | 0.97 | 0.93 | 0.95 | 0.002 | 0.953 | 0.924
3-CNN + 2-Dense + Dropout | 0.98 | 0.95 | 0.96 | 0.002 | 0.962 | 0.929
4-CNN + 2-Dense + Dropout | 0.98 | 0.94 | 0.96 | 0.002 | 0.956 | 0.927
Table 6: Result of the different combinations of CNN layers with filter sizes of 2-grams, 3-grams, and 4-grams

5 Discussion and Implications

In this work, we proposed a Convolutional Neural Network (CNN)-based model that learns the salient features from tweets to predict the location references mentioned in them. The proposed model can find location information of almost every granularity, such as street, building, city, district, and country names, with high accuracy. The use of 2-gram, 3-gram, and 4-gram filters together with the 3-CNN and 2-Dense with dropout performed best, with a precision of 0.98, a recall of 0.95, an $F_1$-score of 0.96, a Hamming loss of 0.002, a Jaccard similarity of 0.962, and an exact matching score of 0.929. The precision of 0.98 means that 98 out of every 100 words predicted as location words are true location-referring words. The recall of 0.95 means that we can identify 95 out of every 100 real location-referring words contained in the tweets.

The use of n-gram features played a major role in finding location references. The proposed system did not perform well when we applied 2-gram, 3-gram, 4-gram, and 5-gram filters individually. The use of 2-gram, 3-gram, and 4-gram filters together performed best because many location names consist of more than one word, as can be seen in Table 1. As a single location name can take more than one word to represent, finding all the words that refer to a single location is very important. In many cases, the individual words of a single location name do not carry any meaning in terms of location. For example, the tweet “News Moderate earthquake 5.8 mag, 91 km S of Raoul Island, New Zealand - BREAKING http://ow.ly/oFAE50ga6z2” had four location references: Raoul, Island, New, and Zealand. If the system cannot predict all four location references together, then it is of no use, as the individual words Raoul, Island, New, and Zealand do not have any meaning in terms of location reference. In the analysis, we also found that some of the tweets have five or more location words, but most of the time they were not consecutive. They mostly appeared with the # sign and were distributed across the whole text. This may be one of the reasons why adding a 5-gram or larger filter to the 2-gram, 3-gram, and 4-gram filters did not improve the performance. We used the exact matching metric to measure the system performance. The exact matching was set to true if all the predicted location-referring words and non-location-referring words of a tweet matched the actual location-referring words and non-location-referring words, respectively. Although exact matching is a very strict evaluation metric, the proposed system identified all the location references with a 92.9% exact match.

Some similar works have been reported for location word extraction from tweets. However, due to the inaccessibility of their datasets, we cannot directly compare our results with theirs. Hence, the comparison is merely in terms of the accuracy achieved over their datasets. Malmasi & Dras (2015) used 3,000 tweets related to general topics and reported an $F_1$-score of 0.792. Similarly, Gelernter & Balaji (2013) used 3,987 tweets related to earthquakes and reported an $F_1$-score of 0.90, and Lingad et al. (2013) used 2,878 tweets related to several disasters, such as the Queensland flood 2012, the Christchurch earthquake 2012, and the England riots of 2011, and achieved an $F_1$-score of 0.902. In line with their studies, our system, which was validated on 5,107 tweets, achieved an $F_1$-score of 0.96 in finding location references mentioned in the tweets. Some other works (Nguyen et al., 2017b; Perol et al., 2018) also utilized convolutional neural networks in the context of earthquakes, but their objectives were different. Nguyen et al. (2017b) used a CNN to classify tweets into relevant and non-relevant classes and proposed an algorithm for timely detection of earthquakes. Perol et al. (2018) used a CNN to detect earthquakes from seismic waves.

The proposed convolutional neural network-based model is independent of manual feature engineering. The system works well even with noisy tweet texts. We preserved the privacy of users during this work, as we only used the tweet text and removed all other metadata and the user names mentioned in the tweets. The main theoretical implication of this work is the development of models based on the convolutional neural network for geographical location estimation without going through POS tagging, as is required when using conventional Named Entity Recognition tools. The major practical implications of this work are in: (i) early event localization; (ii) finding the location of victims; and (iii) rescue and relief operations. Beyond these, the proposed system can be utilized in cases of civil unrest, targeted advertising, observing regional human behavior, real-time road traffic management, and in various location-based services. The proposed system can be easily integrated with event detection models.

6 Conclusion

The extraction of location information from tweets is a challenging task, as tweets contain various kinds of noise, such as grammatical mistakes, spelling mistakes, and non-standard abbreviations. We have proposed a convolutional neural network-based model for finding the location references present in tweets. We used earthquake-related tweets and ran our implementation with several configurations of convolution layers and dense layers. We achieved our best result, an $F_1$-score of 0.96, when we used 3-CNN and 2-Dense layers with dropout. We can find location information of several granularities, such as street, building, city, district, and country names, with high accuracy. This system can be utilized in other domains, such as road traffic management and sports events, and in several location-based services by training it with domain-specific tweets. The trained models can even be integrated with mobile devices to find the location information of tweets on the fly. The limitation of this work is related to the manual labeling of location references in the tweets, as it requires huge human effort. A semi-supervised approach can be used to reduce the task of manual labeling to some extent. In this work, we only considered English-language tweets, but users also post tweets in their regional languages during a crisis. So, a deep neural network-based model can be created to deal with the multilingual nature of tweets.

References


  • Ajao et al. (2015) Ajao, O., Hong, J., & Liu, W. (2015). A survey of location inference techniques on twitter. Journal of Information Science, 41, 855–864.
  • Al-Olimat et al. (2017) Al-Olimat, H. S., Thirunarayan, K., Shalin, V., & Sheth, A. (2017). Location name extraction from targeted text streams using gazetteer-based statistical language models. arXiv preprint arXiv:1708.03105.
  • Alexander (2014) Alexander, D. E. (2014). Social media in disaster risk reduction and crisis management. Science and engineering ethics, 20, 717–733.
  • Atefeh & Khreich (2015) Atefeh, F., & Khreich, W. (2015). A survey of techniques for event detection in twitter. Computational Intelligence, 31, 132–164.
  • Blei et al. (2003) Blei, D. M., Ng, A. Y., & Jordan, M. I. (2003). Latent dirichlet allocation. Journal of machine Learning research, 3, 993–1022.
  • Charte & Charte (2015) Charte, F., & Charte, D. (2015). Working with multilabel datasets in r: the mldr package. R J, 7, 149–162.
  • Cheng et al. (2010) Cheng, Z., Caverlee, J., & Lee, K. (2010). You are where you tweet: a content-based approach to geo-locating twitter users. In Proceedings of the 19th ACM international conference on Information and knowledge management (pp. 759–768). ACM.
  • Chiu & Nichols (2015) Chiu, J. P., & Nichols, E. (2015). Named entity recognition with bidirectional lstm-cnns. arXiv preprint arXiv:1511.08308.
  • Chowdhury et al. (2013) Chowdhury, S. R., Imran, M., Asghar, M. R., Amer-Yahia, S., & Castillo, C. (2013). Tweet4act: Using incident-specific profiles for classifying crisis-related messages. In ISCRAM. Citeseer.
  • Collobert et al. (2011) Collobert, R., Weston, J., Bottou, L., Karlen, M., Kavukcuoglu, K., & Kuksa, P. (2011). Natural language processing (almost) from scratch. Journal of Machine Learning Research, 12, 2493–2537.
  • Däniken & Cieliebak (2017) Däniken, P., & Cieliebak, M. (2017). Transfer learning and sentence level features for named entity recognition on tweets. In Proceedings of the 3rd Workshop on Noisy User-generated Text (pp. 166–171).
  • Do et al. (2017) Do, T. H., Nguyen, D. M., Tsiligianni, E., Cornelis, B., & Deligiannis, N. (2017). Multiview deep learning for predicting twitter users’ location. arXiv preprint arXiv:1712.08091.
  • Dutt et al. (2018) Dutt, R., Hiware, K., Ghosh, A., & Bhaskaran, R. (2018). Savitr: A system for real-time location extraction from microblogs during emergencies. arXiv preprint arXiv:1801.07757.
  • Gelernter & Balaji (2013) Gelernter, J., & Balaji, S. (2013). An algorithm for local geoparsing of microtext. GeoInformatica, 17, 635–667.
  • Gelernter & Mushegian (2011) Gelernter, J., & Mushegian, N. (2011). Geo-parsing messages from microtext. Transactions in GIS, 15, 753–773.
  • Giridhar et al. (2015) Giridhar, P., Abdelzaher, T., George, J., & Kaplan, L. (2015). On quality of event localization from social network feeds. In Pervasive Computing and Communication Workshops (PerCom Workshops), 2015 IEEE International Conference on (pp. 75–80). IEEE.
  • Goldberg (2016) Goldberg, Y. (2016). A primer on neural network models for natural language processing. J. Artif. Intell. Res.(JAIR), 57, 345–420.
  • Gu et al. (2016) Gu, Y., Qian, Z. S., & Chen, F. (2016). From twitter to detector: Real-time traffic incident detection using social media data. Transportation research part C: emerging technologies, 67, 321–342.
  • Hecht et al. (2011) Hecht, B., Hong, L., Suh, B., & Chi, E. H. (2011). Tweets from justin bieber’s heart: the dynamics of the location field in user profiles. In Proceedings of the SIGCHI conference on human factors in computing systems (pp. 237–246). ACM.
  • Hecht-Nielsen (1992) Hecht-Nielsen, R. (1992). Theory of the backpropagation neural network. In Neural networks for perception (pp. 65–93). Elsevier.
  • Hoang et al. (2016) Hoang, T., Cher, P. H., Prasetyo, P. K., & Lim, E.-P. (2016). Crowdsensing and analyzing micro-event tweets for public transportation insights. In 2016 IEEE International Conference on Big Data (Big Data) (pp. 2157–2166). IEEE.
  • Hofmann (1999) Hofmann, T. (1999). Probabilistic latent semantic analysis. In Proceedings of the Fifteenth conference on Uncertainty in artificial intelligence (pp. 289–296). Morgan Kaufmann Publishers Inc.
  • Huang et al. (2014) Huang, Q., Cao, G., & Wang, C. (2014). From where do tweets originate?: a gis approach for user location inference. In Proceedings of the 7th ACM SIGSPATIAL International Workshop on Location-Based Social Networks (pp. 1–8). ACM.
  • Huang et al. (2015) Huang, Z., Xu, W., & Yu, K. (2015). Bidirectional lstm-crf models for sequence tagging. arXiv preprint arXiv:1508.01991.
  • Ikawa et al. (2012) Ikawa, Y., Enoki, M., & Tatsubori, M. (2012). Location inference using microblog messages. In Proceedings of the 21st International Conference on World Wide Web (pp. 687–690). ACM.
  • Imran et al. (2015) Imran, M., Castillo, C., Diaz, F., & Vieweg, S. (2015). Processing social media messages in mass emergency: A survey. ACM Computing Surveys (CSUR), 47, 67.
  • Imran et al. (2014a) Imran, M., Castillo, C., Lucas, J., Meier, P., & Rogstadius, J. (2014a). Coordinating human and machine intelligence to classify microblog communications in crises. In ISCRAM.
  • Imran et al. (2014b) Imran, M., Castillo, C., Lucas, J., Meier, P., & Vieweg, S. (2014b). Aidr: Artificial intelligence for disaster response. In Proceedings of the 23rd International Conference on World Wide Web (pp. 159–162). ACM.
  • Imran et al. (2013) Imran, M., Elbassuoni, S., Castillo, C., Diaz, F., & Meier, P. (2013). Extracting information nuggets from disaster-related messages in social media. In ISCRAM.
  • Itoh et al. (2016) Itoh, M., Yoshinaga, N., & Toyoda, M. (2016). Spatio-temporal event visualization from a geo-parsed microblog stream. In Companion Publication of the 21st International Conference on Intelligent User Interfaces (pp. 58–61). ACM.
  • Jurgens et al. (2015) Jurgens, D., Finethy, T., McCorriston, J., Xu, Y. T., & Ruths, D. (2015). Geolocation prediction in twitter using social networks: A critical analysis and review of current practice. ICWSM, 15, 188–197.
  • Kalchbrenner et al. (2014) Kalchbrenner, N., Grefenstette, E., & Blunsom, P. (2014). A convolutional neural network for modelling sentences. arXiv preprint arXiv:1404.2188.
  • Karimi & Yin (2012) Karimi, S., & Yin, J. (2012). Microtext annotation. Technical Report EP13703, CSIRO.
  • Kingma & Ba (2014) Kingma, D. P., & Ba, J. (2014). Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980.
  • Kohavi et al. (1995) Kohavi, R. et al. (1995). A study of cross-validation and bootstrap for accuracy estimation and model selection. In IJCAI (Vol. 14, pp. 1137–1145). Montreal, Canada.
  • Kumar et al. (2017) Kumar, A., Singh, J. P., & Rana, N. P. (2017). Authenticity of geo-location and place name in tweets. In 23rd Americas Conference on Information Systems (AMCIS).
  • Lample et al. (2016) Lample, G., Ballesteros, M., Subramanian, S., Kawakami, K., & Dyer, C. (2016). Neural architectures for named entity recognition. arXiv preprint arXiv:1603.01360.
  • Landwehr et al. (2016) Landwehr, P. M., Wei, W., Kowalchuck, M., & Carley, K. M. (2016). Using tweets to support disaster planning, warning and response. Safety science, 90, 33–47.
  • Laylavi et al. (2016) Laylavi, F., Rajabifard, A., & Kalantari, M. (2016). A multi-element approach to location inference of twitter: A case for emergency response. ISPRS International Journal of Geo-Information, 5, 56.
  • Laylavi et al. (2017) Laylavi, F., Rajabifard, A., & Kalantari, M. (2017). Event relatedness assessment of twitter messages for emergency response. Information Processing & Management, 53, 266–280.
  • Li & Sun (2014) Li, C., & Sun, A. (2014). Fine-grained location extraction from tweets with temporal awareness. In Proceedings of the 37th international ACM SIGIR conference on Research & development in information retrieval (pp. 43–52). ACM.
  • Li et al. (2012) Li, C., Weng, J., He, Q., Yao, Y., Datta, A., Sun, A., & Lee, B.-S. (2012). Twiner: named entity recognition in targeted twitter stream. In Proceedings of the 35th international ACM SIGIR conference on Research and development in information retrieval (pp. 721–730). ACM.
  • Limsopatham & Collier (2016) Limsopatham, N., & Collier, N. H. (2016). Bidirectional lstm for named entity recognition in twitter messages.
  • Lingad et al. (2013) Lingad, J., Karimi, S., & Yin, J. (2013). Location extraction from disaster-related microblogs. In Proceedings of the 22nd international conference on world wide web (pp. 1017–1020). ACM.
  • Liu et al. (2013) Liu, X., Wei, F., Zhang, S., & Zhou, M. (2013). Named entity recognition for tweets. ACM Transactions on Intelligent Systems and Technology (TIST), 4, 3.
  • Liu et al. (2011) Liu, X., Zhang, S., Wei, F., & Zhou, M. (2011). Recognizing named entities in tweets. In Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies-Volume 1 (pp. 359–367). Association for Computational Linguistics.
  • Lourentzou et al. (2017) Lourentzou, I., Morales, A., & Zhai, C. (2017). Text-based geolocation prediction of social media users with neural networks. In Big Data (Big Data), 2017 IEEE International Conference on (pp. 696–705). IEEE.
  • Luna & Pennock (2018) Luna, S., & Pennock, M. J. (2018). Social media applications and emergency management: A literature review and research agenda. International Journal of Disaster Risk Reduction.
  • Malmasi & Dras (2015) Malmasi, S., & Dras, M. (2015). Location mention detection in tweets and microblogs. In International Conference of the Pacific Association for Computational Linguistics (pp. 123–134). Springer.
  • Mejri et al. (2017) Mejri, O., Menoni, S., Matias, K., & Aminoltaheri, N. (2017). Crisis information to support spatial planning in post disaster recovery. International Journal of Disaster Risk Reduction, 22, 46–61.
  • Mendoza et al. (2010) Mendoza, M., Poblete, B., & Castillo, C. (2010). Twitter under crisis: Can we trust what we rt? In Proceedings of the first workshop on social media analytics (pp. 71–79). ACM.
  • Middleton et al. (2018) Middleton, S., Kordopatis-Zilos, G., Papadopoulos, S., & Kompatsiaris, Y. (2018). Location extraction from social media: geoparsing, location disambiguation and geotagging. ACM Transactions on Information Systems.
  • Middleton et al. (2014) Middleton, S. E., Middleton, L., & Modafferi, S. (2014). Real-time crisis mapping of natural disasters using social media. IEEE Intelligent Systems, 29, 9–17.
  • Mikolov et al. (2013) Mikolov, T., Sutskever, I., Chen, K., Corrado, G. S., & Dean, J. (2013). Distributed representations of words and phrases and their compositionality. In Advances in neural information processing systems (pp. 3111–3119).
  • Morstatter et al. (2013) Morstatter, F., Pfeffer, J., Liu, H., & Carley, K. M. (2013). Is the sample good enough? comparing data from twitter’s streaming api with twitter’s firehose. In ICWSM.
  • Nair & Hinton (2010) Nair, V., & Hinton, G. E. (2010). Rectified linear units improve restricted boltzmann machines. In Proceedings of the 27th international conference on machine learning (ICML-10) (pp. 807–814).
  • Nakaji & Yanai (2012) Nakaji, Y., & Yanai, K. (2012). Visualization of real-world events with geotagged tweet photos. In Multimedia and Expo Workshops (ICMEW), 2012 IEEE International Conference on (pp. 272–277). IEEE.
  • Nam et al. (2014) Nam, J., Kim, J., Mencía, E. L., Gurevych, I., & Fürnkranz, J. (2014). Large-scale multi-label text classification—revisiting neural networks. In Joint european conference on machine learning and knowledge discovery in databases (pp. 437–452). Springer.
  • Nguyen et al. (2017a) Nguyen, D. T., Al-Mannai, K., Joty, S. R., Sajjad, H., Imran, M., & Mitra, P. (2017a). Robust classification of crisis-related data on social networks using convolutional neural networks. In ICWSM (pp. 632–635).
  • Nguyen et al. (2017b) Nguyen, V. Q., Yang, H.-J., Kim, K., & Oh, A.-R. (2017b). Real-time earthquake detection using convolutional neural network and social data. In Multimedia Big Data (BigMM), 2017 IEEE Third International Conference on (pp. 154–157). IEEE.
  • Olteanu et al. (2014) Olteanu, A., Castillo, C., Diaz, F., & Vieweg, S. (2014). Crisislex: A lexicon for collecting and filtering microblogged communications in crises. In ICWSM.
  • Olteanu et al. (2015) Olteanu, A., Vieweg, S., & Castillo, C. (2015). What to expect when the unexpected happens: Social media communications across crises. In Proceedings of the 18th ACM Conference on Computer Supported Cooperative Work & Social Computing (pp. 994–1009). ACM.
  • Ozdikis et al. (2017) Ozdikis, O., Oğuztüzün, H., & Karagoz, P. (2017). A survey on location estimation techniques for events detected in twitter. Knowledge and Information Systems, 52, 291–339.
  • Panteras et al. (2015) Panteras, G., Wise, S., Lu, X., Croitoru, A., Crooks, A., & Stefanidis, A. (2015). Triangulating social multimedia content for event localization using flickr and twitter. Transactions in GIS, 19, 694–715.
  • Pennington et al. (2014) Pennington, J., Socher, R., & Manning, C. (2014). Glove: Global vectors for word representation. In Proceedings of the 2014 conference on empirical methods in natural language processing (EMNLP) (pp. 1532–1543).
  • Perol et al. (2018) Perol, T., Gharbi, M., & Denolle, M. (2018). Convolutional neural network for earthquake detection and location. Science Advances, 4, e1700578.
  • Qian et al. (2017) Qian, Y., Tang, J., Yang, Z., Huang, B., Wei, W., & Carley, K. M. (2017). A probabilistic framework for location inference from social media. arXiv preprint arXiv:1702.07281.
  • Ritter et al. (2011) Ritter, A., Clark, S., Etzioni, O. et al. (2011). Named entity recognition in tweets: an experimental study. In Proceedings of the conference on empirical methods in natural language processing (pp. 1524–1534). Association for Computational Linguistics.
  • Rossi et al. (2018) Rossi, C., Acerbo, F., Ylinen, K., Juga, I., Nurmi, P., Bosca, A., Tarasconi, F., Cristoforetti, M., & Alikadic, A. (2018). Early detection and information extraction for weather-induced floods using social media streams. International Journal of Disaster Risk Reduction.
  • Sakaki et al. (2013) Sakaki, T., Okazaki, M., & Matsuo, Y. (2013). Tweet analysis for real-time event detection and earthquake reporting system development. IEEE Transactions on Knowledge and Data Engineering, 25, 919–931.
  • Sankaranarayanan et al. (2009) Sankaranarayanan, J., Samet, H., Teitler, B. E., Lieberman, M. D., & Sperling, J. (2009). Twitterstand: news in tweets. In Proceedings of the 17th acm sigspatial international conference on advances in geographic information systems (pp. 42–51). ACM.
  • Shibuya (2017) Shibuya, Y. (2017). Mining social media for disaster management: Leveraging social media data for community recovery. In Big Data (Big Data), 2017 IEEE International Conference on (pp. 3111–3118). IEEE.
  • Sikdar & Gambäck (2016) Sikdar, U. K., & Gambäck, B. (2016). Feature-rich twitter named entity recognition and classification. In Proceedings of the 2nd Workshop on Noisy User-generated Text (WNUT) (pp. 164–170).
  • Singh et al. (2017) Singh, J. P., Dwivedi, Y. K., Rana, N. P., Kumar, A., & Kapoor, K. K. (2017). Event classification and location prediction from tweets during disasters. Annals of Operations Research (pp. 1–21).
  • Socher et al. (2013) Socher, R., Perelygin, A., Wu, J., Chuang, J., Manning, C. D., Ng, A., & Potts, C. (2013). Recursive deep models for semantic compositionality over a sentiment treebank. In Proceedings of the 2013 conference on empirical methods in natural language processing (pp. 1631–1642).
  • Srivastava et al. (2014) Srivastava, N., Hinton, G., Krizhevsky, A., Sutskever, I., & Salakhutdinov, R. (2014). Dropout: A simple way to prevent neural networks from overfitting. The Journal of Machine Learning Research, 15, 1929–1958.
  • Temnikova et al. (2015) Temnikova, I., Vieweg, S., & Castillo, C. (2015). The case for readability of crisis communications in social media. In Proceedings of the 24th international conference on world wide web (pp. 1245–1250). ACM.
  • Unankard et al. (2015) Unankard, S., Li, X., & Sharaf, M. A. (2015). Emerging event detection in social networks with location sensitivity. World Wide Web, 18, 1393–1417.
  • Vieweg et al. (2010) Vieweg, S., Hughes, A. L., Starbird, K., & Palen, L. (2010). Microblogging during two natural hazards events: what twitter may contribute to situational awareness. In Proceedings of the SIGCHI conference on human factors in computing systems (pp. 1079–1088). ACM.
  • Xu et al. (2015) Xu, W., Chow, C.-Y., Yiu, M. L., Li, Q., & Poon, C. K. (2015). Mobifeed: A location-aware news feed framework for moving users. GeoInformatica, 19, 633–669.
  • Yang et al. (2017) Yang, Z., Nguyen, L. H., Stuve, J., Cao, G., & Jin, F. (2017). Harvey flooding rescue in social media. In Big Data (Big Data), 2017 IEEE International Conference on (pp. 2177–2185). IEEE.
  • Yuan & Liu (2018) Yuan, F., & Liu, R. (2018). Feasibility study of using crowdsourcing to identify critical affected areas for rapid damage assessment: Hurricane matthew case study. International journal of disaster risk reduction, 28, 758–767.
  • Yuan et al. (2013) Yuan, Q., Cong, G., Ma, Z., Sun, A., & Thalmann, N. M. (2013). Who, where, when and what: discover spatio-temporal topics for twitter users. In Proceedings of the 19th ACM SIGKDD international conference on Knowledge discovery and data mining (pp. 605–613). ACM.
  • Zhang & Gelernter (2014) Zhang, W., & Gelernter, J. (2014). Geocoding location expressions in twitter messages: A preference learning method. Journal of Spatial Information Science, 2014, 37–70.
  • Zheng et al. (2018) Zheng, X., Han, J., & Sun, A. (2018). A survey of location prediction on twitter. IEEE Transactions on Knowledge and Data Engineering.
  • Zhou et al. (2017) Zhou, L., Wu, X., Xu, Z., & Fujita, H. (2017). Emergency decision making for natural disasters: An overview. International Journal of Disaster Risk Reduction.