Semantic-based End-to-End Learning for Typhoon Intensity Prediction

03/22/2020 ∙ by Hamada M. Zahera, et al. ∙ Universität Paderborn

Disaster prediction is one of the most critical tasks in disaster surveillance and preparedness. Existing technologies employ different machine learning approaches to predict incoming disasters from historical environmental data. However, for short-term disasters (e.g., earthquakes), historical data alone has limited predictive capability. Therefore, additional sources of warnings are required for accurate prediction. We consider social media as a supplementary source of knowledge in addition to historical environmental data. However, social media posts (e.g., tweets) are very informal and contain only limited content. To alleviate these limitations, we propose combining semantically-enriched word embeddings, which represent entities in tweets by their semantic representations, with word representations computed with traditional word2vec. Moreover, we study the correlation between social media posts and typhoon magnitudes (also called intensities) in terms of the volume and sentiments of tweets. Based on these insights, we propose an end-to-end framework that learns from disaster-related tweets and environmental data to improve typhoon intensity prediction. This paper is an extension of our work originally published in K-CAP 2019 [32]. We extended that work by building our framework with state-of-the-art deep neural models, updating our dataset with new typhoons and their tweets to date, and benchmarking our approach against recent baselines in disaster prediction. Our experimental results show that our approach outperforms state-of-the-art baselines in terms of F1-score (e.g., the CNN baseline by 12.1 percentage points).




1 Introduction

Disaster prediction and early warnings are crucial for mitigating the impact of disasters and the consequent damage [8]. Despite significant improvements in forecasting and warning systems, several factors still limit the accuracy of current prediction algorithms, such as the lack of complete data on natural hazards and of monitoring instruments, and the highly dynamic nature of natural hazards [21]. Interestingly, social media plays an increasingly significant role in disaster management and communication [22]. People use social media during disasters to share their feelings, ask for help and support disaster relief efforts.

A significant body of research has hence leveraged shared disaster-related information in social media to reduce the impact of disasters and deliver faster responses [12, 10]. For instance, the authors of [24] analyzed user tweets during 25 different earthquakes in Japan, where they demonstrated how social media users can act as a reliable source to provide real-time situational updates during disasters. On the other hand, decision makers use social media to engage with the public quickly and widely. For example, during typhoon Pablo in 2012, local authorities in the Philippines asked people to use the hashtag #pabloph for getting or sharing on-site updates about the typhoon [26].

Correlations between social media activity and the progression of a disaster are valuable for supporting decision makers in emergency response processes. Previous works used data mining techniques to extract such correlations. For instance, [3] applied a wavelet analysis to track disaster progression from social media data. Their results showed that wavelet-based features can preserve text semantics and predict the total duration of localized small-scale disasters.

In this work, we propose an end-to-end learning model to classify the intensity of typhoons (also called a typhoon’s category or class [6]) by learning from environmental data and social media (tweets). We were inspired by previous works (see, e.g., [27, 18]) which suggest that the joint learning of multiple models can significantly outperform standalone models. Our proposed approach consists of two jointly-trained models. The first model (dubbed Feature Extractor) analyzes typhoon-related tweets and computes statistical features (i.e., tweet volume and sentiment variances). To capture tweet sentiments, we employ a semantics-enriched word embedding in which entities are recognized and represented as semantic vectors. The second model (dubbed Typhoon Classifier) takes as input the combined features extracted by the first model and the environmental data. Both models are trained jointly through a shared loss function and their learning parameters are optimized using the same gradient descent.

We evaluate our joint model in experiments on two real sources of data. First, we use environmental data of typhoons tracked by the Joint Typhoon Warning Center (JTWC). The dataset contains measurements of climate variables (e.g., wind speed and sea level pressure) before, during and after typhoon landfall. As a second dataset, we rely on typhoon-related tweets collected using keyword-based queries executed during periods of typhoons between 2006 and 2018. We employ different architectures based on Deep Neural Networks (DNN), Convolutional Neural Networks (CNN) and Recurrent Neural Networks (e.g., RNN, LSTM and BiLSTM) as our baseline approaches. Our results suggest that our jointly-trained models outperform these baseline solutions in disaster prediction. We summarize the main contributions of this paper as follows:

  1. We propose a generic end-to-end framework that improves the overall system performance via joint learning.

  2. We conduct several experiments on a real disaster dataset to evaluate the performance of our proposed approaches. Our results clearly show that our proposed framework outperforms the state-of-the-art standalone baselines significantly.

  3. We study the impact of incorporating semantic embeddings from knowledge graphs to enrich the representation of tweets. Our experiments show that feeding our model with semantic representations of the entities included in the tweets improves the overall system performance.

  4. We provide an updated version of our disaster dataset (TED), which includes typhoon environmental data and the corresponding tweets up to 2018 (the last date archived by JTWC).

All our implementations are open-source and available from the project website.


2 Data and Preliminaries

In this section, we discuss our social media analysis during different typhoons and the dataset collection and preprocessing.

2.1 Social Media Content Analysis During Typhoons

Typhoon environmental data are tracked periodically (i.e. at regular time intervals) before, during and after they strike. The goal of our work is to detect the intensity of typhoons not only based on such environmental data but also based on data collected by humans in the form of social media posts during each of the typhoons under study.

To pair social media with the environmental data of typhoons, we collect all tweets posted within the time slot of the respective environmental data into one batch. Inspired by the work of [11, 9], we then analyzed the volume of tweets as well as their sentiments during different time slots of typhoons. Figure 1 shows our content analysis of tweets during four different typhoons. The upper row of plots shows how typhoon intensities vary over the days of each typhoon. The middle row presents the count of tweets within the same time slots as the intensities in the upper row. Finally, the lower row presents the count of tweets with positive (in blue) and negative (in yellow) sentiment. Comparing the respective plots of intensity, tweet count and sentiments for each of the four typhoons, we can see a correlation between a typhoon’s intensity and the count and sentiment of tweets. Accordingly, we use the count of tweets and their sentiments as additional indicators for predicting typhoon intensity.

Figure 1: Our analysis of social media during the typhoons HAGUIT, HAIYAN, RAMMASUN, and SANBA
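The pairing of tweets with environmental time slots described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name, the 6-hour slot spacing and the sample tweets are hypothetical, and the last slot is treated as open-ended.

```python
from bisect import bisect_right
from collections import defaultdict
from datetime import datetime

def batch_tweets_by_slot(slot_starts, tweets):
    """Group tweets by the environmental time slot they fall into.

    slot_starts: sorted list of datetimes, one per environmental record.
    tweets: iterable of (timestamp, text) pairs.
    Returns {slot_index: [text, ...]}; tweets posted before the first
    slot are dropped, and the last slot is open-ended (an assumption).
    """
    batches = defaultdict(list)
    for posted_at, text in tweets:
        # Index of the latest slot that starts at or before the tweet.
        idx = bisect_right(slot_starts, posted_at) - 1
        if idx >= 0:
            batches[idx].append(text)
    return dict(batches)

# Hypothetical 6-hourly slots and invented sample tweets.
slots = [datetime(2013, 11, 7, h) for h in (0, 6, 12, 18)]
tweets = [
    (datetime(2013, 11, 7, 1, 30), "stay safe everyone"),
    (datetime(2013, 11, 7, 7, 5), "winds getting stronger"),
    (datetime(2013, 11, 7, 7, 45), "power is out"),
    (datetime(2013, 11, 6, 23, 0), "posted before the first slot"),
]
batches = batch_tweets_by_slot(slots, tweets)
volume = {idx: len(texts) for idx, texts in batches.items()}
```

The per-slot `volume` is the tweet-count indicator; sentiment counts per slot would be derived from the same batches.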

2.2 Dataset

We used three datasets to evaluate the performance of our proposed models (details of each dataset are presented in Table 1).

  • JTWC Best-Track (typhoon environmental data). This dataset consists of tracked data points for 70 typhoons between 2006 and 2016. Each data track is labelled with one of four classes (TD: tropical depression, TS: tropical storm, TY: typhoon and ST: super typhoon). The typhoon data also include the time-stamp, location, maximum wind speed (VMAX), wind intensity (RAD) and sea level pressure (MSLP). We removed noisy and corrupted data (e.g., missing values). Then, we handled the class imbalance problem by over-sampling minority classes with the SMOTE technique [5].

  • Typhoon Tweets. To search and retrieve typhoon-related tweets, we executed keyword queries based on related typhoon terminology, such as "typhoon" and typhoon names (e.g., Haiyan). The official Twitter streaming API limits free access to tweets from only the past 7 days. However, we were able to retrieve all tweets using the open-source library GetOldTweets-python.

  • Stanford NLP Sentiment140. We used the Stanford sentiment dataset to enrich our model’s performance in sentiment detection. This dataset contains 1.6 million labeled tweets with binary sentiments (positive/negative).

Dataset            Training     Testing    Classes
JTWC Best-Track    2,529        633        4
Typhoon Tweets     1,052,599    270,364    unlabeled
Sentiment140       1,280,000    320,000    2
Table 1: Details of datasets.
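To illustrate the over-sampling idea behind SMOTE [5], the sketch below creates synthetic minority samples by interpolating between a minority point and one of its nearest minority neighbours. It is a simplified, SMOTE-style illustration in plain Python, not the reference algorithm or the authors' code; the function name and the sample (VMAX, MSLP) values are hypothetical.

```python
import random

def smote_like_oversample(minority, n_new, k=2, seed=0):
    """Create n_new synthetic points by interpolating between a minority
    sample and one of its k nearest minority neighbours (SMOTE-style)."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest neighbours of `base` within the minority class,
        # ranked by squared Euclidean distance (excluding `base` itself).
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)),
        )[:k]
        other = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> other
        synthetic.append(tuple(a + gap * (b - a) for a, b in zip(base, other)))
    return synthetic

# Hypothetical (VMAX, MSLP) records of an under-represented class.
minority = [(100.0, 930.0), (110.0, 925.0), (120.0, 915.0)]
synthetic = smote_like_oversample(minority, n_new=4)
```

Because every synthetic point lies on a segment between two real minority samples, the new records stay within the range of the observed minority data.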

Data Preprocessing

Tweets are commonly informal and often contain noisy and incomplete text. Hence, the preprocessing of tweets often involves varied techniques to achieve high-quality analysis in data mining applications. In particular, we carry out the following preprocessing steps:

  • Cleaning up. We remove URLs, non-ASCII characters, usernames and hashtags from all tweets. Removing stop words is a common step in standard text preprocessing. However, in our work, we keep stop words to preserve the context of words and obtain an accurate sentiment analysis.

  • Entity recognition. There are several tools for entity extraction and semantic reasoning. We use the spaCy API to annotate entities in tweets. spaCy is an open-source Natural Language Processing (NLP) library that is widely used due to its availability and high accuracy in different linguistic tasks [2].

  • Tokenization. Our final tweet preprocessing step is to tokenize tweets into words and convert them to lower-case letters.

Example 1

The preprocessing of the tweet "My heart goes out to all those affected by Typhoon Haiyan. You can help by donating to the Philippine RED CROSS here (link:" will remove the URL at the end of the tweet and generate the following tokenized word vector: [my, heart, goes, out, to, all, those, affected, by, typhoon, haiyan, you, can, help, by, donating, to, the, philippine, red, cross, here].
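The cleaning and tokenization steps can be sketched with the standard library alone. This is a minimal illustration under assumptions: the regular expressions, the function name and the sample URL, username and hashtag are hypothetical, and entity recognition is omitted.

```python
import re

URL_RE = re.compile(r"https?://\S+|www\.\S+")
MENTION_OR_HASHTAG_RE = re.compile(r"[@#]\w+")
TOKEN_RE = re.compile(r"[a-z0-9']+")

def preprocess_tweet(text):
    """Clean a tweet and return its lower-cased tokens (stop words kept)."""
    text = URL_RE.sub(" ", text)                  # remove URLs
    text = MENTION_OR_HASHTAG_RE.sub(" ", text)   # remove usernames and hashtags
    text = text.encode("ascii", errors="ignore").decode("ascii")  # drop non-ASCII
    return TOKEN_RE.findall(text.lower())         # tokenize and lower-case

tweet = (
    "My heart goes out to all those affected by Typhoon Haiyan. "
    "You can help by donating to the Philippine RED CROSS here "
    "https://example.org/donate @redcross #Haiyan"
)
tokens = preprocess_tweet(tweet)
```

Stop words such as "to" and "the" are deliberately retained, in line with the cleaning step described above.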

3 The Approach

In this section, we first discuss the problem formulation of joint training on social media and environmental data to improve typhoon intensity prediction. Our approach takes two inputs: typhoon environmental data and tweet batches. Figure 2 shows the architecture (BiLSTM+CNN) of our joint models (Feature Extractor and Typhoon Classifier). In the rest of this section, we discuss our semantics-enriched word embedding for representing tweets and the architectures of the joint models.

Figure 2: Our joint model (BiLSTM+CNN) for typhoon intensity classification. The entity vectors (in orange) are extracted from a knowledge graph (e.g., ConceptNet). The word vectors (in blue) are obtained from our word embedding.

3.1 Problem Formulation

Let $D = (X, Y)$ be a typhoon environmental dataset, where $X = \{x_1, \dots, x_n\}$ is a set of typhoon observations and $Y = \{y_1, \dots, y_n\}$ is a set of typhoon categories (i.e., labels, classes). Each $x_i \in X$ represents an instance of typhoon data with $m$ features (e.g., time-stamp, wind speed, sea level pressure and gust) and each $y_i \in Y$ represents the respective typhoon’s category (i.e., tropical depression, tropical storm, typhoon or super typhoon).

For each typhoon observation with environmental data $x_i$, we collect all related tweets posted within the time slot of $x_i$; we dub this batch of tweets $T_i$. Further, we analyse all tweets in $T_i$ to extract statistical features (i.e., tweets-based features) such as tweet volume and the variances of tweet sentiments. Finally, we combine these features with the typhoon’s environmental data in one input vector.

Definition 1

(Task Description.) Our goal is to design a classification model that learns features from the environmental data $X$ and the tweet batches $T$ in order to predict the typhoon category ($\hat{y}$). We build our classification model $M$ as a joint model of two cascaded models $M_F$ and $M_C$, dubbed the feature extractor and the typhoon classifier respectively (i.e., $M = (M_F, M_C)$). To ensure the joint training of $M_F$ and $M_C$, we combine the loss functions of both models ($\mathcal{L}_F$, $\mathcal{L}_C$) in one shared loss function as follows:

$$\mathcal{L} = \lambda_F \mathcal{L}_F + \lambda_C \mathcal{L}_C \qquad (1)$$

where the parameters $\lambda_F$ and $\lambda_C$ balance the individual loss functions ($\mathcal{L}_F$, $\mathcal{L}_C$). In this paper, we set all parameters $\lambda$ to 1. To compute the training losses, we use the cross-entropy loss function:

$$\mathcal{L}(y, \hat{y}) = -\sum_{i} y_i \log \hat{y}_i \qquad (2)$$

where $y_i$ and $\hat{y}_i$ denote the target and predicted typhoon categories, respectively, for typhoon instance $i$.

Definition 2

(Joint-Learning). Let $\theta_F$ and $\theta_C$ be the learning parameters of the feature extractor and typhoon classifier models. Both $\theta_F$ and $\theta_C$ are optimized concurrently as follows: assume two consecutive batches of training data $b_t$ and $b_{t+1}$. On batch $b_t$, the learning parameters are optimized using the same gradient descent (e.g., the Adam optimizer) by backpropagating the gradients to both models. In the following batch ($b_{t+1}$), the computation of statistical features by the feature extractor is hence adapted not only to the losses in its own outputs, but also to the losses in the final output of the typhoon classifier. Therefore, the feature extractor feeds adaptive features from social media to the typhoon classifier.
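The shared loss of Definition 1 can be sketched in plain Python: each model contributes a cross-entropy term, and the two terms are combined with balancing weights that are both set to 1, as stated in the text. The function names and the example probabilities are hypothetical; in practice a deep learning framework would compute these terms and backpropagate through both models.

```python
import math

def cross_entropy(target, predicted, eps=1e-12):
    """Cross-entropy between a one-hot target and predicted probabilities."""
    # eps guards against log(0) for zero-probability predictions.
    return -sum(t * math.log(max(p, eps)) for t, p in zip(target, predicted))

def joint_loss(loss_extractor, loss_classifier,
               lam_extractor=1.0, lam_classifier=1.0):
    """Weighted sum of the feature-extractor and typhoon-classifier losses."""
    return lam_extractor * loss_extractor + lam_classifier * loss_classifier

# Hypothetical outputs for one training example.
sentiment_loss = cross_entropy([1, 0], [0.8, 0.2])                  # positive / negative
category_loss = cross_entropy([0, 0, 1, 0], [0.1, 0.2, 0.6, 0.1])   # TD, TS, TY, ST
total = joint_loss(sentiment_loss, category_loss)
```

Because a single scalar `total` is minimized, its gradient reaches the parameters of both models, which is what makes the extracted tweet features adapt to the classifier's errors.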

3.2 Semantics-enriched Word Embedding

Our analysis of tweets during different typhoons (see Section 2.1) suggests that we can use tweet volume and sentiments as additional features in our model. To perform sentiment analysis on crisis tweets, we used the continuous skip-gram approach [13] to train our word embedding model on typhoon tweets and the Sentiment140 dataset. While generic word embeddings trained on large-scale general-purpose datasets (e.g., Wikipedia and Google News) could have been used here, they often do not capture domain-specific knowledge and semantic nuances. In contrast, domain-adapted word embeddings are effective in the domain on which they are trained, as they capture domain-specific knowledge [25].

Now, given a preprocessed list of words of a tweet $t = (w_1, \dots, w_n)$, we map each word $w_j$ to its embedding vector $v_j \in \mathbb{R}^d$, where $d$ is the embedding dimension. Unlike classical word embeddings, we represent entities with their corresponding vectors from the semantic knowledge base ConceptNet, where entities and relationships are projected into the same embedding space. The motivation behind combining semantic embeddings with classical word embeddings is the superior performance of semantic embeddings in modern data mining applications [16].

Example 2

The semantics-enriched word embedding vector of the tweet from Example 1 would be [my, heart, goes, out, to, all, those, affected, by, typhoon, haiyan, you, can, help, by, donating, to, the, philippine, red, cross, here]. Note that, the bold words represent semantic entities.

For each input tweet, we build an embedding matrix $E \in \mathbb{R}^{n \times d}$, where $n$ is the number of words per tweet. Each row of $E$ represents the word2vec embedding of the word at the corresponding position in a tweet. Our word2vec model has dimension $d$, and its vocabulary comprises words and recognized entities. Due to the variable lengths of tweets, we fixed $n$ to the average number of words per tweet to maintain a regular embedding matrix. For this reason, we truncated longer tweets and padded shorter tweets with zeros.
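The construction of a fixed-size embedding matrix, with entity vectors taking precedence over ordinary word vectors, can be sketched as follows. This is an illustration under assumptions: the function name and the toy vectors are hypothetical, entity and word vectors are assumed to share one dimension, and mapping unknown tokens to zero vectors is our choice rather than something stated in the text.

```python
def build_embedding_matrix(tokens, word_vectors, entity_vectors, max_len, dim):
    """Return a max_len x dim matrix (list of rows) for one tweet.

    Tokens found in entity_vectors use the knowledge-graph vector; other
    tokens use the word embedding. Unknown tokens and padding rows are zeros.
    """
    zero = [0.0] * dim
    rows = []
    for token in tokens[:max_len]:  # truncate tweets longer than max_len
        vector = entity_vectors.get(token) or word_vectors.get(token) or zero
        rows.append(list(vector))
    # Pad tweets shorter than max_len with zero rows.
    rows.extend(list(zero) for _ in range(max_len - len(rows)))
    return rows

# Toy 3-dimensional vectors, invented for illustration.
word_vectors = {"help": [0.1, 0.2, 0.3], "typhoon": [0.9, 0.9, 0.9]}
entity_vectors = {"typhoon": [0.5, 0.4, 0.3]}
matrix = build_embedding_matrix(
    ["help", "typhoon", "now"], word_vectors, entity_vectors, max_len=4, dim=3
)
```

Here "typhoon" receives its entity vector rather than its word vector, "now" is out of vocabulary, and the fourth row is padding.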

3.3 Feature Extractor

The feature extractor is the first model in our approach; it aims to extract statistical features from tweets. To model the word sequences in tweets, we employ a bidirectional LSTM (BiLSTM) model. First, we use an embedding look-up layer to map words to their corresponding vectors from the semantics-enriched word embedding model. Then, we employ one BiLSTM layer with dropout, followed by one dense layer with softmax output.

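Given a sequence of input words $(w_1, \dots, w_n)$ with embedding vectors $(v_1, \dots, v_n)$, the BiLSTM sees the context of each word $w_t$ in both directions (left-to-right $\overrightarrow{h_t}$ and right-to-left $\overleftarrow{h_t}$) and summarizes the information into a concatenated output vector $h_t = [\overrightarrow{h_t}; \overleftarrow{h_t}]$. The BiLSTM layer associates each time step $t$ with an input gate $i_t$, a memory cell $c_t$, a forget gate $f_t$ and an output gate $o_t$. The output vector $h_t$ is then computed by iterating the following equations:

$$
\begin{aligned}
i_t &= \sigma(W_i v_t + U_i h_{t-1} + b_i) \\
f_t &= \sigma(W_f v_t + U_f h_{t-1} + b_f) \\
o_t &= \sigma(W_o v_t + U_o h_{t-1} + b_o) \\
c_t &= f_t \odot c_{t-1} + i_t \odot \tanh(W_c v_t + U_c h_{t-1} + b_c) \\
h_t &= o_t \odot \tanh(c_t)
\end{aligned}
\qquad (3)
$$

where $W_i, W_f, W_c, W_o$ are the weight matrices of the input, forget, memory and output gates applied to the input vector $v_t$. Respectively, $U_i, U_f, U_c, U_o$ are the weight matrices applied to the previous hidden vector $h_{t-1}$, and $b_i, b_f, b_c, b_o$ are the bias terms for the four gates. $\sigma$ and $\odot$ denote the sigmoid function and element-wise multiplication, respectively.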
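One time step of a standard LSTM cell, and its bidirectional use, can be sketched in plain Python. This follows the common textbook LSTM formulation, which we assume matches the layer described here; the parameter layout (`params` mapping a gate name to weights `W`, `U` and bias `b`) and all function names are hypothetical, and real models would use a deep learning framework instead.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def affine(W, U, b, x, h):
    """Compute W x + U h + b for list-based matrices and vectors."""
    return [
        sum(w * xi for w, xi in zip(W_row, x))
        + sum(u * hi for u, hi in zip(U_row, h))
        + b_k
        for W_row, U_row, b_k in zip(W, U, b)
    ]

def lstm_step(x, h_prev, c_prev, params):
    """One LSTM time step; params maps gate name -> (W, U, b)."""
    i = [sigmoid(v) for v in affine(*params["i"], x, h_prev)]    # input gate
    f = [sigmoid(v) for v in affine(*params["f"], x, h_prev)]    # forget gate
    o = [sigmoid(v) for v in affine(*params["o"], x, h_prev)]    # output gate
    g = [math.tanh(v) for v in affine(*params["c"], x, h_prev)]  # candidate memory
    c = [f_k * c_k + i_k * g_k for f_k, c_k, i_k, g_k in zip(f, c_prev, i, g)]
    h = [o_k * math.tanh(c_k) for o_k, c_k in zip(o, c)]
    return h, c

def run_direction(sequence, params, hidden_size):
    """Run the cell over a sequence, starting from zero state."""
    h, c = [0.0] * hidden_size, [0.0] * hidden_size
    outputs = []
    for x in sequence:
        h, c = lstm_step(x, h, c, params)
        outputs.append(h)
    return outputs

def bilstm(sequence, fwd_params, bwd_params, hidden_size):
    """Concatenate left-to-right and right-to-left hidden states per step."""
    forward = run_direction(sequence, fwd_params, hidden_size)
    backward = run_direction(sequence[::-1], bwd_params, hidden_size)[::-1]
    return [f + b for f, b in zip(forward, backward)]

# Toy parameters: 2-dimensional inputs, hidden size 1, all weights zero.
zero_gate = ([[0.0, 0.0]], [[0.0]], [0.0])
zero_params = {name: zero_gate for name in ("i", "f", "o", "c")}
h1, c1 = lstm_step([1.0, 2.0], [0.0], [1.0], zero_params)
states = bilstm([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]], zero_params, zero_params, 1)
```

With all-zero weights every gate evaluates to 0.5 and the candidate memory to 0, so the step simply halves the previous cell state; this makes the toy call easy to check by hand.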

The outputs of the BiLSTM are the probabilities of positive and negative sentiments, computed by the softmax function in Equation 5. Subsequently, we extract statistical features from the BiLSTM outputs and combine them with the typhoon data as $X' = [X; S]$, a matrix with one row per typhoon instance and one column per feature. $S$ represents the statistical features: tweets count, variance of negative sentiments and variance of positive sentiments. The variances of sentiments are computed as follows:

$$\mathrm{Var}(s) = \frac{1}{N} \sum_{j=1}^{N} (s_j - \bar{s})^2 \qquad (4)$$

where $s_j$ denotes the predicted sentiment of tweet $j$, $\bar{s}$ is the average of sentiments and $N$ is the tweets count.

$$P(y = k \mid z) = \frac{\exp(z_k)}{\sum_{l=1}^{K} \exp(z_l)} \qquad (5)$$

Given an input vector $z$, where $K$ is the number of typhoon categories, $P(y = k \mid z)$ denotes the probability of typhoon category $k$.
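The tweets-based features (tweet count and the variances of negative and positive sentiment scores) and the softmax function can be sketched with the standard library. Two points are assumptions on our part: the population variance (division by the tweet count) is used, and the sample probabilities and environmental values are invented for illustration.

```python
import math
from statistics import pvariance

def tweet_features(sentiment_probs):
    """sentiment_probs: non-empty list of (p_positive, p_negative) pairs,
    one per tweet in a batch. Returns
    [tweet count, variance of negative scores, variance of positive scores]."""
    positives = [p for p, _ in sentiment_probs]
    negatives = [n for _, n in sentiment_probs]
    return [len(sentiment_probs), pvariance(negatives), pvariance(positives)]

def softmax(scores):
    """Numerically stable softmax over a list of raw scores."""
    shift = max(scores)
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical sentiment outputs for a batch of three tweets.
batch = [(0.9, 0.1), (0.5, 0.5), (0.1, 0.9)]
features = tweet_features(batch)

# Hypothetical environmental record (VMAX, MSLP) extended with tweet features.
combined = [95.0, 960.0] + features
category_probs = softmax([1.0, 2.0, 3.0, 4.0])  # one score per typhoon category
```

The `combined` vector is the kind of input the typhoon classifier receives: environmental measurements followed by the three statistical features.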

3.4 Typhoon Classifier

The Typhoon Classifier takes input features from the Feature Extractor (see Section 3.3) to predict the typhoon intensity as a final output. We explored different deep architectures (i.e., the DNN, CNN and RNN models) as baselines to benchmark the performance of our proposed joint-training model for typhoon prediction. In the following, we discuss the architecture of the Typhoon Classifier used in our joint approach, as presented in Figure 2.

We employed two convolutional layers (with ReLU activation function). The first CNN layer defines a filter (also called a feature detector) with a given output dimension and kernel size. Defining only one filter would allow the neural network to learn only a single feature in the first layer. The result of the first CNN layer is fed into the second CNN layer, where another filter is defined with its own output dimension and kernel size. After each convolutional operation, we subsample the output with a max-pooling layer (also called a pooling operation), which eliminates non-maximal values and reduces computation in later layers. We also applied dropout after the first and second convolutional layers to avoid overfitting and increase the model’s robustness. Finally, we used a fully connected layer with softmax activation function to compute the output probabilities for all typhoon categories as in Equation 5. At the end, the class with the highest probability is returned as the model output.
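The convolution, ReLU and max-pooling operations used by the classifier can be illustrated with a single filter in plain Python. This is a toy sketch only: the kernel, input values and pooling size are invented, and the actual layers would have many filters and learned weights.

```python
def conv1d_relu(signal, kernel, bias=0.0):
    """Valid 1D convolution (cross-correlation) followed by ReLU."""
    k = len(kernel)
    out = []
    for start in range(len(signal) - k + 1):
        window = signal[start:start + k]
        value = sum(w * x for w, x in zip(kernel, window)) + bias
        out.append(max(0.0, value))  # ReLU clamps negative responses to zero
    return out

def max_pool1d(values, size=2):
    """Non-overlapping max pooling; a trailing partial window is dropped."""
    return [max(values[i:i + size]) for i in range(0, len(values) - size + 1, size)]

features = [1.0, 3.0, 2.0, 5.0, 4.0, 0.0]
activations = conv1d_relu(features, kernel=[1.0, -1.0])
pooled = max_pool1d(activations, size=2)
```

The filter `[1.0, -1.0]` responds where a value is larger than its successor, and pooling keeps only the strongest response in each window, which is how non-maximal values are discarded.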

4 Experiments

We conducted several sets of experiments to benchmark the performance of the baseline models as well as our proposed approach to predict the intensity of typhoons. The aim of our evaluation is to answer the following two research questions:

  • To what extent can social media improve the performance of state-of-the-art disaster prediction approaches?

  • What is the impact of semantic embedding of tweets representation on the performance of our proposed approach?

In the rest of this section, we begin by describing the baseline approaches and evaluation metrics. Thereafter, we analyze our results and answer each of our research questions in detail.

4.1 Baselines

We benchmarked our approach against different baselines, including traditional ML and deep neural models. We selected the SVM classifier as our traditional baseline as it outperforms the other traditional classifiers [4]. For our experiment, we implemented an RBF-kernel SVM classifier trained with typhoon environmental data. We also used four benchmarks from deep neural models (i.e., DNN, RNN, CNN and BiLSTM), which achieved good performance in disaster-related research [7, 29]. In particular, a CNN-based model with semantics has been employed to classify disaster-related social media data. The authors of [4] suggested enriching the data representation by recognizing entities in tweets (also called a bag of concepts) and adding their presence (0 or 1) as additional features. However, their representation does not capture the context in which an entity occurs.

In our approach, we recognize entities and extract their semantic representations from an external knowledge graph embedding. The semantic embedding captures not only the existence of an entity, but also the context in which it occurs. We specify the proposed approaches used in the experiments as follows:

  • LSTM+DNN (word embedding): Our first approach, in which two deep neural models (LSTM and DNN) are jointly trained with combined features from environmental data and word embeddings.

  • LSTM+DNN (semantic embedding): The same model as LSTM+DNN (word embedding), but using the semantics-enriched word embedding for data representation. See Section 3.2 for more details.

  • LSTM+RNN (word embedding): Our second approach, in which two deep neural models (LSTM and RNN) are jointly trained with combined features from environmental data and word embeddings.

  • LSTM+RNN (semantic embedding): The same model as LSTM+RNN (word embedding), but using the semantics-enriched word embedding.

  • BiLSTM+CNN (word embedding): Our third approach, in which two deep neural models (BiLSTM and CNN) are jointly trained with combined features from environmental data and word embeddings.

  • BiLSTM+CNN (semantic embedding): The same model as BiLSTM+CNN (word embedding), but using the semantics-enriched word embedding.

To ensure a fair performance evaluation, we evaluated our proposed models with the same architectures and hyper-parameters settings used to configure the baselines. Moreover, we evaluated the impact of incorporating semantics embedding from external knowledge graphs in comparison with the traditional word embedding of our input tweet dataset.

4.2 Evaluation Metrics

We considered standard evaluation metrics to assess the models’ performance in the task of typhoon prediction. We divided the dataset (described in Section 2.2) into train-test splits of 80%-20% respectively. We trained each model for a fixed number of epochs and then evaluated the overall performance metrics (accuracy, precision, recall and F1-score) on the test dataset, as reported in Table 2.

Model         Description      A       P       R       F
SVM           Baseline         0.579   0.347   0.579   0.430
DNN           Baseline         0.756   0.809   0.756   0.781
RNN           Baseline         0.802   0.827   0.802   0.814
CNN           Baseline         0.702   0.918   0.702   0.796
BiLSTM        Baseline         0.840   0.880   0.840   0.859
LSTM+DNN      Word emb.        0.873   0.892   0.873   0.882
LSTM+DNN      Semantic emb.    0.917   0.922   0.925   0.917
LSTM+RNN      Word emb.        0.860   0.875   0.860   0.855
LSTM+RNN      Semantic emb.    0.891   0.904   0.891   0.891
BiLSTM+CNN    Word emb.        0.847   0.938   0.847   0.890
BiLSTM+CNN    Semantic emb.    0.902   0.933   0.902   0.917
Table 2: Performance evaluation on the test dataset using Accuracy (A), Precision (P), Recall (R) and F1-score (F). Our state-of-the-art models and their baselines are marked in gray.
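For reference, the four metrics can be computed from true and predicted labels as sketched below. The averaging scheme behind the table is not spelled out in this section, so support-weighted averaging over classes is an assumption here, and the label sequences are invented.

```python
from collections import Counter

def weighted_scores(y_true, y_pred):
    """Accuracy plus support-weighted precision, recall and F1."""
    labels = sorted(set(y_true))
    support = Counter(y_true)
    total = len(y_true)
    accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / total
    precision = recall = f1 = 0.0
    for label in labels:
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        predicted = sum(p == label for p in y_pred)
        p_c = tp / predicted if predicted else 0.0   # per-class precision
        r_c = tp / support[label]                    # per-class recall
        f_c = 2 * p_c * r_c / (p_c + r_c) if p_c + r_c else 0.0
        weight = support[label] / total              # class share of the test set
        precision += weight * p_c
        recall += weight * r_c
        f1 += weight * f_c
    return accuracy, precision, recall, f1

# Invented labels using the four typhoon classes.
y_true = ["TD", "TS", "TS", "TY", "TY", "ST"]
y_pred = ["TD", "TS", "TY", "TY", "TY", "TY"]
acc, prec, rec, f1 = weighted_scores(y_true, y_pred)
```

Under support-weighted averaging, recall always equals accuracy, which is consistent with the A and R columns agreeing in most rows of Table 2; this is an observation about the sketch, not a statement of how the reported numbers were produced.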

4.3 Discussion and Result Analysis

To answer the first research question, we evaluated the baseline models with features extracted from the environmental data (see Section 2.2). Further, we used the same features, in addition to the features extracted from relevant tweets (i.e., tweets-based features), to train our proposed models. In particular, we computed the additional features of tweet count and the variances of positive and negative tweet sentiment.

Table 2 shows how challenging the task of typhoon intensity prediction is: the best baseline model (i.e., the BiLSTM model) achieves an accuracy of 0.840. All the baseline models were trained on environmental data, which are captured by sensor devices and usually include noisy and incomplete data [14]. In contrast, our proposed models demonstrate significantly improved performance when tweets-based features are incorporated with environmental data. In particular, our proposed BiLSTM+CNN model with semantic embedding outperforms its respective baselines (CNN by 12.1 and BiLSTM by 5.8 percentage points) on the micro-average F1-score. The other proposed models (LSTM+DNN and LSTM+RNN) also outperform their respective baselines (DNN and RNN).

To summarize our answer, training our proposed model with relevant features from social media in addition to environmental data outperforms the baseline approaches in both accuracy and F1-measure. To understand the superior performance of our models, we evaluated the importance of the training features using the Random Forests algorithm (we used the implementation from scikit-learn). As shown in Figure 4, tweet-based features were found to be more important than environmental features. As discussed in Section 3.1, tweet-based features adapt to the prediction losses in our joint model, which helps to fit the features through joint training and improves the accuracy of the final prediction.

To answer the second research question, we investigated the impact of incorporating semantic embeddings on the performance of our proposed system. Our experiments showed improved performance in terms of accuracy, precision, recall and F1-measure for the models based on semantic embeddings over those based only on word embeddings. In particular, compared to the word embedding based models, the accuracy of our proposed joint-model architectures improves in both the LSTM+DNN model (from 0.873 to 0.917) and the LSTM+RNN model (from 0.860 to 0.891). As discussed earlier in Section 3.2, the semantic embedding from the knowledge graph enriches the representation of entities and their relationships as semantic vectors.

Training Robustness.

We validated the training robustness of each model and checked for over-fitting. As shown in Figure 3, all models achieve robust performance in the testing phase as well as in training; we alleviated over-fitting by tuning the hyper-parameters.

Figure 3: Over-fitting evaluation.
Figure 4: Importance of environmental and tweets-based features.

5 Related Works

Our work is related to social media analysis for disaster management, jointly-trained models and semantics-enrichment data mining. In the following, we briefly present the related state of the art in each of these areas in turn.

5.1 Social media analysis for disaster management

Several studies leveraged the role of social media in disaster management [1, 24, 28, 19]. For example, Yury et al. [11] proposed a social media-based framework to obtain initial damage estimates from social media (i.e., tweets and images). Their system analyzed users’ activities on Twitter before, during and after Hurricane Sandy. Their results showed a strong correlation between activities on social media and the hurricane path. Similarly, in this paper, we studied social media behavior during different typhoons; our analysis showed an implicit correlation based on tweet volume and sentiments.

Social media has also been shown to be a rapid event detector or so-called social-sensing from the crowd. For example, Takeshi et al. [24] proposed a probabilistic spatio-temporal approach to predict the center and path of natural hazards based on geo-based tweets. The authors analyzed the behaviors of social media users during different cases of earthquakes and typhoons. Their experimental results demonstrated a strong correlation between user’s behavior on Twitter and natural hazards. The authors used such correlation as an event detector for earthquakes.

Although previous works have explored the role of social media (e.g., tweets) in detecting crisis events and enhancing situational awareness, few studies have shown how tweet sentiments can be used as discriminative features to improve event prediction. For example, Jingrui et al. [9] proposed an optimization framework to improve traffic prediction. Their system extracted traffic indicators from tweet semantics to improve traffic-jam prediction.

5.2 Semantics-enriched data mining

Researchers have leveraged semantic nuances to boost performance in machine learning (ML) and data mining tasks. The authors of [4, 30] extended the traditional bag-of-words model with a bag-of-concepts model extracted from semantic knowledge graphs (e.g., WordNet, DBpedia). However, the authors represented the presence of concepts as a vector of indices within a concept space. On the other hand, Jin Wang et al. [31] leveraged information from a knowledge graph for short text classification. Their approach associated each short text with relevant concepts. Then, words and concepts were combined to generate the text's embedding from a pre-trained word embedding model.

In contrast, in our approach, we leverage semantic embeddings from a knowledge graph, which project concepts (i.e., entities) and their relationships into vectors for a conceptualized data representation.

5.3 Joint-Learning models

Recently, joint learning models have achieved superior performance in complex tasks. For instance, Jishnu et al. [20] proposed jointly-trained RNN-based models to extract keyphrases from disaster-related tweets. By coupling two RNN models via joint learning, they demonstrated significantly improved performance in comparison with existing baseline approaches. In the related task of image classification, Wanli et al. [15] proposed a unified deep model that jointly learns from different components of the image classification task. Similarly, Zheng et al. [33] demonstrated that the joint learning of two deep models could not only separately learn user and item latent factors from review text, but also make the two models cooperate to boost the performance of rating prediction.

Inspired by these works, we propose our joint training model to learn features from social media and environmental data, in contrast to the individual or ensemble models employed in event detection (i.e., prediction) [4, 17].

6 Conclusion

In this paper, we propose an end-to-end training framework that learns from social media and environmental data to improve disaster prediction. In particular, we analyzed typhoon-related tweets to capture additional indicators of typhoon events. Unlike previous works, we extract adaptive features based on jointly trained models (e.g., BiLSTM+CNN). The first model (BiLSTM) acts as a feature extractor from social media and feeds the second model (CNN) with combined features from tweets and environmental data. Furthermore, we study the impact of applying a semantically-enriched data representation on the performance of our system. We employed semantic embeddings from the external knowledge graph ConceptNet. We conducted several experiments to benchmark the prediction performance of our approach and several baselines. Our evaluation showed significantly improved accuracies for our approaches (LSTM+DNN: 87.3%, LSTM+RNN: 86.0% and BiLSTM+CNN: 90%) when compared to state-of-the-art baselines (DNN: 75.6%, RNN: 80.2%, CNN: 70% and BiLSTM: 84%). Moreover, feeding our proposed joint models with the semantic representation of entities improved the F1-score even further (up to 3% in the case of LSTM+DNN, up to 4% in the case of LSTM+RNN and up to 2.7% in the case of BiLSTM+CNN).

In future work, we aim to construct a domain-specific knowledge graph from disaster-related tweets for disaster relief tasks (e.g., event summarization, identifying actionable information). In addition, we will carry out a comprehensive study of the effect of generic word embedding models (e.g., Word2vec, GloVe) in comparison with embeddings from domain-specific corpora, knowledge graphs and contextualized embeddings (e.g., BERT).


References

  • [1] A. Acar and Y. Muraki (2011) Twitter for crisis communication: lessons learned from Japan’s tsunami disaster. International Journal of Web Based Communities 7 (3), pp. 392–402.
  • [2] F. N. A. Al Omran and C. Treude (2017) Choosing an NLP library for analyzing software documentation: a systematic literature review and a series of experiments. In Proceedings of the 14th International Conference on Mining Software Repositories.
  • [3] A. Anam, A. Gangopadhyay, and N. Roy (2018) Evaluating disaster time-line from social media with wavelet analysis. In 2018 IEEE International Conference on Smart Computing (SMARTCOMP), pp. 41–48.
  • [4] G. Burel, H. Saif, M. Fernandez, and H. Alani (2017) On semantics and deep learning for event detection in crisis situations. In Proceedings of the International Workshop on Semantic Deep Learning (SemDeep).
  • [5] N. V. Chawla, K. W. Bowyer, L. O. Hall, and W. P. Kegelmeyer (2002) SMOTE: synthetic minority over-sampling technique. Journal of Artificial Intelligence Research 16, pp. 321–357.
  • [6] X. Chen, D. Pan, X. He, Y. Bai, and D. Wang (2012) Upper ocean responses to category 5 typhoon Megi in the western North Pacific. Acta Oceanologica Sinica 31 (1), pp. 51–58.
  • [7] X. Chen, L. Zou, and B. Zhao (2019) Detecting climate change deniers on Twitter using a deep neural network. In Proceedings of the 2019 11th International Conference on Machine Learning and Computing, pp. 204–210.
  • [8] T. Glade and F. Nadim (2014) Early warning systems for natural hazards and risks. Springer.
  • [9] J. He, W. Shen, P. Divakaruni, L. Wynter, and R. Lawrence (2013) Improving traffic prediction with tweet semantics. In Proceedings of the Twenty-Third International Joint Conference on Artificial Intelligence, IJCAI ’13, pp. 1387–1393. ISBN 978-1-57735-633-2.
  • [10] J. B. Houston, J. Hawthorne, M. F. Perreault, E. H. Park, M. Goldstein Hode, M. R. Halliwell, S. E. Turner McGowen, R. Davis, S. Vaid, J. A. McElderry, et al. (2015) Social media and disasters: a functional framework for social media use in disaster planning, response, and research. Disasters 39 (1), pp. 1–22.
  • [11] Y. Kryvasheyeu, H. Chen, N. Obradovich, E. Moro, P. Van Hentenryck, J. Fowler, and M. Cebrian (2016) Rapid assessment of disaster damage using social media activity. Science Advances 2 (3), pp. e1500779.
  • [12] P. M. Landwehr and K. M. Carley (2014) Social media in disaster relief. In Data mining and knowledge discovery for big data, pp. 225–257.
  • [13] T. Mikolov, I. Sutskever, K. Chen, G. S. Corrado, and J. Dean (2013) Distributed representations of words and phrases and their compositionality. In Advances in Neural Information Processing Systems, pp. 3111–3119.
  • [14] M. Morton and J. L. Levy (2011) Challenges in disaster data collection during recent disasters. Prehospital and Disaster Medicine 26 (3), pp. 196–201.
  • [15] W. Ouyang and X. Wang (2013) Joint deep learning for pedestrian detection. In Proceedings of the IEEE International Conference on Computer Vision.
  • [16] H. Paulheim (2018) Make embeddings semantic again!. In International Semantic Web Conference (P&D/Industry/BlueSky).
  • [17] S. Pouyanfar and S. Chen (2016) Semantic event detection using ensemble deep learning. In Multimedia (ISM), 2016 IEEE International Symposium on, pp. 203–208.
  • [18] H. Qin, J. Yan, X. Li, and X. Hu (2016) Joint training of cascaded CNN for face detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 3456–3465.
  • [19] Y. Qu, C. Huang, P. Zhang, and J. Zhang (2011) Microblogging after a major disaster in China: a case study of the 2010 Yushu earthquake. In Proceedings of the ACM 2011 Conference on Computer Supported Cooperative Work.
  • [20] J. Ray Chowdhury, C. Caragea, and D. Caragea (2019) Keyphrase extraction from disaster-related tweets. In The World Wide Web Conference, pp. 1555–1566.
  • [21] A. Reese (2016) How we’ll predict the next natural disaster: advances in natural hazard forecasting could help keep more people out of harm’s way. Discover Magazine, Sep.
  • [22] C. Reuter and M. Kaufhold (2018) Fifteen years of social media in emergencies: a retrospective review and future directions for crisis informatics. Journal of Contingencies and Crisis Management 26 (1), pp. 41–57.
  • [23] H. Saif, Y. He, and H. Alani (2012) Semantic sentiment analysis of Twitter. In International Semantic Web Conference, pp. 508–524.
  • [24] T. Sakaki, M. Okazaki, and Y. Matsuo (2010) Earthquake shakes Twitter users: real-time event detection by social sensors. In Proceedings of the 19th International Conference on World Wide Web, pp. 851–860.
  • [25] P. K. Sarma, Y. Liang, and W. A. Sethares (2018) Domain adapted word embeddings for improved sentiment classification. arXiv preprint arXiv:1805.04576.
  • [26] (2013-06) To tweet or not to tweet during a disaster?.
  • [27] J. J. Tompson, A. Jain, Y. LeCun, and C. Bregler (2014) Joint training of a convolutional network and a graphical model for human pose estimation. In Advances in Neural Information Processing Systems, pp. 1799–1807.
  • [28] I. Varga, M. Sano, K. Torisawa, C. Hashimoto, K. Ohtake, T. Kawai, J. Oh, and S. De Saeger (2013) Aid is out there: looking for help from tweets during a large scale disaster. In Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics.
  • [29] C. Wang, O. Singh, Z. Tang, and H. Dai (2017) Using a recurrent neural network model for classification of tweets conveyed influenza-related information. In Proceedings of the International Workshop on Digital Disease Detection using Social Media 2017 (DDDSM-2017), pp. 33–38.
  • [30] F. Wang, Z. Wang, Z. Li, and J. Wen (2014) Concept-based short text classification and ranking. In Proceedings of the 23rd ACM International Conference on Conference on Information and Knowledge Management, pp. 1069–1078.
  • [31] J. Wang, Z. Wang, D. Zhang, and J. Yan (2017) Combining knowledge with deep convolutional neural networks for short text classification. In IJCAI, pp. 2915–2921.
  • [32] H. M. Zahera, M. A. Sherif, and A. Ngonga Ngomo (2019) Jointly learning from social media and environmental data for typhoon intensity prediction. In Proceedings of the 10th International Conference on Knowledge Capture, pp. 231–234.
  • [33] L. Zheng, V. Noroozi, and P. S. Yu (2017) Joint deep modeling of users and items using reviews for recommendation. In Proceedings of the Tenth ACM International Conference on Web Search and Data Mining, pp. 425–434.