Efficacy of BERT embeddings on predicting disaster from Twitter data

by   Ashis Kumar Chanda, et al.
Temple University

Social media like Twitter provide a common platform to share and communicate personal experiences with other people. People often post their life experiences, local news, and events on social media to inform others. Many rescue agencies monitor this type of data regularly to identify disasters and reduce the risk of lives. However, it is impossible for humans to manually check the mass amount of data and identify disasters in real-time. For this purpose, many research works have been proposed to present words in machine-understandable representations and apply machine learning methods on the word representations to identify the sentiment of a text. The previous research methods provide a single representation or embedding of a word from a given document. However, the recent advanced contextual embedding method (BERT) constructs different vectors for the same word in different contexts. BERT embeddings have been successfully used in different natural language processing (NLP) tasks, yet there is no concrete analysis of how these representations are helpful in disaster-type tweet analysis. In this research work, we explore the efficacy of BERT embeddings on predicting disaster from Twitter data and compare these to traditional context-free word embedding methods (GloVe, Skip-gram, and FastText). We use both traditional machine learning methods and deep learning methods for this purpose. We provide both quantitative and qualitative results for this study. The results show that the BERT embeddings have the best results in disaster prediction task than the traditional word embeddings. Our codes are made freely accessible to the research community.



There are no comments yet.



Robust and Consistent Estimation of Word Embedding for Bangla Language by fine-tuning Word2Vec Model

Word embedding or vector representation of word holds syntactical and se...

Meta-Embeddings for Natural Language Inference and Semantic Similarity tasks

Word Representations form the core component for almost all advanced Nat...

Using Dynamic Embeddings to Improve Static Embeddings

How to build high-quality word embeddings is a fundamental research ques...

Word Embeddings to Enhance Twitter Gang Member Profile Identification

Gang affiliates have joined the masses who use social media to share tho...

The Secret Lives of Names? Name Embeddings from Social Media

Your name tells a lot about you: your gender, ethnicity and so on. It ha...

BERT Goes Shopping: Comparing Distributional Models for Product Representations

Word embeddings (e.g., word2vec) have been applied successfully to eComm...

CrisisBERT: Robust Transformer for Crisis Classification and Contextual Crisis Embedding

Classification of crisis events, such as natural disasters, terrorist at...
This week in AI

Get the week's most popular data science and artificial intelligence research sent straight to your inbox every Saturday.

1. Introduction

In the current age of internet, online social media sites have become available to all people, and people tend to post their personal experiences, current events, local and global news. For this reason, the daily usages of social media are growing up and making a large dataset that becomes an important source of data to make different types of research analysis. Moreover, the social media data are real time data and accessible to monitor. Therefore, several research works are conducted to perform different types real time predictions using the social media data, such as stock movement prediction (Nguyen et al., 2015), relation extraction (Ritter et al., 2015), and natural disaster prediction (Yoo et al., 2018; Mai et al., 2020).

Twitter is such a social media site that can be accessed through people’s laptops and smartphones. The rapid growth of smartphone or laptop usages enables people to share an emergency that they observe in real time. For this reason, many disaster relief organizations and news agencies are interested in monitoring Twitter data programmatically. However, unlike long articles, tweets are short length text, and they tend to have more challenges due to their shortness, sparsity (i.e., diverse word content) (Chen et al., 2011), velocity (rapid growth of short text like SMS and tweet) and misspelling (Alsmadi and Gan, 2019). For these reasons, it is very challenging to understand whether a person’s words are announcing a disaster or not. For example, a tweet like this, ” ! ! ” tells us an experience of a person in a concert and we can say from that he enjoyed it, because of the word, ””. Even though it contains the word, ””, it does not mean any danger or emergency; rather, it is used to describe the colorful decoration of the stage. Let’s assume another tweet like this, ” ”. Here, the word “” means disaster, and the tweet describes an emergency. The two examples show that one word could have multiple meanings based on its context. Therefore, understanding the context of words is important to analyze a tweet’s sentiment.

Different researchers proposed different methods to understand the meaning of a word by representing it in embedding or vector (Pennington et al., 2014; Mikolov et al., 2013; Bojanowski et al., 2016)

. Neural network-based methods such as Skip-gram

(Mikolov et al., 2013), FastText (Bojanowski et al., 2016)

are popular for learning word embeddings from large word corpus and have been used for solving different types of NLP tasks. These methods are also used for sentiment analysis of Twitter data

(Deho et al., 2018; Poornima and Priya, 2020). However, those embedding learning methods provide static embedding for a single word in a document. Hence, the meaning of the word,“” would remain the same in the above two examples for these methods.

To handle this problem, the authors of (Devlin et al., 2018) proposed a contextual embedding learning model, Bidirectional Encoder Representations from Transformers (BERT), that provides embeddings of a word based on its context words. In different types of NLP tasks such as text classification (Sun et al., 2019)

, text summarization

(Liu and Lapata, 2019), entity recognition (Hakala and Pyysalo, 2019), BERT model outperformed traditional embedding learning models. However, it is interesting to discover how the contextual embeddings could help to understand disaster-type texts. For this reason, we plan to analyze the disaster prediction task from Twitter data using both context-free and contextual embeddings in this study. We use traditional machine learning methods and neural network models for the prediction task where the word embeddings are used as input to the models. We show that contextual embeddings work better in predicting disaster-types tweets than the other word embeddings. Finally, we provide an extensive discussion to analyze the results.

The main contributions of this paper are summarized as follows.

  1. We analyze a real-life natural language online social network dataset, Twitter data, to identify challenges in human sentiment analysis for disaster-type tweet prediction.

  2. We apply both contextual and context-free embeddings in tweet representations for disaster prediction through machine learning methods and show that context-free embeddings (BERT) can improve the accuracy of disaster prediction compared with contextual embeddings.

  3. We provide a detailed explanation of our method and results and share our codes publicly that will enable researchers to run our experiments and reproduce our results for future research work directions 111 https://github.com/ashischanda/sentiment-analysis.

The rest of the paper is organized as follows. In section 2, some related works are introduced. The main methodology of this paper is elaborated in section 3. The dataset and the experiments are presented in section 4 and 5, respectively. Finally, the conclusion is drawn in section 6.

2. Related Works

Many research works analyzed Twitter data for understanding emergency situation and predicting disaster analysis (Karami et al., 2020; Zou et al., 2018; Ashktorab et al., 2014; Olteanu et al., 2014). One group of researchers used text mining and statistical approaches to understand crises (Karami et al., 2020; Zou et al., 2018), another group of researchers focused on clustering text data to identify a group of tweets that belong to disaster (Ashktorab et al., 2014; Olteanu et al., 2014). Later, different traditional machine learning models are used to analyze Twitter data and predict disaster or emergency situations where words of a tweet are represented as embeddings (Palshikar et al., 2018; Algur and Venugopal, 2021; Singh et al., 2019). For example, Palshikar et al. (Palshikar et al., 2018) proposed a weakly supervised model where words are presented with a bag of words (BOW) model. Moreover, frequency-based word representation is used in (Algur and Venugopal, 2021)

for disaster prediction from Twitter data using Naive Bayes, Logistic Regression, Random Forest, and SVM methods. The authors in

(Singh et al., 2019) used a Markov-based model to predict the location of Tweets during a disaster. In a recent work (Pota et al., 2021), the authors proposed a pre-processing method for BERT-based sentiment analysis of tweets. However, it is interesting to explore the model performance on different word embeddings to observe how the context words help to predict a tweet as a disaster.

3. Methodology

In this section, we discuss our approach of leveraging word embedding for disaster prediction from Twitter data using machine learning methods. We consider three types of word embeddings, 1) bag of words (BOW), 2) context-free, and 3) contextual embeddings. The word embeddings are used in both traditional machine learning methods and deep learning models as input for disaster prediction.

3.1. BOW embeddings

The bag-of-words (BOW) model is a common approach for text representation of a word document. If there are words in a text vocabulary, then BOW is a binary vector or array of length

where each index of the array is used to present one word of the vocabulary. If a word exists in a document, then the corresponding array index of the word becomes one; otherwise, it contains zero. We use BOW embeddings of Twitter data in three traditional machine learning methods such as decision tree, random forest, and logistic regression to predict the sentiment of a tweet.

Even though BOW is good for representing words of a document, it loses contextual information because the order of words is not recorded in the binary structure. However, contextual information is required to understand and analyze the sentiment of a text. For this reason, we also plan to use context-based embeddings for this sentiment analysis task.

3.2. Context-free embeddings

Many existing research works proposed to learn word embeddings based on the co-occurrences of word pairs in documents. GloVe (Pennington et al., 2014) is one common method for learning word embeddings from the co-occurrences of words in documents. However, neural network-based models such as Skip-gram (Mikolov et al., 2013), FastText (Bojanowski et al., 2016) became popular recently to learn word representations from documents and used for sentiment analysis.

In our research study, we use the pre-trained embeddings of three context-free embedding models (GloVe, Skip-gram, FastText) in a neural network-based model to analyze the sentiment of tweet data and predict disaster types tweets. To represent a tweet in context-free embeddings, we took the average of word embeddings of a tweet following the same strategy of (Kenter et al., 2016). For the calculated vector of a tweet, we use softmax to predict the sentiment of the tweet. Let’s suppose that the vector of a tweet is , we have a set of labels, = ”positive”, ”negative” and

is the weight matrix of softmax function. Then, the probability of the tweet to be positive or disaster is calculated as follows

Recently, deep neural networks are also used for sentiment analysis. To observe how the context-free embeddings work on deep neural networks, we used a bidirectional recurrent neural network with LSTM gates

(Hochreiter and Schmidhuber, 1997)

. The Bi-LSTM model processes the input words of a tweet from right to left and in reverse. The Bi-LSTM block is followed by a fully connected layer and sigmoid function to output as an activation function.

3.3. Contextual embeddings

Unlike the other word embeddings, BERT embeddings (Devlin et al., 2018) generates different vectors for the same word in different contexts. Recent advances in NLP have shown that BERT model have outperformed traditional embeddings in different NLP tasks, like entity extraction, next sentence prediction. In our study, we plan to investigate how well do contextual embeddings work better than traditional embeddings in sentiment analysis. For this purpose, we use the pre-trained embeddings of BERT models in the same neural network models to predict disaster types tweets.

Tweet (original) Tweet (after preprocessing)
RockyFire Update = California Hwy. 20 closed in both rockyfire update california hwy 20 closed
directions due to Lake County fire - CAfire wildfires directions due lake county fire cafire wildfires
TheAtlantic That or they might be killed in an airplane theatlantic might killed airplane accident
accident in the night a car wreck night car wreck
Table 1. Sample pre-processed tweets
Total train data 7,613
Total positive data (or disaster tweets) 3,271
Total unique words 21,940
Total unique words with frequency 1 6,816
Avg. length of tweets 12.5
Median length of tweets 13
Maximum length of tweets 29
Minimum length of tweets 1
Table 2. Training data statistic
Figure 1. Twitter length distribution in training data

4. Dataset

For this study, we used a Twitter dataset from a recent Kaggle competition (Natural Language Processing with Disaster Tweets 222 https://www.kaggle.com/c/nlp-getting-started). Kaggle competition is a very well-known platform for machine learning researchers where many research agencies share their data to solve different types of research problems. For example, many researchers used data from Kaggle competitions to analyze real-life problems and propose models to solve the problems, such as sentiment analysis, feature detection, diagnosis prediction (Koumpouri et al., 2015; Tolkachev et al., 2020; Iglovikov et al., 2017; Yang et al., 2018; Yang and Ding, 2020).

In the selected Kaggle competition, a dataset of 10,876 tweets is given to predict which tweets are about real disasters and which ones aren’t using machine learning model. This dataset has two separate files, train (7,613 tweets) and test (3,263 tweets) data, where each row of the train data contains id, natural language text or tweet, and label. The labels are manually annotated by humans. They labeled a tweet as positive or one if it is about real disaster, otherwise as negative or zero. On the other hand, the test data has D and natural language text but no label. The competition site stores the labels of test data privately and uses that to calculate test scores based on user’s machine learning model predictions and create leader-board for the competition based on the test score. Moreover, this dataset was created by the figure-eight company and originally shared on their website 333 https://appen.com/open-source-datasets/.

We used the training data to train different machine learning models and predict test data labels using trained models. We reported both the train and test data score in our experiment. Note that our purpose is not to get a high score in the competition, rather use Twitter data to study our research goals.

4.1. Data pre-processing

Since the twitter data is natural language text, and it contains different types of typos, punctuation, abbreviations, and numbers. For this reason, before training machine learning models on the natural language text, a text pre-processing step is required to remove stop words and word tokenization. Hence, we removed all the stopwords and punctuations from the training data and converted all the words into lower-case letters. Table 1 shows some pre-processed tweets with the original tweets.

4.2. Data analysis

Before running any machine learning methods on our data, we analyzed our dataset to obtain some insights about the data. Table 2 shows some statistical results on the training data after pre-processing the text. From the table, we find that there are 43 tweets that are annotated as real disasters and 57 are not. There are a total of 21,940 unique words, while only 6,816 words have frequency 1. The average length of tweets is 12.5. However, it is important to check the length of positive and negative tweets separately to verify whether they have common characteristics. Figure 1 shows word distribution for both the positive and negative tweets. The figure shows many negative tweets with small word lengths (10), but most positive and negative tweets are in a word length of 10 to 20.

We also analyzed the word frequency for positive and negative tweets. Figure 2 shows the most frequent words in a word cloud where the high font of a word presents high frequency. We can find some common words in both types of tweets (i.e., https, t, co, people). However, Figure 2(a) highlights many disaster-related words like storm, fire, bomber, death, and earthquake. On the other hand, Figure 2

(b) highlights daily used words such as think, good, love, now, time. From this figure, it is clear that the most frequent words are different in the two types of tweets, and understanding the meaning of words is important to classify them.

Figure 2. Showing the most frequent words in training data

5. Experiments

In our experimental study, we conduct several experiments based on the real Twitter data to predict disaster-types tweets. At first, we describe the experimental settings and model training procedures in this section. Then, we analyze the experimental results in detail.

5.1. Experimental settings

5.1.1. Traditional ML models with BOW embeddings

From the Table 2, we find that the training data has 21,940 unique words where 6,816 words have frequency more than 1. To avoid infrequent words, we considered only the vocabulary of 6,816 words in our BOW representations. To represent a tweet in BOW embeddings, we took a binary array of 6,816 length where it had 1 if a word of tweet was present in the vocabulary, otherwise 0. We used the BOW embeddings to predict the sentiment of a tweet using three traditional machine learning models, 1) decision tree, 2) random forest and 3) logistic regression. We used python Sklearn package 444https://scikit-learn.org/stable/ and used all the default parameters to train the models on our train dataset. After training the model, we used the test data to get labels and submit that in Kaggle to have test score.

5.1.2. Deep learning models with context-free embeddings

For this experiment, we chose three context-free methods, 1) Skip-gram (Mikolov et al., 2013), 2) FastText (Bojanowski et al., 2016), and 2) GloVe (Pennington et al., 2014). We used publicly available pre-trained embeddings of Skip-gram and fastText models that are trained on Wikipedia data 555 https://nlp.stanford.edu/projects/glove/. The pre-trained embeddings of FastText is collected from Mikolov et. al. 2018 (Mikolov et al., 2018). The size of all the pre-trained embeddings or feature is 300.

The proposed softmax model is trained for 100 epochs using a stochastic gradient algorithm to minimize the categorical cross entropy loss function. We took 1

of training data as validation data and used the validation data to stop the training model if the loss value for the validation data didn’t decrease in last ten epochs. Similarly to the softmax model, we also trained our Bi-LSTM model using batch gradient descent algorithm for 100 epochs to minimize the binary cross entropy loss function. We followed the same stop rule for this model.

5.1.3. Deep learning models with contextual embeddings

To obtain contextual embeddings, we downloaded publicly available pre-trained BERT model (Bert-base-uncased) (Devlin et al., 2018) from the official site of the authors 666https://github.com/google-research/bert. We gave tweets as inputs in the BERT model and took the hidden states of the [] token of the last layer from the model as embeddings of the given tweets. Then, the embeddings is used in our sigmoid model to predict sentiment of tweets. The same setting is used in a previous paper (Ji et al., 2021) to predict patient diagnosis from medical note words using pre-trained BERT model.

Moreover, we can find embeddings of each words of a tweet from the pre-trained BERT model. The BERT’s pre-trained word embeddings are used as input to our Bi-LSTM model. The authors of (Lu et al., 2020) used the similar setting for the sentiment analysis of text data.

5.2. Evaluation metric

Three different metrics are used in our experiment to evaluate the performance of the machine learning models on the disaster prediction task such as, 1) accuracy, 2) F1 score, and 3) Area Under the Curve (AUC). In our experiment, we considered disaster tweets as ’positive class’ and others as ’negative class’. Hence, True Positive (TP) means the actual disaster tweets that are predicted as disaster while False Positive (FP) shows the tweets that are actually false, but predicted as true. True Negative (TN) and False Negative (FN) imply in the same way. The accuracy is the number of correctly predicted tweets among all of the tweets and it is calculated as follows.

F1 score is another popular metric to test predictive performance of a model. The F1 score is measured by the harmonic mean of recall and precision where recall means the number of true labels are predicted by a model among the total number of existing true labels and precision means the number of true labels are predicted by a model divided by the total number of labels are predicted by the model. The F1 score is calculated as follows.

On the other hand, AUC tells us how much a model is capable of distinguishing between classes. The higher score of the AUC means the model works better at predicting negative classes as zero and positive classes as one.

5.3. Experimental results

5.3.1. Quantitative results

Table 3 provides the results of all the machine learning models on the disaster prediction tasks for all three types of embeddings. The table shows results for both the training and test data. Since the test data results are collected from the Kaggle competition, we only can report the accuracy score.

The table shows that the logistic regression model has the best results for the BOW embeddings among the three traditional machine learning models. However, the results of neural network models for context-free embeddings are better than the traditional machine learning models that used context-free embeddings as inputs. Among the three context-free embeddings (Skip-gram, FastText, GloVe), the GloVe with Bi-LSTM model has the best train and test score for all the three evaluation metrics. Note that the results also show us that deep learning model like Bi-LSTM has better results than the shallow neural network model such as the softmax model.

Moreover, when we used the same shallow neural network and deep learning models for contextual embeddings such as BERT; we found that there are 2 improvements on AUC and Acc over the context-free embeddings. It means that contextual embeddings are helpful and have the best performance for the disaster prediction task.

Model Train data Test data
AUC F1 Acc Acc
BOW embeddings
Decision tree 0.6320 0.5896 0.6273 0.6380
Random forest 0.8313 0.7320 0.7848 0.7042
Logistic regression 0.8660 0.7443 0.7927 0.7293
Context-free embeddings
Skip-gram+Softmax 0.8281 0.7301 0.7769 0.7649
FastText+Softmax 0.8336 0.7231 0.7769 0.7826
GloVe+Softmax 0.8246 0.7323 0.7717 0.7827
Skip-gram+Bi-LSTM 0.8272 0.7440 0.7808 0.7775
FastText+Bi-LSTM 0.8327 0.7369 0.7817 0.7955
GloVe+Bi-LSTM 0.8351 0.7500 0.7991 0.8093
Contextual embeddings
BERT+Softmax 0.8513 0.8254 0.8292 0.8250
BERT+Bi-LSTM 0.8578 0.8316 0.8351 0.8308
Table 3. Performance of different machine learning models on disaster prediction for different types of word representations or embeddings

5.3.2. Qualitative results

Table 3 shows us quantitative results for the prediction of disaster tweets where the neural model with contextual embeddings outperformed the other models. However, it is difficult to understand from the result that when the contextual embeddings predict a disaster tweet successfully while context-free models fail. For this purpose, we observe the prediction results of the Bi-LSTM model for both the context-free (GloVe) and contextual embeddings (BERT). Table 4 shows the model predictions with true labels for some sample tweets. From the table, we can find that the predictions for GloVe embeddings for the first two tweets are positive, maybe because of the word, “”, in the tweets, but the true labels are negative for the two tweets. If we read the tweets, then we can understand that the tweets are not related to disaster or crisis. On the other hand, since the BERT model generates word embeddings based on the context words, it successfully predicts the tweets as negative.

The predictions for GloVe embeddings for the third and fourth tweets are false while they are true. Note that no disaster-related words are used in the two tweets, but the tweets described serious situations. The predictions for BERT embeddings are also correct for the third and fourth tweets. The predictions of GloVe and BERT embeddings for the fifth and sixth tweets of Table 3 are correct. Since there are some disaster-related words (i.e., suicide, bomber, bombing) in the tweets, both models successfully labeled them.

After analyzing the results of Table 4, it can be implied that the context-free embeddings are helpful to predict a tweet as a disaster if disaster-related words (i.e., accident, bomb) exist in the tweet. In contrast, contextual embeddings help to understand the context of a tweet that is challenging and important for the sentiment analysis task. Although every tweet has a short length text, contextual embeddings works efficiently to understand the sentiment of a tweet.

Sample tweets Prediction True
GloVe BERT label
1 I swear someone needs to take it away from me, cuase I’m just accident prone. Yes No No
2 Dave if I say that I met her by accident this week- would you be
super jelly Dave? :p Yes No No
3 Schoolgirl attacked in Seaton Delaval park by ’pack of animals’ No Yes Yes
4 Not sure how these fire-workers rush into burning buildings
but I’m grateful they do. TrueHeroes No Yes Yes
5 A suicide bomber has blown himself up at a mosque in the south Yes Yes Yes
6 Bombing of Hiroshima 1945 Yes Yes Yes
Table 4. Showing sentiment predictions of Bi-LSTM model for pre-trained GloVe and BERT embeddings

6. Conclusion

In this paper, we described an extensive analysis for predicting disaster from Twitter data using different types of word embeddings. Our experimental results show that contextual embeddings have the best result for predicting disaster from tweets. We also showed that deep neural network models outperformed traditional machine learning methods in the disaster prediction task. Advance deep neural network models such as multi-layer convolutional models can also be used for this prediction task to achieve higher accuracy.


  • S. P. Algur and S. Venugopal (2021) Classification of disaster specific tweets-a hybrid approach. In 2021 8th International Conference on Computing for Sustainable Global Development (INDIACom), pp. 774–777. Cited by: §2.
  • I. Alsmadi and K. H. Gan (2019) Review of short-text classification. International Journal of Web Information Systems. Cited by: §1.
  • Z. Ashktorab, C. Brown, M. Nandi, and A. Culotta (2014) Tweedr: mining twitter to inform disaster response.. In ISCRAM, pp. 269–272. Cited by: §2.
  • P. Bojanowski, E. Grave, A. Joulin, and T. Mikolov (2016) Enriching word vectors with subword information. arXiv preprint arXiv:1607.04606. Cited by: §1, §3.2, §5.1.2.
  • M. Chen, X. Jin, and D. Shen (2011) Short text classification improved by learning multi-granularity topics. In

    Twenty-second international joint conference on artificial intelligence

    Cited by: §1.
  • B. O. Deho, A. W. Agangiba, L. F. Aryeh, and A. J. Ansah (2018) Sentiment analysis with word embedding. In 2018 IEEE 7th International Conference on Adaptive Science & Technology (ICAST), pp. 1–4. Cited by: §1.
  • J. Devlin, M. Chang, K. Lee, and K. Toutanova (2018) Bert: pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805. Cited by: §1, §3.3, §5.1.3.
  • K. Hakala and S. Pyysalo (2019)

    Biomedical named entity recognition with multilingual bert

    In Proceedings of The 5th Workshop on BioNLP Open Shared Tasks, pp. 56–61. Cited by: §1.
  • S. Hochreiter and J. Schmidhuber (1997) Long short-term memory. Neural computation 9 (8), pp. 1735–1780. Cited by: §3.2.
  • V. Iglovikov, S. Mushinskiy, and V. Osin (2017)

    Satellite imagery feature detection using deep convolutional neural network: a kaggle competition

    arXiv preprint arXiv:1706.06169. Cited by: §4.
  • S. Ji, M. Hölttä, and P. Marttinen (2021) Does the magic of bert apply to medical code assignment? a quantitative study. arXiv preprint arXiv:2103.06511. Cited by: §5.1.3.
  • A. Karami, V. Shah, R. Vaezi, and A. Bansal (2020) Twitter speaks: a case of national disaster situational awareness. Journal of Information Science 46 (3), pp. 313–324. Cited by: §2.
  • T. Kenter, A. Borisov, and M. De Rijke (2016) Siamese cbow: optimizing word embeddings for sentence representations. arXiv preprint arXiv:1606.04640. Cited by: §3.2.
  • A. Koumpouri, I. Mporas, and V. Megalooikonomou (2015) Evaluation of four approaches for” sentiment analysis on movie reviews” the kaggle competition. In Proceedings of the 16th International Conference on Engineering Applications of Neural Networks (INNS), pp. 1–5. Cited by: §4.
  • Y. Liu and M. Lapata (2019) Text summarization with pretrained encoders. arXiv preprint arXiv:1908.08345. Cited by: §1.
  • Z. Lu, P. Du, and J. Nie (2020) VGCN-bert: augmenting bert with graph embedding for text classification. Advances in Information Retrieval 12035, pp. 369. Cited by: §5.1.3.
  • M. Mai, C. K. Leung, J. M. Choi, and L. K. R. Kwan (2020) Big data analytics of twitter data and its application for physician assistants: who is talking about your profession in twitter?. In Data Management and Analysis, pp. 17–32. Cited by: §1.
  • T. Mikolov, K. Chen, G. Corrado, and J. Dean (2013)

    Efficient estimation of word representations in vector space

    CoRR abs/1301.3781. External Links: Link, 1301.3781 Cited by: §1, §3.2, §5.1.2.
  • T. Mikolov, E. Grave, P. Bojanowski, C. Puhrsch, and A. Joulin (2018) Advances in pre-training distributed word representations. In Proceedings of the International Conference on Language Resources and Evaluation (LREC 2018), Cited by: §5.1.2.
  • T. H. Nguyen, K. Shirai, and J. Velcin (2015) Sentiment analysis on social media for stock movement prediction. Expert Systems with Applications 42 (24), pp. 9603–9611. Cited by: §1.
  • A. Olteanu, C. Castillo, F. Diaz, and S. Vieweg (2014)

    Crisislex: a lexicon for collecting and filtering microblogged communications in crises

    In Eighth international AAAI conference on weblogs and social media, Cited by: §2.
  • G. K. Palshikar, M. Apte, and D. Pandita (2018) Weakly supervised and online learning of word models for classification to detect disaster reporting tweets. Information Systems Frontiers 20 (5), pp. 949–959. Cited by: §2.
  • J. Pennington, R. Socher, and C. Manning (2014) Glove: global vectors for word representation. In Proceedings of the 2014 conference on empirical methods in natural language processing (EMNLP), pp. 1532–1543. Cited by: §1, §3.2, §5.1.2.
  • A. Poornima and K. S. Priya (2020) A comparative sentiment analysis of sentence embedding using machine learning techniques. In 2020 6th International Conference on Advanced Computing and Communication Systems (ICACCS), pp. 493–496. Cited by: §1.
  • M. Pota, M. Ventura, H. Fujita, and M. Esposito (2021) Multilingual evaluation of pre-processing for bert-based sentiment analysis of tweets. Expert Systems with Applications 181, pp. 115119. Cited by: §2.
  • A. Ritter, E. Wright, W. Casey, and T. Mitchell (2015) Weakly supervised extraction of computer security events from twitter. In Proceedings of the 24th international conference on world wide web, pp. 896–905. Cited by: §1.
  • J. P. Singh, Y. K. Dwivedi, N. P. Rana, A. Kumar, and K. K. Kapoor (2019) Event classification and location prediction from tweets during disasters. Annals of Operations Research 283 (1), pp. 737–757. Cited by: §2.
  • C. Sun, X. Qiu, Y. Xu, and X. Huang (2019) How to fine-tune bert for text classification?. In China National Conference on Chinese Computational Linguistics, pp. 194–206. Cited by: §1.
  • A. Tolkachev, I. Sirazitdinov, M. Kholiavchenko, T. Mustafaev, and B. Ibragimov (2020) Deep learning for diagnosis and segmentation of pneumothorax: the results on the kaggle competition and validation against radiologists. IEEE Journal of Biomedical and Health Informatics. Cited by: §4.
  • X. Yang and J. Ding (2020) A computational framework for iceberg and ship discrimination: case study on kaggle competition. IEEE Access 8, pp. 82320–82327. Cited by: §4.
  • X. Yang, Z. Zeng, S. G. Teo, L. Wang, V. Chandrasekhar, and S. Hoi (2018) Deep learning for practical image recognition: case study on kaggle competitions. In Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, pp. 923–931. Cited by: §4.
  • S. Yoo, J. Song, and O. Jeong (2018) Social media contents based sentiment analysis and prediction system. Expert Systems with Applications 105, pp. 102–111. Cited by: §1.
  • L. Zou, N. S. Lam, H. Cai, and Y. Qiang (2018) Mining twitter data for improved understanding of disaster resilience. Annals of the American Association of Geographers 108 (5), pp. 1422–1441. Cited by: §2.