Using Word Embeddings to Analyze Protests News

The first two tasks of the CLEF 2019 ProtestNews events focused on distinguishing protest-related from non-protest-related news articles and sentences as a binary classification task. Among the submissions, two well-performing models were chosen, and their existing word embeddings, word2vec and FastText, were replaced with ELMo and DistilBERT. Unlike bag-of-words or earlier vector approaches, ELMo and DistilBERT represent words as a sequence of vectors that capture meaning from contextual information in the text. Without changing the architecture of the original models other than the word embeddings, the implementation of DistilBERT improved performance, measured by an F1-score of 0.66, compared to the FastText implementation. DistilBERT also outperformed ELMo in both tasks and for both models. Cleaning the datasets by removing stopwords and lemmatizing the words was shown to make the models more generalizable across contexts when training on a dataset of Indian news articles and evaluating the models on a dataset of news articles from China.
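Since the results above are reported as F1-scores, a minimal sketch of how a binary F1-score can be computed may help in reading them. The labels below are invented for illustration (1 = protest, 0 = non-protest) and the function name `f1_score` is hypothetical. The abstract does not state which F1 variant the evaluation used, so this sketch assumes F1 on the positive (protest) class.

```python
def f1_score(y_true, y_pred, positive=1):
    """Binary F1: harmonic mean of precision and recall for the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)  # true positives
    fp = sum(1 for t, p in pairs if t != positive and p == positive)  # false positives
    fn = sum(1 for t, p in pairs if t == positive and p != positive)  # false negatives
    if tp == 0:
        # No correct positive predictions: precision and recall are both 0 (or undefined).
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical gold labels and predictions (1 = protest, 0 = non-protest).
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 0, 1, 0, 1, 0, 1, 0]
score = f1_score(y_true, y_pred)
print(f"F1 = {score:.2f}")
```

Because F1 combines precision and recall, it does not reward a classifier for simply predicting the majority class, which matters when protest articles are a minority of the news data.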


