Combating Temporal Drift in Crisis with Adapted Embeddings

04/17/2021
by Kevin Stowe, et al.

Language usage changes over time, and this can impact the effectiveness of NLP systems. This work investigates methods for adapting to changing discourse during crisis events. We explore social media data during crises, for which effective, time-sensitive methods are necessary. We experiment with two separate methods to accommodate changing data: temporal pretraining, which uses unlabeled data from the target time periods to train better language models, and a model of embedding shift based on tools for analyzing semantic change. This shift model allows us to counteract temporal drift by normalizing incoming data based on observed patterns of language change. Simulating scenarios in which we lack access to incoming labeled data, we demonstrate the effectiveness of these methods across a wide variety of crises, improving performance by up to 8.0 F1 points for relevance classification across datasets.
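The abstract does not specify the embedding-shift model, but a common tool from semantic-change analysis is orthogonal Procrustes alignment, which learns a rotation between embedding spaces from different time periods and can be used to map incoming vectors back into the space a classifier was trained on. A minimal sketch with synthetic word vectors (all names and the drift simulation are illustrative assumptions, not the paper's implementation):

```python
import numpy as np

def procrustes_align(src, tgt):
    """Learn an orthogonal matrix Q minimizing ||src @ Q - tgt||_F,
    a standard alignment step in semantic-change analysis."""
    # SVD of the cross-covariance between the two embedding spaces:
    # if src.T @ tgt = U S V^T, the minimizer is Q = U V^T.
    u, _, vt = np.linalg.svd(tgt.T @ src)
    return vt.T @ u.T  # rotation such that src @ Q ~ tgt

# Toy example: vectors for 50 shared words in an "old" and "new" period
rng = np.random.default_rng(0)
old = rng.standard_normal((50, 16))                    # training-period vectors
rotation = np.linalg.qr(rng.standard_normal((16, 16)))[0]
new = old @ rotation                                   # simulated temporal drift

Q = procrustes_align(new, old)
normalized = new @ Q  # map new-period vectors back to the old space
print(np.allclose(normalized, old, atol=1e-8))         # drift counteracted
```

In practice the shared vocabulary anchors the alignment, and the learned map is then applied to all incoming embeddings before classification, so the labeled training data from earlier periods remains usable.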

