Combating Temporal Drift in Crisis with Adapted Embeddings
Language usage changes over time, and this can impact the effectiveness of NLP systems. This work investigates methods for adapting to changing discourse during crisis events. We explore social media data during crisis, for which effective, time-sensitive methods are necessary. We experiment with two separate methods to accommodate changing data: temporal pretraining, which uses unlabeled data for the target time periods to train better language models, and a model of embedding shift based on tools for analyzing semantic change. This shift allows us to counteract temporal drift by normalizing incoming data based on observed patterns of language change. Simulating scenarios in which we lack access to incoming labeled data, we demonstrate the effectiveness of these methods for a wide variety of crises, showing we can improve performance by up to 8.0 F1 score for relevance classification across datasets.
READ FULL TEXT