Named Entity Recognition in Twitter: A Dataset and Analysis on Short-Term Temporal Shifts

10/07/2022
by Asahi Ushio, et al.

Recent progress in language model pre-training has led to important improvements in Named Entity Recognition (NER). Nonetheless, this progress has been tested mainly on well-formatted documents such as news, Wikipedia, or scientific articles. Social media is a different landscape, where noisy and dynamic text adds another layer of complexity. In this paper, we focus on NER in Twitter, one of the largest social media platforms, and construct a new NER dataset, TweetNER7, which contains seven entity types annotated over 11,382 tweets from September 2019 to August 2021. The dataset was constructed by carefully distributing the tweets over time and taking representative trends as a basis. Along with the dataset, we provide a set of language model baselines and analyze language model performance on the task, with particular attention to the impact of different time periods. Our analysis focuses on three important temporal aspects: short-term degradation of NER models over time, strategies to fine-tune a language model over different periods, and self-labeling as an alternative when recently labeled data is lacking. TweetNER7 is released publicly (https://huggingface.co/datasets/tner/tweetner7) along with the models fine-tuned on it (NER models have been integrated into TweetNLP and can be found at https://github.com/asahi417/tner/tree/master/examples/tweetner7_paper).

research
01/18/2022

Annotating the Tweebank Corpus on Named Entity Recognition and Building NLP Models for Social Media Analysis

Social media data such as Twitter messages ("tweets") pose a particular ...
research
06/10/2019

Named Entity Recognition on Code-Switched Data: Overview of the CALCS 2018 Shared Task

In the third shared task of the Computational Approaches to Linguistic C...
research
04/20/2021

Mitigating Temporal-Drift: A Simple Approach to Keep NER Models Crisp

Performance of neural models for named entity recognition degrades over ...
research
07/24/2017

CAp 2017 challenge: Twitter Named Entity Recognition

The paper describes the CAp 2017 challenge. The challenge concerns the p...
research
04/08/2021

COVID-19 Named Entity Recognition for Vietnamese

The current COVID-19 pandemic has led to the creation of many corpora t...
research
06/26/2023

Transfer Learning across Several Centuries: Machine and Historian Integrated Method to Decipher Royal Secretary's Diary

A named entity recognition and classification plays the first and foremo...
research
09/06/2022

Depression Symptoms Modelling from Social Media Text: An Active Learning Approach

A fundamental component of user-level social media language based clinic...
