COVID-19 Named Entity Recognition for Vietnamese

04/08/2021
by Thinh Hung Truong, et al.

The current COVID-19 pandemic has led to the creation of many corpora that facilitate NLP research and downstream applications to help fight the pandemic. However, most of these corpora are exclusively for English. As the pandemic is a global problem, it is worth creating COVID-19-related datasets for languages other than English. In this paper, we present the first manually annotated COVID-19 domain-specific dataset for Vietnamese. In particular, our dataset is annotated for the named entity recognition (NER) task with newly defined entity types that can be used in other future epidemics. Our dataset also contains the largest number of entities compared to existing Vietnamese NER datasets. We conduct experiments using strong baselines on our dataset and find that (i) automatic Vietnamese word segmentation helps improve the NER results, and (ii) the best performance is obtained by fine-tuning pre-trained language models, where the monolingual model PhoBERT for Vietnamese (Nguyen and Nguyen, 2020) outperforms the multilingual model XLM-R (Conneau et al., 2020). We publicly release our dataset at: https://github.com/VinAIResearch/PhoNER_COVID19
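
To make the word segmentation finding concrete: Vietnamese words often span several space-separated syllables, so NER labels can be attached either to syllables or to segmented words. The sketch below is only an illustration under stated assumptions, not the authors' pipeline. It assumes BIO-style tags, assumes the segmenter's output is given as a list of word lengths in syllables, and joins syllables with underscores (a common convention for segmented Vietnamese text). The function name `syllables_to_words`, the example sentence, and the `LOCATION` label are hypothetical choices made for the example.

```python
from typing import List, Tuple


def syllables_to_words(
    syllables: List[str],
    tags: List[str],
    word_lengths: List[int],
) -> Tuple[List[str], List[str]]:
    """Merge syllable-level tokens and BIO tags into word-level ones.

    word_lengths[i] is the number of syllables in the i-th word, as a
    word segmenter might report it (assumed input format).
    Each word takes the tag of its first syllable.
    """
    if len(syllables) != len(tags):
        raise ValueError("syllables and tags must have the same length")
    if sum(word_lengths) != len(syllables):
        raise ValueError("word_lengths must cover every syllable exactly once")

    words: List[str] = []
    word_tags: List[str] = []
    start = 0
    for length in word_lengths:
        # Join the syllables of one word with underscores.
        words.append("_".join(syllables[start:start + length]))
        # The first syllable's tag decides the word-level tag.
        word_tags.append(tags[start])
        start += length
    return words, word_tags


if __name__ == "__main__":
    # Hypothetical example sentence with an illustrative LOCATION entity.
    syls = ["Bệnh", "nhân", "điều", "trị", "tại", "Bệnh", "viện", "Bạch", "Mai"]
    syl_tags = ["O", "O", "O", "O", "O",
                "B-LOCATION", "I-LOCATION", "I-LOCATION", "I-LOCATION"]
    lengths = [2, 2, 1, 2, 2]  # assumed segmenter output
    for word, tag in zip(*syllables_to_words(syls, syl_tags, lengths)):
        print(word, tag)
```

Taking the first syllable's tag is a simplification that holds when entity boundaries coincide with word boundaries; a mismatch between the two would need an explicit resolution rule.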

research
04/08/2023

WikiGoldSK: Annotated Dataset, Baselines and Few-Shot Learning Experiments for Slovak Named Entity Recognition

Named Entity Recognition (NER) is a fundamental NLP task with a wide ra...
research
03/22/2021

MasakhaNER: Named Entity Recognition for African Languages

We take a step towards addressing the under-representation of the Africa...
research
02/22/2023

FiNER: Financial Named Entity Recognition Dataset and Weak-Supervision Model

The development of annotated datasets over the 21st century has helped u...
research
05/02/2020

Sources of Transfer in Multilingual Named Entity Recognition

Named-entities are inherently multilingual, and annotations in any given...
research
04/16/2023

EasyNER: A Customizable Easy-to-Use Pipeline for Deep Learning- and Dictionary-based Named Entity Recognition from Medical Text

Medical research generates a large number of publications with the PubMe...
research
05/18/2020

A Semantically Enriched Dataset based on Biomedical NER for the COVID19 Open Research Dataset Challenge

Research into COVID-19 is a big challenge and highly relevant at the mom...
research
10/07/2022

Named Entity Recognition in Twitter: A Dataset and Analysis on Short-Term Temporal Shifts

Recent progress in language model pre-training has led to important impr...
