Bilingual Character Representation for Efficiently Addressing Out-of-Vocabulary Words in Code-Switching Named Entity Recognition

05/30/2018
by Genta Indra Winata, et al.

We propose an LSTM-based model with a hierarchical architecture for named entity recognition on code-switching Twitter data. Our model uses bilingual character representations and transfer learning to address out-of-vocabulary words. To mitigate data noise, we apply token replacement and normalization. In the 3rd Workshop on Computational Approaches to Linguistic Code-Switching Shared Task, we achieved second place with an F1-score of 62.76 on the English-Spanish language pair, without using any gazetteers or knowledge-based information.
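
To make the token replacement and normalization step concrete, the following is a minimal sketch rather than the authors' implementation. The placeholder tokens (`<url>`, `<user>`), the regular expressions, and the rule for collapsing elongated words are all assumptions chosen for illustration; the abstract does not specify which replacements were used.

```python
import re
from typing import List

# Hypothetical placeholder tokens; the abstract does not name the ones actually used.
URL_TOKEN = "<url>"
USER_TOKEN = "<user>"

URL_RE = re.compile(r"^(https?://|www\.)\S+$", re.IGNORECASE)
USER_RE = re.compile(r"^@\w+$")
REPEAT_RE = re.compile(r"(.)\1{2,}")  # a character repeated three or more times


def normalize_token(token: str) -> str:
    """Map a noisy Twitter token onto a smaller, more regular vocabulary."""
    if URL_RE.match(token):
        return URL_TOKEN
    if USER_RE.match(token):
        return USER_TOKEN
    # Collapse elongations such as "holaaaa" to "holaa" (assumed rule).
    return REPEAT_RE.sub(r"\1\1", token)


def normalize_tweet(tokens: List[str]) -> List[str]:
    """Normalize token by token so the sequence stays aligned with its NER labels."""
    return [normalize_token(t) for t in tokens]


if __name__ == "__main__":
    print(normalize_tweet(["@maria", "holaaaa", "check", "https://t.co/abc", "Miami"]))
```

Working one token at a time keeps the output the same length as the input, so per-token entity labels remain aligned after noise reduction.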


