Exploring the Potential of Machine Translation for Generating Named Entity Datasets: A Case Study between Persian and English

02/19/2023
by Amir Sartipi, et al.

This study focuses on the generation of Persian named entity datasets through the application of machine translation to English datasets. The generated datasets were evaluated by experimenting with one monolingual and one multilingual transformer model. Notably, the CoNLL 2003 dataset achieved the highest F1 score of 85.11, while the lowest F1 score was 40.02. These results suggest the potential of machine translation for creating high-quality named entity recognition datasets for low-resource languages like Persian. The study compares the performance of these generated datasets with English named entity recognition systems and provides insights into the effectiveness of machine translation for this task. Additionally, this approach could be used to augment data in low-resource languages or to create noisy data that makes named entity recognition systems more robust.
