ANEC: An Amharic Named Entity Corpus and Transformer Based Recognizer

07/02/2022
by Ebrahim Chekol Jibril, et al.

Named entity recognition is an information extraction task that serves as a preprocessing step for other natural language processing tasks, such as machine translation, information retrieval, and question answering. It identifies proper names as well as temporal and numeric expressions in open-domain text. For Semitic languages such as Arabic, Amharic, and Hebrew, the task is more challenging because these languages are heavily inflected. In this paper, we present an Amharic named entity recognition system based on bidirectional long short-term memory with a conditional random fields layer. We annotate a new Amharic named entity recognition dataset (8,070 sentences comprising 182,691 tokens) and apply the Synthetic Minority Over-sampling Technique to it to mitigate the class imbalance problem. Our named entity recognition system achieves an F_1 score of 93, a state-of-the-art result for Amharic named entity recognition.
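The core idea behind the Synthetic Minority Over-sampling Technique mentioned above can be illustrated with a minimal sketch: a synthetic sample is placed at a random point on the line segment between a minority-class sample and one of its nearest minority-class neighbours. The abstract does not say how tokens are represented or how the technique was configured, so the toy 2-D vectors, the function name `smote`, and the parameters `n_new`, `k`, and `seed` are all assumptions made purely for illustration, not the authors' implementation.

```python
import random


def smote(minority, n_new, k=2, seed=0):
    """Return n_new synthetic points interpolated between minority samples.

    Hypothetical helper for illustration only; feature vectors are plain
    lists of floats and distances are squared Euclidean.
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        # Pick a minority sample to oversample from.
        x = rng.choice(minority)
        # Rank the other minority samples by distance to x.
        others = [p for p in minority if p is not x]
        others.sort(key=lambda p: sum((a - b) ** 2 for a, b in zip(x, p)))
        # Choose one of the k nearest neighbours at random.
        neighbour = rng.choice(others[:k])
        # Interpolate: the new point lies on the segment from x to neighbour.
        gap = rng.random()
        synthetic.append([a + gap * (b - a) for a, b in zip(x, neighbour)])
    return synthetic


# Toy minority class (assumed data): four points in the unit square.
minority = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
new_points = smote(minority, n_new=6)
```

Because every synthetic point is a convex combination of two existing minority samples, it stays inside the region the minority class already occupies. In practice an established library implementation would normally be preferred over a hand-written one.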

