WikiGoldSK: Annotated Dataset, Baselines and Few-Shot Learning Experiments for Slovak Named Entity Recognition

04/08/2023
by Dávid Šuba, et al.

Named Entity Recognition (NER) is a fundamental NLP task with a wide range of practical applications. The performance of state-of-the-art NER methods depends on high-quality, manually annotated datasets, which still do not exist for some languages. In this work, we aim to remedy this situation for Slovak by introducing WikiGoldSK, the first sizable human-labelled Slovak NER dataset. We benchmark it by evaluating state-of-the-art multilingual Pretrained Language Models and comparing it to the existing silver-standard Slovak NER dataset. We also conduct few-shot experiments and show that training on a silver-standard dataset yields better results. To enable future work based on Slovak NER, we release the dataset, the code, and the trained models publicly under permissible licensing terms at https://github.com/NaiveNeuron/WikiGoldSK.

research
03/22/2021

MasakhaNER: Named Entity Recognition for African Languages

We take a step towards addressing the under-representation of the Africa...
research
04/08/2021

COVID-19 Named Entity Recognition for Vietnamese

The current COVID-19 pandemic has led to the creation of many corpora t...
research
06/29/2021

Learning from Miscellaneous Other-Class Words for Few-shot Named Entity Recognition

Few-shot Named Entity Recognition (NER) exploits only a handful of annot...
research
01/18/2022

Annotating the Tweebank Corpus on Named Entity Recognition and Building NLP Models for Social Media Analysis

Social media data such as Twitter messages ("tweets") pose a particular ...
research
01/12/2020

Rethinking Generalization of Neural Models: A Named Entity Recognition Case Study

While neural network-based models have achieved impressive performance o...
research
09/17/2021

reproducing "ner and pos when nothing is capitalized"

Capitalization is an important feature in many NLP tasks such as Named E...
research
08/28/2023

FonMTL: Towards Multitask Learning for the Fon Language

The Fon language, spoken by an average 2 million of people, is a truly l...
