Toward More Meaningful Resources for Lower-resourced Languages

02/24/2022
by Constantine Lignos, et al.

In this position paper, we describe our perspective on how meaningful resources for lower-resourced languages should be developed in connection with the speakers of those languages. We first examine two massively multilingual resources in detail. We explore the contents of the names stored in Wikidata for a few lower-resourced languages and find that many of them are not in fact in the languages they are labeled as, and that correcting them requires non-trivial effort. We discuss quality issues present in WikiAnn and evaluate whether it is a useful supplement to hand-annotated data. We then discuss the importance of creating annotations for lower-resourced languages in a thoughtful and ethical way that includes the languages' speakers as part of the development process. We conclude with recommended guidelines for resource development.

research
04/01/2021

Mining Wikidata for Name Resources for African Languages

This work supports further development of language technology for the la...
research
04/12/2022

Not always about you: Prioritizing community needs when developing endangered language technology

Languages are classified as low-resource when they lack the quantity of ...
research
03/22/2021

Quality at a Glance: An Audit of Web-Crawled Multilingual Datasets

With the success of large-scale pre-training and multilingual modeling i...
research
06/04/2021

Facade-X: an opinionated approach to SPARQL anything

The Semantic Web research community understood since its beginning how c...
research
04/06/2022

Language Resources and Technologies for Non-Scheduled and Endangered Indian Languages

In the present paper, we will present a survey of the language resources...
research
11/28/2022

Beyond Counting Datasets: A Survey of Multilingual Dataset Construction and Necessary Resources

While the NLP community is generally aware of resource disparities among...
research
05/28/2020

A Corpus for Large-Scale Phonetic Typology

A major hurdle in data-driven research on typology is having sufficient ...
