Natural Language Processing Chains Inside a Cross-lingual Event-Centric Knowledge Pipeline for European Union Under-resourced Languages

10/23/2020
by   Diego Alves, et al.
0

This article presents the strategy for developing a platform containing Language Processing Chains for European Union languages, consisting of Tokenization to Parsing, also including Named Entity recognition andwith addition ofSentiment Analysis. These chains are part of the first step of an event-centric knowledge processing pipeline whose aim is to process multilingual media information about major events that can cause an impactin Europe and the rest of the world. Due to the differences in terms of availability of language resources for each language, we have built this strategy in three steps, starting with processing chains for the well-resourced languages and finishing with the development of new modules for the under-resourced ones. In order to classify all European Union official languages in terms of resources, we have analysed the size of annotated corpora as well as the existence of pre-trained models in mainstream Language Processing tools, and we have combined this information with the proposed classification published at META-NETwhitepaper series.

READ FULL TEXT
research
04/11/2023

A Survey of Resources and Methods for Natural Language Processing of Serbian Language

The Serbian language is a Slavic language spoken by over 12 million spea...
research
10/23/2020

Evaluating Language Tools for Fifteen EU-official Under-resourced Languages

This article presents the results of the evaluation campaign of language...
research
08/29/2018

Neural Cross-Lingual Named Entity Recognition with Minimal Resources

For languages with no annotated resources, unsupervised transfer of natu...
research
01/31/2023

Zero-shot cross-lingual transfer language selection using linguistic similarity

We study the selection of transfer languages for different Natural Langu...
research
05/12/2022

Lifting the Curse of Multilinguality by Pre-training Modular Transformers

Multilingual pre-trained models are known to suffer from the curse of mu...
research
08/08/2023

CLASSLA-Stanza: The Next Step for Linguistic Processing of South Slavic Languages

We present CLASSLA-Stanza, a pipeline for automatic linguistic annotatio...
research
03/30/2020

European Language Grid: An Overview

With 24 official EU and many additional languages, multilingualism in Eu...

Please sign up or login with your details

Forgot password? Click here to reset