Multilingual Multiword Expression Identification Using Lateral Inhibition and Domain Adaptation

06/17/2023
by Andrei-Marius Avram, et al.

Correctly identifying multiword expressions (MWEs) is an important task for most natural language processing systems, since misidentifying them can introduce ambiguity and lead to misunderstanding of the underlying text. In this work, we evaluate the performance of the mBERT model for MWE identification in a multilingual setting by training it on all 14 languages available in version 1.2 of the PARSEME corpus. We also incorporate lateral inhibition and language adversarial training into our methodology to create language-independent embeddings and improve the model's ability to identify multiword expressions. Our evaluation shows that this approach outperforms the best system of the PARSEME 1.2 competition, MTLB-STRUCT, on 11 out of 14 languages for global MWE identification and on 12 out of 14 languages for unseen MWE identification. Additionally, averaged across all languages, our best approach outperforms MTLB-STRUCT by 1.23% on global MWE identification.
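The abstract names its two additions to mBERT without describing them. As a rough illustration only, the sketch below assumes (1) a Heaviside-gated lateral inhibition layer, in which each feature is switched off when a weighted sum of the *other* features plus a bias is non-positive, and (2) the gradient reversal layer commonly used for adversarial language or domain adaptation, which passes features through unchanged but flips the sign of the gradient coming back from a language discriminator. The names `lateral_inhibition`, `GradientReversal`, and `lam`, as well as the exact gating rule, are hypothetical and are not taken from the paper. A real model would implement both inside an autodiff framework and would need a smooth surrogate gradient for the step function.

```python
from typing import List, Sequence

Vector = List[float]
Matrix = Sequence[Sequence[float]]


def lateral_inhibition(x: Sequence[float], W: Matrix, b: Sequence[float]) -> Vector:
    """Hypothetical lateral inhibition gate (assumed formulation, not the paper's code).

    Computes y = x * H(x @ ZeroDiag(W) + b), where H is the Heaviside step and the
    diagonal of W is ignored so that a feature never inhibits itself.
    """
    n = len(x)
    y: Vector = []
    for i in range(n):
        # Weighted sum of the other features; the self-term W[i][i] is skipped.
        z = sum(x[j] * W[j][i] for j in range(n) if j != i) + b[i]
        # Heaviside gate: keep x[i] when z > 0, otherwise suppress it to 0.0.
        y.append(x[i] if z > 0 else 0.0)
    return y


class GradientReversal:
    """Hypothetical gradient reversal layer for language adversarial training."""

    def __init__(self, lam: float = 1.0) -> None:
        # lam scales how strongly the reversed gradient pushes on the encoder.
        self.lam = lam

    def forward(self, features: Sequence[float]) -> Vector:
        # Identity on the forward pass: the language discriminator sees the
        # encoder's features unchanged.
        return list(features)

    def backward(self, grad_from_discriminator: Sequence[float]) -> Vector:
        # The encoder receives the negated, scaled gradient, so it is updated to
        # *increase* the discriminator's loss, nudging it toward embeddings from
        # which the input language is hard to predict.
        return [-self.lam * g for g in grad_from_discriminator]


x = [1.0, 2.0, -0.5]
W = [[0.0, -1.0, 0.5],
     [0.5, 0.0, -1.0],
     [1.0, 1.0, 0.0]]
b = [0.0, 0.0, 0.0]
print(lateral_inhibition(x, W, b))  # [1.0, 0.0, 0.0]
print(GradientReversal(lam=0.5).backward([0.2, -0.4]))  # [-0.1, 0.2]
```

In the usage lines, only the first feature survives the gate because the weighted sums for the second and third features are negative. Keeping the two pieces separate mirrors how they are usually combined: the gate acts on the token representations used for MWE tagging, while the reversal layer sits only between the shared encoder and the language discriminator.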



