Strong Baselines for Complex Word Identification across Multiple Languages

04/11/2019
by Pierre Finnimore, et al.

Complex Word Identification (CWI) is the task of identifying which words or phrases in a sentence are difficult for a target audience to understand. The latest CWI Shared Task released data for two settings: monolingual (i.e., training and testing in the same language) and cross-lingual (i.e., testing in a language not seen during training). The best monolingual models relied on language-dependent features, which do not generalise to the cross-lingual setting, while the best cross-lingual model used neural networks with multi-task learning. In this paper, we present monolingual and cross-lingual CWI models that perform as well as (or better than) most models submitted to the latest CWI Shared Task. We show that carefully selected features and simple learning models can achieve state-of-the-art performance and provide strong baselines for future development in this area. Finally, we discuss how inconsistencies in the annotation of the data can explain some of the results obtained.
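To make the task concrete, the following is a minimal sketch of what a feature-based complex word identifier could look like. It is not the paper's model: the abstract does not name its features or learner, so the two features used here (word length and relative corpus frequency), the thresholds, the toy corpus, and the function names `build_frequencies`, `features`, and `is_complex` are all illustrative assumptions.

```python
from collections import Counter

def build_frequencies(corpus_tokens):
    """Relative frequency of each lowercased token in a reference corpus."""
    counts = Counter(t.lower() for t in corpus_tokens)
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}

def features(word, freqs):
    """Two assumed features that need no language-specific resources
    beyond a corpus: character length and relative frequency."""
    return {"length": len(word), "rel_freq": freqs.get(word.lower(), 0.0)}

def is_complex(word, freqs, max_len=7, min_freq=0.05):
    """Hypothetical rule: flag words that are both long and rare.
    The thresholds are arbitrary and chosen only for this toy example."""
    f = features(word, freqs)
    return f["length"] > max_len and f["rel_freq"] < min_freq

# Toy reference corpus and sentence (made up for illustration).
corpus = "the cat sat on the mat and the dog sat on the rug".split()
freqs = build_frequencies(corpus)

sentence = "the cat sat on the ostentatious mat".split()
labels = [(w, is_complex(w, freqs)) for w in sentence]
complex_words = [w for w, flagged in labels if flagged]
```

In this sketch only the long, unseen word is flagged. In a monolingual setting the frequency table and thresholds would come from the same language as the test sentence; in a cross-lingual setting the features must be computable in a language with no training labels, which is why features of this kind are a natural starting point for a baseline.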


