Domain Adaptation in Multilingual and Multi-Domain Monolingual Settings for Complex Word Identification

05/15/2022
by George-Eduard Zaharia, et al.

Complex word identification (CWI) is a cornerstone process towards proper text simplification. CWI is highly dependent on context, and its difficulty is augmented by the scarcity of available datasets, which vary greatly in terms of domains and languages. As such, it becomes increasingly difficult to develop a robust model that generalizes across a wide array of input examples. In this paper, we propose a novel training technique for the CWI task based on domain adaptation to improve the target character and context representations. This technique addresses the problem of working with multiple domains, inasmuch as it creates a way of smoothing the differences between the explored datasets. Moreover, we also propose a similar auxiliary task, namely text simplification, that can be used to complement lexical complexity prediction. Our model obtains a boost of up to 2.42% over vanilla training techniques when considering the CompLex dataset from Lexical Complexity Prediction 2021. At the same time, we obtain an increase of 3% on the Complex Word Identification 2018 dataset. In addition, our model yields state-of-the-art results in terms of Mean Absolute Error.

research
02/17/2021

Predicting Lexical Complexity in English Texts

The first step in most text simplification is to predict which words are...
research
05/18/2021

LCP-RIT at SemEval-2021 Task 1: Exploring Linguistic Features for Lexical Complexity Prediction

This paper describes team LCP-RIT's submission to the SemEval-2021 Task ...
research
06/12/2018

Projecting Embeddings for Domain Adaptation: Joint Modeling of Sentiment Analysis in Diverse Domains

Domain adaptation for sentiment analysis is challenging due to the fact ...
research
05/21/2019

Domain adaptation for part-of-speech tagging of noisy user-generated text

The performance of a Part-of-speech (POS) tagger is highly dependent on ...
research
10/23/2020

Rapid Domain Adaptation for Machine Translation with Monolingual Data

One challenge of machine translation is how to quickly adapt to unseen d...
research
10/13/2017

Complex Word Identification: Challenges in Data Annotation and System Performance

This paper revisits the problem of complex word identification (CWI) fol...
research
05/16/2018

Extending a Parser to Distant Domains Using a Few Dozen Partially Annotated Examples

We revisit domain adaptation for parsers in the neural era. First we sho...