Word-level Lexical Normalisation using Context-Dependent Embeddings

11/13/2019
by   Michael Stewart, et al.
0

Lexical normalisation (LN) is the process of correcting each word in a dataset to its canonical form so that it may be more easily and more accurately analysed. Most lexical normalisation systems operate at the character-level, while word-level models are seldom used. Recent language models offer solutions to the drawbacks of word-level LN models, yet, to the best of our knowledge, no research has investigated their effectiveness on LN. In this paper we introduce a word-level GRU-based LN model and investigate the effectiveness of recent embedding techniques on word-level LN. Our results show that our GRU-based word-level model produces greater results than character-level models, and outperforms existing deep-learning based LN techniques on Twitter data. We also find that randomly-initialised embeddings are capable of outperforming pre-trained embedding models in certain scenarios. Finally, we release a substantial lexical normalisation dataset to the community.

READ FULL TEXT

page 1

page 2

page 3

page 4

research
08/15/2019

SenseBERT: Driving Some Sense into BERT

Self-supervision techniques have allowed neural language models to advan...
research
03/02/2018

Syntax-Aware Language Modeling with Recurrent Neural Networks

Neural language models (LMs) are typically trained using only lexical fe...
research
09/14/2022

Integrating Form and Meaning: A Multi-Task Learning Model for Acoustic Word Embeddings

Models of acoustic word embeddings (AWEs) learn to map variable-length s...
research
04/14/2021

UPB at SemEval-2021 Task 1: Combining Deep Learning and Hand-Crafted Features for Lexical Complexity Prediction

Reading is a complex process which requires proper understanding of text...
research
06/13/2019

Antonym-Synonym Classification Based on New Sub-space Embeddings

Distinguishing antonyms from synonyms is a key challenge for many NLP ap...
research
06/08/2021

Swords: A Benchmark for Lexical Substitution with Improved Data Coverage and Quality

We release a new benchmark for lexical substitution, the task of finding...
research
09/17/2022

Unsupervised Lexical Substitution with Decontextualised Embeddings

We propose a new unsupervised method for lexical substitution using pre-...

Please sign up or login with your details

Forgot password? Click here to reset