Misspelling Oblivious Word Embeddings

05/23/2019
by Bora Edizel, et al.

In this paper we present a method to learn word embeddings that are resilient to misspellings. Existing word embeddings have limited applicability to malformed text, which contains a non-negligible amount of out-of-vocabulary words. Our method combines fastText's subword representations with a supervised task of learning misspelling patterns, so that misspellings of each word are embedded close to their correct variants. We train these embeddings on a new dataset that we are releasing publicly. Finally, we experimentally demonstrate the advantages of this approach on both intrinsic and extrinsic NLP tasks using public test sets.
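The core idea lends itself to a short illustration. Below is a minimal sketch, assuming a PyTorch setup, of how a word vector can be built as a bag of character n-grams (fastText-style) and how a supervised term can pull a misspelling's vector toward that of its correct spelling. The names `ngrams`, `MOESketch`, and `misspelling_loss`, along with the cosine-based objective, are illustrative assumptions rather than the paper's actual implementation, which weights the fastText loss against a dedicated spell-correction loss.

```python
# Minimal sketch (assumed names, not the authors' code): a word vector is the
# average of its character n-gram vectors, fastText-style, and a supervised
# term pulls each misspelling's vector toward its correctly spelled variant.

import torch
import torch.nn.functional as F


def ngrams(word, n_min=3, n_max=6):
    """Character n-grams of a word wrapped in boundary markers, as in fastText."""
    w = f"<{word}>"
    return [w[i:i + n] for n in range(n_min, n_max + 1) for i in range(len(w) - n + 1)]


class MOESketch(torch.nn.Module):
    def __init__(self, ngram_vocab, dim=100):
        super().__init__()
        self.idx = {g: i for i, g in enumerate(ngram_vocab)}
        self.emb = torch.nn.Embedding(len(ngram_vocab), dim)

    def vector(self, word):
        # Bag-of-n-grams word representation; n-grams unseen in training are skipped.
        ids = torch.tensor([self.idx[g] for g in ngrams(word) if g in self.idx])
        return self.emb(ids).mean(dim=0)


def misspelling_loss(model, pairs):
    """Supervised term: push misspellings close to their correct variants."""
    losses = [1.0 - F.cosine_similarity(model.vector(m), model.vector(c), dim=0)
              for m, c in pairs]
    return torch.stack(losses).mean()


# Toy usage: in full training this term would be weighted against the usual
# fastText skip-gram loss, e.g. total = alpha * skipgram + (1 - alpha) * this.
pairs = [("langauge", "language"), ("teh", "the")]
vocab = sorted({g for m, c in pairs for w in (m, c) for g in ngrams(w)})
model = MOESketch(vocab, dim=16)
print(misspelling_loss(model, pairs).item())
```

Because the gradients from this term flow into n-gram embeddings shared across the whole vocabulary, an unseen misspelling can still land near its correct form at inference time; that is the property the abstract describes.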


Related research

10/08/2018
Word Embeddings from Large-Scale Greek Web content
Word embeddings are undoubtedly very useful components in many NLP tasks...

11/25/2019
Towards robust word embeddings for noisy texts
Research on word embeddings has mainly focused on improving their perfor...

04/11/2018
Exploiting Task-Oriented Resources to Learn Word Embeddings for Clinical Abbreviation Expansion
In the medical domain, identifying and expanding abbreviations in clinic...

09/04/2017
Learning Word Embeddings from the Portuguese Twitter Stream: A Study of some Practical Aspects
This paper describes a preliminary study for producing and distributing ...

12/30/2020
kōan: A Corrected CBOW Implementation
It is a common belief in the NLP community that continuous bag-of-words ...

04/21/2018
Context-Attentive Embeddings for Improved Sentence Representations
While one of the first steps in many NLP systems is selecting what embed...

01/05/2016
The Role of Context Types and Dimensionality in Learning Word Embeddings
We provide the first extensive evaluation of how using different types o...
