Evaluating Informal-Domain Word Representations With UrbanDictionary

06/27/2016
by Naomi Saphra, et al.

Existing corpora for intrinsic evaluation are not targeted towards tasks in informal domains such as Twitter or news comment forums. We want to test whether a representation of informal words fulfills the promise of making explicit text normalization unnecessary as a preprocessing step. One possible evaluation metric for such domains is the proximity of spelling variants. We propose a way to compute such a metric and to collect a spelling variant dataset using UrbanDictionary.
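
To make the idea of "proximity of spelling variants" concrete, the sketch below scores a set of word vectors by the average cosine similarity between each informal spelling and its standard form. This is only one plausible reading of the abstract, not the metric defined in the paper. The function names, the toy vectors, and the variant pairs are all hypothetical placeholders, and real pairs would come from a collected dataset rather than being written by hand.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def mean_variant_similarity(embeddings, variant_pairs):
    """Average cosine similarity over (standard, variant) pairs.

    Pairs with a word missing from `embeddings` are skipped. This is an
    assumed stand-in for a proximity metric, not the paper's definition.
    """
    sims = [
        cosine(embeddings[standard], embeddings[variant])
        for standard, variant in variant_pairs
        if standard in embeddings and variant in embeddings
    ]
    if not sims:
        raise ValueError("no variant pair is covered by the embeddings")
    return sum(sims) / len(sims)


# Hypothetical toy data: 2-d vectors chosen by hand for illustration.
toy_embeddings = {
    "you": [1.0, 0.0],
    "u": [1.0, 0.0],
    "tomorrow": [0.0, 1.0],
    "2moro": [0.0, 2.0],
}
toy_pairs = [("you", "u"), ("tomorrow", "2moro")]

score = mean_variant_similarity(toy_embeddings, toy_pairs)
print(score)
```

A higher score means the representation places spelling variants closer to their standard forms. A rank-based measure, such as how high the standard form appears among a variant's nearest neighbors, would be a reasonable alternative under the same assumptions.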
