Hahahahaha, Duuuuude, Yeeessss!: A two-parameter characterization of stretchable words and the dynamics of mistypings and misspellings

07/09/2019
by Tyler J. Gray, et al.

Stretched words like 'heellllp' or 'heyyyyy' are a regular feature of spoken language, often used to emphasize or exaggerate the underlying meaning of the root word. While stretched words are rarely found in formal written language and dictionaries, they are prevalent within social media. In this paper, we examine the frequency distributions of 'stretchable words' found in roughly 100 billion tweets authored over an 8-year period. We introduce two central parameters, 'balance' and 'stretch', that capture the main characteristics of these distributions, and we explore their dynamics with two visual tools we call 'balance plots' and 'spelling trees'. We discuss how the tools and methods we develop here could be used to study the statistical patterns of mistypings and misspellings, along with potential applications in augmenting dictionaries, improving language processing, and any area where sequence construction matters, such as genetics.
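To make the idea of a stretched word concrete, the following is a minimal Python sketch, not the authors' method. It groups tokens that differ only in how many times each letter is repeated and records the repetition count of each letter. The function names (`collapse`, `run_lengths`, `group_variants`) are hypothetical and chosen for illustration; the abstract does not define how 'balance' and 'stretch' are computed, so they are not computed here.

```python
import re
from collections import Counter, defaultdict
from typing import Dict, Iterable, List, Tuple


def collapse(word: str) -> str:
    """Collapse every run of a repeated character to a single character.

    This is only a grouping key (an assumption for illustration): it also
    shortens legitimately doubled letters, e.g. 'hello' becomes 'helo'.
    """
    return re.sub(r"(.)\1+", r"\1", word.lower())


def run_lengths(word: str) -> List[Tuple[str, int]]:
    """Return (character, run length) pairs in order of appearance."""
    return [(m.group(1), len(m.group(0)))
            for m in re.finditer(r"(.)\1*", word.lower())]


def group_variants(tokens: Iterable[str]) -> Dict[str, Counter]:
    """Count how often each spelling occurs under each collapsed key."""
    groups: Dict[str, Counter] = defaultdict(Counter)
    for token in tokens:
        groups[collapse(token)][token.lower()] += 1
    return groups


if __name__ == "__main__":
    sample = ["heyyyyy", "hey", "heellllp", "help", "Heyyyyy"]
    for key, variants in group_variants(sample).items():
        print(key, dict(variants))
    print(run_lengths("heellllp"))
```

The per-key counts produced by `group_variants` are one possible starting point for a frequency distribution over stretched spellings. A real pipeline would also need tokenization and a way to tell intentional stretching from ordinary doubled letters.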


