A Study of Slang Representation Methods

12/11/2022
by Aravinda Kolla, et al.

Warning: this paper contains content that may be offensive or upsetting.

Considering the large amount of content created online by the minute, slang-aware automatic tools are critically needed to promote social good, and assist policymakers and moderators in restricting the spread of offensive language, abuse, and hate speech. Despite the success of large language models and the spontaneous emergence of slang dictionaries, it is unclear how far their combination goes in terms of slang understanding for downstream social good tasks. In this paper, we provide a framework to study different combinations of representation learning models and knowledge resources for a variety of downstream tasks that rely on slang understanding. Our experiments show the superiority of models that have been pre-trained on social media data, while the impact of dictionaries is positive only for static word embeddings. Our error analysis identifies core challenges for slang representation learning, including out-of-vocabulary words, polysemy, variance, and annotation disagreements, which can be traced to characteristics of slang as a quickly evolving and highly subjective language.
