A comprehensive empirical analysis on cross-domain semantic enrichment for detection of depressive language

06/24/2021
by   Nawshad Farruque, et al.

We analyze the process of creating word embedding feature representations for a learning task when annotated data is scarce, for example, in depressive language detection from Tweets. We start with a rich word embedding pre-trained on a large general dataset, then augment it with embeddings learned from a much smaller, more specific domain dataset through a simple non-linear mapping mechanism. We also experiment with several more sophisticated mapping methods, including auto-encoder-based and custom loss-function-based approaches that learn representations by gradually moving closer to words of similar semantics and farther from words of dissimilar semantics. Our strengthened representations better capture the semantics of the depression domain, since they combine the semantics learned from the specific domain with the word coverage of the general language. We also present a comparative performance analysis of our word embedding representations against a simple bag-of-words model, well-known sentiment and psycholinguistic lexicons, and a general pre-trained word embedding. When used as feature representations for several machine learning methods, including deep learning models, in a depressive-Tweet identification task, our augmented word embedding representations achieve a significantly better F1 score than the alternatives, especially when applied to a high-quality dataset. We also present several data ablation tests that confirm the efficacy of our augmentation techniques.
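To make the core idea concrete, here is a minimal sketch (not the authors' code) of learning a simple non-linear mapping between two embedding spaces: a small one-hidden-layer network is fit on the vocabulary shared by a general pre-trained embedding and a domain-specific one, and can then project general-only words into the domain space to extend coverage. All data, dimensions, and the plain NumPy training loop are illustrative assumptions.

```python
# Hedged sketch (not the authors' implementation): learn a non-linear map
# from a general embedding space to a domain-specific one using the shared
# vocabulary as supervision, then apply it to words that exist only in the
# general embedding.
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-ins for real embeddings (e.g. general Tweet embeddings vs.
# embeddings trained on depression-domain text); sizes are illustrative.
d_gen, d_dom, n_shared = 8, 6, 200
X = rng.normal(size=(n_shared, d_gen))           # general vectors, shared vocab
true_W = rng.normal(size=(d_gen, d_dom)) * 0.5
Y = np.tanh(X @ true_W)                          # "domain" vectors, shared vocab

# One-hidden-layer MLP: Y_hat = tanh(X W1 + b1) W2 + b2
h = 16
W1 = rng.normal(size=(d_gen, h)) * 0.1; b1 = np.zeros(h)
W2 = rng.normal(size=(h, d_dom)) * 0.1; b2 = np.zeros(d_dom)

lr = 0.05
for _ in range(2000):
    H = np.tanh(X @ W1 + b1)
    err = (H @ W2 + b2) - Y                      # MSE gradient w.r.t. output
    gW2 = H.T @ err / n_shared; gb2 = err.mean(0)
    dH = (err @ W2.T) * (1 - H**2)               # back-prop through tanh
    gW1 = X.T @ dH / n_shared; gb1 = dH.mean(0)
    W1 -= lr * gW1; b1 -= lr * gb1
    W2 -= lr * gW2; b2 -= lr * gb2

def map_to_domain(x):
    """Project a general-embedding vector into the domain space."""
    return np.tanh(x @ W1 + b1) @ W2 + b2

mse = float(((map_to_domain(X) - Y) ** 2).mean())
print(f"reconstruction MSE on shared vocab: {mse:.4f}")
```

After training, `map_to_domain` can be applied to every word the general embedding covers, yielding domain-flavoured vectors even for words absent from the small domain corpus, which is the coverage-plus-domain-semantics combination the abstract describes.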

