SentimentArcs: A Novel Method for Self-Supervised Sentiment Analysis of Time Series Shows SOTA Transformers Can Struggle Finding Narrative Arcs

10/18/2021
by Jon Chun, et al.

SOTA Transformer and DNN short-text sentiment classifiers report over 97% accuracy on narrow domains such as IMDB movie reviews. Real-world performance is significantly lower because traditional models overfit benchmarks and generalize poorly to different or more open-domain texts. This paper introduces SentimentArcs, a new self-supervised time series sentiment analysis methodology that addresses the two main limitations of traditional supervised sentiment analysis: limited labeled training datasets and poor generalization. A large ensemble of diverse models provides a synthetic ground truth for self-supervised learning. Novel metrics jointly optimize an exhaustive search across every possible corpus:model combination. The joint optimization over both the corpus and the model solves the generalization problem. Simple visualizations exploit the temporal structure of narratives so domain experts can quickly spot trends, identify key features, and note anomalies across hundreds of arcs and millions of data points. To our knowledge, this is the first self-supervised method for time series sentiment analysis and the largest survey directly comparing real-world model performance on long-form narratives.
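The idea of an ensemble acting as a synthetic ground truth, with every corpus:model pair scored against it, can be illustrated with a minimal sketch. This is not the authors' SentimentArcs code: it assumes each model has already produced one sentiment score per sentence, and the z-score normalization, centered moving average, pointwise median consensus, RMSE distance, and the model names (`vader`, `textblob`, `flipped`) are all hypothetical, illustrative choices.

```python
from statistics import mean, median, pstdev


def zscore(series):
    """Standardize an arc so models with different output scales are comparable."""
    mu, sigma = mean(series), pstdev(series)
    if sigma == 0:
        return [0.0 for _ in series]
    return [(x - mu) / sigma for x in series]


def smooth(series, window=3):
    """Centered simple moving average; the window shrinks at the edges."""
    half = window // 2
    out = []
    for i in range(len(series)):
        lo, hi = max(0, i - half), min(len(series), i + half + 1)
        out.append(mean(series[lo:hi]))
    return out


def ensemble_arc(model_arcs, window=3):
    """Synthetic ground truth: pointwise median of the smoothed, standardized arcs."""
    prepared = {name: smooth(zscore(arc), window) for name, arc in model_arcs.items()}
    length = len(next(iter(prepared.values())))
    consensus = [median(arc[i] for arc in prepared.values()) for i in range(length)]
    return prepared, consensus


def rmse(a, b):
    """Root-mean-square distance between two equal-length arcs."""
    return mean((x - y) ** 2 for x, y in zip(a, b)) ** 0.5


def rank_models(model_arcs, window=3):
    """Rank models by distance from the ensemble consensus (closest first)."""
    prepared, consensus = ensemble_arc(model_arcs, window)
    scores = {name: rmse(arc, consensus) for name, arc in prepared.items()}
    return sorted(scores.items(), key=lambda kv: kv[1])


def best_model_per_corpus(corpora, window=3):
    """Exhaustively score every corpus:model pair and keep the closest model per corpus."""
    return {corpus: rank_models(arcs, window)[0][0] for corpus, arcs in corpora.items()}


# Hypothetical per-sentence scores for one narrative from three models.
corpora = {
    "novel_a": {
        "vader": [0.1, 0.4, -0.2, -0.6, 0.3, 0.8],
        "textblob": [0.0, 0.5, -0.1, -0.5, 0.2, 0.9],
        "flipped": [-0.1, -0.4, 0.2, 0.6, -0.3, -0.8],
    }
}
ranking = rank_models(corpora["novel_a"])
best = best_model_per_corpus(corpora)
```

Because the consensus is a median rather than a mean, a single model whose arc runs opposite to the others (here `flipped`) pulls the synthetic ground truth very little and ends up ranked last for that corpus.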



research
06/14/2023

AlbMoRe: A Corpus of Movie Reviews for Sentiment Analysis in Albanian

Lack of available resources such as text corpora for low-resource langua...
research
05/08/2022

Multi-Domain Targeted Sentiment Analysis

Targeted Sentiment Analysis (TSA) is a central task for generating insig...
research
09/07/2023

Adapting Self-Supervised Representations to Multi-Domain Setups

Current state-of-the-art self-supervised approaches, are effective when ...
research
05/29/2017

An Automatic Contextual Analysis and Clustering Classifiers Ensemble approach to Sentiment Analysis

Products reviews are one of the major resources to determine the public ...
research
01/11/2018

Polypus: a Big Data Self-Deployable Architecture for Microblogging Text Extraction and Real-Time Sentiment Analysis

In this paper we propose a new parallel architecture based on Big Data t...
research
12/06/2021

ActiveZero: Mixed Domain Learning for Active Stereovision with Zero Annotation

Traditional depth sensors generate accurate real world depth estimates t...
research
08/18/2021

FeelsGoodMan: Inferring Semantics of Twitch Neologisms

Twitch chats pose a unique problem in natural language understanding due...
