Curating corpora with classifiers: A case study of clean energy sentiment online

05/04/2023
by Michael V. Arnold, et al.

Well-curated, large-scale corpora of social media posts containing broad public opinion offer an alternative data source to complement traditional surveys. While surveys are effective at collecting representative samples and are capable of achieving high accuracy, they can be expensive to run and can lag public opinion by days or weeks. Both of these drawbacks could be overcome with a real-time, high-volume data stream and a fast analysis pipeline. A central challenge in orchestrating such a data pipeline is devising an effective method for rapidly selecting the best corpus of relevant documents for analysis. Querying with keywords alone often returns irrelevant documents that are not easily disambiguated with bag-of-words natural language processing methods. Here, we explore methods of corpus curation that filter out irrelevant tweets using pre-trained transformer-based models, fine-tuned for our binary classification task on hand-labeled tweets. We are able to achieve F1 scores of up to 0.95. The low cost and high performance of fine-tuning such a model suggest that our approach could be of broad benefit as a pre-processing step for social media datasets with uncertain corpus boundaries.
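To make the two-stage idea in the abstract concrete, here is a minimal sketch of keyword querying followed by a binary relevance filter, together with the F1 score used to evaluate such a filter. It is not the authors' code. The function names (`keyword_match`, `curate`, `f1_score`), the example tweets, and the keyword list are all hypothetical, and `toy_is_relevant` is only a placeholder standing in for a fine-tuned transformer classifier.

```python
from typing import Callable, List, Sequence


def keyword_match(text: str, keywords: Sequence[str]) -> bool:
    """Stage 1: cheap keyword query (case-insensitive substring match)."""
    lowered = text.lower()
    return any(k in lowered for k in keywords)


def curate(tweets: Sequence[str], keywords: Sequence[str],
           is_relevant: Callable[[str], bool]) -> List[str]:
    """Stage 2: keep keyword matches that the classifier marks as relevant."""
    return [t for t in tweets if keyword_match(t, keywords) and is_relevant(t)]


def f1_score(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """F1 for the positive class (1 = relevant, 0 = irrelevant)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0  # no true positives: precision and recall are both zero
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical tweets: both match the keyword "solar", only one is on topic.
tweets = [
    "Solar panels cut my electricity bill in half",
    "Watching the solar eclipse tonight",
    "Great pizza place downtown",
]


def toy_is_relevant(text: str) -> bool:
    # Placeholder rule for illustration only; a real pipeline would call
    # a fine-tuned classifier here instead.
    return "eclipse" not in text.lower()


curated = curate(tweets, ["solar"], toy_is_relevant)

# Hand labels vs. predictions for five hypothetical tweets.
score = f1_score([1, 1, 1, 0, 0], [1, 1, 0, 1, 0])
```

The keyword stage alone would keep the eclipse tweet, which is the kind of ambiguity the abstract describes. In this toy evaluation, precision and recall are both 2/3, so the F1 score is also 2/3.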

research
07/05/2023

Emoji Prediction using Transformer Models

In recent years, the use of emojis in social media has increased dramati...
research
10/26/2020

UPB at SemEval-2020 Task 12: Multilingual Offensive Language Detection on Social Media by Fine-tuning a Variety of BERT-based Models

Offensive language detection is one of the most challenging problem in t...
research
02/02/2019

Making a Case for Social Media Corpus for Detecting Depression

The social media platform provides an opportunity to gain valuable insig...
research
07/05/2023

Unsupervised Sentiment Analysis of Plastic Surgery Social Media Posts

The massive collection of user posts across social media platforms is pr...
research
11/29/2020

A Novel Sentiment Analysis Engine for Preliminary Depression Status Estimation on Social Media

Text sentiment analysis for preliminary depression status estimation of ...
research
04/07/2023

Opinion Mining from YouTube Captions Using ChatGPT: A Case Study of Street Interviews Polling the 2023 Turkish Elections

Opinion mining plays a critical role in understanding public sentiment a...
