Impact of Feature Selection on Micro-Text Classification

08/27/2017
by Ankit Vadehra, et al.

Social media datasets, especially tweets from Twitter, are popular in the field of text classification. Tweets are a valuable source of micro-text (sometimes referred to as "micro-blogs") and have been studied in domains such as sentiment analysis, recommendation systems, spam detection, and clustering, among others. Tweets often include keywords referred to as "hashtags" that can be used as labels for the tweet. Using tweets encompassing 50 labels, we studied the impact of word-level versus character-level feature selection and extraction on different learners to solve a multi-class classification task. We show that feature extraction of simple character-level groups performs better than simple word groups and pre-processing methods such as normalization using Porter stemming and part-of-speech ("POS") lemmatization.
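To make the distinction between word groups and character-level groups concrete, the following is a minimal sketch of the two kinds of feature extraction. It is not the authors' implementation: the function names `word_ngrams` and `char_ngrams`, the group sizes, and the sample strings are assumptions chosen for illustration, and the abstract does not state which group sizes or learners were used.

```python
from collections import Counter


def word_ngrams(text, n=1):
    """Count contiguous groups of n lowercased, whitespace-separated tokens."""
    tokens = text.lower().split()
    return Counter(" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def char_ngrams(text, n=3):
    """Count contiguous groups of n lowercased characters (spaces included)."""
    s = text.lower()
    return Counter(s[i:i + n] for i in range(len(s) - n + 1))


# Made-up micro-texts (not from the paper's dataset); the second has a typo.
clean = "phone"
noisy = "phoone"

# Word-level features share nothing once a token is misspelled ...
shared_words = set(word_ngrams(clean)) & set(word_ngrams(noisy))

# ... while character-level groups still overlap on parts of the token.
shared_chars = set(char_ngrams(clean)) & set(char_ngrams(noisy))

print("shared word features:", sorted(shared_words))
print("shared char features:", sorted(shared_chars))
```

The resulting counts could be fed to any multi-class learner as a bag-of-features vector. The overlap on misspelled tokens is one plausible reason character groups can help on noisy micro-text, though the abstract itself does not give this explanation. In practice a library vectorizer (for example scikit-learn's `CountVectorizer` with `analyzer="char"`) would typically replace these hand-written helpers.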

research
07/25/2020

Effect of Text Processing Steps on Twitter Sentiment Classification using Word Embedding

Processing of raw text is the crucial first step in text classification ...
research
01/22/2020

Investigating Classification Techniques with Feature Selection For Intention Mining From Twitter Feed

In the last decade, social networks became most popular medium for commu...
research
03/07/2023

Classifying Text-Based Conspiracy Tweets related to COVID-19 using Contextualized Word Embeddings

The FakeNews task in MediaEval 2022 investigates the challenge of findin...
research
06/01/2017

Deep Learning for Hate Speech Detection in Tweets

Hate speech detection on Twitter is critical for applications like contr...
research
08/05/2019

A Deep Learning Approach for Tweet Classification and Rescue Scheduling for Effective Disaster Management

It is a challenging and complex task to acquire information from differe...
research
07/11/2020

Feature Selection on Noisy Twitter Short Text Messages for Language Identification

The task of written language identification involves typically the detec...
research
07/13/2020

An Enhanced Text Classification to Explore Health based Indian Government Policy Tweets

Government-sponsored policy-making and scheme generations is one of the ...