Learning to Weight for Text Classification

In information retrieval (IR) and related tasks, term weighting approaches typically consider the frequency of the term in the document and in the collection in order to compute a score reflecting the importance of the term for the document. In tasks for which training data are available (such as text classification), it seems logical that the term weighting function should also take into account the distribution of the term across the classes of interest, as estimated from the training data. Although 'supervised term weighting' approaches based on this intuition have been described before, they have failed to show consistent improvements. In this article we analyse the possible reasons for this failure and call consolidated assumptions into question. Following this critique, we propose a novel supervised term weighting approach that, instead of relying on any predefined formula, learns a term weighting function optimised on the training set of interest; we dub this approach Learning to Weight (LTW). Experiments run on several well-known benchmarks, using different learning methods, show that our method outperforms previous term weighting approaches in text classification.
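
As a rough illustration of the contrast drawn above, here is a minimal sketch on a hypothetical four-document toy corpus. It computes an unsupervised collection factor (idf) next to one well-known supervised factor, the relevance frequency (rf) of Lan et al., rf = log2(2 + a / max(1, c)). The helper names `idf`, `relevance_frequency` and `weight_document` are illustrative assumptions. This is not the LTW method, which learns the weighting function instead of applying a fixed formula.

```python
import math
from collections import Counter

def idf(term, docs):
    """Unsupervised factor: log(N / df), ignoring class labels entirely."""
    df = sum(1 for d in docs if term in d)
    return math.log(len(docs) / df) if df else 0.0

def relevance_frequency(term, docs, labels, positive):
    """Supervised factor (rf): log2(2 + a / max(1, c)), where
    a = positive-class docs containing the term,
    c = negative-class docs containing the term."""
    a = sum(1 for d, y in zip(docs, labels) if y == positive and term in d)
    c = sum(1 for d, y in zip(docs, labels) if y != positive and term in d)
    return math.log2(2 + a / max(1, c))

def weight_document(doc, global_weight):
    """Raw term frequency multiplied by a per-term global factor."""
    tf = Counter(doc)
    return {t: n * global_weight(t) for t, n in tf.items()}

# Hypothetical toy training set: tokenised documents and their class labels.
docs = [["cheap", "pills", "buy"], ["buy", "cheap", "now"],
        ["meeting", "agenda", "now"], ["agenda", "notes"]]
labels = ["spam", "spam", "ham", "ham"]

print(idf("cheap", docs), idf("now", docs))  # equal: both occur in 2 of 4 docs
print(relevance_frequency("cheap", docs, labels, "spam"))  # 2.0
print(relevance_frequency("now", docs, labels, "spam"))    # log2(3), about 1.585
```

On this toy data idf assigns "cheap" and "now" the same weight, since each occurs in two of the four documents, whereas rf ranks "cheap" higher because it occurs only in the positive class. Supervised term weighting, including LTW, is built on exactly this class-aware distinction.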

research
12/13/2010

Inverse-Category-Frequency based supervised term weighting scheme for text categorization

Term weighting schemes often dominate the performance of many classifier...
research
12/11/2020

TF-CR: Weighting Embeddings for Text Classification

Text classification, as the task consisting in assigning categories to t...
research
10/10/2016

Supervised Term Weighting Metrics for Sentiment Analysis in Short Text

Term weighting metrics assign weights to terms in order to discriminate ...
research
06/03/2020

Exploiting Class Labels to Boost Performance on Embedding-based Text Classification

Text classification is one of the most frequent tasks for processing tex...
research
03/12/2020

TF-IDFC-RF: A Novel Supervised Term Weighting Scheme

Sentiment Analysis is a branch of Affective Computing usually considered...
research
07/13/2020

Assessing the behavior and performance of a supervised term-weighting technique for topic-based retrieval

This article analyses and evaluates FDDβ, a supervised term-weightin...
research
01/19/2021

Variance Based Samples Weighting for Supervised Deep Learning

In the context of supervised learning of a function by a Neural Network ...
