Predicting the Effectiveness of Self-Training: Application to Sentiment Classification

01/13/2016
by Vincent Van Asch, et al.

The goal of this paper is to investigate the connection between the performance gain that can be obtained by self-training and the similarity between the corpora used in this approach. Self-training is a semi-supervised technique designed to increase the performance of machine learning algorithms by automatically classifying instances of a task and adding these classified instances as additional training material for the same classifier. In the context of language processing tasks, this training material is usually an (annotated) corpus. Unfortunately, self-training does not always lead to a performance increase, and whether it will is largely unpredictable. We show that the similarity between corpora can be used to identify the setups for which self-training can be beneficial. We consider this research a step in the process of developing a classifier that is able to adapt itself to each new test corpus it is presented with.
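For illustration, the sketch below shows one round of a generic self-training (pseudo-labelling) loop for sentiment classification: a classifier trained on a labelled corpus labels an unlabelled corpus, its most confident predictions are added to the training set, and the classifier is retrained. The toy corpora, the scikit-learn models, and the 0.55 confidence threshold are illustrative assumptions, not the paper's exact experimental setup.

```python
# Generic one-round self-training (pseudo-labelling) sketch; the corpora,
# models, and confidence threshold below are illustrative assumptions.
import numpy as np
import scipy.sparse as sp
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

# Toy labelled source corpus (1 = positive, 0 = negative) and unlabelled target corpus.
labeled_texts = ["great movie", "terrible plot", "loved it", "awful acting"]
labels = np.array([1, 0, 1, 0])
unlabeled_texts = ["fantastic film", "boring and dull", "what a waste", "truly enjoyable"]

vectorizer = TfidfVectorizer()
X_labeled = vectorizer.fit_transform(labeled_texts)
X_unlabeled = vectorizer.transform(unlabeled_texts)

# Train the base classifier on the labelled corpus only.
clf = LogisticRegression()
clf.fit(X_labeled, labels)

# Label the unlabelled corpus and keep only confident predictions.
probs = clf.predict_proba(X_unlabeled)
confident = np.flatnonzero(probs.max(axis=1) >= 0.55)  # assumed threshold
pseudo_labels = probs.argmax(axis=1)[confident]

# Retrain on the original labels plus the self-labelled instances.
X_augmented = sp.vstack([X_labeled, X_unlabeled[confident]])
y_augmented = np.concatenate([labels, pseudo_labels])
clf.fit(X_augmented, y_augmented)
```

Whether this retraining step actually helps depends, as the paper argues, on how similar the unlabelled corpus is to the labelled one.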

Related research

04/19/2017  A Large Self-Annotated Corpus for Sarcasm
We introduce the Self-Annotated Reddit Corpus (SARC), a large corpus for...

06/04/2018  Neural Adversarial Training for Semi-supervised Japanese Predicate-argument Structure Analysis
Japanese predicate-argument structure (PAS) analysis involves zero anaph...

10/04/2018  Semi-Supervised Methods for Out-of-Domain Dependency Parsing
Dependency parsing is one of the important natural language processing t...

06/07/2018  Semi-supervised and Transfer learning approaches for low resource sentiment classification
Sentiment classification involves quantifying the affective reaction of ...

10/29/2020  Combining Self-Training and Self-Supervised Learning for Unsupervised Disfluency Detection
Most existing approaches to disfluency detection heavily rely on human-a...

08/18/2017  The Natural Stories Corpus
It is now a common practice to compare models of human language processi...

05/28/2018  UG18 at SemEval-2018 Task 1: Generating Additional Training Data for Predicting Emotion Intensity in Spanish
The present study describes our submission to SemEval 2018 Task 1: Affec...
