Improving the results of string kernels in sentiment analysis and Arabic dialect identification by adapting them to your test set

08/25/2018
by Radu Tudor Ionescu, et al.

Recently, string kernels have obtained state-of-the-art results in various text classification tasks, such as Arabic dialect identification and native language identification. In this paper, we apply two simple yet effective transductive learning approaches to further improve the results of string kernels. The first approach is based on interpreting the pairwise string kernel similarities between samples in the training set and samples in the test set as features. The second approach is a simple self-training method based on two learning iterations. In the first iteration, a classifier is trained on the training set and tested on the test set, as usual. In the second iteration, a number of test samples (those to which the classifier assigned the highest confidence scores) are added to the training set for another round of training. The ground-truth labels of the added test samples are not needed; instead, we use the labels predicted by the classifier in the first training iteration. By adapting string kernels to the test set, we report significantly better accuracy rates in English polarity classification and Arabic dialect identification.
