Incremental Active Opinion Learning Over a Stream of Opinionated Documents

09/03/2015
by Max Zimmermann, et al.

Applications that learn from opinionated documents, like tweets or product reviews, face two challenges. First, the opinionated documents constitute an evolving stream, where both the author's attitude and the vocabulary itself may change. Second, labels of documents are scarce and labels of words are unreliable, because the sentiment of a word depends on the (unknown) context in the author's mind. Most research on mining opinionated streams focuses on the first aspect of the problem, while for the second it assumes a continuous supply of labels from the stream. Such an assumption is unrealistic, though, since the stream is infinite and the labeling cost is prohibitive. We therefore investigate the potential of active stream learning algorithms that ask for labels on demand. Our proposed approach, ACOSTREAM, works with limited labels: it uses an initial seed of labeled documents, occasionally requests additional labels for documents from a human expert, and incrementally adapts to the underlying stream while exploiting the available labeled documents. At its core, ACOSTREAM consists of a Multinomial Naive Bayes (MNB) classifier coupled with "sampling" strategies for requesting class labels for new unlabeled documents. In the experiments, we evaluate the classifier's performance over time by varying: (a) the class distribution of the opinionated stream, assuming that the vocabulary is fixed but word polarities may change with the class distribution; and (b) the number of unknown words arriving at each moment, while the class polarity may also change. Our results show that active learning on a stream of opinionated documents delivers good performance while requiring only a small selection of labels.
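The abstract describes ACOSTREAM only at a high level. As a rough illustration of the general idea (an incrementally updated MNB classifier, an initial labeled seed, and a rule for when to ask the expert for a label), here is a minimal Python sketch. It is not the authors' implementation: the confidence-threshold query rule (a simple form of uncertainty sampling), the `threshold` and `budget` parameters, the smoothing choices, the toy data, and all names (`IncrementalMNB`, `active_stream`, `oracle`) are hypothetical, and drift-handling mechanisms such as forgetting old counts are omitted.

```python
import math
from collections import defaultdict


class IncrementalMNB:
    """Multinomial Naive Bayes updated one labeled document at a time (Laplace smoothing)."""

    def __init__(self, classes, alpha=1.0):
        self.classes = list(classes)
        self.alpha = alpha
        self.doc_counts = {c: 0 for c in self.classes}
        self.word_counts = {c: defaultdict(int) for c in self.classes}
        self.total_words = {c: 0 for c in self.classes}
        self.vocab = set()

    def learn(self, tokens, label):
        # Incremental update: only the sufficient statistics (counts) change.
        self.doc_counts[label] += 1
        for w in tokens:
            self.word_counts[label][w] += 1
            self.total_words[label] += 1
            self.vocab.add(w)

    def posterior(self, tokens):
        n_docs = sum(self.doc_counts.values())
        # Assumption: one extra vocabulary slot reserves probability mass for unseen words.
        v = len(self.vocab) + 1
        log_scores = {}
        for c in self.classes:
            # Laplace-smoothed class prior.
            score = math.log((self.doc_counts[c] + 1) / (n_docs + len(self.classes)))
            denom = self.total_words[c] + self.alpha * v
            for w in tokens:
                score += math.log((self.word_counts[c].get(w, 0) + self.alpha) / denom)
            log_scores[c] = score
        # Normalize in log space to avoid underflow on long documents.
        top = max(log_scores.values())
        unnorm = {c: math.exp(s - top) for c, s in log_scores.items()}
        z = sum(unnorm.values())
        return {c: p / z for c, p in unnorm.items()}


def active_stream(model, stream, oracle, threshold=0.6, budget=10):
    """Predict each arriving document; query the oracle only when the model is unsure."""
    predictions, requested = [], 0
    for tokens in stream:
        post = model.posterior(tokens)
        label = max(post, key=post.get)
        predictions.append(label)
        if post[label] < threshold and requested < budget:
            model.learn(tokens, oracle(tokens))  # label supplied by the (simulated) expert
            requested += 1
    return predictions, requested


# Hypothetical toy data: a two-document seed and a three-document stream.
model = IncrementalMNB(["pos", "neg"])
for toks, y in [("great phone love it".split(), "pos"),
                ("awful battery hate it".split(), "neg")]:
    model.learn(toks, y)


def oracle(tokens):
    # Stand-in for the human expert (hypothetical keyword rule).
    return "pos" if {"love", "great"} & set(tokens) else "neg"


stream = ["love great".split(), "it".split(), "hate awful".split()]
predictions, requested = active_stream(model, stream, oracle)
print(predictions, requested)  # ['pos', 'pos', 'neg'] 1
```

Only the ambiguous document ("it", equally likely under both classes) triggers a label request. Because the MNB model keeps only per-class word and document counts, each expert label updates it in time proportional to the document length, which is what makes learning from a stream with occasional labels cheap. A fixed threshold with a budget is the simplest query rule; the paper's own sampling strategies may differ.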

research
01/29/2019

Limitations of Assessing Active Learning Performance at Runtime

Classification algorithms aim to predict an unknown label (e.g., a quali...
research
04/11/2016

Active Learning for Online Recognition of Human Activities from Streaming Videos

Recognising human activities from streaming videos poses unique challeng...
research
01/17/2018

Efficient Test Collection Construction via Active Learning

To create a new IR test collection at minimal cost, we must carefully se...
research
12/19/2021

Active Weighted Aging Ensemble for Drifted Data Stream Classification

One of the significant problems of streaming data classification is the ...
research
10/28/2022

Radically Lower Data-Labeling Costs for Visually Rich Document Extraction Models

A key bottleneck in building automatic extraction models for visually ri...
research
08/20/2015

The ABACOC Algorithm: a Novel Approach for Nonparametric Classification of Data Streams

Stream mining poses unique challenges to machine learning: predictive mo...
research
01/25/2022

Online Active Learning with Dynamic Marginal Gain Thresholding

The blessing of ubiquitous data also comes with a curse: the communicati...