Early Forecasting of Text Classification Accuracy and F-Measure with Active Learning

01/20/2020
by Thomas Orth, et al.

When creating text classification systems, one of the major bottlenecks is the annotation of training data. Active learning has been proposed to address this bottleneck, using stopping methods to minimize the cost of data annotation. An important capability for improving the utility of stopping methods is to effectively forecast the performance of the text classification models. Forecasting can be done by regressing logarithmic models on some portion of the data as learning progresses. A critical unexplored question is what portion of the data is needed for accurate forecasting. There is a tension: using less data allows the forecast to be made earlier, which is more useful, whereas using more data allows the forecast to be more accurate. We find that when using active learning it is even more important to generate forecasts earlier, so as to make them more useful and not waste annotation effort. We investigate the difference in forecasting difficulty when using accuracy and F-measure as the text classification system performance metrics, and we find that F-measure is more difficult to forecast. We conduct experiments on seven text classification datasets in different semantic domains with different characteristics and with three different base machine learning algorithms. We find that forecasting is easiest for decision tree learning, moderate for Support Vector Machines, and most difficult for neural networks.
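
To make the idea of regressing a logarithmic model on an early portion of the learning curve concrete, here is a minimal sketch. It assumes a two-parameter form, score ≈ a + b · ln(n), where n is the number of labeled examples; the abstract does not state the exact functional form used in the paper, so this form, the function names `fit_log_curve` and `forecast`, and the synthetic learning curve are all illustrative assumptions rather than the authors' method.

```python
import numpy as np


def fit_log_curve(n_labeled, scores):
    """Least-squares fit of score ~ a + b * ln(n).

    Assumed model form; not confirmed by the source abstract.
    """
    x = np.log(np.asarray(n_labeled, dtype=float))
    y = np.asarray(scores, dtype=float)
    # polyfit returns coefficients from highest degree down: [slope, intercept]
    b, a = np.polyfit(x, y, deg=1)
    return a, b


def forecast(a, b, n):
    """Predict the score at training-set size n from the fitted curve."""
    return a + b * np.log(n)


# Hypothetical, noise-free learning curve used only for illustration.
sizes = np.array([100, 200, 400, 800, 1600, 3200])
true_a, true_b = 0.30, 0.07
scores = true_a + true_b * np.log(sizes)

# Fit on only the first three points (the "early" portion of the data),
# then forecast the score at the largest training-set size.
a, b = fit_log_curve(sizes[:3], scores[:3])
predicted = forecast(a, b, sizes[-1])
```

On real learning curves the observed scores are noisy, so the number of early points used for the fit controls the tradeoff described above: fewer points give an earlier but less reliable forecast, and more points give a later but more reliable one.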

research
01/26/2019

Stopping Active Learning based on Predicted Change of F Measure for Text Classification

During active learning, an effective stopping method allows users to lim...
research
01/08/2022

Impact of Stop Sets on Stopping Active Learning for Text Classification

Active learning is an increasingly important branch of machine learning ...
research
05/12/2021

Mining Legacy Issues in Open Pit Mining Sites: Innovation Support of Renaturalization and Land Utilization

Open pit mines left many regions worldwide inhospitable or uninhabitable...
research
08/15/2021

Deep Active Learning for Text Classification with Diverse Interpretations

Recently, Deep Neural Networks (DNNs) have made remarkable progress for ...
research
01/24/2018

Support Vector Machine Active Learning Algorithms with Query-by-Committee versus Closest-to-Hyperplane Selection

This paper investigates and evaluates support vector machine active lear...
research
12/29/2019

Active Learning in Video Tracking

Active learning methods, like uncertainty sampling, combined with probab...
research
02/08/2023

CRL+: A Novel Semi-Supervised Deep Active Contrastive Representation Learning-Based Text Classification Model for Insurance Data

Financial sector and especially the insurance industry collect vast volu...
