Boosting classification reliability of NLP transformer models in the long run

02/20/2023
by   Zoltán Kmetty, et al.

Transformer-based machine learning models have become an essential tool for many natural language processing (NLP) tasks since their introduction. A common objective of these projects is to classify text data, and classification models are often extended to new topics and/or time periods. In these situations, it is difficult to decide how long a classifier remains suitable and when it is worth re-training. This paper compares different approaches to fine-tuning a BERT model for a long-running classification task. We use data from different periods to fine-tune our original BERT model, and we also measure how a second round of annotation could boost classification quality. Our corpus contains over 8 million comments on COVID-19 vaccination in Hungary posted between September 2020 and December 2021. Our results show that the best solution is to use all available unlabeled comments to fine-tune the model. Focusing only on comments containing words the model has not encountered before is not advisable; randomly sampling comments from the new period is more efficient. Fine-tuning does not prevent the model from losing performance; it merely slows the decline. In a rapidly changing linguistic environment, model performance cannot be maintained without regularly annotating new text.
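The best-performing strategy in the abstract, fine-tuning on all available unlabeled comments from the new period, corresponds to domain-adaptive pretraining with the masked language modeling (MLM) objective. Below is a minimal sketch of this step using the Hugging Face transformers library; the huBERT checkpoint name, file path, and hyperparameters are illustrative assumptions rather than details taken from the paper.

```python
# Minimal sketch: domain-adaptive fine-tuning of a BERT encoder on unlabeled
# comments via masked language modeling, before re-training the classifier.
# The checkpoint name and data path below are illustrative assumptions.
from datasets import load_dataset
from transformers import (AutoModelForMaskedLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer,
                          TrainingArguments)

checkpoint = "SZTAKI-HLT/hubert-base-cc"  # assumed Hungarian BERT checkpoint
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForMaskedLM.from_pretrained(checkpoint)

# Unlabeled comments from the new period, one comment per line (hypothetical file).
dataset = load_dataset("text", data_files={"train": "new_period_comments.txt"})

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=128)

tokenized = dataset.map(tokenize, batched=True, remove_columns=["text"])

# Randomly mask 15% of tokens, the standard BERT MLM objective.
collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm_probability=0.15)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="bert-domain-adapted",
                           per_device_train_batch_size=32,
                           num_train_epochs=1),
    train_dataset=tokenized["train"],
    data_collator=collator,
)
trainer.train()
```

After this adaptation step, the updated encoder weights can be loaded into a sequence classification model and re-trained on the existing or newly annotated labels.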


