Building for Tomorrow: Assessing the Temporal Persistence of Text Classifiers

05/11/2022
by   Rabab Alkhalifa, et al.
15

Where performance of text classification models drops over time due to changes in data, development of models whose performance persists over time is important. An ability to predict a model's ability to persist over time can help design models that can be effectively used over a longer period of time. In this paper, we look at this problem from a practical perspective by assessing the ability of a wide range of language models and classification algorithms to persist over time, as well as how dataset characteristics can help predict the temporal stability of different models. We perform longitudinal classification experiments on three datasets spanning between 6 and 19 years, and involving diverse tasks and types of data. We find that one can estimate how a model will retain its performance over time based on (i) how well the model performs over a restricted time period and its extrapolation to a longer time period, and (ii) the linguistic characteristics of the dataset, such as the familiarity score between subsets from different years. Findings from these experiments have important implications for the design of text classification models with the aim of preserving performance over time.

READ FULL TEXT

page 8

page 10

page 11

page 15

research
09/12/2020

Improving Indonesian Text Classification Using Multilingual Language Model

Compared to English, the amount of labeled data for Indonesian text clas...
research
04/15/2021

Text Guide: Improving the quality of long text classification by a text selection method based on feature importance

The performance of text classification methods has improved greatly over...
research
08/27/2021

Opinions are Made to be Changed: Temporally Adaptive Stance Classification

Given the rapidly evolving nature of social media and people's views, wo...
research
11/14/2021

Time Waits for No One! Analysis and Challenges of Temporal Misalignment

When an NLP model is trained on text data from one time period and teste...
research
08/03/2021

Your fairness may vary: Group fairness of pretrained language models in toxic text classification

We study the performance-fairness trade-off in more than a dozen fine-tu...
research
05/15/2023

Text Classification via Large Language Models

Despite the remarkable success of large-scale Language Models (LLMs) suc...
research
06/29/2021

Time-Aware Language Models as Temporal Knowledge Bases

Many facts come with an expiration date, from the name of the President ...

Please sign up or login with your details

Forgot password? Click here to reset