ZeroBERTo – Leveraging Zero-Shot Text Classification by Topic Modeling

01/04/2022
by Alexandre Alcoforado, et al.

Traditional text classification approaches often require a substantial amount of labeled data, which is difficult to obtain, especially in restricted domains or less widespread languages. This lack of labeled data has led to the rise of low-resource methods in natural language processing, which assume low data availability. Among them, zero-shot learning stands out: it consists of learning a classifier without any previously labeled data. The best results reported with this approach use language models such as Transformers, but they suffer from two problems: high execution time and an inability to handle long texts as input. This paper proposes a new model, ZeroBERTo, which leverages an unsupervised clustering step to obtain a compressed data representation before the classification task. We show that ZeroBERTo performs better on long inputs and runs in less time, outperforming XLM-R by about 12% in F1 score on the FolhaUOL dataset.

Keywords: Low-Resource NLP, Unlabeled data, Zero-Shot Learning, Topic Modeling, Transformers.
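As a rough illustration of the idea described in the abstract (cluster the documents first, then classify each cluster's compressed representation once instead of classifying every document), the following toy Python sketch may help. It is not the authors' implementation: the greedy word-overlap clustering is only a stand-in for topic modeling, `toy_zero_shot` is a stand-in for a Transformer-based zero-shot classifier, and every function name, hint word, threshold, and sample document here is hypothetical.

```python
from collections import Counter

# Hypothetical stopword list, kept tiny for the toy example.
STOPWORDS = {"the", "a", "as", "to", "on", "of", "and", "in"}

def tokenize(text):
    """Lowercase, split on whitespace, drop stopwords and non-alphabetic tokens."""
    return [w for w in text.lower().split() if w.isalpha() and w not in STOPWORDS]

def jaccard(a, b):
    """Jaccard similarity between two sets of tokens."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def cluster_documents(docs, threshold=0.2):
    """Greedy single-pass clustering on vocabulary overlap.

    Assumption: this replaces the unsupervised topic-modeling step with
    something small enough to read; it is not a real topic model.
    """
    clusters = []
    for idx, doc in enumerate(docs):
        tokens = tokenize(doc)
        best, best_sim = None, threshold
        for cluster in clusters:
            sim = jaccard(set(tokens), set(cluster["vocab"]))
            if sim >= best_sim:
                best, best_sim = cluster, sim
        if best is None:  # no cluster is similar enough: open a new one
            best = {"vocab": Counter(), "members": []}
            clusters.append(best)
        best["vocab"].update(tokens)
        best["members"].append(idx)
    return clusters

def toy_zero_shot(words, label_hints):
    """Stand-in scorer: pick the label whose hint words overlap most.

    A real system would call a pretrained zero-shot model here.
    """
    return max(label_hints, key=lambda label: len(set(words) & label_hints[label]))

def classify_via_topics(docs, label_hints, top_k=5):
    """Classify each cluster once, then propagate the label to its documents."""
    clusters = cluster_documents(docs)
    predictions = [None] * len(docs)
    for cluster in clusters:
        # Compressed representation: the cluster's most frequent words.
        topic_words = [w for w, _ in cluster["vocab"].most_common(top_k)]
        label = toy_zero_shot(topic_words, label_hints)  # one call per cluster
        for idx in cluster["members"]:
            predictions[idx] = label
    return predictions, len(clusters)

docs = [
    "The team won the football match",
    "The football team scored a late goal",
    "The bank raised rates as inflation grew",
    "Inflation pushed the bank to act on prices",
]
label_hints = {
    "sports": {"football", "match", "team", "goal"},
    "economy": {"market", "inflation", "bank", "prices"},
}

predictions, scorer_calls = classify_via_topics(docs, label_hints)
print(predictions, scorer_calls)
```

In this sketch the scorer runs once per cluster rather than once per document, and it only ever sees a short list of topic words, which is the intuition behind the shorter execution time and the better handling of long inputs claimed above.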

research
12/14/2022

Pre-trained Language Models can be Fully Zero-Shot Learners

How can we extend a pre-trained model to many language understanding tas...
research
10/22/2022

ProGen: Progressive Zero-shot Dataset Generation via In-context Feedback

Recently, dataset-generation-based zero-shot learning has shown promisin...
research
09/18/2023

Not Enough Labeled Data? Just Add Semantics: A Data-Efficient Method for Inferring Online Health Texts

User-generated texts available on the web and social platforms are often...
research
09/28/2019

Generalized Zero-shot ICD Coding

The International Classification of Diseases (ICD) is a list of classifi...
research
12/20/2022

Toward Human Readable Prompt Tuning: Kubrick's The Shining is a good movie, and a good prompt too?

Large language models can perform new tasks in a zero-shot fashion, give...
research
05/25/2022

ZeroGen^+: Self-Guided High-Quality Data Generation in Efficient Zero-Shot Learning

Nowadays, owing to the superior capacity of the large pre-trained langua...
research
07/13/2020

An Enhanced Text Classification to Explore Health based Indian Government Policy Tweets

Government-sponsored policy-making and scheme generations is one of the ...
