Less is More: Parameter-Free Text Classification with Gzip

12/19/2022
by Zhiying Jiang, et al.

Deep neural networks (DNNs) are often used for text classification because they usually achieve high accuracy. However, DNNs can be computationally intensive: they may have billions of parameters and require large amounts of labeled data, which makes them expensive to use, to optimize, and to transfer to out-of-distribution (OOD) cases in practice. In this paper, we propose a non-parametric alternative to DNNs that is easy, lightweight, and universal for text classification: a combination of a simple compressor like gzip with a k-nearest-neighbor classifier. Without any training, pre-training, or fine-tuning, our method achieves results that are competitive with non-pretrained deep learning methods on six in-distribution datasets. It even outperforms BERT on all five OOD datasets, including four low-resource languages. Our method also performs particularly well in few-shot settings, where labeled data are too scarce for DNNs to achieve satisfactory accuracy.
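The combination of a compressor with a k-nearest-neighbor classifier can be illustrated with a minimal sketch. The abstract does not state which distance is used, so this sketch assumes the normalized compression distance (NCD), joins the two texts with a space, and takes a plain majority vote; the function names `compressed_len`, `ncd`, and `classify` and the default `k=3` are hypothetical choices for illustration, not the authors' implementation.

```python
import gzip
from collections import Counter


def compressed_len(text: str) -> int:
    """Length in bytes of the gzip-compressed UTF-8 encoding of text."""
    return len(gzip.compress(text.encode("utf-8")))


def ncd(a: str, b: str) -> float:
    """Normalized compression distance between two strings (assumed metric)."""
    ca, cb = compressed_len(a), compressed_len(b)
    # Assumption: the pair is compressed as "a b", joined by a single space.
    cab = compressed_len(a + " " + b)
    return (cab - min(ca, cb)) / max(ca, cb)


def classify(query: str, train: list[tuple[str, str]], k: int = 3) -> str:
    """Return the majority label among the k training texts closest to query."""
    # Sort (text, label) pairs by their compression distance to the query.
    nearest = sorted(train, key=lambda pair: ncd(query, pair[0]))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]
```

There is nothing to fit here: the labeled texts themselves act as the model, which is why no training step appears. The cost moves to prediction time, since each query is compressed once against every labeled example.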


