Gzip versus bag-of-words for text classification with KNN

07/27/2023
by Juri Opitz, et al.

The effectiveness of compression distance in KNN-based text classification ('gzip') has recently attracted considerable attention. In this note we show that simpler means can also be effective, and that compression may not be needed. Indeed, a 'bag-of-words' matching can achieve similar or better results, and is more efficient.
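The two distance functions contrasted above can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the note's implementation. The gzip distance follows the standard normalized compression distance (NCD), with the two texts joined by a space before compression. `bow_distance` is a hypothetical Jaccard-style word-set distance standing in for the note's bag-of-words matching, whose exact formulation may differ. `knn_predict` and the toy training data are illustrative only.

```python
import gzip
from collections import Counter
from typing import Callable, Sequence, Tuple


def compressed_len(text: str) -> int:
    """Length in bytes of the gzip-compressed UTF-8 encoding of `text`."""
    return len(gzip.compress(text.encode("utf-8")))


def ncd(x: str, y: str) -> float:
    """Normalized compression distance, using gzip as the compressor."""
    cx, cy = compressed_len(x), compressed_len(y)
    cxy = compressed_len(x + " " + y)  # assumption: texts joined with a space
    return (cxy - min(cx, cy)) / max(cx, cy)


def bow_distance(x: str, y: str) -> float:
    """Hypothetical bag-of-words distance: 1 - Jaccard overlap of word sets."""
    a, b = set(x.lower().split()), set(y.lower().split())
    if not a and not b:
        return 0.0
    return 1.0 - len(a & b) / len(a | b)


def knn_predict(
    query: str,
    train: Sequence[Tuple[str, str]],
    k: int,
    dist: Callable[[str, str], float],
) -> str:
    """Majority vote over the k training texts closest to `query` under `dist`."""
    scored = sorted((dist(query, text), label) for text, label in train)
    votes = Counter(label for _, label in scored[:k])
    # On a vote tie, most_common keeps insertion order, so the label whose
    # nearest member ranks highest wins.
    return votes.most_common(1)[0][0]


# Toy usage (illustrative data only).
train = [
    ("the team won the football match", "sports"),
    ("the striker scored a late goal", "sports"),
    ("the central bank raised interest rates", "finance"),
    ("stock markets fell after the rates decision", "finance"),
]
query = "the bank kept interest rates unchanged"
print(knn_predict(query, train, k=1, dist=bow_distance))  # prints "finance"
print(knn_predict(query, train, k=1, dist=ncd))
```

The bag-of-words distance needs only set operations on tokens, whereas each NCD evaluation runs the compressor three times. This is one plausible intuition for the efficiency gap the abstract mentions.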



research
07/06/2016

Bag of Tricks for Efficient Text Classification

This paper explores a simple and efficient baseline for text classificat...
research
06/24/2016

Interactive Semantic Featuring for Text Classification

In text classification, dictionaries can be used to define human-compreh...
research
02/27/2018

Convolutional Neural Networks for Toxic Comment Classification

Flood of information is produced in a daily basis through the global Int...
research
02/17/2017

Analysis and Optimization of fastText Linear Text Classifier

The paper [1] shows that simple linear classifier can compete with compl...
research
08/25/2023

Compressor-Based Classification for Atrial Fibrillation Detection

Atrial fibrillation (AF) is one of the most common arrhythmias with chal...
research
07/12/2018

Orthogonal Matching Pursuit for Text Classification

In text classification, the problem of overfitting arises due to the hig...
research
11/08/2018

Doc2Im: document to image conversion through self-attentive embedding

Text classification is a fundamental task in NLP applications. Latest re...
