FastText.zip: Compressing text classification models

12/12/2016
by Armand Joulin, et al.

We consider the problem of producing compact architectures for text classification, such that the full model fits in a limited amount of memory. After considering different solutions inspired by the hashing literature, we propose a method built upon product quantization to store word embeddings. While the original technique leads to a loss in accuracy, we adapt this method to circumvent quantization artefacts. Our experiments carried out on several benchmarks show that our approach typically requires two orders of magnitude less memory than fastText while being only slightly inferior with respect to accuracy. As a result, it outperforms the state of the art by a good margin in terms of the compromise between memory usage and accuracy.
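The compression idea at the heart of the paper is product quantization of the word embedding matrix: each embedding is split into a few sub-vectors, and each sub-vector is replaced by the index of its nearest centroid in a small per-subspace codebook, so a vector is stored in a handful of bytes plus shared codebooks. The sketch below is a minimal NumPy illustration of plain product quantization under assumed toy parameters (4 sub-quantizers, 256 centroids, a naive k-means); it is not the paper's implementation, which additionally quantizes vector norms separately, retrains after quantization, and prunes the vocabulary.

    import numpy as np

    def pq_train(emb, n_sub=4, n_centroids=256, n_iter=10, seed=0):
        # Learn one small k-means codebook per sub-space of the embedding matrix.
        rng = np.random.default_rng(seed)
        n, d = emb.shape
        d_sub = d // n_sub
        codebooks = np.empty((n_sub, n_centroids, d_sub), dtype=np.float32)
        for s in range(n_sub):
            x = emb[:, s * d_sub:(s + 1) * d_sub]
            centroids = x[rng.choice(n, n_centroids, replace=False)].copy()
            for _ in range(n_iter):  # naive Lloyd iterations, for illustration only
                assign = ((x[:, None, :] - centroids[None, :, :]) ** 2).sum(-1).argmin(1)
                for c in range(n_centroids):
                    pts = x[assign == c]
                    if len(pts):
                        centroids[c] = pts.mean(0)
            codebooks[s] = centroids
        return codebooks

    def pq_encode(emb, codebooks):
        # Store each sub-vector as the index of its nearest centroid (1 byte here).
        n_sub, _, d_sub = codebooks.shape
        codes = np.empty((emb.shape[0], n_sub), dtype=np.uint8)
        for s in range(n_sub):
            x = emb[:, s * d_sub:(s + 1) * d_sub]
            codes[:, s] = ((x[:, None, :] - codebooks[s][None, :, :]) ** 2).sum(-1).argmin(1)
        return codes

    def pq_decode(codes, codebooks):
        # Approximate reconstruction: concatenate the selected centroids.
        return np.concatenate([codebooks[s][codes[:, s]] for s in range(codes.shape[1])], axis=1)

    # Toy run: 2,000 64-dim "word vectors" stored as 4 bytes each instead of 256.
    emb = np.random.randn(2000, 64).astype(np.float32)
    books = pq_train(emb)
    codes = pq_encode(emb, books)
    approx = pq_decode(codes, books)

For context, the released fastText library exposes this kind of compression through its quantize command, which produces the compact .ftz models.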

Related research

Joint Embedding of Words and Labels for Text Classification (05/10/2018)
Word embeddings are effective intermediate representations for capturing...

Semantic Text Compression for Classification (09/19/2023)
We study semantic compression for text where meanings contained in the t...

Link and code: Fast indexing with graphs and compact regression codes (04/26/2018)
Similarity search approaches based on graph walks have recently attained...

Bag of Tricks for Efficient Text Classification (07/06/2016)
This paper explores a simple and efficient baseline for text classificat...

Text Classification Components for Detecting Descriptions and Names of CAD models (04/04/2019)
We apply text analysis approaches for a specialized search engine for 3D...

In Defense of Product Quantization (11/23/2017)
Despite their widespread adoption, Product Quantization techniques were ...

Analysis and Optimization of fastText Linear Text Classifier (02/17/2017)
The paper [1] shows that simple linear classifier can compete with compl...
