byteSteady: Fast Classification Using Byte-Level n-Gram Embeddings

by Xiang Zhang, et al.

This article introduces byteSteady, a fast model for classification using byte-level n-gram embeddings. byteSteady assumes that each input comes as a sequence of bytes. A representation vector is produced by averaging the embedding vectors of the input's byte-level n-grams, for a pre-defined set of n-gram lengths n. The hashing trick is used to reduce the number of embedding vectors. The resulting representation vector is then fed into a linear classifier. A straightforward application of byteSteady is text classification. We also apply byteSteady to one type of non-language data: DNA sequences for gene classification. For both problems we achieve competitive classification results against strong baselines, suggesting that byteSteady can be applied to both language and non-language data. Furthermore, we find that simple compression of the input using Huffman coding does not significantly impact the results, which offers an accuracy-speed trade-off previously unexplored in machine learning.
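The pipeline described in the abstract (byte-level n-grams, hashed embedding lookup, averaging, linear classifier) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the class name `ByteNgramClassifier`, the n-gram lengths `(1, 2, 4)`, the table size, the embedding dimension, and the use of CRC32 as the hash function are all hypothetical choices, and the parameters are randomly initialized rather than trained.

```python
import random
import zlib


def byte_ngrams(data: bytes, sizes=(1, 2, 4)):
    """Yield every contiguous byte n-gram of each length in `sizes`."""
    for n in sizes:
        for i in range(len(data) - n + 1):
            yield data[i:i + n]


class ByteNgramClassifier:
    """Hypothetical sketch: averaged hashed n-gram embeddings + linear layer."""

    def __init__(self, sizes=(1, 2, 4), num_buckets=1024, dim=8,
                 num_classes=2, seed=0):
        rng = random.Random(seed)
        self.sizes = sizes
        self.num_buckets = num_buckets
        self.dim = dim
        # Embedding table with one row per hash bucket (not per distinct n-gram).
        self.emb = [[rng.uniform(-0.1, 0.1) for _ in range(dim)]
                    for _ in range(num_buckets)]
        # Linear classifier: one weight row and one bias per class.
        self.weights = [[rng.uniform(-0.1, 0.1) for _ in range(dim)]
                        for _ in range(num_classes)]
        self.bias = [0.0] * num_classes

    def bucket(self, gram: bytes) -> int:
        # Hashing trick: map an n-gram to a fixed number of buckets.
        # CRC32 is an assumption made here for determinism.
        return zlib.crc32(gram) % self.num_buckets

    def represent(self, data: bytes):
        """Average the embedding rows of all n-grams in `data`."""
        total = [0.0] * self.dim
        count = 0
        for gram in byte_ngrams(data, self.sizes):
            row = self.emb[self.bucket(gram)]
            for j in range(self.dim):
                total[j] += row[j]
            count += 1
        if count == 0:  # input shorter than every n-gram length
            return total
        return [v / count for v in total]

    def scores(self, data: bytes):
        x = self.represent(data)
        return [sum(w * v for w, v in zip(row, x)) + b
                for row, b in zip(self.weights, self.bias)]

    def predict(self, data: bytes) -> int:
        s = self.scores(data)
        return max(range(len(s)), key=s.__getitem__)


model = ByteNgramClassifier()
label = model.predict("some text to classify".encode("utf-8"))
dna_label = model.predict(b"ACGTACGTTAGC")
```

Because the model only sees bytes, the same code path handles UTF-8 text and DNA strings alike. Training (for example, gradient descent on a softmax loss over `scores`) is omitted here, so the predictions from this untrained sketch are not meaningful.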






