
Hierarchical Multiclass Decompositions with Application to Authorship Determination

by Ran El-Yaniv, et al.

This paper is mainly concerned with how to decompose multiclass classification problems into binary subproblems. We extend known Jensen-Shannon bounds on the Bayes risk of binary problems to hierarchical multiclass problems, and use these bounds to develop a heuristic procedure for constructing hierarchical multiclass decompositions for multinomials. We test our method and compare it to the well-known "all-pairs" decomposition, using a new authorship determination benchmark of machine learning authors. The new method consistently outperforms the all-pairs decomposition when the number of classes is small and breaks even on larger multiclass problems. With both methods, the classification accuracy we achieve, using an SVM over a feature set consisting of both high-frequency single tokens and high-frequency token pairs, appears to be exceptionally high compared to known results in authorship determination.
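The abstract does not spell out the heuristic, so the following is only an illustrative sketch of the general idea, not the authors' procedure. It assumes each class is summarized by a multinomial over tokens, assumes equal class priors, and greedily splits the class set into the two groups whose mixtures have the largest weighted Jensen-Shannon divergence. The function names (`js_divergence`, `best_split`, `build_tree`) and the toy distributions are hypothetical.

```python
import math
from itertools import combinations

def entropy(p):
    """Shannon entropy, in bits, of a discrete distribution."""
    return -sum(x * math.log2(x) for x in p if x > 0)

def js_divergence(p, q, w=0.5):
    """Weighted Jensen-Shannon divergence:
    H(w*p + (1-w)*q) - w*H(p) - (1-w)*H(q)."""
    m = [w * a + (1 - w) * b for a, b in zip(p, q)]
    return entropy(m) - w * entropy(p) - (1 - w) * entropy(q)

def mixture(dists):
    """Uniform mixture of distributions (assumption: equal class priors)."""
    n = len(dists)
    return [sum(col) / n for col in zip(*dists)]

def best_split(classes):
    """Exhaustively try every bipartition of `classes` (name -> multinomial)
    and return (score, left, right) with the largest JS divergence between
    the two group mixtures. Exponential in the number of classes, so this
    is only suitable for small toy problems."""
    names = sorted(classes)
    first, rest = names[0], names[1:]
    best = None
    # Pin `first` to the left group so each bipartition is visited once;
    # r < len(rest) keeps the right group non-empty.
    for r in range(len(rest)):
        for extra in combinations(rest, r):
            left = (first,) + extra
            right = tuple(n for n in names if n not in left)
            w = len(left) / len(names)
            score = js_divergence(
                mixture([classes[n] for n in left]),
                mixture([classes[n] for n in right]),
                w,
            )
            if best is None or score > best[0]:
                best = (score, left, right)
    return best

def build_tree(classes):
    """Recursively split until every leaf holds a single class.
    Each internal node corresponds to one binary subproblem."""
    if len(classes) == 1:
        return next(iter(classes))
    _, left, right = best_split(classes)
    return (
        build_tree({n: classes[n] for n in left}),
        build_tree({n: classes[n] for n in right}),
    )

# Toy token distributions for three hypothetical authors.
authors = {
    "A": [0.7, 0.2, 0.1],
    "B": [0.6, 0.3, 0.1],
    "C": [0.1, 0.2, 0.7],
}
tree = build_tree(authors)
print(tree)
```

In this toy case the two similar authors, A and B, end up under the same internal node, so the first binary problem separates C from the pair. A hierarchy over k classes built this way needs k - 1 binary classifiers, whereas an all-pairs decomposition trains one classifier per pair of classes, k(k - 1)/2 in total.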



