Hierarchical Multiclass Decompositions with Application to Authorship Determination

10/11/2010
by Ran El-Yaniv, et al.

This paper addresses how to decompose multiclass classification problems into binary subproblems. We extend known Jensen-Shannon bounds on the Bayes risk of binary problems to hierarchical multiclass problems, and we use these bounds to develop a heuristic procedure for constructing hierarchical multiclass decompositions for multinomials. We test our method and compare it to the well-known "all-pairs" decomposition. Our tests use a new authorship determination benchmark consisting of machine learning authors. The new method consistently outperforms the all-pairs decomposition when the number of classes is small and breaks even on larger multiclass problems. With both methods, the classification accuracy we achieve, using an SVM over a feature set consisting of both high-frequency single tokens and high-frequency token pairs, appears to be exceptionally high compared to known results in authorship determination.
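To make the quantities named in the abstract concrete, the following is a minimal sketch, not the paper's procedure. It computes the weighted Jensen-Shannon divergence between two multinomial distributions and counts the binary subproblems needed by an all-pairs decomposition versus a hierarchy. The function names, the base-2 logarithm, and the assumption that the hierarchy is a full binary tree are illustrative choices that do not come from the source.

```python
import math


def entropy(p):
    """Shannon entropy (base 2) of a discrete distribution given as a list."""
    return -sum(x * math.log2(x) for x in p if x > 0)


def js_divergence(p, q, w_p=0.5, w_q=0.5):
    """Weighted Jensen-Shannon divergence between multinomials p and q.

    w_p and w_q play the role of class priors and are assumed to sum to 1.
    Defined as H(w_p*p + w_q*q) - w_p*H(p) - w_q*H(q).
    """
    if len(p) != len(q):
        raise ValueError("distributions must have the same support size")
    mixture = [w_p * a + w_q * b for a, b in zip(p, q)]
    return entropy(mixture) - w_p * entropy(p) - w_q * entropy(q)


def all_pairs_count(k):
    """Number of binary classifiers in an all-pairs decomposition of k classes."""
    return k * (k - 1) // 2


def hierarchy_count(k):
    """Number of binary classifiers (internal nodes) in a full binary tree
    whose k leaves are the classes. Assumes a full binary hierarchy."""
    return k - 1


if __name__ == "__main__":
    # Hypothetical token distributions for two authors over a 3-token vocabulary.
    author_a = [0.7, 0.2, 0.1]
    author_b = [0.1, 0.3, 0.6]
    print("JS divergence:", js_divergence(author_a, author_b))
    print("all-pairs classifiers for 8 classes:", all_pairs_count(8))
    print("hierarchy classifiers for 8 classes:", hierarchy_count(8))
```

A larger divergence indicates two class distributions that are easier to tell apart, which is the intuition behind using Jensen-Shannon quantities to decide how classes might be grouped in a hierarchy. The classifier counts show why the two decompositions scale differently as the number of classes grows.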


