On Horizontal and Vertical Separation in Hierarchical Text Classification

09/02/2016
by Mostafa Dehghani, et al.

Hierarchy is a common and effective way of organizing data and representing their relationships at different levels of abstraction. However, hierarchical data dependencies make it difficult to estimate "separable" models that can distinguish between the entities in the hierarchy. Extracting separable models of hierarchical entities requires us to take their relative position into account and to consider the different types of dependencies in the hierarchy. In this paper, we investigate the effect of separability in text-based entity classification and argue that in hierarchical classification, a separation property should be established between entities not only in the same layer, but also in different layers. Our main findings are as follows. First, we analyse the importance of separability of the data representation in the task of classification and, based on that, introduce a "Strong Separation Principle" for optimizing the expected effectiveness of a classifier's decisions based on the separation property. Second, we present Hierarchical Significant Words Language Models (HSWLM), which capture all, and only, the essential features of hierarchical entities according to their relative position in the hierarchy, resulting in horizontally and vertically separable models. Third, we validate our claims on real-world data and demonstrate how HSWLM improves classification accuracy and how it provides models that are transferable over time. Although the discussion in this paper focuses on the classification problem, the models are applicable to any information access task on data that has, or can be mapped to, a hierarchical structure.

research
11/22/2021

Hierarchy Decoder is All You Need To Text Classification

Hierarchical text classification (HTC) to a taxonomy is essential for va...
research
06/05/2020

Hierarchical Class-Based Curriculum Loss

Classification algorithms in machine learning often assume a flat label ...
research
05/26/2023

Hierarchical Verbalizer for Few-Shot Hierarchical Text Classification

Due to the complex label hierarchy and intensive labeling cost in practi...
research
08/09/2023

RadGraph2: Modeling Disease Progression in Radiology Reports via Hierarchical Information Extraction

We present RadGraph2, a novel dataset for extracting information from ra...
research
12/29/2018

Weakly-Supervised Hierarchical Text Classification

Hierarchical text classification, which aims to classify text documents ...
research
03/17/2022

HiStruct+: Improving Extractive Text Summarization with Hierarchical Structure Information

Transformer-based language models usually treat texts as linear sequence...
research
01/30/2023

Using cluster analysis on municipal statistical data to configure public policies about Water, Sanitation and Hygiene in Venezuela

Objective: The aim of this research is to demonstrate how the use of hie...
