Seeded Hierarchical Clustering for Expert-Crafted Taxonomies

05/23/2022
by Anish Saha, et al.

Practitioners from many disciplines (e.g., political science) use expert-crafted taxonomies to make sense of large, unlabeled corpora. In this work, we study Seeded Hierarchical Clustering (SHC): the task of automatically fitting unlabeled data to such taxonomies using only a small set of labeled examples. We propose HierSeed, a novel weakly supervised algorithm for this task that is both data-efficient and computationally efficient. HierSeed assigns documents to topics by weighing document density against topic hierarchical structure. It outperforms both unsupervised and supervised baselines for the SHC task on three real-world datasets.
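
To make the SHC task setup concrete, here is a minimal sketch of a seeded nearest-centroid baseline. It is not HierSeed and does not model document density. It only shows how a few labeled seeds and a parent-child taxonomy could jointly determine a topic assignment. The function names, the `alpha` blending weight, the toy taxonomy, and the three-dimensional vectors are all assumptions made for illustration.

```python
import math

def mean(vectors):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]

def cosine(a, b):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def topic_representations(parent_of, seeds, alpha=0.5):
    """Blend each topic's seed centroid with its parent's centroid.

    alpha is a hypothetical weight on the parent; topics without a
    seeded parent keep their own centroid unchanged.
    """
    centroids = {topic: mean(vs) for topic, vs in seeds.items()}
    reps = {}
    for topic, centroid in centroids.items():
        parent = parent_of.get(topic)
        if parent in centroids:
            reps[topic] = [(1 - alpha) * c + alpha * p
                           for c, p in zip(centroid, centroids[parent])]
        else:
            reps[topic] = centroid
    return reps

def assign(doc, reps, candidates):
    """Return the candidate topic most similar to the document vector."""
    return max(candidates, key=lambda t: cosine(doc, reps[t]))

# Toy taxonomy (child -> parent) and seed embeddings, invented for this sketch.
parent_of = {"elections": "politics", "courts": "politics"}
seeds = {
    "politics": [[1.0, 1.0, 0.0]],
    "elections": [[1.0, 0.0, 0.0], [0.9, 0.1, 0.0]],
    "courts": [[0.0, 1.0, 0.0], [0.1, 0.9, 0.0]],
}
reps = topic_representations(parent_of, seeds)
leaves = ["elections", "courts"]
print(assign([0.8, 0.2, 0.1], reps, leaves))
```

In practice the document and seed vectors would come from some text embedding model. The sketch leaves that choice open.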

research
02/25/2019

Efficient Path Prediction for Semi-Supervised and Weakly Supervised Hierarchical Text Classification

Hierarchical text classification has many real-world applications. Howev...
research
11/20/2021

Weakly Supervised Prototype Topic Model with Discriminative Seed Words: Modifying the Category Prior by Self-exploring Supervised Signals

Dataless text classification, i.e., a new paradigm of weakly supervised ...
research
04/19/2021

Modeling "Newsworthiness" for Lead-Generation Across Corpora

Journalists obtain "leads", or story ideas, by reading large corpora of ...
research
04/17/2023

Open-World Weakly-Supervised Object Localization

While remarkable success has been achieved in weakly-supervised object l...
research
12/29/2018

Weakly-Supervised Hierarchical Text Classification

Hierarchical text classification, which aims to classify text documents ...
research
10/28/2019

Adaptive Ensembling: Unsupervised Domain Adaptation for Political Document Analysis

Insightful findings in political science often require researchers to an...
research
06/04/2018

Automatic Clustering of a Network Protocol with Weakly-Supervised Clustering

Abstraction is a fundamental part when learning behavioral models of sys...
