Topic supervised non-negative matrix factorization

06/12/2017
by Kelsey MacMillan, et al.

Topic models have been extensively used to organize and interpret the contents of large, unstructured corpora of text documents. Although topic models often perform well on traditional training vs. test set evaluations, their results frequently do not align with human interpretation. This interpretability fallacy is largely due to the unsupervised nature of topic models, which leaves no room for user guidance on the results of a model. In this paper, we introduce a semi-supervised method called topic supervised non-negative matrix factorization (TS-NMF) that enables the user to provide labeled example documents to promote the discovery of more meaningful semantic structure in a corpus. In this way, the results of TS-NMF better match the intuition and desired labeling of the user. The core of TS-NMF relies on solving a non-convex optimization problem, for which we derive an iterative algorithm that is shown to be monotonic and convergent to a local optimum. We demonstrate the practical utility of TS-NMF on the Reuters and PubMed corpora, and find that TS-NMF is especially useful for conceptual or broad topics, where topic key terms are not well understood. Although identifying an optimal latent structure for the data is not a primary objective of the proposed approach, we find that TS-NMF achieves higher weighted Jaccard similarity scores than two contemporary methods, (unsupervised) NMF and latent Dirichlet allocation, at supervision rates as low as 10
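
The abstract does not state the objective function or the update rules, so the following is only a minimal sketch of how a supervised factorization of this kind could look, not the authors' algorithm. It assumes supervision is encoded as a binary document-by-topic mask `L` (a hypothetical name, as are `masked_nmf`, `W` and `H`), where `L[i, k] = 0` forbids topic `k` in document `i` and an all-ones row marks an unlabeled document. It then applies standard Lee-Seung style multiplicative updates to the masked squared-error objective and records the loss so that its monotone decrease can be inspected.

```python
import numpy as np

def masked_nmf(X, L, n_iter=200, eps=1e-10, seed=0):
    """Sketch: minimise ||X - (W * L) @ H||_F^2 subject to W, H >= 0.

    X: (n_docs, n_terms) non-negative document-term matrix.
    L: (n_docs, n_topics) binary mask (assumed encoding of the labels).
    Returns W, H and the loss after each iteration.
    """
    rng = np.random.default_rng(seed)
    n_terms = X.shape[1]
    n_topics = L.shape[1]
    # Masking W at initialisation is enough: multiplicative updates
    # keep every zero entry at zero, so W * L == W throughout.
    W = rng.random(L.shape) * L
    H = rng.random((n_topics, n_terms))
    losses = []
    for _ in range(n_iter):
        # Update document-topic weights; eps guards against division by zero.
        W *= (X @ H.T) / (W @ (H @ H.T) + eps)
        # Update topic-term weights.
        H *= (W.T @ X) / ((W.T @ W) @ H + eps)
        losses.append(float(np.linalg.norm(X - W @ H) ** 2))
    return W, H, losses

# Toy data: 6 documents, 8 terms, 3 topics. Documents 0 and 1 are
# labeled with a single topic each; the rest are unsupervised.
rng = np.random.default_rng(1)
X = rng.random((6, 8))
L = np.ones((6, 3))
L[0] = [1, 0, 0]
L[1] = [0, 1, 0]
W, H, losses = masked_nmf(X, L)
```

In this sketch, labeled documents can only load on the topics their labels allow, while unlabeled documents are factorized as in ordinary NMF. Like the algorithm described in the abstract, updates of this type reach at best a local optimum, so results depend on the random initialization.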

research
01/31/2022

Guided Semi-Supervised Non-negative Matrix Factorization on Legal Documents

Classification and topic modeling are popular techniques in machine lear...
research
12/18/2019

Topic subject creation using unsupervised learning for topic modeling

We describe the use of Non-Negative Matrix Factorization (NMF) and Laten...
research
08/21/2022

SeNMFk-SPLIT: Large Corpora Topic Modeling by Semantic Non-negative Matrix Factorization with Automatic Model Selection

As the amount of text data continues to grow, topic modeling is serving ...
research
11/19/2017

A Double Parametric Bootstrap Test for Topic Models

Non-negative matrix factorization (NMF) is a technique for finding laten...
research
07/11/2016

Exploring the Political Agenda of the European Parliament Using a Dynamic Topic Modeling Approach

This study analyzes the political agenda of the European Parliament (EP)...
research
04/28/2021

Analysis of Legal Documents via Non-negative Matrix Factorization Methods

The California Innocence Project (CIP), a clinical law school program ai...
research
05/27/2021

Non-negative matrix factorization algorithms greatly improve topic model fits

We report on the potential for using algorithms for non-negative matrix ...
