Semiparametric Latent Topic Modeling on Consumer-Generated Corpora

07/13/2021
by Dominic B. Dayta, et al.

Legacy procedures for topic modeling have generally suffered from overfitting and from difficulty reconstructing sparse topic structures. Motivated by a consumer-generated corpus, this paper proposes the semiparametric topic model, a two-step approach to topic modeling that combines nonnegative matrix factorization with semiparametric regression. The model enables the reconstruction of sparse topic structures in the corpus and provides a generative model for predicting topics in new documents entering the corpus. Assuming auxiliary information related to the topics is available, this approach performs better at discovering underlying topic structures when the corpora are small and limited in vocabulary. On an actual consumer feedback corpus, the model also demonstrably provides interpretable and useful topic definitions comparable with those produced by other methods.
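The two-step idea can be illustrated with a minimal sketch. It is not the authors' implementation, and several pieces are assumptions made for illustration only: the toy document-term counts are invented, the auxiliary covariate is a hypothetical star rating, and a Nadaraya-Watson kernel smoother stands in for the nonparametric component of a semiparametric regression. Step 1 factors the document-term matrix with nonnegative matrix factorization (Lee-Seung multiplicative updates); step 2 relates the resulting per-document topic shares to the auxiliary covariate, so that topic shares can be predicted for a new document from its covariate alone.

```python
import math
import random


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    cols_b = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols_b] for row in a]


def transpose(a):
    """Return the transpose of a matrix stored as a list of rows."""
    return [list(col) for col in zip(*a)]


def nmf(v, k, iters=300, seed=0, eps=1e-9):
    """Step 1: factor V (docs x terms) into nonnegative W (docs x k), H (k x terms)
    using multiplicative updates for the squared-error objective."""
    rng = random.Random(seed)
    n, m = len(v), len(v[0])
    w = [[rng.random() + 0.1 for _ in range(k)] for _ in range(n)]
    h = [[rng.random() + 0.1 for _ in range(m)] for _ in range(k)]
    for _ in range(iters):
        # H <- H * (W^T V) / (W^T W H), elementwise
        wt = transpose(w)
        num, den = matmul(wt, v), matmul(matmul(wt, w), h)
        h = [[h[i][j] * num[i][j] / (den[i][j] + eps) for j in range(m)]
             for i in range(k)]
        # W <- W * (V H^T) / (W H H^T), elementwise
        ht = transpose(h)
        num, den = matmul(v, ht), matmul(w, matmul(h, ht))
        w = [[w[i][j] * num[i][j] / (den[i][j] + eps) for j in range(k)]
             for i in range(n)]
    return w, h


def kernel_smooth(x_train, y_train, x_new, bandwidth=1.0):
    """Step 2 stand-in: Nadaraya-Watson estimate with a Gaussian kernel."""
    weights = [math.exp(-0.5 * ((x_new - x) / bandwidth) ** 2) for x in x_train]
    return sum(wt * y for wt, y in zip(weights, y_train)) / sum(weights)


def predict_shares(ratings, shares, new_rating):
    """Smooth each topic's share against the covariate, one topic at a time."""
    k = len(shares[0])
    return [kernel_smooth(ratings, [row[t] for row in shares], new_rating)
            for t in range(k)]


# Invented toy corpus: 6 documents x 4 terms. Documents 0-2 use terms 0-1,
# documents 3-5 use terms 2-3.
doc_term = [
    [3, 2, 0, 0], [2, 3, 0, 0], [4, 1, 0, 0],
    [0, 0, 2, 3], [0, 0, 3, 2], [0, 0, 1, 4],
]
# Hypothetical auxiliary information: a star rating per document.
ratings = [1, 1, 2, 4, 5, 5]

W, H = nmf(doc_term, k=2)
# Normalize each row of W so a document's topic weights sum to 1.
shares = [[x / (sum(row) + 1e-12) for x in row] for row in W]

low_rating_shares = predict_shares(ratings, shares, 1)
high_rating_shares = predict_shares(ratings, shares, 5)
print("predicted topic shares at rating 1:", low_rating_shares)
print("predicted topic shares at rating 5:", high_rating_shares)
```

Topic labels from a factorization are arbitrary, so the sketch compares predicted shares across covariate values rather than relying on a fixed topic index. A fuller treatment would add a parametric term alongside the smoother, which is what makes a regression semiparametric.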

research
02/23/2017

Stability of Topic Modeling via Matrix Factorization

Topic models can provide us with an insight into the underlying latent s...
research
01/28/2019

A new evaluation framework for topic modeling algorithms based on synthetic corpora

Topic models are in widespread use in natural language processing and be...
research
12/31/2019

Domain-topic models with chained dimensions: charting the evolution of a major oncology conference (1995-2017)

This paper presents three main contributions to the computational study ...
research
10/03/2014

Probit Normal Correlated Topic Models

The logistic normal distribution has recently been adapted via the trans...
research
09/06/2015

Sampled Weighted Min-Hashing for Large-Scale Topic Mining

We present Sampled Weighted Min-Hashing (SWMH), a randomized approach to...
research
10/08/2021

Learning Topic Models: Identifiability and Finite-Sample Analysis

Topic models provide a useful text-mining tool for learning, extracting ...
research
11/24/2022

Multi-scale Hybridized Topic Modeling: A Pipeline for Analyzing Unstructured Text Datasets via Topic Modeling

We propose a multi-scale hybridized topic modeling method to find hidden...
