Towards Generalising Neural Topical Representations

07/24/2023
by Xiaohao Yang, et al.

Topic models have evolved from conventional Bayesian probabilistic models to Neural Topic Models (NTMs) over the last two decades. Although NTMs have achieved promising performance when trained and tested on a specific corpus, their ability to generalise across corpora is rarely studied. In practice, we often expect an NTM trained on a source corpus to still produce high-quality topical representations for documents in a different target corpus without retraining. In this work, we aim to improve NTMs further so that their benefits generalise reliably across corpora and tasks. To do so, we propose to model similar documents by minimising their semantic distance when training NTMs. Specifically, similar documents are created by data augmentation during training; the semantic distance between documents is measured by the Hierarchical Topic Transport Distance (HOTT), which computes the Optimal Transport (OT) distance between the topical representations. Our framework can be readily applied to most NTMs as a plug-and-play module. Extensive experiments show that our framework significantly improves the generalisation ability of neural topical representations across corpora.
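To make the distance described above concrete, the following is a minimal sketch, not the authors' implementation, of an OT distance between the topic proportions of a document and its augmented copy. It makes several assumptions: entropy-regularised Sinkhorn iterations are swapped in as an approximation to exact OT, the function name `sinkhorn_distance` and its parameters `reg` and `n_iters` are hypothetical, and the topic-to-topic cost matrix and topic proportions are made-up toy values. In HOTT the costs would themselves be derived from distances between topics.

```python
import math


def sinkhorn_distance(p, q, cost, reg=0.2, n_iters=500):
    """Approximate the OT distance between two topic distributions.

    p, q: topic proportions (non-negative, each summing to 1).
    cost: cost[i][j] is the cost of moving mass from topic i to topic j.
    reg:  entropic regularisation strength (an assumption of this sketch).
    """
    n, m = len(p), len(q)
    # Gibbs kernel K = exp(-cost / reg).
    K = [[math.exp(-cost[i][j] / reg) for j in range(m)] for i in range(n)]
    u = [1.0] * n
    v = [1.0] * m
    for _ in range(n_iters):
        # Alternately rescale rows and columns so that the marginals of
        # the transport plan approach p and q.
        u = [p[i] / sum(K[i][j] * v[j] for j in range(m)) for i in range(n)]
        v = [q[j] / sum(K[i][j] * u[i] for i in range(n)) for j in range(m)]
    # Transport plan T[i][j] = u[i] * K[i][j] * v[j]; return its total cost.
    return sum(
        u[i] * K[i][j] * v[j] * cost[i][j] for i in range(n) for j in range(m)
    )


# Hypothetical cost matrix for three topics: moving mass between any two
# distinct topics costs 1, and keeping it in place costs 0.
cost = [[0.0, 1.0, 1.0],
        [1.0, 0.0, 1.0],
        [1.0, 1.0, 0.0]]

doc = [0.7, 0.2, 0.1]        # topic proportions of a document (made up)
augmented = [0.6, 0.3, 0.1]  # its augmented copy (made up)
other = [0.1, 0.2, 0.7]      # an unrelated document (made up)

d_aug = sinkhorn_distance(doc, augmented, cost)
d_other = sinkhorn_distance(doc, other, cost)
print(d_aug, d_other)
```

In this toy setting the document lies closer to its augmented copy than to the unrelated document. Under the framework described above, a distance of this kind between a document and its augmentation would be minimised as part of the training objective.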


