A Framework for Neural Topic Modeling of Text Corpora

08/19/2021
by Shayan Fazeli, et al.

Topic modeling refers to the problem of discovering the main topics that occur in a corpus of textual data, and its solutions have crucial applications in numerous fields. In this work, inspired by recent advancements in Natural Language Processing, we introduce FAME, an open-source framework that provides an efficient mechanism for extracting and incorporating textual features and for using them to discover topics and to cluster semantically similar text documents in a corpus. These features range from traditional approaches (e.g., frequency-based ones) to the most recent auto-encoding embeddings from transformer-based language models such as the BERT model family. To demonstrate the effectiveness of this library, we conducted experiments on the well-known News-Group dataset. The library is available online.
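
The abstract does not show FAME's API, so the following is only a minimal, hypothetical sketch of the simplest pipeline it describes: frequency-based (TF-IDF) document features followed by clustering, where each cluster's highest-weighted terms serve as a rough "topic". It uses only the Python standard library. The function names, the tiny stop-word list, the smoothed IDF formula, and the choice of spherical k-means are illustrative assumptions, not FAME's implementation. In a transformer-based setup, the TF-IDF vectors would be replaced by normalized BERT-family embeddings.

```python
import math
import re
from collections import Counter

# Hypothetical, deliberately tiny stop-word list for the illustration.
STOPWORDS = {"the", "a", "an", "and", "in", "on", "into", "of", "to", "around"}


def tokenize(text):
    # Lowercase, keep alphabetic tokens only, and drop stop words.
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOPWORDS]


def tfidf_vectors(docs):
    """Return one L2-normalized sparse TF-IDF vector (dict term -> weight) per document."""
    tokenized = [tokenize(d) for d in docs]
    n = len(docs)
    df = Counter(term for toks in tokenized for term in set(toks))
    # Smoothed IDF (an assumption): rarer terms get larger weights.
    idf = {t: math.log((1 + n) / (1 + c)) + 1.0 for t, c in df.items()}
    vectors = []
    for toks in tokenized:
        vec = {t: count * idf[t] for t, count in Counter(toks).items()}
        norm = math.sqrt(sum(w * w for w in vec.values())) or 1.0
        vectors.append({t: w / norm for t, w in vec.items()})
    return vectors


def cosine(a, b):
    # Both vectors are L2-normalized, so their dot product is the cosine similarity.
    if len(a) > len(b):
        a, b = b, a
    return sum(w * b.get(t, 0.0) for t, w in a.items())


def kmeans(vectors, k, iterations=10):
    """Spherical k-means with a deterministic start (the first k documents as centroids)."""
    centroids = [dict(v) for v in vectors[:k]]
    labels = [0] * len(vectors)
    for _ in range(iterations):
        # Assign every document to its most similar centroid.
        labels = [max(range(k), key=lambda c: cosine(v, centroids[c])) for v in vectors]
        # Recompute each centroid as the normalized sum of its members.
        for c in range(k):
            members = [v for v, label in zip(vectors, labels) if label == c]
            if not members:
                continue
            summed = Counter()
            for v in members:
                summed.update(v)  # adds the float weights term by term
            norm = math.sqrt(sum(w * w for w in summed.values())) or 1.0
            centroids[c] = {t: w / norm for t, w in summed.items()}
    return labels, centroids


def top_terms(centroid, n=5):
    # The highest-weighted terms of a centroid act as a crude topic description.
    return [t for t, _ in sorted(centroid.items(), key=lambda item: -item[1])[:n]]


# Usage on a toy corpus (hypothetical data, not the News-Group dataset).
docs = [
    "the goalie saved the puck in the hockey game",
    "nasa launched a rocket into orbit around the moon",
    "the hockey team won the game on the ice",
    "the space shuttle orbit and rocket engines",
]
labels, centroids = kmeans(tfidf_vectors(docs), k=2)
for c in sorted(set(labels)):
    print(c, top_terms(centroids[c], 2))
```

Spherical k-means is used here because it clusters by cosine similarity, which suits both sparse TF-IDF vectors and dense sentence embeddings; any other clustering method could be swapped in behind the same feature step.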
