Hierarchical Dirichlet process for tracking complex topical structure evolution and its application to autism research literature

02/08/2015
by Adham Beykikhoshk, et al.

In this paper we describe a novel framework for discovering the topical content of a data corpus and tracking its complex structural changes over time. In contrast to previous work, our model does not impose a prior on the rate at which documents are added to the corpus, nor does it adopt the Markovian assumption, which overly restricts the types of change that a model can capture. Our key technical contribution is a framework based on (i) discretization of time into epochs, (ii) epoch-wise topic discovery using a hierarchical Dirichlet process-based model, and (iii) a temporal similarity graph which allows for the modelling of complex topic changes: emergence and disappearance, evolution, and splitting and merging. The power of the proposed framework is demonstrated on the medical literature corpus concerned with autism spectrum disorder (ASD), a research subject of growing social and healthcare importance. In addition to the collected ASD literature corpus, which we will make freely available, our contributions include two free online tools we built as aids to ASD researchers. These can be used for semantically meaningful navigation and searching of this large and rapidly growing corpus of literature, as well as for knowledge discovery from it.
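
To make step (iii) more concrete, the following is a minimal sketch of how a temporal similarity graph over per-epoch topics might be built and read. It is not the authors' implementation. The abstract does not specify the similarity measure, the threshold, or which epochs are linked, so the cosine similarity, the value 0.5, and the restriction to consecutive epochs are all assumptions made for illustration. The toy topic-word distributions are written by hand rather than produced by a hierarchical Dirichlet process, and the function names are hypothetical.

```python
from math import sqrt


def cosine(p, q):
    """Cosine similarity between two sparse word distributions (dicts)."""
    dot = sum(w * q.get(word, 0.0) for word, w in p.items())
    norm_p = sqrt(sum(w * w for w in p.values()))
    norm_q = sqrt(sum(w * w for w in q.values()))
    return dot / (norm_p * norm_q) if norm_p and norm_q else 0.0


def build_similarity_graph(epochs, threshold=0.5):
    """Link topics in consecutive epochs whose similarity reaches the threshold.

    epochs: list of dicts, one per epoch, mapping topic id -> word distribution.
    Returns a list of edges ((t, i), (t + 1, j), similarity).
    Assumption: only adjacent epochs are compared; the threshold is arbitrary.
    """
    edges = []
    for t in range(len(epochs) - 1):
        for i, p in epochs[t].items():
            for j, q in epochs[t + 1].items():
                s = cosine(p, q)
                if s >= threshold:
                    edges.append(((t, i), (t + 1, j), s))
    return edges


def classify_changes(epochs, edges):
    """Label each topic node using its in-degree and out-degree in the graph."""
    in_deg, out_deg = {}, {}
    for src, dst, _ in edges:
        out_deg[src] = out_deg.get(src, 0) + 1
        in_deg[dst] = in_deg.get(dst, 0) + 1

    last = len(epochs) - 1
    events = {}
    for t, topics in enumerate(epochs):
        for i in topics:
            node = (t, i)
            labels = []
            if t > 0 and in_deg.get(node, 0) == 0:
                labels.append("emergence")      # no predecessor topic
            if t < last and out_deg.get(node, 0) == 0:
                labels.append("disappearance")  # no successor topic
            if out_deg.get(node, 0) > 1:
                labels.append("splitting")      # several successors
            if in_deg.get(node, 0) > 1:
                labels.append("merging")        # several predecessors
            events[node] = labels
    return events


# Hand-written toy topics for two epochs (illustrative only, not HDP output).
epochs = [
    {
        "a": {"gene": 0.6, "mutation": 0.4},
        "b": {"therapy": 0.7, "behaviour": 0.3},
    },
    {
        "a1": {"gene": 0.7, "mutation": 0.3},
        "a2": {"gene": 0.5, "mutation": 0.4, "sequencing": 0.1},
        "c": {"diet": 1.0},
    },
]

edges = build_similarity_graph(epochs, threshold=0.5)
events = classify_changes(epochs, edges)

for node, labels in sorted(events.items()):
    print(node, labels)
```

In this toy input, topic "a" in the first epoch is linked to two topics in the second and is therefore labelled as splitting, topic "b" has no successor, and topic "c" has no predecessor. A topic with exactly one predecessor and one successor receives no label here, which under these assumptions corresponds to ordinary evolution.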

research
05/16/2016

Identification of promising research directions using machine learning aided medical literature analysis

The rapidly expanding corpus of medical research literature presents maj...
research
01/08/2023

Topic Modelling of Swedish Newspaper Articles about Coronavirus: a Case Study using Latent Dirichlet Allocation Method

Topic Modelling (TM) is from the research branches of natural language u...
research
04/13/2018

Per-Corpus Configuration of Topic Modelling for GitHub and Stack Overflow Collections

To make sense of large amounts of textual data, topic modelling is frequ...
research
09/15/2021

Powered Hawkes-Dirichlet Process: Challenging Textual Clustering using a Flexible Temporal Prior

The textual content of a document and its publication date are intertwin...
research
11/29/2021

Changepoint Analysis of Topic Proportions in Temporal Text Data

Changepoint analysis deals with unsupervised detection and/or estimation...
research
10/12/2021

Topic-time Heatmaps for Human-in-the-loop Topic Detection and Tracking

The essential task of Topic Detection and Tracking (TDT) is to organize ...
