The Powered Dirichlet-Hawkes Process as a Flexible Prior for Temporal Text Clustering

01/29/2022
by   Gaël Poux-Médard, et al.

The textual content of a document and its publication date are intertwined. For example, the publication of a news article on a topic is influenced by previous publications on similar issues, according to underlying temporal dynamics. However, retrieving meaningful information is challenging when the text itself is weakly informative. Furthermore, the textual content of a document is not always correlated with its temporal dynamics. We develop a method, the Powered Dirichlet-Hawkes process (PDHP), that clusters textual documents according to both their content and their publication time. PDHP yields significantly better results than state-of-the-art models when either the temporal information or the textual content is weakly informative. PDHP also relaxes the assumption that textual content and temporal dynamics are perfectly correlated. We demonstrate that PDHP generalizes previous work, such as DHP and UP. Finally, we illustrate a possible application using a real-world dataset from Reddit.
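The abstract does not spell out the prior itself, so the following is only a minimal sketch of how a "powered" temporal prior of this kind might look. It assumes that each existing cluster is weighted by its Hawkes intensity raised to an exponent r, and that a new cluster receives a constant weight. Under that assumption, r = 1 gives a Dirichlet-Hawkes-style prior, and r = 0 gives every existing cluster the same weight, which is assumed here to be what UP refers to. The exponential kernel and the names `hawkes_intensity`, `pdhp_prior`, `lambda0`, `decay` and `r` are hypothetical and chosen for illustration.

```python
import math


def hawkes_intensity(t, past_times, decay=1.0, weight=1.0):
    """Intensity of one cluster at time t, using an exponential kernel (an assumption)."""
    return sum(weight * math.exp(-decay * (t - s)) for s in past_times if s < t)


def pdhp_prior(t, clusters, lambda0=0.1, r=1.0):
    """Prior probabilities for a document published at time t.

    `clusters` is a list of lists of past publication times, one list per cluster.
    The returned list has one entry per existing cluster, followed by one
    entry for opening a new cluster.
    """
    # Assumed form: existing clusters are weighted by intensity ** r.
    weights = [hawkes_intensity(t, times) ** r for times in clusters]
    # Assumed form: a new cluster gets the constant base weight lambda0.
    weights.append(lambda0)
    total = sum(weights)
    return [w / total for w in weights]


if __name__ == "__main__":
    clusters = [[0.0, 0.5, 0.9], [0.2]]
    for r in (1.0, 0.5, 0.0):
        print(r, pdhp_prior(1.0, clusters, r=r))
```

In this sketch, lowering r flattens the temporal preference between existing clusters, so the textual likelihood (not modeled here) would carry more of the clustering decision.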


