Linking Sequences of Events with Sparse or No Common Occurrence across Data Sets

11/12/2017
by Yunsung Kim, et al.

Data of practical interest - such as personal records, transaction logs, and medical histories - are sequential collections of events relevant to a particular source entity. Recent studies have attempted to link sequences that represent a common entity across data sets, both to allow more comprehensive statistical analyses and to identify potential privacy failures. Yet current approaches remain tailored to their specific domains of application, and they fail when co-referent sequences in different data sets contain sparse or no common events, a situation that occurs frequently in practice. To address this, we formalize the general problem of "sequence linkage" and describe "LDA-Link," a generic solution that is applicable even when co-referent event sequences contain no common items at all. LDA-Link is built upon the "Split-Document" model, a new mixed-membership probabilistic model for the generation of event sequence collections. It detects the latent similarity of sequences and thus remains robust, particularly when co-referent sequences share sparse or no event overlap. We apply LDA-Link in the context of social media profile reconciliation, where users make no common posts across platforms, and compare it to the state-of-the-art generic solution to sequence linkage.
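To make the idea of "latent similarity" concrete, the toy sketch below links sequences from two data sets that share no events at all by comparing their inferred topic mixtures. It is not the Split-Document model or LDA-Link itself: the topic table `TOPICS`, the helper names `topic_mixture`, `hellinger`, and `link`, and the example data are all hypothetical, and the topics are assumed to be known in advance rather than learned.

```python
import math

# Hypothetical topic-to-event probabilities, assumed known for this sketch.
# A real system would learn them from the data with a topic model.
TOPICS = {
    "sports": {"goal": 0.4, "match": 0.3, "league": 0.2, "coach": 0.1},
    "cooking": {"recipe": 0.4, "oven": 0.3, "spice": 0.2, "bake": 0.1},
}

def topic_mixture(events, topics=TOPICS, iters=10, eps=1e-6):
    """Estimate a sequence's topic proportions by EM with the topics held fixed."""
    names = list(topics)
    theta = {k: 1.0 / len(names) for k in names}
    if not events:
        return theta  # nothing observed: keep the uniform mixture
    for _ in range(iters):
        counts = {k: 0.0 for k in names}
        for e in events:
            # E-step: responsibility of each topic for this event.
            weights = {k: theta[k] * topics[k].get(e, eps) for k in names}
            total = sum(weights.values())
            for k in names:
                counts[k] += weights[k] / total
        # M-step: renormalise the expected counts into proportions.
        n = sum(counts.values())
        theta = {k: counts[k] / n for k in names}
    return theta

def hellinger(p, q):
    """Hellinger distance between two discrete distributions over the same keys."""
    return math.sqrt(0.5 * sum((math.sqrt(p[k]) - math.sqrt(q[k])) ** 2 for k in p))

def link(dataset_a, dataset_b):
    """Pair each sequence in dataset_a with the closest sequence in dataset_b."""
    mix_b = {name: topic_mixture(ev) for name, ev in dataset_b.items()}
    links = {}
    for name_a, ev in dataset_a.items():
        mix_a = topic_mixture(ev)
        links[name_a] = min(mix_b, key=lambda nb: hellinger(mix_a, mix_b[nb]))
    return links

# Two hypothetical platforms whose co-referent sequences share no events.
platform_a = {"a1": ["goal", "match", "goal"], "a2": ["recipe", "oven"]}
platform_b = {"b1": ["spice", "bake"], "b2": ["league", "coach"]}
links = link(platform_a, platform_b)
print(links)
```

Here `a1` is paired with `b2` and `a2` with `b1`, even though no event appears on both platforms. A direct overlap measure such as Jaccard similarity would score every pair as zero in this setting.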

research
04/03/2020

Neural Conditional Event Time Models

Event time models predict occurrence times of an event of interest based...
research
10/08/2012

ET-LDA: Joint Topic Modeling For Aligning, Analyzing and Sensemaking of Public Events and Their Twitter Feeds

Social media channels such as Twitter have emerged as popular platforms ...
research
08/27/2020

CausalFlow: Visual Analytics of Causality in Event Sequences

Understanding the relation of events plays an important role in differen...
research
10/04/2021

Beyond Topics: Discovering Latent Healthcare Objectives from Event Sequences

A meaningful understanding of clinical protocols and patient pathways he...
research
12/22/2015

News Across Languages - Cross-Lingual Document Similarity and Event Tracking

In today's world, we follow news which is distributed globally. Signific...
research
10/28/2019

Learning Latent Process from High-Dimensional Event Sequences via Efficient Sampling

We target modeling latent dynamics in high-dimension marked event sequen...
