Mining Top-k Sequential Patterns in Database Graphs: A New Challenging Problem and a Sampling-based Approach

05/08/2018
by Mingtao Lei, et al.

In many real-world networks, each vertex is associated with a transaction database that comprehensively describes the vertex's behaviour. A typical example is a social network, where the behaviour of each user is captured by a transaction database storing the content that user posts each day. A transaction database is a set of transactions, and a transaction is a set of items. Every path in the network is a sequence of vertices that induces multiple sequences of transactions. The sequences of transactions induced by all paths in the network form an extremely large sequence database. Mining frequent sequential patterns from such a sequence database discovers interesting subsequences that appear in many paths of the network. However, this is a challenging task, because the sequence database induced by a database graph is too large to be explicitly constructed and stored. In this paper, we propose the novel notion of a database graph, which naturally models a wide spectrum of real-world networks by associating each vertex with a transaction database. Our goal is to find the top-k frequent sequential patterns in the sequence database induced from a database graph. We prove that this problem is #P-hard. To tackle it, we propose an efficient two-step sampling algorithm that approximates the top-k frequent sequential patterns with a provable quality guarantee. Extensive experiments on synthetic and real-world data sets demonstrate the effectiveness and efficiency of our method.
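
The definitions above can be illustrated with a small sketch. It is not the paper's algorithm. It assumes that a path induces one sequence for every way of picking a single transaction from each vertex along the path, and that a pattern is contained in a sequence under the usual sequential-pattern rule (each pattern itemset is a subset of a later transaction, in order). The toy graph, the names `vertex_db`, `induced_sequences`, and `contains`, and the example pattern are all hypothetical.

```python
from itertools import product

# Hypothetical toy database graph: each vertex maps to its transaction
# database, i.e. a list of transactions, each a set of items.
vertex_db = {
    "u": [frozenset({"a", "b"}), frozenset({"c"})],
    "v": [frozenset({"b"}), frozenset({"a", "c"}), frozenset({"d"})],
    "w": [frozenset({"c", "d"})],
}

def induced_sequences(path):
    """Return an iterator over the transaction sequences induced by a path.

    Assumption: one transaction is chosen per vertex, so the result is the
    Cartesian product of the transaction databases along the path.
    """
    return product(*(vertex_db[v] for v in path))

def contains(sequence, pattern):
    """Check whether `pattern` is a subsequence of `sequence`.

    Greedy left-to-right matching: each pattern itemset must be a subset
    of some transaction, and the matched transactions must be in order.
    """
    i = 0
    for transaction in sequence:
        if i < len(pattern) and pattern[i] <= transaction:
            i += 1
    return i == len(pattern)

path = ["u", "v", "w"]
seqs = list(induced_sequences(path))

# Count how many induced sequences contain the pattern <{a}, {c}>.
pattern = (frozenset({"a"}), frozenset({"c"}))
support = sum(contains(s, pattern) for s in seqs)

print(len(seqs), support)
```

Under this assumption, the number of sequences induced by a single path is the product of the database sizes along it (here 2 × 3 × 1), so it grows multiplicatively with path length. That is why enumerating the full sequence database, as this sketch does for one short path, is not feasible on real networks.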
