Utility-driven Mining of Contiguous Sequences

10/30/2021
by   Chunkai Zhang, et al.

Recently, contiguous sequential pattern mining (CSPM) has gained interest as a research topic because of its varied real-world applications, such as web log analysis and biological sequence analysis. To date, studies on the CSPM problem remain at a preliminary stage. Existing CSPM algorithms are not efficient enough to satisfy users' needs and can still be improved in terms of runtime and memory consumption. In addition, existing algorithms were developed for simple sequence data, in which only one event occurs at a time. Complex sequence data, in which multiple events occur simultaneously, are also commonly observed in real life. In this paper, we propose a novel algorithm, fast utility-driven contiguous sequential pattern mining (FUCPM), to address the CSPM problem. FUCPM adopts a compact sequence information list and instance chain structures to store the necessary information about the database and the candidate patterns. For further efficiency, we develop two strategies to reduce the search space: global unpromising items pruning, based on sequence-weighted utilization, and local unpromising items pruning, based on item-extension utilization. Extensive experiments on real-world and synthetic datasets demonstrate that FUCPM outperforms the state-of-the-art algorithms and is scalable enough to handle complex sequence data.
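The abstract does not spell out how global unpromising items pruning works, so the following is only a minimal sketch of the general idea as it is commonly used in utility mining, not FUCPM itself. It assumes a sequence is a list of itemsets, each itemset maps an item to its utility, the sequence-weighted utilization (SWU) of an item is the total utility of all sequences containing it, and an item whose SWU falls below a minimum utility threshold can be discarded. The names `sequence_utility`, `swu`, `prune_unpromising`, and `min_util` are hypothetical.

```python
from collections import defaultdict


def sequence_utility(seq):
    """Total utility of one sequence (a list of {item: utility} itemsets)."""
    return sum(sum(itemset.values()) for itemset in seq)


def swu(db):
    """Assumed SWU definition: for each item, sum the utilities of the
    sequences in which that item appears at least once."""
    totals = defaultdict(int)
    for seq in db:
        su = sequence_utility(seq)
        for item in {item for itemset in seq for item in itemset}:
            totals[item] += su
    return dict(totals)


def prune_unpromising(db, min_util):
    """Repeatedly drop items whose SWU is below min_util.

    Removing an item lowers sequence utilities, so SWU is recomputed
    until no further item can be dropped. Emptied itemsets are kept in
    place so that the positions needed for contiguity do not shift.
    """
    while True:
        scores = swu(db)
        drop = {item for item, score in scores.items() if score < min_util}
        if not drop:
            return db
        db = [
            [{i: u for i, u in itemset.items() if i not in drop} for itemset in seq]
            for seq in db
        ]


# Toy complex-sequence database: several items may share one itemset.
db = [
    [{"a": 2, "b": 3}, {"c": 1}],
    [{"a": 4}, {"d": 1}],
    [{"b": 2}, {"c": 2}, {"e": 1}],
]
pruned = prune_unpromising(db, min_util=6)
```

In this toy database, items `d` and `e` each appear only in a sequence of total utility 5, so with `min_util=6` they are removed while `a`, `b`, and `c` survive. Keeping emptied itemsets as placeholders is a design choice made for this sketch; how FUCPM handles removed items is not stated in the abstract.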


research
12/25/2019

Discovering High Utility Episodes in Sequences

Sequence data, e.g., complex event sequence, is more commonly seen than ...
research
04/28/2019

Fast Utility Mining on Complex Sequences

High-utility sequential pattern mining is an emerging topic in the field...
research
12/04/2019

Mining Top-k Trajectory-Patterns from Anonymized Data

The ubiquity of GPS enabled devices result into the generation of an eno...
research
06/28/2021

THUE: Discovering Top-K High Utility Episodes

Episode discovery from an event is a popular framework for data mining t...
research
01/27/2022

Incremental Mining of Frequent Serial Episodes Considering Multiple Occurrences

The need to analyze information from streams arises in a variety of appl...
research
03/30/2021

TUSQ: Targeted High-Utility Sequence Querying

Significant efforts have been expended in the research and development o...
research
05/17/2019

Reference-Based Sequence Classification

Sequence classification is an important data mining task in many real wo...
