An Efficient Algorithm for Mining Frequent Sequence with Constraint Programming

04/05/2016
by John O. R. Aoga, et al.

The main advantage of Constraint Programming (CP) approaches for sequential pattern mining (SPM) is their modularity, which includes the ability to add new constraints (regular expressions, length restrictions, etc.). The current best CP approach for SPM uses a global constraint (module) that computes the projected database and enforces the minimum frequency; it does this with a filtering algorithm similar to the PrefixSpan method. However, the resulting system is not as scalable as some of the most advanced mining systems, such as Zaki's cSPADE. We show how, using techniques from both data mining and CP, one can use a generic constraint solver and yet outperform existing specialized systems. This is mainly due to two improvements in the module that computes the projected frequencies: first, computing the projected database is sped up by pre-computing the positions at which a symbol can become unsupported by a sequence, which avoids scanning the full sequence each time; and second, taking inspiration from the trailing used in CP solvers, we devise a backtracking-aware data structure that allows fast incremental storing and restoring of the projected database. Detailed experiments show that this approach outperforms existing CP as well as specialized systems for SPM, and that the gain in efficiency translates directly into increased efficiency for other settings such as mining with regular expressions.
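
The two improvements can be illustrated with a minimal Python sketch. It is not the authors' implementation and is not embedded in a CP solver: the names `last_position_list`, `symbols_after`, and `ProjectedDB` are hypothetical, sequences are assumed to be plain strings of single-character symbols, and the "trail" is approximated by append-only arrays plus a stack of (start, length) ranges that is truncated on backtrack.

```python
from collections import defaultdict


def last_position_list(seq):
    """Return [(symbol, last_index)] sorted by decreasing last_index."""
    last = {sym: i for i, sym in enumerate(seq)}  # later indices overwrite earlier
    return sorted(last.items(), key=lambda kv: -kv[1])


def symbols_after(last_list, start):
    """Symbols occurring at index >= start, without scanning the suffix."""
    out = []
    for sym, pos in last_list:
        if pos < start:  # every remaining symbol ends even earlier
            break
        out.append(sym)
    return out


class ProjectedDB:
    """Prefix-projected database stored in append-only arrays (assumed layout)."""

    def __init__(self, db):
        self.db = db
        self.last_lists = [last_position_list(s) for s in db]  # computed once
        self.sids = list(range(len(db)))   # sequence ids of all levels, concatenated
        self.poss = [0] * len(db)          # matching start positions
        self.levels = [(0, len(db))]       # (start, length) per search level

    def current(self):
        start, n = self.levels[-1]
        return list(zip(self.sids[start:start + n], self.poss[start:start + n]))

    def frequencies(self):
        """Count, per symbol, the sequences that still support it."""
        freq = defaultdict(int)
        for sid, pos in self.current():
            for sym in symbols_after(self.last_lists[sid], pos):
                freq[sym] += 1
        return dict(freq)

    def project(self, sym):
        """Extend the prefix by sym; new entries are appended, nothing is copied."""
        start, n = self.levels[-1]
        new_start = start + n
        for i in range(start, start + n):
            sid, pos = self.sids[i], self.poss[i]
            nxt = self.db[sid].find(sym, pos)  # -1 if sym is absent from the suffix
            if nxt >= 0:
                self.sids.append(sid)
                self.poss.append(nxt + 1)
        self.levels.append((new_start, len(self.sids) - new_start))

    def backtrack(self):
        """Restore the previous projection by truncating the arrays."""
        start, _ = self.levels.pop()
        del self.sids[start:]
        del self.poss[start:]
```

In this sketch, restoring a projection costs only a truncation, and the frequency count for a sequence stops as soon as it reaches a symbol whose last occurrence lies before the current position.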

research
11/27/2013

A Constraint Programming Approach for Mining Sequential Patterns in a Sequence Database

Constraint-based pattern discovery is at the core of numerous data minin...
research
04/29/2015

Prefix-Projection Global Constraint for Sequential Pattern Mining

Sequential pattern mining under constraints is a challenging data mining...
research
11/26/2015

A global Constraint for mining Sequential Patterns with GAP constraint

Sequential pattern mining (SPM) under gap constraint is a challenging ta...
research
11/09/2018

Stratified Constructive Disjunction and Negation in Constraint Programming

Constraint Programming (CP) is a powerful declarative programming paradi...
research
10/01/2019

Towards Improving Solution Dominance with Incomparability Conditions: A case-study using Generator Itemset Mining

Finding interesting patterns is a challenging task in data mining. Const...
research
10/10/2019

Reflections on "Incremental Cardinality Constraints for MaxSAT"

To celebrate the first 25 years of the International Conference on Princ...
research
09/25/2017

Mining a Sub-Matrix of Maximal Sum

Biclustering techniques have been widely used to identify homogeneous su...
