MDL-based Compressing Sequential Rules

12/20/2022
by Xinhong Chen, et al.

With the rapid development of the Internet, the era of big data has arrived, and the Internet generates huge amounts of data every day. Extracting meaningful information from such massive data, however, is like looking for a needle in a haystack, and data mining techniques offer feasible ways to address this problem. Many sequential rule mining (SRM) algorithms have been proposed to find sequential rules in databases with sequential characteristics, and these rules help people extract meaningful information from massive amounts of data. How can the mined results be compressed to reduce data size and save storage space and transmission time? So far, there has been little research on compression in SRM. In this paper, we combine the Minimum Description Length (MDL) principle with two metrics, support and confidence, to introduce the problem of compressing SRM results, and we propose a solution named ComSR for MDL-based compression of sequential rules, built on a purpose-designed sequential rule coding scheme. To our knowledge, we are the first to use sequential rules to encode an entire database. A heuristic method is proposed to find a set of sequential rules that is as compact and meaningful as possible. ComSR has two trade-off algorithms, ComSR_non and ComSR_ful, which differ in whether the database can be completely compressed. Experiments on a real dataset with different thresholds show that a set of compact and meaningful sequential rules can be found, indicating that the proposed method works.
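To make the two metrics concrete, the following is a minimal sketch of how support and confidence might be computed for a sequential rule X → Y. It assumes a commonly used definition from the SRM literature (a sequence contains the rule if every item of X occurs before every item of Y), which may differ from the exact definition used by ComSR. The function names and the toy database are hypothetical and are not taken from the paper.

```python
from typing import List, Set

Itemset = Set[str]
Sequence = List[Itemset]  # an ordered list of itemsets


def contains_rule(seq: Sequence, antecedent: Itemset, consequent: Itemset) -> bool:
    """Return True if all antecedent items occur strictly before all consequent items."""
    seen: Set[str] = set()
    for k, itemset in enumerate(seq):
        seen |= itemset & antecedent
        if seen == antecedent:
            # The earliest point where the antecedent is complete leaves the
            # longest possible remainder in which to find the consequent.
            rest: Set[str] = set().union(*seq[k + 1:])
            return consequent <= rest
    return False


def contains_items(seq: Sequence, items: Itemset) -> bool:
    """Return True if every item appears somewhere in the sequence."""
    return items <= set().union(*seq)


def support(db: List[Sequence], antecedent: Itemset, consequent: Itemset) -> float:
    """Fraction of sequences in the database that contain the rule."""
    hits = sum(contains_rule(s, antecedent, consequent) for s in db)
    return hits / len(db)


def confidence(db: List[Sequence], antecedent: Itemset, consequent: Itemset) -> float:
    """Sequences containing the rule divided by sequences containing the antecedent."""
    rule_hits = sum(contains_rule(s, antecedent, consequent) for s in db)
    ante_hits = sum(contains_items(s, antecedent) for s in db)
    return rule_hits / ante_hits if ante_hits else 0.0


# Hypothetical toy database with four sequences.
toy_db: List[Sequence] = [
    [{"a"}, {"b"}, {"c"}],  # contains a -> c
    [{"a"}, {"c"}],         # contains a -> c
    [{"c"}, {"a"}],         # has a, but c comes first
    [{"b"}],                # no a at all
]

rule_support = support(toy_db, {"a"}, {"c"})
rule_confidence = confidence(toy_db, {"a"}, {"c"})
```

In this toy database the rule {a} → {c} holds in two of the four sequences, and the antecedent appears in three of them, so a rule can have moderate support and a higher confidence at the same time. Thresholds on both values are what an SRM algorithm would typically use to filter candidate rules.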
