Nonoverlapping (delta, gamma)-approximate pattern matching

09/07/2022
by Youxi Wu, et al.

Pattern matching can be used to calculate the support of patterns, and is a key issue in sequential pattern mining (or sequence pattern mining). Nonoverlapping pattern matching means that two occurrences cannot use the same character of the sequence at the same pattern position. Approximate pattern matching tolerates some data noise, and is more general than exact pattern matching. At present, nonoverlapping approximate pattern matching is based on the Hamming distance, which cannot measure the local approximation between a subsequence and the pattern, and this results in large deviations in the matching results. To tackle this issue, we present a Nonoverlapping Delta and gamma approximate Pattern matching (NDP) scheme that employs the (delta, gamma)-distance, under which the local and the global distances do not exceed delta and gamma, respectively. We first transform the NDP problem into a local approximate Nettree and then construct an efficient algorithm, called the local approximate Nettree for NDP (NetNDP). We propose a new approach called the Minimal Root Distance, which allows us to determine whether or not a node has root paths that satisfy the global constraint, and to prune invalid nodes and parent-child relationships. NetNDP finds the rightmost absolute leaf of the max root, searches for the rightmost occurrence from that leaf, and deletes this occurrence. These steps are iterated until no new occurrences are found. Extensive experiments are used to verify the performance of the proposed algorithm.
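The two constraints described in the abstract can be illustrated with a minimal sketch. It is not the NetNDP algorithm and does not build a Nettree. It assumes numeric sequences, uses the absolute difference as the per-character distance, and represents an occurrence as a tuple of sequence indices, one per pattern position. The function names `delta_gamma_match` and `nonoverlapping` are hypothetical and chosen only for this illustration.

```python
from typing import Sequence, Tuple


def delta_gamma_match(pattern: Sequence[float],
                      window: Sequence[float],
                      delta: float,
                      gamma: float) -> bool:
    """Check the (delta, gamma)-distance between a pattern and a candidate.

    Assumption: the local distance at each position is |p - s| and the
    global distance is the sum of the local distances.
    """
    if len(pattern) != len(window):
        return False
    total = 0.0
    for p, s in zip(pattern, window):
        d = abs(p - s)
        if d > delta:        # local constraint violated
            return False
        total += d
        if total > gamma:    # global constraint violated
            return False
    return True


def nonoverlapping(occ_a: Tuple[int, ...], occ_b: Tuple[int, ...]) -> bool:
    """Two occurrences are nonoverlapping if they never use the same
    sequence index at the same pattern position."""
    return all(a != b for a, b in zip(occ_a, occ_b))
```

For example, under these assumptions `delta_gamma_match([1, 2, 3], [1, 3, 3], delta=1, gamma=1)` holds, while `[2, 3, 4]` passes the local check at every position with `delta=1` but fails the global one with `gamma=2`. The occurrences `(0, 1, 2)` and `(1, 2, 3)` are nonoverlapping because the shared index 1 is used at different pattern positions.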

research
12/05/2022

AOP-Miner: Approximate Order-Preserving Pattern Mining for Time Series

The order-preserving pattern mining can be regarded as discovering frequ...
research
10/03/2018

Approximating Approximate Pattern Matching

Given a text T of length n and a pattern P of length m, the approximate ...
research
06/19/2017

The E-Average Common Submatrix: Approximate Searching in a Restricted Neighborhood

This paper introduces a new (dis)similarity measure for 2D arrays, exten...
research
05/28/2022

A New High-Performance Approach to Approximate Pattern-Matching for Plagiarism Detection in Blockchain-Based Non-Fungible Tokens (NFTs)

We are presenting a fast and innovative approach to performing approxima...
research
05/06/2020

Quantum pattern matching Oracle construction

We propose a couple of oracle construction methods for quantum pattern m...
research
02/09/2018

Self-Bounded Prediction Suffix Tree via Approximate String Matching

Prediction suffix trees (PST) provide an effective tool for sequence mod...
research
06/11/2021

Matching Patterns with Variables under Hamming Distance

A pattern α is a string of variables and terminal letters. We say that α...
