Generalized Dictionary Matching under Substring Consistent Equivalence Relations
Given a set of patterns called a dictionary and a text, the dictionary matching problem is a task to find all occurrence positions of all patterns in the text. The dictionary matching problem can be solved efficiently by using the Aho-Corasick algorithm. Recently, Matsuoka et al. [TCS, 2016] proposed a generalization of pattern matching problem under substring consistent equivalence relations and presented a generalization of the Knuth-Morris-Pratt algorithm to solve this problem. An equivalence relation ≈ is a substring consistent equivalence relation (SCER) if for two strings X,Y, X ≈ Y implies |X| = |Y| and X[i:j] ≈ Y[i:j] for all 1 < i < j < |X|. In this paper, we propose a generalization of the dictionary matching problem and present a generalization of the Aho-Corasick algorithm for the dictionary matching under SCER. We present an algorithm that constructs SCER automata and an algorithm that performs dictionary matching under SCER by using the automata. Moreover, we show the time and space complexity of our algorithms with respect to the size of input strings.
READ FULL TEXT