Position Heaps for Cartesian-tree Matching on Strings and Tries
Cartesian-tree pattern matching is a recently introduced pattern matching scheme that detects fragments of a sequential data stream whose structure is similar to that of a query pattern. Formally, Cartesian-tree pattern matching seeks all substrings S' of the text string S such that the Cartesian tree of S' coincides with that of a query pattern P. In this paper, we present a new indexing structure for this problem called the Cartesian-tree Position Heap (CPH). Let n be the length of the input text string S, m the length of a query pattern P, and σ the alphabet size. We show that the CPH of S, denoted 𝖢𝖯𝖧(S), supports pattern matching queries in O(m (σ + log (min{h, m})) + occ) time with O(n) space, where h is the height of the CPH and occ is the number of pattern occurrences. We show how to build 𝖢𝖯𝖧(S) in O(n log σ) time with O(n) working space. Further, we extend the problem to the case where the text is a labeled tree (i.e., a trie). Given a trie T with N nodes, we show that the CPH of T, denoted 𝖢𝖯𝖧(T), supports pattern matching queries on the trie in O(m (σ^2 + log (min{h, m})) + occ) time with O(N σ) space. We also show a construction algorithm for 𝖢𝖯𝖧(T) running in O(N σ) time and O(N σ) working space.
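To make the matching criterion concrete, the following is a minimal sketch of a naive matcher. It is not the CPH and does not reflect the bounds stated above. It relies on the parent-distance encoding, a known alternative characterization under which two strings have the same Cartesian tree exactly when their encodings are equal, and it assumes that ties are broken in favor of the leftmost minimum. The function names `parent_distance` and `ct_match_naive` are hypothetical and chosen only for illustration.

```python
def parent_distance(s):
    """Return PD where PD[i] = i - j for the nearest j < i with s[j] <= s[i],
    or 0 if no such j exists (assumes leftmost-minimum tie-breaking)."""
    pd = []
    stack = []  # indices whose values are non-decreasing from bottom to top
    for i, x in enumerate(s):
        # Discard candidates strictly greater than x; they cannot be the answer.
        while stack and s[stack[-1]] > x:
            stack.pop()
        pd.append(i - stack[-1] if stack else 0)
        stack.append(i)
    return pd


def ct_match_naive(text, pattern):
    """Return all start positions i such that text[i:i+m] has the same
    Cartesian tree as pattern. Runs in O(n * m) time; no index is built."""
    m = len(pattern)
    target = parent_distance(pattern)
    return [
        i
        for i in range(len(text) - m + 1)
        if parent_distance(text[i:i + m]) == target
    ]


if __name__ == "__main__":
    print(ct_match_naive([3, 1, 4, 1, 5, 9, 2, 6], [2, 1, 3]))
```

In this example the pattern [2, 1, 3] has its minimum in the middle, so it matches the windows [3, 1, 4], [4, 1, 5], and [9, 2, 6], even though none of them equals the pattern symbol by symbol. An index such as the CPH aims to answer the same kind of query without rescanning the text for every pattern.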