A compressed dynamic self-index for highly repetitive text collections

11/08/2017
by Takaaki Nishimoto, et al.

We present a novel compressed dynamic self-index for highly repetitive text collections. Signature encoding, an existing compressed dynamic self-index for highly repetitive texts, has a major drawback: pattern search is slow for short patterns. We address this drawback by leveraging the idea behind the truncated suffix tree, and present the TST-index, the first compressed dynamic self-index for highly repetitive texts that supports both fast pattern search and dynamic updates of the index. Experiments on a benchmark dataset of highly repetitive texts show that the pattern search of the TST-index is significantly improved.
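
To make the truncated suffix tree idea concrete, here is a minimal Python sketch of a plain, uncompressed q-truncated suffix trie. It is not the TST-index itself: it has no signature encoding, no compression, and no update operations. The names `Node`, `build_truncated_suffix_trie`, and `locate`, and the parameter `q`, are hypothetical and chosen only for illustration. The sketch shows why bounding the depth at q helps with short patterns: any pattern of length at most q is answered by a single walk of at most q steps.

```python
class Node:
    """Trie node: child links plus the text positions whose prefix reaches this node."""
    __slots__ = ("children", "positions")

    def __init__(self):
        self.children = {}
        self.positions = []


def build_truncated_suffix_trie(text, q):
    """Insert the first q characters of every suffix of `text` into a trie.

    Illustrative assumption: positions are stored at every node, which costs
    O(n * q) space. A real compressed index would avoid this.
    """
    root = Node()
    for i in range(len(text)):
        node = root
        for ch in text[i:i + q]:
            node = node.children.setdefault(ch, Node())
            node.positions.append(i)
    return root


def locate(root, pattern, q):
    """Return the start positions of `pattern` in the indexed text, in increasing order.

    Only patterns of length 1..q can be answered from the truncated trie alone.
    """
    if not 1 <= len(pattern) <= q:
        raise ValueError("pattern length must be between 1 and q")
    node = root
    for ch in pattern:
        node = node.children.get(ch)
        if node is None:
            return []
    return list(node.positions)


trie = build_truncated_suffix_trie("abracadabra", q=3)
short_hits = locate(trie, "abr", q=3)
```

In this toy version, patterns longer than q are rejected outright. An index built on this idea would need a second structure to handle them, which is the role the abstract assigns to the compressed dynamic self-index.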
