Subpath Queries on Compressed Graphs: a Survey

11/19/2020
by Nicola Prezza, et al.

Text indexing is a classical algorithmic problem that has been studied for over four decades: given a text T, pre-process it off-line so that, later, we can quickly count and locate the occurrences of any string (the query pattern) in T in time proportional to the query's length. The earliest optimal-time solution to the problem, the suffix tree, dates back to 1973 and requires up to two orders of magnitude more space than the plain text just to be stored. In the year 2000, two breakthrough works showed that efficient queries can be achieved without this space overhead: a fast index can be stored in space proportional to the text's entropy. These contributions had an enormous impact in bioinformatics: nowadays, virtually any DNA aligner employs compressed indexes. More recent work has considered more powerful compression schemes (dictionary compressors) and generalizations of the problem to labeled graphs: after all, texts can be viewed as labeled directed paths. In turn, since finite state automata can be considered a particular case of labeled graphs, these findings created a bridge between the fields of compressed indexing and regular language theory, ultimately making it possible to index regular languages and promising to shed new light on problems such as regular expression matching. This survey is a gentle introduction to the main landmarks of the fascinating journey that took us from suffix trees to today's compressed indexes for labeled graphs and regular languages.

research
04/02/2020

On Locating Paths in Compressed Cardinal Trees

A compressed index is a data structure representing a text within compre...
research
03/26/2018

Universal Compressed Text Indexing

The rise of repetitive datasets has lately generated a lot of interest i...
research
08/07/2023

Collapsing the Hierarchy of Compressed Data Structures: Suffix Arrays in Optimal Compressed Space

In the last decades, the necessity to process massive amounts of textual...
research
09/26/2019

String Indexing with Compressed Patterns

Given a string S of length n, the classic string indexing problem is to ...
research
09/08/2018

Fully-Functional Suffix Trees and Optimal Text Searching in BWT-runs Bounded Space

Indexing highly repetitive texts --- such as genomic databases, software...
research
04/03/2023

Compressed Indexing for Consecutive Occurrences

The fundamental question considered in algorithms on strings is that of ...
research
03/09/2023

Elastic Founder Graphs Improved and Enhanced

Indexing labeled graphs for pattern matching is a central challenge of p...
