
Identifying Hierarchical Structure in Sequences: A linear-time algorithm

by C. G. Nevill-Manning, et al.

SEQUITUR is an algorithm that infers a hierarchical structure from a sequence of discrete symbols by replacing each repeated phrase with a grammatical rule that generates the phrase, and continuing this process recursively. The result is a hierarchical representation of the original sequence, which offers insights into its lexical structure. The algorithm is driven by two constraints that reduce the size of the grammar and produce structure as a by-product. SEQUITUR breaks new ground by operating incrementally. Moreover, the method's simple structure permits a proof that it operates in space and time that are linear in the size of the input. Our implementation can process 50,000 symbols per second and has been applied to an extensive range of real-world sequences.
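
The idea of replacing repeated phrases with rules, recursively, can be illustrated with a small sketch. This is not the SEQUITUR algorithm itself: it is an offline, greedy digram-replacement loop (closer in spirit to Re-Pair) that runs in worse than linear time and does not enforce SEQUITUR's rule-utility constraint. The function names `infer_grammar`, `replace_digram`, and `expand`, and the rule names `R1`, `R2`, and so on, are hypothetical and chosen for illustration only. The sketch assumes the input symbols are single characters, so they cannot collide with rule names.

```python
from collections import Counter


def replace_digram(symbols, digram, name):
    """Replace non-overlapping occurrences of `digram` (left to right) with `name`.

    Returns the new symbol list and the number of replacements made.
    """
    out, i, hits = [], 0, 0
    while i < len(symbols):
        if tuple(symbols[i:i + 2]) == digram:
            out.append(name)
            i += 2
            hits += 1
        else:
            out.append(symbols[i])
            i += 1
    return out, hits


def infer_grammar(sequence):
    """Greedy offline sketch (assumption: NOT the incremental SEQUITUR method).

    Repeatedly picks a digram that occurs at least twice without overlap,
    creates a rule for it, and rewrites the sequence in terms of that rule.
    Returns (top-level symbol list, {rule name: (symbol, symbol)}).
    """
    body = list(sequence)
    rules = {}
    while True:
        counts = Counter(zip(body, body[1:]))
        for digram, _ in counts.most_common():
            name = f"R{len(rules) + 1}"  # hypothetical naming scheme
            new_body, hits = replace_digram(body, digram, name)
            if hits >= 2:  # only keep rules that are used at least twice
                rules[name] = digram
                body = new_body
                break
        else:
            # No digram repeats any more: the grammar is final.
            return body, rules


def expand(symbols, rules):
    """Regenerate the original sequence by recursively expanding rules."""
    out = []
    for s in symbols:
        if s in rules:
            out.extend(expand(rules[s], rules))
        else:
            out.append(s)
    return out


if __name__ == "__main__":
    body, rules = infer_grammar("abcdbcabcdbc")
    print("S ->", " ".join(body))
    for name, rhs in rules.items():
        print(name, "->", " ".join(rhs))
    assert "".join(expand(body, rules)) == "abcdbcabcdbc"
```

Because each rule's right-hand side may itself contain rule names, the resulting grammar is hierarchical, and expanding the top-level sequence reproduces the input exactly. The actual algorithm reaches a comparable structure incrementally, one symbol at a time, while maintaining its two constraints (digram uniqueness and rule utility).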




Extending de Bruijn sequences to larger alphabets

A circular de Bruijn sequence of order n in an alphabet of k symbols is ...

Efficient Textual Representation of Structure

This paper attempts a more formal approach to the legibility of text bas...

Emergence and Evolution of Hierarchical Structure in Complex Systems

It is well known that many complex systems, both in technology and natur...

Hierarchical Phrase-based Sequence-to-Sequence Learning

We describe a neural transducer that maintains the flexibility of standa...

Linear-size Suffix Tries for Parameterized Strings

In this paper, we propose a new indexing structure for parameterized str...

An Information-Theoretical Analysis of the Minimum Cost to Erase Information

We normally hold a lot of confidential information in hard disk drives a...

Code Repositories


GrammarViz 2.0 public release:
