Identifying Hierarchical Structure in Sequences: A linear-time algorithm

09/01/1997
by C. G. Nevill-Manning, et al.

SEQUITUR is an algorithm that infers a hierarchical structure from a sequence of discrete symbols by replacing each repeated phrase with a grammatical rule that generates the phrase, and continuing this process recursively. The result is a hierarchical representation of the original sequence, which offers insights into its lexical structure. The algorithm is driven by two constraints that reduce the size of the grammar and produce structure as a by-product. SEQUITUR breaks new ground by operating incrementally. Moreover, the method's simple structure permits a proof that it operates in space and time that are linear in the size of the input. Our implementation can process 50,000 symbols per second and has been applied to an extensive range of real-world sequences.
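
To make the idea of replacing repeated phrases with rules concrete, the following is a minimal sketch in Python. The abstract does not name the two constraints; the full paper calls them digram uniqueness (no pair of adjacent symbols appears more than once in the grammar) and rule utility (every rule is used more than once). The sketch enforces both, but it is a simplified offline version that rescans the whole grammar after each change, so it is neither incremental nor linear-time and should not be read as the authors' implementation. The function names (infer_grammar, find_repeated_digram, replace_digram, expand) and the convention that terminals are single characters and rule names are integers are assumptions made for illustration.

```python
from collections import Counter


def find_repeated_digram(rules):
    """Return a digram that occurs twice without overlapping, or None."""
    seen = {}  # digram -> (rule id, index) of its first occurrence
    for rid, body in rules.items():
        for i in range(len(body) - 1):
            digram = (body[i], body[i + 1])
            if digram not in seen:
                seen[digram] = (rid, i)
                continue
            prev_rid, prev_i = seen[digram]
            # "aaa" holds (a, a) twice, but the two copies overlap.
            if prev_rid != rid or i - prev_i >= 2:
                return digram
    return None


def replace_digram(body, digram, nonterminal):
    """Replace non-overlapping occurrences of digram, left to right."""
    out, i = [], 0
    while i < len(body):
        if i + 1 < len(body) and (body[i], body[i + 1]) == digram:
            out.append(nonterminal)
            i += 2
        else:
            out.append(body[i])
            i += 1
    return out


def infer_grammar(text):
    """Build a grammar for a string; rule 0 is the start rule.

    Assumption: terminals are one-character strings, rule names are ints.
    """
    rules = {0: list(text)}
    next_id = 1
    while True:
        digram = find_repeated_digram(rules)
        if digram is not None:
            # Digram uniqueness: reuse a rule whose body is exactly this
            # digram, otherwise create a new rule for it.
            target = next((r for r, b in rules.items()
                           if r != 0 and tuple(b) == digram), None)
            if target is None:
                target, next_id = next_id, next_id + 1
                rules[target] = list(digram)
            for rid in list(rules):
                if rid != target:
                    rules[rid] = replace_digram(rules[rid], digram, target)
            continue
        # Rule utility: inline any rule that is referenced only once.
        uses = Counter(s for b in rules.values() for s in b
                       if isinstance(s, int))
        once = next((r for r in rules if r != 0 and uses[r] == 1), None)
        if once is None:
            return rules
        inlined = rules.pop(once)
        for rid, body in rules.items():
            if once in body:
                k = body.index(once)
                rules[rid] = body[:k] + inlined + body[k + 1:]
                break


def expand(rules, symbol=0):
    """Expand a symbol back into the string it generates."""
    if isinstance(symbol, str):
        return symbol
    return "".join(expand(rules, s) for s in rules[symbol])
```

For example, calling infer_grammar("abcabcabcabc") yields a small set of nested rules, and expand applied to the result reproduces the input string, which is a convenient check that the hierarchy is lossless. Rescanning for digrams after every change keeps the sketch short; the linear bounds claimed in the abstract depend on bookkeeping that this sketch deliberately omits.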
