Efficient Semiring-Weighted Earley Parsing

07/06/2023
by Andreas Opedal, et al.

This paper provides a reference description, in the form of a deduction system, of Earley's (1970) context-free parsing algorithm with various speed-ups. Our presentation includes a known worst-case runtime improvement from Earley's O(N^3|G||R|), which is unworkable for the large grammars that arise in natural language processing, to O(N^3|G|), which matches the runtime of CKY on a binarized version of the grammar G. Here N is the length of the sentence, |R| is the number of productions in G, and |G| is the total length of those productions. We also provide a version that achieves runtime of O(N^3|M|) with |M| ≤ |G| when the grammar is represented compactly as a single finite-state automaton M (this is partly novel). We carefully treat the generalization to semiring-weighted deduction, preprocessing the grammar like Stolcke (1995) to eliminate deduction cycles, and further generalize Stolcke's method to compute the weights of sentence prefixes. We also provide implementation details for efficient execution, ensuring that on a preprocessed grammar, the semiring-weighted versions of our methods have the same asymptotic runtime and space requirements as the unweighted methods, including sub-cubic runtime on some grammars.
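For orientation, the baseline that the abstract starts from can be sketched as a plain, unweighted Earley recognizer with the three textbook rules: predict, scan, and complete. This is only an illustrative sketch and not the paper's deduction system. It includes none of the speed-ups, semiring weights, or finite-state grammar representation described above, and it assumes a grammar without empty right-hand sides. The names `Item` and `earley_recognize` and the dictionary encoding of the grammar are hypothetical choices made for this example.

```python
from collections import namedtuple

# An Earley item: production lhs -> rhs with the dot before rhs[dot],
# whose recognition started at input position `start`.
Item = namedtuple("Item", ["lhs", "rhs", "dot", "start"])


def earley_recognize(grammar, start_symbol, tokens):
    """Return True if `tokens` can be derived from `start_symbol`.

    `grammar` maps each nonterminal to a list of right-hand sides (tuples).
    Any symbol that is not a key of `grammar` is treated as a terminal.
    Assumption: no production has an empty right-hand side.
    """
    n = len(tokens)
    chart = [set() for _ in range(n + 1)]
    for rhs in grammar[start_symbol]:
        chart[0].add(Item(start_symbol, rhs, 0, 0))

    for k in range(n + 1):
        agenda = list(chart[k])
        while agenda:
            item = agenda.pop()
            if item.dot < len(item.rhs):
                sym = item.rhs[item.dot]
                if sym in grammar:
                    # PREDICT: start every production of the expected nonterminal.
                    for rhs in grammar[sym]:
                        new = Item(sym, rhs, 0, k)
                        if new not in chart[k]:
                            chart[k].add(new)
                            agenda.append(new)
                elif k < n and tokens[k] == sym:
                    # SCAN: the expected terminal matches the next token.
                    chart[k + 1].add(item._replace(dot=item.dot + 1))
            else:
                # COMPLETE: advance every item that was waiting for item.lhs
                # at the position where this item started.
                for parent in list(chart[item.start]):
                    if (parent.dot < len(parent.rhs)
                            and parent.rhs[parent.dot] == item.lhs):
                        new = parent._replace(dot=parent.dot + 1)
                        if new not in chart[k]:
                            chart[k].add(new)
                            agenda.append(new)

    return any(
        it.lhs == start_symbol and it.dot == len(it.rhs) and it.start == 0
        for it in chart[n]
    )


# A small left-recursive toy grammar (hypothetical example data).
TOY_GRAMMAR = {
    "E": [("E", "+", "T"), ("T",)],
    "T": [("a",)],
}
```

Calling `earley_recognize(TOY_GRAMMAR, "E", "a + a".split())` accepts the input, while the incomplete string `a +` is rejected. The complete step here scans every item in an earlier chart column; avoiding that kind of scan is one place where an indexed implementation would differ from this sketch.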
