Learning Efficient Disambiguation

06/02/1999
by Khalil Sima'an, et al.

This dissertation analyses the computational properties of current performance-models of natural language parsing, in particular Data Oriented Parsing (DOP), points out some of their major shortcomings, and suggests suitable solutions. It provides proofs that various problems of probabilistic disambiguation are NP-Complete under instances of these performance-models, and it argues that none of these models accounts for attractive efficiency properties of human language processing in limited domains, e.g. that frequent inputs are usually processed faster than infrequent ones. The central hypothesis of this dissertation is that these shortcomings can be eliminated by specializing the performance-models to the limited domains. The dissertation addresses "grammar and model specialization" and presents a new framework, the Ambiguity-Reduction Specialization (ARS) framework, that formulates the necessary and sufficient conditions for successful specialization. The framework is instantiated into specialization algorithms and applied to specializing DOP. The novelties of these learning algorithms are that they 1) limit the hypothesis space to include only "safe" models, 2) are expressed as constrained optimization formulae that minimize the entropy of the training tree-bank given the specialized grammar, under the constraint that the size of the specialized model does not exceed a predefined maximum, and 3) enable integrating the specialized model with the original one in a complementary manner. The dissertation provides experiments with initial implementations and compares the resulting Specialized DOP (SDOP) models to the original DOP models, with encouraging results.
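
The constrained optimization described in the abstract can be sketched schematically as follows. This is only an illustrative reading of the abstract, not the dissertation's own formulation: the symbols $\mathcal{S}$ (the set of "safe" specialized grammars), $T$ (the training tree-bank), $H(T \mid G)$ (the entropy of the tree-bank given grammar $G$), $|G|$ (a measure of model size) and $N_{\max}$ (the predefined maximum size) are all notation assumed here.

```latex
% Schematic sketch only; all symbols are assumed notation, not the author's.
\[
  G^{*} \;=\; \operatorname*{arg\,min}_{G \,\in\, \mathcal{S}} \; H(T \mid G)
  \qquad \text{subject to} \qquad |G| \,\le\, N_{\max}
\]
```

Under this reading, restricting the search to $\mathcal{S}$ corresponds to novelty 1 in the abstract, and the objective together with the size bound corresponds to novelty 2.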
