Unsupervised Disambiguation of Syncretism in Inflected Lexicons

06/10/2018
by Ryan Cotterell, et al.

Lexical ambiguity makes it difficult to compute various useful statistics of a corpus. A given word form might represent any of several morphological feature bundles. One can, however, use unsupervised learning (as in EM) to fit a model that probabilistically disambiguates word forms. We present such an approach, which employs a neural network to smoothly model a prior distribution over feature bundles (even rare ones). Although this basic model does not consider a token's context, that very property allows it to operate on a simple list of unigram type counts, partitioning each count among different analyses of that unigram. We discuss evaluation metrics for this novel task and report results on 5 languages.
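To make the approach concrete, here is a minimal sketch of the count-partitioning EM described above. It is illustrative only: the lexicon, counts, and feature bundles are hypothetical, and the prior is a plain categorical distribution re-estimated by EM, whereas the paper parameterizes and smooths the prior with a neural network so that even rare feature bundles receive sensible probability.

```python
# Minimal EM sketch: partition each unigram type count among that
# type's candidate morphological analyses. Toy data throughout; the
# categorical prior stands in for the paper's neural-network prior.

from collections import defaultdict

# Hypothetical lexicon: surface form -> candidate feature bundles.
# Syncretic forms map to more than one bundle.
lexicon = {
    "Haus":   [("NOM", "SG"), ("ACC", "SG"), ("DAT", "SG")],
    "Hauses": [("GEN", "SG")],
    "Frau":   [("NOM", "SG"), ("ACC", "SG"), ("DAT", "SG"), ("GEN", "SG")],
    "Frauen": [("NOM", "PL"), ("ACC", "PL"), ("GEN", "PL"), ("DAT", "PL")],
}

# Hypothetical unigram type counts from a corpus.
counts = {"Haus": 120, "Hauses": 40, "Frau": 200, "Frauen": 75}

# Start from a uniform prior over every bundle seen in the lexicon.
bundles = sorted({b for analyses in lexicon.values() for b in analyses})
prior = {b: 1.0 / len(bundles) for b in bundles}

for _ in range(50):  # EM iterations
    expected = defaultdict(float)
    # E-step: split each word's count among its candidate analyses
    # in proportion to the current prior.
    for word, c in counts.items():
        cands = lexicon[word]
        z = sum(prior[b] for b in cands)
        for b in cands:
            expected[b] += c * prior[b] / z
    # M-step: re-estimate the prior from the fractional counts.
    total = sum(expected.values())
    prior = {b: expected[b] / total for b in bundles}

# The fitted prior probabilistically disambiguates each form,
# including syncretic ones.
for word, cands in lexicon.items():
    z = sum(prior[b] for b in cands)
    print(word, {b: round(prior[b] / z, 3) for b in cands})
```

Each iteration splits a word's count among its candidate analyses in proportion to the current prior (E-step), then re-estimates the prior from those fractional counts (M-step); the fitted prior then yields a posterior over analyses for every word form without looking at any token's context.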

