Unsupervised Abbreviation Disambiguation: Contextual disambiguation using word embeddings

04/01/2019
by Ciosici et al.

As abbreviations often have several distinct meanings, disambiguating their intended meaning in context is important for Machine Reading tasks such as document search, recommendation and question answering. Existing approaches mostly rely on labelled examples of abbreviations and their correct long forms; such labelled data is costly to generate and limits the applicability and flexibility of the resulting models. Importantly, existing methods must also be subjected to a full empirical evaluation in order to understand their limitations, which is cumbersome in practice. In this paper, we present an entirely unsupervised abbreviation disambiguation method (called UAD) that picks up abbreviation definitions from text. Creating a distinct token per meaning, we learn context representations as word embeddings. We demonstrate how to further boost abbreviation disambiguation performance by obtaining better context representations from additional unstructured text. Our method is the first abbreviation disambiguation approach that features a transparent model allowing performance analysis without requiring full-scale evaluation, making it highly relevant for real-world deployments. In our thorough empirical evaluation, UAD achieves high performance on large real-world document data sets from different domains and outperforms both baseline and state-of-the-art methods. UAD scales well and supports thousands of abbreviations with many different meanings within a single model.
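The core idea lends itself to a compact illustration. The sketch below is not the authors' released code: it assumes a sense-token naming scheme of the form ABBR__long_form (chosen here for readability), a toy corpus in which abbreviation occurrences with known definitions have already been rewritten into such sense tokens, and gensim's word2vec (skip-gram) as the embedding backend. Disambiguation then reduces to comparing the averaged context vector of an ambiguous occurrence against the vectors of its candidate sense tokens.

# Illustrative sketch of word-embedding-based abbreviation disambiguation.
# Not the authors' implementation; the ABBR__long_form sense-token scheme,
# the toy corpus and the gensim backend are assumptions made for this example.
import numpy as np
from gensim.models import Word2Vec


def cosine(a, b):
    """Cosine similarity between two vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-9))


# Toy corpus: occurrences of "ML" whose definition was found in the text
# (e.g. "ML (machine learning)") have been replaced by sense-specific tokens.
corpus = [
    "we train an ML__machine_learning model for document search".split(),
    "the ML__machine_learning classifier uses word embeddings as features".split(),
    "ML__maximum_likelihood estimation fits the model parameters to the data".split(),
    "we derive the ML__maximum_likelihood estimator for the variance".split(),
]

# Learn one vector per word and per sense token (skip-gram; tiny settings
# because the corpus is tiny -- a real corpus would hold millions of sentences).
model = Word2Vec(corpus, vector_size=50, window=5, min_count=1, sg=1,
                 epochs=200, seed=0)


def disambiguate(context_words, sense_tokens, model):
    """Return the candidate sense whose vector is closest to the
    averaged vector of the surrounding context words."""
    vecs = [model.wv[w] for w in context_words if w in model.wv]
    if not vecs:
        return None
    context_vec = np.mean(vecs, axis=0)
    return max(sense_tokens, key=lambda s: cosine(context_vec, model.wv[s]))


senses = ["ML__machine_learning", "ML__maximum_likelihood"]
context = "the classifier was trained on embeddings for document search".split()
print(disambiguate(context, senses, model))  # likely ML__machine_learning on this toy corpus

Because each meaning owns its own token and vector, the model can be inspected directly, for instance by listing the nearest neighbours of a sense vector, which hints at how a transparent model of this kind can be analysed without a full-scale evaluation.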

Related research

02/02/2019 · A Multi-Resolution Word Embedding for Document Retrieval from Large Unstructured Knowledge Bases
Deep language models learning a hierarchical representation proved to be...

02/01/2019 · A Simple Regularization-based Algorithm for Learning Cross-Domain Word Embeddings
Learning word embeddings has received a significant amount of attention ...

10/16/2019 · A Probabilistic Framework for Learning Domain Specific Hierarchical Word Embeddings
The meaning of a word often varies depending on its usage in different d...

07/02/2020 · Improving Event Detection using Contextual Word and Sentence Embeddings
The task of Event Detection (ED) is a subfield of Information Extraction...

07/19/2016 · An Empirical Evaluation of doc2vec with Practical Insights into Document Embedding Generation
Recently, Le and Mikolov (2014) proposed doc2vec as an extension to word...

04/17/2019 · Contextual Aware Joint Probability Model Towards Question Answering System
In this paper, we address the question answering challenge with the SQuA...

11/16/2022 · Artificial Disfluency Detection, Uh No, Disfluency Generation for the Masses
Existing approaches for disfluency detection typically require the exist...
