Using the Web as an Implicit Training Set: Application to Noun Compound Syntax and Semantics

11/23/2019
by Preslav Nakov, et al.

An important characteristic of written English text is the abundance of noun compounds: sequences of nouns acting as a single noun, e.g., colon cancer tumor suppressor protein. Domain experts eventually master their interpretation, but it poses a major challenge for automated analysis. Understanding the syntax and semantics of noun compounds is important for many natural language applications, including question answering, machine translation, information retrieval, and information extraction. I address the problem of noun compound syntax by means of novel, highly accurate unsupervised and lightly supervised algorithms that use the Web as a corpus and search engines as interfaces to that corpus. Traditionally, the Web has been viewed as a source of page hit counts, which serve as estimates of n-gram word frequencies. I extend this approach by introducing novel surface features and paraphrases, which yield state-of-the-art results for the task of noun compound bracketing. I also show how these kinds of features can be applied to other structural ambiguity problems, such as prepositional phrase attachment and noun phrase coordination. I address noun compound semantics by automatically generating paraphrasing verbs and prepositions that make explicit the hidden semantic relations between the nouns in a noun compound. I also demonstrate how these paraphrasing verbs can be used to solve various relational similarity problems, and how paraphrasing noun compounds can improve machine translation.
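Noun compound bracketing decides whether a three-noun compound such as tumor suppressor protein is left-bracketed, [[tumor suppressor] protein], or right-bracketed, [tumor [suppressor protein]]. The following is a minimal sketch of the general idea of treating hit counts as n-gram frequency estimates, using the classic adjacency and dependency models with a conditional-probability association score. It is not the thesis's exact method, which adds surface features and paraphrases on top of such counts. The `hits` function is a hypothetical stand-in for search-engine page hit counts, the helper names are illustrative, and the counts in `toy_counts` are invented toy numbers rather than real Web data.

```python
from typing import Callable

# Hypothetical stand-in for a search engine: maps a query string to a page hit count.
HitCounter = Callable[[str], int]


def cond_prob(hits: HitCounter, w_mod: str, w_head: str) -> float:
    """Estimate Pr(w_mod -> w_head | w_head) as #("w_mod w_head") / #("w_head")."""
    head_count = hits(w_head)
    if head_count == 0:
        return 0.0  # no evidence for the head word at all
    return hits(f"{w_mod} {w_head}") / head_count


def bracket(w1: str, w2: str, w3: str, hits: HitCounter, model: str = "dependency") -> str:
    """Return 'left' for [[w1 w2] w3] or 'right' for [w1 [w2 w3]]."""
    # Both models score the left bracketing by how strongly w1 attaches to w2.
    left = cond_prob(hits, w1, w2)
    if model == "adjacency":
        # Adjacency model: compare (w1, w2) against (w2, w3).
        right = cond_prob(hits, w2, w3)
    elif model == "dependency":
        # Dependency model: compare (w1, w2) against (w1, w3).
        right = cond_prob(hits, w1, w3)
    else:
        raise ValueError(f"unknown model: {model!r}")
    # Ties default to left bracketing (an assumption made for this sketch).
    return "left" if left >= right else "right"


# Invented toy counts, for illustration only.
toy_counts = {
    "suppressor": 1000,
    "protein": 10000,
    "tumor suppressor": 500,
    "tumor protein": 200,
    "suppressor protein": 300,
}


def toy_hits(query: str) -> int:
    return toy_counts.get(query, 0)


print(bracket("tumor", "suppressor", "protein", toy_hits))               # -> left
print(bracket("tumor", "suppressor", "protein", toy_hits, "adjacency"))  # -> left
```

With these toy counts, the dependency model compares 500/1000 = 0.5 against 200/10000 = 0.02, and the adjacency model compares 0.5 against 300/10000 = 0.03, so both choose the left bracketing. Swapping in a different association score (for example, raw frequency or pointwise mutual information) only requires replacing `cond_prob`.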


research
04/09/2021

Design and Implementation of English To Yoruba Verb Phrase Machine Translation System

We aim to develop an English to Yoruba machine translation system which ...
research
09/17/2023

Syntax Tree Constrained Graph Network for Visual Question Answering

Visual Question Answering (VQA) aims to automatically answer natural lan...
research
03/19/2015

Phrase database Approach to structural and semantic disambiguation in English-Korean Machine Translation

In machine translation it is common phenomenon that machine-readable dic...
research
11/26/2017

Machine Translation Using Semantic Web Technologies: A Survey

A large number of machine translation approaches has been developed rece...
research
03/19/2022

Clickbait Spoiling via Question Answering and Passage Retrieval

We introduce and study the task of clickbait spoiling: generating a shor...
research
01/09/2022

An Ensemble Approach to Acronym Extraction using Transformers

Acronyms are abbreviated units of a phrase constructed by using initial ...
research
11/26/2021

An Optimal Algorithm for Finding Champions in Tournament Graphs

A tournament graph T = (V, E ) is an oriented complete graph, which can ...
