Nominal Compound Chain Extraction: A New Task for Semantic-enriched Lexical Chain

09/19/2020
by Bobo Li, et al.

A lexical chain consists of the cohesive words in a document; it reflects the underlying structure of the text and thus facilitates downstream NLP tasks. Nevertheless, existing work focuses on detecting simple surface lexicons with shallow syntactic associations, ignoring semantic-aware lexical compounds as well as latent semantic frames (e.g., topic), which can be far more crucial for real-world NLP applications. In this paper, we introduce a novel task, Nominal Compound Chain Extraction (NCCE), which extracts and clusters all the nominal compounds that share identical semantic topics. We model the task as a two-stage prediction (i.e., compound extraction and chain detection), handled by a proposed joint framework. The model employs a BERT encoder to yield contextualized document representations, and HowNet is exploited as an external resource that offers rich sememe information. The experiments are based on our manually annotated corpus, and the results demonstrate the necessity of the NCCE task as well as the effectiveness of our joint approach.
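
To make the two-stage formulation concrete, the following is a minimal toy sketch in Python. It is not the joint BERT-based framework described in the abstract: stage 1 is approximated by collecting runs of consecutive nouns, and stage 2 by linking compounds that share at least one sememe and taking connected components. The function names, the POS tags, and the hand-written sememe sets are all hypothetical and are not drawn from HowNet or from the annotated corpus.

```python
from itertools import combinations


def extract_compounds(tagged_tokens):
    """Stage 1 (toy): return maximal runs of two or more consecutive nouns."""
    compounds, run = [], []
    # A sentinel token flushes a run that ends at the last position.
    for word, tag in list(tagged_tokens) + [("", "END")]:
        if tag == "NOUN":
            run.append(word)
        else:
            if len(run) >= 2:
                compounds.append(" ".join(run))
            run = []
    return compounds


def detect_chains(compounds, sememes, min_shared=1):
    """Stage 2 (toy): link compounds that share sememes, then group them."""
    parent = {c: c for c in compounds}

    def find(x):
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in combinations(compounds, 2):
        shared = sememes.get(a, set()) & sememes.get(b, set())
        if len(shared) >= min_shared:
            parent[find(a)] = find(b)

    chains = {}
    for c in compounds:
        chains.setdefault(find(c), []).append(c)
    return list(chains.values())


# Hypothetical input: a POS-tagged sentence and made-up sememe sets.
tagged = [
    ("The", "DET"), ("stock", "NOUN"), ("market", "NOUN"), ("fell", "VERB"),
    ("after", "ADP"), ("the", "DET"), ("interest", "NOUN"), ("rate", "NOUN"),
    ("decision", "NOUN"), ("while", "SCONJ"), ("the", "DET"),
    ("football", "NOUN"), ("league", "NOUN"), ("resumed", "VERB"),
]
toy_sememes = {
    "stock market": {"finance", "institution"},
    "interest rate decision": {"finance", "decide"},
    "football league": {"sport", "organization"},
}

compounds = extract_compounds(tagged)
chains = detect_chains(compounds, toy_sememes)
print(compounds)
print(chains)
```

In this toy input, the two finance-related compounds end up in one chain and the sports compound in another. A learned model would replace both the noun-run heuristic and the sememe-overlap rule with predictions over contextualized representations.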

research
07/22/2020

SemEval-2020 Task 1: Unsupervised Lexical Semantic Change Detection

Lexical Semantic Change detection, i.e., the task of identifying words t...
research
04/06/2020

An Annotated Corpus of Emerging Anglicisms in Spanish Newspaper Headlines

The extraction of anglicisms (lexical borrowings from English) is releva...
research
01/22/2021

Enhanced word embeddings using multi-semantic representation through lexical chains

The relationship between words in a sentence often tells us more about t...
research
06/17/2018

Incorporating Chinese Characters of Words for Lexical Sememe Prediction

Sememes are minimum semantic units of concepts in human languages, such ...
research
10/18/2017

Towards a Seamless Integration of Word Senses into Downstream NLP Applications

Lexical ambiguity can impede NLP systems from accurate understanding of ...
research
07/29/2021

WiC = TSV = WSD: On the Equivalence of Three Semantic Tasks

The WiC task has attracted considerable attention in the NLP community, ...
research
03/16/2017

Improving Document Clustering by Eliminating Unnatural Language

Technical documents contain a fair amount of unnatural language, such as...
