Sememe Prediction: Learning Semantic Knowledge from Unstructured Textual Wiki Descriptions

08/16/2018
by Wei Li, et al.

Huge numbers of new words emerge every day, creating a great need to represent them with semantic meaning that NLP systems can understand. Sememes are defined as the minimum semantic units of human languages, the combination of which can represent the meaning of a word. Manual construction of sememe-based knowledge bases is time-consuming and labor-intensive. Fortunately, communities are devoted to composing descriptions of words on wiki websites. In this paper, we explore automatically predicting lexical sememes from the descriptions of words on wiki websites. We view this problem as a weakly ordered multi-label task and propose a Label Distributed seq2seq model (LD-seq2seq) with a novel soft loss function to solve it. In the experiments, we use a real-world sememe knowledge base, HowNet, and the corresponding descriptions of the words in Baidu Wiki for training and evaluation. The results show that our LD-seq2seq model not only beats all the baselines significantly on the test set, but also outperforms amateur human annotators on a random subset of the test set.
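
The abstract does not spell out the soft loss, so the following is only an illustrative sketch of what a soft target could look like for a weakly ordered multi-label task, not the authors' LD-seq2seq formulation. It assumes a hypothetical scheme in which, at each decoding step, a fraction `alpha` of the target probability is spread uniformly over all gold sememes and the rest stays on the sememe at the current position. The function names, the `alpha` parameter, and the toy probabilities are all assumptions made for illustration.

```python
import math


def soft_target(gold, step, vocab_size, alpha=0.5):
    """Build a target distribution for one decoding step (hypothetical scheme).

    A share `alpha` of the mass is spread uniformly over every gold label;
    the remaining `1 - alpha` is placed on the label at position `step`.
    """
    target = [0.0] * vocab_size
    for label in gold:
        target[label] += alpha / len(gold)
    target[gold[step]] += 1.0 - alpha
    return target


def soft_cross_entropy(probs, target, eps=1e-12):
    """Cross-entropy between a predicted distribution and a soft target."""
    return -sum(t * math.log(p + eps) for p, t in zip(probs, target) if t > 0.0)


def sequence_loss(step_probs, gold, alpha=0.5):
    """Average soft cross-entropy over the decoding steps."""
    vocab_size = len(step_probs[0])
    losses = [
        soft_cross_entropy(probs, soft_target(gold, i, vocab_size, alpha))
        for i, probs in enumerate(step_probs)
    ]
    return sum(losses) / len(losses)


# Toy case: gold sememe ids are [2, 0], but the model emits them in swapped order.
gold = [2, 0]
swapped = [
    [0.90, 0.03, 0.04, 0.03],  # step 0: model favors label 0
    [0.03, 0.03, 0.90, 0.04],  # step 1: model favors label 2
]
hard = sequence_loss(swapped, gold, alpha=0.0)  # one-hot targets
soft = sequence_loss(swapped, gold, alpha=0.5)  # label-distributed targets
```

In this toy case the soft variant penalizes the order swap less than the one-hot variant, which is the behavior one would want if the order of the labels is only weakly meaningful.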


