Towards Building a Multilingual Sememe Knowledge Base: Predicting Sememes for BabelNet Synsets

12/04/2019
by Fanchao Qi, et al.

A sememe is defined as the minimum semantic unit of human languages. Sememe knowledge bases (KBs), which contain words annotated with sememes, have been successfully applied to many NLP tasks. However, existing sememe KBs cover only a few languages, which hinders their widespread use. To address this issue, we propose building a unified sememe KB for multiple languages based on BabelNet, a multilingual encyclopedic dictionary. We first build a dataset that serves as the seed of the multilingual sememe KB; it contains manually annotated sememes for over 15 thousand synsets (the entries of BabelNet). We then present a novel task, automatic sememe prediction for synsets, which aims to expand the seed dataset into a usable KB. We also propose two simple and effective models, each exploiting a different kind of information about synsets. Finally, we conduct quantitative and qualitative analyses to explore the important factors and difficulties in the task. All the source code and data of this work are available at https://github.com/thunlp/BabelNet-Sememe-Prediction.
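To make the task concrete, sememe prediction for a synset can be viewed as ranking candidate sememes for an unannotated synset using the synsets that already carry annotations. The sketch below is only an illustration of that framing and is not either of the two models proposed in the paper: the function names, the two-dimensional vectors, and the synset labels are all hypothetical, and the similarity-weighted voting scheme is an assumed stand-in.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def predict_sememes(target_vec, annotated, top_k=4):
    """Rank sememes for an unannotated synset (illustrative only).

    Each annotated synset votes for its own sememes, weighted by its
    similarity to the target synset; the top_k highest-scoring sememes
    are returned.
    """
    scores = {}
    for vec, sememes in annotated.values():
        sim = cosine(target_vec, vec)
        for sememe in sememes:
            scores[sememe] = scores.get(sememe, 0.0) + sim
    # Sort by descending score; break ties alphabetically for determinism.
    ranked = sorted(scores, key=lambda s: (-scores[s], s))
    return ranked[:top_k]

# Toy seed data: hypothetical synset vectors and sememe annotations.
annotated = {
    "synset_husband": ([1.0, 0.0], {"human", "family", "male", "spouse"}),
    "synset_carpet": ([0.0, 1.0], {"tool", "cover"}),
}

# A new synset whose (made-up) vector lies close to "synset_husband".
predicted = predict_sememes([0.9, 0.1], annotated, top_k=4)
print(predicted)
```

In this toy setup the unannotated synset inherits the sememes of its nearest annotated neighbour. A real system would need learned synset representations and a tuned score threshold or cut-off instead of a fixed `top_k`.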


