SLHCat: Mapping Wikipedia Categories and Lists to DBpedia by Leveraging Semantic, Lexical, and Hierarchical Features

09/21/2023
by Zhaoyi Wang, et al.

Wikipedia articles are hierarchically organized through categories and lists, providing one of the most comprehensive and universal taxonomies, but its open creation causes redundancies and inconsistencies. Assigning DBpedia classes to Wikipedia categories and lists can alleviate the problem, yielding a large knowledge graph that is essential for categorizing digital content through entity linking and typing. However, the existing approach of CaLiGraph produces incomplete and insufficiently fine-grained mappings. In this paper, we tackle the problem as ontology alignment, where structural information of knowledge graphs and lexical and semantic features of ontology class names are utilized to discover confident mappings, which are in turn used for fine-tuning pre-trained language models in a distant supervision fashion. Our method SLHCat consists of two main parts: 1) automatically generating training data by leveraging knowledge graph structure, semantic similarities, and named entity typing; 2) fine-tuning and prompt-tuning the pre-trained language model BERT over the training data to capture semantic and syntactic properties of class names. SLHCat is evaluated on a benchmark dataset constructed by annotating 3000 fine-grained CaLiGraph-DBpedia mapping pairs. SLHCat outperforms the baseline model by a large margin of 25% in accuracy, offering a practical solution for large-scale ontology mapping.
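
As a rough illustration of the first stage only (discovering confident mappings to use as distant-supervision training data), the sketch below combines a lexical feature with a hierarchical one. It is not the authors' implementation: the function names, the equal weighting, the 0.4 threshold, the naive plural stripping, and the toy inputs are all assumptions made for this example. The semantic-similarity and named-entity-typing components mentioned above are omitted.

```python
from collections import Counter


def tokens(name):
    """Lowercase a name and split it into word tokens, crudely stripping a plural 's'."""
    return {w.rstrip("s") for w in name.lower().replace("_", " ").split()}


def lexical_score(category, dbpedia_class):
    """Jaccard overlap between the token sets of the two names (toy lexical feature)."""
    a, b = tokens(category), tokens(dbpedia_class)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def hierarchical_score(member_types, dbpedia_class):
    """Fraction of the category's member entities already typed with the class."""
    if not member_types:
        return 0.0
    return Counter(member_types)[dbpedia_class] / len(member_types)


def confident_mapping(category, member_types, candidates, threshold=0.4):
    """Return the best-scoring candidate class, or None if nothing reaches the threshold.

    The 50/50 weighting and the threshold are hypothetical choices for this sketch.
    """
    if not candidates:
        return None
    scored = [
        (0.5 * lexical_score(category, c) + 0.5 * hierarchical_score(member_types, c), c)
        for c in candidates
    ]
    best_score, best_class = max(scored)
    return best_class if best_score >= threshold else None


# Toy usage: a category whose members are mostly typed as "Lake".
mapping = confident_mapping(
    "Lakes of Japan",
    member_types=["Lake", "Lake", "Place"],
    candidates=["Lake", "Place", "Person"],
)
```

In a pipeline like the one the abstract outlines, pairs accepted by such a filter would serve as training examples for the language model, and rejected pairs would be left for the model to predict.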
