Classifying Wikipedia in a fine-grained hierarchy: what graphs can contribute

01/21/2020
by Tiphaine Viard, et al.

Wikipedia is a major opportunity for machine learning, being the largest semi-structured knowledge base available. Consequently, many works examine its contents and focus on structuring it to make it usable in learning tasks, for example by classifying it into an ontology. Beyond its textual content, Wikipedia also has a typical graph structure, in which pages are linked together through citations. In this paper, we address the task of integrating graph (i.e., structural) information to classify Wikipedia into a fine-grained named entity (NE) ontology, the Extended Named Entity hierarchy. We first assess the relevance of the graph structure for NE classification. We then explore two directions: one builds feature vectors from graph descriptors commonly used in large-scale network analysis, and the other extends flat classification to a weighted model that takes semantic similarity into account. We conduct at-scale practical experiments on a manually labeled subset of 22,000 pages extracted from the Japanese Wikipedia. Our results show that integrating graph information reduces the sparsity of the input feature space and yields classification results that are comparable to or better than those of previous works.
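To make the first direction concrete, the following is a minimal sketch of how per-page graph descriptors could be turned into feature vectors. It is not the authors' pipeline: the abstract does not say which descriptors are used, so in-degree, out-degree, and the local clustering coefficient are assumed here as typical examples from network analysis. The function name `graph_descriptors` and the toy link data are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def graph_descriptors(links):
    """Return {page: [in_degree, out_degree, clustering]} for a link graph.

    `links` maps each page to the pages it links to. The three descriptors
    are an assumed, illustrative choice, not the set used in the paper.
    """
    out_nbrs = defaultdict(set)
    in_nbrs = defaultdict(set)
    for src, targets in links.items():
        out_nbrs[src]  # register pages that have no outgoing links
        for dst in targets:
            if dst == src:
                continue  # ignore self-links
            out_nbrs[src].add(dst)
            in_nbrs[dst].add(src)

    nodes = set(out_nbrs) | set(in_nbrs)
    # The clustering coefficient is computed on the undirected version
    # of the graph (link direction is dropped).
    undirected = {n: out_nbrs[n] | in_nbrs[n] for n in nodes}

    features = {}
    for n in nodes:
        nbrs = undirected[n]
        k = len(nbrs)
        # Count pairs of neighbours that are themselves linked.
        closed = sum(1 for a, b in combinations(nbrs, 2) if b in undirected[a])
        clustering = 2 * closed / (k * (k - 1)) if k > 1 else 0.0
        features[n] = [len(in_nbrs[n]), len(out_nbrs[n]), clustering]
    return features


# Hypothetical toy link graph, for illustration only.
links = {
    "Tokyo": ["Japan", "Kyoto"],
    "Kyoto": ["Japan", "Tokyo"],
    "Japan": ["Tokyo"],
    "Mount Fuji": ["Japan"],
}
features = graph_descriptors(links)
```

Vectors like these are dense and low-dimensional, so they could be concatenated with sparse text features before classification. The second direction, the weighted model based on semantic similarity, is not sketched here because the abstract gives no details about it.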


research
09/14/2019

Multi-class Multilingual Classification of Wikipedia Articles Using Extended Named Entity Tag Set

Wikipedia is a great source of general world knowledge which can guide N...
research
09/21/2023

SLHCat: Mapping Wikipedia Categories and Lists to DBpedia by Leveraging Semantic, Lexical, and Hierarchical Features

Wikipedia articles are hierarchically organized through categories and l...
research
05/12/2021

Priberam Labs at the NTCIR-15 SHINRA2020-ML: Classification Task

Wikipedia is an online encyclopedia available in 285 languages. It compo...
research
11/23/2018

Fine Grained Classification of Personal Data Entities

Entity Type Classification can be defined as the task of assigning categ...
research
02/12/2021

Bootstrapping Large-Scale Fine-Grained Contextual Advertising Classifier from Wikipedia

Contextual advertising provides advertisers with the opportunity to targ...
research
07/13/2018

Hierarchical Losses and New Resources for Fine-grained Entity Typing and Linking

Extraction from raw text to a knowledge base of entities and fine-graine...
research
09/19/2018

Learning to Interpret Satellite Images Using Wikipedia

Despite recent progress in computer vision, fine-grained interpretation ...
