Multi-class Multilingual Classification of Wikipedia Articles Using Extended Named Entity Tag Set

09/14/2019
by Hassan S. Shavarani, et al.

Wikipedia is a great source of general world knowledge which can guide NLP models to better understand their motivation to make predictions. We aim to create a large set of structured knowledge, usable for NLP models, from Wikipedia. The first step we take to create such a structured knowledge source is fine-grained classification of Wikipedia articles. In this work, we introduce the Shinara Dataset, a large multi-lingual and multi-labeled set of manually annotated Wikipedia articles in Japanese, English, French, German, and Farsi using the Extended Named Entity (ENE) tag set. We evaluate the dataset using the best models provided for ENE label set classification and show that the currently available classification models struggle with large datasets using fine-grained tag sets.
