280 Birds with One Stone: Inducing Multilingual Taxonomies from Wikipedia using Character-level Classification

04/25/2017
by Amit Gupta, et al.

We propose a simple yet effective approach to inducing multilingual taxonomies from Wikipedia. Given an English taxonomy, our approach first leverages Wikipedia's interlanguage links and then applies character-level classifiers to induce high-precision, high-coverage taxonomies in other languages. Through experiments, we demonstrate that our approach significantly outperforms the state-of-the-art, heuristics-heavy approaches for six languages. As a consequence of our work, we release what is presumably the largest and most accurate multilingual taxonomic resource, spanning over 280 languages.
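
The two ingredients named in the abstract, projection through interlanguage links and character-level features, can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `project_edges` and `char_ngrams`, the toy English edges, and the toy English-to-French link table are all hypothetical. The abstract does not specify the classifier or its features, so classifier training is omitted here.

```python
from collections import Counter


def project_edges(english_edges, interlanguage_links):
    """Map (child, parent) English edges onto target-language titles.

    Edges with an endpoint that has no interlanguage link are skipped.
    """
    projected = []
    for child, parent in english_edges:
        if child in interlanguage_links and parent in interlanguage_links:
            projected.append(
                (interlanguage_links[child], interlanguage_links[parent])
            )
    return projected


def char_ngrams(text, n=3):
    """Count character n-grams of a lowercased title, with boundary markers."""
    padded = f"^{text.lower()}$"
    return Counter(padded[i:i + n] for i in range(len(padded) - n + 1))


# Toy data (hypothetical): English is-a edges and English-to-French links.
english_edges = [
    ("Paris", "Capitals in Europe"),
    ("Berlin", "Capitals in Europe"),
    ("Oslo", "Capitals in Europe"),  # no link below, so it is dropped
]
links_en_fr = {
    "Paris": "Paris",
    "Berlin": "Berlin",
    "Capitals in Europe": "Capitale d'Europe",
}

projected = project_edges(english_edges, links_en_fr)
features = char_ngrams("Ville")
```

In a fuller pipeline, the projected edges would serve as noisy training data, and n-gram counts like `features` would feed a classifier that decides whether a candidate target-language edge is a valid is-a relation. That step is an assumption about how the pieces fit together, not a detail confirmed by the abstract.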


