Japanese-Spanish Thesaurus Construction Using English as a Pivot

03/06/2013
by Jessica Ramírez, et al.

We present research aimed at automatically creating a multilingual thesaurus from the freely available resources Wikipedia and WordNet. Our goal is to increase the resources available for natural language processing tasks, such as machine translation, that target the Japanese-Spanish language pair. Given the scarcity of resources for this pair, we use existing English resources as a pivot for creating a trilingual Japanese-Spanish-English thesaurus. Our approach consists of extracting translation tuples from Wikipedia and disambiguating them by mapping them to WordNet word senses. We compare two methods of disambiguation: the first applies a vector space model (VSM) to Wikipedia article texts and WordNet definitions, and the second uses categorical information extracted from Wikipedia. We find that mixing the two methods produces favorable results. Using the proposed method, we have constructed a multilingual Spanish-Japanese-English thesaurus consisting of 25,375 entries. The same method can be applied to any pair of languages that are linked to English in Wikipedia.
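The VSM-based disambiguation step can be illustrated with a minimal sketch. It assumes a plain bag-of-words representation and cosine similarity, with no stop-word removal or term weighting; the abstract does not specify these details, so they are assumptions. The function names, the sense identifiers, and the sample texts are all hypothetical and are not taken from the paper or from WordNet.

```python
import math
import re
from collections import Counter


def bag_of_words(text):
    """Lowercase the text and count its alphabetic tokens."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(count * b[term] for term, count in a.items() if term in b)
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def best_sense(article_text, glosses):
    """Return the sense whose definition is most similar to the article text.

    `glosses` maps a (hypothetical) sense identifier to its definition text.
    """
    article_vec = bag_of_words(article_text)
    scores = {
        sense: cosine(article_vec, bag_of_words(gloss))
        for sense, gloss in glosses.items()
    }
    return max(scores, key=scores.get), scores


# Illustrative data only: a stand-in for an English article text and
# two candidate sense definitions for the word "bank".
article = (
    "A bank is a financial institution that accepts deposits of money "
    "and makes loans."
)
glosses = {
    "bank#1": "sloping land beside a body of water such as a river",
    "bank#2": (
        "a financial institution that accepts deposits and channels "
        "the money into lending activities"
    ),
}

sense, scores = best_sense(article, glosses)
print(sense, scores)
```

In this sketch the article text is matched against each candidate definition, and the translation tuple would be attached to the highest-scoring sense. A realistic setup would likely add stop-word filtering and a weighting scheme such as TF-IDF.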


