Automatic WordNet Construction using Word Sense Induction through Sentence Embeddings

04/07/2022
by Dan John Velasco, et al.

Language resources such as wordnets remain indispensable for many natural language tasks and applications. However, for low-resource languages such as Filipino, existing wordnets are outdated, and producing new ones can be slow and costly. In this paper, we propose an automatic method for constructing a wordnet from scratch using only an unlabeled corpus and a language model based on sentence embeddings. Using this method, we produce FilWordNet, a new wordnet that supplants and improves on the outdated Filipino WordNet. We evaluate our automatically induced senses and synsets by matching them with senses from the Princeton WordNet and by comparing the synsets to those of the old Filipino WordNet. We show empirically that our method can induce existing, as well as potentially new, senses and synsets automatically, without human supervision.
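
The abstract does not spell out the induction procedure, so the following is only a generic sketch of word sense induction by clustering context vectors, not the authors' method. It assumes a toy bag-of-words `embed` function as a stand-in for a real sentence-embedding model, a hand-picked `STOPWORDS` set, and a greedy similarity-threshold clustering rule; all of these names and the `threshold` value are hypothetical.

```python
import math
from collections import Counter

# Hypothetical stopword list, only large enough for the toy example.
STOPWORDS = {"the", "a", "at", "was", "she", "they", "along"}

def embed(sentence, target):
    """Toy stand-in for a sentence encoder: counts of content words,
    excluding the target word itself."""
    tokens = sentence.lower().split()
    return Counter(t for t in tokens if t != target and t not in STOPWORDS)

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a if k in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def induce_senses(sentences, target, threshold=0.2):
    """Greedily group sentences containing `target`: each sentence joins the
    most similar existing cluster if similarity >= threshold, otherwise it
    starts a new cluster. Each resulting cluster is one induced sense."""
    clusters = []  # each cluster: {"centroid": Counter, "members": [str]}
    for sentence in sentences:
        vec = embed(sentence, target)
        best, best_sim = None, threshold
        for cluster in clusters:
            sim = cosine(vec, cluster["centroid"])
            if sim >= best_sim:
                best, best_sim = cluster, sim
        if best is None:
            clusters.append({"centroid": Counter(vec), "members": [sentence]})
        else:
            best["centroid"].update(vec)  # accumulate counts as the centroid
            best["members"].append(sentence)
    return [cluster["members"] for cluster in clusters]

if __name__ == "__main__":
    corpus = [
        "she deposited money at the bank",
        "the bank approved the money loan",
        "they fished along the river bank",
        "the river bank was muddy",
    ]
    for i, members in enumerate(induce_senses(corpus, "bank")):
        print(f"sense {i}: {members}")
```

In a realistic setting the toy `embed` would be replaced by vectors from a pretrained sentence encoder, and the clustering rule and threshold would need tuning on the target corpus.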


research
05/21/2021

Unsupervised Multilingual Sentence Embeddings for Parallel Corpus Mining

Existing models of multilingual sentence embeddings require large parall...
research
10/14/2021

Large Scale Substitution-based Word Sense Induction

We present a word-sense induction method based on pre-trained masked lan...
research
10/22/2020

Investigating the True Performance of Transformers in Low-Resource Languages: A Case Study in Automatic Corpus Creation

Transformers represent the state-of-the-art in Natural Language Processi...
research
10/27/2020

Learning Contextualised Cross-lingual Word Embeddings for Extremely Low-Resource Languages Using Parallel Corpora

We propose a new approach for learning contextualised cross-lingual word...
research
10/25/2022

Bilingual Lexicon Induction for Low-Resource Languages using Graph Matching via Optimal Transport

Bilingual lexicons form a critical component of various natural language...
research
01/05/2022

Semi-automatic WordNet Linking using Word Embeddings

Wordnets are rich lexico-semantic resources. Linked wordnets are extensi...
research
06/07/2020

An Algorithm for Fuzzification of WordNets, Supported by a Mathematical Proof

WordNet-like Lexical Databases (WLDs) group English words into sets of s...
