A Computational Approach to Measuring the Semantic Divergence of Cognates

12/02/2020
by Ana-Sabina Uban, et al.

Meaning is the foundation stone of intercultural communication. Languages change continuously, and words shift their meanings for various reasons. Semantic divergence in related languages is a key concern of historical linguistics. In this paper we investigate semantic divergence across languages by measuring the semantic similarity of cognate sets in multiple languages. The method we propose is based on cross-lingual word embeddings. We implement and evaluate it on English and five Romance languages, but it can be extended easily to any language pair, requiring only large monolingual corpora for the languages involved and a small bilingual dictionary for the pair. This language-agnostic method facilitates a quantitative analysis of cognate divergence, by computing degrees of semantic similarity between cognate pairs, and provides insights for identifying false friends. As a second contribution, we formulate a straightforward method for detecting false friends, and introduce the notions of "soft false friend" and "hard false friend", as well as a measure of the degree of "falseness" of a false-friend pair. Additionally, we propose an algorithm that can output suggestions for correcting false friends, which could serve as a helpful tool for language learning or translation.
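The abstract does not give implementation details, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes that word vectors for both languages have already been mapped into one shared cross-lingual space, that cosine similarity is the similarity measure, that "falseness" can be approximated as one minus that similarity, and that a correction can be suggested by nearest-neighbour lookup in the target language. The toy vectors, the 0.5 threshold, and every function name are hypothetical and chosen purely for illustration.

```python
import math

# Toy 3-dimensional vectors, invented for illustration. They stand in for
# embeddings that are ASSUMED to be already aligned in a shared space.
EN = {
    "library": [0.9, 0.1, 0.1],
    "nation": [0.1, 0.1, 0.9],
}
FR = {
    "librairie": [0.1, 0.9, 0.1],      # means "bookshop"
    "bibliothèque": [0.9, 0.2, 0.1],   # means "library"
    "nation": [0.1, 0.2, 0.9],
}

THRESHOLD = 0.5  # hypothetical cut-off, not taken from the paper


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def cognate_similarity(en_word, fr_word):
    """Similarity of a cognate pair in the shared space."""
    return cosine(EN[en_word], FR[fr_word])


def falseness(en_word, fr_word):
    """Assumed falseness score: 1 minus the cosine similarity."""
    return 1.0 - cognate_similarity(en_word, fr_word)


def is_candidate_false_friend(en_word, fr_word, threshold=THRESHOLD):
    """Flag a cognate pair whose similarity falls below the threshold."""
    return cognate_similarity(en_word, fr_word) < threshold


def suggest_correction(en_word):
    """Return the French word whose vector is closest to the English word."""
    return max(FR, key=lambda fr_word: cosine(EN[en_word], FR[fr_word]))


if __name__ == "__main__":
    for en_word, fr_word in [("library", "librairie"), ("nation", "nation")]:
        print(en_word, fr_word,
              round(cognate_similarity(en_word, fr_word), 3),
              is_candidate_false_friend(en_word, fr_word))
    print("suggested correction for 'library':", suggest_correction("library"))
```

With real data, the dictionaries would be replaced by embeddings trained on large monolingual corpora and aligned using a small bilingual dictionary, as the abstract describes; the alignment step itself is omitted here.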

research
01/19/2018

A Resource-Light Method for Cross-Lingual Semantic Textual Similarity

Recognizing semantically similar sentences or paragraphs across language...
research
12/17/2021

Challenge Dataset of Cognates and False Friend Pairs from Indian Languages

Cognates are present in multiple variants of the same text across differ...
research
03/10/2020

Multi-SimLex: A Large-Scale Evaluation of Multilingual and Cross-Lingual Lexical Semantic Similarity

We introduce Multi-SimLex, a large-scale lexical resource and evaluation...
research
01/17/2020

A Common Semantic Space for Monolingual and Cross-Lingual Meta-Embeddings

This paper presents a new technique for creating monolingual and cross-l...
research
12/10/2019

Machine Translation with Cross-lingual Word Embeddings

Learning word embeddings using distributional information is a task that...
research
11/13/2016

Cross-lingual Dataless Classification for Languages with Small Wikipedia Presence

This paper presents an approach to classify documents in any language in...
research
12/07/2020

What Meaning-Form Correlation Has to Compose With

Compositionality is a widely discussed property of natural languages, al...
