Evolution of Efficient Symbolic Communication Codes

06/04/2023
by Anton Kolonin, et al.

The paper explores how the structure of human natural language can be seen as a product of the evolution of an inter-personal communication code that maximises culture-agnostic and cross-lingual metrics such as anti-entropy, compression factor and cross-split F1 score. The exploration is part of a larger unsupervised language learning effort: an attempt is made to perform meta-learning in a space of hyper-parameters, maximising the F1 score against the "ground truth" language structure by means of maximising the metrics mentioned above. The paper presents preliminary results of a cross-lingual word-level segmentation (tokenisation) study for Russian, Chinese and English, as well as a subword segmentation (morphological parsing) study for English. It is found that the language structure from word-level segmentation, or tokenisation, can be seen as driven by all of these metrics, with anti-entropy more relevant to English and Russian and compression factor more specific to Chinese. The subword segmentation, or morphological parsing, study on the English lexicon revealed a direct connection between parsing quality and compression factor while, surprisingly, the same connection with anti-entropy turned out to be inverse.
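To make the metrics named above more concrete, here is a minimal Python sketch of how such quantities might be computed for one candidate tokenisation. The abstract does not give the formulas, so every definition below is an illustrative assumption, not the paper's own: `anti_entropy` is taken as one minus the normalised Shannon entropy of the token distribution, `compression_factor` as the relative size saving when the text is stored as a token sequence plus a lexicon, and `boundary_f1` as the F1 score of predicted token boundaries against a reference segmentation. All function names are hypothetical.

```python
import math
from collections import Counter


def anti_entropy(tokens):
    """Assumed definition: 1 - H / log2(V), where H is the Shannon entropy
    of the token frequency distribution and V is the vocabulary size."""
    counts = Counter(tokens)
    total = sum(counts.values())
    vocab = len(counts)
    if vocab < 2:
        return 1.0  # a single token type carries no uncertainty
    entropy = -sum((c / total) * math.log2(c / total) for c in counts.values())
    return 1.0 - entropy / math.log2(vocab)


def compression_factor(text, tokens):
    """Assumed definition: 1 - encoded_size / raw_size, where the encoded
    size is one symbol per token plus the characters of the lexicon."""
    raw_size = len(text)
    if raw_size == 0:
        return 0.0
    lexicon_size = sum(len(t) for t in set(tokens))
    encoded_size = len(tokens) + lexicon_size
    return 1.0 - encoded_size / raw_size


def boundary_f1(predicted, reference):
    """F1 score of internal token boundary positions (character offsets)
    in a predicted segmentation against a reference segmentation."""
    def boundaries(tokens):
        position, found = 0, set()
        for token in tokens[:-1]:
            position += len(token)
            found.add(position)
        return found

    pred, ref = boundaries(predicted), boundaries(reference)
    true_positives = len(pred & ref)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(pred)
    recall = true_positives / len(ref)
    return 2 * precision * recall / (precision + recall)
```

Under these assumptions, a hyper-parameter search could score each candidate tokenisation with the two unsupervised metrics and then check how well their maxima coincide with the maximum of the reference-based F1 score.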
