A Mathematical Model for Linguistic Universals

07/29/2019
by Weinan E, et al.

Inspired by chemical kinetics and neurobiology, we propose a mathematical theory for pattern recurrence in text documents, applicable to a wide variety of languages. We present a Markov model at the discourse level for Steven Pinker's "mentalese", or chains of mental states that transcend spoken and written forms. Such (potentially) universal temporal structures of textual patterns lead us to a language-independent semantic representation, or a translationally invariant word embedding, thereby forming the common ground for both comprehensibility within a given language and translatability between different languages. Applying our model to documents of moderate length, without relying on external knowledge bases, we reconcile Noam Chomsky's "poverty of stimulus" paradox with statistical learning of natural languages.
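The abstract's central object is a Markov chain over latent mental states at the discourse level. A minimal toy sketch of such a chain is below; the state labels and transition probabilities are invented for illustration and are not taken from the paper. The stationary distribution of the chain is one simple "temporal structure" that does not depend on surface vocabulary:

```python
import numpy as np

# Hypothetical latent discourse states (illustrative labels, not the paper's).
states = ["setup", "elaboration", "conclusion"]

# Row-stochastic transition matrix: P[i, j] = Pr(next = j | current = i).
P = np.array([
    [0.2, 0.7, 0.1],
    [0.1, 0.6, 0.3],
    [0.5, 0.3, 0.2],
])

# The stationary distribution pi satisfies pi = pi @ P.
# Power iteration converges here because the chain is irreducible and aperiodic.
pi = np.full(3, 1.0 / 3.0)
for _ in range(1000):
    pi = pi @ P

print({s: round(p, 3) for s, p in zip(states, pi)})
```

Because the stationary distribution is a property of the transition structure alone, relabeling the states (say, into another language's discourse categories) leaves it unchanged, which is the intuition behind a language-independent representation.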

