Starting from the second half of the nineteenth century [Schleicher, 1869], various researchers have addressed the historical development of language on evolutionary grounds.
A considerable proportion of the works in this field use word frequency as an important proxy for word fitness. For example, [Pagel et al., 2007] demonstrate across several languages that frequently used words evolve at slower rates, whereas infrequently used words evolve more rapidly. [Newberry et al., 2017] state that a possible explanation for this phenomenon could be a stronger stochastic drift of rare words. Meanwhile, [Adelman et al., 2006] notice that word frequency is confounded with polysemy, i.e., the number of contexts in which a word has been seen. They also show that this contextual diversity is a crucial factor that determines word-naming and lexical decision times.
[Lee, 1990] demonstrates that older words are more polysemous than recent words and that frequently used words are more polysemous than infrequently used words. This is in line with the theory of semantic conceptual change of [MacCormac, 1985], which states that words evolve additional meanings through metaphor. It seems that the frequency of the word is confounded with its semantics.
[Bybee, 2002] reviews results on how a sound change affects the lexicon and documents that a sound change affects high-frequency words and low-frequency words differently. This shows that the frequency of the word is confounded with its phonetic properties. The idea that there is a subtle correspondence between phonetics and semantics has been entertained by literary theorists [Shklovsky, 1917] and artists [Kruchenykh, 1923] at least since the beginning of the twentieth century. In a massive study across nearly two-thirds of the world's languages, [Blasi et al., 2016] managed to demonstrate that a considerable proportion of 100 essential vocabulary items carry strong associations with specific kinds of human speech sounds, occurring persistently across continents and linguistic lineages. [Yamshchikov et al., 2019] showed that modern methods of computational linguistics could be used to highlight such associative structures within a language.
This position paper develops these ideas, and states that the phonetic simplicity of a word is to some extent correlated with the number of its semantic contexts. We also speculate on possible cognitive mechanisms underlying this connection.
To relate polysemous properties of the words with their phonetic structure, we use two different datasets represented as graphs with words as vertices.
[Smerlak, 2020] reconsidered Maynard Smith's toy model of protein evolution [Smith, 1970] in the context of neutral evolution. We use this dataset here for a different purpose. Let us consider the set of all possible four-letter words and assume that two words are connected with an edge if the second word can be derived from the first one with an edit of one letter. We further call this graph a graph of edits. One could interpret the degree of the vertices in this graph as a proxy of the word's phonetic simplicity. Indeed, if a word has more one-letter edits that produce a meaningful word, one can assume that it consists of letters or sounds with higher joint probabilities. We discuss this hypothesis further in Section 4.
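The construction of the graph of edits described above can be sketched as follows. This is a minimal illustration and not the authors' code: the function name `build_edit_graph` and the toy vocabulary are hypothetical, and the real graph would be built over all meaningful four-letter English words.

```python
from collections import defaultdict

def build_edit_graph(words):
    """Connect equal-length words that differ in exactly one position."""
    words = sorted(set(words))
    buckets = defaultdict(list)
    for w in words:
        for i in range(len(w)):
            # Words sharing a wildcard pattern differ only at position i.
            buckets[w[:i] + "*" + w[i + 1:]].append(w)
    graph = {w: set() for w in words}
    for group in buckets.values():
        for a in group:
            for b in group:
                if a != b:
                    graph[a].add(b)
    return graph

# Hypothetical toy vocabulary, used only for illustration.
toy_words = ["word", "ward", "cord", "card", "care", "core", "wore", "gene"]
edit_graph = build_edit_graph(toy_words)

# The degree of a vertex is the number of one-letter edits that yield another word.
edit_degree = {w: len(neighbors) for w, neighbors in edit_graph.items()}
```

In this toy vocabulary "word" has three neighbors ("ward", "cord", "wore"), while "gene" is isolated and has degree zero.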
We provide a large dataset of English synonyms that is based on WordNet (https://wordnet.princeton.edu/). Here two words are connected with an edge if they are synonymous. Figure 1 shows the frequency of word usage, estimated on a large chunk of English Wikipedia (https://www.kaggle.com/rtatman/english-word-frequency), as a function of the number of synonyms that the word has in the graph. The dependence of word frequency on the number of synonyms is a well-known fact; it is important for us here since, to a certain extent, it validates the dataset of synonyms as a representative one.
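A graph of synonyms of this kind could be assembled along the following lines. The sketch assumes that synonymy is given as sets of mutually synonymous words, in the spirit of WordNet synsets, and that any two words sharing a set are linked; the text does not spell out the exact procedure, and the toy sets below are hypothetical.

```python
from itertools import combinations

def build_synonym_graph(synonym_sets):
    """Link every pair of words that appear together in a synonym set."""
    graph = {}
    for synonym_set in synonym_sets:
        members = sorted(set(synonym_set))
        for word in members:
            graph.setdefault(word, set())
        for a, b in combinations(members, 2):
            graph[a].add(b)
            graph[b].add(a)
    return graph

# Hypothetical toy synonym sets, not taken from WordNet.
toy_sets = [
    {"fast", "quick", "rapid"},
    {"fast", "firm"},
    {"firm", "company"},
]
synonym_graph = build_synonym_graph(toy_sets)

# Number of synonyms of each word, i.e. its degree in the graph of synonyms.
synonym_degree = {w: len(neighbors) for w, neighbors in synonym_graph.items()}
```

Here "fast" ends up with three synonyms because it belongs to two different sets, which is the kind of situation that the curvature-based analysis later tries to detect.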
Figure 2 shows that the connection between word frequencies and degrees in the graph of synonyms is even stronger for the four-letter words that form the graph of edits.
The sheer number of synonyms adjacent to a given word does not necessarily correspond to the number of distinct semantic contexts in which it can occur. It is well known that polysemous words have more synonyms and tend to have higher frequencies, but one cannot infer the number of possible semantic contexts in which a word can occur from the number of its synonyms. Further, we discuss how one can estimate the polysemy of a word using the geometry of the graph of synonyms.
3 Ollivier-Ricci Curvature and Polysemy
Ollivier-Ricci curvature [Ollivier, 2009] is commonly used for community detection [Ni et al., 2015], [Sia et al., 2019]. In this paper we use it in a way that is novel for mathematical linguistics and claim that it could be used as a proxy for the polysemy of a word in the language. Before we get into the details, let us briefly describe Ollivier-Ricci curvature itself.
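One considers a particular probability distribution, which has a parameter $\alpha$, and a graph $G = (V, E)$. For a vertex $x \in V$ with degree $k$, let $\Gamma(x) = \{x_1, \dots, x_k\}$ denote the set of neighbors of $x$. For any $\alpha \in [0, 1]$ the probability measure $m_x^{\alpha}$ is defined as

$$m_x^{\alpha}(x_i) = \begin{cases} \alpha & \text{if } x_i = x, \\ (1 - \alpha)/k & \text{if } x_i \in \Gamma(x), \\ 0 & \text{otherwise.} \end{cases}$$

The Ollivier-Ricci curvature of an edge $(x, y)$ is then $\kappa(x, y) = 1 - W(m_x^{\alpha}, m_y^{\alpha}) / d(x, y)$, where $W$ is the Wasserstein distance between the two measures and $d(x, y)$ is the shortest-path distance between $x$ and $y$ in $G$.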
The intuition behind the curvature of a given edge in our case is rather simple. If an edge lies within a dense community, it has positive curvature, whereas edges that connect separate communities have negative curvature. This property of Ollivier-Ricci curvature directly leads to the detection of polysemy of a given word. Indeed, every incident edge with a positive Ollivier-Ricci curvature connects the word to a synonym within the same semantic context, whereas an incident edge with negative Ollivier-Ricci curvature points to a synonym within a drastically different semantic field. Therefore, one can use the number of incident edges with negative Ollivier-Ricci curvature, or the average Ollivier-Ricci curvature across incident edges, as a measure of the polysemy of the word. Figure 3 shows that words with lower average Ollivier-Ricci curvature of incident edges tend to have a higher degree in the graph of synonyms. This is also in line with the statement that word frequency is confounded with polysemy.
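To make the polysemy proxy concrete, here is a minimal pure-Python sketch of Ollivier-Ricci curvature on a small unweighted graph. It assumes the usual graph formulation: each vertex carries mass alpha on itself and spreads the remaining mass uniformly over its neighbors, and the curvature of an edge is one minus the Wasserstein distance between the two endpoint measures divided by their graph distance. All function names and the toy graph are hypothetical, the transport problem is solved with a simple successive-shortest-path routine chosen for brevity rather than speed, and the graph is assumed to be connected.

```python
from collections import deque
from fractions import Fraction
from math import inf, lcm

def bfs_distances(adj, source):
    """Shortest-path (hop) distances from source; assumes a connected graph."""
    dist, queue = {source: 0}, deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist

def measure(adj, x, alpha):
    """Mass alpha on x, the rest spread uniformly over the neighbors of x."""
    m = {v: (1 - alpha) / len(adj[x]) for v in adj[x]}
    m[x] = alpha
    return {v: p for v, p in m.items() if p > 0}

def transport_cost(supply, demand, cost):
    """Minimum-cost transport of integer masses via successive shortest paths."""
    src, dst = list(supply), list(demand)
    n = len(src) + len(dst) + 2
    S, T = n - 2, n - 1
    edges = [[] for _ in range(n)]  # entries: [to, capacity, cost, reverse index]

    def add_edge(u, v, cap, c):
        edges[u].append([v, cap, c, len(edges[v])])
        edges[v].append([u, 0, -c, len(edges[u]) - 1])

    for i, u in enumerate(src):
        add_edge(S, i, supply[u], 0)
        for j, v in enumerate(dst):
            add_edge(i, len(src) + j, supply[u], cost[u][v])
    for j, v in enumerate(dst):
        add_edge(len(src) + j, T, demand[v], 0)

    total = 0
    while True:
        dist, prev = [inf] * n, [None] * n
        dist[S] = 0
        for _ in range(n - 1):  # Bellman-Ford on the residual graph
            changed = False
            for u in range(n):
                if dist[u] == inf:
                    continue
                for idx, (v, cap, c, rev) in enumerate(edges[u]):
                    if cap > 0 and dist[u] + c < dist[v]:
                        dist[v], prev[v], changed = dist[u] + c, (u, idx), True
            if not changed:
                break
        if dist[T] == inf:
            return total
        flow, v = inf, T
        while v != S:  # bottleneck capacity along the shortest path
            u, idx = prev[v]
            flow, v = min(flow, edges[u][idx][1]), u
        v = T
        while v != S:  # push the flow and update residual capacities
            u, idx = prev[v]
            edges[u][idx][1] -= flow
            edges[v][edges[u][idx][3]][1] += flow
            v = u
        total += flow * dist[T]

def ollivier_ricci(adj, x, y, alpha=Fraction(0)):
    """Curvature 1 - W(m_x, m_y) / d(x, y), computed exactly with fractions."""
    alpha = Fraction(alpha)
    mx, my = measure(adj, x, alpha), measure(adj, y, alpha)
    scale = lcm(*(p.denominator for p in [*mx.values(), *my.values()]))
    supply = {v: int(p * scale) for v, p in mx.items()}
    demand = {v: int(p * scale) for v, p in my.items()}
    cost = {u: bfs_distances(adj, u) for u in supply}
    w1 = Fraction(transport_cost(supply, demand, cost), scale)
    return 1 - w1 / bfs_distances(adj, x)[y]

def negative_edge_count(adj, x):
    """Polysemy proxy: number of incident edges with negative curvature."""
    return sum(1 for y in adj[x] if ollivier_ricci(adj, x, y) < 0)

# Hypothetical toy graph: two triangles (a, b, c) and (d, e, f) joined by the bridge c-d.
toy = {"a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b", "d"},
       "d": {"c", "e", "f"}, "e": {"d", "f"}, "f": {"d", "e"}}
inside = ollivier_ricci(toy, "a", "b")   # Fraction(1, 2): edge inside a community
bridge = ollivier_ricci(toy, "c", "d")   # Fraction(-2, 3): edge between communities
```

With alpha set to zero, the edge inside a triangle is positively curved and the bridge is negatively curved, so `negative_edge_count(toy, "c")` equals one: the vertex "c" touches exactly one edge that leads into a different community.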
Further, we show that the situation is more nuanced and that there is a connection between the location of the word within the graph of edits and its polysemy.
4 Polysemy and Phonetics
Figure 4 shows how the degree of a word in the graph of synonyms depends on its polysemy, i.e., the number of incident edges with negative Ollivier-Ricci curvature in the graph of synonyms. This connection is well known and can be seen in the proposed dataset.
Let us now discuss the graph of edits. One can regard the formation of actual words as a purely random process. The toy model of [Smith, 1970] is based on the assumption that, among all possible one-letter edits of a word, any combination of letters is equally 'lucky' to become another meaningful word. However, [Nowak and Krakauer, 1999] show that introducing errors in sound recognition at the stage of a protolanguage makes it very limited: "Adding new sounds increases the number of objects that can be described but at the cost of an increased probability of making mistakes; the overall ability to transfer information does not improve". The authors show that combining sounds into words is a way to overcome this error limit. In line with this reasoning, we suggest looking at the graph of edits from a phonetic perspective. Indeed, certain phonetic structures are more characteristic of a given language than others. Moreover, if a combination of letters is 'not pronounceable', it cannot be a meaningful word. Therefore, one can suppose that the degree in the graph of edits corresponds to the so-called 'phonetic simplicity' of a word. The words that are easier to pronounce would probably have a higher degree in the graph of edits. Figure 5 partially illustrates this supposition.
Figure 5 and Figure 6 show that as the degree of the words in the graph of edits gets higher, the words tend to have two vowels rather than one. Also, "u" and "i" are less frequent among densely connected words, whereas "a" and "e" seem to occur more often. The letter "y" vanishes once the degree of the words exceeds ten.
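The share of two-vowel words at each degree of the graph of edits could be tabulated roughly as follows. This is a sketch under stated assumptions: only "a", "e", "i", "o", "u" are counted as vowels ("y" is treated separately, as in the text), and the function name and toy degrees are hypothetical.

```python
from collections import defaultdict

VOWELS = set("aeiou")  # assumption: "y" is not counted as a vowel

def two_vowel_share_by_degree(edit_degree):
    """Map each edit-graph degree to the fraction of its words with exactly two vowels."""
    counts = defaultdict(lambda: [0, 0])  # degree -> [two-vowel words, all words]
    for word, degree in edit_degree.items():
        n_vowels = sum(ch in VOWELS for ch in word)
        counts[degree][0] += n_vowels == 2
        counts[degree][1] += 1
    return {d: two / total for d, (two, total) in sorted(counts.items())}

# Hypothetical degrees, used only for illustration.
toy_degree = {"word": 3, "cord": 3, "care": 3, "core": 2, "ward": 2, "gene": 0}
shares = two_vowel_share_by_degree(toy_degree)
```

Replacing `n_vowels == 2` with a test for a specific letter gives the per-letter frequencies discussed above.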
Table 1 shows that the frequency of two-vowel words that are arguably more robust in terms of phonetic simplicity correlates with the degree of the corresponding word in the graph of synonyms. It also correlates with the number of incident edges with negative Ollivier-Ricci curvature in the graph of synonyms. Finally, there is a strong correlation between the frequency of the two-vowel words and their degree in the graph of edits.
|Value||Correlation with frequency of two-vowel words|
|# of incident edges with negative ORC||%|
|degree in the graph of synonyms||%|
|degree in the graph of edits||%|
All these observed correlations allow us to speculate that the structure of the graph of edits is affected by certain phonetic properties of the English language. A higher degree of a word in this graph seems to capture a certain phonetic usability of this word.
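Correlations of this kind can be computed with a few lines of code. The sketch below assumes the Pearson correlation coefficient, which the text does not specify, and uses made-up numbers purely for illustration; they are not the values behind Table 1.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

# Hypothetical data: degree in the graph of edits and the share of
# two-vowel words among the words with that degree.
degrees = [1, 2, 3, 4, 5]
two_vowel_shares = [0.10, 0.25, 0.30, 0.45, 0.50]
r = pearson(degrees, two_vowel_shares)
```

A rank-based coefficient such as Spearman's would be a reasonable alternative if the dependence is monotone but not linear.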
This position paper demonstrates an interesting empirical fact: there is a connection between the structure of the graph of edits that is based on purely formal reasoning and a graph of synonyms that to a certain extent captures semantic complexity of the language. This fact in itself is thought-provoking. It motivates a search for a phonetically inspired notion of fitness that could be applied to the problems of the evolution of language. However, the discussion of such a notion is out of the scope of this work. Here we would only like to highlight the role of negatively curved incident edges in the graph of synonyms. We hope that this geometric approach could be further used to study polysemy.
Let us now briefly discuss the final interesting connection between the phonetic structure of the words and their polysemy. From Table 1 we know that the correlation between the degree in the graph of edits and the frequency of two-vowel words is above 74%. We also know that the frequency of two-vowel words correlates with the number of incident negatively curved edges in the graph of synonyms and with the degree of the word in the graph of synonyms. These two quantities are also strongly correlated: the degree of the word in the graph of synonyms and the number of its incident negatively curved edges correlate with a coefficient of 0.97. Indeed, the number of synonyms, polysemy, and frequency of use are known to be correlated. However, we would like to discuss another interesting empirical connection here that could highlight the connection between these properties of the words and their phonetics.
Let us regard all the words with a given degree in the graph of edits. For a given word $w$, let us count all incident edges with negative Ollivier-Ricci curvature in the synonym graph and denote this number as $n_{-}(w)$. Let us also denote the degree of this word in the graph of synonyms as $\deg_s(w)$. Let us then calculate the ratio $n_{-}(w) / \deg_s(w)$. Figure 7 demonstrates how the sum of these ratios across all words with a fixed degree in the graph of edits depends on this degree.
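The aggregation behind this metric can be sketched as follows. The sketch assumes that the ratio is the number of negatively curved incident edges divided by the synonym-graph degree, and that words without synonyms are skipped; the function name and the toy values are hypothetical.

```python
from collections import defaultdict
from fractions import Fraction

def ratio_sum_by_edit_degree(edit_degree, negative_edges, synonym_degree):
    """Sum the per-word ratios over all words sharing an edit-graph degree."""
    totals = defaultdict(Fraction)
    for word, k in edit_degree.items():
        d = synonym_degree.get(word, 0)
        if d == 0:
            continue  # assumption: words without synonyms contribute nothing
        totals[k] += Fraction(negative_edges.get(word, 0), d)
    return dict(sorted(totals.items()))

# Hypothetical values, used only for illustration.
edit_degree = {"word": 3, "cord": 3, "care": 2}
negative_edges = {"word": 1, "cord": 2, "care": 0}
synonym_degree = {"word": 4, "cord": 4, "care": 5}
metric = ratio_sum_by_edit_degree(edit_degree, negative_edges, synonym_degree)
```

Exact fractions are used so that the sums carry no rounding error; converting with `float` before plotting is straightforward.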
The metric shown in Figure 7 correlates with the frequency of two-vowel words with a coefficient of -83.4%. In our opinion, this might highlight the importance of the Ollivier-Ricci curvature-based polysemy measure as a tool to expose the connection between polysemy and phonetic properties of the words. It stands to reason that the words that are easier to pronounce would be used more often and acquire more synonyms with time. This raises the possibility that polysemy could be associated with a certain acoustic simplicity. It therefore develops the idea of evolution through metaphor stated in [MacCormac, 1985], suggesting that the words that are easier to pronounce could be more prone to such evolution and, as time proceeds, could end up with more semantic fields.
This position paper demonstrates empirically a connection between polysemy of the words and their formal structure. We propose to use Ollivier-Ricci curvature over a graph of synonyms as an estimate for polysemy of the word. We speculate that the aforementioned connection between polysemy and formal structure is rooted in the phonetic properties of the language. We empirically demonstrate that certain phonetic properties of the words are correlated with their polysemy.
The authors are extremely grateful to Matteo Smerlak and Massimo Warglien for their help, support, and constructive discussions.
- [Adelman et al., 2006] Adelman, J. S., Brown, G. D., and Quesada, J. F. (2006). Contextual diversity, not word frequency, determines word-naming and lexical decision times. Psychological science, 17(9):814–823.
- [Blasi et al., 2016] Blasi, D. E., Wichmann, S., Hammarström, H., Stadler, P. F., and Christiansen, M. H. (2016). Sound-meaning association biases evidenced across thousands of languages. Proceedings of the National Academy of Sciences, 113(39):10818–10823.
- [Bybee, 2002] Bybee, J. (2002). Word frequency and context of use in the lexical diffusion of phonetically conditioned sound change. Language variation and change, 14(3):261–290.
- [Kruchenykh, 1923] Kruchenykh, A. (1923). Phonetics of theatre. M.:41, Moscow.
- [Lee, 1990] Lee, C. J. (1990). Some hypotheses concerning the evolution of polysemous words. Journal of Psycholinguistic Research, 19(4):211–219.
- [MacCormac, 1985] MacCormac, E. R. (1985). A cognitive theory of metaphor. Journal of Aesthetics and Art Criticism, 45(4):418–420.
- [Newberry et al., 2017] Newberry, M. G., Ahern, C. A., Clark, R., and Plotkin, J. B. (2017). Detecting evolutionary forces in language change. Nature, 551(7679):223–226.
- [Ni et al., 2015] Ni, C.-C., Lin, Y.-Y., Gao, J., Gu, X. D., and Saucan, E. (2015). Ricci curvature of the internet topology. In 2015 IEEE Conference on Computer Communications (INFOCOM), pages 2758–2766. IEEE.
- [Ni et al., 2019] Ni, C.-C., Lin, Y.-Y., Luo, F., and Gao, J. (2019). Community detection on networks with ricci flow. Scientific reports, 9(1):1–12.
- [Nowak and Krakauer, 1999] Nowak, M. A. and Krakauer, D. C. (1999). The evolution of language. Proceedings of the National Academy of Sciences, 96(14):8028–8033.
- [Ollivier, 2009] Ollivier, Y. (2009). Ricci curvature of Markov chains on metric spaces. Journal of Functional Analysis, 256(3):810–864.
- [Pagel et al., 2007] Pagel, M., Atkinson, Q. D., and Meade, A. (2007). Frequency of word-use predicts rates of lexical evolution throughout indo-european history. Nature, 449(7163):717–720.
- [Schleicher, 1869] Schleicher, A. (1869). Darwinism tested by the science of language. JC Hotten.
- [Shklovsky, 1917] Shklovsky, V. (1917). Art as technique. Literary theory: An anthology, pages 15–21.
- [Sia et al., 2019] Sia, J., Jonckheere, E., and Bogdan, P. (2019). Ollivier-ricci curvature-based method to community detection in complex networks. Scientific reports, 9(1):1–12.
- [Smerlak, 2020] Smerlak, M. (2020). Localization of neutral evolution: selection for mutational robustness and the maximal entropy random walk. BioRxiv.
- [Smith, 1970] Smith, J. M. (1970). Natural selection and the concept of a protein space. Nature, 225(5232):563–564.
- [Yamshchikov et al., 2019] Yamshchikov, I. P., Shibaev, V., and Tikhonov, A. (2019). Dyr bul shchyl. proxying sound symbolism with word embeddings. In Proceedings of the 3rd Workshop on Evaluating Vector Space Representations for NLP, pages 90–94.