The 'Letter' Distribution in the Chinese Language

05/26/2020
by Qinghua Chen, et al.

Corpus-based statistical analysis plays a significant role in linguistic research, and ample evidence has shown that different languages obey some common laws. Studies have found that letters in some languages with alphabetic writing systems have strikingly similar usage frequency distributions. Does this hold for Chinese, which employs ideographic writing? We obtained letter frequency data for several alphabetic languages and identified a common law governing their letter distributions. In addition, we collected Chinese literature corpora from different historical periods, from the Tang Dynasty to the present, and decomposed written Chinese into three kinds of basic particles: characters, strokes, and constructive parts. The statistical analysis showed that the intensity with which basic particles were used in Chinese writing varied across historical periods, but the form of the distribution remained consistent. In particular, the distributions of the Chinese constructive parts are clearly consistent with those of the alphabetic languages. This study provides new evidence of the consistency of human languages.
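As a rough illustration of the kind of measurement the abstract describes, the following minimal Python sketch computes a rank-ordered relative frequency distribution of the symbols in a text. It is not the authors' method: the function name `rank_frequency` and the sample strings are hypothetical, and only the character level is covered, since splitting characters into strokes or constructive parts would need a decomposition table that is not given here.

```python
from collections import Counter


def rank_frequency(text):
    """Return (symbol, relative frequency) pairs, most frequent first.

    Whitespace is ignored; every other symbol (a Latin letter, a Chinese
    character, and so on) is counted as one basic particle.
    """
    counts = Counter(ch for ch in text if not ch.isspace())
    total = sum(counts.values())
    if total == 0:
        return []
    return [(symbol, n / total) for symbol, n in counts.most_common()]


# Hypothetical sample inputs, chosen only to show the call.
latin_dist = rank_frequency("abracadabra")
chinese_dist = rank_frequency("人之初 性本善 性相近 习相远")

for rank, (symbol, freq) in enumerate(chinese_dist, start=1):
    print(rank, symbol, round(freq, 3))
```

Comparing the shape of such rank-ordered curves across corpora, rather than the identity of the symbols, is what allows distributions from different writing systems to be set side by side.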

research
09/28/2017

The Dependence of Frequency Distributions on Multiple Meanings of Words, Codes and Signs

The dependence of the frequency distributions due to multiple meanings o...
research
09/17/2017

Character Distributions of Classical Chinese Literary Texts: Zipf's Law, Genres, and Epochs

We collect 14 representative corpora for major periods in Chinese histor...
research
05/09/2018

wubi2en: Character-level Chinese-English Translation through ASCII Encoding

Character-level Neural Machine Translation (NMT) models have recently ac...
research
09/30/2015

The "handedness" of language: Directional symmetry breaking of sign usage in words

Using large written corpora for many different scripts, we show that the...
research
03/21/2023

Chinese Intermediate English Learners outdid ChatGPT in deep cohesion: Evidence from English narrative writing

ChatGPT is a publicly available chatbot that can quickly generate texts ...
research
01/14/2021

Estimation of the Frequency of Occurrence of Italian Phonemes in Text

The purpose of this project was to derive a reliable estimate of the fre...
research
05/18/2020

Corpus of Chinese Dynastic Histories: Gender Analysis over Two Millennia

Chinese dynastic histories form a large continuous linguistic space of a...