Corpus of Chinese Dynastic Histories: Gender Analysis over Two Millennia

05/18/2020
by Sergey Zinin, et al.

Chinese dynastic histories form a large, continuous linguistic space spanning approximately 2000 years, from the 3rd century BCE to the 18th century CE. The histories are written in Classical (Literary) Chinese and make up a corpus of over 20 million characters, suitable for the computational analysis of historical lexicon and semantic change. However, there is no freely available open-source corpus of these histories, which leaves Classical Chinese a low-resource language. This project introduces a new open-source corpus of the twenty-four dynastic histories, released under a Creative Commons license. As a case study, an original list of Classical Chinese gender-specific terms was developed to analyze the historical linguistic use of male and female terms. The study demonstrates considerable stability in the usage of these terms over time, with male terms dominating. Word meanings are explored through keyword analysis of focus corpora built for the gender-specific terms. This method yields meaningful semantic representations that can be used in future studies of diachronic semantics.
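The abstract does not say how the focus corpora or the keyword scores are computed. As a rough illustration only, the sketch below assumes that a focus corpus is the set of characters within a fixed window around each occurrence of a target term, and that keywords are ranked by the log-likelihood keyness statistic against a reference corpus. The function names, the window size, and the character-level tokenization are all assumptions of this sketch, not details taken from the paper.

```python
import math
from collections import Counter


def focus_corpus(text, term, window=5):
    """Concatenate the characters within `window` positions of each
    occurrence of `term` (hypothetical definition of a focus corpus)."""
    chunks = []
    start = text.find(term)
    while start != -1:
        end = start + len(term)
        chunks.append(text[max(0, start - window):start] + text[end:end + window])
        start = text.find(term, end)
    return "".join(chunks)


def log_likelihood(a, b, n_focus, n_ref):
    """Log-likelihood keyness for one item.
    a, b: counts in the focus and reference corpora;
    n_focus, n_ref: total sizes of the two corpora."""
    total = n_focus + n_ref
    e1 = n_focus * (a + b) / total  # expected count in focus
    e2 = n_ref * (a + b) / total    # expected count in reference
    ll = 0.0
    if a > 0:
        ll += a * math.log(a / e1)
    if b > 0:
        ll += b * math.log(b / e2)
    return 2 * ll


def keywords(focus, reference, top=5):
    """Rank characters that are relatively more frequent in `focus`
    than in `reference` (both must be non-empty strings)."""
    f, r = Counter(focus), Counter(reference)
    n_f, n_r = sum(f.values()), sum(r.values())
    scored = []
    for ch, a in f.items():
        b = r.get(ch, 0)
        if a / n_f > b / n_r:  # keep only over-represented items
            scored.append((ch, log_likelihood(a, b, n_f, n_r)))
    return sorted(scored, key=lambda item: item[1], reverse=True)[:top]
```

In use, one would call `focus_corpus(history_text, term)` for each gender-specific term and pass the result to `keywords` together with a reference text such as the rest of the corpus. Punctuation handling and any word segmentation are left out of this sketch.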


research
09/23/2020

Evolution of Part-of-Speech in Classical Chinese

Classical Chinese is a language notable for its word class flexibility: ...
research
12/06/2018

Adpositional Supersenses for Mandarin Chinese

This study adapts Semantic Network of Adposition and Case Supersenses (S...
research
09/10/2015

On the evolution of word usage of classical Chinese poetry

The hierarchy of classical Chinese poetry has been broadly acknowledged ...
research
02/02/2017

Topic Modeling the Hàn diăn Ancient Classics

Ancient Chinese texts present an area of enormous challenge and opportun...
research
02/01/2023

For the Underrepresented in Gender Bias Research: Chinese Name Gender Prediction with Heterogeneous Graph Attention Network

Achieving gender equality is an important pillar for humankind's sustain...
research
08/29/2022

naab: A ready-to-use plug-and-play corpus for Farsi

Huge corpora of textual data are always known to be a crucial need for t...
research
05/26/2020

The 'Letter' Distribution in the Chinese Language

Corpus-based statistical analysis plays a significant role in linguistic...
