Topic Modeling the Hàn diăn Ancient Classics

02/02/2017
by Colin Allen, et al.

Ancient Chinese texts present an area of enormous challenge and opportunity for humanities scholars interested in exploiting computational methods to assist in the development of new insights and interpretations of culturally significant materials. In this paper we describe a collaborative effort between Indiana University and Xi'an Jiaotong University to support exploration and interpretation of a digital corpus of over 18,000 ancient Chinese documents, which we refer to as the "Handian" ancient classics corpus (Hàn diăn gŭ jí, i.e., the "Han canon" or "Chinese classics"). It contains classics of ancient Chinese philosophy, documents of historical and biographical significance, and literary works. We begin by describing the Digital Humanities context of this joint project and the advances in humanities computing that made it feasible. We then describe the corpus and introduce our application of probabilistic topic modeling to it, with attention to the particular challenges posed by modeling ancient Chinese documents. We give a specific example of how the software we have developed can be used to aid discovery and interpretation of themes in the corpus. Finally, we outline more advanced forms of computer-aided interpretation that are made possible by the programming interface our system provides, and the general implications of these methods for understanding the nature of meaning in these texts.
