Mining and discovering biographical information in Difangzhi with a language-model-based approach

04/08/2015
by Peter K. Bol, et al.

We present results from expanding the China Biographical Database by text mining historical local gazetteers (difangzhi). The goal of the database is to show how people are connected through kinship, social ties, and the places and offices in which they served. The gazetteers are the single most important collection of names and offices covering the Song through Qing periods. Although we begin with local officials, we shall eventually include lists of local examination candidates, people from the locality who served in government, and notable local figures with biographies. The more data we collect, the more connections emerge. The value of systematic text mining is that we can identify relevant connections that are either directly informative or can become useful without deep historical research. Academia Sinica is developing a name database for officials in the central governments of the Ming and Qing dynasties.


