Just as microscopes help bioscientists observe the internal structure of cells, our framework aims to help researchers look beyond the relationships among the specific historical figures they study and recognize the role these figures play in the larger social network. We hope it can serve as a computational lens for social research questions.
The traditional way to study relationships between historical persons is to read the literature, consult a large number of historical documents, and argue for the relationships on that basis. Such analysis is mostly qualitative. It also requires considerable professional knowledge: someone who cannot read ancient literature, for example, cannot extract information from historical documents. In addition, traditional discussions of the relationship between two people are limited to their direct relationship and do not consider the influence of their friends, or of the entire social network, on them. This paper therefore proposes a new social network research framework (CLHPSoNet) to help people understand how the social networks of historical people influence their social relationships. The method supports quantitative study without requiring rich background knowledge. The comparison between our framework and traditional methods is shown in Table I. Our method can help answer questions such as "Who are the central figures?" or "What is the relationship between certain people?" (see, for example, https://www.zhihu.com/question/20589740). Such questions interest not only researchers but also history enthusiasts.
| ||Traditional methods||Our framework|
|Data source||Historical Documents||Database|
|Results display||Papers, Tables||Web UI|
The major contributions of this paper are as follows:
- We present a novel research framework for exploring the social relationships of historical people.
- We propose graph partitioning algorithms to solve relevant domain problems.
- Based on the proposed model and algorithms, we build an application that helps people analyze and understand the social relationships of the ancients.
The rest of the paper is organized as follows. Section II introduces related work. Section III describes our modeling and Section IV the system design. Section V presents an application showing that our framework works well on some domain problems. Section VI concludes the paper and points out future directions.
II Related Work
II-A Research on Historical Figures’ Relationships
The study of social relationships between historical people plays an important role in the study of history. Besides giving a richer account of the figures’ life experiences, it also reflects, to some extent, their historical background. The topic has attracted considerable research attention.
For example, Liu draws on historical records to discuss the association between Su Shi (https://en.wikipedia.org/wiki/Su_Shi), a Song dynasty poet, and Wang Anshi (https://en.wikipedia.org/wiki/Wang_Anshi), also a Song dynasty poet, in terms of life experience, politics, literature, and other aspects. Guo uses many historical documents to discuss the friendship between Ouyang Xiu and Wang Anshi. With the development of digital humanities, some researchers have entered historical records into computers to count, compute, and visualize the relationships of historical people. Yu et al. compiled statistics on the friendship between Su Shi and Wang Gong (a Song dynasty poet). Zhu et al. proposed the ‘Chro-Ring’ approach to visualize the history of poets. However, most research in this area is qualitative, straightforward, and based on historical documents.
II-B Research on the China Biographical Database Project (CBDB)
The China Biographical Database (CBDB) (https://projects.iq.harvard.edu/cbdb) is a freely accessible relational database with biographical information about approximately 417,000 individuals, primarily from the 7th through 19th centuries. It is developed by Harvard University, the Institute of History and Philology of Academia Sinica, and Peking University. The latest version was released in April 2017, and the database has been continuously updated. The data is meant to be useful for statistical, social network, and spatial analysis, as well as serving as a kind of biographical reference.
Many new applications and systems have been built on the CBDB dataset. Liu et al. [6, 7] use the database to compare the poetry of different dynasties, showing that access to increasingly large datasets strengthens researchers’ research potential. Guo proposed a powerful tool for studying genealogical records. Sie et al. developed a text retrieval and mining system for Taiwanese historical people. A great deal of valuable information remains for researchers to mine. Our paper uses this dataset to implement our framework; other historical files or databases can be added to make it more complete.
II-C Signed Graphs
Signed social networks are social networks in which ties can carry one of two signs: positive or negative. They were first addressed by social balance theory, which originated in cognitive theory from the 1950s. Among signed triads, those with an odd number of positive edges are defined as balanced. It was proven that a connected network is balanced if and only if it can be split into two opposing groups. In reality, however, social networks are rarely perfectly balanced, so structural balance theory has been extended considerably, e.g. weak structural balance, k-balance, and other criteria to quantify and evaluate balance in a signed social network [13, 10, 14]. Researchers have shown that clustering problems defined on signed graphs can be used as a criterion to measure the degree of balance in social networks. Measuring balance then becomes a clustering problem, which is NP-hard, and many heuristic methods have been proposed to solve it [16, 17, 15, 13]. Recently, with the development of network representation learning, researchers have begun to use machine learning methods to learn low-dimensional vector representations for the nodes of a given network. These representations have proven useful in many network analysis tasks such as link prediction, node classification, and visualization. However, the various methods that have been proposed (e.g. DeepWalk, node2vec, t-SNE) target unsigned networks. How to embed signed graphs has therefore attracted researchers’ interest. SiNE is a recent attempt, and it inspires the use of embedding methods for signed graphs.
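The link between balance and a split into two opposing groups can be illustrated with a short check. The following is a minimal sketch, not part of the framework itself: it assumes edges are given as `(u, v, sign)` tuples with sign `+1` or `-1`, and the function name `split_into_two_camps` is hypothetical. It tries to place nodes into two camps so that positive edges stay inside a camp and negative edges cross camps; if that fails, the graph is not balanced.

```python
from collections import deque


def split_into_two_camps(nodes, signed_edges):
    """Return (camp_a, camp_b) if the signed graph is balanced, else None.

    signed_edges is an iterable of (u, v, sign) with sign = +1 or -1.
    For a disconnected graph the camp labels of different components
    are arbitrary relative to each other.
    """
    adjacency = {n: [] for n in nodes}
    for u, v, sign in signed_edges:
        adjacency[u].append((v, sign))
        adjacency[v].append((u, sign))

    camp = {}
    for start in nodes:
        if start in camp:
            continue
        camp[start] = 0
        queue = deque([start])
        while queue:
            u = queue.popleft()
            for v, sign in adjacency[u]:
                # Friends share a camp, enemies sit in opposite camps.
                expected = camp[u] if sign > 0 else 1 - camp[u]
                if v not in camp:
                    camp[v] = expected
                    queue.append(v)
                elif camp[v] != expected:
                    return None  # contradiction: some cycle is unbalanced
    camp_a = {n for n, c in camp.items() if c == 0}
    camp_b = {n for n, c in camp.items() if c == 1}
    return camp_a, camp_b


# One positive edge: a balanced triad.
balanced = split_into_two_camps(
    ["A", "B", "C"], [("A", "B", 1), ("B", "C", -1), ("A", "C", -1)]
)
# Two positive edges: an unbalanced triad.
unbalanced = split_into_two_camps(
    ["A", "B", "C"], [("A", "B", 1), ("B", "C", 1), ("A", "C", -1)]
)
```

The two toy triads agree with the rule stated above: the triad with one positive edge splits cleanly, while the triad with two positive edges does not.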
III-A Data Preparation
As mentioned in the related work, we use the CBDB API (https://projects.iq.harvard.edu/cbdb/cbdb-api) to collect data, and then clean the collected data. We assign each person to a dynasty according to the following rules:
- The dynasty is marked in the record.
- The year of birth falls within the years of the dynasty.
- The year of death falls within the years of the dynasty.
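The rules above can be sketched as a small function. This is only an illustration under stated assumptions: the record layout (`dynasty`, `birth_year`, `death_year` keys) is hypothetical and does not reflect the actual CBDB API fields, the rules are assumed to be tried in the listed order as fallbacks, and the reign years are the commonly cited ones for the Tang (618 to 907) and Song (960 to 1279).

```python
# Commonly cited reign years; extend with other dynasties as needed.
DYNASTY_YEARS = {"Tang": (618, 907), "Song": (960, 1279)}


def assign_dynasty(person):
    """Assign a dynasty to a person record (a dict with hypothetical keys)."""
    # Rule 1: the dynasty is already marked in the record.
    marked = person.get("dynasty")
    if marked:
        return marked
    # Rules 2 and 3: fall back to the birth year, then the death year.
    for key in ("birth_year", "death_year"):
        year = person.get(key)
        if year is None:
            continue
        for name, (start, end) in DYNASTY_YEARS.items():
            if start <= year <= end:
                return name
    return None  # no rule applies; the person stays unassigned


examples = [
    assign_dynasty({"dynasty": "Song"}),
    assign_dynasty({"birth_year": 1021}),
    assign_dynasty({"birth_year": 930, "death_year": 990}),
    assign_dynasty({"birth_year": 930}),
]
```

Whether the rules should act as ordered fallbacks or as alternatives of equal weight is a design choice that the rule list does not settle; the ordering here simply follows the list.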
We use NetworkX (https://networkx.github.io/) to compute some statistics on the networks: the number of nodes, the number of edges, the average clustering coefficient, and the average path length. The results are shown in Table II, and Figure 1 shows the degree distributions of the networks. They indicate that the social network of the Tang dynasty is incomplete, while the other networks exhibit power-law and small-world phenomena. In the rest of this work, we use the Song dynasty social network.
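These statistics can be computed with a few NetworkX calls. The sketch below is illustrative rather than the exact script behind Table II; the helper name `network_summary` and the toy graph are assumptions. Because the average shortest path length is only defined for a connected graph, the sketch assumes it is measured on the largest connected component.

```python
import networkx as nx


def network_summary(graph):
    """Basic statistics of an undirected graph."""
    # Average path length needs a connected graph, so restrict it to
    # the largest connected component (an assumption of this sketch).
    largest = max(nx.connected_components(graph), key=len)
    core = graph.subgraph(largest)
    return {
        "nodes": graph.number_of_nodes(),
        "edges": graph.number_of_edges(),
        "avg_clustering": nx.average_clustering(graph),
        "avg_path_length": nx.average_shortest_path_length(core),
    }


# Toy network: a triangle A-B-C, a pendant node D, and an isolated node E.
toy = nx.Graph()
toy.add_edges_from([("A", "B"), ("B", "C"), ("A", "C"), ("C", "D")])
toy.add_node("E")
summary = network_summary(toy)
```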
To get an intuitive impression, we use Gephi (https://gephi.org/) to visualize the network, producing Figure 2 and Figure 3. They show that most of the important people are contained in one large connected component, which exhibits the small-world phenomenon. Focusing on a single person, e.g. Wang Anshi, one finds that even someone he did not know, e.g. Qin Hui (https://en.wikipedia.org/wiki/Qin_Hui), can be reached in a few steps, for example through Sun Di.
III-B Domain Problem
After data preparation, we have an undirected weighted graph $G = (V, E, w)$, where the weight $w_e$ of an edge $e = (u, v)$ is the number of records between persons $u$ and $v$. We want to answer questions that people care about. For example, what are the relationships among the Eight Great Prose Masters of the Tang and Song (https://en.wikipedia.org/wiki/Eight_Masters_of_the_Tang_and_Song)? Or who are the central figures in the New Policies of the Song dynasty (https://en.wikipedia.org/wiki/New_Policies_(Song_dynasty))?
To answer such questions about the relationships among a group of people, we need to do some data modeling.
III-C Data Model
III-C1 Signed Networks
As mentioned in the related work on signed graphs, answering these domain problems requires a signed graph, which makes each relationship more meaningful by marking it as friendly or unfriendly. We assign a sign to each edge in $G$ to obtain a signed graph. In total, 445 kinds of relationship were manually annotated. Some of the sign rules (top 10) are shown in Table III. The table shows that positive relationships are far more numerous than negative ones.
|Relation||English||Count||Relation||English||Count||Relation||English||Count|
|为Y作墓志铭||Wrote an epitaph for Y||10189||弹劾||Impeached Y||400||被致书由Y||Received a letter from Y||3624|
|墓志铭由Y所作||Epitaph written by Y||8502||被Y弹劾||Impeached by Y||385||致书Y||Wrote a letter to Y||3302|
|书序由Y所作||Book preface written by Y||4774||遭到Y的反对/攻讦||Opposed/attacked by Y||348||同僚||Colleague||538|
|为Y所著书作序||Wrote a preface for Y’s book||4323||反对/攻讦||Opposed/attacked Y||334||未详||Unknown||353|
|书跋由Y所作||Book postscript written by Y||4111||反对/不支持Y的政策||Opposed/did not support Y’s policies||297||(暂时保留，待删除：吏部供职)||(Temporarily retained, pending deletion: served in the Ministry of Personnel)||331|
|祭文由Y所作||Memorial text written by Y||3975||其政策被Y反对/不支持||Policies opposed/not supported by Y||281||上司为Y||Superior was Y||213|
|为Y所著书作跋||Wrote a postscript for Y’s book||3868||遭Y排挤||Pushed aside by Y||199||为Y之考官||Examiner of Y||167|
|为Y作祭文||Wrote a memorial text for Y||3667||排挤||Pushed aside Y||196||奏录Y之文||Recorded Y’s writings||20|
|为Y作临別赠言(送別诗、序)||Wrote parting words for Y (farewell poem or preface)||3623||得罪Y||Offended Y||177||同场屋/同应举||Sat the same examination||20|
|临別得到Y所作赠言(送別诗、序)||Received parting words from Y (farewell poem or preface)||3520||不合||Did not get along||152||以宦官事Y||Served Y as a eunuch||18|
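To make the signing step concrete, the following sketch maps relation labels to signs and aggregates the records of each pair of people. It rests on several assumptions that are not stated in the text: the small `RELATION_SIGN` excerpt (epitaphs and prefaces as positive, impeachment and attacks as negative) stands in for the 445 manually annotated labels, unlisted labels are treated as unsigned, and the final sign of a pair is taken from the majority of its signed records.

```python
from collections import defaultdict

# Illustrative excerpt of a label-to-sign mapping (assumed, not the full annotation).
RELATION_SIGN = {
    "为Y作墓志铭": +1,    # wrote an epitaph for Y
    "为Y所著书作序": +1,  # wrote a preface for Y's book
    "弹劾": -1,           # impeached Y
    "反对/攻讦": -1,       # opposed/attacked Y
}


def sign_edges(records):
    """Aggregate (person_a, person_b, relation) records into signed edges."""
    tally = defaultdict(lambda: {"pos": 0, "neg": 0})
    for a, b, relation in records:
        sign = RELATION_SIGN.get(relation, 0)
        if sign == 0:
            continue  # unsigned or unknown label: ignored in this sketch
        pair = tuple(sorted((a, b)))  # the graph is undirected
        tally[pair]["pos" if sign > 0 else "neg"] += 1

    signed = {}
    for pair, counts in tally.items():
        diff = counts["pos"] - counts["neg"]
        # Majority vote; a tie yields sign 0 (assumed convention).
        signed[pair] = {**counts, "sign": (diff > 0) - (diff < 0)}
    return signed


records = [
    ("A", "B", "为Y作墓志铭"),
    ("B", "A", "为Y所著书作序"),
    ("A", "B", "弹劾"),
    ("A", "C", "弹劾"),
    ("A", "C", "未详"),
]
signed = sign_edges(records)
```

Keeping the positive and negative counts alongside the final sign lets a report show all three views of a relationship.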
III-C2 Top and Central People
Centrality is a common way to quantify the importance of the vertices in a network. Many variants of it are used in social network analysis to identify and highlight specific nodes. To find the important people in our network, we use Degree Centrality, Betweenness Centrality, Closeness Centrality, and Eigenvector Centrality. Table IV lists the top 15 figures ordered by Degree Centrality. Most of the people with high degree centrality are famous historical figures.
Degree Centrality measures the number of links a node has. For a graph $G = (V, E)$, the degree centrality of a vertex $v$ is:
$$C_D(v) = \frac{\deg(v)}{|V| - 1}$$
where $\deg(v)$ is the degree of $v$. The degree centrality values are normalized by dividing by the maximum possible degree in a simple graph, $|V| - 1$. It shows that people who know many other people are indeed important and powerful, e.g. Wang Anshi, who once served as prime minister and implemented the reforms known as the New Policies (https://en.wikipedia.org/wiki/New_Policies_(Song_dynasty)). These reforms involved many areas, such as finance, the military, and education. As the reformer, Wang Anshi was at the political center and was in contact with many people. Moreover, as in many other real-world networks, this measure has a ‘long tail’ distribution in our network: most people have a low degree centrality.
Betweenness Centrality is a centrality measure of a vertex within a graph. Vertices that occur on many shortest paths between other vertices have higher betweenness than those that do not. The betweenness of a vertex $v$ is computed as follows:
$$C_B(v) = \sum_{s, t \in V} \frac{\sigma(s, t \mid v)}{\sigma(s, t)}$$
where $\sigma(s, t)$ is the number of shortest $(s, t)$-paths and $\sigma(s, t \mid v)$ is the number of those paths passing through some node $v$ other than $s, t$. It measures a person’s ability to act as a bridge between other people. In our networks, some people with high degree centrality, such as Lv Zuqian, may not have high betweenness centrality.
Closeness Centrality of a node $u$ is the reciprocal of the average shortest path distance to $u$ over all other reachable nodes. The result is “a ratio of the fraction of actors in the group who are reachable, to the average distance” from the reachable actors. It is defined as follows:
$$C(u) = \frac{n - 1}{\sum_{v=1}^{n-1} d(v, u)}$$
where $d(v, u)$ is the shortest-path distance between $v$ and $u$, and $n$ is the number of nodes that can reach $u$. It measures a person’s independence, influence, and ability to transmit information.
Eigenvector Centrality computes the centrality of a node based on the centrality of its neighbors. The eigenvector centrality of node $i$ is the $i$-th element of the vector $x$ defined by
$$A x = \lambda x$$
where $A$ is the adjacency matrix of the graph $G$ with eigenvalue $\lambda$. The power iteration method is used to compute the eigenvector, with a default limit of 100 iterations. Eigenvector centrality assigns scores to vertices according to the scores their neighbors receive. It takes into account both the number of people someone knows and the power of those people.
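The four measures can be computed together with NetworkX, which the data preparation step already uses. The sketch below is one possible arrangement, not the code that produced Table IV; the helper name `centrality_table` and the toy graph are assumptions. It ranks nodes by degree centrality and reports the other three measures alongside.

```python
import networkx as nx


def centrality_table(graph, top=15):
    """Return the top nodes by degree centrality with all four measures."""
    measures = {
        "degree": nx.degree_centrality(graph),
        "betweenness": nx.betweenness_centrality(graph),
        "closeness": nx.closeness_centrality(graph),
        # Power iteration, capped at 100 iterations.
        "eigenvector": nx.eigenvector_centrality(graph, max_iter=100),
    }
    ranked = sorted(
        graph.nodes, key=lambda n: measures["degree"][n], reverse=True
    )
    return [
        (node, {name: scores[node] for name, scores in measures.items()})
        for node in ranked[:top]
    ]


# Toy network: a triangle A-B-C with a pendant node D attached to C.
toy = nx.Graph([("A", "B"), ("B", "C"), ("A", "C"), ("C", "D")])
rows = centrality_table(toy, top=3)
top_node, top_scores = rows[0]
```

In the toy graph, node C is linked to everyone and is the only bridge to D, so it leads on every measure; in a real network the four rankings can diverge.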
III-C3 Subgraph Extraction
To measure social balance, we propose Algorithm 1, which extracts the subgraph centered on a set of seed nodes up to a given depth and thereby better reflects the relationships among the seed nodes. At the smallest depth setting, the subgraph contains only the direct relationships among the people we care about. Because of the small-world phenomenon, the depth should be less than 4 in most cases, since the number of nodes in the subgraph grows very quickly with depth.
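One plausible reading of this extraction step is a breadth-first expansion from the seed nodes. The sketch below is an interpretation, not a transcription of Algorithm 1: the function name `extract_subgraph`, the adjacency-dictionary input, and the convention that depth 0 keeps only the seeds themselves are all assumptions.

```python
from collections import deque


def extract_subgraph(adjacency, seeds, depth):
    """Collect nodes within `depth` hops of any seed, plus the edges among them.

    adjacency maps each node to the set of its neighbors.
    """
    distance = {seed: 0 for seed in seeds}
    queue = deque(seeds)
    while queue:
        u = queue.popleft()
        if distance[u] == depth:
            continue  # do not expand beyond the requested depth
        for v in adjacency.get(u, ()):
            if v not in distance:
                distance[v] = distance[u] + 1
                queue.append(v)

    kept = set(distance)
    edges = {
        tuple(sorted((u, v)))
        for u in kept
        for v in adjacency.get(u, ())
        if v in kept
    }
    return kept, edges


adjacency = {
    "A": {"B", "C"},
    "B": {"A"},
    "C": {"A", "D"},
    "D": {"C", "F"},
    "F": {"D"},
}
direct_nodes, direct_edges = extract_subgraph(adjacency, ["A", "C"], depth=0)
wider_nodes, wider_edges = extract_subgraph(adjacency, ["A", "C"], depth=1)
```

Each extra level of depth pulls in all neighbors of the current frontier, which is why the subgraph grows so quickly in a small-world network.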
IV System Design
IV-A Problem Definition
In this section, we present the workflow of our framework, shown in Figure 4. It includes signed graph modeling, subgraph extraction, computation, and visualization. The final output consists of three parts: Top and Central People, Direct Relationship, and Group Partition.
After modeling, we have a signed subgraph and the top central people. We then want to know how to partition the subgraph into different groups based on the social relationships. This can be formally defined as follows:
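(CC problem). Let $G = (V, E, s)$ be a signed graph and $w_e$ be a nonnegative edge weight associated with each edge $e \in E$. The correlation clustering problem is the problem of finding a partition $P$ of $V$ such that the imbalance $I(P)$ is minimized. For a partition $P = \{S_1, \dots, S_l\}$, $I(P)$ is defined as:
$$I(P) = \sum_{e \in E^{-}_{\mathrm{in}}(P)} w_e + \sum_{e \in E^{+}_{\mathrm{out}}(P)} w_e$$
where $E^{-}_{\mathrm{in}}(P)$ is the set of negative edges whose endpoints lie in the same cluster of $P$, and $E^{+}_{\mathrm{out}}(P)$ is the set of positive edges whose endpoints lie in different clusters.
As mentioned in the related work, the CC problem is NP-hard, and several algorithms can provide solutions.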
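The imbalance of a given partition is straightforward to evaluate, even though minimizing it is hard. The following minimal sketch assumes edges are `(u, v, sign, weight)` tuples and that the partition is given as a dictionary from node to cluster label; both representations are illustrative choices. It adds up the weights of negative edges inside clusters and of positive edges between clusters.

```python
def imbalance(signed_edges, cluster_of):
    """Total weight of edges that disagree with the partition."""
    total = 0.0
    for u, v, sign, weight in signed_edges:
        same_cluster = cluster_of[u] == cluster_of[v]
        # A negative edge inside a cluster or a positive edge across
        # clusters counts against the partition.
        if (sign < 0 and same_cluster) or (sign > 0 and not same_cluster):
            total += weight
    return total


edges = [("A", "B", +1, 2), ("B", "C", -1, 1), ("A", "C", -1, 3)]
perfect = imbalance(edges, {"A": 0, "B": 0, "C": 1})
single_group = imbalance(edges, {"A": 0, "B": 0, "C": 0})
misplaced = imbalance(edges, {"A": 0, "B": 1, "C": 1})
```

In this toy example, putting A and B together and C apart satisfies every edge, while the other two partitions are penalized by the weights of the edges they violate.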
IV-B1 Graph Partition
Doreian and Mrvar proposed a heuristic approach: a simple greedy neighborhood search procedure that assumes prior knowledge of the number of clusters in the solution. This heuristic is implemented in the software Pajek (http://vlado.fmf.uni-lj.si/pub/networks/pajek/). It can serve as a solution algorithm for our problem.
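To give a feel for this kind of heuristic, the sketch below implements a simplified relocation-only local search with random restarts for a fixed number of clusters `k`. It is not Pajek's implementation and may differ from the published procedure in its neighborhood moves and stopping rule; the function names, the `(u, v, sign, weight)` edge format, and the restart count are assumptions.

```python
import random


def imbalance(edges, cluster_of):
    """Weight of negative edges inside clusters plus positive edges across."""
    return sum(
        weight
        for u, v, sign, weight in edges
        if (sign < 0) == (cluster_of[u] == cluster_of[v])
    )


def greedy_partition(nodes, edges, k, restarts=20, seed=0):
    """Relocation-based local search for a k-cluster partition."""
    rng = random.Random(seed)
    best, best_cost = None, float("inf")
    for _ in range(restarts):
        cluster_of = {n: rng.randrange(k) for n in nodes}
        cost = imbalance(edges, cluster_of)
        improved = True
        while improved:
            improved = False
            for n in nodes:
                keep = cluster_of[n]
                for c in range(k):
                    if c == keep:
                        continue
                    cluster_of[n] = c
                    new_cost = imbalance(edges, cluster_of)
                    if new_cost < cost:  # accept strictly better moves only
                        cost, keep, improved = new_cost, c, True
                cluster_of[n] = keep  # settle on the best cluster found
        if cost < best_cost:
            best, best_cost = dict(cluster_of), cost
    return best, best_cost


nodes = ["A", "B", "C", "D"]
edges = [("A", "B", 1, 1), ("C", "D", 1, 1), ("A", "C", -1, 1), ("B", "D", -1, 1)]
partition, cost = greedy_partition(nodes, edges, k=2)
```

A single run of such a search can stall on a plateau (for instance, when all nodes start in one cluster in the toy graph), which is why the sketch keeps the best result over several random starts.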
IV-B2 Community Detection
IV-B3 Network Embedding
Network embedding is a newer way to model the signed graph. SiNE is a deep learning framework for signed network embedding. It defines a new objective function guided by extended structural balance theory. The framework has two hidden layers and embeds each node into a $d$-dimensional vector (the SiNE paper reports the value of $d$ that achieved the best performance). After the nodes are embedded, K-means clustering can be used to partition them into different groups.
Overall, the problem remains open to a certain extent. Different datasets, research questions, and algorithms lead to different results, and there is no silver bullet.
V-A Development Tools
V-B A Case Study of the Eight Great Prose Masters of the Song
To study the relationships among the Great Prose Masters of the Song (Su Shi, Wang Anshi, Ouyang Xiu, Zeng Gong, Su Zhe, Su Xun), we input these six people and set the depth so that only the relationships among them are taken into consideration. We choose Community Detection as our algorithm. The visualization system then produces a three-part report.
V-B1 Top Central People
First, we obtain the central people.
Figure 5 shows that Su Shi, Wang Anshi, and Ouyang Xiu all have high centrality values, indicating that they are important nodes in the network. Many people in the social network know each other through them. This result agrees with common sense: these three are better known to the general public than the other three.
V-B2 Direct Relationship
The second part reports the direct relationships among the given people.
We describe in detail the positive relationships, the negative relationships, and the final signed relationships among them. Figure 6 shows that most of their relationships are positive and that the negative relationships stem mainly from differences in political opinion. It is worth pointing out that the relationship between Ouyang Xiu and Wang Anshi is more complicated.
In the early days, Ouyang Xiu greatly appreciated Wang Anshi’s talent. He recommended him to the court several times, and Wang Anshi was very grateful for this recognition. Later, when Wang Anshi presided over the reforms, Ouyang Xiu’s political views differed from his. Existing studies disagree about the relationship between the two in this later period. According to our analysis, however, the relationship remained positive: after Ouyang Xiu’s death, Wang Anshi wrote a memorial text for him. We have found relevant research that supports this finding.
V-B3 Group Partition
Finally, Figure 7 shows the results of the graph partition. The system places Su Shi, Su Zhe, and Su Xun in one group and Wang Anshi, Ouyang Xiu, and Zeng Gong in another.
In fact, Su Shi, Su Zhe, and Su Xun are a family, like the Dumas family and the Brontë family. Su Xun is the father of Su Shi and Su Zhe. Although they are a family, their personalities, works, and experiences are quite different. But in the network of these six figures, all members of the Su family opposed Wang’s opinions, which matches the historical facts. Compared with the Su family, Ouyang Xiu and Zeng Gong were friendlier to Wang. When a different depth is used, the system gives different results: the partition of these six people within a larger network is not stable and is not the same as the one above.
VI Conclusion and Future Work
In this paper, we present a new research framework for exploring the social relationships of historical people. Based on the proposed signed social network model and group partition algorithm, we have built an application to help people analyze and understand the social relationships of the ancients. With our framework, social research questions can be transformed into computing tasks. People can use our application not only to visualize the social networks of ancient figures but also to understand the differing opinions in the literature more clearly. In the case study, our system produced information consistent with the facts and with experts’ judgment. Several tasks remain for future work. For example, the CBDB dataset needs to be enriched and improved. How to apply machine learning methods to label social network edges is an open research problem. Further study is also required to make graph partitioning and structural balance analysis more reasonable and stable.
This paper was inspired by the Peking University Digital Humanities Forum and Prof. Luo’s Network Science course.
- [1] L. Naichang, “Su Shi’s association with Wang Anshi,” Journal of Northeast Normal University: a Philosophy and Social Science Edition, no. 3, pp. 45–51, 1981.
- [2] G. Yongxin, “The friendship between Ouyang Xiu and Wang Anshi,” Literary Heritage, vol. 5, p. 016, 2001.
- [3] Y. Shihua and Z. Guangyu, “Sharing weal and woe and supporting each other - on friendship between Su Shi and Wang Gong,” Journal of Jiangsu University of Science and Technology (Social Science Edition), vol. 13, no. 2, pp. 50–58, 2013.
- [4] Y. Zhu, J. Yu, and J. Wu, “Chro-Ring: a time-oriented visual approach to represent writer’s history,” The Visual Computer, vol. 32, no. 9, pp. 1133–1149, 2016.
- [5] M. A. Fuller, “The China Biographical Database user’s guide,” 2015.
- [6] C.-L. Liu and K.-F. Luo, “Tracking words in Chinese poetry of Tang and Song dynasties with the China Biographical Database,” arXiv preprint arXiv:1611.06320, 2016.
- [7] C.-L. Liu, “Flexible computing services for comparisons and analyses of classical Chinese poetry,” arXiv preprint arXiv:1709.05729, 2017.
- [8] X. Guo, “An investigation of the digitization of Chinese genealogical records,” Taiwan University of Information Network and Multimedia Institute Dissertation, pp. 1–40, 2016.
- [9] S.-H. Sie, H.-R. Ke, and S.-B. Chang, “Development of a text retrieval and mining system for Taiwanese historical people,” in Pacific Neighborhood Consortium Annual Conference and Joint Meetings (PNC), 2017. IEEE, 2017, pp. 56–62.
- [10] J. Leskovec, D. Huttenlocher, and J. Kleinberg, “Signed networks in social media,” in Proceedings of the SIGCHI Conference on Human Factors in Computing Systems. ACM, 2010, pp. 1361–1370.
- [11] F. Heider, “Attitudes and cognitive organization,” The Journal of Psychology, vol. 21, no. 1, pp. 107–112, 1946.
- [12] D. Easley and J. Kleinberg, Networks, Crowds, and Markets: Reasoning about a Highly Connected World. Cambridge University Press, 2010.
- [13] M. Levorato, L. Drummond, Y. Frota, and R. Figueiredo, “An ILS algorithm to evaluate structural balance in signed social networks,” in Proceedings of the 30th Annual ACM Symposium on Applied Computing. ACM, 2015, pp. 1117–1122.
- [14] A. Srinivasan, “Local balancing influences global structure in social networks,” Proceedings of the National Academy of Sciences, vol. 108, no. 5, pp. 1751–1752, 2011.
- [15] P. Doreian and A. Mrvar, “A partitioning approach to structural balance,” Social Networks, vol. 18, no. 2, pp. 149–168, 1996.
- [16] P. Doreian and A. Mrvar, “Partitioning signed social networks,” Social Networks, vol. 31, no. 1, pp. 1–11, 2009.
- [17] V. A. Traag and J. Bruggeman, “Community detection in networks with positive and negative links,” Physical Review E, vol. 80, no. 3, p. 036115, 2009.
- [18] S. Wang, J. Tang, C. Aggarwal, Y. Chang, and H. Liu, “Signed network embedding in social media,” in Proceedings of the 2017 SIAM International Conference on Data Mining. SIAM, 2017, pp. 327–335.
- [19] A. Grover and J. Leskovec, “node2vec: Scalable feature learning for networks,” in Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM, 2016, pp. 855–864.
- [20] L. v. d. Maaten and G. Hinton, “Visualizing data using t-SNE,” Journal of Machine Learning Research, vol. 9, no. Nov, pp. 2579–2605, 2008.
- [21] B. Perozzi, R. Al-Rfou, and S. Skiena, “DeepWalk: Online learning of social representations,” in Proceedings of the 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM, 2014, pp. 701–710.
- [22] J. Saramäki, M. Kivelä, J.-P. Onnela, K. Kaski, and J. Kertesz, “Generalizations of the clustering coefficient to weighted complex networks,” Physical Review E, vol. 75, no. 2, p. 027105, 2007.
- [23] M. Grandjean, “A social network analysis of Twitter: Mapping the digital humanities community,” Cogent Arts & Humanities, vol. 3, no. 1, p. 1171458, 2016.
- [24] S. Wasserman and K. Faust, Social Network Analysis: Methods and Applications. Cambridge University Press, 1994, vol. 8.
- [25] M. Girvan and M. E. Newman, “Community structure in social and biological networks,” Proceedings of the National Academy of Sciences, vol. 99, no. 12, pp. 7821–7826, 2002.