Gandhipedia: A one-stop AI-enabled portal for browsing Gandhian literature, life-events and his social network

06/05/2020 ∙ by Sayantan Adak, et al.

We introduce an AI-enabled portal that visualizes Mahatma Gandhi's life events by constructing temporal and spatial social networks from the Gandhian literature. Applying an ensemble of methods drawn from NLTK, Polyglot and spaCy, we extract the key persons and places mentioned in Gandhi's written works. We visualize these entities, and the connections between them based on co-mentions within the same time frame, as networks in an interactive web portal. Clicking a node in the network fires a search query about that entity; all information about the entity in the book from which the network was constructed is then retrieved and presented on the portal. Overall, this system can be used as a digital and user-friendly resource to study Gandhian literature.

1. Introduction

Mahatma Gandhi (https://en.wikipedia.org/wiki/Mahatma_Gandhi) led India’s non-violent independence movement against British rule and, in South Africa, advocated for the civil rights of Indians. Gandhi’s life has been a major source of inspiration for many people. Much has been written about his life, and this literature is of great relevance not only to historians but also to members of the general public interested in learning about his life and his ideologies.

A compilation of these documents, together with a visual depiction of Gandhi’s life, could be of unprecedented value to many people.

State-of-the-art: Websites such as gandhiheritageportal.org and mkgandhi.org contain major works of Gandhi, including his letters, magazines, newspaper articles, and speeches, in a digital format that provides easy access to the literature on Gandhi. In addition, a good number of scholarly digital libraries hold sizable collections of digital resources on Gandhi (JSTOR, HathiTrust Digital Library, World EBook Library, South Asian Archives, Internet Archives, Project Gutenberg, Shodhganga, WorldCat). The abundance of literature on these platforms results in information overload and makes it difficult to gather the information relevant to a user’s interest, at a time when life has become faster than ever and users tend to disengage very quickly.

Contributions: In this demo paper, we discuss the development of Gandhipedia, an interactive web portal that aims to digitize all the writings of Mahatma Gandhi and present them in a well-organised format that can serve as a useful and easy-to-access resource for users. We have constructed the spatial and temporal networks for seven key texts authored by Mahatma Gandhi. The nodes in these networks are entities (people or places mentioned in these texts), and the edges correspond to co-mentions of entities within a pre-defined time window. Clicking a node fires a query about the corresponding entity and searches the text to return excerpts from the different chapters in which the entity occurs. Consider a user interested in all occurrences of “Gopal Krishna Gokhale” (https://en.wikipedia.org/wiki/Gopal_Krishna_Gokhale) in Gandhi’s autobiography, The Story of My Experiments with Truth (https://en.wikipedia.org/wiki/The_Story_of_My_Experiments_with_Truth). The user can look into the network, quickly find the node “Gokhale”, and click on it to obtain all information about “Gokhale”, organised by chapter and by time, without having to skim through the entire autobiography to find and manually aggregate this information. In addition, the portal also enables text-based query search across 100 volumes of the Collected Works of Mahatma Gandhi. The current portal is hosted at http://gandhipedia150.in.

2. Gandhipedia Architecture

We leverage the Collected Works of Mahatma Gandhi (CWMG) available at (Preservation and Trust, 2013), a collection of 100 volumes of letters, speeches and books written by Gandhi for the construction of the portal. Figure 1 shows the detailed Gandhipedia portal architecture. It consists of four distinct modules: (i) data module, (ii) query processing module, (iii) network creation module, and (iv) user interface module. The arrows represent the direction of data flow. Next, we describe each module in detail.

Data module: We use the pdf2xml tool (https://sourceforge.net/projects/pdf2xml/) to convert the volumes from PDF format to XML format. The XML files are processed to detect chapter boundaries. Each chapter refers to a book, letter, speech, or newspaper article. Each chapter is stored individually in text format in MongoDB (https://www.apress.com/gp/book/9781430230519).

Network creation module: We currently implement this module for seven books authored by Mahatma Gandhi, including his autobiography. An example temporal network is shown in Figure 2. Different colors represent different communities. The network construction methodology is as follows. (i) Identifying person/place entities: We recognize named entities (person/place) occurring in the different chapters of the books. We use an ensemble of three different NER libraries: NLTK (https://www.nltk.org/), Polyglot (https://draquet.github.io/PolyGlot/), and spaCy (https://spacy.io/). We only consider entities that were identified by at least two libraries. (ii) Filtering noisy entities: The above NER mechanism returns several common nouns that are neither people nor places. We filter out these entities using a list of cities and countries from NLTK, and we use WordNet (https://wordnet.princeton.edu/) to filter improper nouns. (iii) Temporal network creation and clustering: Temporal network creation consists of three steps. First, each chapter is mapped to a specific year, and all entities in a chapter are associated with that year. Second, all entities in a particular year are linked to each other; in addition, entities of year y are linked with entities of year y-1 and year y+1. The hypothesis behind the construction of such networks is that entities mentioned close by in time could possibly be ‘socially’ related. Finally, community detection is performed using the standard Louvain and Infomap clustering algorithms (Held et al., 2016).
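
The voting and linking steps described above can be illustrated with a minimal Python sketch. It is not the portal's code: the function names `vote_entities` and `build_temporal_edges`, the `window` parameter (assumed here to be one year on either side), and the placeholder entities and years are all hypothetical, and the lists standing in for the NLTK, Polyglot and spaCy outputs are made up rather than produced by those libraries.

```python
from collections import Counter
from itertools import combinations


def vote_entities(*tagger_outputs, min_votes=2):
    """Keep entities reported by at least `min_votes` of the taggers."""
    votes = Counter()
    for entities in tagger_outputs:
        votes.update(set(entities))  # each tagger votes once per entity
    return {entity for entity, n in votes.items() if n >= min_votes}


def build_temporal_edges(entities_by_year, window=1):
    """Link entities mentioned in the same year or within `window` years."""
    edges = set()
    for year, entities in entities_by_year.items():
        # Same-year co-mentions: every pair of entities is linked.
        for a, b in combinations(sorted(entities), 2):
            edges.add((a, b))
        # Links to the following years inside the window; links to earlier
        # years are added when the loop visits those earlier years.
        for offset in range(1, window + 1):
            for a in entities:
                for b in entities_by_year.get(year + offset, ()):
                    if a != b:  # no self-loops
                        edges.add(tuple(sorted((a, b))))
    return edges


# Made-up tagger outputs for one chapter (stand-ins for the three libraries).
nltk_out = ["Gokhale", "Durban", "Truth"]
polyglot_out = ["Gokhale", "Durban"]
spacy_out = ["Gokhale", "Natal"]
kept = vote_entities(nltk_out, polyglot_out, spacy_out)

# Placeholder entities and years, not taken from the corpus.
entities_by_year = {
    1900: {"PersonA", "PersonB"},
    1901: {"PersonC"},
    1903: {"PersonD"},
}
edges = build_temporal_edges(entities_by_year)
```

In this toy input, only the entities named by two or more taggers survive the vote, and `PersonD` stays isolated because no other entity falls within one year of 1903. The resulting edge list could then be passed to a community detection routine such as Louvain or Infomap; that step is omitted here.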

Figure 1. Architecture of Gandhipedia.
Figure 2. Sample temporal people graph shown in web portal. The broad level interpretations of each of the communities are noted in a line of text below the network.

Query processing module: The query processing module handles two distinct types of queries: queries triggered from the network visualization and full-text search, using ElasticSearch (Kononenko et al., 2014). Search through the network: In the network visualization, when a node (person/place) is clicked, a search query with the entity as the query term is fired. The search results are organised by time and by chapter and returned to the user. Search through the CWMG: The current full-text search supports the first 40 volumes of CWMG in textual format, highlighting the query in its results as shown in Figure 3. General query retrieval is supported using the MongoDB database.
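
As a rough illustration of the network-driven search, the sketch below looks up an entity in a list of chapter records and groups the matching excerpts by year and then by chapter. It uses plain Python instead of ElasticSearch or MongoDB, and the record fields (`title`, `year`, `text`), the function name `search_entity`, and the sample chapters are assumptions made for this example only.

```python
import re
from collections import defaultdict


def search_entity(chapters, entity, context=40):
    """Return excerpts mentioning `entity`, grouped by year, then chapter."""
    pattern = re.compile(re.escape(entity), re.IGNORECASE)
    grouped = defaultdict(dict)
    for chapter in chapters:
        text = chapter["text"]
        excerpts = []
        for match in pattern.finditer(text):
            # Keep a little text on both sides of each mention.
            start = max(0, match.start() - context)
            end = min(len(text), match.end() + context)
            excerpts.append(text[start:end])
        if excerpts:
            grouped[chapter["year"]][chapter["title"]] = excerpts
    # Order the years chronologically for a time-wise presentation.
    return {year: grouped[year] for year in sorted(grouped)}


# Hypothetical chapter records with placeholder text.
chapters = [
    {"title": "Chapter B", "year": 1901,
     "text": "A placeholder sentence naming Gokhale."},
    {"title": "Chapter A", "year": 1896,
     "text": "Placeholder text with Gokhale once, and Gokhale twice."},
    {"title": "Chapter C", "year": 1901,
     "text": "A placeholder sentence with no mention."},
]
hits = search_entity(chapters, "gokhale")
```

Here `hits` maps each year to the chapters of that year containing the entity, so a front end could render the excerpts year by year and chapter by chapter.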

Figure 3. Sample search result shown on the web portal.

User interface module: This module fetches results from the query processing and the network creation modules. The entity search displays the volumes, books and then chapters, with the searched entity highlighted. The network visualization, built with Dash (https://github.com/plotly/dash) on top of Plotly (https://plot.ly), depicts the generated interactive networks arranged over time.

3. Conclusion and Future Work

We develop Gandhipedia to study Gandhian literature and Gandhi's social networks in an interactive manner. In future, we plan to index more than 100 volumes of CWMG and to offer a multilingual search facility. We also aim to develop timelines showing which personalities and places are mentioned for the first time in the literature, which would support chronological visualisations.

References

  • P. Held, B. Krause, and R. Kruse (2016) Dynamic clustering in social networks using Louvain and Infomap method. In 2016 Third European Network Intelligence Conference (ENIC), Los Alamitos, CA, USA, pp. 61–68.
  • O. Kononenko, O. Baysal, R. Holmes, and M. W. Godfrey (2014) Mining modern repositories with Elasticsearch. In Proceedings of the 11th Working Conference on Mining Software Repositories, MSR 2014, New York, NY, USA, pp. 328–331. ISBN 9781450328630.
  • S. A. Preservation and M. Trust (2013) The Collected Works of Mahatma Gandhi. https://www.gandhiheritageportal.org/the-collected-works-of-mahatma-gandhi [Online; accessed 22-February-2020].