Analyzing Folktales of Different Regions Using Topic Modeling and Clustering

06/09/2022
by Jacob Werzinsky, et al.

This paper employs two major natural language processing techniques, topic modeling and clustering, to find patterns in folktales and reveal cultural relationships between regions. In particular, we used Latent Dirichlet Allocation and BERTopic to extract recurring elements, and K-means clustering to group folktales. Our paper tries to answer two questions: what are the similarities and differences between folktales, and what do they say about culture? We show that the common trends across folktales are family, food, traditional gender roles, mythological figures, and animals. Folktale topics also differ by geographical location: folktales from different regions feature different animals and environments. We were not surprised to find that religious figures and animals are among the common topics in all cultures. However, we were surprised that European and Asian folktales were often paired together. Our results demonstrate the prevalence of certain elements in cultures across the world. We anticipate that our work will be a resource for future research on folktales and an example of using natural language processing to analyze documents in specific domains. Furthermore, since we analyzed the documents only by their topics, more work could be done on the structure, sentiment, and characters of these folktales.
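To make the clustering step concrete, here is a minimal standard-library sketch of grouping short texts with K-means over TF-IDF vectors. It is an illustration under stated assumptions, not the authors' pipeline: the four toy tales, the stopword list, the whitespace tokenizer, the farthest-point initialization, and the helper names (`tfidf_vectors`, `kmeans`) are all hypothetical, and the abstract does not say which features or settings the paper used. The topic modeling step (Latent Dirichlet Allocation, BERTopic) is not shown.

```python
import math
from collections import Counter

# Hypothetical toy corpus and stopword list, for illustration only.
tales = [
    "the fox tricked the crow and stole the cheese",
    "the fox and the wolf shared the cheese in the forest",
    "the farmer cooked rice for the dragon king",
    "the dragon king gave rice to the farmer and his daughter",
]
STOPWORDS = {"the", "and", "in", "for", "to", "his"}


def tfidf_vectors(docs):
    """Return one L2-normalised TF-IDF vector per document."""
    tokenised = [[t for t in d.lower().split() if t not in STOPWORDS] for d in docs]
    vocab = sorted({t for toks in tokenised for t in toks})
    n_docs = len(docs)
    # Document frequency: in how many documents each term appears.
    doc_freq = Counter(t for toks in tokenised for t in set(toks))
    vectors = []
    for toks in tokenised:
        counts = Counter(toks)
        # Term count times a smoothed inverse document frequency.
        vec = [counts[t] * (math.log((1 + n_docs) / (1 + doc_freq[t])) + 1) for t in vocab]
        norm = math.sqrt(sum(x * x for x in vec)) or 1.0
        vectors.append([x / norm for x in vec])
    return vectors


def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(vectors, k, n_iter=20):
    """Plain K-means with a deterministic farthest-point initialisation."""
    centroids = [vectors[0]]
    while len(centroids) < k:
        # Pick the point farthest from its nearest existing centroid.
        farthest = max(vectors, key=lambda v: min(squared_distance(v, c) for c in centroids))
        centroids.append(farthest)
    labels = [0] * len(vectors)
    for _ in range(n_iter):
        # Assignment step: each vector goes to its nearest centroid.
        labels = [min(range(k), key=lambda j: squared_distance(v, centroids[j])) for v in vectors]
        # Update step: each centroid becomes the mean of its members.
        for j in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == j]
            if members:
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels


labels = kmeans(tfidf_vectors(tales), k=2)
print(labels)  # the two fox tales share one label, the two rice tales the other
```

On a real corpus one would more likely use an established library implementation with random restarts; the deterministic initialization here only keeps the toy example reproducible.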


