Bollywood Movie Corpus for Text, Images and Videos

10/11/2017
by Nishtha Madaan, et al.

In the past few years, several datasets have been released for text and images. We present an approach to creating a dataset for use in detecting and removing gender bias from text. We also describe a set of challenges we faced while creating this corpus. In this work, we use movie plots from Wikipedia and movie trailers from YouTube. Our Bollywood Movie corpus contains 4000 movies extracted from Wikipedia and 880 trailers extracted from YouTube, covering movies released between 1970 and 2017. The corpus consists of CSV files with the following data for each movie: Wikipedia title, cast, plot text, co-referenced plot text, soundtrack information, link to the movie poster, caption of the movie poster, number of males in the poster, and number of females in the poster. In addition, the following data is available for each cast member: name, gender, verbs, adjectives, relations, centrality, and mentions. We present preliminary results on the task of bias removal which suggest that the dataset is useful for such tasks.
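The per-movie and per-cast-member fields listed above map naturally onto two record types. The sketch below is only an illustration under stated assumptions: the column names (`title`, `plot`, `males_in_poster`, `females_in_poster`), the field types, and the helper `female_poster_share` are hypothetical and are not taken from the released CSV files, whose actual headers may differ. The sample row is toy data, not an entry from the corpus.

```python
from __future__ import annotations

import csv
import io
from dataclasses import dataclass, field


@dataclass
class CastMember:
    # Per-cast-member fields named in the abstract; types are assumptions.
    name: str
    gender: str
    verbs: list[str] = field(default_factory=list)
    adjectives: list[str] = field(default_factory=list)
    relations: list[str] = field(default_factory=list)
    centrality: float = 0.0
    mentions: int = 0


@dataclass
class Movie:
    # A subset of the per-movie fields; column names are hypothetical.
    title: str
    plot: str
    males_in_poster: int
    females_in_poster: int
    cast: list[CastMember] = field(default_factory=list)

    def female_poster_share(self) -> float:
        """Fraction of people on the poster who are female (0.0 if none)."""
        total = self.males_in_poster + self.females_in_poster
        return self.females_in_poster / total if total else 0.0


def load_movies(text: str) -> list[Movie]:
    """Parse movie rows from CSV text, assuming hypothetical column names."""
    movies = []
    for row in csv.DictReader(io.StringIO(text)):
        movies.append(Movie(
            title=row["title"],
            plot=row["plot"],
            males_in_poster=int(row["males_in_poster"]),
            females_in_poster=int(row["females_in_poster"]),
        ))
    return movies


# Toy input illustrating the assumed layout, not real corpus data.
sample = (
    "title,plot,males_in_poster,females_in_poster\n"
    "Example Movie,A toy plot.,3,1\n"
)
movies = load_movies(sample)
print(movies[0].title, movies[0].female_poster_share())  # Example Movie 0.25
```

Keeping poster counts as integers makes simple gender-ratio statistics, such as the female share of poster appearances, a one-line computation per movie; cast-level fields like verbs and adjectives would be attached through the `cast` list.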
