Measuring Women Representation and Impact in Films over Time

by Luoying Yang, et al.
University of Rochester

Women have long been underrepresented in movies, and only recently has women representation in movies begun to improve. To investigate this improvement and its relationship with a movie's success, we propose a new measure, the female cast ratio, and compare it to the commonly used Bechdel test result. We employ generalized linear regression with an $L_1$ penalty and a Random Forest model to identify the predictors that influence women representation, and evaluate the relationship between women representation and a movie's success in three aspects: revenue/budget ratio, rating and popularity. Three important findings highlight the difficulties women in the film industry face both upstream and downstream. First, female filmmakers, especially female screenplay writers, are instrumental for movies to have better women representation, but the percentage of female filmmakers has been very low. Second, movies that could tell good stories about women often receive lower budgets, which in turn tends to draw more criticism of these films. Finally, the demand for better women representation from moviegoers has not been strong enough to compel the film industry to change, as movies with poor women representation can still be very popular and successful at the box office.





Films are a common entertainment form that fulfills the audience’s desire to make emotional connections to characters and learn about the social world. However, in the past it was very difficult for women to see inspiring counterparts on the big screen. Many key roles in film-making, such as directors and cinematographers, were for many decades dominated almost entirely by men [8], and women did not have enough power to make demands in the film market. Consequently, women have been consistently under-represented in movies. Even when they are present, women are often portrayed in circumscribed and subordinate ways in traditionally feminine (i.e., stereotyped) roles, such as nonprofessionals, homemakers, wives or parents, and sexual gatekeepers [3]. Lacking role models on the big screen is detrimental to young girls, who are discouraged from pursuing ambitions and participating actively in social affairs [4]. Therefore, women under-representation is a critical issue that must be addressed.

In more recent times, women have made inroads into many of these fields, and films have started to respond to female viewers with strong and well-rounded female characters [12, 5]. Many studies and projects are devoted to the evolving feminism in films, centering on both the upstream effects, in which content is structured through the actions of major filmmakers in gendered organizations who presume the public’s preference, and the downstream effects, where audiences respond to content and attitudes are formed and reinforced. In examining gender representation in films, we ask the following two questions: What factors in the film-making process have a significant impact on women representation in films, and do films that feature better women representation commercially outperform those that do not?

To answer these questions, we need to define a proper measure for women representation in films. The Bechdel test is a popular measure for examining how well-rounded and complete the representations of women in media are. The test asks only three questions:

  • Are there at least two women in the film who have names?

  • Do those women talk to each other?

  • Do they talk to each other about something other than a man?

The test is simple and has been widely used in film gender studies to measure women representation [10, 9]. However, it has also been criticized for failing to reveal the hidden structure of gender imbalance [11]. A movie can pass the Bechdel test yet still portray women as auxiliary characters with minimal screen time. For example, both Wonder Woman and The Martian pass the test, yet the importance of female characters differs greatly between the two movies. Therefore, we propose a new measure of women representation in films: the percentage of women in the whole cast. This proposal is based on the intuition that if more women are cast in a film, women will have more representation on the big screen.

In addition to a new measure of women representation, we also consider different aspects of movie success. The box office return is a popular measure of movie success because it is directly linked to profit. However, movie ratings and popularity are also important measures of a movie’s success. Public acclaim and popularity give studios and producers the confidence to make similar films or sequels even when the box office return is not substantially high.

Our analyses are completed in two steps. Drawing on a sample of the most widely distributed films, we first combine a content analysis using the Bechdel Test with film-making data, such as the budget and the gender of crew members, to examine which factors influence women representation during the film-making process. Next, including all film characteristics such as women representation, budget, genre and many others, we examine whether better women representation can increase a film's chance of success while adjusting for other possible confounding variables. In doing so, we contribute to sociological theories about how feminism reciprocally forms and impacts.

The remainder of the paper is organized as follows. In Section 2, we review the background literature on the gender gap in films and on box office performance. In Section 3, we describe the data features and the data collection process. In Section 4, we present the methods and models used throughout this study. We present our experiments and the corresponding results in Section 5. In Section 6, we present and discuss our conclusions and future directions.

Related work

Most studies of gender inequality in media are content analyses [12, 5, 11], which study how women are portrayed in media. While content analysis is beyond the scope of our study, these works have shown that the roles of women on the big screen are evolving. There are also many studies of women under-representation, albeit on a small scale and with simple analyses. Thomas has shown that in the 10 highest-grossing films worldwide in 2016, women speak only 27% of the lines [15]. The finding is novel but the sample size is too small. A team at the Rhodes Information Initiative at Duke University published a report showing that movies which pass the Bechdel Test have a statistically significantly higher return on investment than films that do not [2]. The tech company Shift72 published a similar report, which states that all films that pass the Bechdel Test surpassed the box office returns of films that fail it. However, these reports presented only simple comparisons between movies that pass the test and movies that do not, which are too simple to support the conclusion that “more women in the film means more success (movies are more profitable).” Such analyses failed to adjust for confounding sources such as genre. Linder et al. have shown that, after adjusting for confounding factors such as production budgets, movies with more women representation (passing the Bechdel test) tend to have smaller production budgets and consequently earn less money. However, given the same budget, women representation does not boost the box office return [9]. They conducted a similar analysis of movie critics using the same set of movies and reached a similar conclusion: women representation has little effect on critics [10]. The analysis in these papers is the closest to ours. However, these studies drew a sample of only 974 films from the 2000-2009 decade and considered only a few confounding variables such as budget and genre. Our analysis includes a larger sample from a longer time span with a larger set of confounding variables to provide deeper insights over time.


Three data sources are used in this project: the Bechdel Test Movie List, the Internet Movie Database (IMDb) and the Movie Database (TMDb). The Bechdel Test Movie List contains over 8000 movies crowdsourced through the Internet, each with a flag showing whether it passes the Bechdel Test. IMDb maintains an up-to-date movie dataset that is widely used in movie-related projects. However, since IMDb does not provide an official API, we have only limited access to some of the full data, such as the full list of cast and crew members of each movie. TMDb, on the other hand, is a crowdsourced online movie database similar to IMDb. It is used less often, but has the advantage of providing an API through which we can easily access a variety of movie-related data of interest to us.

Data Acquisition

The Bechdel Test Movie List is downloaded using its API. It contained 8190 movies at the time of download, with 7 fields. The relevant ones include imdbid (on which we can join the other datasets), title, year and rating (0 means no two women, 1 means no talking, 2 means talking about a man, 3 means it passes the test).
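The rating codes described above can be turned into a pass/fail flag as follows. This is a minimal sketch, not the code used in the study: the function name `passes_bechdel` and the two example records are hypothetical, and only the field names and code meanings come from the text.

```python
# Rating codes as described in the text for the Bechdel Test Movie List.
RATING_MEANING = {
    0: "fewer than two named women",
    1: "two named women who never talk to each other",
    2: "they talk, but only about a man",
    3: "passes the test",
}


def passes_bechdel(rating: int) -> bool:
    """Return True only for rating 3, the single passing code."""
    if rating not in RATING_MEANING:
        raise ValueError(f"unexpected Bechdel rating: {rating}")
    return rating == 3


# Hypothetical records using the field names listed above (values invented).
records = [
    {"imdbid": "0000001", "title": "Example A", "year": 1995, "rating": 3},
    {"imdbid": "0000002", "title": "Example B", "year": 2004, "rating": 1},
]

# Binary response keyed by imdbid, ready to be joined with the other sources.
flags = {r["imdbid"]: passes_bechdel(r["rating"]) for r in records}
```

Collapsing the four codes into a binary flag matches the way the test result is later used as the response of a logistic regression.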

The IMDb dataset is downloaded from its website. In the dataset, only the title.basic table is used. It contains 9 fields, including tconst (imdbid), titleType, primaryTitle, originalTitle, isAdult, startYear, endYear, runtimeMinutes and genres.

Getting the TMDb dataset requires more work with its API. The first step is to obtain the tmdbids that correspond to the imdbids within the Bechdel Movie List. Then two more rounds of API calls are needed to obtain the movie details and movie credits associated with each tmdbid. The relevant fields in the movie details are: budget, revenue, vote average, vote count, popularity (a comprehensive measure that TMDb calculates from release date, number of votes, number of views, etc.) and production companies.

Data Preprocessing

For the TMDb fields, the revenue-budget ratio of each movie is calculated to represent its profitability. Movies with a budget of 0 are removed from the list. The first 5 production companies of each movie are one-hot encoded. Next, the movie credits are aggregated to extract the total number of cast members, the female cast ratio, the number of core crew members (directors, producers and screenplay writers) and the female core crew ratios.
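The two derived quantities in this step can be sketched as below. This is an illustration under stated assumptions, not the study's code: we assume each credit entry carries an integer `gender` field in which 1 denotes a woman, and that entries with unknown gender still count toward the total. The function names are hypothetical.

```python
FEMALE = 1  # assumed gender code for women in the credits data


def revenue_budget_ratio(movie):
    """Return revenue / budget, or None when the budget is 0 (movie is dropped)."""
    if movie["budget"] == 0:
        return None
    return movie["revenue"] / movie["budget"]


def female_ratio(people):
    """Share of credit entries coded as female; None for an empty list."""
    if not people:
        return None
    women = sum(1 for person in people if person["gender"] == FEMALE)
    return women / len(people)


# Invented example values, for illustration only.
movie = {"budget": 50, "revenue": 125}
cast = [
    {"name": "A", "gender": 1},
    {"name": "B", "gender": 2},
    {"name": "C", "gender": 2},
    {"name": "D", "gender": 1},
]

ratio = revenue_budget_ratio(movie)   # 125 / 50
cast_share = female_ratio(cast)       # 2 women out of 4 entries
```

The same `female_ratio` helper can be applied separately to the lists of directors, producers and screenplay writers to obtain the core crew ratios.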

Finally, the Bechdel Movie List, IMDb and TMDb datasets are joined according to the imdbid. There are 4232 observations and 67 variables. The continuous and count variables are then min-max normalized in preparation for modeling.
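The join on imdbid can be sketched with plain dictionaries. We assume an inner join here (only movies present in all three sources are kept), which the text does not state explicitly; the helper name `inner_join` and the sample rows are hypothetical.

```python
def inner_join(*tables):
    """Join dicts keyed by imdbid, keeping only ids present in every table."""
    shared = set(tables[0])
    for table in tables[1:]:
        shared &= set(table)
    joined = {}
    for key in sorted(shared):
        row = {}
        for table in tables:
            row.update(table[key])  # later tables add their own fields
        joined[key] = row
    return joined


# Invented rows keyed by imdbid, for illustration only.
bechdel = {"tt1": {"rating": 3}, "tt2": {"rating": 1}}
imdb = {"tt1": {"genres": "Drama"}, "tt2": {"genres": "Action"}, "tt3": {"genres": "Horror"}}
tmdb = {"tt1": {"budget": 10}}

movies = inner_join(bechdel, imdb, tmdb)
```

Under this assumption, a movie missing from any one source drops out of the analysis set, which is one way the joined sample can end up smaller than the Bechdel list itself.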


Our primary goal is to evaluate which factors affect women representation in films and how women representation affects films’ success. Given this purpose, the primary method of our analysis is an explanatory model, which focuses on modeling the true process generating the sample data rather than on making the best prediction. However, we also add a prediction model as a secondary task to evaluate the performance of our primary model. For our primary model, we consider the widely used generalized linear regression with a penalty term for model selection. For our secondary model, we consider Random Forest. In order to compare the coefficient estimates of different variables from the regression model, variables are min-max normalized to $[0, 1]$ so that they are on the same scale, and the same dataset is fitted with both the regression model and the Random Forest model. After obtaining the variable selection results and the predictor rankings from the two methods, we compare the set of selected variables with the top-ranked predictors to see whether they are consistent. For both methods, we split the samples by a 7:3 ratio into training and test sets and compare the prediction accuracy to see whether the two models perform similarly.
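The normalization and the 7:3 split can be sketched as follows. This is a minimal illustration: we assume the usual target range $[0, 1]$ for min-max normalization and a simple random shuffle for the split, and the helper names and the seed are hypothetical.

```python
import random


def min_max(values):
    """Rescale values linearly so the minimum maps to 0 and the maximum to 1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant column carries no information; map it to zeros.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def train_test_split(rows, train_fraction=0.7, seed=0):
    """Shuffle a copy of the rows and cut it into training and test parts."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


budgets = [10, 20, 40]            # invented values
scaled = min_max(budgets)         # smallest becomes 0.0, largest 1.0

train, test = train_test_split(range(10))
```

Putting every continuous predictor on the same scale is what makes the magnitudes of the penalized regression coefficients comparable across variables.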

Generalized Linear Regression with Penalty

Generalized linear regression is a broad class of models that includes regular regression for continuous responses as well as models for discrete responses. The relationship between a dependent variable and one or more independent variables can be explained by the magnitude and sign of the regression coefficient estimates. Depending on the type of response variable, we use regular linear regression for continuous responses and logistic regression for binary responses. In addition, we include an $L_1$ regularization term to shrink the coefficient estimates toward 0 so that only highly correlated variables remain in the model [16]. A tuning parameter controls the degree of shrinkage, so that varying its value results in different estimates. We employ the Bayesian Information Criterion (BIC) to choose the optimal model, since BIC focuses on finding the true model that generates the data. With an error-based or accuracy-based criterion such as cross-validation, when correlated variables enter the model, selecting any one of them can give good predictive performance; BIC, in contrast, selects the variables that yield the best likelihood for the model and removes the rest, which serves our purpose of explanatory modeling.
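The penalized estimate and the selection criterion described above can be written compactly. The notation is ours and is only a sketch: $\ell$ denotes the log-likelihood of the linear or logistic model, $\lambda$ the tuning parameter, $p$ the number of predictors, $n$ the sample size, and $k_\lambda$ the number of nonzero coefficients, which we assume is used as the model size (a common convention for $L_1$-penalized models, not stated in the text).

```latex
\hat{\beta}(\lambda) = \arg\min_{\beta} \Big\{ -\ell(\beta) + \lambda \sum_{j=1}^{p} |\beta_j| \Big\}

\mathrm{BIC}(\lambda) = -2\,\ell\big(\hat{\beta}(\lambda)\big) + k_\lambda \log n
```

Under these assumptions, the chosen model is the one whose $\lambda$ minimizes $\mathrm{BIC}(\lambda)$: larger $\lambda$ sets more coefficients to zero and lowers $k_\lambda$, at the cost of a lower log-likelihood.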

Random Forest

The Random Forest is one of the most effective methods for predictive analysis, as either a classification algorithm or a regression model [6]. By selecting the candidate variables at each split randomly and aggregating the trees for pooled results, Random Forest is known to be robust to noise and outliers, as well as to fine-scale changes in its parameter values, including the number of trees aggregated and the number of variables sampled at each node split. Therefore, we do not manually fine-tune the parameter values to achieve optimal results. The model selection, which is the ranking of predictors by their changes in purity, is done internally through computing the out-of-bag (OOB) error from aggregating the trees.
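To make "change in purity" concrete for the classification case, the sketch below computes the Gini impurity of a set of labels and the decrease achieved by one split. It illustrates the quantity that a forest accumulates per predictor; it is not the implementation used in the study, and the function names are hypothetical.

```python
from collections import Counter


def gini(labels):
    """Gini impurity: 1 minus the sum of squared class proportions."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def impurity_decrease(parent, left, right):
    """Parent impurity minus the size-weighted impurity of the two children."""
    n = len(parent)
    weighted = (len(left) / n) * gini(left) + (len(right) / n) * gini(right)
    return gini(parent) - weighted


# Invented labels: 1 = passes the Bechdel test, 0 = does not.
parent = [1, 1, 0, 0]
perfect_split = impurity_decrease(parent, [1, 1], [0, 0])
useless_split = impurity_decrease(parent, [1, 0], [1, 0])
```

A predictor whose splits separate the classes well yields large decreases and is ranked as more important; for regression forests the same idea applies with the mean squared error in place of the Gini impurity.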


In this section, we present the basic features of women representation (i.e. the Bechdel test result and female cast ratio), movie success (i.e. the revenue/budget ratio, movie rating and movie popularity), and our modeling results of their associations.


We examine the two most important measures of women representation in movies, the Bechdel test result and the female cast ratio. About 55.01% of the movies in our dataset pass the Bechdel test. As Fig. 1 shows, in the 21st century the movies that pass the test outnumber the ones that do not. The average female cast ratio is 0.26, and Fig. 1 shows that more recent movies have higher female cast ratios than older ones. Using a simple univariate linear regression, we conclude that as time progresses, the number of movies that pass the Bechdel test increases (coefficient estimate = 1.61, p-value) and more females are cast in the movies (coefficient estimate = 0.11, p-value < 0.0001). These two measures are highly correlated (t test, p-value < 0.0001), as movies that pass the Bechdel test have a higher female cast ratio (0.31) than those that do not (0.20). Among all the variables, we are particularly interested in budget because we want to see whether movies with better women representation were better supported. The results are disappointing in that a higher female cast ratio is significantly associated with a lower budget (Pearson correlation coefficient = -0.099, p-value < 0.0001). Similarly, movies that pass the Bechdel test had a significantly lower budget than movies that did not (t test, p-value = 0.001).
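The two statistics used in this preliminary analysis can be sketched in a few lines. The text does not say which variant of the t test was used, so the unequal-variance (Welch) statistic below is an assumption; the data values are invented and the function names are hypothetical. Both functions fail with a division by zero on constant input.

```python
import math
import statistics


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def welch_t(a, b):
    """Welch's t statistic for the difference in means of two groups."""
    var_a, var_b = statistics.variance(a), statistics.variance(b)
    standard_error = math.sqrt(var_a / len(a) + var_b / len(b))
    return (statistics.mean(a) - statistics.mean(b)) / standard_error


# Invented values: female cast ratio and budget for four movies.
female_cast_ratio = [0.10, 0.20, 0.30, 0.40]
budget = [90, 70, 60, 20]
r = pearson_r(female_cast_ratio, budget)   # negative for these values

# Invented female cast ratios for movies that pass / fail the Bechdel test.
t = welch_t([0.30, 0.35, 0.28], [0.18, 0.22, 0.20])
```

Converting either statistic into a p-value additionally requires the t distribution, which is omitted here to keep the sketch dependency-free.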

Figure 1: Women representation in movies over time. (a) The distribution of the Bechdel test results over the last 12 decades; (b) The distribution of the female cast ratio over the last 12 decades.

Exploring their association with movie success, we find that the revenue/budget ratio is not significantly associated with the female cast ratio (Pearson correlation coefficient = 0.016, p-value = 0.28). Likewise, the revenue/budget ratio does not differ significantly between movies with different Bechdel test results (t test, p-value = 0.14). On the other hand, a higher movie rating is significantly correlated with a lower female cast ratio (Pearson correlation coefficient = -0.15, p-value < 0.0001), and movies that pass the Bechdel test have significantly lower ratings than movies that do not (t test, p-value < 0.0001). Similarly, higher movie popularity is significantly correlated with a lower female cast ratio (Pearson correlation coefficient = -0.091, p-value < 0.0001), and movies that pass the Bechdel test are significantly less popular than movies that do not.

The statistical analyses above suggest that women representation has a negative association with movie success. However, we cannot yet draw conclusions about the relationship between movie success and women representation, because women representation is improving over time and audience reaction to movies is also changing over time. From Fig. 2, overall, movie rating decreases significantly as time progresses (Pearson correlation coefficient = -0.25, p-value < 0.0001) and popularity increases significantly as time progresses (Pearson correlation coefficient = 0.097, p-value < 0.0001). The revenue/budget ratio does not change significantly over time (Pearson correlation coefficient = -0.012, p-value = 0.43) because both budget (Pearson correlation coefficient = 0.29, p-value < 0.0001) and revenue (Pearson correlation coefficient = 0.19, p-value < 0.0001) increase significantly over time. We further investigate the relationship between movie success and women representation in different time periods. The majority of the movies were produced in the United States, where the feminist movement reached a peak around 1970. From Fig. 1, movies that did not pass the Bechdel test dramatically outnumber those that did before 1970, but afterwards more movies passed the test. Before 1970, the relationship between the female cast ratio and movie success is the same as for the overall data. However, none of the three aspects of success differed significantly between the movies that passed the Bechdel test and those that did not. From 1970 to 2015, the relationship is the same as for the overall data. After 2015, movie rating is no longer significantly correlated with the female cast ratio (Pearson correlation coefficient = -0.078, p-value = 0.053), and none of the three aspects differed significantly between movies that passed the Bechdel test and movies that did not. We can see that before 1970 and after 2015, the movies that passed the Bechdel test achieved a level of success similar to that of movies that did not, while between these two time points movies with better women representation suffered from lower ratings and lower popularity.

From the preliminary analysis, we can see that there exist significant relationships between women representation and movie success. However, both are also correlated with time: women representation is improving, movie popularity is increasing and rating is decreasing. We rely on the regression model and the Random Forest model to adjust for such confounding effects throughout the analyses.

Figure 2: Movie budget, revenue, rating and popularity over the last 12 decades.

The Improvement in Women Representation

We first evaluate the factors that have a strong association with the Bechdel test result. Setting the Bechdel test result as the response variable of a logistic regression with penalty and of a Random Forest, we include as predictors the variables year, adult content, run time minutes, budget, number of casts, female director ratio, female producer ratio, female screenplay writer ratio, 26 production company indicators and 25 genre indicators. The same set of predictors is also fitted to the models with the female cast ratio as the response variable. The variable selection results are shown in Table 1. For both response variables, the regression model and the Random Forest model have very similar predictive performance. The female screenplay writer ratio was selected in both regression models, with a positive association with the Bechdel test result (coefficient estimate = 0.79) and the female cast ratio (coefficient estimate = 0.042), and ranked very high in both Random Forest models (Bechdel test = 6, female cast ratio = 5). This implies that women's participation in story-writing is key to determining how women are represented on screen, outweighing the female director ratio and the female producer ratio. These two variables were also selected, with relatively smaller magnitudes and decent rankings in the Random Forest models, because the three variables are correlated and the effects of the latter two on the response variables were adjusted for the presence of the female screenplay writer ratio. Year is also a very strong predictor of women representation, which is consistent with Fig. 1 in that women become better represented as time progresses. For the female cast ratio, the number of casts is the most influential factor (coefficient estimate = -0.19, ranking = 1), as a larger cast is associated with a lower female cast ratio. This association implies that male cast members are more likely to be hired than female cast members. Many genres were identified as having better women representation, such as Romance, which was selected by both regression models (Bechdel test: 0.18, female cast ratio: 0.042) and had decent rankings in both Random Forest models (Bechdel test: 14, female cast ratio: 6), and Horror, which was selected by the Bechdel test regression (coefficient estimate = 0.20) and had decent rankings in both Random Forest models (Bechdel test: 11, female cast ratio: 14). We pay special attention to Horror because of its association with movie success in the later analysis. Some genres were identified as having less women representation, such as Action and Crime. Although the preliminary statistical analysis found significant correlations of budget with the female cast ratio and the Bechdel test result, budget was not selected in either regression model. This can be explained by the gender preference of genres. From Fig. 3, we can see that Action and Crime movies, which our models identified as negatively associated with women representation, are likely to receive much higher budgets than movies not associated with these genres. Meanwhile, they also tend to feature more male than female cast members. Romance and Horror movies (surprisingly) feature more females than males on average, but they also tend to receive lower budgets. This imbalance between the budgets of genres and the gender preference of genres leads to the imbalance in budgets between movies with different levels of women representation.

| Variable | Bechdel test: Regression | Bechdel test: Random Forest | Female cast ratio: Regression | Female cast ratio: Random Forest |
|---|---|---|---|---|
| Test accuracy | 64.9% | 63.1% | | |
| Test MSE | | | | |
| Female screenplay ratio | 0.79 | 6 | 0.042 | 5 |
| Female director ratio | 0.15 | 21 | 1.25 | 10 |
| Female producer ratio | 0.047 | 5 | 0.00 | 7 |
| Year | 0.67 | 2 | 0.016 | 4 |
| Number of casts | 0.00 | 4 | -0.19 | 1 |
| Action | -0.26 | 8 | -0.018 | 8 |
| Comedy | 0.00 | 7 | 0.011 | 9 |
| Crime | -0.20 | 9 | 0.00 | 11 |
| Family | 0.21 | 22 | 0.00 | 34 |
| Horror | 0.20 | 11 | 0.00 | 14 |
| Music | 0.04 | 28 | 0.00 | 22 |
| Romance | 0.18 | 14 | 0.042 | 6 |
| Sport | -0.15 | 31 | 0.00 | 35 |

Table 1: Regression coefficient estimates and Random Forest predictor importance ranking for the variables selected by the regularized regression models of the Bechdel test and female cast ratio. The importance of predictors in the Random Forest model is defined as the change in the Gini-impurity for classification and MSE for regression, where larger changes indicate greater importance.

Figure 3: The female cast ratio and budget distribution in the genres selected by the regression model.

Women Representation and Movie Success

Setting the movie revenue/budget ratio (R/B ratio), rating and popularity each as the response variable of an individual regression model with penalty and of a Random Forest model, we include as predictors the variables year, adult content, run time minutes, number of casts, female cast ratio, female core crew ratio, the Bechdel test result, 26 production company indicators and 25 genre indicators.

The female cast ratio was selected with a strong positive association with the R/B ratio (coefficient estimate=) and ranked as the most important predictor in the Random Forest model (ranking=1). Meanwhile, three similar genres were identified as having a strong positive association with the R/B ratio: Horror (coefficient estimate=, ranking=7), Mystery (coefficient estimate=, ranking=5), and Thriller (coefficient estimate=, ranking=6). Since the previous analysis showed that the female cast ratio is positively associated with the Horror genre, we are confident in concluding that, with Horror and the other two correlated genres present in the model, the female cast ratio has a positive effect on a movie’s profitability after accounting for the effects of genres. The same set of variables was also selected for movie rating; however, their associations with rating run in the opposite direction to those with the R/B ratio. The female cast ratio has a negative association with rating (coefficient estimate=) and a very high ranking in the Random Forest model (ranking=4). Horror (coefficient estimate=, ranking=6) and Mystery (coefficient estimate=, ranking=16) are identified as negatively associated with rating. It seems that movies featuring more women and horror or mystery elements are likely to make more money but receive more criticism. Neither of the women representation measures, the female cast ratio and the Bechdel test result, was selected for movie popularity. However, the female cast ratio still receives a high ranking in the Random Forest model (ranking=4), possibly due to its correlation with other variables.

Because the revenue/budget ratio is calculated directly from budget, and the previous analysis showed that budget does not have a direct impact on women representation, we exclude budget from our model, since its strong correlation with the response variable would overwhelm the effects of other variables. However, our models still demonstrate the relationship between budget and movie success. The run time in minutes and the number of casts are identified as having very strong associations with all three aspects of movie success and receive very high rankings in the Random Forest models. Both are proxies for a movie’s budget, as a higher budget leads to a larger number of casts (Spearman correlation coefficient = 0.41, p-value < 0.0001) and a longer run time in minutes (Pearson correlation coefficient = 0.26, p-value < 0.0001). Meanwhile, our preliminary analysis shows that the female cast ratio is not significantly correlated with the R/B ratio, whereas our regression model identifies a fairly strong association, possibly because the female cast ratio is negatively associated with budget. Movies featuring a higher female cast ratio tend to have a lower budget, which is likely to lead to a higher R/B ratio even though the revenue is not particularly high. However, such an effect is not strong enough to be detected by the correlation test. Our modeling results are consistent with the preliminary analysis for ratings, in that a higher female cast ratio is associated with lower movie ratings. For movie popularity, the female cast ratio was not selected in the regression model even though the Pearson correlation test identifies the two as highly correlated. This again can be explained by the confounding effect of genres and production companies. The regression model identifies some genres as positively associated with popularity, such as Action (coefficient estimate=, ranking=6) and Adventure (coefficient estimate=, ranking=5), as well as one production company, Disney (coefficient estimate=, ranking=8). Meanwhile, it also identifies one genre as negatively associated with popularity, Drama (coefficient estimate=, ranking=10). Fig. 4 shows that Action, Adventure and Sci-Fi movies, which tend to be very popular, also tend to feature fewer female casts; Drama movies, on the other hand, feature slightly more female casts but also tend to be less popular than other genres. Disney, known for creating strong and well-rounded female characters, in fact still features more male casts (or male voices in animation) than female casts in its productions. Therefore, the high popularity of Disney movies does not help associate better women representation with high movie popularity.

| Variable | R/B ratio: Regression | R/B ratio: Random Forest | Rating: Regression | Rating: Random Forest | Popularity: Regression | Popularity: Random Forest |
|---|---|---|---|---|---|---|
| Test MSE | | | | | | |
| Female cast ratio | | 1 | | 4 | 0.00 | 4 |
| Number of casts | 0.00 | 2 | 0.014 | 3 | 0.040 | 1 |
| Run time minutes | | 3 | 0.29 | 1 | 0.023 | 3 |
| Year | 0.00 | 4 | -0.088 | 2 | | 2 |
| Action | 0.00 | 19 | 0.00 | 8 | | 6 |
| Adventure | 0.00 | 15 | 0.00 | 12 | | 5 |
| Animation | 0.00 | 27 | | 10 | 0.00 | 9 |
| Drama | | 8 | 0.015 | 5 | | 10 |
| Horror | | 7 | -0.014 | 6 | 0.00 | 24 |
| Mystery | | 5 | -0.19 | 16 | 0.00 | 26 |
| SciFi | | 11 | -0.018 | 15 | | 13 |
| Thriller | | 6 | 0.011 | 13 | 0.00 | 17 |
| Disney | 0.00 | 28 | 0.00 | 28 | | 8 |

Table 2: Regression coefficient estimates and Random Forest predictors importance ranking for variables selected by the regularized regression models of movie revenue/budget ratio, rating and popularity.

Figure 4: The female cast ratio and movie popularity distribution in the genres of Action, Adventure, Drama, Sci-Fi and production company of Disney, selected by the regression model.


Overall, our findings indicate that women representation is critically influenced by the work of female crew members, especially female screenplay writers, in the film-making process. In addition, it evolves over time as a result of factors outside the movies that are not included in the dataset, as implied by the variable year. Our findings also reflect some difficulties actresses face: for example, male cast members are more likely to be hired than female cast members, and genres that tend to receive higher budgets also prefer male casts to portray their stories. From our investigation into the relationship between women representation and movie success, we find that, compared to the Bechdel test result, the proposed female cast ratio is more directly linked to a movie’s success, as it is selected by multiple models whereas the Bechdel test result was not selected once. Moreover, movies featuring more women tend to have a lower budget, which leads to a higher revenue/budget return. However, they also suffer from more criticism, possibly due to the low budget invested. Considering that the number of casts and run time minutes are both selected as having a very strong positive association with a movie’s rating, and that both are proxies for the budget, this is a very plausible explanation. The female cast ratio is not directly linked to a movie’s popularity. However, we have also found that genres likely to be popular, such as Adventure and Action, tend to feature fewer female casts. Disney has contributed to better women representation on the big screen and its productions are often very popular, yet its productions still feature more males than females.

Our findings demonstrate that women in the film industry face difficulties both upstream and downstream. Female filmmakers, especially female screenplay writers, are instrumental for movies to have better women representation, but the percentage of female filmmakers is very low: only 6.0% of directors, 9.7% of producers and 12.2% of screenplay writers. Meanwhile, lower budgets are allocated to movies that could tell good stories about women, which in turn causes these films to receive more criticism. The demand from viewers for better women representation is also not strong enough to press the film industry for change, as movies with poor women representation can still be very profitable.


Our study provides a big picture of women representation in movies and how it has been perceived by audiences over time. Unfortunately, under-representation is not the only difficulty women face. Apart from the low ratio of female cast members, two other important themes in how media portray gender are the portrayal of women in stereotypical ways that reflect and sustain socially endorsed views of gender, and depictions of relationships between men and women that emphasize traditional roles and normalize violence against women [19]. Although our newly defined measure of women representation, the female cast ratio, has been shown to be more successful than the Bechdel test in evaluating movie success, it has limitations: it does not explain how women are portrayed in movies. For example, our findings indicate that horror movies tend to feature more women than men, and the reason behind this phenomenon is that audiences enjoy the victimization of women more than the victimization of men [14]. In other words, the root of “favoritism” toward women in horror movies is still that people would rather see women than men being helpless and passive. Another genre identified by our analysis as favoring women, Romance, is also likely to portray women in a stereotypical way. A linguistic study of the movie “The Best of Me” found differences in expression between men and women: men prefer commanding directives while women use directives for suggesting or requesting, and men tend to use swear words to express anger while women tend to use them to express bad feelings [13]. Even in a movie targeted at a female audience, women were portrayed as docile and submissive compared to men.
Meanwhile, some studies have used dialogue speaking time to evaluate how much women talk in movies and discovered that even in some female-led movies (such as Disney princess movies), the female lead’s speaking time can be exceeded by that of male cast members [1]. Whether they speak to express ideas or feelings, to give commands, or for other purposes, men speak more than women, and their influence on the audience is ultimately stronger.

For future study, we will also consider other factors that can influence the image of a character in movies, to help us better understand how gender shapes a character’s image and how these factors influence the audience’s perception of movies. Race and ethnicity, for example, are both important factors in the audience’s expectations for a character. Studies have shown that African American music videos were significantly more likely to portray sexual content and sexualized female characters than White videos [18]. In addition, the first African American performer to win an Academy Award was a woman, who won much earlier than her male counterparts. A possible reason why minority women are more popular than minority men is that women and minorities were both traditionally considered submissive to white males, so a minority woman portrayed in such a way is more acceptable to the mainstream than a minority man. Besides race and ethnicity, age is another important factor. While both women and men in their 60s and older were dramatically underrepresented in movies compared to their share of the U.S. population [7], older actresses experience greater difficulty in finding work than older actors, even when they are celebrities [17]. The majority of male characters were in their 30s and 40s, and the majority of female characters were in their 20s and 30s. For male characters, leadership and occupational power increased with age; however, as female characters aged, they were less likely to have goals [7]. This observation coincides with mainstream expectations of gender, in which men hold authority and leadership that usually increase with age, while women are in more subordinate roles and are often sexualized, for which a younger age is preferred.


  • [1] H. Anderson and M. Daniels (2016) (Website).
  • [2] S. Berkman, S. Garland, and A. VanSteinberg (2017) Quantified feminism and the Bechdel test. Technical report, Duke University.
  • [3] R. L. Collins (2011) Content analysis of gender roles in media: where are we now and where should we go? Sex Roles 64 (3), pp. 290–298. ISSN 1573-2762.
  • [4] A. Haraldsson and L. Wängnerud (2019) The effect of media sexism on women’s political ambition: evidence from a worldwide study. Feminist Media Studies 19 (4), pp. 525–541.
  • [5] C. Heldman, L. L. Frankel, and J. Holmes “Hot, black leather, whip”: the (de)evolution of female protagonists in action cinema, 1960–2014. Sexualization, Media, & Society 2 (2).
  • [6] T. K. Ho (1995) Random decision forests. In Proceedings of 3rd International Conference on Document Analysis and Recognition, Vol. 1, pp. 278–282.
  • [7] M. M. Lauzen and D. M. Dozier (2005) Maintaining the double standard: portrayals of age and gender in popular films. Sex Roles 52 (7-8), pp. 437–446.
  • [8] M. Lauzen (2012) The celluloid ceiling: behind the scenes employment of women on the top 250 films of 2013. Women’s Media Center 15, pp. 2012.
  • [9] A. M. Lindner, M. Lindquist, and J. Arnold (2015) Million dollar maybe? The effect of female presence in movies on box office returns. Sociological Inquiry 85 (3), pp. 407–428.
  • [10] A. M. Lindner and Z. Schulting How movies with a female presence fare with critics. Socius 3.
  • [11] Z. Micic (2015) Female interactions on film-beyond the Bechdel test: a quantitative content analysis of same-sex-interactions of top 20 box office films.
  • [12] J. N. Murphy (2015) The role of women in film: supporting the men–an analysis of how culture influences the changing discourse on gender representations in film. Undergraduate Honor Thesis, Department of Journalism, University of Arkansas.
  • [13] S. Sulastri, M. Laila, and M. Hum (2019) Characterizing men and women language in The Best of Me movie. Ph.D. Thesis, Universitas Muhammadiyah Surakarta.
  • [14] R. Tamborini, J. Stiff, and D. Zillman (1987) Preference for graphic horror featuring male versus female victimization. Human Communication Research 13 (4), pp. 529–552.
  • [15] A. Thomas (2017) Women only said 27% of the words in 2016’s biggest movies.
  • [16] R. Tibshirani (1996) Regression shrinkage and selection via the lasso. Journal of the Royal Statistical Society: Series B (Methodological) 58 (1), pp. 267–288.
  • [17] J. Treme and L. A. Craig (2013) Celebrity star power: do age and gender effects influence box office performance? Applied Economics Letters 20 (5), pp. 440–445.
  • [18] J. S. Turner (2011) Sex and the spectacle of music videos: an examination of the portrayal of race and sexuality in music videos. Sex Roles 64 (3-4), pp. 173–191.
  • [19] J. T. Wood (1994) Gendered media: the influence of media on views of gender. Gendered Lives: Communication, Gender, and Culture 9, pp. 231–244.