Sameness Attracts, Novelty Disturbs, but Outliers Flourish in Fanfiction Online

04/16/2019 ∙ by Elise Jing, et al. ∙ 0

The nature of what people enjoy is not just a central question for the creative industry; it is a driving force of cultural evolution. It is widely believed that successful cultural products balance novelty and conventionality: they provide something familiar but at least somewhat divergent from what has come before, and occupy a satisfying middle ground between "more of the same" and "too strange". We test this belief using a large dataset of over half a million works of fanfiction from the website Archive of Our Own (AO3), looking at how the recognition a work receives varies with its novelty. We quantify novelty through a term-based language model and a topic model, in the context of existing works within the same fandom. Contrary to the balance theory, we find that the lowest-novelty works are the most popular and that popularity declines monotonically with novelty. A few exceptions can be found: extremely popular works that are among the highest in novelty within the fandom. Taken together, our findings not only challenge the traditional theory of the hedonic value of novelty, they invert it: people prefer the least novel things, are repelled by the middle ground, and have an occasional enthusiasm for extreme outliers. This suggests that cultural evolution must work against inertia (the appetite people have to continually reconsume the familiar) and may resemble a punctuated equilibrium rather than a smooth evolution.





A central puzzle for the arts and cultural industries, a sector that contributed more than $700 billion to U.S. GDP in 2018 through products such as movies, TV shows, and video games [31], is to predict what people will like and enjoy. As suggested by the very name “creative work”, people may seek surprise when consuming artistic and cultural creations [20]. At the same time, we know that people have a strong preference for familiarity. When people listen to music, for example, they often choose songs that they are familiar with [42]. Such preferences may be basic: humans also find faces that are close to the average, and therefore more familiar, highly attractive, and the effect even extends to images of other objects such as birds and automobiles [13]. Moreover, Zajonc [48] showed that mere exposure to a stimulus can increase people’s preference for it, even when they are not aware of the exposure [24, 7].

“Looking like what has come before”, leveraging the familiar, is therefore expected to be a core strategy to achieve popularity. Repetition has long been a central feature of cultural products such as music and poetry [19], and the trend persists in contemporary popular culture, where new releases are often adaptations, remakes, and remixes of existing works [27]. Marvel and DC Comics have been making massively successful movies based on their popular comics, including many sequels, prequels, reboots, and soft reboots. The origin story of Spider-Man provides an extreme case: it has been re-played in three movies since 2002: Spider-Man (2002), The Amazing Spider-Man (2012) and Spider-Man: Homecoming (2017) [47]. Market reception indicates that audiences enjoy such repetitions: currently, the three movies have grossed about $640 million, $300 million, and $340 million respectively (inflation-adjusted) [8], placing them among the top 300 highest-grossing movies of all time. Out of the 10 highest-grossing movies of 2016, 8 are adaptations, sequels, remakes, or parts of a movie universe [45]. This number increased to 10 out of 10 in 2017 [46]. Notably, many such films also follow the typical plots of their genres. Film and literary theories have further suggested that most stories can be summarized into a few archetypes and basic elements, such as the Hero’s Journey [9], mythemes [25] or Proppian functions [32]. In these accounts, every story involves the repetition of classical narrative elements. The preference for repetition can be found in political discourse as well as in cultural artifacts: for instance, in the parliament of France during the Revolution, the more novel a speech is, the less likely its combinations are to persist in later speeches [3].

Extremely novel cultural products, on the other hand, sometimes also enjoy huge successes. In the film industry, such movies are often landmark introductions of new forms of cinematography, such as Star Wars (multiple ground-breaking technologies) and Avatar (the first mainstream 3D film using stereoscopic filmmaking). Popular music in the 20th century is characterized by high turnover and the emergence of new genres [28], such as hip-hop and electronic dance music, that start as underground music and grow to enjoy global commercial success. In the history of literature, experimental, deviating works such as The Trial by Franz Kafka and Ulysses by James Joyce are considered canonical events that define both the author and the era. One of the most influential figures in the French Revolution, Maximilien Robespierre, reliably delivered some of the most novel speeches. If people only enjoy familiar things, what accounts for such success?

“Balance Theories” of Liking. A widely-adopted hypothesis reconciles the apparent conflict between the preference for novelty and familiarity by suggesting that successful creative works are a combination of, or balance between, convention and innovation. Under this theory, popular works are different from previous works and their peers, but not too different. In psychology, this idea is captured in the Wundt-Berlyne curve, an inverted U-shaped curve with an optimal amount of novelty for hedonic values [4]; a similar account is found in the business and marketing literature where it is known as Mandler’s Hypothesis [29].

A few experiments have supported this hypothesis in the case of words [38], music [14, 2], films [39], advertising [30], and scientific publications [43], suggesting that, for example, songs with an optimal amount of differentiation are more likely to be at the top of Billboard’s Hot 100 charts, that movies balancing familiarity and novelty have the highest revenues, or that visual metaphors in ads work best when they have mild incongruity. In scientific publications, the highest-cited papers are argued to be grounded on mostly conventional, but partly novel combinations of previous works [43]. However, much of this research deals with cultural products whose reception is influenced by many factors: not just intrinsic factors such as content, style, subject, genre, or length, but also external ones such as price, advertisement, and media coverage. For example, a song’s popularity is only partially determined by its quality [34], and deliberately putting unpopular songs on the top chart can in fact popularize them [35]. The complex interactions between consumers, cultural products and the market are therefore difficult to disentangle.

To test the balance theory while controlling for such interactions, we study a special dataset that allows us to isolate the effect of novelty: fanfiction (see Data and Methods). Using this data, we investigate how a fanfiction’s novelty is associated with its success. We use two frameworks to characterize the novelty of the fiction: a term-based language model and a topic model. In both cases, a fiction is evaluated with respect to the existing fictions in the same fandom. A fiction is more novel if, in the feature space, it is more different from the other fictions published during the previous time period. The success of a fiction is evaluated by its readers’ responses, such as “kudos” and comments (see Data and Methods).

Our statistical methods recover a near-monotonically decreasing relationship between a fiction’s novelty and its success with, in some exceptional cases, an additional uptick for very high novelty work. Readers prefer creative fictions that give them a sense of familiarity, and we find no limit to their appetite for “more of the same”; the exception is provided by extreme outliers. Taken together, the main contribution of our findings is that they reverse the usual “inverted U-shape” account, and diverge from the balancing theories. Instead of a sweet spot at the intermediate level of novelty, we find a lack of recognition for intermediate values of novelty. Success does not come from combining novelty and sameness, but from pushing one of these to the extreme.

Data & Methods

Fanfiction and Fandoms. A special type of creative work, known as fan work, allows us to rule out some of the confounds in creative works. Fan works draw on narratives and characters from a particular canonical work to create new stories and alternate timelines [11]. While fan works appear in multiple media including painting, music, and games, the most common type is creative writing, or “fanfiction”. An enthusiast of the Harry Potter series may write a new adventure for Hermione and Harry after the end of Rowling’s novels; a fan of the television show Buffy the Vampire Slayer may write a story in which Willow’s girlfriend has a different fate.

Fanfiction is often transformative and transgressive: fans of Sherlock Holmes, for example, have written stories in which Holmes and Dr. Watson fall in love, and Watson, by magical means, gestates a baby they conceive together. Examples like these abound: despite being based around a canonical work, fanfiction communities are far from conservative or uncreative and often, to the contrary, attempt not only to extend a canonical work, but also to subvert it. Fanfiction, in short, can be seen not as an outlier of cultural production, but as a paradigm for textual production in general, characterizing the ways in which one author or story influences another [33], and unusual only in how explicitly it identifies the ancestor from which downstream texts evolve. Fanfiction has drawn attention from media studies and cultural studies (see, e.g., [41]). However, these works have often focused on the social aspect of fanfiction: the identity of the writers [5], the practice of writing [26], and the interaction between fans [16], rather than on the texts themselves as creative works.

Field | Description
Title | Title of the work.
Fandoms | The fandom(s) that the work belongs to.
Author | The author(s) of the work.
Chapters | The number of chapters that the work has.
Archive Warnings | Warnings for sensitive elements.
Category | The type of relationships in the work.
Rating | The age rating.
Relationship | The relationship(s) between characters in the work, in the form of Character A/Character B or Character A&Character B.
Publish Date | The date the work was published on AO3.
Complete Date | For multi-chapter works, the date when it was marked as “complete”.
Update Date | The date when the work was last updated.
Kudos | The number of kudos (likes) the work received.
Comments | The number of comments the work received.
Hits | The number of times the link to the work was clicked on.
Bookmarks | The number of people who bookmarked the work.
Table 1: Metadata fields for each work of fanfiction in our database.

The nature of fanfiction allows us to mitigate a number of common confounds in the study of creativity and what people enjoy. First, fanfiction is usually shared among relatively isolated online communities (“fandoms”) with almost no advertisement or promotions [44]; a work’s success is therefore largely uninfluenced by top-down interventions, such as advertising campaigns, that can distort its reception. Second, most fanfiction works are freely available on the Internet, so that the price is not a confounding factor. Third, works in the same fandom are created within the same context, and have similar subjects, characters, and settings to each other. As in genres of literary production, variations between works occur in a recognizable space, allowing us to talk about their novelty while controlling for other factors. Finally, because fanfiction is freely available in plain text and in remarkably large volumes, computational methods can be used to operationalize novelty for quantitative studies.

Figure 1: The size of fandoms and the amount of fanfiction published in AO3 each month, from 2009 to 2016.

We draw our data from the online fanfiction archive Archive of Our Own (AO3), which allows users to upload their work, and categorizes them based on fandoms. Established in 2009, it has become one of the largest fan communities, with more than 1.6 million users and 4 million works by August 2018 [1]. A Python script was used to download fanfictions and their metadata from AO3 in March 2016.

AO3 classifies the fandoms based on the formats of the canons, such as movies, TV shows, books, anime & manga, and musicals. We first identify the top five fandoms in each of these categories based on the number of works they contain. We exclude the fandoms that are subsets of other large fandoms. For example, we keep Marvel but exclude The Avengers, because The Avengers is a part of the Marvel Universe. Fandoms that do not have a unified subject, such as K-pop (which contains fanfictions about over 300 different Korean pop bands), were also removed. We keep only those written in English. Finally, to control for the effects of the work’s length on both reader preference and on our methods, we only analyze the subset of fictions with 500–1,500 words. This leaves us with 609,812 works in 23 fandoms from 86,583 authors.

Figure 1(a) shows the number of works in each of the 23 fandoms; Marvel, Supernatural, and Sherlock Holmes are the largest. Figure 1(b) shows the volume of works produced over time. AO3 was established in December 2009, and experienced rapid growth beginning in 2012. Because the works timestamped earlier than the start date might be migrated from other platforms, and may not correctly reflect the status of the archive, we only run our analysis using those published in January 2010 or later.

Metadata for each fanfiction (see Table 1) allows us to gain insights about the fictions and their reception. The variables indicating “kudos” (‘likes’), hits, comments, and bookmarks are used to operationalize the reception of a work. While the number of hits is the most direct metric for popularity, it may be strongly affected by the title, summary, and the author, rather than the content itself. The number of kudos is a clearer signal that a reader liked the fiction. A reader can also comment on a fiction or bookmark it to read later. The comments and bookmarks therefore signal the recognition or engagement from readers, although they depend on multiple motivations, and are less directly associated with popularity. Figure 2 shows the distribution of these indicators on a logarithmic scale. Like many other measurements of popularity, they exhibit fat tails, with a small number of works receiving the majority of attention and most receiving little or none. Also note that there are outliers that achieved extreme recognition in terms of hits and kudos. In the following analysis, we log-transform these values, a common practice for similar data such as citations [40]. This practice has been argued to reduce the potential bias when performing regression and other statistical analysis [40].

Quantifying novelty. Although there are many ways to measure novelty, here we operationalize it in an intuitive, data-driven way. As previous studies noted [2, 10], it is reasonable to assess a work’s novelty in the context of other previously published works. Intuitively, a work is less novel if it is similar to many others published beforehand. Here we use the centroid, in feature space, of all past works in a fandom as the guide for measuring novelty. A work is more novel the further it is from the center.

Figure 2: Log-binned probability density function and complementary cumulative distribution of kudos, hits, bookmarks and comments. For multi-chapter fanfictions, we average these values over the number of chapters. Fat-tailed distributions are observed, where a small portion of fictions receive many kudos, hits, etc., and most receive few.

In line with existing research such as [22, 3, 17], we extract features from the works using two methods, term frequency–inverse document frequency (TF–IDF) and Latent Dirichlet Allocation (LDA) [6], some of the most fundamental and widely used ways to model documents. TF–IDF is a vector space model often used in information retrieval [36], where each document is represented as a vector whose entries are the TF–IDF scores of the unique terms in the document. It discounts the importance of common terms that appear in many documents, which allows us to quantify term-level novelty. We first pre-process the texts by removing the terms that appear only once. The TF–IDF scores are then computed, for each fanfiction, using all fanfictions published in the same fandom within the 6 months before it was published. The Python library scikit-learn was used to create the vectors. We then compute the centroid of the feature space as the average of all feature vectors. The term novelty score of a fanfiction is defined as the distance between $\mathbf{v}$ and $\mathbf{c}$, where $\mathbf{v}$ is the vector representation of the fanfiction and $\mathbf{c}$ is the centroid of the vector space defined with respect to this fanfiction.
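The term novelty computation described above can be sketched roughly as follows. This is a minimal illustration and not the authors' code: it uses a hand-rolled, simplified TF–IDF weighting instead of scikit-learn, it assumes cosine distance as the distance measure, and it takes the centroid over the earlier works only. The function names `tfidf_rows` and `term_novelty` are hypothetical.

```python
import math
from collections import Counter

import numpy as np


def tfidf_rows(docs):
    """Simplified TF-IDF (assumption): raw term count * log(N / document frequency)."""
    tokenized = [doc.lower().split() for doc in docs]
    vocab = sorted({tok for toks in tokenized for tok in toks})
    column = {tok: i for i, tok in enumerate(vocab)}
    doc_freq = Counter(tok for toks in tokenized for tok in set(toks))
    rows = np.zeros((len(docs), len(vocab)))
    for r, toks in enumerate(tokenized):
        for tok, count in Counter(toks).items():
            rows[r, column[tok]] = count * math.log(len(docs) / doc_freq[tok])
    return rows


def term_novelty(past_docs, new_doc):
    """Cosine distance (assumption) between a new work and the centroid of past works."""
    rows = tfidf_rows(list(past_docs) + [new_doc])
    centroid = rows[:-1].mean(axis=0)  # average vector of the earlier works
    v = rows[-1]                       # vector of the work being scored
    denom = np.linalg.norm(v) * np.linalg.norm(centroid)
    if denom == 0.0:
        # No weighted terms at all: treated as maximally distant in this sketch.
        return 1.0
    return 1.0 - float(v @ centroid) / denom
```

With non-negative weights the score lies between 0 (same direction as the centroid) and 1 (no weighted terms shared with earlier works), so a work written entirely in vocabulary unseen in the window scores 1.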

LDA provides our second measure of novelty, characterizing documents in terms of topics, or co-occurring word patterns. Similar to TF–IDF, for each work, we construct a feature space consisting of the topic distributions of the works published before it, and define the topic novelty score as the Jensen–Shannon distance (JSD; [22]) between the fanfiction’s topic distribution and the center of the feature space:

$$\mathrm{JSD}(\mathbf{v}, \mathbf{c}) = \sqrt{\tfrac{1}{2} D(\mathbf{v} \parallel \mathbf{m}) + \tfrac{1}{2} D(\mathbf{c} \parallel \mathbf{m})}$$

where $\mathbf{v}$ is the vector representation of a fanfiction, $\mathbf{c}$ is the centroid (see above), $\mathbf{m} = \tfrac{1}{2}(\mathbf{v} + \mathbf{c})$, and $D$ is the Kullback–Leibler divergence between the two vectors.
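As an illustration, the Jensen–Shannon distance between a work's topic distribution and the centroid of earlier works could be computed along these lines. This is a sketch under assumptions: the base-2 logarithm (which bounds the distance to [0, 1]) is a choice made here, and the function names are hypothetical. The topic distributions would come from the fitted LDA model; here they are simply passed in as arrays.

```python
import numpy as np


def kl_divergence(p, q):
    """Kullback-Leibler divergence D(p || q) in bits; entries with p_i = 0 contribute 0."""
    mask = p > 0
    return float(np.sum(p[mask] * np.log2(p[mask] / q[mask])))


def topic_novelty(topic_dist, past_topic_dists):
    """Jensen-Shannon distance between one topic distribution and the past centroid."""
    v = np.asarray(topic_dist, dtype=float)
    c = np.asarray(past_topic_dists, dtype=float).mean(axis=0)  # centroid of past works
    m = 0.5 * (v + c)                                           # mixture distribution
    divergence = 0.5 * kl_divergence(v, m) + 0.5 * kl_divergence(c, m)
    # Guard against tiny negative values caused by floating-point rounding.
    return float(np.sqrt(max(divergence, 0.0)))
```

Because the mixture is positive wherever either input is positive, the divergence terms stay finite even when the two distributions share no topics.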

The Python library gensim is used to fit LDA topic models on our data. The parameters include: number of topics 100 and iterations 50. Our results do not significantly change if we vary the number of topics to 20, 40, 60, 80, and 120 (results not shown). The texts are preprocessed by removing the 500 most frequent words and the words that appear only once. The data and code that we used are made available.


Let us first present illustrative evidence for our quantification of novelty. By manually examining fanfictions with different levels of novelty, we found that in general, the least-novel fictions (the bottom 5% by novelty) and intermediate-novelty fictions (around the 50th percentile) tend to feature common character pairings and story settings. Meanwhile, top-novelty fanfictions (the top 5% by novelty) often feature rare forms of writing, story elements, or character pairings. In particular, many of them are “cross-over” fictions, where characters from multiple fandoms interact with each other. We illustrate this with a few examples.

One example is a fiction in the Marvel fandom titled I am groot. It has a term novelty score of 0.99 and is “told from the perspective of Groot”. The fiction consists of 437 repetitions of a single sentence “I am groot.” Similarly, one of the works with a novelty score of 0.99 in the Star Wars fandom is about the exchange between C-3PO and R2D2, written in binary numbers. A high-novelty fiction in the Sherlock fandom, titled The Real Meaning of Idioms, is written entirely in the form of text messages. Another fiction with high topic novelty is found in the Doctor Who fandom. Titled The Boy Who Waited, it is a cross-over with the Marvel Universe. Such examples, albeit anecdotal by nature, show that the fanfictions found to be extremely novel by our methods are also novel to human readers.

Correlation between novelty and success. Figure 3 displays the raw correlation between a work’s novelty and its kudos, hits, comments, and bookmarks, aggregated across fandoms. Since the fandoms differ in size and activity, we compute the z-score of each log-transformed value relative to its fandom. We observe that as the term novelty and topic novelty scores increase, the z-scores of kudos, hits, comments, and bookmarks decrease across the board, displaying a negative correlation. In other words, novelty is associated with poor recognition and engagement, contrary to the balance theory. This result prompts us to perform a more sophisticated multiple regression that controls for confounding factors.
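The per-fandom standardization can be sketched as below. This is a minimal illustration rather than the authors' code: the use of log(1 + count) to accommodate zero counts and the function name `fandom_z_scores` are assumptions.

```python
from collections import defaultdict

import numpy as np


def fandom_z_scores(fandoms, counts):
    """z-score of log(1 + count), computed separately within each fandom."""
    logged = np.log1p(np.asarray(counts, dtype=float))
    groups = defaultdict(list)
    for i, fandom in enumerate(fandoms):
        groups[fandom].append(i)
    z = np.zeros_like(logged)
    for idx in groups.values():
        values = logged[idx]
        std = values.std()
        # A fandom with no variation gets z = 0 for all of its works.
        z[idx] = (values - values.mean()) / std if std > 0 else 0.0
    return z
```

Standardizing within each fandom puts works from large and small fandoms on a common scale before they are binned by novelty score.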

Figure 3: The relationships between novelty and success, measured by kudos, hits, comments, and bookmarks. The horizontal axes are the novelty scores, and the vertical axes are the corresponding average of the z-score of kudos, hits, comments, and bookmarks in bins with bin size = 0.1 (left) and 0.05 (right). The confidence intervals obtained from bootstrap resampling are shown.

Regression analysis

Model | Response variables | Independent variables | Control variables
Models 1-4 | Logarithm of kudos, hits, comments, and bookmarks, respectively | Term novelty, topic novelty | All control variables
Models 5-8 | Logarithm of kudos, hits, comments, and bookmarks, respectively | Term novelty, square of term novelty, topic novelty, square of topic novelty | All control variables
Models 9-12 | Non-zero subsets of kudos, hits, comments, and bookmarks | Term novelty, topic novelty | All control variables
Table 2: The response, independent, and control variables used in each group of the regression models.

Response variables. Four response variables — kudos, hits, comments, and bookmarks — are considered. We use the logarithm of these variables because they exhibit fat-tailed distributions (see Data & Methods).

Independent variables. The term novelty and topic novelty scores are the predictor variables in all models. To account for possible non-linear relationships such as the inverted U-shape curve, we also use the square values of these scores as predictor variables. We first mean-center the term novelty and topic novelty scores before computing the square values.

Control variables. To isolate the relationship between novelty and success, we consider the following control variables: (1) Some fandoms have larger fan bases than others, resulting in higher numbers of kudos in general for the works in these fandoms. We use binary variables that represent each fandom to control for this effect. (2) The number of chapters provides a second control: multi-chapter works have additional chances for exposure, and higher kudos may stimulate an author to write more chapters. (3) Since AO3’s user base has been increasing, one may expect newer works to receive more kudos than older ones. Conversely, older works may receive more kudos because they had more time to be discovered or to accumulate readers. We therefore include the age of each work as a numerical variable: the number of days since a work was completed (for finished works) or was last edited (for incomplete works). (4) Since the authors may accumulate fame through writing fanfiction, and this fame may bias readership and kudos, we control for the total number of works that an author has written. (5) The relationship between characters is one of the main reasons that many fans read fanfictions, and some relationships have larger fan bases than others. To account for this effect, we create a binary variable, “frequent relationship”, to indicate whether a work features a relationship that is among the top five most frequent relationships in its fandom. Finally, (6) the “archive warnings” indicate that the fanfictions contain sensitive elements such as graphic violence or major character death, and may influence the readers’ choice to read them; the age ratings restrict some fictions to adults only; the types of character relationships may also influence the readers’ choices. Categorical variables are created to capture their effects.

The correlations between the numerical variables are shown in Figure 4, presenting no strong pairwise correlation among the predictor and the numerical control variables. We also examined the variance inflation factor (VIF) and removed one variable that causes strong collinearity (the age of fictions), although keeping it does not strongly influence the results. The variables that each model contains are summarized in Table 2.

Because of the prevalent zero values in the response variables, we use a two-part model [21, 18]. (Because our outcomes are averages of count data, zero-inflated Poisson or negative binomial regression models are not appropriate.) A logistic regression is first performed on the predictor and control variables to predict the probability of each sample having a non-zero outcome. This probability is then used as an additional predictor variable in a pooled OLS regression on the samples with non-zero outcome. We use the Python library statsmodels [37] to perform the analysis.
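The two-part procedure can be illustrated with a small NumPy-only sketch. It is a stand-in for the statsmodels analysis, not a reproduction of it: the Newton-method logistic fit, the function names, and the choice to regress the natural logarithm of the non-zero outcomes are assumptions made for illustration.

```python
import numpy as np


def fit_logistic(X, y, n_iter=25):
    """Logistic regression by Newton's method (X must include an intercept column)."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X @ beta))
        gradient = X.T @ (y - p)
        hessian = X.T @ (X * (p * (1.0 - p))[:, None])
        beta = beta + np.linalg.solve(hessian, gradient)
    return beta


def two_part_fit(X, outcome):
    """Part 1: logit for outcome > 0. Part 2: OLS on log(outcome) for non-zero rows."""
    nonzero = outcome > 0
    beta_logit = fit_logistic(X, nonzero.astype(float))
    p_nonzero = 1.0 / (1.0 + np.exp(-X @ beta_logit))
    # The predicted probability of a non-zero outcome enters the OLS as an extra column.
    X_ols = np.column_stack([X[nonzero], p_nonzero[nonzero]])
    beta_ols, *_ = np.linalg.lstsq(X_ols, np.log(outcome[nonzero]), rcond=None)
    return beta_logit, beta_ols
```

Because the added probability column is a smooth function of the other predictors, the individual OLS coefficients in this toy version can be unstable; the least-squares solver tolerates near-collinear columns, but a real analysis would check this before interpreting coefficients.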

Selected OLS coefficient estimates of the models are shown in Figure 5. Let us first examine the coefficients of the control variables. A work featuring frequent relationships is likely to have better reception. In contrast to our assumption, the number of chapters and the author’s fame do not have significant influence on a work’s success. While the coefficients of the categorical control variables are not shown here, we found that works published in newer and more popular fandoms such as Star Wars and Marvel tend to be more successful. (These fandoms may have long histories, but recent installments such as Star Wars’ new trilogy and the Marvel movies are associated with the influx of new fans.) Elements such as character death and mature content are associated with poorer recognition. For full results with all coefficients, see Appendix.

Figure 4: Correlations between the numerical predictor, response, and control variables. Strong correlation exists between the response variables, but not the predictor variables.
(a) Models 1-4
(b) Models 5-8
Figure 5: OLS coefficients for the independent variables and selected control variables for the multiple regression models. 95% confidence intervals are shown. The coefficients of the categorical variables are omitted.
Figure 6: Models 9-12: Partial dependence plots of the Generalized Additive Models. The x-axis shows the term and topic novelty scores. The y-axis shows the partial influence of the novelty scores on the response variables while holding other independent variables constant. 95% confidence intervals are shown.

We then examine models 1-4, which do not include the square values of the term and topic novelty scores (Figure 5(a)). After controlling for the effects above, we find that both the term novelty and the topic novelty have negative effects on kudos, comments, and bookmarks, supporting our previous observation that higher novelty is linked to less success. For example, an increase of 0.05 in term novelty (cf. Figure 3) is associated with a decrease in kudos of approximately 22.1%, and an increase of 0.05 in topic novelty is associated with a 63.2% decrease in kudos, holding other variables constant.

When we add the square values of novelty scores in models 5-8 (Figure 5(b)), the coefficients for the term and topic novelty scores are similar to those in models 1-4, supporting the robustness of our models. At the same time, we find that the effect sizes of the topic novelty squared are positive and very large across all cases. Fictions with both low and high topic novelty are therefore associated with better reception, suggesting not inverted U-shaped curves, but U-shaped curves.


The linear regression models suggest a potentially nonlinear relationship between novelty and success, but cannot directly reveal the details of such a relationship. We therefore turn to generalized additive models (GAM) [15], which allow us to study non-linear relationships more directly in complex cultural data (e.g. [17]). Here we use the non-zero subset of the response variables without log transformation, and use the same predictor and control variables as in the linear regression models (see Table 2). The models are fitted using the mgcv library in R, with the following parameters: for term novelty, and for topic novelty, and .

The partial dependence plots of the models are shown in Figure 6. In the case of term novelty, we observe a decreasing trend for kudos, hits, and bookmarks, supporting our previous findings. However, a U-shaped curve appears for comments, suggesting that extremely novel works may achieve high engagement, which matches the effect of term novelty squared in Figure 5(b). For topic novelty, similar upticks appear in the high-novelty region, again supporting the implication of Figure 5(b). This pattern is found to be influenced by a small number of outliers, which have extremely high topic novelties and enjoy huge success, as we discuss shortly.


Traditional theories suggest that people like things with a balance of familiarity and surprise. Our findings from the fanfiction community contradict this: people, in general, are repelled by novelty and tend to stay with the familiar. On rare occasions, however, extreme novelties gain huge attention and success.

These extreme cases can be seen in our results. In Figure 6, the sharp rise in success for high topic novelty can be attributed to a small number of fanfictions, such as the I am Groot and binary number examples discussed above (see Results). These fictions are all among the most well-received ones in their fandoms. The Groot fiction in particular received 67,219 kudos, the highest number of kudos ever recorded in AO3. These examples reflect well-recognized ways of innovation: stylistic innovation, as well as recombination and remixing. The recognition of such highly novel creations may be one of the incentives for authors to risk the dangers of innovation (see Figure 6). Readers may not actively seek out this type of fiction, but they appreciate it when presented with it. In the case of art and design, a “surprised by novelty” dynamic is perhaps best captured by Steve Jobs: “People don’t know what they want until you show it to them.” This dynamic further implies that cultural evolution may not be a smooth process. Similar to the punctuated equilibrium theory in evolutionary biology [12] and the paradigm shifts in scientific revolution [23], people have the inertia to keep consuming familiar things for their comfort, until a revolutionary creation redefines their tastes and opens up a new space for followers, establishing the landmarks in cultural history.

The boundaries that fandoms impose on themselves provide a natural control for the variation in subjects, characters and settings, allowing us to better isolate the influence of novelty, and avoiding the confounding factors that may have contributed to the inverted U-shape curve found by previous studies. However, our dataset also has limitations. Both the authors and the consumers of the works are drawn from a particular population skewed towards young women, and largely from the United States and United Kingdom. Fanfiction is usually the domain of amateur writers whose training, socialization, and incentives may differ from the “professional” producers of cultural products in other domains. Finally, while all cultural practices have a chain of inheritance, fanfictions are more explicitly anchored in original texts than most.

Our results are robust against the two different characterizations of the texts (term and topic); both methods, however, neglect the semantic information contained in word orderings. In literary theories, the arrangement of events plays an essential role in stories. Our methods capture the “material” of stories but are unable to evaluate the way it is arranged. Moreover, the novelty of a piece of writing may appear in its style as well as in its content. Some stylometric features, such as the usage of certain words, are captured by our methods, but they cannot be separated from content. Other features such as sentence length and punctuation usage are neglected by our methods. This treatment of stylometric features may bias our evaluation of novelty.

Our results may also diverge from previous research because of the methods we used. In his early experiments, Berlyne [4] controlled the novelty of geometric shapes by varying the subjects’ exposure to them, and then evaluated success by asking the subjects to report the shapes’ “interestingness”. Similarly, Zajonc’s experiments exposed subjects to groups of words [48]. These experiments differ greatly from ours in data and setup, which may be one reason for the different outcomes. However, we also note that our results diverge from some recent studies that measure novelty and success in ways similar to ours [17, 2], suggesting that our findings are not merely caused by the different experimental setups.

Our results may also have practical value. For fanfiction writers, our findings suggest that they may gain popularity by writing fiction that fits into the “mainstream” of the fandom. However, exceptional recognition sometimes comes from creating an avant-garde, adventurous, and ingenious work. This recommendation may also apply to the broader area of genre fiction, where sameness may continue to satisfy readers, while high novelty can open up new markets. Our methods can also be extended easily to other types of text data. A potential research direction is therefore to investigate the relationship between novelty and success in other areas, such as book markets or news articles.


We thank the AO3 staff for helping us with the data collection, and thank Ágnes Horvát and Jaehyuk Park for their comments.

Appendix: Full multiple regression results

Figure 7: OLS coefficients for models 5-8. 95% confidence intervals are shown.

The complete set of coefficient estimates for models 5-8 is shown in Figure 7.


  • [1] Archive of Our Own. Archive of Our Own, 2018. [Online; accessed 12-September-2018].
  • [2] N. Askin and M. Mauskapf. What makes popular culture popular? product features and optimal differentiation in music. American Sociological Review, 82(5):910–944, 2017.
  • [3] A. T. Barron, J. Huang, R. L. Spang, and S. DeDeo. Individuals, institutions, and innovation in the debates of the French revolution. Proceedings of the National Academy of Sciences, 115(18):4607–4612, 2018.
  • [4] D. E. Berlyne. Novelty, complexity, and hedonic value. Attention, Perception, & Psychophysics, 8(5):279–286, 1970.
  • [5] R. W. Black. Language, culture, and identity in online fanfiction. E-learning and Digital Media, 3(2):170–184, 2006.
  • [6] D. M. Blei, A. Y. Ng, and M. I. Jordan. Latent dirichlet allocation. Journal of Machine Learning Research, 3(Jan):993–1022, 2003.
  • [7] R. F. Bornstein. Exposure and affect: overview and meta-analysis of research, 1968–1987. Psychological bulletin, 106(2):265, 1989.
  • [8] Box Office Mojo. All time box office, 2018. [Online; accessed 05-July-2018].
  • [9] J. Campbell. The hero with a thousand faces, volume 17. New World Library, 2008.
  • [10] M. De Vaan, D. Stark, and B. Vedres. Game changer: The topology of creativity. American Journal of Sociology, 120(4):1144–1194, 2015.
  • [11] Fanlore contributors. Transformative work — Fanlore, 2015. [Online; accessed 14-December-2015].
  • [12] N. Eldredge and S. J. Gould. Punctuated equilibria: an alternative to phyletic gradualism. 1972.
  • [13] J. Halberstadt and G. Rhodes. It’s not just average faces that are attractive: Computer-manipulated averageness makes birds, fish, and automobiles attractive. Psychonomic Bulletin & Review, 10(1):149–156, Mar 2003.
  • [14] D. J. Hargreaves. The effects of repetition on liking for music. Journal of research in Music Education, 32(1):35–47, 1984.
  • [15] T. J. Hastie. Generalized additive models. In Statistical models in S, pages 249–307. Routledge, 2017.
  • [16] M. Hills. The expertise of digital fandom as a ‘community of practice’ exploring the narrative universe of doctor who. Convergence, 21(3):360–374, 2015.
  • [17] E.-A. Horvát, J. Wachs, R. Wang, and A. Hannák. The role of novelty in securing investors for equity crowdfunding campaigns. 2018.
  • [18] B. R. Humphreys. Dealing with zeros in economic data. Department of Economics, University of Alberta, Alberta, 2013.
  • [19] D. Huron. A psychological approach to musical form: The habituation-fluency theory of repetition. Current Musicology, (96):7, 2013.
  • [20] M. Hutter. Infinite surprises: on the stabilization of value in the creative industries. The Worth of Goods. Valuation and pricing in the economy, pages 201–220, 2011.
  • [21] A. M. Jones. Health econometrics. In Handbook of health economics, volume 1, pages 265–344. Elsevier, 2000.
  • [22] S. Klingenstein, T. Hitchcock, and S. DeDeo. The civilizing process in London’s Old Bailey. Proceedings of the National Academy of Sciences of the United States of America, 111(26):9419–9424, 2014.
  • [23] T. S. Kuhn. The structure of scientific revolutions. University of Chicago press, 2012.
  • [24] W. R. Kunst-Wilson and R. B. Zajonc. Affective discrimination of stimuli that cannot be recognized. Science, 207(4430):557–558, 1980.
  • [25] C. Lévi-Strauss. The structural study of myth. The journal of American Folklore, 68(270):428–444, 1955.
  • [26] A. M. Magnifico, J. S. Curwood, and J. C. Lammers. Words on the screen: broadening analyses of interactions among fanfiction writers and reviewers. Literacy, 49(3):158–166, 2015. LIT-OA-2015-003.R1.
  • [27] L. Manovich. What comes after remix. Remix Theory, 10:2013, 2007.
  • [28] M. Mauch, R. M. MacCallum, M. Levy, and A. M. Leroi. The evolution of popular music: USA 1960–2010. Royal Society Open Science, 2(5):150081, 2015.
  • [29] J. Meyers-Levy and A. M. Tybout. Schema congruity as a basis for product evaluation. Journal of Consumer Research, 16(1):39–54, 1989.
  • [30] P. Mohanty and S. Ratneshwar. Visual metaphors in ads: The inverted-u effects of incongruity on processing pleasure and ad effectiveness. Journal of Promotion Management, 22(3):443–460, 2016.
  • [31] National Endowment for the Arts. The arts contribute more than $760 billion to the u.s. economy, 2018. [Online; accessed 04-September-2018].
  • [32] V. Propp. Morphology of the Folktale, volume 9. University of Texas Press, 2010.
  • [33] L. M. Rohrs. North and South as Jane Austen Fanfiction: How Gaskell’s Use of Austen’s Characters and Structure Strengthen Her Social Protest Novel. The Victorian, 6(1), 2018.
  • [34] M. J. Salganik, P. S. Dodds, and D. J. Watts. Experimental study of inequality and unpredictability in an artificial cultural market. Science, 311(5762):854–856, 2006.
  • [35] M. J. Salganik and D. J. Watts. Leading the herd astray: An experimental study of self-fulfilling prophecies in an artificial cultural market. Social psychology quarterly, 71(4):338–355, 2008.
  • [36] G. Salton and C. Buckley. Term-weighting approaches in automatic text retrieval. Information Processing & Management, 24(5):513–523, 1988.
  • [37] S. Seabold and J. Perktold. Statsmodels: Econometric and statistical modeling with python. In 9th Python in Science Conference, 2010.
  • [38] W. Sluckin, A. M. Colman, and D. J. Hargreaves. Liking words as a function of the experienced frequency of their occurrence. British Journal of Psychology, 71(1):163–169, 1980.
  • [39] S. Sreenivasan. Quantitative analysis of the evolution of novelty in cinema through crowdsourced keywords. Scientific Reports, 3:2758, 2013.
  • [40] M. Thelwall and P. Wilson. Regression for citation data: An evaluation of different methods. Journal of Informetrics, 8(4):963–971, 2014.
  • [41] B. Thomas. What is fanfiction and why are people saying such nice things about it? Storyworlds: A Journal of Narrative Studies, 3(1):1–24, 2011.
  • [42] D. Thompson. The shazam effect. The Atlantic, December issue. http://www.theatlantic.com/magazine/archive/2014/12/the-shazam-effect/382237, 2014.
  • [43] B. Uzzi, S. Mukherjee, M. Stringer, and B. Jones. Atypical combinations and scientific impact. Science, 342(6157):468–472, 2013.
  • [44] Wikipedia contributors. Fandom — Wikipedia, the free encyclopedia, 2018. [Online; accessed 15-January-2019].
  • [45] Wikipedia contributors. 2016 in film — Wikipedia, the free encyclopedia, 2019. [Online; accessed 15-January-2019].
  • [46] Wikipedia contributors. 2017 in film — Wikipedia, the free encyclopedia, 2019. [Online; accessed 15-January-2019].
  • [47] Wikipedia contributors. Spider-man in film — Wikipedia, the free encyclopedia, 2019. [Online; accessed 15-January-2019].
  • [48] R. B. Zajonc. Attitudinal effects of mere exposure. Journal of Personality and Social Psychology, 9(2p2):1, 1968.