| post | off? | int? | lewd? | group? | targeted group | implied statement | min. speaker |
|---|---|---|---|---|---|---|---|
| Kendrick just dropped the coldest video of all fucking time | no | no | no | | | | |
| What a fucking stupid bitch shut the fuck up already | yes | yes | no | | | | |
| need to fuck sum bad , I got some pressure built up :P | no | no | yes | | | | |
| I take a bitch home give her long dick | yes | yes | yes | pers. | | | |
| You annoyed me every day for months you’re a fucking moron | yes | yes | no | no | | | |
| My problem with Korean artists: I dont know how to pronounce your name I can’t hanggul to save my life | yes | no | no | yes | Koreans | have weird names | no |
| I’ll continue this thread till non-lifting fags stop lifting like idiots at the gym… Which is never. | yes | yes | no | yes | gay men | are weak and not manly | no |
| I thought drugs were the only things black people could shoot up Boy was I wrong | yes | yes | no | yes | Black folks | do drugs | no |
Language has enormous power to project social biases and reinforce stereotypes about people (Fiske, 1993). Such biases are rarely projected through what is stated explicitly, but rather through the implied layers of meaning that frame and influence people’s judgements about others. For example, even with a seemingly innocuous statement that an all-Muslim movie was a “box office bomb”, most people can instantly recognize the implied demonizing stereotype that “Muslims are terrorists” (Figure 1). Understanding these biases with accurate underlying explanations is necessary for AI systems to adequately interact in the social world (Pereira et al., 2016), and failure to do so can result in the deployment of harmful technologies (e.g., conversational AI systems turning sexist and racist; Vincent, 2016).
Most previous approaches to understanding the implied harm in statements have cast the task as simple toxicity classification (e.g., Waseem and Hovy, 2016; Founta et al., 2018; Davidson et al., 2017). However, simple classifications risk discriminating against minority groups, due to high variation and identity-based biases in annotations, which, for example, cause models to learn spurious associations between dialect and toxicity (Sap et al., 2019a; Davidson et al., 2019). In addition, detailed explanations are much more informative for people trying to understand and reason about why a statement is potentially harmful (Ross et al., 2017).
Thus, we propose Social Bias Frames, a novel conceptual formalism that aims to model the pragmatic frames in which people project social biases and stereotypes onto others. Compared to semantic frames, the meanings projected by pragmatic frames are richer, and thus cannot be easily formalized using only categorical labels. Therefore, as illustrated in Figure 1, our formalism combines hierarchical categories of biased implications, such as intent and offensiveness, with implicatures described in free-form text, such as groups referenced and implied statements. In addition, we introduce SBIC (the Social Bias Inference Corpus, available at http://tinyurl.com/social-bias-frames), a new corpus collected using a novel crowdsourcing framework. SBIC supports large-scale learning and evaluation with over 100k structured annotations of social media posts, spanning over 26k implications about a thousand demographic groups.
We then establish baseline approaches that learn to recover Social Bias Frames from unstructured text. We find that while state-of-the-art neural models are effective at the high-level categorization of whether a given statement projects unwanted social bias (86%), they are not effective at spelling out more detailed explanations by accurately decoding Social Bias Frames. Our study motivates future research that combines structured pragmatic inference with commonsense reasoning about social implications.
Important Implications of Our Study
We recognize that studying Social Bias Frames necessarily requires us to confront online content that may be offensive or disturbing. However, deliberate avoidance does not make the problem go away. Therefore, the important premise of this study is that assessing social media content through the lens of Social Bias Frames is important for automatic flagging and AI-augmented writing interfaces, where potentially harmful online content can be analyzed with detailed explanations for users to consider and verify. In addition, collective analysis over large corpora can be insightful for educating people to make more conscious efforts to reduce the unconscious biases they repeatedly project in their language use.
2 Social Bias Frames Definition
| source dataset | # posts |
|---|---|
| Founta et al. (2018) | 11,865 |
| Davidson et al. (2017) | 3,008 |
| Waseem and Hovy (2016) | 1,816 |
To better enable models to account for socially biased implications of language (in this work, we employ the U.S. socio-cultural lens when discussing bias and power dynamics among demographic groups), we design a new pragmatic formalism that distinguishes several related but distinct inferences, shown in Figure 1. Given a natural language utterance, henceforth post, we collect both categorical and free-text inferences (described below), inspired by recent efforts in knowledge graph creation (e.g., Speer and Havasi, 2012; Sap et al., 2019b). The free-text explanations are crucial to our formalism, as they can both increase trust in predictions made by the machine (Kulesza et al., 2012; Bussone et al., 2015; Nguyen et al., 2018) and encourage a poster’s empathy towards the targeted group, thereby combating potential biases (Cohen-Almagor, 2014).
Offensiveness denotes the overall rudeness, disrespect, or toxicity of a post. We define it formally as whether the post could be considered “offensive to anyone”, as previous work has shown this phrasing to have higher recall of offensive content (Sap et al., 2019a). This is a categorical variable with three possible answers (yes, maybe, no).
Intent to offend captures whether the perceived motivation of the author was to offend; like offensiveness, this is annotated categorically.
Lewd or sexual references are a key subcategory of what constitutes potentially offensive material in many cultures, especially in the United States (Strub, 2008). This is a categorical variable with three possible answers (yes, maybe, no).
Group implications are distinguished from individual-only attacks or insults that do not invoke power dynamics between groups (e.g., “F*ck you” vs. “F*ck you, f*ggot”). This is a categorical variable with two possible answers.
The targeted group describes the social or demographic group that is referenced or targeted by the post. Here we collect free-text answers, but provide a seed list of demographic or social groups to encourage consistency.
The implied statement represents the power dynamic or stereotype that is referenced in the post. We collect free-text answers in the form of simple Hearst-like patterns (e.g., “women are ADJ”, “gay men VBP”; Hearst, 1992).
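For illustration, such Hearst-like templates can be checked mechanically. The sketch below is illustrative only (the regexes and function names are our assumptions, not the actual annotation tooling): it tests whether a free-text statement fits the “GROUP are ADJ” or “GROUP VBP” shape.

```python
import re

# Hypothetical templates approximating the Hearst-like patterns; the exact
# patterns used during annotation are not reproduced here.
TEMPLATES = [
    re.compile(r"^(?P<group>[\w' ]+?) (are|is) (?P<attr>[\w' ]+)$"),  # "women are weak"
    re.compile(r"^(?P<group>[\w' ]+?) (?P<verb>\w+)$"),               # "gay men lift"
]

def matches_template(statement):
    """Return True if the statement fits one of the simple templates."""
    s = statement.strip().lower()
    return any(t.match(s) for t in TEMPLATES)
```

Seeding annotators with such templates keeps the free-text implied statements consistent enough to be compared across workers.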
3 Collecting nuanced annotations
To create SBIC, we design a crowdsourcing framework to seamlessly distill the biased implications of posts at a large scale.
3.1 Data Selection
We draw posts to annotate from two sources of online content, namely Reddit and Twitter. To mitigate the scarcity of online toxicity (Founta et al., 2018, find that the prevalence of toxic content online is 4%), we start by annotating posts made in three intentionally offensive English subreddits (see Table 2). By nature, these are very likely to have harmful implications, as they are often posted with the intent to deride adversity or social inequality (Bicknell, 2007). Additionally, we include posts from three existing English datasets annotated for toxic or abusive language, filtering out @-replies, retweets, and links. We mainly annotate tweets released by Founta et al. (2018), who use a bootstrapping approach to sample potentially offensive tweets. We also include tweets from Waseem and Hovy (2016) and Davidson et al. (2017), who collect datasets of tweets containing racist or sexist hashtags and slurs, respectively.
| total # tuples | 103,173 |
3.2 Annotation Task Design
For each post, workers indicate whether it is offensive, whether the intent was to offend, and whether it contains lewd or sexual content. Only if annotators indicate potential offensiveness do they answer the group-implication question. If the post targets or references a group or demographic, workers select or write which one(s); for each selected group, they then write two to four stereotypes. Finally, workers are asked whether they think the speaker is part of one of the groups referenced by the post. Optionally, we ask workers for coarse-grained demographic information. (This study was approved by the University of Washington IRB.)
We collect three annotations per post, and restrict our worker pool to the U.S. and Canada.
In our final annotations, our worker pool was relatively gender- and age-balanced (55% women, 42% men, 1% non-binary; aged 36±10 years), but racially skewed (82% White, 4% Asian, 4% Hispanic, 4% Black).
We compute how well annotators agreed on the categorical questions, finding moderate agreement on average. Workers agreed on a post being offensive at a rate of 77% (Cohen’s κ=0.53), on its intent being to offend at 76% (κ=0.48), and on it having group implications at 76% (κ=0.51). Workers marked posts as lewd with substantial agreement (94%, κ=0.66), but agreed less when marking the speaker a minority (94%, κ=0.18); low κ values are expected for highly skewed categories such as minority speaker (only 4% “yes”).
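Pairwise agreement and Cohen’s κ for these binary questions can be computed as in the following sketch (illustrative only; the function name is ours, not part of our pipeline):

```python
from collections import Counter

def cohens_kappa(a, b):
    """Cohen's kappa between two annotators' label lists:
    (observed agreement - chance agreement) / (1 - chance agreement)."""
    n = len(a)
    p_o = sum(x == y for x, y in zip(a, b)) / n                    # observed agreement
    ca, cb = Counter(a), Counter(b)
    p_e = sum((ca[k] / n) * (cb[k] / n) for k in set(a) | set(b))  # chance agreement
    return (p_o - p_e) / (1 - p_e)
```

As noted above, a highly skewed category can show high raw agreement yet low κ, because the chance-agreement term p_e is already close to 1.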
3.3 SBIC description
4 Social Bias Inference
Given a post, our model aims to generate the implied power dynamics in textual form, as well as classify the post’s offensiveness and other categorical variables. We show a general overview of the full model in Figure 4.
As input, our model takes a post W = {w_1, …, w_n}, defined as a sequence of tokens delimited by a start token ([STR]) and a classifier token ([CLF]). Our encoder then yields a contextualized representation h_i ∈ R^H of each token w_i, where H is the hidden size of the encoder.
For predicting the five categorical variables (offensiveness, intent, lewdness, group implication, and in-group speaker), our model combines five logistic classifiers that use the representation at the classifier token, h_[CLF], as input. The final prediction for each variable v is computed through a projection and a sigmoid layer:

ŷ_v = σ(W_v · h_[CLF] + b_v),

where W_v ∈ R^H and b_v ∈ R. During training, we minimize the negative log-likelihood of the predictions:

L_class = −Σ_v log p(ŷ_v = y_v).

During inference, we simply predict the classes with the highest probability.
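A minimal NumPy sketch of these five logistic heads follows; the hidden size, variable names, and random parameters are illustrative assumptions, not the released implementation.

```python
import numpy as np

H = 8                                                 # assumed hidden size for the sketch
VARS = ["offensive", "intent", "lewd", "group", "in_group"]
rng = np.random.default_rng(0)
params = {v: (rng.normal(size=H), 0.0) for v in VARS}  # (W_v, b_v) per variable

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def predict(h_clf):
    """One logistic classifier per categorical variable over the [CLF] representation."""
    return {v: sigmoid(W @ h_clf + b) for v, (W, b) in params.items()}

def class_nll(probs, labels):
    """Negative log-likelihood summed over the five categorical variables."""
    return -sum(
        labels[v] * np.log(probs[v]) + (1 - labels[v]) * np.log(1 - probs[v])
        for v in VARS
    )
```

Each head is trained jointly with the shared encoder, so the classification loss shapes the same representation that the generation objective below uses.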
For the free-text variables, we take inspiration from recent generative commonsense modelling (Bosselut et al., 2019). Specifically, we frame the inference as a conditional language modelling task, appending the linearized targeted group (G) and implied statement (S) to the post (using a [SEP] delimiter token; see Figure 4). During training, we minimize the cross-entropy of the linearized triple under a language modelling objective:

L_LM = −Σ_t log p(w_t | w_{<t}).

During inference, we generate the group G and statement S conditioned on the post W, using greedy (argmax) or sampled decoding.
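The linearization and greedy decoding can be sketched as follows (the [END] token and the toy next-token interface are assumptions for illustration, not the exact implementation):

```python
def linearize(post_tokens, group_tokens, statement_tokens):
    """Append the targeted group and implied statement to the post with delimiters."""
    return (["[STR]"] + post_tokens + ["[SEP]"]
            + group_tokens + ["[SEP]"]
            + statement_tokens + ["[END]"])

def greedy_decode(next_token, prefix, max_len=20):
    """Argmax decoding: repeatedly append the model's most probable next token."""
    seq = list(prefix)
    for _ in range(max_len):
        tok = next_token(seq)          # the model's argmax continuation
        if tok == "[END]":
            break
        seq.append(tok)
    return seq[len(prefix):]           # the generated group/statement tokens
```

At inference time the prefix is the post plus the first [SEP], so the model continues with the group, a second [SEP], and the implied statement.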
| 54.8% pos. (dev) | 57.5% pos. (dev) | 9.3% pos. (dev) | 68.2% pos. (dev) | 2.0% pos. (dev) |
4.1 Experimental setup
In this work, we build on the pretrained OpenAI-GPT model (Radford et al., 2018) as our encoder, which has yielded impressive classification and generation results (Radford et al., 2018; Gabriel et al., 2019). This model is a uni-directional language model, meaning each encoded token representation is conditioned only on past tokens (i.e., h_i depends only on w_1, …, w_i). OpenAI-GPT was trained on English fiction (the Toronto Book Corpus; Zhu et al., 2015).
For baseline comparison, we consider a multitask classification-only model trained only on the classification loss (GPT L1). We also compare the full multitask model to a baseline generative inference model trained only on the language modelling loss (GPT L2). Finally, we consider a model variant that uses a randomly initialized GPT model, to observe the effect of pretraining.
We evaluate the performance of our models in the following ways. For classification, we report precision, recall, and F1 scores of the positive class. Following previous generative inference work (Sap et al., 2019b), we use automated metrics to evaluate model generations. We use BLEU-2 and ROUGE-L scores to capture word overlap between the generated inference and the references, as a proxy for generation quality (Galley et al., 2015; Hashimoto et al., 2019). We additionally compute the word mover’s distance (WMD; Kusner et al., 2015), which uses distributed word representations to measure similarity between the generated and target text.
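The clipped n-gram precision at the core of BLEU-2, computed against all references for a post, can be sketched as below (brevity penalty omitted; an illustration under our own naming, not the evaluation script we use):

```python
from collections import Counter

def ngrams(tokens, n):
    """All length-n token windows of the sequence."""
    return list(zip(*[tokens[i:] for i in range(n)]))

def clipped_precision(hyp, refs, n=2):
    """Modified n-gram precision: each hypothesis n-gram is credited at most
    its maximum count in any single reference."""
    hyp_counts = Counter(ngrams(hyp, n))
    if not hyp_counts:
        return 0.0
    max_ref = Counter()
    for ref in refs:
        for g, c in Counter(ngrams(ref, n)).items():
            max_ref[g] = max(max_ref[g], c)
    clipped = sum(min(c, max_ref[g]) for g, c in hyp_counts.items())
    return clipped / sum(hyp_counts.values())
```

Scoring against all reference implications at once, rather than one at a time, rewards a generation that matches any of the annotators' phrasings.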
4.3 Training details
As each post can contain multiple annotations, we define a training instance as one post-group-statement triple (along with the five categorical annotations). We then split our dataset into train/dev./test (75:12.5:12.5), ensuring that no post is present in multiple splits. For evaluation (dev., test), we combine the categorical variables by averaging, and compare the generated inferences (hypotheses) to all targeted groups and implied statements (references).
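A sketch of such a post-level split, where every triple for a given post lands in the same partition (field names are illustrative assumptions):

```python
import random

def split_by_post(triples, seed=0, frac=(0.75, 0.125, 0.125)):
    """Partition post-group-statement triples into train/dev/test so that
    no post appears in more than one split."""
    posts = sorted({t["post"] for t in triples})
    random.Random(seed).shuffle(posts)
    n = len(posts)
    cut1, cut2 = int(frac[0] * n), int((frac[0] + frac[1]) * n)
    assign = {}
    for i, p in enumerate(posts):
        assign[p] = "train" if i < cut1 else ("dev" if i < cut2 else "test")
    splits = {"train": [], "dev": [], "test": []}
    for t in triples:
        splits[assign[t["post"]]].append(t)
    return splits
```

Splitting at the post level (rather than the triple level) prevents a model from being evaluated on implications of posts it has already seen during training.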
All experiments are carried out using HuggingFace’s Transformers library (https://github.com/huggingface/transformers).
We tune hyperparameters on the dev. set, and report performance for the best-performing setting (according to average F1). We train or finetune our models using a batch size of 4 and a learning rate of 5e-5 (with linear warm-up), and consider training for varying numbers of epochs.
| | post | predicted group | predicted implication | reference groups | reference implications |
|---|---|---|---|---|---|
| (a) | All-female casts bother me but I have literally never noticed the hundreds of movies with 0 female characters I have already watched | — | — | women | women aren’t good actors; women can’t be in leading roles |
| (b) | As expected, when the terrorist group Hamas won the election in Gaza it was the usual one man, one vote, one time, Islamist election. #Islam | muslim folks | muslims are terrorists | muslim folks; arabic folks | bad people; islam promotes controlling governments; muslims are authoritarians; not fair |
| (c) | Black guy in class: *attempts to throw a paper ball into the trash* *misses* Mrs. Mundy: “You’re a disgrace to your race, Marcus” | black folks | black people are worthless | black folks | are good at basketball; black men are defined by athletic skill; black men should be good at basketball; do not take things seriously |
| (d) | A Jew runs into a wall with a boner. What breaks first? His nose. | jewish folks | jewish people have big noses | jewish folks | jewish folks have large noses; jewish folks have the same features; jews are fun to joke about; makes fun of there looks; stereotype about there nose length; they have big noses |
| (e) | I hate fat bitches | overweight/fat folks | fat people are unattractive | overweight/fat folks | are not pleasant people; fat folks are all the same; fat folks are less than others; not likable |
Most notably for classification, the multitask model outperforms other variants substantially when predicting a post’s offensiveness and intent to offend (+8% F1 on both). The classification-only model slightly outperforms the full multitask model on other categories. We hypothesize that correctly predicting those might require more lexical matching (e.g., detecting sexual words for the lewd category). In contrast, the offensiveness and intent gains from full multitasking suggest that for those more subtle semantic categories, more in-domain language model finetuning helps. Highly skewed categories pose a challenge for all models, due to the lack of positive instances. As expected, using the randomly initialized model performs significantly worse than the pretrained version.
When we evaluate on our generation tasks, we find that model performance is comparable across automatic metrics between the full multitask variant (GPT L1+L2) and the free-text only generation model (GPT L2). Surprisingly, the randomly initialized multitask variant performs better on BLEU and WMD on the group target inference, which is likely due to the small and constrained generation space (there are only 1.1k different groups in our corpus; see Table 3). When the generation space is larger (for the implied statement), pretrained variants perform better.
6 Related Work
Bias and Toxicity Detection
Detection of hateful, abusive, or otherwise toxic language has received increased attention recently (Schmidt and Wiegand, 2017). Most dataset creation work has cast this detection problem as binary classification (Waseem and Hovy, 2016; Wulczyn et al., 2017; Davidson et al., 2017; Founta et al., 2018). Recently, Zampieri et al. (2019) collected a dataset of tweets with hierarchical categorical annotations of offensiveness and of whether a group or individual is targeted. In contrast, Social Bias Frames cover both hierarchical categorical and free-text annotations.
Inference about Social Dynamics
Various work has tackled the task of making inferences about power and social dynamics. Particularly, previous work has analyzed power dynamics about specific entities, either in conversation settings (Prabhakaran et al., 2014; Danescu-Niculescu-Mizil et al., 2012) or in narrative text Sap et al. (2017); Field et al. (2019); Antoniak et al. (2019). Additionally, recent work in commonsense inference has focused on mental states of participants of a situation (e.g., Rashkin et al., 2018; Sap et al., 2019b). In contrast to reasoning about particular individuals, our work focuses on biased implications of social and demographic groups as a whole.
7 Ethical considerations
Risks in deployment
Determining offensiveness and reasoning about harmful implications of language should be done with care. When deploying such algorithms, several ethical aspects should be considered including the fairness of the model on speech by different demographic groups or in different varieties of English Mitchell et al. (2019). Additionally, practitioners should discuss potential nefarious side effects of deploying such technology, such as censorship Ullmann and Tomalin (2019) and dialect-based racial bias Sap et al. (2019a); Davidson et al. (2019). Finally, inferences about offensiveness could be paired with promotions of positive online interactions, such as emphasis of community standards Does et al. (2011) or counter-speech Chung et al. (2019); Qian et al. (2019).
Risks in annotation
Recent work has highlighted various negative side effects caused by annotating potentially abusive or harmful content (e.g., acute stress; Roberts, 2016). We mitigate these by limiting the number of posts that one worker can annotate in one day, paying workers above minimum wage ($7–$12), and providing crisis-management resources to our annotators (we direct workers to the Crisis Text Line, https://www.crisistextline.org/).
To help machines reason about and account for societal biases, we introduce Social Bias Frames, a new structured commonsense formalism that distills knowledge about the biased implications of language. Our frames combine categorical knowledge about the offensiveness, intent, and targets of statements with free-text inferences about which groups are targeted and which biased implications or stereotypes are invoked. We collect a new dataset of 100k annotations on social media posts using a novel crowdsourcing framework. We establish baseline performance of models built on top of a large pretrained language model. We show that while classifying the intent or offensiveness of statements is easier, models struggle to generate relevant inferences about social biases, especially when implications have low lexical overlap with posts. This indicates that more sophisticated models are required for Social Bias Frames inferences.
Authors would like to thank Hannah Rashkin and Lucy Lin for their helpful comments on the paper. This research was supported in part by NSF (IIS-1524371, IIS-1714566), DARPA under the CwC program through the ARO (W911NF-15-1-0543), DARPA under the MCS program through NIWC Pacific (N66001-19-2-4031), and Samsung Research.
- Antoniak et al. (2019) Maria Antoniak, David Mimno, and Karen Levy. 2019. Narrative paths and negotiation of power in birth stories. In CSCW.
- Bicknell (2007) Jeanette Bicknell. 2007. What is offensive about offensive jokes? Philosophy Today, 51(4):458–465.
- Bosselut et al. (2019) Antoine Bosselut, Hannah Rashkin, Maarten Sap, Chaitanya Malaviya, Asli Celikyilmaz, and Yejin Choi. 2019. Comet: Commonsense transformers for automatic knowledge graph construction. In ACL.
- Breitfeller et al. (2019) Luke M Breitfeller, Emily Ahn, David Jurgens, and Yulia Tsvetkov. 2019. Finding microaggressions in the wild: A case for locating elusive phenomena in social media posts. In EMNLP.
- Bussone et al. (2015) Adrian Bussone, Simone Stumpf, and Dympna O’Sullivan. 2015. The role of explanations on trust and reliance in clinical decision support systems. In 2015 International Conference on Healthcare Informatics, pages 160–169. IEEE.
- Chung et al. (2019) Yi-Ling Chung, Elizaveta Kuzmenko, Serra Sinem Tekiroglu, and Marco Guerini. 2019. CONAN - COunter NArratives through nichesourcing: a multilingual dataset of responses to fight online hate speech. In ACL, pages 2819–2829, Stroudsburg, PA, USA. Association for Computational Linguistics.
- Cohen-Almagor (2014) Raphael Cohen-Almagor. 2014. Countering hate on the internet. Annual review of law and ethics, 22:431–443.
- Danescu-Niculescu-Mizil et al. (2012) Cristian Danescu-Niculescu-Mizil, Lillian Lee, Bo Pang, and Jon Kleinberg. 2012. Echoes of power: language effects and power differences in social interaction. In WWW, page 699, New York, New York, USA. ACM Press.
- Davidson et al. (2019) Thomas Davidson, Debasmita Bhattacharya, and Ingmar Weber. 2019. Racial bias in hate speech and abusive language detection datasets. In Abusive Language Workshop.
- Davidson et al. (2017) Thomas Davidson, Dana Warmsley, Michael W Macy, and Ingmar Weber. 2017. Automated hate speech detection and the problem of offensive language. In ICWSM.
- Does et al. (2011) Serena Does, Belle Derks, and Naomi Ellemers. 2011. Thou shalt not discriminate: How emphasizing moral ideals rather than obligations increases whites’ support for social equality. J. Exp. Soc. Psychol., 47(3):562–571.
- Dynel (2015) Marta Dynel. 2015. The landscape of impoliteness research. Journal of Politeness Research, 11(2):383.
- Field et al. (2019) Anjalie Field, Gayatri Bhat, and Yulia Tsvetkov. 2019. Contextual affective analysis: A case study of people portrayals in online #MeToo stories. In ICWSM, volume 13, pages 158–169.
- Fiske (1993) S T Fiske. 1993. Controlling other people: The impact of power on stereotyping. Am. Psychol., 48(6):621–628.
- Founta et al. (2018) Antigoni-Maria Founta, Constantinos Djouvas, Despoina Chatzakou, Ilias Leontiadis, Jeremy Blackburn, Gianluca Stringhini, Athena Vakali, Michael Sirivianos, and Nicolas Kourtellis. 2018. Large scale crowdsourcing and characterization of twitter abusive behavior. In ICWSM.
- Gabriel et al. (2019) Saadia Gabriel, Antoine Bosselut, Ari Holtzman, Kyle Lo, Asli Çelikyilmaz, and Yejin Choi. 2019. Cooperative generator-discriminator networks for abstractive summarization with narrative flow. ArXiv, abs/1907.01272.
- Galley et al. (2015) Michel Galley, Chris Brockett, Alessandro Sordoni, Yangfeng Ji, Michael Auli, Chris Quirk, Margaret Mitchell, Jianfeng Gao, and William B. Dolan. 2015. deltaBLEU: A discriminative metric for generation tasks with intrinsically diverse targets. In ACL.
- Greengross and Miller (2008) Gil Greengross and Geoffrey F Miller. 2008. Dissing oneself versus dissing rivals: Effects of status, personality, and sex on the Short-Term and Long-Term attractiveness of Self-Deprecating and Other-Deprecating humor. Evol. Psychol., 6(3):147470490800600303.
- Hashimoto et al. (2019) Tatsunori B. Hashimoto, Hugh Zhang, and Percy Liang. 2019. Unifying human and statistical evaluation for natural language generation. In NAACL-HLT.
- Hearst (1992) Marti A Hearst. 1992. Automatic acquisition of hyponyms from large text corpora. In ACL, COLING ’92, pages 539–545, Stroudsburg, PA, USA. Association for Computational Linguistics.
- Kasper (1990) Gabriele Kasper. 1990. Linguistic politeness: Current research issues. J. Pragmat., 14(2):193–218.
- Kulesza et al. (2012) Todd Kulesza, Simone Stumpf, Margaret Burnett, and Irwin Kwan. 2012. Tell me more?: the effects of mental model soundness on personalizing an intelligent agent. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, pages 1–10. ACM.
- Kusner et al. (2015) Matt Kusner, Yu Sun, Nicholas Kolkin, and Kilian Weinberger. 2015. From word embeddings to document distances. In ICML, pages 957–966.
- Liu et al. (2016) Chia-Wei Liu, Ryan Lowe, Iulian V Serban, Michael Noseworthy, Laurent Charlin, and Joelle Pineau. 2016. How NOT to evaluate your dialogue system: An empirical study of unsupervised evaluation metrics for dialogue response generation. In ACL.
- Mitchell et al. (2019) Margaret Mitchell, Simone Wu, Andrew Zaldivar, Parker Barnes, Lucy Vasserman, Ben Hutchinson, Elena Spitzer, Inioluwa Deborah Raji, and Timnit Gebru. 2019. Model cards for model reporting. In FAT.
- Nguyen et al. (2018) An T Nguyen, Aditya Kharosekar, Saumyaa Krishnan, Siddhesh Krishnan, Elizabeth Tate, Byron C Wallace, and Matthew Lease. 2018. Believe it or not: Designing a human-ai partnership for mixed-initiative fact-checking. In The 31st Annual ACM Symposium on User Interface Software and Technology, pages 189–199. ACM.
- Pereira et al. (2016) Gonçalo Pereira, Rui Prada, and Pedro A Santos. 2016. Integrating social power into the decision-making of cognitive agents. Artif. Intell., 241:1–44.
- Prabhakaran et al. (2014) Vinodkumar Prabhakaran and Owen Rambow. 2014. Predicting power relations between participants in written dialog from a single thread. In ACL.
- Qian et al. (2019) Jing Qian, Anna Bethke, Yinyin Liu, Elizabeth Belding, and William Yang Wang. 2019. A benchmark dataset for learning to intervene in online hate speech. In EMNLP.
- Radford et al. (2018) Alec Radford, Karthik Narasimhan, Tim Salimans, and Ilya Sutskever. 2018. Improving language understanding by generative pre-training.
- Rashkin et al. (2018) Hannah Rashkin, Maarten Sap, Emily Allaway, Noah A. Smith, and Yejin Choi. 2018. Event2mind: Commonsense inference on events, intents, and reactions. In ACL.
- Roberts (2016) Sarah T Roberts. 2016. Commercial content moderation: Digital laborers’ dirty work. In Safiya Umoja Noble and Brendesha M Tynes, editors, The Intersectional Internet: Race, Sex, Class and Culture Online, Media Studies Publications. Peter Lang Publishing.
- Ross et al. (2017) Björn Ross, Michael Rist, Guillermo Carbonell, Benjamin Cabrera, Nils Kurowsky, and Michael Wojatzki. 2017. Measuring the reliability of hate speech annotations: The case of the european refugee crisis. In NLP 4 CMC Workshop.
- RWJF (2017) RWJF. 2017. Discrimination in america: Experiences and views. https://www.rwjf.org/en/library/research/2017/10/discrimination-in-america--experiences-and-views.html. Accessed: 2019-11-5.
- Sakaguchi et al. (2019) Keisuke Sakaguchi, Ronan Le Bras, Chandra Bhagavatula, and Yejin Choi. 2019. Winogrande: An adversarial winograd schema challenge at scale. ArXiv, abs/1907.10641.
- Sap et al. (2019a) Maarten Sap, Dallas Card, Saadia Gabriel, Yejin Choi, and Noah A Smith. 2019a. The risk of racial bias in hate speech detection. In ACL.
- Sap et al. (2019b) Maarten Sap, Ronan LeBras, Emily Allaway, Chandra Bhagavatula, Nicholas Lourie, Hannah Rashkin, Brendan Roof, Noah A Smith, and Yejin Choi. 2019b. Atomic: An atlas of machine commonsense for if-then reasoning. In AAAI.
- Sap et al. (2017) Maarten Sap, Marcella Cindy Prasetio, Ariel Holtzman, Hannah Rashkin, and Yejin Choi. 2017. Connotation frames of power and agency in modern films. In EMNLP.
- Schmidt and Wiegand (2017) Anna Schmidt and Michael Wiegand. 2017. A survey on hate speech detection using natural language processing. In Proceedings of the Workshop on NLP for Social Media.
- Speer and Havasi (2012) Robyn Speer and Catherine Havasi. 2012. Representing general relational knowledge in conceptnet 5. In LREC.
- Strub (2008) Whitney Strub. 2008. The clearly obscene and the queerly obscene: Heteronormativity and obscenity in cold war los angeles. Am. Q., 60(2):373–398.
- Ullmann and Tomalin (2019) Stefanie Ullmann and Marcus Tomalin. 2019. Quarantining online hate speech: technical and ethical perspectives. Ethics Inf. Technol.
- Vincent (2016) James Vincent. 2016. Twitter taught microsoft’s AI chatbot to be a racist asshole in less than a day. https://www.theverge.com/2016/3/24/11297050/tay-microsoft-chatbot-racist. Accessed: 2019-10-26.
- Wang and Potts (2019) Zijian Wang and Christopher Potts. 2019. TalkDown: A corpus for condescension detection in context. In EMNLP.
- Waseem and Hovy (2016) Zeerak Waseem and Dirk Hovy. 2016. Hateful symbols or hateful people? predictive features for hate speech detection on twitter. In NAACL Student Research Workshop.
- Wulczyn et al. (2017) Ellery Wulczyn, Nithum Thain, and Lucas Dixon. 2017. Ex machina: Personal attacks seen at scale. In WWW.
- Zampieri et al. (2019) Marcos Zampieri, Shervin Malmasi, Preslav Nakov, Sara Rosenthal, Noura Farra, and Ritesh Kumar. 2019. Predicting the type and target of offensive posts in social media. In NAACL.
- Zhu et al. (2015) Yukun Zhu, Ryan Kiros, Richard S. Zemel, Ruslan R. Salakhutdinov, Raquel Urtasun, Antonio Torralba, and Sanja Fidler. 2015. Aligning books and movies: Towards story-like visual explanations by watching movies and reading books. In 2015 IEEE International Conference on Computer Vision (ICCV), pages 19–27.