The WEAT was introduced by caliskan2017semantics (caliskan2017semantics). It takes as input a trained word embedding model, two sets of “category words” and , and two sets of “attribute words” and . Where is the cosine similarity between the vectors assigned to words and by the trained word embedding model, it outputs a single measure of how ‘biased’ the set of word embeddings are, measured as:
So, for instance, and might be uniquely European-American and African-American names respectively, and might be ‘pleasant’ and ‘unpleasant’ words respectively (for a review of how these attribute words are chosen, see antoniak2021bad [antoniak2021bad]), and the word embedding model might be a Word2Vec model mikolov2013efficient trained on a corpus of interest. In this case, is the difference between how much more pleasant words are associated with white names than black names () and how much more unpleasant words are associated with white names than black names (). If pleasant words were found to be more associated with white names while unpleasant words were found to be more associated with black names, the overall measure would be positive, indicating anti-black bias in the underlying text corpus the word embeddings were trained on.
This deceptively simple measure has become an integral part of the computational linguistics toolkit. Other high-profile papers such as Garg et al. (garg2018word) and Lewis and Lupyan (lewis2020gender) have used the WEAT to study cultural biases across time and place. Importantly, the method is now being used to evaluate the political biases of websites knoche2019identifying, detect the purposeful spread of misinformation on social media by state-sponsored actors toney2021automatically, uncover biases present and proliferated through popular song lyrics barman2019decoding, and even to measure how much gender bias US judges display in their judicial opinions ash2021measuring.
However, there are well-known issues with word embeddings in general and the WEAT specifically that should make us skeptical of this proliferation. silva2021towards (silva2021towards), for instance, find that (at least when using contextualized embedding models) WEAT estimates poorly predict bias estimated by other measures and is even internally inconsistent. goldfarb2020intrinsic (goldfarb2020intrinsic) find that estimates of the bias present in word embeddings (such as those produced by the WEAT) do not meaningfully correlate with downstream biases of applications using those embeddings. Finally, terms in the semantic space estimated by word embeddings tend to cluster on non-intuitive dimensions such as term frequency arora2017simple; mu2017all; gong2018frage.
We focus in on this final issue–that a term’s frequency in a corpus shapes its estimated vector representation in a word embedding. Human language exhibits a clear “linguistic positivity bias”, where positive words are used more frequently than negative words dodds2015human. In theory, this might result in rare words being on average closer to negative words than positive words and frequent words being on average closer to positive words than negative words. wolfe2021low (wolfe2021low), consistent with this reasoning, find that the degree of linguistic bias, estimated using the WEAT, towards names unique to ethnic minorities is highly correlated with the frequency with which they appear in the underlying corpus.
Data and Methods
Geo-Tagged Twitter Data
A random 10% sample of the Twitter stream (i.e., the “Garden Hose”) was collected between January 2010 and May 2014, after which the data was reduced to a 1% sample for the remainder of 2014 preotiuc2012trendminer. Each tweet was mapped to a MSA (by first mapping to a U.S. county which is then trivially mapped to a MSA). If latitude/longitude information is available, then a tweet can trivially be mapped to a US county. If latitude/longitude data is not available for a given tweet, then location information is extracted from the self-reported User Location field, if available. This is a rule based mapping system designed to avoid false positives (i.e., incorrect mappings) at the expense of fewer mappings. For full details please see schwartz2013characterizing.
We also removed retweets and quoted tweets from the corpus, as we were interested only in the original language produced by MSA residents. All retweets and quoted tweets in our data contained “RT @” followed by the handle of the account the tweet was a retweet/quote of. Therefore, we excluded from analysis any tweet that contained “RT @” in its main body. However, some small number of tweets that were not retweets or quoted tweets likely contained this, and may have thus been unduly removed from the analysis.111 To asses how accurate this heuristic was, we tested it on similar Twitter data for which we
To asses how accurate this heuristic was, we tested it on similar Twitter data for which wedid have ground-truth meta-data indicating whether a tweet was a retweet or quoting tweet. We found the heuristic to be over 99.995% accurate in identifying retweets. After excluding MSAs with less that 500k tweets (see below), the final data set consists of 1.12 billion tweets from 214 MSAs (out of 384 possible MSAs).
MSA-Level WEAT Estimates
Our empirical strategy is to compare WEAT-based measures of the anti-black linguistic bias present in each MSA’s Twitter discourse and compare that to other regional measures of racial animus. One straight-forward approach to this would be to train completely independent word embedding models on each MSA’s respective Twitter data, and subject each of these models to the WEAT. One obstacle to realizing this strategy is that the volume of Twitter data produced by the residents of many MSAs over our observation period is relatively small. Even in our large data set, the median number of tweets (after removing retweets as specified above) in an MSA was 615,474—a smaller number than typically used for high-quality word embeddings. Further, variation in the number of tweets available for each MSA might introduce unwanted bias into our estimates.
To overcome this limitation, we leveraged the approach taken in van2020explaining (van2020explaining), which allows for estimating linguistic differences among (relatively) small sub-populations. In our case, it works by first randomly sampling a fixed number of tweets from every MSA, compiling them together, and training a word embedding model (in our case a Word2Vec model222CBOW model, vector size of one-hundred, minimum term count of ten, using negative sampling and an initial learning rate of 0.025) on this stratified corpus. This model, built over 1 million tweets, is referred to as the “baseline model” and represents the consensual linguistic understanding among the MSAs. Then, for each MSA, a larger fixed number of tweets is sampled from that MSA and used to continue training the baseline model. Specifically, we sample 500k tweets per MSA, excluding those with less than 500k tweets. The resulting model is the “updated model”, which learns the idiosyncratic linguistic norms of its MSA. This updated model is then what the WEAT is performed on. This is repeated five times and the WEAT estimates for each MSA are averaged to overcome stochastic variation in sampling and in training the embedding models.
We wanted our WEAT to be as similar to that performed by caliskan2017semantics (caliskan2017semantics) as possible. To that end, we used the same list of African-American and European-American names and pleasant and unpleasant words as them. Since they used multiple lists from different sources, we simply took the union of these different lists (and excluded any which were not frequent enough in the corpus to be included in the baseline model). Just as in caliskan2017semantics (caliskan2017semantics), a higher score on the WEAT indicates more anti-black bias.
We briefly describe each variable below, with full descriptions in the Supplemental Materials.
Racial Animus Measures.
Implicit Bias is derived through IAT experiments. It is represented by the average -score (a measure of the difference in response latency when pleasant [unpleasant] words were paired with white [black] faces and vice versa) for all respondents in a given MSA. Explicit Bias measures feeling more warmly towards European Americans relative to African Americans (as self-reported). Opposition to Affirmative Action asks how respondents felt about affirmative action policies in general, which is averaged for all white respondents (on surveys). Racial Resentment is measured by racial resentment scale which attitudes thought to be indicative of racial animus among contemporary American whites (on surveys) kinder1996divided. Residential Segregation is the proportion of one group that would need to change the location of their residence for the MSA to have no segregation.
Relative Black Name Frequency.
For each MSA, we measure how often each name used as a category word in the WEAT appears in the MSA’s Twitter discourse and find the proportion of all name occurrences that are uniquely black.
We collect the following information for each MSA to use as standard control variables: the proportion of the population living in poverty, the log of the total population count, the logged population density, the proportion of residents whose highest educational attainment was completing high school or less, and the proportion of households that live in a rural area. We also collect the proportion of residents that identify as black. Finally, we create a series of binary variables indicating in which of the nine census division each MSA resides.
Following the methods used in the social sciences, we use ordinary least squares (OLS) regression, which models the outcome as a weighted linear combination of the predictors with random, Gaussian-distributed noise and for every predictor yields a standardized coefficient
with an associated significance. The simultaneous inclusion of multiple predictors allows OLS to partial out (control for) the associations of different covariates with the outcome. In all models, all variables are standardized (mean centered and re-scaled to a unit standard deviation ) to ease comparison and interpretability.
|Opp. affirm. action||-0.23***||-0.18**||-0.12||-0.08|
Table 1 reports a summary of our results. Each cell displays the estimated coefficient corresponding to the strength of the relationship between an MSA’s WEAT estimate and the outcome variable indicated by the row label. Each column corresponds to a set of covariates included in the model simultaneously. Asterisks denote levels of statistical significance.
In the first column labeled ‘No controls’, coefficients are equivalent to the bivariate Pearson correlation between WEAT estimates and the outcome. As can be seen, the WEAT estimates significantly correlate with each of the outcome measures, and each correlation is highly significant. Note, however, that the direction of the correlations with racial resentment, opposition to affirmative action, and residential segregation are all in the opposite direction of what one might initially expect (i.e., MSAs with higher survey-measured racial resentment show lower WEAT-measured anti-black bias).
The second column of Table 1 displays the same coefficient when controlling for an extensive list of standard controls at the MSA level, including census division dummies as well as proxies for socioeconomic status, average level of education, and rural/urban status. Even with these controls, the WEAT estimates strongly and significantly correlate with each of the five outcomes, suggesting robust relationships.
The third column summarizes the results of models that only control for percent of the MSA that identifies as black (and does not include the several covariates included under ‘Standard controls’). As can be seen in the table, in most cases the coefficient magnitude is reduced more when controlling for this one variable than when controlling for the several covariates included the previous column. However, the relationships between WEAT estimates and implicit bias as well as explicit bias remain statistically significant.
The fourth and final column of Table 1 shows the coefficient of the relationship between WEAT estimates and each outcome when controlling for relative black name frequency. As can be seen, none of the relationships attain statistical significance. This indicates that the WEAT estimates don’t contain significantly more information regarding the outcomes than the name frequencies alone.
To unpack the difference between columns 3 and 4, see Figure 1
, which shows that that the proportion of the population that identifies as black and the proportion of occurrences of uniquely black names in the WEAT are highly correlated (explaining over half of the variance). That is, Black names are used more in areas where Black Americans reside–and while controlling for % Black Population partially accounts for the association between the WEAT and Racial Animus Measures, it is really the relative occurrence of Black names that fully accounts for these associations.
Word embeddings are an undeniably powerful tool for the study of human language and cognition. A prominent article in the American Sociological Review has even said that they reveal the very “geometry of culture” kozlowski2019geometry. However, these models are also black-boxes; they seem to provide valuable information, but due to their complexity researchers cannot easily observe how they arrive at that information.
In this work, we showed that one potentially unintuitive aspect of word embeddings (their tendency to separate rare and common words in their estimated semantic space) can have unintended consequences for the study of human attitudes. Specifically, when estimating latent linguistic biases against social groups using the popular Word Embedding Association Test (WEAT), these models can conflate the relative frequency of words prototypical of the groups with positivity. This is especially problematic in our setting where we analyzed linguistic bias against groups with varying prevalences in the text-generating populations, which corresponds to a commensurate lack of representation in the Twitter data. In sum, this created spurious relationships between our estimates of linguistic bias and various experimental and survey-based measures of regional racial animus.
We were able to uncover this tendency by showing that these relationships vanished upon controlling for a single variable: the frequency of the words prototypical of the minority group relative to that same frequency for the majority group. While it’s relieving that we were able to find a simple way to alleviate this omitted variable bias, it’s worrisome that a different but fairly extensive set of controls did not sufficiently correct for it. This indicates that if other biases we don’t know about are also introduced by the use of word embeddings, we might not be able to rely on standard sociodemographic controls to fully address them.
The current work relies on the widely used Word2Vec model mikolov2013efficient; future work may extend analysis to GloVe models pennington2014glove and contextual embeddings models such as BERT devlin2018bert, as well as consider other measures of linguistic bias beyond the WEAT.
Our findings have important consequences for computational linguistics and computational social science. First and foremost, research using the WEAT in a way similar to how we do here should strongly consider either measuring and controlling for the relative frequency of the seed words used in the WEAT or estimating their word embeddings such that they are frequency-agnostic mu2017all; gong2018frage. Second, other measures for assessing linguistic bias should be audited to uncover whether or not similar biases exist as in the WEAT. Finally, social scientists using word embeddings models should take heed: these models are complex and their validity has not been properly demonstrated. Careful study of these models with this goal in mind is necessary before they can truly be considered a measure of the “geometry of culture.”