Joint Multiclass Debiasing of Word Embeddings

by Radomir Popović, et al.

Bias in Word Embeddings has been a subject of recent interest, along with efforts for its reduction. Current approaches show promising progress towards debiasing single bias dimensions such as gender or race. In this paper, we present a joint multiclass debiasing approach that is capable of debiasing multiple bias dimensions simultaneously. In that direction, we present two approaches, HardWEAT and SoftWEAT, that aim to reduce biases by minimizing the scores of the Word Embedding Association Test (WEAT). We demonstrate the viability of our methods by debiasing Word Embeddings on three classes of biases (religion, gender and race) in three different publicly available word embeddings and show that our concepts can reduce or even completely eliminate bias, while maintaining meaningful relationships between vectors in word embeddings. Our work strengthens the foundation for more unbiased neural representations of textual data.





1 Introduction

Word Embeddings, i.e., the vector representation of natural language words, are key components of many state-of-the-art algorithms for a variety of Natural Language Processing tasks, such as Sentiment Analysis or Part of Speech Tagging. Recent research established that popular embeddings are prone to substantial biases, e.g., with respect to gender or race [DBLP:journals/corr/abs-1711-08412, bolukbasi2016], as demonstrated by results of basic analogy tasks such as “Man is to Computer Programmer as Woman is to Homemaker” [bolukbasi2016]. Since such biases can potentially have an effect on downstream tasks, several relevant approaches for debiasing existing word embeddings have been developed. A common deficit of existing techniques is that debiasing is limited to a single bias dimension (such as gender). Thus, in this paper, we propose two new post-processing methods for joint/simultaneous multiclass debiasing, which differ in their trade-off between maintaining word relationships and decreasing bias levels: HardWEAT completely eliminates contained bias as measured by the established Word Embedding Association Test [caliskan2017]. SoftWEAT has a stronger and tunable emphasis on maintaining the original relationships between words in addition to bias removal. We demonstrate the effectiveness of our approach on the bias dimensions gender, race and religion on three conventional Word Embedding models: FastText, GloVe and Word2Vec.

2 Background and Related Work

In this section, we introduce key concepts formally and discuss related work on debiasing word embeddings.

We assume a Word Embedding that maps each word of a finite vocabulary to a vector representation. Protected classes are entities on which bias can exist, e.g., race, religion, gender, or age. Subclasses, or target sets of words, refer to directions of a class, e.g., male and female for gender, and each of them is associated with a set of definitional words. The set of neutral words contains all words that should not be associated with any subclass. Finally, attribute sets of words denote word sets that a target set of words can potentially be linked to, e.g., {pleasant, nice, enjoyable} or {science, research, academic}.

The Word Embedding Association Test. The state-of-the-art way of measuring biases in embeddings is the Word Embedding Association Test (WEAT) [caliskan2017]: It considers two sets of attribute words ($A$, $B$), e.g., family and career related words, and two target sets ($X$, $Y$), e.g., black and white names. The null hypothesis of this test states that the two target sets do not differ in their relative association with the two attribute sets. Thus, rejecting the null hypothesis asserts bias. To examine this, a test statistic $s(X, Y, A, B)$ quantifies the differential association of the two target sets with the two attribute sets. It is computed as $s(X, Y, A, B) = \sum_{x \in X} s(x, A, B) - \sum_{y \in Y} s(y, A, B)$, where $s(w, A, B) = \mathrm{mean}_{a \in A} \cos(w, a) - \mathrm{mean}_{b \in B} \cos(w, b)$ describes the relative association between a single target word $w$ and the two attribute sets in a range $[-2, 2]$. Based on $s(X, Y, A, B)$, we assess the statistical significance via a one-sided permutation test, through which we acquire a $p$-value. Additionally, an effect size $d$ that quantifies the severity of the bias can be calculated as:

$$d = \frac{\mathrm{mean}_{x \in X}\, s(x, A, B) - \mathrm{mean}_{y \in Y}\, s(y, A, B)}{\mathrm{std}_{w \in X \cup Y}\, s(w, A, B)}$$

We will use this effect size in our novel debiasing approaches.
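To make the effect size concrete, the following is a minimal NumPy sketch of the WEAT quantities described above. The function names are illustrative and not taken from the paper's code, and the use of the sample standard deviation (`ddof=1`) is an assumption, since implementations differ on this point.

```python
import numpy as np

def cosine(u, v):
    """Cosine similarity between two vectors."""
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

def association(w, A, B):
    """s(w, A, B): mean cosine to attribute set A minus mean cosine to B."""
    return np.mean([cosine(w, a) for a in A]) - np.mean([cosine(w, b) for b in B])

def weat_effect_size(X, Y, A, B):
    """Effect size d for target sets X, Y and attribute sets A, B."""
    sx = np.array([association(x, A, B) for x in X])
    sy = np.array([association(y, A, B) for y in Y])
    pooled = np.concatenate([sx, sy])
    # Assumption: sample standard deviation over all target words.
    return float((sx.mean() - sy.mean()) / pooled.std(ddof=1))
```

A positive value indicates that the first target set leans towards the first attribute set relative to the second target set; swapping the two target sets flips the sign.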

2.0.1 Existing debiasing techniques.

To achieve a reduction of bias, we will substantially extend the work of Bolukbasi et al. [bolukbasi2016], which describes two ways of debiasing Word Embeddings in a post-processing step: Hard and Soft Debiasing. Both rely on identifying a gender bias subspace $B$ via Principal Component Analysis computed on the differences of gender word pairs, such as he–she and man–woman. Hard Debiasing employs a neutralize operation that removes the gender component from, e.g., all non-gender-related words by subtracting from their vectors the projection onto the bias subspace. Subsequently, an equalize operation positions opposing gender pair words (e.g., king, queen) to share the same angle with neutral words. By contrast, Soft Debiasing enables more gradual bias removal by utilizing a tuning parameter $\lambda$: An embedding $W$ is transformed by optimizing a transformation matrix $T$, where $N$ denotes the matrix of neutral word vectors:

$$\min_T \; \| (TW)^\top (TW) - W^\top W \|_F^2 + \lambda \, \| (TN)^\top (TB) \|_F^2$$

Another recent approach by Manzini et al. [manzini2019black] incorporates these ideas, but expands and evaluates results not only on gender, but separately also on race and religion. It suggests a bias subspace definition for a non-binary class environment, which is formulated via PCA of mean-shifted tuples of definitional words, with one tuple entry per subclass. There are also other recently proposed gender bias post-processing [DBLP:journals/corr/abs-1901-07656, font2019equalizing] and pre-processing techniques [DBLP:journals/corr/abs-1809-01496].

In terms of existing joint multiclass debiasing techniques, Conceptor Debiasing [karve-etal-2019-conceptor] is based on applying Boolean-like logic operators using a soft transformation matrix on the linear subspace where the Word Embedding has the highest variance. We will use this technique for comparison in our experiments.

3 Approach

Figure 1: Visual interpretation of multiclass debiasing via HardWEAT.

In this section, we present our novel debiasing techniques. Our methods substantially extend previous works of Bolukbasi et al. [bolukbasi2016] and Caliskan et al. [caliskan2017].

HardWEAT: To adapt the neutralize step from Bolukbasi’s work [bolukbasi2016] jointly in a multiclass debiasing setting, we first define the concept of class definitional vectors of classes (e.g., race, gender, …), which are computed as the top component of a PCA on the vector representations of definitional words for subclasses (e.g., male, black, …), cf. [manzini2019black]. Now, each particular class can vary in terms of bias amount according to WEAT tests with chosen sets of attribute and target words. Thus, we aggregate WEAT effect sizes for each class into bias levels by averaging twice: First, we compute the mean value for all target/subclass pairs within a class. Second, we average those means over all results for that class. Then, we compute a centroid as the average of the class definitional vectors weighted by their normalized bias levels. We use this centroid to perform neutralization, i.e., we re-embed all neutral words such that they are perpendicular to it, cf. [bolukbasi2016].
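The neutralize step described above can be sketched as follows. This is an illustrative NumPy sketch under stated assumptions, not the authors' implementation: the function names are hypothetical, "normalized bias levels" is read as bias levels divided by their sum, and PCA is performed on mean-centered definitional vectors. Note that the sign of a principal component is arbitrary, so a real implementation would need a convention for orienting the class vectors before averaging.

```python
import numpy as np

def class_definitional_vector(definitional_vectors):
    """Top principal component of the mean-centered definitional word vectors."""
    M = np.asarray(definitional_vectors, dtype=float)
    M = M - M.mean(axis=0)
    _, _, vt = np.linalg.svd(M, full_matrices=False)
    return vt[0]  # sign is arbitrary (property of PCA)

def bias_weighted_centroid(class_vectors, bias_levels):
    """Average of class vectors weighted by bias levels that sum to one."""
    weights = np.asarray(bias_levels, dtype=float)
    weights = weights / weights.sum()  # assumption: normalize by the sum
    return weights @ np.asarray(class_vectors, dtype=float)

def neutralize(W, centroid):
    """Remove the centroid component from each row of W, then renormalize."""
    c = centroid / np.linalg.norm(centroid)
    W = np.asarray(W, dtype=float)
    W_perp = W - np.outer(W @ c, c)
    return W_perp / np.linalg.norm(W_perp, axis=1, keepdims=True)
```

After `neutralize`, every returned row has unit length and a zero dot product with the centroid, which is the perpendicularity property the text asks for.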

To adapt the equalize step of Bolukbasi’s work, we move the definitional words of all subclasses in such a way that there is no relative preference of any attribute set towards a subclass. For that purpose, we generate equally spread out points on a circle with a given radius and center, which serve as the new center points of the subclasses. In three or more dimensions, this circle can be defined by two vectors perpendicular to the center vector.

Given this construction, we use equidistancing twice: First, we calculate new temporary central points for each class (e.g., gender) by neutralization (see above), then we determine new temporary central points for each subclass (e.g., male) on a circle around that point. Then, we compute new embeddings for all definitional words on a circle around this subclass vector. Note that some of the vectors defining these circles are generated randomly subject to a constraint, whereas the others are obtained via SVD. Along with an illustration (see Figure 1), we present the full procedure in Algorithm 1.
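The circle construction can be illustrated with a short NumPy sketch. It is a sketch under assumptions rather than the paper's procedure: the two perpendicular vectors are drawn at random and orthonormalized via QR, and the names `perpendicular_basis` and `equidistant_points` are hypothetical.

```python
import numpy as np

def perpendicular_basis(c, rng):
    """Two orthonormal vectors perpendicular to c (requires dimension >= 3)."""
    d = c.shape[0]
    M = np.column_stack([c, rng.standard_normal((d, 2))])
    Q, _ = np.linalg.qr(M)  # first column is parallel to c
    return Q[:, 1], Q[:, 2]

def equidistant_points(c, n, r, rng):
    """n equally spaced points on a circle of radius r centered at c,
    lying in a plane perpendicular to c."""
    u, v = perpendicular_basis(c, rng)
    angles = 2.0 * np.pi * np.arange(n) / n
    return c + r * (np.outer(np.cos(angles), u) + np.outer(np.sin(angles), v))
```

Because all points share the same offset length and are perpendicular offsets from the center, they all have the same cosine similarity to the center vector, which is what removes any relative preference between subclasses with respect to that vector.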

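Input: Word Embedding matrix, words, set of classes, set of subclasses, bias levels
1. For each class: generate the class definitional vector.
2. Compute the centroid.
3. Define the neutral words.
4. Neutralize the neutral words with respect to the centroid.
5. Normalize the vectors.
6. Equidistancing: for each class, for each subclass of that class, and for each definitional word of that subclass, compute the new position on the respective circle.
7. Normalize the vectors.
Algorithm 1: HardWEAT algorithm.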

For successful equidistancing, i.e., equal dispersion of words, we also require that the vector lengths of all subclass/target words within a particular class are equal. Violations of this requirement will result in inadequate WEAT scores. Furthermore, HardWEAT could in theory result in some equidistanced words becoming angle-close to random words. To counter this, we design the process in an iterative manner, adjusting the placement until no neutral word is closer in angle to an equidistanced word than a certain threshold. We discuss factors of success and open issues in more detail in Section 4.4. Overall, the HardWEAT method ultimately aims at complete bias elimination, but randomizes the topological structure of subclass words.

SoftWEAT is a more gradual debiasing alternative with a greater focus on quality preservation. It provides the user with a choice on how to debias and to what extent. Let us assume that a particular target set of words is closely related in terms of angle to some attribute sets of words (e.g., the set of pleasant words and the set of intellectual words). Hypothetically, bias would be minimized if the subclass words were perpendicular to these attribute sets. Thus, moving the subclass words towards such a perpendicular vector (a nullspace vector) results in bias reduction. Since full convergence towards this vector may result in a quality decrease, we define a parameter $\lambda$ as the level of removal. Setting $\lambda = 1$ results in the maximal angle change between target and attribute sets (perpendicular vectors), while $\lambda = 0$ makes no transformation at all. Note, however, that placing vectors of some subclass words perpendicular to attribute words may not necessarily reduce the overall bias level, since WEAT tests are relative measures. E.g., they take the relationship between male and intellectual words and the relationship between female and appearance-related words into account at the same time. Additionally, higher values of $\lambda$ also pose a greater risk of producing new bias. Moving the representation of subclass words away from some attribute words may move them closer to other attribute words without prior intent. For example, moving male words away from science can bring them closer to aggressive, increasing the score of another WEAT test. To address these issues, we choose a nullspace via SVD that minimizes the average WEAT effect size over all tests.

SoftWEAT executes the following steps: Given some target set and its corresponding related attribute sets of words, we generate a matrix whose rows consist of the first principal components of each of these attribute sets. For this matrix, we then find the nullspace vector that decreases WEAT test scores the most. Our goal is to translate the target set towards this vector. Since initially there might be only a few target set words, we expand the set by adding for each of them the closest, frequently occurring neighbor words (e.g., the 20 closest neighbors with a minimum frequency of 200 in the English Wikipedia [arora2017asimple]). Afterwards, we calculate a transformation vector such that adding it to the mean of the target set words yields the chosen nullspace vector. This transformation vector is then multiplied by the removal level $\lambda$, followed by conversion into a linear translation matrix. By translating the vectors of the extended word sets, we preserve the relationships of subclass words to these words. Note that we only modify the positions of words from the extended neighborhood of subclass words, but not all words from the vocabulary. Finally, as a last optional step, we normalize all vectors.
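The translation step can be sketched in NumPy as follows. This is a simplified illustration, not the paper's procedure: instead of selecting the nullspace vector that minimizes WEAT scores, the sketch simply projects the target mean onto the nullspace (an assumption made here for brevity), and it omits the neighborhood expansion and final normalization. All function names are hypothetical.

```python
import numpy as np

def first_principal_component(vectors):
    """First principal component of a mean-centered set of vectors."""
    M = np.asarray(vectors, dtype=float)
    M = M - M.mean(axis=0)
    _, _, vt = np.linalg.svd(M, full_matrices=False)
    return vt[0]

def nullspace_basis(A, tol=1e-10):
    """Orthonormal rows spanning {x : A x = 0}, obtained via SVD."""
    _, s, vt = np.linalg.svd(A, full_matrices=True)
    rank = int((s > tol).sum())
    return vt[rank:]

def soft_translate(target_vectors, attribute_sets, lam):
    """Shift all target vectors by lam * (n - mu), where mu is their mean and
    n is a point in the nullspace of the attribute directions."""
    T = np.asarray(target_vectors, dtype=float)
    A = np.array([first_principal_component(s) for s in attribute_sets])
    N = nullspace_basis(A)
    mu = T.mean(axis=0)
    n = N.T @ (N @ mu)  # assumption: projection of the mean, not WEAT-selected
    return T + lam * (n - mu)
```

Because every target vector receives the same shift, pairwise differences within the target set are unchanged; `lam = 0` leaves the vectors as they are, and `lam = 1` places their mean perpendicular to all attribute directions.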

In SoftWEAT, the user can decide on the removal level $\lambda$, but can also select the target and attribute sets used for debiasing. By default, we only take into account those combinations for which the WEAT tests result in an aggregated effect size above a threshold. To this end, we accumulate, for each target set, the attribute sets that form its attribute matrix. Consider a WEAT test with target sets $X$, $Y$ and attribute sets $A$, $B$ and effect size $d$. In case of a positive $d$, $A$ gets removed from $X$, and $B$ from $Y$. In case of a negative $d$, removal is done the other way around. The translation is then applied as a matrix operation: Out of the resulting matrix $W'$, all but the last row are taken as the new vector representation for the given target set of words.

4 Experimental Evaluation

This section presents example results of our methods in practical settings and compares them with Conceptor Debiasing [karve-etal-2019-conceptor] as the current state-of-the-art approach. We measure the bias decrease along with the deterioration of embedding quality and show the effects of biased/debiased embeddings in Sentiment Analysis as a downstream task example. Due to limited space, we also provide an extended set of results in an accompanying online appendix.

4.1 Experimental Setup

Datasets: Extending the experimental design from [10.1145/3342220.3343658], we apply debiasing simultaneously on the following target sets/subclasses: (male, female) - gender, (islam, christianity, atheism) - religion and (black and white names) - race, with seven distinct attribute set pairs. We collected target sets, attribute sets, and class definitional sets from the literature [may2019measuring, DBLP:journals/corr/abs-1809-01496, DBLP:journals/corr/abs-1804-06876, caliskan2017, manzini2019black, 10.1145/3342220.3343658]; see our online appendix for a complete list. As in previous studies [karve-etal-2019-conceptor], evaluation was done on three pretrained Word Embedding models with a vector dimension of 300: FastText (webcrawl and Wikipedia, 2 million words), GloVe (Common Crawl, Wikipedia and Gigaword, 2.2 million words) and Word2Vec (trained on Google News, 3 million words). For the Sentiment Analysis task, we additionally employed a dataset of movie reviews [maas-EtAl:2011:ACL-HLT2011].

Quality Assessment: First, we compared ranked lists of word pairs in terms of their vector similarity to human judgement via Spearman’s rank correlation coefficient [doi:10.1002/0471667196.ess5050.pub2], using the collection taken from the Conceptor Debiasing evaluation [karve-etal-2019-conceptor], i.e., RG65, WS, RW, MEN, MTurk, SimLex, SimVerb. In addition, we also utilize the Mikolov Analogy Test [mikolov2013a].
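As an illustration of the similarity-based quality check, the sketch below scores an embedding against human similarity ratings using Spearman's rank correlation, implemented with NumPy only. The dictionary-based `embedding` argument and the function names are assumptions for illustration; the benchmark files themselves are not reproduced here.

```python
import numpy as np

def rankdata(x):
    """Ranks starting at 1; tied values share their average rank."""
    x = np.asarray(x, dtype=float)
    order = np.argsort(x, kind="mergesort")
    ranks = np.empty(len(x))
    ranks[order] = np.arange(1, len(x) + 1)
    for value in np.unique(x):
        mask = x == value
        ranks[mask] = ranks[mask].mean()
    return ranks

def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return float(np.corrcoef(rankdata(x), rankdata(y))[0, 1])

def similarity_score(embedding, pairs, human_scores):
    """Correlate cosine similarities of word pairs with human ratings."""
    sims = []
    for a, b in pairs:
        u = np.asarray(embedding[a], dtype=float)
        v = np.asarray(embedding[b], dtype=float)
        sims.append(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))
    return spearman(sims, human_scores)
```

Comparing this score before and after debiasing gives one number per benchmark, which is how a drop or gain in embedding quality can be read off.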

Methods: Regarding HardWEAT, we specified neutral words as the set difference between all words and the words from the previously defined target sets. For SoftWEAT, we provide details such as the structure of target-attribute sets and the number of changed words in Section F of the online appendix. We applied the OR operator in Conceptor Debiasing, using the same word sets of subclasses within the three classes defined above (subspaces in Conceptor Debiasing).

4.2 Bias levels and quality of Word Embeddings

First, we focus on overall bias levels, see Table 1, by measuring WEAT scores before and after debiasing. We observe that HardWEAT removes the measured bias completely, as indicated by zero WEAT scores. SoftWEAT also substantially reduces the bias measurements, to different degrees for different datasets. For example, with a suitably chosen removal level, SoftWEAT still leads to a stronger reduction in bias than the state-of-the-art Conceptor algorithm in all but one measurement.

Embedding  Class     REG   HW      SW (1)  SW (2)  SW (3)  SW (4)  SW (5)  CPT
GloVe      gender    0.62  0.0**   0.47    0.3     0.28    0.15    0.09*   0.24
GloVe      race      0.71  0.0**   0.6     0.51    0.39    0.3     0.19*   0.43
GloVe      religion  0.77  0.0**   0.62    0.46    0.29    0.2     0.16*   0.28
FastText   gender    0.52  0.0**   0.14*   0.17    0.32    0.31    0.32    0.46
FastText   race      0.36  0.0**   0.13*   0.16    0.25    0.3     0.31    0.31
FastText   religion  0.6   0.0**   0.27    0.21*   0.29    0.35    0.42    0.54
Word2Vec   gender    0.63  0.0**   0.48    0.37    0.32    0.2     0.23    0.12*
Word2Vec   race      0.56  0.0**   0.32    0.28    0.2     0.16    0.16*   0.38
Word2Vec   religion  0.47  0.0**   0.31    0.19    0.11*   0.18    0.2     0.28
Table 1: Bias levels for gender, race, and religion before debiasing (REG) and after debiasing with Hard/SoftWEAT (HW/SW) or Conceptor (CPT). The columns SW (1) to SW (5) correspond to five different settings of the SoftWEAT removal level.

Regarding quality assessment, Table 2 shows the complete results for seven rank similarity datasets as well as the Mikolov analogy score, using the GloVe Word Embedding as an example. As expected, a significant quality drop appears with HardWEAT, most notably on the RG65 dataset. With SoftWEAT, greater removal levels induce larger modifications of the original embeddings, which corresponds to a greater drop in embedding quality. However, in three out of eight test settings, SoftWEAT with lower removal levels achieved a similar or higher score, and it also leads to competitive results compared to Conceptor; see Section 4.4 for an in-depth discussion. Extended results can be found in the online appendix.

Task     REG     HW     SW (1)  SW (2)  SW (3)  SW (4)  SW (5)  CPT
RG65     76.03*  63.42  75.68   75.39   74.34   73.25   71.42   68.73
WS       73.8    69.73  74.0*   73.96   73.15   71.95   70.34   73.48
RW       46.15   46.09  46.15   46.13   46.08   46.02   45.94   52.45*
MEN      80.13   77.52  80.34   80.34*  80.1    79.57   78.74   79.99
MTurk    71.51*  69.78  71.25   70.78   70.37   69.48   68.64   68.24
SimLex   40.88   40.44  41.46   42.0    42.07   42.21   42.18   47.36*
SimVerb  28.74   28.74  28.9    29.15   29.27   29.54   29.78   36.78*
Mikolov  0.65    0.64   0.65*   0.65    0.65    0.64    0.64    0.63
Table 2: Measurements of quality tasks after debiasing for GloVe embeddings. Columns as in Table 1.

4.3 Sentiment Analysis

To analyze the effects of debiasing in downstream tasks, we study a sentiment analysis task in the context of movie reviews, i.e., we investigate whether we observe significant differences in predicted sentiments when using biased and debiased Word Embeddings. Modifying the setup from Packer et al. [47172], we appended to each test sentence a randomly chosen word from one of two opposing sets (e.g., a typical black name or a typical white name) and calculated the difference in the prediction, i.e., the polarity score, according to a simple neural network that takes the respective pretrained word embeddings as its embedding layer, followed by a Flatten operation and a sigmoid layer. The first two layers were not trainable.

Figure 2: Classifier Bias on (islam, christianity)

For training, we used a combination of binary cross-entropy loss and the AdaDelta optimizer. We trained our model only on sentences with a maximum length of 50 that did not contain any words from the target sets. Shuffling test and training sets, we then calculated the polarity score on 15 different model instances using 6 different kinds of embeddings: regular (not debiased), SoftWEAT at several removal levels, HardWEAT, and Conceptor Debiasing.

Results for [christianity, islam] are shown in Figure 2. For the non-debiased embeddings, we observe no clear trend in our classifier such that adding a bias word to a sentence would shift the polarity in a particular direction. However, we can see that bias words have a strong influence on the classifier, leading to a large variance of polarity scores in GloVe- and Word2Vec-based models. When we apply our debiasing methods, the respective variances decrease substantially, e.g., by a factor of around 5.45 for SoftWEAT in GloVe, already with a small removal level. With Conceptor Debiasing, this is only the case for the Word2Vec-based model, but not for the GloVe-based model.

4.4 Discussion

HardWEAT and SoftWEAT success factors: Based on our experiments, see also our online appendix, we could identify a variety of success factors for our methods. Regarding HardWEAT, higher embedding dimensionality and more dispersed vectors are more likely to produce a desirable outcome. This is due to the possibility of random words appearing as close neighbors of target words during the equidistancing procedure, which we counter iteratively as described above. Regarding centroid neutralization, the relevant factors are the number of classes, the uniformity of bias levels, and the angles between the class definitional vectors: Having more classes with uniform bias levels and distant angles can produce undesirable results, e.g., too large distances between the centroid and the class points. However, the user could also manually define an alternative centroid. We acknowledge that equidistancing comes with a main drawback, which is the partial loss of relationships between target and non-target words, which is also reflected in a quality drop according to different metrics. Regarding SoftWEAT, changing the angles between target and attribute sets through translation may not necessarily result in lower bias levels, since WEAT is a relative measure. However, we choose the nullspace that minimizes these values. Furthermore, we note that in our experiments we used all target and attribute set pairs within each of the WEAT tests, which can be further optimized: We may not want to debias something that is not biased in the first place. E.g., removing male from science may be necessary, whereas doing the same for female from art may not, thus we could exclude the latter pair. Also, we should bear in mind that attribute sets of words are often correlated (as also shown by our experiments in the online appendix). This implies that by moving subclass words with respect to a specific set of attribute words, we automatically also change their associations with other attribute sets. Thus, the user plays a crucial role in deciding which sets should be used for debiasing.

Comparison with Conceptor Debiasing: Given the experimental results, we conclude that neither Conceptor Debiasing nor SoftWEAT consistently outperforms the other. Yet, SoftWEAT exhibits some distinctive advantages: (i) With SoftWEAT, relationships within the target set words remain completely the same, whereas in Conceptor Debiasing, the overall angle distribution becomes narrower (see the online appendix for more details). (ii) We argue that with SoftWEAT, the user gets more freedom in choosing on which target/attribute set combination and to which degree debiasing is applied. (iii) With our method, there is no difference in word representation whether one uses a small subset of neutral words or the complete vocabulary. Nevertheless, we acknowledge that Conceptor Debiasing does succeed in reducing bias equally well by operating in a more global manner.

5 Conclusion

In this paper, we presented two novel approaches for multi-class debiasing of Word Embeddings. We demonstrated the general viability of these methods for reducing and/or eliminating biases while preserving meaningful relationships of the original vector representations. We also analyzed the effects of debiased representations in Sentiment Analysis as an example downstream task and find that debiasing leads to substantially decreased variance in the predicted polarity. Overall, our work contributes to ongoing efforts towards providing more unbiased neural representations of textual data.


We want to thank S. Karve, L. Ungar, and J. Sedoc for providing the code for Conceptor Debiasing and their support.