Neural networks have proven useful for automating tasks such as question answering, system response, and language generation over large textual datasets. In learning systems, bias can be defined as the negative consequences derived from the implicit association of patterns that occur in a high-dimensional space. In dialogue systems, these patterns represent associations between word embeddings that can be measured by cosine distance to observe male- and female-related analogies that resemble real-world gender stereotypes. We propose an automatic technique to mitigate bias in language generation models based on an external memory in which word embeddings are associated with gender information and can be sparsely updated through content-based lookup.
The main contributions of our work are the following:
We introduce a novel architecture that considers the notion of a Fair Region to update a subset of the trainable parameters of a Memory Network.
We experimentally show that this architecture mitigates gender bias amplification in the automatic generation of text when extending the Sequence2Sequence model.
2 Memory Networks and Fair Region
As illustrated in Figure 1, the memory consists of two arrays that store addressable keys (latent representations of the input) and values (class labels), respectively, as in rare_events. To support our technique, we extend this definition with a third array that stores the gender associated with each word, e.g., actor is male, actress is female, and scientist is no-gender. The final form of the memory module is as follows:
A neural encoder with trainable parameters receives an observation and generates activations in a hidden layer. We want to store a normalized version of these activations (i.e., with unit norm) in the long-term memory module to increase the capacity of the encoder. Hence, once the index of the most similar key has been found,
writing the (key, value, gender) triplet to the memory consists of:
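The content-based lookup described above can be illustrated with a minimal sketch. It assumes the memory is held as three parallel lists (keys, values, genders) and that similarity is the dot product of unit-norm vectors, which equals their cosine similarity. The names `normalize` and `nearest_key` and the toy entries are hypothetical; the write rule itself is not reproduced here.

```python
import math


def normalize(vec):
    """Scale a vector to unit Euclidean norm."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def nearest_key(query, keys):
    """Return the index of the stored key most similar to the query.

    Both the query and the keys are assumed to have unit norm, so the
    dot product equals the cosine similarity.
    """
    sims = [sum(q * k for q, k in zip(query, key)) for key in keys]
    return max(range(len(keys)), key=lambda i: sims[i])


# Hypothetical toy memory: parallel lists of keys, values, and gender tags.
keys = [normalize([1.0, 0.0]), normalize([0.0, 1.0]), normalize([1.0, 1.0])]
values = ["actor", "actress", "scientist"]
genders = ["male", "female", "no-gender"]

query = normalize([0.9, 0.1])
idx = nearest_key(query, keys)
print(idx, values[idx], genders[idx])
```

Keeping the gender tags in a list parallel to the keys means a single index retrieves the full triplet.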
However, the number of word embeddings does not provide an equal representation across gender types because context-sensitive embeddings are severely biased in natural language, biasemnlp. For example, it has been shown that man is closer to programmer than woman is, bias_man_programmer. Similar problems have recently been observed in popular word embedding algorithms such as Word2Vec, GloVe, and BERT, bias_recent.
We propose updating a memory network within a Fair Region in which we can control the number of keys associated with each particular gender. We define this region as follows.
(Fair Region) Consider a latent representation of the input and an external memory. The male-neighborhood of the representation is given by the indices of its k-nearest keys, in decreasing order of similarity, whose gender type is male.
Running this process for each gender type yields three sets of indices, which correspond to the male, female, and non-gender neighborhoods. The Fair Region of the representation given the memory then consists of these three neighborhoods taken together.
The Fair Region of a memory network consists of a subset of the memory keys that are responsible for computing error signals and generating gradients that flow through the entire architecture with backpropagation. We do not want to attend over all the memory entries but rather to explicitly induce a uniform gender distribution within this region. The result is a training process in which gender-related embeddings contribute in equal number to the update of the entire architecture. This embedding-level constraint prevents the unconstrained learning of correlations between a latent vector and similar memory entries directly in the latent space by considering explicit gender indicators.
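As an illustration of the definition, the sketch below selects the same number of nearest keys for each gender type and concatenates the resulting index lists. It is a minimal example under stated assumptions: keys are unit-norm lists of floats, similarity is the dot product, and the function names, the parameter `k`, and the toy memory are hypothetical.

```python
def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def gender_neighborhood(query, keys, genders, gender, k):
    """Indices of the k keys tagged `gender` that are most similar to
    `query`, in decreasing order of similarity."""
    candidates = [i for i, g in enumerate(genders) if g == gender]
    candidates.sort(key=lambda i: dot(query, keys[i]), reverse=True)
    return candidates[:k]


def fair_region(query, keys, genders, k,
                gender_types=("male", "female", "no-gender")):
    """Concatenate the per-gender neighborhoods so that each gender type
    contributes at most k indices to the region."""
    region = []
    for gender in gender_types:
        region.extend(gender_neighborhood(query, keys, genders, gender, k))
    return region


# Hypothetical toy memory with unit-norm 2-d keys.
keys = [
    [1.0, 0.0], [0.8, 0.6], [0.0, 1.0],  # male entries
    [0.6, 0.8], [1.0, 0.0],              # female entries
    [0.0, 1.0], [0.8, 0.6],              # no-gender entries
]
genders = ["male", "male", "male", "female", "female",
           "no-gender", "no-gender"]

region = fair_region([1.0, 0.0], keys, genders, k=2)
print(region)
```

Although the toy memory holds three male entries and only two of each other type, the region contains two indices per gender, which is the uniform distribution the text calls for.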
3 Language Model Generation
Our goal is to leverage the addressable keys of a memory-augmented neural network and the notion of fair regions discussed in Section 2 to guide the automatic generation of text. Given an encoder-decoder architecture seq2seq; seq2seq_attention, the inputs are two sentences
from the source and target domain, respectively. An LSTM encoder outputs a context-sensitive hidden representation based on the history of sentences, and an LSTM decoder receives both this representation and the target sentence and predicts the sequence of words. At every timestep of decoding, the decoder predicts the current token of the output by computing its corresponding hidden state through the recurrence
Instead of using the decoder output to directly predict the next word as a distribution over the vocabulary, as in key_value_networks, we combine this vector with a query to the memory module to compute the embedding vector. We do this by computing an attention score seq2seq_attention for each token of the response. We then take the argmax over the output vocabulary to obtain the predicted token of the response. More formally,
Naturally, the objective is to minimize the cross entropy between the actual and generated content:
where the sum runs over the training documents and over the words in each generated output, and each word in the target sequence is given by its one-hot representation.
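To make the objective concrete, the following sketch computes a cross-entropy loss over documents and word positions. Because the target is one-hot, each term reduces to the negative log-probability assigned to the reference word. The function name, the nested-list layout, and the choice to sum rather than average are assumptions made for illustration.

```python
import math


def cross_entropy(predicted, targets):
    """Sum of -log p(reference word) over all documents and positions.

    predicted[j][i] is a probability distribution (list of floats) over
    the vocabulary for word i of document j; targets[j][i] is the index
    of the reference word, i.e. the position of the 1 in its one-hot
    vector.
    """
    loss = 0.0
    for doc_probs, doc_targets in zip(predicted, targets):
        for probs, target in zip(doc_probs, doc_targets):
            loss -= math.log(probs[target])
    return loss


# Hypothetical example: one document, two words, vocabulary of size 3.
predicted = [[[0.5, 0.25, 0.25], [0.25, 0.5, 0.25]]]
targets = [[0, 1]]
loss = cross_entropy(predicted, targets)
print(loss)
```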
4 Bias Amplification
As originally introduced by biasemnlp, we compute the bias score of a word considering its word embedding (for Seq2Seq neural models, this word embedding is the output of the decoder component) and two gender indicators (the words man and woman). For example, the bias score of scientist is:
If the bias score during testing is greater than the one during training,
then the bias of man towards scientist has been amplified by the model while learning this representation, provided that the training and testing datasets are similarly distributed.
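The comparison between training and testing scores can be sketched as follows. The exact form of the score is an assumption: the example uses a ratio of cosine similarities, patterned on the ratio-style score of biasemnlp, and it requires both similarities to be positive. Taking amplification as the difference between the testing and training scores is likewise an assumption, and all names and vectors are hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def bias_score(word_vec, man_vec, woman_vec):
    """Share of the word's similarity that goes to `man` (assumed ratio
    form; only meaningful when both similarities are positive)."""
    to_man = cosine(word_vec, man_vec)
    to_woman = cosine(word_vec, woman_vec)
    return to_man / (to_man + to_woman)


def bias_amplification(train_score, test_score):
    """Positive when the bias score grew from training to testing."""
    return test_score - train_score


# Hypothetical 2-d embeddings for the gender indicators and for the
# word scientist as represented during training and during testing.
man, woman = [1.0, 0.0], [0.0, 1.0]
train_score = bias_score([1.0, 1.0], man, woman)
test_score = bias_score([3.0, 1.0], man, woman)
amplification = bias_amplification(train_score, test_score)
print(train_score, test_score, amplification)
```

In this toy case the word is equidistant from both indicators during training and drifts toward man during testing, so the amplification is positive.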
We evaluate our proposed method on datasets crawled from the websites of three newspapers from Chile, Peru, and Mexico.
To enable a fair comparison, we limit the number of articles for each dataset to 20,000 and the size of the vocabulary to the 18,000 most common words. Datasets are split into 60%, 20%, and 20% for training, validation, and testing. We want to see whether there are correlations showing stereotypes across different nations. Do the biased correlations learned by an encoder transfer to the decoder when considering word sequences from different countries?
We compare our approach Seq2Seq+FairRegion, an encoder-decoder architecture augmented with a Fair Region, with the following baseline models:
Seq2Seq seq2seq: An encoder-decoder architecture that maps between sequences with minimal assumptions on the sequence structure and that is able to remember long-term dependencies by mapping the source sentence into a fixed-length vector.
Seq2Seq+Attention seq2seq_attention: Similar to Seq2Seq, this architecture automatically attends to parts of the input that can be relevant to predict the target word.
5.3 Training Settings
For all the experiments, the size of the word embeddings is 256. The encoders and decoders are 2-layer bidirectional LSTMs with a state size of 256 for each direction. For the Seq2Seq+FairRegion model, the number of memory entries is 1,000. We train all models with the Adam optimizer ADAM with a learning rate of and initialize all weights from a uniform distribution in . We also apply dropout dropout with keep probability of
for the inputs and outputs of the recurrent neural networks.
5.4 Fair Region Results in Similar Perplexity
We evaluate all the models with test perplexity, which is the exponential of the loss. We report in Table 1 the average perplexity of the aggregated dataset from Peru, Mexico, and Chile, and also from specific countries.
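For reference, a minimal sketch of the metric follows. It assumes that the loss being exponentiated is the average negative log-likelihood per token, which is the usual convention but is not stated explicitly in the text; the function name and inputs are hypothetical.

```python
import math


def perplexity(token_log_probs):
    """Exponential of the average negative log-likelihood per token.

    token_log_probs holds the natural-log probability that the model
    assigned to each reference token in the test set.
    """
    avg_nll = -sum(token_log_probs) / len(token_log_probs)
    return math.exp(avg_nll)


# A model that gives every reference token probability 0.25 is as
# uncertain as a uniform choice among four words.
ppl = perplexity([math.log(0.25)] * 4)
print(ppl)
```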
Our main finding is that our approach (Seq2Seq+FairRegion) achieves perplexity values similar to those of the Seq2Seq+Attention baseline model (see Table 1) when generating word sequences, despite using the Fair Region strategy. These results encourage the use of a controlled region as an automatic technique that maintains the efficacy of text generation. We observe a larger perplexity for country-based datasets, likely because of their smaller training datasets.
5.5 Fair Region Controls Bias Amplification
We compute the bias amplification metric for all models, as defined in Section 4, to study the effect of amplifying potential bias in text for different language generation models.
Table 1 shows that using Fair Regions is the most effective method to mitigate bias amplification when combining all the datasets (+0.09). In contrast, both Seq2Seq (+0.18) and Seq2Seq+Attention (+0.25) amplify gender bias for the same corpus. Interestingly, feeding the encoders with news articles from different countries decreases the advantage of using a Fair Region and also amplifies more bias across all the models. In fact, training the encoder with news from Peru leads, in general, to a larger bias amplification than training it with news from Mexico. This could have many implications and may be a product of writing style or of social bias transferred across different countries. We leave a worldwide study of this effect as future work.
Gender bias is an important problem when generating text. Not only can smart compose or auto-complete solutions be affected by the encoder-decoder architecture, but the unintended harm caused by these algorithms could also degrade the user experience in many applications. We also apply the notion of bias amplification to these datasets and report results on how bias can be transferred between country-specific datasets in the encoder-decoder architecture.