1. Introduction
Recently, there has been an explosion in the amount of information generated on the Web. To help Internet users deal with the information overload, text summarization algorithms are commonly used to get a quick overview of the textual information. Recognizing the business opportunities, many startups have mushroomed which offer content summarization services. For example, Agolo (https://www.agolo.com/splash) provides a summarization platform to extract the most relevant information from both public and private documents. Aylien (https://aylien.com/text-api/summarization/) and Resoomer (https://resoomer.com/en/) present relevant points and topics from a piece of text. Moreover, some news websites, e.g., Harvard Business Review (https://www.hbr.org), have started providing an ‘executive summary’ of their articles, to help readers who lack the time to go through the full article. Multiple smartphone apps (e.g., News360, InShorts) have also been launched to provide short summaries of news stories.
A large number of summarization algorithms have been devised in the research community, including algorithms for summarizing an individual large document as well as for summarizing a set of documents (e.g., a set of microblogs or news articles) – see (Allahyari et al., 2017) for a survey. Most summarization algorithms are extractive in nature, i.e., they form the summary by extracting some of the textual units in the input (e.g., individual sentences in a document, or individual microblogs in a set of microblogs) (Gupta and Lehal, 2010). Additionally, some abstractive algorithms have been devised that attempt to generate natural language summaries (Allahyari et al., 2017). In this work, we restrict our focus to extractive summarization and leave abstractive summarization as future work.
Extractive summarization algorithms essentially perform a selection of a (small) subset of the textual units in the input, for inclusion in the summary, based on some measure of the relative quality or importance of the textual units. Traditionally, these algorithms are judged by how closely the algorithmic summary matches gold standard summaries that are usually written by human annotators. To this end, measures such as ROUGE scores are used to evaluate the goodness of algorithmic summaries (Lin, 2004). The underlying assumption behind this traditional evaluation criterion is that the data to be summarized is homogeneous, and the sole focus of summarization algorithms should be to identify summary-worthy information.
Information on the Web today is often an amalgamation of information of different classes, coming from multiple sources or groups and often covering different perspectives. For example, on social media, different socially salient groups (e.g., men and women, Republicans and Democrats) discuss socio-political issues, and it is frequently observed that these social groups express very different opinions on the same topic or event (Chakraborty et al., 2017). Similarly, news media sources having different ideological leanings publish different articles on the same topic or event, covering different political parties, different gender issues, etc. Hence, while summarizing such heterogeneous data, one needs to check whether the generated summaries properly represent the opinions of these different groups or sources. Therefore, in this paper, we propose to look at summarization algorithms from a completely new perspective. We propose to investigate whether the selection of the textual units in the summary is fair, i.e., whether the generated summary fairly represents all the classes in the input data.
Our investigation over four real-world datasets (two microblog datasets, and two DUC 2006 datasets that are frequently used as benchmarks for summarization) shows that many existing summarization algorithms do not fairly represent the input data in the generated summaries. Rather, some classes are systematically underrepresented in the process. To reduce such unfairness, we develop a novel fairness-preserving summarization algorithm (named FairSumm) which selects highly relevant textual units for the summary while maintaining fairness in the process. Extensive evaluations show that FairSumm outperforms many existing summarization algorithms, not only in maintaining fairness, but also in producing better summaries that achieve high ROUGE scores.
In summary, we make the following contributions in this paper: (1) ours is one of the first attempts to consider the notion of fairness in summarization (fairness in summarization was briefly introduced in our prior work (Shandilya et al., 2018)), (2) we show that existing summarization algorithms often do not fairly represent the input data, and (3) we propose the ‘FairSumm’ algorithm, which produces summaries that are both fair and of good quality. Such a summary would not only benefit the end users of the summarization algorithms, but also many downstream applications that use the summaries generated by algorithms (e.g., Lloret et al. (Lloret et al., 2010) proposed a summary-based opinion classification and rating inference mechanism). Such applications would also benefit from a fair summary, e.g., one which fairly represents the positive and negative opinions in the input data. We plan to make the implementation of FairSumm publicly available upon acceptance of the paper.
2. Background and motivation
In this section, we investigate whether existing summarization algorithms produce summaries that are fair. We experiment with two types of data containing textual units from multiple classes – (i) one where the different classes correspond to different values of a sensitive attribute, such as the political leaning of the textual units, or the gender of their authors, and (ii) the other where the different classes correspond to different sources having different viewpoints, such as different news media. For both types of datasets, we investigate whether the summaries produced by existing summarization algorithms represent the different classes fairly.
2.1. Summarizing data associated with a sensitive attribute
We apply several well-known extractive summarization algorithms on the following two microblog datasets, where every textual unit is annotated with a sensitive attribute (gender or political leaning).
(1) Claritin dataset: This dataset contains tweets about the effects of the drug Claritin (the dataset is described in detail at https://www.crowdflower.com/discovering-drug-side-effects-with-crowdsourcing/). Each tweet is annotated with the gender of the user (male, female, or unknown) who posted it. From this dataset, we ignored those tweets for which the gender of the user is unknown. The number of tweets in the different classes is reported in Table 1 (first row).
(2) US-Election dataset (Darwish et al., 2017): This dataset contains tweets posted during the 2016 US Presidential election. Each tweet is annotated as supporting or attacking one of the presidential candidates (Donald Trump and Hillary Clinton), or as neutral, or as attacking both. For simplicity, we grouped the tweets into three classes: (i) Pro-Republican: tweets which support Trump and/or attack Clinton, (ii) Pro-Democratic: tweets which support Clinton and/or attack Trump, and (iii) Neutral: tweets which are neutral or attack both candidates. The number of tweets in the different classes is reported in Table 2 (first row).
Human-generated summaries: The traditional way of evaluating the ‘goodness’ of a summary is to match it with one or more human-generated summaries (gold standard), and then compute ROUGE scores (Lin, 2004). To this end, we asked three human annotators to summarize the two datasets described above. Each annotator is well-versed with the use of social media like Twitter and is fluent in English, and none is an author of this paper. The annotators were asked to generate extractive summaries independently. We use these three human-generated summaries for the evaluation of the various algorithmically generated summaries, by computing ROUGE-1 and ROUGE-2 Recall and F1 scores (Lin, 2004).
Summarization algorithms: We consider a set of well-known extractive summarization algorithms, some of which are unsupervised (the traditional methods for summarization) and some of which are supervised neural models.
Unsupervised summarization algorithms:
We consider six well-known summarization algorithms. These algorithms generally estimate an importance score for each textual unit (sentence / tweet) in the input, and the textual units having the highest importance scores are selected to generate a summary of the desired length.
(1) ClusterRank (Garg et al., 2009), which clusters the textual units to form a cluster graph, and uses graph algorithms (e.g., PageRank) to compute the importance of each unit.
(2) DSDR (He et al., 2012), which measures the relationship between the textual units using linear combinations and reconstructions, and generates the summary by minimizing the reconstruction error.
(3) LexRank (Erkan and Radev, 2004), which computes the importance of textual units using eigenvector centrality on a graph representation of the units, where edges are placed depending on the inter-unit cosine similarity.
(4) LSA (Gong and Liu, 2001), which constructs a terms-by-units matrix, and estimates the importance of the textual units through Singular Value Decomposition of the matrix.
(5) LUHN (Luhn, 1958), which derives a ‘significance factor’ for each textual unit based on the occurrences and placements of frequent words within the unit.
(6) SumBasic (Nenkova and Vanderwende, 2005), which uses frequency-based selection of textual units, and re-weights word probabilities to minimize redundancy.
Supervised neural summarization algorithms:
With the recently increasing popularity of neural network based models, the state-of-the-art techniques for summarization have shifted to data-driven supervised algorithms (Dong, 2018). We consider two recently proposed extractive neural summarization models, both proposed in (Nallapati et al., 2017):
(7) SummaRuNNer-RNN, a Recurrent Neural Network (RNN) based sequence model that assigns a binary label to each textual unit – a label of 1 implies that the unit can be part of the summary, while a label of 0 indicates otherwise. Each label has an associated confidence score. The summary is generated by picking the textual units labeled 1, in decreasing order of their confidence scores, until the desired summary length is exceeded. The model is built around a two-layer bidirectional Gated Recurrent Unit based Recurrent Neural Network (GRU-RNN).
(8) SummaRuNNer-CNN, a variant of the above model in which the sentences are fed to a two-layer Convolutional Neural Network (CNN) architecture before the GRU-RNN is applied in the third layer.
For both SummaRuNNer models, the authors have made pre-trained models available (https://github.com/hpzhao/SummaRuNNer), which are trained on the CNN / Daily Mail news articles corpus (https://github.com/deepmind/rc-data). We directly used these pre-trained models for the summarization.
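The SummaRuNNer selection step described above (picking units labeled 1 in decreasing order of confidence until the desired summary length is exceeded) can be sketched as follows; the function name and the word-count length accounting are illustrative assumptions, not the authors' implementation:

```python
def pick_by_confidence(units, labels, conf, max_words):
    """Select units labeled 1 in decreasing order of confidence
    until the desired summary length (here, in words) is exceeded."""
    order = sorted((i for i, lab in enumerate(labels) if lab == 1),
                   key=lambda i: -conf[i])
    summary, total = [], 0
    for i in order:
        summary.append(units[i])
        total += len(units[i].split())
        if total >= max_words:
            break
    return summary
```

For example, with three units of which two are labeled 1, the unit with the higher confidence is taken first, and selection stops as soon as the length budget is met.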
Results of summarization: We applied the above summarization algorithms on the two datasets, to obtain summaries of 50 tweets each. Table 1 shows the results of summarizing the Claritin dataset, while Table 2 shows the same for the US-Election dataset. In both cases, we report the number of tweets of the different classes in the whole dataset (first row) and in the summaries generated by the different summarization algorithms (subsequent rows), along with the ROUGE-1 and ROUGE-2 Recall and F1 scores of the summaries.
Table 1: Summarizing the Claritin dataset – number of tweets from each class, and ROUGE scores (underrepresented classes are marked *).

Method        Nos. of tweets               ROUGE-1           ROUGE-2
              Female        Male           Recall    F1      Recall    F1
Whole data    2,505 (62%)   1,532 (38%)
ClusterRank   33            17             0.4369    0.4948  0.1614    0.1828
DSDR          31            19             0.3018    0.4251  0.1443    0.2033
LexRank       34            16*            0.2964    0.3926  0.1138    0.1599
LSA           35            15*            0.5153    0.5041  0.1506    0.1473
LUHN          34            16*            0.3802    0.4053  0.1280    0.1365
SumBasic      27*           23             0.3144    0.4341  0.1082    0.1494
SummaRNN      33            17             0.3423    0.3754  0.1257    0.1468
SummaCNN      30            20             0.3774    0.4087  0.1265    0.1460
Table 2: Summarizing the US-Election dataset – number of tweets from each class, and ROUGE scores (underrepresented classes are marked *).

Method        Nos. of tweets                          ROUGE-1           ROUGE-2
              Pro-Rep      Pro-Dem     Neutral       Recall    F1      Recall    F1
Whole data    1,309 (62%)  658 (31%)   153 (7%)
ClusterRank   32           15          3             0.2472    0.3499  0.0611    0.0865
DSDR          28*          19          3*            0.2154    0.3313  0.0675    0.1039
LexRank       27*          20          3*            0.2525    0.3672  0.0788    0.1146
LSA           24*          20*         6             0.3107    0.4039  0.0832    0.1083
LUHN          34           13*         3*            0.2808    0.3754  0.0846    0.1131
SumBasic      27*          23          0*            0.1988    0.3111  0.0513    0.0803
SummaRNN      34           15          1*            0.3472    0.4361  0.1201    0.1601
SummaCNN      32           17          1*            0.3368    0.4227  0.1083    0.1446
Verifying if the summaries are fair: To check whether the generated summaries are fair, we apply the principle of ‘adverse impact’ that is used by the U.S. Equal Employment Opportunity Commission to determine whether a company’s hiring policy is biased against a demographic group (Biddle, 2006). According to this policy, a particular class is underrepresented in the selected set if the fraction of selected items belonging to that class is less than 80% of the fraction of selected items of the class having the highest selection rate. Applying this rule, we find underrepresentation of particular classes of tweets in the summaries generated by many of the algorithms; these cases are marked with an asterisk (*) in Table 1 and Table 2.
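The adverse-impact check can be made concrete with a short sketch (the function name is ours; the 80% threshold follows the rule described above). Using the SumBasic row of Table 1 as an example:

```python
def underrepresented_classes(input_counts, summary_counts, threshold=0.8):
    """Return the classes whose selection rate falls below `threshold`
    times the highest selection rate (the 'adverse impact' / 80% rule).
    A class's selection rate = fraction of its input units in the summary."""
    rates = {c: summary_counts.get(c, 0) / input_counts[c] for c in input_counts}
    best = max(rates.values())
    return sorted(c for c, r in rates.items() if r < threshold * best)

# SumBasic row of Table 1: 27 of 2,505 female tweets, 23 of 1,532 male tweets
print(underrepresented_classes({"female": 2505, "male": 1532},
                               {"female": 27, "male": 23}))  # -> ['female']
```

With the DSDR row (31 female, 19 male) the same check returns an empty list, i.e., no class is underrepresented.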
We repeated the experiments for summaries of lengths other than 50 tweets as well (details omitted due to lack of space). We observed several cases where the same algorithm includes very different proportions of tweets of the various classes while generating summaries of different lengths. Hence, whether summarization is fair depends on several factors, including the particular algorithm used and the length of the summary.
2.2. Summarizing data from multiple sources
For these experiments, we consider the standard summarization datasets provided by the Document Understanding Conference (DUC) 2006 (https://duc.nist.gov/data.html), which have been used to evaluate summarization algorithms in a large number of prior works (He et al., 2012). The DUC06 data contains 50 ‘topics’, and each topic contains 25 news articles relevant to the topic. The news articles are from three news media sources – The Associated Press (AP), The New York Times (NYTimes), and Xinhua News Agency (Xinhua). The DUC 2006 task was to generate a summary of 250 words out of all the articles in a topic. To evaluate the summaries, a set of gold standard summaries written by human assessors is also provided.
Several teams participated in the DUC 2006 task and submitted summaries of 250 words each. The submitted summaries (called ‘peers’) are also provided along with the data, and they include both extractive and abstractive summaries. For the present work, we consider only the purely extractive summaries, i.e., those summaries in which every sentence is contained in one of the news articles for the corresponding topic, taken across all the topics.
Verifying if the summaries are fair: We checked whether the extractive summaries underrepresent any of the three sources, according to the notion of ‘adverse impact’ described earlier. To this end, we consider individual sentences in the input articles / summaries as the textual units. Note that the sentences in each news article have already been separated in the datasets provided by DUC, making this a natural choice for textual units.
We found that 91% of the extractive summaries underrepresent at least one source, and 58% underrepresent two out of the three sources. Figure 1(a) shows the topic-wise distribution (across the DUC06 topics) of the fraction of summaries that underrepresent at least one source. Similarly, Figure 1(b) shows the topic-wise distribution of summaries that underrepresent two different sources.
These statistics show that a very large majority of the summaries submitted to DUC06 are not fair according to the notion of adverse impact. In fact, for several of the topics, all the submitted extractive summaries underrepresent two of the three sources. We choose the datasets for two such topics for further experiments later in the paper – D0621 (crime and law enforcement in China) and D0626 (bombing of US embassies in Africa). For these two topics, Table 3 and Table 4 show the number of sentences from the three sources in the input data, in some of the submitted extractive summaries (Peer XX), and in the extractive summaries generated by the various algorithms stated in Section 2.1. We also present the ROUGE-1 and ROUGE-2 Recall and F1 scores, as computed using the gold standard summaries provided by DUC06. As before, underrepresented sources are marked with an asterisk (*).
Table 3: Topic D0621 – number of sentences from each source, and ROUGE scores (underrepresented sources are marked *).

Method        Nos. of sentences                     ROUGE-1           ROUGE-2
              APW         NYT        XIE           Recall    F1      Recall    F1
Whole data    167 (44%)   78 (21%)   131 (35%)
Peer 2        1*          0*         6             0.3952    0.4008  0.0912    0.0905
Peer 24       1*          0          7*            0.4878    0.4691  0.1015    0.1017
Peer 27       5*          4          0*            0.3761    0.3542  0.0754    0.0766
ClusterRank   11          0*         0*            0.3718    0.3127  0.0702    0.0589
DSDR          6           0*         4             0.4509    0.4161  0.0875    0.0613
LexRank       5           3          2*            0.4356    0.4430  0.1078    0.0848
LSA           3*          3          4             0.4822    0.4021  0.0762    0.0677
LUHN          4*          0*         5             0.3974    0.3213  0.0935    0.0647
SumBasic      3*          3          3*            0.2847    0.3137  0.0437    0.0482
SummaRNN      10          0*         0*            0.4428    0.4272  0.0811    0.0637
SummaCNN      10          0*         0*            0.4328    0.4011  0.0712    0.0591
Table 4: Topic D0626 – number of sentences from each source, and ROUGE scores (underrepresented sources are marked *).

Method        Nos. of sentences                     ROUGE-1           ROUGE-2
              APW         NYT        XIE           Recall    F1      Recall    F1
Whole data    338 (69%)   89 (18%)   65 (13%)
Peer 2        4*          2*         2             0.4316    0.4097  0.1073    0.1018
Peer 24       7*          1*         2             0.4487    0.4251  0.1416    0.1341
Peer 27       5           1*         0*            0.4658    0.4648  0.1545    0.1541
ClusterRank   8           0*         0*            0.3414    0.3355  0.0576    0.0566
DSDR          10          0*         0*            0.4411    0.3902  0.1324    0.1171
LexRank       5*          0*         4             0.4300    0.3648  0.1112    0.0942
LSA           6*          5          1*            0.4317    0.3445  0.1516    0.0981
LUHN          10          1*         0*            0.4005    0.3120  0.1516    0.0768
SumBasic      8*          1*         2             0.2870    0.3651  0.0616    0.0785
SummaRNN      10          0*         0*            0.4658    0.3772  0.1616    0.1202
SummaCNN      10          0*         0*            0.4529    0.3911  0.1472    0.1125
The experiments in this section establish that summaries generated by existing summarization algorithms are often not fair, and underrepresent one or more classes of the textual units. Having identified the need for new algorithms for fair summarization, we proceed to define fairness notions for the problem of summarization which the algorithms should satisfy.
3. Notions of Fair Summarization
Next, we define some fairness notions in the context of summarization. As before, we assume that the textual units in the input data are categorized into different classes.
Equal Representation: According to the notion of Statistical Parity (Luong et al., 2011), a summarization algorithm will be fair if different classes in the input data are represented equally in the generated summary.
Proportional Representation: Often it may not be possible to satisfy equal representation of different classes in the summary, especially if the input data itself has very different proportions from the different classes. The notion of Proportional Representation requires that the representation of different classes in the summary should be proportional to their distribution in the input data.
No Adverse Impact: As defined earlier, the principle of Adverse Impact (Biddle, 2006) is used to measure unfairness. We propose a notion of fairness which ensures no adverse impact in summarization. More specifically, this fairness notion requires that the fraction of textual units from any class that is selected in the summary should not be less than 80% of the fraction of selected units from the class having the highest selection rate (in the summary).
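As an illustration, the minimum per-class counts implied by the three notions can be sketched as below; the helper name and the rounding policy are our assumptions (the paper does not fix one), and the adverse-impact variant shown is only one simple heuristic (at least 80% of the proportional share), not the only allocation satisfying the notion:

```python
def fairness_targets(input_counts, L, notion="proportional"):
    """Minimum number of summary units per class under each fairness notion.
    Rounding policy is a simplification, not fixed by the paper."""
    n = sum(input_counts.values())
    K = len(input_counts)
    if notion == "equal":
        # Equal Representation: the same number of units per class
        return {c: L // K for c in input_counts}
    if notion == "proportional":
        # Proportional Representation: mirror the input distribution
        return {c: round(L * cnt / n) for c, cnt in input_counts.items()}
    if notion == "adverse_impact":
        # One simple heuristic: at least 80% of the proportional share
        return {c: int(0.8 * L * cnt / n) for c, cnt in input_counts.items()}
    raise ValueError(notion)
```

For instance, for a 50-tweet summary of the US-Election dataset (1,309 / 658 / 153 tweets per class), proportional representation implies roughly 31 / 16 / 4 tweets per class (the rounded shares may need a one-unit adjustment to sum exactly to the summary length).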
4. FairSumm: A Fairness-preserving Summarization Algorithm
Our proposed fairness-preserving summarization algorithm treats summarization as an optimization problem over a submodular, monotone objective function, where the fairness requirements are applied as constraints. In this section, we first establish the theoretical premise of our algorithm, and then describe the algorithm itself. The symbols used in this section are given in Table 5.
4.1. Submodularity and Monotonicity
Definitions: Let V be the set of elements, each of which represents a textual unit (e.g., a tweet or a sentence) in the input collection to be summarized. We define a function F: 2^V → ℝ (where ℝ is the set of real numbers) that assigns a real value to a subset (say, S) of V. Our ultimate aim is to find S (⊆ V) such that |S| = L, where L is the desired length of the summary (that is specified as an input), and for which F(S) is maximized (i.e., S = argmax_{S′ ⊆ V, |S′| = L} F(S′)). So, from a set of textual units V, we look to find a summary S that maximizes an objective function F.
Definition 1 (Discrete derivative) (Krause and Golovin, 2014): For S ⊆ V, v ∈ V, and a function F, let Δ_F(v | S) = F(S ∪ {v}) − F(S) be the discrete derivative of F at S with respect to v.
Definition 2 (Submodularity): A function F is submodular if for every A ⊆ B ⊆ V and v ∈ V \ B (i.e., v ∈ V and v ∉ B),

(1)    Δ_F(v | A) ≥ Δ_F(v | B)

This property is popularly called the property of diminishing returns.
Definition 3 (Monotonicity): The function F is monotone (or monotone non-decreasing) if for every A ⊆ B ⊆ V, F(A) ≤ F(B).
Table 5: Symbols used in Section 4.

Symbols               Meanings
V                     Set of textual units to be summarized
N                     |V|, number of textual units to be summarized
K                     Number of classes to which the textual units in V belong (e.g., K = 2 for the Claritin dataset)
c_1, c_2, ..., c_K    The classes of the textual units
S                     Summary conforming to fairness notions (S ⊆ V)
L                     Desired length of summary, |S| = L
U_1, U_2, ..., U_K    U_i: minimum number of textual units from class c_i to be included in S (to satisfy fairness)
F                     Objective function (overall goodness measure of S)
C                     Coverage function (goodness measure of S)
R                     Diversity reward function (goodness measure of S)
sim(i, j)             Similarity score between two textual units i and j
M                     A (partition) matroid
I                     Set of partitions of matroid M
S*                    The optimized fair summary produced by Algorithm 1
Properties: We now discuss some of the important properties of monotone submodular functions which we will exploit in our problem formulation.
Property 1 (The class of submodular functions is closed under non-negative linear combinations): Let f_1, f_2, ..., f_n, defined by f_i: 2^V → ℝ (i = 1, 2, ..., n), be submodular functions and λ_1, λ_2, ..., λ_n be non-negative real numbers. Then F = Σ_{i=1}^{n} λ_i f_i is also submodular.
Proof: This is a well-known property of submodular functions (e.g., see (Lin and Bilmes, 2011)), hence the proof is omitted.
Property 2 (The class of monotone functions is closed under non-negative linear combinations): Let f_1, f_2, ..., f_n, defined by f_i: 2^V → ℝ (i = 1, 2, ..., n), be monotone functions and λ_1, λ_2, ..., λ_n be non-negative real numbers. Then F = Σ_{i=1}^{n} λ_i f_i is also monotone.
Proof: Let A ⊆ B ⊆ V. Then, since each f_i is monotone, f_i(A) ≤ f_i(B).
Case I: Let λ_i = 0 for all i. Then F = 0, and F is a constant (non-decreasing) function.
Case II: Let λ_i > 0 for all i. Then we can write F(A) = Σ_{i=1}^{n} λ_i f_i(A) ≤ Σ_{i=1}^{n} λ_i f_i(B) = F(B).
Case III: Let only some of the λ_i's be > 0. Without any loss of generality, let these be λ_1, λ_2, ..., λ_m (0 < m < n). Then we have F(A) = Σ_{i=1}^{m} λ_i f_i(A) ≤ Σ_{i=1}^{m} λ_i f_i(B) = F(B).
This completes the proof.
Property 3 (The class of monotone submodular functions is closed under non-negative linear combinations): Let f_1, f_2, ..., f_n, defined by f_i: 2^V → ℝ (i = 1, 2, ..., n), be monotone submodular functions and λ_1, λ_2, ..., λ_n be non-negative real numbers. Then F = Σ_{i=1}^{n} λ_i f_i is also monotone submodular.
Proof: This follows trivially from Properties 1 and 2.
Property 4: Given functions h: 2^V → ℝ and g: ℝ → ℝ, the composition F = g ∘ h: 2^V → ℝ (i.e., F(S) = g(h(S))) is non-decreasing submodular if g is non-decreasing concave and h is non-decreasing submodular.
Proof: This is also a well-known property of submodular functions (e.g., see (Lin and Bilmes, 2011)), hence the proof is omitted.
4.2. Formulation of the objective function for summarization
We now look for an objective function for the task of extractive summarization. Following the formulation by Lin and Bilmes (Lin and Bilmes, 2011), we use monotone submodular functions to construct the objective function. We consider the following two aspects of an extractive text summarization algorithm:
Coverage: Coverage refers to the amount of information covered in the summary S, measured by a function, say, C. The generic form of C can be

(2)    C(S) = Σ_{i ∈ V} Σ_{j ∈ S} sim(i, j)

where sim(i, j) denotes the similarity between the two textual units (tweets, sentences, etc.) i and j. Thus, C(S) measures the overall similarity of the textual units included in the summary S with all the textual units in the input collection V.
Note that C is monotone submodular. C is monotone since coverage increases with the addition of a new sentence to the summary. At the same time, C is submodular since the increase in C is larger when a sentence is added to a shorter summary than when a sentence is added to a longer summary. There can be several forms of C depending on how sim(i, j) is measured, which we will discuss later in this paper.
Diversity reward: The purpose of this aspect is to avoid redundancy and reward diverse information in the summary. Let the associated function be denoted as R. A generic formulation of R is

(3)    R(S) = Σ_{k=1}^{m} sqrt( Σ_{j ∈ P_k ∩ S} r_j )

where P_1, P_2, ..., P_m comprise a partition of V such that ∪_k P_k = V and P_k ∩ P_l = ∅ for all k ≠ l; r_j ≥ 0 is a suitable monotone submodular function that estimates the importance of adding the textual unit j to the summary. The partitioning can be achieved by clustering the set V using any clustering algorithm (e.g., k-means), based on the similarity of items as measured by sim(i, j).
R rewards diversity since there is more benefit in selecting a textual unit from a partition (cluster) that does not yet have any of its elements included in the summary. As soon as any one element from a cluster P_k is included in the summary, the other elements in P_k start having diminishing gains, due to the square root function.
The function r_j is a ‘singleton reward function’ since it estimates the reward of adding the singleton element j to the summary S. One possible way to define this function is

(4)    r_j = (1 / N) Σ_{i ∈ V} sim(i, j)

which measures the average similarity of j to the other textual units in V.
Note that R is monotone submodular by Property 4, since the square root is a non-decreasing concave function. This formulation remains monotone submodular if any other non-decreasing concave function is used instead of the square root.
While constructing a summary, both coverage and diversity are important. Only maximizing coverage may lead to a lack of diversity in the resulting summary, and vice versa. So, we define our objective function for summarization as

(5)    F(S) = λ_1 C(S) + λ_2 R(S)

where λ_1, λ_2 ≥ 0 are the weights given to coverage and diversity respectively. Note that, by Property 3, F is monotone submodular.
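A minimal sketch of how the coverage, diversity reward, and combined objective of Equations (2)-(5) could be computed, assuming a pre-computed similarity matrix `sim` (a list of lists) and a clustering of unit indices into sets; the function names are ours:

```python
import math

def coverage(sim, S):
    """C(S): total similarity of the units in the summary S
    to every unit in the input collection (cf. Eq. 2)."""
    return sum(sim[i][j] for i in range(len(sim)) for j in S)

def diversity(sim, S, clusters):
    """R(S): per-cluster square root of accumulated singleton rewards,
    where the reward of unit j is its average similarity to all units
    (cf. Eqs. 3-4)."""
    n = len(sim)
    def reward(j):
        return sum(sim[i][j] for i in range(n)) / n
    return sum(math.sqrt(sum(reward(j) for j in c & S)) for c in clusters)

def objective(sim, S, clusters, lam1=1.0, lam2=1.0):
    """F(S) = lam1 * C(S) + lam2 * R(S) (cf. Eq. 5)."""
    return lam1 * coverage(sim, S) + lam2 * diversity(sim, S, clusters)
```

The square root in `diversity` is what makes a second pick from an already-represented cluster contribute less than the first pick did, which is exactly the diminishing-returns behavior described above.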
Our proposed fairness-preserving summarization algorithm maximizes F subject to certain fairness constraints. We now discuss this step.
4.3. Submodular Maximization using Partition Matroids
For fairness-preserving summarization, we essentially need to optimize the submodular function F while ensuring that the summary includes at least a certain number of textual units from each class present in the input data. This problem can be formulated using the concept of partition matroids, as described below.
Definitions: We start with some definitions that are necessary to formulate the constrained optimization problem.
Definition 4 (Matroid): A matroid is a pair M = (V, I), defined over a finite set V (called the ground set) and a family of sets I (called the independent sets), that satisfies the following three properties:
(i) ∅ (the empty set) ∈ I.
(ii) If Y ∈ I and X ⊆ Y, then X ∈ I.
(iii) If X ∈ I, Y ∈ I and |X| < |Y|, then there exists v ∈ Y \ X such that X ∪ {v} ∈ I.
Definition 5 (Partition Matroid): Partition matroids refer to a special type of matroid in which the ground set V is partitioned into disjoint subsets V_1, V_2, ..., V_m for some m, and

I = { X ⊆ V : |X ∩ V_j| ≤ b_j for all j = 1, 2, ..., m }

for some given parameters b_1, b_2, ..., b_m. Thus, X is a subset of V that contains at most b_j items from the partition V_j (for all j), and I is the family of all such subsets.
Formulation of the constrained maximization problem: Consider that we have a set of p control variables z_1, z_2, ..., z_p (e.g., gender, political leaning), each of which takes a finite number of distinct values (e.g., male and female, Democrat and Republican). Each item in V has a particular value for each control variable.
For each control variable z_i, we can partition V into disjoint subsets V_1^i, V_2^i, ..., V_{m_i}^i, each corresponding to a particular value of this control variable. We now define a partition matroid M_i = (V, I_i) such that

I_i = { X ⊆ V : |X ∩ V_j^i| ≤ b_j^i for all j }

for some given parameters b_1^i, b_2^i, ..., b_{m_i}^i. Now, for a given submodular objective function F, a submodular optimization under the partition matroid constraints with p control variables can be designed as follows:

(6)    maximize F(S)

subject to S ∈ ∩_{i=1}^{p} I_i.
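The partition-matroid constraint amounts to a simple per-block cardinality check; a sketch (the names are ours):

```python
def is_independent(X, partitions, caps):
    """Partition-matroid membership test: X is independent iff
    |X ∩ V_j| <= b_j for every block V_j with its cap b_j."""
    return all(len(X & P) <= b for P, b in zip(partitions, caps))
```

For example, with two classes `{0, 1, 2}` and `{3, 4}` and caps `[2, 1]`, the set `{0, 3}` is independent while `{0, 1, 2}` is not (it exceeds the cap of the first block).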
A prior work by Du et al. (Du et al., 2013) has established that this submodular optimization problem under the matroid constraints can be solved efficiently with provable guarantees (see (Du et al., 2013) for details).
4.4. Proposed summarization scheme
In the context of the summarization problem, the ground set is V, the set of all textual units (sentences / tweets) which we look to summarize. The control variables (stated in Section 4.3) are analogous to the sensitive attributes with respect to which fairness is to be ensured. In this work, we consider only one sensitive attribute for a particular dataset (the gender of a user for the Claritin dataset, the political leaning of a tweet for the US-Election dataset, and the media source for the DUC06 datasets). Let the corresponding control variable be z, and let z take K distinct values (e.g., K = 2 for the Claritin dataset, and K = 3 for the US-Election dataset). Note that, as described in Section 4.3, the formulation can be extended to multiple sensitive attributes (control variables) as well.
Each textual unit in V is associated with a class, i.e., a particular value of the control variable z (e.g., a tweet is posted either by a male or by a female). Let V_1, V_2, ..., V_K (V_i ⊆ V, for all i) be the disjoint subsets of the textual units from the K classes, each associated with a distinct value of z. We now define a partition matroid M = (V, I) in which V is partitioned into the disjoint subsets V_1, V_2, ..., V_K and

I = { X ⊆ V : |X ∩ V_i| ≤ b_i, i = 1, 2, ..., K }

for some given parameters b_1, b_2, ..., b_K. In other words, I will contain all the sets containing at most b_i sentences from V_i, i = 1, 2, ..., K.
Outside the purview of the matroid constraints, we maintain the restriction that the b_i's are chosen such that
(1) Σ_{i=1}^{K} b_i = L (the desired length of the summary S), and
(2) a desired fairness criterion is maintained in S. For instance, if equal representation of all the classes in the summary is desired, then b_i = L / K for all i.
We now express our fairness-constrained summarization problem as follows:

(7)    S* = argmax F(S) subject to S ∈ I

where the objective function F is as stated in Equation 5.
The algorithm to solve this constrained submodular optimization problem, proposed by Du et al. (Du et al., 2013), is presented as Algorithm 1. The summary S* produced by Algorithm 1 is the solution of Equation 7. This algorithm is an efficient alternative to the standard greedy solution, since it does not perform an exhaustive evaluation of the marginal gains of all the candidate elements in every intermediate step. Instead, it keeps decreasing a threshold by a dividing factor of (1 + ε), which skips the evaluation of many marginal gains, and it sets the threshold to zero when the threshold becomes small enough. At each threshold value, it selects from the ground set (in our case, V) only those elements whose marginal gain F(S ∪ {v}) − F(S) meets the threshold, without violating any constraints.
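For intuition, a plain greedy maximizer under the partition-matroid constraint can be sketched as below. This is the slower baseline, not Du et al.'s algorithm, which additionally applies the decreasing-threshold trick to skip marginal-gain evaluations; the constraint handling, however, is the same. All names are illustrative:

```python
def greedy_fair_summary(n_units, F, partitions, caps, L):
    """Plain greedy baseline: repeatedly add the unit with the largest
    marginal gain F(S U {v}) - F(S) whose addition keeps the summary
    inside the partition matroid (|S ∩ V_j| <= b_j for every class)."""
    S = set()
    while len(S) < L:
        best, best_gain = None, 0.0
        for v in range(n_units):
            if v in S:
                continue
            T = S | {v}
            # skip units whose addition would violate a class cap
            if any(len(T & P) > b for P, b in zip(partitions, caps)):
                continue
            gain = F(T) - F(S)
            if gain > best_gain:
                best, best_gain = v, gain
        if best is None:  # no feasible unit adds positive gain
            break
        S.add(best)
    return S
```

With a modular F (each unit has a fixed weight) and per-class caps of one unit each, the greedy pass is forced to spread the summary across classes even when the highest-weight units all come from one class.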
5. Alternative mechanisms for fair summarization
In this section, we discuss two alternative mechanisms for generating fair summaries.
5.1. Summarizing classes separately
Suppose that the textual units in the input belong to classes , and to conform to a desired fairness notion, the summary should have units from class , (using the same notations as in Section 4). The easiest way to generate a fair summary is to separately summarize the textual units belonging to each class , to produce a summary of length , and finally to combine all the summaries to obtain the final summary of length . We refer to this method as the ClasswiseSumm method – specifically, we use our proposed algorithm, without any fairness constraints, to summarize each class separately.
While this method is possibly the easiest way to generate a fair summary, it is not clear how good the resultant summaries will be. We will evaluate this method in Section 6.
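The ClasswiseSumm idea above can be sketched in a few lines (all names here are our own; `summarize` stands in for any base summarizer, such as the proposed algorithm run without fairness constraints):

```python
def classwise_summ(units_by_class, quotas, summarize):
    """units_by_class: {class label: list of textual units};
    quotas: {class label: number of units wanted from that class};
    summarize(units, k): any base summarizer returning k units.
    Summarize each class separately, then concatenate the per-class
    summaries into the final summary."""
    summary = []
    for label, units in units_by_class.items():
        summary.extend(summarize(units, quotas[label]))
    return summary
```

By construction the output satisfies the per-class quotas, but each class is summarized in isolation, without accounting for redundancy across classes.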
5.2. Fair ranking based summarization
Many summarization algorithms (including the unsupervised ones stated in Section 2) generate an importance score for each textual unit in the input. The textual units are then ranked in decreasing order of this importance score, and the top-ranked units are selected to form the summary. Hence, if the ranked list of the textual units can be made fair (according to some fairness notion), then selecting the top K units from this fair ranked list can be another way of generating fair summaries.
Zehlike et al. (Zehlike et al., 2017) recently proposed a fair-ranking scheme in a two-class setting, with a ‘majority class’ and a ‘minority class’ for which fairness has to be ensured, adhering to a ranked group fairness criterion. They proposed a ranking algorithm, FA*IR, that ensures that the proportion of the candidates/items from the minority class in a ranked list never falls below a certain specified threshold. Zehlike et al. (Zehlike et al., 2017) ensure two utility criteria – selection utility, which means every selected item is more qualified than those not selected, and ordering utility, which means for every pair of selected candidates, either the more qualified one is ranked above, or the difference in their qualifications is small.
Applying the FA*IR algorithm for summarization: We use this algorithm for fair extractive text summarization as follows. We consider as the majority class the class having the larger number of textual units in the input data, while the class having fewer textual units is considered the minority class.
Input: The algorithm takes as input a set of textual units (to be summarized); the other input parameters (q, k, p and α) are set as discussed below.
Parameter settings: We set the parameters of FA*IR as follows.

Qualification (q) of a candidate: In our summarization setting, this is the goodness value of a textual unit in the data to be summarized. We set this value to the importance score computed by some standard summarization algorithm (e.g., the ones discussed in Section 2.1) that ranks the textual units by their importance scores.

Expected size (k) of the ranking: The expected number of textual units in the summary.

Indicator variable indicating if the candidate is protected: We consider a textual unit protected if it belongs to the minority class, i.e., the class having the lesser number of textual units in the input data.

Minimum proportion (p) of protected candidates: We set this value in the open interval (0, 1) so that a particular notion of fairness is ensured in the summary. For instance, if we want equal representation of both classes in the summary, we set p = 0.5.

Adjusted significance level (α): We regulate this parameter in the open interval (0, 1).
Working of the algorithm: Two priority queues P0 (for the textual units from the majority class) and P1 (for the textual units of the minority class), each with capacity k, are initially set to empty. Then P0 and P1 are filled with the majority and minority textual units respectively, prioritized by their goodness values (q). Next, a ranked group fairness table is created, which gives the minimum number of minority items/candidates required at each rank for the given parameter setting. If this table determines that a textual unit from the minority class needs to be added to the summary S (being generated), the algorithm adds the best element from P1 to S; otherwise it adds the overall best remaining textual unit to S.
Output: A fair summary S of the desired length k, adhering to a particular notion of fairness.
Note that since the FA*IR algorithm provides fair ranking for two classes only (Zehlike et al., 2017), we apply this algorithm only to data containing textual units from exactly two classes. So, we report the summarization results using this methodology only on the Claritin dataset, which has two classes – Male and Female. Designing a fair ranking algorithm for more than two classes, and then using it for summarizing such data, is an interesting direction for future work.
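The two-queue selection described above can be sketched as follows. This is a simplified stand-in, not the real FA*IR algorithm: FA*IR derives the minimum number of protected items at each rank from a binomial significance test with parameters p and α, whereas here we substitute the simpler rule "at least ⌊p · i⌋ protected units in every prefix of length i", and all names are our own.

```python
import math

def fair_topk(scored_units, protected, k, p=0.5):
    """scored_units: list of (unit, importance score);
    protected: set of units belonging to the minority class;
    returns a summary of k units respecting the prefix rule.
    Assumes the input contains enough units of each class."""
    ranked = sorted(scored_units, key=lambda x: -x[1])
    P = [(u, s) for u, s in ranked if u in protected]       # minority queue
    NP = [(u, s) for u, s in ranked if u not in protected]  # majority queue
    summary, n_prot = [], 0
    for i in range(1, k + 1):
        need = math.floor(p * i)           # stand-in for the fairness table
        take = (bool(P) and n_prot < need) or not NP
        if not take and P and P[0][1] > NP[0][1]:
            take = True                    # protected unit is overall best
        if take:
            summary.append(P.pop(0)[0]); n_prot += 1
        else:
            summary.append(NP.pop(0)[0])
    return summary
```

With p = 0.5, every prefix of the output (and hence the final summary) contains at least half protected units, rounded down, mirroring the equal-representation setting discussed above.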
6. Experiments and Evaluation
We now experiment with different methodologies for generating fair summaries, over the four datasets – the two microblog datasets Claritin and USelection, and the two DUC news article datasets D0621 and D0626 (described in Section 2). For the Claritin and USelection datasets, we generate all summaries of length 50 tweets. For the two DUC datasets, we generate all summaries of length 250 words (as specified by DUC 2006). Specifically, sentences are included in the summary until its length reaches 250 words, and the summary is truncated to 250 words while computing ROUGE scores. To evaluate the quality of summaries, we compute ROUGE-1 and ROUGE-2 Recall and F1 scores by matching the algorithmically generated summaries with the gold standard summaries.
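To make the evaluation metric concrete, here is a minimal sketch of ROUGE-1 Recall and F1 as clipped unigram overlap. The actual experiments use the standard ROUGE toolkit (Lin, 2004); this simplified version ignores stemming, stopword handling, and multi-reference jackknifing.

```python
from collections import Counter

def rouge1(candidate, reference):
    """Return (recall, f1) of clipped unigram overlap between a candidate
    summary and a gold-standard reference, both given as strings."""
    c, r = Counter(candidate.split()), Counter(reference.split())
    overlap = sum((c & r).values())          # clipped unigram matches
    recall = overlap / max(sum(r.values()), 1)
    precision = overlap / max(sum(c.values()), 1)
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return recall, f1
```

ROUGE-2 follows the same pattern with bigram counts in place of unigram counts.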
Parameter settings of algorithms:
The proposed FairSumm algorithm uses a similarity function sim(u, v) to measure the similarity between two textual units u and v.
We experimented with the following two similarity functions:
(1) TF-IDF-sim – we compute TF-IDF scores for each word (unigram) in a dataset, and hence obtain a TF-IDF vector for each textual unit. The similarity sim(u, v) is computed as the cosine similarity between the TF-IDF vectors of u and v.
(2) Embed-sim – we obtain embeddings for the words in a dataset, either by training Word2vec (Mikolov et al., 2013) on the dataset, or by considering pretrained GloVe embeddings (Pennington et al., 2014). Either way, we get an embedding (a vector of fixed dimension) for each distinct word in the dataset, which is expected to capture the semantics of the word. For a given textual unit u, we obtain an embedding by taking the average embedding of all words contained in u (note that Word2vec vectors are additive in nature (Mikolov et al., 2013)). The similarity sim(u, v) is computed as the cosine similarity between the embeddings of u and v.
We experimented with these two similarity measures and found that the performance of the FairSumm algorithm is very similar for both. Hence, we report results for the TF-IDF-sim similarity measure.
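A minimal sketch of the TF-IDF-sim computation, in pure Python; the term-weighting scheme (raw term frequency times log inverse document frequency) is our own simplification, and production code would typically use a library vectorizer instead.

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """Build a sparse TF-IDF vector (dict word -> weight) for each
    document, where each document is a whitespace-tokenized string."""
    N = len(docs)
    df = Counter(w for d in docs for w in set(d.split()))   # document freq.
    vecs = []
    for d in docs:
        tf = Counter(d.split())
        vecs.append({w: tf[w] * math.log(N / df[w]) for w in tf})
    return vecs

def cosine(u, v):
    """Cosine similarity between two sparse vectors (dicts)."""
    dot = sum(u[w] * v.get(w, 0.0) for w in u)
    nu = math.sqrt(sum(x * x for x in u.values()))
    nv = math.sqrt(sum(x * x for x in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0
```

For Embed-sim, `tfidf_vectors` would be replaced by averaging per-word embeddings, with the same `cosine` applied to the resulting dense vectors.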
For the FA*IR algorithm, the value of the parameter p needs to be decided (see Section 5.2). We try different values of p in the interval (0, 1) using grid search, and finally use the value of p that obtained the best ROUGE scores on the Claritin dataset.
Method  Nos. of tweets  ROUGE-1  ROUGE-2
  Female  Male  Recall  F1  Recall  F1
Whole data  2,505 (62%)  1,532 (38%)
Without any fairness constraint  
FairSumm  37  13  0.5487  0.5457  0.1724  0.1706 
Fairness: Equal representation  
FairSumm  25  25  0.5604  0.5523  0.1877  0.1849 
ClasswiseSumm  25  25  0.5453  0.5383  0.1722  0.1697 
Fa*irClusRank  25  25  0.4333  0.4805  0.1355  0.1625 
Fa*irDSDR  25  25  0.2846  0.4005  0.1392  0.2063 
Fa*irLexRank  25  25  0.2900  0.3699  0.1100  0.1534 
Fa*irLSA  25  25  0.5135  0.4928  0.1136  0.1090 
Fa*irLUHN  25  25  0.4153  0.4290  0.1145  0.1183 
Fa*irSumBasic  25  25  0.3144  0.4357  0.1109  0.1538 
Fa*irSummaRNN  25  25  0.3558  0.4099  0.1257  0.1536 
Fa*irSummaCNN  25  25  0.3558  0.4099  0.1257  0.1536 
Fairness: Proportional representation  
FairSumm  31  19  0.5719  0.5681  0.2061  0.2022 
ClasswiseSumm  31  19  0.5501  0.5414  0.1803  0.1731 
Fa*irClusRank  31  19  0.4387  0.4830  0.1332  0.1587 
Fa*irDSDR  31  19  0.3018  0.4251  0.1451  0.2045 
Fa*irLexRank  31  19  0.3117  0.4061  0.1153  0.1596 
Fa*irLSA  31  19  0.5018  0.4868  0.1181  0.1146 
Fa*irLUHN  31  19  0.4261  0.4349  0.1190  0.1215 
Fa*irSumBasic  31  19  0.3180  0.4355  0.1163  0.1593 
Fa*irSummaRNN  31  19  0.3405  0.3939  0.1203  0.1475 
Fa*irSummaCNN  31  19  0.3405  0.3939  0.1203  0.1475 
Method  Nos. of tweets  ROUGE-1  ROUGE-2
  ProRep  ProDem  Neutral  Recall  F1  Recall  F1
Whole data  1,309 (62%)  658 (31%)  153 (7%)
Without any fairness constraint  
FairSumm  34  12  4  0.3586  0.4596  0.0743  0.0913 
Fairness: Equal representation  
FairSumm  17  17  16  0.3683  0.4671  0.0781  0.0965 
ClasswiseSumm  16  16  18  0.3627  0.4671  0.0711  0.0879 
Fairness: Proportional representation  
FairSumm  31  15  4  0.3756  0.4902  0.0937  0.1164 
ClasswiseSumm  30  15  5  0.3668  0.4541  0.0810  0.1003 
Fairness: No Adverse Impact  
FairSumm  29  17  4  0.3713  0.4836  0.0861  0.1024 
FairSumm  30  16  4  0.3721  0.4893  0.0869  0.1093 
FairSumm  31  15  4  0.3756  0.4902  0.0937  0.1164 
FairSumm  31  16  3  0.3711  0.4775  0.0854  0.0956 
FairSumm  32  15  3  0.3707  0.4726  0.0849  0.0936 
Results: Table 6 reports the results of the fair summarization algorithms on the Claritin dataset. Specifically, we compute summaries without any fairness constraint, and considering the two fairness notions of equal representation and proportional representation (as explained in Section 3). In each case, we state the number of textual units in the summary from the two classes, and the ROUGE scores of the summary. Similarly, Table 7, Table 8 and Table 9 report the results for the USelection, D0621 and D0626 datasets respectively. The FairSumm (proposed) and ClasswiseSumm algorithms are executed over all datasets. For the two-class Claritin dataset, we also try the methodology stated in Section 5.2, where the FA*IR algorithm is applied over the rankings produced by several existing summarization algorithms such as ClusterRank, LexRank, DSDR, etc. These methodologies are denoted as Fa*irClusRank, Fa*irLexRank, Fa*irDSDR, and so on.
Note that, for generating a fixed-length summary, the neural SummaRuNNer model uses only the textual units labeled 1 (i.e., judged summary-worthy), ranked as per their confidence scores. While applying the FA*IR algorithm over SummaRuNNer (Fa*irSummaRNN and Fa*irSummaCNN in Table 6), we have given as input to FA*IR the ranked list of only those textual units that are labeled 1. Theoretically, the set of units labeled 1 might not contain sufficient representation (required for a certain fairness notion) of some class. In such cases, a fair ranking algorithm cannot make the final summaries fair. We make the following observations from the results.
Ensuring fairness can lead to better summarization: FairSumm with fairness constraints always achieves higher ROUGE scores than FairSumm without any fairness constraints. In some cases, ClasswiseSumm with fairness constraints also achieves higher ROUGE scores than FairSumm without fairness constraints. Also, from Table 1 and Table 6 (both on the Claritin dataset), we can compare the performance of the existing summarization algorithms (e.g., DSDR, LexRank) without any fairness constraint, and after their outputs are made fair using the methodology in Section 5.2. We find that the performances are comparable – while most of the ROUGE-1 Recall scores are higher in Table 6 (when the summary is made fair), most ROUGE-2 Recall scores are higher in Table 1 (in the original summaries). The trends are similar for the F1 scores.
We also observe that summaries that conform to proportional representation generally achieve higher ROUGE scores than summaries that conform to equal representation.
Thus, making summaries fair can actually improve the quality of summaries (as measured by ROUGE scores). These higher ROUGE scores of fair summaries are probably due to the human assessors inherently attempting to represent the different classes of textual units in the gold standard summaries in roughly the same proportion as the classes occur in the input data.
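The per-class quotas underlying the tables can be derived as follows; this is a sketch, and the exact rounding rule is our assumption (nearest-integer rounding may not sum exactly to k for every class distribution). For instance, for the Claritin data (2,505 Female and 1,532 Male tweets) and a 50-tweet summary, proportional representation yields 31 Female and 19 Male tweets, matching Table 6.

```python
def quotas(class_counts, k, notion="proportional"):
    """class_counts: {class label: number of input units};
    k: summary length; notion: 'equal' or 'proportional'.
    Returns {class label: number of summary units} for the notion."""
    total = sum(class_counts.values())
    if notion == "equal":
        # same quota for every class
        return {c: k // len(class_counts) for c in class_counts}
    # quota proportional to the class's share of the input
    return {c: round(k * n / total) for c, n in class_counts.items()}
```
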
Summarizing different classes separately does not yield good summaries: Across all datasets, the proposed FairSumm algorithm achieves higher ROUGE scores than the ClasswiseSumm approach, considering the same fairness notion. Note that the ClasswiseSumm approach uses the same algorithm on each class separately. Hence, separately summarizing each class leads to relatively poor summaries, as compared to the proposed methodology.
FairSumm is generalizable to different fairness notions: The proposed FairSumm algorithm is generalizable to various fairness notions. Apart from equal representation and proportional representation, we also experimented with the ‘No adverse impact’ notion – the last few rows of Table 7 show different summaries that can be generated using FairSumm considering ‘no adverse impact’ as the fairness notion (such rows are omitted from the other tables for brevity).
Comparing FairSumm with other algorithms: As demonstrated in Table 1, Table 2, Table 3, and Table 4, none of the existing summarization algorithms generate fair summaries. For most of the datasets (except Claritin), the recently proposed supervised neural algorithms achieve higher ROUGE scores than the unsupervised algorithms. But the summaries produced by the neural methods underrepresent the minority neutral group in the USelection dataset, and two of the three sources in the DUC datasets.
The summaries generated by FairSumm, apart from ensuring fairness, achieve ROUGE scores comparable to those of the best-performing algorithms. For the two microblog datasets (Claritin and USelection), FairSumm with ‘proportional representation’ achieves higher ROUGE scores than all the other methods.
For the D0626 dataset, comparing Table 4 and Table 9, we can see that the performance of FairSumm with ‘proportional representation’ actually surpasses that of all algorithms, including the neural models and all the summaries submitted in the DUC2006 track. For the topic D0621, we can see from Table 3 and Table 8 that the performance of FairSumm with proportional representation is better than that of all methods (including neural models) except the best submitted summary in the DUC track (Peer 24).
Lastly, across all the datasets, the summaries generated by the proposed FairSumm algorithm achieve the highest ROUGE scores, compared to all other methods for generating fair summaries.
Overall, the results signify that FairSumm not only can ensure various fairness notions in the summaries, but also can generate summaries that achieve comparable (or better) ROUGE scores than many well-known summarization algorithms.
Method  Nos. of sentences  ROUGE-1  ROUGE-2
  APW  NYT  XIE  Recall  F1  Recall  F1
Whole data  167 (44%)  78 (21%)  131 (35%)
Without any fairness constraint  
FairSumm  4  1  3  0.4141  0.3972  0.0714  0.0722 
Fairness: Equal representation  
FairSumm  3  3  3  0.4374  0.4146  0.0881  0.0722 
ClasswiseSumm  4  3  3  0.4220  0.4004  0.0737  0.0752 
Fairness: Proportional representation  
FairSumm  4  2  3  0.4691  0.4643  0.0917  0.0893 
ClasswiseSumm  4  1  3  0.4423  0.4393  0.0804  0.0796 
Method  Nos. of sentences  ROUGE-1  ROUGE-2
  APW  NYT  XIE  Recall  F1  Recall  F1
Whole data  338 (69%)  89 (18%)  65 (13%)
Without any fairness constraint  
FairSumm  5  1  2  0.4686  0.4021  0.1332  0.1302 
Fairness: Equal representation  
FairSumm  3  3  3  0.4743  0.4370  0.1245  0.1146 
ClasswiseSumm  3  3  3  0.4511  0.3887  0.1183  0.1018 
Fairness: Proportional representation  
FairSumm  6  2  1  0.4786  0.4270  0.1802  0.1514 
ClasswiseSumm  5  2  2  0.4623  0.3892  0.1204  0.1121 
7. Related Work
The expanding availability of textual information has driven extensive research in the area of automatic text summarization. A large number of text summarization algorithms have been proposed; the reader can refer to (Gupta and Lehal, 2010) for a survey. One of the most commonly used classes of summarization algorithms is centered around the popular TF-IDF model (Salton, 1989). Different works have used TF-IDF score based similarities for summarization (Radev et al., 2002; Alguliev et al., 2011). Additionally, there has been a series of works where summarization has been treated as a submodular optimization problem (Lin and Bilmes, 2011; Badanidiyuru et al., 2014). The algorithm proposed in this work is also based on a constrained submodular optimization framework, and uses the notion of TF-IDF similarity.
Given that information filtering algorithms have far-reaching social and economic consequences in today’s world, fairness and anti-discrimination have been recent inclusions in the algorithm design perspective (Hajian et al., 2016). There have been several recent works on measuring different notions of fairness and bias (Bonchi et al., 2017; Pedreshi et al., 2008; Chakraborty et al., 2017) as well as on removing existing unfairness from different methodologies (Hajian et al., 2014; Zemel et al., 2013). Different fairness-aware algorithms have been proposed to achieve group and/or individual fairness for tasks such as clustering (Chierichetti et al., 2017), classification (Zafar et al., 2017), ranking (Zehlike et al., 2017), sampling (Celis et al., 2016), and so on. However, to the best of our knowledge, there has not been prior exploration of fairness in the context of summarization.
8. Conclusion
To our knowledge, this work is the first attempt to develop a fairnesspreserving text summarization algorithm. Through experiments on microblog datasets and the popular DUC datasets, we show that existing algorithms often produce summaries that are not fair. The proposed algorithm can generate highquality summaries that conform to various standard notions of fairness. In fact, ensuring the fairness of summaries often leads to enhancing the quality of the summary as well.
The proposed algorithm will help in addressing the concern that using an (inadvertently) ‘biased’ summarization algorithm can reduce the visibility of the voice/opinion of a certain social group or source in the summary. Moreover, downstream applications that use the summaries (e.g., for opinion classification and rating inference (Lloret et al., 2010)) would benefit from a fair summary (e.g., one that fairly represents the positive and negative opinions in the input).
We believe that this work will open up interesting research problems on fair summarization, such as extending the concept of fairness to abstractive summaries, estimating user preferences for fair summaries in various applications, and so on.
References
 Alguliev et al. (2011) Rasim M Alguliev, Ramiz M Aliguliyev, Makrufa S Hajirahimova, and Chingiz A Mehdiyev. 2011. MCMR: Maximum coverage and minimum redundant text summarization model. Expert Systems with Applications 38, 12 (2011).
 Allahyari et al. (2017) Mehdi Allahyari, Seyed Amin Pouriyeh, Mehdi Assefi, Saeid Safaei, Elizabeth D. Trippe, Juan B. Gutierrez, and Krys Kochut. 2017. Text Summarization Techniques: A Brief Survey. CoRR abs/1707.02268 (2017). http://arxiv.org/abs/1707.02268
 Badanidiyuru et al. (2014) Ashwinkumar Badanidiyuru, Baharan Mirzasoleiman, Amin Karbasi, and Andreas Krause. 2014. Streaming submodular maximization: Massive data summarization on the fly. In ACM KDD.
 Biddle (2006) Dan Biddle. 2006. Adverse Impact and Test Validation: A Practitioner’s Guide to Valid and Defensible Employment Testing. Routledge.

 Bonchi et al. (2017) Francesco Bonchi, Sara Hajian, Bud Mishra, and Daniele Ramazzotti. 2017. Exposing the probabilistic causal structure of discrimination. International Journal of Data Science and Analytics 3, 1 (2017).
 Celis et al. (2016) L Elisa Celis, Amit Deshpande, Tarun Kathuria, and Nisheeth K Vishnoi. 2016. How to be Fair and Diverse? arXiv preprint arXiv:1610.07183 (2016).
 Chakraborty et al. (2017) Abhijnan Chakraborty, Johnnatan Messias, Fabricio Benevenuto, Saptarshi Ghosh, Niloy Ganguly, and Krishna P Gummadi. 2017. Who makes trends? understanding demographic biases in crowdsourced recommendations.
 Chierichetti et al. (2017) Flavio Chierichetti, Ravi Kumar, Silvio Lattanzi, and Sergei Vassilvitskii. 2017. Fair Clustering Through Fairlets. In NIPS.
 Darwish et al. (2017) K. Darwish, W. Magdy, and Zanouda T. 2017. Trump vs. Hillary: What Went Viral During the 2016 US Presidential Election. In SocInfo.
 Dong (2018) Yue Dong. 2018. A Survey on Neural NetworkBased Summarization Methods. arXiv preprint arXiv:1804.04589 (2018).
 Du et al. (2013) Nan Du, Yingyu Liang, MariaFlorina Balcan, and Le Song. 2013. ContinuousTime Influence Maximization for Multiple Items. CoRR abs/1312.2164 (2013). arXiv:1312.2164 http://arxiv.org/abs/1312.2164
 Erkan and Radev (2004) Günes Erkan and Dragomir R. Radev. 2004. LexRank: Graphbased Lexical Centrality As Salience in Text Summarization. J. Artif. Int. Res. 22, 1 (2004).
 Garg et al. (2009) Nikhil Garg and others. 2009. Clusterrank: a graph based method for meeting summarization. In INTERSPEECH.
 Gong and Liu (2001) Yihong Gong and Xin Liu. 2001. Generic Text Summarization Using Relevance Measure and Latent Semantic Analysis. In ACM SIGIR.
 Gupta and Lehal (2010) Vishal Gupta and Gurpreet Singh Lehal. 2010. A Survey of Text Summarization Extractive Techniques. IEEE Journal of Emerging Technologies in Web Intelligence 2, 3 (2010).
 Hajian et al. (2016) Sara Hajian, Francesco Bonchi, and Carlos Castillo. 2016. Algorithmic bias: From discrimination discovery to fairnessaware data mining. In ACM KDD.
 Hajian et al. (2014) Sara Hajian, Josep DomingoFerrer, and Oriol Farràs. 2014. Generalizationbased privacy preservation and discrimination prevention in data publishing and mining. Data Mining and Knowledge Discovery 28 (2014).
 He et al. (2012) Zhanying He and others. 2012. Document Summarization Based on Data Reconstruction. In AAAI.
 Krause and Golovin (2014) Andreas Krause and Daniel Golovin. 2014. Submodular Function Maximization. In Tractability: Practical Approaches to Hard Problems.
 Lin (2004) ChinYew Lin. 2004. ROUGE: A package for automatic evaluation of summaries. In Proc. Workshop on Text Summarization Branches Out, ACL.
 Lin and Bilmes (2011) Hui Lin and Jeff Bilmes. 2011. A Class of Submodular Functions for Document Summarization. In ACL (HLT ’11).
 Lloret et al. (2010) Elena Lloret, Horacio Saggion, and Manuel Palomar. 2010. Experiments on summarybased opinion classification. In NAACL HLT.
 Luhn (1958) H. P. Luhn. 1958. The Automatic Creation of Literature Abstracts. IBM J. Res. Dev. 2, 2 (1958).
 Luong et al. (2011) Binh Thanh Luong, Salvatore Ruggieri, and Franco Turini. 2011. kNN as an implementation of situation testing for discrimination discovery and prevention. In ACM KDD.
 Mikolov et al. (2013) T. Mikolov, W.T. Yih, and G. Zweig. 2013. Linguistic Regularities in Continuous Space Word Representations. In NAACL HLT.
 Nallapati et al. (2017) Ramesh Nallapati, Feifei Zhai, and Bowen Zhou. 2017. SummaRuNNer: A Recurrent Neural Network Based Sequence Model for Extractive Summarization of Documents.. In AAAI. 3075–3081.
 Nenkova and Vanderwende (2005) Ani Nenkova and Lucy Vanderwende. 2005. The impact of frequency on summarization. Technical Report. Microsoft Research.
 Pedreshi et al. (2008) Dino Pedreshi, Salvatore Ruggieri, and Franco Turini. 2008. Discriminationaware data mining. In ACM KDD.
 Pennington et al. (2014) Jeffrey Pennington, Richard Socher, and Christopher D. Manning. 2014. GloVe: Global Vectors for Word Representation. In Proc. EMNLP.
 Radev et al. (2002) Dragomir R. Radev, Eduard Hovy, and Kathleen McKeown. 2002. Introduction to the Special Issue on Summarization. Comput. Linguist. 28, 4 (2002).
 Salton (1989) Gerard Salton. 1989. Automatic Text Processing: The Transformation, Analysis, and Retrieval of Information by Computer. Addison-Wesley.
 Shandilya et al. (2018) Anurag Shandilya, Kripabandhu Ghosh, and Saptarshi Ghosh. 2018. Fairness of Extractive Text Summarization. In Proc. The Web Conference (WWW) Companion Volume. 97–98.
 Zafar et al. (2017) Muhammad Bilal Zafar, Isabel Valera, Manuel Gomez Rogriguez, and Krishna P Gummadi. 2017. Fairness Constraints: Mechanisms for Fair Classification. In AIStats.
 Zehlike et al. (2017) Meike Zehlike, Francesco Bonchi, Carlos Castillo, Sara Hajian, Mohamed Megahed, and Ricardo BaezaYates. 2017. FA*IR: A Fair Topk Ranking Algorithm. In ACM CIKM.
 Zemel et al. (2013) Rich Zemel, Yu Wu, Kevin Swersky, Toni Pitassi, and Cynthia Dwork. 2013. Learning fair representations. In ICML.