1 Introduction
Graphs are widely used to model and analyze complex systems such as biological networks and financial markets, which has led to rising interest in machine learning (ML) tasks over graphs. Node representation learning, in particular, has grown increasingly popular. Node representations map nodes to vector embeddings that encode both structural and attributive information, and their applicability to downstream tasks has enabled applications such as traffic forecasting (Opolka et al., 2019) and crime forecasting (Jin et al., 2020). Graph neural networks (GNNs) are prevalently used for representation learning, where node embeddings are created by repeatedly aggregating information from neighbors, for both supervised and unsupervised learning tasks (Kipf and Welling, 2017; Veličković et al., 2018; García-Durán and Niepert, 2017).
It has been shown that ML models propagate pre-existing bias in training data, which may lead to discriminatory results in downstream applications (Dwork et al., 2012; Beutel et al., 2017). In ML over graphs in particular, while GNN-based methods achieve state-of-the-art results for graph representation learning, they also amplify existing biases in training data (Dai and Wang, 2020). For example, nodes in social networks tend to connect to other nodes with similar attributes, leading to denser connectivity between nodes with the same sensitive attributes (e.g., gender) (Hofstra et al., 2017). Thus, by aggregating information from neighbors, the representations obtained by GNNs may be highly correlated with the sensitive attributes. This causes discrimination in downstream tasks even when the sensitive attributes are not directly used in training (Hajian and Domingo-Ferrer, 2013).
Data augmentation has been widely utilized to improve the generalizability of trained models, as well as to enable unsupervised methods such as contrastive or self-supervised learning. Augmentation schemes have been extensively studied in vision (Shorten and Khoshgoftaar, 2019; Hjelm et al., 2018) and natural language processing (Zhang et al., 2015; Kafle et al., 2017). However, there is comparatively limited work in the graph domain due to the complex, non-Euclidean structure of graphs. To the best of our knowledge, (Agarwal et al., 2021) is the only study that designs fairness-aware graph data augmentation in the contrastive learning framework to reduce bias.
This study theoretically investigates the sources of bias in GNN-based learning and, in turn, improves fairness in node representations by employing fairness-aware graph data augmentation schemes. The proposed schemes adaptively corrupt both the input graph topology and the nodal features in order to reduce the terms in the analysis that lead to bias. Although the proposed schemes are presented through their application in contrastive learning, the introduced augmentation strategies can be flexibly utilized in several GNN-based learning approaches together with other fairness-enhancement methods.
Our contributions in this paper can be summarized as follows:
c1) We theoretically analyze the sources of bias that are propagated towards node representations in a GNN-based learning framework.
c2) Based on the analysis, we develop novel fairness-aware graph data augmentations that can reduce potential bias in learned node representations. Our approach is adaptive to both the input graph and the sensitive attributes and, to the best of our knowledge, is the first to tackle fairness enhancement through adaptive graph augmentation design.
c3) The proposed strategies incur low additional computational complexity compared to non-adaptive counterparts, and can operate in conjunction with various GNN-based learning frameworks, including other fairness enhancement methods.
c4) Theoretical analysis is provided to corroborate the effectiveness of the proposed feature masking and node sampling augmentation schemes.
c5) The performance of the proposed graph data augmentation schemes is evaluated on real networks for both node classification and link prediction tasks. Compared to state-of-the-art graph contrastive learning methods, the novel augmentation schemes improve fairness metrics while providing comparable utility measures.
2 Related Work
Representation learning on graphs. Conventional graph representation learning approaches fall under two categories: factorization-based and random walk-based approaches. Factorization-based methods aim to minimize the difference between the inner product of node representations and a deterministic similarity metric between nodes (Ahmed et al., 2013; Cao et al., 2015; Ou et al., 2016). Random walk-based approaches, on the other hand, employ stochastic measures of similarity between nodes (Perozzi et al., 2014; Grover and Leskovec, 2016; Tang et al., 2015; Chen et al., 2018). GNNs have gained popularity in representation learning, for both supervised (Kipf and Welling, 2017; Veličković et al., 2018; Hu et al., 2019; Wu et al., 2019) and unsupervised tasks, e.g., (García-Durán and Niepert, 2017; Hamilton et al., 2017). Specifically, the recent success of contrastive learning for visual representation learning (Wu et al., 2018; Ye et al., 2019; Ji et al., 2019) has paved the way for contrastive learning for unsupervised graph representation learning.
Graph data augmentation. Augmentation strategies have been extensively investigated in the vision (Shorten and Khoshgoftaar, 2019; Hjelm et al., 2018) and natural language processing (Zhang et al., 2015; Kafle et al., 2017) domains. However, the area is comparatively underexplored in the graph domain due to the complex, non-Euclidean topology of graphs. Graph augmentation based on graph structure modification has been developed to improve the utility of ensuing tasks (Rong et al., 2019; Zhao et al., 2020; Chen et al., 2020). Meanwhile, graph data augmentation has been used to generate graph views for unsupervised graph contrastive learning, see, e.g., (Veličković et al., 2019; Opolka et al., 2019; Zhu et al., 2020, 2021), which achieves state-of-the-art results in various learning tasks over graphs such as node classification, regression, and link prediction (Opolka et al., 2019; Veličković et al., 2019; You et al., 2020; Zhu et al., 2020; Peng et al., 2020; Hassani and Khasahmadi, 2020). Among these, (Zhu et al., 2020) is the first study that aims to maximize the agreement of node-level embeddings across two corrupted graph views. Building upon (Zhu et al., 2020), (Zhu et al., 2021) develops adaptive augmentation schemes with respect to various node centrality measures and achieves better results. However, none of these studies are fairness-aware.
Fairness-aware learning on graphs. A pioneering study tackling the fairness problem in random walk-based graph representation learning is developed in (Rahman et al., 2019). In addition, adversarial regularization is employed to account for fairness of node representations (Dai and Wang, 2020; Bose and Hamilton, 2019; Fisher et al., 2020), where (Dai and Wang, 2020) is presented specifically for node classification and (Fisher et al., 2020) works on knowledge graphs. (Buyl and De Bie, 2020) also aims to create fair node representations by utilizing a Bayesian approach in which sensitive information is modeled in the prior distribution. Contrary to these works, our framework is built on a theoretical analysis developed within this paper. Similar to the works mentioned above, the proposed methods can be utilized within the learning process to mitigate bias by modifying the learned model (i.e., as an in-processing fairness strategy). In addition, the proposed schemes can also be regarded as “pre-processing” tools, implying their compatibility with a wide array of GNN-based learning schemes in a versatile manner. Furthermore, (Ma et al., 2021) carries out a PAC-Bayesian analysis and connects the concept of subgroup generalization to accuracy disparity, and (Zeng et al., 2021) introduces several methods, including GNN-based ones, to decrease the bias of representations of heterogeneous information networks. While (Li et al., 2021; Laclau et al., 2021) modify the adjacency matrix to improve different fairness measures specifically for link prediction, (Buyl and De Bie, 2021) designs a regularizer for the same purpose. With a specific interest in individual fairness over graphs, (Dong et al., 2021) employs a ranking-based strategy. A biased edge dropout scheme is proposed in (Spinelli et al., 2021) to improve fairness; however, that scheme is not adaptive to the graph structure (its parameters are independent of the input graph topology). Fairness-aware graph contrastive learning is first studied in (Agarwal et al., 2021), where a layer-wise weight normalization scheme along with graph augmentations is introduced. However, the fairness-aware augmentation utilized therein is designed primarily for counterfactual fairness.
3 Fairness in GNN-based Representation Learning
GNN-based approaches are the state of the art for node representation learning. However, it has been demonstrated that the utilization of graph structure in the learning process not only propagates but also amplifies possible bias towards certain sensitive groups (Dai and Wang, 2020). To this end, this section investigates the sources of bias in the representations generated via GNN-based learning. The analysis reveals that both nodal features and graph structure lead to bias, and several graph data augmentation frameworks are introduced to counteract them.
3.1 Preliminaries
This study aims to learn fairness-aware nodal representations for a given graph $\mathcal{G} = (\mathcal{V}, \mathcal{E})$, where $\mathcal{V}$ denotes the node set and $\mathcal{E}$ represents the edge set. $\mathbf{X}$ and $\mathbf{A}$ are used to denote the feature and adjacency matrices, respectively, with the $(i,j)$th entry $A_{ij} = 1$ if and only if $(v_i, v_j) \in \mathcal{E}$. The degree matrix $\mathbf{D}$ is defined to be a diagonal matrix whose $i$th diagonal entry is the degree of node $v_i$. For the fairness examination, the sensitive attributes of the nodes are represented by $\mathbf{s}$, where the existence of a single, binary sensitive attribute is considered. In this work, unsupervised methods are chosen as the enabling schemes for representation generation: given the inputs $\mathbf{X}$, $\mathbf{A}$, and $\mathbf{s}$, the main purpose is to learn a mapping that generates $d$-dimensional (generally low-dimensional) unbiased nodal representations through an $L$-layer GNN, which can be used in an ensuing task such as node classification. $\mathbf{x}_i$, $\mathbf{h}_i^{(l)}$, and $s_i$ denote the feature vector, the representation at layer $l$, and the sensitive attribute of node $v_i$, respectively. Furthermore, $\mathcal{S}_0$ and $\mathcal{S}_1$ denote the sets of nodes whose sensitive attributes are $0$ and $1$, respectively. Define the inter-edge set $\mathcal{E}_{\text{inter}} := \{(v_i, v_j) \in \mathcal{E} : s_i \neq s_j\}$, while the intra-edge set is defined as $\mathcal{E}_{\text{intra}} := \{(v_i, v_j) \in \mathcal{E} : s_i = s_j\}$. Similarly, the set of nodes having at least one inter-edge is denoted by $\mathcal{S}_{\text{inter}}$, while $\mathcal{S}_{\text{intra}}$ defines the set of nodes that have no inter-edges. Intersections of these sets are denoted analogously (e.g., $\mathcal{S}_0 \cap \mathcal{S}_{\text{intra}}$). Additionally, $\Delta_i^{\text{inter}}$ and $\Delta_i^{\text{intra}}$ denote the numbers of inter-edges and intra-edges adjacent to $v_i$, respectively. Finally, $|\cdot|$ denotes the entry-wise absolute value for scalar or vector inputs, while it denotes cardinality when the input is a set.
3.2 Analysis for Bias in GNN Representations
This subsection presents an analysis to identify the sources of bias in node representations generated by GNNs. The analysis is developed for the mean aggregation scheme, in which the aggregated representations at layer $l$ are generated such that $\mathbf{z}_i^{(l)} = \frac{1}{|\mathcal{N}_i|} \sum_{v_j \in \mathcal{N}_i} \mathbf{h}_j^{(l-1)}$ for every $v_i \in \mathcal{V}$, where $\mathbf{z}_i^{(l)}$ is the $i$th row of $\mathbf{Z}^{(l)}$ corresponding to node $v_i$, and $\mathcal{N}_i$ refers to the neighbor set of node $v_i$ (including $v_i$ itself, so $|\mathcal{N}_i|$ equals the degree of $v_i$ plus one). The recursive relation in a GNN layer in which left normalization is applied for feature smoothing is $\mathbf{H}^{(l)} = \sigma\big( (\mathbf{D} + \mathbf{I})^{-1} (\mathbf{A} + \mathbf{I}) \, \mathbf{H}^{(l-1)} \mathbf{W}^{(l)} \big)$, where $\mathbf{W}^{(l)}$ is the weight matrix in layer $l$ and $\mathbf{I}$ denotes an identity matrix. With these definitions, the relation between the aggregated information and the node representations becomes $\mathbf{H}^{(l)} = \sigma\big( \mathbf{Z}^{(l)} \mathbf{W}^{(l)} \big)$ at the $l$th GNN layer. As the provided analysis is applicable to every layer, the superscript $l$ is dropped in the following to keep the notation simple.
It has been demonstrated that features correlated with the sensitive attribute result in bias even when the sensitive attribute itself is not utilized in the learning process (Hajian and Domingo-Ferrer, 2013). This work provides an analysis of the correlation between the aggregated representations $\mathbf{Z}$ and the sensitive attributes $\mathbf{s}$, and aims to reduce it. Note that reducing this correlation can still allow the generation of discriminable representations for different class labels, provided the discriminability is carried by non-sensitive attributes. The (sample) correlation between the sensitive attributes and the aggregated representations can be written as the sum, over the embedding dimensions, of the correlations between $\mathbf{s}$ and the columns of $\mathbf{Z}$. In the analysis, the following assumptions are made:
A1: Node representations have sample means $\boldsymbol{\mu}_0$ and $\boldsymbol{\mu}_1$ across the groups $\mathcal{S}_0$ and $\mathcal{S}_1$, respectively, where the sample mean is taken over the corresponding group.
A2: Node representations have finite maximal deviations from these group-wise sample means.
Based on these assumptions, the following theorem shows that the total correlation can be bounded from above, which will serve as a guideline to design fairness-aware graph data augmentation schemes.
Theorem 1.
The total correlation between the sensitive attributes and the representations obtained after a mean aggregation over the graph $\mathcal{G}$ can be bounded above by
(1) 
where the constants in the bound depend on the group cardinalities $|\mathcal{S}_0|$ and $|\mathcal{S}_1|$, the inter- and intra-edge counts, and the sample means and maximal deviations introduced in assumptions A1 and A2.
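To make the quantities in the analysis concrete, the following sketch computes mean-aggregated representations (with self-loops and left normalization, as above) and a total-correlation measure of the kind bounded in Theorem 1, on a toy graph. The graph, features, and function names are illustrative, not taken from the paper:

```python
import numpy as np

def mean_aggregate(A, X):
    """Mean aggregation with self-loops: each node averages its own and
    its neighbors' features (left-normalized feature smoothing)."""
    A_hat = A + np.eye(A.shape[0])           # adjacency with self-loops
    deg = A_hat.sum(axis=1, keepdims=True)   # neighborhood sizes
    return A_hat @ X / deg

def total_abs_correlation(H, s):
    """Sum of |sample correlation| between each embedding dimension and
    the binary sensitive attribute -- a total-correlation measure of the
    kind the augmentations below aim to reduce."""
    s_c = s - s.mean()
    total = 0.0
    for k in range(H.shape[1]):
        h_c = H[:, k] - H[:, k].mean()
        denom = np.linalg.norm(h_c) * np.linalg.norm(s_c)
        if denom > 0:
            total += abs(h_c @ s_c) / denom
    return total

# Toy homophilic graph: two sensitive groups joined by a single inter-edge.
A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
X = np.array([[1.0, 0.0], [0.9, 0.1], [0.1, 0.9], [0.0, 1.0]])
s = np.array([0.0, 0.0, 1.0, 1.0])
H = mean_aggregate(A, X)
corr = total_abs_correlation(H, s)
```

On this homophilic toy graph, the aggregated features remain strongly group-correlated, illustrating how the topology propagates sensitive information.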
3.3 Fair Graph Data Augmentations
Data augmentation has been studied extensively to enable certain unsupervised learning schemes, such as contrastive and self-supervised learning, and as a general framework to improve the generalizability of trained models on unseen data. However, the design of graph data augmentations is still a developing research area due to the challenges introduced by the complex, non-Euclidean graph structure. Several augmentation schemes over the graph structure have been proposed to enhance the generalizability of GNNs (Rong et al., 2019; Zhao et al., 2020), while both topological (e.g., edge/node deletion) and attributive (e.g., feature shuffling/masking) corruption schemes have been developed in the context of contrastive learning (Veličković et al., 2019; You et al., 2020; Zhu et al., 2021, 2020). However, none of these works are fairness-aware. Hence, in this work, novel data augmentation schemes that are adaptive to the sensitive attributes as well as the input graph structure are introduced, with Theorem 1 as the guideline.
3.3.1 Feature Masking
In this subsection, an augmentation framework on nodal features is presented in order to mitigate possible intrinsic bias propagated by them. Note that the feature-related term in equation 1 is minimized when all nodal features are the same (i.e., all nodal features are masked/zeroed out). However, this would result in the loss of all information carried by the nodal features. Motivated by this, the proposed scheme aims to improve upon uniform feature masking in terms of reducing that term for a given masking budget (the total number of nodal features to be masked in expectation). Specifically, the random feature masking scheme used in (You et al., 2020; Zhu et al., 2020), where every feature has the same masking probability, is modified to assign higher masking probabilities to the features that vary more across the different sensitive groups. Thus, the masking probabilities are generated based on the difference between the group-wise sample means of the features; after normalizing this difference, the feature masking probability is designed as
(2)
where the scaling factor is a hyperparameter. The feature mask is then generated as a random binary vector whose entries are drawn independently from Bernoulli distributions with the corresponding probabilities. The augmented feature matrix is obtained via
(3)
which applies the mask to every node's feature vector through concatenation and the Hadamard product. Since the proposed feature masking scheme is probabilistic in nature, the resulting group-wise feature-mean difference is a random vector whose entries satisfy
(4)
where the relevant probability is that of the corresponding feature remaining unmasked in the graph view. The following proposition shows that the novel, adaptive feature masking approach can decrease the feature-related term compared to random feature masking; the proof can be found in Appendix A.2.
Proposition 1.
In expectation, the proposed adaptive feature masking scheme results in a lower value of the feature-related term than uniform feature masking, meaning
(5)
where the uniform scheme masks every feature with the same probability under the same expected masking budget.
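The adaptive masking idea can be sketched as follows. This is a hypothetical form of the probabilities: features whose sample means differ more across the sensitive groups are masked with higher probability (the exact normalization in equation 2 may differ; `p_base` and both function names are illustrative):

```python
import numpy as np

def adaptive_mask_probs(X, s, p_base=0.3):
    """Assign higher masking probability to features whose group-wise
    sample means differ more (a hypothetical instance of equation 2;
    `p_base` sets the overall masking budget)."""
    diff = np.abs(X[s == 0].mean(axis=0) - X[s == 1].mean(axis=0))
    if diff.sum() == 0:
        return np.full(X.shape[1], p_base)   # no group gap: uniform masking
    weights = diff / diff.mean()             # normalized group-mean gap
    return np.clip(p_base * weights, 0.0, 1.0)

def mask_features(X, probs, rng):
    """Draw one Bernoulli mask per feature and zero out masked columns."""
    keep = rng.random(X.shape[1]) >= probs   # True = feature survives
    return X * keep

# Toy features: column 0 is strongly group-correlated, column 1 is not.
X = np.array([[1.0, 0.2], [0.9, 0.3], [0.1, 0.25], [0.0, 0.35]])
s = np.array([0, 0, 1, 1])
probs = adaptive_mask_probs(X, s)
X_aug = mask_features(X, probs, np.random.default_rng(0))
```

The group-correlated feature receives a much larger masking probability, while the near-neutral feature is almost always kept.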
3.3.2 Node Sampling
In this subsection, an adaptive node sampling framework is introduced to decrease the group-imbalance term in equation 1 of Theorem 1, and hence to reduce the intrinsic bias that the graph topology can create. A small value of this term indicates a more balanced population distribution with respect to $\mathcal{S}_0$ and $\mathcal{S}_1$. Specifically, a subset of nodes is selected at every epoch, and training is carried out over the subgraph induced by the sampled nodes. This augmentation mainly aims at reducing bias by selecting a more balanced subset of the groups; meanwhile, it also helps reduce the computational and memory complexity of training.
The proposed node sampling is adaptive to the input graph; that is, it depends on the cardinalities of the sets $\mathcal{S}_0$, $\mathcal{S}_1$, $\mathcal{S}_{\text{inter}}$, and $\mathcal{S}_{\text{intra}}$. The developed scheme targets the case where the sensitive groups are unbalanced. In the algorithm design, it is assumed that the larger sensitive group also contains more nodes both with and without inter-edges (and similarly for the smaller group), which holds for all real graphs in our experiments, but our design principles can be readily extended to different settings as well.
Given the input graph $\mathcal{G}$, the augmented graph is obtained as the subgraph induced by a subset of the nodes. All nodes in $\mathcal{S}_{\text{inter}}$ are retained, while subsets of nodes are randomly sampled from the intra-only portions of $\mathcal{S}_0$ and $\mathcal{S}_1$ with sample sizes chosen to balance the groups; see also Algorithm 1 in Appendix A.3. The cardinalities of the node sets in the resulting graph augmentation are balanced across the sensitive groups (see Appendix A.3 for details).
Remark 1.
Note that the resulting graph yields a balanced group distribution as long as the sampling budgets of the two groups are matched, and this holds for any choice of the common budget. The presented scheme simply fixes the two budgets to a common value, which results in a balanced ratio across groups, but the performance can be improved if the budget is selected carefully for specific datasets.
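The sampling step can be sketched as below. This is a simplified reading of Algorithm 1 in Appendix A.3 (whose exact budgets may differ): every node with an inter-edge is kept, and the intra-only nodes of each sensitive group are subsampled down to a common size:

```python
import numpy as np

def balanced_node_sample(s, has_inter_edge, rng, ratio=1.0):
    """Keep every node that has at least one inter-edge; subsample the
    remaining (intra-only) nodes of each sensitive group to equal size,
    so the induced subgraph is balanced across groups."""
    keep = set(np.flatnonzero(has_inter_edge).tolist())
    only0 = np.flatnonzero((s == 0) & ~has_inter_edge)
    only1 = np.flatnonzero((s == 1) & ~has_inter_edge)
    n = int(ratio * min(len(only0), len(only1)))  # common sampling budget
    keep.update(rng.choice(only0, size=n, replace=False).tolist())
    keep.update(rng.choice(only1, size=n, replace=False).tolist())
    return np.array(sorted(keep))

# Toy example: group 0 has four intra-only nodes, group 1 has two.
s = np.array([0, 0, 0, 0, 0, 1, 1, 1])
has_inter = np.array([True, False, False, False, False, True, False, False])
sampled = balanced_node_sample(s, has_inter, np.random.default_rng(0))
```

The two nodes with inter-edges are always retained, and two intra-only nodes are drawn from each group, equalizing the sampled group sizes.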
3.3.3 Augmentation on Graph Connectivity
Driving the corresponding term in Theorem 1 to zero suggests a graph topology in which all nodes of the network have the same number of neighbors from each sensitive group, i.e., $\Delta_i^{\text{inter}} = \Delta_i^{\text{intra}}$ for all $v_i \in \mathcal{V}$, since the term vanishes in this scenario. Note that this finding parallels the main design idea of Fairwalk (Rahman et al., 2019), in which the transition probabilities are equalized across sensitive groups in random walks in order to reduce bias in random walk-based representations.
This finding suggests that an ideal augmented graph could be generated by deleting edges from, or adding edges to, $\mathcal{G}$ such that each node has exactly the same number of neighbors from each sensitive group. However, such a per-node sampling scheme is computationally complex and may not be ideal for large-scale graphs. To this end, we propose global probabilistic edge augmentation schemes such that, in the augmented graph, the expected numbers of inter- and intra-edges are equal:
(6)
Here the expectation is taken with respect to the randomness in the augmentation design. Our experiments show that this global approach can indeed help reduce the bias-related term (see Appendix A.8). Note that even though the strategy is presented for the case where $|\mathcal{E}_{\text{intra}}| > |\mathcal{E}_{\text{inter}}|$ (which holds for all datasets considered herein), the scheme can be easily generalized to the opposite case.
In social networks, users connect with higher probability to other users who are similar to themselves (Mislove et al., 2010); hence, the graph connectivity naturally inherits bias against potential minority groups. Motivated by reducing the corresponding term in the bound, this subsection introduces augmentation schemes over the edges to provide a balanced graph structure that can mitigate such bias.
Adaptive Edge Deletion.
To obtain a balanced graph structure, we first develop an adaptive edge deletion scheme in which edges are removed with certain deletion probabilities. Based on the graph structure and the sensitive attributes, the probabilities are assigned as
(7)
where the removal probability of the edge connecting nodes $v_i$ and $v_j$ is determined by a hyperparameter together with the set of intra-edges within the corresponding sensitive group. The hyperparameter is fixed in this work, but it can be selected by schemes such as grid search to improve performance. While this graph-level edge deletion scheme does not directly equalize per-node neighbor counts, it provides a balanced global structure for which equation 6 holds in the augmented graph; see Appendix A.4 for more details.
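Since the exact probabilities of equation 7 are not reproduced above, the sketch below implements one hypothetical global scheme with the same intent: each intra-edge gets a removal probability such that, in expectation, the surviving intra-edges of each sensitive group match the number of inter-edges (function name, `lam`, and the toy graph are illustrative):

```python
import numpy as np

def intra_edge_deletion_probs(edges, s, lam=1.0):
    """Assign removal probabilities to intra-edges so that, in
    expectation, each group keeps as many intra-edges as there are
    inter-edges (one way to make a relation like equation 6 hold)."""
    src, dst = edges[:, 0], edges[:, 1]
    inter = s[src] != s[dst]
    n_inter = int(inter.sum())
    probs = np.zeros(len(edges))
    for g in (0, 1):
        intra_g = (~inter) & (s[src] == g)    # intra-edges within group g
        n_g = int(intra_g.sum())
        if n_g > n_inter:
            # expected survivors: n_g * (1 - p) = n_inter when lam = 1
            probs[intra_g] = lam * (n_g - n_inter) / n_g
    return np.clip(probs, 0.0, 1.0)

# Toy graph: three intra-edges in group 0, one in group 1, one inter-edge.
s = np.array([0, 0, 0, 1, 1])
edges = np.array([[0, 1], [1, 2], [0, 2], [3, 4], [2, 3]])
probs = intra_edge_deletion_probs(edges, s)
```

Here only the over-represented group-0 intra-edges receive nonzero deletion probability; inter-edges are never deleted.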
Adaptive Edge Addition.
For graphs that are very sparse, edge deletion may not be an ideal graph augmentation, as it may lead to unstable results. In this case, an adaptive edge addition framework is developed to obtain a more balanced graph structure. Specifically, for graphs with more intra-edges than inter-edges, pairs of nodes are sampled uniformly, with replacement, from $\mathcal{S}_0$ and $\mathcal{S}_1$. A new edge is then created to connect each sampled pair of nodes, in order to obtain an augmented graph for which equation 6 holds. The experimental results in Section 4 also show that edge addition can be a better alternative to edge deletion for graphs that are sparse in inter-edges.
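The addition scheme can be sketched as follows, under the assumption that balancing means matching the inter-edge count to the intra-edge count; duplicates are not filtered in this simplified version, and the function name and toy graph are illustrative:

```python
import numpy as np

def add_inter_edges(edges, s, rng):
    """Sample node pairs across the two sensitive groups (uniformly,
    with replacement) and connect them, until the number of inter-edges
    matches the number of intra-edges."""
    src, dst = edges[:, 0], edges[:, 1]
    inter = s[src] != s[dst]
    n_new = int((~inter).sum() - inter.sum())
    if n_new <= 0:
        return edges                           # already balanced
    g0, g1 = np.flatnonzero(s == 0), np.flatnonzero(s == 1)
    # each new edge joins one node from each group, so it is inter by construction
    new = np.stack([rng.choice(g0, n_new), rng.choice(g1, n_new)], axis=1)
    return np.vstack([edges, new])

# Toy graph: four intra-edges and one inter-edge.
s = np.array([0, 0, 0, 1, 1])
edges = np.array([[0, 1], [1, 2], [0, 2], [3, 4], [2, 3]])
aug = add_inter_edges(edges, s, np.random.default_rng(1))
```

Three cross-group edges are added, after which inter- and intra-edge counts coincide.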
Remark 2. Overall, while a subset of these augmentation schemes can be employed based on the input graph properties (sparse/dense, large/small), all schemes can also be applied to the input graph sequentially. The framework in which node sampling, edge deletion, and edge addition are employed sequentially together with feature masking is called 'FairAug'. It is worth emphasizing that edge augmentation schemes should always follow node sampling, as node sampling changes the distribution of edges. Since the cardinalities of the different sets are calculated only once (in pre-processing), the proposed augmentations do not incur significant additional cost.
4 Experiments
In this section, experiments are carried out on real-world datasets for node classification and link prediction tasks. The performance of the proposed adaptive augmentations is compared with baseline schemes in terms of utility and fairness metrics.
4.1 Datasets and Settings
Datasets. Experiments are conducted on real-world social and citation networks: Pokec-z, Pokec-n (Dai and Wang, 2020), UCSD34, Berkeley13 (Red et al., 2011), Cora, Citeseer, and Pubmed. Pokec-z and Pokec-n, sampled from the larger social network Pokec (Takac and Zabovsky, 2012), are used in the node classification experiments; Pokec is a Facebook-like social network used in Slovakia. While the original network includes millions of users, Pokec-z and Pokec-n are generated by collecting the information of users from two major regions (Dai and Wang, 2020). The region information is treated as the sensitive attribute, while the working field of the users is the label to be predicted in classification. Both attributes are binarized; see also (Dai and Wang, 2020). UCSD34 and Berkeley13 are Facebook networks whose edges are created based on friendship information in social media. Each user (node) has nodal features including student/faculty status, gender, major, etc. Gender is utilized as the sensitive attribute in UCSD34 and Berkeley13. Cora, Citeseer, and Pubmed are citation networks that consider articles as nodes and descriptions of articles as their nodal attributes. In these datasets, the category of the articles is used as the sensitive attribute. Statistics for the datasets are presented in Tables 5 and 6 of Appendix A.5.
Evaluation Metrics. The performance of the node classification task is evaluated in terms of accuracy. Two quantitative group fairness metrics are used to assess the effectiveness of the fairness-aware strategies: statistical parity, $\Delta_{SP} = |P(\hat{y} = 1 \mid s = 0) - P(\hat{y} = 1 \mid s = 1)|$, and equal opportunity, $\Delta_{EO} = |P(\hat{y} = 1 \mid y = 1, s = 0) - P(\hat{y} = 1 \mid y = 1, s = 1)|$, where $y$ and $\hat{y}$ denote the ground-truth and predicted labels, respectively. Lower values of $\Delta_{SP}$ and $\Delta_{EO}$ imply better fairness performance (Dai and Wang, 2020). For the link prediction task, both accuracy and area under the curve (AUC) are employed as utility metrics. As fairness metrics, the definitions of statistical parity and equal opportunity are modified for link prediction by conditioning on whether a node pair shares the same sensitive attribute, where $\hat{e}$ is the decision on whether the edge exists.
Implementation details. The proposed augmentation scheme is tested on the social networks with node representations generated through an unsupervised contrastive learning framework. Node classification and link prediction are employed as the ensuing tasks that evaluate the generated node representations. In addition, we also provide link prediction results obtained via an end-to-end graph convolutional network (GCN) model on the citation networks. Further details on the implementation of the experiments are provided in Appendix A.7. Contrastive learning is utilized to demonstrate the effects of the proposed augmentation schemes, as augmentations are inherent to its original design (You et al., 2020). Specifically, GRACE (Zhu et al., 2020) is employed as our baseline framework. GRACE constructs two different graph views using random, non-adaptive augmentation schemes, which we replace with augmentations obtained via FairAug in our experiments. For more details on the employed contrastive learning framework, see Appendix A.6.
Baselines. As we examine our proposed augmentations in the context of contrastive learning, graph contrastive learning schemes are employed as the natural baselines: Deep Graph Infomax (DGI) (Veličković et al., 2019), Deep Graph Contrastive Representation Learning (GRACE) (Zhu et al., 2020), Graph Contrastive Learning with Adaptive Augmentations (GCA) (Zhu et al., 2021), and Fair and Stable Graph Representation Learning (NIFTY) (Agarwal et al., 2021). We note that the objective function of NIFTY (Agarwal et al., 2021) consists of both supervised and unsupervised components; given the scope of this paper, its results for the unsupervised setting are provided here. In addition, as another family of unsupervised approaches, random walk-based methods for node representation generation are also considered: DeepWalk (Perozzi et al., 2014), Node2Vec (Grover and Leskovec, 2016), and FairWalk (Rahman et al., 2019). Lastly, for end-to-end link prediction, we present results for the random edge dropout scheme (Rong et al., 2019).
4.2 Experimental Results
Table 1: Node classification results on Pokec-z and Pokec-n (accuracy, ΔSP, and ΔEO, all in %) for DeepWalk, Node2Vec, FairWalk, DGI, GRACE, GCA, NIFTY, FairAug, and FairAug wo ED.
The comparison between the baselines and our proposed framework FairAug is presented in Table 1. Note that 'FairAug wo ED' in Table 1 refers to FairAug with edge deletion (ED) removed from the chain, while node sampling (NS), edge addition (EA), and feature masking (FM) are still employed. First, the results in Table 1 show that FairAug provides a considerable reduction in the fairness metrics over GRACE, the strategy it is built upon, while providing similar accuracy. Second, we note that, like our framework, GCA is built upon GRACE through adaptive augmentations. However, the adaptive augmentations utilized in GCA are not fairness-aware, and the results in Table 1 demonstrate that the effect of such augmentations on the fairness metrics is unpredictable. Third, the results indicate that all contrastive learning methods provide better fairness than the random walk-based methods on the evaluated datasets, including FairWalk, which is a fairness-aware scheme. Since the sole information source of random walk-based methods is the graph structure, these results confirm that the graph topology indeed propagates bias, consistent with the motivation of our graph data augmentation design. Finally, the results in Table 1 demonstrate that the closest competitor to FairAug in terms of fairness measures is NIFTY (Agarwal et al., 2021). In terms of ΔSP, FairAug outperforms NIFTY on both datasets, whereas in terms of ΔEO, FairAug and NIFTY outperform each other on Pokec-z and Pokec-n, respectively. Table 1 also shows that the ED-removed version, 'FairAug wo ED', outperforms NIFTY on Pokec-n as well, suggesting that at least one of the proposed strategies outperforms all considered benchmark schemes in both ΔSP and ΔEO. This fairness improvement achieved by removing ED motivated us to conduct an ablation study on the building blocks of FairAug, which we present in the sequel.
Table 2 lists the results of an ablation study for FairAug, where the last four rows demonstrate the effects of removing FM, NS, ED, and EA, respectively. Pokec-z and Pokec-n are highly unbalanced datasets with considerable sparsity in inter-edges (see Table 5 of Appendix A.5). For such networks, the exact probabilities presented in equation 7 cannot be utilized for edge deletion, as doing so would delete a significantly large portion of the edges and damage the graph structure. We employ an upper limit on the deletion probabilities to avoid this (see Appendix A.7). With such a limit, however, the proposed ED framework alone cannot sufficiently balance the inter- and intra-edge counts. At this point, we note that node sampling also provides a way of decreasing the imbalance, by excluding nodes from the larger group when generating the subgraph. As the input graph is already sparse in inter-edges, employing both NS and ED causes a similar over-deletion of intra-edges and creates unstable results due to the significantly distorted graph structure. Overall, combining this phenomenon with the results of Table 2, node sampling is observed to be a better choice than edge manipulation for consistently balancing unbalanced, inter-edge-sparse input graphs such as the Pokec networks.
Table 2: Ablation study on Pokec-z and Pokec-n (accuracy, ΔSP, and ΔEO, all in %) for GRACE, FairAug, and FairAug without FM, NS, ED, and EA, respectively.
Table 3: Link prediction results on UCSD34 and Berkeley13 (AUC, ΔSP, and ΔEO, all in %) for GCA, GRACE, FairAug, and FairAug wo NS.
Table 3 presents the results for link prediction in the contrastive learning framework. The results demonstrate that 'FairAug wo NS' consistently improves the fairness metrics of the framework it is built upon (GRACE) while providing similar utility. In addition, comparing GCA and GRACE confirms our earlier assessment regarding the unpredictable effect of GCA's augmentations on the fairness metrics. Finally, comparing FairAug with and without NS, the results in Table 3 show that on UCSD34 and Berkeley13, node sampling can be ineffective in improving the fairness metrics, or can even worsen them. In UCSD34 and Berkeley13, the sensitive groups are of similar size (see Table 5 of Appendix A.5). Therefore, for NS on these datasets, half of the nodes are sampled randomly from each group (as the minimum node sampling budget is limited to half of the initial group size to avoid possible over-sampling; see Appendix A.7). With groups of similar size, such a sampling framework coincides with random sampling, which makes the effect of the proposed node sampling scheme on the fairness metrics unpredictable. Furthermore, while the analysis suggests reducing the cardinality of the larger group, the corresponding term appears in the upper bound and not in the exact correlation expression. The removal of nodes with inter-edges is in fact counterintuitive, as inter-edges generally help reduce bias in graphs where intra-edges dominate (which holds for all datasets considered herein). Therefore, the proposed node sampling becomes effective for graph structures with unbalanced sensitive groups. The effect of node sampling on the Facebook networks is further investigated in Appendix A.8, which corroborates these explanations of the random/detrimental effect of node sampling on these datasets.
As also noted in Section 1, even though FairAug is presented here through its application to graph contrastive learning, the proposed approach can be fully or partially used in conjunction with other learning frameworks as well. To exemplify such a use case, Table 4 lists the results for an end-to-end link prediction task with a two-layer GCN, where the proposed fair edge deletion scheme is employed as an edge dropout method. As the benchmark, random edge dropout (Rong et al., 2019) is considered, a scheme originally proposed to improve the generalizability of the model on unseen data. In the experiments, the number of deleted edges is the same for both strategies. The results demonstrate that our method, 'Fair ED', can indeed enhance the fairness of the learning framework it is employed in, while only slightly reducing the utility measures.
[Table 4: end-to-end link prediction with a two-layer GCN on Cora, Citeseer, and Pubmed, comparing random Edge Dropout against Fair ED on Accuracy (↑), AUC (↑), and two fairness metrics (↓); the numeric entries did not survive extraction.]
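The fair edge deletion idea used as a dropout method above can be sketched as follows. Preferentially removing intra-group edges (both endpoints sharing the sensitive attribute) follows the observation that inter-group edges tend to reduce bias in the aggregated representations; the function name, the edge/attribute encodings, and the fallback rule are illustrative assumptions, not the paper's exact deletion scheme.

```python
import random

def fair_edge_dropout(edges, sensitive, n_drop):
    """Drop `n_drop` edges, preferring intra-group edges (endpoints with
    the same sensitive attribute) over inter-group ones. `edges` is a
    list of (u, v) pairs and `sensitive` maps node -> group label."""
    intra = [e for e in edges if sensitive[e[0]] == sensitive[e[1]]]
    inter = [e for e in edges if sensitive[e[0]] != sensitive[e[1]]]
    random.shuffle(intra)
    drop = set(intra[:n_drop])       # remove intra-group edges first
    if len(drop) < n_drop:           # fall back to inter-group edges
        random.shuffle(inter)
        drop |= set(inter[: n_drop - len(drop)])
    return [e for e in edges if e not in drop]

# Two intra-group edges exist, so with n_drop=2 only they are removed
# and every surviving edge connects the two sensitive groups.
edges = [(0, 1), (1, 2), (2, 3), (0, 3)]
sensitive = {0: 0, 1: 0, 2: 1, 3: 1}
kept = fair_edge_dropout(edges, sensitive, n_drop=2)
assert set(kept) == {(1, 2), (0, 3)}
```

In the experiment above, such a fairness-aware deletion would replace the uniform edge sampling of random edge dropout while keeping the total number of deleted edges identical.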
5 Conclusions
In this study, the source of bias in aggregated representations in a GNN-based framework has been theoretically analyzed. Based on this analysis, several fairness-aware augmentation schemes have been introduced over both the graph structure and the nodal features. The proposed augmentations can be flexibly utilized together with several GNN-based learning methods. In addition, they can readily be employed in unsupervised node representation learning schemes such as graph contrastive learning. Experimental results on real-world graphs demonstrate that the proposed adaptive augmentations can improve fairness metrics while providing utilities comparable to the state of the art in node classification and link prediction.
References
 Towards a unified framework for fair and stable graph representation learning. arXiv preprint arXiv:2102.13186. Cited by: §1, §2, §4.1, §4.2.
 Distributed large-scale natural graph factorization. In Proc. International Conference on World Wide Web (WWW), pp. 37–48. Cited by: §2.
 Data decisions and theoretical implications when adversarially learning fair representations. arXiv preprint arXiv:1707.00075. Cited by: §1.
 Compositional fairness constraints for graph embeddings. In Proc. International Conference on Machine Learning (ICML), pp. 715–724. Cited by: §2.
 DeBayes: a Bayesian method for debiasing network embeddings. In Proc. International Conference on Machine Learning (ICML), pp. 1220–1229. Cited by: §2.
 The KL-divergence between a graph model and its fair I-projection as a fairness regularizer. arXiv preprint arXiv:2103.01846. Cited by: §2.
 GraRep: learning graph representations with global structural information. In Proc. ACM International Conference on Information and Knowledge Management (CIKM), pp. 891–900. Cited by: §2.

 Measuring and relieving the over-smoothing problem for graph neural networks from the topological view. In Proc. AAAI Conference on Artificial Intelligence, Vol. 34, pp. 3438–3445. Cited by: §2.
 HARP: hierarchical representation learning for networks. In Proc. AAAI Conference on Artificial Intelligence, Vol. 32. Cited by: §2.
 A simple framework for contrastive learning of visual representations. In Proc. International Conference on Machine Learning (ICML), pp. 1597–1607. Cited by: §A.6.
 Say no to the discrimination: learning fair graph neural networks with limited sensitive attribute information. arXiv preprint arXiv:2009.01454. Cited by: §1, §2, §3, §4.1, §4.1.
 Individual fairness for graph neural networks: a ranking based approach. In Proc. ACM Conference on Knowledge Discovery & Data Mining (SIGKDD), pp. 300–310. Cited by: §2.
 Fairness through awareness. In Proc. Innovations in Theoretical Computer Science (ITCS), pp. 214–226. Cited by: §1.
 Debiasing knowledge graph embeddings. In Proc. Conference on Empirical Methods in Natural Language Processing (EMNLP), pp. 7332–7345. Cited by: §2.
 Learning graph representations with embedding propagation. In Proc. International Conference on Neural Information Processing Systems (NeurIPS), pp. 5125–5136. Cited by: §1, §2.
 Understanding the difficulty of training deep feedforward neural networks. In Proc. International Conference on Artificial Intelligence and Statistics (AISTATS), pp. 249–256. Cited by: §A.7.
 Majorization and dynamics of continuous distributions. Entropy 21 (6), pp. 590. Cited by: §A.2.
 Node2vec: scalable feature learning for networks. In Proc. ACM International Conference on Knowledge Discovery and Data Mining (SIGKDD), Cited by: §2, §4.1.
 A methodology for direct and indirect discrimination prevention in data mining. IEEE Transactions on Knowledge and Data Engineering 25 (7), pp. 1445–1459. External Links: Document Cited by: §1, §3.2.
 Inductive representation learning on large graphs. In Proc. International Conference on Neural Information Processing Systems (NeurIPS), pp. 1025–1035. Cited by: §2.
 Contrastive multiview representation learning on graphs. In Proc. International Conference on Machine Learning (ICML), pp. 4116–4126. Cited by: §2.

 Learning deep representations by mutual information estimation and maximization. In Proc. International Conference on Learning Representations (ICLR), Cited by: §1, §2.
 Sources of segregation in social networks: a novel approach using Facebook. American Sociological Review 82 (3), pp. 625–656. Cited by: §1.
 Hierarchical graph convolutional networks for semi-supervised node classification. In Proc. International Joint Conference on Artificial Intelligence (IJCAI), Cited by: §2.

 Invariant information clustering for unsupervised image classification and segmentation. In Proc. IEEE International Conference on Computer Vision (ICCV), pp. 9865–9874. Cited by: §2.
 Addressing crime situation forecasting task with temporal graph convolutional neural network approach. In Proc. International Conference on Measuring Technology and Mechatronics Automation (ICMTMA), pp. 474–478. External Links: Document Cited by: §1.
 Data augmentation for visual question answering. In Proc. International Conference on Natural Language Generation, pp. 198–202. Cited by: §1, §2.
 Adam: a method for stochastic optimization. arXiv preprint arXiv:1412.6980. Cited by: §A.7.
 Semi-supervised classification with graph convolutional networks. In Proc. International Conference on Learning Representations (ICLR), Cited by: §A.7, §1, §2.
 All of the fairness for edge prediction with optimal transport. In Proc. International Conference on Artificial Intelligence and Statistics (AISTATS), pp. 1774–1782. Cited by: §2.
 On dyadic fairness: exploring and mitigating bias in graph connections. In Proc. International Conference on Learning Representations (ICLR), Cited by: §2.
 Subgroup generalization and fairness of graph neural networks. arXiv preprint arXiv:2106.15535. Cited by: §2.
 Inequalities: theory of majorization and its applications. Vol. 143, Springer. Cited by: §A.2.
 You are who you know: inferring user profiles in online social networks. In Proc. ACM International Conference on Web Search and Data Mining (WSDM), pp. 251–260. Cited by: §3.3.3.
 Spatio-temporal deep graph infomax. arXiv preprint arXiv:1904.06316. Cited by: §A.6, §1, §2.
 Asymmetric transitivity preserving graph embedding. In Proc. ACM International Conference on Knowledge Discovery and Data Mining (SIGKDD), pp. 1105–1114. Cited by: §2.
 Graph representation learning via graphical mutual information maximization. In Proc. Web Conference (WWW), pp. 259–270. Cited by: §2.
 DeepWalk: online learning of social representations. In Proc. ACM International Conference on Knowledge Discovery and Data Mining (SIGKDD), pp. 701–710. External Links: Link Cited by: §2, §4.1.
 Fairwalk: towards fair graph embedding. In Proc. International Joint Conference on Artificial Intelligence (IJCAI), pp. 3289–3295. Cited by: §2, §3.3.3, §4.1.
 Comparing community structure to characteristics in online collegiate social networks. SIAM review 53 (3), pp. 526–543. Cited by: §4.1.
 DropEdge: towards deep graph convolutional networks on node classification. arXiv preprint arXiv:1907.10903. Cited by: §2, §3.3, §4.1, §4.2.

 A survey on image data augmentation for deep learning. Journal of Big Data 6 (1), pp. 1–48. Cited by: §1, §2.
 Biased edge dropout for enhancing fairness in graph representation learning. arXiv preprint arXiv:2104.14210. Cited by: §2.
 Data analysis in public social networks. In International Scientific Conference and International Workshop. ’Present Day Trends of Innovations’, Vol. 1. Cited by: §4.1.
 LINE: large-scale information network embedding. In Proc. International Conference on World Wide Web (WWW), pp. 1067–1077. Cited by: §2.
 Graph attention networks. In Proc. International Conference on Learning Representations (ICLR), Cited by: §1, §2.
 Deep graph infomax. In Proc. International Conference on Learning Representations (ICLR), External Links: Link Cited by: §A.6, §A.7, §2, §3.3, §4.1.
 Simplifying graph convolutional networks. In Proc. International Conference on Machine Learning (ICML), pp. 6861–6871. Cited by: §2.

 Unsupervised feature learning via non-parametric instance discrimination. In Proc. IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 3733–3742. Cited by: §2.
 Unsupervised embedding learning via invariant and spreading instance feature. In Proc. IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 6210–6219. Cited by: §2.
 Graph contrastive learning with augmentations. Proc. International Conference on Neural Information Processing Systems (NeurIPS) 33. Cited by: §2, §3.3.1, §3.3, §4.1.
 Fair representation learning for heterogeneous information networks. arXiv preprint arXiv:2104.08769. Cited by: §2.
 Character-level convolutional networks for text classification. In Proc. International Conference on Neural Information Processing Systems (NeurIPS), pp. 649–657. Cited by: §1, §2.
 Data augmentation for graph neural networks. arXiv preprint arXiv:2006.06830. Cited by: §2, §3.3.
 Deep Graph Contrastive Representation Learning. In Proc. International Conference on Machine Learning (ICML) Workshop on Graph Representation Learning and Beyond, External Links: Link Cited by: §A.6, §A.6, §A.7, §A.7, §2, §3.3.1, §3.3, §4.1, §4.1.
 Graph contrastive learning with adaptive augmentation. In Proc. Web Conference (WWW), Cited by: §A.6, §A.6, §A.7, §A.7, §2, §3.3, §4.1.
Appendix A Appendix
A.1 Proof of Theorem 1
Let . Hence, the elements of the centered sensitive attribute vector can be written as
(8)
where and is the element of the matrix at row , column . Using these values, equation 8 becomes
(9)
A similar analysis for follows as
(10)
Define , and denote by the vector whose elements are . Then, equals
(11)
where denotes the Hadamard product. Therefore, follows as
(12)
We first consider the terms and individually:
(13)
The expression for the term can be derived similarly:
(14)
Defining , the following can be written using equation 13 and equation 14