Fair Node Representation Learning via Adaptive Data Augmentation

Node representation learning has demonstrated its efficacy for various applications on graphs, which leads to increasing attention towards the area. However, fairness is a largely under-explored territory within the field, which may lead to biased results towards underrepresented groups in ensuing tasks. To this end, this work theoretically explains the sources of bias in node representations obtained via Graph Neural Networks (GNNs). Our analysis reveals that both nodal features and graph structure lead to bias in the obtained representations. Building upon the analysis, fairness-aware data augmentation frameworks on nodal features and graph structure are developed to reduce the intrinsic bias. Our analysis and proposed schemes can be readily employed to enhance the fairness of various GNN-based learning mechanisms. Extensive experiments on node classification and link prediction are carried out over real networks in the context of graph contrastive learning. Comparison with multiple benchmarks demonstrates that the proposed augmentation strategies can improve fairness in terms of statistical parity and equal opportunity, while providing comparable utility to state-of-the-art contrastive methods.


1 Introduction

Graphs are widely used in modeling and analyzing complex systems such as biological networks or financial markets, which has led to rising attention towards various machine learning (ML) tasks over graphs. Specifically, node representation learning is a field with growing popularity. Node representations are mappings from nodes to vector embeddings that contain both structural and attributive information. Their applicability to ensuing tasks has enabled various applications such as traffic forecasting (Opolka et al., 2019) and crime forecasting (Jin et al., 2020). Graph neural networks (GNNs) have been prevalently used for representation learning, where node embeddings are created by repeatedly aggregating information from neighbors, for both supervised and unsupervised learning tasks (Kipf and Welling, 2017; Veličković et al., 2018; García-Durán and Niepert, 2017).

It has been shown that ML models propagate pre-existing bias in training data, which may lead to discriminative results in ensuing applications (Dwork et al., 2012; Beutel et al., 2017). Particular to ML over graphs, while GNN-based methods achieve state-of-the-art results for graph representation learning, they also amplify already existing biases in training data (Dai and Wang, 2020). For example, nodes in social networks tend to connect to other nodes with similar attributes, leading to denser connectivity between nodes with the same sensitive attributes (e.g., gender) (Hofstra et al., 2017). Thus, by aggregating information from the neighbors, the representations obtained by GNNs may be highly correlated with the sensitive attributes. This causes discrimination in ensuing tasks even when the sensitive attributes are not directly used in training (Hajian and Domingo-Ferrer, 2013).

Data augmentation has been widely utilized to improve the generalizability of trained models, as well as to enable unsupervised methods such as contrastive or self-supervised learning. Augmentation schemes have been extensively studied in vision (Shorten and Khoshgoftaar, 2019; Hjelm et al., 2018) and natural language processing (Zhang et al., 2015; Kafle et al., 2017). However, there is comparatively limited work in the graph domain due to the complex, non-Euclidean structure of graphs. To the best of our knowledge, (Agarwal et al., 2021) is the only study that designs fairness-aware graph data augmentation in the contrastive learning framework to reduce bias.

This study theoretically investigates the sources of bias in GNN-based learning and, in turn, improves fairness in node representations by employing fairness-aware graph data augmentation schemes. The proposed schemes corrupt both the input graph topology and the nodal features adaptively, in order to reduce the corresponding terms in the analysis that lead to bias. Although the proposed schemes are presented through their application to contrastive learning, the introduced augmentation strategies can be flexibly utilized in several GNN-based learning approaches together with other fairness-enhancement methods. Our contributions in this paper can be summarized as follows:
c1) We theoretically analyze the sources of bias that are propagated to node representations in a GNN-based learning framework.
c2) Based on the analysis, we develop novel fairness-aware graph data augmentations that can reduce potential bias in learning node representations. Our approach is adaptive to both the input graph and the sensitive attributes, and to the best of our knowledge, is the first study that tackles fairness enhancement through adaptive graph augmentation design.
c3) The proposed strategies incur low additional computational complexity compared to non-adaptive counterparts, and can operate in conjunction with various GNN-based learning frameworks, including other fairness enhancement methods.
c4) Theoretical analysis is provided to corroborate the effectiveness of the proposed feature masking and node sampling augmentation schemes.
c5) The performance of the proposed graph data augmentation schemes is evaluated on real networks for both node classification and link prediction tasks. It is shown that, compared to state-of-the-art graph contrastive learning methods, the novel augmentation schemes improve fairness metrics while providing comparable utility measures.

2 Related Work

Representation learning on graphs. Conventional graph representation learning approaches can be summarized under two categories: factorization-based and random walk-based approaches. Factorization-based methods aim to minimize the difference between the inner product of node representations and a deterministic similarity metric between them (Ahmed et al., 2013; Cao et al., 2015; Ou et al., 2016). Random walk-based approaches, on the other hand, employ stochastic measures of similarity between nodes (Perozzi et al., 2014; Grover and Leskovec, 2016; Tang et al., 2015; Chen et al., 2018). GNNs have gained popularity in representation learning, for both supervised (Kipf and Welling, 2017; Veličković et al., 2018; Hu et al., 2019; Wu et al., 2019) and unsupervised tasks, e.g., (García-Durán and Niepert, 2017; Hamilton et al., 2017). Specifically, the recent success of contrastive learning on visual representation learning (Wu et al., 2018; Ye et al., 2019; Ji et al., 2019) has paved the way for its adoption in unsupervised graph representation learning.

Graph data augmentation. Augmentation strategies have been extensively investigated in the vision (Shorten and Khoshgoftaar, 2019; Hjelm et al., 2018) and natural language processing (Zhang et al., 2015; Kafle et al., 2017) domains. However, the area is comparatively under-explored in the graph domain due to the complex, non-Euclidean topology of graphs. Graph augmentation based on graph structure modification has been developed to improve the utility of ensuing tasks (Rong et al., 2019; Zhao et al., 2020; Chen et al., 2020). Meanwhile, graph data augmentation has been used to generate graph views for unsupervised graph contrastive learning, see, e.g., (Veličković et al., 2019; Opolka et al., 2019; Zhu et al., 2020, 2021), which achieves state-of-the-art results in various learning tasks over graphs such as node classification, regression, and link prediction (Opolka et al., 2019; Veličković et al., 2019; You et al., 2020; Zhu et al., 2020; Peng et al., 2020; Hassani and Khasahmadi, 2020). Among these, (Zhu et al., 2020) is the first study that aims to maximize the agreement of node-level embeddings across two corrupted graph views. Building upon (Zhu et al., 2020), (Zhu et al., 2021) develops adaptive augmentation schemes with respect to various node centrality measures and achieves better results. However, none of these studies are fairness-aware.

Fairness-aware learning on graphs. A pioneering study tackling the fairness problem in random walk-based graph representation learning is developed in (Rahman et al., 2019). In addition, adversarial regularization is employed to account for the fairness of node representations (Dai and Wang, 2020; Bose and Hamilton, 2019; Fisher et al., 2020), where (Dai and Wang, 2020) is presented specifically for node classification and (Fisher et al., 2020) works on knowledge graphs. (Buyl and De Bie, 2020) also aims to create fair node representations by utilizing a Bayesian approach where sensitive information is modeled in the prior distribution. Contrary to these aforementioned works, our framework is built on a theoretical analysis developed within this paper. Similar to the works mentioned above, the proposed methods can be utilized within the learning process to mitigate bias by modifying the learned model (i.e., as an in-processing fairness strategy). In addition, the proposed schemes can also be regarded as "pre-processing" tools, implying their compatibility with a wide array of GNN-based learning schemes in a versatile manner. Furthermore, (Ma et al., 2021) carries out a PAC-Bayesian analysis and connects the concept of subgroup generalization to accuracy disparity, and (Zeng et al., 2021) introduces several methods, including GNN-based ones, to decrease the bias in representations of heterogeneous information networks. While (Li et al., 2021; Laclau et al., 2021) modify the adjacency matrix to improve different fairness measures specifically for link prediction, (Buyl and De Bie, 2021) designs a regularizer for the same purpose. With a specific interest in individual fairness over graphs, (Dong et al., 2021) employs a ranking-based strategy. A biased edge dropout scheme is proposed in (Spinelli et al., 2021) to improve fairness; however, the scheme therein is not adaptive to the graph structure (the parameters of the framework are independent of the input graph topology). Fairness-aware graph contrastive learning is first studied in (Agarwal et al., 2021), where a layer-wise weight normalization scheme along with graph augmentations is introduced. However, the fairness-aware augmentation utilized therein is designed primarily for counterfactual fairness.

3 Fairness in GNN-based Representation Learning

GNN-based approaches are the state-of-the-art for node representation learning. However, it has been demonstrated that the utilization of graph structure in the learning process not just propagates but also amplifies a possible bias towards certain sensitive groups (Dai and Wang, 2020). To this end, this section investigates the sources of bias in the generated representations via GNN-based learning. It carries out an analysis revealing that both nodal features and graph structure lead to bias, for which several graph data augmentation frameworks are introduced.

3.1 Preliminaries

This study aims to learn fairness-aware nodal representations for a given graph G = (V, E), where V denotes the node set and E represents the edge set. X and A are used to denote the feature and adjacency matrices, respectively, with the (i, j)-th entry A_ij = 1 if and only if (v_i, v_j) ∈ E. The degree matrix D is defined to be a diagonal matrix whose i-th diagonal entry denotes the degree of v_i. For the fairness examination, the sensitive attributes of the nodes are represented by s, where the existence of a single, binary sensitive attribute is considered. In this work, unsupervised methods are chosen as the enabling schemes for representation generation: given the inputs X and A, the main purpose is to learn a mapping that generates K-dimensional (generally with K much smaller than the input feature dimension) unbiased nodal representations through an L-layer GNN, which can be used in an ensuing task such as node classification. x_i, h_i^l, and s_i denote the feature vector, the representation at layer l, and the sensitive attribute of node v_i, respectively. Furthermore, S_0 and S_1 denote the sets of nodes whose sensitive attributes are 0 and 1, respectively. Edges whose endpoints have different sensitive attributes are called inter-edges, while edges whose endpoints share the same sensitive attribute are called intra-edges. Similarly, the set of nodes having at least one inter-edge is distinguished from the set of nodes that have no inter-edges, and the intersections of these sets with S_0 and S_1 are also considered. Additionally, the numbers of inter-edges and intra-edges adjacent to each node are tracked. Finally, |·| denotes the entry-wise absolute value for scalar or vector inputs, while it is used for the cardinality when the input is a set.

3.2 Analysis for Bias in GNN Representations

This subsection presents an analysis to identify the sources of bias in node representations generated by GNNs. The analysis is developed for the mean aggregation scheme, in which the aggregated representation of node v_i at layer l is generated as

h̃_i^l = (1 / d_i) Σ_{v_j ∈ N(i)} h_j^{l-1},

where d_i denotes the degree of node v_i and N(i) refers to the neighbor set of node v_i (including itself). The recursive relation in a GNN layer in which left normalization is applied for feature smoothing is H^l = D^{-1} (A + I) H^{l-1} W^l, where W^l is the weight matrix in layer l, and I denotes an identity matrix. With these definitions, the relation between the aggregated information H̃^l and the node representations becomes equivalent to H^l = H̃^l W^l at the l-th GNN layer. As the provided analysis is applicable to every layer, the layer superscript l is dropped in the following to keep the notation simple.
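As a concrete illustration of the mean aggregation described above, the following sketch computes one left-normalized aggregation step with NumPy. The function name and the tiny example graph are ours, added for illustration, and not part of the paper's implementation:

```python
import numpy as np

def mean_aggregate(X, A):
    """One mean-aggregation step: each node averages the features of its
    neighbors, including itself (a self-loop is added before normalizing)."""
    A_hat = A + np.eye(A.shape[0])          # add self-loops
    deg = A_hat.sum(axis=1, keepdims=True)  # node degrees (with self-loop)
    return (A_hat @ X) / deg                # left-normalized aggregation

# Tiny 3-node path graph: 0 - 1 - 2
A = np.array([[0, 1, 0],
              [1, 0, 1],
              [0, 1, 0]], dtype=float)
X = np.array([[1.0], [0.0], [1.0]])
H = mean_aggregate(X, A)
# Node 1 averages the features of nodes 0, 1, and 2: (1 + 0 + 1) / 3
```

Stacking L such steps, each followed by a learnable linear map W^l, reproduces the recursion analyzed above.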

It has been demonstrated that features that are correlated with the sensitive attribute result in bias even when the sensitive attribute is not utilized in the learning process (Hajian and Domingo-Ferrer, 2013). This work provides an analysis of the correlation of the aggregated representations H̃ with the sensitive attributes s, and aims to reduce it. Note that the reduction of this correlation can still allow the generation of discriminable representations for different class labels, if the discriminability is provided by non-sensitive attributes. The (sample) correlation between the sensitive attributes and the aggregated representations is computed column-wise, where the k-th term involves the k-th column of H̃. In the analysis, the following assumptions are made:
A1: Node representations have sample means μ_0 and μ_1 across the groups S_0 and S_1, respectively.
A2: Node representations have finite maximal deviations from their group means, denoted δ_0^max and δ_1^max for the groups S_0 and S_1, respectively.
Based on these assumptions, the following theorem shows that the total correlation can be bounded from above, which will serve as a guideline to design a fairness-aware graph data augmentation scheme.

Theorem 1.

The total correlation between the sensitive attributes and the representations obtained after a mean aggregation over the graph can be bounded above by

(1)

where the terms of the bound depend on the nodal features, the group sizes, the numbers of inter- and intra-edges, and the maximal deviations in A2.

The proof is given in Appendix A.1. The upper bound in equation 1 can be lowered by i) utilizing feature masking, which affects the feature-dependent term at the first layer; ii) node sampling, which can change the group-balance term; and iii) edge augmentations, which can reduce the connectivity-dependent term.

3.3 Fair Graph Data Augmentations

Data augmentation has been studied extensively in order to enable certain unsupervised learning schemes, such as contrastive learning and self-supervised learning, or as a general framework to improve the generalizability of trained models over unseen data. However, the design of graph data augmentations is still a developing research area due to the challenges introduced by the complex, non-Euclidean graph structure. Several augmentation schemes over the graph structure have been proposed in order to enhance the generalizability of GNNs (Rong et al., 2019; Zhao et al., 2020), while both topological (e.g., edge/node deletion) and attributive (e.g., feature shuffling/masking) corruption schemes have been developed in the context of contrastive learning (Veličković et al., 2019; You et al., 2020; Zhu et al., 2021, 2020). However, none of these works are fairness-aware. Hence, in this work, novel data augmentation schemes that are adaptive to the sensitive attributes, as well as to the input graph structure, are introduced with Theorem 1 as a guideline.

3.3.1 Feature Masking

In this subsection, an augmentation framework on nodal features is presented in order to mitigate possible intrinsic bias propagated by them. Note that the feature-dependent term in equation 1 is minimized when all nodal features are the same (i.e., all nodal features are masked/zeroed out). However, this would result in the loss of all information in the nodal features. Motivated by this, the proposed scheme aims to improve upon uniform feature masking in terms of reducing this term for a given masking budget (a total number of nodal features to be masked, in expectation). Specifically, the random feature masking scheme used in (You et al., 2020; Zhu et al., 2020), where each feature has the same masking probability, is modified to assign higher masking probabilities to the features that vary more across different sensitive groups. Thus, masking probabilities are generated based on the difference between the group-wise sample means of each feature. Letting δ̂ denote the normalized version of this difference, the feature masking probability can then be designed as

(2)

where α is a hyperparameter. The feature mask m is then generated as a random binary vector, with the k-th entry of m drawn independently from a Bernoulli distribution parameterized by the k-th masking probability. The augmented feature matrix is obtained via

(3)

where the mask is applied to every row of X through the Hadamard product. Since the proposed feature masking scheme is probabilistic in nature, the resulting group-mean difference is a random vector whose k-th entry satisfies

(4)

where the relevant probability is that of the k-th feature not being masked in the graph view. The following proposition shows that the novel, adaptive feature masking approach can decrease the feature-dependent term compared to random feature masking; the proof can be found in Appendix A.2.

Proposition 1.

In expectation, the proposed adaptive feature masking scheme results in a lower value of the group-mean difference compared to uniform feature masking, meaning

(5)

where the right-hand side corresponds to uniform masking with the same expected masking budget.
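The adaptive masking idea can be sketched in NumPy as follows. This is a minimal illustration under the assumption that masking probabilities are taken proportional to the normalized group-mean difference, scaled by a budget hyperparameter alpha; the exact form of equation 2 may differ, and all function names here are ours:

```python
import numpy as np

rng = np.random.default_rng(0)

def adaptive_mask_probs(X, s, alpha=0.5):
    """Assign higher masking probabilities to features whose sample means
    differ more across the two sensitive groups (a sketch of the adaptive
    rule; the paper's equation 2 may be parameterized differently)."""
    delta = np.abs(X[s == 0].mean(axis=0) - X[s == 1].mean(axis=0))
    delta_hat = delta / delta.sum()          # normalize to sum to one
    # scale so the expected number of masked features is roughly alpha * F
    p = alpha * X.shape[1] * delta_hat
    return np.clip(p, 0.0, 1.0)

def mask_features(X, p):
    """Draw one Bernoulli mask per feature and zero out the masked columns."""
    m = rng.random(X.shape[1]) >= p          # True = keep, False = mask
    return X * m

s = np.array([0, 0, 1, 1])
X = np.array([[1.0, 5.0],
              [1.0, 6.0],
              [1.0, 0.0],
              [1.0, 1.0]])
p = adaptive_mask_probs(X, s, alpha=0.5)
X_aug = mask_features(X, p)
# Feature 0 has identical group means -> masking probability 0 (always kept);
# feature 1 differs strongly across groups -> probability 1 here (masked).
```

Features that carry no group information are left untouched, so utility-relevant signal is preserved while the group-mean gap shrinks.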

3.3.2 Node Sampling

In this subsection, an adaptive node sampling framework is introduced to decrease the group-balance term in equation 1 of Theorem 1, and hence to reduce the intrinsic bias that the graph topology can create. A small value of this term suggests a more balanced population distribution across the two sensitive groups. Specifically, a subset of nodes is selected at every epoch and the training is carried out over the subgraph induced by the sampled nodes. This augmentation mainly aims at reducing bias by selecting a subset with more balanced groups; meanwhile, it also helps reduce the computational and memory complexity of training.

The proposed node sampling is adaptive to the input graph; that is, it depends on the cardinalities of the sensitive groups and of the sets of nodes with and without inter-edges. In the algorithm design, it is assumed that within each group the nodes without inter-edges outnumber those with inter-edges, which holds for all real graphs in our experiments, but our design principles can be readily extended to different settings as well.

Given the input graph, the augmented graph can be obtained as the subgraph induced by a subset of the nodes. All nodes having at least one inter-edge are retained, while subsets of the remaining nodes are randomly sampled from each group, with sample sizes chosen so that the resulting group cardinalities are balanced. See also Algorithm 1 in Appendix A.3. The cardinalities of the node sets in the resulting graph augmentation are balanced across groups (see Appendix A.3 for details).

Remark 1.

Note that the resulting graph yields a reduced group-balance term as long as the retained group sizes are equalized, which can be achieved by different choices of sampling budget. The scheme presented here simply equalizes the retained group sizes, which results in a balanced ratio across groups, but the performance can be improved if the sampling budget is selected carefully for specific datasets.
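A minimal sketch of the balanced node-sampling idea follows, assuming that nodes with at least one inter-edge are always retained and that equally many of the remaining nodes are drawn from each group; the paper's exact budget rule (Algorithm 1 in Appendix A.3) may differ, and the function name is ours:

```python
import numpy as np

rng = np.random.default_rng(1)

def sample_nodes(edges, s):
    """Retain every node that has at least one inter-edge, then sample
    equally many of the remaining nodes from each sensitive group
    (a sketch of the balanced node-sampling idea)."""
    inter_nodes = {i for i, j in edges if s[i] != s[j]} | \
                  {j for i, j in edges if s[i] != s[j]}
    keep = set(inter_nodes)
    rest0 = [v for v in range(len(s)) if v not in keep and s[v] == 0]
    rest1 = [v for v in range(len(s)) if v not in keep and s[v] == 1]
    n = min(len(rest0), len(rest1))          # equal sample size per group
    keep |= set(rng.choice(rest0, size=n, replace=False).tolist())
    keep |= set(rng.choice(rest1, size=n, replace=False).tolist())
    return sorted(keep)

s = np.array([0, 0, 0, 0, 1, 1])
edges = [(0, 1), (1, 2), (3, 4), (4, 5)]     # (3, 4) is the only inter-edge
nodes = sample_nodes(edges, s)
# Nodes 3 and 4 (the inter-edge endpoints) are always retained.
```

Training then proceeds on the subgraph induced by the returned node subset, which also lowers the per-epoch memory footprint.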

3.3.3 Augmentation on Graph Connectivity

Driving the connectivity-dependent term in Theorem 1 to zero suggests a graph topology where all nodes in the network have the same number of neighbors from each sensitive group, since the term vanishes in this scenario. Note that this finding parallels the main design idea of FairWalk (Rahman et al., 2019), in which the transition probabilities are equalized across sensitive groups in random walks in order to reduce bias in random walk-based representations.

This finding suggests that an ideal augmented graph could be generated by deleting edges from, or adding edges to, the input graph such that each node has exactly the same number of neighbors from each sensitive group. However, such a per-node sampling scheme is computationally complex and may not be ideal for large-scale graphs. To this end, we propose global probabilistic edge augmentation schemes such that, in the augmented graph, the expected numbers of inter-edges and intra-edges are equal:

(6)

Here the expectation is taken with respect to the randomness in the augmentation design. It is shown in our experiments that the global approach can indeed help reduce the connectivity-dependent term (see Appendix A.8). Note that even though the strategy is presented for the case where intra-edges outnumber inter-edges (which holds for all datasets considered herein), the scheme can be easily generalized to the opposite case.

In social networks, users connect with higher probability to other users that are similar to themselves (Mislove et al., 2010); hence, the graph connectivity naturally inherits bias against potential minority groups. Motivated by the reduction of the connectivity-dependent term, the present subsection introduces augmentation schemes over edges to provide a balanced graph structure that can mitigate such bias.

Adaptive Edge Deletion.

To obtain a balanced graph structure, we first develop an adaptive edge deletion scheme where edges are removed with certain deletion probabilities. Based on the graph structure and the sensitive attributes, the probabilities are assigned as

(7)

where the removal probability of the edge connecting nodes v_i and v_j depends on whether it is an intra-edge, and on a hyper-parameter controlling the overall deletion budget. The hyper-parameter is fixed in this work, but it can be selected by schemes such as grid search to improve performance. While this graph-level edge deletion scheme does not directly minimize the connectivity-dependent term, it provides a balanced global structure in which the expected numbers of inter- and intra-edges are equal, hence equation 6 holds in the augmented graph; see Appendix A.4 for more details.
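The global balancing idea behind edge deletion can be sketched as follows, assuming a single graph-level removal probability for intra-edges chosen so that the inter- and intra-edge counts match in expectation; the exact parameterization of equation 7 may differ, and the function name is ours:

```python
import numpy as np

rng = np.random.default_rng(2)

def delete_intra_edges(edges, s):
    """Remove intra-edges with a probability chosen so that, in expectation,
    the augmented graph has equally many inter- and intra-edges
    (a sketch; the paper's equation 7 may be parameterized differently)."""
    inter = [e for e in edges if s[e[0]] != s[e[1]]]
    intra = [e for e in edges if s[e[0]] == s[e[1]]]
    # expected surviving intra-edges: len(intra) * (1 - p) = len(inter)
    p = max(0.0, 1.0 - len(inter) / len(intra))
    kept_intra = [e for e in intra if rng.random() >= p]
    return inter + kept_intra, p

s = np.array([0, 0, 0, 1])
edges = [(0, 1), (0, 2), (1, 2), (2, 3)]     # one inter-edge, three intra-edges
augmented, p = delete_intra_edges(edges, s)
# p = 1 - 1/3, so about one intra-edge survives in expectation
```

All inter-edges are kept, so the deletion only thins out the over-represented intra-connectivity.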

Adaptive Edge Addition.

For graphs that are very sparse, edge deletion may not be an ideal graph augmentation, as it may lead to unstable results. In this case, an adaptive edge addition framework is developed to obtain a more balanced graph structure. Specifically, for graphs where intra-edges outnumber inter-edges, pairs of nodes are sampled uniformly from the two sensitive groups with replacement. Then, a new edge is created to connect each sampled pair of nodes, in order to obtain an augmented graph for which equation 6 holds. Experimental results in Section 4 also show that edge addition may become a better alternative to edge deletion for graphs that are sparse in inter-edges.
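A sketch of the edge addition counterpart follows, under the assumption that cross-group node pairs are drawn uniformly with replacement until the inter- and intra-edge counts match; duplicate checking and the exact sampling budget are simplifications of ours:

```python
import numpy as np

rng = np.random.default_rng(3)

def add_inter_edges(edges, s):
    """Add edges between uniformly sampled cross-group node pairs until the
    graph has as many inter-edges as intra-edges (a sketch of the edge
    addition scheme; duplicates are not filtered here)."""
    inter = sum(1 for i, j in edges if s[i] != s[j])
    intra = len(edges) - inter
    group0 = np.where(s == 0)[0]
    group1 = np.where(s == 1)[0]
    new_edges = [(int(rng.choice(group0)), int(rng.choice(group1)))
                 for _ in range(max(0, intra - inter))]
    return edges + new_edges

s = np.array([0, 0, 1, 1])
edges = [(0, 1), (2, 3)]                     # two intra-edges, no inter-edges
augmented = add_inter_edges(edges, s)
# Two cross-group edges are added so the inter- and intra-edge counts match.
```

Because only edges are added, the original connectivity is preserved, which helps avoid the instability that deletion can cause on sparse graphs.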

Remark 2. Overall, while a subset of these augmentation schemes can be employed based on the input graph properties (sparse/dense, large/small), all schemes can also be employed on the input graph sequentially. The framework in which node sampling, edge deletion, and edge addition are employed sequentially together with feature masking is called "FairAug". It is worth emphasizing that edge augmentation schemes should always follow node sampling, as performing node sampling changes the distribution of edges. Since the cardinalities of the different sets are calculated only once (in pre-processing), the proposed augmentations do not incur significant additional cost.

4 Experiments

In this section, experiments are carried out on real-world datasets for node classification and link prediction tasks. Performances of our proposed adaptive augmentations are compared with baseline schemes in terms of utility and fairness metrics.

4.1 Datasets and Settings

Datasets. Experiments are conducted on real-world social and citation networks: Pokec-z, Pokec-n (Dai and Wang, 2020), UCSD34, Berkeley13 (Red et al., 2011), Cora, Citeseer, and Pubmed. Pokec-z and Pokec-n, sampled from the larger social network Pokec (Takac and Zabovsky, 2012), are used in the node classification experiments; Pokec is a Facebook-like social network used in Slovakia. While the original network includes millions of users, Pokec-z and Pokec-n are generated by collecting the information of users from two major regions (Dai and Wang, 2020). The region information is treated as the sensitive attribute, while the working field of the users is the label to be predicted in classification. Both attributes are binarized; see also (Dai and Wang, 2020). UCSD34 and Berkeley13 are Facebook networks where edges are created based on friendship information in social media. Each user (node) has nodal features including student/faculty status, gender, major, etc. Gender is utilized as the sensitive attribute in UCSD34 and Berkeley13. Cora, Citeseer, and Pubmed are citation networks that consider articles as nodes and descriptions of articles as their nodal attributes. In these datasets, the category of the articles is used as the sensitive attribute. Statistics for the datasets are presented in Tables 5 and 6 of Appendix A.5.

Evaluation Metrics. Performance on the node classification task is evaluated in terms of accuracy. Two quantitative group fairness metrics are used to assess the effectiveness of fairness-aware strategies: statistical parity, Δ_SP = |P(ŷ = 1 | s = 0) − P(ŷ = 1 | s = 1)|, and equal opportunity, Δ_EO = |P(ŷ = 1 | y = 1, s = 0) − P(ŷ = 1 | y = 1, s = 1)|, where y and ŷ denote the ground-truth and predicted labels, respectively. Lower values for Δ_SP and Δ_EO imply better fairness performance (Dai and Wang, 2020). For the link prediction task, both accuracy and area under the curve (AUC) are employed as utility metrics. As the fairness metrics, the definitions of statistical parity and equal opportunity are modified for link prediction such that the probabilities are conditioned on whether the two endpoints of a candidate edge share the same sensitive attribute, where ŷ_ij is the decision for whether the edge between nodes v_i and v_j exists.
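For concreteness, the two group-fairness metrics can be computed as follows. This sketch assumes the standard definitions of statistical parity and equal opportunity as stated above; the function names are ours:

```python
import numpy as np

def statistical_parity(y_hat, s):
    """Delta_SP = |P(y_hat = 1 | s = 0) - P(y_hat = 1 | s = 1)|."""
    return abs(y_hat[s == 0].mean() - y_hat[s == 1].mean())

def equal_opportunity(y_hat, y, s):
    """Delta_EO = |P(y_hat = 1 | y = 1, s = 0) - P(y_hat = 1 | y = 1, s = 1)|."""
    pos = y == 1
    return abs(y_hat[pos & (s == 0)].mean() - y_hat[pos & (s == 1)].mean())

y_hat = np.array([1, 1, 0, 1, 0, 0])
y     = np.array([1, 1, 1, 1, 1, 0])
s     = np.array([0, 0, 0, 1, 1, 1])
# Delta_SP: P(y_hat=1 | s=0) = 2/3 vs. P(y_hat=1 | s=1) = 1/3 -> 1/3
# Delta_EO: among y=1, group 0 rate = 2/3 vs. group 1 rate = 1/2 -> 1/6
```

Both metrics are zero for a perfectly group-balanced predictor, and lower values indicate better fairness.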

Implementation details. The proposed augmentation scheme is tested on social networks with node representations generated through an unsupervised contrastive learning framework. Node classification and link prediction are employed as ensuing tasks that evaluate the performances of generated node representations. In addition, we also provide link prediction results obtained via an end-to-end graph convolutional network (GCN) model on citation networks. Further details on the implementation of the experiments are provided in Appendix A.7. Contrastive learning is utilized to demonstrate the effects of the proposed augmentation schemes, as augmentations are inherently utilized in its original design (You et al., 2020). Specifically, GRACE (Zhu et al., 2020) is employed as our baseline framework. GRACE constructs two different graph views using random, non-adaptive augmentation schemes, which we replaced by augmentations obtained via FairAug in the experiments. For more details on the employed contrastive learning framework, see Appendix A.6.

Baselines. We compare the proposed framework against the following baselines. As we examine our proposed augmentations in the context of contrastive learning, graph contrastive learning schemes are employed as the natural baselines: Deep Graph Infomax (DGI) (Veličković et al., 2019), Deep Graph Contrastive Representation Learning (GRACE) (Zhu et al., 2020), Graph Contrastive Learning with Adaptive Augmentations (GCA) (Zhu et al., 2021), and Fair and Stable Graph Representation Learning (NIFTY) (Agarwal et al., 2021). We note that the objective function of NIFTY (Agarwal et al., 2021) consists of both supervised and unsupervised components; given the scope of this paper, its results for the unsupervised setting are provided here. In addition, as another family of unsupervised approaches, random walk-based methods for unsupervised node representation generation are also considered: DeepWalk (Perozzi et al., 2014), Node2Vec (Grover and Leskovec, 2016), and FairWalk (Rahman et al., 2019). Lastly, for end-to-end link prediction, we present results for the random edge dropout scheme (Rong et al., 2019).

4.2 Experimental Results

Table 1: Comparative Results with Baselines on Node Classification. Results are reported on Pokec-z and Pokec-n in terms of Accuracy (%), Δ_SP (%), and Δ_EO (%) for DeepWalk, Node2Vec, FairWalk, DGI, GRACE, GCA, NIFTY, FairAug, and FairAug wo ED.

The comparison between the baselines and our proposed framework FairAug is presented in Table 1. Note that 'FairAug wo ED' in Table 1 refers to FairAug where edge deletion (ED) is removed from the chain, but node sampling (NS), edge addition (EA), and feature masking (FM) are all still employed. Firstly, the results of Table 1 show that FairAug provides a considerable reduction in the fairness metrics over GRACE, the strategy it is built upon, while providing similar accuracy values. Second, we note that, similar to our framework, GCA is built upon GRACE through adaptive augmentations as well. However, the adaptive augmentations utilized in GCA are not fairness-aware, and the results of Table 1 demonstrate that the effect of such augmentations on the fairness metrics is unpredictable. Third, the results indicate that all contrastive learning methods provide better fairness performance than random walk-based methods on the evaluated datasets, including FairWalk, which is a fairness-aware study. Since the sole information source of random walk-based studies is the graph structure, the obtained results confirm that the graph topology indeed propagates bias, which is consistent with the motivation of our graph data augmentation design. Finally, the results of Table 1 demonstrate that the closest competitor to FairAug in terms of fairness measures is NIFTY (Agarwal et al., 2021). For Δ_SP, FairAug outperforms NIFTY on both datasets, whereas in terms of Δ_EO, FairAug and NIFTY outperform each other on Pokec-z and Pokec-n, respectively. Table 1 shows that the ED-removed version of FairAug, 'FairAug wo ED', outperforms NIFTY on Pokec-n as well, suggesting that at least one of the proposed strategies outperforms all considered benchmark schemes in both Δ_SP and Δ_EO. This fairness performance improvement achieved by ED removal motivated us to conduct an ablation study on the building blocks of FairAug, which we present in the sequel.

Table 2 lists the results of an ablation study for FairAug, where the last four rows demonstrate the effects of the removal of FM, NS, ED, and EA, respectively. Pokec-z and Pokec-n are highly unbalanced datasets with considerable sparsity in inter-edges (see Table 5 of Appendix A.5). For such networks, the exact probabilities presented in equation 7 cannot be utilized for edge deletion, as doing so would result in the deletion of a significantly large portion of the edges, damaging the graph structure. We employ an upper limit on the deletion probabilities to avoid this (see Appendix A.7). However, with such a limit, the proposed ED framework alone cannot sufficiently balance the numbers of inter- and intra-edges. At this point, we note that node sampling also provides a way of decreasing this imbalance, by excluding nodes when generating the subgraph. As the graph was already sparse in inter-edges initially, employing both NS and ED causes a similar over-deletion of intra-edges and creates unstable results due to the significantly distorted graph structure. Overall, combining this phenomenon with the results of Table 2, in order to consistently balance the input graphs on the Pokec networks (which are unbalanced and sparse in inter-edges), node sampling is observed to be a better choice than edge manipulations.

Pokec-z Pokec-n
Accuracy (%) Δ_SP (%) Δ_EO (%) Accuracy (%) Δ_SP (%) Δ_EO (%)
GRACE
FairAug
FairAug wo FM
FairAug wo NS
FairAug wo ED
FairAug wo EA
Table 2: Ablation Study on Node Classification
UCSD34 Berkeley13
AUC (%) Δ_SP (%) Δ_EO (%) AUC (%) Δ_SP (%) Δ_EO (%)
GCA
GRACE
FairAug
FairAug wo NS
Table 3: Link prediction results obtained on node representations

Table 3 presents the link prediction results obtained in the contrastive learning framework. The results demonstrate that 'FairAug wo NS' consistently improves the fairness metrics of the framework it is built upon (GRACE), while providing similar utility measures. In addition, the comparison of GCA and GRACE confirms our previous assessment regarding the unpredictable effect of GCA's augmentations on the fairness metrics. Finally, comparing FairAug with and without NS, the results of Table 3 show that on UCSD34 and Berkeley13 the employment of node sampling can be ineffective in improving fairness metrics, or can even worsen them. In UCSD34 and Berkeley13, the sensitive groups are approximately equal in size (see Table 5 of Appendix A.5). Therefore, for NS on these datasets, half of the nodes are sampled randomly from each sensitive group (as the limit on the minimum node sampling budget is half of the initial group size, imposed to avoid possible over-sampling; see Appendix A.7). Since the groups are balanced, such a sampling framework coincides with random sampling, which makes the effect of the proposed node sampling scheme on the fairness metrics unpredictable. Furthermore, while the analysis suggests that the cardinality of the larger group should be reduced, the corresponding term appears only in the upper bound and not in the exact expression. The removal of nodes with inter-edges is in fact counter-intuitive, as inter-edges generally help to reduce bias in graphs where they are under-represented (which holds for all datasets considered herein). Therefore, the proposed node sampling becomes effective for graph structures with sufficiently unbalanced sensitive groups. The effect of node sampling on the Facebook networks is further investigated in Appendix A.8, which corroborates the explanations above on the random/detrimental effect of node sampling for these datasets.
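The budget floor discussed above, i.e., sampling at least half of each sensitive group, can be sketched as follows. The function and its `target_ratio` parameter are hypothetical stand-ins for the actual NS scheme, intended only to illustrate why balanced groups reduce the procedure to random sampling.

```python
import numpy as np

def fair_node_sampling(s, target_ratio=0.5, rng=None):
    """Sample nodes per sensitive group, never keeping fewer than
    (roughly) half of a group's members, the budget floor above."""
    rng = np.random.default_rng() if rng is None else rng
    kept = []
    for g in np.unique(s):
        members = np.flatnonzero(s == g)
        # the len(members) // 2 floor guards against over-sampling a group
        budget = max(len(members) // 2, int(target_ratio * len(members)))
        kept.append(rng.choice(members, size=budget, replace=False))
    return np.sort(np.concatenate(kept))
```

When both groups have similar cardinality and the floor is active, every node is kept with probability close to one half regardless of its group, which is exactly uniform random sampling.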

As also noted in Section 1, even though FairAug is presented through its application to graph contrastive learning in this section, the proposed approach can be fully or partially used in conjunction with other learning frameworks as well. To exemplify such a use case, Table 4 lists the results for an end-to-end link prediction task with a two-layer GCN, where the proposed fair edge deletion scheme is employed as an edge dropout method. As the benchmark, we consider random edge dropout (Rong et al., 2019), a scheme originally proposed to improve the generalizability of the model on unseen data. In the experiments, the number of deleted edges is the same for both strategies. The results demonstrate that our method 'Fair ED' can indeed enhance the fairness of the learning framework it is employed in, while only slightly reducing the utility measures.
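A DropEdge-style training loop only needs the retained edge set to be resampled each epoch; random edge dropout and fair ED then differ solely in whether the per-edge drop probability is a constant or a fairness-aware vector. A minimal sketch under that assumption (the probability values below are arbitrary, not taken from the paper):

```python
import numpy as np

def dropout_adjacency(edges, drop_prob, rng):
    """Resample the retained edge set for one training epoch; drop_prob
    may be a scalar (random dropout) or a per-edge array (fair ED)."""
    keep = rng.random(len(edges)) >= drop_prob
    return edges[keep]

rng = np.random.default_rng(0)
edges = np.array([[0, 1], [1, 2], [2, 3], [0, 3]])
# a uniform probability reproduces random edge dropout (Rong et al., 2019) ...
random_view = dropout_adjacency(edges, 0.2, rng)
# ... while a fairness-aware per-edge schedule yields fair ED
fair_view = dropout_adjacency(edges, np.array([0.6, 0.1, 0.6, 0.1]), rng)
```

Matching the expected number of deleted edges between the two schedules, as done in the experiments, keeps the comparison in Table 4 attributable to where edges are dropped rather than how many.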

Accuracy (%) AUC (%) Δ_SP (%) Δ_EO (%)
Cora Edge Dropout
Fair ED
Citeseer Edge Dropout
Fair ED
Pubmed Edge Dropout
Fair ED
Table 4: Employment of fair edge deletion as an edge dropout method

5 Conclusions

In this study, the source of bias in aggregated representations in GNN-based frameworks has been theoretically analyzed. Based on this analysis, several fairness-aware augmentation schemes have been introduced for both the graph structure and the nodal features. The proposed augmentations can be flexibly combined with a variety of GNN-based learning methods, and can readily be employed in unsupervised node representation learning schemes such as graph contrastive learning. Experimental results on real-world graphs demonstrate that the proposed adaptive augmentations improve fairness metrics while providing utility comparable to the state of the art in node classification and link prediction.

References

  • C. Agarwal, H. Lakkaraju, and M. Zitnik (2021) Towards a unified framework for fair and stable graph representation learning. arXiv preprint arXiv:2102.13186. Cited by: §1, §2, §4.1, §4.2.
  • A. Ahmed, N. Shervashidze, S. Narayanamurthy, V. Josifovski, and A. J. Smola (2013) Distributed large-scale natural graph factorization. In Proc. International Conference on World Wide Web (WWW), pp. 37–48. Cited by: §2.
  • A. Beutel, J. Chen, Z. Zhao, and E. H. Chi (2017) Data decisions and theoretical implications when adversarially learning fair representations. arXiv preprint arXiv:1707.00075. Cited by: §1.
  • A. Bose and W. Hamilton (2019) Compositional fairness constraints for graph embeddings. In Proc. International Conference on Machine Learning (ICML), pp. 715–724. Cited by: §2.
  • M. Buyl and T. De Bie (2020) Debayes: a bayesian method for debiasing network embeddings. In International Conference on Machine Learning (ICML), pp. 1220–1229. Cited by: §2.
  • M. Buyl and T. De Bie (2021) The kl-divergence between a graph model and its fair i-projection as a fairness regularizer. arXiv preprint arXiv:2103.01846. Cited by: §2.
  • S. Cao, W. Lu, and Q. Xu (2015) Grarep: learning graph representations with global structural information. In Proc. ACM International Conference on Information and Knowledge Management (CIKM), pp. 891–900. Cited by: §2.
  • D. Chen, Y. Lin, W. Li, P. Li, J. Zhou, and X. Sun (2020) Measuring and relieving the over-smoothing problem for graph neural networks from the topological view. In Proc. AAAI Conference on Artificial Intelligence, Vol. 34, pp. 3438–3445. Cited by: §2.
  • H. Chen, B. Perozzi, Y. Hu, and S. Skiena (2018) Harp: hierarchical representation learning for networks. In Proc. AAAI Conference on Artificial Intelligence, Vol. 32. Cited by: §2.
  • T. Chen, S. Kornblith, M. Norouzi, and G. Hinton (2020) A simple framework for contrastive learning of visual representations. In Proc. International Conference on Machine Learning (ICML), pp. 1597–1607. Cited by: §A.6.
  • E. Dai and S. Wang (2020) Say no to the discrimination: learning fair graph neural networks with limited sensitive attribute information. arXiv preprint arXiv:2009.01454. Cited by: §1, §2, §3, §4.1, §4.1.
  • Y. Dong, J. Kang, H. Tong, and J. Li (2021) Individual fairness for graph neural networks: a ranking based approach. In Proc ACM Conference on Knowledge Discovery & Data Mining (SIGKDD), pp. 300–310. Cited by: §2.
  • C. Dwork, M. Hardt, T. Pitassi, O. Reingold, and R. Zemel (2012) Fairness through awareness. In Proc. Innovations in Theoretical Computer Science (ITCS), pp. 214–226. Cited by: §1.
  • J. Fisher, A. Mittal, D. Palfrey, and C. Christodoulopoulos (2020) Debiasing knowledge graph embeddings. In Proc. Conference on Empirical Methods in Natural Language Processing (EMNLP), pp. 7332–7345. Cited by: §2.
  • A. García-Durán and M. Niepert (2017) Learning graph representations with embedding propagation. In Proc. International Conference on Neural Information Processing Systems (NeurIPS), pp. 5125–5136. Cited by: §1, §2.
  • X. Glorot and Y. Bengio (2010) Understanding the difficulty of training deep feedforward neural networks. In Proc. International Conference on Artificial Intelligence and Statistics (AISTATS), pp. 249–256. Cited by: §A.7.
  • I. S. Gomez, B. G. da Costa, and M. A. Dos Santos (2019) Majorization and dynamics of continuous distributions. Entropy 21 (6), pp. 590. Cited by: §A.2.
  • A. Grover and J. Leskovec (2016) Node2vec: scalable feature learning for networks. In Proc. ACM International Conference on Knowledge Discovery and Data Mining (SIGKDD), Cited by: §2, §4.1.
  • S. Hajian and J. Domingo-Ferrer (2013) A methodology for direct and indirect discrimination prevention in data mining. IEEE Transactions on Knowledge and Data Engineering 25 (7), pp. 1445–1459. External Links: Document Cited by: §1, §3.2.
  • W. L. Hamilton, R. Ying, and J. Leskovec (2017) Inductive representation learning on large graphs. In Proc. International Conference on Neural Information Processing Systems (NeurIPS), pp. 1025–1035. Cited by: §2.
  • K. Hassani and A. H. Khasahmadi (2020) Contrastive multi-view representation learning on graphs. In Proc. International Conference on Machine Learning (ICML), pp. 4116–4126. Cited by: §2.
  • R. D. Hjelm, A. Fedorov, S. Lavoie-Marchildon, K. Grewal, P. Bachman, A. Trischler, and Y. Bengio (2018) Learning deep representations by mutual information estimation and maximization. In Proc. International Conference on Learning Representations (ICLR), Cited by: §1, §2.
  • B. Hofstra, R. Corten, F. Van Tubergen, and N. B. Ellison (2017) Sources of segregation in social networks: a novel approach using facebook. American Sociological Review 82 (3), pp. 625–656. Cited by: §1.
  • F. Hu, Y. Zhu, S. Wu, L. Wang, and T. Tan (2019) Hierarchical graph convolutional networks for semi-supervised node classification. In Proc. International Joint Conference on Artificial Intelligence, (IJCAI), Cited by: §2.
  • X. Ji, J. F. Henriques, and A. Vedaldi (2019) Invariant information clustering for unsupervised image classification and segmentation. In Proc. IEEE International Conference on Computer Vision (ICCV), pp. 9865–9874. Cited by: §2.
  • G. Jin, Q. Wang, C. Zhu, Y. Feng, J. Huang, and J. Zhou (2020) Addressing crime situation forecasting task with temporal graph convolutional neural network approach. In Proc. International Conference on Measuring Technology and Mechatronics Automation (ICMTMA), pp. 474–478. External Links: Document Cited by: §1.
  • K. Kafle, M. Yousefhussien, and C. Kanan (2017) Data augmentation for visual question answering. In Proc. 10th International Conference on Natural Language Generation, pp. 198–202. Cited by: §1, §2.
  • D. P. Kingma and J. Ba (2014) Adam: a method for stochastic optimization. arXiv preprint arXiv:1412.6980. Cited by: §A.7, §A.7.
  • T. N. Kipf and M. Welling (2017) Semi-supervised classification with graph convolutional networks. In Proc. International Conference on Learning Representations (ICLR), Cited by: §A.7, §1, §2.
  • C. Laclau, I. Redko, M. Choudhary, and C. Largeron (2021) All of the fairness for edge prediction with optimal transport. In International Conference on Artificial Intelligence and Statistics, pp. 1774–1782. Cited by: §2.
  • P. Li, Y. Wang, H. Zhao, P. Hong, and H. Liu (2021) On dyadic fairness: exploring and mitigating bias in graph connections. In Proc. International Conference on Learning Representations (ICLR), Cited by: §2.
  • J. Ma, J. Deng, and Q. Mei (2021) Subgroup generalization and fairness of graph neural networks. arXiv preprint arXiv:2106.15535. Cited by: §2.
  • A. W. Marshall, I. Olkin, and B. C. Arnold (1979) Inequalities: theory of majorization and its applications. Vol. 143, Springer. Cited by: §A.2.
  • A. Mislove, B. Viswanath, K. P. Gummadi, and P. Druschel (2010) You are who you know: inferring user profiles in online social networks. In Proc. ACM International Conference on Web Search and Data Mining (WSDM), pp. 251–260. Cited by: §3.3.3.
  • F. L. Opolka, A. Solomon, C. Cangea, P. Veličković, P. Liò, and R. D. Hjelm (2019) Spatio-temporal deep graph infomax. arXiv preprint arXiv:1904.06316. Cited by: §A.6, §1, §2.
  • M. Ou, P. Cui, J. Pei, Z. Zhang, and W. Zhu (2016) Asymmetric transitivity preserving graph embedding. In Proc. ACM International Conference on Knowledge Discovery and Data Mining (SIGKDD), pp. 1105–1114. Cited by: §2.
  • Z. Peng, W. Huang, M. Luo, Q. Zheng, Y. Rong, T. Xu, and J. Huang (2020) Graph representation learning via graphical mutual information maximization. In Proc. Web Conference (WWW), pp. 259–270. Cited by: §2.
  • B. Perozzi, R. Al-Rfou, and S. Skiena (2014) DeepWalk: online learning of social representations. In Proc. ACM International Conference on Knowledge Discovery and Data Mining (SIGKDD), pp. 701–710. External Links: Link Cited by: §2, §4.1.
  • T. A. Rahman, B. Surma, M. Backes, and Y. Zhang (2019) Fairwalk: towards fair graph embedding. In Proc. International Joint Conference on Artificial Intelligence (IJCAI), pp. 3289–3295. Cited by: §2, §3.3.3, §4.1.
  • V. Red, E. D. Kelsic, P. J. Mucha, and M. A. Porter (2011) Comparing community structure to characteristics in online collegiate social networks. SIAM review 53 (3), pp. 526–543. Cited by: §4.1.
  • Y. Rong, W. Huang, T. Xu, and J. Huang (2019) Dropedge: towards deep graph convolutional networks on node classification. arXiv preprint arXiv:1907.10903. Cited by: §2, §3.3, §4.1, §4.2.
  • C. Shorten and T. M. Khoshgoftaar (2019) A survey on image data augmentation for deep learning. Journal of Big Data 6 (1), pp. 1–48. Cited by: §1, §2.
  • I. Spinelli, S. Scardapane, A. Hussain, and A. Uncini (2021) Biased edge dropout for enhancing fairness in graph representation learning. arXiv preprint arXiv:2104.14210. Cited by: §2.
  • L. Takac and M. Zabovsky (2012) Data analysis in public social networks. In International Scientific Conference and International Workshop. ’Present Day Trends of Innovations’, Vol. 1. Cited by: §4.1.
  • J. Tang, M. Qu, M. Wang, M. Zhang, J. Yan, and Q. Mei (2015) Line: large-scale information network embedding. In Proc. International Conference on World Wide Web (WWW), pp. 1067–1077. Cited by: §2.
  • P. Veličković, G. Cucurull, A. Casanova, A. Romero, P. Liò, and Y. Bengio (2018) Graph attention networks. Proc. International Conference on Learning Representations (ICLR). Cited by: §1, §2.
  • P. Veličković, W. Fedus, W. L. Hamilton, P. Liò, Y. Bengio, and R. D. Hjelm (2019) Deep graph infomax. In Proc. International Conference on Learning Representations (ICLR), External Links: Link Cited by: §A.6, §A.7, §2, §3.3, §4.1.
  • F. Wu, A. Souza, T. Zhang, C. Fifty, T. Yu, and K. Weinberger (2019) Simplifying graph convolutional networks. In Proc. International Conference on Machine Learning (ICML), pp. 6861–6871. Cited by: §2.
  • Z. Wu, Y. Xiong, S. X. Yu, and D. Lin (2018) Unsupervised feature learning via non-parametric instance discrimination. In Proc. IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 3733–3742. Cited by: §2.
  • M. Ye, X. Zhang, P. C. Yuen, and S. Chang (2019) Unsupervised embedding learning via invariant and spreading instance feature. In Proc. IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 6210–6219. Cited by: §2.
  • Y. You, T. Chen, Y. Sui, T. Chen, Z. Wang, and Y. Shen (2020) Graph contrastive learning with augmentations. Proc. International Conference on Neural Information Processing Systems (NeurIPS) 33. Cited by: §2, §3.3.1, §3.3, §4.1.
  • Z. Zeng, R. Islam, K. N. Keya, J. Foulds, Y. Song, and S. Pan (2021) Fair representation learning for heterogeneous information networks. arXiv preprint arXiv:2104.08769. Cited by: §2.
  • X. Zhang, J. Zhao, and Y. LeCun (2015) Character-level convolutional networks for text classification. Advances in neural information processing systems 28, pp. 649–657. Cited by: §1, §2.
  • T. Zhao, Y. Liu, L. Neves, O. Woodford, M. Jiang, and N. Shah (2020) Data augmentation for graph neural networks. arXiv preprint arXiv:2006.06830. Cited by: §2, §3.3.
  • Y. Zhu, Y. Xu, F. Yu, Q. Liu, S. Wu, and L. Wang (2020) Deep Graph Contrastive Representation Learning. In Proc. International Conference on Machine Learning (ICML) Workshop on Graph Representation Learning and Beyond, External Links: Link Cited by: §A.6, §A.6, §A.7, §A.7, §2, §3.3.1, §3.3, §4.1, §4.1.
  • Y. Zhu, Y. Xu, F. Yu, Q. Liu, S. Wu, and L. Wang (2021) Graph contrastive learning with adaptive augmentation. In Proc. Web Conference (WWW), Cited by: §A.6, §A.6, §A.7, §A.7, §2, §3.3, §4.1.

Appendix A Appendix

A.1 Proof of Theorem 1

The elements of the centered sensitive attribute vector can be written as

(8)

where the second term denotes the element of the matrix at the corresponding row and column. Using these values, equation 8 becomes

(9)

A similar analysis follows as

(10)

Define the vector whose elements are these terms. Then, it equals

(11)

where ⊙ represents the Hadamard product. Therefore, the expression follows as

(12)

We first consider the two terms individually

(13)

The expression for the other term can be derived similarly.

(14)

With these definitions, the following can be written using equation 13 and equation 14