1 Introduction
Graph representation learning (GRL) has attracted significant attention due to its widespread applications in real-world interaction systems, such as social, molecular, biological, and citation networks GRL_survey . The current state-of-the-art supervised GRL methods are mostly based on Graph Neural Networks (GNNs) GCN ; GAT ; hamilton_inductive_2017 ; GIN , which require a large amount of task-specific supervised information. Despite their remarkable performance, they are usually limited by the deficiency of label supervision in real-world graph data: it is usually easy to collect unlabeled graphs but can be very costly to obtain enough annotated labels, especially in certain fields such as biochemistry. Therefore, many recent works GCC ; ContrastMultiView ; InfoGraph have studied how to fully utilize the unlabeled information on graphs, thus stimulating the application of self-supervised learning (SSL) to GRL, where only limited or even no labels are needed.
As a prevalent and effective strategy of SSL, contrastive learning follows the mutual information maximization principle (InfoMax) DGI to maximize the agreement of positive pairs while minimizing that of negative pairs in the embedding space. In particular, graph contrastive learning (GCL) methods GCC ; ContrastMultiView ; GraphCL usually take two different augmentation views of the same graph as the positive pair and maintain high consistency between the learned representations of the views, preserving the invariant graph properties while ignoring the nuances. However, InfoMax has been shown to be risky because it can push encoders to capture redundant information, which is useless for label identification and may lead to brittle representations on_mutual_info . Recent works ADGCL ; JOAO ; GASSL try to implement challenging augmentation views or add perturbations to the original graph to improve representation robustness. However, such aggressive transformations may break an assumption of GCL that the data distribution shift induced by augmentation does not affect label information. Intuitively, each commonly-used augmentation technique transforms the original data in a certain way and therefore injects distinct yet meaningless factors into the original graph data. The injected misleading factors can be highly entangled with the label-relevant factors, making the augmented graph less informative. Therefore, simply adding regularizations ADGCL ; GASSL to eliminate redundant information within the highly entangled graph may lead to another extreme where the learned representations are not informative enough to distinguish different graphs. To mitigate this issue, a method that can disentangle the augmentation view-invariant and view-dependent latent factors without sacrificing the information sufficiency of the learned representation is in urgent need.
In this paper, we address this challenge by proposing a novel cross-view disentangled graph contrastive learning model with adversarial training, named ACDGCL. Specifically, ACDGCL consists of a graph encoder followed by two feature extractors that disentangle representations particular to the essential and augmentation-induced latent factors, respectively. We follow the mutual information maximization principle to maximize the correspondence between the essential representations of two augmentation views. To enforce the disentanglement constraint, we propose reconstruction-based representation learning, including intra-view and inter-view reconstructions, to explicitly disentangle the augmentation-relevant and -irrelevant factors of the learned representation. Besides, a perturbed adversarial graph is added as a third contrastive view alongside the two augmentation views to further remove redundant information from the learned essential representation. We further provide theoretical analysis showing that ACDGCL is capable of learning a minimal sufficient representation with the designs above. Finally, we conduct experiments on commonly-used graph benchmark datasets to validate the effectiveness of ACDGCL. The experimental results show that ACDGCL achieves significant performance gains over different datasets and settings compared with state-of-the-art baselines.
To sum up, the main contributions of this work are threefold: (i) we propose ACDGCL to learn augmentation-disentangled representations with a cross-view reconstruction mechanism; (ii) we add adversarial samples as a third contrastive view to improve the robustness of the learned representation; (iii) we conduct thorough experiments to demonstrate that ACDGCL significantly outperforms state-of-the-art baselines over multiple graph classification benchmark datasets.
2 Preliminaries
2.1 Graph Representation Learning
In this work, we focus on the graph-level task. Let denote a graph dataset with N graphs, where and are the node set and edge set of graph , respectively. We use and to denote the attribute vectors of each node and edge . Each graph is associated with a label, denoted as . The goal of graph representation learning is to learn an encoder so that the learned representation is sufficient to identify in the downstream task. We define sufficiency as containing the same amount of information as for label identification sufficiency , formulated as:
(1)
where denotes the mutual information between two variables. We illustrate the general optimization result of classical representation learning in Figure 1(a).
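The symbols of Eq. (1) did not survive extraction; the standard form of this sufficiency condition from the representation learning literature, in notation of our own choosing ($z = f(G)$ for the learned representation, $y$ for the label, $I(\cdot\,;\cdot)$ for mutual information), reads:

```latex
I(z; y) = I(G; y)
```

That is, the representation $z$ carries exactly as much label-relevant information as the input graph itself.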
2.2 Contrastive Learning
Contrastive Learning (CL) is a self-supervised representation learning method that leverages instance-level identity as supervision. During the training phase, each instance first goes through proper data transformation to generate two data augmentation views and , where and are transformation functions. Then, the CL method learns an encoder (a backbone network plus a projection layer) that maps and closer in the hidden space so that the learned representations and maintain all the information shared by and . The encoder is usually optimized by a contrastive loss, such as the NCE loss NCE , InfoNCE loss InfoNCE , or NT-Xent loss SimCLR . We provide the general optimization result of CL in Figure 1(b). In Graph Contrastive Learning (GCL), GNNs such as GCN GCN and GIN GIN are usually used as backbone networks, together with the commonly-used graph data augmentation operators GraphCL , such as node dropping, edge perturbation, subgraph sampling, and attribute masking.
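The contrastive objectives named above share one shape; a minimal PyTorch sketch of an InfoNCE-style loss over a batch of paired view embeddings (the temperature value is an illustrative placeholder) could look like:

```python
import torch
import torch.nn.functional as F

def info_nce(z1, z2, temperature=0.5):
    """InfoNCE-style loss: row i of z1 and row i of z2 form the positive
    pair, and all other rows in the batch act as negatives."""
    z1 = F.normalize(z1, dim=1)
    z2 = F.normalize(z2, dim=1)
    logits = z1 @ z2.t() / temperature   # (N, N) cosine-similarity matrix
    labels = torch.arange(z1.size(0))    # positives lie on the diagonal
    return F.cross_entropy(logits, labels)

z1, z2 = torch.randn(8, 16), torch.randn(8, 16)
loss = info_nce(z1, z2)
```

Minimizing this loss pulls the paired views together while pushing apart all other instances in the batch.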
All CL-based methods are built on the assumption that augmentations do not change the information regarding the label. Here, we follow robust_rep to formalize the definition of redundancy: is redundant to for iff and share the same label-relevant information. In CL, and are supposed to be mutually redundant; we define mutual redundancy as:
(2) 
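The formula of Eq. (2) did not survive extraction; the standard statement of mutual redundancy from the multi-view information-bottleneck literature robust_rep , in notation of our own choosing (views $v_1, v_2$, label $y$), says that neither view carries label information the other lacks:

```latex
I(y; v_1 \mid v_2) = 0 \quad \text{and} \quad I(y; v_2 \mid v_1) = 0
```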
2.3 Adversarial Training
Deep neural networks have been demonstrated to be vulnerable to adversarial attacks AT_intro . Among the approaches proposed against adversarial attacks, Adversarial Training (AT) achieves remarkable robustness. Specifically, AT introduces a perturbation variable and improves model robustness by training the network on adversarial samples . During the training phase, the model is optimized to minimize the training loss, while is optimized within the radius to maximize the loss. The supervised setting of AT is defined as:
(3) 
where are the data feature and label sampled from the training set, and denotes the supervised training objective, such as the cross-entropy loss. Besides improving model robustness against adversarial attacks, AT is also capable of reducing overfitting and thus increasing generalization performance AdvProp ; FreeLB . One possible reason behind this phenomenon is that AT follows the Information Bottleneck principle IB ; IB_AT , under which optimal representations contain only minimal yet sufficient information. Depending on whether it is relevant to the label or not, the mutual information between the representation and can be decomposed into two parts:
(4) 
Ideally, the representation learned through AT is expected to keep all the label-relevant information from intact while ignoring other futile information; we illustrate this ideal situation in Figure 1(c).
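The min-max scheme of Eq. (3) can be sketched as follows, assuming an L-infinity ball and projected gradient ascent for the inner maximization; the model, radius `eps`, step size `alpha`, and step count are illustrative placeholders, not values from the paper:

```python
import torch
import torch.nn.functional as F

def pgd_adversarial_loss(model, x, y, eps=0.1, alpha=0.02, steps=5):
    """Inner maximization: find a perturbation delta within the L-inf
    ball of radius eps that maximizes the loss; the returned loss is
    then minimized w.r.t. the model parameters (the outer problem)."""
    delta = torch.zeros_like(x, requires_grad=True)
    for _ in range(steps):
        loss = F.cross_entropy(model(x + delta), y)
        loss.backward()
        with torch.no_grad():
            delta += alpha * delta.grad.sign()  # gradient ascent step
            delta.clamp_(-eps, eps)             # project back into the ball
        delta.grad.zero_()
    return F.cross_entropy(model(x + delta.detach()), y)

model = torch.nn.Linear(4, 3)
x, y = torch.randn(8, 4), torch.randint(0, 3, (8,))
adv_loss = pgd_adversarial_loss(model, x, y)
adv_loss.backward()  # outer minimization over the model parameters
```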
3 Proposed Model
In this section, we introduce the proposed ACDGCL, whose framework is shown in Figure 2. Before diving into the details of ACDGCL, we briefly analyze sufficiency and robustness in GCL and illustrate the disentanglement hypothesis.
3.1 Motivation of ACDGCL
To minimize the redundant information in graph representations, the information bottleneck (IB) principle has been introduced into graph representation learning yu_graph_2020 , and the learned model is empirically shown to be more robust to adversarial attacks. In graph contrastive learning (GCL), labels are not accessible to guide the optimization process, so it is more challenging to discern predictive information from redundant information. For each graph , GCL methods generally pick two augmentation operators and , IID sampled from the same family of augmentations, to generate two augmented graphs and for contrast. As stated in Section 2.2, all GCL models are optimized under the assumption that the two contrastive views are mutually redundant with respect to label information. However, this assumption does not necessarily hold, especially when aggressive augmentations are applied to the graph data. Intuitively, and have completely different distributions, but the factors related to them can become highly entangled through the data augmentation process, shifting the distributions relative to the original graph data to some extent. In this case, aggressive data augmentation operators ADGCL or perturbations GASSL can overly enlarge the distribution shift so that the augmentation views fail to meet the redundancy assumption in Section 2.2, and the learned representation then fails to satisfy the sufficiency requirement in Section 2.1.
To address this dilemma, we propose to learn an augmentation-disentangled representation to improve its robustness without sacrificing information sufficiency. Given an augmented graph , we aim to learn a pair of disentangled representations , where is expected to be specific to the augmentation information, while is optimized to elicit all the essential factors (the distribution of the original graph) from , i.e., . We illustrate the optimal representation disentanglement in Figure 3(a) and (b): (1) is sufficient for its corresponding graph view with respect to , and the union of and covers all the information in ; (2) and are mutually exclusive (disentangled), and only is relevant to the label information. To reach this optimal point, we need corresponding designs that guarantee the sufficiency and disentanglement of the representation. Next, we introduce the framework details and explain how we achieve the two constraints, respectively.
3.2 Disentanglement by CrossView Reconstruction
In GCL, a graph encoder is usually leveraged to aggregate the features of graph data into its representation. There are multiple choices of graph encoders in GCL, including GCN GCN and GIN GIN , etc. In this work, we adopt GIN as the backbone network for simplicity; note that any other commonly-used graph encoder can also be applied in our model. Given two augmentation views and , we first use the encoder to map them into a lower-dimensional hidden space to generate two embeddings and . Instead of directly maximizing the agreement between the two entangled representations and , we further feed them into a pair of feature extractors (both MLP-based networks) to learn the disentangled embeddings:
(5) 
where a pair of disentangled embeddings is generated for both and through the procedure above. Ideally, the mutual redundancy assumption between can thus be guaranteed because and are augmented from the same original graph and naturally share the same essential factors, including those relevant to label identification. Here, we state the lower bound of the mutual information between one augmentation view and the learned augmentation-invariant representation of the other augmentation view in Theorem 1.
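As a rough illustration of Eq. (5), the encoder-plus-two-extractors design might look like the following sketch, where a plain linear layer stands in for the GIN backbone and all dimensions are illustrative placeholders:

```python
import torch
import torch.nn as nn

class DisentangledHeads(nn.Module):
    """A shared backbone embedding h is split by two MLP extractors into
    an essential (augmentation-invariant) part and an augmentation-specific
    part. A linear layer replaces the GIN backbone for illustration."""
    def __init__(self, in_dim, hid_dim):
        super().__init__()
        self.encoder = nn.Linear(in_dim, hid_dim)  # stand-in for GIN
        self.essential = nn.Sequential(
            nn.Linear(hid_dim, hid_dim), nn.ReLU(), nn.Linear(hid_dim, hid_dim))
        self.augment = nn.Sequential(
            nn.Linear(hid_dim, hid_dim), nn.ReLU(), nn.Linear(hid_dim, hid_dim))

    def forward(self, x):
        h = self.encoder(x)
        return self.essential(h), self.augment(h)

m = DisentangledHeads(16, 32)
z_ess, z_aug = m(torch.randn(4, 16))
```

Each augmentation view passes through the same module, yielding one disentangled pair per view.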
Theorem 1
Suppose is a GNN encoder as powerful as the 1-WL test. Let elicit only the augmentation information from while extracting the essential factors of from and . Then we have:
The detailed proof is provided in Section C of the Appendix. By Theorem 1, we can maximize the consistency between the representations of the two views by maximizing the mutual information between and . We can thus derive our objective ensuring view invariance as follows:
(6) 
where denotes the contrastive objective, for which we adopt the InfoNCE loss InfoNCE in this work. Meanwhile, to ensure feature sufficiency and disentanglement as stated above, we propose a cross-view reconstruction mechanism to pursue these two objectives. Specifically, we use the representation pair within and across the augmentation views to recover the original raw data so that the two objectives can be guaranteed simultaneously. Because graph data is non-Euclidean structured data that cannot be represented in Euclidean space like the raw data in the computer vision domain, we instead infer the output of based on . First, we perform the reconstruction within the augmentation view, namely mapping to , where denotes the augmentation view. The optimal result of this step is shown in Figure 3(a), where the joint of and covers all the information in its corresponding augmented graph view . Since only is involved in the contrastive loss in Equation 6, we avoid requiring the graph encoder to be less powerful than the 1-WL test, especially when aggressive augmentation or perturbation is applied, i.e., . Then, we define as a cross-view representation pair and repeat the reconstruction procedure on it to predict , aiming to ensure that and are disentangled and that is specific to the augmentation-induced factors, where or . The optimal disentanglement result is illustrated in Figure 3(b), where and . We formulate the reconstruction procedures as:
(7)
where is the parameterized reconstruction model and is a predefined fusion operator, such as the element-wise product. The reconstruction procedures are optimized by minimizing the entropy , where or . Ideally, we reach the optimal situation shown in Figure 3(a) and (b) iff , where is exactly recovered given its augmentation-dependent representation and the augmentation-invariant representation of any view. Nevertheless, the conditional probability is unknown to us; we hence use the variational distribution approximated by instead, denoted as . We provide the upper bound of in Theorem 2.
Theorem 2
Assume is a Gaussian distribution and is the parameterized reconstruction model which infers from . Then we have:
The detailed proof is provided in Section C of the Appendix. Since we adopt two augmentation views, the objective function constraining representation sufficiency and disentanglement can be formulated as:
(8) 
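A minimal sketch of the intra- and cross-view reconstruction terms, assuming an element-wise product as the fusion operator and a mean-squared error as the surrogate for the Gaussian reconstruction entropy (Theorem 2); the decoder and all dimensions are illustrative placeholders:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

decoder = nn.Linear(32, 32)  # stand-in for the parameterized reconstruction model

def recon_loss(z_ess, z_aug, h_target):
    """Fuse an essential and an augmentation-specific embedding by
    element-wise product and reconstruct the encoder output of the target
    view; under a Gaussian assumption, minimizing the squared error
    minimizes the reconstruction entropy."""
    return F.mse_loss(decoder(z_ess * z_aug), h_target)

# intra-view: (z_ess_1, z_aug_1) -> h_1 ; cross-view: (z_ess_2, z_aug_1) -> h_1
z_ess_1, z_aug_1, z_ess_2 = (torch.randn(4, 32) for _ in range(3))
h_1 = torch.randn(4, 32)
loss = recon_loss(z_ess_1, z_aug_1, h_1) + recon_loss(z_ess_2, z_aug_1, h_1)
```

Swapping the essential embedding across views in the second term is what forces the augmentation-specific part to carry all view-dependent information.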
3.3 Adversarial Contrastive View
With the cross-view reconstruction mechanism above, we disentangle the essential factors from the two augmentation views, and thus and can be considered mutually redundant with respect to because both are distributed similarly to the original graph , i.e., and . However, not all the information in the original graph is label-relevant, and redundant information may still exist in , as illustrated in Figure 3(b). To further remove redundant information and enhance the robustness of the learned representation, we introduce adversarial training into our model by adding crafted perturbations upon . Many existing works RoCL on adversarial contrastive learning add crafted perturbations to the augmented samples and optimize the perturbations to maximize the contrastive loss. Despite its generalization ability, this kind of Adversarial-to-Adversarial training strategy aggressively minimizes the "worst-case" consistency and is thereby unsuitable for the original graph view ADVCL . We therefore build the adversarial sample over rather than the augmented sample , denoted as . The adversarial objective is defined as
(9) 
where the adversarial sample serves as another positive pair with the two augmentation views. We also state the lower bound of the mutual information between and the adversarial view in Theorem 3.
Theorem 3
The detailed proof is in Section C of the Appendix. Our implementation of crafting perturbations is inspired by the recent work GASSL , which adds perturbations to the output of the first hidden layer because this is empirically shown to generate more challenging views than perturbing the initial node features. With the lower bound stated in Theorem 3, we can utilize the min-max training strategy to further improve representation robustness, where the inner maximization is solved by projected gradient descent (PGD) PGD . The adversarial objective is then defined as:
(10) 
3.4 The Joint Objective
We design the joint objective of ACDGCL by combining all the objectives above. Given the graph , the graph encoder and feature extractors can be optimized with the objective below:
(11) 
where and are coefficients balancing the magnitude of each loss term. With this joint objective, our model is able to learn the optimal representation illustrated in Figure 3(c).
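The joint objective of Eq. (11) reduces to a weighted sum of the contrastive, reconstruction, and adversarial terms; a trivial sketch with placeholder coefficient values:

```python
def joint_objective(l_contrast, l_recon, l_adv, lam1=1.0, lam2=0.5):
    """Weighted sum of the three loss terms; lam1 and lam2 are
    illustrative placeholders for the two balancing coefficients."""
    return l_contrast + lam1 * l_recon + lam2 * l_adv

total = joint_objective(1.0, 2.0, 4.0)  # 1.0 + 1.0*2.0 + 0.5*4.0 = 5.0
```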
4 Experiments
In this section, we present the empirical evaluation of ACDGCL on public graph benchmark datasets. Ablation studies and robustness analyses are conducted to evaluate the effectiveness of the designs in ACDGCL. We provide the dataset statistics, training details, and further hyperparameter analysis in the Appendix.
4.1 Experimental Setups
Datasets.
We evaluate our model on five graph benchmark datasets from bioinformatics, namely MUTAG, PTC-MR, NCI1, DD, and PROTEINS, and five from social networks, namely COLLAB, IMDB-B, RDT-B, RDT-M5K, and IMDB-M, for the task of graph-level property classification. Additionally, we use ogbg-molhiv from the Open Graph Benchmark OGB to demonstrate our model's advantage on a large-scale dataset. More details about dataset statistics are included in Section A of the Appendix.
Baselines. Under the unsupervised representation learning setting, we compare ACDGCL with seven state-of-the-art self-supervised learning methods, GraphCL GraphCL , InfoGraph InfoGraph , MVGRL ContrastMultiView , ADGCL ADGCL , GASSL GASSL , InfoGCL InfoGCL , and DGCL DGCL , as well as four classical unsupervised representation learning methods: node2vec node2vec , sub2vec sub2vec , graph2vec graph2vec , and GVAE VGAE .
Evaluation Protocol. We follow the evaluation protocols of previous works InfoGraph ; GraphCL ; DGCL to verify the effectiveness of our model. The learned representation is fed to a linear SVM classifier for task-specific prediction. We report the mean test accuracy, evaluated by 10-fold cross-validation with the standard deviation over five random seeds, as the final performance. In addition, we follow the semi-supervised representation learning setting of GraphCL on the ogbg-molhiv dataset, with fine-tuning label rates of 1%, 10%, and 20%. The final performance is reported as the mean ROC-AUC over five random initialization seeds.
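The linear-evaluation protocol can be sketched with scikit-learn, using random features in place of the frozen encoder outputs (all names and sizes below are illustrative, not the paper's setup):

```python
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.svm import LinearSVC

# Random embeddings stand in for frozen graph representations; the
# protocol fits a linear SVM on top and reports 10-fold CV accuracy.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 32))       # 100 graphs, 32-dim embeddings
y = rng.integers(0, 2, size=100)     # binary graph labels
scores = cross_val_score(LinearSVC(), X, y, cv=10, scoring="accuracy")
mean_acc, std_acc = scores.mean(), scores.std()
```

The mean and standard deviation of `scores` correspond to the accuracy figures reported in the tables.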
Implementation Details.
We implement our framework in PyTorch and employ the data augmentation functions provided by the PyGCL library PyGCL . We choose GIN GIN as the backbone graph encoder, and the model is optimized with the Adam optimizer. There are two model-specific hyperparameters, and ; their search spaces are and , respectively. More implementation details are provided in Section B of the Appendix. All experiments are conducted on an Nvidia GeForce RTX 2080 Ti GPU.
4.2 Overall Performance Comparison
Unsupervised representation learning. The overall performance comparison is shown in Table 1. From the results, we make three observations: (1) the GCL-based methods generally yield higher performance than classical unsupervised learning methods, indicating the effectiveness of utilizing instance-level supervision; (2) InfoGCL and GASSL achieve better performance than GraphCL, empirically supporting the conclusion that the InfoMax objective can suffer from overwhelming information and thus more challenging augmentations or perturbations are needed to produce robust representations; (3) our proposed ACDGCL and DGCL consistently outperform the other baselines, demonstrating the advantage of disentangled representations. More importantly, ACDGCL achieves state-of-the-art results on most of the datasets, which further demonstrates the ability of our model to learn minimal yet sufficient representations.
Semi-supervised representation learning. The semi-supervised representation learning results on ogbg-molhiv are shown in Figure 4. Our model gains significant improvements under all three label-rate fine-tuning settings. We also notice that the improvement grows with the label rate (1%, 1.8%, and 4.4% for label rates of 1%, 10%, and 20%, respectively). A possible explanation is that as more training data is included in fine-tuning when the label rate increases, the affiliated redundant information increases as well, which deteriorates performance even more; removing redundant information therefore yields a larger performance boost.
4.3 Ablation Study
To further verify the effectiveness of the different modules in ACDGCL, we perform ablation studies on each module by creating the model variants described below. The comparison results are shown in Table 2.

w/o Intra-view Recon. Reconstruction is only executed across views, i.e., .
w/o Inter-view Recon. Reconstruction is only executed within the same view, i.e., .
w/o Adv. Training. The adversarial view is discarded from the contrastive loss.
From Table 2 we can see that our model, combining the cross-view reconstruction and adversarial training modules, outperforms all the variants. Discarding either reconstruction direction prevents reaching the optimal situation illustrated in Figure 3: without the inter-view reconstruction we cannot guarantee the representation disentanglement assumption, and without the intra-view reconstruction the sufficiency assumption may not hold. Either way, the augmentation-invariant representations may suffer substantial information loss during contrastive learning, leading to performance deterioration. Compared with our full model, the variant w/o Adv. Training may pass too much redundant information to the downstream classifier, creating more confusion. The relatively larger performance deterioration of the variants w/o Intra-view Recon and w/o Inter-view Recon suggests a "better than nothing" rule: retaining redundant information is better than retaining the information only partially.
4.4 Robustness Analysis
In this section, we conduct additional experiments on the ogbg-molhiv dataset to evaluate the effectiveness of our design in ensuring representation robustness under aggressive augmentation and perturbation. The results are shown in Figure 5. In the left two subplots, we plot accuracy versus edge perturbation and attribute masking strengths, respectively. Specifically, we keep GraphCL and our proposed ACDGCL under the same hyperparameter setting and set the and of ACDGCL to 5.0 and 0.5, respectively. From the results we can see that ACDGCL not only consistently outperforms GraphCL but is also less affected by larger augmentation strengths. Similar observations can be found in the right two subplots, where we compare our method with GASSL under different perturbation bounds and attack steps to demonstrate its robustness against adversarial attacks. Since both our model and GASSL use GIN as the backbone network, we also add the performance of GIN as a baseline. Although aggressive adversarial attacks can largely deteriorate performance, our proposed ACDGCL still achieves more robust performance than GASSL.
5 Related Work
Graph contrastive learning. Contrastive learning was first proposed in the computer vision field SimCLR and has raised a surge of interest in self-supervised graph representation learning over the past few years. The principle behind contrastive learning is to utilize instance-level identity as supervision and maximize the consistency between positive pairs in the hidden space through a designed contrast mode. Previous graph contrastive learning works generally rely on various graph augmentation (transformation) techniques DGI ; GCC ; MVGRL ; GraphCL ; InfoGraph to generate positive pairs from the original data as similar samples. Recent works in this field try to improve the effectiveness of graph contrastive learning by finding more challenging views ADGCL ; InfoGCL ; JOAO or adding adversarial perturbations GASSL . However, most existing methods contrast over entangled embeddings, where the complex intertwined information may pose obstacles to extracting useful information for downstream tasks. Our model is spared from this issue by contrasting over disentangled representations.
Disentangled representation learning on graphs. Disentangled representation learning arises from the computer vision field hsieh_learning_2018 ; zhao_learning_2021 to disentangle the heterogeneous latent factors of representations, thereby making them more robust and interpretable RLReview . This idea has now been widely adopted in graph representation learning. IPGDN ; DisenGCN utilize a neighborhood routing mechanism to identify the latent factors in node representations. Other generative models VGAE ; GraphVAE utilize Variational Autoencoders to balance reconstruction and disentanglement. The recent work DGCL extends disentangled representation learning to self-supervised graph learning by contrasting the factorized representations. Although these methods benefit significantly from representation disentanglement, the underlying excessive information can still overload the model, resulting in limited capacity. Our model targets this issue by removing the redundant information considered irrelevant to the graph property.
Graph information bottleneck. The Information Bottleneck (IB) IB has been widely adopted as a critical principle of representation learning. A representation containing minimal yet sufficient information is considered to comply with the IB principle, and many works VIB ; blackbox ; robust_rep have empirically and theoretically shown that representations agreeing with the IB principle are both informative and robust. Recently, the IB principle has also been borrowed to guide representation learning on graph-structured data. Current methods GIB ; InfoGCL ; ADGCL usually propose different regularization designs to learn compressed yet informative representations in accordance with the IB principle. In this work, we follow the information bottleneck to learn expressive and robust representations from disentangled information.
6 Conclusion
In this paper, we study graph representation learning in light of the information bottleneck. To reach the optimum we illustrate, we propose a novel model, named ACDGCL, which disentangles the essential factors from augmented graphs through a cross-view reconstruction mechanism so that the information entanglement brought by augmentations does not cause the loss of predictive information during contrastive learning. We also add an adversarial view as the third view of contrastive learning to further remove redundant information and enhance representation robustness. In addition, we theoretically analyze the effectiveness of each component in our model and derive the objective based on the analysis. Extensive experiments on multiple graph benchmark datasets and settings demonstrate the ability of ACDGCL to learn robust and informative graph representations. In the future, we plan to explore practical objectives that further decrease the upper bound of the mutual information between the disentangled representations, and more efficient training strategies to make the proposed model less time-consuming on large-scale graphs.
References
 (1) Alessandro Achille and Stefano Soatto. Emergence of invariance and disentanglement in deep representations. JMLR, 2018.
 (2) Bijaya Adhikari, Yao Zhang, Naren Ramakrishnan, and B. Aditya Prakash. Sub2Vec: Feature Learning for Subgraphs. In KDD, 2018.
 (3) Alexander A. Alemi, Ian Fischer, Joshua V. Dillon, and Kevin Murphy. Deep Variational Information Bottleneck. ICLR, 2017.
 (4) Yoshua Bengio, Aaron Courville, and Pascal Vincent. Representation Learning: A Review and New Perspectives. TPAMI, 2013.
 (5) Ting Chen, Simon Kornblith, Mohammad Norouzi, and Geoffrey Hinton. A Simple Framework for Contrastive Learning of Visual Representations. In ICML, 2020.
 (6) Lijie Fan, Sijia Liu, Pin-Yu Chen, Gaoyuan Zhang, and Chuang Gan. When does contrastive learning preserve adversarial robustness from pretraining to finetuning? In NeurIPS, 2021.
 (7) Marco Federici, Anjan Dutta, Patrick Forré, Nate Kushman, and Zeynep Akata. Learning robust representations via multiview information bottleneck. In ICLR, 2020.
 (8) Ian J Goodfellow, Jonathon Shlens, and Christian Szegedy. Explaining and harnessing adversarial examples. In ICLR, 2014.
 (9) Aditya Grover and Jure Leskovec. node2vec: Scalable Feature Learning for Networks. In KDD, 2016.
 (10) Will Hamilton, Zhitao Ying, and Jure Leskovec. Inductive Representation Learning on Large Graphs. In NeurIPS, 2017.
 (11) William L Hamilton, Rex Ying, and Jure Leskovec. Representation learning on graphs: Methods and applications. arXiv preprint arXiv:1709.05584, 2017.
 (12) Kaveh Hassani and Amir Hosein Khasahmadi. Contrastive Multi-View Representation Learning on Graphs. In ICML, 2020.
 (13) Kaveh Hassani and Amir Hosein Khasahmadi. Contrastive Multi-View Representation Learning on Graphs. In ICML, 2020.
 (14) Jun-Ting Hsieh, Bingbin Liu, De-An Huang, Li Fei-Fei, and Juan Carlos Niebles. Learning to Decompose and Disentangle Representations for Video Prediction. In NeurIPS, 2018.

 (15) Weihua Hu, Matthias Fey, Marinka Zitnik, Yuxiao Dong, Hongyu Ren, Bowen Liu, Michele Catasta, and Jure Leskovec. Open Graph Benchmark: Datasets for Machine Learning on Graphs. In NeurIPS, 2020.
 (16) Minseon Kim, Jihoon Tack, and Sung Ju Hwang. Adversarial self-supervised contrastive learning. In NeurIPS, 2020.
 (17) Thomas N. Kipf and Max Welling. Variational Graph Auto-Encoders. In NeurIPS, 2016.
 (18) Thomas N. Kipf and Max Welling. Semi-Supervised Classification with Graph Convolutional Networks. In ICLR, 2017.
 (19) Haoyang Li, Xin Wang, Ziwei Zhang, Zehuan Yuan, Hang Li, and Wenwu Zhu. Disentangled Contrastive Learning on Graphs. In NeurIPS, 2021.
 (20) Yanbei Liu, Xiao Wang, Shu Wu, and Zhitao Xiao. Independence promoted graph disentangled networks. In AAAI, 2020.
 (21) Jianxin Ma, Peng Cui, Kun Kuang, Xin Wang, and Wenwu Zhu. Disentangled Graph Convolutional Networks. In ICML, 2019.

 (22) Aleksander Madry, Aleksandar Makelov, Ludwig Schmidt, Dimitris Tsipras, and Adrian Vladu. Towards deep learning models resistant to adversarial attacks. In ICLR, 2018.
 (23) Daniel Moyer, Shuyang Gao, Rob Brekelmans, Aram Galstyan, and Greg Ver Steeg. Invariant representations without adversarial training. In NeurIPS, 2018.

 (24) A. Narayanan, Mahinthan Chandramohan, R. Venkatesan, Lihui Chen, Yang Liu, and Shantanu Jaiswal. graph2vec: Learning Distributed Representations of Graphs. ArXiv, 2017.
 (25) Jiezhong Qiu, Qibin Chen, Yuxiao Dong, Jing Zhang, Hongxia Yang, Ming Ding, Kuansan Wang, and Jie Tang. GCC: Graph Contrastive Coding for Graph Neural Network Pre-Training. In KDD, 2020.
 (26) Ravid Shwartz-Ziv and Naftali Tishby. Opening the Black Box of Deep Neural Networks via Information. arXiv:1703.00810 [cs], April 2017.
 (27) Martin Simonovsky and Nikos Komodakis. GraphVAE: Towards generation of small graphs using variational autoencoders. In ICLR, 2018.
 (28) Fan-Yun Sun, Jordan Hoffmann, Vikas Verma, and Jian Tang. InfoGraph: Unsupervised and Semi-supervised Graph-Level Representation Learning via Mutual Information Maximization. In ICLR, 2019.
 (29) Susheel Suresh, Pan Li, Cong Hao, and Jennifer Neville. Adversarial Graph Augmentation to Improve Graph Contrastive Learning. In NeurIPS, 2021.
 (30) Naftali Tishby, Fernando C Pereira, and William Bialek. The information bottleneck method. arXiv preprint physics/0004057, 2000.
 (31) Michael Tschannen, Josip Djolonga, Paul K Rubenstein, Sylvain Gelly, and Mario Lucic. On mutual information maximization for representation learning. In ICLR, 2019.
 (32) Aaron van den Oord, Yazhe Li, and Oriol Vinyals. Representation Learning with Contrastive Predictive Coding. arXiv e-prints, 2018.
 (33) Petar Veličković, Guillem Cucurull, Arantxa Casanova, Adriana Romero, Pietro Liò, and Yoshua Bengio. Graph Attention Networks. In ICLR, 2018.
 (34) Petar Veličković, William Fedus, William L. Hamilton, Pietro Liò, Yoshua Bengio, and R. Devon Hjelm. Deep Graph Infomax. In ICLR, 2019.
 (35) Tailin Wu, Hongyu Ren, Pan Li, and Jure Leskovec. Graph Information Bottleneck. In NeurIPS. Curran Associates, Inc., 2020.
 (36) Zhirong Wu, Yuanjun Xiong, Stella X. Yu, and Dahua Lin. Unsupervised Feature Learning via Non-Parametric Instance Discrimination. In CVPR, 2018.
 (37) Cihang Xie, Mingxing Tan, Boqing Gong, Jiang Wang, Alan L Yuille, and Quoc V Le. Adversarial examples improve image recognition. In CVPR, 2020.
 (38) Dongkuan Xu, Wei Cheng, Dongsheng Luo, Haifeng Chen, and Xiang Zhang. InfoGCL: Information-Aware Graph Contrastive Learning. In NeurIPS, 2021.
 (39) Keyulu Xu, Weihua Hu, Jure Leskovec, and Stefanie Jegelka. How Powerful are Graph Neural Networks? In ICLR, 2019.
 (40) Longqi Yang, Liangliang Zhang, and Wenjing Yang. Graph Adversarial SelfSupervised Learning. In NeurIPS, 2021.
 (41) Yuning You, Tianlong Chen, Yang Shen, and Zhangyang Wang. Graph contrastive learning automated. In ICLR, 2021.
 (42) Yuning You, Tianlong Chen, Yongduo Sui, Ting Chen, Zhangyang Wang, and Yang Shen. Graph Contrastive Learning with Augmentations. In NeurIPS, 2020.
 (43) Junchi Yu, Tingyang Xu, Yu Rong, Yatao Bian, Junzhou Huang, and Ran He. Graph Information Bottleneck for Subgraph Recognition. In ICLR, 2020.
 (44) Long Zhao, Yuxiao Wang, Jiaping Zhao, Liangzhe Yuan, Jennifer J. Sun, Florian Schroff, Hartwig Adam, Xi Peng, Dimitris Metaxas, and Ting Liu. Learning View-Disentangled Human Pose Representation by Contrastive Cross-View Mutual Information Maximization. In CVPR, 2021.
 (45) Chen Zhu, Yu Cheng, Zhe Gan, Siqi Sun, Tom Goldstein, and Jingjing Liu. Freelb: Enhanced adversarial training for natural language understanding. In ICLR, 2020.
 (46) Yanqiao Zhu, Yichen Xu, Qiang Liu, and Shu Wu. An Empirical Study of Graph Contrastive Learning. arXiv.org, 2021.