1 Introduction
Molecular property prediction is widely regarded as one of the most significant tasks in computational drug and material discovery goh2017deep; wu2018moleculenet; chen2018rise. Accurate property prediction helps to evaluate and select chemical molecules with desired characteristics for many downstream applications xiong2019pushing; yang2019analyzing; song2020communicative; ijcai2021309.
With the remarkable success of graph neural networks (GNNs) in various graph-related tasks in recent years wu2020comprehensive, a number of efforts have been made from different directions to design GNN models for molecular property prediction yang2019analyzing; danel2020spatial; maziarka2020molecule; song2020communicative; ijcai2021309. The fundamental idea is to regard the topology of atoms and bonds as a graph and to translate each molecule into a representation vector with a powerful GNN encoder, followed by a prediction module for the specific property.
Along another line of GNN development, graph contrastive learning (GCL) methods wu2021self have shown promising performance in many applications where labeled data is insufficient. The scarcity of labeled data is one of the major obstacles that hinder the prediction performance of GNN models (as well as other deep learning models) for molecular property prediction. For example, collecting labeled data for certain computational drug discovery tasks is usually costly.
Special attention should therefore be paid to developing GCL for molecular property prediction. Existing GCL methods usually adopt data augmentation schemes for the graph, which may change the semantics of graphs across domains. Most current GCL methods on molecular graphs are still based on this data augmentation paradigm, which can inevitably alter the natural structure of a molecule. For example, you2020graph proposes to drop atoms, perturb edges and mask attributes to augment the data. However, since each atom has an effect on the molecular property, such random dropping and perturbation could destroy the structure of a molecule. Though other methods such as MoCL sun2021mocl adopt predefined substructures of the molecule to alleviate the problem of random corruption, such substitution rules may still violate chemical principles.
Our insight is to design a new GCL method for molecular property prediction from different geometric views without corrupting the molecular structure. As shown in Figure 1, molecules can be represented as two-dimensional (2D) and three-dimensional (3D) structural graphs. Although there are emerging geometric GNN models which can make use of multiple factors from the 2D chemical graph or the 3D spatial graph maziarka2020molecule; klicpera_dimenet_2020; shui2020heterogeneous; danel2020spatial, all of them capture the geometric information from a single view for representation learning. In practice, the 2D view and 3D view of a molecule are generated by different methods: the 2D view derives directly from the structural formula of a chemical compound, while the 3D view is usually estimated by a conformation generation procedure hawkins2017conformation using tools like RDKit tosco2014bringing. Considering that molecular graphs in 2D and 3D views can provide chemical and geometric information at different levels and further complement each other, there is a demand for a new paradigm of molecule-driven contrastive learning that does not change chemical semantics. Such different views of one molecule therefore provide a great opportunity for designing a unique graph contrastive learning scheme on molecular graphs.

To tackle the aforementioned challenges, we propose a novel geometric-enhanced graph contrastive learning model (GeomGCL) for molecular property prediction, which is equipped with an adaptive geometric message passing network (GeomMPNN) as well as a contrastive learning strategy to augment the 2D-3D geometric structure learning process. Firstly, we devise a dual-channel geometric learning procedure that adapts the graph aggregation to both views by leveraging distance and angle information at different granularity levels. Secondly, we aim to make both views complement each other for better geometric comprehension. Here, we take the molecular geometry into consideration and propose a 2D-3D geometric contrastive scheme to bridge the knowledge gap between geometric structure modeling and graph representation learning without labels. The representative 3D spatial information can be distilled with the guidance of the stable 2D information, which provides the chemical semantics. The fusion of 2D and 3D graphs helps the proposed GeomGCL extract more expressive representations for property prediction. To summarize, the main contributions of our work are as follows:

To the best of our knowledge, we are among the first to develop a contrastive learning method for molecular graphs based on geometric views. By naturally contrasting the 2D and 3D view graphs of one molecule without any random augmentation process, our proposed GeomGCL makes the best of the consistent and realistic structures for better representation learning.

GeomGCL employs a dual-channel message passing neural network, which adaptively captures both 2D and 3D geometric information of molecular graphs. The additional spatial regularizer further preserves the relative relation of geometry and improves the performance.

Experiments on seven molecular datasets demonstrate that the proposed model outperforms the state-of-the-art GNN and graph contrastive learning methods.
2 Related Work
The research topic of this paper is associated with molecular representation learning based on graph neural networks, especially the promising geometric and contrastive learning on molecular graphs. We will briefly discuss these topics.
2.1 Molecular Representation Learning
As the basis of property prediction, molecular representation learning has been a popular research area. Earlier feature-based methods learn fixed representations from molecular descriptors or chemical fingerprints rogers2010extended, which ignore the graph structures and rely on feature engineering. As recent years have witnessed the great advantage of graph neural networks (GNNs) in modeling graph data wu2020comprehensive, much attention has been paid to applying GNN models to learn molecular graph representations. The graph convolution model duvenaud2015convolutional was first introduced to encode molecular graphs based on atomic features. AttentiveFP xiong2019pushing adopts the graph attention network to make the best of graph structures and has become one of the state-of-the-art methods for molecular property prediction. More recently, incorporating bond features into message passing neural networks gilmer2017neural has become a trend for learning better representations. DMPNN yang2019analyzing performs an edge-based message passing process over the edge-oriented directed graph, which obtains atom and bond embeddings concurrently. The communicative message passing models song2020communicative; ijcai2021309 further extend this work by improving the node-edge interactive kernels and by applying the transformer framework to capture long-range dependencies. Nevertheless, these approaches cannot deal with the geometry of molecules and lack the ability to learn the non-local correlations among atomic nodes on the molecular graph.
2.2 Geometric Learning on Molecular Graphs
In the field of deep learning, geometry-based methods have shown prominent performance bronstein2017geometric. Since molecules intrinsically have geometric structures, a few attempts have also been made to develop geometric graph learning models for molecular graphs atz2021geometric. From the 2D view of the molecular graph, MAT maziarka2020molecule encodes the interatomic distances by augmenting the attention mechanism in a transformer architecture. Meanwhile, there are some efforts to model the 3D molecular structures. SGCN danel2020spatial simply utilizes the 3D coordinates in the aggregation process. Such an intuitive method is sensitive to the coordinate system, which leads to poor performance in learning geometric graphs. Furthermore, several models that are invariant to translation and rotation of atom coordinates have been proposed by designing geometric kernels klicpera_dimenet_2020 or by strengthening the node-edge interactions with geometric information shui2020heterogeneous. However, these studies have limitations. Firstly, most of the efficient molecule-oriented geometric learning methods target quite small molecules for quantum property prediction. Secondly, none of these methods incorporates 2D and 3D geometric information simultaneously. To overcome these limitations, we propose to learn the 2D-3D geometric factors adaptively and synergistically.
2.3 Contrastive Learning on Molecular Graphs
Along another line of development, graph contrastive learning methods wu2021self have achieved extraordinary performance in many applications you2020graph; qiu2020gcc; wang2021self. However, contrastive models designed specifically for molecular graphs have received little attention. InfoGraph sun2019infograph maximizes the mutual information between the representations of the graph and its substructures to guide the molecular representation learning. To alleviate the problem of random corruption on molecular graphs, which may alter the chemical semantics, MoCL sun2021mocl adopts a domain knowledge-driven contrastive learning framework at both the local and global levels to preserve the semantics of graphs in the augmentation process. However, the learning ability of such a model depends on well-designed substitution rules. The lack of geometric information in graph contrastive learning also limits the capability of effective molecular representation learning. To this end, we develop a novel contrastive learning model that integrates the geometry of molecules with chemical semantics by contrasting the 2D and 3D view graphs.
3 Preliminaries
Generally, a molecular graph with geometric information can be represented as , where and denote the node (atom) set and edge (bond) set respectively. denotes the coordinate matrix for atoms, where is the spatial dimension. Given a molecule, the specific 2D and 3D view graphs are defined as follows.
Definition 1 (2D View Graph.)
The 2D view graph is defined as , where the edges in correspond to the primary covalent bonds in the molecule, and denotes the coordinate of atom .
Definition 2 (3D View Graph.)
Similarly, the 3D view graph can be represented as . Note that the generated coordinate in three dimensional space is nondeterministic by means of the estimation algorithm, thus we repeatedly generate coordinates times and . The edge set is constructed based on the 3D spatial coordinates, which contains all edges whose distances are smaller than the cutoff threshold . It can be formulated as .
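As a minimal sketch of the cutoff rule in Definition 2, the following pure-Python snippet connects every pair of atoms whose Euclidean distance is smaller than a threshold. The function name `build_3d_edges` and the toy coordinates are illustrative assumptions, and the sketch handles a single set of coordinates rather than the repeated conformer generation described above.

```python
import math
from itertools import combinations


def build_3d_edges(coords, cutoff):
    """Return undirected edges (i, j, distance) for atom pairs closer than `cutoff`."""
    edges = []
    for i, j in combinations(range(len(coords)), 2):
        dist = math.dist(coords[i], coords[j])
        if dist < cutoff:
            edges.append((i, j, dist))
    return edges


# Toy coordinates in angstroms (illustrative only, not a real conformer).
coords = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (3.0, 0.0, 0.0), (9.0, 0.0, 0.0)]
edges = build_3d_edges(coords, cutoff=4.0)
edge_pairs = [(i, j) for i, j, _ in edges]
```

With a cutoff of 4 Å the isolated fourth atom receives no edges, while the first three atoms become fully connected, including the non-bonded pair (0, 2).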
Problem Statement. Given a molecule, we can construct the 2D view graph and 3D view graph . Let be the atom feature matrix and be the bond feature matrix. Our goal is to train a geometric graph encoder to learn the molecular representation vector without any label information. The well-trained model is then utilized for various downstream property prediction tasks through fine-tuning.
4 Model Framework
In this section, we present the proposed contrastive learning framework GeomGCL, which leverages dual geometric views of the molecular graph. As shown in Figure 2, after the derivation of 2D and 3D view graphs from the original molecule in SMILES format, GeomGCL uses a dual-channel geometric message passing architecture (GeomMPNN) to learn the representations of both graphs adaptively. In contrast with the deterministic graph structure of the 2D view, the 3D structure of a molecule is always calculated through a stochastic process of 3D conformation generation. Consequently, while the 3D view graph contains more abundant geometric structure, such uncertain information is not always beneficial for molecular representation learning. We address this challenge by adopting a geometric contrastive learning strategy across the 2D and 3D views. This geometric-view supervision makes both views complement each other. On the one hand, GeomMPNN can distill the valuable 3D structure under the guidance of the 2D view. On the other hand, it helps to inject 3D geometric information for better 2D molecular representation learning. In the following sections, we use the bold letters (, , , , or ) to represent embeddings of the corresponding symbolic indicators (, , , , or ).
4.1 Geometric Embedding
Since the primary 2D or 3D coordinates are changeable and inconsistent across different coordinate systems, we calculate the definite geometric factors (i.e., angle and distance) and then utilize radial basis functions (RBF) to obtain dual-level geometric embeddings. As illustrated in Figure 1, the local distance and 2D angle refer to the distance and angle based on the covalent bonds respectively, which carry the critical chemical information. In the 3D view graph, the global distance further provides non-local correlations in a molecule, while the 3D angle indicates the spatial distribution of global connections. Following the previous work shui2020heterogeneous, we adopt several RBF layers to encode diverse geometric factors:
(1)  
(2)  
(3)  
(4) 
where is the concatenation operation over scalar values to form a K-dimensional geometric embedding. For local and global distances, the K central points are uniformly selected between ( is or ) and 1, while . For 2D and 3D angles, each is between 0 and with , where denotes or .
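The general idea of an RBF expansion can be sketched as follows: a scalar geometric factor is compared against K uniformly spaced centers and each comparison yields one entry of the embedding. This is only an illustration under stated assumptions. The Gaussian form, the width `beta` and the helper name `rbf_embed` are hypothetical choices, and the sketch does not reproduce the exact distance transform or center range used in Equations (1)-(4).

```python
import math


def rbf_embed(value, low, high, num_centers, beta):
    """Expand a scalar into a K-dimensional vector of Gaussian RBF responses."""
    step = (high - low) / (num_centers - 1)
    centers = [low + k * step for k in range(num_centers)]
    return [math.exp(-beta * (value - mu) ** 2) for mu in centers]


# Embed a right angle with 5 centers spread uniformly over [0, pi].
angle_emb = rbf_embed(math.pi / 2, 0.0, math.pi, num_centers=5, beta=2.0)
```

The response is largest at the center closest to the input value and decays smoothly for centers farther away, which turns a single angle or distance into a vector that downstream layers can weight per component.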
4.2 Adaptive Geometric Message Passing
As shown in Figure 3, we further design an adaptive message passing scheme (GeomMPNN) to learn the topological structures of molecules with geometric information in a node-edge interactive manner. On the whole, GeomMPNN consists of three-stage message passing layers (Node-Edge, Edge-Edge and Edge-Node) that iteratively update the node and edge embeddings, followed by a Node-Graph attentive pooling process. Both channels of the dual-channel network generally follow this architecture and can adaptively learn the 2D and 3D geometric factors with fine-grained designs.
(i) Node-Edge Message Passing.
Since only the existing bonds in a molecule have initial edge features (i.e., bond features), the edge embedding should first be updated by aggregating the pairwise node embeddings together with the associated features. To enrich the connection information from different aspects, we use an MLP to integrate the chemical bond feature and global distance embeddings for the 2D and 3D edge embeddings respectively:
(5)  
(6) 
where and are edge embeddings at th layer, is the node (atom) embedding, the superscript or indicates the view channel, and represents the concatenation operator. The dual-channel edge embeddings can then contain both geometric and chemical semantic information.
(ii) Edge-Edge Message Passing.
Different from a general graph, both the 2D view and 3D view graphs of a molecule have unique geometric attributes, which can significantly influence the specific property of the molecule. After the derivation of edge embeddings, GeomMPNN performs an edge-edge message passing process to perceive the geometric distribution in the molecule through 2D and 3D angle-aware aggregations. Considering that the neighbors in the 2D view are sparser than the neighbors in the 3D view, we develop dedicated layers for each view. Specifically, for 2D view graph learning, the following angle-injected function is employed to update the 2D edge embedding:
(7) 
where denotes the set of neighboring edges of the edge , is the 2D angle embedding between the edge and , is the element-wise dot operation, and are learnable parameters. For the 3D view graph, there are sufficient neighbors around each edge. Inspired by the recent work li2021structure, we divide the neighboring edges of each target edge into several angle domains according to 3D spatial angle . Then we apply a hierarchical aggregation process among edges for 3D view graph learning, which consists of local and global stages:
(8)  
(9) 
where is the aggregated 3D edge embedding at th angle domain through the local stage, and are trainable parameters, is the 3D angle embedding between the edge and . means the max pooling function over all local edge embeddings, which can generally extract the high-level spatial distribution information to strengthen the geometric structure learning.
(iii) Edge-Node Message Passing.
After obtaining the angle-aware edge embeddings and , we apply the edge-node message passing to complete the propagation process from the edges back to the nodes. The essential distance factor between nodes is taken into account through an adaptive scheme similar to that of the previous edge-edge component:
(10)  
(11) 
where , , and are learnable parameters, and are the geometric embeddings of local and global distances respectively, is the set of all neighboring edges for node in 2D view, is the set of neighboring edges located in th distance domain among all divided domains in 3D view.
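The domain-wise aggregation used for the 3D view can be illustrated with a simplified sketch: neighbors are bucketed into domains by a geometric factor, aggregated within each domain (local stage), and then max-pooled across domains (global stage). Several parts are assumptions made for illustration: the equal-width partition, the plain summation in place of the learnable per-domain transforms, and the names `domain_index` and `hierarchical_aggregate`. The example buckets by 3D angle over [0, π]; the same bucketing would apply to distances over [0, cutoff].

```python
import math


def domain_index(value, upper, num_domains):
    """Map a value in [0, upper] to one of `num_domains` equal-width bins."""
    idx = int(value / upper * num_domains)
    return min(idx, num_domains - 1)  # value == upper falls into the last bin


def hierarchical_aggregate(neighbors, num_domains, dim):
    """neighbors: list of (angle, embedding). Sum per angle domain, then max-pool."""
    # Local stage: one accumulator per domain (empty domains stay at zero).
    local = [[0.0] * dim for _ in range(num_domains)]
    for angle, emb in neighbors:
        d = domain_index(angle, math.pi, num_domains)
        local[d] = [a + b for a, b in zip(local[d], emb)]
    # Global stage: element-wise max over the per-domain vectors.
    return [max(vec[c] for vec in local) for c in range(dim)]


neighbors = [(0.1, [1.0, 0.0]), (0.2, [2.0, 1.0]), (3.0, [0.5, 4.0])]
pooled = hierarchical_aggregate(neighbors, num_domains=4, dim=2)
```

In this toy case the first two neighbors share the first angle domain and the third falls into the last one, so the pooled vector takes its first component from the first domain and its second component from the last.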
(iv) Node-Graph Attentive Pooling.
Since all geometric factors thoroughly enrich the node and edge embeddings in both 2D and 3D views via stacked message passing layers, the final representations and can reflect the graph topology along with the molecular geometry. To further obtain the graph-level representation while identifying the important nodes, we follow xiong2019pushing and adopt the node-graph attentive pooling layer. For simplicity, we use to represent or and use to represent or . The graph-level embedding is updated iteratively through the attentive propagation process, which starts with the initial embedding .
(12)  
(13) 
where is the global context message at th pooling layer, which aggregates the valuable nodes from the full set . After performing pooling layers, the final graph representations for 2D view and for 3D view are acquired.
4.3 Geometric Contrastive Optimization
Despite the great progress made in the study of graph contrastive learning, the distinctive semantics and geometry of molecular graphs are always ignored. As a result, the conventional data augmentations may change the targeted molecular property as stated in sun2021mocl, which emphasized the importance of domain knowledge to generate molecule variants. To tackle this challenge, we intend to enhance the molecular representation learning from the perspective of geometric contrast. As shown in Figure 2(c), the correlated 2D and 3D views can supervise each other without constructing additional fake samples. First, the 2D and 3D projection heads are adopted to map the representations of two views into the space for contrastive learning:
(14) 
Our goal is to maximize the consistency between 2D-3D positive pairs compared with negative pairs. Given one batch with molecules, we have the following contrastive loss function under 2D-3D geometric views:
(15) 
where denotes the inner product to measure the similarity, is the scale parameter. Since and are two embeddings in different views from the same molecule, they are regarded as a positive pair while the remaining pairs in the batch are considered as negative pairs. Furthermore, to reflect the local spatial correlations across 3D geometric domains, we propose additional constraints as a spatial regularization technique. The key idea of this regularizer is to encourage the transformation matrices of adjacent angle domains to be similar to each other:
(16) 
Finally, we combine the spatial and contrastive loss and arrive at the following objective function:
(17) 
where is the number of molecules in the dataset, is the tradeoff parameter that controls the importance of spatial regularizer to better guide the representation learning.
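The objective described above can be sketched in pure Python under explicit assumptions. The contrastive term below is a standard InfoNCE-style loss that uses the inner product as similarity and a scale parameter `tau`, with the 2D embeddings as anchors; whether the original loss is also applied in the 3D-to-2D direction is not recoverable from the text. The regularizer penalizes squared differences between the transformation matrices of adjacent domains; the squared Frobenius form and the non-cyclic adjacency are assumptions. All function names are hypothetical.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def contrastive_loss(z2d, z3d, tau):
    """Mean InfoNCE loss with 2D embeddings as anchors and 3D embeddings as candidates."""
    total = 0.0
    for i, anchor in enumerate(z2d):
        logits = [dot(anchor, cand) / tau for cand in z3d]
        m = max(logits)  # subtract the max for numerical stability
        log_denominator = m + math.log(sum(math.exp(l - m) for l in logits))
        total += log_denominator - logits[i]  # positive pair shares index i
    return total / len(z2d)


def spatial_regularizer(weights):
    """Sum of squared differences between weight matrices of adjacent domains."""
    reg = 0.0
    for w_a, w_b in zip(weights, weights[1:]):
        reg += sum((a - b) ** 2
                   for row_a, row_b in zip(w_a, w_b)
                   for a, b in zip(row_a, row_b))
    return reg


z2d = [[1.0, 0.0], [0.0, 1.0]]
z3d_aligned = [[1.0, 0.0], [0.0, 1.0]]
z3d_swapped = [[0.0, 1.0], [1.0, 0.0]]
loss_aligned = contrastive_loss(z2d, z3d_aligned, tau=0.5)
loss_swapped = contrastive_loss(z2d, z3d_swapped, tau=0.5)

domain_weights = [
    [[1.0, 0.0], [0.0, 1.0]],
    [[1.0, 0.0], [0.0, 1.0]],
    [[2.0, 0.0], [0.0, 1.0]],
]
# Combined objective with a trade-off weight of 0.01 on the regularizer.
objective = loss_aligned + 0.01 * spatial_regularizer(domain_weights)
```

The loss is lower when each 2D embedding is most similar to the 3D embedding of the same molecule than when the pairs are mismatched, which is the behavior the contrastive objective is meant to reward.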
4.4 Downstream Inference
When GeomGCL has been optimized through the geometric contrastive learning process, we utilize the well-trained 2D geometric MPNN and 3D geometric MPNN for downstream applications. The representations of the two views are combined to predict the molecular properties via fine-tuning. Formally, the prediction head can be written as follows:
(18) 
For different tasks, we use the cross entropy loss function for classification loss and use the L1 loss function for regression loss . The spatial regularizer is also adopted for better performance.
(19)  
(20) 
where denotes the predicted value and is the measured true value of one specific molecular property.
5 Experiments
In this section, we conduct experiments on seven well-known benchmark datasets to demonstrate the effectiveness of GeomGCL for molecular property prediction.
5.1 Experiment Settings
Datasets.
To compare the performance of our proposed model with existing molecular representation learning methods, we use seven molecular datasets from MoleculeNet wu2018moleculenet: four physiology datasets (ClinTox, Sider, Tox21 and ToxCast) for graph classification tasks, and three physical chemistry datasets (ESOL, FreeSolv and Lipophilicity) for graph regression tasks. The main statistics of the datasets are summarized in Table 1.
Baselines.
We compare the proposed GeomGCL with a variety of state-of-the-art baseline models for molecular property prediction, which include molecular message passing-based methods, geometry learning-based GNN methods, and graph contrastive learning methods. The first group consists of three well-designed message passing neural networks. AttentiveFP xiong2019pushing adopts the graph attention network for molecular representation learning. DMPNN yang2019analyzing and CoMPT ijcai2021309 are message passing models that consider edge features in a node-edge interactive manner. The geometry learning-based group contains several GNN approaches. SGCN danel2020spatial directly encodes the atomic position information in the aggregation process. MAT maziarka2020molecule incorporates the local geometric distance into the graph-based transformer model. To comprehensively assess our proposed model, we also compare GeomGCL against HMGNN shui2020heterogeneous and DimeNet klicpera_dimenet_2020, both of which can learn the geometric distance and angle factors in 3D space for quantum property prediction. Besides, recent graph contrastive models are compared to show the power of our proposed 2D-3D geometric contrastive learning strategy. InfoGraph sun2019infograph maximizes the mutual information between nodes and graphs, while MoCL sun2021mocl introduces multi-level domain knowledge for molecular graphs in a well-designed contrastive learning framework.
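Table 1: Statistics of the datasets.

| Dataset | # Tasks | Task Type | # Molecules |
|---|---|---|---|
| ClinTox | 2 | Classification | 1484 |
| Sider | 27 | Classification | 1427 |
| Tox21 | 12 | Classification | 7831 |
| ToxCast | 617 | Classification | 8597 |
| ESOL | 1 | Regression | 1128 |
| FreeSolv | 1 | Regression | 643 |
| Lipophilicity | 1 | Regression | 4200 |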
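
Table 2: Performance of all models. ClinTox, Sider, Tox21, ToxCast and Cls.Ave report ROC-AUC for graph classification (higher is better); ESOL, FreeSolv, Lipophilicity and Reg.Ave report RMSE for graph regression (lower is better).

| Model | ClinTox | Sider | Tox21 | ToxCast | Cls.Ave | ESOL | FreeSolv | Lipophilicity | Reg.Ave |
|---|---|---|---|---|---|---|---|---|---|
| AttentiveFP | 0.808 | 0.605 | 0.835 | 0.743 | 0.748 | 0.578 | 1.034 | 0.602 | 0.738 |
| DMPNN | 0.886 | 0.637 | 0.848 | 0.743 | 0.779 | 0.647 | 1.092 | 0.591 | 0.777 |
| CoMPT | 0.877 | 0.626 | 0.836 | 0.755 | 0.774 | 0.589 | 1.103 | 0.590 | 0.761 |
| SGCN | 0.825 | 0.560 | 0.769 | 0.656 | 0.703 | 1.329 | 2.061 | 1.075 | 1.488 |
| MAT | 0.898 | 0.619 | 0.834 | 0.735 | 0.772 | 0.624 | 1.059 | 0.705 | 0.796 |
| HMGNN | 0.680 | 0.607 | 0.794 | 0.702 | 0.696 | 0.701 | 1.207 | 0.720 | 0.876 |
| DimeNet | 0.760 | 0.615 | 0.780 | 0.645 | 0.700 | 0.633 | 0.978 | 0.614 | 0.742 |
| InfoGraph | 0.781 | 0.585 | 0.793 | 0.705 | 0.716 | 0.914 | 2.104 | 0.845 | 1.288 |
| MoCL | 0.739 | 0.629 | 0.824 | 0.718 | 0.727 | 0.934 | 1.478 | 0.742 | 1.051 |
| GeomMPNN | 0.900 | 0.638 | 0.838 | 0.743 | 0.780 | 0.555 | 0.913 | 0.578 | 0.682 |
| GeomGCL | 0.919 | 0.648 | 0.850 | 0.763 | 0.796 | 0.575 | 0.866 | 0.541 | 0.661 |
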
Implementation Details.
Following previous works, we evaluate all methods through k-fold cross-validation experiments, and we set k to 10 to report robust average experimental results. As recommended by the MoleculeNet benchmarks wu2018moleculenet, we randomly split each dataset into training, validation, and testing sets with a ratio of 0.8/0.1/0.1. The validation set is used for early stopping and model selection. We use ROC-AUC and RMSE metrics for graph classification and graph regression tasks respectively.
The 3D structures of molecules are generated for times through the stochastic optimization algorithm of Merck Molecular Force Field (MMFF), which is implemented in the RDKit package tosco2014bringing. For our model, we use the Adam optimizer for model training with a learning rate of 1e-3. We set the batch size to 256 for contrastive learning and 32 for fine-tuning with the scale parameter . The hidden size of all models is set to 128. The cutoff distance is determined (4 Å or 5 Å) according to the size of the molecules in each dataset. We set the dimension of the geometric embedding to 64. The numbers of 3D angle domains and global distance domains are set to 4. The balancing hyperparameter is set to 0.01 according to the performance on the validation set. For baseline models, we tune the parameters of each method based on the settings recommended in the respective paper to ensure the best performance. As a general setting sun2019infograph; sun2021mocl, our proposed GeomGCL and the contrastive learning baselines are pre-trained on the molecular graphs of each dataset, and then we fine-tune the model for the downstream task on the same dataset.
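The random 0.8/0.1/0.1 split can be sketched as below. How the ten evaluation runs are formed is not specified in detail, so repeating the split with ten different seeds is only one plausible reading and is an assumption here; the helper name `random_split` and the seed values are hypothetical.

```python
import random


def random_split(num_items, seed, ratios=(0.8, 0.1, 0.1)):
    """Shuffle indices with a fixed seed and cut them into train/valid/test parts."""
    indices = list(range(num_items))
    random.Random(seed).shuffle(indices)
    n_train = int(ratios[0] * num_items)
    n_valid = int(ratios[1] * num_items)
    return (indices[:n_train],
            indices[n_train:n_train + n_valid],
            indices[n_train + n_valid:])  # the remainder goes to the test part


# ESOL has 1128 molecules (Table 1); one split per seed, ten seeds in total.
splits = [random_split(1128, seed=s) for s in range(10)]
train_idx, valid_idx, test_idx = splits[0]
```

Fixing the seed per run keeps each split reproducible, so that all compared models can be trained and evaluated on identical partitions.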
5.2 Performance Evaluation
Overall Comparison.
The performance of each model on graph classification and regression tasks is presented in Table 2. As we can see, our model outperforms all the baselines on average for both types of tasks. On the whole, our proposed GeomGCL improves the average performance over the best message passing baselines by 2.18% and 10.4% for classification and regression tasks, respectively.
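The reported improvements can be checked against the averages in Table 2. The short computation below assumes they are relative gains over the best message passing baseline in each column, namely DMPNN for Cls.Ave and AttentiveFP for Reg.Ave; the helper name `relative_gain` is hypothetical.

```python
def relative_gain(baseline, ours, higher_is_better):
    """Relative improvement over the baseline, in percent."""
    delta = ours - baseline if higher_is_better else baseline - ours
    return 100.0 * delta / baseline


# Cls.Ave (ROC-AUC): DMPNN 0.779 vs. GeomGCL 0.796.
cls_gain = relative_gain(0.779, 0.796, higher_is_better=True)
# Reg.Ave (RMSE): AttentiveFP 0.738 vs. GeomGCL 0.661.
reg_gain = relative_gain(0.738, 0.661, higher_is_better=False)
```

Rounded, these give 2.18% for classification and 10.4% for regression, matching the figures stated above.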
Among all baseline approaches, the well-designed message passing models generally show the best performance, which indicates that the essential bond features of the molecule can provide chemical semantic information for molecular representation learning. Specifically, since DMPNN and CoMPT adopt the node-edge interactive scheme, they perform better than AttentiveFP on the classification tasks. As to geometry-based baseline models, MAT can take advantage of the geometric distance from the molecular graph and performs much better than SGCN, which directly encodes the 3D coordinates and can be easily affected by the coordinate system. Although HMGNN and DimeNet can identify the distance and angle information, they learn the molecular embedding based only on the 3D geometric graph, which might be noisy. Besides, these models are designed for quantum property prediction and may not be well suited to modeling larger molecules. By contrast, our GeomMPNN and GeomGCL are capable of learning from the stable 2D graph and the informative 3D geometric structure. For contrastive learning methods, MoCL achieves better average results than InfoGraph, showing the significance of domain knowledge for developing a contrastive strategy that does not change the chemical semantics. However, the failure to leverage the critical geometric information limits their ability to model the molecule without labels, while our method can capture geometry-aware structural information by contrasting the 2D-3D geometric views. Therefore, GeomGCL is more effective for molecular representation learning and can accurately predict each targeted property.
Ablation Study.
To further investigate the factors that influence the performance of the proposed GeomGCL framework, we conduct an ablation study on four benchmark datasets for classification and regression tasks by designing different variants of GeomGCL.

GeomMPNN-2D only preserves the single-channel 2D geometric message passing layers.

GeomMPNN-3D only preserves the single-channel 3D geometric message passing layers.

GeomMPNN uses the dual-channel geometric message passing layers without contrastive learning.

GeomGCL-NoReg removes the spatial regularizer .
As shown in Figure 4, GeomGCL achieves the best performance among all architectures, demonstrating the necessity of learning the 2D-3D geometric structures contrastively and synergistically. To be specific, neither the single 2D view nor the single 3D view consistently outperforms the other across different datasets, which supports our hypothesis that a single geometric view of the molecular graph is not sufficient. Accordingly, removing either the 2D or the 3D geometric message passing layers from GeomGCL yields a significant drop in performance. Additionally, the use of the spatial regularizer when training the model helps GeomGCL to discriminate the relative correlations of different angle domains and thus contributes to the performance improvements. Moreover, GeomMPNN performs significantly worse than GeomGCL, which confirms that our geometric contrastive learning scheme is beneficial for molecular representation learning.
Parameter Analysis.
Finally, we analyze the performance variation of GeomGCL by varying the coefficient to look deeper into the impact of the spatial regularizer. As depicted in Figure 5, the performance first tends to improve as more 3D angle domain information is incorporated during training, and then begins to drop off slightly. An appropriate trade-off weight can assist the model in identifying the geometric factors and enhancing the representation learning. Overall, the performance of GeomGCL is stable and always better than baseline methods.
6 Conclusion
In this paper, we propose a novel geometric graph contrastive learning framework named GeomGCL for molecular representation learning, which builds a bridge between geometric structure learning and graph contrastive learning. Along this line, we design dual-channel geometric message passing neural networks to capture the distance and angle information under both 2D and 3D views. A geometry-based contrastive learning strategy with a spatial regularizer is then proposed to enhance the molecular representation learning. The experimental results on the downstream property prediction tasks demonstrate the effectiveness of the proposed GeomGCL.