Hierarchical Graph Transformer with Adaptive Node Sampling

10/08/2022
by Zaixi Zhang, et al.

The Transformer architecture has achieved remarkable success in a number of domains, including natural language processing and computer vision. On graph-structured data, however, transformers have yet to achieve competitive performance, especially on large graphs. In this paper, we identify the main deficiencies of current graph transformers: (1) existing node sampling strategies are agnostic to both the graph characteristics and the training process; (2) most sampling strategies focus only on local neighbors and neglect long-range dependencies in the graph. We conduct experimental investigations on synthetic datasets to show that existing sampling strategies are sub-optimal. To tackle these problems, we formulate node sampling in graph transformers as an adversarial bandit problem, where the rewards are related to the attention weights and vary over the course of training. Meanwhile, we propose a hierarchical attention scheme with graph coarsening that captures long-range interactions while reducing computational complexity. Finally, extensive experiments on real-world datasets demonstrate the superiority of our method over existing graph transformers and popular GNNs.
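To make the bandit formulation concrete, here is a minimal, hypothetical Python sketch of an EXP3-style adversarial bandit that picks which nodes to sample, with the reward taken from the attention weight the model assigns to the sampled node. The class name Exp3NodeSampler, the exploration rate gamma, and the update rule are illustrative assumptions, not the authors' implementation.

# Illustrative sketch only: EXP3-style adversarial bandit for node sampling.
# Rewards come from attention weights, so arms whose nodes the transformer
# attends to strongly get sampled more often as training proceeds.
import math
import random

class Exp3NodeSampler:
    def __init__(self, num_nodes, gamma=0.1):
        self.gamma = gamma                   # exploration rate (assumed)
        self.weights = [1.0] * num_nodes     # one arm per candidate node

    def probabilities(self):
        total = sum(self.weights)
        k = len(self.weights)
        return [(1 - self.gamma) * w / total + self.gamma / k
                for w in self.weights]

    def sample(self):
        # Draw one candidate node from the current exploration/exploitation mix.
        probs = self.probabilities()
        r, acc = random.random(), 0.0
        for i, p in enumerate(probs):
            acc += p
            if r < acc:
                return i, p
        return len(probs) - 1, probs[-1]

    def update(self, arm, prob, reward):
        # Importance-weighted EXP3 update; reward in [0, 1], e.g. the attention
        # weight the transformer assigned to the sampled node this step.
        est = reward / prob
        self.weights[arm] *= math.exp(self.gamma * est / len(self.weights))

# Usage: sample a node, run a training step, feed back its attention weight.
sampler = Exp3NodeSampler(num_nodes=5)
node, p = sampler.sample()
attention_weight = 0.8   # placeholder; would come from the model
sampler.update(node, p, attention_weight)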
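The hierarchical attention idea can likewise be illustrated with a toy sketch: cluster nodes into super-nodes via graph coarsening, then let each node attend over its sampled local neighbors plus the coarse super-nodes, so long-range information flows at far lower cost than full attention. The functions coarsen and hierarchical_attention below are hypothetical names, and the modulo clustering stands in for a real graph partitioner.

# Hypothetical sketch: hierarchical attention over a coarsened graph.
import numpy as np

def coarsen(node_feats, cluster_of):
    # Average node features within each cluster to build super-node features.
    k = max(cluster_of) + 1
    coarse = np.zeros((k, node_feats.shape[1]))
    counts = np.zeros(k)
    for v, c in enumerate(cluster_of):
        coarse[c] += node_feats[v]
        counts[c] += 1
    return coarse / counts[:, None]

def hierarchical_attention(query, local_feats, coarse_feats):
    # One attention head over the union of local neighbors and super-nodes:
    # cost is O(|local| + #clusters) per node instead of O(N).
    keys = np.vstack([local_feats, coarse_feats])
    scores = keys @ query / np.sqrt(len(query))
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()
    return weights @ keys            # attended representation

# Toy usage: 8 nodes, 4 features, 2 clusters (modulo split as a stand-in).
feats = np.random.rand(8, 4)
clusters = [v % 2 for v in range(8)]
coarse = coarsen(feats, clusters)
out = hierarchical_attention(feats[0], feats[1:3], coarse)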

Related research

Gophormer: Ego-Graph Transformer for Node Classification (10/25/2021)
Transformers have achieved remarkable performance in a myriad of fields ...

Diffusing Graph Attention (03/01/2023)
The dominant paradigm for machine learning on graphs uses Message Passin...

Bandit Samplers for Training Graph Neural Networks (06/10/2020)
Several sampling algorithms with variance reduction have been proposed f...

Deformable Graph Transformer (06/29/2022)
Transformer-based models have been widely used and achieved state-of-the...

NAGphormer: Neighborhood Aggregation Graph Transformer for Node Classification in Large Graphs (06/10/2022)
Graph Transformers have demonstrated superiority on various graph learni...

Do Transformers Really Perform Bad for Graph Representation? (06/09/2021)
The Transformer architecture has become a dominant choice in many domain...

RRWKV: Capturing Long-range Dependencies in RWKV (06/08/2023)
Owing to the impressive dot-product attention, the Transformers have bee...
