Unleashing the Power of Transformer for Graphs
Despite recent successes in natural language processing and computer vision, Transformer suffers from the scalability problem when dealing with graphs. The computational complexity is unacceptable for large-scale graphs, e.g., knowledge graphs. One solution is to consider only the near neighbors, which, however, will lose the key merit of Transformer to attend to the elements at any distance. In this paper, we propose a new Transformer architecture, named dual-encoding Transformer (DET). DET has a structural encoder to aggregate information from connected neighbors and a semantic encoder to focus on semantically useful distant nodes. In comparison with resorting to multi-hop neighbors, DET seeks the desired distant neighbors via self-supervised training. We further find these two encoders can be incorporated to boost each others' performance. Our experiments demonstrate DET has achieved superior performance compared to the respective state-of-the-art methods in dealing with molecules, networks and knowledge graphs with various sizes.
READ FULL TEXT