Tree-structured Attention with Hierarchical Accumulation

02/19/2020
by Xuan-Phi Nguyen, et al.

Incorporating hierarchical structures like constituency trees has been shown to be effective for various natural language processing (NLP) tasks. However, it is evident that state-of-the-art (SOTA) sequence-based models like the Transformer struggle to encode such structures inherently. On the other hand, dedicated models like the Tree-LSTM, while explicitly modeling hierarchical structures, do not perform as efficiently as the Transformer. In this paper, we attempt to bridge this gap with "Hierarchical Accumulation" to encode parse tree structures into self-attention in constant time. Our approach outperforms SOTA methods in four IWSLT translation tasks and the WMT'14 English-German translation task. It also yields improvements over the Transformer and Tree-LSTM on three text classification tasks. We further demonstrate that using hierarchical priors can compensate for data shortage, and that our model prefers phrase-level attentions over token-level attentions.
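To make the idea of tree-structured attention concrete, the following is a minimal, simplified sketch rather than the paper's actual Hierarchical Accumulation algorithm: it assumes a constituency parse given as hypothetical (start, end) spans, builds one vector per phrase node by mean-pooling the token vectors it covers, and then lets each token attend jointly over token-level and phrase-level units. The function name `phrase_augmented_attention` and the span-based input format are illustrative assumptions, not the authors' API.

```python
import torch
import torch.nn.functional as F

def phrase_augmented_attention(tokens, spans):
    """Toy sketch of tree-augmented self-attention (not the paper's exact method).

    tokens : (n, d) tensor of token representations (leaves of the parse tree)
    spans  : list of (start, end) pairs, one per nonterminal node, end exclusive
    """
    # Build one vector per phrase node from the leaves it covers. Mean pooling
    # stands in for the paper's more elaborate hierarchical accumulation.
    nodes = torch.stack([tokens[s:e].mean(dim=0) for s, e in spans])   # (m, d)

    # Keys/values span both token-level and phrase-level units.
    kv = torch.cat([tokens, nodes], dim=0)                             # (n+m, d)

    # Standard scaled dot-product attention, with tokens as queries.
    scores = tokens @ kv.t() / (tokens.shape[-1] ** 0.5)               # (n, n+m)
    weights = F.softmax(scores, dim=-1)
    return weights @ kv                                                # (n, d)

# Toy usage: 5 tokens, a parse with two phrase nodes and a root node.
toks = torch.randn(5, 64)
tree_spans = [(0, 2), (2, 5), (0, 5)]
out = phrase_augmented_attention(toks, tree_spans)
print(out.shape)  # torch.Size([5, 64])
```

Because the phrase vectors are derived from the same token sequence, the attention map can shift probability mass between token-level and phrase-level entries, which is the kind of preference the abstract's last claim refers to.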

