Tree Transformer: Integrating Tree Structures into Self-Attention

09/14/2019
by Yau-Shian Wang, et al.

Pre-training a Transformer on large-scale raw text and fine-tuning it on the desired task has achieved state-of-the-art results on diverse NLP tasks. However, it is unclear what the learned attention captures: the attention computed by the attention heads does not seem to match human intuitions about hierarchical structure. This paper proposes Tree Transformer, which adds an extra constraint to the attention heads of the bidirectional Transformer encoder to encourage them to follow tree structures. The tree structures can be automatically induced from raw text by the proposed "Constituent Attention" module, which is implemented simply as self-attention between adjacent words. With a training procedure identical to BERT's, the experiments demonstrate the effectiveness of Tree Transformer in inducing tree structures, improving language modeling, and learning more explainable attention scores.
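
The abstract describes Constituent Attention as self-attention between adjacent words. Below is a minimal PyTorch sketch of that idea; the class name, head dimension, and the geometric-mean combination of the two directional scores are illustrative assumptions, not the paper's exact formulation.

```python
import torch
import torch.nn as nn


class NeighborAttention(nn.Module):
    """Sketch of a constituent-attention-style module: self-attention
    restricted to adjacent word pairs, producing a score for every
    neighbouring pair (w_i, w_{i+1}) that can be read as the probability
    that the two words belong to the same constituent."""

    def __init__(self, d_model: int, d_head: int = 64):
        super().__init__()
        self.q = nn.Linear(d_model, d_head)
        self.k = nn.Linear(d_model, d_head)
        self.scale = d_head ** -0.5

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, d_model) word representations
        q, k = self.q(x), self.k(x)
        # Attention logits between each word and its right / left neighbour.
        s_right = (q[:, :-1] * k[:, 1:]).sum(-1) * self.scale   # word i  -> i+1
        s_left = (q[:, 1:] * k[:, :-1]).sum(-1) * self.scale    # word i+1 -> i
        # Pad so every word has both a left and a right logit, then softmax
        # over the two directions so each word "chooses" a neighbour.
        neg_inf = torch.finfo(x.dtype).min
        left = torch.cat([torch.full_like(s_left[:, :1], neg_inf), s_left], dim=1)
        right = torch.cat([s_right, torch.full_like(s_right[:, :1], neg_inf)], dim=1)
        p = torch.softmax(torch.stack([left, right], dim=-1), dim=-1)  # (B, L, 2)
        # Link score for pair (i, i+1): combine word i's rightward and word
        # (i+1)'s leftward probabilities (geometric mean keeps it symmetric).
        return torch.sqrt(p[:, :-1, 1] * p[:, 1:, 0] + 1e-9)           # (B, L-1)


# Usage (shapes only): scores for the 9 adjacent pairs of a 10-word sentence.
layer = NeighborAttention(d_model=256)
scores = layer(torch.randn(2, 10, 256))   # -> (2, 9)
```

In the full model described by the abstract, such pairwise scores would then serve as the extra constraint on the regular attention heads, encouraging attention to stay within induced constituents; the sketch stops at the pairwise scores.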

Related research

LazyFormer: Self Attention with Lazy Update (02/25/2021)
Improving the efficiency of Transformer-based language pre-training is a...

Tree-structured Attention with Hierarchical Accumulation (02/19/2020)
Incorporating hierarchical structures like constituency trees has been s...

Adversarial Self-Attention for Language Understanding (06/25/2022)
An ultimate language system aims at the high generalization and robustne...

MiniLM: Deep Self-Attention Distillation for Task-Agnostic Compression of Pre-Trained Transformers (02/25/2020)
Pre-trained language models (e.g., BERT (Devlin et al., 2018) and its va...

Beat Transformer: Demixed Beat and Downbeat Tracking with Dilated Self-Attention (09/15/2022)
We propose Beat Transformer, a novel Transformer encoder architecture fo...

R2D2: Recursive Transformer based on Differentiable Tree for Interpretable Hierarchical Language Modeling (07/02/2021)
Human language understanding operates at multiple levels of granularity ...

Rethinking the Positional Encoding in Language Pre-training (06/28/2020)
How to explicitly encode positional information into neural networks is ...