HYDRA – Hyper Dependency Representation Attentions

09/11/2021
by Ha-Thanh Nguyen, et al.

Attention is all we need, as long as we have enough data. Even so, as models grow ever larger, it is not always easy to determine how much data is enough. In this paper, we propose HYDRA heads: lightweight, pretrained linguistic self-attention heads that inject knowledge into transformer models without pretraining them again. Our approach strikes a balance between letting the models learn unsupervised and rigidly forcing them to conform to linguistic knowledge, as suggested in previous studies. Our experiments show that the approach not only boosts model performance but is also lightweight and architecture friendly. We empirically verify our framework on benchmark datasets, demonstrating the contribution of linguistic knowledge to a transformer model. This is a promising result for a new approach to transferring knowledge from linguistic resources into transformer-based models.
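The abstract suggests that HYDRA heads are small, separately pretrained self-attention heads that are attached to an existing transformer rather than learned through full re-pretraining. Below is a minimal, hypothetical PyTorch sketch of one way such an attachment could look: a frozen, linguistically pretrained extra head whose output is concatenated with a layer's standard heads. The class name HydraAugmentedAttention and all dimensions are illustrative assumptions, not the paper's released code or exact architecture.

```python
# Hypothetical sketch: an extra "linguistic" self-attention head with frozen,
# separately pretrained projections, attached to a transformer attention layer
# without retraining the base model. Names and sizes are assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F


class HydraAugmentedAttention(nn.Module):
    def __init__(self, d_model=768, n_heads=12, d_head=64):
        super().__init__()
        self.n_heads, self.d_head = n_heads, d_head
        # Base (already pretrained) multi-head self-attention projections.
        self.qkv = nn.Linear(d_model, 3 * n_heads * d_head)
        # Extra HYDRA head: in the paper's setting these projections would be
        # pretrained on dependency-annotated text and then frozen here.
        self.hydra_q = nn.Linear(d_model, d_head)
        self.hydra_k = nn.Linear(d_model, d_head)
        self.hydra_v = nn.Linear(d_model, d_head)
        for proj in (self.hydra_q, self.hydra_k, self.hydra_v):
            proj.weight.requires_grad_(False)
            proj.bias.requires_grad_(False)
        # Output projection sized for the base heads plus the extra HYDRA head.
        self.out = nn.Linear((n_heads + 1) * d_head, d_model)

    def forward(self, x):
        B, T, _ = x.shape
        q, k, v = self.qkv(x).chunk(3, dim=-1)
        q = q.view(B, T, self.n_heads, self.d_head).transpose(1, 2)
        k = k.view(B, T, self.n_heads, self.d_head).transpose(1, 2)
        v = v.view(B, T, self.n_heads, self.d_head).transpose(1, 2)
        base = F.scaled_dot_product_attention(q, k, v)  # (B, H, T, d_head)
        # The HYDRA head attends with its frozen, linguistically pretrained weights.
        hq, hk, hv = self.hydra_q(x), self.hydra_k(x), self.hydra_v(x)
        scores = hq @ hk.transpose(-2, -1) / self.d_head ** 0.5
        hydra = torch.softmax(scores, dim=-1) @ hv  # (B, T, d_head)
        base = base.transpose(1, 2).reshape(B, T, -1)
        return self.out(torch.cat([base, hydra], dim=-1))


# Usage: apply to hidden states coming out of a pretrained encoder layer.
x = torch.randn(2, 16, 768)
print(HydraAugmentedAttention()(x).shape)  # torch.Size([2, 16, 768])
```

Because the extra head's projections stay frozen, only the enlarged output projection (and the rest of the model, if fine-tuned) adapts to the injected linguistic signal, which is consistent with the abstract's claim that the approach is lightweight and does not require pretraining the base model again.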

