Low Rank Factorization for Compact Multi-Head Self-Attention

11/26/2019
by Sneha Mehta, et al.

Effective representation learning from text has been an active area of research in the fields of NLP and text mining. Attention mechanisms have been at the forefront of efforts to learn contextual sentence representations. Current state-of-the-art approaches in representation learning use single-head and multi-head attention mechanisms to learn context-aware representations. However, these approaches can be highly parameter-intensive, creating bottlenecks in low-resource settings. In this work we present a novel multi-head attention mechanism that uses low-rank bilinear pooling to efficiently construct a structured sentence representation that attends to multiple aspects of a sentence. We show that the proposed model is more effective than single-head attention mechanisms and is also more parameter-efficient and faster to compute than existing multi-head approaches. We evaluate the proposed model on multiple datasets spanning two text classification benchmarks: (i) sentiment analysis and (ii) news classification.
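The abstract does not include code, but the core idea can be sketched concretely: instead of scoring tokens with a full d x d bilinear matrix per attention head, the bilinear form is factorized into low-rank components shared across heads, so the parameter count scales with the rank r rather than d^2. The PyTorch sketch below is an illustrative assumption of how such low-rank bilinear attention pooling could look; the class name, the single shared projection `P`, and the per-head query factors are hypothetical and not taken from the authors' implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class LowRankMultiHeadAttention(nn.Module):
    """Sketch: multi-head attention pooling via a low-rank bilinear score.

    Hidden states H (batch, n, d) are scored against h learned head queries
    through a shared rank-r projection instead of a full d x d bilinear
    matrix, so parameters grow with r instead of d^2.
    """

    def __init__(self, d_model: int, n_heads: int, rank: int):
        super().__init__()
        self.P = nn.Linear(d_model, rank, bias=False)          # shared low-rank factor
        self.queries = nn.Parameter(torch.randn(n_heads, rank) * 0.1)  # per-head query factor

    def forward(self, H: torch.Tensor, mask: torch.Tensor = None):
        # H: (batch, n, d) -> low-rank projection Z: (batch, n, r)
        Z = torch.tanh(self.P(H))
        # Bilinear scores per head: (batch, n, h)
        scores = Z @ self.queries.t()
        if mask is not None:
            # mask: (batch, n) boolean, True for real tokens
            scores = scores.masked_fill(~mask.unsqueeze(-1), float("-inf"))
        # Attention distribution over tokens for each head
        A = F.softmax(scores, dim=1)                            # (batch, n, h)
        # Structured sentence representation: one d-dim vector per head
        M = torch.einsum("bnh,bnd->bhd", A, H)                  # (batch, h, d)
        return M, A

# Example usage (shapes only, random inputs):
H = torch.randn(2, 16, 256)                                     # batch of 16-token sentences
att = LowRankMultiHeadAttention(d_model=256, n_heads=4, rank=32)
M, A = att(H)                                                   # M: (2, 4, 256)
```

The resulting matrix M stacks one sentence vector per head, i.e. a structured representation that can attend to different aspects of the sentence, which is the property the abstract highlights.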
