Parallel Scheduling Self-attention Mechanism: Generalization and Optimization

12/02/2020
by Mingfei Yu, et al.

Over the past few years, self-attention has become prominent in deep learning, especially in natural language processing (NLP). Its impressive effectiveness and ubiquitous adoption have motivated us to efficiently schedule the data flow of the corresponding computations onto architectures with many computing units for parallel execution. In this paper, based on the theory of the self-attention mechanism and its state-of-the-art realization in language models, we propose a general scheduling algorithm for parallelizing the typical computations of self-attention; the algorithm is derived from the optimal schedules for small instances obtained with a satisfiability checking (SAT) solver. We also put forward strategies for skipping redundant computations, with which reductions of almost 25% in the amount of computation are achieved for two widely adopted application schemes of self-attention. Incorporating these optimizations, we correspondingly derive two further scheduling algorithms. The proposed algorithms are applicable regardless of problem size, as long as the number of input vectors is divisible by the number of computing units available in the architecture. Since proving the correctness of the algorithms mathematically for general cases is difficult, we have conducted experiments on particular instances, solved as SAT problems, to demonstrate both their validity and the quality of the schedules they produce.
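As background for the computation being scheduled, the sketch below shows standard scaled dot-product self-attention together with a naive row-block partition of the score matrix across a hypothetical number of computing units P. This is only an illustrative sketch, not the scheduling algorithm proposed in the paper; the names (self_attention, blockwise_scores, P) and the sequential simulation of the units are assumptions for demonstration. It does mirror the divisibility requirement stated above: the number of input vectors N must be divisible by P.

```python
# Minimal sketch (illustrative only, not the paper's algorithm):
# plain scaled dot-product self-attention plus a row-block split of the
# N x N score computation across P hypothetical computing units.
import numpy as np

def self_attention(X, Wq, Wk, Wv):
    """Scaled dot-product self-attention for N input vectors X of shape (N, d)."""
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    scores = Q @ K.T / np.sqrt(K.shape[1])           # N x N attention scores
    weights = np.exp(scores - scores.max(axis=1, keepdims=True))
    weights /= weights.sum(axis=1, keepdims=True)    # row-wise softmax
    return weights @ V

def blockwise_scores(Q, K, P):
    """Each of the P units computes N/P rows of Q K^T (simulated sequentially)."""
    N, d = Q.shape
    assert N % P == 0, "N must be divisible by the number of computing units"
    rows = N // P
    blocks = [Q[p * rows:(p + 1) * rows] @ K.T for p in range(P)]  # one block per unit
    return np.vstack(blocks) / np.sqrt(d)

# Example: 8 input vectors of width 4, scores partitioned over P = 4 units.
rng = np.random.default_rng(0)
X = rng.normal(size=(8, 4))
Wq, Wk, Wv = (rng.normal(size=(4, 4)) for _ in range(3))
print(self_attention(X, Wq, Wk, Wv).shape)      # (8, 4)
print(blockwise_scores(X @ Wq, X @ Wk, P=4).shape)  # (8, 8)
```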
