Element-Wise Attention Layers: an option for optimization

Attention layers have become a staple since the popularization of Transformer-based models, serving as the key component of many state-of-the-art models developed in recent years. However, one of the biggest obstacles to implementing these architectures - as with many others in the Deep Learning field - is the enormous number of trainable parameters they possess, which makes their use conditional on the availability of robust hardware. In this paper, a new attention mechanism is proposed that adapts Dot-Product Attention, which uses matrix multiplications, into an element-wise form based on array multiplications. To test the effectiveness of this approach, two models (one with a VGG-like architecture and one with the proposed method) were trained on a classification task using the Fashion MNIST and CIFAR10 datasets. Each model was trained for 10 epochs on a single Tesla T4 GPU from Google Colaboratory. The results show that this mechanism achieves an accuracy of 92% on the Fashion MNIST dataset while reducing the number of parameters by 97%. On CIFAR10, the accuracy is still equivalent to 60% while using 50% of the parameters.
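The abstract describes replacing the matrix multiplications of Dot-Product Attention with element-wise (Hadamard) products. The paper's exact formulation is not reproduced here, so the following is only a minimal NumPy sketch of the general idea, contrasting the standard mechanism with a hypothetical element-wise variant; the function names and the choice of softmax axis are illustrative assumptions.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def dot_product_attention(Q, K, V):
    # Standard scaled dot-product attention: softmax(Q K^T / sqrt(d)) V.
    # The Q K^T product is (seq_len x seq_len), quadratic in sequence length.
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)
    return softmax(scores, axis=-1) @ V

def elementwise_attention(Q, K, V):
    # Hypothetical element-wise variant: Hadamard products replace both
    # matrix multiplications, so cost and intermediate memory stay
    # (seq_len x d) instead of growing quadratically with sequence length.
    d = Q.shape[-1]
    weights = softmax(Q * K / np.sqrt(d), axis=-1)
    return weights * V

# Toy inputs: sequence length 8, feature dimension 16.
rng = np.random.default_rng(0)
Q, K, V = (rng.random((8, 16)) for _ in range(3))
print(elementwise_attention(Q, K, V).shape)  # (8, 16)
```

Note that the element-wise form never materializes an attention matrix between positions, which is the source of the parameter and memory savings the abstract reports.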


