Robustify Transformers with Robust Kernel Density Estimation

10/11/2022
by Xing Han, et al.

Recent advances in the Transformer architecture have powered its empirical success on a variety of tasks across different domains. However, existing work mainly focuses on improving standard accuracy and computational cost, without considering robustness to contaminated samples. Prior work has shown that the self-attention mechanism, the core of the Transformer architecture, can be viewed as a non-parametric estimator based on the well-known kernel density estimation (KDE). This motivates us to leverage robust kernel density estimation (RKDE) in the self-attention mechanism, alleviating the effect of data contamination by down-weighting the contribution of corrupted samples in the estimation process. The modified self-attention mechanism can be incorporated into different Transformer variants. Empirical results on language modeling and image classification tasks demonstrate the effectiveness of this approach.
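To make the idea concrete: if scaled dot-product attention is read as a KDE-style weighted average over the values, a robust variant can re-weight the keys before normalization so that outlying (contaminated) keys contribute less. The snippet below is a minimal NumPy sketch of that reading, not the paper's actual RKDE algorithm; the robust_sample_weights helper, its Gaussian kernel, its bandwidth choice, and the simple density re-weighting iteration are all illustrative assumptions.

```python
import numpy as np


def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def robust_sample_weights(keys, n_iter=3, eps=1e-6):
    """Down-weight outlier keys with a simple iteratively re-weighted
    density scheme (an illustrative stand-in for RKDE, not the paper's
    exact procedure): keys lying in low-density regions of the other
    keys receive smaller weights."""
    n, d = keys.shape
    w = np.full(n, 1.0 / n)
    # Gaussian kernel matrix over keys; bandwidth 2*d is an assumption.
    sq_dists = ((keys[:, None, :] - keys[None, :, :]) ** 2).sum(-1)
    K = np.exp(-sq_dists / (2.0 * d))
    for _ in range(n_iter):
        dens = K @ w                       # weighted density at each key
        w = dens / (dens.sum() + eps)      # renormalize into sample weights
    return w


def robust_attention(Q, K, V, n_iter=3):
    """Scaled dot-product attention where the softmax weights are rescaled
    by per-key robustness weights and renormalized."""
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)          # (n_queries, n_keys)
    w = robust_sample_weights(K, n_iter)   # (n_keys,)
    attn = softmax(scores, axis=-1) * w[None, :]
    attn = attn / attn.sum(axis=-1, keepdims=True)
    return attn @ V


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    Q = rng.normal(size=(4, 8))
    K = rng.normal(size=(6, 8))
    K[-1] += 10.0                          # one contaminated key, far from the rest
    V = rng.normal(size=(6, 8))
    out = robust_attention(Q, K, V)
    print(out.shape)                       # (4, 8)
```

In this sketch the contaminated key ends up with a small robustness weight, so its value contributes little to the attention output; the paper's RKDE-based weighting serves the same purpose inside the self-attention of each Transformer layer.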


research
08/15/2023

Attention Is Not All You Need Anymore

In recent years, the popular Transformer architecture has achieved great...
research
05/19/2019

Adaptive Attention Span in Transformers

We propose a novel self-attention mechanism that can learn its optimal a...
research
11/20/2022

Convexifying Transformers: Improving optimization and understanding of transformer networks

Understanding the fundamental mechanism behind the success of transforme...
research
01/22/2022

GLassoformer: A Query-Sparse Transformer for Post-Fault Power Grid Voltage Prediction

We propose GLassoformer, a novel and efficient transformer architecture ...
research
11/10/2019

Improving Transformer Models by Reordering their Sublayers

Multilayer transformer networks consist of interleaved self-attention an...
research
05/23/2023

Improving Heterogeneous Model Reuse by Density Estimation

This paper studies multiparty learning, aiming to learn a model using th...
research
11/19/2022

Rethinking Batch Sample Relationships for Data Representation: A Batch-Graph Transformer based Approach

Exploring sample relationships within each mini-batch has shown great po...
