Adversarial Self-Attention for Language Understanding

06/25/2022
by Hongqiu Wu, et al.

An ideal language system should generalize well and remain robust when adapting to diverse scenarios. Unfortunately, recent pre-trained language models (PrLMs) rarely achieve higher performance by means other than stacking excessive parameters onto the already over-parameterized Transformer architecture. This paper therefore proposes the Adversarial Self-Attention mechanism (ASA), which adversarially reconstructs the Transformer attention and trains the model against contaminated model structures, together with a fast and simple implementation for building better PrLMs. We conduct comprehensive evaluations across a wide range of tasks at both the pre-training and fine-tuning stages. For pre-training, ASA yields remarkable gains over regular training when trained for longer periods. For fine-tuning, ASA-empowered models consistently outperform naive models by a large margin in both generalization and robustness.
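
The abstract describes ASA only at a high level, so the sketch below is one plausible reading rather than the authors' implementation: an additive perturbation on the self-attention logits is optimized to maximize the task loss (playing the role of the "contaminated" structure), and the model is then trained against that perturbation. All names and hyper-parameters here (SimpleSelfAttention, TinyClassifier, eps, adv_lr, adv_steps) are illustrative assumptions.

```python
# Minimal sketch of an adversarial-self-attention-style training step.
# NOT the paper's implementation: it only illustrates the min-max idea of
# perturbing attention logits adversarially and training against them.

import torch
import torch.nn as nn
import torch.nn.functional as F


class SimpleSelfAttention(nn.Module):
    """Single-head self-attention accepting an additive logit perturbation."""

    def __init__(self, d_model: int):
        super().__init__()
        self.q = nn.Linear(d_model, d_model)
        self.k = nn.Linear(d_model, d_model)
        self.v = nn.Linear(d_model, d_model)
        self.scale = d_model ** -0.5

    def forward(self, x, logit_noise=None):
        scores = self.q(x) @ self.k(x).transpose(-2, -1) * self.scale
        if logit_noise is not None:        # adversarial "contamination" of attention
            scores = scores + logit_noise
        attn = scores.softmax(dim=-1)
        return attn @ self.v(x)


class TinyClassifier(nn.Module):
    def __init__(self, d_model=32, n_classes=2):
        super().__init__()
        self.attn = SimpleSelfAttention(d_model)
        self.head = nn.Linear(d_model, n_classes)

    def forward(self, x, logit_noise=None):
        h = self.attn(x, logit_noise)
        return self.head(h.mean(dim=1))    # mean-pool over tokens


def adversarial_step(model, x, y, eps=0.1, adv_lr=0.05, adv_steps=1):
    """Inner max: find a loss-maximizing attention perturbation.
    Outer min: return the loss on the perturbed attention for model training."""
    B, T, _ = x.shape
    noise = torch.zeros(B, T, T, requires_grad=True)

    for _ in range(adv_steps):             # inner maximization
        loss = F.cross_entropy(model(x, noise), y)
        grad, = torch.autograd.grad(loss, noise)
        with torch.no_grad():
            noise = (noise + adv_lr * grad.sign()).clamp(-eps, eps)
        noise.requires_grad_(True)

    return F.cross_entropy(model(x, noise.detach()), y)  # outer minimization


if __name__ == "__main__":
    torch.manual_seed(0)
    model = TinyClassifier()
    opt = torch.optim.Adam(model.parameters(), lr=1e-3)
    x, y = torch.randn(8, 16, 32), torch.randint(0, 2, (8,))
    for step in range(3):
        opt.zero_grad()
        loss = adversarial_step(model, x, y)
        loss.backward()
        opt.step()
        print(f"step {step}: adversarial loss = {loss.item():.4f}")
```

The paper applies this idea to full PrLMs during both pre-training and fine-tuning; the toy classifier and random data above only keep the example self-contained and runnable.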
