AT-BERT: Adversarial Training BERT for Acronym Identification Winning Solution for SDU@AAAI-21

by Danqing Zhu, et al.

Acronym identification is the task of finding acronyms and the phrases they abbreviate, which is crucial for scientific document understanding. However, the limited size of manually annotated datasets hinders further progress on the problem. Recent breakthroughs with language models pre-trained on large corpora show that unsupervised pre-training can vastly improve performance on downstream tasks. In this paper, we present an Adversarial Training BERT method named AT-BERT, our winning solution to the acronym identification task of the Scientific Document Understanding (SDU) Challenge at AAAI 2021. Specifically, a pre-trained BERT is adopted to capture better semantic representations. We then incorporate the FGM adversarial training strategy into the fine-tuning of BERT, which makes the model more robust and improves its generalization. Furthermore, an ensemble mechanism is devised to combine the representations learned from multiple BERT variants. With all these components assembled, experimental results on the SciAI dataset show that our proposed approach outperforms all other competitive state-of-the-art methods.
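To make the FGM step in the abstract more concrete, here is a minimal sketch in plain Python. It assumes the common formulation of FGM (Fast Gradient Method), in which the perturbation added to an input embedding is the loss gradient rescaled to a fixed L2 norm, r = epsilon * g / ||g||_2. The abstract does not give the authors' exact formulation or hyperparameters, so the toy linear classifier, the function names, and the value of `epsilon` are all illustrative assumptions; BERT is replaced by a three-dimensional "embedding" purely so the example stays self-contained.

```python
import math


def fgm_perturbation(grad, epsilon=1.0):
    """Return r = epsilon * g / ||g||_2 for a gradient vector g.

    Assumed (common) FGM formulation; not taken from the paper's text.
    """
    norm = math.sqrt(sum(g * g for g in grad))
    if norm == 0.0:
        # No gradient signal: leave the embedding unperturbed.
        return [0.0 for _ in grad]
    return [epsilon * g / norm for g in grad]


def logistic_loss_and_grad(embedding, weights, label):
    """Binary cross-entropy of a toy linear classifier and its gradient
    with respect to the input embedding (hypothetical stand-in for BERT)."""
    z = sum(w * x for w, x in zip(weights, embedding))
    p = 1.0 / (1.0 + math.exp(-z))
    loss = -(label * math.log(p) + (1 - label) * math.log(1.0 - p))
    grad = [(p - label) * w for w in weights]
    return loss, grad


# Illustrative values only.
embedding = [0.5, -1.0, 2.0]
weights = [0.3, 0.8, -0.5]
label = 1

# 1. Forward/backward pass on the clean embedding.
clean_loss, grad = logistic_loss_and_grad(embedding, weights, label)

# 2. Build the adversarial embedding by stepping along the normalized gradient.
r_adv = fgm_perturbation(grad, epsilon=0.5)
adv_embedding = [x + r for x, r in zip(embedding, r_adv)]

# 3. Loss on the perturbed embedding; in adversarial fine-tuning its gradient
#    would typically be accumulated with the clean gradient before the
#    optimizer step, and the embedding restored afterwards.
adv_loss, _ = logistic_loss_and_grad(adv_embedding, weights, label)

print(f"clean loss: {clean_loss:.4f}, adversarial loss: {adv_loss:.4f}")
```

Because the perturbation follows the direction of steepest loss increase, the adversarial loss in this toy setting is higher than the clean loss; training on both is what is expected to push the model toward robustness.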





Adversarial Training for Large Neural Language Models

Generalization and robustness are both key desiderata for designing mach...

HABERTOR: An Efficient and Effective Deep Hatespeech Detector

We present our HABERTOR model for detecting hatespeech in large scale us...

Leveraging Domain Agnostic and Specific Knowledge for Acronym Disambiguation

An obstacle to scientific document understanding is the extensive use of...

What does BERT know about books, movies and music? Probing BERT for Conversational Recommendation

Heavily pre-trained transformer models such as BERT have recently shown ...

G5: A Universal GRAPH-BERT for Graph-to-Graph Transfer and Apocalypse Learning

The recent GRAPH-BERT model introduces a new approach to learning graph ...

Dynamic Language Models for Continuously Evolving Content

The content on the web is in a constant state of flux. New entities, iss...

SimCLAD: A Simple Framework for Contrastive Learning of Acronym Disambiguation

Acronym disambiguation means finding the correct meaning of an ambiguous...