Using Prior Knowledge to Guide BERT's Attention in Semantic Textual Matching Tasks

02/22/2021
by Tingyu Xia, et al.

We study the problem of incorporating prior knowledge into a deep Transformer-based model, i.e., Bidirectional Encoder Representations from Transformers (BERT), to enhance its performance on semantic textual matching tasks. By probing and analyzing what BERT already knows when solving this task, we obtain a better understanding of what task-specific knowledge BERT needs the most and where it is needed most. This analysis further motivates us to take a different approach than most existing works. Instead of using prior knowledge to create a new training task for fine-tuning BERT, we directly inject knowledge into BERT's multi-head attention mechanism. This leads to a simple yet effective approach that enjoys a fast training stage, as it saves the model from training on additional data or tasks beyond the main task. Extensive experiments demonstrate that the proposed knowledge-enhanced BERT consistently improves semantic textual matching performance over the original BERT model, and the performance benefit is most salient when training data is scarce.
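The abstract does not spell out implementation details, but the core idea of injecting prior knowledge directly into multi-head attention can be sketched with a small PyTorch example. The sketch below is a hypothetical illustration, not the authors' code: it assumes the prior takes the form of a token-pair similarity matrix `prior` (e.g., derived from a lexical resource or word-overlap statistics) that is added as a bias to the raw attention scores of a head before the softmax; the function name `knowledge_biased_attention` and the weighting factor `alpha` are invented for illustration.

```python
# Hypothetical sketch (not the paper's implementation): bias an attention
# head's scores with a prior token-pair similarity matrix before softmax.
import math
import torch
import torch.nn.functional as F

def knowledge_biased_attention(q, k, v, prior, alpha=1.0):
    """q, k, v: (batch, seq_len, d_head); prior: (batch, seq_len, seq_len).
    prior[b, i, j] encodes assumed task-specific similarity between tokens
    i and j (e.g., from a synonym lexicon or word-overlap statistics)."""
    d = q.size(-1)
    scores = q @ k.transpose(-2, -1) / math.sqrt(d)  # standard scaled dot-product scores
    scores = scores + alpha * prior                  # inject prior knowledge as an additive bias
    weights = F.softmax(scores, dim=-1)              # attention weights now reflect the prior
    return weights @ v

# Toy usage with random tensors, just to show the shapes line up.
batch, seq_len, d_head = 2, 8, 16
q = torch.randn(batch, seq_len, d_head)
k = torch.randn(batch, seq_len, d_head)
v = torch.randn(batch, seq_len, d_head)
prior = torch.rand(batch, seq_len, seq_len)          # placeholder; a real prior would come from external knowledge
out = knowledge_biased_attention(q, k, v, prior)
print(out.shape)  # torch.Size([2, 8, 16])
```

Because the bias is added inside the attention computation itself, this kind of approach requires no auxiliary training task: fine-tuning proceeds on the main task only, which is consistent with the fast-training claim in the abstract.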

