Embracing Ambiguity: Improving Similarity-oriented Tasks with Contextual Synonym Knowledge

11/20/2022
by   Yangning Li, et al.
5

Contextual synonym knowledge is crucial for those similarity-oriented tasks whose core challenge lies in capturing semantic similarity between entities in their contexts, such as entity linking and entity matching. However, most Pre-trained Language Models (PLMs) lack synonym knowledge due to inherent limitations of their pre-training objectives such as masked language modeling (MLM). Existing works which inject synonym knowledge into PLMs often suffer from two severe problems: (i) Neglecting the ambiguity of synonyms, and (ii) Undermining semantic understanding of original PLMs, which is caused by inconsistency between the exact semantic similarity of the synonyms and the broad conceptual relevance learned from the original corpus. To address these issues, we propose PICSO, a flexible framework that supports the injection of contextual synonym knowledge from multiple domains into PLMs via a novel entity-aware Adapter which focuses on the semantics of the entities (synonyms) in the contexts. Meanwhile, PICSO stores the synonym knowledge in additional parameters of the Adapter structure, which prevents it from corrupting the semantic understanding of the original PLM. Extensive experiments demonstrate that PICSO can dramatically outperform the original PLMs and the other knowledge and synonym injection models on four different similarity-oriented tasks. In addition, experiments on GLUE prove that PICSO also benefits general natural language understanding tasks. Codes and data will be public.

READ FULL TEXT
research
12/02/2021

DKPLM: Decomposable Knowledge-enhanced Pre-trained Language Model for Natural Language Understanding

Knowledge-Enhanced Pre-trained Language Models (KEPLMs) are pre-trained ...
research
12/30/2020

ERICA: Improving Entity and Relation Understanding for Pre-trained Language Models via Contrastive Learning

Pre-trained Language Models (PLMs) have shown strong performance in vari...
research
08/20/2021

SMedBERT: A Knowledge-Enhanced Pre-trained Language Model with Structured Semantics for Medical Text Mining

Recently, the performance of Pre-trained Language Models (PLMs) has been...
research
12/01/2021

Domain-oriented Language Pre-training with Adaptive Hybrid Masking and Optimal Transport Alignment

Motivated by the success of pre-trained language models such as BERT in ...
research
08/28/2023

Biomedical Entity Linking with Triple-aware Pre-Training

Linking biomedical entities is an essential aspect in biomedical natural...
research
08/18/2022

Ered: Enhanced Text Representations with Entities and Descriptions

External knowledge,e.g., entities and entity descriptions, can help huma...

Please sign up or login with your details

Forgot password? Click here to reset