Exploring Task Difficulty for Few-Shot Relation Extraction

Few-shot relation extraction (FSRE) focuses on recognizing novel relations by learning with merely a handful of annotated instances. Meta-learning has been widely adopted for such a task, which trains on randomly generated few-shot tasks to learn generic data representations. Despite impressive results achieved, existing models still perform suboptimally when handling hard FSRE tasks, where the relations are fine-grained and similar to each other. We argue this is largely because existing models do not distinguish hard tasks from easy ones in the learning process. In this paper, we introduce a novel approach based on contrastive learning that learns better representations by exploiting relation label information. We further design a method that allows the model to adaptively learn how to focus on hard tasks. Experiments on two standard datasets demonstrate the effectiveness of our method.



1 Introduction

Accepted as a long paper in EMNLP 2021 (Conference on Empirical Methods in Natural Language Processing).

Relation extraction aims to detect the relation between two entities in a sentence, and is a cornerstone of various natural language processing (NLP) applications, including knowledge base enrichment trisedya-etal-2019-neural, biomedical knowledge discovery DBLP:conf/ijcai/GuoN0C20, and question answering DBLP:conf/ijcai/HanCW20. Conventional neural methods miwa-bansal-2016-end; tran-etal-2019-relation train a deep network on a large amount of labeled data covering extensive relations, so that the model can recognize these relations at test time. Although impressive performance has been achieved, these methods struggle to adapt to novel relations that were never seen during training. In contrast, humans can identify new relations from very few examples. It is thus of great interest to enable models to generalize to new relations with only a handful of labeled instances.

Inspired by the success of few-shot learning in the computer vision (CV) community DBLP:conf/cvpr/SungYZXTH18; DBLP:conf/iclr/SatorrasE18, han-etal-2018-fewrel first introduce the task of few-shot relation extraction (FSRE). FSRE requires models to handle the classification of novel relations with scarce labeled instances. A popular framework for few-shot learning is meta-learning DBLP:conf/icml/SantoroBBWL16; DBLP:conf/nips/VinyalsBLKW16, which optimizes the model over collections of few-shot tasks sampled from external data whose relations are disjoint from the novel relations, so that the model can learn cross-task knowledge and use it to adapt rapidly to new tasks. A simple yet effective meta-learning algorithm is the prototypical network DBLP:conf/nips/SnellSZ17, which learns a metric space in which a query instance is classified according to its distances to class prototypes. Recently, many FSRE works DBLP:conf/aaai/GaoH0S19; DBLP:conf/icml/QuGXT20; DBLP:conf/cikm/YangZDHHC20 follow the prototypical network and achieve remarkable performance. Nonetheless, the difficulty of distinguishing relations varies across tasks DBLP:journals/corr/abs-2007-06240, depending on the similarity between relations. As illustrated in Figure 1, there are easy few-shot tasks whose relations are quite different from one another, so that they can be consistently well classified, and hard few-shot tasks with subtle inter-relation variations that are prone to misclassification. Current FSRE methods struggle with hard tasks given limited labeled instances for two main reasons. First, most works focus on general tasks to learn generalized representations and do not effectively model the subtle, local differences between relations, which may prevent them from handling hard tasks well. Second, current meta-learning methods treat all training tasks equally, even though the randomly sampled tasks vary in difficulty. The easy tasks can overwhelm the training process and lead to a degenerate model.

To fill this gap, this paper proposes a Hybrid Contrastive Relation-Prototype (HCRP) approach, which focuses on improving performance on hard FSRE tasks. Concretely, we first propose a hybrid prototypical network capable of capturing global and local features to generate informative class prototypes. Next, we present a novel relation-prototype contrastive learning method, which uses relation descriptions as anchors, pulling the prototype of the same class closer in representation space and pushing those of different classes away. In this way, the model gains diverse and discriminative prototype representations, which helps it distinguish the subtle differences between confusable relations in hard few-shot tasks. Furthermore, we design a task-adaptive training strategy based on focal loss DBLP:conf/iccv/LinGGHD17, which allocates dynamic weights to tasks according to their difficulty, so that the model learns more from hard tasks. Extensive experiments on two large-scale benchmarks show that our model significantly outperforms the baselines. Ablation and case studies demonstrate the effectiveness of the proposed modules. Our code is available at https://github.com/hanjiale/HCRP .
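As background, the standard focal loss that our task-adaptive strategy builds on down-weights well-classified examples. A minimal PyTorch sketch of that per-example form follows (the function name and hyperparameter are illustrative; this is not the paper's task-level variant):

```python
import torch
import torch.nn.functional as F

def focal_loss(logits, target, gamma=2.0):
    """Focal loss of Lin et al. (2017): scales each example's cross-entropy
    by (1 - p_t)^gamma, so easy (high-confidence) examples contribute
    little and hard examples dominate the gradient."""
    log_p = F.log_softmax(logits, dim=-1)                      # (batch, classes)
    log_pt = log_p.gather(1, target.unsqueeze(1)).squeeze(1)   # log-prob of true class
    pt = log_pt.exp()                                          # p_t
    return -(((1.0 - pt) ** gamma) * log_pt).mean()
```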

The contributions of this paper are summarized as follows:

  • We present HCRP, which explores task difficulty as useful information for FSRE and augments a hybrid prototypical network with relation-prototype contrastive learning to capture diverse and discriminative representations.

  • We design a novel task-adaptive focal loss to focus training on hard tasks, which enables the model to achieve higher robustness and better performance.

  • Qualitative and quantitative experiments on two FSRE benchmarks demonstrate the effectiveness of our model.

2 Related Work

2.1 Few-shot Relation Extraction

Relation extraction is a foundational task in NLP and has attracted much recent attention DBLP:journals/corr/abs-2104-07650; nan-etal-2020-reasoning; nan2021dialogue. Few-shot relation extraction aims to predict novel relations from a few labeled instances. han-etal-2018-fewrel first present FewRel, a large-scale benchmark for FSRE. DBLP:conf/aaai/GaoH0S19 design a hybrid attention-based prototypical network to highlight the crucial instances and features. ye-ling-2019-multi propose a prototypical network with multi-level matching and aggregation. sun-etal-2019-hierarchical present a hierarchical attention prototypical network to enhance the representation ability of the semantic space. DBLP:conf/icml/QuGXT20 utilize an external relation graph to study the relationships between different relations. wang-etal-2020-learning add relative position information and syntactic relation information to enhance prototypical networks. DBLP:conf/cikm/YangZDHHC20 fuse text descriptions of relations and entities via a collaborative attention mechanism. yang-etal-2021-entity introduce the inherent concepts of entities to provide clues for relation classification. There are also methods baldini-soares-etal-2019-matching; peng-etal-2020-learning that combine prototypical networks with pre-trained language models and achieve impressive results.

However, the task difficulty of FSRE has not been explored. In this work, we focus on the hard tasks and propose a hybrid contrastive relation-prototype method to better model subtle variations across different relations.

2.2 Contrastive Learning

Figure 2: The overall framework of HCRP. Best viewed in color. The rectangles represent the class prototypes, the circles represent the relations, and different colors represent different classes.

Contrastive learning DBLP:journals/corr/abs-2011-00362 has recently gained popularity in the CV community. The core idea is to contrast the similarities and dissimilarities between data instances, pulling positives closer and pushing negatives away simultaneously. CPC DBLP:journals/corr/abs-1807-03748 proposes a universal unsupervised learning approach. MoCo DBLP:conf/cvpr/He0WXG20 presents a mechanism for building dynamic dictionaries for contrastive learning. SimCLR DBLP:conf/icml/ChenK0H20 improves contrastive learning through larger batch sizes and data augmentation. DBLP:conf/nips/KhoslaTWSTIMLK20 extend the self-supervised contrastive approach to the supervised setting. Nan_2021_CVPR propose a dual contrastive learning approach for video grounding. Contrastive learning has also been applied in NLP: DBLP:journals/corr/abs-2005-12766 employ back-translation and MoCo to learn sentence-level representations, and gunel2021supervised design supervised contrastive learning for fine-tuning pre-trained language models. Inspired by these works, we propose a heterogeneous relation-prototype contrastive learning method in a supervised setting to obtain more discriminative representations.
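To make the core idea concrete, here is a minimal PyTorch sketch of an InfoNCE-style objective over class prototypes and relation embeddings (the function name, temperature, and exact formulation are illustrative assumptions, not the paper's precise loss):

```python
import torch
import torch.nn.functional as F

def relation_prototype_contrastive(protos, rels, tau=0.1):
    """Each relation embedding is an anchor whose positive is the
    prototype of the same class; the prototypes of the other classes
    in the task serve as negatives."""
    protos = F.normalize(protos, dim=-1)    # (N, d) class prototypes
    rels = F.normalize(rels, dim=-1)        # (N, d) relation embeddings
    logits = rels @ protos.t() / tau        # anchor-to-prototype similarities
    labels = torch.arange(protos.size(0), device=protos.device)
    return F.cross_entropy(logits, labels)  # positive pairs lie on the diagonal
```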

3 Task Definition

We follow a typical few-shot task setting, namely the $N$-way-$K$-shot setup, which contains a support set $\mathcal{S}$ and a query set $\mathcal{Q}$. The support set $\mathcal{S}$ includes $N$ novel classes, each with $K$ labeled instances. The query set $\mathcal{Q}$ contains the same $N$ classes as $\mathcal{S}$, and the task is evaluated on $\mathcal{Q}$ by predicting the relations of its instances. In addition, an auxiliary dataset $\mathcal{D}_{base}$ is given, which contains abundant base classes, each with a large number of labeled examples. Note that the base classes and novel classes are disjoint. The few-shot learner aims to acquire knowledge from the base classes and use it to recognize novel classes. One popular approach is the meta-learning paradigm DBLP:conf/nips/VinyalsBLKW16, which mimics the few-shot learning setting at training time. Specifically, in each training iteration, we randomly select $N$ classes from the base classes, each with $K$ instances, to form a support set $\mathcal{S}$. Meanwhile, query instances are sampled from the remaining data of the $N$ classes to construct a query set $\mathcal{Q}$. The model is optimized over collections of few-shot tasks sampled from the base classes, so that it can rapidly adapt to new tasks.
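A minimal sketch of this episode sampling in Python (the dict-of-lists data layout and function name are illustrative assumptions):

```python
import random

def sample_episode(base_data, n_way, k_shot, n_query):
    """Build one N-way-K-shot task from the base classes.
    base_data: dict mapping each base relation to its labeled instances."""
    classes = random.sample(list(base_data), n_way)
    support, query = [], []
    for label, rel in enumerate(classes):
        instances = random.sample(base_data[rel], k_shot + n_query)
        support += [(x, label) for x in instances[:k_shot]]   # support set S
        query += [(x, label) for x in instances[k_shot:]]     # query set Q
    return support, query
```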

For an FSRE task, each instance is a tuple $(x, e, y)$, where $x$ denotes a natural language sentence, $e = (e_h, e_t)$ indicates a pair of head and tail entities, and $y$ is the relation label. The name and description of each relation are also provided as auxiliary support evidence for relation extraction. For example, for the relation with id “P726” in a dataset that we use, we can obtain its name “candidate” and description “person or party that is an option for an office in an election”.
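For illustration, one labeled instance and its relation metadata might be represented as follows (the toy sentence and field names are assumptions, not the dataset's exact schema; the P726 name and description are real):

```python
instance = {
    "sentence": "Alice ran as a candidate for the city council.",  # x (toy example)
    "head": "Alice",                # head entity e_h
    "tail": "city council",         # tail entity e_t
    "relation": "P726",             # relation label y
}

relation_info = {
    "P726": {
        "name": "candidate",
        "description": "person or party that is an option "
                       "for an office in an election",
    }
}
```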

4 Approach

In this section, we present the details of our proposed HCRP approach. The overall learning framework is illustrated in Figure 2. The inputs are $N$-way-$K$-shot tasks sampled from the auxiliary dataset $\mathcal{D}_{base}$, where each task contains a support set $\mathcal{S}$ and a query set $\mathcal{Q}$. We also take the names and descriptions of these classes (i.e., relations) as inputs. HCRP consists of three components. The hybrid prototype learning module generates informative prototypes by capturing both global and local features, which better reflect the subtle differences between relations. The relation-prototype contrastive learning component then leverages relation label information to further enhance the discriminative power of the prototype representations. Finally, a task-adaptive focal loss encourages the model to focus training on hard tasks.

4.1 Hybrid Prototype Learning

We employ BERT devlin-etal-2019-bert as the encoder to obtain contextualized embeddings of query instances $\mathbf{Q}_j \in \mathbb{R}^{l_{q_j} \times d}$ and support instances $\mathbf{S}_i^k \in \mathbb{R}^{l_{s_{i,k}} \times d}$, where $l_{q_j}$ and $l_{s_{i,k}}$ are the sentence lengths of the $j$-th query instance and the $k$-th support instance in class $i$ respectively, and $d$ is the size of the resulting contextualized representations. For each relation, we concatenate the name and description and feed the sequence into the BERT encoder to obtain relation embeddings $\mathbf{R}_i \in \mathbb{R}^{l_{r_i} \times d}$, where $l_{r_i}$ is the length of relation description $r_i$.
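A minimal sketch of this encoding step with the HuggingFace transformers library (entity-marker handling is omitted for brevity, and the public bert-base-uncased checkpoint is an assumption; the paper may configure the encoder differently):

```python
import torch
from transformers import BertModel, BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
encoder = BertModel.from_pretrained("bert-base-uncased")

def encode(text):
    """Return the (seq_len, d) contextualized token embeddings of one sequence."""
    inputs = tokenizer(text, return_tensors="pt", truncation=True)
    with torch.no_grad():
        hidden = encoder(**inputs).last_hidden_state   # (1, seq_len, d)
    return hidden.squeeze(0)

# Relation embedding R_i: encode the concatenated name and description.
R_i = encode("candidate : person or party that is an option for an office in an election")
```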

Global Prototypes

For instances in $\mathcal{S}$ and $\mathcal{Q}$, the global features $\mathbf{s}_{i,k}^{g}$ and $\mathbf{q}_j^{g} \in \mathbb{R}^{2d}$ are obtained by concatenating the hidden states corresponding to the start tokens of the two entity mentions, following baldini-soares-etal-2019-matching. The global feature $\mathbf{r}_i^{g}$ of relation $i$ is the hidden state corresponding to the [CLS] token (transformed to dimension $2d$). For each relation $i$, we average the global features of its support instances, following DBLP:conf/nips/SnellSZ17, and further add the global feature of the relation to form the global prototype representation:

$$\mathbf{p}_i^{g} = \frac{1}{K}\sum_{k=1}^{K} \mathbf{s}_{i,k}^{g} + \mathbf{r}_i^{g} \qquad (1)$$
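In code, with the $K$ support features of one class stacked into a tensor, Eq. (1) is a one-liner (a sketch; shapes follow the notation above):

```python
import torch

def global_prototype(support_global, relation_global):
    """support_global: (K, 2d) global features of the K support instances
    of one class; relation_global: (2d,) global feature of the relation.
    Returns the global prototype of Eq. (1)."""
    return support_global.mean(dim=0) + relation_global
```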

Local Prototypes

While global prototypes capture general data representations, they may not readily capture useful local information within specific FSRE tasks. To better handle hard FSRE tasks, where subtle differences separate highly similar relations, we further propose local prototypes that highlight the key tokens in an instance that are essential for characterizing different relations.

For relation $i$, we first calculate the local feature of the $k$-th support instance as:

$$\mathbf{a} = \operatorname{sum\_row}\left(\mathbf{S}_i^k \, \mathbf{R}_i^{\top}\right) \qquad (2)$$

$$\mathbf{s}_{i,k}^{l} = \sum_{m} \operatorname{softmax}(\mathbf{a})_m \left(\mathbf{S}_i^k\right)_m \qquad (3)$$

where $\left(\mathbf{S}_i^k\right)_m$ is the $m$-th row of a matrix, and $\operatorname{sum\_row}(\cdot)$ is an operation that sums all elements in each row of a matrix.
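A PyTorch sketch of Eqs. (2) and (3) (shapes follow the notation above; the function name is illustrative):

```python
import torch
import torch.nn.functional as F

def local_feature(S_ik, R_i):
    """S_ik: (l_s, d) token embeddings of one support instance;
    R_i: (l_r, d) token embeddings of the relation description.
    Returns the (d,) local feature of Eq. (3)."""
    scores = (S_ik @ R_i.t()).sum(dim=1)  # Eq. (2): one relevance score per support token
    weights = F.softmax(scores, dim=0)    # attention weights over support tokens
    return weights @ S_ik                 # Eq. (3): weighted sum of token embeddings
```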