Learning Relation Ties with a Force-Directed Graph in Distant Supervised Relation Extraction

04/21/2020 ∙ by Yuming Shang, et al. ∙ Beijing Institute of Technology

Relation ties, defined as the correlation and mutual exclusion between different relations, are critical for distant supervised relation extraction. Existing approaches model this property by greedily learning local dependencies. However, they are essentially limited by failing to capture the global topology structure of relation ties and, as a result, may easily fall into a locally optimal solution. To solve this problem, in this paper we propose a novel force-directed graph based relation extraction model to comprehensively learn relation ties. Specifically, we first build a graph according to the global co-occurrence of relations. Then, we borrow the idea of Coulomb's Law from physics and introduce the concepts of attractive force and repulsive force into this graph to learn the correlation and mutual exclusion between relations. Finally, the obtained relation representations are applied as an inter-dependent relation classifier. Experimental results on a large-scale benchmark dataset demonstrate that our model is capable of modeling global relation ties and significantly outperforms other baselines. Furthermore, the proposed force-directed graph can be used as a module to augment existing relation extraction systems and improve their performance.


1 Introduction

Relation extraction, the task of extracting structured relations from raw unstructured text, is crucial in natural language processing (NLP). Conventional supervised methods are time-consuming because they require large-scale manually labeled data. Therefore, Mintz et al. (2009) propose distant supervision to automatically label sentences. It assumes that if two entities have a relation in a knowledge graph, then any sentence that mentions the two entities might express that relation. As there may be multiple relations between one entity pair, distant supervised relation extraction is a multi-label prediction task.

Figure 1: The difference between our method and previous methods when learning relation ties. The solid arrow indicates that the new relation must exist. The dashed arrow indicates that the new relation may exist.

Interestingly, relations in distant supervision usually exhibit inner correlation and mutual exclusion, which we call relation ties. As shown in Figure 1, if an entity pair (Barack Obama, United States) has the relation President_of, then we can infer that the entity pair must also have the relations Place_lived and Nationality. Similarly, the relation Place_of_birth may also exist between the two entities with a certain probability. On the contrary, we can infer that the entity pair does not hold the relation Location, because the head entity Barack Obama is the name of a person, not a place. Obviously, considering relation ties can effectively narrow down the potential search space and significantly improve the performance of relation extraction.

Existing studies on learning relation ties broadly fall into two types: explicit methods and implicit methods. The former uses model architecture to explicitly represent dependencies and conflicts between relations, e.g., a Markov Logic Network [5] or an Encoder-Decoder framework [12]. The latter implicitly learns relation ties through soft constraints, e.g., by designing a loss function [7, 14] or using an attention mechanism [3]. However, as shown in the right part of Figure 1, previous approaches greedily obtain local dependencies between relations at each learning step and have difficulty making a global optimization. As a consequence, they fail to precisely describe the complex global topology structure of relation ties and may easily fall into a locally optimal solution.

To address this issue, in this paper, we propose a novel Force-Directed Graph based Relation Extraction model, named FDG-RE, which is able to comprehensively learn global relation ties in an end-to-end manner. Specifically, we build a graph based on the global co-occurrence of relations, where each node is represented by a relation embedding and each edge indicates the co-occurrence of two relations. Intuitively, the ideal topology structure of the graph should be one in which related nodes are close while conflicting nodes are far apart. To this end, we borrow Coulomb's Law [4] from physics and introduce the concepts of attractive force and repulsive force into the graph. The aim of the attractive force is to increase the similarity between two correlated relation embeddings so that they are close to each other in the embedding space, while the repulsive force works in the opposite direction. To simulate the attractive force, we employ a graph convolutional network (GCN) [8] to obtain information propagation between correlated relation embeddings. To simulate the repulsive force, we utilize the similarity between conflicting relation embeddings as a penalty term for the objective loss function. Finally, the relation representations learned by the force-directed graph are applied as an inter-dependent relation classifier. Experimental results show that FDG-RE performs better than state-of-the-art baselines.

To sum up, our contributions can be encapsulated as follows:

  • Different from existing methods, the proposed FDG-RE can precisely learn the global topology structure of relation ties.

  • The proposed force-directed graph can be applied as an independent module to other relation extraction methods to improve their performance.

  • Experiments on a widely used dataset prove that our FDG-RE achieves state-of-the-art performance.

2 Motivations

In the distant supervision scenario, every relation has its own correlated relations and conflicting relations. Therefore, if we want to precisely learn global relation ties, the following three core problems must be solved:

(1) What kind of data structure is appropriate for describing relation ties?

Intuitively, the correlation and mutual exclusion between relations constitute a complex network. In order to comprehensively represent all connections in this network, we build a graph based on the co-occurrence of relations. In this graph, each node is represented by a relation embedding and the edge between two nodes indicates the co-occurrence dependency of the two relations.

(2) What is the ideal topology structure of the graph?

To explore this issue, we borrow the idea of Coulomb's Law from physics. The law states that the force between two charges $q_1$ and $q_2$ is directly proportional to the product of the charges and inversely proportional to the square of the distance $r$ between them, $F = k\,q_1 q_2 / r^2$. If the two charges have the same sign, the force is repulsive; if they have opposite signs, the force is attractive. When the Coulomb force acts on several charges, it pulls opposite-sign charges together while pushing like-sign charges apart. Finally, all charges move to an equilibrium where the forces add up to zero and the positions of the charges stay stable. Similarly, we deduce that the ideal topology structure of the graph should be the same as such a distribution of charges.

(3) How to model attractive force and repulsive force?

To address this point, we extend the concept of "force" into the relation embedding space. We define the attractive force as a computation that increases the similarity between relation embeddings; on the contrary, the aim of the repulsive force is to reduce the similarity between relation embeddings. To this end, we employ a GCN to obtain information propagation between relation embeddings, acting as the attractive force. Besides, we use the similarity between conflicting relation embeddings as a penalty term for the objective loss function, acting as the repulsive force.

3 Method

In this section, we present our force-directed graph based distant supervised relation extraction model, FDG-RE. We first give the task definition. Then, we provide a detailed formalization of the force-directed graph, with special emphasis on modeling the attractive force and the repulsive force. Finally, we introduce the implementation of relation extraction.

3.1 Task Definition

We define the relation classes as $R = \{r_1, r_2, \ldots, r_K\}$, where $K$ is the number of relations. Given a sentence-bag $B = \{s_1, s_2, \ldots, s_n\}$ consisting of $n$ sentences and an entity pair $(e_h, e_t)$ that appears in all of the sentences, the purpose of distant supervised relation extraction is to predict the set of target relations $Y \subseteq R$ according to the entity pair $(e_h, e_t)$ and the sentence-bag $B$. Because the predicted relations often have inner connections, the goal of learning relation ties is to capture the global correlation and mutual exclusion between relations.

3.2 Learning Relation Ties

This section illustrates how we construct the force-directed graph and model relation ties.

3.2.1 Graph Construction

In order to capture the global topology structure of relation ties, we build a graph $G = (V, E)$ with relation embeddings as the nodes $V$ and the co-occurrences between relations as the edges $E$, where $|V| = K$ and $|E|$ are the numbers of relations and edges respectively. Concretely, if two relations $r_i$ and $r_j$ appear with the same entity pair $(e_h, e_t)$, there will be an edge between the two nodes $(v_i, v_j)$. Finally, as shown in Figure 2, the adjacency matrix of the graph is the symmetric co-occurrence matrix M of the relations.
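To make the construction concrete, the following sketch builds the co-occurrence matrix from distantly labeled entity pairs. It is our illustration rather than the authors' released code; the toy pair2relations mapping and the convention that the diagonal stores each relation's occurrence count are assumptions.

    import numpy as np

    # Hypothetical distant-supervision labels: each entity pair maps to the
    # set of relation ids it expresses in the knowledge graph.
    pair2relations = {
        ("Barack Obama", "United States"): {0, 1, 2},  # President_of, Place_lived, Nationality
        ("Steve Jobs", "Apple"): {3},
    }
    K = 53  # number of relation classes in NYT

    # M[i, j] counts how often relations i and j co-occur for one entity pair;
    # the diagonal M[i, i] then counts the total occurrences N_i of relation i.
    M = np.zeros((K, K), dtype=np.int64)
    for relations in pair2relations.values():
        for i in relations:
            for j in relations:
                M[i, j] += 1

    assert (M == M.T).all()  # the adjacency matrix is symmetric by construction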

Figure 2: An example of constructing the adjacency matrix of the graph.

3.2.2 Attractive Force

In general, we find that the correlations between relations fall into two categories: weak correlation and strong correlation. Weak correlation mainly involves co-occurrence between relations, such as Place_lived and Born_in, while strong correlation means logical entailment, such as "President_of → Nationality". It is worth noting that the correlation between relations is directional. For example, the probability of "President_of → Nationality" is 1; in contrast, the probability of "Nationality → President_of" is close to 0. To represent such asymmetric correlations, we introduce the occurrence count $N_i$ of each relation $r_i$ into the co-occurrence matrix M to obtain the conditional probability transition matrix P:

$P_{ij} = M_{ij} / N_i$ (1)

where $P_{ij}$ denotes the probability of "$r_i \rightarrow r_j$". As mentioned above, for an ordinary person, the probability of "Nationality → President_of" is close to 0. To make our model more generalizable, we filter these "noisy transitions" with a threshold $\tau$: if $P_{ij} < \tau$, we set $P_{ij} = 0$.
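Continuing the sketch above, Equation (1) and the threshold filter reduce to a row normalization followed by a mask (again an illustration; reading $N_i$ off the diagonal of M is our assumption):

    # Conditional probability transition matrix: P[i, j] approximates p(r_j | r_i).
    N = np.maximum(M.diagonal(), 1)      # occurrence count N_i of each relation
    P = M / N[:, None]                   # row-normalize the co-occurrence counts

    tau = 0.18                           # threshold from Table 1
    P_hat = np.where(P < tau, 0.0, P)    # drop the "noisy transitions"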

Then, we employ an $L$-layer GCN to obtain information propagation between relation embeddings:

$H^{(l+1)} = \sigma(\hat{P} H^{(l)} W^{(l)})$ (2)

where $H^{(l)} \in \mathbb{R}^{K \times d}$ is the relation representation of the $l$-th layer and $d$ is the dimension of the relation embeddings. $\hat{P}$ denotes the filtered probability transition matrix, $W^{(l)}$ is a weight matrix to learn, and $\sigma$ is a non-linear function. Intuitively, the more information is exchanged between two relation embeddings, the closer their spatial locations will be.
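A minimal PyTorch sketch of this propagation is given below. The two-layer depth follows Table 1; the embedding dimension of 960 is our inference (3 segments × 320 feature maps, so that the classifier matches the PCNN sentence representation in Section 3.3), not a value stated in the paper.

    import torch
    import torch.nn as nn

    class RelationGCN(nn.Module):
        """L-layer GCN over the filtered transition matrix, simulating the
        attractive force between correlated relation embeddings."""

        def __init__(self, P_hat: torch.Tensor, dim: int = 960, n_layers: int = 2):
            super().__init__()
            self.register_buffer("P_hat", P_hat)  # K x K matrix from Equation (1)
            self.H0 = nn.Parameter(torch.randn(P_hat.size(0), dim))  # initial embeddings
            self.weights = nn.ModuleList(
                [nn.Linear(dim, dim, bias=False) for _ in range(n_layers)]
            )

        def forward(self) -> torch.Tensor:
            H = self.H0
            for W in self.weights:
                H = torch.relu(W(self.P_hat @ H))  # H^{(l+1)} = sigma(P_hat H^{(l)} W^{(l)})
            return H                               # K x dim relation classifier

    gcn = RelationGCN(torch.tensor(P_hat, dtype=torch.float32))
    H = gcn()  # relation representations later used as the classifier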

3.2.3 Repulsive Force

To obtain the global mutual exclusion information between relations, we transform the co-occurrence matrix M into a mutual exclusion matrix U:

$U_{ij} = 1$ if $M_{ij} = 0$, otherwise $U_{ij} = 0$ (3)

Then, we define the similarity between relation embeddings $h_i$ and $h_j$ with a simple dot product:

$S_{ij} = h_i^{\top} h_j$ (4)

Note that we assume a relation is "conflicted" with itself, that is, $U_{ii} = 1$. This assumption can be regarded as a normalization that makes the relation embeddings more stable. Thus, the global mutual exclusion between relations is defined as:

$E = \sum_{i,j} (U \odot S)_{ij}$ (5)

where $\odot$ denotes the element-wise multiplication operation. Because $E$ is the sum of all pairwise mutual exclusion scores, its value is too large. We further scale it by the number of relation pairs:

$\ell_{rep} = E / K^2$ (6)

Finally, we leverage $\ell_{rep}$ as the penalty term of the objective loss function to act as the repulsive force.
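The sketch below implements Equations (3)-(6) under the reconstruction above (in particular, the $1/K^2$ scaling in Equation (6) is our reading of "scaling" and may differ from the authors' exact choice):

    def repulsive_penalty(H: torch.Tensor, M: torch.Tensor) -> torch.Tensor:
        """Penalty acting as the repulsive force, Equations (3)-(6).

        H: K x d relation embeddings learned by the GCN.
        M: K x K co-occurrence matrix (torch tensor of counts).
        """
        K = H.size(0)
        U = (M == 0).float()    # Eq. (3): relations that never co-occur conflict
        U.fill_diagonal_(1.0)   # a relation also "conflicts" with itself
        S = H @ H.t()           # Eq. (4): pairwise dot-product similarity
        E = (U * S).sum()       # Eq. (5): total mutual exclusion
        return E / (K * K)      # Eq. (6): scale down the summed penalty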

3.3 Relation Extraction

In FDG-RE, the position embeddings proposed by Zeng et al. (2014) are adopted to specify the target entity pair $(e_h, e_t)$ and make the model pay more attention to the words close to the target entities. The final representation $x$ of a word is the concatenation of its word embedding $w$ and two position embeddings $p^h$ and $p^t$:

$x = [w; p^h; p^t]$ (7)

We employ PCNN [15] to learn sentence-level features. PCNN mainly consists of two parts: a traditional convolutional neural network (CNN) and piecewise max-pooling. Suppose $c_i$ is one of the feature maps learned by the CNN. PCNN divides every feature map into three parts $\{c_{i1}, c_{i2}, c_{i3}\}$ by the positions of the two target entities $(e_h, e_t)$, and the max-pooling operation is performed on the three parts separately, $p_{ij} = \max(c_{ij})$. The final sentence representation $s$ is the concatenation of all the pooled values:

$s = \tanh([p_{i1}; p_{i2}; p_{i3}]_{i=1}^{m})$ (8)

where $m$ is the number of feature maps.
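The piecewise pooling can be sketched as follows (our illustration; the segment boundary convention, inclusive of the entity positions, is an assumption):

    import torch

    def piecewise_max_pool(feature_maps: torch.Tensor,
                           head_pos: int, tail_pos: int) -> torch.Tensor:
        """Piecewise max-pooling of PCNN, Equation (8).

        feature_maps: m x T tensor (m feature maps over T token positions).
        head_pos, tail_pos: entity positions with 0 <= head_pos < tail_pos < T - 1.
        Returns a 3m-dimensional sentence representation.
        """
        assert 0 <= head_pos < tail_pos < feature_maps.size(1) - 1
        segments = (
            feature_maps[:, : head_pos + 1],               # up to and including e_h
            feature_maps[:, head_pos + 1 : tail_pos + 1],  # between the two entities
            feature_maps[:, tail_pos + 1 :],               # after e_t
        )
        pooled = [seg.max(dim=1).values for seg in segments]
        return torch.tanh(torch.cat(pooled))               # s = tanh([p_1; p_2; p_3])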

We employ sentence-level selective attention to combine the embedded sentences into one bag representation $b$, aiming to aggregate information across sentences:

$b = \sum_{i} \alpha_i s_i$ (9)

where $\alpha_i$ is calculated by:

$\alpha_i = \frac{\exp(s_i A r)}{\sum_{j} \exp(s_j A r)}$ (10)

$\alpha_i$ is a coupling coefficient which scores how well the input sentence $s_i$ and the target relation $r$ match. The output of the neural network is:

$o = B H^{\top}$ (11)

where B denotes the sentence-bag representation matrix and H is the relation representation matrix learned by the GCN; the bias term in this equation is omitted for convenience of description. In fact, H is an inter-dependent classification network which has already learned the correlations between relations. We employ softmax to get the final prediction probability:

$p(r \mid B, \theta) = \mathrm{softmax}(o)$ (12)
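A per-bag sketch of Equations (9)-(12), with the attention matrix A and relation query vector r written as explicit arguments (standard in selective attention, but an assumption here):

    def bag_prediction(S: torch.Tensor, H: torch.Tensor,
                       A: torch.Tensor, r: torch.Tensor) -> torch.Tensor:
        """Selective attention over one bag plus inter-dependent classification.

        S: n x d sentence representations of one bag.
        H: K x d relation representations learned by the GCN.
        A: d x d attention matrix; r: d-dimensional relation query vector.
        """
        scores = (S @ A) @ r                  # how well each sentence matches the relation
        alpha = torch.softmax(scores, dim=0)  # coupling coefficients, Eq. (10)
        b = alpha @ S                         # bag representation, Eq. (9)
        o = H @ b                             # logits, Eq. (11); bias omitted
        return torch.softmax(o, dim=0)        # prediction probabilities, Eq. (12)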

Finally, the objective function of the model is:

$\mathcal{L}(\theta) = -\sum_{i} \log p(y_i \mid B_i, \theta) + \lambda\, \ell_{rep}$ (13)

where $\ell_{rep}$ represents the repulsive force between mutually exclusive relations obtained by Equation (6), $\lambda$ is a harmonic factor that balances the two terms, $y_i$ denotes the predicted relations of sentence-bag $B_i$, and $\theta$ indicates all parameters of the model.
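Putting the pieces together, the training loss is the classification loss plus the scaled penalty, reusing repulsive_penalty from the sketch above (the multi-hot target encoding is our assumption):

    def objective(probs: torch.Tensor, target: torch.Tensor,
                  H: torch.Tensor, M: torch.Tensor, lam: float = 0.25) -> torch.Tensor:
        """Equation (13): negative log-likelihood plus the repulsive penalty.

        probs: K-dimensional predicted probabilities for one bag, Eq. (12).
        target: K-dimensional multi-hot vector of labeled relations.
        lam: the harmonic factor (0.25 in Table 1).
        """
        nll = -(torch.log(probs + 1e-12) * target).sum()
        return nll + lam * repulsive_penalty(H, M)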

4 Experiments

Figure 3: The proposed force-directed graph is applied as a module to augment three different relation extraction methods: (a) PCNN+ATT, (b) PCNN+AVE, (c) PCNN+ONE.

Our experiments are designed to demonstrate four points:

  1. The proposed force-directed graph can be used as a module to augment existing relation extraction methods and significantly improve their performance (section 4.3).

  2. Among the similar methods of learning relation ties, our FDG-RE performs best (section 4.4).

  3. FDG-RE outperforms the state-of-the-art distant supervised relation extraction methods (section 4.5).

  4. FDG-RE can indeed learn the topology structure of relation ties (section 4.6).

In the following, we first introduce the dataset and evaluation metrics. Second, we describe the experimental setup. Third, we conduct three sets of detailed comparisons corresponding to purposes 1-3. Finally, we visualize the relation embeddings in response to purpose 4.

4.1 Dataset and Evaluation Metrics

We evaluate FDG-RE and all baselines on the widely used NYT dataset developed by Riedel et al. (2010) [10], which was constructed by aligning relations in Freebase [2] with the New York Times (NYT) corpus. In the NYT dataset, the training sentences are drawn from the 2005-2006 corpus and the test sentences from 2007. Specifically, it contains 520K training sentences and 172K test sentences. There are 53 unique relations, including a special relation NA that signifies no relation between the entity pair.

Following previous methods [15, 9, 14, 12, 11], we evaluate all models with held-out evaluation and present precision-recall curves (PR-curves). Held-out evaluation is an approximate measure that automatically compares the extracted relations against the facts in the knowledge graph.

4.2 Setup

For all baselines, we follow the training settings reported in their papers. The hyper-parameters of FDG-RE are set by random search [1]; Table 1 lists the values used.

Setting                        Value
Kernel size                    3
Number of feature maps         320
Word embedding dimension       50
Position embedding dimension   5
Learning rate                  0.19
Threshold τ                    0.18
Harmonic factor λ              0.25
Number of GCN layers           2
Table 1: Parameter settings.

4.3 Act as a Module

In this section, we conduct experiments to demonstrate that the proposed force-directed graph can be applied as a module to augment existing relation extraction methods and significantly improve their performance.

4.3.1 Baselines

We select three conventional relation extraction methods as baselines. During extraction, they all predict relations independently and ignore relation ties.

  • PCNN+ATT: Lin et al. (2016) [9] propose a sentence-level attention mechanism to obtain sentence-bag representations.

  • PCNN+AVE: the bag-level representation is obtained by averaging all of the sentence representations.

  • PCNN+ONE: Zeng et al. (2015) [15] use the feature of the most confident sentence to represent the sentence-bag.

4.3.2 Results

The results are shown in Figure 3, where +FDG means applying the force-directed graph to the corresponding model. It can be observed that the proposed module significantly improves the performance of all three baselines. This proves that: (1) considering relation ties in distant supervised relation extraction can indeed reduce the potential search space and improve prediction performance; (2) the proposed force-directed graph is flexible and adaptable.

4.4 Compare with Similar Methods

In this part, we compare FDG-RE with similar methods that focus on learning relation ties, to show that our model performs best among them.

4.4.1 Baselines

We use the following four models as baselines:

  • MIMLCNN: Jiang et al. (2016) [7] capture relation dependencies by designing a multi-label loss function for the neural network classifier.

  • Rank+ExATT: Ye et al. (2017) [14] adopt a pairwise learning-to-rank framework to capture the co-occurrence dependencies between relations.

  • Memory: Feng et al. (2017) [3] use a memory network to capture relation dependencies.

  • PartialMax+IQ+ATT: Su et al. (2018) [12] utilize the Encoder-Decoder framework to capture relation dependencies and predict relations with an RNN decoder.

We implemented MIMLCNN and PartialMax+IQ+ATT. For Rank+ExATT (https://github.com/oceanypt/DR_RE) and Memory (https://github.com/liuyongjie985/Effective_Deep_Memory_Networks_for_Distant_Supervised_Relation_Extraction), we use the code provided by the authors.

4.4.2 Results

Figure 4: Comparison with similar methods.

Figure 4 shows the resulting PR-curves in the region of most interest. It can be observed that:

(1) Comparing the explicit methods (FDG-RE, PartialMax+IQ+ATT) with the implicit methods (MIMLCNN, Rank+ExATT, Memory), we can conclude that the explicit methods perform better. This is because an RNN can describe linear dependencies between relations well, and a GCN is good at learning regional dependencies; in other words, using an RNN or a GCN injects prior knowledge of the topology structure at the beginning of the training process.

(2) Between the two explicit methods, FDG-RE consistently and significantly outperforms PartialMax+IQ+ATT over the entire range of recall. This proves that considering global connections is better than focusing on local dependencies. Concretely, PartialMax+IQ+ATT uses a linear Encoder-Decoder framework to learn relation ties, so the pre-defined order of relations has an important influence on the final prediction. For example, once the decoder has predicted the relation Nationality, it will not predict President_of at later steps; as a result, all downstream relations of President_of become unreachable.

Overall, the experimental results demonstrate that the motivation of our work, using a force-directed graph to model the global topology structure of relation ties, is effective.

4.5 Compare with SOTA Methods

We further compare FDG-RE with the latest distant supervised relation extraction methods to show that our model achieves state-of-the-art performance.

4.5.1 Baselines

Our model does not use any external information (e.g., entity types or entity descriptions). Therefore, we select three recent methods that likewise use no external information as baselines:

  • PCNN+C2SA: Yuan et al. (2019) use cross-relation cross-bag selective attention to deal with the noisy-labeling problem.

  • PCNN+ATT_RA+BAG_ATT: Ye and Ling (2019) propose intra-bag and inter-bag attention to alleviate the influence of noisy sentences.

  • DCRE: Shang et al. (2020) [11] convert noisy sentences into useful training instances by unsupervised deep clustering.

We implemented DCRE; for PCNN+C2SA (https://github.com/yuanyu255/PCNN_C2SA) and PCNN+ATT_RA+BAG_ATT (https://github.com/ZhixiuYe/Intra-Bag-and-Inter-Bag-Attentions), we use the code provided by the authors. Their original papers use the NYT version with 570K training sentences, whose training set contains many of the test facts. Following the mainstream of distant supervised relation extraction, in our experiments they are evaluated on the filtered NYT set (https://github.com/thunlp/NRE) with 520K training sentences.

4.5.2 Results

As shown in Figure 5, there is an obvious margin between FDG-RE and the three baselines. We attribute this observation mainly to two factors: (1) the three baselines all predict relations independently and ignore relation ties, whereas FDG-RE considers the global correlation and mutual exclusion between relations; (2) the objective function of FDG-RE contains both the classification loss term and the penalty term $\ell_{rep}$, so it not only penalizes false classifications but also enhances the generalization ability of the model.

Figure 5: Comparison with the state-of-the-art methods.

4.6 Topology Structure of Relation Ties

In order to verify that our proposed method can indeed capture the global topology structure of relation ties, we visualize the relation embeddings learned by FDG-RE with Isomap [13], a nonlinear dimensionality reduction algorithm. The relation embeddings learned by PCNN+ATT are visualized for comparison. The results are shown in Figure 6. During visualization, we omit the long-tail relations to highlight the key information. It can be observed that:

(1) The topology structure of relation ties learned by FDG-RE is intra-cluster compact and inter-cluster loose, showing clear clustering characteristics. For example, the positions of the four connected relations Nationality, Place_lived, Place_of_birth and Place_of_death are close to each other, while they are far away from the relations whose root node is Location or Business. This is consistent with our motivation. In contrast, the relation embeddings learned by PCNN+ATT are almost randomly distributed.

(2) The relation NA lies at the center of Figure 6 (a). It means "no relation" and conflicts with all of the other relations; because of the effect of the repulsive force, it "pushes" the other relations away. PCNN+ATT cannot capture such structure.

(3) To a certain extent, the relation representations learned by our force-directed graph preserve some "semantic" correlations. For example, the relations in the same branch Location are close to each other. Therefore, the relation embeddings generalize well when acting as a relation classifier, which indirectly corroborates the experimental conclusion of section 4.3.

Figure 6: The visualization of relation embeddings learned by (a) FDG-RE and (b) PCNN+ATT.

5 Related Work

An entity pair may have multiple relations in a knowledge graph. Therefore, previous studies formalize distant supervised relation extraction as a multi-instance multi-label prediction task [10, 6]. Since then, many attempts have focused on exploring the correlation and mutual exclusion between relations to reduce the potential search space. Existing approaches can be broadly divided into two categories:

The first category explicitly represents relation ties through model architecture. For example, Han and Sun (2016) [5] utilize a Markov Logic Network to represent the transition probabilities between relations, and Su et al. (2018) [12] employ an Encoder-Decoder framework to capture relation connections. While benefiting from the model architecture, these methods are also limited by its learning ability: a Markov Logic Network can only consider neighboring nodes (the Markov property), and an Encoder-Decoder framework learns relation dependencies in a linear manner. As a consequence, they can only capture a limited range of relation ties rather than the global connections between relations.

The second category implicitly captures relation ties through soft constraints. For example, Jiang et al. (2016) [7] handle relation connections by using a shared entity-pair-level representation and designing a multi-label classification loss function; Ye et al. (2017) [14] adopt a pairwise learning-to-rank framework to capture co-occurrence dependencies among relations; Feng et al. (2017) [3] propose a two-layer memory network with an attention mechanism to learn relation dependencies. However, these methods greedily capture relation ties according to the current sentence-bag. Although the training process traverses all sentence-bags, focusing on local features cannot yield a precise global topology structure of relation ties.

Different from the aforementioned two kinds of methods, the model proposed in this paper explicitly learns relation correlations by using a GCN to propagate information over a directed graph, and implicitly captures mutual exclusion by adding a penalty term to the objective loss function. Experimental results demonstrate that our proposed force-directed graph can indeed capture the global topology structure of relation ties, and it can be used as a module to augment existing relation extraction methods.

6 Conclusion

In this paper, we study learning relation ties in distant supervised relation extraction and propose a novel force-directed graph based relation extraction model, named FDG-RE. Compared with previous methods, FDG-RE introduces the concepts of attractive force and repulsive force into the relation embedding space and can indeed capture the global topology structure of relation ties. We conduct extensive experiments on a widely used benchmark dataset, and the evaluation results show that our model outperforms state-of-the-art baselines. Besides, the proposed force-directed graph is flexible and adaptable; it can be used as a module to augment other relation extraction methods.

References

  • [1] J. Bergstra and Y. Bengio (2012) Random search for hyper-parameter optimization. Journal of Machine Learning Research 13 (Feb), pp. 281–305.
  • [2] K. Bollacker, C. Evans, P. Paritosh, T. Sturge, and J. Taylor (2008) Freebase: a collaboratively created graph database for structuring human knowledge. In Proceedings of the 2008 ACM SIGMOD International Conference on Management of Data, pp. 1247–1250.
  • [3] X. Feng, J. Guo, B. Qin, T. Liu, and Y. Liu (2017) Effective deep memory networks for distant supervised relation extraction. In IJCAI, pp. 4002–4008.
  • [4] D. Halliday, R. Resnick, and J. Walker (2013) Fundamentals of physics. John Wiley & Sons.
  • [5] X. Han and L. Sun (2016) Global distant supervision for relation extraction. In Thirtieth AAAI Conference on Artificial Intelligence.
  • [6] R. Hoffmann, C. Zhang, X. Ling, L. S. Zettlemoyer, and D. S. Weld (2011) Knowledge-based weak supervision for information extraction of overlapping relations. In Meeting of the Association for Computational Linguistics: Human Language Technologies.
  • [7] X. Jiang, Q. Wang, P. Li, and B. Wang (2016) Relation extraction with multi-instance multi-label convolutional neural networks. In Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: Technical Papers, pp. 1471–1480.
  • [8] T. N. Kipf and M. Welling (2017) Semi-supervised classification with graph convolutional networks. In International Conference on Learning Representations (ICLR).
  • [9] Y. Lin, S. Shen, Z. Liu, H. Luan, and M. Sun (2016) Neural relation extraction with selective attention over instances. In Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pp. 2124–2133.
  • [10] S. Riedel, L. Yao, and A. McCallum (2010) Modeling relations and their mentions without labeled text. In Joint European Conference on Machine Learning and Knowledge Discovery in Databases, pp. 148–163.
  • [11] Y. M. Shang, H. Huang, X. Mao, X. Sun, and W. Wei (2020) Are noisy sentences useless for distant supervised relation extraction? In Proceedings of AAAI.
  • [12] S. Su, N. Jia, X. Cheng, S. Zhu, and R. Li (2018) Exploring encoder-decoder model for distant supervised relation extraction. In IJCAI, pp. 4389–4395.
  • [13] J. B. Tenenbaum, V. De Silva, and J. C. Langford (2000) A global geometric framework for nonlinear dimensionality reduction. Science 290 (5500), pp. 2319–2323.
  • [14] H. Ye, W. Chao, Z. Luo, and Z. Li (2017) Jointly extracting relations with class ties via effective deep ranking. In Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers).
  • [15] D. Zeng, K. Liu, Y. Chen, and J. Zhao (2015) Distant supervision for relation extraction via piecewise convolutional neural networks. In Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, pp. 1753–1762.