1 Introduction
Relation extraction, defined as the task of extracting structured relations from raw unstructured text, is crucial in natural language processing (NLP). Conventional supervised methods are time-consuming because they require large-scale manually labeled data. Therefore, Mintz et al. (2009) propose distant supervision to automatically label sentences. It assumes that if two entities have a relation in a knowledge graph, then any sentence that mentions the two entities might express that relation. As there may be multiple relations between one entity pair, distant supervised relation extraction is a multi-label prediction task.
Interestingly, relations in distant supervision usually exhibit inner correlation and mutual exclusion, which we call relation ties. As shown in Figure 1, if an entity pair (Barack Obama, United States) has the relation President_of, then we can infer that the entity pair must also have the relations Place_lived and Nationality. Similarly, the relation Place_of_birth may also exist between the two entities with a certain probability.
On the contrary, we can also infer that the entity pair does not hold the relation Location, because the head entity Barack Obama is the name of a person, not a place. Obviously, considering relation ties can effectively narrow down the potential search space and significantly improve the performance of relation extraction.

Existing studies on learning relation ties broadly fall into two types: explicit methods and implicit methods. The former use the model architecture to explicitly represent dependencies and conflicts between relations, e.g., a Markov Logic Network [5] or an Encoder-Decoder framework [12]. The latter implicitly learn relation ties through soft constraints, e.g., by designing loss functions [7, 14] or using attention mechanisms [3]. However, as shown in the right part of Figure 1, previous approaches greedily obtain local dependencies between relations at each learning step and have difficulty making a global optimization. As a consequence, they fail to precisely describe the complex global topology structure of relation ties and may easily fall into a locally optimal solution.

To address this issue, in this paper, we propose a novel Force-Directed Graph based Relation Extraction model, named FDG-RE, which is able to comprehensively learn global relation ties in an end-to-end manner. Specifically, we build a graph based on the global co-occurrence of relations, where each node is represented by a relation embedding and each edge indicates the co-occurrence of two relations. Intuitively, in the ideal topology structure of the graph, related nodes should be close while conflicted nodes should be far apart. To this end, we borrow Coulomb's Law [4] from physics and introduce the concepts of attractive force and repulsive force into the graph. The aim of the attractive force is to increase the similarity between two correlated relation embeddings so that they are close to each other in the embedding space, while the repulsive force works in the opposite direction. To simulate the attractive force, we employ a graph convolutional network (GCN) [8] to propagate information between correlated relation embeddings. To simulate the repulsive force, we use the similarity between conflicted relation embeddings as a penalty term in the objective loss function. Finally, the relation representations learned by the force-directed graph are applied as an inter-dependent relation classifier. Experimental results show that FDG-RE performs better than state-of-the-art baselines.
To sum up, our contributions can be encapsulated as follows:

Different from existing methods, the proposed FDG-RE can precisely learn the global topology structure of relation ties.

The proposed force-directed graph can be applied as an independent module to other relation extraction methods to improve their performance.

Experiments on a widely used dataset show that our FDG-RE achieves state-of-the-art performance.
2 Motivations
In the distant supervision scenario, every relation has its own correlated relations and conflicted relations. Therefore, to precisely learn global relation ties, the following three core problems must be solved:
(1) What kind of data structure is appropriate for describing relation ties?
Intuitively, the correlation and mutual exclusion between relations constitute a complex network. In order to comprehensively represent all connections in this network, we build a graph based on the co-occurrence of relations. In this graph, each node is represented by a relation embedding and the edge between two nodes indicates the co-occurrence dependency of the two relations.
(2) What is the ideal topology structure of the graph?
To explore this issue, we borrow the idea of Coulomb's Law from physics. The law states that the force between two charges $q_1$ and $q_2$ is directly proportional to the product of the charges and inversely proportional to the square of the distance between them. If $q_1 q_2 > 0$ (like-sign charges), the force is repulsive; if $q_1 q_2 < 0$ (opposite-sign charges), the force is attractive. When the Coulomb force acts on several charges, it tends to draw opposite-sign charges close together while pushing like-sign charges apart. Finally, all the charges move to an equilibrium where the forces add up to zero and the positions of the charges stay stable. Similarly, we deduce that the ideal topology structure of the graph should match such an equilibrium distribution of charges.
(3) How to model attractive force and repulsive force?
To address this point, we extend the concept of "force" into the relation embedding space. We define the attractive force as a computation that increases the similarity between relation embeddings; on the contrary, the aim of the repulsive force is to reduce the similarity between relation embeddings. To this end, we employ a GCN to propagate information between relation embeddings, playing the role of the attractive force. Besides, we use the similarity between conflicted relation embeddings as a penalty term in the objective loss function, playing the role of the repulsive force.
3 Method
In this section, we present our force-directed graph based distant supervised relation extraction model, FDG-RE. We first give the task definition. Then, we provide a detailed formalization of the force-directed graph, with special emphasis on modeling the attractive force and the repulsive force. Finally, we introduce the implementation of relation extraction.
3.1 Task Definition
We define the relation classes as $R = \{r_1, r_2, \ldots, r_n\}$, where $n$ is the number of relations. Given a bag $B = \{s_1, s_2, \ldots, s_m\}$ consisting of $m$ sentences and an entity pair $(e_h, e_t)$ appearing in all of the sentences, the purpose of distant supervised relation extraction is to predict a set of target relations $Y \subseteq R$ according to the entity pair $(e_h, e_t)$ and the sentence-bag $B$. Because the predicted relations often have inner connections, the goal of learning relation ties is to capture the global correlation and mutual exclusion between relations.
3.2 Learning Relation Ties
This section illustrates how we construct the force-directed graph and model relation ties.
3.2.1 Graph Construction
In order to capture the global topology structure of relation ties, we build a graph $G = (V, E)$ with relation embeddings as nodes $V$ and the co-occurrence between relations as edges $E$, where $|V| = n$ and $|E| = k$ are the number of relations and edges respectively. Concretely, if two relations $(r_i, r_j)$ appear with the same entity pair $(e_h, e_t)$, there is an edge between the two nodes. Finally, as shown in Figure 2, the adjacency matrix of the graph is the symmetric co-occurrence matrix M of relations.
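To make the construction concrete, the following is a minimal NumPy sketch (the `bag_labels` input format and the function name are our own illustration, not the paper's code) that builds the symmetric co-occurrence matrix M from the relation label sets of the entity pairs:

```python
import numpy as np

def cooccurrence_matrix(bag_labels, n_relations):
    """Build the symmetric relation co-occurrence matrix M.

    bag_labels: one set of relation indices per entity pair.
    M[i, j] counts how often relations i and j are labeled together;
    the diagonal M[i, i] counts each relation's own occurrences.
    """
    M = np.zeros((n_relations, n_relations), dtype=np.int64)
    for labels in bag_labels:
        for i in labels:
            for j in labels:
                M[i, j] += 1
    return M

# Toy example: 3 entity pairs, 4 relations.
bags = [{0, 1}, {0, 1, 2}, {3}]
M = cooccurrence_matrix(bags, 4)
```

Here relations 0 and 1 co-occur twice, so there is an edge between their nodes, while relation 3 never co-occurs with the others and stays isolated.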
3.2.2 Attractive Force
In general, the correlations between relations fall into two categories: weak correlation and strong correlation. Weak correlation mainly involves co-occurrence between relations, such as Place_lived and Born_in, while strong correlation means logical entailment, such as "President_of → Nationality". It is worth noting that the correlation between relations is directional. For example, the probability of "President_of → Nationality" is 1, whereas the probability of "Nationality → President_of" is close to 0. To represent such asymmetric correlations, we introduce the occurrence counts of relations into the co-occurrence matrix M to obtain the conditional probability transition matrix P:
$P_{ij} = \frac{M_{ij}}{N_i}$  (1)

where $P_{ij}$ denotes the probability of "$r_i \rightarrow r_j$" and $N_i$ is the number of occurrences of relation $r_i$. As mentioned above, for an ordinary person, the probability of "Nationality → President_of" is close to 0. To make our model more generalizable, we filter such "noisy transitions" with a threshold $\epsilon$: if $P_{ij} < \epsilon$, we set $P_{ij} = 0$.
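A minimal sketch of the row-normalization and threshold filtering, assuming the diagonal of M stores each relation's occurrence count $N_i$ (an assumption consistent with the co-occurrence construction above; the helper name is ours):

```python
import numpy as np

def transition_matrix(M, eps):
    """Turn the co-occurrence matrix M into the conditional probability
    transition matrix P, where P[i, j] approximates p(r_j | r_i) by
    dividing row i by relation i's occurrence count N_i = M[i, i].
    Entries below the threshold eps are zeroed as "noisy transitions".
    """
    counts = np.diag(M).astype(float)          # N_i: occurrences of r_i
    P = M / np.maximum(counts, 1)[:, None]     # P_ij = M_ij / N_i
    P[P < eps] = 0.0
    return P

# Toy symmetric co-occurrence matrix for 4 relations.
M = np.array([[2, 2, 1, 0],
              [2, 2, 1, 0],
              [1, 1, 1, 0],
              [0, 0, 0, 1]])
P = transition_matrix(M, eps=0.18)
```

Note how P is asymmetric even though M is symmetric: P[0, 2] = 1/2 but P[2, 0] = 1/1, mirroring the directional "President_of → Nationality" example.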
Then, we employ an $l$-layer GCN to propagate information between relation embeddings:

$H^{l+1} = \sigma\left(\hat{P} H^{l} W^{l}\right)$  (2)

where $H^{l} \in \mathbb{R}^{n \times d}$ is the relation representation matrix of the $l$-th layer and $d$ is the dimension of the relation embeddings, $\hat{P}$ denotes the filtered probability transition matrix, $W^{l}$ is a weight matrix to learn, and $\sigma(\cdot)$ is a non-linear function. Intuitively, the more information is exchanged between relation embeddings, the closer their spatial locations will be.
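The propagation step of equation (2) can be sketched as follows; using tanh as $\sigma$, the toy $\hat{P}$, and random initialization are illustrative assumptions, not the paper's settings:

```python
import numpy as np

def gcn_layer(H, P_hat, W):
    """One GCN layer H^{l+1} = sigma(P_hat @ H^l @ W^l), with tanh as the
    non-linearity. Multiplying by P_hat lets each relation embedding
    aggregate the embeddings of its correlated relations, pulling them
    closer together -- the "attractive force"."""
    return np.tanh(P_hat @ H @ W)

rng = np.random.default_rng(0)
n, d = 4, 8
H0 = rng.normal(size=(n, d))           # initial relation embeddings
# Toy filtered transition matrix: relation 0 attends to 1, 2 attends to 0.
P_hat = np.array([[1.0, 1.0, 0.0, 0.0],
                  [0.0, 1.0, 0.0, 0.0],
                  [0.5, 0.0, 1.0, 0.0],
                  [0.0, 0.0, 0.0, 1.0]])
W = rng.normal(size=(d, d)) * 0.1
H1 = gcn_layer(H0, P_hat, W)
```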
3.2.3 Repulsive Force
To obtain global mutual exclusion information between relations, we transform the co-occurrence matrix M into a mutual exclusion matrix U:

$U_{ij} = \begin{cases} 1, & M_{ij} = 0 \\ 0, & M_{ij} > 0 \end{cases}$  (3)
Then, we define the similarity between relation embeddings $h_i$ and $h_j$ with a simple dot product:

$S_{ij} = h_i^{\top} h_j$  (4)
Note that we assume a relation is "conflicted" with itself, that is, $U_{ij} = 1$ if $i = j$. This assumption can be regarded as a normalization that makes the relation embeddings more stable. Thus, the global mutual exclusion between relations is defined as:

$E = \sum_{i=1}^{n} \sum_{j=1}^{n} (U \odot S)_{ij}$  (5)
where $\odot$ denotes the element-wise multiplication operation. Because $E$ is the sum of all pairwise mutual exclusion scores, its value is too large. We further scale it by:

$\bar{E} = \frac{E}{n^{2}}$  (6)
Finally, we leverage $\bar{E}$ as a penalty term in the objective loss function to act as the repulsive force.
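The repulsive-force penalty can be sketched end-to-end; the $n^2$ scaling and the exact form of U follow our reading of equations (3)–(6) and should be treated as one plausible instantiation:

```python
import numpy as np

def repulsive_penalty(H, M):
    """Scaled repulsive-force penalty.

    U marks conflicted relation pairs (pairs that never co-occur, plus
    the diagonal per the self-conflict assumption), S holds dot-product
    similarities, and the penalty averages U * S over all n^2 pairs.
    """
    n = M.shape[0]
    U = (M == 0).astype(float)
    np.fill_diagonal(U, 1.0)     # a relation "conflicts" with itself
    S = H @ H.T                  # S[i, j] = h_i . h_j
    return float(np.sum(U * S)) / (n * n)

# Two relations that always co-occur: only the diagonal of U is active.
H = np.array([[1.0, 0.0],
              [0.0, 1.0]])
M = np.array([[1, 1],
              [1, 1]])
penalty = repulsive_penalty(H, M)
```

Minimizing this term shrinks the similarity between conflicted embeddings, which is exactly the "pushing apart" effect described above.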
3.3 Relation Extraction
In FDG-RE, the position embeddings proposed by Zeng et al. (2014) are adopted to specify the target entity pair $(e_h, e_t)$ and make the model pay more attention to the words close to the target entities. The final representation of a word is the concatenation of its word embedding $\mathbf{v}_i$ and two position embeddings $\mathbf{p}_i^{1}$ and $\mathbf{p}_i^{2}$:

$\mathbf{w}_i = [\mathbf{v}_i; \mathbf{p}_i^{1}; \mathbf{p}_i^{2}]$  (7)
We employ PCNN [15] to learn sentence-level features, which mainly consists of two parts: a traditional convolutional neural network (CNN) and piecewise max-pooling. Suppose $c$ is one of the feature maps learned by the CNN. PCNN divides every feature map into three parts $\{c^{1}, c^{2}, c^{3}\}$ by the positions of the two target entities $(e_h, e_t)$. Then, the max-pooling operation is performed on the three parts separately. The final sentence representation $\mathbf{s}$ is the concatenation of all the pooled vectors:

$\mathbf{s} = [\max(c^{1}); \max(c^{2}); \max(c^{3})]$  (8)
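Piecewise max-pooling over a single feature map can be sketched as follows (the boundary handling at the entity positions is an assumption; the paper does not specify it in this excerpt):

```python
import numpy as np

def piecewise_max_pool(feature_map, head_pos, tail_pos):
    """Split one convolutional feature map into three segments at the
    two entity positions and max-pool each segment separately, as in
    PCNN. Empty segments pool to 0.0."""
    a, b = sorted((head_pos, tail_pos))
    parts = [feature_map[:a + 1],
             feature_map[a + 1:b + 1],
             feature_map[b + 1:]]
    return np.array([p.max() if p.size else 0.0 for p in parts])

c = np.array([0.2, 0.9, 0.1, 0.5, 0.3, 0.8])
pooled = piecewise_max_pool(c, head_pos=1, tail_pos=3)
# segments: [0..1], [2..3], [4..5]
```

Unlike a single global max-pool, this keeps coarse information about where strong features occur relative to the two entities.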
We employ sentence-level selective attention to combine the embedded sentences into one bag representation $\mathbf{b}$, aiming to aggregate information across sentences:

$\mathbf{b} = \sum_{i=1}^{m} \alpha_i \mathbf{s}_i$  (9)
where $\alpha_i$ is calculated by:

$\alpha_i = \frac{\exp(\mathbf{s}_i \mathbf{A} \mathbf{q})}{\sum_{k=1}^{m} \exp(\mathbf{s}_k \mathbf{A} \mathbf{q})}$  (10)

Here $\alpha_i$ is a coupling coefficient that scores how well the input sentence $\mathbf{s}_i$ and the target relation query $\mathbf{q}$ match, and $\mathbf{A}$ is a weighted diagonal matrix. The output of the neural network is:

$\mathbf{o} = \mathbf{B} \mathbf{H}^{\top}$  (11)
where $\mathbf{B}$ denotes the sentence-bag representation matrix and $\mathbf{H}$ is the relation representation matrix learned by the GCN; the bias term in this equation is omitted for convenience of description. In fact, $\mathbf{H}$ acts as an inter-dependent classification network that has learned the correlations between relations. We employ softmax to get the final prediction probability:
$p(r_i \mid B, \theta) = \frac{\exp(o_i)}{\sum_{k=1}^{n} \exp(o_k)}$  (12)
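Equations (9)–(12) can be combined into one small sketch. The sentence scoring here is a plain dot product against a query vector q, a simplification of the selective attention form above (the diagonal matrix A is dropped for brevity):

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def bag_prediction(S, q, H):
    """Selective attention over sentence vectors S (m x d), then
    classification against the GCN-refined relation matrix H (n x d).
    Bias terms are omitted, as in the text."""
    alpha = softmax(S @ q)       # coupling coefficients, eq. (10)
    b = alpha @ S                # bag representation, eq. (9)
    o = H @ b                    # per-relation scores, eq. (11)
    return softmax(o)            # prediction probabilities, eq. (12)

rng = np.random.default_rng(1)
m, n, d = 3, 5, 8                # sentences, relations, embedding dim
S = rng.normal(size=(m, d))
q = rng.normal(size=d)
H = rng.normal(size=(n, d))
p = bag_prediction(S, q, H)
```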
Finally, the objective function of the model is:

$J(\theta) = -\sum_{i=1}^{T} \log p(y_i \mid B_i, \theta) + \lambda \bar{E}$  (13)

where $\bar{E}$ represents the repulsive force between mutually exclusive relations obtained by equation (6), $\lambda$ is a harmonic factor that balances the two terms, and $T$ is the number of sentence-bags. $y_i$ is the set of predicted relations of sentence-bag $B_i$, and $\theta$ indicates all parameters of the model.
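The objective of equation (13) reduces to a few lines; the `probs`/`gold` names and the single gold label per bag are illustrative simplifications of the multi-label setting:

```python
import numpy as np

def objective(probs, gold, penalty, lam):
    """J = negative log-likelihood of the gold relation of each bag,
    plus the scaled repulsive-force penalty weighted by the harmonic
    factor lambda (0.25 in Table 1)."""
    nll = -sum(np.log(p[y]) for p, y in zip(probs, gold))
    return float(nll + lam * penalty)

# Two bags, two relations; predicted distributions and gold labels.
probs = [np.array([0.7, 0.3]), np.array([0.2, 0.8])]
gold = [0, 1]
J = objective(probs, gold, penalty=0.5, lam=0.25)
```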
4 Experiments
Our experiments are designed to demonstrate four points:

The proposed force-directed graph can be used as a module to augment existing relation extraction methods and significantly improve their performance (section 4.3).

Among the similar methods of learning relation ties, our FDG-RE performs best (section 4.4).

FDG-RE outperforms the state-of-the-art distant supervised relation extraction methods (section 4.5).

FDG-RE can indeed learn the topology structure of relation ties (section 4.6).
In the following, we first introduce the dataset and evaluation metrics. Second, we show the experimental setup. Third, we conduct three parts of detailed comparison in response to experimental purposes 1–3. Finally, we illustrate the visualization of relation embeddings in response to experimental purpose 4.
4.1 Dataset and Evaluation Metrics
We evaluate our FDG-RE and all baselines on the widely used NYT dataset developed by Riedel et al. (2010), which was constructed by aligning relations in Freebase [2] with the New York Times (NYT) corpus. In the NYT dataset, training sentences are from the 2005–2006 corpus and test sentences are from 2007. Specifically, it contains 520K training sentences and 172K test sentences. There are 53 unique relations, including a special relation NA that signifies no relation between the entity pair.
Following previous methods [15, 9, 14, 12, 11], we evaluate all models with held-out evaluation and present precision-recall curves (PR-curves). Held-out evaluation is an approximate measure that automatically compares the extracted relations against the facts in the knowledge graph.
4.2 Setup
For all baselines, we follow the settings used in their papers during training. We set the hyperparameters of FDG-RE by random search [1]. Table 1 shows the parameters used in FDG-RE.
Setting  Number 

Kernel size  3 
Number of feature maps  320 
Word embedding dimension  50 
Position embedding dimension  5 
Learning rate  0.19 
Threshold $\epsilon$  0.18 
Harmonic factor $\lambda$  0.25 
Number of GCN layers  2 
4.3 Act as a Module
In this section, we conduct experiments to demonstrate that the proposed force-directed graph can be applied as a module to augment existing relation extraction methods and significantly improve their performance.
4.3.1 Baselines
We select three conventional relation extraction methods as baselines. During extraction, they all predict relations independently and ignore the relation ties.

PCNN+ATT: Lin et al. (2016) propose to use a sentence-level attention mechanism to obtain sentence-bag representations.

PCNN+AVE: We obtain bag-level representations via the average of all sentence representations.

PCNN+ONE: Zeng et al. (2015) propose to use the feature of the most likely correct sentence to represent the sentence-bag.
4.3.2 Results
The results are shown in Figure 3, where +FDG means applying the force-directed graph to the corresponding model. It can be observed that the proposed module significantly improves the performance of all three baselines. This proves that: (1) considering relation ties in distant supervised relation extraction can indeed reduce the potential search space and improve prediction performance; (2) the proposed force-directed graph is flexible and adaptable.
4.4 Compare with Similar Methods
In this part, we compare FDG-RE with similar methods that focus on learning relation ties, to show that our model performs best.
4.4.1 Baselines
We use the following four models as baselines:

MIMLCNN: Jiang et al. (2016) capture relation dependencies by designing a multi-label loss function in a neural network classifier.

Rank+ExATT: Ye et al. (2017) adopt a pairwise learning-to-rank framework to capture the co-occurrence dependencies between relations.

Memory: Feng et al. (2017) use memory networks to capture relation dependencies.

PartialMax+IQ+ATT: Su et al. (2018) utilize the Encoder-Decoder framework to capture relation dependencies and predict relations with an RNN decoder.
We implemented MIMLCNN and PartialMax+IQ+ATT ourselves. For Rank+ExATT (https://github.com/oceanypt/DR_RE) and Memory (https://github.com/liuyongjie985/Effective_Deep_Memory_Networks_for_Distant_Supervised_Relation_Extraction), we use the code provided by the authors.
4.4.2 Results
Figure 4 shows the resulting PR-curves in the region of most concern. It can be observed that:
(1) Comparing the explicit methods (FDG-RE, PartialMax+IQ+ATT) with the implicit methods (MIMLCNN, Rank+ExATT, Memory), we can conclude that the explicit methods perform better. This is because the RNN can well describe linear dependencies between relations and the GCN is good at learning regional dependencies; in other words, using an RNN or a GCN injects prior knowledge of the topology structure at the beginning of the training process.
(2) Between the two explicit methods, FDG-RE consistently and significantly outperforms PartialMax+IQ+ATT over the entire range of recall. This proves that considering global connections is better than focusing on local dependencies. Concretely, PartialMax+IQ+ATT uses a linear Encoder-Decoder framework to learn relation ties, so the predefined order of relations has an important influence on the final prediction. For example, if the decoder has predicted the relation Nationality, it will not predict President_of at later steps; as a result, all the downstream relations of President_of become unreachable.
Overall, the experimental results demonstrate that the motivation of our work, using a force-directed graph to model the global topology structure of relation ties, is effective.
4.5 Compare with SOTA Methods
We further compare FDG-RE with the latest distant supervised relation extraction methods to show that our model achieves state-of-the-art performance.
4.5.1 Baselines
Our model does not use any external information (e.g., entity types, entity descriptions, and so on). Therefore, we select three recent methods that do not use external information as baselines:

PCNN+C2SA: Yuan et al. (2019) use cross-relation cross-bag selective attention to deal with the noisy labeling problem.

PCNN+ATT_RA+BAG_ATT: Ye and Ling (2019) propose intra-bag and inter-bag attention to alleviate the influence of noisy sentences.

DCRE: [11] tries to convert noisy sentences into useful training instances by unsupervised deep clustering.
We implemented DCRE ourselves. For PCNN+C2SA (https://github.com/yuanyu255/PCNN_C2SA) and PCNN+ATT_RA+BAG_ATT (https://github.com/ZhixiuYe/Intra-Bag-and-Inter-Bag-Attentions), we use the code provided by the authors. Their original papers use a version of NYT that contains 570K training sentences, whose training set contains many test-set facts. Following the mainstream of distant supervised relation extraction, in our experiments they are evaluated on the filtered NYT set (https://github.com/thunlp/NRE) with 520K training sentences.
4.5.2 Results
As shown in Figure 5, there is an obvious margin between FDG-RE and the three baselines. We believe this is mainly because: (1) the three baselines all predict relations independently and ignore relation ties, whereas FDG-RE considers the global correlation and mutual exclusion between relations; (2) the objective function of FDG-RE has both a loss term and a penalty term, which not only penalizes false classifications but also enhances the generalization ability of the model.
4.6 Topology Structure of Relation Ties
In order to prove that our proposed method can indeed capture the global topology structure of relation ties, we visualize the relation embeddings learned by FDG-RE with Isomap [13], a nonlinear dimensionality reduction algorithm. The relation embeddings learned by PCNN+ATT are also visualized for comparison. The results are shown in Figure 6. During visualization, we omit the long-tail relations to highlight the key information. It can be observed that:
(1) The topology structure of relation ties learned by FDG-RE is compact within clusters and loose between them, showing clear clustering characteristics. For example, the positions of the four connected relations Nationality, Place_lived, Place_of_birth and Place_of_death are close to each other, while they are far away from the relations whose root node is Location or Business. This is consistent with our motivation. In contrast, the relation embeddings learned by PCNN+ATT are almost randomly distributed.
(2) The relation NA is in the center of Figure 6 (a). It means "no relation" and conflicts with all the other relations. Because of the effect of the repulsive force, it "pushes" the other relations away. PCNN+ATT cannot capture such features.
(3) To a certain extent, the relation representations learned by our force-directed graph maintain some "semantic" correlations. For example, the relations in the same branch Location are close to each other. Therefore, the relation embeddings generalize well when used as a relation classifier, which indirectly supports the experimental conclusion of section 4.3.
5 Related Work
An entity pair may have multiple relations in a knowledge graph. Therefore, previous studies formalize distant supervised relation extraction as a multi-instance multi-label prediction task [10, 6]. Afterwards, many attempts focus on exploring the correlation and mutual exclusion between relations to reduce the potential search space. Existing approaches can be broadly divided into two categories:
The first category explicitly represents relation ties through the model architecture. For example, Han and Sun (2016) utilize a Markov Logic Network to represent the transition probabilities between relations, and Su et al. (2018) employ the Encoder-Decoder framework to capture relation connections. While benefiting from the model architecture, these methods are limited by its learning ability. Specifically, a Markov Logic Network can only consider neighboring nodes (the Markov property), and the Encoder-Decoder framework learns relation dependencies in a linear manner. As a consequence, they can only capture a limited range of relation ties, rather than the global connections between relations.
The second category implicitly captures relation ties through soft constraints. For example, Jiang et al. (2016) handle relation connections by using a shared entity-pair-level representation and designing a multi-label classification loss function. Ye et al. (2017) adopt a pairwise learning-to-rank framework to capture co-occurrence dependencies among relations. Feng et al. (2017) propose a two-layer memory network and employ the attention mechanism to learn relation dependencies. However, these methods greedily capture relation ties according to the current sentence-bag. Although the training process traverses all sentence-bags, focusing on local features cannot yield a precise global topology structure of relation ties.
Different from the aforementioned two kinds of methods, the model proposed in this paper explicitly learns relation correlations by using a GCN to propagate information over a directed graph, and implicitly captures mutual exclusion by adding a penalty term to the objective loss function. Experimental results demonstrate that the proposed force-directed graph can indeed capture the global topology structure of relation ties and can be used as a module to augment existing relation extraction methods.
6 Conclusion
In this paper, we study learning relation ties in distant supervised relation extraction and propose a novel force-directed graph based relation extraction model, named FDG-RE. Compared with previous methods, FDG-RE introduces the concepts of attractive force and repulsive force into the relation embedding space and can indeed capture the global topology structure of relation ties. We conduct various experiments on a widely used benchmark dataset, and the evaluation results show that our model outperforms state-of-the-art baselines. Besides, the proposed force-directed graph is flexible and adaptable; it can be used as a module to augment other relation extraction methods.
References

[1] (2012) Random search for hyper-parameter optimization. Journal of Machine Learning Research 13 (Feb), pp. 281–305.
[2] (2008) Freebase: a collaboratively created graph database for structuring human knowledge. In Proceedings of the 2008 ACM SIGMOD International Conference on Management of Data, pp. 1247–1250.
[3] (2017) Effective deep memory networks for distant supervised relation extraction. In IJCAI, pp. 4002–4008.
[4] (2013) Fundamentals of Physics. John Wiley & Sons.
[5] (2016) Global distant supervision for relation extraction. In Thirtieth AAAI Conference on Artificial Intelligence.
[6] (2011) Knowledge-based weak supervision for information extraction of overlapping relations. In Proceedings of the Annual Meeting of the Association for Computational Linguistics: Human Language Technologies.
[7] (2016) Relation extraction with multi-instance multi-label convolutional neural networks. In Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: Technical Papers, pp. 1471–1480.
[8] (2017) Semi-supervised classification with graph convolutional networks. In International Conference on Learning Representations (ICLR).
[9] (2016) Neural relation extraction with selective attention over instances. In Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pp. 2124–2133.
[10] (2010) Modeling relations and their mentions without labeled text. In Joint European Conference on Machine Learning and Knowledge Discovery in Databases, pp. 148–163.
[11] (2020) Are noisy sentences useless for distant supervised relation extraction? In Proceedings of AAAI.
[12] (2018) Exploring encoder-decoder model for distant supervised relation extraction. pp. 4389–4395.
[13] (2000) A global geometric framework for nonlinear dimensionality reduction. Science 290 (5500), pp. 2319–2323.
[14] (2017) Jointly extracting relations with class ties via effective deep ranking. In Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers).
[15] (2015) Distant supervision for relation extraction via piecewise convolutional neural networks. In Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, pp. 1753–1762.