Capturing the semantic relationship between two concepts is a fundamental operation for many semantic interpretation tasks. This is a task which humans perform rapidly and reliably by using their linguistic and commonsense knowledge about entities and relations. Natural language processing systems which aspire to reach the goal of producing meaningful representations of text must be equipped to identify and learn semantic relations in the documents they process.
The automatic recognition of semantic relations has many applications, such as information extraction, document summarization, machine translation, and the construction of thesauri and semantic networks. It can also facilitate auxiliary tasks such as word sense disambiguation, language modeling, paraphrasing, and recognizing textual entailment.
However, it is not always possible to establish a direct semantic relation between two entity mentions in text. In the Semeval 2010 Task 8 test collection, for example, 17.39% of the semantic relations annotated within sentences were assigned the label "OTHER", meaning that they could not be mapped to the set of 9 direct semantic relations (Cause-Effect, Instrument-Agency, Product-Producer, Content-Container, Entity-Origin, Entity-Destination, Component-Whole, Member-Collection, Communication-Topic). In many cases, the semantic relation between two entities can only be expressed by a composition of two or more relations. This work aims at improving the description and the formalization of the semantic relation classification task by introducing the concept of composite semantic relation classification, in which the relations between entities can be expressed by the composition of one or more relations.
This paper is organized as follows: Section 2 describes the semantic relation classification problem and related work, followed by the proposed composite semantic relation classification (Section 3); Section 4 describes the existing baseline models; Section 5 describes the experimental setup and analyses the results, providing a comparative analysis between the proposed model and the baselines. Finally, Section 6 provides the conclusion.
2 Composite Semantic Relation Classification
2.1 Semantic Relation Classification
Semantic relation classification is the task of classifying the underlying abstract semantic relation between target entities (terms) present in texts. The goal of relation classification is defined as follows: given a sentence with a pair of annotated target nominals e1 and e2, the relation classification system aims to classify the relation between e1 and e2 within a pre-defined relation set. For instance, the relation between the nominals burst and pressure in the following example sentence is interpreted as Cause-Effect.
The burst has been caused by water hammer pressure.
2.2 Existing Approaches for Semantic Relation Classification
Different approaches have been explored for relation classification, including unsupervised relation discovery and supervised classification. The existing literature proposes various features to identify the relations between entities using different methods.
Recently, neural-network-based approaches have achieved significant improvements over traditional methods based on human-designed features. However, existing neural networks for relation classification are usually based on shallow architectures (e.g., one-layer convolutional or recurrent networks), and may fail to explore the potential representation space at different abstraction levels. The performance of supervised approaches strongly depends on the quality of the designed features; with the recent improvements in Deep Neural Networks (DNNs), many researchers are experimenting with unsupervised methods for automatic feature learning.
Gated recurrent networks, in particular Long Short-Term Memory (LSTM) networks, have been introduced for relation classification, as have Convolutional Neural Networks (CNNs). Other work replaces the common softmax loss function with a ranking loss in a CNN model, or designs a negative sampling method based on CNNs. From the viewpoint of model ensembling, CNNs and recursive networks have been combined along the Shortest Dependency Path (SDP), and CNNs have been incorporated with Recurrent Neural Networks (RNNs).
3 From Single to Composite Relation Classification
The goal of this work is to propose an approach for semantic relation classification using one or more relations between term mentions/entities.
"The was carefully wrapped and bound into the by means of a cord."
In this example, the relationship between the two entities cannot be directly expressed by one of the nine abstract semantic relations of the Semeval 2010 Task 8 set.
However, looking into a commonsense KB (in this case, ConceptNet V5.4), we can see the following set of composite relations between these elements:
As the number of edges that can be included in a semantic relation composition grows (the size of the semantic relation path), there is a dramatic increase in the number of paths which connect the two entities. For example, for one pair of words there are 15 paths of size 2, 1079 paths of size 3 and 95380 paths of size 4. Additionally, as the path size grows, many non-relevant (less meaningful) relations will be included.
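This combinatorial growth can be reproduced on a toy graph. Below is a minimal sketch; the five-node complete graph and its concept labels are illustrative placeholders, not taken from ConceptNet:

```python
def count_walks(graph, src, dst, length):
    """Count walks with exactly `length` edges from src to dst."""
    if length == 0:
        return 1 if src == dst else 0
    return sum(count_walks(graph, nxt, dst, length - 1)
               for nxt in graph.get(src, ()))

# Toy KB: a complete directed graph over five concepts (illustrative only).
nodes = ["child", "cradle", "person", "bed", "cord"]
toy_graph = {n: [m for m in nodes if m != n] for n in nodes}

# Number of connecting walks of sizes 1..4 between two fixed concepts.
growth = [count_walks(toy_graph, "child", "cradle", k) for k in (1, 2, 3, 4)]
```

Even in a graph with only five densely connected nodes, the count of connecting walks grows rapidly with path size, mirroring the explosion observed in ConceptNet.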
The challenge in composite semantic relation classification is to provide a classification method that selects the most meaningful set of relations for the context at hand. This is challenging because, as previously mentioned, a simple KB-lookup-based approach would return all semantic associations, regardless of relevance.
To achieve this goal we propose an approach which combines sequence machine learning models, distributional semantic models and commonsense relations knowledge bases to provide an accurate method for composite semantic relation classification.
The proposed model (Fig 1) relies on the combination of the following approaches:
Use existing structured commonsense KBs to define an initial set of semantic relation compositions.
Use a pre-filtering method based on the Distributional Navigational Algorithm (DNA) proposed by Freitas et al. (2014).
Use a sequence-based neural network model to quantify the sequence probabilities of the semantic relation compositions. We call this model the Neural Entity/Relation Model, in analogy to a language model.
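The three steps above can be sketched as a pipeline. Everything below is an illustrative skeleton: the function names, the toy path set, the coherence score and the length-based prior are placeholders standing in for the actual KB lookup, DNA filter and neural model:

```python
def kb_lookup(e1, e2):
    """Step 1: enumerate candidate relation paths between e1 and e2
    in a commonsense KB (stubbed here with a fixed toy path set)."""
    toy_paths = {
        ("child", "cradle"): [
            ["AtLocation"],                        # direct relation
            ["IsA", "AtLocation"],                 # composition of size 2
            ["RelatedTo", "UsedFor", "AtLocation"] # composition of size 3
        ],
    }
    return toy_paths.get((e1, e2), [])

def coherence(path):
    # Placeholder score: shorter compositions assumed more coherent.
    # The real DNA uses distributional semantic relatedness instead.
    return 1.0 / len(path)

def dna_filter(paths, threshold=0.5):
    """Step 2: keep only paths whose coherence passes a threshold."""
    return [p for p in paths if coherence(p) >= threshold]

def rank_by_sequence_probability(paths):
    """Step 3: order surviving paths by the likelihood of the relation
    sequence (the real model is an LSTM; here, a simple length prior)."""
    return sorted(paths, key=len)

def classify_composite(e1, e2):
    return rank_by_sequence_probability(dna_filter(kb_lookup(e1, e2)))
```

Under these stand-in scores, the size-3 composition is filtered out and the direct relation is ranked first, which is the qualitative behavior the real pipeline aims for.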
3.2 Commonsense KB Lookup
The first step consists in the use of a large commonsense knowledge base as a reference for sequences of semantic relations. ConceptNet is a semantic network built from existing linguistic resources and crowd-sourced contributions. Its nodes represent words or short phrases of natural language, connected by labeled abstract relationships.
1094 paths were extracted from ConceptNet for given entity pairs with no corresponding semantic relation in the Semeval 2010 Task 8 test collection (Figure 1(i)). Examples of such paths are:
3.3 Distributional Navigational Algorithm (DNA)
The Distributional Navigational Algorithm (DNA) is an approach which uses distributional semantic models (DSMs) as a relevance-based heuristic for selecting relevant facts attached to a contextual query. The approach focuses on addressing the following problems: (i) providing a semantic selection mechanism for facts which are relevant and meaningful in a particular reasoning & querying context, and (ii) coping with information incompleteness in large KBs.
DSMs are used as a complementary semantic layer to the relational model, which supports coping with semantic approximation and incompleteness.
For large-scale and open-domain commonsense reasoning scenarios, model completeness and full materialization cannot be assumed. A commonsense KB contains vast amounts of facts, and complete inference over the entire KB would not scale to its size. Although several meaningful paths may exist between two entities, there is also a large number of paths which are not meaningful in a specific context. A reasoning path that drifts towards concepts unrelated to the target entity pair should be eliminated by the application of the Distributional Navigation Algorithm (DNA), which computes the distributional semantic relatedness between the target entities and the intermediate entities in the KB path as a measure of semantic coherence. The algorithm navigates from the source entity in the direction of the target entity in the KB, using the distributional semantic relatedness between the target node and the intermediate nodes as a heuristic.
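A minimal sketch of the DNA heuristic follows, assuming hand-set toy vectors in place of a real distributional semantic model (the vector values and the 0.6 threshold are illustrative assumptions):

```python
import math

def cosine(u, v):
    """Cosine similarity, the usual relatedness measure over DSM vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

# Toy 3-dimensional "distributional" vectors (illustrative values only;
# a real DSM would use corpus-derived embeddings).
vec = {
    "child":  [0.9, 0.1, 0.0],
    "cradle": [0.8, 0.3, 0.1],
    "person": [0.7, 0.2, 0.1],
    "iron":   [0.0, 0.1, 0.9],
}

def coherent_path(intermediates, target, threshold=0.6):
    """Keep a KB path only if every intermediate node is distributionally
    related to the target above a threshold (the DNA navigation heuristic)."""
    return all(cosine(vec[n], vec[target]) >= threshold
               for n in intermediates)
```

In this toy space a path through "person" towards "cradle" is kept, while a path that drifts through "iron" is pruned, which is the behavior the heuristic is designed to produce.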
3.4 Neural Entity/Relation Model
The Distributional Navigational Algorithm provides a pre-filtering of the relations, maximizing semantic relatedness coherence. This can be complemented by a predictive model which takes into account the likelihood of a sequence of relations, i.e. the likelihood of a composition sequence. The goal is to systematically compute the probability of a relation composition sequence, in a similar fashion to a language model. For this purpose we use a Long Short-Term Memory (LSTM) recurrent neural network architecture (Figure 3).
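The quantity this model estimates can be illustrated with the chain rule over conditional probabilities. The sketch below uses a hand-specified first-order conditional table (values are illustrative, not learned); an LSTM plays the same role but conditions on the full history rather than only the previous relation:

```python
import math

# Illustrative conditional probabilities P(next relation | previous relation).
# "<s>" and "</s>" mark sequence start and end, as in a language model.
cond_prob = {
    ("<s>", "IsA"): 0.5,
    ("<s>", "AtLocation"): 0.4,
    ("IsA", "AtLocation"): 0.6,
    ("AtLocation", "</s>"): 0.7,
}

def sequence_log_prob(relations, floor=1e-6):
    """Log-probability of a relation composition under a first-order
    factorization; unseen transitions get a small floor probability."""
    seq = ["<s>"] + relations + ["</s>"]
    return sum(math.log(cond_prob.get((prev, cur), floor))
               for prev, cur in zip(seq, seq[1:]))
```

A composition whose relations occur in a plausible order receives a far higher score than the same relations in an implausible order, which is exactly the signal used to rank candidate compositions.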
4 Baseline Models
As baselines we use bigram language models, which define the conditional probability of a semantic relation given the preceding relations and entities.
The performance of the baseline systems is measured using the CSRC (Composite Semantic Relation Classification) task, as defined in Section 5.1, where we hold out the last relation and rate a system by its ability to infer this relation.
Random Model: This is the simplest baseline, which outputs randomly selected relation pairs.
Unigram Model: Predicts the next relation based on the unigram probability of each relation, calculated from the training set. In this model, relations are assumed to occur independently.
Single Model: The single model is defined by bigram probabilities P(a, b), where P(a, b) is the probability of seeing a and b, in order. Let A be an ordered list of relations and entities and let n be the length of A; for 1 ≤ i ≤ n, define A_i to be the i-th element of A. We rank candidate relations r by maximizing F(r, A), defined as the sum over i of log P(A_i, r), where the probabilities are calculated from bigram counts in the training set.
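This ranking can be sketched under a toy training set. The relation names and counts are illustrative, and the add-one smoothing is an assumption, since the original smoothing scheme is not specified:

```python
import math
from collections import Counter

# Toy training sequences of relations (illustrative names only).
train = [
    ["IsA", "AtLocation"],
    ["IsA", "AtLocation"],
    ["IsA", "UsedFor"],
    ["PartOf", "AtLocation"],
]

unigrams = Counter(r for seq in train for r in seq)
bigrams = Counter(pair for seq in train for pair in zip(seq, seq[1:]))

def p_unigram(r):
    # The Unigram baseline ranks candidates by this quantity alone.
    return unigrams[r] / sum(unigrams.values())

def p_bigram(a, b):
    # P(a, b): probability of seeing a and b in order,
    # with add-one smoothing over the relation vocabulary.
    return (bigrams[(a, b)] + 1) / (unigrams[a] + len(unigrams))

def score(r, context):
    # F(r, A): sum of log bigram probabilities of candidate r
    # following each element of the observed sequence A.
    return sum(math.log(p_bigram(a, r)) for a in context)

# The candidate maximizing F(r, A) for the held-out position.
best = max(unigrams, key=lambda r: score(r, ["IsA"]))
```

Given the context ["IsA"], the candidate "AtLocation" wins because it follows "IsA" most often in the toy training data.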
Random Forest: An ensemble learning method for classification and other tasks that operates by constructing a multitude of decision trees at training time and outputting the class that is the mode of the classes predicted by the individual trees. Random decision forests correct for decision trees' habit of overfitting to their training set.
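The majority-vote mechanism can be sketched with single-threshold decision stumps standing in for fully grown trees (all thresholds, features and labels below are hand-set for illustration, not trained):

```python
from statistics import mode

def make_stump(feature_index, threshold, left, right):
    """The simplest possible decision tree: one threshold test."""
    def stump(x):
        return left if x[feature_index] <= threshold else right
    return stump

# Three stumps playing the role of the forest's trees.
forest = [
    make_stump(0, 0.5, "OTHER", "Cause-Effect"),
    make_stump(1, 0.8, "OTHER", "Cause-Effect"),
    make_stump(0, 0.9, "OTHER", "Cause-Effect"),
]

def forest_predict(x):
    """Random-forest prediction: the mode of the individual tree votes."""
    return mode(tree(x) for tree in forest)
```

Each stump can be wrong on its own; the aggregated vote smooths out individual overfitting, which is the intuition behind the ensemble.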
5 Experimental Evaluation
5.1 Training and Test Dataset
The evaluation dataset was generated by collecting all pairs of entity mentions in the Semeval 2010 Task 8 collection which had no attached semantic relation classification (i.e. which were assigned the relation label "OTHER").
For all entity pairs with unassigned relation labels, we performed a KB lookup, generating all paths of sizes 1, 2 and 3 (number of relations) occurring between the two entities and their relations, where each path contains the intermediate entities between the target entity mentions e1 and e2.
In the next step, the Distributional Navigational Algorithm (DNA) is applied over the entity paths. In the final step of generating the training & test datasets, the best paths are selected manually from the filtered path sets.
From the 602 entity pairs assigned the "OTHER" relation label in Semeval, we found paths between entity pairs in ConceptNet. With the Distributional Navigation Algorithm (DNA), meaningless paths were eliminated, leaving a filtered set of paths for the entity pairs.
Overall, the dataset comprises the extracted relations and entities. All paths were converted into the following format, which is input to the neural network (Table 1).
We provide statistics for the generated datasets in Tables 2 and 3. In Table 3, our dataset is divided into a training set and a test set; a percentage of the training set was additionally held out for cross-validation, yielding examples for training, validation and testing. Table 2 shows statistics for the test dataset of the baseline models.
| Test Dataset | # Length 2 | # Length 4 | # Length 6 |
| Dataset | # Train | # Dev | # Test |
To achieve the classification goal, we trained an LSTM model for the composite relation classification task. In our experiments, a batch size of 25 and 50 epochs were used. An embedding layer using Word2Vec pre-trained vectors was used.
In our experiments, we optimized the hyperparameters of the LSTM model. After several experiments, the best model was obtained with:
Inputs length and dimension are and , respectively.
Three hidden layers with , and nodes and activation,
Dropout technique (),
We experimented with our LSTM model using three different pre-trained word embedding models:
Word2Vec (Google News) with 300 dimensions
Word2Vec (Wikipedia 2016) with 30 dimensions
No pre-training word embedding
The accuracy for the configuration above after 50 epochs is shown in the table below.
| CSRC | W2V Google_News | W2V Wikipedia | No Pre Training |
Table 5 contains the Precision, Recall, F1-Score and Accuracy.
Among the evaluated models, LSTM-CSRC achieved the highest F1-score and accuracy. The Single model achieved the second highest accuracy, followed by the Random Forest model. The LSTM approach provides an improvement of 9.86% in accuracy over the baselines, and an 11.31% improvement in F1-score. Random Forest achieved the highest precision, while LSTM-CSRC achieved the highest recall.
| Relation | # Correct Predicted | # Correct Predicted Rate | Relation | # Correct Predicted | # Correct Predicted Rate |
| Relation | # Correct Predicted | Rate | Wrong Relation 1 | # False Predicted for Relation 1 | Wrong Relation 2 | # False Predicted for Relation 2 | Wrong Relation 3 | # False Predicted for Relation 3 |
In Table 6, the '# Correct Predicted' column indicates the number of relations predicted correctly, and the '# Correct Predicted Rate' column indicates the rate of correct predictions. For instance, our model predicts one of the relations correctly 100 percent of the time.
Table 7 shows the relations which are wrongly predicted ('Wrong Relation' columns).
Based on the results, the most incorrectly predicted relation accounts for a large proportion of the relations in the dataset (around 150 out of 550). In second place is a relation with 172 out of 550 instances, followed by a third relation. On the other hand, some incorrectly predicted relations can be treated as semantically equivalent to their predictions, where the assignment depends on a modelling decision; the same situation occurs for other relation pairs.
Another issue is the low number of instances of certain relations expressed in the dataset.
6 Conclusion
In this paper we introduced the task of composite semantic relation classification. The paper proposes a composite semantic relation classification model which combines a commonsense KB lookup, a distributional-semantics-based filter and a sequence machine learning model to address the task. The proposed LSTM model outperformed the existing baselines with regard to F1-score, accuracy and recall. Future work will focus on increasing the volume of the training set for under-represented relations.
-  Barzegar, S., Sales, J.E., Freitas, A., Handschuh, S., Davis, B.: Dinfra: A one stop shop for computing multilingual semantic relatedness. In: Proceedings of the 38th International ACM SIGIR Conference on Research and Development in Information Retrieval. pp. 1027–1028. ACM (2015)
-  Bordes, A., Usunier, N., Garcia-Duran, A., Weston, J., Yakhnenko, O.: Translating embeddings for modeling multi-relational data. In: Advances in Neural Information Processing Systems. pp. 2787–2795 (2013)
-  Freitas, A., da Silva, J.C.P., Curry, E., Buitelaar, P.: A distributional semantics approach for selective reasoning on commonsense graph knowledge bases. In: Natural Language Processing and Information Systems, pp. 21–32. Springer (2014)
-  Garcia-Duran, A., Bordes, A., Usunier, N., Grandvalet, Y.: Combining two and three-way embedding models for link prediction in knowledge bases. Journal of Artificial Intelligence Research 55, 715–742 (2016)
-  Hendrickx, I., Kim, S.N., Kozareva, Z., Nakov, P., Ó Séaghdha, D., Padó, S., Pennacchiotti, M., Romano, L., Szpakowicz, S.: Semeval-2010 task 8: Multi-way classification of semantic relations between pairs of nominals. In: Proceedings of the Workshop on Semantic Evaluations: Recent Achievements and Future Directions. pp. 94–99. Association for Computational Linguistics (2009)
-  Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural computation 9(8), 1735–1780 (1997)
-  Jans, B., Bethard, S., Vulić, I., Moens, M.F.: Skip n-grams and ranking functions for predicting script events. In: Proceedings of the 13th Conference of the European Chapter of the Association for Computational Linguistics. pp. 336–344. Association for Computational Linguistics (2012)
-  Liu, Y., Wei, F., Li, S., Ji, H., Zhou, M., Wang, H.: A dependency-based neural network for relation classification. arXiv preprint arXiv:1507.04646 (2015)
-  Nguyen, T.H., Grishman, R.: Combining neural networks and log-linear models to improve relation extraction. arXiv preprint arXiv:1511.05926 (2015)
-  Qin, P., Xu, W., Guo, J.: An empirical convolutional neural network approach for semantic relation classification. Neurocomputing (2016)
-  dos Santos, C.N., Xiang, B., Zhou, B.: Classifying relations by ranking with convolutional neural networks. In: Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing. vol. 1, pp. 626–634 (2015)
-  Socher, R., Chen, D., Manning, C.D., Ng, A.: Reasoning with neural tensor networks for knowledge base completion. In: Advances in Neural Information Processing Systems. pp. 926–934 (2013)
-  Speer, R., Havasi, C.: Representing general relational knowledge in conceptnet 5. In: LREC. pp. 3679–3686 (2012)
-  Xu, K., Feng, Y., Huang, S., Zhao, D.: Semantic relation classification via convolutional neural networks with simple negative sampling. arXiv preprint arXiv:1506.07650 (2015)
-  Xu, Y., Jia, R., Mou, L., Li, G., Chen, Y., Lu, Y., Jin, Z.: Improved relation classification by deep recurrent neural networks with data augmentation. arXiv preprint arXiv:1601.03651 (2016)
-  Xu, Y., Mou, L., Li, G., Chen, Y., Peng, H., Jin, Z.: Classifying relations via long short term memory networks along shortest dependency paths. In: Proceedings of Conference on Empirical Methods in Natural Language Processing (to appear) (2015)
-  Zeng, D., Liu, K., Lai, S., Zhou, G., Zhao, J., et al.: Relation classification via convolutional deep neural network. In: COLING. pp. 2335–2344 (2014)