Evolution of Semantic Similarity – A Survey

04/19/2020
by Dhivya Chandrasekaran, et al.
Lakehead University

Estimating the semantic similarity between text data is one of the challenging and open research problems in the field of Natural Language Processing (NLP). The versatility of natural language makes it difficult to define rule-based methods for determining semantic similarity measures. To address this issue, various semantic similarity methods have been proposed over the years. This survey article traces the evolution of such methods, categorizing them based on their underlying principles as knowledge-based, corpus-based, deep neural network-based, and hybrid methods. Discussing the strengths and weaknesses of each method, this survey provides a comprehensive view of existing systems, enabling new researchers to experiment and develop innovative ideas to address the problem of semantic similarity.


1. Introduction

With the exponential increase in text data generated over time, Natural Language Processing (NLP) has gained significant attention from Artificial Intelligence (AI) experts. Measuring the semantic similarity between various text components like words, sentences, or documents plays a significant role in a wide range of NLP tasks like information retrieval (Kim et al., 2017), text summarization (Mohamed and Oussalah, 2019), text classification (Kim, 2014), essay evaluation (Janda et al., 2019), machine translation (Zou et al., 2013), and question answering (Bordes et al., 2014)(Lopez-Gazpio et al., 2017), among others. In the early days, two text snippets were considered similar if they contained the same words/characters. Techniques like Bag of Words (BoW) and Term Frequency - Inverse Document Frequency (TF-IDF) were used to represent text as real-valued vectors to aid the calculation of semantic similarity. However, these techniques did not account for the fact that words have different meanings and that different words can be used to represent a similar concept. For example, consider the two sentences:

“John and David studied Maths and Science.” and “John studied Maths and David studied Science.” Though these two sentences contain exactly the same words, they do not convey the same meaning. Similarly, the sentences “Mary is allergic to dairy products.” and “Mary is lactose intolerant.” convey the same meaning; however, they do not share the same set of words. Lexical methods of this kind captured the surface features of the text and were simple to implement; however, they ignored its semantic and syntactic properties. To address these drawbacks of lexical measures, various semantic similarity techniques have been proposed over the past three decades.
Semantic Textual Similarity (STS) is defined as the measure of semantic equivalence between two blocks of text. Semantic similarity methods usually return a ranking or percentage of similarity between texts, rather than a binary decision of similar or not similar. Semantic similarity is often used synonymously with semantic relatedness; however, semantic relatedness not only accounts for the semantic similarity between texts but also takes a broader perspective, analyzing the shared semantic properties of two words. For example, the words ‘coffee’ and ‘mug’ may be closely related to one another, but they are not considered semantically similar, whereas the words ‘coffee’ and ‘tea’ are semantically similar. Thus, semantic similarity may be considered one aspect of semantic relatedness. The semantic relationship, including similarity, is measured in terms of semantic distance, which is inversely proportional to the strength of the relationship (Hadj Taieb et al., 2019).

Figure 1. Survey Architecture

1.1. Motivation behind the survey

Most survey articles published recently on semantic similarity provide in-depth knowledge of one particular semantic similarity technique or a single application of semantic similarity. Lastra-Díaz et al. survey various knowledge-based methods (Lastra-Díaz et al., 2019) and IC-based methods (Lastra-Díaz and García-Serrano, 2015), Camacho-Collados et al. (Camacho-Collados and Pilehvar, 2018) discuss various vector representation methods for words, Taieb et al. (Hadj Taieb et al., 2019) describe various semantic relatedness methods, and Berna Altınel et al. (Altınel and Ganiz, 2018) summarize various semantic similarity methods used for text classification. The motivation behind this survey is to provide a comprehensive account of the various semantic similarity techniques, including the most recent advancements using deep neural network-based methods.
This survey traces the evolution of semantic similarity techniques over the past decades, distinguishing them based on their underlying methods. Fig. 1 shows the structure of the survey. A detailed account of the widely used datasets available for semantic similarity is provided in Section 2. Sections 3 to 6 provide a detailed description of semantic similarity methods broadly classified as 1) knowledge-based methods, 2) corpus-based methods, 3) deep neural network-based methods, and 4) hybrid methods. Section 7 analyzes the various aspects and inferences of the survey conducted. This survey provides deep and wide knowledge of existing techniques for new researchers who venture to explore one of the most challenging NLP tasks, Semantic Textual Similarity.

2. Datasets

In this section, we discuss some of the popular datasets used to evaluate the performance of semantic similarity algorithms. The datasets may include word pairs or sentence pairs with associated standard similarity values, and the performance of a semantic similarity algorithm is measured by the correlation of its results with the standard values provided in these datasets. Table 1 lists some of these popular datasets, and the subsection below describes their attributes and the methodology used to construct them.

Dataset Name Word/Sentence pairs Similarity score range Year Reference
R&G 65 0-4 1965 (Rubenstein and Goodenough, 1965)
M&C 30 0-4 1991 (Miller and Charles, 1991)
WS353 353 0-10 2002 (Finkelstein et al., 2001)
LiSent 65 0-4 2007 (Li et al., 2006)
SRS 30 0-4 2007 (Pedersen et al., 2007)
WS353-Sim 203 0-10 2009 (Agirre et al., 2009)
STS2012 5250 0-5 2012 (Agirre et al., 2012)
STS2013 2250 0-5 2013 (Agirre et al., 2013)
WP300 300 0-1 2013 (Li et al., 2013)
STS2014 3750 0-5 2014 (Agirre et al., 2014)
SL7576 7576 1-5 2014 (Silberer and Lapata, 2014)
SimLex-999 999 0-10 2014 (Hill et al., 2015)
SICK 10000 1-5 2014 (Marelli et al., 2014)
STS2015 3000 0-5 2015 (Agirre et al., 2015)
SimVerb 3500 0-10 2016 (Gerz et al., 2016)
STS2016 1186 0-5 2016 (Agirre et al., 2016)
WiC 5428 NA 2019 (Pilehvar and Camacho-Collados, 2019)
Table 1. Popular Benchmark datasets for Semantic similarity

2.1. Semantic similarity datasets

The following is a list of widely used semantic similarity datasets arranged chronologically.

  • Rubenstein and Goodenough (R&G)(Rubenstein and Goodenough, 1965): This dataset was created as a result of an experiment conducted among 51 undergraduate students (native English speakers) in two different sessions. The subjects were provided with 65 selected English noun pairs and requested to assign a similarity score for each pair over a scale of 0 to 4, where 0 represents that the words are completely dissimilar and 4 represents that they are highly similar. This dataset is the first and most widely used dataset in Semantic similarity tasks(Zhu and Iglesias, 2017).

  • Miller and Charles (M&C)(Miller and Charles, 1991): Miller and Charles repeated the experiment performed by Rubenstein and Goodenough in 1991 with a subset of 30 word pairs from the original 65 word pairs. 38 human subjects ranked the word pairs on a scale from 0 to 4, 4 being the “most similar”.

  • WS353(Finkelstein et al., 2001): WS353 contains 353 word pairs with an associated score ranging from 0 to 10, where 0 represents the least similarity and 10 the highest. The experiment was conducted with a group of 16 human subjects. Since this dataset measures semantic relatedness rather than semantic similarity, the following subset was subsequently proposed.

  • WS353-Sim(Agirre et al., 2009): This dataset is a subset of WS353 containing 203 word pairs from the original 353 word pairs that are more suitable for semantic similarity algorithms specifically.

  • LiSent(Li et al., 2006): 65 sentence pairs were built using the dictionary definitions of the 65 word pairs in the R&G dataset. 32 native English speakers volunteered to provide similarity scores on a scale from 0 to 4, 4 being the highest. The mean of the scores given by all the volunteers was taken as the final score.

  • SRS(Pedersen et al., 2007): Pedersen et al. attempted to build a domain-specific semantic similarity dataset for the biomedical domain. Initially, 120 term pairs were selected by a physician, distributed as 30 pairs across each of 4 similarity levels. These term pairs were then ranked by 13 medical coders on a scale of 1 to 10. To increase reliability, 30 of the 120 pairs were selected and annotated by 3 physicians and 9 (of the 13) medical coders to form the final dataset.

  • SimLex-999(Hill et al., 2015): 999 word pairs were selected from the UFS dataset (Nelson et al., 2004), of which 900 were similar and 99 were related but not similar. 500 native English speakers, recruited via Amazon Mechanical Turk, were asked to rank the similarity between the word pairs on a scale of 0 to 6, 6 being the most similar. The dataset contains 666 noun pairs, 222 verb pairs, and 111 adjective pairs.

  • Sentences Involving Compositional Knowledge (SICK) dataset(Marelli et al., 2014): The SICK dataset consists of 10,000 sentence pairs derived from two existing datasets, ImageFlickr 8 and the MSR-Video descriptions dataset. Each sentence pair is associated with a relatedness score and a textual entailment relation. The relatedness score ranges from 1 to 5, and the three entailment relations are NEUTRAL, ENTAILMENT, and CONTRADICTION. The annotation was done using crowd-sourcing techniques.

  • STS datasets(Agirre et al., 2012)(Agirre et al., 2013)(Agirre et al., 2014)(Agirre et al., 2015)(Agirre et al., 2016)(Cer et al., 2017): The STS datasets were built by the organizers of the SemEval shared tasks by combining sentence pairs from different sources. The datasets were annotated using Amazon Mechanical Turk and further verified by the organizers themselves. Table 2 shows the various sources from which the STS datasets were built.

    Year Dataset Pairs Source
    2012 MSRPar 1500 newswire
    2012 MSRvid 1500 videos
    2012 OnWN 750 glosses
    2012 SMTNews 750 WMT eval.
    2012 SMTeuroparl 750 WMT eval.
    2013 HDL 750 newswire
    2013 FNWN 189 glosses
    2013 OnWN 561 glosses
    2013 SMT 750 MT eval.
    2014 HDL 750 newswire headlines
    2014 OnWN 750 glosses
    2014 Deft-forum 450 forum posts
    2014 Deft-news 300 news summary
    2014 Images 750 image descriptions
    2014 Tweet-news 750 tweet-news pairs
    2015 HDL 750 newswire headlines
    2015 Images 750 image descriptions
    2015 Ans.-student 750 student answers
    2015 Ans.-forum 375 Q & A forum answers
    2015 Belief 375 committed belief
    2016 HDL 249 newswire headlines
    2016 Plagiarism 230 short-answers plag.
    2016 post-editing 244 MT postedits
    2016 Ans.-Ans 254 Q & A forum answers
    2016 Quest.-Quest. 209 Q & A forum questions
    2017 Trial 23 Mixed STS 2016
    Table 2. STS English language training dataset (2012-2017)(Cer et al., 2017)

3. Knowledge-based Semantic-Similarity Methods

Knowledge-based semantic similarity methods calculate the semantic similarity between two terms based on information derived from one or more underlying knowledge sources such as ontologies/lexical databases, thesauri, dictionaries, etc. The underlying knowledge base offers these methods a structured representation of terms or concepts connected by semantic relations, and further offers an ambiguity-free semantic measure, as the actual meaning of the terms is taken into consideration (Sánchez et al., 2012). In this section, we discuss four lexical databases widely employed in knowledge-based semantic similarity methods, and then briefly discuss the different methodologies adopted by some of these methods.

3.1. Lexical Databases

  • WordNet(Miller, 1995) is a widely used lexical database for knowledge-based semantic similarity methods that accounts for more than 100,000 English concepts (Sánchez et al., 2012). WordNet can be visualized as a graph in which the nodes represent the meanings of words (concepts) and the edges define the relationships between words (Zhu and Iglesias, 2017). WordNet’s structure is primarily based on synonymy: each word belongs to different synsets corresponding to its different meanings. The similarity between two words depends on the path distance between them (Pawar and Mago, 2019).

  • Wiktionary is an open-source lexical database that encompasses approximately 6.2 million words from 4,000 different languages. Each entry has an article page associated with it, which accounts for the different senses of the entry. Unlike WordNet, Wiktionary does not have a well-established taxonomic lexical relationship between entries, which makes it difficult to use in semantic similarity algorithms (Pilehvar and Navigli, 2015).

  • With the advent of Wikipedia, many semantic similarity techniques exploit its abundant, freely available text data to train models (Mihalcea and Csomai, 2007). Wikipedia organizes text data as articles; each article has a title (concept), neighbors, a description, and categories. It is used both as structured taxonomic data and as a corpus for training corpus-based methods (Qu et al., 2018). The complex category structure of Wikipedia is used as a graph to determine the Information Content of concepts, which in turn aids in calculating semantic similarity (Jiang et al., 2017).

  • BabelNet(Navigli and Ponzetto, 2012) is a lexical resource that combines WordNet with data available on Wikipedia for each synset. It is the largest multilingual semantic ontology available, with over 13 million synsets and around 380 million semantic relations in 271 languages. For English, it includes over four million synsets with at least one associated Wikipedia page (Camacho-Collados et al., 2016).

3.2. Types of Knowledge-based semantic similarity methods

Based on the underlying principle of how the semantic similarity between words is assessed, knowledge-based semantic similarity methods can be further categorized as edge-counting methods, feature-based methods, and Information content-based methods.

3.2.1. Edge-counting methods:

The most straightforward edge-counting method is to treat the underlying ontology as a graph connecting words taxonomically and to count the edges between two terms to measure the similarity between them: the greater the distance between the terms, the less similar they are. This measure, called $sim_{path}$, was proposed by Rada et al. (Rada et al., 1989), where the similarity is inversely proportional to the shortest path length between the two terms. This simple edge count disregards the fact that words deeper down the hierarchy have more specific meanings, and may therefore be more similar to each other than two words representing more generic concepts, even at the same distance. Wu and Palmer (Wu and Palmer, 1994) proposed the $sim_{WP}$ measure, in which the depth of the words in the ontology is considered an important attribute. The measure counts the number of edges between each term and their Least Common Subsumer (LCS), the common ancestor shared by both terms in the given ontology. Consider two terms denoted as $t_1$ and $t_2$, their LCS denoted as $t_{lcs}$, and the shortest path length between them denoted as $min\_len(t_1, t_2)$. $sim_{path}$ is measured as,

$$sim_{path}(t_1, t_2) = \frac{1}{1 + min\_len(t_1, t_2)} \qquad (1)$$

and $sim_{WP}$ is measured as,

$$sim_{WP}(t_1, t_2) = \frac{2 \cdot depth(t_{lcs})}{depth(t_1) + depth(t_2)} \qquad (2)$$

Li et al. (Li et al., 2003) proposed a measure that takes into account both the minimum path distance and depth. $sim_{LI}$ is measured as,

$$sim_{LI}(t_1, t_2) = e^{-\alpha \cdot min\_len(t_1, t_2)} \cdot \frac{e^{\beta \cdot depth(t_{lcs})} - e^{-\beta \cdot depth(t_{lcs})}}{e^{\beta \cdot depth(t_{lcs})} + e^{-\beta \cdot depth(t_{lcs})}} \qquad (3)$$

where $\alpha$ and $\beta$ are tunable parameters. However, edge-counting methods ignore the fact that the edges in an ontology need not be of equal length. Feature-based semantic similarity methods were proposed to overcome this shortcoming of simple edge counting.
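As a sketch, the three edge-counting measures in equations (1)-(3) can be implemented as below, assuming the path lengths and node depths have already been extracted from an ontology such as WordNet; the toy taxonomy values and the defaults for Li et al.'s parameters are illustrative assumptions:

```python
# Sketch of the three edge-counting measures on a toy taxonomy.
import math

def sim_path(min_len: int) -> float:
    # Rada et al.: inversely proportional to the shortest path (Eq. 1)
    return 1.0 / (1.0 + min_len)

def sim_wp(depth_t1: int, depth_t2: int, depth_lcs: int) -> float:
    # Wu & Palmer: scaled depth of the least common subsumer (Eq. 2)
    return 2.0 * depth_lcs / (depth_t1 + depth_t2)

def sim_li(min_len: int, depth_lcs: int, alpha=0.2, beta=0.6) -> float:
    # Li et al.: combines path length and LCS depth (Eq. 3);
    # the second factor is tanh(beta * depth_lcs)
    return math.exp(-alpha * min_len) * math.tanh(beta * depth_lcs)

# Toy taxonomy: 'car' and 'bicycle' share the LCS 'vehicle':
# depth(car) = depth(bicycle) = 4, depth(vehicle) = 3, shortest path = 2
print(sim_path(2), sim_wp(4, 4, 3), sim_li(2, 3))
```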

3.2.2. Feature-based methods:

Feature-based methods calculate similarity as a function of properties of the words, such as gloss, neighboring concepts, etc. (Sánchez et al., 2012). A gloss is the meaning of a word in a dictionary; a collection of glosses is called a glossary. Various semantic similarity methods have been proposed based on the gloss of words. Gloss-based semantic similarity measures exploit the knowledge that words with similar meanings have more words in common in their glosses, and measure semantic similarity as the extent of overlap between the glosses of the words in consideration. The Lesk measure (Banerjee and Pedersen, 2003) assigns a relatedness value between two words based on the overlap of words in their glosses and in the glosses of the concepts they are related to in an ontology like WordNet (Lastra-Díaz et al., 2019). Jiang et al. (Jiang et al., 2015) proposed a feature-based method in which semantic similarity is measured using the glosses of concepts present in Wikipedia. Most feature-based methods take into account both common and non-common features of two words/terms: the common features increase the similarity value, while the non-common features decrease it. The major limitation of feature-based methods is their dependency on ontologies with rich semantic features, and most ontologies rarely incorporate any semantic features other than taxonomic relationships (Sánchez et al., 2012).

3.2.3. Information Content-based methods:

Information content (IC) of a concept is defined as the information derived from the concept when it appears in context (Sánchez and Batet, 2013). A high IC value indicates that the word is more specific and describes a concept with little ambiguity, while lower IC values indicate that the word is more abstract in meaning (Zhu and Iglesias, 2017). The specificity of a word is determined using Inverse Document Frequency (IDF), which relies on the principle that the more specific a word is, the less frequently it occurs in documents. Information content-based methods measure the similarity between terms using the IC values associated with them. Philip Resnik (Resnik, 1995) proposed a semantic similarity measure called $sim_{res}$, based on the idea that if two concepts share a more specific common subsumer (one whose IC value is higher), they share more information. Considering $IC(t)$ represents the Information Content of a given term $t$, $sim_{res}$ is measured as,

$$sim_{res}(t_1, t_2) = IC(t_{lcs}) \qquad (4)$$

D. Lin (Lin and others, 1998) proposed an extension of the $sim_{res}$ measure that takes into consideration the IC values of both terms, which capture the individual information of each term, together with the IC value of their LCS, which provides the commonality shared between them. $sim_{lin}$ is measured as,

$$sim_{lin}(t_1, t_2) = \frac{2 \cdot IC(t_{lcs})}{IC(t_1) + IC(t_2)} \qquad (5)$$

Jiang and Conrath (Jiang and Conrath, 1997) calculate a distance measure based on the difference between the sum of the individual IC values of the terms and the IC value of their LCS,

$$dis_{JC}(t_1, t_2) = IC(t_1) + IC(t_2) - 2 \cdot IC(t_{lcs}) \qquad (6)$$

This distance measure replaces the shortest path length in equation (1), and the similarity is inversely proportional to the distance. Hence $sim_{JC}$ is measured as,

$$sim_{JC}(t_1, t_2) = \frac{1}{1 + dis_{JC}(t_1, t_2)} \qquad (7)$$

IC can be measured using an underlying corpus or from the intrinsic structure of the ontology itself (Sánchez et al., 2011), based on the assumption that ontologies are structured in a meaningful way. Some terms may not be included in a single ontology, which opens the possibility of using multiple ontologies to calculate their relationship (Rodríguez and Egenhofer, 2003). Based on whether both terms are present in a single ontology or not, IC-based methods can be classified as mono-ontological or multi-ontological methods. When multiple ontologies are involved, the IC values of the Least Common Subsumer from both ontologies are accessed to estimate the semantic similarity values. Jiang et al. (Jiang et al., 2017) proposed IC-based semantic similarity measures based on Wikipedia pages, concepts, and neighbors; Wikipedia was used both as a structured taxonomy and as a corpus to provide IC values.
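A minimal sketch of the IC-based measures in equations (4)-(7), assuming IC values are already available (e.g., corpus-derived); the numeric values below are invented for illustration:

```python
# Sketch of the IC-based measures with made-up IC values.
def sim_res(ic_lcs: float) -> float:
    # Resnik: similarity is the IC of the least common subsumer (Eq. 4)
    return ic_lcs

def sim_lin(ic_t1: float, ic_t2: float, ic_lcs: float) -> float:
    # Lin: shared information relative to the terms' own IC (Eq. 5)
    return 2.0 * ic_lcs / (ic_t1 + ic_t2)

def sim_jc(ic_t1: float, ic_t2: float, ic_lcs: float) -> float:
    # Jiang & Conrath: distance (Eq. 6) plugged into Eq. 1's form (Eq. 7)
    dis = ic_t1 + ic_t2 - 2.0 * ic_lcs
    return 1.0 / (1.0 + dis)

# Hypothetical IC values for ('coffee', 'tea') with LCS 'beverage'
print(sim_res(5.1), sim_lin(7.2, 6.9, 5.1), sim_jc(7.2, 6.9, 5.1))
```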

3.2.4. Combined knowledge-based methods:

Various similarity measures have been proposed that combine the knowledge-based methods above. Gao et al. (Gao et al., 2015) proposed a semantic similarity method based on the WordNet ontology in which three different strategies are used to add weights to the edges, and the shortest weighted path is used to measure the semantic similarity. In the first strategy, the depths of all the terms in WordNet along the path between the two terms in consideration are added as weights to the shortest path. In the second strategy, only the depth of the LCS of the terms is added as the weight, and in the third strategy, the IC values of the terms are added as weights. The shortest weighted path length is then calculated and non-linearly transformed to produce the semantic similarity measure. In comparison, strategy three achieved a better correlation with the gold standards than both traditional methods and the other two proposed strategies. Zhu and Iglesias (Zhu and Iglesias, 2017) proposed another weighted path measure called $sim_{wpath}$ that adds the IC value of the Least Common Subsumer as a weight to the shortest path length. $sim_{wpath}$ is calculated as,

$$sim_{wpath}(t_1, t_2) = \frac{1}{1 + min\_len(t_1, t_2) \cdot k^{IC(t_{lcs})}} \qquad (8)$$

This method was proposed for use with various knowledge graphs (KGs) like WordNet (Miller, 1995), DBPedia (Bizer et al., 2009), YAGO (Hoffart et al., 2013), etc., and the parameter $k$ is a hyperparameter that has to be tuned for different KGs and different domains, since different KGs have different distributions of terms in each domain. Both corpus-based IC and intrinsic IC values were experimented with, and the corpus IC-based $sim_{wpath}$ measure achieved greater correlation on most of the gold-standard datasets.
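Equation (8) can be sketched as follows; the value of k and the IC numbers here are illustrative only, since in practice k is tuned per knowledge graph and domain:

```python
# Minimal sketch of the wpath measure (Eq. 8): the IC of the LCS
# weights the shortest path length.
def sim_wpath(min_len: int, ic_lcs: float, k: float = 0.8) -> float:
    return 1.0 / (1.0 + min_len * k ** ic_lcs)

# With the same path length, a more specific (higher-IC) shared
# ancestor yields a higher similarity score.
print(sim_wpath(2, ic_lcs=5.0))  # specific shared ancestor
print(sim_wpath(2, ic_lcs=1.0))  # generic shared ancestor
```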

Knowledge-based semantic similarity methods are computationally simple, the underlying knowledge base acts as a strong backbone for the models, and common sources of ambiguity such as synonyms, idioms, and phrases are handled efficiently. Knowledge-based methods can easily be extended to calculate sentence-to-sentence similarity by defining rules for aggregation (Lee, 2011). Lastra-Díaz et al. (Lastra-Díaz et al., 2017) developed the Half-Edge Semantic Measures Library (HESML), software implementing various proposed ontology-based semantic similarity measures, and showed improvements in the runtime performance and scalability of the models.
However, knowledge-based systems are highly dependent on the underlying source, which must be updated frequently at a cost of time and computational resources. Although strong ontologies such as WordNet exist for the English language, similar resources are not available for most other languages, so strong, structured knowledge bases must first be built to apply knowledge-based methods across different languages and domains. Various studies have extended semantic similarity measures to the biomedical domain (Pedersen et al., 2007)(Soğancıoğlu et al., 2017); McInnes et al. (McInnes et al.) built a domain-specific tool based on the Unified Medical Language System (UMLS) to measure the similarity between words in the biomedical domain. With nearly 6,500 world languages and numerous domains, this dependency is a serious drawback of knowledge-based systems.

4. Corpus-based Semantic-Similarity Methods

Corpus-based semantic similarity methods measure semantic similarity between terms using information retrieved from a large underlying corpus. The underlying principle exploits the idea that similar words occur together frequently in documents; however, the actual meaning of the words is not taken into consideration. Statistical techniques are deployed to analyze the latent similarities between terms in the training corpus. In this section, we discuss three widely used word-embedding models and further discuss in detail some of the methodologies implemented in corpus-based semantic similarity methods.

4.1. Word Embeddings

Word Embeddings provide vector representations of words wherein these vectors retain the underlying linguistic relationship between the words(Schnabel et al., 2015). These vectors are computed using different approaches like neural networks(Mikolov et al., 2013a), word co-occurrence matrix(Pennington et al., 2014), or representations in terms of the context in which the word appears(Levy and Goldberg, 2014). Some of the most widely used pre-trained word embeddings include:

  • word2vec(Mikolov et al., 2013a): Developed from the Google News dataset containing approximately 3 million vector representations of words and phrases, word2vec is a neural network model used to produce distributed vector representations of words based on an underlying corpus. Two different word2vec models were proposed: the Continuous Bag of Words (CBOW) model and the Skip-gram model. The architecture of the network is rather simple, containing an input layer, one hidden layer, and an output layer. The network is fed a large text corpus as input, and the output of the model is a set of word vectors. The CBOW model predicts the current word from its surrounding context words, while the Skip-gram model predicts the neighboring context words given a target word. word2vec models efficiently produce word vectors that retain the contextual similarity between words, and these vectors yielded good results in predicting semantic similarity (Mikolov et al., 2013b). Many researchers extended the word2vec model to propose context vectors (Melamud et al., 2016), dictionary vectors (Tissier et al., 2017), sentence vectors (Pagliardini et al., 2018), and paragraph vectors (Le and Mikolov, 2014).

  • GloVe(Pennington et al., 2014): Developed by Stanford University, GloVe relies on a global word co-occurrence matrix formed from the underlying corpus. It estimates similarity based on the principle that words similar to each other occur together frequently. The co-occurrence matrix is populated with occurrence values in a single pass over the underlying large corpus. The GloVe model was trained using five different corpora, mostly Wikipedia dumps. When forming vectors, context words are chosen within a specified window, owing to the fact that words far away have less relevance to the word in consideration. The loss function minimizes the least-squares distance between the context-window co-occurrence values and the global co-occurrence values (Lastra-Díaz et al., 2019). GloVe vectors were later extended to contextualized word vectors that differentiate words based on their context (McCann et al., 2017).

  • fastText(Bojanowski et al., 2017): Facebook AI researchers developed this word embedding model, which builds word vectors based on the Skip-gram model, with each word represented as a collection of character n-grams. fastText learns a word’s embedding as the average of its character n-gram embeddings, thus accounting for the morphological structure of the word, which proves efficient in morphologically rich languages like Finnish and Turkish. Even out-of-vocabulary words are assigned word vectors based on their character n-grams.

Word embeddings are used to measure semantic similarity between texts of different languages by mapping the word embeddings of one language onto the vector space of another. Given a limited yet sufficient number of translation pairs for training, a translation matrix can be computed to enable this overlap of embeddings across languages (Glavaš et al., 2018). One of the major challenges in deploying word embeddings to measure similarity is meaning conflation deficiency: word embeddings do not account for the different meanings of a word, which pollutes the semantic space with noise by bringing irrelevant words closer to each other. For example, the words ‘finance’ and ‘river’ may appear in the same semantic space because the word ‘bank’ has two different meanings (Camacho-Collados and Pilehvar, 2018).
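The translation-matrix idea can be sketched as a least-squares fit between two toy 2-dimensional embedding spaces; all vectors here are invented, whereas real systems use high-dimensional embeddings and thousands of translation pairs:

```python
# Toy sketch of learning a translation matrix between embedding spaces.
import numpy as np

# Source- and target-language vectors for known translation pairs;
# the target space is simply the source space rotated by 90 degrees.
src = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
tgt = np.array([[0.0, 1.0], [-1.0, 0.0], [-1.0, 1.0]])

# Solve min_W || src @ W - tgt ||^2 for the translation matrix W
W, *_ = np.linalg.lstsq(src, tgt, rcond=None)

# Map a new source-language word into the target space
new_src = np.array([2.0, 1.0])
mapped = new_src @ W
print(np.round(mapped, 6))  # close to [-1, 2]
```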

4.2. Types of corpus-based semantic similarity methods

Based on the underlying methods by which the word vectors are constructed, there are a wide variety of corpus-based methods, some of which are discussed in this section.

4.2.1. Latent Semantic Analysis (LSA) (Landauer and Dumais, 1997):

LSA is one of the most popular and widely used corpus-based techniques for measuring semantic similarity. A word co-occurrence matrix is formed in which the rows represent words, the columns represent paragraphs, and the cells are populated with word counts. This matrix is built from a large underlying corpus, and dimensionality reduction is achieved with a mathematical technique called Singular Value Decomposition (SVD). SVD represents a given matrix as the product of three matrices: two matrices represent the rows and columns as vectors derived from their eigenvalues, and the third is a diagonal matrix of values that reproduces the original matrix when multiplied with the other two (Landauer et al., 1998). SVD reduces the number of columns while retaining the number of rows, thereby preserving the similarity structure among the words. Each word is then represented as a vector using the values in its corresponding row, and semantic similarity is calculated as the cosine between these vectors. LSA models generalize by replacing words with larger text units and columns with different samples, and can be used to calculate the similarity between sentences, paragraphs, and documents.
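A minimal LSA sketch using NumPy, with a toy word-by-paragraph count matrix standing in for a large corpus:

```python
# LSA sketch: truncated SVD of a small word-by-paragraph count matrix,
# then cosine similarity between the reduced word vectors.
import numpy as np

words = ["coffee", "tea", "mug", "python"]
# Rows: words; columns: three tiny "paragraphs" with word counts
X = np.array([[2, 1, 0],
              [1, 2, 0],
              [1, 1, 0],
              [0, 0, 3]], dtype=float)

U, S, Vt = np.linalg.svd(X, full_matrices=False)
k = 2                              # reduced dimensionality
word_vecs = U[:, :k] * S[:k]       # each row is a word's LSA vector

def cosine(a, b):
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

print(cosine(word_vecs[0], word_vecs[1]))  # coffee vs tea: high
print(cosine(word_vecs[0], word_vecs[3]))  # coffee vs python: ~0
```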

4.2.2. Hyperspace Analogue to Language(HAL)(Lund and Burgess, 1996):

HAL builds a word co-occurrence matrix in which both the rows and the columns represent the words in the vocabulary and the matrix elements are populated with association strength values. The association strength values are calculated by sliding a ”window”, the size of which can be varied, over the underlying corpus. The strength of association between the focus word and another word in the window decreases as the distance between them increases. For example, in the sentence ”This is a survey of various semantic similarity measures”, the words ‘survey’ and ‘various’ have a greater association value than the words ‘survey’ and ‘measures’. Word vectors are formed by taking into consideration both the row and the column of the given word. Dimensionality reduction can be achieved by removing columns with low entropy values. The semantic similarity is then calculated by measuring the Euclidean or Manhattan distance between the word vectors.
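A toy sketch of HAL-style association strengths, simplified to record only the words preceding each focus word; the window size and linear weighting are illustrative choices:

```python
# HAL-style co-occurrence sketch: slide a window over the text and add
# linearly decaying association strengths for preceding words.
from collections import defaultdict

def hal_matrix(tokens, window=4):
    assoc = defaultdict(float)
    for i, focus in enumerate(tokens):
        for d in range(1, window + 1):       # words before the focus word
            if i - d >= 0:
                # closer words get higher strength: window, window-1, ...
                assoc[(focus, tokens[i - d])] += window + 1 - d
    return assoc

tokens = "this is a survey of various semantic similarity measures".split()
m = hal_matrix(tokens)
# 'various' is 2 words after 'survey' (strength 3); 'measures' is
# 5 words after 'survey', outside the window (strength 0).
print(m[("various", "survey")], m[("measures", "survey")])
```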

4.2.3. Explicit Semantic Analysis (ESA) (Gabrilovich et al., 2007):

ESA measures semantic similarity based on Wikipedia concepts. The use of Wikipedia ensures that the method can be applied across various domains and languages, and since Wikipedia is constantly updated, it adapts to changes over time. First, each concept in Wikipedia is represented as an attribute vector of the words that occur in it; then an inverted index is formed, where each word is linked to all the concepts it is associated with. The association strength is weighted using the TF-IDF technique, and concepts weakly associated with a word are removed. The input text is thus represented by weighted vectors of concepts called "interpretation vectors." Semantic similarity is measured by calculating the cosine similarity between these vectors.
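A miniature ESA-style interpretation vector can be sketched as follows (the three "concept articles" are invented stand-ins for Wikipedia pages):

```python
# Each word's interpretation vector is its TF-IDF weight across concepts;
# two words are compared by the cosine of these vectors.
import math
from collections import Counter

concepts = {
    "Computer": "computer program software hardware keyboard software",
    "Music": "music guitar melody song guitar concert",
    "Sport": "football match goal player stadium goal",
}
tf = {c: Counter(text.split()) for c, text in concepts.items()}
n_concepts = len(concepts)
# document frequency: in how many concept articles the word occurs
df = Counter(w for counts in tf.values() for w in counts)

def interpretation_vector(word):
    # one row of the inverted index described above
    idf = math.log(n_concepts / df[word])
    return [tf[c][word] * idf for c in concepts]

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

sim = cosine(interpretation_vector("guitar"), interpretation_vector("melody"))
```

Words that occur only in the same concept article end up with parallel interpretation vectors, while words from unrelated concepts have orthogonal ones.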

4.2.4. Word-Alignment models (Sultan et al., 2015):

Word-Alignment models calculate the semantic similarity of sentences based on their alignment over a larger corpus. The second, third, and fifth positions in the SemEval 2015 task were secured by methods based on word alignment. The unsupervised method that placed fifth implemented a word-alignment technique based on the Paraphrase Database (PPDB) (Ganitkevitch et al., 2013). The system calculates the semantic similarity between two sentences as the proportion of aligned content words over the total words in both sentences. The supervised methods that placed second and third used the same aligner to obtain word alignments. In the first method, a sentence vector is formed by computing the "component-wise average" of the words in the sentence, and the cosine similarity between these sentence vectors is used as the measure of semantic similarity. The second supervised method takes into account only those words that have a contextual semantic similarity (Sultan et al., 2015).

4.2.5. Latent Dirichlet Allocation (LDA) (Sinoara et al., 2019):

LDA represents a document by the topics or general ideas behind it rather than by every word it contains. This technique is widely used for topic-modeling tasks and has the advantage of reduced dimensionality, since the number of topics is significantly smaller than the number of words in a document (Sinoara et al., 2019). One novel approach to determining document-to-document similarity uses these vector representations of documents and calculates the cosine similarity between the vectors to ascertain the semantic similarity between documents (Benedetti et al., 2019).
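A minimal sketch of this document-to-document approach, assuming scikit-learn's LatentDirichletAllocation as the topic model (the toy documents and the choice of two topics are illustrative):

```python
# Documents -> topic distributions -> pairwise cosine similarity.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.metrics.pairwise import cosine_similarity

docs = [
    "stocks markets shares trading investors",
    "markets trading stocks profits shares",
    "guitar melody concert song music",
]
counts = CountVectorizer().fit_transform(docs)
lda = LatentDirichletAllocation(n_components=2, random_state=0)
topic_vectors = lda.fit_transform(counts)   # one topic distribution per doc

# each document is now a 2-dimensional topic vector instead of a bag of
# words, and similarity is the cosine between these vectors
sim = cosine_similarity(topic_vectors)
```

The dimensionality advantage mentioned above is visible here: three documents over a dozen word types collapse to vectors of length two.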

4.2.6. Normalised Google Distance (NGD) (Cilibrasi and Vitanyi, 2007):

NGD measures the similarity between two terms based on the results obtained when the terms are queried using the Google search engine. It is based on the assumption that two words occur together more frequently in web pages if they are more related. Given two terms x and y, the following formula is used to calculate the NGD between them:

NGD(x, y) = (max{log f(x), log f(y)} - log f(x, y)) / (log N - min{log f(x), log f(y)})    (9)

where the functions f(x) and f(y) return the number of hits in Google search for the given terms, f(x, y) returns the number of hits when the terms are searched together, and N represents the total number of pages indexed by Google search. NGD is widely used to measure semantic relatedness rather than semantic similarity, because related terms occur together frequently in web pages even though they may have opposite meanings.
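The formula translates directly into code (the hit counts below are made-up numbers standing in for Google result counts):

```python
# Normalised Google Distance from hit counts.
import math

def ngd(fx, fy, fxy, n):
    """NGD from hit counts f(x), f(y), f(x, y) and total page count N."""
    lx, ly, lxy, ln = math.log(fx), math.log(fy), math.log(fxy), math.log(n)
    return (max(lx, ly) - lxy) / (ln - min(lx, ly))

# terms that co-occur often score lower (closer) than terms that rarely do
close = ngd(1e6, 1e6, 9e5, 1e10)
far = ngd(1e6, 1e6, 1e3, 1e10)
```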

4.2.7. Dependency-based models (Agirre et al., 2009):

Dependency-based approaches ascertain the meaning of a given word or phrase using the neighbors of the word within a given window. The dependency-based models initially parse the corpus based on its distribution using Inductive Dependency Parsing (Nivre, 2006). For every given word, a "syntactic context template" is built considering both the nodes preceding and succeeding the word in the resulting parse tree. For example, the phrase "thinks <term> delicious" could have "pizza, burger, food" as a context template. The vector representation of a word is formed by adding each window across the locations that have the word in consideration as the root word, along with the frequency of the window of words appearing in the entire corpus. Once this vector is formed, semantic similarity is calculated using the cosine similarity between these vectors. Levy and Goldberg (Levy and Goldberg, 2014) proposed DEPS embeddings, a word-embedding model based on a dependency-based bag of words. This model was tested on the WS353 dataset, where the task was to rank similar words above related words. On plotting a precision-recall curve, the DEPS curve showed greater affinity towards similarity rankings than the BoW methods taken in comparison.

4.2.8. Word-attention models (Le et al., 2018):

In most corpus-based methods, all text components are considered to have equal significance; however, human judgement of similarity usually depends on the keywords in a given context. Word-attention models capture the importance of words from underlying corpora (Lopez-Gazpio et al., 2019) before calculating the semantic similarity. Different techniques such as word frequency, alignment, and word association are used to capture the attention weights of the text in consideration. The Attention Constituency Vector Tree (ACV-Tree) proposed by Le et al. (Le et al., 2018) is similar to a parse tree: one word of a sentence is made the root and the remainder of the sentence is broken into a Noun Phrase (NP) and a Verb Phrase (VP). The nodes of the tree store three attributes of the word in consideration: the word vector determined by an underlying corpus, the attention weight, and the "modification-relations" of the word. Modification relations are the adjectives or adverbs that modify the meaning of another word. All three components are linked to form the representation of the word. A tree kernel function determines the similarity between two words based on the equations below

(10)
(11)

where the node arguments represent the nodes of the two trees being compared, the first function measures the cosine similarity between the word vectors, the kernel counts the number of common subsequences of a given length, the two decay factors penalise the length of the child sequences and the height of the tree respectively, and the remaining terms refer to the children nodes. The algorithm was tested using the STS Benchmark datasets and showed better performance on 12 out of 19 chosen STS datasets (Le et al., 2018) (Quan et al., 2019).

Unlike knowledge-based systems, corpus-based systems are language and domain independent (Altınel and Ganiz, 2018). Since they depend on statistical measures, the methods can easily be adapted to various languages given an effective corpus. With the growth of the internet, building corpora for most languages or domains has become rather easy, and simple web-crawling techniques can be used to build large corpora (Baroni et al., 2009). However, corpus-based methods do not take into consideration the actual meaning of the words. The other challenge they face is the need to process the large corpora built, which is a time-consuming and resource-intensive task. Since the performance of the algorithms largely depends on the underlying corpus, building an efficient corpus is paramount; however, to the best of our knowledge, an "ideal corpus" has yet to be defined by researchers.

5. Deep Neural Network-based Methods

Semantic similarity methods have exploited recent developments in neural networks to enhance performance. The most widely used techniques include Convolutional Neural Networks (CNN), Long Short Term Memory (LSTM), Bidirectional Long Short Term Memory (Bi-LSTM), and Recursive Tree LSTM. Deep neural network models are built on two fundamental operations: convolution and pooling. The convolution operation on text data may be defined as the sum of the element-wise product of a sentence vector and a weight matrix; convolution operations are used for feature extraction. Pooling operations are used to eliminate features that have a negative impact and to retain only those feature values that have a considerable impact on the task at hand. There are different types of pooling operations, and the most widely used is max pooling, where only the maximum value in the given filter space is selected. This section describes some of the methods that deploy deep neural networks to estimate semantic similarity between text snippets.
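The two operations can be sketched on a toy "sentence matrix" of random word vectors (all values and sizes below are arbitrary):

```python
# Convolution as the sum of an element-wise product over a sliding
# window of word vectors, followed by max pooling.
import numpy as np

sentence = np.random.default_rng(0).normal(size=(7, 4))  # 7 words, 4-dim
weights = np.random.default_rng(1).normal(size=(3, 4))   # filter over 3 words

# convolution: one feature value per 3-word window position
features = np.array([np.sum(sentence[i:i + 3] * weights)
                     for i in range(sentence.shape[0] - 2)])

pooled = features.max()  # max pooling keeps only the strongest feature
```

In practice a network learns many such filters, and the pooled values from all of them form the feature vector passed to later layers.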

5.1. Types of deep neural network-based semantic similarity methods:

  • Wang et al. (Wang et al., 2016) proposed a model to estimate the semantic similarity between two sentences based on lexical decomposition and composition. The model uses word2vec pretrained embeddings to form vector representations of the two sentences. A similarity matrix of dimension i x j is built, where i and j are the number of words in the first and second sentence respectively, and the cells of the matrix are populated with the cosine similarity between the corresponding pairs of words. Three different functions are used to construct the semantic matching vector of each sentence: global, local, and max. The global function constructs the semantic matching vector of a word as the weighted sum of the vectors of all the words in the other sentence; the local function considers only word vectors within a given window size; and the max function takes only the vectors of the words that have the maximum similarity. The second phase of the algorithm uses three different decomposition functions - rigid, linear, and orthogonal - to estimate the similarity component and the dissimilarity component between the sentence vectors and the semantic matching vectors. Both the similarity and dissimilarity component vectors are passed through a two-channel convolution layer followed by a single max-pooling layer. The similarity is then calculated using a sigmoid layer that estimates the similarity value within the range of 0 and 1. The model was tested using the QASent dataset (Wang et al., 2007) and the WikiQA dataset (Meek, 2018). The two measures used to estimate the performance are mean average precision (MAP) and mean reciprocal rank (MRR). The model achieves the best MAP on the QASent dataset and the best MAP and MRR on the WikiQA dataset.

  • Yang Shao (Shao, 2017) proposed a semantic similarity algorithm that exploits recent developments in neural networks using GloVe word embeddings. Given two sentences, the model predicts a probability distribution over a set of semantic similarity values. The pre-processing steps involve the removal of punctuation, tokenization, and replacing words with GloVe word embeddings. The length of the input is set to 30 words, achieved by truncation or padding as necessary. Some special hand-crafted features, like flag values indicating whether a word or number occurs in both sentences and one-hot encoded POS tags, are added to the GloVe vectors. The vectors are then fed to a Convolutional Neural Network (CNN) with 300 filters and one max-pooling layer, which forms the sentence vectors; the ReLU activation function is used in the convolution layer. The semantic difference between the two sentence vectors is calculated by their element-wise absolute difference and their element-wise multiplication. The result is further passed through two fully-connected layers, which predict the probability distribution of the semantic similarity values. The model's performance was evaluated using the SemEval datasets, where it ranked 3rd in the SemEval 2017 track.

  • The LSTM networks are a special kind of Recurrent Neural Network (RNN). While processing text data, it is essential for a network to remember previous words in order to capture the context, and RNNs have the capacity to do so. However, not all previous content has significance for the next word or phrase, so RNNs suffer from the drawback of long-term dependency. LSTMs are designed to overcome this problem: they have gates that enable the network to choose the content it has to remember. For example, consider the text snippet "Mary is from Finland. She is fluent in Finnish. She loves to travel". When we reach the second sentence of the snippet, it is essential to remember the words "Mary" and "Finland"; however, on reaching the third sentence, the network may forget the word "Finland". The architecture of LSTMs allows this. Many researchers use the LSTM architecture to measure semantic similarity between blocks of text. Tien et al. (Tien et al., 2019) use a network combining LSTM and CNN to form a sentence embedding from pretrained word embeddings, followed by an LSTM architecture to predict their similarity. Tai et al. (Tai et al., 2015) proposed an LSTM architecture to estimate the semantic similarity between two given sentences. Initially, the sentences are converted to sentence representations using a Tree-LSTM over the parse tree of each sentence. These sentence representations are then fed to a neural network that calculates the absolute distance between the vectors and the angle between them. The experiment was conducted using the SICK dataset, where the similarity measure varies within the range 1 to 5. The hidden layer consists of 50 neurons, and the final softmax layer classifies the sentences over the given range. The Tree-LSTM model achieved better Pearson's and Spearman's correlation with the gold standards than the other neural network models in comparison.

  • He and Lin (He and Lin, 2016) proposed a hybrid architecture using Bi-LSTM and CNN to estimate semantic similarity. Bi-LSTMs have two LSTMs that run in parallel, one from the beginning of the sentence and one from the end, thus capturing the entire context. In their model, He and Lin use the Bi-LSTM for context modelling. A pairwise word-interaction model is built that calculates a comparison unit between the vectors derived from the hidden states of the two LSTMs using the below formula

    (12)

    where the two vectors are drawn from the hidden states of the LSTMs and the three functions calculate the cosine distance, Euclidean distance, and Manhattan distance, respectively. This model is similar to other recent neural network-based word-attention models (Bahdanau et al., 2015)(Alexander M. Rush et al., 2015); however, attention weights are not added, rather the distances are added as weights. The word-interaction model is followed by a similarity focus layer, where weights are added to the word interactions (calculated in the previous layers) based on their importance in determining the similarity. These re-weighted vectors are fed to the final convolution network. The network is composed of alternating spatial convolution layers and spatial max-pooling layers, the ReLU activation function is used, and the network ends with two fully connected layers followed by a LogSoftmax layer to obtain a non-linear solution. This model outperforms the previously mentioned Tree-LSTM model on the SICK dataset.

  • Lopez-Gazpio et al. (Lopez-Gazpio et al., 2019) proposed an extension to the existing Decomposable Attention Model (DAM) proposed by Parikh et al. (Parikh et al., 2016), which was originally used for Natural Language Inference (NLI). NLI categorizes a given text block into a particular relation such as entailment, neutral, or contradiction. The DAM model uses feed-forward neural networks in three consecutive layers: the attention layer, the comparison layer, and the aggregation layer. Given two sentences, the attention layer produces two attention vectors for each sentence by finding the overlap between them. The comparison layer concatenates the attention vectors with the sentence vectors to form a single representative vector for each sentence. The final aggregation layer flattens the vectors and calculates the probability distribution over the given values. Lopez-Gazpio et al. (Lopez-Gazpio et al., 2019) used word n-grams to capture attention in the first layer instead of individual words. An n-gram may be defined as a sequence of n contiguous words; n-grams are used to capture context in various NLP tasks. In order to accommodate n-grams, a Recurrent Neural Network (RNN) is added to the attention layer; variations were proposed by replacing the RNN with a Long Short-Term Memory (LSTM) network or a Convolutional Neural Network (CNN). The model was used for semantic similarity calculations by replacing the final classes of entailment relationships with semantic similarity ranges from 0 to 5. The models achieved better performance in capturing semantic similarity on the SICK dataset and the STS Benchmark dataset when compared to DAM and other state-of-the-art models like Sent2Vec (Pagliardini et al., 2018) and BiLSTM, among others.

Deep neural network-based methods outperform most of the traditional methods; however, implementing deep-learning models requires large computational resources. Most deep-learning models are "black-box" models, and it is difficult to ascertain the features on which their performance is based; hence they are difficult to interpret, unlike corpus-based methods, which have a strong mathematical foundation. Various fields like finance and insurance that deal with sensitive data may be reluctant to deploy deep neural network-based methods due to their lack of interpretability.

6. Hybrid Methods

Based on all the previously discussed methods, we see that each has its advantages and disadvantages. The knowledge-based methods exploit the underlying ontologies to disambiguate synonyms, while corpus-based methods are versatile as they can be used across languages. Deep neural network-based systems, though computationally expensive, provide better results. However, many researchers have found ways to exploit the best of each method and build hybrid models to measure semantic similarity. In this section, we describe the methodologies used in some of the widely used hybrid models.

6.1. Types of hybrid semantic similarity methods:

  • Novel Approach to a Semantically-Aware Representation of Items (NASARI) (Camacho-Collados et al., 2015): Camacho-Collados et al. proposed an approach in which the knowledge source BabelNet is used to build a corpus from which vector representations for concepts (words or groups of words) are formed. Initially, the Wikipedia pages associated with a given concept (in this case, a BabelNet synset) and all the outgoing links from those pages are used to form a sub-corpus for the specific concept. The sub-corpus is further expanded with the Wikipedia pages of the hypernyms and hyponyms of the concept in the BabelNet network. The entire Wikipedia is considered as the reference corpus. Two different types of vector representation were proposed. In the first method, weighted vectors are formed using lexical specificity, a statistical method of identifying the most representative words for a given text based on the hypergeometric distribution (sampling without replacement). Let T and t denote the total content words in the reference corpus and sub-corpus respectively, and let F and f denote the frequency of the given word in the reference corpus and sub-corpus respectively; then lexical specificity can be represented by the below equation

    (13)

    X represents a random variable that follows a hypergeometric distribution with the parameters T, t, and F, and is defined as,

    (14)

    P(X = f) is the probability of the given term appearing exactly f times in the given sub-corpus under a hypergeometric distribution with parameters T, t, and F. The second method forms clusters of words in the sub-corpus that share a common hypernym in the WordNet taxonomy, which is embedded in BabelNet. The specificity is then measured based on the frequency of the hypernym and all its hyponyms in the taxonomy, even those that did not occur in the given sub-corpus. This clustering technique forms a unified representation of the words that preserves their semantic properties. In both methods the specificity values are added as weights to rank the terms in a given text. The first vector representation is called the lexical representation and the second the unified representation. The similarity between these vectors is calculated using a measure called Weighted Overlap (Pilehvar et al., 2013) as,

    (15)

    where the first term denotes the set of overlapping words in the two vectors and the rank terms represent the rank of each overlapping word in the respective vector.
    Camacho-Collados et al. (Camacho-Collados et al., 2016) proposed an extension to their previous work with a third vector representation, obtained by mapping the lexical vector to the semantic space of word embeddings produced by techniques like word2vec. The similarity is measured as the cosine similarity between these embedded vectors. All three methods were implemented using the gold standard datasets M&C, WS-Sim, and SimLex-999, and achieved higher Pearson's and Spearman's correlation on average over the three datasets in comparison with other methods like ESA.

  • Most Suitable Sense Annotation (MSSA) (Ruas et al., 2019): Ruas et al. proposed three different methodologies to form word-sense embeddings. Given a corpus, the word-sense disambiguation step is performed using one of the three proposed methods: Most Suitable Sense Annotation (MSSA), Most Suitable Sense Annotation N Refined (MSSA-NR), and Most Suitable Sense Annotation Dijkstra (MSSA-D). Each word in the corpus is associated with a synset in the WordNet ontology, and a "gloss-average-vector" is calculated for each synset using the vector representations of the words in the gloss of that synset. MSSA calculates the gloss-average-vector using a small window of words and returns the synset of the word that has the highest gloss-average-vector value. MSSA-D, however, considers the entire document from the first word to the last word and then determines the associated synset. These two systems use Google News vectors (https://code.google.com/archive/p/word2vec/) to form the synset embeddings. MSSA-NR is an iterative model in which the first pass produces synset embeddings that are fed back in the second pass as a replacement for the gloss-average-vectors, producing more refined synset embeddings. These synset embeddings are then fed to a word2vec CBOW model to produce multi-sense word embeddings that are used to calculate the semantic similarity. This combination of MSSA variations and word2vec produced solid results on gold standard datasets like R&G, M&C, WS353-Sim, and SimLex-999 (Ruas et al., 2019).

  • Unsupervised Ensemble Semantic Textual Similarity Methods (UESTS) (Hassan et al., 2019): Hassan et al. proposed an ensemble semantic similarity method based on an underlying unsupervised word aligner. The model calculates the semantic similarity between two sentences as the weighted sum of four different similarity measures, using the equation below

    (16)

    The first measure calculates similarity using a synset-based word aligner; the similarity between texts is measured based on the number of shared neighbors each term has in the BabelNet taxonomy. The second measures similarity using soft cardinality between the terms in comparison; the soft cardinality function treats each word as a set and the similarity between words as an intersection between the sets. The third forms word vector representations using the word embeddings proposed by Baroni et al. (Baroni et al., 2014), and similarity is measured as the cosine value between the two vectors. The fourth component is a measure of dissimilarity between the two given sentences based on edit distance, defined as the minimum number of edits (insertions, deletions, or substitutions) it takes to convert one sentence into another; here a word-sense edit distance is used, where word senses are taken into consideration instead of the actual words themselves. The four weighting hyperparameters were tuned to values between 0 and 0.5 for the different STS Benchmark datasets. The ensemble model outperformed the unsupervised models of the 2017 SemEval series on various STS Benchmark datasets.
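The combination in Eq. (16) amounts to a weighted sum of the four component scores; a sketch, assuming the dissimilarity term is subtracted (the weights and scores below are illustrative values, not the paper's tuned ones):

```python
# Weighted ensemble of four component scores; the edit-distance
# component measures dissimilarity, so it is subtracted here (an
# assumption about the sign convention, for illustration).
def uests_score(sim_align, sim_soft, sim_embed, dissim_edit,
                a=0.3, b=0.3, c=0.3, d=0.1):
    return a * sim_align + b * sim_soft + c * sim_embed - d * dissim_edit

score = uests_score(0.8, 0.7, 0.9, 0.1)
```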

Hybrid methods exploit both the structural efficiency offered by knowledge-based methods and the versatility of corpus-based methods. Much research has been conducted on building multi-sense embeddings in order to incorporate the actual meaning of words into word vectors. Iacobacci et al. formed word embeddings called "Sensembed" by using BabelNet to form a sense-annotated corpus and then building word vectors over it, thus obtaining different vectors for different senses of a word. As we can see, hybrid models compensate for the shortcomings of one method by incorporating others; hence the performance of hybrid methods is comparatively high. The first 5 places of the SemEval 2017 semantic similarity task were awarded to ensemble models, which clearly shows the shift in research towards hybrid models (Cer et al., 2017).

7. Analysis of Survey

This section discusses the method used to build this survey article and provides an overview of the various research articles taken into consideration.

7.1. Search Strategy:

The articles considered for this survey were obtained using the Google Scholar search engine and the keywords used include “semantic similarity, word embedding, knowledge-based methods, corpus-based methods, deep neural network-based semantic similarity, LSTM, text processing, and Semantic similarity datasets.”

The results of the search were fine-tuned using various parameters like the journal ranking, Google Scholar index, number of citations, and year of publication. Only articles published in journals with a Scimago Journal Ranking of Quartile 1 and conferences with a Google Metrics H-index above 50 were considered; exceptions were made for some articles with higher impact and relevance. The table of references sorted by year of publication is included in the Appendix. The table records 1) Title, 2) Year of Publication, 3) Author Names, 4) Venue, 5) SJR Quartile (for journals), 6) H-Index, and 7) Number of Citations (as of 02.04.2020). Some statistical results on the chosen articles are shown in the figures below. Fig 2 shows the distribution of the referenced articles over conferences, journals, and others: 52% of the articles are from conferences and 45% are from journals. The remaining 3% are from arXiv; however, they have a rather high impact in relation to the topic of the survey. Fig 3 highlights the distribution of the selected articles over the years. Nearly 72% of the chosen articles are works carried out after 2010; the remaining 28% represent the traditional methods adopted during the early stages of the evolution of semantic similarity. Fig 4 represents the citation range of the articles: 34% of the articles have 50 to 500 citations, and 28% have 1,000 to 10,000 citations. We see that 27% of the articles have fewer than 50 citations; however, all of these were published after 2017, which accounts for the fewer citations.

Figure 2. Distribution of articles over venues.
Figure 3. Distribution of articles over years.
Figure 4. Distribution of citation range over the articles.
Figure 5. Word cloud representing the collection of words from the abstracts of the papers used in the survey.

7.2. Word-cloud generation:

We implemented a simple Python script to generate a word cloud using the abstracts from all the articles used in this survey. The abstracts from all 100 articles were used to build a dataset that was then fed to the script. The extracted abstracts are first pre-processed by converting the text to lower case, removing punctuation, and removing the most commonly used English stop words available in the nltk (http://www.nltk.org/) library. The word cloud is then built using a Python word-cloud library and is shown in Fig 5. From the word cloud we infer that, though different keywords were used in our search for articles, the general focus of the selected articles is semantic similarity. In a word cloud, the size of a word is proportional to its frequency of use. The word "word" is considerably bigger than the word "sentence", showing that most of the research works focus on word-to-word similarity rather than sentence-to-sentence similarity. We can also infer that the words "vector" and "representation" have been used more frequently than the words "information" and "context", indicating the influence of corpus-based methods over knowledge-based methods. With the given word cloud we showcase the focus of the survey graphically.
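The pre-processing steps can be sketched as follows (the toy abstracts and the tiny stop-word list stand in for the real dataset and NLTK's list):

```python
# Lower-case, strip punctuation, drop stop words, count frequencies.
import string
from collections import Counter

STOP_WORDS = {"the", "of", "and", "a", "to", "in", "is", "for"}

def word_frequencies(abstracts):
    counts = Counter()
    for text in abstracts:
        text = text.lower().translate(str.maketrans("", "", string.punctuation))
        counts.update(w for w in text.split() if w not in STOP_WORDS)
    return counts

freqs = word_frequencies([
    "Semantic similarity of word vectors.",
    "A survey of semantic similarity measures.",
])
# such frequencies can be handed to a renderer, e.g.
# wordcloud.WordCloud().generate_from_frequencies(freqs)
```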

8. Conclusion

Measuring semantic similarity between two text snippets has been one of the most challenging tasks in the field of Natural Language Processing, and various methodologies have been proposed over the years to measure it. This survey discusses the advantages and disadvantages of these methods. Knowledge-based methods take into consideration the actual meaning of text; however, they are not easily adaptable across different domains and languages. Corpus-based methods have a statistical background and can be implemented across languages, but they do not take into consideration the actual meaning of the text. Deep neural network-based methods show better performance, but they require high computational resources and lack interpretability. Hybrid methods are formed to take advantage of the benefits of the different methods while compensating for their shortcomings. It is clear from the survey that each method has its advantages and disadvantages and that it is difficult to choose one best model; however, the most recent hybrid methods have shown promising results over other independent models. This survey should serve as a good foundation for researchers who intend to devise new methods to measure semantic similarity.

Acknowledgements.
The authors would like to extend their gratitude to the research team in the DaTALab at Lakehead University for their support, in particular Abhijit Rao, Mohiuddin Qudar, Punardeep Sikka, and Andrew Heppner for their feedback and revisions on this publication. We would also like to thank Lakehead University, CASES, and the Ontario Council for Articulation and Transfer; without their support this research would not have been possible.

References

  • E. Agirre, E. Alfonseca, K. Hall, J. Kravalova, M. Pasca, and A. Soroa (2009) A study on similarity and relatedness using distributional and wordnet-based approaches. In Human Language Technologies: The 2009 Annual Conference of the North American Chapter of the Association for Computational Linguistics, pp. 19. Cited by: Table 3, 4th item, Table 1, §4.2.7.
  • E. Agirre, C. Banea, C. Cardie, D. Cer, M. Diab, A. Gonzalez-Agirre, W. Guo, I. Lopez-Gazpio, M. Maritxalar, R. Mihalcea, et al. (2015) Semeval-2015 task 2: semantic textual similarity, english, spanish and pilot on interpretability. In Proceedings of the 9th international workshop on semantic evaluation (SemEval 2015), pp. 252–263. Cited by: Table 3, 9th item, Table 1.
  • E. Agirre, C. Banea, C. Cardie, D. Cer, M. Diab, A. Gonzalez-Agirre, W. Guo, R. Mihalcea, G. Rigau, and J. Wiebe (2014) Semeval-2014 task 10: multilingual semantic textual similarity. In Proceedings of the 8th international workshop on semantic evaluation (SemEval 2014), pp. 81–91. Cited by: Table 3, 9th item, Table 1.
  • E. Agirre, C. Banea, D. Cer, M. Diab, A. Gonzalez Agirre, R. Mihalcea, G. Rigau Claramunt, and J. Wiebe (2016) Semeval-2016 task 1: semantic textual similarity, monolingual and cross-lingual evaluation. In SemEval-2016. 10th International Workshop on Semantic Evaluation; 2016 Jun 16-17; San Diego, CA. Stroudsburg (PA): ACL; 2016. p. 497-511., Cited by: Table 3, 9th item, Table 1.
  • E. Agirre, D. Cer, M. Diab, A. Gonzalez-Agirre, and W. Guo (2013) *SEM 2013 shared task: semantic textual similarity. In Second Joint Conference on Lexical and Computational Semantics (*SEM), Volume 1: Proceedings of the Main Conference and the Shared Task: Semantic Textual Similarity, pp. 32–43. Cited by: Table 3, 9th item, Table 1.
  • E. Agirre, D. Cer, M. Diab, and A. Gonzalez-Agirre (2012) Semeval-2012 task 6: a pilot on semantic textual similarity. In *SEM 2012: The First Joint Conference on Lexical and Computational Semantics, Volume 1: Proceedings of the main conference and the shared task, and Volume 2: Proceedings of the Sixth International Workshop on Semantic Evaluation (SemEval 2012), pp. 385–393. Cited by: Table 3, 9th item, Table 1.
  • A. M. Rush, S. Chopra, and J. Weston (2015) A neural attention model for abstractive sentence summarization. In Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, pp. 379–389. Cited by: Table 3, 4th item.
  • B. Altınel and M. C. Ganiz (2018) Semantic text classification: a survey of past and recent advances. Information Processing & Management 54 (6), pp. 1129 – 1153. External Links: ISSN 0306-4573, Document, Link Cited by: Table 3, §1.1, §4.2.8.
  • D. Bahdanau, K. Cho, and Y. Bengio (2015) Neural machine translation by jointly learning to align and translate. In 3rd International Conference on Learning Representations (ICLR 2015). Cited by: Table 3, 4th item.
  • S. Banerjee and T. Pedersen (2003) Extended gloss overlaps as a measure of semantic relatedness. In Ijcai, Vol. 3, pp. 805–810. Cited by: Table 3, §3.2.2.
  • M. Baroni, S. Bernardini, A. Ferraresi, and E. Zanchetta (2009) The wacky wide web: a collection of very large linguistically processed web-crawled corpora. Language resources and evaluation 43 (3), pp. 209–226. Cited by: Table 3, §4.2.8.
  • M. Baroni, G. Dinu, and G. Kruszewski (2014) Don’t count, predict! a systematic comparison of context-counting vs. context-predicting semantic vectors. In Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pp. 238–247. Cited by: Table 3, 3rd item.
  • F. Benedetti, D. Beneventano, S. Bergamaschi, and G. Simonini (2019) Computing inter-document similarity with context semantic analysis. Information Systems 80, pp. 136 – 147. External Links: ISSN 0306-4379, Document, Link Cited by: Table 3, §4.2.5.
  • C. Bizer, J. Lehmann, G. Kobilarov, S. Auer, C. Becker, R. Cyganiak, and S. Hellmann (2009) DBpedia-a crystallization point for the web of data. Journal of web semantics 7 (3), pp. 154–165. Cited by: Table 3, §3.2.4.
  • P. Bojanowski, E. Grave, A. Joulin, and T. Mikolov (2017) Enriching word vectors with subword information. Transactions of the Association for Computational Linguistics 5, pp. 135–146. Cited by: Table 3, 3rd item.
  • A. Bordes, S. Chopra, and J. Weston (2014) Question answering with subgraph embeddings. In Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), pp. 615–620. Cited by: Table 3, §1.
  • J. Camacho-Collados, M. T. Pilehvar, and R. Navigli (2015) Nasari: a novel approach to a semantically-aware representation of items. In Proceedings of the 2015 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pp. 567–577. Cited by: Table 3, 1st item.
  • J. Camacho-Collados, M. T. Pilehvar, and R. Navigli (2016) Nasari: integrating explicit knowledge and corpus statistics for a multilingual representation of concepts and entities. Artificial Intelligence 240, pp. 36 – 64. External Links: ISSN 0004-3702, Document, Link Cited by: Table 3, 4th item, 1st item.
  • J. Camacho-Collados and M. T. Pilehvar (2018) From word to sense embeddings: a survey on vector representations of meaning. Journal of Artificial Intelligence Research 63, pp. 743–788. Cited by: Table 3, §1.1, §4.1.
  • D. Cer, M. Diab, E. Agirre, I. Lopez-Gazpio, and L. Specia (2017) SemEval-2017 task 1: semantic textual similarity multilingual and crosslingual focused evaluation. In Proceedings of the 11th International Workshop on Semantic Evaluation (SemEval-2017), pp. 1–14. Cited by: Table 3, 9th item, Table 2, §6.1.
  • R. L. Cilibrasi and P. M. Vitanyi (2007) The google similarity distance. IEEE Transactions on knowledge and data engineering 19 (3), pp. 370–383. Cited by: Table 3, §4.2.6.
  • L. Finkelstein, E. Gabrilovich, Y. Matias, E. Rivlin, Z. Solan, G. Wolfman, and E. Ruppin (2001) Placing search in context: the concept revisited. In Proceedings of the 10th international conference on World Wide Web, pp. 406–414. Cited by: Table 3, 3rd item, Table 1.
  • E. Gabrilovich, S. Markovitch, et al. (2007) Computing semantic relatedness using wikipedia-based explicit semantic analysis.. In IJcAI, Vol. 7, pp. 1606–1611. Cited by: Table 3, §4.2.3.
  • J. Ganitkevitch, B. Van Durme, and C. Callison-Burch (2013) PPDB: the paraphrase database. In Proceedings of the 2013 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pp. 758–764. Cited by: Table 3, §4.2.4.
  • J. Gao, B. Zhang, and X. Chen (2015) A wordnet-based semantic similarity measurement combining edge-counting and information content theory. Engineering Applications of Artificial Intelligence 39, pp. 80 – 88. External Links: ISSN 0952-1976, Document, Link Cited by: Table 3, §3.2.4.
  • D. Gerz, I. Vulić, F. Hill, R. Reichart, and A. Korhonen (2016) SimVerb-3500: a large-scale evaluation set of verb similarity. In Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing, pp. 2173–2182. Cited by: Table 3, Table 1.
  • G. Glavaš, M. Franco-Salvador, S. P. Ponzetto, and P. Rosso (2018) A resource-light method for cross-lingual semantic textual similarity. Knowledge-Based Systems 143, pp. 1 – 9. External Links: ISSN 0950-7051, Document, Link Cited by: Table 3, §4.1.
  • M. A. Hadj Taieb, T. Zesch, and M. Ben Aouicha (2019) A survey of semantic relatedness evaluation datasets and procedures. Artificial Intelligence Review. External Links: ISSN 1573-7462, Document, Link Cited by: Table 3, §1.1, §1.
  • B. Hassan, S. E. Abdelrahman, R. Bahgat, and I. Farag (2019) UESTS: an unsupervised ensemble semantic textual similarity method. IEEE Access 7, pp. 85462–85482. Cited by: Table 3, 3rd item.
  • H. He and J. Lin (2016) Pairwise word interaction modeling with deep neural networks for semantic similarity measurement. In Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, San Diego, California, pp. 937–948. External Links: Link, Document Cited by: Table 3, 4th item.
  • F. Hill, R. Reichart, and A. Korhonen (2015) Simlex-999: evaluating semantic models with (genuine) similarity estimation. Computational Linguistics 41 (4), pp. 665–695. Cited by: Table 3, 7th item, Table 1.
  • J. Hoffart, F. M. Suchanek, K. Berberich, and G. Weikum (2013) YAGO2: a spatially and temporally enhanced knowledge base from wikipedia. Artificial Intelligence 194, pp. 28–61. Cited by: Table 3, §3.2.4.
  • H. K. Janda, A. Pawar, S. Du, and V. Mago (2019) Syntactic, semantic and sentiment analysis: the joint effect on automated essay evaluation. IEEE Access 7, pp. 108486–108503. Cited by: Table 3, §1.
  • J. J. Jiang and D. W. Conrath (1997) Semantic similarity based on corpus statistics and lexical taxonomy. In Proceedings of the 10th Research on Computational Linguistics International Conference, pp. 19–33. Cited by: Table 3, §3.2.3.
  • Y. Jiang, W. Bai, X. Zhang, and J. Hu (2017) Wikipedia-based information content and semantic similarity computation. Information Processing & Management 53 (1), pp. 248 – 265. External Links: ISSN 0306-4573, Document, Link Cited by: Table 3, 3rd item, §3.2.3.
  • Y. Jiang, X. Zhang, Y. Tang, and R. Nie (2015) Feature-based approaches to semantic similarity assessment of concepts using wikipedia. Information Processing & Management 51 (3), pp. 215–234. Cited by: Table 3, §3.2.2.
  • S. Kim, N. Fiorini, W. J. Wilbur, and Z. Lu (2017) Bridging the gap: incorporating a semantic similarity measure for effectively mapping pubmed queries to documents. Journal of biomedical informatics 75, pp. 122–127. Cited by: Table 3, §1.
  • Y. Kim (2014) Convolutional neural networks for sentence classification. In Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), pp. 1746–1751. Cited by: Table 3, §1.
  • T. K. Landauer and S. T. Dumais (1997) A solution to plato’s problem: the latent semantic analysis theory of acquisition, induction, and representation of knowledge.. Psychological review 104 (2), pp. 211. Cited by: Table 3, §4.2.1.
  • T. K. Landauer, P. W. Foltz, and D. Laham (1998) An introduction to latent semantic analysis. Discourse processes 25 (2-3), pp. 259–284. Cited by: Table 3, §4.2.1.
  • J. J. Lastra-Díaz, A. García-Serrano, M. Batet, M. Fernández, and F. Chirigati (2017) HESML: a scalable ontology-based semantic similarity measures library with a set of reproducible experiments and a replication dataset. Information Systems 66, pp. 97–118. Cited by: Table 3, §3.2.4.
  • J. J. Lastra-Díaz and A. García-Serrano (2015) A new family of information content models with an experimental survey on wordnet. Knowledge-Based Systems 89, pp. 509–526. Cited by: Table 3, §1.1.
  • J. J. Lastra-Díaz, J. Goikoetxea, M. A. H. Taieb, A. García-Serrano, M. B. Aouicha, and E. Agirre (2019) A reproducible survey on word embeddings and ontology-based methods for word similarity: linear combinations outperform the state of the art. Engineering Applications of Artificial Intelligence 85, pp. 645 – 665. External Links: ISSN 0952-1976, Document, Link Cited by: Table 3, §1.1, §3.2.2, 2nd item.
  • Q. Le and T. Mikolov (2014) Distributed representations of sentences and documents. In International Conference on Machine Learning, pp. 1188–1196. Cited by: Table 3, 1st item.
  • Y. Le, Z. Wang, Z. Quan, J. He, and B. Yao (2018) ACV-tree: a new method for sentence similarity modeling.. In IJCAI, pp. 4137–4143. Cited by: Table 3, §4.2.8, §4.2.8, §4.2.8.
  • M. C. Lee (2011) A novel sentence similarity measure for semantic-based expert systems. Expert Systems with Applications 38 (5), pp. 6392–6399. Cited by: Table 3, §3.2.4.
  • O. Levy and Y. Goldberg (2014) Dependency-based word embeddings. In Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers), pp. 302–308. Cited by: Table 3, §4.1, §4.2.7.
  • P. Li, H. Wang, K. Q. Zhu, Z. Wang, and X. Wu (2013) Computing term similarity by large probabilistic isa knowledge. In Proceedings of the 22nd ACM international conference on Information & Knowledge Management, pp. 1401–1410. Cited by: Table 3, Table 1.
  • Y. Li, Z. A. Bandar, and D. McLean (2003) An approach for measuring semantic similarity between words using multiple information sources. IEEE Transactions on knowledge and data engineering 15 (4), pp. 871–882. Cited by: Table 3, §3.2.1.
  • Y. Li, D. McLean, Z. A. Bandar, J. D. O’shea, and K. Crockett (2006) Sentence similarity based on semantic nets and corpus statistics. IEEE transactions on knowledge and data engineering 18 (8), pp. 1138–1150. Cited by: Table 3, 5th item, Table 1.
  • D. Lin et al. (1998) An information-theoretic definition of similarity.. In Icml, Vol. 98, pp. 296–304. Cited by: Table 3, §3.2.3.
  • I. Lopez-Gazpio, M. Maritxalar, A. Gonzalez-Agirre, G. Rigau, L. Uria, and E. Agirre (2017) Interpretable semantic textual similarity: finding and explaining differences between sentences. Knowledge-Based Systems 119, pp. 186 – 199. External Links: ISSN 0950-7051, Document, Link Cited by: Table 3, §1.
  • I. Lopez-Gazpio, M. Maritxalar, M. Lapata, and E. Agirre (2019) Word n-gram attention models for sentence similarity and inference. Expert Systems with Applications 132, pp. 1 – 11. External Links: ISSN 0957-4174, Document, Link Cited by: Table 3, §4.2.8, 5th item.
  • K. Lund and C. Burgess (1996) Producing high-dimensional semantic spaces from lexical co-occurrence. Behavior research methods, instruments, & computers 28 (2), pp. 203–208. Cited by: Table 3, §4.2.2.
  • M. Marelli, S. Menini, M. Baroni, L. Bentivogli, R. Bernardi, and R. Zamparelli (2014) A SICK cure for the evaluation of compositional distributional semantic models. In Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC 2014). Cited by: Table 3, 8th item, Table 1.
  • B. McCann, J. Bradbury, C. Xiong, and R. Socher (2017) Learned in translation: contextualized word vectors. In Proceedings of the 31st International Conference on Neural Information Processing Systems, pp. 6297–6308. Cited by: Table 3, 2nd item.
  • B. T. McInnes, Y. Liu, T. Pedersen, G. B. Melton, and S. V. Pakhomov (2013) UMLS::Similarity: measuring the relatedness and similarity of biomedical concepts. In Human Language Technologies: The 2013 Annual Conference of the North American Chapter of the Association for Computational Linguistics, pp. 28. Cited by: Table 3, §3.2.4.
  • Y. Yang, W. Yih, and C. Meek (2015) WikiQA: a challenge dataset for open-domain question answering. In Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, pp. 2013–2018. Cited by: Table 3, 1st item.
  • O. Melamud, J. Goldberger, and I. Dagan (2016) Context2vec: learning generic context embedding with bidirectional lstm. In Proceedings of The 20th SIGNLL Conference on Computational Natural Language Learning, pp. 51–61. Cited by: Table 3, 1st item.
  • R. Mihalcea and A. Csomai (2007) Wikify! linking documents to encyclopedic knowledge. In Proceedings of the sixteenth ACM conference on Conference on information and knowledge management, pp. 233–242. Cited by: Table 3, 3rd item.
  • T. Mikolov, K. Chen, G. Corrado, and J. Dean (2013a) Efficient estimation of word representations in vector space. arXiv preprint arXiv:1301.3781. Cited by: Table 3, 1st item, §4.1.
  • T. Mikolov, W. Yih, and G. Zweig (2013b) Linguistic regularities in continuous space word representations. In Proceedings of the 2013 conference of the north american chapter of the association for computational linguistics: Human language technologies, pp. 746–751. Cited by: Table 3, 1st item.
  • G. A. Miller and W. G. Charles (1991) Contextual correlates of semantic similarity. Language and cognitive processes 6 (1), pp. 1–28. Cited by: Table 3, 2nd item, Table 1.
  • G. A. Miller (1995) WordNet: a lexical database for english. Communications of the ACM 38 (11), pp. 39–41. Cited by: Table 3, 1st item, §3.2.4.
  • M. Mohamed and M. Oussalah (2019) SRL-esa-textsum: a text summarization approach based on semantic role labeling and explicit semantic analysis. Information Processing & Management 56 (4), pp. 1356–1372. Cited by: Table 3, §1.
  • R. Navigli and S. P. Ponzetto (2012) BabelNet: the automatic construction, evaluation and application of a wide-coverage multilingual semantic network. Artificial Intelligence 193. Cited by: Table 3, 4th item.
  • D. L. Nelson, C. L. McEvoy, and T. A. Schreiber (2004) The university of south florida free association, rhyme, and word fragment norms. Behavior Research Methods, Instruments, & Computers 36 (3), pp. 402–407. Cited by: Table 3, 7th item.
  • J. Nivre (2006) Inductive dependency parsing. Springer. Cited by: Table 3, §4.2.7.
  • M. Pagliardini, P. Gupta, and M. Jaggi (2018) Unsupervised learning of sentence embeddings using compositional n-gram features. In Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers), pp. 528–540. Cited by: Table 3, 1st item, 5th item.
  • A. Parikh, O. Täckström, D. Das, and J. Uszkoreit (2016) A decomposable attention model for natural language inference. In Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing, pp. 2249–2255. Cited by: Table 3, 5th item.
  • A. Pawar and V. Mago (2019) Challenging the boundaries of unsupervised learning for semantic similarity. IEEE Access 7 (), pp. 16291–16308. External Links: ISSN 2169-3536 Cited by: Table 3, 1st item.
  • T. Pedersen, S. V. Pakhomov, S. Patwardhan, and C. G. Chute (2007) Measures of semantic similarity and relatedness in the biomedical domain. Journal of biomedical informatics 40 (3), pp. 288–299. Cited by: Table 3, 6th item, Table 1, §3.2.4.
  • J. Pennington, R. Socher, and C. D. Manning (2014) Glove: global vectors for word representation. In Proceedings of the 2014 conference on empirical methods in natural language processing (EMNLP), pp. 1532–1543. Cited by: Table 3, 2nd item, §4.1.
  • M. T. Pilehvar and J. Camacho-Collados (2019) WiC: the word-in-context dataset for evaluating context-sensitive meaning representations. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers), pp. 1267–1273. Cited by: Table 3, Table 1.
  • M. T. Pilehvar, D. Jurgens, and R. Navigli (2013) Align, disambiguate and walk: a unified approach for measuring semantic similarity. In Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pp. 1341–1351. Cited by: Table 3, 1st item.
  • M. T. Pilehvar and R. Navigli (2015) From senses to texts: an all-in-one graph-based approach for measuring semantic similarity. Artificial Intelligence 228, pp. 95 – 128. External Links: ISSN 0004-3702, Document, Link Cited by: Table 3, 2nd item.
  • R. Qu, Y. Fang, W. Bai, and Y. Jiang (2018) Computing semantic similarity based on novel models of semantic representation using wikipedia. Information Processing & Management 54 (6), pp. 1002 – 1021. External Links: ISSN 0306-4573, Document, Link Cited by: Table 3, 3rd item.
  • Z. Quan, Z. Wang, Y. Le, B. Yao, K. Li, and J. Yin (2019) An efficient framework for sentence similarity modeling. IEEE/ACM Transactions on Audio, Speech, and Language Processing 27 (4), pp. 853–865. External Links: Document, ISSN 2329-9304 Cited by: Table 3, §4.2.8.
  • R. Rada, H. Mili, E. Bicknell, and M. Blettner (1989) Development and application of a metric on semantic nets. IEEE transactions on systems, man, and cybernetics 19 (1), pp. 17–30. Cited by: Table 3, §3.2.1.
  • P. Resnik (1995) Using information content to evaluate semantic similarity in a taxonomy. In Proceedings of the 14th international joint conference on Artificial intelligence-Volume 1, pp. 448–453. Cited by: Table 3, §3.2.3.
  • M. A. Rodríguez and M. J. Egenhofer (2003) Determining semantic similarity among entity classes from different ontologies. IEEE transactions on knowledge and data engineering 15 (2), pp. 442–456. Cited by: Table 3, §3.2.3.
  • T. Ruas, W. Grosky, and A. Aizawa (2019) Multi-sense embeddings through a word sense disambiguation process. Expert Systems with Applications 136, pp. 288 – 303. External Links: ISSN 0957-4174, Document, Link Cited by: Table 3, 2nd item.
  • H. Rubenstein and J. B. Goodenough (1965) Contextual correlates of synonymy. Communications of the ACM 8 (10), pp. 627–633. Cited by: Table 3, 1st item, Table 1.
  • D. Sánchez, M. Batet, D. Isern, and A. Valls (2012) Ontology-based semantic similarity: a new feature-based approach. Expert Systems with Applications 39 (9), pp. 7718 – 7728. External Links: ISSN 0957-4174, Document, Link Cited by: Table 3, 1st item, §3.2.2, §3.
  • D. Sánchez, M. Batet, and D. Isern (2011) Ontology-based information content computation. Knowledge-based systems 24 (2), pp. 297–303. Cited by: Table 3, §3.2.3.
  • D. Sánchez and M. Batet (2013) A semantic similarity method based on information content exploiting multiple ontologies. Expert Systems with Applications 40 (4), pp. 1393 – 1399. External Links: ISSN 0957-4174, Document, Link Cited by: Table 3, §3.2.3.
  • T. Schnabel, I. Labutov, D. Mimno, and T. Joachims (2015) Evaluation methods for unsupervised word embeddings. In Proceedings of the 2015 conference on empirical methods in natural language processing, pp. 298–307. Cited by: Table 3, §4.1.
  • Y. Shao (2017) HCTI at semeval-2017 task 1: use convolutional neural network to evaluate semantic textual similarity. In Proceedings of the 11th International Workshop on Semantic Evaluation (SemEval-2017), pp. 130–133. Cited by: Table 3, 2nd item.
  • C. Silberer and M. Lapata (2014) Learning grounded meaning representations with autoencoders. In Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pp. 721–732. Cited by: Table 3, Table 1.
  • R. A. Sinoara, J. Camacho-Collados, R. G. Rossi, R. Navigli, and S. O. Rezende (2019) Knowledge-enhanced document embeddings for text classification. Knowledge-Based Systems 163, pp. 955 – 971. External Links: ISSN 0950-7051, Document, Link Cited by: Table 3, §4.2.5, §4.2.5.
  • G. Soğancıoğlu, H. Öztürk, and A. Özgür (2017) BIOSSES: a semantic sentence similarity estimation system for the biomedical domain. Bioinformatics 33 (14), pp. i49–i58. External Links: ISSN 1367-4803, Document, Link, https://academic.oup.com/bioinformatics/article-pdf/33/14/i49/25157316/btx238.pdf Cited by: Table 3, §3.2.4.
  • M. A. Sultan, S. Bethard, and T. Sumner (2015) Dls@ cu: sentence similarity from word alignment and semantic vector composition. In Proceedings of the 9th International Workshop on Semantic Evaluation (SemEval 2015), pp. 148–153. Cited by: Table 3, §4.2.4, §4.2.4.
  • K. S. Tai, R. Socher, and C. D. Manning (2015) Improved semantic representations from tree-structured long short-term memory networks. In Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 1: Long Papers), pp. 1556–1566. Cited by: Table 3, 3rd item.
  • N. H. Tien, N. M. Le, Y. Tomohiro, and I. Tatsuya (2019) Sentence modeling via multiple word embeddings and multi-level comparison for semantic textual similarity. Information Processing & Management 56 (6), pp. 102090. External Links: ISSN 0306-4573, Document, Link Cited by: Table 3, 3rd item.
  • J. Tissier, C. Gravier, and A. Habrard (2017) Dict2vec: learning word embeddings using lexical dictionaries. In Conference on Empirical Methods in Natural Language Processing (EMNLP 2017), pp. 254–263. Cited by: Table 3, 1st item.
  • M. Wang, N. A. Smith, and T. Mitamura (2007) What is the Jeopardy model? A quasi-synchronous grammar for QA. In Proceedings of the 2007 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning (EMNLP-CoNLL), pp. 22–32. Cited by: Table 3, 1st item.
  • Z. Wang, H. Mi, and A. Ittycheriah (2016) Sentence similarity learning by lexical decomposition and composition. In Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: Technical Papers, pp. 1340–1349. Cited by: Table 3, 1st item.
  • Z. Wu and M. Palmer (1994) Verbs semantics and lexical selection. In Proceedings of the 32nd annual meeting on Association for Computational Linguistics, pp. 133–138. Cited by: Table 3, §3.2.1.
  • G. Zhu and C. A. Iglesias (2017) Computing semantic similarity of concepts in knowledge graphs. IEEE Transactions on Knowledge and Data Engineering 29 (1), pp. 72–85. External Links: Document, ISSN 2326-3865 Cited by: Table 3, 1st item, 1st item, §3.2.3, §3.2.4.
  • W. Y. Zou, R. Socher, D. Cer, and C. D. Manning (2013) Bilingual word embeddings for phrase-based machine translation. In Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing, pp. 1393–1398. Cited by: Table 3, §1.

Appendix A Table of References

Citation Title Year Authors Venue SJR Quartile H-Index Citations as on 02.04.2020
(Agirre et al., 2009) A study on similarity and relatedness using distributional and wordnet-based approaches 2009 Agirre, Eneko and Alfonseca, Enrique and Hall, Keith and Kravalova, Jana and Pasca, Marius and Soroa, Aitor Human Language Technologies: The 2009 Annual Conference of the North American Chapter of the Association for Computational Linguistics 61 809
(Agirre et al., 2015) Semeval-2015 task 2: Semantic textual similarity, english, spanish and pilot on interpretability 2015 Agirre, Eneko and Banea, Carmen and Cardie, Claire and Cer, Daniel and Diab, Mona and Gonzalez-Agirre, Aitor and Guo, Weiwei and Lopez-Gazpio, Inigo and Maritxalar, Montse and Mihalcea, Rada and others Proceedings of the 9th international workshop on semantic evaluation (SemEval 2015) 49 242
(Agirre et al., 2014) Semeval-2014 task 10: Multilingual semantic textual similarity 2014 Agirre, Eneko and Banea, Carmen and Cardie, Claire and Cer, Daniel and Diab, Mona and Gonzalez-Agirre, Aitor and Guo, Weiwei and Mihalcea, Rada and Rigau, German and Wiebe, Janyce Proceedings of the 8th international workshop on semantic evaluation (SemEval 2014) 49 220
(Agirre et al., 2016) Semeval-2016 task 1: Semantic textual similarity, monolingual and cross-lingual evaluation 2016 Agirre, Eneko and Banea, Carmen and Cer, Daniel and Diab, Mona and Gonzalez Agirre, Aitor and Mihalcea, Rada and Rigau Claramunt, German and Wiebe, Janyce SemEval-2016. 10th International Workshop on Semantic Evaluation; 49 200
(Agirre et al., 2012) Semeval-2012 task 6: A pilot on semantic textual similarity 2012 Agirre, Eneko and Cer, Daniel and Diab, Mona and Gonzalez-Agirre, Aitor Proceedings of the Sixth International Workshop on Semantic Evaluation (SemEval 2012) 49 498
(Agirre et al., 2013) *SEM 2013 shared task: Semantic textual similarity 2013 Agirre, Eneko and Cer, Daniel and Diab, Mona and Gonzalez-Agirre, Aitor and Guo, Weiwei Second Joint Conference on Lexical and Computational Semantics (*SEM), Volume 1: Proceedings of the Main Conference and the Shared Task: Semantic Textual Similarity 49 268
(Alexander M. Rush et al., 2015) A Neural Attention Model for Abstractive Sentence Summarization 2015 Alexander M. Rush, Sumit Chopra and Jason Weston EMNLP 88 1350
(Altınel and Ganiz, 2018) Semantic text classification: A survey of past and recent advances 2018 Altinel, Berna and Ganiz, Murat Can Information Processing & Management Q1 88 29
(Bahdanau et al., 2015) Neural machine translation by jointly learning to align and translate 2015 Dzmitry Bahdanau and Kyunghyun Cho and Yoshua Bengio International Conference on Learning Representations 150 10967
(Banerjee and Pedersen, 2003) Extended gloss overlaps as a measure of semantic relatedness 2003 Banerjee, Satanjeev and Pedersen, Ted IJCAI 109 953
(Baroni et al., 2009) The WaCky wide web: a collection of very large linguistically processed web-crawled corpora 2009 Baroni, Marco and Bernardini, Silvia and Ferraresi, Adriano and Zanchetta, Eros Language resources and evaluation 40 1130
(Baroni et al., 2014) Don’t count, predict! a systematic comparison of context-counting vs. context-predicting semantic vectors 2014 Baroni, Marco and Dinu, Georgiana and Kruszewski, Germán Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) 106 1166
(Benedetti et al., 2019) Computing inter-document similarity with Context Semantic Analysis 2019 Fabio Benedetti and Domenico Beneventano and Sonia Bergamaschi and Giovanni Simonini Information Systems Q1 76 24
(Bizer et al., 2009) DBpedia-A crystallization point for the Web of Data 2009 Bizer, Christian and Lehmann, Jens and Kobilarov, Georgi and Auer, Sören and Becker, Christian and Cyganiak, Richard and Hellmann, Sebastian Journal of web semantics 28 2331
(Bojanowski et al., 2017) Enriching word vectors with subword information 2017 Bojanowski, Piotr and Grave, Edouard and Joulin, Armand and Mikolov, Tomas Transactions of the Association for Computational Linguistics 47 2935
(Bordes et al., 2014) Question Answering with Subgraph Embeddings 2014 Bordes, Antoine and Chopra, Sumit and Weston, Jason EMNLP 88 433
(Camacho-Collados and Pilehvar, 2018) From Word to Sense Embeddings: A Survey on Vector Representations of Meaning 2018 Camacho-Collados, Jose and Pilehvar, Mohammad Taher Journal of Artificial Intelligence Research Q1 103 69
(Camacho-Collados et al., 2015) Nasari: a novel approach to a semantically-aware representation of items 2015 Camacho-Collados, José and Pilehvar, Mohammad Taher and Navigli, Roberto Proceedings of the 2015 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies 61 74
(Camacho-Collados et al., 2016) Nasari: Integrating explicit knowledge and corpus statistics for a multilingual representation of concepts and entities 2016 José Camacho-Collados and Mohammad Taher Pilehvar and Roberto Navigli Artificial Intelligence Q1 135 117
(Cer et al., 2017) Semeval-2017 task 1: Semantic textual similarity-multilingual and cross-lingual focused evaluation 2017 Cer, Daniel and Diab, Mona and Agirre, Eneko and Lopez-Gazpio, Inigo and Specia, Lucia Proceedings of the 11th International Workshop on Semantic Evaluation (SemEval-2017) 49 227
(Cilibrasi and Vitanyi, 2007) The google similarity distance 2007 Cilibrasi, Rudi L and Vitanyi, Paul MB IEEE Transactions on knowledge and data engineering Q1 148 2042
(Finkelstein et al., 2001) Placing search in context: The concept revisited 2001 Finkelstein, Lev and Gabrilovich, Evgeniy and Matias, Yossi and Rivlin, Ehud and Solan, Zach and Wolfman, Gadi and Ruppin, Eytan Proceedings of the 10th international conference on World Wide Web 70 1768
(Gabrilovich et al., 2007) Computing semantic relatedness using wikipedia-based explicit semantic analysis. 2007 Gabrilovich, Evgeniy and Markovitch, Shaul and others IJCAI 109 2514
(Ganitkevitch et al., 2013) PPDB: The paraphrase database 2013 Ganitkevitch, Juri and Van Durme, Benjamin and Callison-Burch, Chris Proceedings of the 2013 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies 61 493
(Gao et al., 2015) A WordNet-based semantic similarity measurement combining edge-counting and information content theory 2015 Jian-Bo Gao and Bao-Wen Zhang and Xiao-Hua Chen Engineering Applications of Artificial Intelligence Q1 86 74
(Gerz et al., 2016) SimVerb-3500: A Large-Scale Evaluation Set of Verb Similarity 2016 Gerz, Daniela and Vulić, Ivan and Hill, Felix and Reichart, Roi and Korhonen, Anna EMNLP 88 113
(Glavaš et al., 2018) A resource-light method for cross-lingual semantic textual similarity 2018 Goran Glavaš and Marc Franco-Salvador and Simone P. Ponzetto and Paolo Rosso Knowledge-based Systems Q1 94 13
(Hadj Taieb et al., 2019) A survey of semantic relatedness evaluation datasets and procedures 2019 Hadj Taieb, Mohamed Ali and Zesch, Torsten and Ben Aouicha, Mohamed Artificial Intelligence Review Q1 63
(Hassan et al., 2019) UESTS: An Unsupervised Ensemble Semantic Textual Similarity Method 2019 Hassan, Basma and Abdelrahman, Samir E and Bahgat, Reem and Farag, Ibrahim IEEE Access Q1 56 1
(He and Lin, 2016) Pairwise Word Interaction Modeling with Deep Neural Networks for Semantic Similarity Measurement 2016 He, Hua and Lin, Jimmy Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies 61 140
(Hill et al., 2015) Simlex-999: Evaluating semantic models with (genuine) similarity estimation 2015 Hill, Felix and Reichart, Roi and Korhonen, Anna Computational Linguistics Q2 85 728
(Hoffart et al., 2013) YAGO2: A spatially and temporally enhanced knowledge base from Wikipedia 2013 Hoffart, Johannes and Suchanek, Fabian M and Berberich, Klaus and Weikum, Gerhard Artificial Intelligence Q1 135 1064
(Janda et al., 2019) Syntactic, Semantic and Sentiment Analysis: The Joint Effect on Automated Essay Evaluation 2019 Janda, Harneet Kaur and Pawar, Atish and Du, Shan and Mago, Vijay IEEE Access Q1 56
(Jiang and Conrath, 1997) Semantic similarity based on corpus statistics and lexical taxonomy 1997 Jiang, Jay J and Conrath, David W COLING 41 3682
(Jiang et al., 2017) Wikipedia-based information content and semantic similarity computation 2017 Yuncheng Jiang and Wen Bai and Xiaopei Zhang and Jiaojiao Hu Information Processing & Management Q1 88 43
(Jiang et al., 2015) Feature-based approaches to semantic similarity assessment of concepts using Wikipedia 2015 Jiang, Yuncheng and Zhang, Xiaopei and Tang, Yong and Nie, Ruihua Information Processing & Management Q1 88 55
(Kim et al., 2017) Bridging the gap: Incorporating a semantic similarity measure for effectively mapping PubMed queries to documents 2017 Kim, Sun and Fiorini, Nicolas and Wilbur, W John and Lu, Zhiyong Journal of biomedical informatics Q1 83 14
(Kim, 2014) Convolutional Neural Networks for Sentence Classification 2014 Kim, Yoon EMNLP 88 6790
(Landauer and Dumais, 1997) A solution to Plato’s problem: The latent semantic analysis theory of acquisition, induction, and representation of knowledge 1997 Landauer, Thomas K and Dumais, Susan T Psychological Review Q1 192 6963
(Landauer et al., 1998) An introduction to latent semantic analysis 1998 Landauer, Thomas K and Foltz, Peter W and Laham, Darrell Discourse Processes Q1 50 5752
(Lastra-Díaz and García-Serrano, 2015) A new family of information content models with an experimental survey on WordNet 2015 Lastra-Díaz, Juan J and García-Serrano, Ana Knowledge-Based Systems Q1 94 12
(Lastra-Díaz et al., 2017) HESML: A scalable ontology-based semantic similarity measures library with a set of reproducible experiments and a replication dataset 2017 Lastra-Díaz, Juan J and García-Serrano, Ana and Batet, Montserrat and Fernández, Miriam and Chirigati, Fernando Information Systems Q1 76 27
(Lastra-Díaz et al., 2019) A reproducible survey on word embeddings and ontology-based methods for word similarity: Linear combinations outperform the state of the art 2019 Juan J. Lastra-Díaz and Josu Goikoetxea and Mohamed Ali Hadj Taieb and Ana García-Serrano and Mohamed Ben Aouicha and Eneko Agirre Engineering Applications of Artificial Intelligence Q1 86 7
(Le and Mikolov, 2014) Distributed representations of sentences and documents 2014 Le, Quoc and Mikolov, Tomas International Conference on Machine Learning 135 5406
(Le et al., 2018) ACV-tree: A New Method for Sentence Similarity Modeling. 2018 Le, Yuquan and Wang, Zhi-Jie and Quan, Zhe and He, Jiawei and Yao, Bin IJCAI 109 4
(Lee, 2011) A novel sentence similarity measure for semantic-based expert systems. 2011 Lee, Ming Che Expert Systems with Applications Q1 162 47
(Levy and Goldberg, 2014) Dependency-based word embeddings 2014 Levy, Omer and Goldberg, Yoav Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics 106 860
(Li et al., 2013) Computing term similarity by large probabilistic isa knowledge 2013 Li, Peipei and Wang, Haixun and Zhu, Kenny Q and Wang, Zhongyuan and Wu, Xindong Proceedings of the 22nd ACM international conference on Information & Knowledge Management 48 56
(Li et al., 2003) An approach for measuring semantic similarity between words using multiple information sources 2003 Li, Yuhua and Bandar, Zuhair A and McLean, David IEEE Transactions on Knowledge and Data Engineering Q1 148 1315
(Li et al., 2006) Sentence similarity based on semantic nets and corpus statistics 2006 Li, Yuhua and McLean, David and Bandar, Zuhair A and O’Shea, James D and Crockett, Keeley IEEE Transactions on Knowledge and Data Engineering Q1 148 849
(Lin, 1998) An information-theoretic definition of similarity 1998 Lin, Dekang ICML 135 5263
(Lopez-Gazpio et al., 2017) Interpretable semantic textual similarity: Finding and explaining differences between sentences 2017 I. Lopez-Gazpio and M. Maritxalar and A. Gonzalez-Agirre and G. Rigau and L. Uria and E. Agirre Knowledge-Based Systems Q1 94 16
(Lopez-Gazpio et al., 2019) Word n-gram attention models for sentence similarity and inference 2019 I. Lopez-Gazpio and M. Maritxalar and M. Lapata and E. Agirre Expert Systems with Applications Q1 162 2
(Lund and Burgess, 1996) Producing high-dimensional semantic spaces from lexical co-occurrence 1996 Lund, Kevin and Burgess, Curt Behavior Research Methods Q1 114 1869
(Marelli et al., 2014) A SICK cure for the evaluation of compositional distributional semantic models 2014 Marelli, Marco and Menini, Stefano and Baroni, Marco and Bentivogli, Luisa and Bernardi, Raffaella and Zamparelli, Roberto International Conference on Language Resources and Evaluation (LREC) 45 464
(McCann et al., 2017) Learned in translation: Contextualized word vectors 2017 McCann, Bryan and Bradbury, James and Xiong, Caiming and Socher, Richard NIPS 169 376
(McInnes et al., 2013) UMLS::Similarity: Measuring the Relatedness and Similarity of Biomedical Concepts 2013 McInnes, Bridget T and Liu, Ying and Pedersen, Ted and Melton, Genevieve B and Pakhomov, Serguei V Human Language Technologies: The 2013 Annual Conference of the North American Chapter of the Association for Computational Linguistics 61 14
(Yang et al., 2015) WikiQA: A Challenge Dataset for Open-Domain Question Answering 2015 Yang, Yi and Yih, Wen-tau and Meek, Christopher EMNLP 88 351
(Melamud et al., 2016) context2vec: Learning generic context embedding with bidirectional LSTM 2016 Melamud, Oren and Goldberger, Jacob and Dagan, Ido Proceedings of The 20th SIGNLL Conference on Computational Natural Language Learning 34 198
(Mihalcea and Csomai, 2007) Wikify! Linking documents to encyclopedic knowledge 2007 Mihalcea, Rada and Csomai, Andras Proceedings of the sixteenth ACM conference on Conference on information and knowledge management 48 1120
(Mikolov et al., 2013a) Efficient estimation of word representations in vector space 2013 Mikolov, Tomas and Chen, Kai and Corrado, Greg and Dean, Jeffrey arXiv preprint 14807
(Mikolov et al., 2013b) Linguistic regularities in continuous space word representations 2013 Mikolov, Tomas and Yih, Wen-tau and Zweig, Geoffrey Proceedings of the 2013 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies 61 2663
(Miller, 1995) WordNet: a lexical database for English 1995 Miller, George A Communications of the ACM Q1 189 13223
(Miller and Charles, 1991) Contextual correlates of semantic similarity 1991 Miller, George A and Charles, Walter G Language and Cognitive Processes 1727
(Mohamed and Oussalah, 2019) SRL-ESA-TextSum: A text summarization approach based on semantic role labeling and explicit semantic analysis 2019 Mohamed, Muhidin and Oussalah, Mourad Information Processing & Management Q1 88 2
(Navigli and Ponzetto, 2012) BabelNet: The automatic construction, evaluation and application of a wide-coverage multilingual semantic network 2012 Navigli, Roberto and Ponzetto, Simone Paolo Artificial Intelligence Q1 135 1110
(Nelson et al., 2004) The University of South Florida free association, rhyme, and word fragment norms 2004 Nelson, Douglas L and McEvoy, Cathy L and Schreiber, Thomas A Behavior Research Methods, Instruments, & Computers Q1 114 2162
(Nivre, 2006) Inductive Dependency Parsing 2006 Nivre, Joakim Book 313
(Pagliardini et al., 2018) Unsupervised Learning of Sentence Embeddings using Compositional n-Gram Features 2018 Pagliardini, Matteo and Gupta, Prakhar and Jaggi, Martin North American Chapter of the Association for Computational Linguistics: Human Language Technologies 61 233
(Parikh et al., 2016) A Decomposable Attention Model for Natural Language Inference 2016 Parikh, Ankur and Täckström, Oscar and Das, Dipanjan and Uszkoreit, Jakob EMNLP 88 550
(Pawar and Mago, 2019) Challenging the Boundaries of Unsupervised Learning for Semantic Similarity 2019 Pawar, Atish and Mago, Vijay IEEE Access Q1 56 11
(Pedersen et al., 2007) Measures of semantic similarity and relatedness in the biomedical domain 2007 Pedersen, Ted and Pakhomov, Serguei VS and Patwardhan, Siddharth and Chute, Christopher G Journal of biomedical informatics Q1 83 555
(Pennington et al., 2014) GloVe: Global vectors for word representation 2014 Pennington, Jeffrey and Socher, Richard and Manning, Christopher D EMNLP 88 12376
(Pilehvar and Camacho-Collados, 2019) WiC: the Word-in-Context Dataset for Evaluating Context-Sensitive Meaning Representations 2019 Pilehvar, Mohammad Taher and Camacho-Collados, Jose Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers) 61 11
(Pilehvar et al., 2013) Align, disambiguate and walk: A unified approach for measuring semantic similarity 2013 Pilehvar, Mohammad Taher and Jurgens, David and Navigli, Roberto Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) 106 184
(Pilehvar and Navigli, 2015) From senses to texts: An all-in-one graph-based approach for measuring semantic similarity 2015 Mohammad Taher Pilehvar and Roberto Navigli Artificial Intelligence Q1 135 66
(Qu et al., 2018) Computing semantic similarity based on novel models of semantic representation using Wikipedia 2018 Rong Qu and Yongyi Fang and Wen Bai and Yuncheng Jiang Information Processing & Management Q1 88 11
(Quan et al., 2019) An Efficient Framework for Sentence Similarity Modeling 2019 Quan, Zhe and Wang, Zhi-Jie and Le, Yuquan and Yao, Bin and Li, Kenli and Yin, Jin IEEE/ACM Transactions on Audio, Speech, and Language Processing Q1 55 4
(Rada et al., 1989) Development and application of a metric on semantic nets 1989 Rada, Roy and Mili, Hafedh and Bicknell, Ellen and Blettner, Maria IEEE Transactions on Systems, Man, and Cybernetics Q1 111 2347
(Resnik, 1995) Using information content to evaluate semantic similarity in a taxonomy 1995 Resnik, Philip IJCAI 109 4300
(Rodríguez and Egenhofer, 2003) Determining semantic similarity among entity classes from different ontologies 2003 Rodríguez, M Andrea and Egenhofer, Max J. IEEE Transactions on Knowledge and Data Engineering Q1 148 1183
(Ruas et al., 2019) Multi-sense embeddings through a word sense disambiguation process 2019 Terry Ruas and William Grosky and Akiko Aizawa Expert Systems with Applications Q1 162 4
(Rubenstein and Goodenough, 1965) Contextual correlates of synonymy 1965 Rubenstein, Herbert and Goodenough, John Communications of the ACM Q1 189 1336
(Sánchez et al., 2011) Ontology-based information content computation 2011 Sánchez, David and Batet, Montserrat and Isern, David Knowledge-Based Systems Q1 94 251
(Schnabel et al., 2015) Evaluation methods for unsupervised word embeddings 2015 Schnabel, Tobias and Labutov, Igor and Mimno, David and Joachims, Thorsten EMNLP 88 334
(Shao, 2017) HCTI at SemEval-2017 Task 1: Use Convolutional Neural Network to evaluate semantic textual similarity 2017 Shao, Yang Proceedings of the 11th International Workshop on Semantic Evaluation (SemEval-2017) 49 32
(Silberer and Lapata, 2014) Learning grounded meaning representations with autoencoders 2014 Silberer, Carina and Lapata, Mirella Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) 61 127
(Sinoara et al., 2019) Knowledge-enhanced document embeddings for text classification 2019 Roberta A. Sinoara and Jose Camacho-Collados and Rafael G. Rossi and Roberto Navigli and Solange O. Rezende Knowledge-Based Systems Q1 94 25
(Soğancıoğlu et al., 2017) BIOSSES: a semantic sentence similarity estimation system for the biomedical domain 2017 Soğancıoğlu, Gizem and Öztürk, Hakime and Özgür, Arzucan Bioinformatics Q1 335 34
(Sultan et al., 2015) DLS@CU: Sentence similarity from word alignment and semantic vector composition 2015 Sultan, Md Arafat and Bethard, Steven and Sumner, Tamara Proceedings of the 9th International Workshop on Semantic Evaluation (SemEval 2015) 49 105
(Sánchez and Batet, 2013) A semantic similarity method based on information content exploiting multiple ontologies 2013 David Sánchez and Montserrat Batet Expert Systems with Applications Q1 162 82
(Sánchez et al., 2012) Ontology-based semantic similarity: A new feature-based approach 2012 David Sánchez and Montserrat Batet and David Isern and Aida Valls Expert Systems with Applications Q1 162 361
(Tai et al., 2015) Improved semantic representations from tree-structured long short-term memory networks 2015 Tai, Kai Sheng and Socher, Richard and Manning, Christopher D Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 1: Long Papers) 106 1676
(Tien et al., 2019) Sentence modeling via multiple word embeddings and multi-level comparison for semantic textual similarity 2019 Nguyen Huy Tien and Nguyen Minh Le and Yamasaki Tomohiro and Izuha Tatsuya Information Processing & Management Q1 88 7
(Tissier et al., 2017) Dict2vec: Learning Word Embeddings using Lexical Dictionaries 2017 Tissier, Julien and Gravier, Christophe and Habrard, Amaury EMNLP 88 39
(Wang et al., 2007) What is the Jeopardy model? A quasi-synchronous grammar for QA 2007 Wang, Mengqiu and Smith, Noah A. and Mitamura, Teruko EMNLP 88 337
(Wang et al., 2016) Sentence similarity learning by lexical decomposition and composition 2016 Wang, Zhiguo and Mi, Haitao and Ittycheriah, Abraham COLING 41 119
(Wu and Palmer, 1994) Verb semantics and lexical selection 1994 Wu, Zhibiao and Palmer, Martha Proceedings of the 32nd annual meeting on Association for Computational Linguistics 106 3895
(Zhu and Iglesias, 2017) Computing Semantic Similarity of Concepts in Knowledge Graphs 2017 G. Zhu and C. A. Iglesias IEEE Transactions on Knowledge and Data Engineering Q1 148 88
(Zou et al., 2013) Bilingual word embeddings for phrase-based machine translation 2013 Zou, Will Y and Socher, Richard and Cer, Daniel and Manning, Christopher D EMNLP 88 468