A Comparison of Semantic Similarity Methods for Maximum Human Interpretability
Including semantic information in a similarity measure improves the measure's efficiency and yields human-interpretable results. This paper presents three methods for computing semantic similarity between short news texts. The methods draw on corpus-based and knowledge-based approaches: cosine similarity using tf-idf vectors, cosine similarity using word embeddings, and soft cosine similarity using word embeddings. Of the three, cosine similarity using tf-idf vectors performed best at finding similarities between short news texts.
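
To make the three measures concrete, the following is a minimal sketch in plain Python and not the authors' implementation. The two example headlines, the three-dimensional `TOY_EMBEDDINGS` table, and all helper names are hypothetical; a real setup would use pretrained word embeddings and a larger corpus. The tf-idf weighting uses a smoothed idf, which is one common variant and is an assumption here.

```python
import math
from collections import Counter

# Hypothetical toy embeddings, for illustration only (not from the paper).
TOY_EMBEDDINGS = {
    "stocks": [1.0, 0.0, 0.1],
    "shares": [0.9, 0.1, 0.0],
    "fall": [0.0, 1.0, 0.1],
    "drop": [0.1, 0.9, 0.0],
}


def tokenize(text):
    """Lowercase whitespace tokenizer (a deliberate simplification)."""
    return text.lower().split()


def cosine(a, b):
    """Plain cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def count_vectors(docs):
    """Bag-of-words count vectors over a shared, sorted vocabulary."""
    tokenized = [tokenize(d) for d in docs]
    vocab = sorted({w for doc in tokenized for w in doc})
    vectors = [[Counter(doc)[w] for w in vocab] for doc in tokenized]
    return vocab, vectors


def tfidf_vectors(docs):
    """tf-idf vectors with a smoothed idf: log((1 + n) / (1 + df)) + 1."""
    vocab, counts = count_vectors(docs)
    n = len(docs)
    df = [sum(1 for vec in counts if vec[i] > 0) for i in range(len(vocab))]
    idf = [math.log((1 + n) / (1 + d)) + 1.0 for d in df]
    return vocab, [[c * w for c, w in zip(vec, idf)] for vec in counts]


def mean_embedding(text, emb):
    """Average the embeddings of the words in the text that have a vector."""
    dim = len(next(iter(emb.values())))
    vecs = [emb[w] for w in tokenize(text) if w in emb]
    if not vecs:
        return [0.0] * dim
    return [sum(v[i] for v in vecs) / len(vecs) for i in range(dim)]


def word_similarity_matrix(vocab, emb):
    """S[i][j]: 1 on the diagonal, clipped embedding cosine elsewhere."""
    size = len(vocab)
    S = [[0.0] * size for _ in range(size)]
    for i, wi in enumerate(vocab):
        for j, wj in enumerate(vocab):
            if i == j:
                S[i][j] = 1.0
            elif wi in emb and wj in emb:
                S[i][j] = max(0.0, cosine(emb[wi], emb[wj]))
    return S


def soft_cosine(a, b, S):
    """Soft cosine: a^T S b / sqrt((a^T S a) * (b^T S b))."""
    def bilinear(x, y):
        return sum(x[i] * S[i][j] * y[j]
                   for i in range(len(x)) for j in range(len(y)))
    denom = math.sqrt(bilinear(a, a) * bilinear(b, b))
    return bilinear(a, b) / denom if denom else 0.0


# Two hypothetical headlines with related meaning but no shared words.
docs = ["stocks fall", "shares drop"]

_, tfidf = tfidf_vectors(docs)
tfidf_sim = cosine(tfidf[0], tfidf[1])

emb_sim = cosine(mean_embedding(docs[0], TOY_EMBEDDINGS),
                 mean_embedding(docs[1], TOY_EMBEDDINGS))

vocab, counts = count_vectors(docs)
S = word_similarity_matrix(vocab, TOY_EMBEDDINGS)
soft_sim = soft_cosine(counts[0], counts[1], S)
```

On this toy pair the tf-idf score is zero because the texts share no terms, while both embedding-based scores are positive. This shows only how the measures differ mechanically; it says nothing about which performs best on real news data.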