Domain Specific Complex Sentence (DCSC) Semantic Similarity Dataset

10/23/2020
by Dhivya Chandrasekaran, et al.

Semantic textual similarity is one of the open research challenges in the field of Natural Language Processing. Extensive research has been carried out in this field, and recent transformer-based models have achieved near-perfect results on existing benchmark datasets such as the STS and SICK datasets. In this paper, we study the sentences in these datasets and analyze the sensitivity of various word embeddings to the complexity of the sentences. We propose a new benchmark dataset, the Domain Specific Complex Sentences (DSCS) dataset, comprising 50 sentence pairs with associated semantic similarity values provided by 15 human annotators. A readability analysis is performed to highlight the increase in sentence complexity from the existing benchmark datasets to the proposed dataset. Further, we perform a comparative analysis of the performance of various word embeddings, and the results support the hypothesis that the performance of word embeddings decreases as the complexity of the sentences increases.
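The abstract does not state exactly how the word embeddings are scored against the human annotations. A common convention for STS-style benchmarks is to compute a cosine similarity for each sentence pair and report the Pearson correlation with the mean human rating. The sketch below illustrates that convention under stated assumptions: sentence vectors are plain averages of word vectors, and the vocabulary, vectors, sentence pairs, and annotator ratings (`TOY_VECTORS`, `pairs`, `ratings`) are hypothetical toy values, not data from the DSCS dataset or the authors' setup.

```python
import math

# Hypothetical 3-d toy embeddings; a real evaluation would load pretrained
# vectors (e.g. word2vec or GloVe) or use a sentence encoder instead.
TOY_VECTORS = {
    "the": [0.1, 0.0, 0.1],
    "cat": [0.9, 0.1, 0.0],
    "dog": [0.8, 0.2, 0.1],
    "sat": [0.1, 0.9, 0.0],
    "ran": [0.2, 0.8, 0.1],
    "stock": [0.0, 0.1, 0.9],
    "fell": [0.1, 0.3, 0.7],
}


def sentence_vector(sentence, vectors):
    """Average the vectors of in-vocabulary tokens (a simple baseline)."""
    tokens = [t for t in sentence.lower().split() if t in vectors]
    if not tokens:
        raise ValueError("sentence has no in-vocabulary tokens")
    dim = len(next(iter(vectors.values())))
    return [sum(vectors[t][i] for t in tokens) / len(tokens) for i in range(dim)]


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length score lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def evaluate(pairs, human_ratings, vectors):
    """Correlate model cosine similarities with the mean annotator rating per pair."""
    model_scores = [
        cosine(sentence_vector(a, vectors), sentence_vector(b, vectors))
        for a, b in pairs
    ]
    gold_scores = [sum(r) / len(r) for r in human_ratings]
    return pearson(model_scores, gold_scores)


if __name__ == "__main__":
    # Hypothetical pairs and ratings from three imaginary annotators (0-5 scale).
    pairs = [
        ("the cat sat", "the dog sat"),
        ("the cat sat", "the stock fell"),
        ("the dog ran", "the cat ran"),
    ]
    ratings = [[4, 5, 4], [0, 1, 0], [4, 4, 5]]
    print(f"Pearson r = {evaluate(pairs, ratings, TOY_VECTORS):.3f}")
```

Averaging word vectors is only one way to build sentence representations; swapping in a different embedding model changes `sentence_vector` but leaves the correlation step unchanged, which is what makes this setup convenient for comparing embeddings side by side.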
