Domain Specific Complex Sentence (DSCS) Semantic Similarity Dataset

10/23/2020
by Dhivya Chandrasekaran, et al.

Semantic textual similarity is one of the open research challenges in the field of Natural Language Processing. Extensive research has been carried out in this field, and recent transformer-based models achieve near-perfect results on existing benchmark datasets such as the STS and SICK datasets. In this paper, we study the sentences in these datasets and analyze the sensitivity of various word embeddings to the complexity of the sentences. We propose a new benchmark dataset, the Domain Specific Complex Sentences (DSCS) dataset, comprising 50 sentence pairs with associated semantic similarity values provided by 15 human annotators. A readability analysis highlights the increase in sentence complexity from the existing benchmark datasets to the proposed dataset. Further, we perform a comparative analysis of the performance of various word embeddings, and the results support the hypothesis that the performance of word embeddings decreases as sentence complexity increases.
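The abstract does not spell out how readability or embedding performance is measured. A common setup, assumed here rather than taken from the paper, quantifies readability with an index such as Flesch Reading Ease, scores each sentence pair by the cosine similarity of averaged word vectors, and compares those scores with the mean of the annotators' ratings using Pearson correlation. The following sketch illustrates both analyses in plain Python. All function names are hypothetical, the syllable counter is a rough heuristic, and `embeddings` stands in for any pretrained word-vector table (for example, one loaded from GloVe or word2vec files).

```python
import math
import re
from statistics import mean

# --- Readability: Flesch Reading Ease (assumed index; the paper may use others) ---

def count_syllables(word: str) -> int:
    """Approximate syllables as runs of vowels, discounting a trailing silent 'e'."""
    w = word.lower()
    count = len(re.findall(r"[aeiouy]+", w))
    if w.endswith("e") and count > 1:
        count -= 1
    return max(count, 1)

def flesch_reading_ease(text: str) -> float:
    """206.835 - 1.015*(words/sentences) - 84.6*(syllables/words); lower means harder."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z]+", text)
    syllables = sum(count_syllables(w) for w in words)
    return (206.835
            - 1.015 * (len(words) / max(len(sentences), 1))
            - 84.6 * (syllables / max(len(words), 1)))

# --- Similarity: averaged word vectors scored against mean human ratings ---

def sentence_vector(sentence: str, embeddings: dict[str, list[float]]) -> list[float]:
    """Mean of the vectors of in-vocabulary tokens (zero vector if none are known)."""
    dim = len(next(iter(embeddings.values())))
    tokens = [t for t in re.findall(r"[a-z]+", sentence.lower()) if t in embeddings]
    if not tokens:
        return [0.0] * dim
    return [mean(embeddings[t][i] for t in tokens) for i in range(dim)]

def cosine(u: list[float], v: list[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def pearson(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation; undefined (division by zero) if either list is constant."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

def evaluate(pairs, annotations, embeddings) -> float:
    """pairs: (s1, s2) tuples; annotations: one list of annotator scores per pair."""
    predicted = [cosine(sentence_vector(a, embeddings), sentence_vector(b, embeddings))
                 for a, b in pairs]
    gold = [mean(scores) for scores in annotations]  # average over all annotators
    return pearson(predicted, gold)

if __name__ == "__main__":
    # Toy, made-up vectors and ratings for illustration only (not from the DSCS dataset).
    toy_embeddings = {"cat": [1.0, 0.0], "kitten": [0.9, 0.1],
                      "car": [0.0, 1.0], "truck": [0.1, 0.9]}
    toy_pairs = [("cat", "kitten"), ("cat", "truck"), ("car", "truck")]
    toy_ratings = [[5, 4, 5], [0, 1, 0], [4, 4, 5]]
    print(evaluate(toy_pairs, toy_ratings, toy_embeddings))
    print(flesch_reading_ease("The cat sat on the mat."))
```

Mean pooling is chosen here only for simplicity. It ignores word order, so other sentence encoders could be compared under the same protocol by swapping in a different `sentence_vector`.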
