Patents Phrase to Phrase Semantic Matching Dataset

08/01/2022
by Grigor Aslanyan, et al.

There are many general-purpose benchmark datasets for Semantic Textual Similarity, but none of them focuses on technical concepts found in patents and scientific publications. This work aims to fill this gap by presenting a new human-rated contextual phrase-to-phrase matching dataset. The entire dataset contains close to 50,000 rated phrase pairs, each with a CPC (Cooperative Patent Classification) class as context. This paper describes the dataset and some baseline models.

research
07/19/2022

PiC: A Phrase-in-Context Dataset for Phrase Understanding and Semantic Search

Since BERT (Devlin et al., 2018), learning contextualized word embedding...
research
04/17/2016

From Incremental Meaning to Semantic Unit (phrase by phrase)

This paper describes an experimental approach to Detection of Minimal Se...
research
10/21/2022

Describing Sets of Images with Textual-PCA

We seek to semantically describe a set of images, capturing both the att...
research
03/28/2018

Handling Verb Phrase Anaphora with Dependent Types and Events

This paper studies how dependent typed events can be used to treat verb ...
research
01/25/2021

Unsupervised Key-phrase Extraction and Clustering for Classification Scheme in Scientific Publications

Several methods have been explored for automating parts of Systematic Ma...
research
12/14/2021

Improving Human-Object Interaction Detection via Phrase Learning and Label Composition

Human-Object Interaction (HOI) detection is a fundamental task in high-l...
research
10/24/2022

Investigating the detection of Tortured Phrases in Scientific Literature

With the help of online tools, unscrupulous authors can today generate a...
