Hybrid Model for Patent Classification using Augmented SBERT and KNN

03/22/2021
by Hamid Bekamiri, et al.

Purpose: This study proposes a hybrid approach to patent classification that combines Sentence-BERT (SBERT) with K Nearest Neighbours (KNN) and focuses explicitly on patent claims. Patent classification is a multi-label classification task in which the number of labels can exceed 640 at the subclass level. The proposed framework predicts the class and subclass of an input patent by retrieving its top k most semantically similar patents.

Design/Methodology/Approach: The study uses transformer models based on Augmented SBERT and RoBERTa. Rather than training a direct classifier, we retrieve the top k most similar patent claims and apply the KNN algorithm to predict the patent class or subclass. This study considers patent claims only; future work will add other relevant parts of patent documents.

Findings: The findings suggest that hybrid models are well suited to multi-label classification of text data. In this approach, we use the Transformer model as the distance function in KNN and propose a new version of KNN based on Augmented SBERT.

Practical Implications: The presented framework provides a practical model for patent classification. We predict the class and subclass of a patent from the semantic similarity of its claims. The interpretability of the results for end users is one of the model's main strengths.

Originality/Value: The main contributions of the study are: 1) using the Augmented approach to fine-tune SBERT on in-domain supervised patent claims data; 2) improving results with a hybrid model for patent classification. The best F1-score at the subclass level was above 69%, with high interpretability of results.
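The retrieve-then-vote idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: `embed` is a hypothetical bag-of-words stand-in for a fine-tuned SBERT encoder, the voting rule and the `threshold` parameter are assumptions (the abstract does not specify how neighbour labels are aggregated), and the claims and subclass codes in `corpus` are made up for illustration.

```python
import math
from collections import Counter


def embed(text):
    """Hypothetical stand-in for an SBERT encoder: bag-of-words counts."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def predict_labels(query, corpus, k=3, threshold=0.5):
    """Return (labels, neighbours) for a query claim.

    The k most similar claims are retrieved, and a label is predicted
    when at least `threshold` of those neighbours carry it (an assumed
    aggregation rule, chosen only for illustration).
    """
    q = embed(query)
    scored = [(cosine(q, embed(text)), text, labels) for text, labels in corpus]
    neighbours = sorted(scored, key=lambda item: item[0], reverse=True)[:k]
    votes = Counter()
    for _, _, labels in neighbours:
        votes.update(set(labels))
    predicted = sorted(
        label for label, count in votes.items()
        if count / len(neighbours) >= threshold
    )
    return predicted, neighbours


# Toy labelled claims; texts and label assignments are illustrative only.
corpus = [
    ("a battery cell with a lithium anode and a separator", ["H01M"]),
    ("a lithium battery pack with a cooling plate", ["H01M", "B60L"]),
    ("a method for routing data packets in a network", ["H04L"]),
    ("a network switch forwarding data packets", ["H04L"]),
]

labels, neighbours = predict_labels(
    "a lithium battery cell with a separator", corpus, k=2
)
for score, text, neighbour_labels in neighbours:
    print(f"{score:.2f}  {neighbour_labels}  {text}")
print("predicted:", labels)
```

Returning the neighbours alongside the predicted labels reflects the interpretability point made in the abstract: a user can inspect which similar claims led to each prediction. In a real setting, `embed` and `cosine` would be replaced by a sentence encoder and its similarity function.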

research
03/05/2022

A Similarity-based Framework for Classification Task

Similarity-based method gives rise to a new class of methods for multi-l...
research
06/12/2023

Imbalanced Multi-label Classification for Business-related Text with Moderately Large Label Spaces

In this study, we compared the performance of four different methods for...
research
10/01/2021

Fast Multi-Resolution Transformer Fine-tuning for Extreme Multi-label Text Classification

Extreme multi-label text classification (XMC) seeks to find relevant lab...
research
03/11/2022

verBERT: Automating Brazilian Case Law Document Multi-label Categorization Using BERT

In this work, we carried out a study about the use of attention-based al...
research
07/01/2019

Patent Claim Generation by Fine-Tuning OpenAI GPT-2

In this work, we focus on fine-tuning an OpenAI GPT-2 pre-trained model ...
research
08/16/2023

Boosting Commit Classification with Contrastive Learning

Commit Classification (CC) is an important task in software maintenance,...
research
04/12/2021

WHOSe Heritage: Classification of UNESCO World Heritage "Outstanding Universal Value" Documents with Smoothed Labels

The UNESCO World Heritage List (WHL) is to identify the exceptionally va...
