Solution for the EPO CodeFest on Green Plastics: Hierarchical multi-label classification of patents relating to green plastics using deep learning

02/22/2023
by Tingting Qiao, et al.

This work addresses hierarchical multi-label classification of patents that disclose technologies related to green plastics. This is an emerging field for which no classification scheme currently exists, so no labeled data is available, which makes the task particularly challenging. We first propose a classification scheme for this technology, together with a way to train a machine learning model that classifies patents into the proposed scheme. To this end, we devise a strategy for automatically assigning labels to patents, which yields a labeled training dataset suitable for supervised learning. Using this dataset, we develop two classification models: a SciBERT Neural Network (SBNN) model and a SciBERT Hierarchical Neural Network (SBHNN) model. Both models use a BERT model as a feature extractor, with a neural network on top as the classifier. We carry out extensive experiments and report commonly used evaluation metrics for this challenging classification problem. The experimental results confirm the validity of our approach and show that our model sets a very strong benchmark for this problem. We also interpret our models by visualizing the word importance assigned by the trained model, which indicates that the model is capable of extracting high-level semantic information from input documents. Finally, we highlight how our solution fulfills the evaluation criteria for the EPO CodeFest and outline possible directions for future work. Our code is available at https://github.com/epo/CF22-Green-Hands
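
To make the phrase "hierarchical multi-label" concrete, the following is a minimal sketch of the decision step only, under stated assumptions. It is not the authors' SBNN or SBHNN implementation. The label names, the `PARENT` map, the 0.5 threshold, and the `predict_labels` helper are all hypothetical, and the per-label scores (logits) are assumed to come from some classifier placed on top of a feature extractor. Each label is thresholded independently (multi-label), and the ancestors of every predicted label are then added so that the output is consistent with the hierarchy.

```python
import math

# Hypothetical two-level label hierarchy (child -> parent).
# These names are illustrative only; they are not the authors' scheme.
PARENT = {
    "recycling": None,
    "recycling/mechanical": "recycling",
    "recycling/chemical": "recycling",
    "bioplastics": None,
    "bioplastics/biodegradable": "bioplastics",
}


def sigmoid(x: float) -> float:
    """Map a raw score (logit) to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def predict_labels(logits: dict, threshold: float = 0.5) -> set:
    """Threshold each label independently, then add all ancestors.

    `logits` maps label name -> raw classifier score. The threshold
    value is an assumption made for this sketch.
    """
    predicted = {
        label for label, score in logits.items()
        if sigmoid(score) >= threshold
    }
    # Close the predicted set under the parent relation so that a
    # child label never appears without its ancestors.
    for label in list(predicted):
        parent = PARENT.get(label)
        while parent is not None:
            predicted.add(parent)
            parent = PARENT.get(parent)
    return predicted


# Made-up scores for a single document.
example_logits = {
    "recycling": -0.3,
    "recycling/mechanical": -2.0,
    "recycling/chemical": 1.7,
    "bioplastics": -1.2,
    "bioplastics/biodegradable": -0.8,
}
labels = predict_labels(example_logits)
print(sorted(labels))
```

In this example only `recycling/chemical` clears the threshold on its own, and `recycling` is added because it is the parent. Whether a real system should enforce consistency after prediction, as here, or build it into the model itself is a design choice this sketch does not settle.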
