Technical Progress Analysis Using a Dynamic Topic Model for Technical Terms to Revise Patent Classification Codes

12/18/2020
by Mana Iwata, et al.

Japanese patents are assigned a patent classification code that is unique to Japan, the FI (File Index). The FI subdivides the International Patent Classification (IPC) to reflect technologies specific to Japan, and it is revised to keep pace with technological developments: since 2006, these revisions have created more than 30,000 new FIs. However, the revisions are performed manually and demand considerable time and labor, which makes them inefficient. Using machine learning to assist in revising patent classification codes (FI) should therefore improve both accuracy and efficiency. This study analyzes patent documents from this new perspective. To analyze how patents change over time, we used the dynamic topic model (DTM), an extension of latent Dirichlet allocation (LDA) to time-ordered document collections. Unlike English, Japanese is written without spaces between words, so text must first be segmented by morphological analysis. Because patents contain many technical terms that do not appear in everyday language, morphological analysis with a general-purpose dictionary is insufficient; we therefore used a technique for extracting technical terms from text and applied the extracted terms to the DTM. We tracked technological progress in the lighting class F21 over 14 years and compared it with the actual revisions of the patent classification codes, analyzing the results from the perspective of revising those codes with machine learning. As a result, we found that topics on the rise corresponded to technologies judged to be new.
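The abstract does not name the technical-term extraction technique it used. As an illustration only, the sketch below scores candidate terms with the C-value measure, a standard termhood score for multi-word terms; this is a swapped-in technique, not necessarily the authors' method. It assumes a morphological analyzer has already produced candidate compound nouns as tuples of words (English glosses stand in for Japanese morphemes), and the names `c_value` and `_contains` are hypothetical.

```python
import math
from collections import Counter

Term = tuple[str, ...]


def _contains(longer: Term, shorter: Term) -> bool:
    """True if `shorter` occurs as a contiguous run of words inside `longer`."""
    n = len(shorter)
    return any(longer[i:i + n] == shorter for i in range(len(longer) - n + 1))


def c_value(occurrences: list[Term], min_len: int = 2) -> dict[Term, float]:
    """Score candidate terms (tuples of words) with the C-value measure.

    Illustrative sketch: C-value is an assumed stand-in for the unnamed
    term-extraction technique in the abstract.
    """
    # f(a): corpus frequency of each candidate, including occurrences nested
    # inside longer candidates, so every contiguous sub-run is counted.
    freq: Counter = Counter()
    for occ in occurrences:
        for n in range(min_len, len(occ) + 1):
            for i in range(len(occ) - n + 1):
                freq[occ[i:i + n]] += 1

    scores: dict[Term, float] = {}
    for a in freq:
        # T_a: longer candidate types that contain a.
        nesting = [b for b in freq if len(b) > len(a) and _contains(b, a)]
        weight = math.log2(len(a))  # longer terms receive a larger weight
        if nesting:
            # Discount a by the mean frequency of the candidates containing it.
            mean_nested = sum(freq[b] for b in nesting) / len(nesting)
            scores[a] = weight * (freq[a] - mean_nested)
        else:
            scores[a] = weight * freq[a]
    return scores
```

With three occurrences of ("light", "emitting", "diode"), two of ("organic", "light", "emitting", "diode"), and four of ("light", "source"), the full term "light emitting diode" scores highest (about 4.75), while the fragment ("organic", "light") scores 0 because it never appears outside a longer candidate. Single-word candidates are excluded by `min_len=2`, since log2(1) = 0 would give them no weight anyway.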
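In a DTM, the corpus is split into time slices (here, years), and each slice's topics evolve from those of the previous slice. The abstract does not say how a topic was judged to be "on the rise"; the sketch below assumes one simple criterion: average the per-document topic proportions within each year, then flag topics whose yearly share has a positive least-squares slope. The input is assumed to come from any fitted DTM that reports per-document topic proportions, and all function names are hypothetical.

```python
def yearly_shares(theta_by_year: list[list[list[float]]]) -> list[list[float]]:
    """Average per-document topic proportions within each year.

    theta_by_year[t][d][k] is the proportion of topic k in document d of
    year t; the result is shares[k][t], the mean share of topic k in year t.
    """
    num_topics = len(theta_by_year[0][0])
    return [
        [sum(doc[k] for doc in docs) / len(docs) for docs in theta_by_year]
        for k in range(num_topics)
    ]


def ols_slope(ys: list[float]) -> float:
    """Least-squares slope of ys against time steps 0, 1, ..., len(ys) - 1."""
    n = len(ys)
    x_mean = (n - 1) / 2
    y_mean = sum(ys) / n
    num = sum((x - x_mean) * (y - y_mean) for x, y in enumerate(ys))
    den = sum((x - x_mean) ** 2 for x in range(n))
    return num / den


def rising_topics(shares: list[list[float]], min_slope: float = 0.0) -> list[int]:
    """Indices of topics whose yearly share trends upward, steepest first.

    The positive-slope criterion is an assumption, not the paper's rule.
    """
    slopes = [ols_slope(series) for series in shares]
    rising = [k for k, s in enumerate(slopes) if s > min_slope]
    return sorted(rising, key=lambda k: slopes[k], reverse=True)
```

For example, with two topics over three years whose mean shares move from 0.7 to 0.3 and from 0.3 to 0.7, only topic 1 is flagged as rising. Raising `min_slope` above zero would ignore topics that grow only marginally; over a 14-year window such as the F21 study, a slope-based rule is less sensitive to single-year noise than comparing the first and last years alone.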

research
10/25/2016

Scalable Dynamic Topic Modeling with Clustered Latent Dirichlet Allocation (CLDA)

Topic modeling, a method for extracting the underlying themes from a col...
research
08/30/2023

Conti Inc.: Understanding the Internal Discussions of a large Ransomware-as-a-Service Operator with Machine Learning

Ransomware-as-a-service (RaaS) is increasing the scale and complexity of...
research
07/23/2022

A Data-driven Latent Semantic Analysis for Automatic Text Summarization using LDA Topic Modelling

With the advent and popularity of big data mining and huge text analysis...
research
09/14/2021

Hunspell for Sorani Kurdish Spell Checking and Morphological Analysis

Spell checking and morphological analysis are two fundamental tasks in t...
research
08/12/2010

Discovering shared and individual latent structure in multiple time series

This paper proposes a nonparametric Bayesian method for exploratory data...
research
02/03/2014

A high-reproducibility and high-accuracy method for automated topic classification

Much of human knowledge sits in large databases of unstructured text. Le...
research
07/11/2017

Look Who's Talking: Bipartite Networks as Representations of a Topic Model of New Zealand Parliamentary Speeches

Quantitative methods to measure the participation to parliamentary debat...