Evaluating Generative Patent Language Models

06/23/2022
by Jieh-Sheng Lee, et al.

This research aims to build generative language models in the patent domain and to evaluate them from a human-centric perspective. The evaluation metric is the ratio of keystrokes a user can save in an autocomplete context, based on the predictions of the generative model. Models of different sizes can be compared on this metric by measuring it over a set of newly granted patents; on this basis, it is found that the largest model is not necessarily the best. Several models pre-trained from scratch on a patent corpus are released. The experiments in this manuscript focus on patent claims, but the ideas and implementation can be applied to other parts of a patent document. Furthermore, the research is motivated by measuring how closely a pre-trained language model can generate a newly granted patent claim, or, conversely, by measuring the probability the model assigns to each token of that claim. Finally, this manuscript raises several legal implications for patent law as directions for future interdisciplinary research; in particular, could a metric based on model prediction serve as a measure of the nonobviousness requirement in patent law?
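To make the keystroke-saving metric concrete, below is a minimal sketch of one way it could be computed. The autocomplete protocol (token-level suggestions, one keystroke to accept) and the `dummy_top_k` predictor are assumptions for illustration; the paper's exact setup may differ.

```python
# Sketch of a keystroke-saving ratio for autocomplete (assumed protocol,
# not necessarily the paper's exact implementation).
from typing import Callable, List


def keystroke_saving_ratio(
    claim_tokens: List[str],
    predict_top_k: Callable[[List[str]], List[str]],
    accept_cost: int = 1,  # assumption: accepting a suggestion costs 1 keystroke
) -> float:
    """Fraction of keystrokes saved when typing `claim_tokens` with
    autocomplete, relative to typing every character by hand."""
    baseline = sum(len(tok) for tok in claim_tokens)  # keystrokes with no help
    spent = 0
    for i, tok in enumerate(claim_tokens):
        suggestions = predict_top_k(claim_tokens[:i])  # model's top-k next tokens
        if tok in suggestions:
            spent += accept_cost   # user accepts the suggestion
        else:
            spent += len(tok)      # user types the whole token
    return (baseline - spent) / baseline


# Hypothetical predictor standing in for a generative patent model:
# it always suggests a few tokens common in claim language.
def dummy_top_k(prefix: List[str]) -> List[str]:
    return ["a", "the", "wherein", "comprising", "of"]


claim = "a method comprising the steps of receiving data".split()
print(f"keystrokes saved: {keystroke_saving_ratio(claim, dummy_top_k):.2%}")
```

Under this framing, comparing models of different sizes reduces to averaging the ratio over many newly granted claims, which is how a smaller model could outperform a larger one on the metric.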
