A novel approach to measuring patent claim scope based on probabilities obtained from (large) language models

09/17/2023
by Sébastien Ragot, et al.

This work proposes to measure the scope of a patent claim as the reciprocal of the self-information contained in that claim. Grounded in information theory, this approach rests on the assumption that a rare concept is more informative than a common one, inasmuch as it is more surprising. The self-information is calculated from the probability of occurrence of the claim, where the probability is computed according to a language model. Five language models are considered, ranging from the simplest models (each word or character is drawn from a uniform distribution), to intermediate models (using average word or character frequencies), to a large language model (GPT-2). Interestingly, the simplest language models reduce the scope measure to the reciprocal of the word or character count, a metric already used in previous works. The approach is applied to nine series of patent claims directed to distinct inventions, where the claims in each series have gradually decreasing scope. The performance of the language models is then assessed with respect to several ad hoc tests. The more sophisticated the model, the better the results: the GPT-2 model outperforms models based on word and character frequencies, which are themselves ahead of models based on word and character counts.
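The reduction noted above can be made concrete. Under the simplest model, where each character is drawn i.i.d. from a uniform distribution, the probability of a claim of length N over an alphabet of size A is p = (1/A)^N, so its self-information is I = -log2(p) = N·log2(A), and the scope 1/I is proportional to the reciprocal of the character count. The sketch below illustrates this; the 27-symbol alphabet (26 letters plus space) and the example claims are illustrative assumptions, not taken from the paper.

```python
import math


def self_information_uniform(claim: str, alphabet_size: int = 27) -> float:
    """Self-information (in bits) of a claim under a uniform character model.

    Assumption: each character is drawn i.i.d. from a uniform distribution
    over `alphabet_size` symbols (27 here: 26 letters + space).
    Then p(claim) = (1/alphabet_size)**len(claim), so
    I = -log2(p) = len(claim) * log2(alphabet_size).
    """
    return len(claim) * math.log2(alphabet_size)


def scope(claim: str, alphabet_size: int = 27) -> float:
    """Claim scope as the reciprocal of the claim's self-information."""
    return 1.0 / self_information_uniform(claim, alphabet_size)


# A longer (more specific) claim carries more self-information,
# hence a smaller scope -- matching the intuition that narrower
# claims are more "surprising".
broad = "A chair."
narrow = "A chair comprising four legs, a seat and a backrest."
assert scope(broad) > scope(narrow)
```

Since I = N·log2(A) and log2(A) is a constant, ranking claims by this scope measure is identical to ranking them by reciprocal character count; richer models (character or word frequencies, or GPT-2) replace the uniform per-symbol probability with context-sensitive ones.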


