Prior Art Search and Reranking for Generated Patent Text

09/19/2020
by Jieh-Sheng Lee, et al.

Generative models such as GPT-2 have recently demonstrated impressive results. A fundamental question we would like to address is: where did the generated text come from? This work is our initial effort toward answering that question by means of prior art search. The purpose of the prior art search is to find the most similar prior text in the training data of GPT-2. We take a reranking approach and apply it to the patent domain. Specifically, we pre-train GPT-2 models from scratch using patent data from the USPTO. The input to the prior art search is the patent text generated by the GPT-2 model. We also pre-train BERT models from scratch for converting patent text to embeddings. The reranking steps are: (1) search for the most similar text in the training data of GPT-2 using a bag-of-words ranking approach (BM25), (2) convert the retrieved text to BERT embeddings, and (3) produce the final ranking by ordering the BERT embeddings according to their similarity with the patent text generated by GPT-2. Our experiments show that such reranking outperforms ranking with embeddings alone. However, our mixed results also indicate that calculating semantic similarities among long text spans remains challenging. To our knowledge, this work is the first to implement a reranking system that retrospectively identifies the training inputs most similar to the output of a GPT model.
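As a rough illustration of the three reranking steps, the sketch below retrieves BM25 candidates and reorders them by BERT embedding similarity. It is a minimal sketch, not the paper's implementation: it assumes the `rank_bm25` and `transformers` packages, uses `bert-base-uncased` as a stand-in for the BERT models the authors pre-train from scratch on patent text, relies on a toy corpus instead of the GPT-2 training data, and picks mean pooling and cosine similarity as one plausible way to compare embeddings.

```python
import torch
from rank_bm25 import BM25Okapi              # pip install rank-bm25
from transformers import BertTokenizer, BertModel

# Toy stand-ins for the GPT-2 training corpus and a generated patent span.
# In the paper these would be USPTO patent text and GPT-2 output.
corpus = [
    "A method for encoding video frames using adaptive quantization.",
    "An apparatus for wireless power transfer between coupled coils.",
    "A system for ranking search results with neural text embeddings.",
]
generated = "A system that reranks retrieved documents using learned embeddings."

# Step 1: BM25 (bag-of-words) retrieval over the training corpus.
tokenized_corpus = [doc.lower().split() for doc in corpus]
bm25 = BM25Okapi(tokenized_corpus)
bm25_scores = bm25.get_scores(generated.lower().split())
top_k = sorted(range(len(corpus)), key=lambda i: bm25_scores[i], reverse=True)[:2]

# Step 2: embed the BM25 candidates and the generated text with BERT.
# "bert-base-uncased" is a placeholder for the patent-specific BERT models.
tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertModel.from_pretrained("bert-base-uncased")
model.eval()

def embed(text: str) -> torch.Tensor:
    """Mean-pool the last hidden layer into a single text vector."""
    inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=512)
    with torch.no_grad():
        hidden = model(**inputs).last_hidden_state  # (1, seq_len, 768)
    return hidden.mean(dim=1).squeeze(0)

query_vec = embed(generated)

# Step 3: rerank the BM25 candidates by cosine similarity to the query vector.
reranked = sorted(
    top_k,
    key=lambda i: torch.cosine_similarity(query_vec, embed(corpus[i]), dim=0),
    reverse=True,
)
for i in reranked:
    print(corpus[i])
```

In this setup BM25 keeps the candidate pool small and cheap to embed, and the BERT pass only reorders those candidates, which mirrors the two-stage retrieve-then-rerank design described in the abstract.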


research
08/06/2020

DeText: A Deep Text Ranking Framework with BERT

Ranking is the most important component in a search system. Most search s...
research
08/13/2022

Interpreting BERT-based Text Similarity via Activation and Saliency Maps

Recently, there has been growing interest in the ability of Transformer-...
research
01/24/2022

Text and Code Embeddings by Contrastive Pre-Training

Text embeddings are useful features in many applications such as semanti...
research
10/17/2019

Universal Text Representation from BERT: An Empirical Study

We present a systematic investigation of layer-wise BERT activations for...
research
02/25/2023

Prompt-based Learning for Text Readability Assessment

We propose the novel adaptation of a pre-trained seq2seq model for reada...
research
07/06/2022

The Role of Complex NLP in Transformers for Text Ranking?

Even though term-based methods such as BM25 provide strong baselines in ...
research
08/07/2019

Embedding-based system for the Text part of CALL v3 shared task

This paper presents a scoring system that has shown the top result on th...
