Improving Automated Patent Claim Parsing: Dataset, System, and Experiments

05/05/2016
by Mengke Hu, et al.

Off-the-shelf natural language processing software performs poorly when parsing patent claims, because claim language is irregular relative to the news and web corpora typically used to train this software. Stopping short of the extensive and expensive process of accumulating a dataset large enough to completely retrain parsers for patent claims, a method of adapting existing natural language processing software to patent claims via forced part-of-speech tag correction is proposed. An Amazon Mechanical Turk (AMT) collection campaign organized to generate a public corpus for training such an improved claim parsing system is discussed, along with lessons learned during the campaign that can be of use in future NLP dataset collection campaigns with AMT. Experiments using this corpus and other patent claim sets measure the parsing performance improvement obtained with the claim parsing system. Finally, the utility of the improved claim parsing system within other patent processing applications is demonstrated via experiments showing improved automated patent subject classification when the new claim parsing system is used to generate the features.
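
To make the idea of forced part-of-speech tag correction concrete, the following is a minimal sketch, not the system described in the paper. It assumes tagger output arrives as (token, tag) pairs using Penn Treebank tags, and it overrides tags through a small lookup table. The table `CORRECTIONS`, the function `force_tags`, and the example tags are all hypothetical and chosen only for illustration.

```python
# Hypothetical correction table: (lowercased token, tag from the tagger) -> forced tag.
# These entries are illustrative assumptions, not rules taken from the paper.
CORRECTIONS = {
    ("said", "VBD"): "JJ",      # "said housing": modifier, not a past-tense verb
    ("said", "VBN"): "JJ",
    ("wherein", "NN"): "WRB",
}


def force_tags(tagged):
    """Return a new list of (token, tag) pairs with known mis-tags overridden."""
    corrected = []
    for token, tag in tagged:
        # Fall back to the original tag when no correction is listed.
        new_tag = CORRECTIONS.get((token.lower(), tag), tag)
        corrected.append((token, new_tag))
    return corrected


# Made-up output of a general-purpose tagger on a claim fragment.
tagged = [
    ("wherein", "NN"), ("said", "VBD"), ("housing", "NN"),
    ("comprises", "VBZ"), ("a", "DT"), ("lid", "NN"),
]
print(force_tags(tagged))
```

In a pipeline of this shape, the corrected tags would be handed to an existing parser that accepts pre-tagged input, so the parser itself does not need retraining.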


