RISC: Generating Realistic Synthetic Bilingual Insurance Contract

04/09/2023
by David Beauchemin, et al.

This paper presents RISC, an open-source Python package for data generation (https://github.com/GRAAL-Research/risc). RISC generates look-alike automobile insurance contracts, in French and English, based on the Quebec regulatory insurance form. Insurance contracts are 90 to 100 pages long and use legal and insurance-specific vocabulary that is complex for a layperson. Hence, they are a much more complex class of documents than those in traditional NLP corpora. We therefore introduce RISCBAC, a Realistic Insurance Synthetic Bilingual Automobile Contract dataset based on the mandatory Quebec car insurance contract. The dataset comprises 10,000 unannotated insurance contracts in French and English. RISCBAC enables NLP research on unsupervised automatic summarisation, question answering, text simplification, machine translation and more. Moreover, it can be automatically annotated further to serve as a dataset for supervised tasks such as NER.

research
10/26/2021

CLAUSEREC: A Clause Recommendation Framework for AI-aided Contract Authoring

Contracts are a common type of legal document that frequent in several d...
research
01/21/2023

Investigating Strategies for Clause Recommendation

Clause recommendation is the problem of recommending a clause to a legal...
research
07/21/2023

Understanding (Un)Written Contracts of NVMe ZNS Devices with zns-tools

Operational and performance characteristics of flash SSDs have long been...
research
11/18/2020

Clustering-based Automatic Construction of Legal Entity Knowledge Base from Contracts

In contract analysis and contract automation, a knowledge base (KB) of l...
research
08/27/2022

Conversion of Legal Agreements into Smart Legal Contracts using NLP

A Smart Legal Contract (SLC) is a specialized digital agreement that con...
research
10/05/2021

ContractNLI: A Dataset for Document-level Natural Language Inference for Contracts

Reviewing contracts is a time-consuming procedure that incurs large expe...
research
01/30/2018

PEYMA: A Tagged Corpus for Persian Named Entities

The goal in the NER task is to classify proper nouns of a text into clas...
