Privacy-Preserving Models for Legal Natural Language Processing

11/05/2022
by Ying Yin, et al.

Pre-training large transformer models with in-domain data improves domain adaptation and helps gain performance on domain-specific downstream tasks. However, sharing models pre-trained on potentially sensitive data is prone to adversarial privacy attacks. In this paper, we ask to what extent we can guarantee the privacy of pre-training data and, at the same time, achieve better downstream performance on legal tasks without the need for additional labeled data. We extensively experiment with scalable self-supervised learning of transformer models under the formal paradigm of differential privacy and show that, under specific training configurations, we can improve downstream performance without sacrificing privacy protection for the in-domain data. Our main contribution is utilizing differential privacy for large-scale pre-training of transformer language models in the legal NLP domain, which, to the best of our knowledge, has not been addressed before.
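
The abstract does not describe the implementation, but the mechanism behind differentially private training, DP-SGD, can be sketched in a few lines: each example's gradient is clipped to a fixed norm, the clipped gradients are summed, and calibrated Gaussian noise is added before the parameter update. The toy transformer, masking scheme, and all hyperparameters below are illustrative assumptions rather than the authors' configuration; at scale one would typically use a library such as Opacus instead of this hand-rolled loop.

```python
# Minimal DP-SGD sketch for masked-language-model pre-training.
# All model sizes, hyperparameters, and the masking scheme are assumptions
# made for illustration only; this is not the paper's implementation.
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)
VOCAB, DIM, SEQ_LEN = 100, 32, 16
CLIP_NORM, NOISE_MULT, LR = 1.0, 1.1, 1e-3

class TinyMaskedLM(nn.Module):
    """Toy transformer encoder with a masked-token prediction head."""
    def __init__(self):
        super().__init__()
        self.embed = nn.Embedding(VOCAB, DIM)
        layer = nn.TransformerEncoderLayer(d_model=DIM, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)
        self.head = nn.Linear(DIM, VOCAB)

    def forward(self, ids):
        return self.head(self.encoder(self.embed(ids)))

model = TinyMaskedLM()
params = [p for p in model.parameters() if p.requires_grad]

# Toy batch standing in for (potentially sensitive) in-domain legal text.
tokens = torch.randint(0, VOCAB - 1, (8, SEQ_LEN))
mask = torch.rand(tokens.shape) < 0.15        # choose ~15% of positions to mask
mask[:, 0] = True                             # ensure every example has a masked token
inputs = tokens.masked_fill(mask, VOCAB - 1)  # the last vocabulary id serves as [MASK]
labels = tokens.masked_fill(~mask, -100)      # compute loss only on masked positions

# One DP-SGD step: clip each example's gradient, sum, add Gaussian noise, update.
summed = [torch.zeros_like(p) for p in params]
for x, y in zip(inputs, labels):
    model.zero_grad()
    logits = model(x.unsqueeze(0))
    loss = F.cross_entropy(logits.view(-1, VOCAB), y.view(-1), ignore_index=-100)
    loss.backward()
    grads = [p.grad.detach().clone() for p in params]
    norm = torch.sqrt(sum(g.pow(2).sum() for g in grads))
    scale = min(1.0, CLIP_NORM / (norm.item() + 1e-6))
    for s, g in zip(summed, grads):
        s.add_(g, alpha=scale)

with torch.no_grad():
    for p, s in zip(params, summed):
        noise = torch.normal(0.0, NOISE_MULT * CLIP_NORM, size=p.shape)
        p.add_(-LR * (s + noise) / inputs.size(0))

print("Applied one differentially private update over", inputs.size(0), "examples")
```

The clipping bound and noise multiplier jointly determine the (epsilon, delta) privacy guarantee, which is the privacy/utility trade-off the paper studies against downstream performance on legal tasks.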

Related research

09/14/2021
Legal Transformer Models May Not Always Help
Deep learning-based Natural Language Processing methods, especially tran...

09/12/2023
Recovering from Privacy-Preserving Masking with Large Language Models
Model adaptation is crucial to handle the discrepancy between proxy trai...

10/06/2022
Q-LSTM Language Model – Decentralized Quantum Multilingual Pre-Trained Language Model for Privacy Protection
Large-scale language models are trained on a massive amount of natural l...

05/09/2023
CaseEncoder: A Knowledge-enhanced Pre-trained Model for Legal Case Encoding
Legal case retrieval is a critical process for modern legal information ...

01/29/2022
ScaLA: Accelerating Adaptation of Pre-Trained Transformer-Based Language Models via Efficient Large-Batch Adversarial Noise
In recent years, large pre-trained Transformer-based language models hav...

10/24/2022
Legal-Tech Open Diaries: Lesson learned on how to develop and deploy light-weight models in the era of humongous Language Models
In the era of billion-parameter-sized Language Models (LMs), start-ups h...

05/26/2023
A Mechanism for Sample-Efficient In-Context Learning for Sparse Retrieval Tasks
We study the phenomenon of in-context learning (ICL) exhibited by large ...
