Spanish Legalese Language Model and Corpora

Many language models exist for English, reflecting its worldwide relevance. For Spanish, however, although it is widely spoken, very few language models are available, and those that exist tend to be small and too general. Legalese could be thought of as a Spanish variant in its own right, given its complex vocabulary, semantics, and phrasing. For this work we gathered legal-domain corpora from different sources, built a model, and evaluated it on Spanish general-domain tasks. The model provides reasonable results on those tasks.
