GePpeTto Carves Italian into a Language Model

04/29/2020
by   Lorenzo De Mattei, et al.
0

In the last few years, pre-trained neural architectures have provided impressive improvements across several NLP tasks. Still, generative language models are available mainly for English. We develop GePpeTto, the first generative language model for Italian, built using the GPT-2 architecture. We provide a thorough analysis of GePpeTto's quality by means of both an automatic and a human-based evaluation. The automatic assessment consists in (i) calculating perplexity across different genres and (ii) a profiling analysis over GePpeTto's writing characteristics. We find that GePpeTto's production is a sort of bonsai version of human production, with shorter but yet complex sentences. Human evaluation is performed over a sentence completion task, where GePpeTto's output is judged as natural more often than not, and much closer to the original human texts than to a simpler language model which we take as baseline.

READ FULL TEXT
research
02/21/2022

StyleBERT: Chinese pretraining by font style information

With the success of down streaming task using English pre-trained langua...
research
10/31/2022

WHEN FLUE MEETS FLANG: Benchmarks and Large Pre-trained Language Model for Financial Domain

Pre-trained language models have shown impressive performance on a varie...
research
11/09/2019

On Architectures for Including Visual Information in Neural Language Models for Image Description

A neural language model can be conditioned into generating descriptions ...
research
12/19/2022

Evaluating Human-Language Model Interaction

Many real-world applications of language models (LMs), such as code auto...
research
07/27/2023

Models of reference production: How do they withstand the test of time?

In recent years, many NLP studies have focused solely on performance imp...
research
04/27/2022

Probing Simile Knowledge from Pre-trained Language Models

Simile interpretation (SI) and simile generation (SG) are challenging ta...
research
06/02/2021

belabBERT: a Dutch RoBERTa-based language model applied to psychiatric classification

Natural language processing (NLP) is becoming an important means for aut...

Please sign up or login with your details

Forgot password? Click here to reset