PTT5: Pretraining and validating the T5 model on Brazilian Portuguese data

08/20/2020
by Diedre Carmo, et al.
In natural language processing (NLP), there is a need for more resources in Portuguese, since much of the data used in state-of-the-art research is in other languages. In this paper, we pretrain a T5 model on the BrWaC corpus, an extensive collection of web pages in Portuguese, and evaluate its performance against other Portuguese pretrained models and multilingual models on sentence similarity and sentence entailment tasks. We show that our Portuguese pretrained models perform significantly better than the original T5 models. Moreover, we showcase the positive impact of using a Portuguese vocabulary.

research
11/29/2022

Extending the Subwording Model of Multilingual Pretrained Models for New Languages

Multilingual pretrained models are effective for machine translation and...
research
10/23/2020

BARThez: a Skilled Pretrained French Sequence-to-Sequence Model

Inductive transfer learning, enabled by self-supervised learning, have t...
research
04/01/2019

Using Similarity Measures to Select Pretraining Data for NER

Word vectors and Language Models (LMs) pretrained on a large amount of u...
research
03/10/2023

Logic Against Bias: Textual Entailment Mitigates Stereotypical Sentence Reasoning

Due to their similarity-based learning objectives, pretrained sentence e...
research
07/14/2021

ParCourE: A Parallel Corpus Explorer for a Massively Multilingual Corpus

With more than 7000 languages worldwide, multilingual natural language p...
research
04/18/2023

UniMax: Fairer and more Effective Language Sampling for Large-Scale Multilingual Pretraining

Pretrained multilingual large language models have typically used heuris...
research
10/05/2017

On the Effective Use of Pretraining for Natural Language Inference

Neural networks have excelled at many NLP tasks, but there remain open q...
