Sequence-to-Sequence Resources for Catalan

02/14/2022
by Ona de Gibert et al.

In this work, we introduce sequence-to-sequence language resources for Catalan, a moderately under-resourced language, for two tasks: summarization and machine translation (MT). We present two new abstractive summarization datasets in the newswire domain. We also introduce a parallel Catalan-English corpus, paired with three new test sets. Finally, we evaluate the presented data with competing state-of-the-art models, and we develop baselines for these tasks using a newly created Catalan BART. We release the resulting resources of this work under an open license to encourage the development of language technology in Catalan.
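The summarization and MT baselines are built on a BART-style Catalan seq2seq model. As a minimal sketch of how such a checkpoint could be used for abstractive summarization with the Hugging Face transformers library, the snippet below uses a placeholder model identifier and illustrative generation settings; these are assumptions, not the authors' exact released model or configuration.

```python
# Minimal sketch: abstractive summarization with a BART-style seq2seq model.
# The checkpoint name is a placeholder, not the paper's released model.
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

MODEL_NAME = "path/to/catalan-bart"  # hypothetical identifier

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForSeq2SeqLM.from_pretrained(MODEL_NAME)

article = "Text d'una notícia en català ..."  # newswire article to summarize

# Tokenize and truncate to the encoder's maximum input length.
inputs = tokenizer(article, return_tensors="pt", truncation=True, max_length=1024)

# Beam search is a common default for abstractive summarization baselines.
summary_ids = model.generate(
    **inputs,
    num_beams=4,
    max_length=128,
    early_stopping=True,
)

print(tokenizer.decode(summary_ids[0], skip_special_tokens=True))
```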
