Multi-Task Sequence Prediction For Tunisian Arabizi Multi-Level Annotation

11/10/2020
by Elisa Gugliotta, et al.

In this paper we propose a multi-task sequence prediction system, based on recurrent neural networks, used to annotate a Tunisian Arabizi corpus on multiple levels. The annotation levels are text classification, tokenization, PoS tagging, and the encoding of Tunisian Arabizi into CODA* Arabic orthography. The system is trained to predict all the annotation levels in cascade, starting from the Arabizi input. We evaluate the system on the German TIGER corpus, suitably converting the data into a multi-task problem, in order to show the effectiveness of our neural architecture. We also show how we used the system to annotate a Tunisian Arabizi corpus, which was afterwards manually corrected and used to further evaluate sequence models on Tunisian data. Our system is developed for the Fairseq framework, which allows fast and easy use for any other sequence prediction problem.
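The cascade design described above can be illustrated with a short sketch. What follows is a minimal plain-PyTorch illustration, not the authors' Fairseq implementation: the task inventory, label-set sizes, network dimensions, and the treatment of CODA* encoding as per-token classification are all illustrative assumptions. The idea shown is a shared BiLSTM encoder with one classification head per annotation level, where each later head also receives an embedding of the previous level's predicted labels.

```python
# Hedged sketch of a cascaded multi-task sequence tagger; all names
# and sizes are assumptions, not the paper's actual configuration.
import torch
import torch.nn as nn

class CascadedTagger(nn.Module):
    """Shared BiLSTM encoder with one head per annotation level;
    each later head also sees an embedding of the previous level's
    predicted labels, which forms the cascade."""
    def __init__(self, vocab_size, task_label_sizes, emb_dim=64, hid_dim=128):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        self.encoder = nn.LSTM(emb_dim, hid_dim,
                               bidirectional=True, batch_first=True)
        self.heads, self.label_embeds = nn.ModuleList(), nn.ModuleList()
        feat_dim = 2 * hid_dim  # first head sees encoder states only
        for n_labels in task_label_sizes:
            self.heads.append(nn.Linear(feat_dim, n_labels))
            self.label_embeds.append(nn.Embedding(n_labels, emb_dim))
            feat_dim = 2 * hid_dim + emb_dim  # later heads add label features

    def forward(self, tokens):
        h, _ = self.encoder(self.embed(tokens))   # (batch, time, 2*hid_dim)
        feats, all_logits = h, []
        for head, lab_emb in zip(self.heads, self.label_embeds):
            logits = head(feats)                  # (batch, time, n_labels)
            all_logits.append(logits)
            prev = lab_emb(logits.argmax(-1))     # predicted-label embedding
            feats = torch.cat([h, prev], dim=-1)  # cascade into next level
        return all_logits

# Toy usage: three levels, e.g. tokenization, PoS, CODA* label classes.
model = CascadedTagger(vocab_size=100, task_label_sizes=[4, 18, 30])
logits = model(torch.randint(0, 100, (2, 7)))     # batch of 2, length 7
targets = [torch.randint(0, n, (2, 7)) for n in (4, 18, 30)]
loss = sum(nn.functional.cross_entropy(l.transpose(1, 2), t)
           for l, t in zip(logits, targets))      # summed multi-task loss
loss.backward()
```

At inference the cascade consumes the model's own predictions, as above; during training, gold labels from the previous level could be fed instead (teacher forcing), a common design choice for cascaded taggers.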
