ÚFAL at MultiLexNorm 2021: Improving Multilingual Lexical Normalization by Fine-tuning ByT5

10/28/2021
by David Samuel, et al.

We present the winning entry to the Multilingual Lexical Normalization (MultiLexNorm) shared task at W-NUT 2021 (van der Goot et al., 2021a), which evaluates lexical-normalization systems on 12 social media datasets in 11 languages. We base our solution on a pre-trained byte-level language model, ByT5 (Xue et al., 2021a), which we further pre-train on synthetic data and then fine-tune on authentic normalization data. Our system achieves the best performance by a wide margin in intrinsic evaluation, and also the best performance in extrinsic evaluation through dependency parsing. The source code is released at https://github.com/ufal/multilexnorm2021 and the fine-tuned models at https://huggingface.co/ufal.
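Since the system builds on a byte-level model, its input interface is unusually simple: ByT5 has no subword vocabulary, and every UTF-8 byte of the input text is its own token. The sketch below illustrates that byte-level tokenization scheme (this is an assumption based on the publicly documented ByT5 design, not code taken from the authors' repository): ids 0-2 are reserved for special tokens (pad, eos, unk), and each byte `b` maps to token id `b + 3`.

```python
# Minimal sketch of ByT5-style byte-level tokenization.
# Assumption: follows the public ByT5 convention (pad=0, eos=1, unk=2,
# byte b -> id b + 3); not code from the ufal/multilexnorm2021 repository.

PAD_ID, EOS_ID, UNK_ID = 0, 1, 2
BYTE_OFFSET = 3  # special tokens occupy ids 0..2


def encode(text: str, add_eos: bool = True) -> list[int]:
    """Map a string to ByT5-style token ids (UTF-8 bytes shifted by 3)."""
    ids = [b + BYTE_OFFSET for b in text.encode("utf-8")]
    if add_eos:
        ids.append(EOS_ID)
    return ids


def decode(ids: list[int]) -> str:
    """Invert encode(), dropping the reserved special-token ids."""
    data = bytes(i - BYTE_OFFSET for i in ids if i >= BYTE_OFFSET)
    return data.decode("utf-8", errors="replace")


print(encode("hi"))            # [107, 108, 1]
print(decode(encode("ráno")))  # round-trips non-ASCII text byte by byte
```

Because the vocabulary is just the 256 byte values plus a few specials, the same model handles all 11 languages of the shared task without any language-specific tokenizer.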


