Benchmarking zero-shot and few-shot approaches for tokenization, tagging, and dependency parsing of Tagalog text

08/03/2022
by Angelina Aquino, et al.

The grammatical analysis of texts in any human language typically involves a number of basic processing tasks, such as tokenization, morphological tagging, and dependency parsing. State-of-the-art systems can achieve high accuracy on these tasks for languages with large datasets, but yield poor results for languages such as Tagalog, which have little to no annotated data. To address this issue for Tagalog, we investigate the use of auxiliary data sources for creating task-specific models in the absence of annotated Tagalog data. We also explore the use of word embeddings and data augmentation to improve performance when only a small amount of annotated Tagalog data is available. We show that these zero-shot and few-shot approaches yield substantial improvements in the grammatical analysis of both in-domain and out-of-domain Tagalog text compared to state-of-the-art supervised baselines.


