Assessment of Pre-Trained Models Across Languages and Grammars

09/20/2023
by Alberto Muñoz-Ortiz, et al.

We present an approach for assessing how multilingual large language models (LLMs) learn syntax in terms of multi-formalism syntactic structures. We aim to recover constituent and dependency structures by casting parsing as sequence labeling. To do so, we select several LLMs and study them on 13 diverse UD treebanks for dependency parsing and 10 treebanks for constituent parsing. Our results show that: (i) the framework is consistent across encodings, (ii) pre-trained word vectors do not favor constituency representations of syntax over dependencies, (iii) sub-word tokenization is needed to represent syntax, in contrast to character-based models, and (iv) whether a language occurs in the pretraining data matters more than the amount of task data when recovering syntax from the word vectors.
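Casting parsing as sequence labeling means turning each tree into exactly one discrete label per word, so that a tagger on top of the word vectors can predict the labels and a decoder can rebuild the tree. The sketch below illustrates one simple encoding for each formalism and is not necessarily the exact set of encodings used in the paper. For dependencies, each word is labeled with the relative offset to its head plus its relation. For constituents, each word is labeled with the number of ancestors it shares with the next word plus the nonterminal label of their lowest common ancestor, in the spirit of the absolute encoding of Gómez-Rodríguez and Vilares (2018). The function names and the nested-tuple tree format are hypothetical choices made for illustration.

```python
import itertools
from typing import List, Tuple, Union

# Hypothetical tree format: a word (str) or a tuple (label, child1, child2, ...).
Tree = Union[str, tuple]


def encode_dependencies(heads: List[int], rels: List[str]) -> List[Tuple[int, str]]:
    """Relative head-offset encoding: word i (1-based) gets (head_i - i, relation).

    heads[k] is the 1-based head of word k+1; 0 denotes the artificial root.
    """
    return [(head - pos, rel) for pos, (head, rel) in enumerate(zip(heads, rels), start=1)]


def decode_dependencies(labels: List[Tuple[int, str]]) -> Tuple[List[int], List[str]]:
    """Invert encode_dependencies: head_i = i + offset_i."""
    heads = [pos + offset for pos, (offset, _) in enumerate(labels, start=1)]
    rels = [rel for _, rel in labels]
    return heads, rels


def leaf_paths(tree: Tree) -> List[List[Tuple[int, str]]]:
    """For each word (left to right), list its ancestors from the root as (node_id, label)."""
    paths: List[List[Tuple[int, str]]] = []
    ids = itertools.count()  # unique id per internal node, so equal labels stay distinct

    def walk(node: Tree, ancestors: List[Tuple[int, str]]) -> None:
        if isinstance(node, str):
            paths.append(ancestors)
            return
        label, *children = node
        here = ancestors + [(next(ids), label)]
        for child in children:
            walk(child, here)

    walk(tree, [])
    return paths


def encode_constituents(tree: Tree) -> List[Tuple[int, str]]:
    """Label word i with (#common ancestors of words i and i+1, label of their lowest common ancestor).

    The last word gets no label in this sketch; n words yield n-1 labels.
    """
    paths = leaf_paths(tree)
    labels = []
    for left, right in zip(paths, paths[1:]):
        common = 0
        for a, b in zip(left, right):
            if a != b:
                break
            common += 1
        # Every pair shares at least the root, so common >= 1.
        labels.append((common, left[common - 1][1]))
    return labels
```

For "The cat sleeps", `encode_dependencies([2, 3, 0], ["det", "nsubj", "root"])` gives `[(1, "det"), (1, "nsubj"), (-3, "root")]`, and `encode_constituents(("S", ("NP", ("DT", "The"), ("NN", "cat")), ("VP", ("VBZ", "sleeps"))))` gives `[(2, "NP"), (1, "S")]`. Because both trees become flat label sequences, the same probing setup can be applied to either formalism, which is what makes a multi-formalism comparison straightforward.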

