A Matter of Framing: The Impact of Linguistic Formalism on Probing Results

04/30/2020
by   Ilia Kuznetsov, et al.
0

Deep pre-trained contextualized encoders like BERT (Delvin et al., 2019) demonstrate remarkable performance on a range of downstream tasks. A recent line of research in probing investigates the linguistic knowledge implicitly learned by these models during pre-training. While most work in probing operates on the task level, linguistic tasks are rarely uniform and can be represented in a variety of formalisms. Any linguistics-based probing study thereby inevitably commits to the formalism used to annotate the underlying data. Can the choice of formalism affect probing results? To investigate, we conduct an in-depth cross-formalism layer probing study in role semantics. We find linguistically meaningful differences in the encoding of semantic role- and proto-role information by BERT depending on the formalism and demonstrate that layer probing can detect subtle differences between the implementations of the same linguistic formalism. Our results suggest that linguistic formalism is an important dimension in probing studies, along with the commonly used cross-task and cross-lingual experimental settings.

READ FULL TEXT
research
04/08/2021

A Simple Geometric Method for Cross-Lingual Linguistic Transformations with Pre-trained Autoencoders

Powerful sentence encoders trained for multiple languages are on the ris...
research
05/14/2021

BERT Busters: Outlier LayerNorm Dimensions that Disrupt BERT

Multiple studies have shown that BERT is remarkably robust to pruning, y...
research
04/03/2020

XGLUE: A New Benchmark Dataset for Cross-lingual Pre-training, Understanding and Generation

In this paper, we introduce XGLUE, a new benchmark dataset to train larg...
research
05/17/2022

Feature Aggregation in Zero-Shot Cross-Lingual Transfer Using Multilingual BERT

Multilingual BERT (mBERT), a language model pre-trained on large multili...
research
09/27/2020

What does it mean to be language-agnostic? Probing multilingual sentence encoders for typological properties

Multilingual sentence encoders have seen much success in cross-lingual m...
research
05/31/2023

Diffused Redundancy in Pre-trained Representations

Representations learned by pre-training a neural network on a large data...
research
04/13/2021

Finding Concept-specific Biases in Form–Meaning Associations

This work presents an information-theoretic operationalisation of cross-...

Please sign up or login with your details

Forgot password? Click here to reset