Specializing Word Embeddings (for Parsing) by Information Bottleneck

10/01/2019
by Xiang Lisa Li, et al.

Pre-trained word embeddings like ELMo and BERT contain rich syntactic and semantic information, resulting in state-of-the-art performance on various tasks. We propose a very fast variational information bottleneck (VIB) method to nonlinearly compress these embeddings, keeping only the information that helps a discriminative parser. We compress each word embedding to either a discrete tag or a continuous vector. In the discrete version, our automatically compressed tags form an alternative tag set: we show experimentally that our tags capture most of the information in traditional POS tag annotations, but our tag sequences can be parsed more accurately at the same level of tag granularity. In the continuous version, we show experimentally that moderately compressing the word embeddings by our method yields a more accurate parser in 8 of 9 languages, unlike simple dimensionality reduction.
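
To make the continuous compression idea concrete, the following is a minimal sketch of a generic variational information bottleneck step, not the authors' implementation. It assumes a Gaussian encoder with a standard normal prior, which is a common VIB setup but is not stated in the abstract. The names `encode`, `sample_code`, `kl_to_standard_normal`, `vib_loss`, and the weight `beta` are hypothetical, and the task term `task_nll` stands in for the parser's negative log-likelihood.

```python
import math
import random


def encode(x, w_mu, w_logvar):
    """Map one embedding x to Gaussian parameters (mu, logvar).

    w_mu and w_logvar are lists of weight vectors, one per output dimension
    (hypothetical encoder weights; a real encoder would be a trained network).
    """
    mu = [math.tanh(sum(xi * w for xi, w in zip(x, col))) for col in w_mu]
    logvar = [sum(xi * w for xi, w in zip(x, col)) for col in w_logvar]
    return mu, logvar


def sample_code(mu, logvar, rng):
    """Draw a compressed vector t = mu + sigma * eps (reparameterized sample)."""
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0)
            for m, lv in zip(mu, logvar)]


def kl_to_standard_normal(mu, logvar):
    """KL( N(mu, diag(exp(logvar))) || N(0, I) ), the compression penalty."""
    return 0.5 * sum(math.exp(lv) + m * m - 1.0 - lv
                     for m, lv in zip(mu, logvar))


def vib_loss(task_nll, kl, beta):
    """Trade task accuracy against compression; beta is an assumed weight."""
    return task_nll + beta * kl


rng = random.Random(0)
x = [rng.gauss(0.0, 1.0) for _ in range(8)]                    # toy 8-dim embedding
w_mu = [[0.1 * rng.gauss(0.0, 1.0) for _ in range(8)] for _ in range(3)]
w_logvar = [[0.1 * rng.gauss(0.0, 1.0) for _ in range(8)] for _ in range(3)]

mu, logvar = encode(x, w_mu, w_logvar)
t = sample_code(mu, logvar, rng)                               # 3-dim compressed code
kl = kl_to_standard_normal(mu, logvar)
loss = vib_loss(task_nll=1.0, kl=kl, beta=0.01)                # task_nll is a placeholder
```

In this sketch, a larger `beta` pushes the encoder toward the prior and discards more of the original embedding, while a smaller `beta` keeps more information for the downstream task.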
