Deepening Hidden Representations from Pre-trained Language Models for Natural Language Understanding

11/05/2019
by Junjie Yang, et al.

Transformer-based pre-trained language models have proven to be effective for learning contextualized language representations. However, current approaches use only the output of the encoder's final layer when fine-tuning on downstream tasks. We argue that taking only a single layer's output restricts the power of the pre-trained representation. We therefore deepen the representation learned by the model by fusing the hidden representations with an explicit HIdden Representation Extractor (HIRE), which automatically absorbs the representation complementary to the output of the final layer. Using RoBERTa as the backbone encoder, our proposed improvement over the pre-trained models proves effective on multiple natural language understanding tasks and helps our model rival state-of-the-art models on the GLUE benchmark.
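The abstract does not spell out HIRE's internals, so the following is only a minimal sketch of the general idea it builds on: fusing the hidden states of several encoder layers instead of using the final layer alone. The class name, the softmax-weighted layer pooling, and all hyperparameters below are illustrative assumptions, not the paper's actual architecture.

```python
# Sketch: fuse all of RoBERTa's layer-wise hidden states with learned weights.
# NOTE: the weighting scheme is an assumption for illustration, not HIRE itself.
import torch
import torch.nn as nn
from transformers import RobertaModel, RobertaTokenizer


class FusedRobertaClassifier(nn.Module):
    def __init__(self, num_labels: int, model_name: str = "roberta-base"):
        super().__init__()
        # output_hidden_states=True exposes the embedding layer plus every encoder layer.
        self.encoder = RobertaModel.from_pretrained(
            model_name, output_hidden_states=True
        )
        num_layers = self.encoder.config.num_hidden_layers + 1  # +1 for embeddings
        hidden_size = self.encoder.config.hidden_size
        # One learned scalar per layer, softmax-normalized before fusion.
        self.layer_weights = nn.Parameter(torch.zeros(num_layers))
        self.classifier = nn.Linear(hidden_size, num_labels)

    def forward(self, input_ids, attention_mask):
        outputs = self.encoder(input_ids=input_ids, attention_mask=attention_mask)
        # Stack hidden states: (num_layers, batch, seq_len, hidden)
        all_layers = torch.stack(outputs.hidden_states, dim=0)
        weights = torch.softmax(self.layer_weights, dim=0).view(-1, 1, 1, 1)
        fused = (weights * all_layers).sum(dim=0)  # weighted sum over layers
        pooled = fused[:, 0]  # <s> token as the sequence summary
        return self.classifier(pooled)


if __name__ == "__main__":
    tokenizer = RobertaTokenizer.from_pretrained("roberta-base")
    model = FusedRobertaClassifier(num_labels=2)
    batch = tokenizer(["A small usage example."], return_tensors="pt")
    logits = model(batch["input_ids"], batch["attention_mask"])
    print(logits.shape)  # torch.Size([1, 2])
```

In this sketch the fused representation simply replaces the final-layer output fed to the task head; the paper's HIRE module instead extracts a representation complementary to the final layer and combines the two.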

