TRANS-BLSTM: Transformer with Bidirectional LSTM for Language Understanding

03/16/2020
by   Zhiheng Huang, et al.
0

Bidirectional Encoder Representations from Transformers (BERT) has recently achieved state-of-the-art performance on a broad range of NLP tasks including sentence classification, machine translation, and question answering. The BERT model architecture is derived primarily from the transformer. Prior to the transformer era, bidirectional Long Short-Term Memory (BLSTM) has been the dominant modeling architecture for neural machine translation and question answering. In this paper, we investigate how these two modeling techniques can be combined to create a more powerful model architecture. We propose a new architecture denoted as Transformer with BLSTM (TRANS-BLSTM) which has a BLSTM layer integrated to each transformer block, leading to a joint modeling framework for transformer and BLSTM. We show that TRANS-BLSTM models consistently lead to improvements in accuracy compared to BERT baselines in GLUE and SQuAD 1.1 experiments. Our TRANS-BLSTM model obtains an F1 score of 94.01 state-of-the-art result.

READ FULL TEXT

page 1

page 2

page 3

page 4

research
10/11/2018

BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding

We introduce a new language representation model called BERT, which stan...
research
03/14/2020

Finnish Language Modeling with Deep Transformer Models

Transformers have recently taken the center stage in language modeling a...
research
09/25/2019

Reducing Transformer Depth on Demand with Structured Dropout

Overparameterized transformer networks have obtained state of the art re...
research
05/12/2021

Building a Question and Answer System for News Domain

This project attempts to build a Question- Answering system in the News ...
research
08/30/2019

Transformer Dissection: An Unified Understanding for Transformer's Attention via the Lens of Kernel

Transformer is a powerful architecture that achieves superior performanc...
research
05/17/2020

Support-BERT: Predicting Quality of Question-Answer Pairs in MSDN using Deep Bidirectional Transformer

Quality of questions and answers from community support websites (e.g. M...
research
11/09/2020

VisBERT: Hidden-State Visualizations for Transformers

Explainability and interpretability are two important concepts, the abse...

Please sign up or login with your details

Forgot password? Click here to reset