BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding

10/11/2018 ∙ by Jacob Devlin, et al.

We introduce a new language representation model called BERT, which stands for Bidirectional Encoder Representations from Transformers. Unlike recent language representation models, BERT is designed to pre-train deep bidirectional representations by jointly conditioning on both left and right context in all layers. As a result, the pre-trained BERT representations can be fine-tuned with just one additional output layer to create state-of-the-art models for a wide range of tasks, such as question answering and language inference, without substantial task-specific architecture modifications. BERT is conceptually simple and empirically powerful. It obtains new state-of-the-art results on eleven natural language processing tasks, including pushing the GLUE benchmark to 80.4%, MultiNLI accuracy to 86.7% (5.6% absolute improvement), and SQuAD v1.1 question answering Test F1 to 93.2 (1.5 absolute improvement), outperforming human performance by 2.0.
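
To make the phrase "just one additional output layer" concrete, here is a minimal sketch of what such a layer could look like for a classification task. It is an illustration under stated assumptions, not the authors' code: the class name OutputLayer, the hidden size of 8, the three labels, and the hand-made pooled vector are all hypothetical, and the pooled vector merely stands in for the fixed-size representation a pre-trained encoder would produce. Only the Python standard library is used.

```python
import math
import random


def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


class OutputLayer:
    """Hypothetical task-specific head: probabilities = softmax(W h + b)."""

    def __init__(self, hidden_size, num_labels, seed=0):
        rng = random.Random(seed)
        # One weight row per label, initialised with small random values.
        self.weights = [
            [rng.gauss(0.0, 0.02) for _ in range(hidden_size)]
            for _ in range(num_labels)
        ]
        self.bias = [0.0] * num_labels

    def __call__(self, pooled):
        logits = [
            sum(w * x for w, x in zip(row, pooled)) + b
            for row, b in zip(self.weights, self.bias)
        ]
        return softmax(logits)


# Assumption: a stand-in for the encoder's pooled sentence representation.
hidden_size = 8
pooled = [0.1 * i for i in range(hidden_size)]

head = OutputLayer(hidden_size, num_labels=3)
probs = head(pooled)
print(probs)
```

During fine-tuning, the parameters of a layer like this would be trained jointly with the pre-trained encoder weights; the sketch shows only the forward pass of the added layer.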




Code Repositories

TensorFlow code and pre-trained models for BERT

Google AI 2018 BERT PyTorch implementation

Bidirectional Encoder Representations from Transformers

A copy of the code from Google's BERT model

A TensorFlow implementation of BERT (Bidirectional Encoder Representations from Transformers).