NeuronBlocks -- Building Your NLP DNN Models Like Playing Lego

04/21/2019 ∙ by Ming Gong, et al. ∙ Microsoft ∙ Institute of Computing Technology, Chinese Academy of Sciences

When building deep neural network models for natural language processing tasks, engineers often spend a lot of effort on coding details and debugging instead of focusing on model architecture design and hyper-parameter tuning. In this paper, we introduce NeuronBlocks, a deep neural network toolkit for natural language processing tasks. In NeuronBlocks, a suite of neural network layers is encapsulated as building blocks, which can easily be used to build complicated deep neural network models by configuring a simple JSON file. NeuronBlocks empowers engineers to build and train various NLP models in seconds without writing a single line of code. A series of experiments on real NLP datasets such as GLUE and WikiQA demonstrates the effectiveness of NeuronBlocks.

1 Introduction

Deep Neural Networks (DNNs) have been widely employed in industry to solve various Natural Language Processing (NLP) tasks, such as text classification, sequence labeling, and question answering. However, when engineers apply DNN models to specific NLP tasks, they often face the following challenges.

  • Multiple mainstream DNN frameworks, including TensorFlow, PyTorch, and Keras. Learning how to program under each framework is a large overhead.

  • Diverse and fast-evolving DNN models, such as CNN, RNN, and Transformer. It takes considerable effort to understand the intuition and mathematics behind these models.

  • Various regularization and optimization mechanisms. Tuning a network for both quality and efficiency requires model developers to gain experience with Dropout, Normalization, Gradient Accumulation, Mixed Precision Training, and similar techniques.

  • Coding and debugging complexity. Programming under DNN frameworks requires developers to be familiar with the built-in packages and interfaces. Authoring, debugging, and optimizing code takes considerable expertise.

  • Platform compatibility. Extra coding work is required to run on different platforms, such as Linux/Windows and GPU/CPU.

The above challenges often hinder the productivity of engineers and result in suboptimal solutions to their tasks. This motivated us to develop an NLP toolkit for DNN models that helps engineers develop DNN approaches. Before designing this toolkit, we conducted a survey among engineers and identified a spectrum of three typical personas.

  • The first type of engineer prefers off-the-shelf networks. Given a specific task, they expect the toolkit to suggest several end-to-end network architectures, so that they can simply focus on collecting the training data and tuning the model parameters. They want the whole process to be extremely agile and easy.

  • The second type of engineer would like to build the networks themselves. However, instead of writing each line of code from scratch, they want the toolkit to provide a rich gallery of reusable modules as building blocks. They can then compare various model architectures constructed from these building blocks.

  • The last type of engineer is an advanced user. They want to reuse most parts of the existing networks, but for critical components they would like to innovate and create their own modules. They want the toolkit to have an open infrastructure, so that customized modules can be easily plugged in.

To satisfy the requirements of all three personas, the NLP toolkit has to be generic enough to cover as many tasks as possible. At the same time, it needs to be flexible enough to allow alternative network architectures as well as customized modules. We therefore analyzed the NLP jobs submitted to a commercial centralized GPU cluster. Table 1 shows that about 87.5% of NLP-related jobs belong to a few common tasks, including sentence classification, text matching, sequence labeling, and machine reading comprehension (MRC). The analysis further suggested that more than 90% of the networks were composed of several common components, such as embedding, CNN/RNN, and Transformer.

| Task | Ratio |
|---|---|
| Text matching | 39.4% |
| Sentence classification | 27.3% |
| Sequence labeling | 14.7% |
| Machine reading comprehension | 6.0% |
| Others | 12.5% |

Table 1: Task analysis of NLP DNN jobs submitted to a commercial centralized GPU cluster.
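
As a quick check of the 87.5% figure against Table 1, the short sketch below sums the rows. The numbers are copied from the table; the interpretation of the 0.1-point gap as rounding is an assumption, not something the source states.

```python
# Ratios copied from Table 1 (percent of NLP DNN jobs).
task_ratio = {
    "Text matching": 39.4,
    "Sentence classification": 27.3,
    "Sequence labeling": 14.7,
    "Machine reading comprehension": 6.0,
    "Others": 12.5,
}

# Share covered by the four named tasks, computed two ways.
named = sum(v for k, v in task_ratio.items() if k != "Others")
common_share = 100.0 - task_ratio["Others"]

print(f"sum of the four named tasks: {named:.1f}%")
print(f"100% minus 'Others':         {common_share:.1f}%")
```

The four named rows add up to 87.4%, while subtracting the "Others" row from 100% gives the 87.5% quoted in the text; the small difference is presumably due to rounding of the individual rows.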

Based on the above observations, we developed NeuronBlocks, a DNN toolkit for NLP tasks. The basic idea is to provide two layers of support to the engineers. The upper layer targets common NLP tasks. For each task, the toolkit contains several end-to-end network templates, which can be immediately instantiated with simple configuration. The bottom layer consists of a suite of reusable and standard components, which can be adopted as building blocks to construct networks with complex architecture. By following the interface guidelines, users can also contribute to this gallery of components with their own modules.

The technical contributions of NeuronBlocks are summarized into the following three aspects.

  • Block Zoo: categorize and abstract the most commonly used DNN components into standard and reusable blocks. Blocks within the same category can be used interchangeably.

  • Model Zoo: identify the most popular NLP tasks and provide alternative end-to-end network templates (in JSON format) for each task.

  • Platform Compatibility: support both Linux and Windows machines, CPU and GPU chips, as well as GPU platforms such as PAI (https://github.com/Microsoft/pai).

2 Related Work

There are several general-purpose deep learning frameworks, such as TensorFlow, PyTorch, and Keras, which have gained popularity in the NLP community. These frameworks offer great flexibility in DNN model design and support various NLP tasks. However, building models under these frameworks requires mastering many framework details, which is a large overhead. Many engineers therefore favor a higher-level abstraction that hides these details.

There are also several popular deep learning toolkits in NLP, including OpenNMT Klein et al. (2017) and AllenNLP Gardner et al. (2018). OpenNMT is an open-source toolkit mainly targeting neural machine translation and other natural language generation tasks. AllenNLP provides several pre-built models for NLP tasks, such as semantic role labeling, machine comprehension, and textual entailment. Although these toolkits reduce the development cost, they are limited to certain tasks and thus not flexible enough to support new network architectures or new components.

3 Design

NeuronBlocks is built on PyTorch. The overall framework is illustrated in Figure 1. It consists of two layers: the Block Zoo and the Model Zoo. In the Block Zoo, the most commonly used components of deep neural networks are categorized into several groups according to their functions. Within each category, several alternative components are encapsulated into standard and reusable blocks with a consistent interface. These blocks serve as basic and exchangeable units for constructing complex network architectures for different NLP tasks. In the Model Zoo, the most popular NLP tasks are identified. For each task, several end-to-end network templates are provided in the form of JSON configuration files. Users can simply browse these configurations and choose one or more to instantiate. The whole task can be completed without writing any code.

Figure 1: The overall framework of NeuronBlocks.
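
To make the idea of "exchangeable blocks with a consistent interface" concrete, here is a minimal sketch in plain Python. It is not the NeuronBlocks API: the class names (`Block`, `Scale`, `Shift`, `PairSum`) and the `input_dim`/`output_dim`/`forward` contract are hypothetical stand-ins for real PyTorch layers, chosen only to show how blocks in the same category can be swapped without touching the rest of the stack.

```python
from abc import ABC, abstractmethod
from typing import List

Sequence = List[List[float]]  # one feature vector per token


class Block(ABC):
    """Hypothetical shared interface: every block declares its dimensions."""

    def __init__(self, input_dim: int) -> None:
        self.input_dim = input_dim

    @property
    @abstractmethod
    def output_dim(self) -> int:
        """Feature size produced by this block."""

    @abstractmethod
    def forward(self, seq: Sequence) -> Sequence:
        """Map a token sequence to a new token sequence."""


class Scale(Block):
    """Toy encoder: multiplies every feature by a constant."""

    def __init__(self, input_dim: int, factor: float) -> None:
        super().__init__(input_dim)
        self.factor = factor

    @property
    def output_dim(self) -> int:
        return self.input_dim

    def forward(self, seq: Sequence) -> Sequence:
        return [[x * self.factor for x in tok] for tok in seq]


class Shift(Block):
    """Toy encoder with the same interface: adds a constant to every feature."""

    def __init__(self, input_dim: int, offset: float) -> None:
        super().__init__(input_dim)
        self.offset = offset

    @property
    def output_dim(self) -> int:
        return self.input_dim

    def forward(self, seq: Sequence) -> Sequence:
        return [[x + self.offset for x in tok] for tok in seq]


class PairSum(Block):
    """Toy reducer: halves the feature size by summing adjacent features."""

    @property
    def output_dim(self) -> int:
        return self.input_dim // 2

    def forward(self, seq: Sequence) -> Sequence:
        return [[tok[i] + tok[i + 1] for i in range(0, len(tok) - 1, 2)] for tok in seq]


def run_stack(blocks: List[Block], seq: Sequence) -> Sequence:
    """Apply blocks in order, checking that dimensions line up."""
    for block in blocks:
        if any(len(tok) != block.input_dim for tok in seq):
            raise ValueError(f"{type(block).__name__}: dimension mismatch")
        seq = block.forward(seq)
    return seq


tokens = [[1.0, 2.0, 3.0, 4.0], [0.0, 1.0, 0.0, 1.0]]
outputs = {}
# Swap the encoder while the rest of the stack stays unchanged.
for encoder in (Scale(4, 2.0), Shift(4, 1.0)):
    stack = [encoder, PairSum(encoder.output_dim)]
    outputs[type(encoder).__name__] = run_stack(stack, tokens)
print(outputs)
```

Because both encoders expose the same `input_dim`, `output_dim`, and `forward`, the downstream block never needs to know which one was chosen; this is the property that lets a configuration file substitute one block for another.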

3.1 Block Zoo

We recognize the following major functional categories of neural network components. Each category covers as many commonly used modules as possible. The Block Zoo is an open framework, and more modules can be added in the future.

  • Embedding Layer: Word/character embeddings and extra handcrafted feature embeddings, such as POS tags, are supported.

  • Neural Network Layers: The Block Zoo provides common layers such as RNN, CNN, QRNN Bradbury et al. (2016), Transformer Vaswani et al. (2017), Highway network, and Encoder-Decoder architecture. Because attention mechanisms are widely used in neural networks, we also support multiple attention layers, such as Linear/Bi-linear Attention, Full Attention Huang et al. (2017), and Bidirectional Attention Flow Seo et al. (2016). Regularization layers such as Dropout, Layer Norm, and Batch Norm are also supported to improve generalization.

  • Loss Function: Besides the loss functions built into PyTorch, we offer more options such as Focal Loss Lin et al. (2017).

  • Metrics: For classification tasks, AUC, Accuracy, Precision/Recall, and F1 are supported. For sequence labeling tasks, F1 and Accuracy are supported. For knowledge distillation tasks, MSE and RMSE are supported. For MRC tasks, ExactMatch and F1 are supported.

Figure 2: An example of the model architecture interface for a sequence labeling model in NeuronBlocks.
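
As an illustration of the Focal Loss option mentioned above, the following sketch implements the binary form from Lin et al. (2017), FL(p_t) = -alpha_t (1 - p_t)^gamma log(p_t), in plain Python. The function names are hypothetical, the defaults gamma = 2 and alpha = 0.25 are the values recommended in that paper, and whether NeuronBlocks uses the same defaults is not stated in the source.

```python
import math


def cross_entropy(p: float, y: int, eps: float = 1e-12) -> float:
    """Standard binary cross-entropy for one example (p = P(class 1))."""
    p_t = p if y == 1 else 1.0 - p
    return -math.log(min(max(p_t, eps), 1.0))


def focal_loss(p: float, y: int, gamma: float = 2.0, alpha: float = 0.25,
               eps: float = 1e-12) -> float:
    """Binary focal loss for one example.

    p_t is the probability assigned to the true class; the factor
    (1 - p_t) ** gamma shrinks the loss of well-classified examples.
    """
    p_t = p if y == 1 else 1.0 - p
    p_t = min(max(p_t, eps), 1.0)  # avoid log(0)
    alpha_t = alpha if y == 1 else 1.0 - alpha
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)


# An easy positive (p = 0.9) versus a hard positive (p = 0.1).
for p in (0.9, 0.1):
    ce, fl = cross_entropy(p, 1), focal_loss(p, 1)
    print(f"p={p}: cross-entropy={ce:.4f}  focal={fl:.6f}  ratio={fl / ce:.4f}")
```

The ratio column shows the design intent: the easy example keeps only a small fraction of its cross-entropy loss, so training gradients are dominated by the hard examples.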

3.2 Model Zoo

In NeuronBlocks, we identify the four most popular types of NLP tasks. For each task, we provide various end-to-end network templates.

  • Text Classification and Matching. Tasks such as domain/intent classification and question-answer matching are supported.

  • Sequence Labeling. Classify each token in a sequence into one of a set of predefined types. Common tasks include NER, POS tagging, and slot tagging.

  • Knowledge Distillation Hinton et al. (2015). Teacher-student knowledge distillation is a common approach to model compression. NeuronBlocks provides a knowledge distillation template to improve the inference speed of heavy DNN models such as BERT/GPT.

  • Extractive Machine Reading Comprehension. Given a question and a passage, predict the start and end positions of the answer span in the passage.
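
For the extractive MRC task, a model typically emits one start score and one end score per passage token, and the answer is the best-scoring valid span. The sketch below shows one common way to decode such scores; the source does not describe how NeuronBlocks decodes spans, so the function `best_span`, the `max_len` limit, and the toy scores are all assumptions for illustration.

```python
from typing import List, Tuple


def best_span(start_scores: List[float], end_scores: List[float],
              max_len: int = 30) -> Tuple[int, int]:
    """Return (start, end) maximising start_scores[s] + end_scores[e]
    subject to s <= e and a span length of at most max_len tokens."""
    if not start_scores or len(start_scores) != len(end_scores):
        raise ValueError("score lists must be non-empty and of equal length")
    best, best_score = (0, 0), float("-inf")
    for s, s_score in enumerate(start_scores):
        for e in range(s, min(len(end_scores), s + max_len)):
            score = s_score + end_scores[e]
            if score > best_score:
                best, best_score = (s, e), score
    return best


passage = "NeuronBlocks is built on PyTorch .".split()
# Toy scores: taking each argmax independently would give start=4, end=0,
# which is not a valid span, so the constraint s <= e matters.
start_scores = [0.1, 0.0, 0.2, 0.1, 2.5, 0.0]
end_scores = [3.0, 0.1, 0.0, 0.2, 2.0, 0.1]

s, e = best_span(start_scores, end_scores)
answer = " ".join(passage[s:e + 1])
print((s, e), answer)
```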

| Results (F1-score) | | WLSTM+CRF | WLSTM | WCNN+CRF | WCNN |
|---|---|---|---|---|---|
| Nochar | Literature | (N) | 87.00 (M-16); (N) | (N) | (N) |
| Nochar | NeuronBlocks | 89.34 | 88.50 | 88.72 | 88.51 |
| CLSTM | Literature | 90.94 (L-16); (N) | 89.15 (L-16); (N) | (N) | (N) |
| CLSTM | NeuronBlocks | 91.03 | 90.67 | 90.27 | 90.37 |
| CCNN | Literature | (C-16); 91.21 (M-16); (P-17); (N) | 89.36 (M-16); (N) | (N) | (N) |
| CCNN | NeuronBlocks | 91.38 | 90.63 | 90.41 | 90.36 |

Table 2: NeuronBlocks results on CoNLL-2003 English NER testb dataset. The abbreviation (C-16)= Chiu and Nichols (2016), (L-16)= Lample et al. (2016), (M-16)= Ma and Hovy (2016), (N)= Yang et al. (2018), (P-17)= Peters et al. (2017).

| Model | CoLA | SST-2 | QQP | MNLI | QNLI | RTE | WNLI |
|---|---|---|---|---|---|---|---|
| BiLSTM (Literature) | 17.6 | 87.5 | 85.3/82.0 | 66.7 | 77.0 | 58.5 | 56.3 |
| +Attn (Literature) | 17.6 | 87.5 | 87.7/83.9 | 70.0 | 77.2 | 58.5 | 60.6 |
| BiLSTM (NeuronBlocks) | 20.4 | 87.5 | 86.4/83.1 | 69.8 | 79.8 | 59.2 | 59.2 |
| +Attn (NeuronBlocks) | 25.1 | 88.3 | 87.8/83.9 | 73.6 | 81.0 | 58.9 | 59.8 |

Table 3: NeuronBlocks results on GLUE benchmark development sets. As described in Wang et al. (2019), for CoLA, we report Matthews correlation. For QQP, we report accuracy and F1. For MNLI, we report accuracy averaged over the matched and mismatched development sets. For all other tasks we report accuracy. All values have been scaled by 100. Please note that results on the development sets are reported, since GLUE does not distribute labels for the test sets.

3.3 User Interface

NeuronBlocks provides a convenient user interface for building, training, and testing DNN models. The interface consists of the following parts.

  • I/O interface. This part defines the model input and output, such as training data, pre-trained models/embeddings, and the model saving path.

  • Model Architecture interface. This is the key part of the configuration file, which defines the whole model architecture. Figure 2 shows an example of how to specify a model architecture using the blocks in NeuronBlocks. More specifically, it consists of a list of layers/blocks that construct the architecture, where the blocks are supplied by the Block Zoo.

  • Training Parameters interface. This part specifies the model optimizer and all other training hyper-parameters.
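
The three parts above can be pictured as three sections of one JSON file. The sketch below builds such a configuration as a Python dictionary and serializes it with the standard `json` module. Every key name, block name, and path here is a hypothetical placeholder chosen to mirror the description, not the actual NeuronBlocks schema; consult the repository for the real format.

```python
import json

# Hypothetical configuration for a sequence labeling model.
config = {
    # I/O interface: data, pre-trained embeddings, model saving path.
    "inputs": {
        "train_data": "data/train.tsv",
        "pretrained_embedding": "data/embeddings.txt",
        "model_save_path": "models/ner/",
    },
    # Model architecture interface: an ordered list of blocks.
    "architecture": [
        {"layer_id": "emb", "layer": "Embedding",
         "conf": {"dim": 100}, "inputs": []},
        {"layer_id": "encoder", "layer": "BiLSTM",
         "conf": {"hidden_dim": 128}, "inputs": ["emb"]},
        # 17 labels = 4 entity types x {B, I, E, S} + O under BIOES.
        {"layer_id": "tagger", "layer": "Linear",
         "conf": {"num_labels": 17}, "inputs": ["encoder"]},
    ],
    # Training parameters interface: optimizer and hyper-parameters.
    "training_params": {
        "optimizer": {"name": "Adam", "lr": 0.001},
        "batch_size": 32,
        "max_epoch": 10,
    },
}


def check_config(cfg: dict) -> list:
    """Return a list of structural problems; an empty list means none found."""
    problems = []
    for section in ("inputs", "architecture", "training_params"):
        if section not in cfg:
            problems.append(f"missing section: {section}")
    seen = set()
    for layer in cfg.get("architecture", []):
        for src in layer.get("inputs", []):
            if src not in seen:
                problems.append(f"{layer['layer_id']} reads undefined layer {src}")
        seen.add(layer["layer_id"])
    return problems


json_text = json.dumps(config, indent=2)
print(json_text)
print("problems:", check_config(json.loads(json_text)))
```

Changing a hyper-parameter or swapping `"BiLSTM"` for another encoder block is then a one-line edit to the file rather than a code change, which is the workflow the interface is designed around.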

Figure 3: The workflow of NeuronBlocks.

3.4 Workflow

| Model | Inference Speed (QPS) | Parameters | Performance (AUC) |
|---|---|---|---|
| Teacher Model (BERTbase) | 448 | 110M | 0.9112 |
| Student Model (BiLSTMAttn+TextCNN) | 11128 | 13.63M | 0.8941 |

Table 4: NeuronBlocks results on Knowledge Distillation task.

Figure 3 shows the workflow of building DNN models in NeuronBlocks. Users only need to write a JSON configuration file. They can either instantiate an existing template from Model Zoo, or construct a new architecture based on the blocks from Block Zoo. This configuration file is shared across training, test, and prediction.

For model hyper-parameter tuning or architecture modification, users just need to change the JSON configuration file. Advanced users can also contribute novel customized blocks to the Block Zoo, as long as they follow the same interface guidelines as the existing blocks. These new blocks can then be shared across all users for model architecture design. Moreover, NeuronBlocks has flexible platform support, covering both GPU and CPU as well as GPU management platforms such as PAI.
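
The workflow described above (a configuration file selects blocks by name, and advanced users plug in their own) is commonly implemented with a registry. The sketch below shows that pattern on toy token-processing blocks. The registry, decorator, block names, and JSON keys are all hypothetical; the source does not describe how NeuronBlocks resolves block names internally.

```python
import json

BLOCK_REGISTRY = {}  # maps a block name used in the config to its class


def register_block(name):
    """Class decorator that makes a block available to configuration files."""
    def decorator(cls):
        BLOCK_REGISTRY[name] = cls
        return cls
    return decorator


@register_block("Lowercase")
class Lowercase:
    def __call__(self, tokens):
        return [t.lower() for t in tokens]


@register_block("Truncate")
class Truncate:
    def __init__(self, max_len):
        self.max_len = max_len

    def __call__(self, tokens):
        return tokens[: self.max_len]


def build_pipeline(json_text):
    """Instantiate the blocks listed under 'architecture', in order."""
    blocks = []
    for spec in json.loads(json_text)["architecture"]:
        name = spec["layer"]
        if name not in BLOCK_REGISTRY:
            raise KeyError(f"unknown block: {name}")
        blocks.append(BLOCK_REGISTRY[name](**spec.get("conf", {})))
    return blocks


def run(blocks, tokens):
    for block in blocks:
        tokens = block(tokens)
    return tokens


# An advanced user contributes a custom block by following the same interface.
@register_block("DropShort")
class DropShort:
    def __init__(self, min_chars):
        self.min_chars = min_chars

    def __call__(self, tokens):
        return [t for t in tokens if len(t) >= self.min_chars]


CONFIG = """
{"architecture": [
  {"layer": "Lowercase"},
  {"layer": "DropShort", "conf": {"min_chars": 3}},
  {"layer": "Truncate", "conf": {"max_len": 3}}
]}
"""

result = run(build_pipeline(CONFIG), "NeuronBlocks is a DNN toolkit for NLP tasks".split())
print(result)
```

Once `DropShort` is registered, it is available to every configuration file in the same way as the built-in blocks, without any change to `build_pipeline`.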

4 Experiments

To verify the performance of NeuronBlocks, we conducted extensive experiments for common NLP tasks on public data sets including CoNLL-2003 Sang and Meulder (2003), GLUE benchmark Wang et al. (2019), and WikiQA corpus Yang et al. (2015). The experimental results showed that the models built with NeuronBlocks can achieve reliable and competitive results on various tasks, with productivity greatly improved.

4.1 Sequence Labeling

For the sequence labeling task, we evaluated NeuronBlocks on the CoNLL-2003 Sang and Meulder (2003) English NER dataset, following most works on the same task. This dataset includes four types of named entities, namely PERSON, LOCATION, ORGANIZATION, and MISC. We adopted the BIOES tagging scheme instead of IOB, as many previous works reported meaningful improvement with the BIOES scheme Ratinov and Roth (2009); Dai et al. (2015). Table 2 shows the results on the CoNLL-2003 English testb dataset, with 12 different combinations of network layers/blocks, such as word/character embedding, CNN/LSTM, and CRF. The results suggest that the flexible combination of layers/blocks in NeuronBlocks can easily reproduce the original models, with comparable or slightly better performance.
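
To illustrate the tagging-scheme change, the sketch below converts a tag sequence to BIOES, where single-token entities get an S- tag and the last token of a multi-token entity gets an E- tag. It assumes the input is in the IOB2 variant (every entity starts with B-); the function name and the example sentence are made up for illustration, and the PER/LOC/ORG/MISC abbreviations follow the usual CoNLL-2003 convention.

```python
from typing import List


def iob2_to_bioes(tags: List[str]) -> List[str]:
    """Convert IOB2 tags (B-X, I-X, O) to BIOES tags (B-X, I-X, E-X, S-X, O)."""
    out = []
    for i, tag in enumerate(tags):
        if tag == "O":
            out.append(tag)
            continue
        prefix, label = tag.split("-", 1)
        next_tag = tags[i + 1] if i + 1 < len(tags) else "O"
        continues = next_tag == f"I-{label}"  # does the entity go on?
        if prefix == "B":
            out.append(("B-" if continues else "S-") + label)
        elif prefix == "I":
            out.append(("I-" if continues else "E-") + label)
        else:
            raise ValueError(f"unexpected tag: {tag}")
    return out


tokens = ["New", "York", "Stock", "Exchange", "opened", "in", "Manhattan"]
iob2 = ["B-ORG", "I-ORG", "I-ORG", "I-ORG", "O", "O", "B-LOC"]
bioes = iob2_to_bioes(iob2)
for token, old, new in zip(tokens, iob2, bioes):
    print(f"{token:10s} {old:6s} -> {new}")
```

With four entity types, BIOES yields 4 x 4 + 1 = 17 labels instead of the 9 used by IOB2, giving the model explicit signals for entity boundaries.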

4.2 GLUE Benchmark

The General Language Understanding Evaluation (GLUE) benchmark Wang et al. (2019) is a collection of natural language understanding tasks. We experimented on the GLUE benchmark tasks using BiLSTM and attention-based models. As shown in Table 3, the models built with NeuronBlocks achieve competitive or even better results on GLUE tasks with minimal coding effort.
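
Table 3 reports Matthews correlation for CoLA rather than accuracy. As a reminder of what that metric measures, here is a small plain-Python sketch of the binary Matthews correlation coefficient, MCC = (TP·TN − FP·FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN)). The function name and the toy labels are illustrative; returning 0 when the denominator is zero is a common convention, adopted here as an assumption.

```python
import math
from typing import List


def matthews_corrcoef(y_true: List[int], y_pred: List[int]) -> float:
    """Binary Matthews correlation coefficient in [-1, 1]."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return 0.0 if denom == 0 else (tp * tn - fp * fn) / denom


def accuracy(y_true: List[int], y_pred: List[int]) -> float:
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


y_true = [1, 1, 1, 1, 0, 0, 0, 0]
model_pred = [1, 1, 1, 0, 0, 0, 1, 1]   # an imperfect classifier
constant_pred = [1] * len(y_true)        # always predicts class 1

for name, pred in (("model", model_pred), ("constant", constant_pred)):
    mcc = 100 * matthews_corrcoef(y_true, pred)  # scaled by 100 as in Table 3
    print(f"{name:8s} accuracy={accuracy(y_true, pred):.3f}  MCC x100={mcc:.1f}")
```

A constant predictor can reach non-trivial accuracy but scores zero under Matthews correlation, which is why the metric is informative when classes are unbalanced.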

4.3 Knowledge Distillation

We evaluated the Knowledge Distillation task in NeuronBlocks on a dataset collected from a commercial search engine. We refer to this dataset as the Domain Classification Dataset. Each sample in this dataset consists of two parts: a question and a binary label indicating whether the question belongs to a specific domain. Table 4 shows the results, where Area Under Curve (AUC) is used as the performance evaluation criterion and Queries per Second (QPS) is used to measure inference speed. With the knowledge distillation training approach, the student model built with NeuronBlocks achieved a 23-27 times inference speedup with only a small performance regression compared with the fine-tuned BERTbase classifier.
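
The trade-off in Table 4 can be read off with a few lines of arithmetic. The numbers below are copied from the table; the variable names are illustrative, and the single QPS values in the table give one point inside the 23-27 times range quoted in the text.

```python
# Values copied from Table 4.
teacher = {"qps": 448, "params_millions": 110.0, "auc": 0.9112}
student = {"qps": 11128, "params_millions": 13.63, "auc": 0.8941}

speedup = student["qps"] / teacher["qps"]
compression = teacher["params_millions"] / student["params_millions"]
auc_drop = teacher["auc"] - student["auc"]
relative_drop = auc_drop / teacher["auc"]

print(f"inference speedup:   {speedup:.1f}x")
print(f"parameter reduction: {compression:.1f}x")
print(f"AUC drop:            {auc_drop:.4f} ({100 * relative_drop:.2f}% relative)")
```

In other words, the student is roughly 25 times faster and about 8 times smaller, at a cost of under 0.02 AUC.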

4.4 WikiQA

The WikiQA corpus Yang et al. (2015) is a publicly available dataset for open-domain question answering. This dataset contains 3,047 questions from Bing query logs, each associated with some candidate answer sentences from Wikipedia. We conducted experiments on the WikiQA dataset using CNN, BiLSTM, and attention-based models. The results are shown in Table 5. The models built with NeuronBlocks achieved competitive or even better results with simple model configurations.

| Model | AUC |
|---|---|
| CNN (Yang et al. (2015)) | 73.59 |
| CNN-Cnt (Yang et al. (2015)) | 75.33 |
| CNN (NeuronBlocks) | 74.79 |
| BiLSTM (NeuronBlocks) | 76.73 |
| BiLSTM+Attn (NeuronBlocks) | 75.48 |
| BiLSTM+MatchAttn (NeuronBlocks) | 78.54 |

Table 5: NeuronBlocks results on WikiQA.
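
Tables 4 and 5 report AUC. As a reference for how that number is defined, the sketch below computes ROC AUC directly from its pairwise interpretation: the probability that a randomly chosen positive example is scored above a randomly chosen negative one, with ties counted as one half. This quadratic-time version is for illustration only; the function name and toy data are assumptions, and the source does not show the toolkit's own implementation.

```python
from typing import List


def roc_auc(labels: List[int], scores: List[float]) -> float:
    """ROC AUC via pairwise comparison of positive and negative scores."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy example: candidate answers with relevance labels and model scores.
labels = [1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.4, 0.3, 0.1]
auc = roc_auc(labels, scores)
print(f"AUC = {auc:.4f}  (x100 = {100 * auc:.2f})")
```

Here five of the six positive-negative pairs are ordered correctly, so the AUC is 5/6.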

5 Conclusion and Future Work

In this paper, we introduce NeuronBlocks, a DNN toolkit for NLP tasks built on PyTorch. NeuronBlocks targets three types of engineers and provides a two-layer solution to satisfy the requirements of all three. More specifically, the Model Zoo consists of various templates for the most common NLP tasks, while the Block Zoo supplies a gallery of alternative layers/modules for the networks. This design achieves a balance between generality and flexibility. Extensive experiments have verified the effectiveness of this approach. NeuronBlocks has been widely used in a product team of a commercial search engine, where it significantly improved the productivity of developing NLP DNN approaches.

NeuronBlocks is an open-source toolkit, and we will further extend it in various directions. A few examples follow.

  • Multi-task training. Currently NeuronBlocks supports single-task training. We plan to support multi-task training soon.

  • Pre-training and fine-tuning. Deep pre-trained models such as ELMo Peters et al. (2018), GPT Radford et al. (2018), and BERT Devlin et al. (2018) are new directions in NLP. We will support these models as well.

  • AutoML. Currently NeuronBlocks helps users build models on top of the Model Zoo and Block Zoo. With the integration of AutoML techniques, the toolkit can further support automatic model architecture design for specific tasks and data.

References