NeuronBlocks
NLP DNN Toolkit - Building Your NLP DNN Models Like Playing Lego
When building deep neural network models for natural language processing tasks, engineers often spend a lot of effort on coding details and debugging, instead of focusing on model architecture design and hyper-parameter tuning. In this paper, we introduce NeuronBlocks, a deep neural network toolkit for natural language processing tasks. In NeuronBlocks, a suite of neural network layers is encapsulated as building blocks, which can easily be used to build complicated deep neural network models by configuring a simple JSON file. NeuronBlocks empowers engineers to build and train various NLP models in seconds even without a single line of code. A series of experiments on real NLP datasets such as GLUE and WikiQA have been conducted, which demonstrates the effectiveness of NeuronBlocks.
Deep Neural Networks (DNN) have been widely employed in industry for solving various Natural Language Processing (NLP) tasks, such as text classification, sequence labeling, question answering, etc. However, when engineers apply DNN models to address specific NLP tasks, they often face the following challenges.
Multiple mainstream DNN frameworks, including TensorFlow, PyTorch, Keras, etc. It takes significant overhead to learn how to program under these frameworks.
Diverse and fast-evolving DNN models, such as CNN, RNN, and Transformer. It takes considerable effort to understand the intuition and mathematics behind these models.
Various regularization and optimization mechanisms. Tuning the performance of networks for both quality and efficiency requires experience with Dropout, Normalization, Gradient Accumulation, Mixed-precision training, etc.
Coding and debugging complexity. Programming under DNN frameworks requires developers to be familiar with the built-in packages and interfaces, and it takes considerable expertise to author, debug, and optimize code.
Platform compatibility. It requires extra coding work to run on different platforms, such as Linux/Windows, GPU/CPU.
The above challenges often hinder the productivity of engineers and result in suboptimal solutions to their given tasks. This motivates us to develop an NLP toolkit for DNN models, which helps engineers develop DNN approaches more efficiently. Before designing this NLP toolkit, we conducted a survey among engineers and identified a spectrum of three typical personas.
The first type of engineers prefer off-the-shelf networks. Given a specific task, they expect the toolkit to suggest several end-to-end network architectures, so that they can simply focus on collecting the training data and tuning the model parameters. They expect the whole process to be extremely agile and easy.
The second type of engineers would like to build the networks by themselves. However, instead of writing every line of code from scratch, they expect the toolkit to provide a rich gallery of reusable modules as building blocks, so that they can compare various model architectures constructed from these building blocks.
The last type of engineers are advanced users. They want to reuse most parts of the existing networks, but for critical components, they would like to make innovations and create their own modules. They expect the toolkit to have an open infrastructure, so that customized modules can be easily plugged in.
To satisfy the requirements of all three personas above, the NLP toolkit has to be generic enough to cover as many tasks as possible. At the same time, it also needs to be flexible enough to allow alternative network architectures as well as customized modules. Therefore, we analyzed the NLP jobs submitted to a commercial centralized GPU cluster. Table 1 shows that about 87.5% of NLP-related jobs belong to a few common tasks, including sentence classification, text matching, sequence labeling, MRC, etc. It further suggests that more than 90% of the networks were composed of several common components, such as embedding, CNN/RNN, Transformer, and so on.
Tasks | Ratio |
---|---|
Text matching | 39.4% |
Sentence classification | 27.3% |
Sequence labeling | 14.7% |
Machine reading comprehension | 6.0% |
Others | 12.5% |
Based on the above observations, we developed NeuronBlocks, a DNN toolkit for NLP tasks. The basic idea is to provide two layers of support to the engineers. The upper layer targets common NLP tasks. For each task, the toolkit contains several end-to-end network templates, which can be immediately instantiated with simple configuration. The bottom layer consists of a suite of reusable and standard components, which can be adopted as building blocks to construct networks with complex architecture. By following the interface guidelines, users can also contribute to this gallery of components with their own modules.
The technical contributions of NeuronBlocks are summarized in the following three aspects.
Block Zoo: categorize and abstract the most commonly used DNN components into standard and reusable blocks. The blocks within the same category can be used interchangeably.
Model Zoo: identify the most popular NLP tasks and provide alternative end-to-end network templates (in JSON format) for each task.
Platform Compatibility: support both Linux and Windows machines, CPU/GPU chips, as well as GPU platforms such as PAI (https://github.com/Microsoft/pai).
There are several general-purpose deep learning frameworks, such as TensorFlow, PyTorch, and Keras, which have gained popularity in the NLP community. These frameworks offer great flexibility in DNN model design and support various NLP tasks. However, building models under these frameworks requires a large overhead of mastering framework details. Therefore, a higher-level abstraction that hides the framework details is favored by many engineers.
There are also several popular deep learning toolkits in NLP, including OpenNMT Klein et al. (2017) and AllenNLP Gardner et al. (2018). OpenNMT is an open-source toolkit mainly targeting neural machine translation and other natural language generation tasks. AllenNLP provides several pre-built models for NLP tasks, such as semantic role labeling, machine comprehension, textual entailment, etc. Although these toolkits reduce the development cost, they are limited to certain tasks and thus not flexible enough to support new network architectures or new components.
NeuronBlocks is built on PyTorch. The overall framework is illustrated in Figure 1. It consists of two layers: the Block Zoo and the Model Zoo. In the Block Zoo, the most commonly used components of deep neural networks are categorized into several groups according to their functions. Within each category, several alternative components are encapsulated into standard and reusable blocks with a consistent interface. These blocks serve as basic and exchangeable units for constructing complex network architectures for different NLP tasks. In the Model Zoo, the most popular NLP tasks are identified. For each task, several end-to-end network templates are provided in the form of JSON configuration files. Users can simply browse these configurations and choose one or more to instantiate. The whole task can be completed without any coding effort.
We recognize the following major functional categories of neural network components. Each category covers as many commonly used modules as possible. The Block Zoo is an open framework, and more modules can be added in the future by implementing the common block interface; a minimal sketch of such a block follows the list below.
Embedding Layer: Word/character embedding and extra handcrafted feature embedding, such as part-of-speech tags, are supported.
Neural Network Layers: The Block Zoo provides common layers such as RNN, CNN, QRNN Bradbury et al. (2016), Transformer Vaswani et al. (2017), Highway networks, Encoder-Decoder architectures, etc. Furthermore, since attention mechanisms are widely used in neural networks, we also support multiple attention layers, such as Linear/Bi-linear Attention, Full Attention Huang et al. (2017), and Bidirectional Attention Flow Seo et al. (2016). Meanwhile, regularization layers such as Dropout, Layer Norm, and Batch Norm are also supported to improve generalization ability.
Loss Function: Besides the loss functions built into PyTorch, we offer more options, such as Focal Loss Lin et al. (2017).
Metrics: For the classification task, AUC, Accuracy, Precision/Recall, and F1 metrics are supported. For the sequence labeling task, F1/Accuracy are supported. For the knowledge distillation task, MSE/RMSE are supported. For the MRC task, ExactMatch/F1 are supported.
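To make the notion of a reusable block concrete, here is a minimal PyTorch sketch of a highway-style block that could live in such a gallery. The class name, constructor arguments, and the (inputs, lengths) forward contract are illustrative assumptions, not the toolkit's actual interface.

```python
import torch
import torch.nn as nn

class HighwayBlock(nn.Module):
    """Sketch of a reusable block: a single highway layer.

    The (inputs, lengths) -> (outputs, lengths) contract is an assumed
    convention here; it keeps blocks chainable regardless of category.
    """
    def __init__(self, hidden_dim):
        super().__init__()
        self.transform = nn.Linear(hidden_dim, hidden_dim)
        self.gate = nn.Linear(hidden_dim, hidden_dim)

    def forward(self, inputs, lengths=None):
        # inputs: [batch, seq_len, hidden_dim]
        h = torch.relu(self.transform(inputs))    # candidate representation
        g = torch.sigmoid(self.gate(inputs))      # how much of the candidate to keep
        outputs = g * h + (1.0 - g) * inputs      # highway combination
        return outputs, lengths


# Usage: stack it with other blocks exactly like any nn.Module.
block = HighwayBlock(hidden_dim=256)
x = torch.randn(8, 20, 256)                       # [batch, seq_len, hidden_dim]
y, _ = block(x)
```

Because every block keeps the same contract, swapping one layer for another within a category amounts to changing one entry in the configuration rather than rewriting code.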
In NeuronBlocks, we identify four of the most popular NLP task types. For each task, we provide various end-to-end network templates.
Text Classification and Matching. Tasks such as domain/intent classification and question-answer matching are supported.
Sequence Labeling. Assign each token in a sequence to one of the predefined types. Common tasks include NER, POS tagging, slot tagging, etc.
Knowledge Distillation Hinton et al. (2015). Teacher-student based knowledge distillation is one common approach for model compression. NeuronBlocks provides a knowledge distillation template to improve the inference speed of heavy DNN models like BERT/GPT; a minimal sketch of the underlying objective is given after this list.
Extractive Machine Reading Comprehension. Given a question and a passage, predict the start and end positions of the answer span in the passage.
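To illustrate the teacher-student idea behind the knowledge distillation template, the following is a minimal sketch that blends a soft-target term (matching teacher scores, here with MSE, consistent with the MSE/RMSE metrics above) with the ordinary hard-label loss. The function name, the `alpha` weighting, and the binary-classification setup are illustrative assumptions, not the toolkit's exact objective.

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_scores, labels, alpha=0.5):
    """Blend imitation of the teacher with fitting the ground-truth labels.

    student_logits: [batch, 1] raw scores from the light student model
    teacher_scores: [batch, 1] probabilities produced offline by the teacher (e.g. BERT)
    labels:         [batch, 1] binary ground-truth labels as floats
    alpha:          weight of the soft-target term (illustrative default)
    """
    student_probs = torch.sigmoid(student_logits)
    soft_loss = F.mse_loss(student_probs, teacher_scores)          # match the teacher
    hard_loss = F.binary_cross_entropy(student_probs, labels)      # match the labels
    return alpha * soft_loss + (1.0 - alpha) * hard_loss
```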
Results (F1-score) | WLSTM+CRF | WLSTM | WCNN+CRF | WCNN
---|---|---|---|---
Nochar (NeuronBlocks) | 89.34 | 88.50 | 88.72 | 88.51
CLSTM (NeuronBlocks) | 91.03 | 90.67 | 90.27 | 90.37
CCNN (NeuronBlocks) | 91.38 | 90.63 | 90.41 | 90.36
Model | CoLA | SST-2 | QQP | MNLI | QNLI | RTE | WNLI |
---|---|---|---|---|---|---|---|
BiLSTM (Literature) | 17.6 | 87.5 | 85.3/82.0 | 66.7 | 77.0 | 58.5 | 56.3 |
+Attn (Literature) | 17.6 | 87.5 | 87.7/83.9 | 70.0 | 77.2 | 58.5 | 60.6 |
BiLSTM (NeuronBlocks) | 20.4 | 87.5 | 86.4/83.1 | 69.8 | 79.8 | 59.2 | 59.2 |
+Attn (NeuronBlocks) | 25.1 | 88.3 | 87.8/83.9 | 73.6 | 81.0 | 58.9 | 59.8 |
NeuronBlocks provides a convenient user interface for users to build, train, and test DNN models. The details are described below, followed by an illustrative configuration sketch.
I/O interface. This part defines model input/output, such as training data, pre-trained models/embeddings, model saving path, etc.
Model Architecture interface. This is the key part of the configuration file, which defines the whole model architecture. Figure 2 shows an example of how to specify a model architecture using the blocks in NeuronBlocks. More specifically, it consists of a list of layers/blocks used to construct the architecture, where the blocks are drawn from the Block Zoo gallery.
Training Parameters interface. In this part, the model optimizer as well as all other training hyper-parameters are specified.
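The fragment below sketches how these three sections might look in a configuration, written as a Python dictionary for readability; a real template would be the equivalent JSON file. The key names (`inputs`, `outputs`, `architecture`, `training_params`), layer identifiers, and file paths are illustrative assumptions rather than the authoritative Model Zoo schema.

```python
# Illustrative configuration sketch; key names and values are assumptions.
config = {
    "inputs": {                                    # I/O interface
        "train_data_path": "./data/train.tsv",
        "pre_trained_emb": "./data/glove.300d.txt",
    },
    "outputs": {                                   # I/O interface (model saving path)
        "save_base_dir": "./models/demo/",
    },
    "architecture": [                              # Model Architecture interface
        {"layer": "Embedding", "conf": {"dim": 300}},
        {"layer": "BiLSTM", "conf": {"hidden_dim": 256, "dropout": 0.3}},
        {"layer": "LinearAttention", "conf": {}},
        {"layer": "Linear", "conf": {"hidden_dim": 2}},   # two-class output
    ],
    "training_params": {                           # Training Parameters interface
        "optimizer": "Adam",
        "learning_rate": 0.001,
        "batch_size": 32,
        "max_epoch": 10,
    },
}
```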
Model | Inference Speed (QPS) | Parameters | AUC
---|---|---|---
Teacher Model (BERT-base) | 448 | 110M | 0.9112
Student Model (BiLSTMAttn+TextCNN) | 11128 | 13.63M | 0.8941
Figure 3 shows the workflow of building DNN models in NeuronBlocks. Users only need to write a JSON configuration file. They can either instantiate an existing template from Model Zoo, or construct a new architecture based on the blocks from Block Zoo. This configuration file is shared across training, test, and prediction.
For model hyper-parameter tuning or architecture modification, users just need to change the JSON configuration file. Advanced users can also contribute novel customized blocks to the Block Zoo, as long as they follow the same interface guidelines as the existing blocks. These new blocks can then be shared across all users for model architecture design. Moreover, NeuronBlocks has flexible platform support, covering GPU/CPU as well as GPU management platforms like PAI.
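As an example of this workflow, a small hyper-parameter sweep can be scripted by loading a template, editing one field, and writing out a new configuration for each run, with no model code changes. The file names and the `training_params`/`learning_rate` keys are the same illustrative assumptions as in the configuration sketch above.

```python
import json

# Load an existing template (illustrative file name), vary one hyper-parameter,
# and write a separate configuration per run.
with open("bilstm_attn_template.json") as f:
    config = json.load(f)

for lr in (1e-3, 5e-4, 1e-4):
    config["training_params"]["learning_rate"] = lr      # assumed key, see sketch above
    with open(f"run_lr_{lr}.json", "w") as out:
        json.dump(config, out, indent=2)
```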
To verify the performance of NeuronBlocks, we conducted extensive experiments for common NLP tasks on public data sets including CoNLL-2003 Sang and Meulder (2003), GLUE benchmark Wang et al. (2019), and WikiQA corpus Yang et al. (2015). The experimental results showed that the models built with NeuronBlocks can achieve reliable and competitive results on various tasks, with productivity greatly improved.
For the sequence labeling task, we evaluated NeuronBlocks on the CoNLL-2003 Sang and Meulder (2003) English NER dataset, following most previous work on this task. This dataset includes four types of named entities, namely PERSON, LOCATION, ORGANIZATION, and MISC. We adopted the BIOES tagging scheme instead of IOB, as many previous works indicated a meaningful improvement with the BIOES scheme Ratinov and Roth (2009); Dai et al. (2015). Table 2 shows the results on the CoNLL-2003 English testb dataset, with 12 different combinations of network layers/blocks, such as word/character embedding, CNN/LSTM, and CRF. The results suggest that the flexible combination of layers/blocks in NeuronBlocks can easily reproduce the original models, with comparable or slightly better performance.
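The BIOES scheme marks single-token entities (S-) and entity endings (E-) explicitly, giving the tagger more fine-grained supervision than plain IOB. A minimal conversion sketch, assuming standard BIO-style input tags, is shown below.

```python
def bio_to_bioes(tags):
    """Convert a BIO tag sequence to BIOES (standard conversion rules assumed)."""
    bioes = []
    for i, tag in enumerate(tags):
        if tag == "O":
            bioes.append(tag)
            continue
        prefix, label = tag.split("-", 1)
        next_tag = tags[i + 1] if i + 1 < len(tags) else "O"
        continues = next_tag == "I-" + label          # does the entity keep going?
        if prefix == "B":
            bioes.append(("B-" if continues else "S-") + label)
        else:  # prefix == "I"
            bioes.append(("I-" if continues else "E-") + label)
    return bioes

print(bio_to_bioes(["B-PER", "I-PER", "O", "B-LOC"]))
# -> ['B-PER', 'E-PER', 'O', 'S-LOC']
```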
The General Language Understanding Evaluation (GLUE) benchmark Wang et al. (2019) is a collection of natural language understanding tasks. We experimented on the GLUE benchmark tasks using BiLSTM and Attention based models. As shown in Table 3, the models built by NeuronBlocks can achieve competitive or even better results on GLUE tasks with minimal coding efforts.
We evaluated the knowledge distillation task in NeuronBlocks on a dataset collected from one commercial search engine, which we refer to as the Domain Classification Dataset. Each sample in this dataset consists of two parts, i.e., a question and a binary label indicating whether the question belongs to a specific domain. Table 4 shows the results, where the Area Under Curve (AUC) metric is used as the performance evaluation criterion and Queries per Second (QPS) is used to measure inference speed. With the knowledge distillation training approach, the student model built with NeuronBlocks achieved a 23-27x inference speedup with only a small performance regression compared with the fine-tuned BERT-base classifier.
The WikiQA corpus Yang et al. (2015) is a publicly available dataset for open-domain question answering. This dataset contains 3,047 questions from Bing query logs, each associated with some candidate answer sentences from Wikipedia. We conducted experiments on WikiQA dataset using CNN, BiLSTM, and Attention based models. The results are shown in Table 5. The models built in NeuronBlocks achieved competitive or even better results with simple model configurations.
In this paper, we introduce NeuronBlocks, a DNN toolkit for NLP tasks built on PyTorch. NeuronBlocks targets three types of engineers, and provides a two-layer solution to satisfy the requirements from all three types of users. To be more specific, the Model Zoo consists of various templates for the most common NLP tasks, while the Block Zoo supplies a gallery of alternative layers/modules for the networks. Such design achieves a balance between generality and flexibility. Extensive experiments have verified the effectiveness of this approach. NeuronBlocks has been widely used in a product team of a commercial search engine, and significantly improved the productivity for developing NLP DNN approaches.
As an open-source toolkit, we will further extend it in various directions. A few examples follow.
Multi-task training. Currently NeuronBlocks supports single-task training. We plan to support multi-task training soon.
AutoML. Currently NeuronBlocks helps users build models on top of the Model Zoo and Block Zoo. With the integration of AutoML techniques, the toolkit could further support automatic model architecture design for specific tasks and data.
Tsung-Yi Lin, Priya Goyal, Ross Girshick, Kaiming He, and Piotr Dollár. 2017. Focal loss for dense object detection. In Proceedings of the IEEE International Conference on Computer Vision, pages 2980–2988.