Joint Architecture and Knowledge Distillation in Convolutional Neural Network for Offline Handwritten Chinese Text Recognition

12/17/2019
by Zi-Rui Wang, et al.

The technique of distillation helps transform a cumbersome neural network into a compact one so that the model can be deployed on alternative hardware devices. The main advantages of distillation-based approaches are a simple training process, support by most off-the-shelf deep learning software, and no special hardware requirements. In this paper, we propose a guideline for distilling the architecture and knowledge of pre-trained standard CNNs simultaneously. We first make a quantitative analysis of the baseline network, covering the computational cost and storage overhead of its different components. Then, according to the analysis results, optional strategies can be adopted to compress the fully-connected layers. For vanilla convolution layers, the proposed parsimonious convolution (ParConv) block, consisting only of depthwise separable convolution and pointwise convolution, is used as a direct replacement without other adjustments such as changing the widths and depths of the network. Finally, knowledge distillation with multiple losses is adopted to improve the performance of the compact CNN. The proposed algorithm is first verified on offline handwritten Chinese text recognition (HCTR), where the CNNs are characterized by tens of thousands of output nodes and trained on hundreds of millions of samples. Compared with the CNN in the state-of-the-art system, our proposed joint architecture and knowledge distillation reduces the computational cost by more than 10x and the model size by more than 8x with negligible loss of accuracy. Then, through experiments on MNIST, one of the most popular datasets, we demonstrate that the proposed approach can also be successfully applied to mainstream backbone networks.
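
To make the two core ideas of the abstract concrete, the following is a minimal PyTorch sketch, not the authors' implementation: a ParConv-style block built only from a depthwise separable convolution and a pointwise (1x1) convolution, used as a drop-in replacement for a vanilla convolution layer, together with a distillation objective that combines a soft-target loss against the pre-trained teacher with the usual hard-label loss. The class and function names, the way the two branches are combined, and the temperature and weighting values are illustrative assumptions, not details taken from the paper.

import torch
import torch.nn as nn
import torch.nn.functional as F

class ParConvBlock(nn.Module):
    # Hypothetical drop-in replacement for a vanilla 3x3 convolution, using only
    # a depthwise separable convolution branch and a pointwise (1x1) convolution
    # branch; summing the two branches is an assumption.
    def __init__(self, in_channels, out_channels, stride=1):
        super().__init__()
        # Depthwise separable convolution: per-channel 3x3 filtering followed by
        # a 1x1 projection to the output width.
        self.depthwise = nn.Conv2d(in_channels, in_channels, kernel_size=3,
                                   stride=stride, padding=1,
                                   groups=in_channels, bias=False)
        self.project = nn.Conv2d(in_channels, out_channels, kernel_size=1, bias=False)
        # Parallel pointwise (1x1) convolution branch.
        self.pointwise = nn.Conv2d(in_channels, out_channels, kernel_size=1,
                                   stride=stride, bias=False)
        self.bn = nn.BatchNorm2d(out_channels)

    def forward(self, x):
        return F.relu(self.bn(self.project(self.depthwise(x)) + self.pointwise(x)))

def distillation_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.5):
    # Knowledge distillation with multiple losses: a softened KL term that
    # transfers the teacher's outputs plus the standard cross-entropy on the
    # ground-truth labels; T and alpha are illustrative hyper-parameters.
    soft = F.kl_div(F.log_softmax(student_logits / T, dim=1),
                    F.softmax(teacher_logits / T, dim=1),
                    reduction="batchmean") * (T * T)
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1.0 - alpha) * hard

In this reading of the abstract, every vanilla convolution layer of the pre-trained CNN is replaced by such a block while the widths and depths are kept unchanged, and the compact network is then trained with the combined loss against the original network's outputs.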

Related research

09/28/2019 - Training convolutional neural networks with cheap convolutions and online distillation
The large memory and computation consumption in convolutional neural net...

02/26/2017 - Building Fast and Compact Convolutional Neural Networks for Offline Handwritten Chinese Character Recognition
Like other problems in computer vision, offline handwritten Chinese char...

12/30/2018 - A High-Performance CNN Method for Offline Handwritten Chinese Character Recognition and Visualization
Recent researches introduced fast, compact and efficient convolutional n...

04/04/2018 - Building Efficient CNN Architecture for Offline Handwritten Chinese Character Recognition
Deep convolutional networks based methods have brought great breakthroug...

12/05/2020 - Parallel Blockwise Knowledge Distillation for Deep Neural Network Compression
Deep neural networks (DNNs) have been extremely successful in solving ma...

10/09/2020 - Be Your Own Best Competitor! Multi-Branched Adversarial Knowledge Transfer
Deep neural network architectures have attained remarkable improvements ...

09/19/2016 - A scalable convolutional neural network for task-specified scenarios via knowledge distillation
In this paper, we explore the redundancy in convolutional neural network...
