We aim to implement deep neural networks in edge-computing environments for real-world applications such as the Internet of Things (IoT) and FinTech, in order to exploit the significant achievements of Deep Learning in recent years. In particular, we focus on algorithm implementation on FPGAs, because the FPGA is one of the most promising devices for low-cost, low-power edge computing. In this work we introduce B-DCGAN: a Deep Convolutional GAN model with binary weights and activations, and with integer-valued operations in the forward pass (at both train-time and run-time). We then show how to implement B-DCGAN on an FPGA (Xilinx Zynq). Using B-DCGAN, we conduct a feasibility study of the FPGA's characteristics and performance for Deep Learning. Binarization and integer-valued operations reduce the required memory capacity and the number of circuit gates, which makes them very effective for FPGA implementation. On the other hand, these reductions degrade the quality of the data generated by the model, so we also investigate their influence.
As is well known, the Graphics Processing Unit (GPU) is currently the most common computing resource for Deep Neural Networks (DNNs). The GPU is a highly effective device for executing huge-scale numerical operations such as those in DNNs. Furthermore, thanks to the efforts of researchers and developers worldwide, there are now many software tools, libraries, and frameworks tailored to the GPU, so it is easy to write DNN software for it.
However, the GPU has very high power consumption and therefore needs a rich power supply and cooling equipment, which makes it difficult to use in small edge machines for the IoT and similar settings.
As a platform for low-power, low-cost dedicated computation, the Field-Programmable Gate Array (FPGA) has recently been attracting attention again, since its cost performance has improved and the price of the related software tools has declined.
Although the FPGA is user-programmable hardware, its development in the early days was quite difficult and complicated, because programs had to be written in 'low-level' notations such as a Hardware Description Language (HDL) or schematic circuit diagrams, so developers needed these kinds of special skills.
Nowadays, however, owing to advances in High-Level Synthesis (HLS), developers can program FPGAs in a high-level language such as C/C++. This is a much easier path for developers who are not hardware experts.
Algorithm development on FPGAs still has the following problems:
The number of gates and the capacity of fast memory are much smaller than on a GPU. Multiple FPGAs and/or external memories can be used, but the communication overhead slows processing down considerably.
The more an algorithm uses multiplication, division, or floating-point versions of these operations, the more gates it consumes.
If the circuit is not optimized in hardware-specific ways, such as parallelization or pipelining, the FPGA can end up slower than a CPU in the same clock range.
As solutions to these problems, various studies have reduced the circuit scale needed for DNNs:
Courbariaux et al. showed that image classification with neural networks using binary weights and activations (BNNs) is quite feasible, and presented a method to train BNNs.
Umuroglu et al. presented FINN, a framework for building fast and flexible FPGA accelerators that enables efficient mapping of binarized neural networks to hardware.
A good survey has been published by Abdelouahab et al.; it reports various implementation techniques for CNNs on FPGAs.
Another good survey was written by Cheng et al., who discuss recent techniques for compacting and accelerating CNNs in detail.
DCGAN (Radford et al.) is a GAN training method for image generation using CNNs.
This work makes the following contributions:
We introduce B-DCGAN, a binarized, integer-valued deep convolutional and conditional generative adversarial network.
We show how the quality of the B-DCGAN Generator's output changes as the extent of binarization is varied.
Binary Deep Convolutional & Conditional Generative Adversarial Network (B-DCGAN) is a version of DCGAN whose Generator has binary weights, binary activations, and integer-valued operations. The Discriminator is a vanilla network as in normal DCGAN, i.e., it uses real-valued operations. B-DCGAN also uses a conditional input, as in CGAN.
The network structure of the B-DCGAN Generator is shown in Fig. 1. The first half of the Generator, which we call the Encoder, consists of fully connected layers. The second half, named the Decoder, consists of deconvolution layers.
We describe these layers in detail in the following sections.
The Generator takes two inputs: a noise vector and a class-label vector. In vanilla GAN, the noise is a 1D vector of random floating-point values in a fixed range. In B-DCGAN, we instead use integer-valued input: the noise vector is multiplied by a scale constant and rounded to integers.
The class-label input is a one-hot vector representation of the class to be generated; it is converted into integer values in the same way.
The scale constant is a hyper-parameter of B-DCGAN, and the converted inputs are integers represented with signed k bits. Integer values are very advantageous from a hardware perspective, because integer processing requires far fewer circuit logics than floating-point processing on an FPGA.
We have investigated how this scale setting affects the quality of the Generator's output, as explained in Section 3.
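The integer conversion described above can be sketched as follows. The function name and the exact round/clip formula are our assumptions (the paper's equation is elided here), but the idea is that the input is scaled by the constant and rounded into a signed k-bit range:

```python
import numpy as np

def to_signed_int(x, scale, k=16):
    """Scale a float vector by the B-DCGAN scale constant and
    round to a signed k-bit integer; the exact round/clip
    formula here is an assumption, not the paper's equation."""
    lo, hi = -(2 ** (k - 1)), 2 ** (k - 1) - 1
    return np.clip(np.rint(x * scale), lo, hi).astype(np.int32)

# Noise input: uniform floats in [-1, 1] mapped to integers.
z = np.random.uniform(-1.0, 1.0, size=100)
z_int = to_signed_int(z, scale=127)

# Class-label input: one-hot vector mapped the same way.
y = np.eye(10, dtype=np.float32)[3]   # one-hot for class 3
y_int = to_signed_int(y, scale=127)
```

With a scale of 127 and k = 16, every converted value fits comfortably in the signed k-bit range, so no clipping actually occurs in this example.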
The implementation of the fully connected layer in B-DCGAN is based on that of BNN. During the backward pass (at train-time), the weights are real-valued variables, as in a normal fully connected layer. During the forward pass (at both run-time and train-time), the weights are binarized, i.e., constrained to either +1 or -1: a weight is mapped to +1 if it is non-negative and to -1 otherwise.
We have investigated how this binarization affects the quality of the Generator's output, as explained in Section 3.
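The deterministic binarization above can be sketched in a few lines (a minimal NumPy illustration of the BNN-style sign mapping, not the actual Theano training code):

```python
import numpy as np

def binarize(w):
    """Deterministic forward-pass binarization: +1 where
    w >= 0, -1 otherwise. The real-valued w is kept for the
    backward pass and the weight updates."""
    return np.where(w >= 0, 1.0, -1.0)

w_real = np.array([0.7, -0.2, 0.0, -1.3])
w_bin = binarize(w_real)   # -> [ 1., -1.,  1., -1.]
```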
In our B-DCGAN, a batch-normalization layer is always followed by a non-linear activation layer, so we treat these two layers as one. This treatment is based on the 'Batchnorm-activation as Threshold' technique in the FINN paper (Umuroglu et al.), which we have modified into an integer-valued version.
Let a_j be the output of the j-th neuron in the previous layer, and let (mu_j, sigma_j, gamma_j, beta_j) be the batch-normalization parameters learned during training. The output of this layer is computed by comparing a_j with a per-neuron threshold tau_j: the output is +1 if a_j reaches the threshold and -1 otherwise. The threshold tau_j is computed by solving gamma_j * (tau_j - mu_j) / sigma_j + beta_j = 0 for tau_j and rounding the result to an integer.
During training, the threshold keeps changing as the batchnorm parameters change, but at run-time it can be used as a fixed value.
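The threshold trick can be sketched as follows. This is our reading of the FINN-style formulation, adapted to integers; `mu`, `sigma`, `gamma`, `beta` name the per-neuron batchnorm parameters, and a negative `gamma` flips the comparison direction:

```python
import numpy as np

def bn_threshold(mu, sigma, gamma, beta):
    """Solve gamma * (tau - mu) / sigma + beta = 0 for tau and
    round to an integer so the run-time comparison stays
    integer-valued (rounding may shift the boundary by one)."""
    return int(np.rint(mu - beta * sigma / gamma))

def bn_act(a, mu, sigma, gamma, beta):
    """Batchnorm + binary activation collapsed into a single
    threshold comparison; a negative gamma flips the direction."""
    tau = bn_threshold(mu, sigma, gamma, beta)
    if gamma >= 0:
        return 1 if a >= tau else -1
    return 1 if a <= tau else -1

# Equivalent (up to rounding) to sign(gamma * (a - mu) / sigma + beta).
```

At run-time only the precomputed integer `tau` and the sign of `gamma` are needed, so the whole batchnorm-plus-activation pair costs one integer comparison per neuron.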
The behavior of the binarized deconvolution layer (B-Deconv) is similar to that of the B-FC layer described in 2.3. That is, during the backward pass (at train-time), the filter weights are real-valued variables, as in normal deconvolution. During the forward pass (at both run-time and train-time), all filter weights are binarized, i.e., constrained to either +1 or -1 according to equation (6).
We have also investigated the influence of B-Deconv on the output quality of the Generator (see 3.2.4).
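The hardware benefit of binary filter weights can be illustrated with a small sketch (our illustration, not the HLS code): with weights constrained to +1/-1, every multiply-accumulate in the (de)convolution inner loop degenerates to an add or a subtract, so no hardware multiplier is needed.

```python
import numpy as np

def bin_weight_dot(x_int, w_bin):
    """Inner product with w_bin constrained to +1/-1: every
    'multiplication' degenerates to an add or a subtract, so
    no hardware multiplier is required."""
    acc = 0
    for xv, wv in zip(x_int, w_bin):
        acc += xv if wv > 0 else -xv
    return acc

x = [3, -1, 4, 2]          # integer activations
w = [1, -1, -1, 1]         # binarized filter weights
assert bin_weight_dot(x, w) == int(np.dot(x, w))
```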
To examine B-DCGAN, we arranged several configurations of network binarization. We describe each configuration as the set of flags and hyper-parameters listed in Table 1 and call it a 'Scenario'. The 'Input as integer' flag indicates whether the input is integer-valued (Y) or real-valued (n). The other flags indicate whether each layer is binarized (Y) or not (n). The scale value is the input scale explained in 2.2; it takes effect only when 'Input as integer' is positive (Y).
Table 1 (excerpt). 'Input as integer' flag per scenario: S0: n; S1-1, S1-2, S2-1, S2-2, S3-1, S3-2: Y.
The number of units in Full Connection layer is 600.
Filter size of Deconvolution layer-1/2 is 5x5.
The number of filters of Deconvolution layer-1 is 64.
The number of filters of Deconvolution layer-2 is 1.
We used the MNIST dataset for training. We wrote the training program in Python using the Theano and Lasagne libraries, partially reusing BinaryNet and DCGAN code. We ran the program on Ubuntu Linux with an NVIDIA Titan-X GPU. Our program code is available on-line (https://github.com/hterada/b-dcgan).
Training used stochastic gradient descent (SGD) with mini-batches of size 128 and an initial learning rate of 0.0001, which was linearly decreased down to 0 with a decay step of 0.0001/3000.
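The schedule amounts to simple linear decay; a minimal sketch, assuming (as the 0.0001/3000 step suggests) that the decay runs over 3000 steps:

```python
def lr_at(iteration, lr0=1e-4, total_steps=3000):
    """Linear decay from lr0 at step 0 down to 0 at total_steps
    (the 3000-step horizon is an assumption from the decay step)."""
    return max(0.0, lr0 * (1.0 - iteration / total_steps))
```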
We judged when to stop training in each scenario by visually assessing the quality of the Generator's output images. At each training iteration, the program writes the Generator's current output image and parameters to individual files, so we can examine these files to find when the quality peaks. For example, Fig. 2 shows output images from the middle of training together with the peak image. As the figure shows, quality rises over the iterations of training.
When training finished, we had the set of model parameters of peak quality. These parameters were written to Python pickle files with the filename extension '.jl'. We wrote a dedicated Python program, 'model_to_ch.py', which converts a '.jl' file into C++ header files in which the model parameters are expressed as C++ constant variables. These header files are used to build the B-DCGAN Generator in Xilinx Vivado HLS for the FPGA.
A binarized parameter represented as +1 or -1 in the training model should be represented as 1 or 0 (a 1-bit value) on the FPGA, so 'model_to_ch.py' converts the +1/-1 parameters into this bit-mapped form.
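The +1/-1 to 1/0 mapping can be sketched as follows (the function name `pm1_to_bits` is hypothetical, not the actual script's API):

```python
import numpy as np

def pm1_to_bits(params):
    """Map trained +1/-1 parameters to the 1/0 encoding used
    in the generated C++ headers (+1 -> 1, -1 -> 0)."""
    return ((np.asarray(params) + 1) // 2).astype(np.uint8)

w = np.array([1, -1, -1, 1])
bits = pm1_to_bits(w)   # -> [1, 0, 0, 1]
```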
Scenario S0 is the baseline for image quality. Fig. 3 shows the peak-quality output image (we call it the 'peak image') from the trained model configured as S0.
Scenarios S2-1 and S2-2 are experiments examining the influence of the Encoder's input scale value.
S2-1 sets the scale value to 127.
S2-2 sets the scale value to 4095.
Scenarios S3-1 and S3-2 are experiments examining the influence of Decoder binarization.
S3-1 binarizes the Encoder, plus B-Deconv-1 and B-BNA-2 in the Decoder.
S3-2 binarizes the Encoder, plus B-Deconv-1, B-BNA-2 and B-Deconv-2 in the Decoder, i.e., all layers in the Decoder.
From the results of the scenarios above, we make the following observations:
Output quality is affected very little by binarizing only the Encoder (cf. S0, S1-1, S1-2).
No influence on quality appears when the input scale value is changed (cf. S2-1, S2-2).
Output quality is somewhat degraded by binarizing the Encoder and part of the Decoder (cf. S3-1).
Output quality is seriously degraded by fully binarizing both the Encoder and the Decoder (cf. S3-2).
Output quality depends chiefly on the Deconv-2 layer, given the quality difference between S3-1 and S3-2.
At least on the MNIST dataset, we can select S3-1 as the best scenario for B-DCGAN.
We have introduced B-DCGAN: DCGAN with binary weights and activations and integer-valued input. We conducted experiments under several scenarios using the Theano/Lasagne libraries, which show that the DCGAN model can be binarized, and to what extent binarization is possible while keeping the output quality acceptable.
According to these experimental results, the last layer, B-Deconv-2, has the dominant influence on output image quality, so the last layer has to operate with real-valued processing. All other layers can be fully binarized.
We would like to thank the UEC Shouno lab (http://daemon.inf.uec.ac.jp/ja/) members: Satoshi Suzuki and Aiga Suzuki for theoretical and technical discussion; Seigo Kawamura, Kurosaka Mamoru, Yoshihiro Kusano, Toya Teramoto and Akihiro Endo for kind technical assistance and humor. We thank Kazuhiko Yoshihara and all the members of Open Stream, Inc. We also thank the developers of Theano, Lasagne and the Python environment. This work is supported by funds from Open Stream, Inc. (https://www.opst.co.jp/)