1 Introduction
Deep neural networks (DNNs) are beginning to find many real-time applications, such as speech recognition, autonomous driving, gesture recognition, and robotic control (Sak et al., 2015; Chen et al., 2015; Jalab et al., 2015; Corradini et al., 2015). Although most deep neural networks are implemented on GPUs (Graphics Processing Units) these days, implementing them in hardware can offer substantial benefits in power consumption and system size (Ovtcharov et al., 2015). FPGA-based implementations of CNNs show more than a 10-times advantage in power consumption (Ovtcharov et al., 2015).
Neural network algorithms employ many multiply-and-add (MAC) operations that mimic the operations of biological neurons. This suggests that reconfigurable hardware arrays containing fairly homogeneous hardware blocks, such as MAC units, can provide very efficient solutions for real-time neural network system design. Early studies on the word-length determination of neural networks reported a required precision of at least 8 bits (Holt & Baker, 1991). Our recent works show that the precision required for implementing an FFDNN, CNN, or RNN need not be very high, especially when the quantized networks are retrained to learn the effects of lowered precision. In the fixed-point optimization examples shown in Hwang & Sung (2014); Anwar et al. (2015); Shin et al. (2015), neural networks with ternary weights achieved performance quite close to that of floating-point arithmetic.
In this work, we investigate whether retraining can recover the performance of an FFDNN and a CNN under quantization with only ternary (+1, 0, −1) levels or 3 bits (+3, +2, +1, 0, −1, −2, −3) for the weight representation. Note that the bias values are not quantized. For this study, the network complexity is varied to analyze its effect on the performance gap between floating-point and retrained low-precision fixed-point deep neural networks.
We conduct our experiments with a feed-forward deep neural network (FFDNN) for phoneme recognition and a convolutional neural network (CNN) for image classification. To control the network size, not only the number of units in each layer but also the number of hidden layers is varied in the FFDNN. For the CNN, both the number of feature maps in each layer and the number of layers are changed. The FFDNN uses the TIMIT corpus and the CNN employs the CIFAR-10 dataset. We also propose a metric called the effective compression ratio (ECR) for comparing extremely quantized bigger networks with moderately quantized or floating-point networks of smaller size. This analysis aims to provide insight into the knowledge representation capability of highly quantized networks, as well as a guideline for network size and word-length determination for efficient hardware implementation of DNNs.
2 Related Work
Fixed-point implementation of signal processing algorithms has long been of interest for VLSI-based design of multimedia and communication systems. Some early works used statistical modeling of quantization noise for application to linear digital filters. The simulation-based word-length optimization method utilized simulation tools to evaluate the fixed-point performance of a system, by which nonlinear algorithms can also be optimized (Sung & Kum, 1995). Digital filters with ternary (+1, 0, −1) coefficients were used to eliminate multiplications at the cost of higher quantization noise. Adaptive filters with ternary weights were also developed, but they demanded oversampling to suppress the quantization effects (Hussain et al., 2007).
Fixed-point neural network design has also been studied with the same purpose of reducing the hardware implementation cost (Moerland & Fiesler, 1997). In Holt & Baker (1991), back-propagation simulation with 16-bit integer arithmetic was conducted for several problems, such as NetTalk, Parity, and Protein. The experiments varied the number of hidden units, which was, however, kept relatively small. The integer simulations showed quite good results for the NetTalk and Parity benchmarks, but not for Protein. With direct quantization of trained weights, this work also confirmed satisfactory operation of neural networks with 8-bit precision. An implementation with ternary weights was reported for neural network design with optical fiber networks (Fiesler et al., 1990). In this ternary network design, the authors employed retraining after direct quantization to improve the performance of a shallow network.
Recently, fixed-point design of DNNs has been revisited, and FFDNNs and CNNs with ternary weights have shown performance very close to the floating-point results. The ternary-weight FFDNN and CNN are used for VLSI- and FPGA-based implementations, by which the algorithms can operate with only on-chip memory while consuming very low power (Kim et al., 2014). Binary-weight deep neural network design has also been studied (Courbariaux et al., 2015). Pruned floating-point weights are also utilized for efficient GPU-based implementations, where small-valued weights are forced to zero to reduce the number of arithmetic operations and the memory space for weight storage (Yu et al., 2012b; Han et al., 2015). A network restructuring technique using singular value decomposition has also been studied (Xue et al., 2013; Rigamonti et al., 2013).
3 Fixed-point FFDNN and CNN Design
This section explains the design of the FFDNN and the CNN with varying network complexity, as well as the fixed-point optimization procedure.
3.1 FFDNN and CNN Design
A feed-forward deep neural network with multiple hidden layers is depicted in Figure 1. Each layer $i$ has a signal vector $\mathbf{y}_i$, which is propagated to the next layer by multiplying the weight matrix $\mathbf{W}_{i+1}$, adding the biases $\mathbf{b}_{i+1}$, and applying the activation function $\phi(\cdot)$ as follows:

$$\mathbf{y}_{i+1} = \phi\left(\mathbf{W}_{i+1}\,\mathbf{y}_i + \mathbf{b}_{i+1}\right) \qquad (1)$$

One of the most popular activation functions is the rectified linear unit (ReLU), defined as

$$\phi(x) = \max(0,\, x) \qquad (2)$$
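As a concrete illustration of this forward propagation, the following sketch computes Eq. (1) with the ReLU of Eq. (2) and a softmax output layer. The layer sizes and the names `relu` and `forward` are our own illustrative choices, not the paper's implementation:

```python
import numpy as np

def relu(x):
    # Rectified linear unit: phi(x) = max(0, x), Eq. (2)
    return np.maximum(0.0, x)

def forward(x, weights, biases):
    # Hidden layers: y_{i+1} = phi(W y_i + b), Eq. (1)
    y = x
    for W, b in zip(weights[:-1], biases[:-1]):
        y = relu(W @ y + b)
    # Output layer: softmax over the class scores
    z = weights[-1] @ y + biases[-1]
    e = np.exp(z - z.max())
    return e / e.sum()
```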
In this work, an FFDNN for phoneme recognition is used. The reference DNN has four hidden layers. Each of the hidden layers has $N_h$ units; the value of $N_h$ is changed to control the complexity of the network. We conduct experiments with $N_h$ of 32, 64, 128, 256, 512, and 1024. The number of hidden layers is also reduced. The input layer of the network has 1,353 units to accept 11 frames of a Fourier-transform-based filter bank with 40 coefficients (plus energy) distributed on a mel scale, together with their first and second temporal derivatives. The output layer consists of 61 softmax units corresponding to the 61 target phoneme labels. Phoneme recognition experiments were performed on the TIMIT corpus. The standard 462-speaker set with all SA records removed was used for training, and a separate development set of 50 speakers was used for early stopping. Results are reported for the 24-speaker core test set. The network was trained using the back-propagation algorithm with a mini-batch size of 128. The initial learning rate was decreased during the training. The momentum was 0.9 and RMSProp was adopted for the weight updates
(Tieleman & Hinton, 2012). The dropout technique was employed with a dropout rate of 0.2 in each layer.
The CNN used is for the CIFAR-10 dataset, which contains a training set of 50,000 and a test set of 10,000 32×32 RGB color images representing airplanes, automobiles, birds, cats, deer, dogs, frogs, horses, ships, and trucks. We divided the training set into 40,000 images for training and 10,000 images for validation. The CNN has three convolution and pooling layers and a fully connected hidden layer with 64 units, and the output has 10 softmax units, as shown in Figure 2.
We control the number of feature maps in each convolution layer. The reference configuration has 32-32-64 feature maps with a 5×5 kernel size, as used in Krizhevskey (2014). We did not perform any preprocessing or data augmentation such as ZCA whitening or global contrast normalization. To observe the effects of network size variation, the number of feature maps is reduced or increased. The feature map configurations used for the experiments are 8-8-16, 16-16-32, 32-32-64, 64-64-128, 96-96-192, and 128-128-256. The number of feature map layers is also changed, resulting in 32-32-64, 32-64, and 64 map configurations. Note that the fully connected layer of the CNN is not changed. The network was trained using the back-propagation algorithm with a mini-batch size of 128. The initial learning rate was 0.001 and it was decreased during the training procedure. The momentum was 0.8 and RMSProp was applied for the weight updates.
3.2 Fixed-point optimization of DNNs
Reducing the word-length of weights brings several advantages in hardware-based implementations of neural networks. First, it lowers the arithmetic precision and thereby reduces the number of gates needed for the multipliers. Second, the size of the memory for storing weights is minimized, which is a big advantage when keeping them on-chip instead of in external DRAM or NAND flash memory. Note that FFDNNs and recurrent neural networks demand a very large number of weights. Third, the reduced arithmetic precision or the elimination of off-chip memory accesses leads to low power consumption. However, we need to consider the quantization effects that degrade the system performance.
Direct quantization converts a floating-point value to the closest quantization level, as conventionally done in signal processing system design. However, direct quantization usually demands more than 8 bits and does not perform well when the number of bits is small. In fixed-point deep neural network design, retraining of the quantized weights yields quite good performance.
The fixed-point DNN algorithm design consists of three steps: floating-point training, direct quantization, and retraining of the weights. The floating-point training procedure can use any of the state-of-the-art techniques, which may include unsupervised learning and dropout. Note that the fixed-point optimization needs to be based on the best-performing floating-point weights. Thus, the floating-point weight optimization may need to be conducted several times with different initializations, and this step consumes most of the time. After the floating-point training, direct quantization follows.
For direct quantization, a uniform quantization function is employed, defined as follows:

$$Q(w) = \operatorname{sgn}(w) \cdot \Delta \cdot \min\left( \left\lfloor \frac{|w|}{\Delta} + 0.5 \right\rfloor,\ \frac{M-1}{2} \right) \qquad (3)$$

where $\operatorname{sgn}(\cdot)$ is the sign function, $\Delta$ is the quantization step size, and $M$ represents the number of quantization levels. Note that $M$ needs to be an odd number since the weight values can be positive or negative. When $M$ is 7, the weights are represented by $-3\Delta$, $-2\Delta$, $-\Delta$, $0$, $+\Delta$, $+2\Delta$, $+3\Delta$, which can be encoded in 3 bits. The quantization step size $\Delta$ is determined to minimize the L2 error $E$, defined as follows:

$$E = \frac{1}{N}\sum_{i=1}^{N} \left( w_i - Q(w_i) \right)^2 \qquad (4)$$

where $N$ is the number of weights in each weight group and $w_i$ is the $i$-th weight value represented in floating point. This process needs some iterations, but does not take much time.
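A minimal sketch of this quantizer and step-size search follows. The dense grid search is an assumption on our part (the paper only states that the optimization "needs some iterations"), and `quantize` and `optimal_delta` are illustrative names:

```python
import numpy as np

def quantize(w, delta, M):
    # Eq. (3): Q(w) = sgn(w) * delta * min(floor(|w|/delta + 0.5), (M-1)/2)
    levels = (M - 1) // 2
    return np.sign(w) * delta * np.minimum(
        np.floor(np.abs(w) / delta + 0.5), levels)

def optimal_delta(w, M, n_candidates=1000):
    # Search for the step size minimizing the L2 error of Eq. (4).
    # A simple grid search stands in for the paper's iterative process.
    candidates = np.linspace(1e-3, np.abs(w).max(), n_candidates)
    errors = [np.mean((w - quantize(w, d, M)) ** 2) for d in candidates]
    return candidates[int(np.argmin(errors))]
```

With $M = 7$, the quantized weights take at most seven distinct values, matching the 3-bit representation described above.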
For the network retraining, we maintain both floating-point and quantized weights, because the amount of the weight update in each training step is much smaller than the quantization step size $\Delta$. The forward and backward propagation is conducted using the quantized weights, but the weight updates are applied to the floating-point weights, and newly quantized values are generated at each iteration. This retraining procedure usually converges quickly and does not take much time compared to the floating-point training.
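The update rule just described can be sketched as follows. The gradient computation (forward and backward passes with the quantized weights) is omitted, and the function name and the plain SGD update are our illustrative assumptions; the paper uses RMSProp:

```python
import numpy as np

def retrain_step(w_float, grad, lr, delta, M):
    # The gradient is assumed to come from propagation with the quantized
    # weights; the update is applied to the floating-point master copy,
    # which is then re-quantized for the next iteration.
    levels = (M - 1) // 2
    w_float = w_float - lr * grad  # plain SGD here; the paper uses RMSProp
    w_quant = np.sign(w_float) * delta * np.minimum(
        np.floor(np.abs(w_float) / delta + 0.5), levels)
    return w_float, w_quant
```

Because each update is much smaller than $\Delta$, the float copy accumulates many small steps before the quantized copy changes level.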
4 Analysis of quantization effects
4.1 Direct quantization
The performance of the FFDNN and the CNN with directly quantized weights is analyzed while varying the number of units in each layer or the number of feature maps, respectively. In this analysis, the quantization is performed on each weight group, as illustrated in Figure 1 and Figure 2, to assess the sensitivity to word-length reduction. In this subsection, we analyze the effects of direct quantization.
The quantized weights can be represented as follows:

$$w_i^{(q)} = w_i + d_i \qquad (5)$$

where $d_i$ is the distortion of each weight due to quantization. In direct quantization, we can assume that the distortions are independent of each other.

Consider the computation procedure for a unit in a hidden layer: the signals from the previous layer are summed up after multiplication with the weights, as illustrated in Figure (a). We can also assemble a model for the distortion, shown in Figure (b). In this distortion model, since the $d_i$ are independent of each other, the effects of the summed distortion are reduced according to random process theory: the standard deviation of a sum of independent distortions grows only as the square root of their number, while the number of summed terms grows linearly. This analysis implies that the quantization effects diminish as the number of units in the preceding layer increases, but only slowly.
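A small numerical experiment, under our own simplifying assumptions of distortions uniform in $[-\Delta/2, \Delta/2]$ and unit-variance inputs, illustrates this square-root reduction of the per-input distortion as the fan-in grows:

```python
import numpy as np

def summed_distortion_std(n, trials=5000, delta=0.1, seed=0):
    # Each hidden unit sums n products; quantization contributes an
    # independent distortion d_i per weight.  The per-input-normalized
    # distortion (1/n) * sum(d_i * x_i) shrinks like 1/sqrt(n).
    rng = np.random.default_rng(seed)
    d = rng.uniform(-delta / 2, delta / 2, size=(trials, n))
    x = rng.standard_normal((trials, n))
    return np.std((d * x).sum(axis=1) / n)
```

For example, increasing the fan-in from 100 to 10,000 reduces the normalized distortion by roughly a factor of 10, consistent with the slow improvement noted above.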
Figure (a) illustrates the performance of the FFDNN with floating-point arithmetic, with 2-bit direct quantization of all the weights, and with 2-bit direct quantization applied only to the weight groups 'In-h1', 'h1-h2', and 'h4-out'. Consider the quantization performance of the 'In-h1' layer: the phone error rate is higher than the floating-point result by an almost constant amount, about 10%. Note that the number of inputs to the 'In-h1' layer is fixed at 1,353 regardless of the hidden layer size. Thus, the amount of distortion delivered to each unit of hidden layer 1 can be considered unchanged. Figure (a) also shows the quantization performance for the 'h1-h2' and 'h4-out' layers, which exhibits a shrinking gap to the floating-point performance as the network size increases. This can be explained by the larger number of independent distortions being summed as the network grows. The performance with all weights quantized to 2 bits shows the same trend of a reduced gap to the floating-point performance. But, apparently, the performance of 2-bit directly quantized networks is not satisfactory.
In Figure (b), a similar analysis is conducted for the CNN with direct quantization as the number of feature maps increases or decreases. In the CNN, the number of inputs to each output unit is determined by the number of input feature maps and the kernel size. For example, at the first layer C1, the number of input signals for computing one output is only 75 (= 3 × 25) regardless of the network size, since the number of input maps is always 3 and the kernel size is 25. However, at the second layer C2, the number of input feature maps increases as the network size grows. When the 32-32-64 feature map configuration is considered, the number of inputs for the C2 layer grows to 800 (= 32 × 25). Thus, we can expect a reduced distortion as the number of feature maps increases.
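The fan-in counts above (75 for C1, 800 for C2 in the 32-32-64 configuration) follow directly from the number of input maps and the kernel size; a trivial helper (our own, with a hypothetical name) makes the arithmetic explicit:

```python
def conv_fan_in(in_maps, kernel_size):
    # Number of input values summed for one output unit of a convolution
    # layer: (input feature maps) x (kernel width) x (kernel height).
    return in_maps * kernel_size * kernel_size

# C1: 3 input maps, 5x5 kernel -> 75
# C2 (32-32-64 configuration): 32 input maps, 5x5 kernel -> 800
```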
Figure (a) shows the performance of direct quantization with 2-, 4-, 6-, and 8-bit precision as the network complexity varies. In the FFDNN, 6-bit direct quantization seems sufficient when the layer size is larger than 128, but smaller FFDNNs demand 8 bits for near-floating-point performance. The CNN in Figure (b) shows a similar trend: direct quantization requires about 6 bits when the feature map configuration is 16-16-32 or larger.
4.2 Effects of retraining on quantized networks
Retraining is conducted on the directly quantized networks using the same data as for the floating-point training. The fixed-point performance of the FFDNN is shown in Figure (a) as the number of hidden units in each layer varies. The performance of direct 2-bit (ternary), direct 3-bit (7-level), retrain-based 2-bit, and retrain-based 3-bit quantization is compared with the floating-point simulation. We find that the performance gap between the floating-point and the retrain-based fixed-point networks closes very quickly as the network size grows. Although the gap between the directly quantized and the floating-point networks also closes, the rate of convergence is significantly different. In this figure, the performance of the floating-point network almost saturates when the layer size reaches about 1024. Note that the TIMIT corpus used for training contains only 3 hours of data. Thus, the network with 1024 hidden units can be considered to be in the 'training-data limited region', where the gap between the floating-point and fixed-point networks almost vanishes. However, when the network size is limited, such as 32, 64, 128, or 256 units, some performance gap remains between the floating-point and highly quantized networks even after retraining.
Similar experiments are conducted for the CNN with varying feature map sizes, and the results are shown in Figure (b). The feature map configurations used for the experiments are 8-8-16, 16-16-32, 32-32-64, 64-64-128, 96-96-192, and 128-128-256. The size of the fully connected layer is not changed. In this figure, the floating-point and the retrain-based fixed-point performances also converge very quickly as the number of feature maps increases. The floating-point performance saturates at the 128-128-256 feature map size, where the gap between the floating-point and the retrain-based 2-bit networks is less than 1%. However, some performance gap again appears when the number of feature maps is reduced. This suggests that a fairly high-performance feature extractor can be designed even with very low-precision weights, provided the number of feature maps is increased.
4.3 Fixed-point performance when varying the depth
It is well known that increasing the depth usually has positive effects on the performance of a DNN (Yu et al., 2012a). Here, the network complexity is changed by increasing or reducing the number of hidden layers or feature map levels. The fixed-point and floating-point performances of the FFDNN with a varying number of hidden layers are summarized in Table 1. The number of units in each hidden layer is 512. The table shows that both the floating-point and the fixed-point performance of the FFDNN improves as hidden layers are added, from 0 to 4. The performance gap between the floating-point and the fixed-point networks shrinks as the number of layers increases.

# Quantization levels    Direct    Retraining    Difference

3-level                  69.88%    38.58%        3.91%
7-level                  56.81%    36.57%        1.90%

3-level                  47.74%    33.89%        2.38%
7-level                  36.99%    33.04%        1.53%

3-level                  49.27%    33.05%        2.24%
7-level                  36.58%    31.72%        0.91%

3-level                  48.13%    31.86%        1.55%
7-level                  34.77%    31.49%        1.18%
The network complexity of the CNN is also varied by reducing the number of feature map levels, as shown in Table 2. As expected, the performance of both the floating-point and the retrain-based low-precision networks degrades as the number of levels is reduced. The performance gap between them is very small with 7-level quantization for all feature map levels.
These results for the FFDNN and the CNN with a varying number of levels also show that the effects of quantization can be much reduced by retraining when the network contains some redundant complexity.

# Quantization levels    Direct    Retraining    Difference

3-level                  72.95%    35.37%        1.18%
7-level                  46.60%    34.15%        0.04%

3-level                  55.30%    29.51%        0.22%
7-level                  39.80%    29.32%        0.03%

3-level                  79.88%    27.94%        1.07%
7-level                  47.91%    26.95%        0.08%
5 Effective compression ratio
So far we have examined the effect of direct and retraining-based quantization on the final classification error rate. As the number of quantization levels decreases, more memory space can be saved at the cost of accuracy. Therefore, there is a trade-off between the total memory space for storing the weights and the final classification accuracy. In practice, investigating this trade-off is important for deciding the optimal bit widths for representing weights and for implementing the most efficient neural network hardware.
In this section, we propose a guideline for finding the optimal bit widths in terms of the total number of bits consumed by the network weights when the desired accuracy or the network size is given. Note that we assume $M$ quantization levels are represented by $\lceil \log_2 M \rceil$ bits (i.e., 2 bits are required for a ternary weight). For simplicity, all layers are quantized with the same number of quantization levels. However, a similar approach can be applied to a layer-wise quantization analysis.
The optimal combination of the bit width and the layer size can be found when the total number of bits or the accuracy is given, as shown in Figure 7. The figure shows the framewise phoneme error rate on TIMIT with respect to the total number of bits, while varying the layer size of DNNs quantized with 2 to 8 bits. The network has 4 hidden layers of uniform size. With direct quantization, the optimal hardware design is achieved with about 5 bits. With retraining, on the other hand, the weight representation with only 2 bits shows the best performance.
The remaining question is how much memory space can be saved by quantization while maintaining the accuracy. To examine this, we introduce a metric called the effective compression ratio (ECR), defined as follows:

$$\text{ECR} = \frac{\text{effective uncompressed size}}{\text{compressed size}} \qquad (6)$$

The compressed size is the total number of memory bits required for storing all weights with quantization. The effective uncompressed size is the total memory size with 32-bit floating-point representation of a network that achieves the same accuracy as the quantized network.
Figure 8 describes how to obtain the effective number of parameters of the uncompressed network. Specifically, by varying the size, we find the total number of parameters of the floating-point network that achieves the same accuracy as the quantized one. The effective uncompressed size is then computed by multiplying the effective number of parameters by 32 bits.
Once we have the corresponding effective uncompressed size for a specific network size and number of quantization bits, the ECR can be computed by (6). The ECRs of the direct and retrain-based quantization for various network sizes and quantization bits are shown in Figure 9. For direct quantization, 5-bit quantization shows the best ECR except for the layer size of 1024. With retraining, on the other hand, even 2-bit quantization outperforms the others. That is, after retraining, a bigger network with extreme ternary (2-bit) quantization is more efficient in terms of the memory usage for weights than any smaller network with more quantization bits, when compared at the same accuracy.
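Under the assumption that the floating-point accuracy-versus-size curve is monotonic, the ECR computation of Eq. (6) can be sketched as follows. The function and parameter names are ours, and linear interpolation stands in for reading the measured curve of Figure 8:

```python
import numpy as np

def effective_compression_ratio(n_params_q, bits_q, float_sizes, float_accs, acc_q):
    # Effective number of parameters: the floating-point network size that
    # matches the quantized network's accuracy (linear interpolation over
    # a monotonically increasing accuracy curve).
    eff_params = np.interp(acc_q, float_accs, float_sizes)
    effective_uncompressed_bits = 32.0 * eff_params   # 32-bit float weights
    compressed_bits = bits_q * n_params_q             # quantized weights
    return effective_uncompressed_bits / compressed_bits  # Eq. (6)
```

For instance, if a 2-bit network with 400k parameters matches the accuracy of a 200k-parameter floating-point network, its ECR is 32 × 200k / (2 × 400k) = 8.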
6 Discussion
In this study, we control the network size by changing the number of units in the hidden layers, the number of feature maps, or the number of layers. In all cases, reduced complexity lowers the resiliency to quantization. We are now conducting similar experiments on recurrent neural networks, which are known to be more sensitive to quantization (Shin et al., 2015). This work is directly related to several network optimization methods, such as pruning, fault tolerance, and decomposition (Yu et al., 2012b; Han et al., 2015; Xue et al., 2013; Rigamonti et al., 2013). In pruning, retraining of the weights is conducted after zeroing small-valued weights. The efficiency of pruning, fault tolerance, and network decomposition likely depends on the redundant representation capability of DNNs.
This study can be applied to hardware-efficient DNN design. With limited hardware resources, when the size of the reference DNN is relatively small, it is advisable to employ very low-precision arithmetic and, instead, increase the network complexity as much as the hardware capacity allows. But when the DNN is in the performance saturation region, this strategy does not always gain much, because growing an 'already-big' network brings almost no performance advantage. This can be observed in the (b) panels of the earlier figures, where 6-bit quantization performed best at the largest layer size (1,024).
7 Conclusion
We analyze the performance of fixed-point deep neural networks, an FFDNN for phoneme recognition and a CNN for image classification, while not only changing the arithmetic precision but also varying their network complexity. The low-precision networks for this analysis are obtained by using the retrain-based quantization method, and the network complexity is controlled by changing the configurations of the hidden layers or feature maps. The performance gap between the floating-point and the fixed-point neural networks with ternary weights (+1, 0, −1) almost vanishes when the DNNs are in the performance saturation region for the given training data. However, when the complexity of a DNN is reduced, by lowering the number of units, feature maps, or hidden layers, the performance gap between them increases. In other words, a large network that contains redundant representation capability for the given training data is not hurt by the lowered precision, but a very compact network is.
Acknowledgments
This work was supported in part by the Brain Korea 21 Plus Project and the National Research Foundation of Korea (NRF) grants funded by the Korea government (MSIP) (No. 2015R1A2A1A10056051).
References
 Anwar et al. (2015) Anwar, Sajid, Hwang, Kyuyeon, and Sung, Wonyong. Fixed point optimization of deep convolutional neural networks for object recognition. In Acoustics, Speech and Signal Processing (ICASSP), 2015 IEEE International Conference on, pp. 1131–1135. IEEE, 2015.
 Chen et al. (2015) Chen, Chenyi, Seff, Ari, Kornhauser, Alain, and Xiao, Jianxiong. DeepDriving: Learning affordance for direct perception in autonomous driving. arXiv preprint arXiv:1505.00256, 2015.
 Corradini et al. (2015) Corradini, Maria Letizia, Giantomassi, Andrea, Ippoliti, Gianluca, Longhi, Sauro, and Orlando, Giuseppe. Robust control of robot arms via quasi sliding modes and neural networks. In Advances and Applications in Sliding Mode Control systems, pp. 79–105. Springer, 2015.
 Courbariaux et al. (2015) Courbariaux, Matthieu, Bengio, Yoshua, and David, Jean-Pierre. BinaryConnect: Training deep neural networks with binary weights during propagations. arXiv preprint arXiv:1511.00363, 2015.
 Fiesler et al. (1990) Fiesler, Emile, Choudry, Amar, and Caulfield, H John. Weight discretization paradigm for optical neural networks. In The Hague '90, 12–16 April, pp. 164–173. International Society for Optics and Photonics, 1990.
 Han et al. (2015) Han, Song, Mao, Huizi, and Dally, William J. Deep compression: Compressing deep neural networks with pruning, trained quantization and Huffman coding. arXiv preprint arXiv:1510.00149, 2015.
 Holt & Baker (1991) Holt, Jordan L and Baker, Thomas E. Back propagation simulations using limited precision calculations. In Neural Networks, 1991. IJCNN-91-Seattle International Joint Conference on, volume 2, pp. 121–126. IEEE, 1991.
 Hussain et al. (2007) Hussain, B Zahir M et al. Short word-length LMS filtering. In Signal Processing and Its Applications, 2007. ISSPA 2007. 9th International Symposium on, pp. 1–4. IEEE, 2007.
 Hwang & Sung (2014) Hwang, Kyuyeon and Sung, Wonyong. Fixed-point feedforward deep neural network design using weights +1, 0, and −1. In Signal Processing Systems (SiPS), 2014 IEEE Workshop on, pp. 1–6. IEEE, 2014.
 Jalab et al. (2015) Jalab, Hamid A, Omer, Herman, et al. Human computer interface using hand gesture recognition based on neural network. In Information Technology: Towards New Smart World (NSITNSW), 2015 5th National Symposium on, pp. 1–6. IEEE, 2015.
 Kim et al. (2014) Kim, Jonghong, Hwang, Kyuyeon, and Sung, Wonyong. X1000 real-time phoneme recognition VLSI using feed-forward deep neural networks. In Acoustics, Speech and Signal Processing (ICASSP), 2014 IEEE International Conference on, pp. 7510–7514. IEEE, 2014.
 Krizhevskey (2014) Krizhevskey, A. cuda-convnet, 2014.
 Moerland & Fiesler (1997) Moerland, Perry and Fiesler, Emile. Neural network adaptations to hardware implementations. Technical report, IDIAP, 1997.
 Ovtcharov et al. (2015) Ovtcharov, Kalin, Ruwase, Olatunji, Kim, JooYoung, Fowers, Jeremy, Strauss, Karin, and Chung, Eric S. Accelerating deep convolutional neural networks using specialized hardware. Microsoft Research Whitepaper, 2, 2015.
 Rigamonti et al. (2013) Rigamonti, Roberto, Sironi, Amos, Lepetit, Vincent, and Fua, Pascal. Learning separable filters. In Computer Vision and Pattern Recognition (CVPR), 2013 IEEE Conference on, pp. 2754–2761. IEEE, 2013.
 Sak et al. (2015) Sak, Haşim, Senior, Andrew, Rao, Kanishka, and Beaufays, Françoise. Fast and accurate recurrent neural network acoustic models for speech recognition. arXiv preprint arXiv:1507.06947, 2015.
 Shin et al. (2015) Shin, Sungho, Hwang, Kyuyeon, and Sung, Wonyong. Fixed point performance analysis of recurrent neural networks. arXiv preprint arXiv:1512.01322, 2015.
 Sung & Kum (1995) Sung, Wonyong and Kum, Ki-Il. Simulation-based word-length optimization method for fixed-point digital signal processing systems. Signal Processing, IEEE Transactions on, 43(12):3087–3090, 1995.

 Tieleman & Hinton (2012) Tieleman, Tijmen and Hinton, Geoffrey. Lecture 6.5-rmsprop: Divide the gradient by a running average of its recent magnitude. COURSERA: Neural Networks for Machine Learning, 4, 2012.
 Xue et al. (2013) Xue, Jian, Li, Jinyu, and Gong, Yifan. Restructuring of deep neural network acoustic models with singular value decomposition. In INTERSPEECH, pp. 2365–2369, 2013.
 Yu et al. (2012a) Yu, Dong, Deng, Alex Acero, Dahl, George, Seide, Frank, and Li, Gang. More data + deeper model = better accuracy. In keynote at International Workshop on Statistical Machine Learning for Speech Processing, 2012a.
 Yu et al. (2012b) Yu, Dong, Seide, Frank, Li, Gang, and Deng, Li. Exploiting sparseness in deep neural networks for large vocabulary speech recognition. In Acoustics, Speech and Signal Processing (ICASSP), 2012 IEEE International Conference on, pp. 4409–4412. IEEE, 2012b.