1 Introduction
Quantization is a widely used and necessary approach for converting heavy Deep Neural Network (DNN) models in Floating Point (FP32) format to a lightweight, lower-precision format compatible with edge-device inference. The introduction of lower-precision computing hardware such as the Qualcomm Hexagon DSP (Codrescu, 2015) resulted in various quantization methods (Morgan and others, 1991; Rastegari et al., 2016; Wu et al., 2016; Zhou et al., 2017; Li et al., 2019; Dong et al., 2019; Krishnamoorthi, 2018) compatible with edge devices. Quantizing an FP32 DNN to INT8 or lower precision reduces the model size substantially, depending on the precision opted for. Also, since the computations happen in lower precision, it implicitly results in faster inference and lower power consumption. These benefits of quantization come with a caveat of accuracy loss, due to the noise introduced in the model's weights and activations. To reduce this accuracy loss, quantization-aware fine-tuning methods were introduced (Zhu et al., 2016; Zhang et al., 2018; Choukroun et al., 2019; Jacob et al., 2018; Baskin et al., 2019; Courbariaux et al., 2015), wherein the FP32 model is trained along with quantizers and quantized weights. The major disadvantage of these methods is that they are computationally intensive and time-consuming, since they involve the whole training process. To address this, various post-training quantization methods (Morgan and others, 1991; Wu et al., 2016; Li et al., 2019; Banner et al., 2019) were developed, which result in trivial to heavy accuracy loss when evaluated on different DNNs. Also, to determine the quantized model's weight and activation ranges, most of these methods require access to training data, which may not always be available for applications with security and privacy constraints involving card details, health records, and personal images. Contemporary research in post-training quantization (Nagel et al., 2019; Cai et al., 2020) eliminated the need for training data by estimating the quantization parameters from the Batch Normalization (BN) layer statistics of the FP32 model, but fails to produce better accuracy when BN layers are not present in the model.
To address the above-mentioned shortcomings, this paper proposes a data-independent post-training quantization method that estimates the quantization ranges by leveraging 'retro-synthesis' data generated from the original FP32 model. This method results in better accuracy than both data-independent and data-dependent state-of-the-art quantization methods for ResNet18, ResNet50 (He et al., 2016), MobileNetV2 (Sandler et al., 2018), AlexNet (Krizhevsky et al., 2012), and ISONet (Qi et al., 2020) on the ImageNet dataset (Deng et al., 2009). It also outperforms state-of-the-art methods at lower precisions such as 6 and 4 bit on the ImageNet and CIFAR-10 datasets. The 'retro-synthesis' data generation takes only 10 to 12 seconds to generate the entire dataset, which is a minimal overhead compared to the benefit of data independence it provides. Additionally, this paper introduces two variants of post-training quantization methods, namely 'Hybrid Quantization' and 'Non-Uniform Quantization'.
2 Prior Art
2.1 Quantization-aware training based methods
An efficient integer-only arithmetic inference method for commonly available integer-only hardware is proposed in Jacob et al. (2018), wherein a training procedure is employed that preserves the accuracy of the model even after quantization. The work in Zhang et al. (2018) trained a quantized-bit compatible DNN and associated quantizers for both weights and activations, instead of relying on hand-crafted quantization schemes, for better accuracy. A 'Trained Ternary Quantization' approach is proposed in Zhu et al. (2016), wherein the model is trained to tolerate reducing the weights to 2-bit precision, achieving a 16× model size reduction without much accuracy loss. Along similar lines, Baskin et al. (2019) proposes a 'Compression-Aware Training' scheme that trains a model to better compress its feature maps at inference time. Similarly, in the BinaryConnect method (Courbariaux et al., 2015), the network is trained with binary weights during the forward and backward passes, which also act as a regularizer. Since these methods largely rely on training the networks with quantized weights and quantizers, their downside is not only that they are time-consuming but also that they demand training data, which is not always accessible.
2.2 Post-training quantization based methods
Several post-training quantization methods have been proposed to replace time-consuming quantization-aware training based methods. The method in Choukroun et al. (2019) avoids full network training by formalizing linear quantization as a 'Minimum Mean Squared Error' problem and achieves better accuracy without retraining the model. The 'ACIQ' method (Banner et al., 2019) achieved accuracy close to FP32 models by estimating an analytical clipping range of activations in the DNN. However, to compensate for the accuracy loss, this method relies on a runtime per-channel quantization scheme for activations, which is inefficient and not hardware friendly. Along similar lines, the OCS method (Zhao et al., 2019) proposes to eliminate outliers for better accuracy with minimal overhead. Though these methods considerably reduce the time taken for quantization, they are unfortunately tightly coupled with training data. Hence they are not suitable for applications where access to training data is restricted. Contemporary research on data-free post-training quantization methods has succeeded in eliminating the need for accessing training data. By adopting a per-tensor quantization approach, the DFQ method (Nagel et al., 2019) achieved accuracy similar to the per-channel quantization approach through cross-layer equalization and bias correction. It successfully eliminated the huge weight-range variations across the channels in a layer by scaling the weights across channels. In contrast, ZeroQ (Cai et al., 2020) proposed a quantization method that eliminates the need for training data by generating distilled data with the help of the Batch Normalization layer statistics of the FP32 model, using it to determine the activation ranges for quantization, and achieved state-of-the-art accuracy. However, these methods tend to suffer accuracy degradation when no Batch Normalization layers are present in the FP32 model. To address the above shortcomings, the main contributions of this paper are as follows:

A data-independent post-training quantization method that generates 'retro-synthesis' data for estimating the activation ranges for quantization, without depending on the Batch Normalization layer statistics of the FP32 model.

A 'Hybrid Quantization' method, a combination of per-tensor and per-channel schemes, that achieves state-of-the-art accuracy with lower inference time compared to fully per-channel quantization schemes.

A 'Non-Uniform Quantization' method, wherein the weights in each layer are clustered and each cluster is allocated a varied number of bins, which achieves about 1% better accuracy than state-of-the-art methods on the ImageNet dataset.
3 Methodology
This section discusses the proposed data-independent post-training quantization methods, namely (a) quantization using retro-synthesis data, (b) Hybrid Quantization, and (c) Non-Uniform Quantization.
3.1 Quantization using retro-synthesis data
In general, post-training quantization schemes mainly consist of two parts: (i) quantizing the weights, which are static in a given trained FP32 model, and (ii) determining the activation ranges for layers like ReLU, Tanh, and Sigmoid, which vary dynamically for different input data. In this paper, asymmetric uniform quantization is used for the weights, whereas the proposed 'retro-synthesis' data is used to determine the activation ranges. It should be noted that we purposefully resort to simple asymmetric uniform quantization for the weights and do not employ any advanced techniques, such as outlier elimination or weight clipping, for the reduction of quantization loss. This is in the interest of demonstrating the effectiveness of 'retro-synthesis' data in accurately determining the quantization ranges of activation outputs. However, in the other two proposed methods, (b) and (c), we introduce two newly developed weight quantization methods for efficient inference with improved accuracy.
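As a concrete reference, the following sketch shows one common formulation of asymmetric uniform quantization with a scale and zero point (the function name and the 8-bit default are illustrative; the exact quantizer used in the paper may differ in rounding and clipping details):

```python
import numpy as np

def asymmetric_uniform_quantize(w, num_bits=8):
    """Return a de-quantized copy of `w` after asymmetric uniform quantization.

    Maps [w.min(), w.max()] onto the integer grid [0, 2**num_bits - 1]
    via a scale and zero point, then maps back to float.
    """
    qmin, qmax = 0, 2 ** num_bits - 1
    w_min, w_max = float(w.min()), float(w.max())
    scale = max(w_max - w_min, 1e-8) / (qmax - qmin)  # guard constant tensors
    zero_point = int(round(qmin - w_min / scale))
    q = np.clip(np.round(w / scale) + zero_point, qmin, qmax)
    return (q - zero_point) * scale

rng = np.random.default_rng(0)
w = rng.standard_normal((64, 64)).astype(np.float32)
w_q = asymmetric_uniform_quantize(w, num_bits=8)
```

The round-trip error is bounded by roughly one quantization step, i.e., about (w.max() − w.min()) / 255 at 8 bits.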
3.1.1 Retro-synthesis Data Generation
Aiming for a data-independent quantization method, it is challenging to estimate activation ranges without access to the training data. An alternative is to use 'random data' drawn from a Gaussian distribution with zero mean and unit variance, which results in inaccurate estimation of activation ranges and thereby poor accuracy. The accuracy degrades rapidly when quantizing to lower precisions such as 6, 4, and 2 bit. Recently, ZeroQ (Cai et al., 2020) proposed a quantization method using distilled data and showed significant improvement, but presented no results on generating distilled data for models without Batch Normalization layers or the corresponding accuracies. In contrast, inspired by ZeroQ (Cai et al., 2020), we put forward a modified version of the data generation approach, relying on the fact that DNNs trained to discriminate between different image classes embed relevant information about the images. Hence, by considering the class loss for a particular image class and traversing backward through the FP32 model, it is possible to generate image data with statistics similar to those of the respective class. The proposed 'retro-synthesis' data generation is therefore based on the property of the trained DNN model that image data maximizing the class score can be generated by incorporating the notion of the class features captured by the model. In this manner, we generate a set of images corresponding to each class on which the model was trained. Since the data is generated from the original model itself, we name it 'retro-synthesis' data. It should be observed that this method has no dependence on the presence of Batch Normalization layers in the FP32 model, thus overcoming the downside of ZeroQ. It is also observed that, for models with Batch Normalization layers, incorporating the proposed class-loss functionality into the distilled-data generation algorithm of ZeroQ results in improved accuracy. The proposed 'retro-synthesis' data generation method is detailed in Algorithm 1. Given a fully trained FP32 model and a class of interest, our aim is to empirically generate an image that is representative of the class in terms of the model's class score. More formally, let $S_c(I)$ be the softmax score of class $c$, computed by the final layer of the model for an image $I$. Thus, the aim is to generate an image $I^*$ such that, when passed to the model, it yields the highest softmax value for class $c$.
The 'retro-synthesis' data generation for a target class $c$ starts with random data $I$ of Gaussian distribution, on which a forward pass is performed to obtain intermediate activations and output labels. We then calculate the aggregated loss, consisting of the loss between the stored BatchNorm statistics and the intermediate activation statistics ($\mathcal{L}_{BN}$), the Gaussian loss ($\mathcal{L}_{G}$), and the class loss ($\mathcal{L}_{C}$) between the output of the forward pass and our target output. The loss formulation in equation (1) is used to calculate $\mathcal{L}_{BN}$ and $\mathcal{L}_{G}$, whereas mean squared error is used to compute $\mathcal{L}_{C}$. The calculated loss is then back-propagated until convergence, generating a batch of retro-synthesis data for class $c$. The same algorithm extends to generating the retro-synthesis data for all classes.

$$\mathcal{L} = \sum_{i} \left( \left\| \mu_i - \hat{\mu}_i \right\|_2^2 + \left\| \sigma_i - \hat{\sigma}_i \right\|_2^2 \right) \qquad (1)$$

where $\mathcal{L}$ is the computed loss and $\mu_i$, $\sigma_i$ and $\hat{\mu}_i$, $\hat{\sigma}_i$ are the mean and standard deviation of the $i$-th activation layer and of the corresponding Batch Normalization layer, respectively. By observing the sample visual representation of the retro-synthesis data compared against random data in Fig. 1, it is evident that the retro-synthesis data captures relevant features of the respective image classes in a DNN-understandable format. Hence, using the retro-synthesis data for the estimation of activation ranges achieves better accuracy than using random data. It also outperforms the state-of-the-art data-free quantization methods (Nagel et al., 2019; Cai et al., 2020) by a good accuracy margin when validated on models with and without Batch Normalization layers. Therefore, the same data generation technique is used in the other two proposed quantization methods, (b) and (c), as well.
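To illustrate the core idea of Algorithm 1 — start from Gaussian noise and update the input by gradient ascent on the class score — here is a minimal runnable sketch on a toy linear-softmax "model". The toy model, learning rate, and step count are illustrative assumptions; the actual method back-propagates the aggregated loss (class loss plus BatchNorm-statistics and Gaussian losses) through the full FP32 DNN:

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def retro_synthesize(W, target_class, steps=500, lr=0.01, rng=None):
    """Gradient ascent on the softmax score of `target_class`.

    W has shape [num_classes, input_dim] and stands in for a trained
    FP32 model; x starts as random Gaussian data, as in Algorithm 1.
    """
    if rng is None:
        rng = np.random.default_rng(0)
    x = rng.standard_normal(W.shape[1])        # random Gaussian start
    for _ in range(steps):
        p = softmax(W @ x)
        # For a linear model, d/dx log softmax_c(Wx) = W[c] - sum_k p_k W[k]
        x = x + lr * (W[target_class] - p @ W)
    return x, float(softmax(W @ x)[target_class])

rng = np.random.default_rng(1)
W = rng.standard_normal((10, 32))              # 10 classes, 32-dim inputs
x_gen, score = retro_synthesize(W, target_class=3, rng=np.random.default_rng(2))
```

After optimization, `x_gen` scores highly for the chosen class; this is the property the generated images rely on when they are used to estimate activation ranges.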
3.2 Hybrid Quantization
In any quantization method, mapping the range of floating-point values to integer values requires parameters such as a scale and a zero point. These parameters can be calculated either per layer of the model or per channel within each layer. The former is referred to as 'per-tensor/per-layer quantization' while the latter is referred to as 'per-channel quantization'. Per-channel quantization is often preferred over per-tensor because it can handle scenarios where the weight distribution varies widely among the channels of a particular layer. However, the major drawbacks of this method are that it is not supported by all hardware (Nagel et al., 2019) and that it needs to store scale and zero-point parameters for every channel, creating additional computational and memory overhead. On the other hand, per-tensor quantization, which is more hardware friendly, suffers significant accuracy loss, mainly at layers where the weight distribution varies widely across the channels of the layer; this error is then propagated to consecutive layers of the model, increasing the accuracy degradation. In the majority of cases, the number of such layers in a model is very small; for example, in MobileNetV2 only a few depth-wise separable layers show significant weight variation across channels, which results in huge accuracy loss (Nagel et al., 2019). To compensate for such accuracy loss, per-channel quantization methods are preferred even though they are not hardware friendly and are computationally expensive. Hence, in the proposed 'Hybrid Quantization' technique we determine the sensitivity of each layer to both the per-channel and per-tensor quantization schemes and observe the loss behavior at different layers of the model.
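The difference between the two schemes amounts to where the scale/zero-point pair is computed, as this small sketch illustrates (the function names and the toy layer are assumptions for illustration):

```python
import numpy as np

def quant_params(w, num_bits=8):
    """One per-tensor (scale, zero_point) pair over the whole array."""
    qmax = 2 ** num_bits - 1
    w_min, w_max = float(w.min()), float(w.max())
    scale = max(w_max - w_min, 1e-8) / qmax
    zero_point = int(round(-w_min / scale))
    return scale, zero_point

def per_channel_params(w, num_bits=8):
    """One (scale, zero_point) pair per output channel (axis 0)."""
    return [quant_params(w[c], num_bits) for c in range(w.shape[0])]

# A layer whose channel ranges differ widely: the per-tensor scale must
# cover the widest channel, so the narrow channels lose resolution.
rng = np.random.default_rng(0)
w = np.concatenate([rng.standard_normal((1, 9)) * 10.0,   # wide channel
                    rng.standard_normal((3, 9)) * 0.1])   # narrow channels
pt_scale, _ = quant_params(w)
pc_scales = [s for s, _ in per_channel_params(w)]
```

The narrow channels get a much finer per-channel scale than the shared per-tensor one, which is exactly the weight-range variation the Hybrid scheme targets.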
We thereby identify the layers that are highly sensitive to per-tensor quantization (i.e., that suffer a significant loss of accuracy) and quantize only these layers using the per-channel scheme, while quantizing the remaining less sensitive layers with the per-tensor scheme. For the layer sensitivity estimation, the KL divergence (KLD) is calculated between the outputs of the original FP32 model and of the FP32 model in which the $i$-th layer is quantized using the per-tensor and per-channel schemes. The computed layer sensitivity is then compared against a threshold value ($\tau$) to determine whether the layer should be quantized using the per-tensor or the per-channel scheme. This process is repeated for all the layers in the model.
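The per-layer sensitivity test can be sketched as follows on a toy ReLU network; the stand-in model, the 4-bit setting, and the single calibration input are illustrative assumptions, not the paper's experimental setup:

```python
import numpy as np

def kld(p, q, eps=1e-8):
    """KL divergence between two discrete distributions."""
    p, q = p + eps, q + eps
    return float(np.sum(p * np.log(p / q)))

def quantize(w, num_bits, per_channel):
    """Asymmetric uniform (de-)quantization, per-tensor or per-channel."""
    if per_channel:
        return np.stack([quantize(row, num_bits, False) for row in w])
    qmax = 2 ** num_bits - 1
    w_min = float(w.min())
    scale = max(float(w.max()) - w_min, 1e-8) / qmax
    q = np.clip(np.round((w - w_min) / scale), 0, qmax)
    return q * scale + w_min

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def hybrid_scheme_per_layer(layers, x, num_bits=4, tau=0.0):
    """Quantize one layer at a time under both schemes, compare model
    outputs to FP32 via KLD, and pick per-channel only when it helps
    by more than the threshold tau."""
    def forward(ws):
        h = x
        for w in ws:
            h = np.maximum(w @ h, 0.0)   # ReLU MLP as a stand-in model
        return softmax(h)

    fp32_out = forward(layers)
    scheme = []
    for i in range(len(layers)):
        sens = {}
        for per_channel in (False, True):
            ws = list(layers)
            ws[i] = quantize(ws[i], num_bits, per_channel)
            sens[per_channel] = kld(fp32_out, forward(ws))
        scheme.append("per-channel" if sens[False] - sens[True] > tau
                      else "per-tensor")
    return scheme

rng = np.random.default_rng(0)
layers = [rng.standard_normal((16, 8)), rng.standard_normal((8, 16))]
x = rng.standard_normal(8)
scheme = hybrid_scheme_per_layer(layers, x)
```

With τ = 0 this reduces to "use per-channel wherever it strictly lowers the KLD", matching the accuracy-oriented mode described below.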
The proposed Hybrid Quantization scheme can be utilized for two benefits: accuracy improvement and inference-time optimization. For accuracy improvement, the threshold value is set to zero, i.e., $\tau = 0$. This yields a hybrid quantized model with a unique combination of per-channel and per-tensor quantized layers, such that the accuracy is improved compared to a fully per-channel quantized model and, in some cases, even the FP32 model. For inference-time optimization, the threshold value $\tau$ is determined heuristically by observing the loss behavior of each layer, aiming to generate a hybrid model in which most layers are quantized with the per-tensor scheme and only the few sensitive layers with the per-channel scheme. In other words, we try to create a hybrid quantized model as close as possible to the fully per-tensor quantized model so that inference is faster, under the constraint that the accuracy remains similar to the per-channel approach. This results in models where per-channel quantization is chosen only for the layers that are very sensitive to per-tensor quantization. For instance, in the case of the ResNet18 model, fully per-channel quantization yields noticeably better accuracy than fully per-tensor quantization. By performing the sensitivity analysis of each layer, we observe that only the second convolution layer is sensitive to per-tensor quantization, because of the huge variation in weight distribution across the channels of that layer. Hence, by applying per-channel quantization only to this layer and per-tensor quantization to all other layers, we achieve a reduction in inference time. The proposed method is explained in detail in Algorithm 2. For every layer $i$ in the model $M$, we build an auxiliary model $M_i = Q(M, i, \text{quant\_scheme})$, where the step $Q$ quantizes the $i$-th layer of the model using quant_scheme, which can be per-channel or per-tensor, while keeping all other layers at their original FP32 weight values. To find the sensitivity of a layer, we compute the KLD between the outputs of $M_i$ and of the original FP32 model. If the sensitivity difference between per-channel and per-tensor is greater than the threshold value $\tau$, we apply per-channel quantization to that layer; otherwise we apply per-tensor quantization. Empirical results with this method are detailed in Section 4.2.

3.3 Non-Uniform Quantization
In the uniform quantization method, the first step is to divide the entire weight range of the given FP32 model into $2^K$ groups of equal width, where $K$ is the bit precision chosen for quantization (e.g., K = 8, 6, 4). Since a total of $2^K$ bins or steps are available, the weights in each group are assigned to one step or bin. The obvious downside of this approach is that, even though the number of weights present in each group differs, an equal number of steps is assigned to each group. From the example weight distribution plot in Fig. 2, it is evident that the weights in 'group-m' are dense and spread across the entire group range, whereas in 'group-n' they are sparse and concentrated within a very narrow range. Since uniform quantization assigns an equal number of steps to each group, all the widely distributed weights in 'group-m' are quantized to a single value, just as the sparse weights in 'group-n' are. Hence it is not possible to accurately de-quantize the weights in 'group-m', which leads to accuracy loss. Although the uniform quantization scheme is simpler, it is not optimal. A possible scenario is described in Fig. 2, and many such scenarios may exist in real-world models. Also, when the weight distribution has outliers, uniform quantization tends to perform badly, as it ends up assigning steps even to groups with very few outlier weights. In such cases, it is reasonable to assign more steps to groups with more weights and fewer steps to groups with fewer weights. With this rationale, the proposed Non-Uniform Quantization method first divides the entire weight range into three clusters using the Interquartile Range (IQR) outlier detection technique and then assigns a variable number of steps to each cluster of weights. The quantization of the weights within each cluster is then performed as in the uniform quantization method, treating the steps allocated to that cluster as the total number of steps.
Through extensive experiments, it is observed that assigning the number of steps to a group based only on the number of weights in the group, while ignoring its range, results in accuracy degradation, since there may be many weights in a small range and vice versa. It is therefore preferable to consider both the number of weights and the range of the group when assigning its number of steps. The effectiveness of the proposed method is graphically demonstrated for a sample layer of the ResNet18 model in Fig. 3 in Appendix A.1. From the three weight plots it is evident that the quantized weight distribution obtained with the proposed Non-Uniform Quantization method is much closer to the FP32 distribution than that of the uniform quantization method, and hence it yields a better quantized model. It should also be noted that the proposed Non-Uniform Quantization method is a fully per-tensor based method.
4 Experimental results
4.1 Results for quantization method using retro-synthesis data
Table 1 shows the benefits of quantization using the 'retro-synthesis' data (Section 3.1) against state-of-the-art methods. For models with Batch Normalization layers, the proposed method achieves better accuracy than DFQ and a marginal improvement over ZeroQ. Our method also outperforms FP32 accuracy for ResNet18 and ResNet50. For models without Batch Normalization layers, such as AlexNet and ISONet (Qi et al., 2020), the proposed method outperforms the ZeroQ method by a clear margin on the ImageNet dataset.
Model | BN | DFQ | ZeroQ | Proposed method | FP32
ResNet18 | ✓ | 69.7 | 71.42 | 71.48 | 71.47
ResNet50 | ✓ | 77.67 | 77.67 | 77.74 | 77.72
MobileNetV2 | ✓ | 71.2 | 72.91 | 72.94 | 73.03
AlexNet | ✗ | – | 55.91 | 56.39 | 56.55
ISONet18 | ✗ | – | 65.93 | 67.67 | 67.94
ISONet34 | ✗ | – | 67.60 | 69.91 | 70.45
ISONet50 | ✗ | – | 67.91 | 70.15 | 70.73
ISONet101 | ✗ | – | 67.52 | 69.87 | 70.38
Table 2 demonstrates the effectiveness of the proposed retro-synthesis data at low precision, with weights quantized to 6 bits and activations to 8 bits (W6A8). From the results, it is evident that the proposed method outperforms the ZeroQ method.
Model | ZeroQ | Proposed method | FP32
ResNet18 | 70.91 | |
ResNet50 | 77.30 | |
MobileNetV2 | 70.34 | |
The efficiency of the proposed quantization method at lower bit precisions on the CIFAR-10 dataset for the ResNet20 and ResNet56 models is depicted in Table 3 below. From the results, it is evident that the proposed method outperforms the state-of-the-art methods even at lower precisions of 8-, 6-, and 4-bit weights with 8-bit activations.
Model | W8A8: ZeroQ | W8A8: Proposed | W6A8: ZeroQ | W6A8: Proposed | W4A8: ZeroQ | W4A8: Proposed
ResNet20 | 93.91 | 93.93 | 93.78 | 93.81 | 90.87 | 90.92
ResNet56 | 95.27 | 95.44 | 95.20 | 95.34 | 93.09 | 93.13
4.2 Results for Hybrid Quantization method
Table 4 demonstrates the benefits of the proposed Hybrid Quantization method in two ways: accuracy improvement and reduction in inference time. From the results, accuracy is improved for all the models compared to the per-channel scheme. The proposed method also outperforms FP32 accuracy for ResNet18 and ResNet50. Furthermore, by applying the per-channel (PC) quantization scheme only to the few sensitive layers shown in the 'No. of PC layers' column of Table 4 and the per-tensor (PT) scheme to the remaining layers, the proposed method reduces inference time while maintaining very minimal accuracy degradation relative to the fully per-channel scheme.
Model | PC | PT | Hybrid (τ = 0) | Hybrid (heuristic τ) | FP32 | No. of PC layers | Inference time gain (%)
ResNet18 | 71.48 | 69.7 | 71.60 | 71.57 | 71.47 | 1 | 20.79
ResNet50 | 77.74 | 77.1 | 77.77 | 77.46 | 77.72 | 2 | 17.60
MobileNetV2 | 72.94 | 71.2 | 72.95 | 72.77 | 73.03 | 4 | 8.44
4.3 Results for Non-Uniform Quantization
Since the proposed Non-Uniform Quantization method is a fully per-tensor based method, to quantitatively demonstrate its effect we compare models quantized with this method against the fully per-tensor based uniform quantization method. From the results depicted in Table 5, an accuracy improvement of 1% is evident for the ResNet18 model.
Model/Method | Uniform | Non-Uniform | FP32
ResNet18 | 70.60 | |
ResNet50 | 77.30 | |
5 Conclusion and Future scope
This paper proposed a data-independent post-training quantization scheme using 'retro-synthesis' data that does not depend on Batch Normalization layer statistics and outperforms the state-of-the-art methods in accuracy. Two further post-training quantization methods were also discussed, namely 'Hybrid Quantization' and 'Non-Uniform Quantization', which result in better accuracy and inference time compared to the state-of-the-art methods. These two methods open considerable scope for future research along similar lines. In the future, more experiments can be performed on lower-precision quantization, such as 6-bit, 4-bit, and 2-bit precision, using the proposed approaches.
References
Banner et al. (2019). Post training 4-bit quantization of convolutional networks for rapid deployment. In Advances in Neural Information Processing Systems, pp. 7950–7958.
Baskin et al. (2019). CAT: compression-aware training for bandwidth reduction. arXiv preprint arXiv:1909.11481.
Cai et al. (2020). ZeroQ: a novel zero shot quantization framework. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 13169–13178.
Choukroun et al. (2019). Low-bit quantization of neural networks for efficient inference. In 2019 IEEE/CVF International Conference on Computer Vision Workshop (ICCVW), pp. 3009–3018.
Codrescu (2015). Architecture of the Hexagon 680 DSP for mobile imaging and computer vision. In 2015 IEEE Hot Chips 27 Symposium (HCS), pp. 1–26.
Courbariaux et al. (2015). BinaryConnect: training deep neural networks with binary weights during propagations. In Advances in Neural Information Processing Systems, pp. 3123–3131.
Deng et al. (2009). ImageNet: a large-scale hierarchical image database. In 2009 IEEE Conference on Computer Vision and Pattern Recognition, pp. 248–255.
Dong et al. (2019). HAWQ: Hessian aware quantization of neural networks with mixed-precision. In Proceedings of the IEEE International Conference on Computer Vision, pp. 293–302.
He et al. (2016). Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 770–778.
Jacob et al. (2018). Quantization and training of neural networks for efficient integer-arithmetic-only inference. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2704–2713.
Krishnamoorthi (2018). Quantizing deep convolutional networks for efficient inference: a whitepaper. arXiv preprint arXiv:1806.08342.
Krizhevsky et al. (2012). ImageNet classification with deep convolutional neural networks. In Advances in Neural Information Processing Systems, pp. 1097–1105.
Li et al. (2019). Fully quantized network for object detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2810–2819.
Morgan and others (1991). Experimental determination of precision requirements for back-propagation training of artificial neural networks. In Proc. Second Int'l Conf. Microelectronics for Neural Networks, pp. 9–16.
Nagel et al. (2019). Data-free quantization through weight equalization and bias correction. In Proceedings of the IEEE International Conference on Computer Vision, pp. 1325–1334.
Qi et al. (2020). Deep isometric learning for visual recognition. arXiv preprint arXiv:2006.16992.
Rastegari et al. (2016). XNOR-Net: ImageNet classification using binary convolutional neural networks. In European Conference on Computer Vision, pp. 525–542.
Sandler et al. (2018). MobileNetV2: inverted residuals and linear bottlenecks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 4510–4520.
Wu et al. (2016). Quantized convolutional neural networks for mobile devices. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 4820–4828.
Zhang et al. (2018). LQ-Nets: learned quantization for highly accurate and compact deep neural networks. In Proceedings of the European Conference on Computer Vision (ECCV), pp. 365–382.
Zhao et al. (2019). Improving neural network quantization without retraining using outlier channel splitting. arXiv preprint arXiv:1901.09504.
Zhou et al. (2017). Incremental network quantization: towards lossless CNNs with low-precision weights. arXiv preprint arXiv:1702.03044.
Zhu et al. (2016). Trained ternary quantization. arXiv preprint arXiv:1612.01064.
Appendix A Appendix
A.1 Non-Uniform Quantization Method
A.1.1 Clustering mechanism
The IQR of a range of values is defined as the difference between the third and first quartiles, $Q_3$ and $Q_1$ respectively. Each quartile is a median of the data, calculated as follows. Given an even or odd number $n$ of values, the first quartile $Q_1$ is the median of the $n/2$ smallest values and the third quartile $Q_3$ is the median of the $n/2$ largest values. The second quartile $Q_2$ is the same as the ordinary median. Outliers here are defined as the observations that fall below $Q_1 - 1.5 \cdot IQR$ or above $Q_3 + 1.5 \cdot IQR$. This approach groups the values into three clusters $C_1$, $C_2$, and $C_3$ with ranges $R_1$, $R_2$, and $R_3$ respectively. Through extensive experiments it is observed that assigning the number of steps to a group based only on the number of weights present in the group, while ignoring the range, results in accuracy degradation, since there may be many weights in a small range and vice versa. Therefore it is preferable to consider both the number of weights and the range of the group when assigning the number of steps for a particular group. With this goal we arrived at the step allocation methodology explained below in detail.
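Under the standard IQR rule above, the three-way split can be sketched as follows (the synthetic weights and the cluster names are illustrative):

```python
import numpy as np

def iqr_clusters(w):
    """Split weights into three clusters with the IQR outlier rule:
    low outliers (< Q1 - 1.5*IQR), inliers, and high outliers
    (> Q3 + 1.5*IQR)."""
    q1, q3 = np.percentile(w, [25, 75])
    iqr = q3 - q1
    lo, hi = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return w[w < lo], w[(w >= lo) & (w <= hi)], w[w > hi]

rng = np.random.default_rng(0)
w = rng.standard_normal(10_000)              # bulk of the weights
w = np.concatenate([w, [-9.0, 8.0, 7.5]])    # a few extreme outliers
low, mid, high = iqr_clusters(w)
```

For roughly Gaussian weights the inlier bounds land near ±2.7 standard deviations, so the extreme values fall into the two outlier clusters.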
A.1.2 Number of steps allocation method for each group
Suppose $w_i$ and $r_i$ represent the number of weights and the range of the $i$-th cluster respectively; then the number of steps $n_i$ allocated to the $i$-th cluster is directly proportional to $w_i$ and $r_i$, as shown in equation (2) below:

$$n_i \propto w_i \times r_i \qquad (2)$$

Thus, the number of steps allocated to the $i$-th cluster can be calculated from equation (2) by deriving the proportionality constant from the constraint $\sum_i n_i = 2^K$, where $K$ is the chosen quantization bit precision. Using this bin allocation method we assign the number of bins to each cluster. Once the number of steps is allocated for each cluster, quantization is performed within each cluster to obtain the quantized weights.
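A minimal sketch of this allocation, assuming illustrative cluster counts and ranges: scale the products $w_i \times r_i$ so the steps sum to $2^K$, then fix the rounding remainder:

```python
import numpy as np

def allocate_steps(counts, ranges, num_bits=8):
    """Allocate quantization steps to clusters in proportion to
    (number of weights) x (range of the cluster), with the total fixed
    at 2**num_bits, per equation (2)."""
    total = 2 ** num_bits
    score = np.asarray(counts, dtype=float) * np.asarray(ranges, dtype=float)
    steps = np.floor(total * score / score.sum()).astype(int)
    steps = np.maximum(steps, 1)          # every cluster gets at least one step
    steps[np.argmax(score)] += total - steps.sum()  # absorb rounding remainder
    return steps

# Three clusters: a dense, wide inlier cluster and two sparse outlier tails.
steps = allocate_steps(counts=[30, 9940, 30], ranges=[2.0, 5.4, 1.8], num_bits=8)
```

The dense inlier cluster receives nearly all of the 256 steps, while the sparse outlier tails are covered coarsely, matching the intuition behind equation (2).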
A.2 Sensitivity analysis for per-tensor and per-channel quantization schemes
From the sensitivity plot in Fig. 4 it is clear that only a few layers in the MobileNetV2 model are highly sensitive to the per-tensor scheme, while the other layers are equally sensitive to either scheme. Hence better accuracy can be achieved by quantizing just those few sensitive layers with the per-channel scheme and the remaining layers with the per-tensor scheme.
A.3 Sensitivity analysis of ground truth data, random data, and the proposed retro-synthesis data
From the sensitivity plot in Fig. 5, it is evident that the layer sensitivity index plots of the proposed retro-synthesis data (red plot) and the ground truth data (green plot) clearly match, whereas a huge deviation is observed for random data (blue plot). Hence it can be concluded that the proposed retro-synthesis data generation scheme generates data with characteristics similar to the ground truth data and is more effective than random data.