Vector Quantization as Sparse Least Square Optimization

03/01/2018 ∙ by Chen Wang, et al. ∙ Sichuan University

Vector quantization aims to form new vectors/matrices with shared values close to the original ones. It can compress data with acceptable information loss and is useful in areas such as image processing, pattern recognition, and machine learning. In recent years, the importance of quantization has soared, as it has shown great potential for deploying practical neural networks, one of the most popular research topics. Conventional vector quantization methods usually suffer from their own flaws: quantization with hand-coded domain rules can produce poor results on complex data, while clustering-based algorithms suffer from inexact solutions and high time consumption. In this paper, we explore the vector quantization problem from a new perspective of sparse least square optimization and design multiple algorithms together with their implementations. Specifically, starting from a sparse form of the coefficient matrix, three types of sparse least squares, with l_0, l_1, and generalized l_1 + l_2 penalizations, are designed and implemented. In addition, to produce quantization results with a given number of quantized values (instead of a penalization coefficient λ), this paper proposes a cluster-based least square quantization method, which can also be regarded as an improvement of the information preservation of conventional clustering algorithms. The algorithms are tested on various data and tasks, and their computational properties are analyzed. The paper offers a new perspective on vector quantization, while the proposed algorithms provide more appropriate options for quantization tasks under different circumstances.




1 Introduction

Vector quantization modifies vectors/matrices to produce new ones with shared values and acceptable difference from the originals. By reducing the number of distinct values in a vector/matrix, quantization can compress information and therefore reduce processing and storage cost. Quantization has found great usefulness in areas such as image processing [QuantizationImageProcessing], speech recognition [QuantizationSpeechRecognition], and machine learning techniques [QuantizationNearestNeighbour]. Recently, with the growing research interest in deploying neural networks on resource-scarce edge devices, vector quantization techniques have attracted considerable attention because of their ability to reduce network storage size [SoftWeightSharing2011, SongHan2015NetworkCompression, EfficientProcessingDL_Tutorial]. The original idea of this research also comes from explorations in the quantization of embedded neural networks, although the proposed algorithms can be used for general purposes.

Conventional vector quantization methods usually rely on hand-coded domain rules and/or clustering-based methods to quantize the values [EfficientProcessingDL_Tutorial]. Common approaches include uniform quantization, logarithm quantization, and k-means clustering quantization. While these approaches are straightforward and convenient to use, they frequently suffer from several problems: 1. inadequately reliable performance: domain quantization usually produces low-quality outcomes with significant information loss, and while k-means quantization usually produces high-quality overall results, it may include empty clusters or irrational values (say, out-of-range values) because of bad random initialization; 2. inexact results: the results of domain quantization depend on the choice of domain, and k-means clustering is a heuristic method that cannot guarantee the optimal solution, with results subject to random initialization; and 3. high time consumption: while k-means clustering can usually reach a solution with preferable information loss, its time consumption is significant, especially when the number of post-quantization values is large. Moreover, to obtain the best k-means solution, one needs to run the algorithm several times and pick the optimum, which further increases the time cost.

In this paper, quantization algorithms are examined from another perspective: sparse least square optimization. We consider the sparsity-inducing properties of l_0 and l_1 norm-regularization terms and design algorithms to minimize the difference between the original and the sparsely-constructed vectors. To improve the performance based on its computational properties, an alternative version with both l_1 and l_2 norm-regularization is also explored and implemented. Furthermore, to design a least square quantization method that produces results with a specified number of quantized values (instead of a given penalization coefficient λ), two additional algorithms are designed: the first follows the idea of iterative least square optimization, and the second combines k-means clustering with least square optimization. Interestingly, the second approach can also be interpreted as an improvement of the conventional k-means clustering quantization method. Experimental results illustrate that the performance of the proposed algorithms is competitive. The l_1-based algorithms are especially favorable in terms of running time, which makes them particularly useful when the required number of quantized values is not on a trivial scale. Moreover, the results provided by the proposed methods are more exact and relatively independent of the random seed.

Notice that our algorithms are very similar to sparse compression in signal processing [SparseCompressionTypical], but there is a clear distinction between them: in the quantization problem the constructed vector should have shared values, while sparse signal processing only demands that the sparse vector produce a vector close to the original signal.

The rest of the paper is arranged as follows: section 2 introduces related work in the field, including research on quantization algorithms and sparse coding; section 3 introduces the designed algorithms mathematically and analyzes their optimization schemes and computational properties; the experimental results are shown, compared, and analyzed in section 4; and finally, a conclusion is drawn in section 5, where future research topics related to this paper are also discussed.

2 Related Work

The goal of vector quantization is relatively straightforward to achieve in usual situations, thus there are not many complicated methods addressing this task. [EfficientProcessingDL_Tutorial] provides a brief survey of basic vector quantization methods, including domain-based uniform and logarithm quantizations and clustering-based techniques such as k-means quantization. The idea of quantizing vectors with clustering methods provides an open skeleton into which novel clustering techniques can be plugged to produce new quantization algorithms [EmbeddedDeepLearningThesis]. [1992weightsharing] offers an alternative technique that uses a mixture of Gaussians to perform quantization specifically for neural networks, and a recent paper [2017_compression_1992idea] re-examined this idea, formally designed it for neural network compression, and provided a mathematical justification for it. Other techniques for vector quantization include [DivergenceQuantization], which utilizes divergence instead of a distance metric as the measurement and derives an algorithm based on it; [UseNeuralNetworkToQuantize], which designs a neural network to perform vector quantization; and [PairwiseSimilarityQuantization], which considers pairwise dissimilarity as the metric for quantization. To the best of our knowledge, there have hitherto been no publications discussing quantization algorithms as sparse least square optimization, thus this paper should be pioneering work in the area.

There are plenty of academic publications discussing applications of vector quantization, and recently, projects in this area have been increasingly connected to neural networks, as the ability of quantization to compress model size is being exploited for edge-device neural networks. [SongHan2015NetworkCompression] proposed a general pipeline to reduce model storage, an important part of which is quantization. Similar to [2017_compression_1992idea] mentioned above, [FixedPointQuantizationNetwork] specifically designed a quantization method for neural networks. [VecQuantizationNNcompressing] directly utilized existing vector quantization techniques for network compression and illustrated that the technique can be ideal for modifying neural network precision. As mentioned in the introductory section, the origin of our algorithms was also the neural network weight-sharing problem, and a set of neural network weight parameters is one of the test data-sets in the experimental section.

The algorithms proposed in this work have significant similarity with compressive sensing (sparse signal processing) in terms of the regularization idea and optimization target functions [SparseCompressionTypical, SparseCompressionTypical2, SparseCompressionTypical3]. Typical approaches to induce sparsity in compressive sensing algorithms introduce the l_0 norm [l0SignalProcessing], the l_1 norm [LassoRetirevePaper, LassoCompressiveSensing], and/or the l_{2,1} norm [l21CompressiveSensing] into the target optimization functions. Similarly, our algorithms also utilize these techniques to induce sparsity. Meanwhile, since the l_1 norm is not everywhere differentiable and the l_0 norm is not even convex, plenty of algorithms are devoted to efficiently solving these optimization problems [l1OptimizationAlgorithm, l0OptimizationAlgorithm, CoordinateDescentPaper]. In this paper, we use the coordinate descent method for l_1 optimization, and the newly-proposed Fast Best Subset Selection [l0Learn2018Hazimeh] (called 'L0Learn' in this paper) to optimize l_0 target functions.

3 Quantization Algorithms

3.1 Problem Setting

The vector quantization task can be described as follows: suppose we have a vector v of length N that has n distinct values. We intend to find a vector \hat{v} with m distinct values, where m < n. In some cases, a stricter condition m = k may be set, where k is a given value required to be less than n. Then, measuring the difference between the original vector and the constructed one with the l_2 norm, our original target function can be formed as:

\min_{\hat{v}} \|v - \hat{v}\|_2^2 \quad (1)

subject to

|\mathrm{unique}(\hat{v})| = m < n

where unique(·) denotes the distinct values of a vector. Notice that here we only consider data in 1-dimension vector form. If the data is coded as a matrix, such as neural network parameters and images, we can simply 'flatten' the matrix into a vector, perform quantization, and then reshape it back to the original shape.
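The flatten-quantize-reshape round trip can be sketched in NumPy; the small weight matrix below is an illustrative assumption, not data from the paper:

```python
import numpy as np

# Hypothetical 2-D "weight matrix" to be quantized; values are illustrative.
W = np.array([[0.11, 0.52, 0.11],
              [0.52, 0.93, 0.11]])

v = W.flatten()                 # 1-D vector the quantizer operates on
n_distinct = np.unique(v).size  # number of distinct values before quantization

# ... quantize v here ...

W_back = v.reshape(W.shape)     # restore the original matrix shape afterwards
```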

3.2 Original Algorithm with l_1 Norm Regularization

To begin with, we first change v into u, the vector of its n distinct values, so that we operate directly on the distinct values and recover the full vector by indexing later. To fulfill the purpose of quantization, we need to construct a new vector \hat{u} of length n with m distinct values. We assume there exists a 'base' vector b of shape (m, 1), where m is a given number, and that \hat{u} is generated from b through a linear transformation. Notice that there should be m \geq n, as it would otherwise be unreasonable to project a vector of R^m onto an arbitrary vector of R^n with a linear transformation. Then, denoting the linear transformation matrix by T (with shape n \times m), the relationship between u, b, and T is:

\hat{u} = T b

Combining this expression with equation 1, we get the new optimization target:

\min_{T, b} \|u - T b\|_2^2 \quad (2)
However, optimizing equation 2 does not bring the property of sparsity/value-sharing. Moreover, with two target matrices, the optimization is difficult to perform. To tackle this problem, we introduce another matrix D and a coefficient vector a with b = D a, where each entry of b should be:

b_i = \sum_{j=1}^{i} a_j \quad (3)

By designing the matrix in this additive form, we are able to achieve 'same values' wherever there exist entries a_j = 0. Equation 3 is achieved by designing D as a lower-triangular matrix of ones (main diagonal included), i.e. D_{ij} = 1 for j \leq i and D_{ij} = 0 otherwise.
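The cumulative-sum construction of equation 3 can be sketched as follows; the coefficient vector `a` is an illustrative example, showing that zeros in a produce repeated (shared) values in b:

```python
import numpy as np

n = 5
D = np.tril(np.ones((n, n)))                 # lower-triangular matrix of ones (eq. 3)
a = np.array([0.2, 0.0, 0.3, 0.0, 0.0])      # sparse coefficient vector

b = D @ a
# b equals the cumulative sum of a: wherever a_j = 0, b repeats the previous
# entry, which is exactly the "shared value" effect used for quantization.
assert np.allclose(b, np.cumsum(a))
```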

Now we have two matrices, T and D, to control the constructed vector. Intuitively, if we add l_0 and/or l_1 norm regularization on a to the target function, it becomes possible to produce a vector with shared values. We consider l_1 in the first place because it is continuous and convex. Our optimization target becomes:

\min_{T, a} \|u - T D a\|_2^2 + \lambda \|a\|_1 \quad (4)
The property of sparsity can now be introduced once equation 4 is optimized. However, there are still two target matrices in the target function, which makes the optimization problem non-trivial. To determine the system in a convenient way, we need some approximation. Here, we fix T and only optimize a. Supposing m = n, we can pose the matrix T as the identity, T = I_n, and we get a vector b = D a in the following format:

b = (a_1, \; a_1 + a_2, \; \ldots, \; a_1 + a_2 + \cdots + a_n)^T

One might doubt whether the optimization would still be accurate if we fix T and only optimize a. However, with the transformation matrix set to full rank, the linear space is not changed. This implies that, with proper optimization, we can still reach the optimal solution. Notice that the above transformation matrix is given under the assumption m = n; in the m > n scenario, we can leave some of the columns of T as 0 and keep the rank at n. With the above derivation, the problem is transferred into a sparse least-square problem without losing its correctness and rationality. Our optimization target now becomes:

\min_{a} \|u - D a\|_2^2 + \lambda \|a\|_1 \quad (5)
This is equivalent to forming a coefficient vector a and a lower-triangular matrix D as above. In practice, we can simply use the values of the original unique-value vector u to fill the starting values of a. This scheme provides a convenient way to set starting values; meanwhile, from the perspective of numerical computation, it also offers a warm start for the optimization algorithm. The final optimization target with l_1 regularization is:

\min_{a} \|u - D a\|_2^2 + \lambda \|a\|_1 \quad (6)
Equation 6 is very similar to the optimization target in compressive sensing. Nevertheless, there are two significant differences: firstly, the root of the target function and the derivations differ from those in compressive sensing; and secondly, the produced vector is a quantized vector instead of simply a sparse vector close to the original, as in compressive sensing. The optimization of this formula is not very hard: the target function is a typical LASSO problem, and in our program it is solved by coordinate descent with the LASSO solver in scikit-learn [scikit-learn].
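A hedged sketch of solving the l_1 target (equation 6) with the scikit-learn Lasso solver the paper mentions; the data, penalty value, and iteration budget are illustrative assumptions (in scikit-learn's parameterization, `alpha` plays the role of λ):

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
u = np.sort(rng.uniform(0.0, 1.0, 50))   # distinct values of the original vector
D = np.tril(np.ones((50, 50)))           # lower-triangular design matrix

# Plain LASSO (coordinate descent) on the design matrix D; alpha = 0.05 is an
# illustrative choice, not a value from the paper.
model = Lasso(alpha=0.05, fit_intercept=False, max_iter=10000)
model.fit(D, u)
a = model.coef_                          # sparse coefficient vector

b = D @ a                                # reconstructed (quantized) values
n_levels = np.count_nonzero(a)           # number of distinct values produced
```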

It is noticeable that the 'raw result' of equation 6 can still be improved. As the optimization must balance both the sparsity penalty and the l_2 loss, the values in the solved vector a might not optimally reduce the difference between u and the constructed vector. Thus, we solve a least square restricted to the non-zero positions of a to further improve the result. Mathematically, this means using a matrix D' to perform the least square optimization, where:

D' = D_{:, J}, \quad J = \{j : a_j \neq 0\} \quad (7)
That is, D' picks the columns of D with corresponding non-zero indexes in a. The optimization target therefore becomes:

\min_{a'} \|u - D' a'\|_2^2 \quad (8)
The target function 8 is in an everywhere-differentiable least-square form, thus it can be solved directly and analytically:

a' = (D'^T D')^{-1} D'^T u \quad (9)
where a' is a k-dimensional vector, with k the number of distinct values. The values of a' can be put back into the non-zero positions to obtain the final coefficient vector a:

a_J = a', \quad a_j = 0 \ \text{for} \ j \notin J \quad (10)
Finally, the quantized vector can be constructed by multiplying the vector a with the 'base transformation' matrix D:

\hat{u} = D a \quad (11)
The overall quantization method with l_1 regularization is denoted as algorithm 1. In the experiment section, the results of l_1-based algorithms with and without the least square refinement of a are shown separately.

Input: Original vector v
      Output: Quantized vector \hat{v}

1: Form the distinct-value vector u and the lower-triangular matrix D
2: Optimize target function 6 with Coordinate Descent, get a
3: Retrieve D' with equation 7
4: Compute a' with equation 9
5: Compute vector a with equation 10
6: Compute the desired vector \hat{u} with equation 11, and recover \hat{v} by indexing
Algorithm 1 Quantization with l_1 Least Square
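Putting the steps of Algorithm 1 together, a minimal sketch might look as follows; the function name, test data, and the `alpha` default are assumptions for illustration (`alpha` plays the role of λ in scikit-learn's parameterization):

```python
import numpy as np
from sklearn.linear_model import Lasso

def l1_quantize(v, alpha=0.05):
    """Sketch of Algorithm 1: l1 sparse least square followed by a
    least-square refit on the non-zero support (hypothetical helper)."""
    u, inverse = np.unique(v, return_inverse=True)   # distinct values of v
    n = u.size
    D = np.tril(np.ones((n, n)))

    # Step 1: LASSO for a sparse coefficient vector a (equation 6).
    a = Lasso(alpha=alpha, fit_intercept=False, max_iter=10000).fit(D, u).coef_

    # Step 2: refit only the selected columns of D (equations 7-9).
    support = np.flatnonzero(a)
    a_ref, *_ = np.linalg.lstsq(D[:, support], u, rcond=None)

    # Steps 3-4: scatter the refined values back and rebuild the quantized
    # distinct-value vector (equations 10-11).
    a_full = np.zeros(n)
    a_full[support] = a_ref
    u_quant = D @ a_full
    return u_quant[inverse]                          # index back to full length

v = np.array([0.10, 0.12, 0.48, 0.52, 0.50, 0.11])
v_q = l1_quantize(v)
```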

3.3 l_1-l_2 Regularization Algorithm and l_0 Regularization Algorithm

One possible improvement of algorithm 1 is to add a negative l_2 penalization term to the original target. The optimization target can be denoted with the following formula:

\min_{a} \|u - D a\|_2^2 + \lambda_1 \|a\|_1 - \lambda_2 \|a\|_2^2 \quad (12)
Equation 12 is similar to the Elastic Net [zou2005ElasticNet], but with a negative l_2 coefficient. The intuition behind this scheme is that l_1 optimization often leads to values with small magnitudes before they reach 0. Thus, adding the 'negative l_2 norm' can be regarded as a relaxation that lets the original least square find a sparse index set while keeping the non-zero values at their original level. More formally, if we inspect the mathematical expressions under coordinate descent with the shrinkage operator, the LASSO update can be expressed as:

a_j \leftarrow \frac{S(\rho_j, \lambda_1)}{z_j} \quad (13)
where j denotes the coordinate to be optimized, \rho_j = D_j^T (u - D a + D_j a_j) is the partial-residual correlation, z_j = \|D_j\|_2^2, and S(\rho, \lambda) = \mathrm{sign}(\rho)\max(|\rho| - \lambda, 0) is the shrinkage (soft-thresholding) operator generally used in l_1 optimization. In comparison, the coordinate descent update for the negative l_2 penalization is:

a_j \leftarrow \frac{S(\rho_j, \lambda_1)}{z_j - \lambda_2} \quad (14)
That is, for the combined optimization, the projected proximal value is larger, as the denominator subtracts the positive value \lambda_2; relative to the quadratic term, the shrinkage threshold is effectively higher, making it easier for the vector to achieve sparsity. There are rare, if any, integrated LASSO optimization packages that permit a parameter setting like equation 12; thus, this algorithm is optimized with a coordinate descent method implemented by the authors.
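Since no standard package permits the negative l_2 setting of equation 12, a coordinate-descent sketch following the updates above could look like the following. This is our reconstruction of the idea, not the authors' code, and it assumes λ_2 < ||D_j||^2 for every column j so that each per-coordinate subproblem stays convex:

```python
import numpy as np

def cd_l1_minus_l2(D, u, lam1, lam2, n_iter=200):
    """Coordinate descent sketch for the objective
    1/2 ||u - D a||^2 + lam1 ||a||_1 - lam2/2 ||a||_2^2  (hypothetical helper)."""
    n = D.shape[1]
    a = np.zeros(n)
    z = (D * D).sum(axis=0)                  # column norms ||D_j||^2
    assert np.all(z > lam2), "lam2 too large: update would diverge"
    r = u - D @ a                            # running residual
    for _ in range(n_iter):
        for j in range(n):
            rho = D[:, j] @ r + z[j] * a[j]  # partial-residual correlation
            # soft-threshold, with the lam2-reduced denominator of eq. 14
            new = np.sign(rho) * max(abs(rho) - lam1, 0.0) / (z[j] - lam2)
            r += D[:, j] * (a[j] - new)      # update residual incrementally
            a[j] = new
    return a

D = np.tril(np.ones((10, 10)))
u = np.linspace(0.0, 1.0, 10)
a = cd_l1_minus_l2(D, u, lam1=0.1, lam2=0.05)
```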

Another variation of the algorithm based on equation 6 replaces the l_1 norm with the l_0 norm. In this algorithm, instead of directly adding a penalization term, we explicitly limit the number of distinct values:

\min_{a} \|u - D a\|_2^2 \quad \text{subject to} \quad \|a\|_0 \leq k \quad (15)

where k is a manually-set number indicating the upper bound of the number of distinct values. The optimization of the l_0 norm is NP-hard [L0NPhardPaper], thus it can so far only be solved by heuristic-based algorithms. In this paper we utilize the recently-proposed 'L0Learn' mentioned above [l0Learn2018Hazimeh], which supports values of k up to 100. However, one should notice that the optimization method is not universal: it cannot reach an arbitrary required number of values under our settings. This is also the reason k can only serve as an 'upper bound' of the quantization amount.

3.4 Iterative Quantization with l_1 Regularization

One major drawback of algorithm 1 and the improved l_1-l_2 based algorithm is that they cannot explicitly indicate the number of demanded distinct values (the quantization amount). To obtain an algorithm capable of explicitly specifying the number of distinct values, an iterative method is designed. The paradigm of the algorithm is straightforward: it starts with a small value of \lambda and gradually increases it until the number of non-zero values of the optimized a satisfies the condition. Specifically, at each iteration, the algorithm first follows the procedure of algorithm 1 with the current \lambda. After obtaining the optimized a of the i-th iteration, it is put back into the target function, and the a of the (i+1)-th iteration is obtained with algorithm 1 again.

The iterative quantization method is described as algorithm 2.

Input: Original vector v, desired number of distinct values k
      Output: Quantized vector \hat{v}

1: Initialize \lambda with a small number
2: while \|a\|_0 > k do
3:     Optimize target function 6 with Coordinate Descent, with the current \lambda and a
4:     Retrieve D' with equation 7
5:     Compute a' with equation 9
6:     Compute vector a with equation 10
7:     Increase \lambda
8: Compute the desired vector \hat{v} with equation 11 and the final a
Algorithm 2 Quantization with Iterative l_1 Optimization
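The λ-growth loop of Algorithm 2 can be sketched as follows; the starting value, growth factor, and round limit are illustrative assumptions, not values from the paper:

```python
import numpy as np
from sklearn.linear_model import Lasso

def iterative_quantize(u, k, lam=1e-4, factor=1.5, max_rounds=60):
    """Sketch of Algorithm 2: grow lambda from a small value until the LASSO
    support size drops to at most k distinct values (hypothetical helper)."""
    n = u.size
    D = np.tril(np.ones((n, n)))
    a = None
    for _ in range(max_rounds):
        a = Lasso(alpha=lam, fit_intercept=False, max_iter=10000).fit(D, u).coef_
        if np.count_nonzero(a) <= k:
            break                # condition on the quantization amount met
        lam *= factor            # increase the penalty and re-optimize
    return a

u = np.sort(np.random.default_rng(0).uniform(0.0, 1.0, 30))
a = iterative_quantize(u, k=3)
```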

3.5 Clustering-based Least Square Sparse Optimization

Algorithm 2 can provide quantization results with a given number of distinct values k. However, since the algorithm can be sensitive to the change of \lambda, in practice it might fail to optimize to exactly k values and produce a different number of values instead. Similarly, the l_0 algorithm with equation 15 can only set an upper bound on the number of distinct values, with no guarantee of how many distinct values will finally be produced. To further improve the capacity of the quantization algorithm, here we discuss a general target that produces a definite number of values in least square form, and design a basic method based on the combination of k-means clustering and least square optimization.

Suppose we now want to construct a vector with k distinct values. Here we directly set the parameter vector a to a vector with k entries (shape (k, 1)), and we need a transformation matrix P to transform the k-value vector into a vector of the original length while maintaining the constructed values. One possible scheme is a transformation matrix P with a one-hot encoding in each row. Under this scheme, the optimization target is:

\min_{P, a} \|v - P a\|_2^2 \quad \text{subject to each row of } P \text{ being one-hot} \quad (16)

The constraint on P in equation 16 means that, for each row of the matrix, the entry is 1 if we want the corresponding value to belong to that cluster; otherwise it is 0. An alternative expression of the matrix is:

P_{ij} = 1 \ \text{if} \ c(i) = j, \ \text{and} \ P_{ij} = 0 \ \text{otherwise} \quad (17)

where c(i) denotes the group (cluster) to which the i-th value belongs. The optimization is difficult to perform with two optimization variables changing simultaneously. Here, we propose one simple method to deal with this problem: one can first use a clustering method (e.g. k-means) to obtain c, and then construct the matrix P from it. The target function can then be transferred into the following expression:

\min_{a} \|v - P a\|_2^2 \quad (18)
Notice that since the 'index of non-zeros' is obtained through clustering, the rank of P is no longer a concern. Hence, one can simply compute the values of a that fill all of the non-zero entries, with P the one-hot assignment matrix given by equation 17.
The optimization of equation 18 is a typical linear regression problem and can be solved in closed form with O(N k^2) time complexity, or even faster with approximations [dhillon2013LeastSquareApproximation]. By taking the derivative and setting it to 0, we obtain the solution:

a = (P^T P)^{-1} P^T v \quad (19)
The clustering-based least square method is given as algorithm 3. One interesting point is that, from the perspective of clustering methods, algorithm 3 can be viewed as an improvement of k-means clustering quantization. In conventional clustering-based quantization algorithms, the representative of a certain cluster of values is simply given as the mean of the cluster. In contrast, the proposed algorithm computes the value for each cluster that produces the smallest least square distance from the original.

Notice that there should exist multiple schemes to solve the optimization problem posed by equation 16, and the method proposed here is only a basic solution. The exploration of other ways to solve this task could be one of our future research concentrations.

Input: Original vector v, desired number of distinct values k
      Output: Quantized vector \hat{v}

1: Perform k-means with k clusters, get model M
2: Apply M to get the cluster prediction of each value
3: Fill the corresponding columns with 1 for matrix P according to equation 17
4: Optimize a according to equation 18; the base values of a could be the cluster centroids
5: Compute the desired vector \hat{v} = P a with equation 19
Algorithm 3 Quantization with K-means Based Least Square
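A minimal sketch of Algorithm 3, assuming scikit-learn's KMeans for the clustering step; the one-hot matrix P and the closed-form solve follow equations 17-19 (function name and data are illustrative):

```python
import numpy as np
from sklearn.cluster import KMeans

def kmeans_ls_quantize(v, k, seed=0):
    """Sketch of Algorithm 3: cluster the values with k-means, build the
    one-hot assignment matrix P (eq. 17), then solve the least square
    problem for each cluster's representative value (eqs. 18-19)."""
    labels = KMeans(n_clusters=k, n_init=10, random_state=seed).fit_predict(v.reshape(-1, 1))
    P = np.eye(k)[labels]                  # one-hot rows, shape (len(v), k)
    # Closed-form solution a = (P^T P)^{-1} P^T v  (eq. 19)
    a = np.linalg.solve(P.T @ P, P.T @ v)
    return P @ a                           # quantized vector with k values

v = np.random.default_rng(1).uniform(0.0, 1.0, 30)
v_q = kmeans_ls_quantize(v, k=4)
```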

3.6 Time Complexity Analysis

As mentioned above, the proposed algorithms might not outperform the k-means-based quantization method in some cases. However, they provide exact and interpretable solutions without out-of-range values; furthermore, their computational time is favorable. With block coordinate descent, the time complexity of convex LASSO regression is O(T N d), where N and d are the number and dimensionality of the data, respectively, and T is the number of iterations. Under the setting of algorithm 1, the complexity becomes O(T n^2), where T is the number of iterations and n is the length of u. [hong2017ConvergenceCoordinate] argues that the number of iterations to convergence can be sub-linear, with O(1/\epsilon) for strongly convex situations, where \epsilon is the error of the result. In practice, for the optimization of the proposed algorithm, the optimum can usually be found within an acceptable number of iterations.

In contrast, the time complexity of the k-means algorithm is O(T k n), where T is the number of iterations, k is the number of cluster centroids (which is also the desired number of distinct values), and n is the length of the vector. The complexity grows linearly with respect to k, and this can be problematic for quantization with a moderate to large number of distinct values. For instance, the requirement could be to quantize to the largest number of distinct values that still reduces memory cost, so as to preserve most of the information. Consequently, in such cases, the proposed algorithms are much more favorable than the existing k-means ones.

4 Experimental Results

To verify the rationality and effectiveness of the proposed methods, three types of data, namely a neural network fully-connected layer weight matrix, MNIST images, and artificially generated data sampled from different distributions, are employed to obtain experimental results for illustration and analysis. The performance is evaluated mostly by quantization information loss and time consumption. The information loss is measured by the l_2 loss between the original vector and the quantized vector for the MNIST images and artificially-generated data, and by post-quantization recognition accuracy for the neural network test. Notice that in certain scenarios, a high l_2 loss may not necessarily mean a deficient performance. For example, in image quantization, the loss could be dominated by a few values far away from the original, and the image with higher loss might actually possess an overall more favorable quality. Thus, for the quantization of MNIST images, the post-quantization results are plotted as images in figure 5 to assist evaluating the performance with human intuition.

Another point to notice is that in the experiments on MNIST and artificially-generated data, the post-quantization outputs are processed by a 'hard-Sigmoid' function before they are used to compute the information loss. The 'hard-Sigmoid' function is denoted as follows:

h(x) = \min(\max(x, f), c)

where f and c are the 'floor' and 'ceiling' of the range of values. The reason for implementing this function is that in many situations the quantization results must lie in a certain range. For example, MNIST quantization values must be in the valid pixel range; otherwise they will not be recognized by practical image storage/display systems. Applying the function avoids out-of-range values that might reduce the loss in a prohibited way.
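The 'hard-Sigmoid' clamp can be sketched in one line with NumPy; the default floor and ceiling of 0 and 255 are an assumption matching 8-bit image data such as MNIST:

```python
import numpy as np

def hard_sigmoid(x, floor=0.0, ceil=255.0):
    """Clamp post-quantization values into the legal range [floor, ceil];
    the defaults assume 8-bit image data such as MNIST."""
    return np.clip(x, floor, ceil)
```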

The major experiments in this section include the comparison between the k-means method, the proposed l_1 quantization method (equation 6), l_1 with least square refinement (algorithm 1), and the clustering-based least square method (algorithm 3). The performance comparison between sole l_1-based and combined l_1-l_2-based quantization is examined with a separate optimization program implemented by the authors. Furthermore, the l_0-based optimization method (equation 15) is implemented and tested separately with the previously-mentioned optimization software [l0Learn2018Hazimeh] written in R. The l_1 optimization and the k-means-based methods are accomplished via Lasso and KMeans in scikit-learn [scikit-learn], respectively, and both programs are optimized to the extent possible as far as the authors are concerned. Notice that since the iterative least square method in algorithm 2 generally provides results no different from algorithm 1 for the same number of quantized values, the performance of this algorithm is not tested in the experiments.

Experimental results show that, in general: 1. the l_1-based quantization method leads to a higher information loss than k-means clustering, but the running time is considerably reduced for medium-size data; 2. after applying the least square refinement to the l_1-based sparse quantization method, the performance becomes much more competitive, with information loss on the same level as k-means, while the running time remains significantly shorter than k-means; 3. the clustering-based least square method performs slightly better than k-means, without taking significantly longer running time; 4. the combined l_1-l_2 optimization induces fewer distinct values (quantization amounts) than the sole l_1 method for the same \lambda, but the algorithm is sensitive to the value of \lambda_2; and finally, 5. the l_0-based quantization method (under the optimization algorithm provided in [l0Learn2018Hazimeh]) provides good performance within acceptable running time, but it cannot universally produce quantization results (some quantization amounts are irretrievable) and the optimization may fail under some circumstances (especially when the demanded quantization amount is large).

4.1 Neural Network Weight Matrix

To test the effectiveness of our methods on neural network quantization (which is also the original problem that inspired this paper), a 5-layer fully-connected network for MNIST image recognition is introduced. The network is trained with stochastic gradient descent. In the experiments, the last-layer weight matrix is processed by the quantization methods and the weights are replaced by the post-quantization matrix. Figure 1 illustrates the change of accuracy on training and testing data with respect to different numbers of quantized values (cluster numbers) for the l_1, l_1 with least square, k-means, and cluster-based least square methods. In addition, the running time of each method is demonstrated in the figure. Since the accuracy of MNIST recognition is fairly robust against quantization, figure 2 further zooms into the area where the accuracy starts to drop, with a higher resolution of quantization amounts.

Figure 1: Post-quantization accuracy on training and testing data with respect to quantization amounts, and the running time. The x-axis of each plot stands for the quantization amount (number of quantized values); the y-axis stands for accuracy in the first two plots and time in seconds in the third.
Figure 2: Post-quantization accuracy on training and testing data with respect to quantization amounts, focusing on the area where accuracy drops significantly. The x-axis stands for the number of clusters; the y-axis stands for accuracy.

From the figures it can be seen that the proposed sparse regression-based methods provide competitive performance for the last-layer quantization of the neural network. The pure l_1 sparse regression performs slightly worse than the others, but in most cases the deficiency is negligible. Also, compared to the time-consuming k-means based methods, the l_1 method provides an alternative solution with much shorter running time. In addition, if the least square method is applied to optimize the values of a as algorithm 1 does, the algorithm provides accuracy no less competitive than the k-means method, while the running time remains at a low level. The clustering-based least square method provides the overall optimal performance, especially in the area approaching the accuracy decrement, and the additional time consumed by its least square step is negligible.

Figure 3 shows the values of a for the neural network last-layer quantization with different levels of sparsity (quantization amounts). The full-column plot on the left shows the weights solved by least square alone. It can be found that even for the least square solution without any additional regularization term, some of the values of a still hit values around or equal to 0. The plots of the remaining three columns represent the values of a for l_1 without least square, l_1 with least square, and the k-means based exact-value method, respectively. One might question why the clustering-based least square method, which is designed to possess dense vectors, can be plotted as sparse ones. The trick behind this plot is that the values of the dense vector are assigned to the starting point of each 'batch' of same values, which is equivalent to the effect of sparse a in the sparse regression-based algorithms. From the figures, it can be observed that the l_1-based algorithms mostly produce positive values for quantization, while the clustering-based ones have positive and negative values in similar amounts. In addition, despite the differences between the values of the vectors, the 'zero areas' produced by the l_1-based and clustering-based methods are almost the same.

Figure 3: The distribution of weights for neural network last-layer quantization

Moreover, to illustrate the effect of replacing l_1 with l_1 + l_2 optimization, the performance of l_1 and l_1 + l_2 optimization on the last-layer neural network data is illustrated in figure 4, with a coordinate descent optimization method implemented separately. Neither of the weights in the illustrated plots is optimized with least square. From the figure it can be observed that the l_1 + l_2 method generally leads to fewer quantized values for the same λ, while producing a smaller loss compared to the original. The experimental result also verifies the argument in section 3.3. However, despite the favorable performance, the algorithm can be sensitive to the value of the l_2 coefficient, and it can become numerically very unstable if that value is too large or too small. Improving the algorithm's tolerance to this coefficient could be a point of exploration in the future.

Figure 4: The accuracy with respect to λ values for sole l_1 and l_1 + l_2 optimization, respectively. The l_2 coefficient is set to a fixed value.

And finally, as for the l_0 quantization method, it could not find a non-trivial solution under the optimization method of l0Learn2018Hazimeh, which indicates the drawback of numerical unreliability of l_0-based methods.

4.2 MNIST Image Quantization

Vector quantization can be used in image processing to reduce the number of distinct values and save storage cost. In this paper, an MNIST-digit image is chosen as an example to show the image quantization performance of the proposed methods. The performances of the two l_1-based algorithms and the two clustering-based methods are illustrated and compared in figure 5. From the figure we can find that k-means and the clustering-based least square optimization provide the best performance in general, with no significant difference in execution time between the two. l_1 with least square optimization of the quantized values provides a smaller norm-difference loss than using l_1 alone. Meanwhile, in terms of running time, the l_1-based optimization approaches provide significant advantages over the k-means-based methods. Another remark on the MNIST quantization is that the k-means methods sometimes produce out-of-range values (outside the original value interval) when the number of clusters is large, which can be attributed to bad initialization. For the least-square optimization methods, this problem does not occur, at least in the MNIST setting.
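The k-means baseline and the checks discussed above (norm-difference loss, values staying in range) can be sketched as follows; the synthetic 28×28 image stands in for an MNIST digit, and this is not the paper's experimental code.

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(1)
img = rng.random((28, 28))               # stand-in for an MNIST digit, values in [0, 1)

# Quantize the pixel values to 4 shared levels with k-means.
x = img.reshape(-1, 1)
km = KMeans(n_clusters=4, n_init=10, random_state=0).fit(x)
quantized = km.cluster_centers_[km.labels_].reshape(img.shape)

loss = np.linalg.norm(img - quantized)   # norm-difference loss used in the comparison
in_range = (quantized.min() >= 0.0) and (quantized.max() <= 1.0)
```

Because each k-means center is a mean of pixel values, a converged run stays within the original value interval; the out-of-range cases noted above arise from bad initializations at large cluster counts.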

Figure 5: MNIST quantization comparison for the l_1 method, l_1 with least square method, K-means method, and K-means-based least square method
Figure 6: MNIST quantization results for the l_0 method

The quantization results of the l_0 method are shown separately in figure 6. It can be observed from the figure that the quality of the images is in general high and the loss is competitive. However, the 'not universal' problem is also very evident in the figures: in many cases, the algorithm can only find the largest feasible quantization amount smaller than the given value. In addition, the algorithm often fails to find a solution when the required quantization amount is large.

4.3 Artificially-Generated Data

To test the performance of the proposed algorithms on data of different distributions, three types of distributions, namely mixture of Gaussians, uniform, and single Gaussian, are employed, with 500 samples generated from each. The samples are constrained to a common range, and the distributions of the data used are shown in figure 7. In practice, these three types of distributions cover most cases of data distribution.
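The three synthetic datasets can be reproduced along the following lines; the exact distribution parameters are assumptions for illustration, as the paper does not state them here.

```python
import numpy as np

rng = np.random.default_rng(42)
n = 500

# Three data distributions used to probe the quantization methods.
mixture = np.concatenate([rng.normal(0.25, 0.05, n // 2),
                          rng.normal(0.75, 0.05, n - n // 2)])
uniform = rng.uniform(0.0, 1.0, n)
single = rng.normal(0.5, 0.1, n)

# Clip to a common range, mirroring the constraint in the experiments.
datasets = {name: np.clip(d, 0.0, 1.0)
            for name, d in (("mixture", mixture),
                            ("uniform", uniform),
                            ("gaussian", single))}
```

A bimodal mixture is the friendliest case for quantization (values concentrate near two levels), while the uniform distribution is the hardest, which is the motivation for testing all three.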

Figure 7: The distribution of the 3 types of artificially-generated data

The experimental results for the different quantization methods on the above three types of data are shown in figure 8. From the figure, it can be observed that the information loss of l_1 without least square is more significant here than for the neural network and MNIST image data. However, considering the running time saved by the algorithm, the overall performance can still be regarded as meritorious. Also, if least square is employed to optimize the values of the vector, the information loss of the l_1 approach is only slightly higher than that of the k-means-based methods, while the running time retains a great advantage over the k-means branch of methods.

Figure 8: Quantization results on artificially-generated data. For each subplot, the left figure shows the norm loss and the right one the running time. The x-axis stands for the number of clusters; the y-axis stands for the l_2 loss in the left figures and for time in seconds in the right ones.

And finally, for the l_0 quantization method, again in the experiments it could not provide meaningful results for the artificially-generated data. This further demonstrates the issue of using l_0 optimization despite its favorable information loss: l_0 optimization is NP-hard, and with approximation algorithms there is a risk of failing to obtain a result.

5 Conclusion

This paper proposed several least square-based algorithms to better accomplish the task of vector quantization. The characteristics and computational properties of the proposed algorithms are examined, and the advantages, drawbacks, and advantageous scenarios of each are analyzed. The algorithms are implemented and tested on neural network weights, an MNIST image, and artificially-generated data, and the results are demonstrated and analyzed. The experimental results show that the proposed algorithms have competitive performance in terms of information loss/preservation, and their favorable running-time properties make the l_1-based algorithms especially useful when processing large batches of medium-size data where the number of post-quantization values is not on a micro scale.

The paper makes the following major contributions. Firstly, it proposes several novel quantization methods with competitive information preservation ability and much more favorable time complexity, and the results provided by the least square-based algorithms are more exact than those of k-means. Secondly, the paper pioneers the use of least square optimization to solve quantization, which could open considerable research potential in the area. Thirdly, the algorithms proposed in the paper provide better options in practice, especially when the requirement is to restrict the number of distinct values to a certain level, which can usually be large.

In the future, the authors intend to continue exploring quantization algorithms along the lines demonstrated in this paper. We intend to research better optimization methods, revisions of the target functions, and extensions of the quantization methods to high-dimensional settings.

6 Acknowledgement

The authors would like to thank Mr. Feng Chen and Mr. Shixiong Wang from Northwestern Polytechnical University, China, who provided many useful comments on the algorithms and programs of this research.

7 Declaration of interest

The authors declare no conflicts of interest.