1 Introduction
The k-means algorithm in its standard form (Lloyd's algorithm) employs two steps to cluster n data points of d dimensions, given k initial cluster centers lloyd1982least . The expectation or assignment step assigns each point to its nearest cluster center, while the maximization or update step updates each cluster center with the mean of the points assigned to it. The k-means algorithm repeats the two steps until convergence, that is, until the assignments no longer change in an iteration.
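As a minimal sketch of these two steps (in Python with NumPy; the function name and data layout are illustrative, not taken from the paper's pseudocode):

```python
import numpy as np

def lloyd_kmeans(X, C, max_iter=100):
    """Illustrative sketch of Lloyd's algorithm: X is an (n, d) data
    array, C is a (k, d) array of initial centers."""
    assign = np.full(len(X), -1)
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest center.
        dists = ((X[:, None, :] - C[None, :, :]) ** 2).sum(axis=2)
        new_assign = dists.argmin(axis=1)
        if np.array_equal(new_assign, assign):  # assignments unchanged
            break
        assign = new_assign
        # Update step: each center becomes the mean of its members.
        for j in range(len(C)):
            members = X[assign == j]
            if len(members):
                C[j] = members.mean(axis=0)
    return C, assign
```

The full (n, k) distance matrix computed here is exactly the O(nkd) cost that the methods discussed below try to avoid.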
k-means is one of the most widely used clustering algorithms, being included in a list of the top 10 data mining algorithms WuKIS2008 . Its simplicity and general applicability vouch for its broad adoption. Unfortunately, its time complexity is proportional to the product of the number of points n, the number of dimensions d, the number of clusters k, and the number of iterations. Thus, for large values of these parameters even a single iteration of the algorithm is very slow.
The simplest way to handle larger datasets is parallelization ZhaoCC2009 ; XuPDS2014 , but this requires more computational power as well. Another way is to process the data online in batches, as done by the MiniBatch algorithm of Sculley SculleyWWW2010 , a variant of the Lloyd algorithm that trades off quality (i.e. the converged energy) for speed.
To improve both the speed and the quality of the clustering results, Arthur and Vassilvitskii ArthurDA2007 proposed the k-means++ initialization method. This initialization typically results in a higher quality clustering and fewer iterations of k-means than the default random initialization. Furthermore, the expected value of the resulting clustering energy is within an O(log k) factor of the optimal solution. However, the time complexity of the method is O(nkd), i.e. the same as a single iteration of the Lloyd algorithm, which can be too expensive in a large scale setting. Since k-means++ is sequential in nature, Bahmani et al. BahmaniVLDBE2012 introduced a parallel version of k-means++, but did not reduce the time complexity of the method.
Another direction is to speed up the actual k-means iterations. Elkan elkan2003using , Hamerly hamerly2010making and Drake and Hamerly drake2012accelerated go in this direction and use the triangle inequality to avoid unnecessary distance computations between cluster centers and the data points. However, these methods still require a full Lloyd iteration in the beginning and then gradually reduce the computation of successive iterations. The recent Yinyang k-means method of Ding et al. DingICML2015 is a similar method that also leverages bounds to avoid redundant distance calculations. While typically performing 2-3× faster than Elkan's method, it also requires a full Lloyd iteration to start with.
Philbin et al. philbin2007object introduce an approximate k-means (AKM) method based on kd-trees to speed up the assignment step, reducing the complexity of each k-means iteration from O(nkd) to O(nmd), where m ≪ k. In this case m, the number of distance computations performed per point in each iteration, controls the tradeoff between a fast and an accurate (i.e. low energy) clustering. Wang et al. wang2012fast use cluster closures for further speedups.
In this paper we propose k²-means, a method aiming at both fast and accurate clustering. Following the observation that the clusters usually change gradually and affect only local neighborhoods, in the assignment step we only consider the nearest neighbours of a center as candidates for the cluster's members. Furthermore, we employ the triangle inequality bounds introduced by Elkan elkan2003using to reduce the number of operations in each iteration. For initializing k²-means, we propose a divisive initialization method, which we experimentally show to be more efficient than k-means++.
Our k²-means gives a significant algorithmic speedup, reducing the complexity to O(n·k_n·d) per iteration for a chosen k_n ≪ k, while still maintaining a high accuracy comparable to methods such as k-means++. Similar to m in AKM, k_n controls a tradeoff between speed and accuracy. However, our experiments show that we can use a significantly lower k_n when aiming for a high accuracy.
The paper is structured as follows. In Table 1 we summarize the notations used in this paper. In Section 2 we introduce our proposed k²-means method and our divisive initialization. In Section 3 we describe the experimental benchmark and discuss the results obtained, while in Section 4 we draw conclusions.
n  number of data points to cluster
k  number of clusters
k_n  number of nearest clusters considered as candidates
d  number of dimensions of the data points
X  the data, X = {x_1, ..., x_n}
C  the cluster centers, C = {c_1, ..., c_k}
a  the cluster assignments
a_x  cluster assignment of point x
a_Y  cluster assignments of some set of points Y
X_c  points assigned to the cluster with center c
m_X  the mean of X: m_X = (1/|X|) Σ_{x∈X} x
||·||  the L2 norm
E(X)  energy of X: E(X) = Σ_{x∈X} ||x − m_X||²
N_C^{k_n}(c)  the k_n nearest neighbours of c in C (including c)
2 Proposed k²-means
In this section we introduce our k²-means method and motivate the design decisions. The pseudocode of the method is given in Algorithm 1.
Given some data X = {x_1, ..., x_n}, the k-means clustering objective is to find cluster centers C = {c_1, ..., c_k} and cluster assignments a, such that the cluster energy

E(C, a) = Σ_{c∈C} Σ_{x∈X_c} ||x − c||²   (1)

is minimized, where X_c denotes the points assigned to a cluster center c. For a data point x, we sometimes write a_x instead of a(x) for the cluster assignment. Similarly, for a subset Y of the data, a_Y denotes the cluster assignments of the corresponding points.
Standard Lloyd obtains an approximate solution by repeating the following until convergence: i) in the assignment step, each x ∈ X is assigned to the nearest center in C; ii) in the update step, each center c is recomputed as the mean of its members X_c.
The assignment step requires nk distance computations, i.e. O(nkd) operations, and dominates the time complexity of each iteration. The update step requires only O(nd) operations for the mean computations.
To speed up the assignment step, an approximate nearest neighbour method can be used, such as kd-trees philbin2007object ; muja2009fast or locality sensitive hashing indyk1998approximate . However, these methods ignore the fact that the cluster centers move across iterations, and that this movement is often slow, affecting only a small neighborhood of points. From this observation we obtain a very simple and fast nearest neighbour scheme:
Suppose at iteration t a data point x was assigned to a nearby center c. After updating the centers, we still expect c to be close to x. Therefore, the centers near c are likely candidates for the nearest center of x in iteration t+1. To speed up the assignment step, we thus only consider the k_n nearest neighbours of c, N_C^{k_n}(c), as candidate centers for the points X_c. Since for each point we only consider k_n centers in the assignment step (line 11 of Algorithm 1), the complexity is reduced to O(n·k_n·d). In practice, we can set k_n ≪ k.
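A hedged sketch of this restricted assignment step (NumPy; the function and variable names are illustrative, and the triangle-inequality bounds discussed next are omitted for clarity):

```python
import numpy as np

def restricted_assignment(X, C, assign, k_n):
    """Candidate-restricted assignment sketch: each point is only
    compared against the k_n centers nearest to its current center."""
    # k x k center-to-center squared distances, O(k^2 d).
    cc = ((C[:, None, :] - C[None, :, :]) ** 2).sum(axis=2)
    # For each center, its k_n nearest centers (including itself).
    neigh = np.argsort(cc, axis=1)[:, :k_n]
    new_assign = assign.copy()
    for i, x in enumerate(X):
        cand = neigh[assign[i]]               # candidate centers for x
        d = ((C[cand] - x) ** 2).sum(axis=1)  # only k_n distances
        new_assign[i] = cand[d.argmin()]
    return new_assign
```

Per point, only k_n distances are evaluated instead of k, which is where the O(n·k_n·d) per-iteration cost comes from.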
We also use triangle inequalities as in elkan2003using to avoid redundant distance computations in the assignment step (line 11 of Algorithm 1). Note that we maintain only k_n lower bounds per point, one for each center in its neighbourhood, instead of the k bounds of the Elkan method. We refer to the original Elkan paper elkan2003using for a detailed discussion of triangle inequalities and bounds.
As for standard Lloyd, the total energy can only decrease, both in the assignment step (since points are only moved to closer centers) and in the update step (since the mean minimizes the energy of a cluster). Thus, the total energy is monotonically decreasing, which guarantees convergence.
As shown by Arthur and Vassilvitskii ArthurDA2007 , a good initialization, such as k-means++, often leads to a higher quality clustering compared to random sampling. Since the O(nkd) complexity of k-means++ would negate the computational savings of k²-means, we propose an alternative fast initialization scheme, which also leads to high quality clustering solutions.
2.1 Greedy Divisive Initialization (GDI)
For the initialization of our k²-means, we propose a simple hierarchical clustering method named Greedy Divisive Initialization (GDI), detailed in Algorithm 2. Similarly to other divisive clustering methods, such as boley1998principal ; su2004deterministic , we start with a single cluster and repeatedly split the highest energy cluster until we reach k clusters.

To efficiently split each cluster, we use Projective Split (Algorithm 3), a variant of k-means with k = 2, that is motivated by the following observation: Suppose we have points X and centers c_1, c_2 in the 2-means method. Let h be the hyperplane with normal vector c_2 − c_1, going through (c_1 + c_2)/2 (see e.g. the top left corner of Figure 1). When we perform the standard k-means assignment step, we greedily assign each point to its closest centroid to get a solution with a lower energy, thus assigning the points on one side of h to c_1, and those on the other side of h to c_2.

Although this is the best assignment choice for the current centers c_1 and c_2, it may not be a good split of the data. Therefore, we depart from the standard assignment step and consider instead all hyperplanes along the direction c_2 − c_1 (i.e. with normal vector c_2 − c_1). We project X onto c_2 − c_1 and "scan" a hyperplane through the data to find the split that gives the lowest energy (lines 4-8 in Algorithm 3). To efficiently recompute the energy of the candidate splits as the hyperplane is scanned, we use the following Lemma:
Lemma 1.
(kanungo2002local, Lemma 2.1) Let X be a set of points with mean m_X. Then for any point z,

Σ_{x∈X} ||x − z||² = Σ_{x∈X} ||x − m_X||² + |X|·||m_X − z||²   (2)
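Lemma 1 can be checked numerically; the following sketch (NumPy, with arbitrary random data) verifies the identity for one point set and one query point:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))   # an arbitrary point set
m = X.mean(axis=0)             # its mean
z = rng.normal(size=3)         # an arbitrary query point

# Left side: total squared distance of X to z.
lhs = ((X - z) ** 2).sum()
# Right side: energy of X about its mean, plus a correction term.
rhs = ((X - m) ** 2).sum() + len(X) * ((m - z) ** 2).sum()
ok = np.isclose(lhs, rhs)
```

The identity is what allows the energy of a growing point set to be updated without revisiting all its members.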
Let X_i = {x_1, ..., x_i} denote the first i points in scan order, with mean m_i and energy E(X_i). We can now compute

E(X_{i+1}) = Σ_{j=1}^{i+1} ||x_j − m_{i+1}||²   (3)
           = Σ_{j=1}^{i} ||x_j − m_i||² + i·||m_i − m_{i+1}||² + ||x_{i+1} − m_{i+1}||²   (4)
           = E(X_i) + i·||m_i − m_{i+1}||² + ||x_{i+1} − m_{i+1}||²   (5)

where we used Lemma 1 in (4). Equipped with (5), we can efficiently update the energy terms in line 8 of Algorithm 3 as we scan the hyperplane through the data, using in total only O(n) distance computations and mean updates. Note that m_{i+1} is easily computed with an add operation, as m_{i+1} = (i·m_i + x_{i+1}) / (i + 1).
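A sketch of this scan, assuming the recurrence E(X_{i+1}) = E(X_i) + i·||m_i − m_{i+1}||² + ||x_{i+1} − m_{i+1}||² that follows from Lemma 1 (function names are illustrative; the real Algorithm 3 also iterates and re-estimates the scan direction, which is omitted here):

```python
import numpy as np

def prefix_energies(X):
    """Running energies E({x_1..x_i}) via the Lemma 1 recurrence,
    using one incremental mean update per point."""
    E = np.zeros(len(X))
    m = X[0].copy()
    for i in range(1, len(X)):
        m_new = (i * m + X[i]) / (i + 1)
        E[i] = (E[i - 1] + i * ((m - m_new) ** 2).sum()
                + ((X[i] - m_new) ** 2).sum())
        m = m_new
    return E

def best_split(X, direction):
    """Scan a hyperplane orthogonal to `direction` through the data and
    return the split with the lowest total energy (illustrative)."""
    order = np.argsort(X @ direction)
    Xs = X[order]
    left = prefix_energies(Xs)             # energies of Xs[:i+1]
    right = prefix_energies(Xs[::-1])[::-1]  # energies of Xs[i:]
    totals = left[:-1] + right[1:]         # split after position i
    i = totals.argmin()
    return totals[i], order[:i + 1], order[i + 1:]
```

Both the prefix and suffix energies are built in a single pass each, so the whole scan is linear in the cluster size apart from the sort.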
Compared to standard k-means with k = 2, our Projective Split takes the optimal split along the direction c_2 − c_1, but greedily considers only this direction. In Figure 1 we show how this can lead to a faster convergence.
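The overall GDI loop can be sketched as follows (illustrative only: a toy median split along the first coordinate stands in for Projective Split, and a priority queue would replace the linear scan for the highest-energy cluster in a tuned implementation):

```python
import numpy as np

def median_split(P):
    """Toy splitter standing in for Projective Split: median split
    along the first coordinate."""
    order = np.argsort(P[:, 0])
    half = len(P) // 2
    return order[:half], order[half:]

def gdi_sketch(X, k, split_fn):
    """Greedy divisive initialization sketch: repeatedly split the
    highest-energy cluster until k clusters remain."""
    def energy(P):
        return ((P - P.mean(axis=0)) ** 2).sum()
    clusters = [np.arange(len(X))]
    while len(clusters) < k:
        # Pick the cluster with the highest energy and split it.
        j = int(np.argmax([energy(X[c]) for c in clusters]))
        worst = clusters.pop(j)
        a, b = split_fn(X[worst])
        clusters += [worst[a], worst[b]]
    # Return the cluster means as the k initial centers.
    return [X[c].mean(axis=0) for c in clusters]
```

Any splitter with the `split_fn` signature can be plugged in, which is how the Projective Split of Algorithm 3 would slot into this loop.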
2.2 Time Complexity
Table 2 shows the time and memory complexity of Lloyd, Elkan, MiniBatch, AKM, and our k²-means.
The time complexity of each k²-means iteration is dominated by two factors: building the k_n-nearest-neighbour graph of the centers C (line 6), which costs k² distance computations, and computing distances between points and candidate centers (line 11), which initially costs n·k_n distance computations. Elkan and k²-means use the triangle inequality to avoid redundant distance calculations, and empirically we observe that the nk and n·k_n terms (respectively) gradually shrink over the iterations until convergence.
MiniBatch k-means processes only b samples per iteration (with b ≪ n), but needs more iterations for convergence. AKM limits the number of distance computations to m per point per iteration, giving a complexity of O(nmd) per iteration.
Table 3 shows the time and memory complexity of random initialization, k-means++, and our GDI. For GDI, the time complexity is dominated by the calls to Projective Split. If we limit Projective Split to a maximum number of iterations (2 in our experiments), then a call to ProjectiveSplit on a cluster X' costs O(|X'|) distance computations, vector additions and inner products, plus O(|X'| log |X'|) comparisons (for the sort). However, the resulting time complexity of GDI depends on the data.
For pathological datasets it could happen that for each call to ProjectiveSplit the minimum split is of the form (1, |X'| − 1), i.e. only one point is split off. In this case, since each split costs operations linear in the cluster size, the total complexity can reach O(nk). ^1 A simple example of such a pathological dataset can be constructed explicitly, but the magnitude of its entries grows extremely fast, e.g. reaching numbers with 195 digits.
A more reasonable case is when each call of ProjectiveSplit splits a cluster into two similarly large parts, i.e. the minimum split is of the form (α|X'|, (1 − α)|X'|) with α bounded away from 0. In this case the worst-case scenario is when in each split the highest energy cluster is also the largest cluster (in number of samples). ^2 If we split all clusters of approximately equal size simultaneously, we need O(log k) passes and perform O(n) computations in each pass. Therefore the time complexity of GDI lies between O(n log k) and O(nk).
In our experiments we count vector operations for simplicity (i.e. dropping the O(d) factor), as detailed in the next section. To fairly account for the complexity of the sorting step in ProjectiveSplit, we artificially count it in vector operations as well.
Complexity  Lloyd  Elkan elkan2003using  MiniBatch SculleyWWW2010  AKM philbin2007object  k²-means (ours)

Time  
Memory 
Complexity  random  k-means++  GDI (ours)

Time  
Memory 
3 Experiments
For a fair comparison between methods implemented in various programming languages, we use the number of vector operations as a measure of complexity, i.e. distances, inner products and additions. While these operations all share an O(d) complexity, the distance computations are the most expensive, accounting for the largest constant factor. However, since the runtime of all methods is dominated by distance computations (i.e. more than 95% of the runtime), for simplicity we count all vector operations equally and refer to them as "distance computations", using the terminology from elkan2003using .
3.1 Datasets
In our experiments we use datasets with 2414 to 150000 samples, ranging from 50 to 32256 dimensions, as listed in Table 6. The datasets are diverse in content and feature representation.
To create cnnvoc we extract 4096-dimensional CNN features krizhevsky2012imagenet for 15662 bounding boxes, each belonging to one of 20 object categories, from the PASCAL VOC 2007 everingham2010pascal dataset. covtype uses the first 150000 entries of the Covertype dataset ucimlrepo of cartographic features. From the mnist database lecun1998mnist of handwritten digits we also generate mnist50 by random projection of the raw pixels onto a 50-dimensional subspace. For tinygist10k we use the first 10000 images, with extracted GIST features, from the 80 million tiny images dataset torralba200880 . cifar represents the 50000 training images of the CIFAR krizhevsky2009learning dataset. usps hull1994database contains scans of handwritten digits (raw pixels) from envelopes. yale contains cropped face images from the Extended Yale B database GeBeKr01 ; KCLee05 .
3.2 Methods
We compare our k²-means with relevant clustering methods: Lloyd (standard k-means), Elkan elkan2003using (accelerated Lloyd), MiniBatch SculleyWWW2010 (web-scale online clustering), and AKM philbin2007object (efficient search structure).
Aside from our GDI initialization, we also use random initialization and k-means++ ArthurDA2007 in our experiments. For k-means++ we use the provided Matlab implementation. We implement MiniBatch k-means in Matlab according to Algorithm 1 in SculleyWWW2010 and use the provided codes for Elkan and AKM. Lloyd++ and Elkan++ combine the k-means++ initialization with Lloyd and Elkan, respectively.
We run all methods, except MiniBatch, for a maximum of 100 iterations. For MiniBatch k-means we fix the number of samples per batch and the number of iterations. For Projective Split (Algorithm 3) we perform only 2 iterations.
3.3 Initializations
We compare k-means++, random, and our GDI initialization by running 20 trials of k-means (Lloyd) clustering with k ∈ {100, 200, 500} on the datasets (excluding cifar and tiny10k, due to the prohibitive cost of standard Lloyd with a high number of clusters). Table 4 reports the minimum and average cluster energy as well as the average number of distance computations, relative to k-means++, averaged over the settings of k (i.e. 60 experiments for each row).
average convergence energy  minimum convergence energy  average runtime complexity  
Dataset  random  k-means++  GDI  random  k-means++  GDI  k-means++  GDI
cnnvoc  1.000  1.000  0.994  1.000  1.000  0.995  1.000  0.096 
covtype  1.507  1.000  0.983  1.384  1.000  0.991  1.000  0.116 
mnist  1.000  1.000  0.999  1.000  1.000  0.999  1.000  0.093 
mnist50  1.000  1.000  0.999  1.000  1.000  0.999  1.000  0.119 
tinygist10k  0.999  1.000  0.994  1.000  1.000  0.996  1.000  0.098 
usps  1.019  1.000  0.996  1.016  1.000  0.996  1.000  0.099 
yale  1.024  1.000  1.008  1.025  1.000  1.005  1.000  0.103 
average  1.078  1.000  0.996  1.061  1.000  0.997  1.000  0.103 
Our GDI gives a (slightly) better average and minimum convergence energy than the other initializations, while its runtime complexity is an order of magnitude smaller than that of the k-means++ initialization.
The corresponding expanded Table 7 in the supplementary material shows that the speedup of GDI over k-means++ improves as k grows, and at k = 500 is typically more than an order of magnitude. This makes GDI a good choice for the initialization of k²-means.
3.4 Performance
Our goal is fast and accurate clustering, where the cluster energy differs only slightly from that of Lloyd with a good initialization (such as k-means++) at convergence. Therefore, we measure the runtime complexity needed to achieve a clustering energy within a small margin of the energy obtained with Lloyd++ at convergence. In the supplementary material we report the performance at additional reference levels.
For a given budget, i.e. the maximum number of iterations and parameters such as m for AKM and k_n for k²-means, it is not known beforehand how well the algorithms approximate the targeted Lloyd++ energy. For a fair comparison we use an oracle to select the best parameters and number of iterations for each method, i.e. those that give the highest speedup while still reaching the reference error. In practice, one can use a rule of thumb or progressively increase m, k_n, and the number of iterations until a desired energy has been reached.
To measure performance we run AKM, Elkan++, Elkan, Lloyd++, Lloyd, MiniBatch, and k²-means on various datasets, each with 3 different seeds, and report in Table 6 the average speedups over Lloyd++ at the point where the reached energy is within the reference margin of the Lloyd++ convergence energy.
Each method is stopped once it reaches the reference energy, and for AKM and k²-means we use the parameters m and k_n that give the highest speedup.
Table 6 shows that for most settings our k²-means has the highest algorithmic speedup at the reference error. It benefits the most when both the number of clusters and the number of points are large, as seen on the larger datasets at high k. We do not reach the target energy for usps and yale at the largest values of k, because k_n was capped.
Figure 3 shows the convergence curves corresponding to the cifar and mnist50 entries of Table 6. On cifar the benefit of k²-means is clear, since it reaches the reference error significantly faster than the other methods. On mnist50 the comparison with AKM depends on k: k²-means is considerably faster for some settings, while AKM reaches the reference faster for others. We show more convergence curves in the supplementary material.
In all settings of Table 6, Elkan++ gives a consistent speedup (since it is an exact acceleration of Lloyd++). For some settings Elkan is faster than Elkan++ in reaching the desired accuracy; this is due to its faster initialization. MiniBatch fails in all but one case (on mnist) to reach the reference error and is thus not shown.
For accurate clustering, when the reference energy is the Lloyd++ convergence energy (i.e. 0% error), Table 6 shows that the speedups of k²-means are even higher. For this setting, the second fastest method is Elkan++, which is designed to exactly accelerate Lloyd++.
In the supplementary material we report extra results at more energy reference levels and also more plots showing the convergence of the compared methods on different datasets.
4 Conclusions
We proposed k²-means, a simple yet efficient method ideally suited for fast and accurate large scale clustering (large n, k). k²-means combines an efficient divisive initialization with a new method to speed up the k-means iterations, using the k_n nearest clusters as the set of candidate centers for the cluster members, as well as triangle inequalities. The algorithmic complexity of our k²-means is sublinear in k for k_n ≪ k, and it is experimentally shown to give a high accuracy on diverse datasets. For accurate clustering, k²-means requires an order of magnitude fewer computations than alternative methods such as the fast approximate k-means (AKM) clustering. Moreover, our efficient divisive initialization leads to comparable clustering energies and significantly lower runtimes than the k-means++ initialization under the same conditions.
References
 (1) D. Arthur and S. Vassilvitskii. k-means++: The advantages of careful seeding. In Proceedings of the eighteenth annual ACM-SIAM symposium on Discrete algorithms, pages 1027–1035. Society for Industrial and Applied Mathematics, 2007.
 (2) B. Bahmani, B. Moseley, A. Vattani, R. Kumar, and S. Vassilvitskii. Scalable k-means++. Proceedings of the VLDB Endowment, 5(7):622–633, 2012.
 (3) C. Blake, E. Keogh, and C. Merz. UCI repository of machine learning databases, 1998.
 (4) D. Boley. Principal direction divisive partitioning. Data mining and knowledge discovery, 2(4):325–344, 1998.
 (5) Y. Ding, Y. Zhao, X. Shen, M. Musuvathi, and T. Mytkowicz. Yinyang k-means: A drop-in replacement of the classic k-means with consistent speedup. In Proceedings of the 32nd International Conference on Machine Learning (ICML-15), pages 579–587, 2015.
 (6) J. Drake and G. Hamerly. Accelerated k-means with adaptive distance bounds. In 5th NIPS workshop on optimization for machine learning, 2012.
 (7) C. Elkan. Using the triangle inequality to accelerate k-means. In ICML, volume 3, pages 147–153, 2003.
 (8) M. Everingham, L. Van Gool, C. K. Williams, J. Winn, and A. Zisserman. The pascal visual object classes (voc) challenge. International journal of computer vision, 88(2):303–338, 2010.
 (9) A. Georghiades, P. Belhumeur, and D. Kriegman. From few to many: Illumination cone models for face recognition under variable lighting and pose. IEEE Trans. Pattern Anal. Mach. Intelligence, 23(6):643–660, 2001.
 (10) G. Hamerly. Making k-means even faster. In SDM, pages 130–140. SIAM, 2010.
 (11) J. J. Hull. A database for handwritten text recognition research. Pattern Analysis and Machine Intelligence, IEEE Transactions on, 16(5):550–554, 1994.
 (12) P. Indyk and R. Motwani. Approximate nearest neighbors: towards removing the curse of dimensionality. In Proceedings of the thirtieth annual ACM symposium on Theory of computing, pages 604–613. ACM, 1998.
 (13) T. Kanungo, D. M. Mount, N. S. Netanyahu, C. D. Piatko, R. Silverman, and A. Y. Wu. A local search approximation algorithm for k-means clustering. In Proceedings of the eighteenth annual symposium on Computational geometry, pages 10–18. ACM, 2002.
 (14) A. Krizhevsky and G. Hinton. Learning multiple layers of features from tiny images, 2009.
 (15) A. Krizhevsky, I. Sutskever, and G. E. Hinton. Imagenet classification with deep convolutional neural networks. In Advances in neural information processing systems, pages 1097–1105, 2012.
 (16) Y. LeCun, C. Cortes, and C. J. Burges. The MNIST database of handwritten digits, 1998.
 (17) K. Lee, J. Ho, and D. Kriegman. Acquiring linear subspaces for face recognition under variable lighting. IEEE Trans. Pattern Anal. Mach. Intelligence, 27(5):684–698, 2005.
 (18) S. P. Lloyd. Least squares quantization in PCM. Information Theory, IEEE Transactions on, 28(2):129–137, 1982.
 (19) M. Muja and D. G. Lowe. Fast approximate nearest neighbors with automatic algorithm configuration. VISAPP (1), 2, 2009.
 (20) J. Philbin, O. Chum, M. Isard, J. Sivic, and A. Zisserman. Object retrieval with large vocabularies and fast spatial matching. In Computer Vision and Pattern Recognition, 2007. CVPR'07. IEEE Conference on, pages 1–8. IEEE, 2007.
 (21) D. Sculley. Web-scale k-means clustering. In Proceedings of the 19th international conference on World wide web, pages 1177–1178. ACM, 2010.
 (22) T. Su and J. Dy. A deterministic method for initializing k-means clustering. In Tools with Artificial Intelligence, 2004. ICTAI 2004. 16th IEEE International Conference on, pages 784–786. IEEE, 2004.
 (23) A. Torralba, R. Fergus, and W. T. Freeman. 80 million tiny images: A large data set for nonparametric object and scene recognition. Pattern Analysis and Machine Intelligence, IEEE Transactions on, 30(11):1958–1970, 2008.
 (24) J. Wang, J. Wang, Q. Ke, G. Zeng, and S. Li. Fast approximate k-means via cluster closures. In 2012 IEEE Conference on Computer Vision and Pattern Recognition, 2012.
 (25) X. Wu, V. Kumar, J. R. Quinlan, J. Ghosh, Q. Yang, H. Motoda, G. J. McLachlan, A. Ng, B. Liu, S. Y. Philip, et al. Top 10 algorithms in data mining. Knowledge and Information Systems, 14(1):1–37, 2008.
 (26) Y. Xu, W. Qu, Z. Li, G. Min, K. Li, and Z. Liu. Efficient k-means++ approximation with MapReduce. Parallel and Distributed Systems, IEEE Transactions on, 25(12):3135–3144, Dec 2014.
 (27) W. Zhao, H. Ma, and Q. He. Parallel k-means clustering based on MapReduce. In Cloud Computing, pages 674–679. Springer, 2009.
Additional experimental results
In the expanded Table 7, corresponding to Table 4 in the paper, we see that the speedup of GDI over k-means++ improves as k grows, and at k = 500 is typically more than an order of magnitude.
In Figure 3 we show the convergence curves corresponding to cifar, cnnvoc, mnist and mnist50, which correspond to entries in Table 5 in the paper. For the same datasets we further show in Figure 4 the convergence curves for all parameters tried for k²-means and AKM. Note that we do not try parameters m and k_n that are larger than k.
In Tables 8, 9, 10 and 11 we show detailed speedup tables when the reference energy is taken at 0%, 0.5%, 1% and 2% deviation from the Lloyd++ convergence energy. The algorithmic speedups of our method are largest when the aim is to reach the same energy as the final Lloyd++ energy (i.e. 0% error). However, for a less accurate result, i.e. when the reference is taken at 2% from Lloyd++, Table 11 shows that k²-means is still competitive with AKM.
We conclude that our k²-means reaches its full potential when used for accurate clustering, that is, low clustering energies comparable with the energies at the convergence of a standard Lloyd with a robust k-means++ initialization. In such conditions, k²-means is clearly orders of magnitude faster than Lloyd++ and significantly faster than the efficient AKM and Elkan++.
average convergence energy  minimum convergence energy  average runtime complexity  
Dataset  k  random  k-means++  GDI  random  k-means++  GDI  k-means++  GDI
cnnvoc  100  1.00  1.00  1.00  1.00  1.00  1.00  1.00  0.16 
200  1.00  1.00  0.99  1.00  1.00  0.99  1.00  0.09  
500  1.00  1.00  0.99  1.00  1.00  0.99  1.00  0.04  
covtype  100  1.51  1.00  0.99  1.47  1.00  0.99  1.00  0.19 
200  1.58  1.00  0.98  1.38  1.00  0.99  1.00  0.11  
500  1.43  1.00  0.99  1.30  1.00  0.99  1.00  0.05  
mnist  100  1.00  1.00  1.00  1.00  1.00  1.00  1.00  0.15 
200  1.00  1.00  1.00  1.00  1.00  1.00  1.00  0.09  
500  1.00  1.00  1.00  1.00  1.00  1.00  1.00  0.04  
mnist50  100  1.00  1.00  1.00  1.00  1.00  1.00  1.00  0.19 
200  1.00  1.00  1.00  1.00  1.00  1.00  1.00  0.11  
500  1.00  1.00  1.00  1.00  1.00  1.00  1.00  0.05  
tinygist10k  100  1.00  1.00  1.00  1.00  1.00  1.00  1.00  0.16 
200  1.00  1.00  0.99  1.00  1.00  1.00  1.00  0.09  
500  1.00  1.00  0.99  1.00  1.00  1.00  1.00  0.04  
usps  100  1.01  1.00  0.99  1.01  1.00  1.00  1.00  0.16 
200  1.01  1.00  0.99  1.01  1.00  0.99  1.00  0.09  
500  1.04  1.00  1.00  1.04  1.00  1.00  1.00  0.05  
yale  100  1.01  1.00  1.00  1.00  1.00  0.99  1.00  0.16 
200  1.02  1.00  1.00  1.02  1.00  1.00  1.00  0.10  
500  1.05  1.00  1.03  1.05  1.00  1.02  1.00  0.05 
Dataset  n  d  k  AKM  Elkan++  Elkan  Lloyd++  Lloyd  MiniBatch  k²-means

cifar  50000  3072  50    17.8    1.0      37.9 
100    20.7    1.0      70.6  
200  1.2  24.2    1.0      139.8  
500  9.0  20.5  31.9  1.0  2.9    287.9  
1000  11.3  17.5  28.2  1.0  2.6    373.6  
cnnvoc  15662  4096  50  2.4  9.3    1.0      26.2 
100  1.3  11.0    1.0      44.3  
200  3.7  9.3    1.0      59.7  
500  5.7  9.2    1.0      79.2  
1000  5.8  8.1    1.0        
covtype  150000  54  50    28.9    1.0      172.0 
100    34.3    1.0      287.2  
200    40.2    1.0      442.4  
500    47.4    1.0      816.9  
1000    44.5    1.0        
mnist  60000  784  50  1.1  17.3  26.6  1.0  2.9    39.3 
100  2.7  20.5    1.0      56.8  
200    25.8    1.0      81.0  
500  5.0  23.6    1.0      108.1  
1000  9.0  29.8    1.0      141.1  
mnist50  60000  50  50    18.7    1.0      31.0 
100  1.4  22.3    1.0      39.7  
200  2.1  26.6    1.0      80.3  
500  3.4  23.2    1.0      79.8  
1000  5.1  22.7    1.0      94.1  
tinygist10k  10000  384  50  12.5  12.5  20.1  1.0  3.6    50.1 
100  3.0  11.7  17.0  1.0  2.8    53.5  
200  4.7  11.0    1.0      71.8  
500  6.2  10.7    1.0      71.7  
1000  2.6  7.8    1.0        
usps  7291  256  50    12.6    1.0      31.7 
100    15.0    1.0      46.4  
200  3.4  14.6    1.0      54.4  
500    10.8    1.0      47.6  
1000    9.4    1.0        
yale  2414  32256  50  2.8  9.5    1.0      32.5 
100    8.0    1.0        
200  20.8  6.5    1.0      18.7  
500  21.6  5.3    1.0        
1000    4.0    1.0       
Dataset  n  d  k  AKM  Elkan++  Elkan  Lloyd++  Lloyd  MiniBatch  k²-means

cifar  50000  3072  50  0.8  4.1  5.6  1.0  1.1    10.8 
100  0.7  4.7  7.2  1.0  1.4    20.1  
200  1.6  4.5  6.7  1.0  1.2    32.4  
500  3.0  4.1  6.5  1.0  1.2    69.0  
1000  4.6  4.1  6.7  1.0  1.2    101.0  
cnnvoc  15662  4096  50  8.8  2.8  3.8  1.0  1.3    8.7 
100  4.0  2.6  3.3  1.0  1.0    12.9  
200  7.8  2.7  3.7  1.0  1.2    21.9  
500  2.7  2.4  3.4  1.0  1.0    26.0  
1000  2.9  2.4    1.0      18.2  
covtype  150000  54  50    9.8    1.0      56.8 
100    7.5    1.0      62.4  
200    8.1    1.0      96.8  
500    12.5    1.0      249.8  
1000    12.0    1.0      190.0  
mnist  60000  784  50  1.0  4.7  6.6  1.0  1.2    12.4 
100  1.2  4.9  7.1  1.0  1.2    18.3  
200  1.5  6.1  9.3  1.0  1.3    31.8  
500  2.9  5.5  8.6  1.0  1.1    44.5  
1000  3.7  4.9  7.7  1.0  0.6    46.3  
mnist50  60000  50  50  7.5  5.4  7.6  1.0  1.2    11.1 
100  1.1  6.1  9.0  1.0  1.2    18.7  
200  1.8  6.7  10.5  1.0  1.4    30.1  
500  1.9  5.8  9.0  1.0  0.8    35.6  
1000  2.6  5.5  8.7  1.0  0.6    36.7  
tinygist10k  10000  384  50  11.6  3.4  5.1  1.0  1.6    14.2 
100  7.9  3.1  4.4  1.0  1.2    16.3  
200  7.8  3.1  4.6  1.0  1.3    24.8  
500  2.3  2.6  4.0  1.0  0.8    25.3  
1000  1.3  2.5    1.0        
usps  7291  256  50    5.8    1.0      15.4 
100  0.7  6.5    1.0      23.5  
200  17.0  5.8    1.0      25.2  
500  15.6  4.1    1.0      23.5  
1000    3.3    1.0        
yale  2414  32256  50  2.6  6.0    1.0      23.6 
100  1.1  3.6    1.0      12.7  
200  14.2  3.5    1.0      13.7  
500  14.4  2.6    1.0        
1000    2.0    1.0       
Dataset  n  d  k  AKM  Elkan++  Elkan  Lloyd++  Lloyd  MiniBatch  k²-means

cifar  50000  3072  50  1.0  2.6  3.7  1.0  1.0    9.5 
100  1.0  3.1  4.8  1.0  1.3    15.7  
200  1.9  3.0  4.6  1.0  1.1    26.2  
500  3.1  3.0  4.9  1.0  1.2    59.0  
1000  4.9  3.0  5.1  1.0  1.2    86.7  
cnnvoc  15662  4096  50  13.8  2.1  2.9  1.0  1.4    9.0 
100  14.2  2.0  2.6  1.0  1.1    11.2  
200  22.6  2.0  2.8  1.0  1.2    19.2  
500  15.1  1.9  2.8  1.0  1.1    25.8  
1000  3.3  1.9  2.8  1.0  0.9    20.2  
covtype  150000  54  50    6.1    1.0      35.1 
100    5.6    1.0      46.9  
200    6.3    1.0      78.7  
500    8.1    1.0      175.8  
1000    8.5    1.0      176.6  
mnist  60000  784  50  7.3  3.6  5.3  1.0  1.5  0.5  12.3 
100  4.2  3.3  4.9  1.0  1.1    14.6  
200  1.9  3.7  5.7  1.0  1.2    24.6  
500  3.5  3.8  6.2  1.0  1.1    39.8  
1000  4.7  3.6  5.9  1.0  0.8    43.4  
mnist50  60000  50  50  12.7  3.7  5.4  1.0  1.3    8.8 
100  8.3  4.1  6.2  1.0  1.2    15.0  
200  1.9  4.2  6.7  1.0  1.2    22.3  
500  2.2  4.2  6.8  1.0  1.0    33.1  
1000  3.1  4.1  6.6  1.0  0.8    38.0  
tiny10k  10000  3072  50  1.5  2.9  4.0  1.0  1.0    12.3 
100  1.3  2.7  4.1  1.0  1.2    21.0  
200  2.0  2.9  4.5  1.0  1.1    35.4  
500  2.3  2.7  4.4  1.0  0.6    49.0  
1000  2.2  2.6    1.0      51.8  
tinygist10k  10000  384  50  16.2  2.4  3.6  1.0  1.4    11.7 
100  12.9  2.2  3.3  1.0  1.2    14.2  
200  12.8  2.3  3.5  1.0  1.3    22.3  
500  2.5  2.1  3.3  1.0  1.0    26.2  
1000  1.5  2.1    1.0      13.6  
usps  7291  256  50  5.3  4.1    1.0      11.8 
100  7.2  4.7  7.1  1.0  0.6    19.0  
200  16.8  4.4    1.0      23.6  
500  19.6  3.3    1.0      23.8  
1000  18.5  2.7    1.0        
yale  2414  32256  50  2.1  4.2  6.3  1.0  0.6    17.9 
100  1.2  2.9    1.0      12.0  
200  21.9  2.9    1.0      13.9  
500  18.1  2.3    1.0        
1000    1.9    1.0       
Dataset  n  d  k  AKM  Elkan++  Elkan  Lloyd++  Lloyd  MiniBatch  k²-means

cifar  50000  3072  50  1.5  1.9  2.9  1.0  1.2    9.4 
100  1.9  2.1  3.3  1.0  1.3    15.6  
200  2.3  2.1  3.4  1.0  1.2    26.2  
500  4.6  2.2  3.7  1.0  1.3    53.5  
1000  5.6  2.3  3.9  1.0  1.3    81.7  
cnnvoc  15662  4096  50  15.5  1.5  2.1  1.0  1.3    8.1 
100  27.6  1.5  2.1  1.0  1.2    11.9  
200  41.0  1.5  2.2  1.0  1.3    19.5  
500  55.9  1.5  2.3  1.0  1.2    29.9  
1000  4.5  1.5  2.4  1.0  1.1    28.2  
covtype  150000  54  50    4.2    1.0      23.9 
100    4.2    1.0      36.4  
200    4.4    1.0      61.2  
500    5.1    1.0      123.9  
1000    5.5    1.0      154.9  
mnist  60000  784  50  14.1  2.3  3.5  1.0  1.4  1.4  10.0 
100  16.8  2.2  3.4  1.0  1.2  1.0  12.4  
200  30.0  2.4  3.9  1.0  1.2  0.3  20.6  
500  67.5  2.6  4.4  1.0  1.2    35.1  
1000  87.8  2.6  4.5  1.0  1.0    42.4  
mnist50  60000  50  50  12.9  2.3  3.4  1.0  1.2  1.0  6.6 
100  18.3  2.6  4.2  1.0  1.3  1.1  11.6  
200  23.2  2.8  4.6  1.0  1.2  0.8  19.2  
500  2.8  2.9  5.0  1.0  1.1  0.2  31.0  
1000  4.1  2.9  5.0  1.0  1.0    36.2  
tinygist10k  10000  384  50  20.3  1.7  2.7  1.0  1.4    10.3 
100  24.2  1.7  2.6  1.0  1.3    15.1  
200  33.3  1.8  2.8  1.0  1.3    21.8  
500  2.9  1.7  2.8  1.0  1.2    30.6  
1000  2.3  1.7  2.7  1.0  0.7    23.7  
usps  7291  256  50  9.5  2.7  3.7  1.0  0.6  0.2  9.7 
100  10.5  3.3  5.2  1.0  0.8    15.1  
200  30.1  3.3  5.3  1.0  0.5    21.5  
500  26.7  2.7    1.0      23.5  
1000  25.2  2.2    1.0      11.6  
yale  2414  32256  50  2.2  3.0  4.9  1.0  0.9    14.2 
100  1.7  2.3  3.7  1.0  0.4    12.9  
200  28.3  2.3    1.0      14.9  
500  29.6  2.0    1.0        
1000    1.6    1.0       