Introduction
Matrix sketching Woodruff (2014) is a powerful dimensionality reduction technique that can efficiently find a small matrix to replace an original large matrix while preserving most of its properties. For an input data matrix $A \in \mathbb{R}^{n \times d}$, where $n$ is the number of samples and $d$ is the number of features, matrix sketching methods generate a sketch $B = AR$ of $A$ by multiplying it with a random sketching matrix $R \in \mathbb{R}^{d \times r}$
with certain properties. Compared with traditional dimensionality reduction methods (e.g., Principal Component Analysis (PCA)
Jolliffe (2011)), matrix sketching methods can obtain the sketched matrix very efficiently with certain theoretical guarantees Woodruff (2014). Therefore, matrix sketching has gained significant research attention and has been widely used for handling high-dimensional data in machine learning
Mahoney (2011); Ailon and Chazelle (2006); Bojarski et al. (2017); Choromanski et al. (2017). A typical way of applying matrix sketching to a machine learning problem is sketch-and-solve Dahiya et al. (2018). For example, in a linear classification problem with training data $(A, y)$, where $A \in \mathbb{R}^{n \times d}$ is a large input feature matrix and
$y$ is the corresponding label vector, a classification model can be trained by solving $\min_w \sum_{i=1}^{n} \ell(w^\top a_i, y_i)$,
where $\ell$ denotes a loss function (e.g., hinge loss). By using matrix sketching, we can first obtain a sketched data matrix $B$
by $B = AR$ and then solve a much smaller problem $\min_{\tilde{w}} \sum_{i=1}^{n} \ell(\tilde{w}^\top b_i, y_i)$. The expensive computation on the original large matrix $A$ is thus replaced by computation on the small matrix $B$. This sketch-and-solve approach has also been used to speed up other machine learning tasks, such as least-squares regression Dobriban and Liu (2019), low-rank approximation Tropp et al. (2017); Clarkson and Woodruff (2017) and k-means clustering Boutsidis et al. (2010); Liu et al. (2017). Recent advances in randomized numerical linear algebra Martinsson and Tropp (2020)
have provided a solid theoretical foundation for matrix sketching. Various methods have been proposed to construct the random matrix $R$.
An early method Dasgupta and Gupta (1999) constructs a dense random Gaussian matrix where each element in $R$ is generated from a Gaussian distribution.
This method based on a dense random Gaussian matrix requires $O(ndr)$ time for computing the sketched matrix $AR$. Achlioptas (2003) proposed to generate a sparser random matrix where each element in $R$ is generated from $\{-1, 0, 1\}$ following a discrete distribution, which reduces the computational cost by a constant factor. In recent years, two well-known fast random projection matrices were proposed for efficiently computing the projection $AR$. The first is the Subsampled Randomized Hadamard Transform (SRHT), which can achieve $O(nd \log d)$ time for computing $AR$ Tropp (2011); Ailon and Liberty (2009). The second is count-sketch Clarkson and Woodruff (2017), which can compute $AR$ in $O(\mathrm{nnz}(A))$ time for any input $A$, where $\mathrm{nnz}(A)$ denotes the number of nonzeros of $A$; this makes count-sketch particularly suitable for sparse input data. In this paper, we focus on improving the count-sketch algorithm in the context of classification. Count-sketch constructs the random matrix $R$ as a product of two matrices $D$ and $\Phi$, i.e., $R = D\Phi$, where $D$ is a random diagonal matrix whose diagonal values are uniformly chosen from $\{-1, 1\}$ and $\Phi$ is a very sparse matrix in which each row has only one randomly selected entry equal to 1 and all others equal to 0. Previously, Paul et al. (2014) applied count-sketch to linear SVM classification and showed that a linear SVM trained on the sketched data matrix ensures generalization ability comparable to that in the original space in the case of classification. However, count-sketch has two main limitations: (1) it is a data-oblivious method, where the generation of the sketching matrix $R$ is totally independent of the input data matrix $A$, so the sketched matrix may not be effective for the subsequent classification algorithm; (2) the sketched data matrix $AR$ will not maintain the same sparsity rate as the original input $A$, which can make the subsequent classification algorithm more computationally expensive on the sketched data than on the original data.
Even though data-oblivious matrix sketching has been extensively studied, few studies focus on efficient data-dependent matrix sketching. Recently, Xu et al. (2017)
proposed to use the approximated singular value decomposition (SVD) as the projection subspace.
Lei and Lan (2020) proposed to improve SRHT by non-uniform sampling that exploits data properties. However, both methods produce a dense sketched matrix for sparse input data. In this paper, we focus on addressing the aforementioned two limitations of count-sketch. To address the first limitation, we first show an interesting connection between count-sketch and k-means clustering by analyzing the reconstruction error of count-sketch. Based on our analysis, we propose to reduce the reconstruction error of count-sketch by using k-means clustering to obtain the low-dimensional sketched data matrix. To address the second limitation, we propose to obtain sparse cluster centers by optimizing the k-means objective function using gradient descent with an $\ell_1$-ball projection in each iteration. Finally, we compare our proposed methods with five other popular matrix sketching algorithms on six real-life datasets. Our experimental results clearly demonstrate that our proposed data-dependent matrix sketching methods achieve higher accuracy than count-sketch and other random matrix sketching algorithms. Our results also show that our method produces a sparser sketched data matrix than count-sketch and the other matrix sketching methods, and that its prediction cost is smaller than that of the other matrix sketching methods.
Preliminaries
Randomized Matrix Sketching
Given a data matrix $A \in \mathbb{R}^{n \times d}$ and a random sketching matrix $R \in \mathbb{R}^{d \times r}$ with $r \ll d$, a sketched matrix $B \in \mathbb{R}^{n \times r}$ is produced by

$B = AR.$ (1)

Note that the matrix $R$ is randomly generated and is independent of the input data $A$. As shown in the following Johnson-Lindenstrauss lemma (JL lemma), randomized matrix sketching can preserve the pairwise distances of all data points using the sketched data matrix $B$.
Lemma 1 (Johnson-Lindenstrauss lemma (JL lemma), Johnson and Lindenstrauss (1984)).
For any $0 < \epsilon < 1$ and any integer $n$, let $r = O(\epsilon^{-2} \log n)$ and let $R \in \mathbb{R}^{d \times r}$ be a random orthonormal matrix. Then for any set of $n$ points in $\mathbb{R}^d$, the following inequality about the pairwise distance between any two data points $a_i$ and $a_j$ in the set
holds true with high probability:

$(1-\epsilon)\,\|a_i - a_j\|_2^2 \le \tfrac{d}{r}\,\|R^\top a_i - R^\top a_j\|_2^2 \le (1+\epsilon)\,\|a_i - a_j\|_2^2.$
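As a quick empirical illustration (not part of the paper's algorithm), the following sketch checks the JL distance-preservation property with a scaled Gaussian sketching matrix standing in for an exact orthonormal one; all sizes are arbitrary choices:

```python
import numpy as np

rng = np.random.default_rng(1)
n, d, r = 50, 2000, 1000  # n points in R^d, sketched down to r dimensions

X = rng.standard_normal((n, d))
# Scaled Gaussian sketching matrix: squared distances are preserved
# in expectation and concentrate for r = O(log(n) / eps^2).
R = rng.standard_normal((d, r)) / np.sqrt(r)
B = X @ R

def pairwise_sq_dists(M):
    # Squared Euclidean distances between all rows of M.
    sq = (M * M).sum(axis=1)
    return sq[:, None] + sq[None, :] - 2 * (M @ M.T)

orig = pairwise_sq_dists(X)
sketched = pairwise_sq_dists(B)
mask = ~np.eye(n, dtype=bool)
ratios = sketched[mask] / orig[mask]
print(ratios.min(), ratios.max())  # both close to 1
```

The ratios of sketched to original squared distances cluster tightly around 1, which is exactly the multiplicative $(1 \pm \epsilon)$ guarantee of the lemma.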
CountSketch
Among the various methods for constructing the sketching matrix $R$, count-sketch (also called sparse embedding) is well suited for sparse input data since it can achieve $O(\mathrm{nnz}(A))$ time for computing $AR$. Count-sketch Clarkson and Woodruff (2017) constructs the random matrix $R$ as $R = D\Phi$, where $D$ and $\Phi$ are defined as follows:

- $D \in \mathbb{R}^{d \times d}$ is a diagonal matrix with each diagonal entry independently chosen to be 1 or $-1$ with probability 0.5.

- $\Phi \in \{0, 1\}^{d \times r}$ is a binary matrix with $\Phi_{i, h(i)} = 1$ and all remaining entries 0, where $h$ is a random map such that for any $i \in \{1, \dots, d\}$, $h(i) = j$ for $j \in \{1, \dots, r\}$ with probability $1/r$.

Note that the random sketching matrix $R = D\Phi$ in count-sketch is a very sparse matrix in which each row has only one nonzero entry. The position of this nonzero entry is uniformly chosen and its value is either 1 or $-1$, each with probability 0.5. $AR$ can be computed in $O(\mathrm{nnz}(A))$ time because each nonzero entry in $A$ is multiplied by at most one nonzero entry in $R$.
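To make the construction concrete, here is a small, hypothetical pure-NumPy sketch of count-sketch that computes $B = AD\Phi$ bucket-by-bucket; a sparse-matrix implementation would visit each nonzero of $A$ exactly once, which is the $O(\mathrm{nnz}(A))$ cost:

```python
import numpy as np

rng = np.random.default_rng(2)
n, d, r = 200, 1000, 64

# Sparse-ish input data A (about 1% nonzeros).
A = np.where(rng.random((n, d)) < 0.01, rng.standard_normal((n, d)), 0.0)

sign = rng.choice([-1.0, 1.0], size=d)   # diagonal of D
h = rng.integers(0, r, size=d)           # random map h: [d] -> [r]

# B[:, j] = sum of sign[i] * A[:, i] over all columns i with h(i) = j.
B = np.zeros((n, r))
np.add.at(B.T, h, (A * sign).T)          # unbuffered scatter-add per bucket

# Sanity check against the explicit matrix product A @ D @ Phi.
Phi = np.zeros((d, r))
Phi[np.arange(d), h] = 1.0
assert np.allclose(B, (A * sign) @ Phi)
```

Note that each original column lands in exactly one of the $r$ buckets, so the sketch never has more nonzeros than $A$ itself per row, although collisions within a bucket densify the result, which is the sparsity issue discussed below.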
Methodology
Even though count-sketch has been successfully used for dimensionality reduction in linear SVM classification Paul et al. (2014), we argue that this data-oblivious method has two limitations: (1) the sketching matrix $R$ is randomly generated, which can result in a bad sketched matrix when some important columns of $A$ are not sampled by $\Phi$; (2) count-sketch will not preserve the sparsity rate of the original data.
When applying count-sketch for data classification, the first limitation can result in a bad low-dimensional embedding and hence a classification model with low accuracy. To illustrate this limitation, we show the classification accuracy of using count-sketch for dimensionality reduction on the mnist dataset over ten different runs in Figure
1. As shown in Figure 1, count-sketch (the blue line with triangle markers) can produce low classification accuracy in some runs, and its accuracy is not stable. We also show in this figure the accuracy of our proposed method, which will be introduced later (the red line with circle markers). We can see that our proposed method produces significantly better accuracy than count-sketch. The second limitation of count-sketch is that, when used with sparse input data, the sketched matrix can be much denser than the original data. We checked the sparsity rate of the mnist data before and after count-sketch. The original sparsity rate of the mnist data is 80.78%, and the sparsity rate is significantly decreased to 1.72% in the sketched data. Therefore, the sketched data can contain more nonzero values than the original data and make the subsequent classification algorithm slower. More examples can be found in the experiment section.
Connection between Count-Sketch and k-means Clustering
Since the construction of the matrices $D$ and $\Phi$ in count-sketch is oblivious to the input data matrix $A$, it can produce a bad sketched matrix (e.g., when some important columns of $A$ are not sampled in $\Phi$) and therefore result in low classification accuracy. In this paper, we seek to develop a data-dependent count-sketch method that addresses the two limitations of count-sketch. To motivate our method, we start by analyzing the reconstruction error of the original count-sketch method and show an interesting connection between count-sketch and k-means clustering.
Let us define a diagonal scaling matrix $M \in \mathbb{R}^{r \times r}$ as

$M = (\Phi^\top \Phi)^{-1},$ (2)

i.e., $M_{jj} = 1/n_j$, where $n_j$ is the number of columns assigned to bucket $j$ by the map $h$. Note that
$M \Phi^\top \Phi$ equals an identity matrix of size
$r \times r$. The reconstruction error of count-sketch can be represented as

$E = \|A - AD\Phi M \Phi^\top D\|_F^2,$ (3)

where $\|\cdot\|_F$ denotes the Frobenius norm of a matrix, defined as the square root of the sum of the squares of all its elements. Note that $\|X\|_F^2 = \mathrm{tr}(X^\top X)$, where the operator $\mathrm{tr}(\cdot)$ returns the sum of the diagonal entries of an input matrix. As shown in the following Proposition 1, the reconstruction error of count-sketch in (3) is equivalent to the objective function of applying k-means clustering to cluster the columns of $AD$ into $r$ clusters.
Proposition 1.
The reconstruction error of count-sketch is equivalent to the objective function of k-means clustering on the columns of the matrix product $AD$ if we treat $\Phi$ as a learnable variable that denotes the cluster membership of each column of $AD$.
Proof.
We first rewrite the reconstruction error as $E = \|(AD - AD\Phi M \Phi^\top)D\|_F^2$. Note that $D$ is a diagonal matrix with each diagonal entry either 1 or $-1$; therefore $\|XD\|_F = \|X\|_F$ for any matrix $X$. Let us use $G$ to denote $AD$; we then have

$E = \|G - G\Phi M \Phi^\top\|_F^2.$ (4)

Next, we show that the $j$-th column of the matrix product $G\Phi M$ is the mean of the columns of $G$ assigned to cluster $j$:

$G\Phi M = [\mu_1, \mu_2, \dots, \mu_r].$ (5)

Based on the definition of the matrix $\Phi$, $\Phi$ is an indicator matrix in which each row has only one nonzero entry. Therefore, $\Phi$ can be viewed as a cluster membership indicator matrix that randomly assigns the columns of the matrix $G$ into $r$ clusters. The nonzero element in the $i$-th row of $\Phi$ lying in column $j$ denotes that the $i$-th column of $G$ is assigned to cluster $j$. Note that the $i$-th column of the matrix product $G\Phi M \Phi^\top$ is the centroid of the cluster to which the $i$-th column of $G$ belongs. Therefore,

$E = \sum_{i=1}^{d} \|g_i - \mu_{h(i)}\|_2^2,$ (7)

where $h(i)$ returns the index of the cluster to which the $i$-th column of $G$ belongs and $\mu_{h(i)}$ is the centroid of that cluster. By treating $\Phi$ as a learnable variable that denotes the cluster membership, the reconstruction error of count-sketch is the same as the objective function of the k-means algorithm on the columns of $G$, as shown in (7).
∎
Proposition 1 provides an interesting connection between count-sketch and k-means clustering. In the count-sketch algorithm, the cluster membership indicator matrix $\Phi$ is randomly generated without considering intrinsic data properties, which can result in a bad embedding with high reconstruction error.
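This equivalence can be checked numerically. The snippet below is an illustrative sketch using the notation above ($G = AD$, random map $h$, indicator $\Phi$, scaling $M = (\Phi^\top\Phi)^{-1}$); it compares the count-sketch reconstruction error with the k-means objective under the same random assignment:

```python
import numpy as np

rng = np.random.default_rng(4)
n, d, r = 40, 30, 5

A = rng.standard_normal((n, d))
sign = rng.choice([-1.0, 1.0], size=d)   # diagonal of D
# Random cluster map; force every cluster nonempty so M is well defined.
h = np.concatenate([np.arange(r), rng.integers(0, r, size=d - r)])

Phi = np.zeros((d, r))
Phi[np.arange(d), h] = 1.0
counts = Phi.sum(axis=0)                 # cluster sizes n_j
M = np.diag(1.0 / counts)                # M = (Phi^T Phi)^{-1}

G = A * sign                             # G = A D
recon = G @ Phi @ M @ Phi.T              # column i -> its cluster centroid
err_sketch = np.linalg.norm(G - recon, "fro") ** 2

centroids = (G @ Phi) / counts           # column j = mean of cluster j
err_kmeans = sum(np.linalg.norm(G[:, i] - centroids[:, h[i]]) ** 2
                 for i in range(d))

print(abs(err_sketch - err_kmeans))  # ~0: the two objectives coincide
```

Replacing the random $h$ with an assignment learned by k-means can only lower this common objective, which is exactly the motivation for the method proposed next.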
Improved count-sketch by k-means and ℓ1-ball projection
As shown in (7), the reconstruction error of count-sketch can be reduced by replacing the random cluster membership indicator matrix $\Phi$ in the original count-sketch algorithm with a cluster membership indicator matrix $\hat{\Phi}$ produced by the k-means algorithm on the columns of $AD$. Motivated by this observation, we propose to use the k-means algorithm to learn the cluster membership indicator matrix from data for a lower reconstruction error. The new cluster centers returned by k-means, which equal the columns of $AD\hat{\Phi}M$, can then be used as the new low-dimensional feature representation. This new method results in lower reconstruction error than the original count-sketch method.
Apart from the reconstruction error, as mentioned earlier, another limitation of count-sketch is that it may not preserve the sparsity rate of the input data $A$. In other words, the new data representation can be dense even if the original data is highly sparse. This limitation can make the subsequent algorithm on the projected data even slower than simply using the original data without count-sketch. Therefore, instead of using Lloyd's classic k-means algorithm Lloyd (1982), we develop a new method that obtains very sparse cluster centers. We propose to obtain sparse cluster centers by optimizing the k-means objective in (7) using gradient descent together with an $\ell_1$-ball projection Duchi et al. (2008) in each update.
The gradient of the k-means objective function with respect to the $j$-th cluster center $\mu_j$ is

$\nabla_{\mu_j} E = 2 \sum_{i=1}^{d} \mathbb{1}[h(i) = j]\,(\mu_j - g_i),$ (8)

where $\mathbb{1}[h(i) = j]$ is a binary function that returns 1 if $h(i)$ equals $j$ (i.e., the $i$-th column of $G$ belongs to the $j$-th cluster) and 0 otherwise. In other words, the computation of the gradient only depends on the columns that belong to the $j$-th cluster in the current iteration.
By using gradient descent, in each iteration the cluster center can be updated as

$\mu_j \leftarrow \mu_j - \eta \nabla_{\mu_j} E,$ (9)

where $\eta$ is the learning rate. However, directly using (9) will not produce sparse cluster centers.
To obtain sparse cluster centers, we use an $\ell_1$-ball projection to make each $\mu_j$ a sparse vector. The approximate $\ell_1$-ball projection proposed in Sculley (2010) is an approximate extension of the exact $\ell_1$-ball projection of Duchi et al. (2008), and is very effective at producing sparse cluster centers, as shown in Sculley (2010). The basic idea of the approximate $\ell_1$-ball projection is to use bisection to find a threshold $\theta$ that projects a dense vector onto an $\ell_1$ ball whose radius lies between $\lambda$ and $(1+\epsilon)\lambda$. After $\theta$ is found, the projection maps the $i$-th entry of $\mu_j$ (denoted as $\mu_{j,i}$) to

$\mu_{j,i} \leftarrow \mathrm{sign}(\mu_{j,i}) \max(|\mu_{j,i}| - \theta,\, 0).$ (10)

As shown in (10), the resulting cluster centers $\mu_j$ will be sparse vectors, since (10) sets an element to 0 if its absolute value is smaller than $\theta$. The whole procedure of the approximate $\ell_1$-ball projection is described in Algorithm 1.
By using Algorithm 1 in each iteration of optimizing the k-means objective function by gradient descent, we obtain sparse cluster centers.
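The bisection idea can be sketched as follows. This is our own illustrative implementation, not the paper's exact Algorithm 1; `lam` is the ball radius $\lambda$ and `eps` the tolerance $\epsilon$:

```python
import numpy as np

def l1_ball_project(v, lam, eps=0.01, max_iter=100):
    """Approximately project v onto the l1 ball of radius lam by
    bisecting on the soft threshold theta, in the spirit of
    Sculley (2010); the exact projection is Duchi et al. (2008)."""
    if np.abs(v).sum() <= lam:
        return v.copy()                      # already inside the ball
    lo, hi = 0.0, np.abs(v).max()
    theta = hi
    for _ in range(max_iter):
        theta = (lo + hi) / 2.0
        norm = np.maximum(np.abs(v) - theta, 0.0).sum()
        if norm > lam * (1.0 + eps):
            lo = theta                       # threshold too small
        elif norm < lam:
            hi = theta                       # threshold too large
        else:
            break                            # norm within [lam, (1+eps)lam]
    return np.sign(v) * np.maximum(np.abs(v) - theta, 0.0)

v = np.array([0.8, -0.5, 0.05, 0.3, -0.02])
p = l1_ball_project(v, lam=1.0)
print(np.abs(p).sum())  # close to 1.0, with the small entries zeroed out
```

The soft threshold simultaneously enforces the $\ell_1$ constraint and zeroes every entry with magnitude below $\theta$, which is where the sparsity of the centers comes from.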
Algorithm Implementation and Analysis
We summarize our proposed algorithm for improving the original count-sketch in Algorithm 2 and name it Effective and Sparse Count-SKetch (ESCK). Our algorithm first obtains $G = AD$ (steps 1-2), the same as in the original count-sketch. The contribution of our algorithm is to replace the randomly generated cluster membership indicator matrix $\Phi$ in count-sketch with a learned cluster membership indicator matrix $\hat{\Phi}$. The sparse cluster centers are used as the low-dimensional data representation. As shown in steps 3 to 12, the sparse cluster centers are obtained by using gradient descent with the approximate $\ell_1$-ball projection to cluster the columns of $G$ into $r$ groups.
With respect to time complexity, step 2 only needs $O(\mathrm{nnz}(A))$ time because $D$ is a diagonal matrix. The time complexity of steps 3 to 12 is upper bounded by $O(t\,d\,n\,r)$, where $t$ is the number of iterations. For sparse input data, the per-iteration cost of updating the cluster centers is smaller than this bound since both the data and the cluster centers are sparse. Empirically, the k-means algorithm using gradient descent converges very fast, and only a few iterations are needed. In our experiments, we show that our proposed method is only several times slower than count-sketch, while the classification accuracy obtained by our method is much higher than that of count-sketch and the other methods. Note that our proposed method also returns the learned cluster membership indicator matrix $\hat{\Phi}$; therefore, our algorithm can be extended to the inductive setting and can generate the feature mapping for new unseen data by using $\hat{\Phi}$, which enjoys the same low computational complexity as the original count-sketch.
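Putting the pieces together, here is an illustrative, hypothetical version of the clustering loop in steps 3 to 12: assign the columns of $G = AD$ to their nearest centers, take a gradient step per center, then project onto the $\ell_1$ ball. For brevity it uses the exact Duchi et al. (2008) projection rather than the approximate bisection variant, and the gradient is scaled by cluster size:

```python
import numpy as np

def project_l1(v, lam):
    # Exact projection onto the l1 ball of radius lam (Duchi et al., 2008).
    if np.abs(v).sum() <= lam:
        return v
    u = np.sort(np.abs(v))[::-1]
    css = np.cumsum(u)
    rho = np.nonzero(u * np.arange(1, len(v) + 1) > css - lam)[0][-1]
    theta = (css[rho] - lam) / (rho + 1.0)
    return np.sign(v) * np.maximum(np.abs(v) - theta, 0.0)

def esck_centers(G, r, lam, lr=0.5, iters=20, seed=0):
    """Sparse k-means on the columns of G = A @ D via projected gradient."""
    rng = np.random.default_rng(seed)
    n, d = G.shape
    init = rng.choice(d, size=r, replace=False)
    C = np.stack([project_l1(G[:, i].copy(), lam) for i in init], axis=1)
    for _ in range(iters):
        # Assignment step: nearest center for every column of G.
        dist = ((G[:, :, None] - C[:, None, :]) ** 2).sum(axis=0)  # (d, r)
        h = dist.argmin(axis=1)
        # Per-center gradient step followed by the l1 projection.
        for j in range(r):
            members = G[:, h == j]
            if members.shape[1] == 0:
                continue  # empty cluster: leave its center unchanged
            grad = 2.0 * (C[:, j] - members.mean(axis=1))
            C[:, j] = project_l1(C[:, j] - lr * grad, lam)
    return C, h

rng = np.random.default_rng(5)
G = rng.standard_normal((20, 100))
C, h = esck_centers(G, r=8, lam=2.0)
print(C.shape, float((C == 0).mean()))  # sparse centers inside the l1 ball
```

The returned `h` plays the role of the learned $\hat{\Phi}$, so new columns can be mapped inductively by the same assignment, and the columns of `C` form the sparse low-dimensional representation.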
Dataset  # of samples  # of features  # of classes  sparsity rate
usps  9,298  256  10  0% 
mnist  60,000  780  10  87.78% 
gisette  7,000  5,000  2  0.85% 
realsim  72,309  20,958  2  99.75% 
rcv1binary  20,242  47,236  2  99.84% 
rcv1multi  15,564  47,236  53  99.86% 
Performance  usps (r=30)  mnist (r=100)  gisette (r=256)  realsim (r=256)  rcv1binary (r=256)  rcv1multi (r=256)
Gaussian  Accuracy(%)  90.60 ± 0.01  88.92 ± 0.03  90.70 ± 0.01  79.25 ± 0.06  81.63 ± 0.01  69.71 ± 0.01
  Sparsity rate  0%  0%  0%  0%  0%  0%
  Prediction time(ms)  2.89  20.84  3.01  31.15  7.95  22.42
Achlioptas  Accuracy(%)  90.17 ± 0.03  87.70 ± 0.02  89.80 ± 0.03  76.85 ± 0.06  81.86 ± 0.01  67.56 ± 0.07
  Sparsity rate  0%  0.01%  0%  1.07%  0.03%  0.03%
  Prediction time(ms)  3.12  54.98  3.98  41.92  11.71  88.76
CountSketch  Accuracy(%)  90.74 ± 0.01  87.66 ± 0.01  90.37 ± 0.02  77.21 ± 0.06  80.19 ± 0.02  69.38 ± 0.06
  Sparsity rate  0%  1.72%  0%  73.36%  73.65%  74.47%
  Prediction time(ms)  2.98  50.86  4.46  11.96  4.46  24.93
SRHT  Accuracy(%)  89.86 ± 1.66  87.14 ± 0.84  90.45 ± 0.87  78.37 ± 0.20  80.29 ± 0.69  68.50 ± 0.29
  Sparsity rate  0%  0%  0%  0%  0%  0%
  Prediction time(ms)  3.3  79.35  3.6  22.6  6.96  103
SRHT-topr  Accuracy(%)  90.68 ± 1.51  88.15 ± 0.77  92.45 ± 0.55  82.48 ± 0.19  82.14 ± 0.24  71.01 ± 0.77
  Sparsity rate  0%  0%  0%  0%  0%  0%
  Prediction time(ms)  4.22  75.23  3.3  46.6  6.14  101
ESCK-full  Accuracy(%)  90.90 ± 0.02  90.60 ± 0.02  93.25 ± 0.02  88.68 ± 0.07  92.91 ± 0.01  78.99 ± 0.01
  Sparsity rate  50.81%  43.10%  66.09%  89.57%  87.61%  88.44%
  Prediction time(ms)  0.97  15.81  0.99  3.99  1.01  13.51
ESCK-miniBatch  Accuracy(%)  91.90 ± 0.01  90.50 ± 0.02  94.45 ± 0.03  88.25 ± 0.08  90.01 ± 0.02  77.13 ± 0.01
  Sparsity rate  46.78%  40.29%  37.58%  97.47%  94.78%  95.37%
  Prediction time(ms)  1.21  16.86  2.26  3.28  0.99  5.98
Embedding time during the training stage
Method  usps  mnist  gisette  realsim  rcv1binary  rcv1multi
CountSketch  1.2s  5.1s  2.6s  2.6s  1.9s  1.1s 
ESCKfull  5.3s  26.4s  10.2s  38.5s  12.6s  8.5s 
ESCKminiBatch  1.1s  6.5s  4.5s  13.5s  5.3s  4.1s 
Experiments
In this section, we compare our proposed algorithms with several commonly-used random dimensionality reduction algorithms on six real-life datasets. These six datasets are downloaded from the LIBSVM website Chang and Lin (2011). A summary of the six datasets is shown in Table 1. The sparsity rate shown in the last column is the fraction of zeros in each input data matrix $A$. As shown in Table 1, there are four sparse datasets (mnist, realsim, rcv1binary, rcv1multi) and two dense datasets (usps, gisette).
We evaluate the performance of the following seven matrix sketching methods:

- Gaussian: the sketching matrix is a random Gaussian matrix Dasgupta and Gupta (1999).

- Achlioptas: the sketching matrix is randomly generated from a discrete distribution and is sparser than a Gaussian matrix Achlioptas (2003).

- CountSketch: the original data-oblivious count-sketch method Clarkson and Woodruff (2017).

- SRHT: the sketching matrix is generated by the Subsampled Randomized Hadamard Transform (SRHT) Tropp (2011).

- SRHT-topr: an improved, data-dependent variant of SRHT Lei and Lan (2020).

- ESCK-full: our proposed method, which uses full-batch gradient descent with the approximate $\ell_1$-ball projection to obtain the k-means centers.

- ESCK-miniBatch: our proposed method with mini-batch gradient descent and the approximate $\ell_1$-ball projection; it is more efficient than ESCK-full but has slightly lower accuracy.
Experimental Setting. For the two dense datasets (usps and gisette), the feature values are scaled to $[-1, 1]$ using min-max normalization. We use five-fold cross validation to evaluate the accuracy. The regularization parameter $C$ in SVM is chosen from a set of candidate values, the tolerance parameter $\epsilon$ for the approximate $\ell_1$-ball projection is fixed, and the radius $\lambda$ is chosen from a set of candidate values. Our experiments are performed on a desktop with an Intel(R) Core(TM) i7-9700 CPU @ 3.00GHz and 16.0 GB RAM.
Experimental Results. We report the classification accuracy, the sparsity rate of the sketched matrix, and the prediction time of the different algorithms in Table 2. The projected dimension $r$ for each dataset is given in the first row of the table; results for other settings of the projected dimension are discussed later. The first four methods are data-oblivious random projection methods and the last three are data-dependent random projection methods. The best accuracy for each dataset is in bold and the second best accuracy is in italic.
As shown in this table, the data-dependent matrix sketching methods (i.e., SRHT-topr, ESCK-full and ESCK-miniBatch) obtain higher accuracy than the data-oblivious matrix sketching methods. Among the six datasets, the proposed ESCK-full algorithm achieves the best accuracy on four datasets (mnist, realsim, rcv1binary, rcv1multi) and the second best accuracy on the other two (usps and gisette). Overall, our proposed method ESCK-full obtains the best accuracy. The proposed ESCK-miniBatch obtains slightly lower accuracy than ESCK-full but higher accuracy than the other five matrix sketching methods. The results in Table 2 demonstrate that our proposed methods achieve better accuracy than the other methods.
Algorithms  Projection Cost  Classification Cost
Gaussian  
Achlioptas  
CountSketch  
SRHT  
SRHTtopr  
ESCK 
With respect to the sparsity rate of the sketched data, as expected, Gaussian, Achlioptas, SRHT and SRHT-topr produce dense sketched data even if the input data is sparse. The original count-sketch method and our proposed methods can produce sparse embeddings for highly sparse input data, and the sparsity rate of the sketched data produced by our proposed methods is higher than that of count-sketch. Furthermore, our proposed method can even produce a sparse embedding for dense input data (e.g., usps and gisette). With respect to the prediction time, the prediction time of our methods is lower than that of the other methods. The prediction cost of the different algorithms is summarized in Table 4. Both count-sketch and our proposed ESCK are very efficient for prediction.
We also compare the embedding time of our proposed methods with the original count-sketch during the training stage. The results are shown in Table 3. As expected, our proposed methods are several times slower than the original count-sketch since we need to perform k-means clustering on the columns of $AD$. ESCK-miniBatch is faster than ESCK-full.
Impact of Projection Dimension $r$. In Figure 2, we show the experimental results of all algorithms with different projection dimensions $r$. As shown in this figure, our proposed method ESCK-full consistently obtains better accuracy than the other matrix sketching methods. The other two data-dependent matrix sketching methods, ESCK-miniBatch and SRHT-topr, also perform better than the four data-oblivious matrix sketching methods. When the parameter $r$ is small, the accuracy improvement of our proposed method is large on the realsim, rcv1binary and rcv1multi datasets.
Impact of Sparse Sketched Matrix. By tuning the radius $\lambda$ of the $\ell_1$-ball projection, our proposed method can produce a very sparse sketched matrix. In this section, we explore how the sparsity rate of the sketched matrix affects the classification accuracy. In Figure 3, we show the sparsity rate and accuracy for count-sketch and ESCK-full. The blue dashed line shows the accuracy of count-sketch, with its sparsity rate annotated by the text above this line. The red line shows the accuracies of ESCK-full at different sparsity rates of the sketched matrix. As shown in Figure 3, our proposed method obtains better accuracy than count-sketch with a higher sparsity rate. As the sparsity rate increases, the accuracy can slightly decrease but remains higher than that of count-sketch. On the mnist dataset, the count-sketch method generates a dense sketched matrix with a sparsity rate of 1.72%, and the accuracy of the subsequent classifier is 87.66%. In comparison, ESCK-full generates a sparse sketched matrix with higher classification accuracy.
Conclusion
In this paper, we propose a novel data-dependent count-sketch algorithm that produces a more effective and sparser subspace embedding than the original data-independent count-sketch algorithm. Our new method applies the k-means clustering algorithm to obtain the sketched data matrix, and a sparse sketched data matrix is obtained by using gradient descent with an $\ell_1$-ball projection to optimize the k-means clustering objective function. We compared our proposed algorithm with five other matrix sketching algorithms. Our experimental results on six real-life datasets demonstrate that our proposed methods achieve higher classification accuracy than count-sketch and the other matrix sketching methods. Our proposed methods also produce a sketched matrix with a higher sparsity rate than the other methods, which makes the subsequent classification model more efficient.
References
- Achlioptas (2003). Database-friendly random projections: Johnson-Lindenstrauss with binary coins. Journal of Computer and System Sciences 66(4), pp. 671-687.
- Ailon and Chazelle (2006). Approximate nearest neighbors and the fast Johnson-Lindenstrauss transform. In Proceedings of the Thirty-Eighth Annual ACM Symposium on Theory of Computing, pp. 557-563.
- Ailon and Liberty (2009). Fast dimension reduction using Rademacher series on dual BCH codes. Discrete & Computational Geometry 42(4), pp. 615.
- Bojarski et al. (2017). Structured adaptive and random spinners for fast machine learning computations. In Artificial Intelligence and Statistics, pp. 1020-1029.
- Boutsidis et al. (2010). Random projections for k-means clustering. In Advances in Neural Information Processing Systems, pp. 298-306.
- Chang and Lin (2011). LIBSVM: a library for support vector machines. ACM Transactions on Intelligent Systems and Technology (TIST) 2(3), pp. 1-27.
- Choromanski et al. (2017). The unreasonable effectiveness of structured random orthogonal embeddings. In Advances in Neural Information Processing Systems, pp. 219-228.
- Clarkson and Woodruff (2017). Low-rank approximation and regression in input sparsity time. Journal of the ACM (JACM) 63(6), pp. 1-45.
- Dahiya et al. (2018). An empirical evaluation of sketching for numerical linear algebra. In Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, pp. 1292-1300.
- Dasgupta and Gupta (1999). An elementary proof of the Johnson-Lindenstrauss lemma. International Computer Science Institute, Technical Report 22(1), pp. 1-5.
- Dobriban and Liu (2019). Asymptotics for sketching in least squares regression. In Advances in Neural Information Processing Systems, pp. 3675-3685.
- Duchi et al. (2008). Efficient projections onto the l1-ball for learning in high dimensions. In Proceedings of the 25th International Conference on Machine Learning, pp. 272-279.
- Johnson and Lindenstrauss (1984). Extensions of Lipschitz mappings into a Hilbert space. Contemporary Mathematics 26(189-206), pp. 1.
- Jolliffe (2011). Principal component analysis. Springer.
- Lei and Lan (2020). Improved subsampled randomized Hadamard transform for linear SVM. In The Thirty-Fourth AAAI Conference on Artificial Intelligence, pp. 4519-4526.
- Liu et al. (2017). Sparse embedded k-means clustering. In Advances in Neural Information Processing Systems, pp. 3319-3327.
- Lloyd (1982). Least squares quantization in PCM. IEEE Transactions on Information Theory 28(2), pp. 129-137.
- Mahoney (2011). Randomized algorithms for matrices and data. Foundations and Trends in Machine Learning 3(2), pp. 123-224.
- Martinsson and Tropp (2020). Randomized numerical linear algebra: foundations & algorithms. arXiv preprint arXiv:2002.01387.
- Paul et al. (2014). Random projections for linear support vector machines. ACM Transactions on Knowledge Discovery from Data (TKDD) 8(4), pp. 1-25.
- Sculley (2010). Web-scale k-means clustering. In Proceedings of the 19th International Conference on World Wide Web, pp. 1177-1178.
- Tropp et al. (2017). Practical sketching algorithms for low-rank matrix approximation. SIAM Journal on Matrix Analysis and Applications 38(4), pp. 1454-1485.
- Tropp (2011). Improved analysis of the subsampled randomized Hadamard transform. Advances in Adaptive Data Analysis 3(1-2), pp. 115-126.
- Woodruff (2014). Sketching as a tool for numerical linear algebra. Foundations and Trends in Theoretical Computer Science 10(1-2), pp. 1-157.
- Xu et al. (2017). Efficient non-oblivious randomized reduction for risk minimization with improved excess risk guarantee. In Proceedings of the Thirty-First AAAI Conference on Artificial Intelligence, pp. 2796-2802.