1 Introduction
Although nonparametric methods (Gershman & Blei, 2012) are exceptionally flexible tools for statistical modelling, they inherently lack scalability. A particular nonparametric method based on covariance functions, a.k.a. kernels, is the Gaussian process (GP) (Rasmussen & Williams, 2006), which has an analytically tractable posterior and marginal likelihood. However, the inability to truly scale represents a major limitation for time series and more general periodic problems, which require computing and storing data-dependent covariance matrices that grow as one acquires more data. While there have been various efforts to approximate GPs with lower-rank solutions based on inducing points (Snelson & Ghahramani, 2006; Hensman et al., 2013), these methods are still constrained by their data dependence.
To alleviate this data dependence, recent innovations take advantage of spectral decompositions of kernel functions with explicit data-independent features. Motivated by the key works of Rahimi & Recht (2007, 2009), Random Fourier Features (RFFs) construct explicit data-independent feature maps using ideas from harmonic analysis and sketching theory. By approximating kernels, these maps allow scalable inference with simple linear models. Various approximations to different kernels have followed, including polynomial kernels (Pham & Pagh, 2013; Pennington et al., 2015), dot-product kernels (Kar & Karnick, 2012), histogram and homogeneous kernels (Li et al., 2010; Vedaldi & Zisserman, 2012), and approximations based upon the so-called triple-spin (Choromanski et al., 2016): (Le et al., 2013; Yang et al., 2015; Felix et al., 2016). A recent work of note, quasi-Monte Carlo features (QMC) (Sindhwani & Avron, 2014), uses deterministic sequences on the hypercube to approximate shift-invariant kernels. In particular, we compare against this body of work with the Halton sequence, which was found to be highly performant, and with the generalised Halton using the permutation in (Rainville et al., 2012).
While much work has been done on data-independent kernel approximations such as RFFs, as opposed to Nyström methods (Williams & Seeger, 2001; Gittens & Mahoney, 2013), there is limited work on such approximations of periodic kernels. Two recent works (Solin & Särkkä, 2014; Tompkins & Ramos, 2018) explore approximations for periodic kernels in univariate time series, where some response varies periodically with respect to time; however, it is not clear how to tractably generalise such decompositions to multiple dimensions, where the response varies periodically as a function of multiple inputs. In this paper, leveraging deterministic feature space decompositions of multivariate periodic kernels, we demonstrate the efficacy of our solution in terms of both kernel Gram matrix reconstruction and generalisation error on synthetic and real data.
Specifically, our contributions are: i) a generalisation of Fourier series approximations of stationary periodic kernels to the multivariate domain using sparse index sets; ii) an efficient sparse decomposition for multivariate Fourier series representations of periodic kernels; iii) an extension to non-isotropic periodic kernels; iv) a general bound for the cardinality of the resulting full and sparse feature sets; v) an upper bound on the truncation error of the multivariate feature approximation; and vi) a numerically stable implementation for the periodic Squared Exponential kernel (in supplementary material).
We finally compare the proposed method in detail against recent state-of-the-art kernel approximations in terms of kernel approximation error as well as predictive downstream error. Empirical results on real datasets further demonstrate that deterministic index-set-based features provide significantly improved convergence and generalisation properties, reducing both the number of data samples and the number of features required for equivalently accurate predictions.
2 Preliminaries
We begin by reviewing the relevant kernel literature. The inference setup is deferred to the supplementary.
Notation. Let ℝ represent the set of real numbers, ℤ the set of integers, ℤ≥0 the set of non-negative integers, and ℤ+ the set of positive integers. For any arbitrary set A, let A^d be its Cartesian product repeated d times, where d ∈ ℤ+. Let 𝕋^d represent the d-dimensional torus, i.e. the d-fold product of the circle 𝕋 = [0, 2π). Throughout this paper, d represents the spatial dimension and N is the maximum refinement. The refinement may be interpreted as the multidimensional analogue of the integer-valued univariate Fourier series truncation n.
2.1 Fourier Features for Approximating Kernels
Random Fourier Features. The result of note from RFFs is Bochner's theorem (Bochner, 1933), which allows one to approximate a shift-invariant kernel k(x, y) = k(x − y). It states that a complex-valued function k is positive definite if and only if it is the Fourier transform of a finite non-negative Borel measure μ on ℝ^d: k(x − y) = ∫_{ℝ^d} e^{iω⊤(x−y)} dμ(ω). Without loss of generality, assuming the measure μ has an associated scaled probability density function p(ω), we have:

k(x − y) = σ² E_{p(ω)}[e^{iω⊤(x−y)}] ≈ (σ²/m) Σ_{j=1}^{m} e^{iω_j⊤(x−y)},  (1)

where the kernel approximation is determined by (1) and m is the number of spectral samples from the density p(ω). This is in fact an MC approximation to the integral. Using the relation in (1) and the fact that real kernels have no imaginary part, we can exploit the trigonometric identity:

cos(a − b) = cos(a)cos(b) + sin(a)sin(b),  (2)

giving the 2m-dimensional mapping φ : ℝ^d → ℝ^{2m}:

φ(x) = m^{−1/2} [cos(ω_1⊤x), …, cos(ω_m⊤x), sin(ω_1⊤x), …, sin(ω_m⊤x)]⊤,  (3)

such that k(x − y) ≈ φ(x)⊤φ(y).
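As a concrete sketch of the mapping in (3) for the SE kernel (not the paper's implementation; the sample count, lengthscale, and function name below are illustrative assumptions):

```python
import numpy as np

def rff_features(X, m, lengthscale=1.0, seed=0):
    """Random Fourier Features for the (non-periodic) SE kernel.

    Returns a 2m-dimensional map phi with phi(x)^T phi(y) ~ k(x - y),
    using the cos/sin form of the trigonometric identity (2).
    """
    rng = np.random.default_rng(seed)
    d = X.shape[1]
    # Spectral density of the SE kernel is Gaussian: omega ~ N(0, l^-2 I).
    W = rng.normal(scale=1.0 / lengthscale, size=(d, m))
    Z = X @ W
    return np.hstack([np.cos(Z), np.sin(Z)]) / np.sqrt(m)

# Sanity check: phi(x)^T phi(y) approximates exp(-||x - y||^2 / (2 l^2)).
X = np.array([[0.0, 0.0], [0.3, -0.2]])
Phi = rff_features(X, m=5000, lengthscale=1.0, seed=0)
approx = Phi[0] @ Phi[1]
exact = np.exp(-np.sum((X[0] - X[1]) ** 2) / 2.0)
```

The Monte Carlo error of the inner product decays as O(1/√m), which is the data-independence the later sections contrast with deterministic index set features.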
Fourier Series Features. A method of approximating periodic equivalents of standard shift-invariant kernels is presented in (Tompkins & Ramos, 2018), where the authors define a feature-based framework built on Fourier series expansions of periodic kernel functions. The key idea is that the Fourier series of a periodic kernel can be harmonically decomposed in a deterministic manner. Given a Fourier series representation of some time-domain kernel k(τ), the truncated Fourier series is:

k(τ) ≈ a_0/2 + Σ_{n=1}^{n'} a_n cos(n ω_0 τ),  (4)

with Fourier coefficients a_n = (1/L) ∫_{−L}^{L} k(τ) cos(n ω_0 τ) dτ. Here, L is the half period and ω_0 = π/L is the fundamental frequency. By solving for a_n one can determine the corresponding analytic Fourier series coefficients. While the above work is tractable in the univariate case, we show that it does not straightforwardly extend to higher dimensions, due to computationally intractable growth in the number of approximating coefficients. This limitation for high-dimensional periodicity motivates the main contribution of this work, where we show that there exist sparse multivariate expansions for periodic kernels.
2.2 Fourier Series in Higher Dimensions
Our goal is to represent high-dimensional periodic kernels. In the primal space of the full kernel, such a kernel can be represented as a product of independent periodic kernels, one per dimension. It is known that product compositions in the primal space of the kernel have an equivalent Cartesian product operation in the dual (feature) space. The key idea is that, by noting particular results from harmonic theory on weighted subspaces of the Wiener algebra (Bonsall & Duncan, 2012; Kämmerer, 2014; Kämmerer et al., 2015), we can use sparse sampling grids to efficiently represent multivariate periodic kernels.
Consider a sufficiently smooth multivariate periodic function f : 𝕋^d → ℝ. Such a function can formally be represented as its multivariate Fourier series:

f(x) = Σ_{k ∈ ℤ^d} f̂_k e^{i k⊤x},  (5)

with its Fourier series coefficients defined as f̂_k = (2π)^{−d} ∫_{𝕋^d} f(x) e^{−i k⊤x} dx. Let Π_I denote the space of all multivariate trigonometric polynomials supported on some arbitrary index set I ⊂ ℤ^d, which is simply a finite set of integer vectors; denote its cardinality by |I|. Practically, we are interested in a good approximating Fourier partial sum supported on some suitable index set I. We can now define weighted function spaces of the Wiener algebra, in which a weight function attempts to characterise the decay of the Fourier coefficients of all member functions: the coefficients f̂_k must decrease, in terms of ||k||, faster than the reciprocal of the weight. That is to say, the decay of the Fourier coefficients defines the smoothness of the function which we are trying to approximate with a partial sum. Furthermore, subspaces of the Wiener algebra contain functions of specific types of smoothness; specifically, we investigate index sets on weighted ℓ^p balls (LPB), which correspond to functions of isotropic smoothness, and also the Weighted Energy Norm Hyperbolic Cross (ENHC), a generalisation of the Weighted Hyperbolic Cross (HC), for functions of dominating mixed smoothness.

2.3 Index Sets
While the complete expansion of a univariate Fourier series is plausible, it is clear that, for some maximum isotropic order of the coefficients, the cardinality of the complete multivariate Fourier series expansion is computationally intractable; this cardinality corresponds to the tensor product of multiple one-dimensional Fourier series, one per dimension. Making things worse, the computational burden is amplified when we consider the expanded representation required for the separable feature decomposition, when the products of harmonic terms of the Fourier series themselves must be expanded into sums of cosines. All things considered, it would thus be desirable to: i) maintain high function approximation accuracy; and ii) minimise the total number of coefficients. To this end we introduce the next ingredient required to approximate multivariate kernels: index sets. We introduce a variety of d-dimensional weighted index sets from the multivariate function approximation literature with their formal definitions. Here, we briefly review the ENHC and refer the interested reader to the supplementary material for extended coverage of additional index sets. The index set definitions to follow refer to the supporting frequency lattice for multivariate Fourier series, whose integer vectors k index the Fourier series coefficients f̂_k.
Weighted Energy Norm Hyperbolic Cross
This section introduces the Weighted Energy Norm Hyperbolic Cross (ENHC) index set (Dũng et al., 2016), a generalisation of the Hyperbolic Cross (HC) index set (Zaremba, 1972; Hallatschek, 1992; Kämmerer, 2013). HC-based index sets are more suitable for approximating functions in spaces of dominating mixed smoothness, a.k.a. Korobov spaces (Gnewuch et al., 2014). In the case of the explicit tensor product of multivariate approximations (see supplementary regarding the ℓ^p ball), the cardinality grows exponentially fast in the dimension and is computationally intractable for accurately representing kernels beyond a few dimensions. The ENHC enables approximations of functions with a large proportion of the Fourier coefficient mass centred at the origin.
The weight function we consider for the ENHC is:
(6) 
with dimension-dependent weighting parameter γ and sparsity parameter T. We can now define the Weighted Energy Norm Hyperbolic Cross index set of refinement N:
(7) 
The primary difference between the ENHC and the HC index sets lies in the cardinality of the ENHC for the case T > 0, which interestingly is not bounded by the logarithmic term that occurs in the HC cardinality (see supplementary for details). We show in corresponding experiments how the performance of the ENHC in higher dimensions closely resembles that of the Tensor or Total order index sets, and continues to perform effectively in higher dimensions where the ℓ^p ball becomes computationally intractable.
3 Index Set Fourier Series Features
The goal of our work is to show how multivariate Fourier series representations of kernels with sparse approximation lattices allow efficient and deterministic feature decompositions of multivariate periodic kernels. Formally, we define the shift invariant multivariate periodic kernel approximation as a multivariate Fourier series expansion supported on an arbitrary index set :
(8) 
for some explicit feature map φ and multivariate Fourier series coefficients f̂_k.
We begin by explaining how one obtains Fourier series coefficients for univariate Fourier series kernel approximations. We follow with our feature construction, which we term Index Set Fourier Series Features (ISFSF). We then introduce a sparse construction reducing the total feature count with no loss of accuracy. We show it is straightforward to represent non-isotropic hyperparameters and perform non-isotropic approximations. Finally, we give an upper bound for the multivariate truncation error, as well as an interesting numerically ideal asymptotic of the Bessel-to-exponential term in the Fourier coefficient of the periodic Squared Exponential kernel (provided in supplementary).
3.1 Univariate Periodic Kernel Approximations
We first demonstrate how one constructs a stationary periodic kernel and its corresponding Fourier series decomposition. It is possible to construct a periodic kernel from any stationary kernel by applying the warping u(x) = (cos(ω_0 x), sin(ω_0 x)) (MacKay, 1998) to the data and then passing the warped result into any standard stationary kernel. Specifically, by performing this warping on a stationary kernel with the general squared distance metric, replacing x − x′ with u(x) − u(x′), we have:
(9) 
To demonstrate the warping process in full, as an example, consider the well-known Squared Exponential (SE) kernel (Rasmussen & Williams, 2006) with lengthscale ℓ. After performing (9) we recover the periodic SE kernel k(τ) = exp(−2 sin²(ω_0 τ / 2) / ℓ²), where τ = x − x′. Solving the coefficients for (4) yields the univariate solution for a Fourier series representation. For the periodic SE:

k(τ) ≈ Σ_{n=0}^{n'} a_n cos(n ω_0 τ),  a_0 = I_0(ℓ^{−2}) e^{−ℓ^{−2}},  a_n = 2 I_n(ℓ^{−2}) e^{−ℓ^{−2}} for n ≥ 1,  (10)

where I_n is the modified Bessel function of the first kind and n' is the truncation factor of the partial sum of the Fourier series (Tompkins & Ramos, 2018). We use this univariate solution on a per-dimension basis for our multivariate feature construction.
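A numerically stable evaluation of the coefficients in (10) can use SciPy's exponentially scaled Bessel function `ive`, which computes the Bessel-to-exponential product I_n(x)e^{−x} directly and so avoids overflow for small lengthscales; the function names below are ours:

```python
import numpy as np
from scipy.special import ive  # exponentially scaled: ive(n, x) = I_n(x) * exp(-x)

def periodic_se_coeffs(n_max, lengthscale):
    """Fourier coefficients a_n of the periodic SE kernel:
    a_0 = I_0(l^-2) e^{-l^-2},  a_n = 2 I_n(l^-2) e^{-l^-2} for n >= 1."""
    x = 1.0 / lengthscale**2
    a = 2.0 * ive(np.arange(n_max + 1), x)
    a[0] /= 2.0
    return a

def periodic_se_partial_sum(tau, period, n_max, lengthscale):
    """Truncated Fourier series of k(tau) = exp(-2 sin^2(pi tau / p) / l^2)."""
    a = periodic_se_coeffs(n_max, lengthscale)
    w0 = 2.0 * np.pi / period
    n = np.arange(n_max + 1)
    return a @ np.cos(np.outer(n, np.atleast_1d(tau)) * w0)

tau = np.array([0.0, 0.37, 0.5])
approx = periodic_se_partial_sum(tau, period=1.0, n_max=10, lengthscale=1.0)
exact = np.exp(-2.0 * np.sin(np.pi * tau) ** 2)
```

For unit lengthscale the coefficients decay rapidly, so even a modest truncation reproduces the analytic periodic SE kernel to high accuracy.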
3.2 ISFSF Feature Construction
This section presents our main contribution for approximating high-dimensional periodic kernels. We show that simply using multivariate Fourier series is not sufficient to represent a stationary kernel in a decomposable way. Using index sets for multivariate Fourier series, we present a feature construction involving the recursion of a trigonometric identity and the realisation that there is a computationally efficient way to calculate the required sign coefficients within the harmonic feature terms using a Cartesian combination. We also show that the resulting feature set is sparse and can thus be further reduced with no loss of accuracy.
We introduce the method with an illustrative example for the bivariate case d = 2, and follow with the general construction for all d. An extended worked example for d = 3 is provided in the supplementary. Let us now consider the bivariate case on 𝕋²:

k(τ) ≈ Σ_{k ∈ I} c_k a_k cos(k_1 ω_1 τ_1) cos(k_2 ω_2 τ_2),  (11)

where c_k is some dimension- and refinement-index-dependent constant and k = (k_1, k_2) is a coordinate index vector, subscripted by the per-dimension refinement index, from some index set I with max refinement N, for the isotropic case of equal weights. Multidimensional Fourier series coefficients a_k are evaluated at index k, where the shorthand notation refers to the per-dimension product of the respective univariate coefficients: a_k = a_{k_1} a_{k_2}. For our feature expansion we are interested in the data-dependent trigonometric term of the form cos(k_1 ω_1 τ_1) cos(k_2 ω_2 τ_2), where τ_1 and τ_2 are the data-dependent differences for dimensions 1 and 2 respectively. In order for this to be decomposed into the form of (2), we require an additional trigonometric identity:

cos(α) cos(β) = ½ [cos(α + β) + cos(α − β)].  (12)

For the bivariate case, a single application of (12) admits the solution. To explicitly illustrate this, we set refinement N = 1 and use the full tensor grid index set {0, 1}². After applying (12), the tensor grid allows a straightforward decomposition into separable features, e.g. for k = (1, 1):

cos(ω_1 τ_1) cos(ω_2 τ_2) = ½ [cos(ω_1 τ_1 + ω_2 τ_2) + cos(ω_1 τ_1 − ω_2 τ_2)].

This trigonometric term can now be decomposed using (2) because we have independent cosine terms; that is to say, a product of d cosine terms has a decomposition as a sum of 2^{d−1} harmonic terms (cosines), where the inner terms involve an element-wise product with rows of a sign matrix S. Continuing the above example, this sign coefficient matrix is S = [[1, 1], [1, −1]]. Having shown the expanded trigonometric form, we can now write the general form for the harmonic monomial product expansion for any particular coordinate k of some arbitrary index set I from the partial sum for a d-dimensional multivariate Fourier series:
(13) 
where ω_s is the s-th dimension's fundamental frequency. The term (13), after repeated application of (12), admits the following decomposable form:
(14) 
with
(15)  
(16)  
(17)  
(18) 
where the sign matrix is generated as the (d − 1)-times Cartesian combination of the ordered set {+1, −1}, ⊕ denotes a horizontal concatenation between some matrix and every element in some ordered set, and ⊙ refers to the Hadamard (element-wise) product. Using (2), the complete decomposed feature is thus given as:
(19) 
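The repeated application of (12) underlying the decomposition can be verified numerically; the helper below builds the sign rows via a Cartesian combination over {+1, −1} with the first sign fixed (names are ours):

```python
import itertools
import numpy as np

def cos_product_expansion(thetas):
    """Expand prod_j cos(theta_j) into 2^(d-1) cosines of signed sums,
    i.e. the repeated application of the product-to-sum identity (12).
    The sign vectors come from a Cartesian combination over {+1, -1}."""
    thetas = np.asarray(thetas, float)
    d = len(thetas)
    # Fix the first sign to +1; the remaining d-1 signs range over {+1, -1}.
    signs = np.array(list(itertools.product([1.0, -1.0], repeat=d - 1)))
    sign_matrix = np.hstack([np.ones((signs.shape[0], 1)), signs])
    return np.cos(sign_matrix @ thetas) / 2 ** (d - 1), sign_matrix

vals, S = cos_product_expansion([0.3, 1.1, -0.4])
lhs = np.prod(np.cos([0.3, 1.1, -0.4]))  # product of cosines
rhs = vals.sum()                         # sum of signed-sum cosines
```

Each signed sum of angles then splits via identity (2) into separable per-point cosine and sine factors, which is what makes the map data-independent.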
We can now state the cardinality (i.e. resulting dimensionality) of the resulting decomposed feature map over some index set .
Lemma 1.
Cardinality of the Fourier series feature map for some arbitrary index set. Let |I| be the cardinality of a given index set of refinement N, and let the dimension d be given. Let D denote the cardinality of the decomposed feature map. The following holds (see supplementary for proof):
3.3 Sparse Construction
Although suitable, the decomposable form for the multivariate Fourier series features can be improved. An ideal feature representation should not just approximate our kernel well but should do so efficiently. For ISFSF, this involves the data-dependent harmonic term, the occurrences of which we want to minimise. Scrutinising the product form in (13) a little closer, notice that each per-dimension index k_s is an integer which may take the value 0. This means that for all refinement coordinates with k_s = 0 in some dimension s we have cos(0 · ω_s τ_s) = 1, therefore the "feature" is simply 1 times some data-independent coefficient. Furthermore, since any single term in the product (13) contributes a multiplier of features before exponentiation due to the repeated application of (12) (e.g. for d = 2 and d = 3, giving 2 and 4 final features per coordinate respectively), we do not want to unnecessarily include features that will simply evaluate to a constant. Masking out specific product terms where k_s = 0 and retaining only the data-independent coefficients in (13), we admit a sparse construction. We now define a masking function over some function:
(20) 
Now we can define the augmented product expansion:
(21) 
where ω_s is the s-th dimension's fundamental frequency. The term (21), after repeated application of (12), admits a decomposable form with all corresponding terms identical to (18). We are now ready to state the sparse feature decomposition:
(22) 
We now give an improved cardinality for the decomposed sparse ISFSF feature map. We emphasise this improved feature map cardinality is for the same reconstruction accuracy. To determine the cardinality of this sparse feature map, let:
(23) 
define a function that counts, for a particular coordinate, the non-zero indices. Essentially, we want to count which terms to keep, and these will always occur at coordinates with per-dimension refinement index not equal to 0.
Lemma 2.
Cardinality of the sparse Fourier series feature map for some arbitrary index set. Let |I| be the cardinality of a given index set of refinement N, and let the dimension d be given. Let D' then denote the cardinality of the decomposed sparse index set Fourier series feature map. The following holds (see supplementary for proof):
To appreciate this improvement in real terms, consider an HC index set of a given dimension, refinement, and weighting: the sparse feature map has significantly reduced cardinality compared with the full ISFSF feature expansion, for the same approximating accuracy.
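The gap between the full and sparse counts can be illustrated with a counting rule consistent with the construction above: a full expansion contributes 2^{d−1} features per coordinate, while the sparse map contributes 2^{z−1} for a coordinate with z non-zero entries (and a single constant for the all-zero coordinate). The exact lemma statements are in the supplementary; this sketch only illustrates the gap:

```python
import itertools

def full_feature_count(index_set, d):
    # Each index-set coordinate expands into 2^(d-1) cosine features
    # after repeated application of the product-to-sum identity.
    return len(index_set) * 2 ** (d - 1)

def sparse_feature_count(index_set):
    # Dimensions with a zero index contribute cos(0) = 1, so only the
    # z non-zero entries of a coordinate generate harmonics: 2^(z-1)
    # features, with the all-zero coordinate reducing to one constant.
    total = 0
    for k in index_set:
        z = sum(1 for kj in k if kj != 0)
        total += 2 ** (z - 1) if z > 0 else 1
    return total

# Full tensor grid {-2, ..., 2}^2 as a small worked example.
tensor = list(itertools.product(range(-2, 3), repeat=2))
full = full_feature_count(tensor, d=2)  # 25 coordinates x 2 features each
sparse = sparse_feature_count(tensor)
```

On this small grid the sparse map drops the constant-valued features contributed by the 9 coordinates containing a zero index, with the saving growing quickly in higher dimensions.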
3.4 Multivariate Truncation Error
We have from the univariate truncation error (Solin & Särkkä, 2014) that the error of the partial sum is bounded by the tail of the Fourier coefficients, and furthermore that the sum of the coefficients converges to 1. It is straightforward to extend this to the multivariate case by considering the tensor index set product expansion, since the multivariate coefficients are per-dimension products of univariate ones. Accordingly, we define the multivariate truncation error:

(24)

where N is the refinement, the superscript (s) refers to an evaluation on the s-th dimension, and ℓ is the kernel lengthscale; the corresponding term refers to the approximated kernel's Fourier coefficient at the per-dimension refinement index with the s-th dimension's corresponding lengthscale.
3.5 NonIsotropy
Non-isotropic Hyperparameters. Directly analogous to hyperparameter Automatic Relevance Determination (ARD) in GPs (MacKay, 1996; Neal, 2012), we demonstrate how, with ISFSF, one can straightforwardly model non-isotropic kernel hyperparameters such as lengthscale and period. There are two considerations.
Firstly, in the case of non-isotropic periodicities, we simply promote the scalar fundamental frequency to a vector of per-dimension fundamental frequencies. Secondly, for non-isotropic hyperparameters such as the lengthscale, we again promote a scalar value to a vector: for some arbitrary scalar hyperparameter, the non-isotropic extension is simply the corresponding vector of per-dimension values.
Figure 3 demonstrates more clearly how non-isotropic lengthscale and periodicity affect the multivariate coefficients and data-dependent harmonic terms. Observe that lower lengthscales correspond to slower decay in the coefficients, reflecting the hyperparameter-dependent upper bound given in Section 3.4.
Non-isotropic Approximation. We now briefly remark upon non-isotropic approximation. Recall from Figure 2 that the smaller the lengthscale, the more refinements are required for a particular approximation accuracy and, conversely, the fewer refinements for larger lengthscales. This motivates the idea that, for any particular multivariate kernel being approximated with Fourier series, it is not necessary to have isotropic refinements for all dimensions: between two levels of refinement we achieve no more than some bounded improvement in approximating error. Formally, non-isotropic approximation is defined by the refinement weight parameter for each weighted index set.
4 Experiments
Table 1. RMSE and MNLL for each method at five increasing totals of feature components.

                     RMSE (Total components)                   MNLL (Total components)
Dataset   Method
Pores     ENHC      0.0787  0.0701  0.0612  0.0608  0.0608    0.3971   0.9284  1.3578  1.3504  1.2833
          RFF+W     0.271   0.2615  0.2329  0.131   0.0614    9.7526   4.3514  1.1269  0.5845  1.1675
          HAL+W     0.2737  0.258   0.2267  0.1271  0.0612    9.7804   4.2501  1.0625  0.6224  1.1665
          GHAL+W    0.265   0.2436  0.2063  0.1316  0.0611    9.3598   3.8556  0.773   0.5759  1.1653
Rubber    ENHC      0.1119  0.1042  0.1013  0.1026  0.1024    10.1399  3.5833  0.554   0.363   0.7092
          RFF+W     0.1491  0.1436  0.1443  0.1363  0.1102    16.3536  7.4633  2.4248  0.3837  0.6986
          HAL+W     0.1523  0.1412  0.1465  0.1319  0.1111    16.8435  2.3266  7.5562  0.2451  0.7092
          GHAL+W    0.146   0.1421  0.1344  0.1244  0.107     15.8572  7.1348  2.1058  0.0976  0.7331
Tread     ENHC      0.0614  0.0617  0.0622  0.0624  0.0623    0.6146   1.0925  1.2306  1.2716  1.2747
          RFF+W     0.0611  0.0615  0.0625  0.0622  0.0618    0.5593   1.0157  1.1273  1.2301  1.176
          HAL+W     0.0613  0.0619  0.0625  0.0633  0.0626    0.5874   1.0001  1.1205  1.2236  1.2115
          GHAL+W    0.0622  0.0623  0.0628  0.0621  0.0617    0.5293   1.1558  1.2033  0.9971  1.2376
We make three critical assessments of the proposed periodic kernel approximation. First, we directly compare the approximation with the full kernel in terms of Gram matrix reconstruction; second, we perform a comparison of predictive qualities against an analytic periodic function (see supplementary material); and finally, we directly compare our features to the corresponding state-of-the-art approximations on large-scale real-world textured image data.
4.1 Quality of Kernel Approximation
We first analyse the proposed feature in terms of the reconstruction error between a true Gram matrix K, using the analytic periodic kernel, and its approximated Gram K̃, where each entry K̃_{ij} = φ(x_i)⊤φ(x_j). For all comparisons, the metric we use is the normalised Frobenius error ||K − K̃||_F / ||K||_F, using uniformly drawn samples on a single periodic interval. The primary comparison, in Figure 4, examines the effects of various index set constructions: RFFs, QMC (Halton, generalised Halton), and the following index sets: Energy Norm Hyperbolic Cross (ENHC), Total Order (TOT), Hyperbolic (HYP), and Euclidean (EUC). The supplementary contains an extended comparison including index set parameters, additional dimensions, and nuances of the reconstruction.
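The reconstruction metric is simple to state in code; the helper name below is ours:

```python
import numpy as np

def normalized_frobenius_error(K_true, K_approx):
    """Normalised Frobenius reconstruction error ||K - K~||_F / ||K||_F,
    the Gram-matrix metric used throughout this section."""
    K_true = np.asarray(K_true, float)
    diff = K_true - np.asarray(K_approx, float)
    return np.linalg.norm(diff) / np.linalg.norm(K_true)

# For an explicit feature map Phi (n x D), the approximate Gram is Phi @ Phi.T.
err = normalized_frobenius_error(np.eye(3), 0.9 * np.eye(3))
```

Normalising by ||K||_F makes the error comparable across lengthscales and dataset sizes.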
The first observation we can make from Figure 4 is that, for lower dimensions, the best performing features are those with the Euclidean degree or Total order index sets, which correspond to LP-ball index sets with parameter p set to 2 and 1 respectively. Of the index sets, the Hyperbolic and Energy Norm Hyperbolic Cross perform the worst, in particular for smaller lengthscales. Overall, the index sets all perform significantly better than the warped Fourier Feature methods, among which the original MC-based RFF performs the worst and the standard Halton sequence appears to perform marginally better than the generalised Halton.
As the number of dimensions increases, the Total and Euclidean index sets become intractable due to their heavy dependency on cross-dimensional products of data-dependent harmonic terms. Considering FF methods, the approximation accuracy of the standard Halton sequence falls behind even RFFs, while the generalised Halton remains consistently ahead. As we have seen, as the dimensionality increases, the Total and Euclidean index sets have no parameterisation that allows them to scale properly; indeed, their flexibility is due entirely to their being specific instances of the LP-ball index set, which can yield progressively sparser Hyperbolic index sets. The ENHC, on the other hand, can be parameterised by a sparsity and a weighting factor, giving additional flexibility.
4.2 Generalisation Error
We evaluate predictive performance with Root Mean Square Error (RMSE) and Mean Negative Log Loss (MNLL). The MNLL allows us to account for both the model's predictive mean and its predictive uncertainty. It is defined as MNLL = (1/n*) Σ_{i=1}^{n*} [½ log(2π σ_i²) + (y_i − μ_i)² / (2σ_i²)], where σ_i, μ_i, and y_i are respectively the predictive standard deviation, predictive mean, and true value at the i-th test point. This section demonstrates the generalisation performance of a single multidimensional periodic kernel on image texture data. We use images from (Wilson et al., 2014) with the same train and test set pixel locations. Overall, it can be seen that ISFSF-based features provide significantly improved generalisation capability in terms of reconstructing the periodic missing data. For exactly the same kernel, both qualitatively and quantitatively, the results show clear advantages over the alternative Fourier Feature methods. Results may be seen in both Table 1 and Figure 5.
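The MNLL is a straightforward transcription of the Gaussian negative log likelihood; the function name below is ours:

```python
import numpy as np

def mnll(y_true, mu, sigma):
    """Mean negative log loss of Gaussian predictions:
    mean over test points of
    0.5 * log(2 * pi * sigma_i^2) + (y_i - mu_i)^2 / (2 * sigma_i^2)."""
    y_true, mu, sigma = (np.asarray(v, float) for v in (y_true, mu, sigma))
    return np.mean(0.5 * np.log(2.0 * np.pi * sigma**2)
                   + (y_true - mu) ** 2 / (2.0 * sigma**2))

# A perfectly predicted point with unit predictive deviation scores 0.5*log(2*pi).
val = mnll([0.0], [0.0], [1.0])
```

Unlike RMSE, an overconfident model (small σ_i with a wrong mean) is penalised heavily, which is why the MNLL columns in Table 1 separate the methods more sharply than the RMSE columns.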
Pores. In this dataset we can see the RMSE and MNLL for the ISFSF based features (using the ENHC) perform the best in all cases. The RMSE performs exceedingly well even with only 49 features almost equaling the performance of RFF and QMC methods which require 794 features. In both RMSE and MNLL, the ISFSF with 201 features outperforms FF based methods using 794 features.
Rubber. For this image, depicting the pattern of a rubber mat, ISFSF outperforms all other methods in each group of feature counts and is again nearly equal in performance with 49 features to the FF methods using 794 features. The generalised Halton marginally outperforms all methods in the case of 794 features, and the Halton at 94 features. The supplementary figure shows the predicted regions for the rubber dataset for various feature types and increasing numbers of feature components.
Tread. The last dataset for which we provide a detailed comparison is a textured metal tread pattern (see supplementary material). The performance on this dataset is interesting because the RMSE is approximately equal for all features and all feature counts. Additionally, while the RMSE performance remains relatively stable, the MNLL favours the ISFSF in all cases, with further improvements at larger feature counts.
5 Discussion
The ability to model periodicity is a unique problem in machine learning. Modelling with kernels, in particular with the GP, introduces problems related to data-dependent model complexity. We show that in the multivariate domain this dependence may be lifted by taking advantage of harmonic decompositions of kernel functions, allowing scalable inference. Crucially, we introduce an effective sparse approximation to multivariate periodic kernels using multivariate Fourier series with sparse index-set-based sampling grids, allowing efficient feature space periodic kernel decompositions. We demonstrate the ability of ISFSF-based features to generalise effectively, in terms of both RMSE and MNLL, with few features.
There are several interesting problems to further investigate regarding ISFSF-based decompositions, including: i) generalised learning of kernels, prior to an ISFSF decomposition, that are dependent on the data and assume no a priori structure (Wilson & Adams, 2013; Jang et al., 2017); and ii) arbitrary index sets that make no immediate assumption on the decay of the Fourier coefficients of the learned kernel (Ermakov, 1975; Kämmerer, 2014).
References
 Bochner (1933) Bochner, S. Monotone funktionen, stieltjessche integrale und harmonische analyse. Mathematische Annalen, 108:378–410, 1933. URL http://eudml.org/doc/159644.
 Bonsall & Duncan (2012) Bonsall, Frank F and Duncan, John. Complete normed algebras, volume 80. Springer Science & Business Media, 2012.
 Choromanski et al. (2016) Choromanski, Krzysztof, Fagan, Francois, Gouy-Pailler, Cedric, Morvan, Anne, Sarlos, Tamas, and Atif, Jamal. Triple-spin: a generic compact paradigm for fast machine learning computations. arXiv preprint arXiv:1605.09046, 2016.
 Dũng et al. (2016) Dũng, Dinh, Temlyakov, Vladimir N, and Ullrich, Tino. Hyperbolic cross approximation. arXiv preprint arXiv:1601.03978, 2016.
 Ermakov (1975) Ermakov, Sergeĭ Mikhaĭlovich. Die MonteCarloMethode und verwandte Fragen, volume 72. Oldenburg, 1975.
 Felix et al. (2016) Felix, X Yu, Suresh, Ananda Theertha, Choromanski, Krzysztof M, Holtmann-Rice, Daniel N, and Kumar, Sanjiv. Orthogonal random features. In Advances in Neural Information Processing Systems, pp. 1975–1983, 2016.
 Gershman & Blei (2012) Gershman, Samuel J and Blei, David M. A tutorial on bayesian nonparametric models. Journal of Mathematical Psychology, 56(1):1–12, 2012.
 Gittens & Mahoney (2013) Gittens, Alex and Mahoney, Michael W. Revisiting the nyström method for improved large-scale machine learning. J. Mach. Learn. Res, 28(3):567–575, 2013.
 Gnewuch et al. (2014) Gnewuch, Michael, Mayer, Sebastian, and Ritter, Klaus. On weighted hilbert spaces and integration of functions of infinitely many variables. Journal of Complexity, 30(2):29–47, 2014.
 Hallatschek (1992) Hallatschek, Klaus. Fouriertransformation auf dünnen gittern mit hierarchischen basen. Numerische Mathematik, 63(1):83–97, 1992.

 Hensman et al. (2013) Hensman, James, Fusi, Nicolo, and Lawrence, Neil D. Gaussian processes for big data. In Uncertainty in Artificial Intelligence, pp. 282. Citeseer, 2013.
 Jang et al. (2017) Jang, Phillip A, Loeb, Andrew, Davidow, Matthew, and Wilson, Andrew G. Scalable levy process priors for spectral kernel learning. In Advances in Neural Information Processing Systems, pp. 3943–3952, 2017.
 Kämmerer (2013) Kämmerer, Lutz. Reconstructing hyperbolic cross trigonometric polynomials by sampling along rank1 lattices. SIAM Journal on Numerical Analysis, 51(5):2773–2796, 2013.
 Kämmerer (2014) Kämmerer, Lutz. High dimensional fast Fourier transform based on rank1 lattice sampling. PhD thesis, Technische Universität Chemnitz, 2014.
 Kämmerer et al. (2015) Kämmerer, Lutz, Potts, Daniel, and Volkmer, Toni. Approximation of multivariate periodic functions by trigonometric polynomials based on rank1 lattice sampling. Journal of Complexity, 31(4):543–576, 2015.
 Kar & Karnick (2012) Kar, Purushottam and Karnick, Harish. Random feature maps for dot product kernels. In International conference on artificial intelligence and statistics, pp. 583–591, 2012.
 Le et al. (2013) Le, Quoc, Sarlós, Tamás, and Smola, Alex. Fastfood: approximating kernel expansions in loglinear time. In Proceedings of the International Conference on Machine Learning, volume 85, 2013.

 Li et al. (2010) Li, Fuxin, Ionescu, Catalin, and Sminchisescu, Cristian. Random fourier approximations for skewed multiplicative histogram kernels. In Joint Pattern Recognition Symposium, pp. 262–271. Springer, 2010.
 MacKay (1996) MacKay, David JC. Bayesian methods for backpropagation networks. In Models of neural networks III, pp. 211–254. Springer, 1996.
 MacKay (1998) MacKay, David JC. Introduction to Gaussian processes. NATO ASI Series F Computer and Systems Sciences, 168:133–166, 1998.
 Neal (2012) Neal, Radford M. Bayesian learning for neural networks, volume 118. Springer Science & Business Media, 2012.
 Pennington et al. (2015) Pennington, Jeffrey, Felix, X Yu, and Kumar, Sanjiv. Spherical random features for polynomial kernels. In Advances in Neural Information Processing Systems, pp. 1846–1854, 2015.
 Pham & Pagh (2013) Pham, Ninh and Pagh, Rasmus. Fast and scalable polynomial kernels via explicit feature maps. In Proceedings of the 19th ACM SIGKDD international conference on Knowledge discovery and data mining, pp. 239–247. ACM, 2013.
 Rahimi & Recht (2007) Rahimi, Ali and Recht, Ben. Random features for largescale kernel machines. Conference on Neural Information Processing Systems, 2007.
 Rahimi & Recht (2009) Rahimi, Ali and Recht, Benjamin. Weighted sums of random kitchen sinks: Replacing minimization with randomization in learning. In Advances in neural information processing systems, pp. 1313–1320, 2009.
 Rainville et al. (2012) De Rainville, François-Michel, Gagné, Christian, Teytaud, Olivier, Laurendeau, Denis, et al. Evolutionary optimization of low-discrepancy sequences. ACM Transactions on Modeling and Computer Simulation (TOMACS), 22(2):9, 2012.
 Rasmussen & Williams (2006) Rasmussen, Carl Edward and Williams, Christopher K. I. Gaussian Processes for Machine Learning. MIT Press, 2006.
 Sindhwani & Avron (2014) Sindhwani, Vikas and Avron, Haim. High-performance kernel machines with implicit distributed optimization and randomization. arXiv preprint arXiv:1409.0940, 2014.
 Snelson & Ghahramani (2006) Snelson, E. and Ghahramani, Z. Sparse gaussian processes using pseudo-inputs. In Advances in Neural Information Processing Systems, volume 18, pp. 1257, 2006.
 Solin & Särkkä (2014) Solin, Arno and Särkkä, Simo. Explicit link between periodic covariance functions and state space models. In Proceedings of the Seventeenth International Conference on Artificial Intelligence and Statistics, volume 33, pp. 904–912, 2014.
 Tompkins & Ramos (2018) Tompkins, Anthony and Ramos, Fabio. Fourier feature approximations for periodic kernels in timeseries modelling. In AAAI Conference on Artificial Intelligence, 2018.
 Vedaldi & Zisserman (2012) Vedaldi, Andrea and Zisserman, Andrew. Efficient additive kernels via explicit feature maps. IEEE transactions on pattern analysis and machine intelligence, 34(3):480–492, 2012.
 Williams & Seeger (2001) Williams, Christopher KI and Seeger, Matthias. Using the nyström method to speed up kernel machines. In Advances in neural information processing systems, pp. 682–688, 2001.
 Wilson & Adams (2013) Wilson, Andrew and Adams, Ryan. Gaussian process kernels for pattern discovery and extrapolation. In International Conference on Machine Learning, pp. 1067–1075, 2013.
 Wilson et al. (2014) Wilson, Andrew G, Gilboa, Elad, Nehorai, Arye, and Cunningham, John P. Fast kernel learning for multidimensional pattern extrapolation. In Advances in Neural Information Processing Systems, pp. 3626–3634, 2014.
 Yang et al. (2015) Yang, Zichao, Wilson, Andrew, Smola, Alex, and Song, Le. A la carte–learning fast kernels. In Artificial Intelligence and Statistics, pp. 1098–1106, 2015.
 Zaremba (1972) Zaremba, SK. La méthode des “bons treillis” pour le calcul des intégrales multiples. In Applications of number theory to numerical analysis, pp. 39–119. Elsevier, 1972.