# Continuous LWE is as Hard as LWE & Applications to Learning Gaussian Mixtures

We show direct and conceptually simple reductions between the classical learning with errors (LWE) problem and its continuous analog, CLWE (Bruna, Regev, Song and Tang, STOC 2021). This allows us to bring to bear the powerful machinery of LWE-based cryptography to the applications of CLWE. For example, we obtain the hardness of CLWE under the classical worst-case hardness of the gap shortest vector problem. Previously, this was known only under quantum worst-case hardness of lattice problems. More broadly, with our reductions between the two problems, any future developments to LWE will also apply to CLWE and its downstream applications. As a concrete application, we show an improved hardness result for density estimation for mixtures of Gaussians. In this computational problem, given sample access to a mixture of Gaussians, the goal is to output a function that estimates the density function of the mixture. Under the (plausible and widely believed) exponential hardness of the classical LWE problem, we show that Gaussian mixture density estimation in ℝ^n with roughly log n Gaussian components given 𝗉𝗈𝗅𝗒(n) samples requires time quasi-polynomial in n. Under the (conservative) polynomial hardness of LWE, we show hardness of density estimation for n^ϵ Gaussians for any constant ϵ > 0, which improves on Bruna, Regev, Song and Tang (STOC 2021), who show hardness for at least √(n) Gaussians under polynomial (quantum) hardness assumptions. Our key technical tool is a reduction from classical LWE to LWE with k-sparse secrets where the multiplicative increase in the noise is only O(√(k)), independent of the ambient dimension n.

## 1 Introduction

The learning with errors (LWE) problem [regev2009lattices] is a versatile average-case problem with connections to lattices, cryptography, learning theory and game theory. Given a sequence of noisy linear equations $(a_i, b_i \approx \langle a_i, s \rangle \bmod q)_{i=1}^m$ over a ring $\mathbb{Z}/q\mathbb{Z}$, the LWE problem asks to recover the secret vector $s$ (and the decisional version of the problem asks to distinguish between LWE samples and uniformly random numbers mod $q$). Starting from the seminal work of Regev, who showed that a polynomial-time algorithm for LWE will give us a polynomial-time quantum algorithm for widely studied lattice problems, there has been a large body of work showing connections between LWE and lattice problems [Peikert09, brakerski2013classical]. Ever since its formulation in 2005, LWE has unlocked a wealth of applications in cryptography ranging from fully homomorphic encryption [BV14] to attribute-based encryption [GVW15] to, most recently, succinct non-interactive argument systems for all of P [CJJ21]. LWE-based cryptosystems lie at the center of efforts by the National Institute of Standards and Technology (NIST) to develop post-quantum cryptographic standards. LWE has also had applications to learning theory, in the form of hardness results for learning intersections of halfspaces [KlivansS09], and in game theory, where the hardness of LWE implies the hardness of the complexity class PPAD [JawaleKKZ21]. Finally, LWE enjoys remarkable structural properties such as leakage-resilience [goldwasser2010robustness].

Motivated by applications to learning problems, Bruna, Regev, Song and Tang [CLWE] recently introduced a continuous version of LWE which they called CLWE. (In the definition below and henceforth, $\mathcal{N}(\mu, \Sigma)$ is the normal distribution with mean $\mu$ and covariance matrix $\Sigma$ where the probability of a point $x$ is proportional to $\exp\left(-\frac{1}{2}(x-\mu)^\top \Sigma^{-1} (x-\mu)\right)$.)

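[CLWE Distribution [CLWE], rescaled] Let $\gamma, \beta \in \mathbb{R}$, and let $\mathcal{S}$ be a distribution over unit vectors in $\mathbb{R}^n$. Let $\mathrm{CLWE}(m, \mathcal{S}, \gamma, \beta)$ be the distribution given by sampling $a_1, \dots, a_m \sim \mathcal{N}(0, I_{n \times n})$, $w \sim \mathcal{S}$, $e_1, \dots, e_m \sim \mathcal{N}(0, \beta^2)$, and outputting

$$\left(a_i,\; b_i := \gamma \cdot \langle a_i, w \rangle + e_i \bmod 1\right)_{i=1}^m.$$

We refer to $n$ as the dimension and $m$ as the number of samples.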

The search CLWE problem asks to find the secret vector $w$ given CLWE samples, whereas the decisional CLWE problem asks to distinguish between samples from the CLWE distribution and samples with standard normal $a_i$ (just like the CLWE distribution) but now with independent $b_i$ that are distributed uniformly between $0$ and $1$.
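To make the two distributions of the decisional problem concrete, here is a minimal Python sketch of a sampler for each. The function names, the choice of a uniformly random unit vector as the secret, and the parameter values are illustrative assumptions and are not taken from the paper.

```python
import math
import random

def random_unit_vector(n, rng):
    """Uniform direction on the unit sphere (one possible secret distribution)."""
    v = [rng.gauss(0.0, 1.0) for _ in range(n)]
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def clwe_samples(m, w, gamma, beta, rng):
    """Return m pairs (a_i, b_i) with a_i ~ N(0, I_n) and
    b_i = gamma * <a_i, w> + e_i mod 1, where e_i ~ N(0, beta^2)."""
    n = len(w)
    out = []
    for _ in range(m):
        a = [rng.gauss(0.0, 1.0) for _ in range(n)]
        e = rng.gauss(0.0, beta)
        b = (gamma * sum(ai * wi for ai, wi in zip(a, w)) + e) % 1.0
        out.append((a, b))
    return out

def null_samples(m, n, rng):
    """Same a_i, but b_i is uniform on [0, 1) and independent of a_i."""
    return [([rng.gauss(0.0, 1.0) for _ in range(n)], rng.random())
            for _ in range(m)]
```

A decisional CLWE solver would be any procedure that tells the output of `clwe_samples` apart from the output of `null_samples` with noticeable advantage.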

Bruna et al. [CLWE] showed the hardness of the CLWE problem, assuming the worst-case quantum hardness of approximate shortest vector problems on lattices (such as gapSVP and SIVP). Aside from being quantum, the reduction makes non-black-box use of the rather involved techniques from [regev2009lattices, PeikertRS17]. A natural question is whether there is a classical worst-case to average-case reduction for CLWE, in analogy with such reductions in the context of LWE [Peikert09, brakerski2013classical]. An even better outcome would be if we can “piggyback” on the rich literature on worst-case to average-case reductions for LWE, without opening the box, hopefully resulting in a conceptually simple worst-case to average-case connection for CLWE. The conceptually clean way to accomplish all of this would be to come up with a direct reduction from LWE to CLWE, a problem that was explicitly posed in the recent work of Bogdanov, Noval, Hoffman and Rosen [BNHR22].

Our main conceptual contribution is a direct and conceptually simple reduction from LWE to CLWE. As an immediate application, we obtain a classical worst-case to average-case reduction for CLWE. Our main reduction also allows us to unlock powerful structural results on LWE [goldwasser2010robustness, brakerski2013classical, Mic18, BD20] and derive improved hardness results for learning mixtures of Gaussians with Gaussians instead of in [CLWE] (for arbitrary ). We now describe these results in turn.

### 1.1 Continuous LWE is as Hard as LWE

Our main result is a direct and conceptually simple reduction from LWE to CLWE. Recall that in the decisional LWE problem [regev2009lattices], we are given $m$ samples of the form $(a_i, b_i := \langle a_i, s \rangle + e_i \bmod q)$ where $a_i \sim \mathbb{Z}_q^n$ is uniformly random, $s$ is the LWE secret vector, and the errors $e_i$ are chosen from the one-dimensional Gaussian with standard deviation $\sigma$. The decisional LWE assumption (parameterized by $n, m, q$ and $\sigma$) postulates that these samples are computationally indistinguishable from i.i.d. samples in $\mathbb{Z}_q^n \times \mathbb{T}_q$.

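[Informal Version of Theorem 5] Let $\mathcal{S}$ be an arbitrary distribution over $\mathbb{Z}^n$ whose support consists of vectors with $\ell_2$-norm exactly $r$. Then, for

$$\gamma = \tilde{O}(r) \quad\text{and}\quad \beta = O\!\left(\frac{\sigma}{q}\right),$$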

(where $\tilde{O}$ hides various poly-logarithmic factors), there is a dimension-preserving and sample-preserving polynomial-time reduction from decisional $\mathrm{LWE}$, with parameters $n, m, q, \sigma$ and secret distribution $\mathcal{S}$, to decisional $\mathrm{CLWE}$ with parameters $\gamma$ and $\beta$, as long as $\sigma \gg r$.

Our main reduction, in conjunction with prior work, immediately gives us a number of corollaries. First, letting $\mathcal{S}$ be the uniform distribution on $\{-1, 1\}^n$, and invoking the hardness result for LWE with binary secrets [brakerski2013classical, Mic18, BD20], we obtain the following corollary. (The noise blowup of $\sqrt{n}$ in the corollary below comes from the aforementioned reductions from LWE to LWE with binary secrets.)

[Informal Version of Corollary 5.1] For

$$\gamma = \tilde{O}(\sqrt{n}) \quad\text{and}\quad \beta = O\!\left(\frac{\sigma\sqrt{n}}{q}\right),$$

there is a polynomial (in ) time reduction from standard decisional in dimension , with samples, modulus and noise parameter , to decisional in dimension with parameters and , as long as and .

The generality of our main reduction allows us to unlock techniques from the literature on leakage-resilient cryptography, specifically results related to the robustness of the LWE assumption [goldwasser2010robustness, brakerski2013classical, Mic18, BD20], and go much further. In particular, using a variant of the reduction of [Mic18] modified to handle -sparse secrets (discussed further in Section 2) we show the following corollary. In the corollaries, the condition (resp. ) comes from the entropy of random vectors (resp. random -sparse vectors).

[Informal Version of Corollary 5.2] For

$$\gamma = O\!\left(\sqrt{k \cdot \log n}\right) \quad\text{and}\quad \beta = O\!\left(\frac{\sigma\sqrt{k}}{q}\right),$$

we have a polynomial (in ) time reduction from standard decisional , in dimension , with samples, modulus , and noise parameter , to decisional in dimension with -sparse norm- secrets and parameters and , as long as and .

Looking ahead, we note that Corollary 5.2 will help us derive improved hardness for the problem of learning mixtures of Gaussians. Towards that end, it is worth stepping back and examining how far one can push Corollary 1.1. The LWE problem is believed to be exponentially hard; that is, in dimensions with a modulus and error parameter , LWE is believed to be hard for algorithms that run in time using samples, for any (see, e.g. [PeikertLindner]). Breaking this sub-exponential barrier not only has wide-ranging consequences for lattice-based cryptography, but also to the ongoing NIST post-quantum standardization competition [NIST] where better algorithms for LWE will lead NIST to reconsider the current parameterization of LWE-based encryption and signature schemes.

Assuming such a sub-exponential hardness of LWE, we get the hardness of CLWE with

$$\gamma = (\log n)^{\frac{1}{2} + \delta} \log\log n$$

for an arbitrarily small constant . On the other hand, under a far more conservative polynomial-hardness assumption on LWE, we get the hardness of CLWE with for an arbitrarily small .

Combining our main reduction with the known classical worst-case to average-case reduction for LWE [brakerski2013classical] gives us a classical worst-case to average-case reduction for CLWE.

[Classical Worst-case Hardness of CLWE, informal] There is an efficient classical reduction from worst-case -approximate in dimensions, to decisional in dimensions with and arbitrary .

Finally, in Appendix C, we also show a reduction in the opposite direction, that is, from (discrete-secret) CLWE to LWE. Modulo the discrete secret requirement, this nearly completes the picture of the relationship between LWE and CLWE. In turn, our reverse reduction can be combined with the other theorems in this paper to show a search-to-decision reduction for (discrete-secret) CLWE.

### 1.2 Improved Hardness of Learning Mixtures of Gaussians

Bruna, Regev, Song and Tang [CLWE] used the hardness of CLWE to deduce hardness of problems in machine learning, most prominently the hardness of learning mixtures of Gaussians. (Subsequent work of Song, Zadik and Bruna [SZB21] used CLWE to show the hardness of learning a single periodic neuron.) We use our improved hardness result for CLWE to show improved hardness results for learning mixtures of Gaussians. First, let us start by describing the problem of Gaussian mixture learning.

#### Background on Gaussian Mixture Learning.

The problem of learning a mixture of Gaussians is of fundamental importance in many fields of science [titterington1985statistical, MPbook]. Given a set of $k$ multivariate Gaussians in $n$ dimensions, parameterized by their means $\mu_i \in \mathbb{R}^n$, covariance matrices $\Sigma_i \in \mathbb{R}^{n \times n}$, and non-negative weights $w_1, \dots, w_k$ summing to one, the Gaussian mixture model is defined to be the distribution generated by picking a Gaussian $i \in [k]$ with probability $w_i$ and outputting a sample from $\mathcal{N}(\mu_i, \Sigma_i)$.

Dasgupta [Dasgupta99] initiated the study of this problem in computer science. A strong notion of learning mixtures of Gaussians is that of parameter estimation, i.e. to estimate all , and given samples from the distribution. If one assumes the Gaussians in the mixture are well-separated, then the problem is known to be tractable for a constant number of Gaussian components [Dasgupta99, sanjeev2001learning, vempala2002spectral, achlioptas2005spectral, kannan2005spectral, dasgupta2007probabilistic, brubaker2008isotropic, moitra2010settling, belkin2015polynomial, hardt2015tight, regev2017learning, hopkins2018mixture, kothari2018robust, diakonikolas2018list]. Moitra and Valiant [moitra2010settling] and Hardt and Price [hardt2015tight] also show that for parameter estimation, there is an information theoretic sample-complexity lower bound of where is the separation parameter and the number of Gaussian components.

Consequently, it makes sense to ask for a weaker notion of learning, namely density estimation, where, given samples from the Gaussian mixture, the goal is to output a “density oracle” (e.g. a circuit) that on any input , outputs an estimate of the density at  [feldman2006pac]. The statistical distance between the density estimate and the true density must be at most a parameter . The sample complexity of density estimation does not suffer from the exponential dependence in , as was the case for parameter estimation. In fact, Diakonikolas, Kane, and Stewart [diakonikolas2017statistical] show a upper bound on the information-theoretic sample complexity, by giving an exponential-time algorithm.

Density estimation seems to exhibit a statistical-computational tradeoff. While [diakonikolas2017statistical] shows a polynomial upper bound on sample complexity, all known algorithms for density estimation, e.g., [moitra2010settling], run in time for some . This is polynomial-time only for constant . On the flip side, [diakonikolas2017statistical] show that even density estimation of Gaussian mixtures incurs a super-polynomial lower bound in the restricted statistical query (SQ) model [kearns1998efficient, feldman2017statistical]. Explicitly, they show that any SQ algorithm giving density estimates requires queries to an SQ oracle of precision ; this is super-polynomial as long as is super-constant. However, this lower bound does not say anything about arbitrary polynomial time algorithms for density estimation.

The first evidence of computational hardness of density estimation for Gaussian mixtures came from the work of Bruna, Regev, Song and Tang [CLWE]. They show that being able to output a density estimate for mixtures of Gaussians implies a quantum polynomial-time algorithm for worst-case lattice problems. This leaves a gap between Gaussians, which is known to be learnable in polynomial time, versus Gaussians, which is hard to learn. What is the true answer?

#### Our Results on the Hardness of Gaussian Mixture Learning.

Armed with our reduction from LWE to CLWE, and leakage-resilience theorems from the literature which imply Corollaries 1.1 and 1.1, we demonstrate a rich landscape of lower-bounds for density estimation of Gaussian mixtures.

Using Corollary 1.1, we show a hardness result for density estimation of Gaussian mixtures that improves on [CLWE] in two respects. First, we show hardness of density estimation for Gaussians in dimensions for any , assuming the polynomial-time hardness of LWE. Combined with the quantum worst-case to average-case reduction for LWE [regev2009lattices], this gives us hardness for Gaussians under the quantum worst-case hardness of lattice problems. This improves on [CLWE] who show hardness for Gaussians under the same assumption. Secondly, our hardness of density estimation can be based on the classical hardness of lattice problems.

The simplicity and generality of our main reduction from LWE to CLWE gives us much more. For one, assuming the sub-exponential hardness of LWE, we show that density estimation of Gaussians cannot be done in polynomial time given a polynomial number of samples (where is an arbitrarily small constant). This brings us very close to the true answer: we know that Gaussians can be learned in polynomial time; whereas Gaussians cannot, under a standard assumption in lattice-based cryptography (indeed, one that underlies post-quantum cryptosystems that are about to be standardized by NIST [NIST]).

We can stretch this even a little further. We show the hardness of density estimation for Gaussians given samples (where is an arbitrary constant). This may come across as a surprise: is the problem even solvable information-theoretically given such few samples? It turns out that the sample complexity of density estimation for our hard instance, and also the hard instance of [diakonikolas2017statistical], is poly-logarithmic in . In fact, we show (in Corollary B) a quasi-polynomial time algorithm that does density estimation for our hard instance with samples. In other words, this gives us a tight computational gap for density estimation for the Gaussian mixture instances we consider.

These results are summarized below and more succinctly in Figure 1. The reader is referred to Section 6 for the formal proofs.

[Informal Version of Corollary 6 and Corollary 6] We give the following lower bounds for GMM density estimation based on LWE assumptions of varying strength.

1. Assuming standard polynomial hardness of LWE, any density estimator for $\mathbb{R}^n$ that can solve arbitrary mixtures with at most $n^\epsilon$ Gaussian components, given $\mathsf{poly}(n)$ samples from the mixture, requires super-polynomial time in $n$ for arbitrary $\epsilon > 0$.

2. Assuming -dimensional is hard to distinguish with advantage in time , any density estimator for that can solve arbitrary mixtures with at most roughly Gaussian components, given samples from the mixture, requires super-polynomial in time.

3. Assuming -dimensional is hard to distinguish with advantage in time , any density estimator for that can solve arbitrary mixtures with at most roughly Gaussian components, given samples from the mixture, requires super-polynomial in time.

### 1.3 Other Applications

We mention that our hardness result for CLWE can also be applied in showing (further) hardness of learning single periodic neurons, i.e., neural networks with no hidden layers and a periodic activation function with frequency $\gamma$. Song, Zadik, and Bruna [song2021cryptographic] give a direct reduction from CLWE to learning single periodic neurons, showing hardness of learning this class of functions assuming the hardness of CLWE. Our reduction from LWE to CLWE shows that this hardness result can be based directly on LWE instead of worst-case lattice assumptions, as done in [CLWE]. Furthermore, our results expand the scope of their reduction in two ways. First, their reduction shows hardness of learning periodic neurons with frequency , while ours, based on exponential hardness of LWE, applies to frequencies almost as small as , which covers a larger class of periodic neurons. Second, the hardness of $k$-sparse CLWE from (standard) LWE shows that even learning sparse features (instead of features drawn from the unit sphere $S^{n-1}$) is hard under LWE for appropriate parameter settings.

### 1.4 Perspectives and Future Directions

The main technical contribution of our paper is a reduction from the learning with errors (LWE) problem to its continuous analog, CLWE. A powerful outcome of our reduction is the fact that one can now bring to bear powerful tools from the study of the LWE problem to the study of continuous LWE and its downstream applications. We show two such examples in this paper: the first is a classical worst-case to average-case reduction from the approximate shortest vector problem on lattices to continuous LWE; and the second is an improved hardness result for the classical problem of learning mixtures of Gaussians. We believe much more is in store.

For one, while we show a search-to-decision reduction for discrete-secret CLWE (see Appendix C), we still do not know such a reduction for general CLWE. This is in contrast to multiple search-to-decision reductions of varying complexity and generality for the LWE problem [regev2009lattices, MM11]. Secondly, while there has been some initial exploration of the cryptographic applications of the continuous LWE problem [BNHR22], constructing qualitatively new cryptographic primitives or qualitatively better cryptographic constructions is an exciting research direction. A recent example is the result of [GKVZ22], who use the hardness of CLWE to undetectably backdoor neural networks.

Finally, in terms of the hardness of learning mixtures of Gaussians, the question remains: what is the true answer? The best algorithms for learning mixtures of Gaussians [moitra2010settling] run in polynomial time only for a constant number of Gaussians. We show hardness (under a plausible setting of LWE) for roughly Gaussians.

In our hard instance, the Gaussian components live on a line, and indeed a one-dimensional lattice. For such Gaussians, we know from Bruna et al. [CLWE] that there exists an algorithm running in time roughly , which becomes almost polynomial at the extremes of our parameter settings. Thus, we show the best lower bound possible for our hard instance. (In fact, for our hard instance, we can afford to enumerate over all sparse secret directions to get a solver with a similar run-time as [CLWE] but with much smaller sample complexity. See Corollary B for details.)

There remain three possibilities:

• There is a different hard instance for learning any superconstant number of Gaussians in polynomial time, and hardness can be shown by reduction from lattice problems; or

• There is a different hard instance for learning any superconstant number of Gaussians in polynomial time, but lattice problems are not the source of hardness; or

• We live in algorithmica, where the true complexity of Gaussian mixture learning is better than and looks perhaps more like , despite what SQ lower bounds suggest [diakonikolas2017statistical].

If we believe in the first two possibilities, a natural place to look for a different hard instance is [diakonikolas2017statistical], who consider a family of Gaussian pancakes centered at the roots of a Hermite polynomial. This allows them to match the first moments with that of the standard Gaussian. A tantalizing open problem is to try and prove hardness for their distribution for all algorithms, not just SQ algorithms, possibly under some cryptographic assumptions or perhaps even lattice assumptions.

## 2 Technical Overview

#### From Fixed-Norm-LWE to CLWE.

The goal of our main theorem (Theorem 1.1) is to reduce from the fixed-norm-LWE problem to CLWE. This involves a number of transformations, succinctly summarized in Figure 2. Given samples , we do the following:

1. First, we turn the errors (in ) from discrete to continuous Gaussians by adding a small continuous Gaussian to the LWE samples, using the smoothing lemma [MicciancioR07].

2. Secondly, we turn the samples from discrete to continuously uniform over the torus by doing the same thing, namely adding a continuous Gaussian noise, and once again invoking appropriate smoothing lemmas from [regev2009lattices].

3. Third, we go from uniform samples to Gaussian samples. Boneh, Lewi, Montgomery and Raghunathan  [BLMR13] give a general reduction from samples to “coset-sampleable” distributions, and as one example, they show how to reduce discrete uniform samples to discrete Gaussian samples, at the cost of a multiplicative overhead in the dimension, which is unavoidable information-theoretically. We improve this reduction and circumvent this lower bound in the continuous version by having no overhead in the dimension, i.e. the dimension of both samples are the same. The key ingredient to this improvement is a simple Gaussian pre-image sampling algorithm, which on input , outputs such that and is statistically close to a continuous Gaussian (when marginalized over ). (See Lemma 5 for a more precise statement.)

4. This finishes up our reduction! The final thing to do is to scale down the secret and randomly rotate it to ensure that it is a uniformly random unit vector.

We note that up until the final scaling down and re-randomization step, our reduction is secret-preserving.
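The pre-image sampling idea in step 3 can be illustrated in one dimension. The sketch below is an illustration under stated assumptions rather than the algorithm of Lemma 5: given $z \in [0, 1)$, it samples $y$ from a Gaussian of width $s$ restricted to the coset $\mathbb{Z} + z$, using the convention $\rho_s(x) = \exp(-\pi x^2 / s^2)$ and truncating the support to a finite window. The names `gaussian_preimage`, `tail` and the value `s = 4.0` are hypothetical choices. By construction $y \bmod 1 = z$, and when $z$ is uniform and $s$ is comfortably above the smoothing parameter of $\mathbb{Z}$, the marginal of $y$ should be close to a continuous Gaussian of the same width.

```python
import math
import random

def rho(x, s):
    """Unnormalized Gaussian mass exp(-pi * (x / s)^2) of width s."""
    return math.exp(-math.pi * (x / s) ** 2)

def gaussian_preimage(z, s, rng, tail=6):
    """Given z in [0, 1), sample y in the coset Z + z with probability
    proportional to rho(y, s). The support is truncated to shifts k with
    |k| <= tail * s + 1, which discards only a negligible amount of mass."""
    radius = int(math.ceil(tail * s)) + 1
    shifts = list(range(-radius, radius + 1))
    weights = [rho(z + k, s) for k in shifts]
    k = rng.choices(shifts, weights=weights, k=1)[0]
    return z + k

# Empirical check: feed uniform z and compare moments with a continuous
# Gaussian of width s, whose variance is s^2 / (2 * pi).
rng = random.Random(1)
s = 4.0
ys = [gaussian_preimage(rng.random(), s, rng) for _ in range(20000)]
mean = sum(ys) / len(ys)
var = sum((y - mean) ** 2 for y in ys) / len(ys)
print(mean, var, s * s / (2 * math.pi))
```

The dimension is unchanged because each coordinate of the uniform input is lifted to a single real number, which matches the "no overhead in the dimension" property described in step 3.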

#### Hardness of Gaussian Mixture Learning.

Bruna et al. [CLWE] show that a homogeneous version of CLWE, called hCLWE, has a natural interpretation as a certain distribution of mixtures of Gaussians. They show that any distinguisher between the hCLWE distribution and the standard multivariate Gaussian is enough to solve CLWE. Therefore, an algorithm for density estimation for Gaussian mixtures, which is a harder problem than distinguishing between that mixture and the standard Gaussian, implies a solver for CLWE. The condition that is a consequence of their reduction from worst-case lattice problems.

Our direct reduction from LWE to CLWE opens up a large toolkit of techniques that were developed in LWE-based cryptography. In this work, we leverage tools from leakage-resilient cryptography [brakerski2013classical, Mic18, BD20] to improve and generalize the hard instance of [CLWE]. The key observation is that the number of Gaussians in the mixture at the end of the day roughly corresponds to the norm of the secrets in LWE. Thus, the hardness of LWE with low-norm secrets will give us the hardness of Gaussian mixture learning with a small number of Gaussians.

Indeed, we achieve this by reducing LWE to $k$-sparse LWE. We call a vector $k$-sparse if it has exactly $k$ non-zero entries. We show the following result:

[Informal Version of Corollary 4] Assume LWE in dimension with samples is hard with secrets and errors of width . Then, LWE in dimension with -sparse secrets is hard for errors of width , as long as .

It turns out that for our purposes, the quantitative tightness of our theorem is important. Namely, we require that the blowup in the noise depends polynomially only on and not on other parameters. Roughly speaking, the reason is that if we have a blow-up factor of , for our LWE assumption, we need for the resulting CLWE distribution to be meaningful. For our parameter settings, if depends polynomially on the dimension (the dimension of the ambient space for the Gaussians) or the number of samples , then we require subexponentially large modulus-to-noise ratio in our LWE assumption, which is a notably stronger assumption. Indeed, the noise blow-up factor of the reduction we achieve and use is .

Our proof of this theorem uses a variant of the proof of [Mic18] to work with $k$-sparse secrets. We note that Brakerski and Döttling [BD20] give a general reduction from LWE to LWE with arbitrary secret distributions with large enough entropy, but the noise blowup when applying their results directly to $k$-sparse secrets is roughly for parameter settings we consider. (Footnote 1: It turns out that the techniques of Brakerski et al. [brakerski2013classical], who show the hardness of binary LWE, can also be easily modified to prove $k$-sparse hardness, but the overall reduction is somewhat more complex. For this reason, we choose to show how to modify the reduction of [Mic18].)

For a full description of the proof of Theorem 2, the reader is referred to Section 4.

## 3 Preliminaries

For a distribution $D$, we write $x \sim D$ to denote a random variable $x$ being sampled from $D$. For any $n \in \mathbb{N}$, we let $D^n$ denote the $n$-fold product distribution, i.e. $(x_1, \dots, x_n) \sim D^n$ is generated by sampling $x_i \sim D$ independently. For any finite set $S$, we write $U(S)$ to denote the discrete uniform distribution over $S$; we abuse notation and write $x \sim S$ to denote $x \sim U(S)$. For any continuous set $S$, we write $U(S)$ to denote the continuous uniform distribution over $S$ (i.e. having support $S$ and constant density); we also abuse notation and write $x \sim S$ to denote $x \sim U(S)$.

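For distributions $D_1, D_2$ supported on a measurable set $\mathcal{X}$, we define the statistical distance between $D_1$ and $D_2$ to be $\Delta(D_1, D_2) = \frac{1}{2} \int_{x \in \mathcal{X}} |D_1(x) - D_2(x)| \, dx$. We say that distributions $D_1, D_2$ are $\epsilon$-close if $\Delta(D_1, D_2) \le \epsilon$. For a distinguisher $\mathcal{A}$ running on two distributions $D_1$, $D_2$, we say that $\mathcal{A}$ has advantage $\epsilon$ if

$$\left| \Pr_{x \sim D_1}[\mathcal{A}(x) = 1] - \Pr_{x \sim D_2}[\mathcal{A}(x) = 1] \right| \ge \epsilon,$$

where the probability is also over any internal randomness of $\mathcal{A}$.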
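For intuition, the following Python sketch computes the discrete analogue of the statistical distance and the advantage of a deterministic distinguisher on two small distributions. Representing a distribution as a dictionary of probabilities, and the helper names, are assumptions made for illustration. It also illustrates the standard fact that the distinguisher that accepts exactly where the first distribution puts more mass attains advantage equal to the statistical distance.

```python
def statistical_distance(p, q):
    """Half the L1 distance between two probability mass functions,
    each given as a dict mapping outcomes to probabilities."""
    support = set(p) | set(q)
    return 0.5 * sum(abs(p.get(x, 0.0) - q.get(x, 0.0)) for x in support)

def advantage(distinguisher, p, q):
    """|Pr_{x~p}[A(x)=1] - Pr_{x~q}[A(x)=1]| for a deterministic A."""
    accept_p = sum(pr for x, pr in p.items() if distinguisher(x) == 1)
    accept_q = sum(pr for x, pr in q.items() if distinguisher(x) == 1)
    return abs(accept_p - accept_q)

def best_distinguisher(p, q):
    """Accept exactly the outcomes that p makes more likely than q."""
    return lambda x: 1 if p.get(x, 0.0) > q.get(x, 0.0) else 0

p = {0: 0.5, 1: 0.5}
q = {0: 0.25, 1: 0.75}
print(statistical_distance(p, q))                # 0.25
print(advantage(best_distinguisher(p, q), p, q))  # 0.25
```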

We let denote the identity matrix. When is clear from context, we write this simply as . For any matrix , we let be its transpose matrix, and for , we write to denote the submatrix of consisting of just the first columns, and we write to denote the submatrix of consisting of all but the first columns.

For any vector $v \in \mathbb{R}^n$, we write $\|v\|$ to mean the standard $\ell_2$-norm of $v$, and we write $\|v\|_\infty$ to denote the $\ell_\infty$-norm of $v$, meaning the maximum absolute value of any component. For $n \in \mathbb{N}$, we let $S^{n-1}$ denote the $(n-1)$-dimensional sphere embedded in $\mathbb{R}^n$, or equivalently the set of unit vectors in $\mathbb{R}^n$. By $\mathbb{Z}_q$, we refer to the ring of integers modulo $q$, represented by $\{0, \dots, q-1\}$. By $\mathbb{T}_q$, we refer to the set $[0, q)$ where addition (and subtraction) is taken modulo $q$ (i.e. $\mathbb{T}_q$ is the torus scaled up by $q$). We denote $\mathbb{T} := \mathbb{T}_1$ to be the standard torus. By taking a real number mod $q$, we refer to taking its representative as an element of $[0, q)$ in $\mathbb{T}_q$ unless stated otherwise.

[Min-Entropy] For a discrete distribution $D$ with support $S$, we let $H_\infty(D)$ denote the min-entropy of $D$,

$$H_\infty(D) = -\log_2\left(\max_{s \in S} \Pr_{x \sim D}[x = s]\right).$$
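As a small worked example, the Python sketch below evaluates the min-entropy of an explicit distribution and of two uniform secret distributions of the kind discussed in the introduction. The function names are hypothetical, and the assumption that sparse secrets take values in $\{-1, 0, 1\}$ is made here for illustration only.

```python
import math

def min_entropy(pmf):
    """H_inf(D) = -log2(max_s Pr[x = s]) for a dict of probabilities."""
    return -math.log2(max(pmf.values()))

def sign_secret_min_entropy(n):
    """Uniform distribution over {-1, 1}^n has 2^n equally likely points."""
    return float(n)

def sparse_secret_min_entropy(n, k):
    """Uniform distribution over vectors in {-1, 0, 1}^n with exactly k
    non-zero entries (assumed): C(n, k) supports times 2^k sign patterns."""
    return math.log2(math.comb(n, k)) + k

print(min_entropy({"a": 0.5, "b": 0.25, "c": 0.25}))  # 1.0
print(sparse_secret_min_entropy(4, 1))                 # 3.0
```

For a uniform distribution the min-entropy is simply the logarithm of the support size, which is why sparse secrets carry roughly $k \log_2(n/k)$ bits for small $k$.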

[Leftover Hash Lemma [haastad1999pseudorandom]] Let , and let be a distribution over . Suppose . Then, the distributions given by and where , , have statistical distance at most .

### 3.1 Lattices and Discrete Gaussians

A rank integer lattice is a set of all integer linear combinations of linearly independent vectors in . The dual lattice of a lattice is defined as the set of all vectors such that for all .

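For arbitrary $x, c \in \mathbb{R}^n$ and $s > 0$, let

$$\rho_{s,c}(x) = \frac{1}{s^n} \exp\left(-\pi \left\| (x - c)/s \right\|^2\right)$$

denote the density function of the standard Gaussian over $\mathbb{R}^n$ of width $s$ centered at $c$. Let $D_{s,c}$ be the corresponding distribution. Note that $D_{s,c}$ is the $n$-dimensional Gaussian distribution with mean $c$ and covariance matrix $\frac{s^2}{2\pi} I_{n \times n}$. When $c = 0$, we omit the subscript notation of $c$ on $\rho_{s,c}$ and $D_{s,c}$.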
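The relation between the width $s$ and the usual standard deviation can be checked numerically. The following Python sketch, restricted to one dimension and using a simple midpoint rule (both choices are assumptions made for illustration), confirms that the density with the $1/s$ normalization integrates to one and has variance $s^2 / (2\pi)$.

```python
import math

def rho(x, s=1.0, c=0.0):
    """One-dimensional density (1/s) * exp(-pi * ((x - c) / s)^2)."""
    return math.exp(-math.pi * ((x - c) / s) ** 2) / s

def integrate(f, lo, hi, steps=20000):
    """Midpoint rule on [lo, hi]."""
    h = (hi - lo) / steps
    return sum(f(lo + (i + 0.5) * h) for i in range(steps)) * h

s = 3.0
mass = integrate(lambda x: rho(x, s), -40.0, 40.0)
variance = integrate(lambda x: x * x * rho(x, s), -40.0, 40.0)
print(mass, variance, s * s / (2 * math.pi))
```

In other words, a Gaussian of width $s$ in this convention is an ordinary normal distribution with standard deviation $s / \sqrt{2\pi}$.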

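For an $n$-dimensional lattice $\Lambda \subset \mathbb{R}^n$ and point $c \in \mathbb{R}^n$, we can define the discrete Gaussian of width $s$ to be given by the mass function

$$D_{\Lambda + c, s}(x) = \frac{\rho_s(x)}{\rho_s(\Lambda + c)}$$

supported on $x \in \Lambda + c$, where by $\rho_s(\Lambda + c)$ we mean $\sum_{y \in \Lambda + c} \rho_s(y)$.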

We now give the smoothing parameter as defined by [regev2009lattices] and some of its standard properties.

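[[regev2009lattices], Definition 2.10] For an $n$-dimensional lattice $\Lambda$ and $\epsilon > 0$, we define $\eta_\epsilon(\Lambda)$ to be the smallest $s$ such that $\rho_{1/s}(\Lambda^* \setminus \{0\}) \le \epsilon$.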

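[[regev2009lattices], Lemma 2.12] For an $n$-dimensional lattice $\Lambda$ and $\epsilon > 0$, we have

$$\eta_\epsilon(\Lambda) \le \sqrt{\frac{\ln(2n(1 + 1/\epsilon))}{\pi}} \cdot \lambda_n(\Lambda).$$

Here $\lambda_n(\Lambda)$ is defined as the minimum length of the longest vector in a set of $n$ linearly independent vectors in $\Lambda$.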
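To get a feel for the size of this bound, the Python sketch below evaluates its right-hand side. The function name is hypothetical, and the example uses the integer lattice $\mathbb{Z}^n$, for which $\lambda_n = 1$ because the standard basis vectors are linearly independent and have unit length.

```python
import math

def smoothing_upper_bound(n, eps, lambda_n):
    """Right-hand side of the bound: sqrt(ln(2n(1 + 1/eps)) / pi) * lambda_n."""
    return math.sqrt(math.log(2 * n * (1 + 1 / eps)) / math.pi) * lambda_n

# Integer lattice Z^n with n = 512 and eps = 2^-40.
bound = smoothing_upper_bound(512, 2.0 ** -40, 1.0)
print(bound)
```

The bound grows only like the square root of $\log(n / \epsilon)$, so even for very small $\epsilon$ a modest Gaussian width is already above the smoothing parameter of $\mathbb{Z}^n$.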

[[regev2009lattices], Corollary 3.10] For any -dimensional lattice and , and , if

$$\eta_\epsilon(\Lambda) \le \frac{1}{\sqrt{1/(\sigma')^2 + (\|z\|/\sigma)^2}},$$

then if and , then has statistical distance at most from .

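[[micciancio2007worst], Lemma 4.1] For an $n$-dimensional lattice $\Lambda$, $\epsilon > 0$, $c \in \mathbb{R}^n$, for all $s \ge \eta_\epsilon(\Lambda)$, we have

$$\Delta\left(D_{s,c} \bmod \mathcal{P}(\Lambda),\; U(\mathcal{P}(\Lambda))\right) \le \epsilon/2,$$

where $\mathcal{P}(\Lambda)$ is the half-open fundamental parallelepiped of $\Lambda$.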

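[[micciancio2007worst], implicit in Lemma 4.4] For an $n$-dimensional lattice $\Lambda$, for all $\epsilon \in (0, 1)$, $c \in \mathbb{R}^n$, and all $s \ge \eta_\epsilon(\Lambda)$, we have

$$\rho_s(\Lambda + c) = \rho_{s,-c}(\Lambda) \in \left[\frac{1 - \epsilon}{1 + \epsilon},\, 1\right] \cdot \rho_s(\Lambda).$$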

Now we recall other facts related to lattices.

[[micciancio2013hardness], Theorem 3] Suppose with , and suppose for all . As long as for all , then we have is -close to where .

[[Mic18], Lemma 2.2] For , the probability that is at most .

We say that a matrix is primitive if , i.e., if is surjective.

[[Mic18], Lemma 2.6] For any primitive matrix and positive reals , if and , then and are -close.

We also use the notation $\mathcal{N}(\mu, \Sigma)$ to denote a multivariate Gaussian distribution with mean $\mu$ and covariance matrix $\Sigma$ for symmetric positive semi-definite $\Sigma$.

We now define mixtures of Gaussians, following the definition of density estimation for mixtures of Gaussians given in [CLWE].

Let be the set of all mixtures of Gaussians in . That is, contains exactly the distributions that can be written as

$$P = \sum_{i\in[k]} w_i \cdot \mathcal{N}(\boldsymbol{\mu}_i, \Sigma_i),$$

for weights summing to 1 and arbitrary and covariance matrices . We define the problem of density estimation for to be the following problem. Given sample access to an arbitrary (and unknown) , with probability , output a distribution (as an evaluation oracle) such that .
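The objects in this definition can be illustrated with a small sketch, restricted to dimension 1 for simplicity (an assumption of the example, not of the definition). It evaluates a mixture density, draws a sample, and approximates the statistical distance between two densities by a Riemann sum; all function names and the integration grid are hypothetical.

```python
import math
import random

def mixture_density(x, weights, means, variances):
    """Density at x of sum_i w_i * N(mu_i, var_i) in one dimension."""
    return sum(
        w * math.exp(-(x - m) ** 2 / (2 * v)) / math.sqrt(2 * math.pi * v)
        for w, m, v in zip(weights, means, variances)
    )

def sample_mixture(rng, weights, means, variances):
    """Pick component i with probability w_i, then sample from N(mu_i, var_i)."""
    i = rng.choices(range(len(weights)), weights=weights)[0]
    return rng.gauss(means[i], math.sqrt(variances[i]))

def tv_distance(p, q, lo=-20.0, hi=20.0, step=0.01):
    """Riemann-sum approximation of (1/2) * integral of |p - q| over [lo, hi]."""
    steps = int(round((hi - lo) / step))
    return 0.5 * step * sum(abs(p(lo + i * step) - q(lo + i * step)) for i in range(steps + 1))
```

A density estimator in the sense above would output an evaluation oracle playing the role of `q`, and its quality would be measured by `tv_distance` against the true mixture density `p`.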

### 3.2 Learning with Errors

Throughout, we work with decisional versions of LWE, CLWE, and hCLWE.

[LWE Distribution] Let , let be a distribution over , be a distribution over , and be a distribution over . We define to be the distribution given by sampling , , and , and outputting for all . We refer to as the dimension and as the number of samples. (The modulus is suppressed from notation for brevity as it will be clear from context.)

We also consider the case where is a distribution over and is a distribution over . In this case, the output of each sample is , where and .
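For concreteness, here is a minimal sketch of sampling from one common instantiation of the LWE distribution. The specific choices are assumptions of the example rather than part of the definition: the secret and the vectors a_i are uniform over ℤ_q^n, and the error is a rounded continuous Gaussian with standard deviation `sigma`. The function name is hypothetical.

```python
import random

def lwe_samples(n, m, q, sigma, rng):
    """Return a secret s and m samples (a_i, <a_i, s> + e_i mod q)."""
    s = [rng.randrange(q) for _ in range(n)]       # uniform secret (assumed)
    samples = []
    for _ in range(m):
        a = [rng.randrange(q) for _ in range(n)]   # uniform a_i (assumed)
        e = round(rng.gauss(0.0, sigma))           # rounded Gaussian error (assumed)
        b = (sum(ai * si for ai, si in zip(a, s)) + e) % q
        samples.append((a, b))
    return s, samples

secret, samples = lwe_samples(n=8, m=16, q=3329, sigma=1.0, rng=random.Random(1))
```

The decisional problem asks to distinguish such samples from pairs whose second entry is uniform and independent of the first.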

[CLWE Distribution [CLWE]] Let , and let be a distribution over and be a distribution over . Let be the distribution given by sampling , and outputting for all . Explicitly, for one sample, the density at is proportional to

$$A(\mathbf{y}) \cdot \sum_{k\in\mathbb{Z}} \rho_\beta\left(z + k - \gamma\cdot\langle \mathbf{y}, \mathbf{s}\rangle\right)$$

for fixed secret . We refer to as the dimension and as the number of samples. We omit if , as is standard for CLWE.
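A minimal sampling sketch for CLWE follows. It assumes the standard instantiation: the secret is a uniformly random unit vector, each y is drawn from the Gaussian with density proportional to ρ(y) = exp(−π‖y‖²), and the noise is a Gaussian of width β, so that standard deviations equal the width divided by √(2π). The function name is hypothetical.

```python
import math
import random

def clwe_samples(n, m, beta, gamma, rng):
    """Return a unit secret s and m samples (y, gamma*<y, s> + e mod 1)."""
    g = [rng.gauss(0.0, 1.0) for _ in range(n)]
    norm = math.sqrt(sum(v * v for v in g))
    s = [v / norm for v in g]                  # uniform unit vector (assumed)
    std = 1.0 / math.sqrt(2.0 * math.pi)       # std of a width-1 Gaussian rho
    samples = []
    for _ in range(m):
        y = [rng.gauss(0.0, std) for _ in range(n)]
        e = rng.gauss(0.0, beta * std)         # noise of width beta
        z = (gamma * sum(a * b for a, b in zip(y, s)) + e) % 1.0
        samples.append((y, z))
    return s, samples

s, samples = clwe_samples(n=5, m=20, beta=0.05, gamma=3.0, rng=random.Random(2))
```

Reducing modulo 1 is what produces the sum over k in the density above: every integer shift of the noisy inner product maps to the same label z.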

[hCLWE Distribution [CLWE]] Let , and let be a distribution over and be a distribution over . Let be the distribution , but conditioned on the event that the second entry of every sample is .

Explicitly, for one sample, the density at is proportional to

$$A(\mathbf{y}) \cdot \sum_{k\in\mathbb{Z}} \rho_\beta\left(k - \gamma\cdot\langle \mathbf{y}, \mathbf{s}\rangle\right)$$

for fixed secret . We refer to as the dimension and as the number of samples. We omit if , as is standard for hCLWE. Note that the distribution is itself a mixture of Gaussians. Explicitly, for a secret , we can write the density of at point as proportional to

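$$\rho(\mathbf{x}) \cdot \sum_{k\in\mathbb{Z}} \rho_\beta\left(k - \gamma\cdot\langle \mathbf{s}, \mathbf{x}\rangle\right) = \sum_{k\in\mathbb{Z}} \rho_{\sqrt{\beta^2+\gamma^2}}(k) \cdot \rho\left(\pi_{\mathbf{s}^\perp}(\mathbf{x})\right) \cdot \rho_{\beta/\sqrt{\beta^2+\gamma^2}}\left(\langle \mathbf{s}, \mathbf{x}\rangle - \frac{\gamma}{\beta^2+\gamma^2}k\right), \tag{1}$$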

where denotes the projection onto the orthogonal complement of . Thus, we can view samples as being drawn from a mixture of Gaussians of width in the secret direction, and width 1 in all other directions.
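The identity in Eq. 1 follows from completing the square in the exponent, and it can be checked numerically. The sketch below does so in dimension 1 with secret s = 1, so that the projection term is trivial; it assumes ρ_s(x) = exp(−π x²/s²), and the function names and the truncation `kmax` of the sum are hypothetical.

```python
import math

def rho(x, s=1.0):
    """Gaussian function rho_s(x) = exp(-pi * x^2 / s^2) (assumed normalization)."""
    return math.exp(-math.pi * (x / s) ** 2)

def hclwe_lhs(x, beta, gamma, kmax=200):
    """Left-hand side of Eq. 1 for n = 1 and s = 1."""
    return rho(x) * sum(rho(k - gamma * x, beta) for k in range(-kmax, kmax + 1))

def hclwe_rhs(x, beta, gamma, kmax=200):
    """Right-hand side of Eq. 1 for n = 1 and s = 1 (mixture-of-Gaussians form)."""
    w = math.sqrt(beta ** 2 + gamma ** 2)
    return sum(
        rho(k, w) * rho(x - gamma * k / w ** 2, beta / w)
        for k in range(-kmax, kmax + 1)
    )
```

The right-hand side makes the mixture structure explicit: component k has weight proportional to ρ at width √(β²+γ²), mean γk/(β²+γ²), and width β/√(β²+γ²).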

[Truncated hCLWE Distribution [CLWE]] Let , and let be a distribution over . Let be the distribution , but restricted to the central Gaussians, where by central Gaussians, we mean the central Gaussians in writing samples as a mixture of Gaussians, as in Eq. 1. Explicitly, for secret , the density of one sample at a point is

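$$\sum_{k=-\lfloor g/2\rfloor}^{\lfloor (g-1)/2\rfloor} \rho_{\sqrt{\beta^2+\gamma^2}}(k) \cdot \rho\left(\pi_{\mathbf{s}^\perp}(\mathbf{x})\right) \cdot \rho_{\beta/\sqrt{\beta^2+\gamma^2}}\left(\langle \mathbf{s}, \mathbf{x}\rangle - \frac{\gamma}{\beta^2+\gamma^2}k\right). \tag{2}$$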
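Read as a mixture, Eq. 2 suggests a direct sampler: pick a component index k among the g central ones with probability proportional to its weight, then draw a Gaussian that is narrow in the secret direction and of width 1 elsewhere. The following is a sketch under the assumptions that the secret `s` is a unit vector and that ρ_s(x) = exp(−π x²/s²); the function names are hypothetical.

```python
import math
import random

def central_indices(g):
    """Indices k = -floor(g/2), ..., floor((g-1)/2) of the g central components."""
    return list(range(-(g // 2), (g - 1) // 2 + 1))

def truncated_hclwe_sample(s, beta, gamma, g, rng):
    """Draw one sample from the g-component mixture of Eq. 2 (s must be a unit vector)."""
    w2 = beta ** 2 + gamma ** 2
    ks = central_indices(g)
    weights = [math.exp(-math.pi * k * k / w2) for k in ks]  # rho_{sqrt(w2)}(k)
    k = rng.choices(ks, weights=weights)[0]
    to_std = 1.0 / math.sqrt(2.0 * math.pi)      # width r corresponds to std r / sqrt(2 pi)
    x = [rng.gauss(0.0, to_std) for _ in s]      # width 1 in every direction
    proj = sum(xi * si for xi, si in zip(x, s))
    along = rng.gauss(gamma * k / w2, beta / math.sqrt(w2) * to_std)
    # Replace the component along s, keeping the orthogonal part untouched.
    return k, [xi + (along - proj) * si for xi, si in zip(x, s)]
```

Because all components share the same covariance, the mixing weights are proportional to the prefactors ρ at width √(β²+γ²) alone.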

[Density Estimation for the Gaussian Mixture Model (Definition 5.1 of [CLWE])] We say that an algorithm solves GMM density estimation in dimension with samples and up to Gaussians if, when given samples from an arbitrary mixture of at most Gaussian components in , the algorithm outputs some density function that has statistical distance at most from the true density function of the mixture, with probability at least (over the randomness of the samples and the internal randomness of the algorithm).

The following theorem tells us that distinguishing a truncated version of the hCLWE Gaussian mixture from the standard Gaussian is enough to distinguish the original Gaussian mixture from the standard Gaussian. In particular, we can use density estimation to solve hCLWE since the truncated version has a finite number of Gaussians.

[Proposition 5.2 of [CLWE]] Let , with and . Let be a distribution over . For sufficiently large and for , if there is an algorithm running in time that distinguishes and with constant probability, then there is a time algorithm distinguishing and with constant probability. In particular, if there is an algorithm running in time that solves density estimation with in dimension with samples and Gaussians, then there is a time algorithm distinguishing and with constant probability.

We also use a lemma which says that if CLWE is hard, then so is hCLWE.

[Lemma 4.1 of [CLWE]] There is an expected -time reduction such that maps samples to and maps to .

## 4 Hardness of k-sparse LWE

In this section, we modify the proof of [Mic18] to reduce from standard decisional LWE to a version where secrets are sparse, in the sense that they have few non-zero entries. The main changes we make to [Mic18] are that we slightly modify the gadget matrix and the matrix to handle sparse secrets (using its notation).

For completeness, we give a self-contained proof.

For with , let be the subset of vectors in with exactly non-zero entries. We call -sparse if .

It holds that .

###### Proof.

Observe that . Using the bound , we have

$$H_\infty(S_{n,k}) \ge \log_2\left(\left(2\cdot\frac{n}{k}\right)^k\right) \ge k\log_2(n/k),$$

as desired. ∎
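The counting argument can be checked numerically. The sketch below assumes, consistently with the factor 2 in the displayed bound, that sparse secrets have entries in {−1, 0, 1}, so that there are C(n, k) · 2^k vectors with exactly k non-zero entries, and that the secret is uniform over this set. The function name is hypothetical.

```python
import math

def min_entropy_uniform_sparse(n, k):
    """log2 of the number of vectors in {-1,0,1}^n with exactly k non-zero entries."""
    return math.log2(math.comb(n, k) * 2 ** k)

# Compare the exact min-entropy with the lower bound k * log2(n / k).
n, k = 1024, 16
exact = min_entropy_uniform_sparse(n, k)
lower = k * math.log2(n / k)
```

For these parameters the exact value exceeds the bound, reflecting the slack in the estimate C(n, k) ≥ (n/k)^k and the dropped factor 2^k.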

Our main theorem in this section is the following: Let with , and let . Suppose , and . Suppose there is no -time distinguisher with advantage between and , and further suppose there is no -time distinguisher with advantage between and . Then, there is no time distinguisher with advantage (up to additive factors) between and , where .

Let with . For all , we define to be the th standard basis column vector, i.e. having a 1 in the th coordinate and 0s elsewhere. We then define to be , i.e. 1s in the first coordinates and 0 elsewhere.

There is a -time computable matrix such that is invertible, , the vector satisfies and , and and are close as long as for a free parameter .

###### Proof.

We use essentially the same gadget as in Lemma 2.7 of [Mic18], except we modify two entries of the matrix and add two columns. Specifically, we set (instead of ), (instead of 1), and add two columns to the end that are all 0 except for two entries of in and .

We will give it explicitly as follows. Let the matrix be defined by

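$$X = \begin{bmatrix}
-1 & & & & & & & \\
1 & -1 & & & & & & \\
 & \ddots & \ddots & & & & & \\
 & & 1 & -1 & & & & \\
 & & & 1 & 0 & & & \\
 & & & & 1 & -1 & & \\
 & & & & & \ddots & \ddots & \\
 & & & & & & 1 & -1 \\
 & & & & & & & 1
\end{bmatrix},$$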

where the row with the abnormal is the th row. Similarly, let be defined by

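$$Y = \begin{bmatrix}
1 & & & & & & & \\
1 & 1 & & & & & & \\
 & \ddots & \ddots & & & & & \\
 & & 1 & 1 & & & & \\
 & & & 1 & 0 & & & \\
 & & & & 1 & 1 & & \\
 & & & & & \ddots & \ddots & \\
 & & & & & & 1 & 1 \\
 & & & & & & & 1
\end{bmatrix},$$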

where the row with the abnormal is again the th row. We then define by

$$Q = [\mathbf{e}_1, X, -\mathbf{e}_n, Y, \mathbf{e}_n, \mathbf{e}_1, \mathbf{e}_1, \mathbf{e}_k, \mathbf{e}_k].$$
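The structural properties of this gadget can be sanity-checked on small instances. The sketch below relies on one reading of the displayed matrix, which is an assumption: X has n rows and n − 1 columns, with −1 in position (j, j) and 1 in position (j + 1, j) for every column j, except that the entry in position (k, k) is 0. It then checks that [e_1, X] is upper-triangular with 1s on the diagonal and that the first k entries of every column of X sum to 0. The helper names are hypothetical.

```python
def build_X(n, k):
    """n x (n-1) matrix: -1 at (j, j), 1 at (j+1, j), with the (k, k) entry set to 0 (1-indexed)."""
    X = [[0] * (n - 1) for _ in range(n)]
    for j in range(n - 1):
        X[j][j] = -1
        X[j + 1][j] = 1
    X[k - 1][k - 1] = 0  # the abnormal entry in the k-th row
    return X

def first_block(n, k):
    """The first n columns [e_1, X] of Q as an n x n matrix."""
    X = build_X(n, k)
    return [[1 if i == 0 else 0] + X[i] for i in range(n)]

def is_unit_upper_triangular(M):
    n = len(M)
    return all(M[i][i] == 1 for i in range(n)) and all(
        M[i][c] == 0 for i in range(n) for c in range(i)
    )

def first_k_column_sums(X, k):
    """Sum of the first k entries of each column of X."""
    return [sum(X[i][j] for i in range(k)) for j in range(len(X[0]))]
```

A unit upper-triangular integer matrix has determinant 1, which is what makes the leading block invertible over the integers.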

First, notice that is invertible, since it is upper-triangular with 1s on the diagonal. Next, notice that , as and the sums of the first entries in each column of are all 0 by construction. We can write , which has norm

$$\sqrt{(k-1)\cdot 2^2 + 4\cdot 1^2} = 2\sqrt{k}.$$

It is also clear that . All that remains is to show that and are -close, which we do below.

To show that