Almost Optimal Tensor Sketch
We construct a matrix M ∈ R^{m × d^c} with just m = O(c λ ε^{-2} · polylog(1/(εδ))) rows, which preserves the norm ‖Mx‖_2 = (1 ± ε)‖x‖_2 of all x in any given λ-dimensional subspace of R^{d^c} with probability at least 1 − δ. This matrix can be applied to tensors x^{(1)} ⊗ ... ⊗ x^{(c)} ∈ R^{d^c} in O(c m min{d, m}) time, hence the name "Tensor Sketch". (Here, for x ∈ R^n and y ∈ R^m, x ⊗ y = vec(xy^T) = [x_1y_1, x_1y_2, ..., x_1y_m, x_2y_1, ..., x_ny_m] ∈ R^{nm}.) This improves upon earlier Tensor Sketch constructions by Pagh and Pham [TOCT 2013, SIGKDD 2013] and Avron et al. [NIPS 2014], which require m = Ω(3^c λ^2 δ^{-1}) rows for the same guarantees. The factors of λ, ε^{-2}, and log(1/δ) can all be shown to be necessary, making our sketch optimal up to log factors.

With another construction we get λ times more rows, m = Õ(c λ^2 ε^{-2} (log(1/δ))^3), but the matrix can be applied to any vector x^{(1)} ⊗ ... ⊗ x^{(c)} ∈ R^{d^c} in just Õ(c(d + m)) time. This matches the application time of Tensor Sketch while still improving the exponential dependencies on c and log(1/δ).

Technically, we show two main lemmas: (1) For many Johnson–Lindenstrauss (JL) constructions, if Q, Q' ∈ R^{m × d} are independent JL matrices, the element-wise product Qx ∘ Q'y equals M(x ⊗ y) for some M ∈ R^{m × d^2} which is itself a JL matrix. (2) If the M^{(i)} ∈ R^{m × md} are independent JL matrices, then M^{(1)}(x ⊗ (M^{(2)}y ⊗ ...)) = M(x ⊗ y ⊗ ...) for some M ∈ R^{m × d^c} which is itself a JL matrix. Combining these two results gives an efficient sketch for tensors of any size.
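To make lemma (1) concrete: the i-th entry of Qx ∘ Q'y is (Q_i x)(Q'_i y) = (Q_i ⊗ Q'_i)(x ⊗ y), so M is the matrix whose i-th row is the Kronecker product of the i-th rows of Q and Q' (a transposed Khatri–Rao product). The NumPy sketch below verifies this identity numerically; the Rademacher (±1) matrices and the 1/√m normalization are illustrative assumptions, not the paper's specific construction, and the JL behaviour of M is only checked empirically.

```python
import numpy as np

rng = np.random.default_rng(0)
d, m = 8, 2048  # input dimension and sketch size (illustrative values)

# Two independent Rademacher (+-1) matrices; the 1/sqrt(m) normalization is
# placed on the combined matrix M below.  (Assumption: the lemma covers
# several JL families; +-1 entries are used here only for simplicity.)
Q = rng.choice([-1.0, 1.0], size=(m, d))
Qp = rng.choice([-1.0, 1.0], size=(m, d))
x = rng.standard_normal(d)
y = rng.standard_normal(d)

# Row i of M is Q_i (kron) Q'_i, i.e. M is the transposed Khatri-Rao product.
M = np.einsum('ij,ik->ijk', Q, Qp).reshape(m, d * d) / np.sqrt(m)

lhs = (Q @ x) * (Qp @ y) / np.sqrt(m)  # element-wise product of the two sketches
rhs = M @ np.kron(x, y)                # x (kron) y = vec(x y^T)
assert np.allclose(lhs, rhs)           # the lemma (1) identity, exactly

# Empirically, M behaves like a JL matrix on the tensor product:
z = np.kron(x, y)
print(np.linalg.norm(M @ z) / np.linalg.norm(z))  # close to 1 for large m
```

Note that the left-hand side costs two ordinary matrix–vector products plus an element-wise multiply, so the d^2-dimensional vector x ⊗ y is never formed.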
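Lemma (2) can be checked the same way for the smallest case, c = 2. The dense Gaussian matrices below are a hypothetical stand-in for the fast JL families used in the paper; they serve only to verify the composition identity x ⊗ (M^{(2)}y) = (I_d ⊗ M^{(2)})(x ⊗ y), which shows that the recursive sketch is a single linear map M = M^{(1)}(I_d ⊗ M^{(2)}) acting on x ⊗ y.

```python
import numpy as np

rng = np.random.default_rng(1)
d, m = 6, 512  # illustrative sizes

# Hypothetical dense Gaussian JL matrices (the paper uses faster families;
# dense Gaussians are chosen here only so the identity is easy to check).
M1 = rng.standard_normal((m, d * m)) / np.sqrt(m)  # M^(1) in R^{m x dm}
M2 = rng.standard_normal((m, d)) / np.sqrt(m)      # base case, sketching R^d -> R^m
x = rng.standard_normal(d)
y = rng.standard_normal(d)

# Recursive sketch for c = 2: sketch y first, then sketch x (kron) (M2 y).
s = M1 @ np.kron(x, M2 @ y)

# Lemma (2): the same result comes from the single matrix M = M1 (I_d kron M2),
# because x (kron) (M2 y) = (I_d kron M2)(x (kron) y).
M = M1 @ np.kron(np.eye(d), M2)  # M in R^{m x d^2}
assert np.allclose(s, M @ np.kron(x, y))
```

Chaining this composition c − 1 times sketches x^{(1)} ⊗ ... ⊗ x^{(c)} one factor at a time, so the d^c-dimensional tensor is never materialized; each step only multiplies a vector of length dm.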