t3f
Tensor Train decomposition on TensorFlow
Tensor Train decomposition is used across many branches of machine learning, but until now it lacked an implementation with GPU support, batch processing, automatic differentiation, and versatile functionality for the Riemannian optimization framework, which takes into account the underlying manifold structure in order to construct efficient optimization methods. In this work, we propose a library that aims to fill this gap and to make machine learning papers that rely on Tensor Train decomposition easier to implement. The library includes 92
Methods based on tensor decompositions are gaining traction in the machine learning community. They are used for analyzing theoretical properties of deep learning (Cohen et al., 2016; Cohen and Shashua, 2016; Khrulkov et al., 2017), compactly parametrizing models (Lebedev et al., 2015; Novikov et al., 2015; Yu et al., 2017), training probabilistic models (Anandkumar et al., 2012; Jernite et al., 2013; Song et al., 2013) and deep learning models (Janzamin et al., 2015), parametrizing recommender systems (Frolov and Oseledets, 2017), and more. In this work, we focus on implementing a library for working with one tensor decomposition in particular: the Tensor Train decomposition (Oseledets, 2011).

Three libraries already implement the Tensor Train decomposition: ttpy (https://github.com/oseledets/ttpy), the TT-Toolbox for MATLAB (https://www.mathworks.com/matlabcentral/fileexchange/46312-oseledets-tt-toolbox), and TensorToolbox (https://pypi.python.org/pypi/TensorToolbox/). Nevertheless, the recent papers that use it for machine learning purposes had to rewrite core functionality from scratch. The existing implementations do not support GPU execution or automatic differentiation (which forced Novikov et al. (2015) to derive gradients by hand), do not support parallel processing of a batch of tensors (which forced Novikov et al. (2016) to rewrite basic operations in TensorFlow), and lack advanced support for Riemannian geometry operations, a technique that can speed up the optimization of models under the constraint that the parameters have a compact Tensor Train representation (or under any other constraint set that forms a smooth manifold).
In the presented library, we aim to make the results of machine learning papers that use the TT decomposition easy to reproduce and to provide flexible support for developing new ideas. The library is released under the MIT license (https://github.com/Bihaqo/t3f) and is distributed as a PyPI package (https://pypi.python.org/pypi/t3f) to simplify installation. The API reference documentation is available online (https://t3f.readthedocs.io). The library includes several Jupyter notebook examples, such as compressing neural network weights by factorizing them into the Tensor Train format or performing tensor completion by assuming that the result has low TT-rank. The library has test coverage.

The library provides two base classes, TensorTrain and TensorTrainBatch, which store one tensor in the Tensor Train format and a batch of such tensors respectively (a batch being a list of tensors of the same shape that are meant to be processed together). These two classes support most of the logic of the tf.Tensor class (e.g. .op, .name, and .get_shape). Under the hood, these classes are containers for the factors of the TT-format (represented as tf.Tensor objects) plus lightweight meta-information, so a user who needs to work with the factors directly can easily access them. The rest of the library is a collection of functions that take one or two TT-objects as input and output a TT-object or a tf.Tensor object, depending on the semantics of the particular function. For example, the function t3f.multiply(left, right) implements elementwise multiplication of two TT tensors (or batches of TT tensors), and also supports multiplication of a TT tensor by a number. This function returns a TensorTrain or a TensorTrainBatch object.
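To make the "container of factors" idea concrete, here is a minimal sketch in plain NumPy rather than T3F. It assumes a TT tensor is stored as a list of 3-dimensional factors of shape (r_prev, n, r_next); the helper names `tt_full` and `tt_multiply` are hypothetical and are not part of the T3F API. It illustrates how elementwise multiplication can be carried out on the factors alone, at the cost of multiplying the TT-ranks.

```python
import numpy as np

def tt_full(cores):
    """Contract TT-cores of shape (r_prev, n, r_next) into a dense tensor."""
    full = cores[0]                                   # (1, n_1, r_1)
    for core in cores[1:]:
        full = np.tensordot(full, core, axes=([-1], [0]))
    return full[0, ..., 0]        # drop the boundary ranks, both equal to 1

def tt_multiply(left, right):
    """Elementwise product of two TT tensors; the TT-ranks multiply."""
    out = []
    for a, b in zip(left, right):
        # For each mode index i the new slice is kron(a[:, i, :], b[:, i, :]).
        c = np.einsum('aib,cid->acibd', a, b)
        out.append(c.reshape(a.shape[0] * b.shape[0], a.shape[1],
                             a.shape[2] * b.shape[2]))
    return out

rng = np.random.default_rng(0)
shapes = [(1, 3, 2), (2, 4, 2), (2, 5, 1)]            # a 3 x 4 x 5 tensor of TT-rank 2
x = [rng.standard_normal(s) for s in shapes]
y = [rng.standard_normal(s) for s in shapes]
z = tt_multiply(x, y)                                 # TT-rank 4, never densified
```

The dense tensors are only needed to check the result; the product itself touches nothing larger than the factors.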
Basic functionality of the library consists of tools for creating tensors (e.g. t3f.ones or t3f.random_tensor), rich indexing of the tensors, elementwise operations (addition and multiplication), matrix-by-matrix multiplication, and SVD-based operations (e.g. factorizing a tensor into the TT-format or rounding a TT-object to find its closest lower-rank approximation). For a complete list of supported operations, see the API reference documentation.
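As an illustration of the SVD-based operations, the sketch below factorizes a dense tensor into the TT-format by a sequence of truncated SVDs (the TT-SVD scheme of Oseledets, 2011). It is written in plain NumPy under the assumption that factors have shape (r_prev, n, r_next); `tt_svd` and `tt_full` are hypothetical helper names, not T3F functions. Rounding an existing TT-object relies on the same truncation idea, which this dense-input sketch does not cover.

```python
import numpy as np

def tt_svd(tensor, max_rank):
    """Factorize a dense tensor into TT-cores by sequential truncated SVDs."""
    shape = tensor.shape
    cores, rank = [], 1
    rest = np.asarray(tensor, dtype=float)
    for n in shape[:-1]:
        rest = rest.reshape(rank * n, -1)
        u, s, vt = np.linalg.svd(rest, full_matrices=False)
        new_rank = min(max_rank, len(s))
        cores.append(u[:, :new_rank].reshape(rank, n, new_rank))
        rest = s[:new_rank, None] * vt[:new_rank]     # remainder for the next step
        rank = new_rank
    cores.append(rest.reshape(rank, shape[-1], 1))
    return cores

def tt_full(cores):
    """Contract TT-cores back into a dense tensor (for checking only)."""
    full = cores[0]
    for core in cores[1:]:
        full = np.tensordot(full, core, axes=([-1], [0]))
    return full[0, ..., 0]

# T[i, j, k] = i + j + k has TT-ranks (2, 2), so max_rank=2 is lossless.
T = np.add.outer(np.add.outer(np.arange(4.0), np.arange(5.0)), np.arange(6.0))
exact = tt_svd(T, max_rank=2)
approx = tt_svd(T, max_rank=1)                        # lossy rank-1 approximation
```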
Most operations support broadcasting and accept a batch of TT-objects as input. For example, C = t3f.matmul(A, B) for a batch of TT-matrices A and a TT-matrix B returns a batch of TT-matrices C where $C_i = A_i B$, and the result is computed in parallel across the batch dimension. There are also operations specifically tailored to batch inputs, such as pairwise_flat_inner(x, y), which computes the matrix of pairwise inner products $G_{ij} = \langle x_i, y_j \rangle$.
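The batch semantics of a pairwise inner product can be sketched in plain NumPy as follows. This is an illustration, not the T3F implementation: it assumes a batch is stored as a list of factors with a leading batch axis, i.e. shape (batch, r_prev, n, r_next), and `pairwise_inner_sketch` is a hypothetical name. All pairs are processed at once by a single einsum per factor, and no dense tensor is formed.

```python
import numpy as np

def pairwise_inner_sketch(xs, ys):
    """Matrix G[i, j] = <x_i, y_j> for two batches of TT tensors.

    xs and ys are lists of batched cores of shape (batch, r_prev, n, r_next).
    """
    v = np.ones((xs[0].shape[0], ys[0].shape[0], 1, 1))
    for a, b in zip(xs, ys):
        # Absorb one core of every x_i and every y_j in a single contraction.
        v = np.einsum('xypq,xpir,yqis->xyrs', v, a, b)
    return v[:, :, 0, 0]

rng = np.random.default_rng(0)
# Two 3 x 4 x 5 tensors of TT-rank 2 and three of TT-ranks (3, 2).
xs = [rng.standard_normal((2,) + s) for s in [(1, 3, 2), (2, 4, 2), (2, 5, 1)]]
ys = [rng.standard_normal((3,) + s) for s in [(1, 3, 3), (3, 4, 2), (2, 5, 1)]]
G = pairwise_inner_sketch(xs, ys)                     # shape (2, 3)
```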
One of the advantages of the Tensor Train format is that the set of tensors of fixed TT-ranks forms a Riemannian manifold, which allows using Riemannian geometry ideas to speed up tensor calculus while preserving theoretical guarantees (see Steinlechner (2016) for more details). The T3F library has rich support for Riemannian operations, the most basic being projecting a TT-object A (or a batch of them) onto the tangent space of another TT-object B. We denote this projection operation by $P_B(A)$.
Other supported operations are special cases of combining this basic projection operation with non-Riemannian operations, but are heavily optimized by exploiting the structure of objects that are projected onto the same tangent space. Such operations include projecting a weighted sum of a batch of TT-objects on a tangent space (necessary for efficiently computing the Riemannian gradient):
S = t3f.project_sum(what=A, where=B, weights=c)
Mathematically, this function implements the following operation: $S = P_B\big(\sum_i c_i A_i\big)$, where $P_B$ denotes the projection onto the tangent space at B. The same operation can be implemented by a projection followed by summation and rounding, at an asymptotic cost that depends on the batch size, the number of TT-cores, the mode size of each axis of the tensor, and the TT-ranks of the tensors A and B. The tailored project_sum operation has a lower asymptotic complexity for the same result.
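Since the projection onto a fixed tangent space is a linear map, the two ways of computing the result agree. The identity below spells this out; the symbols are assumptions chosen to match the call above ($A_i$ for the elements of the batch `what`, $B$ for `where`, $c_i$ for `weights`, $b$ for the batch size, and $P_B$ for the projection onto the tangent space at $B$).

```latex
S \;=\; P_B\Big(\sum_{i=1}^{b} c_i A_i\Big) \;=\; \sum_{i=1}^{b} c_i \, P_B(A_i)
```

The left-hand side is what project_sum evaluates in one pass; the right-hand side corresponds to projecting each element separately and then summing.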
Other tailored operations include computing the Gram matrix of a batch of tensors from the same tangent space, which is asymptotically cheaper than the same operation in the general case, and projecting a matrix-by-vector product onto a tangent space, which is asymptotically cheaper than doing it in two steps (matrix-by-vector multiplication followed by projection).

For matrices, the TT-format is introduced in a special way (in contrast to treating a matrix as a 2-dimensional tensor) such that the Kronecker product of two matrices is a TT-matrix with two TT-factors and TT-rank 1 (for details see Novikov et al. (2014)). Since the Kronecker product is a special case of a TT-object, the T3F library provides means to work with Kronecker products. For example, one can find the closest approximation (in the Frobenius norm) of a matrix E by a Kronecker product of two smaller matrices, with the sizes of the factors specified through the shape argument:
t3f.to_tt_matrix(E, shape=((M1, N1), (M2, N2)), max_tt_rank=1)
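One standard way to compute such an approximation outside of T3F is the rearrangement approach of Van Loan and Pitsianis: the entries of E are permuted so that a Kronecker product becomes a rank-1 matrix, and the best rank-1 approximation is then read off from an SVD. The NumPy sketch below is an illustration under stated assumptions, not T3F code: `nearest_kronecker` is a hypothetical name, and the factor sizes are passed in NumPy's `np.kron` convention (A of size m1 x m2, B of size n1 x n2, E of size m1*n1 x m2*n2).

```python
import numpy as np

def nearest_kronecker(E, shape_a, shape_b):
    """Closest (Frobenius norm) Kronecker product kron(A, B) to the matrix E."""
    (m1, m2), (n1, n2) = shape_a, shape_b
    # After this permutation, kron(A, B) turns into the outer product
    # of the flattened A and the flattened B.
    R = E.reshape(m1, n1, m2, n2).transpose(0, 2, 1, 3).reshape(m1 * m2, n1 * n2)
    u, s, vt = np.linalg.svd(R, full_matrices=False)
    A = np.sqrt(s[0]) * u[:, 0].reshape(m1, m2)
    B = np.sqrt(s[0]) * vt[0].reshape(n1, n2)
    return A, B, s

rng = np.random.default_rng(0)
A0 = rng.standard_normal((3, 4))
B0 = rng.standard_normal((5, 2))
E = np.kron(A0, B0) + 0.01 * rng.standard_normal((15, 8))   # nearly Kronecker
A, B, s = nearest_kronecker(E, A0.shape, B0.shape)
```

The factors are determined only up to a scalar that can be moved between A and B, so the check should compare kron(A, B) with E rather than A with A0.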
Kronecker product matrices allow many operations, such as computing the determinant, the inverse, or the Cholesky decomposition, to be performed much faster than in the general case. These operations are supported in the t3f.kronecker module (see the full list of supported operations in the API reference documentation). However, some operations, such as summing two Kronecker product matrices, result in a general matrix that lacks these properties. To retract back to the class of Kronecker product matrices, one can compute the closest approximation of a sum of two Kronecker product matrices by a Kronecker product matrix, without ever materializing the large matrix, with the following code:
first = t3f.TensorTrain([A1, B1])
second = t3f.TensorTrain([A2, B2])
res_exact = first + second
res_kronecker_product = t3f.round(res_exact, max_tt_rank=1)
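The reason the large matrix is never needed can be seen in a small NumPy sketch (an illustration, not the T3F implementation; `round_kronecker_sum` is a hypothetical name). After flattening the factors, the sum of two Kronecker products corresponds to a rank-2 matrix given in factored form, so its best rank-1 approximation follows from two thin QR decompositions and a 2 x 2 SVD.

```python
import numpy as np

def round_kronecker_sum(A1, B1, A2, B2):
    """Closest Kronecker product to kron(A1, B1) + kron(A2, B2)."""
    U = np.stack([A1.ravel(), A2.ravel()], axis=1)    # (size of A, 2)
    V = np.stack([B1.ravel(), B2.ravel()], axis=1)    # (size of B, 2)
    qu, ru = np.linalg.qr(U)
    qv, rv = np.linalg.qr(V)
    # The rearranged sum equals U V^T = qu (ru rv^T) qv^T,
    # so only a 2 x 2 matrix has to be decomposed.
    w, s, zt = np.linalg.svd(ru @ rv.T)
    A = np.sqrt(s[0]) * (qu @ w[:, 0]).reshape(A1.shape)
    B = np.sqrt(s[0]) * (qv @ zt[0]).reshape(B1.shape)
    return A, B, s

rng = np.random.default_rng(0)
A1, A2 = rng.standard_normal((2, 3, 4))               # two 3 x 4 matrices
B1, B2 = rng.standard_normal((2, 5, 2))               # two 5 x 2 matrices
A, B, s = round_kronecker_sum(A1, B1, A2, B2)
```

The approximation error in the Frobenius norm equals the second singular value s[1], which can be verified against the dense sum on small inputs.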
In this section, we benchmark the basic functionality of T3F on CPU and GPU and compare its performance against the most actively developed alternative library, TTPY. To reproduce the benchmark on your hardware, see the examples/profile folder in the T3F library.

For benchmarking, we generated a batch of 100 random TT-matrices of TT-rank 10, each with a TT-representation consisting of 10 factors, and a batch of 100 random TT-vectors of compatible size. We benchmarked matrix-by-vector multiplication (‘matvec’), matrix-by-matrix multiplication (‘matmul’), computing the Frobenius norm (‘norm’), and computing the Gram matrix of 1 or of 100 TT-vectors (‘gram’; in the case of one vector this is just the dot product of the vector with itself). There are also two additional operations with different inputs: rounding one or a batch of 100 TT-vectors of TT-rank 100 to the closest TT-vectors of TT-rank 10 (‘round’), and projecting one or a batch of 100 TT-vectors of TT-rank 100 onto the tangent space of a TT-vector of TT-rank 10 (‘project’). The results are reported in Table 1. Note that TTPY lacks GPU and batch processing support. In the batch case, the time is reported per object: for example, processing a batch of 100 matrix-by-vector multiplications takes 0.3 ms in total, so the table reports 0.003.
Table 1: Time in ms per object for the benchmarked operations.

Op | TTPY, 1 object, CPU | T3F, 1 object, CPU | T3F, 1 object, GPU | T3F, 100 objects, CPU | T3F, 100 objects, GPU |
---|---|---|---|---|---|
matvec | 11.142 | 0.129 | 0.121 | 0.003 | 0.003 |
matmul | 86.191 | 0.125 | 0.133 | 0.004 | 0.004 |
norm | 3.790 | 1.902 | 0.893 | 0.422 | 0.050 |
round | 73.027 | 0.159 | 0.165 | 0.006 | 0.006 |
gram | 0.145 | 0.806 | 0.703 | 0.029 | 0.001 |
project | 116.868 | 1.564 | 1.658 | 0.017 | 0.018 |
Lebedev, V., Ganin, Y., Rakhuba, M., Oseledets, I., and Lempitsky, V. Speeding-up convolutional neural networks using fine-tuned CP-decomposition. In Proceedings of the International Conference on Learning Representations (ICLR), 2015.