# Tensor Networks for Dimensionality Reduction and Large-Scale Optimizations. Part 2 Applications and Future Perspectives

Part 2 of this monograph builds on the introduction to tensor networks and their operations presented in Part 1. It focuses on tensor network models for super-compressed higher-order representation of data/parameters and related cost functions, while providing an outline of their applications in machine learning and data analytics. A particular emphasis is on the tensor train (TT) and Hierarchical Tucker (HT) decompositions, and their physically meaningful interpretations which reflect the scalability of the tensor network approach. Through a graphical approach, we also elucidate how, by virtue of the underlying low-rank tensor approximations and sophisticated contractions of core tensors, tensor networks have the ability to perform distributed computations on otherwise prohibitively large volumes of data/parameters, thereby alleviating or even eliminating the curse of dimensionality. The usefulness of this concept is illustrated over a number of applied areas, including generalized regression and classification (support tensor machines, canonical correlation analysis, higher order partial least squares), generalized eigenvalue decomposition, Riemannian optimization, and in the optimization of deep neural networks. Part 1 and Part 2 of this work can be used either as stand-alone separate texts, or indeed as a conjoint comprehensive review of the exciting field of low-rank tensor networks and tensor decompositions.


### 1.1 Reshaping or Folding

The simplest way of tensorization is through the reshaping or folding operations, also known as segmentation [Debals and De Lathauwer, 2015, Boussé et al., 2015]. This type of tensorization preserves the number of original data entries and their sequential ordering, as it only rearranges a vector to a matrix or tensor. Hence, folding does not require additional memory space.

Folding. A tensor $\underline{\mathbf{Y}}$ of size $I_1\times I_2\times\cdots\times I_N$ is considered a folding of a vector $\mathbf{y}$ of length $L=I_1 I_2\cdots I_N$, if

$$\underline{\mathbf{Y}}(i_1,i_2,\ldots,i_N)=y(i), \qquad (1.1)$$

for all $i_n=1,2,\ldots,I_n$, where $i=\overline{i_1 i_2\ldots i_N}$ is a linear index of $(i_1,i_2,\ldots,i_N)$.

In other words, the vector $\mathbf{y}$ is a vectorization of the tensor $\underline{\mathbf{Y}}$, while $\underline{\mathbf{Y}}$ is a tensorization of $\mathbf{y}$.

As an example, the arrangement of elements in a matrix $\mathbf{Y}$ of size $I\times\frac{L}{I}$, which is folded from a vector $\mathbf{y}$ of length $L$, is given by

$$\mathbf{Y}=\begin{bmatrix} y(1) & y(I+1) & \cdots & y(L-I+1)\\ y(2) & y(I+2) & \cdots & y(L-I+2)\\ \vdots & \vdots & \ddots & \vdots\\ y(I) & y(2I) & \cdots & y(L) \end{bmatrix}. \qquad (1.2)$$

Higher-order folding/reshaping refers to the application of the folding procedure several times, whereby a vector $\mathbf{y}$ of length $L=I_1 I_2\cdots I_N$ is converted into an $N$th-order tensor of size $I_1\times I_2\times\cdots\times I_N$.
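As a concrete illustration (a minimal numpy sketch; the variable names and sizes are our own), column-major reshaping reproduces the arrangement in (1.2), and repeated reshaping yields a higher-order folding:

```python
import numpy as np

# Fold y(1), ..., y(12) into a 3 x 4 matrix, filling columns first; this is
# exactly the arrangement in (1.2), obtained with column-major (order='F') reshape.
y = np.arange(1, 13)
Y = y.reshape(3, 4, order='F')
print(Y[:, 0])        # first column: y(1), y(2), y(3)
print(Y[0, :])        # first row:    y(1), y(4), y(7), y(10)

# Higher-order folding: the same 12 entries rearranged as a 2 x 2 x 3 tensor,
# so no additional memory is required for the data themselves.
T = y.reshape(2, 2, 3, order='F')
```

Note that `order='F'` (Fortran, column-major) is what makes consecutive entries of $\mathbf{y}$ run down the first mode, as in (1.2); the default `order='C'` would fill rows first.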

Application to BSS. It is important to notice that a higher-order folding (quantization) of a vector of length $L=I^N$, sampled from an exponential function $y(k)=a z^k$, yields an $N$th-order tensor of rank 1. Moreover, wide classes of functions formed by products and/or sums of trigonometric, polynomial and rational functions can be quantized in this way to yield (approximate) low-rank tensor train (TT) network formats [Khoromskij, 2011a, b, Oseledets, 2012]. Exploitation of such low-rank representations allows us to separate the signals from a single or a few mixtures, as outlined below.
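The rank-1 property is easy to verify numerically; a short sketch (the base $z$ is chosen arbitrarily):

```python
import numpy as np

# Sample y(k) = z**k for k = 0, ..., 15 and fold the length-16 vector into a matrix.
# Since z**(i + 4*j) = z**i * z**(4*j), every such folding is an outer product,
# i.e. a rank-1 matrix; the same argument applies to each mode of a higher-order folding.
z = 1.07
y = z ** np.arange(16)
Y = y.reshape(4, 4, order='F')
print(np.linalg.matrix_rank(Y))   # 1
```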

Consider a single mixture, $y(t)$, which is composed of $J$ component signals, $x_j(t)$, $j=1,2,\ldots,J$, and corrupted by additive Gaussian noise, $n(t)$, to give

$$y(t)=a_1 x_1(t)+a_2 x_2(t)+\cdots+a_J x_J(t)+n(t). \qquad (1.3)$$

The aim is to extract the unknown sources (components) $x_j(t)$ from the observed signal $y(t)$. Assume that the higher-order foldings, $\underline{\mathbf{X}}_j$, of the component signals, $x_j(t)$, have low-rank representations in, e.g., the CP or Tucker format, given by

$$\underline{\mathbf{X}}_j=\llbracket\underline{\mathbf{G}}_j;\mathbf{U}_j^{(1)},\mathbf{U}_j^{(2)},\ldots,\mathbf{U}_j^{(N)}\rrbracket,$$

or in the TT format, or in any other tensor network format. Because of the multi-linearity of this tensorization, the following relation between the tensorization, $\underline{\mathbf{Y}}$, of the mixture and the tensorizations, $\underline{\mathbf{X}}_j$, of the hidden components holds

$$\underline{\mathbf{Y}}=a_1\underline{\mathbf{X}}_1+a_2\underline{\mathbf{X}}_2+\cdots+a_J\underline{\mathbf{X}}_J+\underline{\mathbf{N}}, \qquad (1.4)$$

where $\underline{\mathbf{N}}$ is the tensorization of the noise $n(t)$.

Now, by a decomposition of $\underline{\mathbf{Y}}$ into $J$ blocks of tensor networks, each corresponding to a tensor network (TN) representation of a hidden component signal, we can find approximations of $a_j\underline{\mathbf{X}}_j$, and hence of the separate component signals, up to a scaling ambiguity. The separation method can be used in conjunction with the Toeplitz and Hankel foldings. Example 1.10.1 illustrates the separation of damped sinusoid signals.
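The multi-linearity behind (1.4) can be checked directly; a small numpy sketch (the signals and mixing weights are chosen arbitrarily):

```python
import numpy as np

# Tensorization (here a plain folding) is linear in the signal, so the folding
# of a mixture equals the same mixture of the foldings, as in (1.4).
t = np.arange(16)
x1, x2 = 0.9 ** t, np.cos(0.4 * t)
a1, a2 = 2.0, -1.0
y = a1 * x1 + a2 * x2

fold = lambda v: v.reshape(2, 2, 2, 2, order='F')
same = np.allclose(fold(y), a1 * fold(x1) + a2 * fold(x2))
print(same)   # True
```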

### 1.2 Tensorization through a Toeplitz/Hankel Tensor

#### 1.2.1 Toeplitz Folding

The Toeplitz matrix is a structured matrix with constant entries in each diagonal. Toeplitz matrices appear in many signal processing applications, e.g., through covariance matrices in prediction, estimation, detection, classification, regression, harmonic analysis, speech enhancement, interference cancellation, image restoration, adaptive filtering, blind deconvolution and blind equalization [Bini, 1995, Gray, 2006].

Before introducing a generalization of a Toeplitz matrix to a Toeplitz tensor, we shall first consider the discrete convolution between two vectors $\mathbf{x}$ and $\mathbf{y}$ of respective lengths $I$ and $L$, given by

$$\mathbf{z}=\mathbf{x}\ast\mathbf{y}. \qquad (1.5)$$

Now, we can write the entries $z(I),\ldots,z(L)$ in a linear algebraic form as

$$\mathbf{z}_{I:L}=\begin{bmatrix} y(I) & y(I-1) & y(I-2) & \cdots & y(1)\\ y(I+1) & y(I) & y(I-1) & \cdots & y(2)\\ y(I+2) & y(I+1) & y(I) & \cdots & y(3)\\ \vdots & \vdots & \vdots & \ddots & \vdots\\ y(L) & y(L-1) & y(L-2) & \cdots & y(J) \end{bmatrix}\begin{bmatrix} x(1)\\ x(2)\\ x(3)\\ \vdots\\ x(I) \end{bmatrix}=\mathbf{Y}^{\mathsf T}\mathbf{x}=\mathbf{Y}\,\bar\times_1\,\mathbf{x},$$

where $J=L-I+1$. With this representation, the convolution can be computed through a linear matrix operator, $\mathbf{Y}=\mathbf{T}_{I,J}(\mathbf{y})$, which is called the Toeplitz matrix of the generating vector $\mathbf{y}$.

Toeplitz matrix. A Toeplitz matrix $\mathbf{Y}=\mathbf{T}_{I,J}(\mathbf{y})$ of size $I\times J$, which is constructed from a vector $\mathbf{y}$ of length $L=I+J-1$, is defined as

$$\mathbf{Y}=\mathbf{T}_{I,J}(\mathbf{y})=\begin{bmatrix} y(I) & y(I+1) & \cdots & y(L)\\ y(I-1) & y(I) & \cdots & y(L-1)\\ \vdots & \vdots & \ddots & \vdots\\ y(1) & y(2) & \cdots & y(L-I+1) \end{bmatrix}. \qquad (1.6)$$

The first column and first row of the Toeplitz matrix represent its entire generating vector.
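The correspondence between (1.6) and the matrix form of the convolution can be sketched as follows (the helper below is our own; `scipy.linalg.toeplitz` could be used instead):

```python
import numpy as np

def toeplitz_matrix(y, I):
    # T_{I,J}(y) as in (1.6): first row y(I), ..., y(L), first column y(I), ..., y(1),
    # i.e. T[i, j] = y(I - i + j) in 1-based notation.
    L = len(y)
    J = L - I + 1
    T = np.empty((I, J))
    for i in range(I):              # 0-based loops
        for j in range(J):
            T[i, j] = y[I - i + j - 1]
    return T

y = np.arange(1.0, 8.0)             # generating vector, L = 7
x = np.array([1.0, -2.0, 3.0])      # I = 3
Y = toeplitz_matrix(y, 3)

# The full-overlap entries z(I), ..., z(L) of z = x * y equal Y^T x.
print(np.allclose(Y.T @ x, np.convolve(x, y, mode='valid')))   # True
```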

Indeed, all entries of $\mathbf{z}$ in the above convolution (1.5) can be expressed either: (i) using a Toeplitz matrix formed from the zero-padded generating vector $[\mathbf{0}_{I-1}^{\mathsf T},\mathbf{y}^{\mathsf T},\mathbf{0}_{I-1}^{\mathsf T}]^{\mathsf T}$, whose first row is $[\mathbf{y}^{\mathsf T},\mathbf{0}_{I-1}^{\mathsf T}]$, to give

$$\mathbf{z}=\mathbf{T}_{I,L+I-1}\big([\mathbf{0}_{I-1}^{\mathsf T},\mathbf{y}^{\mathsf T},\mathbf{0}_{I-1}^{\mathsf T}]^{\mathsf T}\big)^{\mathsf T}\,\mathbf{x}, \qquad (1.7)$$

or (ii) through a Toeplitz matrix of the zero-padded vector $[\mathbf{0}_{L-1}^{\mathsf T},\mathbf{x}^{\mathsf T},\mathbf{0}_{L-1}^{\mathsf T}]^{\mathsf T}$, to yield

$$\mathbf{z}=\mathbf{T}_{L,L+I-1}\big([\mathbf{0}_{L-1}^{\mathsf T},\mathbf{x}^{\mathsf T},\mathbf{0}_{L-1}^{\mathsf T}]^{\mathsf T}\big)^{\mathsf T}\,\mathbf{y}. \qquad (1.8)$$

The so expanded Toeplitz matrix is a circulant matrix of the zero-padded generating vector.

Consider now a convolution of three vectors, $\mathbf{x}_1$, $\mathbf{x}_2$ and $\mathbf{y}$, of respective lengths $I_1$, $I_2$ and $L$, given by

$$\mathbf{z}=\mathbf{x}_1\ast\mathbf{x}_2\ast\mathbf{y}.$$

For its implementation, we first construct a Toeplitz matrix, $\mathbf{T}_{I_1,L-I_1+1}(\mathbf{y})$, of size $I_1\times(L-I_1+1)$ from the generating vector $\mathbf{y}$. Then, we use its $I_1$ rows to generate $I_1$ Toeplitz matrices of size $I_2\times(L-I_1-I_2+2)$. Finally, all $I_1$ Toeplitz matrices, $\mathbf{T}_{I_2,\,L-I_1-I_2+2}(\mathbf{y}(I_1-i_1+1:L-i_1+1))$, $i_1=1,\ldots,I_1$, are stacked as horizontal slices of a third-order tensor $\underline{\mathbf{Y}}$ of size $I_1\times I_2\times(L-I_1-I_2+2)$. It can be verified that the entries $z(I_1+I_2-1),\ldots,z(L)$ can be computed as

$$\begin{bmatrix}z(I_1+I_2-1)\\\vdots\\z(L)\end{bmatrix}=[\mathbf{x}_1\ast\mathbf{x}_2\ast\mathbf{y}]_{I_1+I_2-1:L}=\underline{\mathbf{Y}}\,\bar\times_1\,\mathbf{x}_1\,\bar\times_2\,\mathbf{x}_2.$$

The tensor $\underline{\mathbf{Y}}$ is referred to as the Toeplitz tensor of the generating vector $\mathbf{y}$.

Toeplitz tensor. An $N$th-order Toeplitz tensor of size $I_1\times I_2\times\cdots\times I_N$, which is represented by $\underline{\mathbf{Y}}=\mathbf{T}_{I_1,\ldots,I_N}(\mathbf{y})$, is constructed from a generating vector $\mathbf{y}$ of length $L=I_1+I_2+\cdots+I_N-N+1$, such that its entries are defined as

$$\underline{\mathbf{Y}}(i_1,\ldots,i_{N-1},i_N)=y(\bar i_1+\cdots+\bar i_{N-1}+i_N), \qquad (1.9)$$

where $\bar i_n=I_n-i_n$. An example of the Toeplitz tensor is illustrated in Figure 1.1.

Example 1  Given a $3\times3\times3$ dimensional Toeplitz tensor of the sequence $1,2,\ldots,7$, the horizontal slices are Toeplitz matrices of size $3\times3$, given by

$$\mathbf{T}_{3,3,3}(1,\ldots,7)=\begin{bmatrix}\mathbf{T}_{3,3}(3,\ldots,7)\\[2pt]\mathbf{T}_{3,3}(2,\ldots,6)\\[2pt]\mathbf{T}_{3,3}(1,\ldots,5)\end{bmatrix}=\begin{bmatrix}\begin{bmatrix}5&6&7\\4&5&6\\3&4&5\end{bmatrix}\\[10pt]\begin{bmatrix}4&5&6\\3&4&5\\2&3&4\end{bmatrix}\\[10pt]\begin{bmatrix}3&4&5\\2&3&4\\1&2&3\end{bmatrix}\end{bmatrix}.$$
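A direct implementation of definition (1.9) reproduces Example 1 and the convolution identity (a sketch with 0-based indexing; the test vectors are our own):

```python
import numpy as np
from itertools import product

def toeplitz_tensor(y, dims):
    # (1.9): Y(i1,...,iN) = y(bar_i1 + ... + bar_i(N-1) + iN), with bar_in = In - in.
    T = np.empty(dims)
    for idx in product(*[range(d) for d in dims]):   # 0-based multi-index
        bar = sum(dims[n] - (idx[n] + 1) for n in range(len(dims) - 1))
        T[idx] = y[bar + idx[-1]]                    # 0-based index into y
    return T

y = np.arange(1.0, 8.0)
T = toeplitz_tensor(y, (3, 3, 3))
print(T[0])     # first horizontal slice: T_{3,3}(3,...,7)

# Contracting the first two modes with x1, x2 gives [x1 * x2 * y]_{I1+I2-1 : L}.
x1, x2 = np.array([1.0, 2.0, -1.0]), np.array([0.5, 0.0, 1.0])
lhs = np.einsum('ijk,i,j->k', T, x1, x2)
rhs = np.convolve(np.convolve(x1, x2), y, mode='valid')
print(np.allclose(lhs, rhs))    # True
```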

Recursive generation. An $N$th-order Toeplitz tensor of size $I_1\times\cdots\times I_N$ of a generating vector $\mathbf{y}$ can be constructed from an $(N-1)$th-order Toeplitz tensor of size $I_1\times\cdots\times I_{N-2}\times(I_{N-1}+I_N-1)$ of the same generating vector, by a conversion of its mode-$(N-1)$ fibers to Toeplitz matrices of size $I_{N-1}\times I_N$.

Following the definition of the Toeplitz tensor, the convolution of $(N-1)$ vectors, $\mathbf{x}_1,\ldots,\mathbf{x}_{N-1}$, of respective lengths $I_1,\ldots,I_{N-1}$, and a vector $\mathbf{y}$ of length $L$, can be represented as a tensor-vector product of an $N$th-order Toeplitz tensor and the $(N-1)$ vectors, that is

$$[\mathbf{x}_1\ast\mathbf{x}_2\ast\cdots\ast\mathbf{x}_{N-1}\ast\mathbf{y}]_{J:L}=\underline{\mathbf{Y}}\,\bar\times_1\,\mathbf{x}_1\,\bar\times_2\,\mathbf{x}_2\cdots\bar\times_{N-1}\,\mathbf{x}_{N-1},$$

where $\underline{\mathbf{Y}}$ is a Toeplitz tensor of size $I_1\times\cdots\times I_{N-1}\times(L-J+1)$ generated from $\mathbf{y}$, and $J=I_1+\cdots+I_{N-1}-N+2$, or

$$\mathbf{x}_1\ast\mathbf{x}_2\ast\cdots\ast\mathbf{x}_{N-1}\ast\mathbf{y}=\underline{\tilde{\mathbf{Y}}}\,\bar\times_1\,\mathbf{x}_1\,\bar\times_2\,\mathbf{x}_2\cdots\bar\times_{N-1}\,\mathbf{x}_{N-1},$$

where $\underline{\tilde{\mathbf{Y}}}$ is a Toeplitz tensor of a zero-padded version of $\mathbf{y}$, and is of size $I_1\times\cdots\times I_{N-1}\times(L+J-1)$.

#### 1.2.2 Hankel Folding

The Hankel matrix and Hankel tensor have similar structures to the Toeplitz matrix and tensor and can also be used as linear operators in the convolution.

Hankel matrix. An $I\times J$ Hankel matrix of a vector $\mathbf{y}$, of length $L=I+J-1$, is defined as

$$\mathbf{Y}=\mathbf{H}_{I,J}(\mathbf{y})=\begin{bmatrix} y(1) & y(2) & \cdots & y(J)\\ y(2) & y(3) & \cdots & y(J+1)\\ \vdots & \vdots & \ddots & \vdots\\ y(I) & y(I+1) & \cdots & y(L) \end{bmatrix}. \qquad (1.10)$$

Hankel tensor. [Papy et al., 2005] An $N$th-order Hankel tensor of size $I_1\times I_2\times\cdots\times I_N$, which is represented by $\underline{\mathbf{Y}}=\mathbf{H}_{I_1,\ldots,I_N}(\mathbf{y})$, is constructed from a generating vector $\mathbf{y}$ of length $L=I_1+I_2+\cdots+I_N-N+1$, such that its entries are defined as

$$\underline{\mathbf{Y}}(i_1,i_2,\ldots,i_N)=y(i_1+i_2+\cdots+i_N-N+1). \qquad (1.11)$$
###### Remark 1

(Properties of a Hankel tensor)

• The generating vector $\mathbf{y}$, of length $L=I_1+\cdots+I_N-N+1$, can be reconstructed by a concatenation of fibers of the Hankel tensor $\underline{\mathbf{Y}}$, as

$$\mathbf{y}=\begin{bmatrix}\underline{\mathbf{Y}}(1:I_1-1,1,\ldots,1)\\\vdots\\\underline{\mathbf{Y}}(I_1,\ldots,I_{n-1},1:I_n-1,1,\ldots,1)\\\vdots\\\underline{\mathbf{Y}}(I_1,\ldots,I_{N-1},1:I_N)\end{bmatrix}. \qquad (1.12)$$
• Slices of a Hankel tensor $\underline{\mathbf{Y}}$, i.e., any subset of the tensor produced by fixing $N-2$ indices of its entries and varying the two remaining indices, are also Hankel matrices.

• An $N$th-order Hankel tensor, $\mathbf{H}_{I_1,\ldots,I_N}(\mathbf{y})$, can be constructed from an $(N-1)$th-order Hankel tensor of size $I_1\times\cdots\times I_{N-2}\times(I_{N-1}+I_N-1)$ by converting its mode-$(N-1)$ fibers to Hankel matrices of size $I_{N-1}\times I_N$.

• Similarly to the Toeplitz tensor, the convolution of $(N-1)$ vectors, of lengths $I_1,\ldots,I_{N-1}$, and a vector $\mathbf{y}$ of length $L$, can be represented as

$$[\mathbf{x}_1\ast\mathbf{x}_2\ast\cdots\ast\mathbf{x}_{N-1}\ast\mathbf{y}]_{J:L}=\underline{\mathbf{Y}}\,\bar\times_1\,\tilde{\mathbf{x}}_1\,\bar\times_2\,\tilde{\mathbf{x}}_2\cdots\bar\times_{N-1}\,\tilde{\mathbf{x}}_{N-1},$$

or

$$\mathbf{x}_1\ast\mathbf{x}_2\ast\cdots\ast\mathbf{x}_{N-1}\ast\mathbf{y}=\underline{\tilde{\mathbf{Y}}}\,\bar\times_1\,\tilde{\mathbf{x}}_1\,\bar\times_2\,\tilde{\mathbf{x}}_2\cdots\bar\times_{N-1}\,\tilde{\mathbf{x}}_{N-1},$$

where $\tilde{\mathbf{x}}_n$, $n=1,\ldots,N-1$, denotes the vector $\mathbf{x}_n$ with its entries in reversed order, $\underline{\mathbf{Y}}$ is the $N$th-order Hankel tensor of $\mathbf{y}$, whereas $\underline{\tilde{\mathbf{Y}}}$ is the Hankel tensor of a zero-padded version of $\mathbf{y}$.

• A Hankel tensor with identical dimensions, $I_1=\cdots=I_N=I$, is a symmetric tensor.

Example 2  A $3\times3\times3$ dimensional Hankel tensor of the sequence $1,2,\ldots,7$ is a symmetric tensor, and is given by

$$\mathbf{H}_{3,3,3}(1:7)=\begin{bmatrix}\begin{bmatrix}1&2&3\\2&3&4\\3&4&5\end{bmatrix},\ \begin{bmatrix}2&3&4\\3&4&5\\4&5&6\end{bmatrix},\ \begin{bmatrix}3&4&5\\4&5&6\\5&6&7\end{bmatrix}\end{bmatrix}.$$
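Definition (1.11) can be sketched in the same way; the tensor in Example 2 is indeed symmetric under permutation of its modes:

```python
import numpy as np
from itertools import product

def hankel_tensor(y, dims):
    # (1.11): Y(i1,...,iN) = y(i1 + ... + iN - N + 1); with 0-based indices the
    # generating index is simply the sum of the multi-index entries.
    H = np.empty(dims)
    for idx in product(*[range(d) for d in dims]):
        H[idx] = y[sum(idx)]
    return H

H = hankel_tensor(np.arange(1.0, 8.0), (3, 3, 3))
print(H[:, :, 0])                                # the slice H_{3,3}(1,...,5)
print(np.allclose(H, H.transpose(2, 0, 1)))      # symmetric tensor: True
```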

#### 1.2.3 Quantized Tensorization

It is important to notice that tensorizations into Toeplitz and Hankel tensors typically enlarge the number of data samples, in the sense that the number of entries of the corresponding tensor is larger than the number of original samples. For example, when the dimensions $I_n=2$ for all $n$, the so generated tensor is a quantized tensor of order $N=L-1$, and the number of entries of such a tensor increases from the original size $L$ to $2^{L-1}$. Therefore, quantized tensorizations are suited to the analysis of signals of short length, especially in multivariate autoregressive modelling.
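For instance, a quick count for the quantized case (all $I_n=2$), using the Hankel rule (1.11):

```python
import numpy as np
from itertools import product

# A quantized Hankel tensor with all dimensions I_n = 2, built from a vector of
# length L, has order N = L - 1 and 2**(L-1) entries.
L = 8
y = np.arange(1.0, L + 1)
dims = (2,) * (L - 1)

H = np.empty(dims)
for idx in product(*[range(d) for d in dims]):
    H[idx] = y[sum(idx)]     # Hankel rule: generating index = sum of 0-based indices

print(L, H.size)             # 8 original samples -> 128 tensor entries
```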

#### 1.2.4 Convolution Tensor

Consider again the convolution of two vectors $\mathbf{x}$ and $\mathbf{y}$ of respective lengths $I$ and $L$. We can then rewrite the expression for the entries $z(I),\ldots,z(L)$ as

$$[\mathbf{x}\ast\mathbf{y}]_{I:L}=\underline{\mathbf{C}}\,\bar\times_1\,\mathbf{x}\,\bar\times_3\,\mathbf{y},$$

where $\underline{\mathbf{C}}$ is a third-order tensor of size $I\times J\times L$, $J=L-I+1$, whose entries $\underline{\mathbf{C}}(i,j,l)$ are ones when the indices satisfy $l=I-i+j$, so that the ones run along shifted diagonals of its slices, while the remaining entries are zeros. The tensor $\underline{\mathbf{C}}$ is called the convolution tensor. An illustration of a convolution tensor of size $I\times J\times L$ is given in Figure 1.2.

Note that a product of this tensor with the vector $\mathbf{y}$ yields the Toeplitz matrix of the generating vector $\mathbf{y}$, which is of size $I\times J$, in the form

$$\underline{\mathbf{C}}\,\bar\times_3\,\mathbf{y}=\mathbf{T}_{I,J}(\mathbf{y}),$$

while the tensor-vector product $\underline{\mathbf{C}}\,\bar\times_1\,\mathbf{x}$ yields a Toeplitz matrix of a zero-padded version of $\mathbf{x}$, or a circulant matrix of $\mathbf{x}$

$$\underline{\mathbf{C}}\,\bar\times_1\,\mathbf{x}=\mathbf{T}_{L,J}\big([\mathbf{0}_{L-I}^{\mathsf T},\mathbf{x}^{\mathsf T},\mathbf{0}_{J-1}^{\mathsf T}]^{\mathsf T}\big).$$
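A third-order convolution tensor can be built directly from the index rule: ones where the 1-based indices satisfy $(I-i)+j-l=0$. A minimal sketch (sizes chosen arbitrarily):

```python
import numpy as np

# Convolution tensor of size I x J x L, with J = L - I + 1; C(i, j, l) = 1 exactly
# when l = I - i + j (1-based indices), and zero elsewhere.
I, L = 3, 7
J = L - I + 1
C = np.zeros((I, J, L))
for i in range(1, I + 1):
    for j in range(1, J + 1):
        C[i - 1, j - 1, I - i + j - 1] = 1.0

y = np.arange(1.0, 8.0)
x = np.array([1.0, -2.0, 3.0])

T = np.einsum('ijl,l->ij', C, y)       # contraction with y: the Toeplitz matrix T_{I,J}(y)
z = np.einsum('ijl,i,l->j', C, x, y)   # contraction with x and y: entries z(I), ..., z(L)
print(T[0])                            # first row: y(3), ..., y(7)
print(np.allclose(z, np.convolve(x, y, mode='valid')))   # True
```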

In general, for a convolution of $(N-1)$ vectors, $\mathbf{x}_1$, …, $\mathbf{x}_{N-1}$, of respective lengths $I_1,\ldots,I_{N-1}$, and a vector $\mathbf{y}$ of length $L$

$$\mathbf{z}=\mathbf{x}_1\ast\mathbf{x}_2\ast\cdots\ast\mathbf{x}_{N-1}\ast\mathbf{y}, \qquad (1.13)$$

the entries of $\mathbf{z}$ can be expressed through a multilinear product of a convolution tensor, $\underline{\mathbf{C}}$, of $(N+1)$th-order and size $I_1\times\cdots\times I_{N-1}\times I_N\times L$, with $I_N=L-I_1-\cdots-I_{N-1}+N-1$, and the input vectors

$$\mathbf{z}_{L-I_N+1:L}=\underline{\mathbf{C}}\,\bar\times_1\,\mathbf{x}_1\,\bar\times_2\,\mathbf{x}_2\cdots\bar\times_{N-1}\,\mathbf{x}_{N-1}\,\bar\times_{N+1}\,\mathbf{y}. \qquad (1.14)$$

Most entries of $\underline{\mathbf{C}}$ are zeros, except for those located at indices $(i_1,\ldots,i_N,i_{N+1})$ such that

$$\sum_{n=1}^{N-1}\bar i_n+i_N-i_{N+1}=0, \qquad (1.15)$$

where $\bar i_n=I_n-i_n$, $n=1,\ldots,N-1$.

The tensor product $\underline{\mathbf{C}}\,\bar\times_{N+1}\,\mathbf{y}$ yields the Toeplitz tensor of the generating vector $\mathbf{y}$, shown below

$$\underline{\mathbf{C}}\,\bar\times_{N+1}\,\mathbf{y}=\mathbf{T}_{I_1,\ldots,I_N}(\mathbf{y}). \qquad (1.16)$$

#### 1.2.5 QTT Representation of the Convolution Tensor

An important property of the convolution tensor is that it has a QTT representation with rank no larger than the number of input vectors, $N$. To illustrate this property, for simplicity, we consider an $N$th-order Toeplitz tensor of size $I_1\times\cdots\times I_N$ generated from a vector of length $2^D$. The convolution tensor of this Toeplitz tensor is then of $(N+1)$th-order.

Zero-padded convolution tensor. By appending zero tensors of appropriate size before the convolution tensor, we obtain an expanded $(N+1)$th-order convolution tensor $\underline{\mathbf{C}}$.

QTT representation. The zero-padded convolution tensor can be represented in the following QTT format

$$\underline{\mathbf{C}}=\underline{\tilde{\mathbf{C}}}^{(1)}\,|\otimes|\,\underline{\tilde{\mathbf{C}}}^{(2)}\,|\otimes|\,\cdots\,|\otimes|\,\underline{\tilde{\mathbf{C}}}^{(D)}\,|\otimes|\,\underline{\tilde{\mathbf{C}}}^{(D+1)}, \qquad (1.17)$$

where “$|\otimes|$” represents the strong Kronecker product between block tensors (a “block tensor” represents a multilevel matrix, the entries of which are matrices or tensors), with the block tensors $\underline{\tilde{\mathbf{C}}}^{(d)}$ defined from the $(N+1)$th-order core tensors $\underline{\mathbf{C}}^{(d)}$.

The last core tensor, $\underline{\tilde{\mathbf{C}}}^{(D+1)}$, represents an exchange (backward identity) matrix, reshaped into a higher-order tensor of appropriate size. The first $D$ core tensors, $\underline{\mathbf{C}}^{(1)}$, $\underline{\mathbf{C}}^{(2)}$, …, $\underline{\mathbf{C}}^{(D)}$, are expressed based on the so-called elementary core tensor $\underline{\mathbf{S}}$, as

$$\underline{\mathbf{C}}^{(1)}=\underline{\mathbf{S}}(1,:,\ldots,:),\qquad \underline{\mathbf{C}}^{(2)}=\cdots=\underline{\mathbf{C}}^{(D)}=\underline{\mathbf{S}}. \qquad (1.18)$$

The rigorous definition of the elementary core tensor is provided in Appendix 3.

Table 1.1 provides the ranks of the QTT representation for various orders of convolution tensors. The elementary core tensor can be further re-expressed in a tensor train (TT) format with sparse TT cores, as

$$\underline{\mathbf{S}}=\big\langle\big\langle\,\underline{\mathbf{G}}^{(1)},\underline{\mathbf{G}}^{(2)},\ldots,\underline{\mathbf{G}}^{(N+1)}\,\big\rangle\big\rangle,$$

where $\underline{\mathbf{G}}^{(n)}$, for $n=1,\ldots,N$, and the last core tensor $\underline{\mathbf{G}}^{(N+1)}$ are sparse core tensors of appropriate sizes.

Example 3  Convolution tensor of 3rd-order.

For a vector $\mathbf{x}$ and a generating vector $\mathbf{y}$, the elementary core tensor $\underline{\mathbf{S}}$ has sub-tensors, $\underline{\mathbf{S}}(1,:,:,:,:)$ and $\underline{\mathbf{S}}(2,:,:,:,:)$, given in a block form of the last two indices through four matrices, $\mathbf{S}_1$, $\mathbf{S}_2$, $\mathbf{S}_3$ and $\mathbf{S}_4$, of size $2\times2$, that is

$$\underline{\mathbf{S}}(1,:,:,:,:)=\begin{bmatrix}\mathbf{S}_1&\mathbf{S}_3\\\mathbf{S}_2&\mathbf{S}_4\end{bmatrix},\qquad \underline{\mathbf{S}}(2,:,:,:,:)=\begin{bmatrix}\mathbf{S}_2&\mathbf{S}_4\\\mathbf{S}_3&\mathbf{S}_1\end{bmatrix},$$

where

$$\mathbf{S}_1=\begin{bmatrix}1&0\\0&1\end{bmatrix},\quad \mathbf{S}_2=\begin{bmatrix}0&1\\0&0\end{bmatrix},\quad \mathbf{S}_3=\begin{bmatrix}0&0\\0&0\end{bmatrix},\quad \mathbf{S}_4=\begin{bmatrix}0&0\\1&0\end{bmatrix}.$$

The convolution tensor can then be represented in a QTT format of rank-2 [Kazeev et al., 2013] with core tensors $\underline{\mathbf{C}}^{(1)}=\underline{\mathbf{S}}(1,:,\ldots,:)$, $\underline{\mathbf{C}}^{(2)}=\cdots=\underline{\mathbf{C}}^{(D)}=\underline{\mathbf{S}}$, and a last core tensor of appropriate size. This QTT representation is useful for generating a Toeplitz matrix when its generating vector is given in the QTT format. An illustration of the convolution tensor is provided in Figure 1.3.

Example 4  Convolution tensor of fourth-order.

For the convolution tensor of fourth order, i.e., Toeplitz order $N=3$, the elementary core tensor $\underline{\mathbf{S}}$ is given in a block form of the last two indices as

$$\underline{\mathbf{S}}(1,:,\ldots,:),\ \ldots,\ \underline{\mathbf{S}}(3,:,\ldots,:)=\begin{bmatrix}\underline{\mathbf{S}}_5&\underline{\mathbf{S}}_1&\underline{\mathbf{S}}_3\\\underline{\mathbf{S}}_6&\underline{\mathbf{S}}_2&\underline{\mathbf{S}}_4\end{bmatrix},$$

where $\underline{\mathbf{S}}_1,\ldots,\underline{\mathbf{S}}_4$ are third-order tensors of size $2\times2\times2$, $\underline{\mathbf{S}}_5$ and $\underline{\mathbf{S}}_6$ are zero tensors, and

$$\underline{\mathbf{S}}_1=\left[\begin{bmatrix}1&0\\0&0\end{bmatrix}\ \begin{bmatrix}0&1\\1&0\end{bmatrix}\right],\quad \underline{\mathbf{S}}_2=\left[\begin{bmatrix}0&0\\0&0\end{bmatrix}\ \begin{bmatrix}1&0\\0&0\end{bmatrix}\right],$$
$$\underline{\mathbf{S}}_3=\left[\begin{bmatrix}0&0\\0&1\end{bmatrix}\ \begin{bmatrix}0&0\\0&0\end{bmatrix}\right],\quad \underline{\mathbf{S}}_4=\left[\begin{bmatrix}0&1\\1&0\end{bmatrix}\ \begin{bmatrix}0&0\\0&1\end{bmatrix}\right].$$

Finally, the zero-padded convolution tensor has a QTT representation in (1.17) with $\underline{\mathbf{C}}^{(1)}=\underline{\mathbf{S}}(1,:,\ldots,:)$, $\underline{\mathbf{C}}^{(2)}=\cdots=\underline{\mathbf{C}}^{(D)}=\underline{\mathbf{S}}$, and a last core tensor of appropriate size.

#### 1.2.6 Low-rank Representation of Hankel and Toeplitz Matrices/Tensors

The Hankel and Toeplitz foldings are multilinear tensorizations, and can be applied to the BSS problem, as in (1.4). When the Hankel and Toeplitz tensors of the hidden sources are of low rank in some tensor network representation, the tensor of the mixture can be expressed as a sum of low-rank tensor terms.

For example, the Hankel and Toeplitz matrices/tensors of an exponential function, $y(k)=a z^k$, are rank-1 matrices/tensors, and consequently the Hankel matrices/tensors of sums and/or products of exponentials, sinusoids and polynomials will also be of low rank, equal to the degree of the function being considered.

Hadamard Product. More importantly, when the Hankel/Toeplitz tensors of two vectors $\mathbf{x}$ and $\mathbf{y}$ have low-rank CP/TT representations, the Hankel/Toeplitz tensor of their element-wise product, $\mathbf{z}=\mathbf{x}\circledast\mathbf{y}$, can also be represented in the same CP/TT tensor format, since

$$\mathbf{H}(\mathbf{x}\circledast\mathbf{y})=\mathbf{H}(\mathbf{x})\circledast\mathbf{H}(\mathbf{y}),\qquad \mathbf{T}(\mathbf{x}\circledast\mathbf{y})=\mathbf{T}(\mathbf{x})\circledast\mathbf{T}(\mathbf{y}).$$

The CP/TT rank of the Hankel/Toeplitz tensor of $\mathbf{z}$ is therefore not larger than the product of the CP/TT ranks of the corresponding tensors of $\mathbf{x}$ and $\mathbf{y}$.
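These rank properties are straightforward to verify numerically; a sketch with an exponential and a sinusoid (the `hankel` helper and the parameter values are our own):

```python
import numpy as np

def hankel(y, I):
    # Hankel matrix H_{I,J}(y), J = len(y) - I + 1, as in (1.10).
    L = len(y)
    return np.array([[y[i + j] for j in range(L - I + 1)] for i in range(I)])

k = np.arange(8.0)
x = 0.9 ** k                   # exponential: its Hankel matrix has rank 1
s = np.sin(0.5 * k)            # sinusoid:    its Hankel matrix has rank 2

Hx, Hs = hankel(x, 4), hankel(s, 4)
print(np.linalg.matrix_rank(Hx), np.linalg.matrix_rank(Hs))

# Hankelization acts entry-wise on the generating vector, so the Hankel matrix of
# the Hadamard product is the Hadamard product of the Hankel matrices.
print(np.allclose(hankel(x * s, 4), Hx * Hs))  # True
```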

Example 5

If the third-order Hankel tensor of one signal is a rank-3 tensor, and the third-order Hankel tensor of another signal is of rank-2, then the Hankel tensor of their element-wise product has at most rank-6.

Symmetric CP and Vandermonde decompositions. It is important to notice that a Hankel tensor of size $I\times I\times\cdots\times I$ can always be represented by a symmetric CP decomposition

$$\underline{\mathbf{Y}}=\underline{\mathbf{I}}\times_1\mathbf{A}\times_2\mathbf{A}\cdots\times_N\mathbf{A}.$$

Moreover, the tensor also admits a symmetric CP decomposition with a Vandermonde structured factor matrix [Qi, 2015]

$$\underline{\mathbf{Y}}=\mathrm{diag}_N(\boldsymbol{\lambda})\times_1\mathbf{V}^{\mathsf T}\times_2\mathbf{V}^{\mathsf T}\cdots\times_N\mathbf{V}^{\mathsf T}, \qquad (1.19)$$

where $\boldsymbol{\lambda}=[\lambda_1,\ldots,\lambda_R]^{\mathsf T}$ comprises $R$ non-zero coefficients, and $\mathbf{V}$ is a Vandermonde matrix generated from $R$ distinct values $v_1,v_2,\ldots,v_R$

$$\mathbf{V}=\begin{bmatrix}1 & v_1 & v_1^2 & \ldots & v_1^{I-1}\\ 1 & v_2 & v_2^2 & \ldots & v_2^{I-1}\\ \vdots & \vdots & \vdots & \ddots & \vdots\\ 1 & v_R & v_R^2 & \ldots & v_R^{I-1}\end{bmatrix}. \qquad (1.20)$$

By writing the decomposition in (1.19) for the entries of $\mathbf{y}$ (see (1.12)), the Vandermonde decomposition of the Hankel tensor becomes a Vandermonde factorization of $\mathbf{y}$ [Chen, 2016], given by

$$\mathbf{y}=\begin{bmatrix}1 & 1 & \ldots & 1\\ v_1 & v_2 & \ldots & v_R\\ v_1^2 & v_2^2 & \ldots & v_R^2\\ \vdots & \vdots & \ddots & \vdots\\ v_1^{L-1} & v_2^{L-1} & \ldots & v_R^{L-1}\end{bmatrix}\boldsymbol{\lambda}.$$

Observe that the various Vandermonde decompositions of the Hankel tensors of the same vector $\mathbf{y}$, but of different tensor orders $N$, have the same generating Vandermonde values $v_1,\ldots,v_R$. Moreover, the Vandermonde rank, i.e., the minimum $R$ in the decomposition (1.19), cannot exceed the length $L$ of the generating vector $\mathbf{y}$.
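A minimal numerical sketch of the Vandermonde factorization of $\mathbf{y}$ (the values $v_r$ and coefficients $\lambda_r$ are chosen arbitrarily):

```python
import numpy as np

# A vector that is a sum of R = 2 exponentials, y(k) = lam_1*v_1**(k-1) + lam_2*v_2**(k-1),
# admits an exact Vandermonde factorization y = V @ lam, cf. (1.20).
L = 6
v = np.array([0.8, 1.1])
lam = np.array([2.0, -0.5])
V = np.vander(v, L, increasing=True).T   # L x R, columns [1, v_r, v_r**2, ...]
y = V @ lam

# Recover the coefficients from the tall Vandermonde matrix by least squares.
lam_hat = np.linalg.lstsq(V, y, rcond=None)[0]
print(np.allclose(lam_hat, lam))  # True
```

Since the $v_r$ are distinct, $\mathbf{V}$ has full column rank and the least-squares solution recovers $\boldsymbol{\lambda}$ exactly.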

QTT representation of Toeplitz/Hankel tensors. As mentioned previously, the zero-padded convolution tensor of $(N+1)$th-order can be represented in a QTT format of rank at most $N$. Hence, if a vector $\mathbf{y}$ of length $L$ has a QTT representation given by

$$\mathbf{y}=\tilde{\mathbf{Y}}^{(1)}\,|\otimes|\,\tilde{\mathbf{Y}}^{(2)}\,|\otimes|\,\cdots\,|\otimes|\,\tilde{\mathbf{Y}}^{(D+1)}, \qquad (1.21)$$

where $\tilde{\mathbf{Y}}^{(d)}$ is a block matrix of the corresponding core tensor, then, following the relation (1.16) between the convolution tensor and the Toeplitz tensor of the generating vector $\mathbf{y}$, we have

$$\mathbf{T}(\mathbf{y})=\underline{\mathbf{C}}\,\bar\times_{N+1}\,\mathbf{y}.$$