# Sparse Sampling for Inverse Problems with Tensors

We consider the problem of designing sparse sampling strategies for multidomain signals, which can be represented using tensors that admit a known multilinear decomposition. We leverage the multidomain structure of tensor signals and propose to acquire samples using a Kronecker-structured sensing function, thereby circumventing the curse of dimensionality. For designing such sensing functions, we develop low-complexity greedy algorithms based on submodular optimization methods to compute near-optimal sampling sets. We present several numerical examples, ranging from multi-antenna communications to graph signal processing, to validate the developed theory.

## Authors

• 9 publications
• 6 publications
• 10 publications
• 17 publications
• ### Sampling and Reconstruction of Signals on Product Graphs

In this paper, we consider the problem of subsampling and reconstruction...
06/30/2018 ∙ by Guillermo Ortiz-Jiménez, et al. ∙ 0

• ### Provable Near-Optimal Low-Multilinear-Rank Tensor Recovery

We consider the problem of recovering a low-multilinear-rank tensor from...
07/17/2020 ∙ by Jian-Feng Cai, et al. ∙ 0

• ### Multilinear Low-Rank Tensors on Graphs & Applications

We propose a new framework for the analysis of low-rank tensors which li...
11/15/2016 ∙ by Nauman Shahid, et al. ∙ 0

• ### Multi-way Graph Signal Processing on Tensors: Integrative analysis of irregular geometries

Graph signal processing (GSP) is an important methodology for studying a...
06/30/2020 ∙ by Jay S. Stanley III, et al. ∙ 0

• ### Greedy Sampling of Graph Signals

Sampling is a fundamental topic in graph signal processing, having found...
04/05/2017 ∙ by Luiz F. O. Chamon, et al. ∙ 0

• ### Adaptive Granularity in Tensors: A Quest for Interpretable Structure

Data collected at very frequent intervals is usually extremely sparse an...
12/19/2019 ∙ by Ravdeep Pasricha, et al. ∙ 0

• ### Direct imaging for the moment tensor point sources of elastic waves

We investigate an inverse source problem of the time-harmonic elastic wa...
01/26/2021 ∙ by Xianchao Wang, et al. ∙ 0

##### This week in AI

Get the week's most popular data science and artificial intelligence research sent straight to your inbox every Saturday.

## I Introduction

In many engineering and scientific applications, we frequently encounter large volumes of multisensor data, defined over multiple domains, which are complex in nature. For example, in wireless communications, received data per user may be indexed in space, time, and frequency. Similarly, in hyperspectral imaging, a scene measured in different wavelengths contains information from the three-dimensional spatial domain as well as the spectral domain. And also, when dealing with graph data in a recommender system, information resides on multiple domains (e.g., users, movies, music, and so on). To process such multisensor datasets, higher-order tensors or multiway arrays have been proven to be extremely useful. In practice, however, due to limited access to sensing resources, economic or physical space limitations, it is often not possible to measure such multidomain signals using every combination of sensors related to different domains. To cope with such issues, in this work, we propose sparse sampling techniques to acquire multisensor tensor data. Sparse samplers can be designed to select a subset of measurements (e.g., spatial or temporal samples as illustrated in Fig. 0(a)) such that the desired inference performance is achieved. This subset selection problem is referred to as sparse sampling [1]

. An example of this is field estimation, in which the measured field is related to the source signal of interest through a linear model. To infer the source signal, a linear inverse problem is solved. In a resource-constrained environment, since many measurements cannot be taken, it is crucial to carefully select the best subset of samples from a large pool of measurements. This problem is combinatorial in nature and extremely hard to solve in general, even for small-sized problems. Thus, most of the research efforts on this topic focus on finding suboptimal sampling strategies that yield good approximations of the optimal solution

[1, 2, 3, 4, 5, 6, 7, 8, 9, 10].

For signals defined over multiple domains, the dimensionality of the measurements grows much faster. An illustration of this “curse of dimensionality” is provided in Fig. 0(b), wherein the measurements now have to be systematically selected from an even larger pool of measurements. Typically used suboptimal sensor selection strategies are not useful anymore as their complexity is too high; or simply because they need to store very large matrices that do not fit in memory (see Section III for a more detailed discussion). Usually, selecting samples arbitrarily from a multidomain signal, requires that sensors are placed densely in every domain, which greatly increases the infrastructure costs. Hence, we propose an efficient Kronecker-structured sparse sampling strategy for gathering multidomain signals that overcomes these issues. In Kronecker-structured sparse sampling, instead of choosing a subset of measurements from all possible combined domain locations (as in Fig. 0(b)), we propose to choose a subset of sensing locations from each domain and then combine them to obtain multidimensional observations (as illustrated in Fig 0(c)). We will see later that taking this approach will allow us to define computationally efficient design algorithms that are useful in big data scenarios. In essence, the main question addressed in this paper is, how to choose a subset of sampling locations from each domain to sample a multidomain signal so that its reconstruction has the minimum error?

## Ii Preliminaries

In this section, we introduce the notation that will be used throughout the rest of the paper as well as some preliminary notions of tensor algebra and multilinear systems.

### Ii-a Notation

We use calligraphic letters such as to denote sets, and to represent its cardinality. Upper (lower) case boldface letters such as (

) are used to denote matrices (vectors). Bold calligraphic letters such as

denote tensors. represents transposition, conjugate transposition, and the Moore-Penrose pseudoinverse. The trace and determinant operations on matrices are represented by and , respectively.

denotes the minimum eigenvalue of matrix

. We use to represent the Kronecker product, to represent the Khatri-Rao or column-wise Kronecker product; and to represent the Hadamard or element-wise product between matrices. We write the -norm of a vector as and the Frobenius norm of a matrix or tensor as . We denote the inner product between two elements of a Euclidean space as . The expectation operator is denoted by . All logarithms are considered natural. In general, we will denote the product of variables/sets using a tilde, i.e., , or ; and drop the tilde to denote sums (unions), i.e., , or . Some important properties of the Kronecker and the Khatri-Rao products that will appear throughout the paper are [11]: ; ; ; ; and .

### Ii-B Tensors

A tensor of order can be viewed as a discretized multidomain signal, with each of its entries indexed over different domains. Using multilinear algebra two tensors and may be related by a multilinear system of equations as

 \mathbfcalX=\mathbfcalG∙1U1∙2⋯∙RUR, (1)

where represents a set of matrices that relates the th domain of and the so-called core tensor , and represents the th mode product between a tensor and a matrix [12]; see Fig. 1(a). Alternatively, vectorizing (1), we have

 x=(U1⊗⋯⊗UR)g, (2)

with , and .

When the core tensor is hyperdiagonal (as depicted in Fig. 1(b)), (2) simplifies to

 x=(U1⊙⋯⊙UR)g (3)

with collecting the main diagonal entries of . Note that has different meanings in (2) and , which can always be inferred from the context. Such a multilinear system is commonly seen with and

for instance, in image processing when relating an image to its 2-dimensional Fourier transform with

being the spatial Fourier transform of , and and being inverse Fourier matrices related to the row and column spaces of the image, respectively. When dealing with Fourier matrices (more generally, Vandermonde matrices) with and a diagonal tensor core, will be a Toeplitz covariance matrix, for which the sampling sets may be designed using sparse covariance sensing [13, 14].

## Iii Problem modeling

We are concerned with the design of optimal sampling strategies for an th order tensor signal , which admits a multilinear parameterization in terms of a core tensor (dense or diagonal) of smaller dimensionality. We assume that the set of system matrices are perfectly known, and that each of them is tall, i.e., for , and has full column rank. Sparse sampling a tensor is equivalent to selecting entries of . Let denote the set of indices of . Then, a particular sample selection is determined by a subset of selected indices such that (subscript “” denotes unstructured). This way, we can denote the process of sampling as a multiplication of by a selection matrix such that

 y=Θ(Lun)x=Θ(Lun)(U1⊗⋯⊗UR)g, (4)

for a dense core [cf. (2)], and

 y=Θ(Lun)x=Θ(Lun)(U1⊙⋯⊙UR)g, (5)

for a diagonal core [cf. (3)]. Here, is a vector containing the selected entries of indexed by the set .

For each case, if and have full column rank, then knowing allows to retrieve a unique least squares solution, , as

 ^g=[Θ(Lun)(U1⊗⋯⊗UR)]†y, (6)

or

 ^g=[Θ(Lun)(U1⊙⋯⊙UR)]†y, (7)

depending on whether is dense or hyperdiagonal. Next, we estimate using either (2) or (3). In many applications, such as transmitter-receiver placement in multiple input multiple output (MIMO) radar, it is not possible to perform sparse sampling in an unstructured manner by ignoring the underlying domains. For these applications, some unstructured sparse sample selections generally require using a dense sensor selection in each domain (as shown in Fig. 0(b)), which produces a significant increase in hardware cost. Also, there is no particular structure in (6) and (7) that may be exploited to compute the pseudo-inverses, thus leading to a high computational cost to estimate . Finally, in the multidomain case, the dimensionality grows rather fast making it difficult to store the matrix or to perform row subset selection. For all these reasons, we will constrain ourselves to the case where the sampling matrix has a compatible Kronecker structure. In particular, we define a new sampling matrix

 Φ(L)\coloneqqΦ1(L1)⊗⋯⊗ΦR(LR), (8)

where each represents a selection matrix for the th factor of , is the set of selected row indices from the matrix for , and and We will use the notation and to denote the number of selected sensors per domain and the total number of selected sensors, respectively; whereas and denote the set of sample indices and the total number of samples acquired with the above Kronecker-structured sampler. In order to simplify the notation, whenever it will be clear, we will drop the explicit dependency of on the set of selected rows , from now on, and simply use . Imposing a Kronecker structure on the sampling scheme means that sampling can be performed independently for each domain. In the dense core tensor case [cf. (2)], we have

 y =(Φ1⊗⋯⊗ΦR)(U1⊗⋯⊗UR)g =(Φ1U1⊗⋯⊗ΦRUR)g=Ψ(L)g, (9)

whereas in the diagonal core tensor case [cf. (3)], we have

 y =(Φ1⊗⋯⊗ΦR)(U1⊙⋯⊙UR)g (10)

As in the unstructured case, whenever (9) or (10) are overdetermined, using least squares, we can estimate the core as

 ^g=[(Φ1U1)†⊗⋯⊗(ΦRUR)†]y, (11)

or

 ^g ×[(Φ1U1)H⊙⋯⊙(ΦRUR)H]y, (12)

and then reconstruct using (2) or (3), respectively. Comparing (11) and (12) to (6) and (7) we can see that leveraging the Kronecker structure of the proposed sampling scheme allows to greatly reduce the computational complexity of the least-squares problem, as the pseudoinverses in (11) and (12) are taken on matrices of a much smaller dimensionality than in (6) and (7). An illustration of the comparison between unstructured sparse sensing and Kronecker-structured sparse sensing is shown in Fig. 3 for . Suppose the measurements collected in

are perturbed by zero-mean white Gaussian noise with unit variance, then the least-squares solution has the inverse error covariance or the Fisher information matrix

that determines the quality of the estimators . Therefore, we can use scalar functions of as a figure of merit to propose the sparse tensor sampling problem

 optimizeL1,…,LRf{T(L)}s. toR∑i=1|Li|=L,L=R⋃i=1Li, (13)

where with “optimize” we mean either “maximize” or “minimize” depending on the choice of the scalar function . Solving (13) is not trivial due to the cardinality constraints. Therefore, in the following, we will propose tight surrogates for typically used scalar performance metrics in design of experiments with which the above discrete optimization problem can be solved efficiently and near optimally. Note that the cardinality constraint in (13) restricts the total number of selected sensors to , without imposing any constraint on the total number of gathered samples . Although the maximum number of samples can be constrained using the constraint , the resulting near-optimal solvers are computationally intense with a complexity of about [15, 16]

. Such heuristics are not suitable for the large-scale scenarios of interest.

### Iii-a Prior art

Choosing the best subset of measurements from a large set of candidate sensing locations has received a lot of attention, particularly for , usually under the name of sensor selection/placement, which also is more generally referred to as sparse sensing [1]. Typically sparse sensing design is posed as a discrete optimization problem that finds the best sampling subset by optimizing a scalar function of the error covariance matrix. Some of the popular choices are, to minimize the mean squared error (MSE): or the frame potential: , or to maximize or . In this work, we will focus on the frame potential criterium as we will show later that this metric leads to very efficient sampler designs. Depending on the strategy used to solve the optimization problem (13

) we can classify the prior art in two categories: solvers based on convex optimization, and greedy methods that leverage submodularity. In the former category,

[2] and [8] present several convex relaxations of the sparse sensing problem for different optimality criteria for inverse problems with linear and non-linear models, respectively. In particular, due to the Boolean nature of the sensor selection problem (i.e., a sensor is either selected or not), its related optimization problem is not convex. However, these constraints and the constraint on the number of selected sensors can be relaxed, and once the relaxed convex problem is solved, a thresholding heuristic (deterministic or randomized) can be used to recover a Boolean solution. Despite its good performance, the complexity of convex optimization solvers is rather high (cubic with the dimensionality of the signal). Therefore, the use of convex optimization approaches to solve the sparse sensing problem in large-scale scenarios, such as the sparse tensor sampling problem, gets even more computationally intense. For high-dimensional scenarios, greedy methods (algorithms that select one sensor at a time) are more useful. A greedy algorithm scales linearly with the number of sensors, and if one can prove submodularity of the objective function, its solution has a multiplicative near-optimality guarantee [17]. Several authors have followed this strategy and have proved submodularity of different optimality criteria such as D-optimality [4], mutual information [3], and frame potential [5]. All of them for the case . Besides parameter estimation, sparse sensing has also been studied for other common signal processing tasks, like detection [9, 6] and filtering [10, 7]. In a different context, the extension of Compressed Sensing (CS) to multidomain signals has been extensively studied [18, 19, 20, 21]. CS is many times seen as a complementary sampling framework to sparse sensing [1], wherein CS the focus is on recovering a sparse signal rather than on designing a sparse measurement space.

### Iii-B Our contributions

In this paper, we extend the sparse sampling framework to multilinear inverse problems. We refer to it as “sparse tensor sampling”. We focus on two particular cases, depending on the structure of the core tensor :

• Dense core: Whenever the core tensor is non-diagonal, sampling is performed based on (9). We will see that to ensure identifiability of the system, we need to select more entries in each domain than the rank of its corresponding system matrix, i.e., as a necessary condition we require sensors, where are the dimensions of the core tensor .

• Diagonal core: Whenever the core tensor is diagonal, sampling is performed based on (10). The use of the Khatri-Rao product allows for higher compression. In particular, under mild conditions on the entries of the factor matrices, we can guarantee identifiability of the sampled equations using sensors, where is the length of the edges of the hypercubic core .

For both the cases, we propose efficient greedy algorithms to compute a near-optimal sampling set.

### Iii-C Paper outline

The remainder of the paper is organized as follows. In Sections IV and V, we develop solvers for the sparse tensor sampling problem with dense and diagonal core tensors, respectively. In Section VI, we provide a few examples to illustrate the developed framework. Finally, we conclude this paper by summarizing the results in Section VII.

## Iv Dense core sampling

In this section, we focus on the most general situation when is an unstructured dense tensor. Our objective is to design the sampling sets by solving the discrete optimization problem (13). We formulate the sparse tensor sampling problem using the frame potential as a performance measure. Following the same rationale as in [5], but for multidomain signals, we will argue that the frame potential is a tight surrogate of the MSE. By doing so, we will see that when we impose a Kronecker structure on the sampling scheme, as in (8), the frame potential of can be factorized in terms of the frame potential of the different domain factors. This allows us to propose a low complexity algorithm for sampling tensor data. Throughout this section, we will use tools from submodular optimization theory. Hence, we will start by introducing the main concepts related to submodularity in the next subsection.

### Iv-a Submodular optimization

Submodularity is a notion based on the law of diminishing returns [22] that is useful to obtain heuristic algorithms with near-optimality guarantees for cardinality-constrained discrete optimization problems. More precisely, submodularity is formally defined as follows.

###### Definition 1 (Submodular function [22]).

A set function defined over the subsets of is submodular if, for every and , we have

 f(X∪{x})−f(X)≥f(X∪{x,y})−f(X∪{y}).

A function is said to be supermodular if is submodular.

Besides submodularity, many near-optimality theorems in discrete optimization require functions to be also monotone non-decreasing, and normalized.

###### Definition 2 (Monotonicity).

A set function is monotone non-decreasing if, for every ,

 f(X∪{x})≥f(X)∀x∈N∖X
###### Definition 3 (Normalization).

A set function is normalized if .

In submodular optimization, matroids are generally used to impose constraints on an optimization, such as the ones in (13). A matroid generalizes the concept of linear independence in algebra to sets. Formally, a matroid is defined as follows.

###### Definition 4 (Matroid [23]).

A finite matroid is a pair , where is a finite set and is a family of subsets of that satisfies: 1) The empty set is independent, i.e., ; 2) For every , if , then ; and 3) For every with and there exists one such that .

In this paper we will deal with the following types of matroids.

###### Example 1 (Uniform matroid [23]).

The subsets of with at most elements form a uniform matroid with .

###### Example 2 (Partition matroid [23]).

If form a partition of then with defines a partition matroid.

###### Example 3 (Truncated partition matroid [23]).

The intersection of a uniform matroid and a partition matroid defines a truncated partition matroid .

The matroid-constrained submodular optimization problem

 maximizeX⊆Nf(X)subject toX∈T⋂i=1Ii (14)

can be solved near optimally using Algorithm 1. This result is formally stated in the following theorem.

###### Theorem 1 (Matroid-constrained submodular maximization[24]).

Let be a monotone non-decreasing, normalized, submodular set function, and be a set of matroids defined over . Furthermore, let denote the optimal solution of (14), and let be the solution obtained by Algorithm 1. Then

 f(Xgreedy)≥1T+1f(X⋆).

### Iv-B Greedy method

The frame potential [25] of the matrix is defined as the trace of the Grammian matrix with . The frame potential can be related to the MSE, using [5]

 c1FP(Ψ(L))λ2max{T(L)}≤MSE(Ψ(L))≤c2FP(Ψ(L))λ2min{T(L)}, (15)

where , and are constants that depend the data model. From the above bound, it is clear that by minimizing the frame potential of one can minimize the MSE, which is otherwise difficult to minimize as it is neither convex, nor submodular. The frame potential of can be expressed as the frame potential of its factors . To show this, recall the definition of the frame potential as

 FP(Ψ(L)) =tr{TH(L)T(L)} =tr{TH1T1⊗⋯⊗THRTR}, (16)

where . Now, using the fact that for any two matrices and we have , we can expand (16) as

 FP(Ψ(L))=R∏i=1tr{THiTi}=R∏i=1FP(Ψi(Li)).

For brevity, we will write the above expression alternatively as an explicit function of the selection sets :

 F(L) \coloneqqFP(Ψ(L))=R∏i=1Fi(Li)\coloneqqR∏i=1FP(Ψi(Li)) (17)

Expression (17) shows again the advantage of working with a Kronecker-structured sampler: instead of computing every cross-product between the columns of to compute the frame potential, we can arrive to the same value using the frame potential of .

#### Iv-B1 Submodularity of F(L)

Function as defined in (17) does not directly meet the conditions [cf. Theorem 1] required for near optimality of the greedy heuristic, but it can be modified slightly to satisfy them. In this sense, we define the function on the subsets of as

 G(S)\coloneqqF(N)−F(N∖S) (18)

where recall that , , and . Therefore, form a partition of . It is clear that if we make the change of variables from to maximizing over is the same as minimizing the frame potential over . However, working with the complement set results in a set function that is submodular and monotone non-decreasing, as shown in the next theorem. Consequently, satisfies the conditions of the near-optimality theorems.

###### Theorem 2.

The set function defined in (18) is a normalized, monotone non-decreasing, submodular function for all subsets of .

###### Proof.

See Appendix -A. ∎

With this result we can now claim near-optimality of the greedy algorithm that solves the cardinality constrained maximization of . However, as we said, minimizing the frame potential only makes sense as long as (15) is tight. In particular, whenever is singular we know that the MSE is infinity, and hence (15) is meaningless. For this reason, next to the cardinality constraint in (13) that limits the total number of sensors, we need to ensure that has full column rank, i.e., for . In terms of , this is equivalent to

 |Si|=|Ni∖Li|≤Ni−Kii=1,…,R, (19)

where this set of constraints forms a partition matroid [cf. Example 2 from Definition 4]. Hence, we can introduce the following submodular optimization problem as surrogate for the minimization of the frame potential

 maximizeS⊆NG(S)s. toS∈Iu∩Ip (20)

with and . Theorem 1 gives, therefore, all the ingredients to assess the near-optimality of Algorithm 1 applied on (20), for which the results are particularized as the following corollary.

###### Corollary 1.

The greedy solution to (20) obtained from Algorithm 1) is -near-optimal, i.e.,

###### Proof.

Follows from Theorem 1, and since (20) has (truncated-partition) matroid constraint. ∎

Next, we compute an explicit bound with respect to the frame potential of , which is the objective function we initially wanted to minimize. This bound is given in the following theorem.

###### Theorem 3.

The greedy solution to (20) obtained from Algorithm 1 is near optimal with respect to the frame potential as with , and , being the th row of .

###### Proof.

Obtained similar to the bound in [5], but specialized for (17) and 1/2-near-optimality; see [26] for details. ∎

As with the case in [5], is heavily influenced by the frame potential of . Specifically, approximation gets tighter when is small or the core tensor dimensionality decreases.

#### Iv-B2 Computational complexity

The running time of Algorithm 1 applied to solve (20) can greatly be reduced by precomputing the inner products between the rows of every before starting the iterations. This has a complexity of for each domain. Once these inner products are computed, in each iteration we need to find times the maximum over elements. Since we run iterations, the complexity of all iterations is , with . Therefore, the total computational complexity of the greedy method is with .

#### Iv-B3 Practical considerations

Due to the characteristics of the greedy iterations, the algorithm tends to give solutions with a very unbalanced cardinality. In particular, for most situations, the algorithm chooses one of the domains in the first few iterations and empties that set till it hits the identifiability constraint of that domain. Then, it proceeds to another domain and empties it as well, and so on. This is due to the objective function, which is a product of smaller objectives. Indeed, if we are asked to minimize a product of two elements by subtracting a value from them, it is generally better to subtract from the smallest element. Hence, if this minimization is performed multiple times we will tend to remove always from the same element. The consequences of this behavior are twofold. On the one hand, this greedy method tends to give a sensor placement that yields a very small number of samples , as we will also see in the simulations. Therefore, when comparing this method to other sensor selection schemes that produce solutions with a larger it generally ranks worse in MSE for a given . On the other hand, the solution of this scheme tends to be tight on the identifiability constraints for most of the domains, thus hampering the performance on those domains. This implication, however, has a simple solution. By introducing a small slack variable to the constraints, we can obtain a sensor selection which is not tight on the constraints. This amounts to solving the problem

 maximizeS⊆NG(S) (21) s. to |S|=N−L,|S∩Ni|≤Ni−Ki−αi,i=1,…,R.

Tuning allows to regularize the tradeoff between compression and accuracy of the greedy solution. We conclude this section with a remark on an alternative performance measure.

###### Remark.

As a alternative performance measure, one can think on maximizing the set function . Although this set function can be shown to be submodular over all subsets of  [26], the related greedy algorithm cannot be constrained to always result in an identifiable system after subsampling. Thus, its applicability is more limited than the frame potential formulation; see [26] for a more detailed discussion.

## V Diagonal core sampling

So far, we have focused on the case when is dense and has no particular structure. In that case, we have seen that we require at least sensors to recover our signal with a finite MSE. In many cases of interest, admits a structure. In particular, in this section, we investigate the case when is a diagonal tensor. Under some mild conditions on the entries of , we can leverage the structure of to further increase the compression. As before, we develop an efficient and near-optimal greedy algorithm based on minimizing the frame potential to design the sampling set.

### V-a Identifiability conditions

In contrast to the dense core case, the number of unknowns in a multilinear system of equations with a diagonal core does not increase with the tensor order, whereas for a dense core it grows exponentially. This means that when sampling signals with a diagonal core decomposition, one can expect a stronger compression. To derive the identifiability conditions for (10), we present the result from [27] as the following theorem.

###### Theorem 4 (Rank of Khatri-Rao product [27]).

Let and be two matrices with no all-zero column. Then,

 rank(A⊙B)≥max{rank(A),rank(B)}.

Based on the above theorem, we can give the following sufficient conditions for identifiability of the system (10).

###### Corollary 2.

Let denote the maximum number of zero entries in any column of . If for every we have , and there is at least one with , then has full column rank.

###### Proof.

Selecting rows from each ensures that no will have an all-zero column. Then, if for at least one we have , then due to Theorem 4 we have

 rank(Ψ(L)) ≥maxi=1,…,R{rank(Ψi)} =max{rank(Ψj),maxi≠j{rank(Ψj)}}=Kc.

Therefore, in order to guarantee identifiability we need to select rows from any factor matrix , and from the other factors with . In many scenarios, we usually have since no entry in will exactly be zero. In those situations we will require to select at least elements.

### V-B Greedy method

As we did for the case with a dense core, we start by finding an expression for the frame potential of a Khatri-Rao product in terms of its factors. The Grammian matrix of a diagonal core tensor decomposition has the form

 T =ΨHΨ=(Ψ1⊙⋯⊙ΨR)H(Ψ1⊙⋯⊙ΨR) =ΨH1Ψ1∘⋯∘ΨHRΨR=T1∘⋯∘TR.

Using this expression, the frame potential of a Khatri-Rao product becomes

 FP(Ψ)=tr{THT}=∥T∥2F=∥T1∘⋯∘TR∥2F. (22)

For brevity, we will denote the frame potential as an explicit function of the selected set as

 P(L)\coloneqqFP(Ψ(L))=∥T1(L1)∘⋯∘TR(LR)∥2F. (23)

Unlike in the dense core case, the frame potential of a Khatri-Rao product cannot be separated in terms of the frame potential of its factors. Instead, (22) decomposes the frame potential using the Hadamard product of the Grammian of the factors.

#### V-B1 Submodularity of P(L)

Since does not directly satisfy the conditions [cf. Theorem 1] required for near optimality of the greedy heuristic, we propose using the following set function as a surrogate for the frame potential

 Q(S)\coloneqqP(N)−P(N∖S) (24)

with and

###### Theorem 5.

The set function defined in (24) is a normalized, monotone non-decreasing, submodular function for all subsets of .

###### Proof.

See Appendix -B. ∎

Using and imposing the identifiability constraints defined in Section V-A we can write the related optimization problem for the minimization of the frame potential as

 maximizeS⊆NQ(S)s. toS∈Iu∩Ip (25)

where and with and . Here, the choice of is arbitrary, and can be set depending on the application. For example, with some space-time signals it is more costly to sample space than time, and, in those cases, is generally chosen for the temporal domain. This is a submodular maximization problem with a truncated partition matroid constraint [cf. Example 2 from Definition 4]. Thus, from Theorem 1, we know that greedy maximization of (25) using Algorithm 1 has a multiplicative near-optimal guarantee.

###### Corollary 3.

The greedy solution to (25) obtained using Algorithm 1 is -near-optimal, i.e., Here, denotes the optimal solution of (25).

Similar to the dense core case, we can also provide a bound on the near-optimality of of the greedy solution with respect to the frame potential.

###### Theorem 6.

The solution set obtained from Algorithm 1 is near optimal with respect to the frame potential as with and .

###### Proof.

Based on the proof of Theorem 3. The bound is obtained using (22) instead of (17) in the derivation. ∎

#### V-B2 Computational complexity

The computational complexity of the greedy method is now governed by the complexity of computing the Grammian matrices . This can greatly be improved if before starting the iterations, one precomputes all the outer products in . Doing this has a computational complexity of . Then, in every iteration, the evaluation of would only cost operations. Further, because in every iteration we need to query elements on each domain, and we run the algorithm for iterations, the total time complexity of the iterations is . This term dominates over the complexity of the precomputations, and thus can be treated as the worst case complexity of the greedy method.

#### V-B3 Practical considerations

The proposed scheme suffers from the same issues as in the dense core case. Namely, it tends to empty the domains sequentially, thus producing solutions which are tight on the identifiability constraints. Nevertheless, as we indicated for the dense core, the drop in performance associated with the proximity of the solutions to the constraints can be reduced by giving some slack to the constraints.

## Vi Numerical results

In this section111The code to reproduce these experiments can be found at https://gitlab.com/gortizji/sparse_tensor_sensing., we will illustrate the developed framework through several examples. First, we will show some results obtained on synthetic datasets to compare the performance of the different near-optimal algorithms. Then, we will focus on large-scale real-world examples related to (i) graph signal processing:

sampling product graphs for active learning in recommender systems

, and (ii) array processing for wireless communications: multiuser source separation, to show the benefits of the developed framework.

### Vi-a Synthetic example

#### Vi-A1 Dense core

We compare the performance in terms of the theoretical MSE of our proposed greedy algorithm (henceforth referred to as greedy-FP) to a random sampling scheme based on randomly selecting rows of such that the resulting subset of samples also have a Kronecker structure. Only those selections that satisfy the identifiability constraints in (19) are considered valid. We note that the time complexity of evaluating times the MSE for a Kronecker-structured sampler is . For this reason, using many realizations (say, a large number ) of random sampling to obtain a good sparse sampler is computationally intense. To perform this comparison, we draw realizations of three random Gaussian matrices with dimensions , , and . For each of these models, we solve for different number of sensors using greedy-FP. We also compute realizations of random sampling for each . Fig. 4 shows the results of these experiments. The plot on the left shows the performance averaged over the different models against the number of sensors, wherein the blue shaded area represents the 10-90 percentile average interval of the random sampling scheme. The performance values, in dB scale, are normalized by the value of the unsampled MSE. Because the estimation performance is heavily influenced by its related number of samples , and noting the fact that a value of may lead to different , we also present, in the plot on the right side of Fig. 4, the performance comparison for one model realization against the relative number of samples so that differences in the informative quality of the selections are highlighted. The plots in Fig. 4 illustrate some important features of the proposed sparse sampling method. When comparing the performance against the number of sensors, we see that there are areas where greedy-FP performs as well as random sampling. However, when comparing the same results against the number of samples we see that greedy-FP consistently performs better than random sampling. The reason for this disparity is due to characteristics of greedy-FP that we introduced in Section IV-B. Namely, the tendency of greedy-FP to produce sampling sets with the minimum number of samples. On the other hand, the performance curve of greedy-FP shows three bumps (recall that we use ). Again, this is a consequence of greedy-FP trying to meet the identifiability constraints in (20) with equality. As we increase , the solutions of greedy-FP increase in cardinality by adding more elements to a single domain until the constraints are met, and then proceed to the next domain. The bumps in Fig. 4 correspond precisely to these instances.

#### Vi-A2 Diagonal core

We perform the same experiment for the diagonal core case. The results are shown in Fig. 5. Again we see that the proposed algorithm outperforms random sampling, especially when collecting just a few samples. Furthermore, as happened in the dense core case, the performance curve of greedy-FP follows a stairway shape.

### Vi-B Active learning for recommender systems

Current recommendation algorithms seek solving an estimation problem of the form: given the past recorded preferences of a set of users, what is the rating that these would give to a set of products? In this paper, in contrast, we focus on the data acquisition phase of the recommender system, which is also referred to as active learning/sampling. In particular, we claim that by carefully designing which users to poll and on which items, we can obtain an estimation performance on par with the state-of-the-art methods, but using only a fraction of the data that current methods require, and using a simple least-squares estimator. We showcase this idea on the MovieLens dataset [28] that contains partial ratings of users over movies which are stored in a second-order tensor . At this point, we emphasize the need for our proposed framework, since it is obvious that designing an unstructured sampling set with about 1.5 million candidate locations is unfeasible with current computing resources. A model of in the form of (1) can be obtained by viewing as a signal that lives on a graph. In particular, the first two modes of can be viewed as a signal defined on the Cartesian product of a user and movie graph, respectively. These two graphs, shown in Fig. 6, are provided in the dataset and are two 10-nearest-neighbors graphs created based on the user and movie features. Based on the recent advances in graph signal processing (GSP) [29, 30], can be decomposed as . Here, and are the eigenbases of the Laplacians of the user and movie graphs, respectively, and is the so-called graph spectrum of  [29, 30]. Suppose the energy of the spectrum of is concentrated in the first few and columns of and , respectively, then admits a low-dimensional representation, or is said to be smooth or bandlimited with respect to the underlying graph [30]. This property has been exploited in [31, 32]

to impute the missing entries in

. In contrast, we propose a scheme for sampling and reconstruction of signals defined on product graphs. In our experiments, we set , and obtain the decomposition , where and consist of the first and columns of and , respectively; and .

For the greedy algorithm we use and , resulting in a selection of users and movies, i.e., a total of vertices in the product graph. Fig. 6, shows the sampled users and movies, i.e., users to be probed for movie ratings. The user graph [cf. Fig. 5(a)] is made out of small clusters connected in a chain-like structure, resulting in a uniformly spread distribution of observed vertices. On the other hand, the movies graph [cf. Fig. 5(b)] is made out of a few big and small clusters. Hence, the proposed active querying scheme assigns more observations to the bigger clusters and fewer observations to the smaller ones. To evaluate the performance of our algorithm, we compute the RMSE of the estimated data using the test mask provided by the dataset. Nevertheless, since our active query method requires access to ground truth data (i.e., we need access to the samples at locations suggested by the greedy algorithm) which is not provided in the dataset, we use GRALS [34] to complete the matrix, and use its estimates when required. A comparison of our algorithm to the performance of the state-of-the-art methods run on the same dataset is shown in Table I. In light of these results, it is clear that a proper design of the sampling set allows to obtain top performance with significantly fewer ratings, i.e., about an order of magnitude, and using a much simpler non-iterative estimator.

### Vi-C Multiuser source separation

In multiple-input multiple-output (MIMO) communications [37], the use of rectangular arrays [38] allows to separate signals coming from different azimuth and elevation angles, and it is common that users transmit data using different spreading codes to reduce the interference from other sources. Reducing hardware complexity by minimizing the number of antennas and samples to be processed is an important concern in the design of MIMO receivers. This design can be seen as a particular instance of sparse tensor sampling. We consider a scenario with users located at different angles of azimuth () and elevation () transmitting using unique spreading sequences of length . The receiver consists of a uniform rectangular array (URA) with antennas located on a grid. Each time instant, every antenna receives [38]

 x(r,l,m,n) =Kc∑k=1sk(r)ck(l)ej2πnΔxsinθkej2πmΔysinϕk +w(r,l,m,n),

where the symbol transmitted by user in the th symbol period; the th sample of the spreading sequence of the th user; and the antenna separations in wavelengths of the URA in the and dimensions, respectively; and and the azimuth and elevation coordinates of user , respectively; and where represents an additive white Gaussian noise term with zero mean and variance . For the -th symbol period, all these signals can be collected in a 3rd-order tensor that can be decomposed as

 \mathbfcalX(r)=\mathbfcalS(r)∙1U1∙2U2∙3U3+\mathbfcalW(r)

where and are the array responses for the and directions, respectively; contains the spreading sequences of all users in its columns; and is a diagonal tensor that stores the symbols of all users for the th symbol period on its diagonal.

We simulate this setup using users that transmit BPSK symbols with different random powers and that are equispaced in azimuth and elevation. We use a rectangular array with and for the ground set locations of the antennas, and binary random spreading sequences of length . With these parameters, each has entries. We generate many realizations of these signals for different levels of signal-to-noise ratio (SNR) and sample the resulting tensors using the greedy algorithm for the diagonal core case with , resulting in a relative number of samples of . The results are depicted in Fig. 7, where the blue shaded area represents the MSE obtained with the best and worst random samplers. As expected, the MSE of the reconstruction decreases exponentially with the SNR. For a given MSE, achieving maximum compression requires transmitting with a higher SNR of about 30dB than the one needed for no compression. Besides, we see that our proposed greedy algorithm consistently performs as well as the best random sampling scheme.

## Vii Conclusions

In this paper, we presented the design of sparse samplers for inverse problems with tensors. We have seen that by using samplers with a Kronecker structure we can overcome the curse of dimensionality, and design efficient subsampling schemes that guarantee a good performance for the reconstruction of multidomain tensor signals. We presented sparse sampling design methods for cases in which the multidomain signals can be decomposed using a multilinear model with a dense core or a diagonal core. For both cases, we have provided a near-optimal greedy algorithm based on submodular optimization methods to compute the sampling sets.

### -a Proof of Theorem 2

In order to simplify the derivations, let us introduce the notation so that can also be written

 G(