# Joint Activity Detection and Channel Estimation for IoT Networks: Phase Transition and Computation-Estimation Tradeoff

Massive device connectivity is a crucial communication challenge for Internet of Things (IoT) networks, which consist of a large number of devices with sporadic traffic. In each coherence block, the serving base station needs to identify the active devices and estimate their channel state information for effective communication. By exploiting the sparsity pattern of data transmission, we develop a structured group sparsity estimation method to simultaneously detect the active devices and estimate the corresponding channels. This method significantly reduces the signature sequence length while supporting massive IoT access. To determine the optimal signature sequence length, we study the phase transition behavior of the group sparsity estimation problem. Specifically, user activity can be successfully estimated with a high probability when the signature sequence length exceeds a threshold; otherwise, it fails with a high probability. The location and width of the phase transition region are characterized via the theory of conic integral geometry. We further develop a smoothing method to solve the high-dimensional structured estimation problem with a given limited time budget. This is achieved by sharply characterizing the convergence rate in terms of the smoothing parameter, signature sequence length and estimation accuracy, yielding a trade-off between the estimation accuracy and computational cost. Numerical results are provided to illustrate the accuracy of our theoretical results and the benefits of smoothing techniques.


## I Introduction

The explosion of small and cheap computing devices endowed with sensing and communication capability is paving the way towards the era of Internet of Things (IoT), which is expected to improve people’s daily life and bring socio-economic benefits. For example, connecting the automation systems of intelligent buildings to the Internet makes it possible to control and manage different smart devices to save energy and improve the convenience for residents [1]. Other applications include smart home, smart city and smart health care [1]. To provide ubiquitous connectivity for such IoT based applications, massive machine-type communications and ultra-reliable and low latency communications become critical in the upcoming 5G networks [2, 3]. In particular, in many scenarios, there are huge numbers of devices to be connected to the Internet via the base station (BS). Thus supporting massive device connectivity is a crucial requirement for IoT networks [4, 5, 6].

Existing cellular standards, including 4G LTE [7], are unable to support massive IoT connectivity. Furthermore, the acquisition of the channel state information that is needed for the effective transmissions will bring huge overheads, and thus will make IoT communications even more challenging [5]. Fortunately, the IoT data traffic is typically sporadic, i.e., only a few devices are active at any given instant out of all the devices [8]. For example, in sensor networks, a device is typically designed to stay in the sleep mode and is triggered only by external events in order to save energy. By exploiting the sparsity in the device activity pattern, it is possible to design efficient schemes to support simultaneous device activity detection and channel estimation. As it is not feasible to assign orthogonal signature sequences to all the devices, this paper studies the Joint Activity Detection and channel Estimation (JADE) problem considering non-orthogonal signature sequences [9, 10].

### I-A Related Work

A growing body of literature has recently proposed various methods to deal with massive device connectivity and the high-dimensional channel estimation problem. Compressed sensing (CS) based channel estimation techniques have been proposed by exploiting the sparsity of channel structures in time, frequency, angular and Doppler domains [11, 12, 13]. The spatial and temporal prior information was further exploited to solve the high-dimensional channel estimation problem in dense wireless cooperative networks [14]. However, in IoT networks with a limited channel coherence time, it is critical to further exploit the sparsity in the device activity pattern to enhance the channel estimation performance [3, 10], thereby reducing the training overhead. Due to the large-scale nature of IoT communications, it is also critical to develop efficient algorithms to address the computation issue.

The sporadic device activity detection problem has recently been investigated. In the context of cellular networks, the random access scheme was investigated in [15, 16] to deal with the significant overhead incurred by the massive number of devices. In the random access scheme, a connection between an active device and the BS is established if the orthogonal signature sequence randomly selected by the active device is not used by other devices. This scheme, however, normally causes collisions among a huge number of devices. To support a massive number of devices, we thus focus on the non-orthogonal multiple access (NOMA) scheme [9], which is able to simultaneously serve multiple devices via non-orthogonal resource allocation. The opportunities and challenges of NOMA for supporting massive connectivity are investigated in [9]. Furthermore, network densification [17] turns out to be a promising way to improve network capacity, enable low-latency mobile applications and support massive device connectivity by deploying more radio access points in IoT networks [18].

The information theoretical capacity for massive connectivity was studied in [19]. The sparsity activity pattern yields a compressed sensing based formulation [10, 20] to detect the active devices and estimate the channels. Recall that the channel state information (CSI) refers to the channel propagation coefficients that describe how a signal propagates between transmitters and receivers. In particular, in the related statements of “prior knowledge of CSI”, CSI refers to the distribution information. Assuming perfect channel state information (CSI), a sparsity-exploiting maximum a posteriori probability (S-MAP) criterion for multi-user detection in CDMA systems was developed in [20]. The authors of [21, 22] considered the multi-user detection problem with the aid of channel prior-information. In [10, 23, 24], a joint design of channel estimation and user activity detection via the approximate message passing (AMP) algorithm was developed, which leverages the statistical channel information and large-scale fading coefficients to enhance the Bayesian AMP algorithm with rigorous performance analysis. However, our approach does not require prior information of the distribution of CSI to reduce the signaling overhead. When assuming no prior knowledge of the distribution of CSI, the joint user detection and channel estimation approach for cloud radio access network via the ADMM algorithm was proposed in [25] without performance analysis.

In this paper, to eliminate the overheads of acquiring large-scale fading coefficients and statistical channel information, we propose a structured group sparsity estimation approach to solve the JADE problem without prior knowledge of the distribution of CSI. To determine the optimal signature sequence length, we provide precise characterization for the phase transition behaviors in the structured group sparsity estimation problem. Although the bounds on the multi-user detection error in the non-orthogonal multiple access system have been presented in [22] based on the restricted isometry property [26], the order-wise estimates are normally not accurate enough for practitioners. A convex geometry approach was thus introduced in [27] to provide sharp estimates of the number of required measurements for exact and robust recovery of structured signals. However, this approach can only provide the success conditions for signal recovery guarantees. Subsequently, the phase transition of a regularized linear inverse problem with random measurements was studied in [28, 29] based on the theory of conic integral geometry [30], which established both the success and failure conditions for signal recovery. In particular, the location and width of the transition are essentially controlled by the statistical dimension of a descent cone associated with the convex regularizers. However, these results are only applicable in the real domain. It is not yet clear how to apply the appealing methodology developed in [28] to provide sharp phase transition results for the high-dimensional estimation problem in the complex domain in IoT networks, which will be pursued in this paper.

The large number of devices in IoT networks raises unique computational challenges when solving the JADE problem with a fixed time budget. Unfortunately, second-order methods like the interior point method are inapplicable to large-scale optimization problems due to their poor scalability. In contrast, first-order methods, e.g., gradient methods, proximal methods [31], the alternating direction method of multipliers (ADMM) algorithm [32, 33], the fast ADMM algorithm [34] and Nesterov-type algorithms [35], are particularly useful for solving large-scale problems. Therefore, we focus on first-order methods in this paper. Furthermore, one way to minimize the computational complexity is to reduce the cost of each iteration by sketching approaches [36, 37]. However, this method is often suitable for solving an over-determined system instead of the under-determined linear system in our case. A different approach is to accelerate the convergence rate without increasing the computational cost of each iteration. It was shown in [38] that with more data it is possible to increase the step-size in the projected gradient method, thereby achieving a faster convergence rate. The authors of [39] showed that by modifying the original iterations, it is possible to achieve faster convergence rates while maintaining the estimation accuracy without increasing the computational cost of each iteration considerably. More generally, smoothing techniques such as convex relaxation [40] or simply adding a nice smooth function to smooth the non-differentiable objective function [41, 35, 42] often achieve a faster convergence rate. However, the amount of smoothing should be chosen carefully to guarantee the performance of sporadic device activity detection in IoT networks. In this paper, the smoothing method will be exploited to solve the high-dimensional group sparsity estimation problem with a fixed time budget by accelerating the convergence rate. This yields a trade-off between the computational cost and estimation accuracy, as increasing the smoothing parameter will normally reduce the estimation accuracy. The trade-off framework further provides guidelines for choosing the signature sequence length to maintain the estimation accuracy.

### I-B Applications in IoT Systems

The approach proposed in this paper applies to a large number of applications in IoT systems. For instance, detecting active devices enhances data transmission efficiency in dynamic IoT networks [43] and wireless sensor networks. The proposed computation-estimation trade-off techniques are particularly suitable for real-time wireless IoT networks, e.g., vehicular networks [44], as well as for providing fault-tolerant communication and supporting high QoS and QoE requirements [45] with low estimation errors. While the lower computational complexity comes at the cost of relatively high estimation errors, it reduces energy consumption significantly, and thus is suitable for energy-sensitive applications [46]. In addition, the proposed approaches can be jointly designed with secure access methods, which enables smart applications of IoT devices, especially those related to healthcare [47].

### I-C Contributions

The major contributions of the paper are summarized as follows:

• By exploiting sparsity in the device activity pattern, we propose a structured group sparsity estimation approach to solve the JADE problem for massive IoT connectivity. Our method is widely applicable and does not depend on the knowledge of channel statistical information and the large-scale fading coefficients.

• Based on the theory of conic integral geometry, we provide precise prediction for the location and the width of the phase transition region of the sparsity estimation problem via establishing both the failure and success conditions for signal recovery. This result provides theoretical guidelines for choosing the optimal signature sequence length to support massive IoT connectivity and channel estimation. We also provide evidence that massive multiple input multiple output (MIMO) system is particularly suitable for supporting massive IoT connectivity, as the width of the phase transition region can be narrowed to zero asymptotically as the number of BS antennas increases.

• We further compute the statistical dimension of the descent cone of the group sparsity inducing regularizer to determine the phase transition of the high-dimensional group sparsity estimation problem. This analysis relies on transforming the original complex estimation problem into the real domain, which makes the theory of conic integral geometry applicable.

• To solve the high-dimensional group sparsity estimation problem with a fixed time budget, we adopt the smoothing method to smooth the non-differentiable group sparsity inducing regularizer to accelerate the convergence rates. We further characterize the sharp trade-offs between the computational cost and estimation accuracy. This helps guide the signature sequence design to maintain the estimation accuracy for the smoothed estimator. Numerical results shall be provided to show the benefits of smoothing techniques.

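Notations: Uppercase/lowercase boldface letters denote matrices/vectors. For an $m \times n$ matrix $\mathbf{A}$, we denote its $i$-th row by $\mathbf{a}^i$ and its $j$-th column by $\mathbf{a}_j$. Let $\mathbf{A}_{\mathcal{V}}$ denote the row submatrix of $\mathbf{A}$ consisting of the rows indexed by $\mathcal{V}$. The operators $(\cdot)^{\mathsf{T}}$, $\|\cdot\|_2$, $\|\cdot\|_F$, $\Re(\cdot)$, and $\Im(\cdot)$ stand for transpose, Euclidean norm, Frobenius norm, real part, and imaginary part, respectively. $\mathbf{x} \sim \mathcal{N}(\mu, \sigma^2)$ denotes that each element in $\mathbf{x}$ follows an i.i.d. normal distribution with mean $\mu$ and variance $\sigma^2$.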

## II System Model and Problem Formulation

### II-A System Model and Problem Formulation

We consider an IoT network with one BS serving $N$ single-antenna IoT devices, where the BS is equipped with $M$ antennas. The channel vector from device $i$ to the BS is denoted by $\mathbf{h}_i \in \mathbb{C}^{M}$, $i = 1, \dots, N$. With sporadic communications, only a few devices are active out of all devices [8], as shown in Fig. 1. We consider a synchronized wireless system with block fading. That is, each device stays either active or inactive for the whole coherence block. In each block, we define the device activity indicator as follows: $a_i = 1$ if device $i$ is active, otherwise $a_i = 0$. Furthermore, we define the set of active devices within a coherence block as $\mathcal{S} = \{i : a_i = 1\}$, with $|\mathcal{S}|$ denoting the number of active devices.

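For uplink transmission in a coherence block with length $T$, we consider the Joint Activity Detection and channel Estimation (JADE) problem. Specifically, the received signal at the BS is given by

$$\mathbf{y}(\ell) = \sum_{i=1}^{N} \mathbf{h}_i a_i q_i(\ell) + \mathbf{n}(\ell) = \sum_{i \in \mathcal{S}} \mathbf{h}_i q_i(\ell) + \mathbf{n}(\ell), \tag{1}$$

for all $\ell = 1, \dots, L$. Here, $L$ is the length of the signature sequence, $q_i(\ell) \in \mathbb{C}$ is the signature symbol transmitted from device $i$ at time slot $\ell$, $\mathbf{y}(\ell) \in \mathbb{C}^{M}$ is the received signal at the BS, and $\mathbf{n}(\ell) \in \mathbb{C}^{M}$ is the additive noise distributed as $\mathcal{CN}(\mathbf{0}, \sigma^2 \mathbf{I})$.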

With massive devices and a limited channel coherence block, the length of the signature sequence is typically smaller than the total number of devices, i.e., $L < N$. It is thus impossible to assign mutually orthogonal sequences to all the devices. As suggested in [10], we generate the signature sequences from an i.i.d. complex Gaussian distribution with zero mean and variance one, i.e., each device $i$ is assigned a unique signature sequence $q_i(1), \dots, q_i(L)$ with $q_i(\ell) \sim \mathcal{CN}(0, 1)$. Notice that these sequences are non-orthogonal.

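Let $\mathbf{Y} = [\mathbf{y}(1), \dots, \mathbf{y}(L)]^{\mathsf{T}} \in \mathbb{C}^{L \times M}$ denote the received signal across $M$ antennas, $\mathbf{H} = [\mathbf{h}_1, \dots, \mathbf{h}_N]^{\mathsf{T}} \in \mathbb{C}^{N \times M}$ be the channel matrix from all the devices to the BS antennas, and $\mathbf{Q} = [\mathbf{q}(1), \dots, \mathbf{q}(L)]^{\mathsf{T}} \in \mathbb{C}^{L \times N}$ be the known signature matrix with $\mathbf{q}(\ell) = [q_1(\ell), \dots, q_N(\ell)]^{\mathsf{T}}$. We rewrite (1) as

$$\mathbf{Y} = \mathbf{Q}\mathbf{A}\mathbf{H} + \mathbf{N}, \tag{2}$$

where $\mathbf{A} = \mathrm{diag}(a_1, \dots, a_N)$ is the diagonal activity matrix and $\mathbf{N} = [\mathbf{n}(1), \dots, \mathbf{n}(L)]^{\mathsf{T}} \in \mathbb{C}^{L \times M}$ is the additive noise matrix. Our goal is to jointly estimate the channel matrix $\mathbf{H}$ and detect the activity matrix $\mathbf{A}$.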

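Let $\boldsymbol{\Theta}_0 = \mathbf{A}\mathbf{H} \in \mathbb{C}^{N \times M}$ with $\mathbf{A}$ as the sparse diagonal activity matrix. Matrix $\boldsymbol{\Theta}_0$ thus has the structured group sparsity pattern in its rows [48]. The linear measurement model (2) can be further rewritten as

$$\mathbf{Y} = \mathbf{Q}\boldsymbol{\Theta}_0 + \mathbf{N}. \tag{3}$$

To estimate the group row sparse matrix $\boldsymbol{\Theta}_0$, we introduce the following convex group sparsity inducing norm (i.e., the mixed $\ell_1/\ell_2$-norm) in the form of [48]

$$\mathcal{R}(\boldsymbol{\Theta}) := \sum_{i=1}^{N} \|\boldsymbol{\theta}^i\|_2, \tag{4}$$

where $\boldsymbol{\theta}^i$ is the $i$-th row of matrix $\boldsymbol{\Theta}$. This norm helps induce a group sparsity structure in the solution. The resulting group sparse matrix estimation problem, i.e., the JADE problem, can thus be formulated as the following convex optimization problem:

$$\mathscr{P}: \quad \underset{\boldsymbol{\Theta} \in \mathbb{C}^{N \times M}}{\text{minimize}} \ \mathcal{R}(\boldsymbol{\Theta}) \quad \text{subject to} \ \|\mathbf{Q}\boldsymbol{\Theta} - \mathbf{Y}\|_F \le \epsilon, \tag{5}$$

where $\epsilon$ is an upper bound on $\|\mathbf{N}\|_F$ and is assumed to be known a priori. Given the estimate matrix $\hat{\boldsymbol{\Theta}}$, the activity matrix can be recovered as $\hat{\mathbf{A}} = \mathrm{diag}(\hat{a}_1, \dots, \hat{a}_N)$, where $\hat{a}_i = 1$ if $\|\hat{\boldsymbol{\theta}}^i\|_2 \ge \gamma_0$ for a small enough threshold $\gamma_0$; otherwise, $\hat{a}_i = 0$. The estimated channel matrix for the active devices is thus given by $\hat{\mathbf{H}}$ with its $i$-th row as $\hat{\mathbf{h}}^i = \hat{\boldsymbol{\theta}}^i$, where $i \in \{j : \hat{a}_j = 1\}$.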
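The pipeline described above (solve a group-sparse program, then threshold the row norms of the estimate) can be sketched in Python with NumPy. This is only an illustration under stated assumptions, not the solver used in the paper: it replaces the constrained problem (5) by its penalized (Lagrangian) form $\tfrac{1}{2}\|\mathbf{Q}\boldsymbol{\Theta} - \mathbf{Y}\|_F^2 + \lambda \mathcal{R}(\boldsymbol{\Theta})$ and applies plain proximal gradient steps. The function names, the penalty weight `lam`, the threshold `gamma0`, and the problem sizes in `simulate` are all hypothetical choices made for this sketch.

```python
import numpy as np

def group_soft_threshold(Theta, t):
    """Row-wise shrinkage: proximal operator of t * sum_i ||theta^i||_2."""
    norms = np.linalg.norm(Theta, axis=1, keepdims=True)
    scale = np.maximum(1.0 - t / np.maximum(norms, 1e-12), 0.0)
    return scale * Theta

def objective(Q, Y, Theta, lam):
    """Penalized objective 0.5*||Q Theta - Y||_F^2 + lam * R(Theta)."""
    resid = Q @ Theta - Y
    return 0.5 * np.linalg.norm(resid) ** 2 + lam * np.linalg.norm(Theta, axis=1).sum()

def solve_group_lasso(Q, Y, lam, n_iter=500):
    """Proximal gradient iterations on the penalized form (an assumed stand-in for (5))."""
    step = 1.0 / np.linalg.norm(Q, 2) ** 2  # 1 / Lipschitz constant of the smooth part
    Theta = np.zeros((Q.shape[1], Y.shape[1]), dtype=complex)
    for _ in range(n_iter):
        grad = Q.conj().T @ (Q @ Theta - Y)
        Theta = group_soft_threshold(Theta - step * grad, step * lam)
    return Theta

def detect_activity(Theta_hat, gamma0):
    """Declare device i active when the i-th row norm reaches the threshold gamma0."""
    return (np.linalg.norm(Theta_hat, axis=1) >= gamma0).astype(int)

def simulate(N=40, M=4, L=24, K=4, sigma=0.01, seed=0):
    """Draw one instance of Y = Q A H + N with K active devices (illustrative sizes)."""
    rng = np.random.default_rng(seed)

    def cn(*shape):
        # i.i.d. circularly symmetric complex Gaussian entries with unit variance
        return (rng.standard_normal(shape) + 1j * rng.standard_normal(shape)) / np.sqrt(2)

    Q, H = cn(L, N), cn(N, M)
    a = np.zeros(N, dtype=int)
    a[rng.choice(N, size=K, replace=False)] = 1
    Y = Q @ (a[:, None] * H) + sigma * cn(L, M)
    return Q, Y, a, H
```

A typical call would be `Q, Y, a, H = simulate()` followed by `a_hat = detect_activity(solve_group_lasso(Q, Y, lam=0.05), gamma0=0.3)`; comparing `a_hat` with `a` gives the detection result, and the rows of the estimate selected by `a_hat` serve as channel estimates. The penalized form is used here only because its proximal step has a closed form; the values of `lam` and `gamma0` would need tuning against the noise level.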

### II-B Problem Analysis

#### II-B1 Phase Transitions

Due to the limited radio resources, it is critical to precisely find the minimal number of signature symbols needed to support massive device access. This can be achieved by precisely revealing the location of the phase transition of the high-dimensional group sparsity estimation problem when solving the convex optimization problem $\mathscr{P}$. Although recent years have seen progress on structured signal estimation [49, 50, 27], these works only provide a success condition for signal recovery without precise phase transition analysis. The recent work [28] provided a principled framework to predict phase transitions (including the location and width of the transition region) for random cone programs [51] via the theory of conic integral geometry. Unfortunately, the approach based on conic integral geometry is only applicable in the real field, and thus cannot be directly applied to problem $\mathscr{P}$ in the complex field. To address this issue, we propose to approximate the original complex estimation problem by a real estimation problem, followed by precise phase transition analysis via conic integral geometry [28]. Theoretical results and numerical experiments will provide evidence that the approximations are quite tight. We shall prove that the locations of phase transitions are determined by the intrinsic geometric invariants (i.e., the statistical dimension) associated with the high-dimensional estimation problem $\mathscr{P}$. In particular, we will show that the width of the transition region can be reduced to zero asymptotically as the number of antennas at the BS goes to infinity. Therefore, massive MIMO is especially well-suited for supporting massive IoT connectivity, since the phase transition location becomes sharply defined.

#### II-B2 Computation and Estimation Trade-offs

To address the computational challenges in massive IoT networks with a limited time budget, we adopt the smoothing method to smooth the non-differentiable group sparsity inducing regularizer to accelerate the convergence rates. The computational speedups can be achieved by projecting onto simpler sets [40], varying the amount of smoothing [42], or adjusting the step sizes [38] applied to the optimization algorithms. However, the computational speedups will normally reduce the estimation accuracy. Based on the phase transition results, we shall propose to control the amount of smoothing to achieve sharp computation and estimation tradeoffs for the smoothed optimization problem via the smoothing method. The smoothed formulation can be further efficiently solved via various efficient first-order methods with cheap iterations and low memory cost, e.g., gradient methods, proximal methods [31], alternating direction method of multipliers (ADMM) algorithm [32], fast ADMM algorithm [34] and Nesterov-type algorithms [35].

## III Precise Phase Transition Analysis

In this section, we study the phase transition phenomenon when solving the JADE problem.

An example of this phenomenon is shown in Fig. 2, where the empirical success probability changes sharply from 0 to 1. In particular, for the numbers of BS antennas, devices, and active devices used in Fig. 2, a signature sequence length of around 30 is sufficient to achieve exact signal recovery. Thus, if we can accurately find the location of the phase transition, we can choose a minimal signature sequence length accordingly to support massive IoT connectivity and channel estimation.

In the following, we provide precise analysis of the location and width of the phase transition region via characterizing both success and failure conditions for signal recovery based on the conic geometry, followed by computing the probability for holding the conic optimality conditions.

### III-A Optimality Condition and Convex Geometry

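We consider the real-valued counterpart of the statistical optimization problem $\mathscr{P}$ as follows:

$$\mathscr{P}_r: \quad \underset{\tilde{\boldsymbol{\Theta}} \in \mathbb{R}^{2N \times M}}{\text{minimize}} \ \mathcal{R}_G(\tilde{\boldsymbol{\Theta}}) \quad \text{subject to} \ \|\tilde{\mathbf{Q}}\tilde{\boldsymbol{\Theta}} - \tilde{\mathbf{Y}}\|_F \le \epsilon, \tag{6}$$

where the linear observation in the real domain is given by

$$\tilde{\mathbf{Y}} = \tilde{\mathbf{Q}}\tilde{\boldsymbol{\Theta}}_0 + \tilde{\mathbf{N}} \tag{7}$$

and the regularizer is defined as $\mathcal{R}_G(\tilde{\boldsymbol{\Theta}}) = \sum_{i=1}^{N} \|\tilde{\boldsymbol{\Theta}}_{\mathcal{V}_i}\|_F$. Here $\tilde{\boldsymbol{\Theta}}_{\mathcal{V}_i}$ is the row submatrix of $\tilde{\boldsymbol{\Theta}}$ consisting of the rows indexed by $\mathcal{V}_i = \{i, N + i\}$.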
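A short NumPy check can make the complex-to-real transformation concrete. The sketch below assumes the usual stacking convention, which is not spelled out in the surviving text: real parts on top of imaginary parts for $\tilde{\boldsymbol{\Theta}}$ and $\tilde{\mathbf{Y}}$, and the block matrix $[\Re\mathbf{Q}, -\Im\mathbf{Q}; \Im\mathbf{Q}, \Re\mathbf{Q}]$ for $\tilde{\mathbf{Q}}$. Under that assumption the group for device $i$ consists of rows $i$ and $N+i$, so each group has $2M$ real entries. The helper names are hypothetical.

```python
import numpy as np

def to_real(Q, Theta):
    """Stack real and imaginary parts (assumed convention for the real-valued model)."""
    Q_r = np.block([[Q.real, -Q.imag], [Q.imag, Q.real]])  # shape 2L x 2N
    Theta_r = np.vstack([Theta.real, Theta.imag])          # shape 2N x M
    return Q_r, Theta_r

def R_complex(Theta):
    """Mixed l1/l2 norm of a complex matrix: sum of row-wise Euclidean norms."""
    return np.linalg.norm(Theta, axis=1).sum()

def R_G(Theta_r):
    """Group norm on the stacked real matrix, with groups V_i = {i, N + i}."""
    N = Theta_r.shape[0] // 2
    return sum(np.linalg.norm(Theta_r[[i, N + i], :]) for i in range(N))
```

With this convention, `Q_r @ Theta_r` reproduces the stacked real and imaginary parts of `Q @ Theta`, and `R_G(Theta_r)` equals `R_complex(Theta)`, which is why the real-valued program keeps the same group structure as the complex one.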

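To facilitate phase transition analysis, problem $\mathscr{P}_r$ can be further approximated as the following structured group sparse estimation problem with group size $2M$:

$$\mathscr{P}_{\text{approx}}: \quad \underset{\tilde{\boldsymbol{\Theta}} \in \mathbb{R}^{2N \times M}}{\text{minimize}} \ \mathcal{R}_G(\tilde{\boldsymbol{\Theta}}) \quad \text{subject to} \ \|\bar{\mathbf{Q}}\tilde{\boldsymbol{\Theta}} - \tilde{\mathbf{Y}}\|_F \le \epsilon, \tag{8}$$

where $\bar{\mathbf{Q}} \in \mathbb{R}^{2L \times 2N}$ is a Gaussian random matrix. The phase transition of the approximated problem $\mathscr{P}_{\text{approx}}$ is empirically demonstrated to coincide with that of the original problem $\mathscr{P}_r$ [53, 13], which has a structured distribution in the measurement matrix $\tilde{\mathbf{Q}}$. This will be further verified in the numerical experiments in Section V. Additionally, there is extensive empirical evidence [54, 55] showing that the distribution of the random measurement matrix has little effect on the locations of phase transitions. We thus focus on characterizing the phase transitions of the approximate problem $\mathscr{P}_{\text{approx}}$ in the real field.

To make the presentation clear, we first characterize the phase transitions in the noiseless case and then extend the results to the noisy case. In the noiseless case, we rewrite problem $\mathscr{P}_{\text{approx}}$ as follows:

$$\mathscr{P}_a: \quad \underset{\tilde{\boldsymbol{\Theta}} \in \mathbb{R}^{2N \times M}}{\text{minimize}} \ \mathcal{R}_G(\tilde{\boldsymbol{\Theta}}) \quad \text{subject to} \ \tilde{\mathbf{Y}} = \bar{\mathbf{Q}}\tilde{\boldsymbol{\Theta}}. \tag{9}$$

Problem $\mathscr{P}_a$ is said to succeed for exact recovery when it has a unique optimal point $\tilde{\boldsymbol{\Theta}}^*$, which equals the ground truth $\tilde{\boldsymbol{\Theta}}_0$; otherwise, it fails. Here, the phase transition refers to the phenomenon that problem $\mathscr{P}_a$ changes from the failure state to the successful state as the sequence length $L$ increases. In order to establish the optimality condition for problem $\mathscr{P}_a$, we present the following definition in convex analysis [28].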

###### Definition 1.

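(Descent Cone): The descent cone $\mathcal{D}(\mathcal{R}, \mathbf{x})$ of a proper convex function $\mathcal{R}: \mathbb{R}^d \to \mathbb{R} \cup \{\pm\infty\}$ at point $\mathbf{x} \in \mathbb{R}^d$ is the conic hull of the perturbations that do not increase $\mathcal{R}$ near $\mathbf{x}$, i.e.,

$$\mathcal{D}(\mathcal{R}, \mathbf{x}) = \bigcup_{\tau > 0} \{\mathbf{y} \in \mathbb{R}^d : \mathcal{R}(\mathbf{x} + \tau\mathbf{y}) \le \mathcal{R}(\mathbf{x})\}.$$

Let $\mathrm{null}(\bar{\mathbf{Q}}) = \{\mathbf{Z} \in \mathbb{R}^{2N \times M} : \bar{\mathbf{Q}}\mathbf{Z} = \mathbf{0}\}$ denote the null space of the operator $\bar{\mathbf{Q}}$. With the aid of the descent cone [56], we shall establish the necessary and sufficient condition for the success of problem $\mathscr{P}_a$ via convex analysis [27, 28].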

###### Fact 1.

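(Optimality Condition): Let $\mathcal{R}_G$ be a proper convex function. Matrix $\tilde{\boldsymbol{\Theta}}_0$ is the unique optimal solution to problem $\mathscr{P}_a$ if and only if $\mathcal{D}(\mathcal{R}_G, \tilde{\boldsymbol{\Theta}}_0) \cap \mathrm{null}(\bar{\mathbf{Q}}) = \{\mathbf{0}\}$.

Fig. 3 illustrates the geometry of this optimality condition. Problem $\mathscr{P}_a$ succeeds if and only if the null space of $\bar{\mathbf{Q}}$ misses the cone of descent directions of $\mathcal{R}_G$ at the ground truth $\tilde{\boldsymbol{\Theta}}_0$; otherwise it fails, since the optimal solution then differs from $\tilde{\boldsymbol{\Theta}}_0$, as illustrated in Fig. 3 (b). Intuitively, a smaller descent cone leads to a higher probability of successfully recovering $\tilde{\boldsymbol{\Theta}}_0$. It is thus critical to characterize the size of the descent cone to depict the phase transition phenomena.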

Based on the optimality condition, the phase transition problem is transformed into a classic problem in conic integral geometry: what is the probability that a randomly rotated convex cone shares a ray with a fixed convex cone? The Kinematic formula [30] provides an exact formula for computing this probability. However, this exact formula is hard to calculate. We thus present a practical formula that characterizes the phase transition in two intersection cones in terms of the statistical dimension [28].

###### Definition 2.

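(Statistical Dimension): The statistical dimension $\delta(\mathcal{C})$ of a closed convex cone $\mathcal{C}$ in $\mathbb{R}^d$ is defined as:

$$\delta(\mathcal{C}) = \mathbb{E}\left[\|\Pi_{\mathcal{C}}(\mathbf{g})\|_2^2\right],$$

where $\mathbf{g} \in \mathbb{R}^d$ is a standard normal vector, $\|\cdot\|_2$ is the Euclidean norm, and $\Pi_{\mathcal{C}}(\mathbf{x}) = \arg\min\{\|\mathbf{x} - \mathbf{y}\|_2 : \mathbf{y} \in \mathcal{C}\}$ denotes the Euclidean projection onto $\mathcal{C}$.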

The statistical dimension allows us to measure the size of convex cones and is the generalization of the dimension of linear subspaces. We state the approximated conic kinematic formula based on the statistical dimensions of general convex cones [28].
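The definition above can be checked by Monte Carlo on cones whose projection is available in closed form. The sketch below is an illustration only, using two textbook cones rather than the descent cone studied in this paper: the nonnegative orthant in $\mathbb{R}^d$, whose statistical dimension is $d/2$, and a $k$-dimensional coordinate subspace, whose statistical dimension is $k$. All function names are hypothetical.

```python
import random

def stat_dim_mc(project, d, trials=20000, seed=0):
    """Monte Carlo estimate of E[||Pi_C(g)||_2^2] for a standard normal g in R^d."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        g = [rng.gauss(0.0, 1.0) for _ in range(d)]
        p = project(g)
        total += sum(v * v for v in p)
    return total / trials

def project_orthant(g):
    """Euclidean projection onto the nonnegative orthant: clip negatives to zero."""
    return [max(v, 0.0) for v in g]

def make_subspace_projection(k):
    """Projection onto the span of the first k coordinate axes."""
    return lambda g: g[:k] + [0.0] * (len(g) - k)
```

For example, `stat_dim_mc(project_orthant, d=10)` should land near 5 and `stat_dim_mc(make_subspace_projection(3), d=10)` near 3, which matches the reading of the statistical dimension as a generalization of the dimension of a linear subspace.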

###### Theorem 1.

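(Approximate Kinematic Formula): Fix a tolerance $\eta \in (0, 1)$. Let $\mathcal{C}$ and $\mathcal{K}$ be convex cones in $\mathbb{R}^d$, but one of them is not a subspace. Draw a random orthogonal basis $\mathbf{U} \in \mathbb{R}^{d \times d}$. Then

$$\begin{aligned}
\delta(\mathcal{C}) + \delta(\mathcal{K}) \le d - a_\eta\sqrt{d} &\implies \mathbb{P}\{\mathcal{C} \cap \mathbf{U}\mathcal{K} \ne \{\mathbf{0}\}\} \le \eta, \\
\delta(\mathcal{C}) + \delta(\mathcal{K}) \ge d + a_\eta\sqrt{d} &\implies \mathbb{P}\{\mathcal{C} \cap \mathbf{U}\mathcal{K} \ne \{\mathbf{0}\}\} \ge 1 - \eta,
\end{aligned}$$

where $a_\eta := \sqrt{8\log(4/\eta)}$.

This theorem indicates a phase transition in whether two randomly rotated cones share a ray. That is, when the total statistical dimension of the two cones exceeds the ambient dimension $d$, the two randomly rotated cones share a ray with high probability; otherwise, they fail to share a ray with high probability.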

### III-B Phase Transition for Massive IoT Connectivity

Based on the general results in Theorem 1, we shall present the phase transition results for the exact recovery of program $\mathscr{P}_a$ in the noiseless case and robust recovery in the noisy case.

#### III-B1 Phase Transition in the Noiseless Case

To predict phase transitions of program $\mathscr{P}_a$ for signal recovery, we essentially need to compute the probability that the optimality condition in Fact 1 holds. Specifically, for a Gaussian random matrix $\bar{\mathbf{Q}} \in \mathbb{R}^{2L \times 2N}$, its nullity is $2N - 2L$ with probability one. Therefore, the statistical dimension of $\mathrm{null}(\bar{\mathbf{Q}})$, viewed as a subspace of $\mathbb{R}^{2N \times M}$, is $(2N - 2L)M$. By replacing convex cones $\mathcal{C}$ and $\mathcal{K}$ in Theorem 1 by the descent cone $\mathcal{D}(\mathcal{R}_G, \tilde{\boldsymbol{\Theta}}_0)$ and the subspace $\mathrm{null}(\bar{\mathbf{Q}})$, we have the following guarantees for signal recovery via program $\mathscr{P}_a$.

###### Theorem 2.

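(Phase Transition of Problem $\mathscr{P}_a$): Fix a tolerance $\eta \in (0, 1)$. Let $\tilde{\boldsymbol{\Theta}}_0 \in \mathbb{R}^{2N \times M}$ be a fixed matrix. Suppose $\bar{\mathbf{Q}} \in \mathbb{R}^{2L \times 2N}$ has independent standard normal entries, and let $\tilde{\mathbf{Y}} = \bar{\mathbf{Q}}\tilde{\boldsymbol{\Theta}}_0$. Then

$$\begin{aligned}
2L \ge \frac{\delta(\mathcal{D}(\mathcal{R}_G, \tilde{\boldsymbol{\Theta}}_0))}{M} + \frac{a_\eta\sqrt{2NM}}{M} &\implies \mathbb{P}\{\mathscr{P}_a \text{ succeeds}\} \ge 1 - \eta, \\
2L \le \frac{\delta(\mathcal{D}(\mathcal{R}_G, \tilde{\boldsymbol{\Theta}}_0))}{M} - \frac{a_\eta\sqrt{2NM}}{M} &\implies \mathbb{P}\{\mathscr{P}_a \text{ succeeds}\} \le \eta,
\end{aligned}$$

where $a_\eta := \sqrt{8\log(4/\eta)}$.

The above theorem indicates that $\mathscr{P}_a$ indeed exhibits a phase transition at the signature sequence length $L = \delta(\mathcal{D}(\mathcal{R}_G, \tilde{\boldsymbol{\Theta}}_0))/(2M)$. The transition from failure to success occurs across a sharp range of $L$ with width $a_\eta\sqrt{2NM}/M$. The phase transition location is thus quite accurate. We will show that the size of the descent cone of $\mathcal{R}_G$ at a point $\tilde{\boldsymbol{\Theta}}_0$ depends solely on its sparsity level.

There are mainly two implications of Theorem 2. First, in the absence of noise, the proposed formulation allows perfect signal recovery with high probability once the signature sequence length exceeds the phase transition region, and fails with high probability below it. Second, increasing the number of antennas at the BS narrows the phase transition region. In particular, the width of the transition region can be reduced to zero asymptotically as the number of antennas at the BS goes to infinity. Therefore, massive MIMO is particularly suitable for supporting massive IoT connectivity, since the phase transition location can be predicted accurately.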
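The two inequalities of Theorem 2 translate directly into a pair of signature lengths that bracket the transition region. The sketch below is a hedged illustration: it takes the constant $a_\eta = \sqrt{8\log(4/\eta)}$ from the conic integral geometry framework cited as [28] (an assumption, since the constant did not survive in this text), and it treats the statistical dimension `delta` as a given input. The function names and the example numbers are hypothetical.

```python
import math

def a_eta(eta):
    """Tolerance constant, assumed to be sqrt(8 * log(4 / eta)) as in [28]."""
    return math.sqrt(8.0 * math.log(4.0 / eta))

def transition_window(delta, N, M, eta=0.05):
    """Return (L_fail, L_succ) implied by Theorem 2.

    For L <= L_fail the program succeeds with probability at most eta;
    for L >= L_succ it succeeds with probability at least 1 - eta.
    delta is the statistical dimension of the descent cone (an input here).
    """
    center = delta / M
    half_width = a_eta(eta) * math.sqrt(2.0 * N * M) / M
    return (center - half_width) / 2.0, (center + half_width) / 2.0
```

For instance, if `delta` grows proportionally to `M` (say `delta = 100 * M` with `N = 100`, purely made-up numbers), the window stays centered at `L = 50` while its width shrinks as `M` increases, which is the narrowing effect attributed above to a larger number of BS antennas.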

The sharp phase transition results are thus able to guide the selection of the signature sequence length. We will compute the statistical dimension of the descent cone for the group sparsity inducing norm in Section III-C.

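#### III-B2 Phase Transition in the Noisy Case

Let $\tilde{\boldsymbol{\Theta}}^*$ be an estimate of the ground truth matrix $\tilde{\boldsymbol{\Theta}}_0$. To evaluate the accuracy of the estimator, we define the average squared prediction error as follows:

$$R(\tilde{\boldsymbol{\Theta}}^*) = \frac{1}{2LM}\|\bar{\mathbf{Q}}\tilde{\boldsymbol{\Theta}}^* - \bar{\mathbf{Q}}\tilde{\boldsymbol{\Theta}}_0\|_F^2. \tag{10}$$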

We further define the estimation error of the estimator as for a given signature matrix and ground truth matrix . We will see this quantity enjoys a phase transition as varies.

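To facilitate efficient analysis in the noisy case, we consider the following formulation:

$$\mathscr{P}_b: \quad \underset{\tilde{\boldsymbol{\Theta}} \in \mathbb{R}^{2N \times M}}{\text{minimize}} \ \|\bar{\mathbf{Q}}\tilde{\boldsymbol{\Theta}} - \tilde{\mathbf{Y}}\|_F^2 \quad \text{subject to} \ \mathcal{R}_G(\tilde{\boldsymbol{\Theta}}) \le \mathcal{R}_G(\tilde{\boldsymbol{\Theta}}_0), \tag{11}$$

which is equivalent to problem $\mathscr{P}_{\text{approx}}$ for some choice of the parameter $\epsilon$. It turns out that this problem also undergoes a phase transition when the length of the signature sequence is picked as $L = \delta(\mathcal{D}(\mathcal{R}_G, \tilde{\boldsymbol{\Theta}}_0))/(2M)$, which coincides with the noiseless case [29]. We shall provide sharp phase transition results for robust group sparse estimation via program $\mathscr{P}_b$ in the following theorem.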

###### Theorem 3.

(Phase Transition of Problem ): Assume matrix satisfies . Let the noise matrix be independent of and with . Let denote the optimal solution to problem . The prediction error and empirical error is defined as , , respectively. Set . Then there exist constants such that

• Whenever ,

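$$\max_{\sigma > 0} \frac{\mathbb{E}_{\tilde{\mathbf{N}}}[R(\tilde{\boldsymbol{\Theta}}^*)]}{\sigma^2} = 1, \tag{12}$$

$$\lim_{\sigma \to 0} \frac{\mathbb{E}_{\tilde{\mathbf{N}}}[\hat{R}(\tilde{\boldsymbol{\Theta}}^*)]}{\sigma^2} = 0, \tag{13}$$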

with probability .

• Whenever ,

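$$\left| \max_{\sigma > 0} \frac{\mathbb{E}_{\tilde{\mathbf{N}}}[R(\tilde{\boldsymbol{\Theta}}^*)]}{\sigma^2} - \frac{\delta}{L} \right| \le \frac{t\sqrt{2NM}}{2LM}, \tag{14}$$

$$\left| \lim_{\sigma \to 0} \frac{\mathbb{E}_{\tilde{\mathbf{N}}}[\hat{R}(\tilde{\boldsymbol{\Theta}}^*)]}{\sigma^2} - \left(1 - \frac{\delta}{L}\right) \right| \le \frac{t\sqrt{2NM}}{2LM}, \tag{15}$$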

with probability .

Here, the probabilities are calculated over the random measurement matrix $\bar{\mathbf{Q}}$.

###### Proof.

Please refer to Appendix A for details. ∎

This theorem describes a phase transition at location in the noisy case, which extends the results in the noiseless case. When the signature sequence length is smaller than , the worst-case estimation error is simply the noise power , and increasing

cannot decrease the estimation error. This means that the regularized linear regression problem is sensitive to noise. After crossing the phase transition, increasing the signature length can reduce the worst-case estimation error at the rate

. The worst-case estimation error is achieved when [29]. It will be verified in section V that the obtained phase transition results accurately depict the phase transition behavior of the original problem . One observation in Theorem 3 is that the behavior of empirical estimation error provides guidance for choosing parameter in problem . Using the worst case empirical estimation error, we can set

 ϵ=σ√2LM−δ(D(~RG,~Θ0)), (16)

provided that a reasonable estimate of the noise power $\sigma^2$ is available.
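As a small illustration of (16), the following sketch computes the constraint level from a noise standard deviation and an estimate of the statistical dimension. The function name `residual_tolerance` and the argument `stat_dim` (standing for the value of the statistical dimension appearing in (16)) are hypothetical names introduced here, and the numbers in the usage line are arbitrary.

```python
import math


def residual_tolerance(sigma, L, M, stat_dim):
    """Constraint level from (16): sigma * sqrt(2*L*M - stat_dim).

    sigma    : noise standard deviation (assumed known or estimated)
    L, M     : signature sequence length and number of antennas
    stat_dim : estimate of the statistical dimension (hypothetical input)
    """
    dof = 2 * L * M - stat_dim
    if dof <= 0:
        raise ValueError("(16) needs 2*L*M to exceed the statistical dimension")
    return sigma * math.sqrt(dof)


# Arbitrary illustrative values, not taken from the paper.
eps = residual_tolerance(sigma=0.1, L=50, M=2, stat_dim=100.0)
```

The guard reflects that the square root in (16) is only defined when the signature sequence is long enough for $2LM$ to exceed the statistical dimension.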

### III-C Computing the Statistical Dimension

Theorem 2 and Theorem 3 allow us to sharply locate the phase transitions for and , respectively, and computing the statistical dimension of the descent cone is the key to evaluating the theoretical results. However, providing a computationally feasible formula for the statistical dimension is itself challenging. We thus provide an accurate estimate and an insightful expression for using the following recipe suggested in [28].

###### Lemma 1.

(The Statistical Dimension of a Descent Cone): Let be a proper convex function and . Assume that the sub-differential is non-empty, compact, and does not contain the origin. Then

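$$\delta(\mathcal{D}(\mathcal{R},x))\le\inf_{\tau\ge 0}\ \mathbb{E}\left[\operatorname{dist}^2\big(g,\tau\cdot\partial\mathcal{R}(x)\big)\right], \tag{17}$$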

where is a standard normal vector.

Although Lemma 1 suggests a general method to study the statistical dimension of a descent cone, additional technical effort is needed to compute an accurate estimate of the statistical dimension of the descent cone for the group sparsity inducing norm adopted in this paper.

###### Proposition 1.

(Statistical Dimension for ): Let be with nonzero rows, and define the normalized sparsity . An upper bound on the statistical dimension of the descent cone of at is given by

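$$\frac{\delta(\mathcal{D}(\mathcal{R}_G;\tilde{\Theta}_0))}{N}\le\inf_{\tau\ge 0}\left\{\rho(2M+\tau^2)+(1-\rho)\frac{2^{1-M}}{\Gamma(M)}\int_{\tau}^{\infty}(u-\tau)^2u^{2M-1}e^{-\frac{u^2}{2}}\,\mathrm{d}u\right\}. \tag{18}$$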

The unique optimum which minimizes the right-hand side of (18) is the solution of

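$$\frac{2^{1-M}}{\Gamma(M)}\int_{\tau}^{\infty}\left(\frac{u}{\tau}-1\right)u^{2M-1}e^{-\frac{u^2}{2}}\,\mathrm{d}u=\frac{\rho}{1-\rho}. \tag{19}$$

###### Proof.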

Please refer to Appendix B for details. ∎

The bound provided in Proposition 1 can be computed numerically in an efficient manner, and can thus be used in Theorem 2 and Theorem 3 to compute the locations of the phase transitions. Note that the bound only depends on the sparsity level of matrix and is observed to be accurate in extensive experiments.
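To make the numerical evaluation concrete, the following is a minimal sketch of how the right-hand side of (18) could be computed with the Python standard library only. It is not the authors' implementation. The quadrature rule (composite Simpson), the truncation of the infinite integration range, the search interval `tau_max`, and all function names are assumptions made for illustration. The sketch relies on the fact that $\frac{2^{1-M}}{\Gamma(M)}u^{2M-1}e^{-u^2/2}$ is the density of a chi random variable with $2M$ degrees of freedom, and that the objective in (18) is convex in $\tau$, so a golden-section search applies.

```python
import math


def chi_pdf(u, M):
    """Density of a chi random variable with 2*M degrees of freedom."""
    if u <= 0.0:
        return 0.0
    log_pdf = ((1 - M) * math.log(2.0) - math.lgamma(M)
               + (2 * M - 1) * math.log(u) - 0.5 * u * u)
    return math.exp(log_pdf)


def tail_term(tau, M, n=4000):
    """Simpson approximation of the integral term in (18), including the
    factor 2^(1-M)/Gamma(M). n must be even."""
    upper = max(tau, math.sqrt(2.0 * M)) + 12.0  # truncation point (assumption)
    h = (upper - tau) / n
    total = 0.0
    for i in range(n + 1):
        u = tau + i * h
        w = 1.0 if i in (0, n) else (4.0 if i % 2 else 2.0)
        total += w * (u - tau) ** 2 * chi_pdf(u, M)
    return total * h / 3.0


def bound_at(tau, rho, M):
    """Objective inside the infimum of (18) for a fixed tau."""
    return rho * (2 * M + tau * tau) + (1 - rho) * tail_term(tau, M)


def stat_dim_bound(rho, M, tau_max=30.0, tol=1e-6):
    """Minimise bound_at over tau in [0, tau_max] by golden-section search.

    Returns (bound, tau); the bound is normalised by N as in (18)."""
    inv_phi = (math.sqrt(5.0) - 1.0) / 2.0
    lo, hi = 0.0, tau_max
    a = hi - inv_phi * (hi - lo)
    b = lo + inv_phi * (hi - lo)
    fa, fb = bound_at(a, rho, M), bound_at(b, rho, M)
    while hi - lo > tol:
        if fa < fb:
            hi, b, fb = b, a, fa
            a = hi - inv_phi * (hi - lo)
            fa = bound_at(a, rho, M)
        else:
            lo, a, fa = a, b, fb
            b = lo + inv_phi * (hi - lo)
            fb = bound_at(b, rho, M)
    tau = 0.5 * (lo + hi)
    return bound_at(tau, rho, M), tau


# Illustrative values only: 10% of the rows are nonzero, M = 2.
value, tau_opt = stat_dim_bound(rho=0.1, M=2)
```

A quick sanity check on the sketch: at $\tau=0$ the integral term equals the second moment of the chi variable, $2M$, so the objective evaluates to $2M$ for every $\rho$, and the minimized bound can never exceed $2M$ per device.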

## IV Sharp Computation and Estimation Trade-offs via Smoothing Method

In an IoT network with a massive number of devices, it becomes critical to solve the JADE problem under a fixed time budget. To address the computational challenges for solving the high-dimensional group sparsity estimation problem, we adopt the smoothing method to smooth the non-differentiable group sparsity inducing regularizer to accelerate the convergence rates. We further characterize the sharp trade-off between the computational cost and estimation accuracy. This provides guidelines on choosing the optimal signature sequences to maintain the estimation accuracy for the smoothed group sparsity estimator.

### IV-A Accelerating Convergence Rate via Smoothing

Adding a smooth function to “smooth” the non-differentiable objective function is a well-known idea in sparse optimization, which makes the regularized problem easier to solve [41, 42]. In particular, for problem , we augment the regularizer by adding a smoothing function $\frac{\mu}{2}\|\Theta\|_F^2$, where $\mu$ is a positive scalar called the smoothing parameter. Problem is thus smoothed as

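$$\mathcal{P}_s:\quad \underset{\Theta\in\mathbb{C}^{N\times M}}{\text{minimize}}\ \ \tilde{\mathcal{R}}(\Theta):=\mathcal{R}(\Theta)+\frac{\mu}{2}\|\Theta\|_F^2 \quad \text{subject to}\quad \|Q\Theta-Y\|_F\le\epsilon, \tag{20}$$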

which can be rewritten in the real domain as follows,

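$$\mathcal{P}_{\tilde{r}}:\quad \underset{\tilde{\Theta}\in\mathbb{R}^{2N\times M}}{\text{minimize}}\ \ \tilde{\mathcal{R}}_G(\tilde{\Theta}) \quad \text{subject to}\quad \|\tilde{Q}\tilde{\Theta}-\tilde{Y}\|_F\le\epsilon, \tag{21}$$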

where , , and are given in problem (8).

As problem is not differentiable, applying the subgradient method to solve it would yield a slow convergence rate. Fortunately, the dual formulation of problem benefits from smoothing techniques, as the smoothed dual problem can be reduced to an unconstrained problem whose composite objective function consists of a convex, smooth function and a convex, nonsmooth function. This composite form can be solved by a rich set of first-order methods such as Auslender and Teboulle’s algorithm [57], Nesterov’s 2007 algorithm (N07) [58], and Lan, Lu, and Monteiro’s modification of N07 (LLM) [59], and these algorithms have the ( is the numerical accuracy) convergence rate [60, 35].

The dual problem of is given by

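$$\underset{Z,\,t}{\text{maximize}}\ \ D(Z,t):=\inf_{\tilde{\Theta}}\left\{\tilde{\mathcal{R}}(\tilde{\Theta})-\langle Z,\tilde{Q}\tilde{\Theta}-\tilde{Y}\rangle-t\epsilon\right\} \quad \text{subject to}\quad \|Z\|_F\le t,$$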

where and . Since , eliminating the dual variable , we obtain the unconstrained problem as follows

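$$\underset{Z\in\mathbb{R}^{2N\times M}}{\text{minimize}}\ \ D(Z):=-\inf_{\tilde{\Theta}}\left\{\tilde{\mathcal{R}}(\tilde{\Theta})-\langle Z,\tilde{Q}\tilde{\Theta}-\tilde{Y}\rangle-\epsilon\|Z\|_F\right\}.$$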

The dual objective function can be further represented as the following composite function

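$$D(Z)=\underbrace{-\inf_{\tilde{\Theta}}\left\{\tilde{\mathcal{R}}(\tilde{\Theta})-\langle Z,\tilde{Q}\tilde{\Theta}\rangle\right\}-\langle Z,\tilde{Y}\rangle}_{\tilde{D}(Z)}+\underbrace{\epsilon\|Z\|_F}_{H(Z)}. \tag{22}$$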

Function is differentiable and its gradient is

$$\nabla\tilde{D}(Z)=-\tilde{Y}+\tilde{Q}\tilde{\Theta}_Z,$$

where

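$$\tilde{\Theta}_Z:=\underset{\tilde{\Theta}}{\arg\min}\left\{\tilde{\mathcal{R}}(\tilde{\Theta})-\langle Z,\tilde{Q}\tilde{\Theta}\rangle\right\}. \tag{23}$$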

Furthermore, is Lipschitz continuous with Lipschitz constant upper bounded by . That is to say, the dual objective $D(Z)$ is the sum of the smooth function $\tilde{D}(Z)$ and the nonsmooth function $H(Z)$. This composite form (22) can be solved by a rich set of first-order methods [35], whose convergence is particularly sensitive to the smoothing parameter $\mu$: a larger value of the smoothing parameter leads to a faster convergence rate.

In particular, we present Lan, Lu, and Monteiro’s algorithm [59] in Algorithm 1 as a typical example to show the benefits of smoothing.

In Algorithm 1, one step computes the solution to (23), and two further steps compute the solutions to the following composite gradient mappings, respectively,

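$$\bar{Z}_{k+1}\leftarrow\underset{Z\in\mathbb{R}^{2N\times M}}{\arg\min}\left\{\langle\nabla\tilde{D}(Z),Z\rangle+\frac{1}{2}t_kL_s\|Z-\bar{Z}_k\|_F+H(Z)\right\},$$

$$Z_{k+1}\leftarrow\underset{Z\in\mathbb{R}^{2N\times M}}{\arg\min}\left\{\langle\nabla\tilde{D}(Z),Z\rangle+\frac{1}{2}L_s\|Z-B_k\|_F+H(Z)\right\}.$$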

The operator $\mathrm{Shrink}$ is given by

$$\mathrm{Shrink}(Z,t)=\max\left\{1-\frac{t}{\|Z\|_F},0\right\}Z.$$

Let . Each row of is given by

$$x_i=\mathrm{Shrink}(z_i,t),\quad \text{for } i=1,\cdots,N.$$
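The shrinkage operator and its row-wise application can be sketched as follows. This is an illustrative implementation on plain Python lists, with the function names `shrink` and `shrink_rows` introduced here; returning the zero vector for a zero input is a convention assumed to avoid a division by zero, since the formula above is undefined at the origin.

```python
import math


def shrink(z, t):
    """Shrink(z, t) = max(1 - t / ||z||, 0) * z for a vector z.

    For a single row the Frobenius norm reduces to the Euclidean norm."""
    norm = math.sqrt(sum(v * v for v in z))
    if norm == 0.0:
        return [0.0 for _ in z]  # convention for the zero vector (assumption)
    scale = max(1.0 - t / norm, 0.0)
    return [scale * v for v in z]


def shrink_rows(Z, t):
    """Apply Shrink to every row z_i of Z, producing the rows x_i."""
    return [shrink(row, t) for row in Z]


# The first row has norm 5 and is scaled by 0.8;
# the second has norm 0.5 < t and is set to zero.
X = shrink_rows([[3.0, 4.0], [0.3, 0.4]], t=1.0)
```

Rows whose norm does not exceed the threshold are zeroed as a whole, which is what produces row-sparse (group-sparse) iterates.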

Let $Z^*$ be an optimal point of (22). Then the convergence behavior of Algorithm 1 satisfies [35],

$$D(Z_{k+1})-D(Z^*)\le\frac{2\|\tilde{Q}\|_2^2\|Z_0-Z^*\|_F^2}{\mu k^2}. \tag{24}$$

Therefore, the number of iterations required to reach accuracy is at most , which implies that a larger $\mu$ results in a faster convergence rate. For each iteration in Algorithm 1, the operators and are computationally cheap, and the dominant cost is the matrix-matrix products involving the signature matrix , which is .

In practice, we terminate the algorithm when the relative primal feasibility gap satisfies for a small enough . A bound on the feasibility gap of the primal iterates at each iteration is given as follows [42],

$$\left|\,\|\tilde{Q}\tilde{\Theta}_k-\tilde{Y}\|_F-\epsilon\,\right|\le\frac{2\|\tilde{Q}\|_2^2\|Z^*\|_F}{\mu k}. \tag{25}$$

Therefore, the number of iterations sufficient for convergence is upper bounded as

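$$k\le\frac{2\|\tilde{Q}\|_2^2\|Z^*\|_F}{\gamma_0\,\mu\,\sigma\sqrt{2LM-\delta(\mathcal{D}(\tilde{\mathcal{R}}_G,\tilde{\Theta}_0))}}, \tag{26}$$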

which shows the number of iterations required for convergence in terms of the smoothing parameter, signature sequence length and solution accuracy. We will show in Fig. 6 that the convergence rate of the smoothed estimator will be accelerated as the smoothing parameter increases.
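The dependence of (26) on the smoothing parameter can be illustrated with a few lines of code. The sketch below simply evaluates the right-hand side of (26), with the constraint level taken from (16). All argument names are hypothetical, the spectral norm `q_norm` and the dual norm `z_norm` are assumed to be known or estimated separately, and the statistical dimension is held fixed here even though it also varies with the smoothing parameter.

```python
import math


def iteration_bound(q_norm, z_norm, gamma0, mu, sigma, L, M, stat_dim):
    """Right-hand side of (26).

    q_norm   : spectral norm of the real-valued signature matrix (assumed given)
    z_norm   : Frobenius norm of a dual optimal point (assumed given)
    gamma0   : tolerance on the relative primal feasibility gap
    mu       : smoothing parameter
    stat_dim : statistical dimension estimate (held fixed in this sketch)
    """
    eps = sigma * math.sqrt(2 * L * M - stat_dim)  # constraint level from (16)
    return 2.0 * q_norm ** 2 * z_norm / (gamma0 * mu * eps)


# Arbitrary illustrative values: doubling mu halves the bound.
k_small_mu = iteration_bound(2.0, 1.5, 0.01, 0.5, 0.1, 50, 2, 100.0)
k_large_mu = iteration_bound(2.0, 1.5, 0.01, 1.0, 0.1, 50, 2, 100.0)
```

With everything else fixed, the bound scales as $1/\mu$, which is the acceleration effect described above.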

### IV-B Computation and Estimation Trade-offs

From a geometric perspective, the smoothing term in (with ) enlarges the sublevel set of the regularizer , which results in a problem that is computationally easier to solve with an accelerated convergence rate. However, this geometric deformation incurs a loss in estimation accuracy according to the phase transition results in Theorem 3. This results in a trade-off between the computational time and the estimation accuracy. The trade-off is controllable given the statistical dimension of the descent cone of the smoothed regularizer . In particular, the statistical dimension can be accurately estimated by the following result.

###### Proposition 2.

(Statistical Dimension Bound for ) Let be with nonzero rows, and define the normalized sparsity as . An upper bound of the statistical dimension of the descent cone of at is given by

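$$\frac{\delta(\mathcal{D}(\tilde{\mathcal{R}}_G;\tilde{\Theta}_0))}{N}\le\inf_{\tau\ge 0}\left\{\rho\left(2M+\tau^2(1+2\mu\bar{a}+\mu^2\bar{b})\right)+(1-\rho)\frac{2^{1-M}}{\Gamma(M)}\int_{\tau}^{\infty}(u-\tau)^2u^{2M-1}e^{-\frac{u^2}{2}}\,\mathrm{d}u\right\}. \tag{27}$$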

The unique optimum which minimizes the right-hand side of (27) is the solution of

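$$\frac{2^{1-M}}{\Gamma(M)}\int_{\tau}^{\infty}\left(\frac{u}{\tau}-1\right)u^{2M-1}e^{-\frac{u^2}{2}}\,\mathrm{d}u=\frac{\rho(1+2\mu\bar{a}+\mu^2\bar{b})}{1-\rho}, \tag{28}$$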

where , .

###### Proof.

Please refer to Appendix C for details. ∎

Note that $\bar{a}$ and $\bar{b}$ can be calculated given the distribution of the ground truth . For instance, with , we have . Here, follows a chi distribution with degrees of freedom and follows a chi-square distribution with degrees of freedom. Hence, we can set , .

Although the convergence rate can be accelerated by increasing the smoothing parameter as shown in the previous subsection, Proposition 2 suggests that a larger smoothing parameter results in a larger statistical dimension, as the bound in (27) grows with $\mu$. This reduces the estimation accuracy for a given signature sequence length according to Theorem 3. Fig. 7 will demonstrate that the estimation error indeed increases as the smoothing parameter becomes large. Therefore, the smoothing method yields a trade-off between the computational cost and the estimation accuracy: increasing the smoothing parameter improves the convergence rate while reducing the estimation accuracy. Such a trade-off is particularly important in scenarios with massive IoT devices and a limited time budget, but without a very stringent requirement on estimation accuracy.
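The growth of the bound (27) with the smoothing parameter can be checked numerically. The sketch below is a self-contained illustration, not the authors' code: the Simpson quadrature, the truncation of the integration range, the ternary search over $\tau$, and the function names are assumptions, and the values `a_bar = 1.0` and `b_bar = 1.0` are placeholders chosen only to exercise the formula (the actual constants depend on the distribution of the ground truth, as noted above).

```python
import math


def chi_tail_term(tau, M, n=2000):
    """Simpson approximation of the integral term in (27), including the
    factor 2^(1-M)/Gamma(M). n must be even."""
    upper = max(tau, math.sqrt(2.0 * M)) + 12.0  # truncation point (assumption)
    h = (upper - tau) / n
    log_c = (1 - M) * math.log(2.0) - math.lgamma(M)
    total = 0.0
    for i in range(n + 1):
        u = tau + i * h
        if u <= 0.0:
            continue  # the integrand vanishes at u = 0
        w = 1.0 if i in (0, n) else (4.0 if i % 2 else 2.0)
        log_pdf = log_c + (2 * M - 1) * math.log(u) - 0.5 * u * u
        total += w * (u - tau) ** 2 * math.exp(log_pdf)
    return total * h / 3.0


def smoothed_bound(rho, M, mu, a_bar, b_bar, tau_max=30.0, iters=60):
    """Minimise the objective of (27) over tau in [0, tau_max] by ternary
    search, using that the objective is convex in tau."""
    c = 1.0 + 2.0 * mu * a_bar + mu * mu * b_bar

    def objective(tau):
        return rho * (2 * M + tau * tau * c) + (1 - rho) * chi_tail_term(tau, M)

    lo, hi = 0.0, tau_max
    for _ in range(iters):
        m1 = lo + (hi - lo) / 3.0
        m2 = hi - (hi - lo) / 3.0
        if objective(m1) < objective(m2):
            hi = m2
        else:
            lo = m1
    return objective(0.5 * (lo + hi))


# Placeholder constants a_bar and b_bar; mu = 0 recovers the bound (18).
bounds = [smoothed_bound(0.1, 2, mu, a_bar=1.0, b_bar=1.0)
          for mu in (0.0, 0.5, 2.0)]
```

Under these placeholder values the computed bounds increase with the smoothing parameter while staying below $2M$, in line with the discussion above.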

### IV-C Discussion

For typical IoT applications, we are particularly interested in reducing the overall computational cost while maintaining the estimation accuracy, which can be achieved by interpreting the above trade-off from another perspective. For the smoothed estimator , Proposition 2 together with Theorem 3 provides guidelines for choosing a minimal signature sequence length that maintains the estimation accuracy for a given smoothing parameter . While smoothing may increase the estimation error, we can compensate by increasing the signature sequence length for the smoothed estimator compared with the original nonsmooth estimator . Specifically, given a smoothing parameter , according to Theorem 3, we are able to maintain the estimation accuracy by choosing the signature sequence length as follows

$$L=\frac{\delta(\mathcal{D}(\tilde{\mathcal{R}}_G(\mu),\tilde{\Theta}_0))}{2M\gamma_1}, \tag{29}$$

where $\gamma_1$ is the expectation of the worst-case estimation accuracy normalized by the noise power.
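As a brief illustration of (29), the sketch below converts a statistical dimension estimate and a target normalized error into a signature sequence length. The function name, the argument names, and the rounding up to an integer length are assumptions introduced here; (29) itself gives a real-valued expression.

```python
import math


def signature_length(stat_dim, M, gamma1):
    """Smallest integer L with L >= stat_dim / (2 * M * gamma1), following (29).

    stat_dim : statistical dimension of the smoothed regularizer (hypothetical input)
    gamma1   : target worst-case estimation error normalised by the noise power
    """
    if gamma1 <= 0.0:
        raise ValueError("gamma1 must be positive")
    return math.ceil(stat_dim / (2 * M * gamma1))


# Arbitrary illustrative values: a larger statistical dimension (for example,
# from stronger smoothing) calls for a longer signature sequence.
L_nonsmooth = signature_length(stat_dim=100.0, M=2, gamma1=0.25)
L_smoothed = signature_length(stat_dim=130.0, M=2, gamma1=0.25)
```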

## V Simulation Results

In this section, we verify the phase transition phenomena in IoT networks characterized by Theorem 2 and Theorem 3 via simulations. We further simulate the developed dual-smoothed algorithm to illustrate the benefits of smoothing, as well as the trade-offs between the estimation accuracy and computational cost.

### V-A Phase Transitions

To verify the phase transition in the noiseless case, we consider the scenario in which the base station is equipped with antennas, and the total number of devices is . For estimation problem in this noiseless setting, the channel matrix and signature matrix are generated as and , respectively. We declare successful recovery if , and we record the success probability from trials. The experiments are performed using the CVX package [52] in Matlab with default settings.

In Fig. 4 (a), we show the probability of successful recovery as a function of the signature sequence length and the number of active devices. The brightness corresponds to the empirical recovery probability (white = 100%, black = 0%). On top of this heat map, the empirical curves of , , are success probabilities calculated from data. It can be seen that the theoretical curve from Theorem 2 closely matches the empirical curve of the success probability.

To verify the phase transition in the noisy case, we consider a scenario where the base station is equipped with antennas, and the total number of devices is . We fix the number of active devices as , hence the theoretical phase transition location is given as . For estimation problem , the channel matrix