I Introduction
The explosion of small and cheap computing devices endowed with sensing and communication capability is paving the way towards the era of the Internet of Things (IoT), which is expected to improve people’s daily life and bring socioeconomic benefits. For example, connecting the automation systems of intelligent buildings to the Internet makes it possible to control and manage different smart devices to save energy and improve convenience for residents [1]. Other applications include smart homes, smart cities and smart health care [1]. To provide the ubiquitous connectivity that such IoT-based applications require, massive machine-type communications and ultra-reliable low-latency communications become critical in the upcoming 5G networks [2, 3]. In particular, in many scenarios a huge number of devices must be connected to the Internet via the base station (BS). Supporting massive device connectivity is thus a crucial requirement for IoT networks [4, 5, 6].
Existing cellular standards, including 4G LTE [7], are unable to support massive IoT connectivity. Furthermore, acquiring the channel state information needed for effective transmission incurs a huge overhead, which makes IoT communications even more challenging [5]. Fortunately, IoT data traffic is typically sporadic, i.e., only a few of the devices are active at any given instant [8]. For example, in sensor networks, a device is typically designed to stay in sleep mode and is triggered only by external events in order to save energy. By exploiting the sparsity in the device activity pattern, it is possible to design efficient schemes that support simultaneous device activity detection and channel estimation. As it is not feasible to assign orthogonal signature sequences to all the devices, this paper studies the Joint Activity Detection and channel Estimation (JADE) problem with non-orthogonal signature sequences [9, 10].
I-A Related Work
A growing body of literature has recently proposed various methods to deal with massive device connectivity and the high-dimensional channel estimation problem. Compressed sensing (CS) based channel estimation techniques have been proposed that exploit the sparsity of channel structures in the time, frequency, angular and Doppler domains [11, 12, 13]. Spatial and temporal prior information was further exploited to solve the high-dimensional channel estimation problem in dense wireless cooperative networks [14]. However, in IoT networks with a limited channel coherence time, it is critical to further exploit the sparsity in the device activity pattern to enhance the channel estimation performance [3, 10], thereby reducing the training overhead. Due to the large-scale nature of IoT communications, it is also critical to develop efficient algorithms to address the computational issue.
The sporadic device activity detection problem has recently been investigated. In the context of cellular networks, the random access scheme was investigated in [15, 16] to deal with the significant overhead incurred by the massive number of devices. In the random access scheme, a connection between an active device and the BS is established if the orthogonal signature sequence randomly selected by the active device is not used by other devices. This scheme, however, normally causes collisions among a huge number of devices. To support a massive number of devices, we thus focus on the non-orthogonal multiple access (NOMA) scheme [9], which is able to simultaneously serve multiple devices via non-orthogonal resource allocation. The opportunities and challenges of NOMA for supporting massive connectivity are investigated in [9]. Furthermore, network densification [17] turns out to be a promising way to improve network capacity, enable low-latency mobile applications and support massive device connectivity by deploying more radio access points in IoT networks [18].
The information-theoretic capacity for massive connectivity was studied in [19]. The sparse activity pattern yields a compressed sensing based formulation [10, 20] to detect the active devices and estimate the channels. Recall that the channel state information (CSI) refers to the channel propagation coefficients that describe how a signal propagates between transmitters and receivers. In statements about “prior knowledge of CSI”, however, CSI refers to the distribution information. Assuming perfect CSI, a sparsity-exploiting maximum a posteriori probability (SMAP) criterion for multiuser detection in CDMA systems was developed in [20]. The authors of [21, 22] considered the multiuser detection problem with the aid of channel prior information. In [10, 23, 24], a joint design of channel estimation and user activity detection via the approximate message passing (AMP) algorithm was developed, which leverages the statistical channel information and large-scale fading coefficients to enhance the Bayesian AMP algorithm with rigorous performance analysis. In contrast, to reduce the signaling overhead, our approach does not require prior information on the distribution of CSI. Without assuming prior knowledge of the distribution of CSI, a joint user detection and channel estimation approach for cloud radio access networks via the ADMM algorithm was proposed in [25], but without performance analysis.
In this paper, to eliminate the overhead of acquiring large-scale fading coefficients and statistical channel information, we propose a structured group sparsity estimation approach to solve the JADE problem without prior knowledge of the distribution of CSI. To determine the optimal signature sequence length, we provide a precise characterization of the phase transition behavior in the structured group sparsity estimation problem. Although bounds on the multiuser detection error in the non-orthogonal multiple access system have been presented in [22] based on the restricted isometry property [26], order-wise estimates are normally not accurate enough for practitioners. A convex geometry approach was thus introduced in [27] to provide sharp estimates of the number of measurements required for exact and robust recovery of structured signals. However, this approach only provides success conditions for signal recovery. Subsequently, the phase transition of a regularized linear inverse problem with random measurements was studied in [28, 29] based on the theory of conic integral geometry [30], which established both the success and failure conditions for signal recovery. In particular, the location and width of the transition are essentially controlled by the statistical dimension of a descent cone associated with the convex regularizer. However, these results are only applicable in the real domain. It is not yet clear how to apply the methodology developed in [28] to obtain sharp phase transition results for the high-dimensional estimation problem in the complex domain in IoT networks, which is pursued in this paper.
The large number of devices in IoT networks raises unique computational challenges when solving the JADE problem with a fixed time budget. Unfortunately, second-order methods such as the interior point method are inapplicable to large-scale optimization problems due to their poor scalability. In contrast, first-order methods, e.g., gradient methods, proximal methods [31], the alternating direction method of multipliers (ADMM) algorithm [32, 33], the fast ADMM algorithm [34] and Nesterov-type algorithms [35], are particularly useful for solving large-scale problems. Therefore, we focus on first-order methods in this paper. One way to reduce the computational complexity is to reduce the cost of each iteration by sketching approaches [36, 37]. However, this method is usually suited to overdetermined systems rather than the underdetermined linear system in our case. A different approach is to accelerate the convergence rate without increasing the computational cost of each iteration. It was shown in [38] that with more data it is possible to increase the step size in the projected gradient method, thereby achieving a faster convergence rate. The authors of [39] showed that by modifying the original iterations, it is possible to achieve faster convergence rates while maintaining the estimation accuracy, without increasing the computational cost of each iteration considerably. More generally, smoothing techniques such as convex relaxation [40], or simply adding a smooth function to smooth the non-differentiable objective function [41, 35, 42], often achieve a faster convergence rate. However, the amount of smoothing should be chosen carefully to guarantee the performance of sporadic device activity detection in IoT networks. In this paper, the smoothing method is exploited to solve the high-dimensional group sparsity estimation problem with a fixed time budget by accelerating the convergence rate. This yields a tradeoff between the computational cost and estimation accuracy, as increasing the smoothing parameter normally reduces the estimation accuracy. The tradeoff framework further provides guidelines for choosing the signature sequence length to maintain the estimation accuracy.
I-B Applications in IoT Systems
The approach proposed in this paper applies to a large number of applications in IoT systems. For instance, detecting active devices enhances data transmission efficiency in dynamic IoT networks [43] and wireless sensor networks. The proposed computation-estimation tradeoff techniques are particularly suitable for real-time wireless IoT networks, e.g., vehicular networks [44], as well as for providing fault-tolerant communication and supporting high QoS and QoE requirements [45] with low estimation errors. While lower computational complexity comes at the cost of relatively high estimation errors, it reduces energy consumption significantly, and is thus suitable for energy-sensitive applications [46]. In addition, the proposed approaches can be jointly designed with secure access methods, which enables smart applications of IoT devices, especially those related to healthcare [47].
I-C Contributions
The major contributions of the paper are summarized as follows:

By exploiting sparsity in the device activity pattern, we propose a structured group sparsity estimation approach to solve the JADE problem for massive IoT connectivity. Our method is widely applicable and does not depend on knowledge of the statistical channel information or the large-scale fading coefficients.

Based on the theory of conic integral geometry, we provide a precise prediction of the location and the width of the phase transition region of the sparsity estimation problem by establishing both the failure and success conditions for signal recovery. This result provides theoretical guidelines for choosing the optimal signature sequence length to support massive IoT connectivity and channel estimation. We also provide evidence that a massive multiple-input multiple-output (MIMO) system is particularly suitable for supporting massive IoT connectivity, as the width of the phase transition region can be narrowed to zero asymptotically as the number of BS antennas increases.

We compute the statistical dimension of the descent cone of the group sparsity inducing regularizer to determine the phase transition of the high-dimensional group sparsity estimation problem. The key step is to transform the original complex estimation problem into the real domain, which makes the theory of conic integral geometry applicable.

To solve the high-dimensional group sparsity estimation problem with a fixed time budget, we adopt the smoothing method to smooth the non-differentiable group sparsity inducing regularizer and thereby accelerate convergence. We further characterize the sharp tradeoffs between the computational cost and estimation accuracy. This helps guide the signature sequence design to maintain the estimation accuracy for the smoothed estimator. Numerical results are provided to show the benefits of smoothing techniques.
Notations: Uppercase/lowercase boldface letters denote matrices/vectors. For an matrix , we denote its row by , its column by . Let denote the row submatrix of consisting of the rows indexed by . The operators stand for transpose, Euclidean norm, Frobenius norm, real part and imaginary part, respectively. denotes that each element in follows i.i.d. normal distribution with mean and variance .
II System Model and Problem Formulation
II-A System Model and Problem Formulation
We consider an IoT network with one BS serving single-antenna IoT devices, where the BS is equipped with antennas. The channel vector from device to the BS is denoted by , . With sporadic communications, only a few devices are active out of all devices [8], as shown in Fig. 1. We consider the synchronized wireless system with block fading. That is, each device is active during a coherence block, and is inactive otherwise. In each block, we define the device activity indicator as follows: if device is active, otherwise . Furthermore, we define the set of active devices within a coherence block as with denoting the number of active devices.
For uplink transmission in a coherence block with length , we consider the Joint Activity Detection and channel Estimation (JADE) problem. Specifically, the received signal at the BS is given by
(1) 
for all . Here, is the length of the signature sequence, is the signature symbol transmitted from device at time slot , is the received signal at the BS, and is the additive noise distributed as .
With a massive number of devices and a limited channel coherence block, the length of the signature sequence is typically smaller than the total number of devices, i.e., . It is thus impossible to assign mutually orthogonal sequences to all the devices. As suggested in [10], we generate the signature sequences from i.i.d. complex Gaussian distribution with zero mean and variance one, i.e., each device is assigned a unique signature sequence . Notice that these sequences are non-orthogonal.
Let denote the received signal across antennas, be the channel matrix from all the devices to the BS antennas, and be the known signature matrix with . We rewrite (1) as
(2) 
where is the diagonal activity matrix and is the additive noise matrix. Our goal is to jointly estimate the channel matrix and detect the activity matrix .
Let with as the sparse diagonal activity matrix. Matrix thus has the structured group sparsity pattern in its rows [48]. The linear measurement model (2) can be further rewritten as
(3) 
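The linear measurement model described above can be sketched numerically. The following is a minimal illustration, not the authors' code: the symbol names (`Q` for the signature matrix, `H` for the channel matrix, `Theta0` for the row-sparse product of the activity matrix and the channel matrix) and all dimensions are assumptions made for this example, and Rayleigh fading channels are assumed only to have something to sample.

```python
import numpy as np

def simulate_jade(n_devices=100, n_antennas=4, seq_len=30, n_active=10,
                  noise_std=0.1, seed=0):
    """Draw one synthetic instance of the model Y = Q @ Theta0 + noise."""
    rng = np.random.default_rng(seed)

    def cgauss(shape, var=1.0):
        # Circularly symmetric complex Gaussian entries with the given variance.
        return np.sqrt(var / 2) * (rng.standard_normal(shape)
                                   + 1j * rng.standard_normal(shape))

    Q = cgauss((seq_len, n_devices))          # known signature matrix
    H = cgauss((n_devices, n_antennas))       # channels (assumed Rayleigh)
    active = np.zeros(n_devices, dtype=bool)  # device activity indicators
    active[rng.choice(n_devices, size=n_active, replace=False)] = True
    Theta0 = active[:, None] * H              # rows of inactive devices are zero
    noise = cgauss((seq_len, n_antennas), var=noise_std ** 2)
    Y = Q @ Theta0 + noise
    return Q, Theta0, Y, active

Q, Theta0, Y, active = simulate_jade()
# With fewer signature symbols than devices, Q has rank at most seq_len,
# so its columns (the signature sequences) cannot be mutually orthogonal.
rank_Q = np.linalg.matrix_rank(Q)
nonzero_rows = int(np.count_nonzero(np.linalg.norm(Theta0, axis=1)))
```

Only the rows of `Theta0` that belong to active devices are nonzero, which is the group (row) sparsity that the estimation problem exploits.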
To estimate the group row sparse matrix , we introduce the following convex group sparse inducing norm (i.e., mixed norm) in the form of [48]
(4) 
where is the th row of matrix . This norm will help to induce a group sparsity structure in the solution. The resulting group sparse matrix estimation problem, i.e., the JADE problem, can thus be formulated as the following convex optimization problem:
(5) 
where is an upper bound on and is assumed to be known a priori. Given the estimate matrix , the activity matrix can be recovered as , where if for a small enough threshold ; otherwise, . The estimated channel matrix for the active devices is thus given by with its th row as where .
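The regularizer and the thresholding step can be illustrated with a short sketch. This is a hedged example under stated assumptions: the function names are hypothetical, the comparison is taken as "row norm greater than or equal to the threshold" (the exact inequality did not survive in the text), and the threshold value is arbitrary.

```python
import numpy as np

def group_norm(Theta):
    """Mixed l2/l1 norm: the sum of the Euclidean norms of the rows."""
    return float(np.linalg.norm(Theta, axis=1).sum())

def detect_activity(Theta_hat, threshold):
    """Mark a device active when its estimated row norm reaches the threshold."""
    row_norms = np.linalg.norm(Theta_hat, axis=1)
    active = row_norms >= threshold
    H_hat = Theta_hat[active]      # channel estimates of the detected devices
    return active, H_hat

# Toy estimate: devices 0 and 3 carry energy, device 1 is numerical residue.
Theta_hat = np.array([[1.0 + 1.0j, 0.5j],
                      [1e-4, -1e-4j],
                      [0.0, 0.0],
                      [-2.0, 1.0 - 1.0j]])
active, H_hat = detect_activity(Theta_hat, threshold=1e-2)
```

Because the norm sums row norms rather than individual entries, it penalizes the number of nonzero rows and leaves the entries within an active row coupled, which matches the device-level sparsity of the problem.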
II-B Problem Analysis
II-B1 Phase Transitions
Due to the limited radio resources, it is critical to precisely find the minimal number of signature symbols needed to support massive device access. This can be achieved by precisely revealing the locations of the phase transition of the high-dimensional group sparsity estimation problem via solving the convex optimization problem . Although recent years have seen progress on structured signal estimation [49, 50, 27], these works only provide a success condition for signal recovery without precise phase transition analysis. The recent work [28] provided a principled framework to predict phase transitions (including the location and width of the transition region) for random cone programs [51] via the theory of conic integral geometry. Unfortunately, the approach based on conic integral geometry is only applicable in the real field, and thus cannot be directly applied to problem in the complex field. To address this issue, we propose to approximate the original complex estimation problem by a real estimation problem, followed by precise phase transition analysis via conic integral geometry [28]. Theoretical results and numerical experiments will provide evidence that the approximations are quite tight. We shall prove that the locations of phase transitions are determined by the intrinsic geometric invariants (i.e., the statistical dimension) associated with the high-dimensional estimation problem . In particular, we will show that the width of the transition region can be reduced to zero asymptotically as the number of antennas at the BS goes to infinity. Therefore, massive MIMO is especially well-suited for supporting massive IoT connectivity, since it allows the phase transition location to be predicted accurately.
II-B2 Computation and Estimation Tradeoffs
To address the computational challenges in massive IoT networks with a limited time budget, we adopt the smoothing method to smooth the non-differentiable group sparsity inducing regularizer and thereby accelerate convergence. Computational speedups can be achieved by projecting onto simpler sets [40], varying the amount of smoothing [42], or adjusting the step sizes [38] applied to the optimization algorithms. However, such speedups normally reduce the estimation accuracy. Based on the phase transition results, we propose to control the amount of smoothing to achieve sharp computation and estimation tradeoffs for the smoothed optimization problem. The smoothed formulation can be solved efficiently via various first-order methods with cheap iterations and low memory cost, e.g., gradient methods, proximal methods [31], the alternating direction method of multipliers (ADMM) algorithm [32], the fast ADMM algorithm [34] and Nesterov-type algorithms [35].
III Precise Phase Transition Analysis
In this section, we study the phase transition phenomenon when solving the JADE problem.
An example of such phenomenon is demonstrated in Fig. 2, from which we see that the empirical success probability changes from to sharply. In particular, this indicates that when the base station is equipped with antennas, the signature sequence length around 30 is sufficient to achieve exact signal recovery for devices where of them are active. Thus if we can accurately find the location of the phase transition, we may choose a minimal signature sequence length accordingly to support massive IoT connectivity and channel estimation.
In the following, we provide a precise analysis of the location and width of the phase transition region by characterizing both success and failure conditions for signal recovery based on conic geometry, followed by computing the probability that the conic optimality conditions hold.
III-A Optimality Condition and Convex Geometry
We consider the real-valued counterpart of the statistical optimization problem as follows:
(6) 
where the linear observation in the real domain is given by
(7) 
and the regularizer is defined as . Here is the row submatrix of consisting of the rows indexed by .
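The passage from the complex problem to a real one can be illustrated with the standard real embedding of a complex linear map. The sketch below is an assumption-laden illustration rather than the authors' exact construction: the stacking order, the names `to_real` and `group_norm_real`, and the grouping of rows `i` and `N + i` are choices made for this example.

```python
import numpy as np

def to_real(Q, Theta, Y):
    """Stack real and imaginary parts so that Y = Q @ Theta becomes real-valued."""
    Q_r = np.block([[Q.real, -Q.imag],
                    [Q.imag,  Q.real]])            # shape (2L, 2N)
    Theta_r = np.vstack([Theta.real, Theta.imag])  # shape (2N, M)
    Y_r = np.vstack([Y.real, Y.imag])              # shape (2L, M)
    return Q_r, Theta_r, Y_r

def group_norm_real(Theta_r):
    """Sum over devices of the Frobenius norm of the row pair {i, N + i}."""
    N = Theta_r.shape[0] // 2
    paired = np.hstack([Theta_r[:N], Theta_r[N:]])  # row i holds both parts
    return float(np.linalg.norm(paired, axis=1).sum())

rng = np.random.default_rng(1)
L, N, M = 6, 10, 3
Q = rng.standard_normal((L, N)) + 1j * rng.standard_normal((L, N))
Theta = rng.standard_normal((N, M)) + 1j * rng.standard_normal((N, M))
Y = Q @ Theta
Q_r, Theta_r, Y_r = to_real(Q, Theta, Y)
```

Under this embedding the real product reproduces the complex one, and the group norm over row pairs equals the complex row-norm sum, so the complex and real formulations share the same objective value at corresponding points.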
To facilitate phase transition analysis, problem can be further approximated as the following structured group sparse estimation problem with group size :
(8) 
where is a Gaussian random matrix. The phase transition of the approximated problem is empirically demonstrated to coincide with that of the original problem [53, 13] with structured distribution in the measurement matrix . This will be further verified in the numerical experiments in Section V. Additionally, there is extensive empirical evidence [54, 55] showing that the distribution of the random measurement matrix has little effect on the locations of phase transitions. We thus focus on characterizing the phase transitions of the approximate problem in the real field.
To make the presentation clear, we first characterize the phase transitions in the noiseless case and then extend the results to the noisy case. In the noiseless case, we rewrite problem as follows:
(9) 
Problem is said to succeed for exact recovery when it has a unique optimal point , which equals the ground truth ; otherwise, it fails. Here, the phase transition refers to the phenomenon that problem changes from the failure state to the successful state as the sequence length increases. In order to establish the optimality condition for problem , we present the following definition in convex analysis [28].
Definition 1.
(Descent Cone): The descent cone of a proper convex function at point is the conic hull of the perturbations that do not increase near , i.e.,
Let denote the null space of the operator . With the aid of the descent cone [56], we shall establish the necessary and sufficient condition for the success of problem via convex analysis [27, 28].
Fact 1.
(Optimality Condition): Let be a proper convex function. Matrix is the unique optimal solution to problem if and only if .
Fig. 3 illustrates the geometry of this optimality condition. Problem succeeds if and only if the null space of misses the cone of descent directions of at the ground truth ; otherwise it fails since the optimal solution is as illustrated in Fig. 3 (b). Intuitively, a smaller descent cone leads to a higher probability of successful recovery of . It is thus critical to characterize the size of the descent cone to depict the phase transition phenomena.
Based on the optimality condition, the phase transition problem is transformed into a classic problem in conic integral geometry: what is the probability that a randomly rotated convex cone shares a ray with a fixed convex cone? The kinematic formula [30] provides an exact expression for this probability. However, this exact formula is hard to evaluate. We thus present a practical formula that characterizes the phase transition for two intersecting cones in terms of the statistical dimension [28].
Definition 2.
(Statistical Dimension): The statistical dimension of a closed convex cone in is defined as:
where is a standard normal vector, is the Euclidean norm, and denotes the Euclidean projection onto .
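The definition lends itself to a direct Monte Carlo check on cones whose projection is available in closed form. The sketch below is illustrative only (the function names and sample sizes are assumptions); it uses two cones with known statistical dimension: the nonnegative orthant in dimension d, whose value is d/2, and a k-dimensional subspace, whose value is k.

```python
import numpy as np

def statistical_dimension_mc(project, dim, n_samples=20000, seed=0):
    """Estimate E ||proj_C(g)||_2^2 for standard normal g by Monte Carlo."""
    rng = np.random.default_rng(seed)
    g = rng.standard_normal((n_samples, dim))
    return float(np.mean(np.sum(project(g) ** 2, axis=1)))

def proj_orthant(g):
    # Projection onto the nonnegative orthant clips negative entries to zero.
    return np.maximum(g, 0.0)

def proj_subspace_first_k(k):
    # Projection onto the span of the first k coordinate axes.
    def project(g):
        out = np.zeros_like(g)
        out[:, :k] = g[:, :k]
        return out
    return project

d = 50
delta_orthant = statistical_dimension_mc(proj_orthant, d)
delta_subspace = statistical_dimension_mc(proj_subspace_first_k(7), d)
```

The subspace case shows why the statistical dimension is described as a generalization of linear dimension, while the orthant case shows that a cone can have a non-trivial "size" strictly between 0 and the ambient dimension.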
The statistical dimension allows us to measure the size of convex cones and is the generalization of the dimension of linear subspaces. We state the approximated conic kinematic formula based on the statistical dimensions of general convex cones [28].
Theorem 1.
(Approximate Kinematic Formula): Fix a tolerance . Let and be convex cones in , at least one of which is not a subspace. Draw a random orthogonal basis . Then
where .
This theorem indicates a phase transition in whether the two randomly rotated cones share a ray. That is, when the total statistical dimension of the two cones exceeds the ambient dimension , the two randomly rotated cones share a ray with high probability; otherwise, they fail to share a ray.
III-B Phase Transition for Massive IoT Connectivity
Based on general results in Theorem 1, we shall present the phase transition results for the exact recovery of the program in the noiseless case and robust recovery in the noisy case.
III-B1 Phase Transition in the Noiseless Case
To predict phase transitions of program for signal recovery, we essentially need to compute the probability that the optimality condition in Fact 1 holds. Specifically, for Gaussian random matrix , its nullity is with probability one. Therefore, the statistical dimension of is . By replacing convex cones and in Theorem 1 by the descent cone and the subspace , we have the following recovery guarantees for signal recovery via program .
Theorem 2.
(Phase Transition of Problem ): Fix a tolerance . Let be a fixed matrix. Suppose , and let . Then
where .
The above theorem indicates that indeed reveals a phase transition when the signature sequence lengths . The transition from failure to success occurs across a narrow range with width . The phase transition location is thus quite accurate. We will show that the size of the descent cone of at a point depends solely on its sparsity level.
There are mainly two implications of Theorem 2. First, in the absence of noise, the proposed formulation allows perfect signal recovery with exponentially high probability if and only if the signature sequence length exceeds the range of the phase transition. Second, increasing the number of antennas at the BS narrows the range of the phase transition. In particular, the width of the transition region can be reduced to zero asymptotically as the number of antennas at the BS goes to infinity. Therefore, massive MIMO is particularly suitable for supporting massive IoT connectivity, since the phase transition location can be predicted accurately.
The sharp phase transition results are thus able to guide the selection of the signature sequence length. We compute the statistical dimension of the descent cone for the group sparse inducing norm in Section III-C.
III-B2 Phase Transition in the Noisy Case
Let be an estimate of the ground truth matrix . To evaluate the accuracy of the estimator, we define the average squared prediction error as follows:
(10) 
We further define the estimation error of the estimator as for a given signature matrix and ground truth matrix . We will see this quantity enjoys a phase transition as varies.
To facilitate efficient analysis in the noisy case, we consider the following formulation:
(11) 
which is equivalent to problem for some choice of the parameter . It turns out that this problem also undergoes a phase transition when the length of the signature sequence is picked as , which is coincident with the noiseless case [29]. We shall provide sharp phase transition results for robust group sparse estimation via program in the following theorem.
Theorem 3.
(Phase Transition of Problem ): Assume matrix satisfies . Let the noise matrix be independent of and with . Let denote the optimal solution to problem . The prediction error and empirical error are defined as , , respectively. Set . Then there exist constants such that

Whenever ,
(12) (13) with probability .

Whenever ,
(14) (15) with probability .
Here, the probabilities are calculated over the random measurement matrix .
Proof.
Please refer to Appendix A for details. ∎
This theorem describes a phase transition at location in the noisy case, which extends the results in the noiseless case. When the signature sequence length is smaller than , the worst-case estimation error is simply the noise power , and increasing cannot decrease the estimation error. This means that the regularized linear regression problem is sensitive to noise. After crossing the phase transition, increasing the signature length can reduce the worst-case estimation error at the rate . The worst-case estimation error is achieved when [29]. It will be verified in Section V that the obtained phase transition results accurately depict the phase transition behavior of the original problem . One observation in Theorem 3 is that the behavior of the empirical estimation error provides guidance for choosing parameter in problem . Using the worst-case empirical estimation error, we can set
(16)
provided a reasonable estimate of noise power .
III-C Computing the Statistical Dimension
Theorem 2 and Theorem 3 allow us to sharply locate the phase transitions for and , respectively, and computing the statistical dimension of the descent cone is the key to evaluating the theoretical results. However, providing a computationally feasible formula for the statistical dimension presents its own challenges. We thus provide an accurate estimate and insightful expression for using the following recipe suggested in [28].
Lemma 1.
(The Statistical Dimension of a Descent Cone): Let be a proper convex function and . Assume that the subdifferential is nonempty, compact, and does not contain the origin. Then
(17) 
where is a standard normal vector.
Although Lemma 1 suggests a general method to study the statistical dimension of a descent cone, additional technical effort is still needed to compute an accurate estimate of the statistical dimension of the descent cone for the group sparsity inducing norm adopted in this paper.
Proposition 1.
(Statistical Dimension for ): Let be with nonzero rows, and define the normalized sparsity . An upper bound on the statistical dimension of the descent cone of at is given by
(18) 
The unique optimum which minimizes the righthand side of (18) is the solution of
(19) 
Proof.
Please refer to Appendix B for details. ∎
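To make the recipe concrete, the sketch below evaluates the standard bound that the recipe yields for a sum-of-group-norms regularizer. It is an illustration under stated assumptions, not a transcription of (18): for a fraction `rho` of nonzero groups of size `group_size`, the expected squared distance to the scaled subdifferential is `rho * (group_size + tau**2) + (1 - rho) * E[max(u - tau, 0)**2]` per group, where `u` follows a chi distribution with `group_size` degrees of freedom. The group size of 4 (two antennas, real and imaginary parts), the grid over `tau`, and the Monte Carlo evaluation of the expectation are all choices made for this example.

```python
import numpy as np

def group_sd_bound(rho, group_size, n_taus=201, tau_max=10.0,
                   n_samples=50000, seed=0):
    """Per-group upper bound on the statistical dimension of the descent cone."""
    rng = np.random.default_rng(seed)
    # The norm of a standard normal vector of length group_size is chi distributed.
    u = np.sqrt(rng.chisquare(group_size, size=n_samples))
    best = np.inf
    for tau in np.linspace(0.0, tau_max, n_taus):
        # Zero groups contribute the expected squared excess of u over tau.
        tail = np.mean(np.maximum(u - tau, 0.0) ** 2)
        value = rho * (group_size + tau ** 2) + (1.0 - rho) * tail
        best = min(best, value)
    return float(best)

group_size = 4   # assumed: two antennas, real and imaginary parts stacked
bounds = [group_sd_bound(rho, group_size) for rho in (0.05, 0.1, 0.3, 1.0)]
```

Multiplying the per-group value by the number of devices gives a predicted phase transition location in the real domain; the bound grows with the fraction of active devices and reaches the full group size when every device is active.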
IV Sharp Computation and Estimation Tradeoffs via Smoothing Method
In an IoT network with a massive number of devices, it becomes critical to solve the JADE problem under a fixed time budget. To address the computational challenges of solving the high-dimensional group sparsity estimation problem, we adopt the smoothing method to smooth the non-differentiable group sparsity inducing regularizer and thereby accelerate convergence. We further characterize the sharp tradeoff between the computational cost and estimation accuracy. This provides guidelines on choosing the optimal signature sequences to maintain the estimation accuracy for the smoothed group sparsity estimator.
IV-A Accelerating Convergence Rate via Smoothing
Adding a smooth function to “smooth” the non-differentiable objective function is a well-known idea in the context of sparse optimization, which makes the regularized problem easier to solve [41, 42]. In particular, for problem , we augment by adding a smoothing function , where is a positive scalar called the smoothing parameter. Problem is thus smoothed as
(20) 
which can be rewritten in the real domain as follows,
(21) 
where , , and are given in problem (8).
As problem is not differentiable, applying the subgradient method to solve it would yield a slow convergence rate. Fortunately, the dual formulation of problem benefits from smoothing techniques, as the smoothed dual problem can be reduced to an unconstrained problem with a composite objective function consisting of a convex, smooth function and a convex, nonsmooth function. This composite form can be solved by a rich set of first-order methods such as Auslender and Teboulle’s algorithm [57], Nesterov’s 2007 algorithm (N07) [58] and Lan, Lu, and Monteiro’s modification of N07 (LLM) algorithm [59], and these algorithms have the ( is the numerical accuracy) convergence rate [60, 35].
The dual problem of is given by
where and . Since , eliminating the dual variable , we obtain the unconstrained problem as follows
The dual objective function can be further represented as the following composite function
(22) 
Function is differentiable and its gradient is
where
(23) 
Furthermore, is Lipschitz continuous with Lipschitz constant upper bounded by . That is to say, the dual objective is a composite of the smooth function and the nonsmooth function . This composite form (22) can be solved by a rich set of first-order methods [35], whose convergence is particularly sensitive to the smoothing parameter , i.e., a larger value of the smoothing parameter leads to a faster convergence rate.
In particular, we present Lan, Lu, and Monteiro’s algorithm [59] in Algorithm 1 as a typical example to show the benefits of smoothing.
In Algorithm 1, lines 1 is the solution to (23), line 1 and 1 are the solutions to the following composite gradient mapping respectively,
The operator is given by
Let . Each row of is given by
Let be an optimal point for (22), then the convergence behavior of Algorithm 1 satisfies [35],
(24) 
Therefore, the number of iterations required to reach accuracy is at most , which implies that a larger will result in a faster convergence rate. For each iteration in Algorithm 1, the operators and are computationally cheap, and the dominant cost is the matrix-matrix products involving the signature matrix , which is .
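The cheap per-iteration operators in smoothed group-sparse solvers are typically row-wise shrinkage steps. The sketch below shows the standard proximal operator of the sum of row norms and uses it to solve one common form of the smoothed inner minimization. It is a hedged illustration: the names `row_shrink` and `smoothed_inner_min`, the symbol `mu` for the smoothing parameter, and the exact form of the inner problem are assumptions, since the corresponding expressions are not available in the text, and plain rows are used as groups for simplicity.

```python
import numpy as np

def row_shrink(Z, t):
    """Proximal operator of t * sum_i ||z^i||_2: shrink every row norm by t."""
    norms = np.linalg.norm(Z, axis=1, keepdims=True)
    # Rows with norm below t are set to zero; the guard avoids division by zero.
    scale = np.maximum(1.0 - t / np.maximum(norms, 1e-300), 0.0)
    return scale * Z

def smoothed_inner_min(Q, Z, mu):
    """Minimize sum_i ||theta^i||_2 + (mu/2)||Theta||_F^2 - <Q.T @ Z, Theta>.

    Completing the square turns this into a prox step on Q.T @ Z / mu.
    The product Q.T @ Z is the dominant cost; the shrinkage itself is linear
    in the number of entries.
    """
    return row_shrink(Q.T @ Z / mu, 1.0 / mu)

def inner_objective(Theta, Q, Z, mu):
    return (np.linalg.norm(Theta, axis=1).sum()
            + 0.5 * mu * np.sum(Theta ** 2)
            - np.sum((Q.T @ Z) * Theta))

shrunk = row_shrink(np.array([[3.0, 4.0], [0.1, 0.0]]), 1.0)

rng = np.random.default_rng(0)
Q = rng.standard_normal((5, 8))
Z = rng.standard_normal((5, 3))
mu = 0.5
Theta_star = smoothed_inner_min(Q, Z, mu)
```

In this form a larger `mu` makes the inner problem more strongly convex, which is the mechanism behind the faster convergence of the dual iterations, at the price of a solution that is pulled further toward zero.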
In practice, we terminate the algorithm when the relative primal feasibility gap satisfies for a small enough . The bound of the feasibility gap of primal iterates at each iteration is given as follows [42],
(25) 
Therefore, the number of iterations sufficient for convergence is upper bounded as
(26) 
which shows the number of iterations required for convergence in terms of the smoothing parameter, signature sequence length and solution accuracy. We will show in Fig. 6 that the convergence rate of the smoothed estimator will be accelerated as the smoothing parameter increases.
IvB Computation and Estimation Tradeoffs
From a geometric perspective, the smoothing term in (with ) enlarges the sublevel set of the regularizer , which results in a problem that is computationally easier to solve with an accelerated convergence rate. However, this geometric deformation brings a loss in the estimation accuracy according to the phase transition results in Theorem 3. This results in a tradeoff between the computational time and estimation accuracy. The tradeoff is controllable given the statistical dimension of the descent cone of the smoothed regularizer . In particular, the statistical dimension can be accurately estimated by the following result.
Proposition 2.
(Statistical Dimension Bound for ) Let be with nonzero rows, and define the normalized sparsity as . An upper bound of the statistical dimension of the descent cone of at is given by
(27) 
The unique optimum which minimizes the righthand side of (27) is the solution of
(28) 
where , .
Proof.
Please refer to Appendix C for details. ∎
Note that and can be calculated given the distribution of the ground truth . For instance, with , we have . Here, follows a chi distribution with degrees of freedom and follows a chi-square distribution with degrees of freedom. Hence, we can set , .
Although the convergence rate can be accelerated by increasing the smoothing parameter, as shown in the previous subsection, Proposition 2 suggests that a larger smoothing parameter results in a larger statistical dimension, as the bound in (27) grows with . This reduces the estimation accuracy for a given signature sequence length according to Theorem 3. Fig. 7 will demonstrate that the estimation error indeed increases as the smoothing parameter becomes large. Therefore, the smoothing method yields a tradeoff between computational cost and estimation accuracy: increasing the smoothing parameter improves the convergence rate while reducing the estimation accuracy. Such a tradeoff is particularly important in scenarios with massive numbers of IoT devices and a limited time budget, but without a very stringent requirement on estimation accuracy.
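As a small numerical aid for the chi and chi-square quantities mentioned in this subsection, the sketch below evaluates the means of both distributions in closed form. It assumes that the two constants in Proposition 2 are taken as the corresponding expectations, which is an interpretation and not something stated explicitly here. The function names and the degrees-of-freedom argument `k` are hypothetical placeholders; the actual degrees of freedom are those given in the text.

```python
import math

def chi_mean(k):
    """Mean of a chi distribution with k degrees of freedom:
    sqrt(2) * Gamma((k + 1) / 2) / Gamma(k / 2).
    Evaluated through log-gamma to stay stable for large k.
    """
    log_ratio = math.lgamma((k + 1) / 2.0) - math.lgamma(k / 2.0)
    return math.sqrt(2.0) * math.exp(log_ratio)

def chi_square_mean(k):
    """Mean of a chi-square distribution with k degrees of freedom."""
    return float(k)

# Example with a hypothetical number of degrees of freedom.
k = 8
first_moment = chi_mean(k)          # expectation of the norm
second_moment = chi_square_mean(k)  # expectation of the squared norm
```

By Jensen's inequality the squared first moment never exceeds the second moment, which gives a quick sanity check on any values plugged into the bound (27).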
IvC Discussion
For typical IoT applications, we are particularly interested in reducing the overall computational cost while maintaining the estimation accuracy, which can be achieved by interpreting the above tradeoff from another perspective. For the smoothed estimator , Proposition 2 together with Theorem 3 provides guidelines for choosing a minimal signature sequence length that maintains the estimation accuracy for a given smoothing parameter . While smoothing may increase the estimation error, we can compensate by increasing the signature sequence length for the smoothed estimator compared with the original nonsmooth estimator . Specifically, given a smoothing parameter , according to Theorem 3, we are able to maintain the estimation accuracy by choosing the signature sequence length as follows
(29) 
where is the expectation of the worstcase estimation accuracy normalized by noise power .
V Simulation results
In this section, we verify the phase transition phenomena in IoT networks characterized by Theorem 2 and Theorem 3 via simulations. We further simulate the developed dual-smoothed algorithm to illustrate the benefits of smoothing, as well as the tradeoffs between the estimation accuracy and computational cost.
Va Phase Transitions
To verify the phase transition in the noiseless case, we consider the scenario in which the base station is equipped with antennas, and the total number of devices is . For the estimation problem in this noiseless setting, the channel matrix and signature matrix are generated as and , respectively. We declare successful recovery if , and we record the success probability from trials. The experiments are performed using the CVX package [52] in Matlab with default settings.
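The trial-based estimate of the success probability described above can be sketched as a small Monte Carlo harness. The experiments reported here use CVX in Matlab; the Python code below is only an illustrative stand-in. The function names, the real-valued Gaussian model, the tolerance `tol`, and all dimensions are assumptions, and the solver is passed in as a callable so that any recovery method can be plugged in. The placeholder solver shown is a minimum-norm least-squares fit, not the estimator studied in this paper.

```python
import numpy as np

def success_probability(recover, sig_len, n_dev, n_ant, n_active,
                        trials=50, tol=1e-5, seed=0):
    """Fraction of random trials in which `recover` meets the error tolerance."""
    rng = np.random.default_rng(seed)
    successes = 0
    for _ in range(trials):
        Q = rng.standard_normal((sig_len, n_dev))        # signature matrix
        X = np.zeros((n_dev, n_ant))                     # row-sparse ground truth
        active = rng.choice(n_dev, size=n_active, replace=False)
        X[active] = rng.standard_normal((n_active, n_ant))
        Y = Q @ X                                        # noiseless observations
        X_hat = recover(Q, Y)
        rel_err = np.linalg.norm(X_hat - X) / np.linalg.norm(X)
        successes += int(rel_err <= tol)
    return successes / trials

def min_norm_recover(Q, Y):
    """Placeholder solver: minimum-norm least-squares solution of Q X = Y."""
    return np.linalg.lstsq(Q, Y, rcond=None)[0]

# Sweep the signature length for a fixed (hypothetical) problem size.
curve = {
    sig_len: success_probability(min_norm_recover, sig_len, 20, 2, 3, trials=10)
    for sig_len in (10, 30)
}
```

Sweeping `sig_len` and `n_active` over a grid and storing the returned fractions yields the kind of two-dimensional success map that a phase transition plot is built from. The placeholder solver ignores sparsity, so it only succeeds once the signature length reaches the number of devices; a sparsity-aware solver would be substituted for `min_norm_recover` in practice.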
In Fig. 4 (a), we show the probability of successful recovery as a function of the signature sequence length and the number of active devices. The brightness corresponds to the empirical recovery probability (white = 100%, black = 0%). Overlaid on this heat map, the empirical curves of , , are success probabilities calculated from the data. It can be seen that the theoretical curve from Theorem 2 closely matches the empirical curve of the success probability.
To verify the phase transition in the noisy case, we consider a scenario where the base station is equipped with antennas, and the total number of devices is . We fix the number of active devices as , hence the theoretical phase transition location is given as . For estimation problem , the channel matrix