I Introduction
Nowadays, the number of Internet of Things (IoT) devices is growing exponentially, driven by the need to turn our homes, vehicles, entertainment, health, work, industries, and social/community services into smart, autonomous, sustainable, interactive, and intelligent environments [1, 2, 3, 4, 5]. As the connectivity backbone of the IoT, the massive machine-type communication (mMTC) paradigm aims to address the corresponding connectivity challenges. These challenges are intertwined with the unique features of massive IoT connectivity setups, specifically [1, 2, 3, 4]: i) sporadic transmissions, i.e., only an unknown/random subset of machine-type communication devices, simply called devices in the sequel, is active at a given time instant; ii) short-packet communications dominated by uplink (UL) traffic; and iii) energy-limited communication/operation. The third feature evinces the need for energy-efficient communication/operation protocols and, in many cases, battery-free operation [6, 7], while all the features, in particular the first two, call for novel multiple-access mechanisms [4, 5, 3]. Our work here contributes to the latter.
Grant-free (GF) multiple-access protocols are particularly appealing for mMTC, since they [1, 2, 3] i) promote efficient spectrum utilization, as no device is assigned a dedicated transmission resource block, ii) reduce signaling overhead, and iii) improve the energy efficiency of the devices. Note that, owing to the network massiveness, it is impossible to assign orthogonal pilot sequences/preambles to all the devices, which motivates the need for GF non-orthogonal multiple access protocols. However, a key challenge here lies in efficiently identifying the set of sporadically active, non-orthogonally coexisting devices and their data, for which collision resolution mechanisms are required [3, 4].
We can distinguish two basic types of collisions: hard and soft collisions. The former occurs when exactly the same preamble is simultaneously used by several active devices, while the latter occurs when active devices use different but non-orthogonal preambles, which interfere to some extent with each other. As the number of available preambles decreases, the probability of hard collisions increases while that of soft collisions decreases. Since hard collisions are difficult to resolve without relying on sufficiently orthogonal channel subspaces [8, 9, 10] and/or additional communication overhead, increasing the pool of non-orthogonal preambles (thus favoring the occurrence of soft instead of hard collisions) is usually recommended in practice [2, 3, 4]. A promising class of soft collision resolution methods, known as compressed sensing (CS) techniques, has been considered for multiuser device detection (MUD) in mMTC [11].
I-A Related Work
CS-MUD solutions usually rely on regularization, greedy, message-passing (MP), and/or artificial intelligence (AI) techniques.


Regularized MUD relies on transforming the highly non-convex CS-MUD problem into a convex one via regularization and iterative procedures. For instance, Zhu and Giannakis [12] proposed a ridge detector and a least absolute shrinkage and selection operator (LASSO) detector, which directly regularize the original CS-MUD problem based on the ℓ2- and ℓ1-norm, respectively. Later on, some sparsity-aware successive interference cancellation regularization techniques were proposed in [13, 14], aiming at lowering the detection complexity by sequentially recovering the transmitted symbols. Meanwhile, Renna and Lamare [15] incorporated a sparsity-promoting norm regularization into an iteratively updated linear minimum mean square error (MMSE) filter and a constellation-list scheme to enable sparse detection. Moreover, a joint user identification and channel estimation approach exploiting the alternating direction method of multipliers was proposed in [16].

Greedy MUD has low complexity and generally requires only an appropriate stopping criterion for the reconstruction of the transmitted signal/vector. Schepker and Dekorsy [17] were the first to apply the orthogonal least squares (OLS) and orthogonal matching pursuit (OMP) greedy algorithms to a sparse mMTC scenario. Since the latter outperforms the former, subsequent research on greedy MUD has mostly focused on OMP-based algorithms. For instance, Schepker et al. [18] proposed group OMP (GOMP), leveraging channel decoders for greater performance, while Xiong et al. [19] proposed a detection-based orthogonal matching pursuit (DOMP) algorithm which, unlike conventional OMP, does not rely on a priori knowledge of the signal/device sparsity (the number of active devices). Specifically, DOMP runs a binary hypothesis test on the OMP residual vector at each iteration and stops when no signal component remains in the residual. Meanwhile, a noise-robust greedy algorithm exploiting a posteriori probability ratios for every index of sparse input signals is designed in [20]. Finally, Lee and Yu [21] leveraged the prior activation probability of each device to improve the performance of several greedy MUD schemes in mMTC, and showed that these schemes are robust against inaccurate prior information.
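The dependence of greedy MUD on a stopping rule can be made concrete with a minimal sketch of textbook OMP in Python/NumPy. This version stops after a given number of selected columns, i.e., it assumes the sparsity level is known. The function name `omp` and its arguments are hypothetical, and this is not the specific algorithm of [17, 18, 19].

```python
import numpy as np


def omp(A: np.ndarray, y: np.ndarray, k: int) -> np.ndarray:
    """Textbook OMP that stops after selecting k columns (k = assumed sparsity level)."""
    n = A.shape[1]
    support: list[int] = []
    coef = np.zeros(0, dtype=complex)
    residual = y.astype(complex)
    for _ in range(k):
        # Select the column most correlated with the current residual.
        scores = np.abs(A.conj().T @ residual)
        scores[support] = -np.inf  # never select the same column twice
        support.append(int(np.argmax(scores)))
        # Least-squares fit on the selected columns, then refresh the residual.
        A_s = A[:, support]
        coef, *_ = np.linalg.lstsq(A_s, y.astype(complex), rcond=None)
        residual = y - A_s @ coef
    x_hat = np.zeros(n, dtype=complex)
    x_hat[support] = coef
    return x_hat
```

If `k` is smaller than the true sparsity level, some active devices are missed. If it is larger, spurious devices are declared. This is the stopping-criterion issue that a prior on the sparsity level would remove.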
MP-based MUD comprises algorithms that exploit factor graphs and, thus, the a posteriori distribution of the signal to be reconstructed. In practice, due to the large-scale nature of the access problem in mMTC, the usual approach is to adopt/design approximate MP (AMP) algorithms relying on iterative thresholding, which also allows analytic performance characterization via the so-called state evolution. For instance, Chen et al. [22] derived MMSE denoisers for AMP depending on whether or not the large-scale component of the channel fading is known. Senel and Larsson [23] analyzed and proposed algorithmic enhancements for coherent and non-coherent MUD based on AMP. Meanwhile, Ke et al. [24] designed non-orthogonal pseudo-random pilots for UL broadband massive access. They formulated active user detection and channel estimation as a generalized multiple measurement vector CS problem and solved it via a generalized multiple measurement vector AMP algorithm. The suitability of AMP for joint device activity detection and channel estimation of devices coexisting with enhanced mobile broadband (eMBB) services is assessed and promoted in [25]. Finally, Wang et al. [26] designed an AMP algorithm that exploits the temporal activation correlation of the devices and showed the achievable performance gains.

AI-based MUD leads to direct detection decisions, as the detection parameters are learned and configured on the go, thus avoiding empirical parameter tuning. Deep learning is the most commonly used AI tool for solving the CS-MUD problem [27]. Some examples of MUD based on deep learning can be found in [28, 29, 30]. Specifically, Bai et al. [28] proposed a fast data-driven algorithm for CS-MUD in mMTC relying on a novel block restrictive activation nonlinear unit that nicely captures system sparsity. Meanwhile, Cui et al. [29] designed two model-driven approaches, which effectively utilize features of sparsity patterns in designing common measurement matrices and adjusting state-of-the-art detectors/decoders. Interestingly, the optimum depth (number of layers) of a deep neural network varies according to the sparsity statistics, which motivated the work in [30]. Therein, the authors proposed to autonomously/dynamically update the number of layers in the inference phase by introducing an extra halting score at each layer.
I-B Contributions
State-of-the-art research on CS-MUD either assumes that i) the signal sparsity level is known and exploited for MUD, e.g., [12, 13, 14, 15, 16], or ii) signal sparsity level detection is a by-product of MUD, e.g., [17, 18, 19, 20, 21, 22, 24, 25, 28, 29, 30]. In the first case, there have been no insights/answers on how the sparsity level could be known in advance of MUD, which makes the mechanisms proposed in [12, 13, 14, 15, 16] strictly impractical so far. In the second case, the sparsity level information is not required. However, having and exploiting such information would certainly improve the MUD performance. Specifically, the iterative mechanisms proposed in [17, 18, 19, 20, 21, 22, 24, 25] face the challenge of setting an appropriate stopping criterion, while the deep-learning-based mechanisms proposed in [28, 29] have a fixed depth in terms of the number of layers and do not adapt well to highly varying sparsity levels. Interestingly, the depth could also be learned [30], but this introduces a further nonlinearity into the system. Although this approach leads to accuracy improvements with respect to state-of-the-art MUD based on deep learning, it is also more complex. From the discussion above, we can conclude that all CS-MUD mechanisms can significantly benefit from a (sufficiently) deterministic prior on the sparsity level, which is precisely our aim here. Information on the sparsity level enables the application of the MUD solutions in [12, 13, 14, 15, 16], while it potentially makes those in [17, 18, 19, 20, 21, 22, 24, 25, 28, 29, 30] more easily configurable and accurate.
We consider an mMTC deployment under quasi-static Rayleigh fading where a random set of devices becomes active. Our main contributions are sixfold:


We introduce a framework for detecting the number of active devices, i.e., the signal sparsity level,¹ by exploiting a set of pilot symbols at the beginning of a transmission block. The framework relies on coordinated pilot transmissions (CPT) and may play a crucial role in sparse signal recovery MUD algorithms for mMTC.
¹By convention [31, 32], a detection or classification operation is applied over a (discrete) set of possible hypotheses, while an estimation operation is not restricted to a discrete/natural domain. Hence, a detector of the number of active devices outputs an integer solution, while an estimator of it may output a real-valued solution.

We propose unassisted CPT (UCPT) and assisted CPT (ACPT) mechanisms. Specifically, only UL transmissions are exploited by UCPT, while ACPT mechanisms also include downlink (DL) transmissions for channel state information (CSI) estimation, which resolves the fading uncertainty. We discuss two specific ACPT implementations: i) ACPT-F, which implements CSI-based phase corrections while leveraging the same statistical inverse power control used by UCPT, and ii) ACPT-D, which implements a dynamic CSI-based inverse power control, although it requires active devices to remain silent with a certain probability given an average transmit power constraint.

We derive the optimum detector of the number of active devices in the case of UCPT. Meanwhile, since optimum detection relies on a combinatorial search in the case of ACPT mechanisms, we relax the problem to the continuous domain and find efficient estimators, also for UCPT. The estimator variance is shown to grow quadratically with the number of active devices under UCPT and linearly under the ACPT mechanisms.

Because of the relaxation mentioned above, the estimators need a quantization or rounding operation to yield an integer number of active devices. We discuss two such approaches: i) rounding to the nearest integer (NI), and ii) rounding relying on maximum likelihood (ML), and we numerically assess their performance. The results reveal that NI-based rounding offers a performance similar to that of the more sophisticated ML-based rounding.

We derive exact or approximate (semi-)closed-form expressions for the probability mass function (PMF) of the detectors and validate their accuracy. Such PMFs are shown to be exponential-, Gaussian-, and Student's t-like in the case of UCPT, ACPT-F, and ACPT-D, respectively. They allow tractable computation of the estimation success probability, thus becoming valuable for system design/optimization purposes.

We evince the superiority of ACPT-D under an appropriate (but not strict/rigorous) configuration of (whose optimum value decreases slowly with ). Moreover, we show that the estimation accuracy increases with , especially when operating with ACPT-D, although the performance gain may saturate quickly in the high-SNR regime.
Finally, we identify and briefly discuss several attractive CPT-related research directions to be pursued next.
I-C Organization
The remainder of this paper is organized as follows. Section II introduces the system model and related problem. Sections III and IV discuss the UCPT and ACPT mechanisms for detecting the number of active devices, respectively, including accuracy analyses. The distribution of the (relaxed) estimators is characterized in Section V, while Section VI presents and discusses numerical results. Finally, Section VII concludes the article and highlights further research directions.
Notation: Boldface lowercase letters denote column vectors. Superscripts and denote the complex conjugate and Hermitian operations, respectively. is the Euclidean norm of a vector, is the absolute value (or cardinality, for sets) operation, and denotes integer rounding of the argument according to a certain rule. denotes the probability of the occurrence of event , while denotes a random variable conditioned on . and output the expected value and variance of the argument, respectively. and are the floor and ceiling functions, respectively, while is the binomial coefficient. () outputs the real (imaginary) part of the argument. Additionally, is the logarithmic integral [33, eq. (6.2.8)], is the exponential integral [33, eq. (6.2.1)], is the complementary error function [33, eq. (7.2.2)], is the complete gamma function [33, eq. (5.2.1)], and is the modified Bessel function of the second kind and order [33, eq. (10.27.4)]. Meanwhile, and with are respectively the incomplete and regularized incomplete beta functions [33, eq. (8.17.2)]. () is the set of (nonnegative) real numbers, is the set of complex numbers, and is the imaginary unit. and denote the probability density function (PDF) and cumulative distribution function (CDF), respectively, of a continuous random variable , while denotes the PMF of a discrete random variable . Finally, Table I lists the distributions used throughout the paper, including notations and main statistics.

Distribution | Notation | Support | CDF | Mean | Variance
Circularly-symmetric complex Gaussian | | | | |
Gaussian | | | | |
Exponential | | | | |
Rayleigh | | | | |
Student's t | | | | |
II System & Problem Description
Consider an mMTC deployment where a set of devices is served by a single coordinator, e.g., a base station (BS) or an aggregator. The MTC traffic is sporadic, i.e., only a random subset of the devices is active at any given time. Assume that time is slotted and that active devices must wait for the next time slot to start a (synchronous) transmission. We aim at detecting the number of devices, out of the total, that become active in a given time slot, which is also referred to as the signal sparsity level. This information can then be exploited by subsequent detection/decoding mechanisms, e.g., [12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 24, 25, 28, 29, 30].
The devices and the coordinator are equipped with a single antenna.² Channels are subject to quasi-static Rayleigh fading, while DL and UL channels are reciprocal, which is motivated by the use of the same frequency band. Let us denote by the channel coefficient of the link between the coordinator and the th device, thus, , where is the average channel power gain. Moreover, the coordinator and the devices are aware of the large-scale channel statistics, which is especially feasible in static and quasi-static MTC deployments.
²As this is, to the best of our knowledge, the first work that proposes the sparsity level detection problem prior to a MUD phase, our aim here lies in introducing the basic ideas, principles, and performance baselines. The extension of our proposed mechanisms to multi-antenna setups is not only an interesting but also a necessary research direction, which demands specific and non-trivial adjustments and analyses that we leave for future work.
Our proposed coordinated pilot transmission (CPT) mechanisms, which are schematically illustrated in Fig. 1, leverage a number of pilot symbols to detect the number of active devices. We assume that all the active devices transmit the same pilot sequence during this coordinated transmission phase, either directly as in Fig. 1a or after a short broadcast DL transmission from the coordinator as in Fig. 1b. The signal received at the coordinator is used to decide among all the possible hypotheses: , where . Let denote the PDF of assuming that is true. The optimal Neyman-Pearson (NP) detector, which models the true hypothesis as a nonrandom unknown, or, equivalently, the maximum a posteriori probability (MAP) decision rule with equal prior probabilities of the hypotheses, is given by [32]
(1)
Following the initial CPT phase, the transmission of training and data symbols in the case of coherent MUD, or only data symbols in the case of non-coherent MUD, proceeds as usual.
III Unassisted CPT (UCPT)
Under the UCPT approach, which is illustrated in Fig. 1a, all the devices that are active at the beginning of a given transmission block immediately transmit the corresponding shared pilot sequence composed of complex symbols, i.e., , with such that .³ Then, the th symbol received at the coordinator can be written as
(2)
where is the per-symbol average transmit power of the th device, and denotes additive white Gaussian noise (AWGN) with variance . Moreover, the last step is attained by adopting a statistical inverse power control with , where is the target average receive power, and by setting .
³For the purpose of security and resilience against malicious attacks, such a sequence may belong to the orthogonal basis of all vectors of length , and can be periodically and securely changed.
Iiia Detector for
Theorem 1.
Under UCPT, the optimum detector (1) is equivalent to (3), where , , , , , and is the average receive signal-to-noise ratio (SNR).
Proof.
See Appendix A.
(3) 
Note that the complexity of (3) grows with the number of coexisting devices, since it requires testing all the possible hypotheses. More importantly, (3) is difficult to analyze, so deriving performance insights becomes cumbersome. To circumvent these issues, we analyze a much simpler, although suboptimal, detection approach in the sequel.
The idea is to relax the original detection problem (1), which directly leads to an integer solution, to an estimation problem in the continuous real domain. Since converges to a zero vector, while scales linearly with , the (relaxed) minimum-variance unbiased (MVU) estimator of exploits the sample variance as
(4) 
Then, since , we need to apply a rounding operation, i.e., , which is in fact the only possible source of performance degradation compared to the optimal NP detector (3). The reason is simple: (4) is not only the MVU but also the ML estimator of , i.e., the value of that maximizes for . In Section V, we discuss the specific rounding operations adopted in this work and their implications.
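The UCPT relaxed estimator and NI rounding can be sketched with a short Monte Carlo simulation. The assumptions are ours rather than the paper's exact normalization:

- a unit-modulus shared pilot of length `pilot_len`;
- unit noise variance;
- a statistical inverse power control under which each active device arrives with average SNR `rho`, so that the superimposed channel is CN(0, K).

Under these assumptions, projecting the received block onto the known pilot and removing the noise floor gives an unbiased estimate. We do not claim that it matches (4) term by term. All names (`cn`, `ucpt_relaxed`, `detect_ni`) are hypothetical.

```python
import numpy as np


def cn(rng, *shape):
    """i.i.d. circularly-symmetric complex Gaussian CN(0, 1) samples."""
    return (rng.standard_normal(shape) + 1j * rng.standard_normal(shape)) / np.sqrt(2)


def ucpt_relaxed(k_active, rho, pilot_len, trials, rng):
    """Relaxed (real-valued) UCPT estimates of the number of active devices."""
    x = np.exp(2j * np.pi * rng.random(pilot_len))  # shared unit-modulus pilot
    g = cn(rng, trials, k_active).sum(axis=1)       # quasi-static superimposed channel
    y = np.sqrt(rho) * g[:, None] * x[None, :] + cn(rng, trials, pilot_len)
    z = (y @ x.conj()) / np.sqrt(pilot_len)         # matched filter: sqrt(rho*L)*g + CN(0, 1)
    return (np.abs(z) ** 2 - 1.0) / (rho * pilot_len)


def detect_ni(k_relaxed, n_devices):
    """Nearest-integer rounding, clipped to the admissible range [0, n_devices]."""
    return np.clip(np.floor(k_relaxed + 0.5), 0, n_devices).astype(int)


rng = np.random.default_rng(7)
k_relaxed = ucpt_relaxed(k_active=5, rho=10.0, pilot_len=8, trials=200_000, rng=rng)
success_rate = np.mean(detect_ni(k_relaxed, n_devices=50) == 5)
```

In this sketch, increasing `rho` or `pilot_len` barely reduces the spread of `k_relaxed` for large K. This mirrors the observation that fading uncertainty dominates the UCPT estimator variance.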
III-B On the Accuracy of the Proposed Estimator
For simplicity, the performance degradation from the rounding operation is neglected in this subsection. The variance of the estimator can be easily obtained from (4) as
(5) 
by exploiting , thus, . Since , increases with . In fact, in the high-SNR regime and/or for a sufficiently large number of samples , it holds that . This means that the dominant contribution to the estimator variance stems from the fading uncertainty; hence, the uncontrolled fading in the UCPT scenario is the main cause of estimation inaccuracy as increases. This motivates the introduction of more elaborate estimators, and their corresponding requirements, in the following.
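The quadratic growth of the variance can be worked out in a few lines under illustrative assumptions of our own: a unit-modulus pilot of length L, unit noise variance, per-device average receive SNR ρ, and a superimposed channel g ∼ CN(0, K). This is our own derivation with our own symbols. It is offered as a sanity check of the trend described above, not as a reproduction of (5).

```latex
\begin{aligned}
z &= \frac{\mathbf{x}^{H}\mathbf{y}}{\sqrt{L}} = \sqrt{\rho L}\, g + n, \qquad g \sim \mathcal{CN}(0, K),\; n \sim \mathcal{CN}(0, 1),\\
|z|^{2} &\sim \mathrm{Exp}\big(\text{mean } \rho L K + 1\big), \qquad \hat{K} = \frac{|z|^{2} - 1}{\rho L},\\
\mathbb{E}[\hat{K}] &= K, \qquad
\operatorname{Var}[\hat{K}] = \frac{(\rho L K + 1)^{2}}{(\rho L)^{2}} = \Big(K + \frac{1}{\rho L}\Big)^{2} \xrightarrow{\;\rho L \to \infty\;} K^{2}.
\end{aligned}
```

The standard deviation of an exponential random variable equals its mean. Therefore, in this model the spread of the estimate never falls below K, regardless of the SNR or pilot length.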
IV Assisted CPT (ACPT)
The main reason behind the limited accuracy of the UCPT detector/estimator is the channel uncertainty. Herein, we address this issue by introducing a coordinated training phase common to all devices, as illustrated in Fig. 1b. Specifically, at the beginning of each time slot, the coordinator sends a broadcast pilot signal comprising symbols. Then, the signal received by the th device is given by
(6) 
where , is the per-symbol average transmit power of the coordinator, and is the th AWGN sample at the th device. For simplicity, we assume . This broadcast pilot transmission phase is leveraged by the active devices to estimate their corresponding channel coefficients, which are assumed UL-DL reciprocal due to time-division duplex (TDD) operation [4, 6, 30]. Specifically, the MVU estimator of is given by
(7) 
where , while the estimation error is given by
(8) 
After this initial DL CSI acquisition phase, active devices exploit the remaining symbols for sending the shared pilot sequence , with , but phase-shifted as , thus aiming at a coherent signal combination at the coordinator. Meanwhile, the transmit power of the devices is set according to one of the power control mechanisms described in the next subsections.
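The DL acquisition step can be sketched as follows, under our own assumptions: unit noise variance at the devices, a unit-modulus broadcast pilot of length `pilot_len`, and channels normalized to unit average power. The matched-filter/least-squares form used here is the standard MVU estimator for a known pilot in AWGN. Its error variance is 1/(`snr_dl`·`pilot_len`). We do not claim it reproduces the exact normalization of (7)-(8). The helper names are hypothetical.

```python
import numpy as np


def cn(rng, *shape):
    """i.i.d. CN(0, 1) samples."""
    return (rng.standard_normal(shape) + 1j * rng.standard_normal(shape)) / np.sqrt(2)


def estimate_dl_channels(h, snr_dl, pilot_len, rng):
    """MVU (matched-filter/LS) estimate of each device's channel from a broadcast pilot."""
    x = np.exp(2j * np.pi * rng.random(pilot_len))                       # unit-modulus pilot
    r = np.sqrt(snr_dl) * h[:, None] * x[None, :] + cn(rng, h.size, pilot_len)
    return (r @ x.conj()) / (np.sqrt(snr_dl) * pilot_len)


rng = np.random.default_rng(3)
h = cn(rng, 100_000)                                  # normalized channels, E|h|^2 = 1
h_hat = estimate_dl_channels(h, snr_dl=10.0, pilot_len=4, rng=rng)
mse = np.mean(np.abs(h_hat - h) ** 2)                 # expected close to 1 / (10 * 4)
# Phase pre-compensation a device would apply before its UL pilot transmission.
aligned = h * np.exp(-1j * np.angle(h_hat))
```

After pre-compensation, `aligned` is close to the real, positive value |h|. This is what enables coherent combining at the coordinator.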
Iva Fixed Power Control
Herein, we use the same statistical inverse power control mechanism as in the case of UCPT, i.e., , thus, the power allocation remains fixed in static deployments. We refer to ACPT with such fixed power control as ACPT-F.
Since is phase-shifted by the th (active) device as , the signal received at the coordinator is given by
(9) 
where we use and to attain (a) and (b), respectively. Note that is distributed as since is independent of and uniformly distributed on the unit circle.
IV-A1 Detector for the Number of Active Devices
Observe that the statistics of in (9) depend on the specific set of active devices rather than on its cardinality alone. Therefore, an optimal detector inevitably requires evaluating the likelihood of all the possible sets of cardinality for each hypothesis . This is a non-deterministic polynomial-time complete (NP-complete) problem. Since can be large, the number of combinations to test when assessing each hypothesis , i.e., , may be huge and computationally unaffordable. Therefore, we do not present the optimum detector here and resort directly to a suboptimal but affordable method.
Similar to our approach in Section III-A, we relax the detection problem to an estimation problem. For this, let us first define
(10) 
where is the average SNR of the broadcast pilot received at the th device. Observe that although impacts not only the mean but also other statistics of via the first two terms of (10), we can safely rely only on the mean, as a single channel realization/sample is available.
The first-order statistics of can be computed as follows
(11) 
where (a) holds since , while (b) follows from . Next, a relaxing approximation is needed to proceed further, since the set of active devices and their corresponding and are unknown. This is done in (c) by exploiting the relationship between the arithmetic and geometric means twice, and defining
(12) 
Observe that is the harmonic mean of , while can be interpreted as a traditional mean-based estimate of . The estimate becomes more accurate as the set becomes less dispersed and/or as increases. The latter is because, as increases, one can more easily ensure that the terms become more similar to one another (closer to 1). Hence, the arithmetic and geometric means, which were used to go from (b) to (c), converge faster, as exemplified in [34, 35] in the context of an ultra-reliable low-latency scenario.
Using (11) and assuming is known at (computed beforehand by) the coordinator, the ACPT-F estimator for can be set as
(13) 
while .
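The mean relations invoked in step (c) can be illustrated numerically. The gaps AM − GM and GM − HM shrink as the averaged terms cluster around 1, which is the mechanism the approximation relies on. The values below are made up for illustration, and `mean_gaps` is a hypothetical helper. Python's standard `statistics` module (3.8+) provides the three means.

```python
from statistics import fmean, geometric_mean, harmonic_mean


def mean_gaps(values):
    """Return (AM - GM, GM - HM) for a list of positive numbers; both are >= 0."""
    am, gm, hm = fmean(values), geometric_mean(values), harmonic_mean(values)
    return am - gm, gm - hm


spread = [0.2, 0.9, 1.8, 3.5]    # widely dispersed terms
tight = [0.95, 1.0, 1.02, 1.05]  # terms clustered around 1
gaps_spread = mean_gaps(spread)
gaps_tight = mean_gaps(tight)
```

For the dispersed list, both gaps are of order 0.5. For the clustered list, both are below 0.01, so replacing one mean by another costs little accuracy in that regime.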
IV-A2 On the Accuracy of the Proposed Estimator
Departing from (10) and (13), we have that
(14) 
where , while is the target average receive SNR at the coordinator. Then,
(15) 
while (15) transforms to
(16) 
for a simplified scenario with , and after performing simple algebraic transformations.
Different from the estimator variance for UCPT in (5), which has a quadratic dependence on , here the dependence is linear. Therefore, the ACPT-F estimator already provides some resilience against fading uncertainty.
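The linear dependence can be checked with a simplified Monte Carlo sketch of ACPT-F. Unlike the text, it assumes perfect device-side CSI, so the mean-based correction of (12)-(13) is not needed. It also assumes a unit-modulus pilot, unit noise variance, and per-device average receive SNR `rho`. Each active device then adds a real amplitude sqrt(rho)·|g_i| with g_i ∼ CN(0, 1). Dividing the coherent matched-filter output by sqrt(rho)·E|g_i| = sqrt(rho)·sqrt(pi)/2 gives an unbiased estimate. In this simplified model, its variance is K(4/pi - 1) + 2/(pi·rho·L). Function and parameter names are hypothetical.

```python
import numpy as np


def cn(rng, *shape):
    """i.i.d. CN(0, 1) samples."""
    return (rng.standard_normal(shape) + 1j * rng.standard_normal(shape)) / np.sqrt(2)


def acpt_f_relaxed(k_active, rho, pilot_len, trials, rng):
    """Relaxed ACPT-F-style estimates under PERFECT device-side CSI (simplifying assumption)."""
    x = np.exp(2j * np.pi * rng.random(pilot_len))
    # Phase pre-compensation + statistical inverse power control: device i adds sqrt(rho)*|g_i|.
    amplitude = np.sqrt(rho) * np.abs(cn(rng, trials, k_active)).sum(axis=1)
    y = amplitude[:, None] * x[None, :] + cn(rng, trials, pilot_len)
    coherent = (y @ x.conj()).real / pilot_len          # amplitude + N(0, 1/(2L)) noise
    return coherent / (np.sqrt(rho) * np.sqrt(np.pi) / 2)


def acpt_f_variance(k_active, rho, pilot_len):
    """Variance of the estimate above in the simplified perfect-CSI model."""
    return k_active * (4 / np.pi - 1) + 2 / (np.pi * rho * pilot_len)


rng = np.random.default_rng(5)
est4 = acpt_f_relaxed(4, rho=10.0, pilot_len=8, trials=200_000, rng=rng)
est8 = acpt_f_relaxed(8, rho=10.0, pilot_len=8, trials=200_000, rng=rng)
```

Here, doubling K roughly doubles the variance. In the analogous non-coherent UCPT model, the variance would roughly quadruple.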
IV-B Dynamic Power Control
Since channel estimates are already available at the devices, a more intuitive approach could rely on setting given a fixed . However, the devices may need to transmit with very high power when their channels experience poor conditions, i.e., very small , which may not be affordable. Note also that does not converge, which means that the average power consumption cannot be controlled through . To overcome this, we propose silencing the devices experiencing very poor channel conditions. Thus, only those active devices with may transmit. This guarantees that the signal of each transmitting device arrives at the coordinator with the same receive-power statistics. We refer to ACPT with such dynamic power control as ACPT-D.
Theorem 2.
The following parameter configuration
(17) 
where , guarantees that the devices operate with the same average transmit energy consumption as for UCPT and ACPT-F, while constraining the probability that an active device remains silent to as .
Proof.
See Appendix B.
Some insights on the configuration of are discussed in Section VI-B.
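A configuration in the spirit of Theorem 2 can be computed under assumptions of our own, and it may differ from (17):

- normalized channel gains |g_i|² ∼ Exp(1), i.e., Rayleigh fading;
- a device transmits only if |g_i|² ≥ theta, and then uses normalized power rho_d/|g_i|²;
- the average transmit power must equal the UCPT value rho.

The silence probability is then 1 - e^(-theta), and the mean power is rho_d·E1(theta), where E1 is the exponential integral (SciPy's `scipy.special.exp1`). The function `acpt_d_config` is hypothetical.

```python
import numpy as np
from scipy.special import exp1


def acpt_d_config(rho, eps):
    """Return (theta, rho_d) for a target silence probability eps and average power rho."""
    theta = -np.log1p(-eps)      # solves P(|g|^2 < theta) = 1 - exp(-theta) = eps
    rho_d = rho / exp1(theta)    # E[(rho_d / |g|^2) * 1{|g|^2 >= theta}] = rho_d * E1(theta)
    return theta, rho_d


theta, rho_d = acpt_d_config(rho=10.0, eps=0.1)

# Monte Carlo check of both constraints.
rng = np.random.default_rng(11)
gain = rng.exponential(1.0, size=1_000_000)   # |g_i|^2 for Rayleigh fading
active_tx = gain >= theta
tx_power = np.zeros_like(gain)
tx_power[active_tx] = rho_d / gain[active_tx]
```

With eps = 0.1, E1(theta) exceeds 1, so the receive SNR rho_d of each transmitting device is below rho. Silencing deep-fade devices is what keeps the average power finite, since E[1/|g|²] diverges for Rayleigh fading.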
IV-B1 Detector for the Number of Active Devices
Let us denote by and the sets of active devices with (UL transmitting devices) and (silent devices), respectively, thus, . Then, the signal received at the coordinator, , is given by
(18) 
where , while the last step follows from using and setting .
Observe that in (18) depends not only directly on the cardinality but also on the specific set of transmitting devices via . Therefore, our arguments in Section IV-A1 against the viability of an optimum detector also apply here; hence, we next resort to a suboptimal but affordable/efficient detector exploiting a (relaxed) estimate of .
Since and , and discarding the information on contained in , we can directly estimate the relaxed as
(19) 
while . To proceed further, we assume all devices are configured such that (i.e., by adopting the parameter configuration given in Theorem 2). Then, the following result establishes a relatively simple estimator for . Readers may refer to Appendix C not only for the proof of Theorem 3 but also for insights on how the estimator of could be computed for a general case where , for some .
Theorem 3.
Assume , then the MMSE estimate of is given by
(20) 
where is the solution of
(21) 
Proof.
See Appendix C.
Corollary 1.
For a sufficiently large , as expected for small , this result simplifies to
(22) 
Since (22) is already accurate for small , we adopt it hereinafter to simplify our exposition and analysis. Note that we use instead of , aiming to mitigate the accumulation of rounding errors. From (22), one can notice that as increases, increases as well, motivating the introduction of the correcting term . Finally,
(23) 
and .
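The effect of random silencing on the estimate can be illustrated with a simple moment-based variant. This is not the MMSE estimator of Theorem 3. It relies on our own assumptions:

- perfect device-side CSI;
- a unit-modulus pilot and unit noise variance;
- the configuration theta = -ln(1 - eps), rho_d = rho/E1(theta).

The coherent output then equals sqrt(rho_d) times the number of transmitting devices, plus noise. Dividing by sqrt(rho_d)·(1 - eps) removes the bias caused by silent devices. In this model, the variance is K·eps/(1 - eps) + 1/(2·L·rho_d·(1 - eps)²), again linear in K. Names are hypothetical.

```python
import numpy as np
from scipy.special import exp1


def cn(rng, *shape):
    """i.i.d. CN(0, 1) samples."""
    return (rng.standard_normal(shape) + 1j * rng.standard_normal(shape)) / np.sqrt(2)


def acpt_d_relaxed(k_active, rho, eps, pilot_len, trials, rng):
    """Moment-based ACPT-D-style estimates under perfect CSI (illustrative variant)."""
    theta = -np.log1p(-eps)
    rho_d = rho / exp1(theta)
    gains = rng.exponential(1.0, size=(trials, k_active))     # |g_i|^2 of the active devices
    n_tx = (gains >= theta).sum(axis=1)                       # devices that do not stay silent
    x = np.exp(2j * np.pi * rng.random(pilot_len))
    # Channel inversion: every transmitting device arrives with amplitude sqrt(rho_d).
    y = np.sqrt(rho_d) * n_tx[:, None] * x[None, :] + cn(rng, trials, pilot_len)
    coherent = (y @ x.conj()).real / pilot_len
    return coherent / (np.sqrt(rho_d) * (1 - eps)), rho_d


def acpt_d_variance(k_active, eps, pilot_len, rho_d):
    """Variance of the estimate above: binomial silencing term plus noise term."""
    return k_active * eps / (1 - eps) + 1 / (2 * pilot_len * rho_d * (1 - eps) ** 2)


rng = np.random.default_rng(13)
est, rho_d = acpt_d_relaxed(8, rho=10.0, eps=0.1, pilot_len=8, trials=200_000, rng=rng)
```

Once fading is inverted, the remaining randomness comes mostly from how many devices stay silent. That is why eps, rather than the SNR, dominates the variance in this sketch.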
IV-B2 On the Accuracy of the Proposed Estimator
Theorem 4.
The variance of the ACPTD estimator given in (23) is approximately given by
(24) 
for relatively small , where and is the target average receive SNR at the coordinator.
Proof.
See Appendix D. ∎
Note that the variance of the ACPT-D estimator increases linearly with due to the sum operator in (24). In fact, for the specific scenario where the devices' deployment is homogeneous in the sense that , (24) simplifies to
(25) 
Finally, one can get more insights on the variance performance specified by (25) by leveraging [33, eq. (6.8.1)] to attain
(26) 
where the last step follows from the approximation , which is accurate for . Since holds in practice, (26) leads to very small variance bound values.
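The inequality from [33, eq. (6.8.1)], ½·ln(1 + 2/x) < e^x·E1(x) < ln(1 + 1/x) for x > 0, can be checked numerically with SciPy's `scipy.special.exp1`. The grid below is arbitrary. This only verifies the bound itself, not the specific approximation used in the last step of (26).

```python
import numpy as np
from scipy.special import exp1

x = np.logspace(-3, 1, 200)          # arbitrary grid of positive arguments
scaled = np.exp(x) * exp1(x)         # e^x * E1(x)
lower = 0.5 * np.log1p(2.0 / x)      # lower bound of the inequality
upper = np.log1p(1.0 / x)            # upper bound of the inequality
```

The upper bound behaves like ln(1/x) for small x. This is the regime in which E1 of a small threshold grows only logarithmically.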
V On the Distribution of the Proposed Detectors
In general, and especially for the considered setup, the estimator variance cannot be directly used to quantify, at least thoroughly, the performance degradation due to detection mismatches. Instead, the distribution of the classification results must be leveraged.
The PMF of can be found as
(27) 
where denotes a rounding rule. Herein, we adopt the following rounding rules:

Rounding to the nearest integer (NI), where the rounded value equals the floor or the ceiling of the relaxed estimate if its fractional part is smaller or greater than 0.5, respectively. Then,
(28) 
Maximum likelihood (ML)-based rounding, where
(29)
Here, is the likelihood function of given the outcome of .
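The two rounding rules can be contrasted under an assumed shifted-exponential model for a relaxed UCPT-like estimate: k_relaxed + c ∼ Exp(mean k + c) under hypothesis k, where c is a hypothetical offset such as 1/(rho·L). This model is our assumption, chosen to be consistent with the shifted exponential stated for UCPT in Section V-A. The helper names are hypothetical.

```python
import numpy as np


def ni_round(k_relaxed, n_devices):
    """Nearest-integer rounding (fractional part >= 0.5 rounds up), clipped to [0, n_devices]."""
    return int(np.clip(np.floor(k_relaxed + 0.5), 0, n_devices))


def ml_round(k_relaxed, n_devices, c):
    """ML-based rounding assuming k_relaxed + c ~ Exp(mean k + c) under hypothesis k."""
    u = max(k_relaxed + c, 0.0)                 # the model's support is u >= 0
    k = np.arange(n_devices + 1)
    log_lik = -np.log(k + c) - u / (k + c)      # exact exponential log-density
    return int(np.argmax(log_lik))


c = 1 / 80  # e.g., 1/(rho*L) with rho = 10, L = 8 (illustrative values)
print(ni_round(1.45, 10), ml_round(1.45, 10, c))  # 1 2
```

The two rules disagree near bin edges because the exponential likelihood is skewed. In this case, ML moves the decision threshold between 1 and 2 below 1.5.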
Observe that the distribution of the relaxed estimator, , is needed for computing under both rounding approaches. Thus, our efforts in the following subsections are directed at characterizing it. Interestingly, different from NI-based rounding, under ML-based rounding the distribution impacts the rounding operation itself. Hence, finding becomes cumbersome for the latter. Moreover, note that due to the intrinsic similarities between (29) and (1), the relaxed estimators followed by ML-based rounding are expected to perform similarly to the optimal NP detectors. This is verified in Section VI (specifically, Table II) for UCPT, for which the optimal detector was derived in Theorem 1.
V-A UCPT
From (4), it follows directly that is a shifted exponential random variable (RV), since . Specifically,
(30) 
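Under an assumed shifted-exponential model for the relaxed UCPT estimate, k_relaxed + c ∼ Exp(mean K + c) with a hypothetical offset c (e.g., 1/(rho·L)), the PMF of the NI-rounded detector follows directly from the exponential CDF. It is obtained by integrating over the bins [k - 0.5, k + 0.5), with the outermost bins absorbing the clipping to 0 and N. The sketch below computes it and cross-checks it by sampling. Names are hypothetical and parameters are ours; it is not a reproduction of (30).

```python
import numpy as np


def ucpt_ni_pmf(k_true, n_devices, c):
    """P(NI-rounded detector = k), k = 0..n_devices, if k_relaxed + c ~ Exp(mean k_true + c)."""
    mean = k_true + c

    def cdf(t):  # P(k_relaxed <= t); zero below the support edge t = -c
        return 1.0 - np.exp(-np.maximum(t + c, 0.0) / mean)

    edges = np.arange(n_devices + 2) - 0.5      # bin k covers [k - 0.5, k + 0.5)
    edges[0], edges[-1] = -np.inf, np.inf       # clipping folds the tails into 0 and n_devices
    return np.diff(cdf(edges))


c, n_devices = 1 / 80, 50
rng = np.random.default_rng(17)
samples = rng.exponential(5 + c, size=200_000) - c            # relaxed estimates for K = 5
detected = np.clip(np.floor(samples + 0.5), 0, n_devices).astype(int)
empirical = np.bincount(detected, minlength=n_devices + 1) / samples.size
pmf = ucpt_ni_pmf(5, n_devices, c)
```

For these illustrative parameters, the probability of detecting K = 5 exactly is below 10%. This is consistent with UCPT's sensitivity to fading uncertainty.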
V-B ACPT-F
Theorem 5.
The distribution of conditioned on is given by