I Introduction
The advancements in wireless technologies have enabled connecting sensors, mobile devices, and machines for various mobile applications, leading to an era of Internet-of-Things (IoT) [1]. IoT connectivity involves connecting a massive number of devices, which forms the foundation for many applications, e.g., smart home, smart city, healthcare, and transportation systems. It has thus been regarded as an indispensable demand for future wireless networks [2]. With a massive number of devices to connect with the base station (BS), massive connectivity brings formidable technical challenges and has attracted much attention from both academia and industry [3, 4].
Sporadic traffic is a unique feature of massive IoT connectivity, meaning that only a small portion of devices are active at any given time instant [5]. This is because IoT devices are often designed to sleep most of the time to save energy, and are activated only when triggered by external events. Therefore, the BS needs to manage the massive random access by detecting the active users before data transmission. The grant-based random access scheme has been widely applied to allow multiple users to access the network over limited radio resources, e.g., in 4G LTE networks [4, 6]. Under this scheme, each active device is randomly assigned a pilot sequence from a predefined set of preamble sequences to notify the BS of the device's activity state. A connection between an active device and the BS is established if the pilot sequence of this device is not engaged by other devices. Besides the overhead caused by the pilot sequence, a major drawback of the grant-based random access scheme is the collision issue due to the massive number of devices [5].
To avoid the excessive access latency due to collisions, a grant-free random access scheme has been proposed [5]. Under this scheme, the active devices do not need to wait for any grant to access the network, and can directly transmit the payload data following the metadata to the BS. After activity detection and channel estimation based on the pilot sequences, the payload data of the active devices can be decoded. The key idea of activity detection and data decoding under the sporadic pattern is to connect with sparse signal processing and leverage compressed sensing techniques [7]. Compared with the grant-based access scheme [5], the grant-free random access paradigm enjoys a much lower access latency. In the scenario where the payload data contains only a few bits, e.g., an alarm signal, the efficiency can be further improved by embedding the data symbols in the signature sequences [8, 9]. Nevertheless, with massive devices and massive BS antennas, the resulting high-dimensional detection problem brings formidable computational challenges, which motivates our investigation.

I-A Related Works
We consider the grant-free massive random access scheme in a network consisting of one multi-antenna BS and a massive number of devices with small data payloads, where each message is assigned a unique signature sequence. By exploiting the sparsity structure in both the device activity state and the data transmission, joint device activity detection and data decoding can be achieved by leveraging compressed sensing techniques [10, 7]. Recently, a covariance-based method has been proposed to improve the performance of device activity detection [11], where the detection problem is solved by a coordinate descent algorithm with random sampling, i.e., the coordinate-wise iterate to update is selected at random. This covariance-based method has also been applied to joint detection and data decoding [9]. Furthermore, a phase transition analysis for covariance-based massive random access with massive MIMO has been provided in [12].

Although coordinate descent is an effective algorithm for solving the maximum likelihood estimation problem for joint activity detection and data decoding [9], existing works adopted a random coordinate selection strategy, which yields a slow convergence rate. Besides, a rigorous convergence rate analysis for this strategy has not yet been obtained. In this paper, our principal goal is to develop coordinate descent algorithms with more effective coordinate selection strategies for faster activity and data detection in massive random access, supported by rigorous convergence rate analysis.
Coordinate descent algorithms [13] with various coordinate selection strategies have been widely applied to optimization problems for which computing the full gradient of the objective function is computationally prohibitive. They enjoy a low per-iteration complexity, as only one or a few coordinates are updated in each iteration. In most previous works, e.g., [14, 15], each coordinate is selected uniformly at random at each time step. Recent studies have proposed more advanced coordinate selection strategies that exploit the structure of the data and sample the coordinates from an appropriate nonuniform distribution, e.g., [16, 17], which outperform the random sampling strategy in terms of convergence rate.

Specifically, a convex optimization problem that minimizes a strongly convex objective function was considered in [16]. It proposed a Gauss-Southwell-Lipschitz rule that gives a faster convergence rate than choosing random coordinates. Subsequently, Perekrestenko et al. [18] improved the convergence rates of coordinate descent with an adaptive scheme for general convex objectives. Additionally, Zhao and Zhang [19] developed an importance sampling rule where the sampling distribution depends on the Lipschitz constants of the loss functions. The adaptive sampling strategies in [18, 19] require full information about all the coordinates, which yields a high computational complexity at each step. To address this issue, a recent study [17] exploited a bandit algorithm to learn a good approximation of the reward function, which characterizes how much the cost function decreases when the corresponding coordinate is updated. The coordinate descent algorithms proposed in all the works mentioned above solve convex optimization problems. Different from these works, the covariance-based estimation problem is nonconvex. Hence, efficient algorithms with new reward functions and corresponding theoretical analysis are required, which brings unique challenges.

I-B Contributions
In this paper, we propose coordinate descent algorithms with effective coordinate sampling strategies for faster activity and data detection in massive random access. Specifically, we develop a novel algorithm, namely coordinate descent with Bernoulli sampling. Inspired by [17], we cast the coordinate selection procedure as a multi-armed bandit (MAB) problem in which a reward is received when selecting an arm (i.e., a coordinate), and we aim to maximize the cumulative reward over iterations. At each iteration, with a certain probability the coordinate with the largest reward is selected, and otherwise a coordinate is chosen uniformly at random. We provide a convergence rate analysis of coordinate descent with both Bernoulli sampling and random sampling in Theorem 1, which theoretically validates the advantages of the proposed algorithm. While the algorithm and analysis in [17] only considered convex objective functions, we extend them to the nonconvex case.

The value of this exploitation probability plays a vital role in the convergence rate and the computational cost. As demonstrated in Theorem 1, the larger this value is, the more profitable it is to select the coordinate endowed with the largest reward. On the other hand, a larger value leads to a higher computational cost, since the rule of selecting the coordinate with the largest reward requires computing the rewards for all the coordinates. This motivates us to develop a more advanced algorithm called coordinate descent with Thompson sampling, which adaptively adjusts this probability. In this algorithm, an inner MAB problem is established to learn its optimal value, which is solved by a Thompson sampling algorithm. Theoretical analysis demonstrates that a logarithmic expected regret for the inner MAB problem is achieved. Different from the analysis of Thompson sampling in previous works, where the parameters of the beta distribution are required to be integers, e.g., [20, 21], our analysis applies to beta distributions whose parameters take more general and natural forms.

Simulation results show that the proposed algorithms enjoy faster convergence rates with lower time complexity than the state-of-the-art algorithm. It is also demonstrated that coordinate descent with Thompson sampling further improves the convergence rate compared to coordinate descent with Bernoulli sampling. Furthermore, we show that the proposed algorithms can be applied to faster activity and data detection in more general scenarios, i.e., with low-precision (e.g., 1-4 bits) analog-to-digital converters (ADCs).
II System Model and Problem Formulation
In this section, we introduce the system model for massive random access, a.k.a. massive connectivity. A covariance-based formulation is then presented for joint device activity detection and data decoding, which is solved by a coordinate descent algorithm with random sampling.
II-A System Model
Consider an IoT network consisting of one BS equipped with multiple antennas and a massive number of single-antenna IoT devices. The channel state vector from each device to the BS is denoted by

(1)

where the path-loss component depends on the device location, and the Rayleigh fading component over multiple antennas follows an i.i.d. standard complex Gaussian distribution. Due to the sporadic communications, only a few devices are active out of all devices at any given time instant [22]. For each active device, a small number of bits of data are transmitted. This is the case for many applications, e.g., sending an alarm signal requires only 1 bit. Our goal is to achieve joint device activity detection and data decoding.

Due to the massive number of devices and the limited channel coherence block, the length of the signature sequences is generally smaller than the number of devices [9, 22]. We first define a unique signature sequence set for the devices: each possible message of each device is assigned a unique sequence. This sequence set is known at the BS:
(2) 
where each block collects the candidate sequences of one device. We assume that all the signature sequences are generated from an i.i.d. standard complex Gaussian distribution and are known to the BS. If a device is active and aims to send a certain message, it transmits the corresponding sequence from its block. Specifically, an indicator is defined for each device-sequence pair: it equals 1 if the device transmits that sequence, and 0 otherwise. By detecting which sequences are transmitted based on the received signal, i.e., estimating these indicators, the BS achieves joint activity detection and data decoding. In this way, the information bits are embedded in the transmitted sequence, and no extra payload data need to be transmitted, which is very efficient for transmitting a small number of bits [8]. Since at most one sequence is transmitted by each device, the indicators of an active device sum to one, while those of an inactive device are all zero. The received signal at the BS is represented as
(3) 
where the additive noise entries follow an i.i.d. complex Gaussian distribution.
We collect the received signal over all antennas into a matrix, and likewise the additive noise:
(4) 
The channel matrix is concatenated as
(5) 
where each device's channel is repeated for each of its candidate sequences. Recalling the signature sequences defined in (2), the model (3) can be reformulated as [9]:
(6) 
where the block diagonal activity matrix stacks the diagonal activity matrices of the individual devices, and its diagonal entries are the transmission indicators. Our goal is to detect the values of these indicators from the received matrix with the knowledge of the predefined sequence matrix.
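The sparse indicator structure described above (at most one transmitted sequence per device) can be sketched in a few lines of NumPy; all dimensions and the activity level below are illustrative placeholders, not the paper's settings:

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative dimensions (not from the paper): N devices, J-bit messages,
# signature length L, so each device owns Q = 2**J candidate sequences.
N, J, L = 100, 1, 20
Q = 2 ** J

# Signature set: i.i.d. standard complex Gaussian columns, one per (device, message).
S = (rng.standard_normal((L, N * Q)) + 1j * rng.standard_normal((L, N * Q))) / np.sqrt(2)

# Sparse activity: a few active devices, each transmitting exactly one sequence.
K = 10                                   # assumed number of active devices
active = rng.choice(N, size=K, replace=False)
gamma = np.zeros(N * Q)                  # indicator vector, at most one nonzero per device
for n in active:
    q = rng.integers(Q)                  # the J-bit message chosen by device n
    gamma[n * Q + q] = 1.0

# Each device contributes at most one transmitted sequence.
assert all(gamma[n * Q:(n + 1) * Q].sum() <= 1 for n in range(N))
```

Decoding then amounts to recovering which entries of `gamma` are nonzero from the noisy received signal.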
II-B Problem Analysis
To achieve this goal, recent works have developed compressed sensing based approaches [10, 23, 24] that recover the channel matrix from the received signal by exploiting its group sparsity structure, after which the indicators can be determined from its rows. However, such an approach usually suffers from a high algorithmic complexity in massive IoT networks due to the high dimension of the channel matrix. Furthermore, with the messages embedded in the signature sequences, there is no need to estimate the channel state information [9]; thus, recent papers [9, 11] have focused on directly detecting activity by estimating the indicators instead.
Specifically, the estimation can be formulated as a maximum likelihood estimation problem. Given the activity indicators, each column of the received matrix can be treated as an independent sample from a multivariate complex Gaussian distribution such that [9]:
(7) 
where the covariance matrix is determined by the signature sequences, the activity indicators, and the noise variance scaling the identity matrix. Based on (7), the likelihood of the received signal given the indicators involves the determinant and the trace of this covariance matrix [9]. Accordingly, the maximum likelihood estimation problem can be formulated as the minimization problem:

(8)
where the constraint requires each element to be nonnegative. This covariance-based approach was first proposed in [11] for activity detection, and then extended to joint activity and data detection in [9]. Based on the estimate and a predefined threshold, the indicators can be determined by
(9) 
Since each indicator specifies whether a given sequence is transmitted by a given device, the activity state of each device and the transmitted data can then be determined, i.e., joint activity detection and data decoding is achieved.
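For concreteness, the covariance-based objective behind (8) takes, in the standard form used in this literature, a log-determinant plus a trace of the inverse covariance against the sample covariance. A hedged sketch follows; the function signature and symbol names are our own, and the paper's exact scaling may differ:

```python
import numpy as np

def neg_log_likelihood(gamma, S, sample_cov, noise_var):
    """Covariance-based ML objective: log|Sigma| + tr(Sigma^{-1} sample_cov),
    where Sigma = S diag(gamma) S^H + noise_var * I (standard form in the
    covariance-based activity detection literature; naming is ours)."""
    L = S.shape[0]
    Sigma = (S * gamma) @ S.conj().T + noise_var * np.eye(L)
    _, logdet = np.linalg.slogdet(Sigma)          # stable log-determinant
    return logdet.real + np.trace(np.linalg.solve(Sigma, sample_cov)).real
```

Here `sample_cov` is the empirical covariance of the received matrix across antennas, which acts as the sufficient statistic of the columns in (7).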
For ease of algorithm design, an alternative way to solve problem (8) was developed in [9]. Relaxing the constraints yields
(10) 
The first term in (10) is a concave function that makes the objective nonconvex, thereby bringing a unique challenge. The paper [9] showed that the coordinate descent estimator of problem (10) is approximately sparse, so the dropped constraints are approximately satisfied. Specifically, it demonstrated that as the sample size (i.e., the number of antennas) increases, the estimator of problem (10) concentrates around the ground truth and becomes an approximately sparse vector, which implies that the constraints are satisfied approximately when the number of antennas is large. Motivated by its low per-iteration complexity, the papers [9, 11] developed a coordinate descent algorithm to solve the relaxed problem (10), which updates a randomly chosen coordinate until convergence (illustrated in Algorithm 1). However, such a simple coordinate update rule yields a slow convergence rate and lacks rigorous convergence rate analysis with theoretical guarantees. In this paper, we aim to design a novel sampling strategy for coordinate descent to improve its convergence rate.
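The coordinate step used in this line of work admits a closed form: minimizing the objective along one coordinate, clipping to keep the coordinate nonnegative, then applying a Sherman-Morrison rank-one update of the inverse covariance. The sketch below assumes the standard log-determinant-plus-trace objective; the interface and naming are ours:

```python
import numpy as np

def coordinate_update(k, gamma, Sigma_inv, S, sample_cov):
    """One coordinate-descent step for the covariance-based objective
    (closed form common in the covariance-based literature; hedged sketch).
    Updates gamma[k] and Sigma_inv in place; returns the step size d."""
    s = S[:, k]
    a = Sigma_inv @ s                               # Sigma^{-1} s_k
    num = (a.conj() @ sample_cov @ a).real - (s.conj() @ a).real
    den = (s.conj() @ a).real ** 2
    d = max(num / den, -gamma[k])                   # clip so gamma[k] stays >= 0
    gamma[k] += d
    # Sherman-Morrison rank-one update of the inverse covariance
    Sigma_inv -= d * np.outer(a, a.conj()) / (1.0 + d * (s.conj() @ a).real)
    return d
```

Because only one rank-one update is performed, the per-iteration cost is quadratic in the sequence length rather than cubic, which is what makes coordinate descent attractive here.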
There have been many efforts to improve the efficiency of coordinate descent algorithms by developing more sophisticated coordinate update rules. For supervised learning problems, previous works [18, 15] have demonstrated that the coordinate descent algorithm can yield better convergence guarantees when exploiting the structure of the data and sampling the coordinates from an appropriate nonuniform distribution. Furthermore, the paper [17] proposed a multi-armed bandit based coordinate selection method that can be applied to minimize convex objective functions, e.g., Lasso, logistic, and ridge regression. Inspired by [17], we shall apply the idea of Bernoulli sampling to solve the estimation problem (10) with a nonconvex objective function for joint activity and data detection. In the remainder of the paper, we first present a basic coordinate descent algorithm with Bernoulli sampling in Section III, followed by a more efficient algorithm with Thompson sampling in Section IV, both with rigorous analysis. Simulation results are provided in Section VI.

III Coordinate Descent with Bernoulli Sampling
In this section, a basic algorithm, coordinate descent with Bernoulli sampling, is developed. We begin by introducing a reward function for each coordinate, which quantifies the decrease of the objective function in (10) when updating the corresponding coordinate. Based on the reward function, a coordinate descent algorithm with Bernoulli sampling (CD-Bernoulli) is proposed for joint device activity and data detection. The convergence rate of the proposed algorithm is then provided and compared with that of coordinate descent with random sampling [11].
III-A Reward Function
The coordinate selection strategy depends on the coordinate-wise update rule for the decision variable, which is illustrated by Lines 5-7 in Algorithm 1. The following lemma quantifies the decrease of the objective obtained by updating a coordinate according to this rule, which serves as the reward function in our proposed algorithm.
Lemma 1.
Consider problem (10). Choosing a coordinate and updating it with the update rule, we have the following bound: where
(11) 
Proof.
Please refer to Appendix A for details. ∎
A greedy algorithm based on Lemma 1 would simply select at each time step the coordinate with the largest reward. However, the cost of computing the reward functions for all the coordinates is prohibitively high, especially with a large number of devices. To address this issue, the paper [17] adopted a principled approach using a bandit framework to learn the best rewards instead of exactly computing all of them. Inspired by this idea, at each step we select a single coordinate and update it according to the update rule. The reward function of that coordinate is computed and used as feedback to adapt the coordinate selection strategy with Bernoulli sampling. Thus, only partial information is needed for coordinate selection, which reduces the computational complexity of each iteration. Details of the algorithm are provided in the following subsection.
III-B Algorithm and Analysis
Consider a multi-armed bandit (MAB) problem with arms (coordinates in our setting) from which a bandit algorithm selects one for a reward, i.e., the reward in (11), at each time step. The MAB aims to maximize the cumulative reward received over the rounds. After each round, the MAB only receives the reward of the selected arm (coordinate), which is used to adjust its arm (coordinate) selection strategy for the next round. For more background on the MAB problem, please refer to [25].
Based on the MAB problem introduced above, the CD-Bernoulli algorithm is illustrated in Algorithm 2.
To address the computational complexity issue of the greedy algorithm, which requires computing the reward function for all coordinates at each round, Algorithm 2 only computes the reward functions of all the coordinates every few rounds (please refer to Lines 4-6 in Algorithm 2). In the remaining rounds, the reward is estimated based on the most recently observed reward in the MAB. The coordinate selection policy is as follows: with a certain probability the coordinate endowed with the largest estimated reward is chosen, while otherwise a coordinate is selected uniformly at random. This mimics the greedy approach for conventional MAB problems [25] and achieves a tradeoff between exploration and exploitation, that is, between choosing the coordinate with the currently largest reward and exploring other coordinates. The chosen coordinate is then updated according to the update rule, and the corresponding entry of the estimated reward function is updated with the rest unchanged.
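A minimal sketch of the Bernoulli-sampling loop just described, with the reward computation and coordinate update abstracted into callbacks (the interface and parameter names are ours, not the paper's):

```python
import numpy as np

def cd_bernoulli(n_coords, n_iters, p, refresh_period, reward_of, update_coord, rng=None):
    """Hedged sketch of coordinate descent with Bernoulli sampling: with
    probability p pick the coordinate with the largest estimated reward,
    otherwise pick uniformly at random. `reward_of(k)` and `update_coord(k)`
    are problem-specific callbacks (our own abstraction)."""
    rng = rng or np.random.default_rng()
    r_hat = np.zeros(n_coords)                    # estimated rewards
    for t in range(n_iters):
        if refresh_period and t % refresh_period == 0:
            r_hat = np.array([reward_of(k) for k in range(n_coords)])
        if rng.random() < p:
            k = int(np.argmax(r_hat))             # exploit: largest estimated reward
        else:
            k = int(rng.integers(n_coords))       # explore: uniform coordinate
        update_coord(k)
        r_hat[k] = reward_of(k)                   # feedback only for the chosen arm
    return r_hat
```

Here `p` plays the role of the fixed exploitation probability and `refresh_period` is the interval at which all rewards are recomputed.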
The following result shows the convergence rates of coordinate descent for joint activity and data detection with two different coordinate selection strategies, i.e., random sampling and Bernoulli sampling. The estimation error is defined as

(12)

In contrast to the previous work [17], which concerns an objective function consisting of a smooth convex function and a regularized convex function, this paper considers the objective in (10), which consists of a concave function and a convex function. Denoting the best arm (coordinate) as the one with the largest estimated reward in Algorithm 2, we have the following convergence result.
Theorem 1.
Assume that at each iteration the reward estimates are accurate up to a constant that depends on the problem parameters. Then the iterate at each iteration of the CD-Bernoulli algorithm (illustrated in Algorithm 2) for solving problem (10) obeys
(13) 
where the constants depend on the rewards defined in (11). Furthermore, the CD-Random algorithm (illustrated in Algorithm 1) for solving problem (10) yields an analogous bound with a different constant.
Proof.
Please refer to Appendix E for details. ∎
We conclude from Theorem 1 that by choosing proper parameter values (as used in the experiments of Section VI) to make the error term sufficiently small, the bound for CD-Bernoulli outperforms the bound for CD-Random. Hence, Theorem 1 demonstrates that for covariance-based joint device activity detection and data decoding, CD-Bernoulli yields a faster convergence rate than CD-Random.
In Algorithm 2, the exploitation probability plays a vital role in the balance between exploitation and exploration. The larger its value, the more likely the coordinate endowed with the largest current reward (11) is selected at each iteration. However, a larger value leads to insufficient exploration, which may result in a slow convergence rate. Instead of fixing this probability, we prefer to develop a more flexible strategy for choosing it. This motivates the improved algorithm presented in the next section.
IV Coordinate Descent with Thompson Sampling
In this section, we improve the convergence rate of the CD-Bernoulli algorithm by incorporating another bandit problem to adaptively choose the exploitation probability. Specifically, we formulate the choice of this parameter as a general Bernoulli bandit problem and develop a Thompson sampling algorithm for solving it. Theoretical analysis is also presented to verify the advantage of Algorithm 3 over Algorithm 2.
IV-A A Stochastic MAB Problem for Choosing the Exploitation Probability
We first introduce a stochastic multi-armed bandit problem for optimizing the exploitation probability in Algorithm 2. In this paper, we assume that the reward distribution with respect to each candidate probability is Bernoulli, i.e., the rewards are either 0 or 1. Note that this reward is different from the reward function for selecting the coordinates defined in (11).
An algorithm for the MAB problem needs to decide which arm to play at each time step based on the outcomes of the previous plays. Each arm has an (unknown) expected reward; these means are unknown and must be learned by playing the corresponding arms. A general objective is to maximize the expected total reward over the horizon, where the expectation is over the random choices made by the algorithm. Equivalently, one can consider the expected total regret, i.e., the loss incurred by not playing the optimal arm in each step. Letting the per-arm gap be the difference between the optimal mean and that arm's mean, and counting the number of times each arm has been played, the expected total regret admits the standard gap-weighted decomposition [20].
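Written out in standard bandit notation (our notation, matching the quantities described above):

```latex
\mathbb{E}[R(T)]
  = \mathbb{E}\Big[\sum_{t=1}^{T}\big(\mu^{*}-\mu_{i(t)}\big)\Big]
  = \sum_{i:\,\Delta_i>0} \Delta_i\, \mathbb{E}\big[k_i(T)\big],
\qquad \Delta_i := \mu^{*}-\mu_i,
```

where $\mu^{*}=\max_i \mu_i$, $i(t)$ is the arm played at step $t$, and $k_i(T)$ counts the plays of arm $i$ up to time $T$.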
IV-B Thompson Sampling
We first present some background on the Thompson sampling algorithm for the Bernoulli bandit problem, i.e., when the rewards are either 0 or 1 and each arm has its own probability of success (reward = 1). More details on Thompson sampling can be found in [26] and [20].
It is convenient to adopt the Beta distribution as the Bayesian prior on the Bernoulli means. Specifically, the beta distribution is parameterized by two shape parameters, with a probability density function defined via the gamma function. If the prior is a beta distribution, then after a Bernoulli trial the posterior is again a beta distribution: the first parameter is incremented when the trial is a success, and the second when it is a failure.

Previous studies of the Thompson sampling algorithm, e.g., [20], generally assumed that the two parameters are integers. The algorithm initially assumes a uniform prior on each arm's mean, which corresponds to the beta distribution with both parameters equal to one. At each time step, having observed the number of successes (reward = 1) and failures (reward = 0) in the plays of an arm, the algorithm updates the posterior distribution of that arm's mean accordingly. The algorithm then samples from these posterior distributions and plays an arm according to the probability of its mean being the largest.
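The integer-parameter scheme described above is the textbook Thompson sampling loop; a compact sketch (with illustrative arm means) is:

```python
import numpy as np

def thompson_bernoulli(true_means, T, rng=None):
    """Classic Thompson sampling for a Bernoulli bandit with Beta(1, 1)
    priors: sample a mean from each arm's posterior, play the argmax, then
    update the posterior with the observed 0/1 reward (textbook scheme,
    shown here for illustration)."""
    rng = rng or np.random.default_rng()
    n = len(true_means)
    alpha = np.ones(n)                     # successes + 1
    beta = np.ones(n)                      # failures + 1
    plays = np.zeros(n, dtype=int)
    for _ in range(T):
        theta = rng.beta(alpha, beta)      # one posterior sample per arm
        i = int(np.argmax(theta))
        reward = rng.random() < true_means[i]
        alpha[i] += reward
        beta[i] += 1 - reward
        plays[i] += 1
    return plays
```

Over time the posterior of the best arm concentrates, so the algorithm plays it almost exclusively while still occasionally exploring the others.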
Different from previous methods, in this paper we consider a more general way to update these two parameters by evaluating the reward function, as presented in the following subsection.
IV-C CD-Thompson
The coordinate descent algorithm via Thompson sampling (CD-Thompson) is illustrated in Algorithm 3. In this algorithm, a stochastic MAB problem for learning the best exploitation probability at each iteration is established, and a Thompson sampling algorithm is developed to solve it. In Algorithm 3, the reward for selecting a coordinate at each time step is taken into account when updating the beta parameters, and the exploitation probability is chosen accordingly. To be specific, for the selected index and the associated Bernoulli variable, if the trial is a success, we update
(14) 
otherwise, we update
(15) 
where the reward function is defined in (11) and the objective in (10). For illustration, the main processes of CD-Bernoulli and CD-Thompson are illustrated in Fig. 1.
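As a concrete illustration of the non-integer updates in (14)-(15), one plausible form (our own sketch; the paper's exact update is driven by its coordinate reward function) adds fractional success and failure mass to the beta parameters:

```python
import numpy as np

def update_beta_params(alpha, beta, i, r):
    """Fractional beta-posterior update with a real-valued reward r in [0, 1]
    (an illustrative generalization; the paper's exact rule in (14)-(15)
    depends on its reward function)."""
    r = float(np.clip(r, 0.0, 1.0))
    alpha[i] += r                     # non-integer "success" mass
    beta[i] += 1.0 - r                # non-integer "failure" mass
    return alpha, beta
```

Each play still adds one unit of total pseudo-count, so the posterior mean moves toward the observed reward while its variance shrinks, exactly as in the integer case.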
Recall that each arm has an (unknown) expected reward. If an arm has been played a sufficient number of times, its empirical mean is tightly concentrated around the expected reward with high probability. In the following analysis, we assume that the first arm is the unique optimal arm. The expected regret for the stochastic MAB problem in Algorithm 3 is characterized as follows.
Theorem 2.
The stochastic bandit problem for choosing the exploitation probability in Algorithm 3 has an expected regret that grows logarithmically in the time horizon.
Proof.
Please refer to Appendix F for a brief summary of the proof. ∎
Remark 1.
Algorithm 2 adopts a fixed constant as the probability of updating the coordinate with the largest reward function (i.e., coordinate-wise descent value) (11) at each time step, which lacks flexibility for a better exploration-exploitation tradeoff. In contrast, Algorithm 3 improves the strategy of choosing this parameter by establishing a stochastic multi-armed bandit problem for the corresponding probability. This multi-armed stochastic bandit problem addresses an exploitation/exploration tradeoff by sequentially designing the probability at each time step. During the sequential decisions, Algorithm 3 is able to approximate the optimal value of the probability. Theoretically, Theorem 2 demonstrates that Algorithm 3 enjoys a logarithmic expected regret for this stochastic bandit problem, which is typically the best one can expect. Furthermore, the exploitation/exploration tradeoff in Algorithm 3 avoids maintaining a large exploitation probability over many time steps, and thus avoids the high computational cost of computing the rewards for all the coordinates at each time step.
Remark 2.
Different from the previous MAB based coordinate descent algorithm [17], which solves convex optimization problems, our proposed algorithm solves a covariance-based estimation problem that is nonconvex. The beta distribution is a powerful tool for learning priors over Bernoulli rewards. Specifically, we consider a more general way to update its parameters based on the reward function. Our proposed algorithms turn out to enjoy faster convergence rates with modest computational time complexity.
V Application to Massive Connectivity with Low-Precision ADCs
While the formulation in Section II presents a basic massive connectivity system, the proposed algorithms, i.e., CD-Bernoulli and CD-Thompson, can also be applied to solve more general activity detection problems. In this section, we introduce massive connectivity with low-precision analog-to-digital converters (ADCs) as an example. Recently, the use of low-precision (e.g., 1-4 bits) ADCs in massive MIMO systems has been proposed to reduce cost and power consumption [27, 28, 29]. In the following, we illustrate how the proposed algorithms can be applied to this new scenario.
At each receive antenna, the A/D converter samples the received signal and uses a finite number of bits to represent the corresponding samples. Each entry of the received signal in (6) is quantized into a finite set of predefined values by a finite-resolution quantizer. The quantized received signal is thus represented by [29]
(16) 
where the complex-valued quantizer quantizes the real and imaginary parts separately. The real-valued quantizer maps a real-valued input to one of a finite number of bins, which are characterized by a set of thresholds. An element of the output is assigned the representative value of a bin when the corresponding entry of the input falls into that bin, i.e., into the interval between two consecutive thresholds.
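The threshold-based bin mapping just described can be sketched with `np.digitize`; the thresholds and output levels below are placeholders, not the paper's:

```python
import numpy as np

def quantize(x, thresholds, levels):
    """Scalar quantizer: map each input to the representative level of the
    bin its value falls in; real and imaginary parts are quantized
    separately (generic sketch with placeholder thresholds/levels)."""
    thresholds = np.asarray(thresholds)   # increasing interior thresholds
    levels = np.asarray(levels)           # one level per bin (len(thresholds) + 1)
    q = lambda v: levels[np.digitize(v, thresholds)]
    return q(np.real(x)) + 1j * q(np.imag(x))
```

For example, with thresholds `[-1, 0, 1]` and levels `[-1.5, -0.5, 0.5, 1.5]`, an input of `0.3 - 2.0j` maps to `0.5 - 1.5j`.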
Generally, the quantization operation is nonlinear. For ease of applying coordinate descent algorithms to the quantized model, we linearize the quantizer. Based on Bussgang's theorem, the quantizer output can be decomposed into a signal component plus a distortion that is uncorrelated with the signal component [27], i.e.,
(17) 
where the real-valued diagonal matrix contains the distortion factors:
(18) 
which are determined by the bit resolution of the scalar quantizer at each antenna.
Since the distortion is uncorrelated with the signal component, the covariance matrix of the quantizer output can be represented as
(19) 
where the unquantized covariance matrix is defined in (7). Hence, joint device activity detection and data decoding with low-precision ADCs can be formulated as
(20) 
Problem (20) can be efficiently solved by the proposed algorithms, i.e., Algorithm 2 and Algorithm 3. Simulations will be presented in the next section.
VI Simulation Results
In this section, we provide simulation results to demonstrate that the proposed algorithms enjoy faster convergence rates than coordinate descent with random sampling for joint device activity detection and data decoding. Furthermore, we apply our proposed algorithms to massive connectivity with low-precision ADCs.
VI-A Simulation Settings and Performance Metric
Consider a single cell of a given radius containing a massive number of devices, among which a small number are active. The performance is characterized by the probability of missed detection.
The simulation settings are given as follows:

The signature matrix in (2) is generated from an i.i.d. standard complex Gaussian distribution, followed by column normalization.

The channel matrix consists of Rayleigh fading components that follow an i.i.d. standard complex Gaussian distribution. Meanwhile, the path-loss component in (1) for each device is generated from a distance-dependent model in dB.

The additive noise matrix is generated from an i.i.d. complex Gaussian distribution, where the variance is the background noise power normalized by the device transmit power. In the simulations, the background noise power and the transmit power of each device are set to fixed values in dBm.
The performance metric is defined as follows. A missed detection occurs when a device is active but detected as inactive, or when a device is active and detected as active but the data decoding is incorrect. Different probabilities of missed detection can be obtained by adjusting the threshold in (9). In the simulations, we choose a threshold that determines the active devices from the estimated indicators.
The following three algorithms are compared:

Proposed coordinate descent with Bernoulli sampling (CD-Bernoulli): Problem (10) is solved by Algorithm 2 with fixed parameter settings. Note that the computational time increases as the corresponding parameter value increases, while the convergence rate of CD-Bernoulli decreases as it decreases. We thus pick moderate values to illustrate the performance of CD-Bernoulli.
All the algorithms stop when the relative change of the objective function is lower than a certain level, i.e.,
or the number of iterations exceeds .
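To make the coordinate-selection idea concrete, here is a minimal sketch of coordinate descent with Bernoulli sampling and the relative-change stopping rule, applied to a plain least-squares surrogate objective. The sampling probability `p`, the tolerance, and the objective are illustrative assumptions, not the paper's Problem (10).

```python
import numpy as np

def cd_bernoulli(A, b, p=0.3, tol=1e-8, max_iter=500, seed=0):
    """Sketch: minimize f(x) = 0.5 * ||A x - b||^2 by coordinate
    descent, where each coordinate enters the candidate set
    independently with probability p (Bernoulli sampling) and the
    candidate with the largest exact descent is updated."""
    rng = np.random.default_rng(seed)
    n = A.shape[1]
    x = np.zeros(n)
    r = -b.astype(float)                 # residual A x - b at x = 0
    f_prev = 0.5 * np.dot(r, r)
    for _ in range(max_iter):
        cand = np.flatnonzero(rng.random(n) < p)
        if cand.size == 0:
            continue
        # For least squares, minimizing over coordinate j exactly
        # yields the descent g_j^2 / (2 * ||A_j||^2), where g = A^T r.
        g = A[:, cand].T @ r
        norms = np.sum(A[:, cand] ** 2, axis=0)
        j = cand[np.argmax(g ** 2 / (2 * norms))]
        step = -(A[:, j] @ r) / (A[:, j] @ A[:, j])
        x[j] += step
        r += step * A[:, j]
        f_new = 0.5 * np.dot(r, r)
        # Stopping rule: relative change of the objective below tol.
        if abs(f_prev - f_new) <= tol * max(abs(f_prev), 1e-12):
            break
        f_prev = f_new
    return x
```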
VI-B Convergence Rate
In the simulations, the length of the signature sequences is , the number of antennas is , and each device transmits a message of bit or bits. The convergence rates of the different algorithms are illustrated in Fig. 2. We validate the convergence rate analysis in Theorem 1 by comparing CD-Bernoulli (i.e., Algorithm 2) with CD-Random (i.e., Algorithm 1). Furthermore, Fig. 2 shows that CD-Thompson, with its more sophisticated strategy for choosing the probability of updating each coordinate, outperforms Algorithm 2. As illustrated in Fig. 2 and established in Theorem 1, a larger value of yields a larger value of , which leads to a slower convergence rate. In summary, this simulation shows that the proposed algorithms achieve faster convergence rates than the state-of-the-art algorithm [9].
VI-C Probability of Missed Detection
Under the setting of , the computational time of the three algorithms is further illustrated in Fig. 3. It shows that the proposed algorithms achieve the same level of detection accuracy with much less computational time than the algorithm in [9]. The reason is that coordinate selection with Bernoulli sampling or Thompson sampling is able to choose the coordinates that yield a larger descent in the objective value. Additionally, Fig. 3 shows that Algorithm 3 can further reduce the computational time compared to Algorithm 2. This is achieved by a better exploitation/exploration tradeoff in Algorithm 3, which avoids the situation where a large value of is maintained over many time steps, leading to a high computational cost for computing for all in each time step.
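A minimal sketch of the Thompson-sampling flavor of coordinate selection follows, assuming a Beta-Bernoulli model of whether updating a given coordinate has tended to produce a large objective descent. The paper's Algorithm 3 and its actual reward model are not reproduced here; this only illustrates the exploitation/exploration mechanism.

```python
import numpy as np

def thompson_pick(successes, failures, rng):
    """Hypothetical Thompson-sampling picker: each coordinate keeps a
    Beta(1 + s, 1 + f) posterior over how often updating it produced
    a large descent; we sample one draw per coordinate and update the
    coordinate with the largest draw."""
    draws = rng.beta(1.0 + successes, 1.0 + failures)
    return int(np.argmax(draws))
```

After updating the chosen coordinate, one would increment its success count if the observed descent was large and its failure count otherwise, so that rarely useful coordinates are sampled less often over time.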
VI-D Applications in Low-precision ADCs
In this part, we test the proposed algorithms with low-precision ADCs. For the quantization procedure, we use a typical uniform quantizer with quantization step size . For bit quantization, the threshold of this uniform quantizer is given by
(21) 
and the element of the quantization output (17) is assigned the value when the input falls in the th bin, i.e., .
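A hedged sketch of the bit-wise uniform quantizer described above: inputs are clipped to the quantizer range and mapped to bin midpoints with step size `delta`. The exact thresholds in (21) and the output labels in (17) are not reproduced, so the mid-rise convention below is an assumption, as is the component-wise treatment of complex samples.

```python
import numpy as np

def uniform_quantize(x, bits, delta):
    """Sketch of a B-bit uniform mid-rise quantizer with step size
    delta: clip the input to the quantizer range, then return the
    midpoint of the bin it falls into."""
    levels = 2 ** bits
    idx = np.floor(x / delta) + levels / 2      # bin index
    idx = np.clip(idx, 0, levels - 1)           # clip to the range
    return (idx - levels / 2 + 0.5) * delta     # bin midpoint

def quantize_complex(z, bits, delta):
    """Quantize complex samples component-wise (real and imaginary)."""
    return (uniform_quantize(z.real, bits, delta)
            + 1j * uniform_quantize(z.imag, bits, delta))
```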
Under the same setting as Section VI-C, Fig. 4 shows the unquantized case and the quantized cases with different quantization levels, i.e., . To further illustrate the computational cost of the proposed algorithms applied to low-precision ADCs, Fig. 5 shows the probability of missed detection with respect to the computational time. These results demonstrate that bit quantization is sufficient to achieve a convergence rate and accuracy similar to the unquantized scenario.
VII Conclusions
In this paper, we developed efficient algorithms based on multi-armed bandits to solve the joint device activity detection and data decoding problem in massive random access. Specifically, we exploited a multi-armed bandit algorithm to learn which coordinate to update, thereby achieving a more aggressive descent of the objective function. To further improve the convergence rate, an inner multi-armed bandit problem was established to improve the exploration policy. The gains of the proposed algorithms over the state-of-the-art algorithm in convergence rate and time complexity were demonstrated both theoretically and empirically. Furthermore, our proposed algorithms can be applied to a more general scenario, i.e., activity and data detection with low-precision analog-to-digital converters (ADCs), thereby saving energy and reducing power consumption.
Our proposed algorithm only updates a single coordinate at each time step. It would be interesting to further investigate the effect of choosing multiple coordinates from a budget at each time step. At a high level, the proposed approach can be regarded as an instance of "learning to optimize", i.e., applying machine learning to solve optimization problems. Specifically, it belongs to optimization policy learning [30], which learns a specific policy for some optimization algorithm. One related work is [31], which learns the pruning policy of the branch-and-bound algorithm. It would be interesting to apply such an approach to other optimization algorithms to improve the computational efficiency of massive connectivity.
Appendix A Computation of the Reward Function
In this appendix, we derive the reward function of the multi-armed bandit problem for coordinate descent. Define as the index of the selected coordinate and define , where denotes the th canonical basis vector with a single at its th coordinate and zeros elsewhere. We can simplify as follows
(22) 
According to [11], the global minimum of in is , so the descent value of the cost function is:
(23) 
Hence, the reward function is defined as
Appendix B Preliminary Theorems for the Proof of Theorem 1
Several theorems are needed to pave the way for the proof of Theorem 1.
Theorem 3.
Proof.
Please refer to Appendix C for details. ∎
Theorem 4.
Under the assumptions of Lemma 1, we have the following convergence guarantee:
(25) 
for all , where , is the suboptimality gap at , and is an upper bound on for all iterations .
Appendix C Proof of Theorem 3
The selection strategy considered in this proof is to choose the coordinate with the largest reward function defined in (11), which is denoted by . Hence, based on the fact , it follows that
(26) 
which implies
(27) 
which leads to