I Introduction
In the Internet of Things (IoT) paradigm, it is envisioned that many types of devices will be wirelessly connected. A foundational study of the fundamental tradeoffs is needed to enable the successful deployment of ubiquitous, interconnected wireless networks. This new paradigm imposes new traffic patterns on the wireless network. Moreover, devices within such a network often have strict energy consumption constraints, as they are typically battery-powered sensors that transmit bursts of data very infrequently to an access point. Finally, as the name suggests, these networks must support a huge number of interconnected devices.
Due to these new network characteristics, we propose a novel communication and multiple-access model: the Strongly Asynchronous Slotted Massive Access Channel (SASMAC). In a SASMAC, the number of users increases exponentially with the blocklength, at a rate set by the occupancy exponent. Moreover, the users are strongly asynchronous, i.e., each user transmits in one randomly chosen time slot within a window whose length, measured in blocks, grows exponentially with the blocklength at a rate set by the asynchronous exponent. In addition, when active, each user can choose from a set of messages to transmit. All transmissions are sent to an access point, and the receiver is required to jointly decode and identify all users. The goal is to characterize the set of all achievable triplets of rate, asynchronous exponent, and occupancy exponent.
I-A Past work
Strongly asynchronous communications were first introduced in [1] for synchronization of a single user, and later extended in [2] for synchronization with positive transmission rate.
In [3] the authors of [2] made a brief remark about a “multiple access collision channel” extension of their original single-user model. In this model, any collision of users (i.e., users who happen to transmit in the same block) is assumed to result in output symbols that appear as if generated by noise. The error metric is taken to be the per-user probability of error, which is required to be vanishing for all but a vanishing fraction of users. In this scenario, it is fairly easy to quantify the capacity region when the number of users is less than the square root of the asynchronous window length (i.e., in our notation, when the occupancy exponent is less than half the asynchronous exponent). However, finding the capacity of the “multiple access collision channel” for a global / joint probability of error, as opposed to a per-user probability of error, is much more complicated and requires novel achievability schemes and novel analysis tools. This is the main subject and contribution of this paper.
Recently, motivated by the emerging machine-to-machine type communications and sensor networks, a large body of work has studied “many-user” versions of classical multiuser channels, as pioneered in [4]. In [4] the number of users is allowed to grow linearly with the blocklength, and a full characterization of the capacity of the synchronous Gaussian (random) many-access channel was given. In [5], the author studied the synchronous massive random access channel where the total number of users increases linearly with the blocklength. However, the users are restricted to use the same codebook and only a per-user probability of error is enforced. In the model proposed here, the users are strongly asynchronous, the number of users grows exponentially with the blocklength, and we enforce a global probability of error.
Training-based synchronization schemes (the use of pilot signals) were proven to be suboptimal for bursty communications in [2]. Rather, one can utilize the users’ statistics at the receiver for synchronization or user identification purposes. The identification problem (defined in [6]) is a classic problem considered in hypothesis testing. In this problem, a finite number of distinct sources each generates a sequence of i.i.d. samples. The problem is to find the underlying distribution of each sample sequence, given the constraint that each sequence is generated by a distinct distribution.
Studies on identification problems all assume a fixed number of sequences. In [7], the authors study Logarithmically Asymptotically Optimal (LAO) testing for the identification problem with a finite number of distributions. In particular, the identification of only two different objects has been studied in detail, and one can find the reliability matrix, which consists of the error exponents of all error types. Their optimality criterion is to find the largest error exponent for a set of error types for given values of the other error type exponents. The same problem with a different optimality criterion was also studied in [8], where multiple, finite sequences were matched to the source distributions. More specifically, the authors in [8] proposed a test for a generalized Neyman-Pearson-like optimality criterion to minimize the rejection probability given that all other error probabilities decay exponentially with a prespecified slope.
In this paper, we too allow the number of users to increase with the blocklength. We assume that the users are strongly asynchronous and may transmit randomly at any time within a time window that is exponentially large in the blocklength. We require the receiver to recover both the transmitted messages and the users’ identities under a global/joint probability of error criterion. By allowing the number of sequences to grow exponentially with the number of samples, the number of different possibilities (or hypotheses) becomes doubly exponential in the blocklength, and the analysis of the optimal decoder becomes much more challenging than in classical identification problems with a constant number of distributions. These differences in modeling the channel require a number of novel analytical tools.
I-B Contribution
In this paper, we consider the SASMAC, whose number of users increases exponentially with the blocklength. In characterizing the capacity of this model, we require its global probability of error to be vanishing. More specifically, our contributions are as follows:

We define a new massive identification paradigm in which we allow the number of sequences in a classical identification problem to increase exponentially with the sequence blocklength (or sample size). We find asymptotically matching upper and lower bounds on the probability of identification error for this problem. We use this result in our SASMAC model to recover the identity of the users.

We propose a new achievability scheme that supports strictly positive values of for identical channels for the users.

We propose a new achievability scheme for the case that the channels of the users are chosen from a set of conditional distributions. The size of the set increases polynomially in the blocklength. In this case, the channel statistics themselves can be used for user identification.

We propose a new achievability scheme without imposing any restrictive assumptions on the users’ channels. We show that strictly positive are possible.

We propose a novel converse bound for the capacity of the SASMAC.

We show that for , not even reliable synchronization is possible.
I-C Paper organization
In Section II we introduce our massive identification model and present a technical theorem (Theorem 1) that will be needed later on in the proof of Theorem 3. In Section III we introduce the SASMAC model and in Section IV we present our main results. More specifically, we introduce different achievability schemes for different scenarios and a converse technique to derive an upper bound on the capacity of the SASMAC. Finally, Section V concludes the paper. Some proofs may be found in the Appendix.
I-D Notation
Capital letters represent random variables that take on lower case letter values in calligraphic letter alphabets. The notation
means . We write , where , to denote the set , and . We use , and simply instead of . The binary entropy function is defined by .
II Massive Identification Problem
We first introduce notation specifically used in this Section and then introduce our model and results.
II-A Notation
When all elements of the random vector
are generated i.i.d. according to distribution , we denote it as . We use , where , to denote the set of all possible permutations of a set of elements. For a permutation , denotes the th element of the permutation. is used to denote the remainder of divided by . is the complete graph with nodes with edge index and edge weights . We may drop the edge argument and simply write when the edge specification is not needed. A cycle of length in may be interchangeably defined by a vector of vertices as or by a set of edges where is the edge between and is that between . With this notation, is then used to indicate the th vertex of the cycle . is used to denote the set of all cycles of length in the complete graph . The cycle gain, denoted by , for cycle is the product of the edge weights within the cycle , i.e., . The Bhattacharyya distance between and is denoted by .
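The two quantities defined last in this notation, the Bhattacharyya distance and the cycle gain, can be sketched numerically as follows. This is a minimal illustration, not the paper's code: it uses the standard definition of the Bhattacharyya distance for finite alphabets (minus the natural log of the sum of square roots of the product of the two pmfs), and the function names and the dictionary-of-edges representation of the weighted graph are assumptions made for the example.

```python
import math

def bhattacharyya_distance(p, q):
    """Standard Bhattacharyya distance between two pmfs given as equal-length lists."""
    coefficient = sum(math.sqrt(a * b) for a, b in zip(p, q))
    if coefficient == 0.0:
        return math.inf  # disjoint supports
    return -math.log(coefficient)

def cycle_gain(cycle, weight):
    """Product of the edge weights along a closed cycle.

    `cycle` is a list of vertices; `weight` maps frozenset({u, v}) to the
    weight of the undirected edge between u and v (hypothetical layout).
    """
    gain = 1.0
    for k, u in enumerate(cycle):
        v = cycle[(k + 1) % len(cycle)]  # wrap around to close the cycle
        gain *= weight[frozenset((u, v))]
    return gain

# Illustrative weights on a triangle (values chosen arbitrarily).
triangle = {
    frozenset((0, 1)): 0.5,
    frozenset((1, 2)): 0.25,
    frozenset((0, 2)): 0.1,
}
triangle_gain = cycle_gain([0, 1, 2], triangle)
```

The modulo in `cycle_gain` mirrors the remainder notation introduced above for indexing the vertices of a cycle.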
II-B Problem Formulation
Let consist of distinct distributions and also let
be uniformly distributed over
, the set of permutations of elements. In addition, assume that we have independent random vectors of length each. For , a realization of , assign the distribution to the random vector . After observing a sample of the random vector , we would like to identify . More specifically, we are interested in finding a permutation to indicate that . Let . The average probability of error for the set of distributions is given by
We say that a set of distributions is identifiable if .
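The problem formulated above can be sketched as a brute-force matching of observed sequences to candidate distributions. With a uniform prior over permutations, the maximum likelihood rule picks the permutation that maximizes the total log-likelihood; the sketch below enumerates all permutations, which is only feasible for a handful of distributions. The function names and the toy pmfs are hypothetical and chosen for illustration only.

```python
import itertools
import math

def log_likelihood(sample, pmf):
    """Log-probability of an i.i.d. sample (list of symbol indices) under pmf."""
    return sum(math.log(pmf[x]) if pmf[x] > 0 else -math.inf for x in sample)

def ml_matching(samples, pmfs):
    """Return the permutation sigma maximizing sum_i log P_{sigma[i]}(samples[i]).

    sigma[i] is the index of the distribution assigned to the i-th sequence.
    Exhaustive search over all permutations; a sketch, not an efficient decoder.
    """
    scores = [[log_likelihood(s, p) for p in pmfs] for s in samples]
    best, best_score = None, -math.inf
    for sigma in itertools.permutations(range(len(pmfs))):
        total = sum(scores[i][sigma[i]] for i in range(len(samples)))
        if total > best_score:
            best, best_score = sigma, total
    return best

# Toy instance: three binary distributions and one sequence from each.
pmfs = [[0.9, 0.1], [0.1, 0.9], [0.5, 0.5]]
samples = [
    [1, 1, 1, 1, 0, 1],  # mostly ones
    [0, 1, 0, 1, 1, 0],  # balanced
    [0, 0, 0, 0, 0, 1],  # mostly zeros
]
estimate = ml_matching(samples, pmfs)
```

Because the objective is a sum of per-pair scores, the same maximization is a linear assignment problem, so a polynomial-time assignment solver could replace the exhaustive loop for larger instances.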
II-C Condition for Identifiability
In Theorem 1 we characterize the relation between the number of distributions and the pairwise distance of the distributions for reliable identification. Moreover, we introduce and use a novel graph theoretic technique in the proof of Theorem 1 to analyze the optimal Maximum Likelihood decoder.
Theorem 1.
A sequence of distributions is identifiable iff
(1) 
The rest of this section contains the proof. To prove Theorem 1, we provide upper and lower bounds on the probability of error in the following subsections.
II-D Upper bound on the probability of identification error
We use the optimal Maximum Likelihood (ML) decoder, which minimizes the average probability of error, given by
(2) 
where . The average probability of error associated with the ML decoder can also be written as
(3)  
(4) 
where and where (3) is due to the requirement that each sequence is distributed according to a distinct distribution and hence the number of incorrect distributions ranges from . In order to avoid considering the same set of error events multiple times, we incorporate a graph theoretic interpretation of in (4) which is used to denote the fact that we have identified distributions incorrectly. Consider the two sequences and for which we have
These two sequences in (4) in fact indicate the event that we have (incorrectly) identified instead of the (true) distribution . For a complete graph , the set of edges between in would produce a single cycle of length or a set of disjoint cycles with total length . However, we should note that in the latter case, where the sequence of edges constructs a set of (let's say of size ) disjoint cycles (each with some length for such that ), those cycles and their corresponding sequences are already taken into account in the (union of) set of error events.
As an example, assume and consider the error event
which corresponds to the (error) event of choosing over with errors. In the graph representation, this gives two cycles of length each, which correspond to
and are already accounted for in the events
with .
As a result, in order to avoid double counting, in evaluating (4) for each we should only consider the sets of sequences which produce a single cycle of length .
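The double-counting argument above can be sketched by decomposing an error event into its disjoint cycles. In the sketch, the true and decoded assignments are tuples of distribution indices, and each misidentification contributes an edge from the true index to the decoded one; the function name and the four-distribution instance are illustrative assumptions.

```python
def error_cycles(true_sigma, decoded_sigma):
    """Decompose a decoding error into disjoint cycles of length >= 2.

    true_sigma[i] and decoded_sigma[i] are the true and the decoded
    distribution index of the i-th sequence; both must be permutations
    of the same index set.
    """
    mapping = dict(zip(true_sigma, decoded_sigma))  # true index -> decoded index
    seen, cycles = set(), []
    for start in mapping:
        if start in seen or mapping[start] == start:
            continue  # already covered, or correctly identified
        cycle, v = [], start
        while v not in seen:
            seen.add(v)
            cycle.append(v)
            v = mapping[v]
        cycles.append(cycle)
    return cycles

# Four misidentified distributions that split into two cycles of length two:
# each is already covered by an error event with two wrong distributions.
split_event = error_cycles((0, 1, 2, 3), (1, 0, 3, 2))

# Four misidentified distributions forming a single cycle of length four:
# this is the kind of event that must be counted separately.
single_event = error_cycles((0, 1, 2, 3), (1, 2, 3, 0))
```

Only events for which `error_cycles` returns exactly one cycle need to be kept in the union over error events of a given size.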
Before proceeding further, we define the edge weights for a complete weighted graph
In particular, we define to be the edge weight between vertices in the complete graph shown in Fig. 1.
Hence, we can upper bound the probability of error in (4) as
(5)  
(6) 
where enumerates the number of incorrect matchings and where is the th vertex in the cycle . In (6), we have leveraged the fact that is the edge weight between vertices in the complete graph and hence is the gain of cycle . The inequality in (5) is by
(7)  
The fact that we used in (7) instead of finding the exact optimizing , comes from the fact that is the optimal choice for and as we will see later, the rest of the error events are dominated by the set of only incorrectly identified distributions. This can be seen as follows for
(8) 
where in the first equality in (8), by using the Lagrangian method, can be shown to be equal to and subsequently the second inequality in (8) is proved.
In order to calculate the expression in (6), we use the following graph theoretic Lemma, the proof of which is given in Appendix A.
Lemma 1.
In a complete graph and for the set of cycles of length , , we have
where are the number of cycles of length and the number of edges in the complete graph , respectively.
By Lemma 1 and (6) we prove in Appendix B that the upper bound on the probability of error goes to zero if
(9) 
As a result of Lemma 1, it can be seen from (88) that the sum of probabilities that distributions are incorrectly identified is dominated by the probability that only distributions are incorrectly identified. This shows that the most probable error event is indeed an error event with two wrong distributions.
II-E Lower bound on the probability of identifiability error
For our converse, we use the optimal ML decoder, and as a lower bound to the probability of error in (4), we only consider the set of error events with only two incorrect distributions, i.e., the set of events with . In this case we have
(10) 
where (10) is by [12] and where
(11) 
We prove in Appendix C that a lower bound on is given by
(12)  
(13)  
(14) 
where (13) is by Lemma 1. As can be seen from (14), if , the probability of error is bounded away from zero. As a result, we must have
which also matches our upper bound on the probability of error in (89).
Remark 1.
As is clear from the result of Theorem 1, when is a constant or grows polynomially with , the sequence of distributions in is always identifiable and the probability of error in the identification problem decays to zero as the blocklength goes to infinity. The interesting aspect of Theorem 1 is in the regime where increases exponentially with the blocklength.
Having proved the criterion for identifiability of a massive number of distributions in Theorem 1, we move on to the SASMAC problem. We use the result of Theorem 1 to identify the massive number of users by their induced probability distribution at the receiver.
III SASMAC problem
We first introduce the special notation used in the SASMAC and then formally define the problem.
III-A Special Notation
A stochastic kernel / transition probability / channel from to is denoted by , and the output marginal distribution induced by through the channel as
(15) 
where is the space of all distributions on . We define the shorthand notation
(16) 
For a MAC channel , we define the shorthand notation
(17) 
to indicate that users indexed by transmit , and users indexed by transmit their respective idle symbol . When , we use
and when , we use
The empirical distribution of a sequence is
(18) 
where denotes the number of occurrences of letter in the sequence ; when using (18) the target sequence is usually clear from the context so we may drop the subscript in . The type set and the shell of the sequence are defined, respectively, as
(19)  
(20) 
where is the number of joint occurrences of in the pair of sequences .
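The empirical distribution and the type-set membership test described above can be sketched as follows. Exact fractions are used so that two sequences of the same type compare equal without rounding issues; the function names are illustrative and not taken from the paper.

```python
from collections import Counter
from fractions import Fraction

def empirical_distribution(sequence, alphabet):
    """Fraction of positions in `sequence` occupied by each letter of `alphabet`."""
    counts = Counter(sequence)
    n = len(sequence)
    return {a: Fraction(counts[a], n) for a in alphabet}

def same_type(seq_a, seq_b, alphabet):
    """True if the two sequences have equal length and equal empirical distribution."""
    if len(seq_a) != len(seq_b):
        return False
    return empirical_distribution(seq_a, alphabet) == empirical_distribution(seq_b, alphabet)

def joint_counts(seq_x, seq_y):
    """Number of joint occurrences of each letter pair in a pair of sequences."""
    return Counter(zip(seq_x, seq_y))

example_type = empirical_distribution("aabab", "ab")
```

Two sequences belong to the same type set exactly when `same_type` returns True, and `joint_counts` gives the pairwise occurrence counts used to define the shell of a sequence.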
We use to denote the Kullback-Leibler divergence between distribution and , and for the conditional Kullback-Leibler divergence. We let denote the mutual information between random variables with joint distribution .
III-B SASMAC Problem Formulation
Let be the number of messages, be the number of blocks, and be the number of users. An code for the SASMAC consists of:

A message set , for each user , from which messages are chosen uniformly at random and are independent across users.

An encoding function , for each user . We define
(21)
Each user chooses a message and a block index , both uniformly at random. It then transmits , where is the designated ‘idle’ symbol for user .

A destination decoding function
(22)
such that its associated probability of error, , satisfies where
(23)
where the hypothesis that user has chosen message and block is denoted by .
A tuple is said to be achievable if there exists a sequence of codes with . The capacity region of the SASMAC at asynchronous exponent , occupancy exponent and rate , is the closure of all possible achievable triplets.
IV Main results for SASMAC
In this Section we first introduce an achievable region for the case that different users have identical channels (in Theorem 2). We then move on to the more general case where the users’ channels belong to a set of conditional probability distributions of polynomial size in the blocklength (in Theorem 3). In this case, we use the output statistics to distinguish and identify the users. We then remove all restrictions on the users’ channels and derive an achievability bound on the capacity of the SASMAC (in Theorem 4). After that, we propose a converse bound on the capacity of the general SASMAC (in Theorem 5). We then provide a converse bound on the number of users (in Theorem 6).
IV-A Users with Identical Channels
The following theorem gives an achievable region for the SASMAC for the case that different users have identical channels toward the base station when they are the sole active user. In this scenario, user identification and decoding can be merged together.
Theorem 2.
For a SASMAC with asynchronous exponent , occupancy exponent and rate , assume that (recall definition (17)) for all users. Then, the following region is achievable
(24) 
where
(25) 
Proof.
Before starting the proof, we note that for (first bound in (24)), the users transmit in distinct blocks with probability approaching one as the blocklength grows to infinity. Hence, in analyzing the joint probability of error of our achievability scheme, we can safely condition on the hypothesis that users do not collide. The probability of error given that a collision has occurred, which may be large, is multiplied by the probability of collision, so its contribution vanishes as the blocklength goes to infinity, regardless of the achievability scheme. The probability of error for this two-stage decoder can be decomposed as
(26)  
(27) 
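The no-collision claim made at the start of the proof is a birthday-type calculation, and a small numerical sketch may help. It assumes, for illustration only, that the number of users and the number of blocks are the rounded values of e raised to the blocklength times the occupancy exponent and the asynchronous exponent, respectively; the function names and the specific exponent values are hypothetical.

```python
import math

def log_prob_no_collision(num_users, num_blocks):
    """Natural log of the probability that all users pick distinct blocks,
    when each user picks a block uniformly and independently."""
    if num_users > num_blocks:
        return -math.inf  # pigeonhole: a collision is certain
    return sum(math.log1p(-i / num_blocks) for i in range(num_users))

def prob_no_collision_at(n, alpha, nu):
    """No-collision probability at blocklength n (assumed exponential scaling)."""
    num_blocks = round(math.exp(n * alpha))  # assumption: e^{n * alpha} blocks
    num_users = round(math.exp(n * nu))      # assumption: e^{n * nu} users
    return math.exp(log_prob_no_collision(num_users, num_blocks))

# Fewer users than the square root of the window length: collisions become rare.
sparse = [prob_no_collision_at(n, alpha=0.5, nu=0.2) for n in (10, 20, 30)]

# More users than the square root of the window length: collisions dominate.
dense = [prob_no_collision_at(n, alpha=0.5, nu=0.3) for n in (10, 20)]
```

In the first regime the values in `sparse` increase toward one with the blocklength, while in the second the values in `dense` shrink, which is consistent with conditioning on the no-collision hypothesis only in the first regime.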
Codebook generation
Let be the number of users, be the number of blocks, and be the number of messages. Each user generates a constant composition codebook with composition by drawing each message’s codeword uniformly and independently at random from the type set (recall definition in (19)). The codeword of user for message is denoted as .
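The codebook generation step above can be sketched as follows: a codeword drawn uniformly from a type set is obtained by uniformly shuffling any fixed sequence with the prescribed letter counts. The representation of the composition as a dictionary of counts, the function name, and the toy parameters are assumptions made for this illustration.

```python
import random

def constant_composition_codebook(composition, num_messages, rng):
    """Draw `num_messages` codewords independently and uniformly from the
    type set described by `composition` (a dict: symbol -> number of occurrences)."""
    base = [symbol for symbol, count in composition.items() for _ in range(count)]
    codebook = []
    for _ in range(num_messages):
        codeword = base[:]     # copy so each draw starts from the same multiset
        rng.shuffle(codeword)  # uniform shuffle gives a uniform draw from the type set
        codebook.append(tuple(codeword))
    return codebook

# Toy example: blocklength 4, three occurrences of 'a' and one of 'b', 5 messages.
rng = random.Random(0)
codebook = constant_composition_codebook({"a": 3, "b": 1}, num_messages=5, rng=rng)
```

Every codeword in `codebook` has the same empirical distribution by construction, which is the property the constant composition argument relies on.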
Probability of error analysis
A two-stage decoder is used: it first synchronizes and then decodes the users’ messages, where the decoding step also identifies the users. We now introduce the two stages and bound the probability of error for each stage.
Synchronization step. We perform a sequential likelihood test as follows. Fix a threshold
(28) 
For each block if there exists any message for any user such that
(29) 
then declare that block is an ‘active’ block, and an ‘idle’ block otherwise. Let
(30) 
be the hypothesis that user is active in block and sends message . The average probability of synchronization error, averaged over the different hypotheses, is upper bounded by
(31)  
(32)  
(33)  