The past few years have witnessed explosive growth in the demand for massive data transmission, driven by the increasing number of mobile devices and applications. The traditional orthogonal frequency division multiple access (OFDMA) techniques in fourth-generation (4G) networks fail to fully exploit the frequency resources due to their orthogonal nature [3, 4]. This also causes access congestion, especially in dense networks, calling for more efficient solutions. Capable of supporting massive connectivity and improving spectral efficiency, non-orthogonal multiple access (NOMA) has drawn widespread attention as a promising candidate to solve the above issues in future communications [6, 7].
In NOMA systems [8, 9], multiple users are allowed to share the same resources, such as a subchannel or a time slot. One efficient scheme is code-domain NOMA [10, 12]. Specifically, the incoming data of each layer are mapped to a sparse codeword in which the non-zero elements correspond to the subcarriers occupied by that layer. After superimposing the codewords of multiple layers, the transmitter sends multiple data streams to the receiver simultaneously over a limited number of subcarriers, thereby achieving overloading. Each layer is assigned a pre-defined codebook in which the mapping between non-zero elements and the data streams is constructed based on multi-dimensional (MD) modulation techniques [11, 12]. To decode the superimposed codeword, the belief-probability-based message passing algorithm (MPA) has been applied at the receiver.
For the code-domain NOMA, previous works [15, 16, 17, 18] have discussed various schemes to improve the coding gain. In , a constellation optimization method has been developed in which the structure of the MD complex constellation has been optimized based on the star-QAM constellation. In , the Turbo TCM technique has been utilized to generate the multi-dimensional codebooks for each user separately, and the pure MPA has been adopted for decoding. In , a new constellation has been proposed and analyzed based on a novel performance criterion. In , a low-complexity detector has been designed for an uplink NOMA system based on the adaptive Gaussian approximation. Most existing works [15, 16, 17, 18] focus on either optimizing the procedures of MD constellation construction or improving the performance of MPA under the framework in . However, the practical utilization of NOMA systems can be further improved if joint multi-user codeword design can be achieved.
In this paper, we jointly design codewords for multiple users simultaneously via MD trellis-coded modulation (TCM) techniques. The data streams of different users are jointly coded and mapped directly to the sparse codewords. Traditionally, TCM techniques have been well investigated as a suitable approach to jointly optimize the error control coding and modulation through signal set expansion instead of bandwidth expansion. In contrast, we consider the MD-TCM technique purely as a method to design codewords for NOMA users, instead of replacing the channel coding. This poses new challenges for both the codeword design and the multi-user detection. On one hand, due to the multiplexing nature, the mapping from the MD constellation point to the superimposed codeword is different from that in the traditional MD-TCM scheme. Therefore, a new method for selecting and labeling the MD constellation should be considered. On the other hand, due to the entanglement of encoding and multiplexing, the traditional MPA or Viterbi algorithm cannot be directly applied to decode the TCM-based NOMA signals, requiring a new decoding scheme.
To tackle the above challenges, we design new encoding and decoding schemes for joint codeword design in an overloaded NOMA system to improve the coding gain. Over each subcarrier, the data streams of multiple users are rearranged into one data sequence, which is then coded and mapped to an MD constellation point. Our main contributions in this paper can be summarized as below:
To construct the TCM encoder, a novel bipartite set partitioning algorithm based on farthest-point optimization (FPO)  is proposed and analyzed such that the minimum free squared Euclidean (MFSE) distance of the system can be maximized.
A maximum likelihood sequence detection (MLSD) is developed. To balance the complexity and the BER performance, we also propose a suboptimal two-layer joint decoding scheme based on the Viterbi algorithm. The non-orthogonal nature of NOMA is utilized in this scheme and the decoding complexity is analyzed.
Simulation results show that the proposed scheme significantly outperforms the traditional code-domain NOMA and OFDMA in terms of the BER performance. The influence of system parameters on the BER performance and decoding complexity is also investigated.
The rest of this paper is organized as follows. In Section II, we introduce the framework of TCM-based NOMA. In Section III, we discuss the TCM-based NOMA encoder design criteria and detailed steps. Based on the criteria, an FPO-based bipartite set partitioning algorithm is proposed and analyzed. In Section IV, we provide an optimal MLSD decoding scheme. To balance the complexity and BER performance, we also develop a suboptimal two-layer soft-decision based Viterbi algorithm. Simulation results are presented in Section V, and finally, we conclude the paper in Section VI.
II System Model
In this section, we first introduce downlink code-domain NOMA and then present the key idea of TCM-based joint NOMA codeword design.
II-A Code-Domain NOMA Multiplexing
The general structure of a code-domain NOMA system is shown in Fig. 1. In the figure, the base station (BS) sends data streams, each to a NOMA user. The available bandwidth is divided into orthogonal subcarriers. Different from orthogonal access, multiple users can share the same subcarrier simultaneously.
At the transmitter, the data stream of each user in time unit , denoted by , is transmitted over () subcarriers. (Footnote 1: In Fig. 1, we omit the subscript due to the limited space.) Data stream is mapped to a sparse codeword of length , in which nonzero elements of this codeword represent the set of intended signal points and zero elements correspond to the unoccupied subcarriers of user . All NOMA users are multiplexed over shared subcarriers. Therefore, the received signal over subcarrier can be expressed as
where the channel coefficient of subcarrier in time unit is denoted by and the additive white Gaussian noise (AWGN) is denoted by , with as the noise variance.
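The received-signal model described above can be sketched as follows. This is a minimal illustration, assuming the usual downlink form in which the superimposed codeword element on each subcarrier is scaled by that subcarrier's channel coefficient and corrupted by complex AWGN. The names `codewords`, `h`, and `sigma2` are illustrative and are not the paper's notation.

```python
import math
import random

def received_signal(codewords, h, sigma2, rng):
    """Superimpose sparse codewords and pass them through a per-subcarrier channel.

    codewords: list of J lists, each with K complex entries (0 = unoccupied).
    h:         list of K complex channel coefficients (assumed flat per subcarrier).
    sigma2:    noise variance per subcarrier (complex AWGN).
    rng:       a random.Random instance, so runs are reproducible.
    """
    K = len(h)
    std = math.sqrt(sigma2 / 2.0)      # split the variance over the I and Q parts
    y = []
    for k in range(K):
        s = sum(cw[k] for cw in codewords)   # superposition on subcarrier k
        n = complex(rng.gauss(0.0, std), rng.gauss(0.0, std))
        y.append(h[k] * s + n)
    return y
```

Setting `sigma2` to zero recovers the noiseless superposition, which is a convenient sanity check before adding noise.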
II-B Mapping Matrix Design
The occupied subcarrier set of each user is fixed, that is, for each user the positions of non-zero elements in the sparse codeword with respect to any data streams are the same. We can use a binary matrix F to depict such mapping relation where each variable in F indicates whether user occupies subcarrier , i.e., whether is a non-zero element.
Since each codeword contains non-zero elements out of , there are possible mappings between the subcarriers and each user. Note that different users occupy different subsets of subcarriers. Therefore, at most users can be supported simultaneously. Denote as the number of users sharing the same subcarrier in this case. We then have .
For the NOMA system in Fig. 1 where , , , and , a widely used mapping matrix is presented as below:
In Fig. 1, we use a square marked by 1 to represent an occupied subcarrier by each user, otherwise a blank square marked by 0. With the mapping matrix F, we can then denote the set of users occupying subcarrier as . For example, for the mapping matrix in , we have .
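As a hedged illustration of the mapping matrix, the sketch below enumerates every way of choosing the occupied subcarriers and builds one column per user. The function names and the column ordering are assumptions made for this example, so the resulting matrix may differ from the one in the paper by a permutation of users. The parameters (4 subcarriers, 2 non-zero elements per codeword) follow the six-user example of Fig. 1.

```python
from itertools import combinations
from math import comb

def mapping_matrix(K, N):
    """K x J binary matrix whose columns are all N-subsets of K subcarriers."""
    supports = list(combinations(range(K), N))   # one support set per user
    J = len(supports)
    return [[1 if k in supports[j] else 0 for j in range(J)] for k in range(K)]

def users_on(F, k):
    """0-based indices of the users occupying subcarrier k."""
    return [j for j, bit in enumerate(F[k]) if bit == 1]

K, N = 4, 2
F = mapping_matrix(K, N)
J = len(F[0])                  # maximum number of users, comb(K, N)
d_f = comb(K - 1, N - 1)       # users sharing one subcarrier
```

With this ordering, `users_on(F, 0)` lists the users superimposed on the first subcarrier, and each row of `F` sums to `d_f`.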
II-C TCM-based Joint NOMA Codeword Design
In the traditional NOMA scheme, the codeword of each user, , is only related to its data stream . Each user independently selects a codeword from a pre-defined codebook. Though such a design provides a straightforward decoding method, it greatly limits the potential coding gain due to the independent coding of each user. In addition, channel coding needs to be performed ahead of the MD modulation in the existing scheme. Therefore, extra bandwidth expansion is required when both the channel coding and modulation are considered, leading to inefficient utilization of frequency resources.
To tackle the above two issues, we jointly determine the non-zero elements of all the users over each subcarrier based on their data streams. We consider a joint coding and modulation scheme MD-TCM in which the binary convolutional code and M-ary constellation are combined to obtain the coding gain through the signal set expansion instead of the bandwidth expansion. The key idea can be illustrated as below.
II-C1 TCM-based NOMA Encoding
Given the set of users occupying subcarrier , , the data sequence transmitted over subcarrier consists of data streams from users in time unit , which can be denoted as . As mentioned above, according to the first row of the mapping matrix F in equation , we have . Therefore, the data streams for users 1, 2, and 6 in the current time slot in Fig. 1, i.e., , , and , respectively, are transmitted over subcarrier 1. Thus, we have .
In each time unit , a data sequence is encoded into a sequence and then mapped to an MD constellation point via the MD-TCM technique such that the event-error probability can be minimized. Each component of the MD constellation point represents a non-zero element of the sparse codewords over subcarrier , i.e., , . In Fig. 1, after passing the MD-TCM encoder, the data sequence is first encoded as and then projected onto a 6D constellation point consisting of three 2D components, i.e., , , and in Fig. 1.
The -th element of the superimposed codeword in time unit can be obtained by adding these 2D components , carrying the information of all users in . The superimposed codeword is then determined after all elements are obtained through the above process. In Fig. 1, the first element of the superimposed codeword is . Other elements in the superimposed codeword can be similarly obtained, represented by the dark squares in a vertical column.
II-C2 TCM-based NOMA Decoding
At the receiver of each user, we jointly decode the data streams of all users by utilizing the soft-decision Viterbi based MLSD. Depending on the mapping matrix F, the same data stream of each user transmits over different subcarriers, which can be utilized for joint decoding and is different from the traditional Viterbi algorithm.
In each step of the sequence detection, we first estimate the transmitted signal points by utilizing the above non-orthogonal nature. We then update the survivor paths based on the soft-decision Viterbi algorithm. The data streams of each user can be recovered based on the survivor paths.
III TCM-based NOMA Encoder Design
In this section, we first discuss the criteria for TCM-based NOMA encoder design and then illustrate the three phases of the encoder design in detail.
III-A Encoder Design Criteria
We assume that the number of subcarriers is , and the number of bits to be coded and transmitted for each user in each time unit is , , . For example, in Fig. 1, we have for each user. The sequence of non-zero elements in the sparse codewords with respect to subcarrier is denoted as .
As shown in Fig. 2, we utilize the conventional lattice or star constellation to construct an MD mother constellation from which we select the signal set. To be specific, given an M-QAM constellation with size , an M-QAM mother constellation can be constructed with the size of . In Fig. 2, we have a 16QAM mother constellation .
Since the original constellation is composed of 2D signal points, the MD mother constellation consists of -D points. Each -D point is denoted by with , in which represents a signal point from the original M-QAM constellation . We denote the position of a -D point in the 2D plane as . (Footnote 2: For convenience, we denote the position of the signal point in the form of a complex number. The coordinates of this point in the 2D plane are . In the remaining part of this paper, when we mention the position of a -D point , we refer to its position in the 2D plane.) For example, if we have three 2D points extracted from , say, , , and where , then a 6D point in Fig. 2 constructed from these points is denoted by , and its position in the 2D plane is .
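The construction of the MD mother constellation can be sketched as below. This is an illustration under two assumptions: the original constellation is a plain square QAM on the odd-integer grid, and the position of an MD point in the 2D plane is the sum of its 2D components (consistent with the superimposed codeword being obtained by adding the 2D components). The function names are hypothetical, and the mother constellation in Fig. 2 may be built differently.

```python
from itertools import product

def qam_points(M):
    """Square M-QAM points on the odd-integer grid (M must be a perfect square)."""
    side = int(round(M ** 0.5))
    assert side * side == M, "M must be a perfect square"
    levels = [2 * i - (side - 1) for i in range(side)]   # -3, -1, 1, 3 for 16QAM
    return [complex(a, b) for a in levels for b in levels]

def mother_constellation(M, d_f):
    """All d_f-tuples of 2D points: M**d_f points of dimension 2*d_f."""
    return list(product(qam_points(M), repeat=d_f))

def position(md_point):
    """Position of an MD point in the 2D plane: the sum of its 2D components."""
    return sum(md_point)

mother = mother_constellation(16, 3)             # 6D points built from 16QAM
distinct = {position(p) for p in mother}         # positions actually reachable
```

On this unscaled grid, many 6D points collapse onto the same 2D position, which is why a uniqueness step is needed when selecting the signal set.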
With the above definitions, the process of coded modulation can be described as below. According to the TCM principle, in each time unit , out of bits from the data sequence are sent into a convolutional encoder of rate . The rest of uncoded bits from will determine a specific point in a subset of with the size of . In this way, the coded sequence with the length of is mapped to a unique -D constellation point . The th element in the superimposed codeword can be determined based on the position of a chosen -D point , i.e., .
Take Fig. 2 as an example. For subcarrier 1, the data sequence to be transmitted is . Given a signal constellation of size 128, the first three bits of this sequence are sent into a convolutional encoder of rate 3/4 such that the output four coded bits, say, 1011, select a signal subset of size 8, say, . A specific 6D point of , say, , is then selected based on the remaining three uncoded bits of . The value of the first element in the superimposed codeword is actually the position of . The mapping between the coded (or uncoded) bits and the signal subset (or a 6D point) will be illustrated in Section III.B-2.
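The bookkeeping in this example can be made concrete with a small sketch. It does not implement the convolutional encoder: the coded bits are taken as given (the value 1011 from the example), the uncoded bits are a made-up value, and the flat index layout `subset * subset_size + offset` is a hypothetical convention chosen only for illustration. The actual labeling in the paper follows the set partitioning tree.

```python
def select_point_index(coded_bits, uncoded_bits):
    """Map (subset label, in-subset label) to a flat index into the signal set.

    coded_bits:   string of encoder output bits; selects one subset.
    uncoded_bits: string of uncoded bits; selects a point inside that subset.
    """
    subset_index = int(coded_bits, 2)
    offset = int(uncoded_bits, 2)
    subset_size = 2 ** len(uncoded_bits)     # 8 points per subset for 3 bits
    return subset_index * subset_size + offset

# 4 coded bits give 16 subsets of 8 points each, i.e. 128 points in total.
index = select_point_index("1011", "010")
```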
To minimize the event-error probability , we aim to jointly design the coding and modulation so that the MFSE distance between any two coded sequences, denoted by , can be maximized. In other words, we aim to construct an optimal mapping from the transmitted data sequence over each subcarrier to the corresponding non-zero elements in the codewords, i.e., , , . Since the mapping is not related to , we omit it in the remaining part of this section.
As mentioned above, not all bits of sequence participate in the convolution computation, and thus, there exist parallel branches between two states of the convolutional encoder, brought by the uncoded bits. According to the TCM principle, is determined by two terms: 1) the minimum squared Euclidean (MSE) distance between different trellis paths longer than one branch; 2) the MSE distance between the parallel branches of the encoder. Therefore, we have . In Subsections III.B and III.C, we illustrate how to calculate and , respectively, as well as the relationship between them.
III-B Joint Encoding
Three phases are illustrated as below, i.e., 1) signal set selection; 2) signal set labeling; 3) convolutional encoder construction.
III-B1 Signal Set Selection
A signal set is selected from the MD mother constellation so as to satisfy the following two constraints on a) the size of the set; b) the uniqueness of each point’s position in the set. For successful decoding, each coded data sequence of length is required to be mapped to a unique position. For generality, we also assume that any two coded data sequences and transmitted over different subcarriers and are mapped to two different positions. (Footnote 3: In the simulation, we will show both cases where different signal sets or identical signal sets are allocated to different subcarriers.) Since there are subcarriers, unique positions in total are required for the selected signal set.
Therefore, we aim to construct a signal set of size from the MD mother constellation such that the positions of any two points projecting in the 2D plane are different, which can be mathematically formulated as:
This problem can be solved by the following two steps.
a) Obtaining a unique constellation: Note that in the MD mother constellation , there may exist more than one -D constellation point sharing the same position in the 2D plane. Denote the set of points sharing the same position as . To keep the uniqueness of each signal point, we can only select one point as a member of while all the other points are removed. To improve the diversity among non-zero elements of the sparse codewords transmitted over the same subcarrier, we select the MD point with the largest variance among its 2D components, i.e., . Note that the size of is required to be no smaller than ; otherwise, the MD mother constellation is required to be reconstructed.
b) Shaping: After step a), we denote the current constellation as . If the size of is larger than , we continue to remove points until constraint is satisfied. For a constellation, the average energy should be minimized while the MFSE distance between points should be as large as possible. Therefore, we tend to remove those points that are close to their neighbors but far away from the origin. A greedy algorithm is adopted for shaping based on the above criterion. We sort the constellation points in as such that
holds , where the numerator in the fraction is the MFSE distance between point and other points in , and the denominator represents the energy of point . We remove the points one by one in starting from until the size of the remaining signal set is , i.e., the target set is obtained.
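Steps a) and b) can be sketched on a toy constellation as follows. This is a simplified reading of the text, not the authors' implementation: positions are taken as sums of 2D components, ties in variance keep the first point found, the ranking is computed once before any removal, and a point at the origin is treated as having an infinite ratio so that it is always kept. All function names are illustrative.

```python
from itertools import product

def sq(z):
    """Squared magnitude of a complex number."""
    return z.real * z.real + z.imag * z.imag

def unique_by_position(md_points):
    """Step a): keep one MD point per 2D position, preferring the point with
    the largest variance among its 2D components."""
    def variance(p):
        mean = sum(p) / len(p)
        return sum(sq(c - mean) for c in p) / len(p)
    best = {}
    for p in md_points:
        pos = sum(p)
        if pos not in best or variance(p) > variance(best[pos]):
            best[pos] = p
    return best                      # dict: position -> chosen MD point

def shape(positions, target_size):
    """Step b): rank by (min squared distance to others) / energy and keep the
    target_size highest-ranked positions."""
    if len(positions) < target_size:
        raise ValueError("mother constellation too small; rebuild it")
    def ratio(z):
        d_min = min(sq(z - w) for w in positions if w != z)
        energy = sq(z)
        return float("inf") if energy == 0 else d_min / energy
    ranked = sorted(positions, key=ratio, reverse=True)   # best ratio first
    return ranked[:target_size]

# Toy example: 4D points built from QPSK, reduced to 5 unique positions.
qpsk = [1+1j, 1-1j, -1+1j, -1-1j]
best = unique_by_position(product(qpsk, repeat=2))
kept = shape(list(best), 5)
```

In this toy case the four corner positions, which are far from the origin but no farther from their neighbours than the other points, are the ones removed.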
III-B2 Signal Set Labeling
Signal set labeling refers to the process in which each coded sequence , is assigned a unique signal point in such that the MFSE distance can be maximized. As shown in Fig. 2, two steps of signal set labeling for NOMA are listed as below:
Subcarrier-based Set Partitioning: As mentioned above, any two coded sequences and are assigned to different signal points in . Therefore, we divide into subsets with the equal size, i.e., . Each subset corresponds to all possible signal points transmitting over subcarrier , satisfying .
TCM-based Set Partitioning: For each subcarrier , a signal point in is assigned to a coded sequence such that the MFSE distance can be maximized. To construct the mapping based on the above criterion, we adopt the set partitioning technique in which a binary partitioning tree is utilized . In each level of the tree, the current signal set is divided into two subsets such that the minimum squared subset distance (MSSD), that is, the MSE distance between signal points in the same subset, is maximized. In the last level of the tree, the leaf node refers to a constellation point. Each constellation point can be reached via a unique path through the tree.
Overall Set Partitioning Process: To further maximize the MFSE distance of the coding scheme with respect to each subcarrier, we consider adopting the binary-tree based set partitioning technique in the first step as well. That is to say, given , we aim to maximize the MSSD of each subset . For the system shown in Fig. 1 and Fig. 2, the signal set labeling can be described as below. We first construct a super binary partitioning tree of levels with the root node . The first level of the tree consists of two subsets divided from , denoted by and . For level of the tree, each branch node represents a point set which will be divided into two subsets in the next level, i.e., and . The signal sets obtained after the subcarrier-based set partitioning can be found in level 2 of the tree, i.e., . In the last level of the tree, we have , i.e., each leaf node represents a specific signal point. The coded sequence mapped to this point is labeled by the path towards this leaf node. For example, we have implying that the coded sequence 0000000 transmitted over subcarrier 1 is mapped to the signal point .
Based on the binary subtree for each subcarrier , , we observe that can be obtained by inspecting the minimum MSSD of the subsets in the th level of the subtree, i.e., the th level of the binary tree in Fig. 2.
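The path-based labeling over the binary partitioning tree can be sketched as below. The recursion takes any bipartition routine as a parameter; `alternate_split` is only a placeholder that deals sorted points alternately into two halves and is not the FPO-based algorithm proposed in this paper. Both function names are hypothetical.

```python
def label_by_partition(points, split):
    """Recursively halve `points` with `split`; return {bit-path: point}."""
    labels = {}

    def recurse(subset, path):
        if len(subset) == 1:
            labels[path] = subset[0]     # leaf: one signal point
            return
        left, right = split(subset)
        recurse(left, path + "0")
        recurse(right, path + "1")

    recurse(list(points), "")
    return labels

def alternate_split(subset):
    """Placeholder bipartition (NOT FPO-BSP): sort, then deal alternately."""
    ordered = sorted(subset, key=lambda z: (z.real, z.imag))
    return ordered[0::2], ordered[1::2]

# Eight collinear points give a 3-level tree and 3-bit labels.
points = [complex(i, 0) for i in range(8)]
labels = label_by_partition(points, alternate_split)
```

Each label is the sequence of branch choices from the root to the leaf, so the first bits of a label identify the coarser subsets that contain the point.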
Basic Bipartite Set Partitioning Operation: Note that the basic operation in the above process is to divide a signal set into two subsets with the equal size. We now formulate this basic bipartite set partitioning problem and propose a novel algorithm to solve it.
a) Problem Formulation: Given a signal set , we aim to divide it into two subsets and with equal size such that the MSSD is maximized. The MSSD of each subset is given by
Therefore, an optimal bipartite set partitioning of , i.e., , can be obtained by solving the following problem:
This is a non-trivial problem due to its combinatorial nature and the irregular positions of the points. The traditional multi-level coding technique, which was originally designed for partitioning lattice or star constellations, no longer fits this case. We therefore propose a modified farthest point optimization algorithm to address this problem, as illustrated below.
b) Algorithm Design: We observe that the distribution of the points in our formulated problem has the blue noise properties, i.e., i) the signal points cover a certain area and there are no “holes” or “clusters” in the 2D plane; ii) the points in the selected set are distributed almost irregularly. Such properties have drawn great attention in the research on point set generation, where a farthest point optimization (FPO) strategy has been utilized for generating point distributions with high-quality “blue noise” characteristics, i.e., large point spacing in a given area.
Instead of point generation, we aim to select points for each subset given the point distribution. By extending the FPO strategy, we then propose a novel bipartite set partitioning (FPO-BSP) algorithm to solve problem , consisting of two phases: i) initial subset construction; ii) FPO iteration. Details of the algorithm are shown in Algorithm 1 and are illustrated as below.
For convenience, we first present three different distance metrics. Given a signal point , its minimum distance to a set is defined as
Based on this definition, the MSSD of a set in can be rewritten by
The average minimum distance of a set , denoted by , can be obtained based on each point’s minimum distance, i.e.,
Since we tend to divide the target signal set into two subsets where the points are spread out as far as possible, both the MSSD and the average minimum distance of the subsets are encouraged to be maximized.
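The three distance metrics can be written down compactly. The sketch below assumes squared Euclidean distances throughout and represents signal points by their complex positions; the function names are illustrative rather than the paper's notation.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two 2D positions."""
    d = a - b
    return d.real * d.real + d.imag * d.imag

def min_dist(x, S):
    """Minimum squared distance from point x to the other points of S."""
    return min(sq_dist(x, s) for s in S if s != x)

def mssd(S):
    """Minimum squared subset distance: smallest pairwise distance inside S."""
    return min(min_dist(x, S) for x in S)

def avg_min_dist(S):
    """Average over S of each point's minimum squared distance within S."""
    return sum(min_dist(x, S) for x in S) / len(S)
```

For instance, for the collinear points 0, 1, and 3, the per-point minimum distances are 1, 1, and 4, so the MSSD is 1 and the average minimum distance is 2.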
i) Initial Subset Construction: As shown in Phase 1 of Algorithm 1, we adopt a greedy method to divide the target set into two initial subsets. For the target set of size , there are pairs of points in total. We sort these pairs in the increasing order of the distance between two points (line 2 in Algorithm 1). The list can be mathematically presented as
As shown in line 3 of Algorithm 1, we initialize two subsets as and . The remaining pairs of points in list are then traversed to be added into different subsets (line 5-21). To be specific, we consider a pair . If both points never show up in the current subsets, we then add and separately into and in which such that the minimum distance between the members of this pair and the subsets can be maximized (line 15-17), i.e.,
However, when the distance between and is larger than the MSSD of current subsets, i.e., , it is not necessary to place and in different subsets. Each of them joins the subset to which it has the larger minimum distance (line 11-13). If one of the points in the target pair has already been added to a subset in previous operations, then the other point is naturally added to the other subset (line 19-21). The whole process ends when the size of one subset reaches . The remaining unchosen points in list are then added to the other set, which has not been fully filled (line 6-9).
ii) FPO Iteration: Based on the above initialized subsets, we then perform the FPO iteration as shown in Phase 2 of Algorithm 1. The key idea of the FPO method is illustrated as below. Given two initial subsets and , we consider replacing each point in subset with the farthest position for this subset, which is selected from the other subset . (Footnote 4: Different from the research on point distribution generation, we cannot generate new points in the given set, and thus, we redefine the farthest position as the farthest position among the existing points of the set.) The farthest position for , defined as , can be obtained by searching ,
For each point , it is first removed from and inserted into (line 28). We then search for the farthest position for from the set based on equation (line 29-34). When all points in are traversed once, one FPO iteration is finished (line 25-35). During multiple FPO iterations, both the MSSD and the average minimum distance of the subsets will be increasing until convergence, which will be proved in detail in Proposition 1.
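A simplified version of the FPO iteration is sketched below. It is not Algorithm 1: it uses exhaustive search instead of a Delaunay triangulation, skips Phase 1, and uses a plain acceptance rule (a swap must strictly increase the replaced point's minimum distance in its subset and must not lower the MSSD of the other subset). Under this rule the MSSD of both subsets is non-decreasing by construction. The pass limit `max_passes` is an added safeguard, and all names are hypothetical.

```python
def sq_dist(a, b):
    d = a - b
    return d.real * d.real + d.imag * d.imag

def min_dist(x, S):
    """Minimum squared distance from x to the points of S other than x."""
    return min(sq_dist(x, s) for s in S if s != x)

def mssd(S):
    return min(min_dist(x, S) for x in S)

def fpo_pass(A, B):
    """One pass: try to replace each point of A by the point of B that is
    farthest from the rest of A, keeping |A| = |B|."""
    A, B = list(A), list(B)
    changed = False
    for i in range(len(A)):
        x = A[i]
        rest = A[:i] + A[i + 1:]
        j_best, d_best = None, min_dist(x, rest)
        for j, cand in enumerate(B):             # exhaustive farthest-point search
            d = min_dist(cand, rest)
            if d > d_best:
                j_best, d_best = j, d
        if j_best is None:
            continue                             # x is already the farthest point
        new_B = B[:j_best] + B[j_best + 1:] + [x]
        if mssd(new_B) >= mssd(B):               # do not degrade the other subset
            A[i] = B[j_best]
            B = new_B
            changed = True
    return A, B, changed

def fpo_iterate(A, B, max_passes=50):
    """Repeat passes until no swap is accepted or the pass limit is reached."""
    for _ in range(max_passes):
        A, B, changed = fpo_pass(A, B)
        if not changed:
            break
    return A, B

# Eight collinear points, initially split into two clustered halves.
A0 = [complex(i, 0) for i in range(4)]
B0 = [complex(i, 0) for i in range(4, 8)]
A1, B1 = fpo_iterate(A0, B0)
```

In this toy run the points of the first subset spread out while the second subset is left no worse than before, which mirrors the qualitative behaviour stated for Phase 2.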
Delaunay Triangulation for FPO: Note that in the FPO method, a large number of operations such as point searching, removing, and inserting are required. To achieve a low computational complexity, we introduce the Delaunay triangulation (DT) method to construct a dynamic graph in which the relative positions of points can be better depicted and easily traced.
Definition 1: Given a point set , a DT refers to a triangulation such that no point in lies inside the circumcircle of any triangle in . Any edge in a DT is called a Delaunay edge.
In a DT, each triangle follows the property of empty circumcircle. One commonly used method for constructing such a triangulation is the on-line DT method. Starting from a certain point in the set, the neighboring points are inserted one by one to form the triangles while the property of empty circumcircle is guaranteed. Given a formulated triangulation , point inserting and removing can be completed flexibly via the local optimization procedure in which Delaunay edges are added or removed within a localized area.
Remark 1: In Algorithm 1, we refer to the above operations as DT-INSERT and DT-REMOVE, implying the point inserting into and removing from , respectively.
Remark 2: In each FPO iteration, suppose that the DT constructed from the set is . The minimum distance between a point and can be obtained by searching the Delaunay neighbors of in instead of searching all points in . (Footnote 5: The Delaunay neighbors of refer to those points in sharing the same edges with .) We refer to this operation as DT-SEARCH.
Algorithm Interpretation of Phase 2: Algorithm 1 with respect to DT can be re-interpreted in detail as below. Following the on-line DT method, the triangulations of two initial point subsets and are constructed, denoted by and . As shown in Phase 1, points are added into two subsets in sequence (line 4, 14, 18, 22). In each FPO iteration, for each vertex , the minimum distance between and , i.e., , can be obtained by searching the Delaunay neighbors of in (line 26). After recording the current minimum distance , vertex is removed from and inserted into (line 27-29). We then traverse the vertices in the newly constructed to search for the farthest position of (line 30-36). If the MSSD of will not decrease and will not be smaller than the MSSD of , is then replaced by the farthest position of (line 35-37). The sizes of the two subsets remain equal since we just swap a new point in with an old one in . The FPO iterations will not stop until no changes can be made to and (line 38).
c) Analysis of the Proposed Algorithm: The convergence and complexity of the FPO-BSP algorithm are analyzed as below. The proof of Proposition 1 can be found in Appendix A.
Proposition 1: In each FPO iteration of Algorithm 1, the average minimum distance and MSSD of subset are increasing, and the MSSD of subset is non-decreasing. Therefore, Phase 2 is guaranteed to converge.
Remark 3: Since the MSSD of two initial subsets of Phase 1 are usually different, we set the subset with a smaller MSSD as and the other one as , and then send them to Phase 2. Based on Proposition 1, the MSSD of and will be more balanced after the iterations.
As discussed in Appendix B, the complexity of Phase 1 is and the complexity of each FPO iteration in Phase 2 is .
III-B3 Convolutional Encoder Design
As mentioned in Subsection II.A.2, a convolutional encoder is adopted in the TCM-based NOMA scheme to generate coded bits. We now illustrate how to design the rate and the structure of the convolutional encoder.
As shown in equation and Fig. 2, rate determines the level of set partitioning in a way that the value of increases with . Since the decoding complexity also grows with , a trade-off should be reached between and the complexity.
For the different rate with , the optimal convolutional encoder that maximizes is different. For simplicity, we assume that the same convolutional encoder of rate is designed for different subcarriers. The diagram of a systematic feedback convolutional encoder (Fig. 18-16 in ) is adopted. Note that in the traditional MD-TCM scheme, the distance between any two nodes and is . This can be considered as an upper bound of the distance between these two nodes in our proposed TCM-based NOMA scheme since we have
Therefore, we adopt the structure of the convolutional encoder for M-PSK/QAM in the MD-TCM scheme . For a given rate , the value of increases with the number of register states, , in the encoder. However, a large value of increases the decoding complexity as well, requiring a trade-off between and . Since the optimal convolutional encoders have always been found by computer search , we do not illustrate the detailed process in this paper.
Note that the above method is suitable for AWGN channels. However, for Rayleigh fading, the number of parallel branches should be reduced and the encoder state diagram should be redesigned such that a) the shortest error event path length can be increased; b) the product of the squared branch distances with respect to that path can be maximized. Given an encoder designed for AWGN channels, we can modify the state diagram by reducing the parallel branches based on the above criteria.
IV TCM-based NOMA Decoder Design
Due to the non-orthogonal nature of the NOMA, signals of multiple users are superimposed and transmitted over the same subcarrier, and thus a joint decoding technique is required. Since the convolutional encoding is adopted, an efficient sequence detection should be considered in the decoding scheme. In this section, we design a two-layer Viterbi-based algorithm for the joint TCM-based NOMA decoding in which the soft-decision based MLSD technique is utilized.
IV-A Criterion for Joint Decoding
We assume that data sequences are sent into each TCM encoder sequentially, i.e., one sequence per time unit. Denote as the number of time units required for the encoder registers to be cleared. The set of coded sequences is . In the traditional ML detection, the decoder should produce a set of estimates of given the received sequence such that is maximized.
However, due to the multiplexing nature of NOMA, the input data sequences of different encoders overlap with each other, which can be utilized for joint decoding. For example, given the mapping matrix in , the data streams of users 1, 2, and 3 are coded and transmitted over subcarrier 1, and those of users 1, 4, and 5 over subcarrier 2. A successful decoding scheme requires that the decoded data sequences of the same user obtained from different subcarriers in the same time unit be identical. In other words, the data sequences of user 1 decoded from the first two subcarriers, i.e., and , should be identical, and the same is true for other users. Mathematically, we have
Based on the above idea, we adopt a cross check in the ML detection for joint NOMA decoding. Given the encoder state of time unit and the input data sequence of time unit , a TCM encoder only produces one out of possible code sequences as the output. To depict this, we introduce a binary probability variable in which is the encoder state in time unit . Given and , the probability of obtaining c over subcarrier is
By combining condition and definition , we present the following proposition.
Proposition 2: Given the encoder states in time unit , if code sequences in time unit are correctly estimated, then the following condition is satisfied:
in which the kit of data sequences satisfies condition .
For convenience, we denote a kit of code sequences as . Since there are encoders, we define the super encoder state in time unit as the combination of encoder states of all encoders, i.e., . Proposition 2 provides a necessary condition, and thus, there may exist multiple qualified kits of code sequences in each time unit . For any possible super encoder state , the set of qualified kits of coded sequences satisfying Proposition 2 in time unit is contained in .
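The data-consistency part of the cross check can be illustrated with a minimal Python sketch. It covers only the requirement that a user's decoded data agree on every subcarrier carrying that user, not the encoder-state condition of Proposition 2. The function name `passes_cross_check` and the dictionary layout are hypothetical; the two subcarriers and their user sets follow the example given earlier (users 1, 2, 3 on subcarrier 1 and users 1, 4, 5 on subcarrier 2).

```python
def passes_cross_check(kit):
    """kit maps subcarrier -> {user: decoded data bits}.
    Return True if every user's data is identical on all subcarriers
    that carry that user (hypothetical helper, for illustration only)."""
    seen = {}
    for per_user in kit.values():
        for user, bits in per_user.items():
            # setdefault stores the first value seen for this user
            if seen.setdefault(user, bits) != bits:
                return False
    return True

# User 1 appears on both subcarriers with the same data: qualified kit.
consistent = {1: {1: (0, 1), 2: (1, 1), 3: (0, 0)},
              2: {1: (0, 1), 4: (1, 0), 5: (0, 0)}}
# User 1 is decoded differently on subcarrier 2: the kit is rejected.
conflicting = {1: {1: (0, 1), 2: (1, 1), 3: (0, 0)},
               2: {1: (1, 1), 4: (1, 0), 5: (0, 0)}}
```

A decoder would apply such a filter to every candidate kit in a time unit and keep only those that pass.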
The criterion for joint NOMA decoding can then be described as follows. The decoder is required to estimate given the received sequence y by solving the following problem:
in which is called the branch metric.
IV-B Joint Decoding Scheme Design
To solve the ML detection problem in , we first present an optimal solution extended from the Viterbi algorithm . To obtain a more practical solution with tolerable complexity, we then design a suboptimal two-layer Viterbi-based decoding algorithm.
IV-B1 Optimal Maximum Likelihood Decoding Scheme
For each encoder, we assume that the number of registers is , and thus there are encoder states. Since there are encoders, we have super encoder states in total. A super encoder state diagram can then be constructed based on the super encoder states. Considering the cross check requirement of the NOMA decoder, we assume that one super encoder state can transition to another only if the input data sequences and output coded sequences, i.e., and , satisfy Proposition 2.
Given the super encoder state diagram, the soft-decision based Viterbi algorithm can then be performed. For each super encoder state, we aim to update the survivor path in each time unit. A survivor path in time unit refers to a record consisting of a series of super encoder states , and the branch between every two consecutive states and . A branch is denoted by the estimated input data sequences and output code sequences, i.e., and . Mathematically, a survivor path with respect to can be given by
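The growth of the super encoder state space can be made concrete with a short Python sketch. It assumes binary registers, so that an encoder with a given number of registers has two to that power states; `num_registers`, `num_encoders`, and `super_states` are hypothetical names introduced here for illustration.

```python
from itertools import product

def super_states(num_registers, num_encoders):
    """Enumerate super encoder states as tuples of per-encoder states.
    Assumes binary registers, i.e. 2**num_registers states per encoder."""
    per_encoder = range(2 ** num_registers)
    return list(product(per_encoder, repeat=num_encoders))

# Toy values: 2 registers per encoder and 3 encoders.
states = super_states(num_registers=2, num_encoders=3)
all_zero = (0, 0, 0)   # the super state in which every encoder is cleared
```

Even for these toy values the list already has 4**3 = 64 entries, which illustrates why traversing every super encoder state in each time unit becomes costly.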
To update the survivor path for each , we utilize the branch metrics in to select a state as well as the branch bridging and . Specifically, the branch metrics in each time unit can be depicted by the total Euclidean distances between received signals and the signal points mapped to the candidate code sequences, i.e., , in which maps a coded sequence to the output signal point in the constellation.
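A minimal Python sketch of such a branch metric is given below. It sums squared Euclidean distances over subcarriers, which is the usual choice for additive white Gaussian noise and is an assumption here, since the exact expression is not reproduced above. The QPSK-style mapper and the names `QPSK` and `branch_metric` are hypothetical stand-ins for the codeword-to-constellation mapping.

```python
# Hypothetical mapper: code bits -> constellation point (QPSK-like).
QPSK = {(0, 0): 1 + 1j, (0, 1): -1 + 1j, (1, 1): -1 - 1j, (1, 0): 1 - 1j}

def branch_metric(received, code_kit, mapper=QPSK):
    """Sum over subcarriers of the squared distance between the received
    sample and the point mapped from that subcarrier's code sequence."""
    return sum(abs(y - mapper[c]) ** 2 for y, c in zip(received, code_kit))

# One received sample per subcarrier and two candidate kits of code sequences.
received = [0.9 + 1.1j, -1.2 - 0.8j]
candidates = [((0, 0), (1, 1)), ((0, 1), (1, 1))]
best = min(candidates, key=lambda kit: branch_metric(received, kit))
```

The candidate with the smallest metric is the one whose mapped points lie closest to the received samples, and it is the branch that a survivor-path update would extend.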
In the last time unit , we select the shortest survivor path terminating in the all-zero super encoder state as the final path. The optimality of this cross-check based soft-decision Viterbi algorithm can be guaranteed since the final survivor path is always the ML path (Theorem 12.1 in ).
IV-B2 Suboptimal Two-layer Viterbi-based Joint Decoding Scheme
In the optimal decoding algorithm, for each time unit the decoder is required to traverse all super encoder states and possible branches to update the survivor path for each super encoder state, leading to a prohibitively high computational complexity. To reduce the complexity, in the suboptimal decoding scheme we check only the most likely super encoder states and branches, updating no more than survivor paths in each time unit.
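The survivor-path update and the final selection of the path ending in the all-zero state can be illustrated on a much smaller problem. The following Python sketch runs soft-decision Viterbi decoding for a single binary rate-1/2 convolutional encoder with generators (7, 5) in octal and BPSK mapping. This toy code is a stand-in chosen for brevity: it is neither the TCM encoder nor the joint super-state trellis described here, and all function names are hypothetical.

```python
def encode_step(state, bit):
    """One step of a (7,5) rate-1/2 encoder with two registers.
    state packs the registers as 2*s1 + s2, s1 being the most recent bit."""
    s1, s2 = state >> 1, state & 1
    out = (bit ^ s1 ^ s2, bit ^ s2)          # generators 111 and 101
    return (bit << 1) | s1, out

def encode(bits):
    state, symbols = 0, []
    for b in list(bits) + [0, 0]:            # two tail bits clear the registers
        state, out = encode_step(state, b)
        symbols.append(tuple(1.0 - 2.0 * o for o in out))  # BPSK: 0 -> +1, 1 -> -1
    return symbols

def viterbi(received):
    inf = float("inf")
    cost = [0.0, inf, inf, inf]              # decoding starts in the all-zero state
    paths = [[], [], [], []]
    for r in received:
        new_cost, new_paths = [inf] * 4, [None] * 4
        for state in range(4):
            if cost[state] == inf:
                continue                     # state not reachable yet
            for bit in (0, 1):
                nxt, out = encode_step(state, bit)
                # squared Euclidean branch metric against the BPSK points
                metric = sum((ri - (1.0 - 2.0 * o)) ** 2 for ri, o in zip(r, out))
                if cost[state] + metric < new_cost[nxt]:
                    new_cost[nxt] = cost[state] + metric
                    new_paths[nxt] = paths[state] + [bit]
        cost, paths = new_cost, new_paths
    return paths[0][:-2]                     # survivor ending in state 0, tail removed

message = [1, 0, 1, 1, 0]
clean = encode(message)
```

In the joint scheme, the four states would be replaced by super encoder states and each transition would additionally have to pass the cross check before it is considered.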
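One simple way to cap the number of stored survivor paths is to rank them by accumulated path metric and keep the best ones, in the spirit of M-algorithm style pruning. The Python sketch below assumes this ranking rule, which may differ from the selection rule of the two-layer scheme; `prune_survivors` and `max_paths` are hypothetical names.

```python
import heapq

def prune_survivors(survivors, max_paths):
    """survivors maps super state -> (path_metric, path).
    Keep the max_paths entries with the smallest path metric."""
    best = heapq.nsmallest(max_paths, survivors.items(),
                           key=lambda item: item[1][0])
    return dict(best)

# Four stored super states with toy path metrics and paths.
survivors = {(0, 0): (1.5, ["a"]), (0, 1): (0.2, ["b"]),
             (1, 0): (3.0, ["c"]), (1, 1): (0.9, ["d"])}
kept = prune_survivors(survivors, max_paths=2)
```

Pruning trades optimality for complexity: if the true path is discarded in some time unit, it cannot be recovered later.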
a) Inner Layer Soft-decision Viterbi Operation: For convenience, we denote the set of stored super encoder states corresponding to the survivor paths in time unit as . For each , we first illustrate how to select the candidate branches based on the received signals over subcarriers in time unit . The inner layer operation is performed over each subcarrier separately.
For a received signal point, most candidate output signal points lie in its neighborhood, since the noise follows a Gaussian distribution with a given variance. Therefore, instead of recording all the signal points corresponding to potential branches, we only record those lying in the neighborhood. The neighborhood of the received signal point can be defined by a circle centered at this signal point.
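The neighborhood selection on one subcarrier can be sketched in Python as follows. The circle radius is a free design parameter that is not specified here, and the QPSK-style constellation and the names `neighbors` and `radius` are hypothetical choices for illustration.

```python
# Hypothetical constellation: code bits -> output signal point.
CONSTELLATION = {(0, 0): 1 + 1j, (0, 1): -1 + 1j,
                 (1, 1): -1 - 1j, (1, 0): 1 - 1j}

def neighbors(received, constellation, radius):
    """Return the candidate points inside the circle of the given radius
    centered at the received signal point."""
    return {code: point for code, point in constellation.items()
            if abs(point - received) <= radius}

# A received sample close to the point mapped from (0, 0).
near = neighbors(0.8 + 0.9j, CONSTELLATION, radius=1.0)
```

A larger radius keeps more candidate branches and moves the scheme closer to the optimal search, while a smaller radius lowers the complexity at the risk of excluding the transmitted point.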
To be specific, for each subcarrier , given the current encoder state , the set of all possible input data sequences is denoted as and . The set of all possible output coded sequences can be denoted as such that each satisfies that