## I Introduction

We introduced in [WJH18a, WJH18b] a novel blind (noncoherent) communication scheme for the physical layer, called
modulation on conjugate-reciprocal zeros (MOCZ), to reliably transmit sporadic short packets of fixed size over unknown
wireless multipath channels of bandwidth $W$ at very low latency. Here the information of the packet is
modulated on the zeros of the $z$-transform of the transmitted discrete-time baseband signal. We will call the discrete-time
baseband signal a *MOCZ symbol*, in analogy to an orthogonal frequency division multiplexing (OFDM) symbol; it is a
finite-length sequence of complex-valued coefficients. These coefficients then modulate a continuous-time pulse
shape at a sample period of $T = 1/W$ to generate the continuous-time baseband waveform. Since the
MOCZ symbols (sequences) are orthogonal neither in the time nor in the frequency domain, the MOCZ design can be seen as a non-orthogonal
multiplexing scheme.
After up-converting to the desired carrier frequency, the transmitted passband signal will propagate in space such that,
due to reflections, diffractions, and scattering, different delays of the attenuated signal will interfere at the
receiver. Hence, multipath propagation causes a time-dispersion which results in a frequency-selective fading channel
[TV05].
Due to ubiquitous impairments between transmitter and receiver clocks, a *carrier frequency offset*
(CFO) will be present after down-conversion to the baseband. Doppler shifts due to relative velocity cause additional
frequency dispersion, which can also be approximated to first order by a CFO. This is a known weakness in many
multi-carrier modulation schemes, such as OFDM [TV05, Moo94, ZGX10, LLTC04], and various approaches have been
developed to estimate or eliminate the CFO effect. A common approach for OFDM systems is to learn the CFO in a training
phase or from blind estimation algorithms, such as MUSIC [LT98] or ESPRIT [TLZ00]. Furthermore, due to the
unknown distance and asynchronous transmission, a *timing offset* (TO) of the received symbol has to be determined
as well, which will otherwise destroy the orthogonality of the OFDM symbols [CKYK10, 5.1],[PKPKKH06]. By
“sandwiching” the data symbol between two training symbols a timing and frequency offset can be estimated
[SC97],[SC96]. By using antenna arrays at the receiver, antenna diversity of a single-input-multiple output
(SIMO) system can be exploited to improve the performance [ZGX10].

Whereas OFDM is typically used in long frames, consisting of many successive OFDM symbols and hence much longer signal lengths, we consider here only one single symbol transmission with a very short signal length. This places high demands on such a bursty signaling scheme, since timing and carrier frequency offsets have to be addressed from only one received symbol. Here our MOCZ scheme will be a promising solution. Since any communication will be scheduled and timed on the MAC layer by a certain bus, running with a known bus clock-rate, timing-offsets of the symbols can be assumed as fractions of the bus clock-rate. We will introduce here an improved receiver design for a coded binary MOCZ (BMOCZ) scheme and demonstrate by bit-error-rate (BER) simulations the robustness against these impairments.

In the MOCZ design, a CFO will result in an unknown common rotation of all received zeros. Since the angular zero
spacing in a BMOCZ symbol of length $K+1$ is given by a base angle of $2\pi/K$, a fractional rotation can be
easily obtained at the receiver by oversampling during the post-processing to identify the most likely transmitted
zeros (zero-pattern).
Rotations which are integer multiples of the base angle correspond to cyclic shifts of the binary message word. By
using a *cyclically permutable code* (CPC) for the binary message, the BMOCZ symbol becomes invariant against any
cyclic shift and hence against any CFO. This avoids any further symbol transmissions for estimating the CFO, which
reduces overhead, latency, and complexity. As a byproduct, this has the appealing feature of providing a CFO
estimate from the decoding process of a single BMOCZ symbol. Furthermore, due to the embedding into
a cyclic code, such as a BCH code, we can use its error correction capabilities to improve the BER and, moreover, the *block
error-rate* (BLER) performance significantly.
By measuring the energy of the expected symbol length with a sliding window in the received signal, we can identify
arbitrary TOs at the receiver. We will show the robustness of the TO estimation analytically, which reveals another
strong property of the MOCZ design.

Finally, we will combine CFO and TO estimation with error correction over multiple receive antennas and demonstrate the antenna diversity of the SIMO system. By simulating the BER over the received SNR for various average power delay profiles, with constant and exponential decay as well as random sparsity constraints, we will demonstrate the performance in various indoor and outdoor scenarios by using the simulation framework Quadriga [JRBT14].

### I-A Notation

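We will use small letters for complex numbers in $\mathbb{C}$. Capital Latin letters denote natural numbers $\mathbb{N}$ and refer to fixed dimensions, where small letters are used as indices. Boldface small letters denote row vectors and capitalized letters refer to matrices. Upright capital letters denote complex-valued polynomials in $\mathbb{C}[z]$. We will denote the first $N$ natural numbers in $\mathbb{N}$ as $[N] := \{0, 1, \dots, N-1\}$. For $K \in \mathbb{N}$ we denote by $K + [N] = \{K, K+1, \dots, K+N-1\}$ the $K$-shift of the set $[N]$. The Kronecker-delta symbol is given by $\delta_{nm}$ and is $1$ if $n = m$ and $0$ else. For a complex number $x = a + jb$, given by its real part $\operatorname{Re}(x) = a \in \mathbb{R}$ and imaginary part $\operatorname{Im}(x) = b \in \mathbb{R}$ with imaginary unit $j = \sqrt{-1}$, its complex-conjugation is given by $\overline{x} = a - jb$ and its absolute value by $|x| = \sqrt{x\overline{x}}$. For a vector $\mathbf{x} \in \mathbb{C}^N$ we denote by $\overline{\mathbf{x}^-}$ its complex-conjugated time-reversal or *conjugate-reciprocal*, given as $\overline{x^-_k} = \overline{x_{N-1-k}}$ for $k \in [N]$. We use $\mathbf{A}^* = \overline{\mathbf{A}}^T$ for the complex-conjugated transpose of the matrix $\mathbf{A}$. For the identity matrix we write $\mathbf{I}_N$ and for a matrix with all elements zero we write $\mathbf{O}_N$. By $\mathbf{D}_{\mathbf{x}}$ we refer to the diagonal matrix generated by $\mathbf{x}$. The $N \times N$ unitary Fourier matrix $\mathbf{F} = \mathbf{F}_N$ is given entry-wise by $f_{l,k} = e^{-j2\pi lk/N}/\sqrt{N}$ for $l, k \in [N]$. By $\mathbf{T}_N \in \mathbb{R}^{N \times N}$ we denote the elementary Toeplitz matrix given element-wise as $\delta_{n-1,m}$. The all one and all zero vectors in dimension $N$ will be denoted by $\mathbf{1}_N$ and $\mathbf{0}_N$, respectively. The $\ell_p$-norm of a vector $\mathbf{x} \in \mathbb{C}^N$ is given by $\|\mathbf{x}\|_p = \big(\sum_{k=0}^{N-1} |x_k|^p\big)^{1/p}$ for $p \geq 1$. If $p = \infty$ we write $\|\mathbf{x}\|_\infty = \max_k |x_k|$ and for $p = 2$ we set $\|\mathbf{x}\|_2 = \|\mathbf{x}\|$. The expectation of a random variable $x$ is denoted by $\mathbb{E}[x]$.

## II System Model and Requirements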

We are interested in a blind and asynchronous transmission of a short single MOCZ symbol at a designated
bandwidth . In this “one-shot” communication we assume no synchronization and no packet scheduling between
transmitter and receiver. Such extreme sporadic, asynchronous, and ultra short-packet transmissions are required, for
example, in critical control applications, exchange of channel state information (CSI), signaling protocols, secret
keys, authentication, commands in wireless industry applications, or initiation, synchronization and channel probing
packets to prepare for longer or future transmission phases.
By choosing the carrier frequency, transmit sequence length, and bandwidth accordingly, a receive duration in the order
of the channel delay spread can be obtained, which pushes the latency at the receiver to the lowest possible. Since the
next generation of mobile wireless networks aims for large bandwidths with carrier frequencies beyond GHz, in the
so-called *mmWave* band, the transmitted signal duration will be in the order of nanoseconds. Hence, even at
moderate mobility, the wireless channel in an indoor or outdoor scenario can be considered as approximately
time-invariant over such a short time duration. On the other hand, wideband channels are highly frequency selective,
which is due to the superposition of different delayed versions (echoes) of the transmitted signal at the receiver. This
makes equalizing in the time domain very challenging and is commonly simplified by using OFDM instead. But conventional OFDM
requires an additional cyclic prefix to convert the frequency-selective channel to parallel scalar channels and in
coherent mode it requires additional pilots (training) to learn the channel coefficients. This will increase the latency
for short messages dramatically.

For communication in the mmWave band, massive antenna arrays are exploited to overcome the large attenuation, which increases the complexity and energy consumption in estimating the huge number of channel parameters and becomes the bottleneck in mmWave MIMO systems, especially for mobile scenarios. However, in a sporadic communication only one symbol will be transmitted and a next symbol may follow at an unknown time later. In a random access channel (RACH), a different user may transmit the next symbol from a different location, which will therefore experience an independent channel realization. Hence, the receiver can barely use any channel information learned from past communications. OFDM systems approach this by transmitting many successive OFDM symbols as a long frame to estimate the channel impairments, which will cause a considerable overhead and latency if only a few data bits need to be communicated. Furthermore, to achieve orthogonal subcarriers in OFDM, the cyclic prefix has to be at least as long as the channel impulse response (CIR) length, resulting in signal lengths of at least twice the CIR length, during which the channel also needs to be static. Using OFDM signal lengths much longer than the coherence time might not be feasible for fast time-varying block-fading channels. Furthermore, the maximal CIR length needs to be known at the transmitter and, if underestimated, will lead to a serious performance loss. This is in strong contrast to our MOCZ design, where the signal length can be chosen for a single MOCZ symbol independently of the CIR length. The goal in this work is to address the ubiquitous impairments of the MOCZ design under such ad-hoc communication assumptions and signal lengths in the order of the CIR length.

After up-converting the MOCZ symbol, which is a discrete-time complex-valued baseband signal
of two-sided bandwidth , to the desired carrier frequency , the transmitted
passband signal will propagate in space. Regardless of directional or omnidirectional antennas, the signal will be
reflected and diffracted at point scatterers, resulting in different delays of the attenuated signal which interfere at
the receiver if the maximal delay spread of the channel is larger than the sample period . Hence,
the multipath propagation causes time dispersion resulting in a frequency-selective fading channel.
Due to ubiquitous impairments between transmitter and receiver clocks an unknown *frequency offset*
will be present after the down-conversion to the received continuous-time baseband signal

(1) |

By sampling at the sample period , the received discrete-time baseband signal can be represented
by a *tapped delay line* (TDL) model. Here the channel action is given as the convolution of the MOCZ symbol
with a finite impulse response , where the th complex-valued channel tap describes the
th averaged path over the bin , which we model by a circularly symmetric Gaussian random
variable in for and zero elsewhere. The average *power delay profile* (PDP)
of the channel can be sparse and exponentially decaying, where defines the sparsity pattern of
non-zero coefficients and the exponential
decay rate.
To obtain equal average transmit and average receive power we will eliminate in our analysis the overall channel gain by
normalizing the CIR realization by its average energy (for a given
sparsity pattern), such that . The convolution output is then additively distorted by Gaussian
noise of zero mean and variance (average power density) for as

(2) |

Here denotes the *carrier frequency offset* (CFO) and the
*timing offset* (TO), which marks the delay of the first symbol coefficient via the first channel path
, measured in integer multiples of the sample time . The modulated MOCZ symbol will have
rotated coefficients as well as the channel ,
which will also be affected by a *global phase* . Since the channel taps have a uniform independent
phase, the distribution does not change. By the same argument, the Gaussian noise distribution is not altered by
the phase, hence we have for any and .

In [WJH18b, WJH18a] a good signal codebook is given for binary MOCZ (BMOCZ) by the set of normalized *Huffman
sequences* , i.e., by all with positive
first coefficient and “impulsive-like” autocorrelation [HUf62], given by

(3) |

for some . The absolute value of (3) forms a *trident* with one main peak at the center,
given by the energy , and two equal side-peaks of , see
Figure (1). From an analytical and empirical investigation [WJH18a], the BMOCZ symbols are
most robust against noise if

(4) |

Hence, the BMOCZ codebook (constellation set) is only determined by the number . Each BMOCZ symbol (constellation, Huffman sequence) defines the coefficients of a polynomial of degree , where the zeros are uniformly placed on a circle of radius or , selected by the message bits as

(5) |

see also Figure (3). Hence, the BMOCZ encoder is defined iteratively for by its
*zero codeword* as

(6) |

where we normalize after the last iteration step . From the received noisy signal samples (no CFO and TO)

(7) |

the decoder is given as a *Direct Zero Testing* (DiZeT) of the received polynomial
at the possible zero positions as

(8) |

see [WJH18a, WJH18b].
A global phase in will have no effect on the DiZeT decoder or on the received zeros. But the CFO
modulates the BMOCZ symbol in (2) and causes a rotation^{1}
of its zeros by in (5), which will destroy the hypothesis test of the DiZeT decoder. Hence, one
needs to either estimate or use an outer code for BMOCZ to be invariant against an arbitrary rotation of the
entire zero codebook , which we will introduce in Section (IV).
However, before we can apply the DiZeT decoder, we have to identify the timing offset of the symbol, which yields the
convolution output in (7).

^{1} The CFO would rotate the zeros in any scheme of modulation on zeros, but we will consider here for simplicity only the BMOCZ scheme.
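The encoder (5)-(6) and the DiZeT decoder (8) can be illustrated by a minimal NumPy sketch. It is not the reference implementation: the function names `bmocz_encode` and `dizet_decode` are hypothetical, and the polynomial convention $\mathrm{X}(z)=\sum_n x_n z^n$, the bit-to-radius mapping (bit 1 outside, bit 0 inside the unit circle), the weight $R^{N-1}$ in the decision rule, and the radius $R=\sqrt{1+\sin(\pi/K)}$ are assumptions made here following the construction in [WJH18a, WJH18b]. The encoder multiplies out all zeros at once instead of iterating as in (6).

```python
import numpy as np

def bmocz_encode(bits, R):
    """Map K message bits to a unit-energy sequence of length K+1 (assumed convention)."""
    bits = np.asarray(bits)
    K = bits.size
    phases = 2 * np.pi * np.arange(K) / K        # uniform zero phases
    radii = np.where(bits == 1, R, 1.0 / R)      # assumed: bit 1 -> radius R, bit 0 -> 1/R
    zeros = radii * np.exp(1j * phases)
    x = np.poly(zeros)[::-1]                     # x[n] is the coefficient of z**n
    return x / np.linalg.norm(x)

def dizet_decode(y, K, R):
    """Test each conjugate-reciprocal zero pair on the received polynomial Y(z)."""
    y = np.asarray(y)
    N = y.size
    n = np.arange(N)
    phases = 2 * np.pi * np.arange(K) / K

    def magnitudes(radius):
        # |Y(radius * exp(j*phase_k))| for all K candidate phases
        z = radius * np.exp(1j * phases)
        return np.abs((y[None, :] * z[:, None] ** n[None, :]).sum(axis=1))

    outside = magnitudes(R)
    inside = R ** (N - 1) * magnitudes(1.0 / R)  # assumed weighting of the inner sample
    return (outside < inside).astype(int)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    K, L = 16, 5
    R = np.sqrt(1 + np.sin(np.pi / K))           # assumed radius, cf. (4)
    bits = rng.integers(0, 2, K)
    h = rng.normal(size=L) + 1j * rng.normal(size=L)   # unknown multipath channel
    y = np.convolve(bmocz_encode(bits, R), h)          # noiseless, no CFO, no TO
    print(np.array_equal(dizet_decode(y, K, R), bits))
```

Because the convolution with the channel only multiplies the polynomials, the data zeros survive in the received polynomial, which is why no channel knowledge is needed in this noiseless example.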

## III Timing Offset and Effective Delay Spread for BMOCZ

In an asynchronous communication, the receiver does not know when a packet from a transmitter (user) will arrive. Hence, at first the receiver has to detect a transmitted packet, which is already one bit of information. We will assume that the receiver decides correctly that, in an observation window of received samples, one single MOCZ packet of length with maximal channel length of is captured. By assuming a maximal length and a known or a maximal at the receiver, the observation window can be chosen, for example, as . From the noise floor knowledge at the receiver, a simple energy detector with a hard threshold over the observation window can be used for packet detection. Then, an unknown TO and CFO will be present in the observation window

(9) |

The challenge here is to identify and the efficient channel length which contains most of the energy of the
instantaneous CIR realization . The estimation of these *Timing-of-Arrival* (TOA) parameters is usually done by
observing the same channel under many symbol transmissions, to obtain a sufficient statistic of the channel's PDP
[GGKST03], [CWM02].
Since we only have one observation available, a good estimation is very challenging.

The efficient (instantaneous) channel length , defined by an energy concentration window, will be much less than
the maximal channel length , due to blockage and attenuation by the environment, which might also cause a sparse,
clustered, and exponentially decaying power delay profile.
For the MOCZ scheme, it is essential to correctly identify in the window (9) the first
received sample from the transmitted symbol , or at least not to miss it, since it will carry most of the
energy if is the *line of sight* (LOS) path. It was shown in [WJH18b] that for the optimal radius in
BMOCZ, carries on average to of the BMOCZ symbol energy, see also Figure (1). On
the other hand, an overestimated channel length will reduce the overall bit-error performance because the
receiver collects unnecessary noise samples.

Since we assume no CSI at the receiver, the channel characteristic, i.e., the instantaneous power delay profile, has to be determined entirely from the received MOCZ symbol. We will introduce here an efficient approach for the BMOCZ design, by exploiting the radar properties of the Huffman sequences, to obtain excellent estimation of the timing offset and the effective channel delay in moderate and high SNR.

Huffman sequences have an impulsive autocorrelation (3), originally designed for radar applications, and are therefore very suitable to measure the channel impulse response [GG05]. Since the transmitted Huffman sequence is still unknown at the receiver, we cannot correlate the received signal with the correct Huffman sequence to retrieve the CIR. Instead, we will use an approximate universal Huffman sequence, which consists of just the first and last peak of a typical Huffman sequence, expressed by the impulses and for as

(10) |

which we call the *Huffman bracket of phase *. Since the first and last coefficients are

(11) |

see [WJH18b], typical Huffman sequences, i.e., those having the same number of ones and zeros, will have

(12) |

By correlating the modulated Huffman sequence with the *Huffman bracket* we keep the
locational properties of the Huffman autocorrelation (trident)

(13) |

where denotes the *exterior signature* and the
*interior signature* of the Huffman sequence , see Figure (1). Here, the interior
signature can be seen as the data noise floor distorting the trident in (3). Taking the
absolute-squares in (13), we get for the three peaks of the approximated trident

(14) |

where the side-peaks have energy

(15) |

Since we get by (3) and that , where the lower bound is achieved for typical sequences with (having the same amount of ones and zeros) and the upper bound for (all ones or all zeros). If then the two coefficients (the exterior signature ) will carry all the energy of the Huffman sequence. But then also and the only Huffman sequences (real valued first and last coefficient) are given by and for else, which are the coefficients of polynomials with uniform zeros on the unit circle, see [WJH17a]. For given by (4) the autocorrelation side-lobe is exponentially decaying in but is bounded to for . Hence, , such that almost half of the Huffman sequence energy is always carried in the two peaks. If the CFO would be known, we can set and get for the center peak in (14)

(16) |

i.e., the energy of the center peak is roughly twice as large as the energy of the side-peaks, and reveals the trident in the
approximated Huffman autocorrelation . But , since we do not know the CFO and for some
then we get for typical Huffman sequences, such that the power of the center peak will vanish.
Hence, in the presence of an unknown CFO the center peak does not always identify the trident. We will therefore
correlate the *positive Huffman bracket* with the absolute-square value of or in presence of
noise and channel with the absolute-square of the received signal , which will result approximately in

(17) |

where and are colored noise and

(18) |

denotes the noisy trident, which collects three times the *instant power delay profile* of the shifted
CIR. These three echoes of the CIR will be separated if we have .
The approximation in (17) can be justified by the isometry property of the Huffman convolution. Briefly, the
generated (banded) Toeplitz matrix , for any Huffman sequence
, is a stable *linear time-invariant* (LTI) system, since the energy of the output satisfies for any
CIR realization

(19) |

Here, is the autocorrelation matrix of , which is the identity scaled by if . Hence, each normalized Huffman sequence generates an isometric operator having the best stability among all discrete-time LTI systems, as studied in [WJP15].
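The trident shape of the Huffman autocorrelation (3) and the correlation of the received sample powers with the positive Huffman bracket can be checked numerically. The following is a minimal sketch under stated assumptions: the helper names are hypothetical, the sequence is built from zeros at radius $R$ or $1/R$ on $K$ uniform phases with $R=\sqrt{1+\sin(\pi/K)}$, the side-peak magnitude is taken as $\eta = 1/(R^K+R^{-K})$, and the channel is a single noiseless LOS tap, so that the maximum of the bracket correlation directly marks the timing offset.

```python
import numpy as np

def huffman_sequence(bits, R):
    """Unit-energy sequence whose zeros lie at radius R (bit 1) or 1/R (bit 0)."""
    bits = np.asarray(bits)
    K = bits.size
    zeros = np.where(bits == 1, R, 1.0 / R) * np.exp(2j * np.pi * np.arange(K) / K)
    x = np.poly(zeros)[::-1]                 # x[n] multiplies z**n
    return x / np.linalg.norm(x)

def autocorrelation(x):
    """a[k] = sum_n x[n+k] * conj(x[n]) for lags -K..K; index K holds lag 0."""
    return np.correlate(x, x, mode="full")

def bracket_power_correlation(r, K):
    """d[n] = |r[n]|^2 + |r[n+K]|^2, i.e. |r|^2 correlated with delta_0 + delta_K."""
    p = np.abs(np.asarray(r)) ** 2
    return p[:-K] + p[K:]

def strongest_path_delay(r, K):
    """Index of the largest bracket correlation value (coarse timing estimate)."""
    return int(np.argmax(bracket_power_correlation(r, K)))

if __name__ == "__main__":
    K = 16
    R = np.sqrt(1 + np.sin(np.pi / K))
    x = huffman_sequence(np.random.default_rng(0).integers(0, 2, K), R)
    a = np.abs(autocorrelation(x))
    print("center peak:", a[K], "side peaks:", a[0], a[2 * K])
    print("largest interior lag:", max(a[1:K].max(), a[K + 1:2 * K].max()))
```

In an NLOS channel or at low SNR the plain maximum is not sufficient, which is what the thresholded back-stepping of Section III-A addresses.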

### III-A Timing Offset Estimation

The delay of the strongest path can be identified from the maximum in (17)

(20) |

where the last equality follows from the fact that both peaks in are contributing between and . If the CIR has a LOS path, then and we immediately have found an estimate for the timing-offset by . In case of NLOS or if the first paths are equally strong, we have to go further back and identify the first significant peak above the noise floor, since the convolution sum of the CIR with the interior signature might produce a significant peak. Let us note here that this might result in a misidentification of the trident's center peak by (20), for example if . Therefore, we will use as a peak threshold

(21) |

which is the average power of the Huffman sequence distorted by the channel and noise. By comparing to the noise power, we found empirically that the noise-dependent threshold should be set to

(22) |

to ensure that it lies above the instantaneous noise energy with high probability. By using an iterative back-stepping in Algorithm (1), we will stop if the sample power falls below the threshold , which finally yields an estimate of the timing-offset. In line of Algorithm (1) we update the timing-offset estimate if the sample power is larger than the threshold and the average power of the preceding samples is larger than the threshold divided by , which will be weighted by the number of back-steps.

### III-B Efficient Channel Length Estimation

Since the BMOCZ design does not need any channel knowledge at the receiver, it is also well suited for estimating the
channel itself at the receiver. Here, a good channel length estimation is essential for the performance of the decoder
if the power delay profile (PDP) is decaying. To some extent, the channel delays will fade out exponentially and the
receiver can cut off the received signal by using a certain energy ratio threshold. Let us recall the average *received SNR*

(23) |

where is the energy of the BMOCZ symbols, which is constant for the codebook. If the power delay
profile is *flat*, then the collected energy will be uniform and the SNR will not change if we cut the channel
length at the receiver. However, the additional channel zeros will increase the confusion for the DiZeT decoder and
reduce the BER performance. Therefore, the performance will decrease for increasing at a fixed symbol length , see
simulations in [WJH18b]. For the most interesting scenario of the BER performance loss is only dB
over , but will
increase dramatically if . The reason for this behaviour is the collection of many noise taps, which will lead
to more distortion of the transmitted zeros. Since in most realistic scenarios the PDP will be decaying, most of the
channel energy will be concentrated in the first channel taps.
Hence, if we cut the received signal length to , we will reduce the channel length to and improve the rSNR for *non-flat* PDPs with , since it holds

(24) |

Since we obtain a significant gain in SNR if and . Hence, by cutting the received signal to the effective channel length, given by a certain energy concentration, we can improve the SNR and reduce at the same time the amount of channel zeros, which we will demonstrate by simulations in Section (V-C).

Assuming the knowledge of the noise
floor at the receiver, a cut-off time can be defined as the window time which, for example, contains of the
received energy.
The estimation of the efficient channel length can be done after the detection of the timing-offset
with Algorithm (1). We
assume here that the maximal channel delay is . Since the BMOCZ symbol length is , we know that the samples
of the received discrete-time signal in (2), which
is the CIR correlated by shifts of and distorted by additive noise (we ignore here the CFO distortion since it
will not be relevant for the PDP estimation), see Figure (1(a)). We therefore
need to determine by an energy concentration threshold, which depends on the *instantaneous SNR* of .
We know that the last channel tap will be multiplied by , which is as strong as on
average.
There are many signal processing methods to detect the efficient energy window
in the received samples, like total variation smoothing [BV04] or regularized least-squares methods
[BV04, FR13], which promote short window sizes (sparsity). We propose in Algorithm (2) to iteratively increase
starting at until enough channel energy is collected.
Here we set the estimated channel/signal energy to

(25) |

where we start with the maximal CIR length . By assuming a path exponent of we can calculate a threshold for the effective energy with . The algorithm then collects as many samples of until the energy is achieved and sets . The extracted modulated signal is then given by

(26) |

which will be processed further for CFO detection and final decoding.
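The idea of growing the window until enough channel energy is collected can be sketched as follows. This is only an illustration and not Algorithm (2): the function name `effective_channel_length`, the noise compensation by subtracting the expected noise energy per sample, and the fixed `energy_fraction` stopping rule are assumptions, since the exact threshold of the algorithm depends on quantities not reproduced here. The input is assumed to start at the estimated timing offset and to span $K + L_{\max}$ samples.

```python
import numpy as np

def effective_channel_length(r, K, noise_var, energy_fraction=0.95):
    """Smallest L such that the first K+L samples hold the requested energy share.

    r               received samples, aligned to the estimated timing offset,
                    of length K + L_max (symbol length K+1, at most L_max taps)
    noise_var       assumed known noise power per sample
    energy_fraction hypothetical stopping rule (share of the signal energy)
    """
    power = np.abs(np.asarray(r)) ** 2
    n_max = power.size
    l_max = n_max - K
    # Noise-compensated signal energy over the full window
    total = max(power.sum() - n_max * noise_var, 0.0)
    target = energy_fraction * total
    for length in range(1, l_max + 1):
        n = K + length        # a (K+1)-symbol through `length` taps spans K+length samples
        collected = power[:n].sum() - n * noise_var
        if collected >= target:
            return length
    return l_max

if __name__ == "__main__":
    K = 16
    x = np.ones(K + 1) / np.sqrt(K + 1)            # placeholder symbol, not a Huffman sequence
    h = np.array([1.0, 0.5, 0.25] + [0.0] * 7)     # three active taps out of L_max = 10
    r = np.convolve(x, h)
    print(effective_channel_length(r, K, noise_var=0.0, energy_fraction=0.9999))
```

A smaller `energy_fraction` cuts the window earlier, trading a small loss of signal energy against fewer collected noise samples and fewer channel zeros.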

## IV Carrier Frequency Offset

We now assume that the down-converted baseband signal in (9) has no further timing-offset and has captured all path delays up to . The signal will experience an unknown CFO of

(27) |

This is a common problem in many multi-carrier systems, such as OFDM, which therefore require CFO estimation algorithms [Moo94, LLTC04, ZGX10]. For a bandwidth of , the relative frequency offset is

(28) |

Let us consider, for example, a carrier frequency of GHz with a drastic frequency offset of MHz and bandwidth MHz. This would result in a relative frequency offset , which is able to rotate all zeros by any in the $z$-plane. Hence, the received polynomial (noiseless) will experience a rotation of all its zeros by the angle

(29) |

As illustrated in Figure (3), we have to ensure that each rotated zero (red) does not leave the zero-codebook set (blue) . To apply the DiZeT decoder, we have to find such that , i.e., we need to ensure that all the data zeros will lie on the uniform grid. Hence, for the CFO can be split in

(30) |

for some and , where is called the *integer* and the *fractional
CFO*, which are also present in OFDM systems [CKYK10, Ch. 5.2]. Only if (or correctly compensated), the
DiZeT decoder will sample at the correct zero positions and decode, due to the unknown integer shift , a cyclically permuted
bit sequence , which we will correct in Section (IV-C) by a cyclically permutable code.

### IV-A Decoding BMOCZ via FFT

The DiZeT decoder for BMOCZ allows also a simple hardware implementation at the receiver. Let us scale the received samples with the radius powers respectively

(31) |

By applying the point unitary IDFT matrix on the zero-padded scaled signal, where with , we get the samples of the $z$-transform^{2} by

where .

^{2} An even more efficient FFT calculation with could be achieved if for some .

Hence, the *DiZeT decoder* simplifies to

(32) |

Here, can be seen as an oversampling factor of the IDFT, where we pick each th sample point to obtain the zero sample values. Hence, the decoder can be fully implemented by a simple IDFT from the delayed amplified received signal, by using for example FPGA or even analog front-ends. We can also rewrite the diagonal scaling matrix (31) in the symmetric form

(33) |

such that corresponds to a time-reversal of the diagonal, which brings us to

(34) |

since the absolute values cancel the phases from a circular shift and the conjugate-time-reversal , where is the circular time-reversal, can be rewritten by using .
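The IDFT-based evaluation of the received polynomial at the candidate zero positions can be sketched in NumPy. The names `zero_samples` and `dizet_decode_fft` are hypothetical, and the sketch makes simplifying assumptions: it uses the polynomial convention $\mathrm{Y}(z)=\sum_n y_n z^n$, it weights the inner sample by $R^{N-1}$, and it runs one IDFT per radius instead of exploiting the symmetric scaling and time-reversal of (33)-(34).

```python
import numpy as np

def zero_samples(y, K, radius, Q):
    """Return Y(radius * exp(2j*pi*k/K)) for k = 0..K-1 via one Q*K-point IDFT."""
    y = np.asarray(y, dtype=complex)
    N, M = y.size, Q * K
    if M < N:
        raise ValueError("oversampling factor Q too small: need Q*K >= len(y)")
    padded = np.zeros(M, dtype=complex)
    padded[:N] = y * radius ** np.arange(N)       # scale sample n by radius**n
    # np.fft.ifft computes (1/M) * sum_n a[n] * exp(+2j*pi*q*n/M); undo the 1/M
    # and keep every Q-th output, which lies on the K-point phase grid.
    return (M * np.fft.ifft(padded))[::Q]

def dizet_decode_fft(y, K, R, Q=2):
    """DiZeT decision per zero pair from IDFT samples (assumed weighting R**(N-1))."""
    N = len(y)
    outside = np.abs(zero_samples(y, K, R, Q))
    inside = R ** (N - 1) * np.abs(zero_samples(y, K, 1.0 / R, Q))
    return (outside < inside).astype(int)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    K, R = 16, np.sqrt(1 + np.sin(np.pi / 16))
    y = rng.normal(size=K + 4) + 1j * rng.normal(size=K + 4)
    direct = np.polyval(y[::-1], R * np.exp(2j * np.pi * np.arange(K) / K))
    print(np.allclose(zero_samples(y, K, R, Q=2), direct))
```

Zero-padding to $QK$ points is needed here because the received signal is longer than $K$ samples; without it the exponentials would alias on a $K$-point grid.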

### IV-B Fractional CFO Estimation via Oversampled FFTs

To estimate the fractional frequency offset, we will oversample by choosing to add further zero
blocks to . This leads to an oversampling factor of and allows us to quantize in uniform bins
with separation for a *base angle* . Hence, the absolute values of the sampled
transform in (27) of the rotated codebook-zeros are given by

(35) |

for each and , where is addition modulo . To estimate the fractional frequency offset of the base angle, we will sum the smaller sample values and select the fraction corresponding to the smallest sum

(36) |

Then the recovered signal will have the data zeros on the constellation grid .
See Figure (4) for a random fractional CFO and Figure (3) for
a schematic picture.
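The fractional CFO search can be sketched with the same oversampled IDFT. The code below is an illustration under stated assumptions, not the exact rule (36): the names `oversampled_magnitudes` and `estimate_fractional_cfo` are hypothetical, the CFO is modeled as $r_n = e^{jn\phi} y_n$, the inner samples are weighted by $R^{N-1}$ before taking the minimum, and the search is restricted to the $Q$ grid bins within one base angle.

```python
import numpy as np

def oversampled_magnitudes(r, K, R, Q):
    """Magnitudes of the received polynomial on the Q*K-point grid for both radii.

    Entry [k, q] of each returned (K, Q) array belongs to the phase
    2*pi*(Q*k + q)/(Q*K), i.e. zero position k shifted by q fractional bins.
    """
    r = np.asarray(r, dtype=complex)
    N, M = r.size, Q * K
    if M < N:
        raise ValueError("need Q*K >= len(r)")
    n = np.arange(N)

    def mags(radius):
        padded = np.zeros(M, dtype=complex)
        padded[:N] = r * radius ** n
        return np.abs(M * np.fft.ifft(padded)).reshape(K, Q)

    return mags(R), R ** (N - 1) * mags(1.0 / R)

def estimate_fractional_cfo(r, K, R, Q):
    """Pick the fractional bin whose summed smaller sample values are minimal."""
    outside, inside = oversampled_magnitudes(r, K, R, Q)
    cost = np.minimum(outside, inside).sum(axis=0)   # one cost value per bin q
    q_hat = int(np.argmin(cost))
    bits = (outside[:, q_hat] < inside[:, q_hat]).astype(int)
    return q_hat, bits

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    K, Q, L = 16, 4, 4
    R = np.sqrt(1 + np.sin(np.pi / K))
    bits = rng.integers(0, 2, K)
    zeros = np.where(bits == 1, R, 1 / R) * np.exp(2j * np.pi * np.arange(K) / K)
    y = np.convolve(np.poly(zeros)[::-1], rng.normal(size=L) + 1j * rng.normal(size=L))
    phi = -2 * np.pi * (Q * 3 + 1) / (Q * K)         # 3 base angles plus 1 fractional bin
    q_hat, decoded = estimate_fractional_cfo(y * np.exp(1j * phi * np.arange(y.size)), K, R, Q)
    print(q_hat, decoded, bits)
```

The decoded word is in general a cyclic shift of the message, because the integer part of the rotation is not resolved at this stage.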

### IV-C Using Cyclically Permutable Codes

To be robust against rotations which are integer multiples of the base angle, we will need an outer block code in for the binary message , which is invariant against cyclic shifts, i.e., a bijective mapping on the Galois field

(37) |

such that for any . We will use the common notation for the code length .
Such a block code is called a *cycling
register code* (CRC) [Gol67], which can be constructed from the linear block code , by
separating it in all its *cyclic equivalence classes*

(38) |

where has *cyclic order* if
for the smallest possible . To make coding one-to-one, each equivalence class can be
represented by the codeword with smallest decimal value [RW75]^{3}.

^{3} The authors call the cycling register codes cyclically permutable codes, which are nowadays defined differently. Furthermore, they claim that CRCs are also comma-free codes, which is not true by definition.

Then is given by the union of all its equivalence class
representatives and its cyclic shifts, i.e.

(39) |

This will generate in a systematic way a look-up table for the cycling register code. Unfortunately, the construction is non-linear and combinatorially difficult. However, the cardinality of such a code is proven explicitly for any positive integer in [Gol67, Thm.VI.3] to be (number of cycles in a cycling register)

(40) |

where is the *Euler function*, which counts the number of elements coprime to .
For prime, we obtain

(41) |

which would allow to encode at least bits. For this would result in a loss of only
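The number of cyclic equivalence classes of binary words can be verified by brute force for small lengths. The sketch below assumes that the count in (40) is the classical cycle count $\frac{1}{n}\sum_{d \mid n} \varphi(d)\, 2^{n/d}$ of a cycling register, which for prime $n$ reduces to $(2^n-2)/n + 2$. The function names are hypothetical, and the representative of each class is taken as the word with the smallest integer value, as described above.

```python
from math import gcd

def euler_phi(n):
    """Euler function: number of integers in 1..n that are coprime to n."""
    return sum(1 for k in range(1, n + 1) if gcd(k, n) == 1)

def num_cycles(n):
    """Number of cyclic equivalence classes of binary words of length n."""
    total = sum(euler_phi(d) * 2 ** (n // d) for d in range(1, n + 1) if n % d == 0)
    return total // n

def class_representatives(n):
    """Brute force: smallest integer value of every cyclic equivalence class."""
    mask = (1 << n) - 1
    seen, reps = set(), []
    for word in range(1 << n):
        if word in seen:
            continue
        reps.append(word)            # words are visited in increasing order
        shifted = word
        for _ in range(n):           # mark all cyclic shifts of this word
            seen.add(shifted)
            shifted = ((shifted << 1) | (shifted >> (n - 1))) & mask
    return reps

if __name__ == "__main__":
    for n in (4, 5, 7):
        print(n, num_cycles(n), len(class_representatives(n)))
```

The brute-force table grows as $2^n$, so it only serves as a check of the counting formula for short code lengths.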
