I Introduction
With continuing advances in sensors and low-power communication technologies, the human-centric Internet of Things (IoT) has gained increasing momentum in both industrial and academic communities by unobtrusively providing smart user-centered services [1, 2]. The hardware miniaturization of IoT devices is a double-edged sword: it empowers IoT devices to communicate via ultra-low-power radios while making communication links vulnerable to malicious invasions. Since on-body IoT devices are generally attached to users’ bodies to continuously record fine-grained vital signs, security breaches of these devices pose a serious threat to users’ everyday privacy and safety [3].
Considerable effort has been devoted to thwarting malicious masqueraders for hardware-constrained wearable devices. It has been shown that radio channel characteristics in body area networks (BANs) can be exploited to perform device authentication [4]. Recent efforts have also leveraged dedicated sensors [5, 6, 7, 8], such as accelerometers and gyroscopes, to verify wearable devices. However, hardly any of them have achieved widespread acceptance, because they are limited to either special motion scenarios [4] or fitness-related wearables [5, 6, 7, 8]. To embrace the coming wave of human-centric IoT, it is critical for a device authentication solution to support various on-body IoT devices under diverse user motions.
The salient physical layer (PHY) signatures underlying different BANs present us with an exciting opportunity. In BAN channels, off-body signals are mainly composed of line-of-sight (LOS) and multipath components, while on-body signals are governed by creeping waves [9, 10]. The distinct radio propagation patterns potentially enable a general security solution relying on prevalent wireless chips. However, radio signals in BAN channels are severely affected by IoT users’ body motions. As a consequence, on- and off-body signals can exhibit significantly different patterns under a specific user motion, and their patterns tend to vary dramatically across multiple motion states. Furthermore, users’ frequent motion changes in daily life make it a highly challenging task to manually select features to represent propagation patterns from real-world radio traces.
To address this challenge, we propose a motion-invariant authentication framework for on-body IoT devices. The proposed system performs device authentication by exploiting BAN radio signatures in two steps. In the first step, our system extracts representative time and frequency features from noisy received signal strength (RSS) segments to characterize fine-grained radio propagation characteristics. In the second step, to learn robust feature representations from the extracted radio features, an adversarial multiplayer network is customized to remove motion-specific features and then recognize the identities of IoT devices. To this end, an adversarial training criterion is applied during training, which leads to the emergence of transferable features that generalize well to unseen motion states. We implement a working prototype of our system on universal software radio peripheral (USRP) devices and conduct experiments with various body motions in different real-world environments. Experimental results show that our system achieves an authentication accuracy of 90.4% on average.
The main contributions of this work are summarized as follows.

We propose a general authentication system that supports various on-body IoT devices under diverse body motions. The crux of the proposed system is to construct reliable radio propagation profiles from RSS segments and to develop an adversarial network that identifies IoT devices based on underlying propagation patterns.

We theoretically analyze our adversarial multiplayer network and demonstrate that at equilibrium, the learned feature representation contains all information about BAN radio propagation patterns and becomes invariant to user body motions.

We build a prototype of our system on the USRP platform and conduct extensive experiments with various frequently appearing body motions in a variety of indoor and outdoor environments. The experimental results demonstrate the effectiveness and generalizability of our system.
II Exploiting Distinct Radio Propagation Patterns between On- and Off-Body Channels
Since the human body is basically a low-loss dielectric at microwave frequencies, including WiFi and Bluetooth frequency bands, radio propagation between a transmitter (Tx) and a receiver (Rx) carried by a user is significantly influenced by the user’s body. As shown in Fig. 1, off-body links are dominated by LOS and multipath propagations. On the other hand, on-body links are governed by creeping waves, which are diffracted by human tissues and spread out along the human body [10]. Previous measurements [9] indicate that creeping waves are rarely disturbed by multipath fading (small-scale fading) or large-scale fading caused by Tx-Rx distance changes or shadowing, but are largely influenced by body motions. Thus, we see that distinct propagation patterns exist between on- and off-body signals.
Fig. 2 depicts the RSS and the cumulative distribution function (CDF) of different BAN signals that were collected in standing and walking states, respectively. In the standing state, we observe that on-body signals are more stable in the time domain and off-body signals contain more components in the high frequency band. In the walking state, on-body signals have a larger RSS variance and fall into a low frequency range with a very high probability. The experimental observations verify that differentiable propagation patterns exist in on- and off-body channels in each motion state. This supports our premise that we can rely upon PHY signatures to authenticate various on-body IoT devices.
III Adversarial Network Based Device Authentication
III-A Design Rationale
It is, however, nontrivial to reliably capture propagation patterns from real-world radio traces. As shown in Fig. 2, although on- and off-body signals show distinguishable propagation patterns in each case, their patterns are remarkably different between the two cases. Consequently, an authentication model that is trained under a specific user motion will typically not generalize well to different motion scenarios.
To deal with this dilemma, we resort to adversarial networks, which have recently surfaced as a popular tool to discover transferable features in the deep learning field and have proven their advantages in many real-world applications [11, 12, 13]. As a branch of deep learning, adversarial networks facilitate automatic extraction of complex and latent feature representations by adopting a hierarchical structure [14]. Unlike traditional approaches, they can find and exclude irrelevant features in the learned representations through an adversarial training criterion. Therefore, we can reap the benefits of adversarial networks to recognize underlying on- and off-body propagation patterns. In our application, a customized adversarial network can be leveraged to autonomously extract feature representations of BAN radio propagation patterns and selectively eliminate motion-specific features from the representations. To this end, we propose an adversarial network based security system to seamlessly authenticate various on-body IoT devices.
III-B Design Overview
Our system takes advantage of an adversarial network to extract distinct radio propagation patterns for on-body device authentication. Fig. 3 illustrates the framework of our authentication system. It takes RSS time series as input and outputs the corresponding device authentication results. It is worth noting that, to verify RSSs from various low-end embedded IoT devices, our authentication system runs on users’ smartphones, which have sufficient capability to perform low-latency and accurate learning-based inference [15].
The core of our authentication system includes two components: Propagation Profile Characterization and Propagation Pattern Recognition.
Propagation Profile Characterization. This component first divides RSS time series into multiple basic segments. Then, radio features are extracted from both the time and frequency domains of RSS segments for fine-grained characterization of potential propagation patterns. Finally, the extracted features are integrated into radio propagation profiles for subsequent pattern recognition by the adversarial network.

Propagation Pattern Recognition. Upon receiving a propagation profile, the adversarial network first utilizes a functional block to extract a feature representation in terms of on- and off-body propagations. Subsequently, the model infers the identity of a connected IoT device through an on-off prediction block. Moreover, an adversary block is added in the training phase to eliminate motion-specific features from the feature representation. All blocks are learned through an adversarial training process to promote the emergence of features that are resilient to motion changes.
III-C Propagation Profile Characterization
Signal Segmentation. Our system first partitions RSS measurements into multiple segments. As an RSS segment is the basic unit for device authentication, the segment interval needs to be carefully determined. If the interval is too long, on- and off-body signals will probably both be included in the same segment. If it is too short, a segment will not contain enough information to be recognized. We empirically find that a time interval of 5 s is capable of correctly differentiating over of on- and off-body IoT devices.
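As a minimal sketch of the segmentation step, the snippet below cuts an RSS time series into fixed-length segments. The 5 s interval and the 500 Hz sampling rate come from the text; the function name `segment_rss`, the use of non-overlapping segments and the dropping of a trailing partial segment are assumptions made for illustration only.

```python
def segment_rss(rss, sample_rate_hz=500, segment_seconds=5.0):
    """Split an RSS time series into non-overlapping fixed-length segments.

    Trailing samples that do not fill a whole segment are dropped
    (an assumption of this sketch, not something the paper specifies).
    """
    seg_len = int(round(sample_rate_hz * segment_seconds))
    if seg_len <= 0:
        raise ValueError("segment length must be positive")
    n_segments = len(rss) // seg_len
    return [rss[i * seg_len:(i + 1) * seg_len] for i in range(n_segments)]


# Example: 12 s of synthetic RSS values at 500 Hz gives two full 5 s segments.
trace = [-60.0 + 0.001 * k for k in range(6000)]
segments = segment_rss(trace)
print(len(segments), len(segments[0]))  # 2 2500
```

Each returned segment holds 2500 samples, which is the unit that the later feature extraction steps operate on.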
Time Domain Feature Extraction. Since on- and off-body signals are affected to different degrees by body motions and by large- and small-scale fading, we first decompose each RSS segment into multi-scale variations by using filters. As creeping waves are sensitive to body motions and their frequencies fall into relatively low frequency bands with a high probability [16], a band-pass filter is leveraged to extract motion-induced variations. Based on our experimental observations, most fluctuations caused by body motions fall between 0.5 Hz and 15 Hz. Variations in the residual low and high frequency bands are also extracted by a low-pass filter and a high-pass filter, respectively, as large- and small-scale variations. With multi-scale variations, we select six time domain features, including maximum, minimum, median, variance, kurtosis and skewness, to characterize propagation signatures from each kind of variation. The maximum, minimum, median and variance are chosen to describe the impact from the human body, because dramatic body vibration typically contributes to rapid changes in the maximum, minimum and median and also results in a large variance. Kurtosis and skewness describe the tailedness and asymmetry of the RSS distribution, respectively, and can potentially capture propagation patterns because radio waves contain both symmetric and asymmetric components. For finer-grained feature extraction, we divide each kind of variation into ten chunks and extract six features from each chunk. Therefore, a total of 180 feature points (3 kinds of variations, 10 chunks each, 6 features per chunk) are extracted to describe radio propagation signatures from the time domain of an RSS segment.
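The feature count above can be illustrated with a short sketch. It assumes the three filtered variation signals (large-scale, motion-induced and small-scale) are already available, since the filter design is not specified in the text; the synthetic sine waves below only stand in for them. The function names and the use of population moments for variance, kurtosis and skewness are likewise assumptions.

```python
import math
import statistics


def six_features(chunk):
    """Return (max, min, median, variance, kurtosis, skewness) of one chunk."""
    n = len(chunk)
    mean = sum(chunk) / n
    var = sum((v - mean) ** 2 for v in chunk) / n  # population variance
    if var == 0.0:
        kurt, skew = 0.0, 0.0  # convention for a constant chunk (assumption)
    else:
        std = var ** 0.5
        skew = sum(((v - mean) / std) ** 3 for v in chunk) / n
        kurt = sum(((v - mean) / std) ** 4 for v in chunk) / n
    return (max(chunk), min(chunk), statistics.median(chunk), var, kurt, skew)


def time_domain_features(variations, n_chunks=10):
    """Concatenate six features per chunk for each filtered variation signal."""
    features = []
    for signal in variations:
        chunk_len = len(signal) // n_chunks
        for c in range(n_chunks):
            chunk = signal[c * chunk_len:(c + 1) * chunk_len]
            features.extend(six_features(chunk))
    return features


# Synthetic stand-ins for the three filtered variations of one 5 s segment.
n = 2500
variations = [
    [math.sin(2 * math.pi * 0.1 * k / 500) for k in range(n)],   # large-scale
    [math.sin(2 * math.pi * 2.0 * k / 500) for k in range(n)],   # motion-induced
    [math.sin(2 * math.pi * 60.0 * k / 500) for k in range(n)],  # small-scale
]
features = time_domain_features(variations)
print(len(features))  # 180
```

With three variation signals, ten chunks each and six features per chunk, the resulting vector has the 180 entries stated in the text.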
Frequency Domain Feature Extraction. To extract frequency domain features, we start by performing a Short-Time Fourier Transform (STFT) on each RSS segment to obtain its two-dimensional spectrogram. Specifically, with a signal sampling rate of 500 Hz, we conduct a 1000-point Fast Fourier Transform (FFT) within a 2 s sliding window, shifting 1 s each time to make full use of the sampled data. To summarize information in the frequency domain, the frequency band of each spectrogram, i.e., [0, 250] Hz, is partitioned into 40 intervals, each of which is associated with a frequency component of the segment. To effectively indicate propagation signatures, we equally segment the low frequency band, i.e., [0, 15] Hz, into 30 intervals and the residual high frequency band into 10 intervals, and we sum up the magnitudes in each interval in every FFT result. In this way, we transform a two-dimensional spectrogram into a $4 \times 40$ matrix $\mathbf{M}$. Then we take two frequency domain features from $\mathbf{M}$: the component magnitude (i.e., each element of $\mathbf{M}$) and the proportion of each component (PC), such that $\mathrm{PC}_j = \sum_{i=1}^{4} M_{i,j} / S$, where $S = \sum_{i=1}^{4} \sum_{j=1}^{40} M_{i,j}$. Finally, a total of 200 feature points (160 magnitudes and 40 proportions) are extracted from the frequency domain of an RSS segment.
III-D Propagation Pattern Recognition
We formulate the propagation pattern recognition as a binary classification task $\mathcal{X} \rightarrow \mathcal{Y}$, where $\mathcal{X}$ is the sample space and $\mathcal{Y}$ is the target label set. In our context, each $x \in \mathcal{X}$ is a radio propagation profile sample, and $y \in \mathcal{Y}$ indicates the corresponding on- or off-body IoT device. Moreover, for each $x$, $s$ denotes an auxiliary label that refers to the body motion that $x$ is sampled from.
We develop an adversarial multiplayer network for propagation pattern recognition. As depicted in Fig. 4, our model encompasses three blocks: a Feature Extractor $E$, an On-Off Predictor $P$ and a Motion Discriminator $D$. Since simply wiping out all dependencies between feature representations and domains (e.g., motions in our application) could degrade the accuracy of target label prediction, our model adopts a conditional adversarial architecture [12] for better generalization performance.
Feature Extractor $E$. An extractor is the front block of the adversarial model. It takes as input a propagation profile $x$ and returns a latent feature representation as $E(x)$.
On-Off Predictor $P$. A predictor acts as the end block of the model. It takes as input a learned feature representation $E(x)$ and outputs a two-dimensional probability vector $P(E(x))$ in terms of on- and off-body devices.
Motion Discriminator $D$. A discriminator serves as an adversary in our model in the training phase. It takes a feature representation $E(x)$ and the associated on-off probability vector $P(E(x))$ together as input, and discriminates which motion state $x$ is sampled from as $D(E(x), P(E(x)))$.
Adversarial Training. Before performing authentication, our adversarial model needs to be trained on training data, which follows the distribution $p(x, y, s)$. We define the loss of $P$ as the cross-entropy between $P(E(x))$ and the true posterior target label distribution over $p(x, y)$, which is given as
$$\mathcal{L}_p(E, P) = \mathbb{E}_{(x, y) \sim p(x, y)}\left[-\log P_y(E(x))\right], \tag{1}$$
where $P_y(E(x))$ is the probability that the predictor assigns to label $y$.
Similarly, the loss of $D$ is defined as the cross-entropy between $D(E(x), P(E(x)))$ and the true conditional distribution over $p(x, s)$, which is expressed as
$$\mathcal{L}_d(E, P, D) = \mathbb{E}_{(x, s) \sim p(x, s)}\left[-\log D_s(E(x), P(E(x)))\right], \tag{2}$$
where $D_s(\cdot)$ is the probability that the discriminator assigns to motion $s$.
Note that to effectively learn the parameters of our multiplayer model, the flow from $P$ to $D$ is a one-way link (i.e., the black arrow line in Fig. 4), along which gradients do not propagate back. Thus, the parameters of $P$ are not updated through the optimization of the loss $\mathcal{L}_d$.
To robustly authenticate IoT devices under diverse body motions, it is critical for our model to implement an adversarial training criterion. The basic idea is that, to generalize well in unseen scenarios, a predictive model must discriminate well between on- and off-body devices but must not be able to distinguish the body motions associated with input samples. To achieve this goal, we use minimax games between $E$, $P$ and $D$ in the training phase. Particularly, $E$ plays a cooperative game with $P$ to minimize the loss $\mathcal{L}_p$. At the same time, $E$ and $D$ together play a minimax game, where $D$ aims to minimize the loss $\mathcal{L}_d$ and $E$ tries to maximize it.
We integrate the above objectives into one value function:
$$V(E, P, D) = \mathcal{L}_p(E, P) - \lambda \mathcal{L}_d(E, P, D), \tag{3}$$
where $\lambda > 0$ is a hyperparameter. With the value function (3), the adversarial training criterion can be implemented by optimizing the following minimax problem:
$$\min_{E, P} \max_{D} V(E, P, D). \tag{4}$$
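The interplay of the two losses can be sketched numerically. The snippet below assumes the value function takes the form "predictor loss minus a weighted discriminator loss", as the description of the cooperative and minimax games suggests; `cross_entropy`, `value_function` and `lam` are hypothetical names, and the probability vectors are made-up toy outputs rather than results of a trained network. Gradient blocking along the one-way link and the actual network updates are not shown.

```python
import math


def cross_entropy(prob_rows, labels):
    """Mean negative log-probability assigned to the true labels."""
    return -sum(math.log(p[y]) for p, y in zip(prob_rows, labels)) / len(labels)


def value_function(pred_probs, device_labels, disc_probs, motion_labels, lam):
    """V = L_pred - lam * L_disc (the form assumed in this sketch)."""
    loss_pred = cross_entropy(pred_probs, device_labels)
    loss_disc = cross_entropy(disc_probs, motion_labels)
    return loss_pred - lam * loss_disc, loss_pred, loss_disc


# Toy mini-batch: two samples, labels 0 = on-body, 1 = off-body.
pred_probs = [[0.9, 0.1], [0.2, 0.8]]
device_labels = [0, 1]
motion_labels = [0, 1]  # indices into five motion states

# Discriminator that recognizes the motion vs. one reduced to guessing.
disc_confident = [[0.96, 0.01, 0.01, 0.01, 0.01],
                  [0.01, 0.96, 0.01, 0.01, 0.01]]
disc_uniform = [[0.2] * 5, [0.2] * 5]

v_conf, lp, ld_conf = value_function(
    pred_probs, device_labels, disc_confident, motion_labels, lam=1.0)
v_unif, _, ld_unif = value_function(
    pred_probs, device_labels, disc_uniform, motion_labels, lam=1.0)
print(v_conf > v_unif)  # True
```

For the same predictor loss, the value is lower when the discriminator can only guess the motion, which is the direction the extractor pushes toward, while the discriminator itself pushes the other way by lowering its own loss.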
III-E Theoretical Analysis of Adversarial Model
We prove that the output of our adversarial model becomes invariant to motion changes through the adversarial training. Specifically, we first present the optimal predictor and optimal discriminator in Proposition 1 and Proposition 2, respectively, without proving them, and refer the reader to [12] (Proposition 2) for details. Then, we illustrate the virtual training criterion, optimal extractor and optimal output, respectively, in Corollary 1, Proposition 3 and Corollary 2. Differing from the theoretical efforts in the prior work [12], our analysis focuses on a practical adversarial model.
Proposition 1
(Optimal predictor) For a fixed extractor $E$, the output of the optimal predictor $P^*$ over $p(x, y)$ achieves
$$P^*(E(x)) = p(y \mid E(x)), \tag{5}$$
and the loss of $P^*$ is
$$\mathcal{L}_p(E, P^*) = H(y \mid E(x)), \tag{6}$$
where $H(\cdot \mid \cdot)$ denotes the conditional entropy function.
Note that given $E$, the equality (5) indicates the maximal predictive capability that a predictor can learn from $E(x)$.
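Proposition 1 can be checked on a small discrete example: the expected cross-entropy of a predictor is smallest when it outputs the true posterior, and the minimum equals the conditional entropy of the label given the feature. The joint distribution below is hypothetical, and the feature is discretized to two values purely for illustration.

```python
import math

# Hypothetical joint distribution p(z, y): z is a discretized feature value,
# y is the device label (0 = on-body, 1 = off-body).
joint = {("a", 0): 0.4, ("a", 1): 0.1, ("b", 0): 0.1, ("b", 1): 0.4}


def posterior(joint):
    """Return p(y | z) for every (z, y) pair in the joint distribution."""
    marginal = {}
    for (z, _), p in joint.items():
        marginal[z] = marginal.get(z, 0.0) + p
    return {(z, y): p / marginal[z] for (z, y), p in joint.items()}


def expected_cross_entropy(joint, q):
    """Expected negative log-probability of y under predictor q(y | z)."""
    return -sum(p * math.log(q[key]) for key, p in joint.items())


post = posterior(joint)
cond_entropy = expected_cross_entropy(joint, post)  # equals H(y | z)

# Any other predictor incurs a strictly larger loss.
other = {("a", 0): 0.6, ("a", 1): 0.4, ("b", 0): 0.3, ("b", 1): 0.7}
loss_other = expected_cross_entropy(joint, other)
print(loss_other > cond_entropy)  # True
```

The gap between the two losses is the expected Kullback-Leibler divergence between the true posterior and the suboptimal predictor, which is why no predictor can do better than the posterior.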
Proposition 2
(Optimal discriminator) Given any extractor $E$ and any predictor $P$, the optimal discriminator $D^*$ over $p(x, s)$ obtains
$$D^*(E(x), P(E(x))) = p(s \mid E(x), P(E(x))), \tag{7}$$
and its loss is
$$\mathcal{L}_d(E, P, D^*) = H(s \mid E(x), P(E(x))). \tag{8}$$
With the optimal predictor and optimal discriminator, we proceed to simplify the minimax training criterion (4).
Corollary 1
(Virtual training criterion) If $P$ and $D$ have enough capacity and are trained to be optimal over $p(x, y, s)$, the minimax optimization (4) is equivalent to the minimization of a virtual value function $\tilde{V}(E)$, which is expressed as
$$\tilde{V}(E) = H(y \mid E(x)) - \lambda H(s \mid E(x), p(y \mid E(x))), \tag{9}$$
where $\lambda$ is the hyperparameter of the value function (3).
Proof:
Considering the optimal predictor in Proposition 1, we can rewrite the loss of the optimal discriminator in Proposition 2, by substituting (5) into (8), as
$$\mathcal{L}_d(E, P^*, D^*) = H(s \mid E(x), p(y \mid E(x))). \tag{10}$$
According to the losses of the optimal predictor (6) and optimal discriminator (10), the initial value function (3) can be simplified as the virtual version (9). Thus, solving the minimax optimization (4) is equivalent to minimizing $\tilde{V}(E)$.
Then, we obtain the optimal extractor by minimizing $\tilde{V}(E)$.
Proposition 3
(Optimal extractor) If , and have enough capability and are trained to be optimal over , any optimal extractor satisfies
(11) 
and
(12) 
Proof:
When is fixed, and . Therefore, we obtain a lower bound of , that is
(13) 
Since the bound is achieved if and only if both the conditions (11) and (12) hold, proving that any optimal extractor satisfies (11) and (12) is identical to proving the equality .
We note that the lower bound is achievable by considering a special case, where , an extractor with the best representative ability. In this case, we can check that
(14) 
Remark 1
Proposition 3 indicates that when all blocks are trained to be optimal and our adversarial model reaches equilibrium, the extractor is able to extract all information about from the training samples and eliminate any information about except what is also related to .
Corollary 2
(Optimal output) If , and have enough capacity and are trained to be optimal over , the output of our adversarial model achieves
(15) 
and
(16) 
IV Evaluation in Real Environments
IV-A Experimental Methodology
Implementation. We build a proof-of-concept prototype of the proposed system with three GNURadio/USRP B210 devices, which work at 2.4 GHz with a sampling rate of 500 Hz. Two USRP devices are placed on a volunteer, referred to as a legitimate user, and are considered to be two on-body devices. The remaining device is placed on another volunteer, referred to as a malicious attacker, and is regarded as an off-body device.
Data Collection. We collect radio traces in both controlled and uncontrolled user motion scenarios. In the controlled scenario, the user is confined to five frequently appearing motions, which are comprised of two static motions, sitting and standing, and three dynamic ones, arm moving, rotating and walking. In the uncontrolled scenario, the user is permitted to behave casually. In both scenarios, the attacker is allowed to move freely in the vicinity of the user to try to fool the legitimate devices. Moreover, to verify the robustness of our system under various environments, we collect wireless signals in five indoor and outdoor settings, i.e., a lab, a meeting room, a corridor, a rooftop and a park. We conduct the experiments over seven days and collect a total of ten hours of radio traces.
Dataset. Our dataset includes a total of 7200 samples that are extracted from the collected radio traces. Therein, 6000 samples are from the controlled user motion scenario, and 1200 are from the uncontrolled scenario. When evaluating our model, we randomly take out 4800 samples from the controlled scenario for training and combine the leftover 1200 ones and all 1200 samples from the uncontrolled scenario for testing. Additionally, in both the training and testing sets, the numbers of on- and off-body samples are equal.
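The split described above can be sketched as follows. The sample counts are taken from the text; the function name `split_dataset`, the fixed seed and the integer stand-ins for samples are assumptions. The sketch does not enforce the equal numbers of on- and off-body samples mentioned in the text, which would require an additional stratification step.

```python
import random


def split_dataset(controlled, uncontrolled, n_train=4800, seed=0):
    """Randomly pick training samples from the controlled scenario; the rest
    of the controlled samples plus all uncontrolled ones form the test set."""
    rng = random.Random(seed)
    indices = list(range(len(controlled)))
    rng.shuffle(indices)
    train = [controlled[i] for i in indices[:n_train]]
    test = [controlled[i] for i in indices[n_train:]] + list(uncontrolled)
    return train, test


# Integer identifiers stand in for the 6000 controlled and 1200 uncontrolled samples.
controlled = list(range(6000))
uncontrolled = list(range(6000, 7200))
train, test = split_dataset(controlled, uncontrolled)
print(len(train), len(test))  # 4800 2400
```

Half of the 2400 test samples therefore come from motions that were never seen in a controlled form during training.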
Parameterization. As shown in Fig. 4, we parameterize our multiplayer model as a deep neural network. Specifically, the feature extractor is a convolutional neural network with eight convolutional layers to extract latent feature representations from input samples. Furthermore, the on-off predictor and the motion discriminator are each configured with three fully connected layers to facilitate their own predictions.
Evaluation Metrics. We use the following metrics to illustrate the performance of our system.

Accuracy. It is computed as the ratio of the number of RSS segments that are correctly recognized to the total number of on- and off-body RSS segments.

True positive (TP) rate. It is defined as the ratio of the number of on-body RSS segments that are correctly predicted to the total number of on-body segments.

False positive (FP) rate. It is defined as the ratio of the number of off-body RSS segments that are mistakenly accepted to the total number of off-body segments.
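The three metrics can be computed directly from per-segment labels and decisions, as in the sketch below. The label encoding (1 for on-body, 0 for off-body) and the function name `authentication_metrics` are assumptions, and the eight labels are toy data.

```python
def authentication_metrics(y_true, y_pred):
    """Compute accuracy, TP rate and FP rate.

    Labels: 1 = on-body (legitimate), 0 = off-body (attacker).
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    correct = sum(1 for t, p in pairs if t == p)
    n_on = sum(1 for t in y_true if t == 1)
    n_off = len(y_true) - n_on
    return {
        "accuracy": correct / len(y_true),
        "tp_rate": tp / n_on,
        "fp_rate": fp / n_off,
    }


# Toy example: four on-body and four off-body segments.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 1]
metrics = authentication_metrics(y_true, y_pred)
print(metrics)  # {'accuracy': 0.75, 'tp_rate': 0.75, 'fp_rate': 0.25}
```

When the numbers of on- and off-body segments are equal, as in the dataset described above, accuracy is the mean of the TP rate and one minus the FP rate. A TP rate of 89.0% and an FP rate of 8.2%, for instance, give (89.0% + 91.8%) / 2 = 90.4%.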
IV-B Performance Results
We first illustrate the overall performance of our authentication system on all testing data. As shown in Table I, our system is able to identify 90.4% of on- and off-body devices on average. Specifically, it can correctly recognize on-body devices with a ratio of 89.0% and successfully mitigate 91.8% of attacks from off-body devices. In addition, we report the receiver operating characteristic (ROC) curve of our system, which depicts the tradeoff between FP and TP rates as the discrimination threshold varies over the interval [0, 1]. As depicted in Fig. 7, the system’s ROC curve first goes straight up and then quickly flattens as the FP rate increases. Moreover, the area under the ROC curve (AUROC) reaches 0.958, which is close to 1, i.e., the AUROC of the ideal case. The above results indicate that our system achieves good authentication ability.
Table I: Overall authentication performance.
Accuracy | TP Rate | FP Rate
90.4% ± 1.9% | 89.0% ± 2.4% | 8.2% ± 1.7%
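The ROC curve and AUROC discussed above are obtained by sweeping the decision threshold over the predictor's on-body probability. The sketch below shows one common way to do this, using a trapezoidal area; the function names and the eight scores are made up for illustration and are not the paper's data.

```python
def roc_points(scores, labels):
    """Return (FP rate, TP rate) points for thresholds swept from high to low.

    scores: predicted on-body probabilities; labels: 1 = on-body, 0 = off-body.
    """
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    points = [(0.0, 0.0)]
    for threshold in sorted(set(scores), reverse=True):
        tp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 0)
        points.append((fp / n_neg, tp / n_pos))
    return points


def auroc(points):
    """Area under the ROC curve by the trapezoidal rule."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area


# Toy scores: higher means "more likely on-body".
scores = [0.95, 0.9, 0.8, 0.6, 0.55, 0.4, 0.3, 0.1]
labels = [1, 1, 0, 1, 1, 0, 0, 0]
points = roc_points(scores, labels)
print(auroc(points))  # 0.875
```

An AUROC of 1 corresponds to a predictor that ranks every on-body segment above every off-body segment, while 0.5 corresponds to random guessing.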
Then, we elaborate on the authentication performance for each frequently appearing motion. In general, each motion has a unique movement pattern of the human body, and thus exhibits different effects on BAN radio waves. As plotted in Fig. 7, the proposed system achieves better performance for the static motions than for the dynamic ones. The same observations are also shown in Fig. 7. Therein, higher TP rates and lower FP rates are clearly present in the static states, because there are fewer disturbances caused by body movements in radio signals when the user sits or stands still with IoT devices, which makes it much easier for the system to recognize on- and off-body propagation patterns. Despite the above differences, the system still achieves average TP and FP rates of 90.8% and 6.9%, respectively, in the controlled user motion scenario.
Next, we compare the system performance in the uncontrolled scenario with that in the controlled one. As illustrated in Fig. 10, the system shows performance degradation in each metric in the uncontrolled scenario. The reason for the degradation is that more irregular and complicated body movements are present when the user behaves casually, which causes the extractor to extract noisier features and thus hampers the prediction ability of the predictor. More specifically, the system has a TP rate reduction of 4.0% and an FP rate increase of 2.6% for uncontrolled motions. The TP rate suffers more because, compared with off-body signals, on-body signals, dominated by creeping waves, are more sensitive to user motion dynamics, which results in more on-body RSS segments being mistakenly classified as off-body ones.
We further illustrate the benefits of adopting an adversarial discriminator in our multiplayer model. Our discriminator aims at helping the extractor to discover transferable features and thus boosts the generalization ability of the predictor. To illustrate these merits, we set up a version of our model with a non-adversarial discriminator as a baseline. Note that in the baseline, the update of the extractor’s parameters relies solely on the minimization of the predictor’s loss.
Fig. 10 plots the training losses of the discriminators in our model and in the baseline. The loss of the non-adversarial discriminator declines quickly and then stabilizes at a very low level. In contrast, ours first fluctuates dramatically and finally converges to a high value. The early fluctuations of the adversarial loss are caused by its minimax optimization, and they subside gradually as motion-specific features irrelevant to the predictor fade out of the feature representation. The above observations reveal that the extractor in our model extracts more transferable features than that in the baseline. Furthermore, comparing the performance of the two predictors in Fig. 10, we see that both loss curves decrease at first and then increase after a certain number of iterations. However, the adversarial curve rises more slowly than the non-adversarial one, which suggests that our adversarial discriminator works as a regularizer that alleviates overfitting and improves the predictor’s generalization ability.
V Related Work
Dedicated sensors, including accelerometers [5], bioimpedance sensors [7], motion sensors [17] and capacitive touch sensors [8], have been used to differentiate on- and off-body devices. Additionally, various sensors in smartphones [18, 19] have also been exploited to identify devices or users. However, sensor-based approaches are limited to specified user motions or fitness-related wearables.
Existing measurements [9, 10] have shown that essential differences exist between on- and off-body radio propagations. Based on the above studies, radio propagation characteristics were examined to identify legitimate wearable devices [20]. In comparison to the prior work, our work develops a customized adversarial network to extract underlying propagation patterns and obtains better generalized authentication performance in various motion scenarios.
VI Conclusion
This paper presents a motion-invariant authentication system to secure on-body IoT device pairing and data transmission by harnessing an adversarial multiplayer network to effectively recognize underlying radio propagation patterns. Our system takes one step forward to embrace the advent of human-centric IoT by supporting various wearable devices under diverse user motions. Our theoretical analysis indicates that at equilibrium, our adversarial model is resilient to motion variances. We extensively evaluate the proposed system with various static and dynamic user motions in indoor and outdoor settings. The results show that our system can recognize 89.0% of legitimate devices while at the same time mitigating 91.8% of impersonation attack attempts.
Acknowledgement
The work was supported in part by the NSFC under Grant 61871441, 91738202, 61729101, the RGC under Contract CERG 16203215, Young Elite Scientists Sponsorship Program by CAST under Grant 2018QNRC001, National Key R&D Program of China under Grant 2017YFE0121500, the Key Laboratory of Dynamic Cognitive System of Electromagnetic Spectrum Space (Nanjing Univ. Aeronaut. Astronaut.), MIIT, China under Grant KF20181911.
References
 [1] D. Uckelmann, M. Harrison, and F. Michahelles, “An architectural approach towards the future Internet of Things,” in Springer Architecting the Internet of Things, 2011, pp. 1–24.
 [2] L. Mainetti, V. Mighali, and L. Patrono, “An IoT-based user-centric ecosystem for heterogeneous smart home environments,” in Proc. IEEE ICC, 2015, pp. 704–709.
 [3] S. Gollakota et al., “They can hear your heartbeats: noninvasive security for implantable medical devices,” in Proc. ACM SIGCOMM, vol. 41, no. 4, 2011, pp. 2–13.
 [4] L. Shi et al., “BANA: body area network authentication exploiting channel characteristics,” IEEE J. Sel. Areas Commun., vol. 31, no. 9, pp. 1803–1816, 2013.
 [5] G. Revadigar et al., “Accelerometer and fuzzy vault-based secure group key generation and sharing protocol for smart wearables,” IEEE Trans. Inf. Forensics Security, vol. 12, no. 10, pp. 2467–2482, 2017.
 [6] W. Xu et al., “Gait-key: A gait-based shared secret key generation protocol for wearable devices,” ACM Trans. Sensor Networks, vol. 13, no. 1, p. 6, 2017.
 [7] C. Cornelius et al., “A wearable system that knows who wears it,” in Proc. ACM MobiSys, 2014, pp. 55–67.
 [8] T. Vu et al., “Distinguishing users with capacitive touch communication,” in Proc. ACM MobiCom, 2012, pp. 197–208.
 [9] F. Di Franco et al., “On-body to on-body channel characterization,” in IEEE Sensors J., 2011, pp. 908–911.
 [10] J. Ryckaert et al., “Channel model for wireless communication around human body,” IET Electronics Letters, vol. 40, no. 9, pp. 543–544, 2004.

 [11] Y. Ganin et al., “Domain-adversarial training of neural networks,” MIT press Journal of Machine Learning Research, vol. 17, no. 1, pp. 2030–2096, 2016.
 [12] M. Zhao et al., “Learning sleep stages from radio signals: A conditional adversarial architecture,” in Proc. ACM ICML, 2017, pp. 4100–4109.
 [13] Y. Shinohara, “Adversarial multitask learning of deep neural networks for robust speech recognition,” in Interspeech, 2016, pp. 2369–2372.
 [14] I. Goodfellow et al., Deep learning. MIT press, 2016, vol. 1.

 [15] “MobileNetV2: The Next Generation of On-Device Computer Vision Networks,” Google Research, April 3, 2018. [Online]. Available: https://ai.googleblog.com/2018/04/mobilenetv2nextgenerationofon.html
 [16] Y. Xiong and F. Quek, “Hand motion gesture frequency properties and multimodal discourse analysis,” Springer International Journal of Computer Vision, vol. 69, no. 3, pp. 353–371, 2006.
 [17] W. Xu et al., “Walkie-talkie: Motion-assisted automatic key generation for secure on-body device communication,” in Proc. ACM/IEEE IPSN, 2016, pp. 1–12.
 [18] Y. Ren et al., “Smartphone based user verification leveraging gait recognition for mobile healthcare systems,” in Proc. IEEE SECON, 2013, pp. 149–157.
 [19] A. Das, N. Borisov, and M. Caesar, “Do you hear what I hear?: Fingerprinting smart devices through embedded acoustic components,” in Proc. ACM CCS, 2014, pp. 441–452.
 [20] W. Wang et al., “Securing on-body IoT devices by exploiting creeping wave propagation,” IEEE J. Sel. Areas Commun., vol. 36, no. 4, pp. 696–703, 2018.