Harvesting Ambient RF for Presence Detection Through Deep Learning

02/13/2020 ∙ by Yang Liu, et al. ∙ 2

This paper explores the use of ambient radio frequency (RF) signals for human presence detection through deep learning. Using WiFi signal as an example, we demonstrate that the channel state information (CSI) obtained at the receiver contains rich information about the propagation environment. Through judicious pre-processing of the estimated CSI followed by deep learning, reliable presence detection can be achieved. Several challenges in passive RF sensing are addressed. With presence detection, how to collect training data with human presence can have a significant impact on the performance. This is in contrast to activity detection when a specific motion pattern is of interest. A second challenge is that RF signals are complex-valued. Handling complex-valued input in deep learning requires careful data representation and network architecture design. Finally, human presence affects CSI variation along multiple dimensions; such variation, however, is often masked by system impediments such as timing or frequency offset. Addressing these challenges, the proposed learning system uses pre-processing to preserve human motion induced channel variation while insulating against other impairments. A convolutional neural network (CNN) properly trained with both magnitude and phase information is then designed to achieve reliable presence detection. Extensive experiments are conducted. Using off-the-shelf WiFi devices, the proposed deep learning based RF sensing achieves near perfect presence detection during multiple extended periods of test and exhibits superior performance compared with leading edge passive infrared sensors. The learning based passive RF sensing thus provides a viable and promising alternative for presence or occupancy detection.







I Introduction

Presence detection plays a key role in improving operation efficiency and reducing carbon footprint for office and residential buildings. The use of occupancy information in controlling HVAC and lighting systems has become increasingly prevalent. Existing methods for human presence detection include passive infrared (PIR), microwave, CO2, and wearable sensors, and cameras [26], among others. Microwave sensors are overly sensitive and tend to have frequent false alarms, e.g., detecting movements outside of intended coverage areas. CO2 sensors have a slow response time and a high cost barrier. Cameras raise privacy concerns and are sensitive to lighting conditions. Wearable sensors/devices can be intrusive or cumbersome for users. PIR sensors are the most widely deployed method for presence detection. A PIR sensor picks up infrared emission using its on-board pyroelectric sensor and detects movement of humans (or objects) through heat variation within the field of view. Its drawbacks are low sensitivity and limited coverage, and it is mostly used for isolated lighting control.

This paper explores the use of RF signals for presence detection. In particular, we use WiFi signals in the current work given their ubiquity in almost all indoor environments. Current and future WiFi systems (e.g., the upcoming WiFi-6) employ multiple-input and multiple-output orthogonal frequency-division multiplexing (MIMO-OFDM) at the physical layer. As such, the CSI contains rich information about the ambient environment in spatial, temporal, and frequency domains.

Exploiting ambient RF signals for detecting, localizing, tracking, and identifying human motion/activities has been extensively studied in the literature [4, 10, 19, 33, 21, 31, 35, 15, 17, 28, 5, 27, 7, 32, 16, 23, 25, 34, 29, 18, 37, 22, 8, 20, 36]. Early work on indoor RF sensing mainly relies on the received signal strength indicator (RSSI) [4, 10, 19]. RSSI measures instantaneous attenuation of RF signals at the receiver and its temporal variation can be associated with motion/activities of humans/objects. Recently, more fine-grained features such as CSI have been used for RF sensing. For example, different human activities (e.g., running, walking and eating) or locations can be recognized by analyzing their unique effect on the CSI [28, 5, 27, 7, 32, 16, 23]. Another interesting application is gesture recognition; the SignFi [18] system uses CSI extracted from WiFi signals to classify sign gestures with high accuracy.

There is an important distinction between presence detection and activity detection (e.g., sign language [18] or fall detection [21]). For the detection of particular activities, one can use a model based approach - a given activity imposes an identifiable signature on RF propagation, thus hand-crafted features extracted from received signals can be exploited. Alternatively, a data driven approach can be used where collected training data are fed to machine learning algorithms (e.g., a neural network) which learn to discriminate different states (labels) through training. For presence detection, however, there are no defined activities when humans are present, thus a model based approach is typically not adequate. While a data driven approach appears to be a natural choice, it is unclear a priori what would be the best way to collect training data for presence detection. Perhaps the only reasonable assumption that one can make for presence detection is that humans are not expected to be completely still for an extended period of time. We comment here that there exist studies that detect human presence using RF signals through either careful calibration of the human-free environment [25, 34] or breathing detection [29]. Their performance, however, is highly sensitive to environment changes (e.g., a different room, moved furniture) or human locations (as in the case of breathing detection).

There is prior work on presence detection using CSI. The FreeDetector system [37] achieves occupancy detection by computing the temporal similarity of CSIs across frequencies; however, it can only detect walking across the line of sight between the transmitter and the receiver. The PADS and R-TTWD systems in [22] and [36] utilize a support vector machine (SVM) to detect motion; the inputs to the SVM come from CSI time series after dimensionality reduction through principal component analysis. Many existing approaches discard the phase information of CSI as it is typically noisy due to either estimation error or inherent impediments such as carrier frequency offset (CFO) and sampling time offset (STO). For example, in [8, 20], the temporal cross correlation of CSI amplitude is used since motion tends to decrease the correlation of CSI in time.

This paper proposes a WiFi CSI based presence detection system consisting of pre-processing for data representation, a CNN for motion detection, and post-processing for the eventual presence detection. The rationale for choosing a CNN is its ability to exploit CSI variation in multiple dimensions thanks to the MIMO-OFDM waveforms at the physical layer. In contrast, while a recurrent neural network (RNN) can also be used for learning the variation pattern of CSI over time, it introduces more computational overhead than a CNN, and the long-term memory offered by an RNN provides no meaningful gain in this application given that channel correlation in time and frequency both diminish as distances increase. With a multi-layer perceptron (MLP), on the other hand, local features, i.e., changes in correlation in both temporal and frequency domains, are not explicitly explored, hence an MLP typically requires much deeper networks to achieve similar performance.

I-A Summary of Contributions

A new CNN architecture is proposed that separately processes the magnitude and phase information with two independent CNNs before combining the respective outputs as input to fully connected (FC) layers. Such a CNN architecture provides an important alternative to existing approaches for handling complex input (e.g., simply stacking up real/imaginary components or magnitude/phase components). This allows for different pre-processing of CSI magnitude and phase, which is critical in exploiting motion induced CSI variation in the presence of various channel and hardware impairments.

Pre-processing of CSI estimate is carefully designed where spatial, temporal, and frequency domain information is exploited in a holistic manner. The pre-processing takes into consideration how human movement affects CSI while insulating against unintended distortion in RF circuitry (e.g., CFO/STO). Fourier transform is used to localize important motion-induced features in the constructed image, making it more amenable for presence detection as CNN builds its ability for discriminating data through local features (i.e., small kernel size).

An important contribution of the present work is extensive testing implemented using commercial off-the-shelf (COTS) WiFi devices. Assuming that humans are not completely still for an extended period of time, our presence detection compares favorably against that of commercial PIR sensors. We note that the comparison is done without customized WiFi hardware. For example, we have found that when using USRP systems [3], which give full control over the sampling frequency and a much cleaner CSI estimate, performance can improve substantially over that of a COTS WiFi receiver.

I-B Organization and Notation

Section II describes the MIMO-OFDM waveforms and how human movement impacts wireless channels. The design of the sensing system including pre-processing, the proposed CNN architecture, and post-processing is described in Section III. Experiment setup and the corresponding test results are provided in Section IV followed by conclusion in Section V.

Scalars are denoted by either lower or upper case letters, e.g., and . Column vectors and matrices are denoted by lower and upper case bold letters, e.g., and . The -th entry of a vector , the -th entry of , and the th entry of a -D array are denoted by , , and , respectively. and are used to denote the -th column and the -th row of . Similarly, represents the -D matrix at depth of the -D array . denotes the discrete Fourier transform (DFT) hence the inverse DFT. and denote magnitude and phase of a complex number .

II MIMO-OFDM System Model

II-A MIMO-OFDM

Consider a MIMO-OFDM system with transmit antennas, receive antennas, and subcarriers. Each physical layer frame consists of OFDM symbol blocks. Denote by the -th frequency domain OFDM symbol vector in the -th frame sent by the -th transmit antenna. The discrete-time complex baseband signal corresponding to is given by . At the receiver, after cyclic prefix removal and applying DFT, the complex baseband sample at the -th receive antenna in the frequency domain can be expressed as, for ,


where is the channel noise and the channel coefficient from the -th transmit antenna to the -th receive antenna on the -th subcarrier, assumed to be a constant within one frame (i.e., for all OFDM blocks within the -th frame). Expressed in vector form, we have

where is the channel matrix corresponding to the -th subcarrier for the -th OFDM frame and .
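For reference, the per-subcarrier input-output relation described in this subsection can be written out as below. This is only a sketch of the standard MIMO-OFDM model: the symbols ($N_t$ and $N_r$ for the antenna counts, $k$ for the subcarrier, $n$ for the frame, $m$ for the OFDM block within a frame) are assumptions chosen for illustration and may differ from the notation used elsewhere in the paper.

```latex
% Assumed symbols (illustrative only):
%   N_t, N_r : number of transmit / receive antennas
%   k        : subcarrier index,  n : frame index,  m : OFDM block index
\begin{align}
  y_j^{(n,m)}[k] &= \sum_{i=1}^{N_t} H_{j,i}^{(n)}[k]\, x_i^{(n,m)}[k] + w_j^{(n,m)}[k],
  \qquad j = 1, \dots, N_r, \\
  \mathbf{y}^{(n,m)}[k] &= \mathbf{H}^{(n)}[k]\, \mathbf{x}^{(n,m)}[k] + \mathbf{w}^{(n,m)}[k],
  \qquad \mathbf{H}^{(n)}[k] \in \mathbb{C}^{N_r \times N_t}.
\end{align}
```

The channel coefficient carries no block index $m$ because it is assumed constant within one frame.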

II-B Effect of Human Motion on MIMO-OFDM Channel

Human motion leads to CSI variation in both the frequency (across subcarriers) and temporal (across frames) domains. Human presence and movement introduce new paths whose delays are affected by human locations, leading to a change in the path delay profile. This in turn changes the channel frequency response. Human movement also induces temporal CSI variation (i.e., signals for paths affected by humans may add in-phase or out-of-phase at the receiver depending on the locations of humans). An alternative interpretation is the increase of Doppler spread due to human movement in an otherwise static environment, leading to time-selective channel fading [24]. An example using real WiFi measurement of the variation of over frame index with and without human movement for fixed and is shown in Fig. 1 for four evenly spaced subcarriers. Clearly, with movement, channel variation both in frequency (across subcarriers) and in time (along the horizontal axis) increases.

The effect of human movement on the CSI in the spatial dimension is more subtle. The human motion induced CSI variation in the temporal and frequency domains applies to every transmit-receive antenna pair. The fact that multiple transceiver pairs (a.k.a. spatial diversity) exist in the MIMO-OFDM system should be exploited for enhanced sensing performance. A CNN is a natural choice for exploiting such spatial diversity by mapping the temporal-frequency CSI corresponding to each transceiver pair to a layer (‘channel’) in a CNN architecture, much like the way color images are processed in a CNN where RGB pixels serve as separate channels.

Multiple antennas at WiFi transceivers are also exploited in this paper to make the phase information of the CSI estimate much more useful for presence detection. As WiFi devices use a single oscillator for the RF circuitry corresponding to different antennas, the CFO, if present, is common to all inputs at different receive antennas. Similarly, sampling is driven by a single clock, hence the STO is also identical for all inputs at different receive antennas. Thus instead of using the raw CSI phase measurement, one can use phase differences between receive antennas to remove phase variation due to CFO and STO. While such processing has no effect on digital communication performance (e.g., it does not correct residual CFO/STO for each receive chain for the purpose of symbol detection), it cleans up the phase information when phase variation due to human movement is of interest. An example of phase differences is given in Fig. 2 with where the CSI phase from the first antenna serves as a reference. Clearly, phase differences stay relatively stable in a human-free environment. In contrast, human motion introduces significant fluctuation to the relative phases across the three receive antennas.
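The cancellation argument above can be checked numerically. The following is a minimal NumPy sketch, not the authors' code: it assumes a static channel observed at one fixed subcarrier, models residual CFO/STO as a random per-frame phase shared by all receive chains, and uses hypothetical sizes (`n_frames`, `n_rx`).

```python
import numpy as np

rng = np.random.default_rng(0)
n_frames, n_rx = 200, 3  # hypothetical sizes, for illustration only

# Static channel phase per receive antenna (no human motion).
static_phase = rng.uniform(-np.pi, np.pi, size=n_rx)

# Assumed impairment model: one random phase per frame, shared by all
# receive chains (stand-in for residual CFO/STO at a fixed subcarrier).
common_offset = rng.uniform(-np.pi, np.pi, size=(n_frames, 1))

csi = np.exp(1j * (static_phase[None, :] + common_offset))  # (frames, rx)


def phase_difference(csi, ref=0):
    """Phase of each antenna relative to antenna `ref`, wrapped to (-pi, pi]."""
    return np.angle(csi * np.conj(csi[:, [ref]]))


raw_phase = np.angle(csi)
rel_phase = phase_difference(csi)

# Largest standard deviation over time across antennas.
raw_spread = raw_phase.std(axis=0).max()
rel_spread = rel_phase.std(axis=0).max()
print(f"raw phase spread: {raw_spread:.3f} rad, relative phase spread: {rel_spread:.2e} rad")
```

Under these assumptions the raw phase fluctuates over the whole interval while the relative phase is constant up to rounding, which is the behaviour the human-free panel of Fig. 2 illustrates.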

(a) human-free environment
(b) human movement
Fig. 1: CSI magnitude variation over time for four evenly spaced subcarriers.
(a) human-free environment
(b) human movement
Fig. 2: CSI phase difference between antennas over time

Figs. 1 and 2 indicate that both magnitude and phase of estimated CSI contain rich information about human motion. While in theory, deep learning trained using labeled data appears to be a straightforward exercise, the challenge is that for presence detection, there is no clearly defined human motion that one tries to detect. As such, collecting labeled training data with human presence needs to be carefully addressed along with the design of the learning system for presence detection.

III System Design

A high level description of the proposed system is depicted in Fig. 3. Consecutive CSIs are first arranged into CSI magnitude and phase images. They are processed separately and fed into the CNN learning block comprised of two parallel CNNs - one for magnitude images and the other for phase images - followed by FC layers (cf. Fig. 7 and Section III-B). The post-processing block accumulates instantaneous detection results provided by the CNN and outputs the final presence detection depending on the required time resolution.

Fig. 3: system flowgraph

III-A Input Pre-processing

Recall that is an array consisting of MIMO channel matrices across all subcarriers for the -th frame. Instantaneous motion detection is based on collected over consecutive frames, denoted as . Here we assume without loss of generality (WLOG) the first CSI array has frame index . For each , we select evenly spaced subcarriers out of subcarriers, resulting in with size . Down selection of subcarriers significantly reduces the data dimension yet does not have any negative effect on sensing performance. This is because the carrier spacing in WiFi signals (kHz) is much smaller than the coherence bandwidth in a typical indoor (i.e., low mobility) environment, thus the behaviors of subcarriers that are immediate neighbors closely track each other with or without human motion. The resulting are subsequently stacked up along the temporal domain to form a 4-D array of size . The magnitude and phase information are then extracted from prior to independent pre-processing.
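The down-selection and stacking step can be sketched as follows. This is an illustrative NumPy sketch only: the function names, the axis order (frame, subcarrier, receive antenna, transmit antenna), and the rounding rule used to pick evenly spaced subcarriers are assumptions, not details taken from the paper.

```python
import numpy as np


def select_subcarriers(n_subcarriers, n_selected):
    """Indices of n_selected roughly evenly spaced subcarriers (assumed rule)."""
    return np.round(np.linspace(0, n_subcarriers - 1, n_selected)).astype(int)


def build_csi_tensor(frames, n_selected):
    """Stack per-frame CSI arrays along time and keep a subset of subcarriers.

    frames : sequence of complex arrays, each of shape (subcarriers, rx, tx).
    Returns a complex 4-D array of shape (n_frames, n_selected, rx, tx).
    """
    frames = np.asarray(frames)
    idx = select_subcarriers(frames.shape[1], n_selected)
    return frames[:, idx, :, :]


# Usage with hypothetical sizes: 10 frames, 56 subcarriers, 3x3 antennas.
rng = np.random.default_rng(1)
frames = rng.standard_normal((10, 56, 3, 3)) + 1j * rng.standard_normal((10, 56, 3, 3))
csi_4d = build_csi_tensor(frames, n_selected=14)
print(csi_4d.shape)
```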

III-A1 CSI magnitude

We reshape the 4-D array into a 3-D array by combining the last two spatial dimensions, i.e., the channel matrix for each subcarrier is flattened into a 1-D array. The obtained array, denoted by is of size .

Pre-processing involves normalization and transformation. Normalization is done to remove dependence of the absolute CSI magnitude on various environment parameters that are irrelevant to presence detection. For example, the dynamic range of is highly dependent on the distance between the transmitter and receiver and the existence of line of sight transmission. While various normalization methods can be used, we find through extensive experiments the following offers the most robust performance: for ,


where denotes element-wise division. Note that indexes OFDM frame, thus the normalization is done with respect to the first OFDM frame within the frames contained in .
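A minimal sketch of this normalization, assuming the 4-D axis order (frame, subcarrier, receive antenna, transmit antenna) and a hypothetical function name, is given below. It flattens the two antenna dimensions and divides every frame element-wise by the first frame in the window.

```python
import numpy as np


def normalize_magnitude(csi):
    """Normalize CSI magnitude by the first frame of the window.

    csi : complex array of shape (n_frames, n_subcarriers, n_rx, n_tx)
          with no zero entries.
    Returns a real array of shape (n_frames, n_subcarriers, n_rx * n_tx).
    """
    n_frames, n_sc = csi.shape[:2]
    # Flatten the antenna dimensions: one "link" per transmit-receive pair.
    mag = np.abs(csi).reshape(n_frames, n_sc, -1)
    # Element-wise division by the first frame (broadcast over frames).
    return mag / mag[0]
```

Because each entry is divided by its own value in the first frame, a common gain factor (for example, from a different transmitter-receiver distance) cancels, and the first frame of the normalized array is all ones.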

Subsequently, a 2-D DFT is applied to along the temporal (frame) and frequency (subcarrier) dimensions, resulting in the output array for each transceiver antenna pair:

Here the DFT output is properly shifted so that zero frequency component is at the center of the array. The use of 2-D DFT serves two purposes. First, human motion induced temporal variation of CSI is continuous in nature. As such, it results in dispersion in the lower frequency region along the temporal dimension. This is in contrast to hardware impairment and channel estimation error when sudden change of CSI may be observed irrespective of human presence. Therefore, high frequency change can be removed by simple cropping of the DFT output along the temporal dimension around zero frequency:


where , and is the cropping window size. Here we assume WLOG that both and are even numbers. Cropping also significantly reduces the image size, leading to faster learning and reduced storage requirement. This makes the learning suitable to be implemented on edge devices instead of having to resort to cloud services. Note that further reduction of image size can be achieved by utilizing the conjugate symmetry of the 2-D FFT due to the fact that the input to the FFT is real-valued (i.e., magnitude of CSI arrays).
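The transform-shift-crop sequence can be sketched in NumPy as follows. The function name and axis order are assumptions, and taking the magnitude of the DFT output is also an assumption made here so that the result is a real-valued image.

```python
import numpy as np


def magnitude_image(norm_mag, crop):
    """2-D DFT over (frame, subcarrier), centred and cropped along time.

    norm_mag : real array of shape (n_frames, n_subcarriers, n_links),
               with n_frames even.
    crop     : even cropping window size, crop <= n_frames.
    Returns a real array of shape (crop, n_subcarriers, n_links).
    """
    spec = np.fft.fft2(norm_mag, axes=(0, 1))
    # Move the zero-frequency bin to the centre of both transformed axes.
    spec = np.fft.fftshift(spec, axes=(0, 1))
    centre = norm_mag.shape[0] // 2  # row of the temporal DC bin after the shift
    lo = centre - crop // 2
    # Keep only the low temporal frequencies around DC.
    return np.abs(spec[lo:lo + crop])


# A perfectly static (all-ones) input puts all energy in the DC bin.
static = np.ones((100, 14, 9))  # hypothetical sizes
img = magnitude_image(static, crop=20)
print(img.shape, img[10, 7, 0])
```

For the static input all energy lands in the single DC bin at the centre of the cropped image, which matches the description of the empty-room case being dominated by the DC component.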

Another reason for using the 2-D DFT is its ability to localize motion related CSI variation. While temporal variation in is exhibited for the entire frames, the 2-D DFT concentrates such variation into the low frequency region. This is particularly suitable for a CNN given its ability to build discriminating power on local features. Figs. 4(a) and 4(b) provide a sample of collected in the same room without and with human motion. One can see that, in the temporal dimension, the 2-D DFT using data collected in an empty room is dominated by the DC component. With human movement, there is clear dispersion in the low frequency region along the temporal (horizontal) dimension.

(a) human-free environment
(b) human movement
Fig. 4: 2D DFT of CSI magnitude along frame and subcarrier
(a) human-free environment
(b) human movement
Fig. 5: DFT of CSI phase difference at a fixed subcarrier
(a) human-free environment
(b) human movement
Fig. 6: DFT of CSI phase difference along time at all subcarriers

III-A2 CSI phase

Even with a completely static environment, the estimated CSI phase will undergo variation (e.g., from residual CFO and STO) which may lead to abrupt changes within . This can be partially resolved by phase unwrapping, which removes such abrupt changes. However, phase unwrapping does not remove the phase variation introduced by residual CFO and STO; it merely corrects discontinuous phase jumps. Thus CSI phases are often discarded for WiFi sensing [20, 8, 36] because of this “noisy” nature.

However, a simple pre-processing that computes phase difference with respect to a reference receive antenna can largely mitigate this problem due to the fact that CFO and STO are common to all receive antennas (see Section II-B). Denote by the phase difference between for different


where . The last two spatial dimensions of are then flattened into one dimension and phase is unwrapped along the time axis to remove discontinuity at boundary points and . The obtained result is denoted by . Unlike for CSI magnitude, only a 1-D DFT along the temporal dimension is performed on to get since phase unwrapping weakens the relation of CSI phase across different subcarriers. An example of is given in Figs. 5 and 6 where significantly increased dispersion of the DFT output along the temporal dimension can be observed with human movement.

The following steps are similar to how we obtain , where we shift the zero frequency component to the center and crop out the high frequency components in the temporal domain, leading to the following CSI phase information


where is chosen to be the same as that in (3).
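The phase branch can be sketched end to end as below. This is a hedged illustration rather than the authors' implementation: the function name and axis order are hypothetical, dropping the reference antenna (whose phase difference is identically zero) is an assumption, and the magnitude of the DFT output is taken so that the image is real-valued.

```python
import numpy as np


def phase_image(csi, crop, ref=0):
    """Cropped temporal DFT of the unwrapped inter-antenna phase difference.

    csi  : complex array of shape (n_frames, n_subcarriers, n_rx, n_tx),
           with n_frames even.
    crop : even cropping window size, crop <= n_frames.
    ref  : index of the reference receive antenna.
    Returns a real array of shape (crop, n_subcarriers, (n_rx - 1) * n_tx).
    """
    n_frames, n_sc = csi.shape[:2]
    # Phase relative to the reference antenna cancels common CFO/STO terms.
    diff = np.angle(csi * np.conj(csi[:, :, [ref], :]))
    # Drop the reference antenna and flatten the antenna dimensions.
    diff = np.delete(diff, ref, axis=2).reshape(n_frames, n_sc, -1)
    # Unwrap along time to remove jumps at the -pi / pi boundary.
    diff = np.unwrap(diff, axis=0)
    # 1-D DFT along time only, zero frequency moved to the centre.
    spec = np.fft.fftshift(np.fft.fft(diff, axis=0), axes=0)
    lo = n_frames // 2 - crop // 2
    return np.abs(spec[lo:lo + crop])
```

With a static channel and a phase offset common to all receive chains, the output of this sketch has energy only in the temporal DC row, so any dispersion around DC reflects changes in the relative phases.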

III-A3 Image Normalization

The DFT typically increases the dynamic range of the magnitude and phase images. Elements with low intensity are easily overwhelmed by those with large values. The logarithmic operator is applied to each element in both images [9] to reduce such disparity. The final inputs to the two parallel CNNs, referred to as (6), are the resulting log-scaled magnitude and phase images.

Fig. 7: Architecture of the proposed CNN

III-B Architecture of CNN

The architecture of the proposed CNN is shown in Fig. 7. Magnitude and phase images in (6) are fed into two parallel CNNs which share the same structure. The outputs of the two CNNs are then concatenated and fed to FC layers.

The building blocks of the proposed CNN are similar to those in AlexNet [13]. Each of the two parallel CNNs consists of two convolution (Conv) layers without padding. Each Conv layer is followed by an average pooling layer [14] to reduce the output image dimension. The multi-dimensional output of the last Conv layer is flattened into vectors and subsequently fed into an FC layer. Batch Normalization (BN) is added after each layer that has trainable parameters, which can speed up training and make the model more robust against variations in outputs from previous layers. Two activation functions are used - rectified linear unit (ReLU) and softmax. ReLU is used for the hidden layers whereas softmax is used for the output layer. We note that with presence detection, the number of classes is 2, i.e., it is a binary classification problem. Therefore, a sigmoid function can be used instead of softmax for the output layer. However, our experiment indicates slightly more robust classification performance using softmax - this can perhaps be attributed to the difference in weight and bias terms between the two: softmax employs two independent sets of weight vectors and biases for the two neurons whereas the sigmoid function has a single input to the neuron at the output layer. While mathematically one can show equivalence between the two for binary classification by finding the corresponding parameters, learning such parameters through training may yield some performance difference.
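The mathematical equivalence mentioned above can be verified directly: a two-neuron softmax output equals a sigmoid applied to the difference of the two logits, i.e., a single neuron whose weights and bias are the differences of the two softmax neurons' weights and biases. The sketch below uses random, hypothetical weights and features purely for illustration.

```python
import numpy as np


def softmax2(z0, z1):
    """Probability of class 1 from a two-neuron softmax output layer."""
    z = np.array([z0, z1], dtype=float)
    e = np.exp(z - z.max())  # subtract the max for numerical stability
    return e[1] / e.sum()


def sigmoid(z):
    """Logistic function of a single logit."""
    return 1.0 / (1.0 + np.exp(-z))


rng = np.random.default_rng(4)
h = rng.standard_normal(16)       # hypothetical output of the last hidden layer
W = rng.standard_normal((2, 16))  # softmax layer: one weight vector per class
b = rng.standard_normal(2)        # softmax layer: one bias per class

p_softmax = softmax2(W[0] @ h + b[0], W[1] @ h + b[1])
# Equivalent single sigmoid neuron: weight and bias are the differences.
p_sigmoid = sigmoid((W[1] - W[0]) @ h + (b[1] - b[0]))
print(p_softmax, p_sigmoid)
```

The two printed probabilities agree up to rounding; the softmax layer simply carries a redundant set of parameters, which changes the optimization path but not the family of decision functions.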

In the training phase, cross-entropy is chosen to be the loss function and Adam optimizer is used to update weights during backpropagation. To prevent overfitting, both regularization and dropout layers with dropout probability are added for each fully-connected hidden layer.

III-C Post-processing

The design of the post-processing block is closely tied to how data collection is conducted. As alluded to in the introduction, presence detection differs from detection of certain activities in that one is not looking for a certain activity pattern but rather whether a room is occupied or not, assuming that occupants are not completely still for extended periods of time. As such, there are two different ways of collecting training data for the occupied state: one is to collect CSI for the entire duration when occupants are present; the alternative is to collect CSI only when occupants are moving. While in theory the former seems to be a natural choice - what we try to detect is the presence or absence of humans in a room - doing so leads to a significantly higher false alarm rate regardless of how much training data is collected. The reason is quite simple: training CSI data collected when humans are present will include many instances when humans are completely still. Such CSI samples, albeit scattered throughout the measurement data (i.e., not lasting for an extended period of time), are indistinguishable from those of an empty room. In essence, the training data corresponding to human presence are polluted with a large number of data samples that are similar to the training data without human presence.

We elect to use training data corresponding to the CSI instances when there are detectable human movements in the room. While this leads to ‘missed detection’ for instances when the occupants are still, simple post-processing can be done after the CNN block with tunable parameters such as the resolution with which the presence detection is desired. In short, the training is done so that the CNN attempts to reliably detect human motion of any kind; completely still human presence is thus likely to be classified as the negative state. Post-processing then applies an averaging operation within a time window, whose duration corresponds to the desired time resolution, for presence detection. This is sufficient in practice since with a truly empty room the CNN output should consist of negative outputs, whereas with humans present a significant portion of the outputs should be positive.
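One possible form of such windowed post-processing is sketched below. The paper only states that some averaging is applied within a time window; the specific rule (fraction of positive CNN outputs per non-overlapping window compared with a minimum fraction) and the parameter names `window`, `threshold`, and `min_fraction` are assumptions made for illustration.

```python
import numpy as np


def presence_from_motion(motion_prob, window, min_fraction=0.1, threshold=0.5):
    """Turn per-image motion probabilities into per-window presence decisions.

    motion_prob  : 1-D sequence of CNN outputs P(motion), in time order.
    window       : number of consecutive CNN outputs per decision
                   (sets the time resolution).
    min_fraction : assumed minimum fraction of positive outputs in a window
                   needed to declare presence.
    threshold    : assumed probability threshold for a positive output.
    Returns a boolean array with one decision per full window.
    """
    motion_prob = np.asarray(motion_prob, dtype=float)
    n_windows = len(motion_prob) // window
    blocks = motion_prob[:n_windows * window].reshape(n_windows, window)
    positive_fraction = (blocks > threshold).mean(axis=1)
    return positive_fraction >= min_fraction


# Two empty windows followed by one window with intermittent motion.
probs = [0.01] * 10 + [0.02, 0.9, 0.03, 0.95, 0.01]
decisions = presence_from_motion(probs, window=5, min_fraction=0.2)
print(decisions)
```

A longer window tolerates longer still periods at the cost of slower response, which is the time-resolution trade-off described above.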

IV Experiment Setup

This section describes the experiment setup where COTS WiFi cards are used to collect WiFi CSI in an indoor environment. Data collection is explained in detail and the presence detection result is compared to that using PIR sensors.

IV-A Experiment Setup

Our WiFi system consists of a laptop (Thinkpad T) as WiFi access point (AP) and a desktop (Dell OptiPlex ) as WiFi client. Atheros 802.11n WiFi chipset, AR9580, and Ubuntu 14.04 LTS with built-in Atheros-CSI-Tool [30] are installed on both computers. The AP sends packets at the rate of pkts/s, while the client is recording CSIs using Atheros-CSI-Tool, i.e., the CSI sampling interval is roughly ms. With receive antennas, transmit antennas, and subcarriers in a MHz channel operating at channel in the GHz band [1], each CSI instance is a complex valued array. Down-selecting to evenly spaced subcarriers, the resulting CSI array is of dimension .

The indoor environment in which both training data collection and testing are done is sketched in Fig. 8. Two different labs are used to understand how sensitive the developed system is to environment parameters. In both labs, there are multiple monitors/laptops on desks and multiple chairs on the floor, which are not drawn in this figure since their positions may change from day to day. Notice that since the transmit antennas are placed behind a laptop and the receive antenna array is surrounded by many other computers as shown in Fig. 9, there is no strong line of sight component between the transmitter and the receiver.

Fig. 8: Indoor space layout
(a) Transmitter (b) Receiver (c) PIR sensor
Fig. 9: Device Setup

IV-B Data Collection

For the image input to the CNN, we choose consecutive CSI instances, which lasts for around s. This is chosen since one second is sufficiently long for any detectable human motion to induce temporal CSI variation. A CSI image is only used (i.e., considered a valid sample) if it satisfies the following two conditions: 1) Every entry of is non-zero. This is imposed to remove erroneous CSI estimate - occasionally zero entries will show up in recorded CSI series, potentially due to hardware/firmware problems. 2) The time difference between the last and the first frame lies within s. WiFi scheduling may lead to different frame lengths hence excessively long interval between two CSI estimates. In the experiment, the cropping window size is chosen to be , hence and in (6) are of size and respectively.
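The two validity conditions can be expressed as a small check, sketched below. The function and parameter names are hypothetical, and treating the second condition as an upper bound on the time span of the window is an assumption, since the exact bound is not reproduced here.

```python
import numpy as np


def is_valid_sample(csi, timestamps, max_span_s):
    """Check whether a stacked CSI window qualifies as a training/test sample.

    csi        : complex array holding the stacked CSI frames of one window.
    timestamps : 1-D array of receive times in seconds, one per frame.
    max_span_s : assumed upper bound (seconds) on last-minus-first timestamp.
    """
    # Condition 1: no zero entries (guards against erroneous CSI estimates).
    no_zero_entries = bool(np.all(csi != 0))
    # Condition 2: the window must not be stretched by irregular packet timing.
    span = float(timestamps[-1] - timestamps[0])
    return no_zero_entries and span <= max_span_s
```

Windows failing either condition would simply be skipped when building the image set.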

Data collected in the human-free state is labeled as . The training data with label are collected when at least one person is walking randomly in the room. This way, training samples collected when occupants are completely still will not be used. Both human-free and motion data are collected on multiple days since the wireless channels are inherently nonstationary. This prevents CNN from being tuned to features that are irrelevant to presence detection, e.g., different CFO and STO on different days due to frequency drift. Data collection on any given day is also divided into disjoint runs which alternate between human-free and human motion. Finally, training and test data come from completely disjoint days.

The proposed CNN is built under Keras with Tensorflow as backend [6]. Training and off-line testing described in Section IV-C are performed on a Linux server (Dell PowerEdge R) with one E- v CPU and GB of RAM. On-line detection described in Section IV-D is run on the WiFi receiver (Dell desktop) with one i-3770 CPU and GB of RAM.

Days Location Dates (in )
Lab I Sept.15 to Sept.22
Lab II Oct.10 to Oct.30
Lab II Nov.26 to Dec.5
TABLE I: Data Collection

IV-C Motion Detection

We first evaluate motion detection using the proposed CNN without post-processing, i.e., the CNN is trained to classify input CSI images according to their labels. The CSIs were collected in days over a period of four months, as summarized in Table I. Data collected during the first three days were from Lab I, while the data for the remaining days were collected in Lab II. While training data consist of runs either labeled with (human free) or (human motion where someone is randomly walking at arbitrary speed and direction), evaluation is done in two ways: the first uses measurements from runs with a single state, while the second uses test runs including both human-free and human-motion measurements.

IV-C1 Single State Test Runs

Model name Label 0 Label 1
days size days size
model I
model II , ,
model III ,
TABLE II: Training set composition

The CNN with parameters is trained using data from days and the resulting model is denoted by model I. The number of training data in each class is summarized in Table II. Model I is then tested on single label data from the remaining days and the results are summarized in Table III. The performance on data collected in Lab II is quite consistent (all close to ) - notice that days were about one month earlier than the time that training data were collected, thus lab settings, e.g., the placement of the transceiver and number of surrounding objects, were quite different. The results show that the proposed CNN is robust to the environment changes over time but in the same room. However, performance is not nearly as good for data collected in Lab I (days ). For example, the false alarm rate in day is which is noticeably higher than other days.

A simple remedy is to include data collected from Lab I in the training set. This leads to model II in Table III where data from day are combined with randomly chosen samples from the previous training set (see Table II). The performance of model II on day 1 exhibits noticeable improvement over model I in false alarm rate.

IV-C2 Mixed State Test Runs

In this part of the experiment, each test run lasts minutes and is divided into five one-minute intervals. Measurement is done carefully such that each interval has the same state. Detection results of mixture runs on day , and are given in Fig. 10. We set the CSI step size to be , thus each one-minute interval contains around images. Instead of using predicted labels, probability of assigning each image to class is used to give a more refined detection performance. From Fig. 10, Model I can successfully track state change in a single run.
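The relation between the CSI step size and the number of images per interval follows from a standard sliding-window count, sketched below with hypothetical parameter names and example values (the actual frame counts and step size are not reproduced here).

```python
def window_starts(n_frames, window, step):
    """Start indices of all full sliding windows over n_frames CSI frames."""
    return list(range(0, n_frames - window + 1, step))


def count_images(n_frames, window, step):
    """Number of CSI images obtained from n_frames frames."""
    if n_frames < window:
        return 0
    return (n_frames - window) // step + 1


# Hypothetical example: 100 frames, 20-frame images, step of 10 frames.
print(count_images(100, 20, 10), window_starts(100, 20, 10))
```

A smaller step yields more, heavily overlapping images per interval, giving a finer-grained probability trace at the cost of more CNN evaluations.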

(a) day 1 (ground truth: )
(b) day 3 (ground truth: )
(c) day 4 (ground truth: )
(d) day 13 (ground truth: )
Fig. 10: Detection result for runs with mixed states.
Days Label 0 Label 1
size Model I Model II size Model I Model II
TABLE III: Test accuracy for models I and II

IV-D Presence Detection

Before presenting our presence detection results obtained in Lab II, let us first examine how the CNN trained with walking data performs when human motion more subtle than walking is used for testing. These data were collected on days from Oct. 20, 2019 to Oct. 24, 2019 in Lab II with various small scale motions (turning in chairs, arm waving, etc.). The results are summarized in Table IV. Clearly, even with the label 1 training data coming exclusively from random walking, the CNN can still reliably detect other motion types.

Days Label 0 Label 1
size model I size model I
TABLE IV: Test accuracy for small scale motion

The actual model used for presence detection (model III in Table II) is obtained by further augmenting the training data with label 0 data (i.e., human free) collected on day 16. This is done because the output motion probabilities of human-free data collected on day are closer to than those of other days. Adding these data for CNN training thus provides more sample diversity. Model III is then deployed at the WiFi receiver, along with post-processing, for real-time presence detection.

As a comparison study, presence detection is conducted concurrently using a PIR sensor. We chose the Honeywell DT8035 [2], a leading edge PIR sensor with a coverage range of (our lab dimension is ). A camera is used in the lab to provide ground truth. The PIR sensor is mounted on the shelf at one side of the room at a height of (see Fig. 9(c)). Note that the DT8035 also has a microwave sensor, which was disabled for this experiment. Throughout the experiment, human activities are restricted to the left side of the room (left of the red dashed line in Fig. 8(b)) to avoid the blind spot of the PIR sensor, as its coverage is conical.

The post-processing is an averaging process on the motion detection outputs of the CNN. Each new CSI instance is used to construct CSI images with the previous CSI estimates, i.e., a sliding window with step size one is applied to the CSI series. This results in a CNN output at a rate of about one per ms. The reporting rate for WiFi is times per second, i.e., once every ms. Each output is calculated using the CNN outputs from the previous one second, divided into intervals each of duration ms. A positive detection is declared for the one-second period if at least three out of the five subintervals within the second have positive motion detection, defined as at least CSI images having output label 1. Finally, given that the PIR sensor outputs its detection result between to times each second, we choose the detection resolution to be second for both WiFi and PIR: for PIR, presence is detected for each second if there is at least one positive detection within that one-second period.
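The voting rule described above can be sketched as follows. This is a minimal illustration, not the deployed implementation: the function names are hypothetical, and the per-subinterval image threshold `min_positive_images` is a placeholder parameter because its value is not given in this text. The five subintervals and the three-out-of-five rule follow the description above.

```python
from typing import Sequence


def wifi_second_is_positive(
    labels: Sequence[int],
    num_subintervals: int = 5,
    min_positive_images: int = 2,  # placeholder, not the value used in the paper
    min_positive_subintervals: int = 3,
) -> bool:
    """Decide presence for one second from the CNN output labels (0 or 1).

    labels holds the CNN outputs of the past second in time order. The second
    is split into equal subintervals; a subinterval is positive when enough of
    its images carry label 1, and the second is positive when enough
    subintervals are positive.
    """
    n = len(labels)
    positive_subintervals = 0
    for k in range(num_subintervals):
        start = k * n // num_subintervals
        stop = (k + 1) * n // num_subintervals
        if sum(labels[start:stop]) >= min_positive_images:
            positive_subintervals += 1
    return positive_subintervals >= min_positive_subintervals


def pir_second_is_positive(outputs: Sequence[int]) -> bool:
    """PIR rule: presence if at least one positive output within the second."""
    return any(outputs)
```

Requiring agreement across several subintervals suppresses isolated misclassified images, which is what keeps the false alarm rate low while still reacting to short bursts of motion.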

IV-D1 False alarm test

The test is done over a day period (days from Dec. 30, 2019 to Jan. 1, 2020), when Lab II is empty. Results shown in Table V are the numbers of one-second intervals in which presence is detected by the CNN and the PIR sensor. In order not to interrupt normal lab activities, a single test on certain days cannot last very long. For example, on day , the entire test is broken into three periods, and the shortest one, lasting for minute, happens during the lunch break. The entire test lasts for about hours, and the proposed system reports a false positive only four times, yielding a false alarm rate . The PIR sensor has a zero false alarm rate, but the results are comparable given that an isolated one-second positive can easily be ruled out for occupancy detection.
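As a rough cross-check, the per-second false alarm rate can be recomputed from the entries of Table V. The sketch below assumes that the rate is the number of one-second false positives divided by the total number of seconds tested. Because the durations in Table V are rounded, the result is only approximate. The variable and function names are illustrative.

```python
# Rows of Table V: (day, index, duration in seconds, CNN count, PIR count).
EMPTY_ROOM_RUNS = [
    (17, 1, 8 * 3600, 3, 0),
    (17, 2, 20 * 60, 0, 0),
    (17, 3, 9 * 3600, 1, 0),
    (18, 1, 8 * 3600, 0, 0),
    (18, 2, 9 * 3600, 0, 0),
    (19, 1, 12 * 3600, 0, 0),
]


def false_alarm_summary(runs):
    """Return total seconds tested and per-second false alarm rates."""
    total_seconds = sum(duration for _, _, duration, _, _ in runs)
    cnn_count = sum(cnn for _, _, _, cnn, _ in runs)
    pir_count = sum(pir for _, _, _, _, pir in runs)
    return {
        "total_seconds": total_seconds,
        "cnn_count": cnn_count,
        "pir_count": pir_count,
        "cnn_rate": cnn_count / total_seconds,
        "pir_rate": pir_count / total_seconds,
    }


summary = false_alarm_summary(EMPTY_ROOM_RUNS)
print(summary)
```

With the rounded durations, the runs add up to a little over 46 hours, and the four CNN false positives correspond to a rate on the order of 1e-5 per second.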

day  index  duration  CNN  PIR
17   1      8hrs      3s   0s
17   2      20mins    0s   0s
17   3      9hrs      1s   0s
18   1      8hrs      0s   0s
18   2      9hrs      0s   0s
19   1      12hrs     0s   0s
TABLE V: False alarm counts (in seconds) in an empty room
(a) CNN
(b) PIR sensor
Fig. 11: Comparison with PIR sensor: test 1
(a) CNN
(b) PIR sensor
Fig. 12: Comparison with PIR sensor: test 2
(a) CNN
(b) PIR sensor
Fig. 13: Comparison with PIR sensor: test 3
day  test index  duration  CNN   PIR
17   1           1800s     212s  119s
17   2           1800s     76s   26s
18   1           2340s     117s  48s
18   2           1800s     19s   11s
18   3           1800s     68s   41s
TABLE VI: Presence Count (seconds)

IV-D2 Sensitivity test

This part evaluates the sensitivity of the system to human presence. The experiments are done while people go about their daily activities in the lab without introducing intentional motions. Most of the time, people just sit in front of their computers and occasionally engage in normal conversation. Five tests are done in days and . The duration of each test and the presence counts reported by the CNN and the PIR sensor are summarized in Table VI. Figs. 11-13 show detection results of the tests done in day . All the tests were done with at least one person present in the lab from beginning to end, except test on day when the lab is empty for the first minutes. Human activities detected by the CNN but not by the PIR sensor are marked using red rectangular boxes in Figs. 11-13. Note that we only highlight parts where there is not a single positive detection output from the PIR sensor for the entire duration of the box. For example, at around s in Fig. 11, the CNN detects human presence for much longer than the PIR sensor does, but this part is not marked in the figure for clarity of presentation. To compare the sensitivity of the two systems more accurately, we summarize presence counts in Table VI. Each count corresponds to a positive detection for a one-second period during the entire test run. WiFi sensing consistently outperforms PIR in all runs. By cross-referencing with video recordings, we find that the presence detected in the highlighted ranges (i.e., detected by WiFi but not by PIR) in Figs. 11-13 is associated with subtle human movement, such as stretching while sitting, adjusting sitting posture, and conversing without excessive movement. These subtle movements are often missed by PIR but are easily picked up by WiFi sensing. We emphasize again that model III is trained with only random walking for label 1 data, i.e., no small scale motion is included.
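The comparison in Table VI can be summarized numerically. The sketch below takes the counts directly from the table and computes, for each run, the fraction of one-second periods flagged by each sensor. It assumes that rows without a day entry belong to the day listed above them. The names are illustrative.

```python
# Rows of Table VI: (day, test index, duration in seconds, CNN count, PIR count).
PRESENCE_RUNS = [
    (17, 1, 1800, 212, 119),
    (17, 2, 1800, 76, 26),
    (18, 1, 2340, 117, 48),
    (18, 2, 1800, 19, 11),
    (18, 3, 1800, 68, 41),
]


def presence_fractions(runs):
    """Per-run fraction of seconds with a positive detection for each sensor."""
    return [
        {
            "day": day,
            "test": test,
            "cnn_fraction": cnn / duration,
            "pir_fraction": pir / duration,
        }
        for day, test, duration, cnn, pir in runs
    ]


cnn_total = sum(run[3] for run in PRESENCE_RUNS)
pir_total = sum(run[4] for run in PRESENCE_RUNS)
cnn_wins_every_run = all(run[3] > run[4] for run in PRESENCE_RUNS)

for row in presence_fractions(PRESENCE_RUNS):
    print(row)
print(cnn_total, pir_total, cnn_wins_every_run)
```

Over the five runs the CNN reports 492 positive seconds against 245 for the PIR sensor, and its count is higher in every individual run. Since occupants were mostly sitting still, neither sensor is expected to flag every second.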

It is worth noting that there are still movements that are missed by both WiFi sensing and PIR. The most important example is when occupants are typing on keyboards but are otherwise completely still. Such movement appears to be too subtle to be detected even by WiFi sensing. A possible remedy is to deliberately add such keyboard typing data to the motion training set, yet doing so is likely to increase the false alarm rate given the subtlety of such movements.

IV-E Discussions

In this section, we discuss the impact of various design parameters on the WiFi sensing performance. Due to space limitations, high level observations are summarized here in lieu of detailed experimental results.

IV-E1 CSI sampling interval

The CSI sampling interval is set at ms in all the experiments reported above. Retraining model I under two more sampling intervals, ms and ms, we find that motion detection degrades slightly, but there is no significant impact on presence detection with properly designed post-processing, provided that training and testing are done in the same lab space. When training and testing are done in different lab spaces, a slower sampling rate results in noticeable performance loss.

IV-E2 Contribution of CSI magnitude and phase

Using the same data set, we have also studied the performance of WiFi sensing using only magnitude or only phase for presence detection. The result is quite informative: for empty rooms (label 0 data), all three inputs (magnitude only, phase only, and both magnitude and phase) give comparable results and all are quite accurate. For motion detection, using only CSI phase performs slightly worse than magnitude only and magnitude plus phase when detecting random walks. However, for data collected in days 14-16, i.e., data with small scale movement, the phase-only CNN performs significantly worse than the other two CNNs. For example, for the data in day 16, while the other two CNNs are able to detect small scale motion with over accuracy, phase alone achieves only accuracy.
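The three input variants compared above can be illustrated with a short sketch that splits complex CSI values into magnitude and phase channels. It shows only the decomposition. The pre-processing steps of the proposed system, including the magnitude normalization in (2), are not reproduced here, and the function name and `mode` argument are hypothetical.

```python
import cmath


def csi_channels(csi, mode="both"):
    """Split complex CSI samples into real-valued input channels.

    csi:  sequence of complex channel estimates (e.g., one per subcarrier)
    mode: "magnitude", "phase", or "both"
    Returns a list of channels, each a list of floats.
    """
    magnitude = [abs(h) for h in csi]
    phase = [cmath.phase(h) for h in csi]  # radians in (-pi, pi]
    if mode == "magnitude":
        return [magnitude]
    if mode == "phase":
        return [phase]
    if mode == "both":
        return [magnitude, phase]
    raise ValueError("mode must be 'magnitude', 'phase', or 'both'")


example = [1 + 0j, 0 + 2j, -3 + 0j]
print(csi_channels(example, "both"))
```

Feeding the two quantities as separate real-valued channels is one common way to present complex-valued data to a real-valued CNN. Dropping either channel gives the magnitude-only and phase-only variants.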

IV-E3 Normalization

The normalization of CSI magnitude in (2) is particularly helpful in controlling the false alarm rate. Experimental results show that using CSI without magnitude normalization achieves comparable motion detection accuracy (i.e., for data with label 1). However, it results in an elevated false alarm rate for some data sets, which may trigger positive presence detection in an empty room. Without normalization, the performance degradation is even more severe when different lab spaces are used for training and testing.

V Conclusion

In this paper, a passive WiFi sensing system is proposed for obtaining indoor occupancy information. The system exploits motion induced variation in both the magnitude and the phase of the CSI. A new convolutional neural network architecture is designed to harvest occupancy information in CSI estimates along temporal, frequency, and spatial dimensions. With judicious pre-processing to remove hardware/system impairments and post-processing to infer presence information from the motion detection output, the proposed learning system provides a viable and promising alternative for real-time presence or occupancy detection. Extensive experiments were conducted using commercial off-the-shelf WiFi devices. It was demonstrated that the system is much more sensitive to human presence than PIR sensors and maintains the desired robustness against time-varying wireless channels.

A key challenge for presence detection is the calibration of human motion to achieve a balance between sensitivity to human presence and false alarms. The collection and use of data with human motion has an outsized influence on the presence detection performance. Future work will explore presence detection using only training data corresponding to empty rooms for the desired robustness. Learning approaches such as the universal hypothesis test and one-class SVM may prove to be useful alternatives to deep learning.


  • [1] IEEE Std. 802.11n-2009: Enhancements for higher throughput, 2009. [Online]. Available: http://www.ieee802.org Cited by: §IV-A.
  • [2] Honeywell PIR sensor DT8035. [Online]. Available: https://www.security.honeywell.com/product-repository/dt8035 Cited by: §IV-D.
  • [3] USRP: Universal Software Radio Peripheral, accessed September 2013. [Online]. Available: http://www.ettus.com Cited by: §I-A.
  • [4] H. Abdelnasser, M. Youssef, and K. A. Harras (2015-Apr.) Wigest: a ubiquitous wifi-based gesture recognition system. In Proc. IEEE Conf. on Comput. Commun. (INFOCOM), Hong Kong, China, pp. 1472–1480. Cited by: §I.
  • [5] S. Arshad, C. Feng, Y. Liu, Y. Hu, R. Yu, S. Zhou, and H. Li (2017-Jun.) Wi-chase: a wifi based human activity recognition system for sensorless environments. In Proc. IEEE 18th Int. Symp. A World of Wireless, Mobile Multimedia Networks (WoWMoM), Macao, China, pp. 1–6. Cited by: §I.
  • [6] F. Chollet et al. (2015) Keras. Note: https://keras.io Cited by: §IV-B.
  • [7] Q. Gao, J. Wang, X. Ma, X. Feng, and H. Wang (2017) CSI-based device-free wireless localization and activity recognition using radio image features. IEEE Trans. Veh. Technol. 66 (11), pp. 10346–10356. Cited by: §I.
  • [8] L. Gong, W. Yang, Z. Zhou, D. Man, H. Cai, X. Zhou, and Z. Yang (2016) An adaptive wireless passive human detection via fine-grained physical layer information. Ad Hoc Networks 38, pp. 38–50. Cited by: §I, §I, §III-A2.
  • [9] C. Gonzalez and E. Woods (1991) Digital image processing. 2nd edition, Prentice Hall, Upper Saddle River, NJ, USA. Cited by: §III-A3.
  • [10] Y. Gu, F. Ren, and J. Li (2015) Paws: passive human activity recognition based on wifi ambient signals. IEEE Internet Things J. 3 (5), pp. 796–805. Cited by: §I.
  • [11] S. Ioffe and C. Szegedy (2015) Batch normalization: accelerating deep network training by reducing internal covariate shift. arXiv preprint arXiv:1502.03167. Cited by: §III-B.
  • [12] D. P. Kingma and J. Ba (2014) Adam: a method for stochastic optimization. arXiv preprint arXiv:1412.6980. Cited by: §III-B.
  • [13] A. Krizhevsky, I. Sutskever, and G. E. Hinton (2012) Imagenet classification with deep convolutional neural networks. In Advances Neural Inform. Process. Syst., pp. 1097–1105. Cited by: §III-B.
  • [14] Y. LeCun, L. Bottou, Y. Bengio, and P. Haffner (1998) Gradient-based learning applied to document recognition. Proc. IEEE 86 (11), pp. 2278–2324. Cited by: §III-B.
  • [15] S. Li, X. Li, K. Niu, H. Wang, Y. Zhang, and D. Zhang (2017-Aug.) Ar-alarm: an adaptive and robust intrusion detection system leveraging csi from commodity wi-fi. In Proc. Int. Conf. Smart Homes Health Telematics, Paris, France, pp. 211–223. Cited by: §I.
  • [16] X. Li, D. Zhang, Q. Lv, J. Xiong, S. Li, Y. Zhang, and H. Mei (2017) IndoTrack: device-free indoor human tracking with commodity wi-fi. Proc. ACM Interactive, Mobile, Wearable Ubiquitous Technologies 1 (3), pp. 72. Cited by: §I.
  • [17] X. Liu, J. Cao, S. Tang, and J. Wen (2014-Dec.) Wi-sleep: contactless sleep monitoring via wifi signals. In Proc. IEEE Real-Time Syst. Symp., Rome, Italy, pp. 346–355. Cited by: §I.
  • [18] Y. Ma, G. Zhou, S. Wang, H. Zhao, and W. Jung (2018) SignFi: sign language recognition using wifi. Proc. ACM Interactive, Mobile, Wearable and Ubiquitous Technologies 2 (1), pp. 23. Cited by: §I, §I.
  • [19] M. Moussa and M. Youssef (2009-Mar.) Smart devices for smart environments: device-free passive detection in real environments. In Proc. IEEE Int. Conf. Pervasive Computing and Commun., Galveston, TX, USA, pp. 1–6. Cited by: §I.
  • [20] S. Palipana, P. Agrawal, and D. Pesch (2016-Nov.) Channel state information based human presence detection using non-linear techniques. In Proc. 3rd ACM Int. Conf. Systs Energy-Efficient Built Environments, Palo Alto, CA, USA, pp. 177–186. Cited by: §I, §I, §III-A2.
  • [21] S. Palipana, D. Rojas, P. Agrawal, and D. Pesch (2018) FallDeFi: ubiquitous fall detection using commodity wi-fi devices. Proc. ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies 1 (4), pp. 155. Cited by: §I, §I.
  • [22] K. Qian, C. Wu, Z. Yang, Y. Liu, F. He, and T. Xing (2018) Enabling contactless detection of moving humans with dynamic speeds using csi. ACM Trans. Embedded Computing Syst. (TECS) 17 (2), pp. 52. Cited by: §I, §I.
  • [23] K. Qian, C. Wu, Z. Yang, Y. Liu, and K. Jamieson (2017-Jul.) Widar: decimeter-level passive tracking via velocity monitoring with commodity wi-fi. In Proc. 18th ACM Int. Symp. Mobile Ad Hoc Networking Computing, Chennai, India, pp. 6. Cited by: §I.
  • [24] T. S. Rappaport (2002) Wireless communication - principles and practice. 2nd edition, Prentice Hall, Upper Saddle River, NJ, USA. Cited by: §II-B.
  • [25] E. Soltanaghaei, A. Kalyanaraman, and K. Whitehouse (2017-Jun.) Peripheral wifi vision: exploiting multipath reflections for more sensitive human sensing. In Proc. Int. Workshop Physical Analytics, Niagara Falls, NY, USA, pp. 13–18. Cited by: §I, §I.
  • [26] T. Teixeira, G. Dublon, and A. Savvides (2010) A survey of human-sensing: methods for detecting presence, count, location, track, and identity. ACM Computing Surveys 5 (1), pp. 59–69. Cited by: §I.
  • [27] W. Wang, A. X. Liu, M. Shahzad, K. Ling, and S. Lu (2017) Device-free human activity recognition using commercial wifi devices. IEEE J. Select. Areas Commun. 35 (5), pp. 1118–1131. Cited by: §I.
  • [28] Y. Wang, J. Liu, Y. Chen, M. Gruteser, J. Yang, and H. Liu (2014-Sep.) E-eyes: device-free location-oriented activity identification using fine-grained wifi signatures. In Proc. 20th Annu. Int. Conf. Mobile computing networking, Hawaii, USA, pp. 617–628. Cited by: §I.
  • [29] C. Wu, Z. Yang, Z. Zhou, X. Liu, Y. Liu, and J. Cao (2015) Non-invasive detection of moving and stationary human with wifi. IEEE J. Select. Areas Commun. 33 (11), pp. 2329–2342. Cited by: §I, §I.
  • [30] Y. Xie, Z. Li, and M. Li (2015) Precise power delay profiling with commodity wifi. In Proc. 21st Annu. Int. Conf. Mobile Computing and Networking, MobiCom ’15, New York, NY, USA, pp. 53–64. External Links: ISBN 978-1-4503-3619-2, Link, Document Cited by: §IV-A.
  • [31] Q. Xu, Y. Han, B. Wang, M. Wu, and K. R. Liu (2019) Indoor events monitoring using channel state information time series. IEEE Internet Things J.. Cited by: §I, §I.
  • [32] Q. Xu, F. Zhang, B. Wang, and K. R. Liu (2018-Apr.) Time reversal indoor tracking with centimeter accuracy. In Proc. IEEE Int. Conf. Acoust., Speech Signal Process. (ICASSP), Calgary, AB, Canada, pp. 6433–6437. Cited by: §I.
  • [33] Y. Zeng, P. H. Pathak, C. Xu, and P. Mohapatra (2014-Sep.) Your ap knows how you move: fine-grained device motion recognition through wifi. In Proc. 1st ACM workshop on Hot topics in wireless, Hawaii, USA, pp. 49–54. Cited by: §I.
  • [34] Y. Zhao, N. Patwari, J. M. Phillips, and S. Venkatasubramanian (2013-Apr.) Radio tomographic imaging and tracking of stationary and moving people via kernel distance. In Proc. ACM/IEEE Int. Conf. Inform. Process. Sensor Networks (IPSN), Philadelphia, PA, USA, pp. 229–240. Cited by: §I, §I.
  • [35] X. Zheng, J. Wang, L. Shangguan, Z. Zhou, and Y. Liu (2017) Design and implementation of a csi-based ubiquitous smoking detection system. IEEE/ACM Trans. Networking 25 (6), pp. 3781–3793. Cited by: §I.
  • [36] H. Zhu, F. Xiao, L. Sun, R. Wang, and P. Yang (2017) R-ttwd: robust device-free through-the-wall detection of moving human with wifi. IEEE J. Select. Areas in Commun. 35 (5), pp. 1090–1103. Cited by: §I, §I, §III-A2.
  • [37] H. Zou, Y. Zhou, J. Yang, W. Gu, L. Xie, and C. Spanos (2017-Jun.) Freedetector: device-free occupancy detection with commodity wifi. In Proc. IEEE Int. Conf. Sensing, Commun. and Networking (SECON Workshops), San Diego, CA, USA, pp. 1–5. Cited by: §I, §I.