Presence detection plays a key role in improving operation efficiency and reducing carbon footprint for office and residential buildings. The use of occupancy information in controlling HVAC and lighting systems has become increasingly prevalent. Existing methods for human presence detection include passive infrared (PIR), microwave, CO2, and wearable sensors, and cameras, among others. Microwave sensors are overly sensitive and prone to frequent false alarms, e.g., detecting movements outside of intended coverage areas. CO2 sensors have a slow response time and a high cost barrier. Cameras raise privacy concerns and are sensitive to lighting conditions. Wearable sensors/devices can be intrusive or cumbersome for users. PIR sensors are the most widely deployed method for presence detection. A PIR sensor picks up infrared emission using its on-board pyroelectric sensor and detects movement of humans (or objects) through heat variation within the field of view. Its drawbacks are low sensitivity and limited coverage, and it is mostly used for isolated lighting control.
This paper explores the use of RF signals for presence detection. In particular, we use WiFi signals in the current work given their ubiquity in almost all indoor environments. Current and future WiFi systems (e.g., the upcoming WiFi-6) employ multiple-input and multiple-output orthogonal frequency-division multiplexing (MIMO-OFDM) at the physical layer. As such, the channel state information (CSI) contains rich information about the ambient environment in spatial, temporal, and frequency domains.
Exploiting ambient RF signals for detecting, localizing, tracking, and identifying human motion/activities has been extensively studied in the literature [4, 10, 19, 33, 21, 31, 35, 15, 17, 28, 5, 27, 7, 32, 16, 23, 25, 34, 29, 18, 37, 22, 8, 20, 36]. Early work on indoor RF sensing mainly relies on the received signal strength indicator (RSSI) [4, 10, 19]. RSSI measures instantaneous attenuation of RF signals at the receiver and its temporal variation can be associated with motion/activities of humans/objects. Recently, more fine-grained features such as CSI have been used for RF sensing. For example, different human activities (e.g., running, walking and eating) or locations can be recognized by analyzing their unique effect on the CSI [28, 5, 27, 7, 32, 16, 23]. Another interesting application is gesture recognition; the SignFi system uses CSI extracted from WiFi signals to classify sign gestures with high accuracy.
For the detection of particular activities, one can use a model based approach - a certain activity will impose an identifiable signature on RF propagation, thus hand-crafted features extracted from received signals can be exploited. Alternatively, a data driven approach can be used where collected training data are fed to machine learning algorithms (e.g., a neural network) which learn to discriminate different states (labels) through training. For presence detection, however, there are no defined activities when humans are present, thus a model based approach is typically not adequate. While a data driven approach appears to be a natural choice, it is unclear a priori what would be the best way to collect training data for presence detection. Perhaps the only reasonable assumption that one can make for presence detection is that humans are not expected to be completely still for an extended period of time. We comment here that there exist studies that detect human presence using RF signals through either carefully calibrating the human-free environment [25, 34] or breathing detection. Their performance, however, is highly sensitive to environment changes (e.g., room change, furniture move) or human locations (as in the case of breathing detection).
There is prior work on presence detection using CSI. The FreeDetector system achieves occupancy detection by computing the temporal similarity of CSIs across frequencies; however, it can only detect walking across the line of sight between the transmitter and the receiver. The PADS and R-TTWD systems utilize a support vector machine (SVM) to detect motion; the inputs to the SVM come from CSI time series after dimensionality reduction through principal component analysis. Many existing approaches discard phase information of CSI as it is typically noisy due to either estimation error or inherent impediments such as carrier frequency offset (CFO) and sampling time offset (STO). For example, in [8, 20], temporal cross-correlation of CSI amplitude is used since motion tends to decrease the correlation of CSI in time.
This paper proposes a WiFi CSI based presence detection system consisting of pre-processing for data representation, a CNN for motion detection, and post-processing for the eventual presence detection. The rationale for choosing a CNN is its ability to exploit CSI variation in multiple dimensions thanks to the MIMO-OFDM waveforms at the physical layer. In contrast, while a recurrent neural network (RNN) can also be used for learning the variation pattern of CSI over time, it introduces more computational overhead compared with a CNN, and the long-term memory offered by an RNN provides no meaningful gain in this application given that channel correlation in time and in frequency both diminish as distances increase. With a multi-layer perceptron (MLP), on the other hand, local features, i.e., changes in correlation in both temporal and frequency domains, are not explicitly explored, hence an MLP typically requires much deeper networks to achieve similar performance.
I-A Summary of Contributions
A new CNN architecture is proposed that separately processes the magnitude and phase information with two independent CNNs before combining the respective outputs as input to fully connected (FC) layers. Such a CNN architecture provides an important alternative to existing approaches for handling complex input (e.g., simply stacking up real/imaginary components or magnitude/phase components). This allows for different pre-processing of CSI magnitude and phase, which is critical in exploiting motion induced CSI variation in the presence of various channel and hardware impairments.
Pre-processing of CSI estimates is carefully designed so that spatial, temporal, and frequency domain information is exploited in a holistic manner. The pre-processing takes into consideration how human movement affects CSI while insulating against unintended distortion in RF circuitry (e.g., CFO/STO). The Fourier transform is used to localize important motion-induced features in the constructed image, making it more amenable to presence detection as a CNN builds its ability to discriminate data through local features (i.e., small kernel size).
An important contribution of the present work is extensive testing implemented using commercial off-the-shelf (COTS) WiFi devices. Assuming that humans are not completely still for an extended period of time, our presence detection compares very favorably against that of commercial PIR sensors. We note that the comparison is done without customized WiFi hardware. For example, we have found that when using USRP systems, over which we have full control of the sampling frequency and which give a much cleaner CSI estimate, performance can improve substantially over a COTS WiFi receiver.
I-B Organization and Notation
Section II describes the MIMO-OFDM waveforms and how human movement impacts wireless channels. The design of the sensing system, including pre-processing, the proposed CNN architecture, and post-processing, is described in Section III. The experiment setup and the corresponding test results are provided in Section IV, followed by the conclusion in Section V.
Scalars are denoted by either lower or upper case letters, e.g., and . Column vectors and matrices are denoted by lower and upper case bold letters, e.g., and . The -th entry of a vector , the -th entry of , and the th entry of a -D array are denoted by , , and , respectively. and are used to denote the -th column and the -th row of . Similarly, represents the -D matrix at depth of the -D array . denotes the discrete Fourier transform (DFT) hence the inverse DFT. and denote magnitude and phase of a complex number .
II MIMO-OFDM System Model
Consider a MIMO-OFDM system with transmit antennas, receive antennas, and subcarriers. Each physical layer frame consists of OFDM symbol blocks. Denote by the -th frequency domain OFDM symbol vector in the -th frame sent by the -th transmit antenna. The discrete-time complex baseband signal corresponding to is given by . At the receiver, after cyclic prefix removal and applying DFT, the complex baseband sample at the -th receive antenna in the frequency domain can be expressed as, for ,
where is the channel noise and the channel coefficient from the -th transmit antenna to the -th receive antenna on the -th subcarrier, assumed to be a constant within one frame (i.e., for all OFDM blocks within the -th frame). Expressed in vector form, we have
where is the channel matrix corresponding to the -th subcarrier for the -th OFDM frame and .
II-B Effect of Human Motion on MIMO-OFDM Channel
Human motion leads to CSI variation in both the frequency (across subcarriers) and temporal (across frames) domains. Human presence and movement introduce new paths whose delays are affected by human locations, leading to changes in the path delay profile. This results in changes in the channel frequency response. Human movement also induces temporal CSI variation (i.e., signals for paths affected by humans may add in-phase or out-of-phase at the receiver depending on the locations of humans). An alternative interpretation is the increase of Doppler spread due to human movement in an otherwise static environment, leading to time-selective channel fading. An example using real WiFi measurement of the variation of over frame index with and without human movement for fixed and is shown in Fig. 1 for four evenly spaced subcarriers. Clearly, with movement, channel variation both in frequency (across subcarriers) and in time (along the horizontal axis) increases.
The effect of human movement on the CSI in the spatial dimension is more subtle. The human motion induced CSI variation in the temporal and frequency domains applies to every transmit-receive antenna pair. The fact that multiple transceiver pairs (a.k.a. spatial diversity) exist in the MIMO-OFDM system should be exploited for enhanced sensing performance. A CNN is a natural choice for exploiting such spatial diversity by mapping the temporal-frequency CSI corresponding to each transceiver pair to a layer (‘channel’) in a CNN architecture, much like the way color images are processed in a CNN where RGB pixels serve as separate channels.
Multiple antennas at WiFi transceivers are also exploited in this paper to make the phase information of the CSI estimate much more useful for presence detection. As WiFi devices use a single oscillator for the RF circuitry corresponding to different antennas, the CFO, if present, is common to all inputs at different receive antennas. Similarly, sampling is driven by a single clock, hence the STO is also identical for all inputs at different receive antennas. Thus instead of using the raw CSI phase measurement, one can use phase differences between receive antennas to remove phase variation due to CFO and STO. While such processing has no effect on digital communication performance (e.g., it does not correct residual CFO/STO for each receive chain for the purpose of symbol detection), it cleans up the phase information when phase variation due to human movement is of interest. An example of phase differences is given in Fig. 2 with where the CSI phase from the first antenna serves as a reference. Clearly, phase differences stay relatively stable in a human-free environment. In contrast, human motion introduces significant fluctuation to the relative phases across three receive antennas.
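The cancellation argument above can be checked numerically. The following is a minimal sketch on synthetic data, not the measurement pipeline of this paper: the array sizes, the random static channel, and the per-frame CFO-like and STO-like phase terms are all assumptions made for illustration. The only property taken from the text is that the phase error is shared by all receive chains.

```python
import numpy as np

rng = np.random.default_rng(0)
n_frames, n_sub, n_rx = 50, 8, 3  # hypothetical sizes, chosen for illustration

# Static (human-free) channel: one fixed complex gain per (subcarrier, rx antenna).
h_static = rng.normal(size=(n_sub, n_rx)) + 1j * rng.normal(size=(n_sub, n_rx))

# Per-frame phase error shared by all receive chains: a CFO-like term that is
# constant across subcarriers plus an STO-like term that is linear in the
# subcarrier index. Both are synthetic stand-ins for the real impairments.
cfo_phase = rng.uniform(-np.pi, np.pi, size=(n_frames, 1, 1))
sto_slope = rng.uniform(-0.2, 0.2, size=(n_frames, 1, 1))
k = np.arange(n_sub).reshape(1, n_sub, 1)
common = np.exp(1j * (cfo_phase + sto_slope * k))

csi = h_static[None, :, :] * common  # shape (frames, subcarriers, rx)

# Raw phase fluctuates from frame to frame even though the channel is static.
raw_phase = np.angle(csi)

# Phase difference relative to the first receive antenna (conjugate product).
phase_diff = np.angle(csi[:, :, 1:] * np.conj(csi[:, :, :1]))

raw_spread = raw_phase.std(axis=0).max()
diff_spread = phase_diff.std(axis=0).max()
```

In this synthetic setting `raw_spread` is large while `diff_spread` is at the level of floating-point rounding, which is the behavior the paragraph attributes to a human-free environment.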
Figs. 1 and 2 indicate that both magnitude and phase of estimated CSI contain rich information about human motion. While in theory, deep learning trained using labeled data appears to be a straightforward exercise, the challenge is that for presence detection, there is no clearly defined human motion that one tries to detect. As such, collecting labeled training data with human presence needs to be carefully addressed along with the design of the learning system for presence detection.
III System Design
A high level description of the proposed system is depicted in Fig. 3. Consecutive CSIs are first arranged into CSI magnitude and phase images. They are processed separately and fed into the CNN learning block, which comprises two parallel CNNs - one for magnitude images and the other for phase images - followed by FC layers (cf. Fig. 7 and Section III-B). The post-processing block accumulates instantaneous detection results provided by the CNN and outputs the final presence detection depending on the required time resolution.
III-A Input Pre-processing
Recall that is an array consisting of MIMO channel matrices across all subcarriers for the -th frame. Instantaneous motion detection is based on collected over consecutive frames, denoted as . Here we assume without loss of generality (WLOG) the first CSI array has frame index . For each , we select evenly spaced subcarriers out of subcarriers, resulting in with size . Down-selection of subcarriers significantly reduces the data dimension yet does not have any negative effect on sensing performance. This is because the subcarrier spacing in WiFi signals (kHz) is much smaller than the coherence bandwidth in a typical indoor (i.e., low mobility) environment, thus the behaviors of subcarriers that are immediate neighbors closely track each other with or without human motion. The resulting are subsequently stacked up along the temporal domain to form a -D array of size . The magnitude and phase information are then extracted from prior to independent pre-processing.
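The down-selection and stacking steps can be sketched as follows. This is an illustrative sketch only: the axis order (frame, subcarrier, receive antenna, transmit antenna), the helper names `select_subcarriers` and `stack_frames`, and the sizes used in the demo are assumptions, not values taken from this paper.

```python
import numpy as np

def select_subcarriers(csi_frame, n_keep):
    """Keep n_keep evenly spaced subcarriers from one CSI array.

    csi_frame is assumed to have shape (n_sub, n_rx, n_tx).
    """
    n_sub = csi_frame.shape[0]
    idx = np.linspace(0, n_sub - 1, n_keep).round().astype(int)
    return csi_frame[idx]

def stack_frames(frames, n_keep):
    """Stack down-selected CSI arrays along a new leading time axis."""
    return np.stack([select_subcarriers(f, n_keep) for f in frames], axis=0)

# Demo with hypothetical sizes: 20 frames, 56 subcarriers, 3x3 antennas.
demo_frames = [np.full((56, 3, 3), t, dtype=complex) for t in range(20)]
demo_stack = stack_frames(demo_frames, n_keep=14)
```

The result is a 4-D array whose first axis indexes frames, so magnitude and phase can then be taken element-wise with `np.abs` and `np.angle`.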
III-A1 CSI magnitude
We reshape the -D array into a -D array by combining the last two spatial dimensions, i.e., channel matrix for each subcarrier is flattened into a -D array. The obtained array, denoted by is of size .
Pre-processing involves normalization and transformation. Normalization is done to remove dependence of the absolute CSI magnitude on various environment parameters that are irrelevant to presence detection. For example, the dynamic range of is highly dependent on the distance between the transmitter and receiver and the existence of line of sight transmission. While various normalization methods can be used, we find through extensive experiments the following offers the most robust performance: for ,
where denotes element-wise division. Note that indexes OFDM frame, thus the normalization is done with respect to the first OFDM frame within the frames contained in .
Subsequently, a 2-D DFT is applied to along the temporal (frame) and frequency (subcarrier) dimensions, resulting in the output array for each transceiver antenna pair:
Here the DFT output is properly shifted so that zero frequency component is at the center of the array. The use of 2-D DFT serves two purposes. First, human motion induced temporal variation of CSI is continuous in nature. As such, it results in dispersion in the lower frequency region along the temporal dimension. This is in contrast to hardware impairment and channel estimation error when sudden change of CSI may be observed irrespective of human presence. Therefore, high frequency change can be removed by simple cropping of the DFT output along the temporal dimension around zero frequency:
where , and is the cropping window size. Here we assume WLOG that both and are even numbers. Cropping also significantly reduces the image size, leading to faster learning and reduced storage requirement. This makes the learning suitable to be implemented on edge devices instead of having to resort to cloud services. Note that further reduction of image size can be achieved by utilizing the conjugate symmetry of the -D FFT due to the fact that input to the FFT is real-valued (i.e., magnitude of CSI arrays).
Another reason for using the 2-D DFT is its ability to localize motion related CSI variation. While temporal variation in is exhibited for the entire frames, the 2-D DFT concentrates such variation into the low frequency region. This is particularly suitable for a CNN given its ability to build discriminating ability on local features. Figs. 3(a) and 3(b) provide a sample of collected in the same room without and with human motions. One can see that, in the temporal dimension, the -D DFT using data collected in an empty room is dominated by the DC component. With human movement, there is clearly dispersion in the low frequency region in the temporal (horizontal) dimension.
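The magnitude pre-processing chain described in this subsection (flatten the antenna dimensions, normalize by the first frame, apply a 2-D DFT over time and subcarrier, center the zero frequency, crop along time) can be sketched as below. This is a sketch under stated assumptions rather than the authors' implementation: the function name `magnitude_image`, the axis order, the exact placement of the cropping window around the DC bin, and the demo sizes are all hypothetical.

```python
import numpy as np

def magnitude_image(csi, crop):
    """Sketch of the magnitude pre-processing.

    csi  : complex array, assumed shape (T, S, n_rx, n_tx), with T and crop even.
    crop : number of temporal-frequency bins kept around zero frequency.
    Returns a complex array of shape (crop, S, n_rx * n_tx).
    """
    T, S = csi.shape[:2]
    mag = np.abs(csi).reshape(T, S, -1)        # flatten the two antenna axes
    mag = mag / mag[0]                         # element-wise division by frame 0
    spec = np.fft.fft2(mag, axes=(0, 1))       # 2-D DFT over time and subcarrier
    spec = np.fft.fftshift(spec, axes=(0, 1))  # zero frequency moved to center
    lo = T // 2 - crop // 2                    # assumed window placement
    return spec[lo:lo + crop]

# Demo: a perfectly static channel should leave energy only in the DC bin.
rng = np.random.default_rng(0)
T, S, crop = 20, 14, 8
h = rng.normal(size=(S, 3, 3)) + 1j * rng.normal(size=(S, 3, 3))
static_csi = np.tile(h, (T, 1, 1, 1))
img = magnitude_image(static_csi, crop)
```

For the static demo input the normalized magnitude is identically one, so the only non-zero entry per antenna pair sits at the centered DC position, consistent with the empty-room observation above.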
III-A2 CSI phase
Even with a completely static environment, the estimated CSI phase will undergo variation (e.g., from residual CFO and STO) which may lead to abrupt changes within . This can be partially resolved by phase unwrapping, which removes such abrupt changes. However, phase unwrapping does not remove phase variation introduced by any residual CFO and STO, but merely corrects discontinuous phase jumps. Thus CSI phases are often discarded for WiFi sensing [20, 8, 36] because of this “noisy” nature.
However, a simple pre-processing that computes phase difference with respect to a reference receive antenna can largely mitigate this problem due to the fact that CFO and STO are common to all receive antennas (see Section II-B). Denote by the phase difference between for different
where . The last two spatial dimensions of are then flattened into one dimension and phase is unwrapped along the time axis to remove discontinuity at boundary points and . The obtained result is denoted by . Different from CSI magnitude, only -D DFT along the temporal dimension is performed on to get since phase unwrapping weakens relation of CSI phase across different subcarriers. An example of is given in Figs. 5 and 6 where significantly increased dispersion of the DFT output along the temporal dimension can be observed with human movement.
The following steps are similar to how we obtain , where we shift the zero frequency component to the center and crop out the high frequency components in the temporal domain, leading to the following CSI phase information
where is chosen to be the same as that in (3).
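The phase pre-processing steps can be sketched in the same style: phase difference with respect to the first receive antenna, flattening of the antenna dimensions, unwrapping along time, a 1-D DFT along time, centering, and cropping. As before, this is a sketch under assumptions: the function name `phase_image`, the axis order, the window placement, and the synthetic demo data are hypothetical and not taken from the paper.

```python
import numpy as np

def phase_image(csi, crop):
    """Sketch of the phase pre-processing.

    csi  : complex array, assumed shape (T, S, n_rx, n_tx), with T and crop even.
    crop : number of temporal-frequency bins kept around zero frequency.
    Returns a complex array of shape (crop, S, (n_rx - 1) * n_tx).
    """
    T, S = csi.shape[:2]
    # Phase difference relative to receive antenna 0 via a conjugate product.
    diff = np.angle(csi[:, :, 1:, :] * np.conj(csi[:, :, :1, :]))
    diff = diff.reshape(T, S, -1)            # flatten the two antenna axes
    diff = np.unwrap(diff, axis=0)           # unwrap along the time axis
    spec = np.fft.fft(diff, axis=0)          # 1-D DFT along time only
    spec = np.fft.fftshift(spec, axes=0)     # zero frequency moved to center
    lo = T // 2 - crop // 2                  # assumed window placement
    return spec[lo:lo + crop]

# Demo: static channel with a random phase error common to all receive chains.
rng = np.random.default_rng(0)
T, S, crop = 20, 14, 8
h = rng.normal(size=(S, 3, 3)) + 1j * rng.normal(size=(S, 3, 3))
phi = rng.uniform(-np.pi, np.pi, size=(T, 1, 1, 1))
csi_demo = h[None] * np.exp(1j * phi)
pimg = phase_image(csi_demo, crop)
```

Because the common phase error cancels in the difference, the demo output has energy only in the centered DC row, even though the raw phase changes randomly from frame to frame.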
III-A3 Image Normalization
The DFT typically results in an increased dynamic range of and . Elements with low intensity are easily overwhelmed by those with large values. The logarithmic operator is applied to each element in both images to reduce such disparity. The final inputs to the two parallel CNNs are
III-B Architecture of CNN
The architecture of the proposed CNN is shown in Fig. 7. Magnitude and phase images in (6) are fed into two parallel CNNs which share the same structure. The outputs of the two CNNs are then concatenated and fed to FC layers.
The building blocks of the proposed CNN are similar to those in AlexNet. Each of the two parallel CNNs consists of two convolution (Conv) layers without padding. Each Conv layer is followed by an average pooling layer to reduce the output image dimension. The multi-dimensional output of the last Conv layer is flattened into vectors and subsequently fed into an FC layer. Batch Normalization (BN) is added after each layer that has trainable parameters, which can speed up training and make the model more robust against variations in outputs from previous layers. Two activation functions are used - rectified linear unit (ReLU) and softmax. ReLU is used for the hidden layers whereas softmax is used for the output layer. We note that with presence detection, the number of classes is two, i.e., it is a binary classification problem. Therefore, a sigmoid function can be used instead of softmax for the output layer. However, our experiment indicates slightly more robust classification performance using softmax - this can perhaps be attributed to the difference in weight and bias terms between the two: softmax employs two independent sets of weight vectors and biases for the two neurons whereas the sigmoid function has a single input to the neuron at the output layer. While mathematically one can show equivalence between the two for binary classification by finding the corresponding parameters, learning such parameters through training may yield some performance difference.
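The mathematical equivalence mentioned above can be verified numerically: a two-neuron softmax output with weights and biases for each neuron gives the same class-1 probability as a single sigmoid neuron whose weight vector and bias are the differences of the two. The sketch below uses random features and parameters that are purely illustrative; it says nothing about which parameterization trains better.

```python
import numpy as np

def softmax(z):
    """Numerically stable softmax over the last axis."""
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

rng = np.random.default_rng(1)
features = rng.normal(size=(5, 4))   # hypothetical FC-layer inputs
W = rng.normal(size=(4, 2))          # one weight column per output neuron
b = rng.normal(size=2)               # one bias per output neuron

# Two-neuron softmax output: probability assigned to class 1.
p_softmax = softmax(features @ W + b)[:, 1]

# Equivalent single sigmoid neuron: difference of the two parameter sets.
w_eq = W[:, 1] - W[:, 0]
b_eq = b[1] - b[0]
p_sigmoid = sigmoid(features @ w_eq + b_eq)
```

The two probability vectors agree to floating-point precision, so any performance gap between the two output layers comes from how the parameters are learned, not from what the layers can represent.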
III-C Post-processing
The design of the post-processing block is closely tied to how data collection is conducted. As alluded to in the introduction, presence detection differs from detection of certain activities in that one is not looking for a certain activity pattern but rather whether a room is occupied or not, assuming that occupants are not completely still for extended periods of time. As such, there are two different ways of collecting training data for the occupied state: one is to collect CSI for the entire duration when occupants are present; an alternative is to collect CSI only when occupants are moving. While in theory the former seems to be a natural choice - what we try to detect is the presence or absence of humans in a room - doing so leads to a significantly higher false alarm rate regardless of how much training data is collected. The reason is quite simple: collecting training CSI data when humans are present will include many instances when humans are completely still. Such CSI samples, albeit scattered throughout the measurement data (i.e., not for an extended period of time), are indistinguishable from those of an empty room. In essence, the training data corresponding to human presence are polluted with a large number of data samples that are similar to the training data without human presence.
We elect to use training data corresponding to the CSI instances when there are detectable human movements in the room. While this leads to ‘missed detection’ corresponding to instances when the occupants are still, simple post-processing can be done after the CNN block with tunable parameters such as the resolution with which the presence detection is desired. In short, the training is done so that the CNN attempts to reliably detect human motion of any kind; completely still human presence is thus likely to be classified as the negative state. Post-processing then applies an averaging operation within a time window, whose duration corresponds to the desired time resolution, for presence detection. This is sufficient in practice since with a truly empty room, the CNN output should contain negative outputs, whereas with humans present, the output should have a significant portion of positive outputs.
IV Experiment Setup
This section describes the experiment setup where COTS WiFi cards are used to collect WiFi CSI in an indoor environment. Data collection is explained in detail and the presence detection result is compared to that using PIR sensors.
IV-A Experiment Setup
Our WiFi system consists of a laptop (Thinkpad T) as WiFi access point (AP) and a desktop (Dell OptiPlex ) as WiFi client. Atheros 802.11n WiFi chipset, AR9580, and Ubuntu 14.04 LTS with built-in Atheros-CSI-Tool  are installed on both computers. The AP sends packets at the rate of pkts/s, while the client is recording CSIs using Atheros-CSI-Tool, i.e., the CSI sampling interval is roughly ms. With receive antennas, transmit antennas, and subcarriers in a MHz channel operating at channel in the GHz band , each CSI instance is a complex valued array. Down-selecting to evenly spaced subcarriers, the resulting CSI array is of dimension .
The indoor environment in which both training data collection and testing are done is sketched in Fig. 8. Two different labs are used to understand how sensitive the developed system is to environment parameters. In both labs, there are multiple monitors/laptops on desks and multiple chairs on the floor which are not drawn in this figure since their positions may change on different days. Notice that since the transmit antennas are placed behind a laptop and the receive antenna array is surrounded by a lot of other computers as shown in Fig. 9, there is no strong line of sight component between the transmitter and the receiver.
Fig. 9: (a) Transmitter, (b) Receiver, (c) PIR sensor.
IV-B Data Collection
For the image input to the CNN, we choose consecutive CSI instances, which lasts for around s. This is chosen since one second is sufficiently long for any detectable human motion to induce temporal CSI variation. A CSI image is only used (i.e., considered a valid sample) if it satisfies the following two conditions: 1) Every entry of is non-zero. This is imposed to remove erroneous CSI estimate - occasionally zero entries will show up in recorded CSI series, potentially due to hardware/firmware problems. 2) The time difference between the last and the first frame lies within s. WiFi scheduling may lead to different frame lengths hence excessively long interval between two CSI estimates. In the experiment, the cropping window size is chosen to be , hence and in (6) are of size and respectively.
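The two validity conditions above can be expressed as a small check. This is a sketch only: the function name `is_valid_sample`, the argument layout, and the threshold `max_span_s` are hypothetical, and the actual time limit used in the experiment is not reproduced here.

```python
import numpy as np

def is_valid_sample(csi_window, timestamps, max_span_s):
    """Return True if a CSI window passes both validity checks.

    csi_window : complex array holding all CSI entries of the window.
    timestamps : 1-D array of frame arrival times in seconds, in order.
    max_span_s : assumed upper limit on last-minus-first frame time.
    """
    # Condition 1: no zero entries, which would indicate a corrupted estimate.
    all_nonzero = bool(np.all(csi_window != 0))
    # Condition 2: the window must not span an excessively long interval.
    span_ok = bool((timestamps[-1] - timestamps[0]) <= max_span_s)
    return all_nonzero and span_ok

# Demo with hypothetical values.
good = np.ones((4, 2, 3, 3), dtype=complex)
bad = good.copy()
bad[1, 0, 0, 0] = 0
t_ok = np.array([0.0, 0.3, 0.6, 0.9])
t_long = np.array([0.0, 0.3, 0.6, 2.5])
```

For instance, `is_valid_sample(good, t_ok, 1.2)` accepts the window, while a window containing a zero entry or spanning too long an interval is rejected.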
Data collected in the human-free state is labeled as . The training data with label are collected when at least one person is walking randomly in the room. This way, training samples collected when occupants are completely still will not be used. Both human-free and motion data are collected on multiple days since the wireless channels are inherently nonstationary. This prevents CNN from being tuned to features that are irrelevant to presence detection, e.g., different CFO and STO on different days due to frequency drift. Data collection on any given day is also divided into disjoint runs which alternate between human-free and human motion. Finally, training and test data come from completely disjoint days.
|Days||Location||Dates (in )|
|Lab I||Sept.15 to Sept.22|
|Lab II||Oct.10 to Oct.30|
|Lab II||Nov.26 to Dec.5|
IV-C Motion Detection
We first evaluate motion detection using the proposed CNN without post-processing, i.e., the CNN is trained to classify input CSI images according to their labels. The CSIs were collected in days over a period of four months, as summarized in Table I. Data collected during the first three days were from Lab I, while the data for the remaining days were collected from Lab II. While training data consist of runs labeled with either (human free) or (human motion where someone is randomly walking at arbitrary speed and direction), evaluation is done in two ways: the first uses measurements from runs with a single state, while the second uses test runs including both human-free and human-motion measurements.
IV-C1 Single State Test Runs
|Model name||Label 0||Label 1|
The CNN with parameters is trained using data from days and the resulting model is denoted by model I. The number of training data in each class is summarized in Table II. Model I is then tested on single label data from the remaining days and the results are summarized in Table III. The performance on data collected in Lab II is quite consistent (all close to ) - notice that days were about one month earlier than the time that training data were collected, thus lab settings, e.g., the placement of the transceiver and number of surrounding objects, were quite different. The results show that the proposed CNN is robust to the environment changes over time but in the same room. However, performance is not nearly as good for data collected in Lab I (days ). For example, the false alarm rate in day is which is noticeably higher than other days.
A simple remedy is to include data collected from Lab I in the training set. This leads to model II in Table III where data from day are combined with randomly chosen samples from the previous training set (see Table II). The performance of model II on day 1 exhibits noticeable improvement over model I in false alarm rate.
IV-C2 Mixed State Test Runs
In this part of the experiment, each test run lasts minutes and is divided into five one-minute intervals. Measurement is done carefully such that each interval has the same state. Detection results of mixture runs on day , and are given in Fig. 10. We set the CSI step size to be , thus each one-minute interval contains around images. Instead of using predicted labels, probability of assigning each image to class is used to give a more refined detection performance. From Fig. 10, Model I can successfully track state change in a single run.
|Days||Label 0||Label 1|
|size||Model I||Model II||size||Model I||Model II|
IV-D Presence Detection
Before presenting our presence detection results conducted in Lab II, let us first examine how the CNN trained with walking data performs when more subtle human motion other than walking is used for testing. These data are collected on days from Oct.20, 2019 to Oct.24, 2019 in Lab II with various small scale motion (turning in chairs, arm waving, etc.). The results are summarized in Table IV. Clearly, with training data for label 1 coming exclusively from random walking, the CNN can still reliably detect other motion types.
|Days||Label 0||Label 1|
|size||model I||size||model I|
The actual model used for presence detection (model III in Table II) is obtained by further augmenting training data with label data (i.e., human free) collected on day 16. This is done since the output motion probabilities of human-free data collected on day are closer to than other days. Thus adding these data for CNN training provides more sample diversity. Model III is then deployed at the WiFi receiver, along with post-processing, for real-time presence detection.
As a comparison study, presence detection is conducted concurrently using a PIR sensor. We chose Honeywell DT , a leading edge PIR sensor with a coverage range of (our lab dimension is ). A camera is used in the lab to provide ground truth. The PIR sensor is mounted on the shelf at one side of the room at a height of (see Fig. 9(c)). Note that DT also has a microwave sensor which was disabled for this experiment. Throughout the experiment, human activities are restricted to the left side of the room (left of the red dashed line in Fig. 8(b)) to avoid the blind spot of the PIR sensor, as its coverage is conical in shape.
The post-processing is an averaging process on the motion detection outputs of the CNN. Each new CSI instance is used to construct CSI images with the previous CSI estimates, i.e., a sliding window with step size one is applied to CSI series. This results in a CNN output at a rate of about one per ms. The reporting rate for WiFi is times per second, i.e., once every ms. Each output is calculated using the CNN outputs from the previous one second, divided into intervals each of duration ms. A positive detection is declared for the second period if at least three out of the five subintervals within the second have positive motion detection, defined as at least CSI images have output label . Finally, given that the PIR sensor outputs its detection result between to times each second, we choose the detection resolution to be second for both WiFi and PIR: a presence is detected for each second if there is at least one positive detection within the one second period for PIR.
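The voting rule described above can be sketched as a short function. The number of subintervals (five) and the three-out-of-five rule follow the text; the per-subinterval threshold `min_pos_images`, the function name `second_is_occupied`, and the demo label sequences are hypothetical placeholders, since the exact image count is not reproduced here.

```python
import numpy as np

def second_is_occupied(labels, n_sub=5, min_pos_sub=3, min_pos_images=2):
    """Decide presence for a one-second period from per-image CNN labels.

    labels         : sequence of 0/1 CNN outputs covering the previous second.
    n_sub          : number of subintervals the second is divided into.
    min_pos_sub    : subintervals that must be positive to declare presence.
    min_pos_images : assumed number of label-1 images that makes a
                     subinterval positive (hypothetical value).
    """
    chunks = np.array_split(np.asarray(labels), n_sub)
    sub_hits = [int(c.sum()) >= min_pos_images for c in chunks]
    return sum(sub_hits) >= min_pos_sub

# Demo label sequences (20 CNN outputs per second, 4 per subinterval).
empty_room = [0] * 20
moving = [1, 1, 0, 0,  0, 0, 0, 0,  1, 1, 0, 0,  0, 1, 1, 0,  0, 0, 0, 0]
scattered = [1, 0, 0, 0] * 5
```

Here `moving` has three positive subintervals and is declared occupied, while `scattered` has isolated single positives in every subinterval and is rejected, which illustrates how the rule suppresses sporadic false positives from the CNN.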
IV-D1 False alarm test
The test is done over a day period (days from Dec.30, 2019 to Jan.1, 2020), when Lab II is empty. Results shown in Table V are the numbers of one-second intervals in which presence is detected by the CNN and the PIR sensor. In order not to interrupt normal lab activities, a single test on certain days cannot last for very long. For example, on day , the entire test is broken into three periods, and the shortest one, lasting for minute happens during lunch break. Over the entire test, which lasts for about hours, the proposed system reports a false positive only four times, yielding a false alarm rate . The PIR sensor does have a zero false alarm rate, but the results are comparable given that an isolated one-second positive can be easily ruled out for occupancy detection.
IV-D2 Sensitivity test
This part evaluates the sensitivity of the system to human presence. The experiments are done while people go about their daily activities in the lab without introducing intentional motions. Most of the time, people just sit in front of a computer and occasionally engage in normal conversation. Five tests are done on days and . The duration of each test and the presence counts reported by the CNN and PIR are summarized in Table VI. Figs. 11-13 show detection results of tests done on day . All the tests were done with at least one person present in the lab from beginning to end except test on day when the lab is empty for the first minutes. Human activities detected by the CNN but not by PIR are marked using red rectangular boxes in Figs. 11-13. Notice that we only highlight parts where there is not a single positive detection output from the PIR sensor for the entire duration of the box. For example, at around s in Fig. 11, the CNN detects human presence for much longer than the PIR sensor, but this is not marked in the figure for clarity of presentation. To compare the sensitivity of the two systems more accurately, we summarize presence counts in Table VI. Each count corresponds to a positive detection for a one-second period during the entire test run. WiFi sensing consistently outperforms PIR in all runs. By cross-referencing with video recordings, we find that the presence detected in the highlighted ranges (i.e., those detected by WiFi but not by PIR) in Figs. 11-13 is associated with subtle human movement, such as stretching while sitting, adjusting sitting posture, and conversing without excessive movement. These subtle movements are often missed by PIR but are easily picked up by WiFi sensing. We emphasize again that model III is trained with only random walking for label data, i.e., no small-scale motion is included.
It is worth noting that there are still movements that are missed by both WiFi sensing and PIR. The most important example is when occupants are typing on keyboards but are otherwise completely still. Such movement appears to be too subtle to be detected even by WiFi sensing. A possible remedy is to deliberately add such keyboard-typing data to the motion training set, yet this is likely to increase the false alarm rate given the subtlety of such movements.
We discuss in this section the impact of various design parameters on the WiFi sensing performance. Due to space limitations, high-level observations are summarized here in lieu of detailed experimental results.
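The presence counts and the highlighting rule described above can be sketched as follows. This is an illustrative reading of the text, not the authors' plotting code: the function names are hypothetical, and each input is assumed to be a per-second sequence of 0/1 detections aligned in time. A run of consecutive WiFi-positive seconds is reported only if the PIR sensor has no positive output anywhere within that run.

```python
from typing import List, Sequence, Tuple


def presence_count(detections: Sequence[int]) -> int:
    """Number of one-second periods with a positive detection."""
    return sum(1 for d in detections if d)


def wifi_only_runs(wifi: Sequence[int], pir: Sequence[int]) -> List[Tuple[int, int]]:
    """Return (start, end) second indices of WiFi-positive runs with no PIR positive.

    Indices are inclusive. Runs in which PIR fires at least once are skipped,
    even if WiFi detects presence for longer than PIR does.
    """
    if len(wifi) != len(pir):
        raise ValueError("wifi and pir must cover the same seconds")
    runs = []
    start = None
    for t, w in enumerate(wifi):
        if w and start is None:
            start = t  # a new WiFi-positive run begins
        elif not w and start is not None:
            if not any(pir[start:t]):
                runs.append((start, t - 1))
            start = None
    # Close a run that extends to the end of the recording.
    if start is not None and not any(pir[start:]):
        runs.append((start, len(wifi) - 1))
    return runs
```

For made-up inputs `wifi = [1, 1, 0, 1, 1, 1, 0, 1]` and `pir = [0, 0, 0, 0, 1, 0, 0, 0]`, the middle run is skipped because PIR fires once inside it, while the first and last runs are reported.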
IV-E1 CSI sampling interval
The CSI sampling interval is set at ms in all the experiments reported above. Retraining model I under two more sampling intervals, ms and ms, we find that slight performance degradation occurs for motion detection, but with no significant impact on presence detection given properly designed post-processing, provided that training and testing are done in the same lab space. With training and testing done in different lab spaces, a slower sampling rate results in noticeable performance loss.
IV-E2 Contribution of CSI magnitude and phase
Using the same data set, we have also studied the performance of WiFi sensing using only magnitude or only phase for presence detection. The result is quite informative: for empty rooms (label data), all three options (magnitude only, phase only, and both magnitude and phase) give comparable results and all are quite accurate. For motion detection, using only CSI phase performs slightly worse than magnitude only and magnitude plus phase when detecting random walks. However, for data collected on days 14-16, i.e., data with small-scale movement, the phase-only CNN performs significantly worse than the other two CNNs. For example, for the data from day 16, while the other two CNNs are able to detect small-scale motion with over accuracy, phase alone achieves only accuracy.
The normalization of CSI magnitude in (2) is particularly helpful in controlling the false alarm rate. Experimental results show that using CSI without magnitude normalization achieves comparable motion detection accuracy (i.e., for data with label ). However, it results in an elevated false alarm rate for some data sets, which may trigger positive presence detection in an empty room. In the absence of normalization, the performance degradation is even more severe when different lab spaces are used for training and testing.
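The three input variants compared above (magnitude only, phase only, and both) can be illustrated with a small sketch that splits a complex CSI vector into feature channels. The function name `csi_features` and the channel layout are assumptions made for illustration. The sketch uses raw phase and omits the pre-processing and magnitude normalization that the paper applies before feeding CSI to the CNN.

```python
import cmath
from typing import List, Sequence


def csi_features(csi: Sequence[complex], mode: str = "both") -> List[List[float]]:
    """Return feature channels for one CSI vector (one complex entry per subcarrier).

    mode is "magnitude", "phase", or "both"; each channel is a list of floats.
    """
    magnitude = [abs(h) for h in csi]
    phase = [cmath.phase(h) for h in csi]  # radians in [-pi, pi]
    if mode == "magnitude":
        return [magnitude]
    if mode == "phase":
        return [phase]
    if mode == "both":
        return [magnitude, phase]
    raise ValueError(f"unknown mode: {mode}")
```

Keeping the variants behind a single `mode` argument makes it straightforward to train otherwise identical networks on each input type and compare them on the same data.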
In this paper, a passive WiFi sensing system is proposed for obtaining indoor occupancy information. The system exploits motion-induced variation in both the magnitude and phase of the CSI. A new convolutional neural network architecture is designed to harvest occupancy information in CSI estimates along temporal, frequency, and spatial dimensions. With judicious pre-processing to remove hardware/system impairments and post-processing to infer presence information from the motion detection output, the proposed learning system provides a viable and promising alternative for real-time presence or occupancy detection. Extensive experiments were conducted using commercial off-the-shelf WiFi devices. It was demonstrated that the system is much more sensitive to human presence than PIR sensors and maintains the desired robustness against time-varying wireless channels.
A key challenge for presence detection is the calibration of human motion to achieve a balance between sensitivity to human presence and false alarms. The collection and use of data with human motion has an outsized influence on presence detection performance. Future work will explore presence detection using only training data corresponding to empty rooms for the desired robustness. Learning approaches such as universal hypothesis testing and one-class SVM may prove useful alternatives to deep learning.
-  Note: IEEE Std. 802.11n-2009: Enhancements for higher throughput, 2009. [Online]. Available: http://www.ieee802.org Cited by: §IV-A.
-  Note: Honeywell PIR sensor DT8035. [Online]. Available: https://www.security.honeywell.com/product-repository/dt8035 Cited by: §IV-D.
-  Note: USRP: Universal Software Radio Peripheral, accessed September 2013. [Online]. Available: http://www.ettus.com Cited by: §I-A.
-  (2015-Apr.) Wigest: a ubiquitous wifi-based gesture recognition system. In Proc. IEEE Conf. on Comput. Commun. (INFOCOM), Hong Kong, China, pp. 1472–1480. Cited by: §I.
-  (2017-Jun.) Wi-chase: a wifi based human activity recognition system for sensorless environments. In Proc. IEEE 18th Int. Symp. A World of Wireless, Mobile Multimedia Networks (WoWMoM), Macao, China, pp. 1–6. Cited by: §I.
-  (2015) Keras. Note: https://keras.io Cited by: §IV-B.
-  (2017) CSI-based device-free wireless localization and activity recognition using radio image features. IEEE Trans. Veh. Technol. 66 (11), pp. 10346–10356. Cited by: §I.
-  (2016) An adaptive wireless passive human detection via fine-grained physical layer information. Ad Hoc Networks 38, pp. 38–50. Cited by: §I, §I, §III-A2.
-  (1991) Digital image processing. 2nd edition, Prentice Hall, Upper Saddle River, NJ, USA. Cited by: §III-A3.
-  (2015) Paws: passive human activity recognition based on wifi ambient signals. IEEE Internet Things J. 3 (5), pp. 796–805. Cited by: §I.
-  (2015) Batch normalization: accelerating deep network training by reducing internal covariate shift. arXiv preprint arXiv:1502.03167. Cited by: §III-B.
-  (2014) Adam: a method for stochastic optimization. arXiv preprint arXiv:1412.6980. Cited by: §III-B.
-  (2012) Imagenet classification with deep convolutional neural networks. In Advances in Neural Inform. Process. Syst., pp. 1097–1105. Cited by: §III-B.
-  (1998) Gradient-based learning applied to document recognition. Proc. IEEE 86 (11), pp. 2278–2324. Cited by: §III-B.
-  (2017-Aug.) Ar-alarm: an adaptive and robust intrusion detection system leveraging csi from commodity wi-fi. In Proc. Int. Conf. Smart Homes Health Telematics, Paris, France, pp. 211–223. Cited by: §I.
-  (2017) IndoTrack: device-free indoor human tracking with commodity wi-fi. Proc. ACM Interactive, Mobile, Wearable Ubiquitous Technologies 1 (3), pp. 72. Cited by: §I.
-  (2014-Dec.) Wi-sleep: contactless sleep monitoring via wifi signals. In Proc. IEEE Real-Time Syst. Symp., Rome, Italy, pp. 346–355. Cited by: §I.
-  (2018) SignFi: sign language recognition using wifi. Proc. ACM Interactive, Mobile, Wearable and Ubiquitous Technologies 2 (1), pp. 23. Cited by: §I, §I.
-  (2009-Mar.) Smart devices for smart environments: device-free passive detection in real environments. In Proc. IEEE Int. Conf. Pervasive Computing and Commun., Galveston, TX, USA, pp. 1–6. Cited by: §I.
-  (2016-Nov.) Channel state information based human presence detection using non-linear techniques. In Proc. 3rd ACM Int. Conf. Systs Energy-Efficient Built Environments, Palo Alto, CA, USA, pp. 177–186. Cited by: §I, §I, §III-A2.
-  (2018) FallDeFi: ubiquitous fall detection using commodity wi-fi devices. Proc. ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies 1 (4), pp. 155. Cited by: §I, §I.
-  (2018) Enabling contactless detection of moving humans with dynamic speeds using csi. ACM Trans. Embedded Computing Syst. (TECS) 17 (2), pp. 52. Cited by: §I, §I.
-  (2017-Jul.) Widar: decimeter-level passive tracking via velocity monitoring with commodity wi-fi. In Proc. 18th ACM Int. Symp. Mobile Ad Hoc Networking Computing, Chennai, India, pp. 6. Cited by: §I.
-  (2002) Wireless communication - principles and practice. 2nd edition, Prentice Hall, Upper Saddle River, NJ, USA. Cited by: §II-B.
-  (2017-Jun.) Peripheral wifi vision: exploiting multipath reflections for more sensitive human sensing. In Proc. Int. Workshop Physical Analytics, Niagara Falls, NY, USA, pp. 13–18. Cited by: §I, §I.
-  (2010) A survey of human-sensing: methods for detecting presence, count, location, track, and identity. ACM Computing Surveys 5 (1), pp. 59–69. Cited by: §I.
-  (2017) Device-free human activity recognition using commercial wifi devices. IEEE J. Select. Areas Commun. 35 (5), pp. 1118–1131. Cited by: §I.
-  (2014-Sep.) E-eyes: device-free location-oriented activity identification using fine-grained wifi signatures. In Proc. 20th Annu. Int. Conf. Mobile computing networking, Hawaii, USA, pp. 617–628. Cited by: §I.
-  (2015) Non-invasive detection of moving and stationary human with wifi. IEEE J. Select. Areas Commun. 33 (11), pp. 2329–2342. Cited by: §I, §I.
-  (2015) Precise power delay profiling with commodity wifi. In Proc. 21st Annu. Int. Conf. Mobile Computing and Networking, MobiCom ’15, New York, NY, USA, pp. 53–64. Cited by: §IV-A.
-  (2019) Indoor events monitoring using channel state information time series. IEEE Internet Things J.. Cited by: §I, §I.
-  (2018-Apr.) Time reversal indoor tracking with centimeter accuracy. In Proc. IEEE Int. Conf. Acoust., Speech Signal Process. (ICASSP), Calgary, AB, Canada, pp. 6433–6437. Cited by: §I.
-  (2014-Sep.) Your ap knows how you move: fine-grained device motion recognition through wifi. In Proc. 1st ACM workshop on Hot topics in wireless, Hawaii, USA, pp. 49–54. Cited by: §I.
-  (2013-Apr.) Radio tomographic imaging and tracking of stationary and moving people via kernel distance. In Proc. ACM/IEEE Int. Conf. Inform. Process. Sensor Networks (IPSN), Philadelphia, PA, USA, pp. 229–240. Cited by: §I, §I.
-  (2017) Design and implementation of a csi-based ubiquitous smoking detection system. IEEE/ACM Trans. Networking 25 (6), pp. 3781–3793. Cited by: §I.
-  (2017) R-ttwd: robust device-free through-the-wall detection of moving human with wifi. IEEE J. Select. Areas in Commun. 35 (5), pp. 1090–1103. Cited by: §I, §I, §III-A2.
-  (2017-Jun.) Freedetector: device-free occupancy detection with commodity wifi. In Proc. IEEE Int. Conf. Sensing, Commun. and Networking (SECON Workshops), San Diego, CA, USA, pp. 1–5. Cited by: §I, §I.