With the increasing popularity of WiFi technology and the wide availability of WiFi infrastructure, WiFi-based smart human sensing is receiving more attention and plays an important role in Internet of Things (IoT) systems. It has many applications in health care, security, entertainment, and tailored services [47, 5, 1, 27].
WiFi-based sensing technology is enabled by Channel State Information (CSI) [11, 13, 12], a fine-grained physical-layer measurement of each subcarrier channel. Owing to its frequency diversity, CSI captures the multipath propagation of the WiFi signal from transmitters to receivers over multiple subcarriers, and therefore reflects the complex multipath effects caused by human motion. As a result, different human behaviors can be detected and recognized by observing the variations of CSI data. Compared with other smart human sensing technologies based on either vision or wearable sensors, WiFi-based smart sensing has several advantages. It requires neither line of sight nor illumination. It leverages the existing WiFi infrastructure in buildings and homes, which avoids the additional cost of wearable devices and is more user-friendly. It also provides better privacy protection than camera-based solutions.
Although existing CSI-based smart human sensing systems have achieved decent performance in some applications, many of them still suffer serious performance degradation across environments. A high-accuracy system trained in one environment cannot readily be deployed in another, because the different environment setting degrades its performance.
There are currently two main approaches to this problem: model-based methods and learning-based methods. Model-based methods build a signal model from extracted signal parameters that suffer the least perturbation under changing environmental dynamics. However, they require expert knowledge for parameter selection. Moreover, by selecting only part of the signal information from the complete CSI sequences, they may lose important features. Most importantly, model-based methods have difficulty recognizing complex human behaviors, such as gestures, because of their weak fitting ability. Recently, learning-based methods have drawn increasing attention with the development of deep learning techniques. One popular technique in CSI-based human behavior recognition is Domain Adaptation (DA) [7, 48, 32]. DA transfers a CSI-based sensing system trained in one environment (source domain) to another environment (target domain). However, DA-based CSI human behavior sensing systems require a large number of CSI samples from the new environment to perform domain adaptation, which is impractical in many scenarios. In , simulated fake data is used to alleviate this issue; however, an adequate amount of CSI data from the new environment is still needed to generate the fake data. To address this problem, we use CSI data from multiple training environments to train a generalized system, so that the model can be applied to a new testing environment without collecting any CSI data from it.
In this paper, we propose AirFi, a novel Augmented environment-Invariant Robust WiFi gesture recognition system that addresses the performance degradation of CSI-based smart sensing systems in unseen environments. In our scenario, CSI data from the testing environment is not available, and AirFi handles this by generalizing its system model to the testing environment. We are inspired by the idea of Domain Generalization (DG) [31, 19, 63]. AirFi is trained using CSI data from multiple training environment settings so that the model generalizes well to a new environment setting. First, AirFi uses an encoder to extract feature codes from CSI data collected in several training environment settings. Then, with the extracted features mapped onto the feature plane, AirFi minimizes the distribution differences between feature codes from different environment settings. Finally, the feature codes are used to train the classifier. In this way, AirFi is able to generalize its model to unseen environment settings. In addition, to enhance model training, a random prior distribution is introduced into the feature extraction process in an adversarial manner, which reduces the dependency of the model on the training CSI data. Data augmentation and feature augmentation are also applied to improve system training. Experiments show that AirFi achieves decent performance and outperforms the benchmark reference systems.
The contributions of the paper are summarized as follows:
We propose AirFi, a CSI-based gesture recognition system that uses domain generalization to generalize to a new environment without any new data. To the best of our knowledge, AirFi is the first work that deals with the environment dependency issue without collecting new data or adapting the model in the new environment.
For better generalization, we augment both the CSI data and the feature codes to improve their representativeness. Unlike previous works, which augment CSI data and features randomly, our augmentation uses a label-dependent regularizer so that it is more aggressive along the domain direction and less aggressive along the class-wise direction.
In the new environment, the performance of AirFi can be further improved by applying few-shot learning with a few CSI samples from the testing environment setting.
Experiments show that AirFi performs well across different environments, and that few-shot learning improves its performance further.
The rest of the paper is organized as follows: Section 2 discusses related work. Section 3 details the design of the AirFi system. Section 4 presents experimental results and comparisons with existing works. Section 5 concludes the paper and suggests future research topics.
2 Related Works
In this section, we review previous work on CSI-based behavior recognition and the methods used to overcome environmental dynamics. We then review related work on DA and DG, and finally on few-shot learning.
2.1 CSI based Human Behavior Recognition
Human behavior recognition has received great attention in recent years [18, 59, 44]. With a large number of CSI samples, an accurate human behavior recognition system can be built. Reference  proposes a system that uses CSI to recognize human gestures. In , the system model is further improved with an environmental noise removal mechanism that mitigates the effect of signal dynamics caused by environment changes. Reference  proposes a deep-learning-based approach called attention-based bi-directional long short-term memory (ABLSTM) for passive human activity recognition using WiFi CSI signals. WiGrus leverages CSI to recognize a set of hand gestures using software-defined radio. In , the authors propose a novel deep Siamese representation learning architecture for one-shot gesture recognition.
As environmental dynamics have become one of the most critical challenges faced by existing CSI-based human behavior sensing systems, many research works have studied this problem. Existing methods fall into two main categories: model-based methods and learning-based methods.
Model-based methods use signal parameters to build a signal model that suffers the least perturbation under changing environmental dynamics. References  and  use the Fresnel zone for human respiration and walking detection. Carm  uses both a CSI-speed model and a CSI-activity model to quantify the correlation between the movement speeds of different human body parts and specific human activities. WiSee  measures Doppler shifts to determine the direction of human motion: a movement towards the receiver causes a positive Doppler shift, while a movement away from the receiver causes a negative shift. WiAnti, a CSI-based activity recognition system that addresses co-channel interference through adaptive subcarrier selection, is proposed in . Reference  proposes a location-free activity recognition system using Angle Difference of Arrival (ADoA). Although model-based methods perform well in these works, they have several limitations. First, expert knowledge is required for parameter selection. Second, selecting only part of the signal information may cause important features to be missed. Lastly, model-based methods find it hard to recognize diverse human activities because of their weak fitting ability.
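To make WiSee's sign convention concrete, here is a minimal sketch of the Doppler shift on a reflected path, f_D = −(1/λ)·dL/dt, where L is the total transmitter-to-body-to-receiver path length. The function name, the nominal 5 GHz carrier (the band used in our own experiments) and the simplifying assumption that transmitter and receiver are co-located, so the path shortens at twice the hand speed, are illustrative choices, not WiSee's implementation.

```python
SPEED_OF_LIGHT_MPS = 299_792_458.0

def doppler_shift_hz(path_length_rate_mps: float, carrier_hz: float) -> float:
    """Doppler shift of a reflected path: f_D = -(1 / wavelength) * dL/dt.

    A shrinking path (dL/dt < 0), i.e. motion towards the radios, gives a
    positive shift; a growing path gives a negative shift.
    """
    wavelength_m = SPEED_OF_LIGHT_MPS / carrier_hz
    return -path_length_rate_mps / wavelength_m

# Hypothetical example: a hand moving at 0.5 m/s relative to co-located radios
# changes the round-trip path length at 1.0 m/s.
towards = doppler_shift_hz(-2 * 0.5, 5e9)  # about +16.7 Hz
away = doppler_shift_hz(+2 * 0.5, 5e9)     # about -16.7 Hz
```

At WiFi wavelengths of a few centimetres, even slow hand motion produces shifts of tens of hertz, and the sign alone distinguishes approach from retreat.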
With the development of computer vision and deep learning algorithms, some CSI-based human sensing works draw heavily on novel deep learning research, and several recent works apply the idea of DA. Reference proposes a novel scheme for CSI-based behavior recognition that uses an activity-filter-based deep learning network with enhanced correlation features to achieve robustness under different environmental settings. In , an environment-robust CSI-based human behavior sensing system is proposed; it leverages the properties of a matching network and enhanced features to achieve environment-robust behavior recognition. Reference  adapts the idea of generative adversarial networks to perform domain adaptation for model transfer. Reference applies a roaming model that can also transfer the system model to a new environment using labeled data from the target environment. DA transfers the system model trained in one environment (source domain) for use in a new, different environment (target domain).
DA, however, requires a large number of CSI samples from the new environment, which is impractical in many scenarios. MCBAR  and CSIGAN  use simulated fake data to mitigate this issue; however, an adequate amount of CSI data from the new environment is still needed to perform the domain transfer of the trained model. In this paper, we study human activity recognition with limited or no data from the target domain: we use CSI data from multiple training environments to train the system so that its model generalizes better to a new testing environment.
2.2 Domain Adaptation and Generalization
In the last few years, machine learning has achieved great success, and the corresponding works have benefited many real-world applications, including CSI-based human sensing. However, collecting and annotating a dataset for each new task takes considerable resources; when the number of samples and domains is very large, the process becomes extremely costly and time-consuming. Moreover, sufficient data samples are not always available. For example, it is not user-friendly to collect large amounts of data from users once a system is deployed in their homes. This motivates research on reusing a trained model in a new domain, and DA is one of the methods proposed to achieve this goal.
Recent works focus on transferring network representations from a source domain, where labeled datasets are easy to acquire, to a target domain where labeled data is sparse or even non-existent . Reference  proposes a new CNN architecture that introduces an adaptation layer and an additional domain confusion loss to learn a representation that is both semantically meaningful and domain invariant. In , a new Deep Adaptation Network architecture is proposed that generalizes deep convolutional neural networks to the domain adaptation scenario. Reference  makes a shared-latent space assumption and proposes an unsupervised image-to-image translation framework based on Coupled GANs. In , a multimodal unsupervised image-to-image translation framework is proposed by assuming that the image representation can be decomposed into a domain-invariant content code and a style code that captures domain-specific properties. Both  and  can transfer data from one domain to another without changing the categories of the data samples. In , CoGAN learns a joint distribution of images in two domains from images drawn separately from the marginal distributions of the individual domains, by enforcing a simple weight-sharing constraint. The main strategy is to guide feature learning by minimizing the difference between the source and target feature distributions; some other methods minimize the Maximum Mean Discrepancy (MMD) loss for this purpose. DA has achieved great success and benefited many systems and applications. Its limitation, however, is that it still needs many data samples from the target domain to perform the domain transfer of the system model. In many scenarios, no target-domain data is available during training, yet the system must still build an accurate model for an entirely new target domain. Domain generalization was proposed to address this problem.
DG leverages labeled data from multiple source domains to learn a universal representation that is expected to generalize well to an unseen target domain . DG was first introduced in , where an appropriate reproducing kernel Hilbert space is identified and a regularized empirical risk over that space is optimized. Then in , a discriminative framework is used to directly exploit dataset bias during training. In , a new framework is proposed for estimating generative models via an adversarial process using a generative model and a discriminative model. Reference utilizes MMD, which leads to a simple objective that can be interpreted as matching all orders of statistics between a dataset and samples from the model. Reference  leverages deep neural networks for domain-invariant representation learning and achieves end-to-end conditional invariant deep domain generalization. These works inspire us to use several environments where data is relatively easy to collect as source domains, and then to generalize the trained model to a new environment regarded as the target domain. Because it is difficult to collect large amounts of CSI data to train a generalized system model, we use data and feature augmentation to improve model generalization.
2.3 Few-shot Learning
Machine learning has been highly successful in data-intensive applications, but it is often hampered when the data set is small . This limits its scalability to new classes. Few-shot learning was proposed to tackle this problem; it aims to train a system model using very few labeled samples.
Unlike traditional convolutional neural networks, which need a large amount of data to train their layers, few-shot learning techniques build a model with only a limited amount of data.
In references [9, 34, 51, 30], neighborhood component analysis is applied: these works optimize the k-nearest-neighbor (kNN) accuracy in the feature space to build their models from limited amounts of data. In , a non-linear embedding is learned in an end-to-end manner; a non-linear classifier is constructed by minimizing the distance between feature codes and their label-dependent central points. Reference  parameterizes and learns the classification metric using a Multi-Layer Perceptron (MLP). In , task-specific support images are used to fine-tune the feature extractor network. In , the authors demonstrate that a simple class-covariance-based distance metric, namely the Mahalanobis distance, leads to a significant performance improvement.
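A minimal sketch of a Mahalanobis-distance classifier in the spirit of the last work cited above: each class is summarized by the mean and covariance of its few support features, and a query is assigned to the class with the smallest Mahalanobis distance. The ridge term `shrinkage`, the toy data and the function names are illustrative assumptions; the cited work uses a more elaborate covariance estimate.

```python
import numpy as np

def fit_class_stats(features, labels, shrinkage=1e-3):
    """Per-class mean and ridge-regularized covariance (the ridge keeps it invertible)."""
    stats = {}
    dim = features.shape[1]
    for c in np.unique(labels):
        x = features[labels == c]
        mean = x.mean(axis=0)
        centred = x - mean
        cov = centred.T @ centred / len(x)
        stats[int(c)] = (mean, cov + shrinkage * np.eye(dim))
    return stats

def mahalanobis_predict(query, stats):
    """Return the class whose distribution is closest in Mahalanobis distance."""
    best_class, best_dist = None, np.inf
    for c, (mean, cov) in stats.items():
        diff = query - mean
        dist = float(diff @ np.linalg.solve(cov, diff))
        if dist < best_dist:
            best_class, best_dist = c, dist
    return best_class

def euclidean_predict(query, stats):
    """Nearest class mean, ignoring covariance (for comparison)."""
    return min(stats, key=lambda c: float(np.sum((query - stats[c][0]) ** 2)))

# Toy support set: class 0 is spread widely along x, class 1 is tight.
class0 = np.array([[x, y] for x in (-6.0, -3.0, 0.0, 3.0, 6.0) for y in (-0.3, 0.3)])
class1 = np.array([[3.0 + dx, 1.2 + dy] for dx in (-0.3, 0.3) for dy in (-0.3, 0.3)])
features = np.vstack([class0, class1])
labels = np.array([0] * len(class0) + [1] * len(class1))
stats = fit_class_stats(features, labels)
query = np.array([6.0, 0.0])
```

The query is closer to the mean of class 1 in Euclidean terms, but it is far more plausible under the wide spread of class 0; the covariance-aware metric picks class 0, which is the effect that makes such metrics attractive when only a few samples per class are available.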
In our work, although AirFi performs well without any CSI data from the new environment settings, we find that adding few-shot learning further improves its performance when a few CSI samples from the deployed environment setting are available.
3 System Overview
We illustrate AirFi in Fig. 1. AirFi is composed of four stages: data augmentation, feature extraction & augmentation, domain generalization, and classifier training. To generalize the trained model, a basic assumption is that a feature space underlies the different domains. In , it is proved that there is a common feature space between CSI samples of different human behaviors from different environment settings. As shown in Fig. 1, we collect data from different environments, which are referred to as source environments. We then augment the collected CSI samples by adding arbitrary Gaussian noise, which generates additional simulated CSI samples. AirFi uses an encoder to encode both the collected and the simulated data and extract down-sampled features. The extracted feature codes are also augmented to improve their diversity and are projected onto the hidden code space for further training. To avoid overfitting, we take advantage of Adversarial Autoencoders . We introduce a prior distribution that regularizes the distribution of the feature codes through adversarial training. A decoder maps the feature codes back to source-environment CSI data, which helps unify the extracted feature codes. The feature codes are mapped onto the feature space underlying all source environments, and AirFi minimizes the distribution variance among the training environments based on the MMD. Finally, a classifier is trained to recognize different human gestures using the feature codes and their corresponding labels. We present each part of AirFi in detail in the following sections.
3.1 Data Augmentation
CSI samples of human gestures are collected from source environments, where CSI data is relatively easy to acquire; these are referred to as training environments. To obtain more representative CSI data collections and better generalization, data from more domains should be collected where possible. However, given limited time and human resources, it is very difficult and expensive to collect CSI data from all environment settings, and each environment is itself dynamic. Gaussian noise is widely used for data augmentation in previous works [45, 52, 62]. In , a multimodal CSI model is built for simulated CSI data generation, and the authors found that adding arbitrary Gaussian noise to collected CSI data yields fake CSI data. The introduced Gaussian noise does not change the label of the CSI data; moreover, the fake CSI data can be used to approximate the distribution of the related CSI data in other environment settings.
We denote the collected CSI data pairs from different environments as , where is the collection of CSI samples and is the collection of gesture labels. To augment the datasets and improve their diversity, arbitrary Gaussian noise is added to each CSI data sequence; combining the original CSI data sequence with the noise yields a new simulated CSI data sequence. As explained above, the added Gaussian noise does not change the label of the CSI data. From a wireless signal transmission perspective, the received signal can be modeled as the transmitted signal multiplied by the channel matrix plus Gaussian noise, i.e., y = Hx + n. Adding Gaussian noise to the CSI data therefore does not change its class label for gesture recognition purposes [45, 52, 15]. The newly generated CSI datasets help us approximate the CSI data distribution in other environment settings, which improves the diversity of our CSI datasets and benefits system training.
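A minimal sketch of this augmentation, assuming CSI amplitude samples are stored as an array of shape (samples, packets, subcarriers) and that the noise is zero-mean with a hypothetical standard deviation `noise_std`; the function name and the number of noisy copies are illustrative. Each noisy copy keeps the label of its source sample, mirroring the y = Hx + n model above.

```python
import numpy as np

def augment_with_gaussian_noise(csi, labels, num_copies=1, noise_std=0.05, seed=None):
    """Append `num_copies` noisy versions of every CSI sample.

    Labels are repeated unchanged, since additive Gaussian noise is assumed
    not to alter the gesture class.
    """
    rng = np.random.default_rng(seed)
    noisy = [csi + rng.normal(0.0, noise_std, size=csi.shape) for _ in range(num_copies)]
    aug_csi = np.concatenate([csi, *noisy], axis=0)
    aug_labels = np.concatenate([labels] * (num_copies + 1), axis=0)
    return aug_csi, aug_labels

# Hypothetical batch: 4 samples, 10 packets, 114 subcarriers.
csi = np.ones((4, 10, 114))
labels = np.array([0, 1, 2, 3])
aug_csi, aug_labels = augment_with_gaussian_noise(csi, labels, num_copies=2, seed=0)
```

The original samples are kept in front so that the augmented set is a strict superset of the collected data; the noise level would in practice be tuned against the amplitude scale of the real CSI.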
3.2 Feature Extraction via Adversarial Learning
Given the augmented CSI data, an encoder is used for feature extraction. AirFi applies the same encoder to the CSI data from all source environments. The encoder is part of an adversarial autoencoder: as the input CSI data pass through each convolutional layer of the encoder, they are downsampled and feature codes are extracted.
Because AirFi uses CSI data from all source environments to train its model, it risks overfitting during training: the trained model may follow the training data too closely and depend strongly on the source-environment CSI datasets, which harms generalization. Unless this issue is addressed, training AirFi would amount to ordinary supervised learning on all source domains. The model must be trained on all source-domain data without depending strongly on that data; only then can it learn the common feature space of CSI data from different environments and generalize to unseen environments. To address this issue, a prior distribution is imposed as an additional domain alongside the source-environment domains. Whenever feature codes are extracted, a regularization code is also generated from the prior distribution, and both are sent to a discriminator that tries to distinguish feature codes from regularization codes. This process is similar to a generative adversarial network. The adversarial loss is given by
By minimizing the adversarial loss, we make the feature code extraction depend on the prior distribution, which reduces the dependency of the system model on the training CSI data. With the prior distribution introduced, we expect the model to overfit less to the source-domain data. In theory, the prior distribution can be any arbitrary distribution . It is introduced so that the adversarial encoder extracts feature codes with less dependency on the original CSI data distribution, and the trained model therefore generalizes better to the testing environment. We use the Laplace distribution as the prior distribution in AirFi, after comparing several popular distributions used in related works . A decoder is also applied to decode the feature codes back into the source-domain CSI data form. The reconstruction loss is given by
The reconstruction process can unify the content of feature codes encoded from different training environments and improve the level of generalization to other unseen environments.
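The two losses can be sketched as follows, assuming the standard adversarial-autoencoder formulation: the discriminator outputs the probability that a code was drawn from the prior, prior samples are labeled real and encoder codes fake, the encoder is trained to fool the discriminator, and reconstruction is measured with mean squared error. The exact forms used in AirFi may differ, all function names are hypothetical, and only the loss arithmetic is shown; the encoder, decoder and discriminator networks are left out.

```python
import numpy as np

EPS = 1e-7  # keeps log() finite

def sample_laplace_prior(num, dim, scale=1.0, rng=None):
    """Regularization codes drawn from a zero-mean Laplace prior."""
    rng = np.random.default_rng() if rng is None else rng
    return rng.laplace(loc=0.0, scale=scale, size=(num, dim))

def binary_cross_entropy(prob, target):
    prob = np.clip(prob, EPS, 1.0 - EPS)
    return float(-np.mean(target * np.log(prob) + (1.0 - target) * np.log(1.0 - prob)))

def discriminator_loss(d_prior, d_codes):
    """Discriminator: prior samples are 'real' (1), encoder feature codes are 'fake' (0)."""
    return (binary_cross_entropy(d_prior, np.ones_like(d_prior))
            + binary_cross_entropy(d_codes, np.zeros_like(d_codes)))

def encoder_adversarial_loss(d_codes):
    """Encoder: push the discriminator to score its feature codes as prior samples."""
    return binary_cross_entropy(d_codes, np.ones_like(d_codes))

def reconstruction_loss(csi, csi_reconstructed):
    """Mean squared error between the input CSI and the decoder output."""
    return float(np.mean((csi - csi_reconstructed) ** 2))

rng = np.random.default_rng(0)
prior_codes = sample_laplace_prior(8, 16, rng=rng)
d_uncertain = np.full(8, 0.5)  # a discriminator that cannot tell the two apart
```

When the discriminator outputs 0.5 everywhere, its loss reaches 2 ln 2, the equilibrium value at which feature codes are indistinguishable from prior samples.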
3.3 Label Dependent Feature Augmentation
To further enhance the generalization ability of AirFi, we also augment the feature codes. As shown in , besides data augmentation, feature augmentation can also improve model generalization by increasing the diversity of the feature codes. Given the collection of feature codes , extracted from the input CSI data by the encoder , we have
We then feed the feature codes into the augmentation layers . In , the feature codes are augmented by scaling and adding a bias in the network. The scaling changes the absolute differences between elements of the feature codes, while the bias changes their absolute mean value. In AirFi, the CSI feature codes sampled after the feature extractor are augmented as
where and are the scale and bias terms, sampled from two Gaussian distributions,
where and are two scalar hyperparameters. We set to reduce the number of hyperparameters. The introduced perturbation improves the diversity of the feature codes and benefits model generalization. However, one limitation of feature augmentation is that the perturbation caused by the random noise may not follow the class-preserving direction: the augmented feature codes may lose some properties of their own behavior class and acquire properties of other behavior classes, which affects model training and performance. To improve the diversity of the feature codes while preserving the properties of their own behavior classes, we add a label-dependent regularizer to the augmentation layer in AirFi. The augmentation process becomes
The regularizer is sampled from a class-wise normal distribution , , where is the gesture class index, is the total number of gesture classes and is the class-wise covariance. is estimated and updated from every mini-batch of training data in a moving-average manner:
where is the discount factor. The corresponding is updated only when the label of the CSI data belongs to its own class . During training, AirFi calculates the class-wise covariance of the data from each class and updates the of that class accordingly. The elements of are then sampled as,
With the additional regularizer, the feature codes are augmented more aggressively along the cross-domain direction than along the class-wise direction. Although each feature code is augmented with random variables, the added regularizer is drawn using the covariance of its own gesture class, which preserves the key properties of that class. As a result, the augmented feature codes remain similar to the original feature codes of their own classes while carrying the perturbation introduced by the random variables and . This improves the diversity of the feature codes from different gesture classes and leads to better performance of AirFi, which we test with an ablation study in our experiments.
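A minimal sketch of label-dependent feature augmentation under explicit assumptions: a per-sample scalar scale α ~ N(1, σ²) and bias β ~ N(0, σ²) sharing one σ, a regularizer r ~ N(0, Σ_c) drawn from the running covariance Σ_c of the sample's class, and the moving-average update Σ_c ← γΣ_c + (1 − γ)Σ̂_c applied only to classes present in the mini-batch. The augmented code is α·f + β + r. The class name, σ and the discount γ are hypothetical.

```python
import numpy as np

class LabelDependentAugmenter:
    """Hypothetical helper: scale/bias augmentation plus a class-covariance regularizer."""

    def __init__(self, num_classes, dim, sigma=0.1, discount=0.9, seed=None):
        self.cov = np.zeros((num_classes, dim, dim))  # running class-wise covariances
        self.sigma = sigma
        self.discount = discount
        self.rng = np.random.default_rng(seed)

    def update_covariances(self, codes, labels):
        """Moving-average update, only for classes present in this mini-batch."""
        for c in np.unique(labels):
            x = codes[labels == c]
            centred = x - x.mean(axis=0)
            batch_cov = centred.T @ centred / len(x)
            batch_cov = (batch_cov + batch_cov.T) / 2  # guard against round-off asymmetry
            self.cov[c] = self.discount * self.cov[c] + (1 - self.discount) * batch_cov

    def augment(self, codes, labels):
        n, dim = codes.shape
        alpha = self.rng.normal(1.0, self.sigma, size=(n, 1))  # per-sample scale
        beta = self.rng.normal(0.0, self.sigma, size=(n, 1))   # per-sample bias
        reg = np.stack([self.rng.multivariate_normal(np.zeros(dim), self.cov[c])
                        for c in labels])                      # class-dependent regularizer
        return alpha * codes + beta + reg

codes = np.array([[1.0, 0.0, 0.0], [3.0, 0.0, 0.0], [0.0, 2.0, 0.0], [0.0, 2.0, 0.0]])
labels = np.array([0, 0, 1, 1])
aug = LabelDependentAugmenter(num_classes=2, dim=3, sigma=0.0, discount=0.5, seed=0)
before = aug.augment(codes, labels)  # covariances still zero and sigma = 0
aug.update_covariances(codes, labels)
after = aug.augment(codes, labels)
```

With σ = 0 and empty covariances the codes pass through unchanged. After one update, class-0 codes are perturbed only along the axis on which that class actually varies, while class-1 codes, which show no spread, stay fixed; this is the class-preserving behavior the regularizer is meant to provide.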
3.4 Domain Generalization
The feature codes are mapped onto the feature space for domain generalization. Denote the feature codes from source environment as with the distribution of CSI data. To perform the mapping, a mean map operation is required to map the feature codes to a reproducing kernel Hilbert space , which is given as
where is the kernel function. For AirFi, we use the Radial Basis Function (RBF) kernel, a well-known and commonly used characteristic kernel .
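A minimal sketch of the RBF kernel, k(x, y) = exp(−γ‖x − y‖²) with γ = 1/(2σ²), and of the empirical mean map μ̂(t) = (1/n) Σ_i k(f_i, t), which represents a whole set of feature codes as a single function in the RKHS. The bandwidth γ and the function names are hypothetical.

```python
import numpy as np

def rbf_kernel(x, y, gamma=1.0):
    """Gram matrix k(x_i, y_j) = exp(-gamma * ||x_i - y_j||^2) for rows of x and y."""
    sq_dists = (np.sum(x ** 2, axis=1)[:, None] + np.sum(y ** 2, axis=1)[None, :]
                - 2.0 * x @ y.T)
    return np.exp(-gamma * np.maximum(sq_dists, 0.0))  # clamp tiny negatives from round-off

def empirical_mean_map(codes, gamma=1.0):
    """Return t -> (1/n) * sum_i k(f_i, t): the empirical kernel mean embedding of `codes`."""
    def mu(t):
        return rbf_kernel(codes, np.atleast_2d(t), gamma).mean(axis=0)
    return mu

codes = np.array([[0.0, 0.0], [1.0, 0.0]])
mu = empirical_mean_map(codes)
value_at_origin = mu(np.array([0.0, 0.0]))[0]  # (1 + e^-1) / 2
```

Because the RBF kernel is characteristic, two distributions have the same mean embedding only if they are identical, which is what makes distances between embeddings meaningful for comparing domains.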
To achieve domain generalization of the system model, the mapped feature codes from different domains should cluster together in the feature space. AirFi achieves this by minimizing the MMD between the different distributions. The MMD between feature codes from two source environments can be measured by
Extending this from two source environments to multiple environments, the distribution variance across the feature domains is calculated as
where is the mean distribution over all training environments, and the indices of any two source environments are denoted by and . As shown in the equation, the distribution variance is upper bounded. Therefore, we use the distribution regularization loss
By minimizing , the distribution variance among the source domains is also reduced. As a result, the trained model can generalize across different domains.
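A minimal sketch of the multi-domain regularizer, assuming it takes the common form R = (1/K²) Σ_{i,j} MMD(H_i, H_j) over the K source domains, with the biased empirical MMD and an RBF kernel of hypothetical bandwidth γ = 0.5. Because ‖μ_l − μ̄‖ = ‖(1/K) Σ_j (μ_l − μ_j)‖ ≤ (1/K) Σ_j ‖μ_l − μ_j‖ by the triangle inequality, the average distance of each domain embedding from the mean embedding μ̄ never exceeds R, which is one way to read the upper bound stated above; the code checks this numerically. Function names and the toy domains are hypothetical.

```python
import numpy as np

def rbf_kernel(x, y, gamma=0.5):
    sq = np.sum(x ** 2, axis=1)[:, None] + np.sum(y ** 2, axis=1)[None, :] - 2.0 * x @ y.T
    return np.exp(-gamma * np.maximum(sq, 0.0))

def embedding_inner(a, b, gamma=0.5):
    """<mu_a, mu_b> in the RKHS for the empirical mean embeddings of sample sets a and b."""
    return float(rbf_kernel(a, b, gamma).mean())

def mmd(a, b, gamma=0.5):
    """Biased empirical MMD, i.e. ||mu_a - mu_b|| in the RKHS."""
    sq = (embedding_inner(a, a, gamma) + embedding_inner(b, b, gamma)
          - 2.0 * embedding_inner(a, b, gamma))
    return float(np.sqrt(max(sq, 0.0)))

def mmd_regularizer(domains, gamma=0.5):
    """(1/K^2) * sum over all ordered pairs (i, j) of MMD(domain_i, domain_j)."""
    k = len(domains)
    return sum(mmd(domains[i], domains[j], gamma) for i in range(k) for j in range(k)) / k ** 2

def mean_distance_to_average(domains, gamma=0.5):
    """(1/K) * sum_i ||mu_i - mu_bar||, with mu_bar the average of the K embeddings."""
    gram = np.array([[embedding_inner(a, b, gamma) for b in domains] for a in domains])
    sq = [gram[i, i] - 2.0 * gram[i].mean() + gram.mean() for i in range(len(domains))]
    return float(np.mean(np.sqrt(np.maximum(sq, 0.0))))

rng = np.random.default_rng(0)
domains = [rng.normal(loc=shift, scale=1.0, size=(30, 2)) for shift in (0.0, 1.0, 3.0)]
regularizer = mmd_regularizer(domains)
variance_proxy = mean_distance_to_average(domains)
```

Minimizing the pairwise term therefore also pulls every domain embedding towards the common mean embedding, without having to estimate that mean explicitly during training.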
3.5 Classifier Optimization
With the features identified, a classifier is added at the end of AirFi. The classifier consists of three fully connected layers . With the generalized feature codes on the feature space and corresponding gesture labels, AirFi trains its classifier in a supervised learning manner. AirFi uses the cross-entropy loss to measure classification errors :
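A minimal sketch of such a classifier and its loss, assuming 64-dimensional feature codes, hidden widths of 128 and 64, ReLU activations and eight output classes (one per gesture); these sizes, the initialization and the function names are hypothetical, and training (backpropagation and the optimizer) is omitted. The loss is the standard softmax cross-entropy, L = −(1/N) Σ_n log p(y_n | f_n).

```python
import numpy as np

def init_mlp(sizes, seed=0):
    """Weights and biases for consecutive fully connected layers (He-style init)."""
    rng = np.random.default_rng(seed)
    return [(rng.normal(0.0, np.sqrt(2.0 / m), size=(m, n)), np.zeros(n))
            for m, n in zip(sizes[:-1], sizes[1:])]

def mlp_logits(codes, params):
    h = codes
    for i, (w, b) in enumerate(params):
        h = h @ w + b
        if i < len(params) - 1:
            h = np.maximum(h, 0.0)  # ReLU on hidden layers only
    return h

def softmax_cross_entropy(logits, labels):
    """Mean negative log-likelihood of the true labels under a softmax."""
    shifted = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return float(-log_probs[np.arange(len(labels)), labels].mean())

# Hypothetical sizes: 64-d feature codes -> 128 -> 64 -> 8 gesture classes.
params = init_mlp([64, 128, 64, 8])
codes = np.random.default_rng(1).normal(size=(5, 64))
logits = mlp_logits(codes, params)
loss = softmax_cross_entropy(logits, np.array([0, 1, 2, 3, 4]))
```

With uninformative (all-zero) logits the loss equals ln 8, the value of a uniform guess over eight gestures, which is a useful sanity check before training.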
3.6 Few-Shot Learning
In our study, we find that although AirFi achieves decent performance without any training CSI samples from the target environments, a few CSI samples from the target environment can further improve its performance. The feature codes encoded from testing-environment CSI data may not be mapped closely to the distribution of the source-environment feature codes, and a few labeled CSI samples from the testing environment can help address this. After AirFi is trained using source-domain data from the training environments, we input the testing-environment CSI data and minimize the distribution difference between the source-environment and testing-environment CSI data with a new distribution regularizer loss .
We then retrain the system with this source-target distribution loss and the classification cross-entropy loss, which now also includes the labeled testing-environment data .
The total few-shot learning loss is then given by
With few-shot learning added, the trained model generalizes better to the target environment. We show the improvement in Section 4.
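A minimal sketch of how the few-shot objective could combine its terms, assuming the total loss is the cross-entropy over source codes plus the few labeled target codes, added to a weighted average of the MMD between each source domain and the target codes. The weight `lambda_dist`, this averaging, the kernel bandwidth and the function names are hypothetical.

```python
import numpy as np

def rbf_kernel(x, y, gamma=0.5):
    sq = np.sum(x ** 2, axis=1)[:, None] + np.sum(y ** 2, axis=1)[None, :] - 2.0 * x @ y.T
    return np.exp(-gamma * np.maximum(sq, 0.0))

def mmd(a, b, gamma=0.5):
    """Biased empirical MMD between two sets of feature codes."""
    sq = (rbf_kernel(a, a, gamma).mean() + rbf_kernel(b, b, gamma).mean()
          - 2.0 * rbf_kernel(a, b, gamma).mean())
    return float(np.sqrt(max(sq, 0.0)))

def cross_entropy(logits, labels):
    shifted = logits - logits.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return float(-log_probs[np.arange(len(labels)), labels].mean())

def few_shot_loss(source_domains, target_codes, logits, labels, lambda_dist=1.0):
    """Hypothetical objective: cross-entropy on source + labeled target codes
    plus the mean MMD between each source domain and the target codes."""
    dist = float(np.mean([mmd(src, target_codes) for src in source_domains]))
    return cross_entropy(logits, labels) + lambda_dist * dist

rng = np.random.default_rng(0)
source_domains = [rng.normal(0.0, 1.0, size=(20, 4)) for _ in range(3)]
target_codes = rng.normal(0.0, 1.0, size=(5, 4))
shifted_target = target_codes + 3.0          # target codes far from the source codes
uniform_logits = np.zeros((6, 8))
labels = np.array([0, 1, 2, 3, 4, 5])
loss_close = few_shot_loss(source_domains, target_codes, uniform_logits, labels)
loss_far = few_shot_loss(source_domains, shifted_target, uniform_logits, labels)
```

Target codes that lie far from the source codes raise the distribution term, so minimizing this objective pulls the few target samples towards the shared feature space while the cross-entropy keeps them separable by gesture.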
|CNN Only ||31.23%||29.57%||36.55%||32.48%||29.63%||34.11%||30.86%||28.99%|
|CNN Only ||35.95%||39.11%||37.49%||46.24%||40.18%||41.79%||39.85%||40.43%|
|CNN Only ||31.01%||40.42%||34.15%||32.68%||36.76%||37.91%||35.31%||37.82%|
|CNN Only ||38.44%||37.98%||32.17%||30.19%||31.74%||32.68%||32.48%||36.37%|
|AirFi w/o Data Aug||85.15%||87.06%||83.47%||86.72%||85.96%||87.17%||85.63%||86.31%||ABC-D|
|AirFi w/o Fea Aug||84.34%||85.65%||84.98%||86.10%||83.57%||86.21%||85.67%||82.44%|
|AirFi w/o Data Aug||84.83%||86.94%||87.75%||85.28%||84.67%||85.13%||83.71%||86.14%||ABD-C|
|AirFi w/o Fea Aug||82.68%||85.14%||84.36%||86.11%||84.09%||83.76%||82.97%||84.59%|
|AirFi w/o Data Aug||82.87%||83.94%||84.72%||83.16%||84.71%||85.93%||83.66%||84.68%||ACD-B|
|AirFi w/o Fea Aug||83.91%||82.70%||83.05%||84.61%||83.14%||83.45%||83.17%||82.08%|
|AirFi w/o Data Aug||83.95%||82.27%||84.78%||85.49%||82.29%||84.77%||84.52%||83.96%||BCD-A|
|AirFi w/o Fea Aug||83.01%||82.94%||84.95%||82.39%||81.79%||83.28%||83.26%||84.73%|
|CNN Only ||39.57%||41.08%||47.11%||34.95%||43.89%||41.97%||43.44%||37.96%|
|CNN Only ||45.93%||41.08%||35.12%||40.58%||38.94%||44.421%||37.01%||43.18%|
|CNN Only ||48.15%||34.18%||42.69%||39.48%||41.56%||42.00%||38.14%||40.82%|
|CNN Only ||41.33%||47.86%||42.17%||39.68%||39.42%||43.71%||43.62%||44.76%|
Unlike most existing systems, AirFi does not require any CSI data from the target environment in order to be applied there; instead, it builds a generalized system model from CSI data collected in several training environments. In this section, we conduct multiple experiments to evaluate AirFi in different environments. We first introduce the experimental setup. We then compare AirFi with other CSI-based smart human sensing systems in an overall evaluation, and conduct an ablation study to investigate the impact of the different components of AirFi. We also test AirFi with few-shot learning added and observe how it improves performance. Finally, we use t-SNE plots to show the distribution of hidden features under different system designs.
4.1 Environment Setup and Data collection
AirFi is designed to generalize to different environment settings. To test this ability, our experiments are performed in four different environment settings: a lab, a cubicle office, a meeting room and a tutorial room. We select them because their layouts and furniture differ considerably, which allows us to test the compared systems in different environments. Their layouts are shown in Fig. 2. In each location, two routers are used: one router is the transmitter (one antenna) and the other is the receiver (three antennas). We upgraded the firmware of both routers to our CSI-enabled platform for data collection. The transmitter operates in 802.11n AP mode at 5 GHz with a 40 MHz bandwidth, and the receiver is connected to the transmitter in client mode. The environments are detailed as follows.
Environment A (lab environment). The furniture in the lab is mainly lab benches and chairs. The routers are placed on two opposite lab benches. The volunteer performs different human gestures, while sitting in the middle between the two lab benches.
Environment B (cubic office environment). The furniture in the cubic office is mainly cubical desks and chairs. The two routers are placed on two different desks as shown in the layout figure. The volunteer performs different gestures in the middle area.
Environment C (tutorial room environment). The furniture consists mainly of round tables and chairs, and there is also a large screen on the wall. The two routers are placed on two tables. The volunteer performs the gestures in the testing area.
Environment D (meeting room environment). There is a big table in the center of the room, with chairs placed around it. We place the two routers on one side of the table, and the volunteer performs the gestures beside the table.
We selected 8 volunteers aged 19 to 27 to participate in our experiments: 5 males and 3 females. Our experiments involve 8 categories of human gestures: up & down, left & right, back & forward, clap, fist, circling, throw and zoom. Each volunteer performs the gestures while the transmitter sends signal packets to the receiver, and the CSI-enabled platform measures and stores the CSI data.
For each gesture, 200 CSI samples are recorded at each experimental location, giving 800 CSI samples per gesture in total. AirFi uses only the amplitude information for gesture recognition. Each CSI sample received by our router platform contains 114 subcarriers. The input size of our CSI data is .
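To make the data layout concrete, the sketch below turns one complex CSI sample into the amplitude-only input described above and checks the sample counts. It assumes each sample is indexed as (receive antenna, subcarrier, packet), with 3 receive antennas from the 1 x 3 antenna setup and 114 subcarriers. The number of packets per sample is not given in the text, so `NUM_PACKETS` is a hypothetical placeholder, and the nested-list layout and the helper names `csi_amplitude` and `random_csi_sample` are illustrative assumptions rather than the format used by the authors' platform.

```python
import cmath
import random

NUM_RX_ANTENNAS = 3      # 1 transmit antenna x 3 receive antennas (Section 4.1)
NUM_SUBCARRIERS = 114    # subcarriers reported per CSI sample
NUM_PACKETS = 10         # hypothetical: packets per gesture sample (not given in the text)

NUM_GESTURES = 8
NUM_ENVIRONMENTS = 4
SAMPLES_PER_GESTURE_PER_ENV = 200


def csi_amplitude(sample):
    """Keep only |h| for every (antenna, subcarrier, packet) entry; phase is discarded."""
    return [[[abs(h) for h in per_packet] for per_packet in antenna] for antenna in sample]


def random_csi_sample(rng):
    """Synthetic complex CSI sample with shape (antennas, subcarriers, packets)."""
    return [
        [
            [cmath.rect(rng.uniform(0.0, 1.0), rng.uniform(-cmath.pi, cmath.pi))
             for _ in range(NUM_PACKETS)]
            for _ in range(NUM_SUBCARRIERS)
        ]
        for _ in range(NUM_RX_ANTENNAS)
    ]


rng = random.Random(0)
amp = csi_amplitude(random_csi_sample(rng))
shape = (len(amp), len(amp[0]), len(amp[0][0]))
samples_per_gesture = NUM_ENVIRONMENTS * SAMPLES_PER_GESTURE_PER_ENV
total_samples = NUM_GESTURES * samples_per_gesture
print(shape, samples_per_gesture, total_samples)  # (3, 114, 10) 800 6400
```

With 4 locations x 200 samples, each gesture has 800 samples, and the 8 gestures give 6,400 samples overall.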
4.2 Overall Evaluation
We compare AirFi with state-of-the-art CSI-based gesture recognition systems, namely WiGr, WGRDTL and Wi-Multi [61, 4, 8]. We select these systems because they can also adapt to a new environment by means of domain transfer. In addition, we remove the generator components of MCBAR and keep only its feature extractor and classifier, which together form a convolutional neural network (CNN). This CNN performs accurate human behavior recognition within a single environment, and CNNs are widely used for classification in many CSI-based human sensing systems. However, a plain CNN has no mechanism for adapting to an unseen environment. We train the CNN to over 90% accuracy in the training environments and then deploy it in the testing environment together with the other compared systems. To test how well each method adapts to a new environment, we use the CSI data from three environments as the source (training) data and the CSI data from the remaining environment as the testing data. For example, when environments A, B and C are used for training, D is used for testing, and we denote this setting as ABC-D. All four combinations are tested. No system, including AirFi, has access to testing-environment data during the training phase.
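The leave-one-environment-out protocol above can be enumerated mechanically. The sketch below lists the four training/testing splits using the environment letters A to D from Section 4.1; the function name `leave_one_environment_out` is illustrative, not part of AirFi.

```python
from itertools import combinations

ENVIRONMENTS = ("A", "B", "C", "D")  # environment letters as defined in Section 4.1


def leave_one_environment_out(envs):
    """Yield (training_envs, testing_env, label) for each split that holds out one environment."""
    for train in combinations(envs, len(envs) - 1):
        (test,) = [e for e in envs if e not in train]  # exactly one environment is left out
        yield train, test, "".join(train) + "-" + test


for train, test, label in leave_one_environment_out(ENVIRONMENTS):
    print(label)  # prints ABC-D, ABD-C, ACD-B, BCD-A (one per line)
```

Because the testing environment is chosen as the complement of the training set, its data can never leak into training, which matches the requirement that no system sees testing-environment data.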
The experimental results are shown in Table I. AirFi outperforms all compared systems in all testing environments, with an accuracy of around 90%. The clear performance degradation of the CNN shows that CSI data collected in different environments vary considerably. Although the compared systems are also designed to adapt to a new environment, they need a large amount of testing-environment data to perform domain adaptation. Since no system can access testing-domain data in our scenario, these models cannot function as intended. WiGr has the second-best performance, with an overall accuracy of over 80%; to adapt better to a new environment, it requires CSI data from the target environment, which is not provided in this experiment. AirFi, in contrast, leverages domain generalization: it extracts common feature codes from the source-environment data and generalizes them in the feature space by minimizing the differences between the feature-code distributions. Training AirFi therefore does not need any CSI data from the testing environment.
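One standard way to quantify a difference between feature-code distributions is the maximum mean discrepancy (MMD) with a Gaussian kernel. The reference list includes kernel-mean-embedding work (A Hilbert Space Embedding for Distributions), but whether AirFi uses this exact estimator, kernel or bandwidth `sigma` is an assumption here. The sketch computes a biased empirical MMD² between two sets of feature vectors and averages it over pairs of source environments; the toy feature codes are made up for illustration.

```python
import math


def gaussian_kernel(x, y, sigma=1.0):
    """k(x, y) = exp(-||x - y||^2 / (2 sigma^2))."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))


def mmd2(xs, ys, sigma=1.0):
    """Biased empirical estimate of squared MMD between samples xs and ys."""
    def mean_k(us, vs):
        return sum(gaussian_kernel(u, v, sigma) for u in us for v in vs) / (len(us) * len(vs))
    return mean_k(xs, xs) + mean_k(ys, ys) - 2.0 * mean_k(xs, ys)


def mean_pairwise_mmd2(domains, sigma=1.0):
    """Average MMD^2 over all pairs of source environments: a distribution-alignment penalty."""
    pairs = [(i, j) for i in range(len(domains)) for j in range(i + 1, len(domains))]
    return sum(mmd2(domains[i], domains[j], sigma) for i, j in pairs) / len(pairs)


# Toy 2-D feature codes for three source environments (made-up values).
env_a = [[0.0, 0.0], [0.1, 0.2], [0.2, 0.1]]
env_b = [[0.1, 0.1], [0.0, 0.2], [0.2, 0.0]]
env_c = [[3.0, 3.0], [3.1, 2.9], [2.9, 3.1]]

print(mmd2(env_a, env_b) < mmd2(env_a, env_c))  # True: A and B overlap, C is shifted
print(mean_pairwise_mmd2([env_a, env_b, env_c]))
```

In a training loop, a term like `mean_pairwise_mmd2` would be added to the classification loss so that the feature extractor is pushed to map all source environments onto overlapping distributions.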
4.3 Ablation Study
To generalize to new environment settings, AirFi leverages domain generalization, and it additionally augments the CSI datasets and the feature codes to improve performance. We use an ablation study to examine how each component contributes to AirFi, comparing the pure CNN, AirFi without data augmentation, AirFi without feature augmentation and the complete AirFi. The experimental setup is the same as in the overall evaluation: in each test, three of the four environments are used for training and the remaining one for testing.
The results are shown in Table II. The CNN performs worst among the compared systems, whereas AirFi is still able to generalize its model, which is its most important ability. To generalize its model, AirFi needs CSI data from multiple training environments, and both data augmentation and feature augmentation further enhance this generalization ability. Even without these two techniques, the remaining parts of AirFi can still generalize the model and outperform the CNN, which has no ability to adapt to a new environment. However, removing either data augmentation or feature augmentation lowers accuracy compared with the complete AirFi; in other words, both augmentation techniques contribute to its performance. The drops caused by removing either technique are similar, about 4% to 6%, with the removal of feature augmentation affecting AirFi slightly more than the removal of data augmentation. Both techniques make the training feature codes more representative, and with more diverse feature codes AirFi generalizes better in the feature space. Data augmentation increases the diversity of the CSI training datasets by generating additional simulated CSI data, from which more feature codes can be extracted. Feature augmentation, on the other hand, works directly on the feature codes to make them more representative. With both techniques, the feature codes generalized in the feature space are more diverse, and AirFi is more likely to generalize to a new environment setting.
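The two augmentations can be illustrated with simple random perturbations. The reference list cites A Simple Feature Augmentation for Domain Generalization for the feature augmentation, and the first function below applies a per-dimension random scale-and-shift Gaussian perturbation in that spirit; the exact form used by AirFi and the noise scales are assumptions. The second function is a purely hypothetical stand-in for CSI data augmentation (Gaussian jitter on amplitudes): AirFi's own data augmentation may well be generative instead, so this only shows how extra simulated samples enlarge the training set.

```python
import random


def augment_feature_code(z, rng, sigma_scale=0.1, sigma_shift=0.1):
    """Perturb a feature code: z_i' = alpha_i * z_i + beta_i,
    with alpha_i ~ N(1, sigma_scale^2) and beta_i ~ N(0, sigma_shift^2)."""
    return [rng.gauss(1.0, sigma_scale) * v + rng.gauss(0.0, sigma_shift) for v in z]


def augment_csi_amplitude(amplitudes, rng, sigma=0.05):
    """Hypothetical stand-in for CSI data augmentation: add Gaussian noise to each
    amplitude and clip at zero, since amplitudes cannot be negative."""
    return [max(0.0, a + rng.gauss(0.0, sigma)) for a in amplitudes]


rng = random.Random(42)
code = [0.5, -1.2, 0.8, 0.0]
views = [augment_feature_code(code, rng) for _ in range(3)]  # 3 extra feature codes
csi_row = [0.9, 1.1, 0.02, 1.0]
jittered = augment_csi_amplitude(csi_row, rng)                # 1 extra simulated sample
print(len(views), len(jittered), min(jittered) >= 0.0)        # 3 4 True
```

Setting both noise scales to zero makes `augment_feature_code` return the input unchanged, which corresponds to the "without feature augmentation" ablation variant.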
4.4 Few-Shot Learning Add-on
In the previous evaluations, no CSI data from the testing environment was available during the training phase. We also explore the setting in which a small number of CSI samples from the testing environment are used for system training. The compared systems rely on domain adaptation techniques, which require a large amount of CSI data to transfer their models to a new environment. We extend AirFi with a few-shot learning add-on so that the generalized model can be further enhanced using a few labeled CSI samples. In this evaluation, 10 CSI samples from the testing environment are available to all systems during the training phase. The environment settings and compared systems are the same as in the overall evaluation.
The results are shown in Table III. All systems show a clear improvement in performance, and AirFi still outperforms the other compared systems. Because the amount of available CSI data is very small, the compared systems cannot fully retrain their models. WiGr performs second best, as its prototypical model also supports few-shot learning. AirFi, by contrast, starts from a system model that is already generalized across environments; with the few-shot learning add-on, it improves its performance using a limited amount of data and generalizes even better to the testing environment. AirFi minimizes the distribution difference between the given testing-environment CSI data and the training-environment data, which can be achieved with only a small amount of CSI data.
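The text does not detail how AirFi's few-shot add-on uses the labeled target samples. As one common illustration, the sketch below follows the nearest-prototype idea of Prototypical Networks (listed in the references and underlying WiGr's prototypical model): average the feature codes of the few labeled samples per gesture, then classify a query by its nearest prototype. The feature codes, the two-gesture toy setup and the function names are hypothetical, and this is not presented as AirFi's actual procedure.

```python
import math
from collections import defaultdict


def prototypes(support):
    """Mean feature code per gesture label from a few labeled (code, label) pairs."""
    grouped = defaultdict(list)
    for code, label in support:
        grouped[label].append(code)
    return {label: [sum(col) / len(codes) for col in zip(*codes)]
            for label, codes in grouped.items()}


def classify(code, protos):
    """Assign the label of the nearest prototype (Euclidean distance)."""
    return min(protos, key=lambda label: math.dist(code, protos[label]))


# Hypothetical 2-D feature codes: 10 labeled target samples, 5 each for two gestures.
support = [([1.0 + 0.1 * i, 0.0], "clap") for i in range(5)] + \
          [([0.0, 1.0 + 0.1 * i], "fist") for i in range(5)]
protos = prototypes(support)
print(classify([0.9, 0.2], protos))  # clap
```

Because only class means are estimated, a handful of labeled samples per class is enough, which is why prototype-style methods suit the 10-sample setting better than full domain adaptation.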
4.5 Distribution Visualization
To better understand how AirFi generalizes its model to different environments, we use t-SNE to visualize the distribution of feature codes. We plot the hidden features of CSI data from all four environments, A to D.
As shown in Fig 3(a), for a system trained without domain generalization, such as a plain CNN, the hidden features of CSI data from each environment cluster together but lie far from those of the other environments. When such a model is applied to a new environment, it cannot recognize the CSI data because the distribution gap between environments is very large. For AirFi, which is equipped with domain generalization, feature codes from different environments are gathered together because their distribution differences are minimized during the training phase, as shown in Fig 3(b). As a result, the trained model is more likely to generalize to a new environment.
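Fig 3 gives a qualitative picture. As a hypothetical numeric companion (not a metric reported in this paper), the sketch below scores how strongly feature codes cluster by environment: the mean distance between environment centroids divided by the mean distance of each feature code to its own environment's centroid. High values correspond to separated clusters as in Fig 3(a); values near zero correspond to overlapping environments as in Fig 3(b). The function name and toy 2-D features are illustrative assumptions; in practice such a score would be computed on the high-dimensional feature codes rather than on the 2-D t-SNE embedding, since t-SNE does not faithfully preserve distances between clusters.

```python
import math
from itertools import combinations


def centroid(points):
    """Coordinate-wise mean of a list of feature vectors."""
    return [sum(col) / len(points) for col in zip(*points)]


def environment_separation(features_by_env):
    """Mean distance between environment centroids divided by the mean distance of
    feature codes to their own environment centroid (hypothetical clustering score)."""
    cents = {env: centroid(pts) for env, pts in features_by_env.items()}
    between = [math.dist(cents[a], cents[b]) for a, b in combinations(cents, 2)]
    within = [math.dist(p, cents[env]) for env, pts in features_by_env.items() for p in pts]
    mean_within = sum(within) / len(within)
    if mean_within == 0.0:
        return math.inf  # every code sits exactly on its centroid
    return (sum(between) / len(between)) / mean_within


# Toy features: environment-specific clusters vs. overlapping environments.
separated = {"A": [[0.0, 0.1], [0.0, -0.1]], "B": [[5.0, 5.1], [5.0, 4.9]]}
mixed = {"A": [[0.0, 1.0], [1.0, 0.0]], "B": [[0.0, 0.0], [1.0, 1.0]]}
print(environment_separation(separated))
print(environment_separation(mixed))  # 0.0
```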
This paper has investigated the serious performance degradation that CSI-based human sensing systems suffer across different environments. To address this problem, we proposed AirFi, which leverages domain generalization to train a generalized model that can be applied to different environments. Moreover, training AirFi does not require CSI data from the testing environment, which makes it better suited to real-world deployment. The experimental results show that AirFi outperforms state-of-the-art systems in this field.
This research is supported by the Agency for Science, Technology and Research (Singapore) under the AGS scholarship. This work is jointly supported by the NTU Presidential Postdoctoral Fellowship, “Adaptive Multimodal Learning for Robust Sensing and Recognition in Smart Cities” project fund, at Nanyang Technological University, Singapore.
-  (2019) A Ubiquitous WiFi-Based Fine-Grained Gesture Recognition System. IEEE Transactions on Mobile Computing 18 (11), pp. 2474–2487. Cited by: §1.
-  (2020) Improved Few-shot Visual Classification. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 14493–14502. Cited by: §2.3.
-  (2011) Generalizing from Several Related Classification Tasks to A New Unlabeled Sample. Advances in Neural Information Processing Systems 24. Cited by: §2.2.
-  (2018) Wi-fi based Gesture Recognition using Deep Transfer Learning. In 2018 IEEE SmartWorld, Ubiquitous Intelligence & Computing, Advanced & Trusted Computing, Scalable Computing & Communications, Cloud & Big Data Computing, Internet of People and Smart City Innovation (SmartWorld/SCALCOM/UIC/ATC/CBDCom/IOP/SCI), pp. 590–595. Cited by: TABLE I, TABLE III, §4.2.
-  (2022) WiFace: Facial Expression Recognition Using Wi-Fi Signals. IEEE Transactions on Mobile Computing 21 (1), pp. 378–391. Cited by: §1.
-  (2019) WiFi CSI Based Passive Human Activity Recognition Using Attention Based BLSTM. IEEE Transactions on Mobile Computing 18 (11), pp. 2714–2724. Cited by: §2.1, §2.1.
-  (2009) Frustratingly Easy Domain Adaptation. arXiv preprint arXiv:0907.1815. Cited by: §1.
-  (2019) Wi-multi: A Three-phase System for Multiple Human Activity Recognition with Commercial WiFi Devices. IEEE Internet of Things Journal 6 (4), pp. 7293–7304. Cited by: TABLE I, TABLE III, §4.2.
-  (2004) Neighbourhood Components Analysis. Advances in Neural Information Processing Systems 17, pp. 513–520. Cited by: §2.3.
-  (2014) Generative Adversarial Nets. Advances in Neural Information Processing Systems 27. Cited by: §2.2, §3.2.
-  (2010) Predictable 802.11 Packet Delivery from Wireless Channel Measurements. ACM SIGCOMM Computer Communication Review 40 (4), pp. 159–170. Cited by: §1.
-  (2011) Tool Release: Gathering 802.11 n Traces with Channel State Information. ACM SIGCOMM Computer Communication Review 41 (1), pp. 53–53. Cited by: §1.
-  (2017) A New Method using Covariance Eigenvalues and Time Window in Passive Human Motion Detection based on CSI Phases. In 2017 IEEE 5th International Symposium on Electromagnetic Compatibility (EMC-Beijing), pp. 1–6. Cited by: §1.
-  (2018) WiAnti: an Anti-Interference Activity Recognition System Based on WiFi CSI. In 2018 IEEE International Conference on Internet of Things (iThings) and IEEE Green Computing and Communications (GreenCom) and IEEE Cyber, Physical and Social Computing (CPSCom) and IEEE Smart Data (SmartData), pp. 58–65. Cited by: §2.1.
-  (2018) Multimodal Unsupervised Image-to-Image Translation. In Proceedings of the European Conference on Computer Vision (ECCV), pp. 172–189. Cited by: §2.2, §3.1.
-  (2012) Undoing the Damage of Dataset Bias. In European Conference on Computer Vision, pp. 158–171. Cited by: §2.2.
-  (2015) Siamese Neural Networks for One-shot Image Recognition. In ICML Deep Learning Workshop, Vol. 2. Cited by: §2.3.
-  (2021) Two-stream Convolution Augmented Transformer for Human Activity Recognition. In Proceedings of the AAAI Conference on Artificial Intelligence, Vol. 35, pp. 286–293. Cited by: §2.1.
-  (2018) Learning to Generalize: Meta-learning for Domain Generalization. In Thirty-Second AAAI Conference on Artificial Intelligence, Cited by: §1.
-  (2018) Domain Generalization with Adversarial Feature Learning. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 5400–5409. Cited by: §2.2, §3.2, §3.4, §3.
-  (2021) A Simple Feature Augmentation for Domain Generalization. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 8886–8895. Cited by: §3.3, §3.3.
-  (2018) Deep Domain Generalization via Conditional Invariant Adversarial Networks. In Proceedings of the European Conference on Computer Vision (ECCV), pp. 624–639. Cited by: §2.2.
-  (2020) Location-free CSI based Activity Recognition with Angle Difference of Arrival. In 2020 IEEE Wireless Communications and Networking Conference (WCNC), pp. 1–6. Cited by: §2.1.
-  (2015) Generative Moment Matching Networks. In International Conference on Machine Learning, pp. 1718–1727. Cited by: §2.2.
-  (2017) Unsupervised Image-to-Image Translation Networks. Advances in Neural Information Processing Systems 30. Cited by: §2.2.
-  (2016) Coupled Generative Adversarial Networks. Advances in Neural Information Processing Systems 29. Cited by: §2.2.
-  (2016) Contactless Respiration Monitoring Via Off-the-Shelf WiFi Devices. IEEE Transactions on Mobile Computing 15 (10), pp. 2466–2479. Cited by: §1.
-  (2015) Learning Transferable Features with Deep Adaptation Networks. In International Conference on Machine Learning, pp. 97–105. Cited by: §2.2.
-  (2015) Adversarial Autoencoders. arXiv preprint arXiv:1511.05644. Cited by: §3.
-  (2009) A Deep Non-linear Feature Mapping for Large-margin KNN Classification. In 2009 Ninth IEEE International Conference on Data Mining, pp. 357–366. Cited by: §2.3.
-  (2013) Domain generalization via Invariant Feature Representation. In International Conference on Machine Learning, pp. 10–18. Cited by: §1.
-  (2010) Domain Adaptation via Transfer Component Analysis. IEEE Transactions on Neural Networks 22 (2), pp. 199–210. Cited by: §1.
-  (2013) Whole-home Gesture Recognition using Wireless Signals. In Proceedings of the 19th Annual International Conference on Mobile Computing and Networking, pp. 27–38. Cited by: §2.1.
-  (2007) Learning a Nonlinear Embedding by Preserving Class Neighbourhood Structure. In Artificial Intelligence and Statistics, pp. 412–419. Cited by: §2.3.
-  (2020) Towards Environment-Independent Human Activity Recognition using Deep Learning and Enhanced CSI. In GLOBECOM 2020 - 2020 IEEE Global Communications Conference, pp. 1–6. Cited by: §2.1.
-  (2020) WiFi-Based Activity Recognition using Activity Filter and Enhanced Correlation with Deep Learning. In 2020 IEEE International Conference on Communications Workshops (ICC Workshops), pp. 1–6. Cited by: §2.1.
-  (2007) A Hilbert Space Embedding for Distributions. In International Conference on Algorithmic Learning Theory, pp. 13–31. Cited by: §3.4.
-  (2017) Prototypical Networks for Few-shot Learning. arXiv preprint arXiv:1703.05175. Cited by: §2.3.
-  (2018) Learning to Compare: Relation Network for Few-shot Learning. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1199–1208. Cited by: §2.3.
-  (2016) WiFinger: Leveraging Commodity WiFi for Fine-grained Finger Gesture Recognition. In Proceedings of the 17th ACM International Symposium on Mobile Ad hoc Networking and Computing, pp. 201–210. Cited by: §2.1.
-  (2017) Adversarial Discriminative Domain Adaptation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 7167–7176. Cited by: §2.2.
-  (2014) Deep Domain Confusion: Maximizing for Domain Invariance. arXiv preprint arXiv:1412.3474. Cited by: §2.2.
-  (2008) Visualizing Data using t-SNE. Journal of Machine Learning Research 9 (11). Cited by: §4.5.
-  (2020) Robust CSI-based Human Activity Recognition using Roaming Generator. In 2020 16th International Conference on Control, Automation, Robotics and Vision (ICARCV), pp. 1329–1334. Cited by: §2.1, §2.1.
-  (2021) Multimodal CSI-based Human Activity Recognition using GANs. IEEE Internet of Things Journal 8 (24), pp. 17345–17355. Cited by: §1, §2.1, §3.1, §3.1, TABLE I, TABLE III, §3, §4.2.
-  (2016) Human Respiration Detection with Commodity WiFi Devices: Do user location and body orientation matter?. In Proceedings of the 2016 ACM International Joint Conference on Pervasive and Ubiquitous Computing, pp. 25–36. Cited by: §2.1.
-  (2021) WiTrace: Centimeter-Level Passive Gesture Tracking Using OFDM Signals. IEEE Transactions on Mobile Computing 20 (4), pp. 1730–1745. Cited by: §1.
-  (2018) Deep Visual Domain Adaptation: A survey. Neurocomputing 312, pp. 135–153. Cited by: §1, §2.2.
-  (2015) Understanding and Modeling of WiFi Signal based Human Activity Recognition. In Proceedings of the 21st Annual International Conference on Mobile Computing and Networking, pp. 65–76. Cited by: §2.1.
-  (2020) Generalizing from A Few Examples: A survey on Few-shot Learning. ACM Computing Surveys (csur) 53 (3), pp. 1–34. Cited by: §2.3.
-  (2009) Distance Metric Learning for Large Margin Nearest Neighbor Classification. Journal of Machine Learning Research 10 (2). Cited by: §2.3.
-  (2019) CsiGAN: Robust Channel State Information-based Activity Recognition with GANs. IEEE Internet of Things Journal 6 (6), pp. 10191–10204. Cited by: §2.1, §2.1, §3.1, §3.1.
-  (2015) CSI-based Device-free Gesture Detection. In 2015 12th International Conference on High-capacity Optical Networks and Enabling/Emerging Technologies (HONET), pp. 1–5. Cited by: §2.1.
-  (2018) Device-free Occupant Activity Sensing using WiFi-enabled IoT Devices for Smart Homes. IEEE Internet of Things Journal 5 (5), pp. 3991–4002. Cited by: §4.1.
-  (2019) Learning Gestures from WiFi: A Siamese Recurrent Convolutional Architecture. IEEE Internet of Things Journal 6 (6), pp. 10763–10772. Cited by: §2.1, §2.1.
-  (2014) How Transferable Are Features in Deep Neural Networks?. Advances in Neural Information Processing Systems 27. Cited by: §2.3.
-  (2017) Toward Centimeter-scale Human Activity Sensing with WiFi Signals. Computer 50 (1), pp. 48–57. Cited by: §2.1.
-  (2018) CrossSense: Towards Cross-site and Large-scale WiFi sensing. In Proceedings of the 24th Annual International Conference on Mobile Computing and Networking, pp. 305–320. Cited by: §2.1.
-  (2021) Privacy-Preserving Cross-Environment Human Activity Recognition. IEEE Transactions on Cybernetics. Cited by: §2.1.
-  (2019) WiGrus: A Wifi-Based Gesture Recognition System Using Software-Defined Radio. IEEE Access 7, pp. 131102–131113. Cited by: §2.1.
-  (2021) WiFi-based Cross-Domain Gesture Recognition via Modified Prototypical Networks. IEEE Internet of Things Journal. Cited by: TABLE I, TABLE III, §4.2.
-  (2016) Improving the Robustness of Deep Neural Networks via Stability Training. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 4480–4488. Cited by: §3.1.
-  (2021) Domain Generalization in Vision: A Survey. arXiv preprint arXiv:2103.02503. Cited by: §1, §2.2.