1 Introduction
Location-based services (LBS) are essential for applications such as location-based advertising, outdoor/indoor navigation and social networking. With the significant advances in smartphone technology in recent decades, smartphone devices have been equipped with various built-in sensors, such as GPS modules, WiFi modules and cellular modules. Acquiring data from these sensors enables researchers to study human activities.
Several types of data can be utilized for such research. For instance, GPS receivers can directly provide relatively accurate position information as latitudes and longitudes when smartphone users are outdoors, so GPS-based methods are favored by many researchers [5], [28]. However, these methods are not suitable for indoor positioning tasks because GPS coordinates are no longer available when the users stay indoors.
In that case, we have to rely on other, indirect data to infer the location information. Since WiFi connections are widely used nowadays, one frequently used approach is to exploit the WiFi fingerprints detected by smartphone devices. In this case, the received signal strength indicator (RSSI) values of the WiFi access points (WAPs) scanned by the phone, instead of latitudes and longitudes, are used to compute the users' locations. Compared to GPS-based methods, WiFi fingerprint-based localization not only functions indoors but also consumes less energy.
In this work, we attempt to interpret WiFi fingerprints (RSSI values) into accurate user locations (coordinates). However, this problem is not easy to solve. Typically, the RSSI data are vectors whose elements correspond to unique WAP IDs. To provide good WiFi coverage, modern public buildings are equipped with a relatively large number of WiFi access points, which leads to a high-dimensionality problem. Another issue is that the RSSI values are not always stable, due to the signal-fading effect and the multipath effect [14]. According to our investigation, an ordinary deep-learning regressor with a Euclidean-distance-based loss is not powerful enough to overcome these difficulties. To solve these problems, we propose two deep-learning-based models in this work. The first proposed model is the convolutional mixture density recurrent neural network (CMDRNN), which is designed to predict user paths. In the CMDRNN model, to address the high-dimensionality issue, we deploy a one-dimensional convolutional neural network as a substructure to detect features of the input. To overcome the instability of the data, a mixture density network substructure is incorporated into our model for computing the final output. Meanwhile, since our task is time-series prediction, we model the state transition via a recurrent neural network substructure. With this unique design, the CMDRNN model is able to predict user locations from WiFi fingerprints.
As we know, labelling data is usually time- and labor-consuming, so most real-world data is in fact unlabeled. Even so, we still want to make as much use of the accessible data as possible. To do this, we propose a second deep-learning-based model, a VAE-based semi-supervised learning model. In this approach, we assume that the input (RSSI values) and the target (user location) are governed by the same latent distribution. Therefore, in the unsupervised learning stage, we use a variational autoencoder to learn the latent distribution of the input, whose information is relatively abundant. Then, in the supervised learning stage, the labeled data is used to train the predictor. In this way, we can exploit more information from the dataset than purely supervised learning methods.
The main contributions of our work are summarized as follows.

We devise a novel hybrid deep-learning model (CMDRNN) that allows us to predict the accurate positions of smartphone users from detected WiFi fingerprints.

We devise VAE-based deep-learning models to perform semi-supervised learning for accurate indoor positioning.

We conduct evaluation experiments on real-world datasets and compare our methods with other deep-learning methods.
The remainder of the paper is organized as follows. Section 2 surveys the related work. Section 3 states the problem we solve in this paper. In Section 4, the proposed methods are introduced. Section 5 presents the validation experiments and results on real user data. Finally, we draw conclusions and discuss potential future work in Section 6.
2 Related work
In the literature, researchers have explored various machine learning techniques, both conventional machine learning and deep learning, for location recognition and prediction with WiFi fingerprint data.
2.1 Conventional machine learning methods
In the work of [4], the researchers compared many traditional machine learning methods, including decision trees (DT), K-nearest neighbors (KNN), naive Bayes (NB) and neural networks (NN), for classifying buildings, floors and regions. In [7], the authors clustered the 3D coordinate data with K-means and the RSSI data with the affinity clustering algorithm, respectively. In particular, in [26], the researchers compared distance metrics to identify the most suitable distance functions for accurate WiFi-based indoor localization. Some researchers used Gaussian processes (GPs) [22] to model the relationship between WiFi signal strengths and indoor locations [9], [11], [27]. However, GPs do not scale to large datasets due to their expensive computational cost.
2.2 Deep learning methods
Deep learning methods, such as convolutional neural networks (CNNs) [19], autoencoders (AEs) [13] and recurrent neural networks (RNNs), have also been utilized in WiFi-based positioning tasks. [15] used a CNN model for time-series analysis. Generally, a building has many different WiFi access points, so the RSSI data can be very high dimensional in many situations. For this reason, it is reasonable to reduce the data dimension before carrying out a regression or classification task, and deep-learning-based dimension-reduction methods such as autoencoders are an appropriate choice [21], [23], [16]. For example, in [23], the authors used an autoencoder network to reduce the data dimension and then used a CNN for accurate user positioning. In [21], [16], the researchers used autoencoders to reduce the input dimension before using a multilayer perceptron (MLP) to classify buildings and floors.
For time-series prediction, there are two types of applicable deep architectures, CNNs and RNNs. In [14], the authors compared different types of recurrent neural networks, including the vanilla RNN, long short-term memory (LSTM), gated recurrent units (GRU) and bidirectional LSTM, for accurate RSSI indoor localization, and employed a weighted filter on both input and output to improve sequential modeling accuracy.
2.3 Limitations of conventional neural networks
Traditional neural networks (NNs) can be regarded as deterministic models, which can be described as follows.
y = f(x; w)  (1)
where x and y are the input and output of the NN, respectively, f represents the neural network structure and w is the weight of the NN.
Accordingly, the training loss of NNs (typically, for instance, the mean squared error) can be described as follows.
L = (1/N) Σ_{i=1}^{N} (y_i − t_i)²  (2)
where N is the total number of inputs and t_i is the model target.
In many situations, an NN model is powerful enough to obtain satisfying results. However, in some cases, for instance a highly non-Gaussian inverse problem, traditional neural networks lead to very poor modeling results [2]. A good solution to this issue is to seek a framework that can model conditional probability distributions.
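To see why a squared-error regressor struggles with such one-to-many inverse problems, consider a toy example of ours (not from the paper): if the same input maps to the two target modes −1 and +1 with equal probability, the MSE-optimal prediction is their mean, a value the data never actually takes.

```python
import numpy as np

# Toy inverse problem: one input, two equally likely target modes, -1 and +1.
rng = np.random.default_rng(0)
targets = rng.choice([-1.0, 1.0], size=10_000)

# The constant prediction minimizing mean squared error is the sample mean.
best_mse_prediction = targets.mean()

print(round(best_mse_prediction, 2))  # close to 0.0, between the modes
# Every actual target is far from the MSE-optimal prediction.
print(np.abs(targets - best_mse_prediction).min() >= 0.9)  # True
```

A conditional-density model, by contrast, can place probability mass on both modes instead of averaging them.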
Mixture density networks (MDNs) solve this problem by using maximum likelihood estimation (MLE) [3]. In MDNs, the final output is sampled from a mixture distribution rather than computed directly. One advantage of MDNs is that they can be applied to estimation problems with large variability; for instance, we can add more Gaussian components to an MDN to increase its capacity for more complex distributions. However, MLE also has obvious disadvantages. First, some hyperparameters (e.g., the number of mixture components of an MDN) must be set properly, otherwise the results may be undesirable. Moreover, MLE can be biased when the sample is small, so MDNs are not well suited for semi-supervised learning. In practice, we also find that MDNs suffer from computational instability when the mixture number is large. To alleviate these disadvantages, researchers introduced Bayesian neural networks (BNNs) [12], which apply Bayesian inference. BNNs follow the maximum a posteriori (MAP) estimation scheme, in which prior knowledge of the model and the likelihood are combined. MAP has a regularizing effect that can prevent overfitting. However, in practice, we find that BNNs are not flexible enough for very complex distributions like the one in our case. We conjecture that this is caused by the simple choice of prior.
To solve this problem, we deploy a variational autoencoder [18], a deep latent generative model, in the proposed model to introduce richer prior information. Moreover, since variational autoencoders (VAEs) are unsupervised learning methods, we can use them to learn from unlabeled data and thus devise a semi-supervised learning model.
3 Problem Description
In this work, our purpose is to infer a smartphone user's location from the corresponding WiFi fingerprints. In contrast to some previous work, we aim at obtaining the accurate user location, namely the coordinates (either in meters or in longitudes and latitudes), rather than treating the subject as a classification task (identifying buildings and floors). The first task of our work is time-series prediction: predicting the user's location at the next time point from the current WiFi fingerprints. The second task is semi-supervised location recognition: we attempt to use not only the labeled data but also the unlabeled data to improve location recognition accuracy.
4 Proposed Methods
We now introduce the two proposed deep-learning models. The first is the convolutional mixture density recurrent neural network (CMDRNN), which is designed to predict user paths from WiFi fingerprints. The second is a VAE-based model, which is designed to perform semi-supervised learning for WiFi-based positioning.
4.1 Convolutional mixture density recurrent neural network
4.1.1 1D convolutional neural network
In our first task, the input features are the RSSI values of all the WiFi access points (WAPs) in the buildings, so the input can be very high dimensional. In practice, we observe that adjacent input features (WiFi access point values) are more likely to have similar numerical values than distant ones. For this reason, to deal with the high-dimensionality problem, we resort to convolutional neural networks (CNNs) [19]. The CNN is a powerful feature detector widely used in tasks such as image processing, natural language processing and sensor signal processing. In particular, since the input of our model is an RSSI value vector, we adopt a 1D convolutional neural network to extract the properties of the high-dimensional input.
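As an illustration of the building block (our sketch, not the paper's code), a single 1-D convolutional filter slides over the RSSI vector and produces a feature map; the vector length, filter and stride below are hypothetical toy values.

```python
import numpy as np

def conv1d(x, kernel, stride=2):
    """Valid 1-D convolution (cross-correlation) of signal x with one filter."""
    k = len(kernel)
    out_len = (len(x) - k) // stride + 1
    return np.array([np.dot(x[i * stride : i * stride + k], kernel)
                     for i in range(out_len)])

# A toy 8-dimensional "RSSI vector" and a length-3 filter (illustrative values).
rssi = np.array([-90.0, -88.0, -85.0, -60.0, -58.0, -75.0, -92.0, -95.0])
kernel = np.array([0.25, 0.5, 0.25])  # averages neighbouring access points

features = conv1d(rssi, kernel, stride=2)
print(features.shape)  # (3,)
```

In the real model, many such filters are learned jointly, which is why adjacent, correlated WAP readings make a 1-D CNN a natural feature detector here.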
4.1.2 Recurrent neural network
Recurrent neural networks (RNNs) [8] are widely used for natural language processing (NLP), computer vision and other time-series prediction tasks.
The state transition of an RNN can be expressed as follows.
h_t = σ_h(W_x x_t + W_h h_{t−1} + b_h)  (3)
where x_t is the input, h_t is the hidden state, σ_h is the activation function, W_x is the hidden weight for the input, W_h is the hidden weight for the hidden state and b_h is the hidden bias. The output of a conventional RNN can be expressed as follows.
y_t = σ_y(W_y h_t + b_y)  (4)
where y_t is the output, σ_y is the activation function, W_y is the output weight and b_y is the output bias.
However, in practice, RNNs may suffer from the long-term dependency problem during training. To address this, researchers proposed long short-term memory networks (LSTMs) [10], a variant of RNNs. In our model, we employ the LSTM network to predict user locations. More recently, researchers proposed another RNN variant, gated recurrent units (GRUs) [6], which achieve similar accuracy to LSTMs at lower computational cost. We deploy these three RNN architectures in the proposed model for comparison.
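The recurrence of Eqs. (3) and (4) can be sketched in numpy (a minimal vanilla-RNN illustration with hypothetical layer sizes and random weights, not the paper's trained model):

```python
import numpy as np

rng = np.random.default_rng(1)
n_in, n_hidden, n_out = 4, 8, 2          # hypothetical sizes

# Randomly initialized parameters of a vanilla RNN cell.
W_x = rng.normal(scale=0.1, size=(n_hidden, n_in))
W_h = rng.normal(scale=0.1, size=(n_hidden, n_hidden))
b_h = np.zeros(n_hidden)
W_y = rng.normal(scale=0.1, size=(n_out, n_hidden))
b_y = np.zeros(n_out)

def rnn_step(x_t, h_prev):
    """Eq. (3): new hidden state from the input and the previous state."""
    return np.tanh(W_x @ x_t + W_h @ h_prev + b_h)

def rnn_output(h_t):
    """Eq. (4): read-out of the hidden state (identity activation here)."""
    return W_y @ h_t + b_y

# Unroll over a short toy sequence of inputs.
h = np.zeros(n_hidden)
for x_t in rng.normal(size=(5, n_in)):
    h = rnn_step(x_t, h)
y = rnn_output(h)
print(y.shape)  # (2,)
```

LSTM and GRU cells replace `rnn_step` with gated updates but keep the same sequential structure.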
4.1.3 Mixture density network
A traditional neural network with a loss function such as mean squared error is optimized by a gradient-descent-based method. Generally, such a model performs well on problems that can be described by a deterministic function y = f(x), i.e., each input corresponds to one single output value. However, for some stochastic problems, one input may correspond to more than one possible value. Such problems are better described by a conditional distribution p(y|x) than by a deterministic function y = f(x). In this case, traditional neural networks may not work as expected. To tackle such problems, we can intuitively replace the original loss function with a conditional density; for a regression task, the Gaussian distribution is a natural choice. Moreover, using a mixture of Gaussians further improves the representational capacity of the model.
To this end, the mixture density network (MDN) model was proposed [3]. In contrast with a traditional neural network, the output of an MDN is the parameter set of a mixture of Gaussian distributions, and the loss function becomes a conditional probability. The optimization process thus minimizes the negative log-probability. Hence, the loss function can be described as follows:
L = −(1/N) Σ_{i=1}^{N} log ( Σ_{k=1}^{K} α_k(x_i) N(y_i | μ_k(x_i), σ_k²(x_i)) )  (5)
where x is the input, α_k is the assignment probability of the k-th component, with Σ_{k=1}^{K} α_k = 1, and the internal parameters of the base distribution are, for Gaussians, the means μ_k and the variances σ_k².
Accordingly, in the proposed model, the original output layer of the RNN, Eq. (4), is rewritten as:
{α, μ, σ} = σ_m(W_m h_t + b_m)  (6)
where h_t is the output of the RNN sub-model and also the input of the MDN sub-model, σ_m is the activation function, W_m is the hidden weight for the input and b_m is the output bias.
After training, we can use the neural network together with the mixture of Gaussians to describe the target distribution.
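The negative log-likelihood of Eq. (5) can be sketched in numpy (our illustrative implementation for a 1-D target with K components; the mixture parameters below are fixed toy values rather than network outputs):

```python
import numpy as np

def gaussian_pdf(y, mu, sigma):
    return np.exp(-0.5 * ((y - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))

def mdn_nll(y, alpha, mu, sigma):
    """Negative log-likelihood of targets y under a K-component Gaussian
    mixture; alpha, mu, sigma have shape (K,), y has shape (N,)."""
    comp = gaussian_pdf(y[:, None], mu[None, :], sigma[None, :])  # (N, K)
    mix = comp @ alpha                                            # (N,)
    return -np.mean(np.log(mix))

# Two-component mixture around -1 and +1 (illustrative values).
alpha = np.array([0.5, 0.5])
mu = np.array([-1.0, 1.0])
sigma = np.array([0.2, 0.2])

targets = np.array([-1.0, 1.0, -0.9, 1.1])
loss_good = mdn_nll(targets, alpha, mu, sigma)

# A single Gaussian collapsed onto the mean 0 fits the bimodal targets worse.
loss_bad = mdn_nll(targets, np.array([1.0]), np.array([0.0]), np.array([0.2]))
print(loss_good < loss_bad)  # True
```

In the actual MDN, `alpha`, `mu` and `sigma` are functions of the input produced by the network head of Eq. (6), and this loss is minimized by gradient descent.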
4.1.4 Proposed model
Combining the merits of the three aforementioned neural networks, we devise a novel deep neural network architecture called the convolutional mixture density recurrent neural network (CMDRNN). In the CMDRNN model, a 1D CNN captures the features of the high-dimensional inputs, the state transitions of the time-series data are modeled by an LSTM-RNN, and the output layer is composed of mixed Gaussian densities to enhance prediction accuracy. With this structure, we believe our model can describe complicated high-dimensional time-series data. Fig. 1 shows the overall structure of the CMDRNN model and Algorithm 1 summarizes its learning process.
The uniqueness of our method is that, compared with other existing models in the literature, it adopts a sequential density estimation approach. The learning target of the proposed model thus becomes a conditional distribution of the data rather than the output of a common regressor. Thanks to this, our model can handle such complicated modeling tasks.
4.2 VAEbased semisupervised learning
We assume that the input x (RSSI values) and the target y (coordinates) are governed by the same latent variable z (reflecting the building structure). However, in many real cases, the available datasets contain much information about the input x and little about the target y, so it is more reasonable to infer the distribution of z via x rather than via y. This procedure can be described as:
p(z|x) = p(x|z) p(z) / p(x)  (7)
where p(z) represents the prior distribution of z.
Afterwards, we apply the chain rule and assume the conditional generative scheme, in which x and y are conditionally independent given z:
p(x, y|z) = p(x|z) p(y|z)  (8)
Accordingly, the predicting model (either deterministic or probabilistic) can be described as:
p(y|x) = ∫ p(y|z) p(z|x) dz  (9)
We can implement Eq. (7) and Eq. (9) by an unsupervised learning process and a supervised learning process, respectively. Therefore, our method consists of two steps:

The first step (unsupervised learning): we employ a deep generative model to obtain the latent distribution p(z|x).

The second step (supervised learning): we employ an MLP model to obtain the target value y.
The structure of the VAE-based semi-supervised learning model is illustrated in Fig. 2.
4.2.1 Unsupervised learning procedure
For the unsupervised learning process, we adopt a variational autoencoder as the generative model to learn the latent distribution. Variational autoencoders (VAEs) [18] are deep latent generative models.
In VAEs, the prior of the latent variable is a standard Gaussian distribution.
p(z) = N(0, I)  (10)
In order to obtain the VAE encoder q(z|x) via an MLP, z is reparameterized using the reparameterization trick:
z = μ + σ ⊙ ε,  ε ∼ N(0, I)  (11)
where μ is the mean of q(z|x) and σ² is the variance of q(z|x).
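Eq. (11) can be sketched as follows (our numpy illustration; the encoder outputs μ and log σ² are fixed toy values here rather than MLP outputs):

```python
import numpy as np

def reparameterize(mu, log_var, rng):
    """Eq. (11): draw z = mu + sigma * eps with eps ~ N(0, I).
    The sampling stays differentiable w.r.t. mu and log_var because all
    randomness is isolated in eps."""
    sigma = np.exp(0.5 * log_var)
    eps = rng.standard_normal(mu.shape)
    return mu + sigma * eps

rng = np.random.default_rng(0)
mu = np.array([0.5, -1.0])       # toy encoder mean
log_var = np.array([0.0, 0.0])   # toy encoder log-variance (sigma = 1)

samples = np.array([reparameterize(mu, log_var, rng) for _ in range(20_000)])
print(np.round(samples.mean(axis=0), 1))  # close to [ 0.5 -1. ]
```

Predicting log σ² instead of σ keeps the standard deviation positive without constraints, a common practical choice.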
The VAE decoder can be described as:
x̂ ∼ p(x|z)  (12)
where x̂ is the reconstructed input.
The evidence lower bound (ELBO) of the VAE can be written as:
L_ELBO = E_{q(z|x)}[log p(x|z)] − KL(q(z|x) ‖ p(z))  (13)
Once L_ELBO is maximized, we have the approximate posterior q(z|x) ≈ p(z|x).
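For a diagonal Gaussian encoder and the standard normal prior of Eq. (10), the KL term in Eq. (13) has a well-known closed form, sketched here in numpy (our illustration):

```python
import numpy as np

def kl_to_standard_normal(mu, log_var):
    """KL( N(mu, diag(sigma^2)) || N(0, I) ), the closed-form regularizer
    appearing in Eq. (13)."""
    return -0.5 * np.sum(1.0 + log_var - mu ** 2 - np.exp(log_var))

# The KL term vanishes when the approximate posterior equals the prior...
print(kl_to_standard_normal(np.zeros(2), np.zeros(2)) == 0.0)  # True

# ...and grows as the posterior moves away from it.
print(kl_to_standard_normal(np.array([2.0, 0.0]), np.zeros(2)))  # 2.0
```

Maximizing the ELBO thus balances reconstruction quality against keeping the latent code close to the prior.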
4.2.2 Deterministic predictor (M1 model)
After unsupervised training, we have the latent distribution q(z|x). For the supervised learning, to obtain the target y, we devise two predicting models, one deterministic and one probabilistic.
As a naïve approach, we can build a deterministic predictor consisting of two predicting steps:
Step 1: obtain the means of the latent variables
μ = Enc(x)  (14)
where Enc can be regarded as the encoder of the VAE.
Step 2: obtain the final prediction based on the output of Step 1.
ŷ = g(μ)  (15)
where g is a deterministic multilayer perceptron model.
Consequently, the loss function is:
L = (1/N) Σ_{i=1}^{N} (ŷ_i − y_i)²  (16)
Note that the direct input of this predictor is the latent variable z, which introduces the information of the latent distribution into the predictor. Hence, this predictor does not suffer from the problems of conventional neural networks discussed in Section 2.3.
The scheme of the M1 model is summarized in Algorithm 2.
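The two-step M1 prediction of Eqs. (14) and (15) can be sketched as follows (our illustration; the encoder and the MLP predictor are stand-in linear or toy functions with hypothetical sizes, not trained networks):

```python
import numpy as np

rng = np.random.default_rng(2)
n_in, n_latent, n_out = 6, 2, 2   # hypothetical sizes

# Stand-ins for the trained VAE encoder mean head and the MLP predictor.
W_enc = rng.normal(scale=0.1, size=(n_latent, n_in))
W_mlp = rng.normal(scale=0.1, size=(n_out, n_latent))

def encode_mean(x):
    """Step 1, Eq. (14): map the RSSI vector to the latent mean."""
    return W_enc @ x

def predict(mu):
    """Step 2, Eq. (15): map the latent mean to coordinates."""
    return np.tanh(W_mlp @ mu)

rssi = rng.normal(size=n_in)       # a toy "fingerprint"
coords = predict(encode_mean(rssi))
print(coords.shape)  # (2,)
```

Only the predictor is trained on labeled pairs; the encoder is reused from the unsupervised stage, which is where the unlabeled data pays off.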
4.2.3 Probabilistic predictor (M2 model)
The proposed M2 model is more robust to noise.
We apply the chain rule to obtain the factorized conditional distribution:
p(y|x) = ∫ p(y|z) p(z|x) dz  (17)
= E_{z∼q(z|x)}[p(y|z)]  (18)
Since Eq. (18) cannot be solved explicitly, we use the Monte Carlo method to draw samples z^(s), s = 1, …, S. First, we draw the latent variables from the VAE encoder:
z^(s) ∼ q(z|x)  (19)
Then, we draw the predicted values based on:
y^(s) ∼ p(y|z^(s))  (20)
Then, the loss function of the predictor can be written as:
L = −E_{z∼q(z|x)}[log p(y|z)]  (21)
Since q(z|x) is already trained, here we only care about optimizing p(y|z). Thus, the loss function becomes:
L = −(1/M) Σ_{i=1}^{M} (1/S) Σ_{s=1}^{S} log p(y_i | z_i^(s))  (22)
where M is the mini-batch size.
We assume that the likelihood function p(y|z) is a Gaussian distribution with noise variance σ_n².
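The Monte Carlo scheme of Eqs. (19)-(22) can be sketched as follows (our illustration with a toy linear decoder standing in for the trained predictor; the sizes, encoder outputs and noise level are all hypothetical):

```python
import numpy as np

rng = np.random.default_rng(3)
n_latent, n_out, n_samples = 2, 2, 100   # hypothetical sizes; S = 100 samples
sigma_noise = 0.1                        # assumed Gaussian likelihood noise

# Stand-ins for the trained encoder output and the predictor mean function.
mu_z = np.array([0.3, -0.2])
sigma_z = np.array([0.1, 0.1])
W = rng.normal(scale=0.5, size=(n_out, n_latent))

def gaussian_log_pdf(y, mean, sigma):
    """Log-density of y under an isotropic Gaussian N(mean, sigma^2 I)."""
    return np.sum(-0.5 * ((y - mean) / sigma) ** 2
                  - np.log(sigma) - 0.5 * np.log(2.0 * np.pi))

# Eq. (19): draw S latent samples from the encoder distribution.
z_samples = mu_z + sigma_z * rng.standard_normal((n_samples, n_latent))

# Eq. (20): push each sample through the predictor mean.
y_means = z_samples @ W.T                     # shape (S, n_out)

# Eq. (22) for a single target: average the log-likelihood over samples.
y_true = W @ mu_z                             # a consistent toy target
loss = -np.mean([gaussian_log_pdf(y_true, m, sigma_noise) for m in y_means])
print(y_means.shape)  # (100, 2)
```

Averaging over latent samples is what makes M2 probabilistic: prediction uncertainty in z propagates into the loss instead of being collapsed to the mean as in M1.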
The scheme of the M2 model is summarized in Algorithm 3.
5 Experiments and Results
5.1 Sequential Prediction
5.1.1 Dataset Description
For the sequential prediction task, we use a crowdsourced WiFi fingerprint dataset collected in Tampere [20].
5.1.2 CMDRNN architecture overview
The implementation details of our model are given in Table 1. The CNN sub-network consists of three layers: a convolutional layer, a max-pooling layer and a flatten layer. The RNN sub-network includes a hidden layer with 200 neurons. The MDN sub-network has a hidden layer and an output layer. The number of mixed Gaussians in the MDN output layer is 30, and each mixture has 5 parameters: two-dimensional means, diagonal variances and a mixture weight. For the optimizer, we choose RMSProp [24].

Sub-network  Layer  Hyperparameters  Activation function
CNN  convolutional layer  filter number: 100; stride: 2  sigmoid
CNN  max-pooling layer  neuron number: 100  relu
CNN  flatten layer  neuron number: 100  sigmoid
RNN  hidden layer  memory length: 5; neuron number: 200  sigmoid
MDN  hidden layer  neuron number: 200  leaky relu
MDN  output layer  5 * number of mixed Gaussians (5*30)  -
Optimizer: RMSProp; learning rate: 1e-3
5.1.3 Comparison with other methods
To demonstrate the effectiveness of our method, we conduct a series of experiments comparing our CMDRNN model with other deep-learning approaches. The purposes of the experiments are as follows.

Comparing optimizers: Adam vs. RMSProp

Comparing feature detectors: RNN, RNN+MDN, AE+RNN+MDN, CMDRNN

Comparing regressors: RNN, CNN+RNN and CMDRNN

Comparing RNN variants: CMDRNN, CMDLSTM and CMDGRU
Path  RNN  CNN+RNN  RNN+MDN  AE+RNN+MDN  CMDRNN  CMDLSTM  CMDGRU 

Path 1  
Path 2  
The overall results are demonstrated in Table 2.
5.1.4 Optimizers comparison
In [1], it is reported that RMSProp [24] may perform better than the Adam optimizer [17] on very non-stationary tasks. To verify this, we train our algorithm with RMSProp and Adam, respectively. As shown in Fig. 4, the proposed model converges to a lower negative log-likelihood with RMSProp than with Adam. Thus, we choose RMSProp as the optimizer for our model.
Since the input is high dimensional, a sagacious way to deal with it is to incorporate a sub-network into the model for dimension reduction or feature detection. Much previous research adopted autoencoders to reduce the dimension, while we argue that the more appropriate choice for our task is a one-dimensional CNN. To prove this, we test three different models: one without a feature-detecting structure, one using an autoencoder, and one using a 1D CNN (the proposed model). The autoencoder model has the structure {hidden neurons: ; hidden neurons: ; code size: ; hidden neurons: ; hidden neurons: }.
5.2 Semisupervised learning for location recognition
5.2.1 Dataset description
For the validation dataset, we use the UJIIndoorLoc dataset [25], which is similar to the Tampere dataset. The input dimension of the UJIIndoorLoc dataset is 520. The RSSI values of the detected WAPs range from −104 dB to 0 dB, and the RSSI values of undetected WAPs are set to 100. The coordinates are given in longitudes and latitudes, and we use scaled values in the experiments. The dataset contains about 21,000 instances in total.
5.2.2 Experimental results
Different from other supervised methods, we make use of all the available data, both labeled and unlabeled. Fig. 8 demonstrates the distribution of the latent variable z.
Labeled data  kNN  GP  MDN (2)  MDN(5)  BNN  M1  M2 

2%  
5%  
10%  
20%  
30%  
50%  
80%  
For the experimental setup, we use different portions of labeled data, ranging from 2% to 80%. As baselines, we use kNN, GP, an MDN with 2 mixtures, denoted MDN(2), an MDN with 5 mixtures, denoted MDN(5), and a BNN.
From the results, we can see that the proposed models, M1 and M2, provide satisfying results even when labeled data is scarce. The prediction accuracy improves as the amount of labeled data increases, and the proposed models outperform the other methods. Through the experiments, we also find that, compared to the other methods, the proposed models have the following advantages:

Compared to GPs, the proposed models are less computationally expensive.

Compared to MDNs, the proposed models are more computationally stable.

Compared to BNNs, the proposed models are more flexible to complex distributions.
6 Conclusions and perspectives
In this paper, we tackle the WiFi fingerprint-based user positioning problem. The first task is to predict the next user location from the current WiFi fingerprints. In contrast with existing approaches, our solution is a hybrid deep learning model composed of three deep neural networks: a CNN, an RNN and an MDN. This unique deep architecture combines the strengths of all three deep learning models, which enables us to recognize and predict user locations with high accuracy.
The second task is a semi-supervised learning problem for accurate user location recognition. To tackle it, we propose a VAE-based semi-supervised learning model, which employs a VAE for the unsupervised learning procedure. Meanwhile, in order to interpret WiFi RSSI values as coordinates, we devise two different predictors, one deterministic and one probabilistic.
Finally, we test our models on real-world datasets. For both tasks, the results verify the effectiveness of our approaches and show their superiority over other deep-learning-based methods.
In future work, we will explore other deep generative models for potential applications in this research area. For instance, normalizing flows are a potential approach to improving the performance.
References
 [1] (2017) Wasserstein gan. arXiv preprint arXiv:1701.07875. Cited by: §5.1.4.
 [2] (2006) Pattern recognition and machine learning. Springer Science+ Business Media. Cited by: §2.3.
 [3] (1994) Mixture density networks. Cited by: §2.3, §4.1.3.
 [4] (2015) A comparative study on machine learning algorithms for indoor positioning. In 2015 International Symposium on Innovations in Intelligent SysTems and Applications (INISTA), pp. 1–8. Cited by: §2.1.
 [5] (2016) Exploiting machine learning techniques for location recognition and prediction with smartphone logs. Neurocomputing 176, pp. 98–106. Cited by: §1.
 [6] (2014) Empirical evaluation of gated recurrent neural networks on sequence modeling. arXiv preprint arXiv:1412.3555. Cited by: §4.1.2.
 [7] (2016) Clustering benefits in mobilecentric wifi positioning in multifloor buildings. In 2016 International Conference on Localization and GNSS (ICLGNSS), pp. 1–6. Cited by: §2.1.
 [8] (1990) Finding structure in time. Cognitive science 14 (2), pp. 179–211. Cited by: §4.1.2.
 [9] (2007) Wifislam using gaussian process latent variable models.. In IJCAI, Vol. 7, pp. 2480–2485. Cited by: §2.1.
 [10] (1999) Learning to forget: continual prediction with lstm. Cited by: §4.1.2.
 [11] (2006) Gaussian processes for signal strengthbased location estimation. In Proceeding of robotics: science and systems, Cited by: §2.1.

 [12] (2015) Probabilistic backpropagation for scalable learning of bayesian neural networks. In International Conference on Machine Learning, pp. 1861–1869. Cited by: §2.3.
 [13] (2006) Reducing the dimensionality of data with neural networks. science 313 (5786), pp. 504–507. Cited by: §2.2.
 [14] (2019) Recurrent neural networks for accurate rssi indoor localization. arXiv preprint arXiv:1903.11703. Cited by: §1, §2.2.
 [15] (2018) CNN based indoor localization using rss timeseries. In 2018 IEEE Symposium on Computers and Communications (ISCC), pp. 01044–01049. Cited by: §2.2.
 [16] (2018) A scalable deep neural network architecture for multibuilding and multifloor indoor localization based on wifi fingerprinting. Big Data Analytics 3 (1), pp. 4. Cited by: §2.2.
 [17] (2014) Adam: a method for stochastic optimization. arXiv preprint arXiv:1412.6980. Cited by: §5.1.4.
 [18] (2013) Autoencoding variational bayes. arXiv preprint arXiv:1312.6114. Cited by: §2.3, §4.2.1.
 [19] (1998) Gradientbased learning applied to document recognition. Proceedings of the IEEE 86 (11), pp. 2278–2324. Cited by: §2.2, §4.1.1.
 [20] (2017) Crowdsourced wifi database and benchmark software for indoor positioning. Data set], Zenodo. doi 10. Cited by: §5.1.1.
 [21] (2017) Loweffort place recognition with wifi fingerprints using deep learning. In International Conference Automation, pp. 575–584. Cited by: §2.2.
 [22] (2003) Gaussian processes in machine learning. In Summer School on Machine Learning, pp. 63–71. Cited by: §2.1.
 [23] (2019) A novel convolutional neural network based indoor localization framework with wifi fingerprinting. IEEE Access 7, pp. 110698–110709. Cited by: §2.2.
 [24] (2012) Lecture 6.5rmsprop: divide the gradient by a running average of its recent magnitude. COURSERA: Neural networks for machine learning 4 (2), pp. 26–31. Cited by: §5.1.2, §5.1.4.
 [25] (2014) UJIIndoorLoc: a new multibuilding and multifloor database for wlan fingerprintbased indoor localization problems. In 2014 international conference on indoor positioning and indoor navigation (IPIN), pp. 261–270. Cited by: §5.2.1.
 [26] (2015) Comprehensive analysis of distance and similarity measures for wifi fingerprinting indoor positioning systems. Expert Systems with Applications 42 (23), pp. 9263–9278. Cited by: §2.1.
 [27] (2015) Gaussian process assisted fingerprinting localization. IEEE Internet of Things Journal 3 (5), pp. 683–690. Cited by: §2.1.
 [28] (2017) Modeling user activity patterns for nextplace prediction. IEEE Systems Journal 11 (2), pp. 1060–1071. Cited by: §1.