1 Introduction
Feature-based (i.e. fingerprinting-based) indoor positioning systems (FIPSs), one of the promising indoor positioning solutions, have been proposed using various types of features (e.g. WLAN/BLE signal strengths (Padmanabhan et al., 2000; Youssef and Agrawala, 2008; Zhuang et al., 2016), geomagnetic field strengths (He and Shin, 2018) or visible patterns (Guan et al., 2016)) for providing indoor location-based services (LBSs) to pedestrians (Brena et al., 2017; He and Chan, 2016; Pei et al., 2016). The positioning accuracy of state-of-the-art FIPSs using the received signal strength (RSS) of WLAN access points (APs) is in the range of a few meters (Mautz, 2012). This is adequate for pedestrian indoor positioning and navigation in many cases. However, unexpected and unacceptably large errors (e.g. 20 m in horizontal coordinates (Torres-Sospedra et al., 2017)) can be observed in real environments. They jeopardize the practical usability of FIPSs (Wu et al., 2017; Torres-Sospedra and Moreira, 2017). Such large errors may be caused by large deviations of the measured or stored feature values when performing the location estimation (Kaemarungsi and Krishnamurthy, 2012).
In order to benefit from the attractive characteristics of FIPSs while mitigating large errors, the trend is to combine feature-based positioning with other techniques. Such hybrid approaches combine the feature-based information with e.g. pedestrian dead reckoning (PDR) (Li et al., 2016), map matching (Wang et al., 2015, 2012) or infrared ranging (Bitew et al., 2015). In addition, Bayes filtering methods, such as Kalman filters or particle filters, are used to improve the estimated trajectory of pedestrians by combining the measurements with assumptions on the user’s motion (Li et al., 2016; Röbesaat et al., 2017). Merging different positioning solutions may help mitigate the impact of large errors of individual observations on the quality of a specific type of LBS. However, such approaches require either deploying additional infrastructure or providing extra information (e.g. the indoor map). It would be useful to detect or mitigate large errors in FIPSs using only intrinsically available data. This has attracted little research attention in the past, see e.g. (Wu et al., 2017; Torres-Sospedra and Moreira, 2017; Lemic et al., 2019), and is the motivation for the present contribution.
We base our approach on the variability of the feature values at each individual location. Feature values measured during the positioning stage are snapshots affected by noise. Even if the expected value of a feature has not changed since the data collection for generating the reference fingerprint map (RFM), this noise may cause the measured value to be closer to the RFM value at a different position than to the one at the correct position. It is therefore important to take the noise into account when assessing the similarity of measured and stored feature values. We facilitate this by storing the empirical standard deviations (STDs) in the RFM, which is generated during the offline phase to represent the relationship between locations and their associated features. The variability is estimated by empirically analyzing the spatial distribution of the raw data (e.g. RSS values) included in the RFM. This yields an extended representation of the RFM, which contains not only the spatially smoothed feature values but also the location-wise estimated STD of each individual feature (see Section 4). These values can then be used to mitigate the impact of large errors in FIPSs.
To this end, we propose a weighted dissimilarity measure that quantifies the difference between the online measured features and the features stored in the RFM, adapting the contribution of each individual feature according to its estimated STD (see Section 5.1). The positioning process is carried out iteratively because the STDs of the online measured features can only be retrieved from the RFM for a known location, so the user’s location must first be assumed (see Section 5.2). Beyond the use discussed further in this paper, the location-dependent standard deviations can also be employed to identify (large) changes of features that may require an update of the RFM, see e.g. (He and Chan, 2016; Tao and Zhao, 2018; He et al., 2016).
The remainder of the paper is organized as follows: Section 2 summarizes the work related to reducing large errors in FIPSs. The fundamentals of feature-based positioning are briefly described in Section 3. The robust estimation of the variability of the RFM and its application to positioning are presented in Sections 4 and 5, respectively. Finally, the evaluation of the variability estimation and of the positioning performance using the iterative scheme is presented in Section 6 for a real-world dataset.
2 Related work
Herein we focus on publications that address the detection and reduction of large errors in FIPSs. We refer interested readers to (Mautz, 2012; He and Chan, 2016; Brena et al., 2017) for more general information about indoor positioning. Comprehensive comparisons of different feature-based indoor positioning algorithms using various similarity/dissimilarity metrics are available in (Retscher and Joksch, 2016; Torres-Sospedra et al., 2015; Minaev et al., 2017). A short review of the methods used for generating the RFM can be found in e.g. He and Chan (2016) and Zhou and Wieser (2019).
Torres-Sospedra and Moreira (2017) provide a detailed analysis of the sources of large errors when employing deterministic feature-based positioning approaches (e.g. NN). The analysis is based on simulations of different indoor scenarios. The authors consider the influence of several factors on the positioning error, such as the quantization error of signal acquisition, the density of the reference measurements, and the selected dissimilarity metric. The analysis shows that large observation errors mostly occur at locations where both the mean and the maximum value of the RSS are low. However, the authors do not report a validation of their analysis in a real deployed FIPS. On a related note, Kaemarungsi and Krishnamurthy (2012) propose to simply disregard features with a large standard deviation when estimating the user’s position.
Only a few works focus on reducing or estimating the positioning errors based on an analysis of the RFM (Torres-Sospedra and Moreira (2017) provide a complete discussion of works that reduce large positioning errors with the support of other technologies, e.g. PDR or Bayes filtering). Wu et al. (2017) introduce a weighted dissimilarity measure by computing a discriminative indicator for each feature according to the log-distance path loss model. However, they do not take into account the variability of the online measured features, which affects the estimation of this discriminative indicator. In (Lemic et al., 2019) and (Li et al., 2019), the authors propose different regression models (e.g. neural networks, random forests, or Gaussian processes) for estimating the positioning errors and uncertainties, which can be used to improve the tracking of a pedestrian’s trajectory. Even though this is not the focus of these papers, their results suggest that regression-based error prediction models cannot help to mitigate large errors because the predicted errors have a large uncertainty.
Compared to previous publications, we carry out the variability analysis of the RFM using a kinematically collected dataset. This dataset includes not only the noise originating from short-term fluctuations of the features measured by a mobile device, but also the noise introduced by the motion status (e.g. moving speed and heading) of the mobile device. This setup is closer to the realistic situation of positioning and tracking pedestrians. The variability is estimated solely from the raw RFM. It is later used to reduce large errors by means of an iterative scheme with the weighted dissimilarity measure in the online positioning phase.
3 Feature-based positioning
We start this section by introducing the fundamental concepts of feature-based positioning and then briefly describe the process of kinematically collecting the RFM.
3.1 Fundamental concepts
Each measured feature is uniquely identifiable and has a measured value. For example, the signal from an AP can be identified by its media access control (MAC) address and is associated with an RSS. Features are thus formulated as pairs of attribute $a$ and value $v$, i.e. $f = (a, v)$. A measurement (i.e. fingerprint) taken by the user at the location/time $\mathbf{x}_t$ consists of a set of measured features, i.e. $\mathcal{F}_t = \{(a_i, v_i) \mid a_i \in \mathcal{A},\ i = 1, \dots, n_t\}$. Here $\mathcal{A}$ is the complete set of the identifiers of all available features and $n_t$ ($n_t \le |\mathcal{A}|$) is the number of features observed by the user at $\mathbf{x}_t$. The set of attributes of $\mathcal{F}_t$ is defined as $\mathcal{A}_t = \{a_1, \dots, a_{n_t}\}$ ($\mathcal{A}_t \subseteq \mathcal{A}$). The positioning process consists of inferring the estimated user location $\hat{\mathbf{x}}_t = g(\mathcal{F}_t) \in \mathbb{R}^d$ as a function of the measurement $\mathcal{F}_t$ and the RFM $\mathcal{R}$. Here $g$ is a suitable mapping algorithm from the measurement to the location (the RFM is omitted from the arguments of $g$ for simplicity) and $d$ (e.g. $d = 2$) is the dimension of the coordinates. $\mathcal{R}$ represents the relationship between the location $\mathbf{x}$ and the measurement $\mathcal{F}$, i.e. $\mathcal{R}: \mathbf{x} \mapsto \mathcal{F}(\mathbf{x})$ throughout the region of interest (RoI) $\Omega$. If the RFM is discretely represented, we denote it as $\mathcal{R} = \{(\mathbf{x}_j, \mathcal{F}_j)\}_{j=1}^{N}$ (where $\mathbf{x}_j \in \Omega$). A discrete RFM can be obtained e.g. by collecting fingerprints at different known or independently measured locations within the RoI $\Omega$.
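To make these concepts concrete, the following minimal Python sketch represents a fingerprint as a dictionary from attribute (feature identifier) to value, and a discrete RFM as a list of (location, fingerprint) entries. The class and function names, the AP identifiers and the values are hypothetical and chosen for illustration only. Two-dimensional coordinates are assumed.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, List, Tuple

# A fingerprint maps feature attributes (e.g. AP MAC addresses) to measured values (e.g. RSS in dBm).
Fingerprint = Dict[str, float]


@dataclass
class ReferencePoint:
    """One entry of a discrete RFM: a location in the RoI and its fingerprint."""
    location: Tuple[float, float]  # 2D coordinates (d = 2 assumed)
    fingerprint: Fingerprint


def attribute_set(fp: Fingerprint) -> FrozenSet[str]:
    """Set of attributes (feature identifiers) observed in a fingerprint."""
    return frozenset(fp)


# Hypothetical identifiers and values for illustration only.
all_attributes = {"ap:01", "ap:02", "ap:03"}
measurement: Fingerprint = {"ap:01": -52.0, "ap:03": -78.0}
rfm: List[ReferencePoint] = [
    ReferencePoint((0.0, 0.0), {"ap:01": -50.0, "ap:02": -70.0}),
    ReferencePoint((3.5, 1.0), {"ap:01": -60.0, "ap:03": -75.0}),
]

# The observed attributes are always a subset of all available ones.
assert attribute_set(measurement) <= all_attributes
```

Using dictionaries keeps features that are absent from a fingerprint implicit. This matters later, when shared and unshared features are treated differently.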
3.2 Kinematically acquired RFM
The kinematically obtained dataset used as the basis for the RFM herein has already been employed in (Zhou and Wieser, 2019). It was acquired using a mobile device (Nexus 6P). The ground-truth location of the device was continuously measured with mm- to cm-level accuracy by a total station tracking a mini prism mounted on top of the device. This procedure makes it possible to simultaneously obtain accurate reference coordinates and the fingerprinting data collected by a pedestrian. The measurements were obtained at arbitrary locations along the trajectory of the pedestrian. This is because the data acquisition on the mobile phone is passively triggered by the status of measurable features (e.g. the arrival of new features or the change of feature values) (Schulz et al., 2018). By carrying out a thorough site survey, all the collected measurements and their tracked trajectories were merged and used to generate the raw RFM. Herein we use this dataset as the basis of our analysis. More details of its acquisition and processing can be found in (Zhou and Wieser, 2019).
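Merging event-triggered fingerprints with the tracked trajectory can be illustrated as follows. This is a minimal sketch under our own assumptions: the clocks of the phone and the total station are synchronized, and positions are linearly interpolated between tracked epochs. The actual processing is described in (Zhou and Wieser, 2019), and `tag_fingerprints` is a hypothetical helper.

```python
import numpy as np


def tag_fingerprints(track_t, track_xy, fp_t):
    """Assign a 2D position to each fingerprint timestamp by linear interpolation.

    track_t:  increasing timestamps of the total-station track (s)
    track_xy: tracked prism positions, shape (n, 2)
    fp_t:     timestamps at which the phone recorded fingerprints (s)
    Fingerprints recorded outside the tracked interval are dropped, because
    np.interp would otherwise clamp them to the first/last tracked position.
    """
    track_t = np.asarray(track_t, float)
    track_xy = np.asarray(track_xy, float)
    fp_t = np.asarray(fp_t, float)
    inside = (fp_t >= track_t[0]) & (fp_t <= track_t[-1])
    t = fp_t[inside]
    xy = np.column_stack([
        np.interp(t, track_t, track_xy[:, 0]),
        np.interp(t, track_t, track_xy[:, 1]),
    ])
    return xy, inside
```

For example, consider a track at t = 0, 1 and 2 s passing (0, 0), (1, 0) and (1, 2). A fingerprint recorded at t = 1.5 s is then assigned to (1, 1).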
Fig. 3(a) and 3(b) show examples of the raw data collected for RFM generation, namely the RSS values from two WLAN APs. These are signals of opportunity: the APs had been installed to provide Internet access, and their signals are available anyway when used for indoor positioning. The raw measurements were acquired at arbitrary locations throughout the RoI, which consists of several rooms and corridors within an office building.
4 Robust estimation of the feature variability
To estimate the noise of the measurable features at each location throughout the RoI, the features would have to be measured (ideally consecutively) multiple times at each location. However, even for a relatively sparse set of reference points throughout the RoI, this would be prohibitively time-consuming and labor-intensive. We relax this requirement by assuming that the expected feature values change only little within a local spatial neighborhood. Therefore, instead of estimating the standard deviation from the data collected at a single location only, we use all feature values obtained within a certain radius about a chosen reference location. The corresponding data are identified within the time series of data recorded while the user walked through the RoI. We denote these fingerprints as kinematically collected ones. The standard deviation can still be estimated if a sufficient number of measurements is obtained in the proximity of each reference location (see Fig. 1). The measurements associated with an individual reference location include data obtained consecutively within a short time at slightly different positions. They also include data collected a certain time interval apart (e.g. half an hour), because the user passed most locations several times during the entire data collection process. The resulting standard deviations therefore also reflect the temporal variability of the signals and the impact of user motion during measurement, both of which will also apply during the positioning stage. We thus consider the kinematically collected RFM data suitable for the variability analysis.
Under the assumption that the expected value of each feature is locally obtainable, the location-wise STD of each feature can be approximated from the measurements associated with the neighborhood of a given reference location. More formally, we estimate the STD $\sigma_{j,i}$ of the $i$th ($i = 1, \dots, n_j$) feature at the reference location $\mathbf{x}_j$ in the RFM $\mathcal{R}$. These estimated STDs are later included in the extended representation of the RFM, i.e. $\mathcal{R}^\ast = \{(\mathbf{x}_j, \mathcal{F}_j, \boldsymbol{\sigma}_j)\}_{j=1}^{N}$ with $\boldsymbol{\sigma}_j = (\sigma_{j,1}, \dots, \sigma_{j,n_j})$. We start the estimation of the feature values for the RFM by applying a spatial median filter to the raw measurements in order to mitigate potential outliers. We proceed with kernel smoothing (KS), which reduces the impact of noise and yields a quasi-continuous representation of the RFM by interpolation. It allows us to approximate the expected value of the measurements at any location throughout the RoI. We perform spatial filtering and KS in two separate steps because KS is not robust. The preceding filtering allows us to remove outliers before filtering noise and interpolating. Finally, the location-wise STD of each measurable feature is calculated as the empirical standard deviation of the raw measurements (before filtering and kernel smoothing) within a neighborhood of the specific reference point. In the following, the individual steps of the algorithm are explained in more detail.
As can be seen in Fig. 3(a) and 3(b), the measured feature values in the neighborhood of a given location may vary significantly. This is particularly visible around locations with very low signal strength values, i.e. values close to the sensitivity limit of the mobile devices. In order to mitigate the impact of these variations on the representation of the RFM, we apply spatial filtering. This replaces the originally measured value $v_i$ of feature $a_i$ at the given location $\mathbf{x}_j$ by the median of the values measured within the neighborhood $\mathcal{N}_j$ of $\mathbf{x}_j$. We define this neighborhood as the set of measurements collected at the (up to) $K$ locations closest to $\mathbf{x}_j$ that also lie within the given radius $r$ about $\mathbf{x}_j$ (see the schematic in Fig. 2).
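The spatial median filter can be sketched in Python as follows. We assume that the raw RFM is given as an array of measurement locations and an array of feature values, with `NaN` marking features that were not measured. `spatial_median_filter` is a hypothetical name. The neighborhood follows the definition above (at most K closest locations, all within radius r) and is implemented with SciPy's k-d tree, whose `query` reports missing neighbors with an infinite distance.

```python
import warnings

import numpy as np
from scipy.spatial import cKDTree


def spatial_median_filter(xy, values, K, r):
    """Replace each raw value by the median over its spatial neighborhood.

    xy:     measurement locations, shape (n, 2)
    values: raw feature values, shape (n, n_features); NaN = feature not measured
    K:      maximum number of closest locations in the neighborhood
    r:      neighborhood radius (same unit as xy)
    """
    xy = np.asarray(xy, float)
    values = np.asarray(values, float)
    tree = cKDTree(xy)
    filtered = np.full_like(values, np.nan)
    for j in range(len(xy)):
        # Up to K nearest locations; neighbors beyond r are returned with dist = inf.
        dist, idx = tree.query(xy[j], k=K, distance_upper_bound=r)
        dist, idx = np.atleast_1d(dist), np.atleast_1d(idx)
        nb = idx[np.isfinite(dist)]
        with warnings.catch_warnings():
            # A feature absent in the whole neighborhood yields NaN (with a RuntimeWarning).
            warnings.simplefilter("ignore", category=RuntimeWarning)
            filtered[j] = np.nanmedian(values[nb], axis=0)
    return filtered
```

Each location is always part of its own neighborhood, so isolated measurements keep their original value. In dense regions, a single outlier is replaced by the median of its neighbors.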
In the second step, we estimate a continuous RFM using KS in order to be able to retrieve the expected measurements at any location within the RoI (Berlinet and Thomas-Agnan, 2011). Although KS can reduce noise by implicit filtering, it is not robust. Its results could therefore be severely contaminated by outliers in the measured features (Fig. 3(a) and 3(b)). For this reason, we apply KS to the median-filtered data rather than to the original ones. Because the structure of the indoor region is not taken into account, KS tends to smooth the RFM across discontinuities. Examples are large changes of feature values or transitions from feature presence to feature absence over short distances, e.g. because of walls. This over-smoothing degrades the quality of the RFM for certain features at certain locations. This may be relevant for positioning (Bong and Kim, 2012), especially when using radio frequency signals such as WLAN, whose propagation is highly influenced by obstacles. Herein we employ a modified version of KS that uses only a subset of the data in the neighborhood of a given location for approximating the expected feature values (Berlinet and Thomas-Agnan, 2011). This alleviates the impact of over-smoothing while also reducing the computational complexity (Berlinet and Thomas-Agnan, 2011; Cormen et al., 2009). A detailed analysis of the over-smoothing problem, of the computational complexity of KS, and of a discontinuity-preserving approach to KS is beyond the scope of this paper and left for future work.
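A minimal sketch of the smoothing step follows. It also covers the residual-based STD estimate, which is described in the paragraph on the distribution of the measured noise. Several choices here are our own assumptions rather than details from the text:

- a Gaussian (Nadaraya-Watson) kernel with bandwidth `bandwidth`;
- the `n_neighbors` closest median-filtered samples as the "subset" of the data;
- the factor 1.4826, which makes the MAD a consistent STD estimate for normally distributed residuals.

Both function names are hypothetical.

```python
import warnings

import numpy as np
from scipy.spatial import cKDTree


def knn_kernel_smooth(xy, filtered, query_xy, n_neighbors, bandwidth):
    """Nadaraya-Watson estimate of the expected feature values at query_xy.

    Only the n_neighbors closest (median-filtered) samples are used, which limits
    smoothing across distant data and reduces the computational cost.
    NaN entries (feature not measured) are ignored per feature.
    """
    xy = np.asarray(xy, float)
    filtered = np.asarray(filtered, float)
    k = min(n_neighbors, len(xy))
    dist, idx = cKDTree(xy).query(np.atleast_2d(query_xy), k=k)
    dist = dist.reshape(len(dist), -1)           # shape (q, k), also when k == 1
    idx = idx.reshape(len(idx), -1)
    w = np.exp(-0.5 * (dist / bandwidth) ** 2)[..., None]  # Gaussian kernel, (q, k, 1)
    vals = filtered[idx]                         # (q, k, n_features)
    mask = np.isfinite(vals)
    num = np.where(mask, w * vals, 0.0).sum(axis=1)
    den = np.where(mask, w, 0.0).sum(axis=1)
    with np.errstate(invalid="ignore", divide="ignore"):
        return num / den                         # NaN where no neighbor measured the feature


def mad_std(raw_nb, expected_nb, scale=1.4826):
    """Robust per-feature STD from absolute residuals of raw data w.r.t. the smoothed RFM.

    raw_nb, expected_nb: arrays of shape (n_neighborhood, n_features) holding the raw
    values in the neighborhood of a reference point and the smoothed RFM values at
    the same locations.
    """
    res = np.abs(np.asarray(raw_nb, float) - np.asarray(expected_nb, float))
    with warnings.catch_warnings():
        warnings.simplefilter("ignore", category=RuntimeWarning)  # all-NaN feature -> NaN
        return scale * np.nanmedian(res, axis=0)
```

Restricting the kernel to the closest samples bounds the cost of each query by the neighbor count. It also reduces, but does not remove, smoothing across walls.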
The distribution of the measured noise shown in Fig. 4 clearly suggests three things about the variances. They are location-dependent, they differ between features, and they cannot be represented as just a function of the feature value or of the geometric distance from a single point per feature (e.g. the AP location). We therefore propose to model the STD as a location-dependent quantity, independently for each individual feature. To this end, we compute the absolute residuals of the raw data with respect to the spatially filtered and kernel-smoothed RFM in order to obtain a robust estimate of the STD. Consider the reference location $\mathbf{x}_j$ in the RFM $\mathcal{R}$. The STD $\sigma_{j,i}$ of the $i$th ($i = 1, \dots, n_j$) feature contained in $\mathcal{F}_j$ is computed as the median absolute deviation (MAD) of the measured feature values associated with the locations in $\mathcal{N}_j$, i.e. the support set used for spatial filtering. The extended representation of the RFM with the estimated STDs at the reference locations is denoted as $\mathcal{R}^\ast$. It is also represented continuously using KS, so that expected feature values and their STDs can be retrieved at any location $\mathbf{x} \in \Omega$.
5 Iterative scheme for online positioning
Inspired by the finding that the variability of the features has a large effect on the positioning error (Kaemarungsi and Krishnamurthy, 2012), we employ the robustly estimated STDs of the features to reduce the impact of uncertain feature values when calculating the position estimate. We construct a weighting scheme that reduces the weight of a feature with a high STD relative to features with a low STD. Consequently, a discrepancy between the online measured and the expected value of a feature with a low STD has more impact on the dissimilarity measure, and thus on the estimated position, than the same discrepancy for a feature with a high STD. This dissimilarity measure is used to identify which subset of reference locations is taken into account when inferring the user’s location with deterministic feature-based positioning algorithms such as NN.
5.1 Weighted dissimilarity measure
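Given the online measured features $\mathcal{F}_t$ at the location $\mathbf{x}_t$, the weighted dissimilarity measure $D_w(\mathcal{F}_t, \mathcal{F}_j)$ between $\mathcal{F}_t$ and the $j$th reference fingerprint $\mathcal{F}_j$ stored in the RFM $\mathcal{R}$ is computed as: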
(1) 
where $\delta(\cdot, \cdot)$ is the selected dissimilarity measure (e.g. Minkowski distance) and $\phi$ is the missing value indicator (e.g. -110 dBm). This equation represents a compound dissimilarity measure (CDM) as defined in (Zhou and Wieser, 2018). Correspondingly, $\alpha$ and $\beta$ are hyperparameters regulating the contribution of mutually unshared features to the dissimilarity measure. However, the CDM herein uses a new distance metric, not covered in (Zhou and Wieser, 2018): it weights the individual features location-wise instead of only according to their respective observability. $w_{t,i}$ is the weight of the $i$th feature at the location/time $\mathbf{x}_t$. It is computed from the variability derived from the estimated expected measurement at $\mathbf{x}_t$. If the $i$th feature in $\mathcal{F}_t$ ($a_i \in \mathcal{A}_t$) is not measurable at location $\mathbf{x}_t$, its weight is set to the minimum of the weights of the measurable features, which reduces its impact on the location estimate. We selected the softmax function (Murphy, 2012; Gal and Ghahramani, 2016)
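$$w_{t,i} = \frac{\exp(-\lambda\,\sigma_{t,i})}{\sum_{l=1}^{n_t} \exp(-\lambda\,\sigma_{t,l})} \qquad (2)$$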
to calculate the weight of each feature from the estimated STDs. Here $\sigma_{t,i}$ is the estimated STD of the $i$th feature at $\mathbf{x}_t$ and $\lambda$ is the scale factor that adapts the concentration of the softmax function. The denominator normalizes the weights and makes the solution invariant to the scale of the weights. We also tried the weight function frequently employed in weighted least-squares, which is motivated by maximum likelihood estimation with normally distributed observations: each weight is set proportional to the inverse of the respective variance. However, the resulting accuracy was worse than with the softmax function.
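The softmax weighting and the inverse-variance alternative mentioned above can be compared with a few lines of NumPy. This sketch assumes that the softmax is applied to the negative, scaled STDs, i.e. $w_i \propto \exp(-\lambda \sigma_i)$. This form gives features with a higher STD a lower weight, as intended. The function names are hypothetical.

```python
import numpy as np


def softmax_weights(sigma, lam):
    """w_i = exp(-lam * sigma_i) / sum_l exp(-lam * sigma_l)."""
    z = -lam * np.asarray(sigma, float)
    z -= z.max()                 # shift for numerical stability; weights are unchanged
    e = np.exp(z)
    return e / e.sum()


def inverse_variance_weights(sigma):
    """Weights proportional to 1 / sigma_i**2, normalized to sum to one."""
    w = 1.0 / np.asarray(sigma, float) ** 2
    return w / w.sum()


sigma = np.array([1.0, 2.0, 6.0])       # hypothetical STDs in dB
print(softmax_weights(sigma, lam=2.0))
print(inverse_variance_weights(sigma))
```

With λ = 2.0, the value used in Section 6.2, two features whose STDs differ by 1 dB have a softmax weight ratio of exp(2) ≈ 7.4, regardless of the absolute STD level. The inverse-variance weights instead depend on the ratio of the STDs.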
The weighted dissimilarity measure is used to identify the candidate locations, whose dissimilarity values are smallest among all reference fingerprints stored in the RFM. We estimate the user’s location using NN or weighted NN by averaging or weighted averaging (e.g. inversely proportional to the value of dissimilarity measure) of the candidate locations. More details about NN and weighted NN can be found in e.g. (Padmanabhan et al., 2000; Zhou and Wieser, 2019).
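The following sketch illustrates how a weighted dissimilarity of this kind can drive (weighted) NN estimation. The composition of `weighted_cdm` is our hypothetical illustration and not necessarily the exact form of Eq. (1):

- Shared features enter a weighted Euclidean distance.
- Features observed on only one side are compared against the missing value indicator and scaled by `alpha` (only in the measurement) or `beta` (only in the reference fingerprint).
- Features without a weight receive the minimum weight, mirroring the rule for features not measurable at the assumed location.

The defaults α = β = 3.0 and φ = -110 dBm follow values mentioned in the text.

```python
import numpy as np


def weighted_cdm(meas, ref, weights, alpha=3.0, beta=3.0, phi=-110.0):
    """Hypothetical weighted compound dissimilarity between two fingerprints.

    meas, ref: dicts attribute -> value (e.g. RSS in dBm)
    weights:   dict attribute -> weight for the measured features
    """
    w_min = min(weights.values()) if weights else 1.0

    def w(a):
        return weights.get(a, w_min)             # unknown weight -> minimum weight

    shared = meas.keys() & ref.keys()
    only_meas = meas.keys() - ref.keys()
    only_ref = ref.keys() - meas.keys()
    d_shared = sum(w(a) * (meas[a] - ref[a]) ** 2 for a in shared)
    d_meas = sum(w(a) * (meas[a] - phi) ** 2 for a in only_meas)
    d_ref = sum(w(a) * (ref[a] - phi) ** 2 for a in only_ref)
    return float(np.sqrt(d_shared + alpha * d_meas + beta * d_ref))


def knn_estimate(meas, rfm_xy, rfm_fps, weights, k=1, weighted=False):
    """Average (or inverse-dissimilarity weighted average) of the k best reference locations."""
    d = np.array([weighted_cdm(meas, fp, weights) for fp in rfm_fps])
    best = np.argsort(d)[:k]
    xy = np.asarray(rfm_xy, float)[best]
    if not weighted:
        return xy.mean(axis=0)
    inv = 1.0 / (d[best] + 1e-9)                 # small offset avoids division by zero
    return (inv[:, None] * xy).sum(axis=0) / inv.sum()
```

With k = 1, weighted and unweighted averaging coincide. The inverse-dissimilarity option only matters for k > 1.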
5.2 Iterative scheme
Position estimation requires the weight of each feature. However, the weight depends on the standard deviation, which in turn varies with location, and the required value can only be extracted from the RFM once the location is known. We thus carry out the positioning iteratively:

i) assume a position (initialization);
ii) retrieve the STDs from the RFM, calculate the weights and estimate the position (update step);
iii) repeat ii) until a termination condition is fulfilled.

These steps are explained in more detail in the following subsections.
5.2.1 Determination of the initial location
The initial location of the user is used to derive the weights for the first iteration. One straightforward way of initializing is to choose the location estimated by the standard NN without the weighted dissimilarity measure (i.e. the traditional NN). When processing real-world data, we found that the solution obtained at the termination of the iterative process is quite stable even when the location is initialized randomly (see Section 6). This suggests that the positioning performance does not depend strongly on the choice of the initial location.
5.2.2 Update step
At the $m$th iteration ($m \ge 1$), the weights and the dissimilarities are computed from the variability obtained at the location $\hat{\mathbf{x}}^{(m-1)}_t$ found at the $(m-1)$th iteration, where $\hat{\mathbf{x}}^{(0)}_t$ denotes the initial location. The weight of the $i$th feature at location/time $\mathbf{x}_t$ and the $m$th iteration is defined as:
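$$w^{(m)}_{t,i} = \frac{\exp\big(-\lambda\,\sigma_i(\hat{\mathbf{x}}^{(m-1)}_t)\big)}{\sum_{l=1}^{n_t} \exp\big(-\lambda\,\sigma_l(\hat{\mathbf{x}}^{(m-1)}_t)\big)} \qquad (3)$$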
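where $\sigma_i(\hat{\mathbf{x}}^{(m-1)}_t)$ is the STD of the $i$th feature at location $\hat{\mathbf{x}}^{(m-1)}_t$. It is retrieved from the extended RFM together with the estimated expected feature values. These updated weights are used to compute the dissimilarity measure defined in (1) and, consequently, to infer the estimated location $\hat{\mathbf{x}}^{(m)}_t$ at the $m$th iteration using e.g. the NN algorithm.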
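One update step can be sketched as follows, assuming that the extended RFM is available as reference locations with per-feature STD dictionaries. As a simplification, the STDs are taken from the reference point closest to the previous estimate rather than from the continuous KS representation. Features not measurable there receive the minimum weight of the measurable ones, as described in Section 5.1. The softmax form exp(-λσ) and the name `update_weights` are assumptions.

```python
import numpy as np
from scipy.spatial import cKDTree


def update_weights(x_prev, meas_attrs, tree, rfm_sigma, lam):
    """Weights for iteration m from the STDs retrieved at the previous estimate x_prev.

    tree:      cKDTree built once on the reference locations of the extended RFM
    rfm_sigma: list of dicts attribute -> STD, one per reference location
    """
    _, j = tree.query(np.asarray(x_prev, float))       # closest reference location
    sigma = rfm_sigma[j]
    known = [a for a in meas_attrs if a in sigma]       # features measurable at x_prev
    if not known:                                       # nothing to weight: uniform weights
        return {a: 1.0 / len(meas_attrs) for a in meas_attrs}
    z = -lam * np.array([sigma[a] for a in known])
    e = np.exp(z - z.max())
    weights = dict(zip(known, e / e.sum()))
    w_min = min(weights.values())
    for a in meas_attrs:
        weights.setdefault(a, w_min)                    # not measurable -> minimum weight
    return weights
```

Building the k-d tree once and passing it in avoids rebuilding it at every iteration. Note that after the minimum-weight rule is applied, the weights no longer sum exactly to one.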
5.2.3 Termination condition
Ideally, the search process should converge to a fixed location. This state is assumed to be reached when the distance between two consecutive location estimates is lower than a given small threshold. We subsequently denote this as the converging state and terminate the iterative process when
$$\big\|\hat{\mathbf{x}}^{(m)}_t - \hat{\mathbf{x}}^{(m-1)}_t\big\| < \epsilon,$$
where $\epsilon$ is the threshold (in m) used in the experimental analysis later on. We found that the iterative process proposed herein sometimes enters a loop in which a (small) subset of locations is repeatedly obtained as estimates in the same sequence. We denote this as the looping state and introduce a second termination condition that is met when this state is recognized. We implement it as a threshold on the distance between the location estimate obtained at the $m$th iteration and those obtained at all previous iterations except the $(m-1)$th. More formally, the second condition is satisfied and the iteration is terminated when
$$\min_{0 \le l \le m-2} \big\|\hat{\mathbf{x}}^{(m)}_t - \hat{\mathbf{x}}^{(l)}_t\big\| < \epsilon.$$
Finally, the number of iterations is limited to a maximum $M_{\max}$ in order to prevent a long or endless search for a solution. If the search is terminated due to this condition, we denote this as the max. state.
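The iteration with its three termination conditions can be sketched as a small driver loop. `update` is a placeholder for one update step: retrieving the STDs at the previous estimate, recomputing the weights and running NN. The same threshold `eps` is assumed for the converging and the looping checks, and all names are hypothetical.

```python
import numpy as np


def iterate_positioning(x0, update, eps, max_iter):
    """Run the iterative scheme and report the termination state.

    update(x_prev) -> new estimate x_m
    Returns (history, state) with state in {"converging", "looping", "max"}.
    """
    history = [np.asarray(x0, float)]
    for _ in range(max_iter):
        x_new = np.asarray(update(history[-1]), float)
        history.append(x_new)
        # Condition 1: two consecutive estimates closer than eps.
        if np.linalg.norm(x_new - history[-2]) < eps:
            return history, "converging"
        # Condition 2: x_new revisits an estimate older than the previous one.
        older = history[:-2]
        if older and min(np.linalg.norm(x_new - h) for h in older) < eps:
            return history, "looping"
    return history, "max"
```

The returned history holds all searched locations. This is what the selection rules for the looping and the max. states operate on.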
Assuming that the iterative process terminates after $M$ iterations, we select or compute the final position estimate $\hat{\mathbf{x}}_t$ as follows, depending on the termination flag (TF), which indicates the respective state:

Converging state: The location estimated at the last iteration is selected as the final estimate of the user’s location, and the TF is set to 0.

Looping state: In this case, the searched locations do not converge to a single point. Two conditions must hold: the number of locations must exceed a certain minimum (e.g. 4), and the locations visited in the looping state must not be farther apart than a chosen maximum (e.g. 0.01 m) (see Fig. 5). If both hold, we use the minimum covariance determinant (MCD) estimator to compute the estimated location of the user from the convex hull of the visited locations. MCD is a highly robust estimator of multivariate location and scatter; we use its scikit-learn implementation (Pedregosa et al., 2011). If the number of points is too low or if they are too far apart from each other, the situation is handled like the max. state. If the looping-state termination condition is met and the MCD is reported as the final estimate, the TF is set to 1.

Max. state: This case actually means that the position estimation using the weighted dissimilarity measure fails, because no position can be found at which the measured features and the predetermined standard deviations are compatible. We can then either report a failure of the algorithm without calculating a solution, or calculate an estimate that ignores the variability information. We have chosen the latter herein. In particular, we determine the final estimate from all searched locations by analyzing the similarity between the fingerprint measured by the user and the expected fingerprints at the searched locations. As the similarity metric, we employ the modified Jaccard index (MJI), which has been used for identifying subregions according to the measurability of features (Zhou and Wieser, 2019). The MJI value between $\mathcal{F}_t$ and the expected fingerprint $\bar{\mathcal{F}}^{(m)}$ at the location searched at the $m$th iteration is computed by:
(4)
where $\mathcal{A}_t$ and $\bar{\mathcal{A}}^{(m)}$ are the sets of the measured features contained in $\mathcal{F}_t$ and $\bar{\mathcal{F}}^{(m)}$, respectively. The estimated user’s location is then the searched location with the largest MJI value. In other words, the searched location with the maximum number of common measurable features is selected as the final estimate of the user’s location.
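The final-estimate rules for the looping and the max. states can be sketched as follows.

For the looping state, the sketch uses scikit-learn's `MinCovDet`, whose fitted `location_` attribute is the robust location estimate. The MCD is applied directly to the visited locations. `min_points` and `max_spread` correspond to the minimum number of locations and the maximum spread described above. Returning `None` signals a fallback to the max. state.

For the max. state, the sketch serves as a simplified stand-in for the MJI. It applies the criterion stated at the end of the paragraph directly: it returns the searched location whose expected fingerprint shares the largest number of measurable features with the user's measurement. Ties go to the earliest searched location; this tie-breaking rule is an assumption.

Both function names are hypothetical.

```python
import numpy as np
from scipy.spatial.distance import pdist
from sklearn.covariance import MinCovDet


def loop_estimate(loop_xy, max_spread, min_points=4, random_state=0):
    """Robust MCD location of the locations visited in a loop, or None (fall back to max. state)."""
    pts = np.asarray(loop_xy, float)
    if len(pts) <= min_points:           # the number of locations must exceed the minimum
        return None
    if pdist(pts).max() > max_spread:    # visited locations too far apart
        return None
    return MinCovDet(random_state=random_state).fit(pts).location_


def max_state_estimate(meas_attrs, searched_xy, expected_attrs):
    """Searched location whose expected fingerprint shares most features with the measurement.

    meas_attrs:     attributes measured by the user
    searched_xy:    locations visited during the iterations (in search order)
    expected_attrs: attribute sets of the expected fingerprints at these locations
    """
    meas = set(meas_attrs)
    common = [len(meas & set(e)) for e in expected_attrs]
    best = max(range(len(common)), key=common.__getitem__)   # first maximum wins ties
    return searched_xy[best], common[best]
```

Because the MCD fits its location to the most concentrated subset of the points, a single stray location in a loop has little influence on the result.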
6 Analysis of the variability estimation and positioning performance
We start this section by presenting the location-wise variability of each individual feature, estimated from the kinematically collected RFM data described in detail in (Zhou and Wieser, 2019). We conclude this section with an analysis of the characteristics of the iteratively searched locations and of the positioning performance of the proposed iterative scheme.
6.1 Results of the variability estimation
Herein we set $K$ and $r$ (in m) to obtain a spatially filtered RFM that visually has an adequate spatial consistency in the neighborhood of each location. Fig. 3(c) and 3(d) clearly show that spatial filtering reduces the large variations contained in the raw RFM to a great extent. For further analysis, we compute the residuals between the raw and the spatially filtered RFM. The residuals are approximately zero-mean and have a location-dependent magnitude, as illustrated in Fig. 3(e) and 3(f). Large residuals occur in two kinds of regions: close to the boundaries of the RoI (e.g. near the walls or corners of rooms and corridors), or at locations where the mobile phone can hardly measure the RSS values. In both cases, the features are very likely affected by obstacles, which also cause locally large variations of the feature values.
Fig. 4 shows the STD values estimated using the MAD of the measured feature values associated with the neighborhood of a given location. As can be seen, each feature has a different variability throughout the RoI, i.e. the STD depends both on the feature and on the location. This is the primary motivation for modeling the variability location-wise for each individual feature, instead of simply expressing it as a function of the measured feature value or as a constant. The regions where the feature values have a higher STD clearly correlate with the local variations of the measured feature values and with the geometry of the building (Figs. 3 and 4). High variances occur where only a few measurements have been collected in the neighborhood region. These are caused by the violation of the assumption that the expected feature values are locally obtainable.
6.2 Results of iterative scheme for positioning
The proposed iterative scheme for feature-based positioning is implemented using the application programming interface of scikit-learn, a widely used machine learning package for Python (Pedregosa et al., 2011). Herein we present the results of the iterative positioning using NN with the weighted Euclidean distance as the dissimilarity metric in the feature space. Several hyperparameters have to be configured. For the weighted dissimilarity measure formulated in Section 5.1, $\alpha$ and $\beta$ are both set to 3.0, following the results reported in (Zhou and Wieser, 2018). The number $k$ of nearest neighbors in the NN algorithm and the scale factor $\lambda$ of the softmax function are empirically set to 1 and 2.0, respectively. Optimizing these parameters for the best positioning performance (e.g. using grid/random search or Bayesian optimization (Wang and de Freitas, 2014)) is left for future work.
Fig. 5 shows several examples of the locations searched during the iterative positioning, with random as well as NN initialization. Each subplot depicts the results of the iterative positioning at a fixed test location: the searched locations (red squares), the final estimate (blue triangle or coral diamond), the estimate of the traditional NN algorithm, and the ground truth (black square). The initial location is not shown, to improve readability. In case of random initialization, the initial location is determined by arbitrarily taking one of the reference locations stored in the RFM. In this case, the choice of the initial location has only a minor impact on the iterative search process because the RoI is relatively small. In addition, our schemes for determining the final estimate of the user’s location from the searched locations do not achieve the best potential positioning performance of the iterative scheme: some searched locations are closer to the ground truth than the selected final estimate.
This suggests that the iterative scheme has the potential to further improve the positioning performance if a proper technique is applied to retrieve the final estimate from the searched locations. The optimal positioning accuracy (denoted as Ours (opt.) in Fig. 8 and in TABLE 1) assumes a technique for the final estimation that retrieves the searched location closest to the ground truth. As depicted in Fig. 8, our scheme for retrieving the final estimate achieves an overall positioning accuracy comparable to that of the optimal one. However, TABLE 1 also suggests that the maximum positioning error could be reduced to a large extent (from 22.2 m to 9.2 m) if the optimal final estimate could be retrieved.
Fig. 6 shows the statistics of the TF, i.e. the percentage of locations at which the iterative search terminated under each condition. For $k = 1$, about 83% of the iterative search processes terminated in the converging state. This is about 35 percentage points higher than for the other tested value of $k$, so we set the number of nearest neighbors for NN to 1. In addition, we analyzed how the searched locations in the looping-state cases are distributed in space. Fig. 7 shows the distribution of the maximum distance between points within the same loop. Most of these maximum distances are well below 10 m, although in some extreme cases we observed up to 60 m. As shown in Fig. 8, the schemes proposed for the looping state (MCD or MJI) still select position estimates close to the ground truth in most of these cases.
The empirical cumulative distribution function (ECDF) of the radial positioning errors is presented in Fig. 8. The proposed approach significantly improves the positioning performance compared to both the traditional NN and NN with CDM. With the algorithm proposed herein, about 86% of the estimated locations have a positioning error smaller than 2 m, and around 97% have an error of less than 4 m. Compared to NN, this represents an improvement of up to 20 and 10 percentage points, respectively; compared to NN with CDM, the improvement is up to 10 and 6 percentage points. The percentage of estimated locations whose error exceeds 5 m is reduced to 2.6%, from about 10.2% for the traditional NN and 6.7% for NN with CDM. TABLE 1 also reports the circular error (CE), defined as the minimum radius that contains a given percentage of the positioning errors (e.g. CE 50 for the 50th percentile). Compared to NN with CDM, the maximum positioning error of our approach is reduced by about 40%, from 37.0 m to 22.2 m. Compared to NN without iterative positioning, CE 50, CE 75, and CE 90 are reduced by about 29%, 42%, and 58%, respectively. Furthermore, Fig. 9 compares the spatial distribution of the locations at which the positioning error is larger than 8 m. Fig. 9(a) shows that, with the original NN, these locations are mostly close to the accessible boundaries of the indoor regions, i.e. close to corners of corridors and rooms or to walls. This pattern resembles the spatial distribution of high variance of the feature values in the raw RFM shown in Fig. 4. Our approach significantly reduces the number of occurrences of positioning errors larger than 8 m.
TABLE 1: Circular errors and maximum radial positioning error, in meters.
Method        CE 50   CE 75   CE 90   Max. error
NN            1.4     2.6     5.2     29.6
NN (CDM)      1.1     1.9     3.7     37.0
Ours          1.0     1.5     2.2     22.2
Ours (opt.)   0.9     1.4     1.9     9.2
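The ECDF values, CE percentiles, and relative reductions quoted above can be computed from a list of radial errors along the lines of the following sketch. The CE is computed here as a nearest-rank percentile, which matches the minimum-radius definition but is an assumption about how TABLE 1 was produced; the error list is illustrative, whereas the last line uses the maximum errors from TABLE 1.

```python
import math

def ecdf_at(errors, radius):
    """Fraction of radial errors that do not exceed the given radius (ECDF value)."""
    return sum(e <= radius for e in errors) / len(errors)

def circular_error(errors, percent):
    """CE p: minimum radius containing at least p percent of the errors.

    Assumed nearest-rank definition: the smallest observed error such that at
    least `percent` percent of all errors are less than or equal to it.
    """
    ordered = sorted(errors)
    rank = max(1, math.ceil(percent * len(ordered) / 100))
    return ordered[rank - 1]

def relative_reduction(before, after):
    """Relative decrease from `before` to `after`, e.g. 0.4 for a 40% reduction."""
    return (before - after) / before

errors = [0.5, 1.0, 1.5, 2.0, 10.0]  # illustrative radial errors in meters
print(ecdf_at(errors, 2.0))          # 0.8
print(circular_error(errors, 50))    # 1.5
# Max. error of NN (CDM) vs. ours, taken from TABLE 1:
print(round(relative_reduction(37.0, 22.2), 2))  # 0.4
```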
7 Conclusion
We have proposed an iterative scheme for feature-based positioning, based on the weighted dissimilarity measure, for reducing large errors in FIPSs. Appropriate weights for the individual features can be obtained by analyzing the variability of the kinematically collected raw data underlying the RFM. The location-wise standard deviation of each feature is computed robustly using the MAD between the raw data and the spatially smoothed RFM. This variability information is stored as an additional layer of the RFM and is used to weight the contribution of each feature to the dissimilarity measure during the online positioning phase.
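A minimal sketch of such a variability layer and its use in a dissimilarity measure follows. It assumes that the robust STD is the median absolute residual between raw samples and the smoothed RFM value, scaled by 1.4826 (the usual consistency factor for Gaussian noise), and that each feature difference is divided by its STD in a weighted Euclidean distance. Both choices are illustrative: the weighted dissimilarity measure of Section 5.1 has two further hyperparameters that are not modeled here, and all names and values are hypothetical.

```python
import math
import statistics

MAD_TO_STD = 1.4826  # scales a MAD to an STD estimate under Gaussian noise

def robust_std(raw_values, smoothed_value, floor=1e-3):
    """Location-wise STD of one feature from the MAD of raw-minus-smoothed residuals.

    Hypothetical formulation. `floor` avoids zero STDs (and thus infinite
    weights) when all residuals are zero.
    """
    mad = statistics.median(abs(v - smoothed_value) for v in raw_values)
    return max(MAD_TO_STD * mad, floor)

def weighted_euclidean(measured, reference, stds):
    """Euclidean distance in which each feature difference is divided by its STD (assumed weighting)."""
    return math.sqrt(sum(((m - r) / s) ** 2 for m, r, s in zip(measured, reference, stds)))

def nearest_reference(measured, rfm):
    """1-NN search over RFM entries of the form (location, smoothed features, STD layer)."""
    return min(rfm, key=lambda entry: weighted_euclidean(measured, entry[1], entry[2]))[0]

# Raw RSS samples (dBm) of one AP at one location, with a single outlier.
raw = [-60.0, -62.0, -58.0, -61.0, -90.0]
print(robust_std(raw, -60.0))  # MAD-based STD, barely affected by the -90 dBm sample

# Toy RFM with two reference locations and two features each (made-up values).
rfm = [
    ((0.0, 0.0), [-50.0, -70.0], [2.0, 4.0]),
    ((3.0, 0.0), [-60.0, -65.0], [2.0, 2.0]),
]
print(nearest_reference([-52.0, -66.0], rfm))  # (0.0, 0.0)
```

Using the median rather than the mean of the residuals keeps a few corrupted samples from inflating the STD, which is the point of computing it robustly.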
Using real WLAN RSS data collected along with location ground truth in an office building, we showed that the noise of the raw observations indeed depends on both the location and the feature. We implemented the proposed algorithms in Python and validated the performance of the proposed iterative scheme. Compared to NN with CDM, the maximum positioning error is reduced by about 40%, and the iterative scheme improves the overall positioning performance. The positioning accuracy, defined as the percentage of locations whose radial positioning error is less than 2 m, is improved from 65% to 86% compared to the traditional NN.
In future work, we will evaluate the proposed algorithms using data from other environments, and we will further investigate the looping state and the handling of remaining outliers. Finally, we will investigate how the standard deviations modeled within the RFM can help to identify the need for updates of the RFM.
Acknowledgment
The Chinese Scholarship Council has supported C. Zhou during his doctoral studies at ETH Zürich. The data used within the experimental investigation were collected by the students E. Weiss, I. Bai, N. Meyer, and G. Filella. The device holder for the ground truth measurements was designed by R. Presl.
References

Berlinet, A. and Thomas-Agnan, C. (2011). Reproducing Kernel Hilbert Spaces in Probability and Statistics. Springer Science & Business Media, Berlin Heidelberg.
Bitew, M. A., Hsiao, R.-S., Lin, H.-P., and Lin, D.-B. (2015). Hybrid indoor human localization system for addressing the issue of RSS variation in fingerprinting. International Journal of Distributed Sensor Networks, 11(3):831423.
Bong, W. and Kim, Y. C. (2012). Fingerprint WiFi radio map interpolated by discontinuity preserving smoothing. In Convergence and Hybrid Information Technology, pages 138–145, Berlin, Heidelberg. Springer Berlin Heidelberg.
Brena, R. F., García-Vázquez, J. P., Galván-Tejada, C. E., Muñoz-Rodriguez, D., Vargas-Rosales, C., and Fangmeyer, J. (2017). Evolution of indoor positioning technologies: A survey. Journal of Sensors, 2017.
Cormen, T. H., Leiserson, C. E., Rivest, R. L., and Stein, C. (2009). Introduction to Algorithms. The MIT Press, 3rd edition.
Gal, Y. and Ghahramani, Z. (2016). Dropout as a Bayesian approximation: Representing model uncertainty in deep learning. In International Conference on Machine Learning, pages 1050–1059.
Guan, K., Ma, L., Tan, X., and Guo, S. (2016). Vision-based indoor localization approach based on SURF and landmark. In 2016 International Wireless Communications and Mobile Computing Conference (IWCMC), pages 655–659. IEEE.
He, S. and Chan, S.-H. G. (2016). WiFi fingerprint-based indoor positioning: Recent advances and comparisons. IEEE Communications Surveys & Tutorials, 18(1):466–490.
He, S., Ji, B., and Chan, S.-H. G. (2016). Chameleon: Survey-free updating of a fingerprint database for indoor localization. IEEE Pervasive Computing, 15(4):66–75.
He, S. and Shin, K. G. (2018). Geomagnetism for smartphone-based indoor localization: Challenges, advances, and comparisons. ACM Computing Surveys (CSUR), 50(6):97.
Kaemarungsi, K. and Krishnamurthy, P. (2012). Analysis of WLAN’s received signal strength indication for indoor location fingerprinting. Pervasive and Mobile Computing, 8(2):292–316. Special Issue: Wide-Scale Vehicular Sensor Networks and Mobile Sensing.
Lemic, F., Handziski, V., Aernouts, M., Janssen, T., Berkvens, R., Wolisz, A., and Famaey, J. (2019). Regression-based estimation of individual errors in fingerprinting localization. IEEE Access, 7:33652–33664.
Li, X., Wang, J., Liu, C., Zhang, L., and Li, Z. (2016). Integrated WiFi/PDR/smartphone using an adaptive system noise extended Kalman filter algorithm for indoor localization. ISPRS International Journal of Geo-Information, 5(2):8.
Li, Y., Gao, Z., He, Z., Zhuang, Y., Radi, A., Chen, R., and El-Sheimy, N. (2019). Wireless fingerprinting uncertainty prediction based on machine learning. Sensors, 19(2):324.
Mautz, R. (2012). Indoor positioning technologies. Habilitation Thesis, Department of Civil, Environmental and Geomatic Engineering, ETH Zurich, Switzerland.
Minaev, G., Visa, A., and Piché, R. (2017). Comprehensive survey of similarity measures for ranked based location fingerprinting algorithm. In 2017 International Conference on Indoor Positioning and Indoor Navigation (IPIN), pages 1–4.
Murphy, K. P. (2012). Machine Learning: A Probabilistic Perspective. The MIT Press.
Padmanabhan, P. B., N., V., and N., V. (2000). RADAR: An in-building RF-based user location and tracking system. Proceedings IEEE INFOCOM 2000. Conference on Computer Communications. Nineteenth Annual Joint Conference of the IEEE Computer and Communications Societies (Cat. No.00CH37064), 2(c):775–784.
Pedregosa, F., Varoquaux, G., Gramfort, A., Michel, V., Thirion, B., Grisel, O., Blondel, M., Prettenhofer, P., Weiss, R., Dubourg, V., Vanderplas, J., Passos, A., Cournapeau, D., Brucher, M., Perrot, M., and Duchesnay, E. (2011). Scikit-learn: Machine learning in Python. Journal of Machine Learning Research, 12:2825–2830.
Pei, L., Zhang, M., Zou, D., Chen, R., and Chen, Y. (2016). A survey of crowd sensing opportunistic signals for indoor localization. Mobile Information Systems, 2016.
Retscher, G. and Joksch, J. (2016). Comparison of different vector distance measure calculation variants for indoor location fingerprinting. In Location-Based Services (LBSs), 2016 International Conference on, pages 53–76. ICA Commission on LBSs.
Röbesaat, J., Zhang, P., Abdelaal, M., and Theel, O. (2017). An improved BLE indoor localization with Kalman-based fusion: An experimental study. Sensors, 17(5):951.
Schulz, M., Link, J., Gringoli, F., and Hollick, M. (2018). Shadow WiFi: Teaching smartphones to transmit raw signals and to extract channel state information to implement practical covert channels over WiFi. In Proceedings of the 16th Annual International Conference on Mobile Systems, Applications, and Services, MobiSys ’18, pages 256–268, New York, NY, USA. ACM.
Tao, Y. and Zhao, L. (2018). A novel system for WiFi radio map automatic adaptation and indoor positioning. IEEE Transactions on Vehicular Technology, 67(11):10683–10692.
Torres-Sospedra, J., Jiménez, A., Knauth, S., Moreira, A., Beer, Y., Fetzer, T., Ta, V.-C., Montoliu, R., Seco, F., Mendoza-Silva, G., et al. (2017). The smartphone-based offline indoor location competition at IPIN 2016: Analysis and future work. Sensors, 17(3):557.
Torres-Sospedra, J., Montoliu, R., Trilles, S., Belmonte, A., and Huerta, J. (2015). Comprehensive analysis of distance and similarity measures for WiFi fingerprinting indoor positioning systems. Expert Systems with Applications, 42(23):9263–9278.
Torres-Sospedra, J. and Moreira, A. (2017). Analysis of sources of large positioning errors in deterministic fingerprinting. Sensors, 17(12):2736.
Wang, H., Sen, S., Elgohary, A., Farid, M., Youssef, M., and Choudhury, R. R. (2012). No need to war-drive: Unsupervised indoor localization. In Proceedings of the 10th International Conference on Mobile Systems, Applications, and Services, pages 197–210. ACM.
Wang, J., Hu, A., Liu, C., and Li, X. (2015). A floor-map-aided WiFi/pseudo-odometry integration algorithm for an indoor positioning system. Sensors, 15(4):7096–7124.
Wang, Z. and de Freitas, N. (2014). Theoretical analysis of Bayesian optimisation with unknown Gaussian process hyperparameters. CoRR, abs/1406.7758.
Wu, C., Yang, Z., Zhou, Z., Liu, Y., and Liu, M. (2017). Mitigating large errors in WiFi-based indoor localization for smartphones. IEEE Transactions on Vehicular Technology, 66(7):6246–6257.
Youssef, M. and Agrawala, A. (2008). The Horus location determination system. Wireless Networks, 14(3):357–374.
Zhou, C. and Wieser, A. (2018). CDM: Compound dissimilarity measure and an application to fingerprinting-based positioning. In 2018 International Conference on Indoor Positioning and Indoor Navigation (IPIN), pages 1–7.
Zhou, C. and Wieser, A. (2019). Modified Jaccard index analysis and adaptive feature selection for location fingerprinting with limited computational complexity. Journal of Location Based Services, 13(2):128–157.
Zhuang, Y., Yang, J., Li, Y., Qi, L., and El-Sheimy, N. (2016). Smartphone-based indoor localization with Bluetooth Low Energy beacons. Sensors, 16(5):596.