Recent years have witnessed the advent of indoor localization systems harnessing the many untapped capabilities of phones-embedded sensors and their widespread availability. Most developed indoor localization techniques are WiFi-based, leveraging the Received Signal Strength (RSS) from WiFi access points (APs) as the metric for the location determination. To combat the noisy wireless channel, typically a WiFi fingerprint map is built that captures the signatures of the APs at different locations in the area of interest. Nonetheless, the deployment cost is prohibitive as this requires a calibration phase; which is time consuming, labour intensive and is vulnerable to environmental dynamics. To tackle this problem, a number of approaches for automating the fingerprinting process have been proposed including the use of RF propagation models [elbakly2016robust], combining RF localization with other sensors [wang2012no], or crowdsourcing the fingerprint; where users perform the required survey process in the real-time [rai2012zee, park2010growing]. These techniques, however, suffer from low location accuracy and/or require explicit user intervention. To further address these issues, some algorithms, e.g., [jin2011robust], leverage the phone inertial sensors to track the user starting from a reference point. However, the localization error quickly accumulates due to the noise of the low-cost inertial sensors. To alleviate this problem, researchers, e.g. [rai2012zee]
, proposed to match the user path with the map information to remove any outlier locations. Nonetheless, these indoor localization techniques require the knowledge of indoor map (i.e., wall and hallway locations).
Recently, Simultaneous Localization and Mapping (SLAM) approaches, commonly used in the robotics domain, have been introduced to provide iterative and autonomous feature maps (i.e., anchor positions and signatures; not the indoor floorplan) of previously unknown environments and localization of a subject within this environment simultaneously. For instance, ActionSLAM [hardegger2012actionslam] leverages foot-mounted inertial sensors to track the user’s indoor location using detected user actions as observations. However, foot-mounted sensors are not ubiquitous and the recognized activities cannot be accurately detected by standard smartphone sensors, limiting this technique scalability. To extend SLAM to standard smartphones, SemanticSLAM [abdelnasser2016semanticslam] leverages phone inertial sensors to continuously estimate the user’s location and detect supervised and unsupervised anchors in the building structure (e.g., stairs, elevators, etc.) with unique patterns that surface on the phone sensors when passing through them. These anchors provide error resetting opportunities to control the inertial sensors error accumulation. Although SemanticSLAM achieves good localization accuracy, its performance is contingent on the anchor density [abdelnasser2016semanticslam], which may be limited in many scenarios when the user moves in only one floor or not passing by any elevation-change anchor; losing many opportunities for error-resetting.
In this paper, we propose DynamicSLAM, a novel SLAM framework that can provide accurate, robust, low-overhead, and ubiquitous indoor localization. DynamicSLAM is a plug-and-play indoor localization system, where no infrastructure support nor war driving are needed for the operation of the system; making it building-independent. As in traditional SLAM, DynamicSLAM tracks the user’s location using dead reckoning based on the phone inertial sensors and leverages stationary anchors to curb the accumulated location error. However, to further increase the anchor density and hence the localization accuracy in any environment, DynamicSLAM introduces the novel concept of human mobile anchors. Specifically, DynamicSLAM leverages humans walking in the environment as anchors for error-resetting opportunities. To achieve this, DynamicSLAM needs to address a number of challenges, including recognizing users’ encounters in a ubiquitous manner, determining their relative location, and handling the inherent error in the users’ estimated locations.
We evaluate the performance of DynamicSLAM in a real deployment based on different Android phones. Our results show that it can achieve a median localization error of 1.1m, outperforming other state-of-the-art techniques by more than 55%. In summary, we provide the following contributions in this paper:
We present a novel collaborative simultaneous localization and mapping framework that leverages other users in the environment as mobile anchors to enhance the localization and mapping accuracy in environments with low building anchor density.
We present different ubiquitous observation models to detect users’ encounters and estimate their relative distances.
We present a theoretical proof for system convergence and the human anchors ability to reset the accumulated error.
We implement and evaluate our system on different Android phones and quantify its performance relative to the state-of-the-art systems.
The rest of the paper is organized as follows: Section 2 gives a brief background about the traditional SLAM framework. Section 3 gives an overview of the proposed system main components while Section 4 gives the details of the DynamicSLAM probabilistic framework. We evaluate the system performance in Section 5. Finally sections 6 and 7 discuss related work and conclude the paper respectively.
2 SLAM Background
Simultaneous Localization and Mapping (SLAM) is a computational problem originally introduced in robotics where a mobile robot is placed at an unknown location in an unknown environment and the robot continuously localizes itself within this environment, while simultaneously incrementally building a consistent feature map of this environment [durrant2006simultaneous]. The points constituting the feature map are also called anchors [abdelnasser2016semanticslam]. In SLAM, the trajectory of the robot and the locations and signatures of all anchors are estimated on-line without the need for any a priori knowledge. To do that, SLAM harnesses the robot odometers to keep track of the robot motion (i.e., to estimate its pose) as well as its exteroceptive sensors for detecting anchors in the environment to build the map. In a probabilistic form, the goal of SLAM is to find the pose and map
that maximize the joint probability density function:
where is the set of motion update measurements (displacement and heading change) capturing the control history till time , is the history of anchor location observations till time , and is history of anchor identifications (data associations) where specifies the identity of the anchor observed at time .
The earliest SLAM algorithm [durrant2006simultaneous]
is based on the extended Kalman filter (EKF). The EKF represents the robot’s map and position by a high-dimensional Gaussian density over all map anchors and the robot pose. The off-diagonal elements in the covariance matrix of this multivariate Gaussian represent the correlation between errors in the robot pose and the anchors in the map. Therefore, the EKF can accommodate the natural correlation of errors in the map. To estimate the joint probability density function in Equation1, the EKF-SLAM algorithm factors it into two different models: the state transition model (motion update model) and the observation model , describing the effect of the control input and observation respectively. The motion update model is assumed to be a Markov process in which the next position depends only on the immediately preceding position and the applied control , and is independent of both the observations and the map. The observation model describes the probability of making an observation when the robot location and anchor locations are known. EKF-SLAM approach has several drawbacks. First, its computation complexity is quadratic in the number of anchors [thrun2005probabilistic]. Second, EKF assumes a Gaussian noise for the robot motion and observation. In this case, the amount of uncertainty in sensors must be relatively small, since the linearization in EKF may introduce intolerable noise. Third, EKF represents posteriors by a Gaussian, which is a unimodal distribution that works well only in tracking problems where the posterior is focused around the true state with a small margin of uncertainty. Gaussian posteriors are a poor match for many global estimation problems in which many distinct hypotheses exist, each of which forms its own mode in the posterior. This directly impacts the data association, i.e., how to determine the identity of the detected anchors when multiple of them have a similar signature (e.g. two nearby stairs), which can lead to different maps based on the chosen association.
FastSLAM [montemerlo2002fastslam, michael2003fastslam, montemerlo2007fastslam] is an alternative SLAM solution that mitigates the limitation of EKF-SLAM. FastSLAM combines particle filter and extended Kalman filter. The basic idea of FastSLAM is to maintain a set of particles. Each particle contains a sampled robot path and its own representation of the map where each anchor in the map is represented by its own local Gaussian instead of one big joint Gaussian as in the case of EKF-SLAM algorithm. This is based on the SLAM property where the map anchors are conditionally independent given the path. By factoring out the path (one per particle), we can simply treat each map anchor independently, thereby avoiding the costly step of maintaining the correlations between them that plagued the EKF-SLAM approach. The resulting representation requires space linear in the size of the map, and linear in the number of particles. Estimating the posterior over all paths enables the factorization of posterior over poses and maps:
This factorization states that the calculation of the posterior over paths and maps can be decomposed into probabilities, where is the number of anchors. FastSLAM estimates the robot path using a particle filter (Localization problem). The second term (Mapping problem) can be factored into separate problems, where the individual anchor location probability density function is estimated using an EKF. Formally, the posterior of the particle () contains a path and local estimates of anchors . A local estimate of anchor by particle is represented by anchor type, the mean of anchor position , and the position covariance .
FastSLAM has many advantages over EKF-SLAM. First, FastSLAM can be implemented in time logarithmic in the number of anchors [thrun2005probabilistic], offering computational advantages over the plain EKF-SLAM. Second, FastSLAM provides significant robustness in data association as it maintains posteriors over multiple data associations, and not just by tracking only a single data association at any point in time based on the incremental maximum likelihood data association incorporated in EKF-SLAM [montemerlo2002fastslam]. Finally, FastSLAM can also cope with non-linear models as particle filters can handle non-linear robot motion models and is guaranteed to converge under certain assumptions [montemerlo2002fastslam]. Due to these advantages, we adopt FastSLAM approach in DynamicSLAM.
3 System Overview
Now, we start with an overview of DynamicSLAM core building blocks (Figure 1), postponing the discussion on the core SLAM algorithm to the next section.
3.1 User Mobile Traces
DynamicSLAM collects time-stamped sensor measurements including accelerometer, gyroscope, and magnetometer, and the received signal strength values from the available WiFi access points and Bluetooth. The collected raw sensor measurements are preprocessed to reduce the effect of noise by applying a low-pass filter, and phone orientation changes by transforming the sensor readings from the mobile coordinate system to the world coordinate system leveraging the inertial sensors [mohssen2014s].
3.2 Dead reckoning- Motion Model
DynamicSLAM relies on dead-reckoning to track the user’s location starting from a reference point (e.g., the last GPS fix at the building entrance [alzantot2012crowdinside]). Dead-reckoning leverages inertial sensors (accelerometer, magnetometer, and gyroscope) where the current user’s location is estimated from the previous location, distance traveled by the user, and the direction of her motion during the traveling distance. The user direction of motion can be estimated from the magnetometer and/or the gyroscope [wang2012no], while the displacement can be obtained from the accelerometer [alzantot2012uptime].
3.3 Anchors Detection- Observation Model
The dead-reckoning accumulated location error is unbounded [wang2012no]. Hence, it is infeasible for indoor tracking. To mitigate this error, we propose a anchor-based displacement error resetting technique. In particular, we leverage ample and unique points in the indoor environment to reset the accumulated error when the user hits one of them. These anchor points (anchors) can be estimated in a crowd-sourced manner, removing the need for any labor-intensive calibration. Traditional anchors are either physical structure/machines or organic anchors that exist in buildings (e.g., escalators, elevators, stairs, turns, electronic machines, etc.) [abdelnasser2016semanticslam, wang2012no]. DynamicSLAM introduces a novel class of human-based anchors, where encountering other system users is used as dynamic mobile anchors, increasing the anchor density and making them more ubiquitously available. The details of extracting the two anchor classes and handling the associated challenges are discussed in the next section.
3.4 SLAM Algorithm
DynamicSLAM extends the FastSLAM algorithm to incorporate its motion and observation models. Specifically, when DynamicSLAM detects a user step, it triggers a position update where each particle updates its location belief as detailed in Section 4. When the system detects a human anchor (i.e., encountering another user), the particle location is refined by applying an extended Kalman filter on the relative users’ locations and the covariance matrix representing the uncertainty in relative location estimates. The EKF helps to filter out the noise in users relative location estimation. Two approaches that trade-off the accuracy and efficiency are proposed in DynamicSLAM to fuse the two users’ locations. On the other hand, if the particle detects a traditional building anchor, both the particle location and the map are updated to account for the detection of these anchors. Finally, in both cases, the particles importance weights are estimated to reflect the confidence of the location associated with each particle. The weights of particles are initialized equally and updated at each step based on the likelihood of the detected anchors (user and building anchors). Finally, a resampling step is performed using the current weights in order to refine the particles and drop those that significantly deviated from the actual user location. In the next section, we will explain the details of the SLAM algorithm incorporated in DynamicSLAM.
4 The DynamicSLAM System
This section elaborates on the main components of DynamicSLAM: the anchor detection and the core SLAM algorithm as well as provides a proof of the system convergence. Finally, we discuss the possible ways to deploy the system.
4.1 Anchor Detection
In this section, we first briefly discuss the detection of the traditional building anchors. We then elaborate on the details of the detection algorithm for the novel human-based anchors.
4.1.1 Building anchors
Buildings have many unique points which present identifiable signatures on one or more cell-phone sensors. These unique patterns can be leveraged to automatically recognize these anchors. Once a user sees a anchor, DynamicSLAM detects its signature and leverages it to reset the dead-reckoning error. The list of physical anchors starts by the building’s door entrance, detected by the loss of the GPS signal, which is used as a reference point to seed the dead-reckoning process [alzantot2012crowdinside]. Other indoor anchors include escalators, elevators, stairs, turns, and doors which can be detected using a finite state machine by employing the approach in [wang2012no]. Also, there may exist some anchors that are specific to some buildings. For instance, railway stations and airports have many specific machines (e.g., ticketing machines, lockers, electronic gates, etc) that can be detected using the different phone sensors [elhamshary2016transitlabel].
Finally, there are many organic anchors that may have a unique signature on the phone sensors without having a physical meaning to humans, e.g., areas with magnetic distortion caused by nearby electromagnetic waves. These points (organic anchors) can be detected using unsupervised learning techniques and can also be leveraged as anchors[abdelnasser2016semanticslam].
Note that since both anchor types (physical and organic) have a unique signature in the building, there is no need to differentiate between the different types of building anchors during processing. This differentiation is only relevant during the anchor extraction process.
DynamicSLAM learns anchors locations through crowd-sourcing, which converges to the true anchor location over time [wang2012no, abdelnasser2016semanticslam]. In addition, in some buildings, a map with the location of the building anchors, such as elevators and stairs, may be available. This can further speed the convergence process. Nonetheless, DynamicSLAM models the uncertainty of the anchor locations in its operation.
4.1.2 Human-based anchors
The density of anchors differs from one building to another. If the anchors are not dense in a given building, there are fewer opportunities for curbing the dead-reckoning accumulated error, affecting the overall localization accuracy. To alleviate this problem, DynamicSLAM introduces the concept of human-based mobile anchors, where other nearby users in the environment are used to reset the localization error. The main intuition is that, when two users encounter each other, they can calculate their relative distances, and consequently, both users can refine their internal beliefs; reducing each other uncertainty in their estimated locations. DynamicSLAM presents two different measurement models to detect the users’ encounter and estimate their relative distances based on the available signals and leading to different ubiquity and accuracy. These are the Bluetooth and the WiFi location models.
The Bluetooth Model: A user overhearing Bluetooth signals emitted from nearby users can directly infer their encounter. The Received Signal Strength (RSS) can be used to estimate the distance between the nearby users. We have conducted experiments to capture the relation between the Bluetooth RSS and the relative distance between users. We also show the quality of fitting using different fitting functions, including those based on the traditional known propagation models [bahl2000radar, feldmann2003indoor, jung2013distance].
Figure 2 shows the results. The figure confirms that the quadratic fitting function leads to the best approximation. This is consistent with previous results reported in literature [feldmann2003indoor, jung2013distance].
The user encounters detection can be performed locally on the device and forwarded to the server for processing.
The WiFi Model:
This model is based on the WiFi signals. The idea is that users in the vicinity of each other should hear similar APs with close RSS values. Therefore, to detect the encounter of two users, we check the similarity of their collected WiFi RSS measurement vectors. If the similarity is greater than a certain threshold, we detect the users’ encounter.
Figure 2: Comparison between different Bluetooth models. Figure 3: Comparison between different WiFi models.
To determine this threshold, we conducted an experiment to build a spatial physical-distance to WiFi-space distance model. Our WiFi-similarity measure takes into account the wireless channel noise that may lead to different number of APs heard in the same location at different times. Specifically, the WiFi similarity measure is calculated as:
where is the number of commonly heard access points by both users and is the RSS vector of the common APs heard by user . Note that the normalization by the number of APs makes the similarity measure independent of the number of heard APs. Moreover, one can restrict the RSS vector to the strongest APs to reduce the noise effect. This is quantified in Section 5.
Figure 2(a) shows the relation between the spatial space Euclidean distance and the WiFi-space similarity measure. Figure 2(b) further compares the accuracy of the estimated relative users’ distance using different fit models. The figure shows that different models give comparable performance. We therefore use the logarithmic model as it is commonly used in the propagation literature [bahl2000radar]. The figure also shows that the WiFi samples are noisier than the Bluetooth samples. This is due to the higher range of WiFi signals compared to Bluetooth, which makes them more susceptible to noise.
Finally, since WiFi chips cannot scan for other users’ WiFi signal during their normal operation111WiFi scans are performed between the WiFi chip on the client and the APs installed in the environment to determine the best AP to associate to. To scan for other nearby users’ WiFi signals, the device needs to be in monitoring mode, which disables the normal transmission operation., WiFi-based users’ encounters are performed at the DynamicSLAM server, where the WiFi scans of nearby APs are collected from each user and a module on the server calculates the similarity and performs the encounter detection.
|Control data at time|
|Estimated displacement at time from the sensors|
|Estimated step length for particle at time|
|Estimated heading change at time|
|Sampled heading change for the particle at time|
|Predicated location at time for the particle|
|User location at time|
|User ’s location as seen by particle at time|
|User location as seen by particle after refining using anchor at time|
|entire location history (|
|Mean of estimated location of particle of user at time|
|Covariance matrix of estimated location of particle of user at time|
|Mean of estimated location of particle given anchor at time|
|Covariance matrix of estimated location of particle given anchor at time|
|Estimated measurement of anchor position at time by user|
|Predicted measurement of particle for user of observed anchor at time|
|Covariance matrix of observing anchor at time by particle|
|Covariance matrix of control data at time|
|Measurement covariance at time|
|Kalman gain for particle at time|
|Location of anchor at time|
|Number of anchors for particle at time|
|Mean of anchor location at time for particle|
|Covariance matrix of anchor at time for particle|
|Weight of particle at time|
|anchor type (e.g. elevator)|
|Detected anchor type (e.g. elevator pattern)|
|Likelihood of observing a anchor for the first time|
|Likelihood correspondence of anchor with the observed pattern at time|
|The Jacobian of measurement model for anchor at time|
|A threshold on the number of iterations for the iterative sampling technique|
4.2 Core SLAM Algorithm
DynamicSLAM is a variant of the FastSLAM [montemerlo2002fastslam] framework optimized to operate with anchors that can be detected based on the limited capabilities of phone-embedded sensors, as compared to robot sensors. DynamicSLAM uses the particle filter to enhance user’s position. It relies on dead-reckoning to track the user’s location and predicts the distance (i.e. offset) between the user location and anchor location (Motion model / Prediction model). On the other hand, it relies on the Bluetooth/WiFi models to get the estimated distance (Observation model). Finally, DynamicSLAM combines both models using the particle filter equations.
DynamicSLAM core algorithm consists of three steps: location sampling (motion update), map update (observation update), and re-sampling. Without loss of generality, we assume that only a single anchor is observed at any time instance to simplify the problem complexity. Note that, since DynamicSLAM needs around 150ms to get the estimated location while the typical human speed is 1.4m/s; the anchor can be assumed stationary during the estimation process. Table I summarizes the notations used in the paper.
4.2.1 Motion Update
It is invoked when a step is detected based on the phone inertial sensors. At this time instance, the control signal , where and are the estimated step length and heading from the phone inertial sensors, updates the user’s location from to . More formally, given the control signal at time , we draw each particle according to:
Assuming Gaussian-distributed error for bothand , the sampled displacement and heading for particle at time are calculated as:
are the variance of the displacement and heading estimation errors respectively. Therefore, the new sampled location for a given user in Eq.4 can be rewritten as:
where are the new user heading, x, and y coordinates of the location respectively after incorporating the control input .
4.2.2 Observation Update
At any time instance, the user may observe a human-based () or building () anchor. In both cases, the current user’s location belief is updated. In addition, for the case of the building anchor, the feature map is also updated to refine the observed anchor location. The balance of this subsection describes the details of both cases.
Observing a Human-based anchor When user Alice encounters another user Bob, her location should be updated based on their relative distances and this distance covariance matrix, i.e., the confidence in the estimated location. Note that, since each user is considered a anchor for the other user, no map update is invoked in this case. To update Alice’s location based on Bob’s encounter, we propose two different techniques that trade-off the accuracy and efficiency: One-shot sampling and iterative sampling.
To account for the relative distance measurement error between the two users, we incorporate the EKF to update the user’s location belief based on the observation. The EKF captures the deviation of the predicted measurement, , (based on the estimated users’ locations) from the estimated measurement, , (based on the two observation models described in Section 4.1.2), taking into account the confidence in the different quantities
Specifically, Alice’s predicted distance measurement, i.e., the location offset between Alice’s (a) and Bob’s (b) locations is calculated as:
The mean and covariance matrix of Alice’s location distribution after incorporating Bob’s encounter is:
Where is the transpose of the Jacobian of the measurement model with respect to Bob’s location , is the covariance matrix of the control data at time , is the human-based anchor observation covariance matrix given by:
and is the measurement covariance matrix. Finally, is the Kalman gain given by:
Finally, the new location of a particle of Alice after incorporating Bob’s encounter is sampled from the following distribution:
Intuitively, when a user encounters another user, her location is updated based on her previous state, the confidence of the two users’ locations, and the innovation (the difference between the estimated and predicted relative distance). Algorithm 1 summarizes the process of refining the particle position.
In this technique, the one-shot sampling described in the previous section is repeated on the same pair of users until the particle position converges. To determine convergence, the algorithm stops when the difference between two successive location updates for particle is less than a certain threshold .
The iterative sampling approach has the advantage of increased accuracy at the expense of increased computations (as we quantify in Section 5).
Observing a Building Anchor: Similar to the human-based anchors, encountering a building-based anchor, e.g., an elevator, updates the user’s position using the EKF. In addition, the system needs to (I) resolve the uncertainty in the type of detected anchors, i.e., stairs vs elevators vs other types, due to the inherent noise in the phone sensors and (II) Update the building anchor location.
(I) Data Association (Detecting building anchor type) To account for this, we compute the conditional probability of actually observing each anchor type when the detected pattern is of type
. This uncertainty is a function of the sensors noise and is independent from the building. It is typically described by a confusion matrix, where each entry indicates the probabilityof the anchor given pattern . Given this confusion matrix and the anchor location uncertainty for each anchor in the particle’s map, the likelihood of correspondence with the observed pattern is calculated as [abdelnasser2016semanticslam]
Where is a normalization factor. This likelihood takes into account the distance between the current user’s location and the building anchor, the location uncertainty of both, and the confusion between the observed and actual anchor types ( vs ).
We assume that the probability of observing a newly unseen anchor given the detected pattern is calculated as:
where is a constant determined empirically. (II) Updating anchor Location (Map update) The anchor with the highest probability () is selected as the observed anchor and its location is updated using the standard EKF formulas. (Algorithm 3).
The last step assigns a weight to each particle to reflect the confidence of its location. The particles’ weights are initialized uniformly and updated at each step based on the likelihood of the detected anchors, i.e., the anchor with the highest probability, as:
These weights are normalized to add to one. Then, the location and the map of the particle having the maximum weight are selected as the current estimate. Finally, the algorithm draws with replacement new particles to avoid degeneration. Algorithm 2 summarizes the full DynamicSLAM algorithm.
4.3 DynamicSLAM Convergence
In this section, we argue about the system convergence as well as the effectiveness of human anchors ability to reset the accumulated error. Here convergence refers to reducing the error in the users’ locations to zero. Assume that user encounters an anchor (human or building anchor). We define the error in user location as follows:
Note that the ground-truth user location is used in the previous equation only to prove DynamicSLAM convergence and is not needed during the actual system operation. From equations 6-8 and 9, , . Also, by setting , , and , the mean and covariance of Eq. 10 and 11 of the proposal distribution are defined as a follows
First, we study the human anchor ability to reset the accumulated error.
If user with error encounters another user with error , where , then will shrink in expectation. Conversely, if , then the later may increase but will not exceed in expectation.
The expected error of user at time is given by:
We can get the first term from the sampling distribution (Eq 19) which depends on user location, anchor location, estimated measurements, and the uncertainty in anchor and user locations. The second term can be obtained from the motion model (Eq 7 and 8). Therefore:
For linear SLAM, . Hence,
As , , and matrices are positive semidefinite, the value of is a contraction matrix. Hence, depends on the difference . If , then user location error will be reduced in expectation. If , then the will increase by a certain amount depending on the difference, which makes the upper bound for the expected error, , . Now, we study the effect of a user that encounters a building anchor with known location.
If a user encounters a building anchor with known location , then the error in her location will shrink in expectation.
The building anchor with known location has zero error = 0 and . By substitution in Eq. 24
The following theorem proves our system convergence.
DynamicSLAM converges even for =1 particles if there is at least one building anchor whose location is known in advance.
Assuming that the building contains at least one of building anchor with known location, some of the system users will eventually observe the building anchors, which according to Lemma 2 will lead to reducing their error. Similarly, when two users encounter each other, the one with the lower accuracy will enhance his estimate based on the other user (from Lemma 1). Therefore, the error of the different users in DynamicSLAM will always reduce, leading to convergance.
DynamicSLAM can have different deployment modes. In this paper, we assumed that the system operations are performed on a centralized server. However, the system can be completely implemented on devices that communicate with each others in a peer-to-peer manner. The devices in this case will broadcast their own estimated locations, confidence in the estimated locations, and RSS heard from other devices. Note that, unlike the traditional collaborative SLAM-based techniques, DynamicSLAM does not need to exchange particles local maps as the users’ locations are updated based on their local maps only.
The peer-to-peer implementation is more private. It can solve the latency and connection problems due to the centralized process. However, making all computations on smart-phones may be an overhead due to the limited computational power on smart-phones. In both cases, cryptographic approaches can be used to prevent malicious users from introducing errors in the system, e.g. by broadcasting fake or wrong locations.
In this section, we describe the testbed and the data collection methodology followed by evaluating the performance of the system components under different parameter settings. We end the section by comparing DynamicSLAM performance against other systems and quantifying its energy footprint.
5.1 Testbed & Data Collection Methodology
We deployed DynamicSLAM in the engineering building in our university campus. The building area is 3000 containing offices, labs, meeting rooms as well as corridors. The building has 30 building anchors (physical and organic). The data is collected by 23 users using different Android phones (e.g., LG Nexus 5, Samsung Galaxy Note 3, Galaxy 4, Galaxy Tab, among others). The data is then processed in a centralized server. Each user walked around in a random independent path in the building for at least 250m passing by some building anchors (10 on average) as well as encountering other users (7 on average). If a user encounters another group of users, the system selects the user with the highest confidence.
|Measurement Model||Bluetooth, WiFi||WiFi|
|Location update technique||One-shot sampling, Iterative sampling||Iterative sampling|
|Number of particles||1 to 100||75|
|User inverse densityuser)||3 to 3000||30|
|RSS threshold (Bluetooth model)||-70 to -95||-90|
|WiFi threshold (WiFi model)||2 to 14||8|
|Number of APs (WiFi model)||1 to 16||5|
|Iterative sampling threshold||0 to 1||0.1|
5.2 Effect of Changing DynamicSLAM Parameters
In this section, we study the effect of the different parameters on the system performance including the user encounter detection method, user density in the building, sampling method, and number of particles. Table II shows the default parameters values used throughout the evaluation section.
5.2.1 Users’ encounter observation models
Bluetooth model Figure 7 shows the accuracy of the estimated relative users’ distance (in terms of Root Mean Square Error) at different cut-off Bluetooth RSSs. The figure shows that the higher the cut-off threshold, i.e. the stronger the Bluetooth signal, the lower the relative distance error. However, requiring a strong Bluetooth signal reduces the users’ encounter opportunities due to the lower range. For example, setting the cut-off RSS threshold to -70dBm, the encounter detection range is around 4m (From Figure 2), and the RMSE of relative distance is around 1m.
WiFi Model Similarly, decreasing the similarity threshold for the WiFi signal leads to better accuracy at the expense of a shorter range (Figure 7). We use a WiFi threshold that is equal to 8dBm in our experiments. Figure 7 further shows the effect of controlling the number of APs used in the similarity calculation. The figure shows that increasing the number of the strongest APs used in similarity calculations leads to higher accuracy until we reach an optimal point at five APs. This can be explained by noting that a few number of APs leads to ambiguity in the user location while a large number of APs adds noise to the APs vector, both lead to reduced accuracy.
Models Comparison Figure 7 compares the box-plot of relative distance error estimation by the two encounter observation models. Evident from the figure, the Bluetooth and WiFi models have comparable performance. This is an interesting result as one should expect that Bluetooth should lead to better accuracy due to its shorter range. However, WiFi has more than one AP in a scan, compared to only one device for Bluetooth. This compensates for the noise due to the higher range of WiFi. Finally, we note that the two observation models are independent and can be used together to provide both higher range and better accuracy.
5.2.2 Sampling Method
shows the box-plot of the two sampling techniques used to update the user’s location: one-shot and iterative sampling. The figure confirms that iterative sampling outperforms one-shot sampling in the median as well as the different quantile errors. This comes at the expense of increased computation time as shown in Figure7(b).
5.2.3 Number of Particles
Figure 8(a) shows the effect of increasing the number of particles on performance. The figure shows that as more particles are used to represent the user’s location, the localization accuracy increases. This highlights that the particle filter does capture the dynamics of the system. The figure also shows that as more particles are used to represent the user’s location, the localization accuracy increases. Nonetheless, the computation time of the system increases linearly with the increase in the number of particles as shown in Figure 8(b). To find a comprise of the computation time- accuracy tradeoff, DynamicSLAM uses only 75 particles.
5.2.4 User Density
Figure 10 shows the effect of using the traditional building anchors only and using the new dynamic anchors on localization accuracy. Figure 12 shows the effect of the number of per user (inverse of the user density) on the median localization error. Figure 12 further shows the effect of users’ encounter rate on the localization accuracy. As more users move around, there is a higher probability for users’ encounters, leading to more opportunities of individual user location calibration, and hence better localization accuracy. The secondary x-axis shows the user speed to achieve a certain number of encounters (linear relation) under the assumption that other users are uniformly and independently distributed in the area[liu2005mobility].
Note that, regardless of the user speed, since DynamicSLAM needs around 150ms to get the estimated location while the typical human speed is 1.4m/s [levine1999pace]; the user can be assumed stationary during the estimation process.
5.3 Comparison With Other Systems
In this section, we compare the localization accuracy of DynamicSLAM against the SemanticSLAM technique that employs the SLAM approach based on the standard phone sensors using only traditional building anchors. Figure 14 shows the comparison between our proposed DynamicSLAM system and the SemanticSLAM system under different building anchors densities. The figure shows that reducing the building anchor densities affects the SemanticSLAM system accuracy while DynamicSLAM maintains its accuracy. This is due to the fact that DynamicSLAM leverages the users in the environment as anchors. Hence, its performance is less affected by the building anchors density.
Figure 14 also shows the CDF of the localization accuracy for both systems and Table III summarizes the results. DynamicSLAM achieves a median distance error around 1.1m, which is better than SemanticSLAM by 55%. Moreover, DynamicSLAM reduces the worst case error by 29% and enhances all other quantiles. This is due to the new human-based mobile anchors that provide ample opportunities for error resetting as illustrated in Fig 16.
|10cmSemanticSLAM [abdelnasser2016semanticslam]||1.2 (-170% )||1.7 (-54%)||2.6 (-52%)||3.0 (-30%)||6.6 (-29%)|
5.4 Energy Footprint
DynamicSLAM harnesses inertial sensors, WiFi, and Bluetooth for its operation. The inertial sensors are always running to detect the phone orientation are used in a variety of mobile applications. Therefore, using them consumes virtually zero extra energy. Similarly, many users turn-on WiFi indoors to connect the Internet. Thus, as many sensors are activated by default for other purposes, this curbs the overall power consumption of DynamicSLAM. Moreover, to further enhance DynamicSLAM energy footprint, we adopt an adaptive sensing strategy where the sensors sampling rate and their activation are based on the nearby anchor density. For example, if a certain area with high anchor density, DynamicSLAM can decrease the sensor sampling rate as the location error will be frequently calibrated by these anchors. Additionally, Bluetooth and WiFi can be turned off if the server predicts that the probability of different users’ encounter is small. Figure 16 compares the power consumption (calculated using the PowerTutor [yang2012powertutor]) of DynamicSLAM against the GPS (at 2Hz) (used as a baseline).
6 Related Work
6.1 Indoor Localization
Mobile phone localization has been well-studied in literature [shokry2017tale, park2010growing, jin2011robust, wang2012no, shokry2018deeploc, rizk2018cellindeep]. The most ubiquitous indoor localization techniques are either WiFi-based or dead-reckoning based. WiFi-based techniques [bahl2000radar, lamarca2005place], require calibration to create a prior “RF map” for the building to counter the wireless channel noise. This calibration process is time consuming, tedious, and requires periodic updates. This led to the emergence of new calibration-free techniques [elbakly2016robust, park2010growing]. These techniques, however, suffer from low location accuracy or requiring explicit user intervention. In dead-reckoning based system [jin2011robust, wang2012no], the inertial sensors on mobile phones are leveraged to dead-reckon the user starting from a reference point. Nevertheless, dead-reckoning error quickly accumulates with time; leading to a complete deviation from the actual path. Therefore, many techniques have been proposed to reset the dead-reckoning error including snapping to environment anchor points, such as elevators and stairs [wang2012no, abdelnasser2016semanticslam] and matching with the map information [rai2012zee]. DynamicSLAM combines dead-reckoning as its motion model with building anchors and its novel human-based anchors, as observation models, that are ubiquitously in any building gives it an ample opportunities to reset the dead-reckoning error.
6.2 Encounter-based Localization
Recent work has explored more social aspects of indoor localization named encounters-based approaches [jun2013social, martin2014social]. The basic idea behind these approaches is to leverage the useful proximity information between users to enhance the localization accuracy. SocialLoc [jun2013social] utilises users’ meeting or missing information to correct localization errors. A user may intersects with another user, which is defined as an encounter. If the user does not meet another specific user during the localization process, such an event is defined as non-encounter. SocialLoc utilizes these encounter and non-encounter events to cooperatively correct the localization errors. SocialSpring [martin2014social] is a general system that sits on top of an indoor positioning system. It leverages the pairwise distance measurements between the different users in the environment to refine position estimates.
DynamicSLAM, on the other hand, combines building anchors with human-based anchors (users’ encounter) in a collaborative SLAM framework to enhance the localization accuracy.
6.3 Simultaneous Localization and Mapping
SLAM [durrant2006simultaneous] is a well-known technique in the robotics domain where robot motion and observation models (e.g., odometers, laser, ultrasound) are cost-prohibitive for smartphones. Recently, a number of SLAM-based indoor localization systems have been proposed for tracking human users. For instance, WiFi-SLAM [ferris2007wifi] builds a WiFi map based on a Gaussian Process Latent Variable Model; which allows for detecting the latent-space locations of unlabeled signal strength data. ActionSLAM [hardegger2012actionslam] leverages a dedicated foot-mounted inertial sensor to track the user’s motion while using actions uniquely related to certain locations as anchors for the user’s location correction. Recently, SemanticSLAM [abdelnasser2016semanticslam], generalizes the SLAM concept to work with smartphones without using any external hardware. It harnesses smartphone inertial sensors to dead-reckon the user’s location and leverages anchor points (elevator, escalator, stairs), which can be uniquely detected by phone sensors for the accumulated error resetting. These systems, however, either require special hardware or are dependent on a limited set of anchors that are not ubiquitously available in all buildings. DynamicSLAM, on the other hand, combines building anchors with the newly introduced human-based anchors that are ubiquitously in any environment.
6.4 Multi-Robot SLAM
Multi-Robot SLAM is an extension for SLAM that can benefit from having multiple robots in many places to carry out diverse tasks at the same time [howard2006multi, williams2002towards, fenwick2002cooperative, birk2006merging, thrun2005multi, ko2003practical, zhou2006multi, howard2004multi]. The typical ultimate target of Multi-Robot SLAM systems is to construct the joint (common) map of an unknown environment, i.e. a virtual map of the anchors the robots see. For this, they use the traditional/static building anchors to reset the robots’ accumulated error. For instance the work in [howard2006multi], uses robots’ specific sensors (laser range-finders) to detect robot pairs encounter. Then, by determining their relative position using laser sensors, the system combines the observations from both robots into the common map. Hence, it indirectly refines robots’ locations by obtaining a more accurate common map.
DynamicSLAM, on the other hand, does enhance the environment map (traditional anchors) using a similar approach to multi-robot SLAM. In addition, we also leverage the other users (other robots in multi-robot SLAM) as a new type of anchors. Specifically, we leverage other users (robots) as a temporary anchor to directly refine users’ positions and reset accumulated error. This leads to increased anchor density and hence better localization accuracy as quantified in the evaluation section.
We presented DynamicSLAM: an accurate, low-overhead, and collaborative simultaneous localization and mapping system. DynamicSLAM tracks the user’s location using dead-reckoning and leverages physical anchors existing in buildings as well a novel human-based dynamic anchors to reset the accumulated localization error in a unified probabilistic framework. A number of methods to detect users’ encounters and to handle their relative distance measurement error have also been proposed. Furthermore, we presented a theoretical proof of system convergence as well as the human anchors ability to reset the accumulated error.
Evaluation of DynamicSLAM in a typical university building using different Android phones shows that it can achieve a median distance error of 1.1m, which outperforms other state-of-the-art indoor localization techniques by approximately 55%, highlighting its promise for ubiquitous indoor localization.