Log In Sign Up

Robust Gaussian Process Regression for Real-Time High Precision GPS Signal Enhancement

by   Ming Lin, et al.

Satellite-based positioning system such as GPS often suffers from large amount of noise that degrades the positioning accuracy dramatically especially in real-time applications. In this work, we consider a data-mining approach to enhance the GPS signal. We build a large-scale high precision GPS receiver grid system to collect real-time GPS signals for training. The Gaussian Process (GP) regression is chosen to model the vertical Total Electron Content (vTEC) distribution of the ionosphere of the Earth. Our experiments show that the noise in the real-time GPS signals often exceeds the breakdown point of the conventional robust regression methods resulting in sub-optimal system performance. We propose a three-step approach to address this challenge. In the first step we perform a set of signal validity tests to separate the signals into clean and dirty groups. In the second step, we train an initial model on the clean signals and then reweigting the dirty signals based on the residual error. A final model is retrained on both the clean signals and the reweighted dirty signals. In the theoretical analysis, we prove that the proposed three-step approach is able to tolerate much higher noise level than the vanilla robust regression methods if two reweighting rules are followed. We validate the superiority of the proposed method in our real-time high precision positioning system against several popular state-of-the-art robust regression methods. Our method achieves centimeter positioning accuracy in the benchmark region with probability 78.4% , outperforming the second best baseline method by a margin of 8.3%. The benchmark takes 6 hours on 20,000 CPU cores or 14 years on a single CPU.


page 1

page 2

page 3

page 4


Emulating satellite drag from large simulation experiments

Obtaining accurate estimates of satellite drag coefficients in low Earth...

High Precision Indoor Navigation for Autonomous Vehicles

Autonomous driving is an important trend of the automotive industry. The...

Improving GPS/INS Integration through Neural Networks

The Global Positioning Systems (GPS) and Inertial Navigation System (INS...

Which Factorization Machine Modeling is Better: A Theoretical Answer with Optimal Guarantee

Factorization machine (FM) is a popular machine learning model to captur...

OCR-RTPS: An OCR-based real-time positioning system for the valet parking

Obtaining the position of ego-vehicle is a crucial prerequisite for auto...

Scaling Multidimensional Inference for Structured Gaussian Processes

Exact Gaussian Process (GP) regression has O(N^3) runtime for data size ...

Matrix Completion Methods for the Total Electron Content Video Reconstruction

The total electron content (TEC) maps can be used to estimate the signal...

1 Introduction

Real-time high precision positioning service is a critical core component in modern AI-driven industries, such as self-driving cars, autonomous drones and so on. The Global Positioning System (GPS) is inarguably the most popular, if not the only for now, satellite-based positioning system accessible to public all around the world. The major performance index of GPS based system is the positioning accuracy. When using civilian level smart phones with GPS enabled, the horizontal positioning accuracy is around 5 meters. Even equipped with high quality, single frequency GPS receiver, the horizontal accuracy is typically within 1.891 meters with 95% probability111Data source While it might be sufficient for daily usage, the meter level accuracy is insufficient for many modern autonomous applications. Previously a popular method to achieve the centimeter accuracy is to fix the GPS receiver at one point for several hours. This method is clearly infeasible in mobile applications. How to achieve centimeter accuracy in real-time is therefore an open challenge in this domain.

In this work, we develop a data mining approach to address the above challenge. The majority of the noise in GPS signal is the signal delay caused by the ionosphere which can be described by the vertical Total Electron Content (vTEC). To eliminate this delay, we build a grid system where each node in the grid is a ground station equipped with high precision multiple frequency GPS receiver. A Gaussian Process model (GP) is learned to predict the vTEC value for any given geographic coordinate. When a mobile client requests positioning service, its GPS signal is calibrated by our GP model every second.

A critical problem in the model training step is the large amount of noisy data. As validated in our experiment, there are 20%

40% outliers in the GPS signal received by our grid system. Directly applying off-the-shelf robust regression methods cannot achieve the required performance because existing methods are either non-consistent or are not suitable to tolerate high corruption rate. To overcome this difficulty, we develop a screening algorithm to detect outliers and split the received signal into clean and noisy subsets. The GP model is then trained on the clean dataset.

However, only training on the clean dataset has an obvious drawback. The screening algorithm is not always reliable. It often over-kills clean data. If the dirty dataset is completely discarded after screening, we cannot collect sufficient clean data for robust prediction. On the other hand, each data point is collected from an expensive high precision GPS receiver therefore simply discarding the noisy dataset is a great waste. In the worst case nearly 40% data points will be marked as noisy in one day. This number is even higher at noon in summer which results in a low positioning accuracy or even interruption of our service.

In order to address the above problem, we formulate the problem as a robust regression problem. We are considering a scenario where we are provided a clean dataset and a noisy dataset by an unreliable screening algorithm. The noise in the clean dataset is sub-gaussian while the noise in the noisy dataset is heavy-tailed. The volume of the noisy dataset might be infinite. We aim to design an efficient and robust algorithm to boost the model performance by learning on both clean and noisy datasets. Under this setting, we find that existing robust regression methods cannot be applied directly due to their small breakdown points or due to their inconsistency. To this end, we propose a three-step algorithm. Our algorithm first learns an initial estimator only using the clean dataset. Then we apply the estimator on the noisy dataset to filter out those of large residual error. The remaining noisy data points are reweighted according to their residual error and finally a robust estimator is retrained on the reweighted noisy data points in addition to the clean dataset together. We call this approach as Filter-Reweight-Retrain (FRR). Our theoretical analysis shows that the three steps in FRR is

not only sufficient but actually necessary

. The filtering step truncates the noise level in the noisy dataset. The reweighting step reduces the variance of the noise. When the volume of the noisy dataset is infinite, FRR is consistent which means that it achieves zero recovery error. When the volume of the noisy dataset is finite, a reweighting scheme is proposed to improve the asymptotic recovery error bound. While many previous robust regression methods involve iterative retraining and reweighting, we show that FRR does not need multiple iterations. In fact, our analysis suggests that designing an effective reweighting-retraining scheme is non-trivial due to the risk of over-fitting. We give two general rules to avoid over-fitting in the reweighting step.

The remainder of this work is organized as following. In Section 2 we briefly survey related works. Section 3 introduces some background knowledge of our GPS positioning system and our main algorithm, robust Gaussian process regression with FRR. Theoretical analysis is given in Section 4. We demonstrate the performance of our method in real-time GPS positioning system in Section 5. Section 6 encloses this work.

2 Related Work

The vTEC contributes the majority of noise in the GPS signal received by ground stations. Conventional ionosphere scientific researches focus on the static estimation of vTEC. For example, Sardon et al. (1994); Mannucci et al. (1998)

used Kalman filter to vTEC based on GPS signals received by multiple ground stations.

Arikan et al. (2003) proposed a high pass penalty regularizer to smooth the estimated vTEC values. Mukesh (2019) described the vTEC distribution by an ordinary kriging (OK)-based surrogate model. The real-time vTEC estimation was considered in Huang and Reinisch (2001). In Renga et al. (2018), the authors proposed a Linear Thin Shell model to better describe the horizontal variation of the vTEC distribution in real-time. Several researches introduce multiple receivers to jointly estimate the vTEC. In Otsuka et al. (2002), the authors used over 1000 dual frequency receivers to construct a large-scale vTEC map over Japan. Zhang et al. (2018) used low-cost receivers jointly to improve the vTEC estimation quality. Comparing to previous researches, we use robust GP regression to model the vTEC distribution. The GPS signal is collected from a few hundreds of ground stations in a given region. With modern hardware and our new algorithm, we are able to report the centimeter positioning accuracy in real-time.

In regard to the robust regression, we briefly survey recent works closely related to ours. As there is a vast amount of robust regression literature, we refer to Morgenthaler (2007) for a more comprehensive review.

When comparing robust regression methods, we usually consider the performance of an algorithm from three aspects: breakdown point, consistency and corruption model. The breakdown point is a real number which measures the maximum ratio of corruption data the algorithm can tolerance. A more robust method should have a larger breakdown point. The consistency tells whether the algorithm is unbiased. An unbiased algorithm should achieve zero recovery error when provided with infinite training data. The corruption model makes assumptions about the noise. The conventional setting assumes that the training data is randomly corrupted by arbitrary large noise. A more stronger corruption model is the oblivious adversarial model which assumes that the attacker can add noise at arbitrary location without looking at the data. The most difficult model is the adaptive adversarial model. In this model, the attacker is able to review the data and then adaptively corrupts the data.

Ideally, we wish to find a robust algorithm with breakdown point asymptotically convergent to while being consistent under adaptive adversarial corruption. However, it is impossible to satisfy all the three requirements Diakonikolas et al. (2018). A popular approach is to use

-norm in the loss function. The

-norm based methods usually have breakdown point asymptotically convergent to 1 but inconsistent Wright and Ma (2010); Nguyen and Tran (2013). Dalalyan and Chen (2012) proposed a second order cone programming method (SOCP). The SOCP has breakdown point asymptotically convergent to but is inconsistent. McWilliams et al. (2014) proposed to use weighted subsampling and then iteratively retrain the model. Its breakdown point is and the algorithm is known to be inconsistent. Recently, a branch of hard iterative thresholding (HIT) based methods are proven to be more efficient. Chen et al. (2013) shows that HIT has breakdown point on order of ). Bhatia et al. (2015) proposed a variant of HIT with breakdown point convergent to but inconsistent. The first consistent HIT algorithm is proposed by Bhatia et al. (2017) with breakdown point . The corruption models in the above works are either oblivious or adaptive.

Comparing to the above works, our setting is new where we are given two datasets to train our model rather than one mixed with clean and corrupted data. In our setting, the volume of the noisy dataset could be infinite which means the breakdown point of our algorithm is asymptotically convergent to 1. Our algorithm is consistent when the volume of the noisy dataset is infinite.

3 Our System and Algorithm

This section consists of two subsections. In subsection 3.1, we describe our Global Navigation Satellite System (GNSS) behind the scene. We give our FRR algorithm in subsection 3.2.

3.1 Global Navigation Satellite System

(a) (b) (c)
Figure 1: GNSS Background

A global navigation satellite system (GNSS) is a satellite navigation system that provides positioning service with global coverage, including GPS from the United States, GLONASS from Russia, BeiDou from China and Galileo from European Union. In GNSS, the satellite positions are known. With high precision distance measurement between user and more than 4 satellites, the user’s location can be determined Hofmann-Wellenhof et al. (2012). The distance is calculated from the time interval between signal emitting from the satellite and signal receiving by the user, times the speed of microwave transmission through the path from the satellite to the user.

In a high precision positioning system, the major error terms along the path of microwave transmission are shown in Figure 1(a). The pseudorange, a measure of distance between the satellite and the receiver, is obtained through the correlation of the modulated code in the received signal from the satellite with the same code generated in the receiver. The pseudorange is affected by the following factors:

  • The Euclidean distance between satellite position at signal emission and the receiver at the signal reception.

  • Offsets of receiver clock and satellite clock. They are the clock synchronism errors of the receiver and satellite referring to GNSS time. They are associated with the specific receiver or satellite.

  • Relativistic correction which can be modeled very well.

  • Tropospheric delay which is frequency independent.

  • Ionospheric delay which is frequency dependent.

  • Instrumental delay which is frequency dependent and can be model very well.

  • Multi-path effect. This error can be minimized by high quality antenna.

Most of errors like clock error and instrumental delay can be mitigated by double differencing and multiple frequencies Pajares et al. (2005). The computation of double difference is shown in Figure 1(b). Biases due to orbit and atmosphere are significantly reduced due to the similar path of transmissions. As the ionospheric delays are spatially correlated, the between-receiver differenced ionospheric bias is much less than its absolute values.

After double difference, the residual tropospheric delays are rather small since the tropospheric delays can be at least 90% corrected using the empirical model Collins and Langley (1997). The remaining error is dominated by the modeling of the ionosphere. The ionosphere is in the Earth’s upper atmosphere. It can be modeled as a thin layer of ionized electrons at 350 kilometers above sea level Klobuchar (1987). The vertical Total Electron Content (vTEC) is an important descriptive parameter of the ionosphere. It integrates the total number of electrons along a perpendicular path through the ionosphere. The additional phase and group delay of microwave signal accumulated during transmission through ionosphere are proportional to the vTEC. The vTEC is impacted by the nearly unpredictable solar flare which brings inherently high temporal and spatial fluctuations Meza et al. (2009).

To estimate the vTEC, we use machine learning to learn the vTEC distribution from double differences received in real-time. However due to the difficulties in signal resolving, the quality of the received data varies a lot. To understand this, we must understand how the signal is resolved. First we solve the double differential equations with both code and phase using wide-lane combination

Teunissen (1998). The equations are solved with Kalman filter to get floating ambiguity Teunissen (2003). Then the narrow-lane float ambiguity is derived from ionosphere-free and wide-lane combination. The integer phase ambiguity can be found by Least-squares AMBiguity Decorrelation Adjustment (LAMBDA) method Teunissen (1995). After finding the integer ambiguity, solve the double differential phase equations and get the base-line Ionospheric delays which are frequency dependent and the Tropospheric delay which are frequency independent.

However, the LAMBDA method does not always find the optimal integer solutions. We use Closed Loop and n-sigma outlier tests to check if the solved base-lines are self-consistent as shown in Figure 1(c). We mark the data point as clean if it passes the tests otherwise we mark it as noisy/corrupted. The two tests work well when the ionosphere is quiet. In summer or in equatorial region, ionosphere fluctuates violently and the LAMBDA method does not work well. The low quality of integer solutions found by LAMBDA drives the Closed Loop and n-sigma test to over-kill and under-kill data, resulting in large noise and missing data.

3.2 Filter-Reweight-Retrain Algorithm For GPS Signal Enhancement

0:  Clean set , noisy set , truncation threshold and reweighting function in Eq. (1).
0:  Robust estimation .
1:  .
2:  , .
3:  .
4:   if otherwise .
5:  Retrain weighted least square
6:  Output: .
Algorithm 1 Filter-Reweight-Retrain Meta Algorithm

The FRR is applicable to any base estimator that can generate decision value. For sake of simplicity, we restrict ourselves in dimensional linear space. Following the standard settings, suppose that the data point

are sampled from sub-gaussian distribution such that

where is a universal constant, . The label is a real number. We are given two datasets consisting of pairs, the clean set and the noisy set . Both sets are independently and identically (i.i.d.) sampled. In the clean set , we assume that

The noise term is i.i.d sampled from sub-gaussian distribution with mean zero, variance proxy and boundary . In the noisy set , the observed label is corrupted by the noise model

The noise term is i.i.d. sampled from heavy-tailed distribution. It is important to emphasize that even the mean value of may not exist therefore directly learning on is impossible. Without loss of generality, we assume that the distribution of

is symmetric. Otherwise we add a bias term in the linear regression model to capture the offset. The volume of the clean set

is denoted as and the volume of the noisy set . is the noise ratio of the full dataset. We choose linear least square regression as our base estimator defined by

The recovery error is used as performance index in our theoretical analysis.

Our FRR algorithm is detailed in Algorithm 1. First we train an initial estimator with the clean dataset . The residual error is computed for every data point in the noisy dataset and then any with residual error larger than the threshold is filtered out. The noisy dataset after filtering is denoted as . We assign instance weight for each . Finally we retrain the weighted least square to get the robust estimation .

The reader might note that we have not specify how to choose and in Algorithm 1. It is possible to adaptively choose these parameters such that the recovery error bound given in Theorem 3 (Section 4) is minimized. . When the volume of the noisy set is infinite, we can simply choose sufficiently large and . Theorem 3 guarantees that the recovery error is asymptotically zero in this case. However when the volume of the noisy set is finite, finding the optimal parameters requires more efforts. In practice we suggest to choose


where are tuned by cross-validation. The reader is free to design any filtering and reweighting scheme if he/she has more prior knowledge about the noise distribution. However, several rules must be followed when designing the filtering and reweighting scheme. Please check the discussion under Theorem 3 to see why those rules are important to the success of FRR.

Remark 1

Although Algorithm 1

only concerns about linear estimator and least square loss function, the FRR is essentially a meta algorithm applicable to non-linear kernel machines and general convex loss function, for example Gaussian process regression. The feature vector

could be replaced by where is the feature map function induced by the kernel function and the inner product could be replaced by the corresponding inner product defined on the Reproducing Hilbert Kernel Space (RHKS). However in this generalized formulation, our proof should be modified respectively. For example we should replace matrix concentration with Talagrand’s concentration. To keep things intuitive without loss of generality, we will focus on the linear model in our theoretical analysis.

4 Theoretical Analysis

In this section, we give the statistical learning guarantees of FRR. Our main result is given in Theorem 3 which claims the recovery error bound. Two non-trivial rules for effective filtering and reweighting are derived from Theorem 3.

First we show that the initial estimator trained on clean dataset is not far away from the ground-truth . This is a well-known result in linear regression. The proof is given in Appendix A.

Lemma 1.

In Algorithm 1, with probability at least ,

provided that .

Lemma 1 shows that the initial estimator will not be far away from as long as the clean dataset is sufficiently large. The FRR only requires the volume of the clean dataset no less than . The major challenge is how to retrain on the noisy dataset. Recall that is heavy-tailed so we cannot bound any concentration directly on . Before we can do any learning on the noisy dataset, we must bound the variance of . To this end, the FRR uses to truncate via instance filtering. After the filtering step on line 3 in Algorithm 1, for any we have the following lemma. The proof is given in Appendix B.

Lemma 2.

In Algorithm 1, with probability at least , we have

Lemma 2 shows that when we take , we can truncate the noise level on below . It is important to note that in the proof of Lemma 2, we use the independence of and . This indicates that one cannot retrain on twice. Actually in the second round of retraining, since previous depends on , the bound will degrade to which requires . However, if the clean dataset is larger than , the initial estimator is already good enough and no retraining is necessary at all. In the experiment, we find that iteratively retraining indeed hurts the performance.

After filtering, we get a bounded noise distribution on making learning possible. Suppose after filtering, we have . The sample weights are computed via . The following main theorem bounds the recovery error after retraining.

Theorem 3.

In Algorithm 3, denote , . are weights of elements in , . are sub-gaussian random vectors with bounded norm where is a universal constant, . Define

Choose such that are bounded and . Then with probability at least ,




Denote , , , , . is the diagonal function.

According to Algorithm 1,

The derivation is

Similar to Lemma 1,

To ensure the matrix inversion exists, we need to ensure that is invertible as is a symmetric positive semi-definite matrix. According to Lemma 1, when

we have with probability at least ,


is the smallest eigenvalue.

To bound ,

Please note that and are not independent so we cannot simply write . As are symmetric PSD matrices,

Next we bound the concentration of . Applying matrix Bernstein’s inequality, with probability at least ,



Therefore with probability at least :

provided . To ensure the lower bound is meaningful, we must constrain

Combining all above together, we have with probability at least ,


To bound , according to Lemma 1, with probability at least ,

provided .

To bound ,

According to the assumption, the noise distribution is assumed to be symmetric,

In order to apply matrix Bernstein’s inequality again,

Therefore with probability at least ,


We then get

Combining all together, with probability at least ,


Theorem 3 is our main result. It delivers several important messages. Roughly speaking, the recovery error of FRR is bounded by . The choice of is critical in our analysis. If depends on , we cannot simply take in the proof therefore there is no closed-form tight upper bound as in Theorem 3. The constants are defined to capture such dependency. In the simplest scenario where is a fixed constant independent from , the recovery error of FRR is then bounded by where is the variance of truncated noise in . Therefore for a fixed reweighting function , the recovery error will always decrease when gets larger and controlled by increases at a lower rate. If we have prior knowledge about the distribution of noise term , we could choose to minimize the above error bound.

An obvious problem of the constant reweighting scheme is that it weights instances of large noise and small noise equally. Observing that the noise variance is multiplied by , it seems reasonable to assign small weights for instances with large residual error , leading to a better (but actually incorrect) recovery error bound . We argue that the analysis of adaptive reweighting scheme is more complex than the constant case. When depends on , one cannot simply apply the expectation equality and concentration inequality in the same way. A counter example is to select one instance with and then set , for . This is equal to retrain the model using the -th instance along. Clearly the recovery error bound is not bounded by . This counter example shows that there are some rules we must follow when designing adaptive reweighting scheme. Based on the proof of Theorem 3, we summarize the following two rules:

  • The reweighting function should depend on only. That is, must not refer to any other . This rule ensures that are independent to each other so that is concentrated around its mean value. Particularly, this rule prohibits jointly and iteratively optimizing .

  • The maximum of should be bounded. This rule prevents the reweighting step assigns large weight for one single instance.

Please be advised that the above two rules are derived from the concentration inequalities we used in our proof. They are imposed by the nature of our learning problem rather than the weakness of FRR. Informally speaking, it is discouraged to tune too aggressively. In practice, we suggest to choose a base weight for all instances in and then decay the weight a bit proportional to . We find that with is a good choice for our problem. It is easy to verify that this choice satisfies the above two rules.

5 Experiment

Dataset # clean data # noisy data noisy ratio
TrainDay1 17751091 4288037 19.5%
TrainDay2 17443691 4619126 20.9%
TrainDay3 16198561 5851525 26.5%
TrainDay4 17091968 5042280 22.8%
TrainDay5 15807494 6305344 28.5%
TestDay1 16292229 5605534 25.6%
TestDay2 15048558 6983814 31.7%
TestDay3 14616549 7327788 33.4%
TestDay4 15191055 6849075 31.1%
TestDay5 13616039 8374171 38.1%
Table 1: Dataset Statistics
(a) TestDay1 (b) TestDay2 (c) TestDay3
(d) TestDay4 (e) TestDay5 (f) Average
Figure 2: RTK Ratio Comparison

We are using a Network-based Real-time Kinematic (RTK) carrier-phase ground-based augmentation system to achieve centimeter-level positioning service. A triangular mesh grid of 20 satellite stations spaced by 75 kilometers are distributed over 100 000 square kilometers area in an anonymous region. They can receive and process signals of the satellites from all 4 major GNSS systems. All stations are equipped with dual-frequency receivers. When satellites’ orbit above horizon with elevation angle larger than 15 degrees, each receiver-satellite pair forms a Pierce Point on the ionosphere. Their data will be sent to a central processing facility hosted on the cloud to fit the model for vTEC pseudorange compensation.

We collected a dataset consisting of 10 consecutive days for experiment. We use the first 5 days as training set and the last 5 days as testing set. Due to data anonymity, we mark the training dates from TrainDay1 to TrainDay5 and the testing dates from TestDay1 to TestDay5. The dataset statistics are summarized in Table 1. The second and the third columns are the number of clean and noisy data points respectively (See Subsection 3.1 for our algorithm to detect the noisy data points). The last column reports the ratio of noisy data points with respect to the total number of data points collected in one day. As shown in Table 1, around 30% data points will be filtered out by our noisy data screening algorithm. This is a great waste if no robust method is used to recycle the noisy data points. On the other hand, simply using all data points is harmful as we will show shortly.

It is important to note that the definition of training/testing in our problem does not follow the convention. We choose one ground station as the testing station and use its measured signal as ground-truth. Around this testing station, we choose 16 ground stations as training stations. The experiment consists of two stages: the offline parameter tuning stage and the online prediction stage. In the offline parameter tuning stage, we use the training set to tune parameters of our model and the best model parameter is selected. In the online prediction stage, we apply the best model parameter to the testing set. In both offline and online stages, the performance of the model is evaluated as following. In every second, we retrieve new data points from our grid system and train the model on the training stations. Then the learned model is applied to the testing station to prediction the double difference for any quadruplet pierce point related to the testing station. To avoid over-fitting, the quadruplet pierce points containing the testing station are excluded in the offline stage.

To evaluate performance, we use RTK ratio as index, the higher the better. The RTK ratio is the probability of successful positioning while the positioning error is less than one centimeter. The RTK ratio largely depends on the double differences we predicted for the testing station.

As our system is a real-time online system, we require that the total time cost of model training and prediction cannot exceed 300 milliseconds per second frame. This excludes many time consuming models such as deep neural network. After benchmarking, we adopt Gaussian Process (GP) as our base estimator as it achieves the best RTK ratio among other popular models. In our GP model, the vTEC for each pierce point is denoted as

where . The matrix encodes the double-difference quadruplet: each row of has exact two “1” entries and two “-1” entries where the corresponding ionosphere pierce point is involved in the double-difference calculation. Each ionosphere pierce point is presented as a four dimensional vector . encode the latitude and the longitude of the pierce point. is the zenith angle and is the azimuth angle. Define kernel function

where is a diagonal kernel parameter matrix. The kernel matrix induced by is where . Denote the double-difference value. Our GP model aims to minimize the loss function


In every second, Eq. (3) is solved using the data points collected on the training stations. To predict on the testing station, we predict the vTEC for each pierce point on the testing station by

The double-difference value is followed immediately by where is the quadruplet matrix of the testing station.

In the offline stage, we tune model parameters on the 5 day training data. The kernel parameter and are tuned in range . and are tuned in range . is tuned in range . To apply our algorithm, we first train Eq. (3) on clean dataset. Then we apply the learned model on noisy dataset to predict and compute the absolute residual . For tuned in range , if , the -th noisy data point is reserved in the retraining step otherwise is discarded. We tune the reweighting parameter in range and in range . The model Eq. (3) is retrained using the filtered noisy dataset and the clean dataset together.

For comparison, we considered the following four baseline methods due to their popularity in robust regression literature: Hard Iterative Thresholding (HIT) Bhatia et al. (2017), Huber Regressor (Huber) Huber (1992), Iteratively Reweighted Least Squares (IRLS) Holland and Welsch (1977), Random Sample Consensus (RANSAC) Fischler and Bolles (1981). Due to the computational time constraint, for algorithms requiring iterative retraining, the retraining is limited to 5 trials. For HIT, in each iteration we keep percentage data points. That is, we always select the percentage data points with the smallest residual error for next iteration. is tuned in range . The Huber uses loss for small residual error and loss if the residual error is larger than a threshold. We tune this threshold in range . For IRLS, we replace the loss in Eq. (3) with norm. For RANSAC, in each iteration we keep percentage data points similarly as in HIT.

In offline tuning stage, for all methods we randomly select model parameters uniformly in the tuning range and evaluate the RTK ratio. The procedure is repeated 1000 times and the model with the highest RTK ratio on training set is selected. The parameter tuning is carried on a cluster with 20,000 CPU cores. Each method takes about 6 wall-clock hours (13.9 CPU years) to find the best parameters. In Figure 2, we report the RTK ratio of each method with the selected parameters on 5 testing days. For each day we report the averaged RTK ratio over

seconds. The last figure reports the averaged RTK ratio over all 5 days. The 95% confidence intervals of all methods are within


The numerical results show that directly training on the full dataset could be harmful as the dataset contains lots of outliers — even a robust estimator is used. We see that all four baseline methods will decrease the RTK ratio dramatically. This is not surprising as most consistent robust methods require the noisy ratio small enough. In our dataset, the noisy data could take up as many as 30% of the total data points, a number too large for most conventional robust regression methods. The HIT achieves the best results among the four baseline methods. This is interesting because the best provable breakdown point of HIT is far below 1%. Our experiment shows that HIT is able to tolerate much higher breakdown point in real-world problems. The fact that Huber does not work well might be due to the loss used in the Huber regression. Although loss is more robust than loss, it is still affected by outliers. Therefore, when the proportion of the outliers is as large as 30%, loss cannot survive. The RTK ratio of RANSAC is barely above 50%. RANSAC requires lots of trials of random sampling in order to work. However in our online system, we are only able to perform the random sampling no more than 5 trials which makes RANSAC vulnerable. IRLS performs poorly with RTK ratio less than half of FRR. This indicates that IRLS is sensitive to outliers, as we showed in the discussion of Theorem 4.

6 Conclusion

We propose to estimate the vTEC distribution in real-time by Gaussian process regression with a three-step Filter-Reweight-Retrain algorithm to enhance its robustness. We prove that the FRR is consistent and has breakdown point asymptotically convergent to 1. We apply this new method in our real-time high precision satellite-based positioning system to improve the RTK ratio of our baseline Gaussian process model. By large-scale numerical experiments, we demonstrate the superiority of the proposed FRR against several state-of-the-art methods. The FRR is more robust and accurate than conventional methods especially in our system where conventional methods are fragile due to the high corruption rate in GPS signal.


We acknowledge Qianxun Spatial Intelligence Inc. China for providing its high precision GPS devices and the grid computing system used in this work.


  • Arikan et al. [2003] F Arikan, CB Erol, and O Arikan. Regularized estimation of vertical total electron content from global positioning system data. Journal of Geophysical Research: Space Physics, 108(A12), 2003.
  • Bhatia et al. [2015] Kush Bhatia, Prateek Jain, and Purushottam Kar. Robust regression via hard thresholding. In Advances in Neural Information Processing Systems, pages 721–729, 2015.
  • Bhatia et al. [2017] Kush Bhatia, Prateek Jain, Parameswaran Kamalaruban, and Purushottam Kar. Consistent robust regression. In Advances in Neural Information Processing Systems, pages 2110–2119. 2017.
  • Chen et al. [2013] Yudong Chen, Constantine Caramanis, and Shie Mannor. Robust sparse regression under adversarial corruption. In Proceedings of the 30th International Conference on Machine Learning, volume 28, pages 774–782, 2013.
  • Collins and Langley [1997] J Paul Collins and Richard Brian Langley. A tropospheric delay model for the user of the wide area augmentation system. Department of Geodesy and Geomatics Engineering, University of New Brunswick Fredericton, 1997.
  • Dalalyan and Chen [2012] Arnak Dalalyan and Yin Chen. Fused sparsity and robust estimation for linear models with unknown variance. In Advances in Neural Information Processing Systems, pages 1259–1267, 2012.
  • Diakonikolas et al. [2018] Ilias Diakonikolas, Weihao Kong, and Alistair Stewart. Efficient algorithms and lower bounds for robust linear regression. arXiv preprint arXiv:1806.00040, 2018.
  • Fischler and Bolles [1981] Martin A. Fischler and Robert C. Bolles. Random sample consensus: A paradigm for model fitting with applications to image analysis and automated cartography. Commun. ACM, 24(6):381–395, 1981.
  • Hofmann-Wellenhof et al. [2012] Bernhard Hofmann-Wellenhof, Herbert Lichtenegger, and James Collins. Global positioning system: theory and practice. Springer Science & Business Media, 2012.
  • Holland and Welsch [1977] Paul W Holland and Roy E Welsch. Robust regression using iteratively reweighted least-squares. Communications in Statistics-theory and Methods, 6(9):813–827, 1977.
  • Huang and Reinisch [2001] Xueqin Huang and Bodo W Reinisch. Vertical electron content from ionograms in real time. Radio Science, 36(2):335–342, 2001.
  • Huber [1992] Peter J. Huber. Robust Estimation of a Location Parameter, pages 492–518. Springer New York, New York, NY, 1992.
  • Klobuchar [1987] J. A. Klobuchar. Ionospheric time-delay algorithm for single-frequency gps users. IEEE Transactions on Aerospace and Electronic Systems, AES-23(3):325–331, 1987.
  • Mannucci et al. [1998] AJ Mannucci, BD Wilson, DN Yuan, CH Ho, UJ Lindqwister, and TF Runge. A global mapping technique for gps-derived ionospheric total electron content measurements. Radio science, 33(3):565–582, 1998.
  • McWilliams et al. [2014] Brian McWilliams, Gabriel Krummenacher, Mario Lucic, and Joachim M Buhmann. Fast and robust least squares estimation in corrupted linear models. In Advances in Neural Information Processing Systems, pages 415–423, 2014.
  • Meza et al. [2009] A. Meza, M.A. Van Zele, and M. Rovira. Solar flare effect on the geomagnetic field and ionosphere. Journal of Atmospheric and Solar-Terrestrial Physics, 71(12):1322 – 1332, 2009.
  • Morgenthaler [2007] Stephan Morgenthaler. A survey of robust statistics. Statistical Methods and Applications, 15(3):271–293, 2007.
  • Mukesh [2019] P.and Karthikeyan V.and Sindhu P. Mukesh, R.and Soma. Prediction of ionospheric vertical total electron content from gps data using ordinary kriging-based surrogate model. Astrophysics and Space Science, 364(1):15, Jan 2019.
  • Nguyen and Tran [2013] N. H. Nguyen and T. D. Tran. Exact recoverability from dense corrupted observations via-minimization. IEEE Transactions on Information Theory, 59(4):2017–2035, 2013.
  • Otsuka et al. [2002] Y. Otsuka, T. Ogawa, A. Saito, T. Tsugawa, S. Fukao, and S. Miyazaki. A new technique for mapping of total electron content using gps network in japan. Earth, Planets and Space, 54(1):63–70, 2002.
  • Pajares et al. [2005] M.H. Pajares, J.M.J. Zornoza, and J.S. Subirana. GPS Data Processing: Code and Phase : Algorithms, Techniques and Recipes. Centre de Publicacions del Campus Nord, UPC, 2005.
  • Renga et al. [2018] Alfredo Renga, Flavia Causa, Urbano Tancredi, and Michele Grassi. Accurate ionospheric delay model for real-time gps-based positioning of leo satellites using horizontal vtec gradient estimation. GPS Solutions, 22(2):46, Feb 2018.
  • Sardon et al. [1994] Esther Sardon, A Rius, and N Zarraoa. Estimation of the transmitter and receiver differential biases and the ionospheric total electron content from global positioning system observations. Radio science, 29(3):577–586, 1994.
  • Teunissen [1995] P. J. G. Teunissen. The least-squares ambiguity decorrelation adjustment: a method for fast gps integer ambiguity estimation. Journal of Geodesy, 70(1):65–82, 1995.
  • Teunissen [2003] Peter Teunissen. Towards a unified theory of gnss ambiguity resolution. Journal of Global Positioning Systems, 2(1):1–12, 2003.
  • Teunissen [1998] Peter J. G. Teunissen. GPS Carrier Phase Ambiguity Fixing Concepts, pages 319–388. Springer Berlin Heidelberg, Berlin, Heidelberg, 1998.
  • Wright and Ma [2010] J. Wright and Y. Ma. Dense error correction via-minimization. IEEE Transactions on Information Theory, 56(7):3540–3560, 2010.
  • Zhang et al. [2018] Baocheng Zhang, Peter J. G. Teunissen, Yunbin Yuan, Hongxing Zhang, and Min Li. Joint estimation of vertical total electron content (vtec) and satellite differential code biases (sdcbs) using low-cost receivers. Journal of Geodesy, 92(4), Apr 2018.

Appendix A Proof of Lemma 1


Denote and where the -th column of is . Similarly denote to be the label vector. The least square regression has solution

To bound , since , according to matrix Bernstein’s inequality, with probability at least ,


We get with probability at least ,


To bound , according to the assumption, the noise distribution is symmetric,

Therefore, with probability at least ,


we get

In summary, with probability at least ,


Appendix B Proof of Lemma 2


From filtering step in FRR (line 3 in Algorithm 1), ,

Since is sub-gaussian random vector independent to , apply Bernstein’s concentration, we have with probability at least ,

Therefore we get