1 Introduction
An autonomous, adaptive system, such as a self-driving car, needs to be robust to self-failures and changing environmental conditions. To do so, it must distinguish between self-errors and environmental changes. This chicken-and-egg problem is the concern of ground truth inference algorithms: algorithms that measure a statistic of ground truth given the output of an ensemble of evaluators. They seek to answer the question: am I malfunctioning, or is the environment changing so much that my models are starting to break down?
Ground truth inference algorithms have had a spotty history in the machine learning community. The original idea came from Dawid and Skene (Dawid & Skene, 1979), who used the EM algorithm to solve a maximum-likelihood equation. The approach enjoyed a brief renaissance in the 2000s with the advent of services like Amazon Mechanical Turk. Our main critique of all these approaches is that they are parametric: they assume the existence of a family of probability distributions for how the estimators are committing their errors. This has not worked well in theory or practice (Zheng et al., 2017). Here we will discuss the advantages and limitations of a non-parametric approach that uses compressed sensing to solve the ground truth inference problem for noisy regressors (Corrada-Emmanuel & Schultz, 2008). Ground truth is defined in this context as the correct values for the predictions of the regressors. The existence of such ground truth is taken as a postulate of the approach. More formally,
Definition 1 (Ground truth postulate for regressors).
All regressed values in a dataset can be written as,

\[ y_{i,r} = y_i + \delta_{i,r}, \tag{1} \]

where \(y_i\) does not depend on the regressor used.
In many practical situations this is a very good approximation to reality. But it can be violated. For example, the regressors may have developed their estimates at different times while \(y_i\) varied under them.
We can now define the ground truth inference problem for regressors as,
Definition 2 (Ground truth inference problem for regressors).
Given the output, \(\{y_{i,r}\}\), of \(R\) aligned regressors on a dataset of size \(D\),
estimate the error moments for the regressors,

\[ \frac{1}{D} \sum_{i} \delta_{i,r}\,\delta_{i,r'} \tag{2} \]

and

\[ \frac{1}{D} \sum_{i} \delta_{i,r} \tag{3} \]

without the true values, \(\{y_i\}\).
The separation of moment terms that are usually combined to define a covariance between estimators (to wit, the covariance can be expressed as \(\frac{1}{D}\sum_i \delta_{i,r}\,\delta_{i,r'} - \bigl(\frac{1}{D}\sum_i \delta_{i,r}\bigr)\bigl(\frac{1}{D}\sum_i \delta_{i,r'}\bigr)\)) is deliberate and relates to the math of the recovery, as the reader will understand shortly.
As stated, the ground truth inference problem for sparsely correlated regressors was solved in (Corrada-Emmanuel & Schultz, 2008) by using a compressed sensing approach to recover the moments, \(\frac{1}{D}\sum_i \delta_{i,r}\,\delta_{i,r'}\), for unbiased (\(\frac{1}{D}\sum_i \delta_{i,r} \approx 0\)) regressors. Even the case of some of the regressors being strongly correlated is solvable; sparsity of the nonzero correlations is all that is required. Here we point out that the failure to find a unique solution for biased regressors still makes it possible to detect and correct biased regressors, under the same sort of engineering logic that allows bit-flip error correction in computers.
2 Independent, unbiased regressors
We can understand the advantages and limitations of doing ground truth inference for regressors by simplifying the problem to that of independent, unbiased regressors. The inference problem then becomes a straightforward linear algebra one that can be understood without the complexity required when some unknown number of the regressors may be correlated.
Consider two regressors giving estimates,

\[ y_{i,1} = y_i + \delta_{i,1} \tag{4} \]
\[ y_{i,2} = y_i + \delta_{i,2} \tag{5} \]

By the Ground Truth Postulate, these can be subtracted to obtain,

\[ y_{i,1} - y_{i,2} = \delta_{i,1} - \delta_{i,2} \tag{6} \]
Note that the left-hand side involves observable values that do not require any knowledge of \(y_i\). The right-hand side contains the error quantities that we seek to estimate. Squaring both sides and averaging over all the datapoints in the dataset, we obtain our primary equation,

\[ \frac{1}{D}\sum_i (y_{i,1} - y_{i,2})^2 = \frac{1}{D}\sum_i \delta_{i,1}^2 - \frac{2}{D}\sum_i \delta_{i,1}\,\delta_{i,2} + \frac{1}{D}\sum_i \delta_{i,2}^2 \tag{7} \]
Since we are assuming that the regressors are independent in their errors (\(\frac{1}{D}\sum_i \delta_{i,1}\,\delta_{i,2} \approx 0\)), we can simplify eq. 7 to,

\[ \frac{1}{D}\sum_i (y_{i,1} - y_{i,2})^2 = \frac{1}{D}\sum_i \delta_{i,1}^2 + \frac{1}{D}\sum_i \delta_{i,2}^2 \tag{8} \]
This is obviously unsolvable for the unknown square moments, \(\frac{1}{D}\sum_i \delta_{i,1}^2\) and \(\frac{1}{D}\sum_i \delta_{i,2}^2\), with a single pair of regressors. But with three regressors it is solvable. It leads to the following linear algebra equation,

\[
\begin{pmatrix}
1 & 1 & 0 \\
1 & 0 & 1 \\
0 & 1 & 1
\end{pmatrix}
\begin{pmatrix}
\frac{1}{D}\sum_i \delta_{i,1}^2 \\[2pt]
\frac{1}{D}\sum_i \delta_{i,2}^2 \\[2pt]
\frac{1}{D}\sum_i \delta_{i,3}^2
\end{pmatrix}
=
\begin{pmatrix}
\frac{1}{D}\sum_i (y_{i,1}-y_{i,2})^2 \\[2pt]
\frac{1}{D}\sum_i (y_{i,1}-y_{i,3})^2 \\[2pt]
\frac{1}{D}\sum_i (y_{i,2}-y_{i,3})^2
\end{pmatrix}
\tag{9}
\]
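As a concrete illustration, here is a minimal NumPy sketch of this recovery. It is a hypothetical simulation: the noise widths and dataset size are illustrative choices, not values from the paper.

```python
import numpy as np

rng = np.random.default_rng(42)
D = 200_000
y = rng.uniform(0.0, 1.0, D)                  # hidden ground truth values
scales = np.array([0.05, 0.10, 0.20])         # per-regressor noise widths
# Independent, unbiased, uniformly distributed errors for three regressors.
delta = rng.uniform(-1.0, 1.0, (3, D)) * scales[:, None]
y_hat = y + delta                             # the observable regressor outputs

# Observable statistics: average squared pairwise differences (left-hand sides).
pairs = [(0, 1), (0, 2), (1, 2)]
rhs = np.array([np.mean((y_hat[r] - y_hat[s]) ** 2) for r, s in pairs])

# Eq. (9): each observable statistic is the sum of two squared-error moments.
A = np.array([[1.0, 1.0, 0.0],
              [1.0, 0.0, 1.0],
              [0.0, 1.0, 1.0]])
moments = np.linalg.solve(A, rhs)             # recovered (1/D) * sum_i delta_{i,r}^2
```

The recovered `moments` agree with the true values `np.mean(delta ** 2, axis=1)` up to the sampling noise of the spurious cross-correlations; with more than three regressors the system becomes overdetermined and a least squares solve takes the place of `np.linalg.solve`.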
An application of this simple equation to a synthetic experiment with three noisy regressors is shown in Figure 1. Just as with any least squares approach, any underlying topology for the relation between the different data points is irrelevant. Hence, for purposes of experimentation, we can treat each pixel value of a photo as a ground truth value to be regressed by the synthetic noisy regressors. In this experiment we used uniform error. Similar results are obtainable with other error distributions (e.g., Gaussian noise).
To highlight the multidimensional nature of equation 6, we randomized each of the color channels but made one channel noisier for each of the pictures. This simulates two regressors being mostly correct while a third one is perhaps malfunctioning. Since even synthetic experiments with independent regressors will result in spurious nonzero cross-correlations, we solved the equation via least squares (the full compressive sensing solution being wholly unnecessary in this application, where we know the regressors are practically independent).
3 Biased, independent regressors
Our approach is not a typical one in statistics. Here we are concerned with the recovery of the multiple error signals of the regressors, not the single underlying signal \(\{y_i\}\). Wainwright (2019) discusses sparsity in the context of high-dimensional data for the typical use of compressed sensing: to recover the signal itself, not statistics of the many error signals of the regressors. Likewise, seemingly unrelated regressions (Wikipedia contributors, 2019), due to Zellner, is focused on recovering the model parameters of a linear model for an ensemble of regressors.

Our approach fails for the case of biased regressors. We can understand this intuitively because eq. 6 is invariant to a global bias, \(\Delta\), shared by the regressors. We are not solving for the full average error of the regressors but for their average precision error,

\[ \frac{1}{D}\sum_i (\delta_{i,r} - \Delta)^2 \tag{10} \]
We can only determine the error of the regressors modulo some unknown global bias. By itself this would not be an insurmountable problem, since global shifts are easy to fix. From an engineering perspective, accuracy is cheap while precision is expensive. (Examples: (a) the zeroing screw in a precision weight scale; (b) the number of samples needed to measure a classifier's accuracy when it is of unknown accuracy versus when we know it is either, say, 1% or 99% accurate. The former situation is more accurate on average but less precise; the latter, precise but inaccurate.) The more problematic issue is that the approach cannot correctly determine who is biased when the regressors are biased relative to each other.
Let us demonstrate that by using eq. 6 to estimate the average biases, \(\bar{\delta}_r = \frac{1}{D}\sum_i \delta_{i,r}\), of the regressors. Averaging both sides over the dataset, we obtain for three independent regressors the following equation (the observable statistic, \(\bar{\Delta}_{r,s}\), is equal to \(\frac{1}{D}\sum_i (y_{i,r} - y_{i,s})\)),

\[
\begin{pmatrix}
1 & -1 & 0 \\
1 & 0 & -1 \\
0 & 1 & -1
\end{pmatrix}
\begin{pmatrix}
\bar{\delta}_1 \\ \bar{\delta}_2 \\ \bar{\delta}_3
\end{pmatrix}
=
\begin{pmatrix}
\bar{\Delta}_{1,2} \\ \bar{\Delta}_{1,3} \\ \bar{\Delta}_{2,3}
\end{pmatrix}
\tag{11}
\]
The rank of this matrix is two. This means that the matrix has a one-dimensional null space. In this particular case, the subspace is spanned by a constant bias shift, \((1, 1, 1)^{\top}\), as noted previously. Nonetheless, let us consider the specific case of three regressors where two of them have an equal constant bias, \(b\),

\[ \bar{\delta}_1 = 0, \quad \bar{\delta}_2 = \bar{\delta}_3 = b \tag{12} \]

This would result in the vector,

\[
\begin{pmatrix}
\bar{\Delta}_{1,2} \\ \bar{\Delta}_{1,3} \\ \bar{\Delta}_{2,3}
\end{pmatrix}
=
\begin{pmatrix}
-b \\ -b \\ 0
\end{pmatrix}
\tag{13}
\]

The general solution to eq. 11 would then be,

\[
\begin{pmatrix}
\bar{\delta}_1 \\ \bar{\delta}_2 \\ \bar{\delta}_3
\end{pmatrix}
=
\begin{pmatrix}
0 \\ b \\ b
\end{pmatrix}
+ c
\begin{pmatrix}
1 \\ 1 \\ 1
\end{pmatrix}
\tag{14}
\]

with \(c\) an arbitrary, constant scalar.
This seems to be a failure for any ground truth inference for noisy regressors. Lurking underneath this math, however, is the core idea of compressed sensing: pick the value of \(c\) for the solutions to eq. 14 that minimizes the \(\ell_1\) norm of the recovered vector. When such a point of view is taken, non-unique solutions to ground truth inference problems can be reinterpreted as error detecting and correcting algorithms. We explain.
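The failure mode just described can be checked numerically. In this hedged NumPy sketch (the bias value is an arbitrary illustration), we build the pairwise-difference system of eq. 11, give the second and third regressors a shared bias, and pick the \(\ell_1\)-minimal member of the one-parameter solution family of eq. 14 by a simple grid search over \(c\):

```python
import numpy as np

# Pairwise-difference design matrix for three regressors (eq. 11).
A = np.array([[1.0, -1.0, 0.0],
              [1.0, 0.0, -1.0],
              [0.0, 1.0, -1.0]])
assert np.linalg.matrix_rank(A) == 2          # one-dimensional null space: (1, 1, 1)

b = 0.5
true_bias = np.array([0.0, b, b])             # regressors 2 and 3 share the bias b
rhs = A @ true_bias                           # the observable average differences

# General solution: a particular solution plus c * (1, 1, 1) for any scalar c.
particular, *_ = np.linalg.lstsq(A, rhs, rcond=None)
cs = np.linspace(-1.0, 1.0, 4001)
candidates = particular[None, :] + cs[:, None]
best = candidates[np.argmin(np.abs(candidates).sum(axis=1))]
# best is approximately (-b, 0, 0): sparsity blames the one unbiased regressor.
```

The sparsest solution attributes a bias of \(-b\) to the single unbiased regressor, exactly the misattribution discussed above.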
4 Error detection and correction
Suppose, instead, that only one of the three regressors was biased,

\[ \bar{\delta}_1 = b, \quad \bar{\delta}_2 = \bar{\delta}_3 = 0 \tag{15} \]

This would give the general solution,

\[
\begin{pmatrix}
\bar{\delta}_1 \\ \bar{\delta}_2 \\ \bar{\delta}_3
\end{pmatrix}
=
\begin{pmatrix}
b \\ 0 \\ 0
\end{pmatrix}
+ c
\begin{pmatrix}
1 \\ 1 \\ 1
\end{pmatrix}
\tag{16}
\]

with \(c\) an arbitrary, constant scalar. If we assume that errors are sparse, then an \(\ell_1\)-minimization approach would lead us to select the solution,

\[
\begin{pmatrix}
\bar{\delta}_1 \\ \bar{\delta}_2 \\ \bar{\delta}_3
\end{pmatrix}
=
\begin{pmatrix}
b \\ 0 \\ 0
\end{pmatrix}
\tag{17}
\]
The algorithm would be able to detect and correct the bias of a single regressor. If we wanted more reassurance that we were picking the correct solution, we could use five regressors. When the last two have an equal constant bias, \(b\), the general solution is,

\[
\begin{pmatrix}
\bar{\delta}_1 \\ \bar{\delta}_2 \\ \bar{\delta}_3 \\ \bar{\delta}_4 \\ \bar{\delta}_5
\end{pmatrix}
=
\begin{pmatrix}
0 \\ 0 \\ 0 \\ b \\ b
\end{pmatrix}
+ c
\begin{pmatrix}
1 \\ 1 \\ 1 \\ 1 \\ 1
\end{pmatrix}
\tag{18}
\]

with the corresponding \(\ell_1\)-minimization solution of,

\[
\begin{pmatrix}
\bar{\delta}_1 \\ \bar{\delta}_2 \\ \bar{\delta}_3 \\ \bar{\delta}_4 \\ \bar{\delta}_5
\end{pmatrix}
=
\begin{pmatrix}
0 \\ 0 \\ 0 \\ b \\ b
\end{pmatrix}
\tag{19}
\]
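Both error-correction cases can be verified with the same grid-search sketch, assuming NumPy. The helper `l1_bias_estimate` and the bias values are illustrative constructions for this note, not code from the original method:

```python
import numpy as np
from itertools import combinations

def l1_bias_estimate(true_bias, grid=np.linspace(-1.0, 1.0, 4001)):
    """Form the pairwise-difference system for R regressors and return the
    l1-minimal member of its one-parameter solution family."""
    true_bias = np.asarray(true_bias, dtype=float)
    R = len(true_bias)
    pairs = list(combinations(range(R), 2))
    A = np.zeros((len(pairs), R))
    for k, (r, s) in enumerate(pairs):
        A[k, r], A[k, s] = 1.0, -1.0
    rhs = A @ true_bias                        # the observable statistics
    particular, *_ = np.linalg.lstsq(A, rhs, rcond=None)
    candidates = particular[None, :] + grid[:, None]   # shift along (1, ..., 1)
    return candidates[np.argmin(np.abs(candidates).sum(axis=1))]

one_biased = l1_bias_estimate([0.4, 0.0, 0.0])            # eq. 17: ~ (0.4, 0, 0)
two_biased = l1_bias_estimate([0.0, 0.0, 0.0, 0.3, 0.3])  # eq. 19: ~ (0, 0, 0, 0.3, 0.3)
```

In both cases the sparsest member of the solution family coincides with the true bias vector, within the resolution of the grid.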
This is the same engineering logic that makes practical the use of error-correcting codes when transmitting a signal over a noisy channel. Our contribution is to point out that the same logic also applies to estimating the errors made by regressors trying to recover the true signal.
5 Conclusions
A compressed sensing algorithm for recovering the average error moments of an ensemble of noisy regressors exists. Like other ground truth inference algorithms, it leads to non-unique solutions. However, in many well-engineered systems, errors are sparse and mostly uncorrelated when the machine is operating normally. Algorithms such as this one can then detect the onset of malfunction in sensors and algorithms.
We can concretize the possible applications of this technique by considering a machine such as a self-driving car. Optical cameras and range finders are necessary subcomponents. How can the car detect a malfunctioning sensor? There are already many ways this can be done (no power from the sensor, etc.). This technique adds another layer of protection by potentially detecting anomalies earlier. In addition, it allows the creation of supervision arrangements such as having one expensive, precise sensor coupled with many cheap, imprecise ones. As the recovered error moment matrix in Figure 2 shows, many noisy sensors can be used to benchmark a more precise one (the {6,6} moment of the sixth regressor in this particular case). As (Corrada-Emmanuel & Schultz, 2008) demonstrate, the approach can also be used on the final output of algorithms. In the case of a self-driving car, a depth map of the surrounding environment is needed: the output of algorithms processing the sensor input data. Here again, one can envision supervisory arrangements where quick, imprecise estimators are used to monitor a more expensive, precise one.
There are advantages and limitations to the approach proposed here. Because there is no maximum-likelihood equation to solve, the method is widely applicable. The price for this flexibility is that no generalization can be made: there is no theory or model to explain the observed errors; they are just estimated robustly for each specific dataset. Additionally, the math is easily understood, so the advantages or limitations of a proposed application to an autonomous, adaptive system can be ascertained readily. The theoretical guarantees of compressed sensing algorithms are a testament to this (Foucart & Rauhut, 2013). Finally, the compressed sensing approach to regressors can handle strongly, but sparsely, correlated estimators.
We finish by pointing out that non-parametric methods also exist for classification tasks. This is demonstrated for independent, binary classifiers (with working code) in (Corrada-Emmanuel, 2018). The only difference is that the linear algebra of the regressor problem becomes polynomial algebra. Nonetheless, there we find similar ambiguities due to non-unique solutions to the ground truth inference problem of determining average classifier accuracy without the correct labels. For example, the polynomial for the unknown prevalence (the environmental variable) of one of the labels is quadratic, leading to two solutions. Correspondingly, the accuracies of the classifiers (the internal variables) are either \(p\) or \(1-p\). So a single classifier could be, say, 90% or 10% accurate. The ambiguity is removed by having enough classifiers: the preferred solution is the one where a single classifier dips below 50% accuracy, not the one where all the others do.
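The two-solution ambiguity has a simple algebraic source: for independent binary classifiers, every observable voting statistic is unchanged when the prevalence \(\phi\) is swapped for \(1-\phi\) and every accuracy \(p\) for \(1-p\). A small NumPy check of this symmetry, with hypothetical prevalence and accuracy values:

```python
import numpy as np

def vote_stats(phi, p):
    """Observable moments of independent binary classifiers' votes:
    marginal rate of voting '1' and pairwise rates of both voting '1'."""
    p = np.asarray(p, dtype=float)
    n = len(p)
    marginal = phi * p + (1 - phi) * (1 - p)
    pairwise = np.array([phi * p[i] * p[j] + (1 - phi) * (1 - p[i]) * (1 - p[j])
                         for i in range(n) for j in range(i + 1, n)])
    return marginal, pairwise

phi = 0.7                          # prevalence of label '1'
p = np.array([0.9, 0.8, 0.85])     # classifier accuracies

m1, a1 = vote_stats(phi, p)
m2, a2 = vote_stats(1 - phi, 1 - p)   # the mirrored solution
# Identical observables: unlabeled data alone cannot distinguish the two.
```

Both parameter settings produce identical observable statistics, which is why only a sparsity argument over the ensemble can break the tie.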
References
Corrada-Emmanuel (2018) Corrada-Emmanuel, A. Ground truth inference of binary classifier accuracies: the independent classifiers case. https://github.com/andrescorrada/groundtruthproblemsinbusiness/blob/master/classification/IndependentBinaryClassifiers.pdf, 2018.

Corrada-Emmanuel & Schultz (2008) Corrada-Emmanuel, A. and Schultz, H. Geometric precision errors in low-level computer vision tasks. In Proceedings of the 25th International Conference on Machine Learning (ICML 2008), pp. 168–175, Helsinki, Finland, 2008.

Dawid & Skene (1979) Dawid, A. P. and Skene, A. M. Maximum likelihood estimation of observer error-rates using the EM algorithm. Applied Statistics, pp. 20–28, 1979.

Foucart & Rauhut (2013) Foucart, S. and Rauhut, H. A Mathematical Introduction to Compressive Sensing. Birkhäuser, New York, 2013.

Wainwright (2019) Wainwright, M. J. High-Dimensional Statistics: A Non-Asymptotic Viewpoint. Cambridge University Press, New York, 2019.

Wikipedia contributors (2019) Wikipedia contributors. Seemingly unrelated regressions — Wikipedia, the free encyclopedia, 2019. URL https://en.wikipedia.org/wiki/Seemingly_unrelated_regressions. [Online; accessed 3 July 2019].

Zheng et al. (2017) Zheng, Y., Li, G., Li, Y., Shan, C., and Cheng, R. Truth inference in crowdsourcing: Is the problem solved? In Proceedings of the VLDB Endowment, volume 10, no. 5, 2017.