Developing technologies to understand and enhance user experience has become one of the most challenging problems for WWW researchers and practitioners. During the last decade, the continuous growth of interactions has supported innovations in a data-driven fashion, based on user interactions and user feedback. However, recent research has revealed a considerable extent of uncertainty within user feedback and discussed its striking impacts on the assessment of adaptive web systems and content personalisation approaches (Amatriain and Pujol, 2009; Said et al., 2012; Jasberg, 2017a). As a motivating example, we consider the task of gathering explicit user feedback (e.g. rating user satisfaction with a novel interface, rating a recently purchased item, etc.). Figure 1 depicts the relative histograms for two users who each rated a theatrical trailer five times with a small temporal gap in between. Their feedback scatters around a central tendency, raising the question of what this implies for our knowledge about those users' true opinions. For example, when a user's feedback does not match a prior prediction, can this deviation be deemed system-related (and improvable), or is it just an artefact of human uncertainty (meaning that the system works well)?
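This scatter can be illustrated with a minimal simulation, assuming (purely for illustration) that each user's repeated ratings are draws from a personal distribution whose mean represents the true opinion and whose standard deviation represents the human uncertainty. All parameters below are hypothetical:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical model: each user's five repeated ratings of the same trailer
# are Gaussian draws around a personal central tendency.
user_a = rng.normal(loc=3.5, scale=0.5, size=5)  # assumed mean/sd
user_b = rng.normal(loc=4.0, scale=0.9, size=5)  # assumed mean/sd

for name, ratings in [("A", user_a), ("B", user_b)]:
    print(name, np.round(ratings, 2),
          "mean=%.2f sd=%.2f" % (ratings.mean(), ratings.std(ddof=1)))
```

Even with identical stimuli, the sample mean and spread differ from the underlying parameters, which is exactly the situation depicted in Figure 1.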
2. Related Work and Deduced Methods
The idea of uncertainty is not only related to the web and prediction but also to measuring sciences such as metrology. In this field, quantities are modelled by probability density functions, and composed quantities emerge as a convolution of densities (JCGM, 2008). An application of this theory has recently been carried out by (Jasberg, 2017b) to address similar issues in the field of computer science. Recent research reveals some striking impacts of response uncertainty within the databases of web information systems. In (Jasberg, 2017a, 2018) it is demonstrated that comparative assessments and rankings are subject to possible errors due to response uncertainty. Moreover, the findings of (Said et al., 2012; Jasberg, 2017b) show that human uncertainty induces a kind of offset, i.e. a non-vanishing barrier representing the minimum of a specific metric.
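The metrological view of composed quantities can be sketched with a small Monte Carlo simulation: the density of a sum of two independent uncertain quantities is the convolution of their densities, so their standard deviations combine in quadrature. All input values below are illustrative:

```python
import numpy as np

rng = np.random.default_rng(42)
n = 100_000

# Two uncertain input quantities, each modelled by a probability density
# (the GUM view); here both are taken to be Gaussian for illustration.
x = rng.normal(10.0, 0.3, n)
y = rng.normal(5.0, 0.4, n)

# A composed quantity z = x + y: its density is the convolution of the
# input densities.  For independent sums, variances add, so
# sd(z) ~ sqrt(0.3**2 + 0.4**2) = 0.5.
z = x + y
print(round(z.mean(), 2), round(z.std(ddof=1), 2))
```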
We will turn these problems into a benefit by deriving an instrument for detecting significant improvements of web information systems. In doing so, we consider two scores $A_1$ and $A_2$ of an assessment metric and assume the relation $A_1 < A_2$ to hold if the opposite case occurs only with a probability $\alpha$ (type I error). It is quite laborious to derive such tests explicitly (cf. Jasberg, 2017a, 2018), but this way of testing can be simplified using the magic barrier. The idea is to shift the barrier distribution $B$ by an offset $\delta$ along the x-axis of metric scores and to test whether it is possible to cover both metric results $A_1$ and $A_2$ within the 95%-confidence interval of $B + \delta$. This simplification is valid since a metric's variance matches the variance of the magic barrier for large data records (Jasberg, 2017c), which is most usual for WWW research. The optimal shift is given when $B + \delta$ is centred between both scores, i.e. $\mathbb{E}[B] + \delta = (A_1 + A_2)/2$. Heuristically explained, two metric scores cannot be distinguished by means of the relation $A_1 < A_2$ if there exists a single distribution which can explain both outcomes with sufficient validity.
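Under a Gaussian assumption for the barrier, such a covering shift exists exactly when half the distance between the two scores fits into the 95% half-width of the barrier distribution. A minimal sketch of this test (all scores and the barrier standard deviation below are hypothetical):

```python
def indistinguishable(a1, a2, sigma_b, z=1.96):
    """Magic-barrier test sketch: can a single shifted barrier distribution
    with standard deviation sigma_b cover both metric scores within its
    95%-confidence interval?  Returns (covered, optimal_center).
    Assumes a Gaussian barrier; sigma_b is taken as known."""
    center = (a1 + a2) / 2               # optimal shift: barrier mid-way
    covered = abs(a1 - a2) / 2 <= z * sigma_b
    return covered, center

# Hypothetical RMSE scores of two systems and an assumed barrier sd:
print(indistinguishable(0.8563, 0.8590, sigma_b=0.002))  # close scores
print(indistinguishable(0.8563, 0.8900, sigma_b=0.002))  # clearly apart
```

If the first call reports coverage, the two systems cannot be distinguished at the 5% level; a supposed improvement of the second system over the first is then not significant.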
3. Dealing with Uncertainty
A lot of research has been done on dealing with human uncertainty. Possible solutions can be divided into three groups: de-noising via pre-processing steps, averaging out by artificially inducing noise, and omitting data by accounting only for the largest deviations.
3.1. Pre-Processing Steps
A prominent example of de-noising algorithms was introduced in (Amatriain and Pujol, 2009), where the authors recursively removed all (repeated) ratings whose distance was larger than a certain threshold and replaced them by ratings whose distance was less than or equal to this threshold. Heuristically, human uncertainty is artificially limited by manually replacing it with smaller deviations. Applied to a user-based kNN recommender, this pre-processing step leads to an RMSE score on the Netflix data record which outperforms the one achieved by the same algorithm without any pre-processing. However, Fig. 1(d) reveals that both scores can be located within the confidence interval of a shifted magic barrier. In other words, both scores might just result as two trials from exactly the same metric distribution. Thus, a supposed improvement cannot be detected significantly.
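The re-rating filter can be sketched as follows; the exact recursive replacement rule of the original paper differs in detail, so this is an illustrative approximation only:

```python
def denoise_re_ratings(pairs, threshold=1):
    """Sketch of the re-rating de-noising idea (Amatriain and Pujol, 2009)
    as summarised above: pairs of (original rating, re-rating) whose
    distance exceeds the threshold are discarded; otherwise a single
    cleaned rating (here: the re-rating) is kept.  Details are assumptions."""
    kept = []
    for original, rerating in pairs:
        if abs(original - rerating) <= threshold:
            kept.append(rerating)
    return kept

# (original, re-rating) pairs; the middle pair is too far apart and dropped:
print(denoise_re_ratings([(4, 4), (2, 5), (3, 4)], threshold=1))  # [4, 4]
```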
3.2. Predictor Noise
Another strategy of dealing with uncertainty, as proposed by (Koren and Sill, 2011), is to additionally associate the model-based predictions with artificial uncertainty. Let $\hat{r}_{u,i}$ be the model-based prediction for a user-item pair $(u,i)$; then we consider a random variable scattering around $\hat{r}_{u,i}$ as the prediction along with its uncertainty. The basic idea is to average out the human uncertainty when it comes to a comparison of both uncertain quantities, i.e. the rating as well as its prediction. This strategy was implemented in the OrdRec algorithm and has been compared to the techniques SVD++, RBM and MultiMF by means of the RMSE on the data records of Netflix, Y!Music-I and Y!Music-II (Koren and Sill, 2011). For the Netflix set, Fig. 1(a) demonstrates that all scores are so close together that they can be considered as different draws from just a single distribution. We can observe the same for the Y!Music-I data (Fig. 1(b)), with the exception of the RBM algorithm, which is significantly worse. For the Y!Music-II data record, we cannot cover all scores under a single distribution because the SVD++ and the MultiMF algorithms differ significantly. For OrdRec, however, we can find barrier shifts so that the achieved scores can be covered pairwise. In other words, the OrdRec model is neither significantly better nor worse than each of the other systems.
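A rough sketch of this averaging idea, replacing OrdRec's ordinal model with simple additive Gaussian noise purely for illustration (all variances and the data-generating process below are assumptions):

```python
import numpy as np

rng = np.random.default_rng(1)

# Assumed setup: ratings scatter around each user's true opinion,
# predictions recover that opinion up to a model error.
true_opinions = rng.uniform(1, 5, 1000)
ratings = true_opinions + rng.normal(0, 0.5, 1000)       # human uncertainty
predictions = true_opinions + rng.normal(0, 0.3, 1000)   # model error

# Treat the prediction itself as a random variable by adding artificial
# noise, then score against many draws of that variable.
draws = predictions[:, None] + rng.normal(0, 0.2, (1000, 50))

rmse_point = np.sqrt(np.mean((ratings - predictions) ** 2))
rmse_noisy = np.sqrt(np.mean((ratings[:, None] - draws) ** 2))
print(round(rmse_point, 3), round(rmse_noisy, 3))
```

Note that in this simplistic Gaussian variant the artificial noise adds variance to the score itself; OrdRec instead predicts full ordinal rating distributions, which is the part this sketch deliberately abstracts away.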
3.3. Partially Omitting Noise
Yet another approach was introduced by (Jasberg, 2017a), where the author used hypothesis testing to decide whether deviations between a rating and its prediction can be explained by uncertainty or not, and then calculated accuracy metrics only with those 5% of deviations that were large enough to be significant. Unfortunately, this approach cannot be compared to other algorithms since it changes the metric itself and there is no common baseline for evaluations. One disadvantage of this approach is that it discards 95% of the data, which impacts the validity of evaluations. Moreover, all problems of uncertainty remain and are only slightly diminished.
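The filtering step can be sketched with a one-sided z-test, assuming for illustration a known per-user rating standard deviation `sigma_u` (this parameter and the test form are assumptions, not the original paper's exact procedure):

```python
from statistics import NormalDist

import numpy as np

def significant_deviations(ratings, predictions, sigma_u, alpha=0.05):
    """Keep only those rating-prediction deviations that a one-sided
    z-test cannot explain by human uncertainty alone (sd sigma_u)."""
    z = NormalDist().inv_cdf(1 - alpha)  # one-sided critical value, ~1.645
    dev = np.abs(np.asarray(ratings) - np.asarray(predictions))
    return dev[dev > z * sigma_u]

devs = significant_deviations([4.0, 2.0, 5.0], [3.8, 3.9, 3.0], sigma_u=0.5)
print(devs)  # only deviations beyond ~0.82 survive
```

An accuracy metric computed on `devs` alone then rests on a small, selected subsample, which illustrates the validity concern raised above.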
Casually speaking, all strategies of dealing with uncertainty that have been developed so far share a very system-centric view in which user variation is something undesirable that should be modelled with an eye to eliminating it. However, all these strategies more or less fail to improve the accuracy of personalisation approaches, and thus we have to ask whether this view, held by a large fraction of web researchers, is still worthwhile. Instead, we recommend a novel strategy of acceptance which turns away from mere elimination. Further research therefore has to focus on how to use uncertainty as a new trait of information and how to benefit from it. It would be conceivable, for example, that recommender systems propose items on the basis of user uncertainty that they would otherwise never have offered. Moreover, if further research were to concentrate on understanding human uncertainty through neuroscientific theories, new psychological characteristics might be found according to which users can be clustered. These and related questions are key challenges for the future of web technologies.
- Amatriain and Pujol (2009) Xavier Amatriain and Josep Pujol. 2009. Rate It Again: Increasing Recommendation Accuracy by User Re-rating. In RecSys Conference. ACM.
- Said et al. (2012) Alan Said et al. 2012. Users and Noise: The Magic Barrier of Recommender Systems. In Proceedings of UMAP.
- JCGM (2008) Joint Committee for Guides in Metrology. 2008. Evaluation of measurement data - Guide to the expression of uncertainty in measurement. BIPM.
- Jasberg (2017a) Kevin Jasberg. 2017a. Assessment of Prediction Techniques: The Impact of Human Uncertainty. In Proceedings of WISE.
- Jasberg (2017b) Kevin Jasberg. 2017b. The Magic Barrier Revisited: Accessing Natural Limitations of Recommender Assessment. In Proceedings of ACM RecSys.
- Jasberg (2018) Kevin Jasberg. 2018. Human Uncertainty and Ranking Error - Fallacies in Metric-Based Evaluation of Recommender Systems. In Proceedings of ACM SAC.
- Jasberg (2017c) Kevin Jasberg. 2017c. Re-Evaluating the Netflix Prize - Human Uncertainty and its Impact on Reliability. arXiv:1706.08866.
- Koren and Sill (2011) Yehuda Koren and Joe Sill. 2011. OrdRec: An Ordinal Model for Predicting Personalized Item Rating Distributions. In Proceedings of ACM RecSys.