The investigation of eye movements is important in the understanding of basic vision properties and with the availability of new eye tracker technology it is also of growing practical importance. Typical research topics are the relation between image properties and eye movements (bottom-up), the influence of high-level goals on movement patterns (top-down) or models to predict salient regions in an image (for an overview see ). In this study we use a model which describes the location of the eye positions as the realization of a stochastic process that consists of two components, one component that describes the larger, jump-type changes and a second component related to the relatively small changes of the eye positions. This saccade-and-fixate strategy is one of the fundamental processes used by the human visual system to analyze its environment.
The factors that control these processes are very complex. They include high-level task-solving factors and input driven factors depending on the visual properties of the current input image. In the following we will ignore all these factors and we will only analyze the statistical properties of eye-tracker data. The problem we want to investigate is: can we characterize individual observers from a collection of eye-tracker measurements? Furthermore we will ignore fixation points, which are perhaps more controlled by task and/or image related factors, and we will only use parameters derived from the saccadic movements. We investigate this problem with the help of a large database where 15 observers viewed 1003 images in free-viewing conditions .
The major steps in this investigation are the following: First we describe two methods to compute the step-length between two different eye positions. The first is the usual euclidean length while the second is using the disc version of hyperbolic geometry which takes into account that the viewing space of the observer is a cone. Transition points between fixations are by definition points with comparatively large distances between consecutive eye positions. If we consider saccadic eye movements as a stochastic process then these non-fixation points correspond to the tail of the distribution of step-lengths. Such tail distributions follow very often the generalized Pareto distribution (GPD). In the second step of our analysis we will show that the distribution of the step-lengths of the saccadic eye movements can indeed be described by the GPD. The GPD depends on three parameters: position (), shape () and scale (
). In the third part we choose a random selection of images and for each user we fit the GPD to the data. Each such experiment results in a 3-D parameter vector for each user. Selecting only a few images will obviously lead to results where the variation between images is larger than the variation between users. We will however show that with an increasing number of images per trial clear clusters appear in the parameter space. These clusters can be characterized with a mixture-of-Gaussian model. Distances in parameter space are a poor way to characterize the similarity between probability distributions but the structure in the parameter space suggests that it should be possible to identify individual observers based on the distribution of their saccadic step lengths. In our final experiments we therefore use samples of the distributions as feature vectors and we train support-vector machines (SVM) to discriminate between one specific observer and the rest. These experiments result in very high recognition rates.
2 Material and methods
The data used in this study is described in . Together with useful code it can be downloaded from the website of the authors111 http://people.csail.mit.edu/tjudd/WherePeopleLook/index.html. The database contains eye tracking data of 15 viewers and 1003 images. The database contains also code to compute fixation points. In the following we use this code to identify fixation points and to extract those eye tracking measurements that are related to the saccadic movements between fixations.
We ignore the direction of the movements and use only their step lengths. The eye-positions in the database are given in a planar coordinate system. A natural way to compute the distance between two position vectors is thus given by the common euclidean distance. In the following we will also use a model based on hyperbolic geometry. We don’t go into the technical details (which can be found in books on non-euclidean geometry ,  has a short introduction) but we will give an intuitive motivation. Consider the model of a pinhole camera in Figure 1. The field of view of the camera is restricted to the cone between the two lines denoted by B. A point in the scene located somewhere along the line L is mapped to the pixel r on the sensor. All projection lines go through the ’pinhole’ located at position (note that it is also possible to place the sensor behind the pinhole which gives essentially the same geometry). In euclidean geometry based models of eye movements one often uses two angles to describe the motion of the eye, see . A corresponding model for the one-dimensional model in Figure 1 would use the angle to characterize the line L. In this framework there is no build-in mechanism that requires that the projection line L lies between the two lines denoted by B. If we use the sensor coordinate r then we can require that the absolute value of r lies between zero and one and we can use the new, hyperbolic angle defined as as a coordinate for L. For points near the origin the hyperbolic distance is similar to the euclidean distance by for points near the boundary it goes to infinity. Note also that instead of the euclidean distance r with the rather arbitrary upper bound of one we can use the hyperbolic angle as a distance to the origin and then we have a natural distance measure with the only restriction that it should be non-negative. In the case of a three-dimensional scene and a two-dimensional sensor, the sensor is given by the unit disk. The points on the sensor can be described by euclidean coordinates but it is easier to use complex coordinates where is defined as in the one-dimensional case. For two sensor points their hyperbolic distance is given by
which reduces to the one-dimensional definition in the case where and real and .
For pairs of consecutive points we compute the distance between them and consider the statistical properties of these step lengths. In the terminology of random walks we consider the eye-tracking data as a random walk where the step lengths follow a heavy tailed distribution. Furthermore we are only interested in those parts of the walk in which the step lengths are large. Here we do not define a numerical value of that threshold but we simply restrict our investigation to points that are not classified as fixation points in the database. Data of this type is often investigated with the help of Pareto distributions that are used when excesses of a random variable over a threshold are considered. The probability density function (PDF) of the generalized Pareto distribution is defined as:
We estimate the parameters of the distribution with the help of the Matlab function gpfit and therefore we follow the notation used there:is the shape, is the scale and is the location parameter of the distribution. Note that the shape parameter can be negative in which case the support of the distribution is a finite interval. For positive the support is the positive half-axis with left end point at We do not consider the special case of . More information about the generalized Pareto distribution and extreme value distributions in general can be found in  (but note the sign change for there). The estimation of the parameters is done in two steps: first we find the minimum distance value in the data vector. Then we shift the distance values so that the minimum value is zero. In the second step we ignore all shifted distance values with value exactly equal to zero and fit a two parameter GPD with shape and scale parameters to the shifted data. This gives a three parameter representation of the data in terms of the minimum value and the shape and scale parameters of the GPD.
Estimating distribution parameters for single images and a single user is obviously not very meaningful. One problem is that the number of non-fixation points is, by definition, much lower than the number of fixation points and the other reason is that the form of a given eye movement sequence can vary considerably. In our experiments we therefore select randomly a given number of images from the database. For a given observer we combine all non-fixation measurements from the corresponding observations into one dataset. From this dataset we estimate the parameters of the GPD. In most experiments described below this process is repeated 5000 times to see how the random selection of the images influences the values of the estimated parameters. The results show that the distribution parameters are concentrated in clusters linked to the different observers.
In the final experiments we use samples from the probability density functions of the GPD’s and train support-vector machines (SVM) to discriminate between one observer and the remaining observers. We will show that the recognition rates are very high and that they are slightly better in the euclidean metric based experiments than in the experiments using the hypberbolic distances.
3 Experiments and Results
As an illustration of the variation of the eyetracking measurements for different observers we show in Figures 2 and 3 the measurements for four observers (ems, hp, jw, kae) when viewing the two images (i14020903.jpeg, i113347896.jpg) in the database.
After fitting the GPD to the data we computed the adjusted R-squared value (see ) which gives a measure of how similar the shapes of the empirical and the fitted distribution functions are. These values lie between zero and one and higher values indicate better fit. In the following experiments we selected 50, 100, 200 and 500 random images in each trial and we fitted the GPD for each observer. We did this for both the euclidean and the hyperbolic distances. The mean values of the adjusted R-squared values are collected in Table 1 for the euclidean distance and in Table 2 for the hyperbolic distance. The column All contains the mean value over all observers. From the tables one can see that the hyperbolic distance results are in general slightly better and they produce more consistent fitting results. As an illustration of what these numbers really mean, we compare the empirical distribution and the estimated GPD in Figures 4, 5,6 and 7. In the experiment we selected 200 random images and the eye-tracker data of the observer jw. The first two plots show the histogram of the distribution and the fitted GPD, both for the euclidean and the hyperbolic distances. These distributions have long tails and therefore we restrict the plot range in Figures 4, 5 to the relevant parts of the distributions. Figures 6 and 7
are quantile-quantile plots of the same data. These plots show the relations between the quantiles of the empirical data and the corresponding quantiles of the GPD. For a perfect fit all points lie on the 45 degrees diagonal. We see again that the fit is very good for the major parts of the distributions, only for very high quantiles the differences become noticeable. Which is natural in this case since we have, by definition, very few observations in this value range.
Mean values of the adjusted R-squared values give only a summary overview over the computed values. In Figure 8
we show for all the observers the empirical cumulative distribution function (ECDF) of these values for the case where we used the hyperbolic distance, computed 5000 fittings and used 200 images in each run.
The distribution of the data points are described by three parameters: the minimum value, the shape and the scale parameters. The experiments confirmed that the minimum values are indeed very small and the variance of the minimum values is also almost zero (typically of the order). The following description of the structure of the parameter space of the distributions is therefore restricted to the distribution of the shape and scale parameters of the GPD, fitted to the shifted distance values. In the classification experiments we will make use of the three-parameter form of the GPD.
An analysis of the distribution of the ()-parameters showed that the (
)-vectors for a given observer are concentrated in restricted regions and we therefore fitted Gaussian mixture models (with 15 Gaussians, using the Matlab function gmdistribution.fit) to the distribution of the ()-vectors.
For the experiments where 50, and 200 images per trial were used in 5000 trials we show the distribution of the GPD parameters and the overlayed Gaussian mixture contours in Figures 9, and 10. The hyperbolic distance is used in all figures. The abbreviated user name is displayed at the position of the value of the median of the shape parameter and the scale parameter for that user.We see that by increasing the number of images considered in each trial the separation between the users improves and for 50 and 200 images the 15 Gaussians give a good description of the 15 observers. One interesting observation is the location of the parameters for observer kae where fitting resulted in a negative shape parameter. Note that the observers ems, hp, jw, kae used in Figure 2 and Figure 3 are those that occupy the outlying regions in the ()-space.
The structure in the distribution parameter space indicates that it should be possible to identify individual observers from the distribution of their saccadic eye movements. In a series of experiments we therefore investigated if this really is the case.
First we note that the distributions are characterized by the three parameters from which the complete pdf can be computed. In the case where we used trials in the fitting experiment the whole information can be stored in a database of size (number of trials x number of observers x number of distribution parameters). Such databases will be used in the following classification experiments, the values for are 5000 and 10000.
The distributions are defined on different intervals due to the location parameter. We construct a sampled version of the pdfs as follows: First we compute 100 samples of the pdf on the interval between its 5-percentile and its 95 percentile. The we embed all pdfs in the larger interval between the lowest 5 percentile and the highest 95 percentile for all pdfs under consideration. We then define 200 equally spaced sample points on the larger interval and finally construct for each pdf a 200-D vector with the square root of the pdf at these sample points by interpolation. This correponds to the Hellinger distance between two distributions. Each pdf is thus characterized by a 200-D vector defined on a common reference grid.
Next we select from the 200-D vectors those samples which have the highest variance. The components in this -dimensional vector are of course very similar to each other and there are other, more effective methods available, but we found this choice sufficient for this first study.
After the feature selection we selectedtrials, described in the previous experiment, and the corresponding distributions for all observers. These -dimensional vectors were then used to train a support vector machine (SVM) to discriminate one observer against all the other observers. After the training we selected vectors from random observers and random trials and classified them with the trained SVM. This was repeated 100 times and the results were collected in the mean recognition rates. In all experiments we used vectors for classification.
The following figures illustrate some of the results obtained using these evaluation procedure. First we investigated the influence of the number of pdf-samples used (the value of the -parameter). We used the euclidean distance, 5000 trials, 50 images per trial and samples to train the SVM. In each classification step 5000 vectors were classified. Figure 11 shows the results obtained for
We see that the best results are obtained using 10 or 20 samples and we therefore choose to use 20 pdf samples in the following experiments.
In the next experiment we investigated if the euclidean or the hyperbolic distance measure gives better classification results. We used, 10000 trials, 250 images per trial and samples to train the SVM. In figure 12 we show the results and we see that the euclidean distance measure gives in general better classification results than the hyperbolic distance (but note the very high recognition rates).
In Fig. 13 we illustrate the influence of the number of images used to train the SVM. We used, 5000 trials, 50 images per trial and samples to train the SVM. We see that 20 or 50 samples are sufficient.
In the last experiment we investigated how the number of images used in a trial influence the classification result. We used 5000 trials and 50 and 200 images per trial (corresponding to Figs. 9 and 10) We see that the number of images used to estimate the parameters of the GPD has a great influence on the classification result, as was to be expected.
The three main topics described in the paper are (1) the introduction of the hyperbolic distance, (2) the usage of the GPD and (3) the recognition of observers based on the measurements of the saccadic eye movements.
The application of hyperbolic geometry is attractive from a theoretical point of view since the limitation of the viewing geometry is built into the model. The experiments showed that the fitting of the GPD distribution, as measured by the R-squared values, are slightly better for the hyperbolic distance than for the euclidean distance. This is probably an effect of the relative higher importance of large distances in the hyperbolic geometry. The somewhat lower classification rates in the hyperbolic case, compared to the euclidean case, may be an indication that for a classification the eye movements in the transition phases between fixation and saccades are also important.
In both cases, euclidean and hyperbolic, we find however that the GPD provides a very good and compact model for the statistical distribution of the step lengths. The classification experiments show that these distribution are potentially useful for identification tasks but more detailed studies with more observers are necessary to judge the potential of the method.
We also note that the special form of the GPD’s make it possible to derive analytic expressions for different distances (like the Hellinger distance). We derived some of these expressions with the help of Mathematica but their final form often involves expressions based on special functions like the hypergeometric function which are costly to evaluate numerically. We therefore used the numerical implementation described above.
No attempts were made to optimize the parameters in the different processing steps. All statistical computations (gpfit, gmdistribution, fitcsvm, predict) were done using the tools in the Matlab 2014b Statistics Toolbox with default parameter settings.
-  J. W. Anderson. Hyperbolic geometry. Springer Verlag, London, 1999.
A. Borji and L. Itti.
State-of-the-art in visual attention modeling.IEEE Trans. Pattern Analysis and Machine Intelligence, 35(1):185–207, 2013.
-  E. Castillo. Extreme value and related models with applications in engineering and science. Wiley, Hoboken, N.J., 2005.
-  T. Haslwanter. Mathematics of three-dimensional eye rotations. Vision Research, 35(12):1727–1739, June 1995.
-  M. Nikulin. Hellinger distance. In Encyclopedia of Mathematics. Springer Verlag. http://www.encyclopediaofmath.org/index.php?title=Hellinger_distance&oldid=16453.
-  C. L. Siegel. Topics in complex function theory. Vol. 2, Automorphic functions and Abelian integrals. Wiley Interscience, New York, 1969.
-  J. Tilke, K. Ehinger, F. Durand, and A. Torralba. Learning to predict where humans look. In Int. Conf. Comp. Vis., ICCV 2009, pages 2106–2113, 2009.
-  G. J. G. Upton and I. T. Cook. A dictionary of statistics. Oxford University Press, Oxford, 2008.