Privacy Preserving Gaze Estimation using Synthetic Images via a Randomized Encoding Based Framework

11/06/2019 ∙ by Efe Bozkir, et al. ∙ Universität Tübingen

Eye tracking is considered one of the key technologies for applications that assess and evaluate human attention, behavior, and biometrics, especially using gaze, pupillary, and blink behaviors. One of the main challenges to the social acceptance of eye-tracking technology, however, is the protection of sensitive and personal information. To tackle this challenge, we employed a privacy-preserving framework based on randomized encoding to privately train a Support Vector Regression model on synthetic eye images for human gaze estimation. During the computation, none of the parties learns about the data or the result held by any other party. Furthermore, the party that trains the model cannot reconstruct pupils, blinks, or visual scanpaths. The experimental results show that our privacy-preserving framework is capable of working in real time, is as accurate as its non-private version, and could be extended to other eye-tracking related problems.





With recent advances in smart glasses, Head-Mounted Display (HMD) technology, computer graphics, augmented reality (AR), and eye tracking, numerous novel applications are becoming available. One of the most natural and non-intrusive ways of interacting with these devices is through gaze-contingent interfaces using eye tracking. However, since the eyes are not fully controlled in a conscious way, a lot of sensitive and personal information can be derived from eye tracking data, such as intentions, behaviors, preferences, or fatigue.

To name a few, cognitive load [Chen and Epps2014, Appel et al.2018], attention [Bozkir, Geisler, and Kasneci2019], stress [Kübler et al.2014], task identification, skill level assessment and expertise [Liu et al.2009, Borji and Itti2014, Eivazi et al.2017, Castner et al.2018], activity recognition [Steil and Bulling2015, Braunagel et al.2017], biometric information and authentication [Kinnunen, Sedlak, and Bednarik2010, Komogortsev et al.2010, Komogortsev and Holland2013, Zhang et al.2018, Abdrabou et al.2019], or personality traits [Berkovsky et al.2019] can be obtained from eye movements and eye tracking data. Since such highly sensitive information can be derived from eye tracking data, it is not surprising that HMDs, smart glasses, and similar devices have not yet been adopted by large communities. According to a recent survey [Steil et al.2019a], people agreed to share their eye tracking data only when it is co-owned by a governmental health agency, which indicates that people are hesitant about sharing their eye tracking data in commercial applications. Therefore, larger communities could adopt HMDs or smart glasses if privacy-preserving techniques were applied in eye-tracking applications. The reasons why privacy-preserving schemes are needed for eye tracking were discussed extensively in [Liebling and Preibusch2014]. However, until now, few studies have concentrated on privacy-preserving eye tracking. Recently, a method to detect privacy-sensitive everyday situations [Steil et al.2019b], an approach to degrade iris authentication while keeping gaze tracking at acceptable accuracy [John, Koppal, and Jain2019], and differential-privacy based techniques to protect personal information in heatmaps and eye movement features [Liu et al.2019, Steil et al.2019a] were introduced.
While differential privacy could be applied effectively to eye tracking data for various tasks, it introduces additional noise that decreases utility [Liu et al.2019, Steil et al.2019a] and could lead to less accurate results in computer vision tasks such as gaze estimation, intention detection, or activity recognition.

In light of the above, function-specific privacy models are required. In this work, we focus on the gaze estimation problem as a proof-of-concept due to the simplicity of generating synthetic data including eye landmarks and ground truth gaze vectors. However, thanks to its real-time capabilities, the same privacy-preserving approach could be further extended to any feature-based eye-tracking problem, such as intention, fatigue, or personality trait detection, in HMD or unconstrained setups. In our study, the gaze estimation task is solved in a privacy-preserving manner by computing the dot product for a scenario where two parties hold eye landmarks, each of which we call an input-party, and one party, which we call the function-party, wants to train a prediction model on the data of the input-parties. This scenario is relevant when the input-parties want to use eye-tracking data to improve the accuracy of their gaze estimators but do not want to share the data due to privacy concerns. To accomplish this, we utilize a framework employing randomized encoding [Ünal, Akgün, and Pfeifer2019]. In the computation, neither the eye images nor the extracted features are revealed to the function-party. Furthermore, the input-parties learn nothing about the data of the other input-party or the result of the computation. We utilized Support Vector Regression (SVR) to estimate the gaze vectors. Eye images were rendered synthetically using UnityEyes [Wood et al.2016] and 36 landmark-based features [Park et al.2018] were used. To the best of our knowledge, this is the first study that applies a privacy-preserving scheme based on function-specific privacy models to an eye tracking related problem.


In this section, we briefly discuss the synthetic data generation, processing, randomized encoding, and overall framework.

Data Generation

To train and evaluate the gaze estimator, we first generated eye images and their gaze vectors. Since our work is a proof-of-concept and requires a high amount of data, we used synthetic eye images rendered by UnityEyes [Wood et al.2016], which is based on the Unity3D game engine. Camera parameters and eye parameters were taken as (fixed camera) and (eyeball pose range parameters in degrees), respectively. Images were rendered using the Fantastic graphics quality setting and screen resolution. Afterwards, the data processing and normalization pipeline from [Park et al.2018] was employed. In the end, we obtained sized eye images, 18 eye landmarks including eight iris edge landmarks, eight eyelid landmarks, one iris center landmark, and one iris-center-eyeball-center vector normalized by the Euclidean distance between the two eye corners, as well as gaze vectors comprising pitch and yaw angles. The final feature vectors consisted of 36 normalized elements. An example visualization of eye landmarks and the gaze vector on a rendered synthetic image is shown in Figure 1.

(a) Landmarks
(b) Gaze
Figure 1: Eye landmarks and gaze on an example synthetic image
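The feature assembly described above can be sketched as follows. This is an illustrative reconstruction only: the exact pipeline follows [Park et al.2018], and the function and argument names here are ours. We assume the count that yields the 36 features mentioned earlier (17 two-dimensional landmarks plus the 2D iris-center-eyeball-center vector).

```python
import numpy as np

def landmark_features(landmarks, iris_center, eyeball_center,
                      corner_left, corner_right):
    """Build a normalized feature vector from 2D eye landmarks:
    17 landmarks (8 iris edge + 8 eyelid + 1 iris center) plus the
    iris-center-eyeball-center vector, all scaled by the Euclidean
    distance between the two eye corners (illustrative sketch)."""
    scale = np.linalg.norm(corner_right - corner_left)  # inter-corner distance
    pts = (landmarks - eyeball_center) / scale          # (17, 2), normalized
    gaze_dir = (iris_center - eyeball_center) / scale   # (2,), normalized
    return np.concatenate([pts.ravel(), gaze_dir])      # 17*2 + 2 = 36 values

rng = np.random.default_rng(0)
feats = landmark_features(rng.random((17, 2)), rng.random(2), rng.random(2),
                          np.array([0.0, 0.5]), np.array([1.0, 0.5]))
assert feats.shape == (36,)
```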

Randomized Encoding

The framework that we utilized employs randomized encoding (RE) [Applebaum, Ishai, and Kushilevitz2006b, Applebaum, Ishai, and Kushilevitz2006a] to compute the element-wise multiplication of the landmark vectors, which is then used to compute the dot product of these vectors. In randomized encoding, the computation of a function f(x) is performed by a randomized function f̂(x; r), where x is the input value and r is the random value. The idea is to encode the original function using random value(s) such that the combination of the components of the encoding reveals only the output of the original function and nothing else. In the framework, the computation of the dot product is accomplished by utilizing the decomposable and affine randomized encoding (DARE) of multiplication [Applebaum2017]. The encoding of multiplication is as follows:

Definition 1 (Perfect RE for Multiplication)

[Applebaum2017] Let f_mul(x, y) = x · y be a multiplication function over a ring R. One can perfectly encode f_mul by employing the DARE g(x, y; r1, r2, r3):

g(x, y; r1, r2, r3) = (x + r1, y + r2, r2 · x + r3, r1 · y + r1 · r2 − r3),

where r1, r2, and r3 are uniformly chosen random values. The recovery of x · y can be accomplished by computing c1 · c2 − c3 − c4, where c1 = x + r1, c2 = y + r2, c3 = r2 · x + r3, and c4 = r1 · y + r1 · r2 − r3. The simulation of the encoding can be done perfectly by the simulator Sim(z) := (r1, r2, r3, r1 · r2 − r3 − z), where r1, r2, and r3 are random values.
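A toy demonstration of this DARE over the ring of integers modulo a prime can be written in a few lines; the function names and the choice of modulus are illustrative, not part of any library:

```python
import secrets

P = 2**61 - 1  # a Mersenne prime, used as the ring modulus (illustrative)

def encode(x, y):
    """Encode f(x, y) = x * y with three uniformly random ring elements."""
    r1, r2, r3 = (secrets.randbelow(P) for _ in range(3))
    c1 = (x + r1) % P
    c2 = (y + r2) % P
    c3 = (r2 * x + r3) % P
    c4 = (r1 * y + r1 * r2 - r3) % P
    return c1, c2, c3, c4

def recover(c1, c2, c3, c4):
    """Recover x * y from the encoding components alone: the cross terms
    in c1 * c2 are cancelled exactly by c3 and c4."""
    return (c1 * c2 - c3 - c4) % P

c = encode(1234, 5678)
assert recover(*c) == (1234 * 5678) % P
```

Note that each component in isolation is uniformly distributed, which is what makes the perfect simulation possible.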


To perform the private gaze estimation task in our scenario, we drew on the framework proposed in [Ünal, Akgün, and Pfeifer2019] due to its efficiency compared to other approaches proposed in the literature. The framework was designed to compute the addition or multiplication of the input values of two input-parties in the function-party by utilizing randomized encoding. We utilized the multiplication operation over the eye landmark vectors to compute the dot product of these vectors.

Let us assume that Alice and Bob have the eye landmark data X_A ∈ R^(n_A × d) and X_B ∈ R^(n_B × d), where n_A and n_B represent the number of samples of Alice and Bob, respectively, and d is the number of features. In addition to the input-parties Alice and Bob, let us assume that there is a server that wants to train a model on the data of the input-parties. Before continuing, it is worth noting that x^i denotes the i-th sample (row) of the corresponding matrix and "⊙" represents the element-wise multiplication of vectors. Furthermore, the input-parties shuffle their data before the computation to avoid the possibility of private information leakage, such as the behavior of the person, through the nature of the visual sequence information. At the first step of the computation, Alice creates three vectors r1, r2, r3 ∈ R^d with uniformly chosen random values, which will be used to encode the element-wise multiplication of the vectors. Afterwards, Alice shares these vectors with Bob. Once Bob receives the random vectors, he computes c2^j = x_B^j + r2 and S_B^j = Σ_k (r1 ⊙ x_B^j + r1 ⊙ r2 − r3)_k for each of his samples. Meanwhile, Alice computes c1^i = x_A^i + r1 and S_A^i = Σ_k (r2 ⊙ x_A^i + r3)_k. When the input-parties have computed their shares of the encoding, they send these components to the server along with the gram matrix of their own samples, that is, the dot products among their own samples. The server then needs to compute the dot products between the samples of Alice and Bob to complete the missing part of the gram matrix of all samples. To achieve this, the server computes M_ij = Σ_k (c1^i ⊙ c2^j)_k − S_A^i − S_B^j, where M_ij is the i-th row, j-th column entry of the gram matrix between the samples of Alice and Bob. Once the server has all components of the gram matrix, it can construct the complete gram matrix by simply concatenating its parts. In our solution, Alice and Bob send the tuples (c1^i, S_A^i) and (c2^j, S_B^j) to the server, respectively, where S_A^i and S_B^j are the summations of one share of the element-wise product of the whole input. By doing so, we prevent the server from making inferences from the element-wise multiplication of the raw vectors.
The overall flow between the input-parties and server is summarized in Figure 2.

Figure 2: Overall protocol execution
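The protocol above can be sketched numerically as follows. This is a minimal reconstruction under our reading of the framework: the variable names are ours, and for brevity it operates over floating-point vectors rather than a finite ring.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 36          # number of landmark features, as in the paper
nA, nB = 4, 3   # toy sample counts for Alice and Bob

XA = rng.random((nA, d))   # Alice's landmark vectors
XB = rng.random((nB, d))   # Bob's landmark vectors

# Step 1: Alice draws three random vectors and shares them with Bob.
r1, r2, r3 = rng.random((3, d))

# Step 2: each input-party computes its share of the encoding.
C1 = XA + r1                                # Alice, one vector per sample
SA = (r2 * XA + r3).sum(axis=1)             # Alice, one scalar per sample
C2 = XB + r2                                # Bob, one vector per sample
SB = (r1 * XB + r1 * r2 - r3).sum(axis=1)   # Bob, one scalar per sample

# Step 3: the server combines the shares into the cross gram matrix
# M[i, j] = <x_A^i, x_B^j> without ever seeing the raw landmark vectors;
# the random cross terms cancel exactly as in the DARE recovery.
M = C1 @ C2.T - SA[:, None] - SB[None, :]

assert np.allclose(M, XA @ XB.T)
```

Sending the per-sample scalar sums SA and SB, rather than the element-wise components, is what keeps the server from inferring the element-wise products of the raw vectors.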

Once the server has the whole gram matrix for all samples of Alice and Bob, it can use it as a kernel matrix, as if it had been computed by the linear kernel function on the pooled data. Additionally, the server can compute a kernel matrix as if it had been computed by the polynomial or radial basis function (RBF) kernel by utilizing the resulting gram matrix. As an example, the RBF kernel is calculated from the gram matrix as follows:

k(x, y) = exp(−γ (x·x − 2 (x·y) + y·y)),

where "·" represents the dot product of vectors, which can be obtained from the gram matrix, and γ is the parameter utilized to adjust the similarity level. Once the desired kernel matrix is computed, one can train an SVR model on it to estimate the gaze. In the process of computing the dot product, the amount of data transferred among the parties is bytes, where is the size of one data unit.
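The identity ||x − y||² = x·x − 2(x·y) + y·y means the RBF kernel matrix can be derived from the gram matrix alone, which a short sketch (our own helper name) makes concrete:

```python
import numpy as np

def rbf_from_gram(G, gamma):
    """Build an RBF kernel matrix from a gram matrix G = X X^T using
    ||x - y||^2 = x.x - 2 x.y + y.y; no access to X itself is needed."""
    sq = np.diag(G)                           # squared norms x.x on the diagonal
    sq_dists = sq[:, None] - 2.0 * G + sq[None, :]
    return np.exp(-gamma * sq_dists)

X = np.random.default_rng(1).random((5, 3))
K = rbf_from_gram(X @ X.T, gamma=0.5)

# Cross-check against the direct pairwise computation.
diff = X[:, None, :] - X[None, :, :]
K_direct = np.exp(-0.5 * (diff ** 2).sum(-1))
assert np.allclose(K, K_direct)
```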


To demonstrate the performance of the proposed framework on the gaze estimation problem, we conducted experiments on a PC equipped with an Intel Core i7-7500U processor at 2.70 GHz and 16 GB of RAM. In these experiments, we employed varying sizes of eye landmark data, namely 5k, 10k, and 20k samples, of which one-fifth was used as test data. We split the data equally between the input-parties. It is worth noting that the framework allows us to optimize the parameters of the model on the server without further communication with the input-parties. Thanks to this, we used 5-fold cross-validation to optimize the parameters, namely the similarity adjustment parameter γ of the Gaussian RBF kernel, the misclassification penalty parameter C, and the tolerance parameter ε of SVR. To evaluate the prediction results of the experiments, we employed the mean angular error in the same way as [Park et al.2018]. Table 1 demonstrates the relationship between the size of the dataset and the mean angular error of gaze estimation. Since no additional noise was introduced during the computation of the kernel matrix, the results we obtained from the framework are the same as the non-private ones as long as the same parameters are utilized. The mean angular error values we obtained are considerably lower than in state-of-the-art gaze estimation studies because we used purely synthetic data and fixed the camera position during synthetic image rendering.

# of samples    Mean angular error
5k              0.21
10k             0.18
20k             0.17
Table 1: The mean angular errors for varying sizes of the dataset

Furthermore, the total amount of time to train and test the models increased in proportion to the number of samples, since the computational requirements grow with more samples; the sizes of the vectors and matrices, and hence the communication cost, increase as well. The execution times of all parties for a single set of parameters are shown in Table 2.

# of samples    Exec. time of Alice (s)    Exec. time of Bob (s)    Exec. time of server (s)
5k              4.62                       2.40                     29.28
10k             9.80                       10.39                    425.07
20k             51.41                      77.70                    537.80
Table 2: The execution times of the parties for varying sizes of the dataset

To assess the real-time capabilities of the framework in the gaze estimation task, we measured the time required to estimate the gaze, including pitch and yaw angles, of the test samples. In total, s were spent, which corresponds to ms per sample. Considering the current sampling frequencies of eye trackers and HMDs, it is possible to deploy and use the framework in the wild to estimate gaze, as long as optimized and efficient communication between the parties is established.


Since eye tracking data contains a high amount of sensitive information, such as behaviors, intentions, or preferences, it is essential to develop secure and privacy-preserving approaches to process it. In this study, we utilized a framework based on randomized encoding to achieve gaze estimation on synthetic images generated by UnityEyes. In our proposed system, none of the input-parties has access to the eye landmark data of the others or to the result of the computation in the function-party. Similarly, the function-party cannot infer anything about the data of the input-parties. Additionally, even though the function-party receives the ground-truth gaze vectors from the input-parties, in a real-world application the temporal information of the visual scanpath cannot be inferred, due to the shuffling of the data and the lack of additional sensory information such as the sampling frequency. Since the eye landmarks are not directly accessible, pupillary or blink information cannot be reconstructed either. While performing as accurately as non-private SVR gaze estimators using the same parameter set, our framework is also capable of working in real time.

In conclusion, our major contribution with this work is twofold. First, we show that the accuracy of the gaze estimation task can be improved by using the data of two input-parties in a privacy-preserving manner. When genetic structural differences in the eye region are taken into consideration, this can provide improved gaze estimation. Since we also show that our framework is capable of working in real time, the proposed system could be deployed along with modern HMDs for use cases such as improved gaze-contingent rendering. Second, our study provides a basis for privacy-preserving solutions to eye-tracking related problems. Our approach indicates that two parties can use their sensitive eye-tracking features to solve similar eye-tracking related problems without exposing their data, and can use these solutions in real time as long as a similar number of eye features is used. To the best of our knowledge, this is the first work based on function-specific privacy models in the eye tracking domain.

As future work, we plan to apply our solution to tasks other than gaze estimation using eye movement features and to assess it. Another potential improvement is to extend our approach to a larger number of input- and function-parties. Lastly, instead of using SVR in our solution, we plan to extend our approach to deep learning based solutions using additional encryption techniques and to assess them.


  • [Abdrabou et al.2019] Abdrabou, Y.; Khamis, M.; Eisa, R. M.; Ismail, S.; and Elmougy, A. 2019. Just gaze and wave: Exploring the use of gaze and gestures for shoulder-surfing resilient authentication. In Proceedings of the 11th ACM Symposium on Eye Tracking Research & Applications, ETRA ’19, 29:1–29:10. New York, NY, USA: ACM.
  • [Appel et al.2018] Appel, T.; Scharinger, C.; Gerjets, P.; and Kasneci, E. 2018. Cross-subject workload classification using pupil-related measures. In Proceedings of the 2018 ACM Symposium on Eye Tracking Research & Applications, ETRA ’18, 4:1–4:8. New York, NY, USA: ACM.
  • [Applebaum, Ishai, and Kushilevitz2006a] Applebaum, B.; Ishai, Y.; and Kushilevitz, E. 2006a. Computationally private randomizing polynomials and their applications. computational complexity 15(2):115–162.
  • [Applebaum, Ishai, and Kushilevitz2006b] Applebaum, B.; Ishai, Y.; and Kushilevitz, E. 2006b. Cryptography in NC^0. SIAM Journal on Computing 36(4):845–888.
  • [Applebaum2017] Applebaum, B. 2017. Garbled circuits as randomized encodings of functions: a primer. In Tutorials on the Foundations of Cryptography. Springer. 1–44.
  • [Berkovsky et al.2019] Berkovsky, S.; Taib, R.; Koprinska, I.; Wang, E.; Zeng, Y.; Li, J.; and Kleitman, S. 2019. Detecting personality traits using eye-tracking data. In Proceedings of the 2019 CHI Conference on Human Factors in Computing Systems, CHI ’19, 221:1–221:12. New York, NY, USA: ACM.
  • [Borji and Itti2014] Borji, A., and Itti, L. 2014. Defending Yarbus: Eye movements reveal observers’ task. Journal of Vision 14.
  • [Bozkir, Geisler, and Kasneci2019] Bozkir, E.; Geisler, D.; and Kasneci, E. 2019. Assessment of driver attention during a safety critical situation in VR to generate VR-based training. In ACM Symposium on Applied Perception 2019, SAP ’19, 23:1–23:5. New York, NY, USA: ACM.
  • [Braunagel et al.2017] Braunagel, C.; Geisler, D.; Rosenstiel, W.; and Kasneci, E. 2017. Online recognition of driver-activity based on visual scanpath classification. IEEE Intelligent Transportation Systems Magazine 9(4):23–36.
  • [Castner et al.2018] Castner, N.; Kasneci, E.; Kübler, T.; Scheiter, K.; Richter, J.; Eder, T.; Hüttig, F.; and Keutel, C. 2018. Scanpath comparison in medical image reading skills of dental students: distinguishing stages of expertise development. In Proceedings of the 2018 ACM Symposium on Eye Tracking Research & Applications,  39. ACM.
  • [Chen and Epps2014] Chen, S., and Epps, J. 2014. Using task-induced pupil diameter and blink rate to infer cognitive load. Human-Computer Interaction 29:390–413.
  • [Eivazi et al.2017] Eivazi, S.; Hafez, A.; Fuhl, W.; Afkari, H.; Kasneci, E.; Lehecka, M.; and Bednarik, R. 2017. Optimal eye movement strategies: a comparison of neurosurgeons gaze patterns when using a surgical microscope. Acta neurochirurgica 159(6):959–966.
  • [John, Koppal, and Jain2019] John, B.; Koppal, S.; and Jain, E. 2019. Eyeveil: Degrading iris authentication in eye tracking headsets. In Proceedings of the 11th ACM Symposium on Eye Tracking Research & Applications, ETRA ’19, 37:1–37:5. New York, NY, USA: ACM.
  • [Kinnunen, Sedlak, and Bednarik2010] Kinnunen, T.; Sedlak, F.; and Bednarik, R. 2010. Towards task-independent person authentication using eye movement signals. In Proceedings of the 2010 Symposium on Eye-Tracking Research & Applications, ETRA ’10, 187–190. New York, NY, USA: ACM.
  • [Komogortsev and Holland2013] Komogortsev, O. V., and Holland, C. D. 2013. Biometric authentication via complex oculomotor behavior. In 2013 IEEE Sixth International Conference on Biometrics: Theory, Applications and Systems (BTAS), 1–8.
  • [Komogortsev et al.2010] Komogortsev, O. V.; Jayarathna, S.; Aragon, C. R.; and Mahmoud, M. 2010. Biometric identification via an oculomotor plant mathematical model. In Proceedings of the 2010 Symposium on Eye-Tracking Research & Applications, ETRA ’10, 57–60. New York, NY, USA: ACM.
  • [Kübler et al.2014] Kübler, T. C.; Kasneci, E.; Rosenstiel, W.; Schiefer, U.; Nagel, K.; and Papageorgiou, E. 2014. Stress-indicators and exploratory gaze for the analysis of hazard perception in patients with visual field loss. Transportation Research Part F: Traffic Psychology and Behaviour 24:231–243.
  • [Liebling and Preibusch2014] Liebling, D. J., and Preibusch, S. 2014. Privacy considerations for a pervasive eye tracking world. In Proceedings of the 2014 ACM International Joint Conference on Pervasive and Ubiquitous Computing: Adjunct Publication, UbiComp ’14 Adjunct, 1169–1177. New York, NY, USA: ACM.
  • [Liu et al.2009] Liu, Y.; Hsueh, P.; Lai, J.; Sangin, M.; Nussli, M.; and Dillenbourg, P. 2009. Who is the expert? analyzing gaze data to predict expertise level in collaborative applications. In 2009 IEEE International Conference on Multimedia and Expo, 898–901.
  • [Liu et al.2019] Liu, A.; Xia, L.; Duchowski, A.; Bailey, R.; Holmqvist, K.; and Jain, E. 2019. Differential privacy for eye-tracking data. In Proceedings of the 11th ACM Symposium on Eye Tracking Research & Applications, ETRA ’19, 28:1–28:10. New York, NY, USA: ACM.
  • [Park et al.2018] Park, S.; Zhang, X.; Bulling, A.; and Hilliges, O. 2018. Learning to find eye region landmarks for remote gaze estimation in unconstrained settings. In Proceedings of the 2018 ACM Symposium on Eye Tracking Research & Applications, ETRA ’18, 21:1–21:10. New York, NY, USA: ACM.
  • [Steil and Bulling2015] Steil, J., and Bulling, A. 2015. Discovery of everyday human activities from long-term visual behaviour using topic models. In Proceedings of the 2015 ACM International Joint Conference on Pervasive and Ubiquitous Computing, UbiComp ’15, 75–85. New York, NY, USA: ACM.
  • [Steil et al.2019a] Steil, J.; Hagestedt, I.; Huang, M. X.; and Bulling, A. 2019a. Privacy-aware eye tracking using differential privacy. In Proceedings of the 11th ACM Symposium on Eye Tracking Research & Applications, ETRA ’19, 27:1–27:9. New York, NY, USA: ACM.
  • [Steil et al.2019b] Steil, J.; Koelle, M.; Heuten, W.; Boll, S.; and Bulling, A. 2019b. Privaceye: Privacy-preserving head-mounted eye tracking using egocentric scene image and eye movement features. In Proceedings of the 11th ACM Symposium on Eye Tracking Research & Applications, ETRA ’19, 26:1–26:10. New York, NY, USA: ACM.
  • [Ünal, Akgün, and Pfeifer2019] Ünal, A. B.; Akgün, M.; and Pfeifer, N. 2019. A framework with randomized encoding for a fast privacy preserving calculation of non-linear kernels for machine learning applications in precision medicine. In International Conference on Cryptology and Network Security, 493–511. Springer, Cham.
  • [Wood et al.2016] Wood, E.; Baltrušaitis, T.; Morency, L.-P.; Robinson, P.; and Bulling, A. 2016. Learning an appearance-based gaze estimator from one million synthesised images. In Proceedings of the Ninth Biennial ACM Symposium on Eye Tracking Research & Applications, 131–138.
  • [Zhang et al.2018] Zhang, Y.; Hu, W.; Xu, W.; Chou, C. T.; and Hu, J. 2018. Continuous authentication using eye movement response of implicit visual stimuli. Proc. ACM Interact. Mob. Wearable Ubiquitous Technol. 1(4):177:1–177:22.