Introduction
With recent advances in smart glasses, head-mounted display (HMD) technology, computer graphics, augmented reality (AR), and eye tracking, numerous novel applications are becoming available. One of the most natural and non-intrusive ways of interacting with these devices is through gaze-contingent interfaces based on eye tracking. However, because the eyes are not fully controlled consciously, a great deal of sensitive and personal information can be derived from eye tracking data, such as intentions, behaviors, preferences, or fatigue.
To name a few, cognitive load [Chen and Epps 2014, Appel et al. 2018], attention [Bozkir, Geisler, and Kasneci 2019], stress [Kübler et al. 2014], task identification, skill level assessment and expertise [Liu et al. 2009, Borji and Itti 2014, Eivazi et al. 2017, Castner et al. 2018], activity recognition [Steil and Bulling 2015, Braunagel et al. 2017], biometric information and authentication [Kinnunen, Sedlak, and Bednarik 2010, Komogortsev et al. 2010, Komogortsev and Holland 2013, Zhang et al. 2018, Abdrabou et al. 2019], or personality traits [Berkovsky et al. 2019] can all be obtained from eye movements and eye tracking data. Since such highly sensitive information can be derived from eye tracking data, it is not surprising that HMDs, smart glasses, and similar devices have not yet been adopted by large communities. According to a recent survey [Steil et al. 2019a], people agreed to share their eye tracking data only when it is co-owned by a governmental health agency, which indicates that people are hesitant to share their eye-tracking data in commercial applications. Therefore, larger communities could adopt HMDs or smart glasses if privacy-preserving techniques were applied in eye-tracking applications. The reasons why privacy-preserving schemes are needed for eye tracking were discussed extensively in [Liebling and Preibusch 2014]. However, until now, few studies have concentrated on privacy-preserving eye tracking. Recently, a method to detect privacy-sensitive everyday situations [Steil et al. 2019b], an approach to degrade iris authentication while keeping gaze tracking accuracy at an acceptable level [John, Koppal, and Jain 2019], and differential-privacy-based techniques to protect personal information in heatmaps and eye movement features [Liu et al. 2019, Steil et al. 2019a] were introduced.
While differential privacy can be applied effectively to eye tracking data for various tasks, it introduces additional noise that decreases utility [Liu et al. 2019, Steil et al. 2019a] and could lead to less accurate results in computer vision tasks such as gaze estimation, intention detection, or activity recognition.
In light of the above, function-specific privacy models are required. In this work, we focus on the gaze estimation problem as a proof of concept due to the simplicity of generating synthetic data that includes eye landmarks and ground-truth gaze vectors. However, thanks to its real-time capability, the same privacy-preserving approach can be extended to any feature-based eye-tracking problem, such as intention, fatigue, or personality trait detection, in HMD or unconstrained setups. In our study, the gaze estimation task is solved in a privacy-preserving manner by computing dot products for a scenario with two parties that hold eye landmarks, each of which we call an input-party, and one party, which we call the function-party, that wants to train a prediction model on the data of the input-parties. This scenario is relevant when the input-parties want to use eye-tracking data to improve the accuracy of their gaze estimators but do not want to share the data due to privacy concerns. To accomplish this, we utilize a framework employing randomized encoding [Ünal, Akgün, and Pfeifer 2019]. In the computation, neither the eye images nor the extracted features are revealed to the function-party. Furthermore, the input-parties learn nothing about each other's data or the result of the computation. We utilized Support Vector Regression (SVR) to estimate the gaze vectors. Eye images were rendered synthetically using UnityEyes [Wood et al. 2016] and 36 landmark-based features [Park et al. 2018] were used. To the best of our knowledge, this is the first study that applies a privacy-preserving scheme based on function-specific privacy models to an eye tracking related problem.
Methodology
In this section, we briefly discuss the synthetic data generation, processing, randomized encoding, and overall framework.
Data Generation
To train and evaluate the gaze estimator, we first generated eye images and their gaze vectors. Since our work is a proof of concept and requires a large amount of data, we used synthetic eye images rendered by UnityEyes [Wood et al. 2016], which is based on the Unity3D game engine. The camera was kept fixed, and the eyeball pose range parameters were specified in degrees. Images were rendered with the "Fantastic" graphics quality setting. Afterwards, the data processing and normalization pipeline from [Park et al. 2018] was employed. In the end, we obtained normalized eye images and, per image, 18 eye landmarks, consisting of eight iris edge landmarks, eight eyelid landmarks, one iris center landmark, and one iris-center-eyeball-center vector, normalized by the Euclidean distance between the two eye corners, together with gaze vectors consisting of pitch and yaw angles. The final feature vectors consisted of 36 normalized elements. An example visualization of the eye landmarks and gaze vector on a rendered synthetic image is shown in Figure 1.
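The landmark normalization described above can be sketched in a few lines. This is our own minimal illustration, not the exact pipeline of [Park et al. 2018]: it assumes 2D landmark coordinates, centers them on the midpoint of the two eye corners, and scales by the inter-corner distance to obtain a 36-element feature vector.

```python
import numpy as np

def landmarks_to_features(landmarks, corner_left, corner_right):
    """Normalize 18 2D eye landmarks into a 36-element feature vector.

    landmarks: (18, 2) array of landmark coordinates in pixels.
    corner_left, corner_right: 2D coordinates of the two eye corners.
    """
    landmarks = np.asarray(landmarks, dtype=float)
    scale = np.linalg.norm(np.asarray(corner_right) - np.asarray(corner_left))
    center = (np.asarray(corner_left) + np.asarray(corner_right)) / 2.0
    # Translate to the eye center and scale by the inter-corner distance,
    # making the features invariant to eye position and size in the image.
    normalized = (landmarks - center) / scale
    return normalized.reshape(-1)  # 36-dimensional feature vector

features = landmarks_to_features(np.random.rand(18, 2) * 100,
                                 corner_left=(10, 50), corner_right=(90, 50))
assert features.shape == (36,)
```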
Randomized Encoding
The framework that we utilize employs randomized encoding (RE) [Applebaum, Ishai, and Kushilevitz2006b, Applebaum, Ishai, and Kushilevitz2006a] to compute the element-wise multiplication of the landmark vectors, which is then used to compute the dot product of these vectors. In randomized encoding, the computation of a function f(x) is performed by a randomized function f̂(x; r), where x is the input value and r is the random value. The idea is to encode the original function using random value(s) such that the combination of the components of the encoding reveals only the output of the original function and nothing else. In the framework, the computation of the dot product is accomplished by utilizing the decomposable and affine randomized encoding (DARE) of multiplication [Applebaum2017]. The encoding of multiplication is as follows:
Definition 1 (Perfect RE for Multiplication)
[Applebaum2017] Let f_mul(x, y) = x · y be a multiplication function over a ring R. One can perfectly encode f_mul by employing the DARE f̂_mul:

f̂_mul(x, y; r1, r2, r3) = (x + r1, y + r2, r2 · x + r3, r1 · y + r1 · r2 - r3) = (c1, c2, c3, c4)

where r1, r2 and r3 are uniformly chosen random values. The recovery of x · y can be accomplished by computing c1 · c2 - c3 - c4, where c1 = x + r1, c2 = y + r2, c3 = r2 · x + r3, and c4 = r1 · y + r1 · r2 - r3. The simulation of f̂_mul can be done perfectly by the simulator Sim(z; r1, r2, r3) := (r1, r2, r3, r1 · r2 - r3 - z), where r1, r2, r3 are random values.
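As a numerical sanity check of the multiplication DARE, the sketch below encodes and recovers a product over integers modulo a prime; the modulus and variable names are our illustrative choices, not part of the framework.

```python
import random

P = 2**31 - 1  # prime modulus; we work in the ring Z_P (illustrative choice)

def encode_mul(x, y, r1, r2, r3):
    """DARE components of f(x, y) = x * y with random values r1, r2, r3."""
    c1 = (x + r1) % P
    c2 = (y + r2) % P
    c3 = (r2 * x + r3) % P
    c4 = (r1 * y + r1 * r2 - r3) % P
    return c1, c2, c3, c4

def recover(c1, c2, c3, c4):
    """Combine the components: c1*c2 - c3 - c4 equals x*y."""
    return (c1 * c2 - c3 - c4) % P

x, y = 1234, 5678
r = [random.randrange(P) for _ in range(3)]
assert recover(*encode_mul(x, y, *r)) == (x * y) % P
```

No single component reveals x or y: c1 and c2 are uniformly masked, and c3, c4 are masked by the fresh randomness r3.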
Framework
To perform the private gaze estimation task in our scenario, we took inspiration from the framework proposed in [Ünal, Akgün, and Pfeifer 2019] due to its efficiency compared to the other approaches proposed in the literature. The framework was designed to let the function-party compute the addition or multiplication of the input values of two input-parties by utilizing randomized encoding. We utilized the multiplication operation over the eye landmark vectors to compute the dot product of these vectors.
Let us assume that Alice and Bob have the eye landmark data A and B, where n_a and n_b represent the number of samples of Alice and Bob, respectively, and d is the number of features. In addition to the input-parties Alice and Bob, let us assume that there is a server that wants to train a model on the data of the input-parties. Before continuing, it is worth noting that x_i for any matrix X represents the i-th sample of the corresponding matrix and "⊙" represents the element-wise multiplication of vectors. Furthermore, the input-parties shuffle their data before the computation to avoid the possibility of private information leakage, such as the behavior of the person, through the nature of the visual sequence information. At the first step of the computation, Alice creates three vectors r1, r2, and r3 with uniformly chosen random values, which will be used to encode the element-wise multiplication of the vectors. Afterwards, Alice shares these vectors with Bob. Once Bob receives the vectors of random values, he computes c2^j = b_j + r2 and M_b^j = Σ_k (r1 ⊙ b_j + r1 ⊙ r2 - r3)_k, where b_j is the j-th sample of Bob and j ∈ {1, …, n_b}. Meanwhile, Alice computes c1^i = a_i + r1 and M_a^i = Σ_k (r2 ⊙ a_i + r3)_k, where a_i is the i-th sample of Alice and i ∈ {1, …, n_a}. Once the input-parties have computed their shares of the encoding, they send these components to the server along with the Gram matrix of their own samples, that is, the dot products among their own samples. The server then needs to compute the dot products between the samples of Alice and Bob to complete the missing part of the Gram matrix of all samples. To achieve this, the server computes G_ij = Σ_k (c1^i ⊙ c2^j)_k - M_a^i - M_b^j, where G_ij is the i-th row, j-th column entry of the Gram matrix between the samples of Alice and Bob. Once the server has all components of the Gram matrix, it can construct the complete Gram matrix by simply concatenating its parts. In our solution, Alice and Bob send the tuples (c1^i, M_a^i) and (c2^j, M_b^j) to the server, respectively. M_a^i and M_b^j are the summations of one share of the element-wise product of the whole input vector; by sending only these sums, we prevent the server from making inferences from the element-wise multiplication of the raw vectors.
The overall flow between the inputparties and server is summarized in Figure 2.
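As an illustration, the flow among Alice, Bob, and the server can be simulated end to end with the multiplication DARE. The sketch below uses our own variable names and floating-point arithmetic for simplicity, and checks that the server recovers exactly the Gram matrix between Alice's and Bob's samples without ever seeing the raw landmark vectors.

```python
import numpy as np

rng = np.random.default_rng(0)
n_a, n_b, d = 4, 3, 36  # sample counts and feature dimension (illustrative)

A = rng.normal(size=(n_a, d))  # Alice's landmark features
B = rng.normal(size=(n_b, d))  # Bob's landmark features

# Alice draws the three random vectors and shares them with Bob.
r1, r2, r3 = rng.normal(size=(3, d))

# Alice's shares: one masked row and one scalar sum per sample.
C1 = A + r1
Ma = (A * r2 + r3).sum(axis=1)

# Bob's shares, built from the same random vectors.
C2 = B + r2
Mb = (B * r1 + r1 * r2 - r3).sum(axis=1)

# Server: combine the shares; the masks cancel, leaving the cross Gram matrix.
G = C1 @ C2.T - Ma[:, None] - Mb[None, :]

assert np.allclose(G, A @ B.T)
```

Because Ma and Mb are sums over the whole feature dimension, the server never obtains the element-wise products of individual features.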
After obtaining the whole Gram matrix for all samples of Alice and Bob, the server can use it as a kernel matrix as if it had been computed by the linear kernel function on the pooled data. Additionally, the server can compute a kernel matrix as if it had been computed by the polynomial or radial basis function (RBF) kernel by utilizing the resulting Gram matrix. As an example, the calculation of the RBF kernel from the Gram matrix is as follows:
k(x, y) = exp(-||x - y||² / (2σ²)) = exp(-(x · x - 2 x · y + y · y) / (2σ²)),

where "·" represents the dot product of vectors, which can be obtained from the Gram matrix, and σ is the parameter utilized to adjust the similarity level. Once the desired kernel matrix is computed, one can train an SVR model on it to estimate the gaze. In the computation of the dot product, the amount of data transferred among the parties scales linearly with the number of samples, the number of features, and the size of one data unit in bytes.
Results
To demonstrate the performance of the proposed framework on the gaze estimation problem, we conducted experiments on a PC equipped with an Intel Core i7-7500U processor at 2.70 GHz and 16 GB of RAM. In these experiments, we employed eye landmark datasets of varying sizes, namely 5k, 10k, and 20k samples, of which one fifth was the test data. We split the data equally between the input-parties. It is worth noting that the framework allows us to optimize the parameters of the model in the server without further communication with the input-parties. Thanks to this, we utilized 5-fold cross-validation to optimize the parameters, which are the similarity adjustment parameter of the Gaussian RBF kernel, the misclassification penalty parameter, and the tolerance parameter of SVR. To evaluate the prediction results of the experiments, we employed the mean angular error in the same way as [Park et al. 2018]. Table 1 demonstrates the relationship between the size of the dataset and the mean angular error of gaze estimation. Since no additional noise is introduced during the computation of the kernel matrix, the results obtained with the framework are the same as the non-private ones as long as the same parameters are used. The mean angular error values we obtained are considerably lower than in state-of-the-art gaze estimation studies because we used purely synthetic data and fixed the camera position during synthetic image rendering.

Table 1: Mean angular error of gaze estimation for different dataset sizes.

# of samples   Mean angular error (degrees)
5k             0.21
10k            0.18
20k            0.17
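The mean angular error reported above can be computed by converting pitch and yaw angles into 3D unit gaze vectors and measuring the angle between them. The sketch below follows a common convention for this conversion; it is our own formulation rather than the exact code of [Park et al. 2018].

```python
import numpy as np

def pitch_yaw_to_vector(pitch, yaw):
    """Convert pitch/yaw angles (radians) to a 3D unit gaze vector."""
    return np.array([
        -np.cos(pitch) * np.sin(yaw),
        -np.sin(pitch),
        -np.cos(pitch) * np.cos(yaw),
    ])

def angular_error_deg(pred, true):
    """Angle in degrees between predicted and ground-truth gaze directions."""
    a = pitch_yaw_to_vector(*pred)
    b = pitch_yaw_to_vector(*true)
    cos = np.clip(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)), -1.0, 1.0)
    return np.degrees(np.arccos(cos))

# Identical angles give zero error; a small perturbation gives a small error.
assert angular_error_deg((0.1, 0.2), (0.1, 0.2)) < 1e-6
assert angular_error_deg((0.1, 0.2), (0.1, 0.21)) < 1.0
```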
Furthermore, the total time to train and test the models increased with the number of samples, since the computational requirements grow with more samples. Larger sample counts also increase the communication cost, as the vectors and matrices become larger. The execution times of all parties for a single set of parameters are shown in Table 2.

Table 2: Execution times (in seconds) of the parties for a single set of parameters.

# of samples   Alice   Bob     Server
5k             4.62    2.40    29.28
10k            9.80    10.39   425.07
20k            51.41   77.70   537.80
To assess the real-time capabilities of the framework on the gaze estimation task, we measured the time required to estimate the gaze, including pitch and yaw angles, of the test samples. The time spent per sample was on the order of milliseconds. Considering the current sampling frequencies of eye trackers and HMDs, it is therefore possible to deploy and use the framework in the wild to estimate gaze, as long as optimized and efficient communication between the parties is established.
Conclusion
Since eye tracking data contains a large amount of sensitive information, such as behaviors, intentions, or preferences, it is essential to develop secure and privacy-preserving approaches to process such data. In this study, we utilized a framework based on randomized encoding to achieve gaze estimation on synthetic images generated by UnityEyes. In our proposed system, neither input-party has access to the eye landmark data of the other or to the result of the computation in the function-party. Similarly, the function-party cannot infer anything about the data of the input-parties. Additionally, even though the function-party obtains the ground-truth gaze vectors from the input-parties, in a real-world application the temporal information of the visual scanpath cannot be inferred, due to the shuffling of the data and the lack of additional sensory information such as the sampling frequency. Since the eye landmarks are not directly accessible, pupillary or blink information cannot be reconstructed either. While being as accurate as non-private SVR gaze estimators using the same parameter set, our framework is also capable of working in real time.
In conclusion, our contribution with this work is twofold. First, we show that the accuracy of the gaze estimation task can be improved by using the data of two input-parties in a privacy-preserving manner. When genetic and structural differences in the eye region are taken into consideration, this can provide improved gaze estimation. Since we also show that our framework is capable of working in real time, the proposed system could be deployed along with modern HMDs for use cases such as improved gaze-contingent rendering. Second, our study provides a basis for privacy-preserving solutions to eye-tracking related problems. Our approach indicates that two parties could use their sensitive eye-tracking features to solve similar eye-tracking related problems without exposing their data, and could use these solutions in real time as long as a similar number of eye features is used. To the best of our knowledge, this is the first work based on function-specific privacy models in the eye tracking domain.
As future work, we plan to apply our solution to tasks other than gaze estimation using eye movement features and to assess it on them. Another potential improvement is to extend our approach to a larger number of input- and function-parties. Lastly, instead of using SVR, we plan to extend our approach to deep-learning-based solutions using additional encryption techniques and to assess them.
References
 [Abdrabou et al.2019] Abdrabou, Y.; Khamis, M.; Eisa, R. M.; Ismail, S.; and Elmougy, A. 2019. Just gaze and wave: Exploring the use of gaze and gestures for shoulder-surfing resilient authentication. In Proceedings of the 11th ACM Symposium on Eye Tracking Research & Applications, ETRA ’19, 29:1–29:10. New York, NY, USA: ACM.
 [Appel et al.2018] Appel, T.; Scharinger, C.; Gerjets, P.; and Kasneci, E. 2018. Cross-subject workload classification using pupil-related measures. In Proceedings of the 2018 ACM Symposium on Eye Tracking Research & Applications, ETRA ’18, 4:1–4:8. New York, NY, USA: ACM.
 [Applebaum, Ishai, and Kushilevitz2006a] Applebaum, B.; Ishai, Y.; and Kushilevitz, E. 2006a. Computationally private randomizing polynomials and their applications. computational complexity 15(2):115–162.
 [Applebaum, Ishai, and Kushilevitz2006b] Applebaum, B.; Ishai, Y.; and Kushilevitz, E. 2006b. Cryptography in NC^0. SIAM Journal on Computing 36(4):845–888.
 [Applebaum2017] Applebaum, B. 2017. Garbled circuits as randomized encodings of functions: a primer. In Tutorials on the Foundations of Cryptography. Springer. 1–44.
 [Berkovsky et al.2019] Berkovsky, S.; Taib, R.; Koprinska, I.; Wang, E.; Zeng, Y.; Li, J.; and Kleitman, S. 2019. Detecting personality traits using eye-tracking data. In Proceedings of the 2019 CHI Conference on Human Factors in Computing Systems, CHI ’19, 221:1–221:12. New York, NY, USA: ACM.
 [Borji and Itti2014] Borji, A., and Itti, L. 2014. Defending yarbus: Eye movements reveal observers’ task. Journal of vision 14.
 [Bozkir, Geisler, and Kasneci2019] Bozkir, E.; Geisler, D.; and Kasneci, E. 2019. Assessment of driver attention during a safety critical situation in VR to generate VR-based training. In ACM Symposium on Applied Perception 2019, SAP ’19, 23:1–23:5. New York, NY, USA: ACM.
 [Braunagel et al.2017] Braunagel, C.; Geisler, D.; Rosenstiel, W.; and Kasneci, E. 2017. Online recognition of driver activity based on visual scanpath classification. IEEE Intelligent Transportation Systems Magazine 9(4):23–36.
 [Castner et al.2018] Castner, N.; Kasneci, E.; Kübler, T.; Scheiter, K.; Richter, J.; Eder, T.; Hüttig, F.; and Keutel, C. 2018. Scanpath comparison in medical image reading skills of dental students: distinguishing stages of expertise development. In Proceedings of the 2018 ACM Symposium on Eye Tracking Research & Applications, 39. ACM.
 [Chen and Epps2014] Chen, S., and Epps, J. 2014. Using task-induced pupil diameter and blink rate to infer cognitive load. Human-Computer Interaction 29:390–413.
 [Eivazi et al.2017] Eivazi, S.; Hafez, A.; Fuhl, W.; Afkari, H.; Kasneci, E.; Lehecka, M.; and Bednarik, R. 2017. Optimal eye movement strategies: a comparison of neurosurgeons gaze patterns when using a surgical microscope. Acta neurochirurgica 159(6):959–966.
 [John, Koppal, and Jain2019] John, B.; Koppal, S.; and Jain, E. 2019. Eyeveil: Degrading iris authentication in eye tracking headsets. In Proceedings of the 11th ACM Symposium on Eye Tracking Research & Applications, ETRA ’19, 37:1–37:5. New York, NY, USA: ACM.
 [Kinnunen, Sedlak, and Bednarik2010] Kinnunen, T.; Sedlak, F.; and Bednarik, R. 2010. Towards task-independent person authentication using eye movement signals. In Proceedings of the 2010 Symposium on Eye-Tracking Research & Applications, ETRA ’10, 187–190. New York, NY, USA: ACM.
 [Komogortsev and Holland2013] Komogortsev, O. V., and Holland, C. D. 2013. Biometric authentication via complex oculomotor behavior. In 2013 IEEE Sixth International Conference on Biometrics: Theory, Applications and Systems (BTAS), 1–8.
 [Komogortsev et al.2010] Komogortsev, O. V.; Jayarathna, S.; Aragon, C. R.; and Mahmoud, M. 2010. Biometric identification via an oculomotor plant mathematical model. In Proceedings of the 2010 Symposium on Eye-Tracking Research & Applications, ETRA ’10, 57–60. New York, NY, USA: ACM.
 [Kübler et al.2014] Kübler, T. C.; Kasneci, E.; Rosenstiel, W.; Schiefer, U.; Nagel, K.; and Papageorgiou, E. 2014. Stress-indicators and exploratory gaze for the analysis of hazard perception in patients with visual field loss. Transportation Research Part F: Traffic Psychology and Behaviour 24:231–243.
 [Liebling and Preibusch2014] Liebling, D. J., and Preibusch, S. 2014. Privacy considerations for a pervasive eye tracking world. In Proceedings of the 2014 ACM International Joint Conference on Pervasive and Ubiquitous Computing: Adjunct Publication, UbiComp ’14 Adjunct, 1169–1177. New York, NY, USA: ACM.
 [Liu et al.2009] Liu, Y.; Hsueh, P.; Lai, J.; Sangin, M.; Nussli, M.; and Dillenbourg, P. 2009. Who is the expert? analyzing gaze data to predict expertise level in collaborative applications. In 2009 IEEE International Conference on Multimedia and Expo, 898–901.
 [Liu et al.2019] Liu, A.; Xia, L.; Duchowski, A.; Bailey, R.; Holmqvist, K.; and Jain, E. 2019. Differential privacy for eye-tracking data. In Proceedings of the 11th ACM Symposium on Eye Tracking Research & Applications, ETRA ’19, 28:1–28:10. New York, NY, USA: ACM.
 [Park et al.2018] Park, S.; Zhang, X.; Bulling, A.; and Hilliges, O. 2018. Learning to find eye region landmarks for remote gaze estimation in unconstrained settings. In Proceedings of the 2018 ACM Symposium on Eye Tracking Research & Applications, ETRA ’18, 21:1–21:10. New York, NY, USA: ACM.
 [Steil and Bulling2015] Steil, J., and Bulling, A. 2015. Discovery of everyday human activities from long-term visual behaviour using topic models. In Proceedings of the 2015 ACM International Joint Conference on Pervasive and Ubiquitous Computing, UbiComp ’15, 75–85. New York, NY, USA: ACM.
 [Steil et al.2019a] Steil, J.; Hagestedt, I.; Huang, M. X.; and Bulling, A. 2019a. Privacy-aware eye tracking using differential privacy. In Proceedings of the 11th ACM Symposium on Eye Tracking Research & Applications, ETRA ’19, 27:1–27:9. New York, NY, USA: ACM.
 [Steil et al.2019b] Steil, J.; Koelle, M.; Heuten, W.; Boll, S.; and Bulling, A. 2019b. PrivacEye: Privacy-preserving head-mounted eye tracking using egocentric scene image and eye movement features. In Proceedings of the 11th ACM Symposium on Eye Tracking Research & Applications, ETRA ’19, 26:1–26:10. New York, NY, USA: ACM.

 [Ünal, Akgün, and Pfeifer2019] Ünal, A. B.; Akgün, M.; and Pfeifer, N. 2019. A framework with randomized encoding for a fast privacy preserving calculation of nonlinear kernels for machine learning applications in precision medicine. In International Conference on Cryptology and Network Security, 493–511. Springer, Cham.
 [Wood et al.2016] Wood, E.; Baltrušaitis, T.; Morency, L.-P.; Robinson, P.; and Bulling, A. 2016. Learning an appearance-based gaze estimator from one million synthesised images. In Proceedings of the Ninth Biennial ACM Symposium on Eye Tracking Research & Applications, 131–138.
 [Zhang et al.2018] Zhang, Y.; Hu, W.; Xu, W.; Chou, C. T.; and Hu, J. 2018. Continuous authentication using eye movement response of implicit visual stimuli. Proc. ACM Interact. Mob. Wearable Ubiquitous Technol. 1(4):177:1–177:22.