VirtualIdentity: Privacy-Preserving User Profiling

08/30/2018
by   Sisi Wang, et al.
Ghent University
Aarhus Universitet

User profiling from user generated content (UGC) is a common practice that supports the business models of many social media companies. Existing systems require that the UGC is fully exposed to the module that constructs the user profiles. In this paper we show that it is possible to build user profiles without ever accessing the user's original data, and without exposing the trained machine learning models for user profiling -- which are the intellectual property of the company -- to the users of the social media site. We present VirtualIdentity, an application that uses secure multi-party cryptographic protocols to detect the age, gender and personality traits of users by classifying their user-generated text and personal pictures with trained support vector machine models in a privacy-preserving manner.

I Introduction

As more users are creating their own content on the web, there is growing interest in mining this data for use in personalized information access services, recommender systems, tailored advertisements, and other applications that can benefit from personalization [31]. In addition to myriad applications in e-commerce, there is growing interest in user profiling for digital text forensics [40]. Furthermore, the popularity of applications such as How-Old.net and HowHot.io shows that users are also directly interested in analysis of their own personal features [37, 36]. What is common across all of these existing personalized services is that the personal data of users, such as their pictures and text, is fully exposed to the user profiling service.

An obvious way to circumvent this would be to perform the user profiling entirely on the user’s side. However, this would imply sharing proprietary, trained machine learning models for user profiling with each user of the social media site. Applying traditional cryptography to encrypt the personal data of the user (henceforth called the client) before sending it to the user profiling service (the service, or server) is not a solution either, as data encrypted with usual techniques becomes useless, and user characteristics can no longer be derived from it. Hiding the client’s data from the service, while still allowing the client to use the service, requires novel cryptographic techniques that not only protect private information but also allow mathematical operations to be performed on encrypted data. To this end, the VirtualIdentity application that we present in this paper (see Figure 1) relies on secure multi-party computation, a process in which client and server jointly compute classification labels by exchanging encrypted messages, while keeping their own inputs private. As a result, VirtualIdentity allows a user to run our trained support vector machines (SVMs) for detection of age, gender, and personality traits, without leaking any personal text or profile picture to our server. In addition, the user does not learn anything about the coefficients of our SVM models.

Other services exist that will predict a user's age, gender, or personality based on UGC. For example, users can input their tweets or text and receive back scores of their personality, needs, and values [28]. Another site allows users to input a photo and receive an estimation of the gender and age of each face in the photo [37], while a third estimates the user's attractiveness and age from a photo [36]. However, none of these services attempt to keep the user's data private. To the best of our knowledge, VirtualIdentity is the first platform to construct user profiles while preserving both the privacy of the user's data and the prediction models.

Fig. 1: Screenshots of VirtualIdentity application.

This paper is an extended version of the work that appeared at the 2016 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining (ASONAM) [46]. This full version contains the description of the cryptographic tools that are used.

II Predictive Models

Much work has been done recently using machine learning classification to predict age, gender, and personality based on images and text “in the clear”, i.e. without any attempts for privacy preservation. In this paper we use SVMs, which are known as state-of-the-art classification techniques for detecting age, gender and personality traits from text and images [35, 24, 9, 25, 26].

II-A Age and Gender Classification

For age and gender classification we used the IMDB image dataset, which is part of the IMDB-WIKI dataset. This set consists of 460,723 face images crawled from the IMDb website, annotated with age and gender information [43]. From each image, we detected the face and cropped it with a margin using OpenCV [39], and then extracted 136 facial landmark features using Dlib [34]. These features, which include attributes such as the exact locations of the eyes, nose, and mouth, were then used to train the models. Some images were dropped because OpenCV or Dlib's facial landmark detector found no face, or more than one face, in them. In addition, images with an unreasonable age (e.g., a negative age) and images without gender information were removed as well. After preprocessing and feature extraction, 318,562 valid instances remain in the set. The set is divided into 4 similar-sized age groups: (7-26), (27-34), (35-43), (44-101). For age classification, each instance is classified into one of these age buckets. For the actual training, we used 6,000 images from the IMDB dataset, selected such that their age and gender distributions are representative of the full set.
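
To make the preprocessing step concrete, the following Python sketch extracts the 136 landmark coordinates from a single image. The detector choices, model file name, and detection parameters are our assumptions for illustration; the text only states that OpenCV was used for face detection and Dlib for landmark extraction.

```python
import cv2
import dlib
import numpy as np

# Assumed models: OpenCV's bundled Haar cascade and dlib's public 68-point
# landmark model; the paper does not specify which detector configurations it used.
face_cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")
landmark_model = dlib.shape_predictor("shape_predictor_68_face_landmarks.dat")

def landmark_features(image_path):
    """Return a 136-dimensional landmark vector, or None if not exactly one face."""
    img = cv2.imread(image_path)
    if img is None:
        return None
    gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
    faces = face_cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
    if len(faces) != 1:            # drop images with zero or multiple detected faces
        return None
    x, y, w, h = faces[0]
    rect = dlib.rectangle(int(x), int(y), int(x + w), int(y + h))
    shape = landmark_model(gray, rect)
    pts = np.array([(p.x, p.y) for p in shape.parts()], dtype=np.float64)
    return pts.flatten()           # 68 landmarks x (x, y) = 136 features
```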

We trained a binary SVM classifier for gender classification, and three binary SVMs for age classification. We use the results of all three age classifiers to determine the most likely age bracket. When a new instance comes in, it is first classified by Age SVM2 to determine whether the person is younger than 35. If the instance is classified into the younger-than-35 group, it is then scored against Age SVM1 to see if the person is younger than 27. Otherwise, Age SVM3 is applied to check if the person is older than 43. This approach is similar to the approach of Han et al. [26]. While they use additional models to then predict an actual age inside the bracket, we return the result determined by the three original SVMs.
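
A minimal sketch of this decision cascade, assuming three already-trained binary classifiers with a scikit-learn style predict() interface; the 0/1 label encoding below is our assumption, since the text does not fix one:

```python
def predict_age_bucket(features, age_svm1, age_svm2, age_svm3):
    """Combine three binary SVMs into one of the four age buckets.

    Assumed label conventions (illustrative only):
      age_svm2: 0 = younger than 35, 1 = 35 or older
      age_svm1: 0 = younger than 27, 1 = 27-34
      age_svm3: 0 = 35-43,           1 = older than 43
    """
    x = [features]
    if age_svm2.predict(x)[0] == 0:        # younger than 35
        return "7-26" if age_svm1.predict(x)[0] == 0 else "27-34"
    return "35-43" if age_svm3.predict(x)[0] == 0 else "44-101"
```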

Table 1 depicts the baseline values compared with the average accuracies of the gender SVM and three age SVMs using 10-fold cross-validation.

TABLE I: Age and Gender Classification SVMs Accuracy Table

II-B Personality Traits Detection

For personality we report scores for the traits of the widely accepted Big Five model: openness to experience, conscientiousness, extraversion, agreeableness, and neuroticism [14]. Our goal is to create classifiers that predict whether a user displays each of these five characteristics. The dataset we used for personality trait detection contains 2467 essays (one empty instance was removed from the original 2468) written by psychology students who were told to write whatever came to their mind for 20 minutes [35]. Each essay was analysed and given Big Five personality ground truth labels by Pennebaker et al. [42].

We extracted three kinds of features from the essays as input for the classifiers: 14 MRC features, 10 NRC features, and 19 LIWC features (43 features in total).

MRC is a psycholinguistic database which contains psychological and distributional information about words, such as the number of letters in the word, the concreteness, and the age of acquisition [12]. We used the same 14 MRC features as Farnadi et al. [25]. The features are: number of letters in the word (NLET), number of phonemes in the word (NPHON), number of syllables in the word (NSYL), Kucera and Francis written frequency (KF FREQ), Kucera and Francis number of categories (KF NCATS), Kucera and Francis number of samples (KF NSAMP), Thorndike-Lorge frequency (TL FREQ), Brown verbal frequency (BROWN FREQ), familiarity (FAM), concreteness (CONC), imagery (IMAG), mean Colorado meaningfulness (MEANC), mean Paivio meaningfulness (MEANP), and age of acquisition (AOA). Each feature is computed by averaging the feature value over all the words in the essay [25].

NRC is a lexicon that contains more than 14,000 distinct English words annotated with 8 emotions (anger, fear, anticipation, trust, surprise, sadness, joy, and disgust) and 2 sentiments (negative, positive) [38]. For each document we counted the number of words in each of the 8 emotion and 2 sentiment categories, resulting in 10 features per document.

The Linguistic Inquiry and Word Count tool (LIWC) is a well-known text analysis software package that is widely used in psychology studies [44]. Some of the LIWC features rely on a proprietary dictionary. Our SVM models are trained on 19 LIWC features that correspond to standard counts and do not require the specific LIWC dictionary: word count, words per sentence, number of unique words, number of words longer than six letters, number of abbreviations, emoticons, question marks, periods, commas, colons, semi-colons, exclamation marks, dashes, quotation marks, apostrophes, parentheses, other punctuation marks, all punctuation marks, and number of interrogative sentences.
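
The NRC counts and the dictionary-free LIWC-style counts reduce to simple counting over the essay text. The sketch below assumes the NRC lexicon has already been loaded into a word-to-categories dictionary (loading code omitted); the tokenization and the subset of surface counts shown are illustrative, not necessarily identical to what was used to train the paper's models.

```python
import re

NRC_CATEGORIES = ["anger", "fear", "anticipation", "trust", "surprise",
                  "sadness", "joy", "disgust", "negative", "positive"]

def nrc_features(tokens, nrc_lexicon):
    """nrc_lexicon maps a word to the set of NRC categories it belongs to."""
    counts = dict.fromkeys(NRC_CATEGORIES, 0)
    for tok in tokens:
        for cat in nrc_lexicon.get(tok, ()):
            counts[cat] += 1
    return [counts[c] for c in NRC_CATEGORIES]          # 10 features

def surface_features(text):
    """A few of the dictionary-free LIWC-style counts listed above."""
    tokens = re.findall(r"[A-Za-z']+", text.lower())
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    return [
        len(tokens),                                     # word count
        len(tokens) / max(len(sentences), 1),            # words per sentence
        len(set(tokens)),                                # number of unique words
        sum(len(t) > 6 for t in tokens),                 # words longer than six letters
        text.count("?"), text.count("!"), text.count(","),
    ]
```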

Since more than one trait can be present in the same user, we used the 43 features to train one binary SVM classifier for each of the five traits, which separates the users displaying the characteristic from those who do not. Table 2 depicts the baseline values compared with the average accuracies of our 5 SVMs using 10-fold cross-validation.

TABLE II: Personality Classification SVMs Accuracy Table

All SVMs in our model bank were trained using scikit-learn in Python, with a linear kernel and a fixed penalty parameter $C$.
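
A minimal training sketch with scikit-learn. The feature matrix, labels, and the penalty value C=1.0 are placeholders for illustration; the paper does not report the value it used.

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import cross_val_score

def train_trait_svm(X, y, C=1.0):
    """Train one binary linear SVM for a single personality trait.

    X: (n_essays, 43) matrix of MRC + NRC + LIWC features, y: 0/1 labels.
    """
    clf = SVC(kernel="linear", C=C)
    cv_accuracy = cross_val_score(clf, X, y, cv=10).mean()   # 10-fold CV, as in the paper
    clf.fit(X, y)
    # For the privacy-preserving protocol the server only needs the weights and bias.
    w, b = clf.coef_.ravel(), clf.intercept_[0]
    return w, b, cv_accuracy

# Toy call with random data, purely to show the shapes involved.
rng = np.random.default_rng(0)
w, b, acc = train_trait_svm(rng.normal(size=(100, 43)), rng.integers(0, 2, size=100))
```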

The trained SVMs are part of a private machine learning model bank that resides on the server, as shown on the right side in Figure 2. When a user requests analysis of a snippet of text and a picture, the features described above are extracted from the text and the image on the client side, as shown on the left side in Figure 2. Neither the user’s text, nor the user’s image, nor any of the extracted features are leaked to the server. Instead, both the client and the server engage in cryptographic protocols and exchange encrypted messages that ultimately allow the server to classify the feature vectors of the client, without ever seeing them in the clear, as we explain in Section III.

Fig. 2: System overview of the VirtualIdentity application.

III Adding Privacy to our Classifiers

Only a limited amount of work has been done in cryptographically secure privacy-preserving machine learning classification and none of it is aimed specifically at user profiling.

Cryptographically secure privacy-preserving SVM classification protocols have been proposed in [33, 5, 16, 10]. The basic idea behind these protocols is to decompose the task of scoring an SVM into smaller tasks and to implement each one of them in a privacy-preserving way. To better understand these previous approaches we recall that, for the case of two classes, the process for SVM classification in the clear is as follows [13]: the client holds an $n$-dimensional input feature vector $x$, and the server holds a trained model that consists of an $n$-dimensional vector of weights $w$ and a real number $b$ learned from the training data. The result of the classification is obtained by computing

$$\operatorname{sign}(\langle w, x \rangle + b),$$

where the function $\operatorname{sign}(\cdot)$ gives $+1$ if its argument is nonnegative and $-1$ otherwise. For instance, in the case of personality prediction, $x$ is a 43-dimensional vector with features extracted from the client's text, and $(w, b)$ are the weights and the bias that make up the trained SVM model for, e.g., "neuroticism". A classification outcome of $+1$ means that the user is neurotic, and an outcome of $-1$ means that they are not. Therefore, to score SVMs privately, one needs to build privacy-preserving protocols for two tasks: computing inner products and performing comparisons.
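
For reference, scoring in the clear is a single inner product followed by a sign test; the numbers below are made up purely to illustrate the computation.

```python
import numpy as np

def svm_label(w, b, x):
    """Plaintext SVM scoring: sign(<w, x> + b), mapped to +1 / -1."""
    return 1 if np.dot(w, x) + b >= 0 else -1

w = np.array([0.5, -1.0, 2.0])
b = -0.25
x = np.array([1.0, 0.0, 0.2])
print(svm_label(w, b, x))   # 0.5 + 0.4 - 0.25 = 0.65 >= 0, so the label is +1
```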

In [33], private inner products and comparisons are obtained by using additive homomorphic encryption and oblivious transfer, while in [5] the proposed protocols are based on Paillier encryption, a specific kind of additively homomorphic encryption scheme. These operations are usually expensive from a computational point of view, demanding costly modular exponentiations. In [16, 10], highly efficient protocols for privacy-preserving comparison and argmax in the commodity-based model [4] were proposed. In the commodity-based model, correlated data is pre-distributed to the parties by a trusted initializer during an offline setup phase. Here we use the comparison protocol of [10] combined with a generalization of Beaver's multiplication protocol [2] to matrices, in order to obtain the first implementation of a practical system for privacy-preserving user profiling.

We have already mentioned how we perform the private classification of personality traits. Now, we briefly describe how we obtain age and gender predictions. For age prediction, we first split the ages into 4 classes, such that the frequency of each class is roughly equal. Because there are 4 classes, there are 3 splitting points. Our target functionality first determines whether the instance is in the lower or the upper half by using the binary SVM at the middle splitting point, and then uses one more binary SVM in the relevant half to determine the exact class. In the protocol, instead of running the SVMs sequentially, we run all three independent SVMs in parallel up to the point right before the opening of the results. We then open the result of the middle SVM, and after that we open the result of the SVM that is relevant for the chosen half in order to determine the class. A separate SVM is evaluated in a privacy-preserving way to determine the gender of the user. It should be noted that the techniques used here for implementing privacy-preserving inner product and comparison protocols only work over integer values. To account for this, real values must be converted into integers, losing some of the precision allowed by floating-point notation.

In this work we consider honest-but-curious adversaries (i.e., adversaries that follow the protocol instructions but try to learn additional information), as done in the other works on privacy-preserving classification.

III-A Computing with Secret Sharing

We use the paradigm of secure computation based on secret sharings. For each shared value, Alice and Bob hold uniformly random values (i.e., the shares) constrained to the condition that they sum up to the actual value that is shared. The computation is then done over the shares, and when it is finished Alice and Bob exchange their shares of the output in order to recover it. In more detail, if the shares live in a ring $\mathbb{Z}_q$ and the shared value is $x$, Alice and Bob hold uniformly random $x_A$ and $x_B$ in $\mathbb{Z}_q$, respectively, subject to the constraint that $x_A + x_B = x \bmod q$. Our notation for secret sharings will follow the one used in [10]. Let $[\![x]\!]_q$ denote the secret sharing of $x$ over $\mathbb{Z}_q$. Given secret sharings $[\![x]\!]_q$, $[\![y]\!]_q$, performing additions and subtractions of the shared values, and adding or multiplying by a public constant, are very simple operations that can be performed locally by Alice and Bob by just performing the respective operations on their local shares. These local operations will be denoted by $[\![x]\!]_q + [\![y]\!]_q$, $[\![x]\!]_q - [\![y]\!]_q$, $c + [\![x]\!]_q$ and $c[\![x]\!]_q$. The notation is extended straightforwardly to element-wise secret sharings of vectors and matrices, and similarly for the operations.
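
A minimal sketch of additive secret sharing and the local operations just described, over $\mathbb{Z}_{2^{64}}$; the ring size is our choice for illustration.

```python
import secrets

Q = 2 ** 64   # the ring Z_q, with q chosen only for this illustration

def share(x):
    """Split x into two additive shares, each individually uniform in Z_q."""
    a = secrets.randbelow(Q)
    return a, (x - a) % Q

def reconstruct(sh):
    a, b = sh
    return (a + b) % Q

def add(sh1, sh2):
    """Local addition of shared values: each party adds its own shares."""
    return (sh1[0] + sh2[0]) % Q, (sh1[1] + sh2[1]) % Q

def mul_const(c, sh):
    """Local multiplication of a shared value by a public constant c."""
    return (c * sh[0]) % Q, (c * sh[1]) % Q

sx, sy = share(1234), share(5678)
assert reconstruct(add(sx, sy)) == 1234 + 5678
assert reconstruct(mul_const(3, sx)) == 3 * 1234
```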

III-B Commodity-based Cryptography

Our solution works in the commodity-based model [4, 3], a setup assumption in which a trusted initializer pre-distributes correlated randomness to Alice and Bob during an initial setup phase. This pre-distributed data is independent of the protocol inputs, which can even be fixed much later. We should also emphasize that the trusted initializer does not participate anymore after delivering the pre-distributed correlated randomness to Alice and Bob. The commodity-based model allows the design of practical, unconditionally secure protocols for many interesting functionalities, for example: inner product [19], linear algebra [15], oblivious transfer [4], oblivious polynomial evaluation [45], verifiable secret sharing [23], set intersection [29] and string equality [29]. Given its usefulness, it has also already been used for obtaining privacy-preserving machine learning protocols [16, 11, 10]. In this work the trusted initializer is modeled by an ideal functionality $\mathcal{F}_{TI}^{\mathcal{D}}$, which is parametrized by an algorithm $\mathcal{D}$ that samples the correlated data to be pre-distributed to Alice and Bob. See Figure 3 for details.

Functionality $\mathcal{F}_{TI}^{\mathcal{D}}$

$\mathcal{F}_{TI}^{\mathcal{D}}$ runs with Alice and Bob and is parametrized by an algorithm $\mathcal{D}$. Upon initialization, run $(D_A, D_B) \leftarrow \mathcal{D}$ and deliver $D_A$ to Alice and $D_B$ to Bob.
Fig. 3: The Trusted Initializer functionality.

Another advantage of the commodity-based model is that it is one of the setup assumptions that allow obtaining UC-security [6], which is the notion of security that allows the modular design of protocols while keeping the security guarantees (as done in our privacy-preserving protocols). It is impossible to obtain non-trivial UC-secure two-party computation without setup assumptions [7, 8]. Some other setup assumptions are also known to allow non-trivial two-party computation, such as the existence of a common reference string [7, 8, 41], of noisy channels [21, 20], of a public-key infrastructure [1], of signature cards [27], or of tamper-proof hardware [30, 17, 22].

III-C Secure Distributed Matrix Multiplication

While Section III-A described many operations that can be performed locally by Alice and Bob, the most important operation that is missing, and that requires interaction between them, is the multiplication of shared values. This can be an expensive operation in general, but in the commodity-based model there is a very elegant and simple solution by Beaver [2]. As we will also need a secure (distributed) inner product as a building block, we describe here a generalization of Beaver's idea that performs secure distributed matrix multiplication (and so covers both cases of interest), following the description used in [10]. Alice and Bob hold secret sharings $[\![X]\!]_q$ and $[\![Y]\!]_q$ of matrices $X \in \mathbb{Z}_q^{i \times j}$ and $Y \in \mathbb{Z}_q^{j \times k}$, and they want to obtain a secret sharing corresponding to the matrices' product. The approach is to have the trusted initializer pre-distribute a random matrix multiplication triple to Alice and Bob, i.e., secret sharings $[\![U]\!]_q$, $[\![V]\!]_q$ and $[\![W]\!]_q$ with $U$ and $V$ uniformly random in $\mathbb{Z}_q^{i \times j}$ and $\mathbb{Z}_q^{j \times k}$, respectively, and $W = UV$. This matrix multiplication triple is then easily derandomized by Alice and Bob in order to match the actual inputs. The protocol is described in Figure 4. The protocol correctness can be easily checked using the fact that $XY = (D + U)(E + V)$ for $D = X - U$ and $E = Y - V$. The protocol security essentially comes from the fact that in the revealed values, $D = X - U$ and $E = Y - V$, the inputs $X$ and $Y$ are masked by completely random one-time pads $U$ and $V$, respectively (and the one-time pads are only used once). A more detailed security proof can be found in [10, 18]:

Lemma III.1

The protocol in Figure 4 UC-realizes the distributed matrix multiplication functionality against honest-but-curious adversaries in the commodity-based model.

Secure Matrix Multiplication Protocol

The protocol is parametrized by the size $q$ of the ring $\mathbb{Z}_q$ and the dimensions $i$, $j$ and $k$ of the matrices, and runs with Alice and Bob. The trusted initializer chooses uniformly random $U$ and $V$ in $\mathbb{Z}_q^{i \times j}$ and $\mathbb{Z}_q^{j \times k}$, respectively, computes $W = UV$ and pre-distributes the secret sharings $[\![U]\!]_q$, $[\![V]\!]_q$, $[\![W]\!]_q$ to Alice and Bob. Alice and Bob have inputs $[\![X]\!]_q$ and $[\![Y]\!]_q$, and interact as follows:
  1. Locally compute $[\![D]\!]_q \leftarrow [\![X]\!]_q - [\![U]\!]_q$ and $[\![E]\!]_q \leftarrow [\![Y]\!]_q - [\![V]\!]_q$, then open $D$ and $E$.

  2. Locally compute $[\![XY]\!]_q \leftarrow [\![W]\!]_q + [\![U]\!]_q E + D [\![V]\!]_q + DE$.

Fig. 4: The protocol for secure distributed matrix multiplication [10].

Notation: We denote by $\pi_{DM}$ the protocol for the special case of the multiplication of single elements, and by $\pi_{IP}$ the special case of inner product computation.
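
The following self-contained Python sketch simulates the protocol of Figure 4 with both parties and the trusted initializer inside one process; the ring size is chosen only so the int64 arithmetic cannot overflow. In the real system these roles are split between the client, the server, and the offline setup phase.

```python
import numpy as np

rng = np.random.default_rng(0)
Q = 2 ** 16   # deliberately small ring so that the int64 products below cannot overflow

def share(M):
    A = rng.integers(0, Q, size=M.shape, dtype=np.int64)
    return A, (M - A) % Q

def triple(n, m, k):
    """Trusted initializer: random U (n x m), V (m x k) and W = U V, all secret shared."""
    U = rng.integers(0, Q, size=(n, m), dtype=np.int64)
    V = rng.integers(0, Q, size=(m, k), dtype=np.int64)
    return share(U), share(V), share((U @ V) % Q)

def secure_matmul(shX, shY, tpl):
    (Ua, Ub), (Va, Vb), (Wa, Wb) = tpl
    (Xa, Xb), (Ya, Yb) = shX, shY
    # Each party masks its shares of X and Y; D and E are then opened.
    D = (Xa - Ua + Xb - Ub) % Q
    E = (Ya - Va + Yb - Vb) % Q
    # Local share computation of [XY] = [W] + [U]E + D[V] + DE
    # (the public DE term is added by one party only).
    Za = (Wa + Ua @ E + D @ Va + D @ E) % Q
    Zb = (Wb + Ub @ E + D @ Vb) % Q
    return Za, Zb

X = rng.integers(0, Q, size=(2, 3), dtype=np.int64)
Y = rng.integers(0, Q, size=(3, 2), dtype=np.int64)
Za, Zb = secure_matmul(share(X), share(Y), triple(2, 3, 2))
assert np.array_equal((Za + Zb) % Q, (X @ Y) % Q)
```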

III-D Secure Distributed Comparison

We also use as a building block the secure distributed comparison protocol of [10]. Alice and Bob want to compare two $\ell$-bit integers, $x$ and $y$. Alice and Bob have as input secret sharings $[\![x_i]\!]_2$ and $[\![y_i]\!]_2$ of each bit of $x$ and $y$. The output of the distributed comparison is $1$ if $x \geq y$ and $0$ if $x < y$. No additional information should be leaked to Alice or Bob. The (basic) comparison protocol is described in Figure 5. The security of the protocol was proved in [10]. The intuition is that the only non-local operations are the multiplications, so the security of the distributed comparison protocol follows from the security of the distributed multiplication protocol.

Secure Distributed Comparison Protocol

Let $\ell$ be the bit length of the integers to be compared. The trusted initializer pre-distributes the correlated randomness necessary for the execution of all the instances of the distributed multiplication protocol $\pi_{DM}$. Alice and Bob have as inputs shares $[\![x_i]\!]_2$ and $[\![y_i]\!]_2$ of each bit of $x$ and $y$. The protocol proceeds as follows:
  1. For $i = 1, \ldots, \ell$, compute in parallel $[\![d_i]\!]_2 \leftarrow [\![y_i]\!]_2 [\![x_i]\!]_2 \oplus [\![y_i]\!]_2$ using the multiplication protocol $\pi_{DM}$, and locally compute $[\![e_i]\!]_2 \leftarrow [\![x_i]\!]_2 \oplus [\![y_i]\!]_2 \oplus 1$.

  2. For $i = 1, \ldots, \ell$, compute $[\![c_i]\!]_2 \leftarrow [\![d_i]\!]_2 \prod_{j=i+1}^{\ell} [\![e_j]\!]_2$ using the multiplication protocol $\pi_{DM}$.

  3. Locally compute $[\![z]\!]_2 \leftarrow 1 \oplus \bigoplus_{i=1}^{\ell} [\![c_i]\!]_2$, which is a secret sharing of the output.

Fig. 5: The protocol for secure distributed comparison [10].
Lemma III.2 ([10])

The distributed comparison protocol UC-realizes the distributed comparison functionality against honest-but-curious adversaries in the commodity-based model.

We use the optimized version of the comparison protocol (described in [10]), which reduces the number of communication rounds.
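
The sketch below simulates a functionally equivalent comparison on secret-shared bits, with the Beaver triples for the bit multiplications generated inline rather than pre-distributed. The real protocol of [10] batches its multiplications into few parallel rounds, which this sequential sketch does not attempt to reproduce.

```python
import secrets

def share_bit(b):
    """Additive (XOR) secret sharing of a single bit."""
    r = secrets.randbelow(2)
    return r, b ^ r

def xor_shares(s, t):
    return s[0] ^ t[0], s[1] ^ t[1]

def mul_bits(sx, sy):
    """Beaver multiplication (AND) of two shared bits over Z_2.

    The triple is generated inline here; in the real protocol it is
    pre-distributed by the trusted initializer during the offline phase.
    """
    (xa, xb), (ya, yb) = sx, sy
    u, v = secrets.randbelow(2), secrets.randbelow(2)
    (ua, ub), (va, vb), (wa, wb) = share_bit(u), share_bit(v), share_bit(u & v)
    d = xa ^ ua ^ xb ^ ub        # opened masked value x XOR u
    e = ya ^ va ^ yb ^ vb        # opened masked value y XOR v
    za = wa ^ (d & va) ^ (ua & e) ^ (d & e)
    zb = wb ^ (d & vb) ^ (ub & e)
    return za, zb

def secure_ge(x_bits, y_bits):
    """Shares of [x >= y], given bitwise shares (least significant bit first)."""
    res = (1, 0)       # shared constant 1; the output equals 1 XOR [y > x]
    prefix = (1, 0)    # shared running product of "bits equal so far" (from the MSB down)
    for i in reversed(range(len(x_bits))):
        not_x = (x_bits[i][0] ^ 1, x_bits[i][1])                  # local complement
        d = mul_bits(y_bits[i], not_x)                            # y_i AND NOT x_i
        res = xor_shares(res, mul_bits(d, prefix))                # ... AND higher bits equal
        eq = (x_bits[i][0] ^ y_bits[i][0] ^ 1, x_bits[i][1] ^ y_bits[i][1])
        prefix = mul_bits(prefix, eq)
    return res

def share_int(x, length):
    return [share_bit((x >> i) & 1) for i in range(length)]

for x, y in [(5, 3), (3, 5), (7, 7), (0, 15)]:
    a, b = secure_ge(share_int(x, 4), share_int(y, 4))
    assert a ^ b == int(x >= y)
```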

III-E Secure Bit-Decomposition

For obtaining the privacy-preserving SVMs, the secure inner product and the secure distributed comparison protocols need to be integrated; however, the inner product is used with inputs over a large ring, while the comparison protocol works on the binary field. Therefore it is necessary to have a protocol for converting secret sharings over the large ring into bitwise secret sharings over the binary field (for $q = 2^\ell$). This work uses the same specialized bit-decomposition protocol as in [10] (which is similar to the one of Laud and Randmets [32]). It works for $q = 2^\ell$ and the main idea is to use a carry computation to obtain the bitwise secret sharings $[\![x_i]\!]_2$ starting from the shares of $[\![x]\!]_{2^\ell}$ that Alice and Bob have. The (basic version of the) bit-decomposition protocol is presented in Figure 6. Its security follows intuitively from the fact that the only non-local operations are the distributed multiplications, and these are performed using a UC-secure protocol.

Secure Bit-Decomposition Protocol

Let $\ell$ be the bit length of the value to be re-shared. All distributed multiplications using protocol $\pi_{DM}$ are over $\mathbb{Z}_2$ and the required correlated randomness is pre-distributed by the trusted initializer. Alice and Bob have as input $[\![x]\!]_{2^\ell}$ and proceed as follows:

  1. Let $a$ denote Alice's share of $x$, which corresponds to the bit string $a_\ell \cdots a_1$. Similarly, let $b$ denote Bob's share of $x$, which corresponds to the bit string $b_\ell \cdots b_1$. Define the secret sharings $[\![y_i]\!]_2$ as the pair of shares $(a_i, b_i)$ for $i = 1, \ldots, \ell$, $[\![a_i]\!]_2$ as $(a_i, 0)$ and $[\![b_i]\!]_2$ as $(0, b_i)$.

  2. Compute the carry $[\![c_2]\!]_2 \leftarrow [\![a_1]\!]_2 [\![b_1]\!]_2$ using $\pi_{DM}$ and locally set $[\![x_1]\!]_2 \leftarrow [\![y_1]\!]_2$.

  3. For $i = 2, \ldots, \ell$:

    1. Locally compute $[\![x_i]\!]_2 \leftarrow [\![y_i]\!]_2 \oplus [\![c_i]\!]_2$ and, if $i < \ell$, compute the next carry $[\![c_{i+1}]\!]_2 \leftarrow [\![a_i]\!]_2 [\![b_i]\!]_2 \oplus [\![c_i]\!]_2 [\![y_i]\!]_2$ using $\pi_{DM}$.

  4. Output $[\![x_i]\!]_2$ for $i = 1, \ldots, \ell$.

Fig. 6: The secure bit-decomposition protocol [10].
Lemma III.3 ([10])

Over any ring $\mathbb{Z}_{2^\ell}$, the bit-decomposition protocol UC-realizes the bit-decomposition functionality in the commodity-based model.

Optimization: This work uses the round-optimized version of the bit-decomposition protocol, which reduces the number of rounds to logarithmic in $\ell$ at the cost of additional instances of the multiplication protocol over $\mathbb{Z}_2$. Details are available in [10].

III-F Privacy-Preserving SVMs

As mentioned before, for an SVM, given the feature vector $x$ and the trained model that consists of a vector of weights $w$ and a real number $b$, the result of the classification is given by $\operatorname{sign}(\langle w, x \rangle + b)$.

The idea for obtaining a privacy-preserving SVM protocol is as follows: (1) Alice inputs her feature vector $x$ and Bob inputs the weight vector $w$ to the secure inner product protocol $\pi_{IP}$; (2) the resulting secret sharing of $\langle w, x \rangle$ is then processed by the bit-decomposition protocol to obtain bitwise secret sharings in $\mathbb{Z}_2$; (3) the bitwise secret sharings are then used in the comparison protocol to check whether the inner product is at least $-b$ or not. The final result is then revealed to Alice.

The security of this protocol follows straightforwardly from the fact that the building blocks are UC-secure (i.e., they can be arbitrarily composed without the security being compromised) and the fact that no values are ever opened before the final result. In other words, other than the output, each party only sees shares which appear completely random to them.

IV System Overview

The overall architecture of our demo is shown in Figure 2. The framework consists of a client Java application, a server, and the cryptographic protocols embedded in client and server. Next, we describe these modules.

IV-A Client Application

The user interface of our client application, shown in Figure 1, is developed with JavaFX. The client application consists of a feature extractor and its portion of the cryptographic protocols. It allows users to upload user generated content (i.e., to input written text and to upload a personal picture). It extracts features from the UGC, executes the cryptographic protocols with the server, and interprets and displays the final prediction results from the machine learning models. The interpretation of the personality results follows Personality Insights [28].

IV-B Server

The server contains its respective portion of the cryptographic protocols and the private machine learning model bank. The model bank contains the SVMs which are used for predicting personality traits, age and gender.

IV-C Cryptographic Protocols

The cryptographic protocols (privacy-preserving protocols for computing inner products and comparisons) are executed on both the client and the server side. The trusted initializer pre-distributes correlated data to the client and the server during an offline phase, as specified in the commodity-based model [16, 4]. The communication between client and server is implemented using sockets. The whole VirtualIdentity application is programmed in Java under JDK 1.8.

IV-D System Performance

We have implemented VirtualIdentity on a server of the University of Washington. The client was hosted on a local computer in the city of Tacoma, outside of the university's network. On average, the running time (including computation and communication delays) was about 5 seconds for the solution in the clear. The privacy-preserving solution ran in about 14 to 16 seconds, i.e., roughly 3 times slower than the solution in the clear.

We believe our solution is practical, particularly in the trusted initializer model, where correlated randomness is distributed to the parties in a setup phase.

V Conclusion

Many data-driven personalized services require that private data of users – such as user generated content, personal preferences, browsing behavior, or medical lab results – is scored with proprietary, trained machine learning models. The current widespread practice expects users to give up their privacy by sending their data in the clear to the server where the machine learning models reside. In this paper we have demonstrated that the use of secure multi-party computation techniques allows the construction of user profiles from user generated content while preserving both the privacy of the user’s data and the prediction models. The overall architecture of the VirtualIdentity application is generic and can be extended to other applications; this would involve extraction of different features and training new models for the private machine learning model bank.

References

  • [1] B. Barak, R. Canetti, J. B. Nielsen, R. Pass. Universally composable protocols with relaxed set-up assumptions. In 45th Annual Symposium on Foundations of Computer Science, pages 186–195, Rome, Italy, October 17–19, 2004. IEEE Computer Society Press.
  • [2] D. Beaver. Efficient multiparty protocols using circuit randomization. In Joan Feigenbaum, editor, Advances in Cryptology – CRYPTO’91, volume 576 of Lecture Notes in Computer Science, pages 420–432, Santa Barbara, CA, USA, August 11–15, 1991. Springer, Heidelberg, Germany.
  • [3] D. Beaver. Precomputing oblivious transfer. In D. Coppersmith, editor, Advances in Cryptology – CRYPTO’95, volume 963 of Lecture Notes in Computer Science, pages 97–109, Santa Barbara, CA, USA, Aug. 27–31, 1995. Springer, Heidelberg, Germany.
  • [4] D. Beaver. Commodity-based cryptography (extended abstract). In 29th Annual ACM Symposium on Theory of Computing, pages 446–455, El Paso, Texas, USA, May 4–6, 1997. ACM Press.
  • [5] R. Bost, R. A. Popa, S. Tu, S. Goldwasser. Machine learning classification over encrypted data. In ISOC Network and Distributed System Security Symposium – NDSS 2015, San Diego, California, USA, Feb. 8–11, 2015. The Internet Society.
  • [6] R. Canetti. Universally composable security: A new paradigm for cryptographic protocols. In 42nd Annual Symposium on Foundations of Computer Science, pages 136–145, Las Vegas, Nevada, USA, October 14–17, 2001. IEEE Computer Society Press.
  • [7] R. Canetti, M. Fischlin. Universally composable commitments. In Joe Kilian, editor, Advances in Cryptology – CRYPTO 2001, volume 2139 of Lecture Notes in Computer Science, pages 19–40, Santa Barbara, CA, USA, August 19–23, 2001. Springer, Heidelberg, Germany.
  • [8] R. Canetti, Y. Lindell, R. Ostrovsky, A. Sahai. Universally composable two-party and multi-party secure computation. In 34th Annual ACM Symposium on Theory of Computing, pages 494–503, Montréal, Québec, Canada, May 19–21, 2002. ACM Press.
  • [9] F. Celli, E. Bruni, B. Lepri, “Automatic Personality and Interaction Style Recognition from Facebook Profile Pictures,” Proc. of ACMMM2014.
  • [10] M. De Cock, R. Dowsley, C. Horst, R. Katti, A. C. A. Nascimento, S. C. Newman, W. Poon. Efficient and private scoring of decision trees, support vector machines and logistic regression models based on pre-computation. IEEE Transactions on Dependable and Secure Computing, 2017.
  • [11] M. De Cock, R. Dowsley, A. C. A. Nascimento, S. C. Newman. Fast, privacy-preserving linear regression over distributed datasets based on pre-distributed data. In Proceedings of the 8th ACM Workshop on Artificial Intelligence and Security, AISec ’15, pages 3–14, New York, NY, USA, 2015. ACM.
  • [12] M. Coltheart, “The MRC Psycholinguistic Database”, Quarterly Journal of Experimental Psychology, Vol. 33A, p. 497–505, 1981.
  • [13] C. Cortes, V. Vapnik, “Support-vector networks”, Machine Learning, Vol. 20(3), p. 273-297, 1995.
  • [14] P. T. Costa, R. R. McCrae, “The Revised NEO Personality Inventory (neo-pi-r),” in The SAGE Handbook of Personality Theory and Assessment, Thousand Oaks, CA, SAGE Publications Inc, 2008, p. 179-198.
  • [15] B. David, R. Dowsley, J. van de Graaf, D. Marques, A. C. A. Nascimento, A. C. B. Pinto. Unconditionally secure, universally composable privacy-preserving linear algebra. Information Forensics and Security, IEEE Transactions on, 11(1):59–73, Jan 2016.
  • [16] B. M. David, R. Dowsley, R. Katti, A. C. A. Nascimento. Efficient unconditionally secure comparison and privacy-preserving machine learning classification protocols. In M. H. Au and A. Miyaji, editors, ProvSec 2015: 9th International Conference on Provable Security, volume 9451 of Lecture Notes in Computer Science, pages 354–367, Kanazawa, Japan, Nov. 24–26, 2015. Springer, Heidelberg, Germany.
  • [17] N. Döttling, D. Kraschewski, J. Müller-Quade. Unconditional and composable security using a single stateful tamper-proof hardware token. In Yuval Ishai, editor, TCC 2011: 8th Theory of Cryptography Conference, volume 6597 of Lecture Notes in Computer Science, pages 164–181, Providence, RI, USA, March 28–30, 2011. Springer, Heidelberg, Germany.
  • [18] R. Dowsley. Cryptography Based on Correlated Data: Foundations and Practice. Doctoral Thesis, Karlsruhe Institute of Technology, 2016.
  • [19] R. Dowsley, J. van de Graaf, D. Marques, A. C. A. Nascimento. A two-party protocol with trusted initializer for computing the inner product. In Y. Chung and M. Yung, editors, WISA 10: 11th International Workshop on Information Security Applications, volume 6513 of Lecture Notes in Computer Science, pages 337–350, Jeju Island, Korea, Aug. 24–26, 2010. Springer, Heidelberg, Germany.
  • [20] R. Dowsley, J. van de Graaf, J. Müller-Quade, A. C. A. Nascimento. On the composability of statistically secure bit commitments. Journal of Internet Technology, 14(3):509–516, 2013.
  • [21] R. Dowsley, J. Müller-Quade, A. C. A. Nascimento. On the possibility of universally composable commitments based on noisy channels. In André Luiz Moura dos Santos and Marinho Pilla Barcellos, editors, Anais do VIII Simpósio Brasileiro em Segurança da Informação e de Sistemas Computacionais, SBSEG 2008, pages 103–114, Gramado, Brazil, September 1–5, 2008. Sociedade Brasileira de Computação (SBC).
  • [22] R. Dowsley, J. Müller-Quade, T. Nilges. Weakening the isolation assumption of tamper-proof hardware tokens. In Anja Lehmann and Stefan Wolf, editors, ICITS 15: 8th International Conference on Information Theoretic Security, volume 9063 of Lecture Notes in Computer Science, pages 197–213, Lugano, Switzerland, May 2–5, 2015. Springer, Heidelberg, Germany.
  • [23] R. Dowsley, J. Müller-Quade, A. Otsuka, G. Hanaoka, H. Imai, A. C. A. Nascimento. Universally composable and statistically secure verifiable secret sharing scheme based on pre-distributed data. IEICE Transactions, 94-A(2):725–734, 2011.
  • [24] E. Eidinger, R. Enbar, T. Hassner, “Age and Gender Estimation of Unfiltered Faces,” IEEE Trans. Inf. Forensic Secur., Vol. 9(12), p. 2170-2179, 2014.
  • [25] G. Farnadi, G. Sitaraman, S. Sushmita, F. Celli, M. Kosinski, D. Stillwell, S. Davalos, M.F. Moens, M. De Cock, “Computational Personality Recognition in Social Media”, User Model. User-Adapt. Interact., 2016.
  • [26] H. Han, C. Otto, A.K. Jain, “Age Estimation from Face Images: Human vs. Machine Performance”, in Proc. 6th IAPR International Conference on Biometrics (ICB), 2013.
  • [27] D. Hofheinz, J. Müller-Quade, D. Unruh. Universally composable zero-knowledge arguments and commitments from signature cards. In In Proceedings of the 5th Central European Conference on Cryptology MoraviaCrypt 2005, 2005.
  • [28] IBM Watson Developer Cloud, “Personality Insights,” https://personality-insights-livedemo.mybluemix.net/ [Accessed 29 April 2016].
  • [29] Y. Ishai, E. Kushilevitz, S. Meldgaard, C. Orlandi, A. Paskin-Cherniavsky. On the power of correlated randomness in secure computation. In A. Sahai, editor, TCC 2013: 10th Theory of Cryptography Conference, volume 7785 of Lecture Notes in Computer Science, pages 600–620, Tokyo, Japan, Mar. 3–6, 2013. Springer, Heidelberg, Germany.
  • [30] J. Katz. Universally composable multi-party computation using tamper-proof hardware. In Moni Naor, editor, Advances in Cryptology – EUROCRYPT 2007, volume 4515 of Lecture Notes in Computer Science, pages 115–128, Barcelona, Spain, May 20–24, 2007. Springer, Heidelberg, Germany.
  • [31] M. Kosinski, Y. Bachrach, P. Kohli, D. Stillwell, T. Graepel, “Manifestations of User Personality in Website Choice and Behaviour on Online Social Networks,” Mach. Learn., Vol. 95(3), p. 357-380, 2013.
  • [32] P. Laud, J. Randmets. A domain-specific language for low-level secure multiparty computation protocols. In I. Ray, N. Li, and C. Kruegel:, editors, ACM CCS 15: 22nd Conference on Computer and Communications Security, pages 1492–1503, Denver, CO, USA, Oct. 12–16, 2015. ACM Press.
  • [33] S. Laur, H. Lipmaa, T. Mielikainen, “Cryptographically Private Support Vector Machines,” Proc. ACM SIGKDD 2006.
  • [34] Learn OpenCV: Facial Landmark Detection, http://www.learnopencv.com/facial-landmark-detection/ [Accessed 3 May 2016].
  • [35] F. Mairesse, M. A. Walker, M. R. Mehl, R. K. Moore, “Using Linguistic Cues for the Automatic Recognition of Personality in Conversation and Text,” J. Artif. Intell. Res., Vol. 30, p. 457-500, 2007.
  • [36] Merantix & Blinq, “Let Artificial Intelligence Guess your Attractiveness and Age,” http://howhot.io/ [Accessed 29 April 2016].
  • [37] Microsoft Cognitive Services, https://how-old.net/ [Accessed 29 April 2016].
  • [38] S. Mohammad, X. Zhu, J. Martin, “Semantic Role Labeling of Emotions in Tweets”, Proc. of the 5th Workshop on Computational Approaches to Subjectivity, Sentiment & Social Media Analysis, 2014.
  • [39] OpenCV (open source computer vision), http://opencv.org/ [Accessed 3 May 2016].
  • [40] PAN Evaluation Lab on Uncovering Plagiarism, Authorship, and Social Software Misuse, http://pan.webis.de/ [Accessed 02 May 2016].
  • [41] C. Peikert, V. Vaikuntanathan, B. Waters. A framework for efficient and composable oblivious transfer. In David Wagner, editor, Advances in Cryptology – CRYPTO 2008, volume 5157 of Lecture Notes in Computer Science, pages 554–571, Santa Barbara, CA, USA, August 17–21, 2008. Springer, Heidelberg, Germany.
  • [42] J.W. Pennebaker, L.A. King, “Linguistic Styles: Language Use as an Individual Difference”, J. Pers. Soc. Psychol., Vol. 77(6), p. 1296-1312, 1999.
  • [43] R. Rothe, R. Timofte, L. Van Gool, “DEX: Deep EXpectation of apparent age from a single image”, ICCV, ChaLearn Looking at People workshop, December, 2015.
  • [44] Y.R. Tausczik, J.W. Pennebaker, “The Psychological Meaning of Words: LIWC and Computerized Text Analysis Methods”, J. Lang. Soc. Psychol., Vol. 29, p. 24–54, 2010.
  • [45] R. Tonicelli, A. C. A. Nascimento, R. Dowsley, J. Müller-Quade, H. Imai, G. Hanaoka, and A. Otsuka. Information-theoretically secure oblivious polynomial evaluation in the commodity-based model. International Journal of Information Security, 14(1):73–84, 2015.
  • [46] S. Wang, W. S. Poon, G. Farnadi, C. Horst, K. Thompson, M. Nickels, A. C. A. Nascimento, M. De Cock. VirtualIdentity: Privacy preserving user profiling. In 2016 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining (ASONAM), pages 1434–1437, 2016.