Unveiling Real-Life Effects of Online Photo Sharing

Social networks give free access to their services in exchange for the right to exploit their users' data. Data sharing is done in an initial context which is chosen by the users. However, data are used by social networks and third parties in different contexts which are often not transparent. We propose a new approach which unveils potential effects of data sharing in impactful real-life situations. Focus is put on visual content because of its strong influence in shaping online user profiles. The approach relies on three components: (1) a set of concepts with associated situation impact ratings obtained by crowdsourcing, (2) a corresponding set of object detectors used to analyze users' photos and (3) a ground truth dataset made of 500 visual user profiles which are manually rated for each situation. These components are combined in LERVUP, a method which learns to rate visual user profiles in each situation. LERVUP exploits a new image descriptor which aggregates concept ratings and object detections at user level. It also uses an attention mechanism to boost the detections of highly-rated concepts to prevent them from being overwhelmed by low-rated ones. Performance is evaluated per situation by measuring the correlation between the automatic ranking of profile ratings and a manual ground truth. Results indicate that LERVUP is effective since a strong correlation between the two rankings is obtained. This finding indicates that providing meaningful automatic situation-related feedback about the effects of data sharing is feasible.




1 Introduction

The ubiquitous use of online social networks (OSNs) shows that their services are appealing to users. Most OSNs implement a business model in which access is free in exchange for user data monetization [11]. The downside of such a model is that its economic performance is correlated with the level of detail of the user profiles which are inferred from raw data. Intrusiveness is likely to grow with the wide usage of AI techniques to infer actionable information from users' shared data. Automatic inferences happen in the back-end of OSNs or of associated third parties and are not transparent for users. As a consequence, data can be used in ways which are not obvious to the users. The Snowden revelations on Internet surveillance [33] and the Cambridge Analytica data breach [6] are well-known and striking examples of such uses. Equally interesting is the Chinese Social Credit System, whose main objective is to improve social trust by crunching both offline and online data. While approved by a majority of Chinese citizens [25], this system raises very serious concerns about the development of a ubiquitous surveillance state [29]. The debates triggered by these situations raise awareness about the risks and opportunities associated with data sharing.

The proposed approach contributes to these debates by unveiling how data can affect users' real lives. It increases awareness by linking data sharing to impactful situations such as searching for a job, an accommodation or a bank credit. Photos are in focus because they constitute a large part of the data shared on OSNs and contribute strongly to shaping user profiles [2]. We propose a method which automatically predicts the rating of a visual user profile in a given situation by exploiting three main components. First, each situation is modeled as a set of concepts with associated impact ratings obtained through crowdsourcing. Second, object detectors are trained for these concepts and used to analyze photos. Third, a dedicated dataset is created by manually rating user profiles in each situation: 500 visual profiles, each made of 100 photos, are evaluated by nine human annotators. The proposed method, named LERVUP from LEarning to Rate Visual User Profiles, attempts to learn a ranking of user profiles which is similar to the one obtained by aggregating human profile ratings. Training is driven by a new descriptor which combines concept impact ratings and object detections. Equally important, the contributions of concepts with high ratings are boosted in order to mimic the way humans assess photographic content. We compare manual and automatic rankings of user profile ratings and obtain a positive correlation between them. This result holds promise to help users better understand the effects of online data sharing and, ultimately, to let them gain better control of their data.

2 Related work

Our work is motivated by: (1) the role of human and/or technical biases in decision-making processes based on user data, (2) examples of relevant social science and machine learning studies in the area, and (3) computer vision methods that infer actionable information from visual data.

The main promise of OSNs is to connect people and allow them to exchange information within affinity-based networks. While this type of service has a strong appeal, participation can have positive and negative effects, depending on the way the shared information is interpreted in different contexts [5]. This interpretation process is influenced by human and/or technical biases. Human biases fall into two main categories that are often studied in relation to demographic factors such as gender or ethnicity [54]. Implicit biases [18] might influence one's decisions without that person being conscious of them. Explicit biases [9] are assumed and used intentionally. Implicit and explicit biases, as well as technical imperfections, contribute to the occurrence of different algorithmic biases. The partial mapping of complex real-life processes into computer systems [16] is a first source of bias. For instance, the introduction of deep architectures brought important progress in content classification [8, 27], but accuracy is still affected by internal representation limitations [35]. A second important bias is due to uneven class prediction accuracy. Such biases are determined by the inherent difficulty of the concepts, such as visual object scale [7], and/or the availability of skewed [51] and imbalanced data [21]. While they include biases of their own, approaches such as ours are needed to make AI-powered decision-making processes more transparent to users. Here, biases are due to the use of crowdsourced concept impact ratings and visual profile ratings in situations and of automatic object detectors. Impact and profile ratings are potentially biased insofar as they encode the opinions of the persons involved in the experiments. Object detection is biased because only a part of the potentially impactful concepts are modeled for each situation and because deep detectors are imperfect.

Below, we illustrate contexts in which users' lives are influenced by their online activity. The authors of [1] and [34] create fictitious Facebook profiles in which they vary only one type of personal data to assess its influence during job search. The authors of [1] find no significant discrimination due to family structure and sexual orientation, while a negative effect is elicited for a radical religious stance. The supposed origin of the user is found to have a significant effect on the number of replies a person gets to a job application [34]. The chances of obtaining short-term accommodation online are influenced by the assumed racial origin [14]. Rather accurate creditworthiness estimates are obtained automatically in [12] based on one's interests, but also from the analysis of the list of friends. These studies motivate our effort to unveil real-life consequences of data sharing.

The prediction of user traits from shared data received a lot of attention in the last decade. The threats induced by geolocation mining were studied in [15]. In [24], Facebook likes were exploited to predict sexual orientation, political opinions, race and personality traits. This last aspect was studied in more depth in [37], where the authors proposed to mine personality traits based on the language used online. A hierarchical organization of privacy aspects was introduced in [38], along with methods which provide accurate automatic trait prediction. The authors of [13] implemented an instructional awareness system which provides feedback about content whose publishing might be harmful for the users. These works explore the use of different types of shared data in order to predict privacy aspects. However, they do not provide a systematic way to map predictions to different real-life situations and do not focus on visual data.

The understanding of the effects of visual content sharing was pioneered by [2], with the introduction of disclosure dimensions such as security, identity and convenience. The study concludes that user feedback should provide warnings to prevent mistakes, inform about the effects of data aggregation and estimate the appropriate audience. Early work used hand-crafted visual features to predict the privacy status of an image. Results were encouraging but far from practical usability. Transfer learning from generalist deep models as a way to improve privacy prediction was tested in [47]. An important step forward was made in [36], with the creation of a taxonomy of privacy-related attributes and of a dataset dedicated to privacy prediction. Interestingly, the resulting model provided more consistent predictions than users' own judgments, indicating that users might fail to follow their own privacy-related preferences. A multimodal prediction model which mixes visual content and tags is introduced in [52]. Performance is improved by exploiting predictions from neighboring photos in the user's stream. These approaches are relevant insofar as they focus on improving users' control over shared data by predicting the privacy status of individual images. Our approach is different because we aggregate predictions at the user profile level in relation to impactful real-life situations.

Image analysis is central here because it extracts actionable information from users' photos. There is a choice to be made between deep learning based image classification and detection. Classification [20, 26] provides global labels for each image, while detection [30, 41] delineates image regions which contain specific objects. Detection is better suited here since a wide majority of useful information is conveyed by localizable objects which might be missed in classification. Object detection witnessed the proposal of increasingly accurate methods [19, 41, 42, 43]. However, the most accurate models are often too complex for edge computing. This is important insofar as one objective here is to inform users about the potential effects of sharing before it is done, on their smartphones. More compact models which seek a trade-off between performance and complexity were proposed in [17, 40, 45]. Consequently, we compare models which are usable on smartphones and are either generic [43] or specifically designed for edge computing [45]. Note that the integration of other detection models is straightforward and could further boost performance.

3 Proposed method

We introduce LERVUP, a method which learns to rate visual user profiles in real-life situations. The problem can be formalized with the following concepts and notations: a user u whose profile is rated; the set P(u) of photographs of u, which is analyzed automatically to determine the rating; a set C of visual concepts; an object detector d_c which searches for the detectable object of concept c in a photo; a situation model S defined by a set of visual concepts rated via crowdsourcing; a set of visual profiles of users for which manual profile ratings are collected by crowdsourcing; and an automatic profile rating which evaluates the visual profile of u in situation S.

3.1 Crowdsourcing Situation Ratings

The same concepts might be interpreted differently across situations, and the effect of sharing images of them will vary accordingly. We model a situation via the ratings of a series of concepts associated with it. Ratings of visual objects in four situations are obtained through crowdsourcing. We selected situations which might have serious consequences when users are confronted with them: accommodation search (ACC below), bank loan application (BANK), job search as an IT engineer (IT) and job search as a waitress/waiter (WAIT). The first two situations are applicable to a large part of the population. The last two are relevant for specific population segments, but the respective job searches require different skills, and it is interesting to assess the differences between them. Detectable objects from the OpenImages [28], ImageNet [44] and COCO [31] datasets were rated. A dedicated rating interface was created; for each situation it includes the concept name, a series of illustrative thumbnails and a 7-point Likert scale with ratings from -3 (strongly negative influence) to +3 (strongly positive influence). Ratings were provided by 14 participants for each situation, and the final rating of a concept is obtained by averaging the individual scores of all participants who rated the situation. The resulting detection dataset includes 269 objects which are rated in at least one of the four situations. Inter-rater agreement, which is important for tasks which are prone to bias such as the one proposed here, is computed using the average deviation index AD [4]. The obtained values vary across situations but remain well below the maximum acceptable value for a 7-point Likert scale defined in [3].
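As a minimal sketch of the agreement computation, assuming the AD index is the mean absolute deviation of the individual Likert ratings from their mean (one common form of the index in [4]):

```python
def average_deviation(ratings):
    """AD index: mean absolute deviation of individual Likert ratings
    from their mean. Lower values indicate stronger agreement."""
    mean = sum(ratings) / len(ratings)
    return sum(abs(r - mean) for r in ratings) / len(ratings)

# Hypothetical scores of the 14 raters for one concept in one situation,
# on the 7-point scale from -3 to +3.
scores = [-1, 0, 0, -1, -2, 0, -1, -1, 0, -1, -2, 0, -1, -1]
ad = average_deviation(scores)
```

The AD value is computed per concept and situation, then compared against the acceptability bound of [3].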

        ACC     BANK    IT      WAIT
mean    0.03    -0.13   0.09    0.27
std     0.70    0.68    0.58    0.60
Table 1: Rating statistics for the four modeled situations.

Table 1 summarizes the rating statistics for the four situations. The lowest mean rating (-0.13) is obtained for BANK. This illustrates the tendency of participants to be stricter when deciding about a bank loan than elsewhere. The finding is intuitive because granting a loan has tangible monetary consequences which are easily internalized by raters. A large number of positive ratings and the highest mean rating (0.27) are observed in WAIT. The result is intuitive since WAIT has less serious implications than BANK or ACC. The relatively low number of strong ratings indicates that the dataset should be enriched with more concepts which are highly rated. Such an enrichment would allow a finer-grained computation of visual user profile rating.

Figure 1: Rating patterns (columns) obtained by clustering concepts with similar ratings across situations (lines). The pattern name and the corresponding number of concepts are provided under each column. Rating colors go from red (strongly negative) to green (strongly positive) with stronger intensity indicating a higher absolute value of rating.

A clustering of concepts based on the similarity of their rating variations across situations is presented in Figure 1. It illustrates the 40 discovered patterns (cluster centroids) which yield the lowest silhouette criterion under the use of K-means. We note that averaged negative ratings are stronger, since they reach -3 in some cases, while averaged positive ratings have a maximum value of 1.15. Consistent with the statistics from Table 1, BANK and ACC have a larger number of negatively rated concepts than IT and WAIT. While the negative range is stronger, a majority of concepts have low positive scores: patterns P27, P30 and P32 include 31, 27 and 21 concepts, respectively. Interestingly, the ratings of some concepts vary a lot from one situation to another. For instance, P35, P39 and P23 are positive for WAIT but neutral or even negative elsewhere; these patterns include concepts related to junk food and alcohol. Patterns P3, P2 and P1, which include concepts related to weapons, are rated negatively everywhere.

The results from Table 1 and Figure 1 confirm that concept ratings differ across situations, leading to different perceptions of the same profile. More details about concept crowdsourcing are given in the supplementary material.

3.2 Focal rating

The frequency of patterns from Figure 1 shows that a majority of visual concepts have low ratings. Such concepts are likely to constitute a majority of the valid detections made in the images, and they might overwhelm the less numerous but more significant highly-rated detections. It is thus important to boost the influence of highly-rated concepts. In Eq. 1, we introduce a focal rating function which prioritizes concepts with high impact ratings, where two parameters control the strength of the focal rating. This function is inspired by attention mechanisms [53] which were already used to improve the performance of deep learning applications, such as object detection [30]. The scaling factor must remain positive in order to preserve the sign of the initial concept rating. The focal rating has little influence on concepts with low initial ratings; the higher the absolute value of a rating is, the more it is boosted by Eq. 1. The effect of focal rating is illustrated in the supplementary material. Eq. 1 is applicable to any profile rating method which exploits concept ratings.
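As an illustration, the function below has the properties stated for Eq. 1 (sign preservation, negligible effect on low ratings, amplification growing with the absolute rating); its exact form and the parameter names gamma and delta are assumptions, not the paper's definition:

```python
def focal_rating(r, gamma=2.0, delta=1.0):
    """Boost a concept impact rating r in [-3, 3] while preserving its sign.
    The multiplier is always positive, so the sign of r is kept; low |r|
    values are nearly unchanged, high |r| values are strongly amplified.
    gamma and delta are illustrative strength parameters."""
    return r * (1.0 + delta * abs(r) ** gamma)
```

With these defaults, a rating of 0.1 maps to 0.101 (almost unchanged) while a rating of 3 maps to 30, so highly-rated concepts dominate the low-rated majority.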

Figure 2: Rating patterns (columns) obtained by clustering visual user profiles with similar ratings across situations (lines). The pattern name and the corresponding number of profiles are provided under each column. Rating colors go from red (strongly negative) to green (strongly positive), with stronger intensity indicating a higher absolute value of rating.

3.3 Crowdsourcing Visual User Profile Ratings

Visual profile ratings are necessary to tune and evaluate LERVUP. We collect manual ratings for users in each situation via crowdsourcing. Similar to object rating, visual profiles are evaluated using a 7-point Likert scale which goes from -3 (strongly unappealing) to +3 (strongly appealing). Ratings are collected from 9 participants for 500 users from the YFCC dataset [50], with 100 images per profile. The images of each visual profile are shown on a single page, along with the possible situation ratings. Participants are asked to look at all the photos and provide a global score for each user in each situation. Profiles were presented in random order so as to avoid any ordering bias. Similar to Subsection 3.1, inter-rater agreement is analyzed using the AD index [4]. The obtained values for ACC, BANK, IT and WAIT are all within the acceptability bounds defined in [3]. We note however that there is more disagreement between annotators than for concept rating. This occurs because profile rating is at the same time more subjective and more complex than concept rating.

The same clustering experiment as in Subsection 3.1, applied to visual profiles based on the similarity of their rating variations across situations, is presented in Figure 2. There are 57 patterns discovered for the 500 profiles in the dataset. A first observation is that, unlike the concept rating patterns from Figure 1, positive ratings are in the majority here, and positive ratings are stronger. The most populous patterns are positive, with 24 users in P33, 21 in P41 and 19 in P40, the only exception being P10 with 18 users. We also note that the ratings of the same users vary significantly from one situation to another. Our hypothesis that profile interpretation is context-dependent is thus confirmed. Patterns such as P26, P27 and P28 are rated positively in some situations and negatively in others. This finding supports the creation of per-situation rating predictors. More details about profile rating crowdsourcing are given in the supplementary material.

3.4 Baseline Rating of Visual User Profiles

The objective is to obtain a reliable estimation of profile ratings in the modeled situations by exploiting: (1) the object ratings from Subsection 3.1, (2) the detections proposed by a visual object detector and (3) the set of visual profiles rated in Subsection 3.3. Given a photo, the detectors search for all objects from the concept set. By default, scores are predicted for all detectable objects. In practice, a threshold is needed to decide if an object is actually present. In Eq. 2, we define a filtered detection: the detector for a concept evaluates the objectness of its raw prediction in the photo, and the prediction is kept only if this objectness exceeds the detection threshold associated with the concept.

The threshold value in Eq. 2 can be set to a single value for all objects. The best single value can be learned by maximizing the correlation between automatic and manual profile ratings over a validation set. This procedure is simple, but it does not take into account the performance variations among the visual objects available in the concept set.

Instead, we implement a filtering-based attribute selection mechanism [39]. Given a situation, its objective is twofold: (1) determine optimal thresholds for the individual detectors and (2) select only the individual detectors which are most relevant in context. We start by computing, in Eq. 3, the optimal correlation between the manual ranking of visual profiles and the automatic ranking obtained with each detector taken alone. The detection threshold associated with each detector ranges from 0.01 to 1 and is tested with a step of 0.01; the Pearson correlation coefficient is computed between the automatic and manual rankings of user profile ratings obtained for each tested threshold.

Eq. 3 optimizes the correlation between automatic and manual user ratings when a single detector is activated. If several threshold values maximize this correlation, the smallest of them is used in order to keep more detections. The outputs of Eq. 3 then need to be aggregated in order to decide which subset of detectors is best in the situation. The optimal subset is selected with Eq. 4.


where the correlation threshold is a value between -1 and 1, tested with a step of 0.01; each tested value defines a subset of detectors in which only the object detectors whose individual correlation from Eq. 3 reaches the threshold are activated; the automatic and manual rankings are the same as in Eq. 3.

Eq. 4 provides the correlation threshold which selects the best subset of detectors for the visual profiles in the situation. Individual detectors are then activated with Eq. 5, which keeps a detector only if its optimal individual correlation reaches this threshold.


High values of the correlation threshold create a sparse detector set: only highly relevant detectors, which provide strong correlations on their own, are activated. The disadvantage of a high threshold is that the coverage ensured by a limited set of detectors is reduced, and only a subset of profiles can be reliably characterized. Inversely, low threshold values lead to a dense detector set which ensures a large coverage of profiles at the expense of the relevance of individual detectors. Eq. 5 thus encodes a compromise between the relevance of individual detectors and dataset coverage.
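The threshold tuning and detector selection of Eqs. 3-5 can be sketched as follows. The data layout and the rate_fn signature are assumptions: rate_fn turns one profile's detection scores for a single concept, filtered at threshold t, into an automatic rating.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient; returns 0.0 for constant inputs."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        return 0.0
    return cov / (sx * sy)

def tune_and_select(detector_scores, manual_ratings, rate_fn, tau):
    """For each detector (concept), sweep the detection threshold from 0.01
    to 1 in steps of 0.01 and keep the smallest threshold that maximizes the
    correlation between manual profile ratings and the automatic ratings
    obtained with that detector alone (Eq. 3); then activate only detectors
    whose best correlation reaches the threshold tau (Eqs. 4-5)."""
    thresholds, selected = {}, set()
    for c, scores_per_profile in detector_scores.items():
        best_corr, best_t = -1.0, None
        for i in range(1, 101):
            t = i / 100
            auto = [rate_fn(c, scores, t) for scores in scores_per_profile]
            corr = pearson(auto, manual_ratings)
            if corr > best_corr:  # strict '>' keeps the smallest maximizer
                best_corr, best_t = corr, t
        thresholds[c] = best_t
        if best_corr >= tau:
            selected.add(c)
    return thresholds, selected
```

In the full method, tau itself is swept over [-1, 1] with a step of 0.01 and the value giving the best validation correlation is retained.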

The computation of the user profile rating in a situation is based on a combination of the concept ratings of the detections filtered with Eq. 5. In Eq. 6, the rating contributions of all filtered detections are summed and the result is divided by the number of photos of the profile in which at least one visual detector is activated.

The denominator produces an averaged profile rating. It facilitates the comparison of visual user profiles which include a variable quantity of images. For instance, assuming that the numerator gives the same result for two users, the magnitude of the profile rating will be higher for the one with the smaller number of images containing valid detections. In this way, priority is given to profiles which include fewer detected objects but with more salient ratings.
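Under an assumed data layout (photo id mapped to a list of (concept, objectness) detections), the baseline rating of Eq. 6 can be sketched as:

```python
def baseline_profile_rating(profile_detections, concept_rating, thresholds, active):
    """Sketch of the baseline (Eq. 6): valid detections (active concept,
    objectness above its tuned threshold) contribute objectness * rating;
    the sum is averaged over the photos of the profile that contain at
    least one valid detection."""
    total, photos_with_detection = 0.0, 0
    for photo, detections in profile_detections.items():
        valid = [(c, s) for c, s in detections
                 if c in active and s >= thresholds[c]]
        if valid:
            photos_with_detection += 1
            total += sum(s * concept_rating[c] for c, s in valid)
    return total / photos_with_detection if photos_with_detection else 0.0
```

Photos without any valid detection do not enter the denominator, which implements the averaging behavior described above.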

3.5 Learning to Rate Visual User Profiles

The baseline for user rating presented in Subsection 3.4 is a simple way to aggregate the components of our method and does not exploit them fully. We assume that a supervised learning approach is a better way to rate visual user profiles in the modeled situations. The proposed method builds on the baseline and includes an image-level descriptor, an attention mechanism which boosts highly rated concepts, and a model which performs the final prediction.

3.5.1 Image-level descriptor

Individual images are a core factor in the manual rating of user profiles. It is thus interesting to aggregate individual object detections at the image level. Such a descriptor is equally interesting insofar as it can be used to present easily understandable feedback about the contribution of each image to the predicted rating. The descriptor includes three attributes which are defined as follows:


where the filtered objectness of each detection is obtained with the optimal threshold estimated in Eq. 3 for its concept, and each detected concept contributes its rating. The three attributes represent the positiveness, negativeness and confidence of the image; the Iverson bracket, valued 1 if the inner condition is true and 0 otherwise, selects positively or negatively rated concepts. Note that Eq. 7 is applicable only when at least one valid detection exists in the image. Otherwise, the image is not considered in the descriptor.

The positiveness and negativeness attributes are designed to favor images which include concepts with strong impact ratings: the higher the absolute values of the ratings are on average, the more salient these two attributes will be. The confidence attribute gives the average of the valid detection scores from the image and favors images which include high-confidence object detections.
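Since Eq. 7 is not reproduced here, the sketch below implements one plausible form of the three attributes with the stated properties; the exact formulation in the paper may differ:

```python
def image_descriptor(detections, concept_rating):
    """Sketch of the image-level descriptor. detections is a list of
    (concept, filtered_objectness) pairs that survived per-concept
    thresholding. Positiveness averages the ratings of positively rated
    detected concepts, negativeness the absolute ratings of negatively
    rated ones, and confidence averages the objectness of the valid
    detections. Returns None when the image has no valid detection,
    mirroring its exclusion from the profile descriptor."""
    if not detections:
        return None
    n = len(detections)
    pos = sum(concept_rating[c] for c, _ in detections if concept_rating[c] > 0) / n
    neg = sum(-concept_rating[c] for c, _ in detections if concept_rating[c] < 0) / n
    conf = sum(s for _, s in detections) / n
    return (pos, neg, conf)
```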

for user u in U do
    F(u) <- empty set
    for photo p in P(u) do
        add the image-level descriptor of p to F(u)
F <- union of all F(u)
M <- TrainClusteringModel(F)
for user u in U do
    assign each descriptor in F(u) to a cluster using M
    for cluster k in clusters do
        compute the mean and variance of the descriptors of u in k
    f(u) <- concatenation of the per-cluster means and variances
Algorithm 1: User Profile Rating Descriptor

3.5.2 User-level descriptor

The image-level descriptors need to be aggregated at the user level in order to imitate the way humans rate visual user profiles. This process is challenging because visual concepts with different ratings appear in isolation or jointly in one or several of the images which compose the profile. The method which produces the user-level descriptor is described in Algorithm 1. First, we construct a set of image descriptors for each user. The per-user sets are aggregated into a global set which is exploited to train a clustering model. The model is subsequently used to infer clusters which group together patterns in the underlying structure of each user's profile. The K-means algorithm is chosen for its effectiveness and simplicity; K is set to four clusters. Other values were tried but did not improve the obtained results. The mean and variance of the user's image descriptors from the different clusters are concatenated into a final feature vector.


This vector constitutes a better representation of the user profile than a raw use of concept ratings and object detections. It captures, in a compact form, patterns from an initial high-dimensional space defined by the array of object detectors and a large quantity of images, which would otherwise be affected by the curse of dimensionality [22]. The proposed descriptor is an alternative to classical dimensionality reduction techniques [46, 55]. A comparison to the two forms of raw representation is proposed in the evaluation.
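The aggregation of Algorithm 1 can be sketched as follows, with a plain Lloyd's K-means standing in for the trained clustering model and hypothetical function names; per-cluster means and variances of a user's image descriptors are concatenated into a fixed-size vector, with zeros for empty clusters:

```python
import numpy as np

def kmeans(X, k=4, iters=50, seed=0):
    """Plain K-means (Lloyd's algorithm), standing in for the clustering
    model of Algorithm 1; returns the cluster centers."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(iters):
        labels = np.argmin(((X[:, None] - centers[None]) ** 2).sum(-1), axis=1)
        for j in range(k):
            if (labels == j).any():
                centers[j] = X[labels == j].mean(axis=0)
    return centers

def user_descriptor(image_descs, centers):
    """Mean and variance of one user's image descriptors within each
    cluster, concatenated into a single fixed-size vector."""
    X = np.asarray(image_descs, dtype=float)
    labels = np.argmin(((X[:, None] - centers[None]) ** 2).sum(-1), axis=1)
    parts = []
    for j in range(len(centers)):
        member = X[labels == j]
        if len(member):
            parts.extend([member.mean(axis=0), member.var(axis=0)])
        else:
            parts.extend([np.zeros(X.shape[1]), np.zeros(X.shape[1])])
    return np.concatenate(parts)
```

With K = 4 clusters and the 3-dimensional image descriptor, the user vector has 4 * 2 * 3 = 24 dimensions regardless of the number of photos in the profile.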

3.5.3 LERVUP training

Visual user profile rating is mathematically modeled as a regression problem which exploits the user-level descriptor. The dimensionality of this vector depends on the number of clusters used in Algorithm 1. The training of LERVUP is deployed as a pipeline. First, individual object detections are validated within each image. Second, the image-level descriptor is constructed per image. Third, clustering is applied to group similar image descriptors and discover relevant patterns over the entire training set. Fourth, the discovered per-cluster statistics are concatenated to build the user descriptor. Finally, a random forest regression model is used to learn the rating of visual user profiles. Random forest was chosen because it is robust for data which contain non-linear relationships between features and target variables [23, 48, 57].

4 Evaluation

We propose an evaluation which compares the different variants of the methods proposed for automatic rating of user profiles. The profile dataset is classically split into three subsets used for training, validation and testing of each method, with 350, 50 and 100 profiles, respectively. Note that the baseline does not need training and uses the union of the training and validation subsets for validation. The optimal configuration of each method is obtained using a grid search. Details about the optimized parameters and their ranges are given in the supplementary material.

4.1 Object Detection Dataset and Models

The coverage ensured by the object detection dataset is important since a variety of objects can influence user profile ratings. To maximize coverage, we merge three existing datasets: OpenImages [28], ImageNet [44] and COCO [31]. Whenever an object is present in more than one dataset, we select data equally between the sources in order to reduce dataset biases. The resulting dataset includes 269 objects and 137,976 images. We limit imbalance by retaining at most 1000 images per object. The average and standard deviation of the per-object image distribution are 513 and 305, respectively.
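The merge can be sketched as follows, with a hypothetical data layout (dataset name mapped to object mapped to image ids) and a simple round-robin draw to balance sources while respecting the per-object cap:

```python
def merge_detection_data(sources, cap=1000):
    """Merge several detection datasets. When an object appears in more
    than one source, images are drawn evenly from each source (round
    robin) to limit dataset bias, until the per-object cap is reached."""
    merged = {}
    objects = set().union(*(s.keys() for s in sources.values()))
    for obj in objects:
        pools = [list(s[obj]) for s in sources.values() if obj in s]
        picked, i = [], 0
        while len(picked) < cap and any(pools):
            pool = pools[i % len(pools)]
            if pool:
                picked.append(pool.pop(0))
            i += 1
        merged[obj] = picked
    return merged
```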

We train object detectors with mobile and generic models to compare their performance. The mobile model is based on MobileNetV2 [45] with depthwise convolutions which offer a good precision/speed tradeoff. The detection head is a Single Shot MultiBox Detector [32], a fast single-stage method which is adapted for edge computation. The generic model uses Inception-ResNet-v2 [49] with atrous convolutions. Detection is done with a Faster RCNN module [43]. While not designed specifically for mobile devices, preliminary tests showed that it is usable on recent Android smartphones. The first model is abbreviated as MOBI and the second as RCNN. Details about detector training are provided in the supplementary material.

4.2 Methods

We test the following variants of the proposed methods:

  • Ranking based on Eq. 6 with a unique detection threshold for all concepts.

  • Ranking based on Eq. 6 with detection thresholds optimized per object via Eq. 3 and concept selection from Eq. 4.

  • A version of the previous baseline in which concept ratings are replaced by the focal ratings from Eq. 1.

  • A supervised learning method using a random forest, but with raw features obtained by aggregating the individual object activations.

  • A version of the previous method in which the raw features are compressed using PCA. Results are reported for 16-dimensional vectors, the setting which provided the best global performance.

  • LERVUP, the supervised learning method described in Subsection 3.5, with plain concept ratings.

  • LERVUP with the focal ratings from Eq. 1 in place of plain concept ratings.

  • The same as the previous method, but learned with a total of only 200 profiles for training and validation.

                              RCNN                       MOBI
                       ACC   BANK  IT    WAIT     ACC   BANK  IT    WAIT
Baseline, fixed thr.   0.40  0.28  0.36  0.64     0.37  0.26  0.41  0.58
Baseline, tuned thr.   0.46  0.29  0.36  0.65     0.42  0.26  0.41  0.58
Baseline, focal        0.46  0.33  0.36  0.65     0.42  0.30  0.41  0.58
RF, raw features       0.31  0.25  0.37  0.60     0.35  0.18  0.45  0.48
RF, PCA features       0.47  0.29  0.42  0.58     0.26  0.10  0.22  0.61
LERVUP                 0.48  0.48  0.46  0.66     0.44  0.27  0.47  0.68
LERVUP, focal          0.55  0.50  0.52  0.67     0.49  0.36  0.50  0.68
LERVUP, focal, 200     0.47  0.49  0.48  0.66     0.35  0.28  0.48  0.68
Table 2: Pearson correlation between automatic and manual rankings of the ratings of visual user profiles. Results are provided for the test subset of the dataset using the profile rating methods of Subsection 4.2 (rows, in the same order) and two object detectors. Best results in bold.

4.3 Results

The performance of the different methods tested is presented in Table 2. Correlations are analyzed using Cohen’s interpretation of the Pearson correlation coefficient [10]. Correlation is considered weak for values between 0.1 and 0.3, moderate between 0.3 and 0.5 and strong above 0.5. All evaluated methods provide a positive correlation between manual and automatic rankings of the profile ratings, with a wide majority of reported correlations in the moderate or strong ranges. This is a first positive result since the evaluated task is a complex one. Performance variations observed along different axes are discussed below.
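Cohen's thresholds used in this analysis can be encoded as a small helper; the "negligible" label for values below 0.1 is our addition, not part of [10]:

```python
def cohen_strength(r):
    """Cohen's rule of thumb for interpreting a Pearson correlation:
    weak for 0.1 <= |r| < 0.3, moderate for 0.3 <= |r| < 0.5,
    strong for |r| >= 0.5."""
    a = abs(r)
    if a >= 0.5:
        return "strong"
    if a >= 0.3:
        return "moderate"
    if a >= 0.1:
        return "weak"
    return "negligible"
```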

The comparison of the two object detectors is globally favorable to RCNN. This result is intuitive insofar as RCNN is built on a higher-capacity deep network architecture. Interestingly, MOBI obtains both the best and the worst results over all methods and situations tested. MOBI is slightly better than RCNN for WAIT (0.68 vs. 0.67). This good behavior for WAIT is explained by the fact that MOBI is known to provide good detections for large objects [45]. A wide majority of positively rated concepts for WAIT are related to food, which is often photographed in close-up. However, MOBI also has the worst results, by a large margin compared to RCNN. Even with the best method, the maximum correlation value obtained with MOBI for BANK is 0.36, while the corresponding value for RCNN is 0.5. These results indicate that further performance improvement should be achievable with better object detectors.

The best global results are obtained with LERVUP. Six out of eight of the correlations provided by this method are in the strong range defined by [10], with the other two in the moderate range. Except for WAIT, LERVUP clearly outperforms all baselines. This finding validates the utility of the learning-based approach which models automatic profile ranking as a regression problem. LERVUP with focal ratings is also better than the variant trained with raw concept ratings, with up to 9 points gained for BANK with the MOBI detector. The boosting of highly-rated concepts via the attention mechanism from Eq. 1 is thus validated.
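Eq. 1 is not reproduced in this excerpt. Purely as an illustration of the idea, a focal-style reweighting can be obtained by raising the absolute rating to a power gamma > 1, which shrinks low ratings relative to high ones; the exact form of the paper's focal rating may differ:

```python
import math

def focal_rating(rating, gamma=2.0):
    """Hypothetical focal-style reweighting: widens the gap between
    highly-rated and low-rated concepts via a power transform.
    NOTE: illustrative assumption only; see Eq. 1 in the paper for the real form."""
    return math.copysign(abs(rating) ** gamma, rating)

# A highly-rated concept keeps most of its weight, a low-rated one fades.
print(focal_rating(0.9))   # approx. 0.81
print(focal_rating(0.3))   # approx. 0.09
print(focal_rating(-0.5))  # approx. -0.25
```

With gamma = 2, the weight ratio between a 0.9-rated and a 0.3-rated concept grows from 3 to 9, so highly-rated concepts are less likely to be overwhelmed by many low-rated detections.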

The four modeled situations have variable performance. WAIT is clearly the easiest situation (correlation up to 0.68). This happens because the detection dataset contains a large number of food- and beverage-related objects, and such objects are often easy to detect. WAIT approximates the upper-bound performance one can expect with the two object detectors used here. BANK is the most challenging situation tested, particularly for MOBI, which reaches a maximum correlation of only 0.36. While MOBI and RCNN have similar performance for BANK with the baselines, LERVUP is clearly more beneficial for RCNN.

Among the baselines, the focal-rating variant is better than both the single-threshold and the optimized-threshold ones. However, the introduction of focal rating has a smaller effect on the baseline than it does on LERVUP. Concept selection (Eq. 4) and the use of individual detection thresholds (Eq. 3) have a larger impact than focal rating.

Results for the raw-feature and PCA-compressed baselines are particularly interesting since these methods rely on random forest training, like LERVUP. The difference is that the raw user representations, complete or compressed, are fed directly into them, while LERVUP exploits the proposed compact descriptor. The large performance difference in favor of LERVUP validates the relevance of the proposed descriptor.
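To make the contrast with raw features concrete, here is a minimal, hypothetical sketch of user-level aggregation in the spirit of the proposed compact descriptor: per-image object detections above a threshold are weighted by their concept's situation rating and pooled across the user's photos. The actual descriptor from Subsection 3.5 is richer than this; the pooling rule below is an assumption for illustration.

```python
def user_descriptor(images, ratings, threshold=0.5):
    """Aggregate object detections into a compact per-user, per-situation score.
    `images` is a list of {concept: detection_score} dicts (one per photo);
    `ratings` maps each concept to its crowdsourced situation rating.
    Hypothetical sketch: sum of rating-weighted confident detections,
    normalized by the number of photos."""
    total = 0.0
    for detections in images:
        for concept, score in detections.items():
            if score >= threshold and concept in ratings:
                total += ratings[concept] * score
    return total / max(len(images), 1)

# Toy profile: two photos, with concept ratings for a hypothetical situation.
ratings = {"beer": -0.8, "book": 0.6, "dog": 0.1}
photos = [{"beer": 0.9, "dog": 0.4}, {"book": 0.7}]
print(user_descriptor(photos, ratings))
```

Feeding such a compact, rating-aware summary to the regressor, instead of thousands of raw per-object activations, is what the comparison in Table 2 evaluates.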

The size of the training dataset is often crucial for the success of supervised learning methods. The comparison of results obtained with the complete training set and with half of it (200 profiles) shows that adding more profiles is clearly beneficial for ACC, BANK and IT, and has a marginal or no effect for WAIT. This finding has practical significance since the effort needed to build the profiles dataset is significant. Performance saturation for a situation indicates that enough profiles were manually rated for it and the process can stop.

5 Discussion

We presented a new approach which unveils potentially serious real-life effects of online photo sharing. It is currently implemented for four situations but is easily extensible in terms of situations, types of data included, object detection models and profile rating methods. Below we discuss implications of the approach, point out its merits and limitations and discuss perspectives of improvement.

The main objective is to help users have a better understanding of the way data they share online could reflect on their real lives. We acknowledge the fact that the approach is affected by a combination of human and technical biases. However, such biases are inherent to any AI-driven computer system and will also appear in the real decision-making processes which are mimicked here. For instance, crowdsourced concept and profile ratings will reflect an average bias of the participants who provided inputs in the experiments. In a real context, where shared data is analyzed by a single person to reach a decision, the bias will be that of the person. Averaging ratings is a good way to reduce bias since any extreme individual opinions will be smoothed. In the future, it would be interesting to collect data from a larger pool of participants and cluster them in order to see if there are large rating differences between sub-communities. It is also important to note that we encode both positive and negative human biases since participants are asked to rate concepts and profiles on a symmetric scale. This approach is more balanced than that of studies whose objective is to elicit negative biases [15, 13]. Ideally, decision-making processes should be bias-free, but it is realistic to assume that biases cannot be eliminated. It is thus important to act toward at least removing the most damaging of them, which are related to sensitive demographics such as ethnicity, religion or gender. This topic will be part of our future work.
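The smoothing effect of averaging can be seen in a toy example with made-up numbers: a single extreme rater would dominate an individual decision, but barely moves the crowd average:

```python
def average_rating(ratings):
    """Mean of crowdsourced ratings on a symmetric scale (e.g., -1 to 1)."""
    return sum(ratings) / len(ratings)

# Nine moderate raters and one extreme one (hypothetical values).
crowd = [0.1, 0.2, 0.0, 0.1, -0.1, 0.2, 0.1, 0.0, 0.1, -1.0]
print(average_rating(crowd))       # the extreme -1.0 rating is heavily diluted
print(average_rating(crowd[:-1]))  # average without the outlier, for comparison
```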

Technical bias is due to imperfections in the detection model, the available data and the rating model. Detection model imperfection can be reduced via the use of more powerful deep detection architectures [30, 40]. However, since the rating is most useful if done on the users' devices, models should remain tractable at the edge. A second technical bias is due to detector availability. Three existing datasets were merged to have more detectors. They seem sufficient for WAIT, which is well mapped in the detection dataset, but probably not for the other situations. We will extend the dataset with priority given to new concepts which are highly rated in at least one situation. A third technical bias is due to data imbalance. We limited the maximum number of images per object to reduce imbalance while also preserving accuracy. Imbalance will be further reduced with new annotations for both existing and new detectors. A fourth bias is related to the focus on images. The approach is extensible to other types of shared data which might be relevant, such as the likes and texts studied in [24] and [38]. We intend to exploit them in order to obtain more relevant and broader profile ratings. Finally, LERVUP provides performance gains compared to a series of baselines. The proposed method constitutes only a first attempt to tackle profile rating, and important improvements over it are possible.

It is important to keep in mind that the proposed approach does not make decisions in the modeled situations. LERVUP is implemented to mimic real decision-making processes and make them more transparent to users. A key problem regarding transparency is the explainability of the detection and prediction models. While the explainability of deep models remains challenging [56], they are widely used in real applications. Any attempt to mimic such applications should thus include them. Random forest learning is easier to explain from a technical perspective, but still requires a fair amount of machine learning knowledge. The core question is what kind of explanations are most useful for users. Our approach makes it possible to provide understandable feedback about the contribution of individual images to profile ratings. Within images, users can see which objects were detected and used during rating. Future work will include user experiments to gather feedback about the types of desirable explanations.

The code and data used in this paper will be made public. This is important to ensure the transparency of the approach itself and to encourage further research in the area.


  • [1] A. Acquisti and C. Fong (2020) An experiment in hiring discrimination via online social networks. Management Science 66 (3), pp. 1005–1024. External Links: Document Cited by: §2.
  • [2] S. Ahern, D. Eckles, N. Good, S. King, M. Naaman, and R. Nair (2007) Over-exposed?: privacy patterns and considerations in online and mobile photo sharing. In Proceedings of the 2007 Conference on Human Factors in Computing Systems, CHI 2007, San Jose, California, USA, April 28 - May 3, 2007, M. B. Rosson and D. J. Gilmore (Eds.), pp. 357–366. External Links: Link, Document Cited by: §1, §2.
  • [3] M. J. Burke and W. P. Dunlap (2002) Estimating interrater agreement with the average deviation index: a user’s guide. Organizational research methods 5 (2), pp. 159–172. Cited by: §3.1, §3.3.
  • [4] M. J. Burke, L. M. Finkelstein, and M. S. Dusig (1999) On average deviation indices for estimating interrater agreement. Organizational Research Methods 2 (1), pp. 49–68. Cited by: §3.1, §3.3.
  • [5] M. Burke, J. Cheng, and B. de Gant (2020) Social comparison and facebook: feedback, positivity, and opportunities for comparison. In CHI ’20: CHI Conference on Human Factors in Computing Systems, Honolulu, HI, USA, April 25-30, 2020, R. Bernhaupt, F. ’. Mueller, D. Verweij, J. Andres, J. McGrenere, A. Cockburn, I. Avellino, A. Goguey, P. Bjøn, S. Zhao, B. P. Samson, and R. Kocielnik (Eds.), pp. 1–13. External Links: Link, Document Cited by: §2.
  • [6] C. Cadwalladr and E. Graham-Harrison (2018) Revealed: 50 million facebook profiles harvested for cambridge analytica in major data breach. The guardian 17, pp. 22. Cited by: §1.
  • [7] Z. Cai, Q. Fan, R. S. Feris, and N. Vasconcelos (2016) A unified multi-scale deep convolutional neural network for fast object detection. In European conference on computer vision, pp. 354–370. Cited by: §2.
  • [8] D. CireşAn, U. Meier, J. Masci, and J. Schmidhuber (2012) Multi-column deep neural network for traffic sign classification. Neural networks 32, pp. 333–338. Cited by: §2.
  • [9] J. A. Clarke (2018) Explicit bias. Nw. UL Rev. 113, pp. 505. Cited by: §2.
  • [10] J. Cohen (2013) Statistical power analysis for the behavioral sciences. Academic press. Cited by: §4.3, §4.3.
  • [11] K. Curran, S. Graham, and C. Temple (2011) Advertising on facebook. International Journal of E-business development 1 (1), pp. 26–33. Cited by: §1.
  • [12] S. De Cnudde, J. Moeyersoms, M. Stankova, E. Tobback, V. Javaly, and D. Martens (2015) Who cares about your facebook friends? credit scoring for microfinance. Technical report University of Antwerp. Cited by: §2.
  • [13] N. E. Díaz Ferreyra, R. Meis, and M. Heisel (2017-08) Online Self-disclosure: From Users’ Regrets to Instructional Awareness. In 1st International Cross-Domain Conference for Machine Learning and Knowledge Extraction (CD-MAKE), A. Holzinger, P. Kieseberg, A. M. Tjoa, and E. Weippl (Eds.), Machine Learning and Knowledge Extraction, Vol. LNCS-10410, Reggio, Italy, pp. 83–102. Note: Part 2: MAKE Smart Factor External Links: Link, Document Cited by: §2, §5.
  • [14] B. Edelman, M. Luca, and D. Svirsky (2017) Racial discrimination in the sharing economy: evidence from a field experiment. American Economic Journal: Applied Economics 9 (2), pp. 1–22. Cited by: §2.
  • [15] G. Friedland and J. Choi (2011) Semantic computing and privacy: a case study using inferred geo-location. Int. J. Semantic Computing 5 (1), pp. 79–93. External Links: Link, Document Cited by: §2, §5.
  • [16] B. Friedman and H. Nissenbaum (1996) Bias in computer systems. ACM Transactions on Information Systems (TOIS) 14 (3), pp. 330–347. Cited by: §2.
  • [17] G. Ghiasi, T. Lin, and Q. V. Le (2019) NAS-FPN: learning scalable feature pyramid architecture for object detection. In IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2019, Long Beach, CA, USA, June 16-20, 2019, pp. 7036–7045. External Links: Link, Document Cited by: §2.
  • [18] A. G. Greenwald and M. R. Banaji (1995) Implicit social cognition: attitudes, self-esteem, and stereotypes.. Psychological review 102 (1), pp. 4. Cited by: §2.
  • [19] K. He, G. Gkioxari, P. Dollár, and R. B. Girshick (2017) Mask R-CNN. In IEEE International Conference on Computer Vision, ICCV 2017, Venice, Italy, October 22-29, 2017, pp. 2980–2988. External Links: Link, Document Cited by: §2.
  • [20] K. He, X. Zhang, S. Ren, and J. Sun (2016) Deep residual learning for image recognition. In 2016 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2016, Las Vegas, NV, USA, June 27-30, 2016, pp. 770–778. External Links: Link, Document Cited by: §2.
  • [21] C. Huang, Y. Li, C. C. Loy, and X. Tang (2016) Learning deep representation for imbalanced classification. In Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 5375–5384. Cited by: §2.
  • [22] G. Hughes (1968) On the mean accuracy of statistical pattern recognizers. IEEE transactions on information theory 14 (1), pp. 55–63. Cited by: §3.5.2.
  • [23] M. Kayri, I. Kayri, and M. T. Gencoglu (2017) The performance comparison of multiple linear regression, random forest and artificial neural network by using photovoltaic and atmospheric data. In 2017 14th International Conference on Engineering of Modern Electric Systems (EMES), pp. 1–4. Cited by: §3.5.3.
  • [24] M. Kosinski, D. Stillwell, and T. Graepel (2013) Private traits and attributes are predictable from digital records of human behavior. Proceedings of the National Academy of Sciences 110 (15), pp. 5802–5805. External Links: Document, Link, https://www.pnas.org/content/110/15/5802.full.pdf Cited by: §2, §5.
  • [25] G. Kostka (2019) China’s social credit systems and public opinion: explaining high levels of approval. New media & society 21 (7), pp. 1565–1593. Cited by: §1.
  • [26] A. Krizhevsky, I. Sutskever, and G. E. Hinton (2012) ImageNet classification with deep convolutional neural networks. In Advances in Neural Information Processing Systems 25: 26th Annual Conference on Neural Information Processing Systems 2012. Proceedings of a meeting held December 3-6, 2012, Lake Tahoe, Nevada, United States, P. L. Bartlett, F. C. N. Pereira, C. J. C. Burges, L. Bottou, and K. Q. Weinberger (Eds.), pp. 1106–1114. External Links: Link Cited by: §2.
  • [27] A. Krizhevsky, I. Sutskever, and G. E. Hinton (2017) Imagenet classification with deep convolutional neural networks. Communications of the ACM 60 (6), pp. 84–90. Cited by: §2.
  • [28] A. Kuznetsova, H. Rom, N. Alldrin, J. R. R. Uijlings, I. Krasin, J. Pont-Tuset, S. Kamali, S. Popov, M. Malloci, T. Duerig, and V. Ferrari (2018) The open images dataset V4: unified image classification, object detection, and visual relationship detection at scale. CoRR abs/1811.00982. External Links: Link, 1811.00982 Cited by: §3.1, §4.1.
  • [29] F. Liang, V. Das, N. Kostyuk, and M. M. Hussain (2018) Constructing a data-driven society: china’s social credit system as a state surveillance infrastructure. Policy & Internet 10 (4), pp. 415–453. Cited by: §1.
  • [30] T. Lin, P. Goyal, R. B. Girshick, K. He, and P. Dollár (2017) Focal loss for dense object detection. In IEEE International Conference on Computer Vision, ICCV 2017, Venice, Italy, October 22-29, 2017, pp. 2999–3007. External Links: Link, Document Cited by: §2, §3.2, §5.
  • [31] T. Lin, M. Maire, S. J. Belongie, J. Hays, P. Perona, D. Ramanan, P. Dollár, and C. L. Zitnick (2014) Microsoft COCO: common objects in context. In Computer Vision - ECCV 2014 - 13th European Conference, Zurich, Switzerland, September 6-12, 2014, Proceedings, Part V, D. J. Fleet, T. Pajdla, B. Schiele, and T. Tuytelaars (Eds.), Lecture Notes in Computer Science, Vol. 8693, pp. 740–755. External Links: Link, Document Cited by: §3.1, §4.1.
  • [32] W. Liu, D. Anguelov, D. Erhan, C. Szegedy, S. E. Reed, C. Fu, and A. C. Berg (2016) SSD: single shot multibox detector. In Computer Vision - ECCV 2016 - 14th European Conference, Amsterdam, The Netherlands, October 11-14, 2016, Proceedings, Part I, B. Leibe, J. Matas, N. Sebe, and M. Welling (Eds.), Lecture Notes in Computer Science, Vol. 9905, pp. 21–37. External Links: Link, Document Cited by: §4.1.
  • [33] D. Lyon (2014) Surveillance, snowden, and big data: capacities, consequences, critique. Big data & society 1 (2), pp. 2053951714541861. Cited by: §1.
  • [34] M. Manant, S. Pajak, and N. Soulié (2019) Can social media lead to labor market discrimination? evidence from a field experiment. Journal of Economics & Management Strategy 28, pp. 225–246. Cited by: §2.
  • [35] A. Nguyen, J. Yosinski, and J. Clune (2015) Deep neural networks are easily fooled: high confidence predictions for unrecognizable images. In Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 427–436. Cited by: §2.
  • [36] T. Orekondy, B. Schiele, and M. Fritz (2017) Towards a visual privacy advisor: understanding and predicting privacy risks in images. In IEEE International Conference on Computer Vision, ICCV 2017, Venice, Italy, October 22-29, 2017, pp. 3706–3715. External Links: Link, Document Cited by: §2.
  • [37] G. Park, H. A. Schwartz, J. C. Eichstaedt, M. L. Kern, M. Kosinski, D. J. Stillwell, L. H. Ungar, and M. E. Seligman (2015) Automatic personality assessment through social media language.. Journal of personality and social psychology 108 (6), pp. 934. Cited by: §2.
  • [38] G. Petkos, S. Papadopoulos, and Y. Kompatsiaris (2015) PScore: A framework for enhancing privacy awareness in online social networks. In 10th International Conference on Availability, Reliability and Security, ARES 2015, Toulouse, France, August 24-27, 2015, pp. 592–600. External Links: Link, Document Cited by: §2, §5.
  • [39] T. M. Phuong, Z. Lin, and R. B. Altman (2005) Choosing SNPs using feature selection. In 2005 IEEE Computational Systems Bioinformatics Conference (CSB'05), pp. 301–309. Cited by: §3.4.
  • [40] Z. Qin, Z. Li, Z. Zhang, Y. Bao, G. Yu, Y. Peng, and J. Sun (2019) ThunderNet: towards real-time generic object detection on mobile devices. In 2019 IEEE/CVF International Conference on Computer Vision, ICCV 2019, Seoul, Korea (South), October 27 - November 2, 2019, pp. 6717–6726. External Links: Link, Document Cited by: §2, §5.
  • [41] J. Redmon and A. Farhadi (2017) YOLO9000: better, faster, stronger. In 2017 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2017, Honolulu, HI, USA, July 21-26, 2017, pp. 6517–6525. External Links: Link, Document Cited by: §2.
  • [42] J. Redmon and A. Farhadi (2018) YOLOv3: an incremental improvement. CoRR abs/1804.02767. External Links: Link, 1804.02767 Cited by: §2.
  • [43] S. Ren, K. He, R. B. Girshick, and J. Sun (2015) Faster R-CNN: towards real-time object detection with region proposal networks. In Advances in Neural Information Processing Systems 28: Annual Conference on Neural Information Processing Systems 2015, December 7-12, 2015, Montreal, Quebec, Canada, C. Cortes, N. D. Lawrence, D. D. Lee, M. Sugiyama, and R. Garnett (Eds.), pp. 91–99. External Links: Link Cited by: §2, §4.1.
  • [44] O. Russakovsky, J. Deng, H. Su, J. Krause, S. Satheesh, S. Ma, Z. Huang, A. Karpathy, A. Khosla, M. S. Bernstein, A. C. Berg, and F. Li (2015) ImageNet large scale visual recognition challenge. International Journal of Computer Vision 115 (3), pp. 211–252. External Links: Link, Document Cited by: §3.1, §4.1.
  • [45] M. Sandler, A. G. Howard, M. Zhu, A. Zhmoginov, and L. Chen (2018) MobileNetV2: inverted residuals and linear bottlenecks. In IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2018, pp. 4510–4520. External Links: Link, Document Cited by: §2, §4.1, §4.3.
  • [46] B. Schölkopf, A. Smola, and K. Müller (1997) Kernel principal component analysis. In International conference on artificial neural networks, pp. 583–588. Cited by: §3.5.2.
  • [47] E. Spyromitros-Xioufis, S. Papadopoulos, A. Popescu, and Y. Kompatsiaris (2016) Personalized privacy-aware image classification. In ICMR ’16 Proceedings of the 2016 ACM on International Conference on Multimedia Retrieval, New York, United States, pp. 71–78. Note: Conference of 6th ACM International Conference on Multimedia Retrieval, ICMR 2016 ; Conference Date: 6 June 2016 Through 9 June 2016; Conference Code:122023 External Links: Link Cited by: §2.
  • [48] V. Svetnik, A. Liaw, C. Tong, J. C. Culberson, R. P. Sheridan, and B. P. Feuston (2003) Random forest: a classification and regression tool for compound classification and qsar modeling. Journal of chemical information and computer sciences 43 (6), pp. 1947–1958. Cited by: §3.5.3.
  • [49] C. Szegedy, S. Ioffe, V. Vanhoucke, and A. A. Alemi (2017) Inception-v4, inception-resnet and the impact of residual connections on learning. In Proceedings of the Thirty-First AAAI Conference on Artificial Intelligence, February 4-9, 2017, San Francisco, California, USA, S. P. Singh and S. Markovitch (Eds.), pp. 4278–4284. External Links: Link Cited by: §4.1.
  • [50] B. Thomee, D. A. Shamma, G. Friedland, B. Elizalde, K. Ni, D. Poland, D. Borth, and L. Li (2016) YFCC100M: the new data in multimedia research. Commun. ACM 59 (2), pp. 64–73. Cited by: §3.3.
  • [51] T. Tommasi, N. Patricia, B. Caputo, and T. Tuytelaars (2017) A deeper look at dataset bias. In Domain adaptation in computer vision applications, pp. 37–55. Cited by: §2.
  • [52] A. Tonge and C. Caragea (2019) Dynamic deep multi-modal fusion for image privacy prediction. In The World Wide Web Conference, WWW 2019, San Francisco, CA, USA, May 13-17, 2019, L. Liu, R. W. White, A. Mantrach, F. Silvestri, J. J. McAuley, R. Baeza-Yates, and L. Zia (Eds.), pp. 1829–1840. External Links: Link, Document Cited by: §2.
  • [53] W. Wang and J. Shen (2017) Deep visual attention prediction. IEEE Transactions on Image Processing 27 (5), pp. 2368–2378. Cited by: §3.2.
  • [54] J. C. Williams (2014) Double jeopardy? an empirical study with implications for the debates over implicit bias and intersectionality. Harvard Journal of Law & Gender 37, pp. 185. Cited by: §2.
  • [55] S. Wold, K. Esbensen, and P. Geladi (1987) Principal component analysis. Chemometrics and intelligent laboratory systems 2 (1-3), pp. 37–52. Cited by: §3.5.2.
  • [56] C. T. Wolf (2019) Explainability scenarios: towards scenario-based xai design. In Proceedings of the 24th International Conference on Intelligent User Interfaces, pp. 252–257. Cited by: §5.
  • [57] A. M. Youssef, H. R. Pourghasemi, Z. S. Pourtaghi, and M. M. Al-Katheeri (2016) Landslide susceptibility mapping using random forest, boosted regression tree, classification and regression tree, and general linear models and comparison of their performance at wadi tayyah basin, asir region, saudi arabia. Landslides 13 (5), pp. 839–856. Cited by: §3.5.3.
  • [58] S. Zerr, S. Siersdorfer, J. S. Hare, and E. Demidova (2012) Privacy-aware image classification and search. In The 35th International ACM SIGIR conference on research and development in Information Retrieval, SIGIR ’12, Portland, OR, USA, August 12-16, 2012, W. R. Hersh, J. Callan, Y. Maarek, and M. Sanderson (Eds.), pp. 35–44. External Links: Link, Document Cited by: §2.