A visual approach for age and gender identification on Twitter

by Miguel A. Álvarez-Carmona et al.

The goal of Author Profiling (AP) is to identify demographic aspects (e.g., age, gender) of a given set of authors by analyzing their written texts. Recently, the AP task has gained interest in many problems related to computer forensics, psychology, and marketing, but especially in those related to social media exploitation. As is well known, social media data is shared through a wide range of modalities (e.g., text, images and audio), representing valuable information to be exploited for extracting insights from users. Nevertheless, most of the current work on AP using social media data has been devoted to analyzing textual information only, and very few works have started exploring gender identification using visual information. In contrast, this paper focuses on exploiting the visual modality to perform both age and gender identification in social media, specifically on Twitter. Our goal is to evaluate the pertinence of using visual information for solving the AP task. Accordingly, we have extended the Twitter corpus from PAN 2014 by incorporating posted images from all the users, making a distinction between tweeted and retweeted images. The performed experiments provide interesting evidence on the usefulness of visual information in comparison with traditional textual representations for the AP task.




1 Introduction

Nowadays there is a tremendous amount of information available on the Internet. Specifically, social media domains are constantly growing thanks to the information generated by a huge community of active users. Such information is available in several modalities, including text, image, audio and video. The availability of all this information plays an important role in designing appropriate tools for diverse tasks and applications. Particularly, during recent years, the Author Profiling (AP) task has gained interest among the scientific community. AP aims at revealing demographic information (e.g., age, gender, native language, personality traits, cultural background) of authors through analyzing their written texts [17]. The AP task has a wide range of applications and could have a broad impact on a number of problems. For instance, in forensics, profiling authors could provide valuable additional evidence; in marketing, this information could be exploited to improve targeted advertising.

As is well known, social media data has a multimodal nature (e.g., text, images, audio, social interactions); however, most of the previous research on AP has been devoted to the analysis of the textual modality [7, 17, 28, 33, 40], disregarding information from other modalities that could be potentially useful for improving the performance of AP methods. Accordingly, some works have begun to exploit distinct modalities for approaching the AP problem [8, 26, 44, 52]. The visual modality has proved particularly interesting, mostly because of its, to some extent, language-independent nature. In fact, previous work has found a relationship between images and users’ interests, opinions and thoughts [10, 11, 15, 45, 47, 51, 50].

Although visual information is particularly appealing for AP, only recently have some authors begun to pay attention to the content of images shared by users. For example, for gender identification, some authors have exploited the information provided by the colors adopted by users in their profiles [24]. In [5], the authors applied state-of-the-art face-gender recognizers to user profile pictures. Nonetheless, the most common strategy so far consists of exploiting posting behavior, which implies the manual classification of posted images in order to analyze histograms of the classes/objects posted by users [15].

Despite previous efforts to include visual information in the AP task, only the gender recognition problem has been studied, leaving the age identification problem unexplored. In addition, much of the previous research considers a scenario where manually tagged images are provided for training, which is impractical and unrealistic for AP systems.

To overcome these limitations, we present a thorough analysis of the pertinence of visual information for approaching the AP problem, targeting both age and gender identification. Our study comprises an analysis of the discriminative capabilities of images tweeted and retweeted by users. As part of the study, a method for AP using images is proposed. The proposed method relies on a representation derived from a pre-trained convolutional neural network. Through our study, we aim to shed some light on:

i) the importance of visual information alone for solving the AP tasks (age and gender), and ii) how complementary textual and visual information are for the age and gender identification problems.

The main contributions of this paper are as follows:

  • We built an extended multimodal version of the PAN 2014 AP corpus, a reference data set for AP in social media. For this, we incorporated all of the images from the users’ profiles contained in the original corpus.

  • We propose a method for AP from images based on state-of-the-art convolutional neural network (CNN) representation learning techniques, which have not been previously used for this task.

  • We propose a methodology for addressing the age identification problem using images posted on Twitter. To the best of our knowledge, this is the first effort to approach this task using purely visual information.

  • We provide a comparative analysis on the importance of using textual and visual information for age and gender identification.

  • We evaluate the usefulness of images in the AP task, whether they are tweeted or retweeted by the users.

The remainder of this paper is organized as follows. Section 2 reviews related work on AP using textual, visual and multimodal approaches. Section 3 describes the proposed methods for AP using images. Section 4 describes the adopted methodology for building a multimodal corpus for AP in Twitter. Section 5 presents our experimental results, which comprise quantitative and qualitative evaluation. Finally, Section 6 outlines our conclusions and future work directions.

                                     Author Profiling subtasks
Approaches  | Gender                                 | Age                             | Personality      | Interests | Others
Textual     | [2, 3, 4, 7, 9, 13, 14, 17, 23, 27, 31, 32, 33, 36, 39, 40, 46] | [2, 3, 4, 13, 23, 28, 29, 31, 33, 36, 40] | [22, 21] | [20, 34] | sentiment anal. [37]
Visual      | [5, 15, 25, 41, 50]                    | —                               | [10, 11, 24, 45] | [47, 51]  | retweet predict. [8]
Multimodal  | [26, 44]                               | —                               | —                | —         | sentiment anal. [52]
Table 1: State-of-the-art methods applied to author profiling, by subtask.

2 Related Work

According to the literature, AP in social media has two main subtasks: age and gender detection (see [2, 3, 4, 7, 9, 13, 14, 17, 23, 27, 31, 32, 33, 36, 39, 40, 46] for gender, and [2, 3, 4, 13, 23, 28, 29, 31, 33, 36, 40] for age). Related tasks include personality prediction [22, 21], interests identification [20, 34] (a.k.a. genres), sentiment and emotion recognition [37], among others. Generally speaking, AP has been approached as a single-label classification problem, where the target profiles (e.g., males vs. females) stand for the target classes.

To accurately model target profiles it is necessary to extract general demographic features that apply to heterogeneous groups of authors and that indicate, to some extent, how authors use language given their native language, gender, age, etc. [2]. Thus, the AP task in social media is particularly challenging given the nature of Internet interactions and the informality of written language.

Table 1 provides a summary of related work on AP. We can distinguish three broad approaches to the AP task: textual, visual and multimodal. Notice that the visual and multimodal approaches have been less studied (see the last two rows in Table 1). Regarding the textual approach, authors have proposed combinations of textual attributes ranging from lexical features (e.g., content words [2] and function words [17]) to syntactic features (e.g., POS-based features [6], personal phrases [31], and probabilistic context-free grammars [39]). Concerning the visual approach, most of the research has focused on gender recognition [5, 25, 41, 50], and has only considered some general statistics about the shared images as features [15]. Similarly, some works have considered visual information for the task of personality prediction [10, 11, 24, 45]. For instance, in [24], the authors try to determine users’ behavioral biometric traits by analyzing their favorite images on Flickr. As features, they considered the average size of regions, colorfulness, and wavelet textures, among others. They conclude that images are very good cues for determining people’s behavior. Similarly, in [10] the authors proposed a method for identifying personality traits of Flickr users. Their results showed a strong correlation between users’ personality and their favorite images.

Regarding multimodal approaches, it is worth mentioning that this type of strategy has only recently started to gain attention. For example, in [44] the authors proposed a weighted text-image scheme for gender classification of Twitter users. The basic idea consists of identifying a set of image categories associated with the male and female classes. This is done with a CNN, which is used to compute a score for every user image. Finally, the computed scores are averaged and combined with textual information to approach the gender identification problem.

Contrary to previous research, this paper focuses on exploiting the visual modality to perform both age and gender identification in social media. Accordingly, we have extended a well-known benchmark corpus by incorporating images into users’ profiles. Thus, we are able to perform a comparative analysis of the importance of textual and visual information for the age and gender identification problems. In addition, we evaluate the utility of images depending on whether they are original (i.e., tweeted by the user) or reused (i.e., retweeted from another user’s post). Last, but not least, our visual approach to AP is based on state-of-the-art visual descriptors (CNN-based).

3 Author profiling from images

In this section we describe the proposed visual approach for AP. Firstly, Section 3.1 describes the adopted image representation. Then, Section 3.2 introduces two different methods for performing AP in Twitter using information from posted images.

3.1 CNN-based image representation

Defining robust and discriminative features for images is not a trivial task. A direct way of defining these features is manually, through handcrafted features provided by experts or using a mechanical-turk approach [43]. As one can imagine, this approach is infeasible given the number of images that would have to be labeled in a social media domain. Hence, we adopted an alternative solution based on feature learning, using a pre-trained deep learning model¹ [30, 49]. Deep models are composed of multiple processing layers that learn representations of data with multiple levels of abstraction [19]. For instance, when an image is propagated through a pre-trained deep model, it is processed layer by layer, transforming an array of pixel values (the image) into a representation that amplifies important aspects of the input and suppresses irrelevant variations for discrimination [19]. This methodology has reported outstanding results in a number of computer vision tasks. Our intuition is that this type of representation can be beneficial in solving the posed task.

¹ Features learned by a deep model on a generic and very large dataset are transferred to another task.

As mentioned before, our dataset is composed of images from Twitter. Thus, we chose a pre-trained model general enough to cover the visual diversity of the target images. As is known, a pre-trained model performs better under a transfer learning scenario when the target dataset shares a similar distribution with the source dataset [48]. Accordingly, we used the 16-layer CNN model known as VGG [42]. This model was trained on the ImageNet dataset [38], a large visual database designed for visual object recognition, including classification and detection of objects and scenes.

Every image’s representation is obtained by passing its raw input (pixels) through the ConvNet model using the Caffe library [16]. The activation of an intermediate layer is then used as the representation of the fed image. For our experiments, we chose the 4,096 activations produced in the last hidden layer of the network, since this layer produces similar values when similar images are introduced [18]. Note that our CNN representation does not rely on the last layer of the network, which produces detection scores over 1,000 different classes. The reason is that transferability is negatively affected by the specialization of higher-layer neurons to their original task, at the expense of performance on the target task [49]. In our case, an abstract representation over 4,096 neurons would be reduced to 1,000 classes. Therefore, for representing images we use the last hidden layer; the final layer is employed only for the qualitative analysis reported in Section 5.4.
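The idea of keeping the penultimate activation can be sketched with a toy forward pass. This is an illustrative stand-in, not the paper's Caffe/VGG code: the layer sizes and weights are made up, and only the mechanics (propagate layer by layer, take the second-to-last activation as the feature vector, ignore the final class scores) mirror the description above.

```python
def dense(weights, vec):
    """Multiply a weight matrix (given as a list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vec)) for row in weights]

def relu(vec):
    return [max(0.0, x) for x in vec]

def forward_all(layers, pixels):
    """Propagate the input through every layer, returning all activations."""
    activations = [pixels]
    for weights in layers:
        activations.append(relu(dense(weights, activations[-1])))
    return activations

# Toy "network": 4 input values -> 3 hidden units -> 2 class scores.
layers = [
    [[0.1, 0.2, 0.0, -0.1], [0.0, 0.3, 0.1, 0.2], [0.2, -0.2, 0.1, 0.0]],
    [[0.5, -0.1, 0.2], [0.1, 0.4, -0.3]],
]
image = [1.0, 0.5, 0.2, 0.8]
acts = forward_all(layers, image)
feature = acts[-2]  # penultimate layer: the transferable representation
scores = acts[-1]   # final layer: task-specific scores (analysis only)
```

In the paper's setting, `feature` corresponds to VGG's 4,096-unit last hidden layer and `scores` to its 1,000-way ImageNet output.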

3.2 AP from visual and multimodal information

This section describes two different visual-based methods for AP in social media, as well as an effective multimodal approach that jointly uses textual and visual information, and a baseline method based exclusively on textual information.

  1. Visual methods. The two proposed methods for AP from images are illustrated in Figure 1. Both use the same input information and apply the same process for building the image representations; that is, given the set of images from a user (profile), each image is passed through the pre-trained deep model to obtain its vector representation (see the upper box in the figure). The obtained information can then be exploited in two different ways, deriving our two proposed strategies:

    (a) Individual classification: first, each image from a user is classified individually; then, the AP class of the user is determined by means of a majority-vote strategy.

    (b) Prototype classification: first, each user is represented by a prototype vector built by averaging the CNN representations of all his/her images. Then, this prototype is fed to a standard classifier that outputs the AP class of the user.

    Figure 1: Two visual approaches for AP in Twitter: (a) individual-based classification and (b) prototype-based classification.

  2. Multimodal method. This method follows the same pipeline as 1.(b) for building the image representations; however, it is called a multimodal representation since it combines visual and textual information. Specifically, we build the multimodal prototype of each user by concatenating the visual prototype with a traditional BoW representation of all the tweets that the user has posted.

  3. Textual method. As before, a prototype representation is built for each user, but this method considers only textual information. Two different BoW representations were used in the experiments, containing the 2k and 10k most frequent terms from the training data, respectively.
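The three aggregation schemes above can be sketched in a few lines. This is a minimal sketch assuming the per-image CNN vectors and the BoW vector are already computed; the toy vectors and labels below are hypothetical.

```python
from collections import Counter

def majority_vote(image_labels):
    """Individual classification: aggregate per-image predictions by vote."""
    return Counter(image_labels).most_common(1)[0][0]

def prototype(image_vectors):
    """Prototype classification: average the CNN vectors of a user's images."""
    n = len(image_vectors)
    return [sum(dim) / n for dim in zip(*image_vectors)]

def multimodal_prototype(visual_proto, bow_vector):
    """Multimodal method: concatenate visual prototype and BoW vector."""
    return visual_proto + bow_vector

# Hypothetical user with three images (2-D vectors for brevity).
vecs = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
voted = majority_vote(["male", "female", "male"])  # -> 'male'
proto = prototype(vecs)
multi = multimodal_prototype(proto, [3.0, 0.0, 1.0])
```

In the paper the visual prototype has 4,096 dimensions and the BoW part 2k or 10k; the prototype (or the per-image vectors, for the voting variant) is then fed to a standard classifier.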

More details regarding the implementation of these methods are given in Section 5. Next, we introduce the multimodal corpus specifically assembled for evaluating our proposed approach.

4 A multimodal corpus for twitter AP

Images shared by social media users tend to be strongly correlated with their thematic interests as well as with their style preferences. Motivated by these facts, we tackled the task of assembling a corpus containing text and images from Twitter users. Mainly, we extended the PAN-2014 dataset [35] by harvesting images from the already existing Twitter users.

The PAN-2014 Twitter dataset considers gender profiles (female vs. male) and five non-overlapping age profiles. It includes tweets (textual information only) from English and Spanish users. Based on this dataset, we harvested more than 150,000 images, corresponding to a subset of 450 user profiles: 279 in English and 171 in Spanish². The downloaded images were linked to the existing user profiles, resulting in a new multimodal Twitter corpus for the AP task. The next subsections present detailed information about the new corpus.

² Note that the PAN-2014 corpus includes more profiles in both languages; however, for some Twitter users it was impossible to download their associated images.

4.1 Statistics of the multimodal twitter corpus

Table 2 presents general statistics of the new multimodal AP corpus, which includes around 85,000 images for the English users and approximately 73,000 for the Spanish profiles. Given our interest in studying the discriminative capabilities of tweeted and retweeted images, we have separated both kinds of images; approximately 50% of the collected images correspond to each kind. It is worth noting that although there is a considerable number of images per profile, there is a high standard deviation in both corpora, indicating the presence of some users with very few images in their profiles. Table 2 also shows that users from the Spanish corpus posted more images (40% more) than users from the English corpus.

                              EN          SP
# Profiles used               279         171
Images tweeted                44,376      35,583
Images retweeted              40,361      37,625
Average images (std. dev.)
  by profile                  304 (340)   428 (409)
  in tweet set                159 (239)   208 (304)
  in retweet set              144 (188)   220 (223)
Table 2: General statistics of the images from the English (EN) and Spanish (SP) corpora.

Tables 3 and 4 present additional statistics on the values that the age and gender variables can take, respectively. On the one hand, Table 3 divides profiles by age range, i.e., 18-24, 25-34, 35-49, 50-64 and 65-N. Both languages show an important level of imbalance, with the 35-49 class having the greatest number of users and the extreme ages the lowest. Nonetheless, users in the 65-N range are the ones posting the greatest number of images, with the lowest standard deviation values. It is also important to notice that, in both corpora, users in the 50-64 range share many images on average but show a large standard deviation, indicating the presence of some users with many images and others with very few.

             Average images (std. dev.)
ages    #    by profile   in tweets   in retweets
EN 18-24 17 246 (80) 141 (50) 105 (35)
25-34 78 286 (202) 148 (118) 137 (109)
35-49 123 301 (253) 154 (155) 147 (138)
50-64 54 334 (238) 174 (168) 160 (120)
65-N 7 441 (102) 291 (76) 150 (53)
SP 18–24 12 254 (99) 123 (58) 131 (45)
25–34 36 331 (198) 154 (101) 177 (109)
35–49 85 414 (341) 207 (234) 207 (170)
50–64 32 565 (308) 258 (197) 307 (179)
65–N 6 808 (173) 440 (116) 368 (87)
Table 3: Statistics of images shared by each age category, in both English (EN) and Spanish (SP) corpora.

On the other hand, Table 4 reports statistics for each gender profile. A balanced number of male and female users is observed in both corpora, as well as a similar number of shared images.

               Average images (std. dev.)
gender    #    by profile   in tweets   in retweets
EN F 140 162 (294) 83 (182) 79 (158)
M 139 141 (274) 75 (192) 65 (144)
SP F 86 228 (372) 104 (236) 124 (205)
M 85 200 (347) 104 (242) 96 (178)
Table 4: Statistics of images shared by each gender category, in both English (EN) and Spanish (SP) corpora.

5 Experimental results

This section presents experimental results on the multimodal Twitter corpus introduced previously. It is divided into two parts: (1) a comparison among different methods for performing AP, followed by (2) a discussion based on a purely visual evaluation. Overall, our aim is to show how useful images are for approaching the AP task.

approach methods age (EN) gender (EN) age (SP) gender (SP)
Textual T1: BoW (2k) 0.394 0.741 0.481 0.601
T2: BoW (10k) 0.409 0.755 0.505 0.703
Visual V3: LL-CNN (all-imgs) 0.349 0.526 0.481 0.524
V4: LL-CNN AVG (all-imgs) 0.390 0.700 0.380 0.650
Multimodal M3: T1+V4 0.414 0.775 0.451 0.685
M6: T2+V4 0.423 0.778 0.433 0.642
Table 5: Comparison among methods based on textual, visual and multimodal approaches for performing author profiling task.

5.1 Comparison of textual, visual and multimodal methods for AP

This subsection compares the performance (classification accuracy) of the methods introduced in Section 3 on the AP task. Evaluation is carried out by profile, allowing a fair comparison among the different approaches. To provide comparable results, using the profile IDs from the PAN-2014 Twitter corpus, we construct 10 subject-independent partitions (including at least one subject from each class in each partition) and adopt a 10-fold cross-validation strategy for evaluation. As expected, the partitions are unbalanced with respect to the number of images. We used an SVM classifier (LibLinear [12]), enabling a direct comparison among the evaluated approaches: although the methods use different representations, the information comes from the same profiles. Besides, the evaluated representations do not include any preprocessing, so as to perform a fair comparison among modalities.
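The subject-independent partitioning described above can be sketched as follows. This is a hedged reconstruction (the paper does not give its splitting code): profiles, not individual images, are assigned to folds, round-robin within each class so every fold contains at least one profile of every class. The profile IDs and class labels below are hypothetical.

```python
from collections import defaultdict

def subject_independent_folds(profile_labels, k=10):
    """profile_labels: dict mapping profile id -> class label.
    Returns k disjoint lists of profile ids, each covering every class."""
    by_class = defaultdict(list)
    for pid, label in sorted(profile_labels.items()):
        by_class[label].append(pid)
    folds = [[] for _ in range(k)]
    for label, pids in by_class.items():
        # Round-robin within the class keeps every class in every fold.
        for i, pid in enumerate(pids):
            folds[i % k].append(pid)
    return folds

# Hypothetical toy data: 12 profiles, 2 classes, 3 folds for brevity.
labels = {f"user{i}": ("F" if i % 2 == 0 else "M") for i in range(12)}
folds = subject_independent_folds(labels, k=3)
```

Each fold then serves once as the test set while the remaining folds train the classifier, so no profile's images ever appear on both sides of a split.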

Considering the proposed visual methods, four variants were evaluated: three using individual classification over all images, and one using prototype classification on the all-images set. We report only the best-performing variants (V3 and V4 in Table 5, respectively).

Table 5 presents the obtained results. On the one hand, surprisingly, the evaluation on the English corpus reveals that a multimodal approach is better for detecting both age and gender when all images are collapsed into a prototype. Regarding the visual methods, they perform poorly compared to the textual ones, with the exception of V4.

On the other hand, the best results on the Spanish corpus are achieved by textual approaches, especially when more BoW terms are considered (10k). However, it is worth noticing that for gender identification the V4 method is very competitive (as on the English corpus). The multimodal approaches also obtained quite competitive performance, but were not able to outperform the text-only results.

To study the discriminative properties of tweeted and retweeted images, we performed an additional experiment in which both types of images were separated, and then evaluated our proposed visual and multimodal approaches. The obtained results reveal an accuracy of 0.432 for the V3 method using retweeted images, surpassing the results obtained for age identification on the English corpus. However, no other improvements were observed.

5.2 How much a single image says about the user

This scenario aims to evaluate the visual information in images globally. In this setting, every image from a profile is an individual instance whose label is the same as the one assigned to the profile. We evaluated gender and age for the two corpora in PAN-2014. For reference, we include the prior probability of each evaluated class.
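The per-image setting above amounts to two small steps: expand each profile into one labeled instance per image, and compute the class probability used as the reference baseline (the label prior over all image instances). A minimal sketch, with hypothetical profile data:

```python
from collections import Counter

def image_instances(profiles):
    """profiles: list of (label, images). One (label, image) per image."""
    return [(label, img) for label, images in profiles for img in images]

def class_priors(instances):
    """Class probability baseline: prior of each label over instances."""
    counts = Counter(label for label, _ in instances)
    total = sum(counts.values())
    return {label: c / total for label, c in counts.items()}

# Hypothetical profiles: a label and a list of image ids.
profiles = [("F", ["a", "b", "c"]), ("M", ["d"])]
instances = image_instances(profiles)
priors = class_priors(instances)  # {'F': 0.75, 'M': 0.25}
```

A per-image classifier is informative only where its accuracy exceeds these priors, which is the comparison reported in Table 6.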

From Table 6, we can confirm the usefulness of visual information for gender identification, obtaining in most cases better results than the class probability in both corpora. Besides, better performance is observed for the female gender. In contrast, the age recognition task seems more challenging under this scenario; it appears that the images posted by users are more diverse in terms of age.

Interestingly, although the highest performances are reached for the majority classes, these surpass the class probability only in the Spanish corpus, whereas for English the results for three age intervals surpassed the class probability.

                 English [P*]     Spanish [P*]
Age
  18-24          0.071 [0.049]    0.040 [0.042]
  25-34          0.269 [0.264]    0.080 [0.163]
  35-49          0.333 [0.437]    0.580 [0.482]
  50-64          0.343 [0.213]    0.220 [0.247]
  65-N           0.010 [0.036]    0.060 [0.066]
  accuracy       0.296            0.355
Gender
  Female         0.578 [0.535]    0.548 [0.467]
  Male           0.503 [0.465]    0.509 [0.532]
  accuracy       0.546            0.531
[P*]: class probability.
Table 6: Accuracy performance on age and gender identification.
(a) testing all-imgs/[training with] (b) [testing with]/training all-imgs (c) [testing/training] with
evaluating tweets retweets tweets retweets tweets retweets
age (EN) 0.288 0.322 0.298 0.294 0.263 0.340
gender (EN) 0.515 0.535 0.550 0.541 0.519 0.544
age (SP) 0.350 0.290 0.357 0.349 0.357 0.282
gender (SP) 0.532 0.510 0.525 0.544 0.549 0.504
Table 7: Accuracy performance considering origin of the images.

5.2.1 The importance of tweeted and retweeted images

In view of the earlier results, we repeated the experiment (same evaluation protocol), but this time separating the images by source, i.e., tweeted and retweeted. This distinction aims to answer the following questions:

  1. Do posted (tweeted) and re-shared (retweeted) images express the author’s interests in the same way?

  2. Does either of these ways of sharing images give more information about the user’s profile?

Thus, three specific scenarios were defined according to the source of the images: (a) testing on all images (from both sources) and training with a single source (tweeted or retweeted images); (b) training with all images and testing on one of the two sources; and finally (c) testing and training using the same single source. The obtained results are presented in Table 7.

From Table 7 we can stress the following:

  • Results for scenario (a) indicate that using a single image source for training is comparable to, and sometimes even better than, using both sources together for age and gender identification (compare the scenario (a) results with those of Table 6). Interestingly, English and Spanish take advantage of different sources: English presents better performance when training on retweeted images, whilst Spanish benefits from tweeted images.

  • Scenario (b) allows us to compare the results obtained on each source when training with all images. For age, only the retweeted images in Spanish present a considerable increase in performance, i.e., from 0.29 to 0.34.

  • Scenario (c) can be compared with scenario (b): here the training set is reduced to a single source, whereas scenario (b) uses all images. In general, we see performance decrements for age and gender identification on English, with the exception of retweeted images for age, which obtained better results. For Spanish, the only increase in performance is for gender identification with tweeted images.

Summarizing the results of the experiments in this section, we have shown that it is feasible to approach the gender identification task, and in some cases age identification as well, using only the visual information from posted images. Moreover, we have found that the image source (i.e., tweeted or retweeted) does matter, and that it can be exploited to achieve better results in age and gender identification.

5.3 A picture is worth a thousand words

Inspired by the saying ‘a picture is worth a thousand words’, we decided to compare both modalities at the profile level. Some works have followed a similar idea; for instance, [15] performed a statistical analysis by gender, and [51] constructed a dataset from Pinterest with the aim of classifying a user’s images into categories. Here, in contrast, each profile is represented by text samples of exactly 1,000 words, which are classified; in turn, the images from the same profile are also classified. Both classification schemes are compared, trying to answer whether a picture can say more about a user than a thousand words.

To approximate the same conditions for both classification scenarios, the training instances come from the same profiles, but of course using their respective representations. For the textual approach, samples of 1,000 words are used, taking as many subsets of this size as can be extracted from each profile. For the visual approach, all images in the profiles are classified. The obtained results are presented in Tables 8 and 9 for age and gender identification, respectively.

In general, the results in Table 8 indicate that it is possible to identify some age ranges with reasonable accuracy using images only, and this holds for both languages, especially for the minority classes, where 1,000 words alone are not enough. Even more interesting is that the image source matters, yielding better results than using all images without distinction.

Textual Visual
ages BoW (2k) BoW (10k) all-images tweets retweets
SP 18-24 0% 0% 4.6% 12.6% 4%
25-34 0% 14.2% 29.7% 13.8% 17%
35-49 100% 100% 30% 41.8% 33.1%
50-64 14.2% 14.2% 38.5% 41.8% 33.1%
65-N 0% 0% 0% 8.3% 0%
EN 18-24 0% 0% 3.1% 3.1% 14.2%
25-34 0% 11.1% 28.1% 39.6% 31.4%
35-49 33% 50% 38.3% 27.6% 8.5%
50-64 18.1% 18.1% 20.1% 29.1% 29.4%
65-N 0% 14.2% 0.3% 1.4% 9.1%
Table 8: Accuracy performance for age ranges.

On the other hand, the results for gender identification indicate that it is somewhat easier to determine whether a profile belongs to a female or male person by their images than by their words. Interestingly, for the female gender it is better to use the tweeted images for identification, while for the male gender retweeted images work better.

                       Spanish          English
                        F       M        F       M
Textual  BoW (2k)      40%     0%       20%     25%
         BoW (10k)     60%     20%      20%     50%
Visual   all-images    54.6%   55.5%    52.5%   55.7%
         tweets        75.7%   37%      95.7%   5.3%
         retweets      55.2%   59.2%    38.9%   62.8%
Table 9: Accuracy performance for gender (F: female, M: male).
(a) Male (EN) images.
(b) Differences among labels from English corpus.
(c) Female (EN) images.
(d) Male (SP) images.
(e) Differences among labels from Spanish corpus.
(f) Female (SP) images.
Figure 2: Comparison of frequent labels found in male and female profiles.

5.4 Qualitative analysis of the posted images

This section presents qualitative experiments showing how useful images are for exploring information on Twitter. Our aim in performing this study is to show the visual evidence left by users over a period of time.

For this experiment we represented images with the final layer of the considered CNN. Its dimensions correspond to the 1,000 categories of ImageNet [38]; the higher a value, the more likely the corresponding category appears in the image. Of course, it is unlikely that images posted in a 'wild' scenario can be fully described by only 1,000 classes. Nevertheless, this kind of experiment lets us characterize user preferences through denotative descriptions, i.e., the assigned labels.
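A minimal sketch of turning the final-layer activations into denotative labels; the four category names below are illustrative toys, not the actual ImageNet vocabulary:

```python
import math

def top_labels(activations, categories, k=3):
    """Apply a numerically stable softmax to the CNN's final-layer
    activations and return the k most likely categories: the higher
    the value, the more likely the category appears in the image."""
    m = max(activations)
    exps = [math.exp(a - m) for a in activations]
    total = sum(exps)
    ranked = sorted(zip(categories, (e / total for e in exps)),
                    key=lambda t: t[1], reverse=True)
    return ranked[:k]

# Toy example with 4 hypothetical categories instead of the full 1,000.
cats = ["scoreboard", "perfume", "wig", "trailer truck"]
acts = [0.2, 2.5, 1.0, -0.3]
print(top_labels(acts, cats, k=2))
```

In practice the image with the highest activation for a category such as 'scoreboard' would receive that label, which is the denotative description used in the analyses below.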

Hence, in order to analyze the content of the images posted by the users, we labeled all images using these 1,000 ImageNet categories [42]. After classification, the labels of the images within a specific gender or age range were accumulated into normalized histograms. A similar study was carried out in [47] using Pinterest images from the Travel category; however, those authors intended to answer whether user-generated visual content has predictive capabilities for users' preferences beyond labels.
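The aggregation into normalized histograms amounts to counting, for each group (a gender or an age range), how often each label was assigned and dividing by the total; a sketch with hypothetical labels:

```python
from collections import Counter

def label_histogram(assigned_labels):
    """Normalized frequency of each ImageNet label over all images
    posted by one group (e.g., all male profiles of the English corpus)."""
    counts = Counter(assigned_labels)
    total = sum(counts.values())
    return {label: c / total for label, c in counts.items()}

# Hypothetical per-image labels for one group.
hist = label_histogram(["scoreboard", "scoreboard", "trailer truck", "wig"])
print(hist["scoreboard"])  # → 0.5
```

Normalizing by the group's total image count makes histograms of groups with different numbers of images directly comparable.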

Figure 2 shows a list of words ordered by frequency (top to bottom), comparing how often they occur for each gender in both corpora, i.e., male versus female. A sample of posted images accompanies each gender. To produce the word list, the difference between the normalized histograms of the two genders was computed, and we kept the 20 words with the largest difference, with an equal number of words in favor of each gender.
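This selection can be sketched as taking, from the difference of the two normalized histograms, the labels most in favor of each gender (the 10+10 split is our assumption; the paper only states that equal numbers per gender were kept):

```python
def discriminative_labels(hist_a, hist_b, n=10):
    """Labels whose normalized frequencies differ most between two
    groups: the n most in favor of group A and the n most in favor
    of group B, yielding the 2n-word lists of Figure 2."""
    labels = set(hist_a) | set(hist_b)
    diff = {l: hist_a.get(l, 0.0) - hist_b.get(l, 0.0) for l in labels}
    ranked = sorted(diff, key=diff.get, reverse=True)
    return ranked[:n], ranked[:-n - 1:-1]   # top n for A, top n for B

# Toy histograms with hypothetical frequencies.
male = {"scoreboard": 0.6, "wig": 0.1, "velvet": 0.3}
female = {"scoreboard": 0.1, "wig": 0.5, "perfume": 0.4}
print(discriminative_labels(male, female, n=2))
```

Labels missing from one histogram are treated as having zero frequency there, so a label posted exclusively by one gender gets the maximal possible difference.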

On the one hand, in the English corpus (see panels (a), (b) and (c) in Figure 2), male users seem to post images associated with topics such as sports (e.g., 'mountain bike', 'scoreboard') and machines and vehicles (e.g., 'airship', 'trailer truck', 'streetcar'). For female users, one topic is related to beauty products ('hair spray', 'perfume', 'hand blower') and another to fashion (e.g., 'velvet', 'wig', 'pajama').

On the other hand, in the Spanish corpus (see panels (d), (e) and (f) in Figure 2), male users are prone to post images related to sports (e.g., 'scoreboard' and 'digital clock') as well as technology (e.g., 'monitor', 'hand-held computer', 'iPod'), while female users posted images related to fashion (e.g., 'wig', 'miniskirt', 'bonnet').

(a) Ages 18-24 (English).
(b) Ages 18-24 (Spanish).
(c) Ages 25-34 (English).
(d) Ages 25-34 (Spanish).
(e) Ages 35-49 (English).
(f) Ages 35-49 (Spanish).
(g) Ages 50-64 (English).
(h) Ages 50-64 (Spanish).
(i) Ages 65-N (English).
(j) Ages 65-N (Spanish).
Figure 3: Most frequent words, together with sample images, per age range. From the English corpus: (a), (c), (e), (g) and (i); from the Spanish corpus: (b), (d), (f), (h) and (j). Word size indicates frequency: smaller words are less frequent.

A second qualitative evaluation is presented in Figure 3, this time showing the most frequent words posted per age range in both corpora. Each row presents an age range through a sample of images and its respective word cloud, where word size indicates frequency. The idea is to let the reader judge the visual information that can be extracted from images posted on Twitter.

6 Conclusions and Future Work

This paper explored the use of visual information to perform both age and gender identification in social media, specifically on Twitter. Novel methods for AP using visual information were proposed, as well as multimodal (text+images) techniques for approaching the task. The models incorporating visual information rely on a CNN for feature extraction. The usefulness of images for AP was also explored by contrasting the performance obtained when using tweeted versus retweeted images in the predictive models. Furthermore, we extended a benchmark dataset for AP (the PAN-2014 Twitter dataset) to include visual information by incorporating images from the users' profiles. The release of the latter dataset is an important contribution of this work to the state of the art in AP, as it will motivate further research on visual and multimodal approaches to AP.

Using the extended benchmark, we conducted an extensive evaluation including textual, visual and multimodal methods for AP. The obtained results provide relevant evidence on the usefulness of visual information for AP. On the one hand, the experimental results suggest that approaches based on multimodal information achieve better performance than single-modality approaches in both age and gender prediction. On the other hand, the results indicate that images tend to be more relevant than text for determining the gender of Twitter users. We also found that the usefulness of visual information is somewhat dependent on the language of the tweets.

Regarding the analysis of the discriminative capabilities of tweeted and retweeted images, the obtained results did not allow us to formulate a definitive conclusion. However, they do suggest that the image source matters: for example, female gender identification was more accurate when using tweeted images, whereas for male gender identification retweeted images worked better.

For future work, we plan to extend this study by incorporating more advanced techniques for modeling both visual and textual information. We also consider evaluating the usefulness of other information modalities for AP, such as the posting behavior and video sharing.


This research was supported by CONACyT under scholarships 401887, 214764 and 243957; project grants 247870, 241306 and 258588; and the Thematic Networks Program (Language Technologies Thematic Network, project 281795).


  • [1] Miguel A. Álvarez-Carmona, Luis Pellegrin, Manuel Montes-y-Gómez, Fernando Sánchez-Vega, Hugo Jair Escalante, A. Pastor López-Monroy, Luis Villaseñor-Pineda, and Esaú Villatoro-Tello. A visual approach for age and gender identification on twitter. Journal of Intelligent & Fuzzy Systems, 34(5):3133–3145, 2018.
  • [2] Shlomo Argamon, Moshe Koppel, Jonathan Fine, and Anat Rachel Shimoni. Gender, genre, and writing style in formal written texts. Text, 23(3):321–346, 2003.
  • [3] Shlomo Argamon, Moshe Koppel, James W Pennebaker, and Jonathan Schler. Mining the blogosphere: Age, gender and the varieties of self-expression. First Monday, 12(9), 2007.
  • [4] Shlomo Argamon, Moshe Koppel, James W Pennebaker, and Jonathan Schler. Automatically profiling the author of an anonymous text. Communications of the ACM, 52(2):119–123, 2009.
  • [5] S. Azam and M. Gavrilova. Gender prediction using individual perceptual image aesthetics. Journal of WSCG, 24(2):53–62, 2016.
  • [6] Shane Bergsma, Matt Post, and David Yarowsky. Stylometric analysis of scientific articles. In Proceedings of the 2012 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pages 327–337, 2012.
  • [7] John D Burger, John Henderson, George Kim, and Guido Zarrella. Discriminating gender on twitter. In Proceedings of the Conference on Empirical Methods in Natural Language Processing, pages 1301–1309, 2011.
  • [8] Ethem F. Can, Hüseyin Oktay, and R. Manmatha. Predicting retweet count using visual cues. In Proceedings of the 22nd ACM International Conference on Information & Knowledge Management, CIKM ’13, pages 1481–1484, 2013.
  • [9] Na Cheng, Rajarathnam Chandramouli, and KP Subbalakshmi. Author gender identification from text. Digital Investigation, 8(1):78–88, 2011.
  • [10] Marco Cristani, Alessandro Vinciarelli, Cristina Segalin, and Alessandro Perina. Unveiling the multimedia unconscious: Implicit cognitive processes and multimedia content analysis. In Proceedings of the 21st ACM International Conference on Multimedia, MM ’13, pages 213–222. ACM, 2013.
  • [11] Azar Eftekhar, Chris Fullwood, and Neil Morris. Capturing personality from facebook photos and photo-related activities. Comput. Hum. Behav., 37(C):162–170, August 2014.
  • [12] Rong-En Fan, Kai-Wei Chang, Cho-Jui Hsieh, Xiang-Rui Wang, and Chih-Jen Lin. Liblinear: A library for large linear classification. J. Mach. Learn. Res., 9:1871–1874, June 2008.
  • [13] Sumit Goswami, Sudeshna Sarkar, and Mayur Rustagi. Stylometric analysis of bloggers age and gender. In Third International AAAI Conference on Weblogs and Social Media, 2009.
  • [14] Susan C Herring and John C Paolillo. Gender and genre variation in weblogs. Journal of Sociolinguistics, 10(4):439–459, 2006.
  • [15] Noelle J. Hum, Perrin E. Chamberlin, Brittany L. Hambright, Anne C. Portwood, Amanda C. Schat, and Jennifer L. Bevan. A picture is worth a thousand words: A content analysis of facebook profile photographs. Computers in Human Behavior, 27(5):1828 – 1833, 2011. 2009 Fifth International Conference on Intelligent Computing.
  • [16] Yangqing Jia, Evan Shelhamer, Jeff Donahue, Sergey Karayev, Jonathan Long, Ross Girshick, Sergio Guadarrama, and Trevor Darrell. Caffe: Convolutional architecture for fast feature embedding. arXiv preprint arXiv:1408.5093, 2014.
  • [17] Moshe Koppel, Shlomo Argamon, and Anat Rachel Shimoni. Automatically categorizing written texts by author gender. Literary and Linguistic Computing, 17(4):401–412, 2002.
  • [18] Alex Krizhevsky, Ilya Sutskever, and Geoffrey E Hinton. Imagenet classification with deep convolutional neural networks. In F. Pereira, C. J. C. Burges, L. Bottou, and K. Q. Weinberger, editors, Advances in Neural Information Processing Systems 25, pages 1097–1105. Curran Associates, Inc., 2012.
  • [19] Yann Lecun, Yoshua Bengio, and Geoffrey Hinton. Deep learning. Nature, 521(7553):436–444, 5 2015.
  • [20] Chunshan Li, William K. Cheung, Yunming Ye, Xiaofeng Zhang, Dianhui Chu, and Xin Li. The author-topic-community model for author interest profiling and community discovery. Knowledge and Information Systems, 44(2):359–383, Aug 2015.
  • [21] T. A. Litvinova, P. V. Seredin, and O. A. Litvinova. Using part-of-speech sequences frequencies in a text to predict author personality: a corpus study. Indian Journal of Science and Technology, 8(S9), 2015.
  • [22] Tatiana Litvinova, Olga Zagorovskaya, Olga Litvinova, and Pavel Seredin. Profiling a Set of Personality Traits of a Text’s Author: A Corpus-Based Approach, pages 555–562. Springer International Publishing, Cham, 2016.
  • [23] A. Pastor López-Monroy, Manuel Montes-y Gómez, Hugo Jair Escalante, Luis Villaseñor Pineda, and Efstathios Stamatatos. Discriminative subprofile-specific representations for author profiling in social media. Know.-Based Syst., 89(C):134–147, November 2015.
  • [24] P. Lovato, M. Bicego, C. Segalin, A. Perina, N. Sebe, and M. Cristani. Faved! biometrics: Tell me which image you like and i’ll tell you who you are. IEEE Transactions on Information Forensics and Security, 9(3):364–374, March 2014.
  • [25] Xiaojun Ma, Y. Tsuboshita, and N. Kato. Gender estimation for sns user profiling using automatic image annotation. In 2014 IEEE International Conference on Multimedia and Expo Workshops (ICMEW), pages 1–6, July 2014.
  • [26] M. Merler, Liangliang Cao, and J. R. Smith. You are what you tweet…pic! gender prediction based on semantic analysis of social media images. In 2015 IEEE International Conference on Multimedia and Expo (ICME), pages 1–6, June 2015.
  • [27] Arjun Mukherjee and Bing Liu. Improving gender classification of blog authors. In Proceedings of the 2010 Conference on Empirical Methods in Natural Language Processing, pages 207–217, 2010.
  • [28] Dong Nguyen, Rilana Gravel, Dolf Trieschnigg, and Theo Meder. How old do you think i am?: A study of language and age in twitter. In Seventh International AAAI Conference on Weblogs and Social Media, 2013.
  • [29] Dong Nguyen, Noah A Smith, and Carolyn P Rosé. Author age prediction from text using linear regression. In Proceedings of the 5th ACL-HLT Workshop on Language Technology for Cultural Heritage, Social Sciences, and Humanities, pages 115–123. Association for Computational Linguistics, 2011.
  • [30] M. Oquab, L. Bottou, I. Laptev, and J. Sivic. Learning and transferring mid-level image representations using convolutional neural networks. In 2014 IEEE Conference on Computer Vision and Pattern Recognition, pages 1717–1724, June 2014.
  • [31] Rosa María Ortega-Mendoza, Anilú Franco-Arcega, Adrián Pastor López-Monroy, and Manuel Montes-y Gómez. I, Me, Mine: The Role of Personal Phrases in Author Profiling, pages 110–122. Springer International Publishing, Cham, 2016.
  • [32] Jahna Otterbacher. Inferring gender of movie reviewers: exploiting writing style, content and metadata. In Proceedings of the 19th ACM international conference on Information and knowledge management, pages 369–378, 2010.
  • [33] Claudia Peersman, Walter Daelemans, and Leona Van Vaerenbergh. Predicting age and gender in online social networks. In Proceedings of the 3rd international workshop on Search and mining user-generated contents, pages 37–44, 2011.
  • [34] P. Peñas, R. del Hoyo, J. Vea-Murguía, C. González, and S. Mayo. Collective knowledge ontology user profiling for twitter – automatic user profiling. In 2013 IEEE/WIC/ACM International Joint Conferences on Web Intelligence (WI) and Intelligent Agent Technologies (IAT), volume 1, pages 439–444, Nov 2013.
  • [35] F. Rangel, P. Rosso, I. Chugur, M. Potthast, M. Trenkmann, B. Stein, B. Verhoeven, and W. Daelemans. Overview of the author profiling task at PAN 2014. In CLEF (Online Working Notes/Labs/Workshop), pages 898–927, 2014.
  • [36] Delip Rao, David Yarowsky, Abhishek Shreevats, and Manaswi Gupta. Classifying latent user attributes in twitter. In Proceedings of the 2nd international workshop on Search and mining user-generated contents, pages 37–44, 2010.
  • [37] Paolo Rosso, Cristina Bosco, Rossana Damiano, Viviana Patti, and Erik Cambria. Emotion and sentiment in social and expressive media: Introduction to the special issue. Information Processing & Management, 52(1):1 – 4, 2016.
  • [38] Olga Russakovsky, Jia Deng, Hao Su, Jonathan Krause, Sanjeev Satheesh, Sean Ma, Zhiheng Huang, Andrej Karpathy, Aditya Khosla, Michael Bernstein, Alexander C. Berg, and Li Fei-Fei. Imagenet large scale visual recognition challenge. Int. J. Comput. Vision, 115(3):211–252, December 2015.
  • [39] Ruchita Sarawgi, Kailash Gajulapalli, and Yejin Choi. Gender attribution: tracing stylometric evidence beyond topic and genre. In Proceedings of the Fifteenth Conference on Computational Natural Language Learning, pages 78–86, 2011.
  • [40] Jonathan Schler, Moshe Koppel, Shlomo Argamon, and James Pennebaker. Effects of age and gender on blogging. In Proceedings of 2006 AAAI Spring Symposium on Computational Approaches for Analyzing Weblogs, pages 199–205, 2006.
  • [41] R. Shigenaka, Y. Tsuboshita, and N. Kato. Content-aware multi-task neural networks for user gender inference based on social media images. In 2016 IEEE International Symposium on Multimedia (ISM), pages 169–172, Dec 2016.
  • [42] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. CoRR, abs/1409.1556, 2014.
  • [43] A. Sorokin and D. Forsyth. Utility data annotation with amazon mechanical turk. In 2008 IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops, pages 1–8, June 2008.
  • [44] Tomoki Taniguchi, Shigeyuki Sakaki, Ryosuke Shigenaka, Yukihiro Tsuboshita, and Tomoko Ohkuma. A Weighted Combination of Text and Image Classifiers for User Gender Inference, pages 87–93. Association for Computational Linguistics, 2015.
  • [45] Yen-Chun Jim Wu, Wei-Hung Chang, and Chih-Hung Yuan. Do facebook profile pictures reflect user’s personality? Comput. Hum. Behav., 51(PB):880–889, October 2015.
  • [46] Xiang Yan and Ling Yan. Gender classification of weblog authors. In AAAI Spring Symposium: Computational Approaches to Analyzing Weblogs, pages 228–230, 2006.
  • [47] Longqi Yang, Cheng-Kang Hsieh, and Deborah Estrin. Beyond classification: Latent user interests profiling from visual contents analysis. CoRR, abs/1512.06785, 2015.
  • [48] J. Yosinski, J. Clune, Y. Bengio, and H. Lipson. How transferable are features in deep neural networks? In Advances in Neural Information Processing Systems, 2014.
  • [49] Jason Yosinski, Jeff Clune, Yoshua Bengio, and Hod Lipson. How transferable are features in deep neural networks? In Proceedings of the 27th International Conference on Neural Information Processing Systems, NIPS’14, pages 3320–3328, 2014.
  • [50] Q. You, S. Bhatia, T. Sun, and J. Luo. The eyes of the beholder: Gender prediction using images posted in online social networks. In 2014 IEEE International Conference on Data Mining Workshop, pages 1026–1030, Dec 2014.
  • [51] Quanzeng You, Sumit Bhatia, and Jiebo Luo. A picture tells a thousand words - about you! user interest profiling from user generated visual content. Signal Processing, 124:45 – 53, 2016. Big Data Meets Multimedia Analytics.
  • [52] Quanzeng You and Jiebo Luo. Towards social imagematics: Sentiment analysis in social multimedia. In Proceedings of the Thirteenth International Workshop on Multimedia Data Mining (MDMKDD) 2013, pages 3:1–3:8, 2013.