
SMP Challenge: An Overview of Social Media Prediction Challenge 2019

The SMP Challenge aims to discover novel prediction tasks over the abundant data on social multimedia and to seek out excellent research teams. Making predictions from social multimedia data (e.g. photos, videos or news) not only helps us make better strategic decisions for the future, but also drives advanced predictive learning and analytic methods for various problems and scenarios, such as multimedia recommendation, advertising systems and fashion analysis. In the SMP Challenge at ACM Multimedia 2019, we introduce a novel prediction task, Temporal Popularity Prediction, which focuses on predicting the future interaction with or attractiveness of new online posts in social media feeds (in terms of clicks, views or likes etc.) before they are uploaded. We also collected and released a large-scale SMPD benchmark with over 480K posts from 69K users. In this paper, we define the challenge problem, give an overview of the dataset, present statistics of its rich data and annotations, and design accuracy and correlation evaluation metrics for temporal popularity prediction in the challenge.


1. Introduction

Figure 1. The SMP Challenge introduces the Temporal Popularity Prediction task for social multimedia. SMPD includes visual content (diverse images categorized into a semantic taxonomy), textual content (e.g. title and custom tags) and spatio-temporal content (e.g. location and time). The popularity score in the figure is the prediction target and is calculated from the “user interactions” of an online post.

People are interested in predicting the future. For example, who will win the upcoming Grammy Awards, or which film will be popular in the next few weeks? Making predictions about the future brings real value to a variety of applications and scenarios (Martin et al., 2016), such as multimedia recommendation (Mei et al., 2011; Khosla et al., 2014; Qian et al., 2014), advertising systems (Li et al., 2015; Zhao et al., 2015), fashion analysis (Lo et al., 2019; Hidayati et al., 2014) and topic mining (Ferrara et al., 2014; Wu et al., 2014; Szabo and Huberman, 2010).

Therefore, the purpose of the SMP Challenge is to discover novel challenge tasks based on the abundant resources of social multimedia and to seek out excellent research teams capable of making such predictions. For prediction, the increasing ubiquity of social media (e.g. Facebook, Twitter, Flickr, YouTube, etc.) provides a crucial way of learning about the real world. Meanwhile, as social media has become globally prevalent, social multimedia data has attracted growing research interest in exploring rich social facts and knowledge from multi-modal information (e.g. images, text, video, events, etc.). So far, research on social media prediction has covered several significant areas of multimedia and artificial intelligence, and is closely integrated with computer vision, machine learning, natural language processing and human-centered interaction.

This year, the task of SMP Challenge 2019 is Temporal Popularity Prediction (Wu et al., 2017; Szabo and Huberman, 2010), which addresses the problem of predicting the future popularity of given posts before they are published on social media. It treats popularity prediction as a time-related prediction problem (Shulman et al., 2016; Kobayashi and Lambiotte, 2016), and formulates popularity as online attention based on various user interactions (e.g. clicks, visits, reviews). To achieve this goal, the participating teams need to design new understanding and learning algorithms that predict automatically from the post content, the future post time and the associated multimedia information (as shown in Figure 1) in a time-related dynamic system (Myers and Leskovec, 2014; Kong et al., 2014; Yang and Leskovec, 2011).

In the literature, several large-scale datasets from social media have been established for various research tasks and have led to great advances in multimedia technology and applications, such as YFCC (Kalkowski et al., 2015), Yelp2016 (Asghar, 2016) and Visual Genome (Krishna et al., 2016). However, most existing datasets are limited in the diversity of their coverage, i.e. the collected data are often biased toward the particular task in question and lack cross-task generalization. Therefore, we introduce the Social Media Prediction Dataset (SMPD), a large-scale benchmark dataset for sociological understanding and prediction with over 486k posts and 80K users in total. SMPD collects multi-faceted information about each post, such as user profile, photo metadata, and visual content. In particular, we aim to record the temporal order of social media data: the posts in the dataset are collected with their temporal information to preserve the continuity of post sequences. Our goal is to make SMPD as varied and rich as possible so that it thoroughly represents the social media “world”.

Figure 2. The histogram of popularity score. Each score range collects the posts which have a popularity score within the range [x-0.5,x].

2. Temporal Popularity Prediction

2.1. Problem Formulation

Temporal Popularity Prediction (TPP) is a novel problem for social media analysis and learning (Wu et al., 2016a). Because of the temporal dynamics of the social multimedia system, the popularity of online posts usually changes over time. Influenced by temporal characteristics (McGrath and Kelly, 1992; Myers and Leskovec, 2014) with complex contexts or patterns, accurately predicting temporal popularity has become more challenging than before. The task of TPP is to estimate the future impact of given social media posts (photos, videos or news) at a specific time before they are shared on the online platform. Specifically, given a new post $v$ of a user $u$, the task is to predict its popularity $s$, which describes how much attention the post would obtain if it were published at time $t$ on social media. Popularity can be defined as a score using different dynamic indicators (e.g. views, likes or clicks, etc.) across diverse social multimedia platforms. In our challenge, we use “viewing count” as the basic indicator of how popular a post is, since it is the most general one. So temporal popularity can be defined as the following:

Popularity Normalization. To suppress the large variations among different photos (e.g. the view counts of different photos vary from zero to millions), we apply a log function (Wu et al., 2016b) to normalize the value of popularity, following previous work, as shown in Figure 2. In brief, the log-normalization function for popularity can be defined as:

$$s = \log_2 \frac{r}{d} + 1$$

where $s$ is the normalized value, $r$ is the view count of a photo, and $d$ is the number of days since the photo was posted.
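As a rough illustration of the log-normalization described above, the following Python sketch assumes the form s = log2(r / d) + 1 used in the cited prior work, with the +1 placed outside the logarithm. That placement, the function name `normalize_popularity`, and the handling of zero view counts are assumptions made for this example and are not specified in the text.

```python
import math


def normalize_popularity(view_count: float, days_since_post: float) -> float:
    """Log-normalize a raw view count into a popularity score.

    Assumed form (not confirmed by the text): s = log2(r / d) + 1,
    where r is the view count and d is the number of days since posting.
    """
    if days_since_post <= 0:
        raise ValueError("days_since_post must be positive")
    if view_count <= 0:
        # Under the assumed form the logarithm is undefined for zero views;
        # how the dataset treats such posts is not stated in the text.
        raise ValueError("view_count must be positive under the assumed form")
    return math.log2(view_count / days_since_post) + 1.0


if __name__ == "__main__":
    # A photo with 1024 views collected over 4 days: log2(256) + 1
    print(normalize_popularity(1024, 4))
```

Dividing by the number of days before taking the logarithm keeps old and new photos comparable, while the logarithm compresses the gap between posts with a handful of views and posts with millions.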

In particular, the post sequence of each user, together with its time information, can be treated as time-series data. SMP Challenge 2019 aims to use such time-series feeds for popularity prediction. We therefore define the sequence data with its time order:

User-Post Sequence. Suppose we have $n$ user-photo pairs and the sharing time of each pair. Then the user-post sequence can be denoted by $S = \{(u_1, v_1), (u_2, v_2), \ldots, (u_n, v_n)\}$ with its sharing time order $t_1 \le t_2 \le \ldots \le t_n$.
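The user-post sequence defined above can be sketched in Python as follows. The `Post` record and its field names (`user_id`, `post_id`, `shared_at`) are hypothetical and are not the actual SMPD schema; the sketch only shows how posts could be ordered by sharing time and grouped per user.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, NamedTuple


class Post(NamedTuple):
    """Hypothetical post record; field names are illustrative only."""
    user_id: str
    post_id: str
    shared_at: int  # sharing time as a Unix timestamp (assumed encoding)


def build_user_post_sequences(posts: Iterable[Post]) -> Dict[str, List[Post]]:
    """Group posts by user, each group ordered by non-decreasing sharing time."""
    sequences: Dict[str, List[Post]] = defaultdict(list)
    # Sorting once globally keeps every per-user list in time order.
    for post in sorted(posts, key=lambda p: p.shared_at):
        sequences[post.user_id].append(post)
    return dict(sequences)


if __name__ == "__main__":
    demo = [
        Post("u1", "p3", 300),
        Post("u2", "p2", 200),
        Post("u1", "p1", 100),
    ]
    for user, seq in build_user_post_sequences(demo).items():
        print(user, [p.post_id for p in seq])
```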

Figure 3. An example of multi-level hierarchical photo categories. It shows a 1st-level category “animal” with 5 different 2nd-level categories.

3. Social Media Prediction Dataset Overview

Social Media Prediction Dataset (SMPD) is a large-scale benchmark for social multimedia research. We selected Flickr, one of the largest photo-sharing websites with over 2 billion photos monthly (Michel, 2019), as the source of multimedia and multi-modal data for SMPD. Unlike single-task datasets, SMPD is a multi-faceted data collection that contains rich contextual information and annotations for multiple tasks (such as user profile, post category, custom tags, geographic information, photo image, and photo metadata). The overall statistics of the dataset are shown in Table 1. It contains over 486K posts from 69K online users. Each social media post has corresponding visual and textual content information (e.g. posted photos, photo categories, custom tags, temporal and geographic information).

SMPD Building. To create a multi-faceted dataset for social media research, we use a concept-based sampling method to collect post data from the search engine of the Flickr platform. The concept-based approach takes a tag or concept as a search keyword and collects the posts associated with that keyword. On this basis, a second selection is performed to ensure the accuracy of the concept-related posts (Huiskes and Lew, 2008). The advantage of this approach is that it offers an accurate data source for theme extraction and feature extraction. Unlike traditional social bookmarking, Flickr does not involve creating an explicit vocabulary of tags to describe a post. Therefore, the reference queries for the different categories are filtered from the most popular tags that users liked most in 2015, as in other tag prediction studies (Jang et al., 2015). We filtered out tags with incomplete or misspelled keywords, such as “insta” or “instadog”. This leaves 756 categories within 11 topics for our dataset creation, as shown in Figures 4 and 5. To keep the time order in our data, we obtained the public post stream continuously for each of the categories every day from November 2015 to March 2016. To capture various properties in our dataset, we extracted abundant data including visual content (e.g. photo and photo categories), textual content (e.g. post title and custom tags) and spatio-temporal content, revealing the influence of region and time zone on online social behavior. The result is a large-scale multi-faceted collection.

Statistics                          Train     Test
Number of posts
Mean popularity of posts            6.41      5.12
STD popularity of posts             2.47      2.41
Number of users
Number of custom tags
Number of 1st-level categories      11
Number of 2nd-level categories      77
Number of 3rd-level categories      668
Temporal range of posts             480 days
Average length of title             29 words

Table 1. Summary of SMPD Statistics.

Figure 4. The statistics of posts in each 1st-level category. It shows the 11 1st-level categories and the number of posts in the train and test data.

3.1. Visual Content

As the old saying goes, “A picture is worth a thousand words”: it is easier for users to express their thoughts or emotions through photos and images on social media (Cappallo et al., 2015). In our dataset, we collect 486k posts by querying 756 selected keywords (as mentioned in the prior section) with social media APIs (team, 2019). These keywords can be organized into 11 topics ranging from nature and people to animal (the directory tree on the left of Figure 3). Furthermore, each keyword represents an individual concept for photo content, such as “bird” or “flower”. The right part of Figure 3 shows an example category, “Animal”, with five different sub-concepts for content visualization. The visual content (photo or image) of posts with the same keyword is visually similar. These categorized photos help prepare train/test data for computer vision work. Furthermore, we generate 668 individual 3rd-level human-crafted categories for the photos of selected posts. Through this 3-level hierarchical classification, SMPD provides fine-grained classes.

3.2. Spatio-temporal Content

3.2.1. Time

Popularity prediction of social media posts is a time-sensitive task (Ellering, 2016). The temporal context of posts records user activities, so it is necessary to identify the upload time. Flickr provides an upload time for each submitted post. Figure 5 plots the average post counts against their upload months in our dataset. Most of the posts in SMPD were uploaded between March 2015 and the creation of the dataset in 2016. Among the upload dates, October and December are the most popular months for sharing posts on the Flickr platform. We attribute these increases to the holidays and the end of the year.

Figure 5. Overall distribution of popularity over days. The discrete black points represent the average popularity score per day. The blue line and shadow interval represent a regression result about the score. Specifically, the left part of the red line represents the training data and the right part represents the test data.

3.2.2. Location

Location information describes the spatial distribution of users (Yang et al., 2016). Not all posts have location information, but where it exists, the location of a photo points to the spatial region of user activities. In SMPD, 32,068 posts have POI (Point of Interest) location information with a geographic coordinate, set either manually by the user or automatically via GPS. Using this information, we were able to map 10% of all items in the dataset to a single country or area. Furthermore, SMPD provides a geo accuracy value representing the accuracy level of the location information, which ranges from 1 to 16, i.e. from world-level to street-level accuracy. To suppress the large variations in the number of posts across territories, we apply a log function to normalize the number of posts in each country, following previous work. The distribution of all items over territories is shown in Figure 6.

Figure 6. The distribution of posts around the world. The legend in the figure represents 5 equal-interval ranges of the log-normalized number of posts in each territory, from 0 to 9.28. The deeper the color, the more posts were shared in the corresponding territory.

3.3. Textual Content

In addition to visual content, we also collected the surrounding text of posts, which provides semantic information for each post. According to our statistics, more than 95% of posts have related descriptions or titles. When a photo is uploaded to the social media platform, the related textual content is attached to provide more details about the photo content or the publisher's status.

Post Title. Each posted photo has a unique title given by the user. As the saying goes, “There are a thousand Hamlets in a thousand people's eyes”: each title contains the user's explanation and understanding of the photo. As shown in Table 1, users use 29 words on average to describe the content of uploaded photos, which not only helps to analyze the visual content of the posts but also relates to the popularity of the corresponding post.

Figure 7. The tag cloud of the 668 3rd-level keywords of photos. The larger the font size, the more frequently the corresponding tag is used.

Post Tags. Most social network sites provide hashtags to make it easier for users to find relevant posts on a topic through a shared tag. A single post can be labeled with multiple tags. This flexibility makes tagging easier than traditional one-to-one classification. By counting how frequently tags are used, we can get a glimpse of which topics are more popular on these social network sites. Figure 7 shows the “tag cloud” of post tags. The larger the font size, the more frequently the corresponding tag is used. Based on the figure, Flickr users prefer to share posts with popular holiday-related tags.

4. Evaluation

To measure the performance of temporal popularity prediction on time-series data, we adopt a time-related partition strategy to generate the train/test sets for evaluation. In the proposed time-series dataset SMPD, we use a time window of length 10 to build user-post sequences and divide the sequences 2:1 into training and test sets. The train and test sets also contain similar numbers of users.
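One simple reading of the time-related partition is a chronological cut: order the posts by sharing time and keep the earliest two thirds for training. The Python sketch below follows that reading only. It does not model the length-10 time window or the balancing of users between the two sets, and the `(post_id, timestamp)` tuples and the function name are hypothetical.

```python
from typing import List, Sequence, Tuple

TimedPost = Tuple[str, int]  # (post_id, sharing time); hypothetical layout


def chronological_split(
    posts: Sequence[TimedPost], train_fraction: float = 2 / 3
) -> Tuple[List[TimedPost], List[TimedPost]]:
    """Split posts so that every training post precedes every test post."""
    if not 0.0 < train_fraction < 1.0:
        raise ValueError("train_fraction must be strictly between 0 and 1")
    ordered = sorted(posts, key=lambda p: p[1])
    cut = round(len(ordered) * train_fraction)
    return ordered[:cut], ordered[cut:]


if __name__ == "__main__":
    demo = [("a", 5), ("b", 1), ("c", 3), ("d", 2), ("e", 4), ("f", 6)]
    train, test = chronological_split(demo)
    print("train:", train)
    print("test:", test)
```

Cutting by time rather than at random keeps future information out of the training set, which matters when the goal is to predict popularity before a post is published.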

By objective evaluation, we measure the performance of submitted methods on the unpublished SMPD test set. Our evaluation protocol is applied to the following criteria:

  • Ranking Relevance: to measure the ordinal association between ranked predicted popularity scores and actual ones.

  • Prediction Error: to judge the error of the score prediction.

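As quantitative metrics for performance evaluation, we compute the Spearman Ranking Correlation (SRC, or Spearman's rho) and the Mean Absolute Error (MAE) for each submitted model. SRC is a nonparametric measure of rank correlation; it is applied to measure the ranking correlation between the ground-truth popularity set $P$ and the predicted popularity set $\hat{P}$, varying from $-1$ to $1$. If there are $k$ samples, the SRC can be expressed as:

$$SRC = \frac{1}{k-1} \sum_{i=1}^{k} \left( \frac{P_i - \bar{P}}{\sigma_P} \right) \left( \frac{\hat{P}_i - \bar{\hat{P}}}{\sigma_{\hat{P}}} \right)$$

where $\bar{P}$ and $\sigma_P$ are the mean and standard deviation of the corresponding popularity set (with the scores replaced by their ranks). Furthermore, we also use the Mean Absolute Error (MAE) to measure the prediction error. MAE is the average absolute difference between predicted and ground-truth scores:

$$MAE = \frac{1}{k} \sum_{i=1}^{k} \left| \hat{P}_i - P_i \right|$$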
The ranking for the competition is based on an objective evaluation. Specifically, a ranked list of teams is produced for each objective evaluation metric by sorting the teams by their performance on that metric. The final rank of a team is calculated by combining its ranks on the two metrics for balance. The smaller the final rank, the better the performance.
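The two metrics and the rank combination described in this section can be sketched in plain Python as follows. Spearman's rho is computed as the Pearson correlation of average ranks, which is the standard definition. The text does not specify how the two per-metric ranks are combined, so summing them in `final_ranking` is an assumption made for this example; all function names are illustrative.

```python
import math
from typing import Dict, List, Sequence, Tuple


def average_ranks(values: Sequence[float]) -> List[float]:
    """Return 1-based ranks, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        tied_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = tied_rank
        i = j + 1
    return ranks


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if norm_x == 0 or norm_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (norm_x * norm_y)


def spearman_rho(truth: Sequence[float], pred: Sequence[float]) -> float:
    """Spearman's rho: Pearson correlation computed on the rank values."""
    return pearson(average_ranks(truth), average_ranks(pred))


def mean_absolute_error(truth: Sequence[float], pred: Sequence[float]) -> float:
    """Average absolute difference between predictions and ground truth."""
    return sum(abs(t - p) for t, p in zip(truth, pred)) / len(truth)


def final_ranking(scores: Dict[str, Tuple[float, float]]) -> List[str]:
    """Order teams by the sum of their SRC rank and MAE rank.

    `scores` maps a team name to (src, mae). Summing the two ranks is an
    assumption; the text only says the two ranks are combined.
    """
    by_src = sorted(scores, key=lambda t: -scores[t][0])  # higher SRC is better
    by_mae = sorted(scores, key=lambda t: scores[t][1])   # lower MAE is better
    combined = {t: by_src.index(t) + by_mae.index(t) + 2 for t in scores}
    return sorted(scores, key=lambda t: combined[t])


if __name__ == "__main__":
    truth = [6.4, 3.1, 9.8, 5.0]
    pred = [6.0, 3.5, 8.9, 5.6]
    print("SRC:", spearman_rho(truth, pred))
    print("MAE:", mean_absolute_error(truth, pred))
```

Ranking teams per metric before combining keeps one metric's scale from dominating the other, since SRC is bounded while MAE is not.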

5. Conclusions

In this paper, we have presented an overview of SMP Challenge 2019 and proposed a large-scale social multimedia dataset for real-world prediction challenges. We also formulate the temporal popularity prediction task, analyze the proposed dataset and define the evaluation metrics. More information about the task, dataset and challenge can be found on the SMP Challenge website.

We would like to thank CAS-ICT, Microsoft Research Asia and Academia Sinica for their helpful support. SMP Challenge 2019 was supported by Columbia University. SMP Challenge 2017 and 2018 were sponsored by Kwai Inc. and JD AI Research.

References


  • N. Asghar (2016) Yelp dataset challenge: review rating prediction. CoRR abs/1605.05362. Cited by: §1.
  • S. Cappallo, T. Mensink, and C. G. Snoek (2015) Latent factors of visual popularity prediction. In Proceedings of ACM International Conference on Multimedia Retrieval, Cited by: §3.1.
  • N. Ellering (2016) What 16 studies say about the best times to post on social media. Note:[Online] Cited by: §3.2.1.
  • E. Ferrara, R. Interdonato, and A. Tagarelli (2014) Online popularity and topical interests through the lens of instagram. In Proceedings of the 25th ACM Conference on Hypertext and Social Media, HT ’14, pp. 24–34. Cited by: §1.
  • S. C. Hidayati, K. Hua, W. Cheng, and S. Sun (2014) What are the fashion trends in new york. In Proceedings of ACM International Conference on Multimedia (ACM MM), Cited by: §1.
  • M. J. Huiskes and M. S. Lew (2008) The mir flickr retrieval evaluation. In Proceedings of the 1st ACM International Conference on Multimedia Information Retrieval, MIR ’08, pp. 39–43. Cited by: §3.
  • J. Y. Jang, K. Han, P. C. Shih, and D. Lee (2015) Generation like: comparative characteristics in instagram. In Proceedings of the 33rd Annual ACM Conference on Human Factors in Computing Systems, pp. 4039–4042. Cited by: §3.
  • S. Kalkowski, C. Schulze, A. Dengel, and D. Borth (2015) Real-time analysis and visualization of the yfcc100m dataset. In Proceedings of the 2015 Workshop on Community-Organized Multimodal Mining: Opportunities for Novel Solutions, MMCommons ’15, pp. 25–30. Cited by: §1.
  • A. Khosla, A. Das Sarma, and R. Hamid (2014) What makes an image popular?. In Proceedings of International World Wide Web Conference (WWW), Cited by: §1.
  • R. Kobayashi and R. Lambiotte (2016) TiDeH time-dependent hawkes process for predicting retweet dynamics. In arXiv preprint arXiv:1603.09449, Cited by: §1.
  • S. Kong, Q. Mei, L. Feng, F. Ye, and Z. Zhao (2014) Predicting bursts and popularity of hashtags in real-time. In Proceedings of the 37th international ACM SIGIR Conference on Research & development in Information Retrieval, pp. 927–930. Cited by: §1.
  • R. Krishna, Y. Zhu, O. Groth, J. Johnson, K. Hata, J. Kravitz, S. Chen, Y. Kalantidis, L. Li, D. A. Shamma, M. S. Bernstein, and F. Li (2016) Visual genome: connecting language and vision using crowdsourced dense image annotations. CoRR abs/1602.07332. Cited by: §1.
  • C. Li, Y. Lu, Q. Mei, D. Wang, and S. Pandey (2015) Click-through prediction for advertising in twitter timeline. In Proceedings of KDD, Cited by: §1.
  • L. Lo, C. Liu, R. Lin, B. Wu, and W. Cheng (2019) Dressing for attention: outfit based fashion popularity prediction. In IEEE International Conference on Image Processing (ICIP), Cited by: §1.
  • T. Martin, J. M. Hofman, A. Sharma, A. Anderson, and D. J. Watts (2016) Exploring limits to prediction in complex social systems. In Proceedings of International World Wide Web Conference (WWW), Cited by: §1.
  • J. E. McGrath and J. R. Kelly (1992) Temporal context and temporal patterning. Time & Society 1 (3), pp. 399–420. Cited by: §2.1.
  • T. Mei, B. Yang, X. Hua, and S. Li (2011) Contextual video recommendation by multimodal relevance and user feedback. ACM Transactions on Information Systems. Cited by: §1.
  • F. Michel (2019) How many photos are uploaded monthly to flickr?. Note:[Online] Cited by: §3.
  • S. A. Myers and J. Leskovec (2014) The bursty dynamics of the twitter information network. In Proceedings of the 23rd International Conference on World Wide Web, pp. 913–924. Cited by: §1, §2.1.
  • X. Qian, H. Feng, G. Zhao, and T. Mei (2014) Personalized recommendation combining user interest and social circle. IEEE Transactions on Knowledge and Data Engineering 26 (7), pp. 1763–1777. Cited by: §1.
  • B. Shulman, A. Sharma, and D. Cosley (2016) Predictability of popularity: gaps between prediction and understanding. In Proceedings of AAAI Conference on Artificial Intelligence, Cited by: §1.
  • G. Szabo and B. A. Huberman (2010) Predicting the popularity of online content. Communications of the ACM 53 (8), pp. 80–88. Cited by: §1, §1.
  • F. team (2019) Flickr api. Note:[Online] Cited by: §3.1.
  • B. Wu, W. Cheng, Y. Zhang, and T. Mei (2016a) Time matters: multi-scale temporalization of social media popularity. In Proceedings of the 2016 ACM on Multimedia Conference (ACM MM), Cited by: §2.1.
  • B. Wu, W. Cheng, Y. Zhang, H. Qiushi, L. Jintao, and T. Mei (2017) Sequential prediction of social media popularity with deep temporal context networks. In International Joint Conference on Artificial Intelligence (IJCAI), Cited by: §1.
  • B. Wu, T. Mei, W. Cheng, and Y. Zhang (2016b) Unfolding temporal dynamics: predicting social media popularity using multi-scale temporal decomposition. In Proceedings of the Thirtieth AAAI Conference on Artificial Intelligence (AAAI), Cited by: §2.1.
  • C. Wu, T. Mei, W. H. Hsu, and Y. Rui (2014) Learning to personalize trending image search suggestion. In Proceedings of International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 727–736. Cited by: §1.
  • D. Yang, D. Zhang, and B. Qu (2016) Participatory cultural mapping based on collective behavior data in location-based social networks. ACM Transactions on Intelligent Systems and Technology 7 (3), pp. 30:1–30:23. Cited by: §3.2.2.
  • J. Yang and J. Leskovec (2011) Patterns of temporal variation in online media. In Proceedings of ACM International Conference on Web Search and Data Mining (WSDM), Cited by: §1.
  • Q. Zhao, M. A. Erdogdu, H. Y. He, A. Rajaraman, and J. Leskovec (2015) Seismic: a self-exciting point process model for predicting tweet popularity. In Proceedings of ACM SIGKDD Conference on Knowledge Discovery and Data Mining, pp. 1513–1522. Cited by: §1.