Deep Unified Multimodal Embeddings for Understanding both Content and Users in Social Media Networks

by   Karan Sikka, et al.
SRI International

There has been an explosion of multimodal content generated on social media networks in the last few years, which has necessitated a deeper understanding of social media content and user behavior. We present a novel content-independent content-user-reaction model for social multimedia content analysis. Compared to prior works that generally tackle semantic content understanding and user behavior modeling in isolation, we propose a generalized solution to these problems within a unified framework. We embed users, images, and text drawn from open social media in a common multimodal geometric space, using a novel loss function designed to cope with distant and disparate modalities, and thereby enable seamless three-way retrieval. Our model not only outperforms unimodal embedding-based methods on cross-modal retrieval tasks but also shows improvements on Twitter data stemming from jointly solving the two tasks. We also show that the user embeddings learned within our joint multimodal embedding model are better at predicting user interests on Instagram data than those learned with unimodal content. Our framework thus goes beyond the prior practice of using explicit leader-follower link information to establish affiliations, instead extracting implicit content-centric affiliations from isolated users. We provide qualitative results showing that the user clusters emerging from the learned embeddings have consistent semantics and that our model can discover fine-grained semantics from noisy and unstructured data. Our work reveals that social multimedia content is inherently multimodal and possesses a consistent structure, because in social networks meaning is created through interactions between users and content.
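The core idea of embedding users, images, and text in a common geometric space and performing three-way retrieval can be sketched in a few lines. The snippet below is an illustrative toy, not the authors' architecture: the linear "encoders", embedding dimensions, and the margin-based triplet loss (a standard choice for cross-modal ranking, used here as a stand-in for the paper's loss) are all assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
d_user, d_img, d_txt, d_joint = 16, 32, 24, 8

# Toy linear "encoders" projecting each modality into the shared space
# (real systems would use learned deep networks per modality).
W_user = rng.normal(size=(d_user, d_joint))
W_img = rng.normal(size=(d_img, d_joint))
W_txt = rng.normal(size=(d_txt, d_joint))

def embed(x, W):
    """Project features into the joint space and L2-normalize,
    so dot products are cosine similarities."""
    z = x @ W
    return z / np.linalg.norm(z, axis=-1, keepdims=True)

def triplet_loss(anchor, positive, negative, margin=0.2):
    """Margin ranking loss on cosine similarity: pull matching
    cross-modal pairs together, push mismatched pairs apart."""
    sim_pos = np.sum(anchor * positive, axis=-1)
    sim_neg = np.sum(anchor * negative, axis=-1)
    return np.maximum(0.0, margin - sim_pos + sim_neg).mean()

# Toy batch: 4 posts, each with user, image, and text feature vectors.
users = embed(rng.normal(size=(4, d_user)), W_user)
imgs = embed(rng.normal(size=(4, d_img)), W_img)
txts = embed(rng.normal(size=(4, d_txt)), W_txt)

# Loss for user->image matching, with shifted images as negatives.
loss = triplet_loss(users, imgs, np.roll(imgs, 1, axis=0))

# Three-way retrieval: rank images by similarity to a text query.
ranking = np.argsort(-(imgs @ txts[0]))
print("loss:", loss)
print("images ranked for text query 0:", ranking)
```

Because all three modalities land in one normalized space, the same nearest-neighbor ranking works for any direction of retrieval (text-to-image, image-to-user, user-to-text); training would then backpropagate the loss into the encoders.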


