Deep Text Mining of Instagram Data Without Strong Supervision

09/24/2019
by Kim Hammar, et al.
With the advent of social media, our online feeds increasingly consist of short, informal, and unstructured text. This textual data can be analyzed to improve user recommendations and detect trends. Instagram is one of the largest social media platforms, containing both text and images. However, most prior research on text processing in social media focuses on Twitter data, and little attention has been paid to text mining of Instagram data. Moreover, many text mining methods rely on annotated training data, which in practice is both difficult and expensive to obtain. In this paper, we present methods for unsupervised mining of fashion attributes from Instagram text, which can enable a new kind of user recommendation in the fashion domain. In this context, we analyze a corpus of Instagram posts from the fashion domain, introduce a system for extracting fashion attributes from Instagram, and train a deep clothing classifier with weak supervision to classify Instagram posts based on the associated text. Our experiments confirm that word embeddings are a useful asset for information extraction: information extraction using word embeddings outperforms a baseline that uses Levenshtein distance. The results also show the benefit of combining weak supervision signals using generative models instead of majority voting. Using weak supervision and generative modeling, an F1 score of 0.61 is achieved on the task of classifying the image contents of Instagram posts based solely on the associated text, which is on par with human performance. Finally, our empirical study provides one of the few available studies on Instagram text and shows that the text is noisy, that the text distribution exhibits the long-tail phenomenon, and that comment sections on Instagram are multilingual.
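To make the two baselines named in the abstract concrete, the following is a minimal Python sketch of Levenshtein-distance matching of a noisy token against a vocabulary, and of majority voting over weak supervision signals. It is an illustration only, not the authors' implementation: the function names, the `max_distance` threshold, the example vocabulary, and the use of `None` as an abstain vote are all assumptions made for this sketch.

```python
from collections import Counter
from typing import Optional, Sequence


def levenshtein(a: str, b: str) -> int:
    """Edit distance between a and b (insertions, deletions, substitutions)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def closest_term(token: str, vocabulary: Sequence[str],
                 max_distance: int = 2) -> Optional[str]:
    """Map a noisy token to the nearest vocabulary term, or None if too far.

    max_distance is a hypothetical threshold chosen for this sketch.
    """
    if not vocabulary:
        return None
    token = token.lower()
    best = min(vocabulary, key=lambda term: levenshtein(token, term))
    return best if levenshtein(token, best) <= max_distance else None


def majority_vote(votes: Sequence[Optional[str]]) -> Optional[str]:
    """Combine weak labels by majority; None votes are treated as abstentions.

    Ties go to the label that was seen first among the tied labels.
    """
    counts = Counter(v for v in votes if v is not None)
    if not counts:
        return None
    return counts.most_common(1)[0][0]


if __name__ == "__main__":
    vocab = ["jeans", "dress", "coat"]  # hypothetical fashion vocabulary
    print(closest_term("jeanss", vocab))
    print(majority_vote(["dress", "dress", None, "coat"]))
```

Majority voting treats every weak signal as equally reliable. The abstract reports that combining the signals with a generative model works better than this simple vote.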


