Social Media Text Processing and Semantic Analysis for Smart Cities

With the rise of social media, people obtain and share information almost instantly on a 24/7 basis. Many research areas have tried to gain valuable insights from these large volumes of freely available user-generated content. With the goal of extracting knowledge from social media streams that might be useful in the context of intelligent transportation systems and smart cities, we designed and developed a framework that provides functionalities for parallel collection of geo-located tweets from multiple pre-defined bounding boxes (cities or regions), including filtering of non-complying tweets, text pre-processing for Portuguese and English, topic modeling, and transportation-specific text classifiers, as well as aggregation and data visualization. We performed an exploratory data analysis of geo-located tweets in five cities: Rio de Janeiro, São Paulo, New York City, London and Melbourne, comprising a total of more than 43 million tweets over a period of 3 months. Furthermore, we performed a large-scale topic modeling comparison between Rio de Janeiro and São Paulo. Interestingly, most of the topics are shared between the two cities, which, despite being in the same country, are considered very different in population, economy and lifestyle. We take advantage of recent developments in word embeddings and train such representations on the collections of geo-located tweets. We then use a combination of bag-of-embeddings and traditional bag-of-words features to train classifiers in both Portuguese and English that separate travel-related content from unrelated content. We created specific gold-standard data to evaluate the resulting classifiers empirically. The results are in line with research in other application areas, showing the robustness of using word embeddings to learn word similarities that bag-of-words is not able to capture.
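As a rough illustration of what combining bag-of-embeddings with bag-of-words features can look like, the following minimal sketch concatenates word counts with an averaged embedding vector. It is not the authors' implementation: the toy `EMBEDDINGS` values, the `VOCABULARY`, the whitespace tokenizer, and the choice of averaging as the pooling step are all assumptions made for the example.

```python
from collections import Counter

# Toy 2-dimensional embeddings with made-up values (assumption).
# In the setting described above, they would be trained on the tweet collections.
EMBEDDINGS = {
    "bus": [0.9, 0.1],
    "train": [0.8, 0.2],
    "late": [0.4, 0.6],
    "coffee": [0.1, 0.9],
}
VOCABULARY = ["bus", "train", "late", "coffee"]  # assumed bag-of-words vocabulary
DIM = 2  # dimensionality of the toy embeddings


def tokenize(text):
    """Lowercase and split on whitespace (a deliberately naive tokenizer)."""
    return text.lower().split()


def bag_of_words(tokens):
    """Count how often each vocabulary word occurs in the token list."""
    counts = Counter(tokens)
    return [float(counts[word]) for word in VOCABULARY]


def bag_of_embeddings(tokens):
    """Average the embeddings of known tokens; zeros if none are known."""
    vectors = [EMBEDDINGS[t] for t in tokens if t in EMBEDDINGS]
    if not vectors:
        return [0.0] * DIM
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(DIM)]


def features(text):
    """Concatenate bag-of-words counts with the averaged embedding vector."""
    tokens = tokenize(text)
    return bag_of_words(tokens) + bag_of_embeddings(tokens)
```

Calling `features("Bus late again")` yields a vector with one entry per vocabulary word followed by one entry per embedding dimension, which could then be passed to any standard classifier. The embedding part lets tweets that use different but related words end up with similar feature vectors, which word counts alone cannot provide.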

research
10/25/2017

Linking Tweets with Monolingual and Cross-Lingual News using Transformed Word Embeddings

Social media platforms have grown into an important medium to spread inf...
research
10/27/2016

Word Embeddings to Enhance Twitter Gang Member Profile Identification

Gang affiliates have joined the masses who use social media to share tho...
research
08/06/2021

Deriving Disinformation Insights from Geolocalized Twitter Callouts

This paper demonstrates a two-stage method for deriving insights from so...
research
08/20/2018

Learning from #Barcelona Instagram data what Locals and Tourists post about its Neighbourhoods

Massive tourism is becoming a big problem for some cities, such as Barce...
research
02/04/2020

From Topic Networks to Distributed Cognitive Maps: Zipfian Topic Universes in the Area of Volunteered Geographic Information

Are nearby places (e.g. cities) described by related words? In this arti...
research
01/16/2020

#MeToo on Campus: Studying College Sexual Assault at Scale Using Data Reported on Social Media

Recently, the emergence of the #MeToo trend on social media has empowere...
research
10/28/2020

Micromobility in Smart Cities: A Closer Look at Shared Dockless E-Scooters via Big Social Data

The micromobility is shaping first- and last-mile travels in urban areas...
