Automatic Query Optimization for Retrieving Traffic Tweets

06/21/2020
by Emory Hufbauer, et al.

Twitter, like many social media and data brokering companies, makes its data available through a search API (application programming interface). In addition to filtering results by date and location, researchers can search for tweets with specific content using a boolean text query, in which AND, OR, and NOT operators select the combinations of phrases that must, or must not, appear in matching tweets. This boolean text search system is not unique to Twitter; it is found in many other contexts, including academic, legal, and medical databases. It is, however, stretched to its limits in Twitter's use case because of the relative volume and brevity of tweets. The semi-automated use of such systems was well studied under the topic of Information Retrieval during the 1980s and 1990s, but research on them has greatly declined since then. We therefore propose updated methods for automatically selecting and refining complex boolean search queries that can isolate relevant results with greater specificity and completeness. We also present preliminary results from using an optimized query to collect a sample of traffic-incident-related tweets, along with the results of manually classifying and analyzing them.
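To make the idea of a boolean text query concrete, the following is a minimal sketch rather than the authors' method or Twitter's actual query syntax. It models a query as a combination of AND, OR, and NOT over phrase matches and scores it against a small hand-labeled sample. The helper names (`term`, `AND`, `OR`, `NOT`, `precision_recall`) and the sample tweets are hypothetical. Precision and recall are used here as one common way to quantify specificity and completeness, which is an assumption about how those terms might be measured.

```python
from typing import Callable, Iterable, Tuple

# A query is modeled as a predicate over tweet text (illustrative assumption).
Query = Callable[[str], bool]


def term(phrase: str) -> Query:
    """Match tweets containing the phrase as a case-insensitive substring."""
    needle = phrase.lower()
    return lambda text: needle in text.lower()


def AND(*queries: Query) -> Query:
    """Match only when every sub-query matches."""
    return lambda text: all(q(text) for q in queries)


def OR(*queries: Query) -> Query:
    """Match when at least one sub-query matches."""
    return lambda text: any(q(text) for q in queries)


def NOT(query: Query) -> Query:
    """Match when the sub-query does not match."""
    return lambda text: not query(text)


def precision_recall(
    query: Query, labeled: Iterable[Tuple[str, bool]]
) -> Tuple[float, float]:
    """Score a query against (text, is_relevant) pairs."""
    tp = fp = fn = 0
    for text, relevant in labeled:
        matched = query(text)
        if matched and relevant:
            tp += 1
        elif matched:
            fp += 1
        elif relevant:
            fn += 1
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Hypothetical hand-labeled sample: (tweet text, is it about a traffic incident?)
sample = [
    ("Crash on I-75 northbound, two lanes blocked", True),
    ("Accident near exit 12, expect delays", True),
    ("My diet was a total crash and burn", False),
    ("Stock market crash fears grow", False),
    ("Heavy traffic downtown after the game", True),
]

# (crash OR accident) AND NOT (diet OR stock)
query = AND(OR(term("crash"), term("accident")), NOT(OR(term("diet"), term("stock"))))

precision, recall = precision_recall(query, sample)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

On this toy sample the NOT clause removes both off-topic "crash" tweets, but the query still misses the relevant tweet that mentions neither "crash" nor "accident". That trade-off between excluding irrelevant matches and covering all relevant ones is what automatic query refinement would try to balance.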


