Mining Hidden Populations through Attributed Search

05/11/2019
by Suhansanu Kumar, et al.

Researchers often query online social platforms through their application programming interfaces (APIs) to find target populations, such as people with mental illness [De-Choudhury2017] and jazz musicians [heckathorn2001finding]. Members of such a target population satisfy a property that is typically identified by an oracle (a human or a pre-trained classifier). When this property cannot be queried directly through the API, we call the property "hidden" and the population a hidden population. Finding members of hidden populations on social networks is hard: the defining property is not queryable, so the sampler must search a combinatorial space of attribute queries within a finite query budget. By exploiting the correlation between queryable attributes and the population of interest, and by ordering the query space hierarchically, we propose a decision tree-based Thompson sampler (DT-TMP) that efficiently discovers the right combination of attributes to query. In online experiments, DT-TMP outperforms state-of-the-art samplers, for example by 54% on Twitter. In offline experiments, where the number of entities matching each query is known, DT-TMP outperforms the baseline samplers by a factor of 0.9-1.5×. In future work, we plan to find hidden populations by formulating more complex queries.
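
To make the search setting concrete, here is a minimal sketch of plain Beta-Bernoulli Thompson sampling over a flat list of candidate attribute queries. It is not DT-TMP: it omits the decision-tree ordering of the query space entirely. The attribute names, their hit rates, the simulated oracle, and the simplification that each query returns exactly one entity are all hypothetical and exist only for illustration.

```python
import random
from dataclasses import dataclass
from typing import Callable


@dataclass
class QueryArm:
    """One candidate API query: a hypothetical combination of queryable attributes."""
    attributes: tuple[str, ...]
    successes: int = 0  # returned entities the oracle placed in the hidden population
    failures: int = 0   # returned entities the oracle rejected

    def sample_rate(self, rng: random.Random) -> float:
        # Draw a plausible hit rate from the Beta(1 + s, 1 + f) posterior (uniform prior).
        return rng.betavariate(1 + self.successes, 1 + self.failures)


def thompson_search(
    arms: list[QueryArm],
    oracle: Callable[[tuple[str, ...], random.Random], bool],
    budget: int,
    rng: random.Random,
) -> int:
    """Spend `budget` queries; assumes (hypothetically) each query returns one entity."""
    found = 0
    for _ in range(budget):
        # Pick the query whose sampled hit rate is highest this round.
        arm = max(arms, key=lambda a: a.sample_rate(rng))
        if oracle(arm.attributes, rng):
            arm.successes += 1
            found += 1
        else:
            arm.failures += 1
    return found


# Hypothetical ground-truth hit rates, used only by the simulated oracle.
TRUE_RATES = {
    ("genre:jazz",): 0.10,
    ("genre:jazz", "role:musician"): 0.80,
    ("role:musician",): 0.05,
}


def simulated_oracle(attributes: tuple[str, ...], rng: random.Random) -> bool:
    # Stands in for a human or a pre-trained classifier labeling the returned entity.
    return rng.random() < TRUE_RATES[attributes]


rng = random.Random(0)
arms = [QueryArm(attrs) for attrs in TRUE_RATES]
found = thompson_search(arms, simulated_oracle, budget=500, rng=rng)
for arm in arms:
    print(arm.attributes, "queries:", arm.successes + arm.failures)
print("hidden-population members found:", found)
```

Because the sampler draws from each query's posterior rather than always exploiting the current best estimate, it keeps spending a small share of the budget on weaker queries while concentrating most queries on the attribute combination with the highest observed hit rate.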

research
04/04/2023

High-Throughput Vector Similarity Search in Knowledge Graphs

There is an increasing adoption of machine learning for encoding data in...
research
04/23/2019

Optimizing Search API Queries for Twitter Topic Classifiers Using a Maximum Set Coverage Approach

Twitter has grown to become an important platform to access immediate in...
research
01/12/2023

The Keyword Explorer Suite: A Toolkit for Understanding Online Populations

We have developed a set of Python applications that use large language m...
research
08/10/2022

Population Size Estimation for Respondent-Driven Sampling and Capture-Recapture: A Unifying Framework

This paper deals with the estimation of population sizes for respondent-...
research
03/04/2020

Generate Descriptive Social Networks for Large Populations from Available Observations: A Novel Methodology and a Generator

When modeling a social dynamics with an agent-oriented approach, researc...
research
04/02/2020

Mapping Languages and Demographics with Georeferenced Corpora

This paper evaluates large georeferenced corpora, taken from both web-cr...
research
09/12/2018

Access to Population-Level Signaling as a Source of Inequality

We identify and explore differential access to population-level signalin...