Mining Hidden Populations through Attributed Search

05/11/2019
by Suhansanu Kumar, et al.

Researchers often query online social platforms through their application programming interfaces (APIs) to find target populations such as people with mental illness [De-Choudhury2017] and jazz musicians [heckathorn2001finding]. Entities in such a target population satisfy a property that is typically identified using an oracle (a human or a pre-trained classifier). When the property of the target entities is not directly queryable via the API, we refer to the property as 'hidden' and to the population as a hidden population. Finding individuals who belong to these populations on social networks is hard because the hidden property cannot be queried directly, so the sampler has to explore a combinatorial query space within a finite budget. By exploiting the correlation between queryable attributes and the population of interest, and by hierarchically ordering the query space, we propose a decision tree-based Thompson sampler (DT-TMP) that efficiently discovers the right combination of attributes to query. Our proposed sampler outperforms state-of-the-art samplers in online experiments, for example by 54% on Twitter. In offline experiments, where the number of entities matching a query is known, DT-TMP performs exceedingly well, by a factor of 0.9-1.5× over the baseline samplers. In future work, we wish to explore finding hidden populations by formulating more complex queries.
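
The core idea of Thompson sampling over queryable attributes can be illustrated with a minimal sketch. This is not the authors' DT-TMP: it omits the decision-tree ordering of the query space and treats each candidate query as an independent arm with a Beta posterior over its hit rate. The names `thompson_query_search`, `run_query`, `oracle`, and the toy hit rates are hypothetical and exist only for illustration.

```python
import random


def thompson_query_search(queries, run_query, oracle, budget, seed=0):
    """Flat Beta-Bernoulli Thompson sampling over candidate queries.

    Assumption: each query is an independent arm; the hierarchical
    (decision-tree) structure described in the abstract is not modeled.
    """
    rng = random.Random(seed)
    # Beta(1, 1) prior on every query's chance of returning a target entity.
    successes = {q: 1 for q in queries}
    failures = {q: 1 for q in queries}
    found = []
    for _ in range(budget):
        # Draw one plausible hit rate per query and issue the best-looking one.
        chosen = max(
            queries,
            key=lambda q: rng.betavariate(successes[q], failures[q]),
        )
        entity = run_query(chosen, rng)
        # The oracle (human or classifier) decides whether the entity is a target.
        if oracle(entity):
            successes[chosen] += 1
            found.append(entity)
        else:
            failures[chosen] += 1
    return found, successes, failures


# Toy stand-ins for an API and an oracle (hypothetical hit rates).
TOY_HIT_RATES = {"location=NYC": 0.05, "bio~music": 0.6, "lang=en": 0.2}


def toy_run_query(query, rng):
    # Returns (query, is_target); a real API would return a user record.
    return (query, rng.random() < TOY_HIT_RATES[query])


def toy_oracle(entity):
    return entity[1]
```

Calling `thompson_query_search(list(TOY_HIT_RATES), toy_run_query, toy_oracle, budget=500)` spends most of the budget on the query with the highest hit rate once its posterior sharpens. A practical sampler would also deduplicate returned entities, which this sketch does not do.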
