Mining Hidden Populations through Attributed Search
Researchers often query online social platforms through their application programming interfaces (APIs) to find target populations such as people with mental illness [De-Choudhury2017] and jazz musicians [heckathorn2001finding]. Entities in such a target population satisfy a property that is typically identified using an oracle (a human or a pre-trained classifier). When the property of the target entities is not directly queryable via the API, we refer to the property as 'hidden' and to the population as a hidden population. Finding individuals who belong to these populations on social networks is hard because the defining property cannot be queried, so the sampler has to explore a combinatorial space of queries within a finite budget. By exploiting the correlation between queryable attributes and the population of interest, and by hierarchically ordering the query space, we propose a Decision tree-based Thompson sampler (DT-TMP) that efficiently discovers the right combination of attributes to query. Our proposed sampler outperforms state-of-the-art samplers in online experiments, for example by 54% on Twitter. In offline experiments, where the number of entities matching a query is known, DT-TMP outperforms the baseline samplers by a factor of 0.9-1.5×. In the future, we wish to explore finding hidden populations by formulating more complex queries.
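To make the budgeted search concrete, the following is a minimal sketch of plain Beta-Bernoulli Thompson sampling over a flat set of candidate queries. It is not the DT-TMP sampler itself: the decision-tree ordering of the query space is omitted, and the function name, the query labels, and the simulated oracle with fixed hit rates are all assumptions made for illustration.

```python
import random


def thompson_sample_queries(hit_rates, budget, seed=0):
    """Spend `budget` oracle calls across candidate queries (illustrative sketch).

    hit_rates maps a hypothetical query label to the probability that an
    entity returned by that query belongs to the hidden population. The
    sampler never reads these values directly; they only drive the
    simulated oracle below.
    """
    rng = random.Random(seed)
    # Beta(1, 1) prior on each query's hit rate.
    successes = {q: 1 for q in hit_rates}
    failures = {q: 1 for q in hit_rates}
    pulls = {q: 0 for q in hit_rates}
    found = 0

    for _ in range(budget):
        # Draw one plausible hit rate per query from its posterior
        # and issue the query whose draw is largest.
        draws = {
            q: rng.betavariate(successes[q], failures[q]) for q in hit_rates
        }
        query = max(draws, key=draws.get)

        # Simulated oracle: a stand-in for a human or classifier label.
        is_target = rng.random() < hit_rates[query]

        # Update the posterior of the query that was issued.
        if is_target:
            successes[query] += 1
            found += 1
        else:
            failures[query] += 1
        pulls[query] += 1

    return found, pulls


if __name__ == "__main__":
    # Hypothetical attribute combinations and hit rates.
    rates = {
        "bio:jazz AND loc:NYC": 0.9,
        "bio:music": 0.05,
        "loc:NYC": 0.02,
    }
    found, pulls = thompson_sample_queries(rates, budget=500)
    print(found, pulls)
```

In this sketch the budget concentrates on whichever query keeps returning target entities, which is the behavior a hierarchical sampler would additionally exploit by refining promising attribute combinations.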