Credible Sample Elicitation by Deep Learning, for Deep Learning
It is important to collect credible training samples (x,y) for building data-intensive learning systems (e.g., a deep learning system). In the literature, there is a line of studies on eliciting distributional information from self-interested agents who hold a relevant information. Asking people to report complex distribution p(x), though theoretically viable, is challenging in practice. This is primarily due to the heavy cognitive loads required for human agents to reason and report this high dimensional information. Consider the example where we are interested in building an image classifier via first collecting a certain category of high-dimensional image data. While classical elicitation results apply to eliciting a complex and generative (and continuous) distribution p(x) for this image data, we are interested in eliciting samples x_i ∼ p(x) from agents. This paper introduces a deep learning aided method to incentivize credible sample contributions from selfish and rational agents. The challenge to do so is to design an incentive-compatible score function to score each reported sample to induce truthful reports, instead of an arbitrary or even adversarial one. We show that with accurate estimation of a certain f-divergence function we are able to achieve approximate incentive compatibility in eliciting truthful samples. We then present an efficient estimator with theoretical guarantee via studying the variational forms of f-divergence function. Our work complements the literature of information elicitation via introducing the problem of sample elicitation. We also show a connection between this sample elicitation problem and f-GAN, and how this connection can help reconstruct an estimator of the distribution based on collected samples.
READ FULL TEXT