Strategyproof Learning: Building Trustworthy User-Generated Datasets

06/04/2021
by   Sadegh Farhadkhani, et al.
0

Today's large-scale machine learning algorithms harness massive amounts of user-generated data to train large models. However, especially in the context of content recommendation with enormous social, economical and political incentives to promote specific views, products or ideologies, strategic users might be tempted to fabricate or mislabel data in order to bias algorithms in their favor. Unfortunately, today's learning schemes strongly incentivize such strategic data misreporting. This is a major concern, as it endangers the trustworthiness of the entire training datasets, and questions the safety of any algorithm trained on such datasets. In this paper, we show that, perhaps surprisingly, incentivizing data misreporting is not a fatality. We propose the first personalized collaborative learning framework, Licchavi, with provable strategyproofness guarantees through a careful design of the underlying loss function. Interestingly, we also prove that Licchavi is Byzantine resilient: it tolerates a minority of users that provide arbitrary data.

READ FULL TEXT

Please sign up or login with your details

Forgot password? Click here to reset