ProtoDash: Fast Interpretable Prototype Selection
In this paper we propose ProtoDash, an efficient algorithm for selecting prototypical examples from complex datasets. Our work builds on the learn-to-criticize (L2C) framework of Kim et al. (2016) and generalizes it to not only select prototypes for a given sparsity level m, but also associate a non-negative weight with each prototype indicating its importance. Unlike L2C, this extension provides a single coherent framework under which both prototypes and criticisms (i.e., the lowest-weighted prototypes) can be found. Furthermore, our framework works for any symmetric positive definite kernel, thus addressing one of the open questions raised in Kim et al. (2016). The additional requirement of learning non-negative weights introduces technical challenges, since the objective is no longer submodular as it was in the previous work. However, we show that the problem is weakly submodular and derive approximation guarantees for our fast ProtoDash algorithm. Moreover, ProtoDash can not only find prototypical examples for a dataset X, but can also find (weighted) prototypical examples from X^(2) that best represent another dataset X^(1), where X^(1) and X^(2) belong to the same feature space. We demonstrate the efficacy of our method on diverse domains, namely retail, digit recognition (MNIST), and the latest publicly available 40 health questionnaires obtained from the Centers for Disease Control and Prevention (CDC) website maintained by the US Dept. of Health. We validate the results quantitatively as well as qualitatively based on expert feedback and recently published scientific studies on public health.
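For intuition, the sketch below illustrates the kind of greedy, weighted prototype selection the abstract describes: it assumes an RBF kernel and an objective of the form w^T mu - (1/2) w^T K w, where mu holds each candidate's mean kernel similarity to the target dataset and K is the kernel matrix among candidates, with weights constrained to be non-negative. The function names (`rbf_kernel`, `greedy_weighted_prototypes`), the choice of kernel, and the use of an off-the-shelf bound-constrained solver are illustrative assumptions, not the authors' implementation.

```python
import numpy as np
from scipy.optimize import minimize

def rbf_kernel(A, B, gamma=1.0):
    """Gaussian (RBF) kernel matrix between the rows of A and the rows of B."""
    sq = (A ** 2).sum(1)[:, None] + (B ** 2).sum(1)[None, :] - 2.0 * A @ B.T
    return np.exp(-gamma * sq)

def greedy_weighted_prototypes(X_target, X_cand, m, gamma=1.0):
    """Greedily pick m rows of X_cand and non-negative weights so that the
    weighted kernel mean of the prototypes approximates that of X_target.
    (Illustrative sketch, not the authors' ProtoDash implementation.)"""
    mu = rbf_kernel(X_cand, X_target, gamma).mean(axis=1)  # mu_j = mean_i k(y_j, x_i)
    K = rbf_kernel(X_cand, X_cand, gamma)                  # kernel among candidates

    selected, w = [], np.zeros(0)
    for _ in range(m):
        # Gradient of f(w) = w^T mu - 0.5 w^T K w at the current weights;
        # the greedy step adds the unselected coordinate with the largest gradient.
        grad = mu - K[:, selected] @ w
        grad[selected] = -np.inf
        selected.append(int(np.argmax(grad)))

        # Refit all weights on the selected set: a small bound-constrained QP
        # with non-negativity enforced through box constraints.
        K_ss = K[np.ix_(selected, selected)]
        mu_s = mu[selected]
        res = minimize(lambda v: -(v @ mu_s - 0.5 * v @ K_ss @ v),
                       x0=np.append(w, 0.0),
                       jac=lambda v: -(mu_s - K_ss @ v),
                       bounds=[(0.0, None)] * len(selected),
                       method="L-BFGS-B")
        w = res.x

    return np.array(selected), w
```

In this sketch, passing the same matrix as both X_target and X_cand corresponds to summarizing a dataset X with its own weighted prototypes, while passing two different matrices corresponds to representing X^(1) by prototypes drawn from X^(2), as described above.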