Introduction
Predictive models are widely employed in a variety of domains ranging from judiciary and health care to autonomous driving. As we increasingly rely on these models for high-stakes decisions, identifying and characterizing their unexpected failures in the open world is critical. We categorize errors of a predictive model as known unknowns and unknown unknowns [Attenberg, Ipeirotis, and Provost 2015]. Known unknowns are those data points for which the model makes low confidence predictions and errs. On the other hand, unknown unknowns correspond to those points for which the model is highly confident about its predictions but is actually wrong. Since the model lacks awareness of its unknown unknowns, approaches developed for addressing known unknowns (e.g., active learning [Settles 2009]) cannot be used for discovering unknown unknowns.
Unknown unknowns can arise when the data used for training a predictive model is not representative of the samples encountered at test time when the model is deployed. This mismatch could be a result of unmodeled biases in the collection of training data or differences between the train and test distributions due to temporal, spatial or other factors such as a subtle shift in task definition. To illustrate, consider an image classification task where the goal is to predict if a given image corresponds to a cat or a dog (Figure 1). Let us assume that the training data comprises images of black dogs, and white and brown cats, and the feature set includes details such as nose shape, presence or absence of whiskers, color, and shape of the eyes. A predictive model trained on such data might learn to make predictions solely based on color despite the presence of other discriminative features because color can perfectly separate the two classes in the training data. However, during test time, such a model would classify an image of a white dog as a cat with high confidence. The images of white dogs are, therefore, unknown unknowns with regard to such a predictive model.
We formulate and address the problem of informed discovery of unknown unknowns of any given predictive model when deployed in the wild. More specifically, we seek to identify unknown unknowns which occur as a result of systematic biases in the training data. We formulate this as an optimization problem where unknown unknowns are discovered by querying an oracle for true labels of selected instances under a fixed budget which limits the number of queries to the oracle. The formulation assumes no knowledge of the functional form or the associated training data of the predictive model and treats it as a black box which outputs a label and a confidence score (or a proxy) for a given data point. These choices are motivated by real-world scenarios in domains such as healthcare and judiciary, where predictive models are being deployed in settings where end users have no access to either the model details or the associated training data (e.g., COMPAS risk assessment tool for sentencing [Brennan, Dieterich, and Ehret 2009]). Identifying the blind spots of predictive models in such high-stakes settings is critical as undetected unknown unknowns can be catastrophic. In criminal justice, biases and blind spots can lead to the inappropriate sentencing or incarceration of people charged with crimes or unintentional racial biases [Crawford 2016]. To the best of our knowledge, this is the first work providing an algorithmic approach to addressing this problem.
Developing an algorithmic solution for the discovery of unknown unknowns introduces a number of challenges: 1) Since unknown unknowns can occur in any portion of the feature space, how do we develop strategies which can effectively and efficiently search the space? 2) As confidence scores associated with model predictions are typically not informative for identifying unknown unknowns, how can we make use of the feedback from an oracle to guide the discovery of unknown unknowns? 3) How can we effectively manage the tradeoff between searching in neighborhoods where we previously found unknown unknowns and examining unexplored regions of the search space?
To address the problem at hand, we propose a two-step approach which first partitions the test data such that instances with similar feature values and confidence scores assigned by the predictive model are grouped together, and then employs an explore-exploit strategy for discovering unknown unknowns across these partitions based on the feedback from an oracle. The first step, which we refer to as Descriptive Space Partitioning (DSP), is guided by an objective function which encourages partitioning of the search space such that instances within each partition are maximally similar in terms of their feature values and confidence scores. DSP also provides interpretable explanations of the generated partitions by associating a comprehensible and compact description with each partition. As we later demonstrate in our experimental results, these interpretable explanations are very useful in understanding the properties of unknown unknowns discovered by our framework. We show that our objective is NP-hard and outline a greedy solution which is a $\ln N$ approximation, where $N$ is the number of data points in the search space. The second step of our methodology facilitates an effective exploration of the partitions generated by DSP while exploiting the feedback from an oracle. We propose a multi-armed bandit algorithm, Bandit for Unknown Unknowns (UUB), which exploits problem-specific characteristics to efficiently discover unknown unknowns.
The proposed methodology builds on the intuition that unknown unknowns occurring due to systematic biases are often concentrated in certain specific portions of the feature space and do not occur randomly [Attenberg, Ipeirotis, and Provost 2015]. For instance, the example in Figure 1 illustrates a scenario where systematic biases in the training data caused the predictive model to wrongly infer color as the distinguishing feature. Consequently, images following a specific pattern (i.e., all of the images of white dogs) turn out to be unknown unknowns for the predictive model. Another key assumption that is crucial to the design of effective algorithmic solutions for the discovery of unknown unknowns is that available evidential features are informative enough to characterize different subsets of unknown unknowns. If such features were not available in the data, it would not be possible to leverage the properties of previously discovered unknown unknowns to find new ones. Consequently, learning algorithms designed to discover unknown unknowns would not be able to perform any better than blind search (no free lunch theorem [Wolpert and Macready 1997]).
We empirically evaluate the proposed framework on the task of discovering unknown unknowns occurring due to a variety of factors, such as biased training data and domain adaptation, across diverse tasks such as sentiment classification, subjectivity detection, and image classification. We experiment with a variety of base predictive models, ranging from decision trees to neural networks. The results demonstrate the effectiveness of the framework and its constituent components for the discovery of unknown unknowns across different experimental conditions, providing evidence that the method can be readily applied to discover unknown unknowns in different real-world settings.
Problem Formulation
Given a black-box predictive model $M$ which takes as input a data point $x$ with features $F$, and returns a class label $c'$ and a confidence score $s$, our goal is to find the unknown unknowns of $M$ w.r.t a given test set $D$ using a limited number of queries, $B$, to the oracle, and, more broadly, to maximize the utility associated with the discovered unknown unknowns. The discovery process is guided by a utility function, which not only incentivizes the discovery of unknown unknowns, but also accounts for the costs associated with querying the oracle (e.g., monetary and time costs of labeling in crowdsourcing). Recall that, in this work, we focus on identifying unknown unknowns arising due to systematic biases in the training data. It is important to note that our formulation not only treats the predictive model as a black box but also assumes no knowledge about the data used to train the predictive model.
Although our methodology is generic enough to find unknown unknowns associated with all the classes in the data, we formulate the problem for a particular class $c$, a critical class, where false positives are costly and need to be discovered [Elkan 2001]. Based on the decisions of the system designer regarding the critical class $c$ and the confidence threshold $\tau$, our search space for unknown unknown discovery consists of all those data points in $D$ which are assigned the critical class $c$ by model $M$ with confidence higher than $\tau$.
Our approach takes the following inputs: 1) A set of $N$ instances, $X = \{x_1, \ldots, x_N\} \subseteq D$, which were confidently assigned to the critical class $c$ by the model $M$, and the corresponding confidence scores, $S = \{s_1, \ldots, s_N\}$, assigned to these points by $M$; 2) An oracle $o$ which takes as input a data point $x$ and returns its true label $o(x)$ as well as the cost $\mathrm{cost}(x)$ incurred to determine the true label of $x$; 3) A budget $B$ on the number of times the oracle can be queried.
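As a minimal sketch of how the search space described above could be assembled, the following assumes a hypothetical `predict` callable standing in for the black-box model, returning a (label, confidence) pair. The names `build_search_space`, `predict`, and `tau` are illustrative and are not part of the original formulation.

```python
from typing import Callable, Hashable, List, Sequence, Tuple


def build_search_space(
    test_set: Sequence,
    predict: Callable[[object], Tuple[Hashable, float]],
    critical_class: Hashable,
    tau: float,
) -> Tuple[List, List[float]]:
    """Keep test points assigned to the critical class with confidence above tau."""
    instances, scores = [], []
    for x in test_set:
        # Black-box call: only the label and the confidence score are observed.
        label, confidence = predict(x)
        if label == critical_class and confidence > tau:
            instances.append(x)
            scores.append(confidence)
    return instances, scores


# Toy stand-in for a black-box model that decides purely by color (assumption).
def toy_predict(x):
    return ("cat", 0.9) if x["color"] == "white" else ("dog", 0.8)


test_set = [{"color": "white"}, {"color": "black"}, {"color": "white"}]
X, S = build_search_space(test_set, toy_predict, critical_class="cat", tau=0.65)
```

Only the two white images enter the search space here, since they are the ones confidently assigned to the critical class.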
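Our utility function, $u(x(t))$, for querying the label of data point $x(t)$ at the $t^{th}$ step of exploration is defined as:
$$u(x(t)) = \mathbb{1}_{\{o(x(t)) \neq c\}} - \gamma \cdot \mathrm{cost}(x(t)) \qquad (1)$$
where $\mathbb{1}_{\{o(x(t)) \neq c\}}$ is an indicator function which returns 1 if $x(t)$ is identified as an unknown unknown (its true label $o(x(t))$ returned by the oracle differs from the critical class $c$), and 0 otherwise. $\mathrm{cost}(x(t))$ is the cost incurred by the oracle to determine the label of $x(t)$.
Both the indicator and the cost functions in Equation 1 are initially unknown and are observed based on the oracle's feedback on $x(t)$. $\gamma$ is a tradeoff parameter which can be provided by the end user.
Problem Statement: Find a sequence of $B$ instances $\{x(1), x(2), \ldots, x(B)\}$ from the search space for which the cumulative utility $\sum_{t=1}^{B} u(x(t))$ is maximum.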
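The utility and the cumulative objective can be illustrated with a small sketch. It assumes the utility takes the form "indicator of an unknown unknown minus a tradeoff-weighted cost", as the surrounding text describes. The oracle is modeled as a hypothetical callable returning a (true label, cost) pair, and all names are illustrative.

```python
def utility(true_label, critical_class, cost, gamma):
    """Reward 1 for a discovered unknown unknown, minus the weighted query cost."""
    # Every queried point was assigned the critical class by the model, so a
    # different true label means the confident prediction was wrong.
    found = 1.0 if true_label != critical_class else 0.0
    return found - gamma * cost


def cumulative_utility(queried, oracle, critical_class, gamma):
    """Sum the utility over a sequence of queried instances."""
    total = 0.0
    for x in queried:
        true_label, cost = oracle(x)  # feedback is only observed after querying
        total += utility(true_label, critical_class, cost, gamma)
    return total


# Hypothetical oracle feedback: true label and labeling cost per instance.
feedback = {"a": ("dog", 0.5), "b": ("cat", 1.0), "c": ("dog", 0.0)}
total = cumulative_utility(["a", "b", "c"], feedback.get, "cat", gamma=0.2)
```

In this toy run, instances "a" and "c" are unknown unknowns and "b" is a correct prediction whose query still incurs a cost.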
Methodology
In this section, we present our two-step framework designed to address the problem of informed discovery of unknown unknowns which occur due to systematic biases in the training data. We begin this section by highlighting the assumptions required for our algorithmic solution to be effective:

Unknown unknowns arising due to biases in training data typically occur in certain specific portions of the feature space and not at random. For instance, in our image classification example, the systematic bias of not including white dog images in the training data resulted in a specific category of unknown unknowns which were all clumped together in the feature space and followed a specific pattern. Attenberg et al. [Attenberg, Ipeirotis, and Provost 2015] observed this assumption to hold in practice and leveraged human intuition to find systematic patterns of unknown unknowns.

We also assume that the features available in the data can effectively characterize different kinds of unknown unknowns, but the biases in the training data prevented the predictive model from leveraging these discriminating features for prediction. If such features were not available in the data, it would not be possible to utilize the characteristics of previously discovered unknown unknowns to find new ones. Consequently, no learning algorithm would perform better than blind search if this assumption did not hold (no free lunch theorem [Wolpert and Macready 1997]).
Below we discuss our methodology in detail. First we present Descriptive Space Partitioning (DSP), which induces a similarity-preserving partition on the set $X$ of search-space instances. Then, we present a novel multi-armed bandit algorithm, which we refer to as Bandit for Unknown Unknowns (UUB), for systematically searching for unknown unknowns across these partitions while leveraging feedback from an oracle.
Descriptive Space Partitioning
Our approach exploits the aforementioned intuition that blind spots arising due to systematic biases in the data do not occur at random, but are instead concentrated in specific portions of the feature space. The first step of our approach, DSP, partitions the instances in the search space $X$ such that instances which are grouped together are similar to each other w.r.t the feature space and were assigned similar confidence scores by the model $M$. Partitioning enables our bandit algorithm, UUB, to discover regions with high concentrations of unknown unknowns.
The intuition behind our partitioning approach is that two instances $x_i$ and $x_j$ are likely to be judged using a similar logic by model $M$ if they share similar feature values and are assigned to the same class with comparable confidence scores by $M$. In such cases, if $x_i$ is identified as an unknown unknown, $x_j$ is likely to be an unknown unknown as well (note that this is not always the case, as we will see in the next section). Based on this intuition, we propose an objective function which encourages grouping of instances in $X$ that are similar w.r.t the criteria outlined above, and facilitates separation of dissimilar instances. The proposed objective also associates a concise, comprehensible description with each partition, which is useful for understanding the exploration behavior of our framework and the kinds of unknown unknowns of $M$ (details in the Experimental Evaluation section).
DSP takes as input a set of candidate patterns $Q = \{q_1, q_2, \ldots\}$ where each $q_i$ is a conjunction of (feature, operator, value) tuples in which the operator is a comparison operator such as $=$, $\leq$, or $\geq$. Such patterns can be obtained by running an off-the-shelf frequent pattern mining algorithm such as Apriori [Agrawal, Srikant, and others 1994] on the search space $X$. Each pattern covers a set of one or more instances in $X$. For each pattern $q$, the set of instances that satisfy $q$ is represented by $X_q$, the centroid of such instances is denoted by $\bar{x}_q$, and their mean confidence score is $\bar{s}_q$.
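To make the pattern representation concrete, here is a small sketch of how a conjunction of (feature, operator, value) tuples could be evaluated, and how the covered instances, their centroid, and their mean confidence score could be computed. Instances are assumed to be dictionaries of numeric features. The operator vocabulary in `OPS` and the function names are assumptions made for illustration only.

```python
import operator
from statistics import mean

# Comparison operators a pattern tuple may use (an assumed vocabulary).
OPS = {
    "==": operator.eq, "!=": operator.ne,
    "<=": operator.le, ">=": operator.ge,
    "<": operator.lt, ">": operator.gt,
}


def satisfies(x, pattern):
    """True if instance x meets every (feature, operator, value) tuple."""
    return all(OPS[op](x[feature], value) for feature, op, value in pattern)


def describe_pattern(pattern, instances, scores, features):
    """Return covered indices, feature centroid, and mean confidence score."""
    covered = [i for i, x in enumerate(instances) if satisfies(x, pattern)]
    if not covered:
        return covered, None, None
    centroid = {f: mean(instances[i][f] for i in covered) for f in features}
    mean_conf = mean(scores[i] for i in covered)
    return covered, centroid, mean_conf


instances = [
    {"whiskers": 1, "white": 1},
    {"whiskers": 0, "white": 1},
    {"whiskers": 1, "white": 0},
]
scores = [0.9, 0.8, 0.7]
covered, centroid, mean_conf = describe_pattern(
    [("white", "==", 1)], instances, scores, ["whiskers", "white"]
)
```

The single-tuple pattern above covers the first two instances; longer conjunctions simply add tuples to the list.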
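The partitioning objective minimizes dissimilarities of instances within each partition and maximizes them across partitions. In particular, for a pattern $q$ in $Q$ covering the instances $X_q$ with centroid $\bar{x}_q$ and mean confidence score $\bar{s}_q$, we define the goodness of $q$ as a combination of the following metrics, where $d$ and $d'$ are standard distance measures defined over feature vectors of instances and their confidence scores respectively:
Intra-partition feature distance: $g_1(q) = \sum_{x \in X_q} d(x, \bar{x}_q)$
Inter-partition feature distance: $g_2(q) = \sum_{x \in X_q} \sum_{q' \in Q,\, q' \neq q} d(x, \bar{x}_{q'})$
Intra-partition confidence score distance: $g_3(q) = \sum_{x_i \in X_q} d'(s_i, \bar{s}_q)$
Inter-partition confidence score distance: $g_4(q) = \sum_{x_i \in X_q} \sum_{q' \in Q,\, q' \neq q} d'(s_i, \bar{s}_{q'})$
Pattern length: $g_5(q)$, the number of (feature, operator, value) tuples in pattern $q$, included to favor concise descriptions.
Given the set of instances $X$, the corresponding confidence scores $S$, a collection of patterns $Q$, and a weight vector $\lambda$ used to combine $g_1$ through $g_5$, our goal is to find a set of patterns $P \subseteq Q$ such that it covers all the points in $X$ and minimizes the following objective:
$$\min \sum_{q \in Q} f_q \left( \lambda_1 g_1(q) - \lambda_2 g_2(q) + \lambda_3 g_3(q) - \lambda_4 g_4(q) + \lambda_5 g_5(q) \right) \quad \text{s.t.} \quad \sum_{q:\, x \in X_q} f_q \geq 1 \;\; \forall x \in X \qquad (2)$$
where $f_q \in \{0, 1\}$ corresponds to an indicator variable associated with pattern $q$ which determines if the pattern has been added to the solution set ($f_q = 1$) or not ($f_q = 0$).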
The aforementioned formulation is identical to that of a weighted set cover problem, which is NP-hard [Johnson 1974]. It has been shown that a greedy solution provides a $\ln N$ approximation to the weighted set cover problem [Johnson 1974, Feige 1998], where $N$ is the size of the search space. Algorithm 1 applies a similar strategy which greedily selects patterns with maximum coverage-to-weight ratio at each step, thus resulting in a $\ln N$ approximation guarantee. This process is repeated until no instance in the search space is left uncovered. If an instance is covered by multiple partitions, ties are broken by assigning it to the partition with the closest centroid.
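The greedy selection rule can be sketched as follows. This is a generic greedy weighted set cover under the assumption that every pattern weight is strictly positive. It is not a reproduction of Algorithm 1, and the tie-breaking by closest centroid is left out. The names `coverage` and `weight` are illustrative.

```python
def greedy_weighted_cover(universe, coverage, weight):
    """Pick patterns by maximum newly-covered-instances-to-weight ratio.

    coverage maps a pattern id to the set of instance ids it covers;
    weight maps a pattern id to a strictly positive weight (assumption).
    """
    if any(w <= 0 for w in weight.values()):
        raise ValueError("this sketch assumes strictly positive weights")
    uncovered = set(universe)
    chosen = []
    while uncovered:
        best, best_ratio = None, 0.0
        for q, covered in coverage.items():
            gain = len(uncovered & covered)  # instances q would newly cover
            if gain and gain / weight[q] > best_ratio:
                best, best_ratio = q, gain / weight[q]
        if best is None:
            raise ValueError("patterns do not cover every instance")
        chosen.append(best)
        uncovered -= coverage[best]
    return chosen


coverage = {
    "a": {0, 1, 2},
    "b": {2, 3},
    "c": {3, 4},
    "d": {0, 1, 2, 3, 4},
}
weight = {"a": 3.0, "b": 1.0, "c": 1.0, "d": 10.0}
chosen = greedy_weighted_cover(range(5), coverage, weight)
```

The expensive pattern "d" covers everything on its own but is never selected, because cheaper patterns offer a better coverage-to-weight ratio at every step.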
Our partitioning approach is inspired by a class of clustering techniques commonly referred to as conceptual clustering [Michalski and Stepp 1983, Fisher 1987] or descriptive clustering [Weiss 2006, Li, Peng, and Wu 2008, Kim, Rudin, and Shah 2014, Lakkaraju and Leskovec 2016]. We make the following contributions to this line of research: We propose a novel objective function, whose components have not been jointly considered before. In contrast to previous solutions which employ post-processing techniques or use Bayesian frameworks, we propose a simple, yet elegant solution which offers theoretical guarantees.
Multi-armed Bandit for Unknown Unknowns
The output of the first step of our approach, DSP, is a set of $K$ partitions $P = \{p_1, p_2, \ldots, p_K\}$ such that each $p_i$ corresponds to a set of data points which are similar w.r.t the feature space and have been assigned similar confidence scores by the model $M$. The partitioning scheme, however, does not guarantee that all data points in a partition share the same characteristic of being unknown unknown (or not being unknown unknown). It is important to note that sharing similar feature values and confidence scores does not ensure that the data points in a partition are indistinguishable as far as the model logic is concerned. This is due to the fact that the model $M$ is a black box and we do not actually observe the underlying functional forms and/or feature importance weights being used by $M$. Consequently, each partition has an unobservable concentration of unknown unknown instances. The goal of the second step of our approach is to compute an exploration policy over the partitions generated by DSP such that it maximizes the cumulative utility of the discovery of unknown unknowns (as defined in the Problem Formulation section).
We formalize this problem as a multi-armed bandit problem and propose an algorithm for deciding which partition to query at each step (see Algorithm 2). In this formalization, each partition corresponds to an arm of the bandit. At each step, the algorithm chooses a partition and then randomly samples a data point from that partition without replacement and queries its true label from the oracle. Since querying the data point reveals whether it is an unknown unknown, the point is excluded from future steps.
In the first $K$ steps, where $K$ is the number of partitions, the algorithm samples a point from each partition. Then, at each step $t$, the exploration decisions are guided by a combination of $\bar{u}_t(i)$, the empirical mean utility (reward) of the partition $p_i$ at time $t$, and $b_t(i)$, which represents the uncertainty over the estimate of $\bar{u}_t(i)$.
Our problem setting has the characteristic that the expected utility of each arm is non-stationary; querying a data point from a partition changes the concentration of unknown unknowns in the partition and consequently changes the expected utility of that partition in future steps. Therefore, stationary MAB algorithms such as UCB [Auer, Cesa-Bianchi, and Fischer 2002] are not suitable. A variant of UCB, discounted UCB, addresses non-stationary settings and can be used to compute $\bar{u}_t(i)$ and $b_t(i)$ [Garivier and Moulines 2008].
The main idea of discounted UCB is to weight recent observations more to account for the non-stationary nature of the utility function. If $\vartheta^{i}_{j,t}$ denotes the discounting factor applied at time $t$ to the reward obtained from arm $i$ at time $j$, then $\vartheta^{i}_{j,t} = \delta^{t-j}$ in the case of discounted UCB, where $\delta \in (0, 1)$. Garivier and Moulines established a lower bound on the regret in the presence of abrupt changes in the reward distributions of the arms and also showed that discounted UCB matches this lower bound up to a logarithmic factor [Garivier and Moulines 2008].
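A rough sketch of the discounted UCB index follows, using its usual general form: a discounted empirical mean plus an exploration bonus computed from discounted pull counts. The exploration constant `xi`, the reward bound `bound`, and the function name are assumptions for illustration. The exact expressions used in Algorithm 2 are not reproduced in this text, so this should not be read as the authors' implementation.

```python
import math


def discounted_ucb_scores(history, num_arms, discount, bound=1.0, xi=0.6):
    """Score each arm after the pulls in history.

    history[s - 1] is the (arm, reward) pair observed at step s. A reward
    observed at step s is weighted by discount ** (t - s), with t the
    current number of steps, so recent observations count more.
    """
    t = len(history)
    pulls = [0.0] * num_arms    # discounted number of pulls per arm
    rewards = [0.0] * num_arms  # discounted sum of rewards per arm
    for s, (arm, reward) in enumerate(history, start=1):
        w = discount ** (t - s)
        pulls[arm] += w
        rewards[arm] += w * reward
    total = sum(pulls)  # at least 1 once any pull happened (latest weight is 1)
    scores = []
    for i in range(num_arms):
        if pulls[i] == 0.0:
            scores.append(math.inf)  # unexplored arms are tried first
            continue
        mean = rewards[i] / pulls[i]
        bonus = 2.0 * bound * math.sqrt(xi * math.log(total) / pulls[i])
        scores.append(mean + bonus)
    return scores


scores = discounted_ucb_scores([(0, 1.0), (1, 0.0), (0, 0.0)], 3, discount=0.5)
next_arm = max(range(3), key=scores.__getitem__)
```

The arm with the largest score is played next; an arm that has never been played receives an infinite score so that it is explored first.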
The discounting factor of discounted UCB is designed to handle arbitrary changes in the utility distribution, whereas the way the utility of a partition changes in our setting has a certain structure: the utility estimate of arm $i$ only changes by a bounded quantity when the arm is queried. Using this observation, we can customize the calculation of the discounting factor $\vartheta^{i}_{j,t}$, applied at time $t$ to the reward obtained from arm $i$ at time $j$, for our setting and eliminate the need to set the value of the discounting parameter $\delta$, which affects the quality of decisions made by discounted UCB. We compute $\vartheta^{i}_{j,t}$ as the ratio of the number of data points in the partition $p_i$ at time $t$ to the number of data points in the partition at time $j$:
$$\vartheta^{i}_{j,t} = \frac{N_i(t)}{N_i(j)} \qquad (3)$$
where $N_i(t)$ denotes the number of data points remaining in partition $p_i$ at time $t$.
The value of $\vartheta^{i}_{j,t}$ decreases as the number of pulls of arm $i$ during the interval between $j$ and $t$ grows. $\vartheta^{i}_{j,t}$ is 1 if the arm is not pulled during this interval, indicating that the expected utility of $p_i$ remained unchanged. We refer to the version of Algorithm 2 that uses the discounting factor specific to our setting (Eqn. 3) as Bandit for Unknown Unknowns (UUB).
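The partition-size ratio described above can be computed directly from the pull history. The sketch below assumes that "the number of data points at a time step" means the points still unqueried after that many steps. That convention, the handling of exhausted partitions, and the function name are illustrative choices, not details taken from Algorithm 2.

```python
def uub_discount(initial_size, pulls, arm, j, t):
    """Ratio of points left in the arm's partition after step t vs. after step j.

    pulls[l - 1] is the arm played at step l; each pull removes one data
    point from that arm's partition (sampling without replacement).
    """
    def remaining(step):
        return initial_size - sum(1 for a in pulls[:step] if a == arm)

    left_at_j = remaining(j)
    if left_at_j == 0:
        # Convention chosen for this sketch: an exhausted partition has weight 0.
        return 0.0
    return remaining(t) / left_at_j


pulls = [0, 1, 0, 0, 2]                     # arms played at steps 1..5
shrunk = uub_discount(10, pulls, arm=0, j=1, t=4)
unchanged = uub_discount(10, pulls, arm=1, j=2, t=5)
```

Arm 0 is pulled twice between steps 1 and 4, so an old reward from step 1 is down-weighted, while arm 1 is not pulled after step 2 and keeps a factor of 1. No tuning parameter appears anywhere in the computation.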
Experimental Evaluation
We now present details of the experimental evaluation of constituent components of our framework as well as the entire pipeline.
Datasets and Nature of Biases:
We evaluate the performance of our methodology across four different data sets in which the underlying cause of unknown unknowns varies from biases in training data to domain adaptation:
(1) Sentiment Snippets: A collection of 10K sentiment snippets/sentences expressing opinions on various movies [Pang and Lee 2005]. Each snippet (sentence) corresponds to a data point and is labeled as positive or negative. We split the data equally into train and test sets. We then bias the training data by randomly removing subgroups of negative snippets from it. We consider positive sentiment as the critical class for this data.
(2) Subjectivity: A set of 10K subjective and objective snippets extracted from Rotten Tomatoes webpages [Pang and Lee 2004]. We consider the objective class in this dataset as the critical class, split the data equally into train and test sets, and introduce bias in the same way as described above.
(3) Amazon Reviews: A random sample of 50K reviews of books and electronics collected from Amazon [McAuley, Pandey, and Leskovec 2015]. We use this data set to study unknown unknowns introduced by domain adaptation; we train the predictive models on the electronics reviews and then test them on the book reviews. Similar to the sentiment snippets data set, the positive sentiment is the critical class.
(4) Image Data: A set of 25K cat and dog images [Kaggle 2013]. We use this data set to assess whether our framework can recognize unknown unknowns that occur when semantically meaningful subgroups are missing from the training data. To this end, we split the data equally into train and test sets and bias the training data such that it comprises only images of dogs which are black, and cats which are not black. We set the class label cat to be the critical class in our experiments.
Experimental Setting:
We use bag of words features to train the predictive models for all of our textual data sets. As the features for the images, we use super pixels obtained using the standard algorithms [Ribeiro, Singh, and Guestrin 2016]. Images are represented with a feature vector of 1’s and 0’s indicating the presence or absence of the corresponding super pixels. We experiment with multiple predictive models: decision trees, SVMs, logistic regression, random forests and neural networks. Due to space constraints, this section presents results for decision trees as model $M$, but detailed results for all the other models are included in the Appendix. We set the threshold $\tau$ for confidence scores to 0.65 to construct our search space $X$ for each data set. We consider two settings for the cost function (refer to Eqn. 1): the cost is set to a constant value for all instances (uniform cost) in the image dataset, and it is set to $(\mathrm{length}(x) - \mathrm{minlength}) / (\mathrm{maxlength} - \mathrm{minlength})$ (variable cost) for all textual data. Here, $\mathrm{length}(x)$ denotes the number of words in a snippet (or review) $x$; minlength and maxlength denote the minimum and maximum number of words in any given snippet (or review). Note that these cost functions are only available to the oracle. The tradeoff parameter $\gamma$ is set to 0.2. The parameters of DSP are estimated by setting aside as a validation set 5% of the test instances assigned to the critical class by the predictive models. We search the parameter space using coordinate descent to find parameters which result in the minimum value of the objective function defined in Eqn. 2. We set the budget $B$ to a fixed fraction of all the instances in the set $X$ throughout our experiments. Further, the results presented for UUB are all averaged across multiple runs.
Evaluating the Partitioning Scheme
The effectiveness of our framework relies on the notion that our partitioning scheme, DSP, creates partitions such that unknown unknowns are concentrated in a specific subset of partitions as opposed to being evenly spread out across them. If unknown unknowns are distributed evenly across all the partitions, our bandit algorithm cannot perform better than a strategy which randomly chooses a partition at each step of the exploration process. We, therefore, measure the quality of partitions created by DSP by measuring the entropy of the distribution of unknown unknowns across the partitions in $P$. For each partition $p_j$, we count the number of unknown unknowns, $n_j$, based on the true labels which are only known to the oracle. We then compute the entropy of $P$ as follows:
$$\mathrm{Entropy}(P) = -\sum_{j=1}^{K} \frac{n_j}{\sum_{k=1}^{K} n_k} \log \frac{n_j}{\sum_{k=1}^{K} n_k}$$
Smaller entropy values are desired as they indicate higher concentrations of unknown unknowns in fewer partitions.
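The entropy measure is the Shannon entropy of the share of unknown unknowns falling in each partition. A minimal sketch follows, assuming base-2 logarithms (the base is not stated in the text) and treating empty partitions as contributing zero:

```python
import math


def partition_entropy(counts):
    """Entropy of the distribution of unknown unknowns across partitions.

    counts[j] is the number of unknown unknowns found in partition j.
    """
    total = sum(counts)
    if total == 0:
        return 0.0  # no unknown unknowns at all: nothing to spread out
    entropy = 0.0
    for n in counts:
        if n > 0:  # a partition with no unknown unknowns contributes 0
            p = n / total
            entropy -= p * math.log2(p)
    return entropy


concentrated = partition_entropy([10, 0, 0, 0])  # all in one partition
spread = partition_entropy([5, 5, 5, 5])         # evenly spread out
```

The concentrated case gives the minimum entropy and the evenly spread case the maximum for four partitions, matching the reading that smaller values indicate a better partitioning for the bandit step.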
Figure 2 compares the entropy of the partitions generated by DSP with clusters generated by k-means algorithms using only features in $X$ (kmeans-features), only confidence scores in $S$ (kmeans-conf) and both (kmeans-both) by first clustering using confidence scores and then using features. The entropy values for DSP are consistently smaller compared to alternative approaches using k-means across all the datasets. This can be explained by the fact that DSP jointly optimizes inter- and intra-partition distances over both features and confidence scores. As shown in Figure 2, the entropy values are much higher when k-means considers only features or only confidence scores, indicating the importance of jointly reasoning about them.
We also compare the entropy values obtained for DSP as well as other k-means based approaches to an upper bound computed with random partitioning. For each of the algorithms (DSP and other k-means based approaches), we designed a corresponding random partitioning scheme which randomly reassigns all the data points in the set $X$ to partitions while keeping the number of partitions and the number of data points within each partition the same as that of the corresponding algorithm. We observe that the entropy values obtained for DSP and all the other baselines are consistently smaller than those of the corresponding random partitioning schemes. Also, the entropy values for DSP are about 32-37% lower compared to its random counterpart across all of the datasets.
Evaluating the Bandit Algorithm
We measure the performance of our multi-armed bandit algorithm UUB in terms of a standard evaluation metric in the MAB literature called cumulative regret. Cumulative regret of a policy $\pi$ is computed as the difference between the total reward collected by an optimal policy $\pi^*$, which at each step plays the arm with the highest expected utility (or reward), and the total reward collected by the policy $\pi$. Small values of cumulative regret indicate better policies. The utility function defined in Eqn. 1 determines the reward associated with each instance.
We compare the performance of our algorithm, UUB, with that of several baseline algorithms such as random, greedy, $\epsilon$-greedy strategies [Chapelle and Li 2011], UCB, UCBf [Slivkins and Upfal 2008], sliding window and discounted UCB [Garivier and Moulines 2008] for various values of the discounting factor $\delta$. All algorithms take as input the partitions created by DSP. Figure 3(a) shows the cumulative regret of each of these algorithms on the image data set. Results for the other data sets can be seen in the Appendix. The figure shows that UUB achieves the smallest cumulative regret compared to other baselines on the image data set. Similarly, UUB is the best performing algorithm on the sentiment snippets and subjectivity snippets data sets, whereas discounted UCB achieves slightly smaller regret than UUB on the Amazon reviews data set. The experiment also highlights a disadvantage of the discounted UCB algorithm, as its performance is sensitive to the choice of the discounting factor $\delta$, whereas UUB is parameter-free. Further, both UCB and its variant UCBf, which are designed for stationary and slowly changing reward distributions respectively, have higher cumulative regret than UUB and discounted UCB, indicating that they are not as effective in our setting.
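Cumulative regret as described above can be tracked step by step. The sketch assumes the per-step rewards of the optimal policy are available for comparison, which is only possible in an offline evaluation where all true labels are known. The function name is illustrative.

```python
def cumulative_regret(optimal_rewards, policy_rewards):
    """Running difference between optimal and achieved total reward."""
    regret, running = [], 0.0
    for best, got in zip(optimal_rewards, policy_rewards):
        running += best - got  # per-step shortfall relative to the optimum
        regret.append(running)
    return regret


# Toy rewards: the policy matches the optimal arm on steps 1 and 3 only.
curve = cumulative_regret([1.0, 1.0, 1.0, 1.0], [1.0, 0.0, 1.0, 0.0])
```

A flatter curve means the policy is choosing near-optimal partitions; the final entry is the cumulative regret over the whole budget.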
Evaluating the Overall Methodology
In the previous section, we compared the performance of UUB to other bandit methods when they are given the same data partitions to explore. In this section, we evaluate the performance of our complete pipeline (DSP + UUB). Due to the lack of existing baselines which address the problem at hand, we compare the performance of our framework to other end-to-end heuristic methods we devised as baselines. Due to space constraints, we present results only for the image dataset. Results for other data sets can be seen in the Appendix.
We compare the cumulative regret of our framework to that of a variety of baselines: 1) Random sampling: Randomly select $B$ instances from set $X$ for querying the oracle. 2) Least average similarity: For each instance in $X$, compute the average Euclidean distance w.r.t all the data points in the training set and choose the $B$ instances with the largest distance. 3) Least maximum similarity: Compute the minimum Euclidean distance of each instance in $X$ from the training set and choose the $B$ instances with the highest distances. 4) Most uncertain: Rank the instances in $X$ in increasing order of the confidence scores assigned by the model $M$ and pick the top $B$ instances. The least average similarity and least maximum similarity baselines are related to research on outlier detection [Chandola, Banerjee, and Kumar 2007]. Furthermore, the baseline titled most uncertain is similar to the uncertainty sampling query strategy used in active learning literature. Note that the least average similarity and the least maximum similarity baselines assume access to the data used to train the predictive model, unlike our framework which makes no such assumptions. Figure 3(b) shows the cumulative regret of our framework and the baselines for the image data. It can be seen that UUB achieves the least cumulative regret of all the strategies across all data sets. It is interesting to note that the least average similarity and the least maximum similarity approaches perform worse than UUB in spite of having access to additional information in the form of training data.
Qualitative Analysis
Figure 4 presents an illustrative example of how our methodology explores three of the partitions generated for the image data set. Our partitioning framework associated the super pixels shown in the figure with each partition. Examining the super pixels reveals that partitions 1, 2 and 3 correspond to the images of white chihuahuas (dog), white cats, and brown dogs respectively. The plot shows the number of times the arms corresponding to these partitions have been played by our bandit algorithm. The figure shows that partition 2 is chosen fewer times compared to partitions 1 and 3 because white cat images are part of the training data used by the predictive models and there are not many unknown unknowns in this partition. On the other hand, white and brown dogs are not part of the training data and our bandit algorithm explores these partitions often. Figure 4 also indicates that partition 1 was explored often during the initial plays but not later on. This is because there were fewer data points in that partition and the algorithm had exhausted all of them after a certain number of plays.
Related Work
In this section, we review prior research relevant to the discovery of unknown unknowns.
Unknown Unknowns
The problem of model incompleteness and the challenge of grappling with unknown unknowns in the real world have been coming to the fore as a critical topic in discussions about the utility of AI technologies [Horvitz 2008].
Attenberg et al. introduced the idea of harnessing human input to identify unknown unknowns, but their studies left the task of exploration and discovery completely to humans without any assistance [Attenberg, Ipeirotis, and Provost 2015]. In contrast, we propose an algorithmic framework in which the role of the oracle is simpler and more realistic: the oracle is only queried for labels of selected instances chosen by our algorithmic framework.
Dataset Shift
A common cause of unknown unknowns is dataset shift, which represents the mismatch between training and test distributions [Quiñonero-Candela et al. 2009, Jiang and Zhai 2007]. Multiple approaches have been proposed to address dataset shift, including importance weighting of training instances based on similarity to the test set [Shimodaira 2000], online learning of prediction models [Cesa-Bianchi and Lugosi 2006], and learning models robust to adversarial actions [Teo et al. 2007, Graepel and Herbrich 2004, Decoste and Schölkopf 2002]. These approaches cannot be applied to our setting as they make one or more of the following assumptions which limit their applicability to real-world settings: 1) the model is not a black box; 2) the data used to train the predictive model is accessible; 3) the model can be adaptively retrained. Further, the goal of this work is different as we study the problem of discovering unknown unknowns of models which are already deployed.
Active Learning Active learning techniques aim to build highly accurate predictive models while requiring fewer labeled instances. These approaches typically involve querying an oracle for labels of certain selected instances and utilizing the obtained labels to adaptively retrain the predictive models [Settles2009]. Various query strategies have been proposed to choose the instances to be labeled (e.g., uncertainty sampling [Lewis and Gale1994, Settles2009], query by committee [Seung, Opper, and Sompolinsky1992], expected model change [Settles, Craven, and Ray2008], expected error reduction [Zhu, Lafferty, and Ghahramani2003], expected variance reduction [Zhang and Oles2000]). Active learning frameworks were designed to be employed during the learning phase of a predictive model and are therefore not readily applicable to our setting, where the goal is to find blind spots of a black box model that has already been deployed. Furthermore, query strategies employed in active learning are guided towards the discovery of known unknowns, utilizing information from the predictive model to determine which instances should be labeled by the oracle. These approaches are not suitable for the discovery of unknown unknowns because the model is not aware of its unknown unknowns and therefore lacks meaningful information to guide their discovery.
Outlier Detection Outlier detection involves identifying individual data points (global outliers) or groups of data points (collective outliers) which either do not conform to a target distribution or are dissimilar to the majority of the instances in the data [Han, Pei, and Kamber2011, Chandola, Banerjee, and Kumar2007]. Several parametric approaches [Agarwal2007, Abraham and Box1979, Eskin2000] were proposed to address the problem of outlier detection. These methods make assumptions about the underlying data distribution and characterize as outliers those points with a smaller likelihood of being generated from the assumed distribution. Nonparametric approaches [Eskin2000, Eskin et al.2002, Fawcett and Provost1997], which make fewer assumptions about the distribution of the data, such as histogram-based, distance-based, and density-based methods, were also proposed to address this problem. Though unknown unknowns of any given predictive model can be regarded as collective outliers with respect to the data used to train that model, the aforementioned approaches are not applicable to our setting as we assume no access to the training data.
Discussion & Conclusions
We presented an algorithmic approach to discovering unknown unknowns of predictive models. The approach assumes no knowledge of the functional form or the associated training data of the predictive models, thus allowing the method to be used to build insights about the behavior of deployed predictive models. To guide the discovery of unknown unknowns, we partition the search space and then use bandit algorithms to identify partitions with larger concentrations of unknown unknowns. To this end, we propose novel algorithms both for partitioning the search space and for sifting through the generated partitions to discover unknown unknowns.
We see several research directions ahead, including opportunities to employ alternate objective functions. For instance, the budget could be defined in terms of the total cost of querying the oracle instead of the number of queries to the oracle. Our method can also be extended to more sophisticated settings where the utility of some types of unknown unknowns decreases with time as sufficient examples of the type are discovered (e.g., after informing the engineering team about the discovered problem). In many settings, the oracle can be approximated via the acquisition of labels from crowdworkers, and the labeling noise of the crowd might be addressed by incorporating repeated labeling into our framework.
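As a minimal sketch of the cost-based budget mentioned above, the loop below stops spending once the total query cost would exceed the budget, rather than counting queries. The function query_within_cost_budget and the per-instance cost function cost_of are hypothetical; the candidate order is assumed to come from whatever selection policy is in use.

```python
def query_within_cost_budget(candidates, cost_of, oracle, total_budget):
    """Query the oracle while the cumulative labeling cost fits the budget.

    candidates: iterable of instance ids, already ordered by a selection
        policy (assumed to be given).
    cost_of: callable(instance_id) -> non-negative labeling cost
        (hypothetical; e.g., the price of one expert judgment).
    oracle: callable(instance_id) -> label.
    total_budget: total cost that may be spent on oracle queries.
    Returns (dict of instance id -> label, total cost spent).
    """
    spent = 0.0
    labels = {}
    for instance in candidates:
        cost = cost_of(instance)
        if spent + cost > total_budget:
            # Skip instances that no longer fit; a cheaper one may follow.
            continue
        labels[instance] = oracle(instance)
        spent += cost
    return labels, spent
```

Skipping an unaffordable instance instead of stopping outright is one possible design choice; a policy that trades expected reward against cost when ranking candidates would be a natural refinement.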
The discovery of unknown unknowns can help system designers in numerous ways when deploying predictive models. The partitioning scheme that we have explored provides interpretable descriptions of each of the generated partitions. These descriptions could help a system designer to readily understand the characteristics of the discovered unknown unknowns and devise strategies to prevent errors or recover from them (e.g., silencing the model when a query falls into a particular partition where unknown unknowns were discovered previously). Discovered unknown unknowns can further be used to retrain the predictive model, which in turn can recognize its mistakes and even correct them.
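The silencing strategy mentioned above can be sketched as a thin wrapper around a deployed model. This is an illustration under stated assumptions rather than part of our framework: guarded_predict, assign_partition, and flagged_partitions are hypothetical names, and the wrapper assumes that a new query can be mapped to one of the previously generated partitions.

```python
def guarded_predict(model, assign_partition, flagged_partitions, query):
    """Return the model's prediction, or None to abstain.

    model: callable(query) -> prediction (the black box model).
    assign_partition: callable(query) -> partition id (hypothetical;
        assumed to reuse the interpretable partition descriptions).
    flagged_partitions: set of partition ids in which unknown unknowns
        were previously discovered.
    """
    if assign_partition(query) in flagged_partitions:
        return None  # silence the model; defer to a human or a fallback
    return model(query)
```

For example, if the partition described as white dogs has been flagged, a query that falls into it would be deferred instead of being answered with a confident but possibly wrong label.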
Formal machinery that can shine light on limitations of our models and systems will be critical in moving AI solutions into the open world, especially for high-stakes, safety-critical applications. We hope that this work on an algorithmic approach to identifying unknown unknowns in predictive models will stimulate additional research on incompleteness in our models and systems.
Acknowledgments
Himabindu Lakkaraju carried out this research during an internship at Microsoft Research. The authors would like to thank Lihong Li, Janardhan Kulkarni, and the anonymous reviewers for their insightful comments and feedback.
References
 [Abraham and Box1979] Abraham, B., and Box, G. E. 1979. Bayesian analysis of some outlier problems in time series. Biometrika 66(2):229–236.
 [Agarwal2007] Agarwal, D. 2007. Detecting anomalies in cross-classified streams: a Bayesian approach. Knowledge and information systems 11(1):29–44.
 [Agrawal, Srikant, and others1994] Agrawal, R.; Srikant, R.; et al. 1994. Fast algorithms for mining association rules. In VLDB.
 [Attenberg, Ipeirotis, and Provost2015] Attenberg, J.; Ipeirotis, P.; and Provost, F. 2015. Beat the machine: Challenging humans to find a predictive model’s unknown unknowns. J. Data and Information Quality 6(1):1:1–1:17.
 [Auer, CesaBianchi, and Fischer2002] Auer, P.; Cesa-Bianchi, N.; and Fischer, P. 2002. Finite-time analysis of the multiarmed bandit problem. Machine learning 47(2–3):235–256.
 [Brennan, Dieterich, and Ehret2009] Brennan, T.; Dieterich, W.; and Ehret, B. 2009. Evaluating the predictive validity of the COMPAS risk and needs assessment system. Criminal Justice and Behavior 36(1):21–40.
 [CesaBianchi and Lugosi2006] Cesa-Bianchi, N., and Lugosi, G. 2006. Prediction, learning, and games. Cambridge university press.
 [Chandola, Banerjee, and Kumar2007] Chandola, V.; Banerjee, A.; and Kumar, V. 2007. Outlier detection: A survey.
 [Chapelle and Li2011] Chapelle, O., and Li, L. 2011. An empirical evaluation of Thompson sampling. In NIPS, 2249–2257.
 [Crawford2016] Crawford, K. 2016. Artificial intelligence’s white guy problem. New York Times. http://www.nytimes.com/2016/06/26/opinion/sunday/artificialintelligenceswhiteguyproblem.html.
 [Decoste and Schölkopf2002] Decoste, D., and Schölkopf, B. 2002. Training invariant support vector machines. Machine learning 46(1–3):161–190.
 [Elkan2001] Elkan, C. 2001. The foundations of cost-sensitive learning. In IJCAI.
 [Eskin et al.2002] Eskin, E.; Arnold, A.; Prerau, M.; Portnoy, L.; and Stolfo, S. 2002. A geometric framework for unsupervised anomaly detection. In Applications of data mining in computer security. Springer. 77–101.
 [Eskin2000] Eskin, E. 2000. Anomaly detection over noisy data using learned probability distributions. In ICML, 255–262.
 [Fawcett and Provost1997] Fawcett, T., and Provost, F. 1997. Adaptive fraud detection. Data mining and knowledge discovery 1(3):291–316.
 [Feige1998] Feige, U. 1998. A threshold of ln n for approximating set cover. Journal of the ACM 45(4):634–652.
 [Fisher1987] Fisher, D. H. 1987. Knowledge acquisition via incremental conceptual clustering. Machine learning 2(2):139–172.
 [Garivier and Moulines2008] Garivier, A., and Moulines, E. 2008. On upper-confidence bound policies for non-stationary bandit problems. arXiv preprint arXiv:0805.3415.
 [Graepel and Herbrich2004] Graepel, T., and Herbrich, R. 2004. Invariant pattern recognition by semidefinite programming machines. In NIPS, 33.
 [Han, Pei, and Kamber2011] Han, J.; Pei, J.; and Kamber, M. 2011. Data mining: concepts and techniques. Elsevier.
 [Horvitz2008] Horvitz, E. 2008. Artificial intelligence in the open world. Presidential Address, AAAI. http://bit.ly/2gCN7t9.
 [Jiang and Zhai2007] Jiang, J., and Zhai, C. 2007. A two-stage approach to domain adaptation for statistical classifiers. In CIKM, 401–410.
 [Johnson1974] Johnson, D. S. 1974. Approximation algorithms for combinatorial problems. Journal of computer and system sciences 9(3):256–278.
 [Kaggle2013] Kaggle. 2013. Dogs vs cats dataset. https://www.kaggle.com/c/dogsvscats/data.
 [Kim, Rudin, and Shah2014] Kim, B.; Rudin, C.; and Shah, J. A. 2014. The Bayesian case model: A generative approach for case-based reasoning and prototype classification. In NIPS, 1952–1960.
 [Lakkaraju and Leskovec2016] Lakkaraju, H., and Leskovec, J. 2016. Confusions over time: An interpretable Bayesian model to characterize trends in decision making. In NIPS, 3261–3269.
 [Lakkaraju et al.2016] Lakkaraju, H.; Kamar, E.; Caruana, R.; and Horvitz, E. 2016. Discovering blind spots of predictive models: Representations and policies for guided exploration. https://arxiv.org/abs/1610.09064.
 [Lewis and Gale1994] Lewis, D. D., and Gale, W. A. 1994. A sequential algorithm for training text classifiers. In SIGIR, 3–12.
 [Li, Peng, and Wu2008] Li, Z.; Peng, H.; and Wu, X. 2008. A new descriptive clustering algorithm based on nonnegative matrix factorization. In IEEE International Conference on Granular Computing, 407–412.
 [McAuley, Pandey, and Leskovec2015] McAuley, J.; Pandey, R.; and Leskovec, J. 2015. Inferring networks of substitutable and complementary products. In KDD, 785–794.
 [Michalski and Stepp1983] Michalski, R. S., and Stepp, R. E. 1983. Learning from observation: Conceptual clustering. In Machine learning: An artificial intelligence approach. Springer. 331–363.
 [Pang and Lee2004] Pang, B., and Lee, L. 2004. A sentimental education: Sentiment analysis using subjectivity summarization based on minimum cuts. In ACL, 271.
 [Pang and Lee2005] Pang, B., and Lee, L. 2005. Seeing stars: Exploiting class relationships for sentiment categorization with respect to rating scales. In ACL, 115–124.
 [QuioneroCandela et al.2009] Quiñonero-Candela, J.; Sugiyama, M.; Schwaighofer, A.; and Lawrence, N. D. 2009. Dataset Shift in Machine Learning. The MIT Press.
 [Ribeiro, Singh, and Guestrin2016] Ribeiro, M. T.; Singh, S.; and Guestrin, C. 2016. “Why should I trust you?”: Explaining the predictions of any classifier. In KDD.
 [Settles, Craven, and Ray2008] Settles, B.; Craven, M.; and Ray, S. 2008. Multiple-instance active learning. In NIPS, 1289–1296.
 [Settles2009] Settles, B. 2009. Active learning literature survey. Computer Sciences Technical Report 1648, University of Wisconsin–Madison.
 [Seung, Opper, and Sompolinsky1992] Seung, H. S.; Opper, M.; and Sompolinsky, H. 1992. Query by committee. In COLT, 287–294.
 [Shimodaira2000] Shimodaira, H. 2000. Improving predictive inference under covariate shift by weighting the log-likelihood function. Journal of statistical planning and inference 90(2):227–244.
 [Slivkins and Upfal2008] Slivkins, A., and Upfal, E. 2008. Adapting to a changing environment: the Brownian restless bandits. In COLT, 343–354.
 [Teo et al.2007] Teo, C. H.; Globerson, A.; Roweis, S. T.; and Smola, A. J. 2007. Convex learning with invariances. In NIPS, 1489–1496.
 [Weiss2006] Weiss, D. 2006. Descriptive clustering as a method for exploring text collections. Ph.D. Dissertation.
 [Wolpert and Macready1997] Wolpert, D. H., and Macready, W. G. 1997. No free lunch theorems for optimization. IEEE transactions on evolutionary computation 1(1):67–82.
 [Zhang and Oles2000] Zhang, T., and Oles, F. 2000. The value of unlabeled data for classification problems. In ICML, 1191–1198.
 [Zhu, Lafferty, and Ghahramani2003] Zhu, X.; Lafferty, J.; and Ghahramani, Z. 2003. Combining active learning and semi-supervised learning using Gaussian fields and harmonic functions. In ICML workshop on The Continuum from Labeled to Unlabeled Data in Machine Learning and Data Mining.