1 Introduction
In genomics-based personalized or precision medicine, large-scale screenings and sequencing produce thousands of genomic and molecular features for each sample. However, the data sets are small: typically only hundreds, or at most a thousand, cell lines, and even fewer patients, are included. This is also the case for genomic cancer medicine, the focus of this work, which includes gene expression, somatic mutation, copy number variation, and cytogenetic marker measurements characterizing ex vivo bone marrow patient samples for the task of predicting sensitivity to a panel of drugs. Although large-scale genomic studies in cancer have identified recurrent molecular events that predict prognosis and explain pathogenesis, causal effects on drug response have been established for only a few of those features. The limited data pose a challenge for learning predictive models. The established statistical methods for finding predictive features (biomarkers) and learning predictive models are similar across omics-based data analysis tasks. Multivariate analysis of variance
[1] is a classical linear method. Recently, sparse regression models such as LASSO and elastic net [1, 2] have become standard, reliable benchmark methods, and kernel methods enable finding more complex nonlinear combinations of the features [3, 4].

A natural way to solve the particular problems caused by a small sample size is to measure more data. This is, however, often not an available option, due to costs, risks, or the rarity of the disease. Statistical ways of alleviating the problem are multi-task learning [5, 4], which increases predictive power by sharing statistical strength between multiple related outputs or data sets, and incorporating biological prior knowledge. Biological prior knowledge about cancer pathways has been used as side information for learning [3, 4], for feature selection [6, 7], or to modify the regularization of, for instance, an elastic net [8]. Although these methods improve the predictions, they sidestep the problem of what information to choose from the databanks, and naturally cannot include knowledge not yet in the databanks.

A more rarely exploited alternative is to ask an expert. Prior elicitation techniques [9] have been used for constructing prior distributions for Bayesian data analysis that take expert knowledge into account, and hence can restrict the range of parameters to be later used in learning models [10, 11, 12, 13]. These techniques focus on how to reliably elicit knowledge, whereas in practice it is equally important to minimize the effort required from the expert. Interactive and sequential learning can help by carefully deciding what to ask the user, and has been used, for instance, for clustering [14, 15], learning of Bayesian networks [16], and for visualization [17].

Very recently, interactive learning has been proposed for including expert knowledge in a prediction task, in a linear regression setting with a small sample size. First indications that improvements are possible
[18] were obtained with strong assumptions on simulated experts. Human experts were included in two other studies [19, 20] on textual data, obtaining improved predictions with a small number of expert feedbacks, for the tasks of predicting user ratings and predicting citation counts. The elicitation techniques were based on Bayesian experimental design and a multi-armed bandit user model, which helped the user solve the exploration-exploitation dilemma [21] in giving feedback.

In this paper, we introduce sequential knowledge elicitation methods to the precision medicine prediction task, illustrated in Figure 1. As a case study to show the potential of the methods, we use drug sensitivity prediction in patient samples, which is known to be very hard. We predict drug responses of ex vivo cell samples from blood cancer patients, based on mutation data and cytogenetic markers. Two well-informed experts are asked to provide feedback about the relevance of features when predicting sensitivity to specific targeted drugs, and about the direction of the putative effects.
Our main contribution is to show for the first time that sequential expert knowledge elicitation can improve predictive modeling in precision medicine. Specifically, we show that

- expert knowledge elicitation improves the accuracy of drug sensitivity predictions in the difficult case of predicting drug responses from a patient's somatic mutations and cytogenetic markers, which is crucial for choosing which drug to prescribe to a patient in precision medicine, and

- sequential expert knowledge elicitation reduces the number of queries required from the expert compared to a naive approach with randomly chosen queries.
These empirical results required four advances over the recent methods [19, 20] proposed for different elicitation tasks. Firstly, we extend a method previously used in automatic design of experiments to the challenging expert knowledge elicitation task in precision medicine. Secondly, we introduce feedback on the direction of the putative effect and show that it is more effective in improving the drug sensitivity predictions than general relevance feedback. Thirdly, we develop the bandit user model approach to incorporate biological information in the form of pathways and drug targets. Finally, we extend the methods from univariate to multivariate outputs (sensitivity to multiple drugs, with feedback given on (drug, feature) pairs).
2 Problem setup
In cancer treatment, doctors need to choose which of the available drugs to administer to new patients. Machine learning could assist in this choice by predicting the drug responses of patients from their genomic features. The available patient data sets, however, are too small for accurate learning of predictive models. Here we aim to improve the drug response predictions in this challenging setting. In particular, we propose sequential expert knowledge elicitation as a solution to the problem of drug response prediction given a small sample size. In our specific case, data from 51 blood cancer patients are available, with 3032 genomic features (mutations and cytogenetic markers) considered, and we wish to predict the responses for 12 drugs.
Experts in blood cancer medicine have knowledge of biomarkers, and could also associate other features with the drug responses based on their experience. Unfortunately, naively querying the experts about every feature is burdensome given the large number of features. Therefore, we introduce two sequential knowledge elicitation algorithms that are able to choose the (drug, feature) pairs with the highest expected effect on improving the predictions, and compare their performance in this precision medicine case.
We assume that the experts are able to answer two types of questions regarding the effect of a feature on the drug response. The first is whether a feature is relevant (or irrelevant) for predicting the drug response; this type of feedback was used in [19] for textual data. In addition, the experts may possess knowledge of the direction of the effect for a subset of relevant features. This directional feedback tells whether a feature is positively or negatively correlated with the drug response, extending the work in [20], where the feedback concerned only positive correlations and was used in a textual data application. The precise mathematical formulation of the effect of the two feedback types on the prediction model is given in Section 3.1.
3 Models and algorithms
In this section, we describe the proposed models and algorithms for sequential expert knowledge elicitation. First, we describe a sparse linear regression model that is used to learn the relationship between genomic features and drug responses, and which takes into account the elicited expert knowledge. Then we introduce the two elicitation methods developed for the case study in precision medicine.
3.1 Prediction model
Sparse linear regression models are used to predict the drug sensitivities based on the genomic features. Let $y_{ij}$ be the sensitivity of the $i$th patient for drug $j$, and $\mathbf{x}_i$ be the vector of the patient's genomic features. We assume a Gaussian observation model:
$$y_{ij} \sim \mathcal{N}(\mathbf{x}_i^\top \mathbf{w}_j, \sigma^2),$$
where the $\mathbf{w}_j$ are the regression weights and $\sigma^2$ is the residual variance. A sparsity-inducing spike-and-slab prior [22, 23] is put on the weights:
$$w_{jk} \mid \gamma_{jk} \sim \gamma_{jk}\, \mathcal{N}(0, \tau^2) + (1 - \gamma_{jk})\, \delta_0, \qquad \gamma_{jk} \sim \mathrm{Bernoulli}(\rho),$$
where $\gamma_{jk}$ is a binary variable indicating whether the $k$th feature is relevant (i.e., drawn from a zero-mean Gaussian prior with variance $\tau^2$) or not ($w_{jk}$ is set to zero via the Dirac delta spike $\delta_0$) when predicting for the $j$th drug. The prior probability of relevance $\rho$ controls the expected sparsity of the model. The model is completed with hyperpriors on $\sigma^2$, $\tau^2$, and $\rho$; the settings for the hyperparameter values are discussed in Section 4.1.

Expert knowledge is incorporated into the model via feedback observation models [19]. The relevance feedback $f_{jk}$ ($f_{jk} = 0$ denotes irrelevant, $f_{jk} = 1$ relevant) of feature $k$ for drug $j$ follows
$$p(f_{jk} \mid \gamma_{jk}) = \pi^{\,\mathbb{1}[f_{jk} = \gamma_{jk}]} (1 - \pi)^{\,\mathbb{1}[f_{jk} \neq \gamma_{jk}]},$$
where $\pi$ is the probability of the expert being correct. For example, when the $k$th feature for drug $j$ is relevant in the regression model (i.e., $\gamma_{jk} = 1$), the expert would a priori be assumed to say $f_{jk} = 1$ with probability $\pi$. In the model learning, once the expert has provided the feedback based on his or her knowledge, $\pi$ effectively controls how strongly the model will change to reflect the feedback.

The directional feedback $d_{jk}$ ($d_{jk} = -1$ denotes a negative weight, $d_{jk} = 1$ a positive one) follows
$$p(d_{jk} \mid w_{jk}) = \pi^{\,\mathbb{1}[\operatorname{sign}(w_{jk}) = d_{jk}]} (1 - \pi)^{\,\mathbb{1}[\operatorname{sign}(w_{jk}) \neq d_{jk}]},$$
where $\mathbb{1}[\cdot] = 1$ when the condition holds and $0$ otherwise, and $\pi$ is again the probability of the expert being correct. For example, when the weight $w_{jk}$ is positive, the expert would a priori be assumed to say $d_{jk} = 1$ with probability $\pi$. To simplify the model, we assume the same $\pi$ for both feedback types and set a Beta prior on $\pi$. The prediction model and learning are detailed in Appendix A.
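As a concrete illustration, the generative model and the two feedback observation models above can be simulated as follows. This is a minimal sketch, not the implementation used in the paper; the problem sizes and hyperparameter values are illustrative assumptions, and the two feedback functions play the role of a simulated expert who is correct with probability $\pi$.

```python
import numpy as np

rng = np.random.default_rng(0)

n, p, n_drugs = 51, 100, 12          # illustrative sizes (patients, features, drugs)
rho, tau2, sigma2 = 0.05, 1.0, 0.5   # assumed sparsity, slab variance, noise variance
pi_correct = 0.9                     # probability that the expert is correct

# Spike-and-slab weights: gamma[j, k] selects the slab N(0, tau2) or the spike at 0.
gamma = rng.random((n_drugs, p)) < rho
w = np.where(gamma, rng.normal(0.0, np.sqrt(tau2), (n_drugs, p)), 0.0)

# Gaussian observation model: y[i, j] ~ N(x_i . w_j, sigma2)
X = rng.normal(size=(n, p))
Y = X @ w.T + rng.normal(0.0, np.sqrt(sigma2), (n, n_drugs))

def relevance_feedback(j, k):
    """Simulated expert: reports gamma[j, k], flipped with probability 1 - pi."""
    correct = rng.random() < pi_correct
    return int(gamma[j, k]) if correct else 1 - int(gamma[j, k])

def directional_feedback(j, k):
    """Simulated expert: reports sign(w[j, k]) as +1/-1, flipped with prob. 1 - pi."""
    true_dir = 1 if w[j, k] > 0 else -1
    return true_dir if rng.random() < pi_correct else -true_dir
```

Conditioning the posterior on such feedback values (rather than generating them) is what incorporates the expert knowledge into the model.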
3.2 Expert knowledge elicitation methods
The purpose of expert knowledge elicitation algorithms is to sequentially choose queries to the expert, so that the improvement in predictions is maximized. In the case of genomic data, very few genes are known to be associated with drug responses, and therefore restricting the suggestions to the possibly known genes greatly reduces the number of queries. Previously, two expert knowledge elicitation methods have shown promising results for prediction of a single outcome variable on textual data. We extend these methods to the multioutput precision medicine setting. Next, we will introduce the two alternative elicitation methods, the performances of which will be evaluated and compared in the experiments.
3.2.1 Sequential experimental design
We introduce a sequential experimental design approach to select the next (drug, feature) pair to be queried for feedback from the expert, extending the work in [19]. Specifically, at each iteration, we find the pair where the feedback from the expert is expected to have the maximal influence on the drug sensitivity prediction. The amount of information in the expert feedback is measured by the Kullback–Leibler divergence (KL) between the predictive distributions before and after observing the feedback. As the feedback value itself is unobserved before the actual query, an expectation over the predictive distributions of the two types of feedback is computed when finding the (drug, feature) pair with the highest expected information gain:
$$(j^*, k^*) = \operatorname*{arg\,max}_{(j,k)} \; \mathbb{E}_{f_{jk} \mid \mathcal{D}, \mathcal{F}} \left[ \sum_i \mathrm{KL}\!\left( p(\tilde{y}_{ij} \mid \mathcal{D}, \mathcal{F} \cup \{f_{jk}\}) \,\middle\|\, p(\tilde{y}_{ij} \mid \mathcal{D}, \mathcal{F}) \right) \right],$$
where $\mathcal{D}$ denotes the observed drug sensitivities for the patients and drugs together with the genomic features, and $\mathcal{F}$ is the set of feedbacks given before the current query iteration (the expectation runs analogously over directional feedbacks $d_{jk}$). The summation over $i$ runs over the training data. The KL divergences measure the influence that the feedback on feature $k$ would have on the predictive distribution of the $i$th patient for drug $j$. Once the query is selected and presented to the expert, the provided feedback is added to the set $\mathcal{F}$. Queries where the expert is not able to provide an answer do not affect the prediction model, but are added to the set so as not to be repeated.
Using the approximated posterior distribution (see Appendix A), the posterior predictive distribution of the relevance and directional feedback follows a product of Bernoulli distributions. The approximate posterior predictive distribution of $\tilde{y}_{ij}$ follows a Gaussian distribution, which makes the KL divergence calculation simple. Calculating the expected information gain for each (drug, feature) pair requires four posterior approximations, which would make the query phase too costly. We follow a similar approach as in [19], and approximate the posterior with the new feedbacks using only partial expectation propagation updates.

3.2.2 User model
We introduce another approach for selecting the next (drug, feature) pair candidate using a multi-armed bandit user model, extending the work in [20]. The benefit of bandit user modeling is that the model learns from the previous answers of the expert, and can guide the elicitation towards (drug, feature) pairs that will most likely get an answer from the expert (exploitation), while balancing this against the exploration of uncertain pairs. We borrow this idea from the bandit literature (see, for instance, [24]) to ensure that our user model concentrates the queries on the (drug, feature) pairs that are likely to get an answer from the expert.
The user model predicts the expected response of the expert for each query, in order to select the query on which we ask for feedback next. We follow a linear bandit model [21] and previous work on user intent modeling [25, 20], where the estimate for each query is given by a dot product between a feature vector describing the query (henceforth the description vector $\mathbf{c}$) and an unknown parameter vector $\boldsymbol{\theta}$ that gives the relevance of the queries. The expected response is then $\hat{r} = \mathbf{c}^\top \boldsymbol{\theta}$, and at iteration $t$, $\boldsymbol{\theta}$ is estimated using standard regularized linear regression, $\hat{\boldsymbol{\theta}}_t = (\mathbf{C}_t^\top \mathbf{C}_t + \lambda \mathbf{I})^{-1} \mathbf{C}_t^\top \mathbf{r}_t$. Here $\lambda$ is a regularizer, $\mathbf{C}_t$ is a matrix containing the description vectors of the pairs that have received feedback before or at iteration $t$, and $\mathbf{r}_t$ similarly contains the responses of the expert before or at iteration $t$. The response of the expert is $r = 1$ if the feedback for the pair is either "relevant", directional, or "irrelevant", and $r = 0$ if the answer is "I don't know". The default response is set to 0.5. The model chooses the pair for the next query based on the upper confidence bound criterion [21, 26]. The details of the user model are given in Appendix B.

A simple, common choice for the description vector would be to use directly the patient measurements (as done in previous works, for instance, for news article recommendation, where the description vector corresponding to each news article was given by the existing features in the news article data set and the user features [27]). However, in the more difficult case of precision medicine, this simple description vector definition would not lead to good performance due to the small sample size. Furthermore, previous studies show that the use of auxiliary data is effective in both drug response prediction [3] and interactive expert knowledge elicitation [20]. Thus, we introduce description vectors for each (drug $j$, feature $k$) pair, constructed by using prior knowledge in the form of KEGG pathways from the Molecular Signatures Database (MSigDB) [28], and the drug target genes from DrugBank [29].
Specifically, we first indicate whether the feature is the target of the drug, and then whether the feature belongs to the same pathway as the target of the drug. The description of mutation features includes an indication of which pathways the mutated gene belongs to. In our experiments (see Section 4), this results in description vectors of length 133, comprising 1 descriptor of the feature type (mutation or cytogenetic marker), 2 descriptors of the (drug, feature) pair, and 130 KEGG pathways containing any of the included genes, as specified in Section 4.2.
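The user model update and upper-confidence-bound selection can be sketched as below. The incremental ridge update and the exploration weight `alpha` are standard linear-UCB components [21, 26, 27]; the parameter values here are illustrative assumptions, and in the paper the vectors `c` would be the pathway- and target-based descriptions constructed above.

```python
import numpy as np

class BanditUserModel:
    """Linear UCB user model over (drug, feature) query description vectors.

    Responses r are 1 if the expert gave feedback ("relevant", directional,
    or "irrelevant") and 0 for "I don't know"; unqueried pairs default to 0.5.
    """

    def __init__(self, dim, lam=1.0, alpha=1.0):
        self.A = lam * np.eye(dim)   # C^T C + lam * I, updated incrementally
        self.b = np.zeros(dim)       # C^T r
        self.alpha = alpha           # exploration weight

    def update(self, c, r):
        """Add one answered query with description vector c and response r."""
        self.A += np.outer(c, c)
        self.b += r * c

    def select(self, candidates):
        """Return the index of the candidate with the highest UCB score."""
        A_inv = np.linalg.inv(self.A)
        theta = A_inv @ self.b       # ridge estimate of the response weights
        ucb = [c @ theta + self.alpha * np.sqrt(c @ A_inv @ c) for c in candidates]
        return int(np.argmax(ucb))
```

The confidence term shrinks for description directions that have been queried often, so the model gradually shifts from exploration to exploiting pairs resembling those the expert could answer.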
4 Experiments
In order to evaluate the proposed methods, we apply them to real patient data and use feedback from well-informed experts to simulate sequential knowledge elicitation. Details of the data set and the expert feedback collection are presented in the next section, followed by the experimental results showing the effectiveness of the methods in practice.
4.1 Experimental methods
We used a complete set of measurements on ex vivo drug response, somatic mutations, and karyotype data (cytogenetic markers), generated for a cohort of 51 multiple myeloma patient samples. Drug responses are presented as drug sensitivity scores (DSS), as described in [30], and were calculated for 308 drugs tested at 5 different concentrations over a 1000-fold concentration range. Somatic mutations were identified from exome sequencing data and annotated as described earlier in [31].
We focus our analysis on 12 targeted drugs, grouped into 4 groups based on their primary targets (BCL2, Glucocorticoid, PI3K/mTOR, and MEK1/2). This results in data matrices of size 51 × 12 (samples vs. drugs), 51 × 3025 (samples vs. mutations), and 51 × 7 (samples vs. cytogenetic markers). All data are normalized to have zero mean and unit variance. In this paper we ask the experts only about the somatic mutations and cytogenetic markers, which the experts know better and hence can assess faster in the experiments. We will extend to molecular features with less well-known effects, such as gene expression, in follow-up work.
We use leave-one-out cross-validation to estimate the performance of the drug sensitivity prediction models; that is, in computing the predictions for each patient, that particular patient is not used in learning the prediction model. As performance measures we use the concordance index (C-index; the probability of predicting the correct order for a pair of samples; higher is better) [32, 3] and the mean squared error (MSE; lower is better). We note that a C-index computed from leave-one-out cross-validation can be biased, as it compares predictions for pairs of samples; we do not expect this to favour any particular method. MSE values are given in the normalized DSS units (zero mean, unit variance scaling on training data). The Bayesian bootstrap [33] over the predictions is used to evaluate the uncertainty in pairwise model comparisons: in particular, we compute the probability that model $M_1$ is better than model $M_2$ as the bootstrap expectation of $\mathbb{1}[e(M_1) < e(M_2)]$, where $e(\cdot)$ is the error of the model and $\mathbb{1}[\text{condition}] = 1$ if the condition holds and $0$ otherwise [34].
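For concreteness, the two performance measures and the bootstrap comparison can be computed as in the following sketch. It is a straightforward re-implementation under the stated definitions, not the authors' code; `err1` and `err2` would hold the per-patient errors of the two models being compared.

```python
import numpy as np

def c_index(y_true, y_pred):
    """Probability of predicting the correct order for a pair of samples."""
    n_concordant, n_pairs = 0, 0
    n = len(y_true)
    for i in range(n):
        for j in range(i + 1, n):
            if y_true[i] == y_true[j]:
                continue                      # ties carry no ordering information
            n_pairs += 1
            if (y_pred[i] - y_pred[j]) * (y_true[i] - y_true[j]) > 0:
                n_concordant += 1
    return n_concordant / n_pairs

def prob_better(err1, err2, n_boot=1000, seed=0):
    """Bayesian bootstrap estimate of P(model 1 has lower weighted mean error
    than model 2), using Dirichlet(1, ..., 1) weights over samples [33]."""
    rng = np.random.default_rng(seed)
    err1, err2 = np.asarray(err1), np.asarray(err2)
    w = rng.dirichlet(np.ones(len(err1)), size=n_boot)
    return np.mean(w @ err1 < w @ err2)
```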
The hyperparameters of the prediction model were set to reflect relatively vague information on the residual variance, a minor preference for sparse models and moderate effect sizes, and the a priori quality of the expert knowledge as 9 correct feedbacks out of 10. The regularization parameter $\lambda$ in the user model follows the value used in [20].
4.2 Feedback collection
We expect that the experts can give feedback on the relevance and on the direction of the putative effect of features in predicting the response to a drug. As observed in practice, the effect of a feature is often specific to a certain type of drug; therefore, we decided to elicit feedback on (drug, feature) pairs. Furthermore, the experts indicated that the same feedback applies to all drugs in the same drug group. Specifically, we collected feedback from two well-informed experts on multiple myeloma, using a form containing 161 mutations known to be related to cancer [35] and 7 cytogenetic markers. The experts were asked to give feedback specific to 12 targeted drugs, grouped by their targets (BCL2, Glucocorticoid, PI3K/mTOR, and MEK1/2). The answer counts by feedback type can be found in Appendix C. The experts were instructed not to refer to external databases while completing the feedback form, in order to collect their (tacit) prior knowledge on the problem and make the task faster for them.
4.3 Simulated user experiment
We simulate sequential expert knowledge elicitation by iteratively querying (drug, feature) pairs for feedback, and answering the queries using the precollected feedback described in Section 4.2. At each iteration, the models are updated and the next pair is chosen, based on the feedback elicited up to that iteration and the measurement data set, which does not change. We run three simulations for comparing the elicitation methods: two where the pairs are chosen using one of the methods presented in Sections 3.2.1 and 3.2.2, and one where the pair is chosen randomly. The pairs are selected without replacement from the 2016 (= 12 × (161 + 7)) pairs included in the feedback collection. The remaining 2864 mutations are not queried for feedback, but all 3025 mutations are included in the prediction model.
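The simulated user experiment then amounts to a loop of the following form. The callables are hypothetical placeholders: `select_query` stands for either method of Sections 3.2.1–3.2.2 (or random choice), `update_model` for the posterior update with a new feedback, and `evaluate` for the cross-validated error of Section 4.1.

```python
def run_elicitation(candidate_pairs, precollected, select_query, update_model,
                    evaluate, n_iters):
    """Iteratively query precollected expert answers and track prediction error.

    candidate_pairs: (drug, feature) pairs from the feedback form;
    precollected: dict mapping a pair to its answer (None for "I don't know").
    Returns the error after each iteration, starting from no feedback.
    """
    remaining = set(candidate_pairs)
    answered = {}                    # feedback set F, grown without replacement
    errors = [evaluate(answered)]    # performance before any feedback
    for _ in range(n_iters):
        if not remaining:
            break
        pair = select_query(remaining, answered)
        remaining.discard(pair)      # never query the same pair twice
        answer = precollected.get(pair)
        if answer is not None:       # "I don't know" leaves the model unchanged
            answered[pair] = answer
            update_model(pair, answer)
        errors.append(evaluate(answered))
    return errors
```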
4.4 Results
We present here the two main results of the experiments. Further supplementary results can be found in Appendix D.
Expert knowledge elicitation improves the accuracy of drug sensitivity prediction. Table 1 establishes the baselines by comparing the prediction model we use, the spike-and-slab regression model without expert feedback, to constant prediction of the training data mean, ridge regression, and elastic net regression (ridge regression and elastic net are implemented using the glmnet R package [36], with nested cross-validation for choosing the regularization parameters). Elastic net has poor performance with regard to MSE on this data set. The ridge and spike-and-slab models have comparable performance, with bootstrapped probabilities of 0.87 of ridge being better in C-index and 0.42 in MSE.

Table 2 compares the spike-and-slab model without feedback to the models with all expert feedback. The knowledge of both experts improves the predictions. The model with feedback from the senior researcher has a 7% higher C-index and 8% lower MSE than the no-feedback model, and is confidently better according to the bootstrapped probabilities (0.98 for C-index and 0.95 for MSE). The predictions improve for all of the 12 drugs considered in the experiment. Detailed results of drug-wise predictions are provided in Appendix D.
Table 1:

          Data mean   Ridge   Elastic net   Spike-and-slab
C-index   0.50        0.62    0.60          0.61
MSE       1.06        0.94    1.00          0.93
Table 2:

          No feedback   Doctoral candidate   Senior researcher
C-index   0.61          0.63                 0.65
MSE       0.93          0.92                 0.86
Sequential knowledge elicitation reduces the number of queries required from the expert. In the results presented so far, the experts had evaluated all (drug, feature) pairs and given their answers. However, sequential knowledge elicitation has the potential to reduce this workload significantly. We compare the effectiveness of the elicitation methods developed in this paper using a simulated user experiment (see Section 4.3). The results in Figure 2 show that both methods achieve faster improvement in prediction accuracy than random selection, as a function of the amount of feedback. We use the area under the MSE curves (AUC) to evaluate the significance of the improvements in predictions of our knowledge elicitation methods compared to random selection, by comparing the AUC values of our methods to the empirical distribution of the AUC values of 50 independent runs of the random selection method. With the senior researcher feedback, the improvement over random selection is significant for both methods; with the doctoral candidate feedback, the significance levels differ between the sequential experimental design and the bandit user model. With sequential knowledge elicitation, 50% of the final improvement is reached in the first 496 (43) and 191 (562) feedbacks, for the experimental design and the bandit user model respectively, using the senior researcher feedback (doctoral candidate feedback). For comparison, an average of 1139 (1228) feedbacks is required for similar accuracy if the queries are chosen randomly. Thus, on average, the sequential experimental design requires only 23% of the number of queries compared to random, and the bandit user model 32%, to achieve half of the potential improvement.
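The AUC comparison described above can be implemented as in this short sketch; the trapezoidal integration and the add-one-smoothed empirical p-value are illustrative choices, and the paper's exact procedure may differ in detail.

```python
import numpy as np

def mse_auc(mse_curve):
    """Area under the MSE-vs-number-of-feedbacks curve (trapezoidal rule).
    A lower AUC means faster improvement in prediction accuracy."""
    y = np.asarray(mse_curve, dtype=float)
    return float(np.sum((y[1:] + y[:-1]) / 2.0))

def empirical_p_value(method_auc, random_aucs):
    """Fraction of random-selection runs with an AUC at least as low as the
    method's (one-sided test, with add-one smoothing for a nonzero floor)."""
    random_aucs = np.asarray(random_aucs)
    return (1 + np.sum(random_aucs <= method_auc)) / (1 + len(random_aucs))
```

Here `random_aucs` would hold the AUC values of the 50 independent random-selection runs, and `method_auc` the AUC of one elicitation method.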
5 Conclusion
In this work we show, for the first time, that sequential expert knowledge elicitation improves drug sensitivity prediction in precision cancer medicine. We also show, in a simulated user experiment with real expert feedback, that the proposed algorithms can elicit knowledge from experts efficiently. The results indicate that expert knowledge can be very beneficial and, hence, should be taken into account in the modeling tasks of precision medicine. Doctors and researchers analyze the data regardless of the advances in automated methods, and not taking their knowledge and expertise into account is to neglect one possible source of information in a setting where the lack of data is a significant problem.
Our results were based on knowledge elicited from two experts only. Nevertheless, a significant improvement in knowledge elicitation performance was observed even for each of them individually. In the future we will carry out a wider study to thoroughly quantify the effect of expert feedback, and to investigate further the initial observations about the impact of the type of feedback and the level of seniority of the experts. We note that the experts in this study had seen the data before, but to minimize the risk of overfitting they were instructed to answer based on their knowledge without consulting the data; in follow-up work we will recruit experts who are completely naive to the particular data.
We found that the most efficient elicitation method was different for the two experts. An obvious next question is how to combine the two elicitation methods to optimally utilize the complementary principles in them. In addition, we have shown here the improvement in sparse linear regression models. The next step will be to extend the method to more complex nonlinear models, and study how to maximally benefit from the responses of multiple experts.
Acknowledgements
This work was supported by the Academy of Finland [grant numbers 295503, 294238, 292334, 286607, 294015] and Centre of Excellence in Computational Inference Research COIN; and by Jenny and Antti Wihuri Foundation. We acknowledge the computational resources provided by the Aalto ScienceIT project.
References
 [1] M. J. Garnett, E. J. Edelman, S. J. Heidorn, C. D. Greenman, A. Dastur, K. W. Lau, P. Greninger, I. R. Thompson, X. Luo, J. Soares, et al., “Systematic identification of genomic markers of drug sensitivity in cancer cells,” Nature, vol. 483, no. 7391, pp. 570–575, 2012.
 [2] I. S. Jang, E. C. Neto, J. Guinney, S. H. Friend, and A. A. Margolin, “Systematic assessment of analytical methods for drug sensitivity prediction from cancer cell line data,” in Pacific Symposium on Biocomputing, pp. 63–74, 2014.
 [3] J. C. Costello, L. M. Heiser, E. Georgii, M. Gönen, M. P. Menden, N. J. Wang, M. Bansal, P. Hintsanen, S. A. Khan, J.P. Mpindi, et al., “A community effort to assess and improve drug sensitivity prediction algorithms,” Nature Biotechnology, vol. 32, no. 12, pp. 1202–1212, 2014.
 [4] M. Ammad-ud-din, S. A. Khan, D. Malani, A. Murumägi, O. Kallioniemi, T. Aittokallio, and S. Kaski, “Drug response prediction by inferring pathway-response associations with kernelized Bayesian matrix factorization,” Bioinformatics, vol. 32, no. 17, pp. i455–i463, 2016.
 [5] H. Yuan, I. Paskov, H. Paskov, A. J. González, and C. S. Leslie, “Multitask learning improves prediction of cancer drug sensitivity,” Scientific Reports, vol. 6, p. 31619, 2016.
 [6] I. S. Jang, R. Dienstmann, A. A. Margolin, and J. Guinney, “Stepwise group sparse regression (SGSR): gene-set-based pharmacogenomic predictive models with stepwise selection of functional priors,” in Pacific Symposium on Biocomputing, vol. 20, pp. 32–43, 2015.
 [7] C. De Niz, R. Rahman, X. Zhao, and R. Pal, “Algorithms for drug sensitivity prediction,” Algorithms, vol. 9, no. 4, p. 77, 2016.
 [8] A. Sokolov, D. E. Carlin, E. O. Paull, R. Baertsch, and J. M. Stuart, “Pathway-based genomics prediction using generalized elastic net,” PLoS Computational Biology, vol. 12, no. 3, p. e1004790, 2016.
 [9] A. O’Hagan, C. E. Buck, A. Daneshkhah, J. R. Eiser, P. H. Garthwaite, D. J. Jenkinson, J. E. Oakley, and T. Rakow, Uncertain judgements: Eliciting experts’ probabilities. Chichester, England: Wiley, 2006.
 [10] P. H. Garthwaite and J. M. Dickey, “Quantifying expert opinion in linear regression problems,” Journal of the Royal Statistical Society. Series B (Methodological), pp. 462–474, 1988.
 [11] P. H. Garthwaite, S. A. AlAwadhi, F. G. Elfadaly, and D. J. Jenkinson, “Prior distribution elicitation for generalized linear and piecewiselinear models,” Journal of Applied Statistics, vol. 40, no. 1, pp. 59–75, 2013.
 [12] J. B. Kadane, J. M. Dickey, R. L. Winkler, W. S. Smith, and S. C. Peters, “Interactive elicitation of opinion for a normal linear model,” Journal of the American Statistical Association, vol. 75, no. 372, pp. 845–854, 1980.
 [13] H. Afrabandpey, T. Peltola, and S. Kaski, “Interactive prior elicitation of feature similarities for small sample size prediction,” in Proceedings of the 25th International Conference on User Modelling, Adaptation and Personalization (UMAP ’17), to appear. ArXiv preprint arXiv:1612.02802, 2016.
 [14] Z. Lu and T. K. Leen, “Semi-supervised clustering with pairwise constraints: A discriminative approach,” in Proc of AISTATS, pp. 299–306, 2007.
 [15] M.F. Balcan and A. Blum, “Clustering with interactive feedback,” in International Conference on Algorithmic Learning Theory, pp. 316–328, Springer, 2008.
 [16] A. Cano, A. R. Masegosa, and S. Moral, “A method for integrating expert knowledge when learning Bayesian networks from data,” IEEE Transactions on Systems, Man, and Cybernetics, Part B (Cybernetics), vol. 41, no. 5, pp. 1382–1394, 2011.
 [17] L. House, L. Scotland, and C. Han, “Bayesian visual analytics: BaVa,” Statistical Analysis and Data Mining, vol. 8, no. 1, pp. 1–13, 2015.
 [18] M. Soare, M. Ammad-ud-din, and S. Kaski, “Regression with $n \ll p$ by expert knowledge elicitation,” in Proceedings of the 15th IEEE International Conference on Machine Learning and Applications (ICMLA), pp. 734–739, 2016.
 [19] P. Daee, T. Peltola, M. Soare, and S. Kaski, “Knowledge elicitation via sequential probabilistic inference for high-dimensional prediction,” arXiv preprint arXiv:1612.03328, 2016.
 [20] L. Micallef, I. Sundin, P. Marttinen, M. Ammad-ud-din, T. Peltola, M. Soare, G. Jacucci, and S. Kaski, “Interactive elicitation of knowledge on feature relevance improves predictions in small data sets,” in Proceedings of the 22nd International Conference on Intelligent User Interfaces (IUI ’17), pp. 547–552, 2017.
 [21] P. Auer, “Using confidence bounds for exploitation-exploration trade-offs,” Journal of Machine Learning Research, vol. 3, pp. 397–422, 2002.
 [22] T. J. Mitchell and J. J. Beauchamp, “Bayesian variable selection in linear regression,” Journal of the American Statistical Association, vol. 83, no. 404, pp. 1023–1032, 1988.
 [23] E. I. George and R. E. McCulloch, “Variable selection via Gibbs sampling,” Journal of the American Statistical Association, vol. 88, no. 423, pp. 881–889, 1993.
 [24] T. L. Lai and H. Robbins, “Asymptotically efficient adaptive allocation rules,” Advances in Applied Mathematics, vol. 6, no. 1, pp. 4–22, 1985.
 [25] T. Ruotsalo, G. Jacucci, P. Myllymäki, and S. Kaski, “Interactive intent modeling: information discovery beyond search,” CACM, vol. 58, no. 1, pp. 86–92, 2015.
 [26] W. Chu, L. Li, L. Reyzin, and R. E. Schapire, “Contextual bandits with linear payoff functions,” in Proc of AISTATS, pp. 208–214, 2011.
 [27] L. Li, W. Chu, J. Langford, and R. E. Schapire, “A contextual-bandit approach to personalized news article recommendation,” in Proceedings of the 19th International Conference on World Wide Web, WWW ’10, pp. 661–670, 2010.
 [28] A. Liberzon, A. Subramanian, R. Pinchback, H. Thorvaldsdóttir, P. Tamayo, and J. P. Mesirov, “Molecular signatures database (MSigDB) 3.0,” Bioinformatics, vol. 27, no. 12, pp. 1739–1740, 2011.
 [29] D. S. Wishart, C. Knox, A. C. Guo, S. Shrivastava, M. Hassanali, P. Stothard, Z. Chang, and J. Woolsey, “DrugBank: a comprehensive resource for in silico drug discovery and exploration,” Nucleic Acids Research, vol. 34, pp. D668–D672, 2006.
 [30] B. Yadav, T. Pemovska, A. Szwajda, E. Kulesskiy, M. Kontro, R. Karjalainen, M. M. Majumder, D. Malani, A. Murumägi, J. Knowles, et al., “Quantitative scoring of differential drug sensitivity for individually optimized anticancer therapies,” Scientific Reports, vol. 4, p. 5193, 2014.
 [31] M. Kontro, H. Kuusanmäki, S. Eldfors, T. Burmeister, E. Andersson, Ø. Bruserud, T. Brümmendorf, H. Edgren, B. Gjertsen, M. Itälä-Remes, et al., “Novel activating STAT5B mutations as putative drivers of T-cell acute lymphoblastic leukemia,” Leukemia, vol. 28, no. 8, pp. 1738–1742, 2014.
 [32] F. Harrell, Regression modeling strategies: with applications to linear models, logistic and ordinal regression, and survival analysis. Springer, 2nd ed., 2015.
 [33] D. B. Rubin, “The Bayesian bootstrap,” The Annals of Statistics, vol. 9, no. 1, pp. 130–134, 1981.
 [34] A. Vehtari and J. Lampinen, “Bayesian model assessment and comparison using cross-validation predictive densities,” Neural Computation, vol. 14, no. 10, pp. 2439–2468, 2002.
 [35] S. A. Forbes, D. Beare, P. Gunasekaran, K. Leung, N. Bindal, H. Boutselakis, M. Ding, S. Bamford, C. Cole, S. Ward, C. Y. Kok, M. Jia, T. De, J. W. Teague, M. R. Stratton, U. McDermott, and P. J. Campbell, “COSMIC: exploring the world’s knowledge of somatic mutations in human cancer,” Nucleic Acids Research, vol. 43, pp. D805–D811, 2014.
 [36] J. Friedman, T. Hastie, and R. Tibshirani, “Regularization paths for generalized linear models via coordinate descent,” Journal of Statistical Software, vol. 33, no. 1, pp. 1–22, 2010.
 [37] T. P. Minka and J. Lafferty, “Expectation-propagation for the generative aspect model,” in Proceedings of the Eighteenth Conference on Uncertainty in Artificial Intelligence, pp. 352–359, 2002.
 [38] J. M. Hernández-Lobato, D. Hernández-Lobato, and A. Suárez, “Expectation propagation in linear regression models with spike-and-slab priors,” Machine Learning, vol. 99, no. 3, pp. 437–487, 2015.
Appendix A. Prediction model
Sparse linear regression models are used to predict the drug sensitivities based on the genomic features. Let $y_{ij}$ be the sensitivity of the $i$th patient for drug $j$, and let $\mathbf{x}_i$ be the vector of the patient's genomic features. We assume a Gaussian observation model:
$$y_{ij} \sim \mathcal{N}(\mathbf{x}_i^\top \mathbf{w}_j, \sigma^2),$$
where $\mathbf{w}_j$ is the vector of regression weights for drug $j$ and $\sigma^2$ is the residual variance. A sparsity-inducing spike-and-slab prior [22, 23] is placed on the weights:
$$w_{kj} \mid \gamma_{kj} \sim \gamma_{kj}\,\mathcal{N}(0, \tau^2) + (1 - \gamma_{kj})\,\delta_0,$$
where $\gamma_{kj}$ is a binary variable indicating whether the $k$th feature is relevant (i.e., $w_{kj}$ is drawn from a zero-mean Gaussian prior with variance $\tau^2$) or not ($w_{kj}$ is set to zero via the Dirac delta spike $\delta_0$) when predicting for the $j$th drug. The prior probability of relevance $\rho$ controls the expected sparsity of the model via the prior
$$\gamma_{kj} \sim \mathrm{Bernoulli}(\rho).$$
The model is completed with hyperpriors on $\sigma^2$, $\tau^2$, and $\rho$.
Settings for the values of the hyperparameters are discussed in Section 4.1.
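As an illustration, the generative process specified above can be sketched in a few lines of code. This is a minimal sketch: the dimensions and the hyperparameter values below are illustrative placeholders, not the settings used in the experiments.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative dimensions and hyperparameter values (placeholders).
n_features, n_drugs = 20, 3
rho, tau2, sigma2 = 0.1, 1.0, 0.5

# Spike-and-slab prior: gamma[k, j] = 1 draws w[k, j] from the Gaussian slab,
# gamma[k, j] = 0 sets the weight exactly to zero (the Dirac spike).
gamma = rng.binomial(1, rho, size=(n_features, n_drugs))
w = gamma * rng.normal(0.0, np.sqrt(tau2), size=(n_features, n_drugs))

# Gaussian observation model: one patient's feature vector x gives a
# noisy sensitivity prediction for each of the drugs.
x = rng.normal(size=n_features)
y = x @ w + rng.normal(0.0, np.sqrt(sigma2), size=n_drugs)
```

Multiplying the slab draws by the binary indicators reproduces the mixture prior exactly: weights with $\gamma_{kj} = 0$ are identically zero, which is what makes the model sparse.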
Given the observed drug sensitivities $\mathbf{Y} = (y_{ij})$ for $n$ patients and $m$ drugs and the genomic features $\mathbf{X} = (\mathbf{x}_1, \ldots, \mathbf{x}_n)^\top$, the posterior distribution of the model parameters $\boldsymbol{\theta} = (\mathbf{W}, \boldsymbol{\gamma}, \sigma^2, \tau^2, \rho)$ is computed via the Bayes theorem as
$$p(\boldsymbol{\theta} \mid \mathbf{X}, \mathbf{Y}) = \frac{p(\mathbf{Y} \mid \mathbf{X}, \boldsymbol{\theta})\, p(\boldsymbol{\theta})}{p(\mathbf{Y} \mid \mathbf{X})}.$$
The posterior distribution of $\boldsymbol{\theta}$ together with the observation model is then used to compute the predictive distribution of the drug sensitivities for a new data point $\tilde{\mathbf{x}}$:
$$p(\tilde{y}_j \mid \tilde{\mathbf{x}}, \mathbf{X}, \mathbf{Y}) = \int p(\tilde{y}_j \mid \tilde{\mathbf{x}}, \boldsymbol{\theta})\, p(\boldsymbol{\theta} \mid \mathbf{X}, \mathbf{Y})\, d\boldsymbol{\theta}. \qquad (1)$$
Expert knowledge is incorporated into the model via feedback observation models [19]. The relevance feedback $f_{kj} \in \{0, 1\}$ ($0$ denotes irrelevant, $1$ relevant) of feature $k$ for drug $j$ follows
$$p(f_{kj} \mid \gamma_{kj}) = \pi^{\mathbb{1}[f_{kj} = \gamma_{kj}]} (1 - \pi)^{\mathbb{1}[f_{kj} \neq \gamma_{kj}]},$$
where $\pi$ is the probability of the expert being correct. For example, when the $k$th feature for drug $j$ is relevant in the regression model (i.e., $\gamma_{kj} = 1$), the expert would a priori be assumed to say $f_{kj} = 1$ with probability $\pi$. In the model learning (i.e., calculating the posterior distribution in Equation 2 below), once the expert has provided the feedback based on his or her knowledge, $\pi$ effectively controls how strongly the model will change to reflect the feedback.
The directional feedback $g_{kj} \in \{-1, 1\}$ ($-1$ denotes negative weight, $1$ positive) follows
$$p(g_{kj} = 1 \mid w_{kj}) = \pi\,\mathbb{1}[w_{kj} > 0] + (1 - \pi)\,\mathbb{1}[w_{kj} \le 0],$$
where $\mathbb{1}[\cdot] = 1$ when the condition holds and $0$ otherwise, and $\pi$ is again the probability of the expert being correct. For example, when the weight $w_{kj}$ is positive, the expert would a priori be assumed to say $g_{kj} = 1$ with probability $\pi$.
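The two feedback likelihoods can be sketched as simple functions of the model state. This is an illustrative sketch only: the function names are hypothetical, and the argument `pi` stands for the expert-correctness probability described above.

```python
def relevance_feedback_lik(f, gamma, pi):
    """P(f | gamma): relevance feedback f agrees with the
    relevance indicator gamma with probability pi."""
    return pi if f == gamma else 1.0 - pi


def direction_feedback_lik(g, w, pi):
    """P(g | w): directional feedback g (+1 for positive weight,
    -1 for negative) is correct with probability pi."""
    correct = (g == 1) == (w > 0)
    return pi if correct else 1.0 - pi
```

For instance, with `pi = 0.9`, an expert confronted with a truly relevant feature (`gamma = 1`) is assumed to answer "relevant" (`f = 1`) with probability 0.9 and "not relevant" with probability 0.1.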
To simplify the model, we assume a common correctness probability $\pi$ for both feedback types and set a Beta prior on it:
$$\pi \sim \mathrm{Beta}(a_\pi, b_\pi).$$
Given the data $\mathbf{X}$ and $\mathbf{Y}$ and a set of observed feedbacks $\mathcal{F}$ encoding the expert knowledge, the posterior distribution is computed as
$$p(\boldsymbol{\theta} \mid \mathbf{X}, \mathbf{Y}, \mathcal{F}) \propto p(\mathbf{Y} \mid \mathbf{X}, \boldsymbol{\theta})\, p(\mathcal{F} \mid \boldsymbol{\theta})\, p(\boldsymbol{\theta}), \qquad (2)$$
where $\boldsymbol{\theta}$ now also includes $\pi$. The predictive distribution follows from Equation 1. Figure 3 shows the plate diagram of the model.
The computation of the posterior distribution is analytically intractable. The expectation propagation algorithm [37] is used to compute an efficient approximation. In particular, the posterior approximation for the weights $\mathbf{w}_j$ is a multivariate Gaussian distribution, and the predictive distribution for $\tilde{y}_j$ is also approximated as Gaussian [38, 19]. The mean of the predictive distribution is used as the point prediction in the experimental evaluations in Section 4.
Appendix B. User model
The user model chooses the (drug, feature) pair for the next query based on an upper confidence bound criterion. The upper confidence bound of pair $(j, k)$ at iteration $t$ is computed as $\hat{\mu}_{jk}(t) + \alpha_t\, \hat{s}_{jk}(t)$, where the confidence $\hat{s}_{jk}(t)$ of the response is computed as in [26] with
$$\alpha_t = \sqrt{\tfrac{1}{2} \ln\!\left(\tfrac{2nt}{\delta}\right)}.$$
Here $n$ is the number of (drug, feature) pairs, and $\delta$ defines that the bound holds with probability $1 - \delta$. The user model is initialized using regression weights from the prediction model as pseudo-feedback, with lower weight such that one feedback from an expert corresponds to 10 pseudo-feedbacks, similarly as in [20].
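The selection rule itself reduces to a one-line argmax. In this sketch the posterior summaries `mean` and `std` and the exploration weight `alpha` are random placeholders standing in for the user model's actual estimated query utilities and confidence terms.

```python
import numpy as np

rng = np.random.default_rng(1)

# Placeholder posterior summaries over (drug, feature) pairs.
n_pairs = 100
mean = rng.normal(size=n_pairs)        # estimated utility of querying each pair
std = rng.uniform(0.1, 1.0, n_pairs)   # uncertainty of that estimate
alpha = 2.0                            # exploration weight (illustrative value)

# Upper confidence bound rule: query the pair with the largest
# optimistic estimate mean + alpha * std.
ucb = mean + alpha * std
next_query = int(np.argmax(ucb))
```

The optimism term `alpha * std` makes under-explored pairs competitive with pairs that already look useful, which is what drives the exploration-exploitation trade-off in the query selection.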
Appendix C. Feedback collection
The answer counts by feedback type are summarized in Table 3 for both of the experts.
Answer                                      SR    DC
Relevant, positive correlation             192    47
Relevant, negative correlation              14    34
Relevant, unknown correlation direction     26   372
Not relevant                                13     0
I don't know                              1771  1563

SR = senior researcher, DC = doctoral candidate.
Appendix D. Further results
Feedback on the direction of the putative effect is more effective than general relevance feedback. We also assess the importance of the feedback type by comparing a spike-and-slab model with only relevance feedback (interpreting potential expert knowledge on the direction merely as relevance) to a model with both types of feedback. Table 4 shows that the directional feedback improves the performance markedly, especially in the case of the senior researcher (who gave more directional feedback than the doctoral candidate; see Table 3). The bootstrapped probabilities in favour of both types of feedback over relevance feedback alone are 0.70 (C-index) and 0.71 (MSE) for the doctoral candidate and, similarly, 0.93 and 0.93 for the senior researcher.
            Doctoral candidate          Senior researcher
            Relevance only   Both       Relevance only   Both
C-index         0.62         0.63           0.62         0.65
MSE             0.93         0.92           0.93         0.86
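Comparison probabilities of this kind can be computed with the Bayesian bootstrap [33], which resamples Dirichlet weights over patients instead of integer counts. The sketch below is illustrative: the function name is hypothetical, and it assumes the inputs are per-patient errors of the two model variants being compared.

```python
import numpy as np

rng = np.random.default_rng(2)

def bayesian_bootstrap_prob(errors_a, errors_b, n_draws=5000):
    """Probability that method A has a lower weighted mean error than
    method B, under Bayesian bootstrap (Dirichlet) weights over patients."""
    a = np.asarray(errors_a, dtype=float)
    b = np.asarray(errors_b, dtype=float)
    # One Dirichlet(1, ..., 1) weight vector per bootstrap draw.
    weights = rng.dirichlet(np.ones(len(a)), size=n_draws)
    # Weighted mean difference in error for each draw.
    diff = weights @ (a - b)
    return float(np.mean(diff < 0.0))
```

For example, `errors_a` and `errors_b` could hold the per-patient squared prediction errors of the "both types of feedback" and "relevance only" models, so that the returned value estimates the probability that the former is better.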
Use of biological prior information from databases benefits the user modeling. In the sequential knowledge elicitation with the bandit user model, the queries are chosen based on the expert's earlier answers and the feature descriptions. However, the expert's feedback still defines which of the queried pairs are, in fact, relevant to the prediction task. Next we investigate how much the user models improve when using auxiliary biological information from databases. We form description vectors, as described in Section 3.2.2, from the patient data used in the prediction task or, alternatively, from pathway and drug target information available in the databases. We compare the two alternatives in how well they discriminate the (drug, feature) pairs for which the expert was able to provide feedback from those where the answer was 'I don't know'. The results in Table 5 show that using biological prior information improves especially the recall of the useful (drug, feature) pairs, meaning that the model finds a greater proportion of the (drug, feature) pairs that received feedback.
                  Patient data   Pathway and target information
SR   Precision        1.00                   0.89
     Recall           0.27                   0.42
DC   Precision        0.88                   0.90
     Recall           0.23                   0.46
Drug-wise overview of the results. Table 6 shows the effect of expert feedback on the 12 drugs used in the study. The predictions improved for every drug with feedback from at least one of the experts, and the improvement is in general greater with the senior researcher's feedback. On the other hand, we observed that the type of feedback each expert gave was different: the senior researcher provided more directional feedback, and the doctoral candidate more relevance feedback (Table 3). The greater amount of directional feedback could explain the greater overall improvement with the senior researcher's feedback, as we have already observed that directional feedback is more effective than relevance feedback.
                          C-index              MSE
Drug                  NF    DC    SR      NF    DC    SR
Pimasertib           0.63  0.60  0.68    0.78  0.83  0.67
Refametinib          0.67  0.64  0.68    0.80  0.85  0.71
Trametinib           0.66  0.65  0.70    0.84  0.87  0.71
Dexamethasone        0.65  0.71  0.68    0.96  0.91  0.96
Methylprednisolone   0.65  0.69  0.63    0.95  0.90  0.96
AZD2014              0.61  0.60  0.68    0.94  0.93  0.80
Dactolisib           0.59  0.59  0.66    0.97  0.92  0.86
Idelalisib           0.45  0.55  0.52    1.12  1.09  1.25
PF-04691502          0.57  0.62  0.64    1.00  0.96  0.90
Pictilisib           0.57  0.64  0.64    0.95  0.88  0.87
Temsirolimus         0.59  0.57  0.63    0.94  1.02  0.78
Venetoclax           0.66  0.71  0.68    0.95  0.83  0.88

NF = no feedback, DC = doctoral candidate, SR = senior researcher.