1 Introduction
With the development of molecular biology, the study of cancer genomics has enabled scientists to develop anticancer drugs tailored to cancers’ genomic features. These drugs are widely used and of great significance in cancer therapy today. However, the efficacy of anticancer drugs varies greatly from one kind of tumor to another, making it considerably difficult to customize therapy strategies for patients. Moreover, the efficacy of anticancer drugs is closely related to their molecular structure, which is hard to predict even for sophisticated pharmacists. Many researchers study drug expression embeddings [7, 11, 13, 22, 15, 9] and genomic data [3, 1, 8, 20]
separately and some others combine them in deep neural networks
[2, 18, 19, 23] to predict drug efficacy. To provide more precise treatment strategies, a sufficient analysis and understanding of cancers’ genomic data and drugs’ molecular structures is essential. On the other hand, there exist many ways to learn features from drug molecular data and gene expression data, and a more appropriate method for encoding features from graph structures and gene expression can lead to more accurate prediction of drug response. Among supervised learning methods, random forests can rank the importance of each gene, which helps us filter genes at the very first step. However, they run into problems with unlabeled data, such as the Cancer Cell Line Encyclopedia (CCLE) datasets that we want to explore. Dimensionality reduction methods such as principal component analysis (PCA), independent component analysis (ICA) and manifold-learning-based t-distributed stochastic neighbor embedding (t-SNE) are common in analyzing medical data where features are numerous and unlabeled. However, they are primarily used for 2D visualization and might lose important information when compressing data into low-dimensional latent features. At higher latent dimensions they do not perform well, and we cannot judge their performance intuitively by visualizing the latent space
[17]. To extract features from a large amount of gene data, we take advantage of the variational autoencoder (VAE) [12]
, which has achieved great success in unsupervised learning of complex probability distributions. The ability of VAE models to capture the probabilistic distribution of latent information enables a more complete analysis of gene data, making it easier to predict the response of anticancer drugs used on specific cancer cell lines. As for the anticancer drugs, we transform their molecular graphs into junction trees by functional-group split and implement a junction tree VAE model to extract low-dimensional features. Finally, we implement a fully connected neural network that combines the extracted features to produce the final result,
the IC50 value of the anticancer drug used against the cancer cell line. Specifically, in this research, we first select breast cancer, the dominant cancer among women, for our study. Many drug efficacy predictions have been made in the context of breast cancer, but only a few studies efficiently make full use of encoded information for prediction and further generalization. We concentrate on judging the encoders’ efficiency at extracting features. Then we generalize to pan-cancer CCLE data to see whether the VAE models still fit well. We visualize the latent space with unsupervised dimensionality reduction (t-SNE) to examine the model’s robustness. Finally, we check the similarity of drugs via the Euclidean distance between latent vectors from the junction tree encoder. Drugs with similar structures or properties may have close latent vectors, which enables us to compare their discrepancies across different cell lines.

Present work
Our present work focuses on using variational autoencoders (VAE) to learn latent vectors from the latent space and using these vectors for further tasks. Our work includes two kinds of VAE, one for gene expression data and the other for drugs’ sequence data. Our aim is to extract explainable latent vectors from the VAE models by reducing the reconstruction loss as well as the Kullback-Leibler (KL) divergence loss. The drug response model is based on a deep neural network.
Its output is the final IC50 prediction. Besides the KL loss, the coefficient of determination (R²) and root mean squared error (RMSE) are the metrics that we choose in our research. The datasets that we adopt are the Cancer Cell Line Encyclopedia (CCLE) gene expression dataset, the ZINC molecular structure dataset and the GDSC drug response dataset [1, 25]. Generally we find the predicted IC50 explainable. While the present work is based on VAE, more ideas that we will attempt to realize can be found in the future work section. All the figures, data and hands-on notebooks can be found here: https://github.com/JIAQINGXIE/MachineLearninginGenomesSummerResearch

2 Related work
Dimensionality reduction on features
Reducing the number of features or encoding features into lower dimensions is common in projects that use feature engineering to make predictions or analyze clustering effects. Supervised learning methods can select gene subsets most related to the research task, such as random forests with gene feature importance in RNA-seq case-control studies [24] and support vector machines (SVM) with double RBF kernels to filter irrelevant gene features [16]. Unsupervised learning methods, such as principal component analysis (PCA) and hierarchical clustering, can help explain genes’ group features and map certain principal components or hierarchical relationships to a lower-dimensional space [10]. Our idea is to compare traditional unsupervised learning methods with the VAE, since our CCLE and ZINC datasets are unlabeled. We try to find the differences between the latent spaces and discuss the feasibility of using a VAE.
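To make the comparison concrete, classical PCA can be run in a few lines; the following is a minimal sketch via SVD for contrasting classical dimensionality reduction with learned VAE latent spaces. The matrix shapes and random data are illustrative assumptions, not the paper's code.

```python
import numpy as np

def pca_project(X, n_components=2):
    """Project rows of X onto the top principal components via SVD.

    A minimal PCA sketch; shapes and data are illustrative.
    """
    Xc = X - X.mean(axis=0)                  # center each gene feature
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    Z = Xc @ Vt[:n_components].T             # low-dimensional scores
    explained = (S ** 2) / (S ** 2).sum()    # variance ratio per PC
    return Z, explained[:n_components]

rng = np.random.default_rng(0)
X = rng.normal(size=(51, 597))               # e.g. 51 cell lines x 597 genes
Z, ratio = pca_project(X)
```

The `explained` ratios give a quick check of how much structure survives the projection, which is exactly the information lost when PCA is pushed to very low dimensions.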
Variational autoencoders on gene expression
A plethora of work has been done on encoding important features from gene expression data. The core idea behind feature extraction is how to learn latent vectors effectively from the input embedding. Multilayer perceptrons can avoid the curse of dimensionality and simply encode gene features from the input layer
[2, 3, 18]. An encoder-to-decoder structure [3] extends the multilayer perceptron and considers genes’ reconstruction; the bottleneck layer of this kind of autoencoder represents the latent information. Recently the variational autoencoder (VAE) [12] has appeared frequently in pretrained models that encode gene expression [8, 20, 21]. These studies primarily focus on latent space representations based on maximizing the likelihood of the gene distribution. Our approach to encoding CCLE gene expression data is mainly inspired by the VAE. We take a simple deep-neural-network-based VAE as our baseline model to form a pretrained encoder on gene expression.

Representation learning on graphs
Graph features can be encoded by deep learning methods, such as convolutional neural networks (CNN), recurrent neural networks (RNN) and message passing neural networks (MPNN)
[5, 6, 7]. Besides, variational autoencoders (VAE) are also widely used as graph generators and graph encoders [13, 22, 14, 15]. In order to avoid generating nodes one by one, which often produces nonsensical molecules in drug design, a method that combines a tree encoder with a graph encoder was proposed [11]; it treats functional groups as nodes for message broadcasting. Attention mechanisms, common in transformer architectures in natural language processing, have also been applied to VAEs [18, 19]. These models learn attention weights by multi-head or self-attention with a softmax operation in order to forget certain unimportant genes or drugs during propagation. Among all these studies, we choose the junction tree variational autoencoder (JTVAE) as our pretrained baseline model for encoding drug structures. Although attention mechanisms are popular in recent works, training such a transformer takes time, and we leave it for future work.
Drug response prediction methods
Supervised learning methods are useful for predicting drug response with encoded information. Support vector regression (SVR) and random forest regression are basic algorithms for this task. Recently deep neural network methods have become popular in drug prediction networks [2, 3, 18, 19]. Our own drug prediction network is also based on a deep neural network but with some modifications. Our baseline models are mainly based on SVR and a single deep neural network. We compare them with our VAE-plus-MLP model to show the improvement in prediction score.
3 Method
Problem restatement: given CCLE gene expression data and a drug’s SMILES expression, build a model that can precisely predict IC50, the response of each anticancer drug. We build three main variational autoencoders and compute latent vectors from their latent spaces. We then feed these vectors into our prediction network. Our further tasks use the latent vectors for visualization and prediction, which will be discussed in the results section. In this section we mainly describe how we apply VAEs to our datasets.
3.1 Data
Gene expression data
We obtain gene expression data of 1021 cancer cell lines with 57820 genes provided by the Cancer Cell Line Encyclopedia (CCLE) [1]. Each cell line belongs to a specific cancer type. Specifically, we choose breast cancer as our primary research object, and then generalize our model to pan-cancer cell lines. After filtering by the keyword token [BREAST], we select 51 breast cancer cell lines from this dataset, which are [AU565_BREAST], [BT20_BREAST], [BT474_BREAST], ..., [ZR7530_BREAST]. Gene expression data is given by a matrix $G \in \mathbb{R}^{g \times c}$, where g is the number of genes and c is the number of cancer cell lines. The elements of matrix G are $G_{ij}$, where $G_{ij}$ is the transcripts-per-million (TPM) value of the i-th gene in the j-th cell line. Moreover, we access the Cancer Gene Census (CGC) dataset [2]
, which classifies different genes into two tiers. One tier is for genes that are closely associated with cancers and have a high probability of mutations that change the activity of the gene product. The other tier includes genes that play a strongly indicated role in cancer but with less extensive evidence. Genes in both tiers are highly relevant to cancer, and we take all of them in our research. We select 51 breast cancer cell lines from the CCLE dataset and remove expression data of genes that are not in the CGC dataset. Gene expression entries with a mean $\mu < 1$ or a standard deviation $\sigma < 0.5$ are also removed for their low relevance to cancer cell lines [3]. Eventually, we get gene expression data of 597 genes in 51 breast cancer cell lines.

Anticancer drug molecular structure data
In this research, we use the ZINC dataset of organic-compound molecular structures to train the JTVAE model. Molecular structure data is given as simplified molecular-input line-entry system (SMILES) strings. SMILES expressions are often used to define drug structures [2, 18, 19, 11, 13, 22, 15, 23] and are widely used as inputs in drug structure prediction tasks. The SMILES expression also makes it easier for us to obtain embeddings from the vocabulary-parsing library that we have generated. From the ZINC dataset, we select 10,000 SMILES strings to train our JTVAE model. This is far beyond the actual number of 222 drugs in the processed GDSC dataset, because we would like better generalization over all drug structures instead of drugs for specific cancer types.
Drug response data
Drug response data is obtained from the Genomics of Drug Sensitivity in Cancer (GDSC) project [25], which contains response data of anticancer drugs used against numerous cancer cell lines. GDSC data is given by a matrix $D \in \mathbb{R}^{d \times c}$, where d is the number of drugs and c is the number of cancer cell lines. The elements of this matrix are IC50 values, i.e. the half maximal inhibitory concentration of a drug used against a specific cancer cell line. We obtain molecular data of the anticancer drugs from the PubChem dataset via their unique PubChem IDs available in the GDSC dataset. Eventually, we get 3358 pieces of drug response data on breast cancer cell lines where both gene expression data and molecular structure are available.
3.2 Gene expression VAE (GeneVAE)
We first build a simple encoder-decoder model with fully connected neural networks. It aims to extract latent vectors from CCLE gene expression data, which will be fed into the combined MLP drug prediction network. We use a simple multilayer perceptron for forward propagation with a batch-norm layer before activation.
$$h_1 = f(\mathrm{BN}(W_1 G + b_1)) \quad (1)$$

where $f$ is the activation function (ReLU in our model), $W_1$ is the weight matrix and $b_1$ is the bias vector of the first dense layer. Batch normalization (BN) helps train the model more efficiently, and the activation filters out unimportant information. $h_1$ is the output of the first layer of our MLP model. We connect it to the second layer:

$$\mu = f(\mathrm{BN}(W_2 h_1 + b_2)) \quad (2)$$

where $f$ is the activation function, $W_2$ is the weight matrix and $b_2$ is the bias vector of the second dense layer. $\sigma$ is computed by another 2-layer MLP with the same architecture as $\mu$. The latent vector $z$ is randomly sampled from $\mathcal{N}(\mu, \sigma^2)$.
The decoder is constructed from two dense layers whose final output dimension equals the input dimension. The decoded gene expression is written as G′:

$$G' = f(W_4 \, f(W_3 z + b_3) + b_4) \quad (3)$$
In this VAE, we need to compute the reconstruction loss and the Kullback-Leibler (KL) divergence loss $\mathrm{KL}(q(z|G)\,\|\,p(z))$ [12], where $q(z|G)$ is the approximate posterior distribution of $z$ and $p(z)$ is the prior, which is the standard normal distribution in a variational autoencoder. The total loss can be written as:

$$\mathcal{L} = \mathcal{L}_{\mathrm{recon}} + \mathrm{KL}(q(z|G)\,\|\,p(z)) \quad (4)$$
We aim to reduce total loss until the loss converges.
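The total loss of Eq. (4) has a simple closed form when the posterior is a diagonal Gaussian and the inputs are normalized to [0, 1]. The following sketch computes it in NumPy; it is an illustration of the loss, not the paper's training code.

```python
import numpy as np

def vae_loss(x, x_recon, mu, log_var, eps=1e-7):
    """Total VAE loss = binary cross-entropy reconstruction loss
    + KL(q(z|x) || N(0, I)), assuming a diagonal-Gaussian posterior
    and inputs/outputs normalized to [0, 1]."""
    x_recon = np.clip(x_recon, eps, 1 - eps)
    recon = -np.sum(x * np.log(x_recon) + (1 - x) * np.log(1 - x_recon))
    # closed-form KL divergence between N(mu, sigma^2) and N(0, 1)
    kl = -0.5 * np.sum(1 + log_var - mu**2 - np.exp(log_var))
    return recon + kl, recon, kl

x = np.array([0.2, 0.8, 0.5])
# perfect reconstruction with a standard-normal posterior: KL term is 0
total, recon, kl = vae_loss(x, x, np.zeros(3), np.zeros(3))
```

Note that even a perfect reconstruction keeps a nonzero cross-entropy floor for non-binary targets; only the KL term vanishes when the posterior matches the prior.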
3.3 Junction tree VAE (JTVAE)
Graph encoder
We take the junction tree variational autoencoder (JTVAE) [11] as one of our encoding models to represent drugs’ latent space. We use a message passing network [11, 9] as the graph encoder. Suppose there are d nodes in the graph. Each node u has an atom-type feature $x_u$, and $x_{uv}$ encodes the bond type between nodes u and v. The vector $\nu_{uv}^{(t)}$ represents the message passed from node u to node v at iteration t, initialized to 0. $W_1^g$, $W_2^g$ and $W_3^g$ are three separate weight matrices. Following loopy belief propagation, the message passing embedding with a rectified linear unit $\tau$ from node u to node v at time t is:

$$\nu_{uv}^{(t)} = \tau\Big(W_1^g x_u + W_2^g x_{uv} + W_3^g \sum_{w \in N(u)\setminus v} \nu_{wu}^{(t-1)}\Big) \quad (5)$$
Given the messages from the neighborhood of node u, we aggregate these message embeddings with its atom type, written as a summation with a rectified linear unit in equation (6). The final graph representation $h_G$ is obtained by averaging the node vectors $h_u$.

$$h_u = \tau\Big(U_1^g x_u + \sum_{v \in N(u)} U_2^g \nu_{vu}^{(T)}\Big) \quad (6)$$
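A toy version of this message passing scheme can be written directly in NumPy. The sketch below follows Eqs. (5)-(6) in spirit but omits the bond-feature term ($W_2^g x_{uv}$) for brevity; that omission, and all shapes and weights, are assumptions of this illustration, not of the JTVAE model itself.

```python
import numpy as np

def relu(a):
    return np.maximum(a, 0.0)

def graph_encode(x, adj, W1, W3, U1, U2, T=3):
    """Toy message passing in the spirit of Eqs. (5)-(6).

    x: (n, d) atom-type features; adj: (n, n) 0/1 adjacency matrix.
    Bond features are omitted for brevity (an assumption of this sketch).
    """
    n = x.shape[0]
    h_dim = W1.shape[0]
    nu = np.zeros((n, n, h_dim))             # nu[u, v]: message u -> v
    for _ in range(T):
        new_nu = np.zeros_like(nu)
        for u in range(n):
            for v in range(n):
                if adj[u, v]:
                    # sum incoming messages to u from neighbors w != v
                    agg = sum((nu[w, u] for w in range(n)
                               if adj[w, u] and w != v),
                              np.zeros(h_dim))
                    new_nu[u, v] = relu(W1 @ x[u] + W3 @ agg)
        nu = new_nu
    # aggregate final messages into node vectors, then average (Eq. 6)
    h = np.stack([relu(U1 @ x[u] +
                       sum((U2 @ nu[v, u] for v in range(n) if adj[v, u]),
                           np.zeros(h_dim)))
                  for u in range(n)])
    return h.mean(axis=0)

rng = np.random.default_rng(2)
n, d, h_dim = 4, 5, 8
x = rng.normal(size=(n, d))
adj = np.array([[0, 1, 0, 0],
                [1, 0, 1, 0],
                [0, 1, 0, 1],
                [0, 0, 1, 0]])               # a 4-node path graph
W1, W3 = rng.normal(size=(h_dim, d)), rng.normal(size=(h_dim, h_dim))
U1, U2 = rng.normal(size=(h_dim, d)), rng.normal(size=(h_dim, h_dim))
h_G = graph_encode(x, adj, W1, W3, U1, U2)
```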
The mean $\mu_G$ and variance $\sigma_G^2$ can be computed from $h_G$ by an affine layer. The graph latent vector $z_G$ is sampled from $\mathcal{N}(\mu_G, \sigma_G^2)$.

Tree encoder
The architecture of the tree encoder is based on the Gated Recurrent Unit (GRU) [11, 4]. The hidden state in this tree encoder is $h_{ij}$. It preserves the tree’s message passing information from moment t-1 together with the tree clusters $\{C_i,\ i = 1, 2, \ldots, d\}$. The incoming messages are first aggregated:

$$s_{ij} = \sum_{k \in N(i)\setminus j} h_{ki} \quad (7)$$
There are two kinds of gates in our tree encoder, a reset gate $r$ and an update gate $z$. The reset gate determines how much previous information the system preserves, while the update gate controls how much the message passing information is updated at moment t. If the reset gate is set to 0, the element-wise multiplication in equation (8) reduces the candidate activation to $\tanh(W x_i)$, which means no message from the previous stage is preserved.

$$z_{ij} = \sigma(W^z x_i + U^z s_{ij} + b^z), \qquad r_{ki} = \sigma(W^r x_i + U^r h_{ki} + b^r)$$
$$\tilde{h}_{ij} = \tanh\Big(W x_i + U \sum_{k \in N(i)\setminus j} r_{ki} \odot h_{ki}\Big) \quad (8)$$
The total update, which depends on the previous activation $s_{ij}$ and the candidate activation $\tilde{h}_{ij}$, can be written in element-wise multiplication form:

$$h_{ij} = (1 - z_{ij}) \odot s_{ij} + z_{ij} \odot \tilde{h}_{ij} \quad (9)$$
We get the tree’s latent representation of node i by aggregating the updated messages from equation (9). The tree latent vector is calculated in the same way as in the graph encoder. Since the graph and tree decoders in JTVAE are also based on the GRU method, we do not discuss them here but refer to the original JTVAE paper [11].

$$h_i = \tau\Big(W^o x_i + \sum_{k \in N(i)} U^o h_{ki}\Big) \quad (10)$$
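A single GRU-style message update in the spirit of Eqs. (7)-(9) can be sketched as follows. Biases are dropped and shapes are illustrative assumptions, not the paper's configuration.

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def tree_gru_message(x_i, h_in, Wz, Uz, Wr, Ur, W, U):
    """One GRU-style message update in the spirit of Eqs. (7)-(9).

    x_i: (d,) feature of the sending cluster; h_in: (k, h) incoming
    messages from neighbors. Biases are omitted for brevity.
    """
    s = h_in.sum(axis=0)                        # Eq. (7): aggregated state
    z = sigmoid(Wz @ x_i + Uz @ s)              # update gate
    r = sigmoid(Wr @ x_i + (Ur @ h_in.T).T)     # per-message reset gates
    h_tilde = np.tanh(W @ x_i + U @ (r * h_in).sum(axis=0))  # candidate
    return (1.0 - z) * s + z * h_tilde          # Eq. (9): gated update

rng = np.random.default_rng(3)
d, h, k = 6, 4, 3
x_i = rng.normal(size=d)
h_in = rng.normal(size=(k, h))
Wz, Uz = rng.normal(size=(h, d)), rng.normal(size=(h, h))
Wr, Ur = rng.normal(size=(h, d)), rng.normal(size=(h, h))
W, U = rng.normal(size=(h, d)), rng.normal(size=(h, h))
msg = tree_gru_message(x_i, h_in, Wz, Uz, Wr, Ur, W, U)
```

Setting the reset gates to zero indeed collapses the candidate activation to $\tanh(W x_i)$, matching the discussion above.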
3.4 Drug response prediction network
Since the gene VAE and molecular VAE have been trained at this stage, we implement two multilayer perceptron (MLP) models to post-process the outputs of the two VAE models respectively, and then build another MLP model that concatenates them and produces the final drug response prediction. The input to the final MLP model is $x = [x_g; x_d]$, where $x_g$ and $x_d$ are the outputs of the two post-processing MLP models. Suppose $x_g \in \mathbb{R}^p$ and $x_d \in \mathbb{R}^q$; then $x \in \mathbb{R}^{p+q}$, i.e. the total input dimension of the final MLP model is $p + q$. The value of the perceptrons in the l-th layer of the final MLP model is computed according to:
$$h^{(l)} = f(W^{(l)} h^{(l-1)} + b^{(l)}) \quad (11)$$
where $W^{(l)}$ is the weight matrix of the l-th layer of the final MLP model and $f$ is a nonlinear activation function, for which we choose PReLU in our model. The predicted IC50 is computed at the last layer of the final MLP model:
$$\widehat{IC_{50}} = W^{(L)} h^{(L-1)} + b^{(L)} \quad (12)$$
where $L$ is the number of layers in the final MLP model.
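The fusion network of Eqs. (11)-(12) can be sketched in a few lines: concatenate the two latent vectors, pass them through PReLU dense layers, and read a scalar off the final linear layer. The layer sizes and random weights below are illustrative assumptions, not the paper's configuration.

```python
import numpy as np

def prelu(a, alpha=0.25):
    return np.where(a > 0, a, alpha * a)

def predict_ic50(z_gene, z_drug, weights, biases):
    """Sketch of the fusion MLP of Eqs. (11)-(12)."""
    h = np.concatenate([z_gene, z_drug])         # input dimension p + q
    for W, b in zip(weights[:-1], biases[:-1]):
        h = prelu(W @ h + b)                     # Eq. (11)
    out = weights[-1] @ h + biases[-1]           # Eq. (12), no activation
    return float(out[0])

rng = np.random.default_rng(4)
z_gene, z_drug = rng.normal(size=32), rng.normal(size=56)
dims = [88, 64, 16, 1]                           # 88 = 32 + 56
weights = [0.1 * rng.normal(size=(o, i)) for i, o in zip(dims[:-1], dims[1:])]
biases = [np.zeros(o) for o in dims[1:]]
y_pred = predict_ic50(z_gene, z_drug, weights, biases)
```

The final layer is purely linear so the network can emit any real-valued IC50 estimate rather than a bounded activation output.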
Baseline model
We use Support Vector Regression (SVR) as a substitute for the MLP in our baseline model, showing a convenient way to use classical machine learning methods for drug response prediction. Our baseline models also rely on the output of the junction tree VAE.
4 Experiments
Experiment setup
We first train our gene expression VAE (GeneVAE) model and junction tree VAE (JTVAE) in an unsupervised manner. Then we use GeneVAE to encode gene expression data, either filtered by the CGC dataset or not, on breast cancer cell lines, and use JTVAE to encode the anticancer drugs. With these encoded features, we train our Support Vector Regression (SVR) and multilayer perceptron (MLP) models on breast cancer cell lines. Finally, we generalize our model and test it on pan-cancer cell lines.
4.1 Result of VAE on breast cancer
In the training of the variational autoencoder, the sum of the reconstruction loss and the latent loss is the objective function that we aim to minimize. The reconstruction loss measures the discrepancy between the initial input gene expression data G and the reconstructed data G′. It can be mean squared error (MSE) loss or cross-entropy loss. We choose cross-entropy loss as our reconstruction loss, because we normalize the input data and add a sigmoid function in the last layer to make sure the input and output both consist of values between 0 and 1. We connect the input layer to the final custom variational layer in our program to compute this loss.

Filtering out a representative gene subset using the CGC dataset also matters in training our gene expression VAE model. As mentioned in Section 3.1, the number of CGC-selected genes for breast cancer cell lines is 597. We test our model with and without gene-subset filtering on breast cancer cell lines, and the results indicate that filtering improves the accuracy of IC50 prediction. In terms of total loss, our tests show that at the beginning of the training loop the validation VAE loss is much higher than the training VAE loss, and the VAE loss starts to converge after 100 epochs. After the validation loss becomes stable, the model on CGC-selected gene expression data has an average VAE loss of 27.3, while the model without CGC selection has an average VAE loss of 68.
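The gene-subset filtering described in Section 3.1 (drop genes with mean TPM below 1 or standard deviation below 0.5) can be sketched in a few lines. The array layout (cell lines x genes) and the toy data are assumptions for illustration; the thresholds come from the text.

```python
import numpy as np

def filter_genes(G, mean_thresh=1.0, std_thresh=0.5):
    """Keep only genes whose mean and standard deviation across cell
    lines exceed the thresholds from Section 3.1.

    G: (cell_lines, genes) TPM matrix (layout is an assumption)."""
    keep = (G.mean(axis=0) >= mean_thresh) & (G.std(axis=0) >= std_thresh)
    return G[:, keep], keep

rng = np.random.default_rng(1)
G = rng.gamma(shape=2.0, scale=1.0, size=(51, 1000))  # toy TPM matrix
G_filt, keep = filter_genes(G)
```

In the paper's pipeline this step, combined with the CGC intersection, reduces the breast-cancer matrix to 597 genes across 51 cell lines.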
Model Comparison
We select two metrics, the coefficient of determination (R² score) and the root mean square error (RMSE), to evaluate the discrepancy between our predicted drug response and the true drug response. We propose 6 models, whose results are listed in Table 1. The first 5 models are targeted at breast cancer, and the last one is tested on pan-cancer cell lines:
1. Support Vector Regression model trained on drug molecular structure data encoded by the VAE model and gene expression data filtered by the CGC dataset (CGC + SVR).
2. Support Vector Regression model trained on gene expression data filtered by the CGC dataset and drug molecular structure data, both encoded by VAE models (CGC + VAE + SVR).
3. Multilayer perceptron model trained on drug molecular structure data encoded by the VAE model and gene expression data filtered by the CGC dataset (CGC + MLP).
4. Multilayer perceptron model trained on raw gene expression data (not filtered by the CGC dataset) and drug molecular structure data, both encoded by VAE models (RAW + VAE + MLP).
5. Multilayer perceptron model trained on gene expression data filtered by the CGC dataset and drug molecular structure data, both encoded by VAE models (CGC + VAE + MLP).
6. The same model as in 5, trained on the pan-cancer dataset.
The test results of these models are shown in Table 1. We can see that the MLP and VAE models bring a large improvement in performance: the CGC + MLP model outperforms the CGC + SVR model by 0.143 R² score, and the CGC + VAE + MLP model performs even better than CGC + MLP, with a 0.008 higher R² score. Moreover, the selection of a representative gene subset is essential to performance. For example, the CGC + VAE + MLP model on breast cancer cell lines reaches a 0.830 R² score, 0.025 higher than that of the RAW + VAE + MLP model.
Table 1: Metrics evaluation on different gene subsets (average).

Models           | Cancer type | R²    | RMSE
CGC + SVR        | Breast      | 0.679 | 1.489
CGC + VAE + SVR  | Breast      | 0.700 | 1.439
CGC + MLP        | Breast      | 0.822 | 1.133
RAW + VAE + MLP  | Breast      | 0.805 | 1.163
CGC + VAE + MLP  | Breast      | 0.830 | 1.130
CGC + VAE + MLP  | Pan-cancer  | 0.845 | 1.080
CDRscan [2]      | Pan-cancer  | 0.843 | 1.069
4.2 Generalization on pancancer
We have achieved an ideal R² score by testing our models on breast cancer cell lines. We then generalize our model to pan-cancer cell lines based on the CCLE dataset. The only difference in the pan-cancer gene expression data from that of breast cancer is that the total number of pan-cancer cell lines is 1021. Our model achieves an even higher R² of 0.845 on pan-cancer cell lines, matching the performance of CDRscan [2] on our dataset. To make our model more robust, we plan to incorporate more data, such as the TCGA dataset.
4.3 Exploring latent vectors from geneVAE
Taking advantage of the diversity of cancer types in pancancer dataset, we discover that latent vectors encoded by geneVAE retains the features of original data. We visualize latent vectors of gene expression data into two dimension Euclidean space. Effects of dimensionality reduction are evaluated by a single tSNE model compared with another tSNE mdodel combined with our pretrained VAE encoder. Generally tSNE is just used for visualization on a two dimensional plane since tSNE model performs worse at a higher dimension space. We begin with giving each cell line its tissue type, from "CERVIX" to "OVARY". We encode them by extracting the pattern after their first underscore in CCLE dataset. Especially we rename "HAEMATOPOIETIC_AND_LYMPHOID_TISSUE" to "HALT" since it’s the longest string. The parameters are perplexity and iterations for the single tSNE model. We set perplexity to n/120, where n are the numbers of cell lines and we set iterarions(n_iter in python) to 3000. The same settings are applied to the combined model. The result shows that many clusters are apparent both in a single tSNE model and a combined model. Therefore, the latent vectors encoded by geneVAE model retains the unique features of input data.
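The tissue-label extraction described above (take the token after the first underscore in each CCLE cell-line name and shorten the longest label to "HALT") can be sketched directly. The example cell-line names are illustrative.

```python
def tissue_label(cell_line):
    """Extract the tissue type from a CCLE cell-line name, renaming
    the longest label to the short form used in the figures."""
    tissue = cell_line.split("_", 1)[1]
    return "HALT" if tissue == "HAEMATOPOIETIC_AND_LYMPHOID_TISSUE" else tissue

labels = [tissue_label(c) for c in
          ["AU565_BREAST",
           "HEL_HAEMATOPOIETIC_AND_LYMPHOID_TISSUE",
           "A549_LUNG"]]
```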
In the single model, tissue types such as [HALT], [AUTONOMIC_GANGLIA], [BREAST] and [SKIN] separate clearly, while some other tissues cluster together with similar tissue types. For example, gene expression does not differ greatly between "STOMACH" and "LARGE_INTESTINE". Several tissue types are so rare among the cancer cell lines that they may be clustered with another tissue, because t-SNE does not exactly preserve the true distances between cancer types.
Eliminating rare cancer types helps improve the t-SNE results. We set a threshold of 30 cell lines to filter the tissue types, after which 12 tissues remain: [BREAST, CENTRAL_NERVOUS_SYSTEM, FIBROBLAST, HALT, KIDNEY, LARGE_INTESTINE, LUNG, OVARY, PANCREAS, SKIN, STOMACH, UPPER_AERODIGESTIVE_TRACT]. We keep the same gene subset filtered from the raw data. Visualizing again, we find that more clusters are apparent in the picture, which we mark with black frames. The clustering results of the latent vectors and the original data remain similar in Figures 1-4, where primary cancer tissue types are separated clearly. Therefore, latent vectors encoded by the GeneVAE model robustly retain the essential features of the original data. With GeneVAE, our models are able to focus on the low-dimensional critical features of the original data and produce a more accurate prediction.
4.4 Exploring latent vectors from JTVAE
Drugs sharing similar molecular structures also have similar latent vectors. We examine the latent vectors encoded by JTVAE and measure the similarity of different drugs’ latent vectors by Euclidean distance; a shorter distance indicates higher similarity between two drugs. For example, MG132 and Proteasome (both inhibitors) share the shortest distance, about 23.73. Looking up their molecular structures in the PubChem database, we find that the majority of their functional groups are similar; small differences lie in a carboxyl and an amide at the ends of the molecule. However, not all related drugs have such great similarity. For the drugs Imatinib and Linifanib, their latent vectors are even closer in Euclidean distance, yet only the middle part of their functional groups is exactly the same. The JTVAE model might discover underlying similarity among functional groups that are not exactly identical. Also, the message passing network in JTVAE is based on a GRU, which might forget some functional groups during neighbor propagation.
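The similarity ranking above reduces to pairwise Euclidean distances between rows of the latent matrix. The following is a minimal sketch; the 2-D vectors are toy values, not actual JTVAE latents.

```python
import numpy as np

def pairwise_distances(Z):
    """Euclidean distances between drug latent vectors (rows of Z)."""
    diff = Z[:, None, :] - Z[None, :, :]     # broadcasted differences
    return np.sqrt((diff ** 2).sum(axis=-1))

# toy latent vectors for three drugs
Z = np.array([[0.0, 0.0], [3.0, 4.0], [6.0, 8.0]])
D = pairwise_distances(Z)
```

The resulting matrix is symmetric with a zero diagonal; ranking each row gives each drug's nearest neighbors in latent space.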
Though similar drugs have close latent vectors, our MLP model is still able to capture subtle differences and produce accurate predictions. We focus on the example of MG132 and Proteasome used against the HCC1187 cancer cell line. We remove these two pieces of data from the training set and test our trained model on them. The predicted IC50 values of MG132 and Proteasome on cell line HCC1187 are 0.84 and 0.866 in our best model, while the true values are 1.589 and 0.181 respectively. Although the two predictions are not very close to the expected values, neither falls within the other's confidence interval. Therefore, despite the considerably high similarity between similar drugs, our MLP model is still able to differentiate them and produce a reasonable result.
5 Discussion and conclusions
In this research we build a gene expression VAE (GeneVAE) model, a junction tree VAE (JTVAE) model, a Support Vector Regression (SVR) model and several multilayer perceptron (MLP) models. We extract latent vectors with the GeneVAE and JTVAE models and feed them into our drug prediction network. We compare our combined models with the baseline SVR models mentioned in related works. Overall, we achieve a strong coefficient of determination (0.845 R² score), reaching the state-of-the-art performance of CDRscan [2] on our dataset. Besides, we discuss the effectiveness of the GeneVAE and JTVAE models from the perspectives of visualization and drug similarity, further supporting the validity of our pipeline.
There are still some interesting aspects that we encountered during our research. Hyperparameter tuning and layer settings matter greatly; different hyperparameters lead to different results. For example, adding a BatchNorm (BN) layer in the MLP model results in worse performance. Batch normalization is a widely used technique to avoid gradient explosion and vanishing. However, its effectiveness is doubtful in shallow networks with rectified linear units, where gradient explosion and vanishing seldom occur. Moreover, inconsistency among mini-batches can badly hurt the performance of batch-norm layers. Besides the BN layer, the proportion of the train/validation/test split is essential to the final result, as are the batch size and learning rate. We set a default batch size of 8 and a learning rate of 0.001 in the training loop. Larger values of these two hyperparameters converge faster but might fall into a local minimum. A proper proportion for the train, validation and test sets is 10:1:1, and for each epoch we choose them randomly from the whole dataset. K-fold cross validation is also a good choice.
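The 10:1:1 random split described above can be sketched as an index permutation. The rounding scheme (both held-out sets get n // 12 samples) is an assumption of this illustration.

```python
import numpy as np

def split_10_1_1(n, seed=0):
    """Random 10:1:1 train/validation/test index split."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(n)
    n_val = n // 12                          # one twelfth for validation
    n_test = n // 12                         # one twelfth for test
    n_train = n - n_val - n_test             # the remaining ~ten twelfths
    return (idx[:n_train],
            idx[n_train:n_train + n_val],
            idx[n_train + n_val:])

train, val, test = split_10_1_1(3358)        # e.g. 3358 response records
```

Re-drawing the split per epoch, as the text suggests, would simply mean calling this with a fresh seed each epoch.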
We suggest that dimensionality reduction to the latent space should not rely on PCA alone, nor on t-SNE and other clustering-style methods. PCA is not suitable when reducing to 5-6 dimensions or higher in our research. t-SNE is better than PCA for 2-dimensional representation, yet we also find that some clusters get mixed up when applying t-SNE in 2-dimensional space. Besides, in future work we will showcase more predictions for drugs with similar structures, to further support our idea that although drugs are similar, their latent vectors remain distinguishable for prediction unless the predicted values fall within each other's confidence intervals.
6 Future work
Attention-based models are included in our future work. The attention mechanism is not only popular in transformers and BERT models in natural language processing, but is also widely used in drug structure translation [18, 19]. Apart from attention-based models, there are other sequence generation models such as GMMs and graph neural networks (GNN). Moreover, we would like to build a toolkit for drug response prediction given one cancer cell line's data and the corresponding drugs' responses. Last but not least, we would like to select a better gene subset for each drug, since drug responses may have different gene contributions.
Acknowledgement
Thanks to Professor Manolis Kellis for reviewing this article.
References
 [1] (2012) The cancer cell line encyclopedia enables predictive modelling of anticancer drug sensitivity. Nature 483 (7391), pp. 603–607. Cited by: §1, §1, §3.1.
 [2] (2018) Cancer drug response profile scan (cdrscan): a deep learning model that predicts drug effectiveness from cancer genomic signature. Scientific reports 8 (1), pp. 1–11. Cited by: §1, §2, §2, §3.1, §3.1, §4.1, §4.2, §5.
 [3] (2019) Predicting drug response of tumors from integrated genomic profiles by deep neural networks. BMC medical genomics 12 (1), pp. 18. Cited by: §1, §2, §2, §3.1.
 [4] (2014) Empirical evaluation of gated recurrent neural networks on sequence modeling. arXiv preprint arXiv:1412.3555. Cited by: §3.3.
 [5] (2015) Convolutional networks on graphs for learning molecular fingerprints. In Advances in neural information processing systems, pp. 2224–2232. Cited by: §2.
 [6] (2016) Recurrent neural network grammars. arXiv preprint arXiv:1602.07776. Cited by: §2.
 [7] (2017) Neural message passing for quantum chemistry. arXiv preprint arXiv:1704.01212. Cited by: §1, §2.
 [8] (2018) scVAE: variational autoencoders for single-cell gene expression data. bioRxiv, pp. 318295. Cited by: §1, §2.
 [9] (2016) Deep clustering: discriminative embeddings for segmentation and separation. In 2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 31–35. Cited by: §1, §3.3.

 [10] (2006) Unsupervised clustering analysis of gene expression. Chance 19 (3), pp. 49–51. Cited by: §2.
 [11] (2018) Junction tree variational autoencoder for molecular graph generation. arXiv preprint arXiv:1802.04364. Cited by: §1, §2, §3.1, §3.3, §3.3.
 [12] (2013) Auto-encoding variational Bayes. arXiv preprint arXiv:1312.6114. Cited by: §1, §2, §3.2.
 [13] (2017) Grammar variational autoencoder. arXiv preprint arXiv:1703.01925. Cited by: §1, §2, §3.1.
 [14] (2018) Learning deep generative models of graphs. arXiv preprint arXiv:1803.03324. Cited by: §2.
 [15] (2018) Constrained graph variational autoencoders for molecule design. In Advances in neural information processing systems, pp. 7795–7804. Cited by: §1, §2, §3.1.
 [16] (2018) Feature selection of gene expression data for cancer classification using double RBF-kernels. BMC bioinformatics 19 (1), pp. 1–14. Cited by: §2.
 [17] (2008) Visualizing data using t-SNE. Journal of machine learning research 9 (Nov), pp. 2579–2605. Cited by: §1.
 [18] (2019) Toward explainable anticancer compound sensitivity prediction via multimodal attentionbased convolutional encoders. Molecular Pharmaceutics. Cited by: §1, §2, §2, §2, §3.1, §6.
 [19] (2018) PaccMann: prediction of anticancer compound sensitivity with multimodal attentionbased neural networks. arXiv preprint arXiv:1811.06802. Cited by: §1, §2, §2, §3.1, §6.
 [20] (2017) Dr. vae: drug response variational autoencoder. arXiv preprint arXiv:1706.08203. Cited by: §1, §2.
 [21] (2019) Dr. vae: improving drug response prediction via modeling of drug perturbation effects. Bioinformatics 35 (19), pp. 3743–3751. Cited by: §2.
 [22] (2018) Graphvae: towards generation of small graphs using variational autoencoders. In International Conference on Artificial Neural Networks, pp. 412–422. Cited by: §1, §2, §3.1.
 [23] (2019) Compound–protein interaction prediction with end-to-end learning of neural networks for graphs and sequences. Bioinformatics 35 (2), pp. 309–318. Cited by: §1, §3.1.
 [24] (2018) Using supervised learning methods for gene selection in RNA-seq case-control studies. Frontiers in genetics 9, pp. 297. Cited by: §2.
 [25] (2012) Genomics of drug sensitivity in cancer (gdsc): a resource for therapeutic biomarker discovery in cancer cells. Nucleic acids research 41 (D1), pp. D955–D961. Cited by: §1, §3.1.