1 Introduction
Innovation in materials is the key driver for many recent technological advances. From clean energy Tabor2018Accelerating to the aerospace industry Gibson2010review or drug discovery Chen2018rise , research in chemical and materials science is constantly pushed forward to develop compounds and formulae with novel applications, lower cost and better performance. Conventional methods for the discovery of new materials start from a well-defined set of substances from which properties of interest are derived. Then, intensive research on the relationship between structures and properties is performed. The insights gained from this procedure lead to incremental improvements in the compounds, and the cycle is restarted with a new search space to be explored. This trial-and-error approach to innovation often leads to costly and incremental steps towards the development of new technologies and, on occasion, relies on serendipity for leaps in progress. Materials development may require billions of dollars in investments DiMasi2016Innovation and up to 20 years to be deployed to the market DiMasi2016Innovation ; Tabor2018Accelerating .
Despite the challenges associated with such direct approaches, they have not prevented data-driven discovery of materials from happening. High-throughput materials screening Shoichet2004Virtual ; Greeley2006Computational ; Alapati2006Identification ; Setyawan2011High ; Subramaniam2008Virtual ; Armiento2011Screening ; Jain2011high ; Curtarolo2013high ; PyzerKnapp2015What ; GomezBombarelli2016Design and data mining Morgan2004High ; Ortiz2009Data ; Yu2012Identification ; Yang2012search ; Lin2012silico ; Mounet2018Two have been responsible for several breakthroughs in the last two decades Potyrailo2011Combinatorial ; Jain2016Computational , leading to the establishment of the Materials Genome Initiative NSTC2011Materials and multiple collaborative projects around the world built around databases and analysis pipelines Curtarolo2012AFLOWLIB.ORG ; Calderon2015AFLOW ; Jain2013Commentary ; Saal2013Materials . Automated, scalable approaches leverage data sets ranging from thousands to millions of simulations to offer a cornucopia of insights on materials composition, structure and synthesis.
Developing materials with the inverse perspective departs from these traditional methods. Instead of exhaustively deriving properties from structures, the performance parameters are chosen beforehand and unknown materials satisfying these requirements are inferred. Hence, innovation in this setting is achieved by inverting the mapping between structures and their properties. Unfortunately, this approach is even harder than the conventional one. Inverting a given Hamiltonian is not a well-defined problem, and the absence of a systematic exploratory methodology may result in delays, or outright failure, of the discovery cycle of materials SanchezLengeling2018Inverse . Furthermore, another major obstacle to the design of arbitrary compounds is the dimensionality of the missing data for known and unknown compounds Zunger2018Inverse . As an example, the breadth of accessible drug-like molecules can be on the order of Polishchuk2013Estimation ; Virshup2013Stochastic , rendering manual searches or enumerations through the chemical space an intractable problem. In addition, molecules and crystal structures are discrete objects, which hinders automated optimization, and computer-generated candidates must follow a series of hard (valence rules, thermal stability) and soft (synthetic accessibility, cost, safety) constraints that may be difficult to state in explicit form. As inverse chemical design holds great promise for economic, environmental and societal progress, one can ask how to rationalize the exploration of unknown substances and accelerate the discovery of new materials.
1.1 Early inverse design strategies for materials
The inverse chemical design is usually posed as an optimization problem in which molecular properties are extremized with respect to given parameters Joback1989Designing . This concept splits the inverse design problem in two parts: (i) efficiently sampling materials from an enormous configuration space, and (ii) searching for global maxima in their properties Kuhn1996Inverse corresponding to minima in their potential energy surface Wales1999Global ; Schoen2001Determination . Early approaches towards the inverse materials design used chemical intuition to address (i), narrowing down and navigating the space of structures under investigation with probabilistic methods Gani1983Molecular ; Marder1991Approaches ; Holmblad1996Designing ; Kuhn1996Inverse ; Sigmund1997Design ; Wolverton1997Invertible . Nevertheless, even constrained spaces can be too large to be exhaustively enumerated. Especially in the absence of an efficient exploratory policy, this discovery process demands considerable computational resources and time. Several different strategies are required to simultaneously navigate the chemical space and evaluate the properties of the materials under investigation.
Monte Carlo methods resort to statistical sampling to avoid enumerating a space of interest. When combined with simulated annealing Metropolis1953Equation , for example, they become adequate to locate extrema within property spaces. In physics, reverse Monte Carlo methods have long been developed to determine structural information from experimental data Kaplow1968Atomic ; Gerold1987determination ; McGreevy1988Reverse . However, the popularization of similar methods for the de novo design of materials is more recent. Wolverton et al. Wolverton1997Invertible employed such methods to aid the design of alloys and avoid expensive enumeration of compositions, and Franceschetti and Zunger Franceschetti1999inverse improved the idea to design AlGaAs and GaInP superlattices with tailored band gaps. They started with configurations sampled using Monte Carlo, relaxed the atomic positions using valence-force-field methods and calculated their band gaps by fast diagonalization of pseudopotential Hamiltonians. Through this practical process, they predicted superlattices with optimal band gaps after analyzing less than compounds among structures Franceschetti1999inverse .
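The combination of Monte Carlo sampling and simulated annealing described above can be illustrated with a minimal sketch. It is not the procedure of the cited works: the binary chain, the `toy_property` function (a hypothetical stand-in for an expensive property calculation such as a band gap), the geometric cooling schedule and all parameter values are assumptions chosen only to keep the example short.

```python
import math
import random

def toy_property(config):
    """Hypothetical stand-in for an expensive property calculation.

    Counts unlike neighbours in a binary chain (e.g. A/B layers), so the
    global maximum is the perfectly alternating chain.
    """
    return sum(1 for a, b in zip(config, config[1:]) if a != b)

def simulated_annealing(n_sites=20, n_steps=5000, t_start=2.0, t_end=0.01, seed=0):
    rng = random.Random(seed)
    config = [rng.randint(0, 1) for _ in range(n_sites)]
    score = toy_property(config)
    best, best_score = list(config), score
    for step in range(n_steps):
        # Geometric cooling schedule from t_start down to t_end.
        temperature = t_start * (t_end / t_start) ** (step / (n_steps - 1))
        # Propose a local move: swap the species on one random site.
        site = rng.randrange(n_sites)
        config[site] ^= 1
        new_score = toy_property(config)
        delta = new_score - score
        # Metropolis criterion for maximization: always accept improvements,
        # accept deteriorations with probability exp(delta / T).
        if delta >= 0 or rng.random() < math.exp(delta / temperature):
            score = new_score
            if score > best_score:
                best, best_score = list(config), score
        else:
            config[site] ^= 1  # rejected: undo the move
    return best, best_score

best, best_score = simulated_annealing()
```

Only the configurations visited by the random walk are ever evaluated, which is the point of the approach: the search space of 2^20 chains is never enumerated.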
Other popular techniques for multidimensional optimization that also involve a stochastic component are genetic algorithms (GAs) Holland1992Adaptation . Based on evolution principles, GAs refine specific parameters of a population that improve a targeted property. In materials design, GAs have been widely employed in the inverse design of small molecules Judson1993Conformational ; Glen1995genetic , polymers Venkatasubramanian1994Computer ; Venkatasubramanian1995Evolutionary , drugs Parrill1996Evolutionary ; Schneider2000De , biomolecules Gordon1999Branch ; Reetz2004Asymmetric , catalysts Wolf2000evolutionary , alloys Johannesson2002Combined ; Dudiy2006Searching , semiconductors Piquini2008Band ; dAvezac2012Genetic ; Zhang2013Genetic , and photovoltaic materials Yu2012Inverse . Furthermore, evolution-inspired approaches have been used as a general modeling tool to predict stable structures Brodmeier1994Application ; Woodley1999prediction ; Glass2006USPEXEvolutionary ; Oganov2006Crystal ; Froemming2009Optimizing ; Vilhelmsen2014genetic and Hamiltonian parameters Hart2005Evolutionary ; Blum2005Using . Many more applications of GAs in materials design are still being demonstrated decades after their inception Virshup2013Stochastic ; Rupakheti2015Strategy ; Reymond2015Chemical ; Le2016Discovery ; Jennings2019Genetic .
Monte Carlo and evolutionary algorithms are interpretable and often produce powerful implementations. The combination of sampling and optimization is a great improvement over random searches or full enumeration of a chemical space. Nonetheless, they still correspond to discrete optimization techniques in a combinatorial space, and require individual evaluation of the properties of each candidate at every step. This discrete form hinders chemical interpolations and the definition of property gradients during optimization processes, thus retaining a flavor of “trial-and-error” in the computational design of materials, rather than an invertible structure-property mapping.
One of the first attempts to use a continuous representation in molecular design was performed by Kuhn and Beratan Kuhn1996Inverse . The authors varied coefficients in linear combinations of atomic orbitals while keeping the energy eigenvalues fixed to optimize linear chains of atoms. Later, Lilienfeld et al. Lilienfeld2005Variational generalized the discrete nature of atoms by approximating atomic numbers by continuous functions and defining property gradients with respect to this “alchemical potential”. They used this theory to design ligands for proteins Lilienfeld2005Variational and tune electronic properties of derivatives of benzene Marcon2007Tuning . A similar strategy was proposed by Wang et al. Wang2006Designing around the same time. Instead of atomic numbers, a linear combination of atomic potentials was used as a basis for optimizations in property landscapes. Following the bijectiveness between potential and electronic density in the Hohenberg-Kohn theory Hohenberg1964Inhomogeneous , nuclei-electrons interaction potentials were employed as quasi-invertible representations of molecules. Potentials resulting from optimizations with property gradients can be later interpolated or approximated by a discrete molecular structure whose atomic coordinates give rise to a similar potential. Over the years, the approach was further refined within the tight-binding framework Xiao2008Inverse ; Balamurugan2008Exploring and the gradient-directed Monte Carlo method Keinan2007Designing ; Hu2008gradient , and its applicability was demonstrated in the design of molecules with improved hyperpolarizability Wang2006Designing ; Keinan2007Designing ; Xiao2008Inverse and acidity Vleeschouwer2012Inverse .
Despite these promising approaches, many challenges in inverse chemical design remain unsolved. Monte Carlo and genetic algorithms share the complexity of discrete optimization methods over graphs, particularly exacerbated by the rugged property surfaces. They rely on stochastic steps that struggle to capture the interrelated hard and soft constraints of chemical design: converting a single into a double bond may produce a formally valid, but impractical and unacceptable molecule depending on chemical context.
On the other hand, a compromise between validity and diversity of the chemical space is difficult to achieve with continuous representations. Lastly, finding optimal points in the 3D potential energy surface that produce a desired output is still not the same as molecular optimization, since the generated “atom cloud” may not be a local minimum, stable enough in operating conditions, or synthetically attainable. An ideal inverse chemical design tool would offer the best of the two worlds: an efficient way to sample valid and acceptable regions of the chemical space; a fast method to calculate properties from given structures; a differentiable representation for a wide spectrum of materials; and the capacity to optimize them using property gradients. Furthermore, it should operate on the manifold of synthetically accessible, stable compounds. This is where modern machine learning (ML) algorithms come into play.
1.2 Deep learning and generative models
Deep learning (DL) is emerging as a promising tool to address inverse design in many different applications. Particularly through generative models, algorithms in DL push forward how machines understand real data. Roughly speaking, the role of a generative model is to capture the underlying rules of a data distribution. Given a collection of (training) data points in some space, a model is trained to match the data distribution by means of a generative process, in such a way that generated data resembles the real data. Earlier generative models such as Boltzmann Machines Hinton1986Learning ; Hinton1983Optimal , Restricted Boltzmann Machines Smolensky1986Information ; Hinton2006Fast or Deep Boltzmann Machines Salakhutdinov2009Deep were the first to tackle the problem of learning probability distributions based on training examples. Their lack of flexibility, tractability and generalizing ability, however, rendered them obsolete in favor of more modern ones Goodfellow2016Deep .
Current generative models have been successful in learning and generating novel data from different types of real-world examples. Deep neural networks trained on image datasets are able to produce realistic-looking house interiors, animals, buildings, objects and human faces Karras2017Progressive ; Goodfellow2014Generative , as well as embed pictures with artistic style Gatys2015Neural or enhance them with super-resolution Ledig2016Photo . Other examples include convincing text Bowman2015Generating ; Xu2015Show , music Mehri2016SampleRNN , voices Oord2016WaveNet and videos Vondrick2016Generating synthesized by such networks. Most interesting is the creation of novel data conditioned on latent features, which allows tuning models with vector and arithmetic operations in a property space Radford2015Unsupervised ; Engel2017Latent . The adaptable architectures of these models also enable straightforward training procedures based on backpropagation LeCun2015Deep . Within the DL framework, a proper loss function drives gradients so that the generative model, typically parameterized by a neural network, learns to minimize the distance between the generated and the real data distributions.
Among the popular architectures for generating data from deep neural networks, the Variational AutoEncoder (VAE) Kingma2013Auto is particularly robust. It couples inference and generation by mapping data to a manifold conditioned to implicit data descriptors. To do so, the model is trained to learn the identity function while constrained by a dimensional bottleneck called latent space (see Fig. 1a). In this scheme, data $x$ is first encoded to a probability distribution $q_\phi(z|x)$ matching a given prior distribution $p(z)$, where $z$ is called the latent vector. Then, a sample from the latent space is reconstructed with the generative algorithm $p_\theta(x|z)$. In the VAE Kingma2013Auto , outcomes of both processes are parameterized by $\phi$ and $\theta$ to maximize a lower bound for the log-likelihood of the output with respect to the input data distribution. The VAE objective is, therefore,
$$\mathcal{L}(\theta, \phi; x) = \mathbb{E}_{q_\phi(z|x)}\left[\log p_\theta(x|z)\right] - D_{\mathrm{KL}}\left(q_\phi(z|x) \,\|\, p(z)\right). \tag{1}$$
The encoder is regularized with a divergence term $D_{\mathrm{KL}}\left(q_\phi(z|x) \,\|\, p(z)\right)$, while the decoder is penalized by a reconstruction error $-\mathbb{E}_{q_\phi(z|x)}\left[\log p_\theta(x|z)\right]$, usually in the form of mean-squared or cross-entropy losses. This maximization can then be performed by stochastic gradient ascent.
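As a rough illustration of how the two terms of the VAE objective are evaluated in practice, the sketch below computes a single-sample estimate of the negative of Eq. (1). It assumes a diagonal Gaussian encoder with mean `mu` and log-variance `log_var`, a standard normal prior, and a unit-variance Gaussian decoder (so that the reconstruction term reduces to a squared error up to a constant). These choices and all function names are assumptions for illustration, not details taken from the text; a real model would compute `mu`, `log_var` and the reconstruction with neural networks.

```python
import math
import random

def kl_to_standard_normal(mu, log_var):
    """Closed-form KL( N(mu, diag(exp(log_var))) || N(0, I) )."""
    return 0.5 * sum(math.exp(lv) + m * m - 1.0 - lv for m, lv in zip(mu, log_var))

def reparameterize(mu, log_var, rng):
    """Sample z = mu + sigma * eps with eps ~ N(0, 1).

    Writing the sample this way keeps z a deterministic function of
    (mu, log_var), which is what lets gradients pass through the sampling
    step in an automatic-differentiation framework.
    """
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0) for m, lv in zip(mu, log_var)]

def negative_elbo(x, x_reconstructed, mu, log_var):
    """Reconstruction error plus KL regularizer (loss to be minimized)."""
    reconstruction = 0.5 * sum((a - b) ** 2 for a, b in zip(x, x_reconstructed))
    return reconstruction + kl_to_standard_normal(mu, log_var)

rng = random.Random(0)
z = reparameterize([0.0, 0.0], [0.0, 0.0], rng)  # one draw from the encoder
loss = negative_elbo([1.0, 2.0], [0.5, 2.0], mu=[1.0, 0.0], log_var=[0.0, 0.0])
```

Minimizing this loss by gradient descent is equivalent to the stochastic gradient ascent on the lower bound mentioned above.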
The probabilistic nature of VAE manifolds approximately accounts for many complex interactions between data points. Although functional in many cases, the modeled data distribution does not always converge to real data distributions Arjovsky2017Wasserstein . Furthermore, Kullback-Leibler or Jensen-Shannon divergences cannot be analytically computed for an arbitrary prior, and most works are restricted to Gaussian distributions. Avoiding high-variance methods to determine this regularizing term is also an important concern. Recently, this limitation was alleviated by employing the Wasserstein distance as a penalty for the encoder regularization Arjovsky2017Wasserstein ; Tolstikhin2017Wasserstein . As a result, richer latent representations are computed more efficiently within Wasserstein AutoEncoders, resulting in disentanglement, latent shaping, and improved reconstruction Rubenstein2018Latent ; Arjovsky2017Wasserstein ; Tolstikhin2017Wasserstein .
Another approach to generative models is the Generative Adversarial Network (GAN) Goodfellow2014Generative . Recognized for their sharp reconstructions, GANs are constructed by making two neural networks compete against each other until a Nash equilibrium is found. One of the networks is a deterministic generative model $G$. It applies a nonlinear set of transformations to a prior probability distribution $p_z(z)$ in order to match the real data distribution $p_{\mathrm{data}}(x)$. Interestingly, the generator (or actor) only receives the prior distribution as input, and has no contact with the real data whatsoever. It can only be trained through a second network $D$, called discriminator or critic. The latter tries to distinguish real data $x$ from fake data $G(z)$, as depicted in Fig. 1b. The objective of the critic is to perfectly distinguish between $x$ and $G(z)$, thus maximizing the prediction accuracy. On the other hand, the generator tries to fool the discriminator by creating data points that look like real data points, minimizing the prediction accuracy of the critic. Consequently, the complete GAN objective is written as Goodfellow2014Generative
$$\min_G \max_D \; \mathbb{E}_{x \sim p_{\mathrm{data}}(x)}\left[\log D(x)\right] + \mathbb{E}_{z \sim p_z(z)}\left[\log\left(1 - D(G(z))\right)\right]. \tag{2}$$
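To make the adversarial objective of Eq. (2) more concrete, the following sketch estimates its value from discriminator outputs alone. It assumes the standard form of the objective from the cited GAN paper, with `d_real` holding the critic's probabilities on real samples and `d_fake` its probabilities on generated samples. The numbers are made up for illustration and no networks are trained.

```python
import math

def gan_value(d_real, d_fake):
    """Monte Carlo estimate of E[log D(x)] + E[log(1 - D(G(z)))].

    d_real: critic outputs in (0, 1) on real samples.
    d_fake: critic outputs in (0, 1) on generated samples.
    """
    real_term = sum(math.log(d) for d in d_real) / len(d_real)
    fake_term = sum(math.log(1.0 - d) for d in d_fake) / len(d_fake)
    return real_term + fake_term

# A confident, correct critic pushes the value towards its upper bound of 0.
confident = gan_value([0.99, 0.98], [0.01, 0.02])
# A critic that cannot tell real from fake outputs 0.5 everywhere, which
# gives -log 4, the value at the theoretical equilibrium.
confused = gan_value([0.5, 0.5], [0.5, 0.5])
```

The critic is trained to increase this quantity and the generator to decrease it, which is the competition that makes balanced training of the two networks delicate.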
Despite the impressive results from GANs, their training process is highly unstable. The min-max problem requires well-balanced training of both networks to ensure non-vanishing gradients and convergence to a successful model. Furthermore, GANs do not reward diversity of generated samples and the system is prone to mode collapse. There is no reason why the generated distribution should have the same support as the original data distribution, and the actor may produce only a handful of different examples which are realistic enough. This does not happen for the VAE, since the log-likelihood term gives an infinite loss for a generated data distribution with a disjoint support with respect to the original data distribution. Several different architectures have been proposed to address these issues among GANs Mirza2014Conditional ; Chen2016InfoGAN ; Arjovsky2017Wasserstein ; Che2016Mode ; Odena2016Conditional ; Mao2016Least ; Hjelm2017Boundary ; Zhao2016Energy ; Nowozin2016f ; Donahue2016Adversarial ; Berthelot2017BEGAN ; Gulrajani2017Improved ; Yi2017DualGAN . Although many of them may be equivalent to a certain extent Lucic2017Are , steady progress is being made in this area, especially through more complex ways of approximating data distributions, such as with f-divergence Nowozin2016f or optimal transport Arjovsky2017Wasserstein ; Gulrajani2017Improved ; Berthelot2017BEGAN .
Other models such as the autoregressive PixelRNN Oord2016Pixel and PixelCNN Oord2016Conditional ; Salimans2017PixelCNN++ have also been successful as generators of images Oord2016Pixel ; Oord2016Conditional ; Salimans2017PixelCNN++ , video Kalchbrenner2016Video , text Kalchbrenner2016Neural and sound Oord2016WaveNet . Unlike VAEs and GANs, these models approximate the data distribution by a tractable factorization of $p(\mathbf{x})$. For example, in an $n \times n$ image, the generative model is written as Oord2016Pixel
$$p(\mathbf{x}) = \prod_{i=1}^{n^2} p(x_i \mid x_1, \ldots, x_{i-1}), \tag{3}$$
where each $x_i$ is a pixel generated by the model (see Fig. 1c). These models with explicit distributions yield samples with very good negative log-likelihood and diversity Oord2016Pixel . The model evaluation is also straightforward, given the explicit computation of $p(\mathbf{x})$. As a drawback, however, these models rely on the sequential generation of data, which is a slow process. A diagram of the architectures of the three generative models discussed here is shown in Fig. 1.
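The autoregressive factorization of Eq. (3) can be sketched with a toy model over binary pixels. The conditional distribution below is a hypothetical hand-written rule (a pixel tends to copy its predecessor) standing in for the neural network of a PixelRNN or PixelCNN. Only the chain-rule structure is meant to carry over.

```python
import math
import random

def conditional_prob_one(previous):
    """Hypothetical p(x_i = 1 | x_1, ..., x_{i-1}) for binary pixels."""
    if not previous:
        return 0.5
    return 0.9 if previous[-1] == 1 else 0.1

def log_likelihood(pixels):
    """Exact log p(x) as a sum of conditional log-probabilities."""
    total = 0.0
    for i, value in enumerate(pixels):
        p_one = conditional_prob_one(pixels[:i])
        total += math.log(p_one if value == 1 else 1.0 - p_one)
    return total

def sample(n_pixels, rng):
    """Sequential sampling: pixel i cannot be drawn before pixels 1..i-1."""
    pixels = []
    for _ in range(n_pixels):
        pixels.append(1 if rng.random() < conditional_prob_one(pixels) else 0)
    return pixels

image = sample(16, random.Random(0))   # a flattened 4 x 4 "image"
score = log_likelihood(image)
```

The likelihood of any input is available exactly, which makes evaluation easy, while the loop in `sample` shows why generation cost grows with the number of pixels.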
1.3 Generative models meet chemical design
Apart from their numerous aforementioned applications, generative models are also attracting attention in chemistry and materials science. DL is being employed not only for the prediction and identification of properties of molecules, but also to generate new chemical compounds LeCun2015Deep . In the context of inverse design, generative models provide benefits such as: generating complex samples from simple probability distributions; providing meaningful latent representations, over which optimizations can be performed; and the ability to perform inference when coupled to supervised models. Therefore, unifying generative models with chemical design is a promising avenue to accelerate innovation in chemistry and related fields.
To go beyond the limitations of traditional inverse design strategies, an ideal way to discover new materials should satisfy some requisites GomezBombarelli2018Automatic . To be completely hands-free, the model should be data-driven, thus avoiding fixed libraries and expensive labeling. It is also desirable that it outputs as many potential molecules as possible under a subset of interest, which means that the model needs a powerful generator coupled with a continuous representation for molecules. Furthermore, such a representation should be interpretable, allowing a correct description of structure-property relationships within molecules. If, additionally, the model is differentiable, it would be possible to optimize certain properties using gradient techniques and, later, look for molecules satisfying such constraints.
The development of such a tool is currently a priority for ML models in chemistry and for inverse chemical design. It relies primarily on two decisions: which model to use and how to represent a molecule in a computer-friendly way. Following our brief introduction to the early inverse design strategies and main generative models in the literature, we describe which molecular representations are possible.
In quantum mechanics, a molecular system is represented by a wavefunction that is a solution of the Schrödinger equation for that particular molecule. To derive most properties of interest, the spatial wavefunction is enough. Computing such a representation, however, is equivalent to solving an (approximate) version of the Schrödinger equation itself. Many methods for theoretical chemistry, such as Hartree-Fock Hartree1928Wave ; Fock1930Naherungsmethode or Density Functional Theory Hohenberg1964Inhomogeneous ; Kohn1965Self , represent molecules using wavefunctions or electronic densities and obtain other properties from them. Performing quantum chemical calculations is computationally demanding in many cases, though. The idea with many ML methods is not only to avoid these calculations, but also to make a generalizable model that highlights different aspects of chemical intuition. Therefore, we should look for other representations for chemical structures.
Thousands of different descriptors are available for chemical prediction methods Todeschini2000Handbook . Several relevant features for ML have demonstrated their capabilities for predicting properties of molecules, such as fingerprints Rogers2010Extended , bag-of-bonds Hansen2015Machine , Coulomb matrices Rupp2012Fast , deep tensor neural networks trained on the distance matrix Schuett2017Quantum , many-body tensor representation Huo2017Unified , SMILES strings Weininger1988SMILES , and graphs Kearnes2016Molecular ; Duvenaud2015Convolutional ; Gilmer2017Neural . Not all representations are invertible for human interpretation, however. To teach a generative model how to create a molecule, it may suffice for it to produce a fingerprint, for example. However, mapping any possible fingerprint back to a molecule is an extra step of complexity equivalent to the generation of libraries. This is undesirable in a practical generative model. In this chapter, we focus on two easily interpretable representations, SMILES strings and molecular graphs, and how generative models perform with these representations. Examples of these two forms of writing a molecule are shown in Fig. 2.
2 Chemical generative models
2.1 SMILES representation
SMILES (Simplified Molecular Input Line Entry System) strings have been widely adopted as a representation for molecules Weininger1988SMILES . Through graph-to-text mapping algorithms, they identify atoms by atomic number and aromaticity, and can capture branching, cycles, ionization, etc. The same molecule can be represented by multiple SMILES strings, and thus a canonical representation is typically chosen, although some works leverage non-canonical strings as a data augmentation and regularization strategy. Although SMILES are inferior to the more modern InChI (International Chemical Identifier) representation in their ability to address key challenges in representing molecules as strings, such as tautomerism, mesomerism and some forms of isomerism, SMILES follow a much simpler syntax that has proven easier to learn for ML models.
Since SMILES rely on a sequence-based representation, natural language processing (NLP) algorithms in deep learning can be naturally extended to them. This allows the transfer of several architectures from the NLP community to interpret the chemical world. Mostly, these systems make use of recurrent neural networks (RNNs) to condition the generation of the next character on the previous ones, creating arbitrarily long sequences character by character Goodfellow2016Deep . The order of the sequence is very relevant to generate a valid molecule, and such restrictions can typically be incorporated in RNNs with long short-term memory (LSTM) cells Hochreiter1997Long , gated recurrent units (GRUs) Chung2014Empirical , or stack-augmented memory Popova2018Deep .
A simple way of generating molecules using only RNN architectures is to extensively train them with valid SMILES from a database of molecules. This requires post-processing analyses, as it resembles traditional library generation. As a proof of concept, Ikebata et al. Ikebata2017Bayesian used SMILES strings to design small organic molecules by employing Bayesian sampling with sequential Monte Carlo. Ertl et al. Ertl2017silico instead generated molecules using LSTM cells and later employed them in a virtual screening for properties.
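The character-by-character training setup described above can be sketched without any neural network: each SMILES string is split into tokens and turned into (context, next token) pairs, which is the supervision an RNN would receive. The tokenizer is deliberately simplified and is an assumption of this sketch: it only keeps the two-letter halogens `Cl` and `Br` together and splits everything else, including bracket atoms, into single characters. The start and end markers `<` and `>` are likewise hypothetical choices.

```python
def tokenize(smiles):
    """Split a SMILES string into tokens, keeping 'Cl' and 'Br' together."""
    tokens, i = [], 0
    while i < len(smiles):
        if smiles[i:i + 2] in ("Cl", "Br"):
            tokens.append(smiles[i:i + 2])
            i += 2
        else:
            tokens.append(smiles[i])
            i += 1
    return tokens

def next_token_pairs(smiles, start="<", end=">"):
    """Build (context, target) pairs for next-token prediction."""
    tokens = [start] + tokenize(smiles) + [end]
    return [(tokens[:i], tokens[i]) for i in range(1, len(tokens))]

pairs = next_token_pairs("CCl")  # chloromethane
# pairs[0] asks the model to predict "C" from the start marker alone,
# and the last pair asks it to predict the end marker from "<", "C", "Cl".
```

At generation time the same loop runs in reverse: the model samples one token, appends it to the context, and repeats until it emits the end marker.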
Generating libraries, however, is not enough for the automatic discovery of chemical compounds. Asking an RNN-based model to simply create SMILES strings does not improve on the rational exploration of the chemical space. In general, the design of new molecules is also oriented towards certain properties, like solubility, toxicity and drug-likeness GomezBombarelli2018Automatic , which are not necessarily incorporated in the training process of RNNs. In order to skew the generation of molecules and better investigate a subset of the chemical space, Segler et al. Segler2018Generating used transfer learning to first train the RNN on a whole dataset of molecules and later fine-tune the model towards the generation of molecules with physicochemical properties of interest. This two-part approach allows the model to first learn the grammar inherent to SMILES and then create new molecules based only on the most interesting ones. In line with this depth-search, Gupta et al. Gupta2017Generative demonstrated the application of transfer learning to grow molecules from fragments. This technique is particularly useful for drug discovery Chen2018rise ; Ching2018Opportunities , in which the search of the chemical space usually begins from a known substructure with certain desired functionalities.
Recently, the usage of reinforcement learning (RL) to generate molecules with certain properties became popular among generative models. Since the representation of a molecule using SMILES requires the generator to output a sequence of characters, each decision can be considered as an action. The successful completion of a valid SMILES string is associated with a reward, for example, and undesired features in the sequence are penalized. Jaques et al. Jaques2017Sequence used RL to impose a structure on sequence generation, avoiding repeating patterns not only in SMILES strings but also in text and music. By penalizing large rings, short sequences of characters and long, monotonous carbon chains, they were able to increase the number of valid molecules their model produced. Olivecrona et al. Olivecrona2017Molecular demonstrated the usage of augmented episodic likelihood and traditional policy gradient methods to tune the generation of molecules from an RNN. Their method achieved 94% validity when generating molecules sampled from a prior distribution. It was also taught to avoid functional groups containing sulfur and to generate structures similar to a given structure or with certain target activities. Similarly, Popova et al. Popova2018Deep designed molecules for drugs using a stack-augmented RNN. Their model demonstrated improved capacity to capture the grammar of SMILES while using RL to tune the synthetic accessibility, solubility, inhibition and other properties of the generated molecules.
As the degree of abstraction grows in molecule design, more complex generative models are proposed to explore the chemical space. VAEs, for example, can include a direct mapping between structures and properties and vice versa. Their joint training of an encoder and a decoder is capable of approximating very complex data distributions using a real-valued and compressed representation, which is essential for improving the search for chemical compounds. Since the latent space is meaningful, the generator learns to associate patterns in the latent space with properties of the real data. After both the encoding and the decoding networks are jointly trained, the generative model can be decoupled from the inference step and latent variables then become the field for exploration. Therefore, VAEs map the original chemical space to a continuous, differentiable space conveying all the information about the original molecules, over which optimization can be performed. Additionally, conditional generation of molecules based on properties is made possible without hand-made constraints in SMILES, and semi-supervised methods can be used to tune the model with relevant properties. This approach is closer to the ideal, automatic, chemical generative model discussed earlier.
Using RNNs as both encoder and decoder, Gómez-Bombarelli et al. GomezBombarelli2018Automatic trained a VAE on prediction and reconstruction tasks for molecules extracted from the QM9 and ZINC datasets. The latent space allowed not only sampling of molecules but also interpolations, reconstruction, and optimization using a Gaussian process predictor trained on the latent space (Fig. 3). Kang and Cho Kang2018Conditional used partial annotation on molecules to train a semi-supervised VAE to decrease the error for property prediction and to generate molecules conditioned on targets. VAEs can also be enhanced in combination with other dimensionality reduction algorithms Sattarov2019De . Within the chemical world, VAEs based on sequences also show promise for investigating proteins Sinai2017Variational , learning chemical interactions between molecules Kwon2017DeepCCI , designing organic light-emitting diodes Kim2018Deep and generating ligands Mallet2019Leveraging ; Lim2018Molecular .
In the field of molecule generation, GANs usually appear associated with RL. To fine-tune the generation of long SMILES strings, Guimaraes et al. Guimaraes2017Objective employed a Wasserstein GAN Arjovsky2017Wasserstein with a stochastic policy that increased the diversity, optimized the properties and maintained the drug-likeness of the generated samples. Sanchez-Lengeling et al. SanchezLengeling2017Optimizing and Putin et al. Putin2018Reinforced further improved upon this work to bias the distribution of generated molecules towards a goal. In addition, Méndez-Lucio et al. MendezLucio2018De used a GAN to generate molecules conditioned on gene expression signatures, which is particularly useful to create active compounds towards a certain target. Similarly to what is done with molecules, Killoran et al. Killoran2017Generating employed a GAN to create realistic samples of DNA sequences from a small subset of configurations. The model was also tuned to design DNA chains adapted to protein binding and look for motifs representing functional roles. Adversarial training was also employed in the discovery of drugs using molecular fingerprints, as opposed to a reversible representation Kadurin2017druGAN ; Kadurin2017cornucopia ; Blaschke2018Application , and SMILES Polykovskiy2018Entangled . However, avoiding unstable training and mode collapse while generating molecules is still a hindrance for the usage of GANs in chemical design.
Although SMILES have proved to be a reliable representation for molecule generation, their sequential nature imposes some constraints on the architectures being learned. Forcing an RNN to implicitly learn the linguistic rules of SMILES poses additional difficulties to the model under training. Additionally, decoding a sequence of generated characters into a valid molecule is especially difficult. In Ref. GomezBombarelli2018Automatic , the rate of success when decoding molecules depended on the proximity of the latent point to a valid molecule, and could be as low as 4% for random points in the latent space. Although RL is an alternative to reward the generation of valid molecules Jaques2017Sequence ; Guimaraes2017Objective ; SanchezLengeling2017Optimizing , other architecture changes can also circumvent this difficulty. Techniques to generate valid sequences imported from NLP studies include: using revision to improve the outcome of sequences Mueller2017Sequence ; adding a validator to the decoder to generate more valid samples Janet2018Accelerating ; introducing a grammar within the VAE to teach the model the fundamentals of SMILES strings Kusner2017Grammar ; using compiler theory to constrain the decoder to produce syntactically and semantically correct data Dai2018Syntax ; and using machine translation methods to convert between representations of sequences and/or grammar Winter2019Learning .
The validity of generated sequences, however, is not the only thing that makes working with SMILES difficult. The sequential representation does not reflect the similarity between molecules through edit distances Jin2018Junction , and a single molecule may have several different SMILES strings Bjerrum2017SMILES ; Alperstein2019All . The trade-off between processing this representation with text-based algorithms and discarding its chemical intuition calls for other approaches in the study and design of molecules.
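The two limitations above can be made concrete with a small sketch. The `levenshtein` helper and the example strings are illustrative choices and are not taken from the cited works. `CCO` and `OCC` are both SMILES strings for ethanol, while `CCN` is ethylamine, so the string distance is larger between two spellings of the same molecule than between two different molecules.

```python
def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance between two strings."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]


# Two SMILES strings of the same molecule (ethanol).
same_molecule = levenshtein("CCO", "OCC")
# Ethanol versus ethylamine: different molecules, similar strings.
different_molecule = levenshtein("CCO", "CCN")
print(same_molecule, different_molecule)
```

In this toy case the same molecule is two edits away from itself, whereas a chemically different molecule is only one edit away, which is the kind of mismatch that motivates graph-based representations.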
2.2 Molecular graphs
An intuitive way of representing a molecule is by means of its Lewis structure, computationally translated as a molecular graph. Given a graph $G = (V, E)$, the atoms are represented as nodes $v_i \in V$ and the chemical bonds as edges $(v_i, v_j) \in E$. Nodes and edges are then decorated with labels indicating the atom type, bond type and so on. Hydrogen atoms are often treated implicitly for simplicity, since their presence can be inferred from traditional chemistry rules.
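As a minimal sketch of this representation, the snippet below stores atoms as labelled nodes and bonds as labelled edges, and infers implicit hydrogens from a valence table. The `MolecularGraph` class and the `DEFAULT_VALENCE` table are hypothetical and only cover neutral C, N, O and F under simple valence rules; real cheminformatics toolkits handle charges, aromaticity and many more cases.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

# Typical valences of a few neutral atoms (a simplifying assumption).
DEFAULT_VALENCE = {"C": 4, "N": 3, "O": 2, "F": 1}


@dataclass
class MolecularGraph:
    atoms: List[str] = field(default_factory=list)                   # node labels
    bonds: Dict[Tuple[int, int], int] = field(default_factory=dict)  # edge -> bond order

    def add_atom(self, symbol: str) -> int:
        self.atoms.append(symbol)
        return len(self.atoms) - 1

    def add_bond(self, i: int, j: int, order: int = 1) -> None:
        # Undirected edge: store with a sorted key so (i, j) and (j, i) coincide.
        self.bonds[(min(i, j), max(i, j))] = order

    def implicit_hydrogens(self, i: int) -> int:
        # Hydrogens fill whatever valence is not used by explicit bonds.
        used = sum(order for (a, b), order in self.bonds.items() if i in (a, b))
        return DEFAULT_VALENCE[self.atoms[i]] - used


# Ethanol, heavy atoms only: C-C-O
ethanol = MolecularGraph()
c1 = ethanol.add_atom("C")
c2 = ethanol.add_atom("C")
o = ethanol.add_atom("O")
ethanol.add_bond(c1, c2)
ethanol.add_bond(c2, o)
hydrogens = [ethanol.implicit_hydrogens(i) for i in range(len(ethanol.atoms))]
print(hydrogens)  # [3, 2, 1], i.e. CH3-CH2-OH
```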
One of the first usages of graphs with DL for property prediction treated molecules as undirected cyclic graphs further processed using RNNs Lusci2013Deep . Using graph convolutional networks Bruna2013Spectral , Duvenaud et al. Duvenaud2015Convolutional demonstrated that machine-learned fingerprints achieve better property prediction with neural networks. This approach started with a molecular graph and led to fixed-size fingerprints after several graph convolutions and a graph pooling layer. Kearnes et al. Kearnes2016Molecular and Coley et al. Coley2017Convolutional also evaluated the flexibility and promise of fingerprints learned from graph structures, especially because models could learn how to associate chemical structures with their properties. Later, Gilmer et al. Gilmer2017Neural unified graph convolutions as message-passing neural networks for quantum chemistry predictions. By interpreting molecular 3D geometries as graphs with distance-labelled edges, they achieved DFT accuracy in their predictions of quantum properties. Many more studies have explored the representative power of graphs within prediction tasks Hop2018Geometric ; Yang2019Are . These frameworks paved the way for using graph-based representations of molecules, especially because of their proximity to chemistry and their geometrical interpretation.
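The idea of turning a variable-size graph into a fixed-size fingerprint can be sketched with a toy message-passing loop followed by sum pooling. This is not the model of any of the cited works: the function name is hypothetical, there are no learned weights, and the update rule (own state plus the sum of neighbour states) is a deliberate simplification. It only illustrates that the readout has a fixed length and does not depend on how the atoms are numbered.

```python
def message_passing_fingerprint(features, edges, rounds=2):
    """features: per-atom feature vectors; edges: undirected (i, j) pairs."""
    n = len(features)
    neighbors = [[] for _ in range(n)]
    for i, j in edges:
        neighbors[i].append(j)
        neighbors[j].append(i)

    h = [list(f) for f in features]
    dim = len(h[0])
    for _ in range(rounds):
        new_h = []
        for i in range(n):
            # Message: element-wise sum of the neighbours' current states.
            msg = [sum(h[j][k] for j in neighbors[i]) for k in range(dim)]
            # Update: add the message to the node's own state. A trained model
            # would apply learned weights and a nonlinearity at this point.
            new_h.append([s + m for s, m in zip(h[i], msg)])
        h = new_h
    # Readout: sum pooling over atoms gives a vector of fixed length `dim`.
    return [sum(h[i][k] for i in range(n)) for k in range(dim)]


# Ethanol heavy atoms with one-hot features over (C, O).
fp = message_passing_fingerprint([[1, 0], [1, 0], [0, 1]], [(0, 1), (1, 2)])
# Same molecule with the atoms numbered in reverse order.
fp_perm = message_passing_fingerprint([[0, 1], [1, 0], [1, 0]], [(2, 1), (1, 0)])
print(fp, fp_perm)
```

Because both the neighbour aggregation and the readout are sums, relabelling the atoms leaves the fingerprint unchanged.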
The generation of graphs is, however, non-trivial, especially because of the challenges imposed by graph isomorphism. As with SMILES strings, one way to generate molecular graphs is by sequentially adding nodes and edges to the graph. The sequential nature of decisions over graphs has already been implemented using an RNN You2018GraphRNN for arbitrary graphs. Specifically for the small subset of graphs corresponding to valid molecules, Li et al. Li2018Multi used a decoder policy to improve the outcomes of the model. The conditional generation of graphs allowed molecules to be created with improved drug-likeness and synthetic accessibility, and allowed scaffold-based generation from a template (Fig. 4a). A similar procedure was adopted by Li et al. Li2018Multi , in which a graph-generating decision process using RNNs was proposed for molecules. These node-by-node generation schemes rely on the ordering of nodes in the molecular graph and thus suffer under random permutations of the nodes.
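The dependence on node ordering can be illustrated with a short sketch. The `generation_sequence` function below is a hypothetical encoding of node-by-node generation as a list of "add node" and "add edge" actions; it is not the action space of any cited model. Enumerating every ordering of the three heavy atoms of ethanol shows how many different action sequences describe the same molecule.

```python
from itertools import permutations


def generation_sequence(atoms, bonds, order):
    """Return the add-node / add-edge actions that build the graph in `order`."""
    position = {node: step for step, node in enumerate(order)}
    bond_set = {frozenset(b) for b in bonds}
    actions = []
    for step, node in enumerate(order):
        actions.append(("add_node", atoms[node]))
        # Connect the new node to every previously placed node it is bonded to.
        for earlier in order[:step]:
            if frozenset((node, earlier)) in bond_set:
                actions.append(("add_edge", position[earlier], step))
    return tuple(actions)


atoms = ["C", "C", "O"]    # ethanol heavy atoms
bonds = [(0, 1), (1, 2)]   # C-C and C-O
sequences = {generation_sequence(atoms, bonds, p) for p in permutations(range(3))}
print(len(sequences))
```

For this three-atom example every one of the 3! = 6 orderings yields a distinct sequence, so a sequential model has to either fix a canonical ordering or learn that all of them are equivalent.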
In the VAE world, several methods have been proposed to deal with the problem of directly generating graphs from a latent code Kipf2016Variational ; Simonovsky2018GraphVAE ; Grover2018Graphite ; Samanta2018Designing ; Liu2018Constrained . However, when working with reconstructions, the problem of graph isomorphism cannot be addressed without expensive calculations Simonovsky2018GraphVAE . Furthermore, graph reconstructions suffer from low validity and accuracy Simonovsky2018GraphVAE , except when these constraints are enforced in the graph generation process Samanta2018Designing ; Liu2018Constrained ; Ma2018Constrained . Currently, one of the most successful approaches to translate molecular graphs into a meaningful latent code while avoiding node-by-node generation is the Junction Tree Variational AutoEncoder (JT-VAE) Jin2018Junction . In this framework, the molecular graph is first decomposed into a vocabulary of subpieces extracted from the training set, which include rings, functional groups and atoms (see Fig. 4b). Then, the model is trained to encode the full graph and the tree structure resulting from the decomposition into two latent spaces. A two-part reconstruction process is necessary to recover the original molecule from the two vector representations. Remarkably, the JT-VAE achieves 100% validity when generating small molecules, as well as 100% novelty when sampling the latent code from a prior. Moreover, a meaningful latent space is also seen for this method, which is essential for optimization and the automatic design of molecules. The authors later improved on the JT-VAE with graph-to-graph translation and autoregressive methods towards molecular optimization tasks Jin2019Learning ; Jin2019Multi .
Other autoregressive approaches combining VAEs and sequential graph generation have been proposed to generate and optimize molecules. Assouel et al. Assouel2018DEFactor introduced a decoding strategy to output arbitrarily large molecules based on their graph representation. The model, named DEFactor, is end-to-end differentiable, dispenses with retraining during the optimization procedure and achieved high reconstruction accuracy even for molecules with about 25 heavy atoms. Despite the restrictions on node permutations, DEFactor allows the direct optimization of the graph conditioned on properties of interest. This and other similar models also allow the generation of molecules based on given scaffolds Lim2019Scaffold .
Autoregressive methods for molecules have also been reported with the use of RL. Zhou et al. Zhou2018Optimization created a Markov decision process to produce molecules with targeted properties through multi-objective RL. Similarly to what is done with graphs, this strategy adds bonds and atoms sequentially. However, as the actions are restricted to chemically valid ones, the model scores 100% validity in the generated compounds. The optimization process forgoes pre-training and allows flexibility in the choice of the importance of each objective. As a follow-up to this work, the same group reports the usage of this generation scheme as a decoder in an RL-enhanced VAE for molecules Kearnes2019Decoding .
In line with the usage of sequences of actions to create graphs, several groups have been working on different ways to represent and generate graphs through sequences. One approach is to split a graph into permutation-invariant N-gram path sets Liu2018N , in analogy with NLP, with atoms as words and molecules as sentences. This representation performs competitively with message-passing neural networks in classification and regression tasks. The combination of string and graph methods is also seen in the work of Krenn et al. Krenn2019SELFIES , who developed a sequence representation for general-purpose graphs. Their scheme shows high robustness against mutations in sequences and outperforms other representations (including SMILES strings) in terms of diversity, validity, and reconstruction accuracy when employed in sequence-based VAEs.
The adversarial generation of graphs is still very incipient, and few models of GANs with graphs have been demonstrated Guo2018Deep ; Bojchevski2018NetGAN ; Xiong2019DynGraphGAN . De Cao and Kipf DeCao2018MolGAN demonstrated MolGAN, a GAN trained with RL for generating molecular graphs, but their system is highly prone to mode collapse. The output structure can be made discrete by differentiable processes such as Gumbel-softmax Jang2016Categorical ; Kusner2016GANS , but balancing the adversarial training with molecular constraints requires more study. Pölsterl and Wachinger Poelsterl2019Likelihood build on MolGAN by adding adversarial training to avoid calculating the reconstruction loss and by extending the graph isomorphism network Xu2018How to multigraphs. Further improvements include the approach from Maziarka et al. Maziarka2019Mol , which relies on the latent space of a pre-trained JT-VAE to produce and optimize molecules, and the work of Fan and Huang Fan2019Labeled , which aims to generate labeled graphs.
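For readers unfamiliar with the Gumbel-softmax relaxation mentioned above, the following is a minimal forward-sampling sketch in plain Python. It only shows how Gumbel noise and a temperature turn unnormalized scores into a relaxed, nearly one-hot sample; it does not include gradients or any part of a GAN. The function name, the `bond_logits` values and their interpretation as bond types are assumptions made for illustration.

```python
import math
import random


def gumbel_softmax(logits, temperature=1.0, rng=random):
    """Draw one relaxed categorical sample from unnormalized log-probabilities."""
    noisy = []
    for logit in logits:
        # Gumbel(0, 1) noise by inverse transform sampling: g = -log(-log(u)).
        u = min(max(rng.random(), 1e-12), 1.0 - 1e-12)  # keep u strictly in (0, 1)
        gumbel = -math.log(-math.log(u))
        noisy.append((logit + gumbel) / temperature)
    # Numerically stable softmax over the perturbed, temperature-scaled logits.
    top = max(noisy)
    weights = [math.exp(x - top) for x in noisy]
    total = sum(weights)
    return [w / total for w in weights]


rng = random.Random(0)
# Hypothetical scores for four bond types (none, single, double, triple).
bond_logits = [0.1, 2.0, 0.3, -1.0]
soft = gumbel_softmax(bond_logits, temperature=1.0, rng=rng)
sharp = gumbel_softmax(bond_logits, temperature=0.05, rng=rng)
print(soft)
print(sharp)
```

Lower temperatures push the sample towards a one-hot vector, which approximates a discrete choice while keeping the operation differentiable in frameworks that support automatic differentiation.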
While the combination of DL with graph theory and molecular design seems promising, there is ample room for improvement in the field of graph generation. Outputting an arbitrary graph is still an open problem, and scalability to larger graphs is still an issue Gilmer2017Neural . Deciding graph isomorphism is a problem in the class NP, and measuring the similarity between two graphs usually resorts to expensive kernels or edit distances Neuhaus2007Bridging . Other problems arise with reconstruction, ordering and so on Li2018Learning . In some cases, a distance metric can be defined for such data structures Schieber2017Quantification ; Choi2018Comparing or a set of networks can be trained to recognize similarity patterns within graphs Ktena2017Distance . Furthermore, adding attention to graphs could also help in classification tasks Do2018Attentional or in the extraction of structure-property relationships Ryu2018Deeply , and specifying grammar rules for graph reconstruction may lead to improved results in molecular validity and stereochemistry Kajino2018Molecular .
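To give a sense of why comparing graphs is expensive, the sketch below tests isomorphism of two small labelled graphs by brute force, trying every relabelling of one graph onto the other. The function name and the example molecules are illustrative; practical tools use far more efficient heuristics, although no polynomial-time algorithm is known for the general problem. The number of candidate relabellings grows as n! with the number of nodes n.

```python
from itertools import permutations


def are_isomorphic(labels_a, edges_a, labels_b, edges_b):
    """Brute-force test: try every relabelling of graph A onto graph B."""
    n = len(labels_a)
    if n != len(labels_b) or len(edges_a) != len(edges_b):
        return False
    target = {frozenset(e) for e in edges_b}
    for perm in permutations(range(n)):
        # perm[i] is the node of B assigned to node i of A.
        if any(labels_a[i] != labels_b[perm[i]] for i in range(n)):
            continue  # atom types do not match under this relabelling
        mapped = {frozenset((perm[i], perm[j])) for i, j in edges_a}
        if mapped == target:
            return True
    return False


chain = [(0, 1), (1, 2)]
# Ethanol written in two different atom orders: C-C-O and O-C-C.
same = are_isomorphic(["C", "C", "O"], chain, ["O", "C", "C"], chain)
# Ethanol (C-C-O) versus dimethyl ether (C-O-C).
different = are_isomorphic(["C", "C", "O"], chain, ["C", "O", "C"], chain)
print(same, different)  # True False
```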
3 Challenges and outlook for generative models
The use of deep generative models is a powerful approach for teaching computers to observe and understand the real world. Far from being just a big-data crunching tool, DL algorithms can provide insights that augment human creativity Kalchbrenner2016Neural . Completely evaluating a generative model is difficult Theis2015note , since we lack an expression for the statistical distribution being learned. Nevertheless, by approximating real-life data with an appropriate representation, we are embedding intuition in the machine’s understanding. In a sense, this is what we do, as human beings, when formulating theoretical concepts on chemistry, physics and many other fields of study. Furthering our limited ability to probe the inner workings of deep neural networks will allow us to transform learned embeddings into logical rules.
In the field of chemical design, generative models are still in their infancy (see timeline summary in Fig. 5). While many achievements have been reported for such models, all of them face many challenges before a “closed loop” approach can be effectively implemented. Some of these challenges are inherent to all generative models: the generalization capability of a model, its power to make inferences on the real world, and its capacity to bring novelty to it. In the chemical space, originality can be translated as the breadth and quality of possible molecules that the model can generate. To push forward the development of new technologies, we want our generative models to explore further regions of the chemical space in search of new solutions to current problems and to extrapolate beyond the training set, avoiding mode collapse or naïve interpolations. At the same time, we want them to capture rules inherent to the synthetically accessible space. Finally, we want to critically evaluate the performance of such models. Several benchmarks are being developed to assess the evolution of chemical generative models, providing quantitative comparisons beyond the mere prediction of solubility or drug-likeness Preuer2018Frechet ; Polykovskiy2018Molecular ; Wu2018MoleculeNet ; Brown2019GuacaMol .
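Three quantities that are commonly reported for generated molecules, namely validity, uniqueness and novelty, can be sketched as follows. The exact definitions vary between benchmarks, so the ones below are one reasonable convention rather than the definition used by any specific suite. The `balanced_parentheses` check is a toy stand-in for a real chemical validity test, and the strings are assumed to be already canonicalized so that set comparisons are meaningful.

```python
def generation_metrics(generated, training_set, is_valid):
    """Fractions of valid, unique and novel samples in a generated batch."""
    valid = [s for s in generated if is_valid(s)]
    unique = set(valid)
    novel = unique - set(training_set)
    return {
        # Share of all generated samples that pass the validity check.
        "validity": len(valid) / len(generated) if generated else 0.0,
        # Share of valid samples that are distinct from one another.
        "uniqueness": len(unique) / len(valid) if valid else 0.0,
        # Share of distinct valid samples that are absent from the training set.
        "novelty": len(novel) / len(unique) if unique else 0.0,
    }


def balanced_parentheses(smiles):
    """Toy validity check: branch parentheses must be balanced."""
    depth = 0
    for ch in smiles:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:
                return False
    return depth == 0


generated = ["CCO", "CCO", "CC(C)O", "CC(C", "CCN"]
training = ["CCO"]
metrics = generation_metrics(generated, training, balanced_parentheses)
print(metrics)
```

In this example four of the five strings are valid, three of the four valid strings are distinct, and two of those three do not appear in the training set.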
The ease of navigation throughout the chemical space alone is not enough to determine a good model, however. Tailoring the generation of valid molecules for certain applications such as drug design Segler2018Generating is also an important task. It reflects how well a generative model focuses on the structure-property relationships for certain applications. This interpretation leads to an even more powerful understanding of chemistry, and is closely tied to Gaussian processes GomezBombarelli2018Automatic , Bayesian optimization Haese2018PHOENICS , and virtual screening.
In the generation process, outputting an arbitrary molecule is still an open problem and is closely conditioned on the representation. While SMILES have proved useful for representing molecules, graphs are able to convey real chemical features, which is useful for learning properties from structures. However, three-dimensional atomic coordinates should be considered for decoding as well. Recent works are going well beyond the connectivity of a molecule to provide equilibrium geometries of molecules using generative models Gebauer2018Generating ; Noe2018Boltzmann ; Gebauer2019Symmetry ; Joergensen2019Atomistic ; Mansimov2019Molecular . This is crucial to bypass expensive sampling of low-energy configurations from the potential energy surface of molecules. We should expect advances not only in decoding and generating graphs from latent codes, but also in invertible molecular representations in terms of sequences, connectivity and spatial arrangement.
Finally, as the field of generative models advances, we should expect even more exciting models to design molecules. The normalizing-flow-based Boltzmann Generator Noe2018Boltzmann and GraphNVP Madhawa2019GraphNVP are examples of models based on more recent strategies. Furthermore, the use of generative models to understand molecules in an unsupervised way advances along with inverse design, from coarse-graining Wang2018Machine ; Wang2018Coarse and synthesizability of small molecules Bradshaw2019Generative ; Bradshaw2019Model to genetic variation in complex biomolecules Riesselman2018Deep .
In summary, generative models hold promise to revolutionize chemical design. Not only do they allow optimization and learn directly from data, but they also bypass the need for a human to supervise the generation of materials. Facing the challenges of these models is essential for accelerating the discovery cycle of new materials and, perhaps, for improving human understanding of nature.
Acknowledgements.
D.S.K. acknowledges the MIT Nicole and Ingo Wender Fellowship and the MIT Robert Rose Presidential Fellowship for financial support. R.G.B. thanks MIT DMSE and Toyota Faculty Chair for support.
Bibliography
 (1) D.P. Tabor, L.M. Roch, S.K. Saikin, C. Kreisbeck, D. Sheberla, J.H. Montoya, S. Dwaraknath, M. Aykol, C. Ortiz, H. Tribukait, C. AmadorBedolla, C.J. Brabec, B. Maruyama, K.A. Persson, A. AspuruGuzik, Nat. Rev. Mater. 3(5), 5 (2018)
 (2) R.F. Gibson, Compos. Struct. 92(12), 2793 (2010)
 (3) H. Chen, O. Engkvist, Y. Wang, M. Olivecrona, T. Blaschke, Drug Discov. Today 23(6), 1241 (2018)
 (4) J.A. DiMasi, H.G. Grabowski, R.W. Hansen, J. Health Econ. 47, 20 (2016)
 (5) B.K. Shoichet, Nature 432(7019), 862 (2004)
 (6) J. Greeley, T.F. Jaramillo, J. Bonde, I. Chorkendorff, J.K. Nørskov, Nat. Mater. 5(11), 909 (2006)
 (7) S.V. Alapati, J.K. Johnson, D.S. Sholl, J. Phys. Chem. B 110(17), 8769 (2006)
 (8) W. Setyawan, R.M. Gaume, S. Lam, R.S. Feigelson, S. Curtarolo, ACS Comb. Sci. 13(4), 382 (2011)
 (9) S. Subramaniam, M. Mehrotra, D. Gupta, Bioinformation 3(1), 14 (2008)
 (10) R. Armiento, B. Kozinsky, M. Fornari, G. Ceder, Phys. Rev. B 84(1) (2011)
 (11) A. Jain, G. Hautier, C.J. Moore, S.P. Ong, C.C. Fischer, T. Mueller, K.A. Persson, G. Ceder, Comput. Mater. Sci. 50(8), 2295 (2011)
 (12) S. Curtarolo, G.L.W. Hart, M.B. Nardelli, N. Mingo, S. Sanvito, O. Levy, Nat. Mater. 12(3), 191 (2013)
 (13) E.O. PyzerKnapp, C. Suh, R. GómezBombarelli, J. AguileraIparraguirre, A.A.A. AspuruGuzik, D.R. Clarke, Annu. Rev. Mater. Res. 45(1), 195 (2015)
 (14) R. GómezBombarelli, J. AguileraIparraguirre, T.D. Hirzel, D. Duvenaud, D. Maclaurin, M.A. BloodForsythe, H.S. Chae, M. Einzinger, D.G. Ha, T. Wu, G. Markopoulos, S. Jeon, H. Kang, H. Miyazaki, M. Numata, S. Kim, W. Huang, S.I. Hong, M. Baldo, R.P. Adams, A. AspuruGuzik, Nat. Mater. 15(10), 1120 (2016)
 (15) D. Morgan, G. Ceder, S. Curtarolo, Meas. Sci. Technol. 16(1), 296 (2004)
 (16) C. Ortiz, O. Eriksson, M. Klintenberg, Comput. Mater. Sci. 44(4), 1042 (2009)
 (17) L. Yu, A. Zunger, Phys. Rev. Lett. 108(6) (2012)
 (18) K. Yang, W. Setyawan, S. Wang, M.B. Nardelli, S. Curtarolo, Nat. Mater. 11(7), 614 (2012)
 (19) L.C. Lin, A.H. Berger, R.L. Martin, J. Kim, J.A. Swisher, K. Jariwala, C.H. Rycroft, A.S. Bhown, M.W. Deem, M. Haranczyk, B. Smit, Nat. Mater. 11(7), 633 (2012)
 (20) N. Mounet, M. Gibertini, P. Schwaller, D. Campi, A. Merkys, A. Marrazzo, T. Sohier, I.E. Castelli, A. Cepellotti, G. Pizzi, et al., Nat. Nanotechnol. 13(3), 246 (2018)
 (21) R. Potyrailo, K. Rajan, K. Stoewe, I. Takeuchi, B. Chisholm, H. Lam, ACS Comb. Sci. 13(6), 579 (2011)
 (22) A. Jain, Y. Shin, K.A. Persson, Nat. Rev. Mater. 1(1) (2016)
 (23) National Science and Technology Council (US), Materials genome initiative for global competitiveness (Executive Office of the President, National Science and Technology Council, 2011)
 (24) S. Curtarolo, W. Setyawan, S. Wang, J. Xue, K. Yang, R.H. Taylor, L.J. Nelson, G.L. Hart, S. Sanvito, M. BuongiornoNardelli, N. Mingo, O. Levy, Comput. Mater. Sci. 58, 227 (2012)
 (25) C.E. Calderon, J.J. Plata, C. Toher, C. Oses, O. Levy, M. Fornari, A. Natan, M.J. Mehl, G. Hart, M.B. Nardelli, S. Curtarolo, Comput. Mater. Sci. 108, 233 (2015)
 (26) A. Jain, S.P. Ong, G. Hautier, W. Chen, W.D. Richards, S. Dacek, S. Cholia, D. Gunter, D. Skinner, G. Ceder, K.A. Persson, APL Materials 1(1), 011002 (2013)
 (27) J.E. Saal, S. Kirklin, M. Aykol, B. Meredig, C. Wolverton, JOM 65(11), 1501 (2013)
 (28) B. SanchezLengeling, A. AspuruGuzik, Science 361(6400), 360 (2018)
 (29) A. Zunger, Nat. Rev. Chem. 2(4), 0121 (2018)
 (30) P.G. Polishchuk, T.I. Madzhidov, A. Varnek, J. Comput.Aided Mol. Des. 27(8), 675 (2013)
 (31) A.M. Virshup, J. ContrerasGarcía, P. Wipf, W. Yang, D.N. Beratan, J. Am. Chem. Soc. 135(19), 7296 (2013)
 (32) K.G. Joback, Designing molecules possessing desired physical property values. Ph.D. thesis, Massachusetts Institute of Technology (1989)
 (33) C. Kuhn, D.N. Beratan, J. Phys. Chem. 100(25), 10595 (1996)
 (34) D.J. Wales, H.A. Scheraga, Science 285(5432), 1368 (1999)
 (35) J. Schön, M. Jansen, Z. Kristallogr. Cryst. Mater. 216(6) (2001)
 (36) R. Gani, E. Brignole, Fluid Phase Equilib. 13, 331 (1983)
 (37) S.R. Marder, D.N. Beratan, L.T. Cheng, Science 252(5002), 103 (1991)
 (38) P.M. Holmblad, J.H. Larsen, I. Chorkendorff, L.P. Nielsen, F. Besenbacher, I. Stensgaard, E. Lægsgaard, P. Kratzer, B. Hammer, J.K. Nørskov, Catal. Lett. 40(3-4), 131 (1996)
 (39) O. Sigmund, S. Torquato, J. Mech. Phys. Solids 45(6), 1037 (1997)
 (40) C. Wolverton, A. Zunger, B. Schönfeld, Solid State Commun. 101(7), 519 (1997)
 (41) N. Metropolis, A.W. Rosenbluth, M.N. Rosenbluth, A.H. Teller, E. Teller, J. Chem. Phys. 21(6), 1087 (1953)
 (42) R. Kaplow, T.A. Rowe, B.L. Averbach, Phys. Rev. 168(3), 1068 (1968)
 (43) V. Gerold, J. Kern, Acta Metall. 35(2), 393 (1987)
 (44) R.L. McGreevy, L. Pusztai, Mol. Simul. 1(6), 359 (1988)
 (45) A. Franceschetti, A. Zunger, Nature 402(6757), 60 (1999)
 (46) J.H. Holland, Adaptation in Natural and Artificial Systems (MIT Press Ltd, 1992)
 (47) R. Judson, E. Jaeger, A. Treasurywala, M. Peterson, J. Comput. Chem. 14(11), 1407 (1993)
 (48) R.C. Glen, A.W.R. Payne, J. Comput.Aided Mol. Des. 9(2), 181 (1995)
 (49) V. Venkatasubramanian, K. Chan, J. Caruthers, Computers & Chemical Engineering 18(9), 833 (1994)
 (50) V. Venkatasubramanian, K. Chan, J.M. Caruthers, J. Chem. Inf. Model. 35(2), 188 (1995)
 (51) A.L. Parrill, Drug Discov. Today 1(12), 514 (1996)
 (52) G. Schneider, M.L. Lee, M. Stahl, P. Schneider, J. Comput.Aided Mol. Des. 14(5), 487 (2000)
 (53) D.B. Gordon, S.L. Mayo, Structure 7(9), 1089 (1999)
 (54) M.T. Reetz, Proceedings of the National Academy of Sciences 101(16), 5716 (2004)
 (55) D. Wolf, O. Buyevskaya, M. Baerns, Appl. Catal., A 200(1-2), 63 (2000)
 (56) G.H. Jóhannesson, T. Bligaard, A.V. Ruban, H.L. Skriver, K.W. Jacobsen, J.K. Nørskov, Phys. Rev. Lett. 88(25) (2002)
 (57) S.V. Dudiy, A. Zunger, Phys. Rev. Lett. 97(4) (2006)
 (58) P. Piquini, P.A. Graf, A. Zunger, Phys. Rev. Lett. 100(18) (2008)
 (59) M. d’Avezac, J.W. Luo, T. Chanier, A. Zunger, Phys. Rev. Lett. 108(2) (2012)
 (60) L. Zhang, J.W. Luo, A. Saraiva, B. Koiller, A. Zunger, Nat. Commun. 4(1) (2013)
 (61) L. Yu, R.S. Kokenyesi, D.A. Keszler, A. Zunger, Adv. Energy Mater. 3(1), 43 (2012)
 (62) T. Brodmeier, E. Pretsch, J. Comput. Chem. 15(6), 588 (1994)
 (63) S.M. Woodley, P.D. Battle, J.D. Gale, C.R.A. Catlow, Phys. Chem. Chem. Phys. 1(10), 2535 (1999)
 (64) C.W. Glass, A.R. Oganov, N. Hansen, Comput. Phys. Commun. 175(11-12), 713 (2006)
 (65) A.R. Oganov, C.W. Glass, J. Chem. Phys. 124(24), 244704 (2006)
 (66) N.S. Froemming, G. Henkelman, J. Chem. Phys. 131(23), 234103 (2009)
 (67) L.B. Vilhelmsen, B. Hammer, J. Chem. Phys. 141(4), 044711 (2014)
 (68) G.L.W. Hart, V. Blum, M.J. Walorski, A. Zunger, Nat. Mater. 4(5), 391 (2005)
 (69) V. Blum, G.L.W. Hart, M.J. Walorski, A. Zunger, Phys. Rev. B 72(16) (2005)
 (70) C. Rupakheti, A. Virshup, W. Yang, D.N. Beratan, J. Chem. Inf. Model. 55(3), 529 (2015)
 (71) J.L. Reymond, Acc. Chem. Res. 48(3), 722 (2015)
 (72) T.C. Le, D.A. Winkler, Chem. Rev. 116(10), 6107 (2016)
 (73) P.C. Jennings, S. Lysgaard, J.S. Hummelshøj, T. Vegge, T. Bligaard, npj Comput. Mater. 5(1) (2019)
 (74) O.A. von Lilienfeld, R.D. Lins, U. Rothlisberger, Phys. Rev. Lett. 95(15) (2005)
 (75) V. Marcon, O.A. von Lilienfeld, D. Andrienko, J. Chem. Phys. 127(6), 064305 (2007)
 (76) M. Wang, X. Hu, D.N. Beratan, W. Yang, J. Am. Chem. Soc. 128(10), 3228 (2006)
 (77) P. Hohenberg, W. Kohn, Phys. Rev. 136(3B), B864 (1964)
 (78) D. Xiao, W. Yang, D.N. Beratan, J. Chem. Phys. 129(4), 044106 (2008)
 (79) D. Balamurugan, W. Yang, D.N. Beratan, J. Chem. Phys. 129(17), 174105 (2008)
 (80) S. Keinan, X. Hu, D.N. Beratan, W. Yang, J. Phys. Chem. A 111(1), 176 (2007)
 (81) X. Hu, D.N. Beratan, W. Yang, J. Chem. Phys. 129(6), 064102 (2008)
 (82) F.D. Vleeschouwer, W. Yang, D.N. Beratan, P. Geerlings, F.D. Proft, Phys. Chem. Chem. Phys. 14(46), 16002 (2012)
 (83) G.E. Hinton, T.J. Sejnowski, in Parallel Distributed Processing: Explorations in the Microstructure of Cognition, Vol. 1, ed. by D.E. Rumelhart, J.L. McClelland, C. PDP Research Group (MIT Press, Cambridge, MA, USA, 1986), pp. 282–317
 (84) G.E. Hinton, T.J. Sejnowski, in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (1983)
 (85) P. Smolensky, in Parallel Distributed Processing: Explorations in the Microstructure of Cognition, Vol. 1, ed. by D.E. Rumelhart, J.L. McClelland, C. PDP Research Group (MIT Press, Cambridge, MA, USA, 1986), chap. Informatio, pp. 194–281
 (86) G.E. Hinton, S. Osindero, Y.W. Teh, Neural Comput. 18(7), 1527 (2006)
 (87) R. Salakhutdinov, G. Hinton, 2009 IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops (2), 2735 (2009)
 (88) I. Goodfellow, Y. Bengio, A. Courville, Deep Learning (MIT Press, 2016)
 (89) T. Karras, T. Aila, S. Laine, J. Lehtinen, arXiv:1710.10196 (2017)
 (90) I.J. Goodfellow, J. PougetAbadie, M. Mirza, B. Xu, D. WardeFarley, S. Ozair, A. Courville, Y. Bengio, arXiv:1406.2661 (2014)
 (91) L.A. Gatys, A.S. Ecker, M. Bethge, arXiv:1508.06576 (2015)
 (92) C. Ledig, L. Theis, F. Huszar, J. Caballero, A. Cunningham, A. Acosta, A. Aitken, A. Tejani, J. Totz, Z. Wang, W. Shi, arXiv:1609.04802 (2016)
 (93) S.R. Bowman, L. Vilnis, O. Vinyals, A.M. Dai, R. Jozefowicz, S. Bengio, arXiv:1511.06349 pp. 1–15 (2015)
 (94) K. Xu, J. Ba, R. Kiros, K. Cho, A. Courville, R. Salakhutdinov, R. Zemel, Y. Bengio, arXiv:1502.03044 (2015)
 (95) S. Mehri, K. Kumar, I. Gulrajani, R. Kumar, S. Jain, J. Sotelo, A. Courville, Y. Bengio, arXiv:1612.07837 (2016)
 (96) A. van den Oord, S. Dieleman, H. Zen, K. Simonyan, O. Vinyals, A. Graves, N. Kalchbrenner, A. Senior, K. Kavukcuoglu, arXiv:1609.03499 (2016)
 (97) C. Vondrick, H. Pirsiavash, A. Torralba, arXiv:1609.02612 (2016)
 (98) A. Radford, L. Metz, S. Chintala, arXiv:1511.06434 (2015)
 (99) J. Engel, M. Hoffman, A. Roberts, arXiv:1711.05772 (2017)
 (100) Y. LeCun, Y. Bengio, G. Hinton, Nature 521(7553), 436 (2015)
 (101) D.P. Kingma, M. Welling, arXiv:1312.6114 (2013)
 (102) M. Arjovsky, S. Chintala, L. Bottou, arXiv:1701.07875 (2017)
 (103) I. Tolstikhin, O. Bousquet, S. Gelly, B. Schölkopf, arXiv:1711.01558 (2017)
 (104) P.K. Rubenstein, B. Schölkopf, I. Tolstikhin, arXiv:1802.03761 (2018)
 (105) M. Mirza, S. Osindero, arXiv:1411.1784 (2014)
 (106) X. Chen, Y. Duan, R. Houthooft, J. Schulman, I. Sutskever, P. Abbeel, arXiv:1606.03657 (2016)
 (107) T. Che, Y. Li, A.P. Jacob, Y. Bengio, W. Li, arXiv:1612.02136 (2016)
 (108) A. Odena, C. Olah, J. Shlens, arXiv:1610.09585 (2016)
 (109) X. Mao, Q. Li, H. Xie, R.Y.K. Lau, Z. Wang, S.P. Smolley, arXiv:1611.04076 (2016)
 (110) R.D. Hjelm, A.P. Jacob, T. Che, A. Trischler, K. Cho, Y. Bengio, arXiv:1702.08431 (2017)
 (111) J. Zhao, M. Mathieu, Y. LeCun, arXiv:1609.03126 (2016)
 (112) S. Nowozin, B. Cseke, R. Tomioka, arXiv:1606.00709 (2016)
 (113) J. Donahue, P. Krähenbühl, T. Darrell, arXiv:1605.09782 (2016)
 (114) D. Berthelot, T. Schumm, L. Metz, arXiv:1703.10717 (2017)
 (115) I. Gulrajani, F. Ahmed, M. Arjovsky, V. Dumoulin, A. Courville, arXiv:1704.00028 (2017)
 (116) Z. Yi, H. Zhang, P. Tan, M. Gong, arXiv:1704.02510 (2017)
 (117) M. Lucic, K. Kurach, M. Michalski, S. Gelly, O. Bousquet, arXiv:1711.10337 (2017)
 (118) A. van den Oord, N. Kalchbrenner, K. Kavukcuoglu, arXiv:1601.06759 (2016)
 (119) A. van den Oord, N. Kalchbrenner, O. Vinyals, L. Espeholt, A. Graves, K. Kavukcuoglu, arXiv:1606.05328 (2016)
 (120) T. Salimans, A. Karpathy, X. Chen, D.P. Kingma, arXiv:1701.05517 (2017)
 (121) N. Kalchbrenner, A. van den Oord, K. Simonyan, I. Danihelka, O. Vinyals, A. Graves, K. Kavukcuoglu, arXiv:1610.00527 (2016)
 (122) N. Kalchbrenner, L. Espeholt, K. Simonyan, A. van den Oord, A. Graves, K. Kavukcuoglu, arXiv:1610.10099 (2016)
 (123) R. GómezBombarelli, J.N. Wei, D. Duvenaud, J.M. HernándezLobato, B. SánchezLengeling, D. Sheberla, J. AguileraIparraguirre, T.D. Hirzel, R.P. Adams, A. AspuruGuzik, ACS Cent. Sci. 4(2), 268 (2018)
 (124) D.R. Hartree, Math. Proc. Cambridge Philos. Soc. 24(01), 89 (1928)
 (125) V. Fock, Z. Phys. A At. Nucl. 61(1-2), 126 (1930)
 (126) W. Kohn, L.J. Sham, Phys. Rev. 140(4A), A1133 (1965)
 (127) R. Todeschini, V. Consonni, Handbook of Molecular Descriptors. Methods and Principles in Medicinal Chemistry (WileyVCH Verlag GmbH, Weinheim, Germany, 2000)
 (128) D. Rogers, M. Hahn, J. Chem. Inf. Model. 50(5), 742 (2010)
 (129) K. Hansen, F. Biegler, R. Ramakrishnan, W. Pronobis, O.A. von Lilienfeld, K.R. Müller, A. Tkatchenko, The Journal of Physical Chemistry Letters 6(12), 2326 (2015)
 (130) M. Rupp, A. Tkatchenko, K.R. Müller, O.A. von Lilienfeld, Phys. Rev. Lett. 108(5), 058301 (2012)
 (131) K.T. Schütt, F. Arbabzadah, S. Chmiela, K.R. Müller, A. Tkatchenko, Nat. Commun. 8, 13890 (2017)
 (132) H. Huo, M. Rupp, arXiv:1704.06439 (2017)
 (133) D. Weininger, J. Chem. Inf. Model. 28(1), 31 (1988)
 (134) S. Kearnes, K. McCloskey, M. Berndl, V. Pande, P. Riley, J. Comput.Aided Mol. Des. 30(8), 595 (2016)
 (135) D.K. Duvenaud, D. Maclaurin, J. AguileraIparraguirre, R. GómezBombarelli, T. Hirzel, A. AspuruGuzik, R.P. Adams, in Advances in Neural Information Processing Systems (2015), pp. 2215–2223
 (136) J. Gilmer, S.S. Schoenholz, P.F. Riley, O. Vinyals, G.E. Dahl, arXiv:1704.01212 (2017)
 (137) S. Hochreiter, J. Schmidhuber, Neural Comput. 9(8), 1735 (1997)
 (138) J. Chung, C. Gulcehre, K. Cho, Y. Bengio, arXiv:1412.3555 (2014)
 (139) M. Popova, O. Isayev, A. Tropsha, Sci. Adv. 4(7), eaap7885 (2018)
 (140) H. Ikebata, K. Hongo, T. Isomura, R. Maezono, R. Yoshida, J. Comput.Aided Mol. Des. 31(4), 379 (2017)
 (141) P. Ertl, R. Lewis, E. Martin, V. Polyakov, arXiv:1712.07449 (2017)
 (142) M.H.S. Segler, T. Kogej, C. Tyrchan, M.P. Waller, ACS Cent. Sci. 4(1), 120 (2018)
 (143) A. Gupta, A.T. Müller, B.J.H. Huisman, J.A. Fuchs, P. Schneider, G. Schneider, Mol. Inf. 37(12), 1700111 (2017)
 (144) T. Ching, D.S. Himmelstein, B.K. BeaulieuJones, A.A. Kalinin, B.T. Do, G.P. Way, E. Ferrero, P.M. Agapow, M. Zietz, M.M. Hoffman, W. Xie, G.L. Rosen, B.J. Lengerich, J. Israeli, J. Lanchantin, S. Woloszynek, A.E. Carpenter, A. Shrikumar, J. Xu, E.M. Cofer, C.A. Lavender, S.C. Turaga, A.M. Alexandari, Z. Lu, D.J. Harris, D. DeCaprio, Y. Qi, A. Kundaje, Y. Peng, L.K. Wiley, M.H.S. Segler, S.M. Boca, S.J. Swamidass, A. Huang, A. Gitter, C.S. Greene, J. R. Soc. Interface 15(141), 20170387 (2018)
 (145) N. Jaques, S. Gu, D. Bahdanau, J.M. HernándezLobato, R.E. Turner, D. Eck, in Proceedings of the 34th International Conference on Machine Learning, Proceedings of Machine Learning Research, vol. 70, ed. by D. Precup, Y.W. Teh (PMLR, International Convention Centre, Sydney, Australia, 2017), pp. 1645–1654
 (146) M. Olivecrona, T. Blaschke, O. Engkvist, H. Chen, J. Cheminf. 9(1), 48 (2017)
 (147) S. Kang, K. Cho, J. Chem. Inf. Model. 59(1), 43 (2018)
 (148) B. Sattarov, I.I. Baskin, D. Horvath, G. Marcou, E.J. Bjerrum, A. Varnek, J. Chem. Inf. Model. 59(3), 1182 (2019)
 (149) S. Sinai, E. Kelsic, G.M. Church, M.A. Nowak, arXiv:1712.03346 pp. 1–6 (2017)
 (150) S. Kwon, S. Yoon, in Proceedings of the 8th ACM International Conference on Bioinformatics, Computational Biology, and Health Informatics, ACM-BCB ’17 (ACM Press, New York, New York, USA, 2017), pp. 203–212
 (151) K. Kim, S. Kang, J. Yoo, Y. Kwon, Y. Nam, D. Lee, I. Kim, Y.S. Choi, Y. Jung, S. Kim, W.J. Son, J. Son, H.S. Lee, S. Kim, J. Shin, S. Hwang, npj Comput. Mater. 4(1) (2018)
 (152) V. Mallet, C.G. Oliver, N. Moitessier, J. Waldispuhl, arXiv:1905.12033 (2019)
 (153) J. Lim, S. Ryu, J.W. Kim, W.Y. Kim, J. Cheminf. 10(1) (2018)
 (154) G.L. Guimaraes, B. SanchezLengeling, C. Outeiral, P.L.C. Farias, A. AspuruGuzik, arXiv:1705.10843 (2017)
 (155) B. SanchezLengeling, C. Outeiral, G.L.L. Guimaraes, A.A. AspuruGuzik, chemRxiv:5309668 pp. 1–18 (2017)
 (156) E. Putin, A. Asadulaev, Y. Ivanenkov, V. Aladinskiy, B. SanchezLengeling, A. AspuruGuzik, A. Zhavoronkov, J. Chem. Inf. Model. 58(6), 1194 (2018)
 (157) O. MendezLucio, B. Baillif, D.A. Clevert, D. Rouquié, J. Wichard, chemrXiv:7294388 (2018)
 (158) N. Killoran, L.J. Lee, A. Delong, D. Duvenaud, B.J. Frey, arXiv:1712.06148 (2017)
 (159) A. Kadurin, S. Nikolenko, K. Khrabrov, A. Aliper, A. Zhavoronkov, Mol. Pharmaceutics 14(9), 3098 (2017)
 (160) A. Kadurin, A. Aliper, A. Kazennov, P. Mamoshina, Q. Vanhaelen, K. Khrabrov, A. Zhavoronkov, Oncotarget 8(7), 10883 (2017)
 (161) T. Blaschke, M. Olivecrona, O. Engkvist, J. Bajorath, H. Chen, Mol. Inf. 37(12), 1700123 (2018)
 (162) D. Polykovskiy, A. Zhebrak, D. Vetrov, Y. Ivanenkov, V. Aladinskiy, P. Mamoshina, M. Bozdaganyan, A. Aliper, A. Zhavoronkov, A. Kadurin, Mol. Pharmaceutics 15(10), 4398 (2018)
 (163) J. Mueller, D. Gifford, T. Jaakkola, in Proceedings of the 34th International Conference on Machine Learning, Proceedings of Machine Learning Research, vol. 70, ed. by D. Precup, Y.W. Teh (PMLR, International Convention Centre, Sydney, Australia, 2017), pp. 2536–2544
 (164) J.P. Janet, L. Chan, H.J. Kulik, J. Phys. Chem. Lett. 9(5), 1064 (2018)
 (165) M.J. Kusner, B. Paige, J.M. Hernández-Lobato, arXiv:1703.01925 (2017)
 (166) H. Dai, Y. Tian, B. Dai, S. Skiena, L. Song, arXiv:1802.08786 (2018)
 (167) R. Winter, F. Montanari, F. Noé, D.A. Clevert, Chem. Sci. 10(6), 1692 (2019)
 (168) W. Jin, R. Barzilay, T. Jaakkola, arXiv:1802.04364 (2018)
 (169) E.J. Bjerrum, arXiv:1703.07076 (2017)
 (170) Z. Alperstein, A. Cherkasov, J.T. Rolfe, arXiv:1905.13343 (2019)
 (171) A. Lusci, G. Pollastri, P. Baldi, J. Chem. Inf. Model. 53(7), 1563 (2013)
 (172) J. Bruna, W. Zaremba, A. Szlam, Y. LeCun, arXiv:1312.6203 (2013)
 (173) C.W. Coley, R. Barzilay, W.H. Green, T.S. Jaakkola, K.F. Jensen, J. Chem. Inf. Model. 57(8), 1757 (2017)
 (174) P. Hop, B. Allgood, J. Yu, Mol. Pharmaceutics 15(10), 4371 (2018)
 (175) K. Yang, K. Swanson, W. Jin, C. Coley, P. Eiden, H. Gao, A. Guzman-Perez, T. Hopper, B. Kelley, M. Mathea, A. Palmer, V. Settels, T. Jaakkola, K. Jensen, R. Barzilay, arXiv:1904.01561 (2019)
 (176) J. You, R. Ying, X. Ren, W.L. Hamilton, J. Leskovec, arXiv:1802.08773 (2018)
 (177) Y. Li, L. Zhang, Z. Liu, arXiv:1801.07299 (2018)
 (178) T.N. Kipf, M. Welling, arXiv:1611.07308 (2016)
 (179) M. Simonovsky, N. Komodakis, arXiv:1802.03480 (2018)
 (180) A. Grover, A. Zweig, S. Ermon, arXiv:1803.10459 (2018)
 (181) B. Samanta, A. De, N. Ganguly, M. Gomez-Rodriguez, arXiv:1802.05283 (2018)
 (182) Q. Liu, M. Allamanis, M. Brockschmidt, A.L. Gaunt, arXiv:1805.09076 (2018)
 (183) T. Ma, J. Chen, C. Xiao, arXiv:1809.02630 (2018)
 (184) W. Jin, K. Yang, R. Barzilay, T. Jaakkola, in International Conference on Learning Representations (2019)
 (185) W. Jin, R. Barzilay, T.S. Jaakkola, chemRxiv:8266745 (2019)
 (186) R. Assouel, M. Ahmed, M.H. Segler, A. Saffari, Y. Bengio, arXiv:1811.09766 (2018)
 (187) J. Lim, S.Y. Hwang, S. Kim, S. Moon, W.Y. Kim, arXiv:1905.13639 (2019)
 (188) Z. Zhou, S. Kearnes, L. Li, R.N. Zare, P. Riley, arXiv:1810.08678 (2018)
 (189) S. Kearnes, L. Li, P. Riley, arXiv:1904.08915 (2019)
 (190) S. Liu, T. Chandereng, Y. Liang, arXiv:1806.09206 (2018)
 (191) M. Krenn, F. Häse, A. Nigam, P. Friederich, A. Aspuru-Guzik, arXiv:1905.13741 (2019)
 (192) X. Guo, L. Wu, L. Zhao, arXiv:1805.09980 (2018)
 (193) A. Bojchevski, O. Shchur, D. Zügner, S. Günnemann, arXiv:1803.00816 (2018)
 (194) Y. Xiong, Y. Zhang, H. Fu, W. Wang, Y. Zhu, P.S. Yu, in Database Systems for Advanced Applications (Springer International Publishing, 2019), pp. 536–552
 (195) N. De Cao, T. Kipf, arXiv:1805.11973 (2018)
 (196) E. Jang, S. Gu, B. Poole, arXiv:1611.01144 (2016)
 (197) M.J. Kusner, J.M. Hernández-Lobato, arXiv:1611.04051 (2016)
 (198) S. Pölsterl, C. Wachinger, arXiv:1905.10310 (2019)
 (199) K. Xu, W. Hu, J. Leskovec, S. Jegelka, arXiv:1810.00826 (2018)
 (200) Ł. Maziarka, A. Pocha, J. Kaczmarczyk, K. Rataj, M. Warchoł, arXiv:1902.02119 (2019)
 (201) S. Fan, B. Huang, arXiv:1906.03220 (2019)
 (202) M. Neuhaus, H. Bunke, Bridging the Gap Between Graph Edit Distance and Kernel Machines (World Scientific Publishing Co., Inc., River Edge, NJ, USA, 2007)
 (203) Y. Li, O. Vinyals, C. Dyer, R. Pascanu, P. Battaglia, arXiv:1803.03324 (2018)
 (204) T.A. Schieber, L. Carpi, A. Díaz-Guilera, P.M. Pardalos, C. Masoller, M.G. Ravetti, Nat. Commun. 8, 13928 (2017)
 (205) H. Choi, H. Lee, Y. Shen, Y. Shi, arXiv:1807.00252 (2018)
 (206) S.I. Ktena, S. Parisot, E. Ferrante, M. Rajchl, M. Lee, B. Glocker, D. Rueckert, arXiv:1703.02161 (2017)
 (207) K. Do, T. Tran, T. Nguyen, S. Venkatesh, arXiv:1804.00293 (2018)
 (208) S. Ryu, J. Lim, W.Y. Kim, arXiv:1805.10988 (2018)
 (209) H. Kajino, arXiv:1809.02745 (2018)
 (210) L. Theis, A. van den Oord, M. Bethge, arXiv:1511.01844 (2015)
 (211) K. Preuer, P. Renz, T. Unterthiner, S. Hochreiter, G. Klambauer, J. Chem. Inf. Model. 58(9), 1736 (2018)
 (212) D. Polykovskiy, A. Zhebrak, B. Sanchez-Lengeling, S. Golovanov, O. Tatanov, S. Belyaev, R. Kurbanov, A. Artamonov, V. Aladinskiy, M. Veselov, A. Kadurin, S. Nikolenko, A. Aspuru-Guzik, A. Zhavoronkov, arXiv:1811.12823 (2018)
 (213) Z. Wu, B. Ramsundar, E.N. Feinberg, J. Gomes, C. Geniesse, A.S. Pappu, K. Leswing, V. Pande, Chem. Sci. 9(2), 513 (2018)
 (214) N. Brown, M. Fiscato, M.H. Segler, A.C. Vaucher, J. Chem. Inf. Model. 59(3), 1096 (2019)
 (215) F. Häse, L.M. Roch, C. Kreisbeck, A. Aspuru-Guzik, arXiv:1801.01469 (2018)
 (216) N.W.A. Gebauer, M. Gastegger, K.T. Schütt, arXiv:1810.11347 (2018)
 (217) F. Noé, H. Wu, arXiv:1812.01729 (2018)
 (218) N.W.A. Gebauer, M. Gastegger, K.T. Schütt, arXiv:1906.00957 (2019)
 (219) M.S. Jørgensen, H.L. Mortensen, S.A. Meldgaard, E.L. Kolsbjerg, T.L. Jacobsen, K.H. Sørensen, B. Hammer, arXiv:1902.10501 (2019)
 (220) E. Mansimov, O. Mahmood, S. Kang, K. Cho, arXiv:1904.00314 (2019)
 (221) K. Madhawa, K. Ishiguro, K. Nakago, M. Abe, arXiv:1905.11600 (2019)
 (222) J. Wang, S. Olsson, C. Wehmeyer, A. Perez, N.E. Charron, G. de Fabritiis, F. Noé, C. Clementi, arXiv:1812.01736 (2018)
 (223) W. Wang, R. Gómez-Bombarelli, arXiv:1812.02706 (2018)
 (224) J. Bradshaw, M.J. Kusner, B. Paige, M.H.S. Segler, J.M. Hernández-Lobato, in International Conference on Learning Representations (2019)
 (225) J. Bradshaw, B. Paige, M.J. Kusner, M.H.S. Segler, J.M. Hernández-Lobato, arXiv:1906.05221 (2019)
 (226) A.J. Riesselman, J.B. Ingraham, D.S. Marks, Nat. Methods 15(10), 816 (2018)