I Introduction
Properties associated with the dynamics of a crystal lattice are of key importance for the design and discovery of new materials at the nanoscale. Such properties can be obtained computationally by studying phonons, the collective excitations of a periodic arrangement of atoms, which are characterized by their vibrational modes and the corresponding frequencies. Phonon dispersion relations enable the calculation of many thermodynamic and transport properties, including the heat capacity, the vibrational contribution to the entropy, and thermal conductivity, as well as properties related to superconductivity and ferroelectricity Petretto et al. (2018). Phonon spectra also provide information about the dynamical stability of compounds, through the inspection of imaginary phonon modes, and facilitate the interpretation of Raman spectra Petretto et al. (2018).
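For reference, the standard harmonic-approximation expressions connecting the phonon density of states g(ω), normalized to 3N modes per unit cell, to the heat capacity and the vibrational entropy mentioned above are:

```latex
% Harmonic approximation, per unit cell; \int_0^\infty g(\omega)\, d\omega = 3N
x = \frac{\hbar\omega}{k_B T}, \qquad
C_V(T) = k_B \int_0^{\infty} g(\omega)\, \frac{x^2 e^{x}}{\left(e^{x} - 1\right)^2}\, d\omega, \qquad
S_{\mathrm{vib}}(T) = k_B \int_0^{\infty} g(\omega) \left[ \frac{x}{e^{x} - 1} - \ln\!\left(1 - e^{-x}\right) \right] d\omega
```

Both quantities therefore follow directly from the computed phonon density of states once it is converged.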
Data-driven approaches rooted in first-principles modeling techniques have recently received much attention for the design and discovery of new materials, with multiple success stories reported to date Jain et al. (2013); Curtarolo et al. (2012); Saal et al. (2013); Pizzi et al. (2016); nom (2018). First-principles computational techniques have also long been known as an accurate and reliable way to predict the vibrational properties of materials R. di Meo and Cozzini (2009); Giannozzi et al. (2009). Although efforts to organize the calculated data have existed for some time Togo and Tanaka (2015), high-throughput calculations of the vibrational properties of materials were reported only recently because of the associated computational cost Petretto et al. (2018); Plata et al. (2017); Guido Petretto (2017).
Density Functional Perturbation Theory (DFPT) Gonze (1995) is a first-principles technique that yields the linear response of a material to periodic atomic displacements with a given wavevector q and displacement pattern. This approach is well established and accurate Giannozzi et al. (2009); R. di Meo and Cozzini (2009), but computationally demanding, because a separate self-consistent linear-response calculation has to be performed for each perturbation of the original crystal lattice. For a unit cell with N atoms there are 3N phonon modes at each wavevector, and up to 3N atomic-displacement perturbations must be computed per q-point. Since many q-points need to be sampled to achieve accuracy, the cost of phonon calculations is often two or more orders of magnitude higher than that of total energy calculations Guido Petretto (2017).
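To make the counting above concrete, the following minimal Python sketch returns the upper bound on the number of independent linear-response runs. The function name is hypothetical, and the bound deliberately ignores crystal symmetry, which in practice groups the perturbations into fewer irreducible representations.

```python
def dfpt_perturbation_count(n_atoms: int, n_irreducible_q: int) -> int:
    """Upper bound on the number of independent DFPT linear-response runs.

    Each of the n_atoms atoms can be displaced along 3 Cartesian directions,
    giving 3 * n_atoms perturbations (and 3 * n_atoms phonon modes) per q-point.
    Symmetry usually reduces the number of separate runs; it is ignored here.
    """
    if n_atoms < 1 or n_irreducible_q < 1:
        raise ValueError("n_atoms and n_irreducible_q must be positive")
    return 3 * n_atoms * n_irreducible_q


# Hypothetical example: a 12-atom cell sampled at 10 irreducible q-points.
print(dfpt_perturbation_count(12, 10))  # 360
```

Even this small hypothetical case needs hundreds of self-consistent response runs, compared with a single self-consistent ground-state run.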
We present example applications of the approach implemented within the Exabyte platform exa (2018a), similar to that previously described in Refs. Das et al. (2018); Das and Bazhirov (2018). The approach facilitates high-throughput first-principles calculations in a repeatable way that is transferable from one material to another. In this manuscript we apply it to extract the vibrational properties of materials. We use Density Functional Perturbation Theory in the plane-wave pseudopotential approximation Hohenberg and Kohn (1964); Ihm et al. (1979); Gonze (1995); Giannozzi et al. (2009) and obtain the phonon dispersions and densities of states for a set of 35 materials. We optimize the modeling workflows to minimize the human time required per calculation, and compute the contribution of each irreducible representation of the phonon perturbations in parallel R. di Meo and Cozzini (2009). We compare the results with the available reference data from other authors and find good agreement.
This manuscript is structured as follows. We first describe the materials studied and discuss the logic, methodology, and parameters used in the calculation workflows. Next, we present example results for a control subset of materials and compare them with the available computational data from other authors. Finally, we discuss the results and the achieved speedups in more depth and suggest pathways toward further improvements. This work presents the results, the tools that generated them, all associated data, and an easy-to-access way to reproduce and further improve upon our work exa (2018b).
II Methodology
ii.1 General logic
The general execution flow employed in this work is described in Ref. Das et al. (2018). We employ the framework explained therein to construct the simulation workflows for the calculation of the vibrational properties discussed here. Users of the Exabyte platform can clone the associated entities (e.g., materials and workflows) and recreate our calculations in order to reproduce or further improve the results.
ii.2 Materials
All 35 materials studied in this work form a subset of the ESC71 set from Das et al. (2018). Details about the materials studied are given in Table 1. Our selection is based initially on widely used semiconducting compounds with relatively small crystal unit cells.
Formula  MP id       Atoms   Formula  MP id        Atoms
Si       mp-149      2       Li2O     mp-1960      3
Ge       mp-32       2       BN       mp-7991      4
Bi       mp-23152    2       AlN      mp-661       4
Sn       mp-117      2       CaO      mp-545512    4
BP       mp-1479     2       MgSe     mp-1018040   4
GaP      mp-2490     2       MgTe     mp-1039      4
AlAs     mp-2172     2       SnO2     mp-856       6
GaAs     mp-2534     2       AlGaAs   N/A          8
GaN      mp-830      2       InGaAs   N/A          8
YN       mp-2114     2       InGaP    N/A          8
ZnS      mp-10695    2       AlInAs   N/A          8
BeSe     mp-1541     2       AlInSb   N/A          8
MgO      mp-1265     2       GaAsP    N/A          8
MgS      mp-1315     2       AlGaN    mp-1019508   8
ZnO      mp-2229     2       B2O3     mp-717       10
BeO      mp-1778     2       Al2O3    mp-1143      10
BaSe     mp-1253     2       B        mp-160       12
CaSe     mp-1415     2
ii.3 Workflows
We implement a grid-parallel workflow for the calculation of the phonon dynamical matrices, initially described in Ref. R. di Meo and Cozzini (2009) and illustrated in Figs. 1 and 2, with an additional optional preceding step for a variable-cell relaxation. The phonon calculation proceeds as follows:

First, the irreducible representations (irreps) of the vibrational modes are generated for the points of the sampling grid in reciprocal space (the q-point grid). Full symmetry analysis is not performed in the current implementation.

Second, a separate calculation is prepared for each irreducible representation and submitted for execution to the cloud infrastructure manager (the “map” stage).

Next, the computational infrastructure is provisioned on demand at a cloud provider, with a cap on the total number of nodes as explained later in this section.

Finally, after the calculations for all irreps are finished, the dynamical matrices are collected and the phonon dispersions and density of states are calculated (the “reduce” stage).
Thus, we employ an embarrassingly parallel “map-reduce” scheme and couple the calculations for the individual irreducible representations with the allocation of computational resources on the cloud. This improves efficiency and yields a speedup such that the limiting step of the total calculation is the longest single run among the individual irreducible representations. Unlike the workflow categorization considered previously in Das et al. (2018), which was based on the inclusion of semicore states, spin-orbit coupling, and magnetism, in this work we omit the latter two and include semicore states only as implemented in the GBRV pseudopotential set Garrity et al. (2014).
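The map and reduce stages above can be modeled, in a deliberately simplified way, as scheduling independent tasks on a capped pool of nodes. The sketch below assumes that each irrep runs on its own node with a known duration and ignores provisioning and queueing overhead; the function name and the durations are hypothetical. It uses greedy list scheduling in submission order, not an optimal schedule, and shows why the wall-clock time approaches the longest single irrep once enough nodes are available.

```python
import heapq
from typing import Sequence


def makespan(durations: Sequence[float], max_nodes: int) -> float:
    """Wall-clock time to run independent irrep tasks on at most max_nodes nodes.

    Each task is assigned, in the given order, to the node that becomes free
    first. Provisioning and queueing overheads are ignored (an assumption).
    """
    if max_nodes < 1:
        raise ValueError("max_nodes must be at least 1")
    if not durations:
        return 0.0
    # Min-heap of the times at which each provisioned node becomes free;
    # a list of zeros is already a valid heap.
    free_at = [0.0] * min(max_nodes, len(durations))
    for d in durations:
        start = heapq.heappop(free_at)
        heapq.heappush(free_at, start + d)
    # The "reduce" stage can only start once the last task has finished.
    return max(free_at)


# Hypothetical per-irrep runtimes in hours (not measured values).
irrep_hours = [0.4, 1.2, 0.9, 0.3, 1.5, 0.7]
sequential = makespan(irrep_hours, max_nodes=1)  # sum of all runs
parallel = makespan(irrep_hours, max_nodes=200)  # longest single run
print(f"sequential={sequential:.1f} h, parallel={parallel:.1f} h, "
      f"speedup={sequential / parallel:.1f}x")
```

With a single node the result equals the sequential sum; once the node cap exceeds the number of irreps it equals the longest task, which is the bound behind the maximum attainable speedup.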
ii.4 Computational setup
We use Density Functional Theory Kohn and Sham (1965) in the plane-wave pseudopotential formalism Ihm et al. (1979) as implemented in the Quantum ESPRESSO (QE) package Giannozzi et al. (2009). Exchange-correlation effects are treated within the generalized gradient approximation using the Perdew-Burke-Ernzerhof (PBE) functional Perdew et al. (1996). Ultrasoft GBRV pseudopotentials, version 1.5 Garrity et al. (2014), are used with the recommended cutoff values of 40 Ry and 200 Ry for the electronic wavefunctions and the charge density, respectively. Reciprocal-space sampling is based on the number of k-points per reciprocal atom (KPPRA), using a uniform unshifted grid. A minimum KPPRA of 1,600 was used for the electronic structure calculations. The phonon properties are calculated on a q-point grid corresponding to a minimum QPPRA (q-points per reciprocal atom) of 200. The resulting dynamical matrices are Fourier transformed to real-space interatomic force constants and interpolated to an effective IPPRA (interpolated points per reciprocal atom) of 12,800, as implemented in the “q2r” and “matdyn” modules of QE.
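One common way to turn a points-per-reciprocal-atom target into a uniform grid is sketched below. It assumes that the divisions are chosen proportional to the reciprocal-lattice vector lengths and rounded up so that the minimum density is met; this rule and the function names are assumptions for illustration, not necessarily the exact procedure used in the Exabyte platform.

```python
import math
from typing import Sequence, Tuple

Vec = Tuple[float, float, float]


def _cross(u: Vec, v: Vec) -> Vec:
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])


def _norm(u: Vec) -> float:
    return math.sqrt(sum(c * c for c in u))


def grid_from_ppra(lattice: Sequence[Vec], n_atoms: int, ppra: float) -> Tuple[int, int, int]:
    """Uniform grid (n1, n2, n3) with roughly n1 * n2 * n3 * n_atoms >= ppra.

    Divisions are proportional to |b_i| = 2*pi*|a_j x a_k| / V; the 2*pi
    factor cancels in the scaling and is omitted.
    """
    a1, a2, a3 = lattice
    volume = abs(sum(x * y for x, y in zip(a1, _cross(a2, a3))))
    if volume == 0.0:
        raise ValueError("lattice vectors are linearly dependent")
    b = [_norm(_cross(a2, a3)) / volume,
         _norm(_cross(a3, a1)) / volume,
         _norm(_cross(a1, a2)) / volume]
    target = ppra / n_atoms  # grid points needed per unit cell
    scale = (target / (b[0] * b[1] * b[2])) ** (1.0 / 3.0)
    # Round up (with a small tolerance for floating-point noise) so the
    # minimum density is met along every direction.
    n = [max(1, math.ceil(scale * bi - 1e-9)) for bi in b]
    return n[0], n[1], n[2]


a = 5.43  # approximate Si lattice constant in angstrom
fcc = [(0.0, a / 2, a / 2), (a / 2, 0.0, a / 2), (a / 2, a / 2, 0.0)]
print(grid_from_ppra(fcc, 2, 1600))   # electronic k-grid for KPPRA >= 1,600
print(grid_from_ppra(fcc, 2, 200))    # phonon q-grid for QPPRA >= 200
print(grid_from_ppra(fcc, 2, 12800))  # interpolation grid for IPPRA 12,800
```

For the two-atom face-centered cubic cell this gives 10×10×10, 5×5×5, and 19×19×19 grids, respectively.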
All calculations were performed on hardware available from the Microsoft Azure cloud computing service azu, in the same manner as described in Refs. Das et al. (2018); Das and Bazhirov (2018), except that “F16” instances were used for the current work. Computational resources were provisioned and assembled on demand by software implemented and available within the Exabyte platform exa (2018a). All runs were executed by a single person within a one-week period in June 2018, which highlights the efficiency of the general approach to materials modeling implemented in the Exabyte platform. The peak size of the computational infrastructure used during this work was administratively limited to 200 nodes, or 3,200 total computing cores.
ii.5 Data access and repeatability
The materials, workflows, batch jobs for each material with the associated properties, and the files for each step of the simulation workflows are all freely available online at the link in Ref. exa (2018b). Readers interested in repeating or improving upon our work may create an account, copy materials and/or workflows to their account collection, and recreate the simulations for these materials. An example simulation workflow, as employed in the current work, is presented in Fig. 2. It shows the map-reduce logic included in the workflow, where the individual calculation tasks are performed independently in parallel in order to speed up the execution. Example results for InGaAs and B, exactly as they can be accessed through Ref. exa (2018b), are shown in Fig. 3 and Fig. 4, respectively.
III Results and Discussion
iii.1 Results and comparison with prior calculations
The results for all the materials studied in this work are available online at the link in Ref. exa (2018b). Fig. 5 compares the calculated phonon densities of states for a subset of 9 compounds (boron, GaN, GaAs, ZnS, Al2O3, MgTe, BeO, SiO2, and SnO2) with the results of the Materials Project Jain et al. (2013) (further referred to as MP). As can be seen, the two sets of results agree in overall shape, with a small (15%) shift toward higher frequencies in the MP case. We attribute this shift to the use of the PBEsol functional Perdew et al. (2008) in their work, versus PBE Perdew et al. (1996) in ours.
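The text does not state how the shift was quantified. One simple, hypothetical way to put a single number on such a shift is to compare the DOS-weighted mean frequencies of the two spectra. The sketch below does this with trapezoidal integration in plain Python; the function names are hypothetical and this is not necessarily the procedure used for Fig. 5.

```python
from typing import Sequence


def mean_frequency(freqs: Sequence[float], dos: Sequence[float]) -> float:
    """First moment of a phonon DOS, int(w * g) / int(g), via the trapezoidal rule."""
    if len(freqs) != len(dos) or len(freqs) < 2:
        raise ValueError("need matching arrays with at least two points")
    num = den = 0.0
    for i in range(len(freqs) - 1):
        dw = freqs[i + 1] - freqs[i]
        num += 0.5 * dw * (freqs[i] * dos[i] + freqs[i + 1] * dos[i + 1])
        den += 0.5 * dw * (dos[i] + dos[i + 1])
    return num / den


def relative_shift(freqs_ref: Sequence[float], dos_ref: Sequence[float],
                   freqs_other: Sequence[float], dos_other: Sequence[float]) -> float:
    """Relative shift of the mean frequency of `other` with respect to `ref`."""
    return mean_frequency(freqs_other, dos_other) / mean_frequency(freqs_ref, dos_ref) - 1.0


# Synthetic example: the same DOS shape with all frequencies scaled by 5%.
w = [0.0, 1.0, 2.0, 3.0, 4.0]
g = [0.0, 1.0, 3.0, 1.0, 0.0]
print(relative_shift(w, g, [1.05 * x for x in w], g))
```

Because a uniform rescaling of the frequency axis rescales the first moment by the same factor, the synthetic example recovers a relative shift of 0.05 up to floating-point rounding.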
iii.2 Goals
We intend this study as a demonstration of the capabilities of the Exabyte platform for deploying density functional perturbation theory (DFPT) tools to predict the vibrational properties of materials. We also focus on how such calculations can be accelerated and applied in an accessible way with minimal additional computational setup (i.e., no specialized hardware or compilation routines). We build on the results of our prior work Das et al. (2018) and extend the range of available materials properties.
iii.3 Further improvements
We see multiple ways to further improve the results presented in this work. Firstly, as mentioned in Petretto et al. (2018), employing the PBEsol functional instead of PBE (the more widely used functional at the time of this writing) to treat exchange-correlation effects in the materials studied might be beneficial. According to the comparison presented in Fig. 5, this effect is expected to be small, within 15%. Secondly, a rigorous relaxation routine, together with a more extensive convergence study of the reciprocal-space sampling for each of the computed cases, might be beneficial. As phonon energies are relatively small (meV range), convergence artifacts can sometimes lead to spurious negative (imaginary) frequencies Guido Petretto (2017). We originally attempted to include all materials from the ESC71 set Das et al. (2018), but left some out because of the time required to resolve the presence of negative frequencies. Lastly, the speed and efficiency of the modeling workflows may be improved by further optimizing the coupling of the computational infrastructure to the individual calculations for each irreducible representation. As Table 2 demonstrates, the minimum attainable runtime with the “map-reduce” workflow can be as much as 374 times shorter than that of a sequential run, while in practice we achieved speedups of up to 134 times.
iii.4 Computational time and cost
Material   Atoms   Sequential time (hr)   Achieved speedup   Max. speedup   Cost ($)
MgO        2       44.1                   134                174            100
MgSe       4       66.3                   30                 248            150
AlGaN      8       18.6                   13                 34             50
B          12      102.3                  117                374            250
We present an analysis of the runtimes for the different calculation scenarios in Table 2. First, we list the total “sequential” execution time that a phonon calculation would take without parallelizing the tasks for the irreducible representations. In practice, this would mean confining the simulation to a single computing node with 16 cores, as explained in Section II. Next, we present the corresponding speedup ratios for the actual recorded runtime and for the minimum attainable runtime, which corresponds to the longest run among all irreps. As can be seen from the table, we achieved speedups in the 13–134 range in practice, while the corresponding maximum possible speedups are 34–374. We also present the associated costs, which, owing to the elastic nature of cloud computing, do not depend on the specific calculation scenario.
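Since a speedup is defined as S = T_sequential / T_parallel, the wall-clock times behind Table 2 can be backed out directly. The short sketch below transcribes the table rows and performs this arithmetic; the variable and function names are illustrative, and the resulting times are derived estimates rather than separately reported measurements.

```python
# Rows transcribed from Table 2: (material, atoms, sequential hours,
# achieved speedup, maximum attainable speedup, cost in USD).
TABLE_2 = [
    ("MgO", 2, 44.1, 134, 174, 100),
    ("MgSe", 4, 66.3, 30, 248, 150),
    ("AlGaN", 8, 18.6, 13, 34, 50),
    ("B", 12, 102.3, 117, 374, 250),
]


def implied_runtimes(t_seq_hours: float, s_actual: float, s_max: float):
    """Wall-clock hours implied by S = T_sequential / T_parallel.

    Returns (achieved runtime, best-case runtime limited by the longest irrep).
    """
    return t_seq_hours / s_actual, t_seq_hours / s_max


for name, _atoms, t_seq, s_act, s_max, _cost in TABLE_2:
    actual, best = implied_runtimes(t_seq, s_act, s_max)
    print(f"{name:6s} achieved {actual:5.2f} h   best case {best:5.2f} h")
```

The achieved runtimes come out at roughly 0.3 to 2.2 hours and the best-case runtimes at roughly 0.25 to 0.55 hours, which quantifies the remaining room for optimizing the coupling between the infrastructure and the individual irrep calculations.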
iii.5 Future outlook
The landscape of computational materials design is rapidly evolving toward data-driven science, with multiple initiatives contributing to the automated aggregation and categorization of materials properties. Major improvements in the way computational materials science is used would become possible if the range of materials properties feasible for calculation were extended to include vibrational spectra and related properties. The approach described in this work can assist with this. This work demonstrates that high-fidelity data about vibrational properties, at least for electronic materials at this stage, are readily attainable in an accessible and repeatable manner. We welcome collaborative contributions in order to, firstly, further grow the online repository of results; secondly, incorporate modeling techniques beyond those studied here; and, finally, facilitate the creation of statistical (machine learning) models based on the available data.
IV Conclusions
We present applications of a novel approach to materials modeling at the nanoscale, implemented within the Exabyte platform exa (2018a), that is capable of rapidly delivering results on the vibrational properties of materials in an accessible and data-centric manner. We apply this approach to a set of 35 materials in order to demonstrate how it works. We report phonon densities of states and phonon dispersions obtained using Density Functional Perturbation Theory within the Generalized Gradient Approximation (GGA). We compare the results with prior similar calculations and discuss the corresponding computational costs and pathways to further improvements.
We demonstrate how the computationally demanding task of calculating phonon frequencies, which would otherwise take from 18 to 102 hours on an up-to-date high-performance computing server, can be accelerated by a factor of 13–134, reducing the resulting runtime to the order of one hour. We present not only the results and the associated data, but also an easy-to-access way to reproduce and extend the results by means of the Exabyte platform exa (2018b). Our work provides an accessible and repeatable practical recipe for performing high-fidelity first-principles calculations of the vibrational properties of materials in a high-throughput manner.
References
 exa (2018a) Exabyte.io: materials discovery cloud (2018a).
 Das et al. (2018) Protik Das, Mohammad Mohammadi, and Timur Bazhirov, “Accessible computational materials design with high fidelity and high throughput,” arxiv.org/abs/1807.05623 (2018).
 Das and Bazhirov (2018) Protik Das and Timur Bazhirov, “Electronic properties of binary compounds with high throughput and high fidelity,” arxiv.org/abs/1808.05325 (2018).
 Petretto et al. (2018) Guido Petretto, Shyam Dwaraknath, Henrique P.C. Miranda, Donald Winston, Matteo Giantomassi, Michiel J. van Setten, Xavier Gonze, Kristin A. Persson, Geoffroy Hautier, and Gian-Marco Rignanese, “High-throughput density-functional perturbation theory phonons for inorganic materials,” Scientific Data 5, 180065 (2018), Data Descriptor.
 Jain et al. (2013) Anubhav Jain, Shyue Ping Ong, Geoffroy Hautier, Wei Chen, William Davidson Richards, Stephen Dacek, Shreyas Cholia, Dan Gunter, David Skinner, Gerbrand Ceder, et al., “Commentary: The Materials Project: A materials genome approach to accelerating materials innovation,” APL Materials 1, 011002 (2013).
 Curtarolo et al. (2012) Stefano Curtarolo, Wahyu Setyawan, Shidong Wang, Junkai Xue, Kesong Yang, Richard H Taylor, Lance J Nelson, Gus LW Hart, Stefano Sanvito, Marco Buongiorno-Nardelli, et al., “AFLOWLIB.ORG: A distributed materials properties repository from high-throughput ab initio calculations,” Computational Materials Science 58, 227–235 (2012).
 Saal et al. (2013) James E Saal, Scott Kirklin, Muratahan Aykol, Bryce Meredig, and Christopher Wolverton, “Materials design and discovery with high-throughput density functional theory: the Open Quantum Materials Database (OQMD),” JOM 65, 1501–1509 (2013).
 Pizzi et al. (2016) Giovanni Pizzi, Andrea Cepellotti, Riccardo Sabatini, Nicola Marzari, and Boris Kozinsky, “AiiDA: automated interactive infrastructure and database for computational science,” Computational Materials Science 111, 218–230 (2016).
 nom (2018) NOMAD laboratory: A European Centre of Excellence (2018).
 R. di Meo and Cozzini (2009) R. di Meo, A. Dal Corso, P. Giannozzi, and S. Cozzini, “Calculation of phonon dispersions on the grid using Quantum ESPRESSO,” ICTP Lecture Notes 24, 163 (2009).
 Giannozzi et al. (2009) Paolo Giannozzi, Stefano Baroni, Nicola Bonini, Matteo Calandra, Roberto Car, Carlo Cavazzoni, Davide Ceresoli, Guido L Chiarotti, Matteo Cococcioni, Ismaila Dabo, Andrea Dal Corso, Stefano de Gironcoli, Stefano Fabris, Guido Fratesi, Ralph Gebauer, Uwe Gerstmann, Christos Gougoussis, Anton Kokalj, Michele Lazzeri, Layla Martin-Samos, Nicola Marzari, Francesco Mauri, Riccardo Mazzarello, Stefano Paolini, Alfredo Pasquarello, Lorenzo Paulatto, Carlo Sbraccia, Sandro Scandolo, Gabriele Sclauzero, Ari P Seitsonen, Alexander Smogunov, Paolo Umari, and Renata M Wentzcovitch, “QUANTUM ESPRESSO: a modular and open-source software project for quantum simulations of materials,” Journal of Physics: Condensed Matter 21, 395502 (2009).
 Togo and Tanaka (2015) A Togo and I Tanaka, “First principles phonon calculations in materials science,” Scr. Mater. 108, 1–5 (2015).
 Plata et al. (2017) Jose J. Plata, Pinku Nath, Demet Usanmaz, Jesús Carrete, Cormac Toher, Maarten de Jong, Mark Asta, Marco Fornari, Marco Buongiorno Nardelli, and Stefano Curtarolo, “An efficient and accurate framework for calculating lattice thermal conductivity of solids: AFLOW-AAPL Automatic Anharmonic Phonon Library,” npj Computational Materials 3, 45 (2017).
 Guido Petretto (2017) Guido Petretto, Xavier Gonze, Geoffroy Hautier, and Gian-Marco Rignanese, “Convergence and pitfalls of density functional perturbation theory phonons calculations from a high-throughput perspective,” arxiv.org/abs/1710.06028 (2017).
 Gonze (1995) Xavier Gonze, “Adiabatic density-functional perturbation theory,” Phys. Rev. A 52, 1096–1114 (1995).
 Hohenberg and Kohn (1964) P. Hohenberg and W. Kohn, “Inhomogeneous electron gas,” Phys. Rev. 136, B864–B871 (1964).
 Ihm et al. (1979) J Ihm, A Zunger, and M.L. Cohen, “Momentum-space formalism for the total energy of solids,” Journal of Physics C: Solid State Physics 12, 4409 (1979).
 exa (2018b) Exabyte platform: project URL with data about simulations (2018b).
 Garrity et al. (2014) Kevin F. Garrity, Joseph W. Bennett, Karin M. Rabe, and David Vanderbilt, “Pseudopotentials for high-throughput DFT calculations,” Computational Materials Science 81, 446–452 (2014).
 Kohn and Sham (1965) Walter Kohn and Lu Jeu Sham, “Self-consistent equations including exchange and correlation effects,” Physical Review 140, A1133 (1965).
 Perdew et al. (1996) John P Perdew, Kieron Burke, and Matthias Ernzerhof, “Generalized gradient approximation made simple,” Physical Review Letters 77, 3865 (1996).
 azu Microsoft Azure Cloud Computing platform: web page.
 Perdew et al. (2008) John P. Perdew, Adrienn Ruzsinszky, Gábor I. Csonka, Oleg A. Vydrov, Gustavo E. Scuseria, Lucian A. Constantin, Xiaolan Zhou, and Kieron Burke, “Restoring the density-gradient expansion for exchange in solids and surfaces,” Phys. Rev. Lett. 100, 136406 (2008).