Accessible computational materials design with high fidelity and high throughput

07/15/2018 ∙ by Protik Das, et al. ∙ 0

Despite multiple successful applications of high-throughput computational materials design from first principles, a number of factors inhibit its future adoption. Of particular importance are the limited ability to provide high fidelity in a reliable manner and the limited accessibility to non-expert users. We present example applications of a novel approach, where high-fidelity first-principles simulation techniques, Density Functional Theory with Hybrid Screened Exchange (HSE) and the GW approximation, are standardized and made available online in an accessible and repeatable setting. We apply this approach to extract electronic band gaps and band structures for a diverse set of 71 materials ranging from pure elements to III-V and II-VI compounds, ternary oxides and alloys. We find that for HSE and G0W0, the average relative error fits within 20%, whereas for the conventional Generalized Gradient Approximation the error is 55%. For HSE we find the average calculation time on an up-to-date server centrally available from a public cloud provider to fit within 48 hours. This work provides a cost-effective, accessible and repeatable practical recipe for performing high-fidelity first-principles calculations of electronic materials in a high-throughput manner.




I Introduction

Materials design and discovery based on first-principles modeling is an interdisciplinary research area that has recently received much attention, with multiple success stories reported in the fields of catalysis, hydrogen storage materials, Li-ion batteries, photovoltaics, topological insulators, carbon capture, piezoelectrics, and thermoelectrics Jain et al. (2013); Curtarolo et al. (2012); Saal et al. (2013); Pizzi et al. (2016); nom. These efforts enabled the integration of computational materials science with information technology (e.g., web-based dissemination, databases and data-mining), expanded access to properties computed by first-principles modeling approaches to new communities, and promoted new collaborative work. Nevertheless, when compared with the more established computer-aided design and engineering sector, there is still much room for improvement in the way first-principles modeling is performed with respect to the accessibility and repeatability of high-fidelity calculations.

High-throughput virtual screening has produced large repositories of data for further use by other scientists, notably the Materials Project Jain et al. (2013), AFLOW Curtarolo et al. (2012), and the Open Quantum Materials Database Saal et al. (2013). Other initiatives, like AiiDA Pizzi et al. (2016), also provided a set of building blocks for the construction of simulation workflows. Recently, other approaches like NOMAD nom emerged with the idea of an open-access data repository intended to allow advanced data analytics and the creation of machine learning models. Another notable example is the Computational 2D Materials Database Haastrup et al. (2018); Rasmussen and Thygesen (2015), targeted at applications in the semiconductor area. The efforts above include a significant computer science aspect and created software tools that facilitate the execution of simulations in a high-throughput way, such as Pymatgen Ong et al. (2013), the Atomic Simulation Environment Larsen et al. (2017), the AiiDA stack Pizzi et al. (2016) and similar. These tools facilitate the adoption of the original techniques by other computational materials scientists and help organize and standardize the community efforts. Naturally, data-centric approaches to the development of new materials followed cit ; til ; Villars et al. (2004); Isayev et al. (2017); Ward et al. (2016). Additionally, the area of cloud computing as applied to first-principles materials modeling emerged in the last few years Yang et al. (2018); Bazhirov et al. (2017).

We present the approach conceived and implemented by Exabyte Inc. inside its web-based modeling platform since 2014. The approach is focused on the accessibility and repeatability of modeling workflows, is designed to support the creation and execution of multiscale models online, and is reminiscent of NanoHUB.org Klimeck et al. (2008). Compared to standalone software tools, such an approach allows users to focus on the physical essence of the problem and removes obstacles related to the computational complexity, such as installation and parallelization concerns. Our approach enables access for scientists (e.g., experimentalists) without direct knowledge of modeling techniques, promotes the exchange of ideas, and extends the creative breadth of the resulting research. By relying on centrally available cloud-based high-performance computing, the platform yields the computational power to facilitate high fidelity in a reproducible way Mohammadi and Bazhirov (2018), and its data-centric nature eliminates unnecessary repetition, facilitates collaboration, and embraces traceability, version control and other computer science paradigms Pizzi et al. (2016).

In this manuscript we report an example application of the above platform exa (a) to the electronic structural properties of semiconducting materials. We use Density Functional Theory in the plane-wave pseudopotential approximation Hohenberg and Kohn (1964); Ihm et al. (1979) and obtain the electronic band structures and band gaps for a diverse set of 71 compounds ranging from pure elements to ternary oxides, further referred to as ESC-71. We provide the results for the Generalized Gradient Approximation Perdew et al. (1996), the Hybrid Screened Exchange Heyd et al. (2003) and the GW approximation Hybertsen and Louie (1985). We compare the results with the available experimental data and assess the accuracy level of each model. For the first time, this work presents all of the following together: the results, the tools that generated the results, the simulations with all associated data, and an easy-to-access way to reproduce, improve and contribute results for other materials into a centralized, ever-growing repository exa (b).

Figure 1: Flowchart with the execution logic of the simulations. Branch (1), shown in light gray, represents the initial design of the simulation workflow with its subsequent storage as a JSON object in the database. Branch (2), dark gray, illustrates the upload of the structural materials information and its conversion to database entries. Branch (3), in black, demonstrates the main execution logic for the creation and execution of simulation jobs. Finally, (4) denotes further analysis and is shown using dashed black lines.
Figure 2: Example unit of a simulation workflow with a pre-processor, main execution part, and post-processors.
Figure 3: Simulation workflow for an HSE calculation. Post-processors (dashed) are used to extract materials properties. () denotes an auxiliary intermediate step.

II Methodology

II.1 General logic

II.1.1 Execution flow

We demonstrate the general execution flow employed in this work in Fig. 1. We start from the design of the simulation workflows (designated by (1) in the figure). We represent the logic of the workflows through a data structure encoded in JavaScript Object Notation (JSON) following a data convention further referred to as the Exabyte Data Convention (EDC). Next, we upload the initial structures for the materials to be studied (branch (2) in the figure). After that, we create and execute the simulation jobs using the cloud-based high-performance computing infrastructure assembled on demand by our software (corresponding to branch (3) in the figure), and collect the resulting properties in a database. Finally, we analyze the results either through a graphical user interface or by means of the RESTful application programming interface (API). The general execution flow, including all the above components and the associated entities, is freely available online. Users of the Exabyte platform can clone the associated entities (e.g., materials, workflows) and re-create our calculations in order to reproduce or further improve the results.

II.1.2 Workflow units

Within the EDC each workflow contains multiple units, and each unit contains the corresponding input parameters for the simulation engine(s) used within. We logically separate each individual unit into pre-processor, main execution, and post-processor parts, as demonstrated in Fig. 2. The pre-processors are run before the main part and are used for auxiliary tasks, such as creating the required system folders for data on disk. The main execution part is where the main simulation is done. Post-processors are used to assert the completion of the simulation or to attempt the main simulation again with a set of adjusted parameters. For the work described in this manuscript we used the error correction logic implemented in Jain et al. (2013). After the validity of the simulation is asserted, a set of material properties is extracted, organized into JSON data structures, and stored in the database.
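The unit structure described above can be illustrated with a minimal Python sketch. The field names (`preProcessors`, `main`, `postProcessors`, `safe_mode`) and the `run_unit` helper are hypothetical and are not taken from the Exabyte Data Convention; the retry loop only mimics the idea of re-running the main part with adjusted parameters.

```python
import json

# Hypothetical unit layout: field names are illustrative, not the actual EDC schema.
unit = {
    "name": "hse_scf",
    "preProcessors": [{"name": "create_folders"}],
    "main": {"application": "vasp", "input": {"safe_mode": False}},
    "postProcessors": [{"name": "extract_band_gap"}],
}


def run_unit(unit, run_main, adjust, max_attempts=3):
    """Run pre-processors, then the main part, retrying with adjusted input on failure."""
    log = [f"pre:{p['name']}" for p in unit["preProcessors"]]
    params = dict(unit["main"]["input"])
    for attempt in range(1, max_attempts + 1):
        log.append(f"main:attempt-{attempt}")
        if run_main(params):  # completion asserted, properties can be extracted
            log.extend(f"post:{p['name']}" for p in unit["postProcessors"])
            return {"status": "finished", "input": params, "log": log}
        params = adjust(params)  # error correction before the next attempt
    return {"status": "failed", "input": params, "log": log}


# Stand-ins for a real engine call and a real correction rule.
result = run_unit(
    unit,
    run_main=lambda p: p["safe_mode"],
    adjust=lambda p: {**p, "safe_mode": True},
)
print(json.dumps(result, indent=2))
```

Because the unit and the result are plain dictionaries, both serialize directly to JSON, which is the property the text relies on for storing workflows and extracted properties in a database.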

II.1.3 Workflows

An example workflow for the HSE calculations utilized in this work is shown in Fig. 3. We start by obtaining the relaxed structures, and then self-consistently pre-calculate the electronic wavefunctions and charge density. These steps are done within the GGA. Next, we repeat the self-consistent calculation, this time including the exchange interaction within HSE. We then run an auxiliary step to assist with the construction of the reciprocal path for the final part of the calculation, the non-self-consistent HSE calculation. The latter produces the resulting band gap and band structure properties.

II.2 Materials

All materials studied in this work constitute the ESC-71 set and are divided into 7 categories according to their stoichiometric composition. The categories together with their shorthand names are listed in Table 1. We attempted to cover a diverse set of semiconductor stoichiometries accessible to modeling from first principles. We prioritized compounds with a smaller number of atoms within the crystal unit cell, but did not impose a hard limit on the unit cell size. Most of the structures studied have 4 or fewer atoms inside the unit cell; the largest unit cell has 32 atoms. We further sub-categorized materials into groups by the associated difficulty levels for the simulation workflows involved, as explained below. The details about the materials studied, including the corresponding categories and the results, are given in Table 3. Our approach is similar to that of Heyd et al. (2005), with an attempt to widen the range of compounds studied and include materials with potential industrial applications.

Material category   Symbol   Count   N (min)   N (max)
Elemental           EL       10      2         12
III-V               35       10      2         4
II-VI               26       11      2         4
Binary oxides       BO       15      2         12
Ternary oxides      TO       10      5         32
Dichalcogenides     DC       5       3         6
Alloys              AL       10      2         8
Table 1: Summary of the material categorization employed in this work with counts. N - number of sites (atoms) in the crystal unit cell.

II.3 Workflows

In order to organize the information about the simulation workflows we employ the categorization illustrated in Table 2. The categorization depends on: (a) whether the semi-core electronic states are included in the pseudopotentials, (b) whether the treatment of spin-orbit coupling is considered within the calculation, and (c) whether the treatment of magnetic interactions is included. Larger numbers in Table 2 do not necessarily correspond to higher computational difficulty (see Fig. 16, for example). Difficulty 1 (D1) workflows are GGA calculations with the default set of pseudopotentials, as implemented in VASP 5.4.4 Kresse (1996). In terms of theory, the even-numbered difficulty workflows (e.g., 2) are similar to the nearest odd-numbered ones (e.g., 1), except for the inclusion of semi-core states in the pseudopotential for the following elements: Ga, Ge, In, Sn, Ti, Pb, Bi, Li, Na, Ca, K, Rb, Sr, Cs, Ba. Readers may consult the data online for further detailed information about the types of semi-core states included exa (b). We prioritize the most comprehensive set whenever available, such that if a pseudopotential with only and both and states are present, we use the latter.

The difficulty 3 and 4 workflows incorporate spin-orbit coupling (SOC). We treated all materials that contain elements with atomic number Z 45 (Rh) as the ones that require spin-orbit coupling to be included in the calculations. This is, notably, a somewhat "loose" approach, as there exists a well-known spin-orbit splitting effect for GaAs, for example Surh et al. (1991). We argue, however, that since the latter effect is of the order of 100 meV, it is not critical to the results of this study. This statement is further supported by the band gap value for GaAs found in this work. For the difficulty levels 5 and 6 we incorporate collinear magnetism as follows: we switch on the magnetic interactions and set the initial magnetic moments to a pre-defined value for all ferromagnetic atoms (V, Cr, Mn, Fe, Co, Ni). When more than one such atom is present in the unit cell, we alternate the signs of the magnetic moments, effectively creating an anti-ferromagnetic arrangement. Lastly, the difficulty 7 workflows have all three features and, due to the nature of the computational implementation, resolve the non-collinear magnetic interactions.
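The initialization of collinear magnetic moments described above can be sketched as follows. The moment magnitude of 5.0 is a placeholder (the text only says "a pre-defined value"), and alternating the sign over magnetic atoms only, in the order they appear, is an assumption about how the alternation is applied.

```python
# Elements treated as magnetic in the text.
MAGNETIC_ELEMENTS = {"V", "Cr", "Mn", "Fe", "Co", "Ni"}


def initial_moments(species, moment=5.0):
    """Return one starting moment per site.

    Magnetic atoms get +moment, -moment, +moment, ... in order of appearance
    (assumed alternation scheme); all other atoms start at zero.
    The default magnitude is a placeholder, not a value from the paper.
    """
    sign = 1.0
    moments = []
    for element in species:
        if element in MAGNETIC_ELEMENTS:
            moments.append(sign * moment)
            sign = -sign
        else:
            moments.append(0.0)
    return moments


# Hypothetical four-site cell with two Ni atoms.
print(initial_moments(["Ni", "O", "Ni", "O"]))
```

With a single magnetic atom in the cell the result is a single positive moment, so the anti-ferromagnetic arrangement only arises when two or more magnetic sites are present, as stated in the text.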

Difficulty level   Semi-core   SOC   Magnetism   Number of materials
1                  no          no    no          23
2                  yes         no    no          16
3                  no          yes   no          8
4                  yes         yes   no          9
5                  no          no    yes         4
6                  yes         no    yes         5
7                  yes         yes   yes         6
Table 2: Summary of the simulation workflow categorization employed in this work. "Semi-core" indicates that pseudopotentials with semi-core states were used, "SOC" stands for the inclusion of spin-orbit coupling, and "Magnetism" denotes the inclusion of collinear magnetic moments, except for difficulty 7, where spin-orbit coupling and magnetism are both included, which leads to the treatment of non-collinear magnetic interactions.
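The categorization in Table 2 amounts to a lookup from three yes/no flags to a difficulty level. The sketch below encodes it under one assumption: level 6 is taken to include semi-core states, following the rule stated in the text that even-numbered levels differ from the nearest odd-numbered level only by semi-core states. Combinations not listed in the table are rejected rather than guessed.

```python
# (semi_core, soc, magnetism) -> difficulty level, following Table 2.
# Level 6 is assumed to be (True, False, True) per the even/odd rule in the text.
DIFFICULTY_LEVELS = {
    (False, False, False): 1,
    (True, False, False): 2,
    (False, True, False): 3,
    (True, True, False): 4,
    (False, False, True): 5,
    (True, False, True): 6,
    (True, True, True): 7,
}


def difficulty(semi_core, soc, magnetism):
    """Return the workflow difficulty level for a combination of features."""
    key = (bool(semi_core), bool(soc), bool(magnetism))
    if key not in DIFFICULTY_LEVELS:
        raise ValueError(f"combination {key} is not covered by Table 2")
    return DIFFICULTY_LEVELS[key]


print(difficulty(semi_core=True, soc=False, magnetism=False))
```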

II.4 Computational setup

II.4.1 Software/Theory

All Density Functional Theory Kresse and Furthmüller (1996); Kohn and Sham (1965) calculations were performed within the pseudopotential projector augmented wave (PAW) Blöchl (1994) formalism using the Vienna Ab initio Simulation Package (VASP) Kresse (1996); Hacene et al. (2012). Within the generalized gradient approximation the exchange-correlation effects were modeled using the Perdew-Burke-Ernzerhof (PBE) Perdew et al. (1996) functional. All calculations were performed with the largest default plane wave cutoff energy of the pseudopotentials involved. The energies of all calculations were converged to within eV. The Gaussian method was chosen as the smearing algorithm, the blocked Davidson iteration scheme Johnson and Joannopoulos (2001) was chosen as the electron minimization algorithm, and ions were updated using the conjugate gradient algorithm. A smearing value of 50 meV was chosen for all the calculations. The semi-empirical Grimme-D2 correction to the Kohn-Sham energies was incorporated in all of our calculations Grimme (2006). The Heyd-Scuseria-Ernzerhof (HSE) calculations incorporate 25% short-range Hartree-Fock exchange Heyd et al. (2003). The screening parameter is set to 0.2 Å⁻¹. GW calculations were performed at the non-self-consistent (G0W0) level. The number of unoccupied bands for the band gap calculation step was set to the total number of plane waves in the SCF step.
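For orientation, the settings named in this paragraph map onto standard VASP INCAR tags roughly as in the sketch below. This is an illustrative collection, not the authors' actual input files: the choice of tags is an assumption, the 50 meV smearing width is the value quoted for the Gaussian smearing in the Results section, and the energy convergence tag is omitted because its value is not given here. The `to_incar` helper is hypothetical.

```python
# Settings shared by the calculations, expressed as standard VASP INCAR tags.
common_settings = {
    "GGA": "PE",       # PBE exchange-correlation functional
    "ISMEAR": 0,       # Gaussian smearing
    "SIGMA": 0.05,     # smearing width in eV (50 meV)
    "ALGO": "Normal",  # blocked Davidson electronic minimization
    "IBRION": 2,       # conjugate gradient ionic updates
    "IVDW": 10,        # Grimme DFT-D2 correction
}

# Additional tags that switch on the screened hybrid functional.
hse_settings = {
    "LHFCALC": True,   # include Hartree-Fock exchange
    "AEXX": 0.25,      # 25% exact exchange
    "HFSCREEN": 0.2,   # screening parameter in 1/Angstrom
}


def to_incar(settings):
    """Render a settings dictionary as INCAR-style 'TAG = value' lines."""
    def fmt(value):
        if isinstance(value, bool):  # checked first: bool is a subclass of int
            return ".TRUE." if value else ".FALSE."
        return str(value)
    return "\n".join(f"{tag} = {fmt(value)}" for tag, value in settings.items())


print(to_incar({**common_settings, **hse_settings}))
```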

We implemented sampling in the reciprocal cell based on k-points per reciprocal atom (KPPRA) with a uniform unshifted grid. In our calculations, a KPPRA of 2,000 was used unless specified otherwise. The density of states (DOS) calculations were performed on a denser grid with a KPPRA of 16,000 using tetrahedron interpolation as implemented in VASP Kresse (1996). We ran most of the calculations within a single compute node described in the next subsection. For some calculations, the memory requirements were larger than the resources available on a single node; however, an efficient parallelization scheme for memory distribution is yet to be implemented for GW calculations in VASP at the time of this writing. To accommodate the calculations within the available memory, we reduced the precision in a controlled way as follows: we first limited the number of bands to at most 1,000 instead of using all available, and then reduced the KPPRA value to lower the memory requirements. The details about the cases with reduced precision are summarized in the footnotes of Table 3.
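One common way to turn a KPPRA value into a uniform grid is sketched below: the subdivisions are chosen proportional to the reciprocal lattice vector lengths so that the number of grid points times the number of atoms is close to the requested KPPRA. This is an assumed convention for illustration and not necessarily the exact rule implemented on the platform; the function names are hypothetical.

```python
import math


def _cross(u, v):
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])


def _dot(u, v):
    return sum(x * y for x, y in zip(u, v))


def reciprocal_lengths(a, b, c):
    """Lengths of the reciprocal lattice vectors for lattice vectors a, b, c."""
    volume = abs(_dot(a, _cross(b, c)))
    normals = (_cross(b, c), _cross(c, a), _cross(a, b))
    return tuple(2.0 * math.pi * math.sqrt(_dot(n, n)) / volume for n in normals)


def kppra_grid(lattice, n_atoms, kppra=2000):
    """Uniform grid with n1*n2*n3*n_atoms close to kppra, n_i proportional to |b_i|."""
    lengths = reciprocal_lengths(*lattice)
    target = kppra / n_atoms  # desired number of grid points
    scale = (target / (lengths[0] * lengths[1] * lengths[2])) ** (1.0 / 3.0)
    return tuple(max(1, round(length * scale)) for length in lengths)


# Hypothetical cells: a 4 Angstrom cube with 2 atoms, and the same cell doubled along c.
cubic = ((4.0, 0.0, 0.0), (0.0, 4.0, 0.0), (0.0, 0.0, 4.0))
doubled = ((4.0, 0.0, 0.0), (0.0, 4.0, 0.0), (0.0, 0.0, 8.0))
print(kppra_grid(cubic, n_atoms=2))
print(kppra_grid(doubled, n_atoms=4))
```

Under this convention, doubling the cell along one axis halves the subdivisions along that axis, which is why lowering the KPPRA is an effective way to reduce memory use for larger cells.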

II.4.2 Hardware

All calculations were performed using the hardware available from the Microsoft Azure cloud computing service azu. We utilized the "H16r" and "H16mr" instances specifically designed to handle high-performance computing workloads. The instances are based on the Intel Xeon E5-2667 v3 Haswell 3.2 GHz (3.6 GHz with turbo) with 16 cores per node, and 112 and 224 GB of memory, respectively. The instances carry a low-latency, high-throughput network interface optimized and tuned for remote direct memory access. Computational resources were provisioned and assembled on demand by software implemented and available within the Exabyte platform exa (a). Most of the calculations were executed within a two-week period, with a few requiring further work beyond that time frame. The peak size of the computational infrastructure used during this work was administratively limited to 125 nodes, or 2,000 total computing cores.

II.5 Data extraction

The relevant data for each workflow unit are extracted from the calculation output, parsed and stored in the database in the JSON format according to the EDC. For instance, the forces on each atom after the volume relaxation are extracted and shown on the results page for each material. The band structure and the density of states (DOS) are likewise extracted from the corresponding band structure and density of states calculations. The results for each material can thus be viewed online on the Exabyte platform exa (b). The platform also supports programmatic extraction of the data associated with materials and simulations through a RESTful application programming interface exa (c), which was partly used in this work as well.

II.6 Repeatability

The materials, workflows, batch jobs for each material with the associated properties, and files for each step of the simulation workflows are all made readily available online exa (b). The Exabyte platform now contains all materials and workflows mentioned in this work, so readers may create an account, copy one or more materials to their account collection, copy a workflow similarly, and use the simulation designer as shown in Fig. 1 to recreate the simulation for a given material. Furthermore, users can introduce modifications to our workflows and further improve the results.

Formula Diff. Calc. lattice (Å) Expt. lattice (Å) Band gaps (eV) References
a b c a b c GGA HSE GW HSE(ref.) GW(ref.) Expt. Lat. Gap HSE GW
Si 1 3.75 3.75 3.75 3.82 3.82 3.82 0.56 1.14 1.09 1.28^1 1.12 1.17 Jette and Foote (1935) Kittel et al. (1996) Heyd et al. (2005) Shishkin and Kresse (2007)
Ge 2 3.98 3.98 3.98 4.00 4.00 4.00 0.16 0.86 0.84 0.56^1 0.66 0.75 Madelung (2012) Heyd et al. (2005) van Schilfgaarde et al. (2006)
Te 3 4.32 4.32 6.02 4.45 4.45 4.45 0.00 0.42^2 0.00 0.32 - 0.32 Keller et al. (1977) Anzin et al. (1977a) Yi et al. (2018a)
B 1 4.85 4.85 5.00 5.06 5.06 5.06 1.20 1.70 1.58 - - 1.49 Decker and Kasper (1959) Madelung (2012)
Bi 4 4.50 4.50 4.50 4.54 4.54 4.54 0.06 0.00 0.00^(4,3) - - 0.00 Cucka and Barrett (1962) Madelung (2012)
P 1 3.33 4.37 5.45 3.31 5.92 4.38 0.14 0.14 0.16 0.39 0.30 0.35 Brown and Rundqvist (1965) Asahina and Morita (1984) Gomes and Carvalho (2015) Tran et al. (2014)
As 1 3.79 3.79 3.99 3.65 3.65 4.47 0.03 0.29 0.15 0.00 - 0.30 Smith et al. (1975) Madelung (2012) Kecik et al. (2016)
Sb 3 4.31 4.31 4.46 4.30 4.30 4.30 0.00 0.00 0.00 - - 0.00 Barrett et al. (1963) Madelung (2012)
Se 1 4.21 4.21 5.10 4.37 4.37 4.96 0.64 1.34 1.38 - - 1.85 Keller et al. (1977) Madelung (2012)
grey-Sn 3 4.59 4.59 4.59 4.57 4.57 4.57 0.00 0.00 0.00 0.00 - 0.00 Brownlee (1950) Madelung (2012) Hummer et al. (2009)
III-V semiconductors
BN 1 2.50 2.50 6.61 2.50 2.50 6.66 3.15 4.20 4.32 5.98^1 5.4^7 5.95 Blase et al. (1995) Cassabois et al. (2016) Heyd et al. (2005) Tran and Blaha (2009)
BP 1 3.19 3.19 3.19 3.20 3.20 3.20 1.21 1.93 1.95 2.16^1 - 2.1 Madelung (2012) Madelung (2012) Heyd et al. (2005)
GaP 2 3.82 3.82 3.82 3.85 3.85 3.85 1.56 2.25 2.22 2.47^1 2.48 2.35 Madelung (2012) Madelung (2012) Heyd et al. (2005) Lee et al. (2016)
BAs 1 3.36 3.36 5.57 3.37 3.37 3.37 1.12 1.77 1.73 1.92^1 1.93 - Merrill (1977) Heyd et al. (2005) Lee et al. (2016)
BSb 3 3.71 3.71 3.71 3.62 3.62 3.62 0.64 1.16 1.04 1.37^1 1.28 - Madelung (2012) Heyd et al. (2005) Lee et al. (2016)
AlN 1 3.10 3.10 4.98 3.11 3.11 4.97 4.29 5.69 6.03 6.45^1 5.83 6.19 Madelung (2012) Madelung (2012) Heyd et al. (2005) van Schilfgaarde et al. (2006)
AlAs 1 3.99 3.99 3.99 3.96 3.96 3.96 1.42 2.08 2.09 2.24^1 2.59 2.23 Madelung (2012) Madelung (2012) Heyd et al. (2005) Lee et al. (2016)
GaAs 2 3.98 3.98 3.98 4.00 4.00 4.00 0.63 1.52 1.74 1.21^1 1.30 1.51 Madelung (2012) Madelung (2012) Heyd et al. (2005) van Schilfgaarde et al. (2006)
GaN 4 3.19 3.19 3.19 3.20 3.20 3.20 1.71 2.98 2.98 3.03^1 2.80 3.17 Powell et al. (1993) Madelung (2012) Heyd et al. (2005) van Schilfgaarde et al. (2006)
YN 1 3.43 3.43 3.43 3.44 3.44 3.44 0.16 1.04 0.75 - 0.97 - Saha et al. (2011) Saha et al. (2011)
II-VI semiconductors
ZnS 1 3.79 3.79 3.79 3.82 3.82 3.82 2.22 3.48 3.51 3.42^1 3.29 3.54 Madelung (2012) Madelung (2012) Heyd et al. (2005) van Schilfgaarde et al. (2006)
BeS 1 3.42 3.42 3.42 3.44 3.44 3.44 3.1 4.05 4.47 4.14 4.92 5.5 Madelung (2012) Madelung (2012) Laref and Laref (2013) Lee et al. (2016)
BeSe 1 3.64 3.64 3.64 3.64 3.64 3.64 2.62 3.49 3.81 3.54 4.19 4.00 Madelung (2012) Madelung (2012) Laref and Laref (2013) Lee et al. (2016)
BeTe 3 3.98 3.98 3.98 3.98 3.98 3.98 1.67 2.34 2.65 2.68 3.17 2.80 Madelung (2012) Madelung (2012) Laref and Laref (2013) Lee et al. (2016)
MgS 1 3.65 3.65 3.65 3.67 3.67 3.67 2.84 3.84 4.46 4.78^1 4.044 4.50 Madelung (2012) Heyd et al. (2005) Nejatipour and Dadsetani (2015)
BaSe 4 4.49 4.49 4.49 4.66 4.66 4.66 1.48 2.21 3.09^(4,5) 2.87^1 2.99 3.60 Grzybowski and Ruoff (1983) Madelung (2012) Heyd et al. (2005) Nejatipour and Dadsetani (2015)
BaTe 4 4.90 4.90 4.90 4.95 4.95 4.95 1.18 1.82 2.56^(4,5) 2.50^1 2.33 3.40 Grzybowski and Ruoff (1984) Madelung (2012) Heyd et al. (2005) Nejatipour and Dadsetani (2015)
CaSe 2 4.13 4.13 4.13 4.18 4.18 4.18 1.93 2.64 3.5 3.02^1 3.94 - Luo et al. (1994) Heyd et al. (2005)
NaS 2 4.56 4.56 4.56 4.62 4.62 4.62 2.63 3.71 4.67 - 4.77 5.00 Zintl et al. (1934)
MgSe 1 4.20 4.20 6.80 4.15 4.15 6.72 2.71 3.71 4.26 2.62^1 4.58 4.05 Mittendorf (1965) Madelung (2012) Heyd et al. (2005)
MgTe 3 4.56 4.56 7.41 4.53 4.53 7.40 2.26 3.00 3.54 3.74^1 4.19 3.49 Madelung (2012) Madelung (2012) Heyd et al. (2005) Lee et al. (2016)
MoS 1 3.17 3.17 12.37 3.16 1.16 12.29 0.95 1.45 1.37 1.06 1.28 1.29 Bronsema et al. (1986a) Wickramaratne et al. (2014) Cheiwchanchamnangij and Lambrecht (2012)
HfSe 3 3.71 3.71 6.04 3.67 3.67 6.00 0.07 0.63 0.77 1.07 1.08 1.10 Hodul and Stacy (1984) Heyd et al. (2005) Abdulsalam and Joubert (2016)
TiS 2 3.36 3.36 6.62 3.41 3.41 5.69 0.00 0.38 0.09^(4,6) 0.40 - 0.30 Wiegers and Meerschaut (1992) Suga et al. (2015)
CrS 5 3.04 3.04 6.77 - 0.00 0.00 0.34^(8,3) - - -
MnS 5 3.28 3.28 6.57 - 0.00 0.00 0.00^9 - - 0.00 Ennaoui et al. (1993)

Footnotes to Table 3: (1) HSE03; (2) Mixing is tuned; (3) KPPRA 400; (4) Number of bands 1,000; (5) KPPRA 200; (6) KPPRA 650; (7) GW calculation; (8) Number of bands 864; (9) KPPRA 500.

Table 3: Data for materials studied in this work. "Diff." - difficulty level. "Calc." and "Expt." give the lattice constants of the relaxed structure and the experimental values, respectively. Lattice constants of the primitive unit cell are given, unless otherwise noted. A linear relationship between the two materials is assumed to determine the experimental lattice constant for alloys. The HSE and GW values are compared with references when available: "HSE" and "GW". Reduced precision calculations are indicated in footnotes. "HSE03" - HSE03 approach Heyd et al. (2003). "GW" indicates the full GW approximation, "sc-GW" - self-consistent GW calculations. Materials for which the DFT+U approach was used are also noted.
Formula Diff. Calc. lattice (Å) Expt. lattice (Å) Band gaps (eV) References
a b c a b c GGA HSE GW HSE(ref.) GW(ref.) Expt. Lat. Gap HSE GW
Binary oxides
LiO 2 3.15 3.15 3.15 - 5.18 6.85 8.07 - 8.10 8.00 Ishii et al. (1999) Sommer et al. (2012)
MgO 1 2.95 2.95 2.95 2.98 2.98 2.98 5.05 6.89 8.01 6.50 7.25 7.67 Tsirelson et al. (1998) Madelung (2012) Heyd et al. (2005)
BeO 1 2.68 2.68 2.68 2.70 2.70 2.70 6.92 8.71 9.62 10.09 10.29 10.59 Madelung (2012) Madelung (2012) Shi et al. (2014)
BO 1 4.12 4.49 4.49 4.13 4.61 4.61 9.97 11.00 12.04 - - - Prewitt and Shannon (1968)
SnO 4 3.23 4.74 4.75 3.19 4.74 4.74^1 1.00 2.84 2.74^(2,12) 3.50^3 2.88 3.60 McCarthy and Welton (1989) Madelung (2012) Janotti and Van de Walle (2011) Berger et al. (2010)
AlO 1 4.75 4.75 5.11 4.76 4.76 4.76 6.28 8.22 9.29 8.82 - 8.80^3 Finger and Hazen (1978) Robertson (2000) Janotti and Van de Walle (2011)
-SiO 1 4.84 4.84 4.97 4.96 4.96 4.96 6.07 8.12 9.48 8.72 10.10^4 9.30 Pluth et al. (1985) Weinberg et al. (1979) Varley et al. (2012) Kresse et al. (2012)
BaO 4 3.72 3.72 4.15 3.78 3.78 4.30 2.02 3.60 3.73^(2,8) - - 4.29 Bernal et al. (1935) Rao and Kearney (1979)
NaO 2 3.17 4.21 4.21 2.94 4.12 4.12 0.00 0.00 0.00 - - - Klein et al. (1998)
VO 5 2.80 4.52 4.52 2.85 4.55 4.55^1 0.00 0.18 0.00^(2,6) - - - Kucharczyk and Niklewski (1979)
TiO 2 12.19 3.72 6.55 12.16 3.74 6.51^5 2.78 4.28 4.66 3.13^3 3.10 - Banfield et al. (1991) Janotti and Van de Walle (2011) Berger and Neaton (2012)
NiO 5 2.93 2.93 2.93 2.95 2.95 2.95^5 0.00 2.98 4.07 4.10 3.60 4.30^7 SASAKI et al. (1979) Gillen and Robertson (2013) Toroker et al. (2011)
CaO 2 3.94 3.94 4.73 - 3.30 4.72 4.97 4.23 4.40 6.93 Madelung (2012) Riefer et al. (2011) Riefer et al. (2011)
ZnO-cub 1 3.04 3.04 3.04 3.02 3.02 3.02 0.83 2.42 2.42 2.49 2.12 3.44 Bates et al. (1962) Madelung (2012) Heyd et al. (2005)
SrO 2 3.60 3.60 3.60 3.64 3.64 3.64 3.25 4.75 5.10 4.70 5.57 5.22 Primak et al. (1948) Madelung (2012) Bajdich et al. (2015)
Ternary oxides
NdClO 3 3.85 3.85 6.77 4.03 4.03 6.76^5 0.00 0.00 0.00^(2,8) - - - Zachariasen (1949)
SmNiO 7 5.31 5.33 7.41 5.33 5.44 7.57^5 0.00 0.00 0.00 - - - Lacorre et al. (1991)
NaOsO 4 5.25 5.25 5.25 - 0.00 1.34 0.12^(2,14) - - -
GdTiO 4 5.38 5.46 7.60 5.41 5.67 7.69 0.00 0.00 0.00^(2,12) - - - McCarthy et al. (1969)
SrTiO 2 5.50 5.50 5.51 5.51 5.51 5.51 1.94 3.24 2.90 3.20 3.57 3.25 Jauch and Palmer (1999); Long et al. (2013) Van Benthem et al. (2001)
LaCoO 7 5.30 5.30 5.30 5.34 5.34 5.34 0.00 2.52 0.24^(2,12) 2.52 - 0.60 Thornton et al. (1986) Chainani et al. (1992) Zhang et al. (2014)
LaNiO 7 5.32 5.32 5.32 5.45 5.45 5.45 0.00 0.00 0.00^(2,12) - - 0.0001 Garcia-Munoz et al. (1992) Rüegg et al. (2012)
LaMnO 7 3.84 3.84 3.84 3.88 3.88 3.88 0.00 0.00 0.00^(2,12) - - - v. Náray-Szabó (1943)
GdMnO 7 5.63 7.23 8.5 5.68 7.35 8.54^1 0.62 2.29 2.27 - - - Kagomiya et al. (2002)
GdCoO 7 3.72 3.72 3.74 3.80 3.80 3.80 0.00 0.00 0.00^(2,11) - - - Casalot et al. (1971)
Semiconductor Alloys
SiGe 2 3.84 3.84 6.33 3.85 3.85 3.85 0.36 0.78 0.94 - - 0.91 Ferrari and Bocchi (2008)
SiSn 6 4.15 4.15 4.15 4.21 4.21 4.21 0.49 0.95 1.00 - - 1.11 Soref (1992) Hussain et al. (2015)
AlGaAs 2 5.59 5.59 5.59 5.66 5.66 5.66 1.21 1.95 2.20 - - 2.02 Agostini and Lamberti (2011) Agostini and Lamberti (2011)
InGaAs 6 5.84 5.84 5.85 5.84 5.84 5.84 0.00 0.82 1.19^(2,10) - - 0.77 Agostini and Lamberti (2011) Agostini and Lamberti (2011)
InGaP 6 5.60 5.60 5.65 5.66 5.66 5.66^13 1.10 1.95 1.89^(2,10) - - 1.90
AlInAs 6 5.79 5.79 5.80 5.86 5.86 5.86^13 0.97 1.86 2.16^(2,10) - - 1.50
AlInSb 6 6.19 6.19 6.21 6.30 6.30 6.30^13 0.78 1.31 1.35^(2,10) - - 1.13
GaAsP 2 5.51 5.51 5.51 5.55 5.55 5.55^13 1.20 1.77 1.96^(2,6) - - 2.03
GaAsSb 4 5.84 5.84 5.83 5.87 5.87 5.87^13 0.08 0.53 0.54^(9,12) - - 0.72
AlGaN 2 4.46 4.46 4.46 4.45 4.45 4.45^13 2.36 3.66 3.83 - - 4.60

Footnotes to Table 4, in order: Lattice parameters are shuffled; Number of bands 1,000; The mixing parameter is tuned to get experimental value; Self-consistent GW; Conventional unit cell; KPPRA 1,000; DFT+U; KPPRA 500; Number of bands 500; KPPRA 1,500; KPPRA 650; KPPRA 200; KPPRA 250.

Table 4: Table 3 continued.

III Results

Fig. 4 shows a comparison of the band gaps calculated within GGA, HSE and GW with their experimental values for all the materials where experimental data is available. We also include the results of the Materials Project Jain et al. (2013) (further referred to as MP) calculated within GGA (or the GGA+U approach when specifically noted) for reference. As expected, it can be seen that the GGA underestimates the band gaps, and that HSE and GW both significantly improve the results. A linear regression model fit to the three different levels of theory is shown in Fig. 5. From the figure it can be seen that when a simple linear fit to the data is used, the resulting values for the model-wise errors based on the coefficient of proportionality are as follows: GGA - 35%, HSE - 17%, GW - 7%.
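A minimal sketch of such a fit is given below, assuming a proportional model (calculated gap = k × experimental gap, with no intercept) and taking the model-wise error as |1 − k|; the exact fitting procedure used for Fig. 5 may differ. The four data points (Si, GaP, AlN, ZnS) are taken from Table 3 and serve only as an illustration, so the resulting numbers are not expected to match the values quoted for the full set.

```python
def proportionality_coefficient(expt, calc):
    """Least-squares slope k of calc = k * expt (fit through the origin)."""
    return sum(x * y for x, y in zip(expt, calc)) / sum(x * x for x in expt)


# Band gaps in eV for Si, GaP, AlN and ZnS, copied from Table 3.
expt = [1.17, 2.35, 6.19, 3.54]
calculated = {
    "GGA": [0.56, 1.56, 4.29, 2.22],
    "HSE": [1.14, 2.25, 5.69, 3.48],
    "GW": [1.09, 2.22, 6.03, 3.51],
}

for model, gaps in calculated.items():
    k = proportionality_coefficient(expt, gaps)
    print(f"{model}: k = {k:.3f}, error = {abs(1.0 - k) * 100:.1f}%")
```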

Figure 4: Comparative plot of the calculated and experimentally available values for all the electronic band gaps obtained in the current work. Legend: GGA, HSE, and GW denote the results of this work for the corresponding level of theory. MP-GGA denotes the results of the Materials Project Jain et al. (2013) available at the time of this writing.
Figure 5: Comparative plot for all the band gaps calculated in the current work, including the linear fits to data for each model. The legend is the same as in Fig. 4. The equations for each of the linear fits are shown in the figure.
Figure 6: Comparative plot of the calculated and experimentally available values for all the electronic band gaps in the elemental (EL) category.
Figure 7: Same as Fig. 6 for the binary oxides (BO).
Figure 8: Same as Fig. 6 for the III-V compounds (35).
Figure 9: Same as Fig. 6 for the semiconductor alloys (AL).
Figure 10: Same as Fig. 6 for the II-VI compounds (26).
Figure 11: Same as Fig. 6 for the dichalcogenides (DC).
Figure 12: Same as Fig. 6 for the ternary oxides (TO).
Figure 13: Calculated band gap values for different levels of theory for the materials without experimental data. The legend is the same as in Fig. 4 (color-wise). For materials with an asterisk sign, the MP band gaps are calculated using DFT+U.

iii.1 Elements

Fig. 6 shows the band gaps of elemental materials compared with experimental values. The gap for Te was measured to be 0.3 eV in Anzin et al. (1977b); however, our GGA band structure does not have a gap. The lattice parameter of our relaxed structure is underestimated by 2.23%. The MP-GGA value is 0.186 eV; however, in their case the lattice parameter is overestimated by 1.37% Keller et al. (1977). The reduction in the lattice parameter in our case can be attributed to the vdW correction. In another study the meta-GGA SCAN functional is seen to predict the lattice parameters well Yi et al. (2018b); however, the calculated HSE band gap is larger than the experimental value by 40%. An exact-exchange mixing parameter of was needed to reproduce the correct experimental gap for Te in Yi et al. (2018b). Our GW calculation predicts Te to be metallic.

Ge has an experimental band gap of 0.75 eV Madelung (2012). We find the lattice parameter of the relaxed configuration to be within 0.5% of the experimental value Madelung (2012). HSE and GW overestimate the band gap by 16.21% and 13.51%, respectively. While MP predicts Ge to be metallic, our calculation predicts a band gap of 0.16 eV within the GGA. We attribute this to the inclusion of semi-core states in our case.
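The percentages quoted throughout this section are relative deviations from the experimental gap. As a small worked example, the sketch below evaluates this for the Ge values given in the text (GGA gap of 0.16 eV against the experimental 0.75 eV). The function name and the sign convention (negative for underestimation) are assumptions chosen for illustration.

```python
def relative_error_percent(calculated, experimental):
    """Signed relative error in percent; negative values mean the gap is underestimated."""
    return (calculated - experimental) / experimental * 100.0


ge_experimental = 0.75  # eV, experimental gap of Ge quoted in the text
ge_gga = 0.16           # eV, GGA gap of Ge from this work

ge_gga_error = relative_error_percent(ge_gga, ge_experimental)
print(f"Ge, GGA: {ge_gga_error:.1f}%")
```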

For Si, the GGA band gap is underestimated by 48.7%. HSE and GW both predict the band gap of Si very well: within 3.6% and 0.9%, respectively.

For boron, our GGA band gap is 19.5% smaller than the experimental value Madelung (2012). The MP GGA band gap for boron is underestimated by 4%. We attribute this difference to the inclusion of the vdW correction in our case. Our lattice parameter is about 4% smaller than that of MP, which results in a smaller band gap. A similar trend is observed for Se, where our lattice parameter is underestimated by 7%.

For the other materials in this category (P, As and Se) we find GW to underestimate the gaps by 54%, 50% and 25.5%, respectively. Incorporating partially self-consistent GW reduces the errors to 11.1%, 21% and 10.23%, respectively. Due to the small value of the gaps for P and As, the Gaussian smearing used during the calculation (50 meV) can contribute significantly to the error.

Material   GW gap (eV)   sc-GW gap (eV)   Iterations   Energy cut-off (eV)
P          0.16          0.39             4            330
As         0.15          0.24             4            250
Se         1.38          2.04             3            300
BN         4.32          5.02             4            520
NiO        4.07          5.30             4            415
Table 5: Band gaps calculated with partially self-consistent GW (sc-GW). For all the calculations the non-diagonal components of the self-energy were included. The number of iterations of the quasi-particle (QP) energy shifts and the energy cut-off used are also tabulated.

iii.2 III-V semiconductors

Fig. 8 shows the band gaps for category 35. For BP, AlAs, GaP, GaN and AlN, HSE and GW predict the band gap within 8.1% and 7% accuracy, respectively. For GaAs, HSE predicts the band gap within 0.4%. GW overestimates the band gap by 15.4%, which might be due to the fact that the band gap estimate is performed on a grid, as explained in sections II and IV. Among the materials considered, BN has a smaller GGA band gap compared to MP. The inclusion of vdW in our case reduces the lattice parameter by 2.4%, which reduces the GGA band gap.

Our GGA band gap of BN is 26.3% smaller than the MP value. The MP GGA band gap is also within 1.7% of the HSE band gap of BN. This is due to the absence of the vdW interaction between the BN layers in the MP calculations. While the in-plane lattice parameter of MP GGA matches experiment within 0.8% Solozhenko et al. (1995), the inter-layer lattice parameter is overestimated by 23%, which in turn causes the larger gap value that is close to experiment. We also find that within GW, the band gap has an error of 27.5%. We employed a self-consistent GW calculation to improve the band gap, and find that 4 iterations reduce the band gap error to 11.8%.

iii.3 II-VI

Fig. 10 shows the band gaps for category 26. The bonding nature of the materials in this category is ionic and covalent. As a result, the inclusion of vdW interaction does not affect the lattice parameters much, except for BaSe and BaTe. Due to the inclusion of vdW interaction, our lattice parameters are smaller compared to MP results and hence the gaps are smaller as well. Our HSE band gap of MgS is underestimated by 19.67% compared to Heyd et al. (2005). We attribute this difference to the smaller lattice constant due to vdW correction in our case.

iii.4 Dichalcogenides

Fig. 11 shows the results for category DC. All the materials within this category have layered two-dimensional structures. For these materials, the inclusion of the vdW interaction is critical to obtain the correct lattice parameter. For HfSe2 and MoS2, our calculated value is within 1.65% Greenaway and Nitsche (1965) and 0.8% Bronsema et al. (1986b) of the experimental value, respectively. The MP relaxed structures overestimate the values by 21% and 12.7%, respectively. Within GGA, TiS2 is predicted not to have a band gap, while the HSE and GW calculations open it. HSE overestimates the gap by 27.7%.

iii.5 Binary Oxides

Fig. 7 shows the results for binary oxides. Within this category the MP GGA band gaps match closely with our values, except in the case of NiO. While MP has a gap of 2.498 eV, our calculation predicts the material to be metallic. This discrepancy is due to the use of the Hubbard correction by MP, since a value of 6.2 eV was employed there. Our HSE and GW calculations predict the band gaps within 30% and 5.4% of the experimental value, respectively. Further improvement can be obtained using the self-consistent GW approach as demonstrated in Table 5.

iii.6 Ternary oxides

The calculated band gaps for materials in category TO are compared with the experimental band gaps in Fig. 12. Within this category experimental band gaps are found only for LaCoO3 and SrTiO3, as shown in Table 3. For LaCoO3, both GGA and HSE predict the material to be metallic. In GW we get a gap of 0.24 eV for LaCoO3; however, since we do not calculate the full band structure, as explained in section II, a further study might be required to confirm the result. For SrTiO3, HSE predicts the gap very well with a 0.4% error, while GW has an error of 10.7%.

iii.7 Semiconductor alloys

Fig. 9 shows the results for the AL category. InGaAs has no gap within GGA, while HSE predicts the gap within 6.1% of the experimental band gap. The GW calculation overestimates the band gap by 54.7%; we attribute this to the lack of the full band structure calculation and the indirect nature of the gap. In the case of AlInAs, GW overestimates the band gap by 43.9% for the same reason, while HSE gives a value 23.7% larger than experiment.

iii.8 Other materials

Materials for which experimental data was not found are plotted in Fig. 13. Our GGA band gaps match well with the MP band gaps. HSE and GW improve the results. The band gap difference of BO within GGA between our calculation and MP can be attributed to the vdW interaction included in our calculations. Due to the localization (DFT+U) effect included in MP, VO is semiconducting there, while our calculation shows VO as metallic.

Within GGA, NaOsO is predicted to be metallic both by us and MP. HSE opens a band gap of 1.34 eV. GW predicts a band gap of 0.12 eV. We believe that self-consistent GW may increase the band gap closer to the HSE value. All levels of theory predict GdMnO to be semiconducting.

IV Discussion

We meant this study as a practical "end-to-end" benchmark of the ability of the current generation of pseudopotential density functional theory (DFT) to predict the electronic properties of materials. We also focused our attention on how it can be applied in an accessible way with minimal additional computational setup (i.e. no specialized hardware or compilation routines). As was recently demonstrated in a comprehensive overview of the DFT simulation engines in Lejaeghere et al. (2016), most of them are interchangeable with respect to the results delivered within the same model approximation. We selected VASP Kresse (1996) as one of the most used tools in the space. Unlike the previous benchmarks, however, which considered computing aspects exclusively Mohammadi and Bazhirov (2018); Jackson et al. (2010), we went further and calculated the properties for a diverse set of material compounds.

iv.1 Accessibility

The problem of accurately calculating electronic band gaps has been around for nearly as long as computing itself. Although much effort was put into producing a way to obtain reliable high-fidelity results, an accessible and repeatable option is still largely missing Pizzi et al. (2016); nom . Our work is an attempt to demonstrate how a standardized approach to the creation and execution of first-principles modeling workflows developed by Exabyte Inc. can resolve the above. We present an accessible, repeatable and cost-effective way to deploy first-principles modeling workflows. Furthermore, we make the data freely available on the web, and provide an intuitive way to reproduce our work. Recently there has been much attention to high-throughput first-principles calculations of materials properties, which led to the proliferation of online databases and the development of the associated software tools Jain et al. (2013); Curtarolo et al. (2012); Pizzi et al. (2016); nom ; Saal et al. (2013). Our approach has similar capabilities, as demonstrated by this work, and is accessible to a larger community, in particular to those without first-hand knowledge of DFT.

Another important recent advancement came from data-centric approaches, where large repositories of data can be used together with machine learning techniques in order to build predictive models Isayev et al. (2017); Ward et al. (2016). Such models are able to deliver predictions much faster, as they do not require the solution of physical equations. Our data-centric platform can power the construction of such models, with the potential to achieve improved accuracy of predictions by basing them on more accurate DFT results. Others approached the problem from a more traditional perspective, attempting to construct DFT functionals capable of delivering high accuracy for the calculations of electronic band structures Crowley et al. (2016). We will refrain here from discussing transferability from one material class to another for any such functional. Instead, we would like to point out that in practical applications the bottleneck that prevents the adoption of any of the aforementioned techniques is the human time required to get a prediction with a certain expected level of precision. Reliably delivering high-fidelity results quickly is certainly the best outcome; however, an approach that takes a long time to compute but little time to set up and oversee can work just as well.

Figure 14: Difficulty-wise average errors. The width of the bars is proportional to the number of materials in each category. Difficulties 5 and 7 are excluded due to low count (3).
Figure 15: The average errors per stoichiometric category. The width of the bars is proportional to the number of materials in each category. Ternary oxides are excluded due to low count (3).

iv.2 Fidelity and error analysis

When comparing with the available experimental data we point out some important conditions used within our approach that are known to affect the calculation results. Firstly, we conduct the structural relaxation within the GGA and subsequently use the resulting structure for the HSE and GW calculations. GGA is largely believed to work well for the ground-state properties of materials, and thus little change is expected when the structures are relaxed with HSE, for example Heyd et al. (2005). Secondly, in order to improve the treatment of the van der Waals (vdW) interaction within our models we introduce a correction as implemented in VASP Harl et al. (2010); Grimme (2006). This improves the results especially for layered materials, because layered materials are self-passivated and the inter-layer interaction is dominated by the vdW interaction. Lastly, due to the computational complexity of the current implementation for GW within VASP, we calculate the GW band gaps using the electronic eigenvalues on a grid of points inside the Brillouin zone, instead of using the standard path Curtarolo et al. (2012) as we did for GGA and HSE. The latter fact affects the fidelity of the results especially for indirect gap semiconductors, where the band extrema are located far from the high symmetry points sampled by the grid. Due to constraints on the availability of memory, we reduced the precision for a few materials as indicated in Table 3. The reduced number of k-points within the irreducible Brillouin zone may have contributed to the error as well.
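The effect of estimating a gap from a k-point grid rather than from a path through the band extrema can be illustrated with a toy model. In the sketch below, the gap is taken as the lowest conduction eigenvalue minus the highest valence eigenvalue over all sampled k-points. The one-dimensional two-band model, the grid sizes and all function names are hypothetical and chosen only so that the conduction band minimum falls between the points of the coarse grid.

```python
def gap_from_grid(eigenvalues, n_occupied):
    """Band gap (eV) from eigenvalues[k][band], sorted by energy at each k-point."""
    vbm = max(bands[n_occupied - 1] for bands in eigenvalues)  # valence band maximum
    cbm = min(bands[n_occupied] for bands in eigenvalues)      # conduction band minimum
    return max(cbm - vbm, 0.0)  # zero means metallic on this grid


def toy_bands(k):
    """Hypothetical 1D two-band model with an indirect gap of exactly 1 eV."""
    valence = -2.0 * k * k                     # maximum at k = 0
    conduction = 1.0 + 4.0 * (k - 0.37) ** 2   # minimum at k = 0.37
    return [valence, conduction]


def uniform_grid(n_points):
    """Uniform sampling of the interval [0, 1) in reduced coordinates."""
    return [i / n_points for i in range(n_points)]


coarse_gap = gap_from_grid([toy_bands(k) for k in uniform_grid(4)], n_occupied=1)
fine_gap = gap_from_grid([toy_bands(k) for k in uniform_grid(100)], n_occupied=1)
print(f"coarse grid: {coarse_gap:.4f} eV, fine grid: {fine_gap:.4f} eV")
```

Because a grid can only miss an extremum and never overshoot it, a gap estimated this way is an upper bound on the gap of the underlying band structure, which is consistent with the overestimation discussed for indirect-gap materials.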

Figures 14 and 15 show the average errors per difficulty level and per material category. The width of each column is proportional to the number of materials in each difficulty level or category. We have omitted categories and difficulty levels that have experimental band gaps available for fewer than 3 materials. We find that the GGA calculations have the largest error. HSE and GW improve the band gap by a similar margin. Category D3 has the largest error, although, notably, the sampling in this category is substantially smaller than for D1 and D2, for example. The dichalcogenides (DC) produced the largest error by material type, although it also has to be noted that the sampling in this category is the lowest. We attribute this to the vdW correction applied in the layered materials.

The lattice constants of layered materials are sensitive to the type of vdW correction applied. A vdW correction chosen specifically for the material can improve the lattice constants and thereby reduce the band gap error. Even though the HSE and GW calculations improve the band gap for this category, we suspect that a correct inter-layer spacing can improve the band gaps further. We also find that for category EL, the GW band gaps give a larger error than the HSE calculations. The self-consistent GW calculations improve the band gap for EL materials substantially (discussed in Sec. IV.3 in detail). Category 26 and 35 compounds had the most accurate predictions within HSE and GW. Category TO materials are excluded from Fig. 15 due to the low sample count.

iv.3 Further improvements to accuracy

There exist multiple ways to further improve the accuracy of the results obtained in this work. For the GW calculations, the self-consistent GW approach can improve the results. Table 5 summarizes the band gaps calculated with self-consistent GW. We find the approach where the non-diagonal components of the self-energy are included to provide the best accuracy Shishkin et al. (2007) within a manageable time frame. We believe that executing the self-consistent GW calculations in a high-throughput manner is already possible for the materials studied in this work. In practice, it would presently require using compute nodes with extra large memory. Although such nodes are readily available from public cloud providers, the current computational implementation in VASP is not optimized for this regime, and thus we would expect the resulting calculations to be more expensive and less reliable.

Another way to improve the accuracy of the results would be to use a dynamically adjustable value for the HSE mixing parameter, similar to how it is done in Skone et al. (2014). This approach would be more computationally intensive, as it requires the convergence of the static dielectric constant with respect to the mixing parameter to be achieved during the calculation. Alternatively, a "hybrid" scheme could be considered where initially an improved value for the mixing parameter is calculated "on-the-fly" based on a statistical model, and then a "single-shot" HSE calculation is executed.
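In dielectric-dependent hybrid schemes of this kind, the mixing parameter is commonly tied to the inverse of the static dielectric constant and iterated until both stop changing. The sketch below shows only the shape of such a fixed-point loop. The relation alpha = 1/epsilon is stated here as an assumption about the scheme, and `toy_dielectric_constant` is a made-up stand-in for what would in practice be a full HSE calculation at each step.

```python
def toy_dielectric_constant(alpha):
    """Hypothetical placeholder: a real workflow would run an HSE calculation here."""
    return 12.0 - 8.0 * alpha


def self_consistent_alpha(dielectric_constant, alpha=0.25, tol=1e-8, max_iter=100):
    """Iterate alpha -> 1 / epsilon(alpha) until alpha changes by less than tol."""
    for iteration in range(1, max_iter + 1):
        new_alpha = 1.0 / dielectric_constant(alpha)
        if abs(new_alpha - alpha) < tol:
            return new_alpha, iteration
        alpha = new_alpha
    raise RuntimeError("mixing parameter did not converge")


alpha, n_iterations = self_consistent_alpha(toy_dielectric_constant)
print(f"alpha = {alpha:.4f} after {n_iterations} iterations")
```

Each pass through the loop stands for one expensive calculation, which is why a statistically pre-estimated mixing parameter followed by a single HSE run is the cheaper alternative mentioned above.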

Lastly, the precision of the resulting calculations can be improved by addressing the concerns stated in the previous sub-section related to the sampling of the Brillouin zone. We used an approach based on the KPPRA, whereas introducing logic for explicit convergence into the resulting workflows could be beneficial. For GW calculations in particular, a more thorough approach to treating the convergence of the end results with respect to the size of the pseudopotential basis set might be beneficial. We assumed the default recommended value of the cutoff and used all available planewave states (bands) for the summation whenever possible. When memory concerns arose we reduced the fidelity in a controlled way, as explained in section II.
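A KPPRA-based choice of the k-point grid fixes the number of k-points per reciprocal atom, so that the total number of k-points is roughly the KPPRA divided by the number of atoms in the cell, distributed along each axis in proportion to the reciprocal lattice vector length. The following is a minimal sketch of that idea for an orthogonal cell. The function name, the rounding rule and the example cells are assumptions for illustration and do not describe the exact procedure used in this work.

```python
def kpoint_grid_from_kppra(kppra, n_atoms, cell_lengths):
    """Pick a k-point grid for an orthogonal cell with edge lengths in Angstrom.

    Longer real-space edges get fewer subdivisions, since the corresponding
    reciprocal lattice vectors are shorter.
    """
    a, b, c = cell_lengths
    target_kpoints = kppra / n_atoms                      # total k-points aimed for
    scale = (target_kpoints * a * b * c) ** (1.0 / 3.0)   # so that n_a * n_b * n_c ~ target
    return tuple(max(1, round(scale / length)) for length in cell_lengths)


# Hypothetical cells, two atoms each, with a KPPRA of 2000.
cubic_grid = kpoint_grid_from_kppra(2000, 2, (5.0, 5.0, 5.0))
layered_grid = kpoint_grid_from_kppra(2000, 2, (3.0, 3.0, 12.0))
print(cubic_grid, layered_grid)
```

An explicit convergence loop, as suggested above, would instead increase the KPPRA until the quantity of interest stops changing within a chosen tolerance.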

Figure 16: Calculation time per difficulty level (as defined in section II). The time is normalized per one compute node and unit cell volume (Å³). The electronic band structures are calculated in full for GGA and HSE only.
Model     Avg. err. (%)   Avg. runtime   Cost     Note
*Exact*   0               30 days        5,000    extrapolated
HSE       20              43 hrs         250      factual
GGA       54              18 min         5        factual
*Zero*    100             0.1 sec        0        extrapolated
Table 6: Average errors and the associated average calculation time for the HSE and GGA cases studied in this work. *Exact* and *Zero* values are constructed through a simple logarithmic fit of the HSE/GGA data for the (hypothetical) models that would produce exact and zero-fidelity results, respectively.

iv.4 Computational time and cost

In order to provide insights about the feasibility of further improved approaches and the ability to obtain the ultimate exact accuracy, we construct a simple logarithmic regression using the data obtained for the GGA and HSE results. We exclude G0W0 because its results did not include the full band structure calculations, and thus its set of computed properties is different. We assume that the average simulation runtime increases exponentially as the average error drops. This is, of course, an overly simplified treatment and is only meant to produce qualitative results. We base our logic on the fact that the calculation of the exchange interaction as employed within the HSE formalism includes integral sums over the electronic states, thus increasing the number of individual computations to the square of the number of wavefunctions. As can be seen from Table 6, within this logic one would need to run a simulation for about 30 days on average in order to produce an exact result. At the opposite end, a simulation with a runtime of less than 0.1 sec would fail to produce a meaningful result.
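The extrapolation can be sketched as a straight line in the (error, ln runtime) plane drawn through the two factual points of Table 6. The two-point construction and the function names below are assumptions about how such a fit could be set up. The numbers it produces are order-of-magnitude estimates and need not match the rounded entries of Table 6 exactly.

```python
import math


def log_linear_fit(point_a, point_b):
    """Return (intercept, slope) of ln(t) = intercept + slope * error through two points."""
    (err_a, t_a), (err_b, t_b) = point_a, point_b
    slope = (math.log(t_b) - math.log(t_a)) / (err_b - err_a)
    intercept = math.log(t_a) - slope * err_a
    return intercept, slope


def runtime_seconds(error_percent, intercept, slope):
    """Runtime predicted by the fit for a given average error (in percent)."""
    return math.exp(intercept + slope * error_percent)


# (average error in %, average runtime in seconds), taken from Table 6.
gga = (54.0, 18 * 60.0)     # 18 min
hse = (20.0, 43 * 3600.0)   # 43 hrs

intercept, slope = log_linear_fit(gga, hse)
exact_days = runtime_seconds(0.0, intercept, slope) / 86400.0
print(f"extrapolated runtime at 0% error: about {exact_days:.0f} days")
```

With these inputs the zero-error limit comes out at roughly a month of runtime, in line with the estimate quoted above.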

Our motivation for the above is to provide a metric of the extent to which physics-based first-principles modeling can augment the trial-and-error experimental approach when compared with respect to the capital and time investments required. We suggest that for the equivalent of one month of calculation time (human time) on a commodity compute server readily available from a cloud provider it is possible to obtain results that are accurate well within 20% and potentially within the 1-5% range for the properties that we study in the current work. There are, admittedly, many factors that can adversely affect the result and many ways to optimize and improve upon the setup we used. Nevertheless, it is clear that high-fidelity results are already not prohibitively expensive today, and with the advancements in computing technology they will become more and more prevalent. Furthermore, when compared with the capital expenditure required to manufacture and prototype the materials in experiment, even the "Exact" scenario we considered above appears attractive. Moreover, when data-centric community efforts without repetition are taken into account, the costs are further amortized. We believe that the correct approach to materials development from the nanoscale is to use both high-fidelity simulations and experiments in a collaborative "funnel"-like scenario, similar to how computer-aided design and engineering is applied at present.

iv.5 Future outlook

We believe that the landscape of computational materials design is rapidly evolving toward a data-driven science where the modeling results are aggregated and classified by their precision/accuracy. We believe, however, that major improvements in the way computational materials science is used would be significantly delayed, if possible at all, when pursued only by a select few. As the volume and variety of data available to the community grow at an accelerated speed, the veracity of this data also becomes increasingly important. The approach described in this work can address both aforementioned concerns. Improved creative ideation with contributions from people with multiple backgrounds is enabled by modeling workflows accessible in a standardized and repeatable way and by the shift away from the "medieval artisan-like" model Pizzi et al. (2016) still prevalent nowadays. On the other hand, this work provides a first proof that high precision is also achievable, perhaps only for electronic materials at this moment, using existing first-principles modeling techniques. We believe that a hybrid data-driven approach with roots in high-fidelity modeling is most powerful.

V Conclusions

We report on the application of a novel approach to materials modeling from the nanoscale, implemented within the Exabyte platform exa (a), to a diverse representative set of 71 semiconducting materials (ESC-71). The approach makes high-fidelity techniques such as pseudopotential Density Functional Theory with Hybrid Screened Exchange (HSE) and the GW approximation available in an accessible, repeatable and data-centric manner. We introduce a categorization for the materials according to the level of approximation used and explain the implementation of the corresponding modeling workflows. We present the results for the electronic band gaps obtained within the Generalized Gradient Approximation (GGA), HSE and GW, analyze the level of fidelity of the prediction delivered by each of the models used, and discuss the corresponding computational costs.

We compare the results with experimental data and prior similar calculation attempts, when available. We find the average relative error to be within 20% for the HSE and GW results and within 55% for GGA. We further find the average calculation time on an up-to-date compute server centrally available from a public cloud provider to fit within 30 min and 48 hours for GGA and HSE, respectively. For the first time ever we present not only the results and the associated data, but also an easy-to-access way to reproduce and extend the results by means of the Exabyte platform exa (b). Our work provides an accessible, repeatable, and extensible practical recipe for performing high-fidelity first-principles calculations in a high-throughput manner.

VI Acknowledgement

This study was conceived, executed and sponsored in full by Exabyte Inc. All computations for this work were performed using Microsoft Azure cloud computing platform. We are grateful for the support from the Microsoft BizSpark team in particular. We thank Steven G. Louie, Marvin L. Cohen, Roger K. Lake, and Georg Kresse for their advice and fruitful discussions.


  • Jain et al. (2013) A. Jain, S. P. Ong, G. Hautier, W. Chen, W. D. Richards, S. Dacek, S. Cholia, D. Gunter, D. Skinner, G. Ceder, et al., APL Materials 1, 011002 (2013).
  • Curtarolo et al. (2012) S. Curtarolo, W. Setyawan, S. Wang, J. Xue, K. Yang, R. H. Taylor, L. J. Nelson, G. L. Hart, S. Sanvito, M. Buongiorno-Nardelli, et al., Computational Materials Science 58, 227 (2012).
  • Saal et al. (2013) J. E. Saal, S. Kirklin, M. Aykol, B. Meredig, and C. Wolverton, JOM 65, 1501 (2013).
  • Pizzi et al. (2016) G. Pizzi, A. Cepellotti, R. Sabatini, N. Marzari,  and B. Kozinsky, Computational Materials Science 111, 218 (2016).
  • (5) The NOMAD laboratory: A European Centre of Excellence.
  • Haastrup et al. (2018) S. Haastrup, M. Strange, M. Pandey, T. Deilmann, P. S. Schmidt, N. F. Hinsche, M. N. Gjerding, D. Torelli, P. M. Larsen, A. C. Riis-Jensen, et al., arXiv preprint arXiv:1806.03173  (2018).
  • Rasmussen and Thygesen (2015) F. A. Rasmussen and K. S. Thygesen, The Journal of Physical Chemistry C 119, 13169 (2015).
  • Ong et al. (2013) S. P. Ong, W. D. Richards, A. Jain, G. Hautier, M. Kocher, S. Cholia, D. Gunter, V. L. Chevrier, K. A. Persson,  and G. Ceder, Computational Materials Science 68, 314 (2013).
  • Larsen et al. (2017) A. H. Larsen, J. J. Mortensen, J. Blomqvist, I. E. Castelli, R. Christensen, M. Dułak, J. Friis, M. N. Groves, B. Hammer, C. Hargus, et al., Journal of Physics: Condensed Matter 29, 273002 (2017).
  • (10) Citrine Informatics: Materials Data Platform.
  • (11) Tilde Materials Informatics.
  • Villars et al. (2004) P. Villars, M. Berndt, K. Brandenburg, K. Cenzual, J. Daams, F. Hulliger, T. Massalski, H. Okamoto, K. Osaki, A. Prince, et al., Journal of Alloys and Compounds 367, 293 (2004).
  • Isayev et al. (2017) O. Isayev, C. Oses, C. Toher, E. Gossett, S. Curtarolo, and A. Tropsha, Nature Communications 8, 15679 (2017).
  • Ward et al. (2016) L. Ward, A. Agrawal, A. Choudhary, and C. Wolverton, npj Computational Materials 2, 16028 (2016).
  • Yang et al. (2018) X. Yang, Z. Wang, X. Zhao, J. Song, M. Zhang,  and H. Liu, Computational Materials Science 146, 319 (2018).
  • Bazhirov et al. (2017) T. Bazhirov, M. Mohammadi, K. Ding,  and S. Barabash, Proceedings of the American Physical Society March Meeting 2017  (2017).
  • Klimeck et al. (2008) G. Klimeck, M. McLennan, S. P. Brophy, G. B. Adams III,  and M. S. Lundstrom, Computing in Science & Engineering 10, 17 (2008).
  • Mohammadi and Bazhirov (2018) M. Mohammadi and T. Bazhirov, in Proceedings of the 2nd International Conference on High Performance Compilation, Computing and Communications, HP3C (ACM, New York, NY, USA, 2018) pp. 1–5.
  • exa (a) materials discovery cloud (a).
  • Hohenberg and Kohn (1964) P. Hohenberg and W. Kohn, Phys. Rev. 136, B864 (1964).
  • Ihm et al. (1979) J. Ihm, A. Zunger,  and M. Cohen, Journal of Physics C: Solid State Physics 12, 4409 (1979).
  • Perdew et al. (1996) J. P. Perdew, K. Burke,  and M. Ernzerhof, Physical review letters 77, 3865 (1996).
  • Heyd et al. (2003) J. Heyd, G. E. Scuseria,  and M. Ernzerhof, The Journal of chemical physics 118, 8207 (2003).
  • Hybertsen and Louie (1985) M. S. Hybertsen and S. G. Louie, Phys. Rev. Lett. 55, 1418 (1985).
  • exa (b) Exabyte platform: project URL with data about simulations (b).
  • Heyd et al. (2005) J. Heyd, J. E. Peralta, G. E. Scuseria,  and R. L. Martin, The Journal of chemical physics 123, 174101 (2005).
  • Kresse (1996) G. Kresse, Phys. Rev. B 54, 169 (1996).
  • Surh et al. (1991) M. P. Surh, M.-F. Li,  and S. G. Louie, Phys. Rev. B 43, 4286 (1991).
  • Kresse and Furthmüller (1996) G. Kresse and J. Furthmüller, Physical review B 54, 11169 (1996).
  • Kohn and Sham (1965) W. Kohn and L. J. Sham, Physical review 140, A1133 (1965).
  • Blöchl (1994) P. E. Blöchl, Physical review B 50, 17953 (1994).
  • Hacene et al. (2012) M. Hacene, A. Anciaux-Sedrakian, X. Rozanska, D. Klahr, T. Guignon,  and P. Fleurat-Lessard, Journal of computational chemistry 33, 2581 (2012).
  • Johnson and Joannopoulos (2001) S. G. Johnson and J. D. Joannopoulos, Optics express 8, 173 (2001).
  • Grimme (2006) S. Grimme, Journal of computational chemistry 27, 1787 (2006).
  • (35) Microsoft Azure Cloud Computing platform: web page.
  • exa (c) Exabyte RESTful API client: online URL (c).
  • Jette and Foote (1935) E. R. Jette and F. Foote, The Journal of Chemical Physics 3, 605 (1935).
  • Kittel et al. (1996) C. Kittel, P. McEuen,  and P. McEuen, Introduction to solid state physics, Vol. 8 (Wiley New York, 1996).
  • Shishkin and Kresse (2007) M. Shishkin and G. Kresse, Physical Review B 75, 235102 (2007).
  • Madelung (2012) O. Madelung, Semiconductors: data handbook (Springer Science & Business Media, 2012).
  • van Schilfgaarde et al. (2006) M. van Schilfgaarde, T. Kotani,  and S. V. Faleev, Physical Review B 74, 245125 (2006).
  • Keller et al. (1977) R. Keller, W. Holzapfel,  and H. Schulz, Physical Review B 16, 4404 (1977).
  • Anzin et al. (1977a) V. Anzin, M. Eremets, Y. V. Kosichkin, A. Nadezhdinskii,  and A. Shirokov, physica status solidi (a) 42, 385 (1977a).
  • Yi et al. (2018a) S. Yi, Z. Zhu, X. Cai, Y. Jia,  and J.-H. Cho, Inorganic chemistry 57, 5083 (2018a).
  • Decker and Kasper (1959) B. Decker and J. Kasper, Acta Crystallographica 12, 503 (1959).
  • Cucka and Barrett (1962) P. Cucka and C. Barrett, Acta Crystallographica 15, 865 (1962).
  • Brown and Rundqvist (1965) A. Brown and S. Rundqvist, Acta Crystallographica 19, 684 (1965).
  • Asahina and Morita (1984) H. Asahina and A. Morita, Journal of Physics C: Solid State Physics 17, 1839 (1984).
  • Gomes and Carvalho (2015) L. C. Gomes and A. Carvalho, Physical Review B 92, 085406 (2015).
  • Tran et al. (2014) V. Tran, R. Soklaski, Y. Liang,  and L. Yang, Physical Review B 89, 235319 (2014).
  • Smith et al. (1975) P. Smith, A. Leadbetter,  and A. Apling, Philosophical Magazine 31, 57 (1975).
  • Kecik et al. (2016) D. Kecik, E. Durgun,  and S. Ciraci, Physical Review B 94, 205409 (2016).
  • Barrett et al. (1963) C. Barrett, P. Cucka,  and K. Haefner, Acta Crystallographica 16, 451 (1963).
  • Brownlee (1950) L. Brownlee, Nature 166, 482 (1950).
  • Hummer et al. (2009) K. Hummer, J. Harl,  and G. Kresse, Physical Review B 80, 115205 (2009).
  • Blase et al. (1995) X. Blase, A. Rubio, S. G. Louie,  and M. L. Cohen, Physical review B 51, 6868 (1995).
  • Cassabois et al. (2016) G. Cassabois, P. Valvin,  and B. Gil, Nature Photonics 10, nphoton (2016).
  • Tran and Blaha (2009) F. Tran and P. Blaha, Physical review letters 102, 226401 (2009).
  • Lee et al. (2016) J. Lee, A. Seko, K. Shitara, K. Nakayama,  and I. Tanaka, Physical Review B 93, 115104 (2016).
  • Merrill (1977) L. Merrill, Journal of Physical and Chemical Reference Data 6, 1205 (1977).
  • Powell et al. (1993) R. Powell, N.-E. Lee, Y.-W. Kim,  and J. Greene, Journal of applied physics 73, 189 (1993).
  • Saha et al. (2011) B. Saha, T. D. Sands,  and U. V. Waghmare, Journal of Applied Physics 109, 073720 (2011).
  • Laref and Laref (2013) S. Laref and A. Laref, Journal of Materials Science 48, 5499 (2013).
  • Nejatipour and Dadsetani (2015) H. Nejatipour and M. Dadsetani, Physica Scripta 90, 085802 (2015).
  • Grzybowski and Ruoff (1983) T. Grzybowski and A. Ruoff, Physical Review B 27, 6502 (1983).
  • Grzybowski and Ruoff (1984) T. A. Grzybowski and A. L. Ruoff, Physical review letters 53, 489 (1984).
  • Luo et al. (1994) H. Luo, R. G. Greene, K. Ghandehari, T. Li,  and A. L. Ruoff, Physical review B 50, 16232 (1994).
  • Zintl et al. (1934) E. Zintl, A. Harder,  and B. Dauth, Zeitschrift für Elektrochemie und angewandte physikalische Chemie 40, 588 (1934).
  • Mittendorf (1965) H. Mittendorf, Zeitschrift für Physik 183, 113 (1965).
  • Bronsema et al. (1986a) K. Bronsema, J. De Boer,  and F. Jellinek, Zeitschrift für anorganische und allgemeine Chemie 540, 15 (1986a).
  • Wickramaratne et al. (2014) D. Wickramaratne, F. Zahid,  and R. K. Lake, The Journal of chemical physics 140, 124710 (2014).
  • Cheiwchanchamnangij and Lambrecht (2012) T. Cheiwchanchamnangij and W. R. Lambrecht, Physical Review B 85, 205302 (2012).
  • Hodul and Stacy (1984) D. T. Hodul and A. M. Stacy, Journal of Solid State Chemistry 54, 438 (1984).
  • Abdulsalam and Joubert (2016) M. Abdulsalam and D. P. Joubert, physica status solidi (b) 253, 705 (2016).
  • Wiegers and Meerschaut (1992) G. Wiegers and A. Meerschaut, Journal of alloys and compounds 178, 351 (1992).
  • Suga et al. (2015) S. Suga, C. Tusche, Y.-i. Matsushita, M. Ellguth, A. Irizawa,  and J. Kirschner, New Journal of Physics 17, 083010 (2015).
  • Ennaoui et al. (1993) A. Ennaoui, S. Fiechter, C. Pettenkofer, N. Alonso-Vante, K. Büker, M. Bronold, C. Höpfner,  and H. Tributsch, Solar Energy Materials and Solar Cells 29, 289 (1993).
  • Ishii et al. (1999) Y. Ishii, J.-i. Murakami,  and M. Itoh, Journal of the Physical Society of Japan 68, 696 (1999).
  • Sommer et al. (2012) C. Sommer, P. Krüger,  and J. Pollmann, Physical Review B 85, 165119 (2012).
  • Tsirelson et al. (1998) V. Tsirelson, A. Avilov, Y. A. Abramov, E. Belokoneva, R. Kitaneh,  and D. Feil, Acta Crystallographica Section B 54, 8 (1998).
  • Shi et al. (2014) L. Shi, Y. Qin, J. Hu, Y. Duan, L. Qu, L. Wu,  and G. Tang, EPL (Europhysics Letters) 106, 57001 (2014).
  • Prewitt and Shannon (1968) C. Prewitt and R. Shannon, Acta Crystallographica Section B: Structural Crystallography and Crystal Chemistry 24, 869 (1968).
  • McCarthy and Welton (1989) G. J. McCarthy and J. M. Welton, Powder Diffraction 4, 156 (1989).
  • Janotti and Van de Walle (2011) A. Janotti and C. G. Van de Walle, physica status solidi (b) 248, 799 (2011).
  • Berger et al. (2010) J. Berger, L. Reining,  and F. Sottile, Physical Review B 82, 041103 (2010).
  • Finger and Hazen (1978) L. W. Finger and R. M. Hazen, Journal of Applied Physics 49, 5823 (1978).
  • Robertson (2000) J. Robertson, Journal of Vacuum Science & Technology B: Microelectronics and Nanometer Structures Processing, Measurement, and Phenomena 18, 1785 (2000).
  • Pluth et al. (1985) J. Pluth, J. Smith,  and J. Faber Jr, Journal of Applied Physics 57, 1045 (1985).
  • Weinberg et al. (1979) Z. Weinberg, G. Rubloff,  and E. Bassous, Physical Review B 19, 3107 (1979).
  • Varley et al. (2012) J. Varley, A. Janotti, C. Franchini,  and C. G. Van de Walle, Physical Review B 85, 081109 (2012).
  • Kresse et al. (2012) G. Kresse, M. Marsman, L. Hintzsche,  and E. Flage-Larsen, Physical Review B 85, 045205 (2012).
  • Bernal et al. (1935) J. Bernal, E. Djatlowa, I. Kasarnowsky, S. Reichstein,  and A. Ward, Zeitschrift für Kristallographie-Crystalline Materials 92, 344 (1935).
  • Rao and Kearney (1979) A. Rao and R. Kearney, physica status solidi (b) 95, 243 (1979).
  • Klein et al. (1998) W. Klein, K. Armbruster, and M. Jansen, Chemical Communications, 707 (1998).
  • Kucharczyk and Niklewski (1979) D. Kucharczyk and T. Niklewski, Journal of Applied Crystallography 12, 370 (1979).
  • Banfield et al. (1991) J. F. Banfield, D. R. Veblen,  and D. J. Smith, American Mineralogist 76, 343 (1991).
  • Berger and Neaton (2012) R. F. Berger and J. B. Neaton, Physical Review B 86, 165211 (2012).
  • Sasaki et al. (1979) S. Sasaki, K. Fujino, and Y. Takéuchi, Proceedings of the Japan Academy, Series B 55, 43 (1979).
  • Gillen and Robertson (2013) R. Gillen and J. Robertson, Journal of Physics: Condensed Matter 25, 165502 (2013).
  • Toroker et al. (2011) M. C. Toroker, D. K. Kanan, N. Alidoust, L. Y. Isseroff, P. Liao,  and E. A. Carter, Physical Chemistry Chemical Physics 13, 16644 (2011).
  • Riefer et al. (2011) A. Riefer, F. Fuchs, C. Rödl, A. Schleife, F. Bechstedt,  and R. Goldhahn, Physical Review B 84, 075218 (2011).
  • Bates et al. (1962) C. H. Bates, W. B. White,  and R. Roy, Science 137, 993 (1962).
  • Primak et al. (1948) W. Primak, H. Kaufman,  and R. Ward, Journal of the American Chemical Society 70, 2043 (1948).
  • Bajdich et al. (2015) M. Bajdich, J. K. Nørskov,  and A. Vojvodic, Physical Review B 91, 155401 (2015).
  • Zachariasen (1949) W. Zachariasen, Acta Crystallographica 2, 388 (1949).
  • Lacorre et al. (1991) P. Lacorre, J. Torrance, J. Pannetier, A. Nazzal, P. Wang,  and T. Huang, Journal of Solid State Chemistry 91, 225 (1991).
  • McCarthy et al. (1969) G. J. McCarthy, W. B. White,  and R. Roy, Materials Research Bulletin 4, 251 (1969).
  • Jauch and Palmer (1999) W. Jauch and A. Palmer, Physical Review B 60, 2961 (1999).
  • Long et al. (2013) J. Long, L. Yang,  and X. Wei, Journal of Alloys and Compounds 549, 336 (2013).
  • Van Benthem et al. (2001) K. Van Benthem, C. Elsässer, and R. French, Journal of Applied Physics 90, 6156 (2001).
  • Thornton et al. (1986) G. Thornton, B. Tofield,  and A. Hewat, Journal of Solid State Chemistry 61, 301 (1986).
  • Chainani et al. (1992) A. Chainani, M. Mathew,  and D. Sarma, Physical Review B 46, 9976 (1992).
  • Zhang et al. (2014) X.-b. Zhang, F. Gang,  and H.-l. Wan, Chinese Journal of Chemical Physics 27, 274 (2014).
  • Garcia-Munoz et al. (1992) J. Garcia-Munoz, J. Rodriguez-Carvajal, P. Lacorre, and J. Torrance, Physical Review B 46, 4414 (1992).
  • Rüegg et al. (2012) A. Rüegg, C. Mitra, A. A. Demkov,  and G. A. Fiete, Physical Review B 85, 245131 (2012).
  • v. Náray-Szabó (1943) S. v. Náray-Szabó, Naturwissenschaften 31, 466 (1943).
  • Kagomiya et al. (2002) I. Kagomiya, K. Kohn,  and T. Uchiyama, Ferroelectrics 280, 131 (2002).
  • Casalot et al. (1971) A. Casalot, P. Dougier,  and P. Hagenmuller, Journal of Physics and Chemistry of Solids 32, 407 (1971).
  • Ferrari and Bocchi (2008) C. Ferrari and C. Bocchi, in Characterization of Semiconductor Heterostructures and Nanostructures (Elsevier, 2008) pp. 93–132.
  • Soref (1992) R. Soref, Journal of Applied Physics 72, 626 (1992).
  • Hussain et al. (2015) A. M. Hussain, N. Wehbe,  and M. M. Hussain, Applied Physics Letters 107, 082111 (2015).
  • Agostini and Lamberti (2011) G. Agostini and C. Lamberti, Characterization of semiconductor heterostructures and nanostructures (Elsevier, 2011).
  • Anzin et al. (1977b) V. Anzin, M. Eremets, Y. V. Kosichkin, A. Nadezhdinskii,  and A. Shirokov, physica status solidi (a) 42, 385 (1977b).
  • Yi et al. (2018b) S. Yi, Z. Zhu, X. Cai, Y. Jia, and J.-H. Cho, Inorganic Chemistry 57, 5083 (2018b).
  • Solozhenko et al. (1995) V. Solozhenko, G. Will, and F. Elf, Solid State Communications 96, 1 (1995).
  • Greenaway and Nitsche (1965) D. L. Greenaway and R. Nitsche, Journal of Physics and Chemistry of Solids 26, 1445 (1965).
  • Bronsema et al. (1986b) K. Bronsema, J. De Boer,  and F. Jellinek, Zeitschrift für anorganische und allgemeine Chemie 540, 15 (1986b).
  • Lejaeghere et al. (2016) K. Lejaeghere, G. Bihlmayer, T. Björkman, P. Blaha, S. Blügel, V. Blum, D. Caliste, I. E. Castelli, S. J. Clark, A. Dal Corso, S. de Gironcoli, T. Deutsch, J. K. Dewhurst, I. Di Marco, C. Draxl, M. Dułak, O. Eriksson, J. A. Flores-Livas, K. F. Garrity, L. Genovese, P. Giannozzi, M. Giantomassi, S. Goedecker, X. Gonze, O. Grånäs, E. K. U. Gross, A. Gulans, F. Gygi, D. R. Hamann, P. J. Hasnip, N. A. W. Holzwarth, D. Iuşan, D. B. Jochym, F. Jollet, D. Jones, G. Kresse, K. Koepernik, E. Küçükbenli, Y. O. Kvashnin, I. L. M. Locht, S. Lubeck, M. Marsman, N. Marzari, U. Nitzsche, L. Nordström, T. Ozaki, L. Paulatto, C. J. Pickard, W. Poelmans, M. I. J. Probert, K. Refson, M. Richter, G.-M. Rignanese, S. Saha, M. Scheffler, M. Schlipf, K. Schwarz, S. Sharma, F. Tavazza, P. Thunström, A. Tkatchenko, M. Torrent, D. Vanderbilt, M. J. van Setten, V. Van Speybroeck, J. M. Wills, J. R. Yates, G.-X. Zhang,  and S. Cottenier, Science 351 (2016), 10.1126/science.aad3000.
  • Jackson et al. (2010) K. Jackson, L. Ramakrishnan, K. Muriki, S. Canon, S. Cholia, J. Shalf, H. Wasserman, and N. Wright, Proceedings of the IEEE Second International Conference on Cloud Computing Technology and Science (CloudCom 2010), 159 (2010).
  • Crowley et al. (2016) J. M. Crowley, J. Tahir-Kheli, and W. A. Goddard, The Journal of Physical Chemistry Letters 7, 1198 (2016), PMID: 26944092.
  • Harl et al. (2010) J. Harl, L. Schimka, and G. Kresse, Physical Review B 81, 115126 (2010).
  • Shishkin et al. (2007) M. Shishkin, M. Marsman, and G. Kresse, Physical Review Letters 99, 246403 (2007).
  • Skone et al. (2014) J. H. Skone, M. Govoni, and G. Galli, Physical Review B 89, 195112 (2014).