Nowadays, many devices are able to supply high quality spectrum measurements. However the interpretation of the samples remains a difficult task because the numerical accuracy that is possible to obtain is very limited for medium to large molecules ( atoms). In a typical resolution of vibrational Schrödinger equations in the Born-Oppenheimer frame, one can question the correctness of the model from two principal angles. First, the quality of the results will be affected by the one of the Potential Energy Surface (PES). This last point aside, there remains the validity of the numerical solutions when the PES is provided. That is the focus of the proposed method. A prior analysis of the potential energy supplies the harmonic states that are not able to correctly describe complex combinations of translational and rotational motions of the nucleus. Although still accurately limited, the VSCF method Bowman1979 ; Bowman+1986
alone allows a better representation. With the idea that these first estimations can be combined to give a more authentic description, the cheapest technique remains the perturbation theoryChristiansen2003 ; Yagi2009 ; Respondek2009 that is known to struggle with strong resonances conventionally encountered in molecular spectroscopy Herman2013 ; Roth2009 .
Vibration configuration interaction (VCI) method Christoffel1982 ; Thompson1980 ; Christiansen2007 permits a better precision with a much higher computational expense coming from the size of the variational space that is tenfold with the number of atoms of the molecule. To avoid this bottleneck, contraction techniques have been proposed with MCTDH Multi-configurational1989 ; Worth2000 , followed by the Alternating Least Square (ALS) procedure Beylkin2005 formulated for wave function representation with the Vibrational Coupled Cluster (VCC) theory OVE ; Madsen2017 . In the category of variational methods using perturbation criteria, the decomposition AK originally designed for electronic structure calculation and introduced in the VCI context with the Vibrational Multi Reference Configuration Interaction (VMRCI) Pfeiffer2014 , has been able to identify relevant sub-blocks of the Hamiltonian matrix thereby reducing the size of the system. Analogous construction has been employed with PyVCI_VPT2PyVCIVPT2 ; Sibaev2016 111VPT2=Vibrational Perturbation Theory of order 2 and Adaptive-VCI (A-VCI) Garnier2016 ; Odunlami2017 to respectively access the fundamentals and smallest eigenvalues with a very good precision. These last approaches showed promising results in term of size reduction, but still require to a posteriori determine a big matrix block designed to improve the accuracy of the solutions. In the present work, the non zeros of this sub-block are never collected and the number of operation to perform proper Matrix Vector Products (MVPs) is highly reduced thanks to a new factorisation of the Hamiltonian. Beside, the expense of RAM is even more diminished because there is the possibility to constraint numerical accuracy on a few specified targets. In next sections are presented the context and the state of the art. The concept of duality intervenes all along the paper each time an association is made between a group of objects especially in section 4 explaining the theoretical aspects of the algorithm. Thereafter a description of the successive processed operations are followed by benchmarks. In this section, the method can provide comparable quality (i.e less than 1 deviation) of a full A-VCI calculation, but with a memory consumption scaled down by more than a factor 15. Next, the list of input parameters of the program serves as a user manual. At the end, the conclusion also mentions a list of possible future developments.
For a molecule composed of atoms, one considers NM dimensionless normal coordinates Wilson1955 with . The corresponding harmonic frequencies are designated by . The model is based on the Watson Hamiltonian Watson1968 with zero rotational angular momentum (J=0). Its vibrational part contains the PES that is a multidimensional function known only for few points calculated by a first electronic resolution most commonly accompanied by a chained evaluation of the derivativesALLEN1996 ; Allen1993
. A natural choice of interpolation points would be through the Gauss quadrature rules enhancing polynomial approximation. We can also note the relevance of the ones selected by the Adaptive Density-Guided ApproachSparta2009 ; Sparta2010 smoothing the PES where the variations of the energy is the most important. After this first step involving ab-initio calculations, a fittings is generally performed with classical least square methods Carter2001 ; Carter2002 recently improved with Kronecker factorisation Ziegler2016 . The current version of the code accepts the multivariate polynomial
identified with combinations of monomial degrees defined up to a maximal degree DP and attributed to force constants . From a general point of view, the construction fully exploits the sum of product separability of the PES which makes it a minimal requirement. The vibrational part of the Hamiltonian writes
The left over coupling terms consists on the Coriolis corrections
is the inverse of the moment of inertia simplified by its constant values obtained at equilibrium geometry and coefficientsare calculated according to the method of Meal and Polo Meal1956 . The Watson term does not modify the transition energies, then it can be independently evaluated and added to the ground state at the end of a vibrational treatment. In regard to the wave function of the total Hamiltonian , it is a linear combination of basis set elements belonging to an Ansatz
each one writing as a product of one dimensional harmonic oscillators
solution of the equation
Implicitly, is assimilated to the multi-index (cf figure 1),
and recognized by an integer coinciding with a pointer address when stored in memory. The algorithm could also possibly work optimized basis set Bowman1979 ; Bowman+1986 , but the efficiency will be impoverished because DVCI fully uses harmonic oscillator properties. Afterwards, classical variational formulation leads to the eigenvalue problem
with matrix coefficients of built from the integrals222Cf Appendix.
The curse of dimensionality
The number of basis functions exponentially increases with the number of nucleus when adopting a brute force variational method. As example, for a 12-d normal coordinate system, the dimension of the configuration space will be for a maximal quantum level equal to 10 in each direction. To solve the associated eigenvalue problem, one needs to manipulate items with the same size, and a multiplication by 8 gives about 251 terabytes of memory for a double precision vector.
3 State of the Art
In the field of basis selection techniques, one widely uses combination of VCI and perturbation criteria with variation-perturbation theory Rauhut2007 ; Scribano2008 ; Neff2009 ; Rauhut2008 ; Baraille2001 ; Carbonniere2010 ; Pouchan1997 ; Begue2007 . For an member of a growing subspace , selected basis functions should verify the perturbation criterion
where is a given threshold depending on the accuracy one wants to reach. and are eigen-pairs of relying upon the nature of the basis functions that are used. It typically designs the harmonic part (6) or the sum of single-mode VSCF operators Chaban2000 ; Chaban1999 ; Jung1996 ; Roy2013 . A practical manner to increase the configuration space is described in MULTIMODEMULTIMODE package, using four classes of excitation namely simple (S), double (D), triple (T) and quadruple (Q). The construction is flexible though it is difficult to guess the optimal combination of excitations that would provide the variational limit.
To significantly reduce memory usage, matrix entries may not be stocked and computed with pruning conditions CARNUM2 ; Avila2011 ; Avila2011a . It consists in finding proper weights function calculated on harmonic frequency criteria, and a maximal quantum level to define the VCI space
For example Brown2016 . When possible, symmetry properties may also be employed to separate basis set into groups of functions belonging to different irreducible representations. Prospering works on tensorial factorisation relying upon ALS minimisation Beylkin2005 permit a drastic reduction of RAM and time expenditure. In the present context, we can notice the efficacy of the Hierarchical Reduced-Rank Block Power Method (HRRBPM) Thomas2016 ; Thomas2015
and the tensor train factorisationRakhuba2016 . As for the MCTDH Meyer2012 ; Cao , the efficiency directly depends on the number of summed products in the PES that should be small enough to observe accurate results with a low computational cost.
Among the non variational methods, the VCC theory Christiansen2004 constitutes a robust way to get precision out of small spaces with a computational cost sharply increasing with the level of excitations exponentially deployed. This effect has been recently mitigated by incorporating the ALS techniques inside the algorithm Godtliebsen2015 ; Madsen2017 .
An other pertinent criteria for subspace selection is the residue. It is widely adopted in Davidson like methods Davidson1975 ; Ribeiro2005 ; Ribeiro2002 ; Zhou2007 and has recently been implemented in A-VCI Garnier2016 ; Odunlami2017 sharing same structure than the decomposition AK . In the theory, one considers primary and secondary spaces respectively called and . In the whole space , Hamiltonian matrix writes:
where the different sub-blocks combine and in the following way
Regarding the A-VCI is included in , namely the complement of in its image by (2), VMRCI builds and for PyVCI_VPT2 is comprised in the -level excitation space written
One can remark the use of the inclusion instead of equality because additional truncation on secondary space may be added to diminish the memory usage. For example one can cut down the maximal Harmonic energy as for the A-VCI, or only consider STD excitations in the case of VMRCI. The A-VCI is dedicated to compute the first eigenvalues of the Hamiltonian whereas VMRCI and PyVCI_VPT2 are focussed on the fundamentals and its degenerates generally constituting the states of interest. For an eigenpair of (11), and
the zero padded array ofin , the residual vector and its components write
They measure an error on the energy and on the wave function (4) respectively related to the second and one order corrections expressed in the approximation as
In an iterative process, A-VCI considers maximal components of the residual vector whereas VMRCI and PyVCI_VPT2333 and only is retained formula (15) with PyVCI_VPT2. select the configurations from the partial energies (15) lying above a given threshold impacting the quality of the final results. Whether we consider the residual vector or the correction energy, the selected configurations are always assigned to the secondary space, then the corresponding errors will be nullified at next iteration. The expected effect is to minimize the deviations between eigenpairs computed in and the ones that we would have calculated in . As a consequence, should be big enough or carefully enlarged to guaranty the pertinence of the measured precision and so the quality of the eigenpairs. In this framework of study, there is then trade off between the backing up of the entries of that would increase the memory requirement and their computation on the fly which might be time consuming.
4 Theory and algorithm
In this part, we will see that the general algorithm is made from an enhanced mix of the previously recalled methods and a new theoretical approach to the construction of the Hamiltonian. Here are the main features:
For the generation of , in addition to the usual ways of expansion previously reminded, the method is capable to calibrate an optimal choice of excitations according to the analysis of the force field and Coriolis terms.
The precision is focused on a given choice of targets, which allows an effective compression of the information and a significant gain of memory.
The presented factorisation relieves the occupation of memory by performing on-the-fly operations for the MVP (14) that are not compensated by a systematic augmentation of the latency.
4.1 The local factorisation
In the field of mathematics, the duality is a principal that associates two different sets belonging to a same or a distinct structure. A famous example is given by the Riesz representation theoremrudin1966real that assimilates a vector space to a set of linear forms (e.g scalar product). In the same order of ideas, we associate a product of creation and annihilation operators with an occupation-number vector Leaf1973 . This correspondence is based on the ascertainment that the space of occupation numbers can be generated by applying raising and lowering excitations repeatedly on any element of the same space. Considering by and the creation (or raising) and annihilation (or lowering) operators, acting on Hermite functions in the following manner cohen1977quantum
the position and derivative write
The local factorisation intends to answer the question:
Is it possible to develop and factorise expression (18) as a sum of product of excitations? Since the multiplication is not commutative, there is no straightforward response.
We can, in fact, treat it from another angle by noticing from elementary properties of Hermite functions 444Cf Appendix. that the paired elements involve only the local force constants555 because it is included in the harmonic part (6) and will be accounted when .
In the same way, for a group of four canonical vectors of lengths NM with entry 1 respectively on position the local Coriolis interactions contain the combinations
These sets are gathered in the local force field written
and defined no matter the sign of the differences .
as depicted in the following loop
From expressions (17), a creation is always accompanied by annihilation. Then, the dual of (18) consisting in its intrinsic sum of product of excitations written as if they were commutative, is constructed from the positive and negative multi-increments
and expresses as777 is abusively employed to indicate the presence of a or sign in front of .
The spaces are growing up together along the repetitions of the main loop. The set designs the added basis functions in from one iteration to the other. is completed by browsing the image . In the meantime the MVP is partially calculated for the couples as 888 is defined equation (3).
The other part of the sum and other components are then completed as explained in the next section. When setting the parameter DoGraphx to zero, one also has the possibility to directly compute the whole vector by fetching instead of in (25). Under these circumstances, a supplement of execution time balanced by a smaller usage of RAM will be observed essentially because many tests are required to locate the addresses of the members of . In addition to the classical truncations, the excitations of can be selected with ThrKXj, defining a threshold on the sum of the force constants contained in each local force field. The purpose is to avoid the runaway of size and incorporate only the most contributive basis functions to the residue.
4.2 Complementary storage
Solving an eigenvalue problem usually requires a significant amount of MVPs, then the non null coefficients of (12) might rather be collected than evaluated on the fly. So far, for any related method, the complexity to determine a non null matrix entry is at least of order where NPES designates the total number of force constant in the PES. When using the local force field for the evaluation of , it is enough to fetch
calculate the terms in brackets equation (25) for and add the harmonic energy when . The localization of the position of in is performed with a binary search BSEARCH costing operations. Consequently, the total complexity is
and one can easily check that . Indeed for 999The case intervenes only for the construction of the diagonal elements of ., the cardinal of (21) is always very much lower than NPES. The worst case being NPES/2*NM happening only for a PES with no crossings. It therefore appears that the local factorisation consumes less operations than traditional methods for the determination of matrix coefficients, and for a small amount of MVPs, will be capable to integrate into a calculation on the fly without jeopardizing the execution time. Nevertheless, it will be all the more accelerated as the address of the paired elements is known in advance. This technique is actually employed for the residual block when the parameter DoGraph is strictly positif, by keeping only the pointers on the non zeros of , thus directly accessible to complete the MVP (25) for the missing parts:
4.3 Initial space construction
Let’s consider the ordered eigenvalues of block built at iteration
One can demonstrate with the Poincaré separation theoremPoincare that each is a decreasing sequence of . Consequently, the minimization of the differences
could be effective only if no spectral hole is introduced in the interval holding targets at step zero where the eigenvalues are simple harmonic energies (6). Also, in order to integrate the perturbation effects, the initial space writes
where 101010., MaxFreqr is the maximal tracked frequency and s an empirical elongation factor accounting the global deviation between converged eigenvalues and associated harmonic energies. Its default value is set to 1.2, but it is automatically increased in agreement with the anharmonicity growing up as we get away from the smallest eigenvalue. To avoid timing issues caused by the great number of combination band111111. to be tested, the initial space is calculated by recursively applying simple raising excitations starting from the zero configuration until size consistency.
5 The iterative process
The maximal size of the arrays used for and is determined with the allocated memory controlled by the parameter Memory t. The direct sum evolves in the product space :
Note that Freq0Max is also the maximal allowed value of harmonic combination bands that is equivalent to the following pruning condition
Naturally, the resulting reference space is never fully browsed over all the possible configurations, but it constitutes a barrier for the growth of that is recursively enlarged from (24). At any iteration, the set
refers to eigenvectors ofhaving one component larger than ThrCoor o and assigned to targets given by the parameter TargetState n. After a prior construction of the objects (21), the sequence of successive main steps decomposes as
Build the initial subspace (31).
Evaluate the residual vectors on the fly and secondary basis set as explained in section 4. The expansion of can additionally be reduced with an elimination of the less contributing excitations of via the parameter ThrKXj. Biggest components (in absolute value) of residual vectors are then employed to select basis functions to be added for next iteration through the inputs NAdd k, EtaComp l and MaxAdd m.
Go back to step b as long as the maximal relative residue stays above a given threshold namely
This criterion can be fulfilled only if enough memory has been allocated at the beginning. Otherwise, the algorithm will stop until maximal number of basis functions has been reached. It also trivially appears that increasing the targets does also augment the required memory to make them converge at once.
All the calculus are done on a 64 bits, 2.70GHz quad core processor (model Intel i5-3340M) with 8 Gigabytes of RAM. No parallel process is used in here. DVCI program runs on a single CPU so that the actual computational time is the same as the CPU wall time. The memory unit is the Megaoctet (MO)equivalently known as the Megabyte (MB). On reminder, the relative residues(35), and correction energies (15) are calculated in the secondary space truncated with the pruning condition (33, 34) and generated with the most contributive excitations of (24) selected with ThrKX. For the default parameter values, a calculation carried out to the end with EpsRez will deviate around from the reference. Another indicator of convergence is given by the height of the eigenvalues which is decreasing according to the Poincaré separation theorem. Shrinking the reference space by lowering down the value of (Freq0Max, MaxQlevel), will certainly diminish the required computational resources, yet it will be difficult to predict which values will provide the variational limit. It is also possible to raise the threshold for the matrix elements, but for the benchmarks presented here, one tends to avoid cutting down matrices and reference spaces to get more chances to actually converge. To visualize the pruning condition (34) with a linear combination of quantum numbers, the weights are rounded to the closest integers.
6.1 : Acetonitrile
Methods are compared on molecule with the same PES as the one used in Ref. Avila2011 ; Leclerc2014 ; Thomas2015 ; Garnier2016 that was initially introduced by Bégué and al. Begue2005 , computed at CCSDT/cc-pVTZ level for harmonic frequencies and B3LYP/cc-pVTZ for higher order terms. This PES counts 311 terms (12 quadratic, 108 cubic and 191 quartic). The benchmark results are taken from Avila et Carrington Avila2011 where symmetry has been employed to separate the full VCI space into two smaller subspaces. This PES has a small number of non null derivatives regarding the size of the molecule. The variational space defined in Avila2011 is the pruned basis set
It contains harmonic functions and the fundamental harmonic frequencies are
The default values (Freq0Max,MaxQLevel)=(30000 ,15) induce the following rounded pruning condition
This space is rather huge (712 713 289 elements) and the calculus could be exact in a much smaller one, but it shows that there is almost no limitation on its choice.
|(0.74), (0.44),||1389.17(15)||0.0038||-0.1885||-0.1980||1385 Shimanouchi1977|
|(0.62), (0.52),||1397.94(17)||0.0042||-0.2368||-0.2485||1402 Paso1994|
|(0.67), (0.45),||2981.01(193)||0.0037||-0.2522||-0.2316||2954 Shimanouchi1977|
|Final||Final||Final||Final||CPU Wall time(s)||Memory|
|size of||size of||(Iterations)||usage (MO)|
The complexity of this problem stems from the fact that the fundamental anharmonic frequencies are dispatched far away from the extremity of the spectrum. In the recent work of Ondunlami et alOdunlami2017 , the A-VCI didn’t catch up all of them when computing the first 238 eigenvalues. Their best declared results parallelized on a 24-core Intel Xeon E5-2680 processors running at 2.8 GHz shows a maximal absolute error (relatively to the same reference) equal to 0.305 on the first 121 eigenvalues with a time equal 5637 seconds and a final basis set of elements. In the HRRBPM of Thomas et CarringtonThomas2015 all the anharmonic frequencies before 2209 were computed with a error lower than 0.38 , 3.2 hours cpu time and 115.6 MB on a single Intel Core i7-4770 processor running at 3.4 GHz. In here all the fundamentals are computed with 115.6 megabytes memory for a maximal error lower than and a time of 5 minutes 52 seconds on a single CPU running at 2.70GHz.
6.2 : Ethylene
The potential energy surface is originally the sixth order curvilinear symmetry-adapted coordinates of Delahaye & al Delahaye2014 transformed into the sextic normal coordinate force field with PyPES PyPES . In there work, point symmetry group D2h of , has been exploited to divide VCI matrix into 8 symmetry blocks of respective dimension 106 889, 101 265, 100 366, 105 518, 105 643, 101 145, 101 255, 105 697. No symmetry assumption is applied in here.
This study is supported by a comparison with the software PyVCI_VPT2 previously introduced. Although the two methods have some notable distinctions, we try to match different parameters. The thresholds on the matrix elements and force constants are set to in both cases.
Harmonic Frequencies and derivative orders of the PES
|Number of terms||45||147||290||642||1732|
The second derivatives contain the harmonic and non null crossed terms. The thresholds for basis state selection have been chosen so that the adjustment on the fundamentals is of the order respectively observable for (cf PyVCI_VPT2 manual) and . The reference space for PyVCI is (13), which appears to be quite small ( basis functions), but complete enough to get good accuracy on the fundamentals. A further study shows that this is not exact for all the frequencies close to the mid infrared limit, in particular for that is strongly coupled with . For DVCI the paired parameters (Freq0Max,MaxQLevel)=(24000 ,10) translate into the following pruned space
totalizing 15 896 872 elements.
|(Comp)||DVCI121212Watson term||PyVCI||Full VCI||(DVCI)||PyVCI||PyVCI|
|Targets||Final||Final||Final||Final||CPU Wall time||Memory|
|size of||size of||(Iterations)||usage (MO)|
|Fund||15549||1914601||4902728||40719820||20 min 43 s (6)||378|
|4368||555782||1195293||12283453||3 min (6)||83|
Even if the number of normal coordinates is the same as in the previous system, the memory requirement and the CPU time are significantly increased due to the higher number of terms in the PES.
The state has been isolated from the fundamentals with DVCI by setting ThrCooro to 0.65 whereas it is automatically included in PyVCI.
Three series of calculations have been launched for PyVCI. The first one is the full VCI calculation taking about 3 days and 3GO of RAM. The second one with the VPT2 selection in VCI(8) lasted more than 9 hours compensated by a low buffer storage thanks to a clever system of backup of VCI matrix elements in binary files on the hard drive. The maximal memory expend recorded from the command line was 226 MO of RAM and 395 MO of ROM. The third calculation was included in (containing 646646 elements) instead of VCI(8). It has spread over a period of about one day and a half with an utilization of 953 MO of ROM and 653 MO of RAM. Thus we notice that enlarging the reference space absorbs more computational resources and mechanically increases the correction energies especially for the degenerated target who is shifted by more than . From this observation, we deduce that VCI(8) is not large enough to be considered as a reference for the variational limit when the computation is not strictly reduced to the fundamentals.
In the case of DVCI, the quantum levels are very little restricted, which makes it possible to customize the secondary space and thus more correctly assimilate the correction energy to a real error. Besides, the frequencies have a better accuracy and the latency is reduced by more than a factor 20 while spending less memory even though the reference space is around 126 times larger than VCI(8) and 25 times VCI(10).
6.3 : Ethylene oxide
Here is computed the fundamentals of a 15-d Hamiltonian system where the PES is the one of Bégué and al Begue2007 calculated at the CCSD(T)/cc-pVTZ level for harmonic frequencies and B3LYP/6-31+G(d,p) for the other terms. Apart from the harmonic force constants, the PES contains 180 cubic and 445 quartic terms. The reference results are taken from the A-VCIOdunlami2017 where final basis set contains 7 118 214 elements.
Harmonic frequency in
Correspondence with the ones listed in Odunlami2017
The maximal harmonic frequency Freq0Max=30000 additionally truncated by MaxQlevel=15, leads to the rounded pruning condition
|size of||size of||(Iterations)||usage (MO)|
The total CPU wall time is then 1 day 27 minutes and 53 seconds, and the total memory usage is 6.142 Gigabytes. The maximal absolute error on eigenvalues does not go over 0.47 giving a maximal relative error lower than . As a matter of comparison the reference calculation were done with a total memory usage of 128 gigabytes and 3 days time on a 24 cores computer meaning that the CPU wall time is much larger. Less accurate results (4-5 error on higher frequencies) are achieved with HRRBPM Thomas2016 with a memory usage of 14.6 gigabytes and a CPU wall time of 8.7 days.
The PES was constructed using the adaptive density-guided approach (ADGA) introduced by Sparta et al Sparta2010 ; Sparta2009 ; Sparta2009_2 . The force constants, equilibrium geometry and normal coordinates where extracted from Madsen et al Madsen2017 . In their work they describe the construction of oxazole PES at CCSD(T)/cc-pVTZ level for the one-mode part and MP2/cc-pVTZ for the two-mode part. The three-mode part is extrapolated from the two-mode surface using MP2/cc-pVTZ gradients. The number of terms is 146 for the one mode, 4786 for the two modes and 4335 for the three modes couplings.
The maximal harmonic frequency Freq0Max=20000 associated with MaxQLevel=10, gives the rounded pruning condition