# Guaranteed convergence for a class of coupled-cluster methods based on Arponen's extended theory

A wide class of coupled-cluster methods is introduced, based on Arponen's extended coupled-cluster theory. This class of methods is formulated in terms of a coordinate transformation of the cluster operators. The mathematical framework for the error analysis of coupled-cluster methods based on Arponen's bivariational principle is presented, in which the concept of local strong monotonicity of the flipped gradient of the energy is central. A general mathematical result is presented, describing sufficient conditions for coordinate transformations to preserve the local strong monotonicity. The result is applied to the presented class of methods, which include the standard and quadratic coupled-cluster methods, and also Arponen's canonical version of extended coupled-cluster theory. Some numerical experiments are presented, and the use of canonical coordinates for diagnostics is discussed.

## Authors

• 1 publication
• 2 publications
• 1 publication
05/19/2021

### Coupled-Cluster Theory Revisited

We propose a comprehensive mathematical framework for Coupled-Cluster-ty...
06/06/2020

### Structure-preserving numerical methods for stochastic Poisson systems

We propose a class of numerical integration methods for stochastic Poiss...
12/16/2019

### On the Convergence of Numerical Integration as a Finite Matrix Approximation to Multiplication Operator

We study the convergence of a family of numerical integration methods wh...
01/11/2022

### CDNNs: The coupled deep neural networks for coupling of the Stokes and Darcy-Forchheimer problems

In this article, we present an efficient deep learning method called cou...
03/18/2020

### Order theory for discrete gradient methods

We present a subclass of the discrete gradient methods, which are integr...
07/15/2021

### The Completion of Covariance Kernels

We consider the problem of positive-semidefinite continuation: extending...
10/20/2021

### KabOOM: Unsupervised Crash Categorization through Timeseries Fingerprinting

Modern mobile applications include instrumentation that sample internal ...
##### This week in AI

Get the week's most popular data science and artificial intelligence research sent straight to your inbox every Saturday.

## I Introduction

It is with delight that the authors dedicate this work to Professor Jürgen Gauß on the occasion of his sixtieth birthday. In the spirit of his pursuit of scientific rigor, especially the attention to detail in coupled-cluster (CC) theory, we here present a mathematical study of some alternative formulations based on Arponen’s extended CC (ECC) method Arponen1983; Arponen1987.

Our perspective and our assumptions are natural for the electronic structure problem of molecules in the Born–Oppenheimer approximation, but should also be useful in a more general setting. The collection of methods is defined by substitution of the dual exponential by a Taylor polynomial of fixed degree in the exact ECC energy functional in canonical (C) and non-canonical (NC) coordinates introduced by Arponen (see Eqs. (2) and (13)). This can be viewed as a coordinate transformation, and leads to two hierarchies of models, the NC-ECC class using non-canonical coordinates, and the C-ECC class using canonical coordinates (see Eq. (18)). Our mathematical results imply that when the cluster operators in the energy functionals are not truncated, all these models are exact and equivalent to the Schrödinger equation. Moreover, Galerkin approximations (i.e., generic truncation schemes that can approach the untruncated limit) will converge under certain relatively mild single-reference type conditions.

The various forms of CC methods are today among the most widely used for wavefunction-based calculations on manybody systems. The main idea stems from Hubbard’s exponential parameterization of the wavefunction based on cluster operators in manybody perturbation theory Hubbard1957, which was taken as starting point for ab initio treatments by Coester and Kümmel for nuclear structure calculations in the 1950s Coester1958; Coester1960. The modern form of standard CC theory was developed by, among others, Sinanoğlu, Paldus and Cizek in the 1960s Paldus2005 and the CC method with singles, doubles and perturbative triples [CCSD(T)] today constitutes “the gold standard of quantum chemistry” due to its excellent balance between computational cost and accuracy Bartlett2007. In nuclear structure calculations the same method has gained traction in the last decade, providing excellent predictive power for light to medium nuclei Hagen2014. Coupled-cluster theory has also been applied to superconductivity Emrich1984, lattice gauge theory McKellar2000, and systems of trapped bosons such as Bose–Einstein condensates Cederbaum2006. These examples and the cited works are by no means exhaustive, but serve to illustrate the flexibility of the CC formalism.

In the early 1980s, Arponen introduced a novel concept into CC theory, namely the bivariational principle Arponen1982; Arponen1983, resulting in the ECC method Arponen1983; Arponen1987; Arponen1987b, and an interpretation of standard CC theory and ECC theory as variational methods in a more general sense, i.e., they are bivariational. Today, the view of the CC energy functional as a Lagrangian is standard in quantum chemistry Helgaker1988.

However, the ECC method has seen little use in chemistry due to its immense complexity, even for truncated versions. In physics, on the other hand, the ECC model has advantages over standard CC theory that can make it very useful. To illustrate, the ECC method correctly describes symmetry breaking in the Lipkin–Meshkov–Glick quasispin model of collective monopole vibrations in nuclei Arponen1982; Bishop1991, in contrast to the standard CC method, which cannot. For the electronic-structure problem in quantum chemistry, the standard CC model fails dramatically to reproduce dissociation curves of even simple dimers like , while the ECC method performs quite well Cooper2010; Evangelista2011, as do the quadratic CC doubles model introduced by Van Voorhis and Head–Gordon VanVorhiis2000a; Byrd2002. The latter approach has asymptotic cost similar to standard CC with singles and doubles (CCSD). Thus, we conclude that the ECC method is still worthwhile to study, and approximate forms such as studied in this article, may still prove to be useful.

The non-canonical and canonical hierarchies (N)C-ECC() introduced in this article turn out to be equivalent, and give identical predictions, when truncated with an excitation-rank complete scheme. On the other hand, the working equations are different and in fact cheaper in the canonical case, albeit marginally. An example is the NC-ECC(1)SD method, i.e., the standard CCSD approach, and the C-ECC(1)SD method, which are equivalent. We also raise the question about diagnostics for practical calculations, and show some numerical evidence that diagnostics can favorably be done using canonical coordinates, even if the computations are done in the usual manner using noncanonical variables.

Another well-known special case is NC-ECC(2)D, the quadratic coupled-cluster (QCC) method VanVorhiis2000a; Byrd2002, which is also equivalent in the canonical and non-canonical versions. Furthermore, the perfect-pairing (PP) hierarchy Lehtola2016 of amplitude truncation schemes can be applied to our methods. The PP hierarchy are approximations to the complete-active space self-consistent field (CASSCF) method, including only a tiny subset of even-rank amplitudes combined with orbital-optimization, the latter which we disregard here. The corresponding canonical and non-canonical formulations (N)C-ECC()PPH are inequivalent. The versions could also be interesting in their own right, as investigated by Byrd and coworkers in the case of QCC Byrd2002.

The remainder of the article is organized as follows: In Section II, we introduce the bivariational principle and the mathematical setting of local analysis of CC methods. The key concept of our analysis is the notion of local strong monotonicity of the flipped gradient of a smooth bivariational energy functional (see Eq. (7)). The usefulness of this property is presented in Theorem 1

, where local uniqueness and quadratic error estimates are established in a very general setting using Zarantonello’s Theorem from nonlinear monotone operator theory

Zarantonello1960; Zeidler1990. Next, Theorem 2 summarizes the main results of Ref. Laestadius2018, where strong montonicity is proven for the non-canonical ECC method. For a recent review on monotonicity in CC theory we refer to Laestadius2019, where this property is linked to spectral gaps of the systems under study. Section III presents the idea of monotonicity-preserving coordinate transformations. Our main result, Theorem 3, is a change-of-coordinates result. When combined with Theorems 1 an 2, the analysis of (N)C-ECC() follows in Corollary 4. Our tools rely heavily on the functional analytic formulation of cluster operators and the Schrödinger equation developed by Rohwedder and Schneider Schneider2009; Rohwedder2013; Rohwedder2013b. In Section IV, we perform some numerical experiments to elucidate some aspects of the (N)C-ECC() hierarchies, before we finish with some concluding remarks in Section V.

## Ii The non-canonical extended coupled-cluster model

### ii.1 Bivariational principle

The starting point is a generalization of the Rayleigh–Ritz variational principle to operators that are not necessarily self-adjoint (Hermitian in the finite-dimensional case). For simplicity, we assume a real Hilbert space . Given a system Hamiltonian , where is dense, we define a bivariate Rayleigh quotient, ,

 (1)

Requiring the functional to be stationary at with respect to arbitrary variations in the two wavefunctions leads to the conditions , and , with

, i.e., the right and left eigenvalue problem for

. If

is self-adjoint, the eigenfunctions are identical up to normalization. The introduction of two independent wavefunctions therefore might seem to complicate matters. However, the bivariate Rayleigh quotient

allows distinct approximations of and , introducing more flexibility for approximate schemes. Moreover, the state defined is a (non-Hermitian) density operator, which is unique,

When determined variationally, the Hellmann–Feynman theorem Feynman1939 gives well-defined physical predictions in terms of .

As is common in analysis of partial differential equations

ReedSimonI; Schmuedgen2012, we pass to a weak formulation, which in this case is equivalent to the strong formulation outlined above. Under the assumption that is below bounded, we can introduce a unique extension (dual space), where is a dense subspace, a Hilbert space with norm , continuously embedded in . It follows that is continuously embedded in , and we have a scale of spaces with dense embeddings, . The operator is bounded (i.e., continuous), and satisfies a Gårding estimate, i.e., for some and some ,

for all . For the electronic-structure problem can be taken to be the space of functions with finite kinetic energy.

If is finite-dimensional, we can set , simplifying matters a lot, and the reader may if she or he wishes stick to this picture for simplicity, where all operators are basically matrices. In the infinite dimensional case, however, is typically unbounded as an operator over , and the above construction is necessary.

Under the above stated conditions, is a (Fréchet) smooth map away from the singularity , and differentiation and Taylor series exist and converge locally, allowing a certain degree of intuition to be borrowed from the finite-dimensional case. The right and left Schrödinger equations are then and , respectively. This is the bivariational principle.

### ii.2 Exponential ansatz and the ECC method

The standard CC method is formulated relative to a fixed reference on determinantal form, and by introducing a cluster operator with containing all excitations of rank , i.e., of fermions relative to , we have the exact parameterization

 ψ=eTϕ0,

assuming intermediate normalization, .

Since all excitations commute, the cluster operators form a commutative Banach algebra under suitable conditions which we now describe Schneider2009; Rohwedder2013. We expand the cluster operators using amplitudes and basis operators, i.e., , where excites a number of fermions in the reference into the virtual space, i.e.,

 Xμ=c†a1ci1⋯c†ancin,

where the are among the occupied orbitals of , and among the unoccupied orbitals. The set is the generic set of amplitude indices. We introduce a Hilbert space with norm , which becomes a useful space for formulating abstract CC theory. Fundamental results include that any is a bounded operator on , such that, e.g., also is a bounded operator. Moreover, is also a bounded operator, which means that we can make sense of, e.g., , and that we can represent any intermediately normalized as with unique. Finally, all the elements of the algebra are nilpotent. The Banach algebra structure on allows CC theory to be rigorously formulated in the full, infinite-dimensional case. This was the approach taken in Ref. Laestadius2018 for a first analysis of NC-ECC theory.

Again, the finite-dimensional case may by kept in mind: In this case, cluster amplitudes are simply finite-dimensional vectors, and the existence of the exponential parameterization is a trivial result. There is no need to introduce the norm

, instead the Euclidean norm on the amplitudes may be used.

Any normalized according to can be represented by introducing a second cluster operator , viz.,

 ~ψ=e−T†eΛϕ0.

Plugging into the bivariate Rayleigh quotient, we obtain the energy functional of the non-canonical ECC method, given by

 (2)

This map is everywhere smooth, and its critical points are equivalent to the Schrödinger equation and its dual: Under the assumption that the eigenfunctions can be normalized according to , and solve the Schrödinger equation and its dual if and only if

 ∂ENC-ECC(T∗,Λ∗)∂Λ=0and∂ENC-ECC(T∗,Λ∗)∂T=0. (3)

Assuming that the eigenvalue is nondegenerate, is easily seen to be locally unique.

### ii.3 Truncations and monotonicity analysis

The NC-ECC energy is just one out of many possible parameterizations of the exact bivariate Rayleigh quotient . In this section, we take a more abstract approach and consider a general energy functional , obtained by some exact parameterization of by means of the space , i.e., by a pair of cluster operators . We will discuss several such functionals in Sec. III, obtained from the NC-ECC functional by coordinate transformations.

Only in rare cases can the amplitude equations (3) be solved exactly. Introduce therefore a discretized space of finite dimension by truncating the amplitude index set , that is, if and only if

 Td=∑μ∈Idτd,μXμ∈Vd. (4)

The set is typically defined by the restriction of the excitations to a finite virtual space (a finite basis), and to a finite excitation rank. In the chemistry literature, the excitation hierarchy for a given basis is traditionally denoted singles (S), doubles (D), and so on. In the ECC literature, one typically speaks of the SUB approximation, with being the maximum rank.

When the discrete space is established, we define a discrete solution by the stationary conditions of the restricted energy function . The stationary equations take the form

 ∂E(Td∗,Λd∗)∂λμ=∂E(Td∗,Λd∗)∂τμ=0, (5)

for all .

It is not necessary to use the traditional truncation scheme outlined here; any increasing sequence of subspaces , with a parameter, that can approximate elements in arbitrarily well by increasing can be used. We let be the distance from to measured with respect to the norm of . Consequently, for all we have as . Such a sequence of spaces is referred to as a Galerkin sequence. Other options than the traditional truncation schemes are explicitly correlated methods Hattig2012 and complete-active space methods Adamowicz2000; Kowalski2018; Lehtola2016 such as the PP hierarchy.

An often overlooked point in the physics literature is the fact that convergence of the equations does not in general imply convergence of their solutions. An important question is therefore whether the discrete critical points converge to the exact critical points as . This would imply that the energy converges too, and in a quadratic manner due to the critical point formulation.

Monotonicity is an important notion in connection to the local analysis of the CC method and its variations Schneider2009; Rohwedder2013; Rohwedder2013b; Laestadius2018; faulstich2018; Laestadius2019. The use of montonicity in the analysis of the standard CC method was introduced by Schneider and Rohwedder Schneider2009; Rohwedder2013b. It allows the establishment of locally unique solutions of the Galerkin problem and is therefore important for the motivation of numerical implementations. As such it is a fundamental result of the CC method’s practical usage in quantum chemistry. It also connects spectral gaps, e.g., HOMO-LUMO gap, to stability constants within the analysis Laestadius2019. (See also the steerable CAS-ext gap connected to the tailored CC method Kinoshita2005 that treats quasi-degenerate systems faulstich2019numerical.)

The particular monotonicity property that is key for this presentation is as follows: A function , is locally strongly monotone at if there is an open ball containing , such that for all , we have

 (6)

for some constant . (Here, the bracket is the dual pairing of and .

Furthermore, we need the concept of Lipschitz continuity: is locally Lipschitz with constant if

 ∥F(Z1)−F(Z2)∥≤L∥Z1−Z2∥.

In particular, any (Fréchet) smooth function is locally Lipschitz continuous, and so are all its derivatives.

The map that we will study is the flipped gradient of the general energy functional , defined as

 F(T,Λ)=(∂ΛE(T,Λ),∂TE(T,Λ)), (7)

or more compactly , with being the map that exchanges the partial derivatives. The motivation is as follows: If we consider the bivariate Rayleigh quotient, is not locally strongly monotone, as its critical points are saddle points. On the other hand, the flipped gradient can be seen to be locally strongly monotone near the ground state, given that this ground state is non-degenerate with a finite spectral gap to the remaining spectrum. It is natural to expect that one can find conditions such that the flipped gradient of the energy when expressed in new coordinates is locally strongly monotone.

The following is a central result, combining a result due to Zarantonello Zarantonello1960; Zeidler1990 (points 1. and 2.), adapted to the present notation and setting, and applied to the flipped gradient of an energy functional (point 3.).

###### Theorem 1.

Let be a map, and let be an open ball containing a such that .

Let be a Galerkin sequence of subspaces with being the orthogonal projector onto . Furthermore, let be the Galerkin discretization of , i.e., .

Assume that is locally strongly monotone with constant and Lipschitz continuous with constant on . Then, the following holds:

1. is the only root in .

2. There is a sufficiently large , such that for any , there exists such that . This root is unique in and we have the following error estimate (quasi-optimality of the discrete solution):

 ∥Z∗d−Z∗∥≤Lηdist(Z∗,Vd⊕Vd). (8)

Let , be a (Fréchet) smooth energy functional. Let be the flipping map as introduced after Eq. (7) and set , and .

1. For , the discrete Galerkin equations have locally unique solutions, and in addition to the error estimate (8), we have the energy error

 |E(Z∗d)−E∗| ≤C∥Z∗d−Z∗∥2 (9)

We will only present a partial proof of the theorem, as the proofs of points 1. and 2. are standard, and can be found in, e.g., Ref. Zeidler1990. Before we turn to the (short) proof of 3., we note that Brouwer’s fixed point theorem Zeidler1986 can be used to obtain a sufficient condition for the constant , where quadratic convergence sets in. In particular, if the Lipschitz and strong monotonicity constants are comparable, we here note that can be taken roughly such that implies

 κ(d)=dist(Z∗,Vd⊕Vd)<δηη+L≈δ2, (10)

where is the radius of the ball . It should be noted, that this radius is unknown in general.

###### Proof of 3..

For point 3., we note that is locally Lipschitz continuous as a consequence of being smooth, which together with strong monotonicity makes points 1. and 2. applicable. The remaining argument follows Laestadius2018 closely (where the case was treated). First, by assumption of , and are equivalent to and , respectively (note that commutes with ). Now, Taylor expanding around and evaluating at gives

 E(Z∗d)−E∗=12⟨Z,∂2E(Z∗)Z⟩+O(∥Z∥3).

By the smoothness of , there exists a constant such that

 ⟨Z,∂2E(Z∗)Z⟩≤C′∥Z∥2.

Further, the fact that on we can control the higher order terms by the quadratic one, we have

 |E(Z∗d)−E∗|≤C∥Z∗d−Z∗∥2.

Using Eq. (8) gives the full statement in Eq. (9) ∎

The error estimate (9) shows that for (smooth) energy functionals with a locally strongly monotone flipped gradient, the bivariational method of discretization behaves very similar to the usual Rayleigh–Ritz variational method of discretization. As we enlarge the Galerkin space, the discrete ground state converges, and the energy error is quadratic in the error of the state. However, we cannot guarantee convergence from above, but this is much less important than actually having a quadratic error.

The following summarizes the main results of Ref. Laestadius2018, where the proof and more details can be found:

###### Theorem 2 (NC-ECC monotonicity).

Assume that the system Hamiltonian is self-adjoint, and that the ground-state of exists, is non-degenerate, and that there is a spectral gap between the ground-state energy and the rest of the spectrum. Assume that the reference is such that it is not orthogonal to the ground-state wavefunction. Let be the corresponding critical point of , and assume that and are not too large, i.e., that is a sufficiently good approximation to . Then, is locally strongly monotone near with a constant , for some .

## Iii Monotonicity-preserving coordinate transformations

### iii.1 A class of exact coupled-cluster models

In addition to the non-canonical ECC parameterization, Arponen also considered a second parameterization of the bra and ket wavefunctions, which gives equations of motion for the time-dependent Schrödinger equation that are canonical in the sense of Hamiltonian mechanics Arponen1983; Arponen1987. (This must not be confused with the use of canonical Hartree–Fock orbitals, which is unrelated.) This parameterization is given in terms of a coordinate transformation as

 (T,Λ)=θC-ECC(T′,Λ′), (11)

where and where defined by

 (12)

which has inverse . (In Arponen’s work Arponen1987, the notation is used.) The map is smooth and invertible with a smooth inverse, and we therefore obtain a new exact energy functional

 EC-ECC=ENC-ECC∘θC-ECC% ,

with values

 (13)

A remarkable consequence of this second parameterization is that it corresponds to retaining, in the perturbation series for the ground-state energy in terms of , only those terms that can be represented by “doubly linked” diagrams Arponen1983; Arponen1987,

 (14)

This should be compared with Eq. (2). The phrase “doubly linked” means that every power of is connected to two operators, unless it is connected directly to

. Thus, the canonical coordinates represent a more compact representation in that the resulting tensor contractions or diagrams in the energy are

identical to those obtained in the NC-ECC energy (2), except for some diagrams that are explicitly eliminated.

Similarly, for the standard CC method, Arponen introduced the coordinate transformation given by

 (T,Λ)=θCC(T′,Λ′)=(T′,eΛ′−1). (15)

We obtain the energy functional , where

 (16)

that is, the standard CC Lagrangian Helgaker1988. Incidentally, the standard CC coordinates are also canonical.

The map can be generalized to Taylor polynomials. By setting

 eΛ=(eΛ′)n≡1+Λ′+12(Λ′)2+⋯+1n!(Λ′)n,

we can solve for in terms of by, e.g., considering first the singles, then doubles, etc., giving a smooth map such that . In fact, since the cluster operators are nilpotent, , where the logarithm is expanded in a (finite) Taylor series around the identity. Similarly, we can solve for in terms of , demonstrating that this map has an inverse, and in fact that this inverse is smooth. We obtain a coordinate transformation given by

 (T,Λ)=θn(T′,Λ′)=(T′,Gn(Λ′)), (17)

and the corresponding energy functional

 (18a)

Coordinate transformations form a group, and may thus be composed. By combining , we obtain an energy functional

 (18b)

Since, in the NC-ECC energy functional (2), an exponential can be inserted after without changing the result, both of these hierarchies correspond to truncations of a Baker–Campbell–Hausdorff expansion at order , and are thus manifestly extensive.

### iii.2 Coordinate transformation theorem

Equations (18a) and (18b) represent two hierarchies of exact parameterizations of the bivariate Rayleigh quotient. It is therefore of interest to determine whether they have locally strongly monotone flipped gradients. To establish this, we study the effect on local strong monotonicty of a coordinate transformation.

###### Theorem 3.

Let be a smooth energy functional, let be a critical point, and assume that is locally strongly monotone near with constant . Let a smooth with a smooth inverse be a given coordinate transformation, and let be the energy functional expressed in the new coordinates. Let be the corresponding critical point for , and let be its flipped gradient. Let be the Jacobian at . Then we have the following conclusions:

1. If , then is locally strongly monotone near with constant .

2. In the noncommuting case, if is sufficiently small, is locally strongly monotone near with constant

 η′=η∥M−1∗∥−2−C(I+∥m∗∥)∥m∗∥,

where is the constant from Theorem 1(3).

###### Proof.

With the flipped gradient and , we have

In the sequel, we omit the specification of the spaces in the dual pairing.

Let be such that , i.e., . Since is smooth, local strong monotonicity of is equivalent to (a bounded linear operator) being coercive, i.e., there exists an such that

(The constant in Eq. (6) approaches as the ball in the definition of local strong monotonicity approaches a point.) To see this, we find an expression for in terms of the energy map,

 (19)

Here is the directional derivative in the direction of such that

where . By choosing small enough, Eq. (19) and strong monotonicity gives the coercivity claim. The logical implication also goes in the reverse direction. (This will be used below.)

Recall that and that denotes the flipped gradient of . We use that is locally strongly monotone at if and only if is coercive, i.e.,

for some and all

. A straightforward application of the chain rule now gives

 Δθ(X) M∗ =∂θ(W∗)∈B(V⊕V,V⊕V).

We note that this is almost . Indeed,

 (20)

In particular, if then the last term vanishes in the utmost right-hand side of Eq. (20), and we obtain monotonicity of but with a modified constant.

In the case where , we write , and note that . We obtain,

 Δθ(X)≥η∥M∗X∥2−∥∂2E(Z∗)∥∥M∗∥∥m∗∥∥X∥2≥[η∥M−1∗∥−2−C∥(1+∥m∗∥)∥m∗∥]∥X∥2. (21)

Here, we used that has a smooth inverse, implying , and that . ∎

### iii.3 Monotonicity of (N)C-ECC(n) models

We apply Theorem 3 to the maps and that define the NC-ECC and C-ECC models, respectively. The conclusion is as follows:

###### Corollary 4.

For any of the NC-ECC or C-ECC models, the assumption that the ground-state critical point is not too large is sufficient to guarantee local strong monotonicity of the flipped gradient of the energy, and hence a quasi-optimal solution to the Galerkin problem and a quadratic error estimate for the energy.

###### Proof.

We consider the Jacobian of the coordinate map, which on block form reads

 ∂θ(T′,Λ′)=⎛⎝∂T∂T′∂T∂Λ′∂Λ∂T′∂Λ∂Λ′⎞⎠. (22)

For the map (see Eq. (17)), we first observe that by definition,

 eΛ=eΛ′+O(∥Λ′∥n+1),

from which it follows, by taking the logarithm and expanding the logarithm around , which is a finite Taylor series,

 Λ=Λ′+O(∥Λ′∥n+1).

We obtain

 ∂θn(T′,Λ′)=(I00I+O(∥Λ′∥n+1)). (23)

For the map we have, using the chain rule,

 ∂θC-ECC(T′,Λ′)=(A(Λ′)∂Λ′A(Λ′)T0I+O(∥Λ′∥n+1)). (24)

Here,

is the linear transformation on

such that (see Eq. (12)), i.e., can be expressed in terms of the matrix representation of ,

 (25)

We have , and . For both maps, the Jacobian of the coordinate transformation at the critical point becomes with . Applying Theorem 3(2), the local strong monotonicity follows, and by Theorem 1, quasi-optimality of the truncated solutions and a quadratic error estimate. ∎

We note that our estimates are probably

pessimistic for some of the models covered here. The analysis starts with a given monotonicity constant for the NC-ECC scheme, and consistently produces an for the method obtained using the coordinate change, worsening the error estimates. However, it may well be that a direct analysis of the secondary method yields a better . However, the important point here is that Theorem 3 does guarantee that the new method is convergent under some reasonable conditions. For example, we have now proven that quadratic coupled-cluster (QCC) theory VanVorhiis2000a is convergent if the reference is a sufficiently good approximation to the ground state, and using Eq. (10) also a basic means to study which truncations or Galerkin schemes can be reasonable, at least in principle. It may be interesting to see whether truncation schemes like the PP hierarchy with orbital optimization, also in a quadratic or higher formulation Byrd2002, can be further analyzed based on our results.

In the proof of Theorem 3, it arises naturally that the most favorable coordinate transformations are those that commute with the flipping map , since local strong monotonicity then follows with no assumptions on the Jacobian of the map. The Jacobian commutes with if and only if

 ∂T∂T′=∂Λ∂Λ′,and∂T∂Λ′=∂Λ∂T′, (26)

and one must assume that this holds at every point in , as one does not know a priori where the critical point is. It is not clear what such transformations in general look like, and whether such transformations are useful reparameterizations of the energy.

### iii.4 Properties of the canonical and non-canonical schemes

The coordinate transformation as represented by Eq. (25) is such that when applied to a cluster operator of rank , it generates terms with . The same is true for the inverse map. Thus, if is excitation-rank complete, i.e., it contains all excitations of rank up to and including some , then the Galerkin discretization of the NC-ECC() method commutes with changing coordinates via the coordinate map ,

 EC-ECC(n),d=ENC-ECC(n),d∘θC-ECC.

By inpection, one can also see that this holds for a doubles-only truncation, since in this case. We obtain the following result:

###### Theorem 5.

Let be excitation-rank complete or consist of doubles excitations only. Then, the discrete solutions and of the NC-ECC() and C-ECC() methods, respectively, are equivalent and related via , i.e., and .

We stress that, if is not excitation-rank complete, the canonical and non-canonical parameterizations are not equivalent. This would be the case for the PP hierarchy of truncations Byrd2002; Lehtola2016.

According to the doubly linked structure of the energy functional , see the discussion after Eq. (14), the amplitude equations for the canonical case are cheaper, albeit by a small amount. Moreover, it is reasonable to expect that the canonical solution is more compact compared to . We investigate this claim numerically in Section IV.

## Iv Numerical Results

### iv.1 Implementation

The (un-)truncated (N)C-ECC()SD equations (18) and (5), together with the untruncated coordinate transformation Eq. (25) have been implemented in a local full CI-based program, i.e., all intermediates are expressed as vectors in the full CI basis. To this end, the C-ECC() amplitudes are computed using transformed NC-ECC() residual expressions Evangelista2011. The C-ECC() amplitude equations are thus:

 0=∑ν∈I⟨ϕν,e−(Λ′)†ϕμ⟩⟨ϕ0,(e(Λ′)†)n[HS,Xν]ϕ0⟩, (27a) 0=⟨ϕμ,(e(Λ′)†)n−1HSϕ0⟩−∑ν∈I⟨ϕμ,X†νe−(Λ′)†T′ϕ0⟩⟨ϕ0,(e(Λ′)†)n[HS,Xν]ϕ0⟩, (27b)

where . The sparsity of the coordinate transformation Eq. (25) has been exploited throughout, e.g., only singles amplitudes are transformed in the case where contains only singles and doubles. The coupled amplitude equations (27) are solved iteratively starting from an MP2-guess, using an alternating scheme and applying DIIS convergence acceleration. In all computations, residuals and energies were converged to a threshold of and a.u., respectively. The (N)C-ECC()SD and (N)C-ECC()SD implementations are verified by reproducing the “CCSD” and “ECCSD” energies presented in Evangelista2011.

### iv.2 Numerical Experiments

The (N)C-ECC()SD and (N)C-ECC()DT models have been studied numerically by investigating the potential energy curves of the hydrogen fluoride molecule with intermolecular distances in a DZV basis set Evangelista2011 as well as the H model system with structural parameters in a MBS basis set Jankowski1985; Adamowicz2000. For large distances and small , respectively, the systems comprise multireference character, i.e., the weight of the Hartree–Fock configuration in the full CI wave function, , is fairly small. Thus, these species are good candidates to study novel quantum chemical methods.

The energy curves of the canonical models C-ECC()SD are identical to the non-canonical NC-ECC()SD ones and are thus not presented here. However, the results differ if excitation-rank incomplete truncation schemes are employed, e.g., when orbital optimization is considered. For instance, in canonical ECC()DT, singles amplitudes are effectively generated from doubles and triples amplitudes, while these are absent in the non-canonical model. This effect has been studied on the potential curve of the H–F molecule and is depicted in Fig. 1: The generation of singles amplitudes entails that the canonical computation is lower in energy, in particular towards the multireference region where these contribute significantly to the wave function expansion. This depends, however, on the role of the singles amplitudes in the wave function: In test computations on the H model system a different trend was observed, consistent with the diminished importance of singles in the wave function (vide infra). Therefore, we cannot conclude that the canonical coordinates are consistently better when unconventional truncations are used.

In order to investigate the effect of using different coordinates in ECC()SD computations, we calculated a set of CC diagnostics which are often used to assess the quality of CC computations Lee1989

. These are based on the largest singular value (

) and Frobenius norm () of the matrix representation of the singles amplitudes. (Equivalently, , the sum of the squares of the singles amplitudes, with the number of correlated electrons.) Though diagnostics based on doubles amplitudes are preferred, they are not as available in implementations as are the singles-based variants Jiang2012. Additionally, we computed the diagnostic which involves all the amplitudes. This choice can be motivated from monotonicity arguments and will be discussed in a forthcoming paper.

The values have been computed for truncation schemes . Since the values are very similar, only the data for is presented. Fig. 2 shows the diagnostics correlated with the multireference character for the H–F potential curve, in Fig. 3 values for the H model are shown. In H, electron correlation is dominated by doubles amplitudes, as can be seen from the small values of the singles based diagnostics. Since the reparameterization of the amplitudes in the canonical model does not affect the amplitudes of highest excitation rank, the difference between the NC-ECC(1)SD and C-ECC(1)SD amplitude vectors is negligible. This is different for the H–F case. Here, the amplitude norms of the canonical models are consistently smaller than the non-canonical variants, indicating that the wave function parameterization is more compact.

Our numerical experiments suggest, that for excitation-rank incomplete models, the canonical map generates effectively an excitation-rank complete parameterization, but does not necessarily yield significantly better results. Concerning the a priori excitation-rank complete models, it has been found that the canonical parameterization can be more compact compared to the non-canonical one, a desired property for post-Hartree–Fock methods.

## V Concluding remarks

In this article, we formulated basic error estimates for a class of exact models, defined in terms of replacing, in Arponen’s ECC method, the exact exponential of the dual cluster operator with a finite-order Taylor polynomial, the canonical C-NCC() models and the non-canonical NC-ECC() models. The central result was a coordinate-transformation theorem, Theorem 3, that gives error estimates for any method that can be described as a coordinate transformation of ECC theory. Notably, these results guarantee asymptotically quadratic error estimates for the ground-state energy of all models, under certain mild conditions.

Apart from Theorem 3, a basically self-contained mathematical framework for local error analysis of coupled-cluster methods was presented. This was based on Arponen’s bivariational principle and basic results from nonlinear monotone operator theory, i.e., Zarantonello’s theorem. Also central was our prior analysis of Arponen’s extended coupled-cluster method in its noncanonical formulation.

The methods covered by our analysis include standard CC theory, quadratic CC theory VanVorhiis2000a; Byrd2002, the perfect-pairing hierarchy Lehtola2016 for approximating CASSCF, also in its quadratic version Byrd2002, and, Arponen’s canonical ECC method.

The error estimates are not optimal for many methods. A direct analysis of canonical ECC would probably provide the most optimistic analysis for all the (N)C-ECC() methods, due to the doubly linked structure and the equivalence of excitation-rank complete Galerkin discretizations.

Finally, we performed some simple numerical experiments, focusing on the possibility of using canonical coordinates in place of the usual CC amplitudes when doing diagnostic estimates on CC calculations on systems with multireference character. Our preliminary findings support the hypothesis that the canonical coordinates are more compact compared to the usual coordinates, providing more accurate diagnostics.

An interesting extension of the present work would be to study truncations where singles-amplitudes are replaced by orbital rotations, either unitary or biorthogonal, as in the QCC and PP approaches, or the non-orthogonal orbital optimized coupled-cluster method of Pedersen and coworkers. Pedersen2001. Moreover, the complete-active space coupled-cluster method by Adamowicz and coworkers Adamowicz2000 fits the present scheme. It is also known that quadratic CC and ECC in general are quite good at reproducing multireference character, while standard single-reference CC is quite poor at this. Thus, a modified analysis of the ECC method that includes multireference assumptions, such as the steerable CAS-ext gap of Ref. faulstich2018, could potentially lead to a deeper understanding of how CC methods generally behave in the presence of static correlation.

###### Acknowledgements.
This work has received funding from the Research Council of Norway (RCN) under CoE Grant Nos. 287906 and 262695 (Hylleraas Centre for Quantum Molecular Sciences), and from ERC-STG-2014 under grant No. 639508. The authors are thankful to M. A. Csirik for useful comments.