1 Introduction
We consider a generalization of power iteration for finding the leading eigenvector of a matrix $A$. In power iteration, the update rule $x_{k+1} = Ax_k / \|Ax_k\|$ is repeatedly applied until some stopping criterion is satisfied. Since no hyperparameter is required, this update rule is very practical, yet it attains global linear convergence at the rate $|\lambda_2 / \lambda_1|$, where $\lambda_1$ and $\lambda_2$ are the largest and second-largest eigenvalues of $A$ in absolute value.
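As a concrete illustration (our own minimal sketch, not code from the paper), the update rule can be written in pure Python for a small symmetric matrix:

```python
# Minimal pure-Python sketch of power iteration (illustrative; the 2x2
# matrix below is our own example, not from the paper).

def mat_vec(A, x):
    # Matrix-vector product A x.
    return [sum(a * v for a, v in zip(row, x)) for row in A]

def norm(x):
    return sum(v * v for v in x) ** 0.5

def power_iteration(A, x0, n_iter=100):
    x = x0[:]
    for _ in range(n_iter):
        y = mat_vec(A, x)
        x = [v / norm(y) for v in y]  # normalize at every step
    return x

# Eigenvalues 3 and 1; leading eigenvector is (1, 1) / sqrt(2).
A = [[2.0, 1.0], [1.0, 2.0]]
x = power_iteration(A, [1.0, 0.0])
# The error contracts by |lambda_2 / lambda_1| = 1/3 per iteration.
```
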
This convergence result is analogous to that of gradient descent for convex optimization. Therefore, many variants, including coordinate-wise [13], accelerated [23], stochastic [17], stochastic variance-reduced (VR) [19, 20], and stochastic VR heavy ball [10] power iterations, have been developed, drawing a parallel literature to gradient descent for convex optimization. Also, a general form of power iteration has been used to solve (1)
in many applications such as sparse principal component analysis (PCA) [8, 16], $L_1$-norm kernel PCA [9], phase synchronization [15], and the Burer-Monteiro factorization of semidefinite programs [3]. (All norms are 2-norms unless indicated otherwise.) Nevertheless, theoretical understanding of when such an algorithm enjoys the attractive convergence property of power iteration is limited. While convex $f$ is considered in [8], only global sublinear convergence is shown, which does not generalize the appealing linear convergence property of power iteration.

In this work, we introduce a new class of optimization problems called scale invariant problems and show that they can be efficiently solved by scale invariant power iteration (SCIPI) with a convergence guarantee generalizing that of power iteration. Scale invariant problems take scale invariant functions as the objective in (1). We say that $f$ is scale invariant, which is rigorously defined later, if its geometric surface is invariant under constant multiplication of its argument. Many important optimization problems in statistics and machine learning can be formulated as scale invariant problems, for instance, $L_1$-norm kernel PCA and maximum likelihood estimation of mixture proportions, to name a few. Moreover, as studied herein, independent component analysis (ICA), nonnegative matrix factorization (NMF), and Gaussian mixture models (GMM) can be formulated as extended settings of scale invariant problems.
The derivatives of scale invariant functions satisfy a useful proportionality relation (Proposition 2.3). Using the KKT conditions, we derive an eigenvector property stating that any stationary point $x^*$ satisfying the first-order condition $\nabla f(x^*) = \lambda x^*$ for some $\lambda$ is an eigenvector of the Hessian $\nabla^2 f(x^*)$. Due to the eigenvector property, scale invariant problems can be locally seen as the leading eigenvector problem. Therefore, we can expect that a simple update rule like power iteration would efficiently solve scale invariant problems near a local optimum $x^*$. Another interesting property of scale invariant problems is that by swapping the objective function and the constraint, we obtain a geometrically interpretable dual problem whose goal is to find the closest point to the origin in the dual constraint set. By mapping an iterate to the dual space, taking a descent step there, and mapping the result back to the original space, we geometrically derive SCIPI, which replaces $Ax$ with $\nabla f(x)$ in power iteration. We show that SCIPI converges to a local maximum $x^*$ at a linear rate when initialized close to it. The convergence rate is proportional to the ratio of the spectral norm of the Hessian at $x^*$ to the Lagrange multiplier $\lambda^*$ corresponding to $x^*$, generalizing the convergence rate of power iteration. Moreover, under some mild conditions, we provide an explicit initial condition on the distance to $x^*$ that ensures linear convergence.
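To make the connection to power iteration concrete (our own sketch, not code from the paper): for the scale invariant quadratic $f(x) = x^\top A x / 2$ the gradient is $Ax$, so the SCIPI update $x_{k+1} = \nabla f(x_k)/\|\nabla f(x_k)\|$ coincides with power iteration and recovers the leading eigenvector.

```python
# SCIPI update: x <- grad_f(x) / ||grad_f(x)||.  Illustrative example:
# f(x) = x^T A x / 2 satisfies f(cx) = c^2 f(x), its gradient is Ax,
# and SCIPI reduces exactly to power iteration on A.

def norm(x):
    return sum(v * v for v in x) ** 0.5

def scipi(grad_f, x0, n_iter=100):
    x = x0[:]
    for _ in range(n_iter):
        g = grad_f(x)
        x = [v / norm(g) for v in g]  # normalize the gradient
    return x

A = [[2.0, 1.0], [1.0, 2.0]]  # eigenvalues 3 and 1

def grad_quadratic(x):
    return [sum(a * v for a, v in zip(row, x)) for row in A]

x = scipi(grad_quadratic, [1.0, 0.0])
# x is close to the leading eigenvector (1, 1) / sqrt(2).
```
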
In the extended settings, we discuss three variants of scale invariant problems. In the first setting, the objective $f$ is replaced with a sum of scale invariant functions. This setting covers a kurtosis-based ICA and can be solved by SCIPI with similar convergence guarantees. We also consider a block version of scale invariant problems, which covers NMF and the Burer-Monteiro factorization of semidefinite programs. To solve block scale invariant problems, we present a block version of SCIPI and show that it attains linear convergence in a two-block case. Lastly, we consider partially scale invariant problems, which include general mixture problems such as GMM. To solve partially scale invariant problems, we present an alternating algorithm based on SCIPI and the gradient method and prove its local linear convergence. In numerical experiments, we benchmark the proposed algorithms against state-of-the-art methods for KL-NMF, GMM, and ICA. The experimental results show that our algorithms are computationally competitive and produce better solutions in select cases.
Our work has the following contributions.
1. We introduce scale invariant problems, which cover interesting examples in statistics and machine learning yet can be efficiently solved by SCIPI due to the eigenvector property.
2. We present a geometric derivation of SCIPI using a dual reformulation and provide a convergence analysis for it. We show that SCIPI converges to a local maximum $x^*$ at a linear rate when initialized close to it, generalizing the attractive convergence property of power iteration. Moreover, we introduce three extended settings of scale invariant problems together with their convergence analyses.
3. We report numerical experiments, including a novel reformulation of KL-NMF into extended settings of scale invariant problems. The experimental results demonstrate that SCIPI is not only computationally competitive with state-of-the-art methods but also often yields better solutions.
The paper is organized as follows. In Section 2, we define scale invariance and present interesting properties of scale invariant problems including an eigenvector property and a dual formulation. We then provide a geometric derivation of SCIPI and a convergence analysis in Section 3. The extended settings are discussed in Section 4 and we report the numerical experiments in Section 5. All proofs are deferred to Supplementary Material.
2 Scale Invariant Problems
Before presenting properties of scale invariant problems, we first define scale invariant functions.
Definition 2.1.
We say that a function $f$ is multiplicatively scale invariant if it satisfies
$$f(cx) = u(c)\,f(x) \qquad (2)$$
and additively scale invariant if it satisfies
$$f(cx) = f(x) + v(c) \qquad (3)$$
for all $x \neq 0$ and $c \neq 0$, for some even functions $u$ with $u(1) = 1$ and $v$ with $v(1) = 0$.
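The definition can be checked numerically on two textbook examples (both functions below are our own illustrative choices, not the paper's): a quadratic form is multiplicatively scale invariant with $u(c) = c^2$, and a sum of log-moduli of linear functionals is additively scale invariant with $v(c)$ proportional to $\log|c|$.

```python
import math

# Numeric check of Definition 2.1 on illustrative examples (our choices).
# f(x) = x^T A x / 2 satisfies f(cx) = c^2 f(x), i.e., (2) with u(c) = c^2.
# g(x) = sum_i log|a_i^T x| satisfies g(cx) = g(x) + m log|c|, i.e., (3)
# with v(c) = m log|c| for m linear terms.

A = [[2.0, 1.0], [1.0, 2.0]]
a_rows = [[1.0, 2.0], [3.0, -1.0]]

def f(x):
    return 0.5 * sum(x[i] * A[i][j] * x[j] for i in range(2) for j in range(2))

def g(x):
    return sum(math.log(abs(r[0] * x[0] + r[1] * x[1])) for r in a_rows)

x, c = [0.3, -0.7], -2.5
mult_gap = f([c * v for v in x]) - c ** 2 * f(x)
add_gap = g([c * v for v in x]) - (g(x) + len(a_rows) * math.log(abs(c)))
# Both gaps are zero up to floating-point error.
```

Note that $u(c) = c^2$ is even with $u(1) = 1$ and $v(c) = m \log|c|$ is even with $v(1) = 0$, as the definition requires.
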
The following proposition characterizes the exact form of $u$ and $v$ for continuous $f$.
Proposition 2.2.
Next, we establish derivativebased properties of scale invariant functions.
Proposition 2.3.
The proportionality relation satisfied by the gradient of a scale invariant function is presented in Proposition 2.3. Using the first-order optimality conditions, we derive an eigenvector property as follows.
Proposition 2.4.
Proposition 2.4 states that a stationary point $x^*$ of (1) is an eigenvector of the Hessian $\nabla^2 f(x^*)$ and becomes a local maximum if the corresponding Lagrange multiplier is greater than the remaining eigenvalues of $\nabla^2 f(x^*)$. Due to this property, scale invariant problems can be considered a generalization of the leading eigenvector problem. Next, we introduce a dual formulation of scale invariant problems.
Proposition 2.5.
Suppose that a continuous function $f$ is either multiplicatively scale invariant or additively scale invariant, under a sign condition on $f$ or on the additive factor, respectively. Then, solving (1) is equivalent to solving the following optimization problem
(6) 
Note that a dual reformulation for the opposite sign cases can be obtained by flipping the corresponding sign in (6). The dual formulation (6) has a nice geometric interpretation: an optimal solution is the closest point to the origin in the dual constraint set. This understanding is used to derive SCIPI in Section 3.
$L_1$-norm kernel PCA, estimation of mixture proportions, and KL-divergence NMF are all instances of scale invariant problems. The details of these cases are provided in Appendix A.1.
3 Scale Invariant Power Iteration
In this section, we provide a geometric derivation of SCIPI for finding a local optimal solution of (1). The algorithm is developed using the geometric interpretation of the dual formulation (6), as illustrated in Figure 1. Starting with an iterate $x_k$, we obtain a dual iterate $y_k$ by projecting $x_k$ onto the dual constraint set. Given $y_k$, we identify the hyperplane $H$ on which the current iterate lies and which is tangent to the dual constraint set. After identifying the equation of $H$, we find the closest point to the origin on $H$ and obtain a new dual iterate $y_{k+1}$ by projecting this point onto the dual constraint set. Finally, we obtain a new primal iterate $x_{k+1}$ by mapping $y_{k+1}$ back to the unit sphere.

Now, we develop an algorithm based on the above idea. For the derivation, we assume that the objective function $f$ is continuous and satisfies either (2) or (3) with conditions under which a scalar mapping between the primal and dual iterates is well defined. Since $y_k$ lies on the dual constraint set, the normal vector of the tangent hyperplane $H$ is proportional to the gradient of $f$ at the current iterate, so we can write down the equation of $H$ explicitly. The closest point to the origin on $H$ must be a scalar multiple of this normal vector, where the scalar is determined by the requirement that the point lies on $H$. Finally, projecting back to the unit sphere and applying Proposition 2.3, the update simplifies to $x_{k+1} = \nabla f(x_k) / \|\nabla f(x_k)\|$. Summarizing the above, we obtain SCIPI as presented in Algorithm 1.
Next, we provide a convergence analysis of SCIPI. Global sublinear convergence of SCIPI for convex $f$ has been addressed in [8]. We additionally show that SCIPI yields an ascent step even for quasiconvex $f$.
Proposition 3.1.
If $f$ is quasiconvex and differentiable, the sequence of iterates generated by SCIPI satisfies $f(x_{k+1}) \ge f(x_k)$ for all $k \ge 0$.
If $f$ is quasiconvex, the dual constraint set is convex; therefore, from Figure 1, we can expect that SCIPI yields an ascent step. If $f$ is not quasiconvex, $f(x_k)$ is not necessarily increasing, which makes global convergence hard to analyze. Assuming that the initial point is close to a local maximum $x^*$, we study local convergence of SCIPI as follows.
Theorem 3.2.
Let $f$ be a scale invariant, twice continuously differentiable function on an open set containing $x^*$, and let $x^*$ be a local maximum satisfying the first-order condition $\nabla f(x^*) = \lambda^* x^*$ with $\lambda^*$ strictly dominating the absolute values of the remaining eigenvalues of $\nabla^2 f(x^*)$. Then, there exists some $\delta > 0$ such that under the initial condition $\|x_0 - x^*\| \le \delta$, the sequence of iterates generated by SCIPI converges to $x^*$ at a linear rate. Moreover, under an additional smoothness condition on the Hessian near $x^*$, $\delta$ can be written explicitly.
Theorem 3.2 presents a local convergence result for SCIPI whose rate generalizes the convergence rate of power iteration. Note that Theorem 3.2 requires $\lambda^*$ to dominate the remaining eigenvalues of $\nabla^2 f(x^*)$ in absolute value, while it suffices for $\lambda^*$ to exceed the remaining eigenvalues themselves to ensure local optimality. However, by adding a multiple of $\|x\|^2$ to the objective function, which leaves the problem on the unit sphere unchanged, we can always enforce the stronger condition. On the other hand, by choosing the shift appropriately, we may improve the convergence rate by increasing the relative gap between $\lambda^*$ and the spectral norm of the Hessian.
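The shifting trick can be seen numerically (our own toy example, under our reading of the shift): for $f(x) = x^\top A x / 2$, adding a multiple of $\|x\|^2$ turns the SCIPI step into power iteration on $A + \sigma I$, and a well-chosen $\sigma$ widens the relative eigengap.

```python
# Shifting illustration (our own example).  Adding a multiple of ||x||^2
# to f turns the SCIPI step grad_f(x) into grad_f(x) + sigma * x; for
# f(x) = x^T A x / 2 this is power iteration on A + sigma * I.  Below,
# A has eigenvalues 1.0 and -0.9 (relative gap 0.9); with sigma = 1.0 the
# shifted eigenvalues are 2.0 and 0.1 (relative gap 0.05), so the
# non-leading component dies off far faster.

def norm(x):
    return sum(v * v for v in x) ** 0.5

def shifted_step(A, x, sigma):
    g = [sum(a * v for a, v in zip(row, x)) + sigma * xi
         for row, xi in zip(A, x)]
    return [v / norm(g) for v in g]

A = [[1.0, 0.0], [0.0, -0.9]]  # diagonal for clarity

def residual(sigma, n_iter=20):
    x = [0.6, 0.8]
    for _ in range(n_iter):
        x = shifted_step(A, x, sigma)
    return abs(x[1])  # remaining mass on the non-leading eigenvector

slow = residual(0.0)  # contracts by ~0.9 per step
fast = residual(1.0)  # contracts by ~0.05 per step
```
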
4 Extended Settings
4.1 Sum of Scale Invariant Functions
Consider an objective that is a sum of scale invariant functions, i.e., a sum of a multiplicatively scale invariant function and an additively scale invariant function. Note that such a sum is not scale invariant in general. However, by Proposition 2.3, the gradient of the sum inherits the proportionality relation, so a stationary point $x^*$ satisfying the first-order condition is again an eigenvector of the Hessian at $x^*$. We present a local convergence analysis of SCIPI for a sum of scale invariant functions as follows.
Theorem 4.1.
Let $f$ be a sum of scale invariant functions, twice continuously differentiable on an open set containing a local maximum $x^*$ that satisfies the first-order condition with Lagrange multiplier $\lambda^*$ dominating the absolute values of the remaining eigenvalues of $\nabla^2 f(x^*)$. Then, there exists some $\delta > 0$ such that under the initial condition $\|x_0 - x^*\| \le \delta$, the sequence of iterates generated by SCIPI converges to $x^*$ at a linear rate. Moreover, under an additional smoothness condition on the Hessian near $x^*$, $\delta$ can be written explicitly.
4.2 Block Scale Invariant Problems
Next, consider a class of optimization problems of the form $\max f(x, y)$ subject to unit-norm constraints on each block, where $f$ is scale invariant in $x$ for fixed $y$ and vice versa. We derive the following alternating maximization algorithm, called block SCIPI, which applies the SCIPI update to each block in turn:
(7) 
Theorem 4.2.
Suppose that $f$ is twice continuously differentiable on an open set containing a local maximum $(x^*, y^*)$ that satisfies the first-order conditions in both blocks, with the Lagrange multipliers dominating the absolute values of the remaining eigenvalues of the corresponding block Hessians. Then, for the sequence of iterates generated by (7), there exists some $\delta > 0$ such that if the initial point lies within $\delta$ of $(x^*, y^*)$, the iterates converge to $(x^*, y^*)$ at a linear rate.
4.3 Partially Scale Invariant Problems
Lastly, we consider a class of optimization problems of the form $\max f(x, \theta)$ with a unit-norm constraint on $x$, where $f$ is scale invariant in $x$ for each $\theta$. This problem has the form of (1) with respect to $x$ once $\theta$ is fixed. Also, by fixing $x$, we obtain an unconstrained optimization problem with respect to $\theta$. Using SCIPI and the gradient method, an alternating maximization algorithm is derived as
(8) 
While the gradient method is used in (8), any method for unconstrained optimization can replace it.
Theorem 4.3.
Suppose that $f$ is scale invariant in $x$ for each $\theta$, strongly concave in $\theta$ with a Lipschitz continuous gradient for each $x$, and three times continuously differentiable on an open set containing a local maximum $(x^*, \theta^*)$ that satisfies the first-order conditions, with the Lagrange multiplier dominating the absolute values of the remaining eigenvalues of the Hessian in $x$. Then, for the sequence of iterates generated by (8) with a suitable step size, there exists some $\delta > 0$ such that if the initial point lies within $\delta$ of $(x^*, \theta^*)$, the iterates converge to $(x^*, \theta^*)$ at a linear rate.
5 Numerical Experiments
We tested the proposed algorithms on real-world data sets. All experiments were implemented on a standard laptop (2.6 GHz Intel Core i7 processor and 16GB memory) using the Julia programming language. Let us emphasize that scale invariant problems frequently appear in many important applications in statistics and machine learning. We select three important applications: KL-NMF, GMM, and ICA. A description of the data sets is provided in Supplementary Material. All of them are standard sets used in prior works on these three problems.
KL-NMF:
The KL-divergence NMF (KL-NMF) subproblem can be solved via SCIPI (see Supplementary Material A.1). Our focus is to compare this algorithm with other well-known alternating minimization algorithms listed below, which update the two factors alternately. To lighten the notation, we use element-wise product, division, and square on vectors, and let $\mathbf{1}$ denote a vector of ones.

Projected gradient descent (PGD): It iterates a gradient step with an appropriate learning rate, followed by projection onto the simplex [14].

Multiplicative update (MU): A well-known multiplicative update algorithm originally suggested by [12]; it is learning-rate free.

Our method (SCIPI): It iterates the SCIPI update with a shift parameter and rescales the iterate; the shift serves as a simple preconditioner.

Sequential quadratic programming (MIXSQP): It solves each subproblem via the convex solver mixsqp [11]. This algorithm performs sequential nonnegative least squares.
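As a sketch of the MU/EM-style step for the mixture-proportions subproblem (the likelihood matrix below is a made-up example, and the update shown is the standard multiplicative one rather than the paper's exact implementation):

```python
import math

# Multiplicative (EM-style) update for the mixture-proportions subproblem
# max_{x in simplex} sum_i log((Lx)_i).  The likelihood matrix L is a
# made-up example; each step stays on the simplex and does not decrease
# the log-likelihood.

L = [[0.9, 0.1], [0.2, 0.8], [0.5, 0.5], [0.7, 0.3]]
n = len(L)

def log_lik(x):
    return sum(math.log(row[0] * x[0] + row[1] * x[1]) for row in L)

def mu_step(x):
    lx = [row[0] * x[0] + row[1] * x[1] for row in L]  # (Lx)_i
    # x_j <- x_j * (1/n) * sum_i L_ij / (Lx)_i
    return [xj * sum(row[j] / li for row, li in zip(L, lx)) / n
            for j, xj in enumerate(x)]

x = [0.5, 0.5]
lik0 = log_lik(x)
for _ in range(100):
    x = mu_step(x)
```

The update preserves the simplex exactly, since the weighted column sums of $L_{ij}/(Lx)_i$ telescope to $n$.
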
To study the convergence rate for KL-NMF subproblems, we use four simulated data sets exhibited in [11]. We study MU, PGD, and SCIPI since they have the same order of computational complexity per iteration, but omit MIXSQP since it is a second-order method that cannot be directly compared. For PGD, the learning rate is optimized by grid search. The stopping criterion is a relative-error threshold with respect to the solution obtained by MIXSQP after extensive computation time. The average runtimes for the aforementioned three methods are 33, 33, and 30 seconds for 10,000 iterations, respectively. The result is shown in Figure 2. (For each evaluation, we randomly draw 10 initial points and report the averaged relative errors. The initial input for the KL-NMF problem is a one-step MU update of a uniform-random matrix.) The figure shows that SCIPI outperforms the other two methods on all simulated data sets. Also, all methods seem to exhibit linear convergence.
Next, we test the four algorithms on four real-world data sets for three different purposes: 287 waving tree (WT) images for image reconstruction, two bag-of-words data sets from the KOS blog and NIPS full papers for topic modeling, and a Wikipedia (WIKI) vote network for graph clustering. We estimate a fixed number of factors. At each iteration, all four algorithms solve the subproblems simultaneously.
The result is summarized in Figure 3. (In all plots we omit the first few iterations; the initial random solutions have a gap of approximately 50%, which drops to a few percent after 10 iterations, where the plots start.) The convergence plots are based on the average relative errors over 10 repeated runs with random initializations. The result shows that SCIPI is the overall winner, exhibiting faster convergence rates. The stopping criterion is the same as above. To assess overall performance under different initializations, we select KOS and WIKI and run MU, PGD, SCIPI, and MIXSQP 10 times each. The three algorithms other than MIXSQP have approximately the same computational cost per iteration, with runtimes of 391, 396, and 408 seconds on the KOS data and 372, 390, and 418 seconds on the WIKI data for 200 iterations, respectively. MIXSQP has a larger per-iteration cost. After 400 seconds, SCIPI achieves the lowest objective values in all cases but one for each data set (38 out of 40 in total). Thus it clearly outperforms the other methods and also achieves the lowest variance. Unlike the other three algorithms, SCIPI is not an ascent algorithm but an eigenvalue-based fixed-point algorithm. We observe that SCIPI sometimes converges to a better solution due to this fact. Admittedly, this can be potentially dangerous, but for the KL-NMF problem its performance turns out to be stable.
GMM:
GMM fits a mixture of Gaussian distributions to the underlying data. Let $i$ denote the sample index and $k$ the cluster index, and let $\pi$ be the mixture proportion vector. GMM fits into our partially scale invariant setting (Sec. 4.3) after reparametrization, except that the gradient update for the remaining parameters is replaced by the exact coordinate ascent step. The EM and SCIPI updates for $\pi$ can be written respectively as (9).
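For concreteness, one EM pass for the mixture weights on a toy 1-D example (all numbers below are made up); this is the exact coordinate ascent step in $\pi$ referred to above:

```python
import math

# One EM pass for the mixture weights of a two-component 1-D GMM (all
# parameter values and data are made up).  Responsibilities come from the
# current parameters; the exact coordinate ascent step in pi is then the
# average responsibility of each cluster.

data = [-2.1, -1.9, -2.3, 1.8, 2.2, 2.0, 1.7]
mus, sigmas, pi = [-2.0, 2.0], [0.5, 0.5], [0.5, 0.5]

def pdf(x, mu, sigma):
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2 * math.pi))

resp = []
for x in data:
    w = [pi[k] * pdf(x, mus[k], sigmas[k]) for k in range(2)]
    s = w[0] + w[1]
    resp.append([w[0] / s, w[1] / s])

pi_new = [sum(r[k] for r in resp) / len(data) for k in range(2)]
# pi_new is close to [3/7, 4/7]: three points near -2, four near 2.
```
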
We compare SCIPI and EM on different real-world data sets. Both algorithms are initialized from the same standard Gaussian random draws, repeated 10 times. The result is summarized in the left panel of Figure 4. The stopping criterion is a threshold on the relative change of the objective. In some cases, SCIPI achieves much larger objective values even when initialized identically. In many cases the two algorithms exhibit the same performance. This is because estimating the means and covariances is usually harder than estimating $\pi$, and EM and SCIPI have the same updates for the means and covariances. For a few cases EM outperforms SCIPI. Let us mention that SCIPI and EM have the same order of computational complexity and require 591 and 590 seconds of total computation time, respectively.

ICA:
We implement SCIPI on the kurtosis-based ICA problem [6] and compare it with the benchmark algorithm FastICA [5], the most popular algorithm. Given a preprocessed data matrix (centered and whitened so that its covariance is the identity), we seek to maximize an approximated negative entropy subject to a unit-norm constraint, thereby maximizing kurtosis-based non-Gaussianity [7]. This problem fits into the sum of scale invariant functions setting (Sec. 4.1). SCIPI and FastICA each iterate their respective fixed-point updates, both followed by normalization.
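To illustrate the kurtosis-based fixed point, here is a self-contained toy sketch of FastICA's classical one-unit update on synthetic data of our own making (not the paper's experimental setup):

```python
import math, random

# One-unit kurtosis-based FastICA on synthetic whitened data (everything
# below is a made-up illustration).  Two independent uniform sources with
# unit variance are mixed by a rotation, so the mixture is approximately
# white; the classical fixed-point update w <- mean(x * (w^T x)^3) - 3w,
# followed by normalization, recovers a column of the mixing rotation
# up to sign.

random.seed(0)
n = 4000
a = math.sqrt(3.0)  # Uniform(-a, a) has unit variance
s = [(random.uniform(-a, a), random.uniform(-a, a)) for _ in range(n)]
t = math.pi / 6  # arbitrary mixing angle
R = [[math.cos(t), -math.sin(t)], [math.sin(t), math.cos(t)]]
x = [(R[0][0] * s1 + R[0][1] * s2, R[1][0] * s1 + R[1][1] * s2)
     for s1, s2 in s]

w = [1.0, 0.0]
for _ in range(30):
    p3 = [(w[0] * xi + w[1] * yi) ** 3 for xi, yi in x]
    g = [sum(p * xi for p, (xi, _) in zip(p3, x)) / n - 3.0 * w[0],
         sum(p * yi for p, (_, yi) in zip(p3, x)) / n - 3.0 * w[1]]
    nrm = math.hypot(g[0], g[1])
    w = [g[0] / nrm, g[1] / nrm]

# w aligns (up to sign) with one of the columns of the mixing rotation R.
align = max(abs(w[0] * R[0][k] + w[1] * R[1][k]) for k in range(2))
```
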
We compare SCIPI and FastICA on different real-world data sets. In the majority of runs (81 out of 100 in total), SCIPI finds a better solution with a larger objective value, but in a few cases SCIPI converges to a suboptimal point. Both algorithms are fixed-point based and thus have no guarantee of global convergence, but overall SCIPI outperforms FastICA. SCIPI and FastICA have the same order of computational complexity and require 11 and 12 seconds of total computation time, respectively.
6 Final Remarks
In this paper, we proposed a new class of optimization problems called scale invariant problems, together with a generic solver, SCIPI, which is an eigenvalue-based fixed-point iteration. We showed that SCIPI directly generalizes power iteration and enjoys similar properties; for instance, SCIPI has local linear convergence under mild conditions, and its convergence rate is determined by eigenvalues of the Hessian matrix at a solution. We also extended scale invariant problems to more general settings. Although scale invariance is a rather restrictive assumption, we showed by experiments that SCIPI can be a competitive option for important problems such as KL-NMF, GMM, and ICA. Finding more examples and extending SCIPI to more general settings are promising directions for future study.
References
 [1] Christopher M Bishop. Pattern Recognition and Machine Learning. Springer, 2006.
 [2] Sébastien Bubeck. Convex Optimization: Algorithms and Complexity. Foundations and Trends in Machine Learning, 8(34):231–357, 2015.
 [3] Murat A Erdogdu, Asuman Ozdaglar, Pablo A Parrilo, and Nuri Denizcan Vanli. Convergence Rate of Block-Coordinate Maximization Burer-Monteiro Method for Solving Large SDPs. arXiv preprint arXiv:1807.04428, 2018.
 [4] Cédric Févotte and Jérôme Idier. Algorithms for Nonnegative Matrix Factorization with the β-divergence. Neural Computation, 23(9):2421–2456, 2011.

 [5] Aapo Hyvärinen. Fast ICA for Noisy Data using Gaussian Moments. In Proceedings of the 1999 IEEE International Symposium on Circuits and Systems VLSI, volume 5, pages 57–61. IEEE, 1999.
 [6] Aapo Hyvärinen, Juha Karhunen, and Erkki Oja. Independent Component Analysis, volume 46. John Wiley & Sons, 2004.
 [7] Aapo Hyvärinen and Erkki Oja. Independent Component Analysis: Algorithms and Applications. Neural Networks, 13(45):411–430, 2000.
 [8] Michel Journée, Yurii Nesterov, Peter Richtárik, and Rodolphe Sepulchre. Generalized Power Method for Sparse Principal Component Analysis. Journal of Machine Learning Research, 11(Feb):517–553, 2010.
 [9] Cheolmin Kim and Diego Klabjan. A Simple and Fast Algorithm for L1-norm Kernel PCA. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2019.
 [10] Cheolmin Kim and Diego Klabjan. Stochastic Variance-Reduced Heavy Ball Power Iteration. arXiv preprint arXiv:1901.08179, 2019.
 [11] Youngseok Kim, Peter Carbonetto, Matthew Stephens, and Mihai Anitescu. A Fast Algorithm for Maximum Likelihood Estimation of Mixture Proportions Using Sequential Quadratic Programming. arXiv preprint arXiv:1806.01412, 2018.
 [12] Daniel D Lee and H Sebastian Seung. Algorithms for Nonnegative Matrix Factorization. In Advances in Neural Information Processing Systems, pages 556–562, 2001.
 [13] Qi Lei, Kai Zhong, and Inderjit S Dhillon. Coordinate-wise Power Method. In Advances in Neural Information Processing Systems, pages 2064–2072, 2016.
 [14] Chih-Jen Lin. Projected Gradient Methods for Nonnegative Matrix Factorization. Neural Computation, 19(10):2756–2779, 2007.
 [15] Huikang Liu, Man-Chung Yue, and Anthony Man-Cho So. On the Estimation Performance and Convergence Rate of the Generalized Power Method for Phase Synchronization. SIAM Journal on Optimization, 27(4):2426–2446, 2017.
 [16] Ronny Luss and Marc Teboulle. Conditional Gradient Algorithms for Rank-One Matrix Approximations with a Sparsity Constraint. SIAM Review, 55(1):65–98, 2013.

 [17] Erkki Oja. Simplified Neuron Model as a Principal Component Analyzer. Journal of Mathematical Biology, 15(3):267–273, 1982.
 [18] Prasanna K Sahoo and Palaniappan Kannappan. Introduction to Functional Equations. Chapman and Hall/CRC, 2011.
 [19] Ohad Shamir. A Stochastic PCA and SVD Algorithm with an Exponential Convergence Rate. In International Conference on Machine Learning, pages 144–152, 2015.
 [20] Ohad Shamir. Fast Stochastic Algorithms for SVD and PCA: Convergence Properties and Convexity. In International Conference on Machine Learning, pages 248–256, 2016.
 [21] Rong Wang, Feiping Nie, Xiaojun Yang, Feifei Gao, and Minli Yao. Robust 2DPCA With Non-greedy L1-Norm Maximization for Image Analysis. IEEE Transactions on Cybernetics, 45(5):1108–1112, 2015.
 [22] Yu-Xiong Wang and Yu-Jin Zhang. Nonnegative Matrix Factorization: A Comprehensive Review. IEEE Transactions on Knowledge and Data Engineering, 25(6):1336–1353, 2013.

 [23] Peng Xu, Bryan He, Christopher De Sa, Ioannis Mitliagkas, and Chris Re. Accelerated Stochastic Power Iteration. In International Conference on Artificial Intelligence and Statistics, pages 58–67, 2018.
Appendix A Supplementary Material
A.1 Examples
We introduce two immediate applications, $L_1$-norm kernel PCA and the mixture model, which have been intensively studied over the past few decades in statistics and machine learning.
Example A.1 ($L_1$-norm kernel PCA).
Example A.2 (Estimation of Mixture Proportions).
Given a design matrix $L$ with nonnegative entries, the problem of estimating mixture proportions seeks a vector $x$ of mixture proportions on the probability simplex that maximizes the log-likelihood $\sum_i \log((Lx)_i)$. We reformulate the problem by reparametrizing $x_j$ as $y_j^2$ subject to $\|y\| = 1$ and obtain (11), which now satisfies property (3).
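Under our reading of this reformulation (a sketch; the constants are as we infer them from the definitions, not stated explicitly in the text above), the reparametrized objective and its additive scale invariance are:

```latex
% Sketch of the reparametrization in Example A.2 (our reconstruction).
% Original problem: maximize \sum_i \log((Lx)_i) over the probability simplex.
% Substituting x_j = y_j^2 with \|y\| = 1 gives, for L with n rows,
g(y) \;=\; \sum_{i=1}^{n} \log\Big( \textstyle\sum_{j} L_{ij}\, y_j^2 \Big),
\qquad \|y\| = 1 .
% Additive scale invariance: for any c \neq 0,
g(cy) \;=\; \sum_{i=1}^{n} \log\Big( c^2 \textstyle\sum_{j} L_{ij}\, y_j^2 \Big)
      \;=\; g(y) + 2n \log |c| ,
% i.e., property (3) holds with v(c) = 2n \log|c|.
```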
The reformulation idea in Example A.2 implies that any simplex-constrained problem with a scale invariant objective can be reformulated as a scale invariant problem. Example A.2 has a direct application to general mixture models, including the GMM [1]. The same optimization problem also appears in the Kullback-Leibler (KL) divergence NMF problem. In what follows, we show that the KL divergence NMF subproblem is indeed a scale invariant problem.
Many popular algorithms for KL-NMF are based on alternating minimization over the two factors. We consider a subproblem in which one factor is fixed:
(13) 
where the objective decomposes into separate subproblems, one per column. Problem (13) can be reformulated as a scale invariant problem as follows.
Lemma A.4.
The KL-NMF subproblem (13) is equivalent to the following scale invariant problem:
(14) 
with an explicit relationship between the solutions of the two problems.
A.2 Description of Data Sets
For KL divergence nonnegative matrix factorization (Section 5), we used four public real data sets available online (retrieved from https://www.microsoft.com/enus/research/project, https://archive.ics.uci.edu/ml/datasets/bag+of+words, and https://snap.stanford.edu/data/wikiVote.html). Waving Trees (WT) has 287 images. KOS and NIPS are large, sparse matrices used for topic modeling. WIKI is a large binary matrix with values 0 or 1 representing the adjacency matrix of a directed graph.
For GMM (Section 5), we used 10 public real data sets, namely all small/moderate data sets provided by the mlbench package in R. We select data sets for multiclass classification problems and run EM and SCIPI for the given number of classes without class labels. The sample size, the dimension, and the number of classes vary across data sets. If missing data exists, we simply impute it, since our main focus is to solve the optimization problem well.
For ICA, we used nine public data sets from the UCI Machine Learning Repository (https://archive.ics.uci.edu/ml/index.php). The sample size ranges up to 58,000, and the dimension varies across data sets.
A.3 More on Nonnegative Matrix Factorization
We first draw averaged convergence plots for the four real-world data sets in Figure LABEL:fig:nmftotal. For the WT data set, MIXSQP exhibits the best convergence, and the convergence of SCIPI is much faster than that of MU and PGD. For the other three data sets, MIXSQP sometimes converges to suboptimal points, and SCIPI exhibits the fastest convergence.
Next, we design a simple simulation study to evaluate the performance of block SCIPI on KL-NMF problems. To this end, we sample the entries of a data matrix independently from a single zero-inflated Poisson (ZIP) distribution:
$$P(X = k) \;=\; \pi\,\mathbb{1}\{k = 0\} + (1 - \pi)\, e^{-\lambda}\lambda^k / k!, \qquad k = 0, 1, 2, \ldots \qquad (15)$$
where $\pi$ is the proportion of zero inflation and $\lambda$ is the mean parameter of the Poisson distribution. Although this data-generating distribution does not always reflect the empirical distributions of real-world data sets, our focus here is to understand the behavior of SCIPI relative to the other two methods, MU and PGD.
Let $n$ and $m$ be the numbers of rows and columns of the data matrix, $K$ the number of nonnegative factors, and $z$ the number of zero entries. We fix default values of these parameters and change some of them to understand how the algorithms behave in different settings.
Figure 5 summarizes the result. We conclude that SCIPI tends to perform better than MU and PGD when the data matrix is denser ((1,1) vs. (1,3) and (2,1) vs. (2,2) in Figure 5), when the number of factors is larger ((1,1) vs. (2,3) in Figure 5), and when the data is more uniformly distributed ((1,1) vs. (1,2) and (1,1) vs. (2,1) in Figure 5).

A.4 Proofs of Results in Section 2
Proof of Proposition 2.2.
We first consider the multiplicative scale invariant case. Let be a point such that . Then, we have
resulting in
for all . Letting , we have
implying that satisfies the Cauchy functional equation. Since is continuous, so is and thus . Therefore, by [18], we have
(16) 
for all . From the definition of and (16), we have
(17) 
Representing as and using (17), we obtain
If , then since , we have
contradicting the fact that is continuous at . If , we get , which contradicts . Therefore, we must have . From being an even function, we finally have
for .
Consider now the additive scale invariant case. For any , we have
resulting in
for all . Letting , we have
Since is continuous and satisfies the Cauchy functional equation, we have
for all . For , letting , we have
where