Arimoto proposed a sequential algorithm for calculating the channel capacity of a discrete memoryless channel. Based on the Bayes probability, the algorithm is given by an alternating minimization between the input probabilities and the reverse channel matrices. For an arbitrary channel matrix, the convergence of the Arimoto algorithm is proved and the convergence speed is evaluated. In the worst case, the convergence speed is of order $O(1/N)$, where $N$ is the number of iterations, and if the input distribution that achieves the channel capacity is in the interior of the set of input distributions, the convergence is exponential.
In this paper, we first consider the exponential convergence and evaluate its speed. We show that there exist cases of exponential convergence even if the capacity achieving input distribution $\boldsymbol{\lambda}^*$ is on the boundary of the set $\Delta(\mathcal{X})$ of input distributions. Moreover, we also consider the convergence of order $O(1/N)$, which was not dealt with in previous studies. In particular, we analyze the $O(1/N)$ convergence in detail, and the convergence speed is evaluated by the derivatives of the Kullback-Leibler divergence with respect to the input probabilities.
As a basic idea for evaluating the convergence speed, we consider that the function $F$ which defines the Arimoto algorithm is a differentiable mapping from $\Delta(\mathcal{X})$ to $\Delta(\mathcal{X})$, and notice that the capacity achieving input distribution $\boldsymbol{\lambda}^*$ is a fixed point of $F$. Then, the convergence speed is evaluated by analyzing the Taylor expansion of $F$ about the fixed point $\boldsymbol{\lambda}^*$.
2 Related works
There have been many related works on the Arimoto algorithm, for example, extensions to different types of channels, acceleration of the Arimoto algorithm, and characterizations of the Arimoto algorithm by divergence geometry. If we focus on the analysis of the convergence speed of the Arimoto algorithm, we see in previous studies that the eigenvalues of the Jacobian matrix are calculated and the convergence speed is investigated in the case that the capacity achieving input distribution $\boldsymbol{\lambda}^*$ is in the interior $\Delta(\mathcal{X})^{\circ}$ of $\Delta(\mathcal{X})$.
In this paper, we consider the Taylor expansion of the defining function $F$ of the Arimoto algorithm. We will calculate not only the Jacobian matrix of the first order term of the Taylor expansion, but also the Hessian matrix of the second order term, and examine the exponential and the $O(1/N)$ convergence speeds based on the Jacobian and Hessian matrices. Because our approach to the evaluation of the convergence speed is very fundamental, we hope that our results can be applied to the existing works.
3 Channel matrix and channel capacity
Consider a discrete memoryless channel with the input source $X$ and the output source $Y$. Let $\mathcal{X} = \{1, \dots, m\}$ be the input alphabet and $\mathcal{Y} = \{1, \dots, n\}$ be the output alphabet.
The conditional probability that the output symbol $j \in \mathcal{Y}$ is received when the input symbol $i \in \mathcal{X}$ was transmitted is denoted by $P^i_j = P(Y = j \mid X = i)$, and the row vector $P^i$ is defined by $P^i = (P^i_1, \dots, P^i_n)$, $i = 1, \dots, m$. The channel matrix $\Phi$ is defined by
$$\Phi = \begin{pmatrix} P^1 \\ \vdots \\ P^m \end{pmatrix} = \bigl(P^i_j\bigr)_{i = 1, \dots, m;\ j = 1, \dots, n}.$$
We assume that for any $j \in \mathcal{Y}$ there exists at least one $i \in \mathcal{X}$ with $P^i_j > 0$. This means that there are no useless output symbols.
The set of input probability distributions on the input alphabet $\mathcal{X}$ is denoted by $\Delta(\mathcal{X}) = \{\boldsymbol{\lambda} = (\lambda_1, \dots, \lambda_m) \mid \lambda_i \ge 0,\ \sum_{i=1}^{m} \lambda_i = 1\}$. The interior of $\Delta(\mathcal{X})$ is denoted by $\Delta(\mathcal{X})^{\circ} = \{\boldsymbol{\lambda} \in \Delta(\mathcal{X}) \mid \lambda_i > 0,\ i = 1, \dots, m\}$. Similarly, the set of output probability distributions on the output alphabet $\mathcal{Y}$ is denoted by $\Delta(\mathcal{Y})$.
Let $Q = \boldsymbol{\lambda} \Phi \in \Delta(\mathcal{Y})$ be the output distribution for the input distribution $\boldsymbol{\lambda} \in \Delta(\mathcal{X})$, where the representation by components is $Q_j = \sum_{i=1}^{m} \lambda_i P^i_j$, $j = 1, \dots, n$. Then the mutual information is defined by
$$I(\boldsymbol{\lambda}) = \sum_{i=1}^{m} \sum_{j=1}^{n} \lambda_i P^i_j \log \frac{P^i_j}{Q_j}.$$
The channel capacity is defined by
$$C = \max_{\boldsymbol{\lambda} \in \Delta(\mathcal{X})} I(\boldsymbol{\lambda}). \qquad (2)$$
The Kullback-Leibler divergence for two output distributions $Q = (Q_1, \dots, Q_n)$, $Q' = (Q'_1, \dots, Q'_n) \in \Delta(\mathcal{Y})$ is defined by
$$D(Q \| Q') = \sum_{j=1}^{n} Q_j \log \frac{Q_j}{Q'_j}.$$
The Kullback-Leibler divergence satisfies $D(Q \| Q') \ge 0$, and $D(Q \| Q') = 0$ if and only if $Q = Q'$.
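As a quick numerical check of these two properties (a minimal sketch; the example distributions are our own, not from the text):

```python
import math

def kl_divergence(q, q_prime):
    """D(Q || Q') = sum_j Q_j log(Q_j / Q'_j), with the convention 0 log 0 = 0."""
    return sum(qj * math.log(qj / qpj) for qj, qpj in zip(q, q_prime) if qj > 0)

Q  = [0.5, 0.3, 0.2]   # arbitrary example distributions on a 3-symbol alphabet
Qp = [0.4, 0.4, 0.2]
assert kl_divergence(Q, Qp) >= 0    # nonnegativity
assert kl_divergence(Q, Q) == 0     # D(Q||Q) = 0
```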
An important proposition for investigating the convergence speed of the Arimoto algorithm is the Kuhn-Tucker condition on the input distribution to achieve the maximum of (2).
Theorem (Kuhn-Tucker condition): In the maximization problem (2), a necessary and sufficient condition for the input distribution $\boldsymbol{\lambda}^* = (\lambda_1^*, \dots, \lambda_m^*)$, with output distribution $Q^* = \boldsymbol{\lambda}^* \Phi$, to achieve the maximum is that there is a certain constant $C_0$ with
$$\begin{cases} D(P^i \| Q^*) = C_0, & \lambda_i^* > 0,\\[2pt] D(P^i \| Q^*) \le C_0, & \lambda_i^* = 0. \end{cases} \qquad (4)$$
In (4), $C_0$ is equal to the channel capacity $C$.
Since this Kuhn-Tucker condition is a necessary and sufficient condition, all the information about the capacity achieving input distribution $\boldsymbol{\lambda}^*$ can be derived from this condition.
4 Arimoto algorithm for calculating channel capacity
4.1 Arimoto algorithm 
A sequence of input distributions
$$\boldsymbol{\lambda}^{(N)} = \bigl(\lambda_1^{(N)}, \dots, \lambda_m^{(N)}\bigr), \quad N = 0, 1, 2, \dots,$$
is defined by the Arimoto algorithm as follows. First, let $\boldsymbol{\lambda}^{(0)}$ be an initial distribution taken in the interior, i.e., $\boldsymbol{\lambda}^{(0)} \in \Delta(\mathcal{X})^{\circ}$. Then, the Arimoto algorithm is given by the following recurrence formula:
$$\lambda_i^{(N+1)} = \frac{\lambda_i^{(N)} \exp\bigl(D(P^i \| Q^{(N)})\bigr)}{\sum_{k=1}^{m} \lambda_k^{(N)} \exp\bigl(D(P^k \| Q^{(N)})\bigr)}, \quad Q^{(N)} = \boldsymbol{\lambda}^{(N)} \Phi, \quad i = 1, \dots, m. \qquad (6)$$
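The recurrence can be sketched in a few lines of code; the binary symmetric channel used below is our own illustrative choice, not an example from the text:

```python
import math

def kl(p, q):
    # D(P^i || Q) with the convention 0 log 0 = 0
    return sum(pj * math.log(pj / qj) for pj, qj in zip(p, q) if pj > 0)

def arimoto_step(lam, Phi):
    # One step: lambda_i <- lambda_i exp(D(P^i||Q)) / sum_k lambda_k exp(D(P^k||Q))
    m, n = len(Phi), len(Phi[0])
    Q = [sum(lam[i] * Phi[i][j] for i in range(m)) for j in range(n)]
    w = [lam[i] * math.exp(kl(Phi[i], Q)) for i in range(m)]
    s = sum(w)
    return [wi / s for wi in w]

# Binary symmetric channel with crossover probability 0.1;
# its capacity achieving input distribution is uniform.
Phi = [[0.9, 0.1], [0.1, 0.9]]
lam = [0.3, 0.7]
for _ in range(100):
    lam = arimoto_step(lam, Phi)
# lam is now very close to (0.5, 0.5)
```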
On the convergence of this Arimoto algorithm, the following results were obtained by Arimoto.
Theorem A1: If the initial input distribution $\boldsymbol{\lambda}^{(0)}$ is in $\Delta(\mathcal{X})^{\circ}$, then $\lim_{N \to \infty} I(\boldsymbol{\lambda}^{(N)}) = C$.
Theorem A2: If $\boldsymbol{\lambda}^{(0)} = (1/m, \dots, 1/m)$, then
$$C - I\bigl(\boldsymbol{\lambda}^{(N)}\bigr) \le \frac{\log m - H(\boldsymbol{\lambda}^*)}{N}, \quad N = 1, 2, \dots,$$
where $H(\boldsymbol{\lambda}^*)$ is the entropy of $\boldsymbol{\lambda}^*$.
Theorem A3: If the capacity achieving input distribution $\boldsymbol{\lambda}^*$ is in $\Delta(\mathcal{X})^{\circ}$, then
$$C - I\bigl(\boldsymbol{\lambda}^{(N)}\bigr) \le K \theta^{N}, \quad N = 1, 2, \dots,$$
where $0 < \theta < 1$ and $K > 0$ is a constant.
In the proofs of these theorems, the Taylor expansions of the mutual information and of the divergence are considered; however, the Taylor expansion of the mapping $F$ itself is not, which is what will be considered in this paper. Further, in the above Theorem A3, only the case $\boldsymbol{\lambda}^* \in \Delta(\mathcal{X})^{\circ}$ is considered, where the convergence is exponential.
4.2 Mapping $F$ from $\Delta(\mathcal{X})$ to $\Delta(\mathcal{X})$
Let $F = (F_1, \dots, F_m)$ be the defining function of the Arimoto algorithm (6), i.e.,
$$F_i(\boldsymbol{\lambda}) = \frac{\lambda_i \exp\bigl(D(P^i \| Q)\bigr)}{\sum_{k=1}^{m} \lambda_k \exp\bigl(D(P^k \| Q)\bigr)}, \quad Q = \boldsymbol{\lambda} \Phi, \quad i = 1, \dots, m. \qquad (11)$$
Then we can consider that $F$ is a differentiable mapping from $\Delta(\mathcal{X})$ to $\Delta(\mathcal{X})$, and (6) is represented by
$$\boldsymbol{\lambda}^{(N+1)} = F\bigl(\boldsymbol{\lambda}^{(N)}\bigr), \quad N = 0, 1, 2, \dots$$
In this paper, for the analysis of the convergence speed, we assume the following:
(Assumption) The capacity achieving input distribution $\boldsymbol{\lambda}^*$ is unique.
The capacity achieving output distribution $Q^* = \boldsymbol{\lambda}^* \Phi$ is unique.
Proof: By Csiszàr, p.137, eq.(37), for arbitrary $\boldsymbol{\lambda} \in \Delta(\mathcal{X})$ with $Q = \boldsymbol{\lambda} \Phi$,
$$\sum_{i=1}^{m} \lambda_i D(P^i \| Q^*) = I(\boldsymbol{\lambda}) + D(Q \| Q^*). \qquad (14)$$
Substituting a capacity achieving $\boldsymbol{\lambda}$ into (14), we have $\sum_{i=1}^{m} \lambda_i D(P^i \| Q^*) = C + D(Q \| Q^*)$. Because the left-hand side is at most the constant $C$ by the Kuhn-Tucker condition (4), it follows that $D(Q \| Q^*) = 0$, i.e., $Q = Q^*$.
The capacity achieving input distribution $\boldsymbol{\lambda}^*$ is the fixed point of the mapping $F$ in $\Delta(\mathcal{X})$. That is, $F(\boldsymbol{\lambda}^*) = \boldsymbol{\lambda}^*$.
Proof: In the Kuhn-Tucker condition (4), we have $D(P^i \| Q^*) = C$ for every index $i$ with $\lambda_i^* > 0$, hence
$$F_i(\boldsymbol{\lambda}^*) = \frac{\lambda_i^* \exp\bigl(D(P^i \| Q^*)\bigr)}{\sum_{k=1}^{m} \lambda_k^* \exp\bigl(D(P^k \| Q^*)\bigr)} = \frac{\lambda_i^* e^{C}}{e^{C}} = \lambda_i^*, \quad i = 1, \dots, m,$$
which shows $F(\boldsymbol{\lambda}^*) = \boldsymbol{\lambda}^*$.
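The fixed-point property can also be checked numerically; here we use a symmetric channel, for which the uniform distribution is known to achieve capacity (our own example, with `F` implementing the defining function of the algorithm):

```python
import math

def kl(p, q):
    return sum(pj * math.log(pj / qj) for pj, qj in zip(p, q) if pj > 0)

def F(lam, Phi):
    # Defining function of the Arimoto algorithm
    m, n = len(Phi), len(Phi[0])
    Q = [sum(lam[i] * Phi[i][j] for i in range(m)) for j in range(n)]
    w = [lam[i] * math.exp(kl(Phi[i], Q)) for i in range(m)]
    s = sum(w)
    return [wi / s for wi in w]

# Ternary symmetric channel: the uniform input achieves capacity,
# so F should leave it unchanged.
Phi = [[0.8, 0.1, 0.1], [0.1, 0.8, 0.1], [0.1, 0.1, 0.8]]
lam_star = [1/3, 1/3, 1/3]
out = F(lam_star, Phi)
assert all(abs(a - b) < 1e-12 for a, b in zip(out, lam_star))
```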
The sequence $\boldsymbol{\lambda}^{(N)}$ of the Arimoto algorithm converges to the fixed point $\boldsymbol{\lambda}^*$, i.e.,
$$\lim_{N \to \infty} \boldsymbol{\lambda}^{(N)} = \boldsymbol{\lambda}^*. \qquad (22)$$
We will investigate the convergence speed by using the Taylor expansion of $F$ about $\boldsymbol{\lambda}^*$.
4.3 Type of index
Now, we classify the indices $i \in \mathcal{X}$ in the Kuhn-Tucker condition (4) in more detail into the following 3 types:
Type I: $\lambda_i^* > 0$ (and thus $D(P^i \| Q^*) = C$),
Type II: $\lambda_i^* = 0$ and $D(P^i \| Q^*) = C$,
Type III: $\lambda_i^* = 0$ and $D(P^i \| Q^*) < C$.
Let us define the sets of indices as follows: $\mathcal{I}_{\mathrm{I}} = \{i \mid i \text{ is of type I}\}$, $\mathcal{I}_{\mathrm{II}} = \{i \mid i \text{ is of type II}\}$, $\mathcal{I}_{\mathrm{III}} = \{i \mid i \text{ is of type III}\}$. We have $\mathcal{X} = \mathcal{I}_{\mathrm{I}} \cup \mathcal{I}_{\mathrm{II}} \cup \mathcal{I}_{\mathrm{III}}$, and the three sets are mutually disjoint.
$\mathcal{I}_{\mathrm{I}}$ is not empty for any channel matrix $\Phi$, but $\mathcal{I}_{\mathrm{II}}$ and $\mathcal{I}_{\mathrm{III}}$ may be empty for some channel matrices.
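Given $\boldsymbol{\lambda}^*$, this classification can be carried out numerically with small tolerances (a sketch; the symmetric test channel and the tolerance values are our own choices, and $C$ is computed as $\sum_i \lambda_i^* D(P^i \| Q^*)$):

```python
import math

def kl(p, q):
    return sum(pj * math.log(pj / qj) for pj, qj in zip(p, q) if pj > 0)

def classify_indices(lam_star, Phi, tol=1e-9):
    """Split indices into types I, II, III using the Kuhn-Tucker condition."""
    m, n = len(Phi), len(Phi[0])
    Q = [sum(lam_star[i] * Phi[i][j] for i in range(m)) for j in range(n)]
    D = [kl(Phi[i], Q) for i in range(m)]
    C = sum(lam_star[i] * D[i] for i in range(m))  # capacity as the weighted divergence
    t1 = [i for i in range(m) if lam_star[i] > tol]                             # type I
    t2 = [i for i in range(m) if lam_star[i] <= tol and abs(D[i] - C) <= 1e-6]  # type II
    t3 = [i for i in range(m) if lam_star[i] <= tol and D[i] < C - 1e-6]        # type III
    return t1, t2, t3

# Symmetric channel: lambda* is uniform, so every index is of type I.
Phi = [[0.8, 0.1, 0.1], [0.1, 0.8, 0.1], [0.1, 0.1, 0.8]]
t1, t2, t3 = classify_indices([1/3, 1/3, 1/3], Phi)
assert (t1, t2, t3) == ([0, 1, 2], [], [])
```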
4.4 Examples of convergence speed
Let us consider how the convergence speed of the Arimoto algorithm differs depending on the channel matrix.
For many channel matrices $\Phi$, the convergence is exponential, but for some special $\Phi$ the convergence is very slow. Let us consider the following examples taking types I, II, III into account, where the input alphabet size is $m = 3$ and the output alphabet size is $n = 3$.
Example 1 (only type I): If only type I indices exist, then $\lambda_i^* > 0$ for all $i$, hence $\boldsymbol{\lambda}^*$ is in the interior $\Delta(\mathcal{X})^{\circ}$ of $\Delta(\mathcal{X})$. As a concrete channel matrix of this example, let us consider a matrix $\Phi_1$ with this property.
For this $\Phi_1$, we have $\boldsymbol{\lambda}^* \in \Delta(\mathcal{X})^{\circ}$ and $D(P^i \| Q^*) = C$ for $i = 1, 2, 3$. See Fig.1. The vertices of the large triangle in Fig.1 are the output probability distributions $P^1, P^2, P^3$. We have $Q^*$ in the interior of this triangle; then, considering the analogy to Euclidean geometry, the triangle $P^1 P^2 P^3$ can be regarded as an “acute triangle”.
Example 2 (types I and II): If there are both type I and type II indices, we can assume $\mathcal{I}_{\mathrm{I}} = \{1, 2\}$ and $\mathcal{I}_{\mathrm{II}} = \{3\}$ without loss of generality, hence $\boldsymbol{\lambda}^*$ is on the side $\lambda_3 = 0$ of $\Delta(\mathcal{X})$ and $D(P^3 \| Q^*) = C$. As a concrete channel matrix of this example, let us consider a matrix $\Phi_2$ with this property.
For this $\Phi_2$, we have $\lambda_3^* = 0$ and $D(P^3 \| Q^*) = C$. See Fig.2. Considering the analogy to Euclidean geometry, the triangle $P^1 P^2 P^3$ can be regarded as a “right triangle”.
Example 3 (types I and III): If there are both type I and type III indices, we can assume $\mathcal{I}_{\mathrm{I}} = \{1, 2\}$ and $\mathcal{I}_{\mathrm{III}} = \{3\}$ without loss of generality, hence $\boldsymbol{\lambda}^*$ is on the side $\lambda_3 = 0$ of $\Delta(\mathcal{X})$ and $D(P^3 \| Q^*) < C$. As a concrete channel matrix of this example, let us consider a matrix $\Phi_3$ with this property.
For this $\Phi_3$, we have $\lambda_3^* = 0$ and $D(P^3 \| Q^*) < C$. See Fig.3. Considering the analogy to Euclidean geometry, the triangle $P^1 P^2 P^3$ can be regarded as an “obtuse triangle”.
For the above $\Phi_1, \Phi_2, \Phi_3$, Fig.4 shows the state of convergence of $\boldsymbol{\lambda}^{(N)}$ to $\boldsymbol{\lambda}^*$. From this figure, we see that in Examples 1 and 3 the convergence is exponential, while in Example 2 the convergence is slower than exponential.
From the above three examples, it is inferred that the Arimoto algorithm converges very slowly when a type II index exists, and converges exponentially when no type II index exists. We will analyze this phenomenon in the following.
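One simple way to observe this phenomenon for any given $\Phi$ is to monitor the step sizes $\|\boldsymbol{\lambda}^{(N+1)} - \boldsymbol{\lambda}^{(N)}\|$, which decay geometrically in the exponential case and much more slowly otherwise. A sketch, using a symmetric channel of our own choosing (only type I indices, so the decay is fast):

```python
import math

def kl(p, q):
    return sum(pj * math.log(pj / qj) for pj, qj in zip(p, q) if pj > 0)

def arimoto_step(lam, Phi):
    m, n = len(Phi), len(Phi[0])
    Q = [sum(lam[i] * Phi[i][j] for i in range(m)) for j in range(n)]
    w = [lam[i] * math.exp(kl(Phi[i], Q)) for i in range(m)]
    s = sum(w)
    return [wi / s for wi in w]

def step_sizes(Phi, lam0, num_steps):
    """Euclidean norms ||lam^(N+1) - lam^(N)||, roughly geometric for exponential convergence."""
    lam, sizes = lam0, []
    for _ in range(num_steps):
        nxt = arimoto_step(lam, Phi)
        sizes.append(math.sqrt(sum((a - b) ** 2 for a, b in zip(nxt, lam))))
        lam = nxt
    return sizes

Phi = [[0.8, 0.1, 0.1], [0.1, 0.8, 0.1], [0.1, 0.1, 0.8]]
d = step_sizes(Phi, [0.5, 0.3, 0.2], 10)
assert d[-1] < d[0] / 10   # rapid (geometric) shrinkage of the step sizes
```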
5 Taylor expansion of $F$ about $\boldsymbol{\lambda}^*$
We will examine the convergence speed of the Arimoto algorithm by the Taylor expansion of $F$ about the fixed point $\boldsymbol{\lambda}^*$. The Taylor expansion of the function $F$ about $\boldsymbol{\lambda}^*$ is
$$F(\boldsymbol{\lambda}) = \boldsymbol{\lambda}^* + (\boldsymbol{\lambda} - \boldsymbol{\lambda}^*) J + \frac{1}{2} (\boldsymbol{\lambda} - \boldsymbol{\lambda}^*) H\, {}^{t}(\boldsymbol{\lambda} - \boldsymbol{\lambda}^*) + o\bigl(\|\boldsymbol{\lambda} - \boldsymbol{\lambda}^*\|^{2}\bigr), \qquad (31)$$
where ${}^{t}(\cdot)$ denotes the transpose of a vector and $\|\cdot\|$ denotes the Euclidean norm $\|\boldsymbol{\mu}\| = \bigl(\sum_{i=1}^{m} \mu_i^{2}\bigr)^{1/2}$.
In (31), $J$ is the Jacobian matrix of $F$ at $\boldsymbol{\lambda}^*$, i.e.,
$$J = \Bigl(\frac{\partial F_j}{\partial \lambda_i}(\boldsymbol{\lambda}^*)\Bigr)_{i, j = 1, \dots, m}. \qquad (32)$$
We consider in this paper that the input probability distribution $\boldsymbol{\lambda}$ is a row vector, thus the Jacobian matrix $J$ acts on row vectors from the right, i.e., $\partial F_j / \partial \lambda_i(\boldsymbol{\lambda}^*)$ is the $(i, j)$ component of $J$. Note that our $J$ is the transpose of the usual Jacobian matrix corresponding to column vectors.
Every row sum of $J$ is equal to $0$, because $\sum_{j=1}^{m} F_j(\boldsymbol{\lambda}) = 1$ holds identically.
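This can be checked by finite differences (a sketch; the finite-difference step and the test point are arbitrary choices of ours, and the perturbed points need not lie on the probability simplex, in line with the extension of the domain discussed below):

```python
import math

def kl(p, q):
    return sum(pj * math.log(pj / qj) for pj, qj in zip(p, q) if pj > 0)

def F(lam, Phi):
    m, n = len(Phi), len(Phi[0])
    Q = [sum(lam[i] * Phi[i][j] for i in range(m)) for j in range(n)]
    w = [lam[i] * math.exp(kl(Phi[i], Q)) for i in range(m)]
    s = sum(w)
    return [wi / s for wi in w]

def jacobian_fd(lam, Phi, h=1e-6):
    """J with (i, j) component dF_j / dlambda_i, by forward differences."""
    m = len(lam)
    base = F(lam, Phi)
    J = []
    for i in range(m):
        pert = list(lam)
        pert[i] += h              # leaves the simplex; F is still defined here
        Fp = F(pert, Phi)
        J.append([(Fp[j] - base[j]) / h for j in range(m)])
    return J

Phi = [[0.8, 0.1, 0.1], [0.1, 0.8, 0.1], [0.1, 0.1, 0.8]]
J = jacobian_fd([0.4, 0.35, 0.25], Phi)
row_sums = [sum(row) for row in J]
assert all(abs(rs) < 1e-6 for rs in row_sums)   # sum_j F_j = 1, so each row sums to 0
```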
In (31), $(\boldsymbol{\lambda} - \boldsymbol{\lambda}^*) H\, {}^{t}(\boldsymbol{\lambda} - \boldsymbol{\lambda}^*)$ is an abbreviated expression of the $m$ dimensional vector
$$\Bigl((\boldsymbol{\lambda} - \boldsymbol{\lambda}^*) H_1\, {}^{t}(\boldsymbol{\lambda} - \boldsymbol{\lambda}^*), \dots, (\boldsymbol{\lambda} - \boldsymbol{\lambda}^*) H_m\, {}^{t}(\boldsymbol{\lambda} - \boldsymbol{\lambda}^*)\Bigr),$$
where $H_i$ is the Hessian matrix of $F_i$ at $\boldsymbol{\lambda}^*$, i.e.,
$$H_i = \Bigl(\frac{\partial^2 F_i}{\partial \lambda_j \partial \lambda_k}(\boldsymbol{\lambda}^*)\Bigr)_{j, k = 1, \dots, m}. \qquad (34)$$
The variables $\lambda_1, \dots, \lambda_m$ satisfy the constraint $\sum_{i=1}^{m} \lambda_i = 1$, but in (31), (32), (34) we consider $\lambda_1, \dots, \lambda_m$ as independent variables to have the Taylor series approximation (31). This approximation is justified as follows. By the Kuhn-Tucker condition (4), $D(P^i \| Q^*) \le C < \infty$ for all $i$, hence by the assumption put below (1) that there are no useless output symbols, we have $Q^*_j > 0$, $j = 1, \dots, n$. For $\delta > 0$, define $B_{\delta} = \{\boldsymbol{\mu} \in \mathbb{R}^{n} \mid \|\boldsymbol{\mu} - Q^*\| < \delta\}$, i.e., $B_{\delta}$ is an open ball in $\mathbb{R}^{n}$ centered at $Q^*$ with radius $\delta$. Note that $B_{\delta}$ is free from the constraint $\sum_{j=1}^{n} \mu_j = 1$. Taking $\delta$ sufficiently small, we can have $\mu_j > 0$, $j = 1, \dots, n$, for any $\boldsymbol{\mu} \in B_{\delta}$. The function $D(P^i \| \boldsymbol{\mu})$ is defined for $\boldsymbol{\mu}$ with $\mu_j > 0$, even if $\sum_{j} \mu_j \neq 1$. Therefore, the domain of definition of $F$ can be extended to $\tilde{B} = \pi^{-1}(B_{\delta})$, where $\pi^{-1}(B_{\delta})$ is the inverse image of $B_{\delta}$ by the mapping $\pi: \boldsymbol{\lambda} \mapsto Q = \boldsymbol{\lambda} \Phi$. $\tilde{B}$ is an open neighborhood of $\boldsymbol{\lambda}^*$ in $\mathbb{R}^{m}$. Then $F$ is a function of $\lambda_1, \dots, \lambda_m$ as independent variables (free from the constraint $\sum_{i} \lambda_i = 1$). We can consider (31) to be the Taylor expansion by the independent variables $\lambda_1, \dots, \lambda_m$, then substitute the constraint $\sum_{i} \lambda_i = 1$ into (31) to obtain the approximation for $F$ about $\boldsymbol{\lambda}^*$.
We will investigate the convergence (22) based on the Taylor expansion (31). Let $\boldsymbol{\mu}^{(N)} = \boldsymbol{\lambda}^{(N)} - \boldsymbol{\lambda}^*$ denote the difference, and write it by components as $\boldsymbol{\mu}^{(N)} = (\mu_1^{(N)}, \dots, \mu_m^{(N)})$; then, by $\boldsymbol{\lambda}^{(N+1)} = F(\boldsymbol{\lambda}^{(N)})$, we have
$$\boldsymbol{\mu}^{(N+1)} = \boldsymbol{\mu}^{(N)} J + \frac{1}{2}\, \boldsymbol{\mu}^{(N)} H\, {}^{t} \boldsymbol{\mu}^{(N)} + o\bigl(\|\boldsymbol{\mu}^{(N)}\|^{2}\bigr). \qquad (36)$$
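The dominance of the linear term in this expansion can be checked numerically for a channel whose $\boldsymbol{\lambda}^*$ is known (a sketch with our own symmetric example; the Jacobian is approximated by finite differences):

```python
import math

def kl(p, q):
    return sum(pj * math.log(pj / qj) for pj, qj in zip(p, q) if pj > 0)

def F(lam, Phi):
    m, n = len(Phi), len(Phi[0])
    Q = [sum(lam[i] * Phi[i][j] for i in range(m)) for j in range(n)]
    w = [lam[i] * math.exp(kl(Phi[i], Q)) for i in range(m)]
    s = sum(w)
    return [wi / s for wi in w]

def jacobian_fd(lam, Phi, h=1e-6):
    m = len(lam)
    base = F(lam, Phi)
    J = []
    for i in range(m):
        pert = list(lam)
        pert[i] += h
        Fp = F(pert, Phi)
        J.append([(Fp[j] - base[j]) / h for j in range(m)])
    return J

Phi = [[0.8, 0.1, 0.1], [0.1, 0.8, 0.1], [0.1, 0.1, 0.8]]
lam_star = [1/3, 1/3, 1/3]          # capacity achieving for this symmetric channel
J = jacobian_fd(lam_star, Phi)
mu = [1e-5, -2e-5, 1e-5]            # small perturbation whose components sum to 0
lam = [a + b for a, b in zip(lam_star, mu)]
predicted = [sum(mu[i] * J[i][j] for i in range(3)) for j in range(3)]  # mu J
actual = [a - b for a, b in zip(F(lam, Phi), lam_star)]                 # next difference
err = max(abs(a - p) for a, p in zip(actual, predicted))
assert err < 1e-6    # the residual is of second order in ||mu||
```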
5.1 Basic analysis for fast and slow convergence
For the investigation of the convergence speed, we consider the following simple case.
Let us define a real sequence $\{x_N\}_{N = 0, 1, \dots}$ by the recurrence formula
$$x_{N+1} = \theta x_N, \quad N = 0, 1, \dots$$
If $0 < \theta < 1$, then we have $x_N = \theta^{N} x_0$, hence $x_N$ decays exponentially.
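For contrast, a recurrence whose linear part degenerates, such as $x_{N+1} = x_N - x_N^2$ (our illustrative model, anticipating the $O(1/N)$ behavior studied in this paper), decays only like $1/N$:

```python
# Exponential decay: x_{N+1} = theta * x_N with 0 < theta < 1
theta, x = 0.5, 1.0
for _ in range(50):
    x = theta * x
assert x == 0.5 ** 50             # x_N = theta^N x_0, exponentially small

# O(1/N) decay: y_{N+1} = y_N - y_N^2 satisfies 1/y_{N+1} = 1/y_N + 1 + O(y_N)
y, N = 0.5, 10_000
for _ in range(N):
    y = y - y * y
assert abs(y * N - 1.0) < 0.01    # y_N is approximately 1/N
```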
5.2 On the Jacobian matrix
Let us consider the Jacobian matrix of $F$ at an arbitrary $\boldsymbol{\lambda} \in \tilde{B}$. We are assuming (13), hence the capacity achieving input distribution $\boldsymbol{\lambda}^*$ is unique. We will calculate the components (32) of $J$.
We can write (11) as
$$F_i(\boldsymbol{\lambda}) \sum_{k=1}^{m} \lambda_k e^{D(P^k \| Q)} = \lambda_i e^{D(P^i \| Q)}, \qquad (45)$$
then differentiating the both sides of (45) by $\lambda_{i'}$, we have
$$\frac{\partial F_i}{\partial \lambda_{i'}} \sum_{k=1}^{m} \lambda_k e^{D(P^k \| Q)} + F_i \Bigl(e^{D(P^{i'} \| Q)} + \sum_{k=1}^{m} \lambda_k e^{D(P^k \| Q)} \frac{\partial D(P^k \| Q)}{\partial \lambda_{i'}}\Bigr) = \delta_{i', i}\, e^{D(P^i \| Q)} + \lambda_i e^{D(P^i \| Q)} \frac{\partial D(P^i \| Q)}{\partial \lambda_{i'}},$$
where $\delta_{i', i}$ is the Kronecker delta.
Let us define $D_i = D(P^i \| Q)$, $i = 1, \dots, m$, for brevity. Note that $D_i$ is well defined for any $\boldsymbol{\lambda} \in \tilde{B}$, from Remark 2.