In 1965 Cooley and Tukey developed the Discrete Fourier Transform (dft) to recover continuous functions from discrete samples. This was a landmark discovery because it allowed for the digital manipulation of analogue signals (like sound) by computers. Soon after a variant called the Fast Fourier Transform (fft) eclipsed the dft to the extent that fft is often mistakenly substituted for dft. As its name implies, the fft is a method for computing the dft faster.
ffts have an interesting application in Computer Algebra. Let be a ring with a unit. If has a primitive th root of unity with (i.e. ) then the fft computes the product of two polynomials with in operations in . Unfortunately, when is sufficiently far from a power of two many computations wasted. This deficiency was addressed by the signal processing community using a method called fft-pruning . However, the difficult inversion of this method is due to van der Hoeven .
2 The Discrete Fourier Transform
For this paper let be a ring with a unit and an th root of unity. The Discrete Fourier Transform***In signal processing community this is called the “decimation-in-time” variant of the fft., with respect to , of vector is the vector with
Alternatively we can view these -tuples as encoding the coefficients of polynomials from and define the dft with respect to as the mapping
We let the relationship between and its coefficients be implicit and write
The dft can be computed efficiently using binary splitting. This method requires evaluation only at for , rather than at all . To compute the dft of with respect to we write
and recursively compute the dft of and with respect to :
Finally, we construct according to
This description has a natural implementation as a recursive algorithm, but in practice it is often more efficient to implement an in-place algorithm that eliminates the overhead of creating recursive stacks.
Let and be a positive integers and let for . The length- bitwise reverse of is given by
Notice if we were to write 3, 24, 11, and 26 as a binary numbers to five digits we have 00011 reverses to 11000 and 01011 reverses to 11010 — in fact this is the inspiration for the name “bitwise reverse.”
For the in-place non-recursive dft algorithm, we require only one vector of length . Initially, at step zero, this vector is
and is updated (incrementally) at steps by the rule
where and for all , . Note that two additions and one multiplication are done in (1) as one product is merely the negation of the other.
We illustrate the dependencies of the values in (1) with
We call this a “butterfly” after the shape it forms and may say controls the width — the value of which decreases as increases. By placing these butterflies on a grid (Figure 1) we can see which values of are required to compute particular entires of (and vice-versa). For example
denotes that and are required to determine and (and vice-versa).
Using induction over ,
for all and . In particular, when and , we have
for all . That is, is a (specific) permutation of as illustrated in Figure 2.
The key property of the dft is that it is straightforward to invert, that is to recover from :
since whenever . This yields a polynomial multiplication algorithm that does operations in (see [1, §4.7] for the outline of this algorithm).
3 The Truncated Fourier Transform
The motivation behind the Truncated Fourier Transform (tft) is the observation that many computations are wasted when the length of (the input) is not a power of two.†††The tft is exactly equivalent to a technique called “fft pruning” in the signal processing literature . This is entirely the fault of the strategy where one “completes” the -tuple by setting when to artificially extend the length of to the nearest power of two (so the dft can be executed as usual).
However, despite the fact that we may only want components of , the dft will calculate all of them. Thus computation is wasted. We illustrate this in Figures 3 and 4. This type of wasted computation is relevant when using the dft to multiply polynomials — their products are rarely of degree one less some power of two.
The definition of the tft is similar to that of the dft with the exception that the input and output vector ( resp. ) are not necessarily of length some power of two. More precisely the tft of an -tuple is the -tuple
where , (usually ) and a th root of unity.
van der Hoeven  gives a more general description of the tft where one can choose an initial vector and target vector . Provided the ’s are distinct one can carry out the tft by considering the full dft and removing all computations not required for the desired output. In this paper, we restrict our discussion to that of the scenario in Figure 4 (where the input and output are the same initial segments) because it can be used for polynomial multiplication, and because it yields the most improvement.
If we only allow ourselves to operate in a size vector it is straightforward to modify the in-place dft algorithm from the previous section to execute the tft. (It should be emphasized that this only saves computation and not space. For a “true” in-place tft algorithm that operates in an array of size , see Harvey and Roche’s .) At stage it suffices to compute with where .‡‡‡This is a correction to the bound given in  as pointed out in .
Let , and be a primitive th root of unity in . The tft of an -tuple with respect to can be computed using at most additions and multiplications of powers of .
Let ; at stage we compute . So, in addition to we compute
more values. Therefore, in total, we compute at most
values . The result follows. ∎
4 Inverting The Truncated Fourier Transform
Unfortunately, the tft cannot be inverted by merely doing another tft with and adjusting the output by some constant factor like inverse of the dft. Simply put: we are missing information and must account for this.
Let , , with a th primitive root of unity. Setting , the tft of at 5 is
Now, to show the tft of this with respect to is not , define
The tft of with respect to is
which is not a constant multiple of .
This discrepancy is caused by the completion of to — we should have instead completed to .
To invert the tft we use the fact that whenever two values among
are known, that the other values can be deduced. That is, if two values of some butterfly are known then the other two values can be calculated using (1) as the relevant matrix is invertible. Moreover, these relations only involve shifting (multiplication and division by two), additions, subtractions and multiplications by roots of unity — an ideal scenario for implementation.
As with dft, observe that can be calculated from . This is because all the butterfly relations necessary to move up like this never require for any and . This is illustrated in Figure 5. More generally, we have that
is sufficient information to compute
provided that . (In Algorithm 1, that follows, we call this a “self-contained push up”.)
5 Inverse tft Algorithm
Finally, in this section, we present a simple recursive description of the inverse tft algorithm for the case we have restricted ourselves to (all zeroes packed at the end). The algorithm operates in a length array for which we assume access; here corresponds to , a th primitive root of unity. Initially, the content of the array is
where is the result of the tft on .
In keeping with our “Illustrated” description we use pictures, like Figure 6, to indicate what values are known (solid dots ) and what value to calculate (empty dot ). For instance, “push down with Figure 6”, is shorthand for: use and to determine . We emphasize with an arrow that this new value should also overwrite the one at . This calculation is easily accomplished using (1) with a caveat: the values and are not explicitly known. What is known is , and therefore , and some array position . Observe that is recovered by (the quotient of ).
The full description of the inverse tft follows in Algorithm 1; note that the initial call is InvTFT. A visual depiction of Algorithm 1 is given in Figure 9. A sketch of a proof of its correctness follows.
Let , , and be the values of head, tail, and last at the th recursive call. Consider the integer sequences given by
If then we have termination. Otherwise, either branch (7) executes giving
and thus , or branch (13) executes, giving
and thus .
Neither branch can run forever since causes termination and means either strictly decreases or condition (13) fails, forcing termination. ∎
Sketch of correctness.
The Truncated Fourier Transform is a novel and elegant way to reduce the number of computations of a dft-based computation by a possible factor of two (which may be significant). Additionally, with the advent of Harvey and Roche’s paper , it is possible to save as much space as computation. The hidden “cost” of working with the tft algorithm is the increased difficulty of determining the inverse tft. Although in most cases this is still less costly than the inverse dft, the algorithm is no doubt more difficult to implement.
The author wishes to thank Dr. Dan Roche and Dr. Éric Schost for reading a draft of this paper and offering suggestions.
-  K. O. Geddes, S. R. Czapor, and G. Labahn. Algorithms for Computer Algebra. Kluwer Academic Publishers, 1992.
-  David Harvey and Daniel S. Roche. An in-place truncated fourier transform and applications to polynomial multiplication. In Proceedings of the 2010 International Symposium on Symbolic and Algebraic Computation, ISSAC ’10, pages 325–329, New York, NY, USA, 2010. ACM.
-  H.V. Sorensen and C.S. Burrus. Efficient computation of the dft with only a subset of input or output points. Signal Processing, IEEE Transactions on, 41(3):1184 –1200, mar 1993.
-  J. van der Hoeven. Notes on the Truncated Fourier Transform. Technical report, Université Paris-Sud, Orsay, France, 2008.
-  Joris van der Hoeven. The truncated fourier transform and applications. In ISSAC ’04: Proceedings of the 2004 international symposium on Symbolic and algebraic computation, pages 290–296, New York, NY, USA, 2004. ACM.