 # Constant-time filtering using shiftable kernels

It was recently demonstrated in  that the non-linear bilateral filter  can be efficiently implemented using a constant-time or O(1) algorithm. At the heart of this algorithm was the idea of approximating the Gaussian range kernel of the bilateral filter using trigonometric functions. In this letter, we explain how the idea in  can be extended to few other linear and non-linear filters [14, 17, 2]. While some of these filters have received a lot of attention in recent years, they are known to be computationally intensive. To extend the idea in , we identify a central property of trigonometric functions, called shiftability, that allows us to exploit the redundancy inherent in the filtering operations. In particular, using shiftable kernels, we show how certain complex filtering can be reduced to simply that of computing the moving sum of a stack of images. Each image in the stack is obtained through an elementary pointwise transform of the input image. This has a two-fold advantage. First, we can use fast recursive algorithms for computing the moving sum [15, 6], and, secondly, we can use parallel computation to further speed up the computation. We also show how shiftable kernels can also be used to approximate the (non-shiftable) Gaussian kernel that is ubiquitously used in image filtering.

## Authors

##### This week in AI

Get the week's most popular data science and artificial intelligence research sent straight to your inbox every Saturday.

## 1 Introduction

The function has the remarkable property that, for any translation , . That is, we can express the translate of a sinusoid as the linear combination of two fixed sinusoids. More generally, this holds true for any function of the form . This follows from the addition-multiplication property of the exponential. As special cases, we have the pure exponentials when is real, and the trigonometric functions when is imaginary.

The translation of not all functions can be written in this way, e.g., consider the functions and . The other class of functions that have this property are the polynomials,

. Functions with this property can be realized in higher dimensions using higher-dimensional polynomials and exponentials, or simply by taking the tensor product of the one-dimensional functions.

More generally, we say that a function is shiftable in if there exists a fixed (finite) collection of functions such that, for every translation in , we can write

 φ(x−τ)=c1(τ)φ1(x)+⋯+cN(τ)φN(x). (1)

We call the fixed functions the basis functions, the interpolating coefficients, and the order of shiftability. Note that the coefficients depend on , and are responsible for capturing the translation action.

Shiftable functions, and more generally, steerable functions , have a long history in signal and image processing. Over the last few decades, researchers have found applications of these special functions in various domains such as image analysis [7, 9]

and motion estimation



, and pattern recognition

, to name a few. In several of these applications, the role of steerability and its formal connection with the theory of Lie groups was only recognized much later. We refer the readers to the exposition of Teo  for an excellent account on the theory and practice of steerable functions.

In a recent paper , we showed how specialized trigonometric kernels could be used for fast bilateral filtering. This work was inspired by the work of Porikli , who had earlier shown how polynomials could be used for the same purpose. We now realize that it is the shiftability of the kernel that is at the heart of the matter, and that this can be applied to other forms of filtering and using more general kernels. We will provide a general theory of this in Section 2, where we also propose some algorithms that have a constant-time complexity per pixel, independent of the size of the filtering kernels. To the best of our knowledge, such algorithms have not been reported in the community in its full generality. The problem of designing shiftable kernels is addressed in Section 3. Here we also propose a scheme for approximating the ubiquitous Gaussian kernel using shiftable functions. Finally, in Section 4, we present some thoughts on how shiftability could be used for reducing the complexity of the non-local means filter .

## 2 Filtering using moving sums

We now show how certain constant-time algorithms for image filtering can be obtained using shiftable kernels. The idea is that by using shiftability, we can decompose the local kernels (obtained using translations) in terms of “global” functions – the basis functions. The local information gets encoded in the interpolating coefficients. This allows us to explicitly take advantage of the redundancy in the filtering operation.

To begin with, we consider the simplest neighborhood filter, namely the spatial filter. This is given by

 ¯¯¯f(x)=1η∫Ωφ(y)f(x−y) dy (2)

where . Here is the kernel, and is the neighborhood over which the integral (sum) is taken. Note that, henceforth, we will use the term kernel to specifically mean that the function is symmetric, non-negative, and unimodal over its support (peak at the origin). It is not immediately clear that one can construct such kernels using shiftable functions. We will address this problem in the sequel.

For the moment, suppose that the kernel

is shiftable, so that (1) holds. We use this, along with symmetry, to center the kernel at . In particular, we write . Using linearity, we have

 ∫Ωφ(y)f(x−y) dy=N∑n=1cn(x)∫Ωφn(x−y)f(x−y) dy.

Similarly, .

Now consider the case when is a square, . Then the above integrals are of the form

 ∫[−T,T]2F(x−y) dy.

This is easily recognized as the moving sum of evaluated at . We will use the notation to denote this integral (this is simply the “moving-average”, but without the normalization). As is well-known, this can be efficiently computed using recursion [15, 6].

The main idea here is that, by using shiftable kernels, we can express (2) using moving sums and these in turn can be computed efficiently. In particular, note that the number of computations required for the moving sum is independent of , that is, the size of the neighborhood . These are referred to as the constant-time or algorithms in the image processing community. The main steps of our algorithm are summarized in Algorithm 1.

The above idea can also be extended to non-linear filters such as the edge-preserving bilateral filter [14, 17, 12]. The bilateral filtering of an image is given by the formula

 ~f(x)=1η(x)∫Ωφ(y) ϕ(f(x−y)−f(x)) f(x−y) dy (3)

where .

In this formula, the bivariate function is called the spatial kernel, and the one-dimensional function is called the range kernel. Suppose that both these kernels are shiftable. In particular, let , and .

Plugging these into (3) and using linearity, we can write the numerator as

 ∑m,ncm(x)dn(f(x))∫Ωφm(y)ϕn(f(x−y))f(x−y) dy.

Similarly,

 η(x)=∑m,ncm(x)dn(f(x))∫Ωφm(x−y)ϕn(f(x−y)) dy.

As before, we again recognize the moving sums when . This gives us a new constant-time algorithm for bilateral filtering, which is summarized in Algorithm 2. We note that this algorithm is an extension of the one in , where we used shiftable kernels only for the range filter.

The main computational advantage of Algorithms 1 and 2 is that the number of computations per pixel does not depend on the size of the spatial or the range kernel. It only depends on the order of shiftability. More precisely, the computation time scales linearly with the number of basis functions. For example, in case of the bilateral filter, the computation size is roughly times the computations required to perform a single moving sum of the image, plus the overhead time of initializing the basis images and recombining the outputs. To get an estimate of the running time, we implemented Algorithm 2 in MATLAB on an Intel machine with a quad-core 2.83 GHz processor. We found that a single recursive implementation of the moving sum requires roughly milliseconds on a image when . Considering an instance where , the total computation time was roughly seconds, seconds to initialize the basis images and combine the results, and seconds for the moving sums. This indeed looks promising since we can definitely bring down the time using a JAVA or C compiler. Moreover, we can also use multithreading (parallel computation) to further accelerate the implementation.

## 3 Shiftable kernels

We now address the problem of designing kernels that are shiftable. The theory of Lie groups can be used to study the class of shiftable functions, e.g., see discussions in [13, 8]. It is well-known that the polynomials and the exponentials are essentially the only shiftable functions.

###### Theorem 3.1 (The class of shiftable functions, ).

The only smooth functions that are shiftable are the polynomials and the exponentials, and their sum and product.

We recall that a kernel is a smooth function that is symmetric, non-negative, and unimodal. A priori, it is not at all obvious that there exists a kernel that is a polynomial or exponential. Indeed, the real exponentials cannot form valid kernels since they are neither symmetric nor unimodal. On the other hand, while there are plenty of polynomials and trigonometric functions that are both symmetric and non-negative, it is impossible to find a one that is unimodal over the entire real line. This is simply because the polynomials blow up at infinity, while the trigonometric functions are oscillatory.

###### Proposition 3.2 (Conflicting properties).

There is no kernel that is shiftable on the entire real line.

Nevertheless, unimodality can be achieved at least on a bounded interval, if not the entire real line, using polynomial and trigonometric functions. Note that, in practice, a priori bounds on the data are almost always available. For the rest of the paper, and without loss of generality, we fix the bounded interval to be .

We now give two concrete instances of shiftable kernels on . The reason for these particular choices will be clear in the sequel. For every integer , consider the functions

 pN(t)=(1−t2T2)NandqN(t)=[cos(πt2T)]N,

where takes values in . It is easily verified that these are valid kernels on . The crucial difference between the above kernels is that the order of shiftability of is , while that of is much lower, namely . We note that it is the kernel that was used in .

The fact that the sum and product of shiftable kernels is also shiftable can be used to construct kernels in higher dimensions. For example, we can set , or . We call these the separable kernels. In higher dimensions, one often requires the kernel to have the added property of isotropy. In two dimensions, we can achieve near-isotropy using these separable kernels provided that is sufficiently large. We will give a precise reason later in the section. However, as shown in a different context in , it is worth noting that kernels other than the standard separable ones (of same order) can provide better isotropy, particularly for low orders. Indeed, consider the following kernel on the square :

 ϕ(x1,x2)=q1(x1)q1(x1+x2√2)q1(x2)q1(x1−x2√2). (4)

The kernel is composed of four cosines distributed uniformly over the circle. It can be verified that is more isotropic than the separable kernel of same order, . However, note that the non-negativity constraint is violated on the corners of the square in this case. This is unavoidable since we are trying to approximate a circle within a square. Nevertheless, this does not create much of a problem since the negative overshoot is well within of the peak value. Following the same argument, the polynomial

 ϕ(x1,x2)=p1(x1)p1(x1+x2√2)p1(x2)p1(x1−x2√2)

tends to be more isotropic on than .

### 3.1 Approximation of Gaussian kernels

The most commonly used kernel in image processing is the Gaussian kernel. This kernel, however, is not shiftable. As a result, we cannot directly apply our algorithm for the Gaussian kernel. One straightforward option is to instead approximate the Gaussian using its Fourier series or its Taylor polynomial, both of which are shiftable. The difficulty with either of these is that they do not yield valid kernels. That is, it is not guaranteed that the resulting approximation is non-negative or unimodal; see [5, Fig. 1]. This is exactly the problem with the polynomial approximation used for the bilateral filter in . As against these, we can instead use the specialized shiftable kernels and to approximate the Gaussian to any arbitrarily precision, based on the following results.

###### Theorem 3.3 (Gaussian approximation).

For every ,

 limN⟶∞pN(t√N)=exp(−t2T2), (5)

and

 limN⟶∞qN(t√N)=exp(−π2t28T2). (6)

In either case, the convergence takes place quite rapidly. The first of these results is well-known. For a proof of the second result, we refer the readers to [5, Sec. II-D]

. The added utility of these asymptotic formulas is that we can control the variance of these kernels using the variance of the target Gaussian. This is particularly useful because no simple closed-form expressions for the variance of

and are available. We refer the authors to [5, Sec. II-E] for details on how one can exactly control the variance of . The same idea applies to .

We now discuss how to approximate isotropic Gaussians in two dimensions using shiftable kernels. Doing this using separable kernels is straightforward. For example,

 limN⟶∞qN(x1√N)qN(x2√N)=exp(−π(x21+x22)8T2). (7)

There is yet another form of convergence which is worth mentioning. Consider the following kernel defined on :

 ϕN(x1,x2)=N∏k=1q1(√6N(x1cosθk+x2sinθk)),

where . The kernel is simply the (rescaled) kernel in (4). By slightly adapting the proof of Theorem 2.2 in [3, Appendix A], we can show the following.

###### Theorem 3.4 (Approximation of isotropic Gaussian).

For every in ,

 limN⟶∞ϕN(x1,x2)=exp(−π2(x21+x22)8T2).

Note that the target Gaussian in this case is the same as in (7). The key difference, however, is that for low orders , e.g. , looks more isotropic than . However, as noted earlier, the non-negativity requirement of a kernel is mildly violated by at the corners of the square domain.

The significance of the order is that it allows one to arbitrarily control the accuracy of the Gaussian approximation. As discussed in [5, Sec. II-E], has to be greater than a threshold for the approximating kernels in (5) and (6) to be non-negative and unimodal. In particular, if is the variance of the target Gaussian, then is of the order . Thus, a large is required to approximate a narrow Gaussian on a large interval. For the spatial kernel, the ratio is usually small, since is small in this case. This is, however, not the the case for the range kernel of the bilateral filter, where can almost be as large as the dynamic range of the image. The good news is that, by allowing for mild (and controlled) violations of the non-negativity constraint, one can closely approximate the Gaussian using a significantly lower number of terms. This is done by discarding the less significant basis functions in (1); see  for detailed discussion and results. The same idea can also be applied to the polynomials.

## 4 Discussion

We have shown how, using shiftable kernels, we can express two popular forms of image filters (and possibly many more) in terms of simple moving sums. We note that we can speed up the implementation of Algorithm 1 and 2 using parallel computation or multithreading. In future work, we plan to implement these algorithms in C or JAVA, and make extensive comparison of the result and execution time with the state-of-the-art algorithms.

To conclude, we note that the main idea can also be extended to some other forms of neighborhood filtering, e.g., the ones in [17, 12]. What is even more interesting is that we can extend the idea for approximating the non-local means filter . The non-local means is a higher order generalization of the bilateral filter, where one works with patches instead of single pixels. The filtered image in this case is given by

 ^f(x)=∫f(x−y)w(x,y) dy∫w(x,y) dy (8)

where

 w(x,y)=exp(−h−2∫g(u)(f(x+u)−f(x−y+u))2du)

Here is a two-dimensional Gaussian, and the integrals (sums) are taken over the entire image domain. In practice, though, the sum is performed locally .

It is possible to express (8) in terms of moving sums, using shiftable approximates of the Gaussian. First, we approximate the domain of the outer integral by a sufficiently large square , and the inner integral by a finite sum over neighborhood pixels , where, say, . In this case, is given by

 1η(x)∫[−T,T]2 f(x−y)φ(f(x+u1)−f(x−y+u1),… …,f(x+up)−f(x−y+up))dy (9)

where , and where is given by

 ∫[−T,T]2 φ(f(x+u1)−f(x−y+u1),… …,f(x+up)−f(x−y+up))dy. (10)

Note that is an anisotropic Gaussian in variables with covariance . Now, using a shiftable approximation (we continue using the same symbol) of as in (1), we can write the integrand in (4) as .

We can then write the numerator in (4) as , where we set . Similarly, letting , we have .

The catch here is that it get a good approximation of non-local means we need to make both and large. While there is no problem in making large (the cost of the moving sum is independent of ), it is rather difficult to make large. For example, with a separable approximation of , the overall order would scale as , where is the approximation order of the Gaussian along each dimension. This limits the scheme to coarse approximations, and to small neighborhoods. To make this practical, we need a a polynomial or trigonometric approximation of whose order grows slowly with . It would indeed be interesting to see if such approximations exist at all.

## 5 Acknowledgment

The author would like to thank Michael Unser for reading the manuscript and for his useful comments. The author was supported by a fellowship from the Swiss National Science Foundation under grant PBELP2-.

## References

•  E. Adelson and J. Bergen. Spatiotemporal energy models for the preception of motion. J. Optical Society of America, 2(2):284–299, 1985.
•  A. Buades, B. Coll, and J.M. Morel. A review of image denoising algorithms, with a new one. Multiscale Modeling and Simulation, 4:490–530, 2005.
•  K.N. Chaudhury, A. Muñoz-Barrutia, and M. Unser. Fast space-variant elliptical filtering using box splines. IEEE Trans. Image Process., 19:2290–2306, 2010.
•  K.N. Chaudhury, D. Sage, and M. Unser. Appendix to ”Fast bilateral filtering using trigonometric range kernels.”. Technical report, Ecole Polytechnique Federale de Laussanne, http://bigwww.epfl.ch/preprints/chaudhury1101pdoc01.pdf, 2011.
•  K.N. Chaudhury, D. Sage, and M. Unser. Fast bilateral filtering using trigonometric range kernels. IEEE Trans. Image Process., 2011.
•  F.C. Crow. Summed-area tables for texture mapping. ACM Siggraph, 18:207–212, 1984.
•  W. Freeman and E. Adelson. The design and use of steerable filters. IEEE Trans. Pattern Anal. Mach. Intell., 13(9):8891–906, 1991.
•  Y. Hel-Or and P.C. Teo. Canonical decomposition of steerable functions. J. Math. Imaging Vision, 9(1):83–95, 1998.
•  P. Perona. Deformable kernels for early vision. IEEE Trans. Pattern Anal. Mach. Intell., 17(5):488–499, 1991.
•  F. Porikli. Constant time bilateral filtering. IEEE CVPR, pages 1–8, 2008.
•  E. Simoncelli, W. Freeman, E. Adelson, and D. Heeger. Shiftable multiscale transforms. IEEE Trans. Information Theory, 38(2):587–607, 1992.
•  S. M. Smith and J. M. Brady. Susan–A new approach to low level image processing. Int. J. Comp. Vision, 23:45–78, 1997.
•  P.C. Teo. Theory and applications of steerable functions. Technical report, Stanford University, 1998.
•  C. Tomasi and R. Manduchi. Bilateral filtering for gray and color images. IEEE ICCV, pages 839–846, 1998.
•  P. Viola and M. Jones. Rapid object detection using a boosted cascade of simple features. IEEE CVPR, pages 511–518, 2001.
•  A. Watson and A. Ahumada. Models of human visual-motion sensing. J. Optical Society of America, 2(2):322–342, 1985.
•  L.P. Yaroslavsky. Digital Picture Processing–An Introduction. Springer-Verlag, Berlin, 1985.