1 Introduction
Image recognition systems have become widespread in recent years. They are used in many application areas, including mobile devices and embedded systems. Practical usage on different kinds of hardware imposes constraints on the algorithms for those applications. For example, algorithms with high computational complexity cannot be used in real time; however, decreasing computational complexity in many cases leads to poor recognition quality. Moreover, mobile devices do not have graphics processing units (GPUs) as powerful as those of workstations or specialized servers. It is also hard to develop cross-platform applications for GPUs because of the lack of standard tools supporting various devices. Finally, data transfer from the central processing unit (CPU) to the GPU and back may take a long time compared to the computation itself. All these factors force us to perform calculations on the CPU on mobile devices. To increase computational performance we can use the available CPU cores and SIMD extensions, if they are supported by the architecture [1, 2].
One of the basic operations of image processing is projective transformation. It is used in a number of image recognition systems for rectification of the input camera image (for example, in real-time document recognition systems [3], real-time recognition of TV content [4], or visual navigation of unmanned aerial vehicles [5]). Another application area of projective transformation is the rendering of 3D scenes in computer graphics [6], in case they are not rendered by a computationally complex ray tracing algorithm [7]. This area is widely described in the literature; however, the authors of such algorithms do not usually pay much attention to computational efficiency and to implementation aspects important for mobile devices.
It is also worth noting that there are many methods of interpolation and antialiasing used in projective transformation. This is due to the fact that some parts of the image may be scaled up or down as a result of the transformation. In upscaled parts, interpolation is needed to get the value of the signal "between" pixels of the source image. In downscaled parts, some measures need to be taken to avoid aliasing, which can cause artifacts such as moiré patterns.
In this article we look into different variants of projective image transformation that use common methods of interpolation and antialiasing. Most of these methods are well known and were developed in the 1990s or early 2000s, but have been largely forgotten in the field of recognition systems in recent years due to the increase in computational power; nevertheless, computational efficiency remains crucial for mobile applications. We compare the computational complexity and quality of these algorithms and provide experimental time measurements on a mobile CPU with ARM architecture. The results allow one to choose the projective transformation algorithm that fits the constraints of a specific task on a mobile device.
2 Projective image transformation
Let us consider how a digital image of a planar object is formed by a camera. A digital image is the result of sampling an optical image formed by the camera. In simple camera models (for example, the camera obscura or the pinhole camera) the optical image is just a central projection of the planar object. Since the optical images of a planar object formed from two different points of view are both central projections, they are connected by a projective transformation of the plane. So, if we need an image as if it were taken from a different viewpoint (e.g. an orthogonal projection of a document), we can compute it using a projective transformation of the image we already have. A projective transformation in Cartesian coordinates can be expressed as:

\[
x' = \frac{a_{11}x + a_{12}y + a_{13}}{a_{31}x + a_{32}y + a_{33}}, \qquad
y' = \frac{a_{21}x + a_{22}y + a_{23}}{a_{31}x + a_{32}y + a_{33}}, \tag{1}
\]

where \(A = (a_{ij})\) is the matrix of the projective transformation, and \((x, y)\) and \((x', y')\) are the coordinates on the planes of the original and transformed images.
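As an illustration, the mapping (1) can be computed pointwise. The following sketch (the matrix layout and function name are ours, not from the paper) applies a homography to a single point:

```python
# Hypothetical sketch: applying the projective transformation (1) to one point.
# A is a 3x3 homography matrix stored as row-major nested lists.

def project(A, x, y):
    """Map the point (x, y) through the homography A."""
    denom = A[2][0] * x + A[2][1] * y + A[2][2]
    xp = (A[0][0] * x + A[0][1] * y + A[0][2]) / denom
    yp = (A[1][0] * x + A[1][1] * y + A[1][2]) / denom
    return xp, yp
```

With the identity matrix the point is left in place; with a nontrivial third row the division by the denominator produces the perspective effect.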
However, we cannot apply (1) directly to compute the projective transformation of one digital image into another, because both input and output images are discrete. According to [8], projective transformation of a digital image is a special case of the image resampling problem, which consists of three stages:

Image reconstruction. During this stage we retrieve a continuous approximation \(I(x, y)\) of the optical image from the digital image \(I_{m,n}\), where \(M \times N\) is the image size.

Spatial transformation. At this stage we apply a spatial transformation, which in our case is the projective transformation (1), to get the optical image \(I'(x', y')\) as seen from another point of view.

Image sampling. Finally, we sample the optical image \(I'(x', y')\) to get the digital one \(I'_{m,n}\).
However, these steps do not constitute an algorithm, because the second stage cannot be computed directly using a finite number of operations. That is why we consider algorithms based on inverse mapping:

We choose the coordinates of sampling points on the output image, possibly more than one per pixel (a sampling grid).

For each point we calculate its coordinates on the input image using the inverse of transformation (1).

For each point we calculate the value of its intensity using the selected interpolation over nearby pixels of the input image.

Finally, we compute the values of the pixels of the output image from the sampling points.
In these algorithms we can choose different interpolation functions and different ways of selecting and weighting the sampling points.
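The inverse-mapping steps above can be sketched as follows; this is a minimal illustration assuming nearest-pixel interpolation and one sampling point per pixel (the names `warp`, `src`, and `inverse` are ours):

```python
# Minimal inverse-mapping sketch: one sampling point per output pixel,
# nearest-pixel interpolation, images as lists of rows.

def warp(src, w_out, h_out, inverse):
    """inverse(x, y) -> (u, v): output coordinates mapped back onto the input."""
    h_in, w_in = len(src), len(src[0])
    out = [[0] * w_out for _ in range(h_out)]
    for y in range(h_out):
        for x in range(w_out):
            u, v = inverse(x, y)                 # step 2: inverse projection
            i, j = round(v), round(u)            # step 3: nearest-pixel interpolation
            if 0 <= i < h_in and 0 <= j < w_in:  # points outside the source stay 0
                out[y][x] = src[i][j]            # step 4: one sample per pixel
    return out
```

Replacing `round` with a weighted sum over neighbors gives the other interpolation methods discussed below, and taking several jittered samples per pixel gives supersampling.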
2.1 Interpolation
Interpolation is a way to get values of image intensity "between" the pixels of the source image. Its necessity becomes obvious when we deal with image magnification; since projective transformation can magnify some parts of the image, we need interpolation in our algorithm.
If we assume a simple image sampling model, which defines the value of a pixel as the intensity of the optical image at its center, we can in theory derive an ideal reconstruction method. This method is a convolution with an ideal low-pass filter (LPF), and it restores the optical image precisely if the maximal frequency in the Fourier spectrum of the optical image is lower than the Nyquist frequency (half of the sampling frequency) [9]. In fact, this is not a real-life scenario for several reasons: firstly, sampling in real optical systems is not point sampling, and, more importantly, real objects have sharp borders, so the image has an infinite Fourier spectrum and ideal reconstruction is not possible.
So, we consider kernel-based interpolation functions (2). There are other approaches to image reconstruction, for example, adaptive interpolation based on local gradient features [10] or on local isophote orientation [11]. However, they are more computationally complex than non-adaptive methods, so we do not consider them in this work. In all examples we provide the one-dimensional form of the interpolation:

\[
I(x) = \sum_{k} I_k \, u(x - k), \tag{2}
\]

where \(I_k\) are the pixels of the input image and \(u(x)\) is the interpolation kernel.
Nearest-pixel interpolation is the computationally simplest interpolation; it uses the box filter (3). Its Fourier spectrum is very different from that of the LPF (Fig. 1a).

\[
u(x) = \begin{cases} 1, & |x| \le 1/2, \\ 0, & \text{otherwise}. \end{cases} \tag{3}
\]
Bilinear interpolation is the two-dimensional analog of linear interpolation. Its kernel (4) is continuous, therefore the optical image reconstructed with this kernel is also continuous (Fig. 1b). It works well enough far from edges, but smooths object edges.

\[
u(x) = \begin{cases} 1 - |x|, & |x| \le 1, \\ 0, & \text{otherwise}. \end{cases} \tag{4}
\]
Bicubic interpolation is the two-dimensional analog of cubic interpolation. Its kernel (5) has a continuous first derivative, and so does the optical image reconstructed with it (Fig. 1d). The parameter \(a\) is usually set to \(-1/2\) [12]. Far from edges this interpolation produces a smooth image that looks fine; however, an artifact known as halo appears at the edges (a bright line alongside the brighter side of the edge and a dark line alongside the darker one).

\[
u(x) = \begin{cases} (a+2)|x|^3 - (a+3)|x|^2 + 1, & |x| \le 1, \\ a|x|^3 - 5a|x|^2 + 8a|x| - 4a, & 1 < |x| \le 2, \\ 0, & \text{otherwise}. \end{cases} \tag{5}
\]
To get rid of the halo we need the kernel to be non-negative. We also want it to be smooth and to have small support (so that we use as few pixels as possible to compute an interpolated value). The kernel of the cubic B-spline (6) satisfies these conditions. Its Fourier spectrum is very narrow (Fig. 1c), which means the reconstructed image is oversmoothed; this, however, can be a good thing if the input image is noisy.

\[
u(x) = \begin{cases} 2/3 - |x|^2 + |x|^3/2, & |x| \le 1, \\ (2 - |x|)^3/6, & 1 < |x| \le 2, \\ 0, & \text{otherwise}. \end{cases} \tag{6}
\]
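For reference, the kernels (3)–(6) can be written down directly in one dimension; this is a plain sketch with function names chosen by us:

```python
# One-dimensional interpolation kernels (3)-(6); the standard forms,
# with names chosen for this sketch.

def box(x):                      # nearest-pixel kernel (3)
    return 1.0 if abs(x) <= 0.5 else 0.0

def linear(x):                   # bilinear kernel (4), per axis
    x = abs(x)
    return 1.0 - x if x < 1.0 else 0.0

def keys_cubic(x, a=-0.5):       # bicubic kernel (5), a = -1/2 as in [12]
    x = abs(x)
    if x <= 1.0:
        return (a + 2) * x**3 - (a + 3) * x**2 + 1
    if x <= 2.0:
        return a * x**3 - 5 * a * x**2 + 8 * a * x - 4 * a
    return 0.0

def bspline3(x):                 # cubic B-spline kernel (6): non-negative, smooth
    x = abs(x)
    if x <= 1.0:
        return 2 / 3 - x**2 + x**3 / 2
    if x <= 2.0:
        return (2 - x) ** 3 / 6
    return 0.0
```

A quick sanity check on such kernels is the partition-of-unity property: at any fractional position the kernel weights of the surrounding integer samples sum to one, so constant images are reproduced exactly.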
The cubic Hermite spline is yet another interpolation method. If we define the Hermite basis polynomials as:

\[
h_{00}(t) = 2t^3 - 3t^2 + 1, \quad h_{10}(t) = t^3 - 2t^2 + t, \quad h_{01}(t) = -2t^3 + 3t^2, \quad h_{11}(t) = t^3 - t^2, \tag{7}
\]
then the interpolation function for \(t\) between 0 and 1 is:

\[
f(t) = h_{00}(t) f_0 + h_{10}(t) f'_0 + h_{01}(t) f_1 + h_{11}(t) f'_1, \tag{8}
\]
where \(f_0, f_1\) are the values of the target function we want to reconstruct and \(f'_0, f'_1\) are its derivatives. It may seem that this is not a kernel interpolation; however, if we select a fixed finite-difference approximation for the derivative, it becomes one. For example, if we approximate the derivative as:

\[
f'_k = \frac{f_{k+1} - f_{k-1}}{2}, \tag{9}
\]
we will get the cubic kernel (5) with \(a = -1/2\). So, to get a different kernel and to increase the quality of interpolation, we choose [13]:

\[
f'_k = \frac{f_{k-2} - 8 f_{k-1} + 8 f_{k+1} - f_{k+2}}{12}. \tag{10}
\]
The resulting filter's Fourier spectrum is slightly closer to that of the LPF than the spectrum of the cubic kernel (Fig. 1d). This interpolation method still leads to the halo artifact.
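A one-dimensional sketch of the Hermite scheme (7)–(10), with the six required samples passed explicitly (the names and sample layout are our assumptions):

```python
# One-dimensional cubic Hermite interpolation (7)-(8) on [0, 1], with node
# derivatives approximated by the fourth-order difference (10).

def hermite(f, t):
    """f: six samples at integer positions -2..3; t in [0, 1]."""
    f_m2, f_m1, f0, f1, f2, f3 = f
    d0 = (f_m2 - 8 * f_m1 + 8 * f1 - f2) / 12   # derivative estimate at node 0
    d1 = (f_m1 - 8 * f0 + 8 * f2 - f3) / 12     # derivative estimate at node 1
    h00 = 2 * t**3 - 3 * t**2 + 1
    h10 = t**3 - 2 * t**2 + t
    h01 = -2 * t**3 + 3 * t**2
    h11 = t**3 - t**2
    return h00 * f0 + h10 * d0 + h01 * f1 + h11 * d1
```

Since the scheme uses six samples per axis, the two-dimensional version reads 36 pixels per point, matching the memory-access count in Table 1.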
The computational complexity of all the above methods is determined by the number of multiplications and the number of memory accesses (the number of pixels of the input image needed for interpolation of one point) per sampling point. We can decrease the number of multiplications down to the number of memory accesses if we approximate the kernel by a piecewise constant function and store it in a precalculated table [14]. In Table 1 we present the computational complexity of the methods described above. Fig. 2 shows examples of the character "A" upscaled 10 times using the above interpolation methods.
Interpolation  Multiplications per point  Memory accesses per point 

Nearest pixel  0  1 
Bilinear  8  4 
Bicubic  68  16 
Cubic B-spline  68  16 
Cubic Hermite spline  76  36 
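The kernel-table trick from [14] mentioned above can be sketched as follows; the table resolution (64 bins per unit interval) and all names are illustrative choices, not from the paper:

```python
# Piecewise-constant kernel approximation [14], sketched: the continuous
# kernel is tabulated once, so interpolation weights become table lookups
# instead of polynomial evaluations.

BINS = 64  # arbitrary resolution: entries per unit interval

def build_table(kernel, support):
    """Tabulate kernel on [-support, support], sampling each bin at its center."""
    n = 2 * support * BINS
    return [kernel((k + 0.5) / BINS - support) for k in range(n)]

def lookup(table, support, x):
    """Piecewise-constant approximation of the kernel value at x."""
    k = int((x + support) * BINS)
    return table[k] if 0 <= k < len(table) else 0.0

tri = lambda x: max(0.0, 1.0 - abs(x))   # linear kernel (4) as a demo
tab = build_table(tri, 1)
```

The approximation error is bounded by the kernel's variation over one bin, so a few hundred entries are enough in practice for 8-bit images.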
2.2 Sampling
We consider that the digital image is computed from the optical one through convolution with a sampling filter \(w\) (11).

\[
I_{m,n} = \iint I(x, y)\, w(x - m, y - n)\, dx\, dy. \tag{11}
\]
Though this expression cannot be calculated directly and the sampling filter is defined by the physical properties of the optical system, we can approximate it using a finite number of sampling points and some model of the filter. However, if the sampling rate is too low, there will be artifacts on the resulting image caused by aliasing of the Fourier spectra, because the conditions of the Nyquist–Shannon sampling theorem are not met. So, once again, we need to balance computational complexity against the quality of the output image.
The simplest sampling method is point sampling, where \(w(x, y) = \delta(x)\,\delta(y)\). We simply calculate the value at the center of every pixel of the output image. However, it leads to the most significant aliasing artifacts.
To avoid these artifacts we can calculate the value using a sampling grid with a smaller step and then average over the pixel. This method is known as supersampling and uses \(w\) equal to a box filter. Choosing a different \(w\) leads to similar methods, which nevertheless have significantly higher computational complexity than point sampling.
One may notice that aliasing artifacts appear as a result of undersampling in downscaled parts of the image, so to avoid them we can downscale the input image and then apply point sampling. This process is called prefiltering. A way to implement it is to use a mipmap [15], which is an image pyramid where the image on each next level is a twice-downscaled copy of the previous one. For each pixel of the output image we estimate the local scale factor (how significantly this part of the image is downscaled) and choose the mipmap level accordingly. This estimation can be done as follows [16]. The scale factor is

\[
\rho = \max\left( \sqrt{\left(\frac{\partial x}{\partial x'}\right)^2 + \left(\frac{\partial y}{\partial x'}\right)^2},\; \sqrt{\left(\frac{\partial x}{\partial y'}\right)^2 + \left(\frac{\partial y}{\partial y'}\right)^2} \right),
\]

where \(x'\) and \(y'\) are coordinates on the output image, and \(x\) and \(y\) are coordinates on the input one. To avoid sharp borders between regions where different levels of the mipmap are selected, we use linear interpolation between the two nearest mipmap layers.

A mipmap approximates the area of an output pixel on the input image (which is either a quadrangle, if we consider a rectangular \(w\), or an ellipse, if \(w\) has radial symmetry) by a square, which is incorrect in the general case, especially if the projection is highly anisotropic. To partly overcome this problem we can use a mipmap-like data structure that scales the image along its height and width independently [17]. It is known as a ripmap. A ripmap approximates the pixel area by a rectangle with sides parallel to the coordinate axes of the input image. However, the direction of anisotropy is not necessarily parallel to these axes.
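Mipmap level selection and blending between the two nearest layers, as described above, can be sketched like this; `derivs` and `sample` are assumed callbacks, and the scale-factor estimate follows [16]:

```python
import math

# Sketch of mipmap level selection: estimate the local scale factor from
# partial derivatives of the inverse mapping and blend the two nearest levels.
# derivs(x, y) -> (dudx, dudy, dvdx, dvdy): assumed callback giving the
# derivatives of input coords (u, v) with respect to output coords (x, y).

def mip_level(derivs, x, y):
    dudx, dudy, dvdx, dvdy = derivs(x, y)
    # local scale factor: how many input pixels map into one output pixel
    rho = max(math.hypot(dudx, dvdx), math.hypot(dudy, dvdy))
    return max(0.0, math.log2(rho))   # fractional part -> blend weight

def sample_mipmap(levels, derivs, sample, x, y):
    """sample(level_image, x, y, level) is an assumed per-level sampler."""
    lam = mip_level(derivs, x, y)
    lo = min(int(lam), len(levels) - 2)
    frac = lam - lo
    # linear interpolation between the two nearest mipmap layers
    return ((1 - frac) * sample(levels[lo], x, y, lo)
            + frac * sample(levels[lo + 1], x, y, lo + 1))
```

A uniform downscale by a factor of two gives \(\rho = 2\) and thus level 1, as expected.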
To overcome this problem, we can combine the ideas of mipmap prefiltering and supersampling. The authors of the FAST algorithm [18] proposed to use jittered sampling over the mipmap level defined by the minor axis of anisotropy, with the number of samples defined by the major axis.
The computational complexity of sampling methods is determined by the number of samples per pixel and the number of multiplications per sample. If we decompose the backward projection as:

\[
x = \frac{b_{11}x' + b_{12}y' + b_{13}}{b_{31}x' + b_{32}y' + b_{33}}, \qquad
y = \frac{b_{21}x' + b_{22}y' + b_{23}}{b_{31}x' + b_{32}y' + b_{33}}, \tag{12}
\]

where \(B = (b_{ij}) = A^{-1}\) is the matrix of the backward projection, we will need 2 multiplications and 1 division per sample of a uniform sampling grid, because along the grid the numerators and the denominator can be updated incrementally. Calculating partial derivatives, rescaling, and interpolation between levels for mipmap and ripmap adds 5 multiplications per sample (the derivatives can be decomposed as well). FAST sampling may require a lot of samples per pixel in case of significant anisotropy, so their number is usually limited by some constant. Mipmap and ripmap also require additional memory for the data structure. In Table 2 we present the computational complexity of the methods described above. Figure 3 shows examples of these methods in the process of projective transformation of the letter "A".
Sampling  Samples per pixel  Multiplications per sample  Additional memory (in image size) 

Point sampling  1  2  0 
Supersampling  varies  2  0 
Mipmap prefiltering  2  7  1/3 
Ripmap prefiltering  4  7  3 
FAST  varies  7  1/3 
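The per-sample cost of the backward projection (12) (2 multiplications and 1 division) comes from updating the three linear forms incrementally along a grid row; a hypothetical sketch:

```python
# Incremental evaluation of the backward projection (12) along one output row.
# The three linear forms p, q, s are advanced by additions; only the final
# division and two multiplications remain per sample.

def backproject_row(B, y, width):
    """B: 3x3 backward-projection matrix; returns source coords for output row y."""
    # values of the linear forms at (x' = 0, y' = y)
    p = B[0][1] * y + B[0][2]
    q = B[1][1] * y + B[1][2]
    s = B[2][1] * y + B[2][2]
    coords = []
    for _ in range(width):
        inv = 1.0 / s                        # 1 division
        coords.append((p * inv, q * inv))    # 2 multiplications
        p += B[0][0]; q += B[1][0]; s += B[2][0]   # per-step increments
    return coords
```

The increments are simply the first column of \(B\), since moving one pixel along the row increases \(x'\) by one.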
3 Experiments
To experimentally estimate the computational complexity and quality of different algorithms of projective image transformation, we apply a set of 3 projective transformations to an input image, such that their composition is the identity transformation, and then measure the average time and the average quality in terms of PSNR (13) between the input and output images.

\[
\mathrm{PSNR} = 10 \log_{10} \frac{255^2}{\mathrm{MSE}}, \tag{13}
\]

where MSE is the mean squared error between the two images.
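The PSNR metric (13) is straightforward to compute; a small sketch for 8-bit images stored as flat lists of pixel values:

```python
import math

# PSNR (13) for 8-bit images given as flat, equal-length lists of pixels.

def psnr(a, b, peak=255.0):
    mse = sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)
    return float('inf') if mse == 0 else 10 * math.log10(peak ** 2 / mse)
```

Identical images give infinite PSNR, and a uniform error of the full dynamic range gives 0 dB.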
To conduct the experiment we took 20 document images of size pixels and 10 sets of matrices, and measured time and quality on an ODROID-XU4 microcomputer with an ARM Cortex-A7 CPU (2000 MHz, 2 GB of RAM). The results of the experiment are presented in Table 3.
The experiments confirm that using nearest-pixel interpolation or point sampling leads to poor quality, while Hermite spline interpolation and supersampling are too computationally complex and take too much time. This makes those methods ineffective in practical real-time applications for mobile devices. It is worth mentioning that mipmap prefiltering and cubic B-spline interpolation lead to image blur and, as a result, to low PSNR; however, in some cases that may not be a serious problem. Using bicubic or cubic B-spline interpolation with any sampling method but point sampling also leads to high computational complexity and running time. Ripmap prefiltering requires a lot of additional memory and at the same time does not work as well as FAST, while having approximately the same computational complexity and running time. So, the best choice for real-time applications on mobile devices seems to be projective image transformation algorithms based on the combination of bilinear interpolation with either mipmap prefiltering or FAST sampling. The former works faster, but the latter provides better quality.
Sampling  Interpolation  Time (sec)  PSNR (dB) 

Point sampling 
Nearest pixel  0.074  23.1 
Bilinear  0.161  25.4  
Bicubic  0.496  26.0  
Cubic Bspline  0.517  24.8  
Hermite spline  0.978  26.2  
Supersampling (49 samples) 
Nearest pixel  4.01  25.6 
Bilinear  7.76  25.3  
Bicubic  23.7  26.1  
Cubic Bspline  24.7  24.6  
Hermite spline  46.9  26.2  
Mipmap prefiltering 
Nearest pixel  0.194  24.0 
Bilinear  0.302  24.6  
Bicubic  0.770  25.3  
Cubic Bspline  0.800  23.9  
Hermite spline  1.56  25.5  
Ripmap prefiltering 
Nearest pixel  0.328  24.2 
Bilinear  0.510  25.0  
Bicubic  1.231  25.8  
Cubic Bspline  1.270  24.2  
Hermite spline  2.41  25.8  
FAST 
Nearest pixel  0.342  24.0 
Bilinear  0.539  25.2  
Bicubic  1.31  26.0  
Cubic Bspline  1.36  24.4  
Hermite spline  2.42  26.2 
4 Discussion & Conclusion
In this work we compared different algorithms of projective transformation of a digital image that use different interpolation methods (nearest pixel, bilinear, B-spline, bicubic, Hermite spline) and different sampling techniques (point sampling, supersampling, mipmap prefiltering, ripmap prefiltering, and FAST). Estimation of computational complexity and experiments on a typical mobile CPU confirmed that bicubic, B-spline, and especially Hermite spline interpolation are too expensive to compute in real-time mobile applications, while the quality of bilinear interpolation is higher than that of nearest-pixel interpolation. At the same time, supersampling is also too computationally complex, ripmap prefiltering requires more memory while its quality is lower than that of FAST sampling, and point sampling produces significant aliasing artifacts. According to these results, the best choices for real-time applications are projective image transformation algorithms that use bilinear interpolation combined with either mipmap prefiltering or FAST sampling.
It is worth mentioning that if an application does not have a real-time constraint, one might prefer bicubic, B-spline, or one of the adaptive interpolation methods to increase the quality of the projective transformation.
Acknowledgements.
This work is partially supported by the Russian Foundation for Basic Research (projects 17-29-03297 and 18-07-01387).

References

[1] E. Limonova, D. Ilin, and D. Nikolaev, "Improving neural network performance on SIMD architectures," in ICMV 2015, 9875(98750L), 1–6, SPIE (2015). DOI: 10.1117/12.2228594.
[2] E. Limonova, A. Terekhin, D. Nikolaev, and V. Arlazarov, "Fast implementation of morphological filtering using ARM NEON extension," IJAER 11(24), 11675–11680 (2016).
[3] K. Bulatov, V. V. Arlazarov, T. Chernov, O. Slavin, and D. Nikolaev, "Smart IDReader: Document recognition in video stream," in CBDAR 2017, 39–44 (2018). DOI: 10.1109/ICDAR.2017.347.
[4] N. S. Skoryukina, T. S. Chernov, K. B. Bulatov, D. P. Nikolaev, and V. L. Arlazarov, "Snapscreen: TV-stream frame search with projectively distorted and noisy query," in ICMV 2016, 10341, 1–5, SPIE, Bellingham, Washington, USA (2017).
[5] D. Stepanov, "Techniques of feature points matching in the problem of UAV's visual navigation," Bulletin of the South Ural State University. Series: Computational Mathematics and Software Engineering 4(4) (2015).
[6] P. S. Heckbert, "Survey of texture mapping," IEEE Computer Graphics and Applications 6(11), 56–67 (1986).
[7] A. Maltsev and M. Mikhayluk, "Regular grid construction and using for real-time high realistic 3D scene visualization," Journal of Information Technologies and Computing Systems (4), 49–58 (2012).
[8] G. Wolberg, Digital Image Warping, IEEE Computer Society Press, Los Alamitos, CA (1990).
[9] P. S. Diniz, E. A. Da Silva, and S. L. Netto, Digital Signal Processing: System Analysis and Design, Cambridge University Press (2010).
[10] J. W. Hwang and H. S. Lee, "Adaptive image interpolation based on local gradient features," IEEE Signal Processing Letters 11(3), 359–362 (2004).
[11] Q. Wang and R. K. Ward, "A new orientation-adaptive interpolation method," IEEE Transactions on Image Processing 16(4), 889–900 (2007).
[12] R. Keys, "Cubic convolution interpolation for digital image processing," IEEE Transactions on Acoustics, Speech, and Signal Processing 29(6), 1153–1160 (1981).
[13] R. Seta, K. Okubo, and N. Tagawa, "Digital image interpolation method using higher-order Hermite interpolating polynomials with compact finite-difference," in Proceedings of APSIPA ASC 2009, 406–409 (2009).
[14] E. A. Feibush, M. Levoy, and R. L. Cook, "Synthetic texturing using digital filters," in ACM SIGGRAPH Computer Graphics, 14, 294–301, ACM (1980).
[15] L. Williams, "Pyramidal parametrics," in ACM SIGGRAPH Computer Graphics, 17, 1–11, ACM (1983).
[16] J. P. Ewins, M. D. Waller, M. White, and P. F. Lister, "Mip-map level selection for texture mapping," IEEE Transactions on Visualization and Computer Graphics 4(4), 317–329 (1998).
[17] P. S. Heckbert, "Fundamentals of texture mapping and image warping," (1989).
[18] B. Chen, F. Dachille, and A. E. Kaufman, "Footprint area sampled texturing," IEEE Transactions on Visualization and Computer Graphics 10(2), 230–240 (2004).