The Analysis of Projective Transformation Algorithms for Image Recognition on Mobile Devices

Anton Trusov, et al. · 12/03/2019

In this work we apply commonly known methods of non-adaptive interpolation (nearest pixel, bilinear, B-spline, bicubic, Hermite spline) and sampling (point sampling, supersampling, mip-map pre-filtering, rip-map pre-filtering, and FAST) to the problem of projective image transformation. We compare their computational complexity, describe their artifacts, and then experimentally measure their quality and running time on a mobile processor with the ARM architecture. These methods were widely developed in the 90s and early 2000s but have not been an area of active research in recent years due to a lower need for computationally efficient algorithms. However, real-time mobile recognition systems, which attract more and more attention, not only require fast projective transformation methods but also demand high-quality images without artifacts. In this work we therefore identify methods appropriate for such systems, which avoid artifacts while preserving low computational complexity. Based on the experimental results for our setting, these are bilinear interpolation combined with either mip-map pre-filtering or FAST sampling, though the choice can be adjusted for specific use cases.

1 Introduction

Image recognition systems have become widespread in recent years. They are used in many application areas, including mobile devices and embedded systems. Practical usage on different kinds of hardware imposes constraints on the algorithms of those applications: for example, algorithms with high computational complexity cannot be used in real time, yet decreasing computational complexity often leads to poor recognition quality. Moreover, mobile devices do not have graphics processing units (GPUs) as powerful as those of workstations or specialized servers. It is also hard to develop cross-platform applications for GPUs because of the lack of standard tools supporting various devices, and data transfer from the central processing unit (CPU) to the GPU and back may take considerable time compared to the computation itself. All these factors force us to perform the computation on the CPU of the mobile device. To increase computational performance we can use the available CPU cores and SIMD extensions, if they are supported by the architecture [1, 2].

One of the basic operations of image processing is projective transformation. It is used in a number of image recognition systems for rectification of the input image from the camera (for example, in real-time document recognition systems [3], real-time recognition of TV content [4], or visual navigation of unmanned aerial vehicles [5]). Another application area of projective transformation is the rendering of 3D scenes in computer graphics [6], when they are not rendered by a computationally complex ray-tracing algorithm [7]. This area is widely described in the literature; however, the authors of such algorithms usually pay little attention to computational efficiency and to implementation aspects important for mobile devices.

It is also worth noting that there are many methods of interpolation and anti-aliasing used in projective transformation. This is due to the fact that some parts of the image may be scaled up and others scaled down as a result of the transformation. In up-scaled parts, interpolation is needed to get the signal value "between" the pixels of the source image. In down-scaled parts, measures must be taken to avoid aliasing, which can cause artifacts such as moiré patterns.

In this article we examine different variants of projective image transformation that use common methods of interpolation and anti-aliasing. Most of these methods are well known and were developed in the 90s and early 2000s, but they have been largely forgotten in the field of recognition systems in recent years due to the growth of computational power; nevertheless, computational efficiency remains crucial for mobile applications. We compare the computational complexity and quality of these algorithms and provide experimental time measurements on a mobile CPU with the ARM architecture. The results allow choosing the projective transformation algorithm that fits the constraints of a specific task on a mobile device.

2 Projective image transformation

Let us consider how a digital image of a planar object is formed by a camera. A digital image is the result of sampling an optical image formed by the camera, which in simple camera models (for example, camera obscura or pinhole camera) is just a central projection of the planar object. Since optical images of the planar object formed from two different points of view are both central projections, they are related by a projective transformation of the plane. So if we need an image as if it were taken from a different viewpoint (e.g. an orthogonal projection of a document), we can compute it by a projective transformation of the image we already have. In Cartesian coordinates a projective transformation can be expressed as:

\[ x' = \frac{h_{11}x + h_{12}y + h_{13}}{h_{31}x + h_{32}y + h_{33}}, \qquad y' = \frac{h_{21}x + h_{22}y + h_{23}}{h_{31}x + h_{32}y + h_{33}}, \tag{1} \]

where \(H = (h_{ij})\) is the matrix of the projective transformation, and \((x, y)\), \((x', y')\) are the coordinates on the planes of the original and the transformed image.
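As a simple illustration, the following sketch (hypothetical helper names; the matrix is stored row-major, and the denominator is assumed non-zero at the mapped point) applies (1) to a single point:

```cpp
#include <array>

// Row-major 3x3 projective transformation matrix H.
using Mat3 = std::array<std::array<double, 3>, 3>;

// Maps (x, y) on the source plane to (x2, y2) on the target plane per (1).
inline void projectPoint(const Mat3& h, double x, double y,
                         double& x2, double& y2) {
    const double w = h[2][0] * x + h[2][1] * y + h[2][2];  // denominator
    x2 = (h[0][0] * x + h[0][1] * y + h[0][2]) / w;
    y2 = (h[1][0] * x + h[1][1] * y + h[1][2]) / w;
}
```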

However, we cannot apply (1) directly to compute the projective transformation of one digital image into another, because both the input and the output images are discrete. According to [8], projective transformation of a digital image is a special case of the image resampling problem, which consists of three stages:

  1. Image reconstruction. During this stage we retrieve a continuous approximation \(\hat I(x, y)\) of the optical image from the digital image \(I[m, n]\), \(0 \le m < W\), \(0 \le n < H\), where \(W \times H\) is the image size.

  2. Spatial transformation. At this stage we apply a spatial transformation, which in our case is the projective transformation (1), to get the optical image from another point of view, \(\hat I'(x', y')\).

  3. Image sampling. Finally, we sample the optical image \(\hat I'\) to get the digital one, \(I'[m, n]\).

However, these steps are not an algorithm, because the second stage cannot be computed directly using a finite number of operations. That is why we consider algorithms based on inverse mapping:

  1. We choose the coordinates of sampling points on the output image, possibly more than one per pixel (the sampling grid).

  2. For each sampling point we calculate the corresponding coordinates on the input image using the inverse of transformation (1).

  3. For each point we calculate its intensity value using the selected interpolation over nearby pixels of the input image.

  4. Finally, we compute the pixel values of the output image from the sampling points.

In these algorithms we can choose different interpolation functions and different ways of selecting and weighting the sampling points.
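As an illustration, here is a minimal sketch of this inverse-mapping scheme for an 8-bit grayscale image, with one sampling point per pixel (i.e. point sampling); the Image type is a hypothetical container, and projectPoint is the helper from the sketch above:

```cpp
#include <cstdint>
#include <vector>

struct Image {                       // hypothetical 8-bit grayscale image
    int width = 0, height = 0;
    std::vector<uint8_t> data;       // row-major pixel values
    uint8_t at(int x, int y) const { return data[y * width + x]; }
};

// Inverse-mapping projective transform, one sample per output pixel.
// hInv is the backward projection (output -> input coordinates);
// interpolate() reconstructs the input intensity at a real-valued point.
// dst must be pre-allocated to the desired output size.
void warpProjective(const Image& src, Image& dst, const Mat3& hInv,
                    double (*interpolate)(const Image&, double, double)) {
    for (int yo = 0; yo < dst.height; ++yo)
        for (int xo = 0; xo < dst.width; ++xo) {
            double xi, yi;
            projectPoint(hInv, xo + 0.5, yo + 0.5, xi, yi);  // pixel center
            double v = interpolate(src, xi, yi);
            v = v < 0.0 ? 0.0 : (v > 255.0 ? 255.0 : v);     // clamp
            dst.data[yo * dst.width + xo] = static_cast<uint8_t>(v + 0.5);
        }
}
```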

2.1 Interpolation

Interpolation is a way to get image intensity values "between" the pixels of the source image. Its necessity becomes obvious when we deal with image magnification: since projective transformation can magnify some parts of the image, our algorithm needs interpolation.

If we assume a simple image sampling model, which defines the value of a pixel as the intensity of the optical image at its center, we can in theory derive an ideal reconstruction method. This method is convolution with an ideal low-pass filter (LPF), and it restores the optical image precisely if the maximal frequency in the Fourier spectrum of the optical image does not exceed the Nyquist frequency (half the sampling frequency) [9]. In practice this is not a realistic scenario for several reasons: firstly, sampling in real optical systems is not point sampling and, more importantly, real objects have sharp borders, so the image has an infinite Fourier spectrum and ideal reconstruction is not possible.

So, we consider kernel-based interpolation functions (2). There are different approaches to image reconstruction, for example adaptive interpolation based on local gradient features [10] or on local isophote orientation [11]; however, they are more computationally complex than non-adaptive methods, so we do not consider them in this work. In all examples we give the one-dimensional form of the interpolation:

\[ f(x) = \sum_k f_k \, K(x - k), \tag{2} \]

where \(f_k\) are the pixel values of the input image and \(K\) is the interpolation kernel.

Nearest-pixel interpolation is the computationally simplest interpolation; it uses the box filter (3). Its Fourier spectrum is very different from that of the LPF (Fig. 1a).

\[ K(x) = \begin{cases} 1, & |x| \le 1/2, \\ 0, & \text{otherwise}. \end{cases} \tag{3} \]

Bilinear interpolation is the two-dimensional analog of linear interpolation. Its kernel (4) is continuous, therefore the optical image reconstructed with it is also continuous (Fig. 1b). It works well enough far from edges but smooths object edges.

\[ K(x) = \begin{cases} 1 - |x|, & |x| \le 1, \\ 0, & \text{otherwise}. \end{cases} \tag{4} \]
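A sketch of bilinear interpolation for the hypothetical Image type above (assuming a pixel-center coordinate convention and clamped borders):

```cpp
#include <algorithm>
#include <cmath>

// Bilinear interpolation of a grayscale image at a real-valued point (x, y).
double bilinear(const Image& img, double x, double y) {
    x -= 0.5; y -= 0.5;                      // pixel centers -> pixel indices
    const int x0 = static_cast<int>(std::floor(x));
    const int y0 = static_cast<int>(std::floor(y));
    const double fx = x - x0, fy = y - y0;   // fractional parts
    auto pix = [&](int xi, int yi) {         // border-clamped pixel fetch
        xi = std::clamp(xi, 0, img.width - 1);
        yi = std::clamp(yi, 0, img.height - 1);
        return static_cast<double>(img.at(xi, yi));
    };
    const double top = (1 - fx) * pix(x0, y0)     + fx * pix(x0 + 1, y0);
    const double bot = (1 - fx) * pix(x0, y0 + 1) + fx * pix(x0 + 1, y0 + 1);
    return (1 - fy) * top + fy * bot;        // 4 memory accesses (cf. Table 1)
}
```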

Bicubic interpolation is the two-dimensional analog of cubic convolution interpolation. Its kernel (5) has a continuous first derivative, and so does the optical image reconstructed with it (Fig. 1d). The parameter \(a\) is usually set to \(-1/2\) [12]. Far from edges this interpolation produces a smooth image that looks fine; however, an artifact known as halo appears at the edges (a bright line along the brighter side of the edge and a dark line along the darker side).

\[ K(x) = \begin{cases} (a + 2)|x|^3 - (a + 3)|x|^2 + 1, & |x| \le 1, \\ a\,(|x|^3 - 5|x|^2 + 8|x| - 4), & 1 < |x| < 2, \\ 0, & \text{otherwise}. \end{cases} \tag{5} \]

To get rid of the halo the kernel must be non-negative. We also want it to be smooth and to have small support (so that as few pixels as possible are used to compute an interpolated value). The kernel of the cubic B-spline (6) satisfies these conditions. Its Fourier spectrum is very narrow (Fig. 1c), which means the reconstructed image is over-smoothed; this can, however, be an advantage if the input image is noisy.

\[ K(x) = \frac{1}{6} \begin{cases} 4 - 6|x|^2 + 3|x|^3, & |x| \le 1, \\ (2 - |x|)^3, & 1 < |x| < 2, \\ 0, & \text{otherwise}. \end{cases} \tag{6} \]
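For reference, plausible implementations of the kernels (5) and (6) in their standard forms (a sketch, not taken verbatim from an implementation by the authors):

```cpp
#include <cmath>

// Keys cubic convolution kernel (5); a = -0.5 is the common choice [12].
double bicubicKernel(double x, double a = -0.5) {
    x = std::fabs(x);
    if (x <= 1.0) return (a + 2.0) * x * x * x - (a + 3.0) * x * x + 1.0;
    if (x < 2.0)  return a * (x * x * x - 5.0 * x * x + 8.0 * x - 4.0);
    return 0.0;
}

// Cubic B-spline kernel (6): non-negative and smooth, support of width 4.
double bsplineKernel(double x) {
    x = std::fabs(x);
    if (x <= 1.0) return (4.0 - 6.0 * x * x + 3.0 * x * x * x) / 6.0;
    if (x < 2.0)  { const double t = 2.0 - x; return t * t * t / 6.0; }
    return 0.0;
}
```

In two dimensions, (2) is applied separably, \(\sum_{i,j} f_{ij} K(x - i) K(y - j)\), over a 4×4 neighborhood, which accounts for the 16 memory accesses per point in Table 1.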

Cubic Hermite spline is yet another interpolation method. If we define the Hermite basis polynomials as

\[ h_{00}(t) = 2t^3 - 3t^2 + 1, \quad h_{10}(t) = t^3 - 2t^2 + t, \quad h_{01}(t) = -2t^3 + 3t^2, \quad h_{11}(t) = t^3 - t^2, \tag{7} \]

then the interpolation function for \(t\) between 0 and 1 is

\[ f(t) = h_{00}(t)\,f_0 + h_{10}(t)\,f'_0 + h_{01}(t)\,f_1 + h_{11}(t)\,f'_1, \tag{8} \]

where \(f_0, f_1\) are the values of the target function we want to reconstruct and \(f'_0, f'_1\) are its derivatives. It may seem that this is not a kernel interpolation; however, if we select a fixed finite-difference approximation for the derivatives, it becomes one. For example, if we approximate the derivative as

\[ f'_k = \frac{f_{k+1} - f_{k-1}}{2}, \tag{9} \]

we get the cubic kernel (5) with \(a = -1/2\). So, to get a different kernel and to increase the quality of interpolation, we choose the higher-order approximation [13]:

\[ f'_k = \frac{f_{k-2} - 8 f_{k-1} + 8 f_{k+1} - f_{k+2}}{12}. \tag{10} \]

The resulting filter's Fourier spectrum is slightly closer to that of the LPF than the spectrum of the cubic kernel (Fig. 1d). This interpolation method still produces the halo artifact.
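Under the reconstruction above (the exact difference scheme of [13] may differ), a one-dimensional sketch of Hermite interpolation with the derivative approximation (10):

```cpp
// Cubic Hermite interpolation (7)-(8) on [k, k+1] at t in [0, 1].
// f[0..5] hold the samples at positions k-2 .. k+3; the derivatives are
// approximated by the fourth-order central difference (10).
double hermite1d(const double f[6], double t) {
    const double d0 = (f[0] - 8.0 * f[1] + 8.0 * f[3] - f[4]) / 12.0; // f'(k)
    const double d1 = (f[1] - 8.0 * f[2] + 8.0 * f[4] - f[5]) / 12.0; // f'(k+1)
    const double t2 = t * t, t3 = t2 * t;
    return (2 * t3 - 3 * t2 + 1) * f[2]      // h00 * f(k)
         + (t3 - 2 * t2 + t) * d0            // h10 * f'(k)
         + (-2 * t3 + 3 * t2) * f[3]         // h01 * f(k+1)
         + (t3 - t2) * d1;                   // h11 * f'(k+1)
}
```

Applied separably in two dimensions this uses a 6×6 neighborhood, which matches the 36 memory accesses per point in Table 1.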

The computational complexity of all the above methods is determined by the number of multiplications and the number of memory accesses (the number of input-image pixels needed to interpolate one point) per sampling point. The number of multiplications can be reduced to the number of memory accesses if we approximate the kernel by a piecewise-constant function and store it in a pre-calculated table [14]. Table 1 lists the computational complexity of the methods described above, and Fig. 2 shows examples of the character "A" up-scaled 10 times with each of them.
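A sketch of such a pre-calculated kernel table (hypothetical names; the resolution of 256 entries per unit is an arbitrary choice):

```cpp
#include <functional>
#include <vector>

// Piecewise-constant approximation of a kernel K on [-support, support],
// tabulated once so that each weight becomes a table lookup instead of a
// polynomial evaluation [14].
struct KernelTable {
    std::vector<double> t;
    double support;      // half-width of K's support (2 for the cubics above)
    int perUnit;         // table entries per unit of x

    KernelTable(const std::function<double(double)>& k,
                double support, int perUnit = 256)
        : t(static_cast<size_t>(2 * support * perUnit)),
          support(support), perUnit(perUnit) {
        for (size_t i = 0; i < t.size(); ++i)
            t[i] = k(-support + (i + 0.5) / perUnit);
    }

    double operator()(double x) const {       // approximate K(x) by lookup
        const int i = static_cast<int>((x + support) * perUnit);
        return (i < 0 || i >= static_cast<int>(t.size())) ? 0.0 : t[i];
    }
};

// Usage: KernelTable keys([](double x) { return bicubicKernel(x); }, 2.0);
```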

Interpolation | Multiplications per point | Memory accesses per point
Nearest pixel | 0 | 1
Bilinear | 8 | 4
Bicubic | 68 | 16
Cubic B-spline | 68 | 16
Cubic Hermite spline | 76 | 36
Table 1: Computational complexity of interpolation methods.
Figure 1: Interpolation kernels (first row) and their Fourier spectra (second row): a) nearest pixel, b) bilinear, c) B-spline, d) bicubic and Hermite spline.
Figure 2: Interpolation examples: a) nearest pixel, b) bilinear, c) B-spline, d) bicubic, e) Hermite spline.

2.2 Sampling

We consider that the digital image is computed from the optical one through convolution with a sampling filter \(\varphi\) (11):

\[ I[m, n] = \iint \hat I(x, y)\, \varphi(x - m,\, y - n)\, dx\, dy. \tag{11} \]

Though this expression cannot be calculated directly, and the sampling filter is defined by the physical properties of the optical system, we can approximate it using a finite number of sampling points and some model of the filter. However, if the sampling rate is too low, aliasing of the Fourier spectra causes artifacts on the resulting image, because the conditions of the Nyquist–Shannon sampling theorem are not met. So, once again, we need to balance computational complexity against the quality of the output image.

The simplest sampling method is point sampling, where \(\varphi\) is the Dirac delta function: we simply calculate the value at the center of every pixel of the output image. However, this leads to the most significant aliasing artifacts.

To avoid those artifacts we can calculate values on a sampling grid with a smaller step and then average them over each pixel. This method is known as supersampling and corresponds to \(\varphi\) equal to a box filter. Choosing a different \(\varphi\) leads to similar methods, which nevertheless have significantly higher computational complexity than point sampling.
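A sketch of supersampling with a k × k box filter over one output pixel, reusing the hypothetical types above (k = 7 gives the 49 samples per pixel used in the experiments of Section 3):

```cpp
// Average of k*k point samples taken on a regular sub-grid of output
// pixel (xo, yo); equivalent to a box sampling filter.
double supersample(const Image& src, const Mat3& hInv, int xo, int yo, int k,
                   double (*interpolate)(const Image&, double, double)) {
    double sum = 0.0;
    for (int j = 0; j < k; ++j)
        for (int i = 0; i < k; ++i) {
            const double xs = xo + (i + 0.5) / k;  // sample inside the pixel
            const double ys = yo + (j + 0.5) / k;
            double xi, yi;
            projectPoint(hInv, xs, ys, xi, yi);    // backward projection
            sum += interpolate(src, xi, yi);
        }
    return sum / (k * k);
}
```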

One may notice that aliasing artifacts appear as a result of undersampling in the down-scaled parts of the image, so to avoid them we can down-scale the input image first and then apply point sampling. This process is called pre-filtering. One way to implement it is a mip-map [15]: an image pyramid in which each level is a twice down-scaled copy of the previous one. For each pixel of the output image we estimate the local scale factor (how significantly this part of the image is down-scaled) and choose the mip-map level accordingly. The estimation can be done as follows [16]. The scale factor is

\[ l = \max\left( \sqrt{\left(\tfrac{\partial x}{\partial x'}\right)^2 + \left(\tfrac{\partial y}{\partial x'}\right)^2},\; \sqrt{\left(\tfrac{\partial x}{\partial y'}\right)^2 + \left(\tfrac{\partial y}{\partial y'}\right)^2} \right), \]

where \((x', y')\) are coordinates on the output image and \((x, y)\) are coordinates on the input one. To avoid sharp borders between regions where different mip-map levels are selected, we use linear interpolation between the two nearest mip-map layers.
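A sketch of the level selection under the formula above (the arguments are the partial derivatives of the input coordinates with respect to the output ones, which for (1) can be computed analytically):

```cpp
#include <algorithm>
#include <cmath>

// Local scale factor and (fractional) mip-map level at an output point.
double mipLevel(double dx_dxo, double dy_dxo, double dx_dyo, double dy_dyo) {
    const double sx = std::sqrt(dx_dxo * dx_dxo + dy_dxo * dy_dxo);
    const double sy = std::sqrt(dx_dyo * dx_dyo + dy_dyo * dy_dyo);
    const double l = std::max(sx, sy);   // local scale factor
    // The integer part selects two adjacent pyramid levels; the fractional
    // part weights the linear blend between them.
    return std::max(0.0, std::log2(l));
}
```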

A mip-map approximates the area of an output pixel on the input image (which is a quadrangle if we consider a rectangular \(\varphi\), or an ellipse if \(\varphi\) has radial symmetry) by a square, which is incorrect in the general case, especially if the projection is highly anisotropic. To partly overcome this problem we can use a mip-map-like data structure that scales the image along its height and width independently [17], known as a rip-map. A rip-map approximates the pixel area by a rectangle with sides parallel to the coordinate axes of the input image. However, the direction of anisotropy is not necessarily parallel to those axes.

To overcome this problem, we can combine the ideas of mip-map pre-filtering and supersampling. The authors of the FAST algorithm [18] proposed jittered sampling over the mip-map level defined by the minor axis of anisotropy, with the number of samples defined by the major axis.

The computational complexity of sampling methods is determined by the number of samples per pixel and the number of multiplications per sample. If we decompose the backward projection as

\[ x = \frac{u}{w}, \quad y = \frac{v}{w}, \qquad u = h_{11}x' + h_{12}y' + h_{13}, \quad v = h_{21}x' + h_{22}y' + h_{23}, \quad w = h_{31}x' + h_{32}y' + h_{33}, \tag{12} \]

where \(H^{-1} = (h_{ij})\) is the matrix of the backward projection, we need only 2 multiplications and 1 division per sample of a uniform sampling grid, because \(u\), \(v\) and \(w\) can be updated incrementally along the grid. Calculating the partial derivatives, rescaling and interpolating between levels for mip-map and rip-map adds 5 multiplications per sample (the derivatives can be decomposed in the same way). FAST sampling may require many samples per pixel in case of significant anisotropy, so their number is usually limited by some constant \(N_{\max}\). Mip-map and rip-map also require additional memory for their data structures. Table 2 lists the computational complexity of the methods described above, and Fig. 3 shows examples of a projective transformation of the letter "A" with these methods.
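A sketch of this incremental scheme for one output scanline (u, v, w are the linear forms of (12); each sample costs one division and two multiplications):

```cpp
// Backward projection of a whole output row y' = yo: the linear forms are
// updated by additions, so each sample needs 1 division and 2 multiplications.
void projectRow(const Mat3& h, int yo, int width, double* xs, double* ys) {
    double u = h[0][1] * yo + h[0][2];   // numerator of x at x' = 0
    double v = h[1][1] * yo + h[1][2];   // numerator of y at x' = 0
    double w = h[2][1] * yo + h[2][2];   // common denominator at x' = 0
    for (int xo = 0; xo < width; ++xo) {
        const double inv = 1.0 / w;      // 1 division
        xs[xo] = u * inv;                // 1 multiplication
        ys[xo] = v * inv;                // 1 multiplication
        u += h[0][0]; v += h[1][0]; w += h[2][0];  // incremental update
    }
}
```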

Sampling | Samples per pixel | Multiplications per sample | Additional memory (in image size)
Point sampling | 1 | 2 | 0
Supersampling | k² (for a k×k grid) | 2 | 0
Mip-map pre-filtering | 2 | 7 | 1/3
Rip-map pre-filtering | 4 | 7 | 3
FAST | ≤ N_max | 7 | 1/3
Table 2: Computational complexity of sampling methods.
Figure 3: Sampling examples: a) point sampling, b) supersampling, c) mip-map pre-filtering, d) rip-map pre-filtering, e) FAST.

3 Experiments

To experimentally estimate the computational complexity and quality of different projective image transformation algorithms, we apply a set of 3 projective transformations to an input image, such that their composition is the identity transformation, and then measure the average time and the average quality in terms of PSNR (13) between the input and output images:

\[ \mathrm{PSNR} = 10 \log_{10} \frac{I_{\max}^2}{\mathrm{MSE}}, \tag{13} \]

where \(I_{\max}\) is the maximal possible pixel intensity and \(\mathrm{MSE}\) is the mean squared error between the two images.
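A straightforward sketch of (13) for the 8-bit images considered here (I_max = 255), reusing the hypothetical Image type from Section 2:

```cpp
#include <cmath>
#include <limits>

// PSNR between two equally sized 8-bit grayscale images, per (13).
double psnr(const Image& a, const Image& b) {
    double mse = 0.0;
    const size_t n = a.data.size();
    for (size_t i = 0; i < n; ++i) {
        const double d = static_cast<double>(a.data[i]) - b.data[i];
        mse += d * d;
    }
    mse /= static_cast<double>(n);
    if (mse == 0.0) return std::numeric_limits<double>::infinity();
    return 10.0 * std::log10(255.0 * 255.0 / mse);
}
```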

To conduct the experiment we took 20 document images and 10 sets of transformation matrices, and measured time and quality on an ODROID-XU4 microcomputer with an ARM Cortex-A7 CPU (2000 MHz, 2 GB of RAM). The results of the experiment are presented in Table 3.

The experiments confirm that using nearest-pixel interpolation or point sampling leads to poor quality, while Hermite spline interpolation and supersampling are too computationally complex and take too long to apply. This makes those methods ineffective in practical real-time applications for mobile devices. It is worth mentioning that mip-map pre-filtering and cubic B-spline interpolation lead to image blur and, as a result, to low PSNR; however, in some cases that may not be a serious problem. Using bicubic or cubic B-spline interpolation with any sampling method other than point sampling also leads to high computational complexity and long working times. Rip-map pre-filtering requires a lot of additional memory and at the same time does not perform as well as FAST, while having approximately the same computational complexity and working time. So, the best choice for real-time applications on mobile devices appears to be projective image transformation algorithms based on bilinear interpolation combined with either mip-map or FAST sampling: the former is faster, but the latter provides better quality.

Sampling | Interpolation | Time (sec) | PSNR (dB)
Point sampling | Nearest pixel | 0.074 | 23.1
| Bilinear | 0.161 | 25.4
| Bicubic | 0.496 | 26.0
| Cubic B-spline | 0.517 | 24.8
| Hermite spline | 0.978 | 26.2
Supersampling (49 samples) | Nearest pixel | 4.01 | 25.6
| Bilinear | 7.76 | 25.3
| Bicubic | 23.7 | 26.1
| Cubic B-spline | 24.7 | 24.6
| Hermite spline | 46.9 | 26.2
Mip-map pre-filtering | Nearest pixel | 0.194 | 24.0
| Bilinear | 0.302 | 24.6
| Bicubic | 0.770 | 25.3
| Cubic B-spline | 0.800 | 23.9
| Hermite spline | 1.56 | 25.5
Rip-map pre-filtering | Nearest pixel | 0.328 | 24.2
| Bilinear | 0.510 | 25.0
| Bicubic | 1.231 | 25.8
| Cubic B-spline | 1.270 | 24.2
| Hermite spline | 2.41 | 25.8
FAST | Nearest pixel | 0.342 | 24.0
| Bilinear | 0.539 | 25.2
| Bicubic | 1.31 | 26.0
| Cubic B-spline | 1.36 | 24.4
| Hermite spline | 2.42 | 26.2
Table 3: Time and quality measurements for the considered interpolation and sampling methods.

4 Discussion & Conclusion

In this work we compared different projective image transformation algorithms built from different interpolation methods (nearest pixel, bilinear, B-spline, bicubic, Hermite spline) and different sampling techniques (point sampling, supersampling, mip-map pre-filtering, rip-map pre-filtering and FAST). Estimation of computational complexity and experiments on a typical mobile CPU confirmed that bicubic, B-spline and especially Hermite spline interpolation are too hard to compute in real-time mobile applications, while the quality of bilinear interpolation is higher than that of nearest-pixel interpolation. At the same time, supersampling is also too computationally complex, rip-map pre-filtering requires more memory while its quality is lower than that of FAST sampling, and point sampling produces significant aliasing artifacts. According to these results, the best choices for real-time applications are projective image transformation algorithms that use bilinear interpolation combined with either mip-map pre-filtering or FAST sampling.

It is worth mentioning that if an application has no real-time constraints, one might prefer bicubic, B-spline or one of the adaptive interpolation methods to increase the quality of the projective transformation.

Acknowledgements.
This work is partially supported by the Russian Foundation for Basic Research (projects 17-29-03297 and 18-07-01387).

References

  • [1] E. Limonova, D. Ilin, and D. Nikolaev, "Improving neural network performance on SIMD architectures," in ICMV 2015, 9875(98750L), 1–6, SPIE (2015). DOI: 10.1117/12.2228594.
  • [2] E. Limonova, A. Terekhin, D. Nikolaev, and V. Arlazarov, “Fast implementation of morphological filtering using ARM NEON extension,” IJAER 11(24), 11675–11680 (2016).
  • [3] K. Bulatov, V. V. Arlazarov, T. Chernov, O. Slavin, and D. Nikolaev, “Smart IDReader: Document recognition in video stream,” in CBDAR 2017 , 39–44 (2018). DOI: 10.1109/ICDAR.2017.347.
  • [4] N. S. Skoryukina, T. S. Chernov, K. B. Bulatov, D. P. Nikolaev, and V. L. Arlazarov, "Snapscreen: TV-stream frame search with projectively distorted and noisy query," in ICMV 2016, 10341, 1–5, SPIE, Bellingham, Washington, USA (July 2017).
  • [5] D. Stepanov, "Techniques of feature points matching in the problem of UAV's visual navigation," Bulletin of the South Ural State University. Series: Computational Mathematics and Software Engineering 4(4) (2015).
  • [6] P. S. Heckbert, “Survey of texture mapping,” IEEE computer graphics and applications 6(11), 56–67 (1986).
  • [7] A. Maltsev and M. Mikhayluk, “Regular grid construction and using for real-time high realistic 3d scene visualization,” Journal of information technologies and computing systems (4), 49–58 (2012).
  • [8] G. Wolberg, Digital Image Warping, vol. 10662, IEEE Computer Society Press, Los Alamitos, CA (1990).
  • [9] P. S. Diniz, E. A. Da Silva, and S. L. Netto, Digital signal processing: system analysis and design , Cambridge University Press (2010).
  • [10] J. W. Hwang and H. S. Lee, “Adaptive image interpolation based on local gradient features,” IEEE signal processing letters 11(3), 359–362 (2004).
  • [11] Q. Wang and R. K. Ward, “A new orientation-adaptive interpolation method,” IEEE Transactions on Image Processing 16(4), 889–900 (2007).
  • [12] R. Keys, “Cubic convolution interpolation for digital image processing,” IEEE transactions on acoustics, speech, and signal processing 29(6), 1153–1160 (1981).
  • [13] R. Seta, K. Okubo, and N. Tagawa, “Digital image interpolation method using higher-order hermite interpolating polynomials with compact finite-difference,” in Proceedings: APSIPA ASC 2009: Asia-Pacific Signal and Information Processing Association, 2009 Annual Summit and Conference , 406–409 (2009).
  • [14] E. A. Feibush, M. Levoy, and R. L. Cook, “Synthetic texturing using digital filters,” in ACM SIGGRAPH Computer Graphics , 14, 294–301, ACM (1980).
  • [15] L. Williams, “Pyramidal parametrics,” in Acm siggraph computer graphics , 17, 1–11, ACM (1983).
  • [16] J. P. Ewins, M. D. Waller, M. White, and P. F. Lister, “Mip-map level selection for texture mapping,” IEEE Transactions on Visualization and Computer Graphics 4(4), 317–329 (1998).
  • [17] P. S. Heckbert, “Fundamentals of texture mapping and image warping,” (1989).
  • [18] B. Chen, F. Dachille, and A. E. Kaufman, “Footprint area sampled texturing,” IEEE Transactions on Visualization and Computer Graphics 10(2), 230–240 (2004).