Motion-Compensated Temporal Filtering for Critically-Sampled Wavelet-Encoded Images

We propose a novel motion estimation/compensation (ME/MC) method for wavelet-based (in-band) motion compensated temporal filtering (MCTF), with application to low-bitrate video coding. Unlike conventional in-band MCTF algorithms, which require redundancy to overcome the shift-variance problem of critically sampled (complete) discrete wavelet transforms (DWT), we perform the ME/MC steps directly on DWT coefficients, avoiding the need for shift invariance. By deriving the exact relationships between the DWT subbands of the input image sequences, we omit upsampling, the inverse DWT (IDWT), and the calculation of redundant DWT coefficients, while achieving arbitrary subpixel accuracy without interpolation and high video quality even at very low bitrates. Experimental results demonstrate the accuracy of the proposed method, confirming that our model for ME/MC effectively improves video coding quality.


I Introduction

Motion Estimation/Compensation and video coding have a wide range of applications in various areas of image/video processing, including restoration [113, 121, 116, 47, 33, 117, 94, 111, 107, 108, 112, 109, 32, 123, 34, 67, 114, 115, 120, 7], content/context analysis [139, 140, 141, 138, 39, 38, 147, 95, 101, 2, 1, 55, 3, 37, 54], surveillance [71, 77, 135, 84, 134, 16, 137, 75, 74, 76, 70, 69, 17, 11, 132], action recognition [129, 18, 13, 136, 125, 133, 19, 124, 127, 14, 15, 36, 126, 128, 10], self-localization [79, 78, 85, 86, 83, 87], tracking [131, 97, 99, 122, 98], scene modeling [82, 35, 68, 12], and video post-production [48, 49, 130, 29, 143, 100, 4, 5, 66, 23, 144, 45], to name a few.

Reliable motion estimation/compensation can substantially reduce the residual energy in the coding of video data. Motion estimation methods are either global [65, 60, 30, 31, 6, 20, 21, 119, 61, 118, 27, 26, 64, 25, 62, 24, 22] or local [58, 110, 59, 57] in terms of how they treat the transformation relating two images. There is also a separate but related body of work on camera motion quantification, which requires online or offline calibration of the camera [46, 44, 52, 80, 41, 43, 40, 72, 81, 51, 63, 73, 89, 90, 91, 9, 88, 96, 28, 42, 50]. While these methods and their variations have been proposed in the past for motion compensation in different applications, space-time subband/wavelet coding [102] is by far the method of choice for coding and compressing images and videos due to its superior performance. Its effectiveness, however, can be significantly improved with motion compensation, which is the topic of the method proposed in this paper.

II Related Work

Still image coding [8] and video coding [146] are important topics of research in the coding and compression of multimedia data. On the other hand, scalable video coding [142, 103] is an emerging trend in numerous multimedia applications with heterogeneous networks, due to its ability to adapt to different resolution and quality requirements. Recently, a large body of research has focused on wavelet-based methods [106, 8, 93, 53], where motion compensated temporal filtering (MCTF) is shown to play an essential role in both scalable video coding and still image coding. MCTF is performed either directly on input images or on their transforms. Thus, MCTF methods can be categorized into two groups depending on the order of temporal and spatial transforms. MCTF techniques which perform temporal decomposition before a spatial transform include Secker and Taubman [106], and Pesquet-Popescu and Bottreau [104], who used the lifting formulation of a three-dimensional temporal wavelet decomposition for motion compensated video compression. Kim et al. [92] proposed a low bit-rate embedded video coding scheme based on a 3-D extension of set partitioning in hierarchical trees (3D-SPIHT). More recently, Xiong et al. [145] extended the spatiotemporal subband transform to in-scale motion compensation to exploit temporal and cross-resolution correlations simultaneously, predicting low-pass subbands from the next lower resolution and high-pass subbands from neighboring frames in the same resolution layer. Furthermore, Chen and Liu [53] used an adaptive Lagrange multiplier selection model in rate-distortion optimization (RDO) for motion estimation. In order to obtain more accurate motion data, Esche et al. [56] proposed an interpolation method for per-pixel motion information using block-based motion data, and Rüfenacht et al. [105] anchor motion fields at reference frames instead of target frames to resolve folding ambiguities in the vicinity of motion discontinuities.

Although the methods cited above perform well, they suffer from drifting and operational mismatch problems [145]. Performing the spatial transform before the temporal decomposition was therefore introduced to overcome these drawbacks. However, since the complete DWT is shift variant, several methods were proposed to achieve in-band ME/MC (i.e., directly in the wavelet domain) by resorting to redundancy. Van der Auwera et al. [142] used a bottom-up prediction algorithm for a bottom-up overcomplete discrete wavelet transform (ODWT). Park and Kim [103] proposed a low-band-shift method, constructing the wavelet tree by shifting the low-band subband at each level by one pixel in the horizontal, vertical, and diagonal directions and then downsampling. Andreopoulos et al. [8] defined a complete-to-overcomplete discrete wavelet transform (CODWT), which avoids the inverse DWT generally used to obtain the ODWT. More recently, Liu and Ngan [93] used partial distortion search and anisotropic double cross search algorithms with the MCTF method of [8] for fast motion estimation. Amiot et al. [7] perform MCTF for denoising, using dual-tree complex wavelet (DT-CW) coefficients.

All MCTF methods summarized above perform motion estimation/motion compensation either in the temporal domain before the DWT, or in the wavelet domain with the help of redundancy (e.g., ODWT, DT-CW, etc.), due to the fact that the complete DWT is shift-variant and motion estimation directly on DWT subbands is a challenging task. However, the redundancy in these methods leads to high computational complexity [93]. Inspired by the facts that shift variance preserves the perfect reconstruction and nonredundancy properties of wavelets while breaking the coupling between spatial subbands, and that wavelet codecs always operate on complete DWT subbands [8], we propose a novel in-band ME/MC method, which avoids the need for shift invariance and operates directly on the original DWT coefficients of the input sequences. Since Haar wavelets are widely utilized in MCTF methods due to the coding efficiency of their short kernel filters [8], our method is built on Haar subbands. For accurate ME/MC, we derive the exact relationships between the DWT subbands of the input video sequences, which allows us to avoid upsampling, the inverse DWT, redundancy, and interpolation for subpixel accuracy.

The rest of the paper is organized as follows. We introduce the problem and our proposed solution in Section III. We define the derived exact inter-subband relationships in Section IV, demonstrate the experimental results in Section V, and finally conclude our paper in Section VI.

III Motion Compensated Temporal Filtering

In this section, we explain our proposed method for in-band motion compensated temporal filtering, operating directly on DWT subbands.

Fig. 1: A block diagram of the proposed in-band Motion Compensated Temporal Filtering model.

The wavelet transform provides localization both in time and frequency; therefore, it is straightforward to use wavelets in MCTF. In order to perform ME/MC in MCTF, wavelet subbands of the transformed signal need to be predicted. However, due to decimation and expansion operations of DWT, direct band-to-band estimation is generally not practical [103]. The proposed method overcomes this challenge by revealing the relationships between subbands of reference and target frames.

The proposed in-band MCTF method is illustrated in Fig. 1. Given a video sequence, the DWT is first performed on each frame for spatial decomposition; a temporal decomposition is then performed by splitting the video frames into groups. ME/MC (Fig. 1) is performed by block matching, using the reference frames to predict the target frames. Employing the resulting motion vectors (MVs), the reference frames are mapped onto the target frames to generate the error frames in Fig. 1, which are then quantized and encoded/decoded by a wavelet codec, together with the MVs.

We employ the Haar wavelet decomposition for the spatial transform due to the benefits mentioned earlier. Since the method in Section IV is accurate for any arbitrary subpixel translation defined as a multiple of 1/2^k, where k is the number of hypothetically added decomposition levels, our method does not need interpolation for subpixel accuracy. A block matching method with unidirectional full search, a common choice for MCTF, is used for the ME/MC steps. Our cost function is based on mean square error minimization using all subbands, as follows:

E = \sum_{B \in \{A, H, V, D\}} \| B_t - \hat{B}_t \|_2^2   (1)

where B_t denotes the original target frame wavelet subbands, and \hat{B}_t are the estimated subbands for the same target image, obtained using the method described in Section IV and a reference frame.
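The full-search ME/MC step with the all-subband cost of Eq. (1) can be sketched as follows. This is a minimal illustration with integer-valued candidate shifts at subband resolution (the paper's subpixel candidates would instead be generated with the inter-subband matrices of Section IV), and all function and variable names are ours:

```python
import numpy as np

def subband_block_cost(target_blocks, predicted_blocks):
    """Sum of squared errors accumulated over all four Haar subbands
    (approximation, horizontal, vertical, diagonal) of a block."""
    return sum(float(np.sum((t - p) ** 2))
               for t, p in zip(target_blocks, predicted_blocks))

def full_search(target, reference, block, search=4):
    """Unidirectional full-search block matching on subband tuples.
    `target` and `reference` are tuples (A, H, V, D) of equally sized
    2-D arrays; returns the motion vector minimizing the cost for the
    block whose top-left corner in the subbands is given by `block`."""
    by, bx, bs = block  # top-left row/col and block size, in subband pixels
    tgt = [s[by:by + bs, bx:bx + bs] for s in target]
    best, best_mv = np.inf, (0, 0)
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            y, x = by + dy, bx + dx
            if (y < 0 or x < 0 or
                    y + bs > reference[0].shape[0] or
                    x + bs > reference[0].shape[1]):
                continue  # candidate block falls outside the subband
            ref = [s[y:y + bs, x:x + bs] for s in reference]
            cost = subband_block_cost(tgt, ref)
            if cost < best:
                best, best_mv = cost, (dy, dx)
    return best_mv, best
```

The returned motion vector points from the target block to the best-matching reference block, as in conventional block matching.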

IV Inter-subband Relationship

The in-band (wavelet-domain) shift method, along with the related notation, is provided in this section.

IV-A Notation

Here, we provide the notation used throughout the paper in Table I, for a better understanding of the proposed method and to prevent confusion.

f_t : Input video frame at time t
A_t, H_t, V_t, D_t : Haar wavelet transform approximation, horizontal, vertical, and diagonal subbands of the input image, respectively
F, K, L : Coefficient matrices to be multiplied by the approximation, horizontal, vertical, and diagonal DWT subbands
k : Number of hypothetically added levels in case of non-integer shifts
m : Integer shift amount after the hypothetically added levels (i.e., a shift of m/2^k at the original level)
TABLE I: Notation

Bold letters in the following sections denote matrices and vectors. The subscripts x and y indicate the horizontal and vertical translation directions, respectively. Finally, the subscript t indicates the tth video frame.

IV-B In-band Shifts

Our goal for the MCTF method described in Section III is to achieve ME/MC in the wavelet domain using DWT subbands, given a video frame sequence. For this purpose, the wavelet subbands of the transformed signal should be predicted using only the DWT subbands of the reference frame. Therefore, we derive the relationship between the subbands of the transformed and reference images, which can be described by an in-band shift (in the wavelet domain) of the reference image subbands. Below, we derive the mathematical expressions that capture these relationships.

Let A_t, H_t, V_t, and D_t be the first-level approximation, horizontal, vertical, and diagonal detail coefficients (subbands), respectively, of a reference frame f_t at time t. Since the decimation operator in the wavelet transform reduces the size of the input frame by half in each direction for each subband, we require the frame sizes to be divisible by 2. Now, the first-level subbands of a translated frame in any direction (i.e., horizontal, vertical, or diagonal) can be expressed in matrix form using the first-level Haar transform subbands of the reference frame as in the following equations:

As already mentioned in Section IV-A, F, K, and L stand for the coefficient matrices to be multiplied by the lowpass and highpass subbands of the reference frame, where the subscripts x and y indicate horizontal and vertical shifts, and the left-hand sides are the subbands of the translated frame. The low/high-pass subbands of both the reference and transformed frames are half the size of the input frame in each direction, and the coefficient matrices are sized to match.
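For reference, a single level of the 2-D Haar transform producing the four subbands used throughout the paper can be sketched as follows; orthonormal scaling is assumed here, and the paper's exact filter normalization may differ:

```python
import numpy as np

def haar_dwt2(x):
    """One level of the 2-D Haar transform (orthonormal scaling),
    returning the approximation, horizontal, vertical, and diagonal
    subbands; the input side lengths must be divisible by 2."""
    a = x[0::2, 0::2]; b = x[0::2, 1::2]
    c = x[1::2, 0::2]; d = x[1::2, 1::2]
    A = (a + b + c + d) / 2
    H = (a - b + c - d) / 2   # detail along the horizontal direction
    V = (a + b - c - d) / 2   # detail along the vertical direction
    D = (a - b - c + d) / 2
    return A, H, V, D

def haar_idwt2(A, H, V, D):
    """Inverse of haar_dwt2 (perfect reconstruction)."""
    m, n = A.shape
    x = np.empty((2 * m, 2 * n))
    x[0::2, 0::2] = (A + H + V + D) / 2
    x[0::2, 1::2] = (A - H + V - D) / 2
    x[1::2, 0::2] = (A + H - V - D) / 2
    x[1::2, 1::2] = (A - H - V + D) / 2
    return x
```

Each subband has half the side length of the input frame in each direction, consistent with the divisibility requirement above.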

By examining the translational shifts between the subbands of two input frames in the Haar domain, we observe that a purely horizontal translation reduces L to zero and the vertical coefficient matrix to the identity. This can be verified from the coefficient matrices defined later in this section (namely, Eq. (3)) by setting the related vertical components (the vertical shift and level parameters) to zero. Likewise, a vertical translation depends solely on the approximation and vertical detail coefficients, in which case K reduces to zero and the horizontal coefficient matrix equals the identity.

Here, we first define the matrices for subpixel shift amounts. The algorithm to reach any shift amount using the subpixel relationship will be described later in this section.

For subpixel translation, contrary to the customary model of approximating a subpixel shift by upsampling an image followed by an integer shift, our method models subpixel shift directly based on the original coefficients of the reference frame, without upsampling and the need for interpolation. To this end, we resort to the following observations:

(1) Upsampling an image by a factor of 2^k is equivalent to adding k levels to the bottom of its wavelet transform and setting the new detail coefficients to zero, while the approximation coefficients remain the same, as demonstrated in Fig. 2, where the gray subbands show the added zeros.

(2) Shifting the upsampled image by an amount m is equivalent to shifting the original image by an amount m/2^k, where k is the number of added levels (as in Fig. 2).

Fig. 2: Upsampling illustration.
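Observation (2) can be checked numerically. The sketch below (our own illustration) uses pixel replication as a stand-in for a Haar synthesis level with zero details, ignoring scale factors, and shows that shifting the upsampled image by m = 2 and decimating equals shifting the original image by m/2^k = 1 (k = 1 added level):

```python
import numpy as np

def upsample2(x):
    """Upsampling by 2 in each direction via pixel replication, i.e. a
    Haar-style synthesis level whose detail subbands are all zero
    (up to a constant scale factor, which is ignored here)."""
    return np.repeat(np.repeat(x, 2, axis=0), 2, axis=1)

# Shift the upsampled image by 2 pixels horizontally, then decimate:
# the result matches shifting the original image by 1 pixel.
x = np.arange(36.0).reshape(6, 6)
shifted_then_decimated = np.roll(upsample2(x), 2, axis=1)[::2, ::2]
assert np.allclose(shifted_then_decimated, np.roll(x, 1, axis=1))
```

The circular (np.roll) boundary handling is only for the illustration; the paper adapts the matrices for boundary conditions as described below.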

These observations allow us to perform an in-band shift of the reference subbands by a subpixel amount, without upsampling or interpolation, which saves memory and reduces the computational cost. The transformed signal can therefore be obtained using the original first-level subbands of the reference image, with the help of a hypothetically added level (k) and an integer shift value (m) at the added level.

Now, the aforementioned coefficient matrices F_x, K_x, and L_x can be defined in lower bidiagonal Toeplitz matrix form as follows:

(3)

where m_x and k_x denote the integer shift amount at the hypothetically added level and the number of added levels for the x direction, respectively. The F_y, K_y, and L_y matrices are defined in a similar manner by upper bidiagonal Toeplitz matrices, using the y-direction values for m and k.

As mentioned earlier, the subbands are half the size of the input frame in each direction, and the coefficient matrices are sized to match. The sizes of these matrices also indicate that the in-band shift of the subbands is performed using only the original first-level Haar coefficients, without upsampling. When the shift amount is negative, the diagonals of the coefficient matrices interchange. For the MCTF method proposed in Section III, the matrices are adapted for the boundary conditions by adding one more column/row at the end, with the subband sizes adjusted accordingly.
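To illustrate the bidiagonal Toeplitz form (but not the exact entries of Eq. (3), which follow from the paper's derivation), the sketch below builds a lower bidiagonal Toeplitz matrix with linear-interpolation weights, an assumption of ours, and applies it as a fractional delay to a 1-D signal:

```python
import numpy as np

def lower_bidiagonal_toeplitz(n, main, sub):
    """n-by-n Toeplitz matrix with `main` on the diagonal and `sub` on
    the first subdiagonal; the paper's coefficient matrices have this
    shape (upper bidiagonal for the opposite shift direction)."""
    return main * np.eye(n) + sub * np.eye(n, k=-1)

# Illustration only: with linear-interpolation weights (an assumption,
# not the paper's coefficients), the matrix delays a 1-D signal by a
# fractional amount alpha, spreading each sample over two positions.
alpha = 0.25
F = lower_bidiagonal_toeplitz(5, 1 - alpha, alpha)
s = np.array([0.0, 0.0, 1.0, 0.0, 0.0])
shifted = F @ s
```

Flipping the nonzero diagonal from the subdiagonal to the superdiagonal reverses the shift direction, mirroring the sign behavior described above.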

The relationship defined above for subpixel shifts can be used to produce any shift amount, based on the fact that wavelet subbands are periodically shift-invariant. Table II demonstrates the calculation of an arbitrary shift using subpixels, where mod stands for the modulo operation, and ⌊s⌋ and ⌈s⌉ are the greatest integer lower than and the smallest integer higher than the shift amount s, respectively. By circularly shifting the subbands by the amount given for each shift case, and setting the shift amount to the new value in Table II, we can calculate any fractional or integer shift using subpixels.

Shift amount | Circular shift | New shift amount
TABLE II: Arbitrary shifts defined by circular shift and subpixel amount

If the shift amount (or the new shift amount in Table II) is not a multiple of 1/2^k, then in order to reach an integer value of m at the kth added level, the shift value at the original level is rounded to the closest value that is a multiple of 1/2^k.
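The decomposition of an arbitrary shift into a circular integer shift plus a subpixel remainder of the form m/2^k, with the rounding described above, can be sketched as follows (the function name and variable letters are ours):

```python
import math

def decompose_shift(s, k):
    """Split a real-valued shift into an integer circular shift and a
    subpixel remainder expressed as m / 2**k (m integer), rounding the
    remainder to the nearest representable value."""
    whole = math.floor(s)          # circular shift of the subbands
    frac = s - whole               # remainder in [0, 1)
    m = round(frac * 2 ** k)       # nearest integer shift at level k
    if m == 2 ** k:                # remainder rounded all the way up to 1
        whole, m = whole + 1, 0
    return whole, m

# e.g. a shift of 2.3 with k = 2 added levels decomposes into a
# circular shift by 2 plus an in-band subpixel shift of m / 2**k = 1/4.
```

Negative shifts work the same way: the floor pushes the integer part down and the remainder stays in [0, 1), matching the periodic shift-invariance used in Table II.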

V Experimental Results

In this section, we compare the results obtained with our method to methods that perform in-band MCTF for video coding. We report our results on standard CIF video sequence examples. The block size is set according to the resolution of the sequences (in order to have an integer number of blocks in the subbands) and the required accuracy. Even though our MCTF method is based on a 1-level DWT, we perform additional spatial decomposition levels after the ME/MC steps before encoding, to match the total number of spatial decomposition levels used by the compared methods. Motion vectors and error frames are encoded using context-adaptive variable-length coding (CAVLC) and global thresholding with Huffman coding, respectively.

Fig. 3: Rate-distortion comparison for the Football sequence.

Fig. 3 shows the comparison of our method with two conventional in-band methods, direct wavelet subband matching (band-to-band) and wavelet-block low-band-shift (LBS) [103], for the CIF video sequence "Football". The graph demonstrates the rate-distortion curves for a predicted frame of the Football sequence, where the shown bitrates are for the error frame only (as in the compared methods), and the subpixel accuracy of our method is fixed. As seen in this figure, our method improves PSNR over the conventional in-band methods across the bitrate range.

Fig. 4: PSNR performance of the proposed method.
Fig. 5: Residual images for predicted frames of Foreman at two different bitrates (left and right).

We demonstrate our results for several video sequences at different bitrates in Fig. 4, where the bitrates include the luminance component only, covering the reference frame, the error frame, and the MVs. The two graphs correspond to two different subpixel accuracy and block size settings. We also show the residual images for a predicted frame of the Foreman sequence in Fig. 5, at two different bitrates. The examples show how our method reduces the residual signal energy even at very low bitrates by providing more accurate reconstruction (prediction).
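The PSNR figures reported above follow the standard luminance PSNR definition, which can be computed as:

```python
import numpy as np

def psnr(original, reconstructed, peak=255.0):
    """Peak signal-to-noise ratio in dB between a frame and its
    motion-compensated prediction (8-bit peak value assumed)."""
    mse = float(np.mean((np.asarray(original, dtype=np.float64)
                         - np.asarray(reconstructed, dtype=np.float64)) ** 2))
    return float('inf') if mse == 0 else 10.0 * np.log10(peak ** 2 / mse)
```

Lower residual energy in the error frame translates directly into lower MSE, and hence higher PSNR at a given bitrate.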

VI Conclusion

We propose a novel method for wavelet-based (in-band) ME/MC for MCTF in video coding, where the DWT is applied before the temporal decomposition, and the ME/MC steps are performed directly on DWT subbands. We avoid the need for the shift-invariance property of the non-redundant DWT (required by conventional methods for ME/MC) by deriving the exact relationships between the DWT subbands of the reference and transformed video frames. Our method avoids upsampling, the inverse DWT (IDWT), and the calculation of redundant DWT coefficients, while achieving high accuracy even at very low bitrates. Experimental results demonstrate the accuracy of the presented ME/MC method, confirming that our model effectively improves video coding quality by reducing the residual energy in the error frames. The proposed ME/MC scheme can also be adapted to several other image/video processing applications, such as denoising or scalable video coding.

References

  • [1] Muhamad Ali and Hassan Foroosh. Natural scene character recognition without dependency on specific features. In Proc. International Conference on Computer Vision Theory and Applications, 2015.
  • [2] Muhamad Ali and Hassan Foroosh. A holistic method to recognize characters in natural scenes. In Proc. International Conference on Computer Vision Theory and Applications, 2016.
  • [3] Muhammad Ali and Hassan Foroosh. Character recognition in natural scene images using rank-1 tensor decomposition. In Proc. of International Conference on Image Processing (ICIP), pages 2891–2895, 2016.
  • [4] Mais Alnasser and Hassan Foroosh. Image-based rendering of synthetic diffuse objects in natural scenes. In Proc. IAPR Int. Conference on Pattern Recognition, volume 4, pages 787–790, 2006.
  • [5] Mais Alnasser and Hassan Foroosh. Rendering synthetic objects in natural scenes. In Proc. of IEEE International Conference on Image Processing (ICIP), pages 493–496, 2006.
  • [6] Mais Alnasser and Hassan Foroosh. Phase shifting for non-separable 2d haar wavelets. IEEE Transactions on Image Processing, 16:1061–1068, 2008.
  • [7] Carole Amiot, Catherine Girard, Jérémie Pescatore, Jocelyn Chanussot, and Michel Desvignes. Fluoroscopic sequence denoising using a motion compensated multi-scale temporal filtering. In ICIP, pages 691–695. IEEE, 2015.
  • [8] Yiannis Andreopoulos, Adrian Munteanu, Geert Van der Auwera, Jan PH Cornelis, and Peter Schelkens. Complete-to-overcomplete discrete wavelet transforms: theory and applications. IEEE Transactions on Signal Processing, 53(4):1398–1412, 2005.
  • [9] Nazim Ashraf and Hassan Foroosh. Robust auto-calibration of a ptz camera with non-overlapping fov. In Proc. International Conference on Pattern Recognition (ICPR), 2008.
  • [10] Nazim Ashraf and Hassan Foroosh. Human action recognition in video data using invariant characteristic vectors. In Proc. of IEEE Int. Conf. on Image Processing (ICIP), pages 1385–1388, 2012.
  • [11] Nazim Ashraf and Hassan Foroosh. Motion retrieval using consistency of epipolar geometry. In Proceedings of IEEE International Conference on Image Processing (ICIP), pages 4219–4223, 2015.
  • [12] Nazim Ashraf, Imran Junejo, and Hassan Foroosh. Near-optimal mosaic selection for rotating and zooming video cameras. Proc. of Asian Conf. on Computer Vision, pages 63–72, 2007.
  • [13] Nazim Ashraf, Yuping Shen, Xiaochun Cao, and Hassan Foroosh. View-invariant action recognition using weighted fundamental ratios. Journal of Computer Vision and Image Understanding (CVIU), 117:587–602, 2013.
  • [14] Nazim Ashraf, Yuping Shen, Xiaochun Cao, and Hassan Foroosh. View invariant action recognition using weighted fundamental ratios. Computer Vision and Image Understanding, 117(6):587–602, 2013.
  • [15] Nazim Ashraf, Yuping Shen, and Hassan Foroosh. View-invariant action recognition using rank constraint. In Proc. of IAPR Int. Conf. Pattern Recognition (ICPR), pages 3611–3614, 2010.
  • [16] Nazim Ashraf, Chuan Sun, and Hassan Foroosh. Motion retrieval using low-rank decomposition of fundamental ratios. In Proc. IEEE International Conference on Image Processing (ICIP), pages 1905–1908, 2012.
  • [17] Nazim Ashraf, Chuan Sun, and Hassan Foroosh. Motion retrieval using low-rank decomposition of fundamental ratios. In Image Processing (ICIP), 2012 19th IEEE International Conference on, pages 1905–1908, 2012.
  • [18] Nazim Ashraf, Chuan Sun, and Hassan Foroosh. View-invariant action recognition using projective depth. Journal of Computer Vision and Image Understanding (CVIU), 123:41–52, 2014.
  • [19] Nazim Ashraf, Chuan Sun, and Hassan Foroosh. View invariant action recognition using projective depth. Computer Vision and Image Understanding, 123:41–52, 2014.
  • [20] Vildan Atalay and Hassan Foroosh. In-band sub-pixel registration of wavelet-encoded images from sparse coefficients. Signal, Image and Video Processing, 2017.
  • [21] Vildan A. Aydin and Hassan Foroosh. Motion compensation using critically sampled dwt subbands for low-bitrate video coding. In Proc. IEEE International Conference on Image Processing (ICIP), 2017.
  • [22] Murat Balci, Mais Alnasser, and Hassan Foroosh. Alignment of maxillofacial ct scans to stone-cast models using 3d symmetry for backscattering artifact reduction. In Proceedings of Medical Image Understanding and Analysis Conference, 2006.
  • [23] Murat Balci, Mais Alnasser, and Hassan Foroosh. Image-based simulation of gaseous material. In Proc. of IEEE International Conference on Image Processing (ICIP), pages 489–492, 2006.
  • [24] Murat Balci, Mais Alnasser, and Hassan Foroosh. Subpixel alignment of mri data under cartesian and log-polar sampling. In Proc. of IAPR Int. Conf. Pattern Recognition, volume 3, pages 607–610, 2006.
  • [25] Murat Balci and Hassan Foroosh. Estimating sub-pixel shifts directly from phase difference. In Proc. of IEEE International Conference on Image Processing (ICIP), pages 1057–1060, 2005.
  • [26] Murat Balci and Hassan Foroosh. Estimating sub-pixel shifts directly from the phase difference. In Proc. of IEEE Int. Conf. Image Processing (ICIP), volume 1, pages I–1057, 2005.
  • [27] Murat Balci and Hassan Foroosh. Inferring motion from the rank constraint of the phase matrix. In Proc. IEEE Conf. on Acoustics, Speech, and Signal Processing, volume 2, pages ii–925, 2005.
  • [28] Murat Balci and Hassan Foroosh. Metrology in uncalibrated images given one vanishing point. In Proc. of IEEE International Conference on Image Processing (ICIP), pages 361–364, 2005.
  • [29] Murat Balci and Hassan Foroosh. Real-time 3d fire simulation using a spring-mass model. In Proc. of Int. Multi-Media Modelling Conference, pages 8–pp, 2006.
  • [30] Murat Balci and Hassan Foroosh. Sub-pixel estimation of shifts directly in the fourier domain. IEEE Trans. on Image Processing, 15(7):1965–1972, 2006.
  • [31] Murat Balci and Hassan Foroosh. Sub-pixel registration directly from phase difference. Journal of Applied Signal Processing, special issue on Super-resolution Imaging, 2006:1–11, 2006.
  • [32] M Berthod, M Werman, H Shekarforoush, and J Zerubia. Refining depth and luminance information using super-resolution. In Proc. of IEEE Conf. Computer Vision and Pattern Recognition (CVPR), pages 654–657, 1994.
  • [33] Marc Berthod, Hassan Shekarforoush, Michael Werman, and Josiane Zerubia. Reconstruction of high resolution 3d visual information. In IEEE Conf. Computer Vision and Pattern Recognition (CVPR), pages 654–657, 1994.
  • [34] Adeel Bhutta and Hassan Foroosh. Blind blur estimation using low rank approximation of cepstrum. Image Analysis and Recognition, pages 94–103, 2006.
  • [35] Adeel A Bhutta, Imran N Junejo, and Hassan Foroosh. Selective subtraction when the scene cannot be learned. In Proc. of IEEE International Conference on Image Processing (ICIP), pages 3273–3276, 2011.
  • [36] Hakan Boyraz, Syed Zain Masood, Baoyuan Liu, Marshall Tappen, and Hassan Foroosh. Action recognition by weakly-supervised discriminative region localization.
  • [37] Ozan Cakmakci, Gregory E. Fasshauer, Hassan Foroosh, Kevin P. Thompson, and Jannick P. Rolland. Meshfree approximation methods for free-form surface representation in optical design with applications to head-worn displays. In Proc. SPIE Conf. on Novel Optical Systems Design and Optimization XI, volume 7061, 2008.
  • [38] Ozan Cakmakci, Brendan Moore, Hassan Foroosh, and Jannick Rolland. Optimal local shape description for rotationally non-symmetric optical surface design and analysis. Optics Express, 16(3):1583–1589, 2008.
  • [39] Ozan Cakmakci, Sophie Vo, Hassan Foroosh, and Jannick Rolland. Application of radial basis functions to shape description in a dual-element off-axis magnifier. Optics Letters, 33(11):1237–1239, 2008.
  • [40] X Cao and H Foroosh. Metrology from vertical objects. In Proceedings of the British Machine Vision Conference (BMVC), pages 74–1.
  • [41] Xiaochun Cao and Hassan Foroosh. Camera calibration without metric information using 1d objects. In Proc. International Conf. on Image Processing (ICIP), volume 2, pages 1349–1352, 2004.
  • [42] Xiaochun Cao and Hassan Foroosh. Camera calibration without metric information using an isosceles trapezoid. In Proc. International Conference on Pattern Recognition (ICPR), volume 1, pages 104–107, 2004.
  • [43] Xiaochun Cao and Hassan Foroosh. Simple calibration without metric information using an isoceles trapezoid. In Proc. of IAPR Int. Conf. Pattern Recognition (ICPR), volume 1, pages 104–107, 2004.
  • [44] Xiaochun Cao and Hassan Foroosh. Camera calibration using symmetric objects. IEEE Transactions on Image Processing, 15(11):3614–3619, 2006.
  • [45] Xiaochun Cao and Hassan Foroosh. Synthesizing reflections of inserted objects. In Proc. IAPR Int. Conference on Pattern Recognition, volume 2, pages 1225–1228, 2006.
  • [46] Xiaochun Cao and Hassan Foroosh. Camera calibration and light source orientation from solar shadows. Journal of Computer Vision & Image Understanding (CVIU), 105:60–72, 2007.
  • [47] Xiaochun Cao, Wenqi Ren, Wangmeng Zuo, Xiaojie Guo, and Hassan Foroosh. Scene text deblurring using text-specific multi-scale dictionaries. IEEE Transactions on Image Processing, 24(4):1302–1314, 2015.
  • [48] Xiaochun Cao, Yuping Shen, Mubarak Shah, and Hassan Foroosh. Single view compositing with shadows. The Visual Computer, 21(8-10):639–648, 2005.
  • [49] Xiaochun Cao, Lin Wu, Jiangjian Xiao, Hassan Foroosh, Jigui Zhu, and Xiaohong Li. Video synchronization and its application on object transfer. Image and Vision Computing (IVC), 28(1):92–100, 2009.
  • [50] Xiaochun Cao, Jiangjian Xiao, and Hassan Foroosh. Camera motion quantification and alignment. In Proc. International Conference on Pattern Recognition (ICPR), volume 2, pages 13–16, 2006.
  • [51] Xiaochun Cao, Jiangjian Xiao, and Hassan Foroosh. Self-calibration using constant camera motion. In Proc. of IAPR Int. Conf. Pattern Recognition (ICPR), volume 1, pages 595–598, 2006.
  • [52] Xiaochun Cao, Jiangjian Xiao, Hassan Foroosh, and Mubarak Shah. Self-calibration from turn table sequence in presence of zoom and focus. Computer Vision and Image Understanding (CVIU), 102(3):227–237, 2006.
  • [53] Ying Chen and Guizhong Liu. Adaptive lagrange multiplier selection model in rate distortion optimization for 3d wavelet-based scalable video coding. In ICIP, pages 3190–3194. IEEE, 2014.
  • [54] Kristian L Damkjer and Hassan Foroosh. Mesh-free sparse representation of multidimensional LIDAR data. In Proc. of International Conference on Image Processing (ICIP), pages 4682–4686, 2014.
  • [55] Farshideh Einsele and Hassan Foroosh. Recognition of grocery products in images captured by cellular phones. In Proc. International Conference on Computer Vision and Image Processing (ICCVIP), 2015.
  • [56] Marko Esche, Michael Tok, and Thomas Sikora. Adaptive dense vector field interpolation for temporal filtering. In ICIP, pages 1918–1922. IEEE, 2013.
  • [57] H Foroosh. Adaptive estimation of motion using generalized cross validation. In 3rd International (IEEE) Workshop on Statistical and Computational Theories of Vision, 2003.
  • [58] Hassan Foroosh. A closed-form solution for optical flow by imposing temporal constraints. In Proc. of IEEE International Conf. on Image Processing (ICIP), volume 3, pages 656–659, 2001.
  • [59] Hassan Foroosh. An adaptive scheme for estimating motion. In Proc. of IEEE International Conf. on Image Processing (ICIP), volume 3, pages 1831–1834, 2004.
  • [60] Hassan Foroosh. Pixelwise adaptive dense optical flow assuming non-stationary statistics. IEEE Trans. on Image Processing, 14(2):222–230, 2005.
  • [61] Hassan Foroosh and Murat Balci. Sub-pixel registration and estimation of local shifts directly in the fourier domain. In Proc. International Conference on Image Processing (ICIP), volume 3, pages 1915–1918, 2004.
  • [62] Hassan Foroosh and Murat Balci. Subpixel registration and estimation of local shifts directly in the fourier domain. In Proc. of IEEE International Conference on Image Processing (ICIP), volume 3, pages 1915–1918, 2004.
  • [63] Hassan Foroosh, Murat Balci, and Xiaochun Cao. Self-calibrated reconstruction of partially viewed symmetric objects. In Proc. IEEE Int. Conf. on Acoustics, Speech, and Signal Processing (ICASSP), volume 2, pages ii–869, 2005.
  • [64] Hassan Foroosh and W Scott Hoge. Motion information in the phase domain. In Video registration, pages 36–71. Springer, 2003.
  • [65] Hassan Foroosh, Josiane Zerubia, and Marc Berthod. Extension of phase correlation to subpixel registration. IEEE Trans. on Image Processing, 11(3):188–200, 2002.
  • [66] Tao Fu and Hassan Foroosh. Expression morphing from distant viewpoints. In Proc. of IEEE International Conference on Image Processing (ICIP), volume 5, pages 3519–3522, 2004.
  • [67] Apurva Jain, Supraja Murali, Nicolene Papp, Kevin Thompson, Kye-sung Lee, Panomsak Meemon, Hassan Foroosh, and Jannick P Rolland. Super-resolution imaging combining the design of an optical coherence microscope objective with liquid-lens based dynamic focusing capability and computational methods. In Optical Engineering & Applications, pages 70610C–70610C. International Society for Optics and Photonics, 2008.
  • [68] Imran Junejo, Adeel Bhutta, and Hassan Foroosh. Dynamic scene modeling for object detection using single-class svm. In Proc. of IEEE International Conference on Image Processing (ICIP), volume 1, pages 1541–1544, 2010.
  • [69] Imran Junejo, Xiaochun Cao, and Hassan Foroosh. Configuring mixed reality environment. In Proc. of IEEE International Conference on Advanced Video and Signal-based Surveillance, pages 884–887, 2006.
  • [70] Imran Junejo, Xiaochun Cao, and Hassan Foroosh. Geometry of a non-overlapping multi-camera network. In Proc. of IEEE International Conference on Advanced Video and Signal-based Surveillance, pages 43–48, 2006.
  • [71] Imran Junejo, Xiaochun Cao, and Hassan Foroosh. Autoconfiguration of a dynamic non-overlapping camera network. IEEE Trans. Systems, Man, and Cybernetics, 37(4):803–816, 2007.
  • [72] Imran Junejo and Hassan Foroosh. Dissecting the image of the absolute conic. In Proc. of IEEE Int. Conf. on Video and Signal Based Surveillance, pages 77–77, 2006.
  • [73] Imran Junejo and Hassan Foroosh. Robust auto-calibration from pedestrians. In Proc. IEEE International Conference on Video and Signal Based Surveillance, pages 92–92, 2006.
  • [74] Imran Junejo and Hassan Foroosh. Euclidean path modeling from ground and aerial views. In Proc. International Conference on Computer Vision (ICCV), pages 1–7, 2007.
  • [75] Imran Junejo and Hassan Foroosh. Trajectory rectification and path modeling for surveillance. In Proc. International Conference on Computer Vision (ICCV), pages 1–7, 2007.
  • [76] Imran Junejo and Hassan Foroosh. Using calibrated camera for euclidean path modeling. In Proceedings of IEEE International Conference on Image Processing (ICIP), pages 205–208, 2007.
  • [77] Imran Junejo and Hassan Foroosh. Euclidean path modeling for video surveillance. Image and Vision Computing (IVC), 26(4):512–528, 2008.
  • [78] Imran Junejo and Hassan Foroosh. Camera calibration and geo-location estimation from two shadow trajectories. Computer Vision and Image Understanding (CVIU), 114:915–927, 2010.
  • [79] Imran Junejo and Hassan Foroosh. Gps coordinates estimation and camera calibration from solar shadows. Computer Vision and Image Understanding (CVIU), 114(9):991–1003, 2010.
  • [80] Imran Junejo and Hassan Foroosh. Optimizing ptz camera calibration from two images. Machine Vision and Applications (MVA), pages 1–15, 2011.
  • [81] Imran N Junejo, Nazim Ashraf, Yuping Shen, and Hassan Foroosh. Robust auto-calibration using fundamental matrices induced by pedestrians. In Proc. International Conf. on Image Processing (ICIP), volume 3, pages III–201, 2007.
  • [82] Imran N. Junejo, Adeel Bhutta, and Hassan Foroosh. Single-class svm for dynamic scene modeling. Signal Image and Video Processing, 7(1):45–52, 2013.
  • [83] Imran N Junejo, Xiaochun Cao, and Hassan Foroosh. Calibrating freely moving cameras. In Proc. International Conference on Pattern Recognition (ICPR), volume 4, pages 880–883, 2006.
  • [84] Imran N. Junejo and Hassan Foroosh. Trajectory rectification and path modeling for video surveillance. In Proc. International Conference on Computer Vision (ICCV), pages 1–7, 2007.
  • [85] Imran N. Junejo and Hassan Foroosh. Estimating geo-temporal location of stationary cameras using shadow trajectories. In Proc. European Conference on Computer Vision (ECCV), 2008.
  • [86] Imran N. Junejo and Hassan Foroosh. Gps coordinate estimation from calibrated cameras. In Proc. International Conference on Pattern Recognition (ICPR), 2008.
  • [87] Imran N Junejo and Hassan Foroosh. Gps coordinate estimation from calibrated cameras. In Proc. International Conference on Pattern Recognition (ICPR), pages 1–4, 2008.
  • [88] Imran N. Junejo and Hassan Foroosh. Practical ptz camera calibration using givens rotations. In Proc. IEEE International Conference on Image Processing (ICIP), 2008.
  • [89] Imran N. Junejo and Hassan Foroosh. Practical pure pan and pure tilt camera calibration. In Proc. International Conference on Pattern Recognition (ICPR), 2008.
  • [90] Imran N. Junejo and Hassan Foroosh. Refining ptz camera calibration. In Proc. International Conference on Pattern Recognition (ICPR), 2008.
  • [91] Imran N. Junejo and Hassan Foroosh. Using solar shadow trajectories for camera calibration. In Proc. IEEE International Conference on Image Processing (ICIP), 2008.
  • [92] Beong-Jo Kim, Zixiang Xiong, and William A Pearlman. Low bit-rate scalable video coding with 3-d set partitioning in hierarchical trees (3-d spiht). IEEE Transactions on Circuits and Systems for Video Technology, 10(8):1374–1387, 2000.
  • [93] Yu Liu and King Ngi Ngan. Fast multiresolution motion estimation algorithms for wavelet-based scalable video coding. Signal Processing: Image Communication, 22(5):448–465, 2007.
  • [94] Anne Lorette, Hassan Shekarforoush, and Josiane Zerubia. Super-resolution with adaptive regularization. In Proc. International Conf. on Image Processing (ICIP), volume 1, pages 169–172, 1997.
  • [95] Sina Lotfian and Hassan Foroosh. View-invariant object recognition using homography constraints. In Proc. IEEE International Conference on Image Processing (ICIP), 2017.
  • [96] Fei Lu, Xiaochun Cao, Yuping Shen, and Hassan Foroosh. Camera calibration from two shadow trajectories. In Proc. of IEEE International Conference on Advanced Video and Signal-based Surveillance, volume 2.
  • [97] Brian Millikan, Aritra Dutta, Qiyu Sun, and Hassan Foroosh. Compressed infrared target detection using stochastically trained least squares. IEEE Transactions on Aerospace and Electronic Systems, 2017 (accepted).
  • [98] Brian Millikan, Aritra Dutta, Nazanin Rahnavard, Qiyu Sun, and Hassan Foroosh. Initialized iterative reweighted least squares for automatic target recognition. In Military Communications Conference, MILCOM, IEEE, pages 506–510, 2015.
  • [99] Brian A. Millikan, Aritra Dutta, Nazanin Rahnavard, Qiyu Sun, and Hassan Foroosh. Initialized iterative reweighted least squares for automatic target recognition. In Proc. of MILCOM, 2015.
  • [100] Brendan Moore, Marshall Tappen, and Hassan Foroosh. Learning face appearance under different lighting conditions. In Proc. IEEE Int. Conf. on Biometrics: Theory, Applications and Systems, pages 1–8, 2008.
  • [101] Dustin Morley and Hassan Foroosh. Improving ransac-based segmentation through cnn encapsulation. In Proc. IEEE Conf. on Computer Vision and Pattern Recognition (CVPR), 2017.
  • [102] J-R Ohm. Advances in scalable video coding. Proceedings of the IEEE, 93(1):42–56, 2005.
  • [103] Hyun-Wook Park and Hyung-Sun Kim. Motion estimation using low-band-shift method for wavelet-based moving-picture coding. IEEE Trans. on Image Processing, 9(4):577–587, 2000.
  • [104] Béatrice Pesquet-Popescu and Vincent Bottreau. Three-dimensional lifting schemes for motion compensated video compression. In Proc. of IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), volume 3, pages 1793–1796, 2001.
  • [105] Dominic Rüfenacht, Reji Mathew, and David Taubman. Hierarchical anchoring of motion fields for fully scalable video coding. In ICIP, pages 3180–3184. IEEE, 2014.
  • [106] Andrew Secker and David Taubman. Motion-compensated highly scalable video compression using an adaptive 3d wavelet transform based on lifting. In ICIP, volume 2, pages 1029–1032. IEEE, 2001.
  • [107] H Shekarforoush. Super-Resolution in Computer Vision. PhD thesis, University of Nice, 1996.
  • [108] H Shekarforoush, M Berthod, and J Zerubia. Sub-pixel reconstruction of a variable albedo lambertian surface. In Proceedings of the British Machine Vision Conference (BMVC), volume 1, pages 307–316.
  • [109] H Shekarforoush and R Chellappa. Adaptive super-resolution for Predator video sequences.
  • [110] H Shekarforoush and R Chellappa. A multifractal formalism for stabilization and activity detection in flir sequences. In Proceedings, ARL Federated Laboratory 4th Annual Symposium, pages 305–309, 2000.
  • [111] H Shekarforoush, R Chellappa, H Niemann, H Seidel, and B Girod. Multi-channel superresolution for image sequences with applications to airborne video data. Proc. of IEEE Image and Multidimensional Digital Signal Processing, pages 207–210, 1998.
  • [112] Hassan Shekarforoush. Conditioning bounds for multi-frame super-resolution algorithms. Technical report, Computer Vision Laboratory, Center for Automation Research, University of Maryland, 1999.
  • [113] Hassan Shekarforoush. Noise suppression by removing singularities. IEEE Trans. Signal Processing, 48(7):2175–2179, 2000.
  • [114] Hassan Shekarforoush. Noise suppression by removing singularities. IEEE Trans. Signal Processing, 48(7):2175–2179, 2000.
  • [115] Hassan Shekarforoush, Amit Banerjee, and Rama Chellappa. Super resolution for fopen sar data. In Proc. AeroSense, pages 123–129. International Society for Optics and Photonics, 1999.
  • [116] Hassan Shekarforoush, Marc Berthod, Michael Werman, and Josiane Zerubia. Subpixel bayesian estimation of albedo and height. International Journal of Computer Vision, 19(3):289–300, 1996.
  • [117] Hassan Shekarforoush, Marc Berthod, and Josiane Zerubia. 3d super-resolution using generalized sampling expansion. In Proc. International Conf. on Image Processing (ICIP), volume 2, pages 300–303, 1995.
  • [118] Hassan Shekarforoush, Marc Berthod, and Josiane Zerubia. Subpixel image registration by estimating the polyphase decomposition of the cross power spectrum. Technical report, INRIA, 1995.
  • [119] Hassan Shekarforoush, Marc Berthod, and Josiane Zerubia. Subpixel image registration by estimating the polyphase decomposition of cross power spectrum. In Proc. IEEE Conf. Computer Vision and Pattern Recognition (CVPR), pages 532–537, 1996.
  • [120] Hassan Shekarforoush and Rama Chellappa. Blind estimation of psf for out of focus video data. In Proc. of International Conference on Image Processing (ICIP), pages 742–745, 1998.
  • [121] Hassan Shekarforoush and Rama Chellappa. Data-driven multi-channel super-resolution with application to video sequences. Journal of Optical Society of America-A, 16(3):481–492, 1999.
  • [122] Hassan Shekarforoush and Rama Chellappa. A multi-fractal formalism for stabilization, object detection and tracking in flir sequences. In Proc. of International Conference on Image Processing (ICIP), volume 3, pages 78–81, 2000.
  • [123] Hassan Shekarforoush, Josiane Zerubia, and Marc Berthod. Denoising by extracting fractional order singularities. In Proc. of IEEE International Conf. on Acoustics, Speech and Signal Processing (ICASSP), volume 5, pages 2889–2892, 1998.
  • [124] Yuping Shen, Nazim Ashraf, and Hassan Foroosh. Action recognition based on homography constraints. In Proc. of IAPR Int. Conf. Pattern Recognition (ICPR), pages 1–4, 2008.
  • [125] Yuping Shen and Hassan Foroosh. View-invariant action recognition using fundamental ratios. In Proc. IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 1–6, 2008.
  • [126] Yuping Shen and Hassan Foroosh. View invariant action recognition using fundamental ratios. In Proc. IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2008.
  • [127] Yuping Shen and Hassan Foroosh. View-invariant recognition of body pose from space-time templates. In Proc. of IEEE Conf. on Computer Vision and Pattern Recognition, pages 1–6, 2008.
  • [128] Yuping Shen and Hassan Foroosh. View invariant recognition of body pose from space-time templates. In Proc. IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2008.
  • [129] Yuping Shen and Hassan Foroosh. View-invariant action recognition from point triplets. IEEE Transactions on Pattern Analysis and Machine Intelligence (PAMI), 31(10):1898–1905, 2009.
  • [130] Yuping Shen, Fei Lu, Xiaochun Cao, and Hassan Foroosh. Video completion for perspective camera under constrained motion. In Proc. of IAPR Int. Conf. Pattern Recognition (ICPR), volume 3, pages 63–66, 2006.
  • [131] Chen Shu, Luming Liang, Wenzhang Liang, and Hassan Foroosh. 3d pose tracking with multitemplate warping and sift correspondences. IEEE Trans. on Circuits and Systems for Video Technology, 26(11):2043–2055, 2016.
  • [132] Chuan Sun and Hassan Foroosh. Should we discard sparse or incomplete videos? In Proceedings of IEEE International Conference on Image Processing (ICIP), pages 2502–2506, 2014.
  • [133] Chuan Sun, Imran Junejo, and Hassan Foroosh. Action recognition using rank-1 approximation of joint self-similarity volume. In Proc. IEEE International Conference on Computer Vision (ICCV), pages 1007–1012, 2011.
  • [134] Chuan Sun, Imran Junejo, and Hassan Foroosh. Motion retrieval using low-rank subspace decomposition of motion volume. In Computer Graphics Forum, volume 30, pages 1953–1962. Wiley, 2011.
  • [135] Chuan Sun, Imran Junejo, and Hassan Foroosh. Motion sequence volume based retrieval for 3d captured data. Computer Graphics Forum, 30(7):1953–1962, 2012.
  • [136] Chuan Sun, Imran Junejo, Marshall Tappen, and Hassan Foroosh. Exploring sparseness and self-similarity for action recognition. IEEE Transactions on Image Processing, 24(8):2488–2501, 2015.
  • [137] Chuan Sun, Marshall Tappen, and Hassan Foroosh. Feature-independent action spotting without human localization, segmentation or frame-wise tracking. In Proc. of IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 2689–2696, 2014.
  • [138] Amara Tariq and Hassan Foroosh. Scene-based automatic image annotation. In Proc. of IEEE International Conference on Image Processing (ICIP), pages 3047–3051, 2014.
  • [139] Amara Tariq, Asim Karim, and Hassan Foroosh. A context-driven extractive framework for generating realistic image descriptions. IEEE Transactions on Image Processing, 26(2):619–632, 2017.
  • [140] Amara Tariq, Asim Karim, and Hassan Foroosh. Nelasso: Building named entity relationship networks using sparse structured learning. IEEE Trans. on Pattern Analysis and Machine Intelligence, 2017.
  • [141] Amara Tariq, Asim Karim, Fernando Gomez, and Hassan Foroosh. Exploiting topical perceptions over multi-lingual text for hashtag suggestion on twitter. In The Twenty-Sixth International FLAIRS Conference, 2013.
  • [142] G Van der Auwera, A Munteanu, P Schelkens, and J Cornelis. Bottom-up motion compensated prediction in wavelet domain for spatially scalable video coding. Electronics Letters, 38(21):1251–1253, 2002.
  • [143] Jiangjian Xiao, Xiaochun Cao, and Hassan Foroosh. 3d object transfer between non-overlapping videos. In Proc. of IEEE Virtual Reality Conference, pages 127–134, 2006.
  • [144] Jiangjian Xiao, Xiaochun Cao, and Hassan Foroosh. A new framework for video cut and paste. In Proc. of Int. Conf. on Multi-Media Modelling Conference Proceedings, pages 8 pp., 2006.
  • [145] Ruiqin Xiong, Jizheng Xu, and Feng Wu. In-scale motion compensation for spatially scalable video coding. IEEE Transactions on Circuits and Systems for Video Technology, 18(2):145–158, 2008.
  • [146] Mai Xu, Yilin Liang, and Zulin Wang. State-of-the-art video coding approaches: A survey. In IEEE International Conference on Cognitive Informatics & Cognitive Computing, pages 284–290, 2015.
  • [147] Changqing Zhang, Xiaochun Cao, and Hassan Foroosh. Constrained multi-view video face clustering. IEEE Transactions on Image Processing, 24(11):4381–4393, 2015.