Discrete Cosine Transform in JPEG Compression

02/13/2021 ∙ by Jacob John, et al. ∙ 0

Image compression has become a necessity. With the advent of the Internet era, compressing files before sharing them with other users is essential. Several efforts have been made to reduce file sizes while maintaining image quality so that files can be transmitted even over limited-bandwidth connections. This paper discusses the need for the Discrete Cosine Transform (DCT) in compressing images in the Joint Photographic Experts Group (JPEG) file format. Based on an intensive literature study, this paper first introduces DCT and JPEG compression. The section that follows discusses how JPEG compression is implemented using DCT. The last section concludes with further real-world applications of DCT in image processing.




I Introduction

Two-dimensional images stored in digital format are collections of millions of pixels, each represented as a combination of bits. These images are referred to as raster or bitmapped images, as opposed to vector images, which use mathematical formulas to create geometric objects. As the demand for better-quality images and videos increases, efforts have been made to increase the resolution at which images are stored.


For example, a 4K image stored in RAW/DNG format at 16 bits/pixel would require around 23 MB of storage [4]. At that size, 45,000 images would occupy roughly 1 TB of storage space, and sharing several images at once would demand more transmission bandwidth and time. Furthermore, hosting such images on web servers would consume storage space and increase load times, making customer experiences suboptimal [5].

Image compression reduces this need for large storage space by offering efficient solutions for sharing, viewing, and archiving a large number of images. Some common image formats that offer image compression are JPG, TIF, GIF, and PNG [6].

I-A Discrete Cosine Transform

During the past decade, the Discrete Cosine Transform (DCT) has found application in speech and image processing in areas such as compression, filtering, and feature extraction. Using DCT, an image can be transformed into its elementary components [7]. DCT uses a sum of cosine functions oscillating at different frequencies to express a sequence of finitely many discrete, real data points with even symmetry [8]. This can be expressed as equation (2), which consists of a set of basis vectors that are sampled cosine functions.
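For reference, one widely used orthonormal form of the 1D DCT of a sequence $x(0), \dots, x(N-1)$ is sketched below. The symbols $x$, $X$, $N$, and $\alpha$ are notational assumptions, and the scaling constants may differ from the convention adopted in [7].

```latex
X(k) = \alpha(k) \sum_{n=0}^{N-1} x(n)\,
       \cos\!\left[\frac{(2n+1)\,k\,\pi}{2N}\right],
\qquad k = 0, 1, \dots, N-1,
\qquad
\alpha(k) =
\begin{cases}
  \sqrt{1/N}, & k = 0, \\
  \sqrt{2/N}, & k \ge 1.
\end{cases}
```

Each output $X(k)$ is the inner product of the input with one sampled cosine basis vector, which is the sense in which the transform is built from sampled cosine functions.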




Furthermore, the basis vectors of the DCT form a class of discrete Chebyshev polynomials [9]. Thus, these basis vectors can be defined recursively and compose a polynomial sequence.

The 2D DCT for a signal of finite length is given in equation (3), where the index runs from 0 up to one less than the signal length.

Fig. 1: Rate-distortion criteria of various transforms [7]

Owing to its performance with respect to the rate-distortion criterion defined in [10] and the results given in Figure 1, DCT is said to be comparable to the Karhunen-Loève transform (KLT) for first-order Markov stationary random data [11]. Figure 1 also shows how close the curve for DCT is to that of the Karhunen-Loève transform.

Fig. 2: Scalar Wiener filtering: mean-square error performance of the Karhunen-Loève, Fourier, Walsh-Hadamard, and Haar transforms [7]

Figure 2 also illustrates this near-optimality of DCT relative to the Karhunen-Loève transform when comparing performance curves obtained from the mean-square error in scalar Wiener filtering.

We can also compute the DCT matrix for a fixed transform size. Expressing the formula given in (3) in cosine form, equation (5) gives the kernel of a 1D DCT matrix.

Substituting the chosen size into equation (5) and collecting the resulting coefficients yields the transform kernel. The transform is real and orthogonal.
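The claim that the transform is real and orthogonal can be checked numerically. The sketch below builds a DCT matrix under the assumption of the common orthonormal DCT-II scaling; the helper names `dct_matrix` and `gram_matrix` and the size 4 are illustrative choices, not taken from the paper.

```python
import math

def dct_matrix(n):
    """Return the n x n orthonormal DCT-II matrix as a list of rows.

    Row k holds the k-th sampled cosine basis vector (assumed scaling).
    """
    rows = []
    for k in range(n):
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        rows.append([scale * math.cos((2 * m + 1) * k * math.pi / (2 * n))
                     for m in range(n)])
    return rows

def gram_matrix(a):
    """Return a * a^T for a square matrix given as a list of rows."""
    n = len(a)
    return [[sum(a[i][k] * a[j][k] for k in range(n)) for j in range(n)]
            for i in range(n)]

C = dct_matrix(4)
gram = gram_matrix(C)

# Orthogonality means C * C^T equals the identity matrix.
is_orthogonal = all(abs(gram[i][j] - (1.0 if i == j else 0.0)) < 1e-12
                    for i in range(4) for j in range(4))
print(is_orthogonal)
```

Because the matrix is orthogonal, its inverse is simply its transpose, which is why the inverse transform can reuse the same cosine kernel.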



Lee [8] proposes an approach that makes the system more manageable by reducing the number of multiplications to about half of those required by the existing DCT algorithm initially proposed by Ahmed et al. Furthermore, unlike KLT, DCT has fast computation implementations available [12]. One such implementation is the fast recursive algorithm proposed by Hou in [11]. This method requires fewer multipliers and adders as it allows the “generation of the next higher order DCT from two identical lower order DCTs.” Other fast recursive implementations are surveyed and considered in [13].

In [14] and [15], McMillan et al. patent fast implementations of the inverse discrete cosine transform for digital image processing that use low-cost accumulators or optimized lookup tables.

I-B Joint Photographic Experts Group Compression

JPEG compression [16] is a generic standard for the compression of grayscale and color, continuous-tone, still images. JPEG files typically carry the file extension .jpg or .jpeg, and the standard is categorized as image/jpeg under the MIME media type when uploading images. According to [17], “JPEG/JFIF supports a maximum image size of 65,535 × 65,535 pixels, hence up to 4 gigapixels for an aspect ratio of 1:1”.

Furthermore, a report published by [18] in 2016 stated that JPEG is one of the most widely used formats for storing and transmitting photographic images across the Internet. The standard aims to reduce transmission and storage costs while also enabling affordable image acquisition and display devices and ensuring interoperability among vendors. In fact, the “joint” in JPEG refers to the collaboration between the International Organization for Standardization (ISO) and the Comité Consultatif International Téléphonique et Télégraphique (CCITT), now the International Telecommunication Union Telecommunication Standardization Sector (ITU-T); hence, JPEG is both an ISO standard and a CCITT recommendation.

The JPEG standard supports two compression methods – lossy compression using a DCT-based method and lossless compression using a predictive method.

In lossless compression, no information is lost during compression. These techniques are guaranteed to reproduce an exact replica of the original input image: the file obtained after the compress/expand cycle is a duplicate of the source file [19]. Such compression is typically used in mission-critical applications such as health and military database records, where the mishandling of even a single bit could be catastrophic. Run-length encoding (RLE) and Lempel–Ziv–Welch (LZW) [20] are some of the common methods for lossless compression.
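As a minimal illustration of the compress/expand cycle, the sketch below implements a basic run-length encoder and decoder. The function names and the sample pixel row are hypothetical, and practical RLE variants pack their runs differently.

```python
def rle_encode(values):
    """Collapse consecutive repeats into [value, count] pairs."""
    runs = []
    for v in values:
        if runs and runs[-1][0] == v:
            runs[-1][1] += 1          # extend the current run
        else:
            runs.append([v, 1])       # start a new run
    return runs

def rle_decode(runs):
    """Expand [value, count] pairs back into the original sequence."""
    out = []
    for v, count in runs:
        out.extend([v] * count)
    return out

row = [255, 255, 255, 0, 0, 17, 17, 17, 17]   # hypothetical pixel row
encoded = rle_encode(row)
decoded = rle_decode(encoded)
print(encoded, decoded == row)
```

The decoded sequence is identical to the input, which is the defining property of a lossless method.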

Fig. 3: Image degradation with lossy compression at higher compression ratios [21]

In lossy compression, on the other hand, the decompressed image is not an exact match of the source input image. Some information is lost, which aids in the reduction of file sizes. The lost information cannot be restored, which results in a loss of image quality; hence, this form of compression is also termed irreversible compression. Figure 3 depicts this loss of image quality at different compression ratios. Compression ratios are higher in lossy compression methods because information is discarded, e.g., they can be as high as 50:1. Thus, such methods are used in scenarios such as archiving data where an exact reproduction of the original data is not vital. Examples of lossy compression methods are JPEG, JPEG 2000 [22], and wavelet compression [23].

The following are the goals as defined by JPEG before synthesizing an architecture for JPEG image compression:

  • Provide “very good” or “excellent” image quality rating post compression.

  • The encoder should provide flexibility to the user or application to select the desired compression and/or quality tradeoff.

  • The standard should not be restricted to or biased towards any image attributes, including, but not limited to, dimensions, aspect ratio, range of colors, and statistical properties.

  • The JPEG algorithm must have tractable computational complexity, i.e., be solvable in polynomial time, making it feasible to create software implementations that are cost-effective and run on a wide range of single-processor CPUs.

JPEG also aims to provide the following four modes of operation:

  1. Sequential encoding – each image component is encoded in a single left-to-right, top-to-bottom scan.

  2. Progressive encoding – encoding takes place in multiple scans for applications with prolonged transmission time.

  3. Hierarchical encoding – the image is encoded at multiple resolutions. This way, lower-resolution versions are obtainable without first having to decompress the image at its full resolution.

  4. Lossless encoding – the image is encoded in such a manner that guarantees recovery of an exact copy of the original image with no information loss.

II DCT in JPEG Compression

The basis for all DCT-based decoders is the Baseline sequential codec. Equation (2) previously illustrated the formula for Forward DCT (FDCT) or DCT image compression. Equation (8), derived from (2), illustrates the mathematical definition for Inverse DCT (IDCT) used in a DCT-based decoder.
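The relationship between the forward and inverse transforms can be sketched for a single 8 × 8 block. The code below assumes the scaling commonly given for the JPEG FDCT and IDCT (a factor of 1/4 and a constant of 1/√2 for zero-frequency terms); the function names and the test block are illustrative, and the direct quadruple loop is chosen for clarity rather than speed.

```python
import math

N = 8  # JPEG operates on 8 x 8 blocks

def c(u):
    """Normalization constant: 1/sqrt(2) for the zero-frequency term, else 1."""
    return 1.0 / math.sqrt(2.0) if u == 0 else 1.0

def basis(x, u):
    """Sampled cosine basis function shared by the FDCT and the IDCT."""
    return math.cos((2 * x + 1) * u * math.pi / (2 * N))

def fdct(block):
    """Forward 2D DCT of an 8 x 8 block (direct, unoptimized form)."""
    return [[0.25 * c(u) * c(v) * sum(block[x][y] * basis(x, u) * basis(y, v)
                                      for x in range(N) for y in range(N))
             for v in range(N)] for u in range(N)]

def idct(coeffs):
    """Inverse 2D DCT: rebuilds the 8 x 8 block from its 64 coefficients."""
    return [[0.25 * sum(c(u) * c(v) * coeffs[u][v] * basis(x, u) * basis(y, v)
                        for u in range(N) for v in range(N))
             for y in range(N)] for x in range(N)]

# Hypothetical test block with values in a small signed range.
block = [[(3 * x + 5 * y) % 17 - 8 for y in range(N)] for x in range(N)]
restored = idct(fdct(block))
max_error = max(abs(block[x][y] - restored[x][y])
                for x in range(N) for y in range(N))
print(max_error < 1e-9)

# A constant block concentrates all of its energy in the DC coefficient.
flat_coeffs = fdct([[10] * N for _ in range(N)])
```

Without quantization, the round trip reproduces the block up to floating-point error, which shows that the transform pair itself loses no information.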




Given the “four modes of operation” in the previous section, a codec is specified for each mode. The reader should note that although the term codec is used several times in this paper, applications are not required to implement both an encoder and a decoder. The flexibility of JPEG and DCT allows the encoder and decoder of each operation mode to be implemented independently.

Fig. 4: (a, top) processing steps for a DCT-based Encoder and (b, bottom) a DCT-based Decoder [16]

For the baseline sequential mode of operation, figure 4 (a) and (b) illustrate the compression process in its entirety. For a progressive-mode codec, an image buffer must be maintained prior to the entropy coding step. This allows for multiple scans with successive improvements between consecutive scans. Furthermore, a hierarchical-mode codec uses a broader and more comprehensive framework while borrowing aspects of the process described in figure 4.

Assuming 8 × 8 blocks of a single-component (grayscale) image as inputs to the FDCT, 64 DCT coefficients are produced for each block. These coefficients are then used in the quantization process, as shown in figure 4 (a).

Quantization, as described in [24], is the process of mapping a larger set of input values to a smaller set of output values, analogous to rounding off numbers. The DCT-based encoder described above utilizes a 64-element quantization table supplied as an input by the user. This process involves digitizing the amplitude or brightness values. The red line in figure 5 depicts such digitizing of an analog signal. The discrete amplitude of the quantized output is represented on the y-axis as a binary number. These discretized levels are known as representation levels, and the spacing between them is called a quantum. The signal is approximated to the closest level. This process is at the core of all lossy compression algorithms and helps in achieving further compression.

Fig. 5: A simple quantization of a signal by choosing the amplitude values closest to the original analog amplitude [25]

In the given implementation, quantization is performed in conjunction with a quantization table, and the input signal is digitized. This process is fundamentally lossy since it is a many-to-one mapping, and it is the source of information loss in DCT-based lossy image compression. Quantization can be represented using equation (9): each DCT coefficient is divided by its corresponding quantum, and the quotient is rounded to the nearest integer. The output value is thereby normalized by the quantum.


Normalization is removed by multiplying with the quantum during dequantization in the decoder, which returns an approximation of the coefficient originally supplied for quantization. This is represented as equation (10).
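The quantize/dequantize pair can be sketched in a few lines. The coefficient values and step sizes below are made up for illustration (a real table has 64 entries), and the function names are hypothetical. Note that Python's built-in `round` resolves exact .5 ties to the nearest even integer, which may differ from the tie rule of a particular codec.

```python
def quantize(coeffs, table):
    """Divide each coefficient by its quantum and round to the nearest integer."""
    return [[round(f / q) for f, q in zip(f_row, q_row)]
            for f_row, q_row in zip(coeffs, table)]

def dequantize(levels, table):
    """Remove the normalization by multiplying each level by its quantum."""
    return [[lv * q for lv, q in zip(l_row, q_row)]
            for l_row, q_row in zip(levels, table)]

# Hypothetical 2 x 2 excerpt of DCT coefficients and quantization steps.
coeffs = [[80.0, 37.0],
          [-23.0, 3.0]]
table = [[16, 10],
         [12, 24]]

levels = quantize(coeffs, table)        # [[5, 4], [-2, 0]]
restored = dequantize(levels, table)    # [[80, 40], [-24, 0]]
print(levels, restored)
```

Only the coefficient that is an exact multiple of its quantum (80) survives unchanged; the others come back as 40, -24, and 0, which illustrates the many-to-one mapping that makes this step lossy.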


Example quantization tables for CCIR-601 images and displays can be found in [26]. They are ISO and JPEG recommendations, not requirements.

The final step in the DCT-based encoder is entropy coding; its counterpart, entropy decoding, is the first step in the DCT-based decoder. After quantization, the DC coefficient is treated separately from the AC coefficients, as shown in figure 6. The DC coefficient is encoded as the difference from the DC term of the previous block in the encoding order. The quantized coefficients are then arranged in a zig-zag sequence to help facilitate entropy coding.
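Both preparation steps can be sketched briefly. The code below assumes the usual zig-zag traversal of an 8 × 8 block (anti-diagonals visited in alternating directions) and a DC predictor that starts at 0 for the first block; the function names and sample DC values are hypothetical.

```python
def zigzag_order(n=8):
    """Return (row, col) positions of an n x n block in zig-zag order."""
    def key(rc):
        r, c = rc
        d = r + c                      # index of the anti-diagonal
        # Odd diagonals run top-right to bottom-left (row ascending),
        # even diagonals run bottom-left to top-right (row descending).
        return (d, r if d % 2 else -r)
    return sorted(((r, c) for r in range(n) for c in range(n)), key=key)

def zigzag_scan(block):
    """Flatten a square block into its zig-zag sequence."""
    return [block[r][c] for r, c in zigzag_order(len(block))]

def dc_differences(dc_values):
    """Replace each DC term by its difference from the previous block's DC."""
    diffs, prev = [], 0                # predictor assumed to start at 0
    for dc in dc_values:
        diffs.append(dc - prev)
        prev = dc
    return diffs

# A block whose entries equal their raster index makes the order visible.
index_block = [[r * 8 + c for c in range(8)] for r in range(8)]
sequence = zigzag_scan(index_block)
print(sequence[:10])                   # [0, 1, 8, 16, 9, 2, 3, 10, 17, 24]
print(dc_differences([52, 55, 50]))    # [52, 3, -5]
```

The zig-zag order places low-frequency coefficients first, so the zeros produced by quantization tend to cluster at the end of the sequence, and neighbouring blocks with similar DC terms produce small differences.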

Fig. 6: DC coefficients and AC coefficients being treated separately for entropy coding. [16]

JPEG recommends two methods for entropy coding – Huffman coding [27], which is used by the baseline sequential codec, and arithmetic coding [28]. However, both coding methods can be used for all modes of operation. Using the statistical characteristics of the coefficients, these entropy coding methods attain additional compression losslessly. This is in contrast to quantization, where some of the data is lost.

Entropy coding is defined as a two-step process by JPEG:

  1. The zig-zag sequence (figure 6) is converted into an intermediate sequence of symbols.

  2. The symbols are then converted into a data stream with no ‘externally identifiable boundaries.’
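The first of these steps can be sketched for the AC coefficients of one block. The code below follows the run-length/size style of intermediate symbol commonly described for baseline Huffman coding, where each nonzero coefficient is paired with the number of zeros preceding it; the function name, the tuple layout, and the sample data are assumptions for illustration, and the second step (mapping symbols to bits) is not shown.

```python
ZRL = (15, 0)   # marks a run of 16 zeros
EOB = (0, 0)    # end of block: all remaining coefficients are zero

def ac_symbols(ac):
    """Convert zig-zag ordered AC coefficients into ((run, size), amplitude) pairs."""
    symbols = []
    # Index of the last nonzero coefficient, or -1 if the block is all zero.
    last = max((i for i, v in enumerate(ac) if v != 0), default=-1)
    run = 0
    for v in ac[:last + 1]:
        if v == 0:
            run += 1
            if run == 16:              # run too long for one symbol
                symbols.append((ZRL, None))
                run = 0
        else:
            size = abs(v).bit_length() # bits needed for the amplitude
            symbols.append(((run, size), v))
            run = 0
    if last < len(ac) - 1:             # trailing zeros collapse into EOB
        symbols.append((EOB, None))
    return symbols

# Hypothetical block: 63 AC coefficients, mostly zero after quantization.
ac = [5, 0, 0, -3] + [0] * 59
symbols = ac_symbols(ac)
print(symbols)
```

For this input, 63 coefficients reduce to three symbols, which shows how the zig-zag ordering and the zero runs left by quantization work together before any bits are assigned.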

Each of the coding methods specifies the form and definition of the intermediate symbols. Though arithmetic coding can be seen as a more complex process than Huffman coding, it produces 5-10% better compression.

III Other Applications of DCT

As discussed in the previous section, DCT finds its primary application in lossy image compression. This is due to its strong “energy compaction” property and its closely related performance to KLT for strongly correlated Markov processes [7][29].

A variant of DCT, the Modified Discrete Cosine Transform (MDCT) [30], has found applications in audio coding and error concealment [31]. MDCTs are typically designed using a recursive DCT-IV algorithm. Furthermore, audio formats such as MP3 [32], AAC [33], WMA, and Vorbis [34] employ MDCTs for audio compression. This is because the data throughput for MDCT and inverse MDCT algorithms is four times higher than that of previous algorithms. Furthermore, they also offer a 50%-79% reduction in ROM size, thereby increasing overall chip efficiency and providing feasible architectural solutions.

Multidimensional DCTs (MD DCTs), another variant of DCT, find application in video compression schemes such as Theora [35], MPEG, and Daala. More advanced applications include adaptive video encoding [36]; this implementation uses an MD DCT coder for medical images. The MD DCT, here a 3D DCT, is used to compress the 3D cuboids that result from the given segmentation technique.


The author, Jacob John, would like to thank Dr. Prabu Sevugan for his continuous support throughout this paper, and Vellore Institute of Technology for its aid, without which this paper would not have been completed.


  • [1] Uhrina, M., Frnda, J., Ševčík, L., & Vaculik, M. (2014). Impact of H. 264/AVC and H. 265/HEVC compression standards on the video quality for 4K resolution.
  • [2] Chow, K. Y., Lui, K. S., & Lam, E. Y. (2006). Efficient on-demand image transmission in visual sensor networks. EURASIP Journal on Advances in Signal Processing, 2007(1), 095076.
  • [3] Piella, G., & Heijmans, H. (2003, September). A new quality metric for image fusion. In Proceedings 2003 International Conference on Image Processing (Cat. No. 03CH37429) (Vol. 3, pp. III-173). IEEE.
  • [4] Forret, P. (n.d.). Megapixel calculator. Retrieved March 17, 2019, from https://toolstud.io/photo/megapixel.php
  • [5] Fisher, Y. (2012). Fractal image compression: theory and application. Springer Science & Business Media.
  • [6] Essays, UK. (November 2013). The Need For Image Compression Information Technology Essay. Retrieved from https://www.uniassignment.com/essay-samples/information-technology/the-need-for-image-compression-information-technology-essay.php?vref=1
  • [7] Ahmed, N., Natarajan, T., & Rao, K. R. (1974). Discrete cosine transform. IEEE transactions on Computers, 100(1), 90-93.
  • [8] Lee, B. (1984). A new algorithm to compute the discrete cosine transform. IEEE Transactions on Acoustics, Speech, and Signal Processing, 32(6), 1243-1245.
  • [9] Fike, C. T. (1968). Computer evaluation of mathematical functions.
  • [10] Pearl, J., Andrews, H., & Pratt, W. (1972). Performance measures for transform data coding. IEEE Transactions on Communications, 20(3), 411-415.
  • [11] Hou, H. (1987). A fast recursive algorithm for computing the discrete cosine transform. IEEE Transactions on Acoustics, Speech, and Signal Processing, 35(10), 1455-1461.
  • [12] Andrews, H. C. (1971). Multidimensional rotations in feature selection. IEEE Transactions on Computers, 100(9), 1045-1051.

  • [13] N. Chelemal-D and K. R. Rao, “Fast computational algorithms for the discrete cosine transform,” presented at the 9th Annu. Asilomar Conf. Circuit, Syst., Comput., Pacific Grove, CA, Nov. 1985.
  • [14] McMillan Jr, L., & Westover, L. A. (1994). U.S. Patent No. 5,301,136. Washington, DC: U.S. Patent and Trademark Office.
  • [15] McMillan Jr, L., & Westover, L. A. (1993). U.S. Patent No. 5,224,062. Washington, DC: U.S. Patent and Trademark Office.
  • [16] Wallace, G. K. (1992). The JPEG still picture compression standard. IEEE transactions on consumer electronics, 38(1), xviii-xxxiv.
  • [17] “Wayback Machine” (PDF). Web.archive.org. 3 September 2014. Archived from the original on 3 September 2014.
  • [18] “HTTP Archive - Interesting Stats”. http://archive.org.
  • [19] Nelson, M., & Gailly, J. L. (1996). The data compression book (pp. 457-458). New York: M&T Books.
  • [20] Ziv, J., & Lempel, A. (1978). Compression of individual sequences via variable-rate coding. IEEE transactions on Information Theory, 24(5), 530-536.
  • [21] Lossy vs. Lossless Compression - KeyCDN Support. (2018, November 21). Retrieved March 17, 2019, from https://www.keycdn.com/support/lossy-vs-lossless
  • [22] Adams, M. D. (2001). The JPEG-2000 still image compression standard.
  • [23] Meyer, Y. (1992). Wavelets and operators (Vol. 1). Cambridge university press.
  • [24] Gray, R. M., & Neuhoff, D. L. (1998). Quantization. IEEE transactions on information theory, 44(6), 2325-2383.
  • [25] Tutorialspoint.com. (n.d.). Digital Communication Quantization. Retrieved March 18, 2019, from https://www.tutorialspoint.com/digital_communication/digital_communication_quantization.htm
  • [26] Encoding parameters of digital television for studios. CCIR Recommendations, Recommendation 601, 1982.
  • [27] Huffman, D. A. A method for the construction of minimum redundancy codes. In Proceedings IRE, vol. 40, 1952, pp. 1098-1101.
  • [28] Pennebaker, W.B., Mitchell, J.L., et. al. Arithmetic coding articles. IBM J. Res. Dev., vol. 32, no. 6 (Nov. 1988), pp. 717-774.
  • [29] Rao, K. R., & Yip, P. (1993). Discrete cosine transform: Algorithms, advantages, applications. Boston: Acad. Press.
  • [30] Princen, J., Johnson, A., & Bradley, A. (1987, April). Subband/transform coding using filter bank designs based on time domain aliasing cancellation. In ICASSP’87. IEEE International Conference on Acoustics, Speech, and Signal Processing (Vol. 12, pp. 2161-2164). IEEE.
  • [31] Wang, Y., & Vilermo, M. (2003). Modified discrete cosine transform: Its implications for audio coding and error concealment. Journal of the Audio Engineering Society, 51(1/2), 52-61.
  • [32] Jacaba, J. S. (2001). Audio Compression Using Modified Discrete Cosine Transform: The MP3 Coding Standard. BSc Research Paper, The University of the Philippines, Diliman, Quezon City.
  • [33] Lai, S. C., Lei, S. F., & Luo, C. H. (2009). Common architecture design of novel recursive MDCT and IMDCT algorithms for application to AAC, AAC in DRM, and MP3 codecs. IEEE Transactions on Circuits and Systems II: Express Briefs, 56(10), 793-797.
  • [34] Lad, B. G. K. S. P. (2007). Implementation Of Ogg Vorbis On Analog Devices Sharc Processor Adsp–21364 (Doctoral Dissertation, Department Of Electronics & Communication Engineering, National Institute Of Technology Karnataka, Surathkal).
  • [35] Theora video compression. (n.d.). Retrieved from http://www.theora.org/
  • [36] Tai, S. C., Wu, Y. G., & Lin, C. W. (2000). An adaptive 3-D discrete cosine transform coder for medical image compression. IEEE Transactions on information technology in biomedicine, 4(3), 259-263.