CSA3020

Lecture 8 - JPEG

References:
Steinmetz, R., and Nahrstedt, K. (1995). Multimedia: Computing, Communications & Applications. Prentice Hall. Chapter 7.
Steinmetz, R., and Nahrstedt, K. (2002). Multimedia Fundamentals: Vol. 1. Prentice Hall. Chapter 7, Section 5.

Aravind, R., et al. (1993). Image and Video Coding Standards.
JPEG/JBIG Home page

JPEG

In 1982, Working Group 8 of the International Organization for Standardization (ISO) began working on the standardization of compression and decompression of still images. In 1986, the Joint Photographic Experts Group (JPEG) was formed, and in 1992, JPEG became an ISO standard.
The need for image compression is evident in the following example. A typical digital image has 512x480 pixels. In 24-bit colour (one byte for each of the red, green and blue components), the image requires 737,280 bytes of storage space. It would take about 1.5 minutes to transmit the uncompressed image over a 64 kbit/s link. The JPEG algorithms compress most images at ratios of about 24:1; effectively, every 24 bits of data are squeezed into a single bit, giving a compressed file size (for the above image dimensions) of 30,720 bytes, and a corresponding transmission time of about 3.8 seconds.
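These figures can be checked with a few lines of Python. This is only a back-of-the-envelope sketch; the image dimensions, link speed and 24:1 ratio are the assumed values from the example above, not properties of JPEG itself.

    # Storage and transmission figures for the example above (assumed values).
    width, height = 512, 480
    bytes_per_pixel = 3                       # 8 bits each for R, G and B
    link_bits_per_second = 64_000             # 64 kbit/s

    raw_bytes = width * height * bytes_per_pixel
    raw_seconds = raw_bytes * 8 / link_bits_per_second

    ratio = 24
    compressed_bytes = raw_bytes // ratio
    compressed_seconds = compressed_bytes * 8 / link_bits_per_second

    print(f"Uncompressed: {raw_bytes} bytes, {raw_seconds:.1f} s to transmit")    # 737280 bytes, 92.2 s
    print(f"Compressed:   {compressed_bytes} bytes, {compressed_seconds:.1f} s")  # 30720 bytes, 3.8 s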

Overview of JPEG

Although JPEG is a single standard, it has four modes of operation in order to satisfy the requirements of a broad range of still-image compression applications.
Sequential DCT-based
In this mode, 8x8 blocks of the input image are formatted for compression by scanning the image left to right and top to bottom. A block consists of 64 samples of one of the components that make up the image. Each block of samples is transformed into a block of coefficients by the forward discrete cosine transform (FDCT). The coefficients are then quantized and entropy-encoded.
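As a rough sketch of the scanning step, the loop below walks one image component in that order. The component here is random data purely for illustration, and its dimensions are assumed to be multiples of 8.

    import numpy as np

    component = np.random.randint(0, 256, size=(480, 512), dtype=np.uint8)

    def blocks_in_scan_order(samples, block=8):
        """Yield the 8x8 data units of one component, left to right, top to bottom."""
        rows, cols = samples.shape
        for top in range(0, rows, block):
            for left in range(0, cols, block):
                yield samples[top:top + block, left:left + block]

    n_blocks = sum(1 for _ in blocks_in_scan_order(component))
    print(n_blocks)   # (480/8) * (512/8) = 3840 data units
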
Progressive DCT-based
This method produces a quick low-resolution version of the image, which is gradually (progressively) refined to higher resolutions. This is particularly useful if the medium separating the coder and decoder has a low bandwidth (e.g., a 14.4K modem connection to the Internet, in turn providing a slow connection to a remote image database). The user can stop the download at any time. This is similar to the sequential DCT-based algorithm, but the image is encoded in multiple scans.
Lossless
The decoder renders an exact reproduction of the original digital image.
Hierarchical
The input image is coded as a sequence of increasingly higher resolution frames. The client application will stop decoding the image when the appropriate resolution image has been reproduced.

JPEG Operating Parameters and definitions

Parameters
An image to be coded using any JPEG mode may have from 1 to 65,535 lines and 1 to 65,535 pixels per line. Each pixel may have 1 to 255 components, although progressive mode supports only 1 to 4 components.
Data interleaving
To reduce the processing delay and/or buffer requirements, up to four components can be interleaved in a single scan. A data structure called the minimum coded unit (MCU) has been defined to support this interleaving. An MCU consists of one or more data units, where a data unit is a single component sample in the lossless mode, and an 8x8 block of component samples in the DCT modes. If a scan consists of one component, then its MCU is equal to one data unit. For multiple-component scans, the MCU contains the interleaved data units. The maximum number of data units per MCU is 10.
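In the DCT modes, the number of data units a component contributes to an interleaved MCU is the product of its horizontal and vertical sampling factors. The snippet below only illustrates this bookkeeping; the 4:2:0-style sampling factors shown are a common example, not something taken from the lecture.

    # Data units per MCU in an interleaved DCT-mode scan (illustrative factors).
    components = {
        "Y":  (2, 2),   # (horizontal, vertical) sampling factors
        "Cb": (1, 1),
        "Cr": (1, 1),
    }

    data_units_per_mcu = sum(h * v for h, v in components.values())
    assert data_units_per_mcu <= 10, "not a legal interleaved scan"
    print(data_units_per_mcu)   # 4 + 1 + 1 = 6 data units per MCU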

Marker codes
Different sections of the compressed data stream are delineated using defined marker codes. All marker codes begin with a hex "FF" byte, making it easy to scan the compressed data and extract parts of it without needing to decompress it first.
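The sketch below shows the idea of scanning for markers. It lists only a handful of well-known marker values, treats "image.jpg" as a placeholder filename, and ignores the segment length fields that a real parser would follow, so it should be read as an illustration rather than a JPEG parser.

    # List the FF-prefixed markers found in a JPEG file (rough sketch).
    MARKER_NAMES = {
        0xD8: "SOI  (start of image)",
        0xD9: "EOI  (end of image)",
        0xC0: "SOF0 (baseline DCT frame)",
        0xC2: "SOF2 (progressive DCT frame)",
        0xC4: "DHT  (Huffman table)",
        0xDB: "DQT  (quantization table)",
        0xDA: "SOS  (start of scan)",
    }

    with open("image.jpg", "rb") as f:
        data = f.read()

    for i in range(len(data) - 1):
        # Skip stuffed bytes (FF 00) inside entropy-coded data and FF fill bytes.
        if data[i] == 0xFF and data[i + 1] not in (0x00, 0xFF):
            name = MARKER_NAMES.get(data[i + 1], f"marker FF{data[i + 1]:02X}")
            print(f"offset {i}: {name}")
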
Compressed-image data structure
At the top level of the compressed data hierarchy is the image. A non-hierarchical-mode image consists of a frame surrounded by start-of-image (SOI) and end-of-image (EOI) marker codes. A hierarchically coded image will have multiple frames. Within each frame, a start-of-frame (SOF) marker identifies the coding mode used. Following an SOF marker will be a number of parameters and one or more scans. Each scan begins with a header identifying the components contained within the scan, followed by further parameters. The scan header is followed by an entropy-coded segment (ECS). The ECS can be broken into chunks of MCUs called restart intervals, which is useful for identifying selected portions of a scan, and for recovery from limited corruption of the entropy-coded data. Quantization and entropy-coding tables may either be included with the compressed image data or be held separately.

Sequential DCT

This mode offers excellent compression ratios while maintaining image quality. A subset of the DCT capabilities has been identified by JPEG for a "baseline system". This section describes the baseline system.
DCT and quantization
All JPEG DCT-based coders begin by partitioning the image into non-overlapping 8x8 blocks of component samples. The samples are level shifted so that their values range from -128 to +127 (instead of 0 to 255). These data units of 8x8 shifted sample values are denoted s(i,j), where i and j are in the range 0 to 7. The blocks are then transformed from the spatial domain into the frequency domain using the FDCT:

    S(v,u) = (1/4) * C(v) * C(u) *
             sum over i = 0..7 and j = 0..7 of
             s(i,j) * cos[(2i+1) * v * pi / 16] * cos[(2j+1) * u * pi / 16]

    where C(0) = 1/sqrt(2) and C(k) = 1 for k > 0.
The transformation is evaluated 64 times per data unit, once for each coefficient S(v,u). The resulting 8x8 matrix has coefficients ranging from S(0,0) to S(7,7), where S(0,0) is known as the DC coefficient and determines the fundamental colour (the average value) of the data unit of 64 pixels in the original image. The other 63 coefficients are called AC coefficients. To reconstruct the image, the decoder uses the IDCT:

    s(i,j) = (1/4) *
             sum over v = 0..7 and u = 0..7 of
             C(v) * C(u) * S(v,u) * cos[(2i+1) * v * pi / 16] * cos[(2j+1) * u * pi / 16]

    with C(0) = 1/sqrt(2) and C(k) = 1 for k > 0, as before.
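A direct, unoptimised implementation of these two formulas is shown below as a sketch; real codecs use fast factorisations of the transform rather than the quadruple loop.

    import numpy as np

    def C(k):
        return 1 / np.sqrt(2) if k == 0 else 1.0

    def fdct(s):
        """8x8 block of level-shifted samples -> 8x8 block of coefficients."""
        S = np.zeros((8, 8))
        for v in range(8):
            for u in range(8):
                total = 0.0
                for i in range(8):
                    for j in range(8):
                        total += (s[i, j]
                                  * np.cos((2 * i + 1) * v * np.pi / 16)
                                  * np.cos((2 * j + 1) * u * np.pi / 16))
                S[v, u] = 0.25 * C(v) * C(u) * total
        return S

    def idct(S):
        """8x8 block of coefficients -> 8x8 block of samples."""
        s = np.zeros((8, 8))
        for i in range(8):
            for j in range(8):
                total = 0.0
                for v in range(8):
                    for u in range(8):
                        total += (C(v) * C(u) * S[v, u]
                                  * np.cos((2 * i + 1) * v * np.pi / 16)
                                  * np.cos((2 * j + 1) * u * np.pi / 16))
                s[i, j] = 0.25 * total
        return s

    block = np.random.randint(0, 256, (8, 8)).astype(float) - 128   # level shift
    assert np.allclose(idct(fdct(block)), block)   # without quantization the round trip is exact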

The next step is quantization. Quantization reduces the number of bits needed to encode the coefficients and increases the number of zero-valued coefficients. For this purpose, JPEG applications can specify a table with 64 entries, with a one-to-one mapping between the values in the table and the DCT coefficients. Each DCT coefficient is divided by its corresponding quantization value, and the quotient is rounded to the nearest integer. JPEG does not specify a quantization table in the standard; applications can develop their own tables, which best suit the type of images used. The quantization table must be available to the decoder, or else the decoded image may be distorted. Dequantization is performed by multiplying each quantized coefficient by the corresponding quantization value. Notice, however, that in the compression process the quotient is rounded - this technique is therefore lossy, as the decompression process cannot recover the original coefficients, and hence the original pixel values, exactly!
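The sketch below runs the quantize/dequantize round trip on a synthetic block of coefficients. The quantization table here is made up purely for illustration (a uniform step of 16, with a finer step for the DC term); real applications choose their own tables.

    import numpy as np

    rng = np.random.default_rng(0)
    S = rng.normal(0, 50, (8, 8)).round()        # stand-in for a block of DCT coefficients
    quant = np.full((8, 8), 16, dtype=int)       # illustrative table: a uniform step of 16
    quant[0, 0] = 8                              # finer step (more precision) for the DC term

    Sq = np.rint(S / quant).astype(int)          # quantize: divide, then round the quotient
    S_restored = Sq * quant                      # dequantize: multiply back

    print("zero coefficients after quantization:", np.count_nonzero(Sq == 0))
    print("largest reconstruction error:", np.abs(S - S_restored).max())   # non-zero, hence lossy
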
Most areas of a typical image contain large regions of the same or similar colour. After the FDCT and quantization, the corresponding AC coefficients will have very low values, although edges in the image produce high-frequency components with larger coefficients. On average, images have many AC coefficients which are zero or almost zero. The image is further compressed by entropy-encoding the DCT coefficients in each data unit.

Entropy Encoding


[Figure: zig-zag scan order of the 64 quantized DCT coefficients, starting at the DC coefficient in the top-left corner.]

If a vector of quantized coefficients is constructed using the zig-zag scan shown in the figure above, then it will usually contain long runs of zeros. First, the zero values of the AC coefficients are run-length coded. Then, in the baseline system, the vector is Huffman coded. In non-baseline systems, either Huffman coding or the more efficient arithmetic coding can be applied. In both cases, the Huffman or arithmetic coding tables must be available to the decoder. This supports sequential encoding, where the image is encoded and decoded in a single pass.
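The sketch below generates the zig-zag order by walking the anti-diagonals of the block and then pairs each non-zero AC value with the number of zeros preceding it. This captures the spirit of the run-length step, not the exact symbol format that the baseline Huffman coder works on.

    import numpy as np

    def zigzag_order(n=8):
        order = []
        for s in range(2 * n - 1):                        # anti-diagonals where i + j = s
            rows = range(s + 1) if s % 2 else range(s, -1, -1)
            for i in rows:
                j = s - i
                if i < n and j < n:
                    order.append((i, j))
        return order

    def run_length(ac_values):
        pairs, zeros = [], 0
        for v in ac_values:
            if v == 0:
                zeros += 1
            else:
                pairs.append((zeros, v))
                zeros = 0
        pairs.append("EOB")                               # end of block: only zeros remain
        return pairs

    Sq = np.zeros((8, 8), dtype=int)                      # a mostly-zero quantized block
    Sq[0, 0], Sq[0, 1], Sq[1, 0], Sq[2, 2] = 52, -3, 2, 1

    vector = [Sq[i, j] for i, j in zigzag_order()]
    print(run_length(vector[1:]))                         # AC coefficients only: [(0, -3), (0, 2), (9, 1), 'EOB']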

Expanded Lossy DCT-based Mode

In addition to the sequential method described previously, JPEG specifies progressive encoding. Instead of using just one quantization pass, progressive encoding supports several passes which are applied iteratively. The coarser the quantization used in a pass, the less definition is encoded: a single pass with fine quantization steps gives (apart from the rounding error) a fairly accurate decompressed image, whereas a pass with very coarse quantization gives only a rough approximation. However, if several passes are applied to the same DCT coefficients, each refining the precision of the previous one, then as the image is being decompressed it will be possible to gradually discern more and more definition. The major advantage is that if the image is being downloaded over a slow network connection, the user can see what is in the image sooner than if sequential encoding had been used. The user can then interrupt the download if the image is not what s/he was expecting.
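The toy example below illustrates this coarse-to-fine idea on a single coefficient. It is not the exact bit-level refinement scheme of the standard; the coefficient value and step sizes are invented for the illustration.

    # One DCT coefficient refined over several passes (illustrative values).
    coefficient = 173
    steps = [64, 16, 4, 1]                  # coarse-to-fine quantization step sizes

    reconstruction = 0
    for step in steps:
        residual = coefficient - reconstruction
        correction = round(residual / step) * step
        reconstruction += correction
        print(f"after step size {step:>2}: decoder holds {reconstruction}")
    # prints 192, 176, 172, 173: each pass brings the decoder closer to the true value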

Lossless Mode

This mode is used when it is necessary for the decoded image to be identical to the original. Compression ratios are typically only about 2:1. Rather than grouping the pixels into 8x8 blocks, data units are equivalent to single pixels. Image processing and quantization use a predictive technique rather than a transformation-based one. For a pixel X in the image, one of 8 possible predictors is selected (see the table below and the sketch that follows it). The predictor selected is the one which gives the best result from the a priori known values of the pixel's neighbours A (to the left of X), B (above X), and C (above and to the left of X). The number of the predictor as well as the difference between the prediction and the actual value is passed to the subsequent entropy encoding.

Selection value    Prediction
0                  No prediction
1                  X = A
2                  X = B
3                  X = C
4                  X = A + B - C
5                  X = A + (B - C)/2
6                  X = B + (A - C)/2
7                  X = (A + B)/2
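As a sketch of the predictive step, the snippet below computes the prediction and difference for a single pixel using each entry of the table above. The neighbour values are invented for the example, and Python integer division stands in for the exact integer arithmetic of the standard.

    def predict(selection, A, B, C):
        return {
            0: 0,                 # no prediction
            1: A,
            2: B,
            3: C,
            4: A + B - C,
            5: A + (B - C) // 2,
            6: B + (A - C) // 2,
            7: (A + B) // 2,
        }[selection]

    # One interior pixel X and its already-decoded neighbours (invented values):
    A, B, C, X = 100, 104, 98, 103
    for sel in range(8):
        p = predict(sel, A, B, C)
        print(f"predictor {sel}: prediction {p}, difference {X - p}")
    # The (selection value, difference) pair is what goes on to entropy encoding.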

Hierarchical Mode

This mode uses either the lossy DCT-based algorithms or the lossless compression technique. The main feature of this mode is the encoding of the image at different resolutions. The image is first downsampled by a factor of 2^n in each dimension and encoded. The resolution is then increased step by step: at each step, the previously encoded lower-resolution image is decoded, upsampled by a factor of 2 vertically and horizontally (to a resolution reduced by 2^(n-1), then 2^(n-2), and so on), and subtracted from the image at the new resolution; the difference is what gets encoded. The process is repeated until the full resolution of the image has been compressed.
Hierarchical encoding requires considerably more storage capacity, but the compressed image is immediately available at the desired resolution. Therefore, applications working at lower resolutions do not have to decode the whole image and then subsequently reduce the resolution.
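The sketch below shows this difference-pyramid idea under simplifying assumptions: 2x2 averaging for downsampling, pixel replication for upsampling, and no actual coding of the levels (the DCT-based or lossless coding of each frame is left out to keep the structure visible).

    import numpy as np

    def downsample(img):
        h, w = img.shape
        return img.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))

    def upsample(img):
        return img.repeat(2, axis=0).repeat(2, axis=1)

    image = np.random.randint(0, 256, (64, 64)).astype(float)

    # Reduced-resolution versions: full, 1/2, 1/4, 1/8.
    pyramid = [image]
    for _ in range(3):
        pyramid.append(downsample(pyramid[-1]))

    # The coarsest level is coded on its own; each finer level is coded as the
    # difference between it and the upsampled reconstruction of the level below.
    reconstruction = pyramid[-1]
    for finer in reversed(pyramid[:-1]):
        predicted = upsample(reconstruction)
        difference = finer - predicted               # this difference is what gets compressed
        reconstruction = predicted + difference      # the decoder rebuilds the same frame

    assert np.allclose(reconstruction, image)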



Date last amended: 2nd September 2002