Overview of JPEG
As mentioned earlier, JPEG actually involves 5 different compression methods.
Let's look at the process of creating a JPEG, start to finish.
1. Sub-sampling. First up, you have to know that most computer images
are stored as "RGB", which stands for red/green/blue. In a 24-bit
image, each color component is assigned 8 bits, for a total of 256 different
possible values. That means that every color you see on your monitor is
constructed by picking one of 256 values of pure red, one of 256 of pure green,
one of 256 of pure blue, and then combining those colors into one. 256 x 256 x
256 = ~ 16 million, so your "average" computer image has 16 million
colors to choose from for each pixel. OK, so what? Well, JPEG doesn't use RGB,
strictly speaking. The first step in JPEG is converting RGB to another
"color space" called YCC. YCC is very similar to the color scheme used
by color televisions. The 'Y' component is the black-and-white signal. If you
looked at the Y component alone, it would look just like a B&W version of
your original image. The two 'C' color components ('Cr' and 'Cb' are their
actual names) represent the color in the image. Collectively, the 'C' components
are called the "chrominance," and the 'Y' is called the
'luminance.'
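If you're curious what that conversion looks like in practice, here's a minimal
sketch in Python, using the BT.601-style coefficients commonly used for
JPEG/JFIF files (the function name is just for illustration):

    # Convert one RGB pixel (values 0..255) to YCbCr.
    def rgb_to_ycbcr(r, g, b):
        y  =  0.299    * r + 0.587    * g + 0.114    * b
        cb = -0.168736 * r - 0.331264 * g + 0.5      * b + 128
        cr =  0.5      * r - 0.418688 * g - 0.081312 * b + 128
        return y, cb, cr

    # A pure gray pixel carries no color information: Cb and Cr sit at the
    # midpoint value of 128, and Y is just the brightness.
    print(rgb_to_ycbcr(200, 200, 200))   # (200.0, 128.0, 128.0)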
OK, back to JPEG. The first step in the process is to convert from RGB to
YCC. This step is lossless, and does not involve any actual compression. But
right after converting to YCC, JPEG does do some compression. First up, you
split the image into 16x16 blocks of pixels, called 'macroblocks.' Each 16x16
macroblock consists of 4 smaller blocks of 8x8 pixels each. Here's where the
compression comes in: JPEG "subsamples" the chrominance blocks. You
should have 4 blocks of Y and 4 blocks of Cr and 4 blocks of Cb per macroblock,
right? Well, JPEG throws away 3 of the Cr blocks, stretching 1 8x8 block of Cr
to fit the whole 16x16 macroblock. The same is true for the Cb block. So you
started with 12 blocks of data per macroblock (4 each of Y, Cr, and Cb), but now
you only have 6 blocks (4 of Y, 1 of Cr, and 1 of Cb). Right off the bat, JPEG
has achieved a 2-to-1 compression...at the cost of throwing away some
information, though. Is this good or bad?
It turns out that your eye is very sensitive to changes in luminance, but not
so sensitive to changes in chrominance. As a result, you can "stretch"
the color information from an 8x8 block to fit an entire 16x16 macroblock
without your eye being able to detect the difference.
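Here's a rough sketch of that subsampling in Python. In practice most encoders
average each 2x2 group of chroma samples rather than literally keeping one
block and discarding three, but either way you end up with one 8x8 block of Cb
and one of Cr per 16x16 macroblock:

    import numpy as np

    # One macroblock's worth (16x16) of Cb or Cr samples in, 8x8 out.
    def subsample_420(chroma):
        c = np.asarray(chroma, dtype=float)
        # Average each 2x2 neighborhood down to a single sample.
        return (c[0::2, 0::2] + c[0::2, 1::2] +
                c[1::2, 0::2] + c[1::2, 1::2]) / 4.0

    cb = np.random.randint(0, 256, size=(16, 16))
    print(subsample_420(cb).shape)   # (8, 8) -- a quarter of the samples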
2. DCT. DCT stands for "discrete cosine transform,"
and this is the heart of the JPEG algorithm. If you're familiar with a
Fourier transform, DCT is the same idea except that it uses only cosines for the
basis functions. If you've never encountered Fourier transforms, the basic idea
is that you can transform a signal to determine what frequency components make
up that signal. Think of striking two notes on a piano at the same time; your
ear can detect that there are two different notes in there, and if you have a
good ear you can figure out which exact notes. A Fourier transform is a
mathematical way of performing the same process. DCT does a similar thing to the
blocks of image data, except that it changes a 2-D block of pixels into a 2-D
block of frequency components. The DCT stage of JPEG is "almost"
lossless, but it does have some losses. In principle, you need
infinite-precision arithmetic to do this sort of conversion exactly. But JPEG
implementations work with finite-precision (often integer) math on just 64
values (i.e. an 8x8 block), so the DCT introduces small rounding errors.
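To make the DCT concrete, here's the 8x8 forward DCT written out straight from
the textbook formula in Python. Real encoders use much faster factored
versions (and first subtract 128 from each sample so values center on zero),
but the result is the same kind of 8x8 block of frequency coefficients:

    import math

    # out[0][0] is the "DC" term (the block's average level); the other 63
    # coefficients represent progressively higher spatial frequencies.
    def dct_8x8(block):
        def c(k):                       # normalization factor
            return 1 / math.sqrt(2) if k == 0 else 1.0
        out = [[0.0] * 8 for _ in range(8)]
        for u in range(8):
            for v in range(8):
                s = 0.0
                for x in range(8):
                    for y in range(8):
                        s += (block[x][y]
                              * math.cos((2 * x + 1) * u * math.pi / 16)
                              * math.cos((2 * y + 1) * v * math.pi / 16))
                out[u][v] = 0.25 * c(u) * c(v) * s
        return out

    # A perfectly flat block has only a DC component; everything else is ~0.
    flat = [[100] * 8 for _ in range(8)]
    print(round(dct_8x8(flat)[0][0]))   # 800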
3. Quantization. This is where the major compression takes place
in JPEG. After the DCT stage, you have an 8x8 block of frequency components.
Then you build an 8x8 "quantization" block, which is just an 8x8 block
of numbers. You take each number in the 8x8 DCT block and divide it by the
corresponding number in the quantization block. Obviously, the larger the number
in the quant block, the smaller the resulting dividend will be. But wait,
there's more. The division is an integer process. For instance, if you divide 8
by 3, normally you'd expect 2.6666 to be your answer. But in 'integer math,' the
answer is 2 with a remainder of 2...except we don't keep remainders, so 8 / 3 =
2 in integer math. That's how the quantization is done. But there's an
implication there: any dividend that is smaller than 1 will be set to zero.
Later on, the zeros will be discarded.
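A small sketch of that step in Python, following the integer-division
description above (real encoders usually round to the nearest integer rather
than truncating, but the effect is the same: coefficients smaller than their
divisors collapse to zero):

    # Divide each DCT coefficient by the matching entry in the quant block,
    # keeping only the integer part.
    def quantize(dct_block, quant_block):
        return [[int(dct_block[u][v] / quant_block[u][v])
                 for v in range(8)]
                for u in range(8)]

    print(int(8 / 3))    # 2  (the remainder is dropped)
    print(int(5 / 16))   # 0  (small coefficient, big divisor -> gone)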
The quantization stage is where you get to pick your image quality. When you
choose "high quality," the software supplies a quantization block
whose values are close to 1, so fewer coefficients get truncated to
zero. When you choose "low quality," the software supplies a
quantization block with large values, so more and more cells of image data get
divided down to zero. This is the reason that it's difficult to compare the image
quality settings of different programs; JPEG doesn't specify the values of the
quantization blocks, so software developers are free to choose their own. The
quant block gets written into the final JPEG file so the image-display program
will know exactly what values were used in the quant block -- but the actual
values are up to the software developers. This gives the developers tremendous
freedom, but it also means that two different software packages can have
radically different quant blocks (and thus radically different quantization
results), even at the same quality setting.
The quantization step is the most lossy of all the steps, but how lossy it is
depends entirely on the quant block, i.e. on the image quality setting.
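As one example of how a quality slider can be turned into a quant block, here's
(roughly) the scaling convention used by the IJG libjpeg reference code; other
programs do it differently, which is exactly why the same 'quality' number
means different things in different programs:

    # Scale a base quant block by a 1..100 quality setting.
    def scale_quant_table(base_table, quality):
        quality = max(1, min(100, quality))
        scale = 5000 // quality if quality < 50 else 200 - 2 * quality
        return [[min(255, max(1, (q * scale + 50) // 100)) for q in row]
                for row in base_table]

    # At quality 50 the base table is used as-is; at quality 100 it collapses
    # toward all 1s (almost nothing gets truncated); at quality 10 the
    # divisors get huge and most coefficients go to zero.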
4. Run-length Encoding. This is the step that removes the zeros from
the quantized DCT data. Run-length encoding is a very old idea. Here's an
example: let's say we have a data stream like this: ABBBAACCCCD. Instead of
writing that stream directly, you could write A3B2A4CD, which you should read as
"A" then "3 Bs" then "2 As" then "4 Cs"
then "D." Our original stream of data was 11 bytes long, and the
compressed one is just 8 bytes long. We saved 3 bytes. Clearly, though, the
savings are dependent on how often you have repeated strings of values, and how
long those strings are. You want longer strings of repeated values to make
run-length encoding work better. Well, JPEG really concerns itself only with
strings of zeroes, so in order to get good performance from the run-length
encoding stage, you want long strings of zeroes. JPEG "encourages"
this with the quant matrix. If you use larger and larger values as you go
"down" and "across" the matrix, you pretty much ensure that
you'll get a general trend of more and more zeroes in the process. This is
"OK" from a visual standpoint, because the later values represent
higher-frequency components. In other words, the DCT values that get truncated
are the ones that allow for very sharp transitions between one color (or
brightness) and the next in adjacent pixels. This is why your images start to
look blurry at lower-quality settings; you are throwing away the information
that allows them to be sharp!
Run-length encoding is lossless.
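Here's the toy run-length encoder from the example above, sketched in Python
(JPEG's own scheme is specialized for runs of zeros, but the principle is
identical):

    # Runs of two or more identical symbols become <count><symbol>;
    # single symbols are written as-is.
    def run_length_encode(data):
        out = []
        i = 0
        while i < len(data):
            j = i
            while j < len(data) and data[j] == data[i]:
                j += 1
            run = j - i
            out.append(data[i] if run == 1 else "%d%s" % (run, data[i]))
            i = j
        return "".join(out)

    print(run_length_encode("ABBBAACCCCD"))   # A3B2A4CD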
5. Huffman Coding. This is the last stage, and it's difficult to
explain. Basically, Huffman coding squeezes out statistical redundancy in the data.
It does this by using short bit-strings to represent commonly occurring values,
and longer strings to represent less common values. Imagine how much space you
could save if you could store the letter 'e' in half the space! Well, Huffman
coding is a way to do just that. An example: ASCII coding is the
more-or-less universal method for storing text written in English. It uses 7
bits to store each & every character. In ASCII coding, a lower-case 'e' has
a value of 101, which in binary is 1100101. What if we chose to represent 'e'
with a single binary bit, say '1', instead of the 7-bit string listed above?
Great! But then we're stuck, aren't we, because we could only have two letters
in our alphabet, right? Well, no. The next most common letter is 't'. What if we
used '01' for 't'? And how about '001' for 'a'? And '0001' for 'o'? (Notice that
no code is the beginning of any other code, so the stream can still be decoded
unambiguously.) You get
the idea. By using smaller bit-strings to represent more common values, you can
use fewer bits overall to encode the whole data stream. JPEG uses Huffman coding
to encode the run-length values from stage 4. It's a small increase in
compression, maybe 10-20% at best, but it's cheap, easy, and better than
nothing.
Huffman coding is lossless.
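To see the idea in code, here's a small sketch of building a Huffman code in
Python. Common symbols end up with short bit-strings, rare ones with long
bit-strings, and no code is a prefix of another, so the stream decodes
unambiguously:

    import heapq
    from collections import Counter

    # Repeatedly merge the two least-frequent groups, prepending '0' to the
    # codes on one side and '1' to the codes on the other.
    def huffman_code(data):
        heap = [(freq, i, {sym: ""})
                for i, (sym, freq) in enumerate(Counter(data).items())]
        heapq.heapify(heap)
        tiebreak = len(heap)
        while len(heap) > 1:
            f1, _, lo = heapq.heappop(heap)
            f2, _, hi = heapq.heappop(heap)
            merged = {s: "0" + code for s, code in lo.items()}
            merged.update({s: "1" + code for s, code in hi.items()})
            heapq.heappush(heap, (f1 + f2, tiebreak, merged))
            tiebreak += 1
        return heap[0][2]

    codes = huffman_code("this is an example of a huffman tree")
    print(codes[" "], codes["x"])   # space gets a short code, 'x' a long one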
So there's JPEG in a nutshell. There are 5 stages in the compression. 3 are
lossless (counting the DCT, whose rounding losses are tiny), 2 are lossy, and of
the 2 lossy stages you can control the 'lossiness' in only one: the quantization step.