Graphics file formats and file compression

Graphics files tend to be large. If you remember the earlier material, this is easily explained: 2 bytes per pixel, times thousands (or millions) of pixels. Yup, they will fill your hard drive VERY QUICKLY if you don't do something. Quick example: a 4x6" photo scanned at 300 dpi in 16 bit color: 2 x 300x300 x 24 (bytes per pixel, dots per square inch, 24 square inches) = 4.32 MB.
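If you want to check that arithmetic (or try other scan settings), here is a minimal Python sketch. The function name raw_image_bytes and its parameters are made up for this illustration; they are not part of any library.

```python
def raw_image_bytes(width_in, height_in, dpi, bytes_per_pixel):
    """Uncompressed size, in bytes, of an image scanned at the given settings."""
    # Pixels across times pixels down gives the total pixel count.
    pixels = (width_in * dpi) * (height_in * dpi)
    return pixels * bytes_per_pixel


# The example from the text: 4x6 inch photo, 300 dpi, 16 bit (2 byte) color.
size = raw_image_bytes(4, 6, 300, 2)
print(f"{size:,} bytes = {size / 1_000_000:.2f} MB")  # 4,320,000 bytes = 4.32 MB
```

Try doubling the dpi to 600 and the size goes up by a factor of four, since the pixel count grows in both directions.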

So - what do we do about it? The answer is compression, which comes in two types: lossy and lossless. The basic difference is that lossy compression throws away some information to get a smaller file, while lossless compression gives you back exactly what you started with.
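To make the lossy/lossless distinction concrete, here is a toy sketch in Python. The lossless half uses the standard zlib module. The "lossy" half just drops the low 4 bits of each value, which is my own simplification for illustration and NOT how real image formats work. The sample data is invented.

```python
import zlib

# Invented sample data: 8 bit values standing in for pixel brightness.
pixels = bytes(range(256)) * 100

# Lossless: compress, decompress, and you get back exactly what you put in.
packed = zlib.compress(pixels)
restored = zlib.decompress(packed)
print(restored == pixels)  # True

# Lossy (toy version): keep only the top 4 bits of each value.
# The low bits are gone for good; there is no way to recover them.
coarse = bytes(p & 0xF0 for p in pixels)
print(coarse == pixels)  # False

max_error = max(abs(a - b) for a, b in zip(pixels, coarse))
print(max_error)  # 15
```

Each value is off by at most 15 out of 255, which is the whole idea of lossy compression: a small error you (hopefully) can't see, in exchange for less data to store.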

Let's start by discussing lossless compression. While there are a number of options, the big one is 'zip'. These days, it's built into Windows: right click on a file or group of files/folders and select send to - compressed (zipped) folder. To unzip, just use Windows Explorer to open the zipped folder and copy/paste the files to a location outside the zipped file. Simple and easy. Zipping has two valuable properties/uses: first, a reduction in file size without losing anything (anywhere from 20% to 90% smaller, depending on the file) and second, you can zip multiple files into a single .zip file. This makes life much simpler when transferring files (especially over the web) - only one file to move, and it's smaller. The bummer is that you cannot directly work with zipped files. You have to uncompress them first. Thus, zipping is great for backups or transfers, but not for working with the files. A number of files you will use at CWU are zip files.
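The same zip/unzip steps can also be scripted. Here is a minimal sketch using Python's standard zipfile module; the helper names (zip_files, unzip) and the sample file names are made up for this example, and the demo works in a throwaway temporary folder.

```python
import os
import tempfile
import zipfile


def zip_files(paths, zip_path):
    """Bundle several files into one compressed .zip archive."""
    with zipfile.ZipFile(zip_path, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for path in paths:
            # arcname stores just the file name, not the full folder path.
            zf.write(path, arcname=os.path.basename(path))


def unzip(zip_path, dest_dir):
    """Extract everything in the archive into dest_dir."""
    with zipfile.ZipFile(zip_path) as zf:
        zf.extractall(dest_dir)


# Demo: three repetitive text files (repetitive data compresses very well).
workdir = tempfile.mkdtemp()
names = []
for i in range(3):
    path = os.path.join(workdir, f"notes{i}.txt")
    with open(path, "w") as f:
        f.write("the same line over and over\n" * 2000)
    names.append(path)

archive = os.path.join(workdir, "bundle.zip")
zip_files(names, archive)

original = sum(os.path.getsize(p) for p in names)
zipped = os.path.getsize(archive)
print(f"original: {original} bytes, zipped: {zipped} bytes")

out_dir = os.path.join(workdir, "unzipped")
unzip(archive, out_dir)
print(sorted(os.listdir(out_dir)))
```

Don't expect savings this dramatic on real data. These sample files are about as compressible as it gets; already-compressed files will barely shrink at all.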

As you go forth on the web, you will run into two other related file types - .tar and .gz (often combined as .tar.gz). These are, basically, the unix equivalent of zip files: .tar bundles multiple files into one (without compressing anything) and .gz applies the lossless compression. The properties are all similar to zip files; however, Windows has traditionally not handled them on its own. To unpack these files, use 7-Zip (loaded in the lab - free software).
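If you ever need to handle these from a script rather than a desktop tool, Python's standard tarfile module reads and writes them. This is a minimal sketch; the file names are invented, and the demo builds its own small .tar.gz in a temporary folder so there is something to unpack.

```python
import os
import tarfile
import tempfile

workdir = tempfile.mkdtemp()

# Make a small sample file to archive.
src = os.path.join(workdir, "readme.txt")
with open(src, "w") as f:
    f.write("hello from a tarball\n")

# Mode "w:gz" bundles the files with tar, then compresses with gzip.
tar_path = os.path.join(workdir, "data.tar.gz")
with tarfile.open(tar_path, "w:gz") as tar:
    tar.add(src, arcname="readme.txt")

# Mode "r:gz" reads the archive back and unpacks it.
out_dir = os.path.join(workdir, "extracted")
os.makedirs(out_dir, exist_ok=True)
with tarfile.open(tar_path, "r:gz") as tar:
    print(tar.getnames())  # ['readme.txt']
    tar.extractall(out_dir)
```

As with any archive, only unpack tar files that come from a source you trust.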

Now, on to lossy compression as it applies to graphics files. Think about the example above. The raw files are huge.

Both .tif and .bmp files are normally stored uncompressed (the .tif format does allow optional compression, but don't count on it). In other words, huge. They are also industry standard file formats - basically every application can read them.

What about compressed files? The two most common (and industry standard) are .jpg and .gif. Almost every graphic you see on the web is one of these two. Why, you might ask? Basically because compressed files download much faster. How much? Well, an order of magnitude (10x) compression without visible change is typical for a photo. In other words, that .tif file we discussed above (the scanned photo) would only be about 430 KB as a .jpg file. And, at this level of compression, the human eye typically cannot tell the difference. As a rule of thumb, I usually use about a 15-20% compression ratio - works great.

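What's the difference between .gif and .jpg files? For practical purposes, there are two. First, color depth: .gif is limited to 8 bit color (a palette of at most 256 colors), while .jpg stores full 24 bit color. Second, .gif compresses whatever it stores losslessly, while .jpg compression is lossy. Put a photo in a .gif and it loses a lot of color; put sharp-edged line art in a .jpg and it tends to pick up smudgy artifacts. So - clipart-type stuff (only a couple of colors) is usually .gif format - photos are pretty much always .jpg.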
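To put bit depth in perspective, here is a quick sketch of how many distinct colors fit in a given number of bits per pixel. The list of depths is just my choice for illustration.

```python
# Each extra bit doubles the number of distinct values a pixel can hold.
depths = [("8 bit", 8), ("16 bit", 16), ("24 bit", 24)]

for name, bits in depths:
    print(f"{name}: {2 ** bits:,} colors")

# 8 bit: 256 colors
# 16 bit: 65,536 colors
# 24 bit: 16,777,216 colors
```

That jump from 256 colors to thousands or millions is why an 8 bit format is fine for clipart but rough on photos.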

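In summary: .tif (uncompressed, or optionally lossless), .bmp (uncompressed), .jpg (lossy) 24 bit, .gif 8 bit (the compression itself is lossless, but color is lost in cutting down to 256 colors), Zip compression (lossless).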

One more comment - in the GIS/remote sensing world, we work with two other types of compressed files: MrSID (.sid) and ECW (.ecw, from ER Mapper). Both use wavelet compression (don't ask....) and get very high compression ratios without apparent changes to the image. I just want to expose y'all to these names, as you will run into them in the future.