Graphics file formats and file compression

Graphics files tend to be large. If you remember the lessons on print files, this is easily explained: 2 bytes per pixel, thousands upon thousands of pixels... Yup, they will fill your hard drive VERY QUICKLY if you don't do something. Quick example: a 4x6" photo scanned at 300 dpi in 16-bit color: 2 x (300 x 300) x 24 (bytes per pixel, dots per square inch, 24 square inches) = 4,320,000 bytes, or 4.32 MB. Uncompressed, you couldn't put even one photo onto a floppy disk.
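The arithmetic above can be sketched in a few lines of Python. This is only an illustration: the function name is made up here, and the floppy capacity (the usual "1.44 MB" disk, 1,474,560 bytes) is an assumption, since the text does not give one.

```python
def uncompressed_size_bytes(width_in, height_in, dpi, bytes_per_pixel):
    """Raw image size: pixels across x pixels down x bytes per pixel."""
    pixels_wide = width_in * dpi
    pixels_high = height_in * dpi
    return pixels_wide * pixels_high * bytes_per_pixel

# Assumption: a standard "1.44 MB" floppy holds 1,474,560 bytes.
FLOPPY_BYTES = 1_474_560

# The 4x6" photo from the text: 300 dpi, 16-bit color (2 bytes per pixel).
size = uncompressed_size_bytes(4, 6, 300, 2)
print(size, "bytes")
print("fits on one floppy:", size <= FLOPPY_BYTES)
```

Change the dpi or the bytes per pixel to see how quickly the numbers grow: doubling the dpi quadruples the file size.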

So - what do we do with it? The answer is compression, which comes in two types: lossy and lossless. The basic difference is that lossy compression throws away some information to make the file smaller, while lossless compression gives you back exactly what you put in.

Let's start by discussing lossless compression. While there are a number of options, the big one is 'zip'. In the lab, we use the application WinZip. WinZip has two valuable uses: first, it reduces file size without losing anything (files shrink by 20-90%, depending on the file), and second, it lets you zip multiple files into a single .zip file. This makes life much simpler when transferring files (especially over the web) - only one file to move, and it's smaller. The bummer is that you cannot work directly with zip files. You have to uncompress them first. Thus, zipping is great for backups or transfers, but not for working with the files. A number of files you will use at CWU are zip files. Play around with WinZip a little - you'll be using it (or something similar) for years.
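If you'd like to see both properties of zip for yourself, here is a minimal sketch using Python's standard zipfile module rather than WinZip. The file names and contents are made up for the illustration, and everything happens in memory so nothing is written to disk.

```python
import io
import zipfile

# Hypothetical file names and contents, made up for this illustration.
files = {
    "notes.txt": b"GIS lab notes. " * 500,
    "roads.csv": b"id,name,length\n" + b"1,Main St,2.5\n" * 500,
}

# Write both files into one in-memory .zip archive using DEFLATE compression.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w", compression=zipfile.ZIP_DEFLATED) as archive:
    for name, data in files.items():
        archive.writestr(name, data)

original_size = sum(len(data) for data in files.values())
zipped_size = len(buffer.getvalue())
print(f"original: {original_size} bytes, zipped: {zipped_size} bytes")

# Unzip and confirm every byte came back unchanged (that's "lossless").
with zipfile.ZipFile(buffer) as archive:
    restored = {name: archive.read(name) for name in archive.namelist()}
print("identical after round trip:", restored == files)
```

The sample data here is very repetitive, so it shrinks far more than a typical file would. Real-world savings depend on the file.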

Update - the basic zip functionality is now built into Windows. Right click on a file/folder, then select Send to - Compressed (zipped) folder. This will toss however many files you have selected into a single, compressed file. While you would think this would eliminate the need for WinZip (or something similar), you would be wrong. Remember UNIX workstations? The ones that many GIS users use? Well, files on those systems are usually bundled as .tar archives (often gzip-compressed, as .tar.gz), which the built-in Windows zip tool can't handle.... To get around this, use WinZip or a competitor like 7-Zip (free to download and use).
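As an aside, if Python is installed, its standard tarfile module is another way to get at these UNIX-style archives. The sketch below builds a small gzip-compressed tar archive in memory and reads it back. The file name and contents are made up for the illustration.

```python
import io
import tarfile

payload = b"elevation,slope\n" * 200  # hypothetical file contents

# Build a gzip-compressed tar archive in memory ("w:gz" means tar + gzip).
buffer = io.BytesIO()
with tarfile.open(fileobj=buffer, mode="w:gz") as archive:
    info = tarfile.TarInfo(name="terrain.csv")  # hypothetical file name
    info.size = len(payload)
    archive.addfile(info, io.BytesIO(payload))

# Read it back, as you would with a .tar.gz copied from a UNIX workstation.
buffer.seek(0)
with tarfile.open(fileobj=buffer, mode="r:gz") as archive:
    names = archive.getnames()
    extracted = archive.extractfile("terrain.csv").read()

print("files in archive:", names)
print("contents unchanged:", extracted == payload)
```

For an archive on disk, tarfile.open("somefile.tar.gz") followed by a read loop works the same way. Note that tar only bundles the files together; the gzip step is what does the compressing.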

Now, on to lossy compression as it applies to graphics files. Think about the example above. The raw files are huge.

Both .tif and .bmp files are normally saved uncompressed (a .tif can optionally use lossless compression, but often doesn't). In other words, huge. They are also industry standard file formats - basically every application can read them.

What about compressed files? The two most common (and industry standard) are .jpg and .gif. Almost every graphic you see on the web is one of these two. Why, you might ask? Basically because compressed files download much faster. How much? Well, an order of magnitude (10x) compression without visible change is typical for a photo. In other words, that .tif file we discussed above (the 4.32 MB scanned photo) would only be about 430k in size as a .jpg file. And, at this level of compression, the human eye typically cannot tell the difference. As a rule of thumb, I usually use about a 15-20% compression ratio - works great.

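What's the difference between .gif and .jpg files? For practical purposes, the big difference is color depth: a .gif is limited to 8-bit color (a palette of 256 colors), while a .jpg stores full 24-bit color. The .gif compression itself is lossless, but squeezing a photo down to 256 colors loses a lot of color. The .jpg compression is lossy, but tuned so that photos still look right. So - clipart-type stuff (only a couple of colors) is usually .gif format - photos are pretty much always .jpg.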
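To see why bit depth matters so much for color, count how many distinct colors each depth can represent: every extra bit doubles the number. A quick sketch (the function name is made up here):

```python
def color_count(bits_per_pixel):
    """Number of distinct colors that fit in the given number of bits."""
    return 2 ** bits_per_pixel

for bits in (8, 16, 24):
    print(f"{bits}-bit color: {color_count(bits):,} colors")
```

With only 256 colors to choose from, an 8-bit image handles flat clipart fine but cannot reproduce the smooth gradations in a photo.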

In summary: .tif (lossless), .bmp (lossless), .jpg (lossy, 24-bit color), .gif (8-bit color - the compression is lossless, but colors are lost when reducing to 256), zip compression (lossless). Note that zip is useful both for compressing files and for combining multiple files into a single file.

One more comment - in the GIS/remote sensing world, we work with two other types of compressed files: MrSID (.sid) and ECW (.ecw, by ER Mapper). Both use wavelet compression (don't ask....) and get very high compression ratios without apparent changes to the image. I just want to expose y'all to these names, as you will run into them in the future.