Downloadable formats
Posted: September 12th, 2005 | Author: amake | Filed under: Software

I wasn’t sure what to call this topic. Software delivery vectors? Software download file formats? Anyway, what I want to talk about is the form a piece of software is in when you download it.
First of all, why is software put into some special format for transfer? There are a few reasons.
- To save space. Back in the day, the difference between 10 KB and 1 KB was massive. If you don’t want to wait all day for your several-hundred-baud modem to download that text file, or if you want to squeeze the most out of your several-hundred-KB floppy, compression is your friend.
- To bundle files together. Include a readme or a license file with your executable. There’s nothing like good documentation that people never actually read.
- To preserve resource forks. On the Macintosh platform, data can be stored both in the regular data fork and in a second fork called the “resource fork.” Other platforms mostly have no such concept, so before a file travels over the internet, all of its data needs to be packed into the data fork. Compression formats take care of that. (Note that, for instance, OS X’s built-in zip compression scheme has special extensions to handle resource forks.)
Over the years, dealing with these formats went from complicated to simple and back to complicated. Way back in the day, most people didn’t have tools like Stuffit, so a lot of software was distributed as self-expanding archives: basically an executable that would decompress an attached data payload. But you couldn’t count on everything being in this format, so you were often caught in the chicken-and-egg problem of “I need Stuffit, but the download of Stuffit is stuffed, so I need Stuffit to download Stuffit, but…”
Then Aladdin (now Allume; what idiot came up with that name?) wanted to entice people to shell out for Stuffit Deluxe, and handing out executables with your compressed data is inefficient anyway, so they managed to get the free Stuffit Expander bundled with all Macs. That ushered in the Stuffit era, where pretty much everything for Mac OS was delivered in .sit, or later .sitx, files.
But why should Allume get to suck the teat of the industry when there are open and more versatile compression formats out there? Apple introduced the Disk Image format late in the life of the “classic” Mac OS, and in OS X it has become the preferred method of software delivery, with the built-in zip solution coming in a close second.
Disk Images are interesting beasts. A disk image is basically a file with a filesystem inside it that gets mounted as a virtual disk when you open it. Disk images can be fixed in size or expandable (“sparse” disk images), compressed or uncompressed, encrypted or unencrypted. They support interesting features like the ability to display a license agreement before mounting, where disagreeing will prevent the image from being mounted. There’s also the “internet-enabled” feature, by which a disk image’s contents are copied out and the image thrown away when you open it. Since the result is basically a disk, developers can do cute things like setting custom background images to make things pretty or give instructions like “drag to the Applications folder.” In short, the .dmg (or .sparseimage) format is pretty nifty.
So how do developers manage to ruin it? They zip or gzip the disk image. This means that when I download such a beast, first I have to unzip it, and then I have to deal with the disk image. Why not just max out the disk image’s internal compression setting? That way everyone’s happy. The CLI command to do this is:
hdiutil create -imagekey zlib-level=9 -srcfolder "$SOURCE" "$TARGET"
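If you build images often, the command can be wrapped in a small shell function. This is only a minimal sketch: `make_dmg` is a made-up name, the folder check is my own addition, and the only `hdiutil` flags used are the ones from the command above.

```shell
#!/bin/sh
# make_dmg SOURCE_FOLDER TARGET_IMAGE
# Hypothetical helper: builds a disk image from a folder using the
# maximum internal zlib compression level. hdiutil is macOS-only.
make_dmg() {
    src=$1
    target=$2

    # Refuse to run if the source is not an existing folder.
    if [ ! -d "$src" ]; then
        echo "make_dmg: source folder not found: $src" >&2
        return 1
    fi

    # Quoting keeps paths with spaces intact.
    hdiutil create -imagekey zlib-level=9 -srcfolder "$src" "$target"
}
```

Calling it would look something like `make_dmg "My App" MyApp.dmg`, where both names are placeholders.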
I did a little experiment: I took a large text file and put it in a disk image with different compression settings. Then I tried also compressing the disk images. Here are the results:
| Format | Compression | Size (KB) |
|---|---|---|
| .txt | none | 5,359 |
| .dmg | zlib-level=1 (default) | 1,403 |
| .dmg | zlib-level=9 (max) | 1,207 |
| .zip | default | 937 |
| .dmg.zip | zlib-level=1 & default | 1,100 |
| .dmg.zip | zlib-level=9 & default | 930 |
Clearly, at least for plain text, the smallest file comes from using the max dmg internal compression and zipping on top of that. But most people seem to be using the default level and then zipping, and compared with that, max level without zipping is only about 10% larger (1,207 KB vs. 1,100 KB). For the convenience, I think it’s worth it, especially in today’s world of fat pipes and monstrous disks.
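For anyone who wants to check the arithmetic, here is a quick sketch of the percentage calculation using the sizes from the table. The helper name `pct_increase` is made up for illustration, and `awk` is used only because plain `sh` has no floating-point math.

```shell
#!/bin/sh
# Sizes in KB, copied from the table above.
dmg_max=1207          # .dmg, zlib-level=9, not zipped
dmg_default_zip=1100  # .dmg.zip, zlib-level=1 and then zipped

# pct_increase FROM TO
# Hypothetical helper: prints how much larger TO is than FROM, in percent.
pct_increase() {
    awk -v from="$1" -v to="$2" 'BEGIN { printf "%.1f\n", (to - from) / from * 100 }'
}

# Cost of skipping the zip step when the image is built at max level.
pct_increase "$dmg_default_zip" "$dmg_max"   # prints 9.7
```

The same function can be pointed at any other pair of rows in the table.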