Mac OS X WebArchive Extractor Utility

2007-09-18

The other day I wanted to use TextEdit to make simple web pages for documentation purposes. I didn’t really care which application I used, as long as it was more word processor-ish than a coding tool. TextEdit did most of what I needed by saving to HTML, but the catch was that I wanted to have images in the document.

TextEdit can save HTML with images in a file format called WebArchive, but in order to put the files on a web server for the world to view, the files need to be extracted from that format into a normal directory structure.
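To make "extracting" concrete, here is a minimal Python sketch of the idea. It is not the utility's actual code. It assumes a WebArchive is a property list whose `WebMainResource` entry holds the page bytes and whose `WebSubresources` array holds the images, each resource carrying `WebResourceURL` and `WebResourceData` keys. The function name `extract_webarchive` and the `index.html` output name are hypothetical choices for illustration, and nested frame archives are ignored.

```python
import os
import plistlib
from urllib.parse import urlparse


def extract_webarchive(archive_bytes, out_dir):
    """Hypothetical sketch: unpack a WebArchive plist into out_dir.

    Assumes the keys WebMainResource, WebSubresources, WebResourceURL and
    WebResourceData; subframe archives are not handled.
    """
    archive = plistlib.loads(archive_bytes)  # reads binary or XML plists
    os.makedirs(out_dir, exist_ok=True)

    # Write each subresource (e.g. an image) to a file named after the
    # last component of its URL. Same-named resources overwrite each other.
    local_names = {}
    for res in archive.get("WebSubresources", []):
        url = res["WebResourceURL"]
        name = os.path.basename(urlparse(url).path) or "resource"
        with open(os.path.join(out_dir, name), "wb") as f:
            f.write(res["WebResourceData"])
        local_names[url] = name

    # Point exact matches of each absolute URL in the page at the local file.
    # Longest URLs first, so a URL that is a prefix of another isn't replaced early.
    html = archive["WebMainResource"]["WebResourceData"]
    for url in sorted(local_names, key=len, reverse=True):
        html = html.replace(url.encode("utf-8"), local_names[url].encode("utf-8"))
    with open(os.path.join(out_dir, "index.html"), "wb") as f:
        f.write(html)

    return ["index.html"] + list(local_names.values())
```

Rewriting the `src` references is the step that matters for putting the result on a web server. This sketch only rewrites exact matches of each resource URL, so relative references in the page are left untouched.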

I found a utility on SourceForge that would extract the files from a WebArchive, but it seemed to mangle the src attribute of the image tags. Since the application was so close to what I wanted, I decided to tweak the code to my liking.

However, the source code provided on SourceForge was incomplete: it was missing the .nib files. So I decided to fork that project and put together my own WebArchive Extractor utility, using most of that project’s code as a base.

Once I had it all set up, it turned out the core worked correctly without my changing anything; the image references were being generated correctly. So either the binary distribution is not the same as the source, or I just messed something up when originally extracting.

Either way, my version works like I want it to, and it looks better ^_^

If you’re looking for such a thing, you can download it on the project page.

(Safari can also save whole web pages in WebArchive format, with images, CSS, JavaScript, and all, by selecting Save As… If you are learning something about a site, you can do that and then look over the site’s code at your leisure.)