Paranoid Backup

I’ve always said that there are two kinds of people: those who back up and those who will. After working on computers for as many years as I have, I’ve had a number of mishaps that keep me firmly in the first camp. Nothing as major as losing a whole hard drive of important data, but enough small losses to keep me constantly on edge. My backup plan has several layers, and here’s a peek into the mind of the paranoid backupper.

I actually have two backup plans, one for my personal data and one for my professional data. They’re different for a couple of reasons: my personal data is several times larger than my professional data; my personal data is all on Apple Macs while my professional work is on Windows; and most of my professional data is already versioned in source control. In this post I’m only going to cover my personal backup, and leave the professional one for another day.

I have 3 layers of backup:

  1. Hourly backups using Time Machine.
  2. Weekly backups using rsync and two rotating portable hard drives.
  3. Weekly off-site backups using a custom Python script and an FTP server.

Some details: I have 3 Macs to keep backed up and about 80 GB of data between them (including photos, e-mails, documents, and what-not), so I bought a 500 GB Time Capsule to replace my ageing wireless router. That works really well, and I’ve used it on more than one occasion to retrieve a past version of a file. The Time Machine interface is quite a bit of eye-candy (and pretty sluggish, even over Gigabit Ethernet), but it has gotten the job done on the rare occasions that I’ve needed it.

I also have two portable hard drives that I swap between my office and my shed every week, using rsync to back up the three Macs. With one copy always in the shed, I should be protected against all but the biggest disasters.
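
The rsync step itself is simple. Here’s a rough sketch of the kind of weekly run I mean, wrapped in a small Python script to match the rest of my tooling; the user names, paths, and drive name are placeholders, not my actual setup:

    #!/usr/bin/env python
    # Rough sketch of a weekly rsync backup run. The paths and
    # hostnames below are placeholders, not my real machines.
    import subprocess

    # One (source, destination) pair per Mac. Remote sources go
    # over ssh, e.g. "me@second-mac.local:/Users/me/".
    BACKUP_JOBS = [
        ("/Users/me/", "/Volumes/PortableA/backups/this-mac/"),
        ("me@second-mac.local:/Users/me/",
         "/Volumes/PortableA/backups/second-mac/"),
    ]

    for src, dest in BACKUP_JOBS:
        # -a preserves permissions and timestamps; --delete mirrors
        # deletions so the drive stays an exact copy of the source.
        subprocess.check_call(["rsync", "-a", "--delete", src, dest])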

My last layer is off-site. I tried a variety of cloud backup solutions but wasn’t happy with any of them; each had one flaw or another that I didn’t like. With Mozy, for example, restores were far too slow to be practical: I waited several hours just to retrieve a couple of files. I hear they’ve made improvements since, but I’ve already moved on. Mainly, with the commercial solutions I didn’t like having my backup in some format that I couldn’t access directly. That’s why I decided to write my own last year.

I wanted something that I could manage myself and that would use the lowest common denominator, FTP. I have a web server with GoDaddy that has unlimited disk space that I thought I’d use. I created a python script that finds any file that has changed, compresses it, encrypts it, and uploads it to the server. But with a twist. I take the MD5 hash and size of the file and turn that into a signature, and only upload the file if a file with that signature isn’t already on the server (the odds of a signature clash between two different files is astronomical.) Now, even if I copy a file from one computer to another it will only be uploaded once to my FTP server (which is handy when you only have 384 kbs upload speed.) I can move whole directories around and there’s no problem.
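
To make that concrete, here’s a minimal sketch of the signature-and-upload logic. The compression and encryption steps are left out for brevity; the server, credentials, and paths are placeholders, and my real script also keeps an index mapping signatures back to file names:

    #!/usr/bin/env python
    # Minimal sketch of signature-based de-duplicated upload.
    # Compression, encryption, and error handling are omitted;
    # the server, login, and paths are placeholders.
    import hashlib
    import os
    from ftplib import FTP

    def signature(path):
        # MD5 of the contents plus the file size; two different
        # files sharing both is astronomically unlikely.
        md5 = hashlib.md5()
        with open(path, "rb") as f:
            for chunk in iter(lambda: f.read(1 << 20), b""):
                md5.update(chunk)
        return "%s-%d" % (md5.hexdigest(), os.path.getsize(path))

    def backup(root, ftp):
        uploaded = set(ftp.nlst())  # signatures already on the server
        for dirpath, _, filenames in os.walk(root):
            for name in filenames:
                path = os.path.join(dirpath, name)
                sig = signature(path)
                if sig in uploaded:
                    continue  # same bytes already up there; skip
                with open(path, "rb") as f:
                    ftp.storbinary("STOR " + sig, f)
                uploaded.add(sig)

    ftp = FTP("ftp.example.com")  # placeholder host
    ftp.login("user", "secret")   # placeholder credentials
    backup("/Users/me/Documents", ftp)
    ftp.quit()

Because files are stored under their signature rather than their path, copying or moving a file never triggers a second upload; the index is what remembers where each file actually lives.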

There you have it, my paranoid backup. Just because you’re paranoid doesn’t mean someone isn’t trying to degauss your disk… or something like that.