My old backup routine was to occasionally scp the important files down to a local machine to archive to tape or CD.
In December of 2003, one of Acorn's upstream providers suddenly flaked out, closed up shop, and removed the physical computers containing this site (and others) from the data centers. That left us needing to move to a new server and restore from backups, which unfortunately were a couple of months old. Luckily the Acorn Hosting crew moved heaven and earth to get us back up and running with recent data, and I am thankful they expended the effort to do that.
After narrowly averting disaster, it was time to figure out a reliable and automated backup strategy. A wise friend pointed me to rsync, a remote file synchronization tool, as well as a page on easy automated snapshot-style backups with Linux and rsync. And it did look pretty easy.
What follows here is how I have things set up between the website computer (running a flavor of Linux) and my home systems (Mac OS X), but the techniques should be applicable to any unix system. This is just a simple "synchronize directories across two machines" kind of setup, not the more complicated snapshot backups referenced in the above link. What I end up with is a picture of the disk contents (as of 4 in the morning) saved to my home computer, which I can then archive to tape or a CD.
rsync keeps two directory trees synchronized by sending only the differences between files across the network. This means that the initial synchronization will pull all of the data down, but subsequent synchronizations will only send across the changes that have been made. Very little information is transferred if the majority of the data is unchanged, as it is with my websites. This is perfect both for doing backups and for staying under my bandwidth quota, which I would exceed if I moved all the files across the network every time.
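You can see just how little data moves on a repeat run by asking rsync for its transfer statistics. A quick sketch (the destination here is just a scratch directory, and -n keeps it a dry run so nothing actually gets written):

# -a archive mode, -v list the files, -n dry run, --stats print transfer totals
% rsync -avn --stats -e ssh bork@borkware.com:/home/bork/ /tmp/rsync-test

When the trees already match, the "Total transferred file size" line in the stats output stays near zero.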
Most files, like the website pageroots, can be rsynced "in place", meaning rsync will just pull down the files from where they sit. Other data is not as easily accessed, such as database contents (which need to be exported first) and some privileged configuration files that have to be read as root. That data will need to be put into a different form before being downloaded.
The easy stuff, like the pageroots and my $HOME directory, just needs an rsync command on the local system. The last two need some work.
Postgresql's pg_dumpall command will do a full, consistent export of the databases as a SQL script that can be fed to psql on another system to restore the database as it was at the time of the export. This isn't quite as nice as Oracle's point-in-time recovery (since I can lose up to 24 hours of database activity), but for the minimal amount of work involved, and for general disaster recovery, it's not bad.
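Restoring is just a matter of feeding that script back to psql. A rough sketch, assuming a freshly installed PostgreSQL on the new machine (the superuser name and the database you connect to for the restore can vary by installation):

# unpack the dump and replay it; pg_dumpall output recreates the
# users and databases before loading their contents
% gunzip pg-31-Dec-2003.dmp.gz
% psql -U postgres -f pg-31-Dec-2003.dmp template1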
This is my daily backup script, which I cleverly call pg-daily-dump.sh:

#!/bin/sh
LD_LIBRARY_PATH=/usr/local/pgsql/lib:/usr/local/lib
DATENAME=`date +"%d-%b-%Y"`
BASENAME="/home/bork/db-backup/pg-${DATENAME}.dmp"
pg_dumpall -U webuser > ${BASENAME}
gzip ${BASENAME}

This can probably be turned into a nice little one-liner, but I find this easy enough to understand. It puts the dump files, named with the pattern "pg-day-month-year.dmp.gz" (specifically, something like pg-31-Dec-2003.dmp.gz), into a directory in my $HOME. The webuser name is whatever username owns the databases you're interested in, which is usually the same as the username that the webservers use to access the database. By putting the database dumps into my $HOME, they'll be pulled down when I rsync my home directory.
Some folks like the "YYYY-MM-DD" format, since it sorts nicely with
ls
. Use date +"%Y-%m-%d"
in the
DATENAME
part above to get this date format.
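If you do that, only the DATENAME line in the script above needs to change:

# produces names like pg-2003-12-31.dmp.gz, which sort chronologically under ls
DATENAME=`date +"%Y-%m-%d"`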
I run this script at 1:45 every morning via my crontab (not as root):

# run at 1:45 A.M.
45 1 * * * /home/bork/pg-daily-dump.sh
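One general cron gotcha worth mentioning: cron runs jobs with a very sparse environment, so if pg_dumpall lives off the beaten path (say /usr/local/pgsql/bin, a guess based on the library path in the script), the script may need to set PATH itself:

# make sure cron's minimal environment can find pg_dumpall
PATH=/usr/local/pgsql/bin:/usr/bin:/bin
export PATH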
I use qmail as my incoming mail handler and to provide some forwarding services for some of the domains I run. qmail is pretty paranoid about ownership and permissions, and because of this I can't read many of the files as a regular user to back them up. So I have root do it. Root can make a tarfile and drop it into my home directory. Like the Postgresql dump, this will be pulled down to the local machine when my $HOME is rsynced.
This is in root's crontab:

# run at 3:05 A.M.
5 3 * * * /bin/tar zcf /home/bork/root-backup/qmail.tar.gz /var/qmail

Which just tars up /var/qmail, containing the binaries, configurations, and mailboxes, and sticks it into my $HOME.
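It doesn't hurt to peek inside the tarball once in a while to make sure it holds what you think it does:

# list the first few entries without extracting anything
% tar ztf /home/bork/root-backup/qmail.tar.gz | head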
I also use DJB's daemontools for keeping services up and running. The run scripts are owned by root and unreadable by the rest of the world, so something similar is done to back them up:

# run at 3:30 A.M.
30 3 * * * /bin/tar zcf /home/bork/root-backup/service.tar.gz /service /service/*/run
Again, the tarfile lands in my $HOME. Now it's time to pull the files down.
On my iLamp here at home, I have a crontab entry running at 4 A.M. as an ordinary user:

# run at 4 A.M.
0 4 * * * /Users/bork/rsync-borkware.sh

And the rsync-borkware.sh script is just:

#!/bin/sh
cd /Users/bork/rsync-backup
rsync -a -e ssh bork@borkware.com:/home/bork/ home
rsync -a -e ssh bork@borkware.com:/usr/local/cvsroot/ cvs
rsync -a -e ssh bork@borkware.com:/var/lib/aolserver/ web
The -a parameter performs a recursive synchronization (directories and subdirectories) and preserves symbolic links, permissions, modification times, and group ownership. The -e parameter specifies the mechanism for getting to the remote system. Here I say to use ssh, the secure shell, to provide an encrypted way of moving the data. bork@borkware.com: is the machine, and the user name on that machine to use for logging in. The full paths there are the directories on the remote machine I want to synchronize. The final argument is the directory (inside of my $HOME/rsync-backup) on the local machine to put the synchronized files.
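One rsync option I don't use above, but which is worth knowing about: by default rsync never removes anything from the destination, so files deleted on the server just linger in the backup copy. Adding --delete turns the local directory into an exact mirror:

# --delete removes local files that no longer exist on the server,
# so deletions get backed up along with changes
% rsync -a --delete -e ssh bork@borkware.com:/home/bork/ home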
rsync via ssh will prompt for a password, which will break when run via cron. ssh can be configured to use public/private key authentication for passwordless login. In short, you'll do something like this:
% ssh-keygen -t rsa
% scp ~/.ssh/id_rsa.pub borkware.com:

(When ssh-keygen asks for a passphrase, just hit return; an empty passphrase is what makes the unattended login work.) Then on borkware.com, move the public key into place as ~/.ssh/authorized_keys:

% mv ~/id_rsa.pub ~/.ssh/authorized_keys
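If the server already has an authorized_keys file (from another machine's key, say), appending is friendlier than moving over it, and it's worth testing the login by hand before cron tries it:

# on borkware.com: append instead of overwrite, then lock down permissions
% cat ~/id_rsa.pub >> ~/.ssh/authorized_keys
% rm ~/id_rsa.pub
% chmod 600 ~/.ssh/authorized_keys

# back on the local machine: this should log in without asking for a password
% ssh bork@borkware.com true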
Getting the scripts and crontab entries set up took a day or so, and then another day to test and iron out problems.