Website Backup

In the interest of open source, I thought I would share the backup setup we have running for the OpenEnergyMonitor website. I'm relatively new to sys-admin tasks and writing bash scripts, so please let me know if you think something could be implemented better.

Backing up our Drupal SQL databases, which contain the user credentials and all the forum and text content of the website, was relatively easy since they take up little disk space. A nightly SQL dump on the server, followed by a secure FTP bash script running as a nightly cronjob on a Raspberry Pi with an external hard drive to download the zipped SQL database, does the trick. The FTP login credentials are stored away from prying eyes in a .netrc file (with chmod 600). Two sets of credentials are required, and the relevant .netrc file is copied to the home folder when needed.

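# Use the first set of FTP credentials
cp netrc/.netrc1 .netrc
today=$(date +"%d-%b-%Y")
# Braces stop bash from reading $today_backup as a single variable name
ftp -vp -z secure $HOST << EOT
get ${db_name}-${today}_backup.gz $LOCAL_BACKUP/${db_name}-${today}_backup.gz
bye
EOT
rm .netrc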
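
For anyone unfamiliar with the .netrc format, here is a minimal sketch of how one of the credential files could be created with owner-only permissions. The function name write_netrc and the host, login and password values are placeholders for illustration, not our real setup.

```shell
# Write a single-host .netrc with owner-only permissions.
# $1: target file, $2: host, $3: login, $4: password (all placeholder values)
write_netrc() {
    (
        umask 077   # files newly created in this subshell get mode 600
        printf 'machine %s\nlogin %s\npassword %s\n' "$2" "$3" "$4" > "$1"
    )
    chmod 600 "$1"  # also tighten a file that already existed
}

# Hypothetical usage:
#   write_netrc netrc/.netrc1 ftp.example.org backupuser 'secret'
```

The ftp client looks up the host in ~/.netrc, which is why the script copies the relevant file into the home folder before connecting.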

Backing up the files (images, documents etc.) is more of an issue: the ever-increasing size of the content means that downloading a full snapshot every night is impractical and would unnecessarily load the server and use bandwidth.

I found that wget has many customisable options. A nightly scheduled bash script running on a Raspberry Pi with an external hard drive uses the following wget options to look at which files have been created or modified since the last time the command was run, and downloads only the changes. Once the initial download is done, the command takes less than a minute to execute and often downloads only a few MB of data. The option '-N' tells wget to download only new or modified files.

cp netrc/.netrc2 .netrc
wget -m -nv -N -l 0 -P $LOCAL_BACKUP ftp://$HOST/public_html/FILES_LOCATION -o $LOCAL_BACKUP/filelog=$today.txt
rm .netrc
# This is what the options do:
# -l 0 infinite level of recursion (folder depth)
# -m mirror (this already turns on recursion, -N and infinite depth)
# -N only download files that are new or newer than the local copy
# -o logfile
# -nv non-verbose logs
# Other options that can be useful (not used above):
# -b run in the background
# -q turn off logs

This setup seems to be working well. It has a few weak points and limitations that I can think of:
  • The wget files backup script only downloads new and modified files. It does not mirror deletions: if a file is deleted on the server, it remains in the backup.
  • The wget script does not keep historical snapshots, meaning that if something bad were to happen it would not be possible to roll back to a certain date in history. Update: I have since had Rsnapshot recommended to me, which is a backup utility based on Rsync. Rsnapshot looks great and can work over FTPS. My friend Ryan Brooks wrote a good blog post on how to set up Rsnapshot over FTPS.
  • Currently the Raspberry Pi only has the one external 1TB hard drive used for backup. Ideally this would be two hard drives in a RAID array for double safety.
  • Backups are only done nightly. This is plenty good enough for us at the moment but might need to be improved in the future.
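
As a rough illustration of how the first limitation could be spotted, here is a sketch that lists files present in the local mirror but missing from a list of server-side paths. It is not part of our scripts. The function name stale_files is made up, and it assumes you already have a text file with one server-side relative path per line, however you choose to produce it.

```shell
# Print files present in the local mirror but missing from a server listing.
# $1: text file with one server-side relative path per line (assumed input)
# $2: local mirror directory
stale_files() {
    listing=$1
    mirror=$2
    tmp_server=$(mktemp)
    tmp_local=$(mktemp)
    sort "$listing" > "$tmp_server"
    # Relative paths of every file in the mirror, without the leading ./
    (cd "$mirror" && find . -type f | sed 's|^\./||' | sort) > "$tmp_local"
    # comm -13 keeps lines that appear only in the second (local) list
    comm -13 "$tmp_server" "$tmp_local"
    rm -f "$tmp_server" "$tmp_local"
}

# Hypothetical usage:
#   stale_files server_files.txt "$LOCAL_BACKUP/mirror"
```

The output is only a report. Whether to delete the listed files or keep them as an accidental archive is a separate decision.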

I think it's amazing that a little £25 Raspberry Pi is powerful enough to handle backup for several websites. The Pi with an external 1TB hard drive connected through a USB hub consumes only 5.7 W, making it not too bad to leave on 24/7.
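
To put that power figure in context, here is a quick sketch of the yearly energy use, assuming the draw stays constant at the measured value all year round. The function name annual_kwh is just for illustration.

```shell
# Yearly energy in kWh for a constant load given in watts.
annual_kwh() {
    awk -v w="$1" 'BEGIN { printf "%.1f\n", w * 24 * 365 / 1000 }'
}

annual_kwh 5.7   # prints 49.9
```

So leaving the setup on all year works out at roughly 50 kWh.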

One issue that I had initially with the Pi is that the external hard drive would move from /dev/sdb to /dev/sdc, therefore losing its mount point. I think this was caused by the HDD momentarily losing power. Switching to a Pimoroni PiHub to power the setup and mounting the drive by its UUID instead of its /dev/xxx reference in fstab fixed the problem:

UUID=2921-FCE8 /home/pi/1TB vfat  user,umask=0000   0   0
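
If you want to do the same, here is a small sketch that builds an fstab entry of this form from a UUID. The helper name fstab_entry is made up, the mount options are simply the ones from the line above, and the device name in the comment is an assumption that will differ between systems.

```shell
# Build an fstab entry that mounts a partition by UUID rather than /dev/sdX.
# $1: UUID, $2: mount point, $3: filesystem type
fstab_entry() {
    printf 'UUID=%s %s %s  user,umask=0000   0   0\n' "$1" "$2" "$3"
}

# Example with the values from the line above:
fstab_entry 2921-FCE8 /home/pi/1TB vfat

# On the Pi itself the UUID can be read from the partition, for example
# (the device name /dev/sda1 is an assumption):
#   sudo blkid -s UUID -o value /dev/sda1
```

Because the UUID belongs to the filesystem and not to the USB port or detection order, the entry keeps working even when the kernel assigns the drive a different /dev name.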

I would be interested to hear how you think the backup could be implemented more efficiently or more securely. To engage in discussion regarding this post, please post on our Community Forum.