Emoncms.org backup

I'm happy to announce that emoncms.org is fully backed up and has incremental backup implemented, all data is incrementally backed up once every 24 hours, a backup cron job runs hourly syncing 640 feeds each time so 15360 feeds every 24 hours. emoncms.org disk use is currently growing at a rate of about 300MB a day and the transfer format is csv which gives you an idea of the volume of data that the backup implementation needs to sync. The total volume of data I have synced so far using this is 49 GB.

The backup implementation uses many of the things already developed as part of the sync module which allows you to download feed data from a remote server. I've put the full emoncms backup script in the tools repository on github here:

https://github.com/emoncms/usefulscripts/blob/master/backupemoncms.php

For the above script to run, you need to first copy the users and feeds table from the master server to the backup server using the more common backup procedure of using mysqldump and scp, the steps to do this are described in the header comments of the backup script.

This method of backing up is much faster than using rsync which I originally tried for incremental backup as it does not go through each feed looking for changes it just checks when was the last datapoint in the backed up feed and downloads every new datapoint from the master server recorded after that time, one disadvantage of this is that any changes to feed data using the datapoint editor tools in emoncms will not get updated to the backup server. It would be good though to make it possible to delete data on the backup server if its deleted off the master server, as disk space is expensive and if you delete data off emoncms.org you would expect no copy to remain from a data privacy point of view.

I implemented the backup system like this because I had most of what I needed already in place in the sync module and so it was the quickest way for me to get this up and running using what I already knew. I'm aware the database replication can be performed with mysql replication, where a transaction log is stored on the master server and transferred to and then executed on the backup server. I'm interested in exploring this option too and if anyone can tell me that using mysql replication will offer significant performance benefits over the method above and why, that would certainly motivate me to look at it sooner.

I'm still reluctant to guarantee data on emoncms.org as both vm servers are in the same datacenter and they are part of bigv cloud which could even mean that both share the same disk (which would invalidate one of the reasons for a separate backup to protect against disk failure) although bigv suggested that this is unlikely as there are plenty of tails. They recognise this as a weakness and something they hope to change soon.

So if you want extra peace of mind I suggest installing a local installation of emoncms on your own computer and downloading your data periodically using the sync module, I do this both for extra backup and so that I can access the raw data for trying new visualisation and data analysis, processing ideas. I will write a guide on how to do this soon. The sync module is available here:

http://github.com/emoncms/sync

I'm interested in being transparent about how emoncms.org is hosted, so that rather than give opaque promises you can asses things like how its backed up for yourself. You often here people say that no system is absolutely secure and completely safe from failure so I hope that by being transparent about this you can see what has been done. I'm relatively new to administering web services and I'm sure if your a more experienced web admin reading this you may know how this can be done better, I would appreciate hearing how you think it could be improved.

To engage in discussion regarding this post, please post on our Community Forum.