Timestore timeseries database

The first and most developed solution to both the query speed problem and disk space problem is timestore.

Timestore is a lightweight time-series database developed by Mike Stirling. It uses a NoSQL approach to store an arbitrary number of time points without an index.

Query speeds
Timestore is fast. Here are the figures given by Mike Stirling on the documentation page:

From the resulting data set containing 1M points spanning about 1 year on 30 second intervals:

Retrieve 100 points from the first hour: 2.6 ms
Retrieve 1000 points from the first hour (duplicates inserted automatically): 6.2 ms
Retrieve 100 points over the entire dataset (about a year worth): 2.5 ms
Retrieve 1000 points over the entire dataset: 7.0 ms

Disk use

Timestore uses a double as its default data type, which is 8 bytes. The current emoncms mysql database stores data values as floats, which take up 4 bytes. It's easy to change the data type in timestore, so for a fair comparison we can change the default data type to a 4-byte float:

Layer 1: 10s layer = 3153600 datapoints x 4 bytes = 12614400 bytes
Layer 2: 60 layer1 datapoints averaged = 52560 datapoints x 4 bytes = 210240 bytes
Layer 3: 10 layer2 datapoints averaged = 5256 datapoints x 4 bytes = 21024 bytes
Layer 4: 6 layer3 datapoints averaged = 876 datapoints x 4 bytes = 3504 bytes
Layer 5: 6 layer4 datapoints averaged = 146 datapoints x 4 bytes = 584 bytes
Layer 6: 4 layer5 datapoints averaged = 36 datapoints x 4 bytes = 144 bytes
Layer 7: 7 layer6 datapoints averaged = 5 datapoints x 4 bytes = 20 bytes

total size = 12849916 bytes or 12.25Mb
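The layer arithmetic can be reproduced with a short Python sketch. It assumes every layer stores 4-byte values and that a partial averaging window at the end of a layer is simply dropped (integer division). That is my reading of the figures above rather than a statement about how timestore handles it internally.

```python
# Sketch of the layer-size arithmetic for one year of 10 s data.
SECONDS_PER_YEAR = 365 * 24 * 3600   # 31,536,000
BASE_INTERVAL_S = 10                 # layer 1 resolution
BYTES_PER_POINT = 4                  # 4-byte float
DECIMATION = [60, 10, 6, 6, 4, 7]    # averaging factor for layers 2..7


def layer_point_counts(base_points, decimation):
    """Return the number of datapoints in each layer, starting with layer 1."""
    counts = [base_points]
    for factor in decimation:
        # ASSUMPTION: partial averaging windows are dropped.
        counts.append(counts[-1] // factor)
    return counts


counts = layer_point_counts(SECONDS_PER_YEAR // BASE_INTERVAL_S, DECIMATION)
sizes = [n * BYTES_PER_POINT for n in counts]

for layer, (n, size) in enumerate(zip(counts, sizes), start=1):
    print(f"Layer {layer}: {n} datapoints x {BYTES_PER_POINT} bytes = {size} bytes")

total = sum(sizes)
downsampled = total - sizes[0]
print(f"total = {total} bytes ({total / 2**20:.2f} MiB)")
print(f"downsampled layers only = {downsampled} bytes ({downsampled / 2**20:.2f} MiB)")
```

Changing BYTES_PER_POINT to 8 gives the figures for the default double data type.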

The current emoncms data storage implementation uses 60Mb to hold the same data, as it saves both the timestamp and an associated index. Timestore therefore has the potential to reduce disk use by 80% for realtime data feeds.

Interestingly, all the downsampled layers created by timestore only come to about 0.2 Mb. Before doing the calculation above I used to think that adding all the downsampled layers would add significantly to the disk space problem, but evidently it is a very small contribution compared with the full resolution data layer.

Emoncms timestore development branch

I made a start on integrating timestore in emoncms. There's still a lot to do to make it fully functional but it works as a demo for now. Here's how to get it set up:

1) Download, make and start timestore

$ git clone
$ cd timestore
$ make
$ cd src
$ sudo ./timestore -d

Fetch the admin key

$ cd /var/lib/timestore
$ nano adminkey.txt

Copy the admin key, which looks something like this: POpP)@H=1[#MJYX<(i{YZ.0/Ni.5,g~<
The admin key is generated anew every time timestore is restarted.

2) Download and setup the emoncms timestore branch

Download a copy of the timestore development branch

$ git clone -b timestore timestore

Create a mysql database for emoncms and enter database settings into settings.php.

Add a line to settings.php with the timestore adminkey:
$timestore_adminkey = "POpP)@H=1[#MJYX<(i{YZ.0/Ni.5,g~<";

Create a user and login

The development branch currently only implements timestore for realtime data, and the feed/data api is restricted to timestore data only, which means that daily data does not work. The use of timestore for daily data still needs to be implemented.

The feed model methods implemented to use timestore so far are create, insert_data and get_data.

Try it out

Navigate to the feeds tab, click on feed API helper, create a new feed by typing:

It should return {"success":true,"feedid":1}

Navigate back to feeds, you should now see your power feed in the list.
Navigate again to the api helper to fetch the insert data api url

Call the insert data api a few times over say a minute (so that we have at least 6 datapoints - one every 10 seconds). Vary the value to make it more interesting:
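If you would rather script these calls than make them by hand, here is a minimal Python sketch. It takes the insert URL exactly as the api helper shows it. The assumption that the value goes in a query parameter called "value" is mine, so check it against the URL the helper gives you. The helper names make_http_sender and post_values are made up for this example.

```python
import time
import urllib.parse
import urllib.request


def make_http_sender(insert_url):
    """Build a send(value) function around the URL shown by the api helper.

    ASSUMPTION: the value is passed as a 'value' query parameter.
    """
    def send(value):
        separator = "&" if "?" in insert_url else "?"
        url = insert_url + separator + urllib.parse.urlencode({"value": value})
        with urllib.request.urlopen(url, timeout=10) as response:
            return response.read().decode()
    return send


def post_values(send, values, interval_s=10, sleep=time.sleep):
    """Call send(value) for each value, pausing interval_s between calls."""
    replies = []
    for i, value in enumerate(values):
        if i > 0:
            sleep(interval_s)  # one datapoint every 10 seconds by default
        replies.append(send(value))
    return replies
```

Passing six varied values, for example post_values(make_http_sender(url), [100, 150, 120, 180, 90, 200]), takes just under a minute and gives the six datapoints mentioned above.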

Select the rawdata visualisation from the vis menu

Zoom to the last couple of minutes to see the data.

I met Mike Stirling a little over a month ago in Chester for a beer and a chat, after Mike originally got in contact to let me know about timestore. We discussed data storage, secure authentication, low cost temperature sensing and openTRV, the project Mike is working on. I think there could be great benefit in making what we're developing here with openenergymonitor interoperable with what Mike and others are developing with openTRV, especially as we develop more building heating and building fabric performance monitoring tools. This could all develop into a super nice open source whole building energy (both electric and heat) monitoring and control ecosystem of hardware and software tools.

Check out Mike's blog here:

The current emoncms feed storage implementation

Following on from the last blog post on server load and disk use, let's look at the current emoncms implementation of feed storage in a bit more depth before going on to look at how it can be improved.

Emoncms currently stores realtime feed data in a mysql database, and every feed has its own mysql table. A feed table contains two fields: timestamp and data value. Feed data is usually on a regular time interval, e.g. 5, 10 or 60s data. The time interval is set by the posting sensor node rather than emoncms.

Calculating feed disk use
We can calculate the estimated feed table size using the current implementation used in emoncms.

Let's say we want to store a year of 10s data. There are 31536000 seconds in a year and so 3153600 datapoints at a 10s data rate.

A single datapoint is made up of a timestamp which is stored as an unsigned integer, which takes up 4 bytes, and a float data value which also takes up 4 bytes.

3153600 datapoints x 8 bytes per datapoint (table row) = 24 Mb

In addition to the feed data we also have a table index which speeds up queries considerably. The worst case index size can be estimated with the equation detailed on this page:

index row size = (key_length+4) / 0.67

The key we are using is the time field, which is 4 bytes, so the index row size is (4 + 4) / 0.67 = approx. 12 bytes

The index size for 3153600 datapoints is therefore approximately = 3153600 * (4 + 4) / 0.67 = 36Mb

The total feed table size will therefore be approximately 60Mb.
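The same estimate as a small Python sketch, using only the figures quoted above (4-byte timestamp, 4-byte float, and the worst case index formula). The function name is just for illustration, and sizes are reported in units of 2^20 bytes.

```python
SECONDS_PER_YEAR = 365 * 24 * 3600
INTERVAL_S = 10
TIMESTAMP_BYTES = 4   # unsigned integer
VALUE_BYTES = 4       # float
KEY_LENGTH = 4        # the index key is the time field
MIB = 2**20


def feed_table_size(datapoints):
    """Return (data_bytes, index_bytes) for one mysql feed table."""
    data_bytes = datapoints * (TIMESTAMP_BYTES + VALUE_BYTES)
    # Worst case index row size = (key_length + 4) / 0.67
    index_bytes = datapoints * (KEY_LENGTH + 4) / 0.67
    return data_bytes, index_bytes


datapoints = SECONDS_PER_YEAR // INTERVAL_S
data_bytes, index_bytes = feed_table_size(datapoints)

print(f"datapoints: {datapoints}")
print(f"data:  {data_bytes / MIB:.1f} Mb")
print(f"index: {index_bytes / MIB:.1f} Mb")
print(f"total: {(data_bytes + index_bytes) / MIB:.1f} Mb")
```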

Feed query speeds
As emoncms has developed, a fair bit of work has gone into improving the way realtime data is queried. At first the improvements seemed promising; see this documentation page for a detailed discussion of the query implementation and query speeds:

But growing server demand and feed table size mean they have often only staved off an eventual slowdown.

I think the last idea I had, using a php for-loop to request a single row at given intervals, which originally reduced query times by about 10x, is no longer working well on the hosted server. It still gives the 1.6s query time on my local installation of emoncms, but on the hosted server I'm getting a mixture of short query times (500ms) and much longer query times (20s+) in the more than 55 hour time window. I think the reason for this is that the php for-loop has to wait for other mysql queries to complete when the server is under heavy load. I think another solution is needed.

In the next few blog posts I will look at some of the potential solutions to both disk use and query speeds.

Load stats

You might be wondering what kind of load the hosted emoncms server is under. Maybe you have experienced it slow down from time to time and then other times be much faster, or maybe you're just intrigued about how much it's used and what the challenges are with hosting a site like this. So what better way to investigate the load than to use the visualisation features built into emoncms itself.

This first graph is of server load since late January 2013 as recorded by the command 'uptime'. You can see a clear drop in load on the 16th of March where the re-factored emoncms v5 was introduced with its reduction in mysql queries in input processing:

On a shorter time scale the load fluctuates at what seem to be periodic intervals, with a significant spike every 10-12 minutes. A load of 3 is enough to make emoncms feel slightly sluggish on feed data requests.

This graph shows the number of feeds that were updated sometime in the last 5 minutes (not the number of feed updates in the last 5 mins, which is much higher). I use this graph to check that an emoncms update has not caused a big drop off in active feeds: I check that the number of active feeds returns to the same level after an update.

The number of active feeds has grown from around 1350 to 2100 (an increase of 750) since early March 2013, which is just over 300 new active feeds a month. The total number of feeds ever created is 15660; a portion of these will have been deleted and replaced with new feeds, and a portion will just be inactive.

Zooming in again on the last 4 hours shows that there are about 120 feeds that are updated on a longer than 5 min timescale and there doesn't seem to be a clear correlation between the server load spikes above and the update spikes here. Maybe some kind of mechanism to even out the load could be a beneficial feature to look into.

The next graph is the last server load related graph. It shows the time in seconds taken by the server to serve all requests, and you can see again the time saved by the emoncms v5 implementation of input processing (the change of gradient on the 16th of March). High load events are also clearly visible as steps. Some of these events made the server unresponsive for quite some time; one of the larger high load events lasted 40 mins.

Disk use

Apart from load spikes, disk use is probably the most pressing concern with the hosted service and probably emoncms in general. Disk use is growing pretty fast, rising from 22GB at the start of February to 47GB now at the end of May. The vast majority of this is realtime feed data. I need to measure for certain but I think all the other tables, including daily averages and histogram data, come to only a few hundred MB.

Disk use is rising at about 280 Mb a day, so about 100 GB a year. Disk space is charged for on a monthly basis (£2 per month for 10GB). For every 100GB stored the annual cost would therefore be £240 without backup and £480 with a second bigv server used for backup. So even if the number of feeds that are posting stays constant and there are no new users, disk use costs for existing users will continue to rise if historic data is retained in its current form.
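The cost arithmetic above, written as a rough Python sketch. It uses the quoted price of £2 per month per 10GB and assumes that the backup server simply doubles the storage cost, as the £480 figure implies. The function names are only for illustration.

```python
GROWTH_MB_PER_DAY = 280
PRICE_GBP_PER_10GB_MONTH = 2.0


def yearly_growth_gb(mb_per_day, days=365):
    """Disk growth over a year in (decimal) GB."""
    return mb_per_day * days / 1000


def annual_cost_gbp(stored_gb, with_backup=False):
    """Yearly storage cost; a backup server is assumed to double it."""
    cost = stored_gb / 10 * PRICE_GBP_PER_10GB_MONTH * 12
    return cost * 2 if with_backup else cost


print(f"growth per year: {yearly_growth_gb(GROWTH_MB_PER_DAY):.0f} GB")
print(f"cost per 100GB per year: £{annual_cost_gbp(100):.0f}")
print(f"with backup server: £{annual_cost_gbp(100, with_backup=True):.0f}")
```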

Luckily there are several potential solutions to this which I will come back to in another blog post. The most promising one is a method of compressing the data without losing the vital time resolution on feed events: essentially removing redundant datapoints and only preserving datapoints where changes are happening. This should be a beneficial feature for raspberry pi installations of emoncms as well, where disk space is also at a premium.
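To make the idea concrete, here is one possible reading of it as a Python sketch. This is not an existing emoncms feature. The function name and the tolerance parameter are made up for illustration. It keeps the first and last datapoint of every run of unchanged values, so the time at which each change happened is preserved.

```python
def drop_redundant(datapoints, tolerance=0.0):
    """Remove datapoints that match both of their neighbours.

    datapoints: list of (timestamp, value) tuples sorted by timestamp.
    tolerance:  largest difference that still counts as "unchanged".
    """
    if len(datapoints) <= 2:
        return list(datapoints)

    kept = [datapoints[0]]
    for prev, cur, nxt in zip(datapoints, datapoints[1:], datapoints[2:]):
        changed_before = abs(cur[1] - prev[1]) > tolerance
        changed_after = abs(nxt[1] - cur[1]) > tolerance
        # Keep the point if the value changes on either side of it.
        if changed_before or changed_after:
            kept.append(cur)
    kept.append(datapoints[-1])
    return kept


feed = [(0, 100), (10, 100), (20, 100), (30, 100),
        (40, 250), (50, 250), (60, 250), (70, 100)]
compressed = drop_redundant(feed)
print(f"{len(feed)} datapoints reduced to {len(compressed)}")
```

A feed that sits at a steady value for long periods would shrink a lot with this approach, while a constantly varying feed would hardly shrink at all.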

So to summarise, I think there are two design challenges identified above that would be good to tackle going forward with emoncms:

- How to identify the source of and either distribute or limit the effect of whatever is causing load spikes.
- How to fix the disk space use problem.

While fixing these it would also be good to reduce query times and retain the vital resolution on feed events.

Apart from the challenges, the above graphs show the success of the hosted service, which is exciting. It's great to see people use it, that's really encouraging. Let's keep raising the bar with what's possible with a fully open source cloud application energy monitoring service.

Recent commits to emoncms

There have been many great recent commits to emoncms thanks to PlaneteDomo, Baptiste Gaultier, Simon Stamm, Erik Karlsson (IntoMethod), Bryan Mayland (CapnBry), Ildefonso Martínez, Paul Reed and Jerome. These include improved translations, the ability to translate javascript, query speed-ups, a working remember me implementation and work on the raspberry pi module. I thought I'd write this blog post to draw attention to the great contributions that are being made and so that credit goes where it's due:

Summary of additions:

PlaneteDomo - Implementation of a clean way of adding ability to translate text previously defined in javascript

Baptiste Gaultier (bgaultier) - A lot of French translation work

Simon Stamm - Added ability to display yen and euro in zoom visualisation, including an option to place the currency after the value ( 1 = after value, 0 = before value)
and fixed issue with floatval and json_decode:

Erik Karlsson (IntoMethod) - Fixed dashboard height issue, thanks to Paul Reed for reporting this bug on the forums:

Addition of async ajax calls for some visualisations. This makes the dashboard feel a lot snappier and page load is about 4-5 times faster.

Also a really significant fix that I've been really enjoying, Erik Karlsson fixed the remember me implementation that I failed to get to work properly:

Bryan Mayland (CapnBry) - Improved feed/data request query times: adds a 3rd query type using the mysql average method for times less than 50 hours (180,000 seconds).

Ildefonso Martínez (ildemartinez) - javascript code re-factoring

Paul Reed - tab between fields when logging in, average field in visualisations moved to the right.

Jerome (Jerome-github) - A lot of work on the RaspberryPI emoncms module including continued work on the python gateway script. For ongoing discussion on raspberrypi module development see the github issues page here:

I'd really like to thank these guys and everyone who continues to help out with development. There's a lot of hard work going in that's really pushing things forward.

Backup

I'm happy to announce that the hosted server is fully backed up and has incremental backup implemented. All data is incrementally backed up once every 24 hours: a backup cron job runs hourly, syncing 640 feeds each time, so 15360 feeds every 24 hours. Disk use is currently growing at a rate of about 300MB a day and the transfer format is csv, which gives you an idea of the volume of data that the backup implementation needs to sync. The total volume of data I have synced so far using this is 49 GB.

The backup implementation uses many of the things already developed as part of the sync module which allows you to download feed data from a remote server. I've put the full emoncms backup script in the tools repository on github here:

For the above script to run, you first need to copy the users and feeds tables from the master server to the backup server using the more common backup procedure of mysqldump and scp. The steps to do this are described in the header comments of the backup script.

This method of backing up is much faster than using rsync, which I originally tried for incremental backup, as it does not go through each feed looking for changes. It just checks the time of the last datapoint in the backed up feed and downloads every new datapoint recorded on the master server after that time. One disadvantage of this is that any changes made to feed data using the datapoint editor tools in emoncms will not be carried over to the backup server. It would be good, though, to make it possible to delete data on the backup server if it's deleted from the master server, as disk space is expensive and, from a data privacy point of view, if you delete your data you would expect no copy to remain.
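The core of the approach can be sketched in a few lines of Python. This is a simplified in-memory illustration and not the actual backup script: feeds are plain lists of (timestamp, value) tuples, the function names are made up, and the real implementation fetches the new datapoints from the master server as csv.

```python
FEEDS_PER_RUN = 640  # feeds synced by each hourly cron run


def sync_feed(master_feed, backup_feed):
    """Append to backup_feed every master datapoint newer than its last one.

    Both feeds are lists of (timestamp, value) tuples sorted by timestamp.
    Returns the number of datapoints copied.
    """
    last_time = backup_feed[-1][0] if backup_feed else None
    new_points = [p for p in master_feed
                  if last_time is None or p[0] > last_time]
    backup_feed.extend(new_points)
    return len(new_points)


def feeds_for_run(feed_ids, run_index, per_run=FEEDS_PER_RUN):
    """Pick the batch of feed ids handled by a given hourly run."""
    if not feed_ids:
        return []
    start = (run_index * per_run) % len(feed_ids)
    return feed_ids[start:start + per_run]


master = [(0, 1.0), (10, 2.0), (20, 3.0)]
backup = [(0, 1.0)]
print(sync_feed(master, backup), "new datapoints copied")

# An edit to an old datapoint on the master is not picked up,
# because only datapoints after the last backed up time are fetched.
master[0] = (0, 9.9)
print(sync_feed(master, backup), "new datapoints copied after the edit")
```

The second call shows the disadvantage described above: the edited datapoint stays at its old value on the backup.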

I implemented the backup system like this because I had most of what I needed already in place in the sync module, so it was the quickest way for me to get this up and running using what I already knew. I'm aware that database replication can be performed with mysql replication, where a transaction log is stored on the master server, transferred to the backup server and then executed there. I'm interested in exploring this option too, and if anyone can tell me that using mysql replication will offer significant performance benefits over the method above and why, that would certainly motivate me to look at it sooner.

I'm still reluctant to guarantee data on the hosted server, as both vm servers are in the same datacenter and are part of the bigv cloud, which could even mean that both share the same disk (which would invalidate one of the reasons for a separate backup: to protect against disk failure), although bigv suggested that this is unlikely as there are plenty of tails. They recognise this as a weakness and something they hope to change soon.

So if you want extra peace of mind I suggest setting up a local installation of emoncms on your own computer and downloading your data periodically using the sync module. I do this both for extra backup and so that I can access the raw data for trying out new visualisation, data analysis and processing ideas. I will write a guide on how to do this soon. The sync module is available here:

I'm interested in being transparent about how the service is hosted, so that rather than being given opaque promises you can assess things like how it's backed up for yourself. You often hear people say that no system is absolutely secure and completely safe from failure, so I hope that by being transparent about this you can see what has been done. I'm relatively new to administering web services, and I'm sure that if you're a more experienced web admin reading this you may know how this can be done better. I would appreciate hearing how you think it could be improved.

12 input pulse counter idea

A while ago now Glyn and I worked on a design for a 12 input pulse counter while we were doing some work at the Centre for Alternative Technology. A stripboard version was built and is in continued use monitoring grid import/export, chp and diesel generator (the last two not actually in active use).

We wrote up about it here:

After visiting CAT again recently and discussing a project they hope to do, it got me thinking again about the 12 input pulse counter. In non-domestic buildings that already have pulse output meters on many of the circuits and a meter room with all the meters in one place, a multiple input pulse counter may be the most effective way to add automatic meter reading.

I've wanted to make a PCB for the 12 input pulse counter for a while so I thought I'd do a little work on it this morning. Here's a screenshot of where I've got to so far:

Here are the features I'm thinking it will have:
  • 12-input pulse counter
  • Optional pull down resistor with option for SMT or through hole, see building blocks pages linked above for why pull down resistors are required.
  • Input status LED, driven by pulse signal.
  • Dedicated ATmega for pulse counting
  • Serial connection to second ATmega used for ethernet and/or rfm12 comms.
  • Enclosed in a DIN rail mounted enclosure.
Here's the eagle design so far:

I used the rfm12pi board design as a starting point as it already had the basic atmega + rfm12 circuit in place. 

One thing I'm still wondering about is whether to add a second optional resistor between the terminals and the pull down resistor which would provide the option of having a voltage divider on the input for stepping down from higher pulse voltages like 24V.
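As a rough guide to what that second resistor would need to be, here is the standard voltage divider formula as a Python sketch. The 10k pull down value and the 3.3V target are example values I have picked for illustration; they are not taken from the board design.

```python
def divider_output(v_in, r_series, r_pulldown):
    """Voltage across the pull down resistor of a two resistor divider."""
    return v_in * r_pulldown / (r_series + r_pulldown)


def series_resistor_for(v_in, v_out, r_pulldown):
    """Series resistor needed to step v_in down to v_out."""
    return r_pulldown * (v_in / v_out - 1)


# Example values only (ASSUMPTIONS, not from the PCB design):
V_PULSE = 24.0        # pulse output voltage
V_TARGET = 3.3        # desired voltage at the input pin
R_PULLDOWN = 10_000   # pull down resistor in ohms

r_series = series_resistor_for(V_PULSE, V_TARGET, R_PULLDOWN)
print(f"series resistor: {r_series:.0f} ohms")
print(f"input pin sees: {divider_output(V_PULSE, r_series, R_PULLDOWN):.2f} V")
```

In practice the nearest standard resistor value would be used, picked so that the pin voltage stays below the target.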

More to come soon..

Continuation of emontx testing - feed comparison tool

Glyn has been running a parallel test of the new emontx v3 vs the old emontx v2 for over a month now. See Glyn's original post introducing emontx v3 here:

In testing a new emontx version here are a few questions that we would like to answer:

How does emonTx v3 accuracy compare at lower powers?
Is there any difference in low power readings with or without the powered from AC-AC adapter feature?
Can differences be explained by calibration error?
Are there any other measurement variations that need investigating?

To make it easier to compare the parallel test power feeds I thought I'd create a visualisation tool in emoncms that makes it easier to see the difference between the feeds.

If the difference is caused by calibration error then applying a calibration to the measured data should bring the difference down close to zero.

Any deviations in measurements that remain should be non-calibration errors, and they will appear off to one side of the linear PowerX vs PowerY plot.
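The reasoning above can be sketched in Python as follows. This is not the code of the emoncms visualisation tool. It assumes the calibration is a single scale factor (no offset), fitted by least squares, and the function names and sample readings are made up for illustration.

```python
def fit_scale(x, y):
    """Least squares scale factor k that minimises sum((y - k*x)^2)."""
    denominator = sum(a * a for a in x)
    if denominator == 0:
        raise ValueError("cannot fit a scale factor to all-zero readings")
    return sum(a * b for a, b in zip(x, y)) / denominator


def residuals(x, y, k):
    """Difference left between the feeds after calibrating x by k."""
    return [b - k * a for a, b in zip(x, y)]


# Made-up readings in watts: feed Y reads 2% higher than feed X.
power_x = [100, 200, 300, 400]
power_y = [102, 204, 306, 408]

k = fit_scale(power_x, power_y)
print(f"fitted calibration factor: {k:.3f}")
print("remaining differences:", residuals(power_x, power_y, k))
```

If the remaining differences are all close to zero the mismatch was a calibration error; anything left over is worth investigating separately.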

The above visualisation can be viewed here:

This visualisation tool is available in the emoncms visualisations list if you'd like to try this on your own monitor, even comparing say two different CT channels on a single emontx.

There are some issues I need to fix with the visualisation tool implementation that give rise to incorrect comparisons at some scales, to do with the way it selects the datapoint ids to compare.

In the next post I will explore differences between the two parallel test power feeds.

On another topic:
One of the interesting things I did yesterday was use the raspberrypi emoncms module on my ubuntu laptop. I used a jeelink connected to the usb port of the laptop and then configured raspberrypi_run.php to connect on port /dev/ttyUSB4 instead of the default raspberrypi port. This could be a useful configuration for anyone who just wants to log data from the nodes locally to their laptop and, as Jerome pointed out here, maybe the emoncms raspberrypi module should just be called the emoncms linux board module or just serial interface module.

Testing DS18B20 temperature sensing on emonTx v3

Hello, so I thought I'd try something new on this blog. I'm going to try writing short blog posts about progress and general work on things day to day. These are not intended to mark significant developments or milestones, as many of the other blog posts have tended to do; they are something more like a log book that will hopefully give insight into the development process.

Glyn and I, with help from Robert, are continuing with testing emontx V3. There have been a few niggling issues that we're working on but it's pretty much there. This morning I tested the DS18B20 temperature sensor connection and it all worked fine. One of the nice things about the new emontx is that there are screw terminals for the temperature sensor, which makes connecting up the encapsulated DS18B20 temperature sensors much easier.
DS18B20 connected to emontx v3
Another new addition for temperature sensing is the ability to switch the sensor power pin on and off for use when powering the emontx off batteries (this avoids the hack introduced in the low power temperature node that uses the emontx v2.2 pcb). Anyway, the circuit was originally designed with analog 5 (used as digital pin 19) being used for data and digital 5 used for power. While reading about using analog inputs as digital pins I came across this note on the arduino site:

The Atmega datasheet also cautions against switching analog pins in
close temporal proximity to making A/D readings (analogRead) on other
analog pins. This can cause electrical noise and introduce jitter in
the analog system. It may be desirable, after manipulating analog pins
(in digital mode), to add a short delay before using analogRead() to
read other analog pins.

It turns out one can just swap the atmega pins around so that analog 5 (digital 19) is used for power and digital 5 is used for the data, which involves higher frequency switching and so is good to move off the analog pin. As power switching is only needed in battery operation, one could make a temperature measurement after the analog read section, just before putting the emontx to sleep, which should not cause interference.

Update (15/06/13): On future emonTx V3 PCB revisions 3.x ADC 5 (Dig 19) and Dig5 have been swapped round on the PCB to fix this issue. Dig 5 is now DS18B20 one-wire signal and ADC5 (Dig19) is now DS18B20 power.

If no DS18B20 temperature sensor is connected the ports can be used for other functions. ADC 5 can be used as an analogue input/output and Dig 5 (with R27/R24 4.7K pull up removed) can be used as a general Digital I/O with PWM capabilities.

The DS18B20 data 4.7K pull-up resistor on Dig 5 and the Dig 2 IRQ 10K pull-down resistor have been designed to take a thru-hole resistor if the SMT resistor was removed to make it possible to change the value of these resistors. 

Open Source Hardware Users Group (OSHUG) #26 Meetup

Last night I attended OSHUG event #26 in London and gave a short talk on Low Power Wireless sensors, as well as a brief overview of the OpenEnergyMonitor project and a quick look at what we've been working on recently.

There are a couple of new screenshots in the presentation from the open-source SAP building modelling emoncms module that Trystan has been working on with Manchester @CarbonCoop recently. This is an exciting bit of development. The idea is that the building model could be matched up to the monitoring results (temperature profile & heat input) of a building. The model could then be used to investigate the effect of undertaking improvement measures such as wall insulation, loft insulation or external cladding. Finally, after completion of the improvement measures, the monitoring data can again be used to evaluate their actual performance.

Osug #26 low power wireless sensors from OpenEnergyMonitor

It was great to meet everyone and put faces to (twitter) names! Big thanks to Andrew Back @9600 for organising the event and @skpang_uk for sponsoring the event...and buying me a beer afterwards!  

One Year of Solar PV Monitoring

On the 23rd November 2011 we had a 2.9 kWp solar PV system installed.

We recently got our first payment from the UK government's Feed-in Tariff scheme. This has prompted me to take stock of how the system has been performing and how the data collected using the OpenEnergyMonitor energy monitoring system compares to the utility company's billing meter:

From the 23rd Nov 2011 to the 23rd Nov 2012 the billing utility meter on the solar PV system recorded a generation of 2069 kWh. Over the same period we consumed 3588 kWh, so the solar PV generated the equivalent of about 58% of our electrical energy needs.

At this rate it looks like we're on track for the system to pay back in 7-8 years, maybe even less if we have some more sun in the next few years, fingers crossed! 

For the same time period the OpenEnergyMonitor monitoring system has recorded a generation of 2029 kWh, giving the energy monitor an accuracy of 98%! 

The monitoring system is a standard emonTx with an AC-AC adapter taking Real Power readings.

One Year of PV Generation
kWh/d Electricity Consumption (orange) Overlaid with Generation (blue)

When the system was installed the solar PV company estimated we would produce 2434 kWh per year; we generated 15% less than this estimate. This could partly be attributed to the very poor summer we experienced in 2012.
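The percentages quoted in this post all come from the four yearly totals, and can be checked with a few lines of Python (the variable names are just for illustration):

```python
BILLING_METER_KWH = 2069       # generation recorded by the utility meter
MONITOR_KWH = 2029             # generation recorded by the energy monitor
CONSUMPTION_KWH = 3588         # electricity consumed over the same year
INSTALLER_ESTIMATE_KWH = 2434  # yearly generation estimated at install


def percent(part, whole):
    """Return part as a percentage of whole."""
    return 100.0 * part / whole


solar_share = percent(BILLING_METER_KWH, CONSUMPTION_KWH)
monitor_vs_meter = percent(MONITOR_KWH, BILLING_METER_KWH)
shortfall = percent(INSTALLER_ESTIMATE_KWH - BILLING_METER_KWH,
                    INSTALLER_ESTIMATE_KWH)

print(f"generation as a share of consumption: {solar_share:.1f}%")
print(f"monitor reading relative to billing meter: {monitor_vs_meter:.1f}%")
print(f"shortfall against installer estimate: {shortfall:.1f}%")
```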

This year is already looking promising. On the 2nd of May 2013 our system generated a record (for us) of 18.3 kWh, 1.9 times more than we consumed on that day (9.7 kWh):

Record Generation on 2nd May 2013
Fingers crossed for a sunny 2013 summer!