Ciseco SRF and RFM12B Power Consumption Investigation

Following on from yesterday's post on the Hope RF RFM12B power consumption, I decided to do a comparison with the Ciseco SRF radio. An RFu328 (a miniature ATmega328 in an XBee footprint) was used to mount and interface with the two radios. The same 3.3V power supply was used with both modules.

Ciseco RFu328 with SRF and RFM12B

The scope was connected up to measure the voltage drop across a shunt resistor as follows:

Oscilloscope probe measuring voltage drop across a 10R series resistor

Hope RF RFM12B

Here is the RFM12B current consumption trace while sending 4 integers using the JeeLib packet structure. With this packet structure each integer takes up 2 bytes, so 4 integers is 8 bytes plus 1 byte containing the node ID, giving a total packet size of 9 bytes. Transmission takes 2.7ms and the current consumption in that time is about 24mA @ 3.3V. This gives a power consumption of 24mA * 3.3V = 79.2mW and an energy consumption of 79.2mW * 2.7ms = 0.214mJ = 214uJ.


Ciseco SRF


An SRF V1.0a with serial firmware was used for this test.

The SRF is serial based. Ciseco have standardized on a communication structure called LLAP (Lightweight Local Automation Protocol).

An LLAP packet consists of one start byte 'a', two bytes for the node ID, then 9 bytes for the message. Encoded as HEX, each LLAP packet gives us space for two integers. Each integer has a range of -32767 to 32767, which is fine for our standard emonTx setup, which has a maximum reading of 25000W (100A x 250Vrms).

To transmit four integers from the emonTx (3 x power and 1 x voltage) would require two LLAP packets, each containing 12 characters. That is a packet size of 12 bytes transmitted twice, giving a total of 24 bytes.
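As a rough illustration of the packet arithmetic above, here is a minimal Python sketch that packs two integers into one 12-character LLAP-style packet. The function names, the "TX" node ID, the 4-hex-characters-per-integer layout and the '-' padding character are all assumptions made for this example; the post does not give the exact message layout used on the emonTx.

```python
def int16_to_hex(value):
    """Encode a signed 16-bit integer as 4 upper-case hex characters (two's complement)."""
    if not -32768 <= value <= 32767:
        raise ValueError("value does not fit in 16 bits")
    return format(value & 0xFFFF, "04X")


def hex_to_int16(text):
    """Decode 4 hex characters back into a signed 16-bit integer."""
    raw = int(text, 16)
    return raw - 0x10000 if raw >= 0x8000 else raw


def build_llap_packet(node_id, first, second):
    """Build one 12-character packet: 'a' + 2-char node ID + 9-char message."""
    if len(node_id) != 2:
        raise ValueError("node ID must be exactly two characters")
    message = int16_to_hex(first) + int16_to_hex(second)
    message = message.ljust(9, "-")  # assumed padding character for the unused byte
    return "a" + node_id + message


# Four readings (3 x power, 1 x voltage) need two packets; values are made up.
packets = [build_llap_packet("TX", 1500, -200), build_llap_packet("TX", 0, 2405)]
total_bytes = sum(len(p) for p in packets)
print(packets, total_bytes)
```

Two integers use 8 of the 9 message characters, which is why four integers cannot fit in a single packet with this encoding.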

Here is a current capture waveform of the SRF transmitting two LLAP packets. It's rather more interesting than the RFM12B; I would love to know exactly what the SRF is doing at each spike and dip.


Average current consumption of 20.8mA


Transmission of two LLAP packets takes 15ms

Transmission of two LLAP packets takes 15ms with an average current consumption of 20.8mA. This gives a power consumption of 20.8mA * 3.3V = 68.6mW and an energy consumption of 68.6mW * 15ms = 1.03mJ, or roughly 1mJ.

This is about 4.7 times more energy than the RFM12B uses for the transmission of the same four integers. This is mainly due to the efficient nature of the JeeLib packet structure, which sends the integers as binary rather than as serial characters as in the case of the SRF. Transmitting four integers as HEX characters in two LLAP packets takes 24 bytes, as opposed to the 9 bytes needed for the same four integers in the RFM12B JeeLib packet structure. Taking this into account, the SRF consumes about 41uJ per byte where the RFM12B consumes about 23uJ per byte, so byte for byte the SRF uses around 1.8 times more energy than the RFM12B.
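The arithmetic behind this comparison can be reproduced with a short Python sketch. The function and variable names are illustrative only; the currents, voltage, durations and byte counts are the ones measured above. Because it works from unrounded values, the ratios it prints can differ slightly from the rounded figures quoted in the text.

```python
def tx_energy_uj(current_ma, supply_v, duration_ms):
    """Energy of one transmission burst in microjoules (mA * V = mW, mW * ms = uJ)."""
    return current_ma * supply_v * duration_ms


rfm12b_uj = tx_energy_uj(24.0, 3.3, 2.7)   # one 9-byte JeeLib packet
srf_uj = tx_energy_uj(20.8, 3.3, 15.0)     # two 12-byte LLAP packets

rfm12b_per_byte = rfm12b_uj / 9
srf_per_byte = srf_uj / 24

print(f"RFM12B: {rfm12b_uj:.0f} uJ total, {rfm12b_per_byte:.1f} uJ per byte")
print(f"SRF:    {srf_uj:.0f} uJ total, {srf_per_byte:.1f} uJ per byte")
print(f"Total energy ratio SRF/RFM12B:    {srf_uj / rfm12b_uj:.1f}")
print(f"Per-byte energy ratio SRF/RFM12B: {srf_per_byte / rfm12b_per_byte:.1f}")
```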


SRF startup 50mA spike

An interesting observation is that the SRF exhibits a rather high current spike of about 50mA as it's turned on / comes out of sleep. As this spike lasts for only about 100ns, it won't contribute much to the overall power consumption.

Energy Consumed While Sleeping

The energy used by the RF modules needs to be put in perspective with the overall consumption of the system. An emonTx running on batteries, or a low power temperature node, will spend much of its time sleeping. The ATmega328 consumes 4.3uA when sleeping, and the SRF and RFM12B consume about the same as each other when sleeping (0.2-0.3uA), giving an overall sleep mode current draw of 4.6uA or 0.0046mA.

Sleeping for 10s

Take the case of a wireless node which sleeps for 10s in between readings. This gives an energy consumption of 0.0046mA * 3.3V = 0.0152mW * 10s = 0.152mJ = 152uJ.

If this node was using an RFM12B, 1.4 times more energy would be consumed in the 3ms that the RFM12B is active while transmitting the data via RF than in the preceding 10s when the node is sleeping.

If the temperature node was using an SRF, 6.6 times more energy would be consumed in the 15ms that the SRF is active while transmitting the data via RF than in the preceding 10s when the node is sleeping.

Sleeping for 10 min

Take the case of a wireless node which sleeps for 10 min in between readings. This gives an energy consumption of 0.0046mA * 3.3V = 0.0152mW * (60s * 10) = 9.12mJ.

The energy consumed while sleeping now becomes the greatest consumer. The energy consumed while sleeping for 10 min is 43 times greater than the energy required by the RFM12B to transmit the data, or 9 times greater than the energy required by the SRF to transmit the data.


Conclusion

If an ATmega328 based 'sleepy' node sleeps for 14s or more, the energy used during sleeping will equal or exceed the energy used by the RFM12B (to transmit four integers). If the node sleeps for about 1 min or more, the energy used during sleeping will equal or exceed the energy used by the SRF (using serial LLAP to transmit four integers).
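The break-even sleep times quoted here follow from dividing the transmit energy by the sleep power. A minimal Python sketch, using the 4.6uA sleep current and 3.3V supply from the measurements above (the function and constant names are made up for this example):

```python
SLEEP_CURRENT_UA = 4.6   # ATmega328 (4.3uA) plus radio (about 0.3uA)
SUPPLY_V = 3.3


def sleep_energy_uj(seconds):
    """Energy used while sleeping, in microjoules (uA * V = uW, uW * s = uJ)."""
    return SLEEP_CURRENT_UA * SUPPLY_V * seconds


def break_even_sleep_s(tx_energy_uj):
    """Sleep time at which sleeping has used as much energy as one transmission."""
    return tx_energy_uj / (SLEEP_CURRENT_UA * SUPPLY_V)


print(f"Sleeping 10s uses    {sleep_energy_uj(10):.0f} uJ")
print(f"Sleeping 10 min uses {sleep_energy_uj(600) / 1000:.2f} mJ")
print(f"RFM12B break-even: {break_even_sleep_s(214):.1f} s")
print(f"SRF break-even:    {break_even_sleep_s(1000):.1f} s")
```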

LLAP serial on the SRF is not the most power efficient way to transmit integers compared to the RFM12B using the JeeLib packet structure. Power consumption of the SRF can be reduced at the expense of human readability of the data packets. I plan to investigate this further; see the questions to answer below:

Questions to answer:

Does the extra energy consumed by the SRF result in increased range over the RFM12B?

The SRF by default is set at 10dBm transmission power (compared to 0dBm for the RFM12B). This can be reduced all the way down to -30dBm in various increments; how much will this reduce energy consumption and range? Is there a sweet spot? The RFM12B transmits at 0dBm; how will the range of the SRF transmitting at 0dBm compare to the RFM12B?

The SRF currently transmits at 9600 baud; this can be increased to 115200. Will this reduce the time taken to complete a transmission and therefore the energy used? How much will this affect packet loss and range? See the Ciseco SRF setup documentation.

Is it possible to interface directly with the SRF to transmit the raw packets not using serial?

Can power consumption of SRF be improved with new firmware?

I hear it's possible to use the CC chip on the SRF to offload the WDT to wake up the ATmega328 using a hardware interrupt, this could result in sleep current draw of around 0.3uA. I'm keen to investigate this.

New Oscilloscope - RFM12B Power Consumption

We have finally taken the plunge and upgraded our measurement facilities.

The main (and most costly!) acquisition has been a Rigol DS2072 70MHz digital scope.



We also upgraded our old cheapo multimeter to a more accurate model (Uni-T UT71E) which, as well as the usual multimeter functions, can measure AC power and frequency and has a USB link. To complete the setup we added a variable DC voltage and current bench top power supply (Rapid HY3003D).

Test setup

Uni-T UT71E True RMS 40K count multimeter

I thought an interesting investigation and a good starting point for getting to grips with the scope would be to measure the power consumption during an RFM12B transmission.

With the standard passive probes the scope only measures voltage, so a 10R 0.25W shunt resistor was placed in series and the voltage drop across it measured. A 10R resistor works nicely since the voltage drop can easily be converted into current by dividing by 10, which is done by the probe, a 10:1 passive probe. Any other value could be used and the conversion to current done in software on the scope. The units for current (A) can be set on the scope. The shunt resistor must be connected on the GND side of the circuit so that one side of the resistor is at 0V; otherwise use differential probes (or two standard probes with some clever settings on the scope).

Scope probe connection, measuring voltage drop across a 10R resistor

The maximum current which a shunt resistor of a particular power rating can handle can be calculated with Imax = sqrt( Pmax / R).

A 0.25W 1R shunt resistor can handle up to 500mA
A 0.25W 10R resistor can handle up to 158mA
A 0.25W 100R resistor can handle up to 50mA
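The three figures above can be checked with a few lines of Python. This is only a sketch of the Imax = sqrt(Pmax / R) formula plus the Ohm's law conversion from voltage drop to current; the function names are invented for the example.

```python
import math


def max_shunt_current_ma(power_rating_w, resistance_ohm):
    """Largest current (mA) a shunt can carry within its power rating: I = sqrt(P / R)."""
    return math.sqrt(power_rating_w / resistance_ohm) * 1000.0


def shunt_current_ma(voltage_drop_v, resistance_ohm):
    """Current (mA) through the shunt for a measured voltage drop: I = V / R."""
    return voltage_drop_v / resistance_ohm * 1000.0


for ohms in (1, 10, 100):
    print(f"0.25W {ohms}R shunt: up to {max_shunt_current_ma(0.25, ohms):.0f} mA")

# A 0.24V drop across the 10R shunt corresponds to 24mA.
print(f"{shunt_current_ma(0.24, 10):.1f} mA")
```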

All measurements were taken with a 3.3V supply voltage using an RFu328 on a prototype emonTx V3.


These traces show the ATmega328 coming out of sleep then transmitting 4 integers. Transmission of the 4 integers takes about 3ms, during which the current draw alternates between 23mA and 28mA.

Testing the scope's capabilities using the waveform zoom function
The built-in measurement functions on the scope are pretty amazing; these are the sort of measurements it's possible to obtain for a particular signal.

 
We intend to eventually set up a proper AC power test rig. In the meantime I intend to continue playing with more of the scope's functions; I've only scratched the surface so far. I'm particularly interested in hooking it up to my laptop for further data processing. The scope has both USB and Ethernet connectivity and an open standard protocol. The software bundled with the scope looks pretty decent but is Windows-only; a quick Google shows that much of the work has already been done to import the data from the scope on Linux.

The EEVblog review of the Rigol scope gives more of an idea of its full capability:


http://www.eevblog.com/forum/testgear/first-impressions-and-review-of-the-rigol-ds2072-ds2000-series-dso/

http://www.eevblog.com/forum/projects/software-tips-and-tricks-for-rigol-ds200040006000-ultravision-dsos/

I hear it's also possible to perform a light software hack to convert the scope to 100MHz :-)  http://hackaday.com/2013/07/02/unlocking-a-rigol-scope-once-again/




Emoncms powered by timestore

I've been making some good progress with a timestore powered version of emoncms; it's available on github here: https://github.com/emoncms/emoncms/tree/timestore. The basics now work. It's not ready for a stable switchover, but if you're interested in experimenting, it's good to go for trying out. There are instructions for installation in the github repo readme.

To recap: timestore is a database designed specifically for time-series data, developed by Mike Stirling.


Faster Query speeds
With timestore, feed data query requests are more than 10x faster (2700ms using mysql vs 210ms using timestore). The initial benchmarks I mentioned in previous blog posts showed a timestore request time of around 45ms, so I still need to investigate the slightly slower performance, which may be on the emoncms end rather than in timestore.

Reduced Disk use
Disk use is much smaller. A test feed stored in an indexed mysql table used 170MB; stored using timestore, which does not need an index and is based on a fixed time interval, the same feed used 42MB of disk space.

In-built averaging
Timestore also has an additional benefit of using averaged layers which ensures that requested data is representative of the window of time each datapoint covers.
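To illustrate the idea of an averaged layer, here is a small Python sketch that builds a coarser layer from fixed-interval datapoints by averaging consecutive groups. This is only an illustration of the concept, not timestore's actual implementation; the function name and the use of None for missing datapoints are assumptions.

```python
def average_layer(values, factor):
    """Average consecutive groups of `factor` fixed-interval datapoints.

    Missing datapoints are represented by None and are left out of the average;
    a group with no valid datapoints produces None in the coarser layer.
    """
    layer = []
    for start in range(0, len(values), factor):
        group = [v for v in values[start:start + factor] if v is not None]
        layer.append(sum(group) / len(group) if group else None)
    return layer


base = [100, 200, 300, 500, None, 700]   # e.g. power readings at a fixed interval
print(average_layer(base, 2))
```

Each value in the coarser layer then reflects the whole window it covers rather than a single sample picked from it.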


Due to the integration of averaging in timestore there are significant implications for input processing and visualisations. Rather than compute kwh/d data from power data in an emoncms input process this can now be moved to timestore by querying timestore for average power in a day and converting this value to kwh/d by multiplying the average power by 0.024.
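The 0.024 factor is just 24 hours divided by 1000 W per kW. A one-function Python sketch (the function name is illustrative):

```python
def kwh_per_day(average_power_w):
    """Convert a day's average power in watts to energy in kWh/d (W * 24h / 1000)."""
    return average_power_w * 24.0 / 1000.0


# An average draw of 500W over a day is 12 kWh.
print(kwh_per_day(500))
```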

This means we no longer have two feeds one for kwh/d data and another for power data which changes the way many of the visualisations need to work. Visualisations such as simple zoomer need to query the same feed at different average levels.

To fully integrate timestore in emoncms and reach a similar level of input process and visualisation options is a big task. To simplify things, the current release has cut down functionality with only a few input processes and visualisations.

To create feeds with a fixed datapoint interval I added an interval selector to the input processing configure input page:

The data point interval is displayed as a field in feeds page:


Example of timestore working displaying solar production here over the last few hours:

Try it out following the instructions here for installation on a raspberrypi:

Next I intend to test the histogram datatype and ensure that it works ok. The input processors for plus, minus, divide and multiply by another input also need looking at, and then the more complex kwhd to power feed type zoomer visualisations. If a timestore version of emoncms is useful for you and you're interested in helping to get it working well, I would be glad to have your help.


Great emoncms dashboards by Paul Reed, Jürgen and Tom + update on backup

There's a new section on the forums for showcasing emoncms dashboard creations (thanks to Paul Reed for the idea). Here are a few screenshots of dashboards created by Paul Reed and Jürgen:

I really like the addition of the icons; you can do this by inserting html in the text widget.


Some really nice looking multigraph data, forum post by Jürgen: http://openenergymonitor.org/emon/node/2557

Feel free to share your dashboard screenshots or links on the emoncms showcase forum:

Emoncms backup
A couple of additions to the script linked to in the last blog post: http://openenergymonitor.blogspot.com/2013/07/a-new-backup-system-for-emoncmsorg.html

Completes file backup correctly and better verbose output:

To run the backup script from a service, there's a service script available here:

To install the service script:
sudo cp /home/username/backup/emonbackup /etc/init.d/ 
sudo chmod 755 /etc/init.d/emonbackup 
sudo update-rc.d emonbackup defaults

Start the service:
sudo /etc/init.d/emonbackup start log

A new backup system for emoncms.org

The current emoncms.org backup system works by using two separate servers with data being synced from the main one to the second backup server using the same implementation used in the emoncms sync module http://openenergymonitor.blogspot.co.uk/2013/05/emoncmsorg-backup.html

A couple of weeks after getting all that set up, BigV announced a new feature: archive storage, which they say is ideal for backups. Archive storage works in much the same way as connecting a second external drive to a computer. Archive storage on BigV is guaranteed to be on a separate storage pool from the main vm disk, which is good news as it wasn't guaranteed that the two separate server method used separate storage pools (if I understand correctly).

Another advantage of the new archive storage is that it's much cheaper to run, costing £2/month for 50GB rather than £16/month for another vm and 30GB extra space.

A simple php script is used to perform the backup. It makes use of the direct file access approach I recently learnt about (http://openenergymonitor.blogspot.co.uk/2013/07/more-direct-file-storage-research.html) to do an incremental backup file copy using php file access commands, making it potentially very fast: it can back up files at full file copy speeds:

It accesses the mysql data directly, copying the content of the mysql feed data file i.e:  /var/lib/mysql/emoncms/feed_1.MYD to the backup drive.

On another slightly related note the data in feed_1.MYD is stored simply as:

1 byte for null flag
4 bytes for the timestamp
4 bytes for the float data value
repeated...

and because the readings are inserted one after the other in ascending time we can actually use the mysql feed data files directly with the direct file get_feed_data method to get 10-20x query speed improvement for generating visualisations: https://github.com/emoncms/experimental/blob/master/storage/directfiles/get_feed_data.php
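As a sketch of what reading this layout looks like, the following Python example unpacks 9-byte records of the form described above. The little-endian byte order and the unsigned 32-bit timestamp are assumptions made for the example, and the sample bytes are generated in the script rather than read from a real feed_1.MYD file.

```python
import struct

# Assumed record layout: little-endian, 1 flag byte, uint32 timestamp, float32 value.
RECORD = struct.Struct("<BIf")


def read_datapoints(raw):
    """Decode a bytes buffer of back-to-back 9-byte records into (timestamp, value) pairs."""
    points = []
    usable = len(raw) - len(raw) % RECORD.size   # ignore a trailing partial record
    for offset in range(0, usable, RECORD.size):
        _flag, timestamp, value = RECORD.unpack_from(raw, offset)
        points.append((timestamp, value))
    return points


# Build two sample records in memory (made-up readings 10s apart).
sample = RECORD.pack(0, 1372000000, 250.5) + RECORD.pack(0, 1372000010, 251.0)
print(RECORD.size, read_datapoints(sample))
```

Because every record has the same size, the Nth datapoint sits at byte offset N * 9, which is what makes direct seeking into the file possible.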

We could even get rid of the mysql feed table indexes to save disk space, although that will probably slow down mysql updates (I need to look into it). This could be a short term measure before a full timestore emoncms implementation is complete, which provides many other benefits.

Probably the best student placement opportunity ever!? developing an open energy management system at the Center for Alternative Technology

If you're a student looking for a student placement and interested in open source hardware and software, energy monitoring and sustainability, this might be of real interest to you. The Center for Alternative Technology are offering an exciting student placement developing an open source energy control and management system for the CAT site, intended to ultimately replace two large proprietary Building Management/SCADA systems, that will help them use the site more effectively as a living laboratory for sustainable energy systems.



Here's the full project description from Adam Taylor at CAT:

This placement will assist with the initial development and demonstration of a wireless open source energy management and monitoring system

Project description and person specification

Why?
At the centre we have two proprietary Building Management/SCADA systems that are used for monitoring and heating control; for historical reasons the two systems are not linked in any way, leading to operational problems. Also, as the site is effectively a living laboratory, constantly evolving in its use and layout, the proprietary nature of the systems leads to areas not being integrated or to expensive integrations.

Because of the lack of complete site wide monitoring, our knowledge of energy usage is restricted to a site wide level, not the office level that would be useful. Also, because of the various non-linked systems, heating operation is down to a member of staff switching it on appropriately early to warm the room up, and remembering to switch it off after the room has been finished with. A detailed understanding of how much energy is actually being used, in comparison with our site energy model, as well as being able to automate the operation of the heating, has the potential to save CAT significant amounts of carbon, time and money.

Once the initial development and demonstration of the system is complete, the project can then be opened up to the wider open source community for further development. The demonstration system would also be expanded to cover the whole site, eventually replacing the two existing proprietary systems.

What?
Due to the proprietary and incomplete nature of the existing systems at CAT, a project was begun a few years ago with openenergymonitor.org developers to develop a system to gather data from our various electrical generators, and present it to the public visitors in an understandable way. This proposed project will build on the work of the open energy monitoring project, to develop an open energy management and monitoring system.

The project would initially be looking into the feasibility, and best method of implementation of the proposed plan, to use known well developed technologies such as Arduino micro controllers and xBee wireless RF modules, as part of a meshed wireless energy management and monitoring system.

The main part of the project would be to develop and install a relatively small demonstration of the system, but one that demonstrates all the fundamental requirements of the system. The fundamental requirements are to:
  • Monitor and record a variety of types of sensor readings; e.g. pulsed output anemometers, and variable resistance thermocouples 
  • Operate external systems; e.g. boilers, pumps and radiators 
  • Display recorded data in a useful manner to the general public; e.g. an electronic sign showing the performance of a solar thermal system 
  • Be able to be programmed easily with new control strategies and room bookings 
  • Be developed under open source principles 
  • Be fully documented 
Who?

The applicant should be someone with an interest in open source electronic development. The project requires someone with a rigorous and detailed approach to their work, who has an interest in working in the field of environmental monitoring.

The applicant will be working within CAT’s Estates and Technical department, specifically alongside the Engineering team. The multi-discipline teams contain experienced qualified mechanical, electrical, control and heating Engineers, as well as a plumber, electrician and builder. Guidance and support will be provided throughout the project by the team, but a fair amount of previous experience with electronics and software development will be required.

This is also an ideal opportunity for someone who wishes to learn, in a hands-on manner, about renewable energy generating technologies, eco building, mechanical or control engineering. As well as the energy management system project, the applicant will be helping out with the day to day duties of the teams, including fixing and maintaining the site buildings, heating, electrical supplies, control systems and displays. There will also be opportunities to be involved in any of the projects being undertaken by the teams during the placement. Projects either on-going, or due to start in the next twelve months include:
  • Specifying, installing and commissioning a new biomass boiler for the WISE building 
  • Connecting the new WISE boiler, the new biomass training centre and site community cottages onto existing site heat main, including reconfiguration of SCADA control strategy to account for new loads and heat sources 
  • Upgrading the two main hydro turbines on site 
  • On-going maintenance and upgrade work to the site cliff railway 
S.M.A.R.T. Targets 

Specific goal:
The goal of the project during the year placement is for the applicant to develop, demonstrate and document a wireless open source energy management and monitoring system. This system can then be expanded in the future so that it can be used to monitor and reduce site energy usage.

Measurable goals:
The system must be able to perform four basic requirements. All four basic requirements will be restricted to an individual node on the network for the demonstration, and therefore can be developed separately. The four basic requirements are to:
  • Monitor and record a variety of types of sensor readings; e.g. pulsed output anemometers, and variable resistance thermocouples 
  • Operate external systems; e.g. boilers, pumps and radiators 
  • Display recorded data in a useful manner to the general public; e.g. an electronic sign showing the performance of a solar thermal system 
  • Be able to be programmed easily with new control strategies and room bookings 

Attainable and Realistic goals:
The project is a significant undertaking, but also one that is realistic and attainable. Within the Engineering team at CAT exist all the skills and knowledge to undertake the project, all of which will be available to the applicant to help guide and steer when required. The project also has a real world end use as an objective, which will provide significant motivation to the applicant.

Timely goals:
The placement is limited to a period of one year, which provides a fixed end point to the project.


How to apply & Role description
http://content.cat.org.uk/index.php/vacancies?download=123%3Afunded-student-placement-building-energy-monitoring-system-developer-fixed-term

We've done a fair bit of work with the Center for Alternative Technology over the years, recently running an openenergymonitor course from there and in 2010-2011 working on an emoncms powered microgrid display and pulse counter/data logger for grid import/export.

Microgrid display project at CAT [1] [2] [3]
CAT OpenEnergyMonitor course

We will be happy to support you and go through things that may be helpful on both hardware and software design. The openenergymonitor lab is based 30 miles north of CAT and we often go down to CAT and Machynlleth.

CAT is one of Europe's leading sustainability centers. It is a university, a visitors center, a publisher of books and reports such as zerocarbonbritain, and a place where practical solutions and sustainable technologies are tried out, tested and demonstrated.

Application deadline is the 15th of July

More direct file storage research

I was surprised to find how easy it was to use flat file storage for feed data using PHP file access commands and how fast this approach could be.

While reading up on indexes I realised that the timestamp column in a feed data table is its own index, as it is naturally sorted in ascending order and each row (datapoint) should be unique. A datapoint can then be searched for efficiently using binary search, which I remember covering in A-level computing. The feed data in mysql had a B-tree index which, if I understand correctly, is similar but is implemented using a separate index layer that uses quite a bit of extra disk space.

I had a go at implementing the get_feed_data function used in emoncms to select a given number of datapoints over a time window, used for drawing graphs. An example of the standalone function can be found here:
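The binary search idea can be sketched in Python as follows. This is not the code from the linked PHP function; it assumes a file of fixed-size 9-byte records (1 flag byte, 4-byte timestamp, 4-byte float, little-endian) sorted by ascending timestamp, and uses an in-memory buffer in place of a real feed file.

```python
import io
import struct

# Assumed record layout: little-endian, 1 flag byte, uint32 timestamp, float32 value.
RECORD = struct.Struct("<BIf")


def find_first_at_or_after(feed, target_time):
    """Return the index of the first record whose timestamp is >= target_time.

    `feed` is a seekable binary file of fixed-size records sorted by timestamp.
    Returns the record count if every timestamp is earlier than target_time.
    """
    feed.seek(0, io.SEEK_END)
    low, high = 0, feed.tell() // RECORD.size
    while low < high:
        mid = (low + high) // 2
        feed.seek(mid * RECORD.size)
        _flag, timestamp, _value = RECORD.unpack(feed.read(RECORD.size))
        if timestamp < target_time:
            low = mid + 1
        else:
            high = mid
    return low


# Ten made-up datapoints at timestamps 100, 110, ... 190.
feed = io.BytesIO(b"".join(RECORD.pack(0, 100 + 10 * i, float(i)) for i in range(10)))
print(find_first_at_or_after(feed, 135))
```

Each lookup reads only about log2(N) records, so even a feed with millions of rows needs only a couple of dozen seeks and no separate index.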

https://github.com/emoncms/experimental/blob/master/storage/directfiles/get_feed_data.php

A development branch of emoncms that uses this flat file approach and includes this function can be found here (inserting data, input processing such as power to kwhd, and visualisation all work, but it's still quite conceptual):

https://github.com/emoncms/emoncms/tree/flatfilestore

The get_feed_data function as implemented above takes roughly 120-230ms on a RaspberryPI to select 1000 datapoints over 1 to 300 days with a feed table with over 9 million rows.

That's much better than the 900-2700ms achieved at similar ranges with the current mysql implementation.

The feed table in mysql used 178MB of disk space. The same feed, with no loss of data, stored without an index and accessed as above takes up 67MB, so that's a considerable saving. Interestingly, a 67MB feed can be compressed to 18.5MB with tar.gz compression.

One of the issues with the above get_feed_data query is that it needs to know the data interval in order to use the "get a datapoint every x number of datapoints" approach. We could use binary search to find every datapoint, but this would be slower, although it may be worth trying to get a benchmark so that the two can be compared.

The other issue is that the datapoints selected may not be representative of the window they represent, as each is just one datapoint at a particular point in time. This is the problem that the averaging approach used by Mike Stirling in Timestore and by Frank Oakner in EmonWeb solves.
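To show why the data interval is needed, here is a rough Python sketch of how the "every x datapoints" step could be worked out. It is an illustration under assumed names, not the logic of the linked get_feed_data.php.

```python
def skip_between_reads(window_start, window_end, interval_s, wanted_points):
    """How many rows to step forward between reads to get roughly wanted_points rows.

    Without a known, fixed interval the number of rows in the window cannot be
    estimated this way, which is the limitation described above.
    """
    rows_in_window = (window_end - window_start) // interval_s
    return max(1, rows_in_window // wanted_points)


# One day of 10s data holds 8640 rows, so about every 8th row gives ~1000 points.
print(skip_between_reads(0, 86400, 10, 1000))
```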

Timestore is also a fair bit faster than the above get_feed_data function returning 1000 datapoints in 45ms.

The advantage of the above approach is that it can fit into emoncms without having to change the current implementation too much, the feed data retains its timestamps, input processing is used in the same way.

Not storing timestamps, as timestore does, could also be an advantage as it helps keep data quality high: fixed interval datapoints should be easier to use for comparisons and mathematical operations between feeds, they give you higher certainty when querying the data, fetching data is faster, and disk use is potentially half the size of the above approach if the values are stored as 4 byte floats as above rather than the default 8 byte double. This, coupled with averaged layers, provides data that is representative at all time scales and datapoint number requests.

My next step will therefore be to explore timestore further, first creating a script to export data from emoncms into timestore. The script needs to analyse the emoncms feed to work out the most common data interval. It needs to check for missing data: if a monitor went offline for an extended length of time, it needs to give you the option to take this into account. It then needs to export and import into timestore as efficiently as possible.
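The interval analysis step might look something like the following Python sketch. The function names and the "more than twice the interval counts as a gap" tolerance are assumptions for illustration; the export script itself had not been written at the time of this post.

```python
from collections import Counter


def most_common_interval(timestamps):
    """Return the most frequent gap (in seconds) between consecutive timestamps."""
    gaps = Counter(b - a for a, b in zip(timestamps, timestamps[1:]))
    if not gaps:
        raise ValueError("need at least two timestamps")
    return gaps.most_common(1)[0][0]


def find_gaps(timestamps, interval, tolerance=2):
    """Return (last_seen, next_seen) pairs where the gap exceeds tolerance * interval."""
    return [(a, b) for a, b in zip(timestamps, timestamps[1:])
            if b - a > tolerance * interval]


# Made-up feed: mostly 10s apart, slightly jittery, with one long outage.
times = [0, 10, 20, 30, 41, 50, 60, 200, 210]
interval = most_common_interval(times)
print(interval, find_gaps(times, interval))
```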

In memory storage: PHP shared memory vs Redis vs MYSQL

Continuing on the theme of rethinking the data core of emoncms, as previously mentioned for short term storage, storage to disk may not be necessary, instead we can store data in memory using an in-memory database. Here are some tests and benchmarks for in memory storage:

To start with I created a baseline test using MYSQL, updating a feed's last time and value in the feeds table row for that feed. This took around 4800 – 5100ms to update a row 10000 times.


We would expect Redis to do a lot better as it's in-memory: it isn't writing to disk each time, which is much slower than memory access. Redis did indeed perform faster, completing the same number of updates to a key-value pair in 1900 – 2350ms. I'm a little surprised though that it was only 2.3x as fast and not much faster, but then there is a lot going on: Redis has its own server which needs to be accessed from the PHP client, and this is going to slow things down a bit. I tested both the phpredis client and Predis. Phpredis, which is written in C, was between 500-1000ms faster than the Predis client.


How fast can in-memory storage be? A general program variable is also a form of in-memory storage. A quick test suggests that it takes 21ms to write to a program variable 10000 times: rather than 2.3x faster, that's around 230x faster than mysql! The problem with in-program variables is that if they are written to in one script, say an instance of input/post, they cannot be accessed by another instance serving feed/list. We need some form of storage that can be accessed across different instances of scripts.


The difference between 21ms and 1900-2350ms for redis is intriguingly large and so I thought I would search for other ways of storing data in-memory that would allow access between different application scripts and instances.

I came across the PHP shared memory functions which are similar to the flat file access but for memory, the results of a simple test are encouraging showing a write time of 48ms for 10000 updates. So from a performance perspective using php shared memory looks like a better way of doing things.


The issue though is implementation. Mysql made it really easy to search for the feed rows that you wanted (either by selecting by feed id, or by feeds that belong to a user, or feeds that are public). I'm a little unsure about how best to implement similar functionality in redis, but it looks like it may be possible by just storing each feed's meta data roughly like this: feed_1: {"time":1300,"value":20}.

Shared memory though looks like it could be quite a bit more complicated to implement, but then it does appear to be much faster. Maybe the 2.3x speed improvement over mysql offered by redis is fast enough? And it's probably much faster in high-concurrency situations. I think more testing, and attempts at writing full implementations using each approach, are needed before a definitive answer can be reached.

Load stats for MYISAM vs INNODB for feed storage on the RaspberryPI

Here are some historic load stats for a raspberrypi running here, with 36 feeds being written to.

First using the INNODB storage engine:

A load of 3.5 causes an issue where the time that is recorded for data packets coming in gets messed up creating bunched up datapoints:

Switching the storage engine over to MYISAM reduced the load to around 0.2, and the timing issue is no longer present:


To convert your raspberry pi emoncms Innodb tables to MYISAM you can run the following script on your raspberrypi which will go through each table converting them in turn:



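  <?php
  // Connect to the local emoncms database (default Raspberry Pi credentials).
  $mysqli = new mysqli("localhost","root","raspberry","emoncms");
  if ($mysqli->connect_error) die("Connection failed: ".$mysqli->connect_error."\n");

  // Convert every table in the database to the MYISAM storage engine in turn.
  $result = $mysqli->query("SHOW tables");
  while ($row = $result->fetch_array())
  {
    echo "ALTER TABLE `".$row[0]."` ENGINE=MYISAM\n";
    $mysqli->query("ALTER TABLE `".$row[0]."` ENGINE=MYISAM");
  }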


Rethinking the data input and storage core of emoncms: benchmarks

Over the last few days I've been looking again at the core data input, storage and access part of emoncms. There is definitely a lot of opportunity to improve performance and there are a lot of options so I thought I would start to do some more systematic benchmarking.

So here are some initial benchmarks of feed data storage in different storage engines: mysql (myisam vs innodb), timestore and direct file access. I also thought I'd have a go at writing the current implementation of input processing in both python and nodejs in addition to php, to learn a bit more about these languages as they are being used and favoured by others in the community such as Jerome (python), Houseahedron (python) and Jean Claude Wippler of Jeelabs (nodejs). I'd like to see if there is any measurable difference in performance between these different languages for the kind of application that we are developing, and if there are any other benefits, such as certain things being easier to do.

Housemon by Jean Claude Wippler is a good example of how a timeseries data storage and visualisation application can be implemented in a different way by using a mixture of direct file storage and a redis in-memory database with the server side part of the application written in nodejs.

Intrigued by the idea of using direct file storage, as Jean Claude Wippler does in Housemon, and following the approach used by Mike Stirling in timestore of using a fixed time interval to simplify and speed up searching, I had a go at writing a basic implementation using php file access, and the results are good.
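To illustrate why a fixed time interval simplifies searching, here is a minimal Python sketch of the idea (not the php implementation benchmarked below). The 4-byte float record layout, the NaN padding for missed intervals and the function names `write_point` and `read_points` are all assumptions made for the example. Because every datapoint sits at a position derived from its timestamp, a read is a direct seek rather than a search.

```python
import os
import struct
import tempfile

INTERVAL = 10      # seconds between datapoints (assumed fixed)
RECORD = "<f"      # one 4-byte little-endian float per datapoint (assumed layout)
SIZE = struct.calcsize(RECORD)

def write_point(path, start_time, timestamp, value):
    """Store value in the slot for timestamp; missed intervals are padded with NaN."""
    index = (timestamp - start_time) // INTERVAL
    mode = "r+b" if os.path.exists(path) else "w+b"
    with open(path, mode) as f:
        f.seek(0, os.SEEK_END)
        existing = f.tell() // SIZE
        # Pad any gap so that file position always maps directly to time.
        for _ in range(existing, index):
            f.write(struct.pack(RECORD, float("nan")))
        f.seek(index * SIZE)
        f.write(struct.pack(RECORD, value))

def read_points(path, start_time, t_from, t_to, count):
    """Return one (time, value) pair for each of count evenly spaced positions."""
    points = []
    span = t_to - t_from
    with open(path, "rb") as f:
        f.seek(0, os.SEEK_END)
        total = f.tell() // SIZE
        for i in range(count):
            t = t_from + i * span // count
            index = (t - start_time) // INTERVAL
            if 0 <= index < total:
                f.seek(index * SIZE)   # no search needed: seek straight to the slot
                (value,) = struct.unpack(RECORD, f.read(SIZE))
                points.append((start_time + index * INTERVAL, value))
    return points

# Usage: write one hour of 10 second data, then read 6 points across the hour.
path = os.path.join(tempfile.mkdtemp(), "feed_1.dat")
start = 1_370_000_000
for i in range(360):
    write_point(path, start, start + i * INTERVAL, float(i))
sample = read_points(path, start, start, start + 3600, 6)
print(sample)
```

For simplicity the sketch opens and closes the file on every write; keeping the file handle open across writes would avoid that overhead.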

Storage engine test

All tested on a raspberrypi, running off the standard SanDisk SDHC 4GB SD card.

MYSQL

https://github.com/emoncms/experimental/blob/master/storage/MYSQL/mysql.php
  • InnoDB INSERT 1000 points 21s,25s,20s (Normalised to 100,000 ~ 2200s)
  • InnoDB INSERT 10000 points 167s,183s (Normalised to 100,000 ~ 1750s)
  • MYISAM INSERT 10000 points 15-17s (Normalised to 100,000 ~ 160s)
  • MYISAM INSERT 100000 points 165s
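The normalised figures are linear scalings of the measured times to a common basis of 100,000 points. A small Python sketch of that arithmetic follows; taking the mean (or mid-point) of the repeated runs as the representative time is an assumption on my part.

```python
def normalise(seconds, points, target=100_000):
    """Scale a measured insert time linearly to `target` points."""
    return seconds * target / points

# Representative times taken from the list above.
runs = {
    "InnoDB, 1000 points": (22.0, 1_000),       # mean of 21s, 25s, 20s
    "InnoDB, 10000 points": (175.0, 10_000),    # mean of 167s, 183s
    "MYISAM, 10000 points": (16.0, 10_000),     # middle of 15-17s
    "MYISAM, 100000 points": (165.0, 100_000),
}

for name, (seconds, points) in runs.items():
    total = normalise(seconds, points)
    rate = points / seconds
    print(f"{name}: ~{total:.0f}s per 100,000 inserts ({rate:.0f} inserts/s)")
```

On these numbers MYISAM inserts are roughly ten times faster than InnoDB inserts on this hardware.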
MYISAM | INNODB READ

Benchmark of current emoncms mysql read function that selects given number of datapoints over a time window.

MYISAM results on the left | INNODB results on the right

https://github.com/emoncms/experimental/blob/master/storage/MYSQL/mysql_read.php

10000 datapoint table:
  • 1000dp over 5 hours (average method) 232ms | 391ms
  • 1000dp over 24 hours (average method) 424ms | 675ms
1000000 datapoint table: (115 days @ 10s)
  • all over 0.2 hours (all method) 40ms | 38ms
  • all over 0.5 hours (all method) 58ms | 55ms
  • all over 1 hour (all method) 90ms | 82ms
  • all over 1.3 hours (all method) 108ms | 100ms
  • 1000dp over 3 hours (average method) 237ms | 272ms
  • 1000dp over 5 hours (average method) 280ms | 327ms
  • 1000dp over 24 hours (average method) 726ms | 949ms
  • 1000dp over 48 hours (average method) 1303ms | 1767ms
  • 1000dp over 52 hours (php loop method) 2875ms | 2650ms
  • 1000dp over 100 hours (php loop method) 3124ms | 2882ms
  • 1000dp over 200 hours (php loop method) 2934ms | 2689ms
  • 1000dp over 400 hours (php loop method) 2973ms | 2749ms
  • 1000dp over 2000 hours (php loop method) 2956ms | 2762ms
  • 1000dp over 2600 hours (php loop method) 2969ms | 2767ms
PHP loop method timing may be quite a bit longer if the server is under heavy load, as it involves making many separate mysql queries and each query needs to wait for other queries in the mysql process list to complete.
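The queries behind the "average" and "php loop" methods are not shown here, so the following is only one plausible reading, sketched in Python with the built-in sqlite3 module as a stand-in for mysql. The table layout (`time`, `data`) and the function names are assumptions. The point it illustrates is the structural difference: the average method is a single grouped query, while the loop method issues one query per requested datapoint, so its cost depends on the number of points rather than the size of the window.

```python
import sqlite3

db = sqlite3.connect(":memory:")   # stand-in for the mysql feed table
db.execute("CREATE TABLE feed (time INTEGER PRIMARY KEY, data REAL)")
db.executemany("INSERT INTO feed VALUES (?, ?)",
               [(i * 10, float(i)) for i in range(8640)])   # 24 hours @ 10s

def read_average(t_from, t_to, count):
    """One query: average all rows that fall into each of count time buckets."""
    width = (t_to - t_from) // count
    return db.execute(
        "SELECT (time - ?) / ? AS bucket, AVG(data) FROM feed "
        "WHERE time >= ? AND time < ? GROUP BY bucket ORDER BY bucket",
        (t_from, width, t_from, t_to)).fetchall()

def read_loop(t_from, t_to, count):
    """count queries: fetch the first row at or after each requested time."""
    width = (t_to - t_from) // count
    rows = []
    for i in range(count):
        t = t_from + i * width
        row = db.execute(
            "SELECT time, data FROM feed WHERE time >= ? AND time < ? "
            "ORDER BY time LIMIT 1", (t, t + width)).fetchone()
        if row is not None:
            rows.append(row)
    return rows

averaged = read_average(0, 86400, 24)   # 24 hourly averages from one query
looped = read_loop(0, 86400, 24)        # 24 single-row queries
print(averaged[0], looped[1])
```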
Timestore

Timestore is a promising solution, developed specifically for timeseries data, written by Mike Stirling.
Blog post on timestore: Timestore timeseries database

https://github.com/emoncms/experimental/blob/master/storage/timestore/timestore.php
  • 10000 inserts 52s
  • 100,000 inserts 524s
https://github.com/emoncms/experimental/blob/master/storage/timestore/timestore_read.php
  • Read 1000 datapoints over 5 hours: 45ms
  • Read 10 datapoints over 5 hours 20ms
Timestore includes layer averaging and multiple layers, so there is quite a bit more going on (this would still need to be added to other implementations like direct file and mysql above), and the benchmarks are therefore not directly comparable.
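As a rough illustration of what layer averaging involves, here is a Python sketch in which each layer holds averages of a fixed number of points from the layer below, and a read picks the coarsest layer that still has enough points. This is not timestore's actual code or file format; the decimation factors and the names `build_layers` and `choose_layer` are assumptions.

```python
def build_layers(base, factors):
    """Return [base, layer1, ...]; each layer averages `factor` points of the one below."""
    layers = [list(base)]
    for factor in factors:
        below = layers[-1]
        layer = []
        for i in range(0, len(below), factor):
            chunk = below[i:i + factor]
            layer.append(sum(chunk) / len(chunk))
        layers.append(layer)
    return layers

def choose_layer(layers, wanted):
    """Pick the coarsest layer that still holds at least `wanted` points."""
    for layer in reversed(layers):
        if len(layer) >= wanted:
            return layer
    return layers[0]

base = [float(i) for i in range(8640)]             # 24 hours of 10 second data
layers = build_layers(base, factors=[6, 10, 6])    # 1 min, 10 min, 1 hour (assumed)
print([len(layer) for layer in layers])
print(len(choose_layer(layers, 100)))
```

Reading from a pre-averaged layer means a wide time window touches far fewer values than averaging the raw data at query time, which is part of why the read times are not directly comparable.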

Direct file
For some reason I did not think this method would work as well as the benchmarks show, but it's great that it does: from an implementation point of view it's really simple and very flexible, as it's easy to modify the code to do what you want. See the examples linked:
  • Direct file write 100,000: 6-7s
  • Direct file write 100,000 open and close each time: 27,24,26s
  • Direct file read 1000 datapoints over 5 hours of 10 second data: 85-88ms
  • Direct file read 1000 datapoints over 200 hours of 10 second data: 93ms
  • Direct file read 1000 datapoints over 2000 hours of 10 second data: 130ms
  • Direct file read 1000 datapoints over 2600 hours of 10 second data: 124ms
Redis
For short-term storage, writing to disk may not be necessary; instead we can store data in memory using an in-memory database like redis. Benchmarks to add.

Blog post: Redis idea

Other ideas for storage format
Languages
What about the programming language? No benchmarks yet, but it is interesting to look at the difference in how the code looks. I found each language pretty straightforward to use, and online resources to get me past the bits I didn't know were readily available. The language links below show the core parts of the input processing stage of emoncms written in php, nodejs and python. I've also linked to emonweb, a port of emoncms (or more a build in its own right by now) by Frank Oxener in ruby on rails.
Servers
Emoncms.org stats
HouseMon

HouseMon by Jean Claude Wippler stores data in 3 forms: 
  • Raw log of the serial data received to file (compressed daily) 
  • Redis in-memory storage for last 48 hours which makes for quick access of most recent data. 
  • Archival storage via direct file access for data older than 48 hours, the archive is hourly aggregated data (hourly - unless a use case demands finer resolution at which point the archive can be rebuilt from the raw logs). 
http://jeelabs.org/2013/02/17/data-data-data/
http://jeelabs.org/2013/02/18/who-needs-a-database/

It's quite clear from some of the above tests that the housemon implementation is going to be fast in terms of data access speeds (with redis storing everything in memory for the last 48 hours) and efficient in terms of data storage (binary files holding hourly data). The big difference is that full resolution data is not available after 48 hours, but Jean Claude Wippler argues that it would be better to wait for a use case rather than implement higher resolution for higher resolution's sake, and that the logs can be used to rebuild archives at higher resolution if needed anyway.
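The recent-plus-archive split can be sketched in a few lines of Python. This is a hypothetical stand-in rather than HouseMon's code: a deque plays the role of redis, a dict plays the role of the archive files, and aggregating points at the moment they age out of the 48 hour window is an assumption about how the hand-over might work.

```python
from collections import deque

RECENT_SECONDS = 48 * 3600   # window kept at full resolution
HOUR = 3600

class TieredStore:
    """Full resolution for the last 48 hours, hourly aggregates for older data."""

    def __init__(self):
        self.recent = deque()   # (timestamp, value) pairs, oldest first
        self.archive = {}       # hour start -> [sum, count]

    def add(self, timestamp, value):
        self.recent.append((timestamp, value))
        # Move points that have aged out of the window into hourly buckets.
        while self.recent and self.recent[0][0] <= timestamp - RECENT_SECONDS:
            old_time, old_value = self.recent.popleft()
            bucket = self.archive.setdefault(old_time - old_time % HOUR, [0.0, 0])
            bucket[0] += old_value
            bucket[1] += 1

    def hourly_average(self, hour_start):
        total, count = self.archive[hour_start]
        return total / count

# Usage: feed in 50 hours of 10 second data.
store = TieredStore()
for i in range(18000):
    store.add(i * 10, float(i))
print(len(store.recent), sorted(store.archive), store.hourly_average(0))
```

After 50 hours of input the first two hours exist only as hourly averages, while the most recent 48 hours are still held at 10 second resolution.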

Next steps

If you have a standard emoncms raspberrypi install, changing the mysql storage engine to myisam should bring immediate performance improvements, especially if you have a lot of feeds being recorded. I will try to put together a script to make this easier and also update the ready to go image.

The next development step, I think, is to integrate redis into emoncms by rebuilding the input processing implementation to use redis rather than going to disk to get the last feed and input values. Then it would be good to test both timestore and the integrated direct file storage in action on several raspberrypis in parallel, keep benchmarking the differences and see where that gets us.