Data portability: Version 1 complete!

Following on from the last series of posts detailing an implementation for exporting and importing data into emoncms to make it possible to move feed data between emoncms instances with ease, these features are now fully integrated in to the latest emoncms version.

I have also created a rough initial interface for selecting the remote account to download from and for selecting the feeds to download, you can get to this page by clicking on the Sync feeds button on the feedlist page:

To select an account enter the account URL and write apikey and click ok, after a moment the feed list of the remote account will appear. 

You can then click on the feeds you wish to download. This will enter the feeds into a download queue. As detailed in the last post the download and import process is run by a separate script.  The script is called import.php and can be found in the root directory of the emoncms install. You can either run this script manually or set up a cron job to run it periodically.


Once you start downloading feeds, if you refresh the page you will see the remote ahead by value decrease as data is downloaded starting with the oldest data first. It is also possible to sync the data again at a later data to bring the local copy up to date.

This feature makes for a really convenient way to backup feed data, as you only need to download the latest changes rather than all the data every time.

There are still a few things missing in the current implementation, that would be great to add:

- The first is that histogram data is not downloaded correctly as the import and export script only works at present with one column of data.

- It would be great to have a feature to select all the remote feeds to download, rather than having to click each one.

- It would be nice to have the sync interface page built in javascript/jquery to have it automatically update on progress without having to refresh the page.

But I think other than these which can be fixed and improved in time, its largely functional and good to go.

Data portability: Importing feed data into emoncms

The last post detailed an implementation for exporting feed data from emoncms, this post explores an implementation for importing feed data, which can be used in conjunction with the export implementation to sync or download feed data from one emoncms instance to another enabling full feed data portability.

I started by trying to create an import script that was build into a normal page action, you would go to a page ie:

sync/feed?id=10...

The problem with this is that downloading and importing a feed takes a long time and so the page will appear unresponsive for that time, the loading animation just turns and turns.

If you mistakenly click on the refresh button, the initial process will not close and so you will start to get duplicate processes which will mess up the data and increase the server load.

Ideally while downloading and importing the data there would be a progress bar so that you can see what's happening and any other useful feedback if things go wrong.

To get this kind of feedback you need to be able to update the browser as the process is running.

You also want the syncing to happen directly between servers rather than server to local computer to server as this would be faster in instances where both servers are remote and the internet access to the local computer is relatively slow, - i.e you want the import process handled in php which is server side (you could use another server side language if you wanted).

The way to get feedback about what's happening while having the import process being handled in php is to have two separate scripts one that handles the importing and another that can be refreshed regularly serves the gui webpage.

We can extend the idea of a separate import script to sort the problem of duplicate processes too by implementing a download and import queue.

When a user clicks to download/sync a feed in the sync page, the action behind this enters the details of the feed to be downloaded in the queue, it can also check at this point that it does not already exist in the queue.

We can then run the import script independently of the gui page, such as via a scheduled cron job that runs the script one a minute. We can put a check in there to ensure that only one instance of the script is running at any given time too.

The import script then checks if there are feeds to be downloaded in the import queue, working through each queue item sequentially.

The added advantage of the queue implementation is that we can keep the load that importing places on the server low allowing only one feed to be downloaded and imported at any given time.

So all in all this implementation:
  • Enables feedback on import progress.
  • Ensures we don’t get duplicate feed syncing processes.
  • Ensures that the server load is kept under control.
On to the code, here's the import script that works through the import queue and downloads and imports each item sequentially, I would be interested to hear if there are any ways the efficiency of the export and import process can be improved.

<?php

// Ensure only one instance of the script can run at any one time.
$fp = fopen("importlock", "w");
if (! flock($fp, LOCK_EX | LOCK_NB)) { echo "Already running\n"; die; }

// Connect to the database
define('EMONCMS_EXEC', 1);
require "Includes/process_settings.php";
require "Includes/db.php";
switch(db_connect()) {
case 0: break;
case 1: break;
case 3: show_dbsettingserror_message(); die ;
}

// Fetch the import queue
$result = db_query("SELECT * FROM importqueue ORDER BY `queid` Asc");

// For each item in the queue
while($row = db_fetch_array($result))
{
$queid = $row['queid'];
$feedname = "feed_".trim($row['localfeedid'])."";

// Check if we have already downloaded part of the feed and get the last
// value entered so that we dont download and insert data that has already
// been inserted this makes this utility useful for syncing in general
// and in particlar backup that only downloads the latest changes.
$feed_result = db_query("SELECT * FROM $feedname ORDER BY time Desc LIMIT 1");
$feed_row = db_fetch_array($feed_result);
$start = 0; if ($feed_row[0]) $start = $feed_row[0];

// Open the file served from the export page on the remote server
$url = $row['baseurl'].'/feed/export?apikey='.$row['apikey'].'&id='
.$row['feedid'].'&start='.$start;

echo "Opening file $url\n";
$fh = @fopen( $url, 'r' );

// Read through the file
$i = 0;
while (($data = fgetcsv($fh, 0, ",")) !== FALSE)
{
$feedtime = $data[0]; $value = $data[1];

if ($feedtime!='' && $value!='')
{
$i++;
//Contruct values part of the query
if ($i!=1) $vals .= ',';
$vals .= "('$feedtime','$value')";

// Execute query every 400 rows (same block size as export script)
if ($i>400)
{
$i = 0;
if ($vals) db_query("INSERT INTO $feedname (`time`,`data`) VALUES ".$vals);
$vals = "";
}
}
}
// If there are lines to be inserted left over insert them here at the end
if ($vals) db_query("INSERT INTO $feedname (`time`,`data`) VALUES ".$vals);

fclose($fh);

echo "Transfer complete\n";
echo "Deleting item $queid from queue\n";
db_query("DELETE FROM importqueue WHERE queid = $queid");
}

?>

In the next post I will explain the interface for generating the feed import queue.

Data portability: exporting feed data from emoncms

Following on from the previous two posts:
Making emoncms data portable
Data portability challenges

Here's my initial solution for feed export code that also allows it's load on the server to be regulated.

In trying to work this out I first tired requesting all the feed data (which could be millions of rows) in one mysql query and then writing the data to a file in one push, Testing on my laptop server installation and monitoring the load with the linux top tool the mysql query can use around 97% of the cpu for up to tens of seconds depending on the feed size. Once the query is complete the writing of the file (an apache process) then takes up another block of 97% cpu use until the file has completed downloading. Emoncms still seems fairly responsive in that time.

My next thought was to break up that larger query into smaller blocks and introducing a delay between the reading of each block, maybe the delay could wait for other actions to finish before continuing in future versions.

Spiting the query into blocks of 400 rows, alternately loading and writing to the file without a delay drops the load to around 50% for both the mysql and apache processes.

By introducing a delay of about 80 microseconds its then possible to drop the load down to roughly 10% for the apache process and a little less for the mysql process.

Increasing the block size or/and reducing the delay speeds the transfer rate up, reducing the block size or/and increasing the delay slows the transfer rate down.

Any thoughts on improving this would be welcome, I will try and add it to the emoncms repo soon.

Here's the code:


<?php
  // Open database etc here
  // Extend timeout limit from 30s to 2mins
set_time_limit (120); // Feed id and start time of feed to export $feedid = 1; $start = 0; // Regulate mysql and apache load. $block_size = 400; $sleep = 80000; $feedname = "feed_".trim($feedid).""; $fileName = $feedname.'.csv'; // There is no need for the browser to cache the output header("Cache-Control: no-cache, no-store, must-revalidate"); // Tell the browser to handle output as a csv file to be downloaded header('Content-Description: File Transfer'); header("Content-type: text/csv"); header("Content-Disposition: attachment; filename={$fileName}"); header("Expires: 0"); header("Pragma: no-cache"); // Write to output stream $fh = @fopen( 'php://output', 'w' ); // Load new feed blocks until there is no more data $moredata_available = 1; while ($moredata_available) { // 1) Load a block $result = db_query("SELECT * FROM $feedname WHERE time>$start  
    ORDER BY time Asc Limit $block_size");

$moredata_available = 0;
while($row = db_fetch_array($result)) {

// Write block as csv to output stream
fputcsv($fh, array($row['time'],$row['data']));

// Set new start time so that we read the next block along
$start = $row['time'];
$moredata_available = 1;
}
// 2) Sleep for a bit
usleep($sleep);
}

fclose($fh);

In the next post I will discuss creating a script to import feed data by pointing directly at the export script on the other server, plus a background running daemon process and queueing mechanism to manage the import process.

Data portability challenges

Continuing from yesterday's post on making emoncms data portable.

There are some challenges with implementing data portability for energy data. 5s energy feed data can grow to be quite large, I have several feeds that are approaching 100Mb and a total account size of about 700Mb and thats only just approaching one year.

Moving large quantities of data uses server resources including: cpu, memory and bandwidth. Emoncms.org is hosted on a BigV virtual machine, where you get 1GiB Ram, 25GiB Disk space, 200GiB transfer for £10 a month, which is a fantastic deal for those specs.

As exporting or importing large feeds takes time its important that it does not hog resources on the server making the server unusable for other users, any import or export code needs to use a small percentage of the server resources and ideally the amount of resources used could be adjustable. We also need to keep an eye on the data transfer limits.

In the next post I will post up the initial working feed export code.






Making emoncms data portable

One of the features I've wanted in emoncms for some time is the ability to move feed data between different server's with ease.

I want to be able to log in the emoncms instance running on my local machine and go to a page where I can enter the authentication details and location of my account on a remote emoncms server such as emoncms.org at which point it would bring up a list of the feeds that are available and I could either download particular feeds or just click download all and it would automatically download the whole account.

This would allow me to get at my data when I'm offline and it could serve as a personal backup of the data. Another need for this feature is we have an old remote emoncms instance that has some old energy data on, this feature would make it really easy to move the data to the new instance.

This feature could also work in a similar way to a ubuntu one or dropbox like service, there could be an automatic syncing option.

This would make the energy data very portable, data potability helps to create an open web, ensures that we can be in full control of our energy data, we can remove it from a service if we no longer want to be with a particular service, use it in other applications and for other applications that may not have been thought of in the original application design.

Over the next few blog posts I'm going to try and explore how this can be implemented. I've already got some working code, but a lot of work will need to be done to make this a nice well implemented feature, so if you can help with the coding or/and provide insight into better ways of doing things, that would be most welcome.

Cleanweb UK

Last Thursday Glyn and I attended the Ignite Cleanweb UK event in London, an event exploring the power and potential of the web to help us solve our environmental challenges.

The format of the talks where all 5 minute, 20 slides, 15 seconds per slide which was a challenge to create but makes for a really fascinating event where you here about a lot of different projects in a short space of time. Videos of all the presentations talks are up on the web here: http://www.cleanweb.org.uk/ignite.html.

Here's a video of our presentation on OpenEnergyMonitor:



The slides on their own can be viewed on slideshare:




Thanks a lot to James Smith (@floppy) for organising the event and inspiring us with the cleanweb uk manifesto at the end.

EmonTx and NanodeRF Code guides


These EmonTx and NanodeRF code guides go through the main components required to put a full EmonTx and NanodeRF (or ENC28J60 based OpenKontrolGateway) firmware together.

If your just starting out, you might find going through these guide examples first better than starting straight in with the fully fledged firmware examples.


Cleanweb UK - Ignite Evening - London 13th Sep

Thursday, September 13, 2012

7:00pm, at Forward Technology in London



Trystan and I are looking forward to attending and speaking at the first Cleanweb UK Ignite evening in London. It looks like it will be a great event, there will be 12 short (5min) fast paced talks to do with sustainability and the internet. 

Please come along and joins us, registration and full event details here: http://www.cleanweb.org.uk/ignite.html



Reading Watt-hour data from an Elster A100C electricity meter using IrDA by Dave Berkeley part 2



As mentioned before if you have an Elster meter this is a fantastic way to read the exact accumulated watt hours that you have generated or used and can compliment and cross check a CT based measurement.

Dave Berkeley has now packaged the code into a nice library which makes it really easy to use. We have uploaded the library to a repository on github here: https://github.com/openenergymonitor/ElsterMeterReader

How to build it

1) To build the TSL261R based sensor for use with the emontx, follow the jack plug connection diagram on the emontx reference page here: http://openenergymonitor.org/emon/emontx/reference

We've put together a small kit available in the shop of all the components needed to connect it to an emontx including the TSL261R a male 3.5mm jack, 1m of cable and heat-shrink.

2) Plug the sensor into the pulse input on the emontx (digital pin 3, interrupt pin 1)

3) Download the ElsterMeterReader library and place in the arduino libraries folder.

4) Open the Arduino IDE and run the following example:

#include "elster.h"

void meter_reading(unsigned long r)
{
  Serial.print(r);
  Serial.print("\r\n");
}

ElsterA100C meter(meter_reading);

void setup()
{
  Serial.begin(9600);
  Serial.println("IRDA Meter reader");
  meter.init(1);
}
void loop()
{
  // Decode the meter stream
  const int byte_data = meter.decode_bit_stream();
  if (byte_data != -1) {
    meter.on_data(byte_data);
  }
}

5) If you now go to the serial window and place the sensor over the IrDA port (bottom-left of the meter) you should see the Watt-hours elapsed printed out to the serial window:


Thanks a lot to Dave Berkeley for this, for more information see his project page: http://www.rotwang.co.uk/projects/meter.html

Open Kontrol Gateway - emonBase

For a while now we have been using the NanodeRF as our emonBase web-connected base station. The NanodeRF is great and has met our needs well. However using the ENC28J60 Ethernet chip does have some limitations.

We have through about having an emonBase option which supports the Wiznet Ethernet chip. This is the chip that is used in the official Arduino Ethernet. On the Wiznet chip the TCP/IP stack is implemented on the chip as opposed to on the ATmega328 as in the case of the ENC28J60, this means ATmega328 has a greatly reduced load, more application memory is freed up and the code is much simpler and brought in line with the Arduino Ethernet code examples.

In testing we have found the Wiznet chip to be more stable at holding a connection over a long period.

However the Wiznet chip is more expensive than the ENC28J60, what we need is a flexible emonBase unit which is interchangeable between the two Ethernet chips, enter the Open Kontrol Gateway (OKG)!




The OKG is a very flexible web-connected base station based on the ATmega328 which is totally Arduino IDE compatible (Arduino UNO bootloader). The OKG has been designed and manufactured by our friend Miles who runs Ciseco Plc.

The OKG PCB supports multiple wireless options such as RFM12B, XBEE, XRF, RN-XV Wifi. There is also PCB footprints for expansions like a RTC, SD card and SRAM. Finally what attracted us to the OKG was the ability to easily choose between the ENC28J60 or Wiznet Ethernet chips:

OKG ENC28J60 module

OKG Wiznet W5200 module 
We now have stock of the OKG in kit form in the OpenEnergyMonitor shop. The kits will come with an RFM12B wireless radio (choice of frequency) and a choice of ENC28J60 or Wiznet Ethernet chips.

The OKG with the ENC28J60 is almost hardware identical to the NanodeRF. NanodeRF emonBase sketches will run on the OKG with a couple of little changes.

See the OKG OpenEnergyMonitor wiki page for the changes required to make the OKG work with the RFM12B and OpenEnergyMonitor emonBase examples:

http://wiki.openenergymonitor.org/index.php?title=Open_Kontrol_Gateway

Note: The wiki is still in beta, we will blog more about it once it's been a bit more polished. It's our hope that by using a Media Wiki for our more technical documentation people will be more inclined to contribute corrections and additions and the documentations should be easier to organise and maintain. The OpenEnergyMonitor login should automatically log you in to the wiki. Just make sure your logged into the OpenEnergyMonitor website before opening up the wiki page. Try clearing your browsers cookies if this doesn't work first time.

We are now stocking the OKG in the OpenEnergyMonitor shop: http://shop.openenergymonitor.com/open-kontrol-gateway/

What about the Raspberry Pi!? 
Don't worry we haven't overlooked the Raspberry Pi! We have successfully had emoncms running on a Pi receiving RFM12B data from a USB JeeLink, we plan to blog more about this soon. We're currently with Martin Harizanov to make a RFM12B to Raspberry Pi GPIO expansion board. The Raspberry Pi will make a very powerful emonbase/home energy server however it's quite a complicated little beast for beginners who have limited linux knowledge. We think there is still a need to also have a simpler Arduino based emonBase option such as the NanodeRF or OKG.