Data portability: exporting feed data from emoncms
Here's my initial solution for feed export code that also allows its load on the server to be regulated.
In trying to work this out, I first tried requesting all the feed data (which could be millions of rows) in one mysql query and then writing the data to a file in one push. Testing on my laptop server installation and monitoring the load with the linux top tool, the mysql query can use around 97% of the cpu for up to tens of seconds depending on the feed size. Once the query is complete, the writing of the file (an apache process) takes up another block of 97% cpu use until the file has finished downloading. Emoncms still seems fairly responsive during that time.
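For comparison, the single-query version looks something like this (a minimal sketch, reusing the db_query/db_fetch_array helpers from the code further down and assuming a feed table called feed_1 with time and data columns):

<?php
// Naive approach: fetch the whole feed in one query, then stream it out.
$result = db_query("SELECT time, data FROM feed_1 ORDER BY time ASC");

$fh = fopen('php://output', 'w');
while ($row = db_fetch_array($result)) {
    // One csv line per reading: timestamp, value
    fputcsv($fh, array($row['time'], $row['data']));
}
fclose($fh);

This is what produces the two back-to-back bursts of 97% cpu described above: one while mysql builds the result set, and one while apache streams it out.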
My next thought was to break that large query up into smaller blocks and introduce a delay between the reading of each block; in future versions the delay could perhaps wait for other actions to finish before continuing (see the sketch below).
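One possible shape for that load-aware delay (a sketch only, not part of the code below; sys_getloadavg() is a standard PHP function, and the 2.0 threshold is just an illustrative value):

// Pause between blocks while the 1-minute load average is high.
function wait_for_quiet($threshold = 2.0, $pause = 100000)
{
    $load = sys_getloadavg();
    while ($load[0] > $threshold) {
        usleep($pause); // back off 100ms at a time while the box is busy
        $load = sys_getloadavg();
    }
}

Calling wait_for_quiet() in place of the fixed usleep($sleep) below would let the export throttle itself automatically.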
Splitting the query into blocks of 400 rows, alternately loading and writing to the file without a delay, drops the load to around 50% for both the mysql and apache processes.
By introducing a delay of about 80 ms (the 80,000 µs used in the code below) it's then possible to drop the load down to roughly 10% for the apache process and a little less for the mysql process.
Increasing the block size and/or reducing the delay speeds the transfer up; reducing the block size and/or increasing the delay slows it down.
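To put rough numbers on that: ignoring query and write time, the transfer rate is capped at block_size / delay, so 400 rows every 80 ms gives an upper bound of about 5,000 rows per second, and a feed of a million rows would then take at least 200 seconds to export. Halving the block size or doubling the delay halves that ceiling.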
Any thoughts on improving this would be welcome; I will try to add it to the emoncms repo soon.
Here's the code:
<?php
// Open database etc here

// Extend timeout limit from 30s to 2mins
set_time_limit(120);

// Feed id and start time of feed to export
$feedid = 1;
$start = 0;

// Regulate mysql and apache load
$block_size = 400;
$sleep = 80000;

$feedname = "feed_".trim($feedid);
$fileName = $feedname.'.csv';

// There is no need for the browser to cache the output
header("Cache-Control: no-cache, no-store, must-revalidate");

// Tell the browser to handle output as a csv file to be downloaded
header('Content-Description: File Transfer');
header("Content-type: text/csv");
header("Content-Disposition: attachment; filename={$fileName}");
header("Expires: 0");
header("Pragma: no-cache");

// Write to output stream
$fh = @fopen('php://output', 'w');

// Load new feed blocks until there is no more data
$moredata_available = 1;
while ($moredata_available)
{
    // 1) Load a block
    $result = db_query("SELECT * FROM $feedname WHERE time > $start
        ORDER BY time ASC LIMIT $block_size");

    $moredata_available = 0;
    while ($row = db_fetch_array($result))
    {
        // Write block as csv to output stream
        fputcsv($fh, array($row['time'], $row['data']));

        // Set new start time so that we read the next block along
        $start = $row['time'];
        $moredata_available = 1;
    }

    // 2) Sleep for a bit
    usleep($sleep);
}
fclose($fh);
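A side note on the query pattern: advancing $start to the last row's time and filtering with WHERE time > $start, rather than paging with LIMIT ... OFFSET, means each block is fetched with an indexed range scan on time, so the last block is no more expensive than the first even on feeds with millions of rows. The one caveat is the strict greater-than comparison: if a feed ever contained duplicate timestamps, rows sharing the time of a block's last row would be skipped.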
In the next post I will discuss creating a script to import feed data by pointing directly at the export script on the other server, plus a background daemon process and queueing mechanism to manage the import. To engage in discussion regarding this post, please post on our Community Forum.