Graphing Orlando IoT Temperature Sensor Readings

I wondered what temperatures in Orlando have done over this last week. You see, I just happen to have a set of IoT devices streaming data that I persist into an archive. One of those sensors is on a covered patio in Orlando, so it would be interesting to see what the data from the past week shows.

In my setup, SmartThings sends data to a Spring Cloud Data Flow endpoint, which drops the data into files. Those files are parsed and uploaded to a Ceph host, a simple IoT data flow. Within Greenplum I have created external tables that can read that data, and the table looks like this:

gpadmin=# \d smartthings_tracking

         Table "public.smartthings_tracking"
   Column   |            Type             | Modifiers 
------------+-----------------------------+-----------
 deviceid   | text                        | 
 name       | text                        | 
 value      | text                        | 
 recordedat | timestamp without time zone | 
 unit       | text                        | 
Number of child tables: 170 (Use \d+ to list them.)
Distributed randomly
Partition by: (recordedat)
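
For reference, a plain external table over those Ceph-hosted files might be declared roughly like the sketch below, using Greenplum’s s3 protocol against Ceph’s S3-compatible gateway. The endpoint, bucket, and config file path here are hypothetical stand-ins for my actual setup:

-- minimal sketch: endpoint, bucket, and config path are hypothetical
CREATE EXTERNAL TABLE smartthings_tracking_ext (
 deviceid   text,
 name       text,
 value      text,
 recordedat timestamp without time zone,
 unit       text
)
LOCATION ('s3://ceph-gateway:7480/smartthings/ config=/home/gpadmin/s3.conf')
FORMAT 'CSV';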

The structure is simple, and I can sample a few of the data points with this query:

gpadmin=# \x
Expanded display is on.

gpadmin=# SELECT * FROM smartthings_tracking 
WHERE deviceid = 'a15b41c3-8bf0-49df-8571-a5a5d138fdff' 
ORDER BY recordedat DESC LIMIT 2;

-[ RECORD 1 ]------------------------------------
deviceid   | a15b41c3-8bf0-49df-8571-a5a5d138fdff
name       | temperature
value      | 72
recordedat | 2017-09-12 01:35:47.993
unit       | F
-[ RECORD 2 ]------------------------------------
deviceid   | a15b41c3-8bf0-49df-8571-a5a5d138fdff
name       | temperature
value      | 73
recordedat | 2017-09-12 00:20:46.677
unit       | F

I know I’m going to want a visual representation of the data, because looking at it as a list won’t easily convey the information I want. I also know I will be doing some interactive querying to try to get this right, so I switch from the command line over to Apache Zeppelin, which connects to my Greenplum server.

This is one of those cases where I can really appreciate interactive querying of IoT data. I need to do some exploring and will submit a few dozen queries, or attempts at queries. As I’m crafting the visualization I will adjust the time window to varying widths of data to tune what I see, which will generate still more queries. Getting immediate feedback greatly decreases the cycle time to a result. What I’m doing here is fairly simple, though, and it doesn’t take long before I come up with a query and time window I like.

SELECT 
 deviceid, 
 recordedat::timestamp, 
 value 
FROM smartthings_tracking 
WHERE deviceid = 'a15b41c3-8bf0-49df-8571-a5a5d138fdff' 
AND name = 'temperature' 
AND recordedat > '2017-09-05 00:00:00' 
ORDER BY recordedat DESC;

A little bit of messing around with the visualization buttons and I come up with this:

It is not a bad image, but I quickly realize something is wrong with it. Those nice angled lines do not show how temperatures behave over time. Why is that?

Then I remember: it is because these sensors kick out a temperature value only when they have registered a change of a full degree. So the time between two consecutive points could be 3 seconds or it could be 12 hours.
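
One way to see this is to measure the gap between consecutive readings. A quick check along these lines, using the lag window function with the same device filter as before, makes the uneven spacing obvious:

SELECT
 recordedat,
 recordedat - lag(recordedat) OVER (ORDER BY recordedat) AS gap
FROM smartthings_tracking
WHERE deviceid = 'a15b41c3-8bf0-49df-8571-a5a5d138fdff'
AND name = 'temperature'
AND recordedat > '2017-09-05 00:00:00'
ORDER BY recordedat;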

This means time is not reflected with spatial accuracy along my X axis. While the chart does show the strange weather, reflected by the low values on the left, it isn’t giving me a good perception of how that actually played out in time over the past week.

Frustration.

Then I remember a post by Caleb Welton from a few years ago on time series analysis. It has exactly what I need. A big part of making this work is setting up a few functions that will handle and maintain the last known values.

-- transition function: return the current value when it is present,
-- otherwise fall through to the previous (last known) value
create function
 last_known_t(prev float8, current float8)
 returns float8
 as $$return current or prev$$
 language plpythonu;

-- prefunc lets Greenplum merge partial states across segments
create aggregate last_known(float8) (
 stype = float8,
 sfunc = last_known_t,
 prefunc = last_known_t
);

-- the same pair again, overloaded for timestamps
create function
 last_known_t(prev timestamp, current timestamp)
 returns timestamp
 as $$return current or prev$$
 language plpythonu;

create aggregate last_known(timestamp) (
 stype = timestamp,
 sfunc = last_known_t,
 prefunc = last_known_t
);
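
As a quick sanity check, here is a hypothetical three-row inline table run through the aggregate as a window function; the NULL in the middle row inherits the value from the row before it:

SELECT i, v, last_known(v) OVER (ORDER BY i) AS filled
FROM (VALUES (1, 72::float8),
             (2, NULL),
             (3, 70::float8)) readings(i, v)
ORDER BY i;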

With those in place I can now write a new query that cuts time into slices and drops the data into those slices, carrying the last known value forward into any slice that has no reading of its own.
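
The query also relies on interval_bound, a helper from that same post that snaps a timestamp down to the start of its time slice. If it isn’t already installed, a minimal sketch of such a function might look like this (my own approximation; the definition in the original post may differ):

create function interval_bound(timestamp, interval)
 returns timestamp
 as $$
   -- snap the timestamp down to the start of its interval-wide slice,
   -- counted from a fixed origin
   select '2000-01-01'::timestamp
          + floor(extract(epoch from ($1 - '2000-01-01'::timestamp))
                  / extract(epoch from $2)) * $2;
 $$ language sql immutable;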

WITH
  bounded AS (
    SELECT
      recordedat,
      interval_bound(recordedat, '15 minutes') AS slice,
      value
    FROM smartthings_tracking
    WHERE deviceid = 'a15b41c3-8bf0-49df-8571-a5a5d138fdff'
      AND name = 'temperature'
      AND recordedat::timestamp > '2017-09-05 00:00:00'
  ),
  dense AS (
    SELECT slice
    FROM generate_series('2017-09-05 00:00:00'::timestamp,
                         '2017-09-13 00:00:00', '15 minutes') s(slice)
  )
SELECT
  slice,
  value::int,
  last_known(value::int) OVER (ORDER BY slice, recordedat)
FROM bounded RIGHT JOIN dense USING (slice)
ORDER BY slice, recordedat;

Looking at a piece of the raw output, it becomes easy to see how different the data appears once it is chunked into even time periods.

I will potentially double up on some numbers if the temperature jumped more than a degree within a 15 minute slice, but my visualization can show the average, so it is not a problem. This leads to a much better visualization of the temperatures.
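
If I wanted exactly one row per slice instead of relying on the chart to average, I could wrap the same query and average within each slice. A sketch of that variant:

WITH
  bounded AS (
    SELECT
      recordedat,
      interval_bound(recordedat, '15 minutes') AS slice,
      value
    FROM smartthings_tracking
    WHERE deviceid = 'a15b41c3-8bf0-49df-8571-a5a5d138fdff'
      AND name = 'temperature'
      AND recordedat::timestamp > '2017-09-05 00:00:00'
  ),
  dense AS (
    SELECT slice
    FROM generate_series('2017-09-05 00:00:00'::timestamp,
                         '2017-09-13 00:00:00', '15 minutes') s(slice)
  ),
  filled AS (
    SELECT
      slice,
      last_known(value::int) OVER (ORDER BY slice, recordedat) AS temp
    FROM bounded RIGHT JOIN dense USING (slice)
  )
SELECT slice, avg(temp) AS avg_temp
FROM filled
GROUP BY slice
ORDER BY slice;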

The temperatures sure did deviate from the normal pattern there for a little while. This isn’t a fantastic visualization, but for under an hour of work it does give me a decent visual representation of the strangeness in the weather. I feel lucky that the only impact I see is in this graph, and I pray that those dealing with greater impacts stay healthy and find a path to recovery.