DEV Community

Cover image for Home Grown IoT - Data
Aaron Powell for Microsoft Azure

Posted on • Originally published at aaron-powell.com on

Home Grown IoT - Data

One part of my IoT solution design that I wanted to dive into a bit more in the data side of things because after all, the reason Iā€™m making this project is to capture data.

The first question you might want to ask yourself when making an IoT project is where are you going to store the data, this was where I started, but now that Iā€™ve completed the first release Iā€™ve realised that that was the wrong question to have started with, instead I should have asked what was I going to do with the data?

Using Your Data

Before you choose a storage type and data structure itā€™s really important to start thinking about just what you will use your data for. Azure has lot of things to choose from such as Time Series Insights, Machine Learning services, Stream Analytics, Power BI or even the humble Excel spreadsheet! You can also build your own dashboards, maybe even some animated charts with React šŸ˜‰.

All of this will influence the decisions that you make around storage and structure.

For my solution I have two ways I want to use the data at the moment, first is to generate Power BI reports that allow me to look at trends over time (generation, consumption, etc.), second is to create some custom real-time charts.

Structuring Your Data

Now weā€™ve got an idea of what we want to do without data itā€™s time to think about how we will structure it, as that will have an influence on the type of storage that we use.

From my inverter Iā€™m getting three data sets:

  1. The labels for the sensors being monitored
  2. The values from each monitored sensor
  3. The power generated in 5-minute increments (I thinkā€¦ Iā€™m not 100% sure if thatā€™s what it, but thatā€™s what I think it is)

The data in 1 & 3 are interesting but the really valuable information is the data from the 2nd API. In here itā€™s broken down with a few valuable groups:

  • The watts, amps and volts per panel group
  • A summary of the watts, amps and volts that went into the inverter
  • A summary of the watts, amps, volts and hertz that went to the grid (I thinkā€¦ Itā€™s called ā€œoutā€ in the API so Iā€™m guessing thatā€™s out of the inverter to the grid)
  • Total generation summary in day, week, month, year and all time

Thereā€™s a heap of other points that I get back that I donā€™t understand too (this is an undocumented API after all šŸ¤£).

With this in mind, I started to think about the kinds of ā€œquestionsā€ I would formulate for the data, such as ā€œwhat is the power generated by each panel set for the last 30 days?ā€, ā€œwhatā€™s the total in and out?ā€ or ā€œhow much power do we use as a household?ā€. This helped me think about how best to structure the data.

I decided that I wanted to store the raw message untouched since I donā€™t use all fields yet (but may in the future), and I want to do this for each API that I call.

Next, I want to break down the main one into a few groups, Panel Feed and Summary. This is where I use multiple functions and consumer groups that I described in the solution design.

Finally, we want to structure our data for the kinds of queries we want to do against it. I made the decision that I would optimise for read in a non-relational manner, meaning Iā€™ll duplicate data across the different structures instead of doing joins. But I do still want to have a loose relationship between each piece of data, so for that Iā€™m generating a correlation ID that is attached to the message so each record can be related if I want.

Choosing Our Storage Type

So letā€™s take stock, we want to store half a dozen different data structures in a non-relational manner with some basic query support. Oh, and I want it to be cheap (hey, itā€™s my credit card each month!). With all this in mind I landed on Azure Table Stroage.

Since Iā€™m using F# I have Record Types that represent the different structures:

type PanelInfo =
     { [<PartitionKey>] Panel: string
       [<RowKey>] Id: string
       MessageId: string
       Current: float // Iin#
       Volts: float // Vin#
       Watts: float // Pin#
       MessageTimestamp: DateTime // SysTime
       CorrelationId: string }

Enter fullscreen mode Exit fullscreen mode

Source

Youā€™ll see here that I have a CorrelationId property, this allows me to trace the panel record (of which I have 2 per message) back to the full data set when it was sent up. I also have a timestamp in there for the message that allows me to group them over time.

For each of my data structures I use a different table rather than a combined table. This is mainly so I can look at an individual type and not find data gaps when the structure of each record is different.

Itā€™s Not Perfect

Itā€™s worth noting though that this isnā€™t a perfect solution. When I started looking into the Power BI reporting my friend Dom Raniszewski asked me why I was using Table and not Blob, which would be more efficient. And heā€™s right, there are a number of inefficiencies in how the data is stored for read in Power BI, but the reason for that is I also wanted an easy programmatic model so I could build my own real-time reporting (Power BI refreshes the data every 24 hours). Iā€™m going to keep it as is for the moment but weā€™ll see, I may revise it in the future.

And it turns out that future is now, as while writing this post I realised I had a design flaw in the way Iā€™m storing data for retrieval. Since the main view I want is at the day level, not seconds (which I capture in) I need some way to view that. But I canā€™t do it because the date is a timestamp to the second and Table Storageā€™s query engine isnā€™t advanced enough. Iā€™m going to think through how to best address this and retrofit it back into the 30k+ records I already have in storage!

Conclusion

Data is often a cornerstone of an IoT project and ensuring you have the right approach to storing it will dramatically improve the benefit you can get from it. As a technologist, your thought might immediately jump to choosing the right database type and then determine how to work your application into it.

Instead, Iā€™d encourage you to flip the direction, start thinking about what you want to do with your data and then find out what will be the best fit for that.

For me Table Storage is the best fit for a number of reasons, but there are still imperfect edges that Iā€™ll have to deal with.

Top comments (0)