Some Tips & Tricks when working with AWS IoT Rules Engine

#aws #iot

The AWS IoT Rules Engine is widely used for routing IoT device messages to different downstream services, based on evaluation of rule statements and conditions. Even if working with the AWS IoT Rules Engine can be straightforward, there are a few tricks to keep in mind when things don’t go according to plan.

In this post, I will put forward a few tips that you should apply to your AWS IoT Rules-based implementation, to prevent failures, increase observability, and verify that everything works as expected.

Enable the AWS IoT Logs

Debugging IoT rule executions is not trivial. Incorrect syntax is usually detected at creation time, but many issues only surface at runtime. When something goes wrong, your best bet for identifying the problem is the AWS IoT Logs. You can enable them using the AWS CLI, AWS CloudFormation, the AWS CDK, or the AWS Console. Once they are enabled, you can explore the log streams in Amazon CloudWatch Logs or, for more granular analysis, query them with CloudWatch Logs Insights.
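
As a rough sketch of the CloudFormation route, the template fragment below turns on account-level logging with the AWS::IoT::Logging resource. The resource names (IoTLoggingRole, IoTLogging), the INFO log level, and the wildcard resource are assumptions made for illustration; the permission list is a minimal guess that you should check against the AWS IoT logging documentation and scope down for your account.

```yaml
# Sketch only: IoTLoggingRole and IoTLogging are placeholder names.
IoTLoggingRole:
  Type: AWS::IAM::Role
  Properties:
    AssumeRolePolicyDocument:
      Version: "2012-10-17"
      Statement:
        - Effect: Allow
          Principal:
            Service:
              - iot.amazonaws.com
          Action:
            - sts:AssumeRole
    Policies:
      - PolicyName: AllowIoTLogging
        PolicyDocument:
          Version: "2012-10-17"
          Statement:
            - Effect: Allow
              Action:
                - logs:CreateLogGroup
                - logs:CreateLogStream
                - logs:PutLogEvents
              # Assumption: narrow this to the AWS IoT log group in your account.
              Resource: "*"

IoTLogging:
  Type: AWS::IoT::Logging
  Properties:
    AccountId: !Ref AWS::AccountId
    RoleArn: !GetAtt IoTLoggingRole.Arn
    # Assumption: INFO is enough to see rule executions; DEBUG is more verbose.
    DefaultLogLevel: INFO
```

Keeping the logging configuration in the same template as your rules means a fresh account or region gets logs from the first deployment, rather than after the first incident.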

With CloudWatch Logs Insights, for instance, you could examine all the failures in your rule execution with the following query:

fields @timestamp, @message | filter ruleName="<your-rule-name>" and status = 'Failure' | sort @timestamp desc

Don’t forget to configure an Error Action for your Rule

Any of the actions you configure could fail at any point. Common causes include missing execution permissions (a very common error, especially in cross-account setups), runtime errors, and throughput-exceeded exceptions at high TPS, to name just a few.

To ensure resilience, you need to build a mitigation for this point of failure. This mitigation should come in the form of an Error Action, which logs your error, ensures you don’t lose data, and allows you to build a decoupled error handling flow. Setting up an Error Action is easy.

Below is an AWS CloudFormation S3 Error Action example.

ActivityToKinesisTopicRule:
  Type: AWS::IoT::TopicRule
  Properties:
    RuleName: "<your-rule-name>"
    TopicRulePayload:
      RuleDisabled: false
      AwsIotSqlVersion: "2016-03-23"
      Sql: "Some SQL"
      Actions:
        - Kinesis:
            RoleArn: "<your-kinesis-role>"
            StreamName: "your-kinesis-data-stream"
            PartitionKey: "<your key>"
      ErrorAction:
        S3:
          BucketName: "<your-bucket-name>"
          Key: "error-${timestamp()}.json"
          RoleArn: "<your-action-role>"
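
If the goal is a decoupled error handling flow rather than an archive of failures, a queue can stand in for the bucket. The fragment below is a sketch of the same ErrorAction block using the SQS action instead of S3; the queue URL placeholder is hypothetical, and it assumes the role is allowed to call sqs:SendMessage on that queue.

```yaml
# Fragment of TopicRulePayload; the queue URL and role are placeholders.
ErrorAction:
  Sqs:
    QueueUrl: "<your-error-queue-url>"
    RoleArn: "<your-action-role>"
    # UseBase64 is optional; false leaves the message un-encoded.
    UseBase64: false
```

A rule takes a single Error Action, so choose the target that matches how you plan to process failures: S3 for inspection later, a queue when a consumer should retry or raise an alert.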

Ensure that the SQL Version is set to "Latest" for your IoT Rule

So let me share with you a little anecdote to kick start this best practice section.

Just last week, I was building a data ingestion pipeline routing time-series data from IoT devices into Amazon Kinesis Data Streams, via the Rules Engine’s Kinesis Action. As I was doing due diligence and checking whether ingestion worked, I noticed that all the array key/value pairs from the device JSON payload were missing from the Kinesis records. No error, no warning, just missing.

I looked in the IoT logs, checked for Kinesis errors, and built an Error Action, but nothing helped. All the executions completed cleanly and nothing seemed wrong. When I looked closely at my IoT Rule as displayed in the AWS IoT console, I noticed that the rule’s SQL version was set to "2015-10-08". That was clearly not what I wanted. Because I had forgotten to set the desired version in my AWS CloudFormation template, the rule resource was created with the default SQL version, which is the old "2015-10-08". If you create your rule using the AWS SDK, you will see the same behaviour. And, as the old version does not properly support arrays, the array key/value pairs in my payload were silently ignored.

I remembered later that, every once in a while when working with the AWS IoT Rules Engine, I have run into one problem or another that ended up being related to an old SQL version. The behaviour can be unpredictable, so don’t expect a straightforward error message.

The conclusion here is: as a best practice, explicitly set the SQL version you want (for example, the current latest version, "2016-03-23") when you create your AWS IoT Rule.
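
In a CloudFormation template, the fix for the anecdote above comes down to one property. The fragment below is a minimal sketch; the topic filter in the SQL statement is a hypothetical placeholder, and the Kinesis values reuse the placeholders from the earlier example.

```yaml
# Fragment of an AWS::IoT::TopicRule; the topic filter is a made-up example.
TopicRulePayload:
  # Pin the version explicitly; leaving it out falls back to "2015-10-08".
  AwsIotSqlVersion: "2016-03-23"
  Sql: "SELECT * FROM 'devices/+/telemetry'"
  Actions:
    - Kinesis:
        RoleArn: "<your-kinesis-role>"
        StreamName: "your-kinesis-data-stream"
        PartitionKey: "<your key>"
```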

Does your IoT Rule have the right permissions?

Failures in your rule execution can be caused by a lack of permissions. Your AWS IoT Rule needs an IAM role with a policy that allows the AWS IoT service to perform the operations needed to execute the action you specified. If you forget to add this, your rule execution will fail with authorization errors. Below is an example of an IAM role that allows your rule to write to an Amazon Kinesis data stream.

KinesisRuleRole:
  Type: AWS::IAM::Role
  Properties:
    AssumeRolePolicyDocument:
      Version: "2012-10-17"
      Statement:
        - Effect: Allow
          Principal:
            Service:
              - iot.amazonaws.com
          Action:
            - sts:AssumeRole
    Path: "/"
    Policies:
      - PolicyName: AllowKinesis
        PolicyDocument:
          Version: "2012-10-17"
          Statement:
            - Effect: Allow
              Action:
                - kinesis:PutRecord
              Resource: "<your-kinesis-data-stream-arn>"
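
The Error Action needs permissions of its own, otherwise the safety net fails just like the action it protects. As a sketch, a role for the S3 Error Action shown earlier could look like the following; the resource name ErrorActionRole and the policy name are assumptions, and the bucket placeholder matches the one used in the rule.

```yaml
# Sketch only: ErrorActionRole and AllowS3ErrorWrites are placeholder names.
ErrorActionRole:
  Type: AWS::IAM::Role
  Properties:
    AssumeRolePolicyDocument:
      Version: "2012-10-17"
      Statement:
        - Effect: Allow
          Principal:
            Service:
              - iot.amazonaws.com
          Action:
            - sts:AssumeRole
    Path: "/"
    Policies:
      - PolicyName: AllowS3ErrorWrites
        PolicyDocument:
          Version: "2012-10-17"
          Statement:
            - Effect: Allow
              Action:
                - s3:PutObject
              # Object-level ARN: the rule writes objects into the bucket.
              Resource: "arn:aws:s3:::<your-bucket-name>/*"
```

Using a separate role for the Error Action keeps each policy narrow, although a single role covering both actions would also work.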

In conclusion, working with the AWS IoT Rules Engine can be straightforward, but, when issues occur, debugging is not always trivial. Having a few tricks up your sleeve, like the ones above, can help save some development and debugging time.
The four take-aways from this post are: enabling AWS IoT Logs, adding an Error Action, verifying the SQL version, and giving the AWS IoT Service the needed permissions for your action.

If you liked this post and would like to see more, follow me on Twitter (@fay_yette), or LinkedIn.

DEV Community
