Rethinking IoT Command and Control: Beyond AWS Shadows


December 9, 2024

By Rocky Sherriff | Principal Software Engineer

By Blake Lapum | Senior Software Engineer

Command and control for IoT devices refers to the mechanisms and protocols that enable remote interaction with connected devices, allowing users or systems to trigger specific actions or state changes and to receive responses or updates about the device’s status.

There are certainly some IoT devices that don’t need command and control – think of a sensor node that just streams readings to the cloud. But IoT devices become enormously more useful once command and control is added to the mix. Such interactive devices allow users to customize behavior, automate tasks, and respond to changing needs, transforming static devices into dynamic and engaging solutions.

When using AWS’s IoT tools, one common approach for remote command and control is to use IoT Core’s shadow desired objects. This is simple and quick to implement, but as we’ll see, it is not always a good fit. We’ll discuss some limitations of that approach and describe some alternatives, including direct MQTT request and response messages as used by the SpinDance CallBox IoT framework.

Understanding command and control using AWS Shadows

In AWS IoT Device Shadows, the desired property is a section of the shadow document used to specify the state you want the device to achieve. It represents the target or intended state for the device’s attributes.

When you update the desired property, AWS IoT passes this information to the device via a delta MQTT topic, prompting the device to take actions to match the specified state. Once the device updates its status and reports back to AWS IoT, the reported section of the shadow document reflects the current state of the device. When a reported value matches its desired value, AWS IoT stops including that entry in the delta; the desired entry itself stays in the document until it is explicitly set to null.

For example, the cloud (or a mobile device via the cloud) might want to set your thermostat to 72 degrees. It could send a shadow update with a desired section like this:

{
  "state": {
    "desired": {
      "temperature": 72
    }
  }
}

The device should process the corresponding delta, update its internal state, and report back its current temperature in the reported section of the shadow. When AWS sees the reported state match the desired state, the delta for that property goes away. To remove the desired object from the shadow, the cloud sets it to null.
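To make the relationship between the desired, reported, and delta sections concrete, here is a minimal Python sketch. It is a simplified model rather than the AWS implementation: the function names `compute_delta` and `build_reported_update` are made up for illustration, and the real service also tracks versions, metadata, and timestamps that are ignored here.

```python
import json


def compute_delta(desired, reported):
    """Return the parts of `desired` that differ from, or are missing in, `reported`."""
    delta = {}
    for key, want in desired.items():
        have = reported.get(key)
        if isinstance(want, dict) and isinstance(have, dict):
            # Compare nested objects property by property
            nested = compute_delta(want, have)
            if nested:
                delta[key] = nested
        elif want != have:
            delta[key] = want
    return delta


def build_reported_update(applied):
    """Build the shadow update payload a device might publish after applying `applied`."""
    return json.dumps({"state": {"reported": applied}})


desired = {"temperature": 72}
reported = {"temperature": 68}

delta = compute_delta(desired, reported)
print(delta)  # {'temperature': 72}

# Once the device reports the new value, nothing is left in the delta
print(compute_delta(desired, {"temperature": 72}))  # {}

print(build_reported_update(delta))
```

The device only ever needs to act on the delta, which is why the shadow approach feels so lightweight for simple state changes.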

On the surface, this looks great! With minimal fuss, you’re able to command a device to change state and most of the cloud-side plumbing is handled for you. Indeed, this approach works well under a few constraints (see AWS Best practices for the AWS IoT Classic Shadow):

Restrict writing desired objects (including any clearing) to the cloud, and restrict writing reported objects to the device. This can be locked down with IoT Policies.
Limit use to relatively infrequent commands, typically on timescales of minutes or hours not seconds.

But alas, there are some gotchas, as we’ll discuss in the next section.

Limitations of command and control via AWS Shadows

The easy implementation of command and control via thing shadows does come with some costs and some applicability limitations that we describe here.

1. State-Driven, Not Action-Oriented:

The desired property is designed to represent a target state, not to trigger immediate or one-time actions. For example:

a. A good use: “Set the thermostat to 72° F” (this is a target state to be maintained).
b. A bad use: “Reboot the device” (this is a one-time action).

If the device cannot reach the desired state, the desired property could persist indefinitely. This:

a. Can require the cloud to detect the situation and null the desired property itself.
b. Leaves no native feedback mechanism to explain why a value was invalid or how a command failed.

2. Race Conditions:
Because there is no “handshake” between cloud and device for accepting or rejecting a command, the response via the shadow reported object is prone to race conditions. For instance, when a device cannot accept a command, you might be tempted to have the device null the corresponding desired property. But as soon as you have both cloud and device writing to the same property, you open race conditions when updates conflict or overlap. Alternatively, you might add a new reported object property to report error codes for these rejections, but then you have a race condition between multiple desired posts and the corresponding fault code report, making it difficult to unambiguously associate the fault code with a specific command.

3. High Messaging Overhead:

Each shadow update involves multiple MQTT messages spread across up to four topics:

1. $aws/things/<thing_name>/shadow/update
2. $aws/things/<thing_name>/shadow/update/documents
3. $aws/things/<thing_name>/shadow/update/accepted or /rejected
4. (Optional) $aws/things/<thing_name>/shadow/update/delta if there is a mismatch between desired and reported.

As a result, a single command using the desired property typically results in 7 total MQTT messages: 4 for the initial shadow update and 3 for the response shadow update after execution. This is particularly problematic as you scale up the number of devices in the field.
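A quick back-of-the-envelope calculation shows how this adds up. The sketch below uses the 7-messages-per-command figure from the text; the fleet size and command rate are hypothetical numbers chosen only for illustration, and the 3-message comparison assumes a pattern with one request and two responses, like the direct MQTT approach described later in this article.

```python
# 4 messages for the initial shadow update + 3 for the response update (from the text)
SHADOW_MESSAGES_PER_COMMAND = 4 + 3


def daily_messages(devices, commands_per_device_per_day, messages_per_command):
    """Total MQTT messages per day for a fleet, given a per-command message cost."""
    return devices * commands_per_device_per_day * messages_per_command


fleet_size = 10_000          # hypothetical fleet
commands_per_day = 24        # hypothetical: one command per device per hour

shadow_total = daily_messages(fleet_size, commands_per_day, SHADOW_MESSAGES_PER_COMMAND)
print(shadow_total)  # 1680000

# For comparison: one request plus two responses per command
direct_total = daily_messages(fleet_size, commands_per_day, 3)
print(direct_total)  # 720000
```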

4. Rate limitations:

AWS best practices suggest using the shadow for low-to-medium transaction-per-second use cases, such as infrequent state updates occurring in minutes, hours, or days. High-frequency updates need a more responsive, lighter-weight command pattern.

5. Resource Overhead:

Shadow documents consume memory and processing on embedded devices, particularly when managing deeply nested objects or frequently processing deltas. This can overload constrained devices and lead to performance issues.

Command and control alternatives

At this point, you should have a feel for when you can do command and control via shadows and when you shouldn’t. What are your alternatives? There are really two big contenders:

AWS Jobs
Direct MQTT request and response messages

AWS Jobs provide a powerful command and control sequence of MQTT messages that maintains job state synchronization between cloud and device throughout the process. That allows for meaningful accept and reject communications from the device and avoids the other limitations of the desired object. But the implementation is rather heavyweight. For mission-critical use cases, like over-the-air (OTA) updates, it’s worth it to get the structure and reliability of jobs. But for many other command and control use cases, it’s overkill.

Direct MQTT request and response messages are a lightweight alternative to AWS Jobs. They are exactly what they sound like. You implement a command using a single MQTT command message from cloud to device. The device can then respond with one or more response MQTT messages to the cloud.

The CallBox command and control pattern

The SpinDance CallBox IoT framework implements direct MQTT command and control with a single request message from cloud to device and one or two response messages from device to cloud. The device sends a single response for commands that get rejected and two messages for commands that get accepted. Here’s an abbreviated message sequence chart describing the interaction:

The cloud initiates a command by sending a command request MQTT message to the device. The message specifies one of a set of previously agreed command IDs that identifies the action requested of the device.

The device validates the command to ensure that it’s a well-formed command message and that the command is valid for the device’s current state. If the command is invalid, the device sends a response message telling the cloud that the command was rejected along with an error code that provides information about why it was rejected.

If the command is valid, the device sends a response message saying the command is accepted and then starts the associated action. The action might be either a change in persistent state or a one-time action. Once the command has completed, the device sends a second response indicating that the execution is complete. The distinction between command validation and execution phases is valuable for implementing user interfaces where the user needs to distinguish a bad command and a failed execution. The initial accept response can be used to update the mobile or cloud UI informing the user that the command execution has begun. The pattern works particularly well for commands that take a while to execute.
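The device-side flow described above can be sketched in a few lines of Python. This is an illustrative sketch, not the CallBox implementation: `handle_command`, the `commands` registry, and the `publish` callback are hypothetical names, and the execution-failure status code is an assumption. The message fields follow the example messages shown later in this article.

```python
from datetime import datetime, timezone

STATUS_OK = 0
STATUS_REJECTED = 2      # rejection code used in this article's example
STATUS_EXEC_FAILED = 3   # hypothetical code for a failed execution


def _response(request, phase, status, details):
    """Build a response that echoes the request's ID and command name."""
    return {
        "requestId": request.get("requestId"),
        "commandName": request.get("commandName"),
        "timestamp": datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ"),
        "phase": phase,
        "statusCode": status,
        "details": details,
    }


def handle_command(request, commands, publish):
    """Validate, acknowledge, execute, and report completion of one command.

    `commands` maps a command name to a (validate, execute) pair, where
    validate(payload) returns an error string or None, and execute(payload)
    performs the action and raises an exception on failure.
    """
    entry = commands.get(request.get("commandName"))
    if entry is None:
        publish(_response(request, "validate", STATUS_REJECTED, "unknown command"))
        return
    validate, execute = entry
    payload = request.get("payload", {})

    error = validate(payload)
    if error:
        # Rejected commands get exactly one response
        publish(_response(request, "validate", STATUS_REJECTED, error))
        return

    # Accepted commands get two: one now, one when execution finishes
    publish(_response(request, "validate", STATUS_OK, "OK"))
    try:
        execute(payload)
    except Exception as exc:
        publish(_response(request, "execute", STATUS_EXEC_FAILED, str(exc)))
        return
    publish(_response(request, "execute", STATUS_OK, "OK"))


# Usage: collect published messages in a list instead of sending them over MQTT
sent = []
commands = {"reboot": (lambda payload: None, lambda payload: None)}
handle_command({"requestId": "A1", "commandName": "reboot", "payload": {}}, commands, sent.append)
print([(m["phase"], m["statusCode"]) for m in sent])  # [('validate', 0), ('execute', 0)]
```

Passing `publish` in as a callback keeps the sketch independent of any particular MQTT client; a real device would also need to guard against malformed JSON before reaching this point.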

RGB Nightlight example

To make things a little more concrete, let’s examine a classic IoT use case – the RGB Nightlight. This is some device that supports a controllable RGB LED. And we want to control the color of the LED remotely.

With the CallBox pattern, the cloud would send a command request message with something like the following payload:

{
  "requestId": "A34FD723", // Unique to each request
  "commandName": "rgbControl", // Identifies the action
  "timestamp": "2023-10-04T18:57:07Z", // Useful for debugging
  "payload": {
    "state": "solid",
    "color": {
      "r": 250,
      "g": 125,
      "b": 10
    }
  }
}

If the command is ill-formed or the device cannot currently handle the request, then the device sends a reject response:

{
  "requestId": "A34FD723",
  "commandName": "rgbControl",
  "timestamp": "2023-10-04T18:57:08Z",
  "phase": "validate",
  "statusCode": 2, // Rejection code
  "details": "invalid RGB value" // Reason
}

If the command is well-formed and the device is able to handle it, then the device sends an accept response similar to this:

{
  "requestId": "A34FD723",
  "commandName": "rgbControl",
  "timestamp": "2023-10-04T18:57:07Z",
  "phase": "validate",
  "statusCode": 0,
  "details": "OK"
}

and then starts executing the command. When the execution completes, the device sends a completion message similar to:

{
  "requestId": "A34FD723",
  "commandName": "rgbControl",
  "timestamp": "2023-10-04T18:57:07Z",
  "phase": "execute",
  "statusCode": 0, // Success
  "details": "OK"
}

Note that the execution complete message might alternatively send a “statusCode” that says the execution failed and give a reason for it.
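As an illustration of the validation step for this particular command, here is a minimal Python sketch of a payload check. The function name `validate_rgb_control` and the set of supported states are assumptions made for the example; only the "solid" state and the "invalid RGB value" reason come from the messages above.

```python
SUPPORTED_STATES = ("solid",)  # assumption: only the state shown in the example


def validate_rgb_control(payload):
    """Return None if the rgbControl payload is acceptable, else a short reason."""
    if not isinstance(payload, dict):
        return "payload must be an object"
    if payload.get("state") not in SUPPORTED_STATES:
        return "invalid state"
    color = payload.get("color")
    if not isinstance(color, dict):
        return "missing color"
    for channel in ("r", "g", "b"):
        value = color.get(channel)
        # bool is a subclass of int in Python, so exclude it explicitly
        if isinstance(value, bool) or not isinstance(value, int) or not 0 <= value <= 255:
            return "invalid RGB value"
    return None


good = {"state": "solid", "color": {"r": 250, "g": 125, "b": 10}}
bad = {"state": "solid", "color": {"r": 300, "g": 125, "b": 10}}

print(validate_rgb_control(good))  # None
print(validate_rgb_control(bad))   # invalid RGB value
```

The returned reason string maps directly onto the "details" field of a reject response, which is what lets the cloud or mobile UI tell the user what went wrong.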

So far, this RGB LED example is a classic state-driven command case. We expect the LED to remain in the newly commanded state. But the same pattern can just as easily handle one-time action use cases like turning on the LED for a specific duration, or flashing the LED a specific number of times. In both those cases, the final state of the LED is eventually “off” but the one-time actions offer dramatically different ways to get there.

Another important point about the pattern is less obvious with the RGB LED example. There would typically be no delay in executing a command to change an LED color. But it should be easy to see that this pattern is equally at home supporting commands that take some time to execute. A good example might be a remote garage door opener, where the opening takes ~20-30 seconds.
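On the cloud side, supporting long-running commands mostly means remembering which requests are outstanding and updating their status as responses arrive. The following Python sketch shows one way to do that by correlating on "requestId". The `CommandTracker` class, its state names, and the `openDoor` command are hypothetical and used only for illustration.

```python
import uuid


class CommandTracker:
    """Tracks each command from 'sent' through 'accepted' to 'completed'."""

    def __init__(self, publish):
        self._publish = publish
        self.status = {}  # requestId -> current state

    def send(self, command_name, payload):
        """Publish a command request and start tracking it."""
        request_id = str(uuid.uuid4())
        self.status[request_id] = "sent"
        self._publish({
            "requestId": request_id,
            "commandName": command_name,
            "payload": payload,
        })
        return request_id

    def on_response(self, message):
        """Update state from a device response; returns the new state, or None if unknown."""
        request_id = message.get("requestId")
        if request_id not in self.status:
            return None  # response to a request this tracker did not send
        ok = message.get("statusCode") == 0
        phase = message.get("phase")
        if phase == "validate":
            self.status[request_id] = "accepted" if ok else "rejected"
        elif phase == "execute":
            self.status[request_id] = "completed" if ok else "failed"
        return self.status[request_id]


# Usage: a garage door that takes a while to open
outbox = []
tracker = CommandTracker(outbox.append)
rid = tracker.send("openDoor", {})

# The accept response arrives right away; the UI can show "opening..."
print(tracker.on_response({"requestId": rid, "phase": "validate", "statusCode": 0}))  # accepted

# Some 20-30 seconds later the completion response arrives
print(tracker.on_response({"requestId": rid, "phase": "execute", "statusCode": 0}))   # completed
```

This sketch assumes responses arrive in order; a production version would also want timeouts for commands whose completion message never arrives.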

Benefits of Direct MQTT Command/Response

Hopefully, we have demonstrated by now the power and flexibility of the direct MQTT request and response pattern for implementing remote command and control. Here is a summary of the benefits over a state-driven solution in general and the AWS shadow desired property in particular:

Supports either state change commands or one-time action commands using the same pattern.
Avoids the race conditions described above. Each response carries the requestId of the command it answers, and request and response messages are queued and handled in order, so there is never ambiguity about which response belongs to which request.
Reduces message traffic for simple commands.
Gives the device an opportunity to validate commands before execution and a simple way to reject invalid commands in an informative way.
Supports separate command “validate” and “execute” messages for cases where execution requires noticeable time. This is particularly useful for providing such feedback to the user via the cloud or mobile UI.
Allows easy implementation and customization for developers.

Notice that the same pattern supports queries as well as actions. You can just add a “queryResponse” field to the execution response message.

Conclusion

Direct MQTT command and response is a powerful and flexible pattern for IoT command and control, offering benefits like reduced messaging overhead, clear feedback, and support for both immediate and long-running actions. While AWS shadows have their place for low-frequency, state-driven use cases, a direct command approach is often a better fit for dynamic, high-frequency, or action-oriented scenarios. By leveraging patterns like the SpinDance CallBox framework, developers can build responsive, scalable, and user-friendly IoT solutions that meet the demands of modern connected devices.
