
Harmonizing Complex IoT Ecosystems: The Role of a System Interfaces Repository

April 10, 2024

By Ben Jacques, Director of Engineering

The Challenge:

Code organization is hard. Countless blog posts and heated Reddit threads have debated the best way to manage multiple projects. One of those debates centers on the pros and cons of monorepos versus polyrepos (https://github.com/joelparkerhenderson/monorepo-vs-polyrepo). In my experience, there is never a one-size-fits-all approach. However, when you find something that works for your code and your team, it makes everyone’s lives easier. For us, that has turned out to be a polyrepo of monorepos, loosely coupled by a system_interfaces repo. Why would we do such a thing? Partly because the world of the Internet of Things (IoT) is inherently complex. It spans an array of components, from embedded devices and cloud services to web and mobile applications, each playing a pivotal role in the functioning of the entire ecosystem. Defining how each of these domains interacts with the others is critical to building a cohesive system that “just works” for end users. After a lot of trial and error, we have landed on this hybrid approach to code organization and want to share what we have learned.

Understanding the Complexity of IoT Systems

IoT systems involve many components that need to interact seamlessly. Let’s explore these elements:

  • Embedded Devices: These are the fundamental units in IoT, responsible for collecting and acting on data. Understanding their roles, limitations, and capabilities is crucial in the IoT landscape. For us, these devices often run on small microcontrollers using C and some flavor of a real-time operating system (RTOS). Sometimes they are more complex, running a full-blown embedded Linux OS and a higher-level UI framework like Flutter.
  • Cloud Services: Serving as the backbone for data processing and storage, cloud services ensure seamless communication and data exchange among IoT components. Most of our applications leverage Amazon Web Services (AWS) and are written in TypeScript with the AWS Cloud Development Kit (CDK).
  • Web and Mobile Applications: These are the interfaces that users interact with, playing a crucial role in the accessibility and usability of IoT systems. For most users, this is their primary point of interaction with your device, so the user experience really matters. Additionally, most IoT applications need some way for people to administer their devices, manage over-the-air updates, and monitor fleet performance. Most of our mobile applications are written in cross-platform frameworks like React Native or Flutter, while our web apps are mostly written in React.

There can be other components as well, such as third-party integrations with Google Home or Apple HomeKit. The challenge lies in maintaining code coherence across these diverse components. We need to strike a balance between loosely coupled individual components and a highly cohesive overall system.

The Problem with Multiple Repositories

Juggling multiple repositories can lead to a plethora of issues, including version control challenges, compatibility problems, and bugs that manifest in more complex ways.

For version control, different repositories may hold different versions of code that need to be synchronized and compatible. This complicates the development process, as developers must keep track of multiple versions and ensure that their changes are compatible across the entire system. The potential for error increases, as does the difficulty of tracking dependencies and changes across repositories. This is even more challenging in IoT systems, where updates to end devices (the things) are not always predictable or within your control as a developer. Many things can cause an over-the-air (OTA) update to fail or be delayed, leaving a device several versions behind.

Ensuring compatibility and cohesiveness across the different parts of the system is critical when deploying new features and updates. This often involves balancing simplicity against backward compatibility. In a single repository, the pull request and peer review process can help coordinate when new changes are introduced. However, when code is spread across multiple repositories, this gate check isn’t inherently present. It can require more communication between developers, and sometimes between development teams. As much as we would like to think communication will fix every problem, we all know from experience that miscommunication happens and mistakes are made, even with the best of intentions and processes.
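One way to make the backward-compatibility tradeoff explicit is to version the interface definitions themselves and have the cloud check which version each device was built against. The sketch below assumes a semantic-versioning policy invented for illustration (same major version, and a device never ahead of the cloud on minor version); `parseSemVer` and `isDeviceCompatible` are hypothetical helpers, not part of any library.

```typescript
// Hypothetical compatibility rule for interface versions, assuming semantic versioning:
// a major bump is a breaking change; a minor bump only adds optional fields.
interface SemVer {
  major: number;
  minor: number;
  patch: number;
}

function parseSemVer(version: string): SemVer {
  const match = /^(\d+)\.(\d+)\.(\d+)$/.exec(version);
  if (!match) {
    throw new Error(`Invalid version string: ${version}`);
  }
  return { major: Number(match[1]), minor: Number(match[2]), patch: Number(match[3]) };
}

// The cloud can serve a device when both were built against the same major version
// of the interfaces and the device is not on a newer minor version than the cloud.
// Devices left behind by a failed or delayed OTA update therefore stay compatible.
function isDeviceCompatible(cloudVersion: string, deviceVersion: string): boolean {
  const cloud = parseSemVer(cloudVersion);
  const device = parseSemVer(deviceVersion);
  return cloud.major === device.major && device.minor <= cloud.minor;
}

console.log(isDeviceCompatible("2.3.0", "2.1.4")); // true: device lags two minor versions
console.log(isDeviceCompatible("2.3.0", "2.4.0")); // false: cloud must be deployed first
console.log(isDeviceCompatible("2.3.0", "1.9.0")); // false: breaking change between majors
```

Under this assumed policy, a new minor version ships cloud-first and firmware-second, while a new major version needs either a migration plan or a period where the cloud supports both versions.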

One possible solution would be to just jam all of your code into one giant repository. This comes with its own host of problems, including overly complex CI/CD pipelines, painful merges, huge codebases, and added complexity when working across multiple teams, especially when not all developers should have access to all of the code.

Our Use of Monorepos

Even though multiple repositories make a ton of sense across domains, we have actually adopted a monorepo strategy within each individual domain. For example, in mobile development, particularly when dealing with the complexities of IoT components, a monorepo that houses both an SDK for the IoT elements and the application layer offers substantial benefits. This unified structure keeps the dependencies between the SDK and the application layer closely managed and updated in tandem, ensuring seamless integration and compatibility. It mitigates the risk of version discrepancies and simplifies the development workflow, allowing developers to change the SDK and see the immediate impact on the application. Having both components in a single repository also simplifies testing, since any change can be validated across the entire system at once, ensuring that updates to the SDK do not inadvertently break application functionality.

Moreover, a monorepo strategy in mobile development encourages a more collaborative and cohesive team environment. Developers working on the application layer gain a deeper understanding of the underlying SDK, leading to more informed decisions and a greater ability to contribute improvements or optimizations back to the SDK. This cross-pollination of knowledge improves the team’s overall efficiency and product quality. This approach also streamlines onboarding, giving new developers immediate access to both the SDK and the application layer within a single repository. As a result, they can quickly get up to speed on the entire system’s architecture and dependencies and start contributing sooner. Ultimately, a monorepo for mobile IoT development creates synergy between the SDK and application layers, driving innovation, improving system reliability, and shortening development cycles.

This approach works great within a specific domain, such as cloud, mobile, or embedded, but it just doesn’t work across the whole ecosystem for the reasons mentioned above.

Introduction to the System Interfaces Repository

Enter the system_interfaces repo. 

One way we have found to mitigate some of the issues of multiple repositories or one giant repository is to create a unique type of repository that we call system_interfaces.

For us, the system interfaces repo serves as the single source of truth for all interface definitions between the various parts of the system. This isn’t a new concept. It is very similar to the idea of API definitions in a microservices architecture. However, we expand on that idea to cover more types of protocols and more concrete implementations of data models. We typically use a mixture of HTTP, GraphQL, and MQTT, depending on the needs of the system. This communication can also happen over a variety of channels, including WiFi, BLE, or cellular connections. Our system interfaces repo allows us to capture all of these things in one place.
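To make the idea concrete, here is a minimal TypeScript sketch of a catalog that a system_interfaces repo could export, recording each boundary’s protocol, the channels it travels over, who produces and consumes it, and where its schema lives. The entries, file paths, and the `interfacesTouching` helper are hypothetical examples, not the actual contents of our repo.

```typescript
// Hypothetical catalog of system boundaries, exported from a system_interfaces repo.
type Protocol = "http" | "graphql" | "mqtt";
type Channel = "wifi" | "ble" | "cellular";
type Component = "embedded" | "cloud" | "mobile" | "web";

interface InterfaceDefinition {
  name: string;        // unique name for the boundary
  protocol: Protocol;  // application-level protocol
  channels: Channel[]; // links the messages may travel over
  producer: Component; // component that sends the messages
  consumer: Component; // component that receives them
  schema: string;      // path to the schema that defines the payload
}

const systemInterfaces: InterfaceDefinition[] = [
  { name: "device-telemetry", protocol: "mqtt", channels: ["wifi", "cellular"],
    producer: "embedded", consumer: "cloud", schema: "schemas/mqtt/telemetry.json" },
  { name: "fleet-status", protocol: "graphql", channels: ["wifi", "cellular"],
    producer: "cloud", consumer: "web", schema: "schemas/graphql/fleet.graphql" },
  { name: "user-devices", protocol: "http", channels: ["wifi", "cellular"],
    producer: "cloud", consumer: "mobile", schema: "schemas/http/devices.json" },
];

// List every boundary a component participates in, e.g. to scope a code review.
function interfacesTouching(component: Component): string[] {
  return systemInterfaces
    .filter((def) => def.producer === component || def.consumer === component)
    .map((def) => def.name);
}

console.log(interfacesTouching("cloud").join(", ")); // device-telemetry, fleet-status, user-devices
```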

Any change to how system components communicate is made first in the system interfaces repo. From there, it goes through a typical PR process to ensure everyone understands the changes and their intent. The updated system interfaces are then pulled into the other repos as a dependency, and those repos are updated accordingly.

This simplifies many of the version control challenges we run into with multiple repos, while still providing shared schemas and definitions that keep the components consistent. Below is a high-level look at the organization of this repo.

Defining Boundaries and Schemas in System Interfaces

Diving into the technical details of IoT systems reveals how important it is to define clear boundaries and schemas, which together delineate the responsibilities and interactions of the various system components. Boundaries are the demarcation lines that separate different areas of functionality, keeping the system modular and giving each component a clearly defined role. Schemas act as blueprints for data formats and communication protocols, ensuring that information flows smoothly and consistently across the system. Defining both carefully not only keeps the system organized but also simplifies integrating new components and scaling the system over time. One example is defining the boundaries for MQTT payloads when IoT devices talk directly to the cloud. We provide the topic names, the payload definitions for each topic, and the naming patterns.
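Here is a minimal sketch of what those MQTT definitions might look like in TypeScript, assuming a hypothetical `devices/<deviceId>/<channel>` topic pattern and made-up payload shapes; real topic names and payloads would come from the system_interfaces repo itself. Mapping each channel to its payload type lets the compiler reject a payload published on the wrong topic.

```typescript
// Hypothetical topic naming pattern: devices/<deviceId>/<channel>
type TopicChannel = "telemetry" | "commands" | "status";

// Payload definitions for each topic (illustrative shapes only).
interface TelemetryPayload {
  timestamp: number;    // Unix epoch milliseconds
  temperatureC: number; // sensor reading in degrees Celsius
}

interface CommandPayload {
  commandId: string;
  action: "reboot" | "lock" | "unlock";
}

interface StatusPayload {
  online: boolean;
  firmwareVersion: string;
}

// Ties each channel to exactly one payload type.
interface TopicPayloads {
  telemetry: TelemetryPayload;
  commands: CommandPayload;
  status: StatusPayload;
}

const DEVICE_ID_PATTERN = /^[a-z0-9-]+$/;
const TOPIC_PATTERN = /^devices\/([a-z0-9-]+)\/(telemetry|commands|status)$/;

function buildTopic(deviceId: string, channel: TopicChannel): string {
  if (!DEVICE_ID_PATTERN.test(deviceId)) {
    throw new Error(`Device id does not follow the naming pattern: ${deviceId}`);
  }
  return `devices/${deviceId}/${channel}`;
}

// Returns null for any topic that does not follow the naming pattern.
function parseTopic(topic: string): { deviceId: string; channel: TopicChannel } | null {
  const match = TOPIC_PATTERN.exec(topic);
  if (!match) {
    return null;
  }
  return { deviceId: match[1], channel: match[2] as TopicChannel };
}

// The generic parameter forces the payload to match the channel it is published on.
function encodeMessage<C extends TopicChannel>(
  deviceId: string,
  channel: C,
  payload: TopicPayloads[C],
): { topic: string; body: string } {
  return { topic: buildTopic(deviceId, channel), body: JSON.stringify(payload) };
}

const message = encodeMessage("lock-01", "commands", { commandId: "c-42", action: "unlock" });
console.log(message.topic); // devices/lock-01/commands
console.log(message.body);  // {"commandId":"c-42","action":"unlock"}
```

Because the device side, the cloud side, and any tooling can share these definitions, a renamed topic or a changed payload shows up as a compile error in every TypeScript consumer rather than as a silent runtime mismatch.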

The repository plays a pivotal role in maintaining the coherence and compatibility of different IoT components by acting as a centralized hub for these definitions. It provides a single source of truth for all boundary and schema information, facilitating easy access and updates. By housing common schemas and concepts, the repository ensures that all components speak the same “language,” thereby enhancing interoperability among disparate parts of the IoT ecosystem. This centralized approach streamlines the development process, as developers can quickly reference and apply standard schemas, reducing the likelihood of inconsistencies and integration issues. In essence, the repository not only serves as a knowledge base for the technical framework of the system but also as a foundational tool that supports the seamless interaction of the IoT system’s myriad components.

Documentation and Architectural Decisions

Documenting the architecture and key decisions of a system is not just a formality; it’s an essential practice for ensuring the long-term maintainability and scalability of software projects. This documentation serves as a roadmap, guiding both current team members and future developers through the rationale behind decisions, the structure of the system, and how components interact. Without this clarity, teams can struggle with understanding the project’s evolution, leading to increased onboarding time for new members and potentially hindering efficient scaling and adaptation to new requirements.

But what does this look like in practice? We have settled on Architecture Decision Records (ADRs). These are essentially markdown files in the repo, written from a template that defines how each decision should be documented. This gives us revision control as well as approvals that move through a typical PR process. Below is an example of an ADR documenting our decision to use ADRs in a repo.

Beyond ADRs, we also use Mermaid diagrams to document procedures and data flows. Since GitHub renders Mermaid natively, reviewing the diagrams is easy, and we still get full version control and history. For us, this was a major improvement over links to external charting tools or screenshots that aren’t easily tracked and versioned. Below is an example from our CallBox framework.

Leveraging the repository itself as a documentation hub can significantly streamline this process. By integrating documentation directly with the code, it encourages developers to update documentation as part of their development workflow, ensuring that it remains current and closely aligned with the codebase. Adopting best practices such as clear, concise writing; regular updates; and the use of diagrams where appropriate can transform documentation from a neglected chore into a valuable asset. This approach not only facilitates better understanding and cooperation among team members but also enhances the overall quality and agility of the software development process. This is also a great place to house your Definition of Done if you are an agile shop.

Implementing the System Interfaces Repository

Implementing a dedicated repository for system interfaces involves a few critical steps, beginning with the setup process. This initial phase should focus on establishing a clear structure within the repository that categorizes interfaces based on their functionality, usage, or system components they interact with. It’s also essential to define standard naming conventions and documentation practices from the outset to ensure consistency and clarity across the repository. Once the repository’s foundation is laid out, integrating it with existing system components becomes the next focal point. Strategies for this phase include setting up automated processes to ensure that changes in the repository trigger updates in the relevant components and vice versa, thereby maintaining synchronization across the system.
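As one example of that kind of automation, a CI job in the system_interfaces repo could work out which downstream repos need a dependency bump after a merge. The sketch below assumes a hypothetical mapping from schema directories to consuming repos; in practice the list of changed files could come from your CI system or `git diff --name-only`, and the notification step (opening PRs, posting to chat) is left out.

```typescript
// Hypothetical mapping from schema directories to the repos that consume them.
const consumersByPrefix: Record<string, string[]> = {
  "schemas/mqtt/": ["cloud", "embedded"],
  "schemas/http/": ["cloud", "mobile", "web"],
  "schemas/graphql/": ["cloud", "web"],
};

// Given the files changed in a merged PR, return the sorted, de-duplicated list of
// repos that should pick up the new system_interfaces version.
function affectedRepos(changedFiles: string[]): string[] {
  const repos = new Set<string>();
  for (const file of changedFiles) {
    for (const prefix of Object.keys(consumersByPrefix)) {
      if (file.startsWith(prefix)) {
        consumersByPrefix[prefix].forEach((repo) => repos.add(repo));
      }
    }
  }
  return Array.from(repos).sort();
}

// Only the MQTT schema change affects other repos; the ADR edit does not.
console.log(affectedRepos(["schemas/mqtt/telemetry.json", "docs/adr/0007.md"]).join(", ")); // cloud, embedded
```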

Effective management and updating of the system_interfaces repository are vital to its long-term usefulness and reliability. One practical tip is to leverage code generation tools to automate the implementation of schemas and interfaces, reducing manual coding errors and saving time. As an example, within our CallBox framework we use json-schema-to-typescript to generate TypeScript types from our JSON schemas and ajv to validate data against those same schemas. Additionally, regular reviews and updates of the repository should be institutionalized as part of the development cycle, ensuring that the repository evolves in lockstep with the system it supports. By following these practical steps, teams can create a robust framework that enhances collaboration, reduces integration issues, and paves the way for a more scalable and maintainable system.
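The sketch below illustrates the schema-first flow in plain TypeScript, without installing anything. It pairs a hypothetical JSON Schema for a lock-status payload with roughly the interface a generator such as json-schema-to-typescript would produce from it, then checks incoming data against the schema. The `validate` function is a deliberately tiny stand-in for ajv that understands only flat objects with primitive properties; in a real project you would compile the schema with ajv rather than hand-roll this.

```typescript
// Tiny subset of JSON Schema: flat objects whose properties are primitives.
type PrimitiveType = "string" | "number" | "integer" | "boolean";

interface ObjectSchema {
  type: "object";
  required: string[];
  properties: Record<string, { type: PrimitiveType }>;
  additionalProperties?: boolean;
}

// Hypothetical schema as it might live in the system_interfaces repo.
const lockStatusSchema: ObjectSchema = {
  type: "object",
  required: ["deviceId", "locked", "batteryPct"],
  properties: {
    deviceId: { type: "string" },
    locked: { type: "boolean" },
    batteryPct: { type: "integer" },
  },
  additionalProperties: false,
};

// Roughly the interface a type generator would emit for the schema above.
interface LockStatus {
  deviceId: string;
  locked: boolean;
  batteryPct: number;
}

const typeChecks: Record<PrimitiveType, (value: unknown) => boolean> = {
  string: (value) => typeof value === "string",
  number: (value) => typeof value === "number",
  integer: (value) => typeof value === "number" && Number.isInteger(value),
  boolean: (value) => typeof value === "boolean",
};

const hasOwn = (obj: object, key: string): boolean =>
  Object.prototype.hasOwnProperty.call(obj, key);

// Minimal stand-in for a compiled ajv validator: returns a list of error messages.
function validate(schema: ObjectSchema, data: unknown): string[] {
  if (typeof data !== "object" || data === null || Array.isArray(data)) {
    return ["payload must be an object"];
  }
  const record = data as Record<string, unknown>;
  const errors: string[] = [];
  for (const key of schema.required) {
    if (!hasOwn(record, key)) {
      errors.push(`missing required property "${key}"`);
    }
  }
  for (const key of Object.keys(record)) {
    if (!hasOwn(schema.properties, key)) {
      if (schema.additionalProperties === false) {
        errors.push(`unexpected property "${key}"`);
      }
    } else if (!typeChecks[schema.properties[key].type](record[key])) {
      errors.push(`property "${key}" must be of type ${schema.properties[key].type}`);
    }
  }
  return errors;
}

// Type guard: only data that passes validation is treated as a LockStatus.
function isLockStatus(data: unknown): data is LockStatus {
  return validate(lockStatusSchema, data).length === 0;
}

console.log(validate(lockStatusSchema, { deviceId: "lock-01", locked: "yes" }).join("; "));
// missing required property "batteryPct"; property "locked" must be of type boolean
```

With real tooling, both the interface and the validator would be derived from the schema file, so a PR to system_interfaces that changes the schema updates the compile-time type and the runtime check together.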

Case Study: CallBox

Since this approach was successful on a number of client projects, we also rolled it into our own IoT framework, CallBox. After several iterations of different repository and documentation approaches, we landed on the system_interfaces approach for CallBox simply because it worked, and worked really well. It has become a key communication tool for our developers and has reduced confusion across our teams. We have used it to successfully iterate through a number of CallBox versions while maintaining functionality.

Challenges and Limitations

While this approach has been great for us, we know there is no perfect solution to managing this type of software complexity. It has certainly reduced compatibility issues, but it has not eliminated them completely. Good testing and quality assurance are still essential to ensuring the end products work as expected. There are also limits to the code we can generate from the system interfaces. For example, the MQTT message code we generate for embedded C isn’t as good as its TypeScript counterpart.

It is also not easy to implement something like this in a brownfield project. Getting everyone on board with the approach is critical, because to some developers it can feel like it slows down development. You can hear the conversation in your head: “Ugh. I have to wait for this system interfaces PR to get approved before I can even start working on this new feature…”.

However, for us, the benefits very much outweigh the limitations.

Conclusion:

Navigating the intricate landscape of code organization and system architecture in the realm of Internet of Things (IoT) development is no small feat. Our journey through the complex terrain of monorepos and polyrepos has led us to a hybrid solution that has streamlined our processes and bolstered our team’s efficiency: a polyrepo of monorepos, interconnected through a system_interfaces repository. This unique setup has not only addressed the inherent complexities of IoT systems—spanning embedded devices, cloud services, and user interfaces—but has also provided a scalable framework that adapts to evolving project requirements. The system_interfaces repository, acting as a single source of truth, has enhanced communication across teams, reduced integration headaches, and ensured consistency across our diverse components.

As we share our experiences and the practical steps toward implementing and managing this hybrid approach, it’s clear that while no solution is without its challenges, the benefits we’ve reaped have been substantial. From improving version control to facilitating seamless component integration and promoting effective documentation practices, this strategy has empowered us to tackle the complexities of IoT development head-on. Our case study with CallBox underscores the viability and success of this approach, demonstrating its potential to not only streamline development processes but also to foster innovation within the IoT space. Embracing such strategies requires a commitment to adaptation and continuous improvement, but as we’ve seen, the rewards can significantly outweigh the hurdles, paving the way for more robust, flexible, and maintainable systems in the fast-evolving world of technology.