BIG Data, Fast Data - Part I
Published: September 04, 2019
I regularly get asked the question: “What’s so special about Internet of Things (IoT) data?”
Surely data is just data after all? It really can’t be all that much different from any other type of data that developers work with on a daily basis to create compelling solutions to business problems. My answer is normally, “yes and no; it’s complicated”. I’ll aim to shed some light on that question and provide insights into the common pitfalls of working in the data quagmire of the IoT world. In this, the first of a three-part series, we will look at IoT from a data perspective to appreciate the design challenges we face in a world where everything will be connected.
Before we start focussing on the here and now, a small history lesson should help to put this into perspective.
The olden days
The Internet of Things has evolved considerably over the many years I have been involved in this exciting yet confusing space. Before the “Internet of Things” became a commonplace term, we liked to use magical terms such as RFID (Radio Frequency Identification), M2M (machine-to-machine) and telemetry to describe the technology components and concepts required to bring data associated with physical “things” kicking and screaming into this newly enlightened digital world.
Historically, “Things” solutions required highly specialised experience combining hardware and low-level software development. Looking back, the effort required to extract even relatively simple telemetry data from a physical device was considerable. Devices were far less standardised and generally required significant prior engineering knowledge of the device platform and, usually, the accompanying SDK. Decoding the data from a relatively simple sensor (such as temperature) would usually require some rudimentary electronics understanding, in addition to the learning curve for the device’s SDK, which would be needed to read, process and communicate the data.
Every aspect of this process was fraught with challenges and constraints. Devices were usually extremely constrained in terms of CPU processing power and memory capacity. Networking and serial connectivity capabilities revolved around bits and bytes of data. If you were lucky enough to be working on a truly “connected” device, you still had to keep data transmission costs on cellular networks to a minimum. When working at this level, you typically had to use low-level programming languages, such as C, to guarantee minimal resource usage and optimal performance of the device in question. All of this had to be done while ensuring end-to-end security of the solution.
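To give a flavour of what working with “bits and bytes” meant in practice, here is a minimal sketch in Python (chosen for readability; device code of the time would more likely have been written in C). The 4-byte frame layout, the field names and the 0.01 °C scaling are all hypothetical, invented for illustration rather than taken from any real device or SDK.

```python
import struct

# Hypothetical frame layout (an assumption for illustration only):
#   byte 0    : sensor id (unsigned)
#   bytes 1-2 : temperature in hundredths of a degree Celsius (signed, big-endian)
#   byte 3    : checksum = sum of bytes 0-2, modulo 256
FRAME_FORMAT = ">BhB"


def decode_frame(frame: bytes) -> dict:
    """Decode one raw 4-byte frame into a sensor id and a temperature."""
    if len(frame) != struct.calcsize(FRAME_FORMAT):
        raise ValueError("unexpected frame length")
    sensor_id, raw_temp, checksum = struct.unpack(FRAME_FORMAT, frame)
    if sum(frame[:3]) % 256 != checksum:
        raise ValueError("checksum mismatch")
    return {"sensor_id": sensor_id, "temperature_c": raw_temp / 100}


# Example frame: sensor 7 reporting -2.50 degrees C (raw value -250 is 0xFF06).
frame = bytes([0x07, 0xFF, 0x06, (0x07 + 0xFF + 0x06) % 256])
reading = decode_frame(frame)
print(reading)
```

Even in this toy form, the developer has to know the byte order, the signedness and the scaling of every field, which is exactly the kind of device-specific knowledge described above.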
In terms of the processing of the data from an application perspective, the majority of the earlier back-end solutions would be traditionally client-server based, usually written in Java. The diagram below illustrates a typical IoT Sensor application solution deployment.
Most solutions would have required a great deal of bespoke development and customisation, even those based on available IoT vendor application platforms/frameworks. The resultant accumulation of device telemetry data would be considered minuscule by modern standards. The design of these solutions, more often than not, followed a traditional Big Design Up-Front (BDUF) approach. The concept of agility and flexibility was quite alien and, as a result, the data variety and volumes were predetermined at the start of the project. The specialist skillsets, hardware cost and availability meant that there was little room for manoeuvre once the solution implementation was underway, regardless of any key learnings or data discoveries that could have improved the final outcome. Essentially, many projects during this time were hamstrung by factors that limited the value that could be obtained from the data.
Back to the future
Fast forward just a few years and we’re into our modern world of a hyper-connected, always-on digital planet. It seems barely a week goes by without some analyst report predicting an ever-higher number of connected devices. Not everyone agrees on these projections, but we can agree that the number will be very, very large indeed. For example, in 2018 Cisco published the following figure relating to our current rate of data generation:
“2.5 quintillion bytes of data produced every day (that’s 2.5 followed by 18 zeros)”
In terms of the predicted number of devices:
“By the year 2020, the IoT will comprise more than 30 billion connected devices”
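Numbers of this size are hard to picture, so it can help to convert the daily figure of 2.5 quintillion bytes into a per-second rate. The short Python sketch below does only that arithmetic; it assumes decimal terabytes (10^12 bytes) and says nothing about how much of the total originates from IoT devices.

```python
BYTES_PER_DAY = 2.5e18  # 2.5 quintillion bytes, the daily figure quoted above
SECONDS_PER_DAY = 24 * 60 * 60

bytes_per_second = BYTES_PER_DAY / SECONDS_PER_DAY
terabytes_per_second = bytes_per_second / 1e12  # decimal terabytes (10**12 bytes)

print(f"{terabytes_per_second:.1f} TB per second")  # prints "28.9 TB per second"
```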
The landscape has been transformed in so many ways that we now have the opposite problem: too much data to handle. Nobody agrees on future data predictions because the pace of technological progress and innovation in the IoT world is so rapid. More importantly, this progress is showing no signs whatsoever of slowing down. Despite IoT’s complexity, it’s still possible to break down the area into a small number of fundamental building blocks, the core essence of which has remained relatively unchanged over the years. These areas are summarised in the diagram below.
Let’s delve into each of these in more detail to explore the concerns and challenges of data in the design of the Internet of Things.
Devices
The current and future capabilities of our connected devices are far beyond the expectations we had 10 years ago. This has been made possible by relatively recent but rapid advances across the entire IoT ecosystem.
Faster, Better, Cheaper
Previous attempts to standardise the data that devices produced were partly constrained by the devices’ processing and storage capabilities. That is no longer the case. We have an almost limitless choice of low-cost devices available today. These devices are now designed and built with modularity in mind, which enables a wide combination of components to integrate seamlessly to meet a solution’s needs. If we’re still unable to find a particular device combination off-the-shelf to suit our needs, then we can have one designed especially for us, at a fraction of the cost of only a few years ago. These chipsets are now an order of magnitude faster than their predecessors, while also being far more efficient at power management. This means we have the ability to process data on the device at a rate previously thought unimaginable, at a far lower cost and consuming far less power.
Strength in Numbers
As modern devices are far smaller than their predecessors, industrial equipment can now contain multiple devices, each focused on capturing and managing certain aspects of the machine’s behaviour. Each of these devices in turn is capable of interfacing with a multitude of internal and/or external sensors and actuators. Not only does this allow for capture of a wide spectrum of environment and machine data, it also enables redundancy, which is vital when dealing with equipment in hard-to-service remote locations.
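One simple way to benefit from that redundancy is to combine the readings from several sensors that measure the same thing. The following Python sketch is an illustration only: the function name `fuse_readings`, the use of `None` to mark a failed sensor and the choice of the median are all assumptions made for this example, not a prescribed method.

```python
from statistics import median
from typing import Optional, Sequence


def fuse_readings(readings: Sequence[Optional[float]]) -> Optional[float]:
    """Combine readings from redundant sensors into one value.

    None marks a sensor that failed to report. The median of the
    remaining values is used so that a single wildly wrong sensor
    does not drag the result with it.
    """
    valid = [r for r in readings if r is not None]
    if not valid:
        return None  # every sensor is down; nothing to report
    return median(valid)


# Three hypothetical temperature sensors on one machine.
print(fuse_readings([21.0, None, 22.0]))  # one sensor failed outright: 21.5
print(fuse_readings([21.0, 85.0, 22.0]))  # one sensor stuck at a bad value: 22.0
```

The machine keeps reporting a plausible value when one sensor fails or misbehaves, which buys time before anyone has to travel to a remote site.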
Built to Talk
Traditionally, embedded devices would require connections (usually wired) to machine-to-machine (M2M) gateways in order to forward sensor data to a central server. Advances in modern electronic design and manufacturing now enable many devices to have a choice of network communications built right into the silicon, in device designs such as System-On-Chip (SoC) and System-On-Module (SoM). It’s far more commercially viable to provide connectivity as part of the core chipset, thus increasing the opportunities for device connectivity. Even if this isn’t required from day one, it can be enabled by software at a later date when needed.
The combined result of having devices with all the capabilities described above is a machine capable of continuously producing enormous amounts of data in near real-time for little cost. This means you can install connected devices almost anywhere you might need them.
Data Connectivity
Ubiquitous data from devices assumes ubiquitous connectivity. The availability of affordable, high-speed and reliable cellular networks has revolutionised the connectivity capabilities of IoT. We can now deploy devices in far-flung locations and (if designed accordingly) provide remote maintenance and diagnostics of the equipment without the need for an expensive visit by a field service engineer. However, these benefits don’t come free of charge. The cost of adding a cellular modem, combined with the associated connectivity data charges, can be significant, particularly for organisations with large numbers of devices to manage. As a result, many organisations have been looking for viable alternatives to cellular connectivity, particularly in scenarios where cellular is considered overkill for the minuscule amounts of data needed.
New Kids on The Block
The need for alternatives to the traditional cellular connections has resulted in some exciting innovations. We now have viable commercial alternatives to cellular that are not only cheaper (free in some cases) but also more efficient. LPWANs (Low Power Wide Area Networks) such as LoRaWAN and Sigfox are gaining significant traction in the marketplace. Grassroots open IoT networks, such as The Things Network, have emerged rapidly following the potential offered by LPWAN protocols such as LoRaWAN and the associated low-cost chipsets from Semtech. We now have many more options when selecting the most appropriate network technology and provider for our technical and commercial requirements. This also means we have more to learn about how to handle the data, but luckily these networks are being built with this in mind. The result is that we have removed yet another set of constraints in consuming IoT data, which means considerably more data delivered on demand.
Despite all the interest in emerging LPWAN standards such as LoRaWAN, Sigfox and NB-IoT, it’s important to remember that we still have an extensive toolbox of pre-existing connectivity mechanisms at our disposal to incorporate into IoT solutions. These should be considered complementary to standards such as LPWAN. The table opposite provides a high-level summary of the key connectivity options available.
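Whichever network is chosen, the size of each message matters, whether because of cellular data charges or because low-power links favour very small payloads. The Python sketch below compares a verbose JSON encoding of a single reading with a compact binary one. The reading, its field names and the packing layout are hypothetical, chosen for this example and not drawn from any LPWAN specification.

```python
import json
import struct

# A hypothetical reading; the field names and scaling are assumptions
# made for this sketch, not part of any LPWAN specification.
reading = {"device_id": 1042, "temperature_c": 21.37, "battery_pct": 88}

# Verbose encoding: convenient, but every key name travels over the air.
json_payload = json.dumps(reading).encode("utf-8")

# Compact encoding: unsigned 16-bit id, signed 16-bit temperature in
# hundredths of a degree, unsigned 8-bit battery level (big-endian).
binary_payload = struct.pack(
    ">HhB",
    reading["device_id"],
    round(reading["temperature_c"] * 100),
    reading["battery_pct"],
)

print(len(json_payload), "bytes as JSON")
print(len(binary_payload), "bytes as packed binary")
```

The packed form is 5 bytes, a small fraction of the JSON size. The trade-off is that both ends must agree on the layout in advance, which is the same discipline the “olden days” demanded.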
In Part II of this three-part series we will focus on the data processing and data storage aspects of the Internet of Things.
Disclaimer: The statements and opinions expressed in this article are those of the author(s) and do not necessarily reflect the positions of Thoughtworks.