The Root-Reliant Bluetooth Mesh: GilgaMesh
Published: April 06, 2017
This is the final article in our three-part series looking at Bluetooth Meshes, inspired by the challenge of improving data collection from conference attendees.
In “Introduction to Bluetooth Meshes”, we learned about Bluetooth connection protocols and designed a simple meshing algorithm called a conforming mesh. Next, in “Bluetooth Mesh Structures”, we explained the hazardous nature of loops and added a loop-avoidance strategy to our existing algorithm.
In this piece, we delve into the specific properties of loopless meshes. We will ultimately propose an alternative to the conforming mesh that provides built-in solutions for some common issues with mesh formation. Finally, we introduce the real-world open source implementation of our mesh, GilgaMesh.
Graph theory defines a connected graph as one where there exists a path between any two vertices, meaning all vertices are reachable. We can infer then that any loopless connected graph with n vertices (or, in our case, devices) must contain n-1 edges (or connections).
Every Bluetooth connection requires one device to play the central role and the other to play the peripheral role. Since peripherals can only have one central, each device in the mesh can only act as a peripheral for at most one of its connections. We can infer, then, that there are n-1 devices in the mesh currently performing the peripheral role, regardless of whether or not it is also playing the central role.
Therefore, in a loopless connected mesh, there is always one—and only one device at any given time—without a peripheral-role connection, acting purely as a central. In a network graph visualization of the mesh, the nodes will appear to form a tree, with the non-peripheral device at the root of the tree. This device is called the root node of the mesh.
But in our project, the existence of a root node posed a problem.
We’d deployed our conforming mesh framework onto our custom-built programmable devices—those with BLE-capable nRF51 chips—and successfully creating a stable network. We then attempted to introduce a tablet into the mesh as a “gateway” device to aggregate data and display diagnostic information. We ensured that the tablet could only act as a central—it could only scan and not advertise— to directly control the connection.
We discovered that connecting the tablet was much harder than we expected. It would often take several rounds of rebooting devices, essentially reforming the mesh, to get the tablet to connect.
The reason? Since the only device that is advertising itself in a conforming mesh is the root node, a new central’s only possible point of entry into the mesh is through the root node. If by chance the new device is not within the root node’s advertising range, it will be unable to establish a Bluetooth connection and tap into the network.
In a mesh with a small geographical footprint where the vast majority of devices are within range of each other, that might not be a problem. Once the mesh covers more space and devices, it becomes increasingly difficult to add a new central device.
Through brainstorming, we realized something important about the data passed through the network: the individual devices never really needed to share it directly with one another. The only function of passing data from device to device was to ultimately to pass data to and from the gateway tablet.
We discovered that we actually did not need a stable working mesh unless the gateway was in range. Each device could function on its own just fine without communicating to others, and would store up its data until a working connection was established.
It would be so much easier if, instead of attempting to join an already-existing mesh, the gateway could trigger the creation of the mesh—and establish its place as the root node—simply by being within range of any device. This led us to try a new connection algorithm: the root-reliant mesh.
In our scenario, using root-reliance means that our custom hardware will remain unconnected until our tablet enters into Bluetooth range and initiates connections with the custom devices. At this point, a mesh will form. When the tablet disappears, so will the mesh.
The responsibilities and functionality of the mesh devices—and the process of forming the mesh—are now encapsulated as follows:
A loop initially requires a series of dual-role devices to form a chain of connections with each other and is closed when any device in the chain performs the central role and connects to the one device in the chain without a peripheral-role connection—the root node of that cluster. But in a root-reliant mesh, the root node is a central-only device and cannot perform the peripheral role. It can therefore never participate in a loop.
The other dual-role devices in the mesh will never sustain their own separate loops. They are forced to join the main cluster as peripherals, using their single peripheral-role connection slot first, before creating any additional connections. We now have a guaranteed loopless mesh without needing to add any additional loop-preventing logic, such as the “mesh ID” strategy introduced in the previous article.
A further advantage is that the dedicated root node decreases data traffic over the mesh. If one of your devices is being used as a gateway, or has any other special properties such as extra processing power, it can be useful to direct particular data packets to that device—rather than sending them to every single node in the mesh.
For example, our project incorporates “heartbeat” messages generated by each device. These contain information about current status—useless to other members of the mesh but can be used by the gateway to build and display a real-time dashboard.
Directed messages like these will reach their destination eventually when spread throughout the entire mesh, but traffic can be reduced if the destination device is known to be the root node. To send data to the root, messages only need to flow “up” through the mesh’s tree structure. This flows from peripheral to central—through the single peripheral-role connection maintained by each device—until reaching the root node. Data travels via a single pathway to reach its destination, with no extra mesh structure knowledge needed.
A further benefit of root-reliance is decreased power consumption. By controlling the root node’s availability, the mesh exists only when needed. If your devices often require and act on information from each other, this is less useful. But if the vast majority of traffic involves data flowing from an individual device to a gateway, or vice versa, it makes sense to only create a mesh when the gateway is actually available.
Devices that are not required to constantly maintain unnecessary Bluetooth connections will display increased efficiency and battery life. With the root-reliant algorithm, we found our resulting mesh to be extremely efficient, stable, reliable, and battery-friendly.
In “Introduction to Bluetooth Meshes”, we learned about Bluetooth connection protocols and designed a simple meshing algorithm called a conforming mesh. Next, in “Bluetooth Mesh Structures”, we explained the hazardous nature of loops and added a loop-avoidance strategy to our existing algorithm.
In this piece, we delve into the specific properties of loopless meshes. We will ultimately propose an alternative to the conforming mesh that provides built-in solutions for some common issues with mesh formation. Finally, we introduce the real-world open source implementation of our mesh, GilgaMesh.
The Root Node
Previously, we demonstrated how to prevent loop formation in the most recent iteration of our mesh algorithm, but it’s not over yet. Now that our mesh is free of loops, what else can we deduce about its structure? Let’s review our facts.Graph theory defines a connected graph as one where there exists a path between any two vertices, meaning all vertices are reachable. We can infer then that any loopless connected graph with n vertices (or, in our case, devices) must contain n-1 edges (or connections).
Every Bluetooth connection requires one device to play the central role and the other to play the peripheral role. Since peripherals can only have one central, each device in the mesh can only act as a peripheral for at most one of its connections. We can infer, then, that there are n-1 devices in the mesh currently performing the peripheral role, regardless of whether or not it is also playing the central role.
Therefore, in a loopless connected mesh, there is always one—and only one device at any given time—without a peripheral-role connection, acting purely as a central. In a network graph visualization of the mesh, the nodes will appear to form a tree, with the non-peripheral device at the root of the tree. This device is called the root node of the mesh.
Image 1: A connected loopless mesh that displays the position of the root node.
The existence of a single root node in a mesh has some interesting implications. As it is the only device without a central connection, the root node is the only device in a mesh that is still able to actively advertise itself.
On the whole, this doesn’t create issues when adding more dual-role devices to the mesh. Those devices can either scan for peripherals and connect to the root node as a central—thereby becoming the new root mode; or they can advertise themselves and accept a connection request from any other scanning node in the mesh, becoming a peripheral.But in our project, the existence of a root node posed a problem.
We’d deployed our conforming mesh framework onto our custom-built programmable devices—those with BLE-capable nRF51 chips—and successfully creating a stable network. We then attempted to introduce a tablet into the mesh as a “gateway” device to aggregate data and display diagnostic information. We ensured that the tablet could only act as a central—it could only scan and not advertise— to directly control the connection.
We discovered that connecting the tablet was much harder than we expected. It would often take several rounds of rebooting devices, essentially reforming the mesh, to get the tablet to connect.
The reason? Since the only device that is advertising itself in a conforming mesh is the root node, a new central’s only possible point of entry into the mesh is through the root node. If by chance the new device is not within the root node’s advertising range, it will be unable to establish a Bluetooth connection and tap into the network.
In a mesh with a small geographical footprint where the vast majority of devices are within range of each other, that might not be a problem. Once the mesh covers more space and devices, it becomes increasingly difficult to add a new central device.
Image 2: A tablet scans for advertisements, but is too far away from the advertising range of the root node (which is the only actively advertising device) and therefore unable to connect to the mesh.
This was the unfortunate position we found ourselves in with our mesh application. Our most common scenario involved a set of a dozen or so devices spread out across a large conference room. The tablet would enter into Bluetooth range of the devices after they had already formed a stable mesh with one another. Frequently, the tablet happened to be outside the advertising range of the one existing root node and simply wouldn't connect. This was unreliable and inefficient. We needed an alternative solution.Through brainstorming, we realized something important about the data passed through the network: the individual devices never really needed to share it directly with one another. The only function of passing data from device to device was to ultimately to pass data to and from the gateway tablet.
We discovered that we actually did not need a stable working mesh unless the gateway was in range. Each device could function on its own just fine without communicating to others, and would store up its data until a working connection was established.
It would be so much easier if, instead of attempting to join an already-existing mesh, the gateway could trigger the creation of the mesh—and establish its place as the root node—simply by being within range of any device. This led us to try a new connection algorithm: the root-reliant mesh.
Root-Reliant Mesh Logic
The principle behind a root-reliant mesh is that the devices in the mesh will not connect to one another—or even recognize one another—until a dedicated root node is present. This root node could be a gateway or other special device, like a tablet or smartphone. Alternatively, it could be identical to the other devices in the mesh but programmed to act only as a central.In our scenario, using root-reliance means that our custom hardware will remain unconnected until our tablet enters into Bluetooth range and initiates connections with the custom devices. At this point, a mesh will form. When the tablet disappears, so will the mesh.
The responsibilities and functionality of the mesh devices—and the process of forming the mesh—are now encapsulated as follows:
- The default behavior of all devices, excluding the root node, is to advertise for connections as potential peripherals. The devices are not scanning, so no connections form yet.
- The default behavior of the root node is to scan for advertisements as a potential central. The root node itself is not advertising.
- When the root node appears within range of one or more mesh devices, it picks up the device advertisement packet and initiates a connection. Because the root node is the initiator, it becomes the central device for any and all connections it forms. The connected devices serve as peripherals.
- Once a mesh device is successfully connected to the root node as a peripheral, it stops advertising—because its one central connection slot is taken—and starts scanning.
- The scanning devices that are part of the forming mesh pick up advertisements from nearby devices and connect to them as centrals. The newly-connected devices now cease advertising and begin scanning, just as in the previous step.
- This process repeats until all in-range devices are part of the mesh.
Image 3: The formation of a root-reliant mesh, where connections propagate outwards from the gateway device
To disband the mesh, a similar chain of events occurs, but with disconnections:- When the root node is turned off or travels out of range of the mesh, its connections are severed. If it is still on, it continues to scan for available devices.
- The devices that were directly connected to the root node (one “hop” away) have now lost their connected central. In response, they immediately disconnect from all peripherals. They also stop scanning for more peripherals and start advertising again to accept any incoming connections from centrals.
- The previous step repeats for devices that were two hops away from the root node and subsequently for every device in the mesh, until they have all disconnected from each other. All devices are now back to their default state; the root node is scanning and every other device is advertising.
Image 4: The disbanding of a root-reliant mesh, instigated by the disconnection of the gateway device from the mesh.
A nested version of the disconnection algorithm also applies if any individual node in the mesh loses a connection or stops working altogether. Any isolated mesh clusters that appear due to disconnections are quickly disbanded, because any device without an active peripheral-role connection—which is now one device per cluster—will drop all of its other connections, stop scanning and resume advertising. This behavior propagates through the separated clusters until all affected devices are disconnected and advertising. They can then rejoin the main mesh through the connection algorithm, as illustrated below.Image 5: The healing of a mesh after the death of one of its members.
We now have a collection of devices that form a completed mesh only when the gateway device is in range and disassemble the mesh as soon as the gateway is unreachable. In other words: the entire mesh relies on the presence of the gateway, which has claimed the “root node” role simply by virtue of not advertising for a central connection. Advantages to Root-Reliance
The root-reliant mesh structure has some powerful benefits. One major point in its favor is that it is impossible for loops to form when using this algorithm.A loop initially requires a series of dual-role devices to form a chain of connections with each other and is closed when any device in the chain performs the central role and connects to the one device in the chain without a peripheral-role connection—the root node of that cluster. But in a root-reliant mesh, the root node is a central-only device and cannot perform the peripheral role. It can therefore never participate in a loop.
The other dual-role devices in the mesh will never sustain their own separate loops. They are forced to join the main cluster as peripherals, using their single peripheral-role connection slot first, before creating any additional connections. We now have a guaranteed loopless mesh without needing to add any additional loop-preventing logic, such as the “mesh ID” strategy introduced in the previous article.
A further advantage is that the dedicated root node decreases data traffic over the mesh. If one of your devices is being used as a gateway, or has any other special properties such as extra processing power, it can be useful to direct particular data packets to that device—rather than sending them to every single node in the mesh.
For example, our project incorporates “heartbeat” messages generated by each device. These contain information about current status—useless to other members of the mesh but can be used by the gateway to build and display a real-time dashboard.
Directed messages like these will reach their destination eventually when spread throughout the entire mesh, but traffic can be reduced if the destination device is known to be the root node. To send data to the root, messages only need to flow “up” through the mesh’s tree structure. This flows from peripheral to central—through the single peripheral-role connection maintained by each device—until reaching the root node. Data travels via a single pathway to reach its destination, with no extra mesh structure knowledge needed.
A further benefit of root-reliance is decreased power consumption. By controlling the root node’s availability, the mesh exists only when needed. If your devices often require and act on information from each other, this is less useful. But if the vast majority of traffic involves data flowing from an individual device to a gateway, or vice versa, it makes sense to only create a mesh when the gateway is actually available.
Devices that are not required to constantly maintain unnecessary Bluetooth connections will display increased efficiency and battery life. With the root-reliant algorithm, we found our resulting mesh to be extremely efficient, stable, reliable, and battery-friendly.
The Future of Bluetooth Meshes
At the end of this project, our team created GilgaMesh, an open-source mesh framework that uses the root-reliant mesh strategy and is compatible with the nRF51/52 Nordic System-on-Chip series of devices. We hope that the GilgaMesh framework—and this series of articles—will encourage and inspire more experimentation with Bluetooth. We invite you to look at and contribute to this OS mesh. And please reach out through the comments section if you have more questions.Disclaimer: The statements and opinions expressed in this article are those of the author(s) and do not necessarily reflect the positions of Thoughtworks.