As a tester, observability — the ability to understand what’s happening inside a system — is incredibly useful. While it might be best known as something typically part of the world of reliability engineering, from a testing perspective observability makes it easier for you to learn about and explore a complex system. In turn, this puts you — and your team — in a better position to improve a system’s quality (its security, reliability, and performance) than ever before.
I learned this the hard way, like many others probably do: by getting lost in a product’s complexity. Although it’s true that this isn’t particularly uncommon if you’re a tester, as I tried to get to grips with one particular product — through its documentation and conversations with stakeholders — it became clear that the puzzle pieces of what we were working on weren’t really fitting together. I didn’t know the terminology at the time, but looking back it’s clear that the system was far from observable; it was almost impossible to understand exactly what was happening inside the application.
Symptoms of a lack of observability
The symptoms of this lack of observability could be seen in the day-to-day practices of engineers. Whenever we had a production issue, for example (there was at least one issue or incident every single day), the debugging carried out by the developers would often reach a dead end and tickets would sit blocked in Jira. This was because they lacked the contextual information and detail needed to determine and understand the root cause of a given issue.
Sometimes, to overcome these blocks, developers would use existing logs. While this might make sense as a very basic heuristic (and sometimes as a developer you’ve just got to do what you’ve got to do!), this was in fact incredibly limiting because of how the logs were stored and accessed. Not only did we have to access the logs for each service separately, we could only open them in Notepad++. Developers would end up with multiple Notepad++ instances open while debugging different services, and the only way to search through the logs was manually. (CTRL+F certainly isn’t a replacement for true observability.)
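To make the contrast concrete, here’s a minimal sketch in Python of what we were missing. The field names and the log line itself are hypothetical, but they illustrate the point: once log records are structured (for example as JSON) and collected in one place, “searching” becomes a query you can express precisely, instead of CTRL+F across a dozen editor windows.

```python
import json

# Hypothetical example: the same event as an unstructured line
# (what we had) versus a structured JSON record (what we needed).
plain_line = "2023-04-01 10:32:07 ERROR payment failed for order 1234"

structured_line = json.dumps({
    "timestamp": "2023-04-01T10:32:07Z",
    "level": "ERROR",
    "service": "payments",
    "event": "payment_failed",
    "order_id": 1234,
})

def find_errors(log_lines, service):
    """Return all ERROR records emitted by a given service."""
    records = (json.loads(line) for line in log_lines)
    return [r for r in records
            if r["level"] == "ERROR" and r["service"] == service]

# With structure, filtering by service and severity is one expression:
errors = find_errors([structured_line], "payments")
```

Real log aggregation tools do far more than this, of course, but even this toy version shows why structure matters: the question “which payment errors happened, and for which orders?” becomes answerable in seconds.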
The consequences of all this were significant. Not only did it mean that developers often felt frustrated, from a business perspective it was immensely difficult to get transparency on how and when certain critical issues could be fixed. This had the potential to slowly erode the organization’s reputation with customers.
Observability is multi-dimensional
Looking back, it’s easy to see how observability was overlooked on the product. The fact, for example, that developers were using logs to debug may have given the impression that certain practices were happening and were effective.
In truth, this story underlines exactly why observability needs to be understood as something distinct from monitoring. While monitoring is useful, it only allows you to see a system or application in one dimension. One nice summary that's often used is that monitoring simply tells us whether a system works, while observability helps us understand why it isn't working and how we can go about fixing it. The developers were essentially using logs that could tell you whether parts of the system were working, but they had little more detail or insight than that. It was, to return to my analogy, like trying to piece together a puzzle without really knowing what the final thing should look like.
Testing and observability
From a tester’s perspective, there’s no replacement for the level of detail that a truly observable system can provide. Although on a practical level observability has three pillars — logs (a record of an event that has happened inside a system), metrics (a value that reflects some particular behavior inside a system), and traces (a low-level record of how something has moved inside a system) — it is also more than those three elements. “It’s not about logs, metrics, or traces,” software engineer Cindy Sridharan writes in Distributed Systems Observability, “but about being data-driven during debugging and using the feedback to iterate on and improve the product.” In other words, to do observability well, you not only need effective metrics, well-structured logs, and extensive tracing. You also need a mindset that is inquisitive, exploratory and eager to learn and the processes that can make all of those things meaningful and impactful.
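To see how the three pillars relate, here’s a deliberately simplified sketch. Everything in it is hypothetical — a real system would use instrumentation such as OpenTelemetry rather than in-memory lists — but it shows the essential idea: a single identifier can tie together the log of what happened, the metric that counts it, and the trace of how long it took.

```python
import time
import uuid
from collections import Counter

# Toy stand-ins for the three pillars (not a real observability stack):
metrics = Counter()   # metrics: values reflecting behavior inside the system
logs = []             # logs: records of events that have happened
traces = []           # traces: records of how something moved through the system

def handle_checkout(order_id):
    """A hypothetical request handler instrumented with all three pillars."""
    trace_id = uuid.uuid4().hex   # one ID ties log, metric context, and span together
    start = time.monotonic()

    logs.append({"trace_id": trace_id,
                 "event": "checkout_started",
                 "order_id": order_id})
    metrics["checkout_requests"] += 1

    # ... business logic would run here ...

    traces.append({"trace_id": trace_id,
                   "span": "handle_checkout",
                   "duration_s": time.monotonic() - start})

handle_checkout(order_id=42)
```

The design point is the shared `trace_id`: it is what lets you pivot from a suspicious metric to the relevant log lines to the exact request path — the “data-driven debugging” Sridharan describes.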
This makes testers and observability natural allies. Testing is, after all, about asking questions of a system or application, being curious about how something works or, often, how something should work; observability is very much about all of those things. It’s too bad, then, that too many testers are unaware of observability — not only will it help them do their job more effectively, they’re also exactly the sort of people in the software development lifecycle who can evangelize for building observable systems.
To keep things simple, there are two key ways observability can help testers:
It makes it easier for testers to uncover more — and better — information about system issues. This is particularly true during exploratory testing when you might find an unexpected behavior inside the system. Observability can help you to dig deeper and find out what the cause might be. Sometimes you won’t find a solution; but with observability tooling that can give you a detailed insight into logs, metrics, and traces, you will be better placed to share more information with developers, making it easier to collaborate and work together to find a solution.
Observability can help testers ask questions and explore a system in a creative way. Testers are typically curious. They like to explore and are great at asking questions. With observability tools, they can explore a product in great depth and detail. This allows them to uncover valuable information that can then guide their decision making when testing.
From a tester’s perspective, there’s no replacement for the level of detail that a truly observable system can provide.
Getting started with observability
So, observability can help testers in a number of different ways. But where should you begin? If you’re already in a team that has embraced observability, with exemplary instrumentation in place, then good for you! But if you’re not quite there yet, you can start by asking certain questions that can at the very least open up a conversation about observability.
This might start with some fundamental questions a QA should ask when preparing to test a new feature:
- Do we have any logs? If so, in what format? And how are they structured?
- Do we need any alerts once this feature goes live? Who should be receiving them?
- Do we need any dashboards for capturing business metrics?
- Have we implemented tracing? Is it extensive enough to explain the flow of tests through different services?
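The first of those questions can lead directly to a concrete testing activity: checking that the logs a new feature emits actually follow the structure the team agreed on. The sketch below is hypothetical — the required field names are an assumed team convention, not a standard — but a small check like this can turn “do we have logs, and how are they structured?” into something verifiable.

```python
import json

# Assumed team convention for structured log records (hypothetical):
REQUIRED_FIELDS = {"timestamp", "level", "service", "event"}

def validate_log_record(raw_line):
    """Return a list of problems with one JSON log line (empty list = OK)."""
    try:
        record = json.loads(raw_line)
    except json.JSONDecodeError:
        return ["not valid JSON"]
    missing = REQUIRED_FIELDS - record.keys()
    if missing:
        return [f"missing fields: {sorted(missing)}"]
    return []

good = ('{"timestamp": "2023-04-01T10:32:07Z", "level": "INFO", '
        '"service": "payments", "event": "feature_enabled"}')
bad = '{"level": "INFO"}'
```

A check like this can sit in a test suite or be run ad hoc against captured log output while exploring a new feature.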
But beyond that, we also need to be sensitive to the reality that there will be things we don’t know we don’t know. This, really, is where we begin to move into the sphere of observability: when we address unknown unknowns.
“Plenty of tools are terrific at helping you ask the questions you could predict wanting to ask in advance,” Charity Majors, CTO of observability product Honeycomb, wrote back in 2020. “That’s the easy part… But if you *can’t* predict all the questions you’ll need to ask in advance, or if you *don’t* know what you’re looking for, then you’re in o11y territory.” When Majors puts it like that, it’s pretty easy to see how observability can benefit testers: as a QA, the very reason you’re doing the work you do is that you can’t predict how software systems might behave.
This is where we begin to ask questions like:
- How easy would it be for us to find out what’s happened if something goes wrong post-deployment?
- Can I understand the current and previous states of the system just by asking questions from outside the system?
Answering those questions isn’t just important from a testing perspective, it’s critical for everyone.
Making our software and our work better
Observability is a concept that helps us tackle complexity. It should come as no surprise, then, that the idea behind it is actually very simple: the more we know about our application, the more we can improve its quality and reliability.
Comprehensive observability will not only help us improve a system early in the software development lifecycle (SDLC) and prevent defects from reaching production, it can also help us pinpoint and resolve issues in production too. That's all pretty standard stuff; where things start to get really cool is when we begin to use what we've learned from observability to feed back into the continuous improvement of the system during the next development cycle. This helps us to shift even further left; once we do that, we’re not simply catching individual bugs and errors, we’re able to prevent whole classes of issues reaching prod.
Clearly such a way of thinking can benefit everyone. But insofar as testers are critical in ensuring the quality of software, observability is particularly useful — it can help us better explore and understand complex systems in ways that are easy to overlook during the development process. That means we need to not only embrace it, but also evangelize for it in our teams and organizations. We need to show people how observability can help make our software and our work better.
Disclaimer: The statements and opinions expressed in this article are those of the author(s) and do not necessarily reflect the positions of Thoughtworks.