There appears to be no end of possible use cases for generative AI. While this is exciting, it can also be overwhelming. This means teams need to be intentional in how they use the technology: it’s vital to ask where generative AI can make a meaningful impact on your team's work.
One intriguing application in software engineering is requirements analysis. It’s a challenging but often overlooked step in the process which, if not done well, can have many negative downstream consequences.
This article describes a pilot we ran with one of our clients, where a team tested the hypothesis that using GenAI to create high-quality user stories can lead to shorter lead times and higher quality in requirements analysis. In this case study we explain what we did and what we found out.
Approach
Defining scope and goals
After picking this team for the pilot, we ran a workshop with them to identify which tasks could be supported with AI, and to define what impact we could expect AI to have. The workshop accomplished two main steps:
1. Find tasks suitable for meaningful AI support
The team discussed which tasks they do frequently and/or find particularly painful or difficult. They then picked a subset of tasks with high value and high feasibility for AI assistance. One of the tasks picked was requirements analysis, because the team works in a relatively complex domain and often has to do rework later in the development process due to missed edge cases or misunderstood requirements.
2. Define hypotheses and expected outcomes
In the second step of the workshop, the team defined the goals they expected to achieve by using AI. Here is the hypothesis for requirements analysis:
| We believe that using GenAI to assist with... | ...will result in... | We will know that it is valuable when... | Risks to monitor |
| --- | --- | --- | --- |
| Writing epics and stories | | | |
Implementation
We used an accelerator from our service toolkit to help implement the AI assistance. The Haiven™ team assistant is an accelerator we use with our clients as a lean way to pilot AI assistance for software delivery teams. In this case, it gave users AI capabilities that combine reusable prompts with contextual information about their domain and architecture.
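To make that mechanism concrete, here is a minimal sketch of the idea rather than Haiven’s actual implementation: a reusable prompt template is combined with stored context descriptions, so that domain and architecture details don’t have to be retyped in every chat. The context keys, prompt wording and example epic are all invented for illustration.

```python
# A minimal sketch of the mechanism, not Haiven's actual implementation:
# reusable prompt templates are combined with stored context descriptions,
# so users don't have to repeat domain and architecture details in every chat.

from string import Template

# Hand-curated context snippets the team writes once and reuses (placeholders here).
CONTEXTS = {
    "domain": "Glossary, business rules and domain language would go here.",
    "architecture": "High-level description of services, data flows and constraints.",
}

# A reusable prompt template for breaking an epic down into user stories.
STORY_BREAKDOWN = Template(
    "You are assisting a business analyst.\n\n"
    "Domain context:\n$domain\n\n"
    "Architecture context:\n$architecture\n\n"
    "Break the following epic into user stories with acceptance criteria, "
    "calling out edge cases and open questions:\n$epic"
)


def build_prompt(epic_description: str) -> str:
    """Pull the stored context into the prompt for a single conversation."""
    return STORY_BREAKDOWN.substitute(epic=epic_description, **CONTEXTS)


if __name__ == "__main__":
    # The assembled prompt would be sent to whichever LLM the team uses.
    print(build_prompt("Allow customers to amend an order after checkout."))
```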
The team’s business analyst (BA) and quality analyst (QA) were the main users of the tool. They’re both experienced in their craft and have been working on this particular team for quite a while. During this pilot, they used the tool to break down the requirements of three new epics into user stories. Each epic was about building additional capability for an existing feature.
Learnings
Context is key!
One of the key learnings was just how much context the team needed to provide for the AI to be useful. Haiven was helpful here, as it allowed users to define reusable context descriptions that could then be pulled into each conversation with the AI. This meant they didn’t have to repeat the same contextual information in every single interaction.
As we’ve mentioned already, this team was working in a relatively complex domain, and the epics were about expanding the capabilities of an existing feature. So, initially, they spent some time describing the domain and architecture for the AI in a way that could be reused every time they asked it for assistance. The resulting context gave a general description of the logic and domain language, and also specified how the current feature actually works, so that the AI could help with expanding it.
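As an illustration of what such a reusable description might cover, here is an invented outline; none of the content comes from the client’s actual domain:

```python
# Illustrative structure of a reusable context description; the real content
# was specific to the client's domain and is not reproduced here.

FEATURE_CONTEXT = """\
Domain language:
- Key terms the team uses and what each of them means precisely.

Business rules:
- Invariants and policies that any new requirement must respect.

Current feature behaviour:
- How the existing feature works end to end, including known edge cases,
  so the AI can reason about extensions rather than greenfield design.
"""
```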
Creating this context significantly improved the results they got, but it also showed that an initial investment was necessary to make the AI useful in their situation. As with coding assistants, it’s harder to use AI for changing existing requirements than for designing brand new features from scratch.
Users need time to get used to AI assistance
Initially, users struggled with how to interact effectively with the AI. Discovering the non-determinism of LLM responses, and understanding its implications, involved a learning curve. Over time, users adjusted their expectations and became more comfortable treating the AI as an assistant, rather than as a piece of software that gives them perfect artifacts. They also learned how to ask the AI to course-correct in a chat conversation when they found that initial outputs were incorrect or flawed.
Developers often report “review fatigue” when working with coding assistants, so we also asked the BA and QA about their experience when reviewing AI outputs. They reported that it wasn’t too cumbersome to review the scenarios, at least not with their level of experience.
It’s hard to measure impact quantitatively
We found that it’s even harder to measure the impact of AI in this area than it is for coding assistants.
These tasks aren’t done as frequently as coding, so improving them doesn’t have the kind of impact on a team’s overall metrics that can be isolated and measured separately.
Epics are even less comparable units of work than stories or technical tasks; this means it’s hard to compare them with historical data.
One indicator of requirements analysis quality is the number of times stories get blocked for clarification, or bounce back and forth in a team’s workflow because they are incomplete or unclear. This type of data is usually not tracked at a fine-grained level, as that would make processes and task boards very complicated.
However, just because something can’t be measured quantitatively doesn’t mean it’s not valuable! The following observations about quality and speed are based on the estimates of the AI users in this case.
Impact on quality and team flow
To reiterate, part of the hypothesis was that using AI for requirements analysis would lead to shorter lead times, reduce rework and result in fewer stories being blocked for further clarification.
The BA reported that they could go into discussions with developers with greater confidence, because their preparation was more effective and comprehensive thanks to the AI assistant. They were able to answer the questions that came from developers in the estimation session, and didn’t have to go through another round of filling gaps in the analysis.
The QA found that once the context was well-defined, the AI-generated acceptance criteria and testing scenarios were better than what they could have produced by themselves. They estimated that when they started testing the developers’ work, they found ~10% fewer bugs and reasons for rework than usual, because edge case scenarios were better covered in the story definitions.
Impact on analysis speed
While the sample size of three epics makes it challenging to draw any definitive conclusions, the team estimates that analysis time was reduced by ~20%, despite the time required to create the context. The expectation is that as context creation becomes more streamlined and contexts are reused, time savings will become more significant in the future.
Conclusion and outlook
In summary, this case study shows that AI can bring benefits to quality, speed and overall team flow. In this particular organization, the approach will next be used on a few more teams, with different experience levels and different processes, to see if the learnings can be reproduced, and if further gains in effectiveness can be made.
This, as well as other experiences of using AI on software teams, confirmed to us once more that context orchestration is key in this space.
The team described here works in a relatively complex and unusual domain, and they found the AI wasn’t really helpful until they fed it more elaborate descriptions of that domain context. This barrier is much lower for teams that work in commonplace domains like e-commerce or customer data management, as the models’ training data can usually cover a wide range of use cases in those domains without extra nudging.
Haiven supports semi-manual curation and reuse of context descriptions: the team described the domain as part of the prompts, and indexed the relevant parts of their wiki documentation for the assistant application. While this requires some effort to set up, it’s nevertheless a lean and easy way to explore AI’s potential in this space. However, we’re closely monitoring the software tools market for innovation in more automated and intelligent context orchestration, so we can best advise our clients on how to get the most out of AI on their software teams.
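To give a rough sense of what the wiki-indexing half of such a setup involves, here is a deliberately simplified sketch: a toy word-overlap score stands in for the embedding-based retrieval a real assistant would use, and the page names and contents are invented.

```python
# A very simplified sketch of indexing wiki documentation for an assistant.
# Real tools would use an embedding model and a vector store; here a toy
# token-overlap score stands in for semantic similarity, and all page
# content is invented for illustration.

import re
from collections import Counter

WIKI_PAGES = {
    "ordering-overview": "How orders are created, validated and persisted ...",
    "payment-rules": "Business rules for payment authorisation and retries ...",
    "returns-process": "How returns and refunds flow through the system ...",
}


def tokens(text: str) -> Counter:
    return Counter(re.findall(r"[a-z]+", text.lower()))


def score(query: str, page_text: str) -> int:
    # Toy relevance score: number of overlapping word occurrences.
    return sum((tokens(query) & tokens(page_text)).values())


def retrieve(query: str, k: int = 2) -> list[str]:
    """Return the names of the k most relevant wiki pages for a question."""
    ranked = sorted(WIKI_PAGES, key=lambda name: score(query, WIKI_PAGES[name]), reverse=True)
    return ranked[:k]


if __name__ == "__main__":
    question = "What rules apply when a payment needs to be retried?"
    # The retrieved pages would be added to the prompt as context,
    # alongside the hand-curated domain description.
    print("Context pages to include:", retrieve(question))
```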
A codebase contains the ultimate truth about how an application works; it’s always more reliable than documentation or descriptions that are potentially out of date or inaccurate. Beyond the scope of this case study, there are interesting and powerful ways we have used with clients to provide AI with context about a codebase, letting users ask questions about that codebase without needing to understand or browse the code. The team didn’t use such a tool in this instance, but the potential is obvious, especially for specifying changes to existing features.
Disclaimer: The statements and opinions expressed in this article are those of the author(s) and do not necessarily reflect the positions of Thoughtworks.