Advocating for software quality at METRO
Published: August 15, 2017
Technical change on a big software development project can be a smart business choice, but it's hard. You’re always working against the inertia of the ever-expanding existing codebase and have to make tough decisions about improving what's there versus delivering new functionality. METRO Cash & Carry is a membership-based, self-service wholesaler with operations across 25 countries. Here's what our team learned while moving their UI from Reflux to Redux:
- Identify technical pain points that cause business pain points.
- Get buy-in from analysts and the Product Owner to make sure everyone understands the cost of the pain point.
- Assess the cost of technical mitigation (time box this).
- If the cost of complete mitigation is too high, mitigate the issue only for key business flows that are likely to see high development in the future and turn non-essential and already-developed flows into a black box.
- Up-skilling the team on the new tech is a critical part of the migration. Pair programming is a powerful and effective tool to spread knowledge among developers while continuing to deliver functionality.
- The benefits of having different parallel technical implementations (in this case parallel use of Reflux and Redux) outweigh the costs.
Motivation
This is a resource to help teams advocate and implement technical quality. I'll cover the business-focused arguments we made at METRO for moving UI codebases to Redux, as well as the specific steps we took to negate the business risks of the migration.
Redux and Reflux are two JavaScript-based UI libraries that help describe and manage an application's state. Both are, generally speaking, implementations of the Flux architecture pattern of unidirectional data flow. I personally prefer the simplicity of Redux.
It's only fair to acknowledge up front that a lot of what we struggled with was due to the complexity and untested nature of the inherited Reflux codebase. It isn't really a fair basis for comparison. I don't intend this to be an anti-Reflux tirade; in fact, the specific technologies in question are pretty irrelevant.
The business problem
In late 2016, I joined one of our teams at METRO. They were in the middle of a digital (and Agile!) transformation and needed to translate their offline business into a new microservice-based online platform. Thoughtworks was brought in to continue development of several of the platform’s microservices.
Development and testing of our inherited codebase turned out to be slow and frustrating, particularly for the UI. From the beginning, the intention of moving to Redux lay in the air across teams/verticals. Due to lack of Redux experience on the project, there was initially some concern over the development opportunity cost of a major refactoring. We needed a way to assess the business risks, benefits, and technical feasibility.
Making the case for quality
The first step was to get buy-in on Redux from the Product Owner and business team. Our strategy was to present the pros and cons of a migration, then let the analysts and Product Owner decide whether to invest in Redux.
In summary, this is what we presented to discuss with our client. Moving to Redux would bring our team and METRO the following technical, business, and political benefits:
- Gracefully scalable UI
- Improved code readability and simplicity
- Better extensibility
- Reduced brittleness
- Fewer bugs
- Improved debugging and incident response
- Potential velocity increase
- Better test coverage and code testability
- Solve a common problem and improve the whole platform
- Improve our team's reputation and influence on the platform
The spike
The pitch got us enough buy-in to immediately play a two-day spike to assess the technical feasibility of incrementally moving to Redux and having it co-exist with Reflux (a big-bang migration seemed to be out of the question due to time constraints).
We found that in general it was no trouble to have the two fluxes co-exist, and that incremental strangulation of Reflux was possible (see the Technical Appendix for details). The refactoring task could be broken up into bite-sized chunks and spread out over time if required (e.g. due to other high-priority work or new scope coming in), keeping the application fully functional at every stage.
However, we found that in practice there was a high cost in quality assurance at each stage, both because of lack of regression tests and because the tests that did exist were brittle and tightly coupled to Reflux.
One of the uses of test-driven development is as a tool to document the existing codebase. Fixing bugs becomes hard when the tests documenting a critical piece of code are either insufficient or obscured by complexity, both of which applied in our case. 75% of our time on the spike was spent trying to make the existing tests pass even though the code was obviously working. This made our incremental strategy impossibly painful. Of course, your mileage may vary, and with a more forgiving codebase I'm confident that this approach would work just fine.
This put us in a difficult situation. We could come up with a better test strategy for Reflux, refactor the tests, then migrate to Redux incrementally while keeping the tests green – but then we'd be investing developer time in a framework that was on its way out.
Strategy for moving to Redux
Focusing on the business value for METRO helped us resolve this quandary. In the end, we ditched the incremental migration strategy and essentially did a rewrite of key flows (end-to-end user interactions) that hadn't been developed much. For example, in our mobile UI we had two distinct flows – 90% of one had already been developed, but only 20% of the other. We chose to initially move only the latter flow to Redux.
This had the dual advantages of giving us all the benefits of increased velocity and testability while developing the rest of the feature, without having to refactor much Reflux code in the first place. However, since the only source of truth for business regression confidence was the existing Reflux implementation itself, our rewrites relied heavily on TDD and our functional test suites for regression confidence.
New flows would be written in Redux from the start; old but non-key flows could be addressed on a case-by-case basis. For example, after successfully implementing the 20%-complete flow in Redux, we kept the conversation open with our Product Owner and eventually also migrated the 90%-complete flow once we had the capacity three months later. The long-term goal was to refactor more and more business flows as they became relevant to the scope. We ultimately ended up with something similar to an incremental migration strategy, but from a high-level business flow perspective rather than a technical perspective.
Of course, this incurred the cost of permanently having two state management systems in the codebase, which in theory required developers to be familiar with both Reflux and Redux. However, this turned out not to pose much of a problem in practice. We considered this a small price to pay because of these major benefits:
- Choosing a clear business flow to refactor allowed us to rework existing functionality in that flow and treat all other flows as a black box (there was never a case where a flow used both Reflux and Redux).
- Learning Redux was easy after learning Reflux due to their similarity.
- Developers worked mainly with Redux since that flow had the most items in the backlog.
- We received all the benefits of Redux for the new flow. This gave us the critical velocity to stay on-track for the MVP go-live date.
- By reducing the scope of the migration we needed less time to complete it.
- We could TDD the Redux rewrite from the bottom up and use our functional test suite for regression safety until we had finished the migration.
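To illustrate the last point, here is a minimal sketch of what bottom-up TDD of a Redux reducer can look like. It is not code from the METRO project: the reducer name, the action types and the state shape are all hypothetical, and a tiny hand-written check helper stands in for a real test runner so the snippet has no dependencies. The point is only that a reducer is a plain function of state and action, which makes each behaviour cheap to pin down with a test before writing it.

```javascript
// Hypothetical reducer for one business flow; names and state shape are
// illustrative assumptions, not taken from the original codebase.
const initialState = { lines: [] };

function orderReducer(state = initialState, action) {
  switch (action.type) {
    case "ORDER_LINE_ADDED":
      return { ...state, lines: [...state.lines, action.line] };
    case "ORDER_LINE_REMOVED":
      return {
        ...state,
        lines: state.lines.filter((line) => line.sku !== action.sku),
      };
    default:
      // Unknown actions leave the state untouched.
      return state;
  }
}

// Tiny check helper so the sketch runs without a test framework.
function expectEqual(actual, expected) {
  const a = JSON.stringify(actual);
  const e = JSON.stringify(expected);
  if (a !== e) throw new Error(`expected ${e} but got ${a}`);
}

// Each check would be written first, then the matching reducer case added.
const afterAdd = orderReducer(undefined, {
  type: "ORDER_LINE_ADDED",
  line: { sku: "A-1", quantity: 3 },
});
expectEqual(afterAdd, { lines: [{ sku: "A-1", quantity: 3 }] });

const afterRemove = orderReducer(afterAdd, {
  type: "ORDER_LINE_REMOVED",
  sku: "A-1",
});
expectEqual(afterRemove, { lines: [] });

// The earlier state must not have been mutated by the later action.
expectEqual(afterAdd, { lines: [{ sku: "A-1", quantity: 3 }] });
```

Because nothing here touches the UI or the network, tests of this kind stay fast and are not coupled to how the components are wired up, which is the property the old Reflux tests were missing.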
Strangulation vs. containment
I was skeptical of a complete incremental rewrite (strangulation) due to the size of the legacy codebase and the time pressure. At the same time, not doing the migration was causing Thoughtworks teams to burn developer time with Reflux and slow the pace of delivery.
Our ultimate approach was inspired by a recent email thread on the Thoughtworks internal Software Development group on the strangulation vs. containment of legacy code. The thread helped me realise that this was a false dichotomy – the most practical middle path was to do a partial refactor only in key areas and treat already existing features as a black box (containment).
Critically, any flow that was selected to be moved to Redux needed to be refactored completely, since we had discovered that the quality assurance cost of maintaining interleaved fluxes in the same flow was insurmountable given our timeline.
Upskilling the team
We had to make sure everybody knew how to use the new Redux codebase. Since only two developers on our team had used Redux before, and since I had become an information silo for our particular implementation during the migration, we still needed to get the relevant knowledge out of my head and into everybody else's.
Any technical rework/major refactoring involves two migrations: one of tech, the other of information/skills (assuming not everyone is familiar with the new tech). I get the feeling that this latter task tends to be the more daunting, and contributes significantly to teams choosing not to invest in quality (when quality means new/better tech). How to approach this problem?
I don't agree with the "having to learn new things will reduce our velocity" argument. Developers upskill all the time – new projects tend to introduce new tech and business domains. Learning and applying new stuff is a critical skill. And of course, there are usually long-term business costs associated with not upskilling developers.
It was important to realise that performing the Redux migration would create an information silo, and that whoever worked on it needed to spend time upskilling the rest of the team. Being a local expert is seductive, and there is no doubt a certain dark temptation to "be needed" and protect one's territory. This is something that everyone on the team needs to be aware of and complicit in managing, for example by giving feedback when appropriate.
Pair programming was an indispensable knowledge-sharing tool. I paired on the Redux migration with our UI developer, then pulled in other developers for new stories in Redux-land. This worked well, and I stopped primarily focusing on upskilling others after around a month. By this time, two new developers had a strong enough grasp of Redux to help others learn. By Agile/Scrum standards, we were a big team (~12 developers), so not everyone immediately got the chance to learn Redux, but the team had enough knowledge to become self-supporting.
Here are the key ideas to make the case for the daunting "investment" of upskilling a team in the future:
- Upskilling a team doesn't need to disrupt the normal workflow too much – use pair programming to introduce team members to the new tech while delivering new stories.
- It's helpful to have someone who has a strong grasp of how the new thing works and can help others learn, but they need to be clear that as long as they are a silo they are a threat to the team's productivity.
- Everyone on the team needs to hold this expert accountable.
- Senior developers have likely seen the "new" patterns before and will grasp them immediately. Focus on upskilling seniors first, then they can pair with others while delivering new functionality.
- Focusing on strict TDD helps the team understand new tech by making it really obvious how the system should be working.
Other benefits
- Once Redux was in, it was easy to extend. Any new flow we added was done in Redux. Reflux was contained to legacy areas.
- We demoed the success of the Redux migration to METRO at our sprint showcase by presenting the refactored flow in the browser with the Redux devtools open. This gives business folks some feeling for what has happened, since the flow itself hasn't changed.
- Our QAs and PO seem to like the Redux devtools. They're super powerful for developers, but also useful for analysts because they visualise state and actions, which are usually declaratively-named in a way that relates to some business concept.
- There were quite a few instances during the migration where we found issues with the existing implementation, which had probably been introduced due to lack of TDD. We made note of them but still translated them one-to-one into Redux since we didn't want to change too many things at once and had no tests or business context to explain why the implementation was the way it was. After the migration was complete we used our best judgement, based on our experience, to address each of these issues.
Conclusion
At METRO we decided to invest in technical quality. We performed a migration of legacy UI code to Redux, focusing on the deliverable business value to manage concerns about development opportunity cost. We demonstrated the feasibility of making technical improvements while delivering and have kept the conversation around quality open, making further improvements when possible. I believe that this success will result in a more valuable and more sustainable product, and I’m looking forward to seeing where the Thoughtworks-METRO partnership will take us next.
Technical appendix
Our original plan for incrementally migrating to Redux is summarized below. Because Reflux and Redux both follow the basic Flux architecture pattern of unidirectional data flow, one can incrementally take over the Reflux state cycle with Redux:
- Complete Reflux cycle
- Action handling (e.g. async) in Reflux, but keep all state in Redux. Reflux store action listeners dispatch Redux actions with the results of remote API calls and no longer call this.trigger(). React components receive props mapped to Redux state and no longer listen to Reflux stores. Reflux essentially acts as async middleware for Redux.
- Action handling (e.g. async) in Redux, with appropriate middleware. Reflux store action listeners now do nothing but straight away dispatch corresponding Redux actions.
- Complete Redux cycle – stop dispatching Reflux actions from React components and start directly dispatching Redux actions.
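The second stage in the list above is the least obvious one, so here is a rough, dependency-free sketch of the idea. Everything in it is an assumption made for illustration: createStore is a hand-written stand-in with the same getState/dispatch shape as a Redux store (it is not the real library), the legacy store is a plain object that only mimics a Reflux store's action listener, and the basket names and the stubbed API call are invented. It is not the code we ran at METRO.

```javascript
// Stand-in for a Redux store, reduced to getState and dispatch.
// Assumption: in a real project this would come from the redux package.
function createStore(reducer) {
  let state = reducer(undefined, { type: "@@INIT" });
  return {
    getState: () => state,
    dispatch(action) {
      state = reducer(state, action);
      return action;
    },
  };
}

// Hypothetical reducer: all state for the flow now lives in Redux.
function basketReducer(state = { items: [], error: null }, action) {
  switch (action.type) {
    case "BASKET_LOADED":
      return { items: action.items, error: null };
    case "BASKET_LOAD_FAILED":
      return { ...state, error: action.message };
    default:
      return state;
  }
}

// Stage 2: the legacy store keeps its action listener and the remote call,
// but holds no state of its own and never notifies components directly
// (no this.trigger()); it only dispatches the result to Redux.
function createLegacyBasketStore(reduxStore, fetchBasket) {
  return {
    onLoadBasket(customerId) {
      fetchBasket(
        customerId,
        (items) => reduxStore.dispatch({ type: "BASKET_LOADED", items }),
        (err) =>
          reduxStore.dispatch({
            type: "BASKET_LOAD_FAILED",
            message: err.message,
          })
      );
    },
  };
}

// Stubbed API call so the sketch runs without a network. It invokes its
// callbacks synchronously; a real call would be asynchronous.
function fakeFetchBasket(customerId, onSuccess, onError) {
  if (customerId === "unknown") {
    onError(new Error("customer not found"));
  } else {
    onSuccess([{ sku: "A-1", quantity: 2 }]);
  }
}

const store = createStore(basketReducer);
const legacyBasketStore = createLegacyBasketStore(store, fakeFetchBasket);

// A component would still fire the legacy action; the result lands in Redux.
legacyBasketStore.onLoadBasket("c-42");
```

In the third stage the body of onLoadBasket would shrink to a single dispatch of a corresponding Redux action, with the remote call moved into Redux middleware, and in the final stage the legacy store disappears altogether.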
Disclaimer: The statements and opinions expressed in this article are those of the author(s) and do not necessarily reflect the positions of Thoughtworks.