Why test-driven development and pair programming are perfect companions for GitHub Copilot

Juan Infante Zumer, Carlos Cavero and Carlos Barroso

Published: July 08, 2024

GitHub Copilot, an AI-powered coding assistant, is rapidly transforming software development. Studies show Copilot can significantly boost developer productivity with features like predictive text leading to 55% faster task completion. Copilot can also improve code quality in areas like readability and maintainability. This frees developers from repetitive tasks, boosts morale and allows them to focus on the creative and strategic aspects of software development.

However, measuring the true impact of Copilot requires a comprehensive approach that considers not just speed and quality metrics, but also developer adoption, team dynamics and potential biases within the AI model itself.

After practicing with GitHub Copilot for several months, we have identified some pitfalls when coding with it and ways to overcome them. The tool lets you work at a much faster pace, but it is not magic. Its non-deterministic nature, where it generates different suggestions for the same prompt, can be unexpected and potentially distracting.

While GitHub Copilot offers remarkable functionality, it's essential to remember that “with great power comes great responsibility”. Or, quoting Michael Feathers, “when we use AI for code generation, quality assurance becomes much more important”. We advise using it with caution, because relying on the AI assistant can lead to code that does not meet expectations. The pitfalls we have in mind are context poisoning, automation bias, the sunk-cost fallacy, the anchoring effect and auto-completion on steroids. We term these incorrect behaviors AI 'smells'.

Instead of leaning back and relaxing because GitHub Copilot will code for you, you need to be extra focused on implementation decisions. Otherwise, you may miss relevant requirements of your code, as you are pulled in different directions by random recommendations. Remember: you are still the one at the controls.

Our exploration revealed that test-driven development (TDD) and pair programming can effectively mitigate the occurrence of AI smells. In this blog post, we explain why we think this is.

GitHub Copilot, your AI rubber duck

Copilot is an AI-assisted tool for programmers. Users interact with it in natural language or by issuing specific commands, then wait for a code completion. Developers can describe their desired functionality through prompting or request completions for particular lines of code. Copilot then analyzes the existing codebase and contextual elements to generate relevant suggestions, ranging from single-line completions to entire pieces of code, tailored to the programmer's intent.

While the tagline “GitHub Copilot, your AI pair programmer” may initially grab attention, it can lead to misconceptions. It does not capture what pair programming really is.

A more accurate and descriptive phrase would be, “GitHub Copilot: your AI Rubber Duck”, aptly reflecting its ability to facilitate a process of self-explanation and introspection. GitHub Copilot's role lies in helping developers articulate their thoughts and understand the problem at hand.

Additionally, composing clear and concise comments before seeking code recommendations can significantly enhance the process of self-reflection and understanding. In essence, GitHub Copilot acts as a catalyst for internal dialogue, prompting developers to articulate their ideas and start the discussion with themselves.
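As an illustration of writing the comment first, here is a minimal sketch. The function `parse_duration_minutes` and its rules are hypothetical, invented only for this example. The point is that the comment states inputs, outputs and rejected cases before any suggestion is requested, so whatever the assistant proposes can be judged against it.

```python
import re

# Intent, written down before asking the assistant for anything:
#   - Input: a duration such as "1h30m", "2h" or "45m".
#   - Output: the total number of minutes as an int.
#   - Anything else (empty string, "90", "1h 30m") raises ValueError.
_DURATION = re.compile(r"(?:(\d+)h)?(?:(\d+)m)?")


def parse_duration_minutes(text: str) -> int:
    """Convert a duration like '1h30m' into total minutes."""
    match = _DURATION.fullmatch(text)
    # The pattern also matches the empty string, so reject that explicitly.
    if not text or match is None:
        raise ValueError(f"not a duration: {text!r}")
    hours, minutes = match.groups()
    return int(hours or 0) * 60 + int(minutes or 0)
```

Whether the body is typed by hand or accepted from a suggestion, the comment is what makes the review possible: each bullet is a claim the code either satisfies or does not.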

Pair programming and test-driven development

Why is pair programming important?

Pair programming is a technique in software development where two programmers work together on a single computer or remotely, each taking turns to be the driver and the navigator. The driver is responsible for typing code, while the navigator is responsible for guiding the implementation, giving feedback and suggesting improvements. Pair programming promotes a culture of collaboration, error reduction and code improvement.

GitHub Copilot lacks the ability to question or challenge your assumptions, so it cannot give you alternative perspectives the way a human being can when pairing.

Test-driven development (TDD)

TDD forces us to think about the problem in a structured way, breaking it down into smaller, testable units, which also helps prevent YAGNI ("you aren't gonna need it") violations. It not only improves code quality through testing, but also promotes well-designed code: it emphasizes modularity and separation of concerns, design principles that make the code more adaptable to future changes and easier to understand, maintain and modify. TDD also helps us to identify and address potential issues early on, reducing the likelihood of introducing AI smells later in the development process. GitHub Copilot is extraordinarily effective when suggesting big chunks of code and repetitive patterns, but there is always a risk of falling into some AI smells; TDD can mitigate them.

With TDD, developers write code at the last responsible moment, so they are always in control of the implementation, not GitHub Copilot. It’s worth remembering there is a task before the Red/Green/Refactor cycle: Think. This keeps you focused and helps you avoid some of the traps of auto-completion on steroids.
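To make the Think/Red/Green/Refactor sequence concrete, here is a minimal sketch using Python's standard `unittest` module. The discount rule and the name `discounted_total_cents` are assumptions made up for this illustration; they do not come from any real project.

```python
import unittest

# Think: what is the smallest behaviour we need next?
# Assumed rule for this example: orders of 100.00 or more get 10% off,
# smaller orders get no discount. Amounts are integer cents.


# Red: the tests come first. They fail until discounted_total_cents
# exists and behaves as specified.
class DiscountedTotalTest(unittest.TestCase):
    def test_small_order_is_not_discounted(self):
        self.assertEqual(discounted_total_cents(9900), 9900)

    def test_order_at_threshold_gets_ten_percent_off(self):
        self.assertEqual(discounted_total_cents(10000), 9000)


# Green, then Refactor: the first passing version can be a bare `if` with
# magic numbers; once the tests are green, the numbers are given names.
DISCOUNT_THRESHOLD_CENTS = 10000
DISCOUNT_PERCENT = 10


def discounted_total_cents(subtotal_cents: int) -> int:
    """Apply the threshold discount and return the total in cents."""
    if subtotal_cents >= DISCOUNT_THRESHOLD_CENTS:
        return subtotal_cents * (100 - DISCOUNT_PERCENT) // 100
    return subtotal_cents
```

Saved as a module, this can be run with `python -m unittest`. The tests are the yardstick: a suggestion from the assistant is only worth accepting if it turns the current red test green without adding behaviour that no test asks for.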

AI 'smells' — and how to prevent them

GitHub Copilot's predictive nature makes it challenging to engage in a truly bidirectional dialogue. While the AI assistant can prompt us to think and reflect, it is important not to become passive recipients of its suggestions, which can lead to pitfalls such as the adoption of poorly crafted code, or even harmful code smells.

Interestingly, the AI smells can arise not only from relying solely on GitHub Copilot but also from solo coding. This is because our brains have a natural tendency towards laziness and gravitate towards familiar and repetitive tasks, as these require less cognitive effort. This tendency can lead us to take shortcuts and overlook important considerations.

The AI smells described in the following sections have been identified elsewhere. Along with exploring AI's potential smells, we provide helpful tips to address them.

Context poisoning

Context poisoning is when we follow good practices and prompt Copilot to do the same, but it nevertheless continues to suggest stale implementations or bad patterns.

TDD is not only a way of covering your code with tests to support refactoring; it’s also a way to think about the architecture and about the best, minimal implementation of a feature. Thus, context poisoning can be greatly mitigated using TDD and pair programming. One thing to take into consideration is that GitHub Copilot can easily generate more code than you need to make the next test pass. Some discipline and dialogue are required to avoid accepting too much code in one go and falling into the next AI smell, auto-completion on steroids.

Tip #1. Reflection: TDD and pair programming force us to discuss approaches and solutions. Saying and explaining things out loud pushes us to reflect on whether we really have the right level of understanding and whether we really have the best solution.
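The idea of accepting only what the next test needs can be sketched as follows. Both the `slugify` function and the "oversized suggestion" described in the comments are hypothetical, chosen only to illustrate the discipline involved.

```python
# The only test written so far (hypothetical example):
def test_spaces_become_hyphens():
    assert slugify("Hello World") == "hello-world"


# An oversized suggestion might also add a max_length parameter, accent
# stripping and punctuation removal. No test asks for any of that yet,
# so none of it is accepted at this step.


# What we keep: just enough to make the current test pass.
def slugify(title: str) -> str:
    """Lower-case the title and replace spaces with hyphens."""
    return title.lower().replace(" ", "-")
```

If punctuation handling turns out to be needed, it arrives with its own failing test first. That keeps every accepted line tied to a requirement the pair has actually discussed.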

Auto-completion on steroids

This refers to the unfortunate tendency to switch off the brain and accept everything, in an infinite Tab/Enter loop.

This can occur when we use GitHub Copilot, because communication is unidirectional; it makes it all too easy to disconnect and simply accept its suggestions. Accepting every recommendation provided by the AI assistant requires less cognitive effort.

Tip #2. Critical thinking: Pair programming encourages us to actively engage our own critical thinking abilities through discussion. It should force us to carefully evaluate the suggestions offered, and consider their relevance, maintainability and whether they adhere to good practice.

Automation bias

This is the erroneous preference to accept the automated suggestions of GitHub Copilot over the manual code input that comes from us. It stems from an assumption that an output generated through computation is ‘better’ or more accurate than something done by a human.

In reality, GitHub Copilot often provides code snippets that aren’t the most appropriate. Sometimes this is because the prompt isn’t sufficiently precise; we may need to rephrase it or provide more information or context.

Tip #3. Keeping focus: Pair programming keeps you focused by requiring you to communicate your progress and decisions to your partner. It prevents you from getting distracted and going down rabbit holes. When two pairs of eyes are looking at the code, we can better identify potential problems that might be overlooked by a single developer.

Sunk-cost fallacy

The sunk-cost fallacy in this context is when we’re hesitant to delete Copilot-generated code, even though coding an alternative from scratch would yield a better long-term solution.

Tip #4. Code review on-the-go and collective code ownership: Regular pairing ensures that every line of code is reviewed by at least two individuals, promoting a sense of collective ownership. It fosters a codebase that is both consistent and maintainable. It also makes it easier to evaluate the current implementation objectively and weigh the potential benefits of alternatives, and it gives the pair enough psychological safety to remove code snippets (or even the entire work) when necessary, no matter who or what created them.

Anchoring effect

This is when we find it harder to develop alternative code implementations once we’ve seen a suggestion from Copilot.

Pair programming can, again, be useful in tackling this: it can encourage reflection and discussion, which can lead to a deeper understanding of the problem, the code and other potential solutions.

Tip #5. Two modes of thinking combined: Pair programming allows you to have different perspectives on the code, providing more strategic alternatives to the overall design.

Final Tip. Regular breaks: give your brain a chance to rest and recharge by taking regular breaks during your coding sessions. Stepping away from the computer can help you come back with fresh eyes and a more objective perspective.

Conclusion

The future is unpredictable: as new tools continue to find their way into the developer toolchain, what really matters is getting the best out of them. At Thoughtworks, TDD and pair programming are part of our sensible defaults. GitHub Copilot (and similar tools) can enhance the quality of the software produced, but only if the development process fosters communication, and if the implementation is focused on iteratively producing clean and efficient code.

GitHub Copilot aims to transform people into 10x developers. This might stem from the erroneous belief that apparent fluency in code generation implies comprehension of the underlying concepts. Now is the moment to step back and reflect, not only on velocity but also on effectiveness, and to use GitHub Copilot wisely. Thus, we strongly recommend GitHub Copilot as a companion for pair programming and TDD; developers who combine them are on the right track.

Use coding assistants to make pairs better, not to replace pair programming.

Thanks to Birgitta Boeckeler and the “Ensembling with Copilot” group around Paul Sobocinski in Thoughtworks Canada for their insightful memos, which helped us to better understand what we call GitHub Copilot AI smells.

Thanks also to Bruno Belarte, Javier Molina, Javier López, Chris Ford and Paul Sobocinski for their review comments.

Disclaimer: The statements and opinions expressed in this article are those of the author(s) and do not necessarily reflect the positions of Thoughtworks.
