
Lessons from CrowdStrike

Striking the right balance to reduce risk and maximize resilience

If you’re building software, you’ll never be able to remove all risk: whatever kind of artifact you’re developing, there’s always a chance it will contain a flaw or vulnerability. That’s not news to anyone in software, but the recent CrowdStrike outage did serve as a timely reminder that what matters is, above all else, preparedness to address issues and problems when they do inevitably arise.

 

But what does preparedness look like? At one level, it involves business leaders getting closer to the reality of what’s happening on the ground of their organizations. But another important element is striking a balance between risk and engineering efficiency.

 

Getting it right will not only ensure you can deliver high-quality software quickly; it will also put you in the best possible position to respond effectively if (when) technical disasters do hit. Let’s take a look at how that can be done. We think the practices and processes we use at Thoughtworks provide a robust foundation for organizations to properly balance risk and resilience.

 

Enable asset awareness

 

As Jim Gumbley highlighted in his recent article, resilience is all about trade-offs. The first step to making the right trade-offs for your business is to identify levels of criticality within your unique context and broader ecosystem. That’s where asset awareness becomes extremely valuable.

 

Asset awareness means building up a complete understanding of the assets across your digital ecosystem — and the users, partners and services that interact with them. 

 

Becoming asset-aware takes a lot of internal reflection, ecosystem mapping and dependency analysis. It’s best enabled by organizing multidisciplinary groups of diverse stakeholders from across the business to sit down, share and discuss details about your existing and emerging assets. The more perspectives you can bring into the analysis process, the better, and deeper, your understanding of risk will be.

 

Once you have a clear idea of where your most critical risk areas are — and which failure points could have the greatest impact on your business — you can start making informed decisions about where to prioritize your mitigation and resilience efforts.
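
To make the dependency analysis step more concrete, here is a minimal Python sketch of one way a team might rank assets by how much else would fail if they did. The asset names, the `DEPENDS_ON` inventory and the `blast_radius` helper are all hypothetical illustrations, not a Thoughtworks tool or method; a real inventory would come from your own ecosystem mapping.

```python
from collections import deque

# Hypothetical asset inventory: each asset maps to the assets it depends on.
DEPENDS_ON = {
    "checkout-web": ["payments-api", "auth-service"],
    "payments-api": ["auth-service", "ledger-db"],
    "reporting-job": ["ledger-db"],
    "auth-service": ["endpoint-agent"],
    "ledger-db": [],
    "endpoint-agent": [],
}


def blast_radius(asset, depends_on):
    """Return every asset that directly or transitively depends on `asset`."""
    # Invert the graph so we can walk from an asset to its dependents.
    dependents = {name: set() for name in depends_on}
    for name, deps in depends_on.items():
        for dep in deps:
            dependents.setdefault(dep, set()).add(name)

    # Breadth-first walk over dependents.
    seen = set()
    queue = deque([asset])
    while queue:
        current = queue.popleft()
        for dependent in dependents.get(current, ()):
            if dependent not in seen:
                seen.add(dependent)
                queue.append(dependent)
    seen.discard(asset)  # ignore self-references caused by cycles
    return seen


def rank_by_blast_radius(depends_on):
    """Sort assets by number of affected dependents, largest first."""
    ranked = [(name, len(blast_radius(name, depends_on))) for name in depends_on]
    return sorted(ranked, key=lambda pair: (-pair[1], pair[0]))


if __name__ == "__main__":
    for name, affected in rank_by_blast_radius(DEPENDS_ON):
        print(f"{name}: {affected} dependent asset(s)")
```

In this made-up inventory, the low-profile `endpoint-agent` ranks at the top because three other assets ultimately rely on it. Surfacing that kind of hidden critical dependency is what an asset-awareness exercise is for. Counting dependents is only a rough proxy for business impact, so treat the ranking as a conversation starter for the stakeholder group rather than a verdict.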

 

Look ahead with futurespectives

 

Once you’ve built up a strong and holistic view of risk areas across your digital ecosystem, it’s time to get more granular. You need to begin looking at how risk might manifest itself across your software delivery processes and products.

 

One technique Thoughtworks frequently applies to achieve that is futurespectives. Futurespectives are very similar to agile retrospectives, but rather than reflecting on how things have played out in practice, teams consider how they could unfold — along with any major issues that may occur.

 

Futurespectives help teams to look at upcoming or ongoing projects through many different lenses, and identify opportunities to optimize and improve them proactively. From a risk mitigation and resilience perspective, one activity that’s especially useful is conducting pre-mortems.

 

Pre-mortems are opportunities to discuss what could go wrong with upcoming software projects and releases. Crucially, pre-mortems don’t just surface potential routine challenges with these projects; they get everyone talking about unlikely but highly impactful disaster events — much like that experienced at CrowdStrike.

 

This makes them a great tool for preparing for unpredictable events. By exploring hypothetical disasters, teams can start to understand and map their impacts and, most importantly, develop clear and actionable response plans in case one of these unlikely possibilities becomes a reality.

 

It's likely your organization has a number of risk-averse decision makers: leaders who are held accountable when unplanned events occur. Pre-mortems can help here, too. Concerns and fears can be captured via this method so that worrying scenarios are considered, discussed and documented no matter how unlikely they may seem. 

Max Griffiths, Thoughtworks
Our aim is always to understand an organization’s unique risk environment, build robust and actionable response plans and embed engineering practices that strike the optimal balance between risk mitigation, time to market and organizational agility.
Max Griffiths
Head of Platform Engineering, Thoughtworks Europe

Deepen understanding with threat modeling

 

In a similar vein, threat modeling is an agile security practice that our teams will often implement when supporting major software projects or engineering and organizational transformation initiatives. Where pre-mortems are one-off exercises undertaken at the beginning of a project, threat modeling is a repeatable practice that should become habitual within delivery teams’ workflows. It's also primarily concerned with cybersecurity risks.

 

Threat modeling continuously challenges teams to consider what may go wrong within their projects and what risks their decisions may expose the organization to. That, in turn, drives them to plan responses to those risks, should they materialize. The practice fosters confidence in software and systems, improves general security awareness and helps break down silos between teams.

 

Threat modeling is at its most valuable when teams:

 

  • Nominate a threat modeling facilitator responsible for managing the process and ensuring it is frequently conducted

  • Involve subject matter experts (SMEs) where possible. Security SMEs should be involved across most aspects, but also consider data protection, architecture and risk management groups

  • Schedule threat modeling exercises that cover all the applications and workflows a team touches, and share them with security SMEs

  • Convert outcomes and actions to additional acceptance criteria, tech debt stories and definitions of “done”

  • Share action plans that clearly lay out what will be done by whom when a threat manifests itself

 

Convert risk awareness into balanced actions

 

The practices I’ve introduced so far are primarily concerned with identifying and understanding risk so leaders can make informed decisions about what a balanced risk mitigation plan looks like for them. But that’s just one piece of the puzzle.

 

Once you’ve established a clear understanding of your environment and risk profile — and developed robust response plans — you need to put the right daily practices and standards in place to make risk mitigation an always-on process for engineering teams.

 

At Thoughtworks, we take a holistic approach to risk mitigation and minimization that starts with engineering effectiveness and excellence. By enabling engineering teams to do their best work, we can naturally reduce errors and, by extension, risk.

 

We utilize a huge range of levers to keep risk to a minimum without sacrificing agility, such as:

 

  • Embedding DevOps practices proven to increase code quality and delivery speed simultaneously

  • Enabling continuous delivery processes that incorporate rigorous in-pipeline quality checks

  • Building robust testing strategies that leverage automation to accelerate time to market without sacrificing release quality

  • Establishing strong security practices from the beginning of every product or service, so that they can be considered at every stage of development

  • Prioritizing stability metrics like mean time to recover (MTTR) that ensure organizations can rapidly get back on their feet following a crisis event
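
As a rough illustration of the last point, MTTR is commonly calculated as the average time between an incident being detected and service being restored. The Python sketch below assumes that definition; the `mean_time_to_recover` function and the incident timestamps are hypothetical examples, not real data.

```python
from datetime import datetime, timedelta


def mean_time_to_recover(incidents):
    """Average recovery time over (detected_at, restored_at) pairs."""
    if not incidents:
        raise ValueError("at least one incident is required")
    total = sum(
        (restored - detected for detected, restored in incidents),
        timedelta(),  # start value so the durations add up as timedeltas
    )
    return total / len(incidents)


# Hypothetical incident log: (detected_at, restored_at).
INCIDENTS = [
    (datetime(2024, 7, 1, 9, 0), datetime(2024, 7, 1, 9, 30)),     # 30 minutes
    (datetime(2024, 7, 8, 14, 0), datetime(2024, 7, 8, 16, 0)),    # 120 minutes
    (datetime(2024, 7, 19, 4, 0), datetime(2024, 7, 19, 5, 30)),   # 90 minutes
]

if __name__ == "__main__":
    mttr = mean_time_to_recover(INCIDENTS)
    print(f"MTTR: {mttr.total_seconds() / 60:.0f} minutes")
```

For these three made-up incidents, the recovery times of 30, 120 and 90 minutes average out to 80 minutes. Tracking how that figure moves over time can be more informative than any single value, because it shows whether response plans are actually getting faster to execute.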

As we’ve already noted, no organization can realistically achieve 100% risk mitigation. So, our aim is always to understand an organization’s unique risk environment, build robust and actionable response plans and embed engineering practices that strike the optimal balance between risk mitigation, time to market, and organizational agility.

Disclaimer: The statements and opinions expressed in this article are those of the author(s) and do not necessarily reflect the positions of Thoughtworks.
