In the last year we've seen Chaos Engineering move from a much talked-about idea to an accepted, mainstream approach to improving and assuring distributed system resilience. As organizations large and small begin to implement Chaos Engineering as an operational process, we're learning how to apply these techniques safely at scale. The approach is definitely not for everyone, and to be effective and safe, it requires organizational support at scale. Industry acceptance and available expertise will definitely increase with the appearance of commercial services such as Gremlin and deployment tools such as Spinnaker implementing some Chaos Engineering tools.
In previous editions of the Radar, we've talked about using Chaos Monkey from Netflix to test how a running system is able to cope with outages in production by randomly disabling instances and measuring the results. Chaos Engineering is the nascent term for the wider application of this technique. By running experiments on distributed systems in production, we're able to build confidence that those systems work as expected under turbulent conditions. A good place to start understanding this technique is the Principles of Chaos Engineering website.
In previous editions of the Radar, we've talked about using Chaos Monkey from Netflix to test how a running system is able to cope with outages in production by randomly disabling instances and measuring the results. Chaos Engineering is the nascent term for the wider application of this technique. By running experiments on distributed systems in production, we're able to build confidence that those systems work as expected under turbulent conditions. A good place to start understanding this technique is the Principles of Chaos Engineering website.