By Omar Bashir, principal consultant, financial services, UK
"Cloud is about how you do computing, not where you do computing", Paul Maritz, former CEO VMWare
Introduction
"Cloud is more than technology: it’s a generational shift", Mark Hurd, former Co-CEO Oracle
Cloud can help organizations achieve business agility, technological flexibility and operational economy. However, traditional governance frameworks are prescriptive and process intensive. This distracts teams from achieving the above outcomes and restricts opportunities for innovation, optimization and flexibility on the cloud.
Hence, to maximize benefits, governance on the cloud needs to be driven from first principles that help achieve the technology and business outcomes. We believe that governance should be driven with the following three as the first principles:
Reduce delivery friction
Increase confidence
Optimize margins
Lower friction reduces the time to market, which helps achieve business and technological agility and flexibility. Higher confidence, gained by testing early and often, reduces failure demand, which helps teams focus on value delivery. Cloud cost conversations usually focus on just keeping costs low. However, as business volumes increase, costs are bound to increase along with revenues. The focus should therefore be on maintaining or increasing margins as businesses scale on the cloud, rather than simply on reducing costs.
Further, these principles make teams autonomous and empower them to adopt and adapt the necessary practices. This enhances their productivity in delivering the desired outcomes within the organizational and technical constraints they face. These principles are also enablers to optimize the four-key (Accelerate) metrics, which, according to the 2019 State of DevOps Report, relate directly to an organization’s ability to achieve its goals.
Challenges with traditional governance
"Today’s problems come from yesterday’s solutions", Peter Senge, author of The Fifth Discipline
IT governance is the means to ensure that technology and business strategies are aligned and driving towards business outcomes. Popular and established governance frameworks include COBIT (Control Objectives for Information and Related Technologies) and ITIL (Information Technology Infrastructure Library). COBIT provides strategic guidelines to develop, implement, monitor and improve technology governance at an organizational level. ITIL is a framework for improving IT services by optimizing teams and their workloads to meet the needs of the business. In principle, the two complement each other.
However, in practice, organizations can experience the following challenges when implementing these frameworks:
Disruptive and time-consuming implementations. Organizations spend significant time and resources on elaborate COBIT and ITIL implementations. Thus, subsequent adaptations or migrations to more recent versions of these frameworks become challenging (e.g., as indicated in this report). Hence, there is often the risk of parts of these implementations becoming irrelevant in evolving business and technology environments such as the cloud.
Prescriptive governance. While business alignment is a key objective of both frameworks, their implementations end up being process-oriented and activity-driven rather than outcome-focused. Thus, governance itself may become an impediment to innovation and optimization.
Disconnected governance. More importantly, governance ends up being implemented and executed by those who are distant from technology development and service delivery. This results in overcompensation in the form of unnecessary gates in the delivery workflow, causing lead times and employee disengagement to escalate. The 2020 State of DevOps Report surveyed the impact of governance on change delivery. Organizations with highly orthodox approvals are nine times more likely to be highly inefficient. Conversely, firms with more automation and employee involvement in change management are three times more confident about change management and five times more effective, with higher employee engagement.
Thus, businesses may not benefit from the flexibility, agility and economy of the cloud if their technology on the cloud is governed using prescriptive, procedural and activity-driven traditional frameworks. Aligning teams and their practices to the principles of governance, and encouraging governance via code as opposed to prose, makes governance a business enabler rather than something seen as an obstacle.
First principles for cloud governance
"First principles is kind of a physics way of looking at the world. You boil things down to the most fundamental truths and say, ‘What are we sure is true?’ … and then reason up from there.", Elon Musk
For governance to be seen as a business enabler, it needs to be business aligned and outcome focused. To achieve this successfully, it should be:
Responsive to changes in business and technology environments
Lightweight in implementation and execution
Covering the entire lifecycle of systems on the cloud
Operating environments, business requirements and regulatory constraints largely define organizations’ governance needs, making them specific to each organization. Additionally, the cloud introduces rapidly evolving technology and an OpEx focus. Hence, effective governance on the cloud requires frequently revisiting its first principles and adapting operations so that they remain aligned with them.
We believe that the following three are the first principles of cloud governance and they align very closely with the fundamental principles of business success:
Reduce friction: Lower the time and effort required to deliver value to the market, which may also lower the opportunity cost to the business
Increase confidence: Deliver and operate high quality, secure, compliant and resilient technology
Optimize margins: Leverage greater cloud OpEx transparency to manage costs such that business margins are sustained or increased even when scaling
These principles are intertwined. Higher confidence, e.g., through early, fast and frequent testing, helps reduce regressions, which also helps lower friction in delivering value. Reduced friction allows for faster feedback and the ability to fix defects fast and early. This provides opportunities for higher overall confidence. Higher confidence reduces rework, and lower friction is generally achieved through more efficient and less wasteful pipelines and processes. Both reduce cloud costs, which helps optimize margins.
The following sections discuss these principles in detail.
Reducing friction
Friction in delivery has organizational, process and technology dimensions. Removing obstacles in business value’s path to production requires transparency across the organization. This transparency highlights waste which must be removed by optimizing business and technology operating models.
Organizational silos restrict visibility in, and accountability of, delivery because of limited understanding of the overall context. Hence, handovers across silos require considerable coordination which severely impedes the flow of value. These can be reduced with outcome-focused, cross-functional teams that are accountable for end-to-end delivery. These teams are allowed the freedom to operate while adhering to guiding principles aligned with organizational goals.
Most friction and waste result from manual processes for provisioning environments, testing, and promoting builds and releases. End-to-end automation of delivery pipelines makes these processes consistent and efficient. Optimizing the four-key (Accelerate) metrics ensures continuous improvement of delivery pipelines and processes.
Balancing technology standardization and diversification also affects friction. Tighter standardization restricts the ability to adapt implementations to emerging business requirements. Conversely, no standardization results in divergence risks leading to a higher total cost of ownership (TCO). Business capability driven development makes it possible to identify sub-capabilities that are commodity and can be standardized into a platform. This supports diversification of the sub-capabilities that provide differentiating value to the business.
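As a rough illustration of how the four key metrics might be derived from pipeline data, here is a minimal Python sketch. The `Deployment` record and its field names are assumptions made for this example, not part of any particular tool; a real pipeline would populate them from its version control and incident systems.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from statistics import median
from typing import Optional


@dataclass
class Deployment:
    """Hypothetical record of one production deployment."""
    committed_at: datetime                  # when the change was committed
    deployed_at: datetime                   # when it reached production
    failed: bool = False                    # did it cause a production failure?
    restored_at: Optional[datetime] = None  # when service was restored, if it failed


def four_key_metrics(deployments, period_days):
    """Summarize the four key metrics over a reporting period."""
    if not deployments:
        raise ValueError("at least one deployment is required")
    lead_times = [d.deployed_at - d.committed_at for d in deployments]
    failures = [d for d in deployments if d.failed]
    restore_times = [d.restored_at - d.deployed_at
                     for d in failures if d.restored_at]
    return {
        "deployments_per_day": len(deployments) / period_days,
        "median_lead_time": median(lead_times),
        "change_failure_rate": len(failures) / len(deployments),
        "median_time_to_restore": median(restore_times) if restore_times else None,
    }
```

Publishing the resulting dictionary on a shared dashboard is one way to make these indicators visible to the whole organization without adding manual reporting steps.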
Increasing confidence
Confidence in the value being delivered is much more comprehensive than just quality assurance and testing. Though both of these are key means to gain that confidence, they rely on a number of other factors.
Confidence building starts at the time of product definition, with both functional and cross-functional requirements (CFRs) being specified together. Both are essential for defining the architecture of the system. Thus, deprioritizing CFRs (such as performance, resilience and security) and the corresponding cross-functional testing invariably leads to expensive and complicated rework late in a project.
Early, frequent and fast testing is pivotal in obtaining early feedback and resolving defects early during development, when doing so is easier, cheaper and less risky. This includes ensuring that architecture and design integrity is maintained through architecture fitness functions.
Further, as testing is a means of gaining confidence, we should have confidence in those means as well. Flaky and non-deterministic tests reduce confidence in the testing regime, leading to an increased testing overhead and longer lead times.
Continuously improving resilience and supportability is key to increasing operational confidence. When support issues require escalation to the development team, the resulting failure demand reduces the capacity of the team to deliver value. Supportability improves with increased observability along with monitoring and alerting.
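One common kind of architecture fitness function is an automated check that the code's dependencies respect the intended layering. The following Python sketch assumes a hypothetical three-layer structure (`domain`, `application`, `adapters`) and an import map produced elsewhere, e.g., by a static analysis step; both are illustrative assumptions rather than a prescribed design.

```python
# Hypothetical layering rule: each layer may depend only on the layers listed.
ALLOWED_DEPENDENCIES = {
    "domain": set(),
    "application": {"domain"},
    "adapters": {"application", "domain"},
}


def layer_of(module):
    """Return the layer of a dotted module name (its first segment)."""
    return module.split(".")[0]


def layering_violations(imports):
    """Return (importer, imported) pairs that break the layering rule.

    `imports` maps each module name to the set of modules it imports.
    """
    violations = []
    for module, imported_modules in sorted(imports.items()):
        source = layer_of(module)
        if source not in ALLOWED_DEPENDENCIES:
            continue  # module is outside the governed layers
        for imported in sorted(imported_modules):
            target = layer_of(imported)
            if target == source or target not in ALLOWED_DEPENDENCIES:
                continue  # same layer, or a standard library/third-party import
            if target not in ALLOWED_DEPENDENCIES[source]:
                violations.append((module, imported))
    return violations
```

Run as part of the build, a check like this fails fast when, say, a domain module starts importing a database adapter, long before the erosion becomes expensive to reverse.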
Enhanced security with a zero trust architecture helps avoid malicious or accidental misuse of the system. Last but not least, enforcing compliance within the delivery pipelines ensures consistency and confidence in meeting compliance requirements, especially in heavily regulated industries.
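Enforcing compliance within a pipeline usually means expressing rules as code and failing the build when a resource definition breaks them. The Python sketch below shows the idea for storage resources; the rule names and the resource dictionary fields (`encrypted`, `public_access`, `tags`) are hypothetical and would need to match whatever the organization's infrastructure definitions actually contain.

```python
def check_storage_bucket(resource):
    """Return the names of the (hypothetical) compliance rules a resource breaks."""
    findings = []
    if not resource.get("encrypted", False):
        findings.append("encryption-at-rest")
    if resource.get("public_access", False):
        findings.append("no-public-access")
    if "owner" not in resource.get("tags", {}):
        findings.append("owner-tag")
    return findings


def compliance_gate(resources):
    """Map each non-compliant resource name to its findings.

    An empty result means the gate passes and the pipeline may proceed.
    """
    report = {}
    for resource in resources:
        findings = check_storage_bucket(resource)
        if findings:
            report[resource["name"]] = findings
    return report
```

Because the rules live in version control alongside the systems they govern, teams can review, test and evolve them in the same way as any other code.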
Optimizing margins
Without holistically considering the factors affecting cloud costs, conversations about them may adversely impact both business and technology outcomes. Traditional budgeting cycles and processes are speculative and motivate on-premises behaviors, such as overprovisioning. These can negate the savings cloud offers and restrict the business in responding to market volatility.
Cloud costs represent consumption. Wasteful consumption that provides no customer value or business benefits needs to be identified and eliminated. This is usually achieved by increasing a service’s unit capacity while reducing its scale-up costs. Businesses should then concentrate on optimizing margins rather than just reducing costs, because costs will increase as business volumes increase.
Defining and monitoring run costs as an architecture fitness function, preferably within build pipelines, provides early cloud cost visibility with opportunities to optimally balance unit capacity and scale-up costs. Furthermore, leveraging hexagonal architectures makes it convenient and efficient to rehost business functionality between containers and functions (or lambdas) to optimize costs when traffic profiles change substantially.
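The difference between reducing costs and optimizing margins can be made concrete with a little arithmetic. In the Python sketch below, the function names and the simplification that cloud spend is the only cost considered are assumptions made purely for illustration.

```python
def gross_margin(revenue, cloud_cost):
    """Fraction of revenue left after cloud costs (other costs are ignored)."""
    if revenue <= 0:
        raise ValueError("revenue must be positive")
    return (revenue - cloud_cost) / revenue


def unit_cost(cloud_cost, transactions):
    """Cloud cost per business transaction."""
    if transactions <= 0:
        raise ValueError("transactions must be positive")
    return cloud_cost / transactions
```

Suppose, hypothetically, that monthly revenue grows from 100,000 to 300,000 while cloud spend grows from 20,000 to 45,000 and transaction volume triples from one to three million. Total cost has more than doubled, yet `gross_margin` rises from 0.80 to 0.85 because `unit_cost` has fallen from 0.02 to 0.015 per transaction. A governance conversation focused only on the absolute bill would miss that improvement.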
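A run-cost fitness function can be as simple as estimating the monthly bill from load-test results and comparing it with an agreed budget. The Python sketch below assumes a deliberately simplified pricing model (a flat price per instance, with capacity measured in requests per second); the parameter names and figures are hypothetical, and real estimates would come from the provider's pricing data.

```python
import math


def estimate_monthly_cost(requests_per_second, capacity_per_instance,
                          instance_cost_per_month):
    """Estimate monthly run cost from measured unit capacity and a unit price."""
    instances = max(1, math.ceil(requests_per_second / capacity_per_instance))
    return instances * instance_cost_per_month


def cost_fitness(requests_per_second, capacity_per_instance,
                 instance_cost_per_month, budget_per_month):
    """Return (passed, estimated_cost) for a run-cost budget check."""
    cost = estimate_monthly_cost(requests_per_second, capacity_per_instance,
                                 instance_cost_per_month)
    return cost <= budget_per_month, cost
```

In this model, a change that halves the measured capacity per instance nearly doubles the estimate, so a pipeline stage running the check would flag the regression at build time rather than on the next invoice.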
There are numerous opportunities, from product definition to engineering and operations, to lower cloud costs and optimize margins. Some key practices are listed in the following figure. The common denominator here is optimization; optimize the product, implementation, delivery and operations.
Interestingly, the 2019 State of DevOps Report found that firms implementing all five essential NIST cloud characteristics are:
2.6 times more likely to accurately estimate software operating costs
Twice as likely to identify most operationally expensive applications
1.65 times as likely to stay under their software operating budget
Putting it into practice
"Vision without execution is hallucination", Thomas Edison
Any lightweight governance framework for the cloud starts with clarity on the business outcomes. These help put in place guiding principles which are derived from the first principles described above and aligned with the business outcomes.
One or more cross-functional governance teams run the governance function, which is objective in nature and evolves with the organization's cloud journey. One of these teams ensures that the right guiding principles are in place and that the governance helps deliver business outcomes. Another team may be required to coach the rest of the technology organization on implementing practices that align with the agreed guiding principles. To be effective, this coaching may involve the complete product lifecycle, including product management practices, technology development and delivery, and operations.
Transparency is key to the success of the governance function and to demonstrating how governance is an enabler for the organization. The governance teams should openly and abundantly communicate the business outcomes, guiding principles and accepted practices. They should also create means to obtain and provide feedback to the organization on how well these business outcomes are being achieved. This would involve agreeing on a few relevant metrics, with the four-key metrics being the essential leading indicators of the performance of the organization. The governance teams should enable the organization to collect these metrics automatically and to make them available for the entire organization to view on demand.
All put together, teams have the freedom within the governance framework to adopt practices that best suit their mission. These practices are aligned with the first principles and help achieve business outcomes.
Public cloud providers have also recommended governance frameworks. Microsoft Azure defines five governance disciplines as part of its Cloud Adoption Framework governance model. These include cost management, security baseline, resource consistency, identity baseline and deployment acceleration. Similarly, AWS defines a governance framework that includes defining the governance requirements, implementing a governance operating model, and measuring, assessing and optimizing operations on AWS. Both Azure and AWS provide comprehensive guidelines and recommend tools for implementing these frameworks on their respective platforms.
On closer inspection, these frameworks align with the first principles described here with their components being either second order principles or practices. Awareness of and alignment with the first principles help make conscious outcome-focused implementation decisions when working with these frameworks.
Hybrid cloud governance can be challenging. Hybrid clouds exist because of certain constraints, for example critical data residency regulations that require some applications to stay on-premises, or performance-critical applications that are kept on-premises. The first principles still apply in governing a hybrid cloud as a single entity within such constraints. The key challenge here is having unified management and monitoring across both the public and private sides of a hybrid cloud. Implementing a hybrid cloud using the platform options provided by a public cloud provider, e.g., AWS Outposts, Azure Stack, Azure Arc or Google Anthos, may simplify this tooling challenge. Additionally, off-the-shelf management tooling options exist for hybrid clouds.
Getting started
"A year from now, you may wish you had started today", Karen Lamb
Even adopting a lightweight governance framework can be daunting. It requires a different way of thinking and operating. These first principles may sound more like business concepts than technology principles. To succeed on the cloud, business and technology need to move closer together and align strongly to deliver undeniable customer value.
But teams may want to know which practices to prioritize as they incrementally align with these principles. Because these first principles are intertwined, many of the practices described here map to more than one principle, as shown below. Such a mapping helps to prioritize practices that align with most of the principles. As discussed above, the governance function should facilitate, not obstruct, teams in adapting and adopting practices to suit their needs while achieving this objective.
For example, decommissioning unused features (3) is facilitated if the architecture is modular (7). A hexagonal architecture (8) depends on modularization. Modular and hexagonal architectures enhance testability and help shift testing to the left. These together help lower friction, increase confidence and optimize margins.
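To illustrate why a hexagonal (ports and adapters) architecture helps testability and rehosting, consider the minimal Python sketch below. The currency conversion domain, the `RateSource` port and the `function_handler` entry point are all hypothetical names chosen for this example.

```python
from typing import Protocol


class RateSource(Protocol):
    """Port: what the core logic needs from the outside world."""

    def rate(self, currency: str) -> float: ...


def convert(amount: float, currency: str, rates: RateSource) -> float:
    """Core business logic; depends only on the port, not on any cloud service."""
    return round(amount * rates.rate(currency), 2)


class InMemoryRates:
    """Adapter for fast tests; a production adapter might call a rates service."""

    def __init__(self, table: dict):
        self._table = table

    def rate(self, currency: str) -> float:
        return self._table[currency]


def function_handler(event: dict, rates: RateSource) -> dict:
    """Hypothetical entry point for a function (lambda) style deployment."""
    return {"converted": convert(event["amount"], event["currency"], rates)}
```

Because `convert` knows nothing about its hosting environment, the same logic can be exercised in milliseconds with `InMemoryRates`, wrapped by `function_handler` for a serverless deployment, or called from a container-based web service, with only the thin adapters changing.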
Governance and engineering need to work closely to implement these practices and efficiently achieve both business and technology outcomes on the cloud. And with the efficient and automated collection and dissemination of metrics, particularly the four-key metrics, both governance and engineering get early, frequent and actionable feedback to continuously improve to achieve business outcomes.
Disclaimer: The statements and opinions expressed in this article are those of the author(s) and do not necessarily reflect the positions of Thoughtworks.