Brief summary
Through the adoption of DevOps practices, we've all become accustomed to product teams having full control over their continuous delivery pipelines, all the way through to production. When organizations start out with homogeneous sets of product teams, all doing similar things, compliance can fit in pretty readily. But issues arise as the scale grows and teams want to do validation checks in different ways. Our podcast explores the idea of compliance as a product, which aims to make compliance more manageable at scale.
Full transcript
Rebecca Parsons: Hello everyone. My name is Rebecca Parsons and I'm the CTO of Thoughtworks and one of your regular podcast hosts. I'd like to welcome you to another edition of the Thoughtworks Technology Podcast. I'm joined today by one of my co-hosts, Neal Ford.
Neal Ford: Hello everyone. My name is Neal Ford. I'm a director, software architect, and meme wrangler here at Thoughtworks and another of your regular podcast hosts. We are joined by two of our colleagues today who have an interesting topic that they've been doing some research on. I will let them introduce themselves, starting with Ken.
Ken Mugrage: I'm Ken Mugrage. I'm a principal technologist in the office of the CTO, where I help with research. I have a background in continuous delivery and DevOps, which is where my interest in this topic comes from.
Carl Nygard: Hello, I'm Carl Nygard. I'm also a principal technologist at Thoughtworks. I have a background in architecture, product, and R&D across a couple of different industry verticals.
Rebecca: Our topic for today stems from an article Carl recently wrote, which was posted on martinfowler.com, about compliance in a DevOps culture. Before we get to Carl and his article, I'd like to start with Ken. Talk to me a little bit about how you view compliance pipelines and DevOps, because I know this is something you and I have spoken about quite often.
Ken: Yes. First off, I do want to say that when we're talking about compliance at this level, we really are talking about legal compliance: things like Sarbanes-Oxley, ISO, and so on. A lot of people will lump internal policy and so forth in with compliance, and that could probably follow many of the same patterns, but we are specifically talking about compliance from a legal perspective.
My background in DevOps and continuous delivery, as I mentioned, was a lot on the tooling side, so there is admittedly some bias there. One of the patterns that I've seen work a lot is-- First off, we do want to say that a product team, somebody creating an application, should have full control of their continuous delivery pipeline from code commit through to production.
I'm not a big fan of sticking other things in the middle. A lot of times you'll see a compliance check as, say, step three out of six. That means the compliance check is only going to run if the earlier steps passed, and anything after it, such as functional testing or some other type of verification, is not going to run if the compliance check fails. It also makes it hard to collaborate: who's going to write the tests? Who's going to maintain them? I was really excited to see Carl's article, which we'll get into, because it covers a lot of this.
The pattern that I'm referring to is a lot like a diamond dependency in programming; it's sometimes called fan-in and fan-out. The idea is that a product team controls their pipeline. They define the tests that are required to meet their needs, to deliver the value that they're supposed to deliver. In parallel with that, running off the same code base, we can also run tests from other organizations, compliance being one of them.
The pattern that I saw a lot was, let's say someone's building a JAR, a Java archive, as their deployable unit, just as an example. They build their JAR and then their pipeline continues with unit tests and functional tests and all of the things we expect in the test pyramid. In parallel, that JAR is pulled into another pipeline, which does the compliance checks. Then at the end, we put in a dependency: both have to pass in order for it to be deployed or released.
What that does is make sure that everyone's tests run all the time. The compliance checks are going to run even if the team has a failing unit test they need to fix, and vice versa. Yet it still enforces that everything has to pass to go forward. We've found that to be a good pattern for keeping teams fast while still putting in the necessary controls.
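To make the fan-in concrete, here's a minimal sketch of that final dependency, assuming a CI server that exposes a simple JSON status API. The endpoint, pipeline names, and response shape are illustrative assumptions, not details from the podcast or the article.

```python
# Hypothetical sketch of the fan-in gate Ken describes: the same artifact
# (e.g., a JAR identified by its build revision) must pass both the product
# team's pipeline and the compliance pipeline before release is allowed.
# The CI-server API and pipeline names here are illustrative assumptions.

import requests  # assumes a CI server exposing a simple JSON status API

CI_BASE_URL = "https://ci.example.com/api"  # hypothetical endpoint

def pipeline_passed(pipeline: str, revision: str) -> bool:
    """Ask the CI server whether `pipeline` went green for `revision`."""
    resp = requests.get(f"{CI_BASE_URL}/pipelines/{pipeline}/runs",
                        params={"revision": revision})
    resp.raise_for_status()
    return any(run["status"] == "passed" for run in resp.json())

def release_allowed(revision: str) -> bool:
    # Both legs of the diamond must be green for the *same* revision;
    # neither pipeline blocks the other from running.
    return (pipeline_passed("product-team-pipeline", revision)
            and pipeline_passed("compliance-pipeline", revision))

if __name__ == "__main__":
    rev = "abc123"
    print(f"Deploy {rev}?", release_allowed(rev))
```

The design point is that neither leg of the diamond blocks the other from running; the dependency only appears at the release decision.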
Rebecca: Thanks, Ken. Carl, tell us a little bit about these patterns that you've come up with and what actually prompted you to put the article together in the first place?
Carl: What prompted me to put the article together was actually some work we were doing with one of our clients, who were on what is actually the first pattern in the article, basically an onerous manual process. That process was causing quite a lot of grief and friction, to the point where people were flat out trying to avoid it. That bled into other behavior patterns and architectural decisions: if you look holistically, the overall goal is to get products into production, and decisions were made with that as the outcome. If you shrink your perspective and ask, "Well, what about the architecture?", the decisions were actually incredibly poor for the architecture. Over time that causes additional problems, because now you don't just have the friction with compliance, you also have the friction from the poor architecture.
As part of that, we were working on better ways to do it. I'm part of EMPC, that's enterprise modernization, platform and cloud, and as part of that group, Thoughtworks has a lot of experience implementing different forms of compliance. In the past few years, we've been working on and deploying various forms of what appears as the fourth pattern, which is compliance at the point of change. That was the driver. In terms of the patterns, of course, there are different flavors and degrees of each, but for the purposes of the article, I tried to boil each one down to its simplest but most complete form, starting with manual compliance, which is typically just form-based: I need to fill out a PDF or submit a series of forms to get approval to deploy into production.
As I mentioned, that generates a lot of friction, especially as you grow. It may be the best choice for a small company that doesn't have a lot of compliance activity or compliance checks, but as it grows and the teams get more diverse, the work involved in manual compliance becomes excessive and starts generating other forms of friction.
Typically, as a company grows, it might start looking at automating things, and that leads us to the second pattern, which is compliance inside the pipeline. As Ken described, you have a pipeline that developers are using, but part of that pipeline is defined by a secondary organization like compliance. Passing that pipeline is what says I'm okay to go into production.
One of the issues here, again, is scale. This is appropriate for companies that are growing; as long as they have, let's say, a homogeneous set of teams where everyone's pretty much doing the same thing, it works great. But as the scale grows and teams start doing different things, they start wanting to perform their validation checks in different ways. For example, I want to use Python, but I don't have a compliance pipeline suited for Python, so that's holding me back. Now I have to sit and wait for the compliance organization to spin up a compliance pipeline suited for Python. That's a set of frictions that come, essentially, from a shared central service with central ownership. That's really where a lot of the friction comes from.
Again, you get the same types of effects. People start bypassing, or making decisions based on the friction associated with compliance rather than on what's right for the architecture. A lot of times, as teams grow, they look at containers and say, "Wait a second. How about if I just get a container, certify the container, and now I have a building block that's a bit larger, and teams can reuse my container?" Just like centralized pipelines, you have the same problems at scale and with diversity: to get a blessed container, you have to sit and wait in line for some other team to create it, or you create it yourself and still go through a normal manual certification process.
There's also the question of how much I can use or configure this container, because with containers I can actually put in another layer and change the configuration. For example, turn off the encryption at rest that came with the certified components. That's dangerous. So it's not necessarily something that's going to be the solution at scale. That brings us to the fourth pattern, which essentially separates the checking from the compliance validation, so they don't have to occur at the same time.
What that looks like is: all the normal checks that teams are doing, they're still doing, and they own those checks. They share those checks, but they're free to self-solution as well if they have different requirements. The information essentially goes into a repository. Then, let's say in the case of a Kubernetes installation, an admission controller actually checks that all of the required tests have been run, that all the results, the evidence that says this should work in production, are there, and that I trust all the answers I've got. That lets me validate that if I'm trying to change production, I'm allowed to, because someone has done all the required checks before I got to this stage.
Because you're splitting apart those responsibilities, you can reduce the friction. It doesn't get rid of it totally. If a team wants to do something different, they're still going to have to get that difference, their individual flavor of compliance, approved. But what they're actually proving is a much smaller piece of the whole. It tends not to hold them back from making progress or getting feedback from the other parts of the process.
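As a concrete illustration of that fourth pattern, here's a minimal sketch of a Kubernetes validating admission webhook that consults an evidence repository before admitting a workload. The evidence store, the required check names, and the lookup function are all hypothetical; a real implementation would also need TLS, webhook registration, and signed evidence.

```python
# Minimal sketch of "compliance at the point of change": a Kubernetes
# validating admission webhook that only admits workloads whose images
# have recorded evidence for every required check. The evidence store,
# check names, and lookup below are illustrative assumptions.

from flask import Flask, jsonify, request

app = Flask(__name__)

REQUIRED_CHECKS = {"unit-tests", "cve-scan", "license-audit"}  # hypothetical

def evidence_for(image: str) -> set:
    """Checks recorded as passed for this image. A real system would query
    a (signed) evidence repository populated by the teams' pipelines."""
    fake_store = {
        "registry.example.com/app:1.4.2":
            {"unit-tests", "cve-scan", "license-audit"},
    }
    return fake_store.get(image, set())

@app.post("/validate")
def validate():
    review = request.get_json()
    pod = review["request"]["object"]
    # Collect any required checks that lack evidence, per container image.
    missing = {
        c["image"]: sorted(REQUIRED_CHECKS - evidence_for(c["image"]))
        for c in pod["spec"]["containers"]
        if REQUIRED_CHECKS - evidence_for(c["image"])
    }
    response = {"uid": review["request"]["uid"], "allowed": not missing}
    if missing:
        response["status"] = {"message": f"Missing evidence: {missing}"}
    return jsonify({"apiVersion": "admission.k8s.io/v1",
                    "kind": "AdmissionReview",
                    "response": response})

if __name__ == "__main__":
    # A real cluster would run this behind TLS and register it via a
    # ValidatingWebhookConfiguration; plain HTTP here is for illustration.
    app.run(port=8443)
```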
Rebecca: How do you decide, in that fourth pattern, the right granularity of checks? It's the classic problem in many architectural decisions: is this too big or too small? What's your sense of how to characterize the right size of one of these checks, these individual steps in that pattern?
Carl: I think, like anything, it depends. It strikes me that that question is very similar to "How big should my microservice be?" Well, it should be as big as it needs to be, but no bigger. There is no one size fits all. The point to keep in mind is that you can start with what you think is the best granularity, but like any other product, if you treat compliance like a product, what you're really trying to do is enable teams to deliver software safely. That means the granularity of what you deliver is based on how your teams are using it.
You don't have to find the perfect solution right out of the gate. You can make a guess, try it, deliver it, see how people are using it. You'll get feedback that says, "Oh, I would like that, but I only want 70% of that thing you delivered to me. I want to refactor it, and I want the freedom to change part of it." How you deliver it is really based on how your customers are going to use it, and your customers are the other development teams within your organization.
Neal: I think that atomicity is better driven by a specific compliance outcome that you want or don't want, versus something like a golden blessed image, because you get into this infinite-regress problem of just how granular the pieces of that are, and which parts are blessed and which are not. Really what you're getting at, I think, is that automated compliance mechanisms provide just the friction that you want. The alternative in many organizations is the big, giant, heavyweight governance model, which is built to create all these bureaucratic checks, but mostly what it does is validate against itself: are you using the governance model? It doesn't actually provide any value. It just becomes its own self-validating thing without actually giving you feedback.
Whereas picking some level of compliance and some level of granularity that adds value, like you say, allows you to start building those things and iterating on them, and to keep focusing on value rather than on the framework itself.
Ken: Neal just said value, and, Carl, you said something earlier about treating compliance as a product. I really love that reference. It's another topic that Rebecca and I have talked a lot about. I love the whole as-a-product movement, and I really want to stress that we mean product in the largest, most holistic sense: customer-focused, feedback-driven, willing to pivot and change what you're working on, and so forth. I love that idea for when we have to have separate compliance teams, and I do think we have to have them.
I don't think that a cross-functional team can truly have every single skill and every single background on it. That whole idea of acting as a product team, listening to your customers, meeting their needs, providing value, I love that, as opposed to just another silo.
Carl: To build on that, I think it's also important for the compliance organization to express their goals and outcomes rather than express an implementation in the form of a requirement. That's a pattern we see quite often. For example, one of the organizations I've spoken with said, "Hey, we have a requirement that we have to redeploy every 30 days. That's part of our compliance." My question is: okay, that's nice, but what problem are they trying to solve? What risk are they trying to mitigate? Is it disaster recovery, making sure I know how to deploy this piece of software over time? Is it that I want to make sure I don't have CVEs, and the only way for me to do CVE scanning is through the pipeline?
Or is it that I want to make sure there's no single point of failure, and that I can actually remediate things of that sort? Depending on what they're actually focused on, for example, if it's CVEs, a better solution is to have CVE scanning in the cluster, in your live environment. That's better than requiring someone to redeploy every 30 days. If it's that I want to know I have reproducible builds, I can work on reproducible builds and not have to go through the process of deploying something.
Expressing the outcomes that you're looking for allows the teams to self-solution, but even then, you don't necessarily want every team to come up with their own individual solution. You also need to work with your central teams to say, "Here's what we consider best practice. Here's an example." Or, "Here's your starting point; it works out of the gate. If you need to deviate, you can self-solution, but we've done most of the work for you already."
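As a sketch of the "scan the live environment" alternative Carl mentions, the following enumerates the images running in a cluster and scans each for critical CVEs. It shells out to kubectl and Trivy, both real tools, but treat the exact flags and output schema as assumptions to verify against your installed versions.

```python
# Hedged sketch of "CVE scanning in the live environment" as an alternative
# to a blanket redeploy-every-30-days rule: list images currently running
# in the cluster and scan each with a scanner such as Trivy.

import json
import subprocess

def running_images() -> set:
    """List container images currently running, via kubectl."""
    out = subprocess.run(
        ["kubectl", "get", "pods", "--all-namespaces", "-o",
         "jsonpath={.items[*].spec.containers[*].image}"],
        capture_output=True, text=True, check=True).stdout
    return set(out.split())

def critical_cves(image: str) -> list:
    """Scan one image and return its CRITICAL vulnerability IDs."""
    out = subprocess.run(
        ["trivy", "image", "--severity", "CRITICAL", "--format", "json",
         image],
        capture_output=True, text=True, check=True).stdout
    report = json.loads(out)
    return [v["VulnerabilityID"]
            for result in report.get("Results", [])
            for v in result.get("Vulnerabilities") or []]

if __name__ == "__main__":
    for img in running_images():
        cves = critical_cves(img)
        if cves:
            # Feed this into alerting/remediation; it addresses the risk
            # directly instead of mandating a redeploy cadence.
            print(f"{img}: {cves}")
```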
Neal: Yes, Rebecca and I have long argued that enterprise architects should not be in the business of specifying technologies, but instead should specify the outcomes they want from parts of their ecosystem. It's very outcome-focused, around the capabilities they need, not choosing technologies, because the further you get away from the technology, the less qualified you are to actually make choices about it. That meshes nicely with the things we've talked about with architectural fitness functions.
Rebecca: Yes, there are actually quite a few similarities between the kinds of things we talk about with evolutionary architecture and these compliance patterns. I want to get back to the granularity question, because I think it also gets at what this compliance-as-a-product should look like. I like what you were saying about focusing on as big as it needs to be, but no bigger.
Maybe that means I want to break it up because, using your example, I need 70% of this. If we start to look at the right quantum of reuse, if you will, we can look at this from the perspective of the technology stack. Something might vary depending on whether it's Java or Python, or it might vary as a result of a deployment target. There are all kinds of technology-focused dimensions that might say, "Okay, I need one of these for Java and one of these for Python," et cetera.
Then there are also different axes of compliance. This is HIPAA, or this is something to do with encryption, et cetera. Can you talk to me a little bit about how that has played out and some of the examples where you've seen this put into practice? How difficult is it for people to get to that stage where they've got this wonderful set of building blocks that are ripe for reuse because people are actually reusing them?
Carl: Sure. I think a lot of it depends on the requirements you're trying to meet. NIST standards are a common example. Within NIST, there are various guidelines and guardrails: no SSH access as root, don't run daemons as root in your containers, things like that. I think you can size the checks based on the requirements you have to meet for the use case.
Let's say you think of two axes. On one axis you have: what are my requirements, and how are they grouped? On the other axis you have: what kinds of services do I have? If I have a service that deploys containers, then I need all the container checks. You can baseline those and say, "These are the ones that apply no matter what container I put into place." That's a logical granular group, and it's not going to change based on what I'm delivering.
Other things that might change based on what I'm delivering, those are the ones that make sense to refactor into common blocks of functionality, so that what applies depends on what I'm delivering and what environment it's in. Embedded devices, for example, are a totally different arena. You want to slice and dice your granularity based on how you're actually using it.
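To illustrate what one of those baseline, applies-to-every-container checks might look like, here's a simplified sketch of a "don't run as root" rule packaged as a reusable block. The rule set and the naive Dockerfile parsing are illustrative assumptions, not a real policy engine.

```python
# Illustrative sketch of a baseline container check of the kind Carl
# mentions (e.g., don't run daemons as root): a small, reusable rule
# block a central team could ship as a starting point. The parsing is
# deliberately simplified and is not a full Dockerfile linter.

from pathlib import Path

def last_user(dockerfile: Path) -> str:
    """Return the effective USER at the end of the Dockerfile.
    Docker defaults to root if no USER instruction is present."""
    user = "root"
    for line in dockerfile.read_text().splitlines():
        stripped = line.strip()
        if stripped.upper().startswith("USER "):
            user = stripped.split(maxsplit=1)[1]
    return user

def check_not_root(dockerfile: Path) -> list:
    """Baseline rule: the final USER must not be root / uid 0."""
    user = last_user(dockerfile)
    if user in ("root", "0"):
        return [f"{dockerfile}: container runs as root (USER {user})"]
    return []

# A "baseline" group that applies to every container regardless of what
# the team is delivering; stack-specific rules would live in other groups.
BASELINE_CHECKS = [check_not_root]

def run_baseline(dockerfile: Path) -> list:
    findings = []
    for check in BASELINE_CHECKS:
        findings.extend(check(dockerfile))
    return findings

if __name__ == "__main__":
    print(run_baseline(Path("Dockerfile")))
```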
Rebecca: I'm sure this is a question you've been asked a lot. Compliance organizations aren't necessarily known for embracing agility and the culture around test automation and deploying fast. What are some strategies you've used to get compliance organizations to view the relationship differently and even consider this as a possibility? Or is the sheer weight and friction of the bureaucratic process enough to make them more readily accept doing things in a very different way?
Carl: That's a great question. I think the first part is that both the compliance organization and the developer organization really need to have empathy for the goals the other is trying to meet. One thing that occurs to me is that when the compliance organization is focused purely on minimizing risk, their attitude tends toward "you can't do this." Whereas developers want to develop, to deliver value to their customers. That product mindset is really important: from a compliance perspective, you're shifting your focus to say, "I'm trying to enable teams to deliver software safely."
That's my end goal, the outcome I'm trying to achieve: not preventing teams from making a mistake, but enabling them to do something well. When you start looking at it that way, the cooperation, the collaboration, the shared outcomes, the aligned goals all become a lot clearer. From a development perspective, you want the compliance team as a partner. That means involving them earlier in the process, inviting compliance folks to the table when you're doing the design, when you're speccing out the features you want to deliver. That way you have, number one, early warning that you're doing something that might need extra attention, and you can also work with compliance to do things in ways that fit more easily into the managed compliance process.
The collaboration between the two teams is really, really key. As Ken said earlier, you don't have enough resources to have that role totally embedded, but that doesn't mean you can't invite them to the table, that you can't involve them at the right points in your development process, so that you are collaborating and essentially smoothing the way forward both for compliance and for the developer teams.
Ken: Yes. I would say invite them to the table literally. A former colleague who has since retired, Joanne Molesky, who ran a lot of our internal compliance and InfoSec, gave a talk one time where she said, "Hey, when you're spinning up a new team, when you're spinning up a new product, invite someone from the compliance team to come sit with you for several days and pair on things and understand both ways," again, building that empathy.
What is it we're going to be creating? What are the systems we anticipate we're going to touch? What kinds of data do we anticipate we're going to be working with? Then really work together to decide what the right level is. The old example: if I'm building cat GIFs, there's less compliance risk than in financial systems. That literal invitation to the table, pairing, and saying, "Hey, we both have the same goal." The compliance folks, believe it or not, do want you to build more products and be more successful as a company, just in a safe way. I really think that working together and building empathy is vital.
Neal: I think that two-way communication is important because otherwise you get the people who lean into this too much, go to the ivory tower, and build a whole bunch of conflicting compliance mechanisms that no one can get to pass holistically, because they all overlap each other. Having some sort of two-way communication and empathy is, I think, a great way to avoid that. That brings up the obvious question: if you've identified some patterns here, have you also identified some attendant anti-patterns that people should avoid or look out for if they're going down this path?
Carl: Yes, there are some anti-patterns. I think the biggest one that we see generate friction is central ownership. There is value in having a central team responsible for, let's say, the best practice or the starting point, but forcing every team to go through and use something that may not be appropriate for their specific purpose or context generates a lot of friction.
The bigger the organization gets, the more friction you're going to generate, and at a certain point that friction starts driving other decisions. Like I said earlier, holistically you're trying to get something into production, so you make decisions that make sense for getting something into production, but if you shrink your perspective, they don't make sense for the architecture. You just start creating other problems here, there, and everywhere.
Maintaining a balance of ownership, while still enabling economies of scale via a good, comprehensive starting point, is really important. But don't lock everything down, because then you have all the problems inherent in central ownership of something that's needed by many teams. It boils down to inappropriate coupling. Treat it the same way you would manage any other form of coupling.
Rebecca: One more question. You can see these patterns as a progression. In our industry, we love our maturity models. Here's where you start and everybody aspires to be at the top level. Is that always an appropriate aspiration or does some of that depend on scale? I assume it depends, in part, on the maturity of the organization.
You obviously don't want to build out a comprehensive suite of compliance tools when you don't have any idea what your product is supposed to be doing. But if you think about steady state, should everybody be aspiring to that final pattern, with the distinction drawn between the checks and the "yes, we can deploy"?
Carl: I would say no, for the simple reason that that fourth solution is inherently complex. Unless you have friction and pain proportional to the complexity you would need to implement to fix that pain, you shouldn't do it. However, that doesn't mean you can't pick bits and pieces from the ideas presented there and use what's appropriate for your context.
By and large, you need to pick the solution that's the right size for your organization. The article tries to go through when each one is good and when it tends to start falling apart as things get bigger. It also tries to illustrate some of the behavior patterns you start to see. As an organization, you might start off as a startup with manual compliance, and that's totally fine.
As you grow, you'll see the effects of not handling compliance in a way that's appropriate for your scale. Those are the signs to look out for to decide when you should level up or move to another solution. Depending on what pain you're feeling, you might pick containers, you might pick pipelines. It really just depends on the actual friction you're feeling and the context in which you're operating.
Rebecca: This one's really more for you, Ken. How would you propose an organization get started on this mindset shift to compliance as a product?
Ken: That's a really tough question, because culture is always the hard part. It does come down to, as we've talked about, focusing on outcomes and value. If you're inside an organization and, as Carl mentioned, you're having this friction, and compliance, or even security, is seen as the department of no, the thing we have to overcome, the people we have to get past, that's a smell or a symptom that, hey, we need to do something here.
What a product focus does for a team, whether it's creating printed material or software or compliance code, is allow you to say, "These are the things that are most important, and these are the things that aren't." Really focus on that and react to the needs of the organization. And build some trust there, because we've all used products where we don't trust the maker, so we don't bother giving them feature requests. If you can build some trust bidirectionally, I can say to my compliance organization that's acting as a product service, "These are the problems I'm having. I need your help solving them."
They are responsive and say, "Okay, let's think about this. If we do this, it might help you in this way", really working together to come up with a solution, not a, "Hey, compliance team, I need you to write a test that checks this against that."
Don't define the solution, just as you wouldn't with any other feature request; work together to come up with it. I should say, and this might be a little bit controversial, it does require certain kinds of organizations. We talk about generative versus pathological organizations and what it's like to work inside them. It does require an open mind to say, "Okay, we're going to work as a product team. We're not just going to be the department of no," and, as I mentioned, it does require some trust.
I know that was kind of roundabout. Rebecca, I'm not sure it answers exactly the question, because I don't know if there is a direct answer to it.
Rebecca: That's fine. Well, another fascinating discussion. I like thinking about ‘as a product.’ Our podcast group is going to have to brainstorm what other ‘as a products’ we might want to talk about in the future. Thank you, Carl. Thank you, Ken, for participating in this discussion. Pleasure as always, Neal, to co-host with you. Thank you all for joining us for another edition of the Thoughtworks Technology Podcast.