Brief summary
We take a deep dive into why organizations are looking at creating platforms on top of their cloud infrastructure, and how they go about it. Our co-hosts, Alexey Boas, Head of Technology for Thoughtworks Brazil, and Ashok Subramanian, Head of Technology for Thoughtworks UK, are joined by Alexandre Goedert, Head of Technology for Thoughtworks Chile. Together, they look at the complexities of building digital platforms that give teams sufficient freedom to innovate while maintaining some standardization.
Podcast transcript
Alexey Boas:
Hello and welcome to the Thoughtworks Technology Podcast. My name is Alexey, I'm the head of technology for Thoughtworks Brazil, and I will be one of your hosts this time, together with Ashok. Hello, Ashok.
Ashok Subramanian:
Hello, Alexey. I'm Ashok, I'm the head of technology for Thoughtworks in the UK, and we have Alexandre with us. Alexandre?
Alexandre Goedert:
Hi everybody, thanks for the invitation again for the second episode. It's a pleasure to be here. I'm working currently as a head of tech for Chile.
Alexey Boas:
Yeah, thank you very much for joining us again, Alexandre, we're delighted to have you back with us. In the last episode, we talked a lot about the journey to migrate to the cloud. We touched on why go to the cloud at all, and we talked about different migration strategies, from lift and shift to completely rearchitecting the application. We talked about tools, containers, security, and even some human aspects, like changing the mindset of the organization towards the cloud itself, and not treating it as just a tool or platform. Now, if we take all of that for granted, what is the next step for an organization?
Alexandre Goedert:
Usually, many organizations start moving because they have some urgent matters to attend to, that's pretty common. Companies have problems: for example, they are searching for better stability, they want to have better cold state infrastructure, better backup options or better redundancy. Sometimes there's business pressure so they cannot wait, and it's understandable, but as we discussed previously, it's not worth doing everything as a lift and shift.
Alexandre Goedert:
Just to quickly recap what we talked about: when we say lift and shift, we mean we are just recreating the same infrastructure, from on-premise or from a source cloud, in a target cloud. In this move, we might lose some of the more interesting benefits of cloud-native architectures and the more modern infrastructure that we could use. I think the point now is, what's the next step? What can we do after we have already migrated a big chunk of infrastructure?
Alexey Boas:
Yeah, at Thoughtworks, we've been talking a lot about digital platforms and creating a platform on top of the cloud platform. What's your experience with that? Or maybe we can start with, why would we do that at all? What's the value of that? We're already getting a level of abstraction from the cloud provider, so does it make sense to build a platform on top of that, and why would you do that?
Alexandre Goedert:
That's a very good question. It's interesting, because if you think about the type of services that the cloud providers are delivering today, the scenarios are becoming more and more complex, and they try to offer very complex bundles. For example, if you take GCP, nowadays you have a place to put your code, a source repository, and there's Cloud Build, which is a tool for running a continuous integration pipeline. You might wonder, why do we need to do something on top?
Alexandre Goedert:
I think the point is, many times the companies already have their own way of doing things, and other times, if you check the whole package provided by a cloud vendor, the pieces have different levels of maturity as well. For example, a given company might not want to use Cloud Build. They might want to use Jenkins because they are already familiar with it and they have already invested a lot of time and money in building scripts for Jenkins to capture pipelines as code, so I think that's one reason.
Alexandre Goedert:
We try to combine what the company has already invested in with the new stuff that is appearing on the cloud. You have to integrate all these things, and that's what we call a platform. It's basically a way for the development teams to reuse all these things, to bootstrap a new development environment so they can easily deploy applications, easily find new APIs that they can consume, and create new business concepts. It's a way of speeding up the business, starting from the infrastructure layer.
Ashok Subramanian:
From your perspective, your experience, the way you're describing this digital platform is something that will give the teams the necessary building blocks and flexibility to accelerate the development to create new business features, is that correct?
Alexandre Goedert:
Sure, it's correct. I think that's the main idea behind it. It's sometimes a challenge to integrate all these different pieces, and when we talk about a platform, we want to integrate things like the SCM repository, the continuous integration pipeline, the infrastructure itself, and things like secret management. Nowadays, we want to provide secret management as a service that is accessed from every different point in the company, and also monitoring tools. We want to package all these things together and make the package replayable as code, and also disposable if possible. I think that's the main type of work we do, and as you said, the goal is to accelerate the building of new applications and deployments.
Alexey Boas:
I find it interesting when you say accelerating, because I have seen situations in which the technology team was actually trying to hide the cloud provider, even to avoid lock-in or something like that, so trying to hide many of the resources and features of the cloud provider behind that platform. That actually made it harder for the teams, because the teams knew about the providers, but they couldn't use things. That mindset of accelerating and facilitating what the team has to do, I think it's important. How do you feel about that?
Alexandre Goedert:
I think that's one of the main themes when we ask, why do we need a platform on top of another one? On one hand, you want to leverage what the company already has, but you don't want to be a blocker for all the new stuff that the cloud provider is offering. There are a lot of interesting things around artificial intelligence, machine learning, and big data, and of course we want to use those as well. It's obvious that these things will cause some level of lock-in, but we don't think that's the main problem. We just want to build something that is easy to evolve and to adapt, and that's very important.
Alexandre Goedert:
We don't want to create a blocker, or a layer over the cloud that is really frustrating for the devs. We want to build something that, if the developer doesn't like it, can be changed, can be evolved. That's the thing about building it as code, and also making sure that developers start having some experience with it, because then we can create a collaboration around the platform itself.
Ashok Subramanian:
So almost taking a product mindset, but from a developer and team perspective, to make sure the end users are able to do the job that they want to do, rather than repeating work that possibly everyone else is going to repeat.
Alexandre Goedert:
I think you might say that it's a product mindset applied to infrastructure, because when we talk about product, for example, we want to connect with what the customer needs. The customer, in this case, they are the developers. I think it's very easy to, if you're just guided by infrastructure needs, to build something that is not usable for anybody else. I think it's very important to get very constant feedback from the developers and to make sure that what you're building is actually useful.
Ashok Subramanian:
When you're looking at building the platform, what are some of the considerations? For instance, how do you handle the tension between standardizing some things, like APIs and so forth, and leaving flexibility for the teams to experiment? In your experience, as you've gone about building digital platforms, what's the approach or the journey?
Alexandre Goedert:
I think that's an interesting point, because initially, especially when we are starting the cloud migration with a new vendor, we as a consulting company sometimes have some knowledge that the client doesn't have, so it's important to make sure people understand the tools. There's a risk of creating something very early on without having all the knowledge. We actually recommend spending some time initially to play with the tools, to quickly try out some different deployment scenarios, try to measure them, and see if they work for the devs and if they pay off in the long run.
Alexandre Goedert:
I think after some time, we can start seeing some common patterns. For example, it's common in big companies to have many development teams with very similar infrastructure needs. If I have 10 development teams that need to deploy a static website, or pieces of the same static website, I'm not going to use 10 different infrastructures. It's going to be a nightmare in terms of maintenance. Even if you imagine that each team can maintain its own infrastructure, in the real world, there are dependencies. There are problems that they will need to escalate to some centralized infrastructure department or something.
Alexandre Goedert:
We want to have a minimal level of standardization of these things, so I would say that this is the second step. We start experimenting, find the common patterns, analyze what works best for each team, and then start building some common building blocks. For example, when we talk about these building blocks in the cloud, we can have one configuration that is a storage bucket for static files with load balancing in front of it, connected to a CDN configuration, so we have static files cached and distributed quickly.
Alexandre Goedert:
We can also have, for example, some web application firewall configuration to protect against distributed attacks and things like that. We can pack everything together as code, and we can deliver it to the team that has this need. Then we just scale up to X teams, and then we can build other scenarios. For example, we can have a backend application with a Kubernetes cluster connected to a database that could be Cloud SQL in GCP, for instance. Perhaps in this case the load balancing could be an Ingress, and all these things we pack and then scale up. I think that's the general idea when we start building this base, this foundation for the delivery infrastructure.
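As a minimal sketch of the building blocks described above, the two scenarios can be captured as data and handed out to several teams from one shared definition. The class name, catalog keys, component labels, and team names here are all hypothetical and chosen only for illustration; a real platform would express these blocks in its infrastructure-as-code tool rather than in Python.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BuildingBlock:
    """A reusable bundle of cloud components that a team can request."""
    name: str
    components: tuple


# Hypothetical catalog mirroring the two scenarios mentioned in the conversation.
CATALOG = {
    "static-site": BuildingBlock(
        name="static-site",
        components=("storage_bucket", "load_balancer", "cdn", "web_application_firewall"),
    ),
    "backend-app": BuildingBlock(
        name="backend-app",
        components=("kubernetes_cluster", "cloud_sql_database", "ingress"),
    ),
}


def provision_plan(block_name, teams):
    """List the resources each team would get from one shared block definition."""
    block = CATALOG[block_name]
    return {
        team: [f"{team}-{component}" for component in block.components]
        for team in teams
    }


if __name__ == "__main__":
    # Two hypothetical teams reuse the same static-site block.
    for team, resources in provision_plan("static-site", ["checkout", "search"]).items():
        print(team, resources)
```

The point of the sketch is that the block is defined once and only the team name varies, which is what makes scaling to many teams cheaper than maintaining one infrastructure per team.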
Alexey Boas:
How has the reception been from the teams? In part one of this discussion, we talked about shifting the mindset of the organization. Is every team ready to embrace the work of managing their own infrastructure, and how do you see that working?
Alexandre Goedert:
That's interesting too. We've seen that there are different levels of interest and maturity around infrastructure, and of course people are usually busy. They have their own backlogs in their teams, so people need to be interested in infrastructure to accommodate these things. What we've seen is a tendency for backend people to be more interested in infrastructure, usually, because I think they are closer to it and they already have some experience, so it's easier for them to start experimenting.
Alexandre Goedert:
Front end teams, when we have that separation, and it's not always like that, usually seem more distant. If someone cannot accommodate the burden of dealing with their own infrastructure, what we try to build is some bootstrap mechanisms. For example, we can have a command line interface that is usually user-friendly for a dev, which they can use to bootstrap one of these buckets of infrastructure they need, and just start quickly.
Alexandre Goedert:
Then if they need to change something, they can also do it by code afterwards. We don't want to use this command line interface as a way to build everything in infrastructure, otherwise it's going to be a fat layer over the cloud. It's just a way to quickly allow them to move, while still allowing evolution afterwards. There are other teams that just say, "Okay, give me the code. I know what I'm doing, I already have some expertise with Terraform, and I want to experiment." That's another scenario as well.
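A bootstrap command line interface of the kind described here could look roughly like the following sketch. The program name, block names, flags, and output path layout are all assumptions made up for illustration; the conversation does not describe the actual tool.

```python
import argparse

# Hypothetical block names; a real platform would define its own catalog.
AVAILABLE_BLOCKS = ("static-site", "backend-app")


def build_parser():
    """Build a small CLI that asks only for what a developer must decide."""
    parser = argparse.ArgumentParser(
        prog="platform",
        description="Bootstrap an infrastructure block for a team.",
    )
    parser.add_argument("block", choices=AVAILABLE_BLOCKS, help="building block to bootstrap")
    parser.add_argument("--team", required=True, help="team that will own the generated code")
    parser.add_argument("--env", default="sandbox", help="target environment")
    return parser


def bootstrap(argv):
    """Return the (hypothetical) location where the starter code would be generated."""
    args = build_parser().parse_args(argv)
    # The CLI only produces a starting point; later changes are made in the code itself.
    return f"infrastructure/{args.team}/{args.env}/{args.block}"


if __name__ == "__main__":
    print(bootstrap(["static-site", "--team", "checkout"]))
```

Keeping the interface this thin reflects the concern raised above: the tool gets a team started quickly, and everything after that happens in the generated code rather than through more CLI options.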
Alexey Boas:
It's interesting, it looks like actually the classic trade off between standardization and flexibility and ability to innovate and to move faster for the teams as well, right?
Ashok Subramanian:
Yeah, keeping the ability to innovate quickly, but also, I suppose, striking the balance between centralized teams that are building this infrastructure out and being able to get the innovation back from teams as well. I suppose it's like the 80/20 rule, where you cover most of the common use cases but give teams the autonomy to work on their own in those cases that don't fit the common pattern.
Alexandre Goedert:
Yes, of course it's very important to allow people to experiment. One thing we try to do is think about a sandbox environment that is easily consumable by the teams. For example, if I can easily bootstrap a sandbox environment and I have some way to monitor costs, the teams can have a budget for experimenting. If we give them a very detailed view of how much it's costing them, it's easier to experiment. Nowadays, new tools in the cloud come out almost every month, so we need spaces for experimentation, and it's easier to just give this space to the teams.
Alexandre Goedert:
Usually what happens is, teams start to experiment, they find something interesting, and then they can give us feedback and send a pull request for the platform: "I want to use this new tool, I think it's interesting, I just found this Terraform provider that you can use." That's how the cycle works.
Ashok Subramanian:
I think that's a good tip that you've shared, that the sandbox, I think it's a good pattern for teams to be able to follow to experiment in a safe manner, but also understand the cost and other sort of impacts as well.
Alexandre Goedert:
Yeah, exactly. Nowadays, I think there are a lot of tools to do cost tracking in the cloud. It's not always easy to come up with a process that is transparent in the company, so many times the technology part is the easy one, and we have to find out how to build up a process in the company to connect these bureaucratic things, like cost centers with teams, and teams with the cloud. This is also part of the work of having a digital platform: facilitating all of this process with technology.
Alexey Boas:
In your experience, what are the specific things that have helped the most in speeding up the teams? What are the most reusable parts? Things like configuration and network policies, which tend to be company-wide, come to mind, but in your experience, what are the things that become highly reusable and help the teams ramp up?
Alexandre Goedert:
There are some parts of infrastructure where, even if a development team wants to start contributing, it might be tricky. For example, networking policies are something where usually a few key people have more knowledge and experience. We usually want to capture that first, and these are things that don't change very often, so we have some ways to do that. For example, in GCP, we have the concept of a shared VPC. A VPC is a virtual private cloud, and usually each project in GCP has its own VPC. If you don't do anything, a default VPC is created for you, and it works like a logical network in Google.
Alexandre Goedert:
If you want, you can create customized rules, like firewall rules and network policies, to allow people to operate safely, and you can put this configuration in a special project that acts as the host project, with service projects connected to it that share the same configuration. That's one possible scenario that we've seen happening already. I think that once we capture all these things that are harder to change, it's easier for teams to experiment, because other things like, I don't know, a NoSQL database, Memorystore, or big data tools are more closely related to the applications, and people can experiment with them more easily. That's one possible approach for speeding up teams with the things they know most.
Ashok Subramanian:
You've mentioned concepts like infrastructure as code a few times. I suppose one of the things around a digital platform would be the pace and speed at which the teams would like to move. As part of that pace and speed, when we are writing software, we know quality is really important and we know how to achieve it. In your experience, what have you seen, and what have teams been using for quality around the infrastructure?
Alexandre Goedert:
That's also something that is changing, because as we capture more infrastructure as code, we want to be sure that we have proper quality. Whenever possible, we try to apply the same engineering practices we were already using for application code. The first thing is, how can we build a continuous integration pipeline for infrastructure? It's possible, and we've been doing that, but the approach, especially for testing, is not the same as the one we used to have for application code. For example, we are dealing more with Terraform nowadays, and Terraform has a declarative syntax, so you basically express what you want to build in your cloud provider.
Alexandre Goedert:
Then it connects to the public API of the cloud provider and builds the stuff you need. Usually that's done by what we call providers in Terraform. There are a lot of providers, they are written in Go, and they extend Terraform to have all these different functionalities for different cloud vendors. First of all, we don't want to unit test the declarative code, because we don't consider it something that would break, since we assume someone else has tested it already.
Alexandre Goedert:
What we want to do at this level is just a syntax check. There's a command called terraform fmt, and we want to make sure the code is well formatted. We run terraform plan, because we want to make sure that Terraform understands all that syntax and is able to apply it in a later stage. That's all for what would be the unit test approach. Afterwards, we want to do an integration test. We actually run terraform apply, all this infrastructure gets created for real, and then we want to use tools to test whether it was really created in the way we expect.
Alexandre Goedert:
For this type of test, we have been using Terratest or InSpec, and they help a lot in finding out if the infrastructure is ready to use. We might have some other later stages as well. For example, if you want to do conformance tests to see if things are compliant with a given policy, you can use Scout or G-Scout. For performance tests, we've also been using a tool called Locust, which is a very lightweight performance tool. The tests are written in Python, so it's kind of attractive for infrastructure code, and people tend to like it more.
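The early stages of such a pipeline can be sketched as an ordered list of Terraform commands that stops at the first failure. This is a minimal illustration under stated assumptions, not the pipeline used on any real project: the function names, the stage names, and the injectable `execute` hook are hypothetical, and an `init` stage is added here because Terraform needs it before `plan`. Verification tools such as Terratest or InSpec would run after the apply stage and are left out.

```python
import subprocess


def pipeline_stages():
    """Ordered stages, cheapest checks first."""
    return [
        ("format", ["terraform", "fmt", "-check"]),
        ("init", ["terraform", "init", "-input=false"]),
        ("plan", ["terraform", "plan", "-input=false"]),
        ("apply", ["terraform", "apply", "-input=false", "-auto-approve"]),
    ]


def run_pipeline(workdir, execute=None):
    """Run each stage in order; return (completed stage names, failed stage or None).

    `execute` takes (command, workdir) and returns an exit code. It is injectable
    (an assumption of this sketch) so the flow can be tried without Terraform installed.
    """
    if execute is None:
        def execute(command, cwd):
            return subprocess.run(command, cwd=cwd).returncode

    completed = []
    for name, command in pipeline_stages():
        if execute(command, workdir) != 0:
            return completed, name
        completed.append(name)
    return completed, None


if __name__ == "__main__":
    # Dry run: print each command instead of calling Terraform.
    def dry_run(command, cwd):
        print(f"[{cwd}] {' '.join(command)}")
        return 0

    print(run_pipeline("infrastructure/static-site", execute=dry_run))
```

Ordering the stages from cheap to expensive means a formatting or planning problem is caught before any real infrastructure is created.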
Alexandre Goedert:
That's usually what we've been doing, so we have these different stages. It's a very similar concept, still a continuous integration pipeline, but it has a different pace, basically. That's interesting because there's a lot of latency involved when we talk about infrastructure testing. We've seen two types of tests. One is totally self-contained: in this scenario, we build an infrastructure from scratch and then destroy it afterwards, and it's easier to test because we have no interference from other things. Of course this is slower, so we want to do this kind of test for some core modules.
Alexandre Goedert:
The other type of test exists because many times we need to integrate many modules to check that they still work in conjunction. If you try to recreate everything in this scenario, it's just too slow, so we might want to have some persistent environments where we can just run terraform apply over and over. Since Terraform holds an internal state, when you run consecutive applies, it only applies the delta of the configuration. That's much faster, but by doing that over time, you can generate some configuration drift because you're not cleaning up the environment. Usually we have these two types of tests, so it's not the same as the test pyramid we have for application code, but it's another way of organizing and combining some cheaper tests with some more expensive tests in the upper part.
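The idea of applying only the delta, and of drift building up in a persistent environment, can be illustrated with a toy model. This is not Terraform's actual algorithm or state format; the dictionaries, resource names, and both functions are hypothetical and exist only to make the two concepts concrete.

```python
def compute_delta(state, desired):
    """Compare the recorded state with the desired configuration.

    Only the differences need to be applied, which is why repeated applies
    against a persistent environment are faster than rebuilding from scratch.
    """
    return {
        "create": sorted(set(desired) - set(state)),
        "update": sorted(k for k in desired if k in state and state[k] != desired[k]),
        "delete": sorted(set(state) - set(desired)),
    }


def detect_drift(state, actual):
    """Resources whose real settings differ from what the state recorded.

    In a persistent environment that is never cleaned up, manual changes and
    leftovers accumulate here over time.
    """
    return sorted(k for k in set(state) | set(actual) if state.get(k) != actual.get(k))


if __name__ == "__main__":
    state = {"bucket": {"location": "EU"}, "cdn": {"ttl": 300}}
    desired = {"bucket": {"location": "EU"}, "cdn": {"ttl": 600}, "waf": {"mode": "block"}}
    actual = {"bucket": {"location": "EU"}, "cdn": {"ttl": 300}, "debug-vm": {"size": "small"}}

    print(compute_delta(state, desired))
    print(detect_drift(state, actual))
```

In this model, a self-contained test always starts from an empty state and an empty environment, so drift cannot occur, at the cost of creating every resource each time.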
Ashok Subramanian:
I think what you described also gives further validation for this concept of a digital platform, because if every single team were to try to do it themselves, it feels like quite a lot of effort, and it's something that can be pulled together once so that different teams can take advantage of it. Especially with things like testing, clearly there's a lot of thought that has to go into how you structure it, and clearly multiple teams can then take advantage of that.
Alexandre Goedert:
Yeah, I think so. Also, I think for infrastructure that's totally the case, especially because it's something relatively new still, how to use this practice for testing infrastructure. Also when we think about our application in general, when we talk about this bootstrapping process, we like to give some templates of common application cases that the company might have. This can also help new developers or when people want to use something as a reference, they can use what the platform is already delivering. I think that we can use it for both, actually.
Alexey Boas:
Alexandre, in the projects you've seen and you're working on, what are some of the challenges that you see, and next steps once you have that?
Alexandre Goedert:
This is all good when we have a delivery infrastructure base and a foundation, but we actually want to use this as a foundation for other layers. We talk about, for example, API strategy, so how can I use this digital platform to build APIs on top of it? As a developer, I want to quickly browse through a catalog of APIs and explore some business scenarios. One of the challenges is that many development teams start building their own APIs. That's natural, they have the knowledge, but it is not trivial to have a consistent API strategy throughout the company.
Alexandre Goedert:
It's very common to have very granular APIs that, when you look at them in a catalog, are not actually usable, because they are just not exposing the business concepts at the right level of abstraction. There are some exercises that people are doing here, first of all to understand what the right strategy for APIs should be. We can use event storming, having conversations with the business stakeholders to try to understand how different parts connect in the same business flow. We want to define things like what the business domains and sub-domains are, and we try to organize APIs in different levels. We might have a technical foundation of APIs that are building blocks for higher-level business APIs.
Alexandre Goedert:
It's good to organize these things so we know where we draw the boundaries. We also try to use domain-driven design exercises to find out what the aggregates are and which parts should talk to each other. Then it's easier to build things like an internal API catalog that makes sense for people. There's another level of the API strategy, which is that we sometimes want to build an external API gateway. We might want to expose things for monetization, for example, or big companies especially might want to allow the startup scene to start consuming their APIs. It can unlock different revenue opportunities as well, so I think that's something that you can gain afterwards.
Alexandre Goedert:
All this work is complex because you have to talk to a lot of people and connect the business with the developers, but when properly done, I think it delivers a lot of value for the companies. This is one thing, and I think the later layers that we want to have in this strategy are related to data. It's common for companies to have been working with data to some degree, but the data is not always easily consumable by people. For example, I might want to use an API catalog to test a business concept, but it's much easier if I can check all the data to gain insights, and then create a hypothesis and validate it afterwards.
Alexandre Goedert:
We try to make this connection using the platform as well, to allow a culture of experimentation in the company. At a very high level, this is what we try to do. Talking about the challenge, I think the challenge is that infrastructure is something that is easier for people to understand and consume nowadays, while the higher levels require all this effort of talking to people, convincing them, and creating the vision of this whole strategy.
Alexey Boas:
It ties back to the organizational mindset we talked about in the last episode, and to seeing this from a broader perspective: how to structure the business and the business services, how to align the whole company around a strategy for that, and how to leverage it.
Alexandre Goedert:
Exactly.
Alexey Boas:
Thank you very much for all the insights, and I think, as Ashok said, it makes a very strong case for thinking about a digital platform as something across the company as a whole, and it makes a lot of sense. I guess this takes us to the end of the episode, it was great to have you with us and share all the insights. Thank you very much, and thank you all for joining.
Alexandre Goedert:
Thank you, Alexey and Ashok, nice to be here with you.
Ashok Subramanian:
Thank you, Alexandre. Good speaking to you.
Alexey Boas:
If you have any feedback for us, don't hesitate to reach out, or leave a rating or comments on your preferred platform. Thank you so much for listening, bye.
Neal Ford:
This is Neal Ford. On the next episode of the Thoughtworks Technology Podcast, join me and my coauthor Mark Richards. Rebecca is going to interview us about our upcoming book, Fundamentals of Software Architecture, so please tune in then.