Brief summary
In the much-anticipated final episode of our Humanizing Data Strategy takeover series, we're exploring the ethics of data. If you're a data leader considering how to strike the right balance between advancing your organization's data capabilities and maintaining ethical integrity, this episode is a must-listen.
Episode highlights
- Tiankai introduces competence and conscience as the focus of today's podcast. He describes competence as balancing data literacy with business acumen, and conscience as applying human critical thinking to ensure we are doing the right things with data.
- Emily suggests that, while conscience can relate to issues like machine learning bias, it also relates to the carbon and the water usage of data centers that are being built for increasing AI computation, and the social impact of data, analytics and algorithmics interfacing with environmental policy, social policy and even foreign affairs.
- Emily warns of the risks of hoarding data: the more data you have, the more risk you hold, even if you're not using that data. There is much greater compliance risk under GDPR, and much greater risk of data leakage.
- On competence, Tiankai suggests that data practitioners can't afford to sit back and wait until everyone else learns how data works. They have to take a step towards the business stakeholders and their expertise areas, too, to find common understanding.
- Tiankai says that instead of trying to avoid certain departments or colleagues, it's important to think about how to bridge the gap with them, rather than trying to find the quickest way around them. We all have our own responsibilities, and it's generally better to work together. Doing things in isolation is rarely a good idea.
- Emily says that in order to develop a data-driven mindset from a business context, it's crucial to get away from the mindset that there is data that you just need to pull that will give you the answers. There's no magic table that just has all the right facts. You have to introspect and look at the data and start to ask questions about it.
- Emily follows this by saying that a good data scientist, data analyst, data engineer, or data practitioner will look at data as an opportunity to uncover the right question rather than uncover the right answer. When you have a single mindset, or a dominant culture or dominant point of view, it becomes harder to do that. Emily suggests this is why diverse teams generally work and perform better.
- Tiankai predicts that there will be more regulations to follow, and more concerns to stay aware of in the future. The question is also how to build the scalability and flexibility to deal with requirements that we don't even know about yet.
Transcript
[00:00:00] Kimberly Boyd: Welcome to an exclusive Pragmatism in Practice podcast takeover, Humanizing Data Strategy. Guest hosted by Markus Buhmann, Data Strategy Principal at Thoughtworks. In this series, we'll speak to leaders who have successfully bridged the gap between analytics and real-world human needs. We'll explore some of the ideas and strategies they've leveraged, equipping you with insights you need to create a sustainable and impactful data strategy in your own organization.
[00:00:27] Markus Buhmann: Welcome to Pragmatism in Practice, a podcast from Thoughtworks, where we share stories of practical approaches to becoming a modern digital business. I'm Markus Buhmann, your guest host for the third and final episode of our Humanizing Data Strategy podcast takeover. As ever, I'm joined by my colleague, the head of Data Strategy and Governance in Thoughtworks Europe, the author of the book of the same name, Tiankai Feng. Hello, Tiankai.
[00:00:50] Tiankai Feng: Hello again. Nice to be here.
[00:00:53] Markus: Today we're joined by the Vice President of the Data and AI Service Line in Europe, Emily Gorcenski. Hello, Emily.
[00:01:00] Emily Gorcenski: Hello. Happy to be here.
[00:01:02] Markus: Thanks for joining us. I guess we should start, Emily, by having you introduce yourself: tell us what you do at Thoughtworks in Europe, and what does the Vice President of Data and AI do from day to day?
[00:01:17] Emily: Yes, it's actually a very diverse role. I'm responsible for looking over all of our data and AI offerings. What that means is basically keeping the pulse on the state of technology in the industry, looking at the trends, and trying to make sure that we can tailor the way that we develop software, our agile practices, our continuous delivery practices, our product focus and user focus practices, and bring those to the data space.
What that means in practical terms is: how do we build data architectures in a new and better way? How do we do data governance in a new and better way? How do we do AI in a new and better way? Of course, at the core of that is always the user. We're always looking at how we focus on the user experience, on answering the right questions, and on driving business value from the data that we're working with and the insights and analytics that we're performing.
[00:02:19] Markus: Fantastic. Thank you. How did you get to those rarefied heights?
[00:02:25] Emily: My background is as a data scientist. Actually, I came up through the R&D space. I worked as a computational mathematician for about a decade, studying control systems and control theory for things like airplanes and satellites and submarines and all of that good stuff. I think eventually what I realized was that the work that I was doing was this newfangled field called data science. When you look at the salaries, the data scientists got paid a lot more than the research engineers.
I went headfirst into taking all of my mathematics experience and converting it into more pragmatic machine learning experience, and then realized that I had opened up a whole new space of problems to solve. I moved from a very, I don't want to say narrow, because I worked on a lot of things, but a niche field to something that had a little bit more direct user and customer impact.
I think what motivated me to make this shift from the R&D space was that at the end of the R&D cycle, I wrote a paper, if I was lucky, maybe five people read it. Then it was questionable about what impact I had on the world. Whereas now, working with our clients, working with cool applications, cool platforms, you hear from users every day about the impact that you have on them. That makes me happy.
[00:03:50] Markus: That leads us really nicely into the last two of the five C's that we've been talking about. Tiankai, do you want to recap those just now for anyone who hasn't listened to the other two episodes?
[00:04:03] Tiankai: Yes, absolutely. The two C's of today's focus are competence and conscience. Competence, in the sense of how to balance data literacy with business acumen, and how to bridge the gap from understanding data conceptually and technically to translating business expertise and business objectives into actually driving value with data.

Conscience is really how to apply human critical thinking, basically our natural conscience, so we can ensure we are doing the right things with data. What "the right things with data" means can have various aspects or angles to it: it could be from a DEI point of view, from a sustainability point of view, or from a compliance and legal point of view. Those are the topics that we want to discuss here today.
[00:04:52] Markus: It's really interesting, Emily, because your background as a research scientist and a mathematician, then moving more into data science touches across all of those sorts of things with, you hear stories of bias, you hear stories of sustainability. I was just wondering from that 30,000 foot view, what are those really salient issues for you in terms of those competence and compliance and conscience concerns?
[00:05:24] Emily: Yes, I think maybe if we start with conscience. We've seen over the years a number of case studies and examples of things like machine learning bias gone wrong or not being properly controlled for. I think it's a little bit old hat at this point. There's always the classic stories of facial recognition systems that don't detect people of color as accurately. There's all sorts of risks there. I don't want to diminish that. That's obviously a very important aspect of, especially the AI space. I think increasingly, if we look at the macro geo conditions, we're seeing a lot of really alarming trends in the space that need to be properly accounted for, properly controlled for.
This is everything from looking at the carbon and the water usage of data centers that are being built for increasing AI computation. There's a massive local footprint that these data centers have in the local resource economy. I live part-time in Virginia in the United States, and here we're seeing rising energy prices because of all of the data centers being built throughout the state. That ends up raising prices for everyday consumers and the people who live here, which has a disproportionate impact, obviously, especially in the poorer rural communities of the state, and it is not offset by the jobs that these data centers create.
I think that there's a social aspect to this that needs to be looked at even from the supply chain point of view. Then I think also we're starting to see a lot more concerns around data sovereignty. As we record this today, it is the 15th of January, all of the talk in the tech news is around TikTok and whether or not the United States will ban TikTok or force a sale because of the worries or the fears of data transfer to China. We're starting to see the impact of data and analytics and algorithmics starting to interface with environmental policy and social policy and even foreign affairs policy.
I think that as data practitioners, maybe that's not the thing that's on our minds every day. I don't usually go into work worrying about how to solve the geopolitical tensions between China and the United States. One of the things that I can do as a data practitioner is think about how am I making decisions on what data I need in order to achieve a goal? What data am I collecting?
Am I doing things like putting the proper monitoring in place so that I know if somebody is trying to exfiltrate that data to a third country that they're not allowed to? Do I have the right access controls in place? Am I clearly communicating with the user, with the consumer over their data processing rights and what we're doing with it? Am I working to actively minimize those things? I think that as technologists, we all have a role to play in trying to look at that. That's part of the engineering craft that we need to bring into the role. I do think it's important to have that in your mind at all times when you're working on things.
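As a rough illustration of the kind of monitoring Emily describes, here is a minimal sketch that flags transfers leaving an allowed set of regions. The event fields, region names, and threshold are assumptions invented for the example, not a reference to any particular platform or tool.

```python
from dataclasses import dataclass

# Hypothetical access-log event; the field names are illustrative only.
@dataclass
class AccessEvent:
    user: str
    dataset: str
    destination_region: str
    bytes_transferred: int

# Regions this imaginary organization permits data to flow to.
ALLOWED_REGIONS = {"eu-west-1", "eu-central-1"}
# Transfers above this size get flagged even within allowed regions.
LARGE_TRANSFER_BYTES = 5 * 1024**3  # 5 GiB

def flag_suspicious(events: list[AccessEvent]) -> list[str]:
    """Return human-readable alerts for transfers that violate the policy."""
    alerts = []
    for e in events:
        if e.destination_region not in ALLOWED_REGIONS:
            alerts.append(f"{e.user} moved {e.dataset} to disallowed region {e.destination_region}")
        elif e.bytes_transferred > LARGE_TRANSFER_BYTES:
            alerts.append(f"{e.user} made an unusually large transfer of {e.dataset}")
    return alerts

if __name__ == "__main__":
    events = [
        AccessEvent("alice", "orders", "eu-west-1", 10_000),
        AccessEvent("bob", "customers", "us-east-1", 2_000_000),
    ]
    for alert in flag_suspicious(events):
        print(alert)
```

In a real system the events would come from audit logs and the alerts would feed an incident channel; the point is that the access controls and monitoring Emily lists can start as a small, testable policy check.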
[00:08:53] Markus: There are a couple of things I want to pick up on there. One of the things I find really fascinating is that it's not just the environmental use. It's the quite alarming attempts to wipe out, and I use that word deliberately, whole species of property rights. In the creative industries, for example, we see the big tech companies making extraordinary claims that taking creative work for training data is perfectly acceptable without compensating the creators. That's a topic that's very near to my heart.
I worked in the music industry on both sides of the fence and in my capacity as a data person. It's those sorts of things as well that I'm mindful of. I know, Tiankai, you're famously a musician as well.

This is real fertile ground. Who would have thought compliance and intellectual property and social policy would now explicitly be part of a technology conversation? One of the other things I wanted to pick up on, Emily, is that when you talked about the amount of data that you want to collect, I know Tiankai has written about data hoarding. That's something I'm really, really interested in exploring, because it also has impacts on your data sets, your data models, management costs, all of those sorts of things. What do you think about that?
[00:10:24] Emily: I think that there's been a philosophy that goes back at least 20 years in the industry that's connected to the quote, "data is the new oil." It's funny because, like many aphorisms of our age, it is misappropriated and not presented in its entirety, because the original quote, and I forget the original author, said something like, "Data is the new oil. It is useless until it is refined." It just goes to show that there is this mentality of, "Oh, we have to get a lot of data. We have to get a lot of data. We have to get a lot of data."
Data mining as a practice was all the rage 15 years ago. You very rarely hear anyone talking about it now, because it's been usurped by all of these other practices, but that mindset stuck. It was this idea that you have to collect data, and that having the data has value in some way. Then you get to this hoarding behavior, and it paints you into a corner, because the more data you have, the more risk you hold, even if you're not doing anything with that data. You have so much more compliance risk under GDPR. You have so much more risk of data leakage.
We see every day that there's another news story about data that was exfiltrated from a company because they hadn't secured a bucket properly. The new trend is credential stuffing attacks on employee logins that give access to CRM systems, internal tooling, or even public cloud systems. That data hoarding does create a bunch of risk for you. I think the irony is that if you don't have good practices to extract value from data, the marginal value of that data diminishes like one over N with time. The data that you have from eight or nine years ago probably isn't that good anymore, because those customers have changed their behaviors.
The world has changed. You're probably not going to get as many deep insights from that data now, half a decade or more later, as you would have if you had been able to act on it within the hours, days, or weeks after collecting it. I think that this idea of data hoarding needs to be coupled with a revamp in practice: getting to that continuous delivery type of paradigm with data, meaning you're able to act on it much more rapidly and leverage it in a much more meaningful way in order to drive near-term value.
Then, if you do believe that there is a need to keep that data, that it has some sort of long-term value, which can be the case, I think you have to really think about the usage patterns and how you might want to take that data and store it in a safer way, or provide some other controls around it, so that it retains its value but you start to minimize the risk.
There are, of course, situations where you have to keep receipts or invoices for 10 years based on statutory concerns or whatever. That's all fine. I'm not saying you shouldn't do that, but for data that you don't need to collect or that the value is maybe ephemeral, developing the practices to use it, to get rid of it and to validate that it's been properly removed, I think is a key part of a data practice and a data governance practice.
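A minimal sketch of what such a retention practice can look like in code, assuming made-up record categories and retention periods; real retention schedules come from legal and compliance requirements, not from a script.

```python
from datetime import datetime, timedelta, timezone

# Illustrative retention rules: category -> maximum age. The categories and
# durations here are assumptions for the sketch, not legal guidance.
RETENTION_RULES = {
    "invoice": timedelta(days=365 * 10),   # statutory retention, e.g. 10 years
    "clickstream": timedelta(days=90),     # ephemeral value: keep only briefly
    "support_ticket": timedelta(days=365 * 2),
}

def is_expired(category: str, created_at: datetime, now: datetime) -> bool:
    """True if the record has outlived its retention period and should be purged."""
    max_age = RETENTION_RULES.get(category)
    if max_age is None:
        # Unknown categories default to deletable: hoarding is the risk here.
        return True
    return now - created_at > max_age

if __name__ == "__main__":
    now = datetime.now(timezone.utc)
    records = [
        ("invoice", now - timedelta(days=365 * 3)),
        ("clickstream", now - timedelta(days=400)),
    ]
    for category, created in records:
        verdict = "purge" if is_expired(category, created, now) else "keep"
        print(f"{category}: {verdict}")
```

The validation step Emily mentions would then be a separate check that purged records really are gone from every copy, including backups and derived datasets.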
[00:14:01] Markus: As a hard and fast rule, 8,000 years is probably too long.
[00:14:06] Emily: 8,000 years is probably too long, but I don't know, we'll still be using Postgres in 8,000 years.
[00:14:13] Markus: I use that number because somebody I know, a chap called Joe McLeod was researching cookie lengths, and Home Depot keeps its cookies or tries to keep its cookies on your machine for 8,000 years. I think they're quite up on that longevity research that we're seeing coming out of Silicon Valley just now. Maybe they know something that we don't.
[00:14:36] Tiankai: Maybe on another note, when we talk about data hoarding: with all of the craze and the upcoming Gen AI use cases right now, unstructured data becomes much more important. Unstructured data has typically been one of the things that we have all been unconsciously hoarding, because you always have written documents or pictures that you keep around and never look at anymore; you just have them there because you never think of deleting them.
If we apply more Gen AI use cases, especially using them as copilots or assistants in your organization, you all of a sudden want to grasp all of this unstructured data to learn the institutional knowledge, to either do RAG with it or fine-tune on it. The consequence is also that nobody has ever checked the accuracy, or even the overall quality, of these unstructured data types.
The hoarding then comes with a different risk: would it have been better not to hoard all of the data, compared to having hoarded all of it and now having to check whether it is even applicable for generating value? Which is the worse option? I think in this current age, with AI famously needing a lot of data to learn from and be fine-tuned with, it's really tricky to assess the value of data hoarding: is it actually valid to have all the data or not?
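As one concrete way to picture Tiankai's point, here is a small sketch that admits only vetted, recently reviewed documents into a hypothetical RAG corpus. The Document fields, the staleness window, and the eligibility rule are all assumptions for illustration.

```python
from dataclasses import dataclass
from datetime import date, timedelta

# Hypothetical document metadata; in practice this would come from a CMS or wiki.
@dataclass
class Document:
    title: str
    last_reviewed: date | None  # None means nobody ever vetted it
    owner: str | None

def eligible_for_rag(doc: Document, max_staleness_days: int = 730) -> bool:
    """Only admit documents that have an owner and a recent review into the corpus."""
    if doc.owner is None or doc.last_reviewed is None:
        return False
    return date.today() - doc.last_reviewed <= timedelta(days=max_staleness_days)

if __name__ == "__main__":
    docs = [
        Document("Onboarding guide", date.today() - timedelta(days=100), "hr-team"),
        Document("Legacy pricing memo", None, None),
    ]
    corpus = [d for d in docs if eligible_for_rag(d)]
    print([d.title for d in corpus])  # only the vetted document survives
```

A filter like this inverts the hoarding default: instead of feeding everything to the model and hoping, nothing enters the corpus until someone has taken responsibility for it.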
[00:15:56] Markus: At Thoughtworks, our Chief Scientist was one of the original creators of the Agile Manifesto. For software, you could say that's almost a manifestation of the just-in-time manufacturing practices we saw developed in the second half of the 20th century. Does it not feel to you that that approach is now becoming more and more salient in terms of data management and data product creation? For me personally, that seems to be the logical flow of this. Rather than collecting all the stuff, it's collecting what you need when you need it.
[00:16:32] Emily: I think we're already starting to see that, but I think that it's an incomplete story. Because the just-in-time mindset works well for software because software is a continuously evolving space. When you're building a system or a microservice or something like that, you're typically looking at life-- The interaction lifespan is usually in the order of seconds. You have a request, you get a response, and then that completes the transaction. That's where the value is encoded.
When I click checkout, the value of that checkout application or microservice on the website is encoded in the user experience and the flow and all of that stuff, and it must do the right thing. When you're looking at data, you're looking at a much longer time span. The value doesn't come from doing one thing really, really well a million times an hour or a million times a day, or whatever it might be. It comes from something different, which is the time history. It comes from the trends. It comes from the patterns.
It comes from the diversity of the input space that you have. I think that the just-in-time approach of Agile is an important part of building for more robust data systems and data architectures. Then I also think that there's an element of this that we need to not overlook, and that's the product side of it. Agile is also inherently product-focused and user-focused. One of the things that I see a lot of people overlooking in the data space is trying to establish that long-term product vision. One of the problems with just-in-time is that you can't go back in time and collect data that you didn't really record.
You might be able to infer it based on data that you did record. You might be able to buy it from other sources, but you're always going to have that sort of limitation in the quality. One of the things that you might want to start thinking about is the overall user experience, the product experience, and start to think forward into the future so that you are starting now to collect data that you might need for a future iteration of your product. I'll give you an example that I've heard of. I won't specify who did this, but there was a dynamic pricing application case study that I heard of at one point.
The client wanted to set up dynamic pricing, and they had a lot of data. They were very convinced, and this was back in the time when big data was the buzzword, that this massive amount of data was going to give them all of the clues they needed to build a wonderful dynamic pricing engine. The problem was that they had never really changed their prices, so they didn't really know what the impact of a changed price would be on user behavior.
Despite the fact that there was a lot of data, without that sort of dynamic response in the data with regard to the type of product they were trying to build, they weren't really able to get much meaningful information from it. That project, or that effort, required the import of external data that they had to purchase on the open marketplace in order to solve the problem. I think that's a good example of thinking ahead.
Today or this year, you may not have a strategy to start experimenting with your pricing, but if you have a five-year plan or five-year roadmap that says, "Hey, we're going to start using AI to do product recommendations, or pricing or something like that," you may need to start collecting that or doing the experiments now even before you write the first line of code for that AI, just to start collecting the data to see what the user behavior will be.
I do think that that does go into that agile mindset because then you're starting that philosophy of closing the loop with experimentation, closing the loop with iteration, having things like feature flags and stuff like that, where you can start to gather that dynamic type of data. I think it's a multifaceted problem space, and having that mindset will do a lot to help your future business.
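A minimal sketch of the experimentation loop Emily describes: a feature flag routes a small share of sessions to a test price, and every outcome is logged so a future pricing model has dynamic-response data to learn from. The flag mechanism, arm shares, prices, and log format are all invented for the example.

```python
import csv
import random
from datetime import datetime, timezone

# Hypothetical experiment parameters; in a real system these live in a flag service.
EXPERIMENT_SHARE = 0.05  # 5% of sessions see the test price
BASE_PRICE = 20.00
TEST_PRICE = 22.00

def price_for_session(session_id: str) -> tuple[float, str]:
    """Deterministically assign a session to the control or test price arm."""
    rng = random.Random(session_id)  # seeded so a session always sees the same price
    if rng.random() < EXPERIMENT_SHARE:
        return TEST_PRICE, "test"
    return BASE_PRICE, "control"

def log_observation(writer, session_id: str, purchased: bool) -> None:
    """Record (arm, price, outcome) so a pricing model can be trained later."""
    price, arm = price_for_session(session_id)
    writer.writerow(
        [datetime.now(timezone.utc).isoformat(), session_id, arm, price, purchased]
    )

if __name__ == "__main__":
    with open("pricing_observations.csv", "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["timestamp", "session", "arm", "price", "purchased"])
        for i in range(10):
            log_observation(writer, f"session-{i}", purchased=random.random() < 0.3)
```

Data collected this way is exactly what the dynamic pricing client in Emily's story was missing: observed behavior under changed prices, gathered long before the first line of the pricing AI is written.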
[00:20:41] Markus: Tiankai, some of those qualities that Emily was speaking of are very much dependent on a person's skills, their experience, their empathy, their inclusivity, their ability to engage with different kinds of people. It's not enough now to say, "Well, I know Snowflake and Airflow and Python." It seems that that sort of more open and inclusive approach is right at the heart of the skillset of a modern data practitioner. Do you want to comment on that, Tiankai?
[00:21:15] Tiankai: Yes, absolutely. That, I think, goes back to what I mentioned about what competence actually means to me. The whole idea is that data practitioners cannot afford to just sit back and wait until everyone else learns how data works and then say, "Oh, they finally speak my language, now I can talk to them." They have to take a step towards the business stakeholders and their expertise areas, too, to find common understanding with them.

If we talk about dynamic pricing, for example, then of course they can look into all of the data and the metrics, but there need to be business reasons behind it. A marketing expert or an e-commerce retail expert, for example, would know a little bit more about the background of how it all has to have a business objective behind it. To understand each other, maybe spend some more time together rather than just expecting certain tasks to be done by each other; that would actually help.

That starts with mindset rather than skillset: the whole mindset of not thinking, "I'm just an expert in my own area, and everyone else has to come to me," but, "I need to expand my knowledge, and I have to understand more than my core expertise area." That's the only way you get from isolated departments to actual collaboration, which is one of the other C's, and then to driving value with data.
[00:22:33] Markus: I know it's a bit of a hot-button term at the moment, for reasons, but this speaks to diversity, equity, and inclusion as part of that whole process. From your experience, are there strategies or tips that data practitioners might want to think about in approaching their data problems from that perspective?
[00:22:56] Tiankai: Yes, sure. To be very honest, and speaking from my own experience, the other side of DEI, as you called it, is unconscious bias. Having been a corporate citizen myself, I know that there are always these narratives about specific departments that you just start building up. That sometimes creates negative connotations rather than positive ones, with rumors like, "The data people will never understand, let's just do our own thing," or, "IT never does what we want them to do anyway, so let's do our own thing." That is not helpful.
In the sense of DEI, that is not really the right behavior either: starting to hold those kinds of biases and actively amplifying them takes you exactly the other way instead of solving the problem. For anyone who's listening: if you think about who you need to drive success and impact with data, and you're automatically trying to avoid certain departments or colleagues right now, maybe instead think about how you can bridge to them and close the gap, rather than trying to find the quickest way around them. Because in the end, I think it's just better if we all work together to drive it. We all have our own responsibilities, and doing things in isolation is never great.
[00:24:13] Markus: Emily, is there something-- Thank you, Tiankai. Is there anything you want to add to that?
[00:24:17] Emily: Working with data requires you to start from a place where you assume that you're wrong. In order to get meaning out of your data, in order to have useful insights, it's really a process of whittling down the ways in which you are wrong. You have to exclude certain things to come to the right answer, to the best answer.
I think that when you have a team composition that assumes that their perspective is universal or that assumes that their perspective is the right one, it becomes much more difficult to rule out those ways that you might be wrong or that you might be right. It really reduces the way that you can be inquisitive about the patterns that you're seeing and the expectations and the assumptions. In order to develop a data-driven mindset from a business context, I think that it's really important to get away from this mindset that there is data that you just need to pull that will give you the answers.
There's no magic table that just has all the right facts that you just need to write a magical SQL query to get the data out of. You have to introspect and look at the data and start to ask questions about it. A good data scientist, data analyst, data engineer, data practitioner will look at data as an opportunity to uncover the right question rather than uncover the right answer. I think that when you have a single mindset, or when you have a dominant culture or dominant point of view, it becomes much harder to do that. There's a reason why diverse teams work better and perform better. It's because people are able to look at problems in different ways.
They're able to bring in different cultural contexts. I'm an American, for example, but I work and live in Europe. Every day I'm challenged by the European way of doing business, which is different than what I've been brought up into. It shows me something about how I need to look at the business world. It challenges me how I need to look at problems. It has changed the way that I approach problem solving. I do think that it is important to have that diversity of experience and diversity of culture, even if you're solving what you think are very deterministic, tangible, techie type of problems.
[00:26:48] Markus: No, that's fantastic. Thank you very much. I want to change gear a little bit and talk about maybe advising people who are starting out as data practitioners. There's a wealth of experience here in the room. How can we support and encourage the careers of people who are thinking about moving into data as a career or thinking about, is data still the coolest/hottest job in the world? What are those sorts of things that you should be thinking about? We'll start with you, Emily.
[00:27:32] Emily: Yes, I think, data is not going to go anywhere. There's a lot of good in what's coming out of AI right now. I think there's a lot of hype in what's coming out of AI right now. Regardless of how that plays out, the truth is that we live in a digital world, and it's not like we're going to have less data in the future to work through. We're going to have more no matter what. The world is shifting to a knowledge-based economy. If you want to get into data, now is the best time to do it because there's so much opportunity.
There's so much diversity in, and need for, data practitioners in every aspect of the economy. I do think that it's a great field for people to get into. What's fantastic about it is that it is so open to multidisciplinarity. If you have a background in psychology, data can be the right field for you. If you have a background in business or economics, or if you're a mechanic who wants to work as a data practitioner, there's a massive amount of data being produced in factories and from automobiles, airplanes, jet engines, refrigerators, and industrial equipment.
That context knowledge is super, super valuable for being able to solve those problems. I think it's just such a fantastically open field for people to get into, which is why I've always loved it and why I continue to love it. Yes, if you're looking at a data career, there's never been a better time to get into it.
[00:29:13] Markus: Tiankai, how about you?
[00:29:15] Tiankai: Yes, I also just wanted to address the options, with so many disciplines being involved in it. Actually, it's a really good time, because just yesterday I saw a new infographic from the World Economic Forum about the fastest-growing jobs in 2025. Number one is big data specialist. I had to laugh about it, because big data is a term I haven't heard now for 10 years or so, yet the World Economic Forum thinks it's a blowing-up term for 2025. Other than that, it's really clear: a lot of research companies all agree that data jobs are really hot and really in right now.
It's a really good time to get into it. Maybe one other thing I would add: some people ask me how to get started, especially when they don't have any data responsibilities in their day-to-day right now. They tell me that they don't do anything with data at all.
Then I tell them, "If you just look into how you manage your own personal finances," I'm sure everyone does a little bit, "That's already a great start." You're probably somehow classifying your own expenses in a certain way and your incomes, and you're doing something with it to manage your life and just think about that a little bit deeper and then apply what you learned there. It's pretty similar. You can basically build on that. There's always something that you can build on in our lives. We can all get into data one way or another.
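Tiankai's personal-finance example can be made concrete with a few lines of code. This toy classifier, and its categories and keywords, are made up for illustration; the point is only that everyday bookkeeping already involves classification and aggregation, the same moves a data practitioner makes.

```python
# Keyword-based expense classification: a toy version of managing personal finances.
CATEGORIES = {
    "groceries": ["supermarket", "bakery"],
    "transport": ["metro", "fuel", "train"],
    "leisure": ["cinema", "concert"],
}

def classify(description: str) -> str:
    desc = description.lower()
    for category, keywords in CATEGORIES.items():
        if any(keyword in desc for keyword in keywords):
            return category
    return "other"

expenses = [("Metro ticket", 3.20), ("Supermarket run", 54.10), ("Concert", 45.00)]
totals: dict[str, float] = {}
for description, amount in expenses:
    category = classify(description)
    totals[category] = totals.get(category, 0.0) + amount
print(totals)  # {'transport': 3.2, 'groceries': 54.1, 'leisure': 45.0}
```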
[00:30:40] Markus: Just from a very specific corporate perspective: as data leaders, one of the things we want to develop is more data literacy, competence, and confidence within an organization. To your point, Tiankai, the "well, I don't do data" attitude in a corporate environment is definitely not true. What are the sorts of things that data leaders can do to start building that confidence first, and then that competence and literacy?
[00:31:10] Tiankai: If we think about where confidence comes from, it's really the feeling of knowing enough, of being able to apply it in a certain way, and of seeing that it actually leads to an impact; then you can become confident that what you know is actually the right thing, and that you can apply it with impact. That takes almost a feedback loop within a group or an organization: what you learn can be applied, and you have the right environment in which to apply it, meaning you need some kind of environment where you can try out new things and experiment a little.
You also need to acknowledge the impact and the success you get out of experiments and, on the other hand, be able to fail when things don't work without being punished for it by the culture you have. That interplay, having the right cultural environment to bring in new knowledge, to try it out, and then to seek impact with what you have, plus spreading the word so that everyone is able to try new things and scale up that experience of learning and applying, especially in the data space, I think is really the focus and a good way to do it.
[00:32:24] Markus: Emily, from a macro leadership perspective, you're dealing with the people in those leadership positions, and they hear, "We need to be more AI-driven. We need to be more data-driven." Those are very vague, very woolly sorts of terms. What are the conversations and practical steps that you're outlining with those people in leadership positions to achieve those sorts of things?
[00:32:49] Emily: Yes, I think the first thing to do is to ask them: what do you mean by data-driven? What do you want the data to drive? Are you willing to make different decisions if the data tells you something that goes against your instinct or your beliefs? Then really start looking at how they use data and how willing they are to let data inform their practices. Part of it is a cultural aspect. You need to be able to be challenged by data. If you don't have the ability to be challenged by data, then the first thing to do is work on the culture of asking questions of data and letting data surprise you.
Then I think when you get to that point, it's really a matter of user experience and practice and pragmatism. There will never be a world where CEOs and VPs are writing their own SQL queries and building their own dashboards. From the very earliest days of my career, people have dreamed of low-code solutions where you just click and those magic insights appear, but you're still going to need the expert insight, the professionalism, the nuance. Then really trying to build that culture of focusing on the user, focusing on their needs, and making sure that the data people understand the business side of things so that you can come together, I think, is also really important.
I see a lot of companies doing data engineering for the sake of doing data engineering. This is why we get into these situations where massive enterprises have 7,000 SQL jobs and over a million lines of SQL code that nobody knows what it does or how it works or what purpose it has, or if anyone uses it, and they're spending tons and tons and tons of money just in pure computing time, just to make it all run. I think that bringing data closer to the business is super important to become data-driven, but it also requires commitment from both sides of that divide to really work to bridge that gap.
[00:34:49] Markus: No, thank you, Emily. I was just going to say, one of the things we also hear about is the compliance perspective. I know it's not one of the C's, but it's one of those things: I guess if you were to ask somebody, "What are the five C's?", some of the [unintelligible 00:35:02] will go "compliance," because data governance is about saying no, as we all know. One of the concerns that we see, and this is something we talked about a little earlier, is around privacy, security, ungoverned algorithms and AI, and the legal concerns that all of that throws up. What are the sorts of conversations that you're seeing, Emily, in those sorts of spaces, and what advice are you giving out?
[00:35:30] Emily: Yes, I think that right now, the biggest concern is uncertainty. People are uncertain about what all of these new regulations are going to mean for the business. Certainly, it's important to stay compliant with the regulations, but I'm also not super afraid of them, and I would encourage anyone out there not to be afraid of them either. Compliance should be an opportunity for you to inspect your engineering practices and make them more efficient. Actually, I've seen this in things like GDPR compliance; I've been involved myself as a tech lead during GDPR compliance initiatives.

What we ended up with, from the exercise of making sure that our pipelines were compliant and our storage systems were compliant, was, at the end of the day, a much more robust, efficient, and secure pipeline. That compliance initiative took us a couple of weeks of dedicated effort to make sure that everything was actually in compliance, and at the end of the day we saved that money back, probably tenfold, by having a better system. I think the same is going to be true for AI, and the same is going to be true for data sharing with the Data Act.
I think you'll end up in a better place with better systems by having to comply with some of these forthcoming regulations. For me, I think it's not a dirty word. I think it's an opportunity to do things the right way. Yes, you will have to spend a little bit more money in the near term, but typically, we do see that data teams are underfunded in terms of having enough resources to properly engineer their systems. I think you'll be fine when it comes to that compliance.
[00:37:20] Markus: It's funny you say that, because that's the conversation I have with a lot of my clients, particularly in the banking space: the regulators are inviting you to look at your estate and giving you permission to take a fairly aggressive approach to clearing up that technical debt, because that technical debt will invariably get you into trouble with a regulator. That's just my personal experience, a sample of one. Tiankai, around those concerns, what other conversations have you been having with people?
[00:37:56] Tiankai: Besides what Emily already mentioned about the uncertainty, the trend is that there will be more and more regulations and more and more concerns out there. The question is also how to build a certain scalability and flexibility into dealing with regulations and other requirements that we don't know about yet. We are certainly innovating rapidly right now in the AI space, and that can lead to so many more consequences that we don't know about yet.
How can you actually build a foundation where, if a new regulation comes in, we don't have to start from scratch again, but can just adapt what we already have and handle the new requirement more easily and in a more compliant way? Another thing would be to think about these regulations not only as burdens that all companies have to face, another thing on top of what needs to be done already. Regulation is a reactive way of dealing with genuinely human concerns: the whole risk classification in the EU AI Act is there to protect humanity and society, albeit, I would say, reactively. But we can do that proactively, too.
If you think about ethics, and about DEI aspects, a little bit more practically, and these considerations and the experts in the organization have a say in how innovation works in the organization, maybe you don't have to react anymore; you're already on the right track, and you're doing the right thing by default. That could be really helpful in the future.
[00:39:25] Markus: I think that leads us into the final question for you both. Tiankai, that's a really, really optimistic way of looking at things. You're one of the most optimistic people I know.
[00:39:36] Tiankai: I try. [laughs]
[00:39:39] Markus: Why is that optimism so important now when it comes to data strategy and being a data practitioner? Can you share your thoughts on that, please?
[00:39:50] Tiankai: Sure. The special thing is that some readers noted to me that optimism is a word that pops up over and over again in the book. I only realized afterwards that it reflects my own attitude. If I were to decode it, I believe that optimism is important because, for things to get better, you first have to believe that things can get better.
That is naturally optimism, because if you think things will not get better anyway, and nothing will work, then what's the whole point of doing it in the first place? My hypothesis is just that if you truly believe that things can get better, then you will make an effort for things to get better, too. You are driven by that intrinsically, rather than doing it because you have to. As long as we keep our optimism, knowing that data will become better, AI will become better, and we will do the right things in a better way, that is the best human drive we can have to be successful in data.
[00:40:52] Markus: Emily, I saw you nodding there. Do you want to add something?
[00:40:56] Emily: Yes, as the social justice activist that I am, there's this belief, or this saying that we have, that a better world is possible. Even when things are really chaotic and it really feels like things are not going your way, the precondition for making things better is believing that they can be better. Optimism is necessary, but it is not sufficient. There's also a saying that hope is not a strategy. You have to couple optimism with action, with intentionality, and with risk management. You also have to be willing to accept that not everything is going to work out the way you imagine it.
Resilience is also part of that process. When we look at the data space, there's plenty of stories out there of companies that have spent tens of millions of dollars on a failed data initiative. Then they become a little bit skittish of the next data initiative because the last one failed. I think that the only time that you can really count something as a failure is if you don't learn from it.
What I would say to all the data professionals out there is: look at what you can learn from initiatives that may not have been successful. Take those learnings, figure out a better way to do it, and believe that the outcome you envision is possible; it may just not come via the process you envisioned. If you can release yourself from the burden of believing that the right outcome can only come from one right way of getting there, then you'll be all right. Believe in your outcomes and let the outcome be your guide, not the process.
[00:42:47] Markus: I think that's a terrific place to end our podcast today. It leaves me to thank Tiankai. Thank you for being such a great partner on this takeover of our podcast. Thank you.
[00:43:01] Tiankai: Absolutely. It was great fun.
[00:43:03] Markus: Emily, it's always a pleasure to speak with you. Thank you so much for making time to speak with us today on our topic.
[00:43:11] Emily: My pleasure.
[00:43:12] Markus: It just leaves me to thank our producers, Kevin and Ryan, for the hard work they're putting in to bring this to you all. I also want to thank everybody else, the listeners, for joining us on this episode and our takeover of Pragmatism in Practice. We hope you enjoyed our discussion of humanizing data strategy as much as we have.
As I mentioned in the previous podcast, the book is available at all good and evil bookstores everywhere. If you enjoyed this episode and have any questions, please feel free to reach out to us. We're active on all the usual social media channels, LinkedIn in particular, and we'll look forward to hearing from you. Thank you all very much. Hopefully, we'll get a chance to speak to you again soon.
[music]
[00:44:06] [END OF AUDIO]