Brief summary
A vast array of powerful data visualization tools, for instance D3, Bokeh, Shiny and Dash, is gaining traction in enterprises looking to make sense of their data sets. In this episode, our team explores the concept of data visualization as part of a complete digital experience, alongside the workflows and journeys of a wide variety of users.
Podcast transcript
Zhamak Dehghani:
Hi, everyone. Welcome to another episode of Thoughtworks Technology podcast. I'm one of your regular hosts Zhamak and I'm here today with my co-host Alexey. Hi, Alexey.
Alexey Boas:
Hello Zhamak. I'm Alexey. I'm head of technology for Brazil, and it's a great pleasure to be here with you all.
Zhamak Dehghani:
Wonderful. Today we're going to talk about a topic that's actually close to my heart: data visualization. The reason we picked this topic is that we had a lot of conversations around visualization in general over the last few months, when we were putting the latest version of our Tech Radar together. It looks as if the technology landscape, the developer ecosystem, is becoming more and more complex, and a picture, a visualization, goes a long way in taking away that kind of cognitive load we're all feeling. Our discussions spanned, I guess, a spectrum of visualization tools around architecture visualization, infrastructure visualization, and data visualization. Today we have two guests with us, David and Ned, to deep dive into the data visualization landscape. Hi, David.
David Colls:
Hi Zhamak. Thanks for having me.
Zhamak Dehghani:
Of course. Can you tell our audience a few things about yourself? I think they might recognize your voice, because you're becoming one of our regular guests.
David Colls:
It's a pleasure to be back again. I lead the data and AI practice for Thoughtworks Australia, and I'm a visualization enthusiast from way back, keen to explore how we can apply it to all aspects of our work.
Zhamak Dehghani:
Wonderful to have you. Hi Ned.
Ned Letcher:
Hello. It's great to be here.
Zhamak Dehghani:
Wonderful. Can you say a few things to our audience about yourself?
Ned Letcher:
Yeah, sure. I'm a data science engineer at Thoughtworks in Australia. I started earlier this year and since joining, I subscribed to this podcast and I've been listening and really enjoying it. It's pretty cool to be on the other side.
Zhamak Dehghani:
Wonderful. We have a multi-continental episode today, from Brazil, Australia and the US. It would be great to deep dive into what data visualization is about.
Alexey Boas:
Yeah. And maybe that's a good starting point, because we've been hearing a lot about data visualization across several different use cases; it's almost an umbrella term. David and Ned, what do we mean by data visualization? What is it, and why is it important at all?
David Colls:
When I think about it, Alexey, I think about rendering data in a form that can be consumed intuitively as well as analytically. Often we do need to look at visualization through an analytical lens, to use it to understand and quantify problems and effects. But the intuitive aspect of data visualization, I think, is what makes it so powerful, because it allows us to spot anomalies and discrepancies easily without having to engage deeper thinking. It also prompts us to ask "why?" type questions. Visualization helps us discover questions as well as produce answers to those questions, and that's where I think the power of data visualization lies.
Ned Letcher:
Yeah. It's a really useful tool, super important to a lot of what we do when working with data and getting value from it. For me, one of the big reasons why it's important is that we're trying to understand what's going on with data; you want to get a sense of what the data tells you. We can't understand an entire data set intuitively, so what we're often trying to do is collapse it down to something we can get a handle on. One way people do that is summary statistics, like means, medians and those sorts of things. But they can often be deceptive.
Ned Letcher:
The means of two datasets can be quite similar, and even things like standard deviation, which is a bit more of a statistical measure, can mislead you. There's actually a really cool example of this called Anscombe's quartet: a bunch of different datasets that all have the same high-level statistics, but it turns out when you plot them, they look radically different. They're circles and weird lines that are totally different. It's just super important to actually get the data, look at it, and see what's going on.
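The effect Ned describes is easy to reproduce. Here's a minimal sketch, using made-up numbers rather than Anscombe's actual quartet, of two series with identical means and standard deviations that nonetheless tell very different stories once you look at their relationship to x:

```python
import statistics

x = [1, 2, 3, 4, 5]
y_linear = [2.0, 4.0, 6.0, 8.0, 10.0]    # y = 2x: a clean straight line
y_shuffled = [6.0, 10.0, 2.0, 8.0, 4.0]  # the same values, reordered

# Identical summary statistics...
assert statistics.mean(y_linear) == statistics.mean(y_shuffled)
assert statistics.stdev(y_linear) == statistics.stdev(y_shuffled)

# ...but very different relationships to x, which a scatter plot
# would reveal instantly. Pearson correlation, computed by hand:
def correlation(xs, ys):
    mx, my = statistics.mean(xs), statistics.mean(ys)
    cov = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    sx = sum((a - mx) ** 2 for a in xs) ** 0.5
    sy = sum((b - my) ** 2 for b in ys) ** 0.5
    return cov / (sx * sy)

print(correlation(x, y_linear))    # 1.0  (perfectly linear)
print(correlation(x, y_shuffled))  # -0.3 (weak, noisy relationship)
```

The summary statistics alone would never distinguish the two; only plotting (or a statistic that encodes the x-y relationship) shows the difference.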
Alexey Boas:
It definitely looks very useful in many different ways for different people. We've talked about data visualization for understanding data sets, and its connection to business intelligence and to data science and machine learning, but there are also different types of consumers. From a business perspective, a lot of people talk about data dashboards, the importance of having them and being able to look at them. It really feels like something with applications in different ways for several different stakeholders or interested people. Is that so?
David Colls:
I think the key thing when designing a visualization, Alexey, is the audience. Sometimes this might be a personal exercise, where an individual analyst or data scientist is working with the data and using visualization to understand it purely for themselves, but more commonly the visualization is actually going to be consumed by a wider audience. In that case, you need to think hard about the audience and what their needs are with regard to the visualization. It may be that they're people quite similar to you, other data scientists or analysts, in which case they'll come with a similar level of understanding about the visualization techniques you're using. Or it may be a specialist business or organizational audience that has particular conventions for representing data. Or it may be a very general, wide audience; we might even talk about the broader public.
David Colls:
We might even talk about data journalism in that case: how do you craft a narrative with data, in a visual way that can be understood by a wide audience? We've done some interesting work in that regard with the energy market operator here in Australia, making the inner workings of the energy market transparent, as running a transparent market was the operator's mandate. Telling that story to the public through data visualization, and the techniques you might use to tell that story to a wider audience, are quite different from what you might do when pursuing an individual visualization and understanding exercise.
Ned Letcher:
To add to that, I think what Dave said is quite right. It's really important to think about the purpose of the visualization when you're thinking about who the consumer is. There are different applications of data vis. We can talk about discovery of insights: if you're a data scientist or analyst, there's exploratory data visualization for seeing what's going on, and in that case you as the practitioner are the consumer. But we could also be talking about data vis for communication, so we could be building out dashboards or presentations. In that case, indeed, we're talking about business stakeholders, and they're going to have different needs, different levels of understanding, and they'll have ...
Ned Letcher:
I guess you really have to exercise your theory of mind, do some research, and work out what they want to get out of it. This is where we can see agile practices being important, because you don't always want to just throw the visualization at them and hope that it works. You want to actually get feedback and check that it's doing what it needs to do.
Zhamak Dehghani:
Just to contextualize this for our audience: it looks like we have to put the audience, the consumer of the data visualization, at the center of the practice. From the moment we decide how we're going to visualize the data, we have to think about who is going to be using it and what characteristics they might have. Can you put this into some examples or archetypes? Are there specific archetypes of consumers, or consuming use cases, that you could enumerate for the audience? For instance, if you're doing visualization for a data scientist, in the exploratory phase of data science where I as a data scientist am the consumer, what are the things we need to consider? Can you elaborate on a few archetypes and then deep dive into what we need to think about in terms of the approach or the characteristics of the data visualization?
Ned Letcher:
Sure. I can talk a little bit about the exploratory data analysis side of things, from data science, and then maybe Dave might have a few thoughts about the business side of things. Often you're talking about a context where you've got a new data set and you're trying to work out what's going on. And it's not just data visualization, it's the whole process. We're trying to work out: do we need to do any cleaning? Is there anything wrong with the data? So I'll open up the data set; for me, that's often in a tool like a Jupyter notebook, using Python. Visualization is one of the first steps, just to eyeball it.
Ned Letcher:
We can actually use tools to ask, "Well, is there missing data?", so we can visualize those missing values, or numerical values. At that point, you start to think about what kind of data you have. Is this time series data? Are we talking about pedestrians moving around the city, which is an example I've looked at recently? Then we're potentially talking about line charts, and it's not always clear what plot type you want. So sometimes there's a degree of experimentation, and I think that's another important thing: iterating on the visualization. You'll try something and see how well it works.
Ned Letcher:
And then you'll try again. A lot of the time these are big, high-dimensional data sets, and maybe we need to work out: is this going to be a heat map? Do we need to collapse this down? Do we need to do some cleaning? It really depends on the process. And even though, as I said before, there are different consumption patterns and here you're the consumer doing the exploratory data analysis, it's still really important to do things like adding titles to your plots and labeling them. Just like commenting your code, you don't want to come back and go, "What was this plot all about? I can see there's something going on the Y axis. Is this sales? What's it aggregated on along the X axis?" It's important to think about those little details as you're going through the analysis.
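That first-pass eyeballing can be sketched in a few lines of Python. The pedestrian counts below are made-up sample data; the point is the workflow Ned describes: check what's missing, look at rough ranges, and give even a throwaway exploratory plot a title and axis labels for "future you":

```python
import pandas as pd
import matplotlib
matplotlib.use("Agg")  # headless backend so this runs anywhere
import matplotlib.pyplot as plt

# A toy dataset standing in for a freshly loaded CSV (hypothetical values).
df = pd.DataFrame({
    "hour": [8, 9, 10, 11, 12],
    "pedestrians": [120, 340, None, 560, 410],  # note the missing reading
})

# First pass: how much is missing, and what are the rough ranges?
print(df.isna().sum())
print(df.describe())

# Label even exploratory plots, so the intent survives the session.
ax = df.plot(x="hour", y="pedestrians", marker="o", legend=False)
ax.set_title("Pedestrian counts by hour (sample sensor)")
ax.set_xlabel("Hour of day")
ax.set_ylabel("Pedestrian count")
plt.savefig("pedestrians.png")
```

In a Jupyter notebook the `print` calls would just be cell outputs, and the plot would render inline.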
David Colls:
Yeah, absolutely Ned. With that type of analysis, I'm always conscious of future Dave as the audience. Future Dave will have no idea what present Dave meant when he put that visualization together, so I really try to make everything as explicit as possible. But in the business context, it actually helps to think of visualization not as a separate activity at all, but as a kind of digital experience, and to approach delivering it to an audience as you might approach any digital experience, because visualization doesn't exist for its own sake. It's there to provide insight, to tell a story, or to inform action. So starting from the user experience perspective, understanding your audience, their pains and their gains in their regular work, is pretty key to the design of a standard dashboard. For ad hoc analysis it's different.
David Colls:
That's generally a more lightweight process, but it helps to differentiate between kinds of questions. There's the run-the-business question, what we answer every day: in a retail scenario, say, how much stock should we order based on stock levels and sales in the recent past? That's the kind of thing you would find on a run-the-business dashboard, and it can be designed as a full digital experience. Actually delivering that product with a digital tool set that lets you manage the visualization as code gives you a lot of flexibility without a lot of additional overhead. But then there are other business scenarios, more ad hoc descriptive or comparative queries like, "What happened here?" or "How does this compare to other scenarios?"
David Colls:
We don't have a lot of stock on the shelves, or you might have noticed we don't have a lot of stock on the shelves, so let's run a query and visualize exactly the parameters around that. Or: is this similar to last Monday, or last weekend, when we were also missing stock? For those ad hoc scenarios you don't necessarily invest the same amount of effort in the user experience, but it's still pretty key to understand your audience.
Zhamak Dehghani:
Yeah. I love what you just said, Dave: data visualization is just a piece of the digital experience, the digital workflows and journeys of different users, and the integration of that piece into the full journey is an important one. Sometimes, as you mentioned, that integration involves just showing some data, and sometimes it involves interacting with the data, querying and questioning, then looking at it differently to answer questions along whatever journey they're on. I think that's a wonderful framing of where it fits into the suite of applications for business.
David Colls:
And often there are different audiences for the same piece of information, in the same way that your digital delivery team might have statistics about deployment frequency that are very helpful for the team to manage its own work but need to be translated into something meaningful for the business. We see the same thing in visualization, so there might be multiple layers of an analytics visualization. For instance, you might visualize the loss or performance of a machine learning model in a way that can drive the training cycle, but then that has to be interpreted in terms the business can understand. Potentially: what is the revenue uplift as a result of this improvement in performance? Or what is the time saving or cost saving?
Alexey Boas:
Yeah, and across those different levels I also find the power of communication they bring striking. Ned was talking about communication. It reminds me of a retail client we were working with years ago who was interested in having a dashboard around inventory. It turned out they had five or six different definitions of inventory, depending on the area of the company, and they all approached it in different ways. They were all correct, because they had different perspectives on the same concept. Even driving that conversation, allowing them to look from those different perspectives at how it changes the operational process, is really very powerful.
David Colls:
Yeah. I think one of the great things about visualization is that it allows you to put things in context as well. Related to that comment about future Dave needing to make everything explicit, you can make those different definitions of inventory explicit. You can find the right visual metaphor, such as a waterfall chart, to show the components and that set-type relationship: add these things together and you get this definition; take this definition of inventory away from that other definition, and this is what you're left with as another definition. That can help people understand the relationship between things. We had a similar situation working with a call center a few years back, where there were at least seven different definitions of the duration of a call. By producing a fairly raw visualization where we recreated the flow of a call through the call center, the difference between those definitions of call duration became quite clear, along with which other events they related to through the lifecycle of the call.
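The competing-definitions problem can be made concrete with a tiny sketch. The component names and numbers below are hypothetical, but they show how two perfectly "correct" inventory figures diverge from the same underlying components, which is exactly the set-type relationship a waterfall chart would make visible:

```python
# Hypothetical components of stock for one store; all figures invented.
components = {
    "on_hand": 1000,    # physically in the warehouse
    "reserved": 150,    # allocated to open customer orders
    "in_transit": 200,  # purchased from suppliers, not yet arrived
    "damaged": 30,      # on hand but unsellable
}

# Two valid definitions of "inventory" that disagree, as in the retail example:
available_to_sell = (components["on_hand"]
                     - components["reserved"]
                     - components["damaged"])
total_owned = components["on_hand"] + components["in_transit"]

print(available_to_sell)  # 820
print(total_owned)        # 1200
```

A waterfall chart would render each subtraction or addition as a step, so stakeholders can see precisely which components separate one definition from another.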
Ned Letcher:
Yeah, that's a super interesting example, Dave. I think it gets at something, particularly when we're talking about business applications of visualization: it's not like the analyst just sits down with some data and has at it. We're often talking about the output of a somewhat involved journey of getting data ready. Organizations have data platforms, and data has to be integrated from various sources: CRMs, internal databases, web analytics and so on. Then data engineers might be involved in curating it. Only then do we finally get these nice cleaned tables that are suitable for doing visualizations.
Ned Letcher:
And that's not just a purely abstract data process. Obviously it's super important to understand what business questions we need to ask and what the key metrics of performance are, the key performance indicators or KPIs. An interesting thing there is that you often realize those definitions aren't actually clear-cut. They might be described on, say, a Confluence page in plain language, and language is often ambiguous, so people come up with different interpretations. There's actually an interesting trend I've been noticing whereby some parts of the BI process are being incorporated into the data warehouse and the modeling stages. Then you're talking about converging on single definitions of KPIs that multiple consumers can share, so the data scientists and the analysts don't end up having to recreate the implementations of KPIs, for example. It's super interesting, because we're not just talking about data as such; it's this intersection of business and data concerns, which makes it all that much more interesting and challenging.
Zhamak Dehghani:
I want to move us a little bit into tooling. One thing I noticed when we were putting the Radar together was that a lot of existing tools were resurfacing again, Bokeh among them, along with a number of newer tools like Dash and Shiny. There was this really big spectrum of tools, from simple Python libraries for quickly trying out a visualization, to full-blown enterprise web applications with visualization at the center. I wonder if you can unpack the landscape of tooling today: what flavors of tools we have and what trends you see, to guide listeners in navigating this space.
Ned Letcher:
Yeah, sure. It can be a bit overwhelming indeed, because there are so many different tools and flavors of tools out there. I think it's important to contextualize and ask, "Well, what kind of problems are we solving?" One way we can break things down, as you alluded to, is that we have this BI landscape of tools. A lot of the time people are using even Excel, but then there are tools more directed at that purpose, like Looker and Tableau and Power BI, for example. With those sorts of tools we're talking about enterprise applications that are a bit more off the shelf; they've got certain usage patterns they work well with, and there's a bit less flexibility compared with the other family of tools you mentioned, all these Python and R or even JavaScript tools.
Zhamak Dehghani:
It looks like we have a set of tools on the consumer-facing side that tack on top of existing BI, or business intelligence, packaged solutions. They give you some things out of the box easily, but they're more rigid. On the other end of the spectrum we've got small tools like D3. I'm a big fan of D3, because you can quickly put a JSON file together and get a beautiful visualization made. So you have D3, and maybe Bokeh and those tools, and something in the middle, maybe Dash and Shiny, which come with their own application model. Do you think there is an element of integration that people should care about?
Zhamak Dehghani:
I wonder how we can guide the audience to say, "This category is good for these sorts of use cases." Take the dimension of, let's say, integration: if you want to integrate your solution with an enterprise package, these are the solutions to pick; if you just want to prototype quickly, this is another set. I guess we can dissect the taxonomy of the tooling along many different dimensions here.
Ned Letcher:
Yeah, I think that's a good question. A lot of the BI tools do have the ability to call out to Python packages. What's interesting is that it's often on those enterprise tools' own terms. They might have a built-in Python environment that you don't have control over. There are problems with that: you don't know which version of Plotly or Bokeh you're getting out of the box, and it may be hard to do version control, for example. A pattern I've seen regarding integration that I think works quite well is to have an API layer slightly upstream of the BI tool. You can see this with Looker, for example: they have this LookML modeling in the warehouse, and then you can create a client or a REST API from that.
Ned Letcher:
And now we can have our Python users, our data scientists, tapping into the same models that are being fed to Looker. There are a few other tools that can help with that besides Looker. I see that as perhaps a better way of integrating those two different worlds: you define your KPIs in one place, and we can bring our own best-practice engineering tools, whether with Python or JavaScript.
David Colls:
I tend to look at the ecosystem at a higher level; I don't have Ned's depth of experience with the current crop of tools. I tend to think in terms of one axis for the data: are we using well-understood data with well-understood requirements, through to a more exploratory type of analysis, or new data that we don't understand well? Then, when it comes to the presentation: are we going to use standardized presentation techniques, the types of things you might find in your Excel chart library or in Tableau out of the box, or do we actually need quite customized presentations to help us explore data or tell a story in a unique way?
David Colls:
And so that gives me a quadrant diagram, if you can just visualize that in your head for me. In the four quadrants of that diagram, we could talk about the operational use cases, as I mentioned before with the BI dashboard: well-understood data and standard presentation. Or we could look at the power-user scenario. Self-service is becoming a much bigger driver for business analytics these days as well; this is more of the ad hoc analysis, questioning what's happening with the data. The power user might be looking at some new data, but through the standard lenses of the charts available in a typical library.
Zhamak Dehghani:
Yeah, I love it. I can see from this conversation there's a multidimensional matrix in my head: if I'm a power user and I don't really know what the data is, versus I'm an end user who wants some pre-modeled visualization accessible through an API, this is the landscape. I'm curious, though: is there another axis here around data-science-oriented workflows where you deal with a large amount of data, the size and volume of the data needed, or any other aspects of data science we should think about when we pick a tool? And if you have examples, what could be suitable for those use cases?
Ned Letcher:
Yeah. I think this really picks up on something Dave was just alluding to. Data size and volume is certainly part of it, but even before that, it's what you're trying to do with it. In the context of data science, it's often a bit more novel; you haven't done it before. There's this exploratory sense I talked about, and we're talking about statistical techniques, so we may need plot types that you might not expect to get from your BI tools. Coming from a data science perspective myself, I was using a BI charting tool recently, and I went to plot a histogram and realized it didn't actually have one, which surprised me, because that seems like a basic chart type. But it turns out that for a lot of business use cases, it's not something you're likely to see as much in a dashboard or an automated-reporting kind of scenario.
Ned Letcher:
I think that's a particular dimension: just what the use cases are, and that divides things up a little bit. Then we can go a bit deeper. If your use cases are a little more statistically oriented, there are some packages that will work better. For example Seaborn, a Python library built on top of matplotlib, has its genesis in supporting interesting heat maps with dendrograms, so you get these tree structures over the top of the heat maps, where you can pick apart relationships between clusters. I've used that before when analyzing email agents answering different calls and responding to different chats on different topics.
Ned Letcher:
We were trying to work out the patterns between the texts you see in those. That was a natural language processing use case: we were doing topic modeling and trying to get an intuitive sense of the relationships between the topics, which we used a heat map to light up. So it really depends on the use case: what sorts of plot types are you going to be using? I mentioned time series before; that's obviously super common, as is predictive forecasting, whether of sales or something else. And then sometimes you will need tools to help you with massive data sets. A common approach there is to down-sample, to just take a smaller subset of the data that will fit in memory.
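The down-sampling Ned mentions can be as simple as this sketch in plain Python (the million-point dataset here is synthetic). Random sampling preserves the overall distribution; a stride-based slice keeps temporal order, which matters for time series:

```python
import random

# Hypothetical "large" dataset: a million (index, value) readings.
random.seed(42)
big = [(i, random.gauss(0, 1)) for i in range(1_000_000)]

# Down-sample to something a plotting library renders comfortably.
# random.sample draws without replacement, keeping the distribution's shape.
sample = random.sample(big, 10_000)

# Stride-based alternative: every 100th point, preserving order,
# which is usually the better choice for time series.
every_nth = big[::100]

print(len(sample), len(every_nth))  # 10000 10000
```

The trade-off is that any down-sampling can hide rare outliers, which is exactly the case where tools that render every point earn their keep.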
Ned Letcher:
But if it's important that you visualize every single point, there are tools out there. Datashader is a really good one in the Python landscape for doing that, and it can even use GPU rendering. One of the demos I've seen is a visualization of the US census, where it actually renders each point depending on the answer to a question. I think there are some basic sorts of tools that will help you do a lot of general-purpose data science and analytics, but then you can get really specific for different use cases.
Zhamak Dehghani:
No pressure, but I can see an article David and Ned could co-author, mapping the landscape to its different use cases.
David Colls:
Absolutely the data visualization map.
Ned Letcher:
Visualizing the visualization landscape.
Alexey Boas:
On that note, we've talked about several different tools and different use cases. I'm curious how you see the BI landscape changing. Ned, you talked a little bit about incorporating the BI layer into the data warehouse. Is there still a place for the more traditional BI dashboards? We talked about operational dashboards, and some of these tools have evolved in different ways, but we still have some of the more traditional players together with new players. How do you see that landscape moving, especially if we consider the new sets of managed services that cloud providers offer, and lots of new, shiny things? How do you see that part of the landscape evolving?
Ned Letcher:
Yeah, there are a few interesting trends I've been trying to unpack. I think one big one is that most of these tools are now cloud-based, something you dial up in a browser. Obviously you can use Excel locally, but even that now has a cloud version. As far as I can tell, there might be some aspects of Power BI where you need the local client, but even then it has a web version; otherwise, the rest is in the cloud. One interesting trend, and I'm not yet sure I'd call it a trend, is questioning the value of dashboards, which is interesting. I've seen a number of Twitter threads of practitioners groaning, "Ah, I've been given yet another dashboard to make." They've been asked to create a dashboard showing a handful of KPIs, whether it's sales or who knows what.
Ned Letcher:
The stakeholders have asked for it, then maybe the practitioners don't really hear from the stakeholder again after they deliver it, and it's not clear if it's being used. I think there are a few things there. One is that I hear people saying, "Well, maybe you don't always need a dashboard." Maybe sometimes people just want to feel like they have agency or control; it makes them feel safe seeing these metrics ticking along, when really all they want to do is log in and go, "Yeah, everything's green. We're good." In that situation, it sounds like what you really need is an alerting system that lets you know when an important KPI goes red, via a Slack message or whatever.
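The alert-instead-of-dashboard idea Ned describes can be sketched in a few lines. The KPI names and thresholds below are made up for illustration, and in practice the alert messages would go to Slack or a pager rather than stdout:

```python
# Hypothetical KPI floors; in real use these would come from the warehouse's
# shared metric definitions, not be hard-coded here.
THRESHOLDS = {"daily_sales": 10_000, "conversion_rate": 0.02}

def check_kpis(metrics: dict) -> list:
    """Return alert messages for any KPI that falls below its threshold."""
    alerts = []
    for name, floor in THRESHOLDS.items():
        value = metrics.get(name)
        if value is not None and value < floor:
            alerts.append(f"ALERT: {name}={value} is below threshold {floor}")
    return alerts

# All green: nothing to report, no dashboard needed.
print(check_kpis({"daily_sales": 12_500, "conversion_rate": 0.03}))  # []

# One KPI goes red: this is where you'd notify Slack and start digging in.
print(check_kpis({"daily_sales": 8_000, "conversion_rate": 0.03}))
```

The dashboard's job of "confirm everything is fine" collapses into a check that only speaks up when something isn't, which is when the richer ad hoc visualization actually earns its place.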
Ned Letcher:
Then we can actually go and dig into what's going on. I guess another related trend is that some BI services are moving more towards the notebook side of things, which became popular out of data science: R Markdown notebooks, and Jupyter notebooks in Python and other languages. That's a really great way of doing what's often referred to as literate coding, where you have documentation inline with the code that produces the output. It's obviously particularly good for data vis, because you can produce the outputs of the code and have documentation describing them. Some people are saying this might work well in a BI context too, where we can have these interactive documents, somewhere between a living document and a dashboard, that talk about what's going on. That's an interesting trend, and I'll be curious to see how it plays out.
Zhamak Dehghani:
The point you made about these kinds of ad hoc dashboards, this dashboard or that dashboard, I think comes back to David's comment earlier that we should see these as part of the digital experience, not just some ad hoc point solution. What does that experience look like? I remember earlier in the year, when COVID was at its peak, there was a group of data scientists and folks at a big university hospital here in the US whose job was figuring out what dashboards they had representing the status of COVID, the hospital's responses, the availability of beds and so on. They cleaned that up and then built another, holistic dashboard. I think this kind of dashboard cleanup is becoming a whole field of its own, and I love it. I just go back to David's comment: I love the idea that this is a digital experience. We should think about it the same way we're modernizing our businesses and organizations, plugging visualization in where it belongs.
David Colls:
Yeah, I loved Ned's comment about "just another dashboard." I feel like it comes back to a fundamental principle that Ned and I were talking about the other day: maximizing the data-ink ratio. If you only need a binary indicator, then you don't need a whole dashboard. A classic example: I was thinking back to the days when the build indicator used to be a single build light rather than a big dashboard. That was a great way of rendering data so that it could be intuitively consumed. If it's green, go about your business. If it's red, we need to fix the build. I think it comes back to those fundamental principles of design.
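[Editor's note: a trivial sketch of the build-light principle David mentions. The check names and the `build_light` helper are hypothetical; the point is collapsing the data down to the single binary signal the team actually needs.]

```python
# Maximum data-ink ratio for a binary question: one light, not a dashboard.
def build_light(check_results):
    """Reduce a set of CI check outcomes to a single red/green indicator."""
    return "green" if all(check_results.values()) else "red"

# Hypothetical CI checks for one build.
checks = {"unit-tests": True, "lint": True, "integration": False}
print(build_light(checks))  # the whole "dashboard" is a single word
```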
David Colls:
And in this case, with the data-ink ratio, if you only have a binary signal, then you don't need an entire dashboard to convey it. But in terms of the trends around BI, I think there are also organizational trends playing into this, especially the self-service aspect of BI. Do we need pre-canned dashboards, or do we actually need autonomous teams that have the capability? It doesn't need to reside in one individual, and it doesn't need to be the case that every individual in the organization can do their own ad hoc analytics and data visualization, but as autonomous teams responsible for business outcomes, can we consume, navigate, understand and present the data that we need? And then there might be a range of different tools that come into play to do that, from across the spectrum Ned described: from point-and-click visualizations through to more coded ecosystems, and then high-performance visualizations where required as well.
Ned Letcher:
I think that's a really good point in terms of empowering teams to answer their own questions, especially if your organization has done the legwork to invest in a good data platform. People talk about the data value pyramid, where you need to get the foundations right. Up at the top we might have interventions and insights, maybe even machine learning, and further down we've got charting and visualization, but those themselves require a good foundation: does the data flow through the organization in a timely and accurate manner? Is it prepared in a way that's ready to use? If you do all that work and your team is empowered to access that data, then they can actually pose and answer their own questions. I think that's another trend, where some of these tools are allowing analysts to move closer to the engineering side of things.
Ned Letcher:
If they have the right abstractions in place, we're not asking them to code up their own platform, but giving them the right tools. I think that's another interesting trend. But then, related to that: what about smaller organizations? Maybe you don't have the resources to invest in one of these amazing data platforms, especially when you're earlier in your journey. There are actually a few service providers I've seen that are trying to provide a bit more of an integrated solution: it's self-service BI, it also runs in your data warehouse, and so it's trying to be a little bit of a data platform plus BI layer out of the box. And if you do a little bit of work setting it up, then perhaps there are small organizations that can actually ... And this is a question I hear people talking about: maybe below a certain size of organization, you don't need a data engineer, which some people might find controversial. I think that's another interesting trend.
Zhamak Dehghani:
We're coming up on the top of the hour. If you could leave the audience with one tip or recommendation, what would that be? There have been plenty of gems throughout the conversation, but is there one last comment that you want to leave the audience with?
David Colls:
For me it's: you'll never get the visualization right the first time. It's an iterative process. Along the way, the visualization helps you discover questions, assumptions and misapprehensions that you had. In the same way that running your code will show you exactly what you misunderstood about the problem and the solution, producing a visualization will show you what you misunderstood about the data and how it looks. Iterate, and the end result will get better.
Ned Letcher:
Yeah, absolutely. I agree with Dave: iterate, iterate, iterate. It's never the first plot that gets you the value you're after, it's always a few in. There are unknown unknowns; you don't know what you're going to find. Some tips from me. One of the big ones, since we're talking about visualizing: just remember to plot your data. Sometimes it's tempting to rely on summary statistics, but they can mislead. Always plot it, eyeball it and sanity check. And in particular, when you're using it for communication, think about the consumer. Remember that they don't have what's in your head; they may not know. You've had the perspective of looking at all the data, so really try to think about: what do I need to title the plot? What do I need to put on the axes? We have diverse sets of users, so it's important to pay attention to accessibility concerns: use color palettes that won't be difficult for people with color blindness, and there are tools out there to grab them.
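[Editor's note: Ned's "plot your data" tip is the lesson of Anscombe's quartet, four datasets with nearly identical summary statistics that look completely different once plotted. This stdlib sketch shows three of the published quartet's y-series sharing the same mean and variance, even though, plotted, one is linear with noise, one is a smooth curve, and one has an outlier.]

```python
from statistics import mean, variance

# Anscombe's quartet (sets I-III share the same x values).
x = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
ys = {
    "I":   [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68],
    "II":  [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74],
    "III": [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73],
}

# The summary statistics are nearly identical -- you'd never guess from
# these numbers alone how differently the three datasets are shaped.
for name, y in ys.items():
    print(f"set {name}: mean={mean(y):.2f} var={variance(y):.2f}")
```

Only a plot (or at least an eyeball pass over the raw values) reveals the difference, which is exactly the sanity check being recommended.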
Ned Letcher:
And there's a whole literature on that side of things. And then, have fun. I love that discovery moment where you plot it and you go, "Oh, wow. I didn't know that happened." Those are the moments that I love.
Zhamak Dehghani:
Wonderful. Well, thank you, thank you for that. It sounds like, from where we ended, we could have a whole other podcast just on tips and tricks around data visualization, but we're going to stop here today. Thank you for joining us.
Ned Letcher:
My pleasure.
David Colls:
Thanks, thanks Alexey.