Brief summary
A vast array of powerful data visualization tools, for instance D3, Bokeh, Shiny and Dash, is gaining traction in enterprises looking to make sense of their data sets. In this episode, our team explores the concept of data visualization as part of a complete digital experience, alongside the workflows and journeys of a wide variety of users.
Podcast transcript
Zhamak Dehghani:
Hi, everyone. Welcome to another episode of Thoughtworks Technology podcast. I'm one of your regular hosts Zhamak and I'm here today with my co-host Alexey. Hi, Alexey.
Alexey Boas:
Hello Zhamak. I'm Alexey. I'm head of technology for Brazil, and it's a great pleasure to be here with you all.
Zhamak Dehghani:
Wonderful. Today we're going to talk about a topic that's actually close to my heart: data visualization. The reason we picked this topic is that we had a lot of conversations around visualization in general over the last few months, when we were putting the latest version of our Tech Radar together. It looks as if the technology landscape, the developer ecosystem, is becoming more and more complex, and a picture, a visualization, goes a long way in taking away that kind of cognitive load we're all feeling. Our discussions spanned, I guess, a spectrum of visualization tools around architecture visualization, infrastructure visualization, and data visualization. Today we have two guests with us, David and Ned, to deep dive into the data visualization landscape. Hi, David.
David Colls:
Hi Zhamak. Thanks for having me.
Zhamak Dehghani:
Of course. Can you tell our audience a few things about yourself? I think they might recognize your voice, because you're becoming one of our regular guests.
David Colls:
It's a pleasure to be back again. I lead the data and AI practice for Thoughtworks Australia, and I'm a visualization enthusiast from way back, keen to explore how we can apply it to all aspects of our work.
Zhamak Dehghani:
Wonderful to have you. Hi Ned.
Ned Letcher:
Hello. It's great to be here.
Zhamak Dehghani:
Wonderful. Can you say a few things to our audience about yourself?
Ned Letcher:
Yeah, sure. I'm a data science engineer at Thoughtworks in Australia. I started earlier this year and since joining, I subscribed to this podcast and I've been listening and really enjoying it. It's pretty cool to be on the other side.
Zhamak Dehghani:
Wonderful. We have a multi-continental episode today, from Brazil, Australia and the US. It would be great to deep dive into what data visualization is about.
Alexey Boas:
Yeah. And maybe that's a good starting point, because we've been hearing a lot about data visualization across several different use cases; it's almost an umbrella term. David and Ned, what do we mean by data visualization? What is it, and why is it important at all?
David Colls:
When I think about it, Alexey, I think about rendering data in a form that can be consumed intuitively as well as analytically. Often we do need to look at visualization through an analytical lens, to use it to understand and quantify problems and effects. But the intuitive aspect of data visualization, I think, is what makes it so powerful, because it allows us to spot anomalies and discrepancies easily without having to engage deeper thinking. It also prompts us to ask "why?" type questions. Visualization helps us discover questions as well as produce answers to those questions, and that's where I think the power of data visualization lies.
Ned Letcher:
Yeah. It's a really useful tool, super important to a lot of what we do when working with data and getting value from it. For me, one of the big reasons why it's important is that we're trying to understand what's going on with data; you want to get a sense of what the data tells you. We can't understand an entire data set intuitively, so what we're often trying to do is collapse it down to something we can get a handle on. One way people do that is summary statistics, like means, medians and those sorts of things. But they can often be deceptive.
Ned Letcher:
The means of two datasets can be quite similar, and even things like standard deviation, which is a bit more of a statistical measure, can mislead you. There's actually a really cool example of this called Anscombe's quartet: a bunch of different datasets that all have the same high-level statistics, but it turns out when you plot them, they look radically different. They're circles and weird lines that are totally different. It's just super important to actually get the data, look at it, and see what's going on.
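The effect Ned describes is easy to reproduce. Here's a minimal sketch, using made-up numbers rather than Anscombe's actual quartet, of two series with identical means and standard deviations that nonetheless tell very different stories once you look at their relationship to x:

```python
import statistics

x = [1, 2, 3, 4, 5]
y_linear = [2.0, 4.0, 6.0, 8.0, 10.0]    # y = 2x: a clean straight line
y_shuffled = [6.0, 10.0, 2.0, 8.0, 4.0]  # the same values, reordered

# Identical summary statistics...
assert statistics.mean(y_linear) == statistics.mean(y_shuffled)
assert statistics.stdev(y_linear) == statistics.stdev(y_shuffled)

# ...but very different relationships to x, which a scatter plot
# would reveal instantly. Pearson correlation, computed by hand:
def correlation(xs, ys):
    mx, my = statistics.mean(xs), statistics.mean(ys)
    cov = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    sx = sum((a - mx) ** 2 for a in xs) ** 0.5
    sy = sum((b - my) ** 2 for b in ys) ** 0.5
    return cov / (sx * sy)

print(correlation(x, y_linear))    # 1.0  (perfectly linear)
print(correlation(x, y_shuffled))  # -0.3 (weak, noisy relationship)
```

The summary statistics alone would never distinguish the two; only plotting (or a statistic that encodes the x-y relationship) shows the difference.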
Alexey Boas:
It definitely looks very useful in many different ways for different people. We've talked about data visualization for understanding data sets, and its connection to business intelligence and to data science and machine learning, but there are also different types of consumers. From a business perspective, a lot of people talk about data dashboards, the importance of having them and being able to look at them. It really feels like something with applications in different ways for several different stakeholders or interested people. Is that so?
David Colls:
I think the key thing when designing a visualization, Alexey, is the audience. Sometimes this might be a personal exercise, where an individual analyst or data scientist is working with the data and using visualization to understand it purely for themselves, but more commonly the visualization is actually going to be consumed by a wider audience. In that case, you need to think hard about the audience and what their needs are with regard to the visualization. It may be that they're people quite similar to you, other data scientists or analysts, in which case they'll come with a similar level of understanding about the visualization techniques you're using. Or it may be a specialist business or organizational audience that has particular conventions for representing data. Or it may be a very general, wide audience; we might even talk about the broader public.
David Colls:
We might even talk about data journalism in that case: how do you craft a narrative with data, in a visual way that can be understood by a wide audience? We've done some interesting work in that regard with the energy market operator here in Australia, making the inner workings of the energy market transparent, as running a transparent market was the operator's mandate. Telling that story to the public through data visualization, and the techniques you might use to tell that story to a wider audience, are quite different from what you might do when pursuing an individual visualization and understanding exercise.
Ned Letcher:
To add to that, I think what Dave said is quite right. It's really important to think about the purpose of the visualization when you're thinking about who the consumer is. There are different applications of data vis. We can talk about discovery of insights: if you're a data scientist or analyst, there's exploratory data visualization for seeing what's going on, and in that case you as the practitioner are the consumer. But we could also be talking about data vis for communication, so we could be building out dashboards or presentations. In that case, indeed, we're talking about business stakeholders, and they're going to have different needs, different levels of understanding, and they'll have ...
Ned Letcher:
I guess you really have to exercise your theory of mind, do some research, and work out what they want to get out of it. This is where we can see agile practices being important, because you don't always want to just throw the visualization at them and hope that it works. You want to actually get feedback and check that it's doing what it needs to do.
Zhamak Dehghani:
Just to contextualize this for our audience: it looks like we have to put the audience, the consumer of the data visualization, at the center of the practice. From the moment we decide how we're going to visualize the data, we have to think about who is going to be using it and what characteristics they might have. Can you put this into some examples or archetypes? Are there specific archetypes of consumers, or consuming use cases, that you could enumerate for the audience? For instance, if you're doing visualization for a data scientist, in the exploratory phase of data science where I as a data scientist am the consumer, what are the things we need to consider? Can you elaborate on a few archetypes and then deep dive into what we need to think about in terms of the approach or the characteristics of the data visualization?
Ned Letcher:
Sure. I can talk a little bit about the exploratory data analysis side of things, from data science, and then maybe Dave might have a few thoughts about the business side of things. Often you're talking about a context where you've got a new data set and you're trying to work out what's going on. And it's not just data visualization, it's the whole process. We're trying to work out: do we need to do any cleaning? Is there anything wrong with the data? So I'll open up the data set; for me, that's often in a tool like a Jupyter notebook, using Python. Visualization is one of the first steps, just to eyeball it.
Ned Letcher:
We can actually use tools to ask, "Well, is there missing data?", so we can visualize those missing values, or numerical values. At that point, you start to think about what kind of data you have. Is this time series data? Are we talking about pedestrians moving around the city, which is an example I've looked at recently? Then we're potentially talking about line charts, and it's not always clear what plot type you want. So sometimes there's a degree of experimentation, and I think that's another important thing: iterating on the visualization. You'll try something and see how well it works.
Ned Letcher:
And then you'll try again. A lot of the time these are big, high-dimensional data sets, and maybe we need to work out: is this going to be a heat map? Do we need to collapse this down? Do we need to do some cleaning? It really depends on the process. And even though, as I said before, there are different consumption patterns and here you're the consumer doing the exploratory data analysis, it's still really important to do things like adding titles to your plots and labeling them. Just like commenting your code, you don't want to come back and go, "What was this plot all about? I can see there's something going on the Y axis. Is this sales? What's it aggregated on along the X axis?" It's important to think about those little details as you're going through the analysis.
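That first-pass eyeballing can be sketched in a few lines of Python. The pedestrian counts below are made-up sample data; the point is the workflow Ned describes: check what's missing, look at rough ranges, and give even a throwaway exploratory plot a title and axis labels for "future you":

```python
import pandas as pd
import matplotlib
matplotlib.use("Agg")  # headless backend so this runs anywhere
import matplotlib.pyplot as plt

# A toy dataset standing in for a freshly loaded CSV (hypothetical values).
df = pd.DataFrame({
    "hour": [8, 9, 10, 11, 12],
    "pedestrians": [120, 340, None, 560, 410],  # note the missing reading
})

# First pass: how much is missing, and what are the rough ranges?
print(df.isna().sum())
print(df.describe())

# Label even exploratory plots, so the intent survives the session.
ax = df.plot(x="hour", y="pedestrians", marker="o", legend=False)
ax.set_title("Pedestrian counts by hour (sample sensor)")
ax.set_xlabel("Hour of day")
ax.set_ylabel("Pedestrian count")
plt.savefig("pedestrians.png")
```

In a Jupyter notebook the `print` calls would just be cell outputs, and the plot would render inline.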
David Colls:
Yeah, absolutely Ned. With that type of analysis, I'm always conscious of future Dave as the audience. Future Dave will have no idea what present Dave meant when he put that visualization together, so I really try to make everything as explicit as possible. But in the business context, it actually helps to think of visualization not as a separate activity at all, but as a kind of digital experience, and to approach delivering it to an audience as you might approach any digital experience, because visualization doesn't exist for its own sake. It's there to provide insight, to tell a story, or to inform action. So starting from the user experience perspective, understanding your audience, their pains and their gains in their regular work, is pretty key to the design of a standard dashboard. For ad hoc analysis it's different.
David Colls:
That's generally a more lightweight process, but it helps to differentiate between kinds of questions. There's the run-the-business question, what we answer every day: in a retail scenario, say, how much stock should we order based on stock levels and sales in the recent past? That's the kind of thing you would find on a run-the-business dashboard, and it can be designed as a full digital experience. Actually delivering that product with a digital tool set that lets you manage the visualization as code gives you a lot of flexibility without a lot of additional overhead. But then there are other business scenarios, more ad hoc descriptive or comparative queries like, "What happened here?" or "How does this compare to other scenarios?"
David Colls:
We don't have a lot of stock on the shelves, or you might have noticed we don't have a lot of stock on the shelves, so let's run a query and visualize exactly the parameters around that. Or: is this similar to last Monday, or last weekend, when we were also missing stock? For those ad hoc scenarios you don't necessarily invest the same amount of effort in the user experience, but it's still pretty key to understand your audience.
Zhamak Dehghani:
Yeah. I love what you just said, Dave: data visualization is just a piece of the digital experience, the digital workflows and journeys of different users, and the integration of that piece into the full journey is an important one. Sometimes, as you mentioned, that integration involves just showing some data, and sometimes it involves interacting with the data, querying and questioning, then looking at it differently to answer questions along whatever journey they're on. I think that's a wonderful framing of where it fits into the suite of applications for business.
David Colls:
And often there are different audiences for the same piece of information, in the same way that your digital delivery team might have statistics about deployment frequency that are very helpful for the team to manage its own work but need to be translated into something meaningful for the business. We see the same thing in visualization, so there might be multiple layers of an analytics visualization. For instance, you might visualize the loss or performance of a machine learning model in a way that can drive the training cycle, but then that has to be interpreted in terms the business can understand. Potentially: what is the revenue uplift as a result of this improvement in performance? Or what is the time saving or cost saving?
Alexey Boas:
Yeah, and across those different levels I also find the power of communication they bring striking. Ned was talking about communication. It reminds me of a retail client we were working with years ago who was interested in having a dashboard around inventory. It turned out they had five or six different definitions of inventory, depending on the area of the company, and they all approached it in different ways. They were all correct, because they had different perspectives on the same concept. Even driving that conversation, allowing them to look from those different perspectives at how it changes the operational process, is really very powerful.
David Colls:
Yeah. I think one of the great things about visualization is that it allows you to put things in context as well. Related to that comment about future Dave needing to make everything explicit, you can make those different definitions of inventory explicit. You can find the right visual metaphor, such as a waterfall chart, to show the components and that set-type relationship: add these things together and you get this definition; take this definition of inventory away from that other definition, and this is what you're left with as another definition. That can help people understand the relationship between things. We had a similar situation working with a call center a few years back, where there were at least seven different definitions of the duration of a call. By producing a fairly raw visualization where we recreated the flow of a call through the call center, the difference between those definitions of call duration became quite clear, along with which other events they related to through the lifecycle of the call.
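The competing-definitions problem can be made concrete with a tiny sketch. The component names and numbers below are hypothetical, but they show how two perfectly "correct" inventory figures diverge from the same underlying components, which is exactly the set-type relationship a waterfall chart would make visible:

```python
# Hypothetical components of stock for one store; all figures invented.
components = {
    "on_hand": 1000,    # physically in the warehouse
    "reserved": 150,    # allocated to open customer orders
    "in_transit": 200,  # purchased from suppliers, not yet arrived
    "damaged": 30,      # on hand but unsellable
}

# Two valid definitions of "inventory" that disagree, as in the retail example:
available_to_sell = (components["on_hand"]
                     - components["reserved"]
                     - components["damaged"])
total_owned = components["on_hand"] + components["in_transit"]

print(available_to_sell)  # 820
print(total_owned)        # 1200
```

A waterfall chart would render each subtraction or addition as a step, so stakeholders can see precisely which components separate one definition from another.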
Ned Letcher:
Yeah, that's a super interesting example, Dave. I think it gets at something, particularly when we're talking about business applications of visualization: it's not like the analyst just sits down with some data and has at it. We're often talking about the output of a somewhat involved journey of getting data ready. Organizations have data platforms, and data has to be integrated from various sources: CRMs, internal databases, web analytics and so on. Then data engineers might be involved in curating it. Only then do we finally get these nice cleaned tables that are suitable for doing visualizations.
Ned Letcher:
And that's not just a purely abstract data process. Obviously it's super important to understand what business questions we need to ask and what the key metrics of performance are, the key performance indicators or KPIs. An interesting thing there is that you often realize those definitions aren't actually clear-cut. They might be described on, say, a Confluence page in plain language, and language is often ambiguous, so people come up with different interpretations. There's actually an interesting trend I've been noticing whereby some parts of the BI process are being incorporated into the data warehouse and the modeling stages. Then you're talking about converging on single definitions of KPIs that multiple consumers can share, so the data scientists and the analysts don't end up having to recreate the implementations of KPIs, for example. It's super interesting, because we're not just talking about data as such; it's this intersection of business and data concerns, which makes it all that much more interesting and challenging.
Zhamak Dehghani:
I want to move us a little bit into tooling. One thing I noticed when we were putting the Radar together was that a lot of existing tools were resurfacing again, Bokeh among them, along with a number of newer tools like Dash and Shiny. There was this really big spectrum of tools, from simple Python libraries for quickly trying out a visualization, to full-blown enterprise web applications with visualization at the center. I wonder if you can unpack the landscape of tooling today: what flavors of tools we have and what trends you see, to guide listeners in navigating this space.
Ned Letcher:
Yeah, sure. It can be a bit overwhelming indeed, because there are so many different tools and flavors of tools out there. I think it's important to contextualize and ask, "Well, what kind of problems are we solving?" One way we can break things down, as you alluded to, is that we have this BI landscape of tools. A lot of the time people are using even Excel, but then there are tools more directed at that purpose, like Looker and Tableau and Power BI, for example. With those sorts of tools we're talking about enterprise applications that are a bit more off the shelf; they've got certain usage patterns they work well with, and there's a bit less flexibility compared with the other family of tools you mentioned, all these Python and R or even JavaScript tools.
Zhamak Dehghani:
It looks like we have a set of tools on the consumer-facing side that tack on top of existing BI, or business intelligence, packaged solutions. They give you some things out of the box easily, but they're more rigid. On the other end of the spectrum we've got small tools like D3. I'm a big fan of D3, because you can quickly put a JSON file together and get a beautiful visualization made. So you have D3, and maybe Bokeh and those tools, and something in the middle, maybe Dash and Shiny, which come with their own application model. Do you think there is an element of integration that people should care about?
Zhamak Dehghani:
I wonder how we can guide the audience to say, "This category is good for these sorts of use cases." Take the dimension of, let's say, integration: if you want to integrate your solution with an enterprise package, these are the solutions to pick; if you just want to prototype quickly, this is another set. I guess we can dissect the taxonomy of the tooling along many different dimensions here.
Ned Letcher:
Yeah, I think that's a good question. A lot of the BI tools do have the ability to call out to Python packages. What's interesting is that it's often on those enterprise tools' own terms. They might have a built-in Python environment that you don't have control over. There are problems with that: you don't know which version of Plotly or Bokeh you're getting out of the box, and it may be hard to do version control, for example. A pattern I've seen regarding integration that I think works quite well is to have an API layer slightly upstream of the BI tool. You can see this with Looker, for example: they have this LookML modeling in the warehouse, and then you can create a client or a REST API from that.
Ned Letcher:
And now we can have our Python users, our data scientists, tapping into the same models that are being fed to Looker. There are a few other tools that can help with that besides Looker. I see that as perhaps a better way of integrating those two different worlds: you define your KPIs in one place, and we can bring our own best-practice engineering tools, whether with Python or JavaScript.
David Colls:
I tend to look at the ecosystem at a higher level; I don't have Ned's depth of experience with the current crop of tools. I tend to think in terms of one axis for the data: are we using well-understood data with well-understood requirements, through to a more exploratory type of analysis, or new data that we don't understand well? Then, when it comes to the presentation: are we going to use standardized presentation techniques, the types of things you might find in your Excel chart library or in Tableau out of the box, or do we actually need quite customized presentations to help us explore data or tell a story in a unique way?
David Colls:
And so that gives me a quadrant diagram, if you can just visualize that in your head for me. In the four quadrants of that diagram, we could talk about the operational use cases, as I mentioned before with the BI dashboard: well-understood data and standard presentation. Or we could look at the power-user scenario. Self-service is becoming a much bigger driver for business analytics these days as well; this is more of the ad hoc analysis, questioning what's happening with the data. The power user might be looking at some new data, but through the standard lenses of the charts available in a typical library.
Zhamak Dehghani:
Yeah, I love it. I can see from this conversation there's a multidimensional matrix in my head: if I'm a power user and I don't really know what the data is, versus I'm an end user who wants some pre-modeled visualization accessible through an API, this is the landscape. I'm curious, though: is there another axis here around data-science-oriented workflows where you deal with a large amount of data, the size and volume of the data needed, or any other aspects of data science we should think about when we pick a tool? And if you have examples, what could be suitable for those use cases?
Ned Letcher:
Yeah. I think this really picks up on something Dave was just alluding to. Data size and volume is certainly part of it, but even before that, it's what you're trying to do with it. In the context of data science, it's often a bit more novel; you haven't done it before. There's this exploratory sense I talked about, and we're talking about statistical techniques, so we may need plot types that you might not expect to get from your BI tools. Coming from a data science perspective myself, I was using a BI charting tool recently, and I went to plot a histogram and realized it didn't actually have one, which surprised me, because that seems like a basic chart type. But it turns out that for a lot of business use cases, it's not something you're likely to see as much in a dashboard or an automated-reporting kind of scenario.
Ned Letcher:
I think that's a particular dimension: just what the use cases are, and that divides things up a little bit. Then we can go a bit deeper. If your use cases are a little more statistically oriented, there are some packages that will work better. For example Seaborn, a Python library built on top of matplotlib, has its genesis in supporting interesting heat maps with dendrograms, so you get these tree structures over the top of the heat maps, where you can pick apart relationships between clusters. I've used that before when analyzing email agents answering different calls and responding to different chats on different topics.
Ned Letcher:
We were trying to work out the patterns between the texts you see in those. That was a natural language processing use case: we were doing topic modeling and trying to get an intuitive sense of the relationships between the topics, which we used a heat map to light up. So it really depends on the use case: what sorts of plot types are you going to be using? I mentioned time series before; that's obviously super common, as is predictive forecasting, whether of sales or something else. And then sometimes you will need tools to help you with massive data sets. A common approach there is to down-sample, to just take a smaller subset of the data that will fit in memory.
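The down-sampling Ned mentions can be as simple as this sketch in plain Python (the million-point dataset here is synthetic). Random sampling preserves the overall distribution; a stride-based slice keeps temporal order, which matters for time series:

```python
import random

# Hypothetical "large" dataset: a million (index, value) readings.
random.seed(42)
big = [(i, random.gauss(0, 1)) for i in range(1_000_000)]

# Down-sample to something a plotting library renders comfortably.
# random.sample draws without replacement, keeping the distribution's shape.
sample = random.sample(big, 10_000)

# Stride-based alternative: every 100th point, preserving order,
# which is usually the better choice for time series.
every_nth = big[::100]

print(len(sample), len(every_nth))  # 10000 10000
```

The trade-off is that any down-sampling can hide rare outliers, which is exactly the case where tools that render every point earn their keep.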
Ned Letcher:
But if it's important that you visualize every single point, there are tools out there. Datashader is a really good one in the Python landscape for doing that, and it can even use GPU rendering. One of the demos I've seen is a visualization of the US census, where it actually renders each point depending on the answer to a question. I think there are some basic sorts of tools that will help you do a lot of general-purpose data science and analytics, but then you can get really specific for different use cases.
Zhamak Dehghani:
No pressure, but I can see an article David and Ned could co-author, mapping the landscape to its different use cases.
David Colls:
Absolutely the data visualization map.
Ned Letcher:
Visualizing the visualization landscape.
Alexey Boas:
On that note, we've talked about several different tools and different use cases. I'm curious how you see the BI landscape changing. Ned, you talked a little bit about incorporating the BI layer into the data warehouse. Is there still a place for the more traditional BI dashboards? We talked about operational dashboards, and some of these tools have evolved in different ways, but we still have some of the more traditional players together with new players. How do you see that landscape moving, especially if we consider the new sets of managed services that cloud providers offer, and lots of new, shiny things? How do you see that part of the landscape evolving?
Ned Letcher:
Yeah, there are a few interesting trends I've been trying to unpack. I think one big one is that most of these tools are now cloud-based, something you dial up in a browser. Obviously you can use Excel locally, but even that now has a cloud version. As far as I can tell, there might be some aspects of Power BI where you need the local client, but even then it has a web version; otherwise, the rest is in the cloud. One interesting trend, and I'm not yet sure I'd call it a trend, is questioning the value of dashboards, which is interesting. I've seen a number of Twitter threads of practitioners groaning, "Ah, I've been given yet another dashboard to make." They've been asked to create a dashboard showing a handful of KPIs, whether it's sales or who knows what.
Ned Letcher:
The stakeholders have asked for it, then maybe the practitioners don't really hear from the stakeholder again after they deliver it, and it's not clear if it's being used. I think there are a few things there. One is that I hear people saying, "Well, maybe you don't always need a dashboard." Maybe sometimes people just want to feel like they have agency or control; it makes them feel safe seeing these metrics ticking along, when really all they want to do is log in and go, "Yeah, everything's green. We're good." In that situation, it sounds like what you really need is an alerting system that lets you know when an important KPI goes red, via a Slack message or whatever.
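The alert-instead-of-dashboard idea Ned describes can be sketched in a few lines. The KPI names and thresholds below are made up for illustration, and in practice the alert messages would go to Slack or a pager rather than stdout:

```python
# Hypothetical KPI floors; in real use these would come from the warehouse's
# shared metric definitions, not be hard-coded here.
THRESHOLDS = {"daily_sales": 10_000, "conversion_rate": 0.02}

def check_kpis(metrics: dict) -> list:
    """Return alert messages for any KPI that falls below its threshold."""
    alerts = []
    for name, floor in THRESHOLDS.items():
        value = metrics.get(name)
        if value is not None and value < floor:
            alerts.append(f"ALERT: {name}={value} is below threshold {floor}")
    return alerts

# All green: nothing to report, no dashboard needed.
print(check_kpis({"daily_sales": 12_500, "conversion_rate": 0.03}))  # []

# One KPI goes red: this is where you'd notify Slack and start digging in.
print(check_kpis({"daily_sales": 8_000, "conversion_rate": 0.03}))
```

The dashboard's job of "confirm everything is fine" collapses into a check that only speaks up when something isn't, which is when the richer ad hoc visualization actually earns its place.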
Ned Letcher:
Then we can actually go and dig into what's going on. I guess another related trend is that some BI services are moving more towards the notebook side of things, which became popular out of data science: R Markdown notebooks, and Jupyter notebooks in Python and other languages. That's a really great way of doing what's often referred to as literate coding, where you have documentation inline with the code that produces the output. It's obviously particularly good for data vis, because you can produce the outputs of the code and have documentation describing them. Some people are saying this might work well in a BI context too, where we can have these interactive documents, somewhere between a living document and a dashboard, that talk about what's going on. That's an interesting trend, and I'll be curious to see how it plays out.
Zhamak Dehghani:
The point you made about these kinds of ad hoc dashboards, this dashboard or that dashboard, I think comes back to David's comment earlier that we should see these as part of the digital experience, not just some ad hoc point solution. What does that experience look like? I remember earlier in the year, when COVID was at its peak, there was a group of data scientists and folks at a big university hospital here in the US whose job was figuring out what dashboards they had representing the status of COVID, the hospital's responses, the availability of beds and so on. They cleaned that up and then built another, holistic dashboard. I think this kind of dashboard cleanup is becoming a whole field of its own, and I love it. I just go back to David's comment: I love the idea that this is a digital experience. We should think about it the same way we're modernizing our businesses and organizations, plugging visualization in where it belongs.
David Colls:
Yeah, I loved Ned's comment about "just another dashboard." I feel like it comes back to a fundamental principle that Ned and I were talking about the other day: maximizing the data-ink ratio. If you only need a binary indicator, then you don't need a whole dashboard. A classic example: I was thinking back to the days when the build indicator used to be a single build light rather than a big dashboard. That was a great way of rendering data so that it could be intuitively consumed. If it's green, go about your business. If it's red, we need to fix the build. I think it comes back to those fundamental principles of design.
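[Editor's note: a trivial sketch of the build-light principle David mentions. The check names and the `build_light` helper are hypothetical; the point is collapsing the data down to the single binary signal the team actually needs.]

```python
# Maximum data-ink ratio for a binary question: one light, not a dashboard.
def build_light(check_results):
    """Reduce a set of CI check outcomes to a single red/green indicator."""
    return "green" if all(check_results.values()) else "red"

# Hypothetical CI checks for one build.
checks = {"unit-tests": True, "lint": True, "integration": False}
print(build_light(checks))  # the whole "dashboard" is a single word
```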
David Colls:
And in this case, with the data-ink ratio, if you only have a binary signal, then you don't need an entire dashboard to convey it. But in terms of the trends around BI, I think there are also organizational trends playing into this, especially the self-service aspect of BI. Do we need pre-canned dashboards, or do we actually need autonomous teams that have the capability? It doesn't need to reside in one individual, and it doesn't need to be the case that every individual in the organization can do their own ad hoc analytics and data visualization, but as autonomous teams responsible for business outcomes, can we consume, navigate, understand and present the data that we need? And then there might be a range of different tools that come into play to do that, from across the spectrum Ned described: from point-and-click visualizations through to more coded ecosystems, and then high-performance visualizations where required as well.
Ned Letcher:
I think that's a really good point in terms of empowering teams to answer their own questions, especially if your organization has done the legwork to invest in a good data platform. People talk about the data value pyramid, where you need to get the foundations right. Up at the top we might have interventions and insights, maybe even machine learning, and further down we've got charting and visualization, but those themselves require a good foundation: does the data flow through the organization in a timely and accurate manner? Is it prepared in a way that's ready to use? If you do all that work and your team is empowered to access that data, then they can actually pose and answer their own questions. I think that's another trend, where some of these tools are allowing analysts to move closer to the engineering side of things.
Ned Letcher:
If they have the right abstractions in place, we're not asking them to code up their own platform, but giving them the right tools. I think that's another interesting trend. But then, related to that: what about smaller organizations? Maybe you don't have the resources to invest in one of these amazing data platforms, especially when you're earlier in your journey. There are actually a few service providers I've seen that are trying to provide a bit more of an integrated solution: it's self-service BI, it also runs in your data warehouse, and so it's trying to be a little bit of a data platform plus BI layer out of the box. And if you do a little bit of work setting it up, then perhaps there are small organizations that can actually ... And this is a question I hear people talking about: maybe below a certain size of organization, you don't need a data engineer, which some people might find controversial. I think that's another interesting trend.
Zhamak Dehghani:
We're coming up on the top of the hour. If you could leave the audience with one tip or recommendation, what would that be? There have been plenty of gems throughout the conversation, but is there one last comment that you want to leave the audience with?
David Colls:
For me it's: you'll never get the visualization right the first time. It's an iterative process. Along the way, the visualization helps you discover questions, assumptions and misapprehensions that you had. In the same way that running your code will show you exactly what you misunderstood about the problem and the solution, producing a visualization will show you what you misunderstood about the data and how it looks. Iterate, and the end result will get better.
Ned Letcher:
Yeah, absolutely. I agree with Dave: iterate, iterate, iterate. It's never the first plot that gets you the value you're after, it's always a few in. There are unknown unknowns; you don't know what you're going to find. Some tips from me. One of the big ones, since we're talking about visualizing: just remember to plot your data. Sometimes it's tempting to rely on summary statistics, but they can mislead. Always plot it, eyeball it and sanity check. And in particular, when you're using it for communication, think about the consumer. Remember that they don't have what's in your head; they may not know. You've had the perspective of looking at all the data, so really try to think about: what do I need to title the plot? What do I need to put on the axes? We have diverse sets of users, so it's important to pay attention to accessibility concerns: use color palettes that won't be difficult for people with color blindness, and there are tools out there to grab them.
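[Editor's note: Ned's "plot your data" tip is the lesson of Anscombe's quartet, four datasets with nearly identical summary statistics that look completely different once plotted. This stdlib sketch shows three of the published quartet's y-series sharing the same mean and variance, even though, plotted, one is linear with noise, one is a smooth curve, and one has an outlier.]

```python
from statistics import mean, variance

# Anscombe's quartet (sets I-III share the same x values).
x = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
ys = {
    "I":   [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68],
    "II":  [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74],
    "III": [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73],
}

# The summary statistics are nearly identical -- you'd never guess from
# these numbers alone how differently the three datasets are shaped.
for name, y in ys.items():
    print(f"set {name}: mean={mean(y):.2f} var={variance(y):.2f}")
```

Only a plot (or at least an eyeball pass over the raw values) reveals the difference, which is exactly the sanity check being recommended.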
Ned Letcher:
And there's a whole literature on that side of things. And then, have fun. I love that discovery moment where you plot it and you go, "Oh, wow. I didn't know that happened." Those are the moments that I love.
Zhamak Dehghani:
Wonderful. Well, thank you, thank you for that. It sounds like, from where we ended, we could have a whole other podcast just on tips and tricks around data visualization, but we're going to stop here today. Thank you for joining us.
Ned Letcher:
My pleasure.
David Colls:
Thanks, thanks Alexey.