Data mesh is a decentralized organizational and technical approach to sharing, accessing and managing data for analytics and ML. Its objective is to create a sociotechnical approach that scales out getting value from data as the organization's complexity grows, as the use cases for data proliferate and as the sources of data diversify. Essentially, it creates a responsible data-sharing model that is in step with organizational growth and continuous change. In our experience, interest in the application of data mesh has grown tremendously. The approach has inspired many organizations to embrace its adoption and technology providers to repurpose their existing technologies for a mesh deployment. Despite the great interest and growing experience in data mesh, its implementations face a high cost of integration. Moreover, its adoption remains limited to sections of larger organizations, and technology vendors are distracting organizations from the hard social aspects of data mesh: decentralized data ownership and a federated governance operating model.
These ideas are explored in Data Mesh: Delivering Data-Driven Value at Scale, which guides practitioners, architects, technical leaders and decision makers on their journeys from traditional big data architecture to data mesh. It provides a complete introduction to data mesh principles and their constituents; it covers how to design a data mesh architecture, guide and execute a data mesh strategy and navigate organizational design toward a decentralized data ownership model. The goal of the book is to create a new framework for deeper conversations and to lead to the next phase in the maturity of data mesh.
Increasingly, we see a mismatch between what data-driven organizations want to achieve and what the current data architectures and organizational structures allow. Organizations want to embed data-driven decision-making, machine learning and analytics into many aspects of their products and services and how they operate internally; essentially they want to augment every aspect of their operational landscape with data-driven intelligence. Yet, we still have a long way to go before we can embed analytical data, access to it and how it is managed into the business domains and operations. Today, every aspect of managing analytical data is externalized outside of the operational business domains to the data team and to the data management monoliths: data lakes and data warehouses. Data mesh is a decentralized sociotechnical approach to remove the dichotomy between analytical data and business operations. Its objective is to embed sharing and using analytical data into each operational business domain and close the gap between the operational and analytical planes. It's founded on four principles: domain data ownership, data as a product, self-serve data platform and computational federated governance.
Our teams have been implementing the data mesh architecture; they've created new architectural abstractions such as the data product quantum to encapsulate the code, data and policy as an autonomous unit of analytical data sharing embedded into operational domains; and they've built self-serve data platform capabilities to manage the lifecycle of data product quanta in a declarative manner as described in Data Mesh. Despite our technical advances, we're still experiencing friction using the existing technologies in a data mesh topology, not to mention the resistance, in some organizations, of business domains to embracing the sharing and use of data as a first-class responsibility.
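To make the idea of a declaratively managed data product quantum more concrete, here is a minimal Python sketch. It is not the implementation our teams built, and none of the names come from the book: `DataProductQuantum`, `OutputPort`, `plan` and every field are hypothetical, chosen only for illustration. The sketch assumes a platform that compares the quanta a domain declares with the quanta currently deployed and works out which lifecycle actions to take.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class OutputPort:
    """One way consumers can read the data product (hypothetical shape)."""
    name: str
    format: str  # for example "parquet" or "events"


@dataclass
class DataProductQuantum:
    """Code, data and policy declared together as a single unit (illustrative)."""
    domain: str
    name: str
    owner: str
    transformation: str                 # reference to the code that produces the data
    output_ports: Tuple[OutputPort, ...]  # how the data is served to consumers
    policies: Dict[str, str] = field(default_factory=dict)

    @property
    def identifier(self) -> str:
        return f"{self.domain}/{self.name}"


def plan(desired: List[DataProductQuantum],
         deployed: List[DataProductQuantum]) -> List[Tuple[str, str]]:
    """Compare declared quanta with deployed ones and return lifecycle actions."""
    desired_by_id = {q.identifier: q for q in desired}
    deployed_by_id = {q.identifier: q for q in deployed}
    actions = []
    for ident, quantum in desired_by_id.items():
        if ident not in deployed_by_id:
            actions.append(("create", ident))
        elif deployed_by_id[ident] != quantum:
            # Any change in code reference, ports or policy triggers an update.
            actions.append(("update", ident))
    for ident in deployed_by_id:
        if ident not in desired_by_id:
            actions.append(("retire", ident))
    return sorted(actions)
```

In this sketch the domain team only edits the declaration; deciding whether a quantum needs to be created, updated or retired is left to the platform, which is one plausible reading of "declarative" lifecycle management.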
Data mesh marks a welcome architectural and organizational paradigm shift in how we manage big analytical data. The paradigm is founded on four principles: (1) domain-oriented decentralization of data ownership and architecture; (2) domain-oriented data served as a product; (3) self-serve data infrastructure as a platform to enable autonomous, domain-oriented data teams; and (4) federated governance to enable ecosystems and interoperability. Although the principles are intuitive and attempt to address many of the known challenges of previous centralized analytical data management, they transcend the available analytical data technologies. After building data mesh for multiple clients on top of the existing tooling, we learned two things: (a) there is a large gap in open-source or commercial tooling to accelerate implementation of data mesh (for example, implementation of a universal access model to time-based polyglot data which we currently custom build for our clients) and (b) despite the gap, it's feasible to use the existing technologies as the basic building blocks.
Naturally, technology fit is a major component of implementing your organization's data strategy based on data mesh. Success, however, demands an organizational restructure to separate the data platform team, create the role of data product owner for each domain and introduce the incentive structures necessary for domains to own and share their analytical data as products.
Data mesh is an architectural and organizational paradigm that challenges the age-old assumption that, to deliver value from big analytical data, we must centralize it, keep it all in one place or have it managed by a centralized data team. Data mesh claims that for big data to fuel innovation, its ownership must be federated among domain data owners who are accountable for providing their data as products (with the support of a self-serve data platform to abstract the technical complexity involved in serving data products); it must also adopt a new form of federated governance through automation to enable interoperability of domain-oriented data products. Decentralization, along with interoperability and a focus on the experience of data consumers, is key to the democratization of innovation using data.
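As an illustration of what "federated governance through automation" could look like, here is a small Python sketch. It assumes that each data product publishes a descriptor (modeled here as a plain dictionary) and that the governance group agrees on a handful of global rules that the platform runs against every descriptor. The rule names, the descriptor keys and the `check` function are all hypothetical; they are not taken from any particular data mesh product or standard.

```python
from typing import Callable, Dict, List, Optional

Descriptor = Dict[str, object]
# A policy returns None when the descriptor complies, or a message when it does not.
Policy = Callable[[Descriptor], Optional[str]]


def has_owner(product: Descriptor) -> Optional[str]:
    return None if product.get("owner") else "missing owner"


def has_schema(product: Descriptor) -> Optional[str]:
    return None if product.get("schema") else "missing published schema"


def declares_pii(product: Descriptor) -> Optional[str]:
    # The domain must state explicitly whether the product contains PII.
    if product.get("contains_pii") is None:
        return "PII classification not declared"
    return None


# Rules agreed globally, but evaluated automatically for each domain's product.
GLOBAL_POLICIES: List[Policy] = [has_owner, has_schema, declares_pii]


def check(product: Descriptor,
          policies: Optional[List[Policy]] = None) -> List[str]:
    """Run every policy against one descriptor and collect the violations."""
    violations = []
    for policy in (GLOBAL_POLICIES if policies is None else policies):
        message = policy(product)
        if message is not None:
            violations.append(message)
    return violations
```

The design choice worth noting is the split of responsibilities: domains keep ownership of their descriptors, while the shared rules live in one place and are applied by the platform rather than by a central review team.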
If your organization has a large number of domains with numerous systems and teams generating data or a diverse set of data-driven use cases and access patterns, we suggest you assess data mesh. Implementation of data mesh requires investment in building a self-serve data platform and embracing an organizational change for domains to take on the long-term ownership of their data products, as well as an incentive structure that rewards domains serving and utilizing data as a product.
Data mesh is an architectural paradigm that unlocks analytical data at scale, rapidly opening access to an ever-growing number of distributed domain data sets for a proliferation of consumption scenarios such as machine learning, analytics or data-intensive applications across the organization. Data mesh addresses the common failure modes of the traditional centralized data lake or data platform architecture by shifting away from the centralized paradigm of a lake, or its predecessor, the data warehouse. Instead, it draws from modern distributed architecture: considering domains as the first-class concern, applying platform thinking to create a self-serve data infrastructure, treating data as a product and implementing open standardization to enable an ecosystem of interoperable distributed data products.