
Part two: the four-step framework for federated data governance

This is the second part of a two-part series on data governance. Read the first part, on leveraging the potential of data with good governance, here.

 

Sound data governance will deliver organization-wide outcomes. With cataloguing and quality management, data governance ensures trust in data. It will reduce time to market by ensuring discoverability, transparency, accessibility and ease of use. It will also facilitate integrations to expand the partner ecosystem and participation in open data initiatives. 

 

It goes without saying that data governance gives organizations the power to leverage their data efficiently, effectively and securely. But implementing an organization-wide data governance program is no easy task. Based on our experience working with a leading Danish investment bank, we outline the key things to keep in mind while implementing one.

 

Step 1 - Begin with an 'as-is' state analysis and stakeholder engagement

 

Before even getting to the drawing board, understand the vision and the key drivers of your data governance program, and assess the current state before diving into any solutioning. Based on this understanding, conduct persona interviews with the different data stakeholders and document their pain points and goals. Then build a feature matrix that maps expectations of the data governance solution to the priority and criticality of each feature, as sketched below.
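As an illustration, such a feature matrix can start as a simple list of structured records. The sketch below is a minimal Python example; the features, personas and scores are invented for illustration and are not taken from the case study.

```python
# A minimal sketch of a feature matrix captured during stakeholder interviews.
# Feature names, personas and scores are illustrative, not from the case study.
from dataclasses import dataclass

@dataclass
class FeatureExpectation:
    feature: str          # capability expected from the governance solution
    personas: list[str]   # stakeholders who asked for it
    priority: int         # 1 = highest
    criticality: str      # "must-have" or "nice-to-have"

feature_matrix = [
    FeatureExpectation("Data catalogue with search", ["analyst", "data product owner"], 1, "must-have"),
    FeatureExpectation("Column-level lineage", ["data engineer"], 2, "must-have"),
    FeatureExpectation("Quality-rule authoring", ["data product owner"], 2, "must-have"),
    FeatureExpectation("Business glossary", ["analyst"], 3, "nice-to-have"),
]

# The must-haves drive the first-pass elimination of candidate tools in step 2.
must_haves = [f.feature for f in feature_matrix if f.criticality == "must-have"]
print(must_haves)
```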

 

We segment our lessons and project expectations in the following way:

  • Business: world-class customer experience; package and sell asset management products; introduce subscription models to clients; industrialize the wholesale offering

  • Process: federated governance; self-serve platform

  • Technical: alignment to data mesh; alignment of the current implementation and target data architecture

  • Financial: total cost of ownership; reduced operating costs

Step 2 - Select your tools

 

It is essential to choose tools that are contextually appropriate for the organization. While doing so, remember: 

 

  • No one tool fits all requirements

  • Commercial tools have long feature lists, but more is not always better

  • For open-source tools, the community behind the tool, roadmap visibility, architecture and extensibility all matter


With that in mind, select your tools carefully: 

 

  • Analyse the current data ecosystem (source/analytical stores, data processing and pipelines, consuming applications etc.). Based on this analysis, list your non-negotiables and priorities. For instance, you might need a tool that can align with data mesh principles or enable a data office

 

  • Identify candidate tools from open-source and commercial offerings. Perform a first-pass elimination using a feature matrix that reflects the gaps in the existing data ecosystem, stakeholders' priorities and feature expectations from the tool (a scoring sketch follows this list)

 

  • Finalize the toolkit based on secondary research, proofs-of-concept, workshops, demos and interactions with product vendors. Use predefined questionnaires to guide these conversations
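To make the first-pass elimination concrete, here is a minimal sketch of a weighted-score comparison, assuming a simple additive model. The tools, feature names and weights are hypothetical; a real evaluation would use the matrix built in step 1.

```python
# A hedged sketch of first-pass tool elimination via weighted feature scores.
# Candidate tools, feature names and weights are hypothetical.
candidate_tools = {
    "tool_a": {"catalogue_search": True, "column_lineage": True, "quality_rules": False},
    "tool_b": {"catalogue_search": True, "column_lineage": False, "quality_rules": True},
}

# Weights derive from the priority/criticality gathered in step 1.
feature_weights = {"catalogue_search": 3, "column_lineage": 2, "quality_rules": 3}

def score(features: dict) -> int:
    """Sum the weights of the features a tool actually supports."""
    return sum(w for name, w in feature_weights.items() if features.get(name))

# Highest-scoring candidates proceed to proofs-of-concept, workshops and demos.
shortlist = sorted(candidate_tools, key=lambda t: score(candidate_tools[t]), reverse=True)
print(shortlist)
```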

 

Step 3 - Implement the tools 

 

We’d recommend the following steps:

 

  • Install and configure tools in the existing client ecosystem

  • Deploy the tool in a cloud or on-prem environment

  • Set up DevOps pipelines

  • Create a forking/syncing strategy

  • Implement observability and alert mechanisms


While implementing the data catalogue, bring together the technical metadata, ownership, lineage and business glossary: push them to the metadata repository through defined APIs/interfaces and make them available through the discovery interface.
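As a rough illustration, registering a data product's metadata could look like the sketch below. The endpoint, payload shape and field names are hypothetical; each catalogue tool defines its own ingestion API.

```python
# A hedged sketch of pushing metadata to a catalogue's repository.
# The URL and payload shape are hypothetical, not a real product's API.
import requests

metadata = {
    "name": "trades.settled",                   # technical identifier
    "owner": "settlement-domain-team",          # domain ownership
    "glossary_terms": ["trade", "settlement"],  # business glossary links
    "lineage": {"upstream": ["trades.raw"]},    # where the data comes from
}

resp = requests.post(
    "https://catalogue.example.com/api/v1/datasets",  # hypothetical endpoint
    json=metadata,
    timeout=10,
)
resp.raise_for_status()  # the dataset is now discoverable via the catalogue
```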

 

For data quality implementation (a minimal sketch follows this list):

 

  • Define a DSL that lets data producers create quality rules, which are pushed to the respective domain repository

  • Build pipelines to provision the automated jobs, perform data quality checks and emit the results 

  • Collect quality results/metrics, push them to metrics stores and make them available to producers/consumers
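A rules DSL need not be elaborate: declaring rules as data and evaluating them in a shared pipeline is often enough. The sketch below is a minimal, hypothetical example; the rule format, column names and sample records are invented.

```python
# A minimal sketch of a quality-rule DSL evaluated by a shared pipeline.
# Rule format, column names and sample rows are hypothetical.
rules = [
    {"column": "isin", "check": "not_null"},
    {"column": "notional", "check": "range", "min": 0, "max": 1e9},
]

def run_checks(rows, rules):
    """Evaluate each rule over the rows and emit pass/fail counts."""
    results = {}
    for rule in rules:
        col, check = rule["column"], rule["check"]
        if check == "not_null":
            failed = sum(1 for r in rows if r.get(col) is None)
        elif check == "range":
            failed = sum(
                1 for r in rows
                if r.get(col) is not None
                and not (rule["min"] <= r[col] <= rule["max"])
            )
        results[f"{col}:{check}"] = {"rows": len(rows), "failed": failed}
    return results

rows = [
    {"isin": "DK0060534915", "notional": 1_000_000},
    {"isin": None, "notional": -5},
]
# These metrics would be pushed to the metrics store for producers/consumers.
print(run_checks(rows, rules))
```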

 

Set up a culture of data domain ownership: form a data governance committee with well-defined roles and responsibilities, and make domain teams responsible for owning and managing their data.

 

Step 4 - Empower your stakeholders

 

Upon successful implementation, your stakeholders (consumers and data product owners) can perform specific functions.

Consumers can:

  • Discover data products

  • View metadata

  • View data lineage to understand the data flow better

  • View a quality snapshot along with metadata

  • Access quality notifications/alerts for the datasets of interest

Data product owners can:

  • Link data products to the business terms, encouraging consumption

  • Avoid data inconsistencies

  • Define information classification at the data element level

  • Profile the data before onboarding

Such a data workbench will enable domain teams to publish transparent and trustworthy data for consumers and pave the way for onboarding new partners with increased confidence.

 

Data governance is essential for any organization working with big data because its implications are broader than just technology. It is cross-cutting, involving data collection and storage, the security of explicit and derived personally identifiable information (PII), management of consent, algorithmic design, product design, organizational incentives and more.

 

This presents a unique opportunity to use technology and operating models to reduce risks and bias in data-driven solutions. Good solutions require diverse multi-disciplinary teams, tools that can be used autonomously, scalable governance models and more. Data mesh is a perfect fit for this use case.

 

 

Future of data governance: Data Mesh

 

Zhamak Dehghani, Director of Emerging Technologies at Thoughtworks and creator of the data mesh concept, defines it as a novel approach that embraces the ubiquitous data in an organization through the convergence of distributed domain-driven architecture, self-serve platform design and product thinking with data. Below are the features that set it apart (a sketch of a data product descriptor follows the list):

 

  • Data as product (process & data) 

  • Data responsibility decentralized by domains

  • Data producers are data owners, empowered by self-service platforms and capabilities

  • Federated governance enables organization-wide standards for data governance, data quality and the data life cycle
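To make "data as product" concrete, a domain team might publish a descriptor for each data product that federated governance can validate computationally. The sketch below shows one hypothetical shape; the field names are illustrative, not a standard.

```python
# A hedged sketch of a data product descriptor; field names are hypothetical.
data_product = {
    "name": "settled-trades",
    "domain": "settlement",             # responsibility decentralized by domain
    "owner": "settlement-domain-team",  # producers own the data
    "output_ports": ["s3://lake/settlement/settled-trades/"],
    "quality_rules": ["isin:not_null", "notional:range"],
    "classification": "internal",       # governed by a federated standard
    "sla": {"freshness_hours": 24},
}

# Federated governance can enforce global policies computationally, e.g.:
assert data_product["classification"] in {"public", "internal", "restricted"}
```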


To learn more about Data Mesh, click here.  

 

This blog is a shorter version of a white paper presented at the 21st International Conference on Electronic Business (ICEB 2021).

Disclaimer: The statements and opinions expressed in this article are those of the author(s) and do not necessarily reflect the positions of Thoughtworks.