This is the first of a two-part series on data governance. Read the second part on things to consider while implementing data mesh-driven governance here.
Data is as ubiquitous as the desire to make use of it, and organizations are increasingly aware of its potential. Data production is growing exponentially, and so is its complexity. The following trends are driving how organizations leverage data:
- A large user base that wants to use data in its decision making
- Exchange of data with other organizations across partner ecosystems, and the advent of platforms
- Complex and tightening regulations
- More business users consuming data at lower latency as technical barriers come down
These trends also bring with them several challenges, which we categorize as external and internal risks.
External risks for data management
Security and privacy: organizations are responsible for protecting the security and privacy of data subjects. Failing to do so is expensive: the global average cost of a data breach is $3.86M USD, and the average cost per lost or stolen record in a data breach is $150 USD.
Compliance: a global organization needs to comply with the regulations of every country it operates in. These can be diverse and sometimes contradictory. Data-specific regulations such as GDPR and CCPA sit alongside the many new laws that are emerging. Non-compliance can be expensive.
Brand and reputation: three years after a data breach, the share price of affected companies was, on average, 13% lower than the NASDAQ Index. Beyond the financial impact, organizations also carry a broader social responsibility.
Internal productivity challenges and governance risks
Trust: the biggest challenge is data discovery and trust. When it is hard to find out what data exists, how it was used before, where it came from and so on, users tend to trust it less. To address productivity needs for the business, data should be easily accessible, discoverable and trustworthy.
Governance risk: knowing who accesses data, what they do with it, where they put it and how it gets consumed downstream is essential because it directly impacts its security.
To overcome these challenges and leverage the full potential of data, organizations need robust data governance.
Good governance requires balance and adjustment. When done well, it can fuel digital innovation without compromising security.
Understanding data governance
Data governance is a data management function that ensures the quality, integrity, security and usability of the data collected by an organization. Good data governance can:
- Uphold security and governance commitments
- Provide efficient and transparent controls over data
- Facilitate and support federated delivery
- Make data visible and trustworthy
- Decentralize data ownership
Traditionally, data governance is a centralized function, but the data mesh paradigm requires federated data governance to support a distributed, domain-driven architecture and product thinking for data. Most importantly, federated governance can ensure trust in data, including discoverability, security and accountability, through two essential frameworks: data catalogue and data quality.
Key tenets of federated data governance
Data catalog
Data catalogues are data descriptions that serve as a metadata inventory, giving users the information necessary to evaluate data accessibility, health and location. They help users locate relevant data and build a clear picture of how it is represented. To be efficient, organizations need automated, scalable and distributed data catalogues. A catalogue includes:
- Technical metadata: describes the organization and structure of the data objects such as tables, events, objects, attributes with their types, lengths, indexes and connections
- Ownership: information that captures the relationship and origin of data
- Data lineage: facilitates distributed discovery, including field-level lineage with automated tables that map upstream and downstream dependencies
- Business glossary: provides an agreed-upon understanding of key business concepts, terms and the relationships between them
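To make the four catalogue elements above more concrete, here is a minimal sketch of what a single catalogue entry might hold and how lineage could be traversed. It is an illustration only: the `CatalogEntry` and `Column` classes, the `downstream_of` helper and the sample table names are all hypothetical and do not come from any particular catalogue product.

```python
from dataclasses import dataclass, field


@dataclass
class Column:
    name: str
    dtype: str


@dataclass
class CatalogEntry:
    name: str                                                # table or data product name
    owner: str                                               # ownership: accountable domain team
    columns: list[Column]                                    # technical metadata
    upstream: list[str] = field(default_factory=list)        # lineage: direct sources
    glossary_terms: list[str] = field(default_factory=list)  # business glossary links


def downstream_of(catalog: dict[str, CatalogEntry], source: str) -> set[str]:
    """Return every entry that depends on `source`, directly or transitively."""
    found: set[str] = set()
    frontier = [source]
    while frontier:
        current = frontier.pop()
        for entry in catalog.values():
            if current in entry.upstream and entry.name not in found:
                found.add(entry.name)
                frontier.append(entry.name)
    return found


# Hypothetical sample entries, for illustration only.
catalog = {
    e.name: e
    for e in [
        CatalogEntry("orders_raw", "sales", [Column("order_id", "string")]),
        CatalogEntry("orders_clean", "sales", [Column("order_id", "string")],
                     upstream=["orders_raw"], glossary_terms=["Order"]),
        CatalogEntry("revenue_daily", "finance", [Column("revenue", "decimal")],
                     upstream=["orders_clean"], glossary_terms=["Revenue"]),
    ]
}

print(sorted(downstream_of(catalog, "orders_raw")))
# ['orders_clean', 'revenue_daily']
```

In a federated setup, each domain team would plausibly maintain its own entries, while a traversal like `downstream_of` lets anyone see which data products are affected by a change upstream.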
Data quality
Self-service data quality enables organizations to define and implement quality rules. A gap in data quality leads to a lack of credibility and lost opportunities owing to dark data assets, whereas good-quality data enables:
- Data discovery and trust
- Avoidance of data skew
- Automatic error detection
- Visual lineage
- Empowerment of end-users to export, report and edit the rules
- Assuaging enterprise security concerns
- Breaking of silos
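As a rough illustration of what defining and implementing quality rules could look like in a self-service setting, the sketch below declares rules as small named checks and reports which rows fail each one. The `Rule` class, the `run_rules` function and the sample rows are assumptions made for this example, not part of any specific data quality tool.

```python
from dataclasses import dataclass
from typing import Callable

Row = dict[str, object]


@dataclass
class Rule:
    name: str
    check: Callable[[Row], bool]  # returns True when the row passes


def run_rules(rows: list[Row], rules: list[Rule]) -> dict[str, list[int]]:
    """Map each rule name to the indexes of the rows that fail it."""
    failures: dict[str, list[int]] = {rule.name: [] for rule in rules}
    for index, row in enumerate(rows):
        for rule in rules:
            if not rule.check(row):
                failures[rule.name].append(index)
    return failures


# Hypothetical rules a domain team might define for an orders data set.
rules = [
    Rule("order_id_present", lambda r: bool(r.get("order_id"))),
    Rule(
        "amount_non_negative",
        lambda r: isinstance(r.get("amount"), (int, float)) and r["amount"] >= 0,
    ),
]

# Hypothetical sample rows, for illustration only.
rows = [
    {"order_id": "A1", "amount": 20.0},
    {"order_id": "", "amount": 5.0},
    {"order_id": "A3", "amount": -1.0},
]

report = run_rules(rows, rules)
print(report)
# {'order_id_present': [1], 'amount_non_negative': [2]}
```

Keeping rules as plain, named data makes it easier for end-users to list, export and edit them, which is the kind of empowerment described in the list above.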
Data ownership
Data governance is more than a tech implementation. It’s a cultural change enabled by data stewards, data product managers and data domain owners. To have a successful transition towards data ownership, Vial suggests three essential mechanisms:
- Structural mechanisms that include organizational elements such as the creation of special roles, official policies and rules
- Procedural mechanisms used by the organization to ensure compliance with structural mechanisms, such as data audits and reviews. We would also add incentives to this bucket
- Relational mechanisms that include key activities such as communication and informal employee mentoring
As the volume, variety and velocity of data increase, new challenges will emerge. Adhering to compliance requirements in dynamic environments will become more complex. Tool integration may grow harder, and tools may often prove incompatible with organizational needs. Additionally, change management can take time.
Organizations need a strategic approach to implement an organization-wide program keeping all the above factors in mind. In the next edition of this two-part series on data governance, we explore how you can successfully do so.
This blog is a shorter version of a white paper from the 21st International Conference on Electronic Business (ICEB 2021).
Disclaimer: The statements and opinions expressed in this article are those of the author(s) and do not necessarily reflect the positions of Thoughtworks.