Today’s high velocity, volume and variety of data have made data governance a business requirement. Given the massive amounts of data generated internally and easily available externally, a disciplined approach to managing all this information is the need of the hour.
A data governance strategy helps maintain data privacy and meet regulatory requirements. Such a strategy consists of policies, standards, roles and processes that ensure the proper use of data, along with its availability, integrity, usability and security. There are also use cases in analytics support, operations and consumer behavior, which result in operational efficiency, better forecasts, increased end-user self-service, increased alignment and member growth.
Large-scale enterprises, with their many departments, often find the same data being collected in different ways and formats. Sometimes even the values captured for a particular data item differ from department to department.
For example, one business function might mark an item as ‘missing’ by leaving its value empty, while another might use a specific value to denote that an item is ‘missing’. This inconsistency extends across the organization and affects processes and analytics down the line, for instance when the data is used to build a predictive model.
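To make this inconsistency concrete, here is a minimal sketch of how two departments' conventions for ‘missing’ could be reconciled into a single representation. The record layouts, field names and sentinel values (such as `-999` and `"N/A"`) are hypothetical and chosen only for illustration; a real organization would agree on them as part of its governance standards.

```python
from typing import Any, Dict

# Hypothetical markers that different departments might use for "missing".
# These are assumptions for illustration, not values from any real system.
MISSING_STRINGS = {"", "n/a", "na", "missing"}
MISSING_NUMBERS = {-999}


def normalize_missing(value: Any) -> Any:
    """Map every department-specific 'missing' marker to None."""
    if value is None:
        return None
    if isinstance(value, str):
        cleaned = value.strip()
        return None if cleaned.lower() in MISSING_STRINGS else cleaned
    # bool is a subclass of int, so exclude it explicitly.
    if isinstance(value, (int, float)) and not isinstance(value, bool):
        return None if value in MISSING_NUMBERS else value
    return value


def normalize_record(record: Dict[str, Any]) -> Dict[str, Any]:
    """Apply the shared 'missing' convention to every field of a record."""
    return {key: normalize_missing(val) for key, val in record.items()}


# One department leaves the value empty, another uses a sentinel value.
sales_record = {"customer_id": 17, "age": None}
support_record = {"customer_id": 17, "age": -999}

# After normalization, both departments agree that the age is missing.
print(normalize_record(sales_record) == normalize_record(support_record))
```

Using `None` as the single representation is one possible design choice. What matters for downstream analytics is that every department maps to the same convention before the data is combined.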
Data governance ensures consistency across departments and processes and does away with the organization's silos. Thoughtworks Looking Glass 2022 discusses how leaders should consider where data ownership sits within their organization, since data quality problems tend to emerge from organizational structures and architectures that don’t incentivize teams to produce and share the data resources they have.
Here are some key areas that I expect will receive more attention in the near future:
Source of truth: Data governance aims to understand and streamline the process of data collection and integration, identifying who owns the data and where the source of truth lies. It can be considered a part of Master Data Management (MDM) and Data Quality Management. Identifying and maintaining the source of truth supports both the consistency of the data and the accuracy of any analysis and modeling based on it. However, although this is an area of focus, it is easier said than done.
Data governance in AI and ML: Today, data governance is required in most areas of data and AI. The AI space is evolving rapidly, and the adoption of AI-based models is increasing just as quickly.
While AI technologies can have a wide and positive impact on social and economic welfare, legislation is needed to safeguard users' fundamental rights and protect them from safety risks. Currently, data ethics is largely left to each organization’s ‘selective’ choice. This means there could be lapses in AI solutions, depending on how much they affect the organization and how much effort is required to make them more responsible.
Data carries biases, and biased data leads to biased models. I foresee the design and adoption of concrete methods and frameworks that deal with such biases. Explainable AI is already gaining traction in the market. I expect more focus on the explainability of AI and ML, especially where decisions may have a big impact on people, as in banking and insurance, law enforcement, education and food distribution, to name a few.
Applicability to small and medium-sized businesses: Large enterprises are already seeing value from investing in people and talent to build capabilities aligned with data governance objectives. Small and medium-sized businesses will also begin to see value from such investments, both in securing legal compliance and in maintaining trust in their products and services.
Increased awareness across the organization: A well-rounded awareness of data governance across roles and departments will nurture data quality, integrity and compliance. This year, I expect organizations to make small, short-term investments targeted at raising awareness of data governance. The World Bank, for example, has been actively trying to increase awareness in this space.
Data governance is everyone's responsibility, from data professionals to business leaders. A data governance committee is a body responsible for data quality, data policies and regulations, and best practices, as well as for strategizing and overseeing data governance programs and raising awareness of data governance within the organization. I believe more such ‘data governance committees’ will emerge, seeking an active role from multiple stakeholders.
Disclaimer: The statements and opinions expressed in this article are those of the author(s) and do not necessarily reflect the positions of Thoughtworks.