
Anyone with significant IT experience can relate to the steady increase, over the last several decades, in systems that collect data. Technology teams started out just trying to collect data as it trickled in via keypunch, then flowed in from online users, then flooded in as machine data (Internet of Things) joined the torrent.
As “data” turned into “big data” — the tongue-in-cheek definition of which is “more data than you’ve got budget to handle” — IT teams struggled to get their arms around this river of data. Against this backdrop, the concept of data governance emerged, but there hasn’t always been broad agreement on what governance means in practice or how to get it done.
So I’d like to begin this analysis with a baseline definition: Per the Data Governance Institute, “Data Governance is a system of decision rights and accountabilities for information-related processes, executed according to agreed-upon models which describe who can take what actions with what information, and when, under what circumstances, using what methods.” That looks accurate and thorough to me, but I have more than 30 years in information technology (IT). I wouldn’t want to use this definition for the board or C-suite!
Even as we in IT struggled to acquire, store, and protect all this data that was being collected, data users — business executives and their “data analytics” teams in finance, marketing, and other verticals — got overwhelmed trying to analyze and understand the data. The saying that “data is the new oil” resonates with me, but not for the commonly cited reason that data is this increasingly valuable commodity. I liken data to oil because it’s a raw material that’s very messy and hard to handle in its raw form, but after a complex clean-up and refining process, it turns into a beneficial product called “information.”
Governing the Ungovernable
I discussed this topic with a colleague and long-time CIO, Rusty Atkinson. Rusty says it well: “Most companies do not really know where their data is or what their data is.” And that’s the crux of the matter: Organizations didn’t start with a carefully drawn model of the information users wanted and then build tools to generate that information. We started with whatever data we could find, in varying forms and of varying quality, and “played with it” to create analyses and reports that looked interesting. Users saw those reports and then told us what they really wanted. Rinse and repeat, as all three “Data Vs” (volume, velocity, variety) grew like mad.
After 20 or 30 years for most firms — or as much as 50 or 60 years with early IT adopters — it becomes challenging to get one’s arms around this fast-growing, fast-changing sea of data. It’s especially tough given the budget available to do any IT architecture/governance when the business is clamoring for “cool new stuff” from IT.
How to bring order to all this data and the ongoing requests? I recommend a three-pronged approach to building and driving a data governance strategy:
Prong One: Define the Information Needed to Drive Business Decisions
- Don’t turn a nascent data governance (DG) program into an academic exercise. Start with a critical decision-making need and work backward from the required information to the source data.
- Once you start pinning down data provenance, important related functions become easier. These related functions include data cleansing, ETL (extract, transform, load), and analytic processes.
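To make "work backward from required information to source data" concrete, here is a minimal sketch in Python. It assumes the lineage of each report and dataset has been recorded as a simple mapping. Every name in it (`LINEAGE`, `trace_sources`, and the dataset names) is hypothetical and chosen only for illustration. A real program would more likely pull this from a data catalog or ETL tool.

```python
from collections import deque

# Hypothetical lineage map: each information product or dataset lists the
# inputs it is derived from. All names are illustrative assumptions.
LINEAGE = {
    "quarterly_margin_report": ["margin_by_product"],
    "margin_by_product": ["cleansed_sales", "cleansed_costs"],
    "cleansed_sales": ["crm_orders_raw"],
    "cleansed_costs": ["erp_ledger_raw"],
}


def trace_sources(target, lineage):
    """Walk backward from a target and return the raw sources it depends on.

    A node with no recorded inputs is treated as a source system.
    """
    sources = set()
    seen = {target}
    queue = deque([target])
    while queue:
        node = queue.popleft()
        inputs = lineage.get(node, [])
        if not inputs:
            sources.add(node)  # nothing upstream is recorded: a raw source
        for parent in inputs:
            if parent not in seen:
                seen.add(parent)
                queue.append(parent)
    return sorted(sources)


if __name__ == "__main__":
    print(trace_sources("quarterly_margin_report", LINEAGE))
```

Starting from the report an executive actually uses, the walk surfaces only the source systems that matter for that decision, which keeps the first governance effort small and practical.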
Prong Two: Implement Modern (Cloud) Data Tools
- Until recently, the limitations of traditional on-premises database, data management, data transformation (ETL), and analytics tools meant using many different, often poorly integrated, products.
- Software-as-a-service (SaaS) products can efficiently cope with a wide variety of data types, can acquire data from many more sources, can cleanse and transform data faster, and can operate as data lakehouses (combining the analytical and transactional capabilities of data warehouses with the flexible, low-cost storage of data lakes). Streamlined products make it easier for IT professionals to be good data custodians and allow end users (data owners) easier access to data and metadata (data about data).
- While not exactly a data governance decision, adopting a modern SaaS ERP (enterprise resource planning) or EHR (electronic health record) suite can be game-changing. A top-tier suite incorporates powerful data management tools, modern databases, analytics tools, and the ability to extend the data model (i.e., add new fields and data sets) from within the suite. Implementing such a suite jump-starts a data governance initiative while making it far easier to maintain effective data governance.
Prong Three: Combine Data Governance with Zero Trust
- Most organizations are desperately in need of cybersecurity upgrades. And many organizations have earmarked significant budgets for cybersecurity improvements. For reasons outside the scope of this article, zero-trust security is a promising approach to achieving better security. An essential part of a zero-trust security project is inventorying and classifying data elements and data sets to manage data access rights.
- Why not feed two birds with one scone (yes, I said it) by combining a vital, well-funded cybersecurity project with an equally vital but usually underfunded data governance project? This will accelerate both projects while ensuring that data is correctly classified for security and tagged with the other metadata it needs.
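As a rough illustration of how one inventory can serve both projects, the Python sketch below classifies a couple of data sets and uses that same classification to decide read access. It assumes a simple four-level sensitivity scale and role-based clearances. The names `Sensitivity`, `DataAsset`, `INVENTORY`, `CLEARANCE`, and `can_read` are all hypothetical, and a real zero-trust deployment would also weigh identity, device, and context rather than role alone.

```python
from dataclasses import dataclass
from enum import IntEnum


class Sensitivity(IntEnum):
    """Assumed four-level classification scale (illustrative only)."""
    PUBLIC = 1
    INTERNAL = 2
    CONFIDENTIAL = 3
    RESTRICTED = 4


@dataclass(frozen=True)
class DataAsset:
    name: str
    owner: str  # the business data owner, i.e. governance metadata
    sensitivity: Sensitivity


# Hypothetical inventory shared by the governance and security teams.
INVENTORY = {
    asset.name: asset
    for asset in [
        DataAsset("marketing_web_stats", "marketing", Sensitivity.INTERNAL),
        DataAsset("patient_records", "clinical_ops", Sensitivity.RESTRICTED),
    ]
}

# Hypothetical mapping of roles to the highest sensitivity they may read.
CLEARANCE = {
    "analyst": Sensitivity.INTERNAL,
    "clinician": Sensitivity.RESTRICTED,
}


def can_read(role, asset_name):
    """Default deny: unknown roles and unclassified assets get no access."""
    asset = INVENTORY.get(asset_name)
    clearance = CLEARANCE.get(role)
    if asset is None or clearance is None:
        return False
    return clearance >= asset.sensitivity


if __name__ == "__main__":
    print(can_read("analyst", "marketing_web_stats"))
    print(can_read("analyst", "patient_records"))
    print(can_read("analyst", "unclassified_table"))
```

The default-deny branch is the point of the design: a data set that nobody has inventoried and classified is readable by no one, which gives the security project a direct stake in finishing the governance work.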
Final Thoughts
So there you have it. Data governance has been a big challenge because the need for data outstripped our budget for systematically managing it. But re-imagined cloud tools, together with more robust cybersecurity tools and strategies, make proactive data governance attainable. I call that a win-win!