Contrary to popular belief, data are not the oil, fuel, energy, or life force coursing through the enterprise to inform decision-making, engender insights, and propel timely business action rooted in concrete facts.
Data quality is.
Without data quality, data science, big data, master data, and even metadata are all useless. Ensuring data quality is foundational to reducing risk and reaping any benefit from the data themselves.
According to Profisee MDM Strategist Bill O’Kane, however, doing so is more difficult than it looks because there’s “absolute data quality and relative data quality. It could just be bad, unpopulated, mispopulated data.” Such common occurrences result in absolute data quality problems.
However, organizations must also account for relative data quality concerns pertaining to the consistency dimension, which O’Kane described as “one of the facets of data quality in the formal sense and that is [the fields] may be all filled in, and they may all be within the domains they’re supposed to be, but they’re not alike. I can’t put them together.”
Complications in either of these manifestations of data quality can derail data processes for any use case, prolonging time to value, escalating costs, and even wasting money. Standardizing data quality with approaches like Master Data Management (MDM) redresses these issues and delivers the much-lauded value data provides the enterprise.
Distributed Architecture
In addition to consistency, the other major facets of data quality include accuracy, timeliness, completeness, and de-duplication. There are two cardinal challenges organizations must overcome to realize these characteristics of their data, which O’Kane categorized as “fragmentation and inconsistency. [Data] sits in a bunch of places and they don’t look like each other.” The former obstacle pertains to the increasingly distributed nature of the data landscape itself, in which a multitude of sources external to the enterprise span varying architectures, applications, and data types. The latter stems from those sources being created and maintained in isolation, so the same entities end up represented differently from system to system.
“Data’s created in silos without consistent governance,” reflected Profisee VP of Marketing Martin Boyd. “It may or may not suit the purpose of the source system, but when you bring it together and compare it with other data about the same person or entity that came from a different system, you find that there’s missing information, overlaps, conflicts.” The advent of the cloud frequently exacerbates this issue, as it’s one of the chief mediums contributing to the distribution of data and is sometimes leveraged to integrate data. “You’ve got data coming from many different places: from ERPs, from your CRM, custom applications, cloud applications, legacy applications, lots of different places, and you suck it into [the public cloud],” Boyd commented.
Data Governance
The core concepts of data governance, commonly conceived of today as people, processes, and technology, enable organizations to administer data quality not only in distinct source systems, but also in hubs such as MDM. This is a critical point because, with the amount of data increasing across distributed locations, no single application, tool, or person can instantly fix data quality issues. An MDM approach combines this triad of data governance precepts in several key ways to furnish absolute and relative data quality, including:
- Standards: MDM provides capabilities to transform data according to predefined rules so it adheres to consistent conventions for representing dates, customer names, or locations, capabilities that “enforce data quality standards,” Boyd mentioned.
- Record Detection: This approach has mechanisms for determining whether records, across sources or within them, describe the same entity, which is critical for de-duplicating data. MDM also uses fuzzy matching and probabilistic techniques to “match and merge, validate and correct and sync back across these systems,” Boyd noted. A minimal sketch of standardization and matching follows this list.
- Data Stewardship: MDM also serves as a central locus for addressing any conflicting or missing information that requires human expertise, grounded in domain knowledge and experience, to resolve. In these instances MDM helps with “stewardship work because the data’s just flat contradictory and the system has no way of knowing which version is correct,” Boyd explained. “In this case a human being has to step in and do work.”
- Automation: Much of the data quality transformation, matching, and merging MDM supplies is automated. Cognitive computing enables additional automation by learning from a data steward’s resolution of a particular conflict or question about accuracy, for example, then applying it to similar cases in the future.
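To make the standardization and matching capabilities above concrete, the following is a minimal sketch in Python. It is not Profisee’s implementation or any particular MDM product’s API; the field names, normalization rules, and similarity threshold are hypothetical, chosen only to illustrate how rule-based standardization and fuzzy matching fit together.

```python
# Illustrative only: rule-based standardization plus fuzzy duplicate detection.
# Field names ("name", "country"), the rules, and the 0.85 threshold are hypothetical.
from difflib import SequenceMatcher

def standardize(record: dict) -> dict:
    """Apply predefined rules so fields follow one convention."""
    out = dict(record)
    out["name"] = " ".join(record.get("name", "").split()).title()
    country = record.get("country", "").strip().lower()
    out["country"] = {"usa": "US", "united states": "US", "u.s.": "US"}.get(country, country.upper())
    return out

def similarity(a: str, b: str) -> float:
    """Fuzzy string similarity in [0, 1]."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

def likely_same_entity(r1: dict, r2: dict, threshold: float = 0.85) -> bool:
    """Flag two standardized records as probable duplicates of one entity."""
    return r1["country"] == r2["country"] and similarity(r1["name"], r2["name"]) >= threshold

crm = standardize({"name": "  john  SMITH ", "country": "USA"})
erp = standardize({"name": "Jon Smith", "country": "united states"})
print(likely_same_entity(crm, erp))  # True: a candidate pair for merging or stewardship review
```

In a production MDM hub the rules and match models would be far richer, and pairs that fall below a confidence threshold would typically be routed to data stewards rather than merged automatically, which is exactly the stewardship work described above.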
Absolute and Relative Quality
By combining some of the aforementioned aspects of data governance, as well as others like metadata management and data modeling, MDM institutes absolute and relative data quality. The former eliminates situations in which information is missing, outdated, or inaccurate. The latter is equally vital and enables organizations to synthesize information about entities across sources for several use cases, yielding demonstrable business value from the foundation of “a trusted and reliable source of all that data,” Boyd indicated. The capacity to rapidly assemble information about entities from various sources positively impacts everything from machine learning training data to recommendations for cross-selling and up-selling. Other key use cases include:
- Householding: By revealing which customers live under the same roof (that is, belong to the same household), or which home insurance customers are also business owners, for example, MDM users can refine their marketing and sales efforts to decrease costs and increase revenues via householding and super householding; a simple grouping sketch follows this list.
- B2B2C: Perfecting customer data with MDM supports the movement towards B2B2C, which O’Kane referred to as a trend in which “manufacturers that typically didn’t have their end customers’ data and sentiments are now trying to get them.” By gathering more data about what Wal-Mart patrons, for example, are doing with a specific manufacturer’s products, these companies seek “more customer knowledge to predict customer behavior,” Boyd remarked.
- Customer Profitability: Proper data quality in the customer domain of MDM can also pay off by indicating “different margins for different customers for customer profitability,” specified Profisee Director of Value Management Harbert Bernard. “It’s key to be able to understand that and segment the value of your customers.”
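As a simple illustration of householding, the sketch below groups already-matched customer records that share a normalized address. Again, this is a hypothetical example rather than a vendor implementation; the field names and the crude address normalization are assumptions for demonstration.

```python
# Illustrative only: grouping customers into households by normalized address.
# Field names and the minimal normalization rules are hypothetical.
from collections import defaultdict

def normalize_address(addr: str) -> str:
    """Crude address key: lowercase, drop commas, expand a common abbreviation."""
    tokens = addr.lower().replace(",", " ").split()
    tokens = ["street" if t in ("st", "st.") else t for t in tokens]
    return " ".join(tokens)

def household(customers: list) -> dict:
    """Map each normalized address to the names of the customers living there."""
    groups = defaultdict(list)
    for c in customers:
        groups[normalize_address(c["address"])].append(c["name"])
    return dict(groups)

print(household([
    {"name": "Ana Diaz",  "address": "12 Oak St, Springfield"},
    {"name": "Luis Diaz", "address": "12 Oak Street Springfield"},
    {"name": "Mia Chen",  "address": "9 Elm St, Springfield"},
]))
# {'12 oak street springfield': ['Ana Diaz', 'Luis Diaz'], '9 elm street springfield': ['Mia Chen']}
```

Grouping of this kind is only as good as the underlying standardization and matching, which is why householding is typically layered on top of an MDM hub’s mastered customer records.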
A Single Source of Truth
Data quality is the enterprise enabler of data-driven practices. Without it, data is far more a risky liability than a value-adding asset. With it, especially when both absolute and relative data quality are in place, organizations can combine information about specific entities to not only minimize risk, but also grow their bottom lines, because they know the truth about the data they’re employing.
“Your data can be great in all the individual sources you’ve brought it from, but when you put it together there’s conflict,” Boyd said. “What you need is high quality, trusted data in order to consume the data for anything, any business purpose you want, including machine learning, analytics, and reporting.”
About the Author
Jelani Harper is an editorial consultant servicing the information technology market. He specializes in data-driven applications focused on semantic technologies, data governance and analytics.