Data Quality

Welcome to my articles on data quality. Articles in this series are linked below:

  1. How to profile data
  2. Generating profiled data
  3. Continuous Data Quality: Problem Statements
  4. Continuous Data Quality: Business Requirements

Introduction:

Over the years, I’ve had to teach a number of people about data quality. Each time I’ve been mildly shocked that an area so important to so many businesses is so badly understood, even in companies that deal exclusively in data. I hope this series of articles can turn that situation around.

What is a data quality error?

Quite simply, a data quality error occurs when an item of data fails to meet someone’s expectations. To check for data quality, you first need to establish what those expectations are, and then you need a way of showing whether the data items do or don’t conform to them.
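One way to picture this is to treat each expectation as a named check and to list every data item that fails one. The sketch below is only an illustration: the function `find_errors`, the two expectations, and the sample values are all hypothetical, and real expectations would have to be agreed with the people who supply and consume the data.

```python
def find_errors(values, expectations):
    """Return (value, expectation_name) pairs for every failed expectation."""
    errors = []
    for value in values:
        for name, check in expectations.items():
            if not check(value):
                errors.append((value, name))
    return errors


# Hypothetical expectations for a short text field (assumed for illustration).
expectations = {
    "not blank": lambda v: v.strip() != "",
    "at most 8 characters": lambda v: len(v) <= 8,
}

# Made-up sample values, not real data.
sample = ["SW1A 1AA", "", "NOT A POSTCODE"]

errors = find_errors(sample, expectations)
for value, name in errors:
    print(f"{value!r} fails expectation: {name}")
```

Each reported pair names both the offending item and the expectation it broke, which is the evidence you need when discussing the problem with others.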

Sounds simple, right?

But here’s the thing. In places where data quality is an issue, I’ve noticed it’s generally because the expectations about the data items in question are unclear to everyone involved: to the data suppliers, to the data consumers, and most of all to the system developers who sit in the middle. What makes the problem harder still is that the few expectations people do have are completely out of line with what I call “the data reality”, because they are based purely on assumptions rather than on familiarity with the data itself.

How can this be turned around?

An effective way to turn this situation around is to get hold of a sizable chunk of real data, profile it, and then tell everyone what they can actually expect in real life. In other words, you need to clearly inform everyone about what they can expect and, where necessary, reset those expectations using evidence. If people aren’t happy with “the data reality”, at least the main players can size the problem and then set about making concrete plans to address the real issues.
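To give a flavour of what profiling can look like, here is a minimal sketch that summarizes a single column of values: how many there are, how many are blank, how many are distinct, and which "shapes" occur. Reducing values to shapes (letters become `A`, digits become `9`) is one common profiling technique, assumed here for illustration; it is not necessarily the approach the later articles take. The names `mask` and `profile` and the sample values are hypothetical.

```python
from collections import Counter


def mask(value):
    """Reduce a value to its shape: letters -> 'A', digits -> '9', other characters kept."""
    out = []
    for ch in value:
        if ch.isalpha():
            out.append("A")
        elif ch.isdigit():
            out.append("9")
        else:
            out.append(ch)
    return "".join(out)


def profile(values):
    """Summarize one column of string values."""
    return {
        "count": len(values),
        "blank": sum(1 for v in values if v.strip() == ""),
        "distinct": len(set(values)),
        "shapes": Counter(mask(v) for v in values),
    }


# Made-up sample of a "date" column, not real data.
sample = ["2021-03-04", "2021-03-05", "04/03/2021", "", "unknown"]

report = profile(sample)
for shape, n in report["shapes"].most_common():
    print(f"{n:>3}  {shape!r}")
```

A summary like this turns "we assume the dates are all in one format" into a count of how many values actually are, which is the kind of evidence that resets expectations.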

In the coming weeks I’m going to publish articles that explain how to put a full data quality programme together, and that give you all the tools you need to do it, allowing you:

  • to profile your data
  • to summarize and communicate your findings
  • to set up those processes as ongoing monitoring
  • to negotiate with data suppliers
  • to build tools to handle known issues automatically
  • and most importantly, to report on the ongoing value your data quality processes are adding to your operation (i.e. to justify the expenditure)

Do leave comments if you find this material helpful.
