Continuous Data Quality: Business Requirements

Automated data processing is not just about how many machines you can throw at the problem. Poor data quality is a threat to operations of any size. Integrating a data quality engine into your process is essential if you need long-term credibility, so below is a list of high-level requirements that support the problem statements in my last post.

Business Requirements:

  1. The automated ETL system should always produce trusted and robust outputs, even under conditions of variable file, record and data item quality.
  2. Quality failures can be detected at many levels of granularity, so quality monitoring must operate at the same levels: file, message, record and data item.
  3. To establish trust, all system inputs and outputs require a minimum level of data quality assessment to ensure that expectations set around data quality are met. These can be characterized by the principle of “Entry and Exit criteria checking” (a sketch of file-level entry checks follows this list).
  4. If the input data quality changes through “drift”, error, or by design, the system should be able to adapt to these ongoing situations in an agile manner.
  5. Data points failing to meet expectations should be tagged as such, and should be made available for inspection and reporting.
  6. Enough information should be collected about quality failures to describe, trace, and resolve the source of the issue (see the tagging sketch after this list).
  7. Where data cleanse rules alter inputs or create data, these changes must be tracked and reported.
  8. All data moving through the system must be traceable, proving that no data was unknowingly lost or introduced and that the provenance of all outputs is clear (a record-count reconciliation sketch follows this list).
  9. All file transfers in and out of the system must be wholly resilient in the face of machine, network and process failure.
  10. Duplicate data files created through error must be mitigated against in all cases.
  11. Error codes should be directly interpretable, rather than requiring lookup in a dictionary.
  12. Rules to resolve data quality issues should be applied at the same level as the tagging of data quality problems.
  13. The system should attempt to arrange a data resupply without human intervention when files arrive that fail entry criteria checks.
  14. Data quality checks of many kinds are needed on individual and composite raw data fields (see the field-check sketch after this list):
      1. uniqueness checks
      2. format and data type conformance
      3. range constraints
      4. dictionary conformance
      5. character filters
      6. null constraints
      7. check digit verification
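
To make requirement 3 a little more concrete, here is a minimal sketch of file-level entry criteria checking in Python. The file layout, the expected column names and the minimum record count are illustrative assumptions, not part of the requirements themselves.

```python
import csv

# Illustrative expectations; in practice these would come from a data contract
EXPECTED_COLUMNS = ["customer_id", "order_date", "amount"]  # assumed schema
MIN_ROWS = 1                                                # assumed minimum record count

def check_entry_criteria(path):
    """Return a list of entry-criteria failures; empty means the file may enter the pipeline."""
    failures = []
    with open(path, newline="") as f:
        reader = csv.reader(f)
        header = next(reader, None)
        if header != EXPECTED_COLUMNS:
            failures.append(f"unexpected header: {header}")
        row_count = sum(1 for _ in reader)
        if row_count < MIN_ROWS:
            failures.append(f"too few records: {row_count}")
    return failures
```

Exit criteria checking works the same way on outputs: a file only leaves the system if its checks return no failures.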
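Requirements 5 and 6 ask for failing data points to be tagged with enough context to describe, trace and resolve the issue. Below is a minimal sketch of such a tag, assuming CSV-style records and an illustrative null constraint on a hypothetical customer_id field.

```python
from dataclasses import dataclass

@dataclass
class QualityTag:
    """Enough context to describe, trace and resolve a single quality failure."""
    source_file: str   # where the record came from
    row_number: int    # position within the source file
    field: str         # which data item failed
    value: str         # the offending value
    check: str         # which check it failed
    message: str       # human-readable description for reporting

def tag_failures(record, row_number, source_file):
    """Return QualityTags for a record; illustrative null constraint on customer_id."""
    tags = []
    if not record.get("customer_id"):
        tags.append(QualityTag(source_file, row_number, "customer_id",
                               record.get("customer_id", ""), "null_constraint",
                               "customer_id must not be empty"))
    return tags

# Failing records stay in the flow but carry their tags for inspection and reporting
print(tag_failures({"customer_id": ""}, row_number=42, source_file="orders.csv"))
```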
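The traceability in requirement 8 can be demonstrated with simple record-count reconciliation: every input record must be accounted for as either rejected or output. The category names and totals below are illustrative only.

```python
from collections import Counter

def reconciles(counts):
    """True when every input record is accounted for as either rejected or output."""
    return counts["input"] == counts["rejected"] + counts["output"]

# Illustrative totals accumulated while a batch was processed
counts = Counter(input=1000, rejected=12, output=988)
assert reconciles(counts), "record counts do not balance: data lost or introduced"
```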
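Finally, the field-level checks in requirement 14 are mostly small, composable predicates. The sketch below shows one plausible shape for them; the regex, range bounds, currency dictionary, banned characters and the choice of the Luhn algorithm for check digit verification are all illustrative assumptions rather than prescribed rules.

```python
import re

def is_unique(values):
    """Uniqueness check across a column of values."""
    return len(values) == len(set(values))

def conforms_to_format(value, pattern=r"^\d{4}-\d{2}-\d{2}$"):
    """Format and data type conformance, e.g. an ISO-style date (illustrative pattern)."""
    return re.fullmatch(pattern, value) is not None

def within_range(value, low=0.0, high=10_000.0):
    """Range constraint on a numeric field (illustrative bounds)."""
    try:
        return low <= float(value) <= high
    except ValueError:
        return False

def in_dictionary(value, allowed=frozenset({"GBP", "USD", "EUR"})):
    """Dictionary conformance against an allowed-value list (illustrative dictionary)."""
    return value in allowed

def passes_character_filter(value, banned=frozenset("<>|")):
    """Character filter rejecting disallowed characters (illustrative set)."""
    return not any(ch in banned for ch in value)

def not_null(value):
    """Null constraint: the field must be present and non-empty."""
    return value is not None and str(value).strip() != ""

def luhn_ok(number):
    """Check digit verification using the Luhn algorithm (one common scheme)."""
    digits = [int(d) for d in str(number)][::-1]
    total = sum(d if i % 2 == 0 else sum(divmod(d * 2, 10))
                for i, d in enumerate(digits))
    return total % 10 == 0
```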

[The index for this series of articles is my data quality page.]
