Automated data processing is not just about how many machines you can throw at the problem. Poor data quality is a threat to an operation of any size. Integrating a data quality engine into your process is important if you need long-term credibility, so below is a list of high-level requirements that support the problem statements in my last post.
Business Requirements:
- The automated ETL system should always produce trusted and robust outputs, even under conditions of variable file, record and data item quality.
- Since quality failures can occur at many levels of granularity, quality monitoring must operate at the same levels: file, message, record and data item.
- To establish trust, all system inputs and outputs require a minimum level of data quality assessment to ensure that expectations set around data quality are met. These can be characterized by the principle of “Entry and Exit criteria checking” (see the first sketch after this list).
- If the input data quality changes through “drift”, error, or by design, the system should be able to handle these ongoing situations in an agile, adaptable manner.
- Data points failing to meet expectations should be tagged as such, and should be made available for inspection and reporting.
- Enough information should be collected about quality failures to describe, trace, and resolve the source of the issue.
- Where data cleansing rules alter inputs or create data, these changes must be tracked and reported.
- All data moving through the system must be traceable, proving that no data was unknowingly lost or introduced and that the provenance of all outputs is clear.
- All file transfers in and out of the system must be wholly resilient in the face of machine, network and process failure
- Duplicate data files created through error must be mitigated against in all cases.
- Error codes should be directly interpretable, rather than requiring lookup in a dictionary.
- Rules to resolve data quality issues should be applied at the same level as the tagging of data quality problems
- The system should attempt to arrange a data resupply without human intervention when files arrive that fail entry criteria checks.
- Data quality checks of many kinds are needed on individual and composite raw data fields (see the second sketch after this list):
  - uniqueness checks
  - format and data type conformance
  - range constraints
  - dictionary conformance
  - character filters
  - null constraints
  - check digit verification
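
To make the entry criteria idea concrete, here is a minimal sketch of a file-level entry check in Python. The column names, row-count threshold, and the quarantine/resupply helper are hypothetical illustrations, not part of any particular implementation.

```python
import csv
from dataclasses import dataclass, field

@dataclass
class EntryCheckResult:
    passed: bool = True
    failures: list = field(default_factory=list)

def check_entry_criteria(path, expected_columns, min_rows=1):
    """Reject a file before it enters the pipeline if it misses basic expectations."""
    result = EntryCheckResult()
    with open(path, newline="") as f:
        reader = csv.reader(f)
        try:
            header = next(reader)
        except StopIteration:
            result.failures.append("file is empty")
            result.passed = False
            return result
        missing = [c for c in expected_columns if c not in header]
        if missing:
            result.failures.append(f"missing columns: {missing}")
        row_count = sum(1 for _ in reader)
        if row_count < min_rows:
            result.failures.append(f"only {row_count} data rows, expected at least {min_rows}")
    result.passed = not result.failures
    return result

# A failed check would quarantine the file and trigger an automated resupply request:
# result = check_entry_criteria("inbound/customers.csv", ["id", "email", "country"])
# if not result.passed:
#     quarantine_and_request_resupply("inbound/customers.csv", result.failures)  # hypothetical helper
```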
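
And here is a sketch of record-level checks that tag failing data items rather than silently dropping them, covering the check types listed above. The field names, reference dictionary, error codes, and ranges are illustrative assumptions only.

```python
import re

def luhn_ok(value: str) -> bool:
    """Check digit verification using the Luhn algorithm."""
    digits = [int(d) for d in value if d.isdigit()]
    if len(digits) < 2:
        return False
    checksum = 0
    for i, d in enumerate(reversed(digits)):
        if i % 2 == 1:
            d *= 2
            if d > 9:
                d -= 9
        checksum += d
    return checksum % 10 == 0

COUNTRY_CODES = {"GB", "IE", "FR", "DE"}              # dictionary conformance
EMAIL_RE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")  # format conformance

def check_record(rec: dict, seen_ids: set) -> list:
    """Return (field, error_code, message) tags for one record; an empty list means clean."""
    tags = []
    # null constraint and uniqueness on the key field
    if not rec.get("id"):
        tags.append(("id", "NULL_CONSTRAINT", "id is required"))
    elif rec["id"] in seen_ids:
        tags.append(("id", "UNIQUENESS", "duplicate id"))
    else:
        seen_ids.add(rec["id"])
    # format conformance
    if rec.get("email") and not EMAIL_RE.match(rec["email"]):
        tags.append(("email", "FORMAT", "email does not match expected pattern"))
    # dictionary conformance
    if rec.get("country") and rec["country"] not in COUNTRY_CODES:
        tags.append(("country", "DICTIONARY", "unknown country code"))
    # range constraint
    age = rec.get("age")
    if age is not None:
        try:
            in_range = 0 <= int(age) <= 130
        except (TypeError, ValueError):
            in_range = False
        if not in_range:
            tags.append(("age", "RANGE", "age is non-numeric or outside 0-130"))
    # character filter
    if rec.get("name") and any(ord(c) < 32 for c in rec["name"]):
        tags.append(("name", "CHARACTER_FILTER", "control characters in name"))
    # check digit verification
    if rec.get("card_number") and not luhn_ok(rec["card_number"]):
        tags.append(("card_number", "CHECK_DIGIT", "check digit (Luhn) verification failed"))
    return tags
```

The error codes are spelled out so they can be read directly, in line with the requirement that error codes should not need a lookup dictionary.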
[The index for this series of articles is my data quality page.]