Data Quality - Metrics
1 - About
2 - Articles Related
3 - Metrics / Data Quality Criteria
3.1 - Uniqueness / Duplicate
3.2 - Timeliness
Timeliness refers to the time expectation for accessibility and availability of information.
Timeliness can be measured as the time between when information is expected and when it is readily available for use.
For example, in the financial industry, investment product pricing data is often provided by third-party vendors. As the success of the business depends on accessibility to that pricing data, service levels specifying how quickly the data must be provided can be defined and compliance with those timeliness constraints can be measured.
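As a minimal sketch of the measurement described above, the following Python computes the delay between the expected and the actual availability of each delivery, and the share of deliveries that meet a service level. The function names, the 15-minute threshold, and the sample timestamps are illustrative assumptions, not part of any vendor agreement.

```python
from datetime import datetime, timedelta


def timeliness_delay(expected_at, available_at):
    """Return how late the data became available (zero if on time or early)."""
    return max(available_at - expected_at, timedelta(0))


def sla_compliance(deliveries, max_delay):
    """Return the fraction of deliveries whose delay is within max_delay."""
    if not deliveries:
        raise ValueError("at least one delivery is required")
    on_time = sum(
        1
        for expected_at, available_at in deliveries
        if timeliness_delay(expected_at, available_at) <= max_delay
    )
    return on_time / len(deliveries)


# Hypothetical pricing feed: (expected time, time the data was actually usable).
deliveries = [
    (datetime(2024, 1, 2, 9, 0), datetime(2024, 1, 2, 8, 55)),  # early
    (datetime(2024, 1, 3, 9, 0), datetime(2024, 1, 3, 9, 10)),  # 10 minutes late
    (datetime(2024, 1, 4, 9, 0), datetime(2024, 1, 4, 9, 45)),  # 45 minutes late
    (datetime(2024, 1, 5, 9, 0), datetime(2024, 1, 5, 9, 0)),   # on time
]

# Assumed service level: data must be usable within 15 minutes of the expected time.
compliance = sla_compliance(deliveries, max_delay=timedelta(minutes=15))
print(f"SLA compliance: {compliance:.0%}")
```

Treating early arrivals as zero delay is a design choice of this sketch; a stricter rule could track earliness separately.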
3.3 - Referential Integrity
Assigning unique identifiers to objects (customers, products, etc.) within your environment simplifies the management of your data, but introduces a new expectation: any time an object identifier is used as a foreign key within a data set to refer to the core representation, that core representation must actually exist. More formally, this is referred to as referential integrity. Rules associated with referential integrity are often manifested as constraints against duplication (to ensure that each entity is represented once, and only once) and as reference integrity rules, which assert that all values used as keys actually refer back to an existing master record.
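The two kinds of rule can be sketched in Python as follows: one check for duplicated master identifiers, and one for foreign keys that point at no master record. The table layout, the `customer_id` field name, and the sample data are hypothetical.

```python
from collections import Counter


def find_duplicate_keys(master_keys):
    """Return identifiers that appear more than once in the master data."""
    counts = Counter(master_keys)
    return sorted(key for key, count in counts.items() if count > 1)


def find_orphans(child_rows, foreign_key, master_keys):
    """Return child rows whose foreign key has no matching master record."""
    known = set(master_keys)
    return [row for row in child_rows if row[foreign_key] not in known]


# Hypothetical master data: customer identifiers (2 is duplicated).
customer_ids = [1, 2, 3, 2]

# Hypothetical referencing data: orders pointing at customers.
orders = [
    {"order_id": 10, "customer_id": 1},
    {"order_id": 11, "customer_id": 4},  # no customer 4 exists
    {"order_id": 12, "customer_id": 3},
]

duplicates = find_duplicate_keys(customer_ids)
orphans = find_orphans(orders, "customer_id", customer_ids)
print("duplicate identifiers:", duplicates)
print("orphaned orders:", [row["order_id"] for row in orphans])
```

In a relational database the same expectations are normally declared as primary key and foreign key constraints; a check like this one is useful for data that sits outside such constraints.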
3.4 - Accuracy
Data accuracy refers to the degree to which data correctly represent the “real-life” objects they are intended to model.
3.5 - Consistency
In its most basic form, consistency means that data values in one data set do not conflict with the values in another data set.
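A basic consistency check can be sketched by comparing the same attribute for records that appear in two data sets. The data set names, the `country` attribute, and the keying by customer identifier are assumptions made for illustration.

```python
def inconsistent_keys(dataset_a, dataset_b, attribute):
    """Return keys present in both data sets whose attribute values differ."""
    shared = dataset_a.keys() & dataset_b.keys()
    return sorted(
        key
        for key in shared
        if dataset_a[key][attribute] != dataset_b[key][attribute]
    )


# Hypothetical data sets keyed by customer identifier.
crm = {
    101: {"country": "NL"},
    102: {"country": "BE"},
    103: {"country": "DE"},
}
billing = {
    101: {"country": "NL"},
    102: {"country": "FR"},  # conflicts with the CRM value
    104: {"country": "DE"},  # not in the CRM, so not compared
}

conflicts = inconsistent_keys(crm, billing, "country")
print("customers with conflicting country:", conflicts)
```

Records that exist in only one data set are ignored here; whether they count as a consistency problem or a completeness problem is a separate decision.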
3.6 - Completeness
An expectation of completeness indicates that certain attributes should be assigned values in a data set. Completeness rules can be assigned to a data set at three levels of constraint:
- Mandatory attributes, which require a value,
- Optional attributes, which may have a value based on some set of conditions,
- Inapplicable attributes (such as maiden name for a single male), which may not have a value.
Completeness may also be seen as encompassing usability and appropriateness of data values.
Examples of completeness rules are:
- To ensure that all orders are deliverable, each line item must refer to a product, so each line item must have a product identifier. A line item is therefore not valid unless its product identifier field is complete.
- A maiden name must not be recorded for a single male (an inapplicable attribute).
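The mandatory and inapplicable levels can be sketched as a small rule checker. The record layouts, field names, and the way the "single male" condition is encoded are hypothetical; optional attributes impose no constraint in this sketch and are therefore not checked.

```python
def is_empty(value):
    """Treat None and the empty string as 'no value'."""
    return value is None or value == ""


def check_completeness(record, mandatory=(), inapplicable_when=None):
    """Return a list of completeness violations for one record.

    mandatory: attribute names that must have a value.
    inapplicable_when: maps an attribute name to a predicate; when the
    predicate is true for the record, the attribute must have no value.
    """
    violations = []
    for attribute in mandatory:
        if is_empty(record.get(attribute)):
            violations.append(f"missing mandatory attribute: {attribute}")
    for attribute, condition in (inapplicable_when or {}).items():
        if condition(record) and not is_empty(record.get(attribute)):
            violations.append(f"inapplicable attribute has a value: {attribute}")
    return violations


# Rule 1: every line item must carry a product identifier.
line_item = {"order_id": 7, "product_id": None, "quantity": 2}
line_item_violations = check_completeness(
    line_item, mandatory=("order_id", "product_id")
)

# Rule 2: no maiden name for a single male.
person = {"name": "J. Smith", "sex": "M", "marital_status": "single",
          "maiden_name": "Jones"}
person_violations = check_completeness(
    person,
    mandatory=("name",),
    inapplicable_when={
        "maiden_name": lambda r: r.get("sex") == "M"
        and r.get("marital_status") == "single"
    },
)

print(line_item_violations)
print(person_violations)
```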
3.7 - Currency
Currency refers to the degree to which information is current with the world that it models.
Currency can measure how “up-to-date” information is, and whether it is correct despite possible time-related changes.
Data currency may be measured:
- as a function of the expected rate at which different data elements should be refreshed,
- and by verifying that the data is in fact up to date.
This verification may require both automated and manual processes.
Currency rules may be defined to assert the “lifetime” of a data value before it needs to be checked and possibly refreshed.
For example, one might assert that the contact information for each customer must be current, indicating a requirement to maintain the most recent values associated with the individual’s contact data.
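A "lifetime" rule of this kind can be sketched as a comparison between the age of a value and its allowed lifetime. The one-year lifetime, the `contact_verified_at` field, and the sample records are assumptions chosen for illustration.

```python
from datetime import datetime, timedelta


def is_stale(last_refreshed, lifetime, now):
    """Return True when a value has outlived its allowed lifetime."""
    return now - last_refreshed > lifetime


def stale_customers(records, lifetime, now):
    """Return the identifiers of customers whose contact data needs rechecking."""
    return [
        record["customer_id"]
        for record in records
        if is_stale(record["contact_verified_at"], lifetime, now)
    ]


# Hypothetical customer records with the date the contact data was last verified.
records = [
    {"customer_id": 1, "contact_verified_at": datetime(2024, 1, 15)},
    {"customer_id": 2, "contact_verified_at": datetime(2022, 11, 30)},
    {"customer_id": 3, "contact_verified_at": datetime(2023, 9, 10)},
]

# Assumed rule: contact information must be re-verified at least once a year.
now = datetime(2024, 6, 1)
to_refresh = stale_customers(records, lifetime=timedelta(days=365), now=now)
print("customers to re-verify:", to_refresh)
```

A stale value is not necessarily wrong; the rule only flags it as due for a check, which may then be automated or manual.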
3.8 - Conformity
Conformity refers to whether instances of data are stored, exchanged, or presented in a format that is consistent with the domain of values, as well as consistent with other similar attribute values.
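Both aspects, format and domain of values, can be sketched as predicates applied to each value, with the conformity rate being the share of values that pass. The date pattern, the set of allowed country codes, and the sample values are hypothetical.

```python
import re

# Assumed format rule: dates are written as YYYY-MM-DD.
# This checks the shape only, not whether the date exists in the calendar.
ISO_DATE = re.compile(r"\d{4}-\d{2}-\d{2}")

# Assumed domain of values for a country code attribute.
ALLOWED_COUNTRIES = {"NL", "BE", "DE"}


def conforms_to_format(value):
    """Return True when the value matches the expected date format."""
    return ISO_DATE.fullmatch(value) is not None


def conforms_to_domain(value):
    """Return True when the value belongs to the allowed domain."""
    return value in ALLOWED_COUNTRIES


def conformity_rate(values, predicate):
    """Return the fraction of values that satisfy the predicate."""
    if not values:
        raise ValueError("at least one value is required")
    return sum(1 for value in values if predicate(value)) / len(values)


dates = ["2024-01-31", "31/01/2024", "2024-1-5", "2024-02-29"]
countries = ["NL", "nl", "Belgium", "DE"]

date_rate = conformity_rate(dates, conforms_to_format)
country_rate = conformity_rate(countries, conforms_to_domain)
print(f"date format conformity: {date_rate:.0%}")
print(f"country domain conformity: {country_rate:.0%}")
```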
3.9 - Duplicate
4 - Notes
4.1 - In Dutch