Data Quality - Metrics

About

These metrics are very often called dimensions, but that is a misuse of the term: in a dimensional model, they act rather as attributes of the data rules.

Metrics / Data Quality Criteria

Uniqueness / Duplicate

See Logical Data Modeling - Entity Integrity (Uniqueness|No Duplicate|Distinct)

Timeliness

Timeliness refers to the time expectation for accessibility and availability of information.

Timeliness can be measured as the time between when information is expected and when it is readily available for use.

For example, in the financial industry, investment product pricing data is often provided by third-party vendors. As the success of the business depends on accessibility to that pricing data, service levels specifying how quickly the data must be provided can be defined and compliance with those timeliness constraints can be measured.
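The measurement described above can be sketched in Python. This is a minimal illustration, not a prescribed method: the function names, the timestamps and the 15-minute tolerance are all hypothetical.

```python
from datetime import datetime, timedelta


def timeliness_delay(expected_at: datetime, available_at: datetime) -> timedelta:
    """Time between when the data was expected and when it became available.

    Deliveries that arrive early or on time count as zero delay.
    """
    return max(available_at - expected_at, timedelta(0))


def meets_service_level(expected_at: datetime, available_at: datetime,
                        tolerance: timedelta) -> bool:
    """True when the delay stays within the agreed tolerance."""
    return timeliness_delay(expected_at, available_at) <= tolerance


# Hypothetical pricing feed: expected at 08:00, usable at 08:12.
expected = datetime(2024, 1, 15, 8, 0)
available = datetime(2024, 1, 15, 8, 12)

print(timeliness_delay(expected, available))                              # 0:12:00
print(meets_service_level(expected, available, timedelta(minutes=15)))    # True
```

Counting how many deliveries meet the service level over a period gives a compliance rate that can be tracked over time.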

Referential Integrity

Assigning unique identifiers to objects (customers, products, etc.) within your environment simplifies the management of your data, but introduces a new expectation: any time an object identifier is used as a foreign key within a data set to refer to the core representation, that core representation must actually exist. More formally, this is referred to as referential integrity. The associated rules are often manifested as constraints against duplication (to ensure that each entity is represented once, and only once) and as reference integrity rules, which assert that all values used as keys actually refer back to an existing master record.
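Both kinds of rule can be checked with a few lines of Python. The sketch below assumes records are plain dictionaries; the `customers` and `orders` data sets and their field names are hypothetical. In a relational database, the same rules would normally be enforced with primary key and foreign key constraints.

```python
from collections import Counter


def duplicate_keys(master_rows, key):
    """Identifiers that appear more than once in the master data set."""
    counts = Counter(row[key] for row in master_rows)
    return sorted(k for k, n in counts.items() if n > 1)


def orphan_references(child_rows, foreign_key, master_rows, key):
    """Child rows whose foreign key has no matching master record."""
    known = {row[key] for row in master_rows}
    return [row for row in child_rows if row[foreign_key] not in known]


# Hypothetical master data: customer 2 is represented twice.
customers = [
    {"customer_id": 1, "name": "Ada"},
    {"customer_id": 2, "name": "Grace"},
    {"customer_id": 2, "name": "Grace H."},
]
# Hypothetical transactions: order 11 refers to a customer that does not exist.
orders = [
    {"order_id": 10, "customer_id": 1},
    {"order_id": 11, "customer_id": 3},
]

print(duplicate_keys(customers, "customer_id"))
# [2]
print(orphan_references(orders, "customer_id", customers, "customer_id"))
# [{'order_id': 11, 'customer_id': 3}]
```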

Accuracy

Data accuracy refers to the degree to which data correctly represent the “real-life” objects they are intended to model.

Consistency

In its most basic form, consistency means that data values in one data set agree with the corresponding values in another data set.
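As a rough illustration, a consistency check between two data sets can compare one attribute for every key that both sets share. The data set names (`crm`, `billing`) and the `country` field below are hypothetical.

```python
def inconsistent_keys(left, right, field):
    """Keys present in both data sets whose `field` values disagree.

    `left` and `right` map a record key to a record (a dictionary).
    Keys found in only one data set are ignored here: that is a
    referential integrity or completeness question, not a consistency one.
    """
    shared = left.keys() & right.keys()
    return sorted(k for k in shared if left[k].get(field) != right[k].get(field))


# Hypothetical data sets describing the same customers.
crm = {1: {"country": "NL"}, 2: {"country": "BE"}}
billing = {1: {"country": "NL"}, 2: {"country": "FR"}, 3: {"country": "DE"}}

print(inconsistent_keys(crm, billing, "country"))
# [2]
```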

Completeness

An expectation of completeness indicates that certain attributes should be assigned values in a data set. Completeness rules can be assigned to a data set at three levels of constraint:

  1. Mandatory attributes, which require a value
  2. Optional attributes, which may have a value based on some set of conditions
  3. Inapplicable attributes (such as maiden name for a single male), which must not have a value

Completeness may also be seen as encompassing usability and appropriateness of data values.

Examples of completeness rules:

  • To ensure that all orders are deliverable, each line item must refer to a product, so each line item must have a product identifier. The line item is therefore not valid unless the product identifier field is filled in.
  • A single male must have no maiden name.
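The two example rules can be expressed as a small Python check. This is a sketch under assumptions: records are dictionaries, a value counts as missing when it is `None` or a blank string, and all field names are hypothetical.

```python
def is_missing(value):
    """Assumption: None and blank strings both count as 'no value'."""
    return value is None or (isinstance(value, str) and value.strip() == "")


def completeness_violations(record, mandatory, inapplicable_when):
    """Check one record against mandatory and inapplicable attribute rules.

    `mandatory` is a list of field names that require a value.
    `inapplicable_when` maps a field name to a function that returns True
    when the field must stay empty for the given record.
    """
    violations = []
    for field in mandatory:
        if is_missing(record.get(field)):
            violations.append(f"{field}: mandatory value is missing")
    for field, is_inapplicable in inapplicable_when.items():
        if is_inapplicable(record) and not is_missing(record.get(field)):
            violations.append(f"{field}: value present but attribute is inapplicable")
    return violations


# Hypothetical line item without a product identifier.
line_item = {"order_id": 10, "product_id": "", "quantity": 2}
print(completeness_violations(line_item, ["order_id", "product_id"], {}))
# ['product_id: mandatory value is missing']

# Hypothetical person record: a single male with a maiden name.
person = {"name": "John", "gender": "M", "marital_status": "single",
          "maiden_name": "Smith"}
rules = {
    "maiden_name": lambda r: r.get("gender") == "M"
    and r.get("marital_status") == "single",
}
print(completeness_violations(person, ["name"], rules))
# ['maiden_name: value present but attribute is inapplicable']
```

Optional attributes need no check of their own in this sketch: they pass whether or not a value is present.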

Currency

Currency refers to the degree to which information is current with the world that it models.

Currency can measure how “up-to-date” information is, and whether it is correct despite possible time-related changes.

Data currency may be measured by:

  • comparing the age of each data element with the frequency at which it is expected to be refreshed,
  • verifying that the data is up to date.

This may require both automated and manual processes.

Currency rules may be defined to assert the “lifetime” of a data value before it needs to be checked and possibly refreshed.

For example, one might assert that the contact information for each customer must be current, indicating a requirement to maintain the most recent values associated with the individual’s contact data.
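A lifetime rule of this kind can be sketched in Python as follows. The lifetimes (one year for an email address, 180 days for a phone number), the field names and the dates are hypothetical values chosen for illustration.

```python
from datetime import datetime, timedelta


def is_stale(last_refreshed: datetime, lifetime: timedelta, now: datetime) -> bool:
    """True when the value has outlived its asserted lifetime."""
    return now - last_refreshed > lifetime


def stale_fields(last_refreshed_by_field, lifetimes, now):
    """Fields of one record that should be checked and possibly refreshed.

    Fields without a lifetime rule are not evaluated.
    """
    return sorted(
        field
        for field, refreshed_at in last_refreshed_by_field.items()
        if field in lifetimes and is_stale(refreshed_at, lifetimes[field], now)
    )


# Hypothetical lifetime rules for customer contact data.
lifetimes = {"email": timedelta(days=365), "phone": timedelta(days=180)}
# Hypothetical last-refresh timestamps for one customer.
last_refreshed = {
    "email": datetime(2023, 6, 1),
    "phone": datetime(2023, 1, 10),
}

print(stale_fields(last_refreshed, lifetimes, now=datetime(2024, 1, 1)))
# ['phone']
```

Passing `now` explicitly, rather than reading the system clock inside the function, keeps the check reproducible.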

Conformity

(Conformance)

Conformity refers to whether instances of data are stored, exchanged, or presented in a format that is consistent with the domain of values, as well as consistent with other similar attribute values.
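One simple way to check format conformity is to test each value against a pattern that describes the expected format. The sketch below assumes string values and a hypothetical date attribute that should follow the `YYYY-MM-DD` layout. It checks the format only, not whether the date exists on the calendar.

```python
import re


def nonconforming(values, pattern):
    """Values that do not fully match the expected format."""
    compiled = re.compile(pattern)
    return [v for v in values if compiled.fullmatch(v) is None]


def conformity_rate(values, pattern):
    """Share of values that match the expected format.

    Assumption: an empty data set is treated as fully conforming.
    """
    if not values:
        return 1.0
    return 1 - len(nonconforming(values, pattern)) / len(values)


# Hypothetical format rule: four digits, two digits, two digits.
ISO_DATE = r"\d{4}-\d{2}-\d{2}"
dates = ["2024-01-15", "15/01/2024", "2024-02-30", "2024-3-1"]

print(nonconforming(dates, ISO_DATE))    # ['15/01/2024', '2024-3-1']
print(conformity_rate(dates, ISO_DATE))  # 0.5
```

Note that `2024-02-30` conforms to the format even though it is not a real date: catching that is an accuracy check rather than a conformity check.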

Duplicate

See Logical Data Modeling - Entity Integrity (Uniqueness|No Duplicate|Distinct)

Notes

In Dutch

  • Volledigheid (completeness)
  • Tijdigheid (timeliness)
  • Bruikbaarheid (usability)
  • Consistentheid (consistency)
  • Accuraatheid (accuracy)
  • Correctheid (correctness)




