Data Quality - Entity (Resolution|Disambiguation) - Record (linkage|matching) - Conflation
1 - About
Also known as:
- entity disambiguation / entity linking
- duplicate detection or deduplication
- record matching
- (reference) reconciliation
- object identification
- conflation
Entity Resolution (ER) is the task of finding the records in a data set that refer to the same real-world entity across different data sources, so that those records can share a single identifier.
A data set that has undergone ER may be referred to as being cross-linked.
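The definition above can be illustrated with a minimal sketch. It makes several assumptions: the customer records, the `normalize` helper, and the 0.85 threshold are all hypothetical. Fuzzy similarity comes from `difflib.SequenceMatcher` in the Python standard library, chosen here only for simplicity. Every pair of records is compared, pairs scoring above the threshold are treated as matches, and a union-find structure groups transitively matching records under one shared entity id. That shared id is what makes the data set cross-linked.

```python
from difflib import SequenceMatcher
from itertools import combinations

# Hypothetical records coming from two different sources (CRM and ERP)
records = [
    {"id": "crm-1", "name": "Acme Corporation"},
    {"id": "erp-7", "name": "ACME Corporation"},
    {"id": "erp-9", "name": "Acme Corporaton"},  # typo in the source system
    {"id": "crm-2", "name": "Globex Inc"},
]

def normalize(text: str) -> str:
    """Lowercase and drop punctuation so cosmetic differences do not count."""
    return "".join(ch for ch in text.lower() if ch.isalnum() or ch.isspace()).strip()

def similarity(a: str, b: str) -> float:
    """Fuzzy similarity in [0, 1] between two normalized strings."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()

def resolve(records, threshold=0.85):
    """Map each record id to an entity id (the root of its match cluster)."""
    parent = {r["id"]: r["id"] for r in records}

    def find(x):
        # Follow parent links to the cluster root, halving the path as we go
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    # Naive all-pairs comparison: O(n^2), acceptable only for small data sets
    for a, b in combinations(records, 2):
        if similarity(a["name"], b["name"]) >= threshold:
            parent[find(b["id"])] = find(a["id"])  # merge the two clusters

    return {r["id"]: find(r["id"]) for r in records}

entity_of = resolve(records)
print(entity_of)
# {'crm-1': 'crm-1', 'erp-7': 'crm-1', 'erp-9': 'crm-1', 'crm-2': 'crm-2'}
```

The union-find step ensures that if A matches B and B matches C, all three receive the same identifier even when A and C were never directly compared above the threshold. Real systems usually replace the all-pairs loop with blocking and combine several fields.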
2 - Articles Related
3 - Example
- Entity resolution across two data sets of commercial products.
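A product example along these lines can be sketched as follows. It assumes two small, hypothetical shop catalogs with a brand and a title per product. Blocking on the normalized brand keeps the number of comparisons down, because only products of the same brand are compared. Inside each block, titles are compared with the Jaccard similarity of their lowercase word sets, and the best candidate is kept if it reaches an assumed threshold of 0.5. The field names, data, and threshold are illustrative assumptions, not a prescribed method.

```python
from collections import defaultdict

# Hypothetical product catalogs from two different shops
shop_a = [
    {"sku": "A1", "brand": "Sony", "title": "Sony WH-1000XM4 Wireless Headphones Black"},
    {"sku": "A2", "brand": "Apple", "title": "Apple iPhone 13 128GB Blue"},
]
shop_b = [
    {"sku": "B7", "brand": "SONY", "title": "WH-1000XM4 wireless noise cancelling headphones black"},
    {"sku": "B8", "brand": "Apple", "title": "Apple iPhone 13 256GB Midnight"},
    {"sku": "B9", "brand": "Apple", "title": "Apple iPhone 13 128GB blue"},
]

def tokens(title: str) -> set:
    """Lowercase word set of a product title."""
    return set(title.lower().split())

def jaccard(a: set, b: set) -> float:
    """Size of the intersection divided by size of the union."""
    return len(a & b) / len(a | b) if (a or b) else 0.0

def link(left, right, threshold=0.5):
    """Link each left product to its best-scoring right product, if good enough."""
    # Blocking: index the right catalog by normalized brand
    blocks = defaultdict(list)
    for product in right:
        blocks[product["brand"].lower()].append(product)

    links = {}
    for item in left:
        candidates = blocks.get(item["brand"].lower(), [])
        scored = [(jaccard(tokens(item["title"]), tokens(c["title"])), c["sku"])
                  for c in candidates]
        if scored:
            score, sku = max(scored)  # highest similarity within the block
            if score >= threshold:
                links[item["sku"]] = sku
    return links

links = link(shop_a, shop_b)
print(links)  # {'A1': 'B7', 'A2': 'B9'}
```

The 256GB iPhone (B8) shares only three of seven distinct title words with A2 and is rejected. The Sony pair matches despite the different wording (4 of 7 words shared). Each left product is linked greedily to its best candidate. A strict one-to-one assignment would need an extra step.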
4 - Approach
- See also: Device fingerprint
5 - Library
- https://github.com/dedupeio/dedupe - A Python library for accurate and scalable fuzzy matching, record deduplication and entity resolution. Dedupe is based on Mikhail Yuryevich Bilenko's Ph.D. dissertation, "Learnable Similarity Functions and their Application to Record Linkage and Clustering". See also: similarity