English Deutsch Français Italiano Español Português 繁體中文 Bahasa Indonesia Tiếng Việt ภาษาไทย
All categories

PROBLEM QUESTION: what is the probability that two different people - each from a different source system (database) will be matched in my target system (database)?
NOTE: the goal is to match the same person existing in different source systems (databases) in the target system (database)

MATCH CRITERIA:
> Match on Social Security Number, First Name, Last Name
+ SSN is not null
+ SSN is valid (using SSA rules)
+ First Name and Last Name are not case sensitive
+ First Name and Last Name have all spaces, dashes, etc. removed to create a string - which is used to match against.

SOURCE SYSTEMS:
> System A: 3 million customers.
+ Duplicate SSN's allowed. Current query reveals 4 duplicates
> System B: 1.5 million customers
+ No duplicate SSN's allowed.
> System C: 180,000 customers
+ Duplicate SSN's allowed. Current query reveals no duplicates

TARGET SYSTEM (database):
> Stores all matched records from the source systems the client is in.

2007-01-03 09:11:00 · 1 answers · asked by jabar 1 in Science & Mathematics Mathematics

1 answers

Eh... I don't know how you'd compute a probability for this. I wouldn't think that any two people would have the same security number, unless someone made a mistake typing it in, but the probability that two different people would have the same name AND the same social security number would be infinitesimal.

You'd be more likely to find problems by joining on SSN alone, and then outputting records where the SSN matched but the first name and/or last name didn't:

select
a.ssn,
a.lname,
a.fname,
b.fname,
b.lname
from
system_a_person a inner join
system_b_person b on
a.ssn = b.ssn
where
a.lname <> b.lname or
a.fname <> b.fname

2007-01-03 13:44:07 · answer #1 · answered by Jim Burnell 6 · 0 0

fedest.com, questions and answers