A growing trend in financial institutions is rule-based auto-elimination on secondary attributes. For example, when there is a match on a name but a mismatch on a secondary attribute such as date of birth, nationality, or location, the match is eliminated before a person looks at it.
We at SQA Consulting encourage this practice, but it should be done with care. Computer-based rules have no leeway for applying common sense, so your rules should be precise and cover all eventualities.
In a previous article, we gave several reasons why it can be unwise to eliminate a match simply because the dates of birth differ. One of those reasons was a date of birth falling on the first of the month, and in this article we explain in more detail why that is the case.
SQA Consulting provides Data Profiling services to examine the data that is being used for customer screening. This analysis has produced many recurring insights into the data held by financial institutions. One of them is that dates of birth, which should be roughly evenly distributed across a year, tend to peak on day one of a month, and especially on the first of January. This bias varies from data source to data source and should be measured and managed according to your actual data. If you do see this bias, you should take account of it.
We will demonstrate this behaviour with graphs for some typical data.
As you can see, there are twice as many people recorded as born on the first of the month in January as in any other month. This is because, when only the year of birth is known, the month and day are defaulted to the first of January. Yes, there are people genuinely born on the first of January, but in this data set we cannot trust that date.
The second graph shows the day of the month on which people are born, excluding January, which is heavily biased as shown above. We naturally expect the number of people born on the 31st to be much lower than for any other day, because only some months have a 31st. The number of people born on the 1st of the month is slightly higher than for any other day: it is 14% higher than the average of the other 27 days that feature in every month (the 2nd to the 28th). This pattern arises where the year and month of birth are known but the day is not, and a default of the 1st is used.
The bias towards the 1st of January is very strong; the bias towards the 1st of the month in other months is relatively weak but still present.
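To check whether your own data shows the same pattern, the two measurements described above can be sketched in a few lines of Python. This is a minimal illustration, not the profiling method we use in practice: the function names january_first_ratio and first_of_month_excess are hypothetical, the input is assumed to be a list of already-parsed dates, and comparing the 1st of January against the average of the other eleven months is our own choice of baseline.

```python
from collections import Counter
from datetime import date


def january_first_ratio(dobs):
    """Count of 1 January divided by the average count of the 1st
    in each of the other eleven months (2.0 means twice as many)."""
    firsts = Counter(d.month for d in dobs if d.day == 1)
    other_avg = sum(firsts[m] for m in range(2, 13)) / 11
    if other_avg == 0:
        raise ValueError("no dates on the 1st of February to December")
    return firsts[1] / other_avg


def first_of_month_excess(dobs):
    """Relative excess of the 1st over the average of days 2 to 28,
    with January excluded (0.14 means 14% higher)."""
    days = Counter(d.day for d in dobs if d.month != 1)
    # Days 2 to 28 are the 27 days that feature in every month.
    baseline = sum(days[day] for day in range(2, 29)) / 27
    if baseline == 0:
        raise ValueError("no dates on days 2 to 28 outside January")
    return days[1] / baseline - 1.0
```

A result well above 1.0 from the first function, or well above 0.0 from the second, suggests that placeholder dates are present and that the 1st should not be trusted at face value in that data source.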
How this looks for your own data should direct your investigation protocol and your auto-elimination algorithms.
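As an illustration of how this might feed into an elimination rule, the sketch below compares two dates of birth only at the precision that can be trusted. It assumes that the 1st of January may mean "year only" and that the 1st of any other month may mean "year and month only". The names trusted_parts and may_eliminate_on_dob are hypothetical, and the rule covers only the placeholder-date issue discussed in this article, not the other reasons a date-of-birth mismatch can be unsafe.

```python
from datetime import date


def trusted_parts(dob):
    """How many leading parts of (year, month, day) can be trusted.

    Assumption: 1 January may be a year-only placeholder, and the 1st
    of any other month may be a year-and-month placeholder.
    """
    if dob.month == 1 and dob.day == 1:
        return 1  # trust the year only
    if dob.day == 1:
        return 2  # trust the year and month
    return 3      # trust the full date


def may_eliminate_on_dob(customer_dob, list_dob):
    """True only if the dates differ in the parts both sides can be trusted on."""
    if customer_dob is None or list_dob is None:
        return False  # a missing date cannot prove a mismatch
    n = min(trusted_parts(customer_dob), trusted_parts(list_dob))
    a = (customer_dob.year, customer_dob.month, customer_dob.day)[:n]
    b = (list_dob.year, list_dob.month, list_dob.day)[:n]
    return a != b
```

With this rule, 1 January 1980 against 15 July 1980 is kept for review because only the year is compared, whereas 1 January 1979 against 15 July 1980 can be eliminated. Whether the 1st of months other than January deserves this treatment is a judgment to make from the bias measured in your own data.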
Auto-elimination rules can deliver great efficiency improvements; the key is applying them in a safe way. Please read the rest of our growing set of articles on how to auto-eliminate safely here.
Alternatively, you can contact us at SQA Consulting to see how we can assist you in developing the skills needed to implement these strategies.