Add support for Singapore TIN Number #203
@arthurdejong Ready for review.
Thanks for the PR. Sad about not being able to get the documentation on the check digit algorithm. However, since such a large dataset of valid numbers is published, it is actually not that hard to reverse-engineer the algorithm. First I looked at the distribution of the check digits across the numbers and found that they were not evenly distributed. However, when filtering the numbers by type, I found:
This seems to suggest a mod 11 algorithm where the check digit alphabet differs by type. Assuming a simple weighted algorithm, we can try to guess the weights. Looking only at the business numbers for now, we can generate groups of numbers that differ only in the last digit (the one before the check digit) and see how that digit changes the check digit:

    import random
    from collections import defaultdict

    # numbers: the published list of valid business numbers
    sames = defaultdict(list)
    for number in numbers:
        # group numbers that share the first 7 digits
        sames[number[:7] + 'x'].append(number)
    # keep only the groups in which all 10 values of the last digit occur
    complete = [number for number, values in sames.items() if len(values) == 10]
    for i in range(5):
        number = random.choice(complete)
        print('%s %s' % (number, ''.join(x[-1] for x in sames[number])))

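This shows the check digit alphabet (`alphabet` in the code below). Repeating the grouping for the seventh digit, and printing each check digit as its position in that alphabet:

    sames = defaultdict(list)
    for number in numbers:
        # group numbers that differ only in the seventh digit
        sames[number[:6] + 'x' + number[7:8]].append(number)
    complete = [number for number, values in sames.items() if len(values) == 10]
    for i in range(5):
        number = random.choice(complete)
        # show each check digit as its index in the alphabet
        print('%s %s' % (number, ', '.join(str(alphabet.index(x[-1])) for x in sames[number])))
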
This shows that every time the x goes up by one the check digit goes down by 4, which implies the weight should be 7 (-4 mod 11). Doing this for every digit (the first digit requires a bit of tweaking because only values from 0 to 5 are found) and shifting the alphabet a bit to get the correct offset, we get:

    def calc_business_check_digit(number):
        number = compact(number)
        weights = (10, 4, 9, 3, 8, 2, 7, 1)
        return 'XMKECAWLJDB'[sum(int(n) * w for n, w in zip(number, weights)) % 11]

Unleashing this function on the data set, I found only 11 numbers where the check digit does not match:
I have not tried the online validator for these numbers and I haven't looked at the other number types yet, but I expect the analysis should be pretty simple to repeat with the approach above (perhaps with some tweaks for the numbers that have letters in them).
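
The step from "the check digit index goes down by 4" to "the weight is 7" can be checked with a small sketch. It is not the analysis that was run on the real data: it assumes the final alphabet and weights quoted above, drops the `compact()` call, and uses a made-up 8-digit body. `infer_weight` is a hypothetical helper name.

```python
ALPHABET = 'XMKECAWLJDB'  # alphabet from the function quoted above
WEIGHTS = (10, 4, 9, 3, 8, 2, 7, 1)  # weights from the function quoted above


def calc_business_check_digit(number):
    """Check digit for an 8-digit body (compact() left out in this sketch)."""
    return ALPHABET[sum(int(n) * w for n, w in zip(number, WEIGHTS)) % 11]


def infer_weight(base, position):
    """Hypothetical helper: recover the weight of one position.

    Sets the digit at `position` to 0 and then to 1, and looks at how far
    the check digit moves within the alphabet.
    """
    def index_for(digit):
        candidate = base[:position] + str(digit) + base[position + 1:]
        return ALPHABET.index(calc_business_check_digit(candidate))

    # Raising a digit by one adds exactly one weight to the sum (mod 11).
    return (index_for(1) - index_for(0)) % 11


# '12345678' is a made-up body, not a real number.
recovered = tuple(infer_weight('12345678', i) for i in range(8))
print(recovered)

# A step of -4 in the alphabet index corresponds to a weight of 7.
print((-4) % 11)
```

On real data the alphabet order is not known up front, which is why the original analysis needed the extra alphabet shift to get the offset right.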
Wow!!!
@arthurdejong I have checked all the numbers that do not match and they all seem to have been either terminated or cancelled in 2017. Maybe we should go with this algorithm?
Yep, verified with another website and all those are deregistered.
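
For anyone wanting to repeat the mismatch check, a minimal sketch could look like the following. It assumes the business-number function quoted earlier (without `compact()`); `find_mismatches` is a hypothetical helper name, and the sample numbers are made up rather than taken from the published data set.

```python
ALPHABET = 'XMKECAWLJDB'
WEIGHTS = (10, 4, 9, 3, 8, 2, 7, 1)


def calc_business_check_digit(number):
    """Check digit for an 8-digit body (compact() left out in this sketch)."""
    return ALPHABET[sum(int(n) * w for n, w in zip(number, WEIGHTS)) % 11]


def find_mismatches(numbers):
    """Hypothetical helper: numbers whose last character is not the computed check digit."""
    return [n for n in numbers if calc_business_check_digit(n[:8]) != n[-1]]


# Made-up sample: two numbers get their check digit from the function itself,
# one is given a deliberately wrong check digit.
good = [body + calc_business_check_digit(body) for body in ('12345678', '00000001')]
bad = '12345678X'
mismatches = find_mismatches(good + [bad])
print(mismatches)
```

Any number reported this way would still need a manual lookup, as was done above for the deregistered entries.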
I managed to reverse-engineer the local company and other checksums as well (the last one was a much bigger puzzle because of the letters). That only leaves "Foreign Company" numbers (the ones starting with …). Do you have some examples of valid numbers for these? The one in the tests doesn't pass the [online validator](https://www.iras.gov.sg/irashome/GST/GST-registered-businesses/Other-services/Checking-if-a-Business-is-GST-Registered/) and there do not seem to be many references to this flavour. Also note that the "other" flavour has a code (FC) for foreign companies, so perhaps it has been replaced? Do you have some more background and/or examples of valid "Foreign Company" numbers? Thanks.
Sadly, I have found no foreign company UEN numbers. If I recall correctly, the examples used in testing for foreign companies were made up based on the documentation I referenced in the ticket, while all the examples for the other types of UEN numbers are real.
Are you OK if I merge it without the foreign company UEN numbers? If it is used and some valid numbers are not validated correctly, someone will likely complain, while no one will likely complain if an invalid number is considered valid.
@arthurdejong I am 100% OK with that. I would suggest keeping that particular code, but commented out.
Fixes #111.