fix: calculation of the check digit when it is in the optional part#66
Conversation
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## main #66 +/- ##
=======================================
Coverage 88.66% 88.66%
=======================================
Files 48 48
Lines 450 450
Branches 123 123
=======================================
Hits 399 399
Misses 51 51 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
|
@targos I think we should be very careful with this change. It increases the number of accepted check digits (potential false positives). I would appreciate if we could find an official spec which somehow confirms / explains this. |
|
We should also double check the existing test cases which would fail without including the Related: #36 |
|
I have pushed more test cases. If we agree that the current impl is wrong, I would apply the following to this PR: Subject: [PATCH] fix: add support for parsing the document number check digit in more cases Some recent Belgian and Portuguese ID cards do not include the I have not found this behavior documented in any official specification so far, but it can be confirmed using documents available online (for example, PRADO). I have also encountered real IDs with this characteristic. diff --git a/src/parse/__tests__/td1.test.ts b/src/parse/__tests__/td1.test.ts
--- a/src/parse/__tests__/td1.test.ts (revision d7d8376abfecde7c4b172a0d13f0b571b6dae9eb)
+++ b/src/parse/__tests__/td1.test.ts (date 1767080209393)
@@ -291,8 +291,8 @@
it('parse document number', () => {
const MRZ = [
- 'I<UTOD23145890<1240<XYZ<<<<<<<',
- '7408122F1204159UTO<<<<<<<<<<<8',
+ 'I<UTOD23145890<1244<<<<<<<<<<<',
+ '7408122F1204159UTO<<<<<<<<<<<2',
'ERIKSSON<<ANNA<MARIA<<<<<<<<<<',
];
@@ -318,7 +318,7 @@
ranges: [
{ line: 0, start: 5, end: 14, raw: 'D23145890' },
{ line: 0, start: 14, end: 15, raw: '<' },
- { line: 0, start: 15, end: 30, raw: '1240<XYZ<<<<<<<' },
+ { line: 0, start: 15, end: 30, raw: '1244<<<<<<<<<<<' },
],
line: 0,
start: 5,
@@ -326,7 +326,7 @@
autocorrect: [],
});
expect(result.fields.documentNumber).toBe('D23145890124');
- expect(result.fields.documentNumberCheckDigit).toBe('0');
+ expect(result.fields.documentNumberCheckDigit).toBe('4');
const documentNumberCheckDigitDetails = result.details.find(
(d) => d.field === 'documentNumberCheckDigit',
@@ -336,7 +336,7 @@
line: 0,
start: 18,
end: 19,
- value: '0',
+ value: '4',
});
});
Index: src/parsers/parseDocumentNumberCheckDigit.ts
IDEA additional info:
Subsystem: com.intellij.openapi.diff.impl.patch.CharsetEP
<+>UTF-8
===================================================================
diff --git a/src/parsers/parseDocumentNumberCheckDigit.ts b/src/parsers/parseDocumentNumberCheckDigit.ts
--- a/src/parsers/parseDocumentNumberCheckDigit.ts (revision d7d8376abfecde7c4b172a0d13f0b571b6dae9eb)
+++ b/src/parsers/parseDocumentNumberCheckDigit.ts (date 1767080190539)
@@ -8,23 +8,14 @@
if (checkDigit === '<' && optional) {
const firstFiller = optional.indexOf('<');
const tail = optional.slice(0, firstFiller - 1);
+ source = `${source}${tail}`;
checkDigit = optional.charAt(firstFiller - 1);
- try {
- check(`${source}<${tail}`, checkDigit);
- return {
- value: checkDigit,
- start: firstFiller,
- end: firstFiller + 1,
- };
- } catch (error) {
- check(`${source}${tail}`, checkDigit);
- return {
- error: error.message,
- value: checkDigit,
- start: firstFiller,
- end: firstFiller + 1,
- };
- }
+ check(source, checkDigit);
+ return {
+ value: checkDigit,
+ start: firstFiller,
+ end: firstFiller + 1,
+ };
} else {
check(source, checkDigit);
return checkDigit; |
There was a problem hiding this comment.
I agree that the implementation which tries 2 different check digit implementations and marks it as valid if any of those pass is not ok.
The already existing test which had a new check digit introduced here and changed the parsing implementation: 5166825
Given it changed the existing test but added a dummy additional optional part + new check digit, I assume this is not a PRADO example anymore.
Therefore I think your implementation fixes the calculation of the check digit when it is in the optional part. We can keep only that one and mark it as a fix rather than a breaking change.
e87b31c to
367e39e
Compare
Based on examples from PRADO and ICAO examples (https://www.icao.int/sites/default/files/publications/DocSeries/9303_p11_cons_en.pdf), the `<` separator before the optional part should not be included in checksum calculation. This also resolved cheminfo#36.
367e39e to
e2f7e33
Compare
|
@stropitek Thanks for you feedback. I have applied it and also updated the initial comment and PR title accordingly. I have synced the test case and the README sample. |
|
For reference, this was probably the reference document we used at the time as there are similar examples. It includes examples where the document number check digit is valid only with your fix, so it all SGTM. https://www.icao.int/sites/default/files/publications/DocSeries/9303_p11_cons_en.pdf Same document as attachment: |
stropitek
left a comment
There was a problem hiding this comment.
Thank you for the contribution!
Based on examples from PRADO and ICAO examples (https://www.icao.int/sites/default/files/publications/DocSeries/9303_p11_cons_en.pdf), the
<separator before the optional part should not be included in checksum calculation.This also resolved #36.