Once all data is retrieved, the script compares the records from the two databases:
- Title: computes a simple ratio, a partial ratio, a token sort ratio and a token set ratio, all based on the Levenshtein distance
- Publication dates: checks whether one of the origin database dates equals one of the target database dates (no check is performed if no dates are found)
- Publisher: compares every publisher of the origin database record with every publisher of the target database record using a simple ratio, then keeps the pair with the highest ratio
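The comparison step above can be sketched in Python with the standard library only. This is a minimal sketch, not the script's code. Every function name is hypothetical. The four title ratios follow the definitions popularized by the fuzzywuzzy/thefuzz `fuzz` module. The normalization of the simple ratio (1 minus the Levenshtein distance divided by the length of the longer string) is an assumption, so scores may differ slightly from the script's. Returning None when a record has no date is also an assumed way of modelling "no check".

```python
from itertools import product


def levenshtein(a: str, b: str) -> int:
    """Edit distance where insertions, deletions and substitutions each cost 1."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,               # deletion
                current[j - 1] + 1,            # insertion
                previous[j - 1] + (ca != cb),  # substitution (free if equal)
            ))
        previous = current
    return previous[-1]


def simple_ratio(a: str, b: str) -> int:
    """Similarity from 0 to 100; ASSUMED normalization: 1 - distance / longest length."""
    if not a and not b:
        return 100
    return round(100 * (1 - levenshtein(a, b) / max(len(a), len(b))))


def partial_ratio(a: str, b: str) -> int:
    """Best simple ratio of the shorter string against each same-length slice of the longer one."""
    short, long_ = sorted((a, b), key=len)
    if not short:
        return 100 if not long_ else 0
    n = len(short)
    return max(simple_ratio(short, long_[i:i + n]) for i in range(len(long_) - n + 1))


def token_sort_ratio(a: str, b: str) -> int:
    """Simple ratio after sorting the words of each title alphabetically."""
    return simple_ratio(" ".join(sorted(a.split())), " ".join(sorted(b.split())))


def token_set_ratio(a: str, b: str) -> int:
    """Compare the shared words alone and combined with each title's remaining words."""
    ta, tb = set(a.split()), set(b.split())
    common = " ".join(sorted(ta & tb))
    with_a = " ".join([common, *sorted(ta - tb)]).strip()
    with_b = " ".join([common, *sorted(tb - ta)]).strip()
    return max(simple_ratio(common, with_a), simple_ratio(common, with_b),
               simple_ratio(with_a, with_b))


def title_ratios(origin: str, target: str) -> dict:
    """The 4 title ratios, computed on lower-cased, whitespace-normalized titles."""
    a = " ".join(origin.lower().split())
    b = " ".join(target.lower().split())
    return {
        "simple": simple_ratio(a, b),
        "partial": partial_ratio(a, b),
        "token_sort": token_sort_ratio(a, b),
        "token_set": token_set_ratio(a, b),
    }


def dates_match(origin_dates, target_dates):
    """True/False if any origin date equals any target date; None (not checked) if a side has none."""
    if not origin_dates or not target_dates:
        return None
    return bool(set(origin_dates) & set(target_dates))


def best_publisher_pair(origin_pubs, target_pubs):
    """Compare every origin publisher with every target publisher; return (ratio, origin, target)."""
    scored = [(simple_ratio(o.lower(), t.lower()), o, t)
              for o, t in product(origin_pubs, target_pubs)]
    return max(scored, default=(0, None, None))


print(title_ratios("Hello World", "world hello"))
print(best_publisher_pair(["Gallimard", "Folio"], ["Editions Gallimard", "gallimard"]))
```

Word order changes lower the simple ratio but not the token sort or token set ratios, which is why several ratios are combined for titles.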
The script then validates (or not) each comparison criterion of the chosen analysis and outputs a validation grade for this comparison.
The matching results analysis is based on 3 criteria:
- At least X of the 4 title ratios meet the required floor (X being the minimum number of matching ratios set for the analysis)
- The publisher ratio meets the required floor
- The publication dates match (only if the analysis uses the publication date)
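The three criteria can be evaluated as in the following sketch. It reads its settings from a dict using the analysis.json key names and applies the rule that a floor of 0 disables a criterion. The function name is hypothetical. Two further choices are assumptions: a 0 in either title setting disables the title criterion, and an ignored criterion is represented by None.

```python
def check_criteria(config: dict, title_ratios: list, publisher_ratio: int, dates_matched):
    """Evaluate the 3 criteria; each value is True, False, or None (criterion not checked)."""
    checks = {}

    # Title: at least NB_TITLE_OK of the 4 ratios must reach TITLE_MIN_SCORE.
    # ASSUMPTION: a 0 in either setting disables the criterion.
    if config["TITLE_MIN_SCORE"] > 0 and config["NB_TITLE_OK"] > 0:
        passing = sum(1 for ratio in title_ratios if ratio >= config["TITLE_MIN_SCORE"])
        checks["title"] = passing >= config["NB_TITLE_OK"]
    else:
        checks["title"] = None

    # Publisher: the best publisher ratio must reach PUBLISHER_MIN_SCORE (0 disables it).
    if config["PUBLISHER_MIN_SCORE"] > 0:
        checks["publisher"] = publisher_ratio >= config["PUBLISHER_MIN_SCORE"]
    else:
        checks["publisher"] = None

    # Dates: only checked when USE_DATE is true. dates_matched may itself be None
    # when no dates were found (ASSUMED to mean "not checked").
    checks["dates"] = dates_matched if config["USE_DATE"] else None
    return checks


config = {"TITLE_MIN_SCORE": 80, "NB_TITLE_OK": 3, "PUBLISHER_MIN_SCORE": 80, "USE_DATE": True}
print(check_criteria(config, [85, 78, 92, 100], 81, True))
# {'title': True, 'publisher': True, 'dates': True}
```

Here 3 of the 4 title ratios (85, 92 and 100) reach the floor of 80, which satisfies a minimum of 3.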
The analysis outputs 5 values:
- Global validation result: the final result of the analysis, which can be:
  - All checks were successful
  - Checks partially successful: only some checks were successful
  - All checks failed
  - No checks: the chosen analysis does not check any criterion
  - If nothing is displayed, the analysis did not run
- Number of successful checks: the number of checks that passed
- Title check: True if the number of title ratios meeting the floor is greater than or equal to the required minimum, False otherwise
- Publishers check: True if the publisher ratio is greater than or equal to the required floor, False otherwise
- Dates check: True if one of the origin database dates matches one of the target database dates, False otherwise
Note: details (notably the ratios) are available in the last columns of the CSV export.
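The global validation result and the number of successful checks could be derived from the individual checks as follows. This is a hypothetical helper that assumes an ignored criterion is represented by None. The result labels are the ones listed above.

```python
def summarize(checks: dict):
    """Return (global validation result, number of successful checks)."""
    performed = [ok for ok in checks.values() if ok is not None]  # ignored criteria are None
    successes = sum(performed)  # True counts as 1, False as 0
    if not performed:
        result = "No checks"
    elif successes == len(performed):
        result = "All checks were successful"
    elif successes == 0:
        result = "All checks failed"
    else:
        result = "Checks partially successful"
    return result, successes


print(summarize({"title": True, "publisher": False, "dates": None}))
# ('Checks partially successful', 1)
```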
If a floor is set to 0, the corresponding criterion is ignored.
The analyses are configured as follows:

| Floor ratio for title match | Minimum number of matching ratios | Floor ratio for publisher match | Use publication date |
|---|---|---|---|
| 0 | 0 | 0 | No |
| 80 | 3 | 80 | Yes |
| 90 | 4 | 0 | No |
| 95 | 4 | 95 | No |
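To see which criteria each analysis above actually checks under the floor-0 rule, the settings can be encoded with the analysis.json key names. Two parts of this encoding are assumptions. The settings are mapped, in order, to TITLE_MIN_SCORE, NB_TITLE_OK, PUBLISHER_MIN_SCORE and USE_DATE. A 0 in either title setting is treated as disabling the title criterion. The helper name is hypothetical.

```python
# The four analyses above, ASSUMED to map to these analysis.json keys.
PRESETS = [
    {"TITLE_MIN_SCORE": 0, "NB_TITLE_OK": 0, "PUBLISHER_MIN_SCORE": 0, "USE_DATE": False},
    {"TITLE_MIN_SCORE": 80, "NB_TITLE_OK": 3, "PUBLISHER_MIN_SCORE": 80, "USE_DATE": True},
    {"TITLE_MIN_SCORE": 90, "NB_TITLE_OK": 4, "PUBLISHER_MIN_SCORE": 0, "USE_DATE": False},
    {"TITLE_MIN_SCORE": 95, "NB_TITLE_OK": 4, "PUBLISHER_MIN_SCORE": 95, "USE_DATE": False},
]


def active_criteria(config: dict) -> list:
    """List the criteria that are not disabled by a 0 floor (or by USE_DATE being false)."""
    active = []
    if config["TITLE_MIN_SCORE"] > 0 and config["NB_TITLE_OK"] > 0:
        active.append("title")
    if config["PUBLISHER_MIN_SCORE"] > 0:
        active.append("publisher")
    if config["USE_DATE"]:
        active.append("dates")
    return active


for i, preset in enumerate(PRESETS, start=1):
    print(i, active_criteria(preset) or "no checks")
# 1 no checks
# 2 ['title', 'publisher', 'dates']
# 3 ['title']
# 4 ['title', 'publisher']
```

Under these assumptions, the first analysis checks nothing and would therefore report "No checks".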
To add a new analysis, add a new object to analysis.json with the following keys:
- name (str): analysis name displayed in the interface
- TITLE_MIN_SCORE (int): floor ratio for title matching to be OK
- NB_TITLE_OK (int): minimum number of title ratios meeting the floor for the entire title criterion to be OK
- PUBLISHER_MIN_SCORE (int): floor ratio for publisher matching to be OK
- USE_DATE (bool): whether to use the publication date
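A new analysis object can be checked before use with a sketch like the one below. It assumes analysis.json holds a JSON array of analysis objects, since the file's top-level layout is not described here. The helper name and the example analysis are hypothetical.

```python
import json

# Expected type of each key of an analysis object.
EXPECTED_TYPES = {
    "name": str,
    "TITLE_MIN_SCORE": int,
    "NB_TITLE_OK": int,
    "PUBLISHER_MIN_SCORE": int,
    "USE_DATE": bool,
}


def load_analyses(text: str) -> list:
    """Parse analysis.json content (ASSUMED to be a JSON array) and check every object's keys."""
    analyses = json.loads(text)
    for analysis in analyses:
        label = analysis.get("name", "?")
        for key, expected in EXPECTED_TYPES.items():
            if key not in analysis:
                raise ValueError(f"{label!r}: missing key {key}")
            value = analysis[key]
            # bool is a subclass of int in Python, so reject true/false where an int is expected
            if not isinstance(value, expected) or (expected is int and isinstance(value, bool)):
                raise TypeError(f"{label!r}: {key} should be {expected.__name__}")
    return analyses


# Hypothetical new analysis: title-only matching, publisher and date criteria ignored.
EXAMPLE = """
[
  {"name": "Title only (example)", "TITLE_MIN_SCORE": 85, "NB_TITLE_OK": 2,
   "PUBLISHER_MIN_SCORE": 0, "USE_DATE": false}
]
"""

print(load_analyses(EXAMPLE)[0]["name"])  # Title only (example)
```

Setting PUBLISHER_MIN_SCORE to 0 and USE_DATE to false in this example relies on the floor-0 rule to restrict the analysis to the title criterion.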