Land a new pipeline for calculating fingerprints.
This now snapshots the IDFMap and updates it less often. It doesn't slow down loading, but it does improve the time to save a card from ~11 seconds to ~5 seconds.

Before, the IDFMap was updated every time any card changed. This meant that when a card was saved, the IDFMap was updated (a possibly expensive operation), which then invalidated all other fingerprints. But for large webs, the IDFMap rarely changes much. Now we only update the IDFMap if the set of changed cards is more than 10% of the total number of cards. This means that after saving a single card, we reuse the pre-existing IDFMap, which makes the common case of saving a card significantly snappier.

Part of #694.

Merge branch 'fingerprint-performance'

* fingerprint-performance:
  Bring the load performance in line with what it was previously.
  Fix a performance issue when saving a card.
  Change the order of cardTFIDF and fingerprintForTFIDF to allow memoizeFirstArg.
  Wrap fingerprintForCardObj in memoizeFirstArg.
  Change it so cardObj is the first argument for fingerprintForCardObj.
  Pop out fingerprintForCardObj.
  Factor out fingerprintForTFIDF and cardTFIDF to not be on fingerprint generator.
  Get rid of an unnecessary generator argument to Fingerprint constructor.
  Put idfMap calculation behind a memoization layer.
  Refactor it so the idfmap is calculated outside fingerprint generator.
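The snapshot-plus-memoization idea described above can be sketched roughly as follows. This is a minimal TypeScript illustration, not the repository's code: the `Card` shape, the tokenizer, the `IDFSnapshot` class, and the bodies of `memoizeFirstArg`, `cardTFIDF`, and `fingerprintForTFIDF` are all hypothetical. Only the function names, the card-first argument order, and the 10% threshold come from the commit message. The IDF formula used here is the common `ln(N / df)` variant, which is also an assumption.

```typescript
// Hypothetical card shape; real card objects likely carry more fields.
interface Card {
  id: string;
  body: string;
}

type IDFMap = Map<string, number>;

// Simplified tokenizer (an assumption): lowercase alphanumeric runs.
function tokenize(text: string): string[] {
  return text.toLowerCase().match(/[a-z0-9]+/g) ?? [];
}

// idf(word) = ln(N / number of cards containing word).
function computeIDFMap(cards: Card[]): IDFMap {
  const docCounts = new Map<string, number>();
  for (const card of cards) {
    for (const word of new Set(tokenize(card.body))) {
      docCounts.set(word, (docCounts.get(word) ?? 0) + 1);
    }
  }
  const idf: IDFMap = new Map();
  for (const [word, count] of docCounts) {
    idf.set(word, Math.log(cards.length / count));
  }
  return idf;
}

// Hypothetical snapshot holder: rebuilds the IDF map only when more than
// `threshold` (10%) of the cards have changed since the last build.
class IDFSnapshot {
  private idfMap: IDFMap;
  private changedIDs = new Set<string>();
  constructor(cards: Card[], private readonly threshold = 0.1) {
    this.idfMap = computeIDFMap(cards);
  }
  markChanged(cardID: string): void {
    this.changedIDs.add(cardID);
  }
  // Returns the same map object until the threshold is crossed, so caches
  // keyed on the map's identity stay valid.
  get(cards: Card[]): IDFMap {
    if (this.changedIDs.size > cards.length * this.threshold) {
      this.idfMap = computeIDFMap(cards);
      this.changedIDs.clear();
    }
    return this.idfMap;
  }
}

// Hypothetical memoizeFirstArg: caches by identity of the first argument
// (WeakMap, so dropped cards can be garbage-collected), then of the second.
function memoizeFirstArg<A extends object, B extends object, R>(
  fn: (a: A, b: B) => R,
): (a: A, b: B) => R {
  const cache = new WeakMap<A, WeakMap<B, R>>();
  return (a: A, b: B): R => {
    let inner = cache.get(a);
    if (inner === undefined) {
      inner = new WeakMap<B, R>();
      cache.set(a, inner);
    }
    if (inner.has(b)) return inner.get(b) as R;
    const result = fn(a, b);
    inner.set(b, result);
    return result;
  };
}

// TF-IDF per word: raw term count times idf. Words missing from a stale
// snapshot score 0 until the next rebuild (an assumption of this sketch).
function cardTFIDF(card: Card, idfMap: IDFMap): Map<string, number> {
  const counts = new Map<string, number>();
  for (const word of tokenize(card.body)) {
    counts.set(word, (counts.get(word) ?? 0) + 1);
  }
  const scores = new Map<string, number>();
  for (const [word, count] of counts) {
    scores.set(word, count * (idfMap.get(word) ?? 0));
  }
  return scores;
}

// Fingerprint: the `size` highest-scoring words, ties broken alphabetically.
function fingerprintForTFIDF(tfidf: Map<string, number>, size = 5): string[] {
  return [...tfidf.entries()]
    .sort((a, b) => b[1] - a[1] || a[0].localeCompare(b[0]))
    .slice(0, size)
    .map(([word]) => word);
}

// The card comes first so memoizeFirstArg can key the cache on it.
const fingerprintForCardObj = memoizeFirstArg(
  (card: Card, idfMap: IDFMap): string[] => fingerprintForTFIDF(cardTFIDF(card, idfMap)),
);
```

Because `fingerprintForCardObj` is cached on the identity of both the card object and the IDF map, handing back the same map after a single save means every unchanged card's fingerprint is a cache hit. Assuming a saved card is replaced by a new object, only that card is recomputed. For example, with 20 cards, marking one or two cards as changed keeps the old map (2 is not greater than 20 × 0.1), while a third change triggers a rebuild.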
Showing 2 changed files with 120 additions and 101 deletions.