Currently, the books recommender looks at the Libgen database dump, finds matches, and saves the top k matches to the database. If a user thumbs a book, it gets perma-saved to the database; otherwise, the next recommender run wipes the previous matches and saves the new ones.
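A minimal in-memory sketch of the save/wipe cycle described above, with a plain dict standing in for the books table. `Rec` and `refresh` are hypothetical names, not the project's code. Whether thumbed books count toward k is not stated in the issue, so this sketch assumes they don't. Because rows are keyed by the Libgen-specific `book_id`, two Libgen rows for the same book both get saved:

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Rec:
    book_id: str          # Libgen-specific id (the current primary key)
    title: str
    score: float
    thumbed: bool = False


def refresh(saved: Dict[str, Rec], matches: List[Rec], k: int) -> Dict[str, Rec]:
    """Keep thumbed rows, wipe the rest, then save the new top-k matches."""
    kept = {bid: r for bid, r in saved.items() if r.thumbed}
    top_k = sorted(matches, key=lambda r: r.score, reverse=True)[:k]
    for r in top_k:
        kept.setdefault(r.book_id, r)  # never overwrite a thumbed row
    return kept


saved = {"1": Rec("1", "Dune", 0.9, thumbed=True), "5": Rec("5", "Old", 0.1)}
matches = [Rec("2", "Dune", 0.95), Rec("3", "Emma", 0.5), Rec("4", "Ulysses", 0.1)]
out = refresh(saved, matches, k=2)
print(sorted(out))  # "Dune" is stored twice, under ids "1" and "2"
```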
Libgen can have many duplicate entries for a single book, because of different formats, editions, etc. I'm using `book_id` as the primary key, which is Libgen-specific rather than book-specific. Instead, we should use the ISBN or another book-level identifier as the primary key to prevent duplicates from being saved on recommendation.
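One possible book-level key, sketched under assumptions: the dump's ISBN field is taken to be a comma- or semicolon-separated string (check the actual column name and format in the dump). Every ISBN is normalized to ISBN-13 so the ISBN-10 and ISBN-13 forms of the same number collide, and when no valid ISBN exists the key falls back to normalized title + author, which covers the null-ISBN case. `to_isbn13` and `book_key` are hypothetical helpers:

```python
import re
from typing import Optional


def _isbn13_check_digit(first12: str) -> str:
    # ISBN-13 check digit: weights alternate 1, 3 across the first 12 digits.
    total = sum(int(d) * (1 if i % 2 == 0 else 3) for i, d in enumerate(first12))
    return str((10 - total % 10) % 10)


def to_isbn13(raw: str) -> Optional[str]:
    """Normalize an ISBN-10 or ISBN-13 string to ISBN-13, or None if invalid."""
    s = re.sub(r"[^0-9Xx]", "", raw).upper()  # drop hyphens, spaces, etc.
    if len(s) == 10 and s[:9].isdigit() and (s[9].isdigit() or s[9] == "X"):
        # ISBN-10 check: weighted sum (weights 10..1, X = 10) divisible by 11.
        total = sum((10 - i) * (10 if c == "X" else int(c)) for i, c in enumerate(s))
        if total % 11 != 0:
            return None
        core = "978" + s[:9]
        return core + _isbn13_check_digit(core)
    if len(s) == 13 and s.isdigit():
        return s if _isbn13_check_digit(s[:12]) == s[12] else None
    return None


def _norm(text: Optional[str]) -> str:
    # Lowercase and collapse runs of non-alphanumerics into single spaces.
    return re.sub(r"[^a-z0-9]+", " ", (text or "").lower()).strip()


def book_key(isbn_field: Optional[str], title: Optional[str], author: Optional[str]) -> str:
    """Hypothetical primary key: smallest valid ISBN-13, else title|author."""
    isbns = set()
    for candidate in re.split(r"[,;]", isbn_field or ""):
        isbn = to_isbn13(candidate)
        if isbn:
            isbns.add(isbn)
    if isbns:
        return "isbn:" + min(isbns)  # min() makes the choice deterministic
    return f"ta:{_norm(title)}|{_norm(author)}"
```

Different editions and formats usually carry different ISBNs, so this key alone won't merge them; that gap is what the fuzzy title/author check in the task list is for.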
Find a good unique ID for Libgen books. Likely ISBN, but those might be null, or maybe there's a better ID.
Redo the current books table to use that ID as the primary_key.
Write a migration that clears out duplicates among the currently saved books, keeping the entry with the most interactions (thumbs, etc.).
Also consider a simple Levenshtein distance check on book_title + book_author in case the ISBNs differ but the books are duplicates. Sometimes the same book is uploaded twice with a single-character change in the title.
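The migration and fuzzy check from the list above can be sketched together in memory, assuming each saved row already has a precomputed book-level `key` and an `interactions` count (both hypothetical fields, as are `SavedBook` and `ids_to_delete`). Rows are visited from most to fewest interactions, so the survivor of each duplicate group is the most-interacted entry; a row is dropped if it shares a key with a kept row or its title|author string is within `max_dist` edits of one. A real migration would run this against the database and delete the returned ids:

```python
from dataclasses import dataclass
from typing import List


def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance (insert/delete/substitute)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]


@dataclass
class SavedBook:
    book_id: str       # current Libgen-specific id
    key: str           # hypothetical precomputed book-level key (ISBN etc.)
    title: str
    author: str
    interactions: int  # thumbs, etc.


def _label(r: SavedBook) -> str:
    return f"{r.title.strip().lower()}|{r.author.strip().lower()}"


def _same_book(a: SavedBook, b: SavedBook, max_dist: int) -> bool:
    return a.key == b.key or levenshtein(_label(a), _label(b)) <= max_dist


def ids_to_delete(rows: List[SavedBook], max_dist: int = 1) -> List[str]:
    """Return book_ids of duplicates, keeping the most-interacted row of each group."""
    ordered = sorted(rows, key=lambda r: r.interactions, reverse=True)
    kept: List[SavedBook] = []
    delete: List[str] = []
    for r in ordered:
        if any(_same_book(r, k, max_dist) for k in kept):
            delete.append(r.book_id)
        else:
            kept.append(r)
    return delete


rows = [
    SavedBook("a", "isbn:1", "Dune", "Frank Herbert", 0),
    SavedBook("b", "isbn:1", "Dune", "Frank Herbert", 3),
    SavedBook("c", "isbn:2", "Dune.", "Frank Herbert", 1),  # one-char title change
    SavedBook("d", "isbn:3", "Emma", "Jane Austen", 0),
]
print(ids_to_delete(rows))
```

Including the author in the compared string lowers the risk that two short, genuinely different titles fall within one edit of each other; the threshold is a judgment call to tune on real data.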