Fix or rewrite scraper engine and scrapers #259

Open
dag7dev opened this issue Dec 18, 2024 · 3 comments

Comments

@dag7dev
Contributor

dag7dev commented Dec 18, 2024

Many years have passed since the scraper engine's first release, and several bugs have accumulated.

Known demozoo scraper bugs:

  • it always selects the first secondary mirror, assuming it is hosted on scene.org. This assumption is wrong: the mirror's host should be checked explicitly
  • it misses screenshots: only the first one is ever downloaded
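
A minimal sketch of the mirror check described above, not the engine's actual code. The function names are hypothetical, and it assumes the scraper already has the mirror URLs as a list of strings. Instead of taking the first entry, it inspects each URL's hostname and returns the first one that really belongs to scene.org:

```python
from urllib.parse import urlparse


def is_scene_org(url):
    """Return True if the URL's host is scene.org or one of its subdomains."""
    host = (urlparse(url).hostname or "").lower()
    return host == "scene.org" or host.endswith(".scene.org")


def pick_scene_org_mirror(urls):
    """Return the first scene.org mirror in `urls`, or None if there is none.

    `urls` is assumed to be the list of mirror links already extracted
    by the scraper (hypothetical input shape).
    """
    for url in urls:
        if is_scene_org(url):
            return url
    return None
```

Returning None when no scene.org mirror exists lets the caller decide on a fallback rather than silently using an unrelated host.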

Known scrape engine bugs:

  • it sometimes sets "screenshots" to None: the field should be an empty list when no screenshot is detected
  • it does not handle extensions other than gb well. For example, the manifest sometimes lists "gbc" while the engine has already renamed the file to gb
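
Both engine fixes could look roughly like the sketch below. It is an illustration only: the helper names and the `ROM_EXTENSIONS` set are assumptions, not taken from the existing engine. The first helper guarantees a list for "screenshots"; the second keeps the original extension when building the stored file name, so the manifest and the file on disk cannot disagree:

```python
from pathlib import Path

# Assumption: the extensions the engine should accept; extend as needed.
ROM_EXTENSIONS = {".gb", ".gbc"}


def normalize_screenshots(value):
    """Always return a list, turning None (or any empty value) into []."""
    return list(value) if value else []


def target_filename(slug, original_name):
    """Build the stored file name, preserving the original extension.

    `slug` and `original_name` are hypothetical parameters: the entry's
    slug and the name of the downloaded file.
    """
    ext = Path(original_name).suffix.lower()
    if ext not in ROM_EXTENSIONS:
        raise ValueError(f"unsupported extension: {ext!r}")
    return slug + ext
```

The manifest entry would then be written from the same `target_filename` result that is used to rename the file, so the two stay in sync.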

Other improvements:

  • make the general logic more flexible (e.g. select the best source, support other extensions)
  • write a basic test suite
  • retest the other scrapers from scratch, since changes in the master scraper may have broken them
@dag7dev dag7dev changed the title Fix or rewrite the base scraper engine and scrapers Fix or rewrite scraper engine and scrapers Dec 18, 2024
@avivace
Member

avivace commented Dec 18, 2024

@dag7dev demozoo is also missing screenshots: it always downloads only the first one

@dag7dev
Contributor Author

dag7dev commented Dec 18, 2024

@avivace I think it is a bug in the master engine

@avivace
Member

avivace commented Dec 18, 2024

I believe the "more screenshots" link is simply never followed: only the first screenshot appearing on the main page is considered.

See: https://github.com/gbdev/database/blob/master/scrapers/py_importers/demozoo.py#L186
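
A rough sketch of how all screenshots could be gathered once the "more screenshots" page is also fetched. It makes no claim about Demozoo's real markup: it assumes, purely for illustration, that screenshots show up as plain `<img>` tags and that the HTML of both pages has already been downloaded. The class and function names are hypothetical:

```python
from html.parser import HTMLParser


class ImgSrcCollector(HTMLParser):
    """Collect the src attribute of every <img> tag, in document order."""

    def __init__(self):
        super().__init__()
        self.sources = []

    def handle_starttag(self, tag, attrs):
        if tag == "img":
            src = dict(attrs).get("src")
            if src:
                self.sources.append(src)


def image_sources(html):
    """Return all <img> src values found in an HTML string."""
    parser = ImgSrcCollector()
    parser.feed(html)
    return parser.sources


def collect_screenshots(main_html, more_html=None):
    """Merge screenshots from the main page and the optional extra page."""
    urls = image_sources(main_html)
    if more_html is not None:
        urls += image_sources(more_html)
    # dict.fromkeys drops duplicates while preserving first-seen order
    return list(dict.fromkeys(urls))
```

In a real scraper the selection would need to be narrowed to the screenshot container, since a page usually contains other images as well.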
