Crawler may be too polite #247

Open
laundmo opened this issue Jan 1, 2025 · 0 comments

When trying Stract with one of my common searches, specifically one restricted to docs.rs ("bevy Commands site:docs.rs"), I noticed there were no results at all. Even searching for the whole crate ("bevy site:docs.rs") returned no relevant results.

Reading the crawler documentation, especially the section about politeness, it's immediately obvious why: 1 request every 5 seconds is simply not fast enough.

Some very rough math:
bevy has ~3200 items in its docs (counted on the "all items" docs.rs page)
3200 * 5 seconds = 16000 seconds ≈ 4.4h per full scan
24h / 4.4h ≈ 5.4 scans of bevy-equivalent docs per day

docs.rs receives at least around 800 releases per day, with one recent day having 1800 releases. It's very likely that a few of these will be of similar size to bevy, or at least that a few of them added together will reach that level.

That means, assuming my math isn't horribly off in some way, at 5 seconds per request the crawler can never catch up.
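
To make the rough math above easier to check, here is a minimal Python sketch of the same arithmetic. It assumes a single domain crawled at a fixed delay with no re-visits of old pages; the function names are hypothetical and are not taken from Stract's code.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def pages_per_day(delay_seconds: float) -> float:
    """Pages one domain can yield per day at a fixed delay between requests."""
    return SECONDS_PER_DAY / delay_seconds


def full_crawls_per_day(pages: int, delay_seconds: float) -> float:
    """How many times per day a docs set of `pages` pages can be fully crawled."""
    return pages_per_day(delay_seconds) / pages


def page_budget_per_release(releases_per_day: int, delay_seconds: float) -> float:
    """Average number of pages the crawler can afford to fetch per new release."""
    return pages_per_day(delay_seconds) / releases_per_day


if __name__ == "__main__":
    delay = 5.0  # seconds between requests, from the politeness docs
    bevy_pages = 3200  # rough count from the "all items" page

    print(f"pages per day: {pages_per_day(delay):.0f}")
    print(f"bevy-sized crawls per day: {full_crawls_per_day(bevy_pages, delay):.1f}")
    for releases in (800, 1800):  # low and high daily release counts
        budget = page_budget_per_release(releases, delay)
        print(f"{releases} releases/day -> {budget:.1f} pages per release")
```

Under these assumptions the crawler can fetch 17280 pages per day from docs.rs, which averages out to about 21.6 pages per release at 800 releases per day and about 9.6 at 1800, far below a bevy-sized crate.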

I'm sure there are other domains like this, ones that host a lot of new pages per day.

It may be worth considering whether the crawler is too polite.
