Crawler may be too polite #247

laundmo · 2025-01-01T12:34:43Z

When trying stract with one of my common searches, a docs.rs lookup ("bevy Commands site:docs.rs"), I noticed there were no results at all. Even searching for the whole crate ("bevy site:docs.rs") led to no relevant results.

Reading the crawler documentation, especially the section about politeness, it's immediately obvious why: 1 request every 5 seconds is simply not fast enough.

Some very rough math:
bevy has ~3200 items in its docs (counted on the "all items" docs.rs page)
3200 * 5 seconds = 16000 s ≈ 4.4 h per full scan
24 h / 4.4 h ≈ 5.4 scans of bevy-equivalent docs per day

docs.rs receives at least around 800 releases per day, with one recent day having 1800 releases. It's very likely that a few of these will be of similar size to bevy, or at least that a few added together will reach that level.

That means, assuming my math isn't horribly off in some way, at 5 seconds per request the crawler can never catch up.
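For anyone who wants to check the numbers, here is a rough sketch of the same arithmetic in Python. The function names are made up for illustration and are not part of stract; the only inputs are the figures quoted above (5 seconds per request, ~3200 items for bevy, ~800 releases per day), and it assumes a single crawler hitting the domain sequentially with no re-crawls.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def pages_per_day(delay_s: float) -> float:
    """Upper bound on pages fetched from one domain per day at a fixed delay."""
    return SECONDS_PER_DAY / delay_s


def scan_hours(pages: int, delay_s: float) -> float:
    """Hours needed to fetch `pages` pages once, one request every `delay_s` seconds."""
    return pages * delay_s / 3600


def scans_per_day(pages: int, delay_s: float) -> float:
    """How many crates of `pages` pages fit into one day's request budget."""
    return pages_per_day(delay_s) / pages


# Figures taken from this issue; all of them are rough estimates.
BEVY_ITEMS = 3200
DELAY_S = 5.0
RELEASES_PER_DAY = 800

budget = pages_per_day(DELAY_S)
print(f"daily page budget:         {budget:.0f}")
print(f"one bevy-sized scan:       {scan_hours(BEVY_ITEMS, DELAY_S):.1f} h")
print(f"bevy-sized scans per day:  {scans_per_day(BEVY_ITEMS, DELAY_S):.1f}")
# Break-even: average pages per release the crawler could absorb
# if it spent its whole budget on new releases only.
print(f"pages per release at most: {budget / RELEASES_PER_DAY:.1f}")
```

Under these assumptions the daily budget is 17280 pages, so at 800 releases per day the crawler only keeps up if the average release has fewer than about 22 pages.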

I'm sure there are other domains like this, ones hosting a lot of new pages per day.

It may be worth considering whether the crawler is too polite.
