feat(sitemap): bounded concurrency + default --limit 200 (#23)
Merged
…cation

- Audit sitemap pages with a 5-worker pool instead of fully sequentially, cutting wall time without hammering the origin.
- Default `--limit` to 200 when unset so large sitemaps (e.g. 1800 URLs) don't trigger hour-long sweeps; explicit `--limit` overrides still apply.
- Split `pagesSkipped` into `pagesFiltered` (non-HTML) and `pagesTruncated` (limit) in the report; expose `effectiveLimit` and an `onPlan` callback. Formatters now spell out what was skipped and why; the CLI prints a stderr notice up front when truncation fires.
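The 5-worker pool described above can be sketched as a bounded-concurrency map. This is a hypothetical reconstruction, not the PR's actual `mapWithConcurrency` implementation; the name and order/cap behavior come from the description, everything else is an assumption:

```typescript
// Sketch of a bounded-concurrency map (assumed shape of the PR's
// mapWithConcurrency helper). Results keep input order; at most `limit`
// calls to `fn` are in flight at once.
async function mapWithConcurrency<T, R>(
  items: readonly T[],
  limit: number,
  fn: (item: T, index: number) => Promise<R>,
): Promise<R[]> {
  const results = new Array<R>(items.length);
  let next = 0; // shared cursor; safe because JS runs callbacks single-threaded
  const workers = Array.from(
    { length: Math.min(limit, items.length) },
    async () => {
      while (next < items.length) {
        const i = next++; // claim the next index before awaiting
        results[i] = await fn(items[i], i);
      }
    },
  );
  await Promise.all(workers);
  return results;
}
```

Each worker loops until the shared cursor runs out, so slow pages only stall one of the five slots instead of the whole sweep.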
Summary
- Large sitemaps (e.g. an 1800-URL `sitemap.xml`) used to trigger ~90 min sweeps that looked hung; the new pool plus a default `--limit 200` keep the common case under ~2 min while still being polite to a single origin.
- Split `pagesSkipped` into `pagesFiltered` (non-HTML URLs dropped) and `pagesTruncated` (URLs beyond `--limit`), expose `effectiveLimit` in the report, and add an `onPlan` callback so the CLI can print a stderr notice up front when truncation fires; formatters now spell out what was skipped and how to opt out.
- Bump the version to 1.5.0 (backwards-compatible: existing JSON consumers keep `pagesSkipped` as the sum, and `--limit 9999` restores the old "audit everything" behavior).

Test plan
- `pnpm run typecheck`
- `pnpm lint`
- `pnpm test` (94 passed, including new `mapWithConcurrency` order/cap tests)
- `pnpm run build`
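The `pagesFiltered`/`pagesTruncated` split and the `onPlan` notice described in the summary can be sketched as follows. The field names come from the PR description; `planFromSitemap`, the non-HTML filter regex, and the exact message wording are hypothetical:

```typescript
// Assumed shape of the audit plan exposed via the onPlan callback.
interface AuditPlan {
  totalUrls: number;       // URLs found in the sitemap
  pagesFiltered: number;   // non-HTML URLs dropped up front
  pagesTruncated: number;  // HTML URLs beyond the effective limit
  effectiveLimit: number;  // explicit --limit, or the 200 default
}

// Hypothetical planning step: filter non-HTML URLs, then apply the limit.
// Backwards compat: pagesSkipped === pagesFiltered + pagesTruncated.
function planFromSitemap(urls: string[], explicitLimit?: number): AuditPlan {
  const effectiveLimit = explicitLimit ?? 200; // default when --limit is unset
  const html = urls.filter(
    (u) => !/\.(pdf|jpg|png|gif|xml|zip)(\?|$)/i.test(u),
  );
  return {
    totalUrls: urls.length,
    pagesFiltered: urls.length - html.length,
    pagesTruncated: Math.max(0, html.length - effectiveLimit),
    effectiveLimit,
  };
}

// The CLI's onPlan hook can warn on stderr before the audit starts,
// so a truncated sweep never looks like it silently lost pages.
function onPlan(plan: AuditPlan): void {
  if (plan.pagesTruncated > 0) {
    console.error(
      `Auditing ${plan.effectiveLimit} of ` +
        `${plan.totalUrls - plan.pagesFiltered} HTML pages; ` +
        `pass --limit to raise the cap.`,
    );
  }
}
```

Keeping the notice on stderr leaves stdout clean for the JSON report, which is why existing consumers still see `pagesSkipped` as the sum of the two new counters.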