403 during scraping leads to incomplete documentation with no way to fix the problem #246

@vesper8

I installed the standalone server via Docker and queued https://jsonforms.io/docs/ for scraping.

It identified 24 pages, scraped 19 of them successfully, failed on the remaining 5 with HTTP 403 errors, and still reported the job as a "success".
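To illustrate what a more honest job result could look like, here is a minimal sketch in Python. The `JobSummary` class and the status names (`completed`, `completed_with_errors`, `failed`) are hypothetical and only for illustration; they are not taken from the server's code.

```python
from dataclasses import dataclass, field


@dataclass
class JobSummary:
    """Hypothetical per-job tally; not the server's actual data model."""

    succeeded: list[str] = field(default_factory=list)
    failed: dict[str, str] = field(default_factory=dict)  # URL -> error message

    @property
    def status(self) -> str:
        # Only call the job "completed" when every page was scraped.
        if not self.failed:
            return "completed"
        if not self.succeeded:
            return "failed"
        return "completed_with_errors"

    def report(self) -> str:
        total = len(self.succeeded) + len(self.failed)
        return (
            f"{self.status}: {len(self.succeeded)}/{total} pages scraped, "
            f"{len(self.failed)} failed"
        )


# Numbers from the run described above: 19 pages scraped, 5 rejected with 403.
summary = JobSummary(
    succeeded=[f"page-{i}" for i in range(19)],
    failed={f"page-{i}": "status code 403" for i in range(19, 24)},
)
print(summary.report())
```

With a status like this, a client could tell a partial scrape apart from a complete one instead of seeing the same "Job completed" line in both cases.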

I feel like there should be an option to complete the job, for example by retrying only the pages that failed.
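One way such an option could work is to re-fetch only the failed URLs with a few attempts and a growing delay between them. The sketch below is Python and entirely hypothetical: `FetchError`, `retry_failed`, and the injected `fetch` callable are names made up for illustration, and whether a retry clears a 403 depends on why the site refused the request in the first place.

```python
import time
from typing import Callable


class FetchError(Exception):
    """Hypothetical error carrying the HTTP status of a failed fetch."""

    def __init__(self, url: str, status: int):
        super().__init__(f"Failed to fetch {url}: status {status}")
        self.url = url
        self.status = status


def retry_failed(
    failed_urls: list[str],
    fetch: Callable[[str], str],
    max_attempts: int = 3,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> tuple[dict[str, str], list[str]]:
    """Re-fetch only the failed URLs, doubling the delay after each failure.

    Returns (pages, still_failed): the recovered page bodies keyed by URL,
    and the URLs that failed on every attempt.
    """
    pages: dict[str, str] = {}
    still_failed: list[str] = []
    for url in failed_urls:
        for attempt in range(1, max_attempts + 1):
            try:
                pages[url] = fetch(url)
                break
            except FetchError:
                if attempt == max_attempts:
                    still_failed.append(url)
                else:
                    sleep(base_delay * 2 ** (attempt - 1))
    return pages, still_failed
```

The `fetch` and `sleep` parameters are injected so the retry logic can be exercised without network access or real waiting.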

Also, it would be nice to be able to specify a URL blacklist. For example, I am only interested in the "vue" documentation, but the website above also provides Angular and React documentation, and I don't want to confuse the AI with documentation that doesn't apply.
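A blacklist like this could be expressed as glob patterns matched against the URL path before a page is queued. The following is a rough Python sketch; `is_excluded`, `filter_urls`, and the pattern syntax are assumptions for illustration, not an existing option of the server.

```python
from fnmatch import fnmatchcase
from urllib.parse import urlsplit


def is_excluded(url: str, exclude_patterns: list[str]) -> bool:
    """Return True if the URL path matches any glob-style exclusion pattern."""
    # Drop a trailing slash so "/docs/" and "/docs" are treated the same.
    path = urlsplit(url).path.rstrip("/") or "/"
    return any(fnmatchcase(path, pattern) for pattern in exclude_patterns)


def filter_urls(urls: list[str], exclude_patterns: list[str]) -> list[str]:
    """Keep only the URLs that no exclusion pattern matches."""
    return [url for url in urls if not is_excluded(url, exclude_patterns)]


# Hypothetical blacklist: skip the React and Angular integration pages.
exclude = ["/docs/integrations/react*", "/docs/integrations/angular*"]
discovered = [
    "https://jsonforms.io/docs/",
    "https://jsonforms.io/docs/integrations/vue",
    "https://jsonforms.io/docs/integrations/react",
    "https://jsonforms.io/docs/integrations/angular",
]
kept = filter_urls(discovered, exclude)
```

Here `kept` retains the docs root and the Vue integration page, while the React and Angular pages are dropped before any request is made.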

This was the result:

📝 Job enqueued: dc350e83-79a8-4844-982d-3a9d7e02e5ab for [email protected]
🗑️ Removing all documents from [email protected] store
🗑️ Deleted 0 documents
💾 Cleared store for [email protected] before scraping.
🌐 Scraping page 1/1 (depth 0/3): https://jsonforms.io/docs/
📚 Adding document: What is JSON Forms? - JSON Forms
✂️  Split document into 4 chunks
🌐 Scraping page 2/24 (depth 1/3): https://jsonforms.io/docs/uischema/
📚 Adding document: UI Schema - JSON Forms
✂️  Split document into 1 chunks
🌐 Scraping page 3/24 (depth 1/3): https://jsonforms.io/docs/architecture
📚 Adding document: Architecture - JSON Forms
✂️  Split document into 2 chunks
🌐 Scraping page 4/24 (depth 1/3): https://jsonforms.io/docs/getting-started
📚 Adding document: Getting Started - JSON Forms
✂️  Split document into 1 chunks
🌐 Scraping page 5/24 (depth 1/3): https://jsonforms.io/docs/uischema/layouts
📚 Adding document: Layouts - JSON Forms
✂️  Split document into 4 chunks
🌐 Scraping page 6/24 (depth 1/3): https://jsonforms.io/docs/uischema/rules
📚 Adding document: Rules - JSON Forms
✂️  Split document into 3 chunks
🌐 Scraping page 7/24 (depth 1/3): https://jsonforms.io/docs/uischema/controls
📚 Adding document: Controls - JSON Forms
✂️  Split document into 9 chunks
🌐 Scraping page 8/24 (depth 1/3): https://jsonforms.io/docs/renderer-sets
📚 Adding document: Renderer sets - JSON Forms
✂️  Split document into 3 chunks
🌐 Scraping page 9/24 (depth 1/3): https://jsonforms.io/docs/labels
📚 Adding document: Labels - JSON Forms
✂️  Split document into 3 chunks
🌐 Scraping page 10/24 (depth 1/3): https://jsonforms.io/docs/i18n
📚 Adding document: i18n - JSON Forms
✂️  Split document into 8 chunks
🌐 Scraping page 11/24 (depth 1/3): https://jsonforms.io/docs/ref-resolving
📚 Adding document: Ref Resolving - JSON Forms
✂️  Split document into 2 chunks
🌐 Scraping page 12/24 (depth 1/3): https://jsonforms.io/docs/validation
📚 Adding document: Validation - JSON Forms
✂️  Split document into 3 chunks
🌐 Scraping page 13/24 (depth 1/3): https://jsonforms.io/docs/readonly
📚 Adding document: ReadOnly - JSON Forms
✂️  Split document into 4 chunks
🌐 Scraping page 14/24 (depth 1/3): https://jsonforms.io/docs/middleware
📚 Adding document: Middleware - JSON Forms
✂️  Split document into 5 chunks
🌐 Scraping page 15/24 (depth 1/3): https://jsonforms.io/docs/multiple-choice
📚 Adding document: Multiple Choice - JSON Forms
✂️  Split document into 5 chunks
🌐 Scraping page 16/24 (depth 1/3): https://jsonforms.io/docs/date-time-picker
📚 Adding document: Date and Time Picker - JSON Forms
✂️  Split document into 8 chunks
🌐 Scraping page 17/24 (depth 1/3): https://jsonforms.io/docs/tutorial
📚 Adding document: Create a JSON Forms App - JSON Forms
✂️  Split document into 5 chunks
🌐 Scraping page 18/24 (depth 1/3): https://jsonforms.io/docs/tutorial/custom-renderers
📚 Adding document: Custom Renderers - JSON Forms
✂️  Split document into 11 chunks
🌐 Scraping page 19/24 (depth 1/3): https://jsonforms.io/docs/tutorial/custom-layouts
📚 Adding document: Custom Layouts - JSON Forms
✂️  Split document into 7 chunks
❌ Failed processing page https://jsonforms.io/docs/tutorial/multiple-forms: ScraperError: Failed to fetch https://jsonforms.io/docs/tutorial/multiple-forms after 1 attempts: Request failed with status code 403
❌ Failed to process https://jsonforms.io/docs/tutorial/multiple-forms: ScraperError: Failed to fetch https://jsonforms.io/docs/tutorial/multiple-forms after 1 attempts: Request failed with status code 403
❌ Failed processing page https://jsonforms.io/docs/api: ScraperError: Failed to fetch https://jsonforms.io/docs/api after 1 attempts: Request failed with status code 403
❌ Failed to process https://jsonforms.io/docs/api: ScraperError: Failed to fetch https://jsonforms.io/docs/api after 1 attempts: Request failed with status code 403
❌ Failed processing page https://jsonforms.io/docs/integrations/react: ScraperError: Failed to fetch https://jsonforms.io/docs/integrations/react after 1 attempts: Request failed with status code 403
❌ Failed to process https://jsonforms.io/docs/integrations/react: ScraperError: Failed to fetch https://jsonforms.io/docs/integrations/react after 1 attempts: Request failed with status code 403
❌ Failed processing page https://jsonforms.io/docs/integrations/vue: ScraperError: Failed to fetch https://jsonforms.io/docs/integrations/vue after 1 attempts: Request failed with status code 403
❌ Failed to process https://jsonforms.io/docs/integrations/vue: ScraperError: Failed to fetch https://jsonforms.io/docs/integrations/vue after 1 attempts: Request failed with status code 403
❌ Failed processing page https://jsonforms.io/docs/integrations/angular: ScraperError: Failed to fetch https://jsonforms.io/docs/integrations/angular after 1 attempts: Request failed with status code 403
❌ Failed to process https://jsonforms.io/docs/integrations/angular: ScraperError: Failed to fetch https://jsonforms.io/docs/integrations/angular after 1 attempts: Request failed with status code 403
✅ Job completed: dc350e83-79a8-4844-982d-3a9d7e02e5ab
🔍 Searching jsonforms@latest for: vue
🔎 Validating existence of library: jsonforms
✅ Library 'jsonforms' confirmed to exist.
🔍 Finding best version for jsonforms
✅ Found best match version 3.6.0 for jsonforms
✅ Found 6 matching results

Labels: bug (Something isn't working)
