feat(crawler): add support for Crawl4AI as alternative to Firecrawl #100
base: main
Conversation
This commit introduces a dynamic crawler selection system, allowing the deep research agent to use either Firecrawl or Crawl4AI as its web crawling backend. The changes maintain the existing functionality while adding flexibility in crawler choice.

Key Changes:

- Remove static Firecrawl import and instance creation
- Implement dynamic crawler selection based on the CRAWLER environment variable
- Add HTTP API integration for Crawl4AI with proper authentication and polling
- Update environment variable handling for both crawlers
- Standardize response format between both crawlers

Technical Details:

- Add new environment variables:
  * `CRAWLER`: Toggle between "FIRECRAWL" and "CRAWL4AI"
  * `CRAWL4AI_API_TOKEN`: Authentication token for Crawl4AI
  * `CRAWL4AI_BASE_URL`: Optional custom endpoint (default: `localhost:11235`)
- Implement polling mechanism for Crawl4AI's asynchronous API
- Transform Crawl4AI responses to match Firecrawl's data structure
- Add proper error handling and timeout management for HTTP requests
- Update `.env.example` with comprehensive documentation

The implementation ensures that the deep research functionality remains unchanged while giving users the flexibility to choose their preferred crawler backend. Error handling and timeout mechanisms have been carefully considered to maintain robustness regardless of the chosen crawler.
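For concreteness, here is a minimal TypeScript sketch of the selection-plus-polling flow described above. The endpoint paths (`/crawl`, `/task/:id`), the response shapes, and the `crawl`/`crawlWithCrawl4AI` helper names are illustrative assumptions, not the PR's exact code; the Firecrawl path is stubbed out.

```typescript
// Sketch only: endpoint paths and response shapes are assumptions,
// not taken from the actual PR diff.
type CrawlResult = { markdown: string; url: string };

const CRAWLER = process.env.CRAWLER ?? 'FIRECRAWL';
const CRAWL4AI_BASE_URL =
  process.env.CRAWL4AI_BASE_URL ?? 'http://localhost:11235';

// Entry point: dispatch to the backend chosen via the CRAWLER env var.
async function crawl(url: string): Promise<CrawlResult> {
  return CRAWLER === 'CRAWL4AI'
    ? crawlWithCrawl4AI(url)
    : crawlWithFirecrawl(url);
}

async function crawlWithCrawl4AI(url: string): Promise<CrawlResult> {
  const headers = {
    'Content-Type': 'application/json',
    Authorization: `Bearer ${process.env.CRAWL4AI_API_TOKEN}`,
  };

  // Submit the crawl job; Crawl4AI's API is asynchronous and
  // (in this sketch) returns a task id to poll.
  const submit = await fetch(`${CRAWL4AI_BASE_URL}/crawl`, {
    method: 'POST',
    headers,
    body: JSON.stringify({ urls: url }),
  });
  const { task_id } = await submit.json();

  // Poll with a hard deadline so a stuck job cannot hang the research loop.
  const deadline = Date.now() + 60_000;
  while (Date.now() < deadline) {
    const res = await fetch(`${CRAWL4AI_BASE_URL}/task/${task_id}`, { headers });
    const task = await res.json();
    if (task.status === 'completed') {
      // Transform into the same shape the Firecrawl path returns,
      // so downstream deep-research code stays crawler-agnostic.
      return { markdown: task.result.markdown, url };
    }
    if (task.status === 'failed') {
      throw new Error(`Crawl4AI task failed for ${url}`);
    }
    await new Promise(resolve => setTimeout(resolve, 1_000));
  }
  throw new Error(`Crawl4AI task timed out for ${url}`);
}

async function crawlWithFirecrawl(url: string): Promise<CrawlResult> {
  // Placeholder for the pre-existing Firecrawl path (not shown here).
  throw new Error('Firecrawl path omitted from this sketch');
}
```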
@dzhng, adding the dynamic crawler selection logic introduces additional complexity to the existing code, which a follow-up refactor could isolate. Let me know if you'd like me to make this change, and I'll resubmit the changes to this PR.
This commit enhances the README's markdown formatting to align with standard practices and improve readability across different markdown renderers.

Detailed Changes:

- Standardize code block indentation throughout the document
  * Ensure all code blocks are properly nested under their sections
  * Add consistent 3-space indentation for code blocks within lists
- Fix code block formatting
  * Add proper line breaks before and after code blocks
  * Ensure all bash commands are properly tagged with `` ```bash ``
- Improve documentation clarity
  * Add backticks around URLs in configuration examples
  * Fix inconsistent spacing in list items

The changes maintain the same content while ensuring:

- Consistent presentation across different markdown viewers
- Proper nesting of code blocks within numbered lists
- Clear visual hierarchy in the documentation
- Better readability of configuration examples

These formatting improvements help maintain a professional documentation standard while making the README more accessible to new contributors.
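To make the list-nesting convention concrete, here is a hypothetical README fragment (the variable is taken from the first commit; the step text is illustrative). The code block inside the numbered item is indented three spaces, separated by blank lines, and tagged `bash`, so renderers keep it attached to its step:

````markdown
1. Point the agent at your Crawl4AI instance in `.env`:

   ```bash
   CRAWL4AI_BASE_URL="http://localhost:11235"
   ```
````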
Added a second commit to the PR with the README formatting changes described above.
What advantage do you get from using Crawl4AI instead of Firecrawl?
@didlawowo Price. No need to pay for Firecrawl as a separate service.