feat(crawler): add support for Crawl4AI as alternative to Firecrawl #100

brettdavies · 2025-02-18T02:25:32Z

This commit introduces a dynamic crawler selection system, allowing the deep research agent to use either Firecrawl or Crawl4AI as its web crawling backend. The changes maintain the existing functionality while adding flexibility in crawler choice.

Key Changes:

- Remove static Firecrawl import and instance creation
- Implement dynamic crawler selection based on CRAWLER environment variable
- Add HTTP API integration for Crawl4AI with proper authentication and polling
- Update environment variable handling for both crawlers
- Standardize response format between both crawlers
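
To illustrate the selection step, here is a minimal sketch of reading the CRAWLER variable. The names `CrawlerKind`, `Env`, and `parseCrawlerKind` are hypothetical and not taken from the PR; falling back to Firecrawl when the variable is unset, and matching case-insensitively, are assumptions.

```typescript
// Hypothetical names for illustration; the PR's actual code may differ.
type CrawlerKind = "FIRECRAWL" | "CRAWL4AI";
type Env = Record<string, string | undefined>;

function parseCrawlerKind(env: Env): CrawlerKind {
  // Assumption: an unset CRAWLER falls back to Firecrawl, the pre-existing backend.
  const raw = (env["CRAWLER"] ?? "FIRECRAWL").trim().toUpperCase();
  if (raw === "FIRECRAWL" || raw === "CRAWL4AI") {
    return raw;
  }
  // Fail loudly on unknown values instead of silently picking a backend.
  throw new Error(`Unsupported CRAWLER value: ${raw}`);
}
```

In a Node.js process this could be called as `parseCrawlerKind(process.env)`; taking the environment as a parameter keeps the function easy to test.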

Technical Details:

- Add new environment variables:
  - CRAWLER: Toggle between "FIRECRAWL" and "CRAWL4AI"
  - CRAWL4AI_API_TOKEN: Authentication token for Crawl4AI
  - CRAWL4AI_BASE_URL: Optional custom endpoint (default: localhost:11235)
- Implement polling mechanism for Crawl4AI's asynchronous API
- Transform Crawl4AI responses to match Firecrawl's data structure
- Add proper error handling and timeout management for HTTP requests
- Update .env.example with comprehensive documentation
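
The polling and response-normalization steps above could look roughly like the sketch below. It is not the PR's implementation: `JobStatus`, `pollUntilDone`, `RawCrawlResult`, `NormalizedPage`, and `toNormalizedPage` are hypothetical names, and the field names are assumptions rather than the real Crawl4AI or Firecrawl schemas. The status check is injected as a function so the HTTP call itself stays out of the sketch.

```typescript
// All names and shapes here are hypothetical; the real API fields may differ.
interface JobStatus<T> {
  done: boolean;
  result?: T;
}

interface PollOptions {
  intervalMs: number; // delay between status checks
  timeoutMs: number;  // overall budget before giving up
}

const sleep = (ms: number): Promise<void> =>
  new Promise((resolve) => setTimeout(resolve, ms));

// Repeatedly calls checkStatus until the job is done or the time budget runs out.
async function pollUntilDone<T>(
  checkStatus: () => Promise<JobStatus<T>>,
  { intervalMs, timeoutMs }: PollOptions,
): Promise<T> {
  const deadline = Date.now() + timeoutMs;
  while (true) {
    const status = await checkStatus();
    if (status.done) {
      if (status.result === undefined) {
        throw new Error("Job finished without a result");
      }
      return status.result;
    }
    // Stop before sleeping if the next check would land past the deadline.
    if (Date.now() + intervalMs > deadline) {
      throw new Error(`Polling timed out after ${timeoutMs} ms`);
    }
    await sleep(intervalMs);
  }
}

// Assumed raw result shape and assumed common shape shared by both backends.
interface RawCrawlResult {
  url: string;
  markdown?: string;
}

interface NormalizedPage {
  url: string;
  markdown: string;
}

// Maps a raw result onto the common shape, defaulting missing content to "".
function toNormalizedPage(raw: RawCrawlResult): NormalizedPage {
  return { url: raw.url, markdown: raw.markdown ?? "" };
}
```

Bounding the loop with a deadline rather than a retry count keeps the worst-case wait predictable even if individual status requests are slow.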

The implementation ensures that the deep research functionality remains unchanged while providing users the flexibility to choose their preferred crawler backend. Error handling and timeout mechanisms have been carefully considered to maintain robustness regardless of the chosen crawler.

brettdavies · 2025-02-18T02:31:32Z

@dzhng,

Adding the dynamic crawler selection logic introduces additional complexity to deep-research.ts. Depending on how you want to maintain the repo, this may run afoul of SOLID principles by adding a second responsibility. If you agree, I can extract this functionality into a dedicated service/helper module (e.g., src/services/crawler.ts or src/helpers/crawler-factory.ts).

This refactor would:

- Encapsulate crawler-specific logic and configuration
- Provide a clean factory pattern for crawler instantiation
- Make the main research logic more focused and testable
- Simplify future additions of other crawler implementations
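
As a rough idea of what such a module might contain, here is a minimal factory sketch. Everything in it is hypothetical: the `Crawler` interface, the `StubCrawler` class, and `createCrawler` are illustrative names, and the stub stands in for the real Firecrawl and Crawl4AI clients.

```typescript
// Hypothetical interface and classes, for illustration only.
type CrawlerKind = "FIRECRAWL" | "CRAWL4AI";

interface CrawlResult {
  url: string;
  markdown: string;
}

interface Crawler {
  readonly name: CrawlerKind;
  crawl(url: string): Promise<CrawlResult>;
}

// Stub backend: a real implementation would call the crawler's API here.
class StubCrawler implements Crawler {
  constructor(public readonly name: CrawlerKind) {}

  async crawl(url: string): Promise<CrawlResult> {
    return { url, markdown: "" };
  }
}

// Registry of constructors; supporting another crawler means adding one entry.
const crawlerFactories: Record<CrawlerKind, () => Crawler> = {
  FIRECRAWL: () => new StubCrawler("FIRECRAWL"),
  CRAWL4AI: () => new StubCrawler("CRAWL4AI"),
};

function createCrawler(kind: CrawlerKind): Crawler {
  return crawlerFactories[kind]();
}
```

With this layout the research logic would depend only on the `Crawler` interface, for example `const crawler = createCrawler("CRAWL4AI")`, and would not need to know which backend is active.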

Let me know if you'd like me to make this change, and I'll resubmit the changes to this PR.

brettdavies · 2025-02-18T03:01:08Z

Added a second commit to the PR. This commit enhances the README's markdown formatting to align with standard practices and improve readability across different markdown renderers.

Detailed Changes:

- Standardize code block indentation throughout the document
  - Ensure all code blocks are properly nested under their sections
  - Add consistent 3-space indentation for code blocks within lists
- Fix code block formatting
  - Add proper line breaks before and after code blocks
  - Ensure all bash commands are properly tagged with ```bash
- Improve documentation clarity
  - Add backticks around URLs in configuration examples
  - Fix inconsistent spacing in list items

The changes maintain the same content while ensuring:

- Consistent presentation across different markdown viewers
- Proper nesting of code blocks within numbered lists
- Clear visual hierarchy in the documentation
- Better readability of configuration examples

These formatting improvements help maintain a professional documentation standard while making the README more accessible to new contributors.

didlawowo · 2025-02-19T11:22:51Z

What advantage do you get from using Crawl4AI instead of Firecrawl?

lucasgreenwell · 2025-02-22T02:25:23Z

@didlawowo Price. No need to pay for Firecrawl as a separate service.
