A production-ready Model Context Protocol (MCP) server that provides seamless integration with the ScrapeGraph AI API. This server enables language models to leverage advanced AI-powered web scraping capabilities with enterprise-grade reliability.
- Key Features
- Quick Start
- Available Tools
- Setup Instructions
- Example Use Cases
- Error Handling
- Common Issues
- Development
- Contributing
- Documentation
- Technology Stack
- License
- 8 Powerful Tools: From simple markdown conversion to complex multi-page crawling and agentic workflows
- AI-Powered Extraction: Intelligently extract structured data using natural language prompts
- Multi-Page Crawling: SmartCrawler supports asynchronous crawling with configurable depth and page limits
- Infinite Scroll Support: Handle dynamic content loading with configurable scroll counts
- JavaScript Rendering: Full support for JavaScript-heavy websites
- Flexible Output Formats: Get results as markdown, structured JSON, or custom schemas
- Easy Integration: Works seamlessly with Claude Desktop, Cursor, and any MCP-compatible client
- Enterprise-Ready: Robust error handling, timeout management, and production-tested reliability
- Simple Deployment: One-command installation via Smithery or manual setup
- Comprehensive Documentation: Detailed developer docs in the `.agent/` folder
Sign up and get your API key from the ScrapeGraph Dashboard
```bash
npx -y @smithery/cli install @ScrapeGraphAI/scrapegraph-mcp --client claude
```

Ask Claude or Cursor:
- "Convert https://scrapegraphai.com to markdown"
- "Extract all product prices from this e-commerce page"
- "Research the latest AI developments and summarize findings"
That's it! The server is now available to your AI assistant.
The server provides 8 enterprise-ready tools for AI-powered web scraping:
Transform any webpage into clean, structured markdown format.
```python
markdownify(website_url: str)
```

- Credits: 2 per request
- Use case: Quick webpage content extraction in markdown
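As an illustration, a direct tool invocation looks like this (in practice your MCP client issues the call for you; the URL is just an example):

```python
# Illustrative call; an MCP client such as Claude Desktop or the MCP
# Inspector normally invokes this tool on your behalf.
result = markdownify(website_url="https://scrapegraphai.com")
print(result)
```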
Leverage AI to extract structured data from any webpage with support for infinite scrolling.
```python
smartscraper(
    user_prompt: str,
    website_url: str,
    number_of_scrolls: int = None,
    markdown_only: bool = None
)
```

- Credits: 10+ (base) + variable based on scrolling
- Use case: AI-powered data extraction with custom prompts
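For example, a hypothetical call that handles an infinite-scroll listing might look like the following sketch (the prompt, URL, and scroll count are illustrative):

```python
# Illustrative smartscraper call: scroll the page 5 times so lazy-loaded
# items render, then extract them with a natural-language prompt.
result = smartscraper(
    user_prompt="Extract all product names and prices as a list of objects",
    website_url="https://example.com/products",
    number_of_scrolls=5,
)
print(result)
```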
Execute AI-powered web searches with structured, actionable results.
```python
searchscraper(
    user_prompt: str,
    num_results: int = None,
    number_of_scrolls: int = None
)
```

- Credits: Variable (3-20 websites × 10 credits)
- Use case: Multi-source research and data aggregation
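For example, a search that draws on 5 websites would consume 5 × 10 = 50 credits; the documented range therefore runs from 30 credits (3 websites) to 200 credits (20 websites).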
Basic scraping endpoint to fetch page content with optional heavy JavaScript rendering.
```python
scrape(website_url: str, render_heavy_js: bool = None)
```

- Use case: Simple page content fetching with JS rendering support
Extract sitemap URLs and structure for any website.
```python
sitemap(website_url: str)
```

- Use case: Website structure analysis and URL discovery
Initiate intelligent multi-page web crawling (asynchronous operation).
```python
smartcrawler_initiate(
    url: str,
    prompt: str = None,
    extraction_mode: str = "ai",
    depth: int = None,
    max_pages: int = None,
    same_domain_only: bool = None
)
```

- AI Extraction Mode: 10 credits per page - extracts structured data
- Markdown Mode: 2 credits per page - converts to markdown
- Returns: a `request_id` for polling
- Use case: Large-scale website crawling and data extraction
Retrieve results from asynchronous crawling operations.
```python
smartcrawler_fetch_results(request_id: str)
```

- Returns: Status and results when crawling is complete
- Use case: Poll for crawl completion and retrieve results
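Together, the two crawler tools form an initiate-then-poll workflow. A minimal sketch (the response keys and the 10-second interval are assumptions for illustration; the "completed" status comes from the troubleshooting notes below):

```python
import time

# Sketch of the asynchronous crawl workflow: initiate, then poll for results.
# The "request_id" and "status" keys are assumptions for illustration.
job = smartcrawler_initiate(
    url="https://example.com/docs",
    prompt="Extract every API endpoint with a short description",
    extraction_mode="ai",
    depth=2,
    max_pages=50,
)
request_id = job["request_id"]

while True:
    result = smartcrawler_fetch_results(request_id)
    if result.get("status") == "completed":
        break
    time.sleep(10)  # crawling is asynchronous; wait between polls

print(result)
```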
Run advanced agentic scraping workflows with customizable steps and structured output schemas.
```python
agentic_scrapper(
    url: str,
    user_prompt: str = None,
    output_schema: dict = None,
    steps: list = None,
    ai_extraction: bool = None,
    persistent_session: bool = None,
    timeout_seconds: float = None
)
```

- Use case: Complex multi-step workflows with custom schemas and persistent sessions
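As a sketch, a schema-driven run could look like this (the schema mirrors the pagination example later in this README; the step strings and schema values are illustrative, not a fixed grammar):

```python
# Illustrative agentic_scrapper call; the steps and schema values are
# examples, not required formats.
result = agentic_scrapper(
    url="https://example.com/articles",
    user_prompt="Follow pagination links and compile a dataset of articles",
    output_schema={
        "title": "string",
        "author": "string",
        "date": "string",
        "content": "string",
    },
    steps=[
        "Open the articles listing page",
        "Follow each pagination link",
        "Extract the schema fields from every article",
    ],
    ai_extraction=True,
)
print(result)
```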
To use this server, you'll need a ScrapeGraph API key. Follow these steps to obtain one:
- Navigate to the ScrapeGraph Dashboard
- Create an account and generate your API key
For automated installation of the ScrapeGraph API Integration Server using Smithery:
```bash
npx -y @smithery/cli install @ScrapeGraphAI/scrapegraph-mcp --client claude
```

For manual setup, update your Claude Desktop configuration file with the following settings (in Cursor, the equivalent MCP settings are located at the top right of the settings page):
(remember to add your API key inside the config)
```json
{
    "mcpServers": {
        "@ScrapeGraphAI-scrapegraph-mcp": {
            "command": "npx",
            "args": [
                "-y",
                "@smithery/cli@latest",
                "run",
                "@ScrapeGraphAI/scrapegraph-mcp",
                "--config",
                "\"{\\\"scrapegraphApiKey\\\":\\\"YOUR-SGAI-API-KEY\\\"}\""
            ]
        }
    }
}
```

The configuration file is located at:

- Windows: `%APPDATA%/Claude/claude_desktop_config.json`
- macOS: `~/Library/Application Support/Claude/claude_desktop_config.json`
Add the ScrapeGraphAI MCP server in the settings:
The server enables sophisticated queries across various scraping scenarios:
- Markdownify: "Convert the ScrapeGraph documentation page to markdown"
- SmartScraper: "Extract all product names, prices, and ratings from this e-commerce page"
- SmartScraper with scrolling: "Scrape this infinite scroll page with 5 scrolls and extract all items"
- Basic Scrape: "Fetch the HTML content of this JavaScript-heavy page with full rendering"
- SearchScraper: "Research and summarize recent developments in AI-powered web scraping"
- SearchScraper: "Search for the top 5 articles about machine learning frameworks and extract key insights"
- SearchScraper: "Find recent news about GPT-4 and provide a structured summary"
- Sitemap: "Extract the complete sitemap structure from the ScrapeGraph website"
- Sitemap: "Discover all URLs on this blog site"
- SmartCrawler (AI mode): "Crawl the entire documentation site and extract all API endpoints with descriptions"
- SmartCrawler (Markdown mode): "Convert all pages in the blog to markdown up to 2 levels deep"
- SmartCrawler: "Extract all product information from an e-commerce site, maximum 100 pages, same domain only"
- Agentic Scraper: "Navigate through a multi-step authentication form and extract user dashboard data"
- Agentic Scraper with schema: "Follow pagination links and compile a dataset with schema: {title, author, date, content}"
- Agentic Scraper: "Execute a complex workflow: login, navigate to reports, download data, and extract summary statistics"
The server implements robust error handling with detailed, actionable error messages for:
- API authentication issues
- Malformed URL structures
- Network connectivity failures
- Rate limiting and quota management
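Rather than raising, each tool returns errors as a plain dictionary, so callers can branch on the response. A minimal sketch (the helper functions are hypothetical):

```python
# Tools return {"error": "..."} instead of raising, so a caller can branch:
result = smartscraper(
    user_prompt="Extract the page title",
    website_url="https://example.com",
)
if "error" in result:
    handle_failure(result["error"])  # hypothetical helper
else:
    process(result)                  # hypothetical helper
```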
When running on Windows systems, you may need to use the following command to connect to the MCP server:
```bash
C:\Windows\System32\cmd.exe /c npx -y @smithery/cli@latest run @ScrapeGraphAI/scrapegraph-mcp --config "{\"scrapegraphApiKey\":\"YOUR-SGAI-API-KEY\"}"
```

This ensures proper execution in the Windows environment.
"ScrapeGraph client not initialized"
- Cause: Missing API key
- Solution: Set the `SGAI_API_KEY` environment variable or provide the key via `--config`
"Error 401: Unauthorized"
- Cause: Invalid API key
- Solution: Verify your API key at the ScrapeGraph Dashboard
"Error 402: Payment Required"
- Cause: Insufficient credits
- Solution: Add credits to your ScrapeGraph account
SmartCrawler not returning results
- Cause: Still processing (asynchronous operation)
- Solution: Keep polling `smartcrawler_fetch_results()` until the status is "completed"
Tools not appearing in Claude Desktop
- Cause: Server not starting or configuration error
- Solution: Check the Claude logs at `~/Library/Logs/Claude/` (macOS) or `%APPDATA%\Claude\Logs\` (Windows)
For detailed troubleshooting, see the .agent documentation.
- Python 3.10 or higher
- pip or uv package manager
- ScrapeGraph API key
```bash
# Clone the repository
git clone https://github.com/ScrapeGraphAI/scrapegraph-mcp
cd scrapegraph-mcp

# Install dependencies
pip install -e ".[dev]"

# Set your API key
export SGAI_API_KEY=your-api-key

# Run the server
scrapegraph-mcp
# or
python -m scrapegraph_mcp.server
```

Test your server locally using the MCP Inspector tool:
```bash
npx @modelcontextprotocol/inspector scrapegraph-mcp
```

This provides a web interface to test all available tools.
Linting:

```bash
ruff check src/
```

Type Checking:

```bash
mypy src/
```

Format Checking:

```bash
ruff format --check src/
```

```
scrapegraph-mcp/
├── src/
│   └── scrapegraph_mcp/
│       ├── __init__.py     # Package initialization
│       └── server.py       # Main MCP server (all code in one file)
├── .agent/                 # Developer documentation
│   ├── README.md           # Documentation index
│   └── system/             # System architecture docs
├── assets/                 # Images and badges
├── pyproject.toml          # Project metadata & dependencies
├── smithery.yaml           # Smithery deployment config
└── README.md               # This file
```
We welcome contributions! Here's how you can help:
- Add a method to the `ScapeGraphClient` class in `server.py`:

```python
def new_tool(self, param: str) -> Dict[str, Any]:
    """Tool description."""
    url = f"{self.BASE_URL}/new-endpoint"
    data = {"param": param}

    response = self.client.post(url, headers=self.headers, json=data)

    if response.status_code != 200:
        raise Exception(f"Error {response.status_code}: {response.text}")

    return response.json()
```

- Add the MCP tool decorator:

```python
@mcp.tool()
def new_tool(param: str) -> Dict[str, Any]:
    """
    Tool description for AI assistants.

    Args:
        param: Parameter description

    Returns:
        Dictionary containing results
    """
    if scrapegraph_client is None:
        return {"error": "ScrapeGraph client not initialized. Please provide an API key."}

    try:
        return scrapegraph_client.new_tool(param)
    except Exception as e:
        return {"error": str(e)}
```

- Test with MCP Inspector:

```bash
npx @modelcontextprotocol/inspector scrapegraph-mcp
```

- Update documentation:
  - Add the tool to this README
  - Update the .agent documentation
- Submit a pull request
- Fork the repository
- Create a feature branch (`git checkout -b feature/amazing-feature`)
- Make your changes
- Run linting and type checking
- Test with MCP Inspector and Claude Desktop
- Update documentation
- Commit your changes (`git commit -m 'Add amazing feature'`)
- Push to the branch (`git push origin feature/amazing-feature`)
- Open a Pull Request
- Line length: 100 characters
- Type hints: Required for all functions
- Docstrings: Google-style docstrings
- Error handling: Return error dicts, don't raise exceptions in tools
- Python version: Target 3.10+
For detailed development guidelines, see the .agent documentation.
For comprehensive developer documentation, see:
- .agent/README.md - Complete developer documentation index
- .agent/system/project_architecture.md - System architecture and design
- .agent/system/mcp_protocol.md - MCP protocol integration details
- Python 3.10+ - Modern Python with type hints
- FastMCP - Lightweight MCP server framework
- httpx 0.24.0+ - Modern async HTTP client
- Ruff - Fast Python linter and formatter
- mypy - Static type checker
- Hatchling - Modern build backend
- Smithery - Automated MCP server deployment
- Docker - Container support with Alpine Linux
- stdio transport - Standard MCP communication
- ScrapeGraph AI API - Enterprise web scraping service
- Base URL: `https://api.scrapegraphai.com/v1`
- Authentication: API key-based
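Each tool is a thin wrapper over this HTTP API. A rough sketch of a direct call (the endpoint path, payload keys, and header name are assumptions based on the client pattern shown above, not authoritative API documentation):

```python
import httpx

# Rough sketch of a direct API call. The endpoint path and authentication
# header name are assumptions for illustration; consult the ScrapeGraph API
# documentation for the authoritative request format.
BASE_URL = "https://api.scrapegraphai.com/v1"

response = httpx.post(
    f"{BASE_URL}/smartscraper",
    headers={"SGAI-APIKEY": "YOUR-SGAI-API-KEY"},
    json={
        "user_prompt": "Extract the page title",
        "website_url": "https://scrapegraphai.com",
    },
    timeout=60.0,
)
response.raise_for_status()
print(response.json())
```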
This project is distributed under the MIT License. For detailed terms and conditions, please refer to the LICENSE file.
Special thanks to tomekkorbak for his implementation of oura-mcp-server, which served as a starting point for this repo.
- ScrapeGraph AI Homepage
- ScrapeGraph Dashboard - Get your API key
- ScrapeGraph API Documentation
- GitHub Repository
- Model Context Protocol - Official MCP specification
- FastMCP Framework - Framework used by this server
- MCP Inspector - Testing tool
- Smithery - MCP server distribution
- Claude Desktop - Desktop app with MCP support
- Cursor - AI-powered code editor
- GitHub Issues - Report bugs or request features
- Developer Documentation - Comprehensive dev docs
Made with ❤️ by ScrapeGraphAI Team

