Commit e07cd76

feat: add sitemap example
1 parent f5e907e commit e07cd76

8 files changed, +1148 -0 lines changed

Lines changed: 128 additions & 0 deletions
@@ -0,0 +1,128 @@
# Sitemap Examples

This directory contains examples demonstrating how to use the `sitemap` endpoint to extract URLs from website sitemaps.

## 📁 Examples

### 1. Basic Sitemap Extraction (`sitemap_example.js`)

Demonstrates the basic usage of the sitemap endpoint:
- Extract all URLs from a website's sitemap
- Display the URLs
- Save URLs to a text file
- Save complete response as JSON

**Usage:**
```bash
node sitemap_example.js
```

**What it does:**
1. Calls the sitemap API with a target website URL
2. Retrieves all URLs from the sitemap
3. Displays the first 10 URLs in the console
4. Saves all URLs to `sitemap_urls.txt`
5. Saves the full response to `sitemap_urls.json`
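
The display and save steps above can be sketched roughly as follows. This is a minimal illustration, not the contents of `sitemap_example.js`: the helper name `saveSitemapResult` is hypothetical, and the response is assumed to have the shape `{ urls: string[] }`, which should be checked against the actual sitemap endpoint response.

```javascript
const fs = require('node:fs');
const path = require('node:path');

/**
 * Hypothetical helper: print a preview of the URLs and write both output files.
 * ASSUMPTION: `response` looks like { urls: string[] }.
 */
function saveSitemapResult(response, outDir = '.') {
  const urls = Array.isArray(response.urls) ? response.urls : [];

  // Show only the first 10 URLs so large sitemaps do not flood the console.
  urls.slice(0, 10).forEach((url, i) => console.log(`${i + 1}. ${url}`));

  const txtPath = path.join(outDir, 'sitemap_urls.txt');
  const jsonPath = path.join(outDir, 'sitemap_urls.json');

  // One URL per line in the text file; the full response goes into the JSON file.
  fs.writeFileSync(txtPath, urls.join('\n'));
  fs.writeFileSync(jsonPath, JSON.stringify(response, null, 2));

  return { txtPath, jsonPath, total: urls.length };
}
```

Keeping the file writing in one function that takes the response as an argument makes it easy to try out with a hand-written object before calling the API.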

### 2. Advanced: Sitemap + SmartScraper (`sitemap_with_smartscraper.js`)

Shows how to combine sitemap extraction with smartScraper for batch processing:
- Extract sitemap URLs
- Filter URLs based on patterns (e.g., blog posts)
- Scrape selected URLs with smartScraper
- Display results and summary

**Usage:**
```bash
node sitemap_with_smartscraper.js
```

**What it does:**
1. Extracts all URLs from a website's sitemap
2. Filters URLs (example: only blog posts or specific sections)
3. Scrapes each filtered URL using smartScraper
4. Extracts structured data from each page
5. Displays a summary of successful and failed scrapes
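
The filtering step (step 2) might look something like the sketch below. The function name `filterUrls`, the substring-matching approach, and the default limit of 3 are illustrative assumptions, not necessarily what `sitemap_with_smartscraper.js` does.

```javascript
/**
 * Hypothetical helper: keep only URLs that contain at least one of the
 * given substrings, capped at `limit` entries.
 */
function filterUrls(urls, patterns, limit = 3) {
  return urls
    .filter((url) => patterns.some((pattern) => url.includes(pattern)))
    .slice(0, limit);
}

// Example input; these URLs are placeholders.
const sampleUrls = [
  'https://example.com/',
  'https://example.com/about',
  'https://example.com/blog/post-1',
  'https://example.com/blog/post-2',
];

console.log(filterUrls(sampleUrls, ['/blog/']));
```

Plain substring matching is enough for path prefixes such as `/blog/`; a regular expression would be a reasonable swap if the selection rules get more specific.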

**Use Cases:**
- Bulk content extraction from blogs
- E-commerce product catalog scraping
- News article aggregation
- Content migration and archival

## 🔑 Setup

Before running the examples, make sure you have:

1. **API Key**: Set your ScrapeGraph AI API key as an environment variable:
```bash
export SGAI_APIKEY="your-api-key-here"
```

Or create a `.env` file in the project root:
```
SGAI_APIKEY=your-api-key-here
```

2. **Dependencies**: Install required packages:
```bash
npm install
```
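
As a small sanity check before calling the API, a script could read the key and fail early when it is missing. The helper name `getApiKey` below is hypothetical. Note that Node.js does not read `.env` files on its own: the sketch assumes the variable is already in the environment, for example via `export`, a loader such as `dotenv`, or `node --env-file=.env` on Node.js versions that support that flag.

```javascript
/**
 * Hypothetical helper: return the API key from the environment,
 * or throw a descriptive error if it is missing or blank.
 */
function getApiKey(env = process.env) {
  const key = env.SGAI_APIKEY;
  if (typeof key !== 'string' || key.trim() === '') {
    throw new Error('SGAI_APIKEY is not set. Export it or add it to a .env file.');
  }
  return key.trim();
}
```

Accepting `env` as a parameter keeps the check easy to exercise without touching the real environment.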

## 📊 Expected Output

### Basic Sitemap Example Output:
```
🗺️ Extracting sitemap from: https://example.com/
⏳ Please wait...

✅ Sitemap extracted successfully!
📊 Total URLs found: 150

📄 First 10 URLs:
1. https://example.com/
2. https://example.com/about
3. https://example.com/products
...

💾 URLs saved to: sitemap_urls.txt
💾 JSON saved to: sitemap_urls.json
```

### Advanced Example Output:
```
🗺️ Step 1: Extracting sitemap from: https://example.com/
⏳ Please wait...

✅ Sitemap extracted successfully!
📊 Total URLs found: 150

🎯 Selected 3 URLs to scrape:
1. https://example.com/blog/post-1
2. https://example.com/blog/post-2
3. https://example.com/blog/post-3

🤖 Step 2: Scraping selected URLs...

📄 Scraping (1/3): https://example.com/blog/post-1
✅ Success
...

📈 Summary:
✅ Successful: 3
❌ Failed: 0
📊 Total: 3
```

## 💡 Tips

1. **Rate Limiting**: When scraping multiple URLs, add delays between requests to avoid rate limiting
2. **Error Handling**: Always use try/catch blocks to handle API errors gracefully
3. **Filtering**: Use URL patterns to filter specific sections (e.g., `/blog/`, `/products/`)
4. **Batch Size**: Start with a small batch to test before processing hundreds of URLs
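
The rate-limiting and error-handling tips can be combined in one loop, sketched below. Everything here is illustrative: `scrapeSequentially` and `sleep` are hypothetical helpers, `scrapeOne` stands in for whatever function actually performs the request (it is passed in rather than imported, so no particular library signature is assumed), and the 1000 ms default delay is an arbitrary starting point.

```javascript
// Resolve after `ms` milliseconds.
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

/**
 * Hypothetical helper: scrape URLs one at a time, pausing between requests
 * and recording failures instead of aborting the whole batch.
 * `scrapeOne` is any async function that takes a URL and returns data.
 */
async function scrapeSequentially(urls, scrapeOne, delayMs = 1000) {
  const summary = { successful: [], failed: [] };

  for (let i = 0; i < urls.length; i += 1) {
    const url = urls[i];
    try {
      const data = await scrapeOne(url);
      summary.successful.push({ url, data });
    } catch (error) {
      // Keep going: one failed page should not stop the remaining URLs.
      summary.failed.push({ url, message: error.message });
    }
    // Pause between requests, but not after the last one.
    if (i < urls.length - 1) await sleep(delayMs);
  }

  return summary;
}
```

Passing a short list of URLs first, as the batch-size tip suggests, is a cheap way to confirm the delay and error handling behave as expected before a larger run.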
123+
124+
## πŸ”— Related Documentation
125+
126+
- [Sitemap API Documentation](../../README.md#sitemap)
127+
- [SmartScraper Documentation](../../README.md#smart-scraper)
128+
- [ScrapeGraph AI API Docs](https://docs.scrapegraphai.com)
