| name | substack-to-remotion |
|---|---|
| description | Scrape Substack articles with images in-context, follow links for visualizations, output Remotion-ready JSON for video generation. Use when user wants to turn an article into a video. |
Scrape Substack articles with images in-context, follow links for additional visualizations, and output Remotion-ready JSON for video generation.
- User wants to turn a Substack article into a video
- User provides a Substack URL and wants to extract content
- Converting written articles into video scripts with visuals
- Extracting article images with their surrounding context preserved
- Gathering visualizations from linked sources (research papers, data)
```bash
# Basic scrape
python3 ~/.claude/skills/substack-to-remotion/scripts/scraper.py "https://example.substack.com/p/article-name"

# With custom output directory
python3 ~/.claude/skills/substack-to-remotion/scripts/scraper.py "URL" ./my-output/

# Follow more links (default is 5)
python3 ~/.claude/skills/substack-to-remotion/scripts/scraper.py "URL" ./output/ --max-links 10
```

- Extracts article content - Title, subtitle, author, sections, paragraphs
- Downloads all images - Preserves their position in the article flow
- Follows links - Crawls linked pages for charts, graphs, data visualizations
- Outputs Remotion JSON - Structured scenes ready for video generation
```
output-directory/
├── remotion-scenes.json    # Main Remotion input file
├── article-data.json       # Full extracted article data
├── article-summary.md      # Human-readable summary
├── linked-sources.json     # Data from followed links
└── images/
    ├── article-000-*.png   # Images from the article
    ├── article-001-*.png
    └── linked/             # Images from followed links
        ├── linked-0-*.png
        └── linked-1-*.png
```
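
A quick way to confirm a run produced the layout above is to check for the expected files before handing the directory to Remotion. This is a minimal sketch: the function name `missing_outputs` is hypothetical, and it assumes a default run writes every file listed. Runs with `--no-links` or `--images-only` may legitimately omit some of them.

```python
from pathlib import Path

# File names taken from the output layout in this document.
EXPECTED_FILES = (
    "remotion-scenes.json",
    "article-data.json",
    "article-summary.md",
    "linked-sources.json",
)


def missing_outputs(output_dir: str) -> list[str]:
    """Return the expected output entries that are absent from output_dir."""
    root = Path(output_dir)
    missing = [name for name in EXPECTED_FILES if not (root / name).is_file()]
    # images/ is checked only as a directory; its contents vary per article.
    if not (root / "images").is_dir():
        missing.append("images/")
    return missing
```

An empty list means the directory matches the documented layout.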
The JSON output contains scenes of these types:
| Type | Description | Use In Remotion |
|---|---|---|
| `title` | Opening title card | Title sequence with article name, author |
| `section-header` | Section heading | Transition card between topics |
| `image-with-context` | Image + surrounding text | Main content - show image with narration |
| `narration` | Text-only content | Voiceover with background/b-roll |
| `quote` | Blockquote | Stylized quote display |
| `list` | Bullet/numbered list | Animated list reveal |
| `linked-source` | External visualization | Show source attribution + image |
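
Before building a composition, it can help to see how many scenes of each type the scraper produced and how long the video would run. The sketch below assumes `remotion-scenes.json` has a top-level `scenes` array whose entries carry `type` and a `duration` in seconds, which matches the examples in this document but is not a full schema. The helper names and the `FPS` constant are illustrative.

```python
import json
from collections import Counter

FPS = 30  # same frame rate as the Remotion example in this document


def load_scenes(path: str) -> dict:
    """Read remotion-scenes.json from disk."""
    with open(path, encoding="utf-8") as fh:
        return json.load(fh)


def summarize_scenes(data: dict) -> dict:
    """Count scenes per type and convert the summed duration hints to frames."""
    scenes = data.get("scenes", [])
    counts = Counter(scene.get("type", "unknown") for scene in scenes)
    total_seconds = sum(scene.get("duration", 0) for scene in scenes)
    return {
        "counts": dict(counts),
        "total_seconds": total_seconds,
        # Remotion expects a whole number of frames.
        "total_frames": round(total_seconds * FPS),
    }
```

For example, `summarize_scenes(load_scenes("./output/remotion-scenes.json"))` gives a quick sanity check on scene mix and total length.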
```json
{
  "id": "scene-006",
  "type": "image-with-context",
  "content": {
    "heading": "The Research Signal",
    "context": "Three independent systematic reviews found consistent evidence...",
    "image": {
      "localPath": "images/article-001-fa981f78.png",
      "alt": "Forest plot showing mortality reduction",
      "caption": "Meta-analysis results"
    }
  },
  "duration": 6
}
```

After scraping, use the output with Remotion:
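```tsx
// In your Remotion project
import React from 'react';
import { Composition } from 'remotion';
import sceneData from './scraped-article/remotion-scenes.json';
// ArticleSequence is your own component that renders the scenes array.
// The import path below is an assumption; adjust it to your project.
import { ArticleSequence } from './ArticleSequence';

const FPS = 30;

export const ArticleVideo: React.FC = () => {
  const { scenes } = sceneData;
  // Assumes totalDuration is in seconds; Remotion needs an integer frame count.
  const durationInFrames = Math.round(sceneData.totalDuration * FPS);
  return (
    <Composition
      id="ArticleVideo"
      component={ArticleSequence}
      durationInFrames={durationInFrames}
      fps={FPS}
      width={1920}
      height={1080}
      defaultProps={{ scenes }}
    />
  );
};
```

| Option | Description | Default |
|---|---|---|
| `--max-links N` | Maximum linked pages to crawl | 5 |
| `--no-links` | Don't follow any links | false |
| `--images-only` | Only download images, skip JSON | false |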
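
When the scraper is driven from another script, the options above can be assembled into an argument list. The following is a sketch only: `build_command` and `run_scraper` are hypothetical helpers, not part of the skill, and passing flags without an output directory is an assumption about how the CLI parses its arguments.

```python
import os
import subprocess

SCRAPER = "~/.claude/skills/substack-to-remotion/scripts/scraper.py"


def build_command(url, output_dir=None, max_links=None,
                  follow_links=True, images_only=False):
    """Assemble the argv list for one scraper run (nothing is executed here)."""
    if max_links is not None and not follow_links:
        raise ValueError("--max-links has no effect together with --no-links")
    # subprocess does not expand "~", so resolve it explicitly.
    cmd = ["python3", os.path.expanduser(SCRAPER), url]
    if output_dir is not None:
        cmd.append(output_dir)
    if not follow_links:
        cmd.append("--no-links")
    elif max_links is not None:
        cmd += ["--max-links", str(max_links)]
    if images_only:
        cmd.append("--images-only")
    return cmd


def run_scraper(**kwargs):
    """Run the scraper and raise CalledProcessError on a non-zero exit."""
    return subprocess.run(build_command(**kwargs), check=True)
```

Keeping command construction separate from execution makes the flag handling easy to check without launching a browser.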
- Python 3.9+
- Playwright (`pip install playwright && playwright install chromium`)
- aiohttp (`pip install aiohttp`)
The scraper filters links to focus on content-relevant sources:
Follows:
- Research papers and studies
- News articles
- Documentation pages
- Data sources
Skips:
- Social media (Twitter, Facebook, etc.)
- YouTube (can't extract without API)
- Subscription/login pages
- Navigation links
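
The follow/skip rules above could be approximated with a small URL predicate. This is an illustrative sketch, not the scraper's actual logic: the domain set, the path hints, and the "same host means navigation" heuristic are all assumptions chosen to mirror the categories listed.

```python
from urllib.parse import urlparse

# Hypothetical rule sets; the scraper's real lists are not shown in this document.
SKIP_DOMAINS = {"twitter.com", "x.com", "facebook.com", "instagram.com",
                "youtube.com", "youtu.be"}
SKIP_PATH_HINTS = ("/subscribe", "/login", "/signin")


def _host(url: str) -> str:
    """Lower-cased hostname without a leading 'www.'."""
    host = (urlparse(url).hostname or "").lower()
    return host[4:] if host.startswith("www.") else host


def should_follow(url: str, article_url: str) -> bool:
    """Return True if a link looks like a content source worth crawling."""
    parsed = urlparse(url)
    if parsed.scheme not in ("http", "https"):
        return False  # mailto:, javascript:, in-page anchors
    host = _host(url)
    if any(host == d or host.endswith("." + d) for d in SKIP_DOMAINS):
        return False  # social media and video hosts
    if any(hint in parsed.path.lower() for hint in SKIP_PATH_HINTS):
        return False  # subscription and login pages
    if host == _host(article_url):
        return False  # treat same-site links as navigation
    return True
```

Anything that passes every check is treated as a candidate source, so the list of followed categories does not need to be enumerated.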
- For long articles: The scraper handles pagination automatically
- For paywalled content: You may need to be logged in, which the scraper does not yet support
- For video creation: Use `duration` hints but adjust based on actual content
- For narration: The `context` field contains good voiceover text
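
One way to act on the last two tips is to stretch a scene's duration when its `context` text would take longer to read aloud than the hint allows, and to join the `context` fields into a voiceover draft. The sketch below assumes a speaking rate of about 150 words per minute and the scene shape shown in the example scene; the function names are hypothetical.

```python
WORDS_PER_SECOND = 2.5  # assumed speaking rate, roughly 150 words per minute


def _context(scene: dict) -> str:
    """Return the scene's context text, or an empty string if absent."""
    return ((scene.get("content") or {}).get("context") or "").strip()


def narration_seconds(text: str) -> float:
    """Estimate how long it takes to read text aloud."""
    return len(text.split()) / WORDS_PER_SECOND


def adjusted_duration(scene: dict) -> float:
    """Keep the scraper's duration hint unless the narration needs longer."""
    return max(scene.get("duration", 0), narration_seconds(_context(scene)))


def voiceover_script(scenes: list) -> str:
    """Join the non-empty context fields into one draft script."""
    parts = [_context(scene) for scene in scenes]
    return "\n\n".join(part for part in parts if part)
```

The estimate is deliberately rough; once real audio exists, its measured length is a better source for scene durations.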