name: substack-to-remotion
description: Scrape Substack articles with images in-context, follow links for visualizations, output Remotion-ready JSON for video generation. Use when user wants to turn an article into a video.

Substack to Remotion

Scrape Substack articles with images in-context, follow links for additional visualizations, and output Remotion-ready JSON for video generation.

When to Use

  • User wants to turn a Substack article into a video
  • User provides a Substack URL and wants to extract content
  • Converting written articles into video scripts with visuals
  • Extracting article images with their surrounding context preserved
  • Gathering visualizations from linked sources (research papers, data)

Quick Start

# Basic scrape
python3 ~/.claude/skills/substack-to-remotion/scripts/scraper.py "https://example.substack.com/p/article-name"

# With custom output directory
python3 ~/.claude/skills/substack-to-remotion/scripts/scraper.py "URL" ./my-output/

# Follow more links (default is 5)
python3 ~/.claude/skills/substack-to-remotion/scripts/scraper.py "URL" ./output/ --max-links 10

What It Does

  1. Extracts article content - Title, subtitle, author, sections, paragraphs
  2. Downloads all images - Preserves their position in the article flow
  3. Follows links - Crawls linked pages for charts, graphs, data visualizations
  4. Outputs Remotion JSON - Structured scenes ready for video generation

Output Structure

output-directory/
├── remotion-scenes.json      # Main Remotion input file
├── article-data.json         # Full extracted article data
├── article-summary.md        # Human-readable summary
├── linked-sources.json       # Data from followed links
└── images/
    ├── article-000-*.png     # Images from the article
    ├── article-001-*.png
    └── linked/               # Images from followed links
        ├── linked-0-*.png
        └── linked-1-*.png
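
To confirm a run produced the layout above, a quick check along these lines may help. This is a minimal sketch: `missing_outputs` is a hypothetical helper, not part of the scraper, and it assumes all four top-level files are written on a default run (with `--images-only` the JSON files are expected to be absent).

```python
from pathlib import Path

# File names taken from the output tree above.
EXPECTED_FILES = [
    "remotion-scenes.json",
    "article-data.json",
    "article-summary.md",
    "linked-sources.json",
]


def missing_outputs(output_dir):
    """Return the expected entries that are absent from output_dir."""
    root = Path(output_dir)
    missing = [name for name in EXPECTED_FILES if not (root / name).is_file()]
    if not (root / "images").is_dir():
        missing.append("images/")
    return missing
```

Calling `missing_outputs("./my-output/")` returns an empty list when every expected entry is present.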

Remotion Scene Types

The JSON output contains scenes of these types:

| Type | Description | Use in Remotion |
| --- | --- | --- |
| title | Opening title card | Title sequence with article name, author |
| section-header | Section heading | Transition card between topics |
| image-with-context | Image + surrounding text | Main content: show image with narration |
| narration | Text-only content | Voiceover with background/b-roll |
| quote | Blockquote | Stylized quote display |
| list | Bullet/numbered list | Animated list reveal |
| linked-source | External visualization | Show source attribution + image |
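
Before wiring scenes into components, it can be useful to see which types a given article actually produced. The sketch below is illustrative only: `summarize_scene_types` is a hypothetical helper, and it assumes each scene is a dict with a `type` key, as in the scene JSON this document shows.

```python
from collections import Counter

# The seven scene types listed in the table above.
KNOWN_SCENE_TYPES = {
    "title",
    "section-header",
    "image-with-context",
    "narration",
    "quote",
    "list",
    "linked-source",
}


def summarize_scene_types(scenes):
    """Count scenes per type and list any types not in the table."""
    counts = Counter(scene.get("type", "<missing>") for scene in scenes)
    unknown = sorted(t for t in counts if t not in KNOWN_SCENE_TYPES)
    return counts, unknown
```

An empty `unknown` list means every scene maps to one of the documented types.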

Example Remotion Scene

{
  "id": "scene-006",
  "type": "image-with-context",
  "content": {
    "heading": "The Research Signal",
    "context": "Three independent systematic reviews found consistent evidence...",
    "image": {
      "localPath": "images/article-001-fa981f78.png",
      "alt": "Forest plot showing mortality reduction",
      "caption": "Meta-analysis results"
    }
  },
  "duration": 6
}
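
The `duration` field can be turned into frame offsets for sequencing. This is a rough sketch that assumes `duration` is in seconds and uses 30 fps; `scene_frame_ranges` and the `scene-005` entry are hypothetical, added only for illustration.

```python
def scene_frame_ranges(scenes, fps=30):
    """Return (scene_id, start_frame, frame_count) for each scene, in order."""
    ranges = []
    start = 0
    for scene in scenes:
        # Assumption: duration is expressed in seconds.
        frames = round(scene["duration"] * fps)
        ranges.append((scene["id"], start, frames))
        start += frames
    return ranges


example = [
    {"id": "scene-005", "type": "section-header", "duration": 3},  # hypothetical
    {"id": "scene-006", "type": "image-with-context", "duration": 6},
]
print(scene_frame_ranges(example))
# prints [('scene-005', 0, 90), ('scene-006', 90, 180)]
```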

Using with Remotion

After scraping, use the output with Remotion:

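// In your Remotion project
import React from 'react';
import { Composition } from 'remotion';
// Importing JSON requires "resolveJsonModule": true in tsconfig.json
import sceneData from './scraped-article/remotion-scenes.json';
// ArticleSequence is your own component that renders the scenes array
// (the import path here is a placeholder)
import { ArticleSequence } from './ArticleSequence';

const FPS = 30;

export const ArticleVideo: React.FC = () => {
  const { scenes } = sceneData;

  // durationInFrames must be an integer; totalDuration is treated as seconds
  return (
    <Composition
      id="ArticleVideo"
      component={ArticleSequence}
      durationInFrames={Math.round(sceneData.totalDuration * FPS)}
      fps={FPS}
      width={1920}
      height={1080}
      defaultProps={{ scenes }}
    />
  );
};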

Options

| Option | Description | Default |
| --- | --- | --- |
| --max-links N | Maximum linked pages to crawl | 5 |
| --no-links | Don't follow any links | false |
| --images-only | Only download images, skip JSON | false |
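
When calling the scraper from another script, the options can be assembled into an argument list. The following is a sketch under assumptions: `build_scraper_command` is a hypothetical wrapper, and it places flags after the output directory as in the Quick Start examples. Whether flags are accepted without an output directory is not confirmed here.

```python
from pathlib import Path

# Script location as used in the Quick Start commands.
SCRAPER = Path.home() / ".claude/skills/substack-to-remotion/scripts/scraper.py"


def build_scraper_command(url, output_dir, max_links=None,
                          no_links=False, images_only=False):
    """Build an argv list suitable for subprocess.run."""
    cmd = ["python3", str(SCRAPER), url, str(output_dir)]
    if no_links:
        cmd.append("--no-links")
    elif max_links is not None:
        cmd += ["--max-links", str(max_links)]
    if images_only:
        cmd.append("--images-only")
    return cmd
```

The result can be passed to `subprocess.run(cmd, check=True)` so that a failed scrape raises an exception.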

Requirements

  • Python 3.9+
  • Playwright (pip install playwright && playwright install chromium)
  • aiohttp (pip install aiohttp)

Link Filtering

The scraper filters links so that only content-relevant sources are crawled:

Follows:

  • Research papers and studies
  • News articles
  • Documentation pages
  • Data sources

Skips:

  • Social media (Twitter, Facebook, etc.)
  • YouTube (video content can't be extracted without an API)
  • Subscription/login pages
  • Navigation links
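
The skip rules above could be approximated as follows. This is not the scraper's actual logic: `should_follow`, the domain set, and the path hints are all assumptions chosen to mirror the categories listed, and navigation links would need page context that a URL alone does not provide.

```python
from urllib.parse import urlparse

# Hypothetical lists; the real scraper's rules may differ.
SKIP_DOMAINS = {"twitter.com", "x.com", "facebook.com", "youtube.com", "youtu.be"}
SKIP_PATH_HINTS = ("/subscribe", "/login", "/signin", "/account")


def should_follow(url):
    """Return True if the URL looks like a content source worth crawling."""
    parsed = urlparse(url)
    if parsed.scheme not in ("http", "https"):
        return False
    host = (parsed.hostname or "").lower()
    if host.startswith("www."):
        host = host[4:]
    # Match the domain itself and any of its subdomains.
    if any(host == d or host.endswith("." + d) for d in SKIP_DOMAINS):
        return False
    path = parsed.path.lower()
    return not any(hint in path for hint in SKIP_PATH_HINTS)
```

A filter like this would typically run before the `--max-links` cap is applied, so skipped links do not count toward the limit.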

Tips

  1. For long articles: The scraper handles pagination automatically
  2. For paywalled content: Logged-in scraping is not yet supported, so paywalled articles may not be fully extracted
  3. For video creation: Treat each scene's duration as a hint and adjust it to the actual content
  4. For narration: The context field works well as a starting point for voiceover text
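
One way to adjust durations is to estimate how long the narration takes to read aloud. This is a rough sketch: `narration_seconds` is a hypothetical helper, and the 150 words-per-minute rate and 3-second floor are assumed values to tune for your voiceover.

```python
def narration_seconds(text, words_per_minute=150, minimum=3.0):
    """Estimate spoken duration in seconds, never below `minimum`."""
    words = len(text.split())
    return max(minimum, round(words * 60 / words_per_minute, 1))


context = "Three independent systematic reviews found consistent evidence..."
print(narration_seconds(context))
# prints 3.0
```

If the estimate exceeds a scene's `duration`, lengthening the scene is usually safer than speeding up the narration.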