The Docker Compose file defines the garage-meta and garage-data volumes as external volumes, so they need to be created manually once before starting the stack:
docker volume create garage-meta
docker volume create garage-data
docker compose build hn-producer
docker compose up -d
Before launching a notebook, don't forget to create your access key, secret key and buckets in the Garage UI!
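Once the key and bucket exist, a notebook has to pass them to Spark. The sketch below builds the standard Hadoop S3A settings for a Garage endpoint. It assumes the Compose service is reachable as `garage` on Garage's default S3 API port 3900. The `GARAGE_ACCESS_KEY` / `GARAGE_SECRET_KEY` environment variable names are hypothetical and are not defined by this stack.

```python
import os


def garage_s3a_conf(endpoint: str = "http://garage:3900") -> dict[str, str]:
    """Build Spark S3A settings that point at a Garage S3 endpoint.

    Assumptions: the default endpoint (service name + port 3900) and the
    GARAGE_ACCESS_KEY / GARAGE_SECRET_KEY variable names are hypothetical.
    """
    access_key = os.environ.get("GARAGE_ACCESS_KEY")
    secret_key = os.environ.get("GARAGE_SECRET_KEY")
    if not access_key or not secret_key:
        raise RuntimeError(
            "Create a key in the Garage UI, then export "
            "GARAGE_ACCESS_KEY and GARAGE_SECRET_KEY"
        )
    return {
        "spark.hadoop.fs.s3a.impl": "org.apache.hadoop.fs.s3a.S3AFileSystem",
        "spark.hadoop.fs.s3a.endpoint": endpoint,
        "spark.hadoop.fs.s3a.access.key": access_key,
        "spark.hadoop.fs.s3a.secret.key": secret_key,
        # Path-style URLs (endpoint/bucket/key) avoid needing DNS per bucket
        "spark.hadoop.fs.s3a.path.style.access": "true",
        # Plain HTTP inside the local Compose network
        "spark.hadoop.fs.s3a.connection.ssl.enabled": "false",
    }
```

In a notebook, each pair could be passed to `SparkSession.builder.config(key, value)` before `getOrCreate()`. The `hadoop-aws` package must also be on Spark's classpath for `s3a://` paths to resolve.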
Open in browser:
- http://localhost:3909/
- http://localhost:8080/ui
- http://localhost:8082
  - View topics: hn-stories, hn-comments
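The producer splits Hacker News items between these two topics. Here is a minimal sketch of that routing. It assumes items are routed on the HN `type` field and serialized as JSON keyed by item id; `route_item` and `TOPIC_BY_TYPE` are hypothetical names, and the real hn-producer may behave differently.

```python
import json
from typing import Optional

# Topic names come from this README; the type-based routing is an assumption
TOPIC_BY_TYPE = {"story": "hn-stories", "comment": "hn-comments"}


def route_item(item: dict) -> Optional[tuple[str, bytes, bytes]]:
    """Map an HN API item to (topic, key, value) for a Kafka producer.

    Returns None for item types this pipeline does not ingest (job, poll...).
    """
    topic = TOPIC_BY_TYPE.get(item.get("type"))
    if topic is None:
        return None
    # Keying by item id sends every update of an item to the same partition
    key = str(item["id"]).encode("utf-8")
    value = json.dumps(item, separators=(",", ":")).encode("utf-8")
    return topic, key, value
```

With a client such as kafka-python, each tuple maps onto `producer.send(topic, key=key, value=value)`. Which Kafka client hn-producer actually uses is not stated here.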
jupyter notebook explore_data.ipynb

HN API → Kafka Producer → Kafka Topics
        ↓
┌────────────────┐
│  BRONZE Layer  │ ← Spark + Delta Lake
│   (Raw Data)   │   • Kafka → Delta
└────────────────┘   • ACID writes
        ↓
┌────────────────┐
│  SILVER Layer  │ ← Spark + Delta Lake
│  (Clean Data)  │   • HTML cleaning
└────────────────┘   • Quality scoring
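The Silver layer's HTML cleaning is needed because the HN API returns `text` as HTML. Paragraphs are separated by `<p>`, links are `<a>` tags, and characters are escaped as entities such as `&#x27;`. Below is a stdlib-only sketch of such a cleaner. The name `clean_hn_html` is hypothetical. Inside Spark, the same logic would typically run as a UDF or be approximated with `regexp_replace`.

```python
import html
import re
from typing import Optional

_PARA = re.compile(r"<p\s*/?>", re.IGNORECASE)  # HN's paragraph separator
_TAG = re.compile(r"<[^>]+>")                    # any other tag
_SPACES = re.compile(r"[ \t]+")


def clean_hn_html(text_raw: Optional[str]) -> str:
    """Convert HN item HTML into plain text (a sketch, not the pipeline's code)."""
    if not text_raw:
        return ""
    # Keep paragraph structure as blank lines
    text = _PARA.sub("\n\n", text_raw)
    # Drop remaining tags (<a>, <i>, <pre>, <code>, </p>...) but keep inner text
    text = _TAG.sub("", text)
    # Decode entities only after tags are gone
    text = html.unescape(text)
    # Collapse runs of spaces/tabs on each line, then trim the whole text
    lines = [_SPACES.sub(" ", line).strip() for line in text.split("\n")]
    return "\n".join(lines).strip()
```

Tags are stripped before entities are decoded. That way, text escaped on purpose (for example `&lt;b&gt;` in a comment about HTML) survives as a literal `<b>` and is not removed as a tag.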
Bronze layer columns:
- Stories: id, by, title, url, score, descendants, time, type, text, kids, _kafka_offset, _kafka_partition, _bronze_ingested_at
- Comments: id, by, parent, story_id, text, time, type, kids, deleted, dead, _kafka_offset, _kafka_partition, _bronze_ingested_at
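Each Bronze row is the raw HN item plus three bookkeeping columns. The sketch below reproduces that shape in plain Python for a single Kafka message. `to_bronze_row` and `BRONZE_FIELDS` are hypothetical names. In a Spark job, the Kafka source already exposes `value`, `partition` and `offset` columns, so the equivalent would presumably be `from_json` on `value` plus `current_timestamp()` for the ingestion time.

```python
import json
from datetime import datetime, timezone
from typing import Optional

# Column lists copied from the Bronze schema above
BRONZE_FIELDS = {
    "story": ["id", "by", "title", "url", "score", "descendants",
              "time", "type", "text", "kids"],
    "comment": ["id", "by", "parent", "story_id", "text", "time",
                "type", "kids", "deleted", "dead"],
}


def to_bronze_row(value: bytes, partition: int, offset: int,
                  ingested_at: Optional[datetime] = None) -> dict:
    """Turn one Kafka message value (HN item JSON) into a Bronze row."""
    item = json.loads(value)
    fields = BRONZE_FIELDS[item["type"]]
    # Fields the HN API omitted (e.g. url on a text post) become None
    row = {name: item.get(name) for name in fields}
    # Kafka coordinates let each row be traced back to its source message
    row["_kafka_offset"] = offset
    row["_kafka_partition"] = partition
    row["_bronze_ingested_at"] = ingested_at or datetime.now(timezone.utc)
    return row
```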
Silver layer columns:
- Stories: id, author, title, url, score, comment_count, timestamp, text_raw, text_clean, has_url, has_text, type
- Comments: id, author, story_id, parent, timestamp, text_raw, text_clean, has_text, word_count, char_count, has_replies, is_deleted, is_dead, quality_score, type
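Most Silver comment columns are derived from Bronze ones. They come from renames (`by` → `author`, `time` → `timestamp`), flags (`deleted` → `is_deleted`, `kids` → `has_replies`) and text statistics. A sketch of that mapping follows, with its assumptions stated:

- `timestamp` is a UTC datetime built from the Unix seconds in `time`.
- `char_count` counts the characters of `text_clean`.
- The `quality_score` formula is entirely hypothetical, because this README does not define it.

The sketch also uses a deliberately minimal cleaner rather than paragraph-aware HTML handling.

```python
import html
import re
from datetime import datetime, timezone

_TAG = re.compile(r"<[^>]+>")


def to_silver_comment(bronze: dict) -> dict:
    """Map a Bronze comment row to the Silver comment columns (sketch)."""
    text_raw = bronze.get("text")
    # Minimal cleaning: tags -> spaces, decode entities, collapse whitespace
    text_clean = " ".join(html.unescape(_TAG.sub(" ", text_raw or "")).split())
    words = text_clean.split()
    is_deleted = bool(bronze.get("deleted"))
    is_dead = bool(bronze.get("dead"))
    has_replies = bool(bronze.get("kids"))

    # HYPOTHETICAL score in [0, 1]: length (capped at 50 words) weighs 0.7,
    # having replies weighs 0.3; removed or empty comments score 0
    if is_deleted or is_dead or not words:
        quality_score = 0.0
    else:
        quality_score = round(
            0.7 * min(len(words) / 50, 1.0) + (0.3 if has_replies else 0.0), 3
        )

    # HN "time" is Unix seconds; assumed to be stored as a UTC datetime
    ts = bronze.get("time")
    return {
        "id": bronze["id"],
        "author": bronze.get("by"),
        "story_id": bronze.get("story_id"),
        "parent": bronze.get("parent"),
        "timestamp": datetime.fromtimestamp(ts, tz=timezone.utc) if ts is not None else None,
        "text_raw": text_raw,
        "text_clean": text_clean,
        "has_text": bool(text_clean),
        "word_count": len(words),
        "char_count": len(text_clean),
        "has_replies": has_replies,
        "is_deleted": is_deleted,
        "is_dead": is_dead,
        "quality_score": quality_score,
        "type": bronze.get("type"),
    }
```

The story mapping would follow the same pattern, with `descendants` → `comment_count` and `has_url` derived from `url`.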