SCRIBE is an automated monitoring bot that watches Reddit and YouTube, analyzes content with a local AI (Ollama), and generates daily reports. It uses a package-based architecture allowing you to monitor multiple topics independently.
In short: You create a package for your topic → SCRIBE collects → AI analyzes → You receive a Markdown report (+ optional Discord notification).
- Multi-package architecture - Monitor multiple topics independently (AI, cybersecurity, gaming, etc.)
- Multi-source - Reddit and YouTube with customizable sources per package
- Local AI - Uses Ollama (Mistral, Phi4, Llama3, Qwen3...) - no cloud API needed
- Smart deduplication - Semantic detection of similar content (TF-IDF + SimHash)
- Professional reports - Structured Markdown with insights and metrics
- Multiple notification channels - Discord (rich embeds) and Synology Chat webhooks
- SQLite cache - Per-package isolation, avoids reprocessing
- Multilingual - Reports in 11 languages (en, fr, es, de, it, pt, nl, ru, zh, ja, ar)
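
The deduplication feature above pairs TF-IDF with SimHash. As a rough illustration of the SimHash half only, here is a minimal sketch. The tokenizer, the 64-bit fingerprint size, and the Hamming-distance cutoff are assumptions made for the example, not SCRIBE's actual parameters.

```python
import hashlib
import re


def simhash(text: str, bits: int = 64) -> int:
    """Return a SimHash fingerprint; similar texts tend to give nearby fingerprints."""
    weights = [0] * bits
    for token in re.findall(r"\w+", text.lower()):
        # Use a stable hash: Python's built-in hash() is salted per process.
        digest = hashlib.md5(token.encode("utf-8")).digest()
        token_hash = int.from_bytes(digest[: bits // 8], "big")
        for i in range(bits):
            weights[i] += 1 if (token_hash >> i) & 1 else -1
    # Each fingerprint bit is the sign of the accumulated weight for that bit.
    return sum(1 << i for i in range(bits) if weights[i] > 0)


def hamming_distance(a: int, b: int) -> int:
    """Count the bit positions where two fingerprints differ."""
    return bin(a ^ b).count("1")


def is_near_duplicate(a: str, b: str, max_distance: int = 10) -> bool:
    """Treat two texts as duplicates when their fingerprints are close (assumed cutoff)."""
    return hamming_distance(simhash(a), simhash(b)) <= max_distance
```

Comparing fingerprints by Hamming distance is cheap, which is why SimHash is often used as a fast first pass before a more expensive similarity measure.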
┌─────────────┐     ┌─────────────┐     ┌─────────────┐
│   REDDIT    │     │   YOUTUBE   │     │   DISCORD   │
│    subs     │     │ transcripts │     │ + SYNOLOGY  │
└──────┬──────┘     └──────┬──────┘     └──────▲──────┘
       │                   │                   │
       └─────────┬─────────┘                   │
                 ▼                             │
        ┌─────────────────┐                    │
        │   1. COLLECT    │                    │
        │  Posts/Videos   │                    │
        └────────┬────────┘                    │
                 ▼                             │
        ┌─────────────────┐                    │
        │    2. FILTER    │                    │
        │ (SQLite Cache)  │                    │
        └────────┬────────┘                    │
                 ▼                             │
        ┌─────────────────┐                    │
        │ 3. AI ANALYSIS  │                    │
        │      (LLM)      │                    │
        │   Score 1-10    │                    │
        │    Category     │                    │
        │    Insights     │                    │
        └────────┬────────┘                    │
                 ▼                             │
        ┌─────────────────┐                    │
        │ 4. DEDUPLICATE  │                    │
        │   (Semantic)    │                    │
        └────────┬────────┘                    │
                 ▼                             │
        ┌─────────────────┐                    │
        │    5. REPORT    │────────────────────┘
        │   (Markdown)    │
        └─────────────────┘
1. Cache cleanup - 90-day retention policy
2. Data collection - Reddit posts + YouTube videos with transcripts
3. Content preparation - Unified format for all sources
4. Cache filtering - Skip already processed content
5. AI Analysis - Ollama scores relevance (1-10), categorizes, extracts insights
6. Relevance filtering - Keep only score >= threshold (default: 7/10)
7. Deduplicate - Semantic duplicate detection (TF-IDF + SimHash)
8. Report - Generate Markdown + Discord notification with rich embeds
9. Summary - Optional concise daily summary to separate Discord webhook
- Python 3.10+
- Ollama installed and running
- Git (for automatic updates)
# Download from https://ollama.ai/
# Then:
ollama pull qwen3:14b  # or mistral, phi4, llama3

git clone https://github.com/your-repo/SCRIBE.git
cd SCRIBE

Windows - Simple method:

# Double-click setup.bat or run:
setup.bat

This will automatically:
- Create the virtual environment
- Install all dependencies
- Copy .env.example to .env (if needed)
- Run configuration tests to verify everything works
Edit the .env file with your API keys:
# Reddit (https://reddit.com/prefs/apps)
REDDIT_CLIENT_ID=your_id
REDDIT_CLIENT_SECRET=your_secret
REDDIT_USER_AGENT=SCRIBE/1.0
# YouTube (https://console.cloud.google.com/)
YOUTUBE_API_KEY=your_key
# Ollama
OLLAMA_HOST=http://localhost:11434
# Package-specific webhooks (Discord and/or Synology Chat)
DISCORD_AI_TRENDS_WEBHOOK=your_discord_webhook
DISCORD_AI_TRENDS_SUMMARY_WEBHOOK=your_discord_summary_webhook
# Optional: Synology Chat webhooks
SYNOLOGY_AI_TRENDS_WEBHOOK=https://your-synology:5001/webapi/entry.cgi?api=SYNO.Chat.External&method=incoming&version=2&token=xxx
SYNOLOGY_AI_TRENDS_SUMMARY_WEBHOOK=https://your-synology:5001/webapi/entry.cgi?api=SYNO.Chat.External&method=incoming&version=2&token=yyy

Note: Without Reddit/YouTube credentials, the app will still work but without those sources.
# Windows - Simple method:
run_ai_trends.bat
# Or manually:
python main.py --package ai_trends --mode once

| File | Description |
|---|---|
| setup.bat | First-time setup: venv, dependencies, .env, and tests |
| run_ai_trends.bat | Run the AI Trends package |
| test_connections.bat | Run API connection tests (Reddit, YouTube, Ollama, Discord) |
python main.py --list-packages

python main.py --package ai_trends --mode once

The report will be generated in data/ai_trends/reports/.
python main.py --package ai_trends --mode once --lang en # English
python main.py --package ai_trends --mode once --lang fr  # French

python main.py --package ai_trends --mode stats

Shows: processed content, relevance rate, breakdown by source/category.
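
For intuition about what those statistics mean, here is a minimal sketch that computes them from a list of analyzed records. The record fields (`source`, `category`, `score`) and the function name are assumptions for illustration; the real stats mode reads from the package cache.

```python
from collections import Counter


def summarize(records: list[dict], threshold: int = 7) -> dict:
    """Compute processed count, relevance rate, and per-source/category breakdowns."""
    total = len(records)
    relevant = sum(1 for record in records if record["score"] >= threshold)
    return {
        "processed": total,
        # Share of processed items that met the relevance threshold.
        "relevance_rate": relevant / total if total else 0.0,
        "by_source": dict(Counter(record["source"] for record in records)),
        "by_category": dict(Counter(record["category"] for record in records)),
    }
```

A persistently low relevance rate is a hint that the package's sources or prompts are too broad for its topic.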
python tests/test_connections.py

SCRIBE's package system allows you to create independent monitoring configurations for any topic.

cp -r packages/ai_trends packages/cybersecurity

packages/cybersecurity/settings.yaml:
package:
  name: cybersecurity
  display_name: "Cybersecurity Watch"
  description: "Monitor cybersecurity news and threats"

reddit:
  subreddits:
    - netsec
    - cybersecurity
    - hacking
    - ReverseEngineering
  posts_limit: 10
  comments_limit: 5
  timeframe: "day"

youtube:
  keywords:
    - "cybersecurity news"
    - "malware analysis"
  channels:
    - "@JohnHammond"
    - "@LiveOverflow"
  videos_per_source: 5

analysis:
  relevance_threshold: 7
  similarity_threshold: 0.85
  categories:
    - Malware Analysis
    - Vulnerability Research
    - Threat Intelligence
    - Network Security
    - Incident Response

reporting:
  language: "en"
  min_insights: 1

discord:
  enabled: true
  webhook_env: "DISCORD_CYBERSECURITY_WEBHOOK"
  rich_embeds: true
  summary:
    enabled: true
    webhook_env: "DISCORD_CYBERSECURITY_SUMMARY_WEBHOOK"

# Optional: Synology Chat
synology:
  enabled: false
  webhook_env: "SYNOLOGY_CYBERSECURITY_WEBHOOK"

packages/cybersecurity/prompts.yaml:
Adapt the system prompts for your domain:
- relevance_analyzer - How to score content relevance
- insight_extractor - How to extract key insights
- executive_summary - How to write the report summary
- daily_summary - How to write the Discord summary

DISCORD_CYBERSECURITY_WEBHOOK=https://discord.com/api/webhooks/...
DISCORD_CYBERSECURITY_SUMMARY_WEBHOOK=https://discord.com/api/webhooks/...

python main.py --package cybersecurity --mode once

Shared configuration for all packages:
ollama:
  model: "qwen3:14b"  # or mistral, phi4, llama3
  parameters:
    temperature: 0.3
    num_ctx: 32768

Each package has its own:
- Sources - Subreddits, YouTube channels/keywords
- Thresholds - Relevance score, similarity detection
- Categories - Domain-specific classification
- Discord - Separate webhooks per package
SCRIBE supports multiple notification channels that can be enabled independently per package.
Main Notification (Step 8):
- Rich embeds with images (Reddit posts + YouTube thumbnails)
- Detailed insights per category
- Configured via discord.webhook_env in package settings
Summary Notification (Step 9, Optional):
- Concise AI-generated overview (<2000 chars)
- Sent to separate webhook
- Enable in package settings:
discord:
  enabled: true
  webhook_env: "DISCORD_AI_TRENDS_WEBHOOK"
  summary:
    enabled: true
    webhook_env: "DISCORD_AI_TRENDS_SUMMARY_WEBHOOK"

Main Notification (Step 8b):
- Formatted text messages with category, insights, and metadata
- Configured via synology.webhook_env in package settings
Summary Notification (Step 9b, Optional):
- Concise AI-generated overview (same as Discord)
- Sent to separate webhook
Enable Synology Chat in package settings:
synology:
  enabled: true
  webhook_env: "SYNOLOGY_AI_TRENDS_WEBHOOK"
  summary:
    enabled: true
    webhook_env: "SYNOLOGY_AI_TRENDS_SUMMARY_WEBHOOK"

Get your Synology webhook URL: Synology Chat > Integration > Incoming Webhook
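
The two channels expect differently shaped requests. As a rough text-only sketch (no rich embeds), the helpers below build a Discord body as JSON with a `content` field capped at 2000 characters, and a Synology Chat body as a form-encoded `payload` field holding JSON. The function names are hypothetical and this is not SCRIBE's notifier code; check both formats against the services' own documentation before relying on them.

```python
import json
import urllib.parse
import urllib.request

DISCORD_CONTENT_LIMIT = 2000  # Matches the "<2000 chars" summary constraint.


def discord_payload(text: str) -> bytes:
    """JSON body for a Discord webhook; text is truncated to the content limit."""
    return json.dumps({"content": text[:DISCORD_CONTENT_LIMIT]}).encode("utf-8")


def synology_payload(text: str) -> bytes:
    """Form-encoded body for a Synology Chat incoming webhook (assumed format)."""
    form = urllib.parse.urlencode({"payload": json.dumps({"text": text})})
    return form.encode("utf-8")


def post(url: str, body: bytes, content_type: str) -> int:
    """Send the body with a POST request and return the HTTP status code."""
    request = urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": content_type, "User-Agent": "scribe-sketch/0.1"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        return response.status
```

A Discord call would then be `post(url, discord_payload(summary), "application/json")`, and a Synology call `post(url, synology_payload(summary), "application/x-www-form-urlencoded")`.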
Test Synology integration:
python tests/test_synology.py

To run SCRIBE automatically every day:
- Press Win + R, type taskschd.msc, press Enter
- Click Create Basic Task
- Name: "SCRIBE AI Trends" → Next
- Trigger: Daily at your preferred time → Next
- Action: Start a program
  - Program: python
  - Arguments: main.py --package ai_trends --mode once
  - Start in: C:\path\to\SCRIBE
schtasks /create /tn "SCRIBE AI Trends" /tr "python main.py --package ai_trends --mode once" /sc daily /st 08:00

# SCRIBE - CYBERSECURITY INTELLIGENCE REPORT
## 2025-01-15 | 08:00
New ransomware variants targeting critical infrastructure
detected this week, with increased activity from APT groups...
## Malware Analysis
3 insight(s)
### 1. New Ransomware Strain Targets Healthcare
**Source**: Reddit
**Link**: https://reddit.com/r/netsec/...
**Relevance**: 9/10
**Author**: u/malware_analyst
**Insights**: A new ransomware variant has been discovered
targeting healthcare systems with sophisticated evasion...
---
## Threat Intelligence
2 insight(s)
### 1. APT Group Activity Analysis
**Source**: YouTube
**Link**: https://youtube.com/watch?v=...
**Relevance**: 8/10
**Insights**: Detailed breakdown of recent APT campaign...
---
*Report generated by SCRIBE - 5 total insights*

ollama list            # View installed models
ollama pull qwen3:14b  # Install missing model

python main.py --list-packages  # List available packages

Ensure your package directory exists in packages/ and contains a valid settings.yaml.
Check that your .env contains the correct values from https://reddit.com/prefs/apps
YouTube API has a free daily limit. Reduce videos_per_source in your package settings.
If analysis is slow:
- Use a lighter model (phi4 vs qwen3:14b)
- Reduce content limits in package settings

If reports contain too few items:
- Lower relevance_threshold (e.g., 5 instead of 7)
- Increase posts_limit or videos_per_source
- Add more subreddits/channels
All tests are in the tests/ directory:
python tests/test_connections.py # Verify all API connections (Reddit, YouTube, Ollama, Discord, Synology)
python tests/test_discord_split.py # Test Discord message splitting
python tests/test_discord_images.py # Test Discord image embeds
python tests/test_synology.py # Test Synology Chat webhook integration

Per-package logging:
- Application: logs/<package_name>.log
- Raw data: data/<package_name>/raw_logs/

To add a new source:
- Create a collector in src/collectors/ following existing patterns
- Constructor must accept config: dict
- Return a list of dicts with: id, source, title, text, url, timestamp, metadata
- Register it in the SCRIBE class in main.py
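
The collector contract above can be sketched as follows. The class name, the `collect` method name, the `example` config key, and the stubbed data are all hypothetical; follow the existing collectors in src/collectors/ for the real method names and base classes.

```python
from datetime import datetime, timezone

REQUIRED_KEYS = ("id", "source", "title", "text", "url", "timestamp", "metadata")


class ExampleCollector:
    """Hypothetical collector that returns items in the unified dict format."""

    def __init__(self, config: dict):
        # The constructor accepts the package config as a plain dict.
        self.limit = config.get("example", {}).get("items_limit", 5)

    def _fetch_raw(self) -> list[dict]:
        # Stand-in for a real API call; replace with the source's client code.
        return [
            {"slug": "post-1", "headline": "First post", "body": "Body text one"},
            {"slug": "post-2", "headline": "Second post", "body": "Body text two"},
        ]

    def collect(self) -> list[dict]:
        """Return a list of dicts carrying every required key."""
        fetched_at = datetime.now(timezone.utc).isoformat()
        items = []
        for raw in self._fetch_raw()[: self.limit]:
            items.append(
                {
                    "id": f"example_{raw['slug']}",
                    "source": "example",
                    "title": raw["headline"],
                    "text": raw["body"],
                    "url": f"https://example.invalid/{raw['slug']}",
                    "timestamp": fetched_at,
                    "metadata": {"collector": "ExampleCollector"},
                }
            )
        return items
```

Prefixing `id` with the source name is one simple way to keep cache keys unique across collectors.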
- Reddit and YouTube collection
- Local AI analysis (Ollama)
- Semantic deduplication
- Markdown reports
- Discord notifications with rich embeds
- Multi-package architecture
- Per-package Discord webhooks
- Synology Chat webhook support
- Daily summary feature
- Emerging trend detection
The project is designed to be extensible:
- Add packages: packages/<your_topic>/
- Add sources: src/collectors/
- New processors: src/processors/
- Report templates: src/storage/report_generator.py
MIT License - Free to use and modify
