SCRIBE Logo

Source Content Retrieval and Intelligence Bot Engine

Automated Multi-Topic Intelligence Gathering System


What is SCRIBE?

SCRIBE is an automated monitoring bot that watches Reddit and YouTube, analyzes the collected content with a local LLM (via Ollama), and generates daily reports. Its package-based architecture lets you monitor multiple topics independently.

In short: You create a package for your topic → SCRIBE collects → AI analyzes → You receive a Markdown report (+ optional Discord notification).


Key Features

  • Multi-package architecture - Monitor multiple topics independently (AI, cybersecurity, gaming, etc.)
  • Multi-source - Reddit and YouTube with customizable sources per package
  • Local AI - Uses Ollama (Mistral, Phi4, Llama3, Qwen3...) - no cloud API needed
  • Smart deduplication - Semantic detection of similar content (TF-IDF + SimHash)
  • Professional reports - Structured Markdown with insights and metrics
  • Multiple notification channels - Discord (rich embeds) and Synology Chat webhooks
  • SQLite cache - Per-package isolation, avoids reprocessing
  • Multilingual - Reports in 11 languages (en, fr, es, de, it, pt, nl, ru, zh, ja, ar)

How Does It Work?

┌─────────────┐     ┌─────────────┐     ┌─────────────┐
│   REDDIT    │     │   YOUTUBE   │     │   DISCORD   │
│    subs     │     │ transcripts │     │  + SYNOLOGY │
└──────┬──────┘     └──────┬──────┘     └──────▲──────┘
       │                   │                   │
       └─────────┬─────────┘                   │
                 ▼                             │
        ┌─────────────────┐                    │
        │   1. COLLECT    │                    │
        │   Posts/Videos  │                    │
        └────────┬────────┘                    │
                 ▼                             │
        ┌─────────────────┐                    │
        │   2. FILTER     │                    │
        │  (SQLite Cache) │                    │
        └────────┬────────┘                    │
                 ▼                             │
        ┌─────────────────┐                    │
        │  3. AI ANALYSIS │                    │
        │      (LLM)      │                    │
        │  Score 1-10     │                    │
        │  Category       │                    │
        │  Insights       │                    │
        └────────┬────────┘                    │
                 ▼                             │
        ┌─────────────────┐                    │
        │ 4. DEDUPLICATE  │                    │
        │   (Semantic)    │                    │
        └────────┬────────┘                    │
                 ▼                             │
        ┌─────────────────┐                    │
        │   5. REPORT     │────────────────────┘
        │   (Markdown)    │
        └─────────────────┘

The 9-Step Pipeline

  1. Cache cleanup - 90-day retention policy
  2. Data collection - Reddit posts + YouTube videos with transcripts
  3. Content preparation - Unified format for all sources
  4. Cache filtering - Skip already processed content
  5. AI Analysis - Ollama scores relevance (1-10), categorizes, extracts insights
  6. Relevance filtering - Keep only score >= threshold (default: 7/10)
  7. Deduplication - Semantic duplicate detection (TF-IDF + SimHash)
  8. Report - Generate Markdown + Discord notification with rich embeds
  9. Summary - Optional concise daily summary to separate Discord webhook
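
The nine steps above reduce to a fairly small orchestration loop. The sketch below is a minimal illustration of steps 4-7 in plain Python; the function and argument names are assumptions for readability, not SCRIBE's actual classes:

def filter_and_analyze(collected, cache, analyze, is_duplicate, threshold=7):
    """Minimal illustration of pipeline steps 4-7.

    collected:    list of dicts in the unified format (step 3)
    cache:        set of ids that were already processed
    analyze:      callable returning {'score': int, 'category': str, 'insights': str}
    is_duplicate: callable deciding whether two items are semantically the same
    """
    fresh = [item for item in collected if item["id"] not in cache]      # 4. cache filter
    analyzed = [{**item, **analyze(item)} for item in fresh]             # 5. AI analysis
    relevant = [a for a in analyzed if a["score"] >= threshold]          # 6. relevance filter
    unique = []                                                          # 7. deduplication
    for item in relevant:
        if not any(is_duplicate(item, kept) for kept in unique):
            unique.append(item)
    cache.update(item["id"] for item in analyzed)
    return unique   # steps 8-9 turn this list into the report and summaries

# Toy usage with stand-in callables:
items = [{"id": "abc123", "source": "reddit", "title": "New model released", "text": "..."}]
print(filter_and_analyze(items, set(),
                         lambda i: {"score": 8, "category": "Models", "insights": "..."},
                         lambda a, b: a["title"] == b["title"]))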

Quick Installation

Prerequisites

  • Python 3.10+
  • Ollama installed and running
  • Git (for automatic updates)

1. Install Ollama

# Download from https://ollama.ai/
# Then:
ollama pull qwen3:14b  # or mistral, phi4, llama3

2. Clone and First-Time Setup (Recommended)

git clone https://github.com/hydropix/SCRIBE.git
cd SCRIBE

Windows - Simple method:

# Double-click setup.bat or run:
setup.bat

This will automatically:

  • Create the virtual environment
  • Install all dependencies
  • Copy .env.example to .env (if needed)
  • Run configuration tests to verify everything works

3. Configure Your Credentials

Edit the .env file with your API keys:

# Reddit (https://reddit.com/prefs/apps)
REDDIT_CLIENT_ID=your_id
REDDIT_CLIENT_SECRET=your_secret
REDDIT_USER_AGENT=SCRIBE/1.0

# YouTube (https://console.cloud.google.com/)
YOUTUBE_API_KEY=your_key

# Ollama
OLLAMA_HOST=http://localhost:11434

# Package-specific webhooks (Discord and/or Synology Chat)
DISCORD_AI_TRENDS_WEBHOOK=your_discord_webhook
DISCORD_AI_TRENDS_SUMMARY_WEBHOOK=your_discord_summary_webhook

# Optional: Synology Chat webhooks
SYNOLOGY_AI_TRENDS_WEBHOOK=https://your-synology:5001/webapi/entry.cgi?api=SYNO.Chat.External&method=incoming&version=2&token=xxx
SYNOLOGY_AI_TRENDS_SUMMARY_WEBHOOK=https://your-synology:5001/webapi/entry.cgi?api=SYNO.Chat.External&method=incoming&version=2&token=yyy

Note: Without Reddit/YouTube credentials, the app will still work but without those sources.

4. Run the AI Trends Package

# Windows - Simple method:
run_ai_trends.bat

# Or manually:
python main.py --package ai_trends --mode once

Available Batch Files

  • setup.bat - First-time setup: venv, dependencies, .env, and tests
  • run_ai_trends.bat - Run the AI Trends package
  • test_connections.bat - Run API connection tests (Reddit, YouTube, Ollama, Discord)

Usage

List Available Packages

python main.py --list-packages

Run a Specific Package

python main.py --package ai_trends --mode once

The report will be generated in data/ai_trends/reports/.

Run with Specific Language

python main.py --package ai_trends --mode once --lang en  # English
python main.py --package ai_trends --mode once --lang fr  # French

View Package Statistics

python main.py --package ai_trends --mode stats

Shows: processed content, relevance rate, breakdown by source/category.

Verify API Connections

python tests/test_connections.py

Creating a New Package

SCRIBE's package system allows you to create independent monitoring configurations for any topic.

1. Copy an Existing Package

cp -r packages/ai_trends packages/cybersecurity

2. Edit Package Settings

packages/cybersecurity/settings.yaml:

package:
  name: cybersecurity
  display_name: "Cybersecurity Watch"
  description: "Monitor cybersecurity news and threats"

reddit:
  subreddits:
    - netsec
    - cybersecurity
    - hacking
    - ReverseEngineering
  posts_limit: 10
  comments_limit: 5
  timeframe: "day"

youtube:
  keywords:
    - "cybersecurity news"
    - "malware analysis"
  channels:
    - "@JohnHammond"
    - "@LiveOverflow"
  videos_per_source: 5

analysis:
  relevance_threshold: 7
  similarity_threshold: 0.85

categories:
  - Malware Analysis
  - Vulnerability Research
  - Threat Intelligence
  - Network Security
  - Incident Response

reporting:
  language: "en"
  min_insights: 1

discord:
  enabled: true
  webhook_env: "DISCORD_CYBERSECURITY_WEBHOOK"
  rich_embeds: true
  summary:
    enabled: true
    webhook_env: "DISCORD_CYBERSECURITY_SUMMARY_WEBHOOK"

# Optional: Synology Chat
synology:
  enabled: false
  webhook_env: "SYNOLOGY_CYBERSECURITY_WEBHOOK"

3. Customize LLM Prompts

packages/cybersecurity/prompts.yaml:

Adapt the system prompts for your domain:

  • relevance_analyzer - How to score content relevance
  • insight_extractor - How to extract key insights
  • executive_summary - How to write the report summary
  • daily_summary - How to write the Discord summary
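
After editing, a quick way to sanity-check that the expected keys are present (this assumes prompts.yaml is a flat mapping of these keys to prompt strings, which may not match the package's exact layout):

import yaml  # requires PyYAML

with open("packages/cybersecurity/prompts.yaml", encoding="utf-8") as f:
    prompts = yaml.safe_load(f)

# Report which of the four prompt roles are defined in the file.
for key in ("relevance_analyzer", "insight_extractor", "executive_summary", "daily_summary"):
    print(f"{key}: {'present' if key in prompts else 'MISSING'}")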

4. Add Webhooks to .env

DISCORD_CYBERSECURITY_WEBHOOK=https://discord.com/api/webhooks/...
DISCORD_CYBERSECURITY_SUMMARY_WEBHOOK=https://discord.com/api/webhooks/...

5. Run Your New Package

python main.py --package cybersecurity --mode once

Configuration

Global Settings (config/global.yaml)

Shared configuration for all packages:

ollama:
  model: "qwen3:14b"  # or mistral, phi4, llama3
  parameters:
    temperature: 0.3
    num_ctx: 32768
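
These settings map directly onto the options Ollama accepts on its local REST API. A minimal sketch of an analysis call using those parameters (the prompt and response handling are assumptions, not SCRIBE's actual analyzer code):

import requests

def score_content(title: str, text: str, host: str = "http://localhost:11434") -> str:
    # Ask the local Ollama server to rate the content; options mirror config/global.yaml.
    response = requests.post(
        f"{host}/api/chat",
        json={
            "model": "qwen3:14b",
            "messages": [
                {"role": "system", "content": "Rate the relevance of this content from 1 to 10."},
                {"role": "user", "content": f"{title}\n\n{text}"},
            ],
            "options": {"temperature": 0.3, "num_ctx": 32768},
            "stream": False,
        },
        timeout=300,
    )
    response.raise_for_status()
    return response.json()["message"]["content"]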

Package Settings (packages/<package_name>/settings.yaml)

Each package has its own:

  • Sources - Subreddits, YouTube channels/keywords
  • Thresholds - Relevance score and similarity detection (see the sketch below)
  • Categories - Domain-specific classification
  • Discord - Separate webhooks per package
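
As a rough illustration of how the similarity threshold is applied, here is a minimal SimHash-style comparison; SCRIBE's actual TF-IDF weighting and tokenization will differ:

import hashlib
from collections import Counter

def simhash(text: str, bits: int = 64) -> int:
    # Term frequencies stand in for TF-IDF weights in this simplified version.
    weights = Counter(text.lower().split())
    vector = [0] * bits
    for token, weight in weights.items():
        digest = int(hashlib.md5(token.encode("utf-8")).hexdigest(), 16)
        for i in range(bits):
            vector[i] += weight if (digest >> i) & 1 else -weight
    return sum(1 << i for i, v in enumerate(vector) if v > 0)

def similarity(a: str, b: str, bits: int = 64) -> float:
    # Similarity = 1 - normalized Hamming distance between the two fingerprints.
    hamming = bin(simhash(a, bits) ^ simhash(b, bits)).count("1")
    return 1.0 - hamming / bits

# Pairs scoring above similarity_threshold (e.g. 0.85) would be treated as duplicates.
print(similarity("New ransomware targets hospitals",
                 "New ransomware strain targets healthcare systems"))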

Notifications

SCRIBE supports multiple notification channels that can be enabled independently per package.

Discord Notifications

Main Notification (Step 8):

  • Rich embeds with images (Reddit posts + YouTube thumbnails)
  • Detailed insights per category
  • Configured via discord.webhook_env in package settings

Summary Notification (Step 9, Optional):

  • Concise AI-generated overview (<2000 chars)
  • Sent to separate webhook
  • Enable in package settings:
discord:
  enabled: true
  webhook_env: "DISCORD_AI_TRENDS_WEBHOOK"
  summary:
    enabled: true
    webhook_env: "DISCORD_AI_TRENDS_SUMMARY_WEBHOOK"

Synology Chat Notifications

Main Notification (Step 8b):

  • Formatted text messages with category, insights, and metadata
  • Configured via synology.webhook_env in package settings

Summary Notification (Step 9b, Optional):

  • Concise AI-generated overview (same as Discord)
  • Sent to separate webhook

Enable Synology Chat in package settings:

synology:
  enabled: true
  webhook_env: "SYNOLOGY_AI_TRENDS_WEBHOOK"
  summary:
    enabled: true
    webhook_env: "SYNOLOGY_AI_TRENDS_SUMMARY_WEBHOOK"

Get your Synology webhook URL: Synology Chat > Integration > Incoming Webhook
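
Synology Chat incoming webhooks expect a form-encoded payload field containing JSON. A minimal sketch (the helper name and message format are assumptions, not SCRIBE's code):

import json
import os
import requests

def send_synology_message(text: str) -> None:
    # Synology Chat incoming webhooks read a JSON object from the "payload" form field.
    webhook_url = os.environ["SYNOLOGY_AI_TRENDS_WEBHOOK"]
    requests.post(webhook_url, data={"payload": json.dumps({"text": text})}, timeout=30).raise_for_status()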

Test Synology integration:

python tests/test_synology.py

Daily Scheduler (Windows)

To run SCRIBE automatically every day:

Using Task Scheduler GUI

  1. Press Win + R, type taskschd.msc, press Enter
  2. Click Create Basic Task
  3. Name: "SCRIBE AI Trends" → Next
  4. Trigger: Daily at your preferred time → Next
  5. Action: Start a program
  6. Program: python
  7. Arguments: main.py --package ai_trends --mode once
  8. Start in: C:\path\to\SCRIBE

Command Line

schtasks /create /tn "SCRIBE AI Trends" /tr "python main.py --package ai_trends --mode once" /sc daily /st 08:00

Generated Report Example

# SCRIBE - CYBERSECURITY INTELLIGENCE REPORT
## 2025-01-15 | 08:00

New ransomware variants targeting critical infrastructure
detected this week, with increased activity from APT groups...

## Malware Analysis
   3 insight(s)

### 1. New Ransomware Strain Targets Healthcare
**Source**: Reddit
**Link**: https://reddit.com/r/netsec/...
**Relevance**: 9/10
**Author**: u/malware_analyst

**Insights**: A new ransomware variant has been discovered
targeting healthcare systems with sophisticated evasion...

---

## Threat Intelligence
   2 insight(s)

### 1. APT Group Activity Analysis
**Source**: YouTube
**Link**: https://youtube.com/watch?v=...
**Relevance**: 8/10

**Insights**: Detailed breakdown of recent APT campaign...

---

*Report generated by SCRIBE - 5 total insights*

Troubleshooting

"Model not found"

ollama list              # View installed models
ollama pull qwen3:14b    # Install missing model

"Package not found"

python main.py --list-packages  # List available packages

Ensure your package directory exists in packages/ and contains a valid settings.yaml.

"Reddit credentials invalid"

Check that your .env contains the correct values from https://reddit.com/prefs/apps

"YouTube quota exceeded"

The YouTube Data API has a daily quota on the free tier. Reduce videos_per_source in your package settings.

Ollama too slow

  • Use a lighter model (e.g., phi4 instead of qwen3:14b)
  • Reduce content limits in package settings

Not enough insights

  • Lower relevance_threshold (e.g., 5 instead of 7)
  • Increase posts_limit or videos_per_source
  • Add more subreddits/channels

Testing

All tests are in the tests/ directory:

python tests/test_connections.py       # Verify all API connections (Reddit, YouTube, Ollama, Discord, Synology)
python tests/test_discord_split.py     # Test Discord message splitting
python tests/test_discord_images.py    # Test Discord image embeds
python tests/test_synology.py          # Test Synology Chat webhook integration

Logs

Per-package logging:

  • Application: logs/<package_name>.log
  • Raw data: data/<package_name>/raw_logs/

Adding New Collectors

  1. Create collector in src/collectors/ following existing patterns
  2. Constructor must accept config: dict
  3. Return list of dicts with: id, source, title, text, url, timestamp, metadata
  4. Register in main.py SCRIBE class
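
A skeleton that follows this contract might look like the following; the class name, the fake data, and the collect() method name are illustrative, not an existing SCRIBE collector:

from datetime import datetime, timezone

class ExampleCollector:
    """Illustrative collector honoring the contract described above."""

    def __init__(self, config: dict):
        # Read whatever limits the package settings provide for this source.
        self.limit = config.get("limit", 10)

    def collect(self) -> list[dict]:
        # Return items in the unified format expected by the pipeline.
        now = datetime.now(timezone.utc).isoformat()
        return [
            {
                "id": f"example-{i}",
                "source": "example",
                "title": f"Example item {i}",
                "text": "Body of the item...",
                "url": f"https://example.com/items/{i}",
                "timestamp": now,
                "metadata": {},
            }
            for i in range(self.limit)
        ]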

Roadmap

  • Reddit and YouTube collection
  • Local AI analysis (Ollama)
  • Semantic deduplication
  • Markdown reports
  • Discord notifications with rich embeds
  • Multi-package architecture
  • Per-package Discord webhooks
  • Synology Chat webhook support
  • Daily summary feature
  • Emerging trend detection

Contributing

The project is designed to be extensible:

  • Add packages: packages/<your_topic>/
  • Add sources: src/collectors/
  • New processors: src/processors/
  • Report templates: src/storage/report_generator.py

License

MIT License - Free to use and modify


SCRIBE - Your Automated Multi-Topic Intelligence Assistant
