🎯 Complete Feature List

EU Intelligence Hub - European News Intelligence Platform

Last Updated: 2025-11-17

📋 Table of Contents

Core Features
AI/ML Features
API Endpoints
Frontend Features
Admin Features
Automation Features
Infrastructure Features

🚀 Core Features

1. Multi-Language Keyword Tracking

Status: ✅ Fully Implemented

Track keywords across 9 languages with automatic translations:

🇬🇧 English (keyword_en) - Primary
🇹🇭 Thai (keyword_th)
🇩🇪 German (keyword_de)
🇫🇷 French (keyword_fr)
🇪🇸 Spanish (keyword_es)
🇮🇹 Italian (keyword_it)
🇵🇱 Polish (keyword_pl)
🇸🇪 Swedish (keyword_sv)
🇳🇱 Dutch (keyword_nl)

How it works:

Submit keyword in English
AI automatically translates to other languages using context-aware translation
Searches conducted across all language variants

Database Table: keywords with 9 language columns
Service: app.services.keyword_approval:KeywordApprovalService
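As a rough illustration of how the nine language columns could feed a search, the sketch below collects the distinct non-empty variants of one keyword. The `KeywordRow` class and `search_variants` function are hypothetical names for this example; only the column names come from the list above.

```python
from dataclasses import dataclass, fields
from typing import List, Optional


@dataclass
class KeywordRow:
    """Hypothetical stand-in for one row of the keywords table."""
    keyword_en: str                      # primary language, always present
    keyword_th: Optional[str] = None
    keyword_de: Optional[str] = None
    keyword_fr: Optional[str] = None
    keyword_es: Optional[str] = None
    keyword_it: Optional[str] = None
    keyword_pl: Optional[str] = None
    keyword_sv: Optional[str] = None
    keyword_nl: Optional[str] = None


def search_variants(row: KeywordRow) -> List[str]:
    """Return the distinct, non-empty translations in column order."""
    variants: List[str] = []
    for column in fields(row):
        value = getattr(row, column.name)
        if value and value not in variants:
            variants.append(value)
    return variants
```

Deduplicating matters because some translations coincide (Spanish and Italian often share a spelling), and searching the same string twice would waste API quota.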

2. Dual-Layer Sentiment Analysis

Status: ✅ Fully Implemented

Two-stage sentiment analysis for speed + accuracy:

Stage 1: VADER Baseline (Fast)

Lexicon-based sentiment scoring
Real-time processing (<10ms per article)
Fallback when Gemini unavailable

Stage 2: Gemini AI Enhancement (Accurate)

Nuanced opinion detection
Sarcasm and context understanding
Confidence scoring (0.0 - 1.0)
Subjectivity rating (0.0 - 1.0)
Emotion breakdown: positive/negative/neutral

Output Fields:

{
  "sentiment_overall": float,         # -1.0 to 1.0
  "sentiment_confidence": float,      # 0.0 to 1.0
  "sentiment_subjectivity": float,    # 0.0 to 1.0
  "emotion_positive": float,          # 0.0 to 1.0
  "emotion_negative": float,          # 0.0 to 1.0
  "emotion_neutral": float            # 0.0 to 1.0
}

Service: app.services.sentiment:SentimentAnalyzer
Fallback: Graceful degradation to VADER when Gemini fails
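The fallback behaviour can be sketched as follows. This is a minimal illustration, not the actual `SentimentAnalyzer` code: `analyze_with_fallback` and its two callable parameters are assumptions standing in for the VADER and Gemini stages.

```python
from typing import Callable, Dict

Scores = Dict[str, float]


def analyze_with_fallback(
    text: str,
    baseline: Callable[[str], Scores],   # stands in for the VADER stage
    enhanced: Callable[[str], Scores],   # stands in for the Gemini stage
) -> Scores:
    """Compute the fast baseline first, then prefer the enhanced result if it succeeds."""
    baseline_scores = baseline(text)     # Stage 1: always available
    try:
        return enhanced(text)            # Stage 2: may fail (quota, network, ...)
    except Exception:
        return baseline_scores           # graceful degradation to the baseline
```

Computing the baseline before calling the enhanced stage means a result is always on hand, at the cost of one cheap lexicon pass per article.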

3. Vector Embedding Semantic Search

Status: ✅ Fully Implemented

Technology:

Sentence Transformers (all-MiniLM-L6-v2 model)
384-dimensional embeddings
PostgreSQL pgvector extension
Cosine similarity search

Capabilities:

Find conceptually similar articles (not just keyword matches)
Semantic search: "tourism growth" finds "visitor increases"
Article similarity detection (>0.7 threshold)
50ms average query time for 100K embeddings

Database: Vector columns in articles and keywords tables
Service: app.services.embeddings:EmbeddingGenerator
API: /api/search/semantic, /api/search/similar/{article_id}
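In the platform the similarity search runs inside PostgreSQL through pgvector; the sketch below only shows the underlying arithmetic in plain Python, using the >0.7 threshold mentioned above. The function names and the tiny 2-dimensional vectors are illustrative assumptions (real embeddings have 384 dimensions).

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either vector is all zeros."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def similar_articles(
    query: Sequence[float],
    candidates: Dict[int, Sequence[float]],
    threshold: float = 0.7,
) -> List[Tuple[int, float]]:
    """Return (article_id, score) pairs above the threshold, best match first."""
    scored = [(article_id, cosine_similarity(query, vec))
              for article_id, vec in candidates.items()]
    matches = [pair for pair in scored if pair[1] > threshold]
    return sorted(matches, key=lambda pair: pair[1], reverse=True)
```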

4. Intelligent Keyword Suggestion System

Status: ✅ Fully Implemented

User-submitted keyword suggestions with AI evaluation:

Workflow:

User submits keyword suggestion via /suggest page
AI evaluates significance (Gemini API)
Admin reviews pending suggestions
Auto-translation to 9 languages on approval
Immediate news search triggered (bypassing 3-hour cooldown)

AI Evaluation Metrics:

Searchability Score (0-100): How easy to find news articles
Significance Score (0-100): Global/regional importance
Specificity: "too_broad", "optimal", "too_specific"
Decision: "approve", "reject", "needs_review"
Reasoning: AI explanation of decision
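One way to keep these metrics consistent is to validate them when an evaluation is created. The sketch below is an assumption about how that could look; the `KeywordEvaluation` class and its field names are hypothetical, while the allowed ranges and values come from the list above.

```python
from dataclasses import dataclass

SPECIFICITY_LEVELS = {"too_broad", "optimal", "too_specific"}
DECISIONS = {"approve", "reject", "needs_review"}


@dataclass(frozen=True)
class KeywordEvaluation:
    """Hypothetical container for one AI evaluation of a suggestion."""
    searchability_score: int   # 0-100
    significance_score: int    # 0-100
    specificity: str           # one of SPECIFICITY_LEVELS
    decision: str              # one of DECISIONS
    reasoning: str             # free-text explanation from the model

    def __post_init__(self) -> None:
        for name in ("searchability_score", "significance_score"):
            value = getattr(self, name)
            if not 0 <= value <= 100:
                raise ValueError(f"{name} must be between 0 and 100, got {value}")
        if self.specificity not in SPECIFICITY_LEVELS:
            raise ValueError(f"unknown specificity: {self.specificity!r}")
        if self.decision not in DECISIONS:
            raise ValueError(f"unknown decision: {self.decision!r}")
```

Rejecting out-of-range values at construction time guards against malformed model output reaching the database.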

Database Tables:

keyword_suggestions - User submissions
keyword_evaluations - AI evaluation history

Services:

app.services.keyword_approval:KeywordApprovalService
app.api.suggestions:router
app.api.admin:router (approval/rejection)

5. Smart Keyword Search Scheduling

Status: ✅ Fully Implemented

Prevents duplicate searches and manages API quota efficiently:

Features:

3-hour cooldown between searches for same keyword
Priority queue system (0-100 scale)
Retry mechanism (max 3 attempts)
Scheduled search queue

Database Table: keyword_search_queue

Columns:

scheduled_at - When to execute search
priority - Queue priority (0-100)
attempts - Retry count
max_attempts - Failure threshold
last_error - Error tracking

Celery Tasks:

populate_keyword_queue - Runs every 30 minutes
process_keyword_queue - Runs every 15 minutes

Service: app.services.keyword_scheduler:KeywordScheduler
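The cooldown and priority rules can be illustrated with a small in-memory queue. This is a sketch under assumptions, not the `KeywordScheduler` implementation: the function names are hypothetical, and it assumes that a higher priority number means the keyword is searched earlier.

```python
import heapq
from datetime import datetime, timedelta
from typing import Iterable, List, Optional, Tuple

COOLDOWN = timedelta(hours=3)  # minimum gap between searches for one keyword


def is_due(last_searched: Optional[datetime], now: datetime) -> bool:
    """A keyword is due if it was never searched or the cooldown has elapsed."""
    return last_searched is None or now - last_searched >= COOLDOWN


def build_queue(
    keywords: Iterable[Tuple[int, int, Optional[datetime]]],
    now: datetime,
) -> List[int]:
    """Take (keyword_id, priority, last_searched) tuples; return due ids, highest priority first."""
    heap: List[Tuple[int, int]] = []
    for keyword_id, priority, last_searched in keywords:
        if is_due(last_searched, now):
            # heapq is a min-heap, so negate the priority to pop the largest first
            heapq.heappush(heap, (-priority, keyword_id))
    return [heapq.heappop(heap)[1] for _ in range(len(heap))]
```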

6. News Source Management

Status: ✅ Fully Implemented

Configurable news sources with metadata:

12 Pre-configured Sources:

BBC News
Reuters
Deutsche Welle (DW)
France 24
Euronews
The Guardian
The Telegraph
El País
Le Monde
Corriere della Sera
Politico Europe
EUobserver

Source Configuration:

Name, base URL, language
Country of origin
Priority level (0-100)
Parser type
Tags for categorization
Enable/disable toggle

Ingestion Tracking:

Last run timestamp
Articles ingested count
Success/failure status

Database Tables:

news_sources - Source configuration
source_ingestion_history - Tracking

API Endpoints:

GET /admin/sources - List sources
POST /admin/sources - Add source
POST /admin/sources/{id}/toggle - Enable/disable
GET /admin/sources/{id}/ingestion - View history

🤖 AI/ML Features

7. Gemini AI Integration

Status: ✅ Fully Implemented

Use Cases:

Sentiment Analysis: Nuanced opinion detection
News Discovery: AI-researched article finding
Keyword Extraction: spaCy NER + Gemini validation
Keyword Evaluation: Automatic suggestion scoring
Translation: Context-aware multi-language

Rate Limiting:

30 calls per minute (configurable)
Automatic retry with exponential backoff
Fallback to deterministic methods

Error Handling:

Graceful degradation
Detailed error logging
Quota tracking

Service: app.services.gemini_client:GeminiClient
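Retry with exponential backoff followed by a deterministic fallback might look like the sketch below. The function names, the base delay of one second, and the doubling factor are assumptions for illustration; the actual `GeminiClient` settings are not documented here.

```python
import time
from typing import Callable, List, TypeVar

T = TypeVar("T")


def backoff_delays(max_attempts: int, base: float = 1.0, factor: float = 2.0) -> List[float]:
    """Delays to wait after each failed attempt except the last one."""
    return [base * factor ** i for i in range(max_attempts - 1)]


def call_with_retry(
    call: Callable[[], T],
    fallback: Callable[[], T],
    max_attempts: int = 3,
    base: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Try `call` up to max_attempts times, then fall back to a deterministic method."""
    delays = backoff_delays(max_attempts, base)
    for attempt in range(max_attempts):
        try:
            return call()
        except Exception:
            if attempt < len(delays):
                sleep(delays[attempt])   # wait longer after each failure
    return fallback()
```

Passing `sleep` as a parameter keeps the function testable without real waiting.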

8. Keyword Extraction Pipeline

Status: ✅ Fully Implemented

Two-Stage Extraction:

Stage 1: spaCy NER

Named Entity Recognition
Extracts: PERSON, ORG, GPE, LOC, EVENT
Fast baseline extraction

Stage 2: Gemini Validation

Confirms relevance
Extracts additional keywords missed by spaCy
Filters noise and irrelevant entities

Output: List of validated keywords with confidence scores

Service: app.services.keyword_extractor:KeywordExtractor
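The two stages can be sketched without either library by treating the NER output and the validator as inputs. Everything named here is hypothetical: `extract_keywords` takes (text, label) pairs such as an NER model would produce, and `validate` stands in for the Gemini check, returning a confidence or None to reject. The sketch covers filtering and validation only, not the discovery of additional keywords.

```python
from typing import Callable, Iterable, List, Optional, Tuple

# Entity labels kept by stage 1, as listed above
ENTITY_LABELS = {"PERSON", "ORG", "GPE", "LOC", "EVENT"}


def extract_keywords(
    entities: Iterable[Tuple[str, str]],
    validate: Callable[[str], Optional[float]],
) -> List[Tuple[str, float]]:
    """Filter entities by label, drop duplicates, then keep those the validator accepts."""
    candidates: List[str] = []
    for text, label in entities:
        if label in ENTITY_LABELS and text not in candidates:
            candidates.append(text)          # stage 1: fast baseline filter

    results: List[Tuple[str, float]] = []
    for text in candidates:
        confidence = validate(text)          # stage 2: relevance check
        if confidence is not None:
            results.append((text, confidence))
    return results
```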

9. Fact vs. Opinion Classification

Status: ✅ Fully Implemented

Automatically classifies articles:

fact: Objective reporting
opinion: Editorial/commentary
mixed: Contains both

Method: Gemini AI analysis of article content
Database: articles.classification column
Use Case: Filter news by objectivity level

📡 API Endpoints

Keyword Management (7 endpoints)

GET    /api/keywords/                         # Search with pagination & filters
GET    /api/keywords/{id}                     # Detailed keyword info
GET    /api/keywords/{id}/articles            # Related articles (sorted)
GET    /api/keywords/{id}/relations           # Mind map data
POST   /api/suggestions/                      # Submit keyword suggestion
GET    /api/suggestions/                      # List all suggestions
POST   /api/suggestions/{id}/vote             # Upvote suggestion

Sentiment Analysis (4 endpoints)

GET    /api/sentiment/keywords/{id}/sentiment           # Overall stats
GET    /api/sentiment/keywords/{id}/sentiment/timeline  # Time-series (7/30/90 days)
GET    /api/sentiment/keywords/compare                  # Multi-keyword comparison
GET    /api/sentiment/articles/{id}/sentiment           # Article-level analysis

Semantic Search (4 endpoints) ✨ ENHANCED

GET    /api/search/articles                   # Full-text search
GET    /api/search/semantic                   # Vector similarity
GET    /api/search/similar/{article_id}       # Find similar articles
GET    /api/search/keywords/multilingual      # Multi-language keyword search ✨ NEW

New Feature: Multilingual Keyword Search

Search keywords across all 9 supported languages simultaneously
Query in any language (EN, TH, DE, FR, ES, IT, PL, SV, NL)
Returns results showing which language matched
Perfect for cross-language discovery
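A request URL for this endpoint could be built as in the sketch below. The query parameter name `q` is an assumption (the Swagger UI lists the actual parameters), and the host is the local development address used for the API documentation.

```python
from urllib.parse import urlencode

BASE_URL = "http://localhost:8000"  # local development host


def multilingual_search_url(term: str, param: str = "q") -> str:
    """Build the search URL; `param` is a hypothetical query parameter name."""
    query = urlencode({param: term})  # percent-encodes non-ASCII characters as UTF-8
    return f"{BASE_URL}/api/search/keywords/multilingual?{query}"
```

Encoding through `urlencode` matters here because queries may arrive in Thai, Polish, or other scripts with non-ASCII characters.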

Document Processing (1 endpoint)

POST   /api/documents/upload                  # Upload PDF/DOCX/TXT

Admin - Source Management (5 endpoints) ✨ ENHANCED

GET    /admin/sources                         # List all news sources
POST   /admin/sources                         # Add new source
POST   /admin/sources/{id}/toggle             # Enable/disable source
GET    /admin/search                          # Comprehensive admin search ✨ NEW
GET    /admin/sources/{id}/ingestion          # View ingestion history

New Feature: Admin Comprehensive Search

Admin-only endpoint (requires HTTP Basic Auth)
Searches across ALL content types simultaneously:
- Keywords (all 9 languages)
- Articles (all languages, title, summary, full_text)
- Keyword Suggestions (all languages + reason text)
- News Sources (name, country, language)
Filter by specific type or search all at once
Returns unified results grouped by type
Perfect for managing and auditing the entire system
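Since the admin endpoints use HTTP Basic Auth, a client has to send an `Authorization` header with each request. The helper below shows how that header is formed; the function name and the example credentials are placeholders, not values from the project.

```python
import base64
from typing import Dict


def basic_auth_header(username: str, password: str) -> Dict[str, str]:
    """Encode credentials as an HTTP Basic Authorization header."""
    token = base64.b64encode(f"{username}:{password}".encode("utf-8")).decode("ascii")
    return {"Authorization": f"Basic {token}"}
```

Basic Auth only encodes the credentials and does not encrypt them, so these endpoints should be reached over HTTPS.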

Admin - Keyword Approval (5 endpoints) ✨ NEW

POST   /admin/keywords/suggestions/{id}/process    # AI evaluation
POST   /admin/keywords/suggestions/{id}/approve    # Approve + auto-translate + search
POST   /admin/keywords/suggestions/{id}/reject     # Reject suggestion
GET    /admin/keywords/suggestions/pending         # View pending
GET    /admin/keywords/suggestions/stats           # Dashboard stats

Admin - Evaluation History (1 endpoint) ✨ NEW

GET    /admin/suggestions/{id}/evaluations    # View AI evaluation history

Total: 30+ API endpoints
Documentation: http://localhost:8000/docs (Swagger UI)

🖥️ Frontend Features

8 Main Pages ✨ UPDATED

1. Home Page (`/`)

Features:

Keyword search with autocomplete
Grid/list view of keywords
Article count badges
Sentiment color indicators
Quick access to details

2. Search Page (`/search`)

Features:

Advanced filtering (date range, sentiment, source)
Full-text and semantic search toggle
Article preview cards
Pagination
Sort by date/sentiment/relevance

3. Keyword Detail Page (`/keywords/{id}`)

Features:

90-day sentiment timeline (Recharts)
Article list with sentiment badges
Keyword relationship mind map (React Flow)
Export data functionality
Share buttons

4. Suggest Page (`/suggest`) ✨ UPDATED

Features:

Multi-language keyword input (9 languages)
Category selection
Reason/justification textarea
Optional contact email
Success/error feedback
View submitted suggestions

5. Upload Page (`/upload`)

Features:

Drag & drop file upload (PDF/DOCX/TXT)
Progress indicator
Automatic processing
Sentiment analysis results
Keyword extraction display

6. Admin Sources Page (`/admin/sources`) ✨ NEW

Features:

List all 12 news sources
Enable/disable toggle switches
View ingestion statistics
Add new sources (modal form)
Edit source configuration
Delete sources (with confirmation)
Ingestion history charts

Requires: Admin authentication (HTTP Basic Auth)

7. Admin Suggestions Page (`/admin/suggestions`) ✨ NEW

Features:

View pending keyword suggestions
AI evaluation scores display
One-click approve/reject
Batch processing
Filter by status (pending/approved/rejected)
View evaluation history
Statistics dashboard:
- Total suggestions
- Approval rate
- Average AI scores
- Top categories

Requires: Admin authentication (HTTP Basic Auth)

8. Admin Search Page (`/admin/search`) ✨ NEW

Features:

Comprehensive search across ALL content types
Search in any of the 9 supported languages
Filter by type (keywords/articles/suggestions/sources/all)
Results grouped by category with visual icons
Direct navigation to detailed views
Highlighting of matched terms
Results show:
- Keywords: All translations, category, popularity
- Articles: Title, summary, sentiment, source, language
- Suggestions: Status, votes, reason
- Sources: URL, language, country, enabled status
Real-time search with debouncing
Responsive design for mobile/tablet/desktop

Use Cases:

Quick system-wide content discovery
Audit all content in any language
Find articles from specific countries
Manage suggestions efficiently
Monitor source status

Requires: Admin authentication (HTTP Basic Auth)

🔄 Automation Features

Celery Background Tasks

1. News Scraping (`scrape_news`)

Schedule: Hourly (at :00)
Function: Scrape articles from all enabled sources
Process:

Query enabled news sources
For each source, use Gemini to find recent articles
Extract keywords using spaCy + Gemini
Generate embeddings
Analyze sentiment (VADER + Gemini)
Store in database
Record ingestion statistics

Task: app.tasks.scraping:scrape_news

2. Sentiment Aggregation (`aggregate_daily_sentiment`)

Schedule: Daily at 00:30 UTC
Function: Pre-compute daily sentiment trends
Process:

Group articles by keyword + date
Calculate weighted average sentiment
Count positive/negative/neutral articles
Identify top positive/negative sources
Store in sentiment_trends table

Benefits: 5ms queries vs 850ms raw scans

Task: app.tasks.sentiment_aggregation:aggregate_daily_sentiment
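The grouping and weighted-average steps can be sketched in plain Python. This is an illustration under assumptions: the function name and output keys are hypothetical, and the weight is assumed to be something like the per-article sentiment confidence, which the source does not specify.

```python
from collections import defaultdict
from datetime import date
from typing import Dict, Iterable, List, Tuple

Key = Tuple[int, date]  # (keyword_id, day)


def aggregate_daily(
    articles: Iterable[Tuple[int, date, float, float]],
) -> Dict[Key, Dict[str, float]]:
    """Take (keyword_id, day, sentiment, weight) tuples; return per-day weighted averages."""
    # Each bucket holds [sum of sentiment * weight, sum of weights, article count]
    buckets: Dict[Key, List[float]] = defaultdict(lambda: [0.0, 0.0, 0])
    for keyword_id, day, sentiment, weight in articles:
        bucket = buckets[(keyword_id, day)]
        bucket[0] += sentiment * weight
        bucket[1] += weight
        bucket[2] += 1

    return {
        key: {
            "avg_sentiment": weighted_sum / total_weight if total_weight else 0.0,
            "article_count": count,
        }
        for key, (weighted_sum, total_weight, count) in buckets.items()
    }
```

Storing one such row per keyword and day is what lets timeline queries read a handful of pre-computed rows instead of scanning every article.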

3. Keyword Suggestion Processing (`process_pending_suggestions`)

Schedule: Daily at 02:00 UTC
Function: Batch AI evaluation of pending suggestions
Process:

Query pending suggestions
Run AI evaluation (Gemini)
Calculate searchability & significance scores
Determine specificity level
Auto-approve high-scoring suggestions
Store evaluation history

Task: app.tasks.keyword_management:process_pending_suggestions

4. Keyword Performance Review (`review_keyword_performance`)

Schedule: Weekly (Monday 03:00 UTC)
Function: Analyze keyword usage and suggest removal
Process:

Identify keywords with no articles (>30 days)
Calculate search frequency
Determine popularity scores
Flag for admin review
Send summary report

Task: app.tasks.keyword_management:review_keyword_performance

5. Database Backup (`daily_database_backup`)

Schedule: Daily at 01:00 UTC
Function: Automated PostgreSQL backup
Process:

pg_dump full database
Compress (gzip)
Timestamp filename
Store in /backups directory
Verify backup integrity

Task: app.tasks.backup_tasks:daily_database_backup

6. Backup Cleanup (`cleanup_old_backups`)

Schedule: Daily at 04:00 UTC
Function: Remove old backups (7-day retention)
Task: app.tasks.backup_tasks:cleanup_old_backups
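The 7-day retention rule amounts to comparing each backup's creation time against a cutoff. The sketch below shows only that selection step; the function name and the filename-to-timestamp mapping are assumptions, and the actual task presumably inspects the /backups directory itself.

```python
from datetime import datetime, timedelta
from typing import Dict, List

RETENTION = timedelta(days=7)  # backups older than this are removed


def backups_to_delete(backups: Dict[str, datetime], now: datetime) -> List[str]:
    """Return the filenames whose creation time is older than the retention window."""
    cutoff = now - RETENTION
    return sorted(name for name, created in backups.items() if created < cutoff)
```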

7. Database Health Check (`database_health_check`)

Schedule: Hourly
Function: Monitor database health
Checks:

Connection status
Disk space
Table sizes
Index health
Query performance

Task: app.tasks.backup_tasks:database_health_check

8. Keyword Queue Population (`populate_keyword_queue`)

Schedule: Every 30 minutes
Function: Schedule keyword searches
Process:

Find keywords due for search (3-hour cooldown)
Calculate priority based on popularity
Add to keyword_search_queue
Set scheduled_at timestamp

Task: app.tasks.keyword_search:populate_keyword_queue

9. Keyword Queue Processing (`process_keyword_queue`)

Schedule: Every 15 minutes
Function: Execute scheduled searches
Process:

Query queue for due searches
Execute news search (Gemini + scraping)
Update last_searched timestamp
Set next search time (current + 3 hours)
Remove from queue

Task: app.tasks.keyword_search:process_keyword_queue

🏗️ Infrastructure Features

Docker Orchestration

Services: 11 containers

PostgreSQL 16 (with pgvector)
Redis 7
Backend (FastAPI + Uvicorn)
Celery Worker
Celery Beat
Frontend (React + Vite dev server)
Nginx (reverse proxy)
Prometheus (metrics)
Grafana (dashboards)
Postgres Exporter (metrics)
Redis Exporter (metrics)

Monitoring Stack

Prometheus Metrics:

HTTP request rate/latency
Database query performance
Celery task execution time
Cache hit/miss rates
Error rates

Grafana Dashboards:

System overview
API performance
Database health
Celery tasks
Custom business metrics

Security Features

HTTPS/SSL: Let's Encrypt auto-renewal
Rate Limiting: Nginx (10 req/s API, 30 req/s general)
CORS: Configured origins only
Authentication: HTTP Basic Auth for admin
Input Validation: Pydantic models
SQL Injection: SQLAlchemy ORM (parameterized)
Security Headers: CSP, HSTS, X-Frame-Options

Backup & Recovery

Daily automated backups
7-day retention
Restore script: ./scripts/restore.sh
Health monitoring: ./scripts/health_check.sh

📊 Database Schema

12 Tables

keywords - Multi-language keyword tracking (9 language columns)
articles - News articles with 6 sentiment fields + embeddings
keyword_articles - Junction table with relevance scores
keyword_relations - Mind map relationship data
keyword_suggestions - User-submitted suggestions (9 language columns)
keyword_evaluations - AI evaluation history
keyword_search_queue - Scheduled search queue
documents - Uploaded file metadata
sentiment_trends - Daily pre-computed aggregations
comparative_sentiment - Multi-keyword comparisons
news_sources - Configurable source list
source_ingestion_history - Scraping statistics

Vector Embeddings

articles.embedding: vector(384)
keywords.embedding: vector(384)
pgvector extension: v0.8.1

Indexes (18 total)

Primary keys on all tables
Foreign keys with CASCADE delete
Sentiment, date, source indexes
Unique constraints on URLs
Composite indexes for queries

🎯 Use Cases

For Intelligence Analysts

✅ Track European media narrative shifts over time
✅ Identify sentiment trends by source/country
✅ Discover relationships between topics
✅ Export data for reports

For PR Teams

✅ Monitor brand sentiment in European media
✅ Identify favorable/critical outlets
✅ Track coverage volume and tone
✅ Compare against competitors

For Researchers

✅ Separate facts from opinions
✅ Analyze media bias patterns
✅ Track specific policy topics
✅ Access historical sentiment data

For News Organizations

✅ Aggregate European coverage on stories
✅ Identify trending topics
✅ Find similar articles
✅ Monitor competitor coverage

📈 Performance Metrics

Article Processing: 10,000/hour
Semantic Search: 50ms average (100K embeddings)
Timeline Query: 5ms (pre-aggregated)
API Response: <500ms p95
Embedding Generation: <100ms per article
Sentiment Analysis: <200ms per article (VADER + Gemini)

🔜 Roadmap Features

In Development

Email alerts for sentiment changes
Multi-keyword watchlists
PDF report generation
Data export (CSV/JSON/Excel)

Planned

Browser extension for quick saves
Mobile app (iOS/Android)
Webhook notifications
Advanced analytics dashboard

Document Version: 2.0
Last Updated: 2025-10-20
Maintained By: EU Intelligence Hub Team
