Last Updated: 2025-11-17
- Core Features
- AI/ML Features
- API Endpoints
- Frontend Features
- Admin Features
- Automation Features
- Infrastructure Features
Status: ✅ Fully Implemented
Track keywords across 9 languages with automatic translations:
- 🇬🇧 English (keyword_en) - Primary
- 🇹🇭 Thai (keyword_th)
- 🇩🇪 German (keyword_de)
- 🇫🇷 French (keyword_fr)
- 🇪🇸 Spanish (keyword_es)
- 🇮🇹 Italian (keyword_it)
- 🇵🇱 Polish (keyword_pl)
- 🇸🇪 Swedish (keyword_sv)
- 🇳🇱 Dutch (keyword_nl)
How it works:
- Submit keyword in English
- AI automatically translates to other languages using context-aware translation
- Searches conducted across all language variants
Database Table: keywords with 9 language columns
Service: app.services.keyword_approval:KeywordApprovalService
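The sketch below shows one way the 9 language columns could become the list of search terms for a single keyword. The column names come from the list above. The `Keyword` dataclass and the `search_variants` helper are hypothetical, and the real KeywordApprovalService may work differently.

```python
from dataclasses import dataclass
from typing import Optional

# Language codes in the order listed above; English is the primary column.
LANGUAGES = ("en", "th", "de", "fr", "es", "it", "pl", "sv", "nl")


@dataclass
class Keyword:
    """Hypothetical in-memory mirror of a row in the keywords table."""
    keyword_en: str
    keyword_th: Optional[str] = None
    keyword_de: Optional[str] = None
    keyword_fr: Optional[str] = None
    keyword_es: Optional[str] = None
    keyword_it: Optional[str] = None
    keyword_pl: Optional[str] = None
    keyword_sv: Optional[str] = None
    keyword_nl: Optional[str] = None


def search_variants(kw: Keyword) -> dict[str, str]:
    """Return {language_code: term} for every non-empty translation."""
    variants = {}
    for lang in LANGUAGES:
        term = getattr(kw, f"keyword_{lang}")
        if term and term.strip():
            variants[lang] = term.strip()
    return variants
```

Skipping blank translations means a keyword that is still awaiting translation is searched only in the languages it already has.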
Status: ✅ Fully Implemented
Two-stage sentiment analysis for speed + accuracy:
Stage 1: VADER Baseline (Fast)
- Lexicon-based sentiment scoring
- Real-time processing (<10ms per article)
- Fallback when Gemini unavailable
Stage 2: Gemini AI Enhancement (Accurate)
- Nuanced opinion detection
- Sarcasm and context understanding
- Confidence scoring (0.0 - 1.0)
- Subjectivity rating (0.0 - 1.0)
- Emotion breakdown: positive/negative/neutral
Output Fields:
{
"sentiment_overall": float, # -1.0 to 1.0
"sentiment_confidence": float, # 0.0 to 1.0
"sentiment_subjectivity": float, # 0.0 to 1.0
"emotion_positive": float, # 0.0 to 1.0
"emotion_negative": float, # 0.0 to 1.0
"emotion_neutral": float # 0.0 to 1.0
}
Service: app.services.sentiment:SentimentAnalyzer
Fallback: Graceful degradation to VADER when Gemini fails
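Here is a minimal sketch of the two-stage flow with graceful degradation. It makes two assumptions:

- A tiny hand-written lexicon stands in for VADER, so the sketch runs without extra packages.
- `gemini_enhance` is a hypothetical callable representing the Gemini call.

The baseline's confidence and subjectivity values of 0.5 are arbitrary placeholders. The output keys match the Output Fields above.

```python
from typing import Callable, Optional

# Toy lexicon standing in for VADER (assumption: the real service uses VADER itself).
TOY_LEXICON = {"growth": 1.0, "success": 1.0, "crisis": -1.0, "decline": -1.0}


def clamp(x: float, lo: float, hi: float) -> float:
    return max(lo, min(hi, x))


def baseline_sentiment(text: str) -> dict:
    """Stage 1: fast lexicon score mapped onto the documented output fields."""
    words = text.lower().split()
    hits = [TOY_LEXICON[w] for w in words if w in TOY_LEXICON]
    overall = sum(hits) / len(hits) if hits else 0.0
    pos = sum(1 for h in hits if h > 0) / len(words) if words else 0.0
    neg = sum(1 for h in hits if h < 0) / len(words) if words else 0.0
    return {
        "sentiment_overall": overall,
        "sentiment_confidence": 0.5,    # placeholder: a lexicon has no calibrated confidence
        "sentiment_subjectivity": 0.5,  # placeholder
        "emotion_positive": pos,
        "emotion_negative": neg,
        "emotion_neutral": 1.0 - pos - neg,
    }


def analyze(text: str,
            gemini_enhance: Optional[Callable[[str, dict], dict]] = None) -> dict:
    """Stage 2: try the AI enhancement, degrade gracefully to the baseline."""
    result = baseline_sentiment(text)
    if gemini_enhance is None:
        return result
    try:
        enhanced = gemini_enhance(text, result)
    except Exception:
        return result  # graceful degradation: keep the baseline scores
    # Accept only known fields and clamp them to their documented ranges.
    for key, value in enhanced.items():
        if key == "sentiment_overall":
            result[key] = clamp(float(value), -1.0, 1.0)
        elif key in result:
            result[key] = clamp(float(value), 0.0, 1.0)
    return result
```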
Status: ✅ Fully Implemented
Technology:
- Sentence Transformers (all-MiniLM-L6-v2 model)
- 384-dimensional embeddings
- PostgreSQL pgvector extension
- Cosine similarity search
Capabilities:
- Find conceptually similar articles (not just keyword matches)
- Semantic search: "tourism growth" finds "visitor increases"
- Article similarity detection (cosine similarity > 0.7)
- 50ms average query time for 100K embeddings
Database: Vector columns in articles and keywords tables
Service: app.services.embeddings:EmbeddingGenerator
API: /api/search/semantic, /api/search/similar/{article_id}
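The similarity check can be illustrated in pure Python. In the database, pgvector's `<=>` operator returns cosine distance, so similarity is `1 - distance`. The `find_similar` helper and its dict-of-vectors input are hypothetical, and the 0.7 threshold comes from the list above.

```python
import math
from typing import Sequence

SIMILARITY_THRESHOLD = 0.7  # ">0.7 threshold" from the capabilities list

# Roughly equivalent pgvector query (for reference only):
#   SELECT id, 1 - (embedding <=> :query) AS similarity
#   FROM articles ORDER BY embedding <=> :query LIMIT 10;


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def find_similar(query: Sequence[float],
                 candidates: dict[int, Sequence[float]],
                 threshold: float = SIMILARITY_THRESHOLD) -> list[tuple[int, float]]:
    """Return (article_id, score) pairs strictly above the threshold, best first."""
    scored = ((aid, cosine_similarity(query, vec)) for aid, vec in candidates.items())
    above = [(aid, s) for aid, s in scored if s > threshold]
    return sorted(above, key=lambda p: p[1], reverse=True)
```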
Status: ✅ Fully Implemented
User-submitted keyword suggestions with AI evaluation:
Workflow:
- User submits keyword suggestion via the /suggest page
- AI evaluates significance (Gemini API)
- Admin reviews pending suggestions
- Auto-translation to 9 languages on approval
- Immediate news search triggered (bypassing 3-hour cooldown)
AI Evaluation Metrics:
- Searchability Score (0-100): How easy to find news articles
- Significance Score (0-100): Global/regional importance
- Specificity: "too_broad", "optimal", "too_specific"
- Decision: "approve", "reject", "needs_review"
- Reasoning: AI explanation of decision
Database Tables:
- keyword_suggestions - User submissions
- keyword_evaluations - AI evaluation history
Services:
- app.services.keyword_approval:KeywordApprovalService
- app.api.suggestions:router
- app.api.admin:router (approval/rejection)
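The sketch below shows how the evaluation metrics could be validated and turned into a decision. The score ranges and the specificity and decision labels come from the list above. The cut-off values (70 and 30) and the rule that only "optimal" keywords are auto-approved are assumptions, not the service's actual logic.

```python
from dataclasses import dataclass

SPECIFICITY = {"too_broad", "optimal", "too_specific"}


@dataclass
class KeywordEvaluation:
    searchability: int  # 0-100
    significance: int   # 0-100
    specificity: str    # "too_broad" | "optimal" | "too_specific"
    reasoning: str = ""

    def __post_init__(self):
        for name in ("searchability", "significance"):
            value = getattr(self, name)
            if not 0 <= value <= 100:
                raise ValueError(f"{name} must be in 0-100, got {value}")
        if self.specificity not in SPECIFICITY:
            raise ValueError(f"unknown specificity: {self.specificity}")


# Hypothetical cut-offs; the real service may use different values.
APPROVE_AT = 70
REJECT_BELOW = 30


def decide(ev: KeywordEvaluation) -> str:
    """Map scores to "approve", "reject" or "needs_review"."""
    weakest = min(ev.searchability, ev.significance)
    if weakest < REJECT_BELOW:
        return "reject"
    if ev.specificity == "optimal" and weakest >= APPROVE_AT:
        return "approve"
    return "needs_review"
```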
Status: ✅ Fully Implemented
Prevents duplicate searches and manages API quota efficiently:
Features:
- 3-hour cooldown between searches for the same keyword
- Priority queue system (0-100 scale)
- Retry mechanism (max 3 attempts)
- Scheduled search queue
Database Table: keyword_search_queue
Columns:
- scheduled_at - When to execute search
- priority - Queue priority (0-100)
- attempts - Retry count
- max_attempts - Failure threshold
- last_error - Error tracking
Celery Tasks:
- populate_keyword_queue - Runs every 30 minutes
- process_keyword_queue - Runs every 15 minutes
Service: app.services.keyword_scheduler:KeywordScheduler
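Here is an in-memory sketch of the cooldown, priority and retry rules. The fields mirror the keyword_search_queue columns listed above. The `KeywordQueue` class is hypothetical, and the real scheduler persists this state in PostgreSQL and drives it from Celery.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Callable, Optional

COOLDOWN = timedelta(hours=3)
MAX_ATTEMPTS = 3


@dataclass
class QueueItem:
    keyword_id: int
    scheduled_at: datetime
    priority: int = 50          # 0-100, higher runs first
    attempts: int = 0
    max_attempts: int = MAX_ATTEMPTS
    last_error: Optional[str] = None


class KeywordQueue:
    """In-memory stand-in for the keyword_search_queue table (hypothetical)."""

    def __init__(self):
        self.items: list[QueueItem] = []
        self.last_searched: dict[int, datetime] = {}

    def enqueue(self, keyword_id: int, priority: int, now: datetime) -> bool:
        """Schedule a search unless the keyword is cooling down or already queued."""
        last = self.last_searched.get(keyword_id)
        if last is not None and now - last < COOLDOWN:
            return False
        if any(item.keyword_id == keyword_id for item in self.items):
            return False
        self.items.append(QueueItem(keyword_id, scheduled_at=now, priority=priority))
        return True

    def process_due(self, now: datetime, search: Callable[[int], None]) -> list[int]:
        """Run due searches, highest priority first; drop items after max_attempts failures."""
        due = sorted((i for i in self.items if i.scheduled_at <= now),
                     key=lambda i: (-i.priority, i.scheduled_at))
        done = []
        for item in due:
            item.attempts += 1
            try:
                search(item.keyword_id)
            except Exception as exc:
                item.last_error = str(exc)
                if item.attempts >= item.max_attempts:
                    self.items.remove(item)  # failure threshold reached
                continue
            self.last_searched[item.keyword_id] = now
            self.items.remove(item)
            done.append(item.keyword_id)
        return done
```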
Status: ✅ Fully Implemented
Configurable news sources with metadata:
12 Pre-configured Sources:
- BBC News
- Reuters
- Deutsche Welle (DW)
- France 24
- Euronews
- The Guardian
- The Telegraph
- El País
- Le Monde
- Corriere della Sera
- Politico Europe
- EUobserver
Source Configuration:
- Name, base URL, language
- Country of origin
- Priority level (0-100)
- Parser type
- Tags for categorization
- Enable/disable toggle
Ingestion Tracking:
- Last run timestamp
- Articles ingested count
- Success/failure status
Database Tables:
- news_sources - Source configuration
- source_ingestion_history - Tracking
API Endpoints:
- GET /admin/sources - List sources
- POST /admin/sources - Add source
- POST /admin/sources/{id}/toggle - Enable/disable
- GET /admin/sources/{id}/ingestion - View history
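The sketch below models the Source Configuration fields and shows how the enabled sources might be ordered for ingestion. The exact field names are assumptions, and `toggle` only approximates what the toggle endpoint presumably does.

```python
from dataclasses import dataclass, field


@dataclass
class NewsSource:
    """Hypothetical mirror of a news_sources row (field names assumed)."""
    name: str
    base_url: str
    language: str
    country: str
    priority: int = 50          # 0-100
    parser: str = "default"
    tags: list[str] = field(default_factory=list)
    enabled: bool = True


def sources_to_ingest(sources: list[NewsSource]) -> list[NewsSource]:
    """Enabled sources only, highest priority first (ties broken by name)."""
    return sorted((s for s in sources if s.enabled), key=lambda s: (-s.priority, s.name))


def toggle(source: NewsSource) -> NewsSource:
    """Flip the enabled flag, as POST /admin/sources/{id}/toggle presumably does."""
    source.enabled = not source.enabled
    return source
```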
Status: ✅ Fully Implemented
Use Cases:
- Sentiment Analysis: Nuanced opinion detection
- News Discovery: AI-researched article finding
- Keyword Extraction: spaCy NER + Gemini validation
- Keyword Evaluation: Automatic suggestion scoring
- Translation: Context-aware multi-language translation
Rate Limiting:
- 30 calls per minute (configurable)
- Automatic retry with exponential backoff
- Fallback to deterministic methods
Error Handling:
- Graceful degradation
- Detailed error logging
- Quota tracking
Service: app.services.gemini_client:GeminiClient
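Here is a minimal sketch of a 30-calls-per-minute sliding-window limiter combined with exponential backoff. The class and function names are hypothetical, and the clock and sleep functions are injectable so the logic can be tested. The real GeminiClient may implement these rules differently.

```python
import random
import time
from collections import deque
from typing import Callable, TypeVar

T = TypeVar("T")


class SlidingWindowLimiter:
    """Allow at most `max_calls` in any rolling `period` seconds (default 30/min)."""

    def __init__(self, max_calls: int = 30, period: float = 60.0,
                 clock: Callable[[], float] = time.monotonic):
        self.max_calls = max_calls
        self.period = period
        self.clock = clock
        self.calls: deque[float] = deque()

    def wait_time(self) -> float:
        """Seconds to wait before the next call is allowed (0 if allowed now)."""
        now = self.clock()
        while self.calls and now - self.calls[0] >= self.period:
            self.calls.popleft()
        if len(self.calls) < self.max_calls:
            return 0.0
        return self.period - (now - self.calls[0])

    def record(self) -> None:
        self.calls.append(self.clock())


def call_with_backoff(fn: Callable[[], T], retries: int = 3, base_delay: float = 1.0,
                      sleep: Callable[[float], None] = time.sleep) -> T:
    """Retry `fn` with exponential backoff (1s, 2s, 4s, ...) plus small jitter."""
    for attempt in range(retries + 1):
        try:
            return fn()
        except Exception:
            if attempt == retries:
                raise  # caller then falls back to a deterministic method
            sleep(base_delay * 2 ** attempt + random.uniform(0, 0.1))
    raise AssertionError("unreachable")
```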
Status: ✅ Fully Implemented
Two-Stage Extraction:
Stage 1: spaCy NER
- Named Entity Recognition
- Extracts: PERSON, ORG, GPE, LOC, EVENT
- Fast baseline extraction
Stage 2: Gemini Validation
- Confirms relevance
- Extracts additional keywords missed by spaCy
- Filters noise and irrelevant entities
Output: List of validated keywords with confidence scores
Service: app.services.keyword_extractor:KeywordExtractor
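The two stages can be sketched without loading a model.

- Stage 1 takes `(text, label)` pairs, which with spaCy would come from `[(ent.text, ent.label_) for ent in nlp(text).ents]`.
- Stage 2 calls `validate`, a hypothetical stand-in for the Gemini call. It returns confidence scores and may add keywords spaCy missed.

The 0.5 confidence cut-off is an assumption.

```python
from typing import Callable, Iterable

NER_LABELS = {"PERSON", "ORG", "GPE", "LOC", "EVENT"}


def stage1_candidates(entities: Iterable[tuple[str, str]]) -> list[str]:
    """Keep entities whose spaCy label is in NER_LABELS, de-duplicated case-insensitively."""
    seen, out = set(), []
    for text, label in entities:
        key = text.strip().lower()
        if label in NER_LABELS and key and key not in seen:
            seen.add(key)
            out.append(text.strip())
    return out


def extract_keywords(entities: Iterable[tuple[str, str]],
                     validate: Callable[[list[str]], dict[str, float]],
                     min_confidence: float = 0.5) -> list[tuple[str, float]]:
    """Stage 2: score candidates (plus any additions) and drop low-confidence noise."""
    candidates = stage1_candidates(entities)
    scores = validate(candidates)
    kept = [(kw, conf) for kw, conf in scores.items() if conf >= min_confidence]
    return sorted(kept, key=lambda p: p[1], reverse=True)
```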
Status: ✅ Fully Implemented
Automatically classifies articles:
- fact: Objective reporting
- opinion: Editorial/commentary
- mixed: Contains both
Method: Gemini AI analysis of article content
Database: articles.classification column
Use Case: Filter news by objectivity level
GET /api/keywords/ # Search with pagination & filters
GET /api/keywords/{id} # Detailed keyword info
GET /api/keywords/{id}/articles # Related articles (sorted)
GET /api/keywords/{id}/relations # Mind map data
POST /api/suggestions/ # Submit keyword suggestion
GET /api/suggestions/ # List all suggestions
POST /api/suggestions/{id}/vote # Upvote suggestion
GET /api/sentiment/keywords/{id}/sentiment # Overall stats
GET /api/sentiment/keywords/{id}/sentiment/timeline # Time-series (7/30/90 days)
GET /api/sentiment/keywords/compare # Multi-keyword comparison
GET /api/sentiment/articles/{id}/sentiment # Article-level analysis
GET /api/search/articles # Full-text search
GET /api/search/semantic # Vector similarity
GET /api/search/similar/{article_id} # Find similar articles
GET /api/search/keywords/multilingual # Multi-language keyword search ✨ NEW
New Feature: Multilingual Keyword Search
- Search keywords across all 9 supported languages (8 European languages plus Thai) simultaneously
- Query in any language (EN, TH, DE, FR, ES, IT, PL, SV, NL)
- Returns results showing which language matched
- Perfect for cross-language discovery
POST /api/documents/upload # Upload PDF/DOCX/TXT
GET /admin/sources # List all news sources
POST /admin/sources # Add new source
POST /admin/sources/{id}/toggle # Enable/disable source
GET /admin/search # Comprehensive admin search ✨ NEW
GET /admin/sources/{id}/ingestion # View ingestion history
New Feature: Admin Comprehensive Search
- Admin-only endpoint (requires HTTP Basic Auth)
- Searches across ALL content types simultaneously:
  - Keywords (all 9 languages)
  - Articles (all languages, title, summary, full_text)
  - Keyword Suggestions (all languages + reason text)
  - News Sources (name, country, language)
- Filter by specific type or search all at once
- Returns unified results grouped by type
- Perfect for managing and auditing the entire system
POST /admin/keywords/suggestions/{id}/process # AI evaluation
POST /admin/keywords/suggestions/{id}/approve # Approve + auto-translate + search
POST /admin/keywords/suggestions/{id}/reject # Reject suggestion
GET /admin/keywords/suggestions/pending # View pending
GET /admin/keywords/suggestions/stats # Dashboard stats
GET /admin/suggestions/{id}/evaluations # View AI evaluation history
Total: 30+ API endpoints
Documentation: http://localhost:8000/docs (Swagger UI)
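The admin endpoints require HTTP Basic Auth. The stdlib-only sketch below builds, but does not send, a request to GET /admin/search on the local development server. The query parameter names `q` and `type` are assumptions, so check the Swagger UI for the real ones.

```python
import base64
from urllib.parse import urlencode
from urllib.request import Request

BASE_URL = "http://localhost:8000"  # local dev server, as in the Swagger URL above


def basic_auth_header(username: str, password: str) -> str:
    """HTTP Basic Auth: 'Basic ' + base64("user:password")."""
    token = base64.b64encode(f"{username}:{password}".encode("utf-8")).decode("ascii")
    return f"Basic {token}"


def admin_search_request(query: str, username: str, password: str,
                         content_type: str = "all") -> Request:
    """Build a GET /admin/search request (parameter names `q` and `type` are assumed)."""
    params = urlencode({"q": query, "type": content_type})
    return Request(f"{BASE_URL}/admin/search?{params}",
                   headers={"Authorization": basic_auth_header(username, password)})
```

To send it, pass the request to `urllib.request.urlopen(...)` and decode the JSON body. Basic Auth credentials are only base64-encoded, not encrypted, so they should travel over HTTPS outside local development.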
Features:
- Keyword search with autocomplete
- Grid/list view of keywords
- Article count badges
- Sentiment color indicators
- Quick access to details
Features:
- Advanced filtering (date range, sentiment, source)
- Full-text and semantic search toggle
- Article preview cards
- Pagination
- Sort by date/sentiment/relevance
Features:
- 90-day sentiment timeline (Recharts)
- Article list with sentiment badges
- Keyword relationship mind map (React Flow)
- Export data functionality
- Share buttons
Features:
- Multi-language keyword input (9 languages)
- Category selection
- Reason/justification textarea
- Optional contact email
- Success/error feedback
- View submitted suggestions
Features:
- Drag & drop file upload (PDF/DOCX/TXT)
- Progress indicator
- Automatic processing
- Sentiment analysis results
- Keyword extraction display
Features:
- List all 12 news sources
- Enable/disable toggle switches
- View ingestion statistics
- Add new sources (modal form)
- Edit source configuration
- Delete sources (with confirmation)
- Ingestion history charts
Requires: Admin authentication (HTTP Basic Auth)
Features:
- View pending keyword suggestions
- AI evaluation scores display
- One-click approve/reject
- Batch processing
- Filter by status (pending/approved/rejected)
- View evaluation history
- Statistics dashboard:
  - Total suggestions
  - Approval rate
  - Average AI scores
  - Top categories
Requires: Admin authentication (HTTP Basic Auth)
Features:
- Comprehensive search across ALL content types
- Search in any of the 9 supported languages
- Filter by type (keywords/articles/suggestions/sources/all)
- Results grouped by category with visual icons
- Direct navigation to detailed views
- Highlighting of matched terms
- Results show:
  - Keywords: All translations, category, popularity
  - Articles: Title, summary, sentiment, source, language
  - Suggestions: Status, votes, reason
  - Sources: URL, language, country, enabled status
- Real-time search with debouncing
- Responsive design for mobile/tablet/desktop
Use Cases:
- Quick system-wide content discovery
- Audit all content in any language
- Find articles from specific countries
- Manage suggestions efficiently
- Monitor source status
Requires: Admin authentication (HTTP Basic Auth)
Schedule: Hourly (at :00)
Function: Scrape articles from all enabled sources
Process:
- Query enabled news sources
- For each source, use Gemini to find recent articles
- Extract keywords using spaCy + Gemini
- Generate embeddings
- Analyze sentiment (VADER + Gemini)
- Store in database
- Record ingestion statistics
Task: app.tasks.scraping:scrape_news
Schedule: Daily at 00:30 UTC
Function: Pre-compute daily sentiment trends
Process:
- Group articles by keyword + date
- Calculate weighted average sentiment
- Count positive/negative/neutral articles
- Identify top positive/negative sources
- Store in sentiment_trends table
Benefits: 5ms queries vs 850ms raw scans
Task: app.tasks.sentiment_aggregation:aggregate_daily_sentiment
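The sketch below shows the grouping and counting steps. The source does not say what the average is weighted by, so this sketch assumes `sentiment_confidence` as the weight. The ±0.05 positive/negative cut-offs are also an assumption, borrowed from the common VADER compound-score convention. The top-source ranking step is omitted.

```python
from collections import defaultdict

POS_CUTOFF, NEG_CUTOFF = 0.05, -0.05  # assumed thresholds


def aggregate_daily(rows):
    """rows: iterable of (keyword_id, day, sentiment_overall, sentiment_confidence).

    Returns {(keyword_id, day): summary}; the average is weighted by confidence.
    """
    groups = defaultdict(list)
    for keyword_id, day, score, confidence in rows:
        groups[(keyword_id, day)].append((score, confidence))

    trends = {}
    for key, items in groups.items():
        total_weight = sum(c for _, c in items)
        if total_weight > 0:
            avg = sum(s * c for s, c in items) / total_weight
        else:
            avg = sum(s for s, _ in items) / len(items)  # plain mean if all weights are 0
        trends[key] = {
            "avg_sentiment": avg,
            "positive_count": sum(1 for s, _ in items if s > POS_CUTOFF),
            "negative_count": sum(1 for s, _ in items if s < NEG_CUTOFF),
            "neutral_count": sum(1 for s, _ in items if NEG_CUTOFF <= s <= POS_CUTOFF),
            "article_count": len(items),
        }
    return trends
```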
Schedule: Daily at 02:00 UTC
Function: Batch AI evaluation of pending suggestions
Process:
- Query pending suggestions
- Run AI evaluation (Gemini)
- Calculate searchability & significance scores
- Determine specificity level
- Auto-approve high-scoring suggestions
- Store evaluation history
Task: app.tasks.keyword_management:process_pending_suggestions
Schedule: Weekly (Monday 03:00 UTC)
Function: Analyze keyword usage and suggest removal
Process:
- Identify keywords with no articles (>30 days)
- Calculate search frequency
- Determine popularity scores
- Flag for admin review
- Send summary report
Task: app.tasks.keyword_management:review_keyword_performance
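Here is a sketch of the "no articles (>30 days)" check. It reads the rule as "no new article for more than 30 days". It also assumes that a keyword which never had an article gets a grace period of 30 days from its creation. Both readings are assumptions, and the function name is hypothetical.

```python
from datetime import datetime, timedelta
from typing import Optional

STALE_AFTER = timedelta(days=30)


def flag_stale_keywords(last_article_at: dict[int, Optional[datetime]],
                        created_at: dict[int, datetime],
                        now: datetime) -> list[int]:
    """Return keyword ids with no article in the last 30 days, for admin review."""
    flagged = []
    for keyword_id, created in created_at.items():
        last = last_article_at.get(keyword_id)
        # Keywords without any article are measured from their creation date.
        reference = last if last is not None else created
        if now - reference > STALE_AFTER:
            flagged.append(keyword_id)
    return sorted(flagged)
```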
Schedule: Daily at 01:00 UTC
Function: Automated PostgreSQL backup
Process:
- pg_dump full database
- Compress (gzip)
- Timestamp filename
- Store in /backups directory
- Verify backup integrity
Task: app.tasks.backup_tasks:daily_database_backup
Schedule: Daily at 04:00 UTC
Function: Remove old backups (7-day retention)
Task: app.tasks.backup_tasks:cleanup_old_backups
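The sketch below covers three file-side steps of the backup tasks using only the standard library: a timestamped filename, a gzip integrity check, and 7-day retention cleanup. The dump itself would come from something like `pg_dump dbname | gzip > file`. The filename pattern is hypothetical, and the real tasks may verify backups differently.

```python
import gzip
import time
from datetime import datetime
from pathlib import Path
from typing import Optional

RETENTION_SECONDS = 7 * 24 * 3600  # 7-day retention


def backup_filename(now: datetime, db_name: str = "db") -> str:
    """Timestamped name such as backup_db_20251117_010000.sql.gz (pattern assumed)."""
    return f"backup_{db_name}_{now.strftime('%Y%m%d_%H%M%S')}.sql.gz"


def verify_gzip(path: Path) -> bool:
    """Integrity check: the whole file must decompress without error."""
    try:
        with gzip.open(path, "rb") as fh:
            while fh.read(1 << 20):
                pass
        return True
    except (OSError, EOFError):  # BadGzipFile is an OSError; truncation raises EOFError
        return False


def cleanup_old_backups(directory: Path, now: Optional[float] = None) -> list[str]:
    """Delete *.sql.gz files whose modification time is older than the retention window."""
    now = time.time() if now is None else now
    removed = []
    for path in sorted(directory.glob("*.sql.gz")):
        if now - path.stat().st_mtime > RETENTION_SECONDS:
            path.unlink()
            removed.append(path.name)
    return removed
```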
Schedule: Hourly
Function: Monitor database health
Checks:
- Connection status
- Disk space
- Table sizes
- Index health
- Query performance
Task: app.tasks.backup_tasks:database_health_check
Schedule: Every 30 minutes
Function: Schedule keyword searches
Process:
- Find keywords due for search (3-hour cooldown)
- Calculate priority based on popularity
- Add to keyword_search_queue
- Set scheduled_at timestamp
Task: app.tasks.keyword_search:populate_keyword_queue
Schedule: Every 15 minutes
Function: Execute scheduled searches
Process:
- Query queue for due searches
- Execute news search (Gemini + scraping)
- Update last_searched timestamp
- Set next search time (current + 3 hours)
- Remove from queue
Task: app.tasks.keyword_search:process_keyword_queue
Services: 11 containers
- PostgreSQL 16 (with pgvector)
- Redis 7
- Backend (FastAPI + Uvicorn)
- Celery Worker
- Celery Beat
- Frontend (React + Vite dev server)
- Nginx (reverse proxy)
- Prometheus (metrics)
- Grafana (dashboards)
- Postgres Exporter (metrics)
- Redis Exporter (metrics)
Prometheus Metrics:
- HTTP request rate/latency
- Database query performance
- Celery task execution time
- Cache hit/miss rates
- Error rates
Grafana Dashboards:
- System overview
- API performance
- Database health
- Celery tasks
- Custom business metrics
- HTTPS/SSL: Let's Encrypt auto-renewal
- Rate Limiting: Nginx (10 req/s API, 30 req/s general)
- CORS: Configured origins only
- Authentication: HTTP Basic Auth for admin
- Input Validation: Pydantic models
- SQL Injection: SQLAlchemy ORM (parameterized)
- Security Headers: CSP, HSTS, X-Frame-Options
- Daily automated backups
- 7-day retention
- Restore script: ./scripts/restore.sh
- Health monitoring: ./scripts/health_check.sh
- keywords - Multi-language keyword tracking (9 language columns)
- articles - News articles with 6 sentiment fields + embeddings
- keyword_articles - Junction table with relevance scores
- keyword_relations - Mind map relationship data
- keyword_suggestions - User-submitted suggestions (9 language columns)
- keyword_evaluations - AI evaluation history
- keyword_search_queue - Scheduled search queue
- documents - Uploaded file metadata
- sentiment_trends - Daily pre-computed aggregations
- comparative_sentiment - Multi-keyword comparisons
- news_sources - Configurable source list
- source_ingestion_history - Scraping statistics
- articles.embedding: vector(384)
- keywords.embedding: vector(384)
- pgvector extension: v0.8.1
- Primary keys on all tables
- Foreign keys with CASCADE delete
- Sentiment, date, source indexes
- Unique constraints on URLs
- Composite indexes for queries
✅ Track European media narrative shifts over time
✅ Identify sentiment trends by source/country
✅ Discover relationships between topics
✅ Export data for reports

✅ Monitor brand sentiment in European media
✅ Identify favorable/critical outlets
✅ Track coverage volume and tone
✅ Compare against competitors

✅ Separate facts from opinions
✅ Analyze media bias patterns
✅ Track specific policy topics
✅ Access historical sentiment data

✅ Aggregate European coverage on stories
✅ Identify trending topics
✅ Find similar articles
✅ Monitor competitor coverage
- Article Processing: 10,000/hour
- Semantic Search: 50ms average (100K embeddings)
- Timeline Query: 5ms (pre-aggregated)
- API Response: <500ms p95
- Embedding Generation: <100ms per article
- Sentiment Analysis: <200ms per article (VADER + Gemini)
- Email alerts for sentiment changes
- Multi-keyword watchlists
- PDF report generation
- Data export (CSV/JSON/Excel)
- Browser extension for quick saves
- Mobile app (iOS/Android)
- Webhook notifications
- Advanced analytics dashboard
Document Version: 2.0
Last Updated: 2025-10-20
Maintained By: EU Intelligence Hub Team