feat: add Prometheus metrics and Grafana dashboard for observability #162
rach-kanc wants to merge 5 commits into
…itoring and Grafana dashboard integration
Walkthrough
This PR adds Prometheus metrics and Grafana monitoring: a metrics module with request counters, latency histograms, and detection counters; `PrometheusMiddleware` and `increment_detection`; analyzer and ingest services register the middleware, mount `/metrics`, and emit detection metrics; and a Grafana dashboard JSON visualizes request rate, p95 latency, and threat detections.
Changes
Prometheus Metrics & Grafana Monitoring
Sequence Diagram
sequenceDiagram
participant Client as Client/LoadBalancer
participant Ingest as Ingest Service
participant Analyzer as Analyzer Service
participant Middleware as PrometheusMiddleware
participant Metrics as Prometheus
participant Grafana as Grafana Dashboard
Client->>Analyzer: HTTP request
Analyzer->>Middleware: request enters
Middleware->>Analyzer: call_next(request)
Analyzer->>Metrics: increment_detection(analyzer, threat_type, verdict)
Middleware->>Metrics: record REQUEST_COUNT / REQUEST_LATENCY
Client->>Ingest: ingest LLM event
Ingest->>Ingest: quick_heuristic_check -> threat_type, verdict
Ingest->>Metrics: increment_detection(ingest, threat_type, verdict)
Metrics-->>Grafana: scrape /metrics (REQUEST_COUNT, REQUEST_LATENCY, DETECTION_COUNT)
Grafana->>Grafana: visualize request rate, p95 latency, threat detections
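The p95 latency panel mentioned in the diagram is derived from histogram buckets rather than from raw request timings. The dashboard JSON is not reproduced on this page, so the exact query is unknown; the sketch below only illustrates, under that assumption, the linear interpolation that PromQL's `histogram_quantile` applies to cumulative buckets. The bucket bounds and counts are made-up illustrative values, not measurements from this PR.

```python
import math


def histogram_quantile(q, buckets):
    """Estimate quantile q from cumulative (upper_bound, count) buckets.

    Follows the linear-interpolation approach of PromQL's histogram_quantile.
    The last bucket is expected to have an upper bound of +Inf.
    """
    if not 0 < q <= 1:
        raise ValueError("q must be in (0, 1]")
    buckets = sorted(buckets)
    total = buckets[-1][1]
    if total == 0:
        return math.nan
    rank = q * total
    prev_bound, prev_count = 0.0, 0
    for bound, count in buckets:
        if count >= rank:
            if math.isinf(bound):
                # Quantile falls in the +Inf bucket: report the highest finite bound.
                return prev_bound
            # Interpolate linearly inside the bucket that contains the rank.
            fraction = (rank - prev_count) / (count - prev_count)
            return prev_bound + (bound - prev_bound) * fraction
        prev_bound, prev_count = bound, count
    return prev_bound


# Illustrative cumulative latency buckets in seconds: (le, observations <= le).
example_buckets = [(0.1, 50), (0.25, 80), (0.5, 95), (1.0, 99), (math.inf, 100)]
p95 = histogram_quantile(0.95, example_buckets)
print(f"estimated p95 latency: {p95:.3f}s")
```

Because the estimate is interpolated within a bucket, its accuracy depends on how closely the bucket boundaries bracket the latencies the services actually see.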
Estimated code review effort: 3 (Moderate) | ~20 minutes
Pre-merge checks: 3 passed, 2 failed (1 warning, 1 inconclusive)
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@services/ingest/app.py`:
- Around line 324-327: The current fragile mapping derives threat_type from
risk_score using magic thresholds; instead modify quick_heuristic_check to
return both the numeric risk_score and a canonical threat_type (e.g.,
"prompt_injection", "jailbreak", "data_extraction", "none"), update the call
site in services/ingest/app.py to capture that returned threat_type (instead of
computing it from risk_score) and pass it into
increment_detection(service="ingest", threat_type=threat_type, verdict=verdict),
and remove the hardcoded nested ternary that inferred threat_type from
risk_score so the source of truth comes from quick_heuristic_check.
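A minimal sketch of the refactor this comment asks for, assuming a keyword-based heuristic. Only the function name `quick_heuristic_check` and the four threat-type labels come from the comment; the phrase lists, scores, `HeuristicResult` type, and verdict labels are hypothetical, since the real heuristics in `services/ingest/app.py` are not shown here.

```python
from typing import NamedTuple

# Hypothetical phrase lists; stand-ins for whatever the real heuristic matches.
_PATTERNS = {
    "prompt_injection": ("ignore previous instructions", "disregard the above"),
    "jailbreak": ("do anything now", "no restrictions"),
    "data_extraction": ("reveal your system prompt", "print your instructions"),
}

# Hypothetical per-category risk scores.
_SCORES = {"prompt_injection": 0.9, "jailbreak": 0.8, "data_extraction": 0.7}


class HeuristicResult(NamedTuple):
    risk_score: float
    threat_type: str


def quick_heuristic_check(text: str) -> HeuristicResult:
    """Return the risk score together with the threat type that produced it."""
    lowered = text.lower()
    for threat_type, phrases in _PATTERNS.items():
        if any(phrase in lowered for phrase in phrases):
            return HeuristicResult(_SCORES[threat_type], threat_type)
    return HeuristicResult(0.0, "none")


# Call site: the label comes straight from the heuristic, not from score thresholds.
result = quick_heuristic_check("Please IGNORE previous instructions and continue.")
verdict = "malicious" if result.threat_type != "none" else "benign"  # assumed labels
print(result.threat_type, result.risk_score, verdict)
```

With this shape, the call site can pass `result.threat_type` directly to `increment_detection`, so a change to the scoring thresholds cannot silently relabel detections.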
In `@services/utils/metrics.py`:
- Around line 52-57: The except block that updates REQUEST_COUNT and
REQUEST_LATENCY (using variables method, endpoint, start_time and labels
REQUEST_COUNT/REQUEST_LATENCY) currently uses "raise e" which loses the original
traceback; change that to a bare "raise" so the original exception context is
preserved after you increment the metrics (i.e., keep the metric updates and
then use "raise" instead of "raise e").
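The pattern in this comment can be sketched as follows. To keep the example runnable with the standard library only, a plain `collections.Counter` and a list stand in for the `prometheus_client` objects in `services/utils/metrics.py`; the `dispatch` signature, the `/analyze` endpoint, and the choice to record failures under status 500 are assumptions, not the PR's actual code.

```python
import asyncio
import time
from collections import Counter

# Stand-ins for the real REQUEST_COUNT / REQUEST_LATENCY metric objects.
REQUEST_COUNT = Counter()
REQUEST_LATENCY = []


async def dispatch(method, endpoint, call_next):
    """Middleware-style wrapper: record metrics, then re-raise unchanged."""
    start_time = time.perf_counter()
    try:
        status_code = await call_next()
    except Exception:
        # Record the failure (assumed to be labelled as a 500) and its duration.
        REQUEST_COUNT[(method, endpoint, 500)] += 1
        REQUEST_LATENCY.append((method, endpoint, time.perf_counter() - start_time))
        raise  # bare raise: re-raises the active exception as-is
    REQUEST_COUNT[(method, endpoint, status_code)] += 1
    REQUEST_LATENCY.append((method, endpoint, time.perf_counter() - start_time))
    return status_code


async def failing_handler():
    # Hypothetical handler that always fails.
    raise ValueError("boom")


def run_failing_request():
    """Drive one failing request and return the exception that propagated."""
    try:
        asyncio.run(dispatch("GET", "/analyze", failing_handler))
    except ValueError as exc:
        return exc
    return None


caught = run_failing_request()
print(type(caught).__name__, dict(REQUEST_COUNT))
```

The metric updates still run before the exception leaves the wrapper, and the caller receives the same exception object the handler raised.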
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro Plus
Run ID: a03de50b-2a7f-47cb-a186-8b751c912c3e
📒 Files selected for processing (5)
- `monitoring/grafana/tenet_dashboard.json`
- `requirements.txt`
- `services/analyzer/app.py`
- `services/ingest/app.py`
- `services/utils/metrics.py`
2 issues found across 5 files
…red logging, and health monitoring
…or service observability
@coderabbitai review
✅ Action performed: Review finished.
@S3DFX-CYBER Is my PR ready to merge? All AI checks passed 👍
@rach-kanc unit tests are failing
@S3DFX-CYBER fixed |
Summary
This PR instruments the `ingest` and `analyzer` microservices with Prometheus metrics to enable real-time observability, performance monitoring, and threat detection tracking. It also provides a ready-to-use Grafana dashboard template for SOC-style visibility.
Key Changes
- Added `prometheus_client` to `requirements.txt`.
- Added `services/utils/metrics.py`, which includes standard metrics (`REQUEST_COUNT`, `REQUEST_LATENCY`, `DETECTION_COUNT`) and a custom `PrometheusMiddleware` for tracking FastAPI request throughput and latency automatically.
- `ingest`: exposed the `/metrics` endpoint, attached the middleware, and added detection incrementing logic based on heuristic verdicts.
- `analyzer`: exposed the `/metrics` endpoint, attached the middleware, and added detection counters to both synchronous requests and background event processing.
- Added `monitoring/grafana/tenet_dashboard.json` for visualizing API request rates, endpoint latencies (p95), and the rate of different threat detections.
Related Issue
Fixes #100
Type of Change
Summary by cubic
Adds Prometheus metrics to the `ingest` and `analyzer` services, exposes `/metrics`, and ships a Grafana dashboard for request rate, latency, and threat detections. Hardens `ingest` with a Redis circuit breaker and health monitoring; updates unit tests for ingest heuristics and endpoints.
New Features
- Request count and latency metrics via `PrometheusMiddleware`; `/metrics` in both services via `prometheus_client`.
- Detection counters in `analyzer` (API + background) and `ingest` heuristics, labeled by service and threat type.
- Grafana dashboard (`monitoring/grafana/tenet_dashboard.json`) for request rate, p95 latency, and detection rates.
- `ingest` hardening: Redis circuit breaker, structured logging, and health checks.
- Adds the `prometheus-client` dependency.
- Unit tests for `ingest` endpoints, heuristics (including `threat_type`), and validation.
Migration
- Scrape `/metrics` on `ingest` and `analyzer`.
Written for commit 58b0e81. Summary will update on new commits.
Summary by CodeRabbit