Feat/gpu resource monitoring health dashboard #475

Open
sublime247 wants to merge 5 commits into Pulsefy:main from sublime247:feat/gpu-resource-monitoring-health-dashboard
@sublime247
Contributor

Closes #160

- Add AiMetricsService with Prometheus metrics for AI inference
  (request count, latency histogram/summary, error counter)
- Track model load times and concurrent inference count
- Monitor system RAM and GPU VRAM via periodic sampling (nvidia-smi)
- Implement graceful request throttling via AiThrottleGuard
  (concurrency limit, RAM threshold, VRAM threshold → 503 + Retry-After)
- Add AiMetricsInterceptor for automatic per-route inference latency logging
- Expose GET /ai/metrics (JSON health report with status/throttling/resources)
- Expose GET /ai/metrics/prometheus (Prometheus scraping endpoint)
- Expose GET /ai/metrics/health (liveness check, 200 or 503)
- Add AI-specific Prometheus alert rules (latency, error rate, RAM/VRAM,
  throttling, concurrency, model load time)
- Add AI recording rules for pre-computed metric aggregations
- Add Prometheus scrape job for /ai/metrics/prometheus
- Register AiMetricsModule globally in AppModule
- Add 29 unit tests covering service, controller, and guard
- Add AI_* env vars to .env.example
- Remove unnecessary async from onModuleInit (require-await)
- Use static import for child_process execSync (no-require-imports, no-unsafe-assignment)
- Type getRequest<Request>() and getResponse<Response>() (no-unsafe-member-access)
- Cast x-ai-model header to string (no-unsafe-argument)
- Update onModuleInit test to match sync signature
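
For readers skimming the list above, here is a minimal sketch of how the throttling decision (concurrency limit, RAM threshold, VRAM threshold, then 503 + Retry-After) could fit together with the nvidia-smi sampling. It is not the code from this PR: every name below (`parseNvidiaSmiMemory`, `decideThrottle`, `ThrottleLimits`, and so on) is hypothetical, the thresholds are illustrative, and the parser assumes the sampler runs `nvidia-smi --query-gpu=memory.used,memory.total --format=csv,noheader,nounits`.

```typescript
// Hypothetical types and names for illustration; not taken from the PR's source.
interface GpuMemorySample {
  usedMiB: number;
  totalMiB: number;
}

interface ThrottleLimits {
  maxConcurrent: number;     // maximum simultaneous inferences
  maxRamUsedRatio: number;   // e.g. 0.9 means 90% of system RAM
  maxVramUsedRatio: number;  // e.g. 0.9 means 90% of VRAM on any GPU
  retryAfterSeconds: number; // value for the Retry-After header
}

interface ResourceSnapshot {
  activeInferences: number;
  ramUsedRatio: number;
  gpus: GpuMemorySample[];
}

type ThrottleDecision =
  | { allowed: true }
  | { allowed: false; status: 503; retryAfterSeconds: number; reason: string };

// Parses lines such as "1024, 8192" (used MiB, total MiB), one line per GPU.
// Lines that do not yield two finite numbers (for example "[N/A], [N/A]") are skipped.
function parseNvidiaSmiMemory(output: string): GpuMemorySample[] {
  return output
    .split("\n")
    .map((line) => line.trim())
    .filter((line) => line.length > 0)
    .map((line) => {
      const fields = line.split(",").map((field) => Number(field.trim()));
      return { usedMiB: fields[0] ?? NaN, totalMiB: fields[1] ?? NaN };
    })
    .filter(
      (s) => Number.isFinite(s.usedMiB) && Number.isFinite(s.totalMiB) && s.totalMiB > 0,
    );
}

// Checks concurrency first, then RAM, then the most heavily used GPU.
function decideThrottle(snapshot: ResourceSnapshot, limits: ThrottleLimits): ThrottleDecision {
  const reject = (reason: string): ThrottleDecision => ({
    allowed: false,
    status: 503,
    retryAfterSeconds: limits.retryAfterSeconds,
    reason,
  });

  if (snapshot.activeInferences >= limits.maxConcurrent) return reject("concurrency");
  if (snapshot.ramUsedRatio > limits.maxRamUsedRatio) return reject("ram");

  const worstVramRatio = snapshot.gpus.reduce(
    (max, gpu) => Math.max(max, gpu.usedMiB / gpu.totalMiB),
    0,
  );
  if (worstVramRatio > limits.maxVramUsedRatio) return reject("vram");

  return { allowed: true };
}
```

In a NestJS guard, a rejected decision would typically be turned into a 503 response with the `Retry-After` header set from `retryAfterSeconds`; keeping the decision itself a pure function makes it easy to unit test without spawning nvidia-smi.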
Contributor

@Cedarich Cedarich left a comment

Hello @sublime247, can you check this issue and confirm you're working in the right directory?

@sublime247
Contributor Author

Oohhhhh
Different repository🤦🏽

@Cedarich
Contributor

Fix conflict

@Cedarich
Contributor

@sublime247

Development

Successfully merging this pull request may close these issues.

Distributed Content Validation Network

2 participants