
Health Check and Status Middleware #330

@phertyameen

Labels: middleware, monitoring, devops, medium-priority

Description:

Implement health check endpoints and middleware to monitor application status, dependencies, and readiness for production traffic.

Requirements:

Create multiple health check endpoints:

  • /health (basic liveness check)
  • /health/ready (readiness check - can serve traffic)
  • /health/live (liveness check - application running)
  • /health/detailed (comprehensive status - admin only)

Check status of critical dependencies:

  • Database connectivity
  • Redis cache connection
  • External APIs (if critical)
  • File system access
  • Memory usage
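The issue does not name a stack; this sketch assumes TypeScript on Node.js. One way to model the dependency list above is a common check shape with a `critical` flag, so that a database outage can be told apart from a failing optional API. `DependencyCheck`, `pingCheck`, and `memoryCheck` are hypothetical names, and the 90% memory threshold is an assumption, not part of the issue.

```typescript
// A single dependency check. `critical` decides whether a failure makes the
// instance unhealthy (503) or only degraded.
interface DependencyCheck {
  name: string;
  critical: boolean;
  // Resolves (optionally with details) on success; rejects on failure.
  run: () => Promise<Record<string, unknown> | void>;
}

// Hypothetical adapter: wraps any connectivity probe (e.g. a database
// "SELECT 1" or a Redis PING issued through your client library) as a check.
function pingCheck(
  name: string,
  critical: boolean,
  ping: () => Promise<unknown>,
): DependencyCheck {
  return {
    name,
    critical,
    run: async () => {
      await ping();
    },
  };
}

// Memory check fed by a reading function, so it can use process.memoryUsage()
// in Node or fixed numbers in tests. The 90% threshold is an assumption.
function memoryCheck(
  read: () => { used: number; total: number },
  maxRatio = 0.9,
): DependencyCheck {
  return {
    name: "memory",
    critical: false, // assumption: high memory degrades, but does not fail, the instance
    run: async () => {
      const { used, total } = read();
      const usage = Math.round((used / total) * 100);
      if (used / total > maxRatio) {
        throw new Error(`memory usage ${usage}% exceeds ${Math.round(maxRatio * 100)}%`);
      }
      return { usage };
    },
  };
}
```

In Node, the reading function could be backed by `process.memoryUsage()` (for example `heapUsed` against `heapTotal`); which figures to compare is left open here.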

Return appropriate HTTP status codes:

  • 200 OK: All systems healthy
  • 503 Service Unavailable: Critical failure
  • 207 Multi-Status: Partial degradation (a non-critical dependency is failing)
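The status-code rules above reduce to one pure function over the individual check results. This sketch assumes each result carries a `critical` flag (a hypothetical field). Because this list says 207 for partial degradation while the acceptance criteria ask for 200 with warnings on non-critical failures, the code for the degraded case is a parameter; `aggregate` and its types are illustrative names.

```typescript
type OverallStatus = "healthy" | "degraded" | "unhealthy";

interface CheckOutcome {
  name: string;
  critical: boolean;
  status: "healthy" | "unhealthy";
}

interface Aggregate {
  status: OverallStatus;
  httpStatus: number;
  warnings: string[];
}

// Maps individual outcomes to an overall status and HTTP code:
//   any critical failure        -> unhealthy, 503
//   only non-critical failures  -> degraded, degradedHttpStatus (207 or 200)
//   no failures                 -> healthy, 200
function aggregate(
  outcomes: CheckOutcome[],
  degradedHttpStatus: 200 | 207 = 207,
): Aggregate {
  const failed = outcomes.filter((o) => o.status === "unhealthy");
  // Non-critical failures are surfaced as warnings rather than errors.
  const warnings = failed
    .filter((o) => !o.critical)
    .map((o) => `${o.name} is unhealthy`);

  if (failed.some((o) => o.critical)) {
    return { status: "unhealthy", httpStatus: 503, warnings };
  }
  if (failed.length > 0) {
    return { status: "degraded", httpStatus: degradedHttpStatus, warnings };
  }
  return { status: "healthy", httpStatus: 200, warnings };
}
```

Kubernetes HTTP probes count any status from 200 up to 399 as success, so either degraded code passes a probe; only the 503 path fails it.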

Additional requirements:

  • Include version information and uptime
  • Cache dependency health results (to avoid overwhelming dependencies with checks)
  • Provide detailed error messages in the degraded state
  • Integrate with orchestration tools (Kubernetes, Docker)
  • Log health check failures
  • Support graceful shutdown signaling
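The caching requirement can be met with a small time-to-live wrapper around each check, here using the 30-second window from the Performance section. This is a sketch, not a prescribed design: `CachedCheck` is a hypothetical name, and the injectable clock exists only to make it testable. Concurrent requests during a refresh share one in-flight call, which helps keep probes from multiplying database/cache queries.

```typescript
// Caches the result of an async check for `ttlMs` milliseconds so frequent
// probes do not query the database/cache on every request.
class CachedCheck<T> {
  private cached: { result: T; at: number } | undefined = undefined;
  private inFlight: Promise<T> | undefined = undefined;
  private readonly run: () => Promise<T>;
  private readonly ttlMs: number;
  private readonly now: () => number;

  constructor(run: () => Promise<T>, ttlMs = 30_000, now: () => number = () => Date.now()) {
    this.run = run;
    this.ttlMs = ttlMs;
    this.now = now;
  }

  get(): Promise<T> {
    // Serve the cached result while it is younger than the TTL.
    if (this.cached && this.now() - this.cached.at < this.ttlMs) {
      return Promise.resolve(this.cached.result);
    }
    // Concurrent callers during a refresh share one in-flight run.
    if (!this.inFlight) {
      this.inFlight = this.run()
        .then((result) => {
          this.cached = { result, at: this.now() };
          return result;
        })
        .finally(() => {
          // A rejected run is not cached, so the next call retries it.
          this.inFlight = undefined;
        });
    }
    return this.inFlight;
  }
}
```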

Acceptance Criteria:

  • Load balancers can determine instance health
  • Kubernetes uses health checks for pod management
  • Health checks complete in under 1 second
  • Critical dependency failures return 503
  • Non-critical failures return 200 with warnings
  • Detailed health check shows all dependency statuses
  • No excessive database/cache queries from health checks
  • Version and build information included
  • Uptime tracked and reported
  • Graceful shutdown support (return 503 when shutting down)
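For graceful shutdown, one common pattern (assumed here, not mandated by the issue) is to flip a flag when the process is asked to stop: readiness starts returning 503 so new traffic drains away, while liveness stays at 200 so the orchestrator does not restart a pod that is deliberately draining. `ShutdownState`, `readiness`, and `liveness` are hypothetical names.

```typescript
interface ProbeResult {
  httpStatus: number;
  body: { status: "healthy" | "unhealthy"; reason?: string };
}

// Process-wide flag flipped once shutdown begins.
class ShutdownState {
  private draining = false;

  beginShutdown(): void {
    this.draining = true;
  }

  get isDraining(): boolean {
    return this.draining;
  }
}

// Readiness: 503 while draining, otherwise reflects dependency readiness.
function readiness(state: ShutdownState, dependenciesReady: boolean): ProbeResult {
  if (state.isDraining) {
    return { httpStatus: 503, body: { status: "unhealthy", reason: "shutting down" } };
  }
  return dependenciesReady
    ? { httpStatus: 200, body: { status: "healthy" } }
    : { httpStatus: 503, body: { status: "unhealthy", reason: "dependencies not ready" } };
}

// Liveness: stays 200 during shutdown so the pod is not restarted mid-drain.
function liveness(): ProbeResult {
  return { httpStatus: 200, body: { status: "healthy" } };
}
```

In Node, the flag would typically be set from `process.on("SIGTERM", () => state.beginShutdown())`, followed by closing the HTTP server once in-flight requests finish; Kubernetes sends SIGTERM to containers when a pod is terminated.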

Endpoint Specifications:

GET /health

  • Simple liveness check
  • Returns 200 if application is running
  • No dependency checks
  • Used by load balancers

GET /health/ready

  • Readiness check
  • Verifies database and cache connectivity
  • Returns 200 if ready to serve traffic
  • Returns 503 if not ready

GET /health/live

  • Liveness check
  • Checks only that the application process is running
  • Returns 200 if process alive
  • Used by Kubernetes liveness probe

GET /health/detailed (Admin only)

  • Comprehensive health status
  • All dependency statuses
  • Memory and CPU metrics
  • Version and build info
  • Uptime and request counts
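The four endpoints can be dispatched by a framework-agnostic handler like the sketch below. It assumes the caller has already determined whether the requester is an admin (the issue does not say how) and injects the checks and shutdown flag through a hypothetical `HealthDeps` interface; returning 403 for non-admins and 404 for unknown paths are assumptions.

```typescript
interface Outcome {
  name: string;
  critical: boolean;
  healthy: boolean;
}

// Hypothetical dependencies injected into the handler.
interface HealthDeps {
  isDraining: () => boolean;
  readinessChecks: () => Promise<Outcome[]>; // database + cache only
  allChecks: () => Promise<Outcome[]>; // every dependency
}

interface HealthResponse {
  httpStatus: number;
  body: Record<string, unknown>;
}

async function handleHealth(path: string, isAdmin: boolean, deps: HealthDeps): Promise<HealthResponse> {
  switch (path) {
    case "/health":
    case "/health/live":
      // Answering at all proves the process is running; no dependency checks.
      return { httpStatus: 200, body: { status: "healthy" } };

    case "/health/ready": {
      if (deps.isDraining()) {
        return { httpStatus: 503, body: { status: "unhealthy", reason: "shutting down" } };
      }
      const checks = await deps.readinessChecks();
      const ready = checks.every((c) => c.healthy);
      return { httpStatus: ready ? 200 : 503, body: { status: ready ? "healthy" : "unhealthy", checks } };
    }

    case "/health/detailed": {
      if (!isAdmin) return { httpStatus: 403, body: { error: "admin only" } };
      const checks = await deps.allChecks();
      const criticalDown = checks.some((c) => c.critical && !c.healthy);
      const anyDown = checks.some((c) => !c.healthy);
      const status = criticalDown ? "unhealthy" : anyDown ? "degraded" : "healthy";
      return { httpStatus: criticalDown ? 503 : 200, body: { status, checks } };
    }

    default:
      return { httpStatus: 404, body: { error: "not found" } };
  }
}
```

Adapting this to Express, Fastify, or NestJS would mean mapping `httpStatus` and `body` onto that framework's response object.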

Response Format:

{
  status: "healthy" | "degraded" | "unhealthy",
  version: "1.0.0",
  uptime: 3600,                     // seconds
  timestamp: "<ISO 8601 string>",
  checks: {
    database: { status: "healthy", responseTime: 5 },
    redis: { status: "healthy", responseTime: 2 },
    memory: { status: "healthy", usage: 45 }    // percent
  }
}
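The response format above can be expressed as TypeScript types plus a builder that derives `uptime` and `timestamp`. This assumes uptime is whole seconds since a recorded start time and that the version string comes from the build (for example a hypothetical `APP_VERSION` environment variable); `HealthReport` and `buildReport` are illustrative names. The overall `status` is passed in rather than recomputed, so the builder stays independent of how checks are classified.

```typescript
type Status = "healthy" | "degraded" | "unhealthy";

interface CheckEntry {
  status: Status;
  responseTime?: number;
  usage?: number; // percent, for the memory check
  error?: string; // detail shown when a dependency is degraded or down
}

interface HealthReport {
  status: Status;
  version: string;
  uptime: number; // whole seconds since startup
  timestamp: string; // ISO 8601
  checks: Record<string, CheckEntry>;
}

// Assembles the documented response shape. `startedAtMs` and `nowMs` are
// epoch milliseconds (e.g. captured with Date.now()).
function buildReport(
  status: Status,
  checks: Record<string, CheckEntry>,
  version: string,
  startedAtMs: number,
  nowMs: number,
): HealthReport {
  return {
    status,
    version,
    uptime: Math.floor((nowMs - startedAtMs) / 1000),
    timestamp: new Date(nowMs).toISOString(),
    checks,
  };
}
```

Capturing `Date.now()` once at startup gives `startedAtMs`; Node's `process.uptime()`, which reports seconds since the process started, is an alternative.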

Integration:

  • Docker HEALTHCHECK directive
  • Kubernetes liveness/readiness probes
  • Load balancer health checks
  • Monitoring systems (Datadog, New Relic)

Performance:

  • Cache health check results for 30 seconds
  • Run dependency checks asynchronously (concurrently)
  • Time out individual checks after at most 5 seconds
  • Fail fast on critical failures
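The performance rules (concurrent checks, per-check timeouts, fail fast) can be combined in one runner. This sketch assumes each check is an async function flagged as critical or not; `runChecks` and `withTimeout` are hypothetical helpers built on `Promise.race` and `setTimeout`. It resolves as soon as a critical check fails, otherwise once every check has finished or timed out.

```typescript
interface DependencyCheck {
  name: string;
  critical: boolean;
  run: () => Promise<unknown>;
}

interface CheckResult {
  name: string;
  critical: boolean;
  status: "healthy" | "unhealthy";
  responseTime: number; // milliseconds
  error?: string;
}

// Rejects with a timeout error if `p` has not settled within `ms`.
function withTimeout<T>(p: Promise<T>, ms: number): Promise<T> {
  return new Promise<T>((resolve, reject) => {
    const timer = setTimeout(() => reject(new Error(`timed out after ${ms} ms`)), ms);
    p.then(
      (value) => {
        clearTimeout(timer);
        resolve(value);
      },
      (err: unknown) => {
        clearTimeout(timer);
        reject(err);
      },
    );
  });
}

// Runs all checks concurrently; each is capped at `timeoutMs` and turned into
// a result instead of throwing. Resolves early (failedFast = true) as soon as
// a critical check fails; otherwise once every check has settled.
function runChecks(
  checks: DependencyCheck[],
  timeoutMs = 5000,
): Promise<{ results: CheckResult[]; failedFast: boolean }> {
  let signalCriticalFailure: (r: CheckResult) => void = () => {};
  const criticalFailure = new Promise<CheckResult>((resolve) => {
    signalCriticalFailure = resolve;
  });

  const all = Promise.all(
    checks.map(async (check): Promise<CheckResult> => {
      const started = Date.now();
      try {
        await withTimeout(check.run(), timeoutMs);
        return { name: check.name, critical: check.critical, status: "healthy", responseTime: Date.now() - started };
      } catch (err) {
        const result: CheckResult = {
          name: check.name,
          critical: check.critical,
          status: "unhealthy",
          responseTime: Date.now() - started,
          error: err instanceof Error ? err.message : String(err),
        };
        if (check.critical) signalCriticalFailure(result);
        return result;
      }
    }),
  );

  return Promise.race([
    all.then((results) => ({ results, failedFast: false })),
    criticalFailure.then((result) => ({ results: [result], failedFast: true })),
  ]);
}
```

A 5-second per-check timeout is longer than the 1-second target in the acceptance criteria, so in practice it would be the 30-second result cache that keeps most probe responses fast.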
