feat(observability): toolkit and DB pool metrics (Trino, DataHub, S3, OAuth, database/sql) #461

@cjimti

Part of #459.

Problem

Only the MCP middleware and the apigateway outbound transport are instrumented today.
There is no visibility into Trino query volume or latency, DataHub fetch performance,
S3 operations, OAuth token issuance and refresh outcomes, or database/sql connection
pool saturation, so the rest of the platform cannot be diagnosed from metrics.

Acceptance criteria

New OTel instruments registered in pkg/observability/metrics.go:

  • Trino
    • trino_queries_total{status, query_kind}
    • trino_query_duration_seconds{query_kind}
    • trino_bytes_scanned_total{query_kind}
  • DataHub
    • datahub_requests_total{operation, status}
    • datahub_request_duration_seconds{operation}
  • S3
    • s3_operations_total{operation, status}
    • s3_operation_duration_seconds{operation}
  • OAuth
    • oauth_token_issuance_total{grant_type, status}
    • oauth_token_refresh_total{status}
    • oauth_token_refresh_duration_seconds
  • DB pool (via OTel observable gauges that tap (*sql.DB).Stats())
    • db_pool_open_connections{pool}
    • db_pool_in_use{pool}
    • db_pool_idle{pool}
    • db_pool_wait_count_total{pool}
    • db_pool_wait_duration_seconds_total{pool}
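
As a rough sketch of how the five DB pool metrics could be derived from one `(*sql.DB).Stats()` snapshot: the `poolSample` type and the `samplePool` and `exampleStats` helpers are hypothetical names used only for illustration. In the real change the values would presumably be reported from an OTel callback rather than returned as a struct.

```go
package main

import (
	"database/sql"
	"fmt"
	"time"
)

// poolSample holds one reading of the db_pool_* metrics for a single pool.
// Hypothetical type; field names mirror the metric names in this issue.
type poolSample struct {
	OpenConnections          int64   // db_pool_open_connections
	InUse                    int64   // db_pool_in_use
	Idle                     int64   // db_pool_idle
	WaitCountTotal           int64   // db_pool_wait_count_total
	WaitDurationSecondsTotal float64 // db_pool_wait_duration_seconds_total
}

// samplePool converts a database/sql stats snapshot into metric values.
// WaitCount and WaitDuration are cumulative since the pool was opened,
// which is why the corresponding metric names end in _total.
func samplePool(s sql.DBStats) poolSample {
	return poolSample{
		OpenConnections:          int64(s.OpenConnections),
		InUse:                    int64(s.InUse),
		Idle:                     int64(s.Idle),
		WaitCountTotal:           s.WaitCount,
		WaitDurationSecondsTotal: s.WaitDuration.Seconds(),
	}
}

// exampleStats returns a fixed snapshot so the sketch runs without a driver.
func exampleStats() sql.DBStats {
	return sql.DBStats{
		OpenConnections: 5,
		InUse:           3,
		Idle:            2,
		WaitCount:       7,
		WaitDuration:    1500 * time.Millisecond,
	}
}

func main() {
	fmt.Printf("%+v\n", samplePool(exampleStats()))
}
```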

Plus:

  • Status label values are bounded to: ok, client_err, server_err, upstream_err.
  • DB pool gauges register exactly once per platform start via meter.RegisterCallback.
  • Each toolkit records observations without changing public interfaces.
  • Unit tests per instrument; integration test issues a query through each toolkit
    and asserts the corresponding metric increments.
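
One way to meet the bounded-status and unchanged-interface criteria together is a thin wrapper around the existing provider. The sketch below is only an illustration: `QueryRunner`, `recorder`, `instrumented`, `runnerFunc`, the sentinel errors, and `demo` are all hypothetical names, and the in-memory `recorder` stands in for the OTel counter and histogram. How each real adapter maps its errors onto the four status values is an assumption here.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"sync"
	"time"
)

// Hypothetical sentinel errors; the real adapters may classify differently.
var (
	errBadRequest = errors.New("bad request")
	errUpstream   = errors.New("upstream unavailable")
)

// statusFor collapses any error into one of the four bounded label values.
func statusFor(err error) string {
	switch {
	case err == nil:
		return "ok"
	case errors.Is(err, errBadRequest):
		return "client_err"
	case errors.Is(err, errUpstream):
		return "upstream_err"
	default:
		return "server_err"
	}
}

// QueryRunner is a hypothetical stand-in for a toolkit's existing interface.
type QueryRunner interface {
	Run(ctx context.Context, query string) error
}

// recorder is an in-memory stand-in for a counter plus a duration histogram.
type recorder struct {
	mu        sync.Mutex
	counts    map[string]int
	durations []time.Duration
}

func (r *recorder) observe(status string, d time.Duration) {
	r.mu.Lock()
	defer r.mu.Unlock()
	r.counts[status]++
	r.durations = append(r.durations, d)
}

// instrumented wraps a QueryRunner and satisfies the same interface,
// so callers do not need to change.
type instrumented struct {
	next QueryRunner
	rec  *recorder
}

func (i instrumented) Run(ctx context.Context, query string) error {
	start := time.Now()
	err := i.next.Run(ctx, query)
	i.rec.observe(statusFor(err), time.Since(start))
	return err
}

// runnerFunc adapts a plain function to QueryRunner.
type runnerFunc func(ctx context.Context, query string) error

func (f runnerFunc) Run(ctx context.Context, query string) error { return f(ctx, query) }

// demo sends four calls with scripted outcomes through the wrapper
// and returns the recorded per-status counts.
func demo() map[string]int {
	rec := &recorder{counts: map[string]int{}}
	results := []error{nil, fmt.Errorf("parse: %w", errBadRequest), errUpstream, errors.New("boom")}
	i := 0
	var r QueryRunner = instrumented{rec: rec, next: runnerFunc(func(context.Context, string) error {
		err := results[i]
		i++
		return err
	})}
	for range results {
		_ = r.Run(context.Background(), "SELECT 1")
	}
	return rec.counts
}

func main() {
	fmt.Println(demo())
}
```

Because unknown errors fall through to `server_err`, the status label cannot grow beyond the four listed values regardless of what the wrapped provider returns.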

Implementation notes

  • Trino: instrument the query execution provider in pkg/query/trino/adapter.go.
  • DataHub: instrument the semantic provider in pkg/semantic/datahub/adapter.go.
  • S3: instrument the S3 toolkit call sites.
  • OAuth: instrument pkg/oauth/server.go token endpoints.
  • DB pool: add RegisterDBPool(*sql.DB, name string) on *observability.Metrics, called once per managed DB handle from cmd/mcp-data-platform/main.go. The pool label distinguishes the platform DB, the OAuth store, and the audit store if they are separate handles.
  • All label cardinality is bounded by the set of configured connections and operations.
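
The once-per-handle registration described for the DB pool could be guarded by pool name, roughly as sketched below. This is not the proposed implementation: `Metrics` here is a local stand-in for `*observability.Metrics`, the `error` return value and the `InUseByPool` helper are assumptions, and the `statsSource` interface (which `*sql.DB` satisfies) replaces the concrete `*sql.DB` parameter only so the sketch runs without a database driver.

```go
package main

import (
	"database/sql"
	"errors"
	"fmt"
	"sync"
)

// statsSource is satisfied by *sql.DB. Accepting an interface instead of
// *sql.DB is an assumption made so this sketch needs no driver.
type statsSource interface {
	Stats() sql.DBStats
}

// Metrics is a hypothetical stand-in for *observability.Metrics.
type Metrics struct {
	mu    sync.Mutex
	pools map[string]statsSource
}

var errPoolRegistered = errors.New("db pool already registered")

// RegisterDBPool stores the pool under its label and rejects duplicates,
// so each pool's gauges can only be registered once per process.
func (m *Metrics) RegisterDBPool(db statsSource, name string) error {
	m.mu.Lock()
	defer m.mu.Unlock()
	if m.pools == nil {
		m.pools = make(map[string]statsSource)
	}
	if _, dup := m.pools[name]; dup {
		return fmt.Errorf("%w: %q", errPoolRegistered, name)
	}
	m.pools[name] = db
	return nil
}

// InUseByPool reads the in-use connection count for every registered pool,
// keyed by pool label. In the real change this read would presumably happen
// inside the callback passed to meter.RegisterCallback.
func (m *Metrics) InUseByPool() map[string]int {
	m.mu.Lock()
	defer m.mu.Unlock()
	out := make(map[string]int, len(m.pools))
	for name, db := range m.pools {
		out[name] = db.Stats().InUse
	}
	return out
}

// fakePool returns fixed stats in place of a live connection pool.
type fakePool sql.DBStats

func (f fakePool) Stats() sql.DBStats { return sql.DBStats(f) }

func main() {
	m := &Metrics{}
	_ = m.RegisterDBPool(fakePool{InUse: 2}, "platform")
	_ = m.RegisterDBPool(fakePool{InUse: 1}, "oauth")
	err := m.RegisterDBPool(fakePool{}, "platform") // duplicate label
	fmt.Println(m.InUseByPool(), err)
}
```

Keying the guard on the pool label also keeps the `pool` label set equal to the set of registered handles, which is what bounds its cardinality.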

Files

  • pkg/observability/metrics.go
  • pkg/query/trino/adapter.go
  • pkg/semantic/datahub/adapter.go
  • S3 toolkit adapters
  • pkg/oauth/server.go
  • cmd/mcp-data-platform/main.go

Labels: enhancement
