Part of #459.
Problem
Only the MCP middleware and the apigateway outbound transport are instrumented today.
There is no visibility into Trino query volume or latency, DataHub fetch performance,
S3 operations, OAuth token issuance and refresh outcomes, or database/sql connection
pool saturation. The rest of the platform cannot be diagnosed from metrics.
Acceptance criteria
New OTel instruments registered in pkg/observability/metrics.go:
- Trino
  - trino_queries_total{status, query_kind}
  - trino_query_duration_seconds{query_kind}
  - trino_bytes_scanned_total{query_kind}
- DataHub
  - datahub_requests_total{operation, status}
  - datahub_request_duration_seconds{operation}
- S3
  - s3_operations_total{operation, status}
  - s3_operation_duration_seconds{operation}
- OAuth
  - oauth_token_issuance_total{grant_type, status}
  - oauth_token_refresh_total{status}
  - oauth_token_refresh_duration_seconds
- DB pool (via OTel observable gauges that tap (*sql.DB).Stats())
  - db_pool_open_connections{pool}
  - db_pool_in_use{pool}
  - db_pool_idle{pool}
  - db_pool_wait_count_total{pool}
  - db_pool_wait_duration_seconds_total{pool}
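For the DB pool group, a rough sketch of how one (*sql.DB).Stats() snapshot could map onto the five db_pool_* series. The names StatsSource, StaticStats, PoolSample and SamplePool are hypothetical and not part of the codebase; in the real change the values would be reported from inside the OTel callback rather than returned as a slice. Note that WaitCount and WaitDuration are cumulative in database/sql, so the two _total series may fit an observable counter better than a gauge; that choice is left open here.

```go
package main

import "database/sql"

// StatsSource is a hypothetical seam satisfied by *sql.DB, so the mapping
// can be exercised without opening a real connection.
type StatsSource interface {
	Stats() sql.DBStats
}

// Compile-time check that *sql.DB fits the seam.
var _ StatsSource = (*sql.DB)(nil)

// StaticStats is a hypothetical test helper that returns a fixed snapshot.
type StaticStats sql.DBStats

// Stats returns the fixed snapshot unchanged.
func (s StaticStats) Stats() sql.DBStats { return sql.DBStats(s) }

// PoolSample is one observation for a db_pool_* instrument.
type PoolSample struct {
	Name  string  // metric name from the acceptance criteria
	Pool  string  // value of the pool label
	Value float64 // observed value
}

// SamplePool takes one snapshot and maps it onto the five db_pool_* series.
func SamplePool(pool string, src StatsSource) []PoolSample {
	s := src.Stats()
	return []PoolSample{
		{Name: "db_pool_open_connections", Pool: pool, Value: float64(s.OpenConnections)},
		{Name: "db_pool_in_use", Pool: pool, Value: float64(s.InUse)},
		{Name: "db_pool_idle", Pool: pool, Value: float64(s.Idle)},
		// WaitCount and WaitDuration are cumulative since the pool was opened.
		{Name: "db_pool_wait_count_total", Pool: pool, Value: float64(s.WaitCount)},
		{Name: "db_pool_wait_duration_seconds_total", Pool: pool, Value: s.WaitDuration.Seconds()},
	}
}
```

Taking a single snapshot per callback keeps the five values mutually consistent, since each Stats() call reads the pool state at one point in time.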
Plus:
- Status labels bounded: ok, client_err, server_err, upstream_err.
- DB pool gauges register exactly once per platform start via meter.RegisterCallback.
- Each toolkit records observations without changing public interfaces.
- Unit tests per instrument; integration test issues a query through each toolkit
and asserts the corresponding metric increments.
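One possible way to keep the status label bounded to the four values above is to funnel every outcome through a single classifier. This is a sketch only: ClassifiedError and StatusLabel are hypothetical names, and the rule that unclassified errors fall back to server_err is an assumption, not something this issue specifies.

```go
package main

import "errors"

// The four allowed values of the status label.
const (
	StatusOK          = "ok"
	StatusClientErr   = "client_err"
	StatusServerErr   = "server_err"
	StatusUpstreamErr = "upstream_err"
)

// ClassifiedError is a hypothetical wrapper that lets a call site tag an
// error with the status it should be counted under.
type ClassifiedError struct {
	Status string
	Err    error
}

// Error implements the error interface.
func (e *ClassifiedError) Error() string {
	if e.Err == nil {
		return e.Status
	}
	return e.Status + ": " + e.Err.Error()
}

// Unwrap exposes the wrapped error to errors.Is and errors.As.
func (e *ClassifiedError) Unwrap() error { return e.Err }

// StatusLabel maps any error onto one of the four allowed label values.
// Unknown or untagged errors are counted as server_err (an assumption),
// so a stray string can never create a new label value.
func StatusLabel(err error) string {
	if err == nil {
		return StatusOK
	}
	var ce *ClassifiedError
	if errors.As(err, &ce) && ce != nil {
		switch ce.Status {
		case StatusClientErr, StatusServerErr, StatusUpstreamErr:
			return ce.Status
		}
	}
	return StatusServerErr
}
```

Because the classifier validates the tag against the allowed set, a typo at a call site degrades to server_err instead of widening label cardinality.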
Implementation notes
- Trino: instrument the query execution provider in pkg/query/trino/adapter.go.
- DataHub: instrument the semantic provider in pkg/semantic/datahub/adapter.go.
- S3: instrument the S3 toolkit call sites.
- OAuth: instrument pkg/oauth/server.go token endpoints.
- DB pool: add RegisterDBPool(*sql.DB, name string) on *observability.Metrics, called once per managed DB handle from cmd/mcp-data-platform/main.go. The pool label distinguishes platform DB vs OAuth store vs audit store if separate.
- All label cardinality bounded by configured connections / operations.
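As an illustration of the Trino note, the provider could be wrapped in a decorator that records count and duration without touching the public interface. Everything named here is hypothetical: QueryExecutor stands in for whatever interface the adapter actually exposes, Recorder stands in for the OTel instruments, and the query_kind buckets (select, write, ddl, metadata, other) are an assumed set chosen only to show bounded cardinality.

```go
package main

import (
	"context"
	"strings"
	"time"
)

// QueryExecutor is a hypothetical stand-in for the Trino provider interface.
type QueryExecutor interface {
	Execute(ctx context.Context, query string) error
}

// Recorder is a hypothetical seam over the trino_* instruments.
type Recorder interface {
	ObserveQuery(kind, status string, elapsed time.Duration)
}

// instrumented wraps a QueryExecutor and records one observation per call.
type instrumented struct {
	next QueryExecutor
	rec  Recorder
}

// Instrument returns a QueryExecutor with the same interface as next.
func Instrument(next QueryExecutor, rec Recorder) QueryExecutor {
	return &instrumented{next: next, rec: rec}
}

// Execute delegates to the wrapped executor and records kind, status and latency.
func (i *instrumented) Execute(ctx context.Context, query string) error {
	start := time.Now()
	err := i.next.Execute(ctx, query)
	status := "ok"
	if err != nil {
		// Simplification: a real mapping would distinguish the error classes.
		status = "server_err"
	}
	i.rec.ObserveQuery(QueryKind(query), status, time.Since(start))
	return err
}

// QueryKind buckets a statement by its first keyword into a fixed set,
// so the query_kind label cannot grow with user input.
func QueryKind(query string) string {
	fields := strings.Fields(query)
	if len(fields) == 0 {
		return "other"
	}
	switch strings.ToLower(fields[0]) {
	case "select", "with":
		return "select"
	case "insert", "update", "delete", "merge":
		return "write"
	case "create", "alter", "drop":
		return "ddl"
	case "show", "describe", "explain":
		return "metadata"
	default:
		return "other"
	}
}
```

Callers keep depending on the same interface, so the wrapper can be applied where the provider is constructed and removed again without changes elsewhere.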
Files
- pkg/observability/metrics.go
- pkg/query/trino/adapter.go
- pkg/semantic/datahub/adapter.go
- S3 toolkit adapters
- pkg/oauth/server.go
- cmd/mcp-data-platform/main.go