-
Notifications
You must be signed in to change notification settings - Fork 44
feat(dashmate): add prometheus service discovery labels #2818
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: v2.2-dev
Are you sure you want to change the base?
Conversation
WalkthroughAdded Prometheus metrics discovery and scraping support to Dashmate services via Docker labels. Removed validation requiring gateway admin enablement for metrics. Added documentation for local Prometheus monitoring setup and updated service documentation to reflect automatic admin endpoint enablement when metrics are active. Changes
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~20 minutes
Poem
Pre-merge checks and finishing touches❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✨ Finishing touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
e7aac7d to
4e1c115
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 2
🧹 Nitpick comments (3)
packages/dashmate/test/unit/templates/envoyTemplate.spec.js (1)
20-22: Consider more robust YAML parsing for assertions.String matching with
include()is fragile and may fail if YAML formatting changes (indentation, line breaks, etc.). Consider parsing the YAML and asserting on the structure.+import yaml from 'js-yaml'; + describe('envoy template', () => { it('should render admin interface when metrics are enabled even if admin is disabled', () => { const getBaseConfig = getBaseConfigFactory(HomeDir.createTemp()); const config = getBaseConfig(); config.set('platform.gateway.metrics.enabled', true); config.set('platform.gateway.admin.enabled', false); const renderTemplate = renderTemplateFactory(); const renderServiceTemplates = renderServiceTemplatesFactory(renderTemplate); const renderedConfigs = renderServiceTemplates(config); const envoyConfig = renderedConfigs['platform/gateway/envoy.yaml']; + const parsed = yaml.load(envoyConfig); + + const adminCluster = parsed.static_resources.clusters.find(c => c.name === 'admin'); + expect(adminCluster).to.exist; + expect(adminCluster.load_assignment.endpoints[0].lb_endpoints[0].endpoint.address.socket_address.address).to.equal('127.0.0.1'); + expect(adminCluster.load_assignment.endpoints[0].lb_endpoints[0].endpoint.address.socket_address.port_value).to.equal(9901); - expect(envoyConfig).to.include('cluster_name: admin'); - expect(envoyConfig).to.include('address: 127.0.0.1'); - expect(envoyConfig).to.include('port_value: 9901'); }); });packages/dashmate/test/unit/templates/dynamicCompose.spec.js (2)
28-28: Clarify the purpose of checking for ':0'.The assertion
expect(rsDapiBlock).to.not.include(':0')is unclear. What specific configuration are you checking for? Consider adding a comment or using a more explicit assertion.If this is checking that no port mapping exists with port 0, consider making it explicit:
expect(rsDapiBlock).to.not.include('ports:'); - expect(rsDapiBlock).to.not.include(':0'); + // Verify no port mappings are present (no format like 'host:port:container') + expect(rsDapiBlock).to.not.match(/\d+:\d+:\d+/);
41-42: Verify port exposure format consistency.Line 41 checks for the full port mapping format, but line 42 checks for a port entry that might match other content. Consider whether both assertions are necessary or if line 42 should be more specific.
expect(rsDapiBlock).to.include('ports:\n - 127.0.0.1:29091:29091'); - expect(rsDapiBlock).to.include('- 29091'); + // The port should also be listed in the expose section + expect(rsDapiBlock).to.match(/expose:\s+- 29091/);
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (3)
.yarn/cache/fsevents-patch-19706e7e35-10.zipis excluded by!**/.yarn/**,!**/*.zippackages/dashmate/templates/dynamic-compose.yml.dotis excluded by!**/*.dotpackages/dashmate/templates/platform/gateway/envoy.yaml.dotis excluded by!**/*.dot
📒 Files selected for processing (8)
packages/dashmate/docker-compose.rate_limiter.metrics.yml(1 hunks)packages/dashmate/docker-compose.yml(4 hunks)packages/dashmate/docs/config/gateway.md(1 hunks)packages/dashmate/docs/services/gateway.md(1 hunks)packages/dashmate/docs/services/index.md(1 hunks)packages/dashmate/src/doctor/analyse/analyseConfigFactory.js(0 hunks)packages/dashmate/test/unit/templates/dynamicCompose.spec.js(1 hunks)packages/dashmate/test/unit/templates/envoyTemplate.spec.js(1 hunks)
💤 Files with no reviewable changes (1)
- packages/dashmate/src/doctor/analyse/analyseConfigFactory.js
🧰 Additional context used
📓 Path-based instructions (1)
packages/**/**/*.{js,ts,jsx,tsx}
📄 CodeRabbit inference engine (AGENTS.md)
packages/**/**/*.{js,ts,jsx,tsx}: Adhere to ESLint with Airbnb/TypeScript configs for JS/TS code
Use camelCase for JS/TS variables and functions
Use PascalCase for JS/TS classes
Prefer kebab-case filenames within JS packages
Files:
packages/dashmate/test/unit/templates/envoyTemplate.spec.jspackages/dashmate/test/unit/templates/dynamicCompose.spec.js
🧬 Code graph analysis (2)
packages/dashmate/test/unit/templates/envoyTemplate.spec.js (3)
packages/dashmate/configs/defaults/getBaseConfigFactory.js (1)
getBaseConfigFactory(16-480)packages/dashmate/src/templates/renderTemplateFactory.js (1)
renderTemplateFactory(8-30)packages/dashmate/src/templates/renderServiceTemplatesFactory.js (1)
renderServiceTemplatesFactory(7-37)
packages/dashmate/test/unit/templates/dynamicCompose.spec.js (3)
packages/dashmate/configs/defaults/getBaseConfigFactory.js (1)
getBaseConfigFactory(16-480)packages/dashmate/src/templates/renderTemplateFactory.js (1)
renderTemplateFactory(8-30)packages/dashmate/src/templates/renderServiceTemplatesFactory.js (1)
renderServiceTemplatesFactory(7-37)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (6)
- GitHub Check: Build Docker images (RS-DAPI, rs-dapi, rs-dapi) / Build RS-DAPI image
- GitHub Check: Build Docker images (Drive, drive, drive-abci) / Build Drive image
- GitHub Check: Build Docker images (DAPI, dapi, dapi) / Build DAPI image
- GitHub Check: Build Docker images (Dashmate helper, dashmate-helper, dashmate-helper) / Build Dashmate helper image
- GitHub Check: Build JS packages / Build JS
- GitHub Check: Rust crates security audit
🔇 Additional comments (8)
packages/dashmate/docs/services/index.md (1)
122-122: LGTM! Clear documentation of the config name label.The addition clearly explains the purpose of the
org.dashmate.config.namelabel and provides useful context for multi-node scenarios where distinguishing instances is important.packages/dashmate/docs/config/gateway.md (1)
57-57: LGTM! Clear explanation of the admin/metrics interaction.The documentation clearly explains that the admin endpoint is automatically enabled for metrics proxying while clarifying the exposure behavior based on the admin setting.
packages/dashmate/docker-compose.yml (4)
57-60: LGTM! Prometheus labels added correctly to drive_abci.The labels follow Prometheus service discovery conventions and properly reference environment variables with appropriate defaults.
118-121: LGTM! Prometheus labels added correctly to drive_tenderdash.The labels are consistent with other services and properly configured.
207-210: LGTM! Prometheus labels added correctly to rs_dapi.The labels follow the same pattern as other services and are properly configured.
251-254: LGTM! Prometheus labels added correctly to gateway.The labels are consistent with other services and properly configured for Prometheus service discovery.
packages/dashmate/docker-compose.rate_limiter.metrics.yml (1)
15-18: LGTM! Prometheus labels added correctly to rate limiter metrics.The labels are consistent with the main docker-compose.yml changes and follow the same pattern for Prometheus service discovery.
packages/dashmate/docs/services/gateway.md (1)
264-264: LGTM! Clear documentation of admin/metrics behavior.The note clearly explains the automatic admin endpoint enablement and exposure behavior, consistent with the configuration documentation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Nitpick comments (1)
docs/prometheus/docker-compose.yml (1)
5-5: Pin container image versions instead of usinglatesttag.Using
latestimage tags makes deployments non-reproducible and can introduce unexpected breaking changes when new image versions are released. For documentation examples, pinning specific versions ensures consistent behavior across different deployments and time periods.Apply this diff to pin specific versions:
- image: tecnativa/docker-socket-proxy:latest + image: tecnativa/docker-socket-proxy:0.5.0- image: prom/prometheus:latest + image: prom/prometheus:v2.52.0Consider documenting which versions were tested with this setup, or providing a note on how to find the latest stable versions.
Also applies to: 17-17
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (3)
docs/prometheus/README.md(1 hunks)docs/prometheus/docker-compose.yml(1 hunks)docs/prometheus/prometheus.yml(1 hunks)
✅ Files skipped from review due to trivial changes (1)
- docs/prometheus/README.md
🧰 Additional context used
📓 Path-based instructions (1)
docs/**
📄 CodeRabbit inference engine (AGENTS.md)
Store project documentation in docs/
Files:
docs/prometheus/prometheus.ymldocs/prometheus/docker-compose.yml
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (6)
- GitHub Check: Build Docker images (Drive, drive, drive-abci) / Build Drive image
- GitHub Check: Build Docker images (DAPI, dapi, dapi) / Build DAPI image
- GitHub Check: Build Docker images (RS-DAPI, rs-dapi, rs-dapi) / Build RS-DAPI image
- GitHub Check: Build Docker images (Dashmate helper, dashmate-helper, dashmate-helper) / Build Dashmate helper image
- GitHub Check: Build JS packages / Build JS
- GitHub Check: Rust crates security audit
🔇 Additional comments (5)
docs/prometheus/docker-compose.yml (2)
19-19: Clarifynetwork_mode: hostdesign decision.Using
network_mode: hostmakes the Prometheus container bind directly to the host network, which means Prometheus will listen on port 9080 across all host interfaces (not just the container's network). This can affect portability and security posture in multi-container environments.Verify this is the intended behavior; if Prometheus should only be accessible locally, consider using bridge networking with explicit port bindings like
ports: ["127.0.0.1:9080:9080"].
1-33: Good security practices for Docker socket access.The docker_socket_proxy setup correctly implements principle of least privilege by:
- Mounting the Docker socket read-only
- Setting minimal environment permissions (CONTAINERS=1, NETWORKS=1)
- Binding to localhost only (127.0.0.1:2375)
This is a solid foundation for local service discovery.
docs/prometheus/prometheus.yml (3)
20-24: Address replacement hardcodes127.0.0.1for all targets.The relabeling rule replaces the scrape address to always use
127.0.0.1:<port>. This assumes all services expose metrics on localhost, which may not hold for services configured with different network modes or multi-host setups.Verify that all Dashmate services are configured to be reachable on
127.0.0.1on their respectiveprometheus.io/portlabels. If some services expose metrics differently, you may need conditional relabeling or additional configuration.
14-30: Relabeling rules assume source labels are present.The relabeling rules for
prometheus.io/path,prometheus.io/port,org.dashmate.service_title, andorg.dashmate.config_nameuse regex-based extraction without defaults or fallback behavior. If any of these labels are missing from a container, the rules will silently fail to apply, potentially leaving targets misconfigured or unlabeled.Verify that all services intended for scraping include the required Docker labels, or add explicit relabeling rules with defaults (e.g., using
target_labelwithout source to set defaults).
1-30: Configuration structure is sound for Docker service discovery.The Prometheus configuration correctly:
- Uses Docker daemon at the docker_socket_proxy endpoint
- Implements opt-in scraping via
prometheus.io/scrape=truelabel filter- Maps container labels to Prometheus target labels for observability (
service,config)- Allows per-service customization of metrics path and port
This provides a good foundation for discovering and scraping metrics from Dashmate services.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
🧹 Nitpick comments (2)
packages/dashmate/docs/prometheus/docker-compose.yml (2)
18-18: Clarify container naming convention.The container name
prometheus-rsdapiis unclear and inconsistent with the service focus on dashmate. Consider renaming toprometheus-dashmateor simplyprometheusfor clarity.Apply this diff to improve naming clarity:
prometheus: - container_name: prometheus-rsdapi + container_name: prometheus-dashmate
19-19: Document or reconsidernetwork_mode: hostsecurity implications.Using
network_mode: hostexposes Prometheus on all network interfaces (by default, port 9080). For a local development setup, this may be acceptable, but consider:
- Adding a note in accompanying documentation about security implications
- Alternatively, binding to
127.0.0.1:9080explicitly in the command args for defense-in-depthIf restricting to localhost is desired, you could adjust the command to bind explicitly:
command: - --config.file=/etc/prometheus/prometheus.yml - --storage.tsdb.path=/prometheus - --web.enable-lifecycle - - --web.listen-address=:9080 + - --web.listen-address=127.0.0.1:9080Alternatively, document this as a local-development-only setup in a README or comment.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (4)
packages/dashmate/docs/prometheus/README.md(1 hunks)packages/dashmate/docs/prometheus/docker-compose.yml(1 hunks)packages/dashmate/docs/prometheus/prometheus.yml(1 hunks)packages/dashmate/docs/services/index.md(1 hunks)
✅ Files skipped from review due to trivial changes (1)
- packages/dashmate/docs/prometheus/README.md
🚧 Files skipped from review as they are similar to previous changes (1)
- packages/dashmate/docs/services/index.md
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (6)
- GitHub Check: Build Docker images (RS-DAPI, rs-dapi, rs-dapi) / Build RS-DAPI image
- GitHub Check: Build Docker images (Drive, drive, drive-abci) / Build Drive image
- GitHub Check: Build Docker images (Dashmate helper, dashmate-helper, dashmate-helper) / Build Dashmate helper image
- GitHub Check: Build JS packages / Build JS
- GitHub Check: Rust crates security audit
- GitHub Check: Determine changed packages
🔇 Additional comments (1)
packages/dashmate/docs/prometheus/prometheus.yml (1)
1-30: Configuration looks good for local development Docker SD setup.The Prometheus configuration correctly sets up Docker service discovery with appropriate relabeling rules to map container labels to scrape targets. The hardcoded
127.0.0.1:2375is suitable for the local socket proxy documented in the accompanyingdocker-compose.yml. The relabeling rules cleanly map the expected label structure (prometheus_io_, org.dashmate_) to Prometheus targets and labels.
| const problems = []; | ||
|
|
||
| if (config?.get('platform.enable')) { | ||
| // Gateway admin is disabled while metrics are enabled |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do you remove this check?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This case will never happen, see https://github.com/dashpay/platform/pull/2818/files#r2472878855
| port_value: 8081 | ||
| {{?}} | ||
| {{? it.platform.gateway.metrics.enabled && it.platform.gateway.admin.enabled }} | ||
| {{? it.platform.gateway.metrics.enabled }} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do you remove this condition?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Metrics without admin don't work.
Assuming the intention of an user who enabled metrics is to get them really working ;), I've changed logic so that when metrics are enabled, always enable admin automatically but don't expose it to public.
See line 546-550 below. When admin is not enabled, adminHost will be 127.0.0.1, what means it will listen on loopback of docker container (not docker host).
Issue being fixed or feature implemented
When testing rs-dapi, we want to see some metrics generated in prometheus format.
What was done?
Extended dashmate-generated Docker containers with labels that enable Prometheus metrics.
Automatically start envoy
adminendpoint when envoy metrics are enabled.How Has This Been Tested?
Local devnet
Breaking Changes
None
Checklist:
For repository code-owners and collaborators only
Summary by CodeRabbit
New Features
Documentation
Tests