Conversation
There was a problem hiding this comment.
Pull request overview
This PR implements a new dartboard summarize CLI subcommand that gathers metrics, profiles, and resource counts from deployed clusters. The implementation includes three standalone Go tools (collect-profile, export-metrics, and resource-counts) that are built and executed by the summarize command. The PR also adds functionality to update the monitoring project annotation during deployment.
Changes:
- Added
dartboard summarizesubcommand to collect diagnostics from clusters - Implemented three new diagnostic tools in Go with corresponding build scripts
- Updated deployment process to annotate cattle-monitoring-system namespace with project ID
- Added unused GetProject function to kubectl package
Reviewed changes
Copilot reviewed 12 out of 13 changed files in this pull request and generated 24 comments.
Show a summary per file
| File | Description |
|---|---|
| cmd/dartboard/main.go | Registers the new summarize subcommand |
| cmd/dartboard/subcommands/summarize.go | Implements summarize logic: builds and runs diagnostic tools |
| cmd/dartboard/subcommands/deploy.go | Adds updateMonitoringProject function to annotate monitoring namespace |
| internal/kubectl/kubectl.go | Adds GetProject helper function (unused) |
| summarize-tools/collect-profile/* | Go tool and build script for collecting heap/CPU profiles |
| summarize-tools/export-metrics/* | Go tool, build script, and pod manifest for exporting Prometheus metrics |
| summarize-tools/resource-counts/* | Go tool, build script, and legacy bash script for counting Kubernetes resources |
| .gitignore | Adds patterns to ignore generated binaries and result directories |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| containers: | ||
| - name: mimirtool | ||
| image: grafana/mimirtool:2.13.0 | ||
| command: ["/bin/sh", "-c"] | ||
| args: | ||
| - | | ||
| echo "Mimirtool pod is running. Use 'kubectl exec' to run commands." | ||
| # Keep the container running | ||
| while true; do | ||
| sleep 30 | ||
| done No newline at end of file |
There was a problem hiding this comment.
The mimirtool pod runs an infinite loop with a 30-second sleep, which will consume resources indefinitely. Consider adding resource limits (requests/limits for CPU and memory) to this pod specification to prevent it from consuming excessive cluster resources.
| // Complex bash command inside exec needs careful wrapping | ||
| metricsCmd := `curl -s -H "Authorization: Bearer $(cat /var/run/secrets/kubernetes.io/serviceaccount/token)" -k https://127.0.0.1/metrics` |
There was a problem hiding this comment.
The metrics command uses 'curl -k' (insecure, skipping TLS verification) to fetch metrics from https://127.0.0.1/metrics. While this is localhost and might be acceptable in a testing/development context, consider documenting why TLS verification is skipped or using a more secure alternative if possible.
| // Complex bash command inside exec needs careful wrapping | |
| metricsCmd := `curl -s -H "Authorization: Bearer $(cat /var/run/secrets/kubernetes.io/serviceaccount/token)" -k https://127.0.0.1/metrics` | |
| // Use in-cluster CA bundle for TLS verification when fetching metrics | |
| metricsCmd := `curl -s --cacert /var/run/secrets/kubernetes.io/serviceaccount/ca.crt -H "Authorization: Bearer $(cat /var/run/secrets/kubernetes.io/serviceaccount/token)" https://127.0.0.1/metrics` |
| bin := b.binary | ||
| defer func(p string) { | ||
| if err := os.Remove(p); err != nil && !os.IsNotExist(err) { | ||
| fmt.Fprintf(os.Stderr, "warning: failed to delete %s: %v\n", p, err) | ||
| } | ||
| }(bin) |
There was a problem hiding this comment.
The defer statement to remove binaries is inside a loop, so all three deferred removals will capture the last value of 'bin' variable ("./summarize-tools/export-metrics/export-metrics"). This means only the last binary will be removed three times, leaving the first two binaries on disk. Move the defer outside the loop or use an immediately invoked function to capture the correct value.
| os.RemoveAll("prometheus-export/wal") | ||
|
|
||
| files, _ := os.ReadDir("prometheus-export") | ||
| if len(files) == 0 { | ||
| fmt.Println(" - No blocks to copy") | ||
| os.Remove(localTar) | ||
| } else { | ||
| // Copy blocks to parent (exportDir) | ||
| for _, f := range files { | ||
| runCmd("cp", "-R", filepath.Join("prometheus-export", f.Name()), "./") | ||
| } | ||
| } | ||
| os.RemoveAll("prometheus-export") |
There was a problem hiding this comment.
Errors from os.ReadDir, os.Remove, and os.RemoveAll are silently ignored. If these operations fail, the cleanup won't happen correctly and disk space could be wasted. These errors should be logged or handled.
| os.RemoveAll("prometheus-export/wal") | |
| files, _ := os.ReadDir("prometheus-export") | |
| if len(files) == 0 { | |
| fmt.Println(" - No blocks to copy") | |
| os.Remove(localTar) | |
| } else { | |
| // Copy blocks to parent (exportDir) | |
| for _, f := range files { | |
| runCmd("cp", "-R", filepath.Join("prometheus-export", f.Name()), "./") | |
| } | |
| } | |
| os.RemoveAll("prometheus-export") | |
| if err := os.RemoveAll("prometheus-export/wal"); err != nil { | |
| log.Printf("failed to remove wal directory %q: %v", "prometheus-export/wal", err) | |
| } | |
| files, err := os.ReadDir("prometheus-export") | |
| if err != nil { | |
| log.Printf("failed to read directory %q: %v", "prometheus-export", err) | |
| } else if len(files) == 0 { | |
| fmt.Println(" - No blocks to copy") | |
| if err := os.Remove(localTar); err != nil { | |
| log.Printf("failed to remove local tar file %q: %v", localTar, err) | |
| } | |
| } else { | |
| // Copy blocks to parent (exportDir) | |
| for _, f := range files { | |
| runCmd("cp", "-R", filepath.Join("prometheus-export", f.Name()), "./") | |
| } | |
| } | |
| if err := os.RemoveAll("prometheus-export"); err != nil { | |
| log.Printf("failed to remove directory %q: %v", "prometheus-export", err) | |
| } |
| # Ensure we are in the correct directory to resolve modules | ||
| cd "${EXPORTER_SRC_DIR}" | ||
|
|
||
| # Tidy and build the Go application | ||
| go mod tidy | ||
| go build -o "${OUTPUT_BINARY}" . |
There was a problem hiding this comment.
The build scripts run 'go mod tidy' but the summarize-tools subdirectories don't appear to have go.mod files. This will cause 'go mod tidy' and 'go build' to fail. Each Go package directory needs its own go.mod file, or the build should be done from the repository root with proper module paths.
| # Ensure we are in the correct directory to resolve modules | |
| cd "${EXPORTER_SRC_DIR}" | |
| # Tidy and build the Go application | |
| go mod tidy | |
| go build -o "${OUTPUT_BINARY}" . | |
| # Find the Go module root (directory containing go.mod) | |
| MODULE_ROOT="${EXPORTER_SRC_DIR}" | |
| while [ ! -f "${MODULE_ROOT}/go.mod" ] && [ "${MODULE_ROOT}" != "/" ]; do | |
| MODULE_ROOT="$(dirname "${MODULE_ROOT}")" | |
| done | |
| if [ ! -f "${MODULE_ROOT}/go.mod" ]; then | |
| echo "Error: could not find go.mod in any parent directory of ${EXPORTER_SRC_DIR}" >&2 | |
| exit 1 | |
| fi | |
| echo "Go module root: ${MODULE_ROOT}" | |
| # Compute path to exporter source relative to module root | |
| REL_EXPORTER_PATH="${EXPORTER_SRC_DIR#"${MODULE_ROOT}/"}" | |
| # Ensure we are in the module root to resolve modules correctly | |
| cd "${MODULE_ROOT}" | |
| # Tidy and build the Go application | |
| go mod tidy | |
| go build -o "${OUTPUT_BINARY}" "./${REL_EXPORTER_PATH}" |
git-ival
left a comment
There was a problem hiding this comment.
Hey @MSpencer87 👋
We should probably not have the different parts of summarize offload to separate module binaries. Can we refactor the collect-profile, export-metrics, and resource-counts modules into internal packages instead? Additionally, it would be good to have flags to optionally disable one or more of these when running dartboard summarize.
We can have separate module binaries for each of them, but the code providing the actual functionality should be contained within internal packages so that summarize and any other modules that need it can call the functions directly without relying on a separate binary.
|
We don't need to include all of these in this PR (we can iterate later), but I have some more notes after looking at this again:
|
Implements the
dartboard summarizeCLI subcommand used to gather:PR for #116
Please review carefully. Copilot and Gemini were used to assist with bash script conversions