Skip to content

Add dartboard summarize subcommand#157

Open
MSpencer87 wants to merge 5 commits intorancher:mainfrom
MSpencer87:feature-summarize
Open

Add dartboard summarize subcommand#157
MSpencer87 wants to merge 5 commits intorancher:mainfrom
MSpencer87:feature-summarize

Conversation

@MSpencer87
Copy link
Contributor

Implements the dartboard summarize CLI subcommand used to gather:

  • Metrics
  • Profiles
  • Resource Counts

PR for #116

Please review carefully. Copilot and Gemini were used to assist with bash script conversions

Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR implements a new dartboard summarize CLI subcommand that gathers metrics, profiles, and resource counts from deployed clusters. The implementation includes three standalone Go tools (collect-profile, export-metrics, and resource-counts) that are built and executed by the summarize command. The PR also adds functionality to update the monitoring project annotation during deployment.

Changes:

  • Added dartboard summarize subcommand to collect diagnostics from clusters
  • Implemented three new diagnostic tools in Go with corresponding build scripts
  • Updated deployment process to annotate cattle-monitoring-system namespace with project ID
  • Added unused GetProject function to kubectl package

Reviewed changes

Copilot reviewed 12 out of 13 changed files in this pull request and generated 24 comments.

Show a summary per file
File Description
cmd/dartboard/main.go Registers the new summarize subcommand
cmd/dartboard/subcommands/summarize.go Implements summarize logic: builds and runs diagnostic tools
cmd/dartboard/subcommands/deploy.go Adds updateMonitoringProject function to annotate monitoring namespace
internal/kubectl/kubectl.go Adds GetProject helper function (unused)
summarize-tools/collect-profile/* Go tool and build script for collecting heap/CPU profiles
summarize-tools/export-metrics/* Go tool, build script, and pod manifest for exporting Prometheus metrics
summarize-tools/resource-counts/* Go tool, build script, and legacy bash script for counting Kubernetes resources
.gitignore Adds patterns to ignore generated binaries and result directories

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +10 to +20
containers:
- name: mimirtool
image: grafana/mimirtool:2.13.0
command: ["/bin/sh", "-c"]
args:
- |
echo "Mimirtool pod is running. Use 'kubectl exec' to run commands."
# Keep the container running
while true; do
sleep 30
done No newline at end of file
Copy link

Copilot AI Jan 26, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The mimirtool pod runs an infinite loop with a 30-second sleep, which will consume resources indefinitely. Consider adding resource limits (requests/limits for CPU and memory) to this pod specification to prevent it from consuming excessive cluster resources.

Copilot uses AI. Check for mistakes.
Comment on lines +192 to +193
// Complex bash command inside exec needs careful wrapping
metricsCmd := `curl -s -H "Authorization: Bearer $(cat /var/run/secrets/kubernetes.io/serviceaccount/token)" -k https://127.0.0.1/metrics`
Copy link

Copilot AI Jan 26, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The metrics command uses 'curl -k' (insecure, skipping TLS verification) to fetch metrics from https://127.0.0.1/metrics. While this is localhost and might be acceptable in a testing/development context, consider documenting why TLS verification is skipped or using a more secure alternative if possible.

Suggested change
// Complex bash command inside exec needs careful wrapping
metricsCmd := `curl -s -H "Authorization: Bearer $(cat /var/run/secrets/kubernetes.io/serviceaccount/token)" -k https://127.0.0.1/metrics`
// Use in-cluster CA bundle for TLS verification when fetching metrics
metricsCmd := `curl -s --cacert /var/run/secrets/kubernetes.io/serviceaccount/ca.crt -H "Authorization: Bearer $(cat /var/run/secrets/kubernetes.io/serviceaccount/token)" https://127.0.0.1/metrics`

Copilot uses AI. Check for mistakes.
Comment on lines 74 to 79
bin := b.binary
defer func(p string) {
if err := os.Remove(p); err != nil && !os.IsNotExist(err) {
fmt.Fprintf(os.Stderr, "warning: failed to delete %s: %v\n", p, err)
}
}(bin)
Copy link

Copilot AI Jan 26, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The defer statement to remove binaries is inside a loop, so all three deferred removals will capture the last value of 'bin' variable ("./summarize-tools/export-metrics/export-metrics"). This means only the last binary will be removed three times, leaving the first two binaries on disk. Move the defer outside the loop or use an immediately invoked function to capture the correct value.

Copilot uses AI. Check for mistakes.
Comment on lines 136 to 148
os.RemoveAll("prometheus-export/wal")

files, _ := os.ReadDir("prometheus-export")
if len(files) == 0 {
fmt.Println(" - No blocks to copy")
os.Remove(localTar)
} else {
// Copy blocks to parent (exportDir)
for _, f := range files {
runCmd("cp", "-R", filepath.Join("prometheus-export", f.Name()), "./")
}
}
os.RemoveAll("prometheus-export")
Copy link

Copilot AI Jan 26, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Errors from os.ReadDir, os.Remove, and os.RemoveAll are silently ignored. If these operations fail, the cleanup won't happen correctly and disk space could be wasted. These errors should be logged or handled.

Suggested change
os.RemoveAll("prometheus-export/wal")
files, _ := os.ReadDir("prometheus-export")
if len(files) == 0 {
fmt.Println(" - No blocks to copy")
os.Remove(localTar)
} else {
// Copy blocks to parent (exportDir)
for _, f := range files {
runCmd("cp", "-R", filepath.Join("prometheus-export", f.Name()), "./")
}
}
os.RemoveAll("prometheus-export")
if err := os.RemoveAll("prometheus-export/wal"); err != nil {
log.Printf("failed to remove wal directory %q: %v", "prometheus-export/wal", err)
}
files, err := os.ReadDir("prometheus-export")
if err != nil {
log.Printf("failed to read directory %q: %v", "prometheus-export", err)
} else if len(files) == 0 {
fmt.Println(" - No blocks to copy")
if err := os.Remove(localTar); err != nil {
log.Printf("failed to remove local tar file %q: %v", localTar, err)
}
} else {
// Copy blocks to parent (exportDir)
for _, f := range files {
runCmd("cp", "-R", filepath.Join("prometheus-export", f.Name()), "./")
}
}
if err := os.RemoveAll("prometheus-export"); err != nil {
log.Printf("failed to remove directory %q: %v", "prometheus-export", err)
}

Copilot uses AI. Check for mistakes.
Comment on lines +15 to +20
# Ensure we are in the correct directory to resolve modules
cd "${EXPORTER_SRC_DIR}"

# Tidy and build the Go application
go mod tidy
go build -o "${OUTPUT_BINARY}" .
Copy link

Copilot AI Jan 26, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The build scripts run 'go mod tidy' but the summarize-tools subdirectories don't appear to have go.mod files. This will cause 'go mod tidy' and 'go build' to fail. Each Go package directory needs its own go.mod file, or the build should be done from the repository root with proper module paths.

Suggested change
# Ensure we are in the correct directory to resolve modules
cd "${EXPORTER_SRC_DIR}"
# Tidy and build the Go application
go mod tidy
go build -o "${OUTPUT_BINARY}" .
# Find the Go module root (directory containing go.mod)
MODULE_ROOT="${EXPORTER_SRC_DIR}"
while [ ! -f "${MODULE_ROOT}/go.mod" ] && [ "${MODULE_ROOT}" != "/" ]; do
MODULE_ROOT="$(dirname "${MODULE_ROOT}")"
done
if [ ! -f "${MODULE_ROOT}/go.mod" ]; then
echo "Error: could not find go.mod in any parent directory of ${EXPORTER_SRC_DIR}" >&2
exit 1
fi
echo "Go module root: ${MODULE_ROOT}"
# Compute path to exporter source relative to module root
REL_EXPORTER_PATH="${EXPORTER_SRC_DIR#"${MODULE_ROOT}/"}"
# Ensure we are in the module root to resolve modules correctly
cd "${MODULE_ROOT}"
# Tidy and build the Go application
go mod tidy
go build -o "${OUTPUT_BINARY}" "./${REL_EXPORTER_PATH}"

Copilot uses AI. Check for mistakes.
Copy link
Member

@git-ival git-ival left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey @MSpencer87 👋
We should probably not have the different parts of summarize offload to separate module binaries. Can we refactor the collect-profile, export-metrics, and resource-counts modules into internal packages instead? Additionally, it would be good to have flags to optionally disable one or more of these when running dartboard summarize.

We can have separate module binaries for each of them, but the code providing the actual functionality should be contained within internal packages so that summarize and any other modules that need it can call the functions directly without relying on a separate binary.

@git-ival
Copy link
Member

git-ival commented Feb 5, 2026

We don't need to include all of these in this PR (we can iterate later), but I have some more notes after looking at this again:

  • Once this PR is merged, we will need to update the to-be-merged dartboard-choice jenkins job
  • We may want to make use of the standard tar package (https://pkg.go.dev/archive/tar)
  • Ideally the generated "summarize-results" directory would at least be prefixed with the tofu_workspace used in the dart file
    • It may even make sense to put the directory inside of the {terraform_workspace}_config/ directory that is generated by the tofu modules
  • Finally, it would be cool if copy/paste-able instructions for how to visualize/use the collected metrics and profiles was generated automatically and output to the std output
    • A very short how-to for importing the tsdb files into grafana/prometheus and viewing them
    • A very simple how-to for running go tool pprof with the different profiles

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants