ts: declare expected type of each allowlisted child metric#169750

Closed
jasonlmfong wants to merge 2 commits intocockroachdb:masterfrom
jasonlmfong:jf/child-ts-metrics-types
Conversation

@jasonlmfong

@jasonlmfong jasonlmfong commented May 5, 2026

Previously, AllowedChildMetrics was a set of metric names with no information about what kind of Prometheus metric each one represents. recordChangefeedChildMetrics decided how to record each entry by attempting type assertions at runtime. This made the TSDB shape of an entry an emergent property of whatever Go type the metric happened to have at registration time -- a future implementation change (e.g. swapping a counter for a histogram, or replacing a gauge with a different aggmetric variant) would silently rewrite what gets written to TSDB.

This change declares the expected kind alongside the name. The recorder now dispatches on the declared class and verifies that the runtime type matches before recording; a mismatch skips the metric rather than emitting it under the wrong shape. The two recording paths are split into recordHistogramChildren and recordScalarChildren to keep the dispatch readable.

The map is unexported and accessed via LookupChildMetricClass so the declared kind cannot be bypassed by callers iterating the map directly.
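The approach described above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: only the names AllowedChildMetrics and LookupChildMetricClass come from the description, while the class constants, the stand-in metric types, the metric names, and the record helper are all hypothetical.

```go
package main

import "fmt"

// childMetricClass is a hypothetical enum for the TSDB shape that an
// allowlisted child metric is declared to have.
type childMetricClass int

const (
	classUnknown   childMetricClass = iota
	classScalar                     // counters and gauges: one value per child
	classHistogram                  // histograms: count/sum style output
)

// Stand-ins for real metric implementations (assumptions for this sketch).
type counter struct{ v int64 }

type histogram struct {
	sum   float64
	count int64
}

// allowedChildMetrics is unexported so callers cannot iterate it and
// skip the declared class. The metric names are made up.
var allowedChildMetrics = map[string]childMetricClass{
	"example.scalar_metric":    classScalar,
	"example.histogram_metric": classHistogram,
}

// LookupChildMetricClass is the only way to ask whether a metric is
// allowlisted and which class it was declared with.
func LookupChildMetricClass(name string) (childMetricClass, bool) {
	c, ok := allowedChildMetrics[name]
	return c, ok
}

// record dispatches on the declared class and checks that the runtime
// type agrees. On a mismatch it reports false instead of emitting the
// metric under the wrong shape.
func record(name string, m interface{}) (string, bool) {
	class, ok := LookupChildMetricClass(name)
	if !ok {
		return "", false
	}
	switch class {
	case classScalar:
		c, ok := m.(*counter)
		if !ok {
			return "", false
		}
		return fmt.Sprintf("scalar %s=%d", name, c.v), true
	case classHistogram:
		h, ok := m.(*histogram)
		if !ok {
			return "", false
		}
		return fmt.Sprintf("histogram %s count=%d sum=%g", name, h.count, h.sum), true
	}
	return "", false
}

func main() {
	out, ok := record("example.scalar_metric", &counter{v: 3})
	fmt.Println(out, ok)

	// Declared scalar, but a histogram at runtime: skipped.
	out, ok = record("example.scalar_metric", &histogram{count: 1, sum: 2})
	fmt.Println(out, ok)
}
```

In this sketch, the declared class picks the branch and the type assertion only confirms it, so an implementation swap shows up as a skipped metric rather than a silently different TSDB shape.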

Resolves: #169179
Epic: none
Release note: None


This is stacked behind #169817.

@jasonlmfong jasonlmfong requested review from a team as code owners May 5, 2026 15:05
@trunk-io

trunk-io Bot commented May 5, 2026

Merging to master in this repository is managed by Trunk.

  • To merge this pull request, check the box to the left or comment /trunk merge below.

After your PR is submitted to the merge queue, this comment will be automatically updated with its status. If the PR fails, failure details will also be posted here.

@cockroach-teamcity

This change is Reviewable

@jasonlmfong jasonlmfong force-pushed the jf/child-ts-metrics-types branch from a809966 to eeb615f Compare May 5, 2026 15:05
Comment on lines +1165 to 1175
 // hashLabels computes a stable hash of a label set. The zero-byte separator
 // after each field prevents collisions where boundaries could otherwise shift
 // (e.g. without it, {a="bc"} and {ab="c"} would both hash "abc").
 func hashLabels(labels []*prometheusgo.LabelPair) uint64 {
 	h := fnv.New64a()
 	for _, label := range labels {
 		h.Write([]byte(label.GetName()))
-		h.Write(hashSep)
+		h.Write([]byte{0})
 		h.Write([]byte(label.GetValue()))
-		h.Write(hashSep)
+		h.Write([]byte{0})
 	}

This does not incur an allocation when inlined, so I cleaned it up.
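To make the separator's purpose concrete, here is a small self-contained sketch. It uses a plain labelPair struct instead of prometheusgo.LabelPair, and the hashPairs helper with its withSep switch is hypothetical, added only so both behaviors can be compared side by side.

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// labelPair is a stand-in for a Prometheus label (assumption for this sketch).
type labelPair struct{ name, value string }

// hashPairs hashes a label set with FNV-1a. When withSep is true, a zero
// byte is written after every name and value so that field boundaries
// are part of the hashed input.
func hashPairs(labels []labelPair, withSep bool) uint64 {
	h := fnv.New64a()
	for _, l := range labels {
		h.Write([]byte(l.name))
		if withSep {
			h.Write([]byte{0})
		}
		h.Write([]byte(l.value))
		if withSep {
			h.Write([]byte{0})
		}
	}
	return h.Sum64()
}

func main() {
	a := []labelPair{{"a", "bc"}}
	b := []labelPair{{"ab", "c"}}

	// Without separators both inputs are the bytes "abc", so they collide.
	fmt.Println(hashPairs(a, false) == hashPairs(b, false)) // true

	// With separators the inputs are "a\x00bc\x00" and "ab\x00c\x00".
	fmt.Println(hashPairs(a, true) == hashPairs(b, true)) // false
}
```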


@dhartunian dhartunian left a comment

Some general comments:

  • This diff contains changes that are unrelated to its intent: you're refactoring code while also changing its behavior, which makes it tough to review.
  • You can automate some pretty fine-grained commit management, so can you make this PR contain two commits: one that's a mechanical refactor and one that's a behavioral change? I think the behavior change is much smaller than the refactor in terms of the diff.
  • Separately: if you're going to expand the lines of code and methods that recordChangefeedChildMetrics "occupies" in recorder.go, I'd prefer that it get its own file. I don't actually want to do that now, as it creates even more churn in the codebase for a feature that we really should delete in a year or two, and it makes backports challenging. Just consider for next time how features expand within a file and create pollution.

@blathers-crl

blathers-crl Bot commented May 5, 2026

Detected infrastructure failure (matched: self-hosted runner lost communication with the server). Automatically rerunning failed jobs. (run link)

Three unrelated cleanups:
- Drop the hashSep package-level variable in hashLabels and use a
  []byte{0} literal directly. The compiler does not allocate for that
  literal, so the hoist was unnecessary.
- Inline single-use value and metricName temporaries in the histogram
  emit path; they were computed only to be plugged into the next
  struct literal.
- Remove the metricName := name shadow in IsAllowedChildMetric. The
  function parameter is already a local string and can be reassigned
  in place.

No behavior change.

Epic: none
Release note: None
@jasonlmfong jasonlmfong force-pushed the jf/child-ts-metrics-types branch from eeb615f to 596efde Compare May 5, 2026 16:57
Previously, AllowedChildMetrics was a set of metric names with no
information about what kind of Prometheus metric each one represents.
recordChangefeedChildMetrics decided how to record each entry by
attempting type assertions at runtime. This made the TSDB shape of an
entry an emergent property of whatever Go type the metric happened to
have at registration time -- a future implementation change (e.g.
swapping a counter for a histogram, or replacing a gauge with a
different aggmetric variant) would silently rewrite what gets written
to TSDB.

This change declares the expected kind alongside the name. The
recorder now dispatches on the declared class and verifies the runtime
type matches before recording; a mismatch skips the metric rather than
emitting it under the wrong shape.

The map is unexported and accessed via LookupChildMetricClass so the
declared kind cannot be bypassed by callers iterating the map directly.

Resolves: cockroachdb#169179
Epic: none
Release note: None
@jasonlmfong jasonlmfong force-pushed the jf/child-ts-metrics-types branch from 596efde to 85bde5d Compare May 5, 2026 17:23
@jasonlmfong

I extracted the cleanups into a separate PR. I'm also OK with not proceeding with these changes, which increase the surface area for no real gain.

@jasonlmfong jasonlmfong closed this May 6, 2026
Development

Successfully merging this pull request may close these issues.

ts: per-child TSDB recorder can write wrong shape if a metric's implementation type drifts

3 participants