Commit 47bcfc1

Merge pull request #50 from coder/ssncferreira/prometheus_native_histograms

feat: enable native histograms in prometheus

2 parents f60876c + f472c18

File tree

6 files changed: +147 −6 lines

README.gotmpl

Lines changed: 67 additions & 0 deletions
@@ -215,6 +215,73 @@ grafana:

### Prometheus

To access Prometheus, run:

```bash
kubectl -n coder-observability port-forward svc/prometheus 9090:80
```

Then open http://localhost:9090/graph in your browser.
#### Native Histograms

Native histograms are an **experimental** Prometheus feature that removes the need to predefine bucket boundaries, instead providing higher-resolution, adaptive buckets (see the [Prometheus docs](https://prometheus.io/docs/specs/native_histograms/) for details).

Unlike classic histograms, which are sent in plain text, **native histograms require the protobuf protocol**.
Because the Prometheus Helm chart is configured with remote write, enabling them takes more than running Prometheus with native histogram support: the Grafana Agent must also scrape and remote-write using protobuf.
Native histograms are **disabled by default**; when you enable them globally, the Helm chart updates the Grafana Agent configuration accordingly.
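Concretely, enabling the flag toggles two settings in the Grafana Agent (river) configuration that the chart renders — a sketch of the result, reduced to just the affected arguments:

```river
// Sketch: the settings the chart flips when nativeHistograms is true
prometheus.scrape "pods" {
  // scrape using protobuf negotiation so native histograms can be ingested
  enable_protobuf_negotiation = true
}

prometheus.remote_write "default" {
  endpoint {
    // forward native histograms when remote writing to Prometheus
    send_native_histograms = true
  }
}
```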
To enable native histograms, define this in your `values.yaml`:

```yaml
global:
  telemetry:
    metrics:
      nativeHistograms: true

prometheus:
  server:
    extraFlags:
      - web.enable-lifecycle
      - enable-feature=remote-write-receiver
      - enable-feature=native-histograms
```
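You can then apply the updated values with a Helm upgrade. The release name and chart reference below are assumptions — substitute the ones you used at install time:

```shell
# Hypothetical release name and chart reference -- adjust to your install
helm upgrade coder-observability coder/observability \
  --namespace coder-observability \
  --values values.yaml
```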
After updating the values, you may need to restart the Grafana Agent so it picks up the new configuration:

```bash
kubectl -n coder-observability rollout restart daemonset/grafana-agent
```
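To confirm the restart completed, a standard rollout check can be used:

```shell
# Blocks until all grafana-agent pods have been replaced and are ready
kubectl -n coder-observability rollout status daemonset/grafana-agent
```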
⚠️ **Important**: Classic and native histograms cannot be aggregated together.
If you switch from classic to native histograms, dashboards may need to account for the transition. See the [Prometheus migration guidelines](https://prometheus.io/docs/specs/native_histograms/#migration-considerations) for details.
<details>
<summary>Validate Prometheus Native Histograms</summary>

1) Check Prometheus flags:

Open http://localhost:9090/flags and confirm that `--enable-feature` includes `native-histograms`.

2) Inspect histogram metrics:

* Classic histograms expose metrics with the suffixes `_bucket`, `_sum`, and `_count`.
* Native histograms are exposed directly under the metric name.
* Example: query `coderd_workspace_creation_duration_seconds` in http://localhost:9090/graph.
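The difference also shows up in query shape. A p95 over the metric above, for example (a sketch — with classic histograms the quantile is computed from the `_bucket` series aggregated by `le`, while a native histogram feeds `histogram_quantile` directly):

```promql
# Classic histogram: aggregate the bucket series, keeping the `le` label
histogram_quantile(0.95, sum by (le) (rate(coderd_workspace_creation_duration_seconds_bucket[5m])))

# Native histogram: the bare metric name carries the full distribution
histogram_quantile(0.95, sum(rate(coderd_workspace_creation_duration_seconds[5m])))
```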
3) Check the Grafana Agent (if remote write is enabled):

To confirm, run:

```bash
kubectl -n coder-observability port-forward svc/grafana-agent 3030:80
```

Then open http://localhost:3030 and verify:
* scrape configurations defined in `prometheus.scrape.cadvisor` should have `enable_protobuf_negotiation: true`
* remote write configurations defined in `prometheus.remote_write.default` should have `send_native_histograms: true`

</details>
## Subcharts

{{ template "chart.requirementsTable" . }}

README.md

Lines changed: 70 additions & 2 deletions
@@ -215,6 +215,73 @@ grafana:

*(Adds the same "Prometheus" / "Native Histograms" section as README.gotmpl above.)*
## Subcharts

| Repository | Name | Version |
```diff
@@ -261,8 +328,9 @@ values which are defined [here](https://github.com/grafana/helm-charts/tree/main
 | global.externalZone | string | `"svc.cluster.local"` | |
 | global.postgres | object | `{"alerts":{"groups":{"Basic":{"delay":"1m","enabled":true},"Connections":{"delay":"5m","enabled":true,"thresholds":{"critical":0.9,"notify":0.5,"warning":0.8}},"Notifications":{"delay":"15m","enabled":true,"thresholds":{"critical":0.9,"notify":0.5,"warning":0.8}}}},"database":"coder","exporter":{"image":"quay.io/prometheuscommunity/postgres-exporter"},"hostname":"localhost","mountSecret":"secret-postgres","password":null,"port":5432,"sslmode":"disable","sslrootcert":null,"username":"coder","volumeMounts":[],"volumes":[]}` | postgres connection information NOTE: these settings are global so we can parameterise some values which get rendered by subcharts |
 | global.postgres.alerts | object | `{"groups":{"Basic":{"delay":"1m","enabled":true},"Connections":{"delay":"5m","enabled":true,"thresholds":{"critical":0.9,"notify":0.5,"warning":0.8}},"Notifications":{"delay":"15m","enabled":true,"thresholds":{"critical":0.9,"notify":0.5,"warning":0.8}}}}` | alerts for postgres |
-| global.telemetry | object | `{"metrics":{"scrape_interval":"15s","scrape_timeout":"12s"},"profiling":{"delta_profiling_duration":"30s","scrape_interval":"60s","scrape_timeout":"70s"}}` | control telemetry collection |
-| global.telemetry.metrics | object | `{"scrape_interval":"15s","scrape_timeout":"12s"}` | control metric collection |
+| global.telemetry | object | `{"metrics":{"nativeHistograms":false,"scrape_interval":"15s","scrape_timeout":"12s"},"profiling":{"delta_profiling_duration":"30s","scrape_interval":"60s","scrape_timeout":"70s"}}` | control telemetry collection |
+| global.telemetry.metrics | object | `{"nativeHistograms":false,"scrape_interval":"15s","scrape_timeout":"12s"}` | control metric collection |
+| global.telemetry.metrics.nativeHistograms | bool | `false` | enable Prometheus native histograms or default to classic histograms |
 | global.telemetry.metrics.scrape_interval | string | `"15s"` | how often the collector will scrape discovered pods |
 | global.telemetry.metrics.scrape_timeout | string | `"12s"` | how long a request will be allowed to wait before being canceled |
 | global.telemetry.profiling.delta_profiling_duration | string | `"30s"` | duration of each pprof profiling capture, must be less than scrape_interval |
```

coder-observability/Chart.lock

Lines changed: 3 additions & 3 deletions
```diff
@@ -1,7 +1,7 @@
 dependencies:
 - name: pyroscope
   repository: https://grafana.github.io/helm-charts
-  version: 1.14.1
+  version: 1.14.2
 - name: grafana
   repository: https://grafana.github.io/helm-charts
   version: 7.3.12
@@ -14,5 +14,5 @@ dependencies:
 - name: grafana-agent
   repository: https://grafana.github.io/helm-charts
   version: 0.37.0
-digest: sha256:5a5f27f74bbf34848da9c1bab508d3b33fda19789016c2eda9608dcd6373921d
-generated: "2025-08-04T13:28:59.433447595-05:00"
+digest: sha256:38b7d46261c4d39a103fbf61eac9da26a997024221ab81078ea5b34fc2b83c68
+generated: "2025-08-27T14:16:57.521541846Z"
```

coder-observability/templates/_collector-config.tpl

Lines changed: 4 additions & 0 deletions
```diff
@@ -230,6 +230,7 @@ prometheus.scrape "pods" {
   scrape_interval = "{{ .Values.global.telemetry.metrics.scrape_interval }}"
   scrape_timeout = "{{ .Values.global.telemetry.metrics.scrape_timeout }}"
+  enable_protobuf_negotiation = {{ .Values.global.telemetry.metrics.nativeHistograms | default false }}
 }

 // These are metric_relabel_configs while discovery.relabel are relabel_configs.
@@ -301,6 +302,7 @@ prometheus.scrape "cadvisor" {
   bearer_token_file = "/var/run/secrets/kubernetes.io/serviceaccount/token"
   scrape_interval = "{{ .Values.global.telemetry.metrics.scrape_interval }}"
   scrape_timeout = "{{ .Values.global.telemetry.metrics.scrape_timeout }}"
+  enable_protobuf_negotiation = {{ .Values.global.telemetry.metrics.nativeHistograms | default false }}
 }

 prometheus.relabel "cadvisor" {
@@ -346,6 +348,7 @@ prometheus.relabel "cadvisor" {

 prometheus.remote_write "default" {
   endpoint {
+    send_native_histograms = {{ .Values.global.telemetry.metrics.nativeHistograms | default false }}
     url ="http://{{ include "prometheus.server.fullname" .Subcharts.prometheus }}.{{ .Release.Namespace }}.{{ .Values.global.zone }}/api/v1/write"

     // drop instance label which unnecessarily adds new series when pods are restarted, since pod IPs are dynamically assigned
@@ -396,6 +399,7 @@ prometheus.scrape "coder_metrics" {

   forward_to = [prometheus.remote_write.default.receiver]
   scrape_interval = "{{ .scrapeInterval }}"
+  enable_protobuf_negotiation = {{ .Values.global.telemetry.metrics.nativeHistograms | default false }}
 }
 {{- end }}
 {{- end }}
```

coder-observability/values.yaml

Lines changed: 2 additions & 0 deletions
```diff
@@ -113,6 +113,8 @@ global:
       scrape_interval: 15s
       # global.telemetry.metrics.scrape_timeout -- how long a request will be allowed to wait before being canceled
       scrape_timeout: 12s
+      # global.telemetry.metrics.nativeHistograms -- enable Prometheus native histograms or default to classic histograms
+      nativeHistograms: false
     profiling:
       # global.telemetry.profiling.scrape_interval -- how often the collector will scrape pprof endpoints
       scrape_interval: 60s
```
