Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Sample rate config for dogstatsd #739

Merged
merged 6 commits into from
Nov 28, 2023
Merged
Show file tree
Hide file tree
Changes from 1 commit
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
4 changes: 4 additions & 0 deletions CHANGELOG.md
Original file line number Diff line number Diff line change
Expand Up @@ -6,6 +6,10 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0

## Unreleased

### Added
- Sampling and sampling probability configuration parameters added for
dogstatsd payloads.

## [0.20.0-rc4]
### Fixed
- Memory consumption stability issues in Dogstatsd payload generator
Expand Down
8 changes: 8 additions & 0 deletions lading_payload/src/block.rs
Original file line number Diff line number Diff line change
Expand Up @@ -214,6 +214,8 @@ impl Cache {
// TODO -- Validate user input for multivalue_pack_probability.
multivalue_pack_probability,
multivalue_count,
sampling,
sampling_probability,
kind_weights,
metric_weights,
value,
Expand All @@ -232,6 +234,8 @@ impl Cache {
*tags_per_msg,
*multivalue_count,
*multivalue_pack_probability,
*sampling,
*sampling_probability,
*kind_weights,
*metric_weights,
*value,
Expand Down Expand Up @@ -347,6 +351,8 @@ fn stream_inner(
// TODO -- Validate user input for multivalue_pack_probability.
multivalue_pack_probability,
multivalue_count,
sampling,
sampling_probability,
kind_weights,
metric_weights,
value,
Expand All @@ -364,6 +370,8 @@ fn stream_inner(
*tags_per_msg,
*multivalue_count,
*multivalue_pack_probability,
*sampling,
*sampling_probability,
*kind_weights,
*metric_weights,
*value,
Expand Down
35 changes: 31 additions & 4 deletions lading_payload/src/dogstatsd.rs
Original file line number Diff line number Diff line change
Expand Up @@ -78,6 +78,14 @@ fn multivalue_pack_probability() -> f32 {
0.08
}

fn sampling() -> ConfRange<f32> {
ConfRange::Inclusive { min: 0.1, max: 1.0 }
}

fn sampling_probability() -> f32 {
0.5
}

/// Weights for `DogStatsD` kinds: metrics, events, service checks
///
/// Defines the relative probability of each kind of `DogStatsD` datagram.
Expand Down Expand Up @@ -258,6 +266,15 @@ pub struct Config {
#[serde(default = "multivalue_count")]
pub multivalue_count: ConfRange<u16>,

/// Sampling rate to report for metrics
GeorgeHahn marked this conversation as resolved.
Show resolved Hide resolved
#[serde(default = "sampling")]
pub sampling: ConfRange<f32>,
GeorgeHahn marked this conversation as resolved.
Show resolved Hide resolved

/// Probability between 0 and 1 that a given dogstatsd msg
GeorgeHahn marked this conversation as resolved.
Show resolved Hide resolved
/// will be reported as a sampled value.
GeorgeHahn marked this conversation as resolved.
Show resolved Hide resolved
#[serde(default = "sampling_probability")]
pub sampling_probability: f32,

/// Defines the relative probability of each kind of DogStatsD kinds of
/// payload.
#[serde(default)]
Expand Down Expand Up @@ -380,6 +397,8 @@ impl MemberGenerator {
tags_per_msg: ConfRange<u8>,
multivalue_count: ConfRange<u16>,
multivalue_pack_probability: f32,
sampling: ConfRange<f32>,
sampling_probability: f32,
kind_weights: KindWeights,
metric_weights: MetricWeights,
value_conf: ValueConf,
Expand Down Expand Up @@ -446,6 +465,8 @@ impl MemberGenerator {
name_length,
multivalue_count,
multivalue_pack_probability,
sampling,
sampling_probability,
&WeightedIndex::new(metric_choices)?,
small_strings,
&mut tags_generator,
Expand Down Expand Up @@ -536,6 +557,8 @@ impl DogStatsD {
tags_per_msg(),
multivalue_count(),
multivalue_pack_probability(),
sampling(),
sampling_probability(),
KindWeights::default(),
MetricWeights::default(),
value_config(),
Expand All @@ -559,6 +582,8 @@ impl DogStatsD {
tags_per_msg: ConfRange<u8>,
multivalue_count: ConfRange<u16>,
multivalue_pack_probability: f32,
sampling: ConfRange<f32>,
sampling_probability: f32,
kind_weights: KindWeights,
metric_weights: MetricWeights,
value_conf: ValueConf,
Expand All @@ -576,6 +601,8 @@ impl DogStatsD {
tags_per_msg,
multivalue_count,
multivalue_pack_probability,
sampling,
sampling_probability,
kind_weights,
metric_weights,
value_conf,
Expand Down Expand Up @@ -621,9 +648,9 @@ mod test {

use crate::{
dogstatsd::{
contexts, multivalue_count, multivalue_pack_probability, name_length,
service_check_names, tag_key_length, tag_value_length, tags_per_msg, value_config,
KindWeights, MetricWeights,
contexts, multivalue_count, multivalue_pack_probability, name_length, sampling,
sampling_probability, service_check_names, tag_key_length, tag_value_length,
tags_per_msg, value_config, KindWeights, MetricWeights,
},
DogStatsD, Serialize,
};
Expand All @@ -643,7 +670,7 @@ mod test {
let dogstatsd = DogStatsD::new(contexts(), service_check_names(),
name_length(), tag_key_length(),
tag_value_length(), tags_per_msg(),
multivalue_count(), multivalue_pack_probability, kind_weights,
multivalue_count(), multivalue_pack_probability, sampling(), sampling_probability(), kind_weights,
metric_weights, value_conf, &mut rng).unwrap();

let mut bytes = Vec::with_capacity(max_bytes);
Expand Down
21 changes: 21 additions & 0 deletions lading_payload/src/dogstatsd/common.rs
Original file line number Diff line number Diff line change
Expand Up @@ -100,6 +100,27 @@ pub(crate) enum ZeroToOne {
Frac(u32),
}

#[derive(Debug, Clone, Copy)]
pub(crate) enum ZeroToOneError {
OutOfRange,
}

impl TryFrom<f32> for ZeroToOne {
type Error = ZeroToOneError;

fn try_from(value: f32) -> Result<Self, Self::Error> {
if (value - 1.0).abs() < f32::EPSILON {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The value 1.0 is exactly representable in IEEE-754-conformant 32-bit floating-point arithmetic, so the epsilon check here should be unnecessary, though I appreciate the intent because it is usually good practice.

Ok(Self::One)
} else if !(0.0..=1.0).contains(&value) {
Err(ZeroToOneError::OutOfRange)
} else {
#[allow(clippy::cast_sign_loss)]
#[allow(clippy::cast_possible_truncation)]
Ok(Self::Frac((1.0 / value) as u32))
}
}
}

impl Distribution<ZeroToOne> for Standard {
fn sample<R>(&self, rng: &mut R) -> ZeroToOne
where
Expand Down
18 changes: 14 additions & 4 deletions lading_payload/src/dogstatsd/metric.rs
Original file line number Diff line number Diff line change
Expand Up @@ -23,6 +23,8 @@ pub(crate) struct MetricGenerator {
pub(crate) templates: Vec<template::Template>,
pub(crate) multivalue_count: ConfRange<u16>,
pub(crate) multivalue_pack_probability: f32,
pub(crate) sampling: ConfRange<f32>,
pub(crate) sampling_probability: f32,
pub(crate) num_value_generator: NumValueGenerator,
}

Expand All @@ -33,6 +35,8 @@ impl MetricGenerator {
name_length: ConfRange<u16>,
multivalue_count: ConfRange<u16>,
multivalue_pack_probability: f32,
sampling: ConfRange<f32>,
sampling_probability: f32,
metric_weights: &WeightedIndex<u16>,
container_ids: Vec<String>,
tags_generator: &mut common::tags::Generator,
Expand Down Expand Up @@ -68,6 +72,8 @@ impl MetricGenerator {
templates,
multivalue_count,
multivalue_pack_probability,
sampling,
sampling_probability,
num_value_generator: NumValueGenerator::new(value_conf),
}
}
Expand All @@ -85,11 +91,15 @@ impl<'a> Generator<'a> for MetricGenerator {
let template: &Template = self.templates.choose(&mut rng).unwrap();

let container_id = choose_or_not_ref(&mut rng, &self.container_ids).map(String::as_str);
// TODO sample_rate should be option and have a probability that determines if its present
// Mostly inconsequential for the Agent, for certain metric types the Agent
// applies some correction based on this value. Affects count and histogram computation.
// https://docs.datadoghq.com/metrics/custom_metrics/dogstatsd_metrics_submission/#sample-rates
let sample_rate = rng.gen();
let prob: f32 = OpenClosed01.sample(&mut rng);
let sample_rate = if prob < self.sampling_probability {
Comment on lines +95 to +96
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor pedantic nit: because floating-point arithmetic quantizes the range [0, 1], omitting zero here does have a small effect on the probability logic that would be material for small values of self.sampling_probability (e.g., f64::MIN_POSITIVE). Does this actually matter in practice? Probably not, unless our users are pedantic masochists.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(To be clear, I don't think the range needs to be changed unless we think someone would actually fuzz our system with an outrageously small sampling probability.)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Merged, but I'd still like to understand this better. Would one of the other provided distributions be more correct? https://docs.rs/rand/latest/rand/distributions/struct.Standard.html#floating-point-implementation The Standard distribution stands out to me, but it seems like it would have the same issue on the opposite side.

let sample_rate = self.sampling.sample(&mut rng).clamp(0.0, 1.0);
let sample_rate = common::ZeroToOne::try_from(sample_rate).unwrap();
Some(sample_rate)
} else {
None
};

let mut values = Vec::with_capacity(self.multivalue_count.end() as usize);
let value: common::NumValue = self.num_value_generator.generate(&mut rng);
Expand Down
Loading