Skip to content
Merged
Show file tree
Hide file tree
Changes from 14 commits
Commits
Show all changes
21 commits
Select commit Hold shift + click to select a range
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
1 change: 1 addition & 0 deletions CHANGELOG.md
Original file line number Diff line number Diff line change
Expand Up @@ -6,6 +6,7 @@

- Only apply non-destructive PII rules to log bodies by default. ([#5272](https://github.com/getsentry/relay/pull/5272))
- Add `sentry.origin` attribute to OTLP spans. ([#5294](https://github.com/getsentry/relay/pull/5294))
- Normalize deprecated attribute names according to `sentry-conventions` for logs and V2 spans. ([#5257](https://github.com/getsentry/relay/pull/5257))

**Breaking Changes**:

Expand Down
3 changes: 2 additions & 1 deletion relay-conventions/src/consts.rs
Original file line number Diff line number Diff line change
Expand Up @@ -28,7 +28,8 @@ convention_attributes!(
HTTP_TARGET => "http.target",
MESSAGING_SYSTEM => "messaging.system",
ENVIRONMENT => "sentry.environment",
OBSERVED_TIMESTAMP_NANOS => "sentry.observed_timestamp_nanos",
OBSERVED_TIMESTAMP_NANOS => "sentry._internal.observed_timestamp_nanos",
OBSERVED_TIMESTAMP_NANOS_INTERNAL => "sentry.observed_timestamp_nanos",
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Bug: Timestamp Naming Confusion

The OBSERVED_TIMESTAMP_NANOS constants are confusingly named. OBSERVED_TIMESTAMP_NANOS maps to the internal attribute sentry._internal.observed_timestamp_nanos, while OBSERVED_TIMESTAMP_NANOS_INTERNAL maps to the public sentry.observed_timestamp_nanos. This swapped naming is counterintuitive and could lead to developer confusion and bugs.

Fix in Cursor Fix in Web

OP => "sentry.op",
ORIGIN => "sentry.origin",
PLATFORM => "sentry.platform",
Expand Down
211 changes: 198 additions & 13 deletions relay-event-normalization/src/eap/mod.rs
Original file line number Diff line number Diff line change
Expand Up @@ -5,11 +5,12 @@
use chrono::{DateTime, Utc};
use relay_common::time::UnixTimestamp;
use relay_conventions::{
BROWSER_NAME, BROWSER_VERSION, OBSERVED_TIMESTAMP_NANOS, USER_AGENT_ORIGINAL, USER_GEO_CITY,
USER_GEO_COUNTRY_CODE, USER_GEO_REGION, USER_GEO_SUBDIVISION,
AttributeInfo, BROWSER_NAME, BROWSER_VERSION, OBSERVED_TIMESTAMP_NANOS,
OBSERVED_TIMESTAMP_NANOS_INTERNAL, USER_AGENT_ORIGINAL, USER_GEO_CITY, USER_GEO_COUNTRY_CODE,
USER_GEO_REGION, USER_GEO_SUBDIVISION, WriteBehavior,
};
use relay_event_schema::protocol::{AttributeType, Attributes, BrowserContext, Geo};
use relay_protocol::{Annotated, ErrorKind, Value};
use relay_protocol::{Annotated, ErrorKind, Remark, RemarkType, Value};

use crate::{ClientHints, FromUserAgentInfo as _, RawUserAgentInfo};

Expand Down Expand Up @@ -64,14 +65,21 @@ pub fn normalize_attribute_types(attributes: &mut Annotated<Attributes>) {

/// Adds the `received` time to the attributes.
pub fn normalize_received(attributes: &mut Annotated<Attributes>, received: DateTime<Utc>) {
attributes
.get_or_insert_with(Default::default)
.insert_if_missing(OBSERVED_TIMESTAMP_NANOS, || {
received
.timestamp_nanos_opt()
.unwrap_or_else(|| UnixTimestamp::now().as_nanos() as i64)
.to_string()
});
let attributes = attributes.get_or_insert_with(Default::default);

attributes.insert_if_missing(OBSERVED_TIMESTAMP_NANOS, || {
received
.timestamp_nanos_opt()
.unwrap_or_else(|| UnixTimestamp::now().as_nanos() as i64)
.to_string()
});

attributes.insert_if_missing(OBSERVED_TIMESTAMP_NANOS_INTERNAL, || {
received
.timestamp_nanos_opt()
.unwrap_or_else(|| UnixTimestamp::now().as_nanos() as i64)
.to_string()
});
}

/// Normalizes the user agent/client information into [`Attributes`].
Expand Down Expand Up @@ -138,6 +146,58 @@ pub fn normalize_user_geo(
attributes.insert_if_missing(USER_GEO_REGION, || geo.region);
}

/// Normalizes deprecated attributes according to `sentry-conventions`.
///
/// Attributes with a status of `"normalize"` will be moved to their replacement name.
/// If there is already a value present under the replacement name, it will be left alone,
/// but the deprecated attribute is removed anyway.
///
/// Attributes with a status of `"backfill"` will be copied to their replacement name if the
/// replacement name is not present. In any case, the original name is left alone.
pub fn normalize_attribute_names(attributes: &mut Annotated<Attributes>) {
normalize_attribute_names_inner(attributes, relay_conventions::attribute_info)
}

fn normalize_attribute_names_inner(
attributes: &mut Annotated<Attributes>,
attribute_info: fn(&str) -> Option<&'static AttributeInfo>,
) {
let attributes = attributes.get_or_insert_with(Default::default);
let attribute_names: Vec<_> = attributes.keys().cloned().collect();

for name in attribute_names {
let Some(attribute_info) = attribute_info(&name) else {
continue;
};

match attribute_info.write_behavior {
WriteBehavior::CurrentName => continue,
WriteBehavior::NewName(new_name) => {
let Some(old_attribute) = attributes.get_raw_mut(&name) else {
continue;
};

let new_attribute = old_attribute.clone();
old_attribute.set_value(None);
old_attribute
.meta_mut()
.add_remark(Remark::new(RemarkType::Removed, "attribute.deprecated"));
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there something better that we could add to this remark as the "rule ID"?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we'd want to differentiate here on the remarks:

  • removed
  • moved (renamed)
  • removed, not moved (because new attribute already exists)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. What is the difference between the first and the third?
  2. Which RemarkType would you use for the second?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is the difference between the first and the third?

  1. it's just a deletion
  2. it was supposed to be a rename, but the target already existed so it was removed. This rename does degrade to a remove, but to better understand what it is happening, I imagine a hint that something was supposed to be moved but it degraded to a remove is helpful.

Which RemarkType would you use for the second?

Removed fits closest imo, we can further differentiate on the rule_id imo. Not sure if it's worth introducing a rename, when that is essentially a just a remove.
Alternatively, we can also only mark the new value that it was renamed.


if !attributes.contains_key(new_name) {
attributes.insert_raw(new_name.to_owned(), new_attribute);
}
}
WriteBehavior::BothNames(new_name) => {
if !attributes.contains_key(new_name)
&& let Some(current_attribute) = attributes.get_raw(&name).cloned()
{
attributes.insert_raw(new_name.to_owned(), current_attribute);
}
}
}
}
}

#[cfg(test)]
mod tests {
use relay_protocol::SerializableAnnotated;
Expand All @@ -155,6 +215,10 @@ mod tests {

insta::assert_json_snapshot!(SerializableAnnotated(&attributes), @r#"
{
"sentry._internal.observed_timestamp_nanos": {
"type": "string",
"value": "1234201337"
},
"sentry.observed_timestamp_nanos": {
"type": "string",
"value": "1234201337"
Expand All @@ -167,6 +231,10 @@ mod tests {
fn test_normalize_received_existing() {
let mut attributes = Annotated::from_json(
r#"{
"sentry._internal.observed_timestamp_nanos": {
"type": "string",
"value": "111222333"
},
"sentry.observed_timestamp_nanos": {
"type": "string",
"value": "111222333"
Expand All @@ -180,14 +248,18 @@ mod tests {
DateTime::from_timestamp_nanos(1_234_201_337),
);

insta::assert_json_snapshot!(SerializableAnnotated(&attributes), @r#"
insta::assert_json_snapshot!(SerializableAnnotated(&attributes), @r###"
{
"sentry._internal.observed_timestamp_nanos": {
"type": "string",
"value": "111222333"
},
"sentry.observed_timestamp_nanos": {
"type": "string",
"value": "111222333"
}
}
"#);
"###);
}

#[test]
Expand Down Expand Up @@ -470,4 +542,117 @@ mod tests {
"#,
);
}

#[test]
fn test_normalize_attributes() {
fn mock_attribute_info(name: &str) -> Option<&'static AttributeInfo> {
use relay_conventions::Pii;

match name {
"replace.empty" => Some(&AttributeInfo {
write_behavior: WriteBehavior::NewName("replaced"),
pii: Pii::Maybe,
aliases: &["replaced"],
}),
"replace.existing" => Some(&AttributeInfo {
write_behavior: WriteBehavior::NewName("not.replaced"),
pii: Pii::Maybe,
aliases: &["not.replaced"],
}),
"backfill.empty" => Some(&AttributeInfo {
write_behavior: WriteBehavior::BothNames("backfilled"),
pii: Pii::Maybe,
aliases: &["backfilled"],
}),
"backfill.existing" => Some(&AttributeInfo {
write_behavior: WriteBehavior::BothNames("not.backfilled"),
pii: Pii::Maybe,
aliases: &["not.backfilled"],
}),
_ => None,
}
}

let mut attributes = Annotated::new(Attributes::from([
(
"replace.empty".to_owned(),
Annotated::new("Should be moved".to_owned().into()),
),
(
"replace.existing".to_owned(),
Annotated::new("Should be removed".to_owned().into()),
),
(
"not.replaced".to_owned(),
Annotated::new("Should be left alone".to_owned().into()),
),
(
"backfill.empty".to_owned(),
Annotated::new("Should be copied".to_owned().into()),
),
(
"backfill.existing".to_owned(),
Annotated::new("Should be left alone".to_owned().into()),
),
(
"not.backfilled".to_owned(),
Annotated::new("Should be left alone".to_owned().into()),
),
]));

normalize_attribute_names_inner(&mut attributes, mock_attribute_info);

insta::assert_json_snapshot!(SerializableAnnotated(&attributes), @r###"
{
"backfill.empty": {
"type": "string",
"value": "Should be copied"
},
"backfill.existing": {
"type": "string",
"value": "Should be left alone"
},
"backfilled": {
"type": "string",
"value": "Should be copied"
},
"not.backfilled": {
"type": "string",
"value": "Should be left alone"
},
"not.replaced": {
"type": "string",
"value": "Should be left alone"
},
"replace.empty": null,
"replace.existing": null,
"replaced": {
"type": "string",
"value": "Should be moved"
},
"_meta": {
"replace.empty": {
"": {
"rem": [
[
"attribute.deprecated",
"x"
]
]
}
},
"replace.existing": {
"": {
"rem": [
[
"attribute.deprecated",
"x"
]
]
}
}
}
}
"###);
}
}
31 changes: 31 additions & 0 deletions relay-event-schema/src/protocol/attributes.rs
Original file line number Diff line number Diff line change
Expand Up @@ -236,6 +236,24 @@ impl Attributes {
Some(&self.0.get(key)?.value()?.value.value)
}

/// Returns the annotated attribute with the given key.
pub fn get_raw<Q>(&self, key: &Q) -> Option<&Annotated<Attribute>>
where
String: Borrow<Q>,
Q: Ord + ?Sized,
{
self.0.get(key)
}

/// Mutably returns the annotated attribute with the given key.
pub fn get_raw_mut<Q>(&mut self, key: &Q) -> Option<&mut Annotated<Attribute>>
where
String: Borrow<Q>,
Q: Ord + ?Sized,
{
self.0.get_mut(key)
}

/// Inserts an attribute with the given value into this collection.
pub fn insert<K: Into<String>, V: Into<AttributeValue>>(&mut self, key: K, value: V) {
fn inner(slf: &mut Attributes, key: String, value: AttributeValue) {
Expand Down Expand Up @@ -296,6 +314,19 @@ impl Attributes {
) -> std::collections::btree_map::IterMut<'_, String, Annotated<Attribute>> {
self.0.iter_mut()
}

/// Returns an iterator over the keys in this collection.
pub fn keys(&self) -> std::collections::btree_map::Keys<'_, String, Annotated<Attribute>> {
self.0.keys()
}

/// Provides mutable access to an entry in this collection.
pub fn entry(
&mut self,
key: String,
) -> std::collections::btree_map::Entry<'_, String, Annotated<Attribute>> {
self.0.entry(key)
}
}

impl IntoIterator for Attributes {
Expand Down
3 changes: 2 additions & 1 deletion relay-server/src/processing/logs/process.rs
Original file line number Diff line number Diff line change
Expand Up @@ -109,9 +109,10 @@ fn scrub_log(log: &mut Annotated<OurLog>, ctx: Context<'_>) -> Result<()> {

fn normalize_log(log: &mut Annotated<OurLog>, meta: &RequestMeta) -> Result<()> {
if let Some(log) = log.value_mut() {
eap::normalize_attribute_types(&mut log.attributes);
eap::normalize_attribute_names(&mut log.attributes);
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Bug: Deprecation Normalization Timing Issue

The call to eap::normalize_attribute_names() at line 113 runs before eap::normalize_received() at line 114. However, normalize_received() inserts the sentry.observed_timestamp_nanos attribute, which may be marked as deprecated in sentry-conventions. Since the deprecation normalization has already run, any attribute inserted by normalize_received() will not be subject to the deprecation normalization (renaming/backfilling), creating an inconsistency where deprecated attributes inserted by normalize_received() remain in their deprecated form. The deprecation normalization should run AFTER all attributes are populated.

Fix in Cursor Fix in Web

eap::normalize_received(&mut log.attributes, meta.received_at());
eap::normalize_user_agent(&mut log.attributes, meta.user_agent(), meta.client_hints());
eap::normalize_attribute_types(&mut log.attributes);
}

process_value(
Expand Down
2 changes: 1 addition & 1 deletion relay-server/src/processing/logs/utils.rs
Original file line number Diff line number Diff line change
Expand Up @@ -4,7 +4,7 @@ use crate::envelope::WithHeader;

/// Returns the calculated size of a [`OurLog`].
///
/// Unlike [`relay_ourlogs::calculate_size`], this access the already manifested byte size
/// Unlike [`relay_ourlogs::calculate_size`], this accesses the already manifested byte size
/// of the log, instead of calculating it.
///
/// When compiled with debug assertion the function asserts the presence of a manifested byte size.
Expand Down
3 changes: 2 additions & 1 deletion relay-server/src/processing/spans/process.rs
Original file line number Diff line number Diff line change
Expand Up @@ -88,9 +88,10 @@ fn normalize_span(
// TODO: `validate_span()` (start/end timestamps)

if let Some(span) = span.value_mut() {
eap::normalize_attribute_types(&mut span.attributes);
eap::normalize_attribute_names(&mut span.attributes);
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Bug: Deprecation Normalization Timing Issue

The call to eap::normalize_attribute_names() at line 92 runs before eap::normalize_received() at line 93. However, normalize_received() inserts the sentry.observed_timestamp_nanos attribute, which may be marked as deprecated in sentry-conventions. Since the deprecation normalization has already run, any attribute inserted by normalize_received() will not be subject to the deprecation normalization (renaming/backfilling), creating an inconsistency where deprecated attributes inserted by normalize_received() remain in their deprecated form. The deprecation normalization should run AFTER all attributes are populated.

Fix in Cursor Fix in Web

eap::normalize_received(&mut span.attributes, meta.received_at());
eap::normalize_user_agent(&mut span.attributes, meta.user_agent(), meta.client_hints());
eap::normalize_attribute_types(&mut span.attributes);
eap::normalize_user_geo(&mut span.attributes, || {
meta.client_addr().and_then(|ip| geo_lookup.lookup(ip))
});
Expand Down
4 changes: 4 additions & 0 deletions tests/integration/test_nel.py
Original file line number Diff line number Diff line change
Expand Up @@ -69,6 +69,10 @@ def test_nel_converted_to_logs(mini_sentry, relay):
"type": "string",
"value": time_within_delta(expect_resolution="ns"),
},
"sentry._internal.observed_timestamp_nanos": {
"type": "string",
"value": time_within_delta(expect_resolution="ns"),
},
},
"body": "The user agent successfully received a response, but it had a 500 status code",
"level": "warn",
Expand Down
Loading
Loading