Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Maybe Logger metadata should affect grouping #837

Open
ruslandoga opened this issue Dec 17, 2024 · 11 comments · May be fixed by #850
Open

Maybe Logger metadata should affect grouping #837

ruslandoga opened this issue Dec 17, 2024 · 11 comments · May be fixed by #850

Comments

@ruslandoga
Copy link
Contributor

ruslandoga commented Dec 17, 2024

Right now logged errors seem to be grouped by message only.

iex> Logger.error("error from logs", sentry: %{extra: %{response: Bamboo.ApiError.build_api_error("info 1")}})
iex> Logger.error("error from logs", sentry: %{extra: %{response: Bamboo.ApiError.build_api_error("info 2")}})
# 16:06:06.217 [warning] Event dropped due to being a duplicate of a previously-captured event.
:ok

Would it make sense to consider a hash of :sentry metadata as well?

@getsantry getsantry bot moved this to Waiting for: Product Owner in GitHub Issues with 👀 3 Dec 17, 2024
@whatyouhide
Copy link
Collaborator

Mmmm what do we do for errors, do you know? I haven't looked and won't have time til the end of the day.

@solnic
Copy link
Collaborator

solnic commented Dec 19, 2024

@whatyouhide this is what we do:

  # Used to compare events for deduplication. See "Sentry.Dedupe".
  @doc false
  @spec hash(t()) :: non_neg_integer()
  def hash(%__MODULE__{} = event) do
    :erlang.phash2([
      event.exception,
      event.message,
      event.level,
      event.fingerprint
    ])
  end

Adding metadata to this sounds like a good idea to me.

@whatyouhide
Copy link
Collaborator

I’m not sure because metadata might contain fleeting data that gets updated on every call (imagine a timestamp) but that shouldn't affect deduplication. What do other SDKs do?

@solnic
Copy link
Collaborator

solnic commented Dec 19, 2024

I’m not sure because metadata might contain fleeting data that gets updated on every call (imagine a timestamp) but that shouldn't affect deduplication. What do other SDKs do?

I'm not sure - @sl0thentr0py do you happen to know?

@whatyouhide
Copy link
Collaborator

@solnic you could probably grep through source code of at least the Python and Ruby SDKs because I remember finding this in there pretty easily.

@solnic
Copy link
Collaborator

solnic commented Dec 19, 2024

@whatyouhide I did look for it in ruby and couldn't find it 😅 I'll give it another shot hah

@ruslandoga
Copy link
Contributor Author

ruslandoga commented Dec 19, 2024

Maybe:

  • allow disabling deduplication for loggers (Sentry sinks can do rate-limiting per grouping if needed)
  • deduplicate at least on three (or so) the available fields (can be though of degrees of freedom in this scenario), e.g. if for a Logger.error only two of these are present, level (which is always error? so it's not really a degree of freedom) and message, we can pick additional optional field, e.g. metadata from the call, and we can use a probabilistic view of the metadata fields
  • replace exact hash matching with SimHash/MinHash and some similarity measure
  • we just make this issue a note in documentation and explain how to set a custom fingerprint on Logger.error calls

@getsantry getsantry bot moved this to Waiting for: Product Owner in GitHub Issues with 👀 3 Dec 19, 2024
@sl0thentr0py
Copy link
Member

sl0thentr0py commented Jan 9, 2025

@solnic

You can either

  • not run the dedupe logic for logger messages at all
  • add the sentry metadata to the hash

either way is fine with me. In general, we have comprehensive server side grouping, so these SDK side dedupes are mainly for not sending the exact same payload twice.

@whatyouhide
Copy link
Collaborator

either way is fine with me. In general, we have comprehensive server side grouping, so these SDK side dedupes are mainly for not sending the exact same payload twice.

Since that's the case, I think it makes more sense to include the meta in the deduplication then, or do the deduplication at a later stage (so that logger messages just become Sentry.Events and we do deduplication there).

@ruslandoga
Copy link
Contributor Author

ruslandoga commented Jan 13, 2025

I think it makes more sense to include the meta in the deduplication

I'd be happy to open a PR with this approach if this issue is not taken yet! :)

or do the deduplication at a later stage (so that logger messages just become Sentry.Events and we do deduplication there).

Isn't that what's already happening? As per #837 (comment)

  def hash(%__MODULE__{} = event) do
    # for a log event it's something like
    :erlang.phash2([
      _exception = nil,
      _message = "error from logs",
      _level = "error",
      _fingerprint = nil
    ])
  end

@getsantry getsantry bot moved this to Waiting for: Product Owner in GitHub Issues with 👀 3 Jan 13, 2025
@whatyouhide
Copy link
Collaborator

Even for events the deduplication might be buggy here, because user/tags/extra should probably be factored in. So a PR is absolutely more than welcome! 💟

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
Status: No status
Development

Successfully merging a pull request may close this issue.

4 participants