-
Notifications
You must be signed in to change notification settings - Fork 456
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add aws.firehose.arn, aws.firehose.request_id and aws.metrics_names_fingerprint fields #11239
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why would we want these as dimensions? These fields describe the conduit for the data, rather than anything related to the metrics themselves. So I would see them as metadata rather than dimensions.
@axw For metrics coming in from the same Firehose stream, I see cases when two documents have the same timestamp, dimension, namespace, accountID, exportARN and region BUT from two different requests. Without specifying the request_id being a dimension, one of the documents get dropped. I'm still trying to test out if aws.firehose.arn needs to be a dimension with the use case of having same metrics ingesting from two different firehose streams. But I think this case the exportARN will be different so we should be ok. |
Testing with firehose integration assets, I was able to ingest documents with the same timestamp, dimension, namespace, accountID, exportARN and region but different
But adding |
Do we know why there are multiple requests with all the same dimensions? Are they retries, where one of them should get dropped? |
@axw These are multiple requests with the same dimensions BUT different metric data points. For example |
Also discussed on Slack. We're having to choose the best of a bad lot here:
I don't like the second option (what this PR implements), but I think it's the most reasonable solution at the moment. The reason I don't like it is that it would make automatic rollups ineffective, without some knowledge that request_id is something to ignore in rollups. Not a problem for today, so we can kick that can down the road a bit longer. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@felixbarny and I had a chat on Slack about this, as it's another issue related to elastic/elasticsearch#99123. Although we can technically work around it using the request ID, this will reduce the storage efficiency of TSDB.
It's still a workaround, and not ideal, but we should probably hash the metric names instead. The hash can then be included as a dimension. That has been done for the Prometheus integration, so I'd suggest following suit. I think we could do it in the integration ingest pipeline?
🚀 Benchmarks reportPackage
|
Data stream | Previous EPS | New EPS | Diff (%) | Result |
---|---|---|---|---|
invocation |
480.54 | 364.83 | -115.71 (-24.08%) | 💔 |
Package awsfirehose
👍(0) 💚(1) 💔(1)
Expand to view
Data stream | Previous EPS | New EPS | Diff (%) | Result |
---|---|---|---|---|
metrics |
35714.29 | 10638.3 | -25075.99 (-70.21%) | 💔 |
To see the full report comment with /test benchmark fullreport
…ws_bedrock packages
Pinging @elastic/security-service-integrations (Team:Security-Service Integrations) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good, thanks. Main issue is regarding the change in field access in routing rules - doesn't seem related to the main change?
packages/awsfirehose/data_stream/metrics/elasticsearch/ingest_pipeline/default.yml
Show resolved
Hide resolved
packages/awsfirehose/data_stream/metrics/elasticsearch/ingest_pipeline/default.yml
Outdated
Show resolved
Hide resolved
packages/awsfirehose/data_stream/metrics/elasticsearch/ingest_pipeline/default.yml
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM as codeowner for aws_bedrock with one nit.
Some nits in other files.
packages/aws/data_stream/apigateway_metrics/fields/package-fields.yml
Outdated
Show resolved
Hide resolved
packages/awsfirehose/data_stream/metrics/elasticsearch/ingest_pipeline/default.yml
Outdated
Show resolved
Hide resolved
packages/awsfirehose/data_stream/metrics/elasticsearch/ingest_pipeline/default.yml
Outdated
Show resolved
Hide resolved
packages/awsfirehose/data_stream/metrics/elasticsearch/ingest_pipeline/default.yml
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, just a couple of minor comments in addition to @efd6's
packages/awsfirehose/data_stream/metrics/elasticsearch/ingest_pipeline/default.yml
Show resolved
Hide resolved
packages/awsfirehose/data_stream/metrics/elasticsearch/ingest_pipeline/default.yml
Outdated
Show resolved
Hide resolved
|
💚 Build Succeeded
History
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM!
Package aws_bedrock - 0.10.0 containing this change is available at https://epr.elastic.co/search?package=aws_bedrock |
Package awsfirehose - 1.3.0 containing this change is available at https://epr.elastic.co/search?package=awsfirehose |
…ingerprint fields (elastic#11239) This PR adds aws.firehose.arn and aws.firehose.request_id field definitions for the firehose integration. This PR also adds aws.metrics_names_fingerprint which is a hash of the list of metric names exist in each document. This way we don't have to count request_id as a dimension.
…ingerprint fields (elastic#11239) This PR adds aws.firehose.arn and aws.firehose.request_id field definitions for the firehose integration. This PR also adds aws.metrics_names_fingerprint which is a hash of the list of metric names exist in each document. This way we don't have to count request_id as a dimension.
Proposed commit message
This PR adds
aws.firehose.arn
andaws.firehose.request_id
field definitions for the firehose integration.This PR also adds
aws.metrics_names_fingerprint
which is a hash of the list of metric names exist in each document. This way we don't have to countrequest_id
as a dimension.Checklist
changelog.yml
file.How to test this PR locally?
metrics-aws.cloudwatch-default
data stream to mimic metrics coming in from Firehose:POST metrics-aws.cloudwatch-default/_doc/