
Add aws.firehose.arn, aws.firehose.request_id and aws.metrics_names_fingerprint fields #11239

Merged
merged 13 commits into elastic:main from add_fields on Oct 1, 2024

Conversation

Contributor

@kaiyan-sheng kaiyan-sheng commented Sep 24, 2024

Proposed commit message

This PR adds `aws.firehose.arn` and `aws.firehose.request_id` field definitions for the firehose integration.

This PR also adds `aws.metrics_names_fingerprint`, which is a hash of the list of metric names that exist in each document. This way we don't have to count `request_id` as a dimension.

Checklist

  • I have reviewed tips for building integrations and this pull request is aligned with them.
  • I have verified that all data streams collect metrics or logs.
  • I have added an entry to my package's changelog.yml file.
  • I have verified that Kibana version constraints are current according to guidelines.

How to test this PR locally?

  1. Install firehose integration
  2. Install aws integration
  3. POST the two documents below using the Dev Tools console into the metrics-aws.cloudwatch-default data stream, to mimic metrics coming in from Firehose:

```
POST metrics-aws.cloudwatch-default/_doc/
{
  "@timestamp": "2024-09-25T23:12:00.000Z",
  "agent.type": "firehose",
  "cloud.provider": "aws",
  "cloud.account.id": "627286350134",
  "cloud.region": "eu-central-1",
  "aws.exporter.arn": "arn:aws:cloudwatch:eu-central-1:627286350134:metric-stream/CustomFull-KefpMG",
  "aws.cloudwatch.namespace": "AWS/Kinesis",
  "aws.firehose.arn": "arn:aws:firehose:eu-central-1:627286350134:deliverystream/KS-PUT-ELI-2",
  "aws.firehose.request_id": "7628c293-ba54-4ea4-9abe-d256976c1dbd",
  "aws.dimensions": { "StreamName": "test-esf-encrypted" },
  "aws.kinesis.metrics.GetRecords_Success": { "count": 240, "sum": 240, "avg": 1, "max": 1, "min": 1 },
  "aws.kinesis.metrics.GetRecords_Bytes": { "count": 240, "sum": 0, "avg": 0, "max": 0, "min": 0 },
  "aws.kinesis.metrics.GetRecords_Latency": { "count": 240, "sum": 1864, "avg": 7.766667, "max": 18, "min": 6 },
  "aws.kinesis.metrics.GetRecords_IteratorAge": { "count": 240, "sum": 0, "avg": 0, "max": 0, "min": 0 },
  "aws.kinesis.metrics.GetRecords_Records": { "count": 240, "sum": 0, "avg": 0, "min": 0, "max": 0 },
  "aws.firehose.parameters.X-Found-Cluster": "bb8e51259abe4c21996954f5cfe90af1",
  "data_stream.type": "metrics",
  "data_stream.dataset": "aws.cloudwatch",
  "data_stream.namespace": "default"
}
```

```
POST metrics-aws.cloudwatch-default/_doc/
{
  "@timestamp": "2024-09-25T23:12:00.000Z",
  "agent.type": "firehose",
  "cloud.provider": "aws",
  "cloud.account.id": "627286350134",
  "cloud.region": "eu-central-1",
  "aws.exporter.arn": "arn:aws:cloudwatch:eu-central-1:627286350134:metric-stream/CustomFull-KefpMG",
  "aws.cloudwatch.namespace": "AWS/Kinesis",
  "aws.firehose.arn": "arn:aws:firehose:eu-central-1:627286350134:deliverystream/KS-PUT-ELI-2",
  "aws.firehose.request_id": "70222524-bdc0-47f4-b7f8-adb7c6998bc7",
  "aws.dimensions": { "StreamName": "test-esf-encrypted" },
  "aws.kinesis.metrics.GetRecords_IteratorAgeMilliseconds": { "count": 240, "sum": 0, "avg": 0, "max": 0, "min": 0 },
  "aws.kinesis.metrics.ReadProvisionedThroughputExceeded": { "count": 240, "sum": 0, "avg": 0, "max": 0, "min": 0 },
  "aws.firehose.parameters.X-Found-Cluster": "bb8e51259abe4c21996954f5cfe90af1",
  "data_stream.type": "metrics",
  "data_stream.dataset": "aws.cloudwatch",
  "data_stream.namespace": "default"
}
```
  4. With these two POSTs you should see a success response for both:
```
{
  "_index": ".ds-metrics-aws.kinesis-default-2024.09.25-000001",
  "_id": "kpqzKhZo8euJ-o_TAAABkiuFKMA",
  "_version": 1,
  "result": "created",
  "_shards": { "total": 2, "successful": 1, "failed": 0 },
  "_seq_no": 1,
  "_primary_term": 1
}
```
  5. These two documents have the same timestamp, dimensions, namespace, account ID, exporter ARN, and region, but come from two different requests. By adding `aws.metrics_names_fingerprint` as a TSDB dimension, we are able to ingest both documents into Elasticsearch. You can confirm with the query sketched below.
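As an illustrative sketch (not part of this PR), one way to confirm that both documents were accepted is to aggregate on `aws.firehose.request_id`. Based on the success response in step 4, the documents are routed to the `metrics-aws.kinesis-default` data stream; this assumes `aws.dimensions.StreamName` and `aws.firehose.request_id` are mapped as keyword fields, which they should be as dimensions.

```
GET metrics-aws.kinesis-default/_search
{
  "size": 0,
  "query": {
    "term": { "aws.dimensions.StreamName": "test-esf-encrypted" }
  },
  "aggs": {
    "by_request_id": {
      "terms": { "field": "aws.firehose.request_id" }
    }
  }
}
```

If both documents were ingested, the `by_request_id` aggregation should return two buckets, one per request ID.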

@andrewkroh andrewkroh added the Integration:awsfirehose Amazon Data Firehose label Sep 24, 2024
@kaiyan-sheng kaiyan-sheng marked this pull request as ready for review September 24, 2024 22:02
@kaiyan-sheng kaiyan-sheng requested a review from a team as a code owner September 24, 2024 22:02
@kaiyan-sheng kaiyan-sheng self-assigned this Sep 24, 2024
@andrewkroh andrewkroh added bugfix Pull request that fixes a bug issue Team:obs-ds-hosted-services Label for the Observability Hosted Services team [elastic/obs-ds-hosted-services] labels Sep 24, 2024
Member

@axw axw left a comment


Why would we want these as dimensions? These fields describe the conduit for the data, rather than anything related to the metrics themselves. So I would see them as metadata rather than dimensions.

@kaiyan-sheng
Contributor Author

kaiyan-sheng commented Sep 25, 2024

@axw For metrics coming in from the same Firehose stream, I see cases where two documents have the same timestamp, dimensions, namespace, account ID, exporter ARN, and region, but come from two different requests. Without specifying request_id as a dimension, one of the documents gets dropped.

I'm still trying to test out whether aws.firehose.arn needs to be a dimension for the case where the same metrics are ingested from two different Firehose streams. But I think in that case the exporter ARN will be different, so we should be OK.

@kaiyan-sheng
Contributor Author

Testing with only the firehose integration assets installed, I was able to ingest documents with the same timestamp, dimensions, namespace, account ID, exporter ARN, and region but different aws.firehose.request_id.
But testing with both the firehose and aws integration assets installed, I still get an error when ingesting the second document, which is the one that has a different aws.firehose.request_id:

```
{
  "error": {
    "root_cause": [
      {
        "type": "version_conflict_engine_exception",
        "reason": "[kpqzKhZo8euJ-o_TAAABkicK60A][LI4OMA7GPJ422d7twNPxBkhU9sqRvtsl0QxVjz5Q9TriPoJxICdKr4x1LVlk@2024-09-25T02:38:00.000Z]: version conflict, document already exists (current version [1])",
        "index_uuid": "5r5AtNd8Qj-HMQkXPit4tw",
        "shard": "0",
        "index": ".ds-metrics-aws.kinesis-default-2024.09.25-000002"
      }
    ],
    "type": "version_conflict_engine_exception",
    "reason": "[kpqzKhZo8euJ-o_TAAABkicK60A][LI4OMA7GPJ422d7twNPxBkhU9sqRvtsl0QxVjz5Q9TriPoJxICdKr4x1LVlk@2024-09-25T02:38:00.000Z]: version conflict, document already exists (current version [1])",
    "index_uuid": "5r5AtNd8Qj-HMQkXPit4tw",
    "shard": "0",
    "index": ".ds-metrics-aws.kinesis-default-2024.09.25-000002"
  },
  "status": 409
}
```

But adding aws.firehose.request_id as a dimension in the AWS integration solves the problem. Do we need to add these fields in both integrations? Will look into it more tomorrow.
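For context, a hypothetical sketch of what marking this field as a dimension amounts to: integrations declare it with `dimension: true` in their field definitions, which results in the `time_series_dimension` mapping parameter on the keyword field. The equivalent mapping could be applied manually through Fleet's `@custom` component template for the data stream (the names below are illustrative, not the change made in this PR):

```
PUT _component_template/metrics-aws.cloudwatch@custom
{
  "template": {
    "mappings": {
      "properties": {
        "aws.firehose.request_id": {
          "type": "keyword",
          "time_series_dimension": true
        }
      }
    }
  }
}
```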

@kaiyan-sheng kaiyan-sheng requested a review from axw September 25, 2024 03:04
@axw
Member

axw commented Sep 25, 2024

> For metrics coming in from the same Firehose stream, I see cases where two documents have the same timestamp, dimensions, namespace, account ID, exporter ARN, and region, but come from two different requests. Without specifying request_id as a dimension, one of the documents gets dropped.

Do we know why there are multiple requests with all the same dimensions? Are they retries, where one of them should get dropped?

@kaiyan-sheng
Contributor Author

kaiyan-sheng commented Sep 25, 2024

@axw These are multiple requests with the same dimensions BUT different metric data points. For example, cpu.utilization and network.in will be in request number 1, and then network.out will be in request number 2. But request number 2 can also contain metric data points for other services/timestamps.

@axw
Member

axw commented Sep 25, 2024

Also discussed on Slack. We're having to choose the best of a bad lot here:

  1. we could send each metric in its own document (see Challenge of one metric per document elasticsearch#91775), and make the metric name a dimension/identifying field for the time series
  2. we can continue grouping, but add the request ID to disambiguate

I don't like the second option (what this PR implements), but I think it's the most reasonable solution at the moment. The reason I don't like it is that it would make automatic rollups ineffective, without some knowledge that request_id is something to ignore in rollups. Not a problem for today, so we can kick that can down the road a bit longer.

Member

@axw axw left a comment


@felixbarny and I had a chat on Slack about this, as it's another issue related to elastic/elasticsearch#99123. Although we can technically work around it using the request ID, this will reduce the storage efficiency of TSDB.

It's still a workaround, and not ideal, but we should probably hash the metric names instead. The hash can then be included as a dimension. That has been done for the Prometheus integration, so I'd suggest following suit. I think we could do it in the integration ingest pipeline?
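For illustration only (this is not the actual pipeline change in this PR), such a fingerprint could be computed in an ingest pipeline by collecting the metric field names with a script processor and hashing them with the fingerprint processor. The pipeline name and the temporary `metric_names_joined` field below are hypothetical:

```
PUT _ingest/pipeline/metrics-names-fingerprint-sketch
{
  "processors": [
    {
      "script": {
        "description": "Join the sorted metric field names into a temporary field (illustrative only)",
        "source": "List names = new ArrayList(); for (def e : ctx.entrySet()) { if (e.getKey().contains('.metrics.')) { names.add(e.getKey()); } } Collections.sort(names); ctx['metric_names_joined'] = String.join(',', names);"
      }
    },
    {
      "fingerprint": {
        "fields": ["metric_names_joined"],
        "target_field": "aws.metrics_names_fingerprint",
        "method": "SHA-256"
      }
    },
    {
      "remove": {
        "field": "metric_names_joined"
      }
    }
  ]
}
```

Documents that share all other dimensions but carry a different set of metric names then get different fingerprints, so both can be ingested without treating request_id as a dimension.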

@kaiyan-sheng kaiyan-sheng changed the title Add aws.firehose.arn and aws.firehose.request_id fields Add aws.firehose.arn, aws.firehose.request_id and aws.metrics_names_fingerprint fields Sep 25, 2024
@elastic-vault-github-plugin-prod

elastic-vault-github-plugin-prod bot commented Sep 25, 2024

🚀 Benchmarks report

Package aws_bedrock 👍(0) 💚(0) 💔(1)

| Data stream | Previous EPS | New EPS | Diff (%) | Result |
|-------------|--------------|---------|----------|--------|
| invocation  | 480.54       | 364.83  | -115.71 (-24.08%) | 💔 |

Package awsfirehose 👍(0) 💚(1) 💔(1)

| Data stream | Previous EPS | New EPS | Diff (%) | Result |
|-------------|--------------|---------|----------|--------|
| metrics     | 35714.29     | 10638.3 | -25075.99 (-70.21%) | 💔 |

To see the full report comment with /test benchmark fullreport

@kaiyan-sheng kaiyan-sheng marked this pull request as draft September 25, 2024 22:57
@kaiyan-sheng kaiyan-sheng added the Integration:aws_bedrock Amazon Bedrock label Sep 25, 2024
@kaiyan-sheng kaiyan-sheng requested a review from axw September 25, 2024 23:36
@kaiyan-sheng kaiyan-sheng marked this pull request as ready for review September 25, 2024 23:44
@andrewkroh andrewkroh added Team:Obs-InfraObs Label for the Observability Infrastructure Monitoring team [elastic/obs-infraobs-integrations] Team:Security-Service Integrations Security Service Integrations Team [elastic/security-service-integrations] labels Sep 25, 2024
@elasticmachine

Pinging @elastic/security-service-integrations (Team:Security-Service Integrations)

Member

@axw axw left a comment


Looks good, thanks. The main issue is regarding the change in field access in the routing rules; it doesn't seem related to the main change?

@kaiyan-sheng kaiyan-sheng requested a review from axw September 26, 2024 21:28
Contributor

@efd6 efd6 left a comment


LGTM as codeowner for aws_bedrock with one nit.

Some nits in other files.

Member

@axw axw left a comment


LGTM, just a couple of minor comments in addition to @efd6's

Quality Gate failed

Failed conditions
21.3% Coverage on New Code (required ≥ 80%)

See analysis details on SonarQube

@elasticmachine

💚 Build Succeeded

cc @kaiyan-sheng

Contributor

@agithomas agithomas left a comment


LGTM!

@kaiyan-sheng kaiyan-sheng merged commit d93fd8f into elastic:main Oct 1, 2024
4 of 5 checks passed
@kaiyan-sheng kaiyan-sheng deleted the add_fields branch October 1, 2024 15:57
@elastic-vault-github-plugin-prod

Package aws_bedrock - 0.10.0 containing this change is available at https://epr.elastic.co/search?package=aws_bedrock

@elastic-vault-github-plugin-prod

Package awsfirehose - 1.3.0 containing this change is available at https://epr.elastic.co/search?package=awsfirehose

harnish-elastic pushed a commit to harnish-elastic/integrations that referenced this pull request Feb 4, 2025
…ingerprint fields (elastic#11239)

This PR adds aws.firehose.arn and aws.firehose.request_id field definitions for the firehose integration.
This PR also adds aws.metrics_names_fingerprint, which is a hash of the list of metric names that exist in each document. This way we don't have to count request_id as a dimension.
harnish-elastic pushed a commit to harnish-elastic/integrations that referenced this pull request Feb 5, 2025
…ingerprint fields (elastic#11239)

This PR adds aws.firehose.arn and aws.firehose.request_id field definitions for the firehose integration.
This PR also adds aws.metrics_names_fingerprint, which is a hash of the list of metric names that exist in each document. This way we don't have to count request_id as a dimension.