Support for specifying log groups for subscription filters that exceed parameter value limit #219
hello @pushred
we created a script for such a purpose; it is already merged in. It requires a YAML file for configuring the ARNs and IDs, similar to the ones you provide as parameters when publishing from SAR. It will be part of the next release.
this is indeed not possible when deploying from SAR: our SAR templates include a macro that can be referenced only by its literal name. Basically, we cannot create multiple macros with dynamic names for each deployment, since we could not use the intrinsic functions required to compose the dynamic name when referencing the macro. Trying to create the same macro after the first deployment will make the following deployments fail.
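A minimal sketch of that constraint, with hypothetical names (the actual macro and resource names in the SAR template differ):

```yaml
# The macro must already be registered under a fixed, literal name; the
# template that uses it references that exact string in its Transform
# section, which cannot be composed per deployment (e.g. with !Sub).
Transform:
  - ElasticServerlessForwarderMacro   # literal name, shared by every deployment

Resources:
  # Placeholder resource; in the real template the macro expands the
  # forwarder resources from the provided trigger ARNs.
  PlaceholderTopic:
    Type: AWS::SNS::Topic
```

A second stack trying to register a macro with that same name then fails, which is why the following deployments break.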
@aspacca thank you for such a quick reply, and good timing with the addition of this script! I managed to run it successfully with a few tweaks:
I also had to have Docker running for SAM. With all that out of the way I successfully deployed a stack with a single log group. However, once I added all 176 groups I encountered this error:
My publish config file is 19k. When this error is raised the temporary file no longer exists, so I'm not able to simply resume the process. The script may need to be revised to use the S3 bucket deploy method. I'll look into making that modification for my purposes, but I'm curious whether this could also happen with your version.
There actually appears to be a bug in the script that is at least contributing to the file size issue. I commented out the statements below; the generated policy repeats them cumulatively for each log group:

```yaml
- Effect: Allow
  Action: logs:DescribeLogGroups
  Resource:
    - arn:aws:logs:us-east-1:123456789101:log-group:*:*
- Effect: Allow
  Action: logs:DescribeLogStreams
  Resource:
    - arn:aws:logs:us-east-1:123456789101:log-group:/aws/lambda/log-group-a:log-stream:*
- Effect: Allow
  Action: logs:DescribeLogGroups
  Resource:
    - arn:aws:logs:us-east-1:123456789101:log-group:*:*
- Effect: Allow
  Action: logs:DescribeLogStreams
  Resource:
    - arn:aws:logs:us-east-1:123456789101:log-group:/aws/lambda/log-group-a:log-stream:*
    - arn:aws:logs:us-east-1:123456789101:log-group:/aws/lambda/log-group-b:log-stream:*
- Effect: Allow
  Action: logs:DescribeLogGroups
  Resource:
    - arn:aws:logs:us-east-1:123456789101:log-group:*:*
- Effect: Allow
  Action: logs:DescribeLogStreams
  Resource:
    - arn:aws:logs:us-east-1:123456789101:log-group:/aws/lambda/log-group-a:log-stream:*
    - arn:aws:logs:us-east-1:123456789101:log-group:/aws/lambda/log-group-b:log-stream:*
    - arn:aws:logs:us-east-1:123456789101:log-group:/aws/lambda/log-group-c:log-stream:*
```

If I remove all of those instances I am still exceeding the template size limit, at 60k, but that is far less than before.
Well, I worked around that, but hit another AWS limit that will probably force a much bigger workaround. With this many log groups my ElasticServerlessForwarderPolicy is ~22k, which exceeds the maximum policy size of 10,240 bytes. To work around that I tried specifying a wildcard to match on a log group name prefix, e.g.

```yaml
- Effect: Allow
  Action:
  Resource:
    - arn:aws:logs:us-east-1:123456789101:log-group:/aws/lambda/naming-prefix-*:log-stream:*
```

It's unclear if that is supported, but attempting to deploy I ran into either a side effect of that or another limit, with failures to create
This appears to be due to the number of
Are there any options remaining other than somehow deploying this fully outside of CloudFormation? Or a major refactor of our project to bring our function count way down.
hi @pushred, thanks for your feedback
thanks for reporting it: I missed that. I will push a fix as soon as possible
thanks again, I will think about whether to make it the default behaviour or an option
while we can optimise the size of the policy to make it the smallest possible for the given number of ARNs, in the end the final size can still hit the AWS limit. There's nothing we (or you) can do about this, other than deploying multiple lambdas so that the policy is split into different, smaller ones.
if you provide a "glob" value directly as an ARN in the publish config YAML, while this might produce a smaller policy that works properly (just assuming, I didn't check), it will fail to attach the CloudWatch Logs log groups as triggers of the forwarder.
I spotted the bug: it was an extra level of indentation.
I've decided to make it the default behaviour: could you please test the script from the branch with the fixes?
not changing this for the moment: we'll most likely provide instructions about it, but thanks anyway for raising the issue.
Confirmed that with the latest script the generated
Good to know... there is no escaping the limits around this.
Ah right, I haven't worked with Python lately but we do use venv in all of our Python projects so that should work well.
So based on what you mentioned earlier re: multiple deployments, my understanding is that multiple CloudFormation ESF stacks aren't possible, but there could be multiple Lambdas within the deployed stack over which the log groups can be spread. Given that ESF controls the stack, is this an enhancement that could be made anytime soon?

Beyond the current project I am working on, I'm also wondering about other Lambda-based services where we would like to ship logs. If everything must be routed through the same ESF stack we would need this capability regardless of how well we could decompose the project or otherwise reduce the number of functions. Multiple AWS accounts could be an option, but I don't believe a viable one in our org.
Thinking about this more, the prospect of having dozens of Lambdas and queues to handle our quantity of log groups across multiple environments is daunting. Our use case is probably better suited by Elastic Agent, which I looked at initially but somehow overlooked the Kibana Fleet API for enabling log group integrations programmatically. Will be pursuing that for now.
please let me try to rephrase what exactly your needs are:
Did I summarise correctly? I assume the features provided by both Elastic Agent and the Elastic Serverless Forwarder are enough for you, and the choice is based only on the three points above. How do Elastic Agent and the Elastic Serverless Forwarder compare for each of these points?
Finally I want to mention an aspect to consider in relation to scaling:
This is just my understanding, based on the information made available by AWS: if you are concerned about hitting any scaling limits in your choice, you should contact AWS in order to properly assess how a solution based on Lambda and one based on EC2 instances compare.
Our org has a preference for IaC via Terraform, but some exceptions have been made for projects built with the Serverless framework, which is built over CloudFormation as well. So by extension we expected ESF to be granted a similar exception. But we also have instances of Fleet-managed Elastic Agent already in use. The agents thus far are containerized and run on the same server as the services that are sending telemetry.
This would be ideal, as we otherwise have to handle the stack splitting ourselves somehow.
Correct, but we understand the hard AWS constraints that are preventing this from happening. Thanks for your comparison of Elastic Agent vs. ESF and the other notes re: scaling. This has all been helpful today and spawned some lengthier discussion of our requirements.

I confess that I jumped into trying to use ESF without much review, following a colleague's earlier effort that identified it as a solution after a separate, earlier POC using Functionbeat. After further reviewing our needs and the context of everything else running in our account, we've concluded that getting logs from CloudWatch's API, as the Elastic Agent/Filebeat integration does, won't be feasible either, due to hard limits on FilterLogEvents API requests.

I think we may have been misguided in gravitating to Functionbeat and ESF as a solution for ingesting Lambda logs. This was partly due to their familiar deployment model; we associated the other solutions, e.g. Filebeat, with being more suited to our EC2 instances. I see the value of ESF in simpler scenarios as something that is faster to deploy and cheaper to operate.

Longer term I believe what we actually need is something like the Elastic APM AWS Lambda extension, which would bypass the CloudWatch layer altogether. The team behind that indeed added some support for collecting logs last year, but it is still preliminary. The recently released AWS Lambda Telemetry API seems like it will further the possibilities, and many of Elastic's competitors already have extension-based solutions for collecting logs using an earlier iteration of that API. Unfortunately the project I am working on has a timeline that requires observability to be in place before such a solution will exist. So we are going to pursue building our own extension in order to write logs to S3 and then ingest them using some other Elastic solution.

Thank you for all of your assistance and time!
once you have the logs in S3, you can still use ESF for the ingestion (see this blog post; you can ignore the part about the integrations if you are ingesting your custom application logs).

as far as I understand you already have multiple Lambdas in your stack, and you saw Functionbeat/ESF as a natural solution since they are Lambdas as well: this is true insofar as you want to use something based on a technology you already know, with all the benefits that come with it.

the Elastic APM AWS Lambda extension added support for forwarding logs in v1.2.0: I think that would be the best solution for you. ESF itself will switch to the extension from the Elastic APM Python Agent as soon as this issue is addressed.
@aspacca any timeline for an APM AWS Lambda extension for functions running on the .NET runtime? The reason I ask is because we have the same issue as @pushred and are looking for an elegant solution to push Lambda logs to Elastic Cloud. Currently we have a home-grown solution of splitting Functionbeat into various Lambdas, dividing the log groups up amongst those Lambdas. However, we would like to truly shift left with a Lambda extension and have this be a part of our Terraform provisioning, to allow devs to toggle whether they want logging or not for their Lambdas.
@aspacca ah right! I forgot about S3 as a possible input. We plan to consolidate all of the logs in a single bucket, so that would avoid these issues we're having around limits. We did see the addition of log forwarding in the extension for v1.2.0; however, it seems that the logs are only available in the APM UI and it doesn't appear that we can specify the data stream. But those are some assumptions based on what we saw in GitHub and the limited docs.
@aspacca in further research I reviewed some of our org's other Serverless projects, which currently send logs to Sumo Logic. I found that its own Lambda solution for log collection somehow bypasses the need to create a trigger for each of the application Lambda function log groups. Instead their shipper Lambda is triggered by a subscription filter on their DLQ Lambda's log group, and all of the Lambda functions also have a subscription filter pointing to it as a destination. When I view the shipper function, however, I do not see these filters as triggers, nor are there any resource policy statements for each log group. I'm not sure of the difference between a trigger and a subscription filter, actually. Their stack has a single subscription filter that seems to be shared by all functions, and permissions are similarly consolidated.

It also addresses the problem of subscribing to log groups on an ongoing basis. With ESF I intended to handle this at deployment by re-deploying ESF with a list of the current log groups at deploy time. But Sumo's Log Group Connector handles this with a Lambda function that is triggered on CreateLogGroup events and configured with a log group name prefix and such to match groups to subscribe. It's a nice solution because we haven't had to touch it even as projects and their deployments have changed.

I'm curious whether you're aware of their solution and why ESF has taken the approach it has in contrast? We're committed to using Elastic for all of the benefits of Kibana, especially as it relates to structured logging. But I'm expecting some questions around why Elastic requires a more complicated solution than what we already have in place for Sumo.
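For reference, the general shape of that auto-subscription pattern (with hypothetical names, and assuming CloudTrail management events are enabled so CreateLogGroup calls reach EventBridge) is a rule that invokes a small Lambda which adds the subscription filter for matching log groups:

```yaml
Parameters:
  SubscriberFunctionArn:
    Type: String   # ARN of a Lambda (defined elsewhere) that calls PutSubscriptionFilter

Resources:
  NewLogGroupRule:
    Type: AWS::Events::Rule
    Properties:
      EventPattern:
        source: ["aws.logs"]
        detail-type: ["AWS API Call via CloudTrail"]
        detail:
          eventSource: ["logs.amazonaws.com"]
          eventName: ["CreateLogGroup"]
      Targets:
        - Arn: !Ref SubscriberFunctionArn
          Id: log-group-subscriber

  SubscriberInvokePermission:
    Type: AWS::Lambda::Permission
    Properties:
      Action: lambda:InvokeFunction
      FunctionName: !Ref SubscriberFunctionArn
      Principal: events.amazonaws.com
      SourceArn: !GetAtt NewLogGroupRule.Arn
```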
a subscription filter is the way to trigger a lambda from an event in a CloudWatch Logs log group, as opposed to an event source mapping, which is used for SQS and Kinesis.

the Sumo Logic shipper lambda has a subscription filter, by default, on the CloudWatch Logs log group that's created when deploying the CloudFormation template: you then have to send all your logs to that log group. this is something we decided explicitly to avoid: we'd like users to not have to change their existing inputs (be it CloudWatch Logs, Kinesis data stream, etc). we ask for the ARNs of the inputs and we manage all the required permissions and settings at deployment time. compared to the Sumo Logic solution, if you have existing CloudWatch Logs log groups, they require you to manually add all the subscription filters and permissions on your own.

the Sumo Log Group Connector works only for newly created CloudWatch Logs log groups, but I can see its value for dealing automatically with new log groups without the need to update the lambda deployment.

beware that you can achieve something similar to a "single CloudWatch Logs log group with multiple forwarder destinations" with the Elastic Serverless Forwarder: it's just a matter of sending the separated logs of each of your Lambda apps to a single log group, but in a different and persistent log stream.
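As a rough illustration (log group name and ARNs are placeholders), a subscription filter trigger on a single log group comes down to two CloudFormation resources, and it is this kind of resource that multiplies with the number of log groups:

```yaml
Resources:
  ForwarderSubscriptionFilter:
    Type: AWS::Logs::SubscriptionFilter
    Properties:
      LogGroupName: /aws/lambda/log-group-a
      FilterPattern: ""          # empty pattern forwards every log event
      DestinationArn: arn:aws:lambda:us-east-1:123456789101:function:elastic-serverless-forwarder

  AllowCloudWatchLogsInvoke:
    Type: AWS::Lambda::Permission
    Properties:
      Action: lambda:InvokeFunction
      FunctionName: arn:aws:lambda:us-east-1:123456789101:function:elastic-serverless-forwarder
      Principal: logs.amazonaws.com
      SourceArn: arn:aws:logs:us-east-1:123456789101:log-group:/aws/lambda/log-group-a:*
```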
beware that if you want to ingest each set of logs into a different data stream, you have to create multiple S3 notifications to different SQS queues, one for each set of logs (identified by a prefix in the bucket or something similar), in order to be able to specify a different target, similar to the example above with CloudWatch Logs.

as for the DLQ, you can check the Forwarder documentation for the details about error handling: https://www.elastic.co/guide/en/observability/current/aws-serverless-troubleshooting.html#_error_handling
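A sketch of what the forwarder's config.yaml could look like for that setup, with two prefix-specific SQS queues routed to different data streams. The queue names, URL and credentials are placeholders, and the es_datastream_name option should be checked against the version in use (older releases called it es_index_or_datastream_name):

```yaml
inputs:
  - type: "s3-sqs"
    id: "arn:aws:sqs:us-east-1:123456789101:logs-app-a-queue"   # receives S3 notifications for the app-a/ prefix
    outputs:
      - type: "elasticsearch"
        args:
          elasticsearch_url: "https://my-deployment.es.us-east-1.aws.found.io:9243"
          api_key: "REDACTED"
          es_datastream_name: "logs-app_a-default"
  - type: "s3-sqs"
    id: "arn:aws:sqs:us-east-1:123456789101:logs-app-b-queue"   # receives S3 notifications for the app-b/ prefix
    outputs:
      - type: "elasticsearch"
        args:
          elasticsearch_url: "https://my-deployment.es.us-east-1.aws.found.io:9243"
          api_key: "REDACTED"
          es_datastream_name: "logs-app_b-default"
```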
@pushred can I close this issue or do you need more information? |
@aspacca Sure you can close, we are currently testing ESF with Kinesis as an input, so far that is addressing our issue here. Thanks! |
@pushred I know you moved from the CloudWatch Logs input to the Kinesis input due to this issue.

#276 removed the calls to
Now having any number of CloudWatch Logs inputs does not produce any policy at all, so no limit can be exceeded because of that.

this might let you go back to the CloudWatch Logs input and avoid the extra complexity you mentioned in #258

we'll release a new version soon
Great news! Thanks for the update. We haven't gone to production yet so may be able to give this a shot. |
I just tried the latest with 62 log groups and ran into an issue:
Describe the enhancement:
The project where I am trying to use ESF currently has 176 log groups, with more likely to come. Following the published docs for deploying ESF, all three of the options require the use of CloudFormation parameters to specify the ARNs for these groups. However, with so many groups I quickly exceed the 4,096-byte limit on parameter values.
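As a rough illustration of the arithmetic (the parameter name and account ID below are placeholders): a single log group ARN is around 80 bytes, so 176 of them joined into one parameter value come to roughly 176 × 80 ≈ 14,000 bytes, several times the 4,096-byte limit on a single CloudFormation parameter value:

```yaml
# Illustrative only: one comma-separated value holding every log group ARN.
ElasticServerlessForwarderCloudWatchLogs: >-
  arn:aws:logs:us-east-1:123456789101:log-group:/aws/lambda/log-group-001:*,
  arn:aws:logs:us-east-1:123456789101:log-group:/aws/lambda/log-group-002:*,
  arn:aws:logs:us-east-1:123456789101:log-group:/aws/lambda/log-group-003:*
```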
I'm not sure that there are any other possible ways I could provide the ARNs, given the type limitations of parameter values. Ideally I could pass an actual array/list, but CF does not appear to support this.
So given that, the enhancement I may need is a means of deploying ESF without using SAR/CloudFormation. The only options that I can see are:
1. "Sharding" ESF deployments: multiple deploys that each handle a subset of the ARNs. This would currently require 5 stacks, which is not ideal.
2. Building my own CloudFormation stack: essentially forking this project in order to bypass parameters, decomposing the stack, and deploying it through some other means. Aside from the upfront cost of doing this, I think it would impose a maintenance burden in staying aligned. And it would almost definitely be unsupported.
Are there any other options?
Is option 1 the only viable option while AWS does not offer a solution?
Describe a specific use case for the enhancement or feature:
Projects with more than ~30 log groups.